Add Ookla Speedtest Dataset #188
base: master
Pull Request Overview
This PR adds support for the Ookla Speedtest dataset by introducing new scripts for data ingestion, processing, and raster generation, along with configuration updates and dependency adjustments.
- Updated dependency list in pyproject.toml to align with required packages for the new dataset.
- Added multiple Python modules for downloading, processing, and outputting raster layers from Ookla Speedtest data.
- Included a new configuration file and a main script that integrates the workflow.
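The processing step turns Ookla's tabular tile records into raster cells. The sketch below shows one way to do that. It assumes the parquet rows carry the `quadkey` column from Ookla's published schema plus a metric column such as `avg_d_kbps`. It decodes each quadkey into slippy-map tile coordinates and stores only the tiles that have data, which is what keeps the array sparse. The function names are hypothetical and are not taken from `transform_populate.py`.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def quadkey_to_tile(quadkey: str) -> tuple[int, int, int]:
    """Decode a Bing-style quadkey into (tile_x, tile_y, zoom).

    Each digit (0-3) encodes one zoom level: bit 0 sets an x bit,
    bit 1 sets a y bit, most significant level first.
    """
    x = y = 0
    zoom = len(quadkey)
    for i, digit in enumerate(quadkey):
        if digit not in "0123":
            raise ValueError(f"invalid quadkey digit {digit!r} in {quadkey!r}")
        mask = 1 << (zoom - i - 1)
        d = int(digit)
        if d & 1:
            x |= mask
        if d & 2:
            y |= mask
    return x, y, zoom


def build_sparse_grid(
    records: Iterable[Mapping[str, Any]], value_key: str
) -> dict[tuple[int, int], float]:
    """Map (row, col) = (tile_y, tile_x) to a value, keeping only tiles with data.

    Hypothetical helper: all quadkeys must share one zoom level so that
    row/col indices refer to the same grid.
    """
    grid: dict[tuple[int, int], float] = {}
    zoom = None
    for rec in records:
        x, y, z = quadkey_to_tile(rec["quadkey"])
        if zoom is None:
            zoom = z
        elif z != zoom:
            raise ValueError("all quadkeys must share one zoom level")
        grid[(y, x)] = float(rec[value_key])
    return grid
```

Ookla documents its performance tiles as zoom-level-16 Web Mercator tiles, so in practice every quadkey would be 16 digits long; the (row, col) keys can then be offset by the minimum tile indices to address a dense array window.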
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
pyproject.toml | Dependency updates to support the new dataset ingestion and processing. |
datasets/ookla_speedtest/src/transform_populate.py | Implements functions to parse parquet files and generate sparse raster arrays. |
datasets/ookla_speedtest/src/make_geopackage.py | Provides an optional utility to export GeoDataFrames as GeoPackages. |
datasets/ookla_speedtest/src/helpers.py | Defines constants and paths used across the new dataset processing workflow. |
datasets/ookla_speedtest/src/generate_raster.py | Contains functions to define raster profiles and write multiband raster data. |
datasets/ookla_speedtest/src/download_dataset.py | Implements functions to download dataset files from S3 using boto3. |
datasets/ookla_speedtest/main.py | Orchestrates the download and processing pipeline for Ookla Speedtest data. |
datasets/ookla_speedtest/config.toml | Provides configuration parameters and run settings for the dataset integration. |
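`generate_raster.py` is described as defining raster profiles. If each output pixel corresponds to one Web Mercator tile, the georeferencing follows directly from the tile indices: a tile at zoom z has an edge of 2 · 20037508.342789244 / 2^z metres (about 611.5 m at zoom 16). This is a stdlib-only sketch under that one-pixel-per-tile assumption; the function names, the `float32` dtype, and the nodata value are illustrative choices, not the PR's.

```python
from __future__ import annotations

# Half the width of the EPSG:3857 (Web Mercator) world extent, in metres.
HALF_WORLD = 20037508.342789244


def tile_size_m(zoom: int) -> float:
    """Edge length in metres of one slippy-map tile at `zoom`."""
    return 2 * HALF_WORLD / (2 ** zoom)


def tile_window_geotransform(min_x: int, min_y: int, zoom: int) -> tuple[float, ...]:
    """GDAL-ordered geotransform for a grid whose top-left pixel is tile (min_x, min_y).

    Order: (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height).
    Tile y grows southward, so pixel_height is negative.
    """
    size = tile_size_m(zoom)
    x_origin = -HALF_WORLD + min_x * size
    y_origin = HALF_WORLD - min_y * size
    return (x_origin, size, 0.0, y_origin, 0.0, -size)


def make_profile(
    min_x: int, min_y: int, max_x: int, max_y: int, zoom: int, bands: int,
    nodata: float = -9999.0,
) -> dict:
    """Hypothetical multiband profile covering tiles min..max (inclusive)."""
    if max_x < min_x or max_y < min_y:
        raise ValueError("max tile indices must not be below min tile indices")
    return {
        "driver": "GTiff",
        "dtype": "float32",
        "count": bands,
        "width": max_x - min_x + 1,
        "height": max_y - min_y + 1,
        "crs": "EPSG:3857",
        "nodata": nodata,
        "geotransform": tile_window_geotransform(min_x, min_y, zoom),
    }
```

The six-element tuple is in GDAL order, so with rasterio installed it could be converted via `Affine.from_gdal(*profile["geotransform"])` and stored under the profile's `transform` key.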
# Doing the actual donwloading; calling the S3 client, and putting the S3 filenames together
def download_files(year: int, quarters: dict = QUARTERS) -> None:
    """
    Donwnloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory
The word 'Donwnloads' is misspelled. It should be corrected to 'Downloads'.
Suggested change:
- Donwnloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory
+ Downloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory
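The quoted `download_files` assembles S3 object names from a year and a `QUARTERS` mapping. A minimal sketch of that idea follows, assuming the key layout Ookla documents for its public `ookla-open-data` bucket and a hypothetical `QUARTERS` that maps each quarter to its first month; the real constant in `helpers.py` may differ. Only the key builder runs without network access; the download helper imports boto3 lazily and sends unsigned (anonymous) requests, since the bucket is public.

```python
from __future__ import annotations

from pathlib import Path

BUCKET = "ookla-open-data"  # public bucket name, assumed from Ookla's open-data docs
# Hypothetical stand-in for the PR's QUARTERS constant: quarter -> first month.
QUARTERS = {1: "01", 2: "04", 3: "07", 4: "10"}


def performance_key(year: int, quarter: int, service: str = "fixed",
                    quarters: dict = QUARTERS) -> str:
    """Build the S3 key of one quarterly performance parquet file (assumed layout)."""
    if service not in ("fixed", "mobile"):
        raise ValueError(f"unknown service type: {service!r}")
    if quarter not in quarters:
        raise ValueError(f"unknown quarter: {quarter!r}")
    month = quarters[quarter]
    return (
        f"parquet/performance/type={service}/year={year}/quarter={quarter}/"
        f"{year}-{month}-01_performance_{service}_tiles.parquet"
    )


def download_year(year: int, out_dir: str | Path, service: str = "fixed",
                  quarters: dict = QUARTERS) -> list[Path]:
    """Download every quarter of one year, skipping files already on disk."""
    import boto3  # imported lazily so the key builder works without boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for quarter in quarters:
        key = performance_key(year, quarter, service, quarters)
        target = out_dir / Path(key).name
        if not target.exists():
            s3.download_file(BUCKET, key, str(target))
        paths.append(target)
    return paths
```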
datasets/ookla_speedtest/main.py (outdated)
if output_path.exists() and not self.overwrite_processing:
    logger.info(f"Processed layer exists: {output_path}")
else:
    logger.info(f"Processing file: {input_path}. Ouput will be saved to: {output_path}")
The word 'Ouput' is misspelled. It should be corrected to 'Output'.
Suggested change:
- logger.info(f"Processing file: {input_path}. Ouput will be saved to: {output_path}")
+ logger.info(f"Processing file: {input_path}. Output will be saved to: {output_path}")
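The reviewed branch in `main.py` skips work when the processed layer already exists, unless `overwrite_processing` is set. The same check can be factored into a small helper. This sketch is hypothetical (the name `process_if_needed` and the `process` callback are not from the PR) and adds one safeguard the quoted code does not show: it writes to a temporary `.partial` file and renames it afterwards, so an interrupted run does not leave a truncated output that the existence check would later treat as finished.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def process_if_needed(
    input_path: Path,
    output_path: Path,
    process: Callable[[Path, Path], None],
    overwrite: bool = False,
) -> bool:
    """Run `process(input_path, tmp_output)` unless the output already exists.

    Returns True if processing ran, False if it was skipped.
    """
    if output_path.exists() and not overwrite:
        logger.info("Processed layer exists: %s", output_path)
        return False

    logger.info("Processing file: %s. Output will be saved to: %s", input_path, output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a sibling temp file, then rename it over the final path.
    tmp_path = output_path.with_name(output_path.name + ".partial")
    process(input_path, tmp_path)
    tmp_path.replace(output_path)
    return True
```

Passing the step in as a callback keeps the skip/overwrite policy in one place, so the download, transform, and raster stages could all reuse it.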
I didn't mean to summon the AI spellchecker 😂
This PR contains scripts for ingesting, processing, and outputting raster layers from Ookla Speedtest data, and integrates them into the class.