Add Ookla Speedtest Dataset #188

Draft: wants to merge 5 commits into base: master

Conversation

mrsanford

Contents contain scripts for ingesting, processing, and outputting raster layers for Ookla Speedtest + applying it to the class.

jacobwhall requested review from jacobwhall and Copilot on May 8, 2025 18:06

Copilot AI left a comment

Pull Request Overview

This PR adds support for the Ookla Speedtest dataset by introducing new scripts for data ingestion, processing, and raster generation, along with configuration updates and dependency adjustments.

  • Updated dependency list in pyproject.toml to align with required packages for the new dataset.
  • Added multiple Python modules for downloading, processing, and outputting raster layers from Ookla Speedtest data.
  • Included a new configuration file and a main script that integrates the workflow.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Dependency updates to support the new dataset ingestion and processing. |
| datasets/ookla_speedtest/src/transform_populate.py | Implements functions to parse parquet files and generate sparse raster arrays. |
| datasets/ookla_speedtest/src/make_geopackage.py | Provides an optional utility to export GeoDataFrames as GeoPackages. |
| datasets/ookla_speedtest/src/helpers.py | Defines constants and paths used across the new dataset processing workflow. |
| datasets/ookla_speedtest/src/generate_raster.py | Contains functions to define raster profiles and write multiband raster data. |
| datasets/ookla_speedtest/src/download_dataset.py | Implements functions to download dataset files from S3 using boto3. |
| datasets/ookla_speedtest/main.py | Orchestrates the download and processing pipeline for Ookla Speedtest data. |
| datasets/ookla_speedtest/config.toml | Provides configuration parameters and run settings for the dataset integration. |
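
For readers unfamiliar with tiled speed-test data, the "parse parquet files and generate sparse raster arrays" step can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in this PR: it assumes each record carries a Bing-style `quadkey` string and one numeric value, and the names `quadkey_to_tile` and `populate_sparse` are hypothetical.

```python
from collections import defaultdict


def quadkey_to_tile(quadkey: str) -> tuple[int, int, int]:
    """Decode a Bing-style quadkey into (x, y, zoom) tile coordinates."""
    x = y = 0
    zoom = len(quadkey)
    for i, char in enumerate(quadkey):
        mask = 1 << (zoom - 1 - i)  # bit that this digit contributes to
        digit = int(char)
        if digit & 1:  # low bit of the digit selects the x half
            x |= mask
        if digit & 2:  # high bit of the digit selects the y half
            y |= mask
    return x, y, zoom


def populate_sparse(records):
    """Average values per tile; returns {(row, col): mean_value}.

    `records` is an iterable of (quadkey, value) pairs (assumed shape).
    Only tiles that actually have data get an entry, hence "sparse".
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for quadkey, value in records:
        x, y, _ = quadkey_to_tile(quadkey)
        sums[(y, x)] += value
        counts[(y, x)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    sample = [("213", 100.0), ("213", 300.0), ("000", 50.0)]
    print(populate_sparse(sample))
```

A dictionary keyed by (row, col) keeps memory proportional to the number of tiles with tests rather than to the full grid. The real module may well use a different sparse representation.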

# Doing the actual donwloading; calling the S3 client, and putting the S3 filenames together
def download_files(year: int, quarters: dict = QUARTERS) -> None:
    """
    Donwnloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory

Copilot AI May 8, 2025

The word 'Donwnloads' is misspelled. It should be corrected to 'Downloads'.

Suggested change
- Donwnloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory
+ Downloads the performance data files from the target Ookla S3 bucket for 1 year to a local directory

Copilot uses AI. Check for mistakes.
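
As context for the `download_files` excerpt under review, here is a minimal sketch of how "putting the S3 filenames together" and downloading one year of files might look. It is not the PR's implementation. The bucket name, the key layout, the `service` parameter, the `dest_dir` parameter, the shape of `QUARTERS` (quarter number mapped to its first month), and the helper `build_key` are all assumptions made for illustration.

```python
from pathlib import Path

BUCKET = "ookla-open-data"  # assumed public bucket name
# Assumed shape: quarter number -> two-digit first month of that quarter.
QUARTERS = {1: "01", 2: "04", 3: "07", 4: "10"}


def build_key(year: int, quarter: int, service: str = "fixed",
              quarters: dict = QUARTERS) -> str:
    """Build the (assumed) S3 key for one quarter of performance tiles."""
    month = quarters[quarter]
    return (
        f"parquet/performance/type={service}/year={year}/quarter={quarter}/"
        f"{year}-{month}-01_performance_{service}_tiles.parquet"
    )


def download_files(year: int, dest_dir: str, quarters: dict = QUARTERS,
                   service: str = "fixed") -> None:
    """Download every quarter of `year` into `dest_dir` (sketch only)."""
    # Imported lazily so build_key() can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Anonymous access; assumes the bucket allows unsigned requests.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for quarter in quarters:
        key = build_key(year, quarter, service, quarters)
        s3.download_file(BUCKET, key, str(dest / Path(key).name))


if __name__ == "__main__":
    print(build_key(2020, 1))
```

Separating key construction from the network call keeps the string logic testable without touching S3.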

if output_path.exists() and not self.overwrite_processing:
    logger.info(f"Processed layer exists: {output_path}")
else:
    logger.info(f"Processing file: {input_path}. Ouput will be saved to: {output_path}")

Copilot AI May 8, 2025

The word 'Ouput' is misspelled. It should be corrected to 'Output'.

Suggested change
- logger.info(f"Processing file: {input_path}. Ouput will be saved to: {output_path}")
+ logger.info(f"Processing file: {input_path}. Output will be saved to: {output_path}")
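
The skip-or-process check in the excerpt above can be isolated into a small function, which makes the overwrite behaviour easy to test. The following is a hedged sketch rather than the PR's code: `process_if_needed` and its `process` callback are hypothetical names, and the `overwrite` flag stands in for `self.overwrite_processing`.

```python
import logging
from pathlib import Path
from typing import Callable

logger = logging.getLogger(__name__)


def process_if_needed(input_path: Path, output_path: Path, overwrite: bool,
                      process: Callable[[Path, Path], None]) -> bool:
    """Run `process` unless the output already exists and overwrite is off.

    Returns True if processing ran, False if it was skipped.
    """
    if output_path.exists() and not overwrite:
        logger.info("Processed layer exists: %s", output_path)
        return False
    logger.info("Processing file: %s. Output will be saved to: %s",
                input_path, output_path)
    process(input_path, output_path)
    return True
```

Returning a boolean lets the caller count skipped versus processed layers without parsing log output.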

@jacobwhall
Member

I didn't mean to summon the AI spellchecker 😂

2 participants