Add helper script for quarterlies #4074
Draft
Overview
Currently the script only checks which columns were dropped or added; it could be expanded to auto-generate candidate updates to column mappings.
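For reviewers who want a concrete picture of the dropped/added check, here is a minimal sketch. It is not the code in this PR: the function names `read_header` and `diff_columns` are hypothetical, and it assumes the raw files are CSVs with the column names in the first row.

```python
import csv
from pathlib import Path


def read_header(path: Path) -> list[str]:
    """Return the column names from the first row of a CSV file.

    Assumes a plain CSV with a single header row (an assumption, not
    something the PR states about the raw files).
    """
    with path.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def diff_columns(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (dropped, added) column names, keeping each list in file order."""
    old_set, new_set = set(old), set(new)
    dropped = [col for col in old if col not in new_set]
    added = [col for col in new if col not in old_set]
    return dropped, added


# Illustrative column names only; they are made up for this example.
dropped, added = diff_columns(
    ["region", "demand_mw", "net_gen_mw"],
    ["region", "demand_mw", "net_generation_mw"],
)
```

Comparing exact strings means a respelled column shows up once as dropped and once as added, which matches the kind of churn described in this PR.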
What problem does this address?
Quarterly updates for sources that frequently change the layout/schema/spelling of their raw files are tedious. The 2025Q1 update for EIA 930 dropped 9 and added 30 columns. I used a more-horrible version of this script to help identify and organize the necessary changes to column mappings.
What did you change?
- `scripts/` folder
- `quarterlies-helper.py`, intended to be run from the command line
Documentation
Make sure to update relevant aspects of the documentation.
- `docs/data_sources/templates`
- `src/metadata`
Testing
How did you make sure this worked? How can a reviewer verify this?
Sample usage:
To-do list
- If updating analyses or data processing functions: make sure to update or write data validation tests (e.g. `test_minmax_rows()`).
- Run `make pytest-coverage` locally to ensure that the merge queue will accept your PR.
- For minor ETL changes or data additions, once `make pytest-coverage` passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets, and run relevant data validation tests using `pytest` and `--live-dbs`.
- For bigger ETL or data changes, run the full ETL locally and then run the data validations using `make pytest-validate`.
- Alternatively, run the `build-deploy-pudl` GitHub Action manually.