Add helper script for quarterlies #4074

krivard · 2025-02-20T21:47:22Z

Overview

Currently just checks what columns were dropped or added; could be expanded to auto-generate candidate updates to column mappings.
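One possible shape for that expansion, as a minimal sketch and not part of this PR: use the standard library's difflib to propose, for each removed column, the added columns whose names look most similar. The function name suggest_renames and the 0.6 cutoff are assumptions made for illustration; a human would still need to review every suggestion before touching the column mappings.

```python
import difflib


def suggest_renames(removed, added, cutoff=0.6):
    """Map each removed column to up to three similarly named added columns.

    Hypothetical helper: matching is purely by string similarity, so the
    suggestions are candidates for review, not confirmed renames.
    """
    return {
        old: difflib.get_close_matches(old, added, n=3, cutoff=cutoff)
        for old in removed
    }


if __name__ == "__main__":
    removed = ["Net Generation (MW) from Solar"]
    added = [
        "Net Generation (MW) from Solar with Integrated Battery Storage",
        "Net Generation (MW) from Solar without Integrated Battery Storage",
        "Net Generation (MW) from Geothermal",
    ]
    for old, candidates in suggest_renames(removed, added).items():
        print(old)
        for candidate in candidates:
            print("  ->", candidate)
```

Raising the cutoff trims weak matches at the cost of missing heavily reworded columns; an empty list means no added column was close enough to suggest.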

What problem does this address?

Quarterly updates for sources that frequently change the layout/schema/spelling of their raw files are tedious. The 2025Q1 update for EIA 930 dropped 9 and added 30 columns. I used a more-horrible version of this script to help identify and organize the necessary changes to column mappings.

What did you change?

Added scripts/ folder
Added quarterlies-helper.py, intended to be run from the command line
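
For readers who want the gist without opening the script, the check can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code in quarterlies-helper.py: it assumes each archive is a zip of CSV files whose first row is the header, and the names read_headers and diff_columns are made up for this example.

```python
import csv
import io
import zipfile


def read_headers(zip_path):
    """Map each CSV member of a zip archive to its list of header columns."""
    headers = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # utf-8-sig drops a leading byte-order mark if one is present.
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                headers[name] = next(csv.reader(text), [])
    return headers


def diff_columns(old_zip, new_zip):
    """Return {csv_name: (removed, added)} for CSVs present in both archives."""
    old, new = read_headers(old_zip), read_headers(new_zip)
    report = {}
    for name in sorted(old.keys() & new.keys()):
        removed = sorted(set(old[name]) - set(new[name]))
        added = sorted(set(new[name]) - set(old[name]))
        report[name] = (removed, added)
    return report
```

Comparing header sets rather than header lists ignores column reordering, which matches the stated scope of only reporting dropped and added columns.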

Documentation

Make sure to update relevant aspects of the documentation.

Update the release notes: reference the PR and related issues.
Update relevant Data Source jinja templates (see docs/data_sources/templates).
Update relevant table or source description metadata (see src/metadata).
Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

Sample usage:

$ pudl_datastore
[...]
$ diff -q ../store/pudl_input/eia930/10.5281-zenodo.14026427 ../store/pudl_input/eia930/10.5281-zenodo.14792697 |tail -2
Files ../store/pudl_input/eia930/10.5281-zenodo.14026427/eia930-2024half2.zip and ../store/pudl_input/eia930/10.5281-zenodo.14792697/eia930-2024half2.zip differ
Only in ../store/pudl_input/eia930/10.5281-zenodo.14792697: eia930-2025half1.zip
$ python scripts/quarterlies-helper.py eia930 eia930-2024half2.zip 10.5281-zenodo.14026427 10.5281-zenodo.14792697 |nl
     1	====
     2	eia930-2024half2-balance.csv
     3	Column mismatch
     4	  Removed columns:
     5	    Net Generation (MW) from Hydropower and Pumped Storage
     6	    Net Generation (MW) from Hydropower and Pumped Storage (Adjusted)
     7	    Net Generation (MW) from Hydropower and Pumped Storage (Imputed)
     8	    Net Generation (MW) from Solar
     9	    Net Generation (MW) from Solar (Adjusted)
    10	    Net Generation (MW) from Solar (Imputed)
    11	    Net Generation (MW) from Wind
    12	    Net Generation (MW) from Wind (Adjusted)
    13	    Net Generation (MW) from Wind (Imputed)
    14	  Added columns:
    15	    Net Generation (MW) from Battery Storage
    16	    Net Generation (MW) from Battery Storage (Adjusted)
    17	    Net Generation (MW) from Battery Storage (Imputed)
    18	    Net Generation (MW) from Geothermal
    19	    Net Generation (MW) from Geothermal (Adjusted)
    20	    Net Generation (MW) from Geothermal (Imputed)
    21	    Net Generation (MW) from Hydropower Excluding Pumped Storage
    22	    Net Generation (MW) from Hydropower Excluding Pumped Storage (Adjusted)
    23	    Net Generation (MW) from Hydropower Excluding Pumped Storage (Imputed)
    24	    Net Generation (MW) from Other Energy Storage
    25	    Net Generation (MW) from Other Energy Storage (Adjusted)
    26	    Net Generation (MW) from Other Energy Storage (Imputed)
    27	    Net Generation (MW) from Pumped Storage
    28	    Net Generation (MW) from Pumped Storage  (Adjusted)
    29	    Net Generation (MW) from Pumped Storage (Imputed)
    30	    Net Generation (MW) from Solar with Integrated Battery Storage
    31	    Net Generation (MW) from Solar with Integrated Battery Storage (Imputed)
    32	    Net Generation (MW) from Solar witho Integrated Battery Storage (Adjusted)
    33	    Net Generation (MW) from Solar without Integrated Battery Storage
    34	    Net Generation (MW) from Solar without Integrated Battery Storage (Adjusted)
    35	    Net Generation (MW) from Solar without Integrated Battery Storage (Imputed)
    36	    Net Generation (MW) from Unknown Energy Storage
    37	    Net Generation (MW) from Unknown Energy Storage (Adjusted)
    38	    Net Generation (MW) from Unknown Energy Storage (Imputed)
    39	    Net Generation (MW) from Wind with Integrated Battery Storage
    40	    Net Generation (MW) from Wind with Integrated Battery Storage (Adjusted)
    41	    Net Generation (MW) from Wind with Integrated Battery Storage (Imputed)
    42	    Net Generation (MW) from Wind without Integrated Battery Storage
    43	    Net Generation (MW) from Wind without Integrated Battery Storage (Adjusted)
    44	    Net Generation (MW) from Wind without Integrated Battery Storage (Imputed)
    45	====
    46	eia930-2024half2-interchange.csv
    47	====
    48	eia930-2024half2-subregion.csv
$

To-do list

~~If updating analyses or data processing functions: make sure to update or write data validation tests (e.g. test_minmax_rows()).~~
~~Run make pytest-coverage locally to ensure that the merge queue will accept your PR.~~
Review the PR yourself and call out any questions or issues you have.
For minor ETL changes or data additions, once make pytest-coverage passes, make sure you have a fresh full PUDL DB downloaded locally, materialize new/changed assets and all their downstream assets and run relevant data validation tests using pytest and --live-dbs.
~~For bigger ETL or data changes run the full ETL locally and then run the data validations using make pytest-validate.~~
~~Alternatively, run the build-deploy-pudl GitHub Action manually.~~

zaneselvans · 2025-02-27T16:23:07Z

The devtools/ directory is our existing scripty place, which this might fit into. It's also overdue for a reorganization, and the code in there might benefit from being pulled into a cli or scripts subpackage under src/pudl, since none of it can be imported or tested elsewhere as it is now.

Add helper script for quarterlies.

be7d1a4

Currently just checks what columns were dropped or added.
