Add refactored data processing module to reporting #106

Open

flora-hofmann-frequenz wants to merge 1 commit into frequenz-floss:v0.x.x from flora-hofmann-frequenz:dataprocessing_refactored

Contributor

flora-hofmann-frequenz commented Jun 20, 2025

@cyiallou & @cwasicki - this PR takes in most of Costa's recommendations from PR #102

Can you review this one first, and then we decide if we should still look at #102?!

flora-hofmann-frequenz requested review from cyiallou, cwasicki and Copilot

June 20, 2025 13:23

flora-hofmann-frequenz requested a review from a team as a code owner

June 20, 2025 13:23

Copilot AI reviewed

Copilot AI left a comment

Pull Request Overview

This PR introduces a refactored data processing module aimed at generating microgrid energy reports with improved transformations and renaming strategies.

Refactored functions for timezone conversion, grid data processing, and PV metrics computation
Enhanced column renaming functionality based on configurable component types

src/frequenz/lib/notebooks/reporting/data_processing.py (outdated, resolved)

cyiallou previously approved these changes

Contributor

cyiallou left a comment

Thanks Flora. It looks good to me. I only have a couple of minor comments.

src/frequenz/lib/notebooks/reporting/data_processing.py (4 comments, outdated and resolved)

Contributor

cyiallou commented Jun 24, 2025

Oh yes, I forgot about the CI checks. Those can be easily fixed I guess.

cwasicki reviewed

src/frequenz/lib/notebooks/reporting/data_processing.py

+                      COLUMN_CONSUMPTION: COLUMN_CONSUMPTION_NAMED,
+                  }
+                  if "battery" in component_types:

Contributor

cwasicki Jun 25, 2025

I don't think you need the if clauses for component types.

Contributor Author

flora-hofmann-frequenz Jun 26, 2025

There are customers who do not have a battery or PV, and for them I do not want the specific metrics or graphs to be displayed.

Contributor

cwasicki Jul 11, 2025

But this is just the rename map, it only renames columns that exist. You can define the full mapping independent of the data you select.
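
The point about the rename map can be checked with a minimal sketch. The column names below are hypothetical placeholders, not the constants from the module; the only behaviour relied on is that `DataFrame.rename` skips mapping keys that are not present in the frame (its default is `errors="ignore"`).

```python
import pandas as pd

# Hypothetical column names, for illustration only.
FULL_RENAME_MAP = {
    "consumption": "Consumption",
    "battery": "Battery",
    "pv": "PV",
}

# A site without battery or PV: only the consumption column exists.
df = pd.DataFrame({"consumption": [1.0, 2.0]})

# Keys that do not match any column are silently skipped,
# so the full map can be applied regardless of component types.
renamed = df.rename(columns=FULL_RENAME_MAP)
print(list(renamed.columns))  # ['Consumption']
```

Under that assumption, the `if "battery" in component_types:` branches are not needed for renaming; selecting which columns to display is a separate step.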

src/frequenz/lib/notebooks/reporting/data_processing.py Outdated

+              Outputs:
+              --------
+              Each function returns one of:

Contributor

cwasicki Jun 25, 2025

Can't we have a single function here that prepares a single data frame with all columns that are required for the notebook? I don't understand why we have so many functions that are called from the notebook.

src/frequenz/lib/notebooks/reporting/data_processing.py (outdated, resolved)

flora-hofmann-frequenz mentioned this pull request

Add data_processing module to reporting #102

Closed

flora-hofmann-frequenz dismissed cyiallou’s stale review via

a5e45ab

July 11, 2025 12:00

flora-hofmann-frequenz force-pushed the dataprocessing_refactored branch from 3efefcb to a5e45ab Compare

July 11, 2025 12:00

github-actions bot added the part:docs label

flora-hofmann-frequenz force-pushed the dataprocessing_refactored branch from a5e45ab to 9475301 Compare

July 11, 2025 12:08


          Add refactored data processing module to reporting

8b30ea6

Signed-off-by: Flora <[email protected]>

flora-hofmann-frequenz force-pushed the dataprocessing_refactored branch from 9475301 to 8b30ea6 Compare

July 11, 2025 12:15

Contributor Author

flora-hofmann-frequenz commented Jul 11, 2025

Sorry this one took so long, but I went back to the drawing board: now it's mostly one function that enriches the output from the reporting API. The other added functions handle the configuration differences between the various customer setups, so I would like to keep them in this module (rather than making the notebook too convoluted). I tried to include all the feedback given before, here and in PR #102.

flora-hofmann-frequenz requested review from cwasicki and cyiallou

July 11, 2025 12:35

matthias-wende-frequenz reviewed

src/frequenz/lib/notebooks/reporting/data_processing.py

+              def transform_energy_dataframe(
+                  df: pd.DataFrame,
+                  component_types: List[str],
+                  mcfg: Any,

Contributor

matthias-wende-frequenz Jul 24, 2025

You shouldn't use the Any type annotation here. You can use the actual type, MicrogridConfig instead.

cwasicki reviewed

src/frequenz/lib/notebooks/reporting/data_processing.py

+                  adding derived columns for PV production, battery throughput, and grid metrics.
+                  Args:
+                      df: Raw DataFrame with energy metrics, expected to have a datetime index.

Contributor

cwasicki Jul 28, 2025

What are the input columns required on the input data?

src/frequenz/lib/notebooks/reporting/data_processing.py


		main_df = df_renamed[_get_main_columns(df_renamed.columns, component_types)]

		return main_df, df_renamed

Contributor

cwasicki Jul 28, 2025

Why do we need two data frames as output?

src/frequenz/lib/notebooks/reporting/data_processing.py

+, pd.NA
+                      )
+                  # Convert timestamp to Berlin time

Contributor

cwasicki Jul 28, 2025

Would move this to the beginning of this function.
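
One way to do the conversion as a first step is sketched below. The helper name `to_berlin_time` and the column name are hypothetical, and treating naive timestamps as UTC is an assumption, not something the diff confirms.

```python
import pandas as pd


def to_berlin_time(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with its DatetimeIndex converted to Europe/Berlin."""
    out = df.copy()
    idx = out.index
    if idx.tz is None:
        # Assumption: naive timestamps are in UTC.
        idx = idx.tz_localize("UTC")
    out.index = idx.tz_convert("Europe/Berlin")
    return out


raw = pd.DataFrame(
    {"energy_kwh": [1.0, 2.0]},
    index=pd.to_datetime(["2025-07-01 00:00", "2025-07-01 01:00"], utc=True),
)
local = to_berlin_time(raw)
print(local.index[0])  # 2025-07-01 02:00:00+02:00
```

Calling such a helper at the top of the transform means every derived column is computed on local timestamps.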

src/frequenz/lib/notebooks/reporting/data_processing.py



		def compute_power_df(
		main_df: pd.DataFrame, resolution: Union[str, pd.Timedelta]

Contributor

cwasicki Jul 28, 2025

str | pd.Timedelta

Contributor

cwasicki Jul 28, 2025

Why not a datetime.timedelta which we usually use?
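
A sketch of what a `datetime.timedelta` parameter could look like follows. It assumes the function converts energy per sampling interval into average power, which the name `compute_power_df` suggests but the excerpt does not show; `energy_to_power` is a hypothetical stand-in.

```python
from datetime import timedelta

import pandas as pd


def energy_to_power(energy_kwh: pd.Series, resolution: timedelta) -> pd.Series:
    """Convert energy per sample (kWh) into average power (kW)."""
    # Dividing two timedeltas yields a plain float (here: hours per sample).
    hours = resolution / timedelta(hours=1)
    return energy_kwh / hours


energy = pd.Series([0.5, 1.0])
power = energy_to_power(energy, timedelta(minutes=15))
print(power.tolist())  # [2.0, 4.0]
```

Because `pd.Timedelta` is a subclass of `datetime.timedelta`, callers that already hold a `pd.Timedelta` can still pass it to this signature unchanged.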

src/frequenz/lib/notebooks/reporting/data_processing.py

+                          else [f"PV {pv}" for pv in pv_filter]
+                      )
+                      df = main_df[[COLUMN_TIMESTAMP_NAMED] + pv_columns].copy()
+                      df = df.melt(

Contributor

cwasicki Jul 28, 2025

Why is it converted to long format?
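
For reference, this is what the `melt` call does to the shape of the data, shown on a small frame with hypothetical column names. Long format is commonly what plotting libraries expect when grouping by a category; whether that is the motivation here is for the author to confirm.

```python
import pandas as pd

wide = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2025-07-01 00:00", "2025-07-01 01:00"]),
        "PV 1": [1.0, 2.0],
        "PV 2": [3.0, 4.0],
    }
)

# One row per (timestamp, PV) pair instead of one column per PV.
long = wide.melt(id_vars=["timestamp"], var_name="PV", value_name="feed_in")
print(long.shape)  # (4, 3)
print(long["PV"].tolist())  # ['PV 1', 'PV 1', 'PV 2', 'PV 2']
```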

src/frequenz/lib/notebooks/reporting/data_processing.py

+                  Args:
+                      main_df: DataFrame containing PV and grid data.
+                      pv_filter: List of PV components to include (e.g., ["1", "2"] or ["Alle"]).
+                      pvgrid_filter: Filter option for PV and grid analysis (e.g., "PV", "Grid", "PV + Grid").

Contributor

cwasicki Jul 28, 2025

What are these three modes?

src/frequenz/lib/notebooks/reporting/data_processing.py

+                          var_name="PV",
+                          value_name=COLUMN_PV_FEEDIN,
+                      )
+                      df[COLUMN_GRID_NAMED] /= len(pv_columns)

Contributor

cwasicki Jul 28, 2025

What is this calculating?
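
One possible reading, not confirmed by the author: `melt` repeats every identifier column once per PV column, so the grid values appear `len(pv_columns)` times in the long frame, and the division rescales them so that a later sum over all rows matches the original total. The sketch below uses hypothetical column names to show that effect.

```python
import pandas as pd

wide = pd.DataFrame(
    {"grid": [10.0, 20.0], "PV 1": [1.0, 2.0], "PV 2": [3.0, 4.0]}
)
pv_columns = ["PV 1", "PV 2"]

long = wide.melt(
    id_vars=["grid"], value_vars=pv_columns, var_name="PV", value_name="feed_in"
)

# Each grid value now appears once per PV column, so the sum is inflated.
inflated_total = long["grid"].sum()  # 60.0

# Dividing by the number of PV columns restores the original total.
long["grid"] /= len(pv_columns)
restored_total = long["grid"].sum()  # 30.0
```

If that is the intent, a short comment in the code would help, since the per-row grid values no longer represent real measurements after the division.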

src/frequenz/lib/notebooks/reporting/data_processing.py

		print(f"{pv:<7}: {formatted_sum} kWh")


		def create_pv_analysis_df(

Contributor

cwasicki Jul 28, 2025

I don't really get what this and the next function are required for.

src/frequenz/lib/notebooks/reporting/data_processing.py

		COLUMN_PV_THROUGHPUT = "PV Durchsatz"


		def transform_energy_dataframe(

Contributor

cwasicki Jul 28, 2025

For this main function it would be good to have a test.
