Skip to content

Conversation

flora-hofmann-frequenz
Copy link
Contributor

@cyiallou & @cwasicki - this PR takes in most of Costa's recommendations from PR #102

Can you review this one first, and then we decide if we should still look at #102?!

Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a refactored data processing module aimed at generating microgrid energy reports with improved transformations and renaming strategies.

  • Refactored functions for timezone conversion, grid data processing, and PV metrics computation
  • Enhanced column renaming functionality based on configurable component types

cyiallou
cyiallou previously approved these changes Jun 24, 2025
Copy link
Contributor

@cyiallou cyiallou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Flora. It looks good to me. I only have a couple of minor comments.

@cyiallou
Copy link
Contributor

Oh yes, I forgot about the CI checks. Those can be easily fixed I guess.

COLUMN_CONSUMPTION: COLUMN_CONSUMPTION_NAMED,
}

if "battery" in component_types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the if clauses for component types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are customer who do not have a battery or pv and then I do not want the specific metrics or graphs to be displayed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is just the rename map, it only renames columns that exist. You can define the full mapping independent of the data you select.


Outputs:
--------
Each function returns one of:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we have a single function here that prepares a single data frame with all columns that are required for the notebook? I don't understand why we have so many functions that are called from the notebook.

@flora-hofmann-frequenz
Copy link
Contributor Author

Sorry this one took so long, but I went back to the drawing board: Now its mostly one function that enriches the output from the reporting API. The other functions that are added help with all the different configuration that we have between the different customers set ups, so I would like to keep them in this module (rather than having the notebook too convoluted). I tried to include all feedback give before here and in PR #102.

def transform_energy_dataframe(
df: pd.DataFrame,
component_types: List[str],
mcfg: Any,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't use the Any type annotation here. You can use the actual type, MicrogridConfig instead.

COLUMN_CONSUMPTION: COLUMN_CONSUMPTION_NAMED,
}

if "battery" in component_types:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this is just the rename map, it only renames columns that exist. You can define the full mapping independent of the data you select.

adding derived columns for PV production, battery throughput, and grid metrics.

Args:
df: Raw DataFrame with energy metrics, expected to have a datetime index.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the input columns required on the input data?


main_df = df_renamed[_get_main_columns(df_renamed.columns, component_types)]

return main_df, df_renamed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need two data frames as ouput?

0, pd.NA
)

# Convert timestamp to Berlin time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would move this to the beginning of this function.



def compute_power_df(
main_df: pd.DataFrame, resolution: Union[str, pd.Timedelta]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str | pd.Timedelta

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not a datetime.timedelta which we usually use?

else [f"PV {pv}" for pv in pv_filter]
)
df = main_df[[COLUMN_TIMESTAMP_NAMED] + pv_columns].copy()
df = df.melt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it converted to long format?

Args:
main_df: DataFrame containing PV and grid data.
pv_filter: List of PV components to include (e.g., ["1", "2"] or ["Alle"]).
pvgrid_filter: Filter option for PV and grid analysis (e.g., "PV", "Grid", "PV + Grid").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these three modes?

var_name="PV",
value_name=COLUMN_PV_FEEDIN,
)
df[COLUMN_GRID_NAMED] /= len(pv_columns)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this calculating?

print(f"{pv:<7}: {formatted_sum} kWh")


def create_pv_analysis_df(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get what this and the next function are required for.

COLUMN_PV_THROUGHPUT = "PV Durchsatz"


def transform_energy_dataframe(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this main function it would be good to have a test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
part:docs Affects the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants