-
Notifications
You must be signed in to change notification settings - Fork 4
Add refactored data processing module to reporting #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.x.x
Are you sure you want to change the base?
Add refactored data processing module to reporting #106
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a refactored data processing module aimed at generating microgrid energy reports with improved transformations and renaming strategies.
- Refactored functions for timezone conversion, grid data processing, and PV metrics computation
- Enhanced column renaming functionality based on configurable component types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Flora. It looks good to me. I only have a couple of minor comments.
Oh yes, I forgot about the CI checks. Those can be easily fixed I guess. |
COLUMN_CONSUMPTION: COLUMN_CONSUMPTION_NAMED, | ||
} | ||
|
||
if "battery" in component_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need the if
clauses for component types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are customer who do not have a battery or pv and then I do not want the specific metrics or graphs to be displayed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is just the rename map, it only renames columns that exist. You can define the full mapping independent of the data you select.
|
||
Outputs: | ||
-------- | ||
Each function returns one of: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we have a single function here that prepares a single data frame with all columns that are required for the notebook? I don't understand why we have so many functions that are called from the notebook.
3efefcb
to
a5e45ab
Compare
a5e45ab
to
9475301
Compare
Signed-off-by: Flora <[email protected]>
9475301
to
8b30ea6
Compare
Sorry this one took so long, but I went back to the drawing board: Now its mostly one function that enriches the output from the reporting API. The other functions that are added help with all the different configuration that we have between the different customers set ups, so I would like to keep them in this module (rather than having the notebook too convoluted). I tried to include all feedback give before here and in PR #102. |
def transform_energy_dataframe( | ||
df: pd.DataFrame, | ||
component_types: List[str], | ||
mcfg: Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't use the Any
type annotation here. You can use the actual type, MicrogridConfig
instead.
COLUMN_CONSUMPTION: COLUMN_CONSUMPTION_NAMED, | ||
} | ||
|
||
if "battery" in component_types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is just the rename map, it only renames columns that exist. You can define the full mapping independent of the data you select.
adding derived columns for PV production, battery throughput, and grid metrics. | ||
|
||
Args: | ||
df: Raw DataFrame with energy metrics, expected to have a datetime index. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the input columns required on the input data?
|
||
main_df = df_renamed[_get_main_columns(df_renamed.columns, component_types)] | ||
|
||
return main_df, df_renamed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need two data frames as ouput?
0, pd.NA | ||
) | ||
|
||
# Convert timestamp to Berlin time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would move this to the beginning of this function.
|
||
|
||
def compute_power_df( | ||
main_df: pd.DataFrame, resolution: Union[str, pd.Timedelta] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
str | pd.Timedelta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not a datetime.timedelta
which we usually use?
else [f"PV {pv}" for pv in pv_filter] | ||
) | ||
df = main_df[[COLUMN_TIMESTAMP_NAMED] + pv_columns].copy() | ||
df = df.melt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it converted to long format?
Args: | ||
main_df: DataFrame containing PV and grid data. | ||
pv_filter: List of PV components to include (e.g., ["1", "2"] or ["Alle"]). | ||
pvgrid_filter: Filter option for PV and grid analysis (e.g., "PV", "Grid", "PV + Grid"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are these three modes?
var_name="PV", | ||
value_name=COLUMN_PV_FEEDIN, | ||
) | ||
df[COLUMN_GRID_NAMED] /= len(pv_columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this calculating?
print(f"{pv:<7}: {formatted_sum} kWh") | ||
|
||
|
||
def create_pv_analysis_df( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really get what this and the next function are required for.
COLUMN_PV_THROUGHPUT = "PV Durchsatz" | ||
|
||
|
||
def transform_energy_dataframe( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this main function it would be good to have a test.
@cyiallou & @cwasicki - this PR takes in most of Costa's recommendations from PR #102
Can you review this one first, and then we decide if we should still look at #102?!