Import and Export data from Agents and Neurons #129
base: main
Conversation
Solves #120

def export_history(self,
                   filename:Union[str,None] = None,
Is this still the best way to type hint? Shouldn't we use filename: str | None = None nowadays?
Yeah! But it depends on what Python version we want to support. I think anything below 3.10 might be an issue, but I agree that 3.10 is fair. In the config we have python>=3.7.
"""Exports the agent history to a csv file at the given filename.Only the parameters saved to history are exported, not the agent parameters. | ||
Args: | ||
filename (str, optional): The name of the file to save the history to. Defaults to "agent_history.csv". | ||
params_to_export (list, optional): A list of parameters to export from the agent. If None, exports all parameters. |
I think this docstring needs updating (params --> keys, and save_to_file is missing).
@@ -8,7 +8,7 @@
 class ValueNeuron(FeedForwardLayer):
-    """
+    r"""
mistake?
No, linting was complaining about all the backslashes in the docs, but I guess it should be another PR to solve it library-wide. Removing it.
 import scipy
 import inspect
 import os
 import warnings
 from datetime import datetime
 from scipy import stats as stats
-from typing import Union
+from typing import Union, Tuple, List
Same as above, I recently learned this way of type hinting has been superseded by |, tuple[] and list[]. Are you aware of this?
Just chiming in to say that these modern type hinting features were introduced in Python 3.9 (for list[], dict[], etc.) and Python 3.10 (| in place of Union).
Your requirements currently state >= Python 3.7, so if someone tries to use RiaB with "modern" type hinting in an env <3.9, they will get errors.
My personal recommendation would be to just ditch older Python versions; anything <3.10 is not officially supported by the scientific Python ecosystem anyway, see SPEC 0.
Of course, you may have reasons for supporting older Python versions that I'm not aware of.
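For concreteness, here is a minimal sketch of the two styles side by side (function and parameter names are placeholders, not RiaB's actual API). Note that from __future__ import annotations lets the newer syntax appear in annotations on 3.7+, though runtime introspection of such hints still needs 3.10.

```python
from typing import List, Tuple, Union


# Pre-3.10 style, as currently used in the PR (runs on Python >= 3.7).
def export_history_old(filename: Union[str, None] = None,
                       keys_to_export: Union[List[str], None] = None) -> Union[Tuple[str, int], None]:
    ...


# Modern style: built-in generics need Python >= 3.9, "|" unions need >= 3.10.
def export_history_new(filename: str | None = None,
                       keys_to_export: list[str] | None = None) -> tuple[str, int] | None:
    ...
```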
Makes sense! Thanks for the background. What do you think, Mehul? Ditch <3.10?
Ahh yes, exactly, I agree with @niksirbi!! I commented this in the other thread!
Great. Let's do that. If it's easy, go ahead and change all the typing to be >=3.10 compatible. Or else we can make a separate PR.
def export_history(history_dict: dict,
                   filename:Union[str,None] = None,
Is there a reason why filename should ever be optional? Currently Agent.export_history() and Neurons.export_history() pass it.
Relatedly, is this an internal function? Do you ever foresee users calling this directly, or only via the Agent and Neurons wrappers? If so, maybe we should mention this, or even rename it to _export_history.
Ahh! So I made it optional so that if the user does not pass it, the function automatically makes the name using the name of the agent or the neuron. This helps when one wants to export all the neurons in the agent / all agents in the env and doesn't need to pass all the filenames. Check export_agents in Env.
Also, _ is generally used for hidden functions. I think these should be callable by the user from the Agent/Neuron objects.
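To make the auto-naming concrete, here is a minimal sketch of how the default could be derived. This is an illustration only (the helper name is made up), loosely mirroring the filename.split('.') and prefix lines quoted further down.

```python
def _resolve_filename(obj, filename=None, filename_prefix=None, fmt="csv"):
    # Illustrative sketch, not the PR's exact code: when no filename is given,
    # derive one from the Agent/Neurons .name attribute so callers exporting
    # many objects don't have to supply a name for each.
    if filename is None:
        filename = f"{getattr(obj, 'name', type(obj).__name__)}_history"
    # keep any user-given stem but normalise the extension, then add the prefix
    filename = filename.split(".")[0] + f".{fmt}"
    if filename_prefix is not None:
        filename = f"{filename_prefix}_{filename}"
    return filename
```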
But wouldn't that make this a hidden function? This is only callable through the user-facing Agent or Neurons.export APIs, which contain the necessary logic for converting to a dataframe-compatible dict.
filename = filename.split('.')[0] + f".{format}"

# add the prefix to the filename if it is provided
if filename_prefix is not None:
What is this for? Does filename_prefix not just add unnecessary lines of code? What does this add that the user couldn't just give in the filename argument? I would suggest removing this unless I've misunderstood.
This is for the env-level export. The user can just give a prefix for the env, and then all the agents will be exported (using automatic names). Users can definitely give filenames too!
This is for when the env has more than one agent.
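For illustration, the env-level flow described here might look roughly like this. Attribute names such as env.agents are my assumption, not necessarily RatInABox's actual API; the keyword arguments match those used in the diff below.

```python
def export_all_agents(env, keys_to_export=None):
    # Hedged sketch of the env-level export: pass only a prefix (the env name);
    # each Agent auto-names its own file, so no per-agent filenames are needed.
    for agent in env.agents:  # attribute name assumed
        agent.export_history(keys_to_export=keys_to_export,
                             save_to_file=True,
                             filename_prefix=env.name)
```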
Hi Mehul, this is a great start! I have a few points I'd like to discuss and potentially resolve:

1. Streamlining Data Export Logic

The most significant aspect of this PR (and not your fault at all) is the current necessity to manually loop over and prepare each history entry. While I understand this is done to ensure each variable is saved as a single column in a CSV, I'm concerned about the repeated logic for …

Proposal: I suggest creating a generic utility function, perhaps named … (a rough sketch of such a function follows after this comment).

Benefits of this approach: …

Sidenote: Take a look at …

2. Importing Exported Data

Following the above, we would ideally need a corresponding import utility that can invert this logic, reconstructing the original data structures from the exported DataFrame. This might be tricky, but I believe it's possible; you've kind of already been writing this anyway. Specifically, if any of the keys had matching prefixes followed by …

3. Dependencies and File Format

It looks like new dependencies might be introduced. My preference is to stick primarily to …

ALTERNATIVE APPROACH: Dual History Maintenance

I'm also open to an alternative idea: modifying … This approach would eliminate the need to retrospectively convert the dictionary to a DataFrame, as both structures would always exist in parallel. However, my current preference still lies with the …
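Since the original code spans in the comment above were lost, here is a rough sketch of the kind of generic flatten/unflatten pair being proposed. The function names and the double-underscore column separator are assumptions, not the PR's code.

```python
import numpy as np
import pandas as pd


def history_to_dataframe(history: dict) -> pd.DataFrame:
    """Sketch of a generic converter: each key in `history` maps to a list with
    one entry per timestep; scalar entries become a single column, array-like
    entries are split into columns "<key>__0", "<key>__1", ... so the frame can
    be written to CSV and later regrouped by prefix."""
    columns = {}
    for key, values in history.items():
        first = np.asarray(values[0])
        if first.ndim == 0:   # scalar per timestep -> one column
            columns[key] = list(values)
        else:                 # vector per timestep -> one column per element
            arr = np.asarray(values).reshape(len(values), -1)
            for i in range(arr.shape[1]):
                columns[f"{key}__{i}"] = arr[:, i]
    return pd.DataFrame(columns)


def dataframe_to_history(df: pd.DataFrame) -> dict:
    """Inverse sketch: regroup columns sharing a "<key>__<i>" prefix back into
    per-timestep arrays; plain columns come back as flat lists."""
    history, grouped = {}, {}
    for col in df.columns:
        if "__" in col:
            key, _, idx = col.rpartition("__")
            grouped.setdefault(key, []).append((int(idx), col))
        else:
            history[col] = df[col].tolist()
    for key, cols in grouped.items():
        ordered = [c for _, c in sorted(cols)]
        history[key] = [row for row in df[ordered].to_numpy()]
    return history
```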
"""Exports the history of the agents in the environment. | ||
Args: | ||
agent_names (str, list[str]): the name of the agent you want to export the history for. If None, exports all agents. | ||
save (bool): whether to save the history to a file |
I would propose we remove this parameter and always save to file. Can you think of a good reason why a user would want to export the data to a df but then not save? If so, wouldn't they just use the agent.history attribute?
What about this: we write all the logic for exporting to a dataframe, and have a wrapper on this which then saves it. So users have two separate APIs: convert_history_to_dataframe() and export_history(). I think it's clearer that way.
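For concreteness, a minimal sketch of that split, assuming the history_to_dataframe flattener sketched earlier in the thread; the names follow the suggestion above, everything else is illustrative.

```python
import pandas as pd


def convert_history_to_dataframe(history: dict, keys_to_export: list | None = None) -> pd.DataFrame:
    # Pure conversion step: users who only want a DataFrame stop here.
    if keys_to_export is not None:
        history = {k: v for k, v in history.items() if k in keys_to_export}
    return history_to_dataframe(history)  # generic flattener sketched earlier


def export_history(history: dict, filename: str, keys_to_export: list | None = None) -> pd.DataFrame:
    # Thin wrapper: convert, then always save, so no separate `save` flag is needed.
    df = convert_history_to_dataframe(history, keys_to_export)
    df.to_csv(filename, index=False)
    return df
```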
print(f"Exporting history for agent {agent.name}") | ||
df = agent.export_history(keys_to_export=keys_to_export, | ||
save_to_file=save_to_file, | ||
filename_prefix=f"{self.name}") |
Shouldn't this be agent.name here?
No, as this is looping over the agent objects after looking up the names passed by the user.
In light of a further read and a better understanding, I think this might be better organised as follows:

In utils, a single function …

Once these are built we can think about an environment (and agent) level API which saves all sub-agent (and sub-neuron) level data into a single csv.

How does this sound? Very open to push back!
I completely agree with this approach and basically had 2 choices:

Yes! I was aiming for a method where users can do import/export at will, but maybe we need to limit this too. E.g. we only allow exporting positions and the import functionality automatically populates all the others (using a mini simulation?) - this would allow users to import data that does not have velocities, etc.

This is a hard one to solve using just CSVs, for multiple reasons, the main one being that CSVs do not support arrays (we need that for fr in neurons - CSV stores them as strings!!), and I do believe that even positions should be an array (see the little demo just below). Loading and writing arrays in CSV is super slow and the file tends to be very large. SLEAP/DLC use the HDF5 file format to write data in a matrix. Also, if we are able to do Parquet it's much easier to make a general utility function to "solve all"!

I don't think this is necessary and it just eats up RAM. It takes very little compute to convert, and once we decide on a format this can be avoided.
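To illustrate the array-in-CSV issue concretely (a toy example, not PR code):

```python
import numpy as np
import pandas as pd

# A toy history frame with one array-valued column, e.g. per-timestep firing rates.
df = pd.DataFrame({"t": [0.0, 0.1], "firingrate": [np.zeros(3), np.ones(3)]})

df.to_csv("history.csv", index=False)
back = pd.read_csv("history.csv")
print(type(back["firingrate"][0]))  # <class 'str'> -- the array came back as text
```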
So I'd say that exporting here is a priority over importing. That's definitely the case for most RiaB users, who generate data with the package and then study it elsewhere, but I'm not sure about your needs @niksirbi.

If that's the case then I think I greatly prefer the centralised method of having one function which converts any dictionary to a dataframe, living outside the specific classes. I don't think this necessarily precludes importing which, again, should be done with a centralised utility which loads a csv, converts to pandas, then tries to group any columns with the same prefix, then passes this to the …

My suggestion would be to prioritise exporting over importing and do this as cleanly as possible in a way which doesn't limit us further down the line.
This is sort of what is achieved by …
Tbh I'm a little out of my depth here regarding what the community would be satisfied with and what is best practice (different things, in my experience). However, I am quite happy that riab still only has 4 dependencies, so moving to 6 is a big jump. Not against it, just want to know it's worth it. Dumb question: could we just save as npz?
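On the npz question: it would handle arrays natively, at the cost of not being a tabular, pandas-friendly file. A toy sketch (data here is made up for illustration):

```python
import numpy as np

# Each history key becomes one array inside the .npz archive.
history = {"t": np.arange(0.0, 1.0, 0.1), "pos": np.random.rand(10, 2)}
np.savez("agent_history.npz", **history)

loaded = np.load("agent_history.npz")
print(loaded["pos"].shape)  # (10, 2)
```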
Regarding our use-case, i.e. loading RiaB-generated Agent trajectories into …

As to the wider discussion, the needs of RiaB's users are of primary importance here, so feel free to implement export in any way that's maximally useful to them. We in …
@mehulrastogi does the above sound good to you? Shall we go ahead and structure it a little in this direction?
Made it so that one can call the following to save/import data from Parquet files: …
I chose Parquet as we need to be able to save lists in a df and Parquet makes it easy. I have also made an option in the functions so that users can export as a CSV, but the functionality might be limited (the import might cause an issue; testing this).
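A rough sketch of the Parquet round trip being described, using plain pandas with pyarrow or fastparquet installed; the PR's actual function names aren't shown above, so this is only illustrative.

```python
import numpy as np
import pandas as pd

# History frame with an array-valued column per timestep.
df = pd.DataFrame({"t": [0.0, 0.1], "firingrate": [np.zeros(3), np.ones(3)]})

# Parquet keeps the nested values as lists/arrays...
df.to_parquet("agent_history.parquet")  # needs pyarrow or fastparquet
back = pd.read_parquet("agent_history.parquet")
print(back["firingrate"][0])  # [0. 0. 0.] -- still numeric, not a string

# ...whereas the optional CSV export is lossier for such columns, as discussed above.
df.to_csv("agent_history.csv", index=False)
```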