
Import and Export data from Agents and Neurons #129


Draft
wants to merge 4 commits into base: main

Conversation

mehulrastogi (Contributor)

Made it so that one can call the following to save/import data from parquet files:

# Export data into a dataframe 
ag_his = Ag.export_history(save_to_file=True)
pcs_his = PCs.export_history(save_to_file=True)

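# Import the saved data back from the parquet files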
Ag.import_history(filename='agent_agent_0_history.parquet')
PCs.import_history(filename='neuron_PlaceCells_history.parquet')

I chose parquet because we need to be able to save lists in a dataframe, and parquet makes that easy. I have also added an option in the functions so that users can export as a CSV, but that functionality might be limited (importing might cause an issue; I'm testing this).

@mehulrastogi (Contributor Author)

Solves #120



def export_history(self,
                   filename:Union[str,None] = None,
Collaborator:

Is this still the best way to type hint? Shouldn't we use filename: str | None = None nowadays?

Contributor Author:

Yeah! But it depends on what Python version we want to support. I think anything below 3.10 might be an issue, but I agree that 3.10 is fair. In the config we have python>=3.7.

"""Exports the agent history to a csv file at the given filename.Only the parameters saved to history are exported, not the agent parameters.
Args:
filename (str, optional): The name of the file to save the history to. Defaults to "agent_history.csv".
params_to_export (list, optional): A list of parameters to export from the agent. If None, exports all parameters.
TomGeorge1234 (Collaborator), Jun 18, 2025:

I think this docstring needs updating (params --> keys, and save_to_file is missing).

@@ -8,7 +8,7 @@


class ValueNeuron(FeedForwardLayer):
"""
r"""
Collaborator:

mistake?

Contributor Author:

No, linting was complaining about all the backslashes in the docs, but I guess fixing that library-wide should be another PR. Removing it.

import scipy
import inspect
import os
import warnings
from datetime import datetime
from scipy import stats as stats
from typing import Union

from typing import Union, Tuple, List
Collaborator:

Same as above: I recently learned that this way of type hinting has been superseded by |, tuple[] and list[]. Are you aware of this?

Contributor (niksirbi):

Just chiming in to say that these modern type hinting features were introduced in Python 3.9 (for list[], dict[], etc.) and Python 3.10 (| in place of Union).

Your requirements currently state >= Python 3.7, so if someone tries to use RiaB with "modern" type hinting in an env <3.9, they will get errors.

My personal recommendation would be to just ditch older Python versions; anything <3.10 is anyway not officially supported by the whole scientific Python ecosystem, see SPEC 0.

Of course, you may have reasons for supporting older Python versions that I'm not aware of.
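For illustration, the two styles side by side (the function signature here is just an example, not from the PR):

from typing import List, Tuple, Union  # required on Python < 3.9/3.10

def export_history_old(filename: Union[str, None] = None,
                       keys: Union[List[str], None] = None) -> Tuple[str, int]:
    ...

# Python >= 3.10: no typing imports needed for these built-in generics
def export_history_new(filename: str | None = None,
                       keys: list[str] | None = None) -> tuple[str, int]:
    ...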

Collaborator:

Makes sense! Thanks for the background. What do you think, Mehul? Ditch <3.10?

Contributor Author:

Ahh yes, exactly, I agree with @niksirbi!! I commented this in the other thread!

Collaborator:

Great. Let's do that. If it's easy, go ahead and change all the typing to be >=3.10 compatible. Or else we can make a separate PR.



def export_history(history_dict: dict,
                   filename:Union[str,None] = None,
Collaborator:

Is there a reason why filename should ever be optional? Currently Agent.export_history() and Neurons.export_history() pass it.

Relatedly, is this an internal function? Do you ever foresee users calling this directly, or only via the Agent and Neurons wrappers? If so, maybe we should mention this, or even rename it to _export_history.

Contributor Author:

Ahh! I made it optional so that if the user does not pass it, the function automatically builds the name from the name of the agent or the neuron. This helps when one wants to export all the neurons in an agent / all agents in the env, since they don't need to pass all the filenames. Check export_agents in Env.

Also, _ is generally used for hidden functions. I think these should be callable by the user from the Agent/Neuron objects.
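(For illustration, the automatic naming is along these lines - a sketch matching the filenames in the PR description, not the exact code:)

if filename is None:
    # e.g. 'agent_agent_0_history.parquet' for an agent named 'agent_0'
    filename = f"agent_{self.name}_history.{format}"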

Collaborator:

But wouldn't that make this a hidden function? It is only callable through the user-facing Agent or Neurons export APIs, which contain the necessary logic for converting to a dataframe-compatible dict.

filename = filename.split('.')[0] + f".{format}"

# add the prefix to the filename if it is provided
if filename_prefix is not None:
Collaborator:

What is this for? Doesn't filename_prefix just add unnecessary lines of code? What does this add that the user couldn't just give in the filename argument? I would suggest removing this unless I've misunderstood.

Contributor Author:

This is for the env-level export. The user can just give a prefix for the env, and then all the agents will be exported (using automatic names). Users can definitely give filenames too!

This is for when the env has more than one agent.

@TomGeorge1234 (Collaborator) commented Jun 18, 2025


Hi Mehul, this is a great start! I have a few points I'd like to discuss and potentially resolve:

1. Streamlining Data Export Logic

The most significant aspect of this PR (and not your fault at all) is the current necessity to manually loop over and prepare each history entry. While I understand this is done to ensure each variable is saved as a single column in a CSV, I'm concerned about the repeated logic for Neurons and Agent history and the hardcoding for specific keys.

Proposal: Generic _dict_to_dataframe() utility

I suggest creating a generic utility function, perhaps named _dict_to_dataframe(), that takes any dictionary (like our history dictionaries) and transforms it into a Pandas DataFrame suitable for export. This function would:

  • Iterate over the dictionary's keys.
  • Dynamically inspect the type and dimensionality of each value (e.g., list, list of lists, NumPy array).
  • Handle multi-dimensional variables (like pos or firing_rate) by flattening them into multiple columns. For example, pos could become pos_0, pos_1, and firing_rate could become firing_rate_0, ..., firing_rate_{N-1}. All other 1D keys would retain their original names.

Benefits of this approach:

  • Maintainability: One centralized function instead of duplicated logic across different parts of the code.
  • Flexibility: It won't be limited to the currently saved keys. If users add new variables to history, they might be automatically handled.
  • Readability: Decouples the export preparation from the history-saving mechanism.

Sidenote: Take a look at Agent.get_history_arrays(). It converts the history dictionary into a strict dictionary of arrays, which could be a useful first step for _dict_to_dataframe().
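To make this concrete, here is a rough, untested sketch of the kind of thing I'm imagining (all names are suggestions):

import numpy as np
import pandas as pd

def _dict_to_dataframe(history: dict) -> pd.DataFrame:
    # sketch only: assumes every value can be made a (T, ...) numpy array,
    # e.g. via Agent.get_history_arrays() as a first step
    columns = {}
    for key, value in history.items():
        arr = np.asarray(value)
        if arr.ndim == 1:
            columns[key] = arr                      # 1D keys keep their names
        else:
            flat = arr.reshape(len(arr), -1)        # flatten trailing dims
            for i in range(flat.shape[1]):
                columns[f"{key}_{i}"] = flat[:, i]  # pos -> pos_0, pos_1, ...
    return pd.DataFrame(columns)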

2. Importing Exported Data

Following the above, we would ideally need a corresponding import utility that can invert this logic, reconstructing the original data structures from the exported DataFrame. This might be tricky, but I believe it's possible; you've kind of already been writing this anyway. Specifically, if any of the keys had matching prefixes followed by _0, _1, etc., these could be stacked back into an array.
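Roughly the inverse, as an equally untested sketch:

import re
import numpy as np
import pandas as pd

def _dataframe_to_dict(df: pd.DataFrame) -> dict:
    # sketch only: regroup columns sharing a prefix (key_0, key_1, ...) into arrays
    history, groups = {}, {}
    for col in df.columns:
        match = re.fullmatch(r"(.+)_(\d+)", col)
        if match:
            groups.setdefault(match.group(1), []).append(col)
        else:
            history[col] = df[col].to_numpy()
    for key, cols in groups.items():
        cols.sort(key=lambda c: int(c.rsplit("_", 1)[1]))   # key_0, key_1, ...
        history[key] = np.stack([df[c].to_numpy() for c in cols], axis=1)
    return history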

3. Dependencies and File Format

It looks like new dependencies might be introduced. My preference is to stick primarily to pandas for handling the dataframe operations (as you currently do). I also lean towards CSV as the primary export format, as it aligns with common practices in neuroscience tools (e.g., DeepLabCut) and is generally more novice-friendly. Could the use of parquet somehow be allowed, but neither encouraged nor required as a dependency?
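One way might be to only touch the parquet machinery on demand, so pandas alone remains the hard dependency - a sketch:

import pandas as pd

def _save_dataframe(df: pd.DataFrame, filename: str, format: str = "csv"):
    # parquet stays possible but optional: pandas raises ImportError
    # if neither pyarrow nor fastparquet is installed
    if format == "parquet":
        try:
            df.to_parquet(filename)
        except ImportError as e:
            raise ImportError("parquet export needs pyarrow or fastparquet; "
                              "install one or use format='csv'") from e
    else:
        df.to_csv(filename, index=False)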

ALTERNATIVE APPROACH: Dual History Maintenance

I'm also open to an alternative idea: modifying Agent.save_to_history() to simultaneously maintain both the existing history dictionary (which is good for plotting, etc.) and a history_dataframe. This history_dataframe would contain the same information but exclusively as 1D variables, added directly to a Pandas DataFrame.

This approach would eliminate the need to retrospectively convert the dictionary to a DataFrame, as both structures would always exist in parallel. However, my current preference still lies with the _dict_to_dataframe() utility, but I wanted to put this alternative out there for discussion.


"""Exports the history of the agents in the environment.
Args:
agent_names (str, list[str]): the name of the agent you want to export the history for. If None, exports all agents.
save (bool): whether to save the history to a file
Collaborator:

I would propose we remove this parameter and always save to file. Can you think of a good reason why a user would want to export the data to a df but then not save it? If so, wouldn't they just use the agent.history attribute?

What about this: we write all the logic for exporting to a dataframe, and have a wrapper on this which then saves it. So users have two separate APIs: convert_history_to_dataframe() and export_history(). I think it's clearer that way.

print(f"Exporting history for agent {agent.name}")
df = agent.export_history(keys_to_export=keys_to_export,
                          save_to_file=save_to_file,
                          filename_prefix=f"{self.name}")
Collaborator:

Here this should be agent.name, no?

Contributor Author:

No, as this is looping over the agent objects after looking up the names passed by the user.

@TomGeorge1234 (Collaborator)

In light of a further read and better understanding I think this might be better organised as follows:

In utils, a single function convert_dictionary_to_dataframe() maps any dictionary to a pandas dataframe. Agents and Neurons then each have a user-facing .convert_history_to_dataframe() and .export_history_to_file() method. The first one simply calls the util on its own history; the second one calls the first one and saves it (exposing some kwargs for csv or parquet or whatever).
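In code, this would look roughly like the following (a sketch; method names as proposed above):

class Neurons:
    def convert_history_to_dataframe(self):
        # thin wrapper: run the generic util on this object's own history
        return convert_dictionary_to_dataframe(self.history)

    def export_history_to_file(self, filename, format="csv", **kwargs):
        # converts first, then saves (kwargs forwarded to pandas)
        df = self.convert_history_to_dataframe()
        if format == "parquet":
            df.to_parquet(filename, **kwargs)
        else:
            df.to_csv(filename, index=False, **kwargs)
        return df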

Once these are built we can think about an environment (and agent) level API which saves all sub-agent (and sub-neuron) level data into a single csv.

How does this sound? Very open to push back!

@mehulrastogi (Contributor Author)

> 1. Streamlining Data Export Logic [...] I suggest creating a generic utility function, perhaps named _dict_to_dataframe() [...] (quoted in full above)

I completely agree with this approach. I basically had two choices:

  • either we overhaul the way we save everything in the history (standardising as far as possible)
    - if we go this way, maybe we should be able to import entire agents in a nice way (including the params)? Or we limit ourselves to only things related to "poses", in which case we should only export the position and the heading (no vel); but then a hardcoded solution limiting what users can export would be much more useful
  • or each object is responsible for handling the data structures in its history
    - in the long term this can be limiting, as users can add a history param which won't be automatically exported, though we should discuss whether we want to allow that out of the box anyway
    - the pro is that, without changing a lot of code, users can import/export the histories (which is the main reason I went for it)

> 2. Importing Exported Data [...] we would ideally need a corresponding import utility that can invert this logic [...] (quoted in full above)

Yes! I was aiming for a method where users can import/export at will, but maybe we need to limit this too. For example: we only allow exporting positions, and the import functionality automatically populates all the others (using a mini simulation?) - this would allow users to import data that does not have velocities, etc.

> 3. Dependencies and File Format [...] Could the use of parquet somehow be allowed, but neither encouraged nor required as a dependency? (quoted in full above)

This is a hard one to solve using just CSVs, for multiple reasons, the main one being that CSVs do not support arrays (we need that for the firing rates in neurons - CSV stores them as strings!!), and I do believe that even positions should be an array. Loading and writing arrays in CSV is super slow, and the file tends to be very large.

SLEAP/DLC use the HDF5 file format to write data in a matrix.

Also, if we are able to use parquet it's much easier to make a general utility function to "solve all"!
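A minimal illustration of the string problem:

import pandas as pd

df = pd.DataFrame({"firing_rate": [[0.1, 0.2], [0.3, 0.4]]})
df.to_csv("tmp.csv", index=False)
back = pd.read_csv("tmp.csv")
print(type(back["firing_rate"][0]))  # <class 'str'> - the list came back as text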

> ALTERNATIVE APPROACH: Dual History Maintenance [...] (quoted in full above)

I don't think this is necessary; it just eats up RAM. Converting takes very little compute, and once we decide on a format this can be avoided.

@TomGeorge1234 (Collaborator)

> I completely agree with this approach. I basically had two choices: [...] (quoted in full above)

So I'd say that exporting here is a priority over importing. That's definitely the case for most RiaB users who generate data with the package and then study it elsewhere, but not sure about your needs @niksirbi. If that's the case then I think I greatly prefer the centralised method of having one function which converts any dictionary to a dataframe, living outside the specific classes.

I don't think this necessarily precludes importing which, again, should be done with a centralised utility which loads a csv, converts to pandas then tries to group any columns with the same prefix. Then passes this to the Agent / Neurons who can check the right variables are present. Anyway, this is unlikely to be used by many people imo and does overlap with the existing Agent.import_trajectory() API which is much more simplistic but has done the job.

My suggestion would be to prioritise exporting over importing and do this as cleanly as possible in a way which doesn't limit us further down the line.

> Yes! I was aiming for a method where users can import/export at will [...] (quoted in full above)

This is sort of what is achieved by Agent.import_trajectory(). We could therefore just minimally adapt this function to allow importing from a csv/parquet.
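E.g. something along these lines (a sketch; the t / pos_0 / pos_1 column names are assumptions based on the export format discussed above):

import pandas as pd

df = pd.read_parquet("agent_agent_0_history.parquet")
Ag.import_trajectory(times=df["t"].to_numpy(),
                     positions=df[["pos_0", "pos_1"]].to_numpy())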

> This is a hard one to solve using just CSVs [...] SLEAP/DLC use the HDF5 file format [...] (quoted in full above)

Tbh I'm a little out of my depth here regarding what the community would be satisfied with and what is best practice (different things, in my experience). However, I am quite happy that RiaB still only has 4 dependencies, so moving to 6 is a big jump. Not against it, just want to know it's worth it. Dumb question: could we just save as npz?
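For reference, the npz route would be something like this (using the Agent.get_history_arrays() mentioned earlier - no new dependencies, though not dataframe-friendly):

import numpy as np

arrays = Ag.get_history_arrays()          # strict dict of arrays
np.savez("agent_history.npz", **arrays)
loaded = dict(np.load("agent_history.npz"))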

@niksirbi (Contributor)

> So I'd say that exporting here is a priority over importing [...] but not sure about your needs @niksirbi. (quoted in full above)

Regarding our use-case, i.e. loading RiaB-generated Agent trajectories into movement, the only thing we absolutely need is the x, (y, z) positions of the Agent(s) over time. If you export heading as well, I'm confident we can also load that into our data structures, but it's not absolutely necessary. So basically, as long as we can have Agent positions in any file format that can be read into a dataframe, we are good (parquet is fine as well, and may even be preferable for the reasons @mehulrastogi mentioned). I'd only ask that the contents of that file are clearly documented somewhere, so we and others can figure out how to parse it.

As to the wider discussion, the needs of RiaB's users are of primary importance here, so feel free to implement export in any way that's maximally useful to them. We in movement will find a way to extract the info we need from the file, as long as the above criterion is met.

@TomGeorge1234 (Collaborator)

@mehulrastogi does the above sound good to you? Shall we go ahead and structure it a little in this direction?
