
Import and Export data from Agents and Neurons #129


Draft
wants to merge 4 commits into base: main

Conversation

mehulrastogi (Contributor)

Made it so that one can call the following to save/import data from parquet files:

# Export data into a dataframe 
ag_his = Ag.export_history(save_to_file=True)
pcs_his = PCs.export_history(save_to_file=True)

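# Import the saved data back from the parquet files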
Ag.import_history(filename='agent_agent_0_history.parquet')
PCs.import_history(filename='neuron_PlaceCells_history.parquet')

I chose parquet because we need to be able to save lists in a dataframe, and parquet makes that easy. I have also added an option in the functions so that users can export as a CSV, but that functionality might be limited (importing might cause an issue; I'm testing this).

@mehulrastogi (Contributor Author)

Solves #120



def export_history(self,
                   filename:Union[str,None] = None,
Collaborator:

Is this still the best way to type hint? Shouldn't we use filename: str | None = None nowadays?

Contributor Author:

Yeah! But it depends on what Python version we want to support. I think anything below 3.10 might be an issue, but I agree that 3.10 is fair. In the config we have python>=3.7.

"""Exports the agent history to a csv file at the given filename.Only the parameters saved to history are exported, not the agent parameters.
Args:
filename (str, optional): The name of the file to save the history to. Defaults to "agent_history.csv".
params_to_export (list, optional): A list of parameters to export from the agent. If None, exports all parameters.
TomGeorge1234 (Collaborator), Jun 18, 2025:

I think this docstring needs updating (params --> keys, and save_to_file is missing).

@@ -8,7 +8,7 @@


class ValueNeuron(FeedForwardLayer):
"""
r"""
Collaborator:

mistake?

Contributor Author:

No, linting was complaining about all the backslashes in the docs, but I guess fixing that library-wide should be another PR. Removing it.

import scipy
import inspect
import os
import warnings
from datetime import datetime
from scipy import stats as stats
from typing import Union

from typing import Union, Tuple, List
Collaborator:

Same as above: I recently learned that this way of type hinting has been superseded by |, tuple[] and list[]. Are you aware of this?

Contributor (niksirbi):

Just chiming in to say that these modern type hinting features were introduced in Python 3.9 (for list[], dict[], etc.) and Python 3.10 (| in place of Union).

Your requirements currently state >= Python 3.7, so if someone tries to use RiaB with "modern" type hinting in an env <3.9, they will get errors.

My personal recommendation would be to just ditch older Python versions; anything <3.10 is anyway not officially supported by the whole scientific Python ecosystem, see SPEC 0.

Of course, you may have reasons for supporting older Python versions that I'm not aware of.
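For illustration, the two styles side by side (the function signature here is just an example, not from the PR):

from typing import List, Tuple, Union  # required on Python < 3.9/3.10

def export_history_old(filename: Union[str, None] = None,
                       keys: Union[List[str], None] = None) -> Tuple[str, int]:
    ...

# Python >= 3.10: no typing imports needed for these built-in generics
def export_history_new(filename: str | None = None,
                       keys: list[str] | None = None) -> tuple[str, int]:
    ...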

Collaborator:

Makes sense! Thanks for the background. What do you think, Mehul? Ditch <3.10?

Contributor Author:

Ahh yes, exactly, I agree with @niksirbi!! I commented this in the other thread!

Collaborator:

Great. Let's do that. If it's easy, go ahead and change all the typing to be >=3.10 compatible. Or else we can make a separate PR.



def export_history(history_dict: dict,
                   filename:Union[str,None] = None,
Collaborator:

Is there a reason why filename should ever be optional? Currently Agent.export_history() and Neurons.export_history() pass it.

Relatedly, is this an internal function? Do you ever foresee users calling this directly, or only via the Agent and Neurons wrappers? If so, maybe we should mention this, or even rename it to _export_history.

Contributor Author:

Ahh! I made it optional so that if the user does not pass it, the function automatically builds the name from the name of the agent or the neuron. This helps when one wants to export all the neurons in an agent / all agents in the env, since they don't need to pass all the filenames. Check export_agents in Env.

Also, _ is generally used for hidden functions. I think these should be callable by the user from the Agent/Neuron objects.
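(For illustration, the automatic naming is along these lines - a sketch matching the filenames in the PR description, not the exact code:)

if filename is None:
    # e.g. 'agent_agent_0_history.parquet' for an agent named 'agent_0'
    filename = f"agent_{self.name}_history.{format}"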

Collaborator:

But wouldn't that make this a hidden function? It is only callable through the user-facing Agent or Neurons export APIs, which contain the necessary logic for converting to a dataframe-compatible dict.

filename = filename.split('.')[0] + f".{format}"

# add the prefix to the filename if it is provided
if filename_prefix is not None:
Collaborator:

What is this for? Doesn't filename_prefix just add unnecessary lines of code? What does this add that the user couldn't just give in the filename argument? I would suggest removing this unless I've misunderstood.

Contributor Author:

This is for the env-level export. The user can just give a prefix for the env, and then all the agents will be exported (using automatic names). Users can definitely give filenames too!

This is for when the env has more than one agent.

@TomGeorge1234 (Collaborator) commented Jun 18, 2025


Hi Mehul, this is a great start! I have a few points I'd like to discuss and potentially resolve:

1. Streamlining Data Export Logic

The most significant aspect of this PR (and not your fault at all) is the current necessity to manually loop over and prepare each history entry. While I understand this is done to ensure each variable is saved as a single column in a CSV, I'm concerned about the repeated logic for Neurons and Agent history and the hardcoding for specific keys.

Proposal: Generic _dict_to_dataframe() utility

I suggest creating a generic utility function, perhaps named _dict_to_dataframe(), that takes any dictionary (like our history dictionaries) and transforms it into a Pandas DataFrame suitable for export. This function would:

  • Iterate over the dictionary's keys.
  • Dynamically inspect the type and dimensionality of each value (e.g., list, list of lists, NumPy array).
  • Handle multi-dimensional variables (like pos or firing_rate) by flattening them into multiple columns. For example, pos could become pos_0, pos_1, and firing_rate could become firing_rate_0, ..., firing_rate_{N-1}. All other 1D keys would retain their original names.

Benefits of this approach:

  • Maintainability: One centralized function instead of duplicated logic across different parts of the code.
  • Flexibility: It won't be limited to the currently saved keys. If users add new variables to history, they might be automatically handled.
  • Readability: Decouples the export preparation from the history-saving mechanism.

Sidenote: Take a look at Agent.get_history_arrays(). It converts the history dictionary into a strict dictionary of arrays, which could be a useful first step for _dict_to_dataframe().
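To make this concrete, here is a rough, untested sketch of the kind of thing I'm imagining (all names are suggestions):

import numpy as np
import pandas as pd

def _dict_to_dataframe(history: dict) -> pd.DataFrame:
    # sketch only: assumes every value can be made a (T, ...) numpy array,
    # e.g. via Agent.get_history_arrays() as a first step
    columns = {}
    for key, value in history.items():
        arr = np.asarray(value)
        if arr.ndim == 1:
            columns[key] = arr                      # 1D keys keep their names
        else:
            flat = arr.reshape(len(arr), -1)        # flatten trailing dims
            for i in range(flat.shape[1]):
                columns[f"{key}_{i}"] = flat[:, i]  # pos -> pos_0, pos_1, ...
    return pd.DataFrame(columns)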

2. Importing Exported Data

Following the above, we would ideally need a corresponding import utility that can invert this logic, reconstructing the original data structures from the exported DataFrame. This might be tricky, but I believe it's possible; you've kind of already been writing this anyway. Specifically, if any of the keys had matching prefixes followed by _0, _1, etc., these could be stacked back into an array.
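Roughly the inverse, as an equally untested sketch:

import re
import numpy as np
import pandas as pd

def _dataframe_to_dict(df: pd.DataFrame) -> dict:
    # sketch only: regroup columns sharing a prefix (key_0, key_1, ...) into arrays
    history, groups = {}, {}
    for col in df.columns:
        match = re.fullmatch(r"(.+)_(\d+)", col)
        if match:
            groups.setdefault(match.group(1), []).append(col)
        else:
            history[col] = df[col].to_numpy()
    for key, cols in groups.items():
        cols.sort(key=lambda c: int(c.rsplit("_", 1)[1]))   # key_0, key_1, ...
        history[key] = np.stack([df[c].to_numpy() for c in cols], axis=1)
    return history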

3. Dependencies and File Format

It looks like new dependencies might be introduced. My preference is to stick primarily to pandas for handling the dataframe operations (as you currently do). I also lean towards CSV as the primary export format, as it aligns with common practices in neuroscience tools (e.g., DeepLabCut) and is generally more novice-friendly. Could the use of parquet somehow be allowed, but neither encouraged nor required as a dependency?
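One way might be to only touch the parquet machinery on demand, so pandas alone remains the hard dependency - a sketch:

import pandas as pd

def _save_dataframe(df: pd.DataFrame, filename: str, format: str = "csv"):
    # parquet stays possible but optional: pandas raises ImportError
    # if neither pyarrow nor fastparquet is installed
    if format == "parquet":
        try:
            df.to_parquet(filename)
        except ImportError as e:
            raise ImportError("parquet export needs pyarrow or fastparquet; "
                              "install one or use format='csv'") from e
    else:
        df.to_csv(filename, index=False)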

ALTERNATIVE APPROACH: Dual History Maintenance

I'm also open to an alternative idea: modifying Agent.save_to_history() to simultaneously maintain both the existing history dictionary (which is good for plotting, etc.) and a history_dataframe. This history_dataframe would contain the same information but exclusively as 1D variables, added directly to a Pandas DataFrame.

This approach would eliminate the need to retrospectively convert the dictionary to a DataFrame, as both structures would always exist in parallel. However, my current preference still lies with the _dict_to_dataframe() utility, but I wanted to put this alternative out there for discussion.


"""Exports the history of the agents in the environment.
Args:
agent_names (str, list[str]): the name of the agent you want to export the history for. If None, exports all agents.
save (bool): whether to save the history to a file
Collaborator:

I would propose we remove this parameter and always save to file. Can you think of a good reason why a user would want to export the data to a df but then not save it? If so, wouldn't they just use the agent.history attribute?

What about this: we write all the logic for exporting to a dataframe, and have a wrapper on this which then saves it. So users have two separate APIs: convert_history_to_dataframe() and export_history(). I think it's clearer that way.

print(f"Exporting history for agent {agent.name}")
df = agent.export_history(keys_to_export=keys_to_export,
                          save_to_file=save_to_file,
                          filename_prefix=f"{self.name}")
Collaborator:

Here this should be agent.name, no?

Contributor Author:

No, as this is looping over the agent objects after looking up the names passed by the user.

@TomGeorge1234 (Collaborator)

In light of a further read and better understanding I think this might be better organised as follows:

In utils, a single function convert_dictionary_to_dataframe() maps any dictionary to a pandas dataframe. Agents and Neurons then each have a user-facing .convert_history_to_dataframe() and .export_history_to_file() method. The first one simply calls the util on its own history; the second one calls the first one and saves it (exposing some kwargs for csv or parquet or whatever).
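In code, this would look roughly like the following (a sketch; method names as proposed above):

class Neurons:
    def convert_history_to_dataframe(self):
        # thin wrapper: run the generic util on this object's own history
        return convert_dictionary_to_dataframe(self.history)

    def export_history_to_file(self, filename, format="csv", **kwargs):
        # converts first, then saves (kwargs forwarded to pandas)
        df = self.convert_history_to_dataframe()
        if format == "parquet":
            df.to_parquet(filename, **kwargs)
        else:
            df.to_csv(filename, index=False, **kwargs)
        return df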

Once these are built we can think about an environment (and agent) level API which saves all sub-agent (and sub-neuron) level data into a single csv.

How does this sound? Very open to push back!

@mehulrastogi (Contributor Author)

> 1. Streamlining Data Export Logic [...] I suggest creating a generic utility function, perhaps named _dict_to_dataframe() [...] (quoted in full above)

I completely agree with this approach. I basically had two choices:

  • either we overhaul the way we save everything in the history (standardising as far as possible)
    - if we go this way, maybe we should be able to import entire agents in a nice way (including the params)? Or we limit ourselves to only things related to "poses", in which case we should only export the position and the heading (no vel); but then a hardcoded solution limiting what users can export would be much more useful
  • or each object is responsible for handling the data structures in its history
    - in the long term this can be limiting, as users can add a history param which won't be automatically exported, though we should discuss whether we want to allow that out of the box anyway
    - the pro is that, without changing a lot of code, users can import/export the histories (which is the main reason I went for it)

> 2. Importing Exported Data [...] we would ideally need a corresponding import utility that can invert this logic [...] (quoted in full above)

Yes! I was aiming for a method where users can import/export at will, but maybe we need to limit this too. For example: we only allow exporting positions, and the import functionality automatically populates all the others (using a mini simulation?) - this would allow users to import data that does not have velocities, etc.

> 3. Dependencies and File Format [...] Could the use of parquet somehow be allowed, but neither encouraged nor required as a dependency? (quoted in full above)

This is a hard one to solve using just CSVs, for multiple reasons, the main one being that CSVs do not support arrays (we need that for the firing rates in neurons - CSV stores them as strings!!), and I do believe that even positions should be an array. Loading and writing arrays in CSV is super slow, and the file tends to be very large.

SLEAP/DLC use the HDF5 file format to write data in a matrix.

Also, if we are able to use parquet it's much easier to make a general utility function to "solve all"!
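A minimal illustration of the string problem:

import pandas as pd

df = pd.DataFrame({"firing_rate": [[0.1, 0.2], [0.3, 0.4]]})
df.to_csv("tmp.csv", index=False)
back = pd.read_csv("tmp.csv")
print(type(back["firing_rate"][0]))  # <class 'str'> - the list came back as text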

> ALTERNATIVE APPROACH: Dual History Maintenance [...] (quoted in full above)

I don't think this is necessary; it just eats up RAM. Converting takes very little compute, and once we decide on a format this can be avoided.

@TomGeorge1234 (Collaborator)

> I completely agree with this approach. I basically had two choices: [...] (quoted in full above)

So I'd say that exporting here is a priority over importing. That's definitely the case for most RiaB users who generate data with the package and then study it elsewhere, but not sure about your needs @niksirbi. If that's the case then I think I greatly prefer the centralised method of having one function which converts any dictionary to a dataframe, living outside the specific classes.

I don't think this necessarily precludes importing which, again, should be done with a centralised utility which loads a csv, converts to pandas then tries to group any columns with the same prefix. Then passes this to the Agent / Neurons who can check the right variables are present. Anyway, this is unlikely to be used by many people imo and does overlap with the existing Agent.import_trajectory() API which is much more simplistic but has done the job.

My suggestion would be to prioritise exporting over importing and do this as cleanly as possible in a way which doesn't limit us further down the line.

> Yes! I was aiming for a method where users can import/export at will [...] (quoted in full above)

This is sort of what is achieved by Agent.import_trajectory(). We could therefore just minimally adapt this function to allow importing from a csv/parquet.
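E.g. something along these lines (a sketch; the t / pos_0 / pos_1 column names are assumptions based on the export format discussed above):

import pandas as pd

df = pd.read_parquet("agent_agent_0_history.parquet")
Ag.import_trajectory(times=df["t"].to_numpy(),
                     positions=df[["pos_0", "pos_1"]].to_numpy())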

> This is a hard one to solve using just CSVs [...] SLEAP/DLC use the HDF5 file format [...] (quoted in full above)

Tbh I'm a little out of my depth here regarding what the community would be satisfied with and what is best practice (different things, in my experience). However, I am quite happy that RiaB still only has 4 dependencies, so moving to 6 is a big jump. Not against it, just want to know it's worth it. Dumb question: could we just save as npz?
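For reference, the npz route would be something like this (using the Agent.get_history_arrays() mentioned earlier - no new dependencies, though not dataframe-friendly):

import numpy as np

arrays = Ag.get_history_arrays()          # strict dict of arrays
np.savez("agent_history.npz", **arrays)
loaded = dict(np.load("agent_history.npz"))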

@niksirbi (Contributor)

> So I'd say that exporting here is a priority over importing [...] but not sure about your needs @niksirbi. (quoted in full above)

Regarding our use-case, i.e. loading RiaB-generated Agent trajectories into movement, the only thing we absolutely need is the x, (y, z) positions of the Agent(s) over time. If you export heading as well, I'm confident we can also load that into our data structures, but it's not absolutely necessary. So basically, as long as we can have Agent positions in any file format that can be read into a dataframe, we are good (parquet is fine as well, and may even be preferable for the reasons @mehulrastogi mentioned). I'd only ask that the contents of that file are clearly documented somewhere, so we and others can figure out how to parse it.

As to the wider discussion, the needs of RiaB's users are of primary importance here, so feel free to implement export in any way that's maximally useful to them. We in movement will find a way to extract the info we need from the file, as long as the above criterion is met.

@TomGeorge1234 (Collaborator)

@mehulrastogi does the above sound good to you? Shall we go ahead and structure it a little in this direction?
