ENH: Use DIPY's parallelization #142
Conversation
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   70.10%   70.49%   +0.38%
==========================================
  Files          23       23
  Lines        1067     1081      +14
  Branches      129      128       -1
==========================================
+ Hits          748      762      +14
+ Misses        275      274       -1
- Partials       44       45       +1
Force-pushed from 3fbbf57 to 0cb775c.
DTI doesn’t support parallelization because we use numpy-level parallelization to fit chunks of 10e4 voxels at a time. It should work with the other models, though, I think.
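For illustration, a minimal sketch of what numpy-level chunked fitting looks like; `fit_chunked`, the log-linear least-squares, and the array layout are assumptions made up for the example, not nifreeze's or DIPY's actual code:

```python
import numpy as np

def fit_chunked(design, data2d, step=10_000):
    """Vectorized least-squares fit over chunks of voxels.

    design : (n_volumes, n_params) design matrix
    data2d : (n_voxels, n_volumes) signals, one row per voxel
    """
    n_voxels = data2d.shape[0]
    coefs = np.empty((n_voxels, design.shape[1]))
    for start in range(0, n_voxels, step):
        sl = slice(start, start + step)
        # One numpy call fits a whole chunk of voxels at once,
        # letting BLAS-level threading do the parallel work.
        sol, *_ = np.linalg.lstsq(design, np.log(data2d[sl]).T, rcond=None)
        coefs[sl] = sol.T
    return coefs
```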
Wouldn't it be very beneficial to add a multiprocessing layer around it? Then you would have numpy's parallelization running in separate jobs. Those individual processes would each be a bit slower due to CPU contention, but altogether the only bottleneck would be memory, with a significant increase in speed.
Depending on your definition of "very"... But ultimately it is an empirical question. In terms of CPU time, the thing that would be most beneficial is to carefully tune that 10e4 number.
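One way to run that empirical question, as a sketch: sweep the chunk size and time the fit. This assumes `TensorModel` forwards a `step` keyword controlling the chunk size (the parameter name is confirmed later in this thread) and that `gtab`, `data`, and `mask` already exist:

```python
import time
from dipy.reconst.dti import TensorModel

# gtab, data, and mask are assumed: a gradient table, a 4D dMRI
# array, and a brain mask, respectively.
for step in (10_000, 50_000, 100_000, 500_000):
    model = TensorModel(gtab, step=step)  # chunk size, in voxels
    t0 = time.perf_counter()
    model.fit(data, mask=mask)
    print(f"step={step}: {time.perf_counter() - t0:.1f} s")
```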
Force-pushed from f2e520d to d816f7c.
"joblib", | ||
"nipype>= 1.5.1,<2.0", | ||
"nitransforms>=22.0.0,<24", | ||
"nireports", | ||
"numpy>=1.21.3", | ||
"nest-asyncio>=1.5.1", | ||
"ray", |
Are we requiring both `joblib` and `ray`? I am not sure how we can make these individual choices for the user depending on their preference (?). We should probably raise a warning if there is no parallelization backend, so that users become aware of why the run is so slow (see the sketch below).
As an additional note, DIPY also offers `dask` as another parallelization backend, e.g.
https://github.com/dipy/dipy/blob/master/dipy/utils/tests/test_parallel.py#L12-L18
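A minimal sketch of that warning, assuming nothing about nifreeze's actual backend-selection code; `_pick_backend` and the preference order are made up for the example:

```python
import importlib.util
import warnings

def _pick_backend(preferred=("ray", "joblib", "dask")):
    """Return the first importable parallelization backend, or None."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    warnings.warn(
        "No parallelization backend (ray/joblib/dask) is installed; "
        "model fitting will run serially and may be very slow.",
        stacklevel=2,
    )
    return None
```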
In our experiments, ray performs very well, even relative to joblib, so giving people this option would be great (maybe even as the default?). I would stay away from dask: in our experiments it very often choked and performed worse than a serial baseline.
For some of the details: https://nrdg.github.io/2024-dipy-parallelization/
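For reference, the engine choice surfaces in DIPY's `paramap` helper, the one exercised by the test file linked above; a minimal usage sketch, assuming the signature in `dipy.utils.parallel`:

```python
from dipy.utils.parallel import paramap

def square(x):
    return x ** 2

# engine may be "joblib", "ray", or "dask", provided the package is
# installed; a serial engine is also available as a fallback.
print(paramap(square, range(10), engine="joblib", n_jobs=4))
```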
@arokem Is it possible it's called "step"?
Yes, that's the one! Sorry for the brain-fart.
@arokem FYI, unless there's something very flawed in my tests, I've managed to decorate the DTI fitting with the multi_voxel_fit decorator, and this is the result:
So multiprocessing is way faster than numpy parallelization (barring an error on my side configuring numpy, which I'm thinking could be the culprit).
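For readers unfamiliar with the decorator, a toy sketch of the pattern, assuming `multi_voxel_fit` wraps a single-voxel `fit` method as in `dipy.reconst.multi_voxel`; `MeanModel` and `MeanFit` are invented for the example, and the `engine` keyword is only reported to exist in the dipy/dipy#2593 version of the decorator:

```python
import numpy as np
from dipy.reconst.base import ReconstModel
from dipy.reconst.multi_voxel import multi_voxel_fit

class MeanFit:
    def __init__(self, model, mean):
        self.model = model
        self.mean = mean

class MeanModel(ReconstModel):
    """Toy model that 'fits' each voxel by storing its mean signal."""

    @multi_voxel_fit
    def fit(self, single_voxel_data, **kwargs):
        # The decorator calls this once per voxel in the mask and
        # assembles the returned objects into a MultiVoxelFit.
        return MeanFit(self, float(np.mean(single_voxel_data)))

# Usage: fit = MeanModel(gtab).fit(data4d, mask=mask)
# and, per dipy/dipy#2593, optionally engine="ray" or similar.
```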
Force-pushed from 21b7d76 to b7c8ccd.
Force-pushed from b7c8ccd to e1f35be.
Okay, so this was a good try, but I have slowed things down considerably, because prediction is not optimized for parallelization in DIPY (unless I got something wrong). Other conclusions:
Marking as a draft because I'm not 100% positive we want to go down this route. Let's chat in a TechMon or a specific nifreeze follow-up. @arokem, let us know if you want to bring the TechMon back to a schedule more considerate of your timezone, or if you prefer a one-off meeting some time.
Brings improvements in parallelization management from #142, so that they are kept even if we finally decide against #142. In particular, it opens up the implementation to set the ``step`` feature of DTI models. It also extends the base data object with two convenience properties for retrieving the 3D shape of the data and the number of voxels per volume. X-References: #142.
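A sketch of what such convenience properties could look like; `BaseDataset`, `shape3d`, and `size3d` are hypothetical names, since the actual identifiers in #158 are not shown here:

```python
import numpy as np

class BaseDataset:
    """Illustrative data object wrapping a 4D array (x, y, z, volumes)."""

    def __init__(self, dataobj):
        self.dataobj = np.asanyarray(dataobj)

    @property
    def shape3d(self):
        """The 3D spatial shape of the data (hypothetical name)."""
        return self.dataobj.shape[:3]

    @property
    def size3d(self):
        """Number of voxels in a single volume (hypothetical name)."""
        return int(np.prod(self.dataobj.shape[:3]))
```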
I would be happy to chat about this, but you are right that Tuesdays at 8 am my time are never going to work for me. I would think that getting anything above optimally tuned numpy parallelization would be hard to achieve, and would (in both cases, I guess) be quite dependent on a lot of things outside our control (user hardware, for example).
Let's find a better moment. Are you planning on going to OHBM?
While this is theoretically correct, in my experience it is really hard to prepare optimal numpy deployments (with big constraints on Linux platforms with CPython, owing to the way virtual memory is managed and how numpy is implemented). However, it is also relatively inexpensive and effective to add multiprocessing for parallelization. The problem I'm hitting right now is that DIPY's implementation is (IMHO) a bit irregular about it:
All of the above is in DIPY's court. However, as I introduce in #158, we could try to leverage our HDF5 data store to optimize parallelization for minimal memory usage. Right now, we split the data in the main process (either with our parallelization or DIPY's), while HDF5 opens up an opportunity to not touch the data until you are in a worker that is ready to do the actual processing (see the sketch below).
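A minimal sketch of that idea, not nifreeze's actual code: each worker opens the HDF5 file itself and reads only its own slab, so the main process never materializes the full array. The dataset name `dataobj`, the file name, and the flattened (voxels, volumes) layout are assumptions for the example:

```python
import h5py
from joblib import Parallel, delayed

def fit_slab(h5_path, start, stop):
    # The file is opened inside the worker, so only this slab of
    # voxels is ever read into the worker's memory.
    with h5py.File(h5_path, "r") as f:
        slab = f["dataobj"][start:stop]  # hypothetical dataset name
    return slab.mean(axis=-1)  # stand-in for the actual model fit

n_voxels, step = 600_000, 100_000
results = Parallel(n_jobs=6)(
    delayed(fit_slab)("dwi.h5", s, min(s + step, n_voxels))
    for s in range(0, n_voxels, step)
)
```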
Sadly, I will not be at OHBM this year. If things are irregular within DIPY, we should fix that!
Okay, I'll try to set it up after OHBM then (things are quite hectic for me before). Re: DIPY, I'll be super happy to report every issue I encounter, and I also understand that DIPY's scope/vision/roadmap may not be aligned with my suggestions, so no hard feelings if my comments are dismissed after consideration :) I'm also happy to look at it from a new perspective and contribute more to DIPY if you otherwise see the feedback as highly useful.
Dropping joblib in favor of the multi_voxel_fit decorator of dipy/dipy#2593.
I've tested with DIPY 1.10 and 1.11, and yet I get:
TypeError: TensorModel.fit() got an unexpected keyword argument 'engine'
WDYT @arokem?
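A sketch of the failing call, assuming `gtab` and `data` are a gradient table and a 4D dMRI array; the interpretation in the comment is inferred from the error above:

```python
from dipy.reconst.dti import TensorModel

model = TensorModel(gtab)
# On DIPY 1.10/1.11 this raises:
#   TypeError: TensorModel.fit() got an unexpected keyword argument 'engine'
# presumably because TensorModel.fit is not wrapped by the parallelized
# multi_voxel_fit decorator of dipy/dipy#2593.
fit = model.fit(data, engine="ray")
```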