ENH: Use DIPY's parallelization #142


Draft · wants to merge 3 commits into main from enh/leverage-dipy-parallelization

Conversation

oesteban
Member

Dropping joblib in favor of the multi_voxel_fit decorator from dipy/dipy#2593.

I've tested with DIPY 1.10 and 1.11, but in both cases I get:

TypeError: TensorModel.fit() got an unexpected keyword argument 'engine'

WDYT @arokem?
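
For reference, a minimal sketch of the failing call, assuming a synthetic acquisition (the gradient table and data here are placeholders I made up, not the actual test data):

```python
# Hypothetical minimal reproducer: on DIPY 1.10/1.11, TensorModel.fit()
# does not accept an `engine` keyword, hence the TypeError above.
import numpy as np
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel

# Tiny synthetic acquisition: one b=0 plus six unit directions (placeholders)
bvals = np.array([0, 1000, 1000, 1000, 1000, 1000, 1000])
bvecs = np.vstack([np.zeros(3), np.eye(3), -np.eye(3)])
gtab = gradient_table(bvals, bvecs=bvecs)
data = np.random.default_rng(0).random((4, 4, 4, 7))

model = TensorModel(gtab)
fit = model.fit(data, engine="ray")  # raises TypeError on these DIPY versions
```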


codecov bot commented May 24, 2025

Codecov Report

Attention: Patch coverage is 44.44444% with 25 lines in your changes missing coverage. Please review.

Project coverage is 70.49%. Comparing base (d16c97c) to head (e1f35be).

Files with missing lines        Patch %   Lines
src/nifreeze/model/dmri.py       6.66%    14 Missing ⚠️
src/nifreeze/estimator.py       60.00%    7 Missing and 1 partial ⚠️
src/nifreeze/data/base.py       66.66%    2 Missing ⚠️
src/nifreeze/model/_dipy.py     75.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   70.10%   70.49%   +0.38%     
==========================================
  Files          23       23              
  Lines        1067     1081      +14     
  Branches      129      128       -1     
==========================================
+ Hits          748      762      +14     
+ Misses        275      274       -1     
- Partials       44       45       +1     

☔ View full report in Codecov by Sentry.

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from 3fbbf57 to 0cb775c on May 24, 2025 05:09
@arokem
Contributor

arokem commented May 24, 2025 via email

@oesteban
Member Author

> DTI doesn't support parallelization because we use numpy-level parallelization to fit chunks of 10e4 voxels at a time. Should work with other models though, I think.

Wouldn't it be very beneficial to add a multiprocessing layer around it? Then you would have numpy's parallelization running within separate jobs. Those individual processes would be a bit slower due to the CPU workload, but altogether the only bottleneck would be memory, with a significant increase in speed.

@arokem
Contributor

arokem commented May 24, 2025

Depending on your definition of "very"... But ultimately an empirical question.

In terms of CPU time, the thing that would be most beneficial is to carefully tune that 10e4 number (chunk_size, I believe) to the machine you are running on. But that's not the best use of programmer time, given these models run quite fast as is.

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from f2e520d to d816f7c on May 26, 2025 08:04
"joblib",
"nipype>= 1.5.1,<2.0",
"nitransforms>=22.0.0,<24",
"nireports",
"numpy>=1.21.3",
"nest-asyncio>=1.5.1",
"ray",
Contributor

Are we requiring both joblib and ray? I am not sure how we can make these individual choices for the user depending on their preference. We should probably raise a warning if there is no parallelization backend, so that users become aware of why the run is so slow.

As an additional note, DIPY also offers dask as another parallelization backend, e.g.
https://github.com/dipy/dipy/blob/master/dipy/utils/tests/test_parallel.py#L12-L18
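
To make the warning idea concrete, a minimal sketch; the helper name `detect_parallel_backend` and the backend ordering are assumptions, not existing nifreeze API:

```python
# Hedged sketch: warn when no parallelization backend is importable.
# `detect_parallel_backend` is a hypothetical helper, not existing nifreeze API.
import importlib.util
import warnings


def detect_parallel_backend(preferred=("ray", "joblib", "dask")):
    """Return the first importable backend name, or None with a warning."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    warnings.warn(
        "No parallelization backend (ray, joblib, or dask) found; "
        "model fitting will run serially and may be slow.",
        stacklevel=2,
    )
    return None
```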

Contributor

In our experiments, ray performs very well, even relative to joblib, so giving people this option would be great (maybe even as the default?). I would stay away from dask: in our experiments it very often choked and performed worse than a serial baseline.

For some of the details: https://nrdg.github.io/2024-dipy-parallelization/
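
For reference, a minimal sketch of selecting an engine through DIPY's paramap helper, patterned after the test file linked above; the exact keyword names may vary across DIPY versions, so treat this as an assumption to verify:

```python
# Sketch patterned on dipy/utils/tests/test_parallel.py; the signature of
# paramap may differ between DIPY versions.
from dipy.utils.parallel import paramap


def power_it(num, n=2):
    # Toy per-item workload standing in for a per-voxel fit
    return num**n


# engine can be "serial", "joblib", "dask", or "ray", depending on installs
results = paramap(power_it, range(10), engine="joblib", n_jobs=4)
```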

@oesteban
Member Author

> (chunk_size I believe)

@arokem Is it possible it's called "step"?

https://github.com/dipy/dipy/blob/c8fa6d5fc4c21d23c74056182731b65da138a644/dipy/reconst/dti.py#L739-L745

@arokem
Contributor

arokem commented May 28, 2025

> @arokem Is it possible it's called "step"?

Yes, that's the one! Sorry for the brain-fart.

@oesteban
Member Author

oesteban commented Jun 5, 2025

> Depending on your definition of "very"... But ultimately an empirical question.

@arokem FYI, unless there's something very flawed in my tests, I've managed to decorate the DTI fitting with the multi_voxel_fit decorator, and this is the result:

  • Handling only threading through step (n_jobs=1) and increasing the chunk size to the number of voxels / number of threads, it fits my image in some 270 s.
  • Handling only multiprocessing (n_jobs=12), fitting goes down to 37 s, and with n_jobs=20 it's 30.72 s.
  • Combining both (step = nvox // (omp_nthreads * n_jobs) = 248842, with n_jobs=20 and n_threads=20), it goes to 31 s.

So multiprocessing is way faster than numpy parallelization (barring an error on my side configuring numpy, which I'm thinking could be the culprit).
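
For concreteness, the chunking arithmetic from the third bullet above as a small sketch; the helper name is mine, and whether step is passed to the model constructor or to fit() depends on the DIPY version, so double-check:

```python
# Hedged sketch of the chunk-size arithmetic from the benchmark above.
def dti_step(nvox: int, omp_nthreads: int, n_jobs: int) -> int:
    """Voxels per vectorized numpy call so that each worker gets one chunk."""
    return max(1, nvox // (omp_nthreads * n_jobs))


# Hypothetical usage: value intended for DIPY DTI's `step` option. nvox is
# back-computed from the figures quoted above; the real mask size may differ.
step = dti_step(nvox=99_536_800, omp_nthreads=20, n_jobs=20)  # -> 248842
```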

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from 21b7d76 to b7c8ccd on June 5, 2025 21:01
@oesteban oesteban requested review from arokem and jhlegarreta June 5, 2025 21:10
@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from b7c8ccd to e1f35be on June 6, 2025 12:03
@oesteban
Member Author

oesteban commented Jun 6, 2025

Okay, so this was a good try, but I have slowed things down considerably, because prediction is not optimized for parallelization in DIPY (unless I got something wrong).

Other conclusions:

  • It's better not to pass a b0 as S0 unless it has been carefully processed:
    • CSF should be removed, otherwise it will artificially make the ventricles and the CSF around the brain too bright. It also means fewer voxels to fit.
    • Potentially, the b0 should be affected by the same inhomogeneity field as the DWIs, so that cortical GM is artificially brighter but matches the effect of proximity to the coils.
    • It needs to be denoised.
  • Varying DTI's step doesn't seem to have a meaningful impact, so I'm guessing something is making my numpy run single-threaded (although I've checked, and that's not the case).
  • DKI seems to have some internal parallelization, but I don't fully understand how it works.

@oesteban oesteban marked this pull request as draft June 6, 2025 13:37
@oesteban
Member Author

oesteban commented Jun 6, 2025

Marking as a draft because I'm not 100% positive we want to go down this route. Let's chat in a TechMon or a specific nifreeze follow-up. @arokem, let us know if you want to bring the TechMon back on a schedule more considerate of your timezone, or if you prefer a one-off meeting some time.

oesteban added a commit that referenced this pull request Jun 8, 2025
Brings improvements in parallelization management from #142, so they are
kept even if we finally decided against #142.

In particular, it opens up the implementation to set the ``step``
feature of DTI models.

It also extends the base data object with two convenience properties for
retrieving the 3D shape of the data and the number of voxels per volume.

X-References: #142.
@arokem
Contributor

arokem commented Jun 9, 2025

I would be happy to chat about this, but you are right that Tuesdays at 8am my time are never going to work for me. I would think that getting anything above optimally tuned numpy parallelization would be hard to achieve, and would (in both cases, I guess) be quite dependent on a lot of things outside our control (user hardware, for example).

@oesteban
Member Author

oesteban commented Jun 9, 2025

> I would be happy to chat about this, but you are right that Tuesdays at 8am my time are never going to work for me

Let's find a better moment. Are you planning on going to OHBM?

> I would think that getting anything above optimally tuned numpy parallelization would be hard to achieve, and would (in both cases, I guess) be quite dependent on a lot of things outside our control (user hardware, for example).

While this is theoretically correct, in my experience it is really hard to prepare optimal numpy deployments (with large constraints on Linux platforms with CPython, owing to the way virtual memory is managed and how numpy is implemented). However, it is relatively inexpensive and effective to add multiprocessing on top for parallelization.

The problem I'm hitting right now is that DIPY's implementation is (IMHO) a bit irregular about it:

  • DTI has numpy parallelization, and decorating it with multi_voxel_fit helped me speed things up substantially.
  • DKI may be optimized, but it uses multi_voxel_fit internally without exposing its arguments (engine, number of processes), so it's really hard to optimize in practice.
  • The rest of the models, in theory, should work well with the multi_voxel_fit feature.

All of the above is in DIPY's court. However, as I introduce in #158, we could try to leverage our HDF5 data store to optimize parallelization for minimal memory usage. Right now, we split the data in the main process (either with our parallelization or DIPY's), whereas HDF5 opens an opportunity to defer data access until a worker is ready to do the actual processing. A sketch of that idea follows below.
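
A minimal sketch of the deferred-access idea, where each worker opens the HDF5 store itself and reads only its slab; the "/data" dataset layout (nvox × nvolumes) and the function names are assumptions, not the actual nifreeze store:

```python
# Hedged sketch: defer data access to the workers so the main process never
# materializes the full array. The "/data" layout (nvox x nvolumes) is assumed.
from concurrent.futures import ProcessPoolExecutor

import h5py
import numpy as np


def fit_chunk(path, start, stop):
    """Open the HDF5 store inside the worker and fit only voxels [start, stop)."""
    with h5py.File(path, "r") as f:
        chunk = f["/data"][start:stop, :]  # only this slab is read into memory
    return chunk.mean(axis=-1)  # placeholder for the actual model fit


def parallel_fit(path, nvox, n_jobs=8):
    bounds = np.linspace(0, nvox, n_jobs + 1, dtype=int)
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        futures = [
            pool.submit(fit_chunk, path, lo, hi)
            for lo, hi in zip(bounds[:-1], bounds[1:])
        ]
    return np.concatenate([f.result() for f in futures])
```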

@arokem
Contributor

arokem commented Jun 11, 2025

Sadly, I will not be at OHBM this year. If things are irregular within DIPY, we should fix that!

@oesteban
Member Author

> Sadly, I will not be at OHBM this year. If things are irregular within DIPY, we should fix that!

Okay, I'll try to set it up after OHBM then (things are quite hectic for me before).

Re: DIPY. I'll be super-happy to report every issue I encounter, and I also understand that DIPY's scope/vision/roadmap may not be aligned with my suggestions, so no hard feelings if my comments are dismissed after consideration :)

I'm also happy to look at it from a new perspective and contribute more to DIPY if you otherwise see the feedback as highly useful.
