ENH: Use DIPY's parallelization #142


Draft · wants to merge 3 commits into main from enh/leverage-dipy-parallelization

Conversation

oesteban
Member

Dropping joblib in favor of the multi_voxel_fit decorator from dipy/dipy#2593.

I've tested with DIPY 1.10 and 1.11, but in both cases I get:

TypeError: TensorModel.fit() got an unexpected keyword argument 'engine'

WDYT @arokem?
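
For reference, a minimal sketch of the failing call, assuming a synthetic acquisition (the gradient table and data here are placeholders I made up, not the actual test data):

```python
# Hypothetical minimal reproducer: on DIPY 1.10/1.11, TensorModel.fit()
# does not accept an `engine` keyword, hence the TypeError above.
import numpy as np
from dipy.core.gradients import gradient_table
from dipy.reconst.dti import TensorModel

# Tiny synthetic acquisition: one b=0 plus six unit directions (placeholders)
bvals = np.array([0, 1000, 1000, 1000, 1000, 1000, 1000])
bvecs = np.vstack([np.zeros(3), np.eye(3), -np.eye(3)])
gtab = gradient_table(bvals, bvecs=bvecs)
data = np.random.default_rng(0).random((4, 4, 4, 7))

model = TensorModel(gtab)
fit = model.fit(data, engine="ray")  # raises TypeError on these DIPY versions
```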


codecov bot commented May 24, 2025

Codecov Report

Attention: Patch coverage is 44.44444% with 25 lines in your changes missing coverage. Please review.

Project coverage is 70.49%. Comparing base (d16c97c) to head (e1f35be).

Files with missing lines        Patch %   Lines
src/nifreeze/model/dmri.py       6.66%    14 Missing ⚠️
src/nifreeze/estimator.py       60.00%    7 Missing and 1 partial ⚠️
src/nifreeze/data/base.py       66.66%    2 Missing ⚠️
src/nifreeze/model/_dipy.py     75.00%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #142      +/-   ##
==========================================
+ Coverage   70.10%   70.49%   +0.38%     
==========================================
  Files          23       23              
  Lines        1067     1081      +14     
  Branches      129      128       -1     
==========================================
+ Hits          748      762      +14     
+ Misses        275      274       -1     
- Partials       44       45       +1     

☔ View full report in Codecov by Sentry.

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from 3fbbf57 to 0cb775c on May 24, 2025 05:09
@arokem
Contributor

arokem commented May 24, 2025 via email

@oesteban
Member Author

> DTI doesn't support parallelization because we use numpy-level parallelization to fit chunks of 10e4 voxels at a time. Should work with other models though, I think.

Wouldn't it be very beneficial to add a multiprocessing layer around it? Then you would have numpy's parallelization running within separate jobs. Those individual processes would be a bit slower due to the CPU workload, but altogether the only bottleneck would be memory, with a significant increase in speed.

@arokem
Contributor

arokem commented May 24, 2025

Depending on your definition of "very"... But ultimately an empirical question.

In terms of CPU time, the thing that would be most beneficial is to carefully tune that 10e4 number (chunk_size, I believe) to the machine you are running on. But that's not the best use of programmer time, given these models run quite fast as is.

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from f2e520d to d816f7c on May 26, 2025 08:04
"joblib",
"nipype>= 1.5.1,<2.0",
"nitransforms>=22.0.0,<24",
"nireports",
"numpy>=1.21.3",
"nest-asyncio>=1.5.1",
"ray",
Contributor

Are we requiring both joblib and ray? I am not sure how we can make these individual choices for the user depending on their preference. We should probably raise a warning if there is no parallelization backend, so that users become aware of why the run is so slow.

As an additional note, DIPY also offers dask as another parallelization backend, e.g.
https://github.com/dipy/dipy/blob/master/dipy/utils/tests/test_parallel.py#L12-L18
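
To make the warning idea concrete, a minimal sketch; the helper name `detect_parallel_backend` and the backend ordering are assumptions, not existing nifreeze API:

```python
# Hedged sketch: warn when no parallelization backend is importable.
# `detect_parallel_backend` is a hypothetical helper, not existing nifreeze API.
import importlib.util
import warnings


def detect_parallel_backend(preferred=("ray", "joblib", "dask")):
    """Return the first importable backend name, or None with a warning."""
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    warnings.warn(
        "No parallelization backend (ray, joblib, or dask) found; "
        "model fitting will run serially and may be slow.",
        stacklevel=2,
    )
    return None
```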

Contributor

In our experiments, ray performs very well, even relative to joblib, so giving people this option would be great (maybe even as the default?). I would stay away from dask: in our experiments it very often choked and performed worse than a serial baseline.

For some of the details: https://nrdg.github.io/2024-dipy-parallelization/
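
For reference, a minimal sketch of selecting an engine through DIPY's paramap helper, patterned after the test file linked above; the exact keyword names may vary across DIPY versions, so treat this as an assumption to verify:

```python
# Sketch patterned on dipy/utils/tests/test_parallel.py; the signature of
# paramap may differ between DIPY versions.
from dipy.utils.parallel import paramap


def power_it(num, n=2):
    # Toy per-item workload standing in for a per-voxel fit
    return num**n


# engine can be "serial", "joblib", "dask", or "ray", depending on installs
results = paramap(power_it, range(10), engine="joblib", n_jobs=4)
```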

@oesteban
Member Author

> (chunk_size I believe)

@arokem Is it possible it's called "step"?

https://github.com/dipy/dipy/blob/c8fa6d5fc4c21d23c74056182731b65da138a644/dipy/reconst/dti.py#L739-L745

@arokem
Contributor

arokem commented May 28, 2025

> @arokem Is it possible it's called "step"?

Yes, that's the one! Sorry for the brain-fart.

@oesteban
Member Author

oesteban commented Jun 5, 2025

> Depending on your definition of "very"... But ultimately an empirical question.

@arokem FYI, unless there's something very flawed in my tests, I've managed to decorate the DTI fitting with the multi_voxel_fit decorator, and this is the result:

  • Handling only threading through step (n_jobs=1) and increasing the chunk size to the number of voxels / number of threads, it fits my image in some 270 s.
  • Handling only multiprocessing (n_jobs=12), fitting goes down to 37 s, and with n_jobs=20 it's 30.72 s.
  • Combining both (step = nvox // (omp_nthreads * n_jobs) = 248842, with n_jobs=20 and n_threads=20), it goes to 31 s.

So multiprocessing is way faster than numpy parallelization (barring an error on my side configuring numpy, which I'm thinking could be the culprit).
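
For concreteness, the chunking arithmetic from the third bullet above as a small sketch; the helper name is mine, and whether step is passed to the model constructor or to fit() depends on the DIPY version, so double-check:

```python
# Hedged sketch of the chunk-size arithmetic from the benchmark above.
def dti_step(nvox: int, omp_nthreads: int, n_jobs: int) -> int:
    """Voxels per vectorized numpy call so that each worker gets one chunk."""
    return max(1, nvox // (omp_nthreads * n_jobs))


# Hypothetical usage: value intended for DIPY DTI's `step` option. nvox is
# back-computed from the figures quoted above; the real mask size may differ.
step = dti_step(nvox=99_536_800, omp_nthreads=20, n_jobs=20)  # -> 248842
```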

@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from 21b7d76 to b7c8ccd on June 5, 2025 21:01
@oesteban oesteban requested review from arokem and jhlegarreta June 5, 2025 21:10
@oesteban oesteban force-pushed the enh/leverage-dipy-parallelization branch from b7c8ccd to e1f35be on June 6, 2025 12:03
@oesteban
Member Author

oesteban commented Jun 6, 2025

Okay, so this was a good try, but I have slowed things down considerably, because prediction is not optimized for parallelization in DIPY (unless I got something wrong).

Other conclusions:

  • It's better not to pass a b0 as S0 unless it has been carefully processed:
    • CSF should be removed, otherwise it will artificially make the ventricles and the CSF around the brain too bright. It also means fewer voxels to fit.
    • Potentially, the b0 should be affected by the same inhomogeneity field as the DWIs, so that cortical GM is artificially brighter but matches the effect of proximity to the coils.
    • It needs to be denoised.
  • Varying DTI's step doesn't seem to have a meaningful impact, so I'm guessing something is making my numpy run single-threaded (although I've checked, and that's not the case).
  • DKI seems to have some internal parallelization, but I don't fully understand how it works.

@oesteban oesteban marked this pull request as draft June 6, 2025 13:37
@oesteban
Member Author

oesteban commented Jun 6, 2025

Marking as a draft because I'm not 100% positive we want to go down this route. Let's chat in a TechMon or a specific nifreeze follow-up. @arokem, let us know if you want to bring the TechMon back on a schedule more considerate of your timezone, or if you prefer a one-off meeting some time.

oesteban added a commit that referenced this pull request Jun 8, 2025
Brings improvements in parallelization management from #142, so they are
kept even if we finally decided against #142.

In particular, it opens up the implementation to set the ``step``
feature of DTI models.

It also extends the base data object with two convenience properties for
retrieving the 3D shape of the data and the number of voxels per volume.

X-References: #142.
@arokem
Contributor

arokem commented Jun 9, 2025

I would be happy to chat about this, but you are right that Tuesdays at 8am my time are never going to work for me. I would think that getting anything above optimally tuned numpy parallelization would be hard to achieve, and would (in both cases, I guess) be quite dependent on a lot of things outside our control (user hardware, for example).

@oesteban
Member Author

oesteban commented Jun 9, 2025

> I would be happy to chat about this, but you are right that Tuesdays at 8am my time are never going to work for me

Let's find a better moment. Are you planning on going to OHBM?

> I would think that getting anything above optimally tuned numpy parallelization would be hard to achieve, and would (in both cases, I guess) be quite dependent on a lot of things outside our control (user hardware, for example).

While this is theoretically correct, in my experience it is really hard to prepare optimal numpy deployments (with large constraints on Linux platforms with CPython, owing to the way virtual memory is managed and how numpy is implemented). However, it is relatively inexpensive and effective to add multiprocessing on top for parallelization.

The problem I'm hitting right now is that DIPY's implementation is (IMHO) a bit irregular about it:

  • DTI has numpy parallelization, and decorating it with multi_voxel_fit helped me speed things up substantially.
  • DKI may be optimized, but it uses multi_voxel_fit internally without exposing its arguments (engine, number of processes), so it's really hard to optimize in practice.
  • The rest of the models, in theory, should work well with the multi_voxel_fit feature.

All of the above is in DIPY's court. However, as I introduce in #158, we could try to leverage our HDF5 data store to optimize parallelization for minimal memory usage. Right now, we split the data in the main process (either with our parallelization or DIPY's), whereas HDF5 opens an opportunity to defer data access until a worker is ready to do the actual processing. A sketch of that idea follows below.
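
A minimal sketch of the deferred-access idea, where each worker opens the HDF5 store itself and reads only its slab; the "/data" dataset layout (nvox × nvolumes) and the function names are assumptions, not the actual nifreeze store:

```python
# Hedged sketch: defer data access to the workers so the main process never
# materializes the full array. The "/data" layout (nvox x nvolumes) is assumed.
from concurrent.futures import ProcessPoolExecutor

import h5py
import numpy as np


def fit_chunk(path, start, stop):
    """Open the HDF5 store inside the worker and fit only voxels [start, stop)."""
    with h5py.File(path, "r") as f:
        chunk = f["/data"][start:stop, :]  # only this slab is read into memory
    return chunk.mean(axis=-1)  # placeholder for the actual model fit


def parallel_fit(path, nvox, n_jobs=8):
    bounds = np.linspace(0, nvox, n_jobs + 1, dtype=int)
    with ProcessPoolExecutor(max_workers=n_jobs) as pool:
        futures = [
            pool.submit(fit_chunk, path, lo, hi)
            for lo, hi in zip(bounds[:-1], bounds[1:])
        ]
    return np.concatenate([f.result() for f in futures])
```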

@arokem
Contributor

arokem commented Jun 11, 2025

Sadly, I will not be at OHBM this year. If things are irregular within DIPY, we should fix that!

@oesteban
Member Author

> Sadly, I will not be at OHBM this year. If things are irregular within DIPY, we should fix that!

Okay, I'll try to set it up after OHBM then (things are quite hectic for me before).

Re: DIPY. I'll be super-happy to report every issue I encounter, and I also understand that DIPY's scope/vision/roadmap may not be aligned with my suggestions, so no hard feelings if my comments are dismissed after consideration :)

I'm also happy to look at it from a new perspective and contribute more to DIPY if you otherwise see the feedback as highly useful.
