Unsorted thoughts about type 3 transforms #600

mreineck · 2025-01-03T11:52:06Z

mreineck
Jan 3, 2025
Maintainer

This is just a loose collection of things I'd like to understand better about type 3 transforms, and which might help in improving them in the future ...

As far as I understand, type 1 and 2 transforms are adjoint to each other (if iflag is set properly in both directions).

From looking at the mathematical definition, type 3 should be adjoint to itself; that is, a single function can carry out forward and adjoint transforms (again, when iflag is set properly).

Given this, it feels surprising to me that the internal structure of the implementation of type 3 transforms is so "asymmetric":

first, there is a spreading operation, producing an oversampled grid
then there is a full type 2 NUFFT, which internally involves an even more oversampled grid
then there is a correction of the individual output points that requires fairly expensive calculations

Is there a way to avoid the twofold oversampling? And can we replace the expensive point-by-point correction at the end with something cheaper?
The symmetry in the operation itself feels like a strong hint to me that something can be done here, but I have no concrete idea yet ...

mreineck · 2025-01-03T12:05:13Z

mreineck
Jan 3, 2025
Maintainer Author

Perhaps as a first step: would things break if the final correction step was moved in front of the interpolation operation, i.e. if we corrected the grid from which we interpolate the final result, as opposed to the final points themselves?

I'm happy to try this, but if anyone can immediately see that this cannot work from a theoretical point of view, please tell me :-)

0 replies

ahbarnett · 2025-01-06T22:29:38Z

ahbarnett
Jan 6, 2025
Maintainer

Happy New Year Martin,

Sorry, I have been overwhelmed by other projects.

Thinking about accelerating type 3 would be great - it has many applications (eg powder diffraction, SAXS). The squaring of the oversampling factor sigma I do not know a way around. It arises because of the point sampling on both input and output sides (if smooth functions were the I/O, no sigma^2 would be needed). It would certainly require "breaking open" the black-box type 2 comprising step 2. Note the four references I give to this t3 alg in the FINUFFT paper.

Note also Rmk 4 from my paper: the absolute minimum n (outer FFT size) would be (2/pi)SX, using the Nyquist criterion for input points at \pm X, and output at \pm S. But you cannot approach this criterion without the kernel getting arbitrarily wide (approaching a sinc). And the inner t2 needs to be accurate, hence also sigma needed there.

To me the "fairly expensive" d cosines per output point is linear and trivially parallelizable, so it's not clear pointwise-scaling the entire regular grid instead is a win (it touches more DRAM) - but I imagine it would work with about the same accuracy, so go ahead and try - your intuition is good!

Clearly the t3 algorithm is not optimal in clustered settings, as you know from my Rmk 5 in FINUFFT paper - and it has been exploited in recent nice work of Sepand Kashani https://arxiv.org/abs/2306.06007 which shrinks FFT sizes at the expense of many-vector transforms (which he found a lovely way to speed up - see gdoc notes). A multilevel butterfly scheme for t3 might win in the end, avoiding FFTs altogether, certainly for high-freq case (XS)^d >> N or M. This would extend work of Lexing Ying https://arxiv.org/abs/0801.1524 which I should have cited in the FINUFFT paper. But that's a PhD project :)

Anyway, I hope that provides food for thought... Best, Alex

0 replies

mreineck · 2025-01-07T10:43:23Z

mreineck
Jan 7, 2025
Maintainer Author

Thank you very much for your thoughts, Alex!

I will have a closer look at the other papers covering type 3, and also at Sepand's PhD thesis as soon as possible, which shld mke many aspects of the transform clearer to me.

Concerning correction of NU output points vs. possibly correcting the intermediate grid: yes, the latter touches potentially more RAM, but this step (if it happens to actually work!) can be done together with other steps (e.g. copying the FFT result from the oversampled array to the output array), so this shouldn't be an issue. I am more concerned with the actual calculation time for the correction ... computing the correction function for a single NU point has a comparable cost to spreading/interpolating this point, so it feels quite significant. But of course this only eeds to be calculated once per plan, not per transform, so it is probably not a big deal in many situations. It might be possible to approximate the correction function with polynomials, which could be nice ... I'll try that as soon as I have enough spare time.

I'm also looking forward to make use of Sepand's divide-and-conquer approach, which should help clustered type 3 transforms a lot. But I'm less confident that the additional trick mentioned in the gdoc notes (re-using precomputed kernel values) will be advantageous within a library that has very efficient kernel computation, like finufft has; see also SepandKashani/fourier_toolkit#1.

1 reply

SepandKashani Jan 7, 2025

As @ahbarnett recommended, I'll add some extra plots in the thesis to show why the kernel reuse between transforms works.
That being said, the results I show use my custom spreader implementation which is slower than the one in FINUFFT. The actual speedup factor obtained when pushing the stack dimension into the innermost spread/interp loop in a full-C++ implementation should be tested.

This is something I'm interested to investigate in 2025 since having a general-purpose spreader is useful in many imaging problems.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unsorted thoughts about type 3 transforms #600

{{title}}

Replies: 3 comments 1 reply

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Unsorted thoughts about type 3 transforms #600

mreineck Jan 3, 2025 Maintainer

Replies: 3 comments · 1 reply

mreineck Jan 3, 2025 Maintainer Author

ahbarnett Jan 6, 2025 Maintainer

mreineck Jan 7, 2025 Maintainer Author

SepandKashani Jan 7, 2025

mreineck
Jan 3, 2025
Maintainer

Replies: 3 comments 1 reply

mreineck
Jan 3, 2025
Maintainer Author

ahbarnett
Jan 6, 2025
Maintainer

mreineck
Jan 7, 2025
Maintainer Author