Replies: 3 comments 1 reply
-
Perhaps as a first step: would things break if the final correction step was moved in front of the interpolation operation, i.e. if we corrected the grid from which we interpolate the final result, as opposed to the final points themselves? I'm happy to try this, but if anyone can immediately see that this cannot work from a theoretical point of view, please tell me :-) |
Beta Was this translation helpful? Give feedback.
-
Happy New Year Martin, Sorry, I have been overwhelmed by other projects. Thinking about accelerating type 3 would be great - it has many applications (eg powder diffraction, SAXS). The squaring of the oversampling factor sigma I do not know a way around. It arises because of the point sampling on both input and output sides (if smooth functions were the I/O, no sigma^2 would be needed). It would certainly require "breaking open" the black-box type 2 comprising step 2. Note the four references I give to this t3 alg in the FINUFFT paper. Note also Rmk 4 from my paper: the absolute minimum n (outer FFT size) would be (2/pi)SX, using the Nyquist criterion for input points at \pm X, and output at \pm S. But you cannot approach this criterion without the kernel getting arbitrarily wide (approaching a sinc). And the inner t2 needs to be accurate, hence also sigma needed there. To me the "fairly expensive" d cosines per output point is linear and trivially parallelizable, so it's not clear pointwise-scaling the entire regular grid instead is a win (it touches more DRAM) - but I imagine it would work with about the same accuracy, so go ahead and try - your intuition is good! Clearly the t3 algorithm is not optimal in clustered settings, as you know from my Rmk 5 in FINUFFT paper - and it has been exploited in recent nice work of Sepand Kashani https://arxiv.org/abs/2306.06007 which shrinks FFT sizes at the expense of many-vector transforms (which he found a lovely way to speed up - see gdoc notes). A multilevel butterfly scheme for t3 might win in the end, avoiding FFTs altogether, certainly for high-freq case (XS)^d >> N or M. This would extend work of Lexing Ying https://arxiv.org/abs/0801.1524 which I should have cited in the FINUFFT paper. But that's a PhD project :) Anyway, I hope that provides food for thought... Best, Alex |
Beta Was this translation helpful? Give feedback.
-
Thank you very much for your thoughts, Alex! I will have a closer look at the other papers covering type 3, and also at Sepand's PhD thesis as soon as possible, which shld mke many aspects of the transform clearer to me. Concerning correction of NU output points vs. possibly correcting the intermediate grid: yes, the latter touches potentially more RAM, but this step (if it happens to actually work!) can be done together with other steps (e.g. copying the FFT result from the oversampled array to the output array), so this shouldn't be an issue. I am more concerned with the actual calculation time for the correction ... computing the correction function for a single NU point has a comparable cost to spreading/interpolating this point, so it feels quite significant. But of course this only eeds to be calculated once per plan, not per transform, so it is probably not a big deal in many situations. It might be possible to approximate the correction function with polynomials, which could be nice ... I'll try that as soon as I have enough spare time. I'm also looking forward to make use of Sepand's divide-and-conquer approach, which should help clustered type 3 transforms a lot. But I'm less confident that the additional trick mentioned in the gdoc notes (re-using precomputed kernel values) will be advantageous within a library that has very efficient kernel computation, like finufft has; see also SepandKashani/fourier_toolkit#1. |
Beta Was this translation helpful? Give feedback.
-
This is just a loose collection of things I'd like to understand better about type 3 transforms, and which might help in improving them in the future ...
As far as I understand, type 1 and 2 transforms are adjoint to each other (if
iflag
is set properly in both directions).From looking at the mathematical definition, type 3 should be adjoint to itself; that is, a single function can carry out forward and adjoint transforms (again, when
iflag
is set properly).Given this, it feels surprising to me that the internal structure of the implementation of type 3 transforms is so "asymmetric":
Is there a way to avoid the twofold oversampling? And can we replace the expensive point-by-point correction at the end with something cheaper?
The symmetry in the operation itself feels like a strong hint to me that something can be done here, but I have no concrete idea yet ...
Beta Was this translation helpful? Give feedback.
All reactions