-
Notifications
You must be signed in to change notification settings - Fork 87
Add the capability to do adjoint transforms #633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Just to be clear: at this point, there is no interface support for the new functioality. I just want to show which kind of impact the new feature has on the existing implementation. |
I've added the C and Python interfaces, as well as basic Python unit tests. |
Test failures seem to be "near misses" in adjoint type 3 transforms. No idea why that direction should be less accurate, and why it only happens in some of the tests. |
…s. Perhaps 1e-6 is just a bit too close to the machine epsilon for single precision
I think this is ready for "technical" review. If there is agreement that the change is desirable, I can try to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few explanatory comments
python/finufft/examples/guru1d1f.py
Outdated
@@ -20,7 +20,7 @@ | |||
strt = time.time() | |||
|
|||
#plan | |||
plan = fp.Plan(1,(N,),dtype='single') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unrelated to the PR, but without it, CI simply fails. I don't know why this hasn't caused issues so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems because we removed the conversion for real dtypes in https://github.com/flatironinstitute/finufft/pull/606/files
python/finufft/examples/guru2d1f.py
Outdated
@@ -34,7 +34,7 @@ | |||
|
|||
# instantiate the plan (note n_trans must be set here), also setting tolerance: | |||
t0 = time.time() | |||
plan = finufft.Plan(nufft_type, (N1, N2), eps=1e-4, n_trans=K, dtype='float32') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unrelated to the PR, but without it, CI simply fails. I don't know why this hasn't caused issues so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this, it seems all the *f.py
examples(in python/finufft/examples) should change according to PR606? otherwise the check is_single_dtype(dtype)
finufft/python/finufft/finufft/_interfaces.py
Line 106 in d2e0ff7
is_single = is_single_dtype(dtype) |
@@ -86,6 +120,19 @@ def test_finufft3_plan(dtype, dim, n_source_pts, n_target_pts, output_arg): | |||
|
|||
utils.verify_type3(source_pts, source_coefs, target_pts, target_coefs, 1e-6) | |||
|
|||
# test adjoint type 3 | |||
plan = Plan(3, dim, dtype=dtype, isign=-1, eps=1e-5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm increasing eps
from 1e-6 to 1e-5 here, because I get occasional failures with single precision otherwise. Given that 1e-6 is uncomfortably close to machine epsilon, I'm not too worried about this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. type 3 errors are usually 2-3x bigger than type 1 or 2 at the same tolerance.
@@ -154,7 +154,7 @@ def verify_type1(pts, coefs, shape, sig_est, tol): | |||
|
|||
type1_rel_err = np.linalg.norm(fk_target - fk_est) / np.linalg.norm(fk_target) | |||
|
|||
assert type1_rel_err < 25 * tol |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switching from assert
to np.testing.assert_allclose
here, because the latter will provide more information in case of failure, which speeds up debugging a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea
} | ||
} | ||
} | ||
#endif | ||
#else | ||
p->fftPlan->execute(); // if thisBatchSize<batchSize it wastes some flops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs discussion: how do we want to deal with this situation? The trick used here is nice and simple, but it will make the adjoint call slower than the forward one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't understand the trick or why it's slower - sorry for my stupidity. I'd also like to understand how FFTW is handled. Since your change to the fft.h adds the data pointer, can that work with FFTW too (and allow us to allocate-at-execute stage) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trick is the equivalence:
fft_adjoint(x) = conj(fft(conj(x)))
(since we do not have a plan for fft_adjoint
, we just mess around with the sign of the imaginary part of input and output to get te same effect as if we had switched isign
).
The additional complex conjugate of input and output is of course not completely free, but it might be sub-dominant compared to the cost of the FFT itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yes, we do allocate the temporary at the execute stage in this PR, also for FFTW.
I just fixed py examples in 19a24c6
…On Mon, Feb 24, 2025 at 9:30 AM Libin Lu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In python/finufft/examples/guru2d1f.py
<#633 (comment)>
:
> @@ -34,7 +34,7 @@
# instantiate the plan (note n_trans must be set here), also setting tolerance:
t0 = time.time()
-plan = finufft.Plan(nufft_type, (N1, N2), eps=1e-4, n_trans=K, dtype='float32')
Thanks for catching this, it seems all the *f.py examples(in
python/finufft/examples) should change according to PR606? otherwise the
check is_single_dtype(dtype)
https://github.com/flatironinstitute/finufft/blob/d2e0ff7cd3a7bd66384d961d647060f497fde13c/python/finufft/finufft/_interfaces.py#L106
will fail.
—
Reply to this email directly, view it on GitHub
<#633 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACNZRSUEOSDPVX33Q6B3WTT2RMUHHAVCNFSM6AAAAABXLHSEQ2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDMMZXGI4TSMZSGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
--
*-------------------------------------------------------------------~^`^~._.~'
|\ Alex Barnett Center for Computational Mathematics, Flatiron Institute
| \ http://users.flatironinstitute.org/~ahb 646-876-5942
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Martin,
[This is supposed to go as a PR review but appeared as a comment]
Your work here is great. As usual you are able to slot something in with minimal disruption. It will certainly allow power users to slip in the adjoint when needed. I don't understand how the FFTW part works - there seem to be no changes which would allow the fftw_execute to change type (or handle the new data ptr in the fft.h interface) - or did I miss it?
However, I am thinking about the interface from the new user perspective, and think it is getting more confusing than needs be. In your PR a user plans a type 1, then can do its adjoint (type 2 with flipped isign) freely. Users will want to know if that's the same a planning a type 2 and doing a forward - it's not due to isign. (Extra confusion: in Europe t1 is called "adjoint" and t2 "forward" NUDFT.) I see the isign flip between going from 1 to 2 vs 1 to its adj being a major source of confusion. The motivation for your PR was users wanting a single plan for t1 and t2, or for t1 and its adj. I think we should bounce around ideas for a new interface that fits the need. An idea to reduce confusion is to have a "plan 1 and 2", and then execute_1(c,f,isign) execute_2(c,f,isign). No confusion. 1+ is adj of 2-, and 1- adj of 2+. Question is can we do it in a neat way that preserves the existing guru interface (add a type=12 to the plan?) since legacy users need to still be able to plan a type 2 explicitly and have plan execute(c,f) do that t2.
Other questions are: 4 if FFTW plan can handle the isign swtiching?
This would all be for the plan (guru) interface. It would expand that interface from 4 to six commands. Under the hood (as you do in this PR with execute_internal()), execute() could call execute_1 or _2.
I have lots of deadlines in the next month so will have to wait a bit, but I'd love to discuss this and hash out the best interface, since I don't want a confusing interface to become locked in...
Best, Alex
} | ||
} | ||
} | ||
#endif | ||
#else | ||
p->fftPlan->execute(); // if thisBatchSize<batchSize it wastes some flops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually don't understand the trick or why it's slower - sorry for my stupidity. I'd also like to understand how FFTW is handled. Since your change to the fft.h adds the data pointer, can that work with FFTW too (and allow us to allocate-at-execute stage) ?
@@ -424,72 +425,76 @@ static void deconvolveshuffle3d(int dir, T prefac, std::vector<T> &ker1, | |||
// --------- batch helper functions for t1,2 exec: --------------------------- | |||
|
|||
template<typename T> | |||
static int spreadinterpSortedBatch(int batchSize, FINUFFT_PLAN_T<T> *p, | |||
std::complex<T> *fwBatch, std::complex<T> *cBatch) | |||
static int spreadinterpSortedBatch(int batchSize, const FINUFFT_PLAN_T<T> &p, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THis is just part of your ongoing cleanup, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partly yes. But the change to const
also ensures that we don't do things like
innerT2plan->ntrans = thisBatchSize;
any more (which we had in execute()
before). This sort of "messing with the state of another object" can cause a lot of trouble once plans are perhaps invoked in parallel ... and that's what I wanted to make possible with this patch.
The switch from pointer to reference is cosmetics, and I should perhaps have delayed that until later ... sorry,
This is mostly a loop calling deconvolveshuffle?d for the needed dim, batchSize | ||
times. | ||
Barnett 5/21/20, simplified from Malleo 2019 (eg t3 logic won't be in here) | ||
*/ | ||
{ | ||
// since deconvolveshuffle?d are single-thread, omp par seems to help here... | ||
int dir = p.spopts.spread_direction; | ||
if (adjoint) dir = 3 - dir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there's the meat - please add some description of what this obscure subtraction does :) (I know, but most readers will not...) Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are totally right, I'll do that!
#endif | ||
} | ||
template void do_fft<float>(FINUFFT_PLAN_T<float> *p); | ||
template void do_fft<double>(FINUFFT_PLAN_T<double> *p); | ||
template void do_fft<float>(const FINUFFT_PLAN_T<float> &p, std::complex<float> *fwBatch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an FFTW version of passing the data ptr like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* See ../docs/cguru.doc for current documentation. | ||
|
||
For given (stack of) weights cj or coefficients fk, performs NUFFTs with | ||
existing (sorted) NU pts and existing plan. | ||
For type 1 and 3: cj is input, fk is output. | ||
For type 2: fk is input, cj is output. | ||
For adjoint == false: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these comments are super helpful
- 0 < ntrans_actual <= batchSize: instead of doing ntrans transforms, | ||
perform only ntrans_actual | ||
|
||
scratch_size, aligned_scratch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something we should do anyway with t3? (regardless of adjoint?)
|
||
Additional parameters introducd by MR, 02/2025 | ||
|
||
adjoint: if false, behave as before; if true, compute the adjoint of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my overall PR review discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very happy to adjust the interface in any way that makes it easier for finufft users to get used to. This PR is (at the moment) mostly a demonstration of how things can be done. How they are presented is always up to the Finufft team.
When a new interface is perhaps on the horizon, and there are, as @ahbarnett points out, different conventions concerning what is type 1 and type 2, perhaps using descriptive names may be a way out, like |
This is a first outline how I propose to add adjoint transforms; it is mainly meant as a basis for discussions and measurements. The computation of the actual adjoint is completely untested yet, but the standard functionality should still be OK, as far as my tests show.
@ahbarnett, @DiamonDinoia please let me know your thoughts on this one!