Change UnitVector transform to use normalization #138


Closed
sethaxen wants to merge 12 commits

Conversation

sethaxen (Contributor)

This fixes #66 and relates to #86.

Length-1 unit vectors are no longer supported. While they are technically possible, they would likely suffer from numerical issues, since the transform is undefined at x = 0, and for a Markov chain to travel from y = [-1] to y = [1], it would have to leap over the origin, which is only possible at all due to discretization and will often fail.

Note that the current implementation is bare-bones and has the limitation that it's not appropriate for optimization when the target distribution may include a unit vector whose distribution is uniform on the sphere. I think it would be useful to accept a logpr function corresponding to log(r^(1-n) p(r)), where p(r) is an (unnormalized) prior on the discarded norm r. As noted in #86 (comment) and https://discourse.mc-stan.org/t/a-better-unit-vector/26989/30, this would support providing an alternative prior in these rare cases, one whose logpr is not maximized at r = 0. If this is welcome, I'll add it to the PR.

tpapp (Owner) left a comment

Thanks! The docs also need to be updated, but I can take care of that myself.

If you have the time, I would appreciate an explanation of why we need to correct by the Chi distribution.

```julia
lj_transform = logdet(J' * J) / 2
# un-normalized Chi distribution prior on r
lp_prior = (K - 1) * log(r) - r^2 / 2
```
tpapp (Owner):

Can you please explain why this correction is needed?

```julia
x = randn(dimension(tt))
y = transform(tt, x)
x′ = inverse(tt, y)
@test x ≈ x′
m = sum(2:N)
@test x[1:end-m] ≈ x′[1:end-m]
```
tpapp (Owner):

We might as well remove these two lines; the one below is sufficient.

tpapp (Owner) commented Apr 29, 2025

Apologies if the question is stupid, but could this be a bijection by adding the norm? I.e., x ∈ ℝⁿ would be mapped to the (Named)Tuple (y, r), where r = norm(x, 2) and y = x ./ r.

The user would need to put a prior on r explicitly.

sethaxen (Contributor, Author)

> If you have the time, I would appreciate an explanation of why we need to correct by the Chi distribution.

> Apologies if the question is stupid, but could this be a bijection by adding the norm? I.e., x ∈ ℝⁿ would be mapped to the (Named)Tuple (y, r), where r = norm(x, 2) and y = x ./ r.

Yes, that's exactly what's implicitly going on here. There are two equivalent perspectives we can take, but I'll focus on the one I find most intuitive. To make the transform bijective almost everywhere, we expand the parameter space so the transform is x ↦ (y = normalize(x), r = norm(x)). The only point where this is non-bijective is x = 0. Its Jacobian is non-square, but we have some tricks to compute the Jacobian determinant: as noted in the test comments, we can use the generalized Jacobian determinant that appears in the area formula, logJ = logdet(J'J)/2. The Jacobian here is J = [(I - y*y')/r; y'], so logdet(J'J)/2 == (1 - n) * log(r).
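To make this concrete, here's a small numerical check of that identity (a sketch; `polar_jacobian` is just an illustrative helper, not part of this package):

```julia
using LinearAlgebra

# Jacobian of x ↦ (y = x / norm(x), r = norm(x)), stacked as an (n+1)×n matrix.
function polar_jacobian(x)
    r = norm(x)
    y = x / r
    [(I - y * y') / r; y']
end

x = randn(5)
J = polar_jacobian(x)
# Generalized log-Jacobian-determinant from the area formula:
logdet(J' * J) / 2 ≈ (1 - length(x)) * log(norm(x))  # true
```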

After applying the transform we have an extra parameter r, but if we don't specify a prior, then by default r follows the uniform measure on the positive reals, which has infinite mass. As a result, the distribution we would get on x would also have infinite mass, and MCMC will fail. So we need to place a prior on r. The Chi prior used here is chosen because if we have n IID standard normal parameters x, then y is uniform on the sphere and r has this Chi distribution; so if the user's target distribution is uniform on the sphere, the distribution on x is the nice standard normal.
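For reference, the unnormalized form used in the review snippet above matches `Distributions.Chi` up to its normalizing constant (a sketch, assuming Distributions and SpecialFunctions are loaded):

```julia
using Distributions, SpecialFunctions

n, r = 5, 1.3
lp_prior = (n - 1) * log(r) - r^2 / 2             # unnormalized Chi(n) log-density
lognorm = (n / 2 - 1) * log(2) + loggamma(n / 2)  # its log normalizing constant
lp_prior - lognorm ≈ logpdf(Chi(n), r)            # true
```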

That choice of prior on r is fine in probably 99% of cases. It only really becomes an issue 1) when optimizing a uniform spherical distribution (highly unlikely), or 2) when the target distribution on y is highly concentrated. For n = 2, in the worst case, the direction of x is almost completely fixed but the length is not, so the distribution of x is almost confined to a linear subspace, curvature is high, and HMC can diverge. In these cases, choosing a prior on r that is very tight around r = 1 can drastically improve sampling. In practice I've only seen this be an issue when n = 2.

> The user would need to put a prior on r explicitly.

Yes, that's right. Cases like this were my motivation for the syntax product_distribution((y=VonMisesFisher(...), r=Chi(n))) in Distributions. But the downside is that it requires the user to 1) know that they need to pick a prior, 2) know why there's a floating parameter they don't care about, and 3) know what a good choice of prior is. To me this seems burdensome when the default Chi prior is fine for almost every user.

A practical issue with the user picking a prior themselves is that logJ has a (1-n)*log(r) term, which will be Inf at r = 0. If you then add the log-density of the Chi prior, it has an (n-1)*log(r) term, which will be -Inf. The two terms should cancel, but by computing logJ and the prior separately, you'll get a NaN. So having the user explicitly provide a prior for r would not solve the problem of optimization failing for uniform distributions on the sphere. That's why in the OP I proposed logpr(r) = log(r^(1-n) p(r)). Again, this failure mode is probably rare.
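A minimal illustration of the failure mode, and of how the fused logpr avoids it (with logpr as defined above, applied to the unnormalized Chi prior):

```julia
n, r = 3, 0.0
logJ = (1 - n) * log(r)              # +Inf
lp_chi = (n - 1) * log(r) - r^2 / 2  # -Inf
logJ + lp_chi                        # NaN, though the log(r) terms cancel symbolically

# Fusing them as logpr(r) = log(r^(1-n) * p(r)) with the unnormalized Chi prior
# p(r) ∝ r^(n-1) * exp(-r^2 / 2) leaves only the finite part:
logpr(r) = -r^2 / 2
logpr(0.0)                           # 0.0
```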

sethaxen (Contributor, Author)

One more comment: if the approach in this PR is taken, then not every transform here will be bijective/invertible, and the log-density correction is not just a logdetjac, so some documentation would need to be updated. It would still be the case that every transform should be right-invertible.

tpapp (Owner) commented Apr 29, 2025

Thanks for the intuitive explanation; this now makes sense.

> so logdet(J'J)/2 == (1 - n) * log(r)

Isn't there an extra -r^2/2 term, like in the code?

> But the downside is that it requires the user to 1) know that they need to pick a prior, 2) know why there's a floating parameter they don't care about, and 3) know what a good choice of prior is. To me this seems burdensome when the default Chi prior is fine for almost every user.

Yes, I see your point. Also, in the future, we might consider adding similar mappings where an unconstrained x vector is transformed to a (y, z) tuple, where z may require similar regularization. Instead of special-casing this, I would like to make it generic.

> then not every transform here will be bijective/invertible

But returning r too would make it (almost always) invertible, which is why I am considering it. I am really reluctant to give up invertibility.

> The two terms should cancel, but by computing the logJ and the prior separately, you'll get a NaN.

I am thinking about the following API:

  1. the UnitVector transformation maps x to (y, r) as described above, returning a NamedTuple. This is documented.

  2. it takes an optional argument for the prior on r, which defaults to the Chi distribution. The user can modify this. We document that this only affects the logjac and explain the rationale.

  3. the log Jacobian determinant is adjusted by this; we dispatch on Chi in an internal method that avoids the NaN issue. If the user prefers another prior, they should handle it themselves; there will be a default fallback.

Comments welcome.
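A rough sketch of what this could look like (the struct layout, `log_r_correction`, and the method names here are hypothetical, not the package's actual implementation):

```julia
using LinearAlgebra, Distributions

struct UnitVector{D}
    n::Int
    prior::D                                   # prior on the norm r
end
UnitVector(n::Int) = UnitVector(n, Chi(n))     # Chi(n) is the default

# Chi special case (assuming the default Chi(n)): the (n - 1) * log(r) prior
# term cancels the (1 - n) * log(r) Jacobian term analytically, so no NaN at
# r = 0 (up to an additive constant, which MCMC ignores).
log_r_correction(t::UnitVector{<:Chi}, r) = -r^2 / 2
# Fallback for user-supplied priors (may still produce NaN at r = 0):
log_r_correction(t::UnitVector, r) = (1 - t.n) * log(r) + logpdf(t.prior, r)

function transform_and_logjac(t::UnitVector, x)
    r = norm(x)
    (y = x ./ r, r = r), log_r_correction(t, r)
end
```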

sethaxen (Contributor, Author) commented Apr 29, 2025

> Isn't there an extra -r^2/2 term, like in the code?

No, the -r^2/2 term comes from the Chi prior.

> But returning r too would make it (almost always) invertible, which is why I am considering it. I am really reluctant to give up invertibility.

Yeah, that makes sense. I also like its explicitness.

> • it takes an optional argument for the prior on r, which defaults to the Chi distribution. The user can modify this. We document that this only affects the logjac and explain the rationale.
>
> • the log Jacobian determinant is adjusted by this; we dispatch on Chi in an internal method that avoids the NaN issue. If the user prefers another prior, they should handle it themselves; there will be a default fallback.

A simpler alternative would be a boolean flag, defaulting to true, that indicates whether the Chi prior is applied. Then we document the prior and note that the user can set the flag to false, but that they must then explicitly increment the log-density with their own prior (i.e., logJ is really just the logdetjac). This doesn't avoid the NaN issue, but I do think that issue is rare, and it's also avoidable by choosing a suitable prior on r.
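In code, this alternative might look something like the following (hypothetical names, for illustration only):

```julia
# `apply_chi_prior` is a hypothetical flag name for the alternative above.
function log_correction(n, r; apply_chi_prior::Bool = true)
    if apply_chi_prior
        -r^2 / 2              # logdetjac fused with the (unnormalized) Chi(n) prior
    else
        (1 - n) * log(r)      # bare logdetjac; the user must add their own prior on r
    end
end
```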

sethaxen (Contributor, Author)

Suggestion: if we also return r, rename the transform to UnitVectorNorm. An alternative name could be PolarVector, since the polar decomposition of a vector interpreted as an n×1 matrix would return the unit vector and a 1×1 matrix containing the norm.

tpapp (Owner) commented May 2, 2025

@sethaxen, I decided to start over and copied your code over to #139.

tpapp closed this May 2, 2025.

Successfully merging this pull request may close these issues:

• UnitVector only transforms to positive hemisphere