[hal metal] ray tracing acceleration structures #7660

Closed

Conversation

Lichtso
Contributor

@Lichtso Lichtso commented May 2, 2025

Connections
Fixes: #7402

Description
Implements the missing ray tracing acceleration structures in the HAL metal backend.

Testing
The examples ray_scene, ray_shadows, ray_cube_compute, ray_cube_fragment and ray_traced_triangle all work.
That is, they work when invoked via cargo run --bin wgpu-examples ray_traced_triangle, but not via cargo xtask test ray_traced_triangle. The current CI runner is too old to catch this because it does not support hardware ray tracing.

Squash or Rebase?
Squash

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@Lichtso Lichtso requested a review from a team as a code owner May 2, 2025 22:41
@Lichtso Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from e30b663 to f3830cb Compare May 2, 2025 22:52
Collaborator

@Vecvec Vecvec left a comment

Good job! Glad there didn't need to be any wgpu-core changes. Largely looks good, but I'm not extremely knowledgeable about metal. One question / comment, but haven't yet checked everything with spec.

Collaborator

@Vecvec Vecvec left a comment

I've finished checking against the Metal spec. It seems this requires macOS 13.0+, not 11.0+, because some more recent functions are used. Confusingly, the vertex buffer field suggests that the only supported format is f32x3, so I'm not sure what descriptor.set_vertex_format does.

@@ -890,6 +890,11 @@ impl super::PrivateCapabilities {
&& (device.supports_family(MTLGPUFamily::Apple7)
|| device.supports_family(MTLGPUFamily::Mac2)),
supports_shared_event: version.at_least((10, 14), (12, 0), os_is_mac),
supports_raytracing: if version.at_least((11, 0), (14, 0), os_is_mac) {
device.supports_raytracing()
Collaborator

I think raytracing support needs to check supportsRaytracingFromRender, because ray queries are supported in fragment shaders (requires macOS 12.0+).

Contributor Author

That function is not exposed in the Rust metal crate. But I did bump the min required versions to macOS 13 and iOS 16.

Collaborator

If the metal crate is still taking PRs (I don't know how deprecated it is), it would probably be a good idea to add this (and the other later ones).

@Lichtso Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch from f3830cb to 234e75b Compare May 3, 2025 08:33
}

unsafe fn destroy_acceleration_structure(
&self,
_acceleration_structure: super::AccelerationStructure,
) {
unimplemented!()
// self.counters.acceleration_structures.sub(1);
Contributor Author

Is there a reason not to have HalCounters::acceleration_structures?

Collaborator

Looking back at the history I couldn't find a reason, but it's possible it's buried somewhere.

for descriptor in descriptors {
let acceleration_structure_descriptor =
conv::map_acceleration_structure_descriptor(descriptor.entries);
/* The Rust metal crate does not expose metal::MTLAccelerationStructureUsage yet
Contributor Author

Again, not exposed in the Rust metal crate.

@@ -35,6 +35,7 @@ var acc_struct: acceleration_structure;

struct PushConstants {
light: vec3<f32>,
padding: f32,
Contributor Author

@Lichtso Lichtso May 3, 2025

It seems that metal always sends at least 16 bytes for push constants, even if we only pass in 12 bytes. And then the shader validation complains that the receiver here only expects 12 bytes.
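
A minimal sketch of the size mismatch described above, using a Rust-side mirror of the shader struct. The PushConstants mirror, align_up and push_constants_size are illustrative assumptions and not code from this PR; the 16-byte granularity is taken from the observation in the comment.

```rust
use std::mem::size_of;

/// Hypothetical CPU-side mirror of the WGSL `PushConstants` struct.
/// Without the `padding` field the Rust struct would be 12 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct PushConstants {
    light: [f32; 3],
    padding: f32,
}

/// Rounds `size` up to the next multiple of `alignment` (must be non-zero).
const fn align_up(size: usize, alignment: usize) -> usize {
    (size + alignment - 1) / alignment * alignment
}

/// Number of bytes uploaded for the padded struct.
fn push_constants_size() -> usize {
    size_of::<PushConstants>()
}
```

With the explicit padding field, the size the application passes and the rounded-up size that Metal appears to send agree at 16 bytes.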

@Lichtso Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from 5f1c464 to 2a6d9b6 Compare May 3, 2025 11:13
@Lichtso
Contributor Author

Lichtso commented May 3, 2025

Glad there didn't need to be any wgpu-core changes

Almost, had to remove the Option<> around the buffers and always pass the dummy zero buffer when computing the size of the acceleration structures and their scratch buffers because Metal does not like nil.

I can split those first four commits into a separate PR if that helps with the review.

@Vecvec
Collaborator

Vecvec commented May 4, 2025

I just remembered that structures have minimum versions, and it seems MTLIndirectAccelerationStructureInstanceDescriptor required MacOS 14.0+ (probably should have checked that earlier...).

@Lichtso Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from 38b3de7 to 9442511 Compare May 4, 2025 11:49
@Lichtso
Contributor Author

Lichtso commented May 4, 2025

I just remembered that structures have minimum versions, and it seems MTLIndirectAccelerationStructureInstanceDescriptor required MacOS 14.0+ (probably should have checked that earlier...).

Bumped the min required version even further up.
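
A rough sketch of how such a version gate behaves, modeled on the version.at_least((mac), (ios), os_is_mac) call in the diff above. OsVersion and supports_raytracing are stand-ins written for illustration and are not the wgpu-hal types. The macOS 14.0 minimum comes from this thread; the iOS 17.0 minimum is an assumption.

```rust
/// Stand-in for an OS version; not the type used in wgpu-hal.
#[derive(Clone, Copy, Debug)]
struct OsVersion {
    major: u32,
    minor: u32,
}

impl OsVersion {
    /// True if this version meets the minimum for the current platform.
    fn at_least(self, mac: (u32, u32), ios: (u32, u32), is_mac: bool) -> bool {
        let required = if is_mac { mac } else { ios };
        (self.major, self.minor) >= required
    }
}

/// Ray tracing is reported only if the OS is new enough AND the device
/// itself reports support (`device_reports_raytracing` models the result
/// of a call such as `device.supports_raytracing()`).
fn supports_raytracing(
    version: OsVersion,
    is_mac: bool,
    device_reports_raytracing: bool,
) -> bool {
    // macOS 14.0 is from the discussion; iOS 17.0 is an assumption.
    version.at_least((14, 0), (17, 0), is_mac) && device_reports_raytracing
}
```

Checking the OS version first means the device query is never relied upon on systems where the newer structures are unavailable.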

@Lichtso
Contributor Author

Lichtso commented May 4, 2025

I also managed to reduce the issue with the acceleration structure not intersecting any rays to a perfect reproducer, and it is wild:

See the last commit "Bug reproducer", which modifies the ray_cube_fragment example to generate two BLASes: One with 152 triangles and one with 153 triangles.

With Metal on macOS the instances of the BLAS with 152 triangles (16344 bytes acceleration_structure_size) work as expected, but the ones with 153 triangles (16472 bytes acceleration_structure_size) suddenly stop intersecting rays after roughly 1.5 seconds, no matter how many frames were rendered until then. 0x4000 = 2^14 = 16384 might be some special boundary being crossed. It also keeps happening even if I stop calling build_acceleration_structure() after the initial setup. Using MTLAccelerationStructureInstanceDescriptor or MTLIndirectAccelerationStructureInstanceDescriptor is also irrelevant. The same goes for calling encoder.use_resource_at(blas.as_native(), use_info.uses, use_info.stages) or not.

This also breaks Vulkan on Linux with a SIGSEGV upon Queue::submit: https://github.com/gfx-rs/wgpu/actions/runs/14820911901/job/41607697292?pr=7660

Using an example from metal-rs without wgpu does not reproduce this bug. It seems we are either lacking some validation step or are doing something wrong with our handling of acceleration structures in general.

@Vecvec: What testing hardware do you have available? Can you maybe see why Vulkan is failing this too?

@Lichtso Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch from 9442511 to 90082ad Compare May 4, 2025 12:40
@Vecvec
Collaborator

Vecvec commented May 4, 2025

@Vecvec: What testing hardware do you have available? Can you maybe see why Vulkan is failing this too?

I've got a couple of raytracing supported machines (plus llvmpipe which I will also be testing on). I'll have a look and see if I can get any ideas of what the issue might be.

@Vecvec
Collaborator

Vecvec commented May 4, 2025

Hits a divide by zero on Microsoft Basic Render Driver (though it doesn't seem to be related to the memory used, and only on one of my computers). Can't get it to fail on the real GPUs yet. Was able to reproduce the llvmpipe seg fault (edit: Don't think it's the same problem as the one here), will continue testing.

@Lichtso
Contributor Author

Lichtso commented May 4, 2025

divide by zero

Might be that it tries to normalize a zero-length vector. The modified example does simply duplicate triangles so that could cause some vectors to become zero.

I narrowed the Metal issue down further and it is indeed caused by AccelerationStructureBuildSizes::acceleration_structure_size being greater than or equal to 0x4000. For example, if I only change Device::create_acceleration_structure() to call device.new_acceleration_structure_with_size(descriptor.size.max(0x4000)) (the latest possible point, which ensures the effect is confined to the Metal backend), then all BLAS instances work fine at first but disappear after 1.5 seconds. Reading the Metal docs, it appears that 16384 (0x4000) is indeed used as an API limit for other things, like the mesh shader output buffer. So maybe there is a bug in the Metal driver, because I cannot imagine that the limit for acceleration structure sizes is supposed to be so low.

Edit: Officially the limits are way higher, see https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/extendedlimits.
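
The two sizes reported earlier in this thread can be checked against the suspected threshold with a few lines of arithmetic. This is only a sketch of the observation; the constant and function names are made up for illustration, and the threshold itself is a hypothesis from the discussion, not a documented Metal limit.

```rust
/// Threshold suspected in the comment above: 0x4000 = 2^14 = 16384 bytes.
const SUSPECTED_THRESHOLD: u64 = 0x4000;

/// Acceleration structure sizes reported earlier in this thread.
const SIZE_152_TRIANGLES: u64 = 16344; // works as expected
const SIZE_153_TRIANGLES: u64 = 16472; // stops intersecting rays

/// True if a size falls in the range where the failure was observed.
fn reaches_threshold(acceleration_structure_size: u64) -> bool {
    acceleration_structure_size >= SUSPECTED_THRESHOLD
}

/// Mirrors the `descriptor.size.max(0x4000)` experiment: every size is
/// forced up to at least the suspected threshold.
fn forced_size(descriptor_size: u64) -> u64 {
    descriptor_size.max(SUSPECTED_THRESHOLD)
}
```

The working BLAS sits 40 bytes below the threshold and the failing one 88 bytes above it, which is consistent with the forced-size experiment making the small BLAS fail as well.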

@Vecvec
Collaborator

Vecvec commented May 4, 2025

Most other resources are created with an auto release pool around them, is it possible that that is fixing this issue somehow?

@Lichtso
Contributor Author

Lichtso commented May 5, 2025

Most other resources are created with an auto release pool around them, is it possible that that is fixing this issue somehow?

Added one in Device::create_acceleration_structure() but unfortunately that was not it either. There must be some other conditions to trigger it because the metal-rs examples don't and the wgpu examples only do when called via cargo xtask test.

I would say we try to land this PR and then open an issue for it to solve that separately.

BTW, I noticed the CI runner "Test Mac aarch64" job is not failing. Probably the test runner is too old to support hardware raytracing and skips the relevant tests.

@Lichtso Lichtso requested a review from Vecvec May 5, 2025 07:50
@Vecvec
Collaborator

Vecvec commented May 5, 2025

Added one in Device::create_acceleration_structure() but unfortunately that was not it either

That's annoying, I wonder what it could be

I would say we try to land this PR and then open an issue for it to solve that separately.

Yes, though it could be some time before it lands.

I noticed the CI runner "Test Mac aarch64" job is not failing. Probably the test runner is too old to support hardware raytracing and skips the relevant tests

I checked and it does skip.

Collaborator

@Vecvec Vecvec left a comment

Excluding the things that aren't exposed by the metal crate this looks good to me.

.flags
.contains(wgt::AccelerationStructureGeometryFlags::OPAQUE),
);
// wgt::AccelerationStructureGeometryFlags::NO_DUPLICATE_ANY_HIT_INVOCATION
Collaborator

It feels like this should set allowDuplicateIntersectionFunctionInvocation if NO_DUPLICATE_ANY_HIT_INVOCATION is not set but metal-rs doesn't support this.

Contributor Author

Added it to: gfx-rs/metal-rs#361
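
The flag mapping suggested in this thread could look roughly like the sketch below. GeometryFlags is a hand-rolled stand-in for wgt::AccelerationStructureGeometryFlags, and MetalGeometryOptions is a hypothetical struct whose fields model the two Metal descriptor properties; neither is real wgpu or metal-rs API.

```rust
/// Stand-in for `wgt::AccelerationStructureGeometryFlags`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct GeometryFlags(u8);

impl GeometryFlags {
    const OPAQUE: Self = Self(1 << 0);
    const NO_DUPLICATE_ANY_HIT_INVOCATION: Self = Self(1 << 1);

    fn empty() -> Self {
        Self(0)
    }

    fn union(self, other: Self) -> Self {
        Self(self.0 | other.0)
    }

    fn contains(self, other: Self) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Hypothetical holder for the two properties on the Metal geometry descriptor.
#[derive(Debug, PartialEq, Eq)]
struct MetalGeometryOptions {
    opaque: bool,
    allow_duplicate_intersection_function_invocation: bool,
}

/// Duplicates are allowed exactly when the "no duplicate" flag is absent.
fn map_geometry_flags(flags: GeometryFlags) -> MetalGeometryOptions {
    MetalGeometryOptions {
        opaque: flags.contains(GeometryFlags::OPAQUE),
        allow_duplicate_intersection_function_invocation: !flags
            .contains(GeometryFlags::NO_DUPLICATE_ANY_HIT_INVOCATION),
    }
}
```

The inversion is the point of the sketch: the wgpu flag forbids duplicates, while the Metal property permits them.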

@jimblandy jimblandy requested a review from cwfitzgerald June 4, 2025 15:17
@cwfitzgerald cwfitzgerald self-assigned this Jun 25, 2025
@cwfitzgerald
Member

What's the current status of this PR? @Vecvec what would the next steps be to be able to land this?

@Vecvec
Collaborator

Vecvec commented Jun 25, 2025

It's blocked on gfx-rs/metal-rs#361. There also seems to be some driver issue where acceleration structures only work when a window is present. @Lichtso would probably be able to give more details.

Edit: there is also a need to keep acceleration structures resident (I've listed potential solutions in #7660 (comment))

Edit 2: The potential solutions were only the ones I found on metal's docs, I might look into how metal does its DXR/VKRay conversions.

@cwfitzgerald
Member

Re: residency - could you call useResource?

@cwfitzgerald
Member

Alright, I'll get that metal PR landed

@Lichtso
Contributor Author

Lichtso commented Jun 25, 2025

there is also a need to keep acceleration structures resident (I've listed potential solutions in #7660 (comment))

MTLResidencySet would also have to be exposed in metal-rs first. But I haven't even tried it yet.

The potential solutions were only the ones I found on metal's docs, I might look into how metal does its DXR/VKRay conversions.

MoltenVK has not implemented ray tracing either: (see KhronosGroup/MoltenVK#427 and KhronosGroup/MoltenVK#1956). Or were you thinking about another translation layer / project?

@Vecvec
Collaborator

Vecvec commented Jun 26, 2025

Re: residency - could you call useResource?

I think I mentioned this earlier, but it must be in some review comment. We can't due to allowing out of order BLAS builds. Basically:

  • Record build in encoder 1 with blas 1 in tlas 1.
  • Use tlas 1 in encoder 2.
  • Record build in encoder 3 with blas 2 in tlas 1.
  • Queue submit with encoder 3 then encoder 2.

Blas 1 would be resident while blas 2 wouldn't be, but blas 2 would need to be resident.
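
The four steps above can be replayed in a small simulation. This is a toy model, not wgpu code: Encoder, Recorder and submit are hypothetical, and "resident" simply means the BLAS was marked at record time in the encoder that uses the TLAS, which is what a per-encoder useResource call would do.

```rust
use std::collections::HashMap;

type TlasId = u32;
type BlasId = u32;

enum Command {
    BuildTlas { tlas: TlasId, blas: BlasId },
    UseTlas { tlas: TlasId },
}

#[derive(Default)]
struct Encoder {
    commands: Vec<Command>,
    /// BLASes marked resident while recording this encoder.
    resident_blases: Vec<BlasId>,
}

/// Tracks what each TLAS contains in *recording* order.
#[derive(Default)]
struct Recorder {
    record_time_contents: HashMap<TlasId, BlasId>,
}

impl Recorder {
    fn build(&mut self, enc: &mut Encoder, tlas: TlasId, blas: BlasId) {
        self.record_time_contents.insert(tlas, blas);
        enc.commands.push(Command::BuildTlas { tlas, blas });
    }

    fn use_tlas(&mut self, enc: &mut Encoder, tlas: TlasId) {
        // Mark the BLAS known at record time as resident.
        if let Some(&blas) = self.record_time_contents.get(&tlas) {
            enc.resident_blases.push(blas);
        }
        enc.commands.push(Command::UseTlas { tlas });
    }
}

/// Executes encoders in *submission* order and returns every BLAS that a
/// TLAS use needed but that was not marked resident in that encoder.
fn submit(encoders: &[&Encoder]) -> Vec<BlasId> {
    let mut gpu_contents: HashMap<TlasId, BlasId> = HashMap::new();
    let mut missing = Vec::new();
    for enc in encoders {
        for cmd in &enc.commands {
            match *cmd {
                Command::BuildTlas { tlas, blas } => {
                    gpu_contents.insert(tlas, blas);
                }
                Command::UseTlas { tlas } => {
                    if let Some(&blas) = gpu_contents.get(&tlas) {
                        if !enc.resident_blases.contains(&blas) {
                            missing.push(blas);
                        }
                    }
                }
            }
        }
    }
    missing
}

/// Replays the scenario from the comment above.
fn scenario() -> Vec<BlasId> {
    let mut rec = Recorder::default();
    let mut e1 = Encoder::default();
    let mut e2 = Encoder::default();
    let mut e3 = Encoder::default();
    rec.build(&mut e1, 1, 1); // build in encoder 1: blas 1 in tlas 1
    rec.use_tlas(&mut e2, 1); // use tlas 1 in encoder 2
    rec.build(&mut e3, 1, 2); // build in encoder 3: blas 2 in tlas 1
    submit(&[&e3, &e2]) // submit encoder 3, then encoder 2
}
```

In this model encoder 2 marks blas 1 at record time, but by the time it executes tlas 1 refers to blas 2, so the residency information is stale.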

Edit: #7660 (comment)

@Vecvec
Collaborator

Vecvec commented Jun 26, 2025

MoltenVK has not implemented ray tracing either: (see KhronosGroup/MoltenVK#427 and KhronosGroup/MoltenVK#1956). Or were you thinking about another translation layer / project

I was thinking of Game Porting Tool Kit, but maybe it doesn't support ray tracing either.

Edit: At least the shader converter supports this, and I would assume Apple would support all of it.
https://developer.apple.com/metal/shader-converter/#changelog

Version | Changes | Requirements
-- | -- | --
2 | Support for shader debug information, globally-coherent memory access, and SV_CullPrimitive. | Globally-coherent memory access requires targeting macOS 15, iOS 18, or later.
1.1 | Support for ray tracing shaders. | Metal ray tracing support.
1 | Initial release. | Argument buffers tier 2 support.

@Vecvec
Collaborator

Vecvec commented Jun 26, 2025

@Lichtso did you file a bug with apple for acceleration structures not working w/o a window? It would be good to keep an eye on it in this PR (or when this PR lands, in an issue).

@Lichtso
Contributor Author

Lichtso commented Jun 26, 2025

@Lichtso did you file a bug with apple for acceleration structures not working w/o a window? It would be good to keep an eye on it in this PR (or when this PR lands, in an issue).

No, I haven't yet. I would have to create a minimized reproducer first and write it in Swift. Also, creating the window makes the difference, but it could be a second-order effect like timing. E.g. creating the window yields to the kernel and the process is resumed later than it otherwise would be, things like that.

I was thinking of Game Porting Tool Kit

Ah, you mean the D3DMetal.framework but the source code for that is not public, binary distribution only.

@Vecvec
Collaborator

Vecvec commented Jun 26, 2025

Ah, you mean the D3DMetal.framework but the source code for that is not public, binary distribution only.

I'd assumed that apple might make some way of showing what each call translates to so that developers could port their own games so they didn't have to constantly rely on a translation layer, I guess that doesn't exist.

@Vecvec Vecvec self-requested a review June 26, 2025 21:04
@Vecvec
Collaborator

Vecvec commented Jul 2, 2025

@Lichtso are you able to use a debugger on the tests? (It looks to be possible at least under cargo test) If so, could you see what acceleration structure sizes we are getting (in case something in metal is failing), whether they are different, and also look at what the acceleration structure pointer is - it is just possible that it is running into something similar to gfx-rs/metal#284. If all of those seem fine, could you try looking at the acceleration structures in the xcode acceleration structure inspector?

@Lichtso
Contributor Author

Lichtso commented Jul 3, 2025

look at what the acceleration structure pointer is

I printed the result of device.new_acceleration_structure_with_size(descriptor.size) and got some curious results. It is definitely a valid pointer, no allocation failure / OOM.

Without any window in the process:
<MTLDebugAccelerationStructure: 0x600000d83ed0> -> <MTLGPUDebugAccelerationStructure: 0x600000a25c20> -> <AGXG16XFamilyRayTracingAccelerationStructure: 0x15a7174d0>

With an unrelated window in the same process:
<AGXG16XFamilyRayTracingAccelerationStructure: 0x14ce16260>

Seems like the debug layer stops doing its thing when there is at least one window present.
So, next I checked if MTL_DEBUG_LAYER=0 changes anything, and lo and behold it does.
The issue only occurs when the process does not have any windows and MTL_DEBUG_LAYER is enabled.
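
One way to repeat that comparison from a script is to launch the test with the variable set explicitly. The helper below is a sketch; xtask_test_without_debug_layer is a made-up name, and it only builds the command that would run cargo xtask test with MTL_DEBUG_LAYER=0.

```rust
use std::process::Command;

/// Builds (but does not run) `cargo xtask test <test_name>` with the Metal
/// debug layer disabled through the `MTL_DEBUG_LAYER` environment variable.
fn xtask_test_without_debug_layer(test_name: &str) -> Command {
    let mut command = Command::new("cargo");
    command.args(["xtask", "test", test_name]);
    command.env("MTL_DEBUG_LAYER", "0");
    command
}
```

Calling status() on the returned command would run the test in a child process with the setting applied, leaving the parent environment untouched.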

try looking at the acceleration structures in the xcode acceleration structure inspector

That is somewhat tricky because the test runs so fast there is no way to manually capture a frame. It would have to be done programmatically. But that is more involved and we would first have to build some infrastructure in wgpu to properly do this at the command queue begin / end.

@Vecvec
Collaborator

Vecvec commented Jul 3, 2025

That is somewhat tricky because the test runs so fast there is no way to manually capture a frame. It would have to be done programmatically.

I've never used it, but device.start_graphics_debugger_capture seems to be able to work on xcode.

@Lichtso
Contributor Author

Lichtso commented Jul 19, 2025

I think I will wait for objc2-metal to land and then rebase and adjust this PR.

@cwfitzgerald
Member

That may still yet be a while. @ErichDonGubler thoughts on ^

@Vecvec
Collaborator

Vecvec commented Jul 19, 2025

I think I will wait for objc2-metal to land and then rebase and adjust this PR.

I'm not sure how long it will take for objc2 to be vetted (and so allow #5641 to be merged), but it would certainly be easier to test ideas.

edit: I see @cwfitzgerald has also responded at the same time, so this can be ignored.

@jimblandy
Member

Mozilla will be up-prioritizing the objc2 vetting.

@Lichtso
Contributor Author

Lichtso commented Aug 6, 2025

I tried but ray tracing in the objc2-metal crate is not usable right now, see: madsmtm/objc2#770

@MarijnS95
Contributor

MarijnS95 commented Aug 6, 2025

The project I'm working on has used Ray Tracing from the latest objc2-metal crate for ages. Can you clarify what "is not usable" means, and perhaps share a WIP branch so that we can help you out?

@Lichtso
Contributor Author

Lichtso commented Aug 6, 2025

@MarijnS95 I figured it out, just an unfortunate assignment of feature names and what exactly they guard.

@MarijnS95
Contributor

Yup, as written in madsmtm/objc2#770 (comment) the feature guard carries the name of the header file that the upstream Xcode SDK defines types in, even if they are sometimes unrelated or confusing to see.

This was convenient for the header-translator but less so for developers that wish to keep their enabled set of features small(er).

@Lichtso
Contributor Author

Lichtso commented Aug 9, 2025

Closing this PR in favor of #8071

@Lichtso Lichtso closed this Aug 9, 2025
Development

Successfully merging this pull request may close these issues.

Implement Ray Tracing on Metal
5 participants