[hal metal] ray tracing acceleration structures #7660

Lichtso · 2025-05-02T22:41:04Z

Connections
Fixes: #7402

Description
Implements the missing ray tracing acceleration structures in the HAL metal backend.

Testing
The examples ray_scene, ray_shadows, ray_cube_compute, ray_cube_fragment and ray_traced_triangle all work.
That is, they work when invoked via cargo run --bin wgpu-examples ray_traced_triangle, but not via cargo xtask test ray_traced_triangle. The current CI runner is too old to catch that, as it does not support hardware ray tracing.

Squash or Rebase?
Squash

Checklist

Run cargo fmt.
Run taplo format.
Run cargo clippy --tests
Run cargo xtask test to run tests.
If this contains user-facing changes, add a CHANGELOG.md entry.

Vecvec

Good job! Glad there didn't need to be any wgpu-core changes. Largely looks good, but I'm not extremely knowledgeable about metal. One question / comment, but haven't yet checked everything with spec.

wgpu-hal/src/metal/conv.rs

Vecvec

Done checks against the Metal spec. It seems this requires MacOS 13.0+ not 11.0+ due to some more recent functions being used. Confusingly, the vertex buffer field suggests that the only format supported is f32x3 so I'm not sure what descriptor.set_vertex_format does.

Vecvec · 2025-05-03T05:31:32Z

wgpu-hal/src/metal/adapter.rs

@@ -890,6 +890,11 @@ impl super::PrivateCapabilities {
                && (device.supports_family(MTLGPUFamily::Apple7)
                    || device.supports_family(MTLGPUFamily::Mac2)),
            supports_shared_event: version.at_least((10, 14), (12, 0), os_is_mac),
+            supports_raytracing: if version.at_least((11, 0), (14, 0), os_is_mac) {
+                device.supports_raytracing()


I think raytracing support needs supportsRaytracingFromRender due to support of ray queries in fragment shaders (Requires MacOS 12.0+).

That function is not exposed in the Rust metal crate. But I did bump the min required versions to macOS 13 and iOS 16.

If the metal crate is still taking PRs (Idk what state of deprecated they are in) it would probably be a good idea to add this (and the other later ones).
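The version gate discussed in this thread can be sketched roughly as follows. This is a minimal, self-contained illustration: `OsVersion` and the free function `supports_raytracing` are hypothetical stand-ins, not the types used in wgpu-hal, and the minimums (macOS 13, iOS 16) are the ones mentioned in the reply above, before the later bump.

```rust
/// Local stand-in for an OS version; not the type used by wgpu-hal.
#[derive(Clone, Copy)]
struct OsVersion {
    major: u32,
    minor: u32,
}

impl OsVersion {
    /// Returns true if this version meets the minimum for the current platform.
    fn at_least(self, mac: (u32, u32), ios: (u32, u32), os_is_mac: bool) -> bool {
        let minimum = if os_is_mac { mac } else { ios };
        // Tuples compare lexicographically: major first, then minor.
        (self.major, self.minor) >= minimum
    }
}

/// Ray tracing is reported only if the OS is new enough AND the device opts in.
/// `device_supports` stands in for whatever the device query returns.
fn supports_raytracing(version: OsVersion, os_is_mac: bool, device_supports: bool) -> bool {
    version.at_least((13, 0), (16, 0), os_is_mac) && device_supports
}

fn main() {
    let macos_12 = OsVersion { major: 12, minor: 6 };
    let macos_14 = OsVersion { major: 14, minor: 0 };
    println!("macOS 12.6: {}", supports_raytracing(macos_12, true, true));
    println!("macOS 14.0: {}", supports_raytracing(macos_14, true, true));
}
```

Checking the OS version before asking the device keeps the device query from running on systems where the underlying selector may not exist.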

wgpu-hal/src/metal/conv.rs

wgpu-hal/src/metal/command.rs

Lichtso · 2025-05-03T08:37:43Z

wgpu-hal/src/metal/device.rs

    }

    unsafe fn destroy_acceleration_structure(
        &self,
        _acceleration_structure: super::AccelerationStructure,
    ) {
-        unimplemented!()
+        // self.counters.acceleration_structures.sub(1);


Is there a reason not to have HalCounters::acceleration_structures?

Looking back at the history I couldn't find a reason, but it's possible it's buried somewhere.

Lichtso · 2025-05-03T08:42:19Z

wgpu-hal/src/metal/command.rs

+        for descriptor in descriptors {
+            let acceleration_structure_descriptor =
+                conv::map_acceleration_structure_descriptor(descriptor.entries);
+            /* The Rust metal crate does not expose metal::MTLAccelerationStructureUsage yet


Again, not exposed in the Rust metal crate.

Lichtso · 2025-05-03T08:47:01Z

examples/features/src/ray_shadows/shader.wgsl

@@ -35,6 +35,7 @@ var acc_struct: acceleration_structure;

 struct PushConstants {
    light: vec3<f32>,
+    padding: f32,


It seems that metal always sends at least 16 bytes for push constants, even if we only pass in 12 bytes. And then the shader validation complains that the receiver here only expects 12 bytes.
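To make the 12 versus 16 byte mismatch concrete, here is a small host-side sketch. Both struct names are hypothetical mirrors of the shader struct, and the sizes printed are Rust `#[repr(C)]` sizes, not what Metal's shader validation computes.

```rust
use std::mem::size_of;

/// Host-side mirror of the shader struct before the change: just the light vector.
#[allow(dead_code)]
#[repr(C)]
struct PushConstantsUnpadded {
    light: [f32; 3],
}

/// Host-side mirror after the change: one extra f32 rounds the size up to 16 bytes.
#[allow(dead_code)]
#[repr(C)]
struct PushConstants {
    light: [f32; 3],
    padding: f32,
}

fn main() {
    println!("unpadded: {} bytes", size_of::<PushConstantsUnpadded>());
    println!("padded:   {} bytes", size_of::<PushConstants>());
}
```

With the explicit padding field, the sender and the receiver agree on a 16-byte block.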

Lichtso · 2025-05-03T11:26:05Z

Glad there didn't need to be any wgpu-core changes

Almost, had to remove the Option<> around the buffers and always pass the dummy zero buffer when computing the size of the acceleration structures and their scratch buffers because Metal does not like nil.

I can split those first four commits into a separate PR if that helps with the review.

Vecvec · 2025-05-04T06:43:37Z

I just remembered that structures have minimum versions, and it seems MTLIndirectAccelerationStructureInstanceDescriptor required MacOS 14.0+ (probably should have checked that earlier...).

Lichtso · 2025-05-04T12:13:57Z

I just remembered that structures have minimum versions, and it seems MTLIndirectAccelerationStructureInstanceDescriptor required MacOS 14.0+ (probably should have checked that earlier...).

Bumped the min required version even further up.

Lichtso · 2025-05-04T12:22:42Z

I also managed to reduce the issue with the acceleration structure not intersecting any rays to a perfect reproducer and it is wild:

See the last commit "Bug reproducer", which modifies the ray_cube_fragment example to generate two BLASes: One with 152 triangles and one with 153 triangles.

With Metal on macOS the instances of the BLAS with 152 triangles (16344 bytes acceleration_structure_size) work as expected, but the ones with 153 triangles (16472 bytes acceleration_structure_size) suddenly stop intersecting rays after roughly 1.5 seconds no matter how many frames were rendered until then. 0x4000 = 2^14 = 16384 might be some special boundary being crossed. It also keeps happening even if I stop calling build_acceleration_structure() after the initial setup. Using MTLAccelerationStructureInstanceDescriptor or MTLIndirectAccelerationStructureInstanceDescriptor is also irrelevant. Same goes for calling encoder.use_resource_at(blas.as_native(), use_info.uses, use_info.stages) or not.

This also breaks Vulkan on Linux with a SIGSEGV upon Queue::submit: https://github.com/gfx-rs/wgpu/actions/runs/14820911901/job/41607697292?pr=7660

Using an example from metal-rs without wgpu does not reproduce this bug. It seems we are either lacking some validation step or are doing something wrong with our handling of acceleration structures in general.

@Vecvec: What testing hardware do you have available? Can you maybe see why Vulkan is failing this too?

Vecvec · 2025-05-04T19:05:02Z

@Vecvec: What testing hardware do you have available? Can you maybe see why Vulkan is failing this too?

I've got a couple of raytracing supported machines (plus llvmpipe which I will also be testing on). I'll have a look and see if I can get any ideas of what the issue might be.

Vecvec · 2025-05-04T19:33:15Z

Hits a divide by zero on Microsoft Basic Render Driver (though it doesn't seem to be related to the memory used, and only on one of my computers). Can't get it to fail on the real gpus yet. Was able to reproduce the llvmpipe seg fault (edit: Don't think it's the same problem as the one here), will continue testing.

Lichtso · 2025-05-04T21:21:33Z

divide by zero

Might be that it tries to normalize a zero-length vector. The modified example does simply duplicate triangles so that could cause some vectors to become zero.

I narrowed the Metal issue down further and it is indeed caused by AccelerationStructureBuildSizes::acceleration_structure_size being greater than or equal to 0x4000. For example if I modify device.new_acceleration_structure_with_size(descriptor.size.max(0x4000)) in Device::create_acceleration_structure() only (which is the latest point and makes sure that it is only related to the Metal backend) then all BLAS instances first work fine but disappear after 1.5 seconds. Reading the Metal docs it appears that 16384 (0x4000) is indeed used as an API limit for other things like the mesh shader output buffer. So maybe there is a bug in the Metal driver, because I cannot imagine that the limit for acceleration structure sizes is supposed to be so low.
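The boundary arithmetic from this comment and the earlier reproducer can be checked with a short sketch. The constant and function names are hypothetical; the sizes are the two values reported in the thread, and `forced_size` mirrors the `.max(0x4000)` experiment described above.

```rust
/// Boundary observed in the thread: 0x4000 = 2^14 = 16384 bytes.
const SUSPECT_BOUNDARY: u64 = 0x4000;

/// True if an acceleration structure of this size reaches the suspect boundary.
fn reaches_boundary(size: u64) -> bool {
    size >= SUSPECT_BOUNDARY
}

/// Mirrors the experiment of forcing every allocation up to at least the boundary.
fn forced_size(requested: u64) -> u64 {
    requested.max(SUSPECT_BOUNDARY)
}

fn main() {
    // (triangle count, reported acceleration_structure_size in bytes)
    for (triangles, size) in [(152u32, 16344u64), (153, 16472)] {
        println!(
            "{triangles} triangles, {size} bytes, reaches boundary: {}",
            reaches_boundary(size)
        );
    }
    println!("forced size for 16344: {}", forced_size(16344));
}
```

Only the 153-triangle BLAS reaches the boundary on its own, while the forced allocation pushes the 152-triangle BLAS to it as well, which matches the report that all instances then disappear.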

Edit: Officially the limits are way higher, see https://developer.apple.com/documentation/metal/mtlaccelerationstructureusage/extendedlimits.

Vecvec · 2025-05-04T23:26:29Z

Most other resources are created with an auto release pool around them, is it possible that that is fixing this issue somehow?

Lichtso · 2025-05-05T07:41:40Z

Most other resources are created with an auto release pool around them, is it possible that that is fixing this issue somehow?

Added one in Device::create_acceleration_structure() but unfortunately that was not it either. There must be some other conditions to trigger it because the metal-rs examples don't and the wgpu examples only do when called via cargo xtask test.

I would say we try to land this PR and then open an issue for it to solve that separately.

BTW, I noticed the CI runner "Test Mac aarch64" job is not failing. Probably the test runner is too old to support hardware raytracing and skips the relevant tests.

Vecvec · 2025-05-05T19:14:42Z

Added one in Device::create_acceleration_structure() but unfortunately that was not it either

That's annoying, I wonder what it could be

I would say we try to land this PR and then open an issue for it to solve that separately.

Yes, though it could be some time before it lands.

I noticed the CI runner "Test Mac aarch64" job is not failing. Probably the test runner is too old to support hardware raytracing and skips the relevant tests

I checked and it does skip.

Vecvec

Excluding the things that aren't exposed by the metal crate this looks good to me.

Vecvec · 2025-05-06T00:49:14Z

wgpu-hal/src/metal/conv.rs

+                            .flags
+                            .contains(wgt::AccelerationStructureGeometryFlags::OPAQUE),
+                    );
+                    // wgt::AccelerationStructureGeometryFlags::NO_DUPLICATE_ANY_HIT_INVOCATION


It feels like this should set allowDuplicateIntersectionFunctionInvocation if NO_DUPLICATE_ANY_HIT_INVOCATION is not set but metal-rs doesn't support this.

Added it to: gfx-rs/metal-rs#361
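The mapping suggested in this thread could look roughly like the sketch below. The two flag constants and `GeometrySettings` are local stand-ins (the bit values are assumptions, and this is not the metal-rs API, which did not expose the property at the time); the point is only that the Metal property is the negation of the wgpu flag.

```rust
// Local stand-ins for two wgt::AccelerationStructureGeometryFlags bits.
// The numeric values are assumptions made for this sketch.
const OPAQUE: u8 = 1 << 0;
const NO_DUPLICATE_ANY_HIT_INVOCATION: u8 = 1 << 1;

/// Hypothetical holder for the two geometry descriptor properties.
#[derive(Debug, PartialEq)]
struct GeometrySettings {
    opaque: bool,
    allow_duplicate_intersection_function_invocation: bool,
}

fn map_geometry_flags(flags: u8) -> GeometrySettings {
    GeometrySettings {
        opaque: (flags & OPAQUE) != 0,
        // Duplicates are allowed exactly when the "no duplicate" flag is absent.
        allow_duplicate_intersection_function_invocation:
            (flags & NO_DUPLICATE_ANY_HIT_INVOCATION) == 0,
    }
}

fn main() {
    println!("{:?}", map_geometry_flags(OPAQUE));
    println!("{:?}", map_geometry_flags(OPAQUE | NO_DUPLICATE_ANY_HIT_INVOCATION));
}
```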

cwfitzgerald · 2025-06-25T19:04:50Z

What's the current status of this PR? @Vecvec what would the next steps be to be able to land this?

Vecvec · 2025-06-25T19:08:55Z

It's blocked on gfx-rs/metal-rs#361. There also seems to be some driver issue which only makes acceleration structures work when a window is there. @Lichtso would probably be able to give more details.

Edit: there is also a need to keep acceleration structures resident (I've listed potential solutions in #7660 (comment))

Edit 2: The potential solutions were only the ones I found on metal's docs, I might look into how metal does its DXR/VKRay conversions.

cwfitzgerald · 2025-06-25T20:04:10Z

Re: residency - could you call useResource?

cwfitzgerald · 2025-06-25T20:05:20Z

Alright, I'll get that metal PR landed

Lichtso · 2025-06-25T20:10:47Z

there is also a need to keep acceleration structures resident (I've listed potential solutions in #7660 (comment))

MTLResidencySet would also have to be exposed in metal-rs first. But I haven't even tried it yet.

The potential solutions were only the ones I found on metal's docs, I might look into how metal does its DXR/VKRay conversions.

MoltenVK has not implemented ray tracing either: (see KhronosGroup/MoltenVK#427 and KhronosGroup/MoltenVK#1956). Or were you thinking about another translation layer / project?

Vecvec · 2025-06-26T00:44:03Z

Re: residency - could you call useResource?

I think I mentioned this earlier, but it must be in some review comment. We can't due to allowing out of order BLAS builds. Basically:

1. Record build in encoder 1 with blas 1 in tlas 1.
2. Use tlas 1 in encoder 2.
3. Record build in encoder 3 with blas 2 in tlas 1.
4. Queue submit with encoder 3 then encoder 2.

Blas 1 would be resident while blas 2 wouldn't be, but blas 2 would need to be resident.
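The ordering problem above can be modelled with a small simulation. This is only a sketch under stated assumptions: `Command`, `Encoder`, `replay`, and `scenario` are hypothetical names, not wgpu types, a record-time tracker is assumed to mark resident whatever BLASes the TLAS references while the encoder is being recorded, and encoder 1 is assumed to have been submitted before encoders 3 and 2.

```rust
use std::collections::HashMap;

type BlasId = u32;
type TlasId = u32;

/// The two operations that matter for this scenario.
enum Command {
    BuildTlas { tlas: TlasId, blases: Vec<BlasId> },
    UseTlas { tlas: TlasId },
}

struct Encoder {
    commands: Vec<Command>,
    /// BLASes a record-time tracker would mark resident for this encoder.
    resident_at_record: Vec<BlasId>,
}

/// Applies commands to a TLAS -> BLAS map and returns the BLASes each UseTlas touched.
fn replay(contents: &mut HashMap<TlasId, Vec<BlasId>>, commands: &[Command]) -> Vec<BlasId> {
    let mut used = Vec::new();
    for command in commands {
        match command {
            Command::BuildTlas { tlas, blases } => {
                contents.insert(*tlas, blases.clone());
            }
            Command::UseTlas { tlas } => {
                used.extend(contents.get(tlas).cloned().unwrap_or_default());
            }
        }
    }
    used
}

/// Returns (BLASes marked resident at record time, BLASes needed at submit time).
fn scenario() -> (Vec<BlasId>, Vec<BlasId>) {
    // Recording order: encoder 1, encoder 2, encoder 3.
    let mut record_state = HashMap::new();
    let mut encoders = Vec::new();
    for commands in [
        vec![Command::BuildTlas { tlas: 1, blases: vec![1] }],
        vec![Command::UseTlas { tlas: 1 }],
        vec![Command::BuildTlas { tlas: 1, blases: vec![2] }],
    ] {
        let resident_at_record = replay(&mut record_state, &commands);
        encoders.push(Encoder { commands, resident_at_record });
    }

    // Submission order: encoder 1 (assumed earlier), then encoder 3, then encoder 2.
    let mut submit_state = HashMap::new();
    let mut needed = Vec::new();
    for index in [0, 2, 1] {
        needed.extend(replay(&mut submit_state, &encoders[index].commands));
    }
    (encoders[1].resident_at_record.clone(), needed)
}

fn main() {
    let (marked, needed) = scenario();
    println!("marked resident while recording: {marked:?}");
    println!("needed when executing: {needed:?}");
}
```

In this model the record-time tracker marks blas 1 for encoder 2, while the submission order makes encoder 2 depend on blas 2, which is the mismatch described above.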

Edit: #7660 (comment)

Vecvec · 2025-06-26T00:45:30Z

MoltenVK has not implemented ray tracing either: (see KhronosGroup/MoltenVK#427 and KhronosGroup/MoltenVK#1956). Or were you thinking about another translation layer / project

I was thinking of Game Porting Tool Kit, ~~but maybe it doesn't support ray tracing either~~.

Edit: at least the shader converter supports this, and I would assume Apple would support all of it.
https://developer.apple.com/metal/shader-converter/#changelog

Version | Changes | Requirements
-- | -- | --
2 | Support for shader debug information, globally-coherent memory access, and SV_CullPrimitive. | Globally-coherent memory access requires targeting macOS 15, iOS 18, or later.
1.1 | Support for ray tracing shaders. | Metal ray tracing support.
1 | Initial release. | Argument buffers tier 2 support.

Vecvec · 2025-06-26T00:47:35Z

@Lichtso did you file a bug with apple for acceleration structures not working w/o a window? It would be good to keep an eye on it in this PR (or when this PR lands, in an issue).

Lichtso · 2025-06-26T09:23:28Z

@Lichtso did you file a bug with apple for acceleration structures not working w/o a window? It would be good to keep an eye on it in this PR (or when this PR lands, in an issue).

No, I haven't yet. I would have to create a minimized reproducer first and write it in Swift. Also, creating the window makes the difference, but it could be a second order effect like timing. E.g. creating the window yields to the kernel and the process is resumed later than if it didn't, things like that.

I was thinking of Game Porting Tool Kit

Ah, you mean the D3DMetal.framework but the source code for that is not public, binary distribution only.

Vecvec · 2025-06-26T18:46:29Z

Ah, you mean the D3DMetal.framework but the source code for that is not public, binary distribution only.

I'd assumed that apple might make some way of showing what each call translates to so that developers could port their own games so they didn't have to constantly rely on a translation layer, I guess that doesn't exist.

Vecvec · 2025-07-02T23:22:21Z

@Lichtso are you able to use a debugger on the tests? (It looks to be possible at least under cargo test) If so, could you see what acceleration structure sizes we are getting (in case something in metal is failing), whether they are different, and also look at what the acceleration structure pointer is - it is just possible that it is running into something similar to gfx-rs/metal#284. If all of those seem fine, could you try looking at the acceleration structures in the xcode acceleration structure inspector?

Lichtso · 2025-07-03T15:30:27Z

look at what the acceleration structure pointer is

I printed the result of device.new_acceleration_structure_with_size(descriptor.size) and got some curious results. It is definitely a valid pointer, no allocation failure / OOM.

Without any window in the process:
<MTLDebugAccelerationStructure: 0x600000d83ed0> -> <MTLGPUDebugAccelerationStructure: 0x600000a25c20> -> <AGXG16XFamilyRayTracingAccelerationStructure: 0x15a7174d0>

With an unrelated window in the same process:
<AGXG16XFamilyRayTracingAccelerationStructure: 0x14ce16260>

Seems like the debug layer stops doing its thing when there is at least one window present.
So, next I checked if MTL_DEBUG_LAYER=0 changes anything, and lo and behold it does.
The issue only occurs when the process does not have any windows and MTL_DEBUG_LAYER is enabled.
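A test harness could encode this finding as a guard. The sketch below is hypothetical: both function names are made up, and treating an unset, empty, or "0" value of MTL_DEBUG_LAYER as "disabled" is an assumption of this example rather than documented Metal behavior.

```rust
use std::env;

/// Interprets a raw MTL_DEBUG_LAYER value.
/// Assumption: unset, empty, and "0" all mean the debug layer is off.
fn debug_layer_enabled(raw: Option<&str>) -> bool {
    !matches!(raw, None | Some("") | Some("0"))
}

/// Condition under which the thread observed the failure:
/// no window in the process and the debug layer enabled.
fn failure_expected(has_window: bool, debug_layer: bool) -> bool {
    !has_window && debug_layer
}

fn main() {
    let raw = env::var("MTL_DEBUG_LAYER").ok();
    let enabled = debug_layer_enabled(raw.as_deref());
    println!("debug layer enabled: {enabled}");
    println!(
        "failure expected in a windowless test: {}",
        failure_expected(false, enabled)
    );
}
```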

try looking at the acceleration structures in the xcode acceleration structure inspector

That is somewhat tricky because the test runs so fast there is no way to manually capture a frame. It would have to be done programmatically. But that is more involved and we would first have to build some infrastructure in wgpu to properly do this at the command queue begin / end.

Vecvec · 2025-07-03T19:49:58Z

That is somewhat tricky because the test runs so fast there is no way to manually capture a frame. It would have to be done programmatically.

I've never used it, but device.start_graphics_debugger_capture seems to be able to work on xcode.

Lichtso · 2025-07-19T09:58:47Z

I think I will wait for objc2-metal to land and then rebase and adjust this PR.

cwfitzgerald · 2025-07-19T20:11:19Z

That may still yet be a while. @ErichDonGubler thoughts on ^

Vecvec · 2025-07-19T20:11:30Z

I think I will wait for objc2-metal to land and then rebase and adjust this PR.

I'm not sure how long it will take for objc2 to be vetted (and so allow #5641 to be merged), but it would certainly be easier to test ideas.

edit: I see @cwfitzgerald has also responded at the same time, so this can be ignored.

jimblandy · 2025-08-06T15:14:26Z

Mozilla will be up-prioritizing the objc2 vetting.

Lichtso · 2025-08-06T17:36:11Z

I tried ~~but ray tracing in the objc2-metal crate is not usable right now~~, see: madsmtm/objc2#770

MarijnS95 · 2025-08-06T17:45:10Z

The project I'm working on has used Ray Tracing from the latest objc2-metal crate for ages. Can you clarify what "is not usable" and perhaps share a WIP branch so that we can help you out?

Lichtso · 2025-08-06T18:26:03Z

@MarijnS95 I figured it out, just an unfortunate assignment of feature names and what exactly they guard.

MarijnS95 · 2025-08-06T18:27:59Z

Yup, as written in madsmtm/objc2#770 (comment) the feature guard carries the name of the header file that the upstream Xcode SDK defines types in, even if they are sometimes unrelated or confusing to see.

This was convenient for the header-translator but less so for developers that wish to keep their enabled set of features small(er).

Lichtso · 2025-08-09T13:20:23Z

Closing this PR in favor of #8071

Lichtso requested a review from a team as a code owner May 2, 2025 22:41

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from e30b663 to f3830cb Compare May 2, 2025 22:52

Vecvec approved these changes May 3, 2025

wgpu-hal/src/metal/conv.rs

Vecvec requested changes May 3, 2025

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch from f3830cb to 234e75b Compare May 3, 2025 08:33

Lichtso commented May 3, 2025

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from 5f1c464 to 2a6d9b6 Compare May 3, 2025 11:13

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch 2 times, most recently from 38b3de7 to 9442511 Compare May 4, 2025 11:49

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch from 9442511 to 90082ad Compare May 4, 2025 12:40

Lichtso force-pushed the metal/ray_tracing_acceleration_structures branch from 90082ad to 982103f Compare May 5, 2025 07:33

Lichtso mentioned this pull request May 5, 2025

Raytracing: MTLAccelerationStructureUsage and supports_raytracing_from_render gfx-rs/metal-rs#361

Open

Lichtso requested a review from Vecvec May 5, 2025 07:50

Vecvec approved these changes May 6, 2025

Lichtso added 2 commits May 11, 2025 16:58

Removes Option<> around AccelerationStructureTriangleIndices::buffer.

5b34aeb

Removes Option<> around AccelerationStructureTriangles::vertex_buffer.

503f197

jimblandy requested a review from cwfitzgerald June 4, 2025 15:17

cwfitzgerald assigned cwfitzgerald and Vecvec and unassigned cwfitzgerald Jun 11, 2025

cwfitzgerald self-assigned this Jun 25, 2025

Vecvec self-requested a review June 26, 2025 21:04

Lichtso mentioned this pull request Aug 9, 2025

[hal metal] ray tracing acceleration structures #8071

Open

5 tasks

Lichtso closed this Aug 9, 2025
