Meshlet software raster + start of cleanup #14623

JMS55 · 2024-08-05T05:21:51Z

Objective

Faster meshlet rasterization path for small triangles
Avoid having to allocate and write out a triangle buffer
Refactor gpu_scene.rs

Solution

Replace the 32bit visbuffer texture with a 64bit visbuffer buffer, where the left (most significant) 32 bits encode depth, and the right (least significant) 32 bits encode the existing cluster + triangle IDs. A 64bit texture isn't an option, because wgpu/naga doesn't support atomic ops on textures yet.
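A minimal CPU-side sketch of why depth goes in the upper half: with depth in the most significant bits, a single 64-bit atomic max acts as both the depth test and the ID write. The function names here are hypothetical, and the sketch assumes depth is non-negative with larger values closer to the camera, which is what an atomic max implies.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Pack depth into the high 32 bits and the cluster + triangle IDs into the low 32 bits.
/// For non-negative floats the IEEE-754 bit pattern sorts in the same order as the value,
/// so comparing packed values as integers compares depth first.
pub fn pack_visibility(depth: f32, packed_ids: u32) -> u64 {
    ((depth.to_bits() as u64) << 32) | packed_ids as u64
}

/// Split a packed visbuffer value back into (depth, packed IDs).
pub fn unpack_visibility(value: u64) -> (f32, u32) {
    (f32::from_bits((value >> 32) as u32), value as u32)
}

/// Keep whichever fragment has the larger depth (assumed to mean "closer" in this sketch).
pub fn write_fragment(visbuffer: &[AtomicU64], pixel: usize, depth: f32, packed_ids: u32) {
    visbuffer[pixel].fetch_max(pack_visibility(depth, packed_ids), Ordering::Relaxed);
}
```

Writing fragments with depths 0.25, 0.75 and 0.5 to the same pixel in any order leaves the 0.75 fragment and its IDs in the buffer.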
Instead of writing out a buffer of packed cluster + triangle IDs (per triangle) to raster, the culling pass now writes out a buffer of just cluster IDs (per cluster, so less memory allocated, cheaper to write out).
- Clusters for software raster are allocated from the left side
- Clusters for hardware raster are allocated in the same buffer, from the right side
- The buffer size is fixed at MeshletPlugin build time, and should be set to a reasonable value for your scene (there is no warning on overflow, and no good way to determine what value you need outside of RenderDoc; I plan to fix this in a future PR adding a meshlet stats overlay)
- Currently I don't have a heuristic for software vs hardware raster selection for each cluster. The existing code is just a placeholder. I need to profile on a release scene and come up with a heuristic, probably in a future PR.
- The culling shader is getting pretty hard to follow at this point, but I don't want to spend time improving it as the entire shader/pass is getting rewritten/replaced in the near future.
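The two-ended allocation scheme described above can be sketched on the CPU as follows. This is an illustration only: the type and method names are hypothetical, the counters would be atomics on the GPU, and the explicit overflow check is an addition for clarity (the PR notes there is currently no warning on overflow).

```rust
/// Which rasterizer a cluster was assigned to.
pub enum RasterPath {
    Software,
    Hardware,
}

/// One fixed-size buffer of cluster IDs shared by both raster paths.
pub struct ClusterIdBuffer {
    slots: Vec<u32>,
    software_count: usize,
    hardware_count: usize,
}

impl ClusterIdBuffer {
    pub fn new(slot_count: usize) -> Self {
        Self { slots: vec![0; slot_count], software_count: 0, hardware_count: 0 }
    }

    /// Software clusters fill from the left, hardware clusters from the right.
    /// Returns the slot written, or `None` once the two ends have met.
    pub fn push(&mut self, cluster_id: u32, path: RasterPath) -> Option<usize> {
        if self.software_count + self.hardware_count >= self.slots.len() {
            return None;
        }
        let index = match path {
            RasterPath::Software => {
                self.software_count += 1;
                self.software_count - 1
            }
            RasterPath::Hardware => {
                self.hardware_count += 1;
                self.slots.len() - self.hardware_count
            }
        };
        self.slots[index] = cluster_id;
        Some(index)
    }

    /// Cluster IDs queued for software raster (left side of the buffer).
    pub fn software_clusters(&self) -> &[u32] {
        &self.slots[..self.software_count]
    }

    /// Cluster IDs queued for hardware raster (right side of the buffer).
    pub fn hardware_clusters(&self) -> &[u32] {
        &self.slots[self.slots.len() - self.hardware_count..]
    }
}
```

Because both lists share one allocation, neither path needs its own worst-case buffer; the only limit is the combined count.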
Software raster is a compute workgroup per-cluster. Each workgroup loads and transforms the <=64 vertices of the cluster, and then rasterizes the <=64 triangles of the cluster.
- Two variants are implemented: Scanline for clusters with any larger triangles (still smaller than hardware is good at), and brute-force for very very tiny triangles
- Once the shader determines that a pixel should be filled in, it does an atomicMax() on the visbuffer to store the results, copying how Nanite works
- On devices with a low max workgroups per dispatch limit, an extra compute pass is inserted before software raster to convert from a 1d to 2d dispatch (I don't think 3d would ever be necessary).
- I haven't implemented the top-left rule or subpixel precision yet; I'm leaving those for a future PR since I get usable results without them for now
- Resources used: https://kristoffer-dyrkorn.github.io/triangle-rasterizer and chapters 6-8 of https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index
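As a rough CPU-side illustration of the brute-force variant, the sketch below tests every pixel center in a triangle's clamped bounding box against three edge functions and writes covered pixels with a 64-bit atomic max. It is not the PR's WGSL shader: the function names, the winding convention, and the `[x, y, depth]` vertex layout are assumptions, and like the PR it omits the top-left rule and subpixel precision.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Edge function: twice the signed area of triangle (a, b, p).
fn edge(a: [f32; 2], b: [f32; 2], p: [f32; 2]) -> f32 {
    (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
}

/// Rasterize one triangle whose vertices are `[x, y, depth]` in pixel coordinates.
pub fn rasterize_triangle(
    visbuffer: &[AtomicU64],
    width: u32,
    height: u32,
    v: [[f32; 3]; 3],
    packed_ids: u32,
) {
    let a = [v[0][0], v[0][1]];
    let b = [v[1][0], v[1][1]];
    let c = [v[2][0], v[2][1]];
    let area = edge(a, b, c);
    // Treat non-positive area as back-facing or degenerate (arbitrary choice for this sketch).
    if area <= 0.0 || width == 0 || height == 0 {
        return;
    }
    // Bounding box, clamped to the viewport on both sides.
    let min_x = a[0].min(b[0]).min(c[0]).floor().max(0.0) as u32;
    let min_y = a[1].min(b[1]).min(c[1]).floor().max(0.0) as u32;
    let max_x = (a[0].max(b[0]).max(c[0]).ceil().max(0.0) as u32).min(width - 1);
    let max_y = (a[1].max(b[1]).max(c[1]).ceil().max(0.0) as u32).min(height - 1);

    for y in min_y..=max_y {
        for x in min_x..=max_x {
            let p = [x as f32 + 0.5, y as f32 + 0.5];
            let (w0, w1, w2) = (edge(b, c, p), edge(c, a, p), edge(a, b, p));
            if w0 < 0.0 || w1 < 0.0 || w2 < 0.0 {
                continue; // pixel center is outside the triangle
            }
            // Barycentric interpolation of depth.
            let depth = (w0 * v[0][2] + w1 * v[1][2] + w2 * v[2][2]) / area;
            let value = ((depth.to_bits() as u64) << 32) | packed_ids as u64;
            visbuffer[(y * width + x) as usize].fetch_max(value, Ordering::Relaxed);
        }
    }
}
```

The scanline variant would replace the inner per-pixel test with a computed `[x0, x1]` interval per row, which pays off once triangles cover more than a handful of pixels.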
Hardware raster now spawns 64*3 vertex invocations per meshlet, instead of the actual meshlet vertex count. Extra invocations just early-exit.
- While this is slower than the existing system, hardware draws should be rare now that software raster is usable, and it saves a ton of memory using the unified cluster ID buffer. This would be fixed if wgpu had support for mesh shaders.
- Instead of writing to a color+depth attachment, the hardware raster pass also does the same atomic visbuffer writes that software raster uses.
- We have to bind a dummy render target anyways, as wgpu doesn't currently support render passes without any attachments
- Material IDs are no longer written out during the main rasterization passes.
- If we had async compute queues, we could overlap the software and hardware raster passes.
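The fixed 64*3 vertex invocation count for hardware raster can be illustrated with the index math below. This is a sketch with hypothetical names: it only shows how a vertex index maps to a triangle and corner, and where surplus invocations would early-exit. What the real vertex shader returns in that case is not shown here.

```rust
/// Upper bound on triangles per cluster, as stated in the PR description.
pub const MAX_TRIANGLES_PER_CLUSTER: u32 = 64;
/// Every cluster is drawn with this many vertex invocations, regardless of its real size.
pub const VERTEX_INVOCATIONS_PER_CLUSTER: u32 = MAX_TRIANGLES_PER_CLUSTER * 3;

/// Map a vertex invocation to `(triangle_id, corner)` within the cluster.
/// Returns `None` for invocations past the cluster's real triangle count,
/// which is where the shader would early-exit.
pub fn decode_vertex_invocation(vertex_index: u32, triangle_count: u32) -> Option<(u32, u32)> {
    let triangle_id = vertex_index / 3;
    if triangle_id >= triangle_count {
        return None;
    }
    Some((triangle_id, vertex_index % 3))
}
```

For a cluster with 10 triangles, invocations 0 through 29 do real work and the remaining 162 exit immediately, which is the overhead the PR accepts in exchange for the unified cluster ID buffer.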
New material and depth resolve passes run at the end of the visbuffer node, and write out view depth and material ID depth textures
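Returning to the extra pass mentioned under software raster for devices with a low max workgroups per dispatch limit, here is one way a 1d to 2d dispatch conversion could look. The function names and the square-grid choice are assumptions for illustration, not a description of the PR's shader.

```rust
/// Turn a 1D workgroup count into dispatch dimensions that respect the per-dimension limit.
/// If the count already fits, it is dispatched as-is; otherwise it is spread over the
/// smallest square grid that holds every cluster.
pub fn remap_dispatch(cluster_count: u32, max_workgroups_per_dimension: u32) -> [u32; 3] {
    if cluster_count <= max_workgroups_per_dimension {
        return [cluster_count, 1, 1];
    }
    let side = (cluster_count as f64).sqrt().ceil() as u32;
    [side, side, 1]
}

/// Recover the linear cluster index from a 2D workgroup ID.
/// The square grid usually has spare workgroups at the end; those get `None` and should exit.
pub fn cluster_index(workgroup_id: [u32; 2], num_workgroups_x: u32, cluster_count: u32) -> Option<u32> {
    let index = workgroup_id[1] * num_workgroups_x + workgroup_id[0];
    (index < cluster_count).then_some(index)
}
```

With a limit of 65535 per dimension, a square grid covers over four billion workgroups, which is consistent with the remark that a 3d dispatch should never be necessary.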

Misc changes

Fixed cluster culling importing the previous view uniforms but never actually using them when doing occlusion culling
Fixed incorrectly adding the LOD error twice when building the meshlet mesh
Split up the gpu_scene module into meshlet_mesh_manager, instance_manager, and resource_manager
- resource_manager is still too complex and inefficient (extract and prepare are way too expensive). I plan on improving this in a future PR, but for now ResourceManager is mostly a 1:1 port of the leftover MeshletGpuScene bits.
Renamed material draw passes to the more accurate "material shade pass", along with some other misc renaming (in the future these will even be compute shaders rather than actual draw calls)

Migration Guide

TBD (ask me at the end of the release for meshlet changes as a whole)

atlv24 · 2024-08-22T00:12:50Z

crates/bevy_pbr/src/meshlet/visibility_buffer_software_raster.wgsl

+    var max_x = u32(ceil(max3(vertex_0.x, vertex_1.x, vertex_2.x)));
+    var max_y = u32(ceil(max3(vertex_0.y, vertex_1.y, vertex_2.y)));
+    max_x = min(max_x, u32(view.viewport.z) - 1u);
+    max_y = min(max_y, u32(view.viewport.w) - 1u);


do you also need to min_x = max(0, min_x) etc?

Iirc the u32(foo) should turn all negatives into 0.

atlv24 · 2024-08-22T00:16:50Z

crates/bevy_pbr/src/meshlet/visibility_buffer_software_raster.wgsl

+        // Scanline setup
+        let edge_012 = -w_x;
+        let open_edge = edge_012 < vec3(0.0);
+        let inverse_edge_012 = select(1.0 / edge_012, vec3(1e8), edge_012 == vec3(0.0));


Idk, it's what the nanite slides do.

I tried to understand what this is doing, but never quite figured it out.

atlv24 · 2024-08-22T00:24:25Z

crates/bevy_pbr/src/meshlet/visibility_buffer_software_raster.wgsl

+
+            // Iterate scanline X interval
+            for (var x = x0; x <= x1; x++) {
+                // Check if point at pixel is within triangle (TODO: this shouldn't be needed, but there's bugs without it)


that's annoying, i'll see if i can figure this out but don't block merge on this

Might just be needed on the first and last pixel (x0, x1), and can be skipped for [x0 + 1, x1 - 1]

IceSentry · 2024-08-24T19:20:20Z

crates/bevy_pbr/src/meshlet/from_mesh.rs

@@ -294,6 +289,7 @@ fn simplify_meshlet_groups(
    let target_error = target_error_relative * mesh_scale;

    // Simplify the group to ~50% triangle count
+    // TODO: Simplify using vertex attributes


Is this about the bevy mesh api or more specific to meshlets?

Specific to meshlets. Simplification (i.e. the LOD building) only accounts for position atm, and not things like UVs/normals. See zeux/meshoptimizer#158.

IceSentry · 2024-08-24T19:22:09Z

crates/bevy_pbr/src/meshlet/instance_manager.rs

+
+/// Manages data for each entity with a [`MeshletMesh`].
+#[derive(Resource)]
+pub struct InstanceManager {


Maybe it's fine with the name space, but that feels a bit generic? Why not something like VirtualGeometryMeshManager? To be clear, this isn't a blocker or anything, I'm just curious.

Eh I didn't want to get hung up on naming. We can change it later.

IceSentry · 2024-08-24T19:24:21Z

crates/bevy_pbr/src/meshlet/meshlet_mesh_manager.rs

+
+/// Manages uploading [`MeshletMesh`] asset data to the GPU.
+#[derive(Resource)]
+pub struct MeshletMeshManager {


Maybe this should have Gpu somewhere in the name to clarify its purpose?

IceSentry · 2024-08-24T19:29:47Z

crates/bevy_pbr/src/meshlet/resource_manager.rs

+                &BindGroupLayoutEntries::sequential(
+                    ShaderStages::COMPUTE,
+                    (
+                        storage_buffer_read_only_sized(false, None),


That's a lot of bindings, but that does feel a lot easier to read than when we were still using the raw structs 😅

Yeahhh I want automated bind groups in the render graph for a reason... Arguably this explicitness is better for perf, but it can't be that much slower to hash-and-cache, and this sucks to write and tweak.

IceSentry

Okay, I'm not seeing anything obviously wrong. I confirmed that the code doesn't break anything else. The various *Manager structs are a bit generically named, but that doesn't really matter for now.

I get a clippy lint locally but it's in a crate that isn't modified in this PR so not sure what's up with that. Other than that LGTM.

mockersf · 2024-08-27T21:45:27Z

This PR maybe made things faster, but it also greatly reduces the range of hardware where it works. meshlets used to work on macOS or on Vulkan with software renderer, not anymore. I don't know about DX12 but there's a comment saying it's now Vulkan only with a recent GPU.

Is it worth it making it faster if it won't run anymore on hardware where it would be useful for it to be faster?

JMS55 · 2024-08-27T22:02:21Z

I'm not open to supporting GPUs without 64bit atomics (meaning no software raster).

Besides being an insane amount of extra code to support (pretty much an entire extra copy of the codebase), there's no point in telling artists to design their scenes around this feature for pixel level geometry detail, and then have older users try to use it, completely fall over due to lack of software raster + low vram, and then blame the game developers for not optimizing their game when the developers can't do anything besides making an entire second copy of the scene designed for working without meshlets. This is not a feature meant for older platforms, and the only reason it ran before was a temporary setup so that I could develop the rest of the feature without waiting for wgpu to implement 64bit atomics.

It should run on DX12 (SM 6.6+ iirc) and Metal (M2+ iirc), except wgpu/naga is bugged on DX12/Metal with these shaders.

JMS55 · 2024-08-28T17:45:08Z

For my future reference, another good SW raster tutorial: https://web.archive.org/web/20050408192410/http://sw-shader.sourceforge.net/rasterizer.html

JMS55 added 30 commits June 25, 2024 15:08

Rename raster to hardware raster

7a31e5b

Switch material depth pass to read visbuffer for material ID, skip writing material ID during raster

0ac944a

WIP: Atomics for hardware raster

271c05f

Use dummy render target

c30c251

Add Meshlet::vertex_count

e5c359c

Depend on SHADER_INT64_ATOMIC_MIN_MAX

3d0c896

Misc

cae89fc

Add MeshletMesh::worst_case_meshlets

01573b8

Rename material passes

7149af3

WIP

570419a

Cherry pick culling fix

4f6ac3f

Bugfixes

7ce0cdb

WIP: Setup SW raster clusters list

cad3064

WIP: SW raster shader

c32cbe4

WIP: SW raster

3bd7242

Fix dispatch.yz size

0b571d8

WIP

46cc02f

Misc

a6c844f

Misc

8950035

Misc

9aece81

Fix bug

f954088

Fix bug from variable shadowing

d6770d1

Add TODOs

fd8137c

Make edge functions incremental

7d3ef9e

Remove dummy

495c206

Testing setup

5552706

Merge commit '9da18cce2a1cede5899086a126ec931f3a3727ab' into meshlet-sw-raster2

f3fc40d

Fix bugged resolve_material_depth

efafbc0

Fix shading bugs

2e9a83a

Misc

0408d3d

JMS55 added 5 commits August 12, 2024 14:16

Merge commit '6ab8767d3bf54343c6eb613d25c2cd58311fb9a2' into meshlet-sw-raster2

8204b69

WIP: Update example

35384a6

Fix LOD error bug in mesh -> meshlet mesh conversion

bc4a5df

Misc doc tweaks

3f30d2a

Merge commit 'cf694889826396f51915d229d591579f9fa26007' into meshlet-sw-raster2

2f65919

JMS55 requested a review from IceSentry August 19, 2024 19:59

Disambiguate function call

51f0d16

atlv24 approved these changes Aug 22, 2024

JMS55 added 9 commits August 21, 2024 23:01

Misc change

a42bb6f

Set cluster_buffer_slots to tentative 4096

2698780

Merge commit '938d810766d34f1a300beb440273c3db1635ee5c' into meshlet-sw-raster2

4a06bfd

Double cluster_buffer_slots

1415630

Tweak error message

83010dc

More error tweak

3dc2422

Update bunny URL

52acc2d

Yep mixed rendering methods are still bugged

dc00111

Merge commit 'e07119a0f971da2c0eb76a4294d9bf3326d0cf19' into meshlet-sw-raster2

c82516a

IceSentry reviewed Aug 24, 2024

IceSentry approved these changes Aug 24, 2024

IceSentry added S-Ready-For-Final-Review This PR has been approved by the community. It's ready for a maintainer to consider merging it and removed S-Needs-Review Needs reviewer attention (from anyone!) to move forward labels Aug 24, 2024

alice-i-cecile added this pull request to the merge queue Aug 26, 2024

Merged via the queue into bevyengine:main with commit 6cc96f4 Aug 26, 2024
27 checks passed
