substantial performance degradation in bevy 0.10 #7982

Closed
ruabmbua opened this issue Mar 8, 2023 · 16 comments
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times C-Regression Functionality that used to work but no longer does. Add a test for this!

Comments

@ruabmbua

ruabmbua commented Mar 8, 2023

Bevy version

bevy 0.10

[Optional] Relevant system information

CPU:
  Info: 12-core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP cache:
    L2: 6 MiB
  Speed (MHz): avg: 2479 min/max: 2200/4950 cores: 1: 3700 2: 2200 3: 2200
    4: 2200 5: 3009 6: 2200 7: 2200 8: 2200 9: 2200 10: 2200 11: 2200 12: 2200
    13: 3599 14: 2200 15: 2200 16: 2200 17: 3700 18: 3700 19: 2200 20: 2200
    21: 2200 22: 2200 23: 2200 24: 2200
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
    driver: amdgpu v: kernel
  Display: wayland server: X.org v: 1.21.1.7 with: Xwayland v: 22.1.8
    compositor: sway v: 1.8.1 driver: X: loaded: modesetting dri: radeonsi
    gpu: amdgpu resolution: 1: 2560x1440~144Hz 2: 1920x1080~60Hz
  API: OpenGL v: 4.6 Mesa 22.3.6 renderer: AMD Radeon RX 5700 XT (navi10
    LLVM 15.0.7 DRM 3.49 6.2.1-arch1-1)
  • cargo 1.70.0-nightly (9880b408a 2023-02-28)
  • Linux 6.2.1-arch1-1 archlinux distro
AdapterInfo { name: "AMD Radeon RX 5700 XT (RADV NAVI10)", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 22.3.6", backend: Vulkan }

What you did

I upgraded to bevy 0.10

What went wrong

After upgrading my project to bevy 0.10, performance suffered a lot. In some situations, release builds slowed down from 140+ fps to below 100. In debug builds with dependency crates still set to opt-level 3 (for development speed), performance dropped from 140+ fps to below 20 in some situations and never exceeded 60.

I know that changes to engine code can make some situations slower, but I did not expect a slowdown this large, especially since even an iGPU could previously run my project easily at 60 fps.

Additional information

I changed my project a bit to make the situations more reproducible (disabled dynamic map generation) and recorded two profiles for a before/after comparison.

Here is the file: https://github.com/ruabmbua/bevy-traces/blob/main/traces.tar.xz

A first look at the traces suggests that CommandEncoder::run_render_pass is the problem: it takes 20 ms instead of 900 µs.
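To put these numbers in perspective, a frame rate converts into a per-frame time budget of 1000 / fps milliseconds. The small sketch below does that arithmetic for the figures quoted in this report. It assumes the 20 ms pass sits on the critical path of the frame; the traces point that way but this is not proven here.

```rust
/// Converts a frame rate (frames per second) into a per-frame budget in milliseconds.
fn frame_budget_ms(fps: f64) -> f64 {
    1000.0 / fps
}

/// Upper bound on the frame rate if one step on the critical path takes `step_ms`.
fn max_fps_for_step(step_ms: f64) -> f64 {
    1000.0 / step_ms
}

fn main() {
    // Frame rates mentioned in the report.
    for fps in [140.0, 100.0, 60.0, 20.0] {
        println!("{fps:>5} fps -> {:.2} ms per frame", frame_budget_ms(fps));
    }
    // run_render_pass timings from the traces: 900 µs before, 20 ms after.
    let before_ms = 0.9;
    let after_ms = 20.0;
    println!("slowdown factor: {:.1}x", after_ms / before_ms);
    println!("a 20 ms step alone caps the frame rate at {:.0} fps", max_fps_for_step(after_ms));
}
```

A 20 ms step alone exceeds the 16.67 ms budget of a 60 fps frame, and a 140 fps frame has only about 7.14 ms in total.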

@ruabmbua ruabmbua added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Mar 8, 2023
@alice-i-cecile alice-i-cecile added C-Performance A change motivated by improving speed, memory usage or compile times C-Regression Functionality that used to work but no longer does. Add a test for this! A-Rendering Drawing game state to the screen and removed S-Needs-Triage This issue needs to be labelled C-Bug An unexpected or incorrect behavior labels Mar 8, 2023
@alice-i-cecile
Member

@superdump @james7132 you two will be interested in this.

Thanks for the traces!

@alice-i-cecile alice-i-cecile added this to the 0.10.1 milestone Mar 8, 2023
@ruabmbua
Author

ruabmbua commented Mar 8, 2023

Note that this is my correct GPU:

AdapterInfo { name: "AMD Radeon RX 5700 XT (RADV NAVI10)", vendor: 4098, device: 29471, device_type: DiscreteGpu, driver: "radv", driver_info: "Mesa 22.3.6", backend: Vulkan }

GH issues replaced my paste with something else??

@ruabmbua
Author

ruabmbua commented Mar 8, 2023

Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed.
There are no issues with other types of lights.

@james7132
Member

The first thing I'm noticing is the amount of time spent encoding shadow passes, as you've mentioned in the issue description. This seems like it's an immediate result of the cascaded shadow map changes.


@alice-i-cecile alice-i-cecile removed this from the 0.10.1 milestone Mar 8, 2023
@Elabajaba
Contributor

Quoting @ruabmbua: "Update: it turns out DirectionalLight is the problem. When I disable the only one in the scene (for simulating the sun), it seems to be fixed. There are no issues with other types of lights."

Do the other lights cast shadows?

@ruabmbua
Author

ruabmbua commented Mar 8, 2023

Nope, they don't cast shadows. I just enabled shadows on them, though, and there seems to be no performance impact.

@ruabmbua
Author

ruabmbua commented Mar 8, 2023

Here is a GPU trace, which can be viewed with the Radeon GPU Profiler (RGP):

bevy-voxel-experiment_2023.03.08_22.26.55.zip

@ruabmbua
Author

ruabmbua commented Mar 8, 2023


It seems to me that all the shadow cascade passes have barely any GPU utilization, but I am too much of a noob with RGP to figure out why they're stalled.

@ruabmbua
Author

ruabmbua commented Mar 8, 2023

Looking at the passes, it seems only the vertex shader units are in use, while actual hardware utilization is only about 0-1% for memory, cache, ALU and load-store.
Digging into instruction timing, there are some s_waitcnt instructions, and most of the time is spent waiting in them.

@superdump
Contributor

I was going to ask about your scene and materials. I see from the Tracy trace that you're using voxels. Can you please provide a lot more information about exactly how your voxels are rendered, the scene being rendered (number of entities with meshes), and what material(s) you're using?

@ruabmbua
Author

Sorry for the late update:

  • There are 4096 chunks (64x64 grid) in the scene
  • Each block type in a chunk gets its own mesh
  • No custom shaders; the chunk meshes just use the standard PBR material with a base color texture

And here are some stats of the test scene:

total meshes: 5506
total primitives: 1963452
avg primitives / mesh: 356

And here is a picture of the camera view I used to compare the engine versions and to capture the traces:

[Image: camera view of the test scene]

@ruabmbua
Author

I just changed some of the settings for my chunk renderer. I reduced the number of chunks by a factor of 4 and compensated by making each chunk larger. The goal was to reduce the total number of meshes.

It seems that partially fixes the performance problem, while still rendering somewhat the same geometry.
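Covering the same area with a quarter of the chunks corresponds to doubling the chunk width along both horizontal axes, assuming square chunk columns (the thread does not say how the chunks were resized). The sketch uses a hypothetical chunk width of 16 voxels purely for illustration. Because each block type in a chunk gets its own mesh, the mesh count can drop by up to the same factor of 4, and by less when the merged chunks contain different block types.

```rust
/// Number of chunks covering a square region `world_width` voxels wide,
/// using square chunk columns `chunk_width` voxels wide (assumes exact division).
fn chunks_in_region(world_width: u32, chunk_width: u32) -> u32 {
    let per_side = world_width / chunk_width;
    per_side * per_side
}

fn main() {
    // Hypothetical chunk width of 16 voxels; the real value is not stated.
    let world_width = 64 * 16; // 64 x 64 grid of chunks, as in the original setup
    let before = chunks_in_region(world_width, 16);
    let after = chunks_in_region(world_width, 32); // chunks twice as wide in x and z
    println!("{before} chunks -> {after} chunks");
}
```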

Was my previous approach with so many unique meshes unreasonable? I do not have much experience with this stuff.

@superdump
Contributor

Currently, bevy issues one draw command per mesh. I've been experimenting with, learning about, and looking into what are called ‘batching’ and ‘instancing’ #89, which draw the same things with far fewer draw commands; that reduction can bring very large performance benefits. Within the constraints of current bevy rendering, with one draw per mesh, merging meshes to reduce the draw count will improve performance. I’m speaking loosely here, as there are many things that can improve rendering performance, and voxels in particular have many possible rendering techniques. How many voxels are there in your scene?
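Within a one-draw-per-mesh model, merging meshes that share a material is the most direct way to cut the draw count. The sketch below shows the core of mesh merging on a hypothetical `SimpleMesh` type (positions and triangle indices only, not bevy's `Mesh`): vertex buffers are concatenated, and each mesh's indices are offset by the number of vertices already in the merged buffer. A real merge would also need to carry normals, UVs and other attributes, and only meshes with the same material can share one draw.

```rust
/// Minimal CPU-side mesh (hypothetical): vertex positions plus a triangle index list.
#[derive(Debug, Clone, Default, PartialEq)]
struct SimpleMesh {
    positions: Vec<[f32; 3]>,
    indices: Vec<u32>,
}

/// Merges several meshes into one so they can be submitted with a single draw.
/// Each appended mesh's indices are offset by the vertex count already merged,
/// so they keep pointing at that mesh's own vertices.
fn merge_meshes(meshes: &[SimpleMesh]) -> SimpleMesh {
    let mut merged = SimpleMesh::default();
    for mesh in meshes {
        let base = merged.positions.len() as u32;
        merged.positions.extend_from_slice(&mesh.positions);
        merged.indices.extend(mesh.indices.iter().map(|i| i + base));
    }
    merged
}

/// A unit quad (two triangles) starting at x, used as example input.
fn quad(x: f32) -> SimpleMesh {
    SimpleMesh {
        positions: vec![[x, 0.0, 0.0], [x + 1.0, 0.0, 0.0], [x + 1.0, 1.0, 0.0], [x, 1.0, 0.0]],
        indices: vec![0, 1, 2, 0, 2, 3],
    }
}

fn main() {
    let merged = merge_meshes(&[quad(0.0), quad(1.0)]);
    println!("{} vertices, {} triangles", merged.positions.len(), merged.indices.len() / 3);
}
```

Two separate quads would need two draws; the merged mesh draws both with one.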

@ruabmbua
Author

The "volume" of loaded voxels in the scene consists of 67108864 voxels.
Of course, much of that is empty and does not actually generate any meshes.

I know about instancing, but I doubt it would help with this "Minecraft"-style voxel rendering. I think it's not possible to use instancing with different geometry?
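That intuition matches how classic instancing works: one instanced draw renders the same vertex and index data many times, varying only per-instance data such as transforms. So it only saves draws when many entities share a mesh. The sketch models entities by hypothetical integer mesh IDs and counts draws with and without grouping by mesh.

```rust
use std::collections::HashSet;

/// Draw commands needed when each entity is drawn separately.
fn draws_without_instancing(entity_meshes: &[u32]) -> usize {
    entity_meshes.len()
}

/// Draw commands needed with instancing: one instanced draw per distinct mesh,
/// because all instances in a draw share the same geometry.
fn draws_with_instancing(entity_meshes: &[u32]) -> usize {
    entity_meshes.iter().collect::<HashSet<_>>().len()
}

fn main() {
    // Many copies of a few shared meshes (e.g. props): instancing helps a lot.
    let props: Vec<u32> = (0..1000).map(|i| i % 3).collect();
    // Unique chunk meshes: every entity has its own geometry, so no reduction.
    let chunks: Vec<u32> = (0..1000).collect();
    println!("props:  {} -> {}", draws_without_instancing(&props), draws_with_instancing(&props));
    println!("chunks: {} -> {}", draws_without_instancing(&chunks), draws_with_instancing(&chunks));
}
```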

Merging the meshes, however, seems to make sense, since adjusting my parameters already improved things a lot.

@ruabmbua
Author

I think the GPU might just struggle to use all of its resources for vertex shading when there are too many draw calls with too few primitives in each.
I compared the utilization in the GPU profiler, and it is now 2-4 times better.

What is strange, however, is that old bevy was still a lot faster, even with the smaller chunks and more total meshes.
I guess the extra render passes, each of which runs the vertex shaders again, just make the cost go up a lot.
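The cost of those extra passes can be estimated with simple arithmetic. Bevy 0.9 rendered a single shadow map per directional light, while 0.10 renders one shadow view per cascade. The sketch assumes 4 cascades (configurable via `CascadeShadowConfig`), one draw per mesh, that every mesh casts shadows, and no per-cascade culling, so its results are upper bounds rather than measurements. It uses the scene statistics reported earlier in this thread.

```rust
/// Per-frame workload when every mesh is drawn once in the main pass and once
/// more for each shadow view of the directional light.
/// Assumptions: one draw per mesh, every mesh casts shadows, no per-view culling.
struct FrameWork {
    draws: u64,
    primitives: u64,
}

fn frame_work(meshes: u64, primitives: u64, shadow_views: u64) -> FrameWork {
    let passes = 1 + shadow_views; // main pass + shadow passes
    FrameWork {
        draws: meshes * passes,
        primitives: primitives * passes,
    }
}

fn main() {
    // Scene statistics reported in the thread.
    let (meshes, primitives) = (5506, 1_963_452);
    let single_map = frame_work(meshes, primitives, 1); // one shadow map, as before 0.10
    let cascaded = frame_work(meshes, primitives, 4); // assumption: 4 cascades
    println!("single shadow map: {} draws, {} primitives", single_map.draws, single_map.primitives);
    println!("4 cascades:        {} draws, {} primitives", cascaded.draws, cascaded.primitives);
}
```

Under these assumptions, both the draw count and the vertex work grow by 2.5x (from 2 passes to 5), which would hit hardest when each draw carries only a few hundred primitives.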

Maybe it's possible to run the cascaded shadow map render passes at the same time? I'm not sure exactly what the algorithm looks like in bevy, but I imagine they could run in parallel and increase GPU utilization again to fix the performance?

@ruabmbua
Author

ruabmbua commented Mar 15, 2023

@superdump I think this can be closed. It's still possible to get good performance in bevy 0.10; it just needs some tweaking in the game code.

And I much prefer the new cascaded shadow maps over the old directional light shadows. Now the shadows actually cover the whole scene I have, not just a large part of it.
