Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Heavy Performance Degradation after making shadows of large amounts of small objects #40059

Closed
gdesecrate opened this issue Jul 2, 2020 · 92 comments

Comments

@gdesecrate
Copy link

gdesecrate commented Jul 2, 2020

Update: First sample scene was made from artistic perspective and not displays issue really efficent,
For better and cleaner example please see the #40059 (comment)

Godot Engine v3.2.2.stable.official - https://godotengine.org
OpenGL ES 2.0 Renderer: Radeon RX 5500 XT (NAVI14, DRM 3.37.0, 5.7.0-3-MANJARO, LLVM 10.0.0)
OpenGL ES 2.0 Batching: ON

Note: by large amount I mean amount of spawned/despawned meshes, not the total amount of objects on scene in the same frame.

Really small objects cause persistent performance drop if shadows are enabled.

Steps to reproduce:
Enable shadows, and scale objects down. After noticeable amount of objects (shrinked on the screen) the performance degradation happens and persist even after scene reload. If you disable shadows - performance drop is reduced.

Minimal reproduction project:
Godot Shadow Issue.zip

After around 10,000 cubes (number in top left corner) the small spikes happed in FPS/process time
Screenshot_20200702_180923
Close to 30k cubes the performance drop is crazy and reloading the scene does not helps!
Screenshot_20200702_181409
If you turn off shadows (there is checkbox for it in top right corner) the process time is back to normal. Profiles shows degradation as "idle time".

If you unpress checkbox "Despawn by shrinking" the performance graph is "flat"
Screenshot_20200702_184539

(!!note the giant spike is caused by me moving window over)

The cube metrics in my sample is not very good because they probably some die off screen and don't cause the shadow issue.

@akien-mga akien-mga added this to the 3.2 milestone Jul 2, 2020
@clayjohn
Copy link
Member

clayjohn commented Jul 2, 2020

30,000 shadow casting objects is a lot! You are likely draw call bottlenecked. In GLES2 there is a draw call for each object for each light. (2 lights = 2 draw calls). If the light is casting shadows you double it (2 lights with shadows = 4 draw calls).

With 1 light you would have 60,000 draw calls! I'll bet if you look at your performance monitor you will see your CPU is maxed out while your GPU is barely running.

@gdesecrate
Copy link
Author

30,000 shadow casting objects is a lot! You are likely draw call bottlenecked. In GLES2 there is a draw call for each object for each light. (2 lights = 2 draw calls). If the light is casting shadows you double it (2 lights with shadows = 4 draw calls).

With 1 light you would have 60,000 draw calls! I'll bet if you look at your performance monitor you will see your CPU is maxed out while your GPU is barely running.

30k is for index of cube, it means "30000th cube was spawned", not that there 30k cubes on the scene. There is like 1.7k of objects on scene at any time. Every cube dies after 2-3 seconds. If they die by shrinking - it corrupts the shadows. So hard you gonna remember it even after scene reload. If it does not shrinks before queue_free() - the shadows does not get corrupted.

I was hunting for that bug for a week. It really was ruining my game.

@clayjohn
Copy link
Member

clayjohn commented Jul 2, 2020

I have now run the example scene provided. The slowdown looks to be a combination of two things:

  1. Scaling physics objects to near 0 causes a slowdown. This is a known issue, see: Godot acts weird when scale of a mesh is near 0. #31134
  2. Having shadows on causes slowdown. As explained above, with so many objects, you are facing a draw-call bottleneck. Having multiple lights scales poorly in GLES2. Having shadows makes this even worse.

Light issue
Your scene has 4 active lights: 2 DirectionalLights, an Omnilight, and a Spotlight. Every light has shadows turned on. The DirectionalLights are each using 4 quadrants. This means that to draw a single cube you are using up to 14 draw calls (1 for each light, 4 for each DirectionalLight, and 1 each for the others).

Depending on where the lights and cubes are positioned, the cube needs to be drawn once for each light, and once for each shadowmap/shadow quadrant. So, even if there are only 1,700 objects on screen you are sending 23,800 draw calls. Like I said before that is a lot of drawcalls. On top of the number of drawcalls, each light needs to perform culling, you have the max distance of the lights set to 400, so almost nothing gets culled.

Summary

Essentially, you have created a worst case scenario for rendering in GLES2. You have a pretty good GPU, but it never gets the opportunity to work because your CPU is working at max capacity trying to keep up with the draw calls and to handle the physics you are throwing at it.

@gdesecrate
Copy link
Author

I do not scale physics objects, they still have the same collusion shape. I scale down only the meshinstance (rendering) node.

Please read the issue first before counting drawcalls and run both tests till like 30k cubes and compare the results.
Restart the app, instantly unpress the checkbox "shrink" and see that there is no degradation. Idk what bottleneck since it works fullspeed on my machine if I do not shrink the meshes with shadows on.

Flow of small objects cause persistent shadow performance degradation. This sample scene is a synthetic test I made to showcase the issue I had in my game.

In my game I removed shrinking objects and accumulative performance degradation was fixed.

@JohanAR
Copy link
Contributor

JohanAR commented Jul 3, 2020

I ran this too but I couldn't wrap my head around it.. I made a modification so I could see how many cubes were active, and it was 182 all the time unless I pressed the button to spawn more (by counting number of instances in the scene graph btw.) According to the debug monitor I had around 1600-1800 draw calls, which isn't that high, assuming Godot reports an accurate number.

I also changed to die-function to this:

var tmpscale = 1.0-(life-diesafter)
if tmpscale < 0.1:
	tmpscale = 0.1
mi.scale=(tmpscale)*Vector3(1,1,1)

So unless my brain has betrayed me, scale should never be less than 0.1 which isn't extremely small and this slowdown still happens. If I change the limit to 0.2 the fps doesn't drop as quickly, but I can see cpu usage slowly going up so I'm going to leave it like this for a couple of minutes to see what happens.

On the other hand, if I reverse the function so that objects grow instead of shrink:
var tmpscale = 1.0+(life-diesafter)
there is no gradual slowdown. I.e. exactly the same scene can run forever at 144Hz as long as the objects aren't scaled down. But if they are scaled down fps starts dropping after ~10 seconds.

Can see it with htop as well:
If I scale cubes down one thread goes to 100% cpu and it starts lagging.
If I scale cubes up the top thread stays at 50% cpu.

I also tried changing the project the GLES3 and I saw the same behaviour there. I have no idea why this happens, but to me it looks like some kind of resource leak associated with small objects and shadows.

@gdesecrate
Copy link
Author

I have encountered this issue while working on my game and most of objects were shrinking before queue_free(). And they were spawning alot (they were small bouncing particles).

Since it was piling up "idle time" on profiler really hard - I thought it was because my bad scripting or issues with GDscript. I spent last week trying to isolate this issue and found it.

For my current game I already removed shrinking and it solved the issue for my current project.

But maybe this issue is affected not just by meshinstance node scale but also by shadow texture or screen size or some ratio between texture size and object scale (I haven't tested it) or even shadow compression parameters. Maybe on 4k displays the behavior is different and degradation happens on different rate.

@lawnjelly
Copy link
Member

lawnjelly commented Jul 4, 2020

I just tried it and for me, the minimum reproduction project isn't really doing the job it should be doing - there is too much going on. We really need a simpler situation with nothing varying except the minimum needed to cause one specific bug, so we can track down what is going wrong (if anything) and fix it.

If you provide a complex scenario, it's difficult to reason with it, because it is complex (there could be multiple possible causes for an issue).

For instance - is this to do with physics or not? Try a situation without physics, do you get the same issue? You are combining physics, scaling, lighting (different types), shadows (different types), plus spawning, throwing it together into a complex mix without pinning down what we should be looking at.

@gdesecrate
Copy link
Author

It was a simple project to synthesize the flow of really small meshes casting/recieving shadows. Was a proof of concept.

Do you need me to create another, more simplier scene? If I remove physics, I probably can spawn meshes even faster to torture the issue.

@lawnjelly
Copy link
Member

lawnjelly commented Jul 5, 2020

Do you need me to create another, more simplier scene?

Yes. Any minimum project should be as simple as possible in order to remove confounding factors.
https://en.wikipedia.org/wiki/Confounding

More info here in the report a bug section:
https://github.com/godotengine/godot/blob/master/CONTRIBUTING.md

@gdesecrate
Copy link
Author

gdesecrate commented Jul 5, 2020

Here you go
shadowbug_.zip
I have managed to reach and display this issue even faster.
a) Random Cubes, shadows:
Screenshot_20200705_110529
b) Random Cubes, no shadows:
Reproduced the:

  1. Scaling physics objects to near 0 causes a slowdown. This is a known issue, see: Godot acts weird when scale of a mesh is near 0. #31134

I do not know why you do refer meshes as "physics objects"
Screenshot_20200705_111937
c) 1.0 scale cubes, no shadows
Screenshot_20200705_112218
d) 1.0 scale cubes, shadows
Screenshot_20200705_112922

@lawnjelly
Copy link
Member

lawnjelly commented Jul 5, 2020

Even without shadows there seems to be a memory leak, looking at Monitors: Memory Static.

When I comment out this line in msh.gd:

	scale=(0.001+randf())*Vector3(1,1,1)

There is a slow leak.

When I change this line to:

	scale = Vector3(1, 1, 1) * 0.001

The memory leaks much faster.

Maybe a leak detection build will track this down, I don't have time at the moment but maybe someone else wants to try in the meantime. At a guess, it could be that leaked objects are still running some kind of processing (octree maybe?).

Leak is slower with constant position and scale, which suggests rendering / octree or physics.

@gdesecrate
Copy link
Author

Screenshot

@JohanAR
Copy link
Contributor

JohanAR commented Jul 9, 2020

What do you want to say with that screenshot? It looks like some person claimed that you can't use queue_free() from an object, which is incorrect. I don't see what this contributes to your issue.

Official FPS tutorial using queue_free() to let objects free themselves

@volzhs
Copy link
Contributor

volzhs commented Jul 9, 2020

I guess he has used godot more than 3 years. (but maybe not be noticed changes)
queue_free() did not free if object is not in scene tree before 3 years ago.
but it's changed to free it whether it's in scene tree or not after 4820dfc

@lawnjelly
Copy link
Member

I just did a little debug. It does appear to be leaking octants in the octree. The octant_count just keeps rising. I tried calling _optimize() after each erase, but it didn't help.

This will probably require @reduz or someone familiar with octree to fix.

@JFonS
Copy link
Contributor

JFonS commented Jul 30, 2020

@gdesecrate I tested you reproduction project and there is a performance degradation indeed. The internal Octree gets trashed by all the tiny objects being added and removed from it.

This issue will need to be investigated and fixed, but if you need to publish your game ASAP, as a workaround I can propose you scale everything in the game by a factor of 10. From what I tested the issue only appears when adding/removing objects smaller than about (0.07,0.07,0.07).

@gdesecrate
Copy link
Author

gdesecrate commented Jul 30, 2020

@JFonS I still got month to planned release. But all seems fine if I disable shadows. And if I going to release demo (~3 maps) the performance drop shouldn't appear much unless demo being replayed many times.

It seems it is complex issue 1) small meshes destroy performance 2) shadows performance (if they are on) get's damaged by small polygons

I believe that issue also(!) affect shadow maps even at larger scale. Because slowdowns (appears after beating the few maps) are gone if I disable shadows in ingame menu and back when I do turn it on.

Most of my ingame objects are 1.0+ (I spent hours reworking them). Few are still around 0.5 but they don't respawn and should not damage the performance much.

I mean my game uses directional light with big distance. And even 1.0 scale objects damage it somehow as I believe.

MeshInstance with 1.0 scale can damage performance as well if it is small sized one (really small geometry). I had to replace sparks with sprites.

I was asking for way to reset the renderer so I could do it at every map load/player death and It would resolve the problem for my current situation.

@lawnjelly
Copy link
Member

lawnjelly commented Jul 30, 2020

Another option is to use pools. This may (or may not) get around the leak issue in the octree.
i.e. If you know you will need maximum 100 boxes, create them at startup and reuse them.

Actually, scratch that idea. I just tried it. The leak is also triggered simply by moving small objects around. Maybe it's creating large depth to the octree and not clearing up after itself when you move objects.

@gdesecrate
Copy link
Author

@lawnjelly It was first thing I tried to do month ago. Did not helped at all. And repluging nodes from tree and back gave me wierd long names like "@@@@@@@@@@@@@@@@@@@@@@@@@@@...." as I recall. I had to rework most of visual effects to make sure issue does not appear after 30 seconds of ingame time (

@Calinou
Copy link
Member

Calinou commented Jul 30, 2020

@gdesecrate The node name length might be fixed by #40724.

@reduz
Copy link
Member

reduz commented Aug 1, 2020

From the look of things it seems like Octree is "leaking" nodes. When a node has no longer any children it should be deleted, and it seems this is not happening, so stuff that culls the octree goes through a lot more children nodes than it should. I don't think this was a problem before, otherwise other users might have noticed so it may have been introduced by a recent change. I suggest doing bisection, otherwise try to verify in octree.h when nodes are deleted if the conditions are properly met.

This class is extremely difficult to debug and its slow, so for 4.0 I will remove it entirely and use brute force. In the mean time I suggest seeing if you guys can find what the problem is.

@gdesecrate
Copy link
Author

gdesecrate commented Aug 1, 2020

@reduz Is there a GDScript command to purge it? Or some hack to clear nodes for octtree?
I believe I am affected by that issue since 2019 in my previous FPS game

Maybe noone noticed because they don't have long playing sections or alot of dynamic objects.
I have stumbled on this bug because I had alot of mesh objects scaling down as visual effect.

@reduz
Copy link
Member

reduz commented Aug 1, 2020

No, it should not need purging. If it leaks nodes its a bug.

@reduz reduz closed this as completed Aug 1, 2020
@clayjohn
Copy link
Member

clayjohn commented Aug 6, 2020

@gdesecrate No. 3.2.3 has already had 2 release candidates, so it will only accept priority bug fixes that are not risky. We are not sure this is even a proper fix so this is a high risk bug fix. At best it will be in 3.2.4

@lawnjelly
Copy link
Member

lawnjelly commented Aug 6, 2020

I'm away from home so have only been able to look over the octree.h file by eye (can't compile), but in addition to the great points @naithar has made (I'm not sure they are the best fix yet, will try when back, but looks like the right avenue):

While the whole idea of using octree is maybe not the best with modern hardware (hence the idea of moving to different method in 4.x) there seem to be a number of potential problems in this implementation. I'll list here just a few things (some of which may not be a problem in practice, as I say I cannot test here):

Paradigm

The most questionable thing seems to be the paradigm of the API. In common with physics, you have a choice, you can:

  1. Move / Create / Erase an object
  2. keep the tree up to date
  3. check for all collisions
  4. then move on to next object

This is what octree seems to do currently. The problem is it can be inefficient, doing essentially the same processing multiple times, and also result in collisions where there are none. For instance if 2 objects are social distancing at 2 metres, and they move 2 metres per tick, then after moving A it will be colliding with B. This reports a pair / collision. However later in the tick B moves also so there should be no collision.

The solution often used is to move processing into 2 stages:

  1. Create / move / erase ALL objects
  2. Keep tree up to date
  3. Process collisions

With a large number of moving objects the first method can generate a lot of repetition in processing that can be done once with the second method. It also makes it easier to debug.

Also for 4.x making a distinction between broad phase and narrow phase may be an idea. When using rooms for instance, there is typically no need for an octree, and you would replace this as the broad phase (or have an optional octree per room, for instance). The best choice for broadphase may depend on the game.

e.g. Broadphase output might be, for a collision check against a convex hull:
List of known within objects: 1, 4, 5, 20, 22
List of possible objects: 2, 3, 15, 16

That way no test needed for the known objects (where the bound was completely within) and the possible objects could be brute force tested with an optimized routine.

And instead of using generic tests like against convex hull, you might test with more specific versions like frustum tests for cameras and lights, which can then do occlusion culling in the broad phase.

Also there is the opportunity to use the same spatial partitioning scheme for godot physics, ray tests etc.

Specifics

A few specifics I noticed:

Move

If an element shrinks, it doesn't seem to be placed in a smaller quadrant

remove_element_from_octant

  1. Unpairing even if paired in multiple octants (maybe this is what the ref count is for - i'm not sure the pairing system is very efficient)
  2. Remove element from octant doesn't appear to actually remove the element from the octant... (especially when called from move)

remove_element

Doesn't remove octants for use_pairs.

Overall I think rather than just a fix without full understanding we need a thorough systematic debugging of this template - perhaps a rewrite of certain aspects, especially given the central nature of octree (having said that I wouldn't object to a bodge fix in short term).

As is, I believe it may be pathologically inefficient in a number of scenarios, and also suffer from multiple bugs. It would also be an option to just replace it with linear methods (and maybe SAP as reduz suggested or something like that). Normally I would agree with keeping as much the same as possible as we are aiming for bug fixes rather than new features, but there seem to be so many potential problems with the current implementation, that starting fresh now with the simpler solution could be advisable.

Anyway keep up the good work, I'll also have a proper look when I'm home next week.

Edit : Also the area @naithar has pinpointed I don't think will fix the leak on moving nodes. Despite _remove_element_from_octant not removing elements from octants, the actual removal is meant to happen during a move on line 925:

		if (use_pairs && e.pairable)
			o->pairable_elements.erase(F->get().E);
		else
			o->elements.erase(F->get().E);

The move basically seems to reinsert the element at the start of the move, then try to perform a delete so that the octants are preserved, and there are temporarily 2 references to an element.

The move bug may be occurring as a result of the order of operations, it may be worth having 2 loops through the owners here, one to erase pairable_elements and one to _remove_element_from_octant (which should be renamed, because it doesn't do this). I can't debug but this seems to be a possible area.

_optimize

Just to be clear this routine isn't to remove octants from the bottom of the tree. It is to shrink the root of the tree, if only one octant is being used. It should probably be renamed.

@naithar
Copy link
Contributor

naithar commented Aug 6, 2020

Edit : Also the area @naithar has pinpointed I don't think will fix the leak on moving nodes. Despite _remove_element_from_octant not removing elements from octants, the actual removal is meant to happen during a move on line 925:

Yeah, _remove_element_from_octant doesn't remove an actual element stored in octant. In move and erase->_remove_element it happens before _remove_element_from_octant gets called.
I'm not sure if _remove_element_from_octant naming is wrong or it was meant to do some more work before, but now it seems to be invalidating (I'm not sure how to call it) octants by going up from specific octant, while _optimize invalidates octants by going down from current root.

I haven't tested how move behaves under stress, so it could be that it still leaks some octants or even objects. I'll try to find out.

Edit
Yeah, if objects are moving I can see a constant growth of memory usage in Instruments app. Octree size is also going up.
I guess the reason for it is that objects are placed in common_parent which in case of "small objects" would force octree to go deeper and create more octants instead of using existing ones. That would block the invalidation from happening.

Edit

Moving

godot/core/math/octree.h

Lines 906 to 913 in 23b553b

//prepare for reinsert
e.octant_owners.clear();
e.common_parent = NULL;
e.aabb = p_aabb;
_insert_element(&e, common_parent); // reinsert from this point
pass++;
after _optimize method fixed octree not getting cleared, but resulted in crash after some time.

Replacing common_parent to root fixed the crash, but resulted in ERR_FAIL_COND(e.octant_owners.front() == NULL); to fire after some time.

Also, while move does cause a leak, after object gets freed by queue_free or whatever, memory gets freed with the "fix" in erase. Also it seems that unmodified move does manage to clean up octree correctly sometimes.

@naithar
Copy link
Contributor

naithar commented Aug 6, 2020

Testing erase fix on master branch doesn't seem to be causing any negative changes to the way Godot's working, but I guess it should be tested more, just in case.

@gdesecrate
Copy link
Author

I tried to fabricate a new test. Where cubes just move around (no respawning).
shadowcubes.zip
Issue did not manifested itself. Yet.

Maybe issue really happens only when meshes do spawn/despawn?

@naithar
Copy link
Contributor

naithar commented Aug 6, 2020

I tried to fabricate a new test. Where cubes just move around (no respawning).

shadowcubes.zip

Issue did not manifested itself. Yet.

Maybe issue really happens only when meshes do spawn/despawn?

Well I think the reason you are not getting massive degradation is mostly the way move is implemented.
Currently it inserts element not at root but at a nearest common parent. In some cases this results in using existing octant, in other it still forces octree to create a new one, but not as deep as when we are inserting elements to the root.
Also move sometimes manages to squeeze the tree, making it smaller and freeing up the memory.
So as move does really leak octants, it doesn't do it as much as spawn/despawn does. And thats the reason it might be missed, but monitoring shows leaks anyway :)

@lawnjelly
Copy link
Member

Am finally back so have had a chance to get a look at this too. @naithar fix is mostly good but there's a small potential problem which could cause a crash which be nice to have covered (this may have been the reason for the original bug, a counter to this crash). I've discussed it with reduz and there are a few other changes which can be made at the same time.

@naithar if you are able to make a WIP PR for your change, I'll write on that the modification that may be needed for safety, see if you agree. Meanwhile I'll have a look at making some of the other changes, for a separate PR.

Also it turned out that it may have been a false alarm from me on the moving, due to me not reading the gdscript in the test project properly (and not being able to run whilst away). Sorry, my bad! 😁

There are still some questions in my mind over whether the pairing could introduce leaks (by hanging references in child nodes, preventing _remove_element_from_octant working properly) but unless we have evidence of this being a problem, I guess we should just trust the existing code for now.

@naithar
Copy link
Contributor

naithar commented Aug 8, 2020

Also it turned out that it may have been a false alarm from me on the moving, due to me not reading the gdscript in the test project properly (and not being able to run whilst away). Sorry, my bad! 😁

In my testing moving objects, which causes octree's move to be called, does cause a memory usage growh which seems to result in FPS degradation after some time. But I'm not sure if it's entirely moves fault, as looking in Instruments app shows a really small increase in memory usage as well as it sometimes gets lower.

There are still some questions in my mind over whether the pairing could introduce leaks (by hanging references in child nodes, preventing _remove_element_from_octant working properly) but unless we have evidence of this being a problem, I guess we should just trust the existing code for now.

From what I've seen clearing paired elements seems to work fine, but maybe I just don't know where to look :)

@naithar if you are able to make a WIP PR for your change, I'll write on that the modification that may be needed for safety, see if you agree. Meanwhile I'll have a look at making some of the other changes, for a separate PR.

Sure, should it go to master branch too?

@lawnjelly
Copy link
Member

I'm facing the same problem as I'm just about to make PR.

Given that we have to tweak it first, and octree has diverged a bit in master (some changes to geometry), it would be best to start with just one version, probably master if you have it compiling. I'm probably going to make mine against 3.2 then port to master when finalized, as it will be a bit more involved to port.

@naithar
Copy link
Contributor

naithar commented Aug 8, 2020

I'm probably going to make mine against 3.2 then port to master when finalized, as it will be a bit more involved to port.

Then I guess it would be better to stick to 3.2 for now, at least this way it would be a bit easier to test

@naithar
Copy link
Contributor

naithar commented Aug 8, 2020

@lawnjelly this might be of some interest for move:

Stack trace for a Malloc 144 bytes section that's growing only on object's movement:

   0 libsystem_malloc.dylib malloc_zone_malloc
   1 libsystem_malloc.dylib malloc
   2 godot Memory::alloc_static(unsigned long, bool) src/core/os/memory.cpp:82
   3 godot DefaultAllocator::alloc(unsigned long) src/core/os/memory.h:65
   4 godot operator new(unsigned long, void* (*)(unsigned long)) src/core/os/memory.cpp:47
   5 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:495
   6 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:476
   7 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:476
   8 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:476
   9 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:476
  10 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::_insert_element(Octree<VisualServerScene::Instance, true, DefaultAllocator>::Element*, Octree<VisualServerScene::Instance, true, DefaultAllocator>::Octant*) src/core/math/octree.h:476
  11 godot Octree<VisualServerScene::Instance, true, DefaultAllocator>::move(unsigned int, AABB const&) src/core/math/octree.h:899
  12 godot VisualServerScene::_update_instance(VisualServerScene::Instance*) src/servers/visual/visual_server_scene.cpp:993
  13 godot VisualServerScene::_update_dirty_instance(VisualServerScene::Instance*) src/servers/visual/visual_server_scene.cpp:3452
  14 godot VisualServerScene::update_dirty_instances() src/servers/visual/visual_server_scene.cpp:3464
  15 godot VisualServerRaster::draw(bool, double) src/servers/visual/visual_server_raster.cpp:106
  16 godot VisualServerWrapMT::draw(bool, double) src/servers/visual/visual_server_wrap_mt.cpp:102
  17 godot Main::iteration() src/main/main.cpp:2122
  18 godot OS_OSX::run() src/platform/osx/os_osx.mm:3142
  19 godot main src/platform/osx/godot_main_osx.mm:71
  20 libdyld.dylib start

If I switch moving with spawning/despawning memory usage is pretty constant.

But I've made some logs and it seems like clearing old octants actually work:

[moving ElementID: 14] octants: 187 -->  185
[moving ElementID: 9] octants: 185 -->  187
[moving ElementID: 25] octants: 187 -->  190
[moving ElementID: 15] octants: 190 -->  185
[moving ElementID: 23] octants: 185 -->  184
[moving ElementID: 11] octants: 184 -->  181
[moving ElementID: 16] octants: 181 -->  184
[moving ElementID: 3] octants: 184 -->  182
[moving ElementID: 25] octants: 182 -->  182
[moving ElementID: 9] octants: 182 -->  180
[moving ElementID: 14] octants: 180 -->  178
[moving ElementID: 12] octants: 178 -->  175
[moving ElementID: 8] octants: 175 -->  174
[moving ElementID: 11] octants: 174 -->  176
[moving ElementID: 25] octants: 178 -->  184

So I guess increased memory usage and octree size might be due to the way octree works and it's okay.

Applying #41123 shows the same results - octree growth on objects getting farther away, but octants count is growing much slower and is much smaller. After running test app for 5 minutes with 2500 moving objects octree got only 8 octants.
I've also noticed testing with 25 moving objects results in same octree size.

@lawnjelly
Copy link
Member

Yes, if you call debug_octants on my PR version, it will list the octant tree which can help show what is happening.

Like you say, it may be that the leak you were seeing on move isn't actually a genuine leak, but a consequence of how the octree works, the number of octants may increase a bit as you run with random moves, then level out and stabilise, then these will be properly deleted when you remove the elements.

I couldn't seem to coax any leaking octants on move in my tests, and in the godot debug monitor static memory etc wasn't increasing.

The observations you made on my PR are correct. The default octant size is set to a high value (8096 elements per octant I think). The upshot is that brute force actually seems to be faster than using the octree. At least for render culling. I've yet to fully test it for the other 2 octrees (visibility notifier and godot physics). The octant size is configurable.

We'll be looking at multiple possible spatial partitioning schemes for 4.x, e.g. rooms and also perhaps including a bounding volume hierarchy like bullet, there may still be an octree. But as always we need to check whether brute force is actually faster. With modern CPUs it can be surprising but the benefits of cache etc can mean that brute force trumps algorithms (up to a certain point).

@JohanAR
Copy link
Contributor

JohanAR commented Aug 13, 2020

This might be a controversial opinion, but wouldn't it be better to just pull in someone else's octree library? Surely there has to be a lightweight, fast, well tested and well documented library with a suitable license somewhere? Not just referring to this particular bug, but I had a look at the code and at least IMHO it was very difficult to read and verify its correctness.

@Zireael07
Copy link
Contributor

Octree is going to be dropped for 4.0 either way, from what I heard, so pulling in a library doesn't make sense. Plus Godot's license is incompatible with GPL.

@gdesecrate
Copy link
Author

I know, it could too much to ask. But maybe 2.x branch should receive the patch as well? In case someone still supports their 3D games/products there.

@Calinou
Copy link
Member

Calinou commented Aug 13, 2020

@JohanAR See this section of Best practices for engine contributors. We prefer adding more dependencies only as a last resort. If there's a better solution (which is removing the octree system entirely and replacing it by a simpler approach), we'd rather go for that instead.

@lawnjelly
Copy link
Member

This might be a controversial opinion, but wouldn't it be better to just pull in someone else's octree library? Surely there has to be a lightweight, fast, well tested and well documented library with a suitable license somewhere?

Also note that in Godot 3.x despite the name, the octree does a lot more than a standard octree, it also handles pairing, which makes it difficult / impossible to simply drop in a replacement.

@JohanAR
Copy link
Contributor

JohanAR commented Aug 14, 2020

Well if it's being dropped then it doesn't make sense to pull in a replacement.

I know of Godot's goal to minimize dependencies, and I don't want to go far off topic here, but at least to me an octree sounded like a good candidate for importing: It's a pure maths/data class so unlikely to require additional dependencies, it's unlikely to require updating once pulled in, and it would be good if it was thoroughly unit tested (by someone else :))

@Zireael07 I only had a quick look, but there were plenty of implementations with other licenses such as BSD and MPL. One was "GPL but email me if you want a more permissive license" :)

@lawnjelly I think the pairing stuff is one of the reasons I had difficulties figuring out exactly what the code does :)

But as you informed me that it's going away there's no need to spend more effort on it than necessary

@gdesecrate
Copy link
Author

gdesecrate commented Aug 27, 2020

Build https://github.com/Gramps/GodotSteam/releases/tag/g3-s149-gs37 seems to work well so far

15985220440229

@gdesecrate
Copy link
Author

Does octo-tree affects 3D sound system by any chance?

@gdesecrate
Copy link
Author

gdesecrate commented Sep 3, 2020

Current Gramp's build (with steam platform integration) uses @lawnjelly solution. I have noticed that at some models (parts of final boss' body in Kotel and player hands/weapons models in samozbor id:heaven) flicker or go invisible. Also one of the circular saws on map6 in kotel have bugged 3d sound.

I haven't managed to reproduce flickering/3d sound glitch with build by @naithar

@lawnjelly
Copy link
Member

Current Gramp's build (with steam platform integration) uses @lawnjelly solution. I have noticed that at some models (parts of final boss' body in Kotel and player hands/weapons models in samozbor id:heaven) flicker or go invisible. Also one of the circular saws on map6 in kotel have bugged 3d sound.

I haven't managed to reproduce flickering/3d sound glitch with build by @naithar

There may be a further place I'm not updating the dirty octants, but I'll probably need a minimum reproduction project - it would be best to post on the PR #41123 rather than here. I don't know offhand whether 3d sound uses the octree, will have a look.

@akien-mga
Copy link
Member

#41123 is now merged.

@gdesecrate If you can still reproduce #40059 (comment), please open a dedicated issue so that we can debug further (ideally with a reproduction project).

@akien-mga
Copy link
Member

Might be fixed by #44901 (will be available in 3.2.4 beta 6 to test).

@Calinou
Copy link
Member

Calinou commented Mar 16, 2021

Closing as resolved by #44901. Please comment if you can still reproduce this on 3.2.4rc4 or later.

@Calinou Calinou closed this as completed Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests