Instancing performance #1017
Open · dvicini opened this issue Dec 18, 2023 · 3 comments

dvicini (Member) commented Dec 18, 2023

Hi,

I've encountered a performance issue when trying to use instancing. When instancing simple geometric primitives, the render time very quickly becomes dominated by JIT tracing time. Using the merge shape turns out to be a lot faster than instancing (>100x for a very simple rendering). Is instancing expected to be useful only once the instanced object reaches a certain complexity (e.g., a 3D model of a tree with millions of triangles)? I expected some overhead due to the additional indirections incurred by instancing, but that overhead is currently completely dominated by the JIT tracing time. If I switch to the scalar_rgb variant, instancing and merging shapes perform similarly in my example setup.

I don't know how actionable this currently is, but I thought I would open an issue, as this was quite surprising to me.

Here is a minimal reproducer comparing instancing and merging shapes:

import time

import drjit as dr
import mitsuba as mi
import numpy as np

mi.set_variant('llvm_ad_rgb')

# Create 200 cubes at random positions in [-0.5, 0.5]
n_cubes = 200
cube_positions = np.random.rand(n_cubes, 3) - 0.5
scale = 0.05

# Use the same BSDF for all of them. This is important for the merge shape to work.
bsdf = mi.load_dict({'type': 'diffuse'})

scene_dict = {
    'type': 'scene',
    'sensor': {
        'type': 'perspective',
        'to_world': mi.ScalarTransform4f.look_at(
            [0, 0, 3], [0, 0, 0], [0, 1, 0]
        ),
        'film': {'type': 'hdrfilm', 'height': 256, 'width': 256},
    },
    'emitter': {'type': 'constant'},
    'integrator': {'type': 'direct'},
}

# Approach 1: Use instancing of a re-used cube shape.
instancing_dict = dict(scene_dict)
instancing_dict['cube_group'] = {
    'type': 'shapegroup',
    'main': {
        'type': 'cube',
        'bsdf': bsdf,
    },
}
for i, position in enumerate(cube_positions):
  instancing_dict[f'instance_{i:03d}'] = {
      'type': 'instance',
      'to_world': mi.ScalarTransform4f.translate(position).scale(float(scale)),
      'shapegroup': {'type': 'ref', 'id': 'cube_group'},
  }

instancing_scene = mi.load_dict(instancing_dict)

# Do a first render to warm up the kernel cache.
image = mi.render(instancing_scene, spp=1)
dr.eval(image)
dr.sync_thread()

# Compute timing only on already cached kernel.
t0 = time.time()
image = mi.render(instancing_scene, spp=1)
dr.eval(image)
dr.sync_thread()
print(f'Took {time.time()-t0} seconds')

# Delete so all object instances are de-registered from the system.
del instancing_scene, instancing_dict 


# Approach 2: Use the merge shape; this is currently much faster.
merge_dict = dict(scene_dict)
merge_shape = {'type': 'merge'}
for i, position in enumerate(cube_positions):
  merge_shape[f'instance_{i:03d}'] = {
      'type': 'cube',
      'bsdf': bsdf,
      'to_world': mi.ScalarTransform4f.translate(position).scale(float(scale)),
  }
merge_dict['merge_shape'] = merge_shape

merge_scene = mi.load_dict(merge_dict)

# Warm-up render to populate the kernel cache, as above.
image = mi.render(merge_scene, spp=1)
dr.eval(image)
dr.sync_thread()

# Compute timing only on the already cached kernel.
t0 = time.time()
image = mi.render(merge_scene, spp=1)
dr.eval(image)
dr.sync_thread()
print(f'Took {time.time()-t0} seconds')
del merge_scene, merge_dict
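
As a side note, one way to double-check that the extra time goes into tracing/compilation rather than kernel execution is Dr.Jit's kernel history. The following is only a minimal sketch: it assumes the KernelHistory flag and the kernel_history() accessor behave as in recent Dr.Jit versions, and the exact fields of each history record may differ across versions. Substitute the instancing scene from the reproducer above for the placeholder scene to see where the time goes.

import drjit as dr
import mitsuba as mi

mi.set_variant('llvm_ad_rgb')

# Record per-kernel statistics for every launch from now on.
dr.set_flag(dr.JitFlag.KernelHistory, True)

# Placeholder scene; replace with the instancing scene built above.
scene = mi.load_dict(mi.cornell_box())

image = mi.render(scene, spp=1)
dr.eval(image)
dr.sync_thread()

# Each entry describes one kernel launch, including timing information
# that separates code generation/compilation from device execution.
for entry in dr.kernel_history():
    print(entry)
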
njroussel (Member) commented

Hi @dvicini

I had never given this any thought, or I just forgot about it. This is indeed a pretty big issue for the JIT. As you said, I don't think there is an "easy" fix we can apply.

(I'm mostly writing the following two paragraphs for myself)

There is already quite a bit of instancing-specific JIT code in Mitsuba. Most of it exists to avoid tracing nested vcalls and to work around recursion issues. But fundamentally, we still need to trace every instance object's compute_surface_interaction method.

The only solution that comes to mind (that requires no changes to Dr.Jit) is to add a new Shape plugin that acts as a holder for all instances, and to have instance not declare its methods to the JIT. This would guarantee that for a set of instances, we only trace the function once. I haven't fully thought this idea through; there might be other considerations.

@Speierers Was this ever considered? Did we break/lose some mechanism that handled all of this JIT tracing? I guess for meshes, merge was always the workaround.

I don't plan on working on this immediately, or at least not until we've gotten the nanobind-powered Dr.Jit into Mitsuba. I doubt it will significantly help, but we might change some internal interfaces that would make it easier to hotfix/monkeypatch the vcall tracing for instancing.

Anyway, thanks for opening the issue. This will serve as a good reference for anyone else who runs into this problem.

wjakob (Member) commented Dec 18, 2023

It's an O(N^2) issue: tracing the recursive ray tracing call in instance wants to visit all shapes again. We have logic in place that causes these functions to exit early, which makes them cheaper but does not fundamentally fix the asymptotics.

The solution I have in mind would be to detect when we are tracing a method call that is identical to one we have already traced (or are currently tracing), where identical means: same targets, same input types. In that case, we should be able to reuse the call targets without re-tracing them. This only affects the drjit-core layer; the other parts are unaffected.
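
To make the scaling concrete, here is a rough sketch along the lines of the reproducer above. It is only an approximation: it times the first render of a freshly loaded scene, so the measurement mixes tracing, code generation, compilation and kernel execution, but the growth with the instance count should be dominated by the tracing cost discussed here.

import time

import drjit as dr
import mitsuba as mi
import numpy as np

mi.set_variant('llvm_ad_rgb')

def build_instancing_scene(n_cubes, seed=0):
    """Build a scene with `n_cubes` instances of a shared cube shapegroup."""
    rng = np.random.default_rng(seed)
    positions = rng.random((n_cubes, 3)) - 0.5
    scene_dict = {
        'type': 'scene',
        'sensor': {
            'type': 'perspective',
            'to_world': mi.ScalarTransform4f.look_at([0, 0, 3], [0, 0, 0], [0, 1, 0]),
            'film': {'type': 'hdrfilm', 'height': 256, 'width': 256},
        },
        'emitter': {'type': 'constant'},
        'integrator': {'type': 'direct'},
        'cube_group': {
            'type': 'shapegroup',
            'main': {'type': 'cube', 'bsdf': {'type': 'diffuse'}},
        },
    }
    for i, p in enumerate(positions):
        scene_dict[f'instance_{i:03d}'] = {
            'type': 'instance',
            'to_world': mi.ScalarTransform4f.translate(p).scale(0.05),
            'shapegroup': {'type': 'ref', 'id': 'cube_group'},
        }
    return mi.load_dict(scene_dict)

# Time the first (untraced) render for an increasing number of instances.
for n in [25, 50, 100, 200]:
    scene = build_instancing_scene(n)
    t0 = time.time()
    image = mi.render(scene, spp=1)
    dr.eval(image)
    dr.sync_thread()
    print(f'{n:4d} instances: first render took {time.time() - t0:.2f} s')
    del scene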

dvicini (Member, Author) commented Dec 18, 2023

Thanks, yes, that all makes sense. I vaguely remembered this being a challenging case, but didn't fully realize the practical implications. Let's keep this issue open for future reference; it's not really urgent to fix.
