OptiX SBT is released on non-main thread #1091

Open
dvicini opened this issue Feb 28, 2024 · 2 comments

@dvicini
Member

dvicini commented Feb 28, 2024

Hi,

I am seeing an odd, sporadic issue that I haven't encountered before. I am running the cuda_ad_rgb variant in a relatively unconventional setup: rendering a simple scene in a Colab notebook (similar to a Jupyter notebook) on an NVIDIA V100 datacenter GPU.

Everything runs fine most of the time. However, every few runs I see the following issue: jit_optix_configure_sbt is initially invoked on the main thread when loading the scene, but the cleanup callback that releases the internal structures is invoked on another thread. I am debugging this by printing std::this_thread::get_id() both in the configure call and in the cleanup callback. Turning off parallel scene loading did not seem to make a difference.
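
For reference, the instrumentation looks roughly like this (a minimal sketch; the two function names below are placeholders standing in for jit_optix_configure_sbt and its cleanup callback, whose real signatures are elided):

#include <iostream>
#include <thread>

// Runs on the main thread when the scene is loaded.
void configure_sbt_instrumented() {
    std::cout << "configure: thread " << std::this_thread::get_id() << '\n';
    // ... jit_optix_configure_sbt(...) ...
}

// Registered as the cleanup callback; sporadically runs on another thread.
void cleanup_sbt_instrumented() {
    std::cout << "cleanup:   thread " << std::this_thread::get_id() << '\n';
    // ... jit_free(...) on the SBT's internal structures ...
}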

In practice, the problem is that the jit_free call in the cleanup callback internally refers to thread_state_cuda to free host-pinned memory. If the cleanup happens on a non-main thread, thread_state_cuda may never have been initialized, leading to a null-pointer dereference in jitc_free when it accesses thread_state_cuda->stream.

Two questions about this:

  1. Is it expected that the cleanup might happen on a non-main thread? I am using the mi.render function; do we expect any part of the custom op mechanism to potentially lead to another thread holding a reference to the Scene object?

  2. Can we simply replace the unprotected access to the CUDA thread state with thread_state(JitBackend::CUDA) in jit_free? (See the sketch after this list.)
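
To make question 2 concrete, here is a minimal sketch of the change, assuming thread_state_cuda is a thread-local pointer that is only initialized on threads that have used the CUDA backend, and that thread_state(JitBackend::CUDA) creates the per-thread state on demand (the declarations below are placeholders, not the actual Dr.Jit source):

// Placeholder declarations standing in for Dr.Jit internals (sketch only).
enum class JitBackend { CUDA };
struct ThreadState { void *stream; };
extern thread_local ThreadState *thread_state_cuda;
ThreadState *thread_state(JitBackend backend); // lazily initializes the state

// Current pattern: raw access to the thread-local pointer. If the cleanup
// callback runs on a thread that never touched the CUDA backend,
// thread_state_cuda is still null here.
void jitc_free_current(void *ptr) {
    void *stream = thread_state_cuda->stream; // null-pointer dereference
    // ... enqueue asynchronous free of 'ptr' on 'stream' ...
}

// Proposed pattern: thread_state() initializes the state on demand, so the
// access is safe regardless of which thread runs the cleanup.
void jitc_free_proposed(void *ptr) {
    ThreadState *ts = thread_state(JitBackend::CUDA);
    void *stream = ts->stream;
    // ... enqueue asynchronous free of 'ptr' on 'stream' ...
}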

It is possible that the non-main-thread cleanup is related to something in IPython/Colab somehow holding an extra reference to the render op or the Mitsuba scene.

This is the code I run in my Colab cell, though it is likely not of much help in understanding the issue:

import drjit as dr
import mitsuba as mi
import numpy as np

dr.set_thread_count(1)  # restrict Dr.Jit to a single worker thread
mi.set_variant('cuda_ad_rgb')
dr.set_log_level(dr.LogLevel.InfoSym)

def render():
  # Minimal scene: a cube lit by a constant emitter, viewed through a
  # perspective camera, rendered with the direct-illumination integrator.
  scene = mi.load_dict(
      {
          'type': 'scene',
          'integrator': {'type': 'direct'},
          'shape': {
              'type': 'cube',
          },
          'emitter': {'type': 'constant'},
          'sensor': {
              'type': 'perspective',
              'to_world': mi.ScalarTransform4f.look_at(
                  [4, 4, 4.5], [0, 0, 0], [0, 1, 0]
              ),
          },
      },
      parallel=False,  # sequential scene loading (did not change the behavior)
  )
  image = np.array(mi.render(scene, spp=256))
  dr.sync_thread()
  del scene  # drop the local reference; cleanup runs once the scene is freed
  return image

image = render()

@wjakob
Member

wjakob commented Feb 28, 2024

Are you combining this with another array programming framework? For example, PyTorch code involving (our) custom operations causes them to be called from other threads during differentiation.

@dvicini
Member Author

dvicini commented Feb 28, 2024

No, nothing of that sort.

We would not expect Dr.Jit to ever use another thread here, right? All references to the render op and the Scene object should be held on the main thread?

If that's the case, this might be something Colab-specific, e.g., it may use threads to provide additional information to the user for debugging or inspection.
