Hi,

I am seeing an odd, sporadic issue that I haven't encountered before. I am running the `cuda_ad_rgb` variant on a relatively unconventional setup: rendering a simple scene in a Colab notebook (similar to a Jupyter notebook) on an NVIDIA V100 datacenter GPU.
Everything runs fine most of the time. However, every few runs I see the following issue: `jit_optix_configure_sbt` initially gets invoked on the main thread when loading the scene, but the cleanup callback that releases the internal structures gets invoked on another thread. I am debugging this by printing `std::this_thread::get_id()` both in the configure call and in the cleanup callback. I tried turning off parallel scene loading, but that didn't seem to make a difference.
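For context, the instrumentation itself is trivial; roughly the following (a minimal sketch, the call sites inside drjit-core are paraphrased rather than quoted):

```cpp
#include <cstdio>
#include <functional>
#include <thread>

// Minimal sketch of the instrumentation: print a hashed thread ID so the
// configure call and the cleanup callback can be compared. The call sites
// below are paraphrased, not the actual drjit-core source.
static void log_thread(const char *where) {
    std::size_t id = std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::fprintf(stderr, "%s: thread id = %zu\n", where, id);
}

// Inside jit_optix_configure_sbt(...):  log_thread("configure_sbt");
// Inside the cleanup/free callback:     log_thread("cleanup");
```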
Practically, the issue is that the `jit_free` call in the cleanup callback internally refers to `thread_state_cuda` to free host-pinned memory. However, if the cleanup happens on a non-main thread, `thread_state_cuda` might not have been initialized, leading to a null pointer dereference in `jitc_free` when reading `thread_state_cuda->stream`.
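To illustrate the failure mode, here is a simplified sketch of the pattern (the names mirror the report, but the structure is illustrative and not the actual drjit-core code): the CUDA thread state is a thread-local pointer that is only set once a thread has actually used the backend, so the free path crashes on any other thread.

```cpp
#include <cstdio>

// Simplified sketch of the failure mode; illustrative, not the actual
// drjit-core code.
struct ThreadState { void *stream = nullptr; };

// Set only once a thread has actually touched the CUDA backend.
static thread_local ThreadState *thread_state_cuda = nullptr;

void jitc_free_sketch(void *ptr) {
    // Host-pinned memory is released asynchronously on the thread's stream.
    // If the cleanup callback runs on a thread that never initialized the
    // CUDA backend, thread_state_cuda is still nullptr here -> crash.
    void *stream = thread_state_cuda->stream;  // null pointer dereference
    std::fprintf(stderr, "freeing %p on stream %p\n", ptr, stream);
}
```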
Two questions about this:

1. Is it expected that the cleanup might happen on a non-main thread? I am using the `mi.render` function; do we expect any of that custom op mechanism to potentially lead to another thread holding a reference to the `Scene` object?
2. Can we just replace the unprotected access to the CUDA thread state with `thread_state(JitBackend::CUDA)` in `jit_free`? (See the sketch right after this list.)
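A minimal sketch of the change I have in mind, assuming `thread_state(JitBackend::CUDA)` lazily creates the per-thread state on first use (illustrative only, not a patch against the actual sources):

```cpp
#include <cstdio>

// Illustrative sketch only, not a patch against the actual sources. Idea:
// go through an accessor that lazily creates the per-thread state instead
// of dereferencing the raw thread-local pointer, which I assume is what
// thread_state(JitBackend::CUDA) does.
struct ThreadState { void *stream = nullptr; };

static thread_local ThreadState *thread_state_cuda = nullptr;

static ThreadState *thread_state_cuda_lazy() {
    if (!thread_state_cuda)
        thread_state_cuda = new ThreadState();  // initialize on first use
    return thread_state_cuda;
}

void jitc_free_sketch(void *ptr) {
    // Safe on any thread: no assumption that the caller has already
    // initialized the CUDA backend state.
    void *stream = thread_state_cuda_lazy()->stream;
    std::fprintf(stderr, "freeing %p on stream %p\n", ptr, stream);
}
```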
It is possible that the non-main-thread cleanup is related to something in IPython/Colab holding an extra reference to the render op or the Mitsuba scene somehow.
This is the code I run in my Colab cell, though it's likely not of much help in understanding the issue:
Are you combining this with another array programming framework? For example, PyTorch code involving (our) custom operations causes them to be called from other threads during differentiation.
We would not expect Dr.Jit to ever use another thread here, right? All the references to the render op and the scene object should be on the main thread?
If that's the case, I am thinking this might be something Colab-specific, e.g., it tries to use threads to provide some additional information to the user for debugging or inspection.