Intercept mode deadlocks with multiple threads driving separate GPUs #113

Open
gcongiu opened this issue May 17, 2023 · 4 comments

gcongiu commented May 17, 2023

Running the intercept_multi_thread_monitoring test in PAPI (located in papi/src/components/rocm/tests/intercept_multi_thread_monitoring) deadlocks. The test runs a matrix-matrix multiplication kernel using multiple threads, each of which drives and monitors its own dedicated GPU. The backtrace from rocgdb follows:

#0  0x00007ffff599e54d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff5999eb6 in _L_lock_941 () from /lib64/libpthread.so.0
#2  0x000000010440c741 in ?? ()
#3  0x00007ffeeb9999d1 in rocprofiler::util::HsaRsrcFactory::GetKernelNameRef(unsigned long) () from /opt/rocm-5.5.0/lib/librocprofiler64.so
#4  0x00007ffeeb982f33 in rocprofiler::InterceptQueue::QueryKernelName(unsigned long, amd_kernel_code_s const*) ()
   from /opt/rocm-5.5.0/lib/librocprofiler64.so
#5  0x00007ffeeb9898e5 in rocprofiler::InterceptQueue::OnSubmitCB(void const*, unsigned long, unsigned long, void*, void (*)(void const*, unsigned long)) () from /opt/rocm-5.5.0/lib/librocprofiler64.so
#6  0x00007fffece1aafa in rocr::core::InterceptQueue::StoreRelaxed(long) () from /opt/rocm-5.5.0/lib/libhsa-runtime64.so.1
#7  0x00007fffece0d9a8 in rocr::HSA::hsa_signal_store_screlease(hsa_signal_s, long) () from /opt/rocm-5.5.0/lib/libhsa-runtime64.so.1
#8  0x00007ffff66850c4 in bool roc::VirtualGPU::dispatchGenericAqlPacket<hsa_kernel_dispatch_packet_s>(hsa_kernel_dispatch_packet_s*, unsigned short, unsigned short, bool, unsigned long) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#9  0x00007ffff66828d7 in roc::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, void*, unsigned int, amd::NDRangeKernelCommand*, hsa_kernel_dispatch_packet_s*) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#10 0x00007ffff6683638 in roc::VirtualGPU::submitKernel(amd::NDRangeKernelCommand&) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#11 0x00007ffff6654d1a in amd::Command::enqueue() () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#12 0x00007ffff657b723 in ihipModuleLaunchKernel(ihipModuleSymbol_t*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, ihipStream_t*, void**, void**, ihipEvent_t*, ihipEvent_t*, unsigned int, unsigned int, unsigned int, unsigned int, unsigned long, unsigned long, unsigned int) () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#13 0x00007ffff65a27a7 in ihipLaunchKernel(void const*, dim3, dim3, void**, unsigned long, ihipStream_t*, ihipEvent_t*, ihipEvent_t*, int) ()
   from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#14 0x00007ffff657b5a2 in hipLaunchKernel_common () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#15 0x00007ffff6589e12 in hipLaunchKernel () from /opt/rocm-5.5.0/lib/libamdhip64.so.5
#16 0x0000000000473597 in __device_stub__matmul(float*, float*, float*, int) ()
#17 0x0000000000473aa5 in hip_do_matmul_work (handle=0xa88530, stream=0xa1f610) at matmul.cpp:111
#18 0x00000000004730fc in .omp_outlined._debug__(int &, const char *(&)[4], int &, hipError_t &) const (.global_tid.=0x7fffffffc670,
    .bound_tid.=0x7fffffffc668, papi_errno=@0x7fffffffca6c: 0, events=..., pass_with_warning=@0x7fffffffca68: 0,
    hip_errno=@0x7fffffffca64: hipSuccess) at multi_thread_monitoring.cpp:103
#19 0x00000000004734cd in .omp_outlined.(void) const (.global_tid.=0x7fffffffc670, .bound_tid.=0x7fffffffc668, papi_errno=@0x7fffffffca6c: 0,
    events=..., pass_with_warning=@0x7fffffffca68: 0, hip_errno=@0x7fffffffca64: hipSuccess) at multi_thread_monitoring.cpp:63
#20 0x00007ffff7fcbf43 in __kmp_invoke_microtask () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#21 0x00007ffff7f5177f in __kmp_invoke_task_func () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#22 0x00007ffff7f4b85b in __kmp_fork_call () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#23 0x00007ffff7f3cb05 in __kmpc_fork_call () from /opt/rocm-5.5.0/llvm/bin/../lib/libomp.so
#24 0x0000000000472da2 in multi_thread (argc=1, argv=0x7fffffffcb88) at multi_thread_monitoring.cpp:63
#25 0x00000000003eaf80 in main (argc=1, argv=0x7fffffffcb88) at intercept_multi_thread_monitoring.cpp:13

The problem was reproduced with ROCm 5.5.0 RC5. The test was run on two MI210 GPUs.


gcongiu commented May 18, 2023

I verified this with the ROCm 5.5.0 stable release, and the problem is present there as well.


gcongiu commented Jun 9, 2023

@ammarwa any update on this?


gcongiu commented Jun 13, 2023

Added a reproducer:
issue-113.tar.gz

gcongiu added a commit to gcongiu/rocm-issues that referenced this issue Jul 3, 2023

ppanchad-amd commented

@gcongiu Apologies for the lack of response. Can you please check whether your issue still exists with the latest ROCm 6.2? If so, we will investigate further. Thanks!
