
[GraphBolt][CUDA] hetero_rgcn example crashes #7296

Closed
mfbalin opened this issue Apr 10, 2024 · 4 comments · Fixed by #7295
Assignees
Labels
Work Item Work items tracked in project tracker

Comments


mfbalin commented Apr 10, 2024

🔨Work Item

IMPORTANT:

  • This template is only for dev team to track project progress. For feature request or bug report, please use the corresponding issue templates.
  • DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

The crash is probably due to a bug introduced in #7239; investigating. So far it reproduces with CUDA 12.3 and CUDA 12.4. CUDA 12.3 is the version shipped in the NVIDIA torch container.

mfbalin@BALIN-PC:~/dgl-1/examples/sampling/graphbolt/lightning$ CUDA_LAUNCH_BLOCKING=1 compute-sanitizer --tool memcheck python ../rgcn/hetero_rgcn.py
========= COMPUTE-SANITIZER
The dataset is already preprocessed.
Loaded dataset: node_classification
node_num for rel_graph_embed: {'author': tensor(1134649, device='cuda:0', dtype=torch.int32), 'field_of_study': tensor(59965, device='cuda:0', dtype=torch.int32), 'institution': tensor(8740, device='cuda:0', dtype=torch.int32)}
Number of embedding parameters: 154029312
Number of model parameters: 337460
Start to train...
Training~Epoch 01: 375it [04:27,  1.36it/s]========= Invalid __shared__ read of size 4 bytes
=========     at void cusparse::csr2csc_rows_expansion_kernel<(int)128, (int)8, int>(const T3 *, int, T3, const T3 *, T3 *)+0x20e0
=========     by thread (99,0,0) in block (78,0,0)
=========     Address 0x1400 is out of bounds
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x252ff7]
=========                in /usr/lib/wsl/drivers/nv_dispi.inf_amd64_268e85175aa9e991/libcuda.so.1.1
=========     Host Frame: [0x93c3fa]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x99859a]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x79b7db]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x79965f]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x799bec]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame:cusparseCsr2cscEx2 [0xf0b22]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame:dgl::aten::CSRMatrix dgl::aten::impl::CSRTranspose<(DGLDeviceType)2, int>(dgl::aten::CSRMatrix) [0x9c93a0]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::CSRTranspose(dgl::aten::CSRMatrix) [0x32a0b5]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::UnitGraph::GetInCSR(bool) const [0x9937d2]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::UnitGraph::GetCSCMatrix(unsigned long) const [0x993d79]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::HeteroGraph::GetCSCMatrix(unsigned long) const [0x8a9b46]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::SpMM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<dgl::BaseHeteroGraph>, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >) [0x7ea870]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::__mk_DGL0::{lambda(dgl::runtime::DGLArgs, dgl::aten::__mk_DGL0::DGLRetValue*)#1}::operator()(dgl::runtime, dgl::aten::__mk_DGL0::DGLRetValue) const [clone .constprop.0] [0x80c3ee]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:std::_Function_handler<void (dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*), dgl::aten::__mk_DGL0::{lambda(dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*)#1}>::_M_invoke(std::_Any_data const&, dgl::runtime::DGLArgs&&, dgl::runtime::DGLRetValue*&&) [0x80ca7d]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:DGLFuncCall [0x84a168]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:__pyx_f_3dgl_4_ffi_4_cy3_4core_FuncCall(void*, _object*, DGLValue*, int*) in dgl/_ffi/_cython/core.cpp:6805 [0x18ef7]
=========                in /home/mfbalin/dgl-1/python/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:__pyx_pw_3dgl_4_ffi_4_cy3_4core_12FunctionBase_5__call__(_object*, _object*, _object*) in dgl/_ffi/_cython/core.cpp:7629 [0x197cf]
=========                in /home/mfbalin/dgl-1/python/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:_PyObject_MakeTpCall [0x150a7a]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x149095]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x14326c]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:THPFunction_apply(_object*, _object*) [0x7e7d30]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame: [0x15a137]
=========                in /usr/bin/python
=========     Host Frame:PyObject_Call [0x16942a]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame: [0x16893d]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x14326c]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame: [0x16893d]
=========                in /usr/bin/python
=========     Host Frame:torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) [0x7e0002]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame:torch::autograd::Node::operator()(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) [0x4ea4d3a]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) [0x4e9e815]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) [0x4e9f467]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) [0x4e96b75]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) [0x7dc04b]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame: [0xe62b2]
=========                in /lib/x86_64-linux-gnu/libstdc++.so.6
=========     Host Frame:start_thread in ./nptl/pthread_create.c:442 [0x94ac2]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame: [0x12684f]
=========                in /lib/x86_64-linux-gnu/libc.so.6
========= 
========= Program hit cudaErrorUnknown (error 999) due to "unknown error" on CUDA API call to cudaLaunchKernel.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x34d0a2]
=========                in /usr/lib/wsl/drivers/nv_dispi.inf_amd64_268e85175aa9e991/libcuda.so.1.1
=========     Host Frame: [0x9985da]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x79b7db]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x79965f]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame: [0x799bec]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame:cusparseCsr2cscEx2 [0xf0b22]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12
=========     Host Frame:dgl::aten::CSRMatrix dgl::aten::impl::CSRTranspose<(DGLDeviceType)2, int>(dgl::aten::CSRMatrix) [0x9c93a0]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::CSRTranspose(dgl::aten::CSRMatrix) [0x32a0b5]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::UnitGraph::GetInCSR(bool) const [0x9937d2]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::UnitGraph::GetCSCMatrix(unsigned long) const [0x993d79]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::HeteroGraph::GetCSCMatrix(unsigned long) const [0x8a9b46]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::SpMM(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::shared_ptr<dgl::BaseHeteroGraph>, dgl::runtime::NDArray, dgl::runtime::NDArray, dgl::runtime::NDArray, std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> >) [0x7ea870]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:dgl::aten::__mk_DGL0::{lambda(dgl::runtime::DGLArgs, dgl::aten::__mk_DGL0::DGLRetValue*)#1}::operator()(dgl::runtime, dgl::aten::__mk_DGL0::DGLRetValue) const [clone .constprop.0] [0x80c3ee]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:std::_Function_handler<void (dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*), dgl::aten::__mk_DGL0::{lambda(dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*)#1}>::_M_invoke(std::_Any_data const&, dgl::runtime::DGLArgs&&, dgl::runtime::DGLRetValue*&&) [0x80ca7d]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:DGLFuncCall [0x84a168]
=========                in /home/mfbalin/dgl-1/build/libdgl.so
=========     Host Frame:__pyx_f_3dgl_4_ffi_4_cy3_4core_FuncCall(void*, _object*, DGLValue*, int*) in dgl/_ffi/_cython/core.cpp:6805 [0x18ef7]
=========                in /home/mfbalin/dgl-1/python/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:__pyx_pw_3dgl_4_ffi_4_cy3_4core_12FunctionBase_5__call__(_object*, _object*, _object*) in dgl/_ffi/_cython/core.cpp:7629 [0x197cf]
=========                in /home/mfbalin/dgl-1/python/dgl/_ffi/_cy3/core.cpython-310-x86_64-linux-gnu.so
=========     Host Frame:_PyObject_MakeTpCall [0x150a7a]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x149095]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x14326c]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:THPFunction_apply(_object*, _object*) [0x7e7d30]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame: [0x15a137]
=========                in /usr/bin/python
=========     Host Frame:PyObject_Call [0x16942a]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame: [0x16893d]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x14326c]
=========                in /usr/bin/python
=========     Host Frame:_PyFunction_Vectorcall [0x15a9fb]
=========                in /usr/bin/python
=========     Host Frame:_PyEval_EvalFrameDefault [0x1455d6]
=========                in /usr/bin/python
=========     Host Frame: [0x16893d]
=========                in /usr/bin/python
=========     Host Frame:torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) [0x7e0002]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame:torch::autograd::Node::operator()(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) [0x4ea4d3a]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) [0x4e9e815]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) [0x4e9f467]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) [0x4e96b75]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) [0x7dc04b]
=========                in /home/mfbalin/.local/lib/python3.10/site-packages/torch/lib/libtorch_python.so
=========     Host Frame: [0xe62b2]
=========                in /lib/x86_64-linux-gnu/libstdc++.so.6
=========     Host Frame:start_thread in ./nptl/pthread_create.c:442 [0x94ac2]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame: [0x12684f]
=========                in /lib/x86_64-linux-gnu/libc.so.6
========= 
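The backtrace shows the invalid shared-memory read inside cuSPARSE's csr2csc expansion kernel, reached through dgl::aten::CSRTranspose via cusparseCsr2cscEx2. For reference, the conversion that routine performs is equivalent to the following CPU-side sketch (pure Python; function and variable names are illustrative, not the DGL or cuSPARSE API):

```python
# Hypothetical CPU-side sketch of the CSR -> CSC conversion that
# cusparseCsr2cscEx2 performs on the GPU. A CSR matrix is represented by
# (indptr, indices, data); the CSC result indexes the same entries by column.

def csr_to_csc(n_rows, n_cols, indptr, indices, data):
    nnz = len(indices)
    # Count the number of entries in each column.
    col_counts = [0] * n_cols
    for c in indices:
        col_counts[c] += 1
    # Prefix-sum the counts to build the CSC column pointer array.
    col_ptr = [0] * (n_cols + 1)
    for c in range(n_cols):
        col_ptr[c + 1] = col_ptr[c] + col_counts[c]
    # Scatter row indices and values into their column buckets.
    row_idx = [0] * nnz
    out_data = [0] * nnz
    fill = list(col_ptr[:n_cols])  # next free slot per column
    for r in range(n_rows):
        for k in range(indptr[r], indptr[r + 1]):
            dst = fill[indices[k]]
            row_idx[dst] = r
            out_data[dst] = data[k]
            fill[indices[k]] += 1
    return col_ptr, row_idx, out_data
```

The GPU kernel parallelizes the scatter phase across thread blocks with shared-memory staging, which is where the sanitizer reports the out-of-bounds read.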

Depending work items or issues

mfbalin added the Work Item label Apr 10, 2024
mfbalin self-assigned this Apr 10, 2024

mfbalin commented Apr 10, 2024

I suspect cuSPARSE itself might be buggy, since they changed the relevant code in CUDA 12.2 and #7295 fixes the issue.
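A common way to work around a suspected toolkit regression is to gate the affected code path on the runtime CUDA version. The sketch below is purely illustrative of that pattern (the threshold and function names are hypothetical, not what #7295 actually does):

```python
# Hypothetical sketch: pick between the cuSPARSE transpose path and a
# custom fallback based on the runtime CUDA version string.

def parse_cuda_version(version_str):
    """Parse e.g. '12.3' into a comparable (major, minor) tuple."""
    major, minor = version_str.split(".")[:2]
    return int(major), int(minor)

def use_cusparse_transpose(cuda_version_str):
    # Suppose the regression appeared in the CUDA 12.x line: keep using
    # cuSPARSE on older toolkits and fall back to a hand-written kernel
    # otherwise. The (12, 0) cutoff is an assumption for illustration.
    return parse_cuda_version(cuda_version_str) < (12, 0)
```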


mfbalin commented Apr 10, 2024

Also, none of the asserts added in #7297 catch any error.
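Asserts of this kind typically validate the CSR invariants before the transpose is invoked; since they all pass, the input matrix itself appears well-formed. A hypothetical sketch of such checks (names illustrative, not the actual code in #7297):

```python
# Hypothetical sketch of CSR-invariant assertions that a debugging PR
# might add before calling a CSR -> CSC transpose routine.

def check_csr(n_rows, n_cols, indptr, indices):
    assert len(indptr) == n_rows + 1, "indptr must have n_rows + 1 entries"
    assert indptr[0] == 0, "indptr must start at 0"
    for i in range(n_rows):
        assert indptr[i] <= indptr[i + 1], "indptr must be non-decreasing"
    assert indptr[-1] == len(indices), "indptr must end at nnz"
    for c in indices:
        assert 0 <= c < n_cols, "column index out of bounds"
    return True
```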

mfbalin linked a pull request Apr 10, 2024 that will close this issue

mfbalin commented Apr 10, 2024

CUDA 11.8 does not crash.


mfbalin commented Apr 10, 2024

The CUDA 12.0 release notes include the following update:
"CUDA 12.0: Improved cusparseCsr2cscEx2() performance."

They probably introduced a bug into cuSPARSE in CUDA 12.
