An interesting bug caused "CUDA error: unspecified launch failure" #6375

Closed · StortInter opened this issue Sep 22, 2023 · 10 comments

@StortInter

🐛 Bug

Using dgl.graph() inside a dataset's __getitem__ together with dgl.dataloading.GraphDataLoader() with num_workers > 0 causes "RuntimeError: CUDA error: unspecified launch failure".

To Reproduce

Steps to reproduce the behavior:

  1. Install the latest PyTorch and DGL with CUDA.

The installation commands I used:

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html
pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html

Here is my conda env (only key components listed):

# packages in environment at /home/lapluis/miniconda3/envs/dgl:
#
# Name                    Version                   Build                   Channel
cuda-cudart               11.8.89                       0                   nvidia
cuda-cupti                11.8.87                       0                   nvidia
cuda-libraries            11.8.0                        0                   nvidia
cuda-nvrtc                11.8.89                       0                   nvidia
cuda-nvtx                 11.8.86                       0                   nvidia
cuda-runtime              11.8.0                        0                   nvidia
cython                    3.0.2                    pypi_0                   pypi
dgl                       1.1.2+cu118              pypi_0                   pypi
dglgo                     0.0.2                    pypi_0                   pypi
python                    3.11.5          hab00c5b_0_cpython                conda-forge
pytorch                   2.0.1           py3.11_cuda11.8_cudnn8.7.0_0      pytorch
pytorch-cuda              11.8                 h7e8668a_5                   pytorch
torchaudio                2.0.2               py311_cu118                   pytorch
torchdata                 0.6.1           py311h6d97842_1                   conda-forge
torchtriton               2.0.0                     py311                   pytorch
torchvision               0.15.2              py311_cu118                   pytorch
  2. Run the code sample:
import os

import dgl
import torch

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
device = torch.device('cuda:0')


class MyDataset(dgl.data.DGLDataset):
    def process(self):
        pass

    def __init__(self):
        super().__init__('MyDataset')

    def __getitem__(self, idx):
        gh = dgl.graph(([1, 2], [1, 2]))    # comment out this line to resolve the error
        return 0

    def __len__(self):
        return 1000


if __name__ == '__main__':
    iter_0 = dgl.dataloading.GraphDataLoader(
        dataset=MyDataset(),
        num_workers=1   # set to 0 to resolve the error
    )

    for i in iter_0.__iter__():
        i.to(device=device)

    for i in iter_0.__iter__():
        i.to(device=device)

Attention: the error can be avoided by deleting the line gh = dgl.graph(([1, 2], [1, 2])) or by setting num_workers to 0.

Expected behavior

The script should run without errors. Instead, I get a CUDA error like this:

(dgl) lapluis@nccserv0:~$ python bug.py
Traceback (most recent call last):
  File "/home/lapluis/bug.py", line 35, in <module>
    i.to(device=device)
RuntimeError: CUDA error: unspecified launch failure
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

This code is a simplified version of my training code. I also ran the original code under compute-sanitizer and got the following:

========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaMemcpyAsync.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x441846]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaMemcpyAsync [0x144a374]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:dgl::runtime::CUDADeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DGLContext, DGLContext, DGLDataType) [0x8b3086]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:dgl::runtime::NDArray::CopyFromTo(DGLArray*, DGLArray*) [0x72936d]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:dgl::runtime::NDArray::CopyTo(DGLContext const&) const [0x764ed3]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:dgl::UnitGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&) [0x872ecf]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:dgl::HeteroGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&) [0x771716]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:std::_Function_handler<void (dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*), dgl::{lambda(dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*)#47}>::_M_invoke(std::_Any_data const&, dgl::runtime::DGLArgs&&, dgl::runtime::DGLRetValue*&&) [0x780156]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:DGLFuncCall [0x70e3f8]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/libdgl.so
=========     Host Frame:__pyx_f_3dgl_4_ffi_4_cy3_4core_FuncCall(void*, _object*, DGLValue*, int*) [0x1a79f]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/_ffi/_cy3/core.cpython-311-x86_64-linux-gnu.so
=========     Host Frame:__pyx_pw_3dgl_4_ffi_4_cy3_4core_12FunctionBase_5__call__(_object*, _object*, _object*) [0x1afef]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/dgl/_ffi/_cy3/core.cpython-311-x86_64-linux-gnu.so
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/call.c:214:_PyObject_MakeTpCall [0x1e007b]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:4774:_PyEval_EvalFrameDefault [0x1ec992]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:6439:_PyEval_Vector [0x2a4d36]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:1155:PyEval_EvalCode [0x2a43ef]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/pythonrun.c:1713:run_eval_code_obj [0x2c2f2a]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/pythonrun.c:1734:run_mod [0x2bf343]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/pythonrun.c:1628:pyrun_file [0x2d4300]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/pythonrun.c:440:_PyRun_SimpleFileObject [0x2d3c5e]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/pythonrun.c:79:_PyRun_AnyFileObject [0x2d3a44]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Modules/main.c:680:Py_RunMain [0x2cdbdf]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Modules/main.c:735:Py_BytesMain [0x292f97]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:../sysdeps/nptl/libc_start_call_main.h:74:__libc_start_call_main [0x271ca]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:../csu/libc-start.c:347:__libc_start_main [0x27285]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame: [0x292e3d]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
========= 
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaHostAlloc.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x441846]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaHostAlloc [0x51bfc]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/../../../../libcudart.so.11.0
=========     Host Frame:at::cuda::CUDAHostAllocatorWrapper::allocate(unsigned long) const [0xe30ec3]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::native::_pin_memory_cuda(at::Tensor const&, c10::optional<c10::Device>) [0xe37494]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x2b4b562]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x2b4b5f7]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::_ops::_pin_memory::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x209e9f1]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::(anonymous namespace)::_pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x26f26bc]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::_pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x26f2837]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::_pin_memory::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x209e9f1]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::VariableType::(anonymous namespace)::_pin_memory(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x3c005af]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>), &torch::autograd::VariableType::(anonymous namespace)::_pin_memory>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x3c0091a]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::_pin_memory::call(at::Tensor const&, c10::optional<c10::Device>) [0x211171c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::native::pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x1a1d95c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x2a842c7]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::pin_memory::call(at::Tensor const&, c10::optional<c10::Device>) [0x211138c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::THPVariable_pin_memory(_object*, _object*, _object*) [0x4a4d18]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_python.so
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/descrobject.c:366:method_vectorcall_VARARGS_KEYWORDS [0x209388]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/call.c:299:PyObject_Vectorcall [0x1f95bc]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:4774:_PyEval_EvalFrameDefault [0x1ec992]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/call.c:393:_PyFunction_Vectorcall [0x20f121]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:5381:_PyEval_EvalFrameDefault [0x1f04e7]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Include/internal/pycore_call.h:92:_PyObject_VectorcallTstate.lto_priv.4 [0x22fc74]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/classobject.c:67:method_vectorcall [0x22f6a8]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Modules/_threadmodule.c:1093:thread_run [0x30506b]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/thread_pthread.h:243:pythread_wrapper [0x2d05d4]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:./nptl/pthread_create.c:442:start_thread [0x89044]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:clone3 [0x1095fc]
=========                in /lib/x86_64-linux-gnu/libc.so.6
========= 
========= Program hit cudaErrorLaunchFailure (error 719) due to "unspecified launch failure" on CUDA API call to cudaGetLastError.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x441846]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaGetLastError [0x48dd4]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/../../../../libcudart.so.11.0
=========     Host Frame:c10::cuda::c10_cuda_check_implementation(int, char const*, char const*, int, bool) [0x43edd]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libc10_cuda.so
=========     Host Frame:at::cuda::CUDAHostAllocatorWrapper::allocate(unsigned long) const [0xe30ee3]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::native::_pin_memory_cuda(at::Tensor const&, c10::optional<c10::Device>) [0xe37494]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x2b4b562]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CUDA___pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x2b4b5f7]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cuda.so
=========     Host Frame:at::_ops::_pin_memory::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x209e9f1]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::(anonymous namespace)::_pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x26f26bc]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::_pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x26f2837]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::_pin_memory::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x209e9f1]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::VariableType::(anonymous namespace)::_pin_memory(c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x3c005af]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>), &torch::autograd::VariableType::(anonymous namespace)::_pin_memory>, at::Tensor, c10::guts::typelist::typelist<c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x3c0091a]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::_pin_memory::call(at::Tensor const&, c10::optional<c10::Device>) [0x211171c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::native::pin_memory(at::Tensor const&, c10::optional<c10::Device>) [0x1a1d95c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:c10::impl::wrap_kernel_functor_unboxed_<c10::impl::detail::WrapFunctionIntoFunctor_<c10::CompileTimeFunctionPointer<at::Tensor (at::Tensor const&, c10::optional<c10::Device>), &at::(anonymous namespace)::(anonymous namespace)::wrapper_CompositeImplicitAutograd__pin_memory>, at::Tensor, c10::guts::typelist::typelist<at::Tensor const&, c10::optional<c10::Device> > >, at::Tensor (at::Tensor const&, c10::optional<c10::Device>)>::call(c10::OperatorKernel*, c10::DispatchKeySet, at::Tensor const&, c10::optional<c10::Device>) [0x2a842c7]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:at::_ops::pin_memory::call(at::Tensor const&, c10::optional<c10::Device>) [0x211138c]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_cpu.so
=========     Host Frame:torch::autograd::THPVariable_pin_memory(_object*, _object*, _object*) [0x4a4d18]
=========                in /home/lapluis/miniconda3/envs/dgl/lib/python3.11/site-packages/torch/lib/libtorch_python.so
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/descrobject.c:366:method_vectorcall_VARARGS_KEYWORDS [0x209388]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/call.c:299:PyObject_Vectorcall [0x1f95bc]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:4774:_PyEval_EvalFrameDefault [0x1ec992]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/call.c:393:_PyFunction_Vectorcall [0x20f121]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/ceval.c:5381:_PyEval_EvalFrameDefault [0x1f04e7]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Include/internal/pycore_call.h:92:_PyObject_VectorcallTstate.lto_priv.4 [0x22fc74]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Objects/classobject.c:67:method_vectorcall [0x22f6a8]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Modules/_threadmodule.c:1093:thread_run [0x30506b]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:/usr/local/src/conda/python-3.11.5/Python/thread_pthread.h:243:pythread_wrapper [0x2d05d4]
=========                in /home/lapluis/miniconda3/envs/dgl/bin/python
=========     Host Frame:./nptl/pthread_create.c:442:start_thread [0x89044]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:../sysdeps/unix/sysv/linux/x86_64/clone3.S:83:clone3 [0x1095fc]
=========                in /lib/x86_64-linux-gnu/libc.so.6
========= 
========= Target application returned an error
========= ERROR SUMMARY: 3 errors

Environment

  • DGL Version (e.g., 1.0): 1.1.2
  • Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3): PyTorch 2.0.1
  • OS (e.g., Linux): Linux
  • How you installed DGL (conda, pip, source): pip
  • Build command you used (if compiling from source): cmake -DBUILD_TYPE=dev -DUSE_CUDA=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda .. (I also tried to install from source)
  • Python version: 3.11.5
  • CUDA/cuDNN version (if applicable): CUDA 11.8 & cuDNN 8.9.4.25-1+cuda11.8 (Driver Version: 520.61.05)
  • GPU models and configuration (e.g. V100): V100
  • Any other relevant information:

Additional context

I also tried installing from source and from conda, and tried on another server (Linux + 3090, Driver Version 525.125.06), but got the same error.

Then I tried to run on my PC (Windows 11 + 3080, Driver Version 537.34), with the environment installed via conda. Running 'python main.py' worked fine; however, I got a different error when using IPython or the Python console:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'MyDataset' on <module '__main__' (built-in)>
Traceback (most recent call last):
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\site-packages\torch\utils\data\dataloader.py", line 1132, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\multiprocessing\queues.py", line 114, in get
    raise Empty
_queue.Empty

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\site-packages\torch\utils\data\dataloader.py", line 1328, in _next_data
    idx, data = self._get_data()
                ^^^^^^^^^^^^^^^^
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\site-packages\torch\utils\data\dataloader.py", line 1294, in _get_data
    success, data = self._try_get_data()
                    ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stort\miniconda3\envs\dgl\Lib\site-packages\torch\utils\data\dataloader.py", line 1145, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 25832) exited unexpectedly
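
This second error looks like standard Python multiprocessing behavior on Windows rather than a DGL bug: with the spawn start method, worker processes re-import __main__, so a class defined only in an interactive session cannot be found when a worker unpickles it. A minimal sketch of the usual workaround, assuming a hypothetical module file my_dataset.py:

# my_dataset.py (hypothetical file name) -- define the dataset in an importable
# module so that spawned DataLoader workers can re-import it by reference.
import dgl


class MyDataset(dgl.data.DGLDataset):
    def __init__(self):
        super().__init__('MyDataset')

    def process(self):
        pass

    def __getitem__(self, idx):
        gh = dgl.graph(([1, 2], [1, 2]))
        return 0

    def __len__(self):
        return 1000

# In the IPython/console session:
#   from my_dataset import MyDataset
#   loader = dgl.dataloading.GraphDataLoader(MyDataset(), num_workers=1)
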
@StortInter (Author)

BTW, this error does not occur with a previous version of DGL. I installed dgl-1.0.2+cu118 on the server, which is the version installed on my collaborator's PC, ran the code, and nothing happened.

@frozenbugs (Collaborator) commented Sep 25, 2023

Hi @yaox12, @chang-l, can you help with this issue?

It is a pretty strange error; effectively the code only moves tensor([0]) to cuda:0.
With num_workers=1 in the dataloader and gh = dgl.graph(([1, 2], [1, 2])) in __getitem__, something goes wrong that breaks the copy to cuda:0.

It crashes even if I change the code to DGL-unrelated code:

t = torch.tensor([[0, 0, 0], [0, 1, 2]])
t.to(device=device)

@StortInter (Author)

I also tried 1.1.x (1.1.1 and 1.1.0) and 1.0.x (1.0.4); this error only occurs on 1.1.x.

@frozenbugs (Collaborator)

Hi @StortInter

In __getitem__, when I change return 0 to return gh it no longer fails. May I ask why you need to return an integer?
It is a very weird bug, possibly due to some incompatibility between DGL and PyTorch. If you can post meaningful code that reproduces this issue, we may be able to provide more help.

import os

import dgl
import torch

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
device = torch.device('cuda:0')


class MyDataset(dgl.data.DGLDataset):
    def process(self):
        pass

    def __init__(self):
        super().__init__('MyDataset')

    def __getitem__(self, idx):
        gh = dgl.graph(([1, 2], [1, 2]))    # comment out this line to resolve the error
        return gh

    def __len__(self):
        return 1000


if __name__ == '__main__':
    iter_0 = dgl.dataloading.GraphDataLoader(
        dataset=MyDataset(),
        num_workers=1   # set to 0 to resolve the error
    )

    for i in iter_0.__iter__():
        i.to(device=device)

    for i in iter_0.__iter__():
        i.to(device=device)

@StortInter (Author)

Hi @frozenbugs,

Here is the original code of the dataloader:

# -*- coding: utf-8 -*-


import dgl
import numpy as np
from dgl.data import DGLDataset


class GraphDataset_k_nearest(DGLDataset):
    def __init__(self, x, y, k, num_nodes, win_length):
        self.x = x
        self.labels = y
        self.k = k
        self.num_nodes = num_nodes
        self.win_length = win_length

    def __getitem__(self, idx):
        node_features = self.x[idx]

        cor_matrix = np.corrcoef(node_features.T)
        src_node = []
        dst_node = []
        for j in range(cor_matrix.shape[0]):
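            # pick the indices of the k nodes most correlated with node j (descending)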
            dst = cor_matrix[j].argsort()[-self.k:][::-1]
            src_node.extend([j] * len(dst))
            dst_node.extend(dst)

        G = dgl.graph((src_node, dst_node))
        G = dgl.to_bidirected(G)

        features = node_features.reshape(1, node_features.shape[0], node_features.shape[1])
        self.feature = features

        G.ndata['x'] = node_features.reshape(self.num_nodes, self.win_length)
        self.G = G
        return self.G, self.feature, self.labels[idx]

    def __len__(self):
        return len(self.x)

Since the real data is too large to share, just use random data to generate x = Tensor(10000, 420, 128) and y = Tensor(10000,), and set k=128, num_nodes=128, win_length=420.
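
For reference, a minimal sketch of that setup with random data (shapes as stated above; the label values here are arbitrary placeholders):

import torch

x = torch.randn(10000, 420, 128)   # (samples, win_length=420, num_nodes=128)
y = torch.randint(0, 2, (10000,))  # placeholder labels
dataset = GraphDataset_k_nearest(x, y, k=128, num_nodes=128, win_length=420)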

@wzm2256 (Contributor) commented Oct 20, 2023

BTW, this error does not occur with a previous version of DGL. I installed dgl-1.0.2+cu118 on the server, which is the version installed on my collaborator's PC, ran the code, and nothing happened.

Thanks! You saved my life. I can now run my code.

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you.

@chang-l (Collaborator) commented Nov 28, 2023

I think this could be due to issue #6561, i.e., the sampling (child) processes initialized new CUDA contexts, which is not allowed when processes are created via the fork method. Even though that CUDA error message is cleared afterwards in the code (see the description of #6561), it is not enough, and the CUDA error somehow resurfaces on the device after the sampling is done...

Issue #6561 has been fixed by #6568 and merged into master. I tested with a source build and confirmed that the crash is resolved after applying commit 1b3f14b.
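
To illustrate the constraint described here with a generic sketch (this is the general fork/CUDA rule, not DGL's actual fix): once a parent process has initialized CUDA, a child created via fork cannot use CUDA, so code that needs CUDA in subprocesses typically forces the spawn start method:

import torch
import torch.multiprocessing as mp


def worker():
    # Using CUDA here would fail if this process had been created via fork()
    # after the parent already initialized CUDA.
    print(torch.zeros(1, device='cuda'))


if __name__ == '__main__':
    torch.zeros(1, device='cuda')  # parent initializes CUDA first
    ctx = mp.get_context('spawn')  # 'fork' here would break CUDA in the child
    p = ctx.Process(target=worker)
    p.start()
    p.join()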

@chang-l (Collaborator) commented Nov 28, 2023

@frozenbugs, can you please help double-check whether commit 1b3f14b fixes this issue?

@frozenbugs (Collaborator) commented Dec 15, 2023

Yes, it is fixed, thanks for your help.
