
Replace uses of __CUDA_ARCH__ and __NVCOMPILER_CUDA_ARCH__ for compile time target version checks #976

Open
brycelelbach opened this issue Mar 29, 2021 · 1 comment
Labels
libcu++ For all items related to libcu++


brycelelbach commented Mar 29, 2021

We currently use __CUDA_ARCH__/__NVCOMPILER_CUDA_ARCH__ in a few places that are difficult to replace with if target.

Possible solutions:

  • Don't emit an error for older SMs with NVC++. This would lead to (possibly cryptic) compile-time failures in some cases and runtime failures in others.
  • Add some sort of compile-time "do all targets provide"/"does any target provide" mechanism to <nv/target> that uses NV_TARGET_SM_INTEGER_LIST to detect whether any of the SMs in the list fail to meet the requirements of the feature (see the sketch after this list). This would require some preprocessor logic.
  • Add some sort of static_assert_target facility to NVC++. This wouldn't solve the case of the memcpy_async overloads that should only be present for newer targets.
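
To make the second and third options concrete, here is a rough sketch. NV_IF_ELSE_TARGET and NV_PROVIDES_SM_70 are real <nv/target> macros; NV_ANY_TARGET_PROVIDES_SM_70 is a hypothetical name for the proposed "does any target provide" preprocessor check and does not exist today.

// Sketch, not an existing API: contrasts what <nv/target> can do now with
// what the proposed compile-time check would add.
#include <nv/target>
#include <cstdio>

__host__ __device__ void use_feature()
{
  // Already possible: pick a code path per target inside a function body,
  // without spelling __CUDA_ARCH__ anywhere.
  NV_IF_ELSE_TARGET(NV_PROVIDES_SM_70,
    (printf("sm_70+ path\n");),
    (printf("host / pre-sm_70 fallback\n");));
}

// Not possible today: making a declaration disappear when no target can use
// it, which is what the memcpy_async overloads would need. A preprocessor
// check over NV_TARGET_SM_INTEGER_LIST could enable something like:
//
//   #if NV_ANY_TARGET_PROVIDES_SM_70            // hypothetical
//   void overload_only_available_on_sm_70();    // declared only when usable
//   #endif
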
@jrhemstad added the thrust and libcu++ labels and removed the thrust label Feb 22, 2023
@alliepiper alliepiper removed their assignment May 1, 2023
@jarmak-nv jarmak-nv transferred this issue from NVIDIA/libcudacxx Nov 8, 2023

mfbalin commented Mar 29, 2024

We compile our library for targets ranging from sm_35 up to sm_90. However, simply including the cuda atomic header results in a compile-time error. How can we compile our code so that this code path is enabled only for suitable targets? We use the cuCollections library, which includes the cuda atomic headers automatically. Since we use the static_map from cuCollections in host code, I don't know how to get around this limitation.

The PR where we run into this problem: dmlc/dgl#7239

In file included from /home/ubuntu/jenkins/workspace/dgl_PR-7239/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/atomic:733,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/third_party/cccl/libcudacxx/include/cuda/std/atomic:18,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/graphbolt/../third_party/cccl/libcudacxx/include/cuda/atomic:14,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/graphbolt/../third_party/cuco/include/cuco/detail/open_addressing/kernels.cuh:22,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/graphbolt/../third_party/cuco/include/cuco/detail/open_addressing/open_addressing_impl.cuh:21,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/graphbolt/../third_party/cuco/include/cuco/static_map.cuh:21,
                 from /home/ubuntu/jenkins/workspace/dgl_PR-7239/graphbolt/src/cuda/unique_and_compact_impl.cu:14:
/home/ubuntu/jenkins/workspace/dgl_PR-7239/third_party/cccl/libcudacxx/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
   12 | #  error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
      |    ^~~~~

@PointKernel How can I make use of static_map for sm_70 and above when I compile for targets ranging from sm_35 to sm_90?
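
A minimal sketch of the kind of per-architecture guard in question, assuming a translation unit where everything that names cuco or <cuda/atomic> types can live inside the guard; the function name is hypothetical, and note that this reintroduces exactly the __CUDA_ARCH__ check this issue is about replacing:

// Sketch only: skip the cuco/libcu++ includes in device passes for
// architectures below sm_60 so the #error above is never reached.
// Anything that names cuco types must sit inside the same guard.
#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ >= 600
#include <cuco/static_map.cuh>  // pulls in <cuda/atomic>

void build_map_on_supported_archs()  // hypothetical helper
{
  // ... host-side code that constructs and queries the cuco::static_map ...
}
#endif
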
