
Kokkos: RELOCATABLE_DEVICE_CODE and desul atomics #13321

Open
jrobcary opened this issue Aug 5, 2024 · 8 comments
Labels
type: bug The primary issue is a bug in Trilinos code or tests

Comments

@jrobcary
Contributor

jrobcary commented Aug 5, 2024

Bug Report

@kokkos

Description

RELOCATABLE_DEVICE_CODE and desul atomics
We use Trilinos 15.1.1 in a project with separable CUDA compilation, which I understand requires relocatable device code (-DKokkos_ENABLE_CUDA_RELOCATABLE_DEVICE_CODE=TRUE). But when we enable that and then compile our own code, we run into
#error Relocatable device code mode incompatible with desul atomics configuration
from packages/kokkos/tpls/desul/include/desul/atomics/Macros.hpp.
Is there a way around this?

Steps to Reproduce

  1. Configure script: vcloud.txcorp.com-tximplicit-ser-config.sh.txt
  2. Configure log: vcloud.txcorp.com-tximplicit-ser-config.txt
  3. Build log: vcloud.txcorp.com-tximplicit-ser-build.sh.txt
@jrobcary jrobcary added the type: bug The primary issue is a bug in Trilinos code or tests label Aug 5, 2024
@jrobcary jrobcary changed the title PackageName: General Summary of the Bug Kokkos: RELOCATABLE_DEVICE_CODE and desul atomics Aug 5, 2024
@jrobcary
Contributor Author

jrobcary commented Aug 5, 2024

I can get my program to compile by commenting out line 25,

// #error Relocatable device code mode incompatible with desul atomics configuration

in packages/kokkos/tpls/desul/include/desul/atomics/Macros.hpp. Above this line is the comment

// Intercept incompatible relocatable device code mode which leads to ODR violations

Since ODR violations would lead to a compile-time or link-time error, and I do not see one, is this line still needed?

Another indicator: in that file, the error is raised only when exactly one of DESUL_ATOMICS_ENABLE_CUDA_SEPARABLE_COMPILATION and DESUL_IMPL_CUDA_RDC is defined. But DESUL_IMPL_CUDA_RDC seems to appear nowhere else in the source, judging from

packages$ grep -r DESUL_IMPL_CUDA_RDC .
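The "exactly one of the two is defined" condition described above can be modeled in ordinary C++. This is a hedged sketch, not desul's actual source. It assumes that DESUL_IMPL_CUDA_RDC is set inside Macros.hpp itself from nvcc's `__CUDACC_RDC__` macro (which nvcc defines when a file is compiled with `-rdc=true`), which would explain why grep finds no other use of it, and that DESUL_ATOMICS_ENABLE_CUDA_SEPARABLE_COMPILATION records how the installed library was configured. The names `kThisFileCompiledWithRdc`, `kLibraryBuiltWithRdc` and `rdc_modes_agree` are hypothetical.

```cpp
// Hedged sketch of an RDC-consistency guard; names are hypothetical, not desul's.

// Was *this* translation unit compiled in relocatable-device-code mode?
// nvcc defines __CUDACC_RDC__ under -rdc=true; a host-only compiler does not.
#if defined(__CUDACC_RDC__)
constexpr bool kThisFileCompiledWithRdc = true;
#else
constexpr bool kThisFileCompiledWithRdc = false;
#endif

// Was the installed library configured for separable compilation?
#if defined(DESUL_ATOMICS_ENABLE_CUDA_SEPARABLE_COMPILATION)
constexpr bool kLibraryBuiltWithRdc = true;
#else
constexpr bool kLibraryBuiltWithRdc = false;
#endif

// The two settings must agree; a mismatch in either direction is rejected,
// mirroring the "one defined, the other not" condition.
constexpr bool rdc_modes_agree(bool library_rdc, bool file_rdc) {
  return library_rdc == file_rdc;
}

// Compile-time rejection, playing the role of the #error in Macros.hpp.
static_assert(rdc_modes_agree(kLibraryBuiltWithRdc, kThisFileCompiledWithRdc),
              "Relocatable device code mode incompatible with library configuration");
```

With Kokkos configured for RDC but user code compiled without the RDC flag, the library-side value is true and the file-side value is false, so compilation stops instead of risking the ODR violation the header comment warns about.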

@rppawlo
Contributor

rppawlo commented Aug 5, 2024

@crtrott @dalg24

@dalg24
Contributor

dalg24 commented Aug 5, 2024

> Since ODR violations would lead to a compile-time or link-time error, and I do not see one, is this line still needed?

The fact your build produced an executable definitely does not mean there were no ODR violations.
And yes, we absolutely meant to fail compilation when we detect that incompatibility, because it can lead to nasty bugs with lock-based atomics.

You are getting this error because Kokkos/Trilinos was built with RDC enabled and you attempt to build your code without the proper compiler flag to enable RDC.

@jrobcary
Contributor Author

jrobcary commented Aug 5, 2024

So should I add '-rdc' to CMAKE_CXX_FLAGS? Or maybe to CMAKE_CUDA_FLAGS?

@dalg24
Contributor

dalg24 commented Aug 5, 2024

If you register your target's dependency on Kokkos, you should get the proper flags transitively.

@jrobcary
Contributor Author

jrobcary commented Aug 5, 2024

I do, e.g.,

target_link_libraries(txmbase Kokkos::all_libs)

Should I do something different?

@dalg24
Contributor

dalg24 commented Aug 5, 2024

No, that is correct, that's what we want you to do.
I am not sure how to advise from the information you posted.
Are you on the Kokkos Slack? You could ask your question here and we'd try to schedule a call.

@jrobcary
Contributor Author

Now I remember the problem. We make a clean distinction between C++ code (compiled by the C++ compiler) and CUDA code (compiled by nvcc or the wrapper). When we link against Kokkos::all_libs, a number of CUDA flags end up in the C++ compile flags, and the C++ compiler then errors out. So consuming Kokkos this way prevents us from keeping that separation, which we find useful and more maintainable.
Our fix was to strip those flags out of the installed CMake configuration files with

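sanitizeTrilinos() {
  # Installed CMake package files to patch, relative to the current directory
  local sanifiles="Trilinos/TrilinosConfig.cmake Kokkos/KokkosConfig.cmake Kokkos/KokkosTargets.cmake"
  # techo: project logging helper (assumed; use echo if it is not defined)
  techo "Sanitizing $sanifiles in $PWD"
  for i in $sanifiles; do
    # Edit in place, deleting everything from --relocatable-device-code=true
    # through the last -arch=sm_NN on the line (the .* is greedy)
    sed -i'' -e 's/--relocatable-device-code=true.*-arch=sm_[0-9]*//' "$i"
  done
}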
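To see exactly what the sed expression in sanitizeTrilinos deletes, the same pattern can be applied with `std::regex_replace`. This is an illustrative sketch: the function names `strip_cuda_flag_run` and `check_strip_examples` and the sample flag strings are hypothetical, not taken from the actual Kokkos or Trilinos config files. It uses `std::regex::basic`, the POSIX basic grammar that sed uses by default, and `format_first_only` to mirror `s///` without the `g` flag.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Applies the pattern from the sed command in sanitizeTrilinos to one line.
// std::regex::basic selects POSIX basic regular expressions (sed's default);
// format_first_only replaces only the first match, like s/// without /g.
std::string strip_cuda_flag_run(const std::string& line) {
  static const std::regex pattern(
      "--relocatable-device-code=true.*-arch=sm_[0-9]*", std::regex::basic);
  return std::regex_replace(line, pattern, "",
                            std::regex_constants::format_first_only);
}

// Hypothetical flag strings showing what the substitution removes.
void check_strip_examples() {
  // The RDC flag, the arch flag, and everything between them are deleted.
  assert(strip_cuda_flag_run(
             "-O3 --relocatable-device-code=true -x cu -arch=sm_80 -fPIC") ==
         "-O3  -fPIC");
  // The greedy .* runs to the LAST -arch=sm_NN, taking -DFOO with it.
  assert(strip_cuda_flag_run("-g --relocatable-device-code=true -arch=sm_70 "
                             "-DFOO -arch=sm_80 -Wall") == "-g  -Wall");
  // Lines without the RDC flag are left untouched.
  assert(strip_cuda_flag_run("-O2 -g") == "-O2 -g");
}
```

The second example shows a side effect of the greedy `.*`: any unrelated option sitting between `--relocatable-device-code=true` and the last `-arch=sm_NN` on the same line (here `-DFOO`) is deleted as well, which is worth keeping in mind when patching generated CMake files this way.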

Now I see that we need to add those flags back when we compile our CUDA code.
