Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: missing fence on exec space after three arg deep copy #13373

Open
vasylivy opened this issue Aug 20, 2024 · 3 comments
Open

Tpetra: missing fence on exec space after three arg deep copy #13373

vasylivy opened this issue Aug 20, 2024 · 3 comments
Assignees
Labels
pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests

Comments

@vasylivy
Copy link

vasylivy commented Aug 20, 2024

Hi,

Reporting a bug. See https://github.com/trilinos/Trilinos/blob/master/packages/tpetra/core/src/Tpetra_Import_Util2.hpp#L1128C1-L1137

Missing a fence on the exec space passed in to Kokkos::deep_copy(..)

    Kokkos::deep_copy(execution_space(), PIDList_host, PIDList_view); // needs exec_space.fence() afterwards
  
    // Stash the RemotePIDs. Once remotePIDs is changed to become a Kokkos view, we can remove this and copy directly.
    // Note: If Teuchos::Array had a shrink_to_fit like std::vector,
    // we'd call it here.
    
    Teuchos::Array<int> PIDList(NumRemoteColGIDs);
    for(LO i = 0; i < NumRemoteColGIDs; ++i) {
      PIDList[i] = PIDList_host[i];
    }

As part of this bug fix could someone on the tpetra team also audit the other three arg deep copies in tpetra for correctness? e.g. searching for either "Kokkos::deep_copy(e" or Kokkos::deep_copy(s" for exec, space shows other spots that use the three arg version. We should make sure they also appropriately fence the exec space passed in when needed.

Thanks,

Yaro

@vasylivy vasylivy added the type: bug The primary issue is a bug in Trilinos code or tests label Aug 20, 2024
@jhux2
Copy link
Member

jhux2 commented Aug 20, 2024

For context: https://kokkos.org/kokkos-core-wiki/API/core/view/deep_copy.html#semantics:

If an ExecutionSpace argument exec_space is provided the call is potentially asynchronous—i.e., the call returns before the copy operation is executed. In that case the copy operation will occur only after any already submitted work to exec_space is finished, and the copy operation will be finished before any work submitted to exec_space after the deep_copy call returns is executed. Note: the copy operation is only synchronous with respect to work in the specific execution space instance, but not necessarily with work in other instances of the same type. This behaves analogous to issuing a cudaMemcpyAsync into a specific CUDA stream, without any additional synchronization.

@jhux2
Copy link
Member

jhux2 commented Aug 20, 2024

List of Tpetra's usage of 3-argument deep_copy is attached. (I did this on a slightly out-of-date branch, but line numbers should be very close, if not exact.)
deep_copy_review.txt

@vasylivy
Copy link
Author

There are other similar missing fences in the list @jhux2 posted, just FYI. Are we planning on addressing those here or separately?

Thanks,

Yaro

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: Tpetra type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

No branches or pull requests

3 participants