Rocm/Cuda unit test improvement #95

Merged
merged 1 commit into from
Apr 19, 2023


smuzaffar
Contributor

@smuzaffar smuzaffar commented Apr 18, 2023

Various unit test improvements

  • Unit tests with a direct rocm or alpaka dependency (with the rocm backend enabled)
    • Run the tests if the alpakaIsEnabledROCmAsync command is either missing or runs successfully
  • Unit tests with a direct cuda or alpaka dependency (with the cuda backend enabled)
    • Run the tests if the alpakaIsEnabledCudaAsync command is either missing or runs successfully

If USER_UNIT_TESTS=<tools> is set, the alpakaIsEnabledX check is ignored and scram force-runs the unit tests that directly depend on <tools>, e.g.

  • USER_UNIT_TESTS=rocm scram build runtests will run all tests which directly depend on rocm or alpaka-rocm.
  • USER_UNIT_TESTS=cuda scram build runtests will run all tests which directly depend on cuda or alpaka-cuda.
  • USER_UNIT_TESTS="cuda serial" scram build runtests will run all tests which directly depend on cuda, alpaka-cuda or alpaka-serial.
  • USER_UNIT_TESTS="boost" scram build runtests will run all tests which directly depend on boost
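
The selection rules above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: the names `BACKEND_CHECK`, `expand_tools`, `check_passes` and `decide` are hypothetical, and the way `USER_UNIT_TESTS` words expand to `alpaka-<tool>` is an assumption inferred from the description.

```python
import shutil
import subprocess

# Hypothetical mapping from a direct dependency to the check command named in this PR.
BACKEND_CHECK = {
    "rocm": "alpakaIsEnabledROCmAsync",
    "alpaka-rocm": "alpakaIsEnabledROCmAsync",
    "cuda": "alpakaIsEnabledCudaAsync",
    "alpaka-cuda": "alpakaIsEnabledCudaAsync",
}


def expand_tools(user_unit_tests):
    """Expand USER_UNIT_TESTS words, keeping their order.

    Assumption: cuda, rocm and serial also select the matching alpaka-<tool>.
    """
    selected = []
    for tool in user_unit_tests.split():
        names = [tool]
        if tool in ("cuda", "rocm", "serial"):
            names.append("alpaka-" + tool)
        for name in names:
            if name not in selected:
                selected.append(name)
    return selected


def check_passes(command):
    """Return True if the check command is missing or exits with status 0."""
    if shutil.which(command) is None:
        return True  # a missing command does not block the tests
    result = subprocess.run(
        [command], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def decide(test_deps, user_unit_tests="", check=check_passes):
    """Return (run, reason) for a test with the given direct dependencies."""
    if user_unit_tests.strip():
        # USER_UNIT_TESTS is set: ignore the alpakaIsEnabledX checks entirely.
        selected = expand_tools(user_unit_tests)
        if any(dep in selected for dep in test_deps):
            return True, ""
        return False, "Not a %s type test" % " ".join(selected)
    for dep in test_deps:
        command = BACKEND_CHECK.get(dep)
        if command is not None and not check(command):
            return False, "Failed to run " + command
    return True, ""
```

For instance, `decide(["alpaka-cuda"], "rocm")` skips the test without running any check command, while `decide(["boost"])` always runs it because boost has no associated check.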

@cmsbuild
Contributor

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch scramv3.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@smuzaffar
Contributor Author

test parameters:

  • full_cmssw = true

@smuzaffar
Contributor Author

please test

@smuzaffar
Contributor Author

Here are some examples:

  • Using USER_UNIT_TESTS to run only selected unit tests
> USER_UNIT_TESTS=rocm scram b runtests
Skip    0s ... x/y/Alpaka1CudaAsync (Not a rocm alpaka-rocm type test)
Pass    1s ... x/y/Alpaka1ROCmAsync
Skip    0s ... x/y/Alpaka1SerialSync (Not a rocm alpaka-rocm type test)
Skip    0s ... x/y/Boost1 (Not a rocm alpaka-rocm type test)
Skip    0s ... x/y/Cuda1 (Not a rocm alpaka-rocm type test)
Pass    0s ... x/y/Rocm1
Skip    0s ... x/y/Root1 (Not a rocm alpaka-rocm type test)
> USER_UNIT_TESTS=cuda scram b runtests
Pass    1s ... x/y/Alpaka1CudaAsync
Skip    0s ... x/y/Alpaka1ROCmAsync (Not a cuda alpaka-cuda type test)
Skip    0s ... x/y/Alpaka1SerialSync (Not a cuda alpaka-cuda type test)
Skip    0s ... x/y/Boost1 (Not a cuda alpaka-cuda type test)
Pass    0s ... x/y/Cuda1
Skip    0s ... x/y/Rocm1 (Not a cuda alpaka-cuda type test)
Skip    0s ... x/y/Root1 (Not a cuda alpaka-cuda type test)
> USER_UNIT_TESTS="root boost" scram b runtests
Skip    0s ... x/y/Alpaka1CudaAsync (Not a root boost type test)
Skip    0s ... x/y/Alpaka1ROCmAsync (Not a root boost type test)
Skip    0s ... x/y/Alpaka1SerialSync (Not a root boost type test)
Pass    0s ... x/y/Boost1
Skip    0s ... x/y/Cuda1 (Not a root boost type test)
Skip    0s ... x/y/Rocm1 (Not a root boost type test)
Pass    0s ... x/y/Root1
  • Default unit tests on a node with an Nvidia GPU, where alpakaIsEnabledCudaAsync runs successfully and alpakaIsEnabledROCmAsync fails
> scram b runtests
Pass    0s ... x/y/Alpaka1CudaAsync
Skip    0s ... x/y/Alpaka1ROCmAsync (Failed to run alpakaIsEnabledROCmAsync)
Pass    0s ... x/y/Alpaka1SerialSync
Pass    0s ... x/y/Boost1
Pass    0s ... x/y/Cuda1
Skip    0s ... x/y/Rocm1 (Failed to run alpakaIsEnabledROCmAsync)
Pass    0s ... x/y/Root1
  • Default unit tests on a node with an AMD GPU, where alpakaIsEnabledROCmAsync runs successfully and alpakaIsEnabledCudaAsync fails
> scram b runtests
Skip    0s ... x/y/Alpaka1CudaAsync (Failed to run alpakaIsEnabledCudaAsync)
Pass    0s ... x/y/Alpaka1ROCmAsync
Pass    0s ... x/y/Alpaka1SerialSync
Pass    0s ... x/y/Boost1
Skip    0s ... x/y/Cuda1 (Failed to run alpakaIsEnabledCudaAsync)
Pass    0s ... x/y/Rocm1
Pass    0s ... x/y/Root1
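
To tally such a log, the lines above can be parsed with a small helper. This is a sketch only: the line shape (`Status`, seconds, `...`, test name, optional reason in parentheses) is inferred from the examples in this comment and is not a documented scram format, and `parse_line` and `summarize` are hypothetical names.

```python
import re
from collections import Counter

# Assumed shape: "<Status> <N>s ... <test name> (<optional reason>)"
LINE = re.compile(r"^(\w+)\s+(\d+)s\s+\.\.\.\s+(\S+)(?:\s+\((.*)\))?$")


def parse_line(line):
    """Parse one result line; return None for anything else (e.g. the command line)."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    status, seconds, name, reason = match.groups()
    return {
        "status": status,
        "seconds": int(seconds),
        "name": name,
        "reason": reason or "",
    }


def summarize(text):
    """Count result lines per status, ignoring lines that do not parse."""
    parsed = (parse_line(line) for line in text.splitlines())
    return Counter(item["status"] for item in parsed if item is not None)
```

Calling `summarize` on one of the blocks above gives the number of `Pass` and `Skip` lines, and the `reason` field separates tests filtered out by USER_UNIT_TESTS from tests whose check command failed.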

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c44bb0/32023/summary.html
COMMIT: b69f8d9
CMSSW: CMSSW_13_1_X_2023-04-18-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw-config/95/32023/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c44bb0/32023/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-c44bb0/32023/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 174 lines from the logs
  • Reco comparison results: 1684 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3459877
  • DQMHistoTests: Total failures: 7674
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3452181
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@smuzaffar
Contributor Author

+externals

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next scramv3 IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
