Update Alpaka to version 1.1.0 #8957

@fwyzard
Contributor

fwyzard commented Jan 23, 2024

Major changes include:

  • introduce SYCL/oneAPI support for CPUs and Intel GPUs
  • make alpaka platforms full objects, and rename from Pltf to Platform
  • make support for the vendor random number generators optional
  • refactor Vec type, and add .x(), .y(), .z() accessors
  • enable support for asynchronous memory operations in ROCm 5.3 and later
  • change all CUDA warp operations to synchronise all threads
  • remove ALPAKA_ASSERT_OFFLOAD, introduce ALPAKA_ASSERT_ACC
  • implement copysign(), fma(), log2(), log10() math functions
  • simplify offset and pitch APIs
  • add support for CUDA 12.2, 12.3 and ROCm 5.6, 5.7, 6.0
  • improve error messages related to kernel launches
  • rework thread pool and callback threads
  • fix alpaka::wait(device, event) function for CUDA/HIP GPUs
  • implement alpaka-ls

Disable the vendor-specific random number generators in Alpaka (NVIDIA cuRAND, AMD rocRAND, Intel DPL).
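
The four math functions named in the list above (copysign(), fma(), log2(), log10()) share their names with functions in the C++ standard library, which gives a convenient host-side reference for the values they are expected to produce. The sketch below uses only `<cmath>` and does not call alpaka at all; `MathSample` and `referenceValues` are hypothetical helper names introduced here for illustration, and whether alpaka's device implementations match these results exactly is an assumption, not something stated in this PR.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type (not part of alpaka): holds host-side reference
// results computed with the C++ standard library.
struct MathSample {
    double copysignValue;  // magnitude of x combined with the sign of y
    double fmaValue;       // x * y + z computed with a single rounding
    double log2Value;      // base-2 logarithm of |x|
    double log10Value;     // base-10 logarithm of |x|
};

// Computes standard-library reference values for the four functions listed
// in the changelog. The absolute value is taken before the logarithms so
// that a negative x does not produce NaN.
inline MathSample referenceValues(double x, double y, double z) {
    double const magnitude = std::fabs(x);
    return MathSample{
        std::copysign(x, y),
        std::fma(x, y, z),
        std::log2(magnitude),
        std::log10(magnitude),
    };
}
```

Calling `referenceValues(8.0, -2.0, 1.0)` from a test or from `main` gives values that can be compared, within a tolerance, against whatever a device kernel writes back.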

@cmsbuild
Contributor

A new Pull Request was created by @fwyzard (Andrea Bocci) for branch IB/CMSSW_14_0_X/master.

@cmsbuild, @aandvalenzuela, @smuzaffar, @iarspider can you please review it and eventually sign? Thanks.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Jan 23, 2024

cms-bot internal usage

@fwyzard
Contributor Author

fwyzard commented Jan 23, 2024

This will require a corresponding update in CMSSW.

@fwyzard
Contributor Author

fwyzard commented Jan 23, 2024

please test with cms-sw/cmssw#43772

@cmsbuild
Contributor

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/36983/summary.html
COMMIT: bf32ea7
CMSSW: CMSSW_14_0_X_2024-01-22-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8957/36983/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

Requested to quit.
Requested to quit.
Requested to quit.
* The action "build-external+alpaka+1.1.0-2f48beae08b1557551668b2b5e1c5bd3" was not completed successfully because Failed to build alpaka. Log file in /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/alpaka/1.1.0-2f48beae08b1557551668b2b5e1c5bd3/log. Final lines of the log file:
warning: line 36: It's not recommended to have unversioned Obsoletes: Obsoletes: external+alpaka+1.1.0-2f48beae08b1557551668b2b5e1c5bd3
error: Bad source: /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/SOURCES/external/alpaka/1.1.0-2f48beae08b1557551668b2b5e1c5bd3/1.1.0-2f48beae08b1557551668b2b5e1c5bd3.tar.gz: No such file or directory

* The action "install-external+alpaka+1.1.0-2f48beae08b1557551668b2b5e1c5bd3" was not completed successfully because The following dependencies could not complete:
build-external+alpaka+1.1.0-2f48beae08b1557551668b2b5e1c5bd3
* The action "build-cms+cmssw-tool-conf+60.0-f22e1c72d73ca12b8bf57d758886aeb1" was not completed successfully because The following dependencies could not complete:
install-external+alpaka+1.1.0-2f48beae08b1557551668b2b5e1c5bd3


@cmsbuild
Contributor

Pull request #8957 was updated.

@fwyzard
Contributor Author

fwyzard commented Jan 23, 2024

please test with cms-sw/cmssw#43772

@fwyzard
Contributor Author

fwyzard commented Jan 23, 2024

please abort

@fwyzard
Contributor Author

fwyzard commented Jan 23, 2024

please test with cms-sw/cmssw#43772

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/36990/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_0_X_2024-01-23-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8957/36990/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/36990/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/36990/git-merge-result

Comparison Summary

Summary:

  • You potentially removed 73 lines from the logs
  • Reco comparison results: 5 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3247526
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3247504
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 200 log files, 161 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Feb 11, 2024

please test with cms-sw/cmssw#43772 for el8_aarch64_gcc12

@fwyzard
Contributor Author

fwyzard commented Feb 11, 2024

please test with cms-sw/cmssw#43772 for el8_ppc64le_gcc12

@cmsbuild
Contributor

-1

Failed Tests: HeaderConsistency UnitTests RelVals AddOn GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37355/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_1_X_2024-02-09-2300/el8_ppc64le_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8957/37355/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37355/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37355/git-merge-result

Unit Tests

I found 1 error in the following unit tests:

---> test testPhysicsToolsSelectorUtilsPythonTestsDriver had ERRORS

RelVals

  • 11634.0: A fatal system signal has occurred: abort signal
  • 12434.0: A fatal system signal has occurred: abort signal
  • 12434.7: A fatal system signal has occurred: abort signal
Expand to see more relval errors ...

AddOn Tests

A fatal system signal has occurred: abort signal
----- Begin Fatal Exception 12-Feb-2024 04:48:33 CET-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling File::sysopen()
Exception Message:
Failed to open the file 'RelVal_Raw_GRun_MC.root'
   Additional Info:
      [a] Input file file:RelVal_Raw_GRun_MC.root could not be opened.
      [b] open() failed with system error 'No such file or directory' (error code 2)
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 12-Feb-2024 04:53:33 CET-----------------------
An exception of category 'FileOpenError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing input source of type PoolSource
   [2] Calling RootInputFileSequence::initTheFile()
   [3] Calling StorageFactory::open()
   [4] Calling File::sysopen()
Exception Message:
Failed to open the file 'RelVal_Raw_GRun_MC.root'
   Additional Info:
      [a] Input file file:RelVal_Raw_GRun_MC.root could not be opened.
      [b] open() failed with system error 'No such file or directory' (error code 2)
----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

GPU Unit Tests

I found 14 errors in the following unit tests:

---> test alpakaTestAtomicPairCounterCudaAsync had ERRORS
---> test alpakaTestIndependentKernelCudaAsync had ERRORS
---> test alpakaTestKernelCudaAsync had ERRORS
and more ...

@cmsbuild
Contributor

-1

Failed Tests: HeaderConsistency UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37356/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_1_X_2024-02-09-2300/el8_aarch64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8957/37356/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37356/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37356/git-merge-result

Unit Tests

I found 3 errors in the following unit tests:

---> test testSiStripDQM_OfflineTkMap had ERRORS
---> test GCPall had ERRORS
---> test testTrackAnalysis had ERRORS

@fwyzard
Contributor Author

fwyzard commented Feb 12, 2024

please test with cms-sw/cmssw#43772

@fwyzard
Contributor Author

fwyzard commented Feb 12, 2024

please test with cms-sw/cmssw#43772 for el8_ppc64le_gcc12

@fwyzard
Contributor Author

fwyzard commented Feb 12, 2024

please test with cms-sw/cmssw#43772 for el8_aarch64_gcc12

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37358/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_1_X_2024-02-11-2300/el8_aarch64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8957/37358/install.sh to create a dev area with all the needed externals and cmssw changes.

@cmsbuild
Contributor

-1

Failed Tests: UnitTests GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37359/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_1_X_2024-02-11-2300/el8_ppc64le_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8957/37359/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found 1 error in the following unit tests:

---> test testPhysicsToolsSelectorUtilsPythonTestsDriver had ERRORS

GPU Unit Tests

I found 14 errors in the following unit tests:

---> test Hits_testCudaAsync had ERRORS
---> test ZVertexSoA_testCudaAsync had ERRORS
---> test TrackSoAHeterogeneousAlpaka_testCudaAsync had ERRORS
and more ...

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e16e15/37364/summary.html
COMMIT: 59d4d94
CMSSW: CMSSW_14_1_X_2024-02-12-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8957/37364/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 45 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3248626
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3248604
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 200 log files, 161 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 34 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 1698
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 38042
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found
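
The DQMHistoTests counts in the GPU comparison above can be cross-checked with a few lines of arithmetic. This is only a sketch: `DqmSummary`, `countsAddUp`, `failureFraction` and `gpuSummary` are hypothetical names, the numbers are copied from the summary, and the assumption that the five categories are mutually exclusive and sum to the total is inferred from the figures rather than documented by cms-bot.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the DQMHistoTests counts reported by cms-bot.
struct DqmSummary {
    std::int64_t total;
    std::int64_t failures;
    std::int64_t nulls;
    std::int64_t successes;
    std::int64_t skipped;
    std::int64_t missing;
};

// True when the per-category counts add up to the reported total.
// Assumption: each histogram falls into exactly one category.
inline bool countsAddUp(DqmSummary const& s) {
    return s.failures + s.nulls + s.successes + s.skipped + s.missing == s.total;
}

// Fraction of compared histograms that failed; 0 for an empty comparison.
inline double failureFraction(DqmSummary const& s) {
    return s.total > 0
               ? static_cast<double>(s.failures) / static_cast<double>(s.total)
               : 0.0;
}

// Values copied from the GPU Comparison Summary:
// total, failures, nulls, successes, skipped, missing.
inline DqmSummary gpuSummary() {
    return DqmSummary{39740, 1698, 0, 38042, 0, 0};
}
```

With these figures the counts are consistent (1698 + 38042 = 39740), and the failing histograms amount to a little over 4% of those compared.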

@smuzaffar
Contributor

+externals

looks good, x86_64, aarch64, ppc64le externals built fine.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next IB/CMSSW_14_1_X/master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)
Notice: This PR was tested with additional Pull Request(s), please also merge them if necessary: cms-sw/cmssw#43772

@cmsbuild
Contributor

REMINDER @antoniovilela, @sextonkennedy, @rappoccio: This PR was tested with cms-sw/cmssw#43772, please check if they should be merged together

@antoniovilela

+1

@cmsbuild cmsbuild merged commit 58d64ce into cms-sw:IB/CMSSW_14_1_X/master Feb 13, 2024
26 of 29 checks passed
@fwyzard fwyzard deleted the IB/CMSSW_14_0_X/master_alpaka_1.1.0 branch March 6, 2024 11:57