[BuildRules] Enable Alpaka/Rocm backend #8301

Closed
smuzaffar wants to merge 4 commits into IB/CMSSW_13_1_X/master from smuzaffar-patch-19


smuzaffar
Contributor

No description provided.

@cmsbuild
Contributor

cmsbuild commented Feb 8, 2023

A new Pull Request was created by @smuzaffar (Malik Shahzad Muzaffar) for branch IB/CMSSW_13_0_X/master.

@cmsbuild, @smuzaffar, @aandvalenzuela, @iarspider can you please review it and eventually sign? Thanks.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.
cms-bot commands are listed here

@smuzaffar
Contributor Author

smuzaffar commented Feb 8, 2023

test parameters:

@smuzaffar
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Feb 8, 2023

-1

Failed Tests: ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30524/summary.html
COMMIT: 58499f0
CMSSW: CMSSW_13_0_X_2023-02-08-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30524/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30524/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30524/git-merge-result

Clang Build

I found a compilation error while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

>> Entering Package RecoPixelVertexing/PixelVertexFinding
>> Entering Package RecoTauTag/HLTProducers
>> Entering Package RecoTracker/TkSeedGenerator
>> Entering Package FWCore/Version
>> Compile sequence completed for CMSSW CMSSW_13_0_X_2023-02-08-1100
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 1
+ eval scram build outputlog '&&' '(python3' /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cms-bot/buildLogAnalyzer.py --logDir /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-08-1100/tmp/el8_amd64_gcc11/cache/log/src '||' 'true)'
++ scram build outputlog
>> Entering Package Alignment/OfflineValidation
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-08-1100/src/Alignment/OfflineValidation/bin/DMRmerge.cc
>> Compiling  /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_0_X_2023-02-08-1100/src/Alignment/OfflineValidation/bin/Options.cc


@smuzaffar
Contributor Author

please test

@cmsbuild
Contributor

cmsbuild commented Feb 9, 2023

-1

Failed Tests: RelVals RelVals-INPUT AddOn
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30533/summary.html
COMMIT: 58499f0
CMSSW: CMSSW_13_0_X_2023-02-08-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals

----- Begin Fatal Exception 09-Feb-2023 10:26:46 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named step2_L1REPACK_HLT.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 09-Feb-2023 10:40:11 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named step2_DIGI_L1_DIGI2RAW_HLT.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 09-Feb-2023 10:40:16 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named step2_DIGI_L1_DIGI2RAW_HLT.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

RelVals-INPUT

  • 139.001: 139.001_RunMinimumBias2021/step2_RunMinimumBias2021.log
  • 139.002: 139.002_RunZeroBias2021/step2_RunZeroBias2021.log
  • 139.003: 139.003_RunHLTPhy2021/step2_RunHLTPhy2021.log
Expand to see more relval errors ...

AddOn Tests

----- Begin Fatal Exception 09-Feb-2023 10:27:17 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_13_0_X_2023-02-08-2300/src/HLTrigger/Configuration/test/OnLine_HLT_GRun.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 09-Feb-2023 10:28:28 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named RelVal_HLT_RAW2DIGI_L1Reco_RECO.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 09-Feb-2023 10:27:09 CET-----------------------
An exception of category 'ConfigFileReadError' occurred while
   [0] Processing the python configuration file named /data/cmsbld/jenkins/workspace/ib-run-pr-addon/CMSSW_13_0_X_2023-02-08-2300/src/HLTrigger/Configuration/test/OnLine_HLT_HIon.py
Exception Message:
 unknown python problem occurred.
AttributeError: CUDAService

At:
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/HeterogeneousCore/CUDACore/python/ProcessAcceleratorCUDA.py(36): apply
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1575): handleProcessAccelerators
  /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/CMSSW_13_0_X_2023-02-08-2300/src/FWCore/ParameterSet/python/Config.py(1459): fillProcessDesc
  <string>(2): <module>

----- End Fatal Exception -------------------------------------------------
Expand to see more addon errors ...

@fwyzard
Contributor

fwyzard commented Feb 9, 2023

Now I am confused :-(

The same workflows work for me using a release area created with /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmsdist/8301/30533/install.sh, both on a machine with GPUs and on a machine without any GPUs.

@makortel
Contributor

makortel commented Feb 9, 2023

I can reproduce the error if I artificially set ProcessAcceleratorCUDA._available = False in cms-sw/cmssw#40725. It seems strange that del or delattr() raises an error when the CUDAService is clearly there. I'll take a deeper look.

@makortel
Contributor

makortel commented Feb 9, 2023

Ah, it's because the Process is wrapped as ProcessForProcessAccelerator when passed to ProcessAccelerator.apply(), and that doesn't implement __delattr__(). I'll add it.
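
For readers following along, here is a minimal sketch of that failure mode in plain Python. `FakeProcess`, `ProxyWithoutDelattr`, `ProxyWithDelattr` and `try_delete` are hypothetical names made up for illustration; this is not the actual `ProcessForProcessAccelerator` code, only the general pattern of a wrapper that forwards attribute reads and writes but not deletes.

```python
class FakeProcess:
    """Hypothetical stand-in for a process object; it just holds attributes."""

    def __init__(self):
        self.CUDAService = object()


class ProxyWithoutDelattr:
    """Forwards attribute reads and writes to the wrapped object, but not deletes."""

    def __init__(self, wrapped):
        # Bypass our own __setattr__ so the reference is stored on the proxy itself.
        object.__setattr__(self, "_wrapped", wrapped)

    def __getattr__(self, name):
        # Only called when normal lookup on the proxy fails.
        return getattr(object.__getattribute__(self, "_wrapped"), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, "_wrapped"), name, value)


class ProxyWithDelattr(ProxyWithoutDelattr):
    """Same proxy, with deletes forwarded to the wrapped object as well."""

    def __delattr__(self, name):
        delattr(object.__getattribute__(self, "_wrapped"), name)


def try_delete(proxy, name):
    """Return True if the attribute could be deleted through the proxy."""
    try:
        delattr(proxy, name)
        return True
    except AttributeError:
        return False
```

With the first proxy, `hasattr(proxy, "CUDAService")` is true because reads are forwarded, yet `delattr` falls back to the default implementation, which looks only at the proxy itself and raises `AttributeError`. That would explain why the attribute is "clearly there" and still cannot be deleted.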

@fwyzard
Contributor

fwyzard commented Feb 9, 2023

Ah, thanks for the fix.

> I can reproduce the error if I artificially set ProcessAcceleratorCUDA._available = False in cms-sw/cmssw#40725.

Any idea why it worked for me on a machine without any GPUs, and on a machine where I set explicitly CUDA_VISIBLE_DEVICES= ?

@makortel
Contributor

makortel commented Feb 9, 2023

The fix is in cms-sw/cmssw#40736

> > I can reproduce the error if I artificially set ProcessAcceleratorCUDA._available = False in cms-sw/cmssw#40725.
>
> Any idea why it worked for me on a machine without any GPUs, and on a machine where I set explicitly CUDA_VISIBLE_DEVICES= ?

I have no idea.

@smuzaffar
Contributor Author

please test

smuzaffar changed the base branch from IB/CMSSW_13_0_X/master to IB/CMSSW_13_1_X/master on February 11, 2023 11:51
@cmsbuild
Contributor

Pull request #8301 was updated.

@cmsbuild
Contributor

Pull request #8301 was updated.

@fwyzard
Contributor

fwyzard commented Feb 27, 2023

please test with cms-sw/cmssw#40832

@fwyzard
Contributor

fwyzard commented Feb 27, 2023

please test with cms-sw/cmssw#40832 for el8_ppc64le_gcc11

@cmsbuild
Contributor

-1

Failed Tests: Build
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30918/summary.html
COMMIT: e45f0cc
CMSSW: CMSSW_13_1_X_2023-02-26-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8301/30918/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30918/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30918/git-merge-result

Build

I found a compilation error when building:

>> Cuda Device Link tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelCudaAsync/alpakaTestKernelCudaAsync_cudadlink.o 
>> Building alpaka/cuda binary alpakaTestKernelCudaAsync
Copying tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelCudaAsync/alpakaTestKernelCudaAsync to productstore area:
>> Building alpaka/rocm binary alpakaTestKernelROCmAsync
/data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc11/external/gcc/11.2.1-f9b9dfdd886f71cd63f5538223d8f161/bin/../lib/gcc/x86_64-redhat-linux-gnu/11.2.1/../../../../x86_64-redhat-linux-gnu/bin/ld: cannot find tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelROCmAsync/alpaka/testKernel.dev.cc.o: No such file or directory
collect2: error: ld returned 1 exit status
>> Deleted: tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelROCmAsync/alpakaTestKernelROCmAsync
gmake: *** [tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelROCmAsync/alpakaTestKernelROCmAsync] Error 1
>> Compiling alpaka/serial /data/cmsbld/jenkins/workspace/ib-run-pr-tests/CMSSW_13_1_X_2023-02-26-2300/src/HeterogeneousCore/AlpakaInterface/test/alpaka/testKernel.dev.cc
>> Building alpaka/serial binary alpakaTestKernelSerialSync
Copying tmp/el8_amd64_gcc11/src/HeterogeneousCore/AlpakaInterface/test/alpakaTestKernelSerialSync/alpakaTestKernelSerialSync to productstore area:


@cmsbuild
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30922/summary.html
COMMIT: e45f0cc
CMSSW: CMSSW_13_1_X_2023-02-26-2300/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8301/30922/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30922/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30922/git-merge-result

Unit Tests

I found errors in the following unit tests:

---> test testFWCoreUtilities had ERRORS
---> test testONNXRuntime had ERRORS

@fwyzard
Contributor

fwyzard commented Mar 2, 2023

please test

@fwyzard
Contributor

fwyzard commented Mar 2, 2023

@smuzaffar @perrotta @rappoccio, now that 13.1.0-pre1 is out, can we merge this in 13.1.x, and backport it to 13.0.0 ?

@fwyzard
Contributor

fwyzard commented Mar 2, 2023

(all test failures were unrelated to this PR, let's see if they go away re-running on a more recent IB)

@fwyzard
Contributor

fwyzard commented Mar 2, 2023

please test for el8_ppc64le_gcc11

@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30987/summary.html
COMMIT: e45f0cc
CMSSW: CMSSW_13_0_X_2023-03-01-2300/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8301/30987/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

FATAL: malformed spec found while quering it. Command: 
source /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/el8_amd64_gcc11/rpm-env.sh ;  rpm -q --specfile /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-coral --info --define "cmsdist_directory /data/cmsbld/jenkins/workspace/ib-run-pr-tests/cmsdist" --define "compilerv 1121" --define "cmscompilerv 11" --define "cmsos el8_amd64" --define "almalinux_ver 8" --define "almalinux 8" --define "centos_ver 8" --define "centos 8" --define "rhel 8" --define "dist .el8" --define "el8 1" --define "package_vectorization %{nil}" --define "cmsswdata_version_link 1" --define 'buildroot /foo'
Resulted in:

warning: line 30: It's not recommended to have unversioned Obsoletes: Obsoletes: cms+coral+CORAL_2_3_21
error: line 417: Unknown tag: <<<<<<< HEAD
error: query of specfile /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-coral failed, can't parse
Traceback (most recent call last):
  File "./pkgtools/cmsBuild", line 4610, in 
    build(opts, args[1:], PKGFactory)
  File "./pkgtools/cmsBuild", line 3875, in build


@cmsbuild
Contributor

cmsbuild commented Mar 2, 2023

-1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-fb8f22/30988/summary.html
COMMIT: e45f0cc
CMSSW: CMSSW_13_0_X_2023-02-28-1500/el8_ppc64le_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/8301/30988/install.sh to create a dev area with all the needed externals and cmssw changes.

External Build

I found a compilation error when building:

FATAL: malformed spec found while quering it. Command: 
source /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/el8_ppc64le_gcc11/rpm-env.sh ;  rpm -q --specfile /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-coral --info --define "cmsdist_directory /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/cmsdist" --define "compilerv 1121" --define "cmscompilerv 11" --define "cmsos el8_ppc64le" --define "almalinux_ver 8" --define "almalinux 8" --define "centos_ver 8" --define "centos 8" --define "rhel 8" --define "dist .el8" --define "el8 1" --define "package_vectorization %{nil}" --define "cmsswdata_version_link 1" --define 'buildroot /foo'
Resulted in:

warning: line 30: It's not recommended to have unversioned Obsoletes: Obsoletes: cms+coral+CORAL_2_3_21
error: line 417: Unknown tag: <<<<<<< HEAD
error: query of specfile /scratch/cmsbuild/jenkins_a/workspace/ib-run-pr-tests/testBuildDir/tmp/tmpspec-coral failed, can't parse
Traceback (most recent call last):
  File "./pkgtools/cmsBuild", line 4610, in 
    build(opts, args[1:], PKGFactory)
  File "./pkgtools/cmsBuild", line 3875, in build


@fwyzard
Contributor

fwyzard commented Mar 2, 2023

ehm, what ?

@fwyzard
Contributor

fwyzard commented Mar 2, 2023

I presume this

error: line 417: Unknown tag: <<<<<<< HEAD

means there is a conflict somewhere... I've opened #8346 with the same diff but a clean commit history.
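
A quick way to check a spec file for leftover conflict markers before testing could look like the sketch below. `find_conflict_markers` is a hypothetical helper, not part of cmsBuild or cms-bot, and the sample text is invented for the example.

```python
def find_conflict_markers(text):
    """Return (line_number, line) pairs for lines that look like git conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Opening and closing markers carry a label; the separator is a bare line.
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "=======":
            hits.append((number, line))
    return hits


# Invented sample: a spec fragment with an unresolved merge conflict.
sample = "\n".join([
    "Requires: foo",
    "<<<<<<< HEAD",
    "Requires: bar",
    "=======",
    "Requires: baz",
    ">>>>>>> other-branch",
])
markers = find_conflict_markers(sample)
```

A bare `=======` line can also appear legitimately in some text formats, so a hit on the separator alone is only a hint; the `<<<<<<<` and `>>>>>>>` lines are the stronger signal.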

@smuzaffar
Contributor Author

The problem here is that the test also used a cmssw PR for 13.0.X ( #8301 (comment) ), which made the bot use the 13.0.X cmsdist branch ( CMSSW_13_0_X_2023-02-28-1500/el8_ppc64le_gcc11 ).
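
To make the mismatch concrete, here is a small sketch that compares the release cycle in an IB name with the one in a cmsdist branch name. The name formats are taken from this thread; `cycle_of` is a hypothetical helper, and this is only an assumption about how one could check it by hand, not how cms-bot actually selects the branch.

```python
import re

# Matches a release cycle such as CMSSW_13_0_X inside an IB or branch name.
_CYCLE = re.compile(r"CMSSW_\d+_\d+_X")


def cycle_of(name):
    """Extract the release cycle (e.g. 'CMSSW_13_0_X') from a name, or None."""
    match = _CYCLE.search(name)
    return match.group(0) if match else None


# IB used by the failing test versus the base branch of this PR.
ib_cycle = cycle_of("CMSSW_13_0_X_2023-02-28-1500/el8_ppc64le_gcc11")
branch_cycle = cycle_of("IB/CMSSW_13_1_X/master")
mismatch = ib_cycle != branch_cycle
```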

smuzaffar closed this on Mar 3, 2023
smuzaffar deleted the smuzaffar-patch-19 branch on March 3, 2023 14:19

4 participants