
Configure scram to disable one of the GPU backends in a development area? #45859

fwyzard opened this issue Sep 2, 2024 · 26 comments

fwyzard commented Sep 2, 2024

The ROCm (and to some extent CUDA) alpaka backends add a noticeable amount to the time it takes to build some packages.

For users that do not care about running on (AMD) GPUs, we could speed up the compilation by disabling the ROCm (or CUDA) alpaka backend(s).

Also note that it could be much worse if we manage to add the SYCL/oneAPI backend...

This could be implemented in scram, with a syntax like

scram b disable-backend {cuda,rocm}
scram b enable-backend {cuda,rocm}

?

Another way to speed up the compilation would be to target only one actual GPU type, like an NVIDIA T4 or an AMD Mi250.

This could be implemented with a syntax like

scram b enable-backend cuda=sm_89
scram b enable-backend rocm=gfx90a

We could also get the hardware type from cudaComputeCapabilities or rocmComputeCapabilities with a syntax like

scram b enable-backend cuda=native
scram b enable-backend rocm=native

@smuzaffar do you think this could be implemented in scram ?

If you think so, we can discuss the implementation detail here or in person.


fwyzard commented Sep 2, 2024

assign core,heterogeneous


cmsbuild commented Sep 2, 2024

New categories assigned: core,heterogeneous

@Dr15Jones, @fwyzard, @makortel, @smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks


cmsbuild commented Sep 2, 2024

cms-bot internal usage


cmsbuild commented Sep 2, 2024

A new Issue was created by @fwyzard.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@smuzaffar

@fwyzard , yes, we should be able to implement this via scram b ..... How about, for local development (e.g. where the user only wants to test things on the local host), we just use scram build enable-alpaka-native, which on a host with

  • an NVIDIA GPU: disables rocm, and uses cudaComputeCapabilities to get the actual GPU and build only for that GPU type
  • an AMD GPU: disables cuda, and uses rocmComputeCapabilities to get the actual GPU and build only for that GPU type
  • no GPU: disables both the rocm and cuda backends

We can also add scram b {enable|disable}-alpaka-{rocm|cuda} to explicitly enable/disable the rocm/cuda backend build.
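A minimal sketch of that selection logic (a hypothetical standalone script, not the actual scram implementation; it assumes the cudaComputeCapabilities / rocmComputeCapabilities utilities exit non-zero when no matching GPU is found):

```shell
# Pick which alpaka backend to keep, based on the GPUs visible on this host.
# cudaComputeCapabilities / rocmComputeCapabilities are the CMSSW helpers
# discussed in this thread; everything else here is illustrative.
if command -v cudaComputeCapabilities >/dev/null 2>&1 \
   && cudaComputeCapabilities >/dev/null 2>&1; then
    backend=cuda   # NVIDIA GPU present: disable rocm, build native CUDA archs
elif command -v rocmComputeCapabilities >/dev/null 2>&1 \
     && rocmComputeCapabilities >/dev/null 2>&1; then
    backend=rocm   # AMD GPU present: disable cuda, build native ROCm archs
else
    backend=none   # no GPU: disable both backends
fi
echo "selected backend: $backend"
```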

If needed, we can discuss this in core sw meeting tomorrow


fwyzard commented Sep 2, 2024

If needed, we can discuss this in core sw meeting tomorrow

Sounds good.


fwyzard commented Sep 3, 2024

About updating the flags in the cuda.xml and rocm.xml tools.

cuda.xml

The syntax for enabling sm_## is -gencode arch=compute_##,code=[sm_##,compute_##].
So, calling e.g.

scram b enable-backend cuda=sm_89

should remove all the CUDA_FLAGS of the form -gencode arch=compute_[0-9]+,code=[sm_[0-9]+,compute_[0-9]+], and add -gencode arch=compute_89,code=[sm_89,compute_89].
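A sketch of that rewrite on a cuda.xml-like line (the sample content, variable names, and sed pipeline are made up for the demonstration; the real implementation would edit the tool file in place):

```shell
# Strip every existing -gencode clause, then insert one for sm_89.
sample='<flags CUDA_FLAGS="-gencode arch=compute_75,code=[sm_75,compute_75] -gencode arch=compute_89,code=[sm_89,compute_89]"/>'
new=$(printf '%s\n' "$sample" \
  | sed -E 's/-gencode arch=compute_[0-9]+,code=\[sm_[0-9]+,compute_[0-9]+\] ?//g' \
  | sed 's/CUDA_FLAGS="/CUDA_FLAGS="-gencode arch=compute_89,code=[sm_89,compute_89]/')
echo "$new"
```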

The "native" CUDA architectures used by the NVIDIA GPUs in the local machine can be extracted from cudaComputeCapabilities:

$ cudaComputeCapabilities 
   0     8.9    NVIDIA L4
   1     7.5    Tesla T4

A machine with these two GPUs should use the architectures sm_89 and sm_75.
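A sketch of how the native architectures could be derived from that output (assuming the compute capability is always the second column; the sample is copied from above, since the build host may lack a GPU):

```shell
# Convert cudaComputeCapabilities output ("8.9" etc.) into nvcc arch names.
caps='   0     8.9    NVIDIA L4
   1     7.5    Tesla T4'
archs=$(printf '%s\n' "$caps" | awk '{gsub(/\./, "", $2); print "sm_" $2}' | sort -u)
echo "$archs"
```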

Currently there is a script, cmsCudaSetup.sh, that does part of what scram b enable-backend cuda=native should do.

rocm.xml

The syntax for enabling gfx#### is --offload-arch=gfx####, so

scram b enable-backend rocm=gfx1100

should remove all the ROCM_FLAGS of the form --offload-arch=gfx[0-9a-f]+, and add --offload-arch=gfx1100.

Note that the value after gfx can have 3 or 4 hexadecimal digits.

The "native" ROCm architectures used by the AMD GPUs in the local machine can be extracted from rocmComputeCapabilities:

$ rocmComputeCapabilities 
   0     gfx1100    AMD Radeon Pro W7800 (unsupported)
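The same extraction sketch for the ROCm side (the architecture is assumed to be the second column; whether "(unsupported)" devices should be kept is a policy question this sketch ignores):

```shell
# Pull the gfx architecture names out of rocmComputeCapabilities output and
# turn them into --offload-arch flags. Sample copied from above.
caps='   0     gfx1100    AMD Radeon Pro W7800 (unsupported)'
archs=$(printf '%s\n' "$caps" | awk '{print $2}' | sort -u)
# $archs is deliberately unquoted so each architecture becomes one flag.
flags=$(printf -- '--offload-arch=%s\n' $archs)
echo "$flags"
```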


smuzaffar commented Sep 12, 2024

@fwyzard , thanks for the hints in #45859 (comment).
As scram build ... passes everything to gmake as build targets, it is not easy to implement scram build enable-backend cuda (here cuda becomes a build target, which gmake will try to run) or scram build enable-backend cuda=sm_89 (here cuda becomes a variable, overriding the value set by the cuda tool). Instead, how about:

  • scram build {en,dis}able-backend-{cuda,rocm}: to enable/disable the cuda/rocm alpaka backends
  • scram build enable-backend-{cuda,rocm}-[comma-separated-compute-capabilities], e.g.
    • scram build enable-backend-cuda-sm_75 or scram build enable-backend-cuda-sm_75,sm_89
    • scram build enable-backend-rocm-gfx1100 or scram build enable-backend-rocm-gfx1100,gfx90a
    • scram build enable-backend-cuda-native: to find the native compute capabilities and use those
    • scram build enable-backend-cuda-reset: to reset the compute capabilities to their original values (from the release area)
  • scram build enable-backend-native: to disable the backend that is not available and call enable-backend-cuda-native for the backend that is available
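The gmake behaviour described above can be seen with a minimal example (demo.mk is a throwaway file made up for this sketch):

```shell
# gmake treats bare words as goals and NAME=value arguments as variable
# overrides, which is why "enable-backend cuda=sm_89" never reaches the
# target as a plain argument.
printf 'enable-backend:\n\t@echo "cuda variable is: $(cuda)"\n' > demo.mk
out=$(make -f demo.mk enable-backend cuda=sm_89)
echo "$out"
```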


fwyzard commented Sep 12, 2024

I see.

Maybe we could shorten the commands, like

  • scram build {en,dis}able-{cuda,rocm}
  • scram build enable-cuda-sm_75
  • scram build enable-rocm-gfx1100,gfx90a

etc?

And it might be clearer if we separated the backend and the individual targets with a :

  • scram build enable-cuda:sm_75
  • scram build enable-rocm:gfx1100,gfx90a

(I would suggest using = but Make would interpret it as setting a variable)

What do you think ?

@smuzaffar

sounds good, so I will drop -backend from the target and use : for the compute capabilities

@smuzaffar

@fwyzard , for now I have enable-alpaka:native to automatically enable/disable the cuda/rocm backends and set the native compute capabilities. Is this a good target name, or should I change it to enable-alpaka-native? (enable-native sounds very generic)


fwyzard commented Sep 12, 2024

Maybe enable-gpus:native ?


fwyzard commented Sep 12, 2024

But it affects only Alpaka modules, not other modules that may use the process.options.accelerators, right ?

Then enable-alpaka:native may be more correct.

@smuzaffar

Yes, it only affects the alpaka modules. OK, so I will go with enable-alpaka:native then.

@smuzaffar

@fwyzard , {en,dis}able-{cuda,rocm} also affect only alpaka; should we change these to {en,dis}able-alpaka:{cuda,rocm} ?


fwyzard commented Sep 12, 2024

I'm undecided, because then calls like scram b enable-alpaka:cuda:sm_75 start to become complicated.


fwyzard commented Sep 12, 2024

So I'm leaning more towards scram b enable-gpus:native.

Could you implement that, and later today we ask @makortel his opinion ?

@smuzaffar

As enable-{cuda,rocm}:capabilities only affects cuda/rocm directly, those calls can remain enable-{cuda,rocm}:capability.


fwyzard commented Sep 12, 2024

What about disable-cuda ?

@smuzaffar

Currently disable-cuda only disables the alpaka-cuda backend. It does not disable the cuda build rules, so scram will still compile .cu files for non-alpaka packages.


smuzaffar commented Sep 12, 2024

But if we want disable-cuda to disable both the alpaka-cuda backend and also stop building .cu files, then I can do it, but I think for now that would break builds (there are packages which have GPU code dependencies).


fwyzard commented Sep 12, 2024

OK, let me try to summarise:

  • scram b disable-cuda
    • ❌ does not build the alpaka CUDA backend
    • ✔️ builds regular .cu files
  • scram b disable-rocm:
    • ❌ does not build the alpaka ROCm backend
    • ✔️ builds regular .hip.cc files
  • scram b enable-cuda:
    • ✔️ builds the alpaka CUDA backend
    • ✔️ builds regular .cu files
  • scram b enable-cuda:sm_90:
    • changes the cuda.xml tool file to support (only) the sm_90 architecture
    • ✔️ builds the alpaka CUDA backend
    • ✔️ builds regular .cu files
  • scram b enable-cuda:native:
    • ❔ uses cudaComputeCapabilities to determine the architectures of the NVIDIA GPUs in the system
    • ✔️ changes the cuda.xml tool file to support (only) these architectures
    • ✔️ builds the alpaka CUDA backend
    • ✔️ builds regular .cu files
  • scram b enable-rocm, enable-rocm:gfx1100, enable-rocm:native:
    • the same for the AMD GPUs, .hip.cc files, and the ROCm alpaka backend
  • scram b enable-alpaka:native
    • ❔ checks for both NVIDIA and AMD GPUs
    • ✔️ updates the corresponding tool files to support (only) the GPUs present on the system
    • ❔ enables only the alpaka backends for the GPUs present on the system
    • ✔️ builds all regular .cu and .hip.cc files

Is it correct ?

Basically, it would never affect whether the regular .cu and .hip.cc files are built (other than which architecture is built), only whether the alpaka backends are built or not.

So I think I would prefer scram b enable-gpus:native :-)


fwyzard commented Sep 12, 2024

And, once #45844 is complete, we could revisit this

Currently disable-cuda only disables the alpaka-cuda backend. It does not disable the cuda build rules, so scram will still compile .cu files for non-alpaka packages.

and try to disable the CUDA or ROCm backends completely.

@smuzaffar

Is it correct ?

yes this is correct.

So I think I would prefer scram b enable-gpus:native

OK

@smuzaffar

cms-sw/cmssw-config#110 should implement these new rules. scram build help in a dev area should show these new build rules.

@makortel

I'd find it clearest if {enable,disable}-{cuda,rocm} and enable-gpus:native applied equally to the compilation of .cu and .hip.cc files as well. But to be practical, I'm OK with leaving that until #45844 is complete.
