Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional flags to disable SOA->legacy conversion and GPU->CPU transfer #132

Merged

Conversation

makortel
Copy link

This PR simplifies the raw2cluster and rechits CPU output so that they are always produced (since by default the products are always needed in CPU, resolves #67). Then it adds explicit configuration flags to switch off the conversion from CPU-SOA to CPU-legacy, and the GPU->CPU transfers altogether. It also adds two customization functions extending the "profiling workflow" to disable the data conversion, and, in addition, disable the transfers.

The customize functions are useful to study the technical performance of a "what if" case where we would replace the legacy dataformats with SOAs.

@felicepantaleo @fwyzard @VinInn

@fwyzard
Copy link

fwyzard commented Aug 15, 2018

Validation summary

Reference release CMSSW_10_2_2_patch1 at 400b27e
Development branch CMSSW_10_2_X_Patatrack at 12f20c1
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_1-PU25ns_102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_1-102X_upgrade2018_realistic_v9_gcc7-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/3d906310ff6425d9a888f04aa92234c862a5fef7/log .

@fwyzard
Copy link

fwyzard commented Aug 16, 2018

@makortel what is the use case of disabling the conversions but not the transfers ?

@makortel
Copy link
Author

To separate the costs of "GPU-SOA -> CPU-SOA" and "CPU-SOA -> CPU-legacy", eventually showing that the latter is very costly.

(Ok, it's a bit cheating because if we want to persist the CPU-SOA on disk, very likely we can't use the cudaMallocHost for that, so we'd need a memcpy to another CPU-SOA, but I'd expect that to be relatively cheap. OTOH, HLT doesn't need to save clusters+hits to disk)

@fwyzard
Copy link

fwyzard commented Aug 17, 2018

(Ok, it's a bit cheating because if we want to persist the CPU-SOA on disk, very likely we can't use the cudaMallocHost for that, so we'd need a memcpy to another CPU-SOA, but I'd expect that to be relatively cheap.

That's because the framework needs to own the memory ?

@fwyzard
Copy link

fwyzard commented Aug 17, 2018

I hope I fixed the conflicts correctly...

@fwyzard
Copy link

fwyzard commented Aug 17, 2018

Seems so.
However, I would rather leave 59fe1f5 out of this PR - but maybe not remove it from your own branch...

@fwyzard fwyzard merged commit 89e6ad7 into cms-patatrack:CMSSW_10_2_X_Patatrack Aug 17, 2018
fwyzard pushed a commit that referenced this pull request Aug 17, 2018
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
@makortel
Copy link
Author

(Ok, it's a bit cheating because if we want to persist the CPU-SOA on disk, very likely we can't use the cudaMallocHost for that, so we'd need a memcpy to another CPU-SOA, but I'd expect that to be relatively cheap.

That's because the framework needs to own the memory ?

It's more for serialization by ROOT. It may be possible to serialize cudaMallocHost-allocated objects such that they get deserialized with malloc, but it is not evident (to me) how to achieve that.

In addition, in principle another reason to copy soon from non-pageable memory to regular memory on CPU is that non-pageable memory is somewhat limited resource (ok, up to ~4 GB should be relatively safe still what I've read, so I'd expect it to take some time before we hit that limit in practice).

@makortel
Copy link
Author

However, I would rather leave 59fe1f5 out of this PR - but maybe not remove it from your own branch...

That one is actually a crucial part of the more slimmed customizations. The output module (apparently) needs a non-transient product to consume to create the correct dependencies. I.e. consuming e.g. *_pixelTracksHitQuadruplets_*_* leads to nothing happen. So I took the liberty of adding a dummy producer in between, that would be made to produce the pixel track objects after we run the Riemann fit within CA producer.

fwyzard pushed a commit that referenced this pull request Aug 17, 2018
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
@fwyzard
Copy link

fwyzard commented Aug 17, 2018

That one is actually a crucial part of the more slimmed customizations.

Right, I should have grepped case insensitive... I've now restored it.

@fwyzard
Copy link

fwyzard commented Aug 17, 2018

It's more for serialization by ROOT. It may be possible to serialize cudaMallocHost-allocated objects such that they get deserialized with malloc, but it is not evident (to me) how to achieve that.

OK, that's beyond my expertise - but why would it make a difference where the original memory comes from ?

Do you mean that if we use something like a vector with a CUDA-aware allocator, the allocator is part of the type, and ROOT needs to be aware of it ?

@fwyzard
Copy link

fwyzard commented Aug 17, 2018

In addition, in principle another reason to copy soon from non-pageable memory to regular memory on CPU is that non-pageable memory is somewhat limited resource (ok, up to ~4 GB should be relatively safe still what I've read, so I'd expect it to take some time before we hit that limit in practice).

I think we can have as much non-pageable memory as the available physical memory.
For example, on vinavx2 I could cudaMallocHost (and then memset to make sure it was really allocated) up to 14 GB.

@fwyzard
Copy link

fwyzard commented Aug 19, 2018

Going back to

It's more for serialization by ROOT. It may be possible to serialize cudaMallocHost-allocated objects such that they get deserialized with malloc, but it is not evident (to me) how to achieve that.

Reading the documentation for cudaHostRegister, it should be possible to create e.g. an std::vector without any special allocator, and register its memory buffer to CUDA / make it non-pageable (as long as it is not reallocated later on).

@makortel
Copy link
Author

@fwyzard Replying collectively

OK, that's beyond my expertise - but why would it make a difference where the original memory comes from ?

Do you mean that if we use something like a vector with a CUDA-aware allocator, the allocator is part of the type, and ROOT needs to be aware of it ?

Something like that. If we store struct of pointers and ints, I'd guess ROOT can't serialize it without a custom streamer. With vector the allocator is indeed in the type. Maybe with some custom container class we could hide the allocator.

I think we can have as much non-pageable memory as the available physical memory.
For example, on vinavx2 I could cudaMallocHost (and then memset to make sure it was really allocated) up to 14 GB.

Ok, I'm happy to stand corrected. I just recalled people wondering in NVidia forums why they can't allocate more than ~4 GB non-pageable even if the system has more.

Reading the documentation for cudaHostRegister, it should be possible to create e.g. an std::vector without any special allocator, and register its memory buffer to CUDA / make it non-pageable (as long as it is not reallocated later on).

While cudaHostRegister looks nice it is claimed to be expensive. Of course we could measure which one is more expensive, cudaHostRegister or memcpy. Another concern I have with per-event-per-module-per-EDM-stream cudaMallocHost or cudaHostRegister is a synchronization with CUDA runtime/driver(?) as I saw in #125.

@cmsbot
Copy link

cmsbot commented Aug 21, 2018

A new Pull Request was created by @makortel (Matti Kortelainen) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

RecoLocalTracker/Configuration
RecoLocalTracker/SiPixelClusterizer
RecoLocalTracker/SiPixelRecHits
RecoPixelVertexing/Configuration
RecoPixelVertexing/PixelTrackFitting
RecoPixelVertexing/PixelTriplets
SimTracker/TrackerHitAssociation

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@fwyzard fwyzard added this to the CMSSW_10_2_4_Patatrack milestone Sep 2, 2018
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 19, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard added a commit that referenced this pull request Nov 27, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Dec 25, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Dec 26, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Dec 29, 2020
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Jan 15, 2021
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
fwyzard pushed a commit that referenced this pull request Apr 1, 2021
…nsfer (#132)

Always produce the CPU cluster and rechit collections, since they are needed anyway.
Add transfer and conversion flags to clusterizer, rechits and CA.
Add a skeleton for the future pixel track producer.
Add customize functions to disable conversions to legacy formats, and to disable unnecessary GPU->CPU transfers.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants