Implement Vertex finder with one kernel instead of 5 #413

VinInn · 2019-11-25T10:09:57Z

After realizing that the number of vertices to split is small, I implemented the whole sequence
(finder, fitter, split, fitter, pt sort) as a single kernel running in ONE block.
To keep things simple, this is done only for the density clustering.
The other options are still available through configuration (or conditional compilation).

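On a T4 the quadruplet workflow is about one percent faster (965.6 vs 952.6 ev/s). The first profile below is the single kernel; the second, after the separator, is the previous version.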

                    5.75%  564.21ms      5000  112.84us  4.8640us  741.69us  gpuVertexFinder::vertexFinderOneKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, int, float, float, float)
      API calls:   25.46%  4.66845s    175000  26.676us  6.3150us  61.718ms  cudaLaunchKernel
                   18.76%  3.43955s     13678  251.47us  1.6050us  5.9472ms  cudaEventSynchronize

   965.6 ±   2.4 ev/s

------
                    2.81%  265.91ms      5000  53.181us  3.0080us  786.65us  gpuVertexFinder::clusterTracksByDensityKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, int, float, float, float)
                    1.90%  180.09ms      5000  36.017us     992ns  131.74us  gpuVertexFinder::sortByPt2Kernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*)
                    0.63%  59.975ms     10000  5.9970us  1.2800us  120.26us  gpuVertexFinder::fitVerticesKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, float)
                    0.29%  27.200ms      5000  5.4400us  1.5680us  215.36us  gpuVertexFinder::splitVerticesKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, float)
      API calls:   29.07%  5.72791s    195000  29.373us  6.0080us  68.871ms  cudaLaunchKernel
                   18.05%  3.55706s    211228  16.839us     586ns  68.982ms  cudaEventRecord

   952.6 ±   2.1 ev/s
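
To make the two profiles easier to compare, the sketch below sums the GPU time and the launch counts of the vertex-finder kernels in each version. The numbers are copied from the `nvprof` lines above; the struct and function names are only illustrative, and the event count of 5000 is an assumption inferred from the per-kernel call counts.

```cpp
#include <cassert>
#include <cstddef>

// One row of the nvprof "GPU activities" summary quoted above.
struct KernelStat {
  const char* name;
  double totalMs;  // total GPU time over the whole job, in milliseconds
  long calls;      // number of launches
};

// Single-kernel version.
constexpr KernelStat kFused[] = {
    {"vertexFinderOneKernel", 564.21, 5000},
};

// Previous version; fitVerticesKernel is launched twice per event.
constexpr KernelStat kSeparate[] = {
    {"clusterTracksByDensityKernel", 265.91, 5000},
    {"sortByPt2Kernel", 180.09, 5000},
    {"fitVerticesKernel", 59.975, 10000},
    {"splitVerticesKernel", 27.200, 5000},
};

// Total GPU time of a set of kernels, in milliseconds.
template <std::size_t N>
double totalMs(const KernelStat (&stats)[N]) {
  double sum = 0.0;
  for (const KernelStat& s : stats) sum += s.totalMs;
  return sum;
}

// Total number of kernel launches.
template <std::size_t N>
long totalCalls(const KernelStat (&stats)[N]) {
  long sum = 0;
  for (const KernelStat& s : stats) sum += s.calls;
  return sum;
}

// Average GPU time per event, in microseconds.
// `events` is an assumption (5000, inferred from the call counts).
template <std::size_t N>
double usPerEvent(const KernelStat (&stats)[N], long events) {
  assert(events > 0);
  return totalMs(stats) * 1000.0 / static_cast<double>(events);
}
```

With these inputs the separate kernels add up to 533.175 ms over 25000 launches, against 564.21 ms over 5000 launches for the single kernel. The single kernel therefore uses slightly more GPU time per event, and the 20000 fewer launches match the drop in `cudaLaunchKernel` calls from 195000 to 175000, so the throughput gain presumably comes from the reduced launch overhead rather than from faster GPU code.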

fwyzard · 2019-11-26T15:09:48Z

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at 614ee0b
Testing PRs:

Implement Vertex finder with one kernel instead of 5 #413 at e1b0878

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/151f02097ccf0531a5054afbb9fc8d521f2c6d96/log .

makortel · 2019-11-26T15:21:21Z

RecoPixelVertexing/PixelVertexFinding/src/gpuVertexFinderImpl.h

+    if (oneKernel_) {
+      // implemented only for density clustesrs   
+#ifdef ONE_KERNEL
+      vertexFinderOneKernel<<<1, 1024 - 256, 0, stream>>>(soa, ws_d.get(), minT, eps, errmax, chi2max);


Is there a good motivation for the "one kernel" to be controlled both by `#ifdef` and by a configuration parameter?

VinInn · 2019-11-26

`ONE_KERNEL` selects one kernel vs three. Previously the "one" variant was done using dynamic parallelism (the same as three, but launched from its own kernel), which at the moment does not even link...
I still hope to be able to test dynamic parallelism at some point
(so OK, everything can be done at the config level).
I can remove the three-kernel option; it is just a leftover of the development history at this point...

makortel · 2019-11-26

OK, I was just curious. I'm fine with leaving it in, but maybe add a comment explaining `ONE_KERNEL`?

VinInn · 2019-11-26

I will clean up and add comments in the next iteration.

fwyzard · 2019-11-26T17:51:29Z

@VinInn, the throughput actually shows a slight slowdown.

VinInn · 2019-11-26T17:54:26Z

We can revert to 5 kernels (changing one line is enough).

VinInn · 2019-11-27T09:49:43Z

Oops, a bug: missing braces.
It was running the last 4 kernels twice.
(The bug was introduced with the latest commit, which made the choice configurable at run time.)
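
For readers who want to see how missing braces can produce this effect, here is a minimal host-side sketch. It is a hypothetical reconstruction, not the code from the PR: the kernel launches are replaced by counters, and all names (`StageCounts`, `launchBuggy`, `launchFixed`) are invented for illustration.

```cpp
#include <cassert>

// Counts how many times each stage of the vertex finder is executed.
// Hypothetical stand-in for the real kernel launches.
struct StageCounts {
  int cluster = 0;
  int fit = 0;
  int split = 0;
  int sort = 0;
};

// Stand-ins for the individual kernels.
inline void runCluster(StageCounts& n) { ++n.cluster; }
inline void runFit(StageCounts& n) { ++n.fit; }
inline void runSplit(StageCounts& n) { ++n.split; }
inline void runSort(StageCounts& n) { ++n.sort; }

// Stand-in for the single kernel: cluster, fit, split, fit, sort.
inline void runFused(StageCounts& n) {
  runCluster(n);
  runFit(n);
  runSplit(n);
  runFit(n);
  runSort(n);
}

// Without braces only the first statement belongs to the else branch,
// so the last four stages run unconditionally, also after the fused kernel.
inline void launchBuggy(bool oneKernel, StageCounts& n) {
  if (oneKernel)
    runFused(n);
  else
    runCluster(n);
  runFit(n);
  runSplit(n);
  runFit(n);
  runSort(n);
}

// With braces the separate stages run only when the fused kernel does not.
inline void launchFixed(bool oneKernel, StageCounts& n) {
  if (oneKernel) {
    runFused(n);
  } else {
    runCluster(n);
    runFit(n);
    runSplit(n);
    runFit(n);
    runSort(n);
  }
}
```

In this sketch `launchBuggy(true, n)` executes the fit, split, fit and sort stages twice, while `launchFixed` executes them once in either mode. The results stay the same if the repeated stages are idempotent, which would explain why such a bug shows up only as a slowdown.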

So this morning I ran in a new area, using the latest Patatrack master,
each version twice on a T4 ("Running 4 times over 5000 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs"):

current:        980.7 ±   9.2 ev/s    986.0 ±   5.6 ev/s
oneKernel:      995.8 ±   9.5 ev/s    992.2 ±   0.6 ev/s
ThreeKernels:   990.3 ±   7.1 ev/s    994.7 ±   9.7 ev/s
FiveKernels:    985.7 ±   3.8 ev/s    989.5 ±   5.7 ev/s
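
As a rough reading aid, the sketch below averages the two runs of each version and expresses the result relative to `current`. The throughput values are copied from the table above; the type and function names are only illustrative, and taking a plain mean of the two runs (ignoring the quoted uncertainties) is a simplifying assumption.

```cpp
#include <cassert>

// Throughput of one version in ev/s, for the two runs listed above.
struct Throughput {
  const char* label;
  double run1;
  double run2;
};

// Values copied from the table above.
constexpr Throughput kRuns[] = {
    {"current", 980.7, 986.0},
    {"oneKernel", 995.8, 992.2},
    {"ThreeKernels", 990.3, 994.7},
    {"FiveKernels", 985.7, 989.5},
};

// Plain mean of the two runs (assumption: uncertainties are ignored).
inline double meanThroughput(const Throughput& t) {
  return 0.5 * (t.run1 + t.run2);
}

// Gain of `t` over `baseline`, in percent of the baseline mean.
inline double gainPercent(const Throughput& t, const Throughput& baseline) {
  assert(meanThroughput(baseline) > 0.0);
  return 100.0 * (meanThroughput(t) / meanThroughput(baseline) - 1.0);
}
```

With these inputs the means are 983.35, 994.0, 992.5 and 987.6 ev/s, so `oneKernel` comes out about 1.1% above `current`, `ThreeKernels` about 0.9% and `FiveKernels` about 0.4%. These differences are of the same size as the quoted uncertainties (up to about 10 ev/s, roughly 1%), so they should be read with caution.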

VinInn · 2019-11-27T09:57:32Z

OK, I need to rebase.

fwyzard · 2019-11-28T16:04:33Z

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at cb27c23
Testing PRs:

Implement Vertex finder with one kernel instead of 5 #413 at d0eec01

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

tracking validation plots and summary for workflow 10824.5
tracking validation plots and summary for workflow 10824.51
tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

logs and `nvprof`/`nvvp` profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

reference release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.5
- ✔️ step3.py: log
development release, workflow 10824.51
- ✔️ step3.py: log
development release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
development release, workflow 136.86452
testing release, workflow 10824.5
- ✔️ step3.py: log
testing release, workflow 10824.51
- ✔️ step3.py: log
testing release, workflow 10824.52
- ✔️ step3.py: log
- ✔️ profile.py: log
- ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
- ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
- ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/472b355b129ca166813e91122696e0de78b2a199/log .

VinInn requested a review from makortel November 25, 2019 10:10

makortel reviewed Nov 26, 2019

makortel approved these changes Nov 26, 2019

fwyzard added the Pixels Pixels-related developments label Nov 27, 2019

VinInn added 5 commits November 27, 2019 11:01

factorize kernels to allow dynamic parallelism

89c0fb0

make single kernel working with just one block

7703dc8

increase maxtk and make onekernel default

10f8a0b

clean up for production

c4f4292

fix bugs, rename condcomp flag

d0eec01

VinInn force-pushed the OneKernelVertex11 branch from 034e339 to d0eec01 Compare November 27, 2019 10:02

Empty lines

f23303b

fwyzard merged commit 459524a into cms-patatrack:CMSSW_11_0_X_Patatrack Dec 3, 2019

makortel mentioned this pull request Dec 13, 2019

[RFC] Reduce calls to cudaEventRecord() via the caching allocators #412

Open

fwyzard pushed a commit that referenced this pull request Oct 8, 2020

Implement GPU vertex finder with a single kernel (#413)

6286df9

fwyzard mentioned this pull request Oct 8, 2020

Patatrack integration - Pixel vertex reconstruction (11/N) cms-sw/cmssw#31723

Merged

fwyzard pushed a commit that referenced this pull request Oct 20, 2020

Implement GPU vertex finder with a single kernel (#413)

e779a69

fwyzard pushed a commit that referenced this pull request Oct 23, 2020

Implement GPU vertex finder with a single kernel (#413)

b61907d

fwyzard pushed a commit that referenced this pull request Nov 6, 2020

Implement GPU vertex finder with a single kernel (#413)

7397438

fwyzard mentioned this pull request Nov 6, 2020

Patatrack integration - Pixel track reconstruction (10/N) cms-sw/cmssw#31722

Merged

fwyzard pushed a commit that referenced this pull request Nov 6, 2020

Implement GPU vertex finder with a single kernel (#413)

1cb9b2a

fwyzard pushed a commit that referenced this pull request Nov 16, 2020

Implement GPU vertex finder with a single kernel (#413)

4233099

fwyzard added a commit that referenced this pull request Dec 26, 2020

Implement GPU vertex finder with a single kernel (#413)

a4fb9ba

fwyzard pushed a commit that referenced this pull request Dec 29, 2020

Implement GPU vertex finder with a single kernel (#413)

22b289a

fwyzard pushed a commit that referenced this pull request Jan 13, 2021

Implement GPU vertex finder with a single kernel (#413)

d5384e7

fwyzard pushed a commit that referenced this pull request Jan 15, 2021

Implement GPU vertex finder with a single kernel (#413)

e459d8c
