Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Vertex finder with one kernel instead of 5 #413

Merged

Conversation

VinInn
Copy link

@VinInn VinInn commented Nov 25, 2019

After realizing that the number of vertex to split is small I implemented the whole sequence
finder, fitter,split,fitter,ptsort as a single kernel of ONE block.
to make things simple only for the Density clustering.
Other options are still available using configurable (or conditional compilation).

The quadruplet workflow is a couple of percent (one...) faster on T4.

                    5.75%  564.21ms      5000  112.84us  4.8640us  741.69us  gpuVertexFinder::vertexFinderOneKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, int, float, float, float)
      API calls:   25.46%  4.66845s    175000  26.676us  6.3150us  61.718ms  cudaLaunchKernel
                   18.76%  3.43955s     13678  251.47us  1.6050us  5.9472ms  cudaEventSynchronize

   965.6 ±   2.4 ev/s

------
                    2.81%  265.91ms      5000  53.181us  3.0080us  786.65us  gpuVertexFinder::clusterTracksByDensityKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, int, float, float, float)
                    1.90%  180.09ms      5000  36.017us     992ns  131.74us  gpuVertexFinder::sortByPt2Kernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*)
                    0.63%  59.975ms     10000  5.9970us  1.2800us  120.26us  gpuVertexFinder::fitVerticesKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, float)
                    0.29%  27.200ms      5000  5.4400us  1.5680us  215.36us  gpuVertexFinder::splitVerticesKernel(ZVertexSoA*, gpuVertexFinder::WorkSpace*, float)
      API calls:   29.07%  5.72791s    195000  29.373us  6.0080us  68.871ms  cudaLaunchKernel
                   18.05%  3.55706s    211228  16.839us     586ns  68.982ms  cudaEventRecord

   952.6 ±   2.1 ev/s

@VinInn VinInn requested a review from makortel November 25, 2019 10:10
@fwyzard
Copy link

fwyzard commented Nov 26, 2019

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at 614ee0b
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.86452.png
zoom-136.86452.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/151f02097ccf0531a5054afbb9fc8d521f2c6d96/log .

if (oneKernel_) {
// implemented only for density clustesrs
#ifdef ONE_KERNEL
vertexFinderOneKernel<<<1, 1024 - 256, 0, stream>>>(soa, ws_d.get(), minT, eps, errmax, chi2max);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good motivation for the "one kernel" to be controlled both by #ifdef and by configuration parameter?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ONE_KERNEL controls one vs three as before ONE was done using dynamic parallelism (some as three but in its own kernel) that at the moment does not even link...
I still hope at some point to have the possibility to test dynamic parallelism
(so ok, everything can be done at config level)
I can remove the three kernel option, is just historical development at this point...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I was just curious. I'm fine with leaving it in, but maybe add a comment explaining ONE_KERNEL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will cleanup and comment in next iteration.

@fwyzard
Copy link

fwyzard commented Nov 26, 2019

@VinInn , the throughput actually shows somewhat of a slowdown.

@VinInn
Copy link
Author

VinInn commented Nov 26, 2019

we can revert to 5 kernels (enough to change one line)

@fwyzard fwyzard added the Pixels Pixels-related developments label Nov 27, 2019
@VinInn
Copy link
Author

VinInn commented Nov 27, 2019

OOOPSSS a bug. missing braces....
it was running the last 4 kernels twice..
(bug introduced with the latest commit to make it configurable at run time)

so run this morning in a new area using latest patatrack master
twice each version on T4 "Running 4 times over 5000 events with 1 jobs, each with 8 threads, 8 streams and 1 GPUs"

current:        980.7 ±   9.2 ev/s    986.0 ±   5.6 ev/s
oneKernel:      995.8 ±   9.5 ev/s    992.2 ±   0.6 ev/s
ThreeKernels:   990.3 ±   7.1 ev/s    994.7 ±   9.7 ev/s
FiveKernels:    985.7 ±   3.8 ev/s    989.5 ±   5.7 ev/s

@VinInn
Copy link
Author

VinInn commented Nov 27, 2019

ok, need to rebase....

@fwyzard
Copy link

fwyzard commented Nov 28, 2019

Validation summary

Reference release CMSSW_11_0_0_pre11 at 5b0a828
Development branch CMSSW_11_0_X_Patatrack at cb27c23
Testing PRs:

Validation plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

Throughput plots

/EphemeralHLTPhysics1/Run2018D-v1/RAW run=323775 lumi=53

scan-136.86452.png
zoom-136.86452.png

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • reference release, workflow 10824.5
  • development release, workflow 10824.5
  • development release, workflow 10824.51
  • development release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • development release, workflow 136.86452
  • testing release, workflow 10824.5
  • testing release, workflow 10824.51
  • testing release, workflow 10824.52
    • ✔️ step3.py: log
    • ✔️ profile.py: log
    • ✔️ cuda-memcheck --tool initcheck (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool memcheck --leak-check full --report-api-errors all (report, log) did not find any errors
    • ✔️ cuda-memcheck --tool synccheck (report, log) did not find any errors
  • testing release, workflow 136.86452

Logs

The full log is available at https://patatrack.web.cern.ch/patatrack/validation/pulls/472b355b129ca166813e91122696e0de78b2a199/log .

@fwyzard fwyzard merged commit 459524a into cms-patatrack:CMSSW_11_0_X_Patatrack Dec 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Pixels Pixels-related developments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants