Redesign of the association maps, multivector manager, HGCAL Rechits and Validation with significant speedup of Phase-2 workflows #45865

Open
wants to merge 22 commits into
base: master

felicepantaleo
Contributor

@felicepantaleo felicepantaleo commented Sep 3, 2024

PR description:

In the context of Next Generation Triggers and HGCAL TICL reconstruction, fast AssociatorMaps between Reco and Sim, Sim to Sim, and Reco to Reco are required to develop new reconstruction algorithms and study their performance. They have been included in the HGCalValidator and in the TICLDumper, and can be extended to many other detectors (the MTD, for instance, which is now the main offender).
The HGCAL RecHit producer was optimized by avoiding a virtual function call for every rechit, and was updated to support the MultiVectorManager.
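As a rough illustration of the virtual-call point (this is not the code in this PR), the sketch below contrasts one virtual call per rechit with one virtual call per collection followed by a tight, inlinable loop. All names here (`RecHitCalibrator`, `ScaleCalibrator`, `calibratePerHit`, `calibrateBatch`) are hypothetical, and the "calibration" is just a scale factor chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical interface, invented for this sketch.
struct RecHitCalibrator {
  virtual ~RecHitCalibrator() = default;
  // Per-hit entry point: costs one virtual dispatch per rechit.
  virtual float calibrate(float rawEnergy) const = 0;
  // Batch entry point: costs one virtual dispatch per collection.
  virtual void calibrateAll(const std::vector<float>& raw, std::vector<float>& out) const = 0;
};

struct ScaleCalibrator final : RecHitCalibrator {
  explicit ScaleCalibrator(float scale) : scale_(scale) {}

  float calibrate(float rawEnergy) const override { return scale_ * rawEnergy; }

  void calibrateAll(const std::vector<float>& raw, std::vector<float>& out) const override {
    out.resize(raw.size());
    // No virtual call inside the loop, so the compiler can inline and vectorize it.
    for (std::size_t i = 0; i < raw.size(); ++i)
      out[i] = scale_ * raw[i];
  }

  float scale_;
};

// Baseline: virtual dispatch for every hit.
std::vector<float> calibratePerHit(const RecHitCalibrator& c, const std::vector<float>& raw) {
  std::vector<float> out;
  out.reserve(raw.size());
  for (float e : raw)
    out.push_back(c.calibrate(e));
  return out;
}

// Alternative: a single virtual dispatch, then a plain loop in the concrete class.
std::vector<float> calibrateBatch(const RecHitCalibrator& c, const std::vector<float>& raw) {
  std::vector<float> out;
  c.calibrateAll(raw, out);
  assert(out.size() == raw.size());
  return out;
}
```

Both functions return the same values; the difference is only in how many indirect calls are made per collection, which matters when the collection holds a very large number of hits.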
The MultiVectorManager was optimized and, together with the HGCAL RecHit map, was used to optimize the E/Gamma reconstruction and the associators.
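A minimal sketch of the general idea behind a multi-vector manager, assuming its purpose is to address several separately stored collections through one global index without copying them. The class and method names below (`MultiVectorView`, `add`, `locate`) are invented for illustration and are not the CMSSW `MultiVectorManager` interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical read-only view over several vectors, indexed globally.
template <typename T>
class MultiVectorView {
public:
  // Register a collection. Only a pointer is stored, so the collection
  // must outlive the view.
  void add(const std::vector<T>& v) {
    parts_.push_back(&v);
    offsets_.push_back(size_);  // global index of v[0]
    size_ += v.size();
  }

  std::size_t size() const { return size_; }

  // Map a global index to (collection index, local index) with a binary
  // search over the start offsets. Empty collections are skipped naturally
  // because upper_bound selects the last collection starting at or before
  // the requested index.
  std::pair<std::size_t, std::size_t> locate(std::size_t global) const {
    assert(global < size_);
    const auto it = std::upper_bound(offsets_.begin(), offsets_.end(), global);
    const std::size_t part = static_cast<std::size_t>(it - offsets_.begin()) - 1;
    return {part, global - offsets_[part]};
  }

  const T& operator[](std::size_t global) const {
    const auto loc = locate(global);
    return (*parts_[loc.first])[loc.second];
  }

private:
  std::vector<const std::vector<T>*> parts_;
  std::vector<std::size_t> offsets_;
  std::size_t size_ = 0;
};
```

With such a view, code that needs "rechit number N" can use a single index regardless of which underlying collection the hit lives in, at the cost of a logarithmic lookup in the number of collections.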
The HGCAL Validation was redesigned to make effective use of the new associators. The framework was completely redesigned to automatically generate validation plots for any new collection of TICL tracksters against SimTracksters from SimClusters and CaloParticles.

Performance measurements:

Performed with 14_1_0_pre7 with TTbar PU200, 10 events, single thread, single stream

14_1_0_pre7:

  • Total time: 536s
  • Total allocated memory: 90GB

14_1_0_pre7+PR

  • Total time: 171s
  • Total allocated memory: 56GB

Performance comparison

Full performance comparison table here:
https://docs.google.com/spreadsheets/d/1vdmDDJJ07tf9ekQ8EC24sMqqvtCtiU2Mh6FawsIRu24/edit?usp=sharing

Notable numbers:

  • Overall Phase-2 TTbar PU200 Reconstruction+Validation workflows: 3.13x speedup (536s to 171s), allocated memory -38% (90GB to 56GB)
  • Overall E/Gamma Phase-2 Reconstruction: 1.5x speedup
  • ElectronSeedProducer ecalDrivenElectronSeeds: 3x speedup
  • HGCalValidator: 30x speedup
  • New associators: from two orders of magnitude to effectively infinite speedup, depending on whether you also want to run on the CaloParticles from pileup. In that case the associators currently in the release crash because of the amount of resources they need.
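For reference, the overall speedup and memory figures follow directly from the totals in the performance measurements above. A quick check, using only those four numbers (the helper names are made up for this example):

```cpp
#include <cassert>

// Ratio of old to new wall-clock time (2.0 means twice as fast).
double speedupFactor(double oldSeconds, double newSeconds) {
  assert(newSeconds > 0.0);
  return oldSeconds / newSeconds;
}

// Relative change in percent; a negative value means a reduction.
double relativeChangePercent(double oldValue, double newValue) {
  assert(oldValue > 0.0);
  return 100.0 * (newValue - oldValue) / oldValue;
}
```

With the totals above, `speedupFactor(536, 171)` is about 3.13 and `relativeChangePercent(90, 56)` is about -37.8, consistent with the quoted overall speedup and the -38% in allocated memory.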

@cms-sw/hgcal-dpg-l2 @cms-sw/egamma-pog-l2 @waredjeb @AuroraPerego @rovere

felicepantaleo and others added 17 commits August 29, 2024 12:33
…to use the new associators and automatically produce validation plots for new tracksters collections.
@cmsbuild
Contributor

cmsbuild commented Sep 3, 2024

cms-bot internal usage

@cmsbuild
Contributor

cmsbuild commented Sep 3, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45865/41614

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45865/41641

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2024

Pull request #45865 was updated. @Martin-Grunewald, @antoniovagnerini, @civanch, @jfernan2, @mandrenguyen, @mdhildreth, @mmusich, @nothingface0, @rvenditti, @srimanob, @subirsarkar, @syuvivida, @tjavaid can you please check and sign again.

@cmsbuild
Contributor

cmsbuild commented Sep 5, 2024

+1

Size: This PR adds an extra 52KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-232398/41316/summary.html
COMMIT: 28509ad
CMSSW: CMSSW_14_2_X_2024-09-05-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45865/41316/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 297 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3422869
  • DQMHistoTests: Total failures: 17383
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3405466
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 2745.032 KiB( 44 files compared)
  • DQMHistoSizes: changed ( 24834.911,... ): 734.076 KiB HGCAL/HGCalValidator
  • DQMHistoSizes: changed ( 29896.203 ): -1659.424 KiB HGCAL/HGCalValidator
  • Checked 197 log files, 167 edm output root files, 45 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Contributor

cmsbuild commented Sep 6, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-45865/41698

@cmsbuild
Contributor

cmsbuild commented Sep 6, 2024

@felicepantaleo
Contributor Author

enable profiling

@felicepantaleo
Contributor Author

I'm enabling profiling even though the validation is not run in profiling workflows. Is it possible to specify another workflow for profiling?

@felicepantaleo
Contributor Author

@cmsbuild please test

@felicepantaleo felicepantaleo changed the title from "Redesign of the association maps, multivector manager, HGCAL Rechits and Validation with significant speedup of Phase-2 reconstruction workflows" to "Redesign of the association maps, multivector manager, HGCAL Rechits and Validation with significant speedup of Phase-2 workflows" Sep 6, 2024
@jfernan2
Contributor

jfernan2 commented Sep 6, 2024

Profiling comparison tests run on wfs 29834.21 (D110 upgrade) and 12634.21 (Run3 2023) as set on cms-sw/cms-bot#2282

However, the timing comparison fails because of an unexplained segmentation fault in igprof (and in VTune too). This is a long-standing problem that unfortunately prevents timing studies in PRs.

#43166

@cmsbuild
Contributor

cmsbuild commented Sep 6, 2024

+1

Size: This PR adds an extra 52KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-232398/41378/summary.html
COMMIT: 8d8b7d4
CMSSW: CMSSW_14_2_X_2024-09-05-2300/el8_amd64_gcc12
Additional Tests: PROFILING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45865/41378/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • ROOTFileChecks: Some differences in event products or their sizes found
  • Reco comparison results: 233 differences found in the comparisons
  • DQMHistoTests: Total files compared: 45
  • DQMHistoTests: Total histograms compared: 3422869
  • DQMHistoTests: Total failures: 14061
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3408788
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 2745.032 KiB( 44 files compared)
  • DQMHistoSizes: changed ( 24834.911,... ): 734.076 KiB HGCAL/HGCalValidator
  • DQMHistoSizes: changed ( 29896.203 ): -1659.424 KiB HGCAL/HGCalValidator
  • Checked 197 log files, 167 edm output root files, 45 DQM output files
  • TriggerResults: no differences found

@felicepantaleo
Contributor Author

Thanks @jfernan2 for clarifying. Unfortunately, 29834.21 does not run any validation or prevalidation during step3.
