Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding DeepMET SONIC Producer to CMSSW #37963

Merged
merged 17 commits into from
Jul 15, 2022

Conversation

wpmccormack
Copy link
Contributor

PR description:

This PR introduces a producer for DeepMET using SONIC for co-processor-enabled inferences-as-a-service. For more information on the status of SONIC, please refer to this presentation to JetMET DPG: https://indico.cern.ch/event/1143469/contributions/4799423/attachments/2416540/4135308/March_28_JetMET_SONIC.pdf. Please note that the producer introduced here does not affect the main DeepMET producer in any way. This is an alternate way to run the standard DeepMET model using inference as a service. As noted in the linked slides, this producer has been validated locally on Tier 2 resources and at scale with cloud computing, but with this PR, we hope to perform more "official" production tests. But again, this producer does not any workflow unless explicitly called. A helper-function macro is slightly modified, as are a few config files. There will be an accompanying PR into https://github.com/cms-data/RecoMET-METPUSubtraction, which adds needed model files.

Tagging @kpedro88 @yongbinfeng @violatingcp @nhanvtran @jmduarte

PR validation:

This PR was tested on LPC using a Triton server running on the GPU-enabled AILab machines at FermiLab. Tests have also been performed using the DeepMET SONIC producer at Purdue T2 and in Google Cloud. When the producer is not called, there is no performance impact, and when it is called, the MET results are the same as for the generic DeepMET producer.

@wpmccormack
Copy link
Contributor Author

The accompanying model-file PR is here: cms-data/RecoMET-METPUSubtraction#6

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37963/30035

  • This PR adds an extra 36KB to repository

  • There are other open Pull requests which might conflict with changes you have proposed:

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @wpmccormack (Patrick McCormack) for master.

It involves the following packages:

  • Configuration/ProcessModifiers (operations)
  • PhysicsTools/PatAlgos (reconstruction)
  • RecoMET/METPUSubtraction (reconstruction)

@perrotta, @clacaputo, @cmsbuild, @slava77, @jpata, @qliphy, @fabiocos, @davidlange6 can you please review it and eventually sign? Thanks.
@rappoccio, @gouskos, @hatakeyamak, @emilbols, @Martin-Grunewald, @mbluj, @ahinzmann, @demuller, @seemasharmafnal, @mmarionncern, @missirol, @makortel, @jdolen, @azotz, @jdamgov, @nhanvtran, @gkasieczka, @schoef, @fabiocos, @AlexDeMoor, @JyothsnaKomaragiri, @gpetruc, @mariadalfonso, @andrzejnovak this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Copy link
Contributor

test parameters:
pull_request = cms-data/RecoMET-METPUSubtraction#6

@kpedro88
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/24748/summary.html
COMMIT: 30a4ccd
CMSSW: CMSSW_12_5_X_2022-05-16-1100/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37963/24748/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/24748/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/24748/git-merge-result

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-6f1ee4/11634.301_TTbar_14TeV+2021_Run3FS+TTbar_14TeV_TuneCP5_GenSim+HARVESTNano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3741432
  • DQMHistoTests: Total failures: 103
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3741306
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 49 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@jpata
Copy link
Contributor

jpata commented May 17, 2022

would it make sense to define a single test workflow where all the sonic modifiers for production models are enabled?

enableSonicTriton,deepMETSonicTriton,particleNetSonicTriton,...

similar to what was done here: https://github.com/cms-sw/cmssw/pull/37134/files#diff-8c163203cbaa64f0e0e6aa2b16346cfab3eaf84a57a008c7fa37193e8b9b2adbR516

@jpata
Copy link
Contributor

jpata commented May 17, 2022

@michaelwassmer @cms-sw/jetmet-pog-l2 could you please review this from the JME side?

@michaelwassmer
Copy link
Contributor

michaelwassmer commented May 17, 2022

Yes, I can do that.
EDIT: From my perspective everything seems fine. The regular MET workflow does not seem to be affected by this. It has to be actively used to switch from the regular producer to the sonic producer.

@kpedro88
Copy link
Contributor

@jpata this workflow is already defined as .9001:

stepDict[stepName][k] = merge([{'--procModifiers': 'allSonicTriton'}, stepDict[step][k]])

hence the modification of https://github.com/cms-sw/cmssw/blob/master/Configuration/ProcessModifiers/python/allSonicTriton_cff.py in this PR.

@jpata
Copy link
Contributor

jpata commented May 18, 2022

test parameters:

@jpata
Copy link
Contributor

jpata commented May 18, 2022

@cmsbuild please test

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37963/31030

@cmsbuild
Copy link
Contributor

Pull request #37963 was updated. @perrotta, @rappoccio, @gouskos, @clacaputo, @fabiocos, @cmsbuild, @fgolf, @jpata, @mariadalfonso, @qliphy, @davidlange6 can you please check and sign again.

@kpedro88
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/26212/summary.html
COMMIT: 8c3bac9
CMSSW: CMSSW_12_5_X_2022-07-13-1100/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37963/26212/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/26212/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6f1ee4/26212/git-merge-result

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-6f1ee4/11824.9001_TTbar_13+2021PU_SonicTriton+TTbar_13TeV_TuneCUETP8M1_GenSim+DigiPU+RecoNanoPU+HARVESTNanoPU
  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-6f1ee4/23434.9001_TTbar_14TeV+2026D49PU_SonicTriton+TTbar_14TeV_TuneCP5_GenSimHLBeamSpot14+DigiTriggerPU+RecoGlobalPU+HARVESTGlobalPU

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 52
  • DQMHistoTests: Total histograms compared: 3923752
  • DQMHistoTests: Total failures: 19
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 3923710
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.004 KiB( 51 files compared)
  • DQMHistoSizes: changed ( 312.0 ): 0.004 KiB MessageLogger/Warnings
  • Checked 216 log files, 54 edm output root files, 52 DQM output files
  • TriggerResults: no differences found

@mariadalfonso
Copy link
Contributor

+xpog

the dual inference engine is for the same model, and just for performance evaluation. not for production.

only rebase since last signoff

@clacaputo
Copy link
Contributor

+reconstruction

  • resign after rebase

@perrotta
Copy link
Contributor

+1

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will be automatically merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.