Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding DeepMET files in structure needed for SONIC #6

Merged
merged 7 commits into from
Jul 15, 2022

Conversation

wpmccormack
Copy link
Contributor

This is a companion PR to cms-sw/cmssw#37963. We are adding a SONIC producer for DeepMET to CMSSW to do some production tests. Nominally SONIC is designed to enable easy use of co-processors in CMS workflows, but it can also start up CPU-based inference servers within jobs. This PR does not affect any workflow in CMS in any way unless called specifically (the SONIC producer is independent of the general DeepMET one). The SONIC framework requires the model files and a specifically formatted config file for the Triton server, and these are contained in this PR in the structure required by NVIDIA Triton. Please refer to these slides for more information about SONIC and our use of NVIDIA Triton: https://indico.cern.ch/event/1143469/contributions/4799423/attachments/2416540/4135308/March_28_JetMET_SONIC.pdf

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @wpmccormack (Patrick McCormack) for branch master.

@smuzaffar, @aandvalenzuela, @iarspider, @clacaputo, @cmsbuild, @slava77, @jpata can you please review it and eventually sign? Thanks.
@mmarionncern, @gouskos, @ahinzmann, @jdamgov, @jdolen, @nhanvtran, @gkasieczka, @schoef, @mariadalfonso, @seemasharmafnal this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.
cms-bot commands are listed here

@kpedro88
Copy link
Contributor

The folder RecoMET-METPUSubtraction/deepmet has files in it that are not Triton-related. This can cause issues in the Triton server, though it seems to be working in this particular case. It would be better to have a folder structure models/deepmet, with that folder containing only the config.pbtxt and model files.

Also, we had previously discussed with @cms-sw/jetmet-pog-l2 the possibility of relocating the model files rather than copying them. It would be good to pursue this if possible.

@wpmccormack
Copy link
Contributor Author

Ah, ok I can move them from deepmet/deepmet to something like models/deepmet. Do you know where the model files originally are, or should I contact the JetMET people? Or I guess model 1 is equivalent to deepmet_v1_phase2.pb and model 2 is equivalent to deepmet_resp_v1_phase2.pb

@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@kpedro88
Copy link
Contributor

please use deepmet_v1_2018/model.graphdef rather than 1/model.graphdef (etc.) for clarity

@wpmccormack
Copy link
Contributor Author

I think Triton requires that the model version has to be an integer

@kpedro88
Copy link
Contributor

To get around this, I'm now envisioning something like autoCond: a dictionary mapping descriptive names to folder version numbers. There could be one of these per algorithm, with the readme here pointing to the CMSSW python file that contains the dictionary. This would make the CMSSW configs more expressive, while still avoiding duplication of model files in cms-data. Probably merits some discussion with ML and JME.

@wpmccormack
Copy link
Contributor Author

Ah ok cool, I'll send an email (but probably not right now, since people probably wouldn't see it until next week anyways)

@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@kpedro88
Copy link
Contributor

@wpmccormack you have to do this for every version, not just 1 and 2, i.e. assign numbers to each of the other versions. otherwise, the other workflows in the PR tests will not work.

@wpmccormack
Copy link
Contributor Author

Oh, why is that? The other versions are not symbolic links. I can make them links easily of course, but will we be using sonic for the other versions?

@kpedro88
Copy link
Contributor

Workflows using SONIC for Run 3 and Phase 2 exist (and are being executed in the tests of cms-sw/cmssw#37963), so either we have to support them or remove them. I think we should try to be comprehensive.

@wpmccormack
Copy link
Contributor Author

Ah gotcha. I'll give them numbers and re-push

…odel directories to symbolic links to correponding integer
@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@kpedro88
Copy link
Contributor

@wpmccormack I think a mistake was made with the phase2 models. The file sizes are not the same as what's in the upstream. Can you go back to upstream and copy the files to ensure the correct ones are used?

@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@wpmccormack
Copy link
Contributor Author

Yikes, yes, somehow I had copied the files into their new folders incorrectly. I've gone back in and double checked that they're right in the most recent push now

@cmsbuild
Copy link
Contributor

Pull request #6 was updated.

@jpata
Copy link

jpata commented Jun 10, 2022

+reconstruction

@cmsbuild
Copy link
Contributor

-1

Failed Tests: RelVals
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1276c2/25426/summary.html
COMMIT: 54f88dc
CMSSW: CMSSW_12_5_X_2022-06-09-2300/el8_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1276c2/25426/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-1276c2/25426/git-merge-result

RelVals

----- Begin Fatal Exception 10-Jun-2022 14:29:03 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=DeepMETProducer label='deepMETsResolutionTune'
Exception Message:
edm::FileInPath unable to find file RecoMET/METPUSubtraction/data/deepmet/deepmet_v1_2018.pb anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/external/el8_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-09-2300/src
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.7611_RunJetHT2016E_reminiaod+RunJetHT2016E_reminiaod+REMINIAOD_data2016_HIPM+HARVESTDR2_REMINIAOD_data2016_HIPM
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 10-Jun-2022 14:29:04 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=DeepMETProducer label='deepMETsResolutionTune'
Exception Message:
edm::FileInPath unable to find file RecoMET/METPUSubtraction/data/deepmet/deepmet_v1_2018.pb anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/external/el8_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-09-2300/src
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.8311_RunJetHT2017F_reminiaod+RunJetHT2017F_reminiaod+REMINIAOD_data2017+HARVEST2017_REMINIAOD_data2017
----- End Fatal Exception -------------------------------------------------
----- Begin Fatal Exception 10-Jun-2022 14:29:08 CEST-----------------------
An exception of category 'FileInPathError' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=DeepMETProducer label='deepMETsResolutionTune'
Exception Message:
edm::FileInPath unable to find file RecoMET/METPUSubtraction/data/deepmet/deepmet_v1_2018.pb anywhere in the search path.
The search path is defined by: CMSSW_SEARCH_PATH
${CMSSW_SEARCH_PATH} is: /cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/poison:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/src:/cvmfs/cms-ci.cern.ch/week0/cms-data/RecoMET-METPUSubtraction/6/25426/CMSSW_12_5_X_2022-06-09-2300/external/el8_amd64_gcc10/data:/cvmfs/cms-ib.cern.ch/week0/el8_amd64_gcc10/cms/cmssw/CMSSW_12_5_X_2022-06-09-2300/src
Current directory is: /data/cmsbld/jenkins/workspace/ib-run-pr-relvals/matrix-results/136.88811_RunJetHT2018D_reminiaodUL+RunJetHT2018D_reminiaodUL+REMINIAOD_data2018UL+HARVEST2018_REMINIAOD_data2018UL
----- End Fatal Exception -------------------------------------------------
Expand to see more relval errors ...

@jpata
Copy link

jpata commented Jun 10, 2022

I think the test here fails because it needs to be tested together with cms-sw/cmssw#37963.

@perrotta
Copy link

+1

@perrotta
Copy link

merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants