Herwig lhe matching fix (including multithreading fixes) #42673

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

Dominic-Stafford

This PR reintroduces the HadronizerFilter for Herwig from #40939, which was designed to address a long-standing issue of the wrong LHE events being saved when using matching in Herwig. That change was reverted in #41237 (see #41230) because of two issues when running on multiple cores. First, one regex line in the script that merges LHE files from different threads incorrectly caught the new LHE tag. Second, for the Herwig workflows the event number wasn't being properly propagated between the threads, as it needed to be passed to the LHEEventProduct. Both issues are now resolved, so it should be possible to re-add the reverted changes.
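
To illustrate the first issue, here is a minimal Python sketch of how a loose pattern can catch a newly added tag. The tag name `<event_num>` and both patterns are assumptions made for illustration; they are not copied from mergeLHE.py.

```python
import re

# Hypothetical LHE fragment: "<event_num>" is an assumed name for the new tag.
lhe_text = """<event>
<event_num num="1"> </event_num>
 5 1 1.0 2.0 3.0 4.0
</event>
<event>
<event_num num="2"> </event_num>
 5 1 1.0 2.0 3.0 4.0
</event>
"""

# Too loose: also matches the start of "<event_num ...>".
loose = re.compile(r"<event")
# Stricter: the tag name must end right after "event".
strict = re.compile(r"<event(?=[\s>])")

loose_count = len(loose.findall(lhe_text))
strict_count = len(strict.findall(lhe_text))
print(loose_count, strict_count)
```

With the loose pattern the two numbering tags are counted as extra events, while the stricter pattern only sees the two real `<event>` blocks.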

Should be tested with cms-sw/cmsdist#8670.

@cmsbuild

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42673/36748

Code check has found code style and quality issues which could be resolved by applying following patch(s)

…g events

Read lhe event numbers in, if present

Add HadroniserFilter to Herwig which correctly matches LHE numbers between CMSSW and herwig

Add option to Herwig input fragments to use LHE numbering

Change test examples to use new HadroniserFilter+ LHE numbering

code style

code style

Call addLHEnumbers from mergeLHE.py if asked to number events and not using the DefaultLHEMerger

Number events before merge (so numbers are also available to LHEReader)

Include evtnum in LHEEventProduct (necessary for multithreading)
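
The "number events before merge" step in the commits above could look roughly like the following Python sketch. It is not the actual `addLHEnumbers` implementation: the function name `number_events`, the tag `<event_num>`, and the choice to keep one running counter across all thread files are assumptions for illustration.

```python
import re
from typing import List

# Matches a line holding only an <event> opening tag (attributes allowed),
# but not a longer tag name such as the hypothetical <event_num>.
EVENT_OPEN = re.compile(r"^(<event(?:\s[^>]*)?>)[ \t]*$", re.MULTILINE)


def number_events(thread_texts: List[str], first: int = 1) -> List[str]:
    """Insert a running event number after every <event> opening tag.

    The counter continues across files, so numbers stay unique after merging.
    """
    counter = first
    numbered = []
    for text in thread_texts:
        def add_number(match):
            nonlocal counter
            # Hypothetical tag layout; the real tag may differ.
            line = f'{match.group(1)}\n<event_num num="{counter}"> </event_num>'
            counter += 1
            return line

        numbered.append(EVENT_OPEN.sub(add_number, text))
    return numbered


# Two stand-in "thread" files with two events each.
thread = "<event>\n 1 2 3\n</event>\n<event>\n 4 5 6\n</event>\n"
out = number_events([thread, thread])
print(out[1])
```

Numbering before the merge means each per-thread file already carries its final numbers, so anything that reads the files afterwards sees the same values.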
@cmsbuild

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-42673/36750

@cmsbuild

A new Pull Request was created by @Dominic-Stafford for master.

It involves the following packages:

  • Configuration/Generator (generators)
  • GeneratorInterface/Herwig7Interface (generators)
  • GeneratorInterface/LHEInterface (generators)
  • SimDataFormats/GeneratorProducts (generators)

@SiewYan, @mkirsano, @Saptaparna, @cmsbuild, @alberto-sanchez, @menglu21, @GurpreetSinghChahal can you please review it and eventually sign? Thanks.
@youyingli, @mkirsano, @missirol, @rovere, @Martin-Grunewald, @apsallid, @bsunanda, @alberto-sanchez, @fabiocos this is something you requested to watch as well.
@perrotta, @dpiparo, @antoniovilela, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@smuzaffar

smuzaffar commented Aug 29, 2023

test parameters:

@smuzaffar

please test

@cmsbuild

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-94bbc7/34511/summary.html
COMMIT: c7d31e6
CMSSW: CMSSW_13_3_X_2023-08-28-2300/el8_amd64_gcc11
Additional Tests: THREADING
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/42673/34511/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 8 lines to the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3153095
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3153070
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@@ -40,7 +40,7 @@
 outputFile = cms.string('cmsgrid_final.lhe'),
 scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh'),
 generateConcurrently = cms.untracked.bool(True),
-postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe')
+postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-n', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe')

since we dropped the concurrency for MG some time ago, you can remove line 42 and 43

@@ -7,5 +7,5 @@
 args = cms.vstring('/cvmfs/cms.cern.ch/phys_generator/gridpacks/UL/13TeV/madgraph/V5_2.6.5/dyellell01234j_5f_LO_MLM_v2/DYJets_HT-incl_slc6_amd64_gcc630_CMSSW_9_3_16_tarball.tar.xz','false','slc6_amd64_gcc630','CMSSW_9_3_16'),
 nEvents = cms.untracked.uint32(10),
 generateConcurrently = cms.untracked.bool(True),
-postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe'),
+postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-n', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe'),

same comment here

@@ -7,7 +7,7 @@
 outputFile = cms.string('cmsgrid_final.lhe'),
 scriptName = cms.FileInPath('GeneratorInterface/LHEInterface/data/run_generic_tarball_cvmfs.sh'),
 generateConcurrently = cms.untracked.bool(True),
-postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe'),
+postGenerationCommand = cms.untracked.vstring('mergeLHE.py', '-n', '-i', 'thread*/cmsgrid_final.lhe', '-o', 'cmsgrid_final.lhe'),

same comment here

@Dominic-Stafford

Hi @menglu21, what is the longer-term plan for MadGraph concurrency? Will it be reintroduced once a satisfactory read-only gridpack solution is found? If so, I think it would be better to leave the unit tests as they are, so we can catch multithreading bugs as they appear rather than having to resolve all of them when we move back to multithreading. Also, have we dropped multithreading for POWHEG as well?

@menglu21

menglu21 commented Sep 5, 2023

> Hi @menglu21, what is the plan for madgraph concurrency longer term, will it be reintroduced once a satisfactory read-only gridpack solution is found? If so I think it would be better to leave the unit tests as they are so we can catch multithreading bugs rather than having to resolve all of them when we move back to multi-threading. Also have we also dropped multithreading for POWHEG?

It will eventually depend on the read-only status and behavior. We are still using multithreading for POWHEG, so we can keep the code as you submitted it.

@menglu21

menglu21 commented Sep 5, 2023

+1

@cmsbuild

cmsbuild commented Sep 5, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@antoniovilela

+1

@cmsbuild cmsbuild merged commit 6303322 into cms-sw:master Sep 5, 2023
13 checks passed
@cmsbuild cmsbuild mentioned this pull request Sep 6, 2023
@iarspider

@Dominic-Stafford @menglu21 Relvals are failing after this PR was merged:

----- Begin Fatal Exception 06-Sep-2023 05:53:57 CEST-----------------------
An exception of category 'ModulesSynchingOnLumis' occurred while
   [0] Calling beginJob
Exception Message:
The framework is configured to use at least two streams, but the following modules
require synchronizing on LuminosityBlock boundaries:
  Herwig7HadronizerFilter generator

The situation can be fixed by either
 * modifying the modules to support concurrent LuminosityBlocks (preferred), or
 * setting 'process.options.numberOfConcurrentLuminosityBlocks = 1' in the configuration file
----- End Fatal Exception -------------------------------------------------

@Dominic-Stafford

Hi, I'm not seeing this in my tests, which are with CMSSW_10_6_30. In the relval you linked there's an earlier error:
Error: The interface 'LesHouchesHandler:EventNumbering' was not found.
which is due to cms-sw/cmsdist#8670 not being merged. Is the later error potentially spurious and caused by this? Otherwise I can add process.options.numberOfConcurrentLuminosityBlocks = 1 to all the cards (we don't currently have the facility to run Herwig on multiple threads), but I don't understand why this wasn't an issue before with a multithreaded externalLheProducer plus a single-threaded Herwig7GeneratorFilter.

@iarspider

> Hi, I'm not seeing this in my tests, which are with CMSSW_10_6_30. In the relval you linked there's an earlier error: Error: The interface 'LesHouchesHandler:EventNumbering' was not found. which is due to cms-sw/cmsdist#8670 not being merged, is the later error potentially spurious and due to this? Otherwise I can add process.options.numberOfConcurrentLuminosityBlocks = 1 to all the cards (we don't currently have the facility to run Herwig on multiple threads), but I don't understand why this wasn't an issue before with multithreaded externalLheProducer + single threaded Herwig7GeneratorFilter

It is quite possible, thanks!

@makortel

makortel commented Sep 6, 2023

The proper workaround would be to add Herwig7HadronizerFilter to

# list of generator EDModules (C++ type) that do not support concurrentLuminosityBlocks
noConcurrentLumiGenerators = [
    "AMPTGeneratorFilter",
    "BeamHaloProducer",
    "CosMuoGenProducer",
    "ExhumeGeneratorFilter",
    "Herwig7GeneratorFilter",
    "HydjetGeneratorFilter",
    "Hydjet2GeneratorFilter",
    "PyquenGeneratorFilter",
    "Pythia6GeneratorFilter",
    "Pythia8EGun",
    "Pythia8GeneratorFilter",
    "Pythia8HadronizerFilter",
    "Pythia8PtAndDxyGun",
    "Pythia8PtGun",
    "ReggeGribovPartonMCGeneratorFilter",
    "SherpaGeneratorFilter",
]

Then cmsDriver.py will automatically set numberOfConcurrentLuminosityBlocks = 1 for all jobs that use Herwig7HadronizerFilter.

This error was not visible in 10_6_X, because concurrent lumis was enabled by default in 12_0_X.
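
As a rough sketch of the mechanism described above (not the actual cmsDriver.py code), the lookup could work like this in Python. The function `concurrent_lumi_override` and the shortened list are hypothetical stand-ins; only the module type names come from this thread.

```python
# Shortened stand-in for the list quoted above, plus the suggested addition.
no_concurrent_lumi_generators = [
    "Herwig7GeneratorFilter",
    "Herwig7HadronizerFilter",  # the entry proposed in this comment
    "Pythia8HadronizerFilter",
]


def concurrent_lumi_override(generator_type, blocked=no_concurrent_lumi_generators):
    """Return 1 if the generator type cannot run with concurrent lumis.

    Returns None when no override is needed, so the framework default applies.
    """
    return 1 if generator_type in blocked else None


# A job using the new filter would be limited to one concurrent lumi block.
override = concurrent_lumi_override("Herwig7HadronizerFilter")
print(override)
```

Keeping the decision in one list means individual configuration cards do not each need to set `numberOfConcurrentLuminosityBlocks` by hand.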

@Dominic-Stafford

Ah, thank you @makortel for explaining, I wasn't aware of this. I'll make another PR to add it there.
