Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timing Service now measures time not running a Module #41262

Merged
merged 9 commits into from
Apr 7, 2023

Conversation

Dr15Jones
Copy link
Contributor

@Dr15Jones Dr15Jones commented Apr 3, 2023

PR description:

The built in timing service now calculates the amount of time all cores are not running a module (i.e. Source, EDProducer, EDFilter, EDAnalyzer, OutputModule, ESModule or ESSource) during the data processing loop. This gets reported in the summary and to the framework job report. Time is summed across all threads.

Additional changes:

  • Moved postESSyncIOV signal to run right after ESSources synchronize. This now properly encapsulate the time calling the ESSources.
  • Added beginProcessing and endProcessing signals to specify when starting/ending processing of source transitions.

PR validation:

Framework unit tests pass.

Previously the signal happened after all Records were asynchronously updated to their new IOVs.
Now the signals focus on the timings of calls to the ESSources.
@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 3, 2023

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41262/35015

  • This PR adds an extra 88KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 3, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41262/35016

  • This PR adds an extra 88KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 3, 2023

A new Pull Request was created by @Dr15Jones (Chris Jones) for master.

It involves the following packages:

  • FWCore/Framework (core)
  • FWCore/Integration (core)
  • FWCore/Services (core)

@cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please review it and eventually sign? Thanks.
@makortel, @missirol, @wddgit, @fwyzard this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 3, 2023

-1

Failed Tests: ClangBuild
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d35193/31772/summary.html
COMMIT: f2a7f8f
CMSSW: CMSSW_13_1_X_2023-04-03-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41262/31772/install.sh to create a dev area with all the needed externals and cmssw changes.

Clang Build

I found compilation warning while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

See details on the summary page.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 6, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-41262/35104

  • This PR adds an extra 20KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 6, 2023

Pull request #41262 was updated. @cmsbuild, @smuzaffar, @Dr15Jones, @makortel can you please check and sign again.

@Dr15Jones
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 7, 2023

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-d35193/31879/summary.html
COMMIT: 16dd978
CMSSW: CMSSW_13_1_X_2023-04-06-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/41262/31879/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 5 lines to the logs
  • Reco comparison results: 14 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3459609
  • DQMHistoTests: Total failures: 9
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3459578
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
  • Checked 207 log files, 159 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@makortel
Copy link
Contributor

makortel commented Apr 7, 2023

@Dr15Jones Could you add the other changes of the PR (the moment where "init" accounting changes to "loop" accounting, change in where postESSyncIOVSignal_ is issued) to the PR description?

@makortel
Copy link
Contributor

makortel commented Apr 7, 2023

+1

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 7, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@slava77
Copy link
Contributor

slava77 commented Apr 7, 2023

will this PR allow counting the time spent on GPU as well? IIUC, currently it's not included in the TimingService per-module totals.

@makortel
Copy link
Contributor

makortel commented Apr 7, 2023

will this PR allow counting the time spent on GPU as well? IIUC, currently it's not included in the TimingService per-module totals.

No. Already the exact meaning of "time spent on GPU" is complex enough to warrant its own issue and discussion (although #29142 is somewhat related).

@rappoccio
Copy link
Contributor

+1

@slava77
Copy link
Contributor

slava77 commented Apr 7, 2023

will this PR allow counting the time spent on GPU as well? IIUC, currently it's not included in the TimingService per-module totals.

No. Already the exact meaning of "time spent on GPU" is complex enough to warrant its own issue and discussion (although #29142 is somewhat related).

what about at least in a single thread/stream execution case?

@makortel
Copy link
Contributor

makortel commented Apr 7, 2023

will this PR allow counting the time spent on GPU as well? IIUC, currently it's not included in the TimingService per-module totals.

No. Already the exact meaning of "time spent on GPU" is complex enough to warrant its own issue and discussion (although #29142 is somewhat related).

what about at least in a single thread/stream execution case?

I can imagine several possible things to measure as "time spent on GPU" even then, so I'd rather figure out first what quantity/quantities exactly make sense to measure (suggestions are welcome in a separate issue), and then craft a general solution for that.

But I acknowledge that a long running "external worker" is indeed one source of "non-module time" in a situation where the framework('s thread) has nothing else to do than wait for the work to complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants