fix HGCAL Layer Cluster times in heterogeneous workflows at HLT #45838

Open
wants to merge 1 commit into
base: master

AuroraPerego
Contributor

PR description:

The times and timeErrors vectors that store the rechit times were constructed with 16 elements each and then filled with push_back, so the first 16 entries stayed at 0 and the real values were appended after them.
This PR drops the 16-element allocation in the constructor and uses reserve() instead.
The change affects only heterogeneous workflows at the HLT.
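
A minimal sketch of the pitfall described above, using plain std::vector<float> rather than the producer's actual containers. The function names are hypothetical and do not come from the PR; only the sized-constructor versus reserve() behaviour is the point.

```cpp
#include <cstddef>
#include <vector>

// Buggy pattern: the vector is constructed with 16 value-initialized
// elements, so push_back appends AFTER sixteen leading zeros.
std::vector<float> fillWithSizedConstructor(const std::vector<float>& hitTimes) {
  std::vector<float> times(16);  // size() == 16, every element is 0.f
  for (float t : hitTimes) {
    times.push_back(t);
  }
  return times;
}

// Fixed pattern: reserve() only raises the capacity; size() stays 0,
// so push_back starts writing at index 0.
std::vector<float> fillWithReserve(const std::vector<float>& hitTimes) {
  std::vector<float> times;
  times.reserve(16);
  for (float t : hitTimes) {
    times.push_back(t);
  }
  return times;
}
```

With three input times, the first function returns 19 elements with the real values starting at index 16, while the second returns exactly 3.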

PR validation:

Tested on wf 31834.492, the HGCAL Layer Clusters times are expected to change:
[image: plot comparing old (pre fix) and new (post fix)]

FYI @rovere

@cmsbuild
Contributor

cmsbuild commented Aug 30, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Pull Request was created by @AuroraPerego for master.

It involves the following packages:

  • RecoLocalCalo/HGCalRecProducers (upgrade, reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen, @srimanob, @subirsarkar can you please review it and eventually sign? Thanks.
@apsallid, @bsunanda, @cseez, @edjtscott, @felicepantaleo, @hatakeyamak, @lecriste, @lgray, @missirol, @pfs, @rovere, @sameasy, @sethzenz, @vandreev11, @youyingli this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@rovere
Contributor

rovere commented Aug 30, 2024

thanks @AuroraPerego for the investigation and the fix.

@rovere
Contributor

rovere commented Aug 30, 2024

@cmsbuild please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4551d6/41199/summary.html
COMMIT: 95e7d74
CMSSW: CMSSW_14_2_X_2024-08-29-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/45838/41199/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 2 lines to the logs
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3328315
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3328289
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 43 files compared)
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

for (unsigned int i = 0; i < clusters->size(); ++i) {
  times[i].reserve(16);
  timeErrors[i].reserve(16);
}

Contributor

Would it make sense to fold this loop into the previous one at lines 69 to 77? That way the heuristic is not needed at all and the correct size can be reserved (i.e. the maximum possible, since some rechits may later be discarded from the timing computation).
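
One possible reading of this suggestion, as a sketch only: if the number of rechits per cluster is already known when the buffers are set up, each inner vector can reserve that count as an upper bound instead of the fixed guess of 16. The names hitsPerCluster and makeTimeBuffers are hypothetical, and the actual loop at lines 69 to 77 is not shown in this thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical input: hitsPerCluster[i] is the number of rechits assigned
// to cluster i (assumed to be available from the earlier loop).
std::vector<std::vector<float>> makeTimeBuffers(const std::vector<std::size_t>& hitsPerCluster) {
  // The outer vector is sized on purpose: one (empty) buffer per cluster.
  std::vector<std::vector<float>> times(hitsPerCluster.size());
  for (std::size_t i = 0; i < hitsPerCluster.size(); ++i) {
    // Upper bound on the entries that can be pushed later; hits discarded
    // from the timing computation simply leave spare capacity unused.
    times[i].reserve(hitsPerCluster[i]);
  }
  return times;
}
```

Every inner buffer still starts with size() == 0, so later push_back calls write from index 0 and never reallocate as long as the count was a true upper bound.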

@rovere
Contributor

rovere commented Sep 2, 2024

@cmsbuild please test

@rovere
Contributor

rovere commented Sep 2, 2024

The bot seems to be a little stuck...

@cmsbuild
Contributor

cmsbuild commented Sep 2, 2024

Pull request #45838 was updated. @jfernan2, @mandrenguyen, @srimanob, @subirsarkar can you please check and sign again.

@cmsbuild
Contributor

cmsbuild commented Sep 2, 2024

+1

Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4551d6/41220/summary.html
COMMIT: c733b85
CMSSW: CMSSW_14_2_X_2024-09-01-2300/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45838/41220/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

@mmusich
Contributor

mmusich commented Sep 2, 2024

Tested on wf 31834.492, the HGCAL Layer Clusters times are expected to change:

is there a way to make this change appear in the bot tests? E.g. by running wf 31834.492 ?

@AuroraPerego
Contributor Author

is there a way to make this change appear in the bot tests? E.g. by running wf 31834.492 ?

I'm not sure: this workflow runs the producer I've changed, but the collection of cluster times is not saved in the event.

@mmusich
Contributor

mmusich commented Sep 2, 2024

I'm not sure: this workflow runs the producer I've changed, but the collection of cluster times is not saved in the event.

and does it have any bearing on trigger decisions?

@AuroraPerego
Contributor Author

and does it have any bearing on trigger decisions?

I doubt it :)

@mmusich
Contributor

mmusich commented Sep 2, 2024

I doubt it :)

I am confused. So this PR is of no consequence? What's the purpose?

@rovere
Contributor

rovere commented Sep 2, 2024

When running CLUE on GPU at HLT for Phase 2, the timing assigned to the clusters is wrong.
Regardless of how that timing is used, we need to fix it.
This PR fixes it.

@AuroraPerego
Contributor Author

is there a way to make this change appear in the bot tests? E.g. by running wf 31834.492 ?

Thinking about it, the effects should be visible downstream, e.g. in tracksters and candidates. They are saved in the event and their time is computed from the cluster times.

@rovere
Contributor

rovere commented Sep 4, 2024

test parameters:

  • enable = gpu
  • workflows_gpu = 31834.492

@rovere
Contributor

rovere commented Sep 4, 2024

please test

@cmsbuild
Contributor

cmsbuild commented Sep 4, 2024

-1

Failed Tests: RelVals-GPU
Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4551d6/41273/summary.html
COMMIT: c733b85
CMSSW: CMSSW_14_2_X_2024-09-03-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45838/41273/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

ValueError: Undefined workflows: 31834.492

Comparison Summary

Summary:

@rovere
Contributor

rovere commented Sep 4, 2024

test parameters:

  • enable = gpu
  • workflows_gpu = 31834.492
  • workflow_opts = -w upgrade
  • workflow_opts_gpu = -w upgrade

@rovere
Contributor

rovere commented Sep 4, 2024

please test

@cmsbuild
Contributor

cmsbuild commented Sep 4, 2024

+1

Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-4551d6/41284/summary.html
COMMIT: c733b85
CMSSW: CMSSW_14_2_X_2024-09-04-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/45838/41284/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 7 lines to the logs
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 44
  • DQMHistoTests: Total histograms compared: 3328501
  • DQMHistoTests: Total failures: 0
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3328481
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 43 files compared)
  • Checked 193 log files, 163 edm output root files, 44 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

@mmusich
Contributor

mmusich commented Sep 4, 2024

+1

tbh, I am more confused than ever.
In wf 31834.492_TTbar_14TeV+2026D110SimOnGen_Patatrack_FullRecoAlpaka I see plenty of HLT changes log: all of them in tracking (where I would not have anticipated any changes) and none in HGCal (where I would have expected some according to #45838 (comment)).
I am by no means a reviewer, but I would be keen to understand better.

@felicepantaleo
Contributor

+1

tbh, I am more confused than ever. In wf 31834.492_TTbar_14TeV+2026D110SimOnGen_Patatrack_FullRecoAlpaka I see plenty of HLT changes log: all of them in tracking (where I would not have anticipated any changes) and none in HGCal (where I would have expected some according to #45838 (comment)). I am by no means a reviewer, but I would be keen to understand better.

The comparisons show tiny differences in pixel tracks offline alpaka GPU vs alpaka GPU, if I understand correctly.
Are you surprised because you would have expected phase-2 Pixel Tracks alpaka running on GPU to be bit-by-bit reproducible?

5 participants