
[DO NOT MERGE] Add duplicate pixels #37496

Conversation

Contributor

@ferencek ferencek commented Apr 7, 2022

PR description:

In this PR, duplicate pixels in the clusterizer code are added together instead of keeping only the last occurrence. The goal of the PR is to bring the legacy CPU code in sync with the GPU code, as a follow-up to the discussions in #35376 and #37359.

No changes are expected in MC samples. In data there could be some tiny differences in cases where duplicate pixels appear.

PR validation:

Code compiles :)

Contributor

cmsbuild commented Apr 7, 2022

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37496/29198

  • This PR adds an extra 16KB to the repository

Contributor

cmsbuild commented Apr 7, 2022

A new Pull Request was created by @ferencek (Dinko F.) for master.

It involves the following packages:

  • RecoLocalTracker/SiPixelClusterizer (reconstruction)

@jpata, @cmsbuild, @clacaputo, @slava77 can you please review it and eventually sign? Thanks.
@mtosi, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @OzAmram, @ferencek, @dkotlins, @gpetruc, @mmusich, @threus, @tvami this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@silviodonato
Contributor

@ferencek we noticed that replacing set_adc with add_adc is not sufficient to fully reproduce the GPU behavior: on the GPU the duplicated pixels are kept as separate "duplicated" entries, while in this case you would add them into a single pixel.
I think that the main effect is due to the zero suppression: two duplicated pixels might each be below the zero-suppression threshold, while the summed pixel is above it.

Contributor

silviodonato commented Apr 7, 2022

PS. This should reduce the differences in pixel clusters by ~90%, so I'm in favor of this PR; I just wanted to say that it should not be the last one...

Contributor Author

ferencek commented Apr 7, 2022

There are pixel-level and cluster-level thresholds, so it is possible that individual pixels would not pass the cluster-level threshold but their sum would. However, I am actually not sure what the final goal here is. Is it to ensure that the legacy CPU and new GPU code give identical results? Does it really make sense to modify the legacy CPU code until it reproduces the GPU code (assuming this is even possible without significant re-engineering)?

Note that even this addition of duplicate pixels does not make much sense from the detector point of view. Not that what was done so far in the legacy code was a much better solution. So if we really wanted to touch the legacy code, it would make sense to do something that is motivated from the detector point of view, and that would be to completely kill duplicate pixels. I'm not familiar enough with the GPU code, but I am a priori against re-engineering the legacy code just so it can reproduce what is done in the GPU code.

In the legacy code the SiPixelArrayBuffer class is used to store pixel charges. Internally it allocates a vector of pixel charges for all pixels in a pixel module, where the linearized index for a pixel with row and col coordinates is defined as col * nrows + row. It can therefore be quickly checked, without expensive lookups, whether pixel (row, col) already has some charge set. Since duplicate pixels can appear more than twice, it would be best to introduce a linearized occupancy map from which it would be easy to check how many times a given pixel appeared; any pixel appearing more than once would have its charge set to 0. So killing duplicates in the legacy CPU code is possible without much change in the code.

Contributor Author

ferencek commented Apr 7, 2022

And based on what I wrote above about the SiPixelArrayBuffer, there is currently no place to hold multiple instances of the same pixel (and I am pretty sure that the clustering algorithm that comes downstream and takes the SiPixelArrayBuffer as input was not designed to handle multiple instances of the same pixel). There are only three options: sum them all up, ignore them all, or keep only one of them (currently the last one that appears is kept). So I am afraid that reproducing what the GPU code does would require re-engineering the legacy code (making it not so legacy anymore). And all this to do something which is not well motivated from the detector point of view.

Contributor

VinInn commented Apr 8, 2022

I can only second @ferencek's words. The code shall do what is best from a detector & physics point of view.
The GPU code was written to reproduce the legacy clusters and was successfully tested on MC. Then duplicate pixels were observed in data and the advice was to ignore them.

If this is not the case anymore, somebody shall decide what action shall be taken to produce (what is now considered) correct results compatible with the computational costs.
(btw, duplicate pixels in Run 2 data for BPIX1 seem correlated with large noise where a large part of a DCOL lights up: the affected clusters do not seem to come from tracks, so it would even make sense to mark those clusters as noise...)

Contributor

jpata commented Apr 8, 2022

@cmsbuild please test

Contributor

cmsbuild commented Apr 8, 2022

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-6586b3/23761/summary.html
COMMIT: 6cc9cf5
CMSSW: CMSSW_12_4_X_2022-04-07-2300/slc7_amd64_gcc10
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37496/23761/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 639 differences found in the comparisons
  • DQMHistoTests: Total files compared: 48
  • DQMHistoTests: Total histograms compared: 3593039
  • DQMHistoTests: Total failures: 2102
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3590915
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (47 files compared)
  • Checked 200 log files, 45 edm output root files, 48 DQM output files
  • TriggerResults: no differences found

@silviodonato
Contributor

So if we really wanted to touch the legacy code, it would make sense to do something that makes sense from the detector point of view and that would be to completely kill duplicate pixels.

The only three options are the following ones: to sum them all up, ignore them all or keep only one of them (currently the last one that appears is kept)

@ferencek clearly I'm in favor of having the best solution (i.e. the removal of all copies of duplicated pixels) implemented on both GPU and CPU.
As far as I understand, we have a computing-sustainable solution ready for GPU (#37359).
If you think it is possible to have an equivalent solution for the CPU soon, that would be great.
Again, I have no idea of the complexity of such a PR; do you have a rough estimate of the amount of time needed? Could it be ready before the start of data taking?

@ferencek
Contributor Author

Hi @silviodonato, yes, I think it is possible to have the CPU code ready before the start of data taking. I assume we are talking about CMSSW_12_4_0 as the target release.

@silviodonato
Contributor

@ferencek please have a look at the solution proposed in #37359 (comment), that would be ideal for the TSG as the plan is to move to Alpaka at some point.

@ferencek
Contributor Author

@ferencek please have a look at the solution proposed in #37359 (comment), that would be ideal for the TSG as the plan is to move to Alpaka at some point.

@silviodonato, I am not sure how the migration to Alpaka is relevant to the legacy CPU code. I specifically had in mind minimal changes to the legacy clusterizer, and I would not even touch the legacy unpacker. This way the two points Andrea listed would be satisfied.

@silviodonato
Contributor

Once we migrate the GPU code to Alpaka, we will be able to run the same code on both CPU and GPU.
Clearly, if you have in mind a minimal change that can solve the problem, it is welcome as well.

Contributor

jpata commented Apr 13, 2022

@ferencek @OzAmram can you clarify (from the pixel DPG point of view) if this PR is intended as something we should consider ready for merging, or if it's a RFC/discussion topic in the pixel DPG (akin to #37359)?

@ferencek
Contributor Author

@jpata, given the discussion above, it turned into more of a discussion topic than something that should be considered for merging. Should I add [DO NOT MERGE] or something else to that effect in the title of the PR to avoid confusion?

@ferencek ferencek changed the title Add duplicate pixels [DO NOT MERGE] Add duplicate pixels Apr 13, 2022
@ferencek
Contributor Author

@silviodonato, here is a PR that removes duplicate pixels in the legacy CPU code #37559

@smuzaffar smuzaffar modified the milestones: CMSSW_12_4_X, CMSSW_12_5_X May 16, 2022
Contributor

jpata commented May 16, 2022

-reconstruction

  • removing from the reco queues until the discussion in the DPG is concluded

Contributor

qliphy commented Aug 23, 2022

@ferencek @fwyzard This can be closed now by #37559, right?

Contributor

fwyzard commented Aug 23, 2022

Yes, this PR should be closed.

@perrotta perrotta closed this Aug 23, 2022