Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alpaka implementation of Hcal Local Reconstruction (Mahi) #44910

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

kakwok
Copy link
Contributor

@kakwok kakwok commented May 6, 2024

PR description:

This PR port the CUDA implementation of Hcal Local Reconstruction (Mahi) to using Alpaka.
Custom SoA data structure used in CUDA for HCAL condition data and rechits are replaced with PortableCollection.
The current Alpaka implementation aims at reproducing the results from the CUDA implementation, no algorithmic changes are made.

There are 4 main pieces involved in the migration:

  • digiConverter(hcalDigisProducerPortable): Convert CPU digis into SoA format
  • Produce HCAL condition data in SoA format (Multiple producers)
  • Mahi kernels (Mahi.dev.cc)
  • Convert rechits from SoA to legacy format (HcalRecHitSoAToLegacy)

More details on code design are presented in the recent HLT GPU development meetings:
https://indico.cern.ch/event/1350955/
https://indico.cern.ch/event/1350953/
https://indico.cern.ch/event/1350952/
https://indico.cern.ch/event/1230377/
https://indico.cern.ch/event/1230374/

PR validation:

Alpaka GPU and CPU rechit energies are validated with slightly modified workflow 12434.523 (ttbar simulation).
More validation comparisons are being worked on.

@cmsbuild
Copy link
Contributor

cmsbuild commented May 6, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented May 6, 2024

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40188

  • This PR adds an extra 248KB to repository

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented May 6, 2024

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44910/40189

  • This PR adds an extra 248KB to repository

@cmsbuild
Copy link
Contributor

cmsbuild commented May 6, 2024

A new Pull Request was created by @kakwok for master.

It involves the following packages:

  • CondFormats/DataRecord (db, alca)
  • CondFormats/HcalObjects (db, alca)
  • DQM/HcalTasks (dqm)
  • DataFormats/HcalDigi (simulation)
  • DataFormats/HcalRecHit (reconstruction)
  • EventFilter/HcalRawToDigi (reconstruction)
  • HeterogeneousCore/AlpakaInterface (heterogeneous)
  • RecoLocalCalo/Configuration (reconstruction)
  • RecoLocalCalo/HcalRecAlgos (reconstruction)
  • RecoLocalCalo/HcalRecProducers (reconstruction)

@syuvivida, @francescobrivio, @makortel, @rvenditti, @mandrenguyen, @fwyzard, @tjavaid, @mdhildreth, @saumyaphor4252, @perrotta, @consuegs, @jfernan2, @cmsbuild, @civanch, @nothingface0, @antoniovagnerini can you please review it and eventually sign? Thanks.
@argiro, @DryRun, @rsreds, @Martin-Grunewald, @PonIlya, @JanChyczynski, @rovere, @makortel, @missirol, @yuanchao, @tocheng, @bsunanda, @apsallid, @mmusich, @ReyerBand, @wang0jin, @sameasy, @thomreis, @mariadalfonso, @youyingli, @abdoulline, @rchatter, @seemasharmafnal this is something you requested to watch as well.
@sextonkennedy, @antoniovilela, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@fwyzard
Copy link
Contributor

fwyzard commented May 6, 2024

type hcal

@cmsbuild cmsbuild added the hcal label May 6, 2024
#include "FWCore/Utilities/interface/bit_cast.h"
#include "HeterogeneousCore/AlpakaInterface/interface/config.h"

// Note: Does not compile with ALPAKA_FN_ACC on ROCm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, given the #ifdef, the function is defined both for the host and the device, so it makes sense that it doesn't compile as a pure __device__ function.

The canonical alpaka approach would be to have different specialisation for each accelerator - but we can clean this up later.

@mmusich
Copy link
Contributor

mmusich commented Jun 30, 2024

@cms-sw/pdmv-l2 @cms-sw/reconstruction-l2 kind ping.

@mandrenguyen
Copy link
Contributor

+1
No changes to reco comparisons.
Visual inspection of code seems ok.

@fwyzard
Copy link
Contributor

fwyzard commented Jul 2, 2024

@Dr15Jones as far as I can see the @cms-sw/core-l2 signature is required because of the changes in DataFormats/Common/src/classes_def.xml:

 <class name="edm::Wrapper<std::map<edm::ParameterSetID,edm::ParameterSetBlob>>"/>

 <class name="edm::StdArray<short, 4>"/>
+<class name="edm::StdArray<unsigned short, 11>"/>
</lcgdict>

that are needed do store

  using QIE11dataArray = edm::StdArray<uint16_t, QIE11DigiCollection::MAXSAMPLES + Flavor1::HEADER_WORDS>;
  using QIE10dataArray = edm::StdArray<uint16_t, HBHEDataFrame::MAXSAMPLES + Flavor5::HEADER_WORDS>;

defined in DataFormats/HcalDigi/interface/HcalDigiSoA.h (MAXSAMPLES plus HEADER_WORDS is 11 in both cases).

Could you comment if you see any issues with that, or could you sign for @cms-sw/core-l2 this PR and its 14.0.x backport (#45324) ?

@antoniovilela
Copy link
Contributor

@cms-sw/core-l2 @cms-sw/pdmv-l2
Please review & sign.

@antoniovilela
Copy link
Contributor

antoniovilela commented Jul 2, 2024

@cms-sw/core-l2 @cms-sw/pdmv-l2 Please review & sign.

@smuzaffar
Hi Shahzad,
In case you are signing for Core.
Thanks.

@smuzaffar
Copy link
Contributor

+core

core chages in DataFormats/Common/src/classes_def.xml i.e adding new dict for edm::StdArray<unsigned short, 11> look good

@antoniovilela
Copy link
Contributor

+core

core chages in DataFormats/Common/src/classes_def.xml i.e adding new dict for edm::StdArray<unsigned short, 11> look good

Thanks Shahzad.

@antoniovilela
Copy link
Contributor

@cms-sw/pdmv-l2
We will bypass the signature for the master PR, but please have a look and let us know.

@antoniovilela
Copy link
Contributor

+1

@antoniovilela
Copy link
Contributor

merge

@cmsbuild cmsbuild merged commit f0cd8b9 into cms-sw:master Jul 2, 2024
11 checks passed
def customizeHLTforAlpakaPFSoA(process):

# avoid the conversion from SoA to legacy to SoA for the HCAL rechits
process.hltParticleFlowRecHitHBHESoA.producers[0].src = 'hltHbheRecoSoA'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kakwok @fwyzard seems we missed to protect here the cases where these modules do not exist (I.e. the other tables that are not GRun). This leads to crashes in the integration tests when trying to integrate this in the combined table. Can you provide a fix for this real quick?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmusich does #45484 work for you ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fwyzard thank you.

does #45484 work for you ?

I think so. Martin was also reporting problems related to the Hcal local reco part of the customization concerning hltTowerMakerForAllSerialSync. Apparently the check at the top

if not hasattr(process, 'HLTDoLocalHcalSequence'):

does not cover all the use cases.

@fwyzard
Copy link
Contributor

fwyzard commented Jul 17, 2024

But why do we care about these customisations ?
Can we simply stop calling them once HCAL has been migrated ?

@Martin-Grunewald
Copy link
Contributor

I agree and this will be the case!

But before I was just checking & testing things by applying the ALPAKA customisation to ALL subtables, and it ONLY worked with the GRun one, not with any of the others (HIon,PRef, ...)

@mmusich
Copy link
Contributor

mmusich commented Jul 17, 2024

But why do we care about these customisations ?
Can we simply stop calling them once HCAL has been migrated ?

In the long term yes, but:
a) we need a customization that allows a correct migration (integration tests functionally working after migration for all the subtables)
b) similarly to what done for pixel and ecal I think we want to keep the customization around to be able to apply it to older menus for private studies.

@fwyzard
Copy link
Contributor

fwyzard commented Jul 17, 2024

a) we need a customization that allows a correct migration (integration tests functionally working after migration for all the subtables)

/users/fwyzard/CMSSW_14_0_0/from_HLT_v183/HLT/V23 should be fully migrated.

b) similarly to what done for pixel and ecal I think we want to keep the customization around to be able to apply it to older menus for private studies.

Old versions of the GRun menu should already work, though ?

@mmusich
Copy link
Contributor

mmusich commented Jul 17, 2024

versions of the GRun menu should already work, though ?

Yes, but only those.

@fwyzard
Copy link
Contributor

fwyzard commented Jul 17, 2024

Well, I don't see any reason why we should want to apply a customisation intended for the GRun physics menu to configurations that are 1. obsolete, and 2. not physics menus.

If STORM has any specific example of why it would be needed, please explain it.

Otherwise, as far as I'm concerned there is not need to maintain or modify these customisations any further.

@mmusich
Copy link
Contributor

mmusich commented Jul 17, 2024

If STORM has any specific example of why it would be needed, please explain it.

Nothing directly comes to mind, but a use case might appear in future. Also I think it's dangerous to keep potentially broken customizations accessible to users

Otherwise, as far as I'm concerned there is not need to maintain or modify these customisations any further.

I appreciate your lack of will to amend this any further. We'll take care of that ourselves.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.