Towards a transient-persistent layer to read NanoAOD in RDataFrame #45972

Open
lenzip opened this issue Sep 11, 2024 · 7 comments

@lenzip
Contributor

lenzip commented Sep 11, 2024

It would be desirable to be able to write analyses in RDataFrame that treat the NanoAOD as a collection of objects rather than dealing with the individual branches, mainly to reduce boilerplate, e.g.:

  • in all instances in which a 4 momentum is needed
  • to avoid having to reorder several branches consistently whenever a collection is reordered
  • in functions that e.g. compute MVAs one could pass the object rather than the individual RVecs.
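To illustrate the reordering point, here is a minimal pure-Python sketch. Plain lists stand in for RVecs, the values are made up, and the `Muon` dataclass is a hypothetical object type, not an existing ROOT or NanoAOD class. With separate branches, one index permutation has to be applied to every branch by hand; with objects, a single sort is enough.

```python
from dataclasses import dataclass

# Toy per-event "branches"; plain lists stand in for RVecs (values are made up).
Muon_pt = [12.0, 45.0, 30.0]
Muon_eta = [0.1, -1.2, 2.0]
Muon_charge = [1, -1, 1]

# Branch-based: compute the permutation once, then apply it to every branch.
order = sorted(range(len(Muon_pt)), key=lambda i: Muon_pt[i], reverse=True)
sorted_pt = [Muon_pt[i] for i in order]
sorted_eta = [Muon_eta[i] for i in order]
sorted_charge = [Muon_charge[i] for i in order]


# Object-based: attributes travel together, so a single sort is enough.
@dataclass
class Muon:  # hypothetical object type, for illustration only
    pt: float
    eta: float
    charge: int


muons = [Muon(*values) for values in zip(Muon_pt, Muon_eta, Muon_charge)]
muons_by_pt = sorted(muons, key=lambda m: m.pt, reverse=True)
```

In RDataFrame itself, the branch-based permutation is typically written with ROOT::VecOps::Argsort and ROOT::VecOps::Take, repeated once per branch.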

This is already implemented in some tools that use RDataFrame as a backend.
For example, in Bamboo this is implemented in a Python layer that, under the hood, translates into RDataFrame actions on individual columns.
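As a rough idea of what such a translation layer could look like, here is a minimal sketch. It is not Bamboo's actual implementation; `CollectionProxy`, `ObjectProxy` and `used_columns` are hypothetical names chosen for illustration. Attribute access on a proxy returns the column expression as a string, so only the columns that are actually referenced end up in the expression handed to RDataFrame.

```python
class ObjectProxy:
    """Stands for one element of a collection, e.g. Muons[0] (hypothetical)."""

    def __init__(self, prefix, index, used):
        self._prefix = prefix
        self._index = index
        self._used = used

    def __getattr__(self, name):
        # Only reached for attributes that are not set in __init__.
        if name.startswith("_"):
            raise AttributeError(name)
        column = f"{self._prefix}_{name}"
        self._used.add(column)  # remember which columns were referenced
        return f"{column}[{self._index}]"


class CollectionProxy:
    """Stands for a whole collection, e.g. all muons (hypothetical)."""

    def __init__(self, prefix):
        self.prefix = prefix
        self.used_columns = set()

    def __getitem__(self, index):
        return ObjectProxy(self.prefix, index, self.used_columns)


Muons = CollectionProxy("Muon")
# Object-style syntax on the Python side, column-style expression underneath.
expr = f"{Muons[0].pt} > 20 && abs({Muons[0].eta}) < 2.4"
```

The resulting string could then be passed to a call such as `df.Filter(expr)`. Because the translation happens before the event loop, columns that are never mentioned are never requested.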

Ideally one would like to be able to build, e.g., an RVec of Muon objects from the individual Muon_attributes RVecs, with something like:

df = df.Define("Muons", "some_function_to_build_muons")

Requirements:

  1. It should allow using the . operator, i.e. one should be able to write Muons[0].pt or Muons[0].pt(), rather than Muon_pt[0].
  2. The reading of branches should be lazy, i.e. it should only happen if required. In other words, one wants to avoid the performance penalty of reading all muon attributes from the file if only a few are used in the analysis.
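The two requirements can be sketched together in pure Python. This is only a toy model under stated assumptions: `LazyCollection`, `LazyObject` and `read_branch` are hypothetical names, a dictionary stands in for the file, and nothing here is an existing ROOT API. A branch is fetched the first time one of its attributes is accessed and cached afterwards.

```python
class LazyCollection:
    """Collection view that reads a branch only on first use (hypothetical)."""

    def __init__(self, prefix, read_branch):
        self._prefix = prefix
        self._read_branch = read_branch  # callable: branch name -> list of values
        self._cache = {}

    def column(self, attr):
        name = f"{self._prefix}_{attr}"
        if name not in self._cache:
            self._cache[name] = self._read_branch(name)  # the only place data is read
        return self._cache[name]

    def __getitem__(self, index):
        return LazyObject(self, index)


class LazyObject:
    """One element of a LazyCollection, supporting the . operator."""

    def __init__(self, collection, index):
        self._collection = collection
        self._index = index

    def __getattr__(self, attr):
        # Only reached for attributes that are not set in __init__.
        if attr.startswith("_"):
            raise AttributeError(attr)
        return self._collection.column(attr)[self._index]


# Toy "file" for a single event; the values are made up.
event = {
    "Muon_pt": [45.0, 12.0],
    "Muon_eta": [0.3, -1.1],
    "Muon_phi": [1.5, -2.0],
    "Muon_mass": [0.106, 0.106],
}
reads = []  # records which branches were actually read


def read_branch(name):
    reads.append(name)
    return event[name]


Muons = LazyCollection("Muon", read_branch)
leading_pt = Muons[0].pt
subleading_pt = Muons[1].pt
```

After these two accesses, `reads` contains only `Muon_pt`: the other three branches are never touched, which is the behaviour requirement 2 asks for.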

ROOT does offer a ROOT::VecOps::Construct function that allows building custom objects, e.g. the following code builds an RVec of muon 4-momenta:

import ROOT

# Open the "Events" tree of a public benchmark file.
df = ROOT.RDataFrame("Events", "root://eospublic.cern.ch//eos/root-eos/benchmark/Run2012B_SingleMu.root")
ROOT.RDF.Experimental.AddProgressBar(df)

# Build one 4-vector per muon from the four per-muon branches.
df = df.Define("Muon_p4", "ROOT::VecOps::Construct<ROOT::Math::PtEtaPhiMVector>(Muon_pt, Muon_eta, Muon_phi, Muon_mass)")

# Compute Mt() for each muon 4-vector and fill a histogram with the result.
histo = df.Define("Muon_mt", "return Map(Muon_p4, [](auto v) {return v.Mt();});").Histo1D(("new", "new", 100, 0., 100.), "Muon_mt")

histo.Draw()

This satisfies requirement 1 but not requirement 2: it is not lazy, because all four branches are read in order to pass arguments to the ROOT::Math::PtEtaPhiMVector constructor, even if only some of them are needed later.

Status:

The ROOT team has already discussed this topic with us and in a ROOT PPP meeting.
Afterwards they pointed us to the ROOT::VecOps::Construct function used in the example above.
We only recently raised requirement 2 (lazy branch reading in this objectification layer), and they said they are thinking about a possible solution.

@cmsbuild
Contributor

cmsbuild commented Sep 11, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @lenzip.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@lenzip
Contributor Author

lenzip commented Sep 11, 2024

I am posting this here as requested yesterday at the core software meeting, although it does not concern CMSSW specifically, only ROOT.

@makortel
Contributor

type root

cmsbuild added the root label Sep 11, 2024

@makortel
Contributor

assign analysis

I feel this is the closest match, even if this issue doesn't concern CMSSW directly.

@makortel
Contributor

Thanks @lenzip!

@cmsbuild
Contributor

New categories assigned: analysis

@tvami you have been requested to review this Pull request/Issue and eventually sign? Thanks

Projects
Status: Work in ROOT