Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for std::array event products #30493

Open
makortel opened this issue Jul 1, 2020 · 24 comments
Open

Support for std::array event products #30493

makortel opened this issue Jul 1, 2020 · 24 comments

Comments

@makortel
Copy link
Contributor

makortel commented Jul 1, 2020

Current std::array<T, N> products can not be put into the event because framework complains from missing dictionary. std::array is one of those types that ROOT handles internally without an explicit dictionary. Our current check (or one of them)

// A related free function
bool hasDictionary(std::type_info const& ti) {
if (ti.name()[1] == '\0') {
// returns true for built in types (single character mangled names)
return true;
}
return (TClassTable::GetDict(ti) != nullptr);
}

handles built-in types as a special case, and then asks from TClassTable if the type has a dictionary. For std::array the TClassTable::GetDict() returns a nullptr, and therefore the framework code thinks the std::array to not have a dictionary.

We should look for a better way to ask from ROOT if its parser system can interact with a type T.

@makortel
Copy link
Contributor Author

makortel commented Jul 1, 2020

assign core

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2020

New categories assigned: core

@Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@cmsbuild
Copy link
Contributor

cmsbuild commented Jul 1, 2020

A new Issue was created by @makortel Matti Kortelainen.

@Dr15Jones, @silviodonato, @dpiparo, @smuzaffar, @makortel can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Copy link
Contributor Author

makortel commented Jul 1, 2020

@pcanal suggested to look into TClass::GetMissingDictionaries() or TDictionary::GetDictionary().

@makortel
Copy link
Contributor Author

makortel commented Jul 1, 2020

This issue is related to investigations done as part of #27277.

@makortel
Copy link
Contributor Author

makortel commented Jul 2, 2020

Mostly to document for myself. I replaced

// A related free function
bool hasDictionary(std::type_info const& ti) {
if (ti.name()[1] == '\0') {
// returns true for built in types (single character mangled names)
return true;
}
return (TClassTable::GetDict(ti) != nullptr);
}

with

  // A related free function                                                                                                                                                                                                                                                   
  bool hasDictionary(std::type_info const& ti) {
    return TDictionary::GetDictionary(ti) != nullptr;
  }

after which my test failed with

An exception of category 'DictionaryNotFound' occurred while
   [0] Constructing the EventProcessor
   [1] Calling OutputModuleBase::keepThisBranch, checking dictionaries for kept types
Exception Message:
No data dictionary found for the following classes:

  std::array<int,4>

Most likely each dictionary was never generated, but it may
be that it was generated in the wrong package. Please add
(or move) the specification '<class name="whatever"/>' to
the appropriate classes_def.xml file along with any other
information needed there. For example, if this class has any
transient members, you need to specify them in classes_def.xml.
Also include the class header in classes.h

That exception is thrown from

if (!checkDictionary(missingDictionaries, desc.className(), desc.unwrappedType())) {
std::string context("Calling OutputModuleBase::keepThisBranch, checking dictionaries for kept types");
throwMissingDictionariesException(missingDictionaries, context);
}

where the checkDictionary() overload is this one
bool checkDictionary(std::vector<std::string>& missingDictionaries,
std::string const& name,
TypeWithDict const& typeWithDict) {
if (!bool(typeWithDict) || typeWithDict.invalidTypeInfo()) {
missingDictionaries.emplace_back(name);
return false;
}
return true;
}

The branch is fired because typeWithDict.invalidTypeInfo() == true, which translates to
bool TypeWithDict::invalidTypeInfo() const { return *ti_ == typeid(dummyType) || isPointer() || isArray(); }

The isPointer() and isArray() both return false, so *ti_ == typeid(dummyType) must hold.

The TypeWithDict::ti_ can be set to dummyType only in this constructor (when TypeWithDict::class_ != nullptr)

TypeWithDict::TypeWithDict(TClass* cl, long property)
: ti_(cl->GetTypeInfo()),
class_(cl),
enum_(nullptr),
dataType_(nullptr),
arrayDimensions_(nullptr),
property_(property) {
if (ti_ == nullptr) {
ti_ = &typeid(TypeWithDict::dummyType);
}
}

when TClass::GetTypeInfo() == nullptr, which is the case with std::array<T, N> (confirmed with a printout there, and in ROOT prompt).

I need to think a bit more how to proceed from here.

@slava77
Copy link
Contributor

slava77 commented Jul 8, 2020

I just wanted to check on the status

@pcanal
Copy link
Contributor

pcanal commented Jul 8, 2020

he isPointer() and isArray() both return false,

Shouldn't isArray() return true here? (I.e. should it be expanded to also return true for std::array)?

@makortel
Copy link
Contributor Author

makortel commented Jul 8, 2020

he isPointer() and isArray() both return false,

Shouldn't isArray() return true here? (I.e. should it be expanded to also return true for std::array)?

Good question. I need to understand if we use isArray() anywhere in a way that would need to differentiate between std::array<T ,N> and T[N].

On the other hand isArray() == true might not help much, TypeWithDict::invalidTypeInfo() would still return true, and

bool checkDictionary(std::vector<std::string>& missingDictionaries,
std::string const& name,
TypeWithDict const& typeWithDict) {
if (!bool(typeWithDict) || typeWithDict.invalidTypeInfo()) {
missingDictionaries.emplace_back(name);
return false;
}
return true;
}

would add the type to missingDictionaries. This makes me wonder if that check, or the call from OutputModuleBase::keepThisBranch() are fully appropriate (for std::array). Of course it could be that changing isArray() == true would lead to another code branch to be taken elsewhere.

@pcanal
Copy link
Contributor

pcanal commented Jul 8, 2020

Of course it could be that changing isArray() == true would lead to another code branch to be taken elsewhere.

It "should" since a C-style would not have a TClass and would not have a typeinfo stored by ROOT.

TypeWithDict::invalidTypeInfo() would still return true

Indeed, since the dictionary for the std::array is not generate we do not acquire the type_info for it (we 'could' but it would require header parsing at run-time which we are trying to avoid).

@makortel
Copy link
Contributor Author

makortel commented Jul 9, 2020

need to understand if we use isArray() anywhere in a way that would need to differentiate between std::array<T ,N> and T[N].

Fortunately all calls to TypeWithDict::isArray() seem to be within TypeWithDict. If we'd change it to return true for std::array, the consequences would be

  • TypeWithDict::typeInfo() would throw an exception
    throw Exception(errors::LogicError)
    << "Function TypeWithDict::typeInfo: Type\n"
    << qualifiedName() << "\ndoes not have valid type_info information in ROOT\n"
    << "because it is " << category << ".\n";
    }

    On the other hand it should throw an exception already now because of *ti_ == typeid(dummyType)
  • TypeWithDict::getClass() would return nullptr instead of TClass * corresponding the std::array
  • TypeWithDict::isClass() would return false instead of true
  • TypeWithDict::name() would return T[N] instead of std::array<T, N>
  • TypeWithDict::arrayDimension() would return a value instead of failing an assertion
  • TypeWithDict::maximumIndex() would return a value instead of failing an assertion
  • TypeWithDict::toType() would return a TypeWithDict corresponding T instead of std:array<T, N> (i.e. a copy)

The consequences of TypeWithDict::isClass() returning false would be

  • TypeWithDict::isTemplateInstance() would return false instead of true
  • TypeWithDict::dataMemberSize() would return 0 instead of 1
  • TypeWithDict::functionMemberSize() would return 0 instead of 33
  • TypeWithDict::dataMemberByName() would return default-constructed MemberWithDict()
  • TypeWithDict::functionMemberByName() would return default-constructed FunctionWithDict()
  • TypeWithDict::templateArgumentAt() would return default-constructed TypeWithDict()

I didn't continue further, but based on this I'm not sure if treating std::array<T, N> exactly like T[N] would be sufficient.

@makortel
Copy link
Contributor Author

DictionaryTools.h has two overloads for the checkDictionary()

  1. bool checkDictionary(std::vector<std::string>& missingDictionaries, TypeID const& typeID) {
    TClass::GetClass(typeID.typeInfo());
    if (!hasDictionary(typeID.typeInfo())) {
    // a second attempt to load
    TypeWithDict::byName(typeID.className());
    }
    if (!hasDictionary(typeID.typeInfo())) {
    missingDictionaries.emplace_back(typeID.className());
    return false;
    }
    return true;
    }
  2. bool checkDictionary(std::vector<std::string>& missingDictionaries,
    std::string const& name,
    TypeWithDict const& typeWithDict) {
    if (!bool(typeWithDict) || typeWithDict.invalidTypeInfo()) {
    missingDictionaries.emplace_back(name);
    return false;
    }
    return true;
    }

(the definition of hasDictionary() can be seen in #30493 (comment)

There are also two overloads for checkDictionaryOfWrappedType()

bool checkDictionaryOfWrappedType(std::vector<std::string>& missingDictionaries, TypeID const& unwrappedTypeID);
bool checkDictionaryOfWrappedType(std::vector<std::string>& missingDictionaries, std::string const& unwrappedName);

that both call the overload 2 of checkDictionary().

The checkDictionary() is called from

  • void ProductRegistry::initializeLookupTables(std::set<TypeID> const* productTypesConsumed,
    std::set<TypeID> const* elementTypesConsumed,
    std::string const* processName) {
    • two calls to overload 2
  • void ProductRegistry::checkDictionariesOfConsumedTypes(
    std::set<TypeID> const* productTypesConsumed,
    std::set<TypeID> const* elementTypesConsumed,
    std::map<TypeID, TypeID> const& containedTypeMap,
    std::map<TypeID, std::vector<TypeWithDict> >& containedTypeToBaseTypesMap) {
    • two calls to overload 1, one call to overload 2
  • void ProductRegistryHelper::addToRegistry(TypeLabelList::const_iterator const& iBegin,
    TypeLabelList::const_iterator const& iEnd,
    ModuleDescription const& iDesc,
    ProductRegistry& iReg,
    ProductRegistryHelper* iProd,
    bool iIsListener) {
    • one call to overload 1, one call to overload 2
  • each global, limited, and one
    void OutputModuleBase::keepThisBranch(BranchDescription const& desc,
    std::map<BranchID, BranchDescription const*>& trueBranchIDToKeptBranchDesc,
    std::set<BranchID>& keptProductsInEvent) {
    • have one call to overload 2
  • bool public_base_classes(std::vector<std::string>& missingDictionaries,
    TypeID const& typeID,
    std::vector<TypeWithDict>& baseTypes) {
    • one call to overload 1, one call to overload 2

@makortel
Copy link
Contributor Author

The two overloads of checkDictionary() were added in #15198. Before, there were

  • bool
    checkTypeDictionary(TypeID const& type, TypeSet& missingTypes) {
    TClass *cl = TClass::GetClass(type.typeInfo(), true);
    if(cl == nullptr) {
    // Assume not a class
    return true;
    }
    if(!cl->HasDictionary()) {
    missingTypes.insert(type);
    return false;
    }
    return true;
    }
  • bool
    checkClassDictionary(TypeID const& type, TypeSet& missingTypes) {
    TClass *cl = TClass::GetClass(type.typeInfo(), true);
    if(cl == nullptr) {
    throw Exception(errors::DictionaryNotFound)
    << "No TClass for class: '" << type.className() << "'" << std::endl;
    }
    if(!cl->HasDictionary()) {
    missingTypes.insert(type);
    return false;
    }
    return true;
    }

I'm currently trying to understand why the overload 2 of checkDictionary() is needed.

@pcanal
Copy link
Contributor

pcanal commented Jul 15, 2020

Overload 1 does (explicitly but somewhat obscurely):

  • trigger the auto-loading of the dictionary library about the type
  • check the result

while Overload 2 assumes that the dictionary library library is already loaded (and that the answer is already compiled in TypeWithDict object).

@makortel
Copy link
Contributor Author

I tried to just call the checkDictionary() overload 1 from overload 2, but didn't work because the OutputModule takes the TypeWithDict from the BranchDescription, which in turn initializes the TypeWithDict from the type name here

setUnwrappedType(TypeWithDict::byName(fullClassName()));

I'm not really sure why the (unwrapped) TypeWithDict is initialized from type name also when a TypeWithDict is given as an argument to the constructor of BranchDescription

setUnwrappedType(theTypeWithDict);

and that TypeWithDict is initialized from std::type_info here
TypeWithDict type(p->typeID_.typeInfo());
BranchDescription pdesc(branchType,

(still need to investigate where that std::type_info originates from)

Another attempt I did was to replace the checkDictoinary() overload 2 along what I did with overload 1 in #30493 (comment) with

  bool checkDictionary(std::vector<std::string>& missingDictionaries,
                       std::string const& name,
                       TypeWithDict const& typeWithDict) {
    if ((!bool(typeWithDict) || typeWithDict.invalidTypeInfo()) and not hasDictionary(typeWithDict.name())) {
      missingDictionaries.emplace_back(name);
      return false;
    }
    return true;

  bool hasDictionary(std::string const& typeName) {
    return hasDictionary(typeName.c_str());
  }
  bool hasDictionary(char const* typeName) {
    return TDictionary::GetDictionary(typeName) != nullptr;
  }

This allowed me to pass the checkDictionary() checks, but then I encountered a call to BranchDescription.unwrappedTypeID() in

token = consumes(TypeToGet{desc.unwrappedTypeID(), PRODUCT_TYPE},
InputTag{desc.moduleLabel(), desc.productInstanceName(), desc.processName()});

which leads to a call to TypeWithDict::typeInfo(), which then throws an exception because the held type_info is dummy. That raises an additional question if we really need a valid std::type_info there (based on quick look I'd say "yes").

@makortel
Copy link
Contributor Author

On the other hand the consumes() call OutputModuleBase does work for basic types (like int). If I guess correctly, the TypeWithDict have a valid std::type_info for basic types because of

// Handle built-ins
TDataType* theDataType = gROOT->GetType(name.c_str());
if (theDataType) {
switch (theDataType->GetType()) {
case kUInt_t:
return TypeWithDict(typeid(unsigned int), property);
case kInt_t:
return TypeWithDict(typeid(int), property);
case kULong_t:
return TypeWithDict(typeid(unsigned long), property);
case kLong_t:
return TypeWithDict(typeid(long), property);
case kULong64_t:
return TypeWithDict(typeid(unsigned long long), property);
case kLong64_t:
return TypeWithDict(typeid(long long), property);
case kUShort_t:
return TypeWithDict(typeid(unsigned short), property);
case kShort_t:
return TypeWithDict(typeid(short), property);
case kUChar_t:
return TypeWithDict(typeid(unsigned char), property);
case kChar_t:
return TypeWithDict(typeid(char), property);
case kBool_t:
return TypeWithDict(typeid(bool), property);
case kFloat_t:
return TypeWithDict(typeid(float), property);
case kFloat16_t:
return TypeWithDict(typeid(Float16_t), property);
case kDouble_t:
return TypeWithDict(typeid(double), property);
case kDouble32_t:
return TypeWithDict(typeid(Double32_t), property);
case kCharStar:
return TypeWithDict(typeid(char*), property);
case kDataTypeAliasSignedChar_t:
return TypeWithDict(typeid(signed char), property);
}
}

@pcanal
Copy link
Contributor

pcanal commented Jul 15, 2020

I tried to just call the checkDictionary() overload 1 from overload 2,

That seems wrong. Overload 2 knows it does not need overload 1. What is the goal trying this?

@makortel
Copy link
Contributor Author

I tried to just call the checkDictionary() overload 1 from overload 2,

That seems wrong. Overload 2 knows it does not need overload 1. What is the goal trying this?

I wanted to see what happens if I skip the "answer already compiled in TypeWithDict" given that for std::array<T, N> the information in TypeWithDict is wrong in some way (and the exact way it is wrong is what I'm trying to understand).

@makortel
Copy link
Contributor Author

I'm not really sure why the (unwrapped) TypeWithDict is initialized from type name also when a TypeWithDict is given as an argument to the constructor of BranchDescription

On the other hand it would certainly be inconsistent if TypeWithDict::byName() and TypeWithDict::byTypeInfo() to give different answers regarding whether the TypeWithDict has a valid std::type_info or not.

@wddgit
Copy link
Contributor

wddgit commented Aug 6, 2021

Matti asked me to look into this. Below are some comments
about what I found (I know this repeats some parts of the
discussion above). Matti asked me to answer two questions,
plus there is an extra comment at the end.


First Question - How hard is it to make CMSSW work with std::array
as a product type?

This is the major problem. TClass::GetClass("std::array<int,4>")
will return a non-null pointer to a TClass. If you use the
pointer to call GetTypeInfo(), then you get a nullptr. We use
the type_info. The lookup tables that we use to find data items
depend on the type_info (through the TypeID, but that is mostly
a wrapper around a type_info*). The ProductResolvers are ordered
based on the TypeID. The whole getByToken design is based on
this.

The type_info is used by the output module when it declares
the products it consumes. That is why there is a dictionary
check for the products an output module keeps.

Independent of the type_info, we use the dictionary in
the code that supports View. If we want to support
View access into std::array or things like
std::vector of std::array, we would need the dictionary
for std::array.

There could be other issues. Things like cutParser and
expressionParser use ROOT dictionaries. There
are things that go directly to TClass to do various things.
I would not be surprised to find other problems that
would result from writing std::array's to output
files.

My first thought is that the best thing is simply to
say we do not support std::array as a product type,
because ROOT does not support them in the same way
it supports other types. One can use a std::vector as
a substitute for a std::array in a data product.
Also, I did a test. One can at least read and write
data products that contain a std::array as a data
member. A simple class that wraps a std::array
should work already, at least as far as the Framework
is concerned (other things might have problems with that,
I don't know).

I tried to think of some alternatives:

For produced products, we can get the type_info from
the "produces" declaration in the EDProducer. It would
take some development effort to make this work, but those
type_info's are available. This does not work for products
read from the input. For those products, we have a string
with the class name and that is all that is saved in the
file in the persistent part of the BranchDescription. We
use TClass to convert the class-name string into a type_info.
I do not see an easy alternative for this case. For transient
products this is not a problem, but for non-transient
products this is a major problem.

I do not see any easy alternate way to get the type_info.
Here are some ideas, but I don't like any of them:

  1. Convince ROOT to support std::array in a way that allows
    getting the type_info.

  2. We could manually code a function like this with cases
    for each full specialization of std::array:

    type_info const& getTypeInfo(std::string const& className) {
        if (className == "std::array<int, 4>") {
            return typeid(std::array<int, 4>);
        } else if (... all the other cases we use manually implemented ...
        ...
    }
  1. Use some reflection system other than ROOT dictionaries.

  2. We could redesign our getByToken infrastructure to work
    on strings with the class name. This wouldn't be easy and
    would probably run slower. We could also redesign EDConsumerBase
    so that it doesn't depend on the type_info.

If we wanted to support Views into std::array, we would also
need to do work there. This would go beyond being able
to get the type_info.


Second question - What do we use dictionaries for in the Framework?
Matti asked this question while thinking about whether we can
allow transient products without dictionaries.

  1. We use the dictionaries to translate a class name to
    a type_info. We use this type_info in our product lookup
    system (getByToken) and also in EDConsumerBase.
    Note that with some significant work, we could always get
    the type_info from the "produces" declaration in the EDProducer.
    Transient products are always "produced", never from the input
    file.

  2. We use dictionaries in the code that implements Views.
    (Currently we need dictionaries for the top level class, the
    Wrapper, the contained class and the base classes of the
    contained classes. We already take advantage of the fact
    that we do not need dictionaries for all constituents of transient
    classes)

  3. We use the dictionary of the wrapped type to control
    whether a class is transient or not (also can set the
    basketSize and splitLevel in classes_def.xml, these are
    accessible only from the dictionary). The critical thing is
    you currently need the dictionary of the wrapped type to
    determine whether or not a type is transient.

  4. We use dictionaries to identify which Run and Lumi
    products are mergeable. This only applies to Run and
    Lumi products.


One unrelated other thing that I noticed while doing this.
There is a BranchDescription constructor that has as
an argument a TypeWithDict. That argument is used to
initialize something, but before that value is ever
used it gets overwritten. We could delete that argument
and all the code related to it without affecting anything.

@pcanal
Copy link
Contributor

pcanal commented Sep 16, 2021

(still digesting but a quick note):

things like std::vector of std::array

nesting of std::array is not yet supported by the I/O

@pcanal
Copy link
Contributor

pcanal commented Sep 16, 2021

For 1 (and somewhat 2) (TClass returns a typeinfo or getTypeInfo special case std::array), the major question/trade-off is that we need to either:
a. enumerate all std::array instances used (in selection.xml or getTypeInfo) [ Error prone / missing instances ]
b. use the interpreter to produce the type info [ slightly slower, increase memory use, potentially significantly ]
c. auto-generate the dictionary for the std::array (similar to STL collection) [ doesn't support top level std::array, only those nested in a class or struct ]

@pcanal
Copy link
Contributor

pcanal commented Sep 20, 2021

@wddgit Thanks for the detailed information. I am reviewing if there is a good practical solution.

@makortel
Copy link
Contributor Author

makortel commented Oct 20, 2023

#43076 provides a workaround for storing std::array directly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants