Performance: Attribute reading in the HDF5 backend is implemented inefficiently #1639

franzpoeschel · 2024-06-21T15:00:08Z

This issue should serve as documentation of a performance issue seen in our HDF5 backend.

Describe the issue

When parsing an HDF5 dataset, the majority of time will be spent reading attributes. The following graph was created by using openpmd-ls on a PIConGPU output with ~80 iterations:

Looking into the implementation of HDF5IOHandlerImpl::readAttribute():

void HDF5IOHandlerImpl::readAttribute(
    Writable *writable, Parameter<Operation::READ_ATT> &parameters)
{
    /* [ ... ] */

    auto res = getFile(writable);
    File file = res ? res.value() : getFile(writable->parent).value();

    /* [ ... ] */

    hid_t fapl = H5Pcreate(H5P_LINK_ACCESS);

    /* [ ... ] */

    obj_id =
        H5Oopen(file.id, concrete_h5_file_position(writable).c_str(), fapl);
  
    /* [ ... ] */

    H5S_class_t attr_class = H5Sget_simple_extent_type(attr_space);
    Attribute a(0);
    if (attr_class == H5S_SCALAR ||
        (attr_class == H5S_SIMPLE && ndims == 1 && dims[0] == 1))
    {
        if (H5Tequal(attr_type, H5T_NATIVE_CHAR))
        {
            char c;
            status = H5Aread(attr_id, attr_type, &c);
            a = Attribute(c);
        }
        else if (H5Tequal(attr_type, H5T_NATIVE_UCHAR))
        {
            unsigned char u;
            status = H5Aread(attr_id, attr_type, &u);
            a = Attribute(u);
        }
        
    /* [ ... ] */
    /* [ ... ] */
    /* [ ... ] */

        else
            throw error::ReadError(
                error::AffectedObject::Attribute,
                error::Reason::UnexpectedContent,
                "HDF5",
                "[HDF5] Unsupported scalar attribute type for '" + attr_name +
                    "'.");
    }
    else if (attr_class == H5S_SIMPLE)
    {
        if (ndims != 1)
            throw error::ReadError(
                error::AffectedObject::Attribute,
                error::Reason::UnexpectedContent,
                "HDF5",
                "[HDF5] Unsupported attribute (array with ndims != 1)");

        if (H5Tequal(attr_type, H5T_NATIVE_CHAR))
        {
            std::vector<char> vc(dims[0], 0);
            status = H5Aread(attr_id, attr_type, vc.data());
            a = Attribute(vc);
        }
        else if (H5Tequal(attr_type, H5T_NATIVE_UCHAR))
        {
            std::vector<unsigned char> vu(dims[0], 0);
            status = H5Aread(attr_id, attr_type, vu.data());
            a = Attribute(vu);
        }
        
    /* [ ... ] */
    /* [ ... ] */
    /* [ ... ] */

        else
        {

    /* [ ... ] */

        }
    }
    else
        throw std::runtime_error("[HDF5] Unsupported attribute class");
    
    /* [ ... ] */

    status = H5Tclose(attr_type);
    /* [ ... ] */
    status = H5Sclose(attr_space);
    /* [ ... ] */

    auto dtype = parameters.dtype;
    *dtype = a.dtype;
    auto resource = parameters.resource;
    *resource = a.getResource();

    status = H5Aclose(attr_id);
    /* [ ... ] */
    status = H5Oclose(obj_id);
    /* [ ... ] */
    status = H5Pclose(fapl);
    /* [ ... ] */
}

This exposes several problems:

For each read attribute, the object containing the attribute is opened and then closed. For subsequent attribute reads at the same object, this is unnecessary.
For each read attribute, a property is created and then closed. This is generally unnecessary, the property should be stored at the IOHandler level.
I've seen benchmarks before (though not the one above) where H5Tequal() was a sizeable portion of runtime. I'm not sure if we can find a better way to detect the attribute type.

To Reproduce
Not needed as the cause of the inefficiency is understood.

Expected behavior
Would a READ_ATTRIBUTES task help that reads all attributes in a group? --> Not fully since the frontend explicitly reads some attributes. Buffer the attributes of the last active group?
Definitely buffer the attribute read property.

Additional context
Not labeling as a a bug, since this is inefficient, but neither pathetically nor surprisingly so.

The text was updated successfully, but these errors were encountered:

franzpoeschel added the backend: HDF5 label Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance: Attribute reading in the HDF5 backend is implemented inefficiently #1639

Performance: Attribute reading in the HDF5 backend is implemented inefficiently #1639

franzpoeschel commented Jun 21, 2024 •

edited

Loading

Performance: Attribute reading in the HDF5 backend is implemented inefficiently #1639

Performance: Attribute reading in the HDF5 backend is implemented inefficiently #1639

Comments

franzpoeschel commented Jun 21, 2024 • edited Loading

franzpoeschel commented Jun 21, 2024 •

edited

Loading