Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid duplicate loading of component files #11195

Merged
merged 1 commit into from
Oct 17, 2023

Conversation

jan-cerny
Copy link
Collaborator

@jan-cerny jan-cerny commented Oct 12, 2023

As discovered in #11190, the components.load() function is called many times during a build of product content. However, the component files need to be loaded only once.

When the build systems resolves the content using compile_all.py, the compile_all.py creates an instance of ssg.builld_yaml.BuildLoader class. Moreover, many other instances of ssg.builld_yaml.BuildLoader class are created recursively as the loader recurses into sub-directories of the given benchmark root directory. When we iterate over the directories, for each child directory the script creates a new instance of the ssg.build_yaml.BuildLoader by the ssg.builld_yaml.BuildLoader._get_new_loader() method. This method should ensure that the component data and other data won't be unnecessarily loaded again but will be just referenced from the parent ssg.builld_yaml.BuildLoader that handles the parent directory.

The problem is the way how the
ssg.builld_yaml.BuildLoader._load_components() method is called in the ctor of ssg.builld_yaml.BuildLoader. The ctor is called before the initialization of the rule_to_components attribute. Also, there is no code in this ctor or elsewhere assuring that it won't load the components data if they were already loaded.

In this commit, we won't call _load_components in the ctor, which will leave the opportunity to _get_new_loader to swap the data as the comment in this method says.

This should bring a speed up about 5 seconds.

As discovered in ComplianceAsCode#11190,
the `components.load()` function is called many times during a build
of product content. However, the component files need to be loaded only
once.

When the build systems resolves the content using `compile_all.py`, the
`compile_all.py` creates an instance of `ssg.builld_yaml.BuildLoader`
class.  Moreover, many other instances of `ssg.builld_yaml.BuildLoader`
class are created recursively as the loader recurses into
sub-directories of the given benchmark root directory.  When we iterate
over the directories, for each child directory the script creates a new
instance of the `ssg.build_yaml.BuildLoader` by the
`ssg.builld_yaml.BuildLoader._get_new_loader()` method. This method
should ensure that the component data and other data won't be
unnecessarily loaded again but will be just referenced from the parent
`ssg.builld_yaml.BuildLoader` that handles the parent directory.

The problem is the way how the
`ssg.builld_yaml.BuildLoader._load_components()` method is called in the
ctor of `ssg.builld_yaml.BuildLoader`. The ctor is called before
the initialization of the `rule_to_components` attribute. Also, there is
no code in this ctor or elsewhere assuring that it won't load the
components data if they were already loaded.

In this commit, we won't call `_load_components` in the ctor, which will
leave the opportunity to `_get_new_loader` to swap the data as the comment
said.

This should bring a speed up about 5 seconds.
@jan-cerny jan-cerny added this to the 0.1.71 milestone Oct 12, 2023
@jan-cerny jan-cerny added the Infrastructure Our content build system label Oct 12, 2023
@github-actions
Copy link

Start a new ephemeral environment with changes proposed in this pull request:

Fedora Environment
Open in Gitpod

Oracle Linux 8 Environment
Open in Gitpod

@codeclimate
Copy link

codeclimate bot commented Oct 12, 2023

Code Climate has analyzed commit 354fdb4 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 33.3% (50% is the threshold).

This pull request will bring the total coverage in the repository to 57.0%.

View more on Code Climate.

@Mab879 Mab879 self-assigned this Oct 12, 2023
Copy link
Member

@Mab879 Mab879 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I this does result in a speed up and I have tested with build process with these chanages.

@Mab879
Copy link
Member

Mab879 commented Oct 17, 2023

I'm waving the code climate coverage as build process more than covers this.

@Mab879 Mab879 merged commit cd7df50 into ComplianceAsCode:master Oct 17, 2023
37 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Infrastructure Our content build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants