Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvements for file permission and ownership templates #8456

Conversation

marcusburghardt
Copy link
Member

@marcusburghardt marcusburghardt commented Mar 29, 2022

These templates used to hit performance issues when directories were
recursively assessed. The OVAL logic was slightly changed to improve the
performance around 50% or more for some rules.

This template used to hit performance issues when directories were
recursively assessed. The OVAL logic was slightly changed to improve the
performance around 50% or more for some rules.
@marcusburghardt marcusburghardt added the do-not-merge/work-in-progress Used by openshift-ci bot. label Mar 29, 2022
Like file_owner template, file_groupowner was also prone to hit performance
issues when directories were recursively assessed. The OVAL logic was slightly
changed to increase the performance for some rules.
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Used by openshift-ci bot. label Mar 29, 2022
@github-actions
Copy link

Start a new ephemeral environment with changes proposed in this pull request:

Open in Gitpod

OVAL assessment was working with include filter to collect file objects
and in a second step, checking these objects against a file state. This
approach was collecting all file objects regardless of their permissions.
This means that many objects may be evaluated unnecessarily.

The logic was changed to exclude already compliant files from the file
objects list. The permission check itself is expensive, limiting the
performance gain. Even so the performance was slightly improved.
@marcusburghardt
Copy link
Member Author

Three templates which are quite used by rules were improved in this PR, in order to improve the performance during the assessment. The performance gains may vary depending on each rule, but it was already possible to see more than 50% of time reduction while testing locally in a VM.

Here is an example of the results:

file_owner template before the changes:

$ time ./tests/test_suite.py rule --libvirt qemu:///session rhel85 --datastream build/ssg-rhel8-ds.xml --debug --dontclean file_ownership_library_dirs
INFO - xccdf_org.ssgproject.content_rule_file_ownership_library_dirs
INFO - Script missing_file_test.pass.sh using profile (all) OK
INFO - Script incorrect_owner.fail.sh using profile (all) OK
INFO - Script correct_owner.pass.sh using profile (all) OK
INFO - Script incorrect_owner_within_dir.fail.sh using profile (all) OK
INFO - Script incorrect_symlink.pass.sh using profile (all) OK

real    6m23.272s
user    0m54.390s
sys     0m4.517s

file_owner template after the changes:

$ time ./tests/test_suite.py rule --libvirt qemu:///session rhel85 --datastream build/ssg-rhel8-ds.xml --debug --dontclean file_ownership_library_dirs
INFO - xccdf_org.ssgproject.content_rule_file_ownership_library_dirs
INFO - Script incorrect_owner.fail.sh using profile (all) OK
INFO - Script correct_owner.pass.sh using profile (all) OK
INFO - Script missing_file_test.pass.sh using profile (all) OK
INFO - Script incorrect_owner_within_dir.fail.sh using profile (all) OK
INFO - Script incorrect_symlink.pass.sh using profile (all) OK

real    3m15.036s
user    0m32.963s
sys     0m2.096s

@marcusburghardt marcusburghardt changed the title Performance improvements for file_owner template Performance improvements for file permission and ownership templates Mar 29, 2022
@@ -13,23 +13,12 @@
{{% endif %}}
</criteria>
</definition>
{{%- if MISSING_FILE_PASS -%}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This OVAL template now doesn't contain this parameter at all. How does it work now?

The docs says:

 -   **missing_file_pass** - If set to `"true"` the OVAL check will 
     pass when file is absent. Default value is `"false"`.

I have an impression that it now doesn't work when this is set to false. When it's set to false and the given file doesn't exist the rule should fail. But with this PR change if the given file doesn't exist, the check_existence="none_exists" is satisfied, therefore the rule will pass. Is my understanding correct?

Now the question is how to solve this change. I assume that the missing_file_pass set to false makes sense only when looking for specific files and not for recursive search. For the recursive search (i.e. rules with recursive: true) we usually don't care if something exists. But, our problem is mainly for recursively searching rules because in these rules we have the problem of collecting large amounts of item. We don't need this optimization for rules that match a single file or a couple of file in the same directory. Therefore we could have 2 versions of the OVAL code: one for recursive searching with the filter optimization and one for the other cases. We will document that the missing_file_pass = false is mutually exclusive with recursive = true. WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should remove this parameter. I didn't propose to remove from documentation because it should also be removed from many rule.yml files and I don't think this is mandatory for now since a behavior change is not expected. We can gradually deprecate this parameter for these affected templates.

These templates intend to assess the permissions and ownership (uid and gid) of files. Doesn't make any sense to me this assessment for files which are not present. If a file actually doesn't exist, it is quite logical that it won't have problems with permissions and ownership. The idea of the parameter missing_file_pass only makes sense for templates and rules which actually want to asses file existence and not their properties, like file_existence template or the banner_etc_motd rule, for example.

Also, seems that this parameter was introduced in these templates due to the design of the old OVAL logic where an empty file_object would affect the results based on the check_existence= parameter. In other words, this parameter is useless for these templates if we used check_existence=none_exist for the file_test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Jan here, the rule passing when a specific file is absent is a behaviour changes we do not want.
We have a specific parameter that allows this behaviour, and I think we should keep it, (if we want to remove it, it should be done in master).

Doesn't make any sense to me this assessment for files which are not present.

I think it depends on the expectation of the rule. There may be rules that don't care if it is missing, while others may care if the file is missing.
From my PoV, this is a flexibility of the template that we want to keep.

@jan-cerny
Copy link
Collaborator

Measuring the time using ssg test suite isn't the best approach. When we discussed that we were concerned of 2 specific things:

  • size of the generated OVAL results
  • time of the XML validation

@mildas
Copy link
Contributor

mildas commented Mar 30, 2022

E8 profile
Stabilization branch - 94MB OVAL results, XML validation 7.5h
This PR - 32MB OVAL results, XML validation 24min

It is much better now. Not so great as it was in previous release - 7MB results and <1min validation - but I don't think these old values are realistic anymore.

@marcusburghardt
Copy link
Member Author

Measuring the time using ssg test suite isn't the best approach. When we discussed that we were concerned of 2 specific things:

  • size of the generated OVAL results
  • time of the XML validation

The size of the OVAL results generated for these affected templates directly reflects the total time, both to generate the file and to process the XML. Therefore, I preferred to report the total time here. But in fact, XML file sizes have also been reduced following a direct proportion of the total time which, according to my local tests during the development, varies for each rule.

@yuumasato yuumasato added this to the 0.1.61 milestone Mar 30, 2022
@yuumasato yuumasato self-assigned this Mar 30, 2022
Copy link
Member

@yuumasato yuumasato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The perfomance improvement we were looking for was not so much on the OVAL evaluations, but on the number of objects collected.

It is clear that the exclude filters are the way to do that, :)

Now, I'd prefer to keep the behaviour of missing_file_pass and allow_stricter_permission in the templates for now. If we want to remove them, we should do so in master branch first, this Pr should focus on fixing the performance bug.

Copy link
Member

@yuumasato yuumasato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The object collection improvements in this PR are good, but there is a change in behaviour regading missing_file_pass.
Now, the rules will pass when the file doesn't exist. This may be not ideal as expectations around the rule may differ, and checking the existence of the file becomes an issue.

But after discussing with @marcusburghardt and @Mab879, we agree that the existence of the file we are checking the owner / groupowner /permissions of should be done as an applicability check.
If the file exists, we check their properties, if the file doesn't exist the rule results in not applicable.
This ultimately avoids resulting pass when the file we are checking doesn't exist, and doesn't add the complexity and weight of a second OVAL to the template.

@yuumasato yuumasato added bugfix Fixes to reported bugs. Update Rule Issues or pull requests related to Rules updates. labels Mar 30, 2022
@yuumasato yuumasato merged commit 353bda6 into ComplianceAsCode:stabilization-v0.1.61 Mar 30, 2022
@marcusburghardt marcusburghardt deleted the recursive_find_performance branch March 30, 2022 17:19
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Fixes to reported bugs. Update Rule Issues or pull requests related to Rules updates.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants