Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor build_cpe.py #9834

Merged
merged 10 commits into from
Nov 23, 2022
Merged

Conversation

jan-cerny
Copy link
Collaborator

This PR adds some style improvements proposed in #9799:

  • Move function is_parametrized to Symbol class
  • Extract function recurse_or_substitute_or_do_nothing

For more details, please read commit messages of each commit.

@github-actions
Copy link

Start a new ephemeral environment with changes proposed in this pull request:

Fedora Environment
Open in Gitpod

Oracle Linux 8 Environment
Open in Gitpod

Copy link
Member

@matejak matejak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moves in the right direction towards the project guidelines, but it is still possible to improve with little effort. I have pointed out a couple of simple changes.

@@ -112,6 +112,10 @@ def version_definitions(self):
def name(self):
return self.requirement.project_name

@staticmethod
def _is_parametrized(ref):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the method shouldn't be private, as you intend it to be used by other callers, and ref doesn't give a hint about what is expected as parameter. In fact, the naming and usage Symbol.is_parametrized is misleading, because it's not the symbol and its parametrization that you are evaluating - instead, you are looking at a string that represents a platform unit. So please give a thought about a name of the method and of the argument as well.

ssg/utils.py Outdated
new_dict[k] = v.format(**string_dict)
else:
new_dict[k] = v
if ignored_keys is None or k not in ignored_keys:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to make ignored_keys an empty list/set if it is not given, and to do so at the top of the function. That way, the if statement may be made human-readable.

ssg/utils.py Outdated
@@ -333,7 +333,16 @@ def enum(*args):
return type('Enum', (), enums)


def apply_formatting_on_dict_values(source_dict, string_dict, ignored_keys=[]):
def recurse_or_substitute_or_do_nothing(v, string_dict, ignored_keys):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def recurse_or_substitute_or_do_nothing(v, string_dict, ignored_keys):
def recurse_or_substitute_or_do_nothing(v, string_dict, ignored_keys=frozenset()):

This way you won't have to check for None. It'll retain the iterable type and won't be affected by the "mutable default argument" problem.

@Mab879 Mab879 added the Infrastructure Our content build system label Nov 17, 2022
@openshift-merge-robot openshift-merge-robot added the needs-rebase Used by openshift-ci bot. label Nov 19, 2022
This method doesn't use the `self` argument and we don't know
what the `ref` parameter represents, therefore the `Symbol` class
is a better place to for this method.
This refactoring helps with passing a mutable object as a default
argument which is potentially dangerous - it is a global object, so if
the function adds anything to it during the execution, then subsequent
calls won't have a default of an empty list. For that reason, it is
common to have default values e.g. `None` and to have statements `s.a.
ignored_keys = ignored_keys or []` in the function.  This function is
split into a part that handles the outer loop and ignored keys, and
another part that handles the processing of individual values
`recurse_or_substitute_or_do_nothing`.
We will remove the leading underscore because we plan to not use it as a
private method in the future. Also, we will rename the method because it
might be misleading that we use the method to parametrize the `Symbol`
class but we are using that to evaluate strings that represent platform
units.  Moreover, the `ref` variable name wasn't complete so it didn't
give a hint about the expected parameter so we will rename it as well.
It's better to make `ignored_keys` an empty list/set if it is not given,
and to do so at the top of the function. That way, the `if` statement
may be made human-readable.
@jan-cerny
Copy link
Collaborator Author

I have rebased this PR on the top of the latest upstream master branch. I have used frozenset as a default argument, I have renamed method _is_parametrized to reference_is_parametrized and then set ignored_keys to an empty list explicitly.

return v.format(**string_dict)
else:
return v

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evgenyz Where the frozenset should be after the refactoring to make it work as expected? Should it be in recurse_or_substitute_or_do_nothing or in the apply_formatting_on_dict_values?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preferably in both places, so the default argument to ignored_keys is an empty iterable that is not mutable. That way, you even won't have to set ignored_keys to that empty list.
However, the decomposition into those two functions in this form was not such great idea, because the recurse_or_substitute_or_do_nothing accepts ignored_keys that it doesn't care about - it just passes those on, and it also hardcodes "recursion" to calling apply_formatting_on_dict_values, which creates a circular dependency. So I would say that having seen this, I would either revert to the original code, or if you became sold to the idea of decomposing handling of container and handling of items, I would pass recurse_or_substitute_or_do_nothing a callback, e.g. lambda value, substitutions: recurse_or_substitute_or_do_nothing(value, substitutions, ignored_keys) that would be used in the item-processing function if the item passed is a dictionary.
So sorry, suggesting the decomposition like this was a mistake.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say every place where ignored_keys is an optional argument, so — both.

ssg/utils.py Outdated
"""
Uses Python built-in string replacement.
It replaces strings marked by {token} if "token" is a key in the string_dict parameter.
It skips keys in source_dict which are listed in ignored_keys parameter.
This works only for dictionaries whose values are dicts or strings
"""
new_dict = {}
if ignored_keys is None:
ignored_keys = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When ignored_keys=frozenset() these 2 lines won't be needed.

@@ -93,6 +93,10 @@ def arg(self):
def name(self):
return self.requirement.project_name

@staticmethod
def reference_is_parametrized(reference):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with naming still remains - when considering that Symbol class, what is supposed to be this reference argument that the method accepts?

The `ref` variable can contain either CPE ID (eg. `rhel7`) or a CPE
name (eg. `cpe:/o:redhat:enterprise_linux:7`). To better describe
its contents we will rename the variable to `cpe_id_or_name`.
@jan-cerny
Copy link
Collaborator Author

I have renamed a variable ref, I have also renamed the method reference_is_paramaterized to cpe_id_is_parametrized and changed ignored_keys to a frozenset.

@matejak matejak self-assigned this Nov 21, 2022
@@ -84,19 +84,17 @@ def add_cpe_item(self, cpe_item):
def _is_name(self, ref):
return ref.startswith("cpe:")

def _is_parametrized(self, ref):
return re.search(r'^\w+\[\w+\]$', ref)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By applying this change, the re module is no longer needed and does not need to be imported.

ssg/build_cpe.py Outdated
try:
if self._is_name(ref):
return self.cpes_by_name[ref]
if self._is_name(cpe_id_or_name):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matejak Should the _is_name be moved to Symbol class as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, the class ProductCPEs definitely isn't the one that has the knowledge about how data structures format stuff themselves.
The naming is_name is also confusing - for example, to me "Jan" is a name, whereas human:Jan is not what I would identify as name, it's perhaps some sort of an ID.

@jan-cerny
Copy link
Collaborator Author

I have removed the unused re import and I have move the is_name method to the Symbol class.

@codeclimate
Copy link

codeclimate bot commented Nov 22, 2022

Code Climate has analyzed commit b4233e8 and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 47.2% (0.1% change).

View more on Code Climate.

@matejak
Copy link
Member

matejak commented Nov 23, 2022

The renaming is a huge success, thanks!

@matejak matejak merged commit 7657a78 into ComplianceAsCode:master Nov 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Infrastructure Our content build system
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants