Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: include metrics about number of custom rules [CFG-1133] #121

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

teodora-sandu
Copy link
Contributor

What this does

This PR implements a way to get the rules that were written by a customer using the custom rules SDK. The publicId of the rule is what dictates the uniqueness/name of the rule, so we needed a way to parse the Rego files.

Notes for the reviewer

  • OPA introduced a new command called inspect which helps us find all the rego rules in a folder: https://github.com/open-policy-agent/opa/blob/main/cmd/inspect.go
  • The majority of the code is not exposed in the Golang module so we are adapting a couple lines to read all the rego rules in a folder.
  • The inspect command only looks at files and not the contents - but we need a way to get the publicId
  • The way we get the publicId is by reading each file and using a regex to extract the publicId which would be configured in a publicId key - the regex allows for any spacing so that no matter what tabbing and newline characters the customer uses, we will retrieve it
  • The rules are computed and then their number is added to a temporary metadata.json file, which OPA includes then in the data.json as seen in the screenshot
    Screenshot 2021-12-13 at 17 20 51
  • The rules section in the data.json file is generated automatically by OPA if there are json files, which there are under the fixtures/ folder used for testing

More information

@teodora-sandu teodora-sandu requested a review from a team as a code owner December 13, 2021 17:49
@teodora-sandu teodora-sandu force-pushed the chore/add-number-of-rules-metadata branch from 34bec46 to 0fa0b4a Compare December 13, 2021 17:50
@@ -30,6 +32,19 @@ type BuildCommandParams struct {
func RunBuild(args []string, params *BuildCommandParams) error {
buf := bytes.NewBuffer(nil)

var metadataFile = ""
rules, err := util.RetrieveRules(args)
if err == nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an error we don't want to cause problems at build time since this is just metadata

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we account for this in the CLI and dashboards, I think we're fine.

@teodora-sandu teodora-sandu force-pushed the chore/add-number-of-rules-metadata branch from 0fa0b4a to 798cc13 Compare December 13, 2021 17:52
Copy link
Contributor

@aron aron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rules section in the data.json file is generated automatically by OPA if there are json files, which there are under the fixtures/ folder used for testing

This made me stop, we don't want to be including fixture data in the data.json. These should be added to the list of file ignores when we do the build. If these fixtures are not present does this mean the data.json is not created, regardless of the precense of the metadata file? If so we might want to consider another approach.

In the same vein, can we think of any usecases where someone would want their own metadata.json file as part of their own custom rules? In which case we're going to override it.

I wonder if we should just consider writing a .snyk-metadata.json file and adding that to the tarball generated by OPA build?

Comment on lines 82 to 86
if metadataFile != "" {
_ = os.Remove(metadataFile)
}

return out.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both this os.Remove() call and the out.Close() I think would be better served to live in a defer block, that way we ensure they're run regardless of whether this function returns early or not.

util/inspect.go Outdated
Comment on lines 38 to 43
for _, module := range result.Modules {
fileName := filepath.Clean(module.Name)
if !strings.Contains(fileName, "_test.rego") {
fileNames = append(fileNames, fileName)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this filepath not also contain the rule id? Would this be a cleaner alternative to trying to extract the rule from the Rego file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does, but there is nothing stopping customers from changing the publicId while keeping the name of the folder unrelated to it. It depends on how strict we want to be but it seemed very likely to me that this could happen

util/inspect.go Outdated
}

func extractPublicIdFromRego(rego string) string {
re := regexp.MustCompile("\"publicId\"\\s*:\\s*\"(.*?)\"")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On first read this felt fragile to me, I was going to suggest parsing the Rego file here and then traversing the AST to find this value but honestly I reckon this will work most of the time.

@@ -30,6 +32,19 @@ type BuildCommandParams struct {
func RunBuild(args []string, params *BuildCommandParams) error {
buf := bytes.NewBuffer(nil)

var metadataFile = ""
rules, err := util.RetrieveRules(args)
if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as we account for this in the CLI and dashboards, I think we're fine.

@teodora-sandu
Copy link
Contributor Author

teodora-sandu commented Dec 14, 2021

Thanks @aron, I think naming this file .snyk-metadata.json is safer. As to the fixture files being included in the data.json, not sure why that's an issue? We can update the ignores to include fixture files though and the file we're generating here will still be included so it wouldn't be a problem. I'm nervous about modifying the tarball generated by OPA but can do some testing to see what happens. I can't think of a reason why letting OPA include the json file in data.json is not okay though. Let me know what you think 🙂

@aron
Copy link
Contributor

aron commented Dec 14, 2021

As to the fixture files being included in the data.json, not sure why that's an issue? We can update the ignores to include fixture files though and the file we're generating here will still be included so it wouldn't be a problem.

My understanding is that data.json is intended to be loaded alongside the bundled rules and available in the rules. We don't currently support this in our CLI but might find we need to as custom rules are developed to use more of the OPA functionality. This would mean that fixture data would be injected into the rego evaluation context and available via data.foo. Though worth checking with @p15r to understand more.

I can't think of a reason why letting OPA include the json file in data.json is not okay though.

Same reason as above really, we'd be potentially injecting a numberOfRules property into the rego data used by the bundle. Not likely a huge issue but definitely a side effect of what we'd intended.

@aron
Copy link
Contributor

aron commented Dec 14, 2021

image

This is related and interesting, from that same [doc](https://www.openpolicyagent.org/docs/latest/management-bundles/#bundle-file-format ).

@p15r
Copy link

p15r commented Dec 14, 2021

This would mean that fixture data would be injected into the rego evaluation context and available via data.foo.

I think only data that must be evaluated should be in data.. I see a potential issue with conflicting "namespaces". We can't/don't control the ruleIDs that customers are using. If they use ruleIDs with the same names we use for some other data, we have a conflict. All our library code is shared via data. as well and we rely on that names/namespaces in our policies.
Therefore, I think it is worth exploring @aron's suggestion of using the .manifest file instead.

@teodora-sandu
Copy link
Contributor Author

I've had a look at using that .manifest file and I got it working by using bundle mode to generate the bundle (-b option), but currently this option does not work with ignores. I'll split this PR in two:

  1. To add the RetrieveRules functionality and unblock our other tasks
  2. To write to the .manifest file - I will try to fix the bug in OPA downstream in the meantime as well

@teodora-sandu teodora-sandu changed the base branch from develop to chore/add-inspect-utility December 15, 2021 11:33
@teodora-sandu teodora-sandu force-pushed the chore/add-number-of-rules-metadata branch from 108ce61 to a925f51 Compare December 15, 2021 11:34
@teodora-sandu teodora-sandu marked this pull request as draft December 16, 2021 14:20
Base automatically changed from chore/add-inspect-utility to develop December 17, 2021 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants