
cached cloudwatch log group tags #540

Merged
merged 19 commits into master on Mar 17, 2022
Conversation

jvanbrie
Contributor

@jvanbrie jvanbrie commented Mar 10, 2022

What does this PR do?

This PR separates out our caching logic and creates a new cache specifically for CloudWatch log group tags.

Motivation

The caching logic was built around the Lambda custom tags use case, which isn't a great fit for future caching use cases. In addition, we've gotten requests for support for CloudWatch log group tags.

Testing Guidelines

Lambdas are running this new code just fine. We can see the S3 caches updating as well.

Additional Notes

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)
  • This PR passes the unit tests
  • This PR passes the installation tests (ask a Datadog member to run the tests)

@jvanbrie jvanbrie changed the title cached log tags cached cloudwatch log group tags Mar 14, 2022
@jvanbrie jvanbrie marked this pull request as ready for review March 14, 2022 14:39
@dylanburati dylanburati left a comment

This looks good to me.

    DD_S3_BUCKET_NAME, self.CACHE_LOCK_FILENAME
)
try:
    file_content = cache_lock_object.get()
Contributor

I'm not sure I fully understand how this S3 file is being used as a lock. My thought would be that there would be something to block on acquiring the lock, but I'm not seeing that.

Contributor Author

To acquire the lock we create a lock s3 file, and then to remove the lock we delete it. If the file exists we know that the lock is active and there is a lambda running that's in the process of updating the tags. If the lock is active and we didn't grab it, then we don't need to update the tags in the current running lambda process as they'll be updated in the process that grabbed the lock.
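The create-to-acquire, delete-to-release scheme described above can be sketched as follows. This is a minimal illustration only: an in-memory dict stands in for the S3 bucket, and the class and method names (`S3CacheLock`, `acquire`, `release`) are hypothetical, not the PR's actual identifiers. The real forwarder would use boto3 object put/get/delete calls against the `DD_S3_BUCKET_NAME` bucket.

```python
# Minimal sketch of a non-blocking lock backed by the existence of a file.
# A plain dict simulates the S3 bucket; names here are illustrative.
class S3CacheLock:
    LOCK_KEY = "cache.lock"

    def __init__(self, bucket):
        self.bucket = bucket  # dict simulating an S3 bucket

    def acquire(self):
        # Non-blocking: if the lock object already exists, another Lambda
        # invocation is refreshing the cache, so this one skips the refresh.
        if self.LOCK_KEY in self.bucket:
            return False
        self.bucket[self.LOCK_KEY] = b"locked"
        return True

    def release(self):
        # Deleting the lock object releases the lock.
        self.bucket.pop(self.LOCK_KEY, None)
```

The key design point is that `acquire` never blocks: an invocation that fails to grab the lock simply serves stale tags and moves on, trusting the lock holder to refresh the cache.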

@jvanbrie jvanbrie mentioned this pull request Mar 14, 2022
"""
new_tags = {}
for log_group in self.tags_by_id.keys():
new_tags[log_group] = get_log_group_tags(log_group)
Contributor

Shouldn't we return False (the first var) if there was an exception here, similar to LambdaCustomTagsCache?

I think if there was an exception from get_log_group_tags, we'd overwrite as empty tags which we don't want to do

Contributor Author

Here we're making multiple API calls to refill the cache, so I don't want to return False and skip writing to the cache if just one of those API calls fails. I suppose if all the API calls fail we could return False here and skip writing to the S3 cache.
If an API call fails, we can take the value we have stored in the local cache to avoid overwriting tags on a failed call.
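The fallback described above can be sketched as a pure function; this is illustrative only, assuming a fetch callable in the shape of `get_log_group_tags` that may raise on API failure (the names `refresh_tags` and `fetch` are hypothetical).

```python
# Sketch: refill the cache, but keep the previously cached value for any
# log group whose API call fails, instead of overwriting it with nothing.
def refresh_tags(cached, fetch):
    new_tags = {}
    for log_group, old_tags in cached.items():
        try:
            new_tags[log_group] = fetch(log_group)
        except Exception:
            # Fall back to the local cache on a failed API call.
            new_tags[log_group] = old_tags
    return new_tags
```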

log_group_tags = self.tags_by_id.get(log_group, None)
if log_group_tags is None:
    log_group_tags = get_log_group_tags(log_group)
    self.tags_by_id[log_group] = log_group_tags
Contributor

Is this needed? I think we're updating self.tags_by_id in the _refresh function

Contributor Author

@jvanbrie jvanbrie Mar 15, 2022

Right, we'll get here if we encounter a log group that isn't in self.tags_by_id, i.e. a log group we haven't gotten logs from before. In that case we grab tags for it and add it temporarily to self.tags_by_id; it'll get added to the S3 cache when we refresh. This lets us get tags for it right away rather than waiting for a refresh, or refreshing all tags whenever we get one new log group.
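The on-demand path described above can be sketched as follows (function and parameter names are illustrative, not the PR's actual identifiers): a log group missing from the local cache gets its tags fetched immediately and held locally until the next S3 refresh persists them.

```python
# Sketch: serve tags from the local cache, fetching on demand for a log
# group we haven't seen before; the entry is kept locally and persisted
# to the S3 cache on the next scheduled refresh.
def get_tags(tags_by_id, log_group, fetch):
    tags = tags_by_id.get(log_group)
    if tags is None:
        tags = fetch(log_group)
        tags_by_id[log_group] = tags  # held locally until the next refresh
    return tags
```

This avoids the two bad extremes: waiting for the next refresh to tag logs from a new log group, or refreshing every log group's tags each time a single new one appears.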

@magnetikonline
Contributor

Just to echo my comments in #531 (comment) - it seems that the existing DD_FETCH_LAMBDA_TAGS argument will be the only way to opt out of this feature?

@jvanbrie
Contributor Author

jvanbrie commented Mar 16, 2022

Yep, all tag collection will be controlled by that argument.
Edit: Just made a change to separate out control over both tag fetches.

Contributor

@hghotra hghotra left a comment

LGTM 👍🏼

@magnetikonline
Contributor

Edit: Just made a change to separate out control over both tag fetches.

Wonderful - thanks @jvanbrie - reviewed that commit, makes this whole PR opt-in 👍

jvanbrie and others added 3 commits March 17, 2022 13:42
Co-authored-by: Peter Mescalchin <peter@magnetikonline.com>
@jvanbrie jvanbrie merged commit 48581da into master Mar 17, 2022
@jvanbrie jvanbrie deleted the jon.vanbriesen/cached_log_tags branch March 17, 2022 18:13
@magnetikonline
Contributor

@jvanbrie did you cut the release right? I can't see any of your changes between the latest release and the previous one?

aws-dd-forwarder-3.42.0...aws-dd-forwarder-3.43.0

Maybe I'm just having a slow day here? 😄.

@jvanbrie
Contributor Author

Something is definitely fishy here. Looks like the changes didn't make it into that release. I'll cut a new release tomorrow and see if that has the changes.

@jvanbrie
Contributor Author

Thanks for bringing this up, just released a new version and it grabbed the changes this time. aws-dd-forwarder-3.43.0...aws-dd-forwarder-3.44.0

@magnetikonline
Contributor

Thanks for bringing this up, just released a new version and it grabbed the changes this time.

No dramas @jvanbrie - agreed - can definitely see the changes in there now!
