Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding log group tags #531

Merged
merged 2 commits into from
Mar 4, 2022
Merged

Conversation

jvanbrie
Copy link
Contributor

@jvanbrie jvanbrie commented Mar 4, 2022

What does this PR do?

This pr adds support for cloudwatch log group tags. These tags will now get attached to the logs sent to datadog.

Motivation

Customer requests

Testing Guidelines

Created new lambda function with this code, created some custom tags on a few log groups, and confirmed those tags get attached to the logs that make their way to Datadog. Unit and integration tests pass as well.

Additional Notes

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)
  • This PR passes the unit tests
  • This PR passes the installation tests (ask a Datadog member to run the tests)

logger.exception(f"Failed to get log group tags due to {e}")
if response is not None:
formatted_tags = [
"{key}:{value}".format(key=k, value=v) if v else k
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get this ternary operator. Why would we set this to k if not v?

Copy link
Contributor Author

@jvanbrie jvanbrie Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to capture how AWS tags work with keys and values. If the value isn't present we don't want to send "KEY:", we want to just send "KEY"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing indenting, but the list comprehension saying for each item, format if v exists, else just return k

format(k, v) if v else k for k, v in items().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand whats going on but not why

So in this case, is k the entire tag? If its a bad format, I'd think we'd exclude it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, k is the whole tag in this case. It's just like tags in datadog without a ":" - a "just key" tag as opposed to a "key:value" tage

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the tags cached? There could be a lot of calls to retrieve the log group tags - resulting in a lot of errors - any thought to handling HTTP 429 errors and providing a backoff/retry strategy?

@jvanbrie jvanbrie merged commit 63d1fda into master Mar 4, 2022
@jvanbrie jvanbrie deleted the jon.vanbriesen/aws-add-log-group-tags branch March 4, 2022 20:43
@rl-michaelbrandeis
Copy link

rl-michaelbrandeis commented Mar 5, 2022

@jvanbrie did datadog update something here and not publish a new forwarder lambda? we are getting an error

not authorized to perform: lambda:GetLayerVersion on resource: arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Forwarder:12 because no resource-based policy allows the lambda:GetLayerVersion action

See this same error from 2021

#493

@rl-michaelbrandeis
Copy link

Forgive me for intruding on your PR here. But it appears your have put the boto3 client connection inside the lambda handler so it will have to reconnect on every invoke, rather than caching a session - and calling a tag lookup on every invoke will surely hit throttling limits and cause issues in an account. The Datadog forwarder uses a LambdaTagsCache in other places to optimize handling of tags already. This seems like a regression and not an ideal solution

@magnetikonline
Copy link
Contributor

magnetikonline commented Mar 6, 2022

The Datadog forwarder uses a LambdaTagsCache in other places to optimize handling of tags already. This seems like a regression and not an ideal solution

I'm just reviewing this PR too @rl-michaelbrandeis - note this PR adds the pulling of CloudWatch Log Group tags - not Lambda tags. The abilities added are really nice and well worth it (I do question if it should be optional though).

I tend to agree though - creating a boto3.client("logs") client and then calling cloudwatch_logs_client.list_tags_log_group() on every invoke of the forwarder Lambda seems excessive?

I would have thought the CloudWatch log group tags could be cached in the same way the Lambda function tags are cached - by using a JSON object within the nominated S3 bucket for the forwarder itself.

@magnetikonline
Copy link
Contributor

@jvanbrie / @sabiurr something for consideration? Caching? ☝️

@jvanbrie
Copy link
Contributor Author

jvanbrie commented Mar 7, 2022

Yep, we're making a few changes to this pr here. The layer version issue has been fixed and we're looking into some caching options for this in the meantime. This pr isn't in the current released version. For now we've reverted the change.

jvanbrie added a commit that referenced this pull request Mar 7, 2022
jvanbrie added a commit that referenced this pull request Mar 7, 2022
@magnetikonline
Copy link
Contributor

Thanks @jvanbrie / @sabiurr - appreciate the fast replies and turnaround 👍 Will keep an eye out for next releases. 👍

@jvanbrie
Copy link
Contributor Author

jvanbrie commented Mar 14, 2022

Hey everyone, I made a new pr to address some of these concerns: #540
This pr changes three major things from this implementation.

  1. We now cache these cloudwatch log group tags (so no api call per invoke)
  2. You can now configure the tag collection (turn it on/off and configure refresh time)
  3. No boto reconnect on each invoke

Feel free to take a look - I'll leave the pr up for a few days before looking to release.

@magnetikonline
Copy link
Contributor

magnetikonline commented Mar 15, 2022

@jvanbrie (sorry, maybe I should have commented on #540) but on this point:

You can now configure the tag collection (turn it on/off and configure refresh time)

I couldn't see changes in that PR that allowed tag collection to be switched on/off for CloudWatch log groups?

Otherwise looked good.

@dung-leviethoang-mox
Copy link

dung-leviethoang-mox commented Mar 15, 2022

I want to report that with the current logic, the lambda assumes the log group is in the same account.
If I want centralize account contain the forwarder lambda, all log groups in several accounts use the lambda as targets for their log group subscriptions, it will throw below error:
Failed to get log group tags due to An error occurred (ResourceNotFoundException) when calling the ListTagsLogGroup operation: The specified log group does not exist.
Here's the issue I'm talking about #541

Just like @magnetikonline mentioned, would be better if we can have a switch to allow tag collection or not.

@jvanbrie
Copy link
Contributor Author

You can control tag collection through the environment variable DD_FETCH_LAMBDA_TAGS. This being set to false will disable tag collection in the lambda: https://github.com/DataDog/datadog-serverless-functions/pull/540/files#diff-023a52190a3595e5875c43668188d53b819c2c966ac78be21408d516391b5679R188

I do not know how involved supporting a single lambda for multiple accounts is, but that is not a goal of this pr. If you want logs from multiple accounts with tags for log groups you can make a lambda in each account. If you don't want to make a lambda per-account you can disable tag collection for the lambda to avoid errors.

@magnetikonline
Copy link
Contributor

You can control tag collection through the environment variable DD_FETCH_LAMBDA_TAGS. This being set to false will disable tag collection in the lambda: https://github.com/DataDog/datadog-serverless-functions/pull/540/files#diff-023a52190a3595e5875c43668188d53b819c2c966ac78be21408d516391b5679R188

Ah thanks @jvanbrie - I was hoping only your new CloudWatch log group tag collection could be disabled - basically retaining the existing behaviour before the PR.

I'd argue that DD_FETCH_LAMBDA_TAGS is now a confusing option name in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants