adding log group tags #531

jvanbrie · 2022-03-04T18:53:18Z

What does this PR do?

This pr adds support for cloudwatch log group tags. These tags will now get attached to the logs sent to datadog.

Motivation

Customer requests

Testing Guidelines

Created new lambda function with this code, created some custom tags on a few log groups, and confirmed those tags get attached to the logs that make their way to Datadog. Unit and integration tests pass as well.

Additional Notes

Types of changes

Bug fix
New feature
Breaking change
Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

This PR's description is comprehensive
This PR contains breaking changes that are documented in the description
This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
This PR impacts documentation, and it has been updated (or a ticket has been logged)
This PR's changes are covered by the automated tests
This PR collects user input/sensitive content into Datadog
This PR passes the integration tests (ask a Datadog member to run the tests)
This PR passes the unit tests
This PR passes the installation tests (ask a Datadog member to run the tests)

sabiurr · 2022-03-04T19:11:15Z

aws/logs_monitoring/parsing.py

+        logger.exception(f"Failed to get log group tags due to {e}")
+    if response is not None:
+        formatted_tags = [
+            "{key}:{value}".format(key=k, value=v) if v else k


I don't really get this ternary operator. Why would we set this to k if not v?

This is to capture how AWS tags work with keys and values. If the value isn't present we don't want to send "KEY:", we want to just send "KEY"

It's confusing indenting, but the list comprehension saying for each item, format if v exists, else just return k

format(k, v) if v else k for k, v in items().

I think I understand whats going on but not why

So in this case, is k the entire tag? If its a bad format, I'd think we'd exclude it

Yeah, k is the whole tag in this case. It's just like tags in datadog without a ":" - a "just key" tag as opposed to a "key:value" tage

Are the tags cached? There could be a lot of calls to retrieve the log group tags - resulting in a lot of errors - any thought to handling HTTP 429 errors and providing a backoff/retry strategy?

rl-michaelbrandeis · 2022-03-05T04:43:46Z

@jvanbrie did datadog update something here and not publish a new forwarder lambda? we are getting an error

not authorized to perform: lambda:GetLayerVersion on resource: arn:aws:lambda:us-east-1:464622532012:layer:Datadog-Forwarder:12 because no resource-based policy allows the lambda:GetLayerVersion action

See this same error from 2021

#493

rl-michaelbrandeis · 2022-03-05T04:56:42Z

Forgive me for intruding on your PR here. But it appears your have put the boto3 client connection inside the lambda handler so it will have to reconnect on every invoke, rather than caching a session - and calling a tag lookup on every invoke will surely hit throttling limits and cause issues in an account. The Datadog forwarder uses a LambdaTagsCache in other places to optimize handling of tags already. This seems like a regression and not an ideal solution

datadog-serverless-functions/aws/logs_monitoring/enhanced_lambda_metrics.py

Line 119 in 1acfca2

class LambdaTagsCache(object):

magnetikonline · 2022-03-06T22:44:53Z

The Datadog forwarder uses a LambdaTagsCache in other places to optimize handling of tags already. This seems like a regression and not an ideal solution

I'm just reviewing this PR too @rl-michaelbrandeis - note this PR adds the pulling of CloudWatch Log Group tags - not Lambda tags. The abilities added are really nice and well worth it (I do question if it should be optional though).

I tend to agree though - creating a boto3.client("logs") client and then calling cloudwatch_logs_client.list_tags_log_group() on every invoke of the forwarder Lambda seems excessive?

I would have thought the CloudWatch log group tags could be cached in the same way the Lambda function tags are cached - by using a JSON object within the nominated S3 bucket for the forwarder itself.

magnetikonline · 2022-03-06T22:48:58Z

@jvanbrie / @sabiurr something for consideration? Caching? ☝️

jvanbrie · 2022-03-07T16:51:11Z

Yep, we're making a few changes to this pr here. The layer version issue has been fixed and we're looking into some caching options for this in the meantime. This pr isn't in the current released version. For now we've reverted the change.

This reverts commit 63d1fda.

magnetikonline · 2022-03-07T23:16:27Z

Thanks @jvanbrie / @sabiurr - appreciate the fast replies and turnaround 👍 Will keep an eye out for next releases. 👍

jvanbrie · 2022-03-14T20:04:53Z

Hey everyone, I made a new pr to address some of these concerns: #540
This pr changes three major things from this implementation.

We now cache these cloudwatch log group tags (so no api call per invoke)
You can now configure the tag collection (turn it on/off and configure refresh time)
No boto reconnect on each invoke

Feel free to take a look - I'll leave the pr up for a few days before looking to release.

magnetikonline · 2022-03-15T02:58:20Z

@jvanbrie (sorry, maybe I should have commented on #540) but on this point:

You can now configure the tag collection (turn it on/off and configure refresh time)

I couldn't see changes in that PR that allowed tag collection to be switched on/off for CloudWatch log groups?

Otherwise looked good.

dung-leviethoang-mox · 2022-03-15T05:04:44Z

I want to report that with the current logic, the lambda assumes the log group is in the same account.
If I want centralize account contain the forwarder lambda, all log groups in several accounts use the lambda as targets for their log group subscriptions, it will throw below error:
Failed to get log group tags due to An error occurred (ResourceNotFoundException) when calling the ListTagsLogGroup operation: The specified log group does not exist.
Here's the issue I'm talking about #541

Just like @magnetikonline mentioned, would be better if we can have a switch to allow tag collection or not.

jvanbrie · 2022-03-15T15:44:51Z

You can control tag collection through the environment variable DD_FETCH_LAMBDA_TAGS. This being set to false will disable tag collection in the lambda: https://github.com/DataDog/datadog-serverless-functions/pull/540/files#diff-023a52190a3595e5875c43668188d53b819c2c966ac78be21408d516391b5679R188

I do not know how involved supporting a single lambda for multiple accounts is, but that is not a goal of this pr. If you want logs from multiple accounts with tags for log groups you can make a lambda in each account. If you don't want to make a lambda per-account you can disable tag collection for the lambda to avoid errors.

magnetikonline · 2022-03-16T05:26:03Z

You can control tag collection through the environment variable DD_FETCH_LAMBDA_TAGS. This being set to false will disable tag collection in the lambda: https://github.com/DataDog/datadog-serverless-functions/pull/540/files#diff-023a52190a3595e5875c43668188d53b819c2c966ac78be21408d516391b5679R188

Ah thanks @jvanbrie - I was hoping only your new CloudWatch log group tag collection could be disabled - basically retaining the existing behaviour before the PR.

I'd argue that DD_FETCH_LAMBDA_TAGS is now a confusing option name in that case.

jvanbrie added 2 commits March 4, 2022 13:31

adding log group tags

7a25e89

style fixes

b20606a

sabiurr reviewed Mar 4, 2022

View reviewed changes

sabiurr approved these changes Mar 4, 2022

View reviewed changes

jvanbrie merged commit 63d1fda into master Mar 4, 2022

jvanbrie deleted the jon.vanbriesen/aws-add-log-group-tags branch March 4, 2022 20:43

jvanbrie added a commit that referenced this pull request Mar 7, 2022

Revert "adding log group tags (#531)"

46438b9

This reverts commit 63d1fda.

jvanbrie mentioned this pull request Mar 7, 2022

Revert "adding log group tags" #538

Merged

jvanbrie added a commit that referenced this pull request Mar 7, 2022

Revert "adding log group tags (#531)" (#538)

160a7d1

This reverts commit 63d1fda.

magnetikonline mentioned this pull request Mar 16, 2022

cached cloudwatch log group tags #540

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

adding log group tags #531

adding log group tags #531

jvanbrie commented Mar 4, 2022

sabiurr Mar 4, 2022

jvanbrie Mar 4, 2022 •

edited

Loading

buckhx Mar 4, 2022

sabiurr Mar 4, 2022

jvanbrie Mar 4, 2022

Ashley-Smith Mar 7, 2022

rl-michaelbrandeis commented Mar 5, 2022 •

edited

Loading

rl-michaelbrandeis commented Mar 5, 2022

magnetikonline commented Mar 6, 2022 •

edited

Loading

magnetikonline commented Mar 6, 2022

jvanbrie commented Mar 7, 2022 •

edited

Loading

magnetikonline commented Mar 7, 2022

jvanbrie commented Mar 14, 2022 •

edited

Loading

magnetikonline commented Mar 15, 2022 •

edited

Loading

dung-leviethoang-mox commented Mar 15, 2022 •

edited

Loading

jvanbrie commented Mar 15, 2022

magnetikonline commented Mar 16, 2022

adding log group tags #531

adding log group tags #531

Conversation

jvanbrie commented Mar 4, 2022

What does this PR do?

Motivation

Testing Guidelines

Additional Notes

Types of changes

Check all that apply

sabiurr Mar 4, 2022

Choose a reason for hiding this comment

jvanbrie Mar 4, 2022 • edited Loading

Choose a reason for hiding this comment

buckhx Mar 4, 2022

Choose a reason for hiding this comment

sabiurr Mar 4, 2022

Choose a reason for hiding this comment

jvanbrie Mar 4, 2022

Choose a reason for hiding this comment

Ashley-Smith Mar 7, 2022

Choose a reason for hiding this comment

rl-michaelbrandeis commented Mar 5, 2022 • edited Loading

rl-michaelbrandeis commented Mar 5, 2022

magnetikonline commented Mar 6, 2022 • edited Loading

magnetikonline commented Mar 6, 2022

jvanbrie commented Mar 7, 2022 • edited Loading

magnetikonline commented Mar 7, 2022

jvanbrie commented Mar 14, 2022 • edited Loading

magnetikonline commented Mar 15, 2022 • edited Loading

dung-leviethoang-mox commented Mar 15, 2022 • edited Loading

jvanbrie commented Mar 15, 2022

magnetikonline commented Mar 16, 2022

jvanbrie Mar 4, 2022 •

edited

Loading

rl-michaelbrandeis commented Mar 5, 2022 •

edited

Loading

magnetikonline commented Mar 6, 2022 •

edited

Loading

jvanbrie commented Mar 7, 2022 •

edited

Loading

jvanbrie commented Mar 14, 2022 •

edited

Loading

magnetikonline commented Mar 15, 2022 •

edited

Loading

dung-leviethoang-mox commented Mar 15, 2022 •

edited

Loading