out_cloudwatch_logs: Only create log group if it does not already exist #4826

Merged
merged 1 commit into from
Mar 1, 2022


PettitWesley
Contributor

Signed-off-by: Wesley Pettit <wppttt@amazon.com>


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@PettitWesley
Contributor Author

Now, if the group already exists, it won't even try to create it; it just creates the stream:

Fluent Bit v1.8.13
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/15 08:15:05] [ info] [engine] started (pid=11476)
[2022/02/15 08:15:05] [ info] [storage] version=1.1.6, initializing...
[2022/02/15 08:15:05] [ info] [storage] in-memory
[2022/02/15 08:15:05] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/15 08:15:05] [ info] [cmetrics] version=0.2.2
[2022/02/15 08:15:05] [ info] [sp] stream processor started
[2022/02/15 08:15:10] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream from-fluent-bit-dummy in log group fluent-bit-is-the-best-cloudwatch-group-throttle
[2022/02/15 08:15:10] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Log Stream from-fluent-bit-dummy already exists

However, if stream creation fails because the group does not exist, then we try to create the group:

Fluent Bit v1.8.13
* Copyright (C) 2015-2021 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2022/02/15 08:14:23] [ info] [engine] started (pid=11215)
[2022/02/15 08:14:23] [ info] [storage] version=1.1.6, initializing...
[2022/02/15 08:14:23] [ info] [storage] in-memory
[2022/02/15 08:14:23] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2022/02/15 08:14:23] [ info] [cmetrics] version=0.2.2
[2022/02/15 08:14:23] [ info] [sp] stream processor started
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream from-fluent-bit-dummy in log group fluent-bit-is-the-best-cloudwatch-group-throttle
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Log Group fluent-bit-is-the-best-cloudwatch-group-throttle not found. Will attempt to create it.
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log group fluent-bit-is-the-best-cloudwatch-group-throttle
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log group fluent-bit-is-the-best-cloudwatch-group-throttle
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream from-fluent-bit-dummy in log group fluent-bit-is-the-best-cloudwatch-group-throttle
[2022/02/15 08:14:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Created log stream from-fluent-bit-dummy

This reduces unnecessary calls to the CreateLogGroup API.

If auto_create_group is disabled, the user gets a useful error message:

[2022/02/15 08:15:28] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream from-fluent-bit-dummy in log group fluent-bit-is-the-best-cloudwatch-group-throttle-2
[2022/02/15 08:15:28] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Log Group fluent-bit-is-the-best-cloudwatch-group-throttle-2 not found and `auto_create_group` disabled.
[2022/02/15 08:15:28] [ warn] [engine] failed to flush chunk '11589-1644912924.153454467.flb', retry in 10 seconds: task_id=0, input=dummy.0 > output=cloudwatch_logs.0 (out_id=0)
[2022/02/15 08:15:33] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] Creating log stream from-fluent-bit-dummy in log group fluent-bit-is-the-best-cloudwatch-group-throttle-2
[2022/02/15 08:15:33] [error] [output:cloudwatch_logs:cloudwatch_logs.0] Log Group fluent-bit-is-the-best-cloudwatch-group-throttle-2 not found and `auto_create_group` disabled.
[2022/02/15 08:15:33] [ warn] [engine] failed to flush chunk '11589-1644912928.444537263.flb', retry in 6 seconds: task_id=1, input=dummy.0 > output=cloudwatch_logs.0 (out_id=0)
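
For reference, the runs above could come from a configuration along these lines. This is only a sketch, not the exact file used in testing: the region is a placeholder assumption, and the group and stream names are copied from the log output. Setting auto_create_group to Off, with a group that does not exist, would correspond to the last run.

```text
[INPUT]
    Name  dummy
    Tag   dummy

[OUTPUT]
    Name               cloudwatch_logs
    Match              *
    # placeholder region (assumption, not taken from the logs)
    region             us-west-2
    log_group_name     fluent-bit-is-the-best-cloudwatch-group-throttle
    log_stream_name    from-fluent-bit-dummy
    auto_create_group  On
```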

return -1;
} else {
/* retry stream creation */
goto retry_create_stream;
Contributor

Should there be a retry limit? This could be an infinite loop if the resource continues to fail.

Contributor

What is the advantage of using a goto statement here vs return create_log_stream(ctx, stream);? Is it for efficiency, so the function stack doesn't need to be reinitialized?

Contributor Author

For the second comment... the goto is just because I didn't think of making it recursive... can change it

For the infinite loop... can it? Because the call to create_stream only happens again if the CreateLogGroup succeeds but then create stream again fails with the resource not found exception. Which shouldn't be able to happen repeatedly.

The algo is:

  1. Try to create stream.
  2. If stream fails because group doesn't exist, then create the group. Otherwise, goto 4.
  3. If group creation succeeds, goto 1. Otherwise, goto 4.
  4. Exit

I think it can't be infinite unless, I guess, the CW API has a bug where it keeps incorrectly returning the same failure... I could add a bool to make sure we only try once, but I'm not sure it's necessary?
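
To make the flow concrete, the four steps, together with the optional one-shot bool, can be sketched roughly as below. This is an illustration only, not the plugin's code: `struct fake_cw`, `fake_create_stream`, `fake_create_group`, `create_stream_with_fallback`, and the result codes are all hypothetical stand-ins for the CloudWatch calls, and `always_not_found` simulates the "API keeps returning the same failure" case.

```c
#include <stdbool.h>

/* Hypothetical result codes; the real plugin's return values may differ. */
enum result { RES_OK = 0, RES_GROUP_NOT_FOUND = 1 };

/* Hypothetical in-memory stand-in for the CloudWatch backend. */
struct fake_cw {
    bool group_exists;
    bool auto_create_group;
    bool always_not_found;    /* simulate a misbehaving API */
    int create_group_calls;
    int create_stream_calls;
};

static enum result fake_create_stream(struct fake_cw *cw)
{
    cw->create_stream_calls++;
    if (cw->always_not_found || !cw->group_exists) {
        return RES_GROUP_NOT_FOUND;
    }
    return RES_OK;
}

static enum result fake_create_group(struct fake_cw *cw)
{
    cw->create_group_calls++;
    cw->group_exists = true;
    return RES_OK;
}

/* Returns 0 on success, -1 on failure. */
int create_stream_with_fallback(struct fake_cw *cw)
{
    bool tried_group = false;   /* the one-shot guard discussed above */
    enum result res;

retry_create_stream:
    /* Step 1: try to create the stream. */
    res = fake_create_stream(cw);
    if (res == RES_OK) {
        return 0;
    }

    /* Step 4: give up if group creation is disabled or was already tried. */
    if (!cw->auto_create_group || tried_group) {
        return -1;
    }

    /* Step 2: the stream failed because the group is missing; create it. */
    tried_group = true;
    if (fake_create_group(cw) != RES_OK) {
        return -1;
    }

    /* Step 3: group creation succeeded; retry the stream exactly once. */
    goto retry_create_stream;
}
```

With the guard in place, even a backend that reports "group not found" forever results in at most two stream attempts and one group attempt per call.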

Contributor

Good point. Makes sense.

Contributor

Because the call to create_stream only happens again if the CreateLogGroup succeeds but then create stream again fails with the resource not found exception. Which shouldn't be able to happen repeatedly.

Just a suggestion: I think it might be better to add a bool here so that we are sure the creation won't happen again and again. The possibility might be very low, but just in case.

@PettitWesley
Contributor Author

@zhonghui12 @matthewfala Addressed comments.

@@ -391,14 +391,8 @@ static void cb_cloudwatch_flush(const void *data, size_t bytes,

ctx->buf->put_events_calls = 0;

if (ctx->create_group == FLB_TRUE && ctx->group_created == FLB_FALSE) {
ret = create_log_group(ctx);
Contributor

ret unused now. Causing tests to fail.

Contributor Author

Thanks

@matthewfala
Contributor

@zhonghui12 @matthewfala Addressed comments.

Great! That looks good.

…st to prevent throttling

Signed-off-by: Wesley Pettit <wppttt@amazon.com>
@PettitWesley PettitWesley merged commit d36af1f into fluent:1.8 Mar 1, 2022
PettitWesley added a commit to PettitWesley/fluent-bit that referenced this pull request Mar 2, 2022
Signed-off-by: Wesley Pettit <wppttt@amazon.com>
PettitWesley added a commit that referenced this pull request Mar 2, 2022
Signed-off-by: Wesley Pettit <wppttt@amazon.com>