S3: can't use multipart due to large compression factor #674

Open
danielejuan-metr opened this issue Jun 8, 2023 · 8 comments
Labels
bug (Something isn't working) · enhancement (Feature request or enhancement on existing features)

Comments

@danielejuan-metr

Describe the question/issue

We are testing out the S3 plugin for Fluent Bit in AWS EKS. Is it possible to enable both compression and multipart upload in the latest stable release?

With the output configuration below, Fluent Bit compresses each chunk and uses a PutObject upload (as logged below). We expected the chunks to be compressed and then uploaded via multipart until the total file size was reached. Is this a misunderstanding on our part?

Because of this behavior, the S3 bucket ends up containing a large number of small .gz files.

Configuration

[OUTPUT]
        Name                      s3
        Match                     application.*
        region                    ${AWS_REGION}
        bucket                    ${S3_BUCKET_NAME}
        total_file_size           256M
        upload_timeout            5m
        compression               gzip
        s3_key_format             /logs-apps/%Y/%m/%d/%Y%m%d%H%M%S-$TAG-$UUID.gz

Fluent Bit Log Output

[2023/05/30 07:35:20] [ info] [output:s3:s3.0] Pre-compression upload_chunk_size= 5630650, After compression, chunk is only 106332 bytes, the chunk was too small, using PutObject to upload

Fluent Bit Version Info

public.ecr.aws/aws-observability/aws-for-fluent-bit:stable

Cluster Details

  • No meshes
  • EKS
  • Worker Node EC2
  • Fluent Bit deployed as a DaemonSet
@PettitWesley
Contributor

Please see the new docs I have added here: fluent/fluent-bit-docs#1127

Since your compression factor is very large (the log above shows ~5.6 MB compressing to ~104 KB, roughly 53x), the suggestion to increase upload_chunk_size probably won't work, since its max value is 50M right now. I've taken an action item to increase that limit in a future release and to revisit this experience.

A short-term workaround for you might be to switch to use_put_object On and then increase your upload_timeout.
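
Applied to the configuration above, that workaround might look like the following sketch (the 15m timeout is only illustrative, not a recommendation; tune it to how much upload delay you can tolerate):

[OUTPUT]
        Name                      s3
        Match                     application.*
        region                    ${AWS_REGION}
        bucket                    ${S3_BUCKET_NAME}
        total_file_size           256M
        use_put_object            On
        upload_timeout            15m
        compression               gzip
        s3_key_format             /logs-apps/%Y/%m/%d/%Y%m%d%H%M%S-$TAG-$UUID.gz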

Let me know if you have any more questions/confusions/feedback and we can discuss it.

@PettitWesley PettitWesley changed the title from "[Question] S3 Output plugin" to "S3: can't use multipart due to large compression factor" Jun 9, 2023
@PettitWesley PettitWesley added the bug and enhancement labels Jun 9, 2023
@danielejuan-metr
Author

Suppose we use use_put_object On and a 256M total_file_size, can we expect that the buffering will be done on disk (store_dir)? Also, is the store_dir_limit_size config unlimited in the latest stable version? Checking https://docs.fluentbit.io/manual/v/1.9-pre/pipeline/outputs/s3, store_dir_limit_size does not appear to exist yet.

Thanks @PettitWesley.

@danielejuan-metr
Author

Also, when configuring the S3 output plugin with public.ecr.aws/aws-observability/aws-for-fluent-bit:stable, should we use as reference the 2.1 documentation (https://docs.fluentbit.io/manual/pipeline/outputs/s3) or the 1.9 documentation (https://docs.fluentbit.io/manual/v/1.9-pre/)?

@PettitWesley
Contributor

@danielejuan-metr unfortunately, the AWS release has drifted from upstream; we're mostly 1.9-based with some more recent AWS features added.

Our release notes explain the net changes in each release:

Our stable is now 2.31.11, which is based on Fluent Bit 1.9.10 plus our custom patches; 1.9.10 already has store_dir_limit_size: https://github.com/fluent/fluent-bit/tree/v1.9.10

I'm sorry for this drift... I know it's not super convenient right now. Hopefully in the future we will get back to simply re-releasing upstream versions.

Suppose we use use_put_object On and a 256M total_file_size, can we expect that the buffering will be done on disk (store_dir)?

Yes, it will buffer the entire 256M (if your upload_timeout gives it enough time to), then compress it and send the compressed file all at once. With your ~53x compression ratio, that 256M file should compress to roughly 5 MB before upload.

@PettitWesley
Contributor

Also, when configuring the S3 output plugin with public.ecr.aws/aws-observability/aws-for-fluent-bit:stable, should we use as reference the 2.1 documentation (https://docs.fluentbit.io/manual/pipeline/outputs/s3) or the 1.9 documentation (https://docs.fluentbit.io/manual/v/1.9-pre/)?

AWS release 2.31.11/stable contains all of the AWS features for S3 that you see in 2.1, so use the 2.1 documentation.

@danielejuan-metr
Author

@PettitWesley, hoping to reuse this thread for our next question regarding the S3 plugin.

We see that the S3 output plugin supports Workers with a limit of 1. Suppose we enable this; does the worker only receive chunks, write to the store dir, compress, and send to S3?

If the sequence above is accurate and compression plus sending to S3 is slow, will the Fluent Bit engine's backpressure chunks grow? (Chunks will be buffered under the [SERVICE] storage.path with filesystem buffering.)

@PettitWesley
Contributor

PettitWesley commented Sep 1, 2023

We see that the S3 output plugin supports Workers with a limit of 1. Suppose we enable this; does the worker only receive chunks, write to the store dir, compress, and send to S3?

This is largely correct, except that the send can happen in the timer thread, which runs at an interval to check whether any files are ready to send. The timer thread is technically not the same as the worker thread; it runs in the main engine thread, which means it can block and slow the engine itself.

Currently, in the AWS distro, S3 uses async HTTP, which means it won't block any thread while it's waiting to send: #702

Please read this: https://docs.fluentbit.io/manual/pipeline/outputs/s3#differences-between-s3-and-other-fluent-bit-outputs

I'm working on a major refactor of S3 output which should make it more reliable and make all of its operations run inside the worker thread. It will still support only 1 worker, but longer term it will enable me to add support for multiple workers.

If the sequence above is accurate and compression plus sending to S3 is slow, will the Fluent Bit engine's backpressure chunks grow? (Chunks will be buffered under the [SERVICE] storage.path with filesystem buffering.)

Potentially. In practice I've never seen this. As noted above, the send happens async. The compression step does take non-trivial CPU and is synchronous... so that may slow things down.
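
If the filesystem buffer growing unbounded is a concern, a minimal sketch of one way to cap it (assuming the inputs already use storage.type filesystem; the path, tail input, and 5G limit below are illustrative, not from this thread) is the per-output storage.total_limit_size option, which discards the oldest chunks once the limit is reached:

[SERVICE]
        storage.path              /var/log/flb-storage/

[INPUT]
        Name                      tail
        # illustrative path; point at your application logs
        Path                      /var/log/containers/*.log
        storage.type              filesystem

[OUTPUT]
        Name                      s3
        Match                     application.*
        # oldest buffered chunks for this output are dropped past this limit
        storage.total_limit_size  5G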

Please let me know if this makes sense and if you have any more questions.

EDIT: We just checked the code and the timer thread should be running in the worker thread as well.

@danielejuan-metr
Author

danielejuan-metr commented Sep 4, 2023

Potentially. In practice I've never seen this. As noted above, the send happens async. The compression step does take non-trivial CPU and is synchronous... so that may slow things down.

For additional context on the question above, we are trying to identify the bottleneck in our test setup. In our volume testing, with around 27 MB/s of tail ingestion, we see the filesystem buffer (storage.path) growing; we saw gigabytes of data in the buffer. We only have S3 as our output, with workers set to 1 and compression enabled. Given your response above, it seems compression is our bottleneck, since sending to S3 is async.
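
For reference, a minimal sketch of how the buffer growth can be observed (assuming the built-in monitoring server available in 1.9+; the listen address and port are illustrative):

[SERVICE]
        http_server               On
        http_listen               0.0.0.0
        http_port                 2020
        storage.metrics           On

Querying curl http://127.0.0.1:2020/api/v1/storage then reports total and filesystem chunk counts per input, which should show whether the backlog is accumulating in front of the S3 output.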

Please let me know if this makes sense and if you have any more questions.

For confirmation as well: we use the app name in our S3 log filename prefix. Does this mean that each file has its own timeout timer and file size computation?

Thanks for the response @PettitWesley!

--

Update: We tested without compression and throughput was the same. Checking the releases in the repo, there were benchmark results up to 30 MB/s. Was there any concern/issue with throughput higher than 30 MB/s?
