S3: can't use multipart due to large compression factor #674

Open
danielejuan-metr opened this issue Jun 8, 2023 · 8 comments
Labels
bug (Something isn't working) · enhancement (Feature request or enhancement on existing features)

Comments

@danielejuan-metr

Describe the question/issue

We are testing out the S3 plugin for Fluent Bit in AWS EKS. Is it possible to enable both compression and multipart upload in the latest stable release?

With the output configuration below, Fluent Bit compresses each chunk and uses a PutObject upload (as logged below). We expected the chunks to be compressed and then uploaded via multipart until the total file size was reached. Is this a misunderstanding on our part?

Because of this behavior, the S3 bucket ends up containing a large number of small .gz files.

Configuration

[OUTPUT]
        Name                      s3
        Match                     application.*
        region                    ${AWS_REGION}
        bucket                    ${S3_BUCKET_NAME}
        total_file_size           256M
        upload_timeout            5m
        compression               gzip
        s3_key_format             /logs-apps/%Y/%m/%d/%Y%m%d%H%M%S-$TAG-$UUID.gz

Fluent Bit Log Output

[2023/05/30 07:35:20] [ info] [output:s3:s3.0] Pre-compression upload_chunk_size= 5630650, After compression, chunk is only 106332 bytes, the chunk was too small, using PutObject to upload

Fluent Bit Version Info

public.ecr.aws/aws-observability/aws-for-fluent-bit:stable

Cluster Details

  • No meshes
  • EKS
  • Worker Node EC2
  • Fluent Bit deployed as a DaemonSet
@PettitWesley
Contributor

Please see the new docs I have added here: fluent/fluent-bit-docs#1127

Since your compression factor is very large (the log above shows ~5.6 MB compressing to ~104 KB, roughly 53x), the suggestion to increase upload_chunk_size probably won't work, since its max value is 50M right now. I've taken an action item to increase that limit in a future release and to revisit this experience.

A short-term workaround for you might be to switch to use_put_object On and then increase your upload_timeout.
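
Applied to the configuration above, that workaround might look like the following sketch (the 15m timeout is only illustrative, not a recommendation; tune it to how much upload delay you can tolerate):

[OUTPUT]
        Name                      s3
        Match                     application.*
        region                    ${AWS_REGION}
        bucket                    ${S3_BUCKET_NAME}
        total_file_size           256M
        use_put_object            On
        upload_timeout            15m
        compression               gzip
        s3_key_format             /logs-apps/%Y/%m/%d/%Y%m%d%H%M%S-$TAG-$UUID.gz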

Let me know if you have any more questions/confusions/feedback and we can discuss it.

@PettitWesley PettitWesley changed the title from "[Question] S3 Output plugin" to "S3: can't use multipart due to large compression factor" Jun 9, 2023
@PettitWesley PettitWesley added the bug and enhancement labels Jun 9, 2023
@danielejuan-metr
Author

Suppose we use use_put_object On and a 256M total_file_size, can we expect that the buffering will be done on disk (store_dir)? Also, is the store_dir_limit_size config unlimited in the latest stable version? Checking https://docs.fluentbit.io/manual/v/1.9-pre/pipeline/outputs/s3, store_dir_limit_size does not appear to exist yet.

Thanks @PettitWesley.

@danielejuan-metr
Author

Also, when configuring the S3 output plugin with public.ecr.aws/aws-observability/aws-for-fluent-bit:stable, should we use as reference the 2.1 documentation (https://docs.fluentbit.io/manual/pipeline/outputs/s3) or the 1.9 documentation (https://docs.fluentbit.io/manual/v/1.9-pre/)?

@PettitWesley
Contributor

@danielejuan-metr unfortunately, the AWS release has drifted from upstream; we're mostly 1.9-based with some more recent AWS features added.

Our release notes explain the net changes in each release:

Our stable is now 2.31.11, which is based on Fluent Bit 1.9.10 plus our custom patches; 1.9.10 already has store_dir_limit_size: https://github.com/fluent/fluent-bit/tree/v1.9.10

I'm sorry for this drift... I know it's not super convenient right now. Hopefully in the future we will get back to simply re-releasing upstream versions.

Suppose we use use_put_object On and a 256M total_file_size, can we expect that the buffering will be done on disk (store_dir)?

Yes, it will buffer the entire 256M (if your upload_timeout gives it enough time to), then compress it and send the compressed file all at once. With your ~53x compression ratio, that 256M file should compress to roughly 5 MB before upload.

@PettitWesley
Contributor

Also, when configuring the S3 output plugin with public.ecr.aws/aws-observability/aws-for-fluent-bit:stable, should we use as reference the 2.1 documentation (https://docs.fluentbit.io/manual/pipeline/outputs/s3) or the 1.9 documentation (https://docs.fluentbit.io/manual/v/1.9-pre/)?

AWS release 2.31.11/stable contains all of the AWS features for S3 that you see in 2.1, so use the 2.1 documentation.

@danielejuan-metr
Author

@PettitWesley, hoping to reuse this thread for our next question regarding the S3 plugin.

We see that the S3 output plugin supports Workers with a limit of 1. Suppose we enable this; does the worker only receive chunks, write to the store dir, compress, and send to S3?

If the sequence above is accurate and compression plus sending to S3 is slow, will the Fluent Bit engine's backpressure chunks grow? (Chunks will be buffered under the [SERVICE] storage.path with filesystem buffering.)

@PettitWesley
Contributor

PettitWesley commented Sep 1, 2023

We see that the S3 output plugin supports Workers with a limit of 1. Suppose we enable this; does the worker only receive chunks, write to the store dir, compress, and send to S3?

This is largely correct, except that the send can happen in the timer thread, which runs at an interval to check whether any files are ready to send. The timer thread is technically not the same as the worker thread; it runs in the main engine thread, which means it can block and slow the engine itself.

Currently, in the AWS distro, S3 uses async HTTP, which means it won't block any thread while it's waiting to send: #702

Please read this: https://docs.fluentbit.io/manual/pipeline/outputs/s3#differences-between-s3-and-other-fluent-bit-outputs

I'm working on a major refactor of S3 output which should make it more reliable and make all of its operations run inside the worker thread. It will still support only 1 worker, but longer term it will enable me to add support for multiple workers.

If the sequence above is accurate and compression plus sending to S3 is slow, will the Fluent Bit engine's backpressure chunks grow? (Chunks will be buffered under the [SERVICE] storage.path with filesystem buffering.)

Potentially. In practice I've never seen this. As noted above, the send happens async. The compression step does take non-trivial CPU and is synchronous... so that may slow things down.
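
If the filesystem buffer growing unbounded is a concern, a minimal sketch of one way to cap it (assuming the inputs already use storage.type filesystem; the path, tail input, and 5G limit below are illustrative, not from this thread) is the per-output storage.total_limit_size option, which discards the oldest chunks once the limit is reached:

[SERVICE]
        storage.path              /var/log/flb-storage/

[INPUT]
        Name                      tail
        # illustrative path; point at your application logs
        Path                      /var/log/containers/*.log
        storage.type              filesystem

[OUTPUT]
        Name                      s3
        Match                     application.*
        # oldest buffered chunks for this output are dropped past this limit
        storage.total_limit_size  5G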

Please let me know if this makes sense and if you have any more questions.

EDIT: We just checked the code and the timer thread should be running in the worker thread as well.

@danielejuan-metr
Author

danielejuan-metr commented Sep 4, 2023

Potentially. In practice I've never seen this. As noted above, the send happens async. The compression step does take non-trivial CPU and is synchronous... so that may slow things down.

For additional context on the question above, we are trying to identify the bottleneck in our test setup. In our volume testing, with around 27 MB/s of tail ingestion, we see the filesystem buffer (storage.path) growing; we saw gigabytes of data in the buffer. We only have S3 as our output, with workers set to 1 and compression enabled. Given your response above, it seems compression is our bottleneck, since sending to S3 is async.
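
For reference, a minimal sketch of how the buffer growth can be observed (assuming the built-in monitoring server available in 1.9+; the listen address and port are illustrative):

[SERVICE]
        http_server               On
        http_listen               0.0.0.0
        http_port                 2020
        storage.metrics           On

Querying curl http://127.0.0.1:2020/api/v1/storage then reports total and filesystem chunk counts per input, which should show whether the backlog is accumulating in front of the S3 output.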

Please let me know if this makes sense and if you have any more questions.

For confirmation as well: we use the app name in our S3 log filename prefix. Does this mean that each file has its own timeout timer and file size computation?

Thanks for the response @PettitWesley!

--

Update: We tested without compression and throughput was the same. Checking the releases in the repo, there were benchmark results up to 30 MB/s. Was there any concern/issue with throughput higher than 30 MB/s?
