
S3 key timestamp is NOT the timestamp of any log record #459

Open
PettitWesley opened this issue Nov 9, 2022 · 5 comments

Comments

@PettitWesley
Contributor

Describe the question/issue

I thought this was not the case, but it turns out our code does not actually use the log timestamp to set the file name. Customers likely expect that an S3 file whose key carries a certain timestamp contains a first log entry with that timestamp, with all subsequent logs in the same file coming after it.

Since this is not the case, it may be difficult to find specific logs in files.

See the code here:

The timestamp is always just the current time at which out_s3 started creating the file on disk for buffering. Not the upload time. And not a timestamp from the logs.
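To make the behavior concrete, here is a minimal, hypothetical sketch (in Python for illustration only; the actual out_s3 plugin is C, and the function name and parameters here are invented) of how a key ends up stamped with file-creation time rather than any record time:

```python
import time

def build_s3_key(prefix, fmt="%Y/%m/%d/%H", now=None):
    # Hypothetical sketch: the timestamp in the key is taken from the
    # wall clock at the moment the local buffer file is created, never
    # from the log records that later land in that file.
    t = time.gmtime(now if now is not None else time.time())
    return prefix + "/" + time.strftime(fmt, t)

# Records buffered into this file may carry earlier or later timestamps;
# the key above never consults them.
```

So two files created seconds apart get near-identical key timestamps regardless of how old or new the records inside them are.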

@PettitWesley
Contributor Author

As noted here, there are potentially two bugs/needed enhancements, and the second is to support non-UTC timestamps for S3: #432 (comment)

@matthewfala matthewfala mentioned this issue Feb 3, 2023
@cdancy

cdancy commented Jul 3, 2023

@PettitWesley I assume this is still a known issue and being worked on and/or tracked? We just saw this on a cluster which has a massive number of pods (41). The folder we wrote to within S3 had the correct date, but the timestamp of the gzip file itself was from yesterday; the contents within, however, are from today.

Would it make sense to update the "last modified date" of the gzip file just prior to uploading?

@PettitWesley
Contributor Author

PettitWesley commented Jul 5, 2023

https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.31.3
@cdancy
2.31.3 has that feature, but we then removed it because a recent change in S3 (possibly that one, possibly another) introduced instability, so we reverted all recent S3 changes.

All of the S3 fixes will come back soonish, once I complete the S3 stability refactor (code complete and tested, but one pending core change is needed to enable it): PettitWesley/fluent-bit#24

@cdancy

cdancy commented Jul 5, 2023

@PettitWesley thanks for getting back. We'll keep following for now. Not a show-stopper, but something we noticed trying to debug logs that left us scratching our heads wondering if we were sane or not :)

@bgardner-noggin

I'm not sure that the bug fix for #459 (comment) above will resolve the issue.

We are using Fluent Bit to write logs to S3 and then using Athena partitioning to query the logs.

For example, a file written to S3 with path year=2023/month=10/day=04/hour=03/somefile.gz is queried with:

SELECT *
FROM servicelogs
WHERE year = 2023 and month = 10 and day = 4 and hour = 3;

Even if the file receives its S3 prefix from the time of its first log record, the file could still contain records from hour 4, which this query would then miss.

Ideally, Fluent Bit would cut over to a new file at the partition change.
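The cutover idea above could look something like this hypothetical sketch (illustration only, in Python; the names and data shapes are invented, not Fluent Bit's implementation): each record is routed to a buffer file keyed by its own hour partition, so a file under hour=03 can never contain hour-04 records:

```python
from datetime import datetime, timezone

def partition_key(ts):
    # Athena-style hour partition for a Unix timestamp, in UTC.
    d = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}/hour={d.hour:02d}"

def route_records(records):
    # records: iterable of (unix_ts, line) pairs. A record that falls into
    # a different hour partition than the current file starts a new "file";
    # here each file is just a list of lines keyed by its partition prefix.
    files = {}
    for ts, line in records:
        files.setdefault(partition_key(ts), []).append(line)
    return files
```

With this routing, the partition predicates in the Athena query above are guaranteed to match every record in the files they select.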
