Fix a bug in splitting line on S3 file with DD_MULTILINE_LOG_REGEX_PATTERN #811

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

showwin
Contributor

@showwin showwin commented Jun 24, 2024

What does this PR do?

Fixes a parsing issue for S3 files when the DD_MULTILINE_LOG_REGEX_PATTERN environment variable is set. The issue was introduced by the refactoring in 196c633.

Motivation

The log files in my S3 bucket were parsed incorrectly, and ingested into Datadog Logs that way, after I upgraded from 3.108.0 to 3.116.0. Strictly speaking, the issue was introduced in 3.113.0 [diff].

Testing Guidelines

Since this is a very small change, I didn't write any test code. Instead, here is some sample code to illustrate the problem:

In [1]: data = 'hello\r\nnext line\nbye'

# expected
In [2]: for line in data.splitlines():
    ...:     print(line)
    ...:
hello
next line
bye

# Current behavior: my parsed logs in Datadog look like this.
In [3]: for line in data:
    ...:     print(line)
    ...:
h
e
l
l
o



n
e
x
t

l
i
n
e


b
y
e

Additional Notes

This problem occurs only when DD_MULTILINE_LOG_REGEX_PATTERN is set but the file content doesn't match it.
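A minimal sketch of the failure mode, assuming the fallback path ends up iterating over the raw string instead of over its lines. `split_logs`, its parameters, and the sample date pattern are hypothetical names chosen for illustration; this is not the forwarder's actual code.

```python
import re


def split_logs(data: str, pattern: str, fixed: bool = True) -> list:
    """Hypothetical helper illustrating the fallback; not the forwarder's code."""
    regex = re.compile(pattern)
    if regex.match(data):
        # Pattern matches the start of the file: cut one record per match.
        starts = [m.start() for m in regex.finditer(data)]
        ends = starts[1:] + [len(data)]
        return [data[s:e] for s, e in zip(starts, ends)]
    if fixed:
        # Fallback with the fix: actually split by line.
        return data.splitlines()
    # Fallback before the fix: iterating a str yields single characters.
    return list(data)


data = "hello\r\nnext line\nbye"
pattern = r"\d{4}-\d{2}-\d{2}"  # assumed example pattern; does not match `data`

print(split_logs(data, pattern))               # one entry per line
print(split_logs(data, pattern, fixed=False))  # one entry per character
```

When the pattern does match the start of the file, the first branch is taken and the fallback is never reached, which is consistent with the problem only showing up for non-matching content.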

Types of changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)
  • This PR passes the unit tests
  • This PR passes the installation tests (ask a Datadog member to run the tests)

@showwin showwin requested a review from a team as a code owner June 24, 2024 07:40
@@ -296,6 +296,7 @@ def _extract_other_logs(self):
"DD_MULTILINE_LOG_REGEX_PATTERN %s did not match start of file, splitting by line",
Contributor Author

Even though the log message says "splitting by line", self.data_store.data is not actually split by line at the moment.

Contributor

Nice catch. Beware that str.splitlines and bytes.splitlines don't behave exactly the same (cf. here). I guess it's good enough for an error scenario.
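To make the str/bytes difference concrete, here is a small sketch. The sample text is made up for illustration: `str.splitlines` treats a wider set of characters as line boundaries (for example `\x0b` and `\u2028`), while `bytes.splitlines` only splits on `\n`, `\r`, and `\r\n`.

```python
# Made-up sample: a vertical tab, a Unicode line separator, and a newline.
text = "a\x0bb\u2028c\nd"
raw = text.encode("utf-8")

str_lines = text.splitlines()    # splits on \x0b, \u2028 and \n
bytes_lines = raw.splitlines()   # splits on \n only

print(len(str_lines), str_lines)      # 4 ['a', 'b', 'c', 'd']
print(len(bytes_lines), bytes_lines)  # 2 entries; the first still holds \x0b and the encoded \u2028
```

So the same S3 object can produce a different number of log lines depending on whether it is split before or after decoding.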

@ViBiOh ViBiOh self-assigned this Jul 9, 2024
@ViBiOh ViBiOh added the aws label Jul 9, 2024
@ViBiOh ViBiOh merged commit a020b40 into DataDog:master Jul 9, 2024
13 checks passed