Add support of GroK command including default patterns #598

Merged: 12 commits, Aug 29, 2024

@YANG-DB YANG-DB commented Aug 24, 2024

Description

Add the PPL grok command described here.
Map grok to the Spark SQL regexp_extract function.
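
As a rough illustration of the target semantics, the sketch below emulates Spark SQL's `regexp_extract(str, regexp, idx)` with `java.util.regex`: it returns capture group `idx` of the first match, or an empty string when nothing matches. The class name, method name and sample input are hypothetical and are not taken from this PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not code from this PR.
final class RegexpExtractSketch {

    // Emulates regexp_extract(str, regexp, idx): group idx of the first match,
    // or "" when the pattern does not match or the group did not participate.
    static String regexpExtract(String input, String regex, int groupIndex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        String group = m.group(groupIndex);
        return group == null ? "" : group;
    }

    public static void main(String[] args) {
        // Made-up sample value; group 1 captures everything after the '@'.
        System.out.println(regexpExtract("amberduke@pyrami.com", ".+@(.+)", 1));
    }
}
```

Because `regexp_extract` addresses groups by index, a grok field name such as `host` has to be resolved to the position of its capture group before the call is built.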

Related campaign :
#408

Issues Resolved

#451

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: YANGDB <yang.db.dev@gmail.com>

@YANG-DB YANG-DB added the 0.5, PPL (Pipe Processing Language support), backport 0.5, and 0.5.1 labels Aug 24, 2024
@YANG-DB YANG-DB marked this pull request as draft August 24, 2024 17:21
@YANG-DB YANG-DB marked this pull request as ready for review August 27, 2024 19:13
@@ -0,0 +1,3 @@
# Forked from https://github.com/elasticsearch/logstash/tree/v1.4.0/patterns

How are these resource files used in this PR? I cannot find any test that leverages the files in the resource/patterns folder.


@YANG-DB YANG-DB Aug 28, 2024


GrokExpression statically loads the default patterns in the patterns folder

    public static class GrokExpression extends ParseExpression {
        private static final GrokCompiler grokCompiler = GrokCompiler.newInstance();

        static {
            grokCompiler.registerDefaultPatterns();
        }

Afterwards, any string such as '.+@%{HOSTNAME:host}' is matched, and each %{...} reference in it is replaced with the corresponding default pattern defined in that folder.
The FlintSparkPPLGrokITSuite uses these patterns in its tests.
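
To make the substitution step concrete, here is a minimal sketch of how a `%{NAME:field}` reference could be expanded into a plain regular expression with a named capture group. It is not the `GrokCompiler` implementation: the three-entry pattern dictionary is a simplified assumption (the real default patterns are richer), and the `GrokSketch` class, its methods and the sample inputs are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a tiny stand-in for grok pattern expansion.
final class GrokSketch {

    // Assumed, simplified definitions; the real default pattern files differ.
    private static final Map<String, String> PATTERNS = new LinkedHashMap<>();

    static {
        PATTERNS.put("HOSTNAME", "[A-Za-z0-9][A-Za-z0-9.-]*");
        PATTERNS.put("NUMBER", "[+-]?\\d+(?:\\.\\d+)?");
        PATTERNS.put("GREEDYDATA", ".*");
    }

    // Matches %{NAME} and %{NAME:field}; field must be a valid Java group name.
    private static final Pattern REF =
        Pattern.compile("%\\{(\\w+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");

    // Replaces every %{...} reference with its regex; named references become
    // named capture groups, unnamed ones become non-capturing groups.
    static String expand(String grok) {
        Matcher m = REF.matcher(grok);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String regex = PATTERNS.get(m.group(1));
            if (regex == null) {
                throw new IllegalArgumentException("Unknown pattern: " + m.group(1));
            }
            String replacement = m.group(2) == null
                ? "(?:" + regex + ")"
                : "(?<" + m.group(2) + ">" + regex + ")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the named field from the first match, or "" when nothing matches.
    static String extract(String grok, String field, String input) {
        Matcher m = Pattern.compile(expand(grok)).matcher(input);
        return m.find() ? m.group(field) : "";
    }

    public static void main(String[] args) {
        System.out.println(expand(".+@%{HOSTNAME:host}"));
        System.out.println(extract(".+@%{HOSTNAME:host}", "host", "amberduke@pyrami.com"));
    }
}
```

Field names are restricted to letters and digits here because Java named groups do not allow underscores; a real implementation would need its own naming scheme for other field names.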

@YANG-DB YANG-DB requested a review from LantaoJin August 28, 2024 19:38
- `source=accounts | grok email '.+@%{HOSTNAME:host}' | eval eval_result=1 | fields host, eval_result`
- `source=accounts | grok street_address '%{NUMBER} %{GREEDYDATA:address}' | fields address `
- `source=logs | grok message '%{COMMONAPACHELOG}' | fields COMMONAPACHELOG, timestamp, response, bytes`
-

@LantaoJin LantaoJin Aug 29, 2024


L324 needs to be removed.
BTW, can we add a line explaining the current limitation of grok? For example:

Limitation: overriding an existing field is unsupported:
source=accounts | grok address '%{NUMBER} %{GREEDYDATA:address}' | fields address
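
One way to picture this limitation is a pre-check that compares the field names captured by a grok pattern against the fields that already exist on the source. The sketch below is purely hypothetical: the class, the method and the field set are made up for illustration and do not describe how this PR detects or reports the conflict.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-check, for illustration only; not code from this PR.
final class GrokFieldConflictSketch {

    // Captures the field name in references of the form %{NAME:field}.
    private static final Pattern NAMED_REF = Pattern.compile("%\\{\\w+:(\\w+)\\}");

    // Returns the grok output fields that would collide with existing fields.
    static Set<String> conflicts(String grokPattern, Set<String> existingFields) {
        Set<String> result = new LinkedHashSet<>();
        Matcher m = NAMED_REF.matcher(grokPattern);
        while (m.find()) {
            if (existingFields.contains(m.group(1))) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed field names for the accounts source.
        Set<String> accountFields = new LinkedHashSet<>();
        accountFields.add("address");
        accountFields.add("email");
        // Prints [address]: the pattern would override the existing field.
        System.out.println(conflicts("%{NUMBER} %{GREEDYDATA:address}", accountFields));
    }
}
```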


@LantaoJin LantaoJin Aug 29, 2024


LGTM for the rest


@YANG-DB YANG-DB merged commit 176e150 into opensearch-project:main Aug 29, 2024
4 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 29, 2024
* Add support of GroK command including default patterns
* Add support of GroK command including default patterns
* fix grok parsing on projected fields
* update named regexp field index selection to the RegExpMatcher
* update named regexp field index selection to the RegExpMatcher
* update comments
* add `scalastyle:off` to ignore a long regexp test setting
* fix according to comments
* update spaces format
* update README.md

---------

Signed-off-by: YANGDB <yang.db.dev@gmail.com>
(cherry picked from commit 176e150)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@opensearch-trigger-bot

The backport to 0.5-nexus failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/opensearch-spark/backport-0.5-nexus 0.5-nexus
# Navigate to the new working tree
pushd ../.worktrees/opensearch-spark/backport-0.5-nexus
# Create a new branch
git switch --create backport/backport-598-to-0.5-nexus
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 176e150e0e7b5e94872d2f4ca8b3c2388f7a40f9
# Push it to GitHub
git push --set-upstream origin backport/backport-598-to-0.5-nexus
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/opensearch-spark/backport-0.5-nexus

Then, create a pull request where the base branch is 0.5-nexus and the compare/head branch is backport/backport-598-to-0.5-nexus.

YANG-DB pushed a commit that referenced this pull request Aug 29, 2024
* Add support of GroK command including default patterns
* Add support of GroK command including default patterns
* fix grok parsing on projected fields
* update named regexp field index selection to the RegExpMatcher
* update named regexp field index selection to the RegExpMatcher
* update comments
* add `scalastyle:off` to ignore a long regexp test setting
* fix according to comments
* update spaces format
* update README.md

---------


(cherry picked from commit 176e150)

Signed-off-by: YANGDB <yang.db.dev@gmail.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>