Remove ingest processor supports excluding fields #10967

Merged: 15 commits into opensearch-project:main on Jan 17, 2024

@gaobinlong
Collaborator

gaobinlong commented Oct 27, 2023

Description

Enhance the remove ingest processor to support field patterns and excluding fields. The following parameters will be added to the processor:

1. field_pattern: optional, a single value or an array. Supports wildcard patterns such as a*, *b, or a*b; fields that match a pattern are removed.
2. exclude_field: optional, a single value or an array. Fields not in this list are removed.
3. exclude_field_pattern: optional, a single value or an array. Fields that do not match any of the patterns are removed.
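
To make the wildcard semantics above concrete, here is a minimal Python sketch. It is not the processor's implementation (OpenSearch is written in Java); `wildcard_match` is a hypothetical helper that assumes `*` matches any run of characters and that every other character is taken literally.

```python
import re


def wildcard_match(pattern: str, name: str) -> bool:
    """Return True if `name` matches `pattern`, where `*` matches any run of characters.

    Hypothetical helper for illustration only; not OpenSearch code.
    """
    # Escape each literal chunk, then join the chunks with ".*" for every "*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


fields = ["a", "ab", "ba", "axb", "foo"]
print([f for f in fields if wildcard_match("a*", f)])   # ['a', 'ab', 'axb']
print([f for f in fields if wildcard_match("*b", f)])   # ['ab', 'axb']
print([f for f in fields if wildcard_match("a*b", f)])  # ['ab', 'axb']
```

Escaping the literal chunks keeps characters such as `.` from being read as regex syntax, so a pattern like `a.b` only matches the field name `a.b`.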

Here are some examples:

1. Remove the fields which start with `a`:
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "remove": {
          "field_pattern": "a*"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "foo.bar": "1"
      }
    }
  ]
}
```
2. Keep some fields and remove the others:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "remove": {
          "exclude_field": ["a", "b", "c"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "foo.bar": "1"
      }
    }
  ]
}
3. Keep the fields that match a* or b*, and remove the others:
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "remove": {
          "exclude_field_pattern": ["a*", "b*"]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "foo.bar": "1"
      }
    }
  ]
}
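
The "keep some fields, remove the others" behavior in the last two examples can be sketched in Python as follows. This is an illustration under stated assumptions, not the processor's code: `keep_only` and `_matches` are hypothetical names, only top-level fields are considered, and combining an exclude list with exclude patterns in one call is assumed to be allowed.

```python
import re


def _matches(pattern, name):
    # "*" matches any run of characters; everything else is literal (assumption).
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def keep_only(source, exclude_field=(), exclude_field_pattern=()):
    """Return a copy of `source` that keeps only the excluded (protected) fields."""
    kept = {}
    for name, value in source.items():
        listed = name in exclude_field
        matched = any(_matches(p, name) for p in exclude_field_pattern)
        if listed or matched:
            kept[name] = value  # protected from removal
    return kept


doc = {"a": 1, "b": 2, "bar": 3, "foo": 4}
print(keep_only(doc, exclude_field=["a", "b", "c"]))        # {'a': 1, 'b': 2}
print(keep_only(doc, exclude_field_pattern=["a*", "b*"]))   # {'a': 1, 'b': 2, 'bar': 3}
```

Listing a field that is absent from the document (`c` here) has no effect in this sketch; it simply protects nothing.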
Some explanation about the implementation:

1. Why not make the existing parameter `field` support wildcards? Because a field name can itself contain `*`, so if `field` supported wildcards, we could not tell whether a value is a concrete field or a field pattern.

2. Note that the existing parameter field is now optional. It can be used together with field_pattern, in which case all fields specified in field and all fields that match field_pattern are removed. However, (field, field_pattern) and (exclude_field, exclude_field_pattern) cannot be set at the same time; users have to choose one type of removal, obverse or reverse.

3. The parameter ignore_missing only works with the field parameter, not the others. If the pattern specified in field_pattern doesn't match any field in the document, nothing happens, and no exception such as "field doesn't exist" is thrown.

4. Some metadata fields such as _index and _id are ignored when removing fields by field_pattern, exclude_field, or exclude_field_pattern, because removing these metadata fields is meaningless and dangerous.

5. What does a field pattern look like? It supports wildcard patterns such as a*, b*, and a*b, and cannot contain certain special characters such as # and _; this behavior is the same as the index pattern in templates.
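
The rules in this list can be modeled with a short Python sketch. It is an assumption-laden illustration rather than the Java implementation: `check_modes` and `remove_obverse` are hypothetical names, the error messages are invented, the document is a flat dict that mixes metadata and source fields, only `_index` and `_id` are treated as metadata, and rejecting a configuration with no parameters at all is assumed rather than stated in the description.

```python
import re

# Only the two metadata fields named in the description (assumption: the real list is longer).
METADATA_FIELDS = {"_index", "_id"}


def _matches(pattern, name):
    # "*" matches any run of characters; everything else is literal (assumption).
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def check_modes(config):
    """Enforce that obverse and reverse parameters are not mixed."""
    obverse = any(k in config for k in ("field", "field_pattern"))
    reverse = any(k in config for k in ("exclude_field", "exclude_field_pattern"))
    if obverse and reverse:
        raise ValueError("obverse and reverse parameters cannot be combined")
    if not (obverse or reverse):
        raise ValueError("no removal parameter was set")  # assumed behavior
    return "obverse" if obverse else "reverse"


def remove_obverse(doc, field=(), field_pattern=(), ignore_missing=False):
    """Remove the named fields and every non-metadata field matching a pattern."""
    result = dict(doc)
    for name in field:
        if name in result:
            del result[name]
        elif not ignore_missing:
            # ignore_missing only affects explicitly named fields.
            raise KeyError(f"field [{name}] doesn't exist")
    for name in list(result):
        if name not in METADATA_FIELDS and any(_matches(p, name) for p in field_pattern):
            del result[name]
    return result  # a pattern that matches nothing is silently a no-op


doc = {"_index": "logs", "_id": "1", "abc": 1, "b": 2}
print(remove_obverse(doc, field_pattern=["*"]))     # metadata fields survive
print(remove_obverse(doc, field_pattern=["zzz*"]))  # nothing matches, nothing raised
```

Keeping the mode check separate from the removal step mirrors the description's point that users pick one type of removal up front.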

Related Issues

#1578

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
@github-actions
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Oct 27, 2023

Compatibility status:

Checks if related components are compatible with change bc79b5d

Incompatible components

Incompatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/k-nn.git]

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

❌ Gradle check result for 5fdc552: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for a729d27: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

✅ Gradle check result for 7cdcbd0: SUCCESS

Contributor

✅ Gradle check result for bc79b5d: SUCCESS

@gaobinlong
Collaborator Author

Hi @reta , can this PR be merged now?

@reta
Collaborator

reta commented Jan 17, 2024

Hi @reta , can this PR be merged now?

Yeah, sure, we cannot get @msfroh's attention sadly :)

@reta self-requested a review January 17, 2024 13:13
@reta merged commit 5dd4b61 into opensearch-project:main Jan 17, 2024
33 checks passed
@reta added the v2.12.0 and backport 2.x labels Jan 17, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-10967-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5dd4b61e83d9bbbdfd7fd490de4e8336bcc8e4f8
# Push it to GitHub
git push --set-upstream origin backport/backport-10967-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-10967-to-2.x.

@reta
Collaborator

reta commented Jan 17, 2024

@gaobinlong expected backport failure, could you please send a manual one? Thank you.

gaobinlong added a commit to gaobinlong/OpenSearch that referenced this pull request Jan 18, 2024
…#10967)

* Remove ingest processor supports field patterns and excluding fields

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Format some code

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Fix test failure

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Remove the code of field pattern

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Add skip version in rest test yml

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

* Make fields and exclude_fields mutually exclusive when constructing RemoveProcessor

Signed-off-by: Gao Binlong <gbinlong@amazon.com>

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 5dd4b61)
reta pushed a commit that referenced this pull request Jan 18, 2024
(cherry picked from commit 5dd4b61)
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
backport 2.x, backport-failed, Indexing & Search, v2.12.0, v3.0.0
2 participants