Ingest pipeline supports modifying the op_type parameter of an indexing request #15031

Open: gaobinlong wants to merge 6 commits into main

gaobinlong
Collaborator

@gaobinlong gaobinlong commented Jul 31, 2024

Description

This PR adds a new metadata field, _op_type, to IngestDocument so that ingest processors can modify the op_type parameter of an indexing request. This is useful when users change the target of their indexing requests from an ordinary index to a data stream. Data streams only accept an op_type of create, so the following bulk request fails with the exception "only write ops with an op_type of create are allowed in data streams":

PUT _index_template/template_2
{
  "index_patterns": [
    "ds*"
  ],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_replicas": 0
    },
    "mappings": {}
  },
  "priority": 500
}

PUT _data_stream/ds1

PUT /ds1/_bulk?refresh
{"index":{ }}
{ "@timestamp": "2024-03-08T11:04:05.000Z", "foo":"bar" }

To make the request succeed, users have to change the op_type to create in the request body:

PUT /ds1/_bulk?refresh
{"create":{ }}
{ "@timestamp": "2024-03-08T11:04:05.000Z", "foo":"bar" }

The index API has the same issue.
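
For clients whose code can be changed, the request-body fix above amounts to swapping the action key on each bulk action line. A minimal Python sketch of that rewrite follows. The helper name rewrite_index_to_create is hypothetical and not part of any OpenSearch client; it assumes a well-formed NDJSON bulk body in which every action except delete is followed by a source line.

```python
import json


def rewrite_index_to_create(bulk_body: str) -> str:
    """Return a copy of an NDJSON bulk body with `index` actions turned into `create`.

    Hypothetical helper for illustration only. Source lines are passed through
    unchanged; action lines are re-serialized with json.dumps.
    """
    lines = [ln for ln in bulk_body.splitlines() if ln.strip()]
    out = []
    i = 0
    while i < len(lines):
        action = json.loads(lines[i])
        # Each action line is a single-key object: {"index": {...}}, {"create": {...}}, ...
        op, meta = next(iter(action.items()))
        if op == "index":
            action = {"create": meta}
        out.append(json.dumps(action))
        i += 1
        # Every bulk action except `delete` is followed by a source (or update) line.
        if op != "delete":
            out.append(lines[i])
            i += 1
    # The bulk API expects the body to end with a newline.
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    body = '{"index":{ }}\n{ "@timestamp": "2024-03-08T11:04:05.000Z", "foo":"bar" }\n'
    print(rewrite_index_to_create(body), end="")
```

This is the client-side change that the PR aims to make unnecessary; it is shown only to make clear what "changing the op_type in the request body" involves.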

This PR gives users the option of setting up an ingest pipeline that changes the op_type parameter to create, so the client code does not have to change. Usage:

PUT _ingest/pipeline/set_processor
{
  "processors": [
    {
      "set": {
        "field": "_op_type",
        "value": "create"
      }
    }
  ]
}

PUT ds1/_settings
{
  "index.default_pipeline": "set_processor"
}

PUT /ds1/_bulk?refresh
{"index":{ }}
{ "@timestamp": "2024-03-08T11:04:05.000Z", "foo":"bar" }
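
The flow in the usage above can be pictured with a toy model. This is a sketch under stated assumptions, not the IngestService code changed by this PR: ToyIndexRequest, set_processor and run_pipeline are hypothetical names, and the only behavior taken from the description is that _op_type is exposed as a metadata field that a processor can overwrite before the request is executed.

```python
from dataclasses import dataclass


@dataclass
class ToyIndexRequest:
    """Hypothetical stand-in for an indexing request."""
    index: str
    source: dict
    op_type: str = "index"


def set_processor(field_name, value):
    """Toy version of a `set` processor: overwrite one field of the document."""
    def apply(doc):
        doc[field_name] = value
        return doc
    return apply


def run_pipeline(request, processors):
    # Expose request metadata next to the source fields, so processors can
    # read and modify `_op_type` like any other field (assumption of this toy).
    doc = {"_index": request.index, "_op_type": request.op_type, **request.source}
    for processor in processors:
        doc = processor(doc)
    # Copy the (possibly modified) metadata back onto the request.
    request.index = doc.pop("_index")
    request.op_type = doc.pop("_op_type")
    request.source = doc
    return request


if __name__ == "__main__":
    req = ToyIndexRequest(index="ds1", source={"foo": "bar"})
    run_pipeline(req, [set_processor("_op_type", "create")])
    print(req.op_type, req.source)
```

In this model the client still sends an index action, and the operation type only changes after the pipeline has run, which is the server-side behavior the description proposes.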

Related Issues

Resolves #2856.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ng request

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

❌ Gradle check result for 4025a02: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

❌ Gradle check result for 8f9f91a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

❕ Gradle check result for dde3be1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Contributor

✅ Gradle check result for 94c374c: SUCCESS


codecov bot commented Jul 31, 2024

Codecov Report

Attention: Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Project coverage is 71.81%. Comparing base (a918530) to head (e9c8b32).
Report is 218 commits behind head on main.

Files with missing lines | Patch % | Lines
...main/java/org/opensearch/ingest/IngestService.java | 80.00% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #15031      +/-   ##
============================================
- Coverage     71.84%   71.81%   -0.04%     
+ Complexity    62911    62897      -14     
============================================
  Files          5176     5176              
  Lines        295133   295149      +16     
  Branches      42676    42680       +4     
============================================
- Hits         212029   211951      -78     
- Misses        65709    65754      +45     
- Partials      17395    17444      +49     


Member

@owaiskazi19 owaiskazi19 left a comment

Looks good overall with minor suggestions

@owaiskazi19
Member

@andrross another look?

@owaiskazi19
Member

@gaobinlong can you resolve the conflicts?

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

github-actions bot commented Aug 7, 2024

❌ Gradle check result for a8bd353: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@gaobinlong added the Indexing label and removed the Indexing & Search label on Aug 7, 2024
Contributor

github-actions bot commented Aug 7, 2024

❕ Gradle check result for a8bd353: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testReadRange

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Contributor

github-actions bot commented Aug 7, 2024

✅ Gradle check result for e9c8b32: SUCCESS

@andrross
Member

andrross commented Aug 8, 2024

So this PR gives users an option that they can setup an ingest pipeline with modifying the op_type parameter to create to avoid changing the client code

@gaobinlong I'm not convinced this is a good idea. create has different semantics than the other operation types. In the general case you can't take a system that is ingesting documents with the index operation type and replace it with create and expect things to work because those operations do different things.
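
The difference in semantics can be illustrated with a toy in-memory model. This is a sketch, not OpenSearch code: ToyIndex and VersionConflict are hypothetical names. It only mirrors the documented behavior that index creates or overwrites a document with a given ID, while create is rejected with a version conflict when the ID already exists.

```python
class VersionConflict(Exception):
    """Toy stand-in for the conflict error returned when `create` hits an existing ID."""


class ToyIndex:
    """Hypothetical in-memory index keyed by document ID."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        # `index` is an upsert: it overwrites any existing document with this ID.
        existed = doc_id in self.docs
        self.docs[doc_id] = source
        return "updated" if existed else "created"

    def create(self, doc_id, source):
        # `create` is put-if-absent: it refuses to touch an existing document.
        if doc_id in self.docs:
            raise VersionConflict(f"document {doc_id!r} already exists")
        self.docs[doc_id] = source
        return "created"


if __name__ == "__main__":
    idx = ToyIndex()
    idx.index("1", {"v": 1})
    idx.index("1", {"v": 2})  # succeeds, replaces the document
    try:
        idx.create("1", {"v": 3})
    except VersionConflict as exc:
        print("create rejected:", exc)
```

A client that relies on repeated index calls to overwrite documents would start seeing rejections if the same requests were silently converted to create.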

@gaobinlong
Collaborator Author

So this PR gives users an option that they can setup an ingest pipeline with modifying the op_type parameter to create to avoid changing the client code

@gaobinlong I'm not convinced this is a good idea. create has different semantics than the other operation types. In the general case you can't take a system that is ingesting documents with the index operation type and replace it with create and expect things to work because those operations do different things.

Thanks @andrross. This change doesn't target the general case. It targets users who want to write to a data stream without making any code or configuration change in the client. For example, when Logstash writes to a data stream in OpenSearch, the action setting must be set to create, as in the example [here](https://opensearch.org/docs/latest/tools/logstash/ship-to-opensearch/):

output {
    opensearch {
        hosts => ["https://hostname:port"]
        auth_type => {
            type => 'basic'
            user => 'admin'
            password => 'admin'
        }
        index => "my-data-stream"
        action => "create"
    }
}

However, if the ingestion tool doesn't support the action parameter, users have no options. This change provides some flexibility: they can configure the op_type on the client side or modify it on the server side. In addition, fields such as _index, _routing, if_seq_no and if_primary_term can already be modified by an ingest pipeline, so I think it makes sense to support modifying the op_type parameter during ingest pipeline execution as well.

@andrross
Member

if the ingestion tool doesn't support the action parameter, users have no options

Are there users in this situation asking for this feature?

The reason I'm hesitant is that while it does solve the above use case, it seems like it could let users really shoot themselves in the foot, in ways ranging from the subtle (the different semantics of index vs create can cause hard-to-track-down errors in the system) to the absurd (always set op_type to delete). @msfroh what do you think?

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot added and then removed the stalled label on Sep 17, 2024
Labels
backport 2.x, enhancement, Indexing & Search, Indexing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changing op_type in ingest pipeline in case of _bulk operation
3 participants