Task consumer Integration #2293

Merged

Conversation

sruti1312
Contributor

@sruti1312 sruti1312 commented Mar 1, 2022

Signed-off-by: Sruti Parthiban <partsrut@amazon.com>

Description

Integrate task consumers to capture task resource information during unregister.
Add consumer that logs topN expensive search tasks

[2022-02-28T20:12:26,811][INFO ][t.detailslog             ] [runTask-0] taskId[34], type[direct], action[indices:data/read/search[phase/query]], description[shardId[[shakespeare][0]]], start_time_millis[1646107], resource_stats[{memory=100, cpu=100}], metadata[source[{"query":{"query_string":{"query":"king","fields":[],"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"analyze_wildcard":false,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}}}]], 
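One common way to keep only the N most expensive tasks is a bounded min-heap. The sketch below is illustrative only and is not the code in this PR: `TaskSample`, `TopNTracker`, and the `memoryBytes` field are hypothetical names, and ranking by memory alone is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a finished task; field names are assumptions. */
final class TaskSample {
    final long taskId;
    final long memoryBytes;

    TaskSample(long taskId, long memoryBytes) {
        this.taskId = taskId;
        this.memoryBytes = memoryBytes;
    }
}

/** Keeps only the N most memory-expensive samples seen so far. Not thread-safe. */
final class TopNTracker {
    private final int n;
    // Min-heap: the cheapest retained sample sits at the head and is evicted first.
    private final PriorityQueue<TaskSample> heap =
        new PriorityQueue<>(Comparator.comparingLong((TaskSample t) -> t.memoryBytes));

    TopNTracker(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    void record(TaskSample sample) {
        if (heap.size() < n) {
            heap.add(sample);
        } else if (heap.peek().memoryBytes < sample.memoryBytes) {
            heap.poll();      // drop the cheapest retained sample
            heap.add(sample);
        }
    }

    /** Returns the retained samples, most expensive first. */
    List<TaskSample> snapshot() {
        List<TaskSample> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((TaskSample t) -> t.memoryBytes).reversed());
        return out;
    }
}
```

Each `record` call costs O(log N), so the tracker stays cheap even when most tasks never make it into the top N.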

Issues Resolved

#1009

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@sruti1312 sruti1312 requested a review from a team as a code owner March 1, 2022 04:18
@sruti1312 sruti1312 changed the title Task consumer Integration [DRAFT] Task consumer Integration Mar 1, 2022
@opensearch-ci-bot
Collaborator

Can one of the admins verify this patch?

@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 96388abbca9e2e1c51bb8e1042fbe05a5daa55dc
Log 2892

Reports 2892

@dblock dblock requested a review from kartg March 1, 2022 16:34
@dblock
Member

dblock commented Mar 21, 2022

@sruti1312 Take a look at the comments above? @kartg do a CR?

@sruti1312 sruti1312 force-pushed the feature/consumer-integration branch from 96388ab to 778463a Compare March 22, 2022 18:00
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 778463abb7369c03397ba48df27b74b519046c9e
Log 3672

Reports 3672

@sruti1312 sruti1312 force-pushed the feature/consumer-integration branch from 778463a to 44409d3 Compare March 25, 2022 00:17
@sruti1312 sruti1312 changed the title [DRAFT] Task consumer Integration Task consumer Integration Mar 25, 2022
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure 44409d314e27fd683c59b7a36a4540e357333e84
Log 3746

Reports 3746

server/src/main/java/org/opensearch/tasks/Task.java (outdated)
}
}

// TODO: Need performance testing results to understand if we can use synchronized here.

Member

Does this TODO need to be answered before merging this code? It looks like you're merging this into a feature branch. What is the ultimate plan for the feature branch?

Comment on lines 48 to 49
// generating metadata in a lazy way since source can be quite big
private final Supplier<String> metadataSupplier;

Member

Do you know for sure that doing this lazily is the right approach? It adds complexity and actually results in more work if SearchShardTask#getTaskMetadata happens to be called more than once. The optimal approach would be to memoize the result of the supplier, but that adds more complexity, so I'd suggest doing this only if you know there is a non-trivial perf benefit.

Member

Same/similar question from me: is there a reason why this can't be a simple String instead of a Supplier?

Collaborator

For SearchShardTask we want to log the query string so that we can correlate the top resource consumers with their queries. We have seen that query strings can be very large, and under a heavy search workload it can be expensive to construct one for every shard task. Depending on the consumer type, the query string may also not be needed for every shard task. A consumer (for example, TopNConsumer in this case) will therefore call this method only on the tasks that are of interest, not on every Task object.

I agree with the memoization approach, but given that only one caller is expected for this method at the moment, it seems we can do that optimization as a follow-up?

Contributor Author

As @sohami pointed out, this method will only be called once (from the consumer, and only if the query is a top resource consumer); there are no other callers. If this method ends up being called more than once in the future, we can extend it to memoize the result of the supplier. Adding a comment for future reference.
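For reference, the memoization follow-up discussed in this thread could look roughly like the sketch below. `MemoizedSupplier` is a hypothetical helper, not something this PR adds; it wraps any `Supplier` so the expensive value (here, the query source string) is built at most once, and only when first requested.

```java
import java.util.Objects;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of this PR): wraps a Supplier so the
 * expensive value is computed at most once and then reused.
 */
final class MemoizedSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private volatile boolean computed;
    private T value;

    MemoizedSupplier(Supplier<T> delegate) {
        this.delegate = Objects.requireNonNull(delegate);
    }

    @Override
    public T get() {
        if (!computed) {                 // fast path once the value is cached
            synchronized (this) {
                if (!computed) {         // re-check under the lock
                    value = delegate.get();
                    computed = true;     // volatile write publishes value
                }
            }
        }
        return value;
    }
}
```

The trade-off raised in the review still applies: this adds a lock and a field per task, so it is only worth it once a second caller of the metadata method exists.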

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
@sruti1312 sruti1312 force-pushed the feature/consumer-integration branch from 44409d3 to eeb754a Compare March 29, 2022 21:13
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure eeb754a
Log 3877

Reports 3877

@sruti1312 sruti1312 changed the base branch from feature/task-resource-tracking to main March 30, 2022 00:30
@opensearch-ci-bot
Collaborator

❌   Gradle Check failure c16db7a6d01d62a885caf40646b9aa87b1c5299d
Log 3881

Reports 3881

Member

@dblock dblock left a comment

This looks good to me overall, made some suggestions. Talk me out of the metadata part, looks like a hack.

Try to get the build to green, address @andrross's comments, etc.

@sruti1312 sruti1312 force-pushed the feature/consumer-integration branch from c16db7a to e9ed69f Compare March 31, 2022 22:55
@opensearch-ci-bot
Collaborator

✅   Gradle Check success e9ed69f
Log 3994

Reports 3994

Comment on lines 134 to 145
this.taskResourceConsumer = new ArrayList<Consumer<Task>>() {
    {
        add(new TopNSearchTasksLogger(settings));
    }
};

Member

Nitpick - gate this with a check on taskResourceConsumersEnabled ?

Contributor Author

Adding TASK_RESOURCE_CONSUMER_SETTING to clusterSetting to enable/disable as required. Also adding a check to see if enabled before publishing to consumers. Do you still think we need to include a gate check here?

Collaborator

As we continue to add more consumers, what might help is a control knob per consumer instead of a blanket-level knob.

Contributor Author

@Bukhtawar When we have more consumers, it makes sense to have a control knob per consumer! I think a blanket-level knob will also be helpful for controlling all consumers at once (like a parent-level switch). Going forward, we can add the per-consumer control knobs. Any thoughts?
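To make the two levels of control concrete, here is a minimal sketch of a blanket switch combined with per-consumer switches. Everything in it is hypothetical: `GatedConsumers`, its method names, and the string keys are illustrative and do not correspond to the settings or classes in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical registry (names are assumptions, not the PR's API): one
 * blanket switch plus one switch per consumer. Registration is expected
 * to happen once at setup, before any publish() call.
 */
final class GatedConsumers<T> {
    private static final class Entry<T> {
        final Consumer<T> consumer;
        volatile boolean enabled;

        Entry(Consumer<T> consumer, boolean enabled) {
            this.consumer = consumer;
            this.enabled = enabled;
        }
    }

    private final Map<String, Entry<T>> entries = new LinkedHashMap<>();
    private volatile boolean allEnabled;

    GatedConsumers(boolean allEnabled) {
        this.allEnabled = allEnabled;
    }

    void register(String name, Consumer<T> consumer, boolean enabled) {
        entries.put(name, new Entry<>(consumer, enabled));
    }

    void setAllEnabled(boolean enabled) {
        this.allEnabled = enabled;
    }

    void setEnabled(String name, boolean enabled) {
        Entry<T> entry = entries.get(name);
        if (entry == null) {
            throw new IllegalArgumentException("unknown consumer: " + name);
        }
        entry.enabled = enabled;
    }

    void publish(T item) {
        if (!allEnabled) {
            return; // the blanket knob overrides every per-consumer knob
        }
        for (Entry<T> entry : entries.values()) {
            if (entry.enabled) {
                entry.consumer.accept(item);
            }
        }
    }
}
```

With this shape, the blanket switch acts as the parent-level control described above, and per-consumer switches can be added later without changing the publish path.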

server/src/main/java/org/opensearch/tasks/TaskManager.java (outdated)
}
}

private synchronized void recordSearchTask(final Task searchTask) {

Collaborator

I think that instead of making this method synchronized, we should have a queue that holds all the tasks passed to the consumer. A background thread should then drain that queue and manage the topN queue from it. The reason is that unregister can be called on a network thread, and if logging takes a long time (because of configuration or a logging module issue), we will end up blocking the network thread, which can lead to other issues in the cluster.
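A rough sketch of the queue-based hand-off suggested here, under stated assumptions: `AsyncHandoff` is a hypothetical class, the queue is bounded and drops items when full (one possible policy, not something decided in this thread), and a single daemon thread does the slow work.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical hand-off (not the PR's code): callers enqueue without
 * blocking, and a single background thread runs the slow consumer.
 */
final class AsyncHandoff<T> implements AutoCloseable {
    private final BlockingQueue<T> queue;
    private final Thread worker;

    AsyncHandoff(int capacity, Consumer<T> slowConsumer) {
        BlockingQueue<T> q = new LinkedBlockingQueue<>(capacity);
        this.queue = q;
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    slowConsumer.accept(q.take()); // blocks only this thread
                }
            } catch (InterruptedException e) {
                // close() interrupted us: drain whatever is left, then exit
                T item;
                while ((item = q.poll()) != null) {
                    slowConsumer.accept(item);
                }
            }
        }, "task-consumer-drain");
        worker.setDaemon(true);
        worker.start();
    }

    /** Never blocks; returns false (the item is dropped) when the queue is full. */
    boolean offer(T item) {
        return queue.offer(item);
    }

    @Override
    public void close() {
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }
}
```

Because `offer` returns immediately, a network thread calling unregister would pay only for the enqueue, and the topN bookkeeping and logging would run on the drain thread without needing `synchronized`.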

@sruti1312
Contributor Author

Performance test results

Workload configuration details

| Key | Value |
| --- | --- |
| dataset | nyc_taxis |
| warmupIterations | 2 |
| testIterations | 3 |
| primaryShards | 5 |
| replicaShards | 1 |

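Cluster with consumer enabled

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
| --- | --- | --- | --- | --- |
| index | 4,761 | 13,287.1 | 24,975.9 | 60,989 |
| query | 120 | 139.1 | 194.7 | 233.8 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
| --- | --- | --- | --- |
| index | 12,153.6 | 13,335.1 | 22,659 |
| query | 1.543 | 1.693 | 1.723 |

Operation Counts

| Operation Type | Op Count | Op Error Count | Error Rate |
| --- | --- | --- | --- |
| index | 48,272 | 18 | 0% |
| query | 1,510 | 0 | 0% |

Usage (%)

| Resource | P50 | P90 | P99 | P100 |
| --- | --- | --- | --- | --- |
| CPU | 0 | 38 | 64 | 95 |
| Memory | 11.22 | 61 | 73.476 | 78 |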

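Cluster with consumer disabled

Latency (ms)

| Operation Type | P50 | P90 | P99 | P100 |
| --- | --- | --- | --- | --- |
| index | 4,515.6 | 11,294.7 | 24,493.5 | 61,004 |
| query | 112.9 | 130.6 | 204.2 | 261.8 |

Throughput (req/s)

| Operation Type | P0 | P50 | P100 |
| --- | --- | --- | --- |
| index | 11,058.9 | 14,221.6 | 24,538.2 |
| query | 1.574 | 1.695 | 1.726 |

Operation Counts

| Operation Type | Op Count | Op Error Count | Error Rate |
| --- | --- | --- | --- |
| index | 47,970 | 8 | 0% |
| query | 1,510 | 0 | 0% |

Usage (%)

| Resource | P50 | P90 | P99 | P100 |
| --- | --- | --- | --- | --- |
| CPU | 0 | 46 | 75 | 97 |
| Memory | 10.96 | 62 | 73.85 | 79 |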
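
To compare the two runs, the relative change in query latency can be computed directly from the figures above. The helper below is a hypothetical convenience (`LatencyDelta` is not part of the PR or the benchmark tooling); it only applies the percent-change formula to the reported query latencies.

```java
final class LatencyDelta {
    /** Percent change of `enabled` relative to `disabled`; positive means slower with the consumer on. */
    static double percentChange(double enabled, double disabled) {
        return (enabled - disabled) / disabled * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {"P50", "P90", "P99", "P100"};
        double[] enabled = {120.0, 139.1, 194.7, 233.8};   // query latency (ms), consumer enabled
        double[] disabled = {112.9, 130.6, 204.2, 261.8};  // query latency (ms), consumer disabled
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%s: %+.2f%%%n", labels[i], percentChange(enabled[i], disabled[i]));
        }
    }
}
```

In these results the query P50 and P90 are higher with the consumer enabled, while P99 and P100 are lower. With only three test iterations, differences of this size may be within run-to-run variation.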

@kotwanikunal
Member

@sruti1312 Can you please address the comments? It would be great to get this PR to a closure.

@JeffHuss

Is this PR going to target the 2.2 release or a later version? I just want to make sure the associated doc ticket for this functionality is accurate and updated so we don't miss the deadline.

@sruti1312 sruti1312 requested a review from reta as a code owner August 3, 2022 19:39
@github-actions
Contributor

github-actions bot commented Aug 5, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
@sruti1312 sruti1312 force-pushed the feature/consumer-integration branch from 52a6161 to 5f3c262 Compare August 5, 2022 01:10
@github-actions
Contributor

github-actions bot commented Aug 5, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Contributor

github-actions bot commented Aug 5, 2022

Gradle Check (Jenkins) Run Completed with:

@kartg kartg added the backport 2.x (Backport to 2.x branch), v2.2.0, and backport 2.2 (Backport to 2.2 branch) labels Aug 5, 2022
@kartg
Member

kartg commented Aug 5, 2022

@Bukhtawar can you take a look and merge this if no blockers remain? I believe @sruti1312 would like to get this into 2.2

@Bukhtawar Bukhtawar merged commit fbe93d4 into opensearch-project:main Aug 5, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 5, 2022
* Integrate task consumers to capture task resource information during unregister.
  Add consumer that logs topN expensive search tasks

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
(cherry picked from commit fbe93d4)
opensearch-trigger-bot bot pushed a commit that referenced this pull request Aug 5, 2022
* Integrate task consumers to capture task resource information during unregister.
  Add consumer that logs topN expensive search tasks

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>
(cherry picked from commit fbe93d4)
@sruti1312 sruti1312 deleted the feature/consumer-integration branch August 5, 2022 15:51
Bukhtawar pushed a commit that referenced this pull request Aug 5, 2022
* Integrate task consumers to capture task resource information during unregister.
  Add consumer that logs topN expensive search tasks

Co-authored-by: Sruti Parthiban <srutiparthiban@gmail.com>
Bukhtawar pushed a commit that referenced this pull request Aug 5, 2022
* Integrate task consumers to capture task resource information during unregister.
  Add consumer that logs topN expensive search tasks

Co-authored-by: Sruti Parthiban <srutiparthiban@gmail.com>
dreamer-89 pushed a commit to dreamer-89/OpenSearch that referenced this pull request Aug 12, 2022
* Integrate task consumers to capture task resource information during unregister.
  Add consumer that logs topN expensive search tasks

Signed-off-by: sruti1312 <srutiparthiban@gmail.com>