Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds search backpressure documentation #1790

Merged
merged 10 commits into from
Nov 10, 2022
Merged

Conversation

kolchfa-aws
Copy link
Collaborator

Fixes #795

Checklist

  • [ x] By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and subject to the Developers Certificate of Origin.
    For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws self-assigned this Nov 2, 2022
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws added 3 - Tech review PR: Tech review in progress v2.4.0 'Issues and PRs related to version v2.4.0' labels Nov 5, 2022
Comment on lines 29 to 54
{
"error" : {
"root_cause" : [
{
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "nyc_taxis",
"node" : "MGkMkg9wREW3IVewZ7U_jw",
"reason" : {
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
}
]
},
"status" : 500
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing an up-to-date sample cancellation response. Can you please update?

{
    "error": {
        "root_cause": [
            {
                "type": "task_cancelled_exception",
                "reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
            },
            {
                "type": "task_cancelled_exception",
                "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "foobar",
                "node": "7yIqOeMfRyWW1rHs2S4byw",
                "reason": {
                    "type": "task_cancelled_exception",
                    "reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
                }
            },
            {
                "shard": 1,
                "index": "foobar",
                "node": "7yIqOeMfRyWW1rHs2S4byw",
                "reason": {
                    "type": "task_cancelled_exception",
                    "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
                }
            }
        ]
    },
    "status": 500
}


An observer thread tracks the resource consumption of each task thread. It measures the resource consumption at several checkpoints during the query phase of a shard search request. If the node is determined to be under duress based on the JVM memory pressure and CPU utilization, the server examines the resource consumption for each search task. It determines if the CPU usage and elapsed time are within their fixed thresholds, and it compares the heap usage against the rolling average of the heap usage of the 100 most recent tasks. If the task is among the most resource-intensive based on these criteria, the task in canceled.

Every minute OpenSearch can cancel at most 1% of the number of currently running search shard tasks. Once a task is canceled, OpenSearch monitors the node for the next two seconds to determine if it is still under duress.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest re-wording this slightly.

OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is out of duress.

- Heap usage
- Elapsed time

An observer thread tracks the resource consumption of each task thread. It measures the resource consumption at several checkpoints during the query phase of a shard search request. If the node is determined to be under duress based on the JVM memory pressure and CPU utilization, the server examines the resource consumption for each search task. It determines if the CPU usage and elapsed time are within their fixed thresholds, and it compares the heap usage against the rolling average of the heap usage of the 100 most recent tasks. If the task is among the most resource-intensive based on these criteria, the task in canceled.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest re-wording this slightly.

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, then the resource usage of each search shard task is examined and compared against some tunable thresholds. CPU usage, heap usage and elapsed time are considered to give each task a cancellation score which is then used to cancel the most resource-intensive tasks.


## Canceled queries

If a query is canceled, instead of receiving search results you receive an error from the server similar to the error below:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on what all shards failed, it may be possible that OpenSearch returns partial results. Can we re-word this slightly?

To retrieve the stats, use the following request:

```json
GET _nodes/stats/search_backpressure
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The response below is with human-readable fields enabled. Can you update this to:

GET _nodes/stats/search_backpressure?human=true

cancellation_count | Integer | The number of tasks canceled because of excessive heap usage since the node last restarted.
current_max_bytes | Integer | The maximum heap usage for all tasks currently running on the node, in bytes.
current_avg_bytes | Integer | The average heap usage for all tasks currently running on the node, in bytes.
rolling_avg_bytes | Integer | The rolling average heap usage for the 100 most recent tasks, in bytes.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Rolling average is not hard-coded to work with just 100 most recent tasks.

This is a configurable setting defined by search_backpressure.search_shard_task.heap_moving_average_window_size. 100 is just the default value for it.

Comment on lines 61 to 79
## Search backpressure settings

Search backpressure adds several settings to the standard OpenSearch cluster settings. These settings are dynamic, so you can change the default behavior of this feature without restarting your cluster.

Setting | Default | Description
:--- | :--- | :---
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;mode | `monitor_only` | The [mode](#search-backpressure-modes) for search backpressure. Valid values are `monitor_only`, `enforced`, or `disabled`.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;interval | 1 second | The interval at which the observer thread measures the resource consumption and cancels tasks.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_ratio | 10% | The maximum percentage of tasks to cancel out of the number of successful task completions.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_rate | 0.003 | The maximum number of tasks to cancel per millisecond.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_burst | 10 | The maximum number of tasks that can be canceled before no further cancellations are made.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_consecutive_breaches | 3 | The number of consecutive limit breaches after which the node is marked in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cpu_threshold | 90% | The CPU usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;heap_threshold | 70% | The heap usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_heap_threshold | 5% | The heap usage threshold (in percentage) for the sum of heap usages across all search tasks before server-side cancellation is applied.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_heap_threshold | 0.5% | The heap usage threshold (in percentage) for one task before it is considered for cancellation.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_heap_variance | 2 | The heap usage variance for one task before it is considered for cancellation. A task is considered for cancellation when `taskHeapUsage` is greater than or equal to `heapUsageMovingAverage` &middot; `variance`.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_cpu_time_threshold | 15 seconds | The CPU usage threshold (in milliseconds) for one task before it is considered for cancellation.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_elapsed_time_threshold | 30 seconds | The elapsed time threshold (in milliseconds) for one task before it is considered for cancellation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these settings have been renamed. Sharing the latest ones along with simplified descriptions.

  • search_backpressure.mode (default monitor_only)
    The search backpressure mode. Valid values are monitor_only, enforced, or disabled.

  • search_backpressure.interval_millis (default 1 second)
    The interval at which the observer thread measures the resource usage and cancels tasks.

  • search_backpressure.cancellation_ratio (default 10%)
    The maximum number of tasks to cancel as a fraction of successful task completions.

  • search_backpressure.cancellation_rate (default 0.003)
    The maximum number of tasks to cancel per millisecond of elapsed time.

  • search_backpressure.cancellation_burst (default 10)
    The maximum number of tasks to cancel in a single iteration of the observer thread.

  • search_backpressure.node_duress.num_successive_breaches (default 3)
    The number of of successive limit breaches after which the node is considered under duress.

  • search_backpressure.node_duress.cpu_threshold (default 90%)
    The CPU usage threshold (in percentage) for a node to be considered in duress.

  • search_backpressure.node_duress.heap_threshold (default 70%)
    The heap usage threshold (in percentage) for a node to be considered in duress.

  • search_backpressure.search_shard_task.total_heap_percent_threshold (default 5%)
    The heap usage threshold (in percentage) for the sum of all search shard tasks before cancellation is applied.

  • search_backpressure.search_shard_task.heap_percent_threshold (default 0.5%)
    The heap usage threshold (in percentage) for a single search shard task before it is considered for cancellation.

  • search_backpressure.search_shard_task.heap_variance (default 2.0)
    The minimum variance of a single search shard task's heap usage usage compared to the rolling average of previously completed tasks before it is considered for cancellation.

  • search_backpressure.search_shard_task.heap_moving_average_window_size (default 100)
    The number of previously completed search shard tasks to consider when calculating the moving average of heap usage.

  • search_backpressure.search_shard_task.cpu_time_millis_threshold (default 15 seconds)
    The CPU usage threshold (in milliseconds) for a single search shard task before it is considered for cancellation.

  • search_backpressure.search_shard_task.elapsed_time_millis_threshold (default 30 seconds)
    The elapsed time threshold (in milliseconds) for a single search shard task before it is considered for cancellation.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@ketanv3
Copy link

ketanv3 commented Nov 7, 2022

Thanks for these changes. LGTM!

@kolchfa-aws kolchfa-aws added 4 - Doc review PR: Doc review in progress and removed 3 - Tech review PR: Tech review in progress labels Nov 7, 2022
@kolchfa-aws kolchfa-aws marked this pull request as ready for review November 7, 2022 14:31
@kolchfa-aws kolchfa-aws requested a review from a team as a code owner November 7, 2022 14:31
Copy link

@nssuresh2007 nssuresh2007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@Naarcha-AWS Naarcha-AWS self-requested a review November 7, 2022 20:33
Copy link
Collaborator

@vagimeli vagimeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimal changes. LGTM.

- Heap usage
- Elapsed time

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Third sentence: insert comma after "heap usage."


## Canceled queries

If a query is canceled, OpenSearch may return partial results in case some shards failed. If all shards failed, OpenSearch returns an error from the server similar to the error below:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify the second part of this sentence, I bolded my suggestion: "If a query is canceled, OpenSearch may return partial results if some shards failed."

search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;heap_threshold | 70% | The heap usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;total_heap_percent_threshold | 5% | The heap usage threshold (in percentage) for the sum of heap usages of all search shard tasks before cancellation is applied.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;heap_percent_threshold | 0.5% | The heap usage threshold (in percentage) for a single search shard task before it is considered for cancellation.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;heap_variance | 2.0 | The minimum variance of a single search shard task's heap usage usage compared to the rolling average of previously completed tasks before it is considered for cancellation.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete second "usage" after "heap usage."


Field Name | Data Type | Description
:--- | :--- | :---
search_backpressure | Object | Contains statistics about search backpressure.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, either delete the verb "contains" in line 173 and 174 or add a verb to each description in lines 175-177.

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Copy link
Contributor

@ariamarble ariamarble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@natebower natebower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolchfa-aws Please see my comments and changes and let me know if you have any questions. Thanks!

_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
- Heap usage
- Elapsed time

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage, and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the node being "determined to be under duress" standard language in this context? It reads a bit strangely to me, as though we're anthropomorphizing the node. Do we mean something like "If the node continues to be under duress"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not "continues", it is whether it becomes under duress. So, we're checking periodically for the node health. Normally, it's not under duress. If we determine that it is under duress, then we remediate.


An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, OpenSearch examines the resource usage of each search shard task and compares it against configurable thresholds. OpenSearch considers CPU usage, heap usage, and elapsed time and assigns each task a cancellation score that is then used to cancel the most resource-intensive tasks.

OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.
OpenSearch limits the number of cancellations to a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is no longer under duress.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworded for clarity.

_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
_opensearch/search-backpressure.md Outdated Show resolved Hide resolved
kolchfa-aws and others added 4 commits November 10, 2022 09:23
Co-authored-by: Nate Bower <nbower@amazon.com>
Co-authored-by: Nate Bower <nbower@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
@kolchfa-aws kolchfa-aws merged commit 91caf0c into main Nov 10, 2022
@Naarcha-AWS Naarcha-AWS deleted the server-side-cancellation branch December 13, 2022 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Doc review PR: Doc review in progress v2.4.0 'Issues and PRs related to version v2.4.0'
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Doc] Server-side rejection of search requests based on resource
6 participants