Adds search backpressure documentation #1790

Merged 10 commits on Nov 10, 2022
1 change: 1 addition & 0 deletions _api-reference/nodes-apis/nodes-stats.md
@@ -615,6 +615,7 @@ http.total_opened | Integer | The total number of HTTP connections the node has
[adaptive_selection](#adaptive_selection) | Object | Statistics about adaptive selections for the node.
[indexing_pressure](#indexing_pressure) | Object | Statistics related to the node's indexing pressure.
[shard_indexing_pressure](#shard_indexing_pressure) | Object | Statistics related to indexing pressure at the shard level.
[search_backpressure]({{site.url}}{{site.baseurl}}/opensearch/search-backpressure#search-backpressure-stats-api) | Object | Statistics related to search backpressure.

### `indices`

217 changes: 217 additions & 0 deletions _opensearch/search-backpressure.md
@@ -0,0 +1,217 @@
---
layout: default
title: Search backpressure
nav_order: 63
has_children: false
---

# Search backpressure

Search backpressure is a mechanism to identify resource-intensive search requests and cancel them when the node is under duress. If a search request on a node or shard has breached the resource limits and does not recover within a certain threshold, it is rejected. These thresholds are dynamic and configurable through [cluster settings](#search-backpressure-settings).

## Measuring resource consumption

To decide whether to apply search backpressure, OpenSearch periodically measures the following resource consumption statistics for each search request:

- CPU usage
- Heap usage
- Elapsed time

An observer thread tracks the resource consumption of each task thread. It measures the resource consumption at several checkpoints during the query phase of a shard search request. If the node is determined to be under duress based on the JVM memory pressure and CPU utilization, the server examines the resource consumption for each search task. It determines whether the CPU usage and elapsed time are within their fixed thresholds, and it compares the heap usage against the rolling average of the heap usage of the most recent tasks (100 by default). If the task is among the most resource-intensive based on these criteria, the task is canceled.

I suggest re-wording this slightly.

An observer thread periodically measures the resource usage of the node. If the node is determined to be under duress, then the resource usage of each search shard task is examined and compared against some tunable thresholds. CPU usage, heap usage and elapsed time are considered to give each task a cancellation score which is then used to cancel the most resource-intensive tasks.
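
To make the per-task checks described above concrete, here is a minimal Python sketch, not the actual OpenSearch implementation. It assumes the default thresholds quoted in this PR (15 seconds of CPU time, 30 seconds of elapsed time, a heap variance of 2.0, and a 100-task moving-average window). The class and function names, and the `heap usage exceeded` reason string, are hypothetical.

```python
from collections import deque

# Defaults taken from the settings discussed in this PR.
# All names in this sketch are hypothetical, chosen for illustration only.
CPU_TIME_THRESHOLD_MS = 15_000
ELAPSED_TIME_THRESHOLD_MS = 30_000
HEAP_VARIANCE = 2.0
HEAP_WINDOW_SIZE = 100


class HeapMovingAverage:
    """Rolling average of heap usage over the most recent completed tasks."""

    def __init__(self, window_size=HEAP_WINDOW_SIZE):
        # A bounded deque drops the oldest sample once the window is full.
        self._samples = deque(maxlen=window_size)

    def record(self, heap_bytes):
        self._samples.append(heap_bytes)

    def value(self):
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)


def cancellation_reasons(cpu_ms, elapsed_ms, heap_bytes, heap_avg_bytes):
    """Return the list of thresholds that a single task has breached."""
    reasons = []
    if cpu_ms >= CPU_TIME_THRESHOLD_MS:
        reasons.append("cpu usage exceeded")
    if elapsed_ms >= ELAPSED_TIME_THRESHOLD_MS:
        reasons.append("elapsed time exceeded")
    # taskHeapUsage >= heapUsageMovingAverage * variance, as in the settings table.
    # Skipping the check while no average exists is an assumption of this sketch.
    if heap_avg_bytes > 0 and heap_bytes >= heap_avg_bytes * HEAP_VARIANCE:
        reasons.append("heap usage exceeded")
    return reasons
```

A task with an empty list of reasons would not be considered for cancellation. How the reasons are combined into a cancellation score is not described in this PR, so the sketch stops at collecting them.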


Every minute OpenSearch can cancel at most 1% of the number of currently running search shard tasks. Once a task is canceled, OpenSearch monitors the node for the next two seconds to determine if it is still under duress.

I suggest re-wording this slightly.

OpenSearch limits the number of cancellations as a fraction of successful task completions and cancellations per unit time. It continues to monitor and cancel tasks until the node is out of duress.
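
One common way to combine a per-millisecond rate with a burst limit is a token bucket. The Python sketch below illustrates that idea with the `cancellation_rate` (0.003) and `cancellation_burst` (10) defaults from this PR. Whether OpenSearch implements its limit this way is an assumption, and the class and method names are hypothetical.

```python
class CancellationBucket:
    """Token bucket sketch: tokens refill at rate_per_ms, capped at burst."""

    def __init__(self, rate_per_ms=0.003, burst=10):
        self.rate_per_ms = rate_per_ms
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket (assumption)
        self.last_ms = 0

    def try_cancel(self, now_ms):
        """Return True if one more cancellation is allowed at time now_ms."""
        elapsed = max(0, now_ms - self.last_ms)
        # Refill for the elapsed time, never exceeding the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_ms)
        self.last_ms = now_ms
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With the defaults, a full bucket allows up to 10 cancellations at once, after which roughly 3 more become available per second.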


## Canceled queries

If a query is canceled, instead of receiving search results you receive an error from the server similar to the error below:

Depending on what all shards failed, it may be possible that OpenSearch returns partial results. Can we re-word this slightly?


```json
{
"error" : {
"root_cause" : [
{
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
],
"type" : "search_phase_execution_exception",
"reason" : "all shards failed",
"phase" : "query",
"grouped" : true,
"failed_shards" : [
{
"shard" : 0,
"index" : "nyc_taxis",
"node" : "MGkMkg9wREW3IVewZ7U_jw",
"reason" : {
"type" : "task_cancelled_exception",
"reason" : "Task is cancelled due to high resource consumption"
}
}
]
},
"status" : 500
}
```

Sharing an up-to-date sample cancellation response. Can you please update?

```json
{
    "error": {
        "root_cause": [
            {
                "type": "task_cancelled_exception",
                "reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
            },
            {
                "type": "task_cancelled_exception",
                "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
            }
        ],
        "type": "search_phase_execution_exception",
        "reason": "all shards failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [
            {
                "shard": 0,
                "index": "foobar",
                "node": "7yIqOeMfRyWW1rHs2S4byw",
                "reason": {
                    "type": "task_cancelled_exception",
                    "reason": "cancelled task with reason: cpu usage exceeded [17.9ms >= 15ms], elapsed time exceeded [1.1s >= 300ms]"
                }
            },
            {
                "shard": 1,
                "index": "foobar",
                "node": "7yIqOeMfRyWW1rHs2S4byw",
                "reason": {
                    "type": "task_cancelled_exception",
                    "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]"
                }
            }
        ]
    },
    "status": 500
}

```
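
For clients that want to tell backpressure cancellations apart from other failures, the following Python sketch walks the `failed_shards` array of an error body shaped like the samples above. The function name is hypothetical, and the sketch only handles the all-shards-failed shape shown here, not partial results.

```python
def cancelled_shards(response):
    """Return (index, shard, reason) for each shard failure caused by cancellation."""
    error = response.get("error", {})
    results = []
    for failure in error.get("failed_shards", []):
        reason = failure.get("reason", {})
        if reason.get("type") == "task_cancelled_exception":
            results.append((failure.get("index"), failure.get("shard"), reason.get("reason")))
    return results


# Example using a trimmed copy of the reviewer's sample response.
sample = {
    "error": {
        "type": "search_phase_execution_exception",
        "failed_shards": [
            {
                "shard": 1,
                "index": "foobar",
                "reason": {
                    "type": "task_cancelled_exception",
                    "reason": "cancelled task with reason: elapsed time exceeded [1.1s >= 300ms]",
                },
            }
        ],
    },
    "status": 500,
}
for index, shard, reason in cancelled_shards(sample):
    print(f"{index}[{shard}]: {reason}")
```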

## Search backpressure modes

Search backpressure runs in `monitor_only` (default), `enforced`, or `disabled` mode. In the `enforced` mode, the server rejects search requests. In the `monitor_only` mode, the server does not actually cancel search requests, but tracks statistics about them. You can specify the mode in the [`search_backpressure.mode`](#search-backpressure-settings) parameter.
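
As an illustration of switching modes, the Python sketch below builds a request body for the standard cluster settings API (`PUT _cluster/settings`). The helper name is hypothetical, and the sketch only prints the request rather than sending it.

```python
import json

VALID_MODES = ("monitor_only", "enforced", "disabled")


def mode_settings_body(mode, persistent=True):
    """Build a cluster settings body that sets search_backpressure.mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # Persistent settings survive a full cluster restart; transient ones do not.
    scope = "persistent" if persistent else "transient"
    return {scope: {"search_backpressure.mode": mode}}


print("PUT _cluster/settings")
print(json.dumps(mode_settings_body("enforced"), indent=2))
```

Because the setting is dynamic, applying a body like this should not require a restart.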

## Search backpressure settings

Search backpressure adds several settings to the standard OpenSearch cluster settings. These settings are dynamic, so you can change the default behavior of this feature without restarting your cluster.

Setting | Default | Description
:--- | :--- | :---
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;mode | `monitor_only` | The [mode](#search-backpressure-modes) for search backpressure. Valid values are `monitor_only`, `enforced`, or `disabled`.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;interval | 1 second | The interval at which the observer thread measures the resource consumption and cancels tasks.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_ratio | 10% | The maximum percentage of tasks to cancel out of the number of successful task completions.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_rate | 0.003 | The maximum number of tasks to cancel per millisecond.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;cancellation_burst | 10 | The maximum number of tasks that can be canceled before no further cancellations are made.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;num_consecutive_breaches | 3 | The number of consecutive limit breaches after which the node is marked in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cpu_threshold | 90% | The CPU usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;node_duress.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;heap_threshold | 70% | The heap usage threshold (in percentage) for a node to be considered in duress.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_heap_threshold | 5% | The heap usage threshold (in percentage) for the sum of heap usages across all search tasks before server-side cancellation is applied.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_heap_threshold | 0.5% | The heap usage threshold (in percentage) for one task before it is considered for cancellation.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_heap_variance | 2 | The heap usage variance for one task before it is considered for cancellation. A task is considered for cancellation when `taskHeapUsage` is greater than or equal to `heapUsageMovingAverage` &middot; `variance`.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_cpu_time_threshold | 15 seconds | The CPU usage threshold (in milliseconds) for one task before it is considered for cancellation.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_task_elapsed_time_threshold | 30 seconds | The elapsed time threshold (in milliseconds) for one task before it is considered for cancellation.

Some of these settings have been renamed. Sharing the latest ones along with simplified descriptions.

  • search_backpressure.mode (default monitor_only)
    The search backpressure mode. Valid values are monitor_only, enforced, or disabled.

  • search_backpressure.interval_millis (default 1 second)
    The interval at which the observer thread measures the resource usage and cancels tasks.

  • search_backpressure.cancellation_ratio (default 10%)
    The maximum number of tasks to cancel as a fraction of successful task completions.

  • search_backpressure.cancellation_rate (default 0.003)
    The maximum number of tasks to cancel per millisecond of elapsed time.

  • search_backpressure.cancellation_burst (default 10)
    The maximum number of tasks to cancel in a single iteration of the observer thread.

  • search_backpressure.node_duress.num_successive_breaches (default 3)
    The number of successive limit breaches after which the node is considered under duress.

  • search_backpressure.node_duress.cpu_threshold (default 90%)
    The CPU usage threshold (in percentage) for a node to be considered in duress.

  • search_backpressure.node_duress.heap_threshold (default 70%)
    The heap usage threshold (in percentage) for a node to be considered in duress.

  • search_backpressure.search_shard_task.total_heap_percent_threshold (default 5%)
    The heap usage threshold (in percentage) for the sum of all search shard tasks before cancellation is applied.

  • search_backpressure.search_shard_task.heap_percent_threshold (default 0.5%)
    The heap usage threshold (in percentage) for a single search shard task before it is considered for cancellation.

  • search_backpressure.search_shard_task.heap_variance (default 2.0)
    The minimum variance of a single search shard task's heap usage compared to the rolling average of previously completed tasks before it is considered for cancellation.

  • search_backpressure.search_shard_task.heap_moving_average_window_size (default 100)
    The number of previously completed search shard tasks to consider when calculating the moving average of heap usage.

  • search_backpressure.search_shard_task.cpu_time_millis_threshold (default 15 seconds)
    The CPU usage threshold (in milliseconds) for a single search shard task before it is considered for cancellation.

  • search_backpressure.search_shard_task.elapsed_time_millis_threshold (default 30 seconds)
    The elapsed time threshold (in milliseconds) for a single search shard task before it is considered for cancellation.
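
The `node_duress` settings above can be read as a streak counter: a node is considered under duress once a resource stays at or above its threshold for a number of successive observations. The Python sketch below shows that reading with the defaults from the list (90% CPU, 70% heap, 3 breaches). Tracking the streak separately per resource is an assumption, and the names are hypothetical.

```python
class DuressTracker:
    """Counts successive observations at or above a usage threshold."""

    def __init__(self, threshold, num_successive_breaches=3):
        self.threshold = threshold
        self.required = num_successive_breaches
        self.streak = 0

    def observe(self, usage):
        """Record one sample (0.0 to 1.0) and report whether the streak is long enough."""
        if usage >= self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any sample below the threshold resets the streak
        return self.streak >= self.required


def node_in_duress(cpu_tracker, heap_tracker, cpu_usage, heap_usage):
    """Feed one sample to both trackers; the node is in duress if either trips."""
    cpu_breached = cpu_tracker.observe(cpu_usage)
    heap_breached = heap_tracker.observe(heap_usage)
    return cpu_breached or heap_breached
```

In this reading, a single observation below the threshold is enough to take that resource out of duress.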


## Search Backpressure Stats API
Introduced 2.4
{: .label .label-purple }

You can use the [nodes stats API operation]({{site.url}}{{site.baseurl}}/api-reference/nodes-apis/nodes-stats/) to monitor server-side request cancellations.

#### Sample request

To retrieve the stats, use the following request:

```json
GET _nodes/stats/search_backpressure
```

nit: The response below is with human-readable fields enabled. Can you update this to:

```json
GET _nodes/stats/search_backpressure?human=true

```

#### Sample response

The response contains server-side request cancellation statistics:

```json
{
"_nodes": {
"total": 1,
"successful": 1,
"failed": 0
},
"cluster_name": "runTask",
"nodes": {
"T7aqO6zaQX-lt8XBWBYLsA": {
"timestamp": 1667409521070,
"name": "runTask-0",
"transport_address": "127.0.0.1:9300",
"host": "127.0.0.1",
"ip": "127.0.0.1:9300",
"roles": [
"cluster_manager",
"data",
"ingest",
"remote_cluster_client"
],
"attributes": {
"testattr": "test",
"shard_indexing_pressure_enabled": "true"
},
"search_backpressure": {
"search_shard_task": {
"resource_tracker_stats": {
"heap_usage_tracker": {
"cancellation_count": 34,
"current_max": "1.1mb",
"current_max_bytes": 1203272,
"current_avg": "683.8kb",
"current_avg_bytes": 700267,
"rolling_avg": "1.1mb",
"rolling_avg_bytes": 1156270
},
"cpu_usage_tracker": {
"cancellation_count": 318,
"current_max": "731.3ms",
"current_max_millis": 731,
"current_avg": "303.6ms",
"current_avg_millis": 303
},
"elapsed_time_tracker": {
"cancellation_count": 310,
"current_max": "1.3s",
"current_max_millis": 1305,
"current_avg": "649.3ms",
"current_avg_millis": 649
}
},
"cancellation_stats": {
"cancellation_count": 318,
"cancellation_limit_reached_count": 97
}
},
"mode": "enforced"
}
}
}
}
```
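
A response like the one above can be condensed into a per-node summary. The following Python sketch assumes the field layout shown in the sample response. The function name and the keys of the returned dictionary are hypothetical.

```python
def cancellation_summary(stats):
    """Summarize search backpressure cancellations for each node in a nodes stats response."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        backpressure = node.get("search_backpressure")
        if backpressure is None:
            continue  # node did not report search backpressure stats
        task = backpressure.get("search_shard_task", {})
        trackers = task.get("resource_tracker_stats", {})
        totals = task.get("cancellation_stats", {})
        summary[node_id] = {
            "mode": backpressure.get("mode"),
            "total": totals.get("cancellation_count", 0),
            "limit_reached": totals.get("cancellation_limit_reached_count", 0),
            "by_tracker": {
                name: tracker.get("cancellation_count", 0)
                for name, tracker in trackers.items()
            },
        }
    return summary
```

In the sample response the per-tracker counts (34, 318, and 310) add up to more than the total of 318, which suggests that one canceled task can be counted by more than one tracker.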

### Response fields

The response contains the following fields.

Field Name | Data Type | Description
:--- | :--- | :---
search_backpressure | Object | Contains statistics about search backpressure.

For consistency, either delete the verb "contains" in line 173 and 174 or add a verb to each description in lines 175-177.

search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task | Object | Contains statistics specific to the search shard task.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task.<br>&nbsp;&nbsp;&nbsp;&nbsp;[resource_tracker_stats](#resource_tracker_stats) | Object | Statistics about the current tasks.
search_backpressure.<br>&nbsp;&nbsp;&nbsp;&nbsp;search_shard_task.<br>&nbsp;&nbsp;&nbsp;&nbsp;[cancellation_stats](#cancellation_stats) | Object | Statistics about the canceled tasks since the node last restarted.
search_backpressure.mode | String | The [mode](#search-backpressure-modes) for search backpressure.

### `resource_tracker_stats`

The `resource_tracker_stats` object contains the statistics for each resource tracker: [`elapsed_time_tracker`](#elapsed_time_tracker), [`heap_usage_tracker`](#heap_usage_tracker), and [`cpu_usage_tracker`](#cpu_usage_tracker).

#### `elapsed_time_tracker`

The `elapsed_time_tracker` object contains the following statistics related to the elapsed time.

Field Name | Data Type | Description
:--- | :--- | :---
cancellation_count | Integer | The number of tasks canceled because of excessive elapsed time since the node last restarted.
current_max_millis | Integer | The maximum elapsed time for all tasks currently running on the node, in milliseconds.
current_avg_millis | Integer | The average elapsed time for all tasks currently running on the node, in milliseconds.

#### `heap_usage_tracker`

The `heap_usage_tracker` object contains the following statistics related to the heap usage.

Field Name | Data Type | Description
:--- | :--- | :---
cancellation_count | Integer | The number of tasks canceled because of excessive heap usage since the node last restarted.
current_max_bytes | Integer | The maximum heap usage for all tasks currently running on the node, in bytes.
current_avg_bytes | Integer | The average heap usage for all tasks currently running on the node, in bytes.
rolling_avg_bytes | Integer | The rolling average heap usage for the most recent tasks (100 by default), in bytes.

nit: Rolling average is not hard-coded to work with just 100 most recent tasks.

This is a configurable setting defined by search_backpressure.search_shard_task.heap_moving_average_window_size. 100 is just the default value for it.


#### `cpu_usage_tracker`

The `cpu_usage_tracker` object contains the following statistics related to the CPU usage.

Field Name | Data Type | Description
:--- | :--- | :---
cancellation_count | Integer | The number of tasks canceled because of excessive CPU usage since the node last restarted.
current_max_millis | Integer | The maximum CPU time for all tasks currently running on the node, in milliseconds.
current_avg_millis | Integer | The average CPU time for all tasks currently running on the node, in milliseconds.

### `cancellation_stats`

The `cancellation_stats` object contains the following statistics for canceled tasks.

Field Name | Data Type | Description
:--- | :--- | :---
cancellation_count | Integer | The total number of canceled tasks since the node last restarted.
cancellation_limit_reached_count | Integer | The number of times that more tasks were eligible for cancellation than the cancellation threshold allowed.
2 changes: 1 addition & 1 deletion _opensearch/segment-replication/index.md
@@ -1,7 +1,7 @@
---
layout: default
title: Segment replication
-nav_order: 63
+nav_order: 64
has_children: true
redirect_from:
- /opensearch/segment-replication/