Remove MimirProvisioningTooManyActiveSeries alert #5593

Merged on Jul 28, 2023 (1 commit)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -74,6 +74,7 @@
 * [CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
 * [CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
 * [CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
+* [CHANGE] Alerts: removed `MimirProvisioningTooManyActiveSeries` alert. You should configure `-ingester.instance-limits.max-series` and rely on `MimirIngesterReachingSeriesLimit` alert instead. #5593
 * [ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
 * [ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
 * [ENHANCEMENT] Dashboards: show time spend waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313

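The CHANGELOG entry above recommends configuring `-ingester.instance-limits.max-series` in place of the removed alert. The following is a minimal sketch, assuming the flag maps to `ingester.instance_limits.max_series` in Mimir's YAML configuration. The value 1500000 is illustrative only. It is borrowed from the 1.5M in-memory series target that the removed alert was built around, and is not a recommendation.

```yaml
# Sketch: cap the number of in-memory series a single ingester will hold.
# Equivalent CLI flag: -ingester.instance-limits.max-series
ingester:
  instance_limits:
    # Illustrative value (the removed alert's 1.5M target); 0 means unlimited.
    max_series: 1500000
```

With a limit configured, `MimirIngesterReachingSeriesLimit` is the alert the entry suggests relying on as ingesters approach that ceiling.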
16 changes: 1 addition & 15 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -204,7 +204,7 @@ How to **investigate**:
 - **`ingester`**
   - Typically, ingester p99 latency is in the range 5-50ms. If the ingester latency is higher than this, you should investigate the root cause before scaling up ingesters.
   - Check out the following alerts and fix them if firing:
-    - `MimirProvisioningTooManyActiveSeries`
+    - `MimirIngesterReachingSeriesLimit`
     - `MimirProvisioningTooManyWrites`
 
 #### Read Latency
@@ -776,20 +776,6 @@ How to **investigate**:
   - `other`
 - Check both Mimir and cache logs to find more details
 
-### MimirProvisioningTooManyActiveSeries
-
-This alert fires if the average number of in-memory series per ingester is above our target (1.5M).
-
-How to **fix** it:
-
-- Scale up ingesters
-  - To find out the Mimir clusters where ingesters should be scaled up and how many minimum replicas are expected:
-    ```
-    ceil(sum by(cluster, namespace) (cortex_ingester_memory_series) / 1.5e6) >
-    count by(cluster, namespace) (cortex_ingester_memory_series)
-    ```
-  - After the scale up, the in-memory series are expected to be reduced at the next TSDB head compaction (occurring every 2h)
-
 ### MimirProvisioningTooManyWrites
 
 This alert fires if the average number of samples ingested / sec in ingesters is above our target.

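The removed runbook query compares two quantities for each cluster and namespace. The first is the minimum number of ingesters needed to keep every ingester at or below 1.5M in-memory series. The second is the number of ingesters currently reporting `cortex_ingester_memory_series`. Writing $s_i$ for that metric on ingester $i$ and $N$ for the ingester count, the check is:

$$
N_{\min} = \left\lceil \frac{\sum_{i=1}^{N} s_i}{1.5 \times 10^{6}} \right\rceil, \qquad \text{scale up if } N_{\min} > N
$$

For example, take a hypothetical cluster holding $20 \times 10^{6}$ series across $N = 12$ ingesters:

$$
N_{\min} = \left\lceil \frac{20 \times 10^{6}}{1.5 \times 10^{6}} \right\rceil = \lceil 13.\overline{3} \rceil = 14 > 12
$$

The query would therefore return 14 for that cluster, which suggests adding at least two ingesters.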
@@ -364,16 +364,6 @@ spec:
         severity: critical
   - name: mimir-provisioning
     rules:
-    - alert: MimirProvisioningTooManyActiveSeries
-      annotations:
-        message: |
-          The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-        runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-      expr: |
-        avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-      for: 2h
-      labels:
-        severity: warning
     - alert: MimirProvisioningTooManyWrites
       annotations:
         message: |

10 changes: 0 additions & 10 deletions operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -352,16 +352,6 @@ groups:
       severity: critical
 - name: mimir-provisioning
   rules:
-  - alert: MimirProvisioningTooManyActiveSeries
-    annotations:
-      message: |
-        The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-      runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-    expr: |
-      avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-    for: 2h
-    labels:
-      severity: warning
   - alert: MimirProvisioningTooManyWrites
     annotations:
       message: |

10 changes: 0 additions & 10 deletions operations/mimir-mixin-compiled/alerts.yaml
@@ -352,16 +352,6 @@ groups:
       severity: critical
 - name: mimir-provisioning
   rules:
-  - alert: MimirProvisioningTooManyActiveSeries
-    annotations:
-      message: |
-        The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-      runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-    expr: |
-      avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-    for: 2h
-    labels:
-      severity: warning
   - alert: MimirProvisioningTooManyWrites
     annotations:
       message: |

18 changes: 0 additions & 18 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -538,24 +538,6 @@ local utils = import 'mixin-utils/utils.libsonnet';
     {
       name: 'mimir-provisioning',
       rules: [
-        {
-          alert: $.alertName('ProvisioningTooManyActiveSeries'),
-          // We target each ingester to 1.5M in-memory series. This alert fires if the average
-          // number of series / ingester in a Mimir cluster is > 1.6M for 2h (we compact
-          // the TSDB head every 2h).
-          expr: |||
-            avg by (%s) (cortex_ingester_memory_series) > 1.6e6
-          ||| % [$._config.alert_aggregation_labels],
-          'for': '2h',
-          labels: {
-            severity: 'warning',
-          },
-          annotations: {
-            message: |||
-              The number of in-memory series per ingester in %(alert_aggregation_variables)s is too high.
-            ||| % $._config,
-          },
-        },
         {
           alert: $.alertName('ProvisioningTooManyWrites'),
           // 80k writes / s per ingester max.