Remove MimirProvisioningTooManyActiveSeries alert (#5593)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
pracucci committed Jul 28, 2023
1 parent e34c6cc commit 242f6ac
Showing 6 changed files with 2 additions and 63 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -76,6 +76,7 @@
 * [CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
 * [CHANGE] Dashboards: default to shared crosshair on all dashboards. #5489
 * [CHANGE] Dashboards: sort variable drop-down lists from A to Z, rather than Z to A. #5490
+* [CHANGE] Alerts: removed `MimirProvisioningTooManyActiveSeries` alert. You should configure `-ingester.instance-limits.max-series` and rely on `MimirIngesterReachingSeriesLimit` alert instead. #5593
 * [ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
 * [ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
 * [ENHANCEMENT] Dashboards: show time spend waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
16 changes: 1 addition & 15 deletions docs/sources/mimir/manage/mimir-runbooks/_index.md
@@ -204,7 +204,7 @@ How to **investigate**:
 - **`ingester`**
   - Typically, ingester p99 latency is in the range 5-50ms. If the ingester latency is higher than this, you should investigate the root cause before scaling up ingesters.
   - Check out the following alerts and fix them if firing:
-    - `MimirProvisioningTooManyActiveSeries`
+    - `MimirIngesterReachingSeriesLimit`
     - `MimirProvisioningTooManyWrites`
 
 #### Read Latency
@@ -776,20 +776,6 @@ How to **investigate**:
 - `other`
   - Check both Mimir and cache logs to find more details
 
-### MimirProvisioningTooManyActiveSeries
-
-This alert fires if the average number of in-memory series per ingester is above our target (1.5M).
-
-How to **fix** it:
-
-- Scale up ingesters
-  - To find out the Mimir clusters where ingesters should be scaled up and how many minimum replicas are expected:
-    ```
-    ceil(sum by(cluster, namespace) (cortex_ingester_memory_series) / 1.5e6) >
-    count by(cluster, namespace) (cortex_ingester_memory_series)
-    ```
-- After the scale up, the in-memory series are expected to be reduced at the next TSDB head compaction (occurring every 2h)
-
 ### MimirProvisioningTooManyWrites
 
 This alert fires if the average number of samples ingested / sec in ingesters is above our target.
@@ -364,16 +364,6 @@ spec:
             severity: critical
     - name: mimir-provisioning
       rules:
-        - alert: MimirProvisioningTooManyActiveSeries
-          annotations:
-            message: |
-              The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-            runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-          expr: |
-            avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-          for: 2h
-          labels:
-            severity: warning
         - alert: MimirProvisioningTooManyWrites
           annotations:
             message: |
10 changes: 0 additions & 10 deletions operations/mimir-mixin-compiled-baremetal/alerts.yaml
@@ -352,16 +352,6 @@ groups:
           severity: critical
   - name: mimir-provisioning
     rules:
-      - alert: MimirProvisioningTooManyActiveSeries
-        annotations:
-          message: |
-            The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-          runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-        expr: |
-          avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-        for: 2h
-        labels:
-          severity: warning
       - alert: MimirProvisioningTooManyWrites
         annotations:
           message: |
10 changes: 0 additions & 10 deletions operations/mimir-mixin-compiled/alerts.yaml
@@ -352,16 +352,6 @@ groups:
           severity: critical
   - name: mimir-provisioning
     rules:
-      - alert: MimirProvisioningTooManyActiveSeries
-        annotations:
-          message: |
-            The number of in-memory series per ingester in {{ $labels.cluster }}/{{ $labels.namespace }} is too high.
-          runbook_url: https://grafana.com/docs/mimir/latest/operators-guide/mimir-runbooks/#mimirprovisioningtoomanyactiveseries
-        expr: |
-          avg by (cluster, namespace) (cortex_ingester_memory_series) > 1.6e6
-        for: 2h
-        labels:
-          severity: warning
       - alert: MimirProvisioningTooManyWrites
         annotations:
           message: |
18 changes: 0 additions & 18 deletions operations/mimir-mixin/alerts/alerts.libsonnet
@@ -538,24 +538,6 @@ local utils = import 'mixin-utils/utils.libsonnet';
     {
       name: 'mimir-provisioning',
       rules: [
-        {
-          alert: $.alertName('ProvisioningTooManyActiveSeries'),
-          // We target each ingester to 1.5M in-memory series. This alert fires if the average
-          // number of series / ingester in a Mimir cluster is > 1.6M for 2h (we compact
-          // the TSDB head every 2h).
-          expr: |||
-            avg by (%s) (cortex_ingester_memory_series) > 1.6e6
-          ||| % [$._config.alert_aggregation_labels],
-          'for': '2h',
-          labels: {
-            severity: 'warning',
-          },
-          annotations: {
-            message: |||
-              The number of in-memory series per ingester in %(alert_aggregation_variables)s is too high.
-            ||| % $._config,
-          },
-        },
         {
           alert: $.alertName('ProvisioningTooManyWrites'),
           // 80k writes / s per ingester max.

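For anyone who still wants to reproduce what the removed alert and its runbook query computed, the arithmetic can be sketched in Python. This is only an illustration and not part of the commit: the function names and the sample series counts are made up, and the sketch ignores the 2h `for` clause. The two constants are the 1.5M target and 1.6M threshold that appear in the removed rule and runbook entry.

```python
import math

# Values taken from the removed runbook entry and alert expression.
TARGET_SERIES_PER_INGESTER = 1.5e6
ALERT_THRESHOLD = 1.6e6


def min_ingester_replicas(series_per_ingester):
    """Mirror ceil(sum(cortex_ingester_memory_series) / 1.5e6) for one cluster."""
    return math.ceil(sum(series_per_ingester) / TARGET_SERIES_PER_INGESTER)


def removed_alert_condition(series_per_ingester):
    """Mirror avg(cortex_ingester_memory_series) > 1.6e6 (without the 2h 'for')."""
    average = sum(series_per_ingester) / len(series_per_ingester)
    return average > ALERT_THRESHOLD


# Hypothetical cluster: three ingesters with made-up in-memory series counts.
sample = [1_700_000, 1_650_000, 1_750_000]
needed = min_ingester_replicas(sample)
print(f"replicas: {len(sample)}, minimum expected: {needed}")
print(f"removed alert condition met: {removed_alert_condition(sample)}")
```

With these sample numbers the cluster holds 5.1M series, so the runbook query would have asked for at least 4 ingesters where only 3 are running.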