Fix scaling dashboard to work on multi-zone ingesters #365

Merged 3 commits on Jul 28, 2021
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -35,6 +35,7 @@
* [BUGFIX] Fixed `CortexIngesterHasNotShippedBlocks` alert false positive in case an ingester instance had ingested samples in the past, then no traffic was received for a long period and then it started receiving samples again. #308
* [BUGFIX] Alertmanager: fixed `--alertmanager.cluster.peers` CLI flag passed to alertmanager when HA is enabled. #329
* [BUGFIX] Fixed `CortexInconsistentRuntimeConfig` metric. #335
* [BUGFIX] Fixed scaling dashboard to correctly work when a Cortex service deployment spans across multiple zones (a zone is expected to have the `zone-[a-z]` suffix). #365

## 1.9.0 / 2021-05-18

63 changes: 47 additions & 16 deletions cortex-mixin/recording_rules.libsonnet
@@ -69,12 +69,21 @@ local utils = import 'mixin-utils/utils.libsonnet';
rules: [
{
// Convenience rule to get the number of replicas for both a deployment and a statefulset.
// Multi-zone deployments are grouped together removing the "zone-X" suffix.
record: 'cluster_namespace_deployment:actual_replicas:count',
expr: |||
sum by (cluster, namespace, deployment) (kube_deployment_spec_replicas)
or
sum by (cluster, namespace, deployment) (
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)")
label_replace(
kube_deployment_spec_replicas,
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
Contributor

Why is the first question mark necessary? Shouldn't it be:

Suggested change:
- "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
+ "deployment", "$1", "deployment", "(.*)(?:-zone-[a-z])?"

Collaborator Author

The first question mark makes the first group non-greedy. Since (?:-zone-[a-z])? is optional (because of its trailing ?), a greedy (.*) always matches the whole value, the optional group matches nothing, and the zone suffix is never removed. Writing (.*?) makes the first group non-greedy, so the optional group gets the chance to consume the suffix.

Contributor

TIL! Can you add a comment to this effect please?

Collaborator Author

Sure, done.
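
The greedy versus non-greedy behaviour discussed in this thread can be checked outside Prometheus. The following is a minimal Go sketch, not code from this PR. It assumes that `label_replace` anchors the regex on both ends and expands `$1` from the match, which is my understanding of how Prometheus applies it; `stripZone` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// stripZone is a hypothetical helper that mimics how label_replace is assumed
// to apply a pattern: the regex is anchored on both ends and "$1" is expanded
// from the match. When nothing matches, the input is returned unchanged.
func stripZone(pattern, value string) string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	idx := re.FindStringSubmatchIndex(value)
	if idx == nil {
		return value
	}
	return string(re.ExpandString(nil, "$1", value, idx))
}

func main() {
	greedy := "(.*)(?:-zone-[a-z])?" // first group swallows the zone suffix
	lazy := "(.*?)(?:-zone-[a-z])?"  // first group stops before the zone suffix

	for _, name := range []string{"ingester", "ingester-zone-a"} {
		fmt.Printf("%s: greedy=%q lazy=%q\n",
			name, stripZone(greedy, name), stripZone(lazy, name))
	}
}
```

With the greedy pattern the value `ingester-zone-a` comes back unchanged, while the non-greedy pattern yields `ingester`. A name without a zone suffix is returned as-is by both.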

)
)
or
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)"),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
Contributor

The inner label_replace only copies the statefulset label into the deployment label, so I believe both steps can be done with a single call:

Suggested change:
- label_replace(
- label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*)"),
- "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
- )
+ label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*?)(?:-zone-[a-z])?"),

Collaborator Author

That's right. I've applied the suggested change and manually tested it.
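
The equivalence claimed in this thread can be illustrated with a small model of `label_replace` acting on a single label set. This is a simplified Go sketch under the assumption that the regex is fully anchored and that a non-matching source value leaves the labels untouched. `labelReplace` is a hypothetical stand-in, not the Prometheus implementation, and the statefulset name `ingester-zone-b` is an invented example.

```go
package main

import (
	"fmt"
	"regexp"
)

// labelReplace is a simplified, hypothetical model of PromQL's label_replace
// for one label set: it copies the labels and, if the anchored pattern matches
// the source label, writes the expanded replacement into the destination label.
func labelReplace(labels map[string]string, dst, repl, src, pattern string) map[string]string {
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	val := labels[src]
	if idx := re.FindStringSubmatchIndex(val); idx != nil {
		out[dst] = string(re.ExpandString(nil, repl, val, idx))
	}
	return out
}

func main() {
	const zone = "(.*?)(?:-zone-[a-z])?"
	series := map[string]string{"statefulset": "ingester-zone-b"}

	// Original form: copy statefulset into deployment, then strip the zone.
	nested := labelReplace(
		labelReplace(series, "deployment", "$1", "statefulset", "(.*)"),
		"deployment", "$1", "deployment", zone,
	)

	// Suggested form: strip the zone while copying, in a single call.
	single := labelReplace(series, "deployment", "$1", "statefulset", zone)

	fmt.Println(nested["deployment"], single["deployment"])
}
```

Both forms produce the same `deployment` value for this input, which is why the single call is enough.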

)
|||,
},
@@ -188,7 +197,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
expr: |||
ceil(
(sum by (cluster, namespace) (
cortex_ingester_tsdb_storage_blocks_bytes{job=~".+/ingester"}
cortex_ingester_tsdb_storage_blocks_bytes{job=~".+/ingester.*"}
) / 4)
/
avg by (cluster, namespace) (
@@ -199,18 +208,23 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
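
The job matcher change in the hunk above from `.+/ingester` to `.+/ingester.*` matters because regex matchers in Prometheus are fully anchored, so the old pattern cannot match a job name that carries a suffix. A minimal Go sketch of that behaviour follows. The job value `cortex/ingester-zone-a` is a hypothetical example of a zone-suffixed job, not a name taken from this PR, and `matches` is an illustrative helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper that applies a pattern the way a PromQL
// =~ matcher is assumed to: anchored at both the start and the end.
func matches(pattern, value string) bool {
	return regexp.MustCompile("^(?:" + pattern + ")$").MatchString(value)
}

func main() {
	jobs := []string{"cortex/ingester", "cortex/ingester-zone-a"}
	for _, job := range jobs {
		fmt.Printf("%s: old=%v new=%v\n",
			job, matches(".+/ingester", job), matches(".+/ingester.*", job))
	}
}
```

The old pattern only accepts the job without a suffix, while the new one accepts both.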
{
// Convenience rule to get the CPU utilization for both a deployment and a statefulset.
// Multi-zone deployments are grouped together removing the "zone-X" suffix.
record: 'cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate',
expr: |||
sum by (cluster, namespace, deployment) (
label_replace(
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
|||,
},
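
The two-step rewrite used by the pod-based rules (pod name to workload name, then zone suffix removal) can be traced with a short Go sketch. It reuses the two patterns from the diff verbatim and assumes fully anchored matching. The pod names below are invented examples, and `firstGroup` and `deploymentForPod` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

const (
	// Strips a statefulset ordinal ("-0") or a deployment's
	// "<replicaset-hash>-<pod-hash>" suffix from a pod name.
	podPattern = "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
	// Strips an optional "-zone-X" suffix from the workload name.
	zonePattern = "(.*?)(?:-zone-[a-z])?"
)

// firstGroup returns capture group 1 of the anchored pattern, or "" when the
// value does not match at all.
func firstGroup(pattern, value string) string {
	re := regexp.MustCompile("^(?:" + pattern + ")$")
	m := re.FindStringSubmatch(value)
	if m == nil {
		return ""
	}
	return m[1]
}

// deploymentForPod applies both rewrites in the same order as the nested
// label_replace calls in the recording rules.
func deploymentForPod(pod string) string {
	return firstGroup(zonePattern, firstGroup(podPattern, pod))
}

func main() {
	pods := []string{
		"ingester-zone-a-0",           // statefulset pod in a zone
		"store-gateway-2",             // statefulset pod without a zone
		"distributor-5f7d8c9b4-x2k9p", // deployment pod
	}
	for _, pod := range pods {
		fmt.Printf("%s -> %s\n", pod, deploymentForPod(pod))
	}
}
```

Under these assumptions the three pods map to `ingester`, `store-gateway` and `distributor`, so pods from every zone of the same workload end up in one `deployment` group.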
{
// Convenience rule to get the CPU request for both a deployment and a statefulset.
// Multi-zone deployments are grouped together removing the "zone-X" suffix.
record: 'cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum',
expr: |||
# This recording rule is made compatible with the breaking changes introduced in kube-state-metrics v2
@@ -223,8 +237,11 @@ local utils = import 'mixin-utils/utils.libsonnet';
(
sum by (cluster, namespace, deployment) (
label_replace(
kube_pod_container_resource_requests_cpu_cores,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
kube_pod_container_resource_requests_cpu_cores,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
@@ -234,8 +251,11 @@ local utils = import 'mixin-utils/utils.libsonnet';
(
sum by (cluster, namespace, deployment) (
label_replace(
kube_pod_container_resource_requests{resource="cpu"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
kube_pod_container_resource_requests{resource="cpu"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
@@ -261,18 +281,23 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
{
// Convenience rule to get the Memory utilization for both a deployment and a statefulset.
// Multi-zone deployments are grouped together removing the "zone-X" suffix.
record: 'cluster_namespace_deployment:container_memory_usage_bytes:sum',
expr: |||
sum by (cluster, namespace, deployment) (
label_replace(
container_memory_usage_bytes,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
container_memory_usage_bytes,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
|||,
},
{
// Convenience rule to get the Memory request for both a deployment and a statefulset.
// Multi-zone deployments are grouped together removing the "zone-X" suffix.
record: 'cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum',
expr: |||
# This recording rule is made compatible with the breaking changes introduced in kube-state-metrics v2
@@ -285,8 +310,11 @@ local utils = import 'mixin-utils/utils.libsonnet';
(
sum by (cluster, namespace, deployment) (
label_replace(
kube_pod_container_resource_requests_memory_bytes,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
kube_pod_container_resource_requests_memory_bytes,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
@@ -296,8 +324,11 @@ local utils = import 'mixin-utils/utils.libsonnet';
(
sum by (cluster, namespace, deployment) (
label_replace(
kube_pod_container_resource_requests{resource="memory"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
label_replace(
kube_pod_container_resource_requests{resource="memory"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)