Improve query results cache hit ratio in the 'Mimir / Queries' dashboard #5423

Merged on Jul 6, 2023 (2 commits).
CHANGELOG.md (2 additions, 0 deletions)
@@ -52,9 +52,11 @@

* [CHANGE] Dashboards: show all workloads in selected namespace on "rollout progress" dashboard. #5113
* [CHANGE] Dashboards: show the number of updated and ready pods for each workload in the "rollout progress" panel on the "rollout progress" dashboard. #5113
+* [CHANGE] Dashboards: removed "Query results cache misses" panel on the "Mimir / Queries" dashboard. #5423
* [ENHANCEMENT] Dashboards: adjust layout of "rollout progress" dashboard panels so that the "rollout progress" panel doesn't require scrolling. #5113
* [ENHANCEMENT] Dashboards: show container name first in "pods count per version" panel on "rollout progress" dashboard. #5113
* [ENHANCEMENT] Dashboards: show time spend waiting for turn when lazy loading index headers in the "index-header lazy load gate latency" panel on the "queries" dashboard. #5313
+* [ENHANCEMENT] Dashboards: split query results cache hit ratio by request type in "Query results cache hit ratio" panel on the "Mimir / Queries" dashboard. #5423
* [BUGFIX] Alerts: fix `MimirIngesterRestarts` to fire only when the ingester container is restarted, excluding the cases the pod is rescheduled. #5397
* [BUGFIX] Dashboards: fix "unhealthy pods" panel on "rollout progress" dashboard showing only a number rather than the name of the workload and the number of unhealthy pods if only one workload has unhealthy pods. #5113 #5200
* [BUGFIX] Alerts: fixed `MimirIngesterHasNotShippedBlocks` and `MimirIngesterHasNotShippedBlocksSinceStart` alerts. #5396
@@ -13156,7 +13156,7 @@ data:
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
-"span": 3,
+"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
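Several panels in this part of the dashboard change from `"span": 3` to `"span": 4`. In Grafana's legacy row layout, `span` is a panel's width on a 12-column grid, so the change is consistent with a row going from four equal-width panels to three once "Query results cache misses" is removed. A minimal sketch of that arithmetic, assuming the affected panels share one row (`equal_span` and `GRID_COLUMNS` are hypothetical names, not part of the mixin):

```python
# Legacy Grafana rows lay panels out on a 12-column grid; "span" is a panel's width in columns.
GRID_COLUMNS = 12


def equal_span(panel_count: int) -> int:
    """Return the span giving each of `panel_count` panels an equal share of one row."""
    if panel_count <= 0 or GRID_COLUMNS % panel_count != 0:
        raise ValueError(f"{panel_count} panels cannot split {GRID_COLUMNS} columns evenly")
    return GRID_COLUMNS // panel_count


print(equal_span(4))  # 3: four cache panels in the row before this PR
print(equal_span(3))  # 4: three panels after "Query results cache misses" is removed
```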
@@ -13232,15 +13232,15 @@ data:
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
-"span": 3,
+"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
-"expr": "# Query metrics before and after migration to new memcached backend.\nsum (\n rate(cortex_cache_hits{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_hits_total{name=\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n)\n/\nsum (\n rate(cortex_cache_fetched_keys{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_requests_total{name=~\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n)\n",
+"expr": "# Query the new metric introduced in Mimir 2.10.\n(\n sum by(request_type) (rate(cortex_frontend_query_result_cache_hits_total{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval]))\n /\n sum by(request_type) (rate(cortex_frontend_query_result_cache_requests_total{cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval]))\n)\n# Otherwise fallback to the previous general-purpose metrics.\nor\n(\n label_replace(\n # Query metrics before and after migration to new memcached backend.\n sum (\n rate(cortex_cache_hits{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_hits_total{name=\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n )\n /\n sum (\n rate(cortex_cache_fetched_keys{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_requests_total{name=~\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n ),\n \"request_type\", \"query_range\", \"\", \"\")\n)\n",
"format": "time_series",
"intervalFactor": 2,
-"legendFormat": "Hit ratio",
+"legendFormat": "{{request_type}}",
"legendLink": null,
"step": 10
}
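The new hit-ratio query prefers the per-`request_type` ratio of `cortex_frontend_query_result_cache_hits_total` to `cortex_frontend_query_result_cache_requests_total`, and falls back to the older general-purpose cache metrics only where no matching series exists. PromQL's `or` keeps every left-hand series and adds right-hand series only for label sets missing on the left, while `label_replace(..., "request_type", "query_range", "", "")` tags the label-less fallback ratio as `query_range` so it lines up with the `{{request_type}}` legend. The Python sketch below models those semantics on plain dictionaries. It is a simplified illustration, not Prometheus code: the helpers (`labels`, `divide`, `union_or`, `hit_ratio_panel`) are hypothetical, metric names and staleness are ignored, non-zero request rates are assumed, and the sample rates are made up.

```python
from typing import Dict, FrozenSet, Tuple

# Simplified instant vector: label set -> sample value (metric names already dropped by sum).
Labels = FrozenSet[Tuple[str, str]]
Vector = Dict[Labels, float]


def labels(**pairs: str) -> Labels:
    return frozenset(pairs.items())


def with_label(ls: Labels, name: str, value: str) -> Labels:
    """Set (or overwrite) one label, like label_replace with an empty source label and regex."""
    updated = dict(ls)
    updated[name] = value
    return frozenset(updated.items())


def divide(lhs: Vector, rhs: Vector) -> Vector:
    """One-to-one `lhs / rhs`: only label sets present on both sides produce a result.

    Assumes non-zero denominators; PromQL itself would return NaN or +Inf instead.
    """
    return {ls: lhs[ls] / rhs[ls] for ls in lhs if ls in rhs}


def union_or(lhs: Vector, rhs: Vector) -> Vector:
    """`lhs or rhs`: every lhs element, plus rhs elements whose label set is absent from lhs."""
    result = dict(lhs)
    for ls, value in rhs.items():
        result.setdefault(ls, value)
    return result


def hit_ratio_panel(new_hits: Vector, new_requests: Vector,
                    old_hits: Vector, old_requests: Vector) -> Vector:
    # Preferred: sum by(request_type)(hits) / sum by(request_type)(requests) from the new metrics.
    preferred = divide(new_hits, new_requests)
    # Fallback: label-less ratio of the older cache metrics, tagged as request_type="query_range".
    fallback = {
        with_label(ls, "request_type", "query_range"): value
        for ls, value in divide(old_hits, old_requests).items()
    }
    return union_or(preferred, fallback)


def by_request_type(vector: Vector) -> Dict[str, float]:
    """Key results by request_type, mirroring the "{{request_type}}" legend."""
    return {dict(ls).get("request_type", ""): value for ls, value in vector.items()}


# Hypothetical rates for a query-frontend exposing both the new and the old metrics.
new_hits = {labels(request_type="query_range"): 30.0}
new_requests = {labels(request_type="query_range"): 40.0}
old_hits = {labels(): 20.0}
old_requests = {labels(): 40.0}

# The new metric wins because the fallback series has the same label set.
print(by_request_type(hit_ratio_panel(new_hits, new_requests, old_hits, old_requests)))  # {'query_range': 0.75}
# Without the new metrics (older Mimir), only the fallback series remains.
print(by_request_type(hit_ratio_panel({}, {}, old_hits, old_requests)))  # {'query_range': 0.5}
```

Because both branches end up labelled `request_type="query_range"`, the fallback never produces a second, duplicate line for range queries when the new metric is present.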
@@ -13281,82 +13281,6 @@ data:
}
]
},
-{
-"aliasColors": { },
-"bars": false,
-"dashLength": 10,
-"dashes": false,
-"datasource": "$datasource",
-"fill": 1,
-"id": 10,
-"legend": {
-"avg": false,
-"current": false,
-"max": false,
-"min": false,
-"show": true,
-"total": false,
-"values": false
-},
-"lines": true,
-"linewidth": 1,
-"links": [ ],
-"nullPointMode": "null as zero",
-"percentage": false,
-"pointradius": 5,
-"points": false,
-"renderer": "flot",
-"seriesOverrides": [ ],
-"spaceLength": 10,
-"span": 3,
-"stack": false,
-"steppedLine": false,
-"targets": [
-{
-"expr": "# Query metrics before and after migration to new memcached backend.\nsum (\n rate(cortex_cache_fetched_keys{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_requests_total{name=\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n)\n-\nsum (\n rate(cortex_cache_hits{name=~\"frontend.+\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n or\n rate(thanos_cache_memcached_hits_total{name=~\"frontend-cache\", cluster=~\"$cluster\", job=~\"($namespace)/((query-frontend.*|cortex|mimir|mimir-read.*))\"}[$__rate_interval])\n)\n",
-"format": "time_series",
-"intervalFactor": 2,
-"legendFormat": "Missed query results per second",
-"legendLink": null,
-"step": 10
-}
-],
-"thresholds": [ ],
-"timeFrom": null,
-"timeShift": null,
-"title": "Query results cache misses",
-"tooltip": {
-"shared": false,
-"sort": 0,
-"value_type": "individual"
-},
-"type": "graph",
-"xaxis": {
-"buckets": null,
-"mode": "time",
-"name": null,
-"show": true,
-"values": [ ]
-},
-"yaxes": [
-{
-"format": "short",
-"label": null,
-"logBase": 1,
-"max": null,
-"min": 0,
-"show": true
-},
-{
-"format": "short",
-"label": null,
-"logBase": 1,
-"max": null,
-"min": null,
-"show": false
-}
-]
-},
{
"aliasColors": { },
"bars": false,
@@ -13365,7 +13289,7 @@ data:
"datasource": "$datasource",
"description": "### Query results cache skipped\nThe % of queries whose results could not be cached.\nIt is tracked for each split query when the splitting by interval is enabled.\n\n",
"fill": 10,
-"id": 11,
+"id": 10,
"legend": {
"avg": false,
"current": false,
@@ -13385,7 +13309,7 @@ data:
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
-"span": 3,
+"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
@@ -13454,7 +13378,7 @@ data:
"datasource": "$datasource",
"description": "### Sharded queries ratio\nThe % of queries that have been successfully rewritten and executed in a shardable way.\nThis panel only takes into account the type of queries that are supported by query sharding (eg. range queries).\n\n",
"fill": 1,
-"id": 12,
+"id": 11,
"legend": {
"avg": false,
"current": false,
@@ -13531,7 +13455,7 @@ data:
"datasource": "$datasource",
"description": "### Number of sharded queries per query\nThe number of sharded queries that have been executed for a single input query. It only tracks queries that\nhave been successfully rewritten in a shardable way.\n\n",
"fill": 1,
-"id": 13,
+"id": 12,
"legend": {
"avg": false,
"current": false,
@@ -13635,7 +13559,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 14,
+"id": 13,
"legend": {
"avg": false,
"current": false,
@@ -13727,7 +13651,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 15,
+"id": 14,
"legend": {
"avg": false,
"current": false,
@@ -13819,7 +13743,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 16,
+"id": 15,
"legend": {
"avg": false,
"current": false,
@@ -13923,7 +13847,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 17,
+"id": 16,
"legend": {
"avg": false,
"current": false,
@@ -14015,7 +13939,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 18,
+"id": 17,
"legend": {
"avg": false,
"current": false,
@@ -14109,7 +14033,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 19,
+"id": 18,
"legend": {
"avg": false,
"current": false,
@@ -14197,7 +14121,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 20,
+"id": 19,
"legend": {
"avg": false,
"current": false,
@@ -14292,7 +14216,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 21,
+"id": 20,
"legend": {
"avg": false,
"current": false,
@@ -14376,7 +14300,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 22,
+"id": 21,
"legend": {
"avg": false,
"current": false,
@@ -14480,7 +14404,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 23,
+"id": 22,
"legend": {
"avg": false,
"current": false,
@@ -14556,7 +14480,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 24,
+"id": 23,
"legend": {
"avg": false,
"current": false,
@@ -14632,7 +14556,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 25,
+"id": 24,
"legend": {
"avg": false,
"current": false,
@@ -14720,7 +14644,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 26,
+"id": 25,
"legend": {
"avg": false,
"current": false,
@@ -14796,7 +14720,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 27,
+"id": 26,
"legend": {
"avg": false,
"current": false,
@@ -14873,7 +14797,7 @@ data:
"datasource": "$datasource",
"description": "### Series batch preloading efficiency\nThis panel shows the % of time reduced by preloading, for Series() requests which have been\nsplit to 2+ batches. If a Series() request is served within a single batch, then preloading\nis not triggered, and thus not counted in this measurement.\n\n",
"fill": 1,
-"id": 28,
+"id": 27,
"legend": {
"avg": false,
"current": false,
@@ -14961,7 +14885,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 0,
-"id": 29,
+"id": 28,
"legend": {
"avg": false,
"current": false,
@@ -15040,7 +14964,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 30,
+"id": 29,
"legend": {
"avg": false,
"current": false,
@@ -15127,7 +15051,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 10,
-"id": 31,
+"id": 30,
"legend": {
"avg": false,
"current": false,
@@ -15223,7 +15147,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 0,
-"id": 32,
+"id": 31,
"legend": {
"avg": false,
"current": false,
@@ -15299,7 +15223,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 33,
+"id": 32,
"legend": {
"avg": false,
"current": false,
@@ -15392,7 +15316,7 @@ data:
"datasource": "$datasource",
"description": "### Index-header lazy load gate latency\nTime spent waiting for a turn to load an index header. This time is not included in \"Index-header lazy load duration.\"\n\n",
"fill": 1,
-"id": 34,
+"id": 33,
"legend": {
"avg": false,
"current": false,
@@ -15496,7 +15420,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 35,
+"id": 34,
"legend": {
"avg": false,
"current": false,
@@ -15572,7 +15496,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 36,
+"id": 35,
"legend": {
"avg": false,
"current": false,
@@ -15648,7 +15572,7 @@ data:
"dashes": false,
"datasource": "$datasource",
"fill": 1,
-"id": 37,
+"id": 36,
"legend": {
"avg": false,
"current": false,
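Every panel after the removed "Query results cache misses" panel (old `id` 10) has its `id` decremented by one, from 11 → 10 through 37 → 36, so no gap is left at the removed id. The hypothetical helper below only reproduces that pattern on a list of panel dictionaries so the diff is easy to check; it is not how the PR or the mixin produces the dashboard. Ids come from the diff; the titles other than "Query results cache misses" are assumed from the panels' description headings.

```python
from typing import Any, Dict, List

Panel = Dict[str, Any]


def remove_panel_and_renumber(panels: List[Panel], title: str) -> List[Panel]:
    """Drop the panel titled `title` and shift the id of every higher-numbered panel down by one.

    Returns new dictionaries for shifted panels; the input list is not modified.
    """
    removed_ids = [p["id"] for p in panels if p.get("title") == title]
    if len(removed_ids) != 1:
        raise ValueError(f"expected exactly one panel titled {title!r}, found {len(removed_ids)}")
    removed_id = removed_ids[0]
    return [
        {**p, "id": p["id"] - 1} if p["id"] > removed_id else p
        for p in panels
        if p["id"] != removed_id
    ]


# Ids taken from the diff; titles other than the removed panel's are assumed.
panels = [
    {"id": 10, "title": "Query results cache misses"},
    {"id": 11, "title": "Query results cache skipped"},
    {"id": 12, "title": "Sharded queries ratio"},
    {"id": 13, "title": "Number of sharded queries per query"},
]
print([p["id"] for p in remove_panel_and_renumber(panels, "Query results cache misses")])  # [10, 11, 12]
```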