Thanos, Prometheus and Golang version used:
Query frontend, querier and receive on Thanos version 0.32.2
Sidecar on Thanos version 0.30.2
What happened:
The same query returned different results when executed at different times with deduplication enabled. This happens for queries on data in sidecars (v0.30.2) and in receives (v0.32.2).
What you expected to happen:
The same query always returns the same results.
How to reproduce it (as minimally and precisely as possible):
I attach two videos illustrating the problem:
1 - dedup_bug_sidecar_0_30_2.mov
dedup_bug_sidecar_0_30_2.mov
Here the data is scraped by Prometheus HA (2 replicas) and queried on its Thanos sidecar.
The query `aggregator_unavailable_apiservice_total{cluster="osdp-prod-azu-switzerlandnorth-1",name="v1beta1.metrics.k8s.io"}` returns two series when deduplication is disabled, one for each Prometheus instance:
{prometheus_replica="prometheus-osdp-monitoring-prometheus-0"} 3
{prometheus_replica="prometheus-osdp-monitoring-prometheus-1"} 5
When querying the raw data for the past 30 minutes with deduplication enabled, the query returns 3 most of the time. However, it sometimes returns 5 until some time X and 3 afterwards. X is always around the top of the hour.
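To compare the two modes outside the UI, the same range query can be sent to the querier's HTTP API with deduplication off and on. The following is a minimal sketch, assuming the querier exposes the Prometheus-compatible `/api/v1/query_range` endpoint with the Thanos `dedup` parameter. The host name, the timestamps and the step are placeholders, not values from this setup.

```python
from urllib.parse import urlencode


def range_query_url(base_url, query, start, end, step, dedup):
    """Build a query_range URL for the querier with dedup toggled on or off."""
    params = {
        "query": query,
        "start": start,
        "end": end,
        "step": step,
        "dedup": "true" if dedup else "false",
    }
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


# Query taken from the report above.
QUERY = (
    'aggregator_unavailable_apiservice_total{'
    'cluster="osdp-prod-azu-switzerlandnorth-1",name="v1beta1.metrics.k8s.io"}'
)

# Hypothetical querier address and a 30-minute window ending on the hour.
BASE_URL = "http://thanos-query.example:9090"
END = 1696150800
START = END - 30 * 60

raw_url = range_query_url(BASE_URL, QUERY, START, END, 15, dedup=False)
dedup_url = range_query_url(BASE_URL, QUERY, START, END, 15, dedup=True)

print(raw_url)
print(dedup_url)
```

Fetching `dedup_url` repeatedly and saving each response makes it possible to compare runs that return 3 throughout with runs that start at 5.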
2 - dedup_bug_receive_0_32_2.mov
dedup_bug_receive_0_32_2.mov
Here the data is scraped by Prometheus HA (2 replicas), remote written to Thanos receives (replication factor 2) and queried from there.
The query `aggregator_unavailable_apiservice_total{cluster="osse-prod-azu-eastus-1",name="v1beta1.custom.metrics.k8s.io"}` returns four series when deduplication is disabled, one for each combination of Prometheus instance and receive replica:
{prometheus_replica="prometheus-osdp-monitoring-prometheus-0",receive_replica="thanos-receive-cloudinfrastructure-1"} 8
{prometheus_replica="prometheus-osdp-monitoring-prometheus-0",receive_replica="thanos-receive-cloudinfrastructure-3"} 8
{prometheus_replica="prometheus-osdp-monitoring-prometheus-1",receive_replica="thanos-receive-cloudinfrastructure-1"} 7
{prometheus_replica="prometheus-osdp-monitoring-prometheus-1",receive_replica="thanos-receive-cloudinfrastructure-3"} 7
As in the example above, the two Prometheus replicas have stored different values.
When querying the raw data for the past 30 minutes with deduplication enabled, the query returns 8 most of the time. However, it sometimes returns 8 until some time X and 7 afterwards. Again, X is always around the top of the hour.
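Because the replicas hold different values, each deduplicated sample can be traced back to the replica it came from, which makes the time X easy to locate. The sketch below is one way to do that, assuming samples are `(timestamp, value)` pairs as returned by a range query. The helper names and the sample data are made up for illustration and are not taken from the videos.

```python
def matching_replicas(ts, value, replicas):
    """Return the names of the replicas whose sample at ts equals value."""
    return sorted(
        name for name, samples in replicas.items() if samples.get(ts) == value
    )


def replica_switch_points(dedup_samples, replicas):
    """List (timestamp, old_replica, new_replica) where the dedup output changes source."""
    switches = []
    previous = None
    for ts, value in sorted(dedup_samples):
        matches = matching_replicas(ts, value, replicas)
        if len(matches) != 1:
            continue  # no match or ambiguous match: cannot attribute this sample
        current = matches[0]
        if previous is not None and current != previous:
            switches.append((ts, previous, current))
        previous = current
    return switches


def seconds_from_top_of_hour(ts):
    """Distance in seconds from ts to the nearest full hour (UTC)."""
    offset = int(ts) % 3600
    return min(offset, 3600 - offset)


# Illustrative data only: five samples, one minute apart, around a full hour.
TOP_OF_HOUR = 1696150800
timestamps = [TOP_OF_HOUR + 60 * i for i in range(-2, 3)]
replicas = {
    "prometheus-osdp-monitoring-prometheus-0": {ts: "3" for ts in timestamps},
    "prometheus-osdp-monitoring-prometheus-1": {ts: "5" for ts in timestamps},
}
# A deduplicated result that returns 5 before the hour and 3 from the hour on.
dedup_samples = [(ts, "5" if ts < TOP_OF_HOUR else "3") for ts in timestamps]

switches = replica_switch_points(dedup_samples, replicas)
for ts, old, new in switches:
    print(ts, old, "->", new, "offset:", seconds_from_top_of_hour(ts))
```

Samples that match no replica, or more than one, are skipped, so the helper only reports a switch when the source is unambiguous.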
Anything else we need to know:
I can upgrade the sidecar to 0.32.2 if you would like, but I think showing that the bug was already there in 0.30.2 is still interesting, since 0.31.0 was the source of some querying issues.
On the querier there is deduplication on the following labels