
Discrepancy in labels using api/v1/series and api/v1/query when an external label and an internal label have the same key #6844

Closed
andrejshapal opened this issue Oct 25, 2023 · 26 comments · Fixed by #6874

Comments

@andrejshapal

Hello,

I am using Thanos 0.32.5.

Issue:
We noticed a flaky issue: Thanos always exposes the external label when executing queries, but when the same label key exists both as an external and as an internal label, api/v1/series randomly returns either the external or the internal value.

Here is the Prometheus output: [screenshot]

And after the external label is applied, the new cluster label shows up in Thanos: [screenshot]

But when I query the api/v1/series endpoint, the value of cluster it returns is random:
{ "status": "success", "data": [ { "__name__": "collectd_collectd_queue_length", "cassandra_datastax_com_cluster": "cassandra", "cassandra_datastax_com_datacenter": "dc1", "cluster": "cassandra", "collectd": "write_queue", "container": "cassandra", "dc": "dc1", "endpoint": "prometheus", "exported_instance": "10.2.150.192", "instance": "10.2.150.192:9103", "job": "cassandra-dc1-all-pods-service", "namespace": "cit1-core", "pod": "cassandra-dc1-r2-sts-0", "prometheus": "monitoring/kube-prometheus-stack-prometheus", "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0", "rack": "r2", "service": "cassandra-dc1-all-pods-service" }, { "__name__": "collectd_collectd_queue_length", "cassandra_datastax_com_cluster": "cassandra", "cassandra_datastax_com_datacenter": "dc1", "cluster": "cassandra", "collectd": "write_queue", "container": "cassandra", "dc": "dc1", "endpoint": "prometheus", "exported_instance": "10.2.151.7", "instance": "10.2.151.7:9103", "job": "cassandra-dc1-all-pods-service", "namespace": "cit1-core", "pod": "cassandra-dc1-r3-sts-0", "prometheus": "monitoring/kube-prometheus-stack-prometheus", "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0", "rack": "r3", "service": "cassandra-dc1-all-pods-service" } ] }

Expected:
Any API call should prioritise the external label and return its value in the result.

Possible solution:
The current workaround is to rename the internal label in the scrape config. But we mostly use the configs that come with the Helm charts out of the box, meaning we do not write scrape configs ourselves. Therefore, there is always a chance that an external label matches some random label on some random metric, so the discrepancy is worth fixing.
The series API endpoint is used by Grafana.
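To make the discrepancy easy to check, here is a minimal sketch comparing the two endpoints with curl (the Query address is a placeholder; the metric name is the one from the example above):

# Placeholder Thanos Query HTTP address; adjust to your environment.
THANOS=http://thanos-query.monitoring.svc:10902

# Instant query: the external label value is expected to win here.
curl -sG "$THANOS/api/v1/query" \
  --data-urlencode 'query=collectd_collectd_queue_length' | jq '.data.result[].metric.cluster'

# Series endpoint: this is where the returned value flips between the
# external and the internal label on repeated calls.
curl -sG "$THANOS/api/v1/series" \
  --data-urlencode 'match[]=collectd_collectd_queue_length' | jq '.data[].cluster'

Repeating the second request several times should show the cluster value changing, while the first stays stable.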

@GiedriusS
Member

Are all components on the same version?

@andrejshapal
Author

@GiedriusS The sidecars are on 0.32.4, the rest on 0.32.5.

@mhoffm-aiven
Contributor

Hey, are you able to share some downstream blocks so we can try to reproduce locally? Also, which downstream Store APIs are you querying?

@andrejshapal
Author

andrejshapal commented Oct 25, 2023

@mhoffm-aiven

what downstream stores apis are you querying

Not sure what downstream Store APIs are. But we have a global Thanos, which queries a Querier on another cluster via gRPC, and that Querier talks to a Thanos sidecar. Data is stored in a GCP bucket.

are you able to share some downstream blocks so we can try to reproduce locally

Can you point me to a guide? Should I just send you some chunks from the bucket? If so, how can I find the necessary chunks (we have too many of them and no "created at" in the GCP bucket)?

@MichaHoffmann
Contributor

Mh, ok, so sharing might not be practical. With "downstream store api" I meant essentially the "--endpoint"s!

@MichaHoffmann
Contributor

Can you bump the sidecars to 0.32.5? There was this:

#6816 Store: fix prometheus store label values matches for external labels

Which feels somewhat related.
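For context, #6816 fixed Prometheus-store label values matching for external labels. A quick way to inspect which values the colliding cluster key actually reports through the Querier is the label-values endpoint (a sketch, assuming the default Query HTTP port):

# Placeholder Querier address; lists every value reported for the "cluster" label key.
curl -s http://thanos-query.monitoring.svc:10902/api/v1/label/cluster/values | jq '.data'

If both the external value and the scraped value show up, the collision is visible at this level too.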

@andrejshapal
Author

andrejshapal commented Oct 26, 2023

@MichaHoffmann
I suspect it is enough to just bump the sidecar on the cluster where the issue is reproducible?
If so, I have bumped it to 0.32.5:

{
    "status": "success",
    "data": [
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "cit1-k8s",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.145.73",
            "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
            "instance": "10.2.145.73:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
            "pod": "-cassandra-dc1-r3-sts-0",
            "pod_name": "-cassandra-dc1-r3-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r3",
            "service": "-cassandra-dc1-all-pods-service"
        },
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "cit1-k8s",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.147.193",
            "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
            "instance": "10.2.147.193:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
            "pod": "-cassandra-dc1-r2-sts-0",
            "pod_name": "-cassandra-dc1-r2-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r2",
            "service": "-cassandra-dc1-all-pods-service"
        },
        {
            "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
            "cassandra_datastax_com_cluster": "-cassandra",
            "cassandra_datastax_com_datacenter": "dc1",
            "cluster": "-cassandra",
            "container": "cassandra",
            "datacenter": "dc1",
            "endpoint": "metrics",
            "exported_instance": "10.2.150.131",
            "host": "1cae4b22-a89b-451f-8f02-d276b86efb83",
            "instance": "10.2.150.131:9000",
            "job": "-cassandra-dc1-all-pods-service",
            "namespace": "cit1--core",
            "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-movi",
            "pod": "-cassandra-dc1-r1-sts-0",
            "pod_name": "-cassandra-dc1-r1-sts-0",
            "pool_name": "PerDiskMemtableFlushWriter_0",
            "pool_type": "internal",
            "prometheus": "monitoring/kube-prometheus-stack-prometheus",
            "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
            "rack": "r1",
            "service": "-cassandra-dc1-all-pods-service"
        }
    ]
}

The issue is not gone: [screenshot]

what downstream stores apis are you querying
With "downsteam store api" i meant essentially "--endpoint"s!

Well, we have many endpoints.

        - query
        - '--log.level=info'
        - '--log.format=logfmt'
        - '--grpc-address=0.0.0.0:10901'
        - '--http-address=0.0.0.0:10902'
        - '--query.replica-label=replica'
        - '--endpoint=thanos-sidecar-querier-query-grpc.monitoring.svc:10901'
        - '--endpoint=thanos-storegateway.monitoring.svc:10901'
        - '--endpoint=lv01-prometheus01.int.company.live:10903'
        - '--endpoint=lv01-prometheus02.int.company.live:10903'
        - '--endpoint=ro01-prometheus01.int.company.live:10903'
        - '--endpoint=ro01-prometheus02.int.company.live:10903'
        - '--endpoint=ge01-prometheus01.int.company.live:10903'
        - '--endpoint=ge01-prometheus02.int.company.live:10903'
        - '--endpoint=thanos.ci.int.company.live:443'
        - '--endpoint=thanos.ci-en1.int.company.live:443'
        - '--endpoint=thanos.dev.int.company.live:443'
        - '--endpoint=thanos.live.int.company.live:443'
        - '--endpoint=thanos-1.global.int.company.live:443'
        - >-
          --endpoint=astradb-thanos-sidecar-querier-query-grpc.monitoring.svc:10901
        - '--grpc-client-tls-secure'
        - '--grpc-client-tls-cert=/certs/client/tls.crt'
        - '--grpc-client-tls-key=/certs/client/tls.key'
        - '--grpc-client-tls-ca=/certs/client/ca.crt'

The one which has the metrics in question is thanos.dev.int.company.live:443.
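One way to confirm which endpoint contributes the colliding external label is the Querier's stores listing, which backs the Stores page of the Query UI (a sketch, assuming the global Querier's default HTTP port):

# Placeholder global-Querier address; prints every connected endpoint together
# with the external label sets it advertises.
curl -s http://thanos-query.monitoring.svc:10902/api/v1/stores | jq .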

@yeya24
Contributor

yeya24 commented Oct 27, 2023

@andrejshapal Can you try bumping up the version? It seems to be the same bug that was fixed in v0.32.5.

@andrejshapal
Author

@yeya24
Hello,
Bumped everything to 0.32.5 and still see the same issue.

@MichaHoffmann
Contributor

Hey @andrejshapal, can you share the configuration of the offending thanos.dev.int.company.live, please?

@andrejshapal
Author

andrejshapal commented Oct 27, 2023

@MichaHoffmann Sure:

spec:
  project: application-support
  sources:
    - repoURL: https://helm.onairent.live
      chart: any-resource
      targetRevision: "0.1.0"
      helm:
        values: |
          anyResources:
    - repoURL: https://charts.bitnami.com/bitnami
      chart: thanos
      targetRevision: "12.13.12"
      helm:
        values: |
          fullnameOverride: thanos-sidecar-querier
          query:
            dnsDiscovery:
              enabled: true
              sidecarsService: kube-prometheus-stack-thanos-discovery
              sidecarsNamespace: monitoring

            service:
              annotations:
                traefik.ingress.kubernetes.io/service.serversscheme: h2c

            serviceGrpc:
              annotations:
                traefik.ingress.kubernetes.io/service.serversscheme: h2c

            ingress:
              grpc:
                enabled: true
                ingressClassName: traefik-internal
                annotations:
                  traefik.ingress.kubernetes.io/router.tls.options: monitoring-thanos@kubernetescrd
                hostname: thanos.dev.int.company.live
                extraTls:
                  - hosts:
                      - thanos.dev.int.company.live
                    secretName: thanos-client-server-cert-1

          bucketweb:
            enabled: false

          compactor:
            enabled: false

          storegateway:
            enabled: false

          receive:
            enabled: false

          metrics:
            enabled: true
            serviceMonitor:
              enabled: true
              labels:
                prometheus: main

I also noticed that it returned one cluster value until 07:00 27/10/2023 (local time, now it is 12:41), and at 07:05 there were already two "clusters".

@MichaHoffmann
Contributor

Can you share the Prometheus configurations from the instances that monitor the offending Cassandra cluster too, please?

@andrejshapal
Author

We use kube-prometheus-stack. Nothing really special:

    - repoURL: https://prometheus-community.github.io/helm-charts
      chart: kube-prometheus-stack
      targetRevision: "50.3.1"
      helm:
        values: |
          fullnameOverride: kube-prometheus-stack
          commonLabels:
            prometheus: main

          defaultRules:
            create: false

          kube-state-metrics:
            fullnameOverride: kube-state-metrics
            prometheus:
              monitor:
                enabled: true
                additionalLabels:
                  prometheus: main
                metricRelabelings:
                  - action: labeldrop
                    regex: container_id
                  - action: labeldrop
                    regex: uid
                  - sourceLabels: [__name__]
                    action: drop
                    regex: 'kube_configmap_(annotations|created|info|labels|metadata_resource_version)'
            collectors:
              - certificatesigningrequests
              - configmaps
              - cronjobs
              - daemonsets
              - deployments
              - endpoints
              - horizontalpodautoscalers
              - ingresses
              - jobs
              - limitranges
              - mutatingwebhookconfigurations
              - namespaces
              - networkpolicies
              - nodes
              - persistentvolumeclaims
              - persistentvolumes
              - poddisruptionbudgets
              - pods
              - replicasets
              - replicationcontrollers
              - resourcequotas
              - secrets
              - services
              - statefulsets
              - storageclasses
              - validatingwebhookconfigurations
              - volumeattachments
            metricLabelsAllowlist:
              - pods=[version]

          kubeScheduler:
            enabled: false

          kubeEtcd:
            enabled: false

          kubeProxy:
            enabled: false

          kubeControllerManager:
            enabled: false

          prometheus-node-exporter:
            fullnameOverride: node-exporter
            extraArgs:
              - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/)
              - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|tmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
            prometheus:
              monitor:
                enabled: true
                additionalLabels:
                  prometheus: main
                relabelings:
                  - action: replace
                    sourceLabels:
                    - __meta_kubernetes_pod_node_name
                    targetLabel: instance

          coreDns:
            enabled: false

          kubelet:
            enabled: true
            serviceMonitor:
              cAdvisorMetricRelabelings:
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_cpu_(cfs_throttled_seconds_total|load_average_10s|system_seconds_total|user_seconds_total)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_fs_(io_current|io_time_seconds_total|io_time_weighted_seconds_total|reads_merged_total|sector_reads_total|sector_writes_total|writes_merged_total)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_memory_(mapped_file|swap)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_(file_descriptors|tasks_state|threads_max)'
                - sourceLabels: [__name__]
                  action: drop
                  regex: 'container_spec.*'
                - sourceLabels: [id, pod]
                  action: drop
                  regex: '.+;'
                - action: labeldrop
                  regex: id
                - action: labeldrop
                  regex: name
                - action: labeldrop
                  regex: uid

              cAdvisorRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              probesMetricRelabelings:
                - action: labeldrop
                  regex: pod_uid

              probesRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              resourceRelabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

              relabelings:
                - action: replace
                  sourceLabels: [__metrics_path__]
                  targetLabel: metrics_path

          grafana:
            enabled: false

          alertmanager:
            enabled: false

          prometheus:
            enabled: true
            monitor:
              additionalLabels:
                prometheus: main

            serviceAccount:
              create: true
              name: "prometheus"

            thanosService:
              enabled: true

            thanosServiceMonitor:
              enabled: true

            ingress:
              enabled: true
              annotations:
                kubernetes.io/ingress.class: traefik-internal
              hosts:
                - prometheus.dev.int.company.live
              tls:
              - hosts:
                  - prometheus.dev.int.company.live
                secretName: wildcard-dev-int-company-live

            prometheusSpec:
              enableRemoteWriteReceiver: true
              serviceAccountName: prometheus
              enableAdminAPI: true
              disableCompaction: true
              scrapeInterval: 10s
              retention: 2h
              additionalScrapeConfigsSecret:
                enabled: false
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 20Gi

              externalLabels:
                cluster: cit1-k8s
                replica: prometheus-cit1-1

              additionalAlertManagerConfigs:
                - scheme: https
                  static_configs:
                    - targets:
                        - alertmanager.company.live

              thanos:
                image: quay.io/thanos/thanos:v0.32.5
                objectStorageConfig:
                  name: thanos-objstore
                  key: objstore.yml

              ruleSelector:
                matchLabels:
                  evaluation: prometheus
              serviceMonitorSelector:
                matchLabels:
                  prometheus: main
              podMonitorSelector:
                matchLabels:
                  prometheus: main
              probeSelector:
                matchLabels:
                  prometheus: main
              resources:
                requests:
                  cpu: "3.2"
                  memory: 14Gi
                limits:
                  cpu: 8
                  memory: 20Gi

@MichaHoffmann
Contributor

Is there another replica somewhere, maybe? Asking since it has the external "replica" label.

@andrejshapal
Author

andrejshapal commented Oct 27, 2023

@MichaHoffmann Nope. We have HA Prometheuses on some clusters, but we added the replica label everywhere just for consistency.

@MichaHoffmann
Contributor

Having a replica label on things that are not replicas of one another feels like it could be an issue.

@andrejshapal
Author

@MichaHoffmann I can try to remove the replica label. But this should not be an issue, since it is only used as a deduplication label?

@andrejshapal
Author

@MichaHoffmann I have removed the replica label, but it had no effect on the issue in question.

@MichaHoffmann
Contributor

I have removed the replica label, but it had no effect on the issue in question.

Ah well, an attempt was made. Do you have the same issue if you uncheck "Use Deduplication"?

@andrejshapal
Author

@MichaHoffmann In Thanos Query, with or without deduplication, the issue is not noticeable. I don't think querying in the UI goes via api/v1/series.

@MichaHoffmann
Contributor

You can specify ?dedup=false on the API request, I think (https://thanos.io/v0.33/components/query.md/#deduplication-enabled).
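For reference, a minimal sketch of that request against the series endpoint, with a placeholder Querier address and the metric from the earlier output:

# Same series request as before, with deduplication switched off via the dedup parameter.
curl -sG http://thanos-query.monitoring.svc:10902/api/v1/series \
  --data-urlencode 'match[]=org_apache_cassandra_metrics_thread_pools_completed_tasks' \
  --data-urlencode 'dedup=false' | jq '.data[].cluster'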

@andrejshapal
Author

With dedup=false:

{
  "status": "success",
  "data": [
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "cit1-k8s",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.145.73",
      "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
      "instance": "10.2.145.73:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
      "pod": "-cassandra-dc1-r3-sts-0",
      "pod_name": "-cassandra-dc1-r3-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r3",
      "service": "-cassandra-dc1-all-pods-service"
    },
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.147.193",
      "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
      "instance": "10.2.147.193:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
      "pod": "-cassandra-dc1-r2-sts-0",
      "pod_name": "-cassandra-dc1-r2-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r2",
      "service": "-cassandra-dc1-all-pods-service"
    }
  ]
}

With dedup=true:

{
  "status": "success",
  "data": [
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.145.73",
      "host": "3a2b71b6-8026-4323-a5d9-6b9420258bc5",
      "instance": "10.2.145.73:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-6w90",
      "pod": "-cassandra-dc1-r3-sts-0",
      "pod_name": "-cassandra-dc1-r3-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r3",
      "service": "-cassandra-dc1-all-pods-service"
    },
    {
      "__name__": "org_apache_cassandra_metrics_thread_pools_completed_tasks",
      "cassandra_datastax_com_cluster": "-cassandra",
      "cassandra_datastax_com_datacenter": "dc1",
      "cluster": "-cassandra",
      "container": "cassandra",
      "datacenter": "dc1",
      "endpoint": "metrics",
      "exported_instance": "10.2.147.193",
      "host": "1252ec4c-66b7-47de-9745-42d368198c3e",
      "instance": "10.2.147.193:9000",
      "job": "-cassandra-dc1-all-pods-service",
      "namespace": "cit1--core",
      "node_name": "gke-cit1-k8s-cit1-nodepool-1-331970fb-xmr5",
      "pod": "-cassandra-dc1-r2-sts-0",
      "pod_name": "-cassandra-dc1-r2-sts-0",
      "pool_name": "InternalResponseStage",
      "pool_type": "internal",
      "prometheus": "monitoring/kube-prometheus-stack-prometheus",
      "prometheus_replica": "prometheus-kube-prometheus-stack-prometheus-0",
      "rack": "r2",
      "service": "-cassandra-dc1-all-pods-service"
    }
  ]
}

@MichaHoffmann
Contributor

Would it be possible to send the output of promtool tsdb dump with an appropriate matcher from the offending Prometheus (with the labels censored like in this example)? I could build a block from that and try to debug locally!
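A sketch of the kind of dump being asked for, assuming promtool is run inside the Prometheus container, that the data directory is the default /prometheus, and that the --match flag is available in your promtool version:

# Dumps only the series matching the metric from the earlier output,
# so the resulting file stays small enough to share.
promtool tsdb dump \
  --match='{__name__="org_apache_cassandra_metrics_thread_pools_completed_tasks"}' \
  /prometheus > dump.txt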

@andrejshapal
Author

@MichaHoffmann Sorry for the long wait, I had a busy week.
dump.zip

@MichaHoffmann
Contributor

MichaHoffmann commented Nov 5, 2023

Hey,

I did a small local setup of Prometheus, sidecar, and Querier (on latest main) with your data, and I can reproduce!

$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo' | jq '.data.[].cluster'
"xxx-cassandra"
$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo' | jq '.data.[].cluster'
"xxx-cassandra"
$ curl -sq -g '0.0.0.0:10904/api/v1/series?' --data-urlencode 'match[]=foo' | jq '.data.[].cluster'
"cluster_1"

with Prometheus configured like:

global:
  external_labels:
    cluster: cluster_1

The querier and sidecar are configured mostly with defaults.
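For anyone trying to follow along, the setup amounts to roughly the following (a sketch; ports and paths are assumptions, except the querier HTTP port, which matches the curl calls above):

# Prometheus with the external_labels shown above, plus sidecar and querier.
prometheus --config.file=prometheus.yml --storage.tsdb.path=./data &
thanos sidecar --prometheus.url=http://localhost:9090 --tsdb.path=./data \
  --grpc-address=0.0.0.0:10901 --http-address=0.0.0.0:10902 &
thanos query --grpc-address=0.0.0.0:10903 --http-address=0.0.0.0:10904 \
  --endpoint=localhost:10901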

Thanks, I'll look into this in the debugger a bit later!

@MichaHoffmann
Contributor

OK, I think I have found the issue and have a fix; I was able to reproduce it in a minimal acceptance test case.
