
compact: consider handling duplicate labels, or continuing on error #497

Closed
erilane opened this issue Aug 30, 2018 · 9 comments

erilane commented Aug 30, 2018

Thanos, Prometheus and Golang version used
thanos, version 0.1.0-rc.2 (branch: HEAD, revision: 53e4d69)
build user: root@c7199d758b5e
build date: 20180705-12:54:50
go version: go1.10.3

prometheus, version 2.3.2 (branch: HEAD, revision: 71af5e29e815795e9dd14742ee7725682fa14b7b)
build user: root@5258e0bd9cc1
build date: 20180712-14:02:52
go version: go1.10.3

What happened
Thanos compact logs the following error and then exits with status 1

level=error ts=2018-08-30T23:08:03.919366788Z caller=main.go:160 msg="running command failed" err="compaction: gather index issues for block /thanos-compact-scratch/compact/0@{dc=\"dc1\",env=\"stg\",product=\"mongodb\",prom_config=\"mongodb-stg-dc1\"}/01CNNJJVN77F43F7Y5QA4ESWRE: out-of-order label set {__name__=\"mongodb_collection_avg_objsize_bytes\",dc=\"dc1\",env=\"stg\",host=\"mongoc3.stg.dc1.thousandeyes.com\",instance=\"mongoc3.stg.dc1.thousandeyes.com:9126\",job=\"hosts\",ns=\"config.actionlog\",ns=\"config.actionlog\",product=\"mongodb\",project=\"msc-cluster\"} for series 20551"

Note the duplicate ns="config.actionlog".

What you expected to happen
Compactor to de-dup the labels (?), or simply log the error and move on to other work

How to reproduce it (as minimally and precisely as possible):
I'm still trying to figure out how this happened. Maybe a bad scrape target. It seems to have stopped happening in my environment. But if I intentionally create a bad scrape target exposing something like:

bad_metric{a="1"} 1
bad_metric{a="1",a="1"} 1

Prometheus will show two different timeseries with identical labels when I query for bad_metric.
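
For reference, a throwaway exporter like the sketch below can serve that malformed exposition for a test scrape job. The port and path are arbitrary, and this assumes the Prometheus version in use accepts the duplicate label on scrape, as observed above.

```go
// Minimal fake scrape target serving the malformed exposition shown above.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, `bad_metric{a="1"} 1`)
		// Same series again, this time with a duplicate label name.
		fmt.Fprintln(w, `bad_metric{a="1",a="1"} 1`)
	})
	log.Fatal(http.ListenAndServe(":9999", nil))
}
```

Point a scrape job at localhost:9999/metrics and the duplicate-label series should end up in TSDB blocks that the compactor later chokes on.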

Allex1 (Contributor) commented Sep 14, 2018

+1

bwplotka (Member) commented Oct 3, 2018

> What you expected to happen
> Compactor to de-dup the labels (?), or simply log the error and move on to other work

So... would you rather we ignore all of those issues? Even if, after 1000 compactions and downsampling operations, they grow into a serious, unfixable problem and you need to delete a couple of months of data? That can happen when you feed a malformed block into the compaction logic, which was never tested against blocks like this.

I doubt this particular issue will have such serious consequences, so what needs to be done is a short unit test checking whether we can compact those blocks and what the result looks like afterwards. If all is good, we can switch to a soft notification -> a metric and a log line, and continue the compaction work (:
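
To make the "metric and log line" idea concrete, here is a rough, standalone sketch of the kind of per-series check that currently turns into a hard failure; the Label type is a stand-in, not the actual Prometheus/Thanos type.

```go
// Standalone sketch of a per-series label-set check: names must be sorted
// and unique. Roughly what the "gather index issues" step flags above.
package main

import (
	"fmt"
	"sort"
)

type Label struct{ Name, Value string }

func checkLabelSet(lset []Label) error {
	if !sort.SliceIsSorted(lset, func(i, j int) bool { return lset[i].Name < lset[j].Name }) {
		return fmt.Errorf("out-of-order label set %v", lset)
	}
	for i := 1; i < len(lset); i++ {
		if lset[i].Name == lset[i-1].Name {
			return fmt.Errorf("duplicate label name %q", lset[i].Name)
		}
	}
	return nil
}

func main() {
	bad := []Label{
		{"__name__", "mongodb_collection_avg_objsize_bytes"},
		{"ns", "config.actionlog"},
		{"ns", "config.actionlog"}, // the duplicate from the reported block
	}
	// With a soft-failure policy this would increment a metric and log a
	// warning instead of aborting the whole compaction run.
	fmt.Println(checkLabelSet(bad))
}
```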

bwplotka (Member) commented Oct 20, 2018

See https://improbable-eng.slack.com/archives/CA4UWKEEN/p1539959413000100

You need to apply relabelling to get rid of From (move it to from). In the meantime we need to find a way to:

  • alert on this as soon as possible
  • auto fix this? (lowercase; see the sketch after this list)
  • allow this, but the consequences are roughly unknown (we know it is inefficient, but we don't know how much)
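
As a sketch of the "auto fix" option: lowercase the label names, re-sort, and drop exact duplicate names, keeping the first value. This is only an illustration, not Thanos code; dropping one of two duplicate labels is a policy decision and silently loses data whenever the values differ.

```go
// Rough "auto fix" sketch for malformed label sets: lowercase names,
// re-sort, and keep only the first occurrence of each name.
package main

import (
	"fmt"
	"sort"
	"strings"
)

type label struct{ Name, Value string }

func fixLabelSet(in []label) []label {
	out := make([]label, 0, len(in))
	for _, l := range in {
		l.Name = strings.ToLower(l.Name)
		out = append(out, l)
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	fixed := out[:0]
	for i, l := range out {
		if i == 0 || l.Name != fixed[len(fixed)-1].Name {
			fixed = append(fixed, l)
		}
	}
	return fixed
}

func main() {
	fmt.Println(fixLabelSet([]label{
		{"From", "a@example.com"}, // uppercase first letter
		{"ns", "config.actionlog"},
		{"ns", "config.actionlog"}, // duplicate name
	}))
	// Output: [{from a@example.com} {ns config.actionlog}]
}
```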

sbueringer (Contributor) commented Mar 2, 2019

I have a similar issue with time series produced by kube-state-metrics. Example: two annotations on a namespace:

	namespace-provisioner/secret: test1
	namespace.provisioner/secret: test2

This leads to a timeseries with the following labels:

{__name__=\"kube_namespace_annotations\",annotation_namespace_provisioner_secret=\"test1\",annotation_namespace_provisioner_secret=\"test2\",..}

which produces this error:

level=error ts=2019-03-02T15:46:12.99163908Z caller=main.go:181 msg="running command failed" err="error executing compaction: compaction failed: compaction: gather index issues for block /var/thanos/compact/data/compact/0@{monitor=\"prometheus\",replica=\"2\"}/01D4Y5G8ZQQAZ58T81QQPSJ586: out-of-order label set {__name__=\"kube_namespace_annotations\",...

Sorry, I'm probably asking in the wrong place. Are duplicate labels allowed in the Prometheus format? (And is this therefore a bug in kube-state-metrics or Thanos?)

EDIT: When I query Prometheus for this timeseries, Prometheus shows only one of the 2 duplicate labels.
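
For context on why both annotations collide: the annotation keys are sanitized into Prometheus label names by replacing characters that are not valid in label names with underscores, so both keys map to the same label name. A rough approximation of that mapping (not the actual kube-state-metrics code):

```go
// Approximation of annotation-key -> label-name sanitization: replace any
// character outside [a-zA-Z0-9_] with "_" and add the annotation_ prefix.
package main

import (
	"fmt"
	"regexp"
)

var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

func annotationToLabelName(key string) string {
	return "annotation_" + invalidLabelChars.ReplaceAllString(key, "_")
}

func main() {
	fmt.Println(annotationToLabelName("namespace-provisioner/secret")) // annotation_namespace_provisioner_secret
	fmt.Println(annotationToLabelName("namespace.provisioner/secret")) // annotation_namespace_provisioner_secret
}
```

Both calls print the same label name, which is how two distinct annotations end up as duplicate labels on one series.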

@sbueringer (Contributor)

Would be fixed by #848, right?

@sbueringer (Contributor)

@bwplotka

FUSAKLA (Member) commented Mar 23, 2019

Hi @sbueringer, I believe so, yes. It should fix the issue with duplicate label names.

The only issue that persists is uppercase first letters in label names, which should be possible to overcome once #953 is merged, but that's a different case.

Could you please try it out with current master to see whether the issue is resolved?

Thanks!

@sbueringer (Contributor)

Sorry for the late answer. I verified it with the current Thanos version and the issue is fixed, so in my opinion you can close the issue.

FUSAKLA (Member) commented Aug 7, 2019

Great to hear, thanks for verifying!

FUSAKLA closed this as completed Aug 7, 2019