query: query_exemplars does not account for external labels #4116

Closed
tcolgate opened this issue Apr 28, 2021 · 3 comments · Fixed by #4127

Comments

@tcolgate
Contributor

@goku321

Thanos, Prometheus and Golang version used: v0.20.0 docker image

Object Storage Provider: gcs

What happened:

  • configured --exemplar=... on our local and global queriers
  • configured Grafana for exemplars

What you expected to happen:

  • The Grafana "time series" panel should show exemplars.
    However, the panel sends its full query, including the external labels used by the local Prometheus
    instances.
    The query_range queries work as expected.
    The query_exemplars queries do not.

If I send query_exemplars a query that includes the external labels, I get an empty response.
If I remove the external labels and use only labels the local Prometheus instance knows about, I get responses.
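
To make the two cases concrete, here is a minimal Python sketch that builds both requests against the query_exemplars endpoint. The host name thanos-query:9090, the label cluster="A", and the timestamps are assumptions for illustration only; they are not taken from the reporter's setup.

```python
from urllib.parse import urlencode


def exemplars_url(base_url, query, start, end):
    """Build a GET URL for the /api/v1/query_exemplars endpoint."""
    params = urlencode({"query": query, "start": start, "end": end})
    return f"{base_url}/api/v1/query_exemplars?{params}"


# Hypothetical querier address and time range (Unix seconds).
BASE = "http://thanos-query:9090"
START, END = 1619600000, 1619603600

# Query as Grafana sends it, including the external label: empty response in v0.20.0.
with_external = exemplars_url(BASE, 'http_request_duration_bucket{cluster="A"}', START, END)

# Same query without the external label: exemplars are returned.
without_external = exemplars_url(BASE, "http_request_duration_bucket", START, END)

print(with_external)
print(without_external)
```

Fetching both URLs (for example with curl) and comparing the "data" arrays in the responses should show the difference described above.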

How to reproduce it (as minimally and precisely as possible):

  • configure a Prometheus with external labels and a sidecar
  • configure Thanos Query to see the sidecar
  • query thanos-query with the external labels.
    (We have a second tier of queriers, but I do not think that is causing the issue.)
@yeya24
Contributor

yeya24 commented Apr 28, 2021

Thanks for reporting this issue.
Right now the Exemplar API implementation doesn't check external labels. We definitely need to support it for this use case.
Assign it to me.

@yeya24
Contributor

yeya24 commented Apr 29, 2021

This problem is more complex than it looks because the query parameter is a PromQL query rather than a set of matchers.

In order to process this exemplar query, the querier needs to:

  1. Parse the input query into a PromQL expression and extract the metric selectors.

  2. Handle the complexity caused by PromQL itself. Take, for example, the query http_request_duration_bucket{cluster="A"} + http_request_duration_bucket{cluster="B"}:

The left side matches cluster A and the right side matches cluster B. In this case, we have to query cluster A for the left side and cluster B for the right side, instead of sending the original query to both.

  3. If the original input query contains functions or operators, we cannot preserve them.
    For example, if the input is avg_over_time(http_request_duration_bucket{cluster="B"}[5m]), after parsing the whole query, extracting the labels and matching external labels, we can only get http_request_duration_bucket{cluster="B"}. The function part is lost. AFAIK, there is no way to edit the labels of an expression directly.
    So my current solution is just to return the metric selectors. If there are multiple selectors, we use + to concatenate them. This is not a good solution, but the results are correct.
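
The steps above can be sketched roughly as follows. This is an illustration in Python, not the actual fix: it assumes selectors always have the form name{label="value", ...}, handles only equality matchers, and finds selectors with a regular expression, where a real implementation would use a proper PromQL parser. The function names extract_selectors and query_for_store are hypothetical, as is the choice to strip matched external labels before forwarding.

```python
import re

# Simplifying assumption: selectors look like name{label="value", ...}.
SELECTOR_RE = re.compile(r'([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}')
# Only equality matchers are recognised; !=, =~ and !~ are ignored here.
MATCHER_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)\s*=\s*"([^"]*)"')


def extract_selectors(query):
    """Step 1: return a list of (metric_name, {label: value}) found in the query."""
    selectors = []
    for name, body in SELECTOR_RE.findall(query):
        selectors.append((name, dict(MATCHER_RE.findall(body))))
    return selectors


def query_for_store(query, external_labels):
    """Steps 2 and 3: build the query for one store, or None if no selector matches it."""
    parts = []
    for name, matchers in extract_selectors(query):
        # Skip selectors that ask for a different value of an external label.
        if any(k in matchers and matchers[k] != v for k, v in external_labels.items()):
            continue
        # Drop external-label matchers, since the local Prometheus does not store them.
        kept = {k: v for k, v in matchers.items() if k not in external_labels}
        body = ", ".join(f'{k}="{v}"' for k, v in kept.items())
        parts.append(f"{name}{{{body}}}" if body else name)
    # Functions and operators are lost; surviving selectors are joined with "+".
    return " + ".join(parts) if parts else None


binary = 'http_request_duration_bucket{cluster="A"} + http_request_duration_bucket{cluster="B"}'
print(query_for_store(binary, {"cluster": "A"}))
print(query_for_store('avg_over_time(http_request_duration_bucket{cluster="B", le="0.5"}[5m])',
                      {"cluster": "B"}))
```

In this sketch each store receives only the selectors whose external-label matchers agree with its own labels, so cluster A never sees the right-hand side of the binary expression, and a store that matches nothing is skipped entirely.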

Anyway, I have a working fix for this already. I will open a PR after test cases are added.

@kakkoyun
Member

@yeya24 Please ping me on the PR for the fix and let's open it against the release branch and we can easily release a patch.
