[Query Frontend] support flag --web.external-url #6370

Closed
chrisduong opened this issue May 17, 2023 · 9 comments · May be fixed by #6524

Comments

@chrisduong

chrisduong commented May 17, 2023

Is your proposal related to a problem?

I know this was already requested in #364, but I don't think it was fully resolved, and it may be worth another try, since this has been a stable Prometheus feature for a long time.

I deployed Thanos behind a reverse proxy with an external URL (e.g. https://thanos.example.com).

In the Thanos Rules UI, the links for rule expressions are wrong: they always use "http://localhost:10902" as the web URL rather than the external URL "https://thanos.example.com".

For example, the "AlertManagerClusterDown" rule expression links to a URL like this:

http://localhost:10902/graph?g0.expr=(count%20by%20(namespace%2C%20service)%20(avg_over_time(up%7Bjob%3D%22kube-prometheus-stack-alertmanager%22%2Cnamespace%3D%22monitoring%22%7D%5B5m%5D)%20%3C%200.5)%20%2F%20count%20by%20(namespace%2C%20service)%20(up%7Bjob%3D%22kube-prometheus-stack-alertmanager%22%2Cnamespace%3D%22monitoring%22%7D))%20%3E%3D%200.5&g0.tab=1&g0.stacked=0&g0.range_input=1h

Without this feature, the user has to copy the rule expression URL and replace "http://localhost:10902" with "https://thanos.example.com" to review the expression result in Thanos.
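
Until such a flag exists, the manual replacement described above can be scripted. The following is only a minimal sketch: the helper name `rebase_link` is hypothetical and not part of Thanos, and it assumes the broken link differs from the desired one only in scheme, host and an optional path prefix.

```python
from urllib.parse import urlsplit, urlunsplit


def rebase_link(link: str, external_url: str) -> str:
    """Swap the scheme and host of `link` for those of `external_url`.

    The path, query string and fragment of `link` are kept unchanged.
    Any path prefix on `external_url` is prepended to the link's path.
    """
    src = urlsplit(link)
    ext = urlsplit(external_url)
    path = ext.path.rstrip("/") + src.path
    return urlunsplit((ext.scheme, ext.netloc, path, src.query, src.fragment))


# Shortened example link; the real links carry a much longer g0.expr value.
fixed = rebase_link(
    "http://localhost:10902/graph?g0.expr=up&g0.tab=1",
    "https://thanos.example.com",
)
print(fixed)
```

The query string is passed through untouched, so the already-encoded PromQL expression is not re-encoded.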

I hope we can make it work this time. Thank you 🙂

Describe the solution you'd like

If Query Frontend knew that it is served behind a reverse proxy at an external URL, rule expression links could use that external URL instead.

For example:

https://thanos.example.com/graph?g0.expr=(count%20by%20(namespace%2C%20service)%20(avg_over_time(up%7Bjob%3D%22kube-prometheus-stack-alertmanager%22%2Cnamespace%3D%22monitoring%22%7D%5B5m%5D)%20%3C%200.5)%20%2F%20count%20by%20(namespace%2C%20service)%20(up%7Bjob%3D%22kube-prometheus-stack-alertmanager%22%2Cnamespace%3D%22monitoring%22%7D))%20%3E%3D%200.5&g0.tab=1&g0.stacked=0&g0.range_input=1h
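
To illustrate how such a link is put together, here is a rough sketch that builds a graph URL from an external base URL and a PromQL expression. The function name `graph_link` is made up for illustration, and the query parameters are simply copied from the example URL above; this is not how the Thanos UI itself generates links.

```python
from urllib.parse import quote


def graph_link(external_url: str, expr: str, range_input: str = "1h") -> str:
    """Build a /graph link for `expr` under `external_url`.

    The characters listed in `safe` are left unescaped so that the result
    matches the encoding seen in the example URL (parentheses unescaped,
    while commas, braces and quotes are percent-encoded).
    """
    encoded = quote(expr, safe="!*'()")
    base = external_url.rstrip("/")
    return (
        f"{base}/graph?g0.expr={encoded}"
        f"&g0.tab=1&g0.stacked=0&g0.range_input={range_input}"
    )


# A short expression is used here to keep the output readable.
print(graph_link("https://thanos.example.com", 'up{job="a"} > 0'))
```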
@thomas-maurice

@chrisduong did you manage to fix the issue? I'm seeing the same thing in my deployment!

@chrisduong
Author

No, this flag is not implemented yet.

@thomas-maurice

I managed to troubleshoot this a little, but I hit a roadblock.

The issue shows up on the Rules and Alerts pages: when hovering over an expression, the link points to http://localhost:10902 instead of the right URL. This happens even though the --alert.query-url flag is set properly when starting the container. I see it on Thanos v0.28.1.

In the JS console, console.log(THANOS_QUERY_URL) prints http://localhost:10902, which makes me think some variable assignment does not happen correctly in https://github.com/thanos-io/thanos/blob/main/pkg/ui/react-app/src/thanos/config.ts#L1

However, I cannot reproduce the bug when running either main or 0.28.1 locally from the releases. Running something like

docker run -it -p 10902:10902 thanosio/thanos:v0.28.1 --alert.query-url=https://hello.com --query=https://localhost:9090

gives me the right THANOS_QUERY_URL locally.

We are using the Bitnami-built image, but after @vide ran it locally we saw that the variable shows up correctly in the console there too, so something must be wrong with the way we deploy the binary. This is where I hit the roadblock.

We are running on Kubernetes, and an excerpt of the kubectl describe output looks like this:

Containers:
  ruler:
    Container ID:  [container]
    Image:         [image]
    Image ID:      [image]
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      rule
      --log.level=info
      --log.format=logfmt
      --grpc-address=0.0.0.0:10901
      --http-address=0.0.0.0:10902
      --data-dir=/data
      --eval-interval=1m
      --alertmanagers.url=dns+http://[REDACTED]:9093
      --alertmanagers.url=https://[REDACTED]
      --query=dnssrv+_http._tcp.[REDACTED].svc.cluster.local
      --label=replica="$(POD_NAME)"
      --label=ruler_cluster=""
      --alert.label-drop=replica
      --objstore.config-file=/conf/objstore/objstore.yml
      --rule-file=/conf/rules/*.yml
      --alert.query-url=https://[OUR DOMAIN NAME]
    State:          Running

According to the documentation, this should work. Is your deployment setup anything like this, @chrisduong? How are you deploying your Ruler? The fact that it works locally, both in a container and as the bare binary, leads me to think it has something to do with the deployment method.
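
When comparing deployments, it can help to pull the flag out of the describe output mechanically instead of reading it by eye. The sketch below is an assumption-laden helper, not a kubectl feature: `container_args` and `flag_value` are hypothetical names, and the parser only relies on the indentation layout of the excerpt above (entries indented deeper than the `Args:` key). The sample text is a shortened stand-in with an example domain.

```python
from typing import List, Optional

# Shortened stand-in for `kubectl describe pod` output (example values).
SAMPLE = """\
Containers:
  ruler:
    Args:
      rule
      --http-address=0.0.0.0:10902
      --alert.query-url=https://thanos.example.com
    State:          Running
"""


def container_args(describe_text: str) -> List[str]:
    """Collect the entries listed under the first 'Args:' key."""
    args: List[str] = []
    args_indent = None
    for line in describe_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if args_indent is None:
            if stripped == "Args:":
                args_indent = indent
            continue
        # The block ends at a blank line or at a line that is not
        # indented deeper than the 'Args:' key itself.
        if not stripped or indent <= args_indent:
            break
        args.append(stripped)
    return args


def flag_value(args: List[str], flag: str) -> Optional[str]:
    """Return the value of the last `flag=value` entry, or None if absent."""
    prefix = flag + "="
    values = [a[len(prefix):] for a in args if a.startswith(prefix)]
    return values[-1] if values else None


print(flag_value(container_args(SAMPLE), "--alert.query-url"))
```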

@stefreak

I have the same issue. In the thanos-10.5.5 helm chart I specified these values:

[..]
  ruler:
    enabled: true
    extraFlags: ["--alert.query-url=https://thanos.gardencloudmgmt.sys.garden/"]
[..]

And the ruler itself has the correct command line options:

Name:         thanos-ruler-0
Namespace:    monitoring
[..]
Status:       Running
IP:           10.0.1.11
IPs:
  IP:           10.0.1.11
Controlled By:  StatefulSet/thanos-ruler
Containers:
  ruler:
    Container ID:  docker://69506e9a45353c407e9dea877693fb5d1e23ae4ec37e781170e56a393d8f03b5
    Image:         docker.io/bitnami/thanos:0.27.0-scratch-r0
    Image ID:      docker-pullable://bitnami/thanos@sha256:78253039e561910f14cbddcd1b452a5b5b63f9154cfb71470715b04057d6d194
    Ports:         10902/TCP, 10901/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      rule
      --log.level=info
      --log.format=logfmt
      --grpc-address=0.0.0.0:10901
      --http-address=0.0.0.0:10902
      --data-dir=/data
      --eval-interval=1m
      --alertmanagers.url=http://alertmanager-kube-prometheus-stack-alertmanager-0.alertmanager-operated:9093
      --alertmanagers.url=http://alertmanager-kube-prometheus-stack-alertmanager-1.alertmanager-operated:9093
      --query=dnssrv+_http._tcp.thanos-query.monitoring.svc.cluster.local
      --label=replica="$(POD_NAME)"
      --label=ruler_cluster=""
      --alert.label-drop=replica
      --objstore.config-file=/conf/objstore/objstore.yml
      --rule-file=/conf/rules/*.yml
      --alert.query-url=https://(redacted)/

But the links in the Thanos Query Frontend UI are still broken (pointing to localhost).

stefreak added a commit to stefreak/thanos that referenced this issue Jul 12, 2023
The `THANOS_QUERY_URL` constant is already assigned to `{{ .queryURL }}` in `public/index.html`.
Remove this comparison, as it causes the `--alert.query-url` command line option to have no effect whatsoever.

Fixes thanos-io#6370

Signed-off-by: Steffen Neubauer <stefreak@googlemail.com>
@GiedriusS
Member

Something must be wrong with docker.io/bitnami/thanos:0.27.0-scratch-r0 or the other image. I am running the latest main with the exact same parameter, and I can't reproduce your problem.

@GiedriusS
Member

GiedriusS commented Jul 12, 2023

I believe this was fixed by #4847. I can't reproduce this in either the Query UI or the Ruler UI. Do you also see this problem in the Query UI (not on query-frontend)?

@chrisduong
Author

> I believe this was fixed by #4847. I can't reproduce this in either the Query UI or the Ruler UI. Do you also see this problem in the Query UI (not on query-frontend)?

I believe PR #4847 is for --alert.query-url, not --web.external-url.

@fabriciolos

I decided to test everything thoroughly today to find out what is needed for the --alert.query-url arg to work on Kubernetes.

I managed to reproduce both the error and the solution. It comes down to this: --alert.query-url must be set on the query pod; the value is then passed on to the query-frontend pod without any additional args.

TL;DR:
--alert.query-url on the ruler image: used in the AlertManager UI for the clickable query/PromQL expression.
--alert.query-url on the query image: used in BOTH the query and query-frontend UIs for the clickable query/PromQL expression.
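
The TL;DR above can be turned into a quick sanity check over a deployment's container args. This is only a sketch under assumptions: the function name, the dictionary layout and the component keys ("query", "rule", "query-frontend") are invented for illustration, and the rule it encodes is just the finding described in this comment.

```python
from typing import Dict, List, Sequence

FLAG = "--alert.query-url"


def components_missing_flag(
    deployment: Dict[str, List[str]],
    required: Sequence[str] = ("query", "rule"),
) -> List[str]:
    """Return the required components whose args do not set FLAG.

    query-frontend is deliberately not checked: per the finding above,
    its UI links follow the value set on the query component.
    """
    missing = []
    for name in required:
        args = deployment.get(name, [])
        if not any(a == FLAG or a.startswith(FLAG + "=") for a in args):
            missing.append(name)
    return missing


# Hypothetical deployment where only the ruler carries the flag,
# which is the situation reported earlier in this thread.
deployment = {
    "rule": ["rule", "--alert.query-url=https://thanos.example.com"],
    "query": ["query", "--http-address=0.0.0.0:10902"],
    "query-frontend": ["query-frontend"],
}
print(components_missing_flag(deployment))
```

An empty result means both components carry the flag; a non-empty one names the pods worth checking first.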

@Aracki

Aracki commented Nov 8, 2023

Exactly! I was missing --alert.query-url on query image. Thank you.

6 participants