Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

query-frontend: Extract org id from from request headers #3245

Merged
merged 6 commits into from
Nov 11, 2020

Conversation

neilfordyce
Copy link
Contributor

@neilfordyce neilfordyce commented Sep 28, 2020

Signed-off-by: Neil Fordyce neilfordyce@skyscanner.net

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  1. Add query-frontend.org-id-header flag to specify HTTP header(s) to populate slow query log org_id field: Fixes query-frontend: populate org_id in slow query logs #3177

Verification

Executed query-frontend pointed at a local querier with this configuration
thanos query-frontend --http-address=127.0.0.1:9093 --query-frontend.downstream-url=http://localhost:9090 --query-frontend.log-queries-longer-than="-1s" --query-frontend.org-id-header="x-grafana-user" --query-frontend.org-id-header="x-scope-orgid"

Ran query with expected headers

curl --request GET \
  --url 'http://localhost:9093/api/v1/query_range?query=up&dedup=true&partial_response=true&start=1594422000&end=1594508400&step=600&max_source_resolution=0s' \
  --header 'x-grafana-user: grafana-username'

grafana-username shows in org_id field of query frontend logs

level=info ts=2020-09-28T13:11:22.645878Z caller=frontend.go:220 org_id=grafana-username msg="slow query detected" method=GET host=localhost:9093 path=/api/v1/query_range time_taken=811.734669ms param_dedup=true param_partial_response=true param_start=1594422000 param_end=1594508400 param_step=600 param_max_source_resolution=0s param_query=up

Further testing summary

Input headers org_id field contents comment
--header 'x-grafana-user: grafana-username' --header 'X-Scope-OrgId: another-org-id' grafana-username check first arg specified has precedence
--header 'X-Scope-OrgId: another-org-id' another-org-id check second arg works if first arg header not present
no headers anonymous check fallback if no headers match

@yeya24 yeya24 self-requested a review September 28, 2020 13:29
@neilfordyce neilfordyce marked this pull request as ready for review September 28, 2020 14:40
@yeya24
Copy link
Contributor

yeya24 commented Sep 28, 2020

Thanks for the work. The CI failure is not related.

@krasi-georgiev
Copy link
Contributor

@neilfordyce can you explain when is this needed/useful?

Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small nits. Overall it looks very good! Would you mind also documenting this?

func extractOrgId(
conf *queryFrontendConfig,
r *http.Request,
) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit, we can inline this.

@@ -82,6 +83,10 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexFrontendConfig.LogQueriesLongerThan)

cmd.Flag("query-frontend.org-id-header", "Request header names used to set the org id field in the slow query log (repeated flag). "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about having a default org-id-header X-Scope-OrgId and users can also specify other headers as well. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds like a sensible default 👍. Will that help with compatibility with Cortex things perhaps?

@neilfordyce
Copy link
Contributor Author

neilfordyce commented Sep 30, 2020

@krasi-georgiev - absolutely, apologies for the missing context. We find it helpful to identify users and services who are creating (slow) queries. In that way, we can find the source of slow queries and alter the queries, e.g. if they are continually causing memory issues for queriers. It's a bit of a stopgap to help reduce problems large queries cause with high memory use; but more generally, it's useful to be able to see who is running what queries.

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Sep 30, 2020

Why not use some of the standard HTTP headers to identify the source that creates these slow queries?
Host, Origin, Referer, Via, User-Agent?

@neilfordyce
Copy link
Contributor Author

We definitely could do that, but the best header to use would likely vary depending on the client. E.g. most people interacting with our Thanos setup are logged in Grafana users, so it's handy to identify them via X-Grafana-User. Other users of Thanos might not be so interested in the specific user (or they might be anonymous in Grafana) so they could fallback to another standard header.

@krasi-georgiev
Copy link
Contributor

I would let one of the maintainers(cc @bwplotka, @kakkoyun ) decide on that, but I would say let's add something basic for now that solves your use-case and if needed can revisit later to make it configurable.

Would the origin header solve your use case?

@neilfordyce
Copy link
Contributor Author

I thought origin might be enough, but it appears like Grafana doesn't forward the origin when it's configured to proxy requests through the data source proxy.

This is the complete set of headers I can see from a request coming from the Prometheus data source. x-forwarded-for might be helpful, but an IP address isn't as helpful as a user name. { host: 'echo-proxy:80', 'user-agent': 'Grafana/7.0.6', accept: 'application/json, text/plain, */*', 'accept-encoding': 'gzip, deflate', 'accept-language': 'en-US,en;q=0.5', 'x-forwarded-for': '192.168.48.1, 192.168.48.1', 'x-grafana-nocache': 'true', 'x-grafana-org-id': '1', 'x-grafana-user': 'neil' }

If I change the datasource to go direct from the Browser there is an origin header, so we'd be able to see slow requests were coming from Grafana. But again probably not as helpful as seeing the user name in the logs.
{ host: 'localhost:8090', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:80.0) Gecko/20100101 Firefox/80.0', accept: 'application/json, text/plain, */*', 'accept-language': 'en-US,en;q=0.5', 'accept-encoding': 'gzip, deflate', referer: 'http://localhost:3000/datasources/edit/6/', origin: 'http://localhost:3000', connection: 'keep-alive' }

@krasi-georgiev
Copy link
Contributor

hm, yeah I see what you mean.

But isn't an IP a more unique ID here? It is not unlikely that many grafana installations will use the admin username which will cause the same log label from different sources.

@krasi-georgiev
Copy link
Contributor

I see host is present in both modes here

@krasi-georgiev
Copy link
Contributor

btw could this be an issue with Grafana not adding the origin header?

grafana/grafana#22869

@neilfordyce
Copy link
Contributor Author

We only allow logged in users in our Grafana setup, so the x-grafana-user header is always set and gives us exactly what we want. IP address could be helpful in some setups, but for others it might only reveal the address of a NAT which would be the same for all users. My thought was, different headers will make sense in different contexts. This might be why Cortex have the user id header fixed to X-Scope-OrgId, then leaves it up to a reverse proxy to allow users to fill out that header whatever logic they want from the request.

@krasi-georgiev
Copy link
Contributor

krasi-georgiev commented Oct 1, 2020

Got it - using IP or Origin header will always be the same when the request comes from the same grafana instance. Using the grafana username will indeed give you a better understanding of which use is sending these heavy queries.

@@ -82,6 +83,10 @@ func registerQueryFrontend(app *extkingpin.App) {
cmd.Flag("query-frontend.log-queries-longer-than", "Log queries that are slower than the specified duration. "+
"Set to 0 to disable. Set to < 0 to enable on all queries.").Default("0").DurationVar(&cfg.CortexFrontendConfig.LogQueriesLongerThan)

cmd.Flag("query-frontend.org-id-header", "Request header names used to set the org id field in the slow query log (repeated flag). "+
"If multiple headers match the request, the first matching arg specified will take precedence. "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this bit? Can you add test for this case to make it a bit more clear?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yeah this sounded clearer in my head; I'll try to make the behaviour clearer in the help string. The idea is that if --query-frontend.org-id-header is used multiple times, we try to extract a header from the request in order the command line args were specified.

There are test cases to check the precedence behaviour in query_frontend_test.go, e.g. we check that when both --query-frontend.org-id-header=x-server-id and query-frontend.org-id-header=x-grafana-user are specified that org id is set to the value of the x-server-id header. However, happy to split that into a few test functions with different names if it would be clearer what's being tested?

Also, I noticed that this file has changed a bit since writing it and after rebasing it seems bit out of place. Considering moving the new logic into pkg/queryfrontend. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying, now looking at the test it is clear.
I think the test location is fine, but if you prefer can move it.

@@ -161,6 +161,12 @@ Flags:
Log queries that are slower than the specified
duration. Set to 0 to disable. Set to < 0 to
enable on all queries.
--query-frontend.org-id-header=<http-header-name> ...
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yeya24 - thanks for the feedback. Regarding docs, I'm considering adding something to explain the new flag in the slow-query-log section of query-frontend.md. And probably some godoc to explain the new function. Is that the kind of thing you had in mind?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply add this description to the flag and I think it would be clear 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. I think it's already like that. This description is the one generated from the flag in cmd/thanos/query_frontend.go.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I mean you can add something like This ord id will be added to the slow-query-log to the flag description directly and then regenerate the doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made an update to the flag description. Hope it's clearer, but open to suggestions for a better way to describe it.

@krasi-georgiev
Copy link
Contributor

LGTM

@yeya24
Copy link
Contributor

yeya24 commented Nov 10, 2020

Sorry for the late response. I think this feature is nice to have. @neilfordyce Can you please fix the conflicts? Then we can merge 🥇

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
@neilfordyce
Copy link
Contributor Author

Thanks for the bump. This slipped my mind. Sure thing, I've rebased and fixed the conflicts. The test failure seems to be for unrelated reasons.

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think it is clear. @bwplotka @kakkoyun Can you please also take a look at this pr?

Copy link
Member

@kakkoyun kakkoyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇 LGTM. Thanks for your contribution.

@kakkoyun kakkoyun merged commit 713ff65 into thanos-io:master Nov 11, 2020
Oghenebrume50 pushed a commit to Oghenebrume50/thanos that referenced this pull request Dec 7, 2020
)

* query-frontend: Extract org id from from request headers

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>

* docs: regenerate docs

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>

* query-frontend: gofmt

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>

* query-frontend: inline function header

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>

* docs: rebuild query-frontend docs

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>

* docs: clarify usage of org-id-header flag

Signed-off-by: Neil Fordyce <neil.fordyce@skyscanner.net>
Signed-off-by: Oghenebrume50 <raphlbrume@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query-frontend: populate org_id in slow query logs
4 participants