Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop Kibana from generating unique X-Opaque-ID values when not received from the client. #123737

Open
pgayvallet opened this issue Jan 25, 2022 · 10 comments
Assignees
Labels
Feature:elasticsearch impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:large Large Level of Effort Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Copy link
Contributor

We've been historically generating unique x-opaque-id values for each Kibana request, and transmitting those values to elasticsearch every time a request is performed using a scoped client.

This was done with the assumption that this value had no semantics or no functional usage in elasticsearch. However, this assumption was wrong, as this opaqueId value is now used for some features such as deprecation logs deduplication.

We need to stop generating values for x-opaque-id / (also exposed as request.id) when the client initiating the request against Kibana does not provide one, and not propagate the value to elasticsearch.

Now that ES has support for the traceparent header (elastic/elasticsearch#74210), we can use the trace.id value as correlation id for Kibana features that require so.

Note that if this x-opaque-id/request.id value is supposed to be mostly internal to core, it's currently used by the audit logging as their correlation value between Kibana and Elasticsearch audit logs. We will need to adapt Kibana's audit logging to stop overriding trace.id with request.id and update their documentation to reflect the fact that the correlation value will now be trace.id

Also, for this traceparent header to be sent to elasticsearch, the APM agent needs to be enabled, as least in contextPropagationOnly mode (which is the default configuration). We need to figure out if this limitation is acceptable and if documenting it is good enough, or if we may want to consider this feature of the APM agent as an implementation detail and stop allowing customer to disable it.

More context in the original issue: #120124

cc @elastic/kibana-security

@pgayvallet pgayvallet added Feature:elasticsearch Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc labels Jan 25, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-core (Team:Core)

@exalate-issue-sync exalate-issue-sync bot added impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:small Small Level of Effort labels Feb 1, 2022
@exalate-issue-sync exalate-issue-sync bot added loe:large Large Level of Effort and removed loe:small Small Level of Effort labels Feb 8, 2022
@pgayvallet
Copy link
Contributor Author

A draft PR doing most of the technical changes is available here: #120124

We still need a decision regarding

Also, for this traceparent header to be sent to elasticsearch, the APM agent needs to be enabled, as least in contextPropagationOnly mode (which is the default configuration). We need to figure out if this limitation is acceptable and if documenting it is good enough, or if we may want to consider this feature of the APM agent as an implementation detail and stop allowing customer to disable it.

We may want to perform load testing to determine the impact of the contextPropagationOnly compared to a fully disabled APM agent to decide if forcing it to always be enabled is acceptable.

We can probably tweak the load tests from elastic/kibana-load-testing#221 to check the perf impact of contextPropagationOnly

@lukeelmers
Copy link
Member

We can probably tweak the load tests from elastic/kibana-load-testing#221 to check the perf impact of contextPropagationOnly

@dmlemeshko I know you've been looking into perf impacts of enabling APM, would you be able to help us assess the impact of this particular option?

@dmlemeshko
Copy link
Member

@lukeelmers
Sorry for late response, I will definitely help as much as I can. To check the impact I need either a branch I can use for testing (and compare with main) or required Kibana configuration changes I need to apply.

Please just open an issue with details and will try to do it asap.

@pgayvallet
Copy link
Contributor Author

@dmlemeshko I opened elastic/kibana-load-testing#245. Please tell me if you need more information

@pgayvallet
Copy link
Contributor Author

@dmlemeshko did you find an opportunity to take a look at elastic/kibana-load-testing#245?

@pgayvallet
Copy link
Contributor Author

I performed the benchmarking and posted the results here: #129585 (comment)

TLDR: there is a significant difference between APM being fully disabled and being enabled in contextPropagationOnly mode, which means forcing APM to be enabled in contextPropagationOnly doesn't seems like a viable option in the scope of this issue.

So I guess we're back to the initial plan to simply document that features relying on trace.id such as audit logging require APM to not be disabled to work.

@lukeelmers wdyt?

@jportner
Copy link
Contributor

jportner commented May 5, 2022

So I guess we're back to the initial plan to simply document that features relying on trace.id such as audit logging require APM to not be disabled to work.

@lukeelmers wdyt?

Since I was subscribed to this issue: I'll chime in on behalf of Platform Security and say I am fine with this 👍

@lukeelmers
Copy link
Member

So I guess we're back to the initial plan to simply document that features relying on trace.id such as audit logging require APM to not be disabled to work.

@lukeelmers wdyt?

I'll chime in on behalf of Platform Security and say I am fine with this

++ Agreed, I think this is acceptable given these findings, as long as Platform Security is comfortable with it.

@pgayvallet
Copy link
Contributor Author

So #123197 will need to update the audit logging documentation that was added in #123757 to inform that APM needs to be enabled at least for context propagation for the feature to function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:elasticsearch impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. loe:large Large Level of Effort Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants