ref(DSC): Only include `user_id` in DSC if the `send-default-pii` option is enabled #625
@@ -118,7 +122,7 @@ The value of this envelope header is a JSON object with the following fields:
- `sample_rate` (string) - The sample rate as defined by the user on the SDK. This string should always be a number between (and including) 0 and 1 in basic float notation (`0.04242`) - no funky business like exponents or anything similar. If a `tracesSampler` callback was used for the sampling decision, its result should be used for `sample_rate` instead of the `tracesSampleRate` from `SentryOptions`. In case `tracesSampler` returns `True` it should be sent as `1.0`, `False` should be sent as `0.0`.
- `release` (string) - The release name as specified in client options.
- `environment` (string) - The environment name as specified in client options.
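The `sample_rate` formatting rule above could be sketched roughly as follows. This is only an illustration, not taken from any SDK: the helper name `format_sample_rate` is hypothetical, and rejecting out-of-range values with a `ValueError` is an assumption, since the text only says the string should be a number between 0 and 1.

```python
from decimal import Decimal


def format_sample_rate(value):
    """Render a sampling decision as a plain decimal string (hypothetical helper)."""
    # A tracesSampler callback may return a boolean:
    # True is sent as "1.0", False as "0.0".
    if isinstance(value, bool):
        return "1.0" if value else "0.0"

    rate = float(value)
    # Assumption: values outside [0, 1] (and NaN) are treated as an error.
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"sample rate out of range: {value!r}")

    # repr() may use an exponent for small floats (e.g. '1e-05');
    # formatting through Decimal with 'f' expands it to basic float notation.
    return format(Decimal(repr(rate)), "f")


print(format_sample_rate(0.04242))  # 0.04242
print(format_sample_rate(1e-05))    # 0.00001
print(format_sample_rate(True))     # 1.0
```

Going through `Decimal` rather than a fixed-precision format string avoids both exponents and padded trailing zeros.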
So only sending this in the envelope header if `tracingOrigins` allows would mean we have to check `tracingOrigins` against the DSN or proxy to see if we're allowed to send it, correct?
Are you referring to `user_id`? Could you elaborate what you mean here? Generally, in the context of this PR we're not including `tracingOrigins` in any way or condition. Just wanted to mention in the description that discussions around it are in the works.
Yes, this referred to `user_id`. Once discussed we could add something to clarify whether we need to check `tracingOrigins` against DSN / proxy.
So currently, the JS SDK's `BrowserTracing` integration does not check for the DSN (or a proxy) but it only determines if the URL of the outgoing request matches a string or regex specified in `tracingOrigins`. So far, we haven't considered changing this but since `tracingOrigins` is gonna be standardized, we can certainly discuss this. I would suggest though, discussing this in a separate issue about the `tracingOrigins` spec/changes.
For Java we have our own transport(s) for communication with Sentry so we'd have to separately add the check there if we so choose.
`tracingOrigins` doesn't really matter when determining what is sent to Sentry. `tracingOrigins` should simply control which targets should receive a) a `sentry-trace` header b) DSC in a baggage header.
This shouldn't concern transports but rather request instrumentation. I might be wrong though since I don't know specifics on the Java SDK.
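To make the distinction concrete, here is a rough sketch of request instrumentation gating the two headers on `tracingOrigins`. It is written in Python purely for illustration; the function names are hypothetical, and treating plain strings as substring matches is an assumption (the thread only says the URL is matched against "a string or regex").

```python
import re


def should_attach_tracing_headers(url, tracing_origins):
    """Return True if the outgoing request URL matches any configured origin."""
    for origin in tracing_origins:
        if isinstance(origin, re.Pattern):
            # Regex entries are searched anywhere in the URL.
            if origin.search(url):
                return True
        elif origin in url:
            # Assumption: string entries match as substrings of the URL.
            return True
    return False


def tracing_headers_for(url, tracing_origins, sentry_trace, baggage):
    """Headers the instrumentation would add to one outgoing request."""
    if not should_attach_tracing_headers(url, tracing_origins):
        # Non-matching targets (e.g. third parties) get neither header.
        return {}
    return {"sentry-trace": sentry_trace, "baggage": baggage}


origins = ["localhost", re.compile(r"^https://api\.example\.com/")]
print(should_attach_tracing_headers("http://localhost:3000/users", origins))    # True
print(should_attach_tracing_headers("https://third-party.test/pay", origins))  # False
```

Note that nothing here touches the transport that talks to Sentry; the check only runs where outgoing requests are instrumented.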
@@ -106,6 +106,10 @@ After the DSC of a particular trace has been frozen, API calls like `set_user` o
Dynamic Sampling Context is sent to Sentry via the `trace` envelope header and is propagated to downstream SDKs via a baggage header.

All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage.

The **only exception** is the `user_id` field.
To avoid sensitive data being leaked to third parties, `user_id` should only be included, if the <Link to="/sdk/data-handling/#sensitive-data">`send-default-pii`</Link> option was enabled in the init options.
Can we clarify what `send-default-pii` controls exactly? Only new DSC or also DSC from incoming requests?
I'm really not sure about this one. If we're not propagating user_id when it's incoming, we're mutating the DSC which is bad. On the other hand, if we just propagate it, we give users a PII foot gun in the form of a tracing product - if not even a foot bazooka.
Let's make it for new DSC - especially since the data here is based on the head SDK.
Can we also rename this to `send_default_pii` to match snake case (then folks know to use camel case if needed)?
> Only new DSC or also DSC from incoming requests
>
> Let's make it for new DSC

Yes, I strongly agree with only new DSC. We have to adapt this in all SDKs and hence, eventually, we should not get a `user_id` in incoming DSC. Also, I don't want to allow exceptions in the DSC immutability for now.

> Can we also rename this to `send_default_pii`

Certainly, the only reason why I wrote it like this is because it was already written in that format in the dev docs.
I think we're also good for the case that downstream SDKs set different `sendDefaultPii` values than the head-of-trace SDK. Users have to set DS rules on the DSC values from the head SDK anyway, meaning this is the only SDK where the PII decision should matter.
Note that this only applies for the head-of-trace SDK that populates the DSC content.
In case downstream SDKs receive a `user_id` in incoming DSC, they should continue to propagate it. The rationale behind this is
that incoming DSC [must be frozen](#freezing-dynamic-sampling-context) and users must make DS rules based on the head SDK's DSC data anyway.
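The head-of-trace versus downstream behaviour described in this hunk might look roughly like the sketch below. It is a simplified illustration under stated assumptions: the function names and the `options` dictionary shape are hypothetical, only the DSC fields mentioned in this PR are shown, and `MappingProxyType` is just one way to model a frozen DSC.

```python
from types import MappingProxyType


def build_head_dsc(options, sample_rate, user_id=None):
    """Head-of-trace SDK: populate a new DSC from locally known values."""
    dsc = {"sample_rate": sample_rate}
    # Values are included whenever they are known to the SDK.
    for key in ("release", "environment"):
        if options.get(key) is not None:
            dsc[key] = options[key]
    # The only exception: user_id is gated on the send_default_pii option.
    if user_id is not None and options.get("send_default_pii", False):
        dsc["user_id"] = user_id
    # Return a read-only view to model the frozen DSC.
    return MappingProxyType(dsc)


def continue_incoming_dsc(incoming_dsc):
    """Downstream SDK: propagate incoming DSC unchanged.

    The local send_default_pii setting is deliberately not consulted here,
    so an incoming user_id keeps being propagated.
    """
    return MappingProxyType(dict(incoming_dsc))


options = {"release": "app@1.0.0", "environment": "prod", "send_default_pii": False}
print("user_id" in build_head_dsc(options, "0.25", user_id="42"))  # False
print(continue_incoming_dsc({"sample_rate": "0.25", "user_id": "42"})["user_id"])  # 42
```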
Thank you for clarifying this!
This PR adds a condition to including the `user_id` field in the dynamic sampling context (DSC) that is sent via the `trace` envelope header to Relay or propagated via the `baggage` HTTP header in outgoing requests.

As discussed in TSC and in Slack, adding `user_id` to the DSC raises a PII concern because many SDKs currently propagate the DSC via `baggage` in all outgoing HTTP requests. This means that PII could be sent to third parties. To avoid this, the decision was made to only include the `user_id` if the `send-default-pii` init option was set to `true`.

It's important to note that another way of addressing issues like this one is in discussion, which is making the `tracingOrigins` option part of every SDK. With this option, users can clearly specify where tracing data should be sent to, and thereby also disable the propagation of DSC to third parties. Until this is defined more clearly and ready for development, we propose the solution added with this PR in the meantime.

Resolves #626