Skip to content
This repository has been archived by the owner on Aug 14, 2024. It is now read-only.

ref(DSC): Only include user_id in DSC if the send-default-pii option is enabled #625

Merged
merged 4 commits into from
Jul 1, 2022

Conversation

Lms24
Copy link
Member

@Lms24 Lms24 commented Jun 30, 2022

This PR adds a condition to including the user_id field in the dynamic sampling context (DSC) that is sent via the trace envelope header to relay or propagated via the baggage Http header in outgoing requests.

As discussed in TSC and in Slack, adding user_id to the DSC raises a PII concern because many SDKs currently propagate the DSC via baggage in all outgoing Http requests. This means, that PII could be sent to third parties. To avoid this, the decision was made to only include the user_id if the send-default-pii init option was set to true.

It's important to note that another way of addressing issues like this one is in discussion, which is making the tracingOrigins option part of every SDK. With this option, users can clearly specify where tracing data should be sent to, and thereby also disable the propagation of DSC to third parties. Until this is defined more clearly and ready for development, we propose the solution added with this PR in the meantime.

Resolves #626

@vercel
Copy link

vercel bot commented Jun 30, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
develop ✅ Ready (Inspect) Visit Preview Jul 1, 2022 at 0:21AM (UTC)

@@ -118,7 +122,7 @@ The value of this envelope header is a JSON object with the following fields:
- `sample_rate` (string) - The sample rate as defined by the user on the SDK. This string should always be a number between (and including) 0 and 1 in basic float notation (`0.04242`) - no funky business like exponents or anything similar. If a `tracesSampler` callback was used for the sampling decision, its result should be used for `sample_rate` instead of the `tracesSampleRate` from `SentryOptions`. In case `tracesSampler` returns `True` it should be sent as `1.0`, `False` should be sent as `0.0`.
- `release` (string) - The release name as specified in client options`.
- `environment` (string) - The environment name as specified in client options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So only sending this in the envelope header if tracingOrigins allows would mean we have to check tracingOrigins against the DSN or proxy to see if we're allowed to send it, correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to user_id? Could you elaborate what you mean here? Generally, in the context of this PR we're not including tracingOrigins in any way or condition. Just wanted to mention in the description that discussions around it are in the works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this referred to user_id. Once discussed we could add something to clarify whether we need to check tracingOrigins against DSN / proxy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So currently, the JS SDK's BrowserTracing integration does not check for the DSN (or a proxy) but it only determines if the URL of the outgoing request matches a string or regex specified in tracingOrigins. So far, we haven't considered changing this but sincetracingOrigins is gonna be standardized, we can certainly discuss this. I would suggest though, discussing this in a separate issue about the tracingOrigins spec/changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Java we have our own transport(s) for communication with Sentry so we'd have to separately add the check there if we so choose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracingOrigins doesn't really matter when determining what is sent to Sentry. tracingOrigins should simply control which targets should receive a) a sentry-trace header b) DSC in a baggage header.

This shouldn't concern transports but rather request instrumentation. I might be wrong though since I don't know specifics on the Java SDK.

@@ -106,6 +106,10 @@ After the DSC of a particular trace has been frozen, API calls like `set_user` o
Dynamic Sampling Context is sent to Sentry via the `trace` envelope header and is propagated to downstream SDKs via a baggage header.

All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage.

The **only exception** is the `user_id` field.
To avoid sensitive data being leaked to third parties, `user_id` should only be included, if the <Link to="/sdk/data-handling/#sensitive-data">`send-default-pii`</Link> option was enabled in the init options.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify what send-default-pii controls exactly? Only new DSC or also DSC from incoming requests?

I'm really not sure about this one. If we're not propagating user_id when it's incoming, we're mutating the DSC which is bad. On the other hand, if we just propagate it, we give users a PII foot gun in the form of a tracing product - if not even a foot bazooka.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make it for new DSC - especially since the data here is based on the head SDK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also rename this to send_default_pii to match snake case (then folks know to use camel case if needed)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only new DSC or also DSC from incoming requests
Let's make it for new DSC

Yes, I strongly agree with only new DSC. We have to adapt this in all SDKs and hence, eventually, we should not get a user_id in incoming DSC. Also, I don't want to allow exceptions in the DSC immutability for now.

Can we also rename this to send_default_pii

Certainly, the only reason why I wrote it like this is because it was already written in that format in the dev docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're also good for the case that downstream SDKs set different sendDefaultPii values than the head-of-trace SDK. Users have to set DS rules on the DSC values from the head SDK anyway, meaning this is the only SDK where the Pii decision should matter

Copy link
Member Author

@Lms24 Lms24 Jul 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in efbc5a1 and 8679a0d

Comment on lines 112 to 114
Note that this only applies for the head-of-trace SDK that populates the DSC content.
In case downstream SDKs receive a `user_id` in incoming DSC, they should continue to propagate it. The rationale behind this is
that incoming DSC [must be frozen](#freezing-dynamic-sampling-context) and users must make DS rules based on the head SDK's DSC data anyway.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clarifying this!

@Lms24
Copy link
Member Author

Lms24 commented Jul 1, 2022

@lforst and I had a chat about the longevity of this solution. Since we all agreed (also at TSC) that the tracingOrigins solution is the more robust one (which also addresses other problems), I added a notice to this section that it is subject to change and might be revisited.

e818094

@Lms24 Lms24 merged commit 33b0b82 into master Jul 1, 2022
@Lms24 Lms24 deleted the lms-dsc-spec-userid-pii branch July 1, 2022 14:47
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DSC Spec] Only include user_id in DSC if the send-default-pii option is enabled
5 participants