From c4f333d1c1cc54a70bccb77b298671b074198a99 Mon Sep 17 00:00:00 2001 From: Luca Forstner Date: Tue, 5 Jul 2022 15:51:02 +0200 Subject: [PATCH 1/4] Remove data from DSC which is potentially PII --- .../performance/dynamic-sampling-context.mdx | 25 +++++++++++-------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/src/docs/sdk/performance/dynamic-sampling-context.mdx b/src/docs/sdk/performance/dynamic-sampling-context.mdx index 7f388a36ea..bebab1edb4 100644 --- a/src/docs/sdk/performance/dynamic-sampling-context.mdx +++ b/src/docs/sdk/performance/dynamic-sampling-context.mdx @@ -32,7 +32,7 @@ This means that we need the exact same decision basis for all transactions belon In other words, all transactions of a trace need to hold all of the information to make a sampling decision, and that **information needs to be the same across all transactions of the trace**. We call the information we base sampling decisions on **"Dynamic Sampling Context"** or **"DSC"**. -+Currently, we can dynamically sample in two ways. First, we can do dynamic sampling on single transactions. For this process, Relay looks at the incoming event payload to make decisions. Second, we can do dynamic sampling across an entire trace. For this process, Relay relies on a **Dynamic Sampling Context**. +Currently, we can dynamically sample in two ways. First, we can do dynamic sampling on single transactions. For this process, Relay looks at the incoming event payload to make decisions. Second, we can do dynamic sampling across an entire trace. For this process, Relay relies on a **Dynamic Sampling Context**. As a mental model: The head transaction in a trace determines the Dynamic Sampling Context for all following transactions in that trace. @@ -99,7 +99,7 @@ SDKs should recognize incoming requests as "instrumented" or "traced" when at le - The incoming request has a `sentry-trace` header - The incoming request has a `baggage` header containing one or more keys starting with "`sentry-`" -After the DSC of a particular trace has been frozen, API calls like `set_user` or `set_transaction` should have no effect on the DSC. +After the DSC of a particular trace has been frozen, API calls like `set_user` should have no effect on the DSC. ## Payloads @@ -107,7 +107,8 @@ Dynamic Sampling Context is sent to Sentry via the `trace` envelope header and i All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage. -### Handling Sensitive Data and PII + + #### Strictly required values @@ -140,11 +141,10 @@ The value of this envelope header is a JSON object with the following fields: - `sample_rate` (string) - The sample rate as defined by the user on the SDK. This string should always be a number between (and including) 0 and 1 in basic float notation (`0.04242`) - no funky business like exponents or anything similar. If a `tracesSampler` callback was used for the sampling decision, its result should be used for `sample_rate` instead of the `tracesSampleRate` from `SentryOptions`. In case `tracesSampler` returns `True` it should be sent as `1.0`, `False` should be sent as `0.0`. - `release` (string) - The release name as specified in client options`. - `environment` (string) - The environment name as specified in client options. -- `user_id` (string) - User ID as set by the user with `scope.set_user`. **Only included, if [`send_default_pii`](#handling-sensitive-data-and-pii) is enabled.** - `user_segment` (string) - User segment as set by the user with `scope.set_user`. -- `transaction` (string) - The transaction name set on the scope. -It's important to note that at the current moment, only `release`, `environment`, `user_id`, `user_segment`, and `transaction` are used by the product for dynamic sampling functionality. The rest of the context attributes, `trace_id`, `public_key`, and `sample_rate`, are used by Relay for internal decisions (like transaction sample rate smoothing). +It's important to note that at the moment, only `release`, `environment`, and `user_segment` are used by the product for dynamic sampling functionality. +The rest of the context attributes, `trace_id`, `public_key`, and `sample_rate`, are used by Relay for internal decisions (like transaction sample rate smoothing). ### Baggage-Header @@ -155,9 +155,7 @@ SDKs may use the following keys to set entries on `baggage` HTTP headers: - `sentry-sample_rate` - `sentry-release` - `sentry-environment` -- `sentry-user_id` - **Only included, if [`send_default_pii`](#handling-sensitive-data-and-pii) is enabled.** - `sentry-user_segment` -- `sentry-transaction` SDKs must set all of the keys in the form of "`sentry-[name]`". The prefix "`sentry-`" acts to identify key-value pairs set by Sentry SDKs. @@ -220,6 +218,13 @@ These are not blockers to the adoption of the spec, but instead are here as cont ### The Temporal Problem + + +This section contains references to `transaction` and `user_id`, which are not part of the dynamic sampling context right now because of PII concerns. +They might however make a comeback in the future and the problem outlined here still applies to `user_segment`. + + + Unlike `environment` or `release`, which should always be known to an SDK at initialization time, `user_id`, `user_segment`, and `transaction` (name) are only known after SDK initialization time. This means that if a trace is propagated from a running transaction _BEFORE_ the user/transaction attributes are set, you'll get a portion of transactions in a trace that have different Dynamic Sampling Context than other portions, leading to _dynamic sampling across a trace_ not working as expected for users. From fed2f5ceba971756f7f0308eb7ea8b43a5aee6a9 Mon Sep 17 00:00:00 2001 From: Luca Forstner Date: Tue, 5 Jul 2022 16:07:36 +0200 Subject: [PATCH 2/4] Fix typo --- src/docs/sdk/performance/dynamic-sampling-context.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/docs/sdk/performance/dynamic-sampling-context.mdx b/src/docs/sdk/performance/dynamic-sampling-context.mdx index bebab1edb4..f5b0c23761 100644 --- a/src/docs/sdk/performance/dynamic-sampling-context.mdx +++ b/src/docs/sdk/performance/dynamic-sampling-context.mdx @@ -107,7 +107,7 @@ Dynamic Sampling Context is sent to Sentry via the `trace` envelope header and i All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage. - + -#### Strictly required values +### Strictly required values In any case, `trace_id`, `public_key`, and `sample_rate` should always be known to an SDK, so these values are strictly required. From b714ed28b03d10011340defa665c9ab7526e129e Mon Sep 17 00:00:00 2001 From: Luca Forstner Date: Wed, 6 Jul 2022 13:28:42 +0200 Subject: [PATCH 4/4] Remove commented out code --- .../performance/dynamic-sampling-context.mdx | 20 ------------------- 1 file changed, 20 deletions(-) diff --git a/src/docs/sdk/performance/dynamic-sampling-context.mdx b/src/docs/sdk/performance/dynamic-sampling-context.mdx index 02d929e463..07cbcd9001 100644 --- a/src/docs/sdk/performance/dynamic-sampling-context.mdx +++ b/src/docs/sdk/performance/dynamic-sampling-context.mdx @@ -107,26 +107,6 @@ Dynamic Sampling Context is sent to Sentry via the `trace` envelope header and i All of the values in the payloads below are required (non-optional) in a sense, that when they are known to an SDK at the time a transaction envelope is sent to Sentry, or at the time a baggage header is propagated, they must also be included in said envelope or baggage. - - - ### Strictly required values In any case, `trace_id`, `public_key`, and `sample_rate` should always be known to an SDK, so these values are strictly required.