Clarify that changing the Span async from OnStart is undefined behavior #2751

bogdandrutu · 2022-08-25T14:24:02Z

The reason to make this undefined behavior is because it is not guarantee that the span is still active after the call to OnStart finishes,
so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior.

Signed-off-by: Bogdan Drutu bogdandrutu@gmail.com

specification/trace/sdk.md

The reason to make this undefined behavior is because it is not guarantee that the span is still active after the call to OnStart finishes, so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior. Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>

specification/trace/sdk.md

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

Oberon00

I think we should weaken that requirement.

so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior.

What would that synchronization logic be? The SDK needs to do some synchronization logic for spans anyway because they are thread safe, and it is valid to use a span after it ended. We specified that the write operations will be no-op though.

I originally wanted to object that processors could use OnEnd to detect when the span ends, but once OnEnd is invoked the span is already ended, so that would be too late. Of course one can also check if the span has ended (there is an explicit requirement that this is queryable from a readable span), but that is transient information as well.

Still, for some use cases a "works most of the time" will be enough, e.g. to regularly add a stacktrace in an event on the span.

I think we don't need to add any further requirements to the spec, this is already expressed through the text on End:

Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).

https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#end

I think it does makes sense to add a note that after OnStart returns the span can be ended at any time, and this will happen before OnEnd is invoked.

There is one contradictory statement in the spec under https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-operations

With the exception of the function to retrieve the Span's SpanContext and recording status, none of the below may be called after the Span is finished.

The "may" is lowercased, and it seems inconsistent with the spec part for Span.End quoted above. We should change one of these locations, probably based on what SDKs allow today. If we want to keep the ban on calling span methods "after the Span is finished" it would be nice to clarify what this means in combination with thread safety, i.e. I think this would make End thread un-safe.

bogdandrutu · 2022-08-26T15:20:30Z

You are making a not always true assumption that the same "instance" is returned in the OnStart as well as to the instrumentation user via StartSpan, in terms of synchronization etc. OnStart happens during StartSpan so the access is exclusive to the SDK, which can pass a non thread safe implementation to the OnStart.

The SDK needs to do some synchronization logic for spans anyway because they are thread safe, and it is valid to use a span after it ended. We specified that the write operations will be no-op though.

False; The operation happens during StartSpan and access is exclusive to the SDK, prior to the moment is return to the instrumentation call.

I originally wanted to object that processors could use OnEnd to detect when the span ends, but once OnEnd is invoked the span is already ended, so that would be too late. Of course one can also check if the span has ended (there is an explicit requirement that this is queryable from a readable span), but that is transient information as well.
Still, for some use cases a "works most of the time" will be enough, e.g. to regularly add a stacktrace in an event on the span.

I don't see these use-cases, and again you are requiring to return a thread safe implementation if you do this. Also any implementation that "works most of the time" is a bad implementation, so we should not encourage that.

The "may" is lowercased, and it seems inconsistent with the spec part for Span.End quoted above. We should change one of these locations, probably based on what SDKs allow today. If we want to keep the ban on calling span methods "after the Span is finished" it would be nice to clarify what this means in combination with thread safety, i.e. I think this would make End thread un-safe.

Sure, we should fix the inconsistency in the specs. Undefined Behavior is for the instance passed as argument toOnStart not for the instance returned by StartSpan, even though they are implementing the same interface does not mean they have same requirements.

Oberon00 · 2022-08-26T16:20:14Z

You are making a not always true assumption that the same "instance" is returned in the OnStart as well as to the instrumentation user via StartSpan, in terms of synchronization etc.

I am indeed making this assumption, and I think all SDKs will have to implement it that way or at least something very close to it, because we have the following in the spec (note especially the second paragraph) at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#additional-span-interfaces:

Read/write span: A function receiving this as argument must have access to both the full span API as defined in the API-level definition for span's interface and additionally must be able to retrieve all information that was added to the span (as with readable span).

It MUST be possible for functions being called with this to somehow obtain the same Span instance and type that the span creation API returned (or will return) to the user (for example, the Span could be one of the parameters passed to such a function, or a getter could be provided).

False; The operation happens during StartSpan and access is exclusive to the SDK, prior to the moment is return to the instrumentation call.

Only the read-access is specific to the SDK, and that we need to have thread safe together with concurrent write access anyways because of the valid read-only use cases that are allowed even after this PR.

I don't see these use-cases,

I think the example I mentioned, attaching an event with the stacktrace regularly to every span, is a valid one (e.g. every 2 minutes to detect long-running spans).

Also any implementation that "works most of the time" is a bad implementation, so we should not encourage that.

Sure, we can even explicitly discourage it, but IMHO we should not make currently allowed valid use cases impossible (technically this would even be a breaking change to a stable API)

bogdandrutu · 2022-08-26T21:09:41Z

Sure, we can even explicitly discourage it, but IMHO we should not make currently allowed valid use cases impossible (technically this would even be a breaking change to a stable API)

@Oberon00 how come is a breaking change? A UB does not mean code breaking, means undefined behavior, so the behavior if a user does that is not defined. Is it defined right now and we are changing that? I just proved that you defined it with an assumption in mind, which is not documented to be the case. So hence this behavior was not previously defined.

To understand where I am coming from, try to implement this using a non thread safe mechanism for write (not the way you have this implementation in mind), that also does not accept any more writes after the function ends? Is that a valid implementation that I have? It is, then my implementation will crash if users are doing what you want to do, so this is a definition of undefined behavior, if based on the current rules I can have 2 implementations one that crashes one that does not, users should be aware of that.

Oberon00 · 2022-08-29T07:09:32Z

I just proved that you defined it with an assumption in mind, which is not documented to be the case.

I just quoted the spec test that confirms my assumption:

It MUST be possible for functions being called with this [a read/write span] to somehow obtain the same Span instance and type that the span creation API returned (or will return) to the user (for example, the Span could be one of the parameters passed to such a function, or a getter could be provided).

-- https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#additional-span-interfaces

I think it's technically a breaking change, but I don't actually feel this issue is very important either way, so I'm dismissing my request for changes.

Oberon00 · 2022-08-31T10:33:43Z

One follow-up comment: We should clarify (generally, maybe in the glossary) what we mean by "undefined behavior". My default assumption, coming from C++, is basically: "it's absolutely forbidden to go into this, even if it works now, it might cause security breaches, exceptions, or any other bad thing in the future".

Oberon00 · 2022-08-31T10:36:14Z

specification/trace/sdk.md

@@ -491,6 +491,8 @@ exceptions.
  SHOULD be reflected in it.
  For example, this is useful for creating a SpanProcessor that periodically
  evaluates/prints information about all active span from a background thread.
+  When a reference is kept, the span MUST be treated as a readonly object,
+  trying to modify the Span after `OnStart` method returns is undefined behavior.


Please clarify if this also applies if one "somehow obtained the same Span instance and type that the span creation API returned", which is always possible according to the definition of read/write span.
I believe it cannot apply there (it could at most be unspecified if the span is already ended).

Ping @bogdandrutu

github-actions · 2022-09-08T03:57:38Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

jsuereth · 2022-09-08T15:15:08Z

@bogdandrutu Can you address @Oberon00's follow on comment?

github-actions · 2022-09-16T04:05:10Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2022-09-30T04:09:10Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

bogdandrutu requested review from a team August 25, 2022 14:24

yurishkuro reviewed Aug 25, 2022

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

bogdandrutu force-pushed the clarifyub branch from 5f31806 to 0f6d165 Compare August 25, 2022 14:32

tigrannajaryan reviewed Aug 25, 2022

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

yurishkuro reviewed Aug 25, 2022

View reviewed changes

specification/trace/sdk.md Outdated Show resolved Hide resolved

yurishkuro approved these changes Aug 25, 2022

View reviewed changes

Update specification/trace/sdk.md

8c14e4f

Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>

yurishkuro approved these changes Aug 25, 2022

View reviewed changes

github-actions bot assigned jsuereth Aug 25, 2022

arminru added area:sdk Related to the SDK spec:trace Related to the specification/trace directory labels Aug 25, 2022

arminru approved these changes Aug 25, 2022

View reviewed changes

Oberon00 previously requested changes Aug 26, 2022

View reviewed changes

jsuereth approved these changes Aug 30, 2022

View reviewed changes

Oberon00 reviewed Aug 31, 2022

View reviewed changes

carlosalberto approved these changes Aug 31, 2022

View reviewed changes

github-actions bot added the Stale label Sep 8, 2022

jsuereth removed the Stale label Sep 8, 2022

github-actions bot added the Stale label Sep 16, 2022

github-actions bot closed this Sep 30, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify that changing the Span async from OnStart is undefined behavior #2751

Clarify that changing the Span async from OnStart is undefined behavior #2751

bogdandrutu commented Aug 25, 2022

Oberon00 left a comment •

edited

Loading

bogdandrutu commented Aug 26, 2022

Oberon00 commented Aug 26, 2022 •

edited

Loading

bogdandrutu commented Aug 26, 2022 •

edited

Loading

Oberon00 commented Aug 29, 2022 •

edited

Loading

Oberon00 commented Aug 31, 2022 •

edited

Loading

Oberon00 Aug 31, 2022

carlosalberto Sep 22, 2022

github-actions bot commented Sep 8, 2022

jsuereth commented Sep 8, 2022

github-actions bot commented Sep 16, 2022

github-actions bot commented Sep 30, 2022

Clarify that changing the Span async from OnStart is undefined behavior #2751

Clarify that changing the Span async from OnStart is undefined behavior #2751

Conversation

bogdandrutu commented Aug 25, 2022

Oberon00 left a comment • edited Loading

Choose a reason for hiding this comment

bogdandrutu commented Aug 26, 2022

Oberon00 commented Aug 26, 2022 • edited Loading

bogdandrutu commented Aug 26, 2022 • edited Loading

Oberon00 commented Aug 29, 2022 • edited Loading

Oberon00 commented Aug 31, 2022 • edited Loading

Oberon00 Aug 31, 2022

Choose a reason for hiding this comment

carlosalberto Sep 22, 2022

Choose a reason for hiding this comment

github-actions bot commented Sep 8, 2022

jsuereth commented Sep 8, 2022

github-actions bot commented Sep 16, 2022

github-actions bot commented Sep 30, 2022

Oberon00 left a comment •

edited

Loading

Oberon00 commented Aug 26, 2022 •

edited

Loading

bogdandrutu commented Aug 26, 2022 •

edited

Loading

Oberon00 commented Aug 29, 2022 •

edited

Loading

Oberon00 commented Aug 31, 2022 •

edited

Loading