-
Notifications
You must be signed in to change notification settings - Fork 887
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarify that changing the Span async from OnStart is undefined behavior #2751
Conversation
The reason to make this undefined behavior is because it is not guarantee that the span is still active after the call to OnStart finishes, so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior. Signed-off-by: Bogdan Drutu <bogdandrutu@gmail.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should weaken that requirement.
so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior.
What would that synchronization logic be? The SDK needs to do some synchronization logic for spans anyway because they are thread safe, and it is valid to use a span after it ended. We specified that the write operations will be no-op though.
I originally wanted to object that processors could use OnEnd to detect when the span ends, but once OnEnd is invoked the span is already ended, so that would be too late. Of course one can also check if the span has ended (there is an explicit requirement that this is queryable from a readable span), but that is transient information as well.
Still, for some use cases a "works most of the time" will be enough, e.g. to regularly add a stacktrace in an event on the span.
I think we don't need to add any further requirements to the spec, this is already expressed through the text on End:
Implementations SHOULD ignore all subsequent calls to End and any other Span methods, i.e. the Span becomes non-recording by being ended (there might be exceptions when Tracer is streaming events and has no mutable state associated with the Span).
I think it does makes sense to add a note that after OnStart returns the span can be ended at any time, and this will happen before OnEnd is invoked.
There is one contradictory statement in the spec under https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#span-operations
With the exception of the function to retrieve the Span's SpanContext and recording status, none of the below may be called after the Span is finished.
The "may" is lowercased, and it seems inconsistent with the spec part for Span.End quoted above. We should change one of these locations, probably based on what SDKs allow today. If we want to keep the ban on calling span methods "after the Span is finished" it would be nice to clarify what this means in combination with thread safety, i.e. I think this would make End thread un-safe.
You are making a not always true assumption that the same "instance" is returned in the OnStart as well as to the instrumentation user via StartSpan, in terms of synchronization etc.
False; The operation happens during
I don't see these use-cases, and again you are requiring to return a thread safe implementation if you do this. Also any implementation that "works most of the time" is a bad implementation, so we should not encourage that.
Sure, we should fix the inconsistency in the specs. Undefined Behavior is for the instance passed as argument to |
I am indeed making this assumption, and I think all SDKs will have to implement it that way or at least something very close to it, because we have the following in the spec (note especially the second paragraph) at https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#additional-span-interfaces:
Only the read-access is specific to the SDK, and that we need to have thread safe together with concurrent write access anyways because of the valid read-only use cases that are allowed even after this PR.
I think the example I mentioned, attaching an event with the stacktrace regularly to every span, is a valid one (e.g. every 2 minutes to detect long-running spans).
Sure, we can even explicitly discourage it, but IMHO we should not make currently allowed valid use cases impossible (technically this would even be a breaking change to a stable API) |
@Oberon00 how come is a breaking change? A UB does not mean code breaking, means undefined behavior, so the behavior if a user does that is not defined. Is it defined right now and we are changing that? I just proved that you defined it with an assumption in mind, which is not documented to be the case. So hence this behavior was not previously defined. To understand where I am coming from, try to implement this using a non thread safe mechanism for write (not the way you have this implementation in mind), that also does not accept any more writes after the function ends? Is that a valid implementation that I have? It is, then my implementation will crash if users are doing what you want to do, so this is a definition of undefined behavior, if based on the current rules I can have 2 implementations one that crashes one that does not, users should be aware of that. |
I just quoted the spec test that confirms my assumption:
|
I think it's technically a breaking change, but I don't actually feel this issue is very important either way, so I'm dismissing my request for changes.
One follow-up comment: We should clarify (generally, maybe in the glossary) what we mean by "undefined behavior". My default assumption, coming from C++, is basically: "it's absolutely forbidden to go into this, even if it works now, it might cause security breaches, exceptions, or any other bad thing in the future". |
@@ -491,6 +491,8 @@ exceptions. | |||
SHOULD be reflected in it. | |||
For example, this is useful for creating a SpanProcessor that periodically | |||
evaluates/prints information about all active span from a background thread. | |||
When a reference is kept, the span MUST be treated as a readonly object, | |||
trying to modify the Span after `OnStart` method returns is undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please clarify if this also applies if one "somehow obtained the same Span instance and type that the span creation API returned", which is always possible according to the definition of read/write span.
I believe it cannot apply there (it could at most be unspecified if the span is already ended).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping @bogdandrutu
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
@bogdandrutu Can you address @Oberon00's follow on comment? |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
The reason to make this undefined behavior is because it is not guarantee that the span is still active after the call to OnStart finishes,
so instead of doing a very complicated synchronization logic which none of the SDK does, mark this as undefined behavior.
Signed-off-by: Bogdan Drutu bogdandrutu@gmail.com