Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define span attributes for messaging systems #418

Merged

Conversation

arminru
Copy link
Member

@arminru arminru commented Jan 23, 2020

We already prepared a proposal regarding semantic conventions for messaging systems but @kbrockhoff was faster with his PR #395. I sent him our proposal and he's fine with it, so I integrated his suggestions into ours and we're moving forward with this PR now.

This proposal includes the component field like the other semantic conventions currently do. There are discussions (#271 and #336 ) for removing that field but there hasn't been any agreement yet.

Regarding an attribute describing the service account for connecting to the messaging system (if desired): Should we add a "user" attribute per type (db.user, messaging.user, rpc.user and the like) or should we define a general attribute for that? (brought up by @bogdandrutu on #395 (comment))

@arminru
Copy link
Member Author

arminru commented Jan 30, 2020

@open-telemetry/specs-approvers PTAL 🙂

@arminru
Copy link
Member Author

arminru commented Feb 6, 2020

@open-telemetry/specs-approvers PTAL 🙂

@carlosalberto
Copy link
Contributor

@open-telemetry/specs-approvers Please review this PR. It's around semantic conventions, and it augments what we have (in this case, adding a section for messaging Spans specifically), so it's not breaking anything.

Would love to have this merged (or reviewed/tuned) soon ;)

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, It would be great to have some specific examples. Also requested some clarifications

specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
specification/data-messaging.md Outdated Show resolved Hide resolved
@SergeyKanzhelev
Copy link
Member

The last question I have left is the library.name vs. messaging.system

| `messaging.temp_destination` | A boolean that is `true` if the message destination is temporary. | If temporary (assumed to be `false` if missing). |
| `messaging.protocol` | The transport protocol such as `AMQP` or `MQTT`. | No |
| `messaging.url` | Connection substring such as `tibjmsnaming://localhost:7222` or `https://queue.amazonaws.com/80398EXAMPLE/MyQueue`. | No |
| `messaging.message_id` | A value used by the messaging system as an identifier for the message, represented as a string. | No |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if messages are batched? I assume message_id is not populated, is it correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about producing, receiving or processing of the messages?
If they are produced one at a time, received in a batch and then processed one at a time, each producing and each processing span should have the respective message ID. The receiver span won't have any. Please see the batch example at the end of the file.

| `messaging.operation` | A string identifying which part and kind of message consumption this span describes: either `receive` or `process`. (If the operation is `send`, this attribute must not be set: the operation can be inferred from the span kind in that case.) | No |

The _receive_ span is be used to track the time used for receiving the message(s), whereas the _process_ span(s) track the time for processing the message(s).
Note that one or multiple Spans with `messaging.operation` = `process` may often be the children of a Span with `messaging.operation` = `receive`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you provide reasons/preconditions behind making receive span a parent of process? What about receive duration then, it seems to end before children (which is likely ok)?

Also, in the context of sampling, how receive is sampled considering it knows linked context after call completes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to track the time spent receiving the message(s) separately from the time spent for processing them. The receiver span might or might not end before the processor spans are started - both would be correct.

I cannot quite follow the last question. If receiving is tracked separately, the context will usually not have been propagated at the time the receiver span is started. Also, in batch receive scenarios, there won't be one single context to act as a parent for the receive span. Therefore, the receive span will usually be a root span. The child spans tracking message processing will link to the message producer spans as shown in the batch example at the bottom of the file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's likely that consumer writes code like

messages = queue.receive(5);

foreach (message in messages)
{
   processSpan = trace.StartActiveSpan("my-queue");
   // do stuff
   span.end()

}  

Logically by the time processing starts, receive is completed and it's span context is not current/may be lost. Do I have to pass it to process span?

I'm on the library instrumentation side (Azure EventHubs and other messaging offerings in Azure) i.e. if I instrument receive, I would have no reasonable means to pass it's context to users so they can not possibly start 'process' span as a child of 'receive'.

I want to understand reasons behind the choice of making process child of receive and perhaps removing mentions of it if this is optional, there is no strong need for it.

Regarding sampling:
OTel samplers check links and if there are links that are sampled in, sampler will also make 'sampled in' decision for span.
But links are not known, as you mentioned, before the 'receive' span starts. This basically means low chance for receive to be consistently sampled with the messages in the batch which makes it useless.
I decided against instrumenting receive at all for the near future, and this is one of the reasons for it.

I've heard some instrumentations solve it through delaying span creation until messages are received to make sampling consistent. Was it considered?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically by the time processing starts, receive is completed and it's span context is not current/may be lost. Do I have to pass it to process span?

This depends on the OTel implementation. In Java, for example, one could create a scope around the foreach in which the receive span is active. Ending the receive span does not "deactivate" it, so it would still be picked up as parent by the process spans. Passing the receive span or its spancontext explicitly would also be an option, yes.

I'm on the library instrumentation side (Azure EventHubs and other messaging offerings in Azure) i.e. if I instrument receive, I would have no reasonable means to pass it's context to users so they can not possibly start 'process' span as a child of 'receive'.

If you're on the library instrumentation side and only instrument the receive call which returns a number of messages without having any further hooks to instrument, then you will not be able to produce the process spans, yes. This is a limitation by the instrumentation approach and it should be fine to just have receive spans without process spans in this case since this is all you can trace this way. How else would you trace the time spent processing messages given these limitations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding sampling
I'd expect the sampler used to sample the receive calls if the information is deemed interesting/valuable based on the span attributes being set. In this case the sampler can deduce from the required span attributes that this is a message-receive span and decide accordingly.

If the receive call is sampled, the process spans would be children of the receive span and bear links to the send/create spans from the spancontext being propagated in the message metadata, if available.
If the receive call is not sampled, the process spans will either be the child of some other span and add the aforementioned links or - if there is no implicit parent - use the spancontext extracted from the message metadata as a parent.

I've heard some instrumentations solve it through delaying span creation until messages are received to make sampling consistent. Was it considered?

The process spans are supposed to be created after the message was received, so this is already solved by having two different spans for receiving and processing.

@arminru
Copy link
Member Author

arminru commented Apr 7, 2020

@open-telemetry/specs-approvers who haven't had the chance to review this PR yet, please consider taking a look. Thank you!

@carlosalberto carlosalberto merged commit 6b29667 into open-telemetry:master Apr 7, 2020
@arminru arminru deleted the semconv-messaging branch April 7, 2020 15:09
@arminru arminru added this to the v0.4 milestone May 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants