[exporter/datadog] Update default logic for service entry spans identification #32005

Open
liustanley opened this issue Mar 27, 2024 · 8 comments
Labels: enhancement, exporter/datadog, never stale

Comments

liustanley commented Mar 27, 2024

What will change?

Datadog has a notion of service entry spans, which let users filter on a subset of traces and indicate that trace metrics should be generated. We are introducing the following mapping so that the Datadog Exporter identifies service entry spans from OTLP spans consistently.

| OpenTelemetry Convention | Datadog Convention |
| --- | --- |
| Root span | Service entry span |
| Server span (span.kind: server) | Service entry span |
| Consumer span (span.kind: consumer) | Service entry span |
| Client span (span.kind: client) | Generate trace metrics |
| Producer span (span.kind: producer) | Generate trace metrics |
| Internal span (span.kind: internal) | No trace metrics generated |
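
To make the mapping concrete, here is a minimal Python sketch of the decision logic. It is not the exporter's actual implementation: the `Span` class, its field names, the two helper functions, and the sample span names are all hypothetical, and it assumes that a span with no parent is the root span and that every service entry span also generates trace metrics.

```python
from dataclasses import dataclass
from typing import Optional

ENTRY_KINDS = {"server", "consumer"}     # identified as service entry spans
MEASURED_KINDS = {"client", "producer"}  # trace metrics only, not entry spans


@dataclass
class Span:
    """Hypothetical stand-in for an OTLP span; not a collector type."""
    name: str
    kind: str                        # lower-case span.kind value
    parent_id: Optional[str] = None  # None marks the root span of a trace


def is_service_entry(span: Span) -> bool:
    # Root spans are service entry spans regardless of their span kind.
    return span.parent_id is None or span.kind in ENTRY_KINDS


def generates_trace_metrics(span: Span) -> bool:
    # Service entry spans plus outgoing (client/producer) spans get metrics.
    return is_service_entry(span) or span.kind in MEASURED_KINDS


spans = [
    Span("job.run", "internal"),                        # root span
    Span("GET /users", "server", parent_id="a"),
    Span("resolve.field", "internal", parent_id="b"),   # non-root internal
    Span("db.query", "client", parent_id="c"),
]
for s in spans:
    print(s.name, is_service_entry(s), generates_trace_metrics(s))
```

Under these assumptions, the only span that produces nothing is the non-root internal span.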

Who is affected?

Most users will see an increase in trace metrics with this change, which may affect existing monitors that are based on trace metrics. Users who only have internal spans will see a decrease in trace metrics.

If you are using compute_stats_by_span_kind, you likely won't be affected because we are also computing stats by span kind in this change.

What should I do if I am affected?

If you have existing monitors based on trace metrics, you can update them after upgrading since this change will introduce more consistency in trace metrics. If you only have internal spans, please update your instrumentation according to the above table to receive trace metrics and service entry spans. The transform processor can also be used to update span kind if needed.
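
As a rough illustration of what "updating span kind" means, the Python sketch below re-types selected internal spans as server spans. It only models the idea: the dict-based span shape, the `retype_internal_spans` helper, and the span names are hypothetical, and it does not show transform processor syntax.

```python
def retype_internal_spans(spans, entry_names, new_kind="server"):
    """Return copies of spans, re-typing internal spans whose name is listed."""
    retyped = []
    for span in spans:
        if span["kind"] == "internal" and span["name"] in entry_names:
            # Copy instead of mutating so the input list is left untouched.
            span = {**span, "kind": new_kind}
        retyped.append(span)
    return retyped


spans = [
    {"name": "checkout.handle", "kind": "internal"},
    {"name": "checkout.validate", "kind": "internal"},
]
updated = retype_internal_spans(spans, entry_names={"checkout.handle"})
print([s["kind"] for s in updated])
```

Only spans that represent the logical edge of a service should be re-typed; helper spans can stay internal.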

When will it change?

This change will first be released as an opt-in feature in order to avoid breaking changes for customers. The feature has already been merged in the datadog-agent repository and is pending release (PR). Once version 7.53.0 of the agent is released (tentatively mid-April), this change will be implemented in collector-contrib. Once the feature has been deemed stable, we will enable it by default.

liustanley added the needs triage label Mar 27, 2024

dudo commented Mar 28, 2024

> Internal span (span.kind: internal) No trace metrics generated

Is there a mechanism in place to create trace metrics for these, ad hoc?

liustanley (Contributor, Author) commented

> > Internal span (span.kind: internal) No trace metrics generated
>
> Is there a mechanism in place to create trace metrics for these, ad hoc?

Not at the moment, but this is something we can consider if there are valid use cases for it. We only intend to compute trace metrics for incoming and outgoing spans, which is why we recommend updating your instrumentation according to the table above to receive trace metrics and service entry spans.

dudo commented Mar 28, 2024

With the datadog agent (I think that's where it happens, anyway) you can pass _dd.measured as a tag on the span to trigger trace metrics. Could we do something similar with OTel, or do you think it'd be abused?

I guess we could re-type the spans, too. A concrete case for this is with something like graphql.internal. It's not a server span, per se, since the webserver has to receive the request first, but it's still the logical edge of the system. Is that reasonable?

edit: Another case that's more ubiquitous is something like cron.internal. This is the edge of the system, technically, and something that really should be the root span of a trace.

liustanley (Contributor, Author) commented

> With the datadog agent (I think that's where it happens, anyway) you can pass _dd.measured as a tag on the span to trigger trace metrics. Could we do something similar with OTel, or do you think it'd be abused?
>
> I guess we could re-type the spans, too. A concrete case for this is with something like graphql.internal. It's not a server span, per se, since the webserver has to receive the request first, but it's still the logical edge of the system. Is that reasonable?
>
> edit: Another case that's more ubiquitous is something like cron.internal. This is the edge of the system, technically, and something that really should be the root span of a trace.

Are graphql.internal and cron.internal typically root spans of their traces? If so, they will still get identified as service entry spans. We would prefer re-typing spans over using _dd.measured as an override, but this is something we can consider as well.

Also as an FYI, we changed this feature to be opt-in to avoid breaking changes for customers, and we will enable it by default in the future.

dudo commented Apr 12, 2024

In those 2 examples, graphql would be behind a webserver, so not typically the root span. Cron would be, so good call on that.

I guess for spans we really want to keep, we just re-type. That's a straightforward enough solution. I know this wasn't directly related to your post, but thanks for answering my questions!

crobert-1 added the exporter/datadog label Apr 15, 2024

Pinging code owners for exporter/datadog: @mx-psi @dineshg13 @liustanley @songy23 @mackjmr. See Adding Labels via Comments if you do not have permissions to add labels yourself.

crobert-1 (Member) commented

Removing needs triage as this was filed by a code owner.

songy23 added the enhancement and never stale labels and removed the needs triage label Apr 15, 2024
mx-psi pushed a commit that referenced this issue May 6, 2024
…ind (#32836)

**Description:** Add config option for computing top level spans by span kind in the Datadog connector and exporter.

**Link to tracking Issue:** #32005

liustanley (Contributor, Author) commented

This change has been released as an opt-in feature as a part of opentelemetry-collector-contrib v0.100.0 (PR). Please refer to the following documentation for usage instructions: https://docs.datadoghq.com/opentelemetry/schema_semantics/service_entry_spans/?tab=otelcollectoranddatadogexporter.
