Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kafka Client: traceparent not detected / not correctly propagated #4854

Closed
leewoobin789 opened this issue Dec 9, 2021 · 8 comments
Closed
Labels
bug Something isn't working

Comments

@leewoobin789
Copy link

Describe the bug
For the sake of demo, i have created 3 services that are communication to each other over kafka.
A NodeJs service ("orderService") sends a kafka message ("order"), which is consumed by both Vertx-Java-Application ("reportingService") and Sprint-Application ("paymentService")

What i found out in this case is that the instrumentation for kafka client with vertx was perfectly done with the same traceId but in the other consumer of spring application, i se basically another traceId beding generated. it indicates that the traceparent header of kafka message is not being detected in spring-application, where i used KafkaListener.

orderService =order topic=> reportingService(perfect) & reportingService(defect)

Steps to reproduce

- orderService [NodeJs]

InstrumentationLibrary opentelemetry-instrumentation-kafkajs 0.26.1
Span #0
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 94e75fb3091d0010
ID : 4ac7496ef8c3a2c7
Name : order
Kind : SPAN_KIND_PRODUCER
Start time : 2021-12-09 13:35:56.95997824 +0000 UTC
End time : 2021-12-09 13:35:56.963647232 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> messaging.system: STRING(kafka)
-> messaging.destination: STRING(order)
-> messaging.destination_kind: STRING(topic)

- reportingService [Vertx Java]

SpanLink #0
-> Trace ID: 1ee56906e5b2bde4368e859b18050996
-> ID: 4ac7496ef8c3a2c7
-> TraceState:
-> DroppedAttributesCount: 0
InstrumentationLibrarySpans #1
InstrumentationLibrary io.vertx
Span #0
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 7440fb0a7651441f
ID : 7e16eb5654698647
Name : send
Kind : SPAN_KIND_CLIENT
Start time : 2021-12-09 13:35:56.965394055 +0000 UTC
End time : 2021-12-09 13:35:56.965542188 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(vert.x-eventloop-thread-0)
-> thread.id: INT(22)
-> message_bus.destination: STRING(reporting.vertx.first.worloads.addr)
Span #1
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 7440fb0a7651441f
ID : 5445462bbc1f12e4
Name : send
Kind : SPAN_KIND_CLIENT
Start time : 2021-12-09 13:35:56.965582978 +0000 UTC
End time : 2021-12-09 13:35:56.965667008 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(vert.x-eventloop-thread-0)
-> thread.id: INT(22)
-> message_bus.destination: STRING(reporting.vertx.second.worloads.addr)
Span #2
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 4ac7496ef8c3a2c7
ID : 7440fb0a7651441f
Name : kafka_receive
Kind : SPAN_KIND_CONSUMER
Start time : 2021-12-09 13:35:56.964979 +0000 UTC
End time : 2021-12-09 13:35:56.965681383 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(vert.x-eventloop-thread-0)
-> peer.port: STRING(9092)
-> peer.address: STRING(demo-kafka-cp-kafka:9092)
-> peer.hostname: STRING(demo-kafka-cp-kafka)
-> peer.service: STRING(kafka)
-> thread.id: INT(22)
-> message_bus.destination: STRING(order)
Span #3
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 5445462bbc1f12e4
ID : 473de02c726afe63
Name : send
Kind : SPAN_KIND_SERVER
Start time : 2021-12-09 13:35:56.96578 +0000 UTC
End time : 2021-12-09 13:35:57.967911815 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(vert.x-eventloop-thread-1)
-> thread.id: INT(24)
-> message_bus.destination: STRING(reporting.vertx.second.worloads.addr)
Span #4
Trace ID : 1ee56906e5b2bde4368e859b18050996
Parent ID : 7e16eb5654698647
ID : 76a5d851cc9ded97
Name : send
Kind : SPAN_KIND_SERVER
Start time : 2021-12-09 13:35:56.965809 +0000 UTC
End time : 2021-12-09 13:35:57.968290501 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(vert.x-eventloop-thread-0)
-> thread.id: INT(22)
-> message_bus.destination: STRING(reporting.vertx.first.worloads.addr)

- paymentService[Spring Java]

InstrumentationLibrary io.opentelemetry.kafka-clients-0.11 1.7.2
Span #0
Trace ID : 352e1e2dbda022838fcd7df0e57366bd
Parent ID :
ID : 3dab64a6cc1fde2d
Name : order receive
Kind : SPAN_KIND_CONSUMER
Start time : 2021-12-09 13:35:54.319613 +0000 UTC
End time : 2021-12-09 13:35:56.964710077 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(payment-service-0-C-1)
-> messaging.operation: STRING(receive)
-> messaging.destination: STRING(order)
-> thread.id: INT(20)
-> messaging.system: STRING(kafka)
-> messaging.destination_kind: STRING(topic)
Span #1
Trace ID : 352e1e2dbda022838fcd7df0e57366bd
Parent ID : 3dab64a6cc1fde2d
ID : ceda32efd081b027
Name : order process
Kind : SPAN_KIND_CONSUMER
Start time : 2021-12-09 13:35:56.964873 +0000 UTC
End time : 2021-12-09 13:35:57.967409404 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> thread.name: STRING(payment-service-0-C-1)
-> messaging.operation: STRING(process)
-> messaging.destination: STRING(order)
-> messaging.message_payload_size_bytes: INT(58)
-> thread.id: INT(20)
-> messaging.system: STRING(kafka)
-> messaging.destination_kind: STRING(topic)
-> messaging.kafka.partition: INT(0)

What did you expect to see?
Same TraceId, sothat the end-to-end correlation is visible.

What did you see instead?
Different TraceId in Spring application. it was also same case in kafka connect.

What version are you using?
v1.7.2

Environment
Compiler: OpenJDK11
OS: Ubuntu 20.04
Runtime (if different from JDK above): openjdk:11-jdk
OS (if different from OS compiled on): Kubernetes (Kind)

Additional context
Nothing more

@leewoobin789 leewoobin789 added the bug Something isn't working label Dec 9, 2021
@leewoobin789 leewoobin789 changed the title Kafka Client: traceparent not detected / propagated Kafka Client: traceparent not detected / not correctly propagated Dec 9, 2021
@leewoobin789
Copy link
Author

leewoobin789 commented Dec 9, 2021

i guess this is a bug with io.opentelemetry.kafka-clients-0.11

@mateuszrzeszutek
Copy link
Member

Hey @leewoobin789 ,

  • Can you observe any span links being added on the consumer side? By default, the agent will follow the OTel messaging semantic conventions and add links between the PRODUCER and CONSUMER process spans. Yo turn off that behavior you can set the otel.instrumentation.common.experimental.suppress-messaging-receive-spans system property to true.
  • Are there any traceparent headers in the kafka message received by the paymentService?

@leewoobin789
Copy link
Author

Hey @mateuszrzeszutek

Suppressing a specific type of spans is not unfortunately desired at all in our case.
Here are the span flow of the kafka consumers.

Resource labels:
     -> container.id: STRING(b5dce0eb1a3db5003e38c82c324fb34f0fc230769fa0a18895689c4eba80cc6d)
     -> host.arch: STRING(amd64)
     -> host.name: STRING(payment-service-7db8c68887-n26nl)
     -> k8s.container.name: STRING(payment-api)
     -> k8s.deployment.name: STRING(payment-service)
     -> k8s.namespace.name: STRING(demo)
     -> k8s.node.name: STRING(otel-sandbox-worker)
     -> k8s.pod.name: STRING(payment-service-7db8c68887-n26nl)
     -> k8s.replicaset.name: STRING(payment-service-7db8c68887)
     -> os.description: STRING(Linux 5.11.0-41-generic)
     -> os.type: STRING(linux)
     -> process.command_line: STRING(/usr/local/openjdk-11:bin:java -javaagent:/otel-auto-instrumentation/javaagent.jar)
     -> process.executable.path: STRING(/usr/local/openjdk-11:bin:java)
     -> process.pid: INT(10)
     -> process.runtime.description: STRING(Oracle Corporation OpenJDK 64-Bit Server VM 11.0.12+7)
     -> process.runtime.name: STRING(OpenJDK Runtime Environment)
     -> process.runtime.version: STRING(11.0.12+7)
     -> service.name: STRING(payment-service)
     -> telemetry.auto.version: STRING(1.7.2)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.1)
InstrumentationLibrarySpans #0
InstrumentationLibrary io.opentelemetry.kafka-clients-0.11 1.7.2
Span #0
    Trace ID       : 67f29c39a3bad1b1ec04dcff2a60e286
    Parent ID      : 
    ID             : 79eec569dd2ee2b3
    Name           : order receive
    Kind           : SPAN_KIND_CONSUMER
    Start time     : 2021-12-09 15:14:05.926356 +0000 UTC
    End time       : 2021-12-09 15:14:08.506293665 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(payment-service-0-C-1)
     -> messaging.operation: STRING(receive)
     -> messaging.destination: STRING(order)
     -> messaging.system: STRING(kafka)
     -> thread.id: INT(20)
     -> messaging.destination_kind: STRING(topic)
Span #1
    Trace ID       : 67f29c39a3bad1b1ec04dcff2a60e286
    Parent ID      : 79eec569dd2ee2b3
    ID             : 11b480960379bcbb
    Name           : order process
    Kind           : SPAN_KIND_CONSUMER
    Start time     : 2021-12-09 15:14:08.506852 +0000 UTC
    End time       : 2021-12-09 15:14:09.515723945 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(payment-service-0-C-1)
     -> messaging.operation: STRING(process)
     -> messaging.destination: STRING(order)
     -> messaging.kafka.partition: INT(0)
     -> messaging.system: STRING(kafka)
     -> thread.id: INT(20)
     -> messaging.message_payload_size_bytes: INT(58)
     -> messaging.destination_kind: STRING(topic)
Links:
SpanLink #0
     -> Trace ID: bcf9938cf318c73089abc78c394a9a87
     -> ID: ba0535bd604710e0
     -> TraceState: 
     -> DroppedAttributesCount: 0
Span #2
    Trace ID       : 67f29c39a3bad1b1ec04dcff2a60e286
    Parent ID      : 11b480960379bcbb
    ID             : 4f1facdd2af38afa
    Name           : second-topic send
    Kind           : SPAN_KIND_PRODUCER
    Start time     : 2021-12-09 15:14:09.510821173 +0000 UTC
    End time       : 2021-12-09 15:14:09.529769894 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(payment-service-0-C-1)
     -> messaging.destination: STRING(second-topic)
     -> messaging.system: STRING(kafka)
     -> thread.id: INT(20)
     -> messaging.destination_kind: STRING(topic)

2021-12-09T15:14:12.525Z        INFO    loggingexporter/logging_exporter.go:40  TracesExporter  {"#spans": 8}
2021-12-09T15:14:12.526Z        DEBUG   loggingexporter/logging_exporter.go:49  ResourceSpans #0
Resource labels:
     -> container.id: STRING(cc7e66f9e6aea32bafe61106635fb2c6785c95bd6c1fd86a4170f8fdca03de37)
     -> host.arch: STRING(amd64)
     -> host.name: STRING(reporting-service-666c68c845-v6jsp)
     -> k8s.container.name: STRING(reporting-api)
     -> k8s.deployment.name: STRING(reporting-service)
     -> k8s.namespace.name: STRING(demo)
     -> k8s.node.name: STRING(otel-sandbox-worker)
     -> k8s.pod.name: STRING(reporting-service-666c68c845-v6jsp)
     -> k8s.replicaset.name: STRING(reporting-service-666c68c845)
     -> os.description: STRING(Linux 5.11.0-41-generic)
     -> os.type: STRING(linux)
     -> process.command_line: STRING(/usr/local/openjdk-11:bin:java -javaagent:/otel-auto-instrumentation/javaagent.jar)
     -> process.executable.path: STRING(/usr/local/openjdk-11:bin:java)
     -> process.pid: INT(7)
     -> process.runtime.description: STRING(Oracle Corporation OpenJDK 64-Bit Server VM 11.0.12+7)
     -> process.runtime.name: STRING(OpenJDK Runtime Environment)
     -> process.runtime.version: STRING(11.0.12+7)
     -> service.name: STRING(reportingService)
     -> telemetry.auto.version: STRING(1.7.2)
     -> telemetry.sdk.language: STRING(java)
     -> telemetry.sdk.name: STRING(opentelemetry)
     -> telemetry.sdk.version: STRING(1.7.1)
InstrumentationLibrarySpans #0
InstrumentationLibrary io.opentelemetry.kafka-clients-0.11 1.7.2
Span #0
    Trace ID       : d506fa3230c7df84c36c5effda29d91b
    Parent ID      : 
    ID             : 19562ddb55105812
    Name           : order receive
    Kind           : SPAN_KIND_CONSUMER
    Start time     : 2021-12-09 15:14:07.823709 +0000 UTC
    End time       : 2021-12-09 15:14:08.503896589 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-kafka-consumer-thread-0)
     -> messaging.operation: STRING(receive)
     -> messaging.destination: STRING(order)
     -> messaging.system: STRING(kafka)
     -> thread.id: INT(26)
     -> messaging.destination_kind: STRING(topic)
Span #1
    Trace ID       : d506fa3230c7df84c36c5effda29d91b
    Parent ID      : 19562ddb55105812
    ID             : b02ed37c49a777a4
    Name           : order process
    Kind           : SPAN_KIND_CONSUMER
    Start time     : 2021-12-09 15:14:08.504728 +0000 UTC
    End time       : 2021-12-09 15:14:08.507793606 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> messaging.operation: STRING(process)
     -> messaging.destination: STRING(order)
     -> messaging.kafka.partition: INT(0)
     -> messaging.system: STRING(kafka)
     -> thread.id: INT(22)
     -> messaging.message_payload_size_bytes: INT(58)
     -> messaging.destination_kind: STRING(topic)
Links:
SpanLink #0
     -> Trace ID: bcf9938cf318c73089abc78c394a9a87
     -> ID: ba0535bd604710e0
     -> TraceState: 
     -> DroppedAttributesCount: 0
InstrumentationLibrarySpans #1
InstrumentationLibrary io.vertx 
Span #0
    Trace ID       : bcf9938cf318c73089abc78c394a9a87
    Parent ID      : b6e14451de2c2759
    ID             : 51cb531de62cf3ef
    Name           : send
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2021-12-09 15:14:08.506975852 +0000 UTC
    End time       : 2021-12-09 15:14:08.507527172 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> thread.id: INT(22)
     -> message_bus.destination: STRING(reporting.vertx.first.worloads.addr)
Span #1
    Trace ID       : bcf9938cf318c73089abc78c394a9a87
    Parent ID      : b6e14451de2c2759
    ID             : bb7141f07cff998b
    Name           : send
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2021-12-09 15:14:08.507599281 +0000 UTC
    End time       : 2021-12-09 15:14:08.507738595 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> thread.id: INT(22)
     -> message_bus.destination: STRING(reporting.vertx.second.worloads.addr)
Span #2
    Trace ID       : bcf9938cf318c73089abc78c394a9a87
    Parent ID      : ba0535bd604710e0
    ID             : b6e14451de2c2759
    Name           : kafka_receive
    Kind           : SPAN_KIND_CONSUMER
    Start time     : 2021-12-09 15:14:08.505408 +0000 UTC
    End time       : 2021-12-09 15:14:08.507764238 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> peer.port: STRING(9092)
     -> peer.address: STRING(demo-kafka-cp-kafka:9092)
     -> peer.hostname: STRING(demo-kafka-cp-kafka)
     -> peer.service: STRING(kafka)
     -> thread.id: INT(22)
     -> message_bus.destination: STRING(order)
Span #3
    Trace ID       : bcf9938cf318c73089abc78c394a9a87
    Parent ID      : bb7141f07cff998b
    ID             : 4cedbd6fd73b7b15
    Name           : send
    Kind           : SPAN_KIND_SERVER
    Start time     : 2021-12-09 15:14:08.507901 +0000 UTC
    End time       : 2021-12-09 15:14:09.510519893 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-1)
     -> thread.id: INT(24)
     -> message_bus.destination: STRING(reporting.vertx.second.worloads.addr)
Span #4
    Trace ID       : bcf9938cf318c73089abc78c394a9a87
    Parent ID      : 51cb531de62cf3ef
    ID             : afc77183862e5bee
    Name           : send
    Kind           : SPAN_KIND_SERVER
    Start time     : 2021-12-09 15:14:08.507953 +0000 UTC
    End time       : 2021-12-09 15:14:09.512430054 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> thread.id: INT(22)
     -> message_bus.destination: STRING(reporting.vertx.first.worloads.addr)
InstrumentationLibrarySpans #2
InstrumentationLibrary io.opentelemetry.netty-4.1 1.7.2
Span #0
    Trace ID       : d506fa3230c7df84c36c5effda29d91b
    Parent ID      : b02ed37c49a777a4
    ID             : 9e5ec6cd54f7d10d
    Name           : CONNECT
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2021-12-09 15:14:09.516981292 +0000 UTC
    End time       : 2021-12-09 15:14:09.518650012 +0000 UTC
    Status code    : STATUS_CODE_ERROR
    Status message : 
Attributes:
     -> thread.name: STRING(vert.x-eventloop-thread-0)
     -> net.transport: STRING(ip_tcp)
     -> thread.id: INT(22)

and also the traceparent header received by payment-service was correct as expected but not the traceId in the log output

{
  "@timestamp":"2021-12-09T15:14:08.509Z", 
  "log.level": "INFO", 
  "message":"header key -> traceparent, header value -> 00-bcf9938cf318c73089abc78c394a9a87-ba0535bd604710e0-01", 
  "ecs.version": "1.2.0",
  "service.name":"payment-service",
  "event.dataset":"payment-service.log",
  "process.thread.name":"payment-service-0-C-1",
 
 "log.logger":"com.otelsandbox.paymentservice.component.KafkaConsumer",
  "span_id":"11b480960379bcbb",
  "trace_flags":"01",
  "trace_id":"67f29c39a3bad1b1ec04dcff2a60e286"
}

@mateuszrzeszutek
Copy link
Member

Okay, there is a span link in the CUSTOMER process span emitted by the paymentService:

Links:
SpanLink #0
     -> Trace ID: bcf9938cf318c73089abc78c394a9a87
     -> ID: ba0535bd604710e0
     -> TraceState: 
     -> DroppedAttributesCount: 0

This means that the context was propagated correctly. The instrumentation purposefully did not use it as the parent of the CONSUMER span, but instead attached it as a link; that is because OTel semantic conventions recommend using span links when you have separate CONSUMER spans for receiving and processing the message.
Please try the otel.instrumentation.common.experimental.suppress-messaging-receive-spans flag -- yes, it will suppress the CONSUMER receive span, but you'll still have the CONSUMER process span in your service; and it will use the "simple" approach to propagating context and just use the passed PRODUCER span context as parent of the CONSUMER.

@leewoobin789
Copy link
Author

ok i will try that out! :)

@leewoobin789
Copy link
Author

@mateuszrzeszutek

{
"@timestamp":"2021-12-09T15:57:28.536Z", 
"log.level": "INFO", 
"message":"header key -> traceparent, header value -> 00-10be9311ede764887f76bd0478c831d4-fc2fa06f4d989c68-01", "ecs.version": "1.2.0",
"service.name":"payment-service",
"event.dataset":"payment-service.log",
"process.thread.name":"payment-service-0-C-1",
"log.logger":"com.otelsandbox.paymentservice.component.KafkaConsumer",
"span_id":"6757a666a858e25c",
"trace_flags":"01",
"trace_id":"10be9311ede764887f76bd0478c831d4"
}

yeah perfect! it works!!!
nevertheless 'experimental' scares me a lot :) is there any further plan on applying it by default?

@mateuszrzeszutek
Copy link
Member

nevertheless 'experimental' scares me a lot :) is there any further plan on applying it by default?

Good question 😄
The OTel messaging semantic spec is still not stable, I think we can probably expect some changes in the future - e.g. what kind of data is captured by spans, and how they're connected/how relationships between send/receive/process/settle spans look like. I think that regardless of that we probably will have a configuration flag for enabling the "simple" behavior, as it tends to be the most intuitive (even if it does not capture the whole picture).
Anyway, this flag will most likely stay in the agent -- and if it's ever renamed we'll announce it in release docs, with at least 1 minor release of transition where both names will be supported.

@leewoobin789
Copy link
Author

@mateuszrzeszutek
thank you very mush for that precise explanation!
Having an option to just enable the simplified form of message traces would make my life easier XD
ok i close this issue! thanks again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants