Support other propagators in Python Layer #124

NathanielRN · 2021-07-29T18:53:33Z

We want to support other propagators besides the X-Ray propagator in the opentelemetry python lambda layer.

These changes allow that, by making the AwsLambdaInstrumentor class

Read the _X_AMZN_TRACE_ID environment variable FIRST and then, if that fails, from the Lambda event HTTP headers (either using the default event_context_extractor or the user provided one)
The default event_context_extractor uses the global propagator to extract and parent spans with the context in the HTTP headers

Fixes #123

NathanielRN · 2021-07-29T18:58:59Z

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py

+        parent_context = get_global_textmap().extract(
+            lambda_event.get("headers", {}).update(
+                {"X-Amzn-Trace-Id": environ.get("_X_AMZN_TRACE_ID", "")}
+            )


How will this work if you have both TraceContextPropagator and AwsPropagator? Will the context be a fix of both? With 2 spans?

You should always use the xray propagator format when reading from the environment variable, and use the configured propagator if reading from HTTP headers when the environment variable wasn't sampled

Hm I think I've seen in other Python code that they're able to do just get_global_textmap().extract(headers). But we can't do that and have the behavior you mentioned if the user sets something like OTEL_PROPAGATORS=xray,tracecontext for their environment variables.

Maybe the answer is to read the OTEL_PROPAGATORS variable directly and instantiate the propagators? But that doesn't feel right... Would probably need to investigate the behavior of when we have multiple propagators.

The env variable only ever fills in with AWS format so we can't really rely on configured propagator to read from it. In other languages we directly reference the awsxraypropagator through code without using the global can't you do that here?

Yes that's what we intend to do here! We first try the XRay propagator, but if that fails, we try the global propagator. That's what I interpret the specs on how to parent a Lambda trace saying we should do?

After updating the PR it looks like this:

parent_context = None xray_env_var = os.environ.get("_X_AMZN_TRACE_ID") if xray_env_var: parent_context = AwsXRayFormat().extract( {"X-Amzn-Trace-Id": xray_env_var} ) event_span_attributes = {} if not parent_context or not parent_context.trace_flags.sampled: lambda_event = args[0] # ... Try the global propagator

python/src/otel/otel_sdk/otel-instrument

garrettwegan · 2021-07-29T19:23:12Z

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py

+        lambda_event, lambda_context = args[0], args[1]
+
+        parent_context = get_global_textmap().extract(
+            lambda_event.get("headers", {}).update(


I don't think we know headers will always be located at lambda_event.headers. If a spec guaranteed all lambda entry points follow this convention we could do this, but I don't believe such a spec exists. See how eventContextExtractor in JS Instrumentation handles this by requiring user to provide header extraction function. Note JS requires user to fully extract into context which is unnecessary for here since you are using global propagator to extract. Here could require user to just return headers from event - similar to WIP Golang instrumentation.

Hm I see what you're saying. Yeah this guide I found online gives an example on how to populate "headers" in API Gateway, but it's not a general solution.

I think we could do something like JS for the manual instrumentation case, but not the auto instrumentation case. In auto instrumentation the AwsLambdaInstrumentor will run without hooks and later on it will be too late to add the hooks.

So either we force users to manual instrument AwsLambdaInstrumentor in their code before their lambda_handler, or we define a mapping "ourselves" here as events["headers"] and explain to users how they can add them...

I think we can add the hooks in a subsequent PR, the question here is just do we want to give users the expectation that we will looks for HTTP headers in the event["headers"]? I think it could for us even though there is no spec, as long as we document it somewhere here?

So event["headers"] may or may not be pointing into the actual customer event. If you invoke lambda directly (cli/sdk) event is your custom event type so yes we could point to one place the header must be, however, if lambda is invoked via some other AWS service (API Gateway, SQS, etc) event is some AWS specified type and your custom event is within that type somewhere.

First of all, many thanks for looking into this.

My 2c:

Lambda handlers always accept 2 arguments: event and context. Event is the message delivered by a Carrier (API GW, SQS, etc) and should have both, data and metadata about the message that can include the context. Propagators define the rules about how the context should be propagated with a particular Carrier.

So, it should be safe to assume that event should contain the required metadata with the context. As the default implementation, it should be safe to assume that event["headers"] will contain the context data and it may be 95% of all cases where people use API-GW and SQS triggered lambdas. In other cases, the users can provide their own extractor and injector implementations, as JS lambda implementation does.

In the case where there is no data in the message (i.e. event["header"]) that carries the context for the defined propagator, it should default to the span with no parent and start a new trace, as opposed to defaulting to AWS X-Ray propagator.

Also, specifying AWS X-Ray propagator should be a choice, i.e. the user should explicitly define it in OTEL_PROPAGATORS=aws_xray env variable.

Hope it makes sense.

@garrettwegan

I guess we need to find out the different branching paths for what an "event could be". For now, I followed the spec where it says how to determine the parent span in Lambda to guide a "guess" for what API Gateway would look like, but I haven't tested it so I wouldn't be comfortable merging this just yet. But if we test it and it works (In the X-Ray case, the not X-Ray case, and the not any propagator case), and people agree with it, we can make this incremental improvement.

@michaelhyatt in trying to follow this spec on how to determine the parent span in Lambda I made several updates to this PR. It still requires serious testing though:

test with API Gateway configured in its 2 possible forms

test when multiple combinations of propagators are set via OTEL_PROPAGATORS

I can't test it this week but if we get a chance to test it in the near future and people are okay with checking event["headers"] as a default we can probably make this incremental improvement to unblock you.

python/src/otel/otel_sdk/otel-instrument

michaelhyatt · 2021-08-03T23:52:26Z

It makes sense @NathanielRN

The only thing I wanted to add is highlight the behaviour when tracing=PassThrough, which should switch off X-Ray propagation, as per the document:

The environment variable will be set and the Context will be sampled only if AWS X-Ray has been enabled for the Lambda function. A user can disable AWS X-Ray for the function if this propagation is not desired.

NathanielRN · 2021-08-04T00:41:18Z

@michaelhyatt

tracing=PassThrough, which should switch off X-Ray propagation, as per the document:

I didn't see the document mention tracing=PassThrough, did you see it somewhere else? In regards to the part you quoted, I think the document is saying that "switching off X-Ray" will mean that the environment variable is simply not set. Without the environment variable, we won't be able to use the X-Ray propagator anyways, so we can fallback to the user global configured propagator as they would expect 🙂 (Therefore, we don't need any additional configuration flags like tracing=PassThrough because it will just try possible actions after the other until a context is extracted).

michaelhyatt · 2021-08-04T01:37:46Z

I was under the impression that _X_AMZN_TRACE_ID gets populated automatically from the HTTP header X-Amzn-Trace-ID rather than being exposed to the end user to control whether the tracing is switched on or off, but I may be wrong. I just want to make sure that I can use OTEL_PTOPAGATORS=tracecontext and not get any parent.id references when I am sending a request without the externally set traceparent header.

I interpreted the requirement for X-Ray tracing to be switched off as turning the Tracing control in the lambda UI off, i.e. setting tracing=PassThrough (as opposed to Active), but I may be wrong here too. BTW, I believe these two values are from CloudFormation.

NathanielRN · 2021-08-13T17:22:30Z

I was under the impression that _X_AMZN_TRACE_ID gets populated automatically from the HTTP header X-Amzn-Trace-ID rather than being exposed to the end user to control whether the tracing is switched on or off, but I may be wrong.

Thanks for that prompt! I did some investigation and found out that you're right about this, so I made a PR in the specifications to follow up on this: open-telemetry/opentelemetry-specification#1867

I interpreted the requirement for X-Ray tracing to be switched off as turning the Tracing control in the lambda UI off, i.e. setting tracing=PassThrough (as opposed to Active), but I may be wrong here too.

Yes I agree with you that sounds like the ideal customer experience!

michaelhyatt · 2021-09-21T10:14:51Z

@NathanielRN I am curious if the fix made it into v1.5.0? Maybe not, I can see this one still being a draft.

NathanielRN · 2021-09-23T17:06:07Z

@michaelhyatt Sorry about that, yes this change hasn't been merged in. Haven't had bandwidth to test this solution even though I think it should work, once it gets tested by publishing a built version of this layer we can merge this change.

NathanielRN · 2021-10-06T00:04:13Z

Hey all, I've updated this PR to have the minimal changes we would need to support this feature 🙂 With these changes, we can parent the context of subsequent spans using the HTTP headers in the lambda event. Further, using event_context_extractor you can define a custom way of extracting the context.

Tests have also been added to validate this.

Please let me know what you think!

anuraaga

Thanks, just some small points

anuraaga · 2021-10-06T00:09:48Z

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py


        Args:
            **kwargs: Optional arguments
                ``tracer_provider``: a TracerProvider, defaults to global
+                ``event_context_extractor``: a method which takes the Lambda
+                    Event as input and provides the object which contains the
+                    context as output (usually HTTP headers)


IIUC, it returns the Context itself, not HTTP headers, right?

Thanks for that catch! This was outdated, I've updated it now 🙂

anuraaga · 2021-10-06T00:11:24Z

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py

+        ):
+            return parent_context
+
+        logger.debug(


X-Ray being disabled isn't a failure so we shouldn't log it as such. I guess we don't need this log message anyways

I thought it could be useful as a "debug" log and so labeled it as such, but I agree it's not necessary so I'll remove it! 🙂

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py

anuraaga · 2021-10-06T00:26:36Z

python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lambda/__init__.py

+    try:
+        headers = lambda_event["headers"]
+    except (TypeError, KeyError):
+        logger.warning("Failed to extract context from Lambda Event.")


We can add more on what a user should do if they see this error. Either X-Ray needs to be enabled or the function must be serving API Gateway as a HTTP proxy. Otherwise, this instrumentation should not be included. Something like that

That makes sense to me. What do you think about

"Extracting context from Lambda Event failed: either enable X-Ray active tracing or configure API Gateway to trigger this Lambda function as a pure proxy. Otherwise, generated spans will have an invalid (empty) parent context."

? I'm wondering if there's a use case where someone would want instrumentation and they're okay with the empty parent context. That's why I though "instrumentation should not be included" to not fit as well.

As does the warning level makes sense?

Hmm - I guess there is a use case where the function does not use X-Ray, and is not API Gateway but meant to always be a root - still it would be a bit weird to have no ability to parent at all but it's conceivable. Your message seems fine then, and yeah it should be debug level if there is a conceivable use case for it rather than always being in error.

Yeah like if they just trigger it from the console is the easiest case I can think of. Maybe that Lambda triggers another Lambda 😝

I've made the updates 👍

…bda/__init__.py Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>

NathanielRN commented Jul 29, 2021

View reviewed changes

python/src/otel/otel_sdk/otel-instrument Outdated Show resolved Hide resolved

garrettwegan reviewed Jul 29, 2021

View reviewed changes

NathanielRN mentioned this pull request Jul 29, 2021

[Python] Tracecontext OTel propagator config generates an additional X-Ray span aws-observability/aws-otel-lambda#123

Closed

wangzlei reviewed Jul 30, 2021

View reviewed changes

python/src/otel/otel_sdk/otel-instrument Outdated Show resolved Hide resolved

NathanielRN mentioned this pull request Jul 30, 2021

Move OTel wrapper py code into instrument script #125

Closed

NathanielRN force-pushed the support-other-propagators branch 5 times, most recently from 5544d57 to 38bf31c Compare October 5, 2021 05:26

NathanielRN mentioned this pull request Oct 5, 2021

Do not force xray propagator, default adds W3C+B3 aws-observability/aws-otel-lambda#155

Merged

NathanielRN force-pushed the support-other-propagators branch from 38bf31c to 1e56587 Compare October 5, 2021 22:45

NathanielRN mentioned this pull request Oct 5, 2021

Add NathanielRN to repo approvers #151

Closed

NathanielRN force-pushed the support-other-propagators branch from 1e56587 to 6d55f27 Compare October 5, 2021 23:47

NathanielRN added 2 commits October 5, 2021 16:50

Format tests to match upstream format

2bcabf5

Remove -s flag since it is only for logging

4965453

NathanielRN force-pushed the support-other-propagators branch 3 times, most recently from 9ee1abe to e625603 Compare October 5, 2021 23:58

Support other propagators in Python Layer

da03b2f

NathanielRN force-pushed the support-other-propagators branch from e625603 to da03b2f Compare October 6, 2021 00:00

NathanielRN marked this pull request as ready for review October 6, 2021 00:01

NathanielRN requested a review from codeboten as a code owner October 6, 2021 00:01

NathanielRN requested review from anuraaga and wangzlei October 6, 2021 00:01

anuraaga reviewed Oct 6, 2021

View reviewed changes

Update event context extractor description + clean up log

6461cf0

NathanielRN requested a review from anuraaga October 6, 2021 00:16

anuraaga reviewed Oct 6, 2021

View reviewed changes

Update python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lam…

930d98c

…bda/__init__.py Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>

NathanielRN force-pushed the support-other-propagators branch from efe66e7 to 930d98c Compare October 6, 2021 04:15

Update python/src/otel/otel_sdk/opentelemetry/instrumentation/aws_lam…

aaa09bd

…bda/__init__.py Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>

NathanielRN requested a review from anuraaga October 6, 2021 04:20

anuraaga approved these changes Oct 6, 2021

View reviewed changes

anuraaga merged commit 861305e into open-telemetry:main Oct 6, 2021

NathanielRN deleted the support-other-propagators branch October 8, 2021 16:53

NathanielRN mentioned this pull request Oct 26, 2021

Add instrumentation for AWS Lambda Service - pkg metadata files (Part 1/2) open-telemetry/opentelemetry-python-contrib#739

Merged

7 tasks

NathanielRN mentioned this pull request Feb 16, 2022

Maintainer Access Request - NathanielRN #219

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support other propagators in Python Layer #124

Support other propagators in Python Layer #124

NathanielRN commented Jul 29, 2021 •

edited

Loading

NathanielRN Jul 29, 2021

anuraaga Jul 29, 2021

NathanielRN Jul 30, 2021

anuraaga Jul 31, 2021

NathanielRN Aug 3, 2021

garrettwegan Jul 29, 2021 •

edited

Loading

NathanielRN Jul 29, 2021

garrettwegan Jul 29, 2021

michaelhyatt Jul 30, 2021

NathanielRN Jul 30, 2021

michaelhyatt commented Aug 3, 2021

NathanielRN commented Aug 4, 2021

michaelhyatt commented Aug 4, 2021

NathanielRN commented Aug 13, 2021

michaelhyatt commented Sep 21, 2021

NathanielRN commented Sep 23, 2021

NathanielRN commented Oct 6, 2021

anuraaga left a comment

anuraaga Oct 6, 2021

NathanielRN Oct 6, 2021

anuraaga Oct 6, 2021

NathanielRN Oct 6, 2021

anuraaga Oct 6, 2021

NathanielRN Oct 6, 2021

anuraaga Oct 6, 2021

NathanielRN Oct 6, 2021

Support other propagators in Python Layer #124

Support other propagators in Python Layer #124

Conversation

NathanielRN commented Jul 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

garrettwegan Jul 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

michaelhyatt commented Aug 3, 2021

NathanielRN commented Aug 4, 2021

michaelhyatt commented Aug 4, 2021

NathanielRN commented Aug 13, 2021

michaelhyatt commented Sep 21, 2021

NathanielRN commented Sep 23, 2021

NathanielRN commented Oct 6, 2021

anuraaga left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

NathanielRN commented Jul 29, 2021 •

edited

Loading

garrettwegan Jul 29, 2021 •

edited

Loading