Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support other propagators in Python Layer #124

Merged
merged 6 commits into from
Oct 6, 2021

Conversation

NathanielRN
Copy link
Contributor

@NathanielRN NathanielRN commented Jul 29, 2021

We want to support other propagators besides the X-Ray propagator in the opentelemetry python lambda layer.

These changes allow that, by making the AwsLambdaInstrumentor class

  • Read the _X_AMZN_TRACE_ID environment variable FIRST and then, if that fails, from the Lambda event HTTP headers (either using the default event_context_extractor or the user provided one)
  • The default event_context_extractor uses the global propagator to extract and parent spans with the context in the HTTP headers

Fixes #123

Comment on lines 101 to 167
parent_context = get_global_textmap().extract(
lambda_event.get("headers", {}).update(
{"X-Amzn-Trace-Id": environ.get("_X_AMZN_TRACE_ID", "")}
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this work if you have both TraceContextPropagator and AwsPropagator? Will the context be a fix of both? With 2 spans?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should always use the xray propagator format when reading from the environment variable, and use the configured propagator if reading from HTTP headers when the environment variable wasn't sampled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I think I've seen in other Python code that they're able to do just get_global_textmap().extract(headers). But we can't do that and have the behavior you mentioned if the user sets something like OTEL_PROPAGATORS=xray,tracecontext for their environment variables.

Maybe the answer is to read the OTEL_PROPAGATORS variable directly and instantiate the propagators? But that doesn't feel right... Would probably need to investigate the behavior of when we have multiple propagators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The env variable only ever fills in with AWS format so we can't really rely on configured propagator to read from it. In other languages we directly reference the awsxraypropagator through code without using the global can't you do that here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's what we intend to do here! We first try the XRay propagator, but if that fails, we try the global propagator. That's what I interpret the specs on how to parent a Lambda trace saying we should do?

After updating the PR it looks like this:

parent_context = None

xray_env_var = os.environ.get("_X_AMZN_TRACE_ID")

if xray_env_var:
    parent_context = AwsXRayFormat().extract(
        {"X-Amzn-Trace-Id": xray_env_var}
    )

event_span_attributes = {}

if not parent_context or not parent_context.trace_flags.sampled:
    lambda_event = args[0]
    # ... Try the global propagator

lambda_event, lambda_context = args[0], args[1]

parent_context = get_global_textmap().extract(
lambda_event.get("headers", {}).update(
Copy link
Contributor

@garrettwegan garrettwegan Jul 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we know headers will always be located at lambda_event.headers. If a spec guaranteed all lambda entry points follow this convention we could do this, but I don't believe such a spec exists. See how eventContextExtractor in JS Instrumentation handles this by requiring user to provide header extraction function. Note JS requires user to fully extract into context which is unnecessary for here since you are using global propagator to extract. Here could require user to just return headers from event - similar to WIP Golang instrumentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I see what you're saying. Yeah this guide I found online gives an example on how to populate "headers" in API Gateway, but it's not a general solution.

I think we could do something like JS for the manual instrumentation case, but not the auto instrumentation case. In auto instrumentation the AwsLambdaInstrumentor will run without hooks and later on it will be too late to add the hooks.

So either we force users to manual instrument AwsLambdaInstrumentor in their code before their lambda_handler, or we define a mapping "ourselves" here as events["headers"] and explain to users how they can add them...

I think we can add the hooks in a subsequent PR, the question here is just do we want to give users the expectation that we will looks for HTTP headers in the event["headers"]? I think it could for us even though there is no spec, as long as we document it somewhere here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So event["headers"] may or may not be pointing into the actual customer event. If you invoke lambda directly (cli/sdk) event is your custom event type so yes we could point to one place the header must be, however, if lambda is invoked via some other AWS service (API Gateway, SQS, etc) event is some AWS specified type and your custom event is within that type somewhere.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, many thanks for looking into this.

My 2c:

Lambda handlers always accept 2 arguments: event and context. Event is the message delivered by a Carrier (API GW, SQS, etc) and should have both, data and metadata about the message that can include the context. Propagators define the rules about how the context should be propagated with a particular Carrier.

So, it should be safe to assume that event should contain the required metadata with the context. As the default implementation, it should be safe to assume that event["headers"] will contain the context data and it may be 95% of all cases where people use API-GW and SQS triggered lambdas. In other cases, the users can provide their own extractor and injector implementations, as JS lambda implementation does.

In the case where there is no data in the message (i.e. event["header"]) that carries the context for the defined propagator, it should default to the span with no parent and start a new trace, as opposed to defaulting to AWS X-Ray propagator.

Also, specifying AWS X-Ray propagator should be a choice, i.e. the user should explicitly define it in OTEL_PROPAGATORS=aws_xray env variable.

Hope it makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@garrettwegan

I guess we need to find out the different branching paths for what an "event could be". For now, I followed the spec where it says how to determine the parent span in Lambda to guide a "guess" for what API Gateway would look like, but I haven't tested it so I wouldn't be comfortable merging this just yet. But if we test it and it works (In the X-Ray case, the not X-Ray case, and the not any propagator case), and people agree with it, we can make this incremental improvement.

@michaelhyatt in trying to follow this spec on how to determine the parent span in Lambda I made several updates to this PR. It still requires serious testing though:

  • test with API Gateway configured in its 2 possible forms
  • test when multiple combinations of propagators are set via OTEL_PROPAGATORS

I can't test it this week but if we get a chance to test it in the near future and people are okay with checking event["headers"] as a default we can probably make this incremental improvement to unblock you.

@michaelhyatt
Copy link

It makes sense @NathanielRN

The only thing I wanted to add is highlight the behaviour when tracing=PassThrough, which should switch off X-Ray propagation, as per the document:

The environment variable will be set and the Context will be sampled only if AWS X-Ray has been enabled for the Lambda function. A user can disable AWS X-Ray for the function if this propagation is not desired.

@NathanielRN
Copy link
Contributor Author

@michaelhyatt

tracing=PassThrough, which should switch off X-Ray propagation, as per the document:

I didn't see the document mention tracing=PassThrough, did you see it somewhere else? In regards to the part you quoted, I think the document is saying that "switching off X-Ray" will mean that the environment variable is simply not set. Without the environment variable, we won't be able to use the X-Ray propagator anyways, so we can fallback to the user global configured propagator as they would expect 🙂 (Therefore, we don't need any additional configuration flags like tracing=PassThrough because it will just try possible actions after the other until a context is extracted).

@michaelhyatt
Copy link

I was under the impression that _X_AMZN_TRACE_ID gets populated automatically from the HTTP header X-Amzn-Trace-ID rather than being exposed to the end user to control whether the tracing is switched on or off, but I may be wrong. I just want to make sure that I can use OTEL_PTOPAGATORS=tracecontext and not get any parent.id references when I am sending a request without the externally set traceparent header.

I interpreted the requirement for X-Ray tracing to be switched off as turning the Tracing control in the lambda UI off, i.e. setting tracing=PassThrough (as opposed to Active), but I may be wrong here too. BTW, I believe these two values are from CloudFormation.

@NathanielRN
Copy link
Contributor Author

I was under the impression that _X_AMZN_TRACE_ID gets populated automatically from the HTTP header X-Amzn-Trace-ID rather than being exposed to the end user to control whether the tracing is switched on or off, but I may be wrong.

Thanks for that prompt! I did some investigation and found out that you're right about this, so I made a PR in the specifications to follow up on this: open-telemetry/opentelemetry-specification#1867

I interpreted the requirement for X-Ray tracing to be switched off as turning the Tracing control in the lambda UI off, i.e. setting tracing=PassThrough (as opposed to Active), but I may be wrong here too.

Yes I agree with you that sounds like the ideal customer experience!

@michaelhyatt
Copy link

@NathanielRN I am curious if the fix made it into v1.5.0? Maybe not, I can see this one still being a draft.

@NathanielRN
Copy link
Contributor Author

@michaelhyatt Sorry about that, yes this change hasn't been merged in. Haven't had bandwidth to test this solution even though I think it should work, once it gets tested by publishing a built version of this layer we can merge this change.

@NathanielRN NathanielRN force-pushed the support-other-propagators branch 3 times, most recently from 9ee1abe to e625603 Compare October 5, 2021 23:58
@NathanielRN NathanielRN marked this pull request as ready for review October 6, 2021 00:01
@NathanielRN
Copy link
Contributor Author

Hey all, I've updated this PR to have the minimal changes we would need to support this feature 🙂 With these changes, we can parent the context of subsequent spans using the HTTP headers in the lambda event. Further, using event_context_extractor you can define a custom way of extracting the context.

Tests have also been added to validate this.

Please let me know what you think!

Copy link
Contributor

@anuraaga anuraaga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, just some small points


Args:
**kwargs: Optional arguments
``tracer_provider``: a TracerProvider, defaults to global
``event_context_extractor``: a method which takes the Lambda
Event as input and provides the object which contains the
context as output (usually HTTP headers)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, it returns the Context itself, not HTTP headers, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for that catch! This was outdated, I've updated it now 🙂

):
return parent_context

logger.debug(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X-Ray being disabled isn't a failure so we shouldn't log it as such. I guess we don't need this log message anyways

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it could be useful as a "debug" log and so labeled it as such, but I agree it's not necessary so I'll remove it! 🙂

try:
headers = lambda_event["headers"]
except (TypeError, KeyError):
logger.warning("Failed to extract context from Lambda Event.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add more on what a user should do if they see this error. Either X-Ray needs to be enabled or the function must be serving API Gateway as a HTTP proxy. Otherwise, this instrumentation should not be included. Something like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. What do you think about

"Extracting context from Lambda Event failed: either enable X-Ray active tracing or configure API Gateway to trigger this Lambda function as a pure proxy. Otherwise, generated spans will have an invalid (empty) parent context."

? I'm wondering if there's a use case where someone would want instrumentation and they're okay with the empty parent context. That's why I though "instrumentation should not be included" to not fit as well.

As does the warning level makes sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - I guess there is a use case where the function does not use X-Ray, and is not API Gateway but meant to always be a root - still it would be a bit weird to have no ability to parent at all but it's conceivable. Your message seems fine then, and yeah it should be debug level if there is a conceivable use case for it rather than always being in error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah like if they just trigger it from the console is the easiest case I can think of. Maybe that Lambda triggers another Lambda 😝

I've made the updates 👍

…bda/__init__.py

Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>
…bda/__init__.py

Co-authored-by: Anuraag Agrawal <anuraaga@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] Support for OTEL_PROPAGATORS setting beyond AWS X-Ray
5 participants