Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using upstream OTel auto instrumentation and minimize otel_wrapper #158

Conversation

NathanielRN
Copy link
Contributor

Description

@wangzlei and I worked together to figure out how to mostly auto-instrument Lambda functions using the upstream opentelemetry-instrument auto-instrumentation package.

We made two major updates

Update the PYTHONPATH so we can call opentelemtry-instrument

The auto-instrumentation package counts on 2 things:

  • the opentelemetry package
  • locating all the python packages the user wants to instrument

Both these 2 things require us modifying the environment variable PYTHONPATH right away in otel-instrument. AWS Lambda will add the correct paths but it does it too late. (It only does it once it calls it's originally intended entry point of python3 /var/runtime/bootstrap.py).

Update the otel_wrapper.py to only call the AwsLambdaInstrumentor

All of the instrumentation is done in sitecustomize.py by opentelemetry-instrument. However, the way the Lambda Handler is imported in the AWS Lambda bootstrap.py file which is run after sitecustomize.py CLEARS any instrumentation done on lambda_function.lambda_handler. That's why we keep otel_wrapper.py around. So that bootstrap.py can call it, do any destructive imports it needs to do, and then call AwsLambdaInstrumentor.instrument() explicitly. This way we know the Lambda function is instrumented, we can import, and give bootstrap.py the Lambda Handler that we know is instrumented.

Future Work

We would ideally like to modify boostrap.py or investigate how we can get rid of otel_wrapper.py and have all the instrumentation be finished in sitecustomize.py such that bootstrap.py can just import the normal lambda handler at the _HANDLER environment variable without destroying the import at all.

Fixes #152

@NathanielRN NathanielRN force-pushed the make-exec-script-call-auto-instrument-script branch from 4b147b1 to a270074 Compare October 21, 2021 05:35
@NathanielRN NathanielRN force-pushed the make-exec-script-call-auto-instrument-script branch from a270074 to 46110d2 Compare October 21, 2021 05:36
# know where the OpenTelemetry Python packages are and can add them to the
# PYTHONPATH.

LAMBDA_LAYER_PKGS_DIR = os.path.abspath(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file would be located at /var/task - shouldn't this explicitly reference /opt/python which while not an environment variable is a well defined path for lambda?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this file would be located at /var/task

Didn't you also say this in #157

Note, the file structure is important. Lambda does not seem to allow loading the handler if it is placed at a subdirectory such as opentelemetry.instrumentation.lambda.handler, when inside a layer it must be at the top level.

so this file has to be moved to the top level as we do in the build:

mv /build/python/otel-instrument /build/otel-instrument && \

shouldn't this explicitly reference /opt/python which while not an environment variable is a well defined path for lambda?

Yeah we can do that, it was just annoying to test locally because ../python always existed but /opt/python did not. I'll move it back though because that explains the intent well.


# the path to the interpreter and all of the originally intended arguments
args = sys.argv[1:]
AWS_LAMBDA_FUNCTION_NAME = "AWS_LAMBDA_FUNCTION_NAME"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd expect these to actually refer to a value, not just the name itself. I'd just inline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a different name for the constant if you think that's better? I think this pattern is a good idea though because we reference this variable in 3 different places and that's 3 different chances to have a typo (as has happened to me before).

It's also pretty normal in the upstream to do this? https://github.com/open-telemetry/opentelemetry-python/blob/main/opentelemetry-api/src/opentelemetry/environment_variables.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renaming to something else seems like it reduces readability of the code that uses the variable. Using variables that refer to names, instead of values, also reduces readability though. I disagree with the upstream pattern here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, readability always needs to trump writability - typos during writing is still sort of OK for readability sakes. In many cases it results in duplicate docs, etc that get out of sync but it is what it is. In this case, readability suffers from the constants since then you have to jump to the constant definition to know the value every time - using the string has only positive impacts on readibility though


# - Set the service name

service_name_resource_attribute = "service.name"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to not respect the user's service.name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


environ.setdefault(OTEL_PROPAGATORS, "tracecontext,b3,xray")

# - Disable Lambda instrumentation because we will instrument when we know it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just remove the auto instrumentation entry point from setup.cfg instead as it can't really be used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right idea, will do that!


environ[OTEL_PYTHON_DISABLED_INSTRUMENTATIONS] = "aws_lambda"

# - Use a wrapper because instrumentations don't last across parent -> child
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this comment, we already found enough confusion on the topic of parent / child already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, yes it is confusing, removing it and explaining it better.

environ["_HANDLER"] = "otel_wrapper.lambda_handler"

# - Call the upstream auto instrumentation script

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly recommend rewriting this file into a shell script. It solidifies two facts

  1. It's supposed to only do very simple things like setting environment variables, no business logic can happen in here
  2. There is no parent python process in our lambda handler, the fact that this file is python I feel can confuse a reader into wondering why we don't load instrumentation here (I think you tried this in a PR once :P). A shell script makes obvious this isn't the app, it's just configuring the environment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's been a lot of talk at the python SIG about moving all bash scripts to python. Python does scripts really well!

  1. It's supposed to only do very simple things like setting environment variables, no business logic can happen in here

One huge benefit is being able to import other python packages to GET environment variable constants which like you said is the point of this file. Then we can be sure updates get picked up. (Although they are spec defined things in the instrumentation section are still changing).

All in all a python script should be easier to maintain by a python script.

  1. There is no parent python process in our lambda handler, the fact that this file is python I feel can confuse a reader into wondering why we don't load instrumentation here (I think you tried this in a PR once :P). A shell script makes obvious this isn't the app, it's just configuring the environment

Lol yes I did but I learned a lot about the Lambda instrumentation in #152 🥲 I added an explicit comment at the top description. I feel like that was what the file was missing and with this it shouldn't confuse future readers.

configured = entry_point.name
except Exception as exc: # pylint: disable=broad-except
logger.debug("Configuration of %s failed", entry_point.name)
AwsLambdaInstrumentor().instrument()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When upstreaming, though initially probably a PR for this repo since it's easier to test I guess, strongly recommend providing the original handler as a parameter to this function. That's all it should take to make the instrumentation work both manually, if a user wants to, or automatically from the lambda layer. We just want to decouple ORIG_HANDLER as it's a mysterious dependency between the two components.

I'd recommend sending a PR that only adds AwsLambdaInstrumenter file and has unit tests that exercise the instrumentation manually first to keep the scope small and easy to review.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize currently the instrumentation seems to handle by falling back to _HANDLER when ORIG_HANDLER isn't set. It works ok but less coupling will make the parts easier to follow. Instrumentation is about instrumenting the handler, wrapper is about setting up the instrumentation automatically in the runtime. The separation will improve the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend sending a PR that only adds AwsLambdaInstrumenter file and has unit tests that exercise the instrumentation manually first to keep the scope small and easy to review.

Will definitely keep this in mind as I ask for feedback from the Python SIG. The actual code change in open-telemetry/opentelemetry-python-contrib#739 is small though and when I brought it up last week they said they would be able to help me review it! It's inflated by the LICENSE.

The tests already give an example of BOTH auto instrumentation AND manual instrumentation. Auto Instrumentation is a super important part to this product so I feel reluctant to commit it later when we already have tests that can give us confidence the scripts is working as intended.

(I put in a lot of effort into #150 to make sure the tests use the script because it helps give us confidence that instrumentation works as intended in Lambda).

@anuraaga anuraaga closed this Oct 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use opentelemetry-instrument script to invoke python app
2 participants