Support tail-based sampling from OTEL Collector #5867

yurishkuro · 2024-08-20T00:34:18Z

Jaeger v2 can now support tail-based sampling, which is available in the OTEL Collector (contrib) as the tail_sampling processor.

- Include the loadbalancing exporter and the tail_sampling processor in components.go
- Create a sample configuration and a docker-compose file that uses it
- Create an e2e integration test (script and GH Actions workflow)
- Create a README documenting the setup

As a single task it's too large for a good-first-issue, but it can be done incrementally.


mahadzaryab1 · 2024-08-21T22:23:15Z

@yurishkuro I'm interested in working on this but have never contributed to this repository before. Would you be able to guide me on how to tackle this issue?

yurishkuro · 2024-08-22T02:32:59Z

@mahadzaryab1 I would start by:

- reading the blog post I linked
- reproducing it in a local setup via docker compose
- then swapping the final-stage collectors with the jaeger-v2 collector (this will require importing the tail sampling processor into components.go)

mahadzaryab1 · 2024-08-22T03:00:50Z

Sounds good! I'll give this a shot.

mahadzaryab1 · 2024-08-22T23:58:32Z

@yurishkuro I got the first item done in #5878 and read the blog post. Are there any instructions or examples on how I can create a sample configuration and docker-compose file? Thank you so much for your time and help!

yurishkuro · 2024-08-23T00:35:27Z

Sample configurations are provided in the blog post. An example of a docker compose setup using Jaeger is in docker-compose/monitor/docker-compose-v2.yml. You will need to build the Docker image locally so that it includes the tail sampling processor, because the officially published image does not.
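Why a locally built image is needed can be illustrated with a small, hypothetical model of a component registry. In the real components.go the entries come from each imported package's NewFactory function (for example tailsamplingprocessor.NewFactory from opentelemetry-collector-contrib); the `Factory`, `registry`, and `validatePipeline` names below are illustrative assumptions, not the Collector API. The point is only that a config referencing `tail_sampling` validates solely against a binary that compiled that component in.

```go
package main

import (
	"fmt"
	"sort"
)

// Factory is a hypothetical stand-in for a collector component factory.
type Factory struct {
	Type string
}

// registry maps a component type name, as written in the YAML config, to its
// factory. A collector binary builds this kind of table at compile time, so it
// only knows about the components that were imported into it.
type registry map[string]Factory

// newRegistry builds a registry containing the given component types.
func newRegistry(types ...string) registry {
	r := registry{}
	for _, t := range types {
		r[t] = Factory{Type: t}
	}
	return r
}

// validatePipeline returns an error for the first processor type referenced by
// a pipeline that the registry does not contain.
func validatePipeline(r registry, processors []string) error {
	for _, p := range processors {
		if _, ok := r[p]; ok {
			continue
		}
		known := make([]string, 0, len(r))
		for k := range r {
			known = append(known, k)
		}
		sort.Strings(known)
		return fmt.Errorf("unknown processor type %q (known: %v)", p, known)
	}
	return nil
}
```

With `newRegistry("batch")` standing in for a binary without the processor, a pipeline listing `tail_sampling` is rejected; `newRegistry("batch", "tail_sampling")` accepts the same pipeline.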

mahadzaryab1 · 2024-08-23T03:28:45Z

@yurishkuro Thank you! As a follow-up, I'm trying to build the image locally by running make build from jaeger/docker-compose/monitor, but I'm running into the following error. Am I missing a step in between?

+ cd packages/jaeger-ui
./scripts/rebuild-ui.sh: line 35: cd: packages/jaeger-ui: No such file or directory
make[2]: *** [rebuild-ui] Error 1
make[1]: *** [jaeger-ui/packages/jaeger-ui/build/index.html] Error 2
make: *** [build] Error 2

mahadzaryab1 · 2024-08-23T04:01:16Z

@yurishkuro This is how I've modified the docker setup to reproduce tail-based sampling. Let me know if you have any feedback.

yurishkuro · 2024-08-23T04:19:39Z

You need to initialize and update the git submodules (git submodule update --init --recursive) in order to access the UI directory.

mahadzaryab1 · 2024-08-23T05:18:43Z

@yurishkuro Awesome, thank you so much! I was able to get the setup going based on the configuration I linked above. Here's some sample output I'm seeing from the different policies being evaluated:

jaeger-1          | 2024-08-23T05:14:01.932Z	debug	sampling/string_tag_filter.go:95	Evaluting spans in string-tag filter	{"kind": "processor", "name": "tail_sampling", "pipeline": "traces", "policy": "string_attribute"}
jaeger-1          | 2024-08-23T05:14:01.932Z	debug	sampling/latency.go:34	Evaluating spans in latency filter	{"kind": "processor", "name": "tail_sampling", "pipeline": "traces", "policy": "latency"}
jaeger-1          | 2024-08-23T05:14:01.932Z	debug	sampling/probabilistic.go:46	Evaluating spans in probabilistic filter	{"kind": "processor", "name": "tail_sampling", "pipeline": "traces", "policy": "probabilistic"}
jaeger-1          | 2024-08-23T05:14:01.932Z	debug	sampling/status_code.go:54	Evaluating spans in status code filter	{"kind": "processor", "name": "tail_sampling", "pipeline": "traces", "policy": "status_code"}

yurishkuro · 2024-08-23T05:34:04Z

I suggest now thinking about what an e2e integration test for this could look like.

mahadzaryab1 · 2024-08-23T05:37:07Z

@yurishkuro Will do, thanks for all your help and guidance so far! Should the sample configuration, docker compose file, and README go in https://github.com/jaegertracing/jaeger/tree/main/examples?

yurishkuro · 2024-08-23T05:38:54Z

docker-compose/tail-sampling

mahadzaryab1 · 2024-08-24T01:01:52Z

@yurishkuro I completed the second task of creating a sample configuration. I'm looking for a bit of guidance on the e2e integration tests. I see that we have some integration tests, but those mostly seem to be for the storage backends. In this case, should we test that traces are being sampled according to the policies in our config? For example, we could test the 'filter-by-attribute' or 'all-errors' policies by ensuring that matching traces are captured and the rest are filtered out.

yurishkuro · 2024-08-24T17:05:05Z

You can use the tests we have as inspiration, but don't try to fit your test to them. How would you e2e test tail sampling if starting from scratch? Try explaining the whole setup and test process.

mahadzaryab1 · 2024-08-24T17:31:48Z

@yurishkuro I am envisioning something like this:

- Set up the jaeger-v2 collector with a tail-sampling processor that uses a string_attribute policy matching on a particular tag
- Start the load-balancing collector and the jaeger-v2 collector from the docker-compose setup
- Send spans with various attributes to the load-balancing otel-collector
- Verify that only spans with tags matching the ones listed in the policy are sampled and stored by the jaeger-v2 collector

Let me know what you think and if you have any feedback!

yurishkuro · 2024-08-24T17:46:44Z

SGTM. A couple of thoughts:

- Such a test would depend on your ability to generate very specific traces. How are you planning to achieve that? For example, the microsim and tracegen utilities both generate nearly identical traces, although tracegen gives you more control; for instance, you can control how many service names it generates.
- How will you verify that the right traces are sampled?
- Most importantly, you need to make sure that the test is robust, i.e. that the reason some traces are not sampled is the tail sampler and not some other condition. One way to ensure this is to run an A/B test where the only change between A and B is the configuration of the tail sampler. For example, in A you configure it to sample only service-a but not service-b, and you verify that you can observe that. Then in B you flip that condition and again verify that you get the expected results.

BTW, to perform this test you do not need a load generator running continuously; it's better to generate just a fixed number of traces. You also need to make sure the storage is purged between A and B.
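The A/B structure described above can be sketched as a self-contained simulation. This is not Jaeger's test code: `trace`, `memStore`, `fixedTraces`, and `runScenario` are hypothetical stand-ins, and the sampler is reduced to "keep one service". It shows the three properties that make the test robust: a fixed input batch, a purge before each run, and the sampler configuration as the only variable.

```go
package main

// trace is a hypothetical, simplified trace: an ID and the service that produced it.
type trace struct {
	ID      int
	Service string
}

// memStore is a hypothetical in-memory trace store.
type memStore struct{ traces []trace }

func (s *memStore) purge() { s.traces = nil }

func (s *memStore) write(t trace) { s.traces = append(s.traces, t) }

// fixedTraces generates a deterministic, finite batch: n traces per service.
func fixedTraces(services []string, n int) []trace {
	var out []trace
	id := 0
	for _, svc := range services {
		for i := 0; i < n; i++ {
			out = append(out, trace{ID: id, Service: svc})
			id++
		}
	}
	return out
}

// runScenario purges the store, replays the same input through a sampler that
// keeps only keepService, and returns how many traces per service were stored.
func runScenario(store *memStore, input []trace, keepService string) map[string]int {
	store.purge()
	for _, t := range input {
		if t.Service == keepService {
			store.write(t)
		}
	}
	counts := map[string]int{}
	for _, t := range store.traces {
		counts[t.Service]++
	}
	return counts
}
```

Because run B replays exactly the same input as run A, service-b traces that are absent in A but present in B can only be explained by the sampler configuration, which also proves that service-b traces were generated in the first place.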

mahadzaryab1 · 2024-08-24T20:44:43Z

@yurishkuro Thanks for the follow-up. I'm going to start by playing around with tracegen to see what kind of traces I can generate. I'm currently trying to replace microsim with tracegen in docker-compose.yml using the following configuration:

  tracegen:
    image: jaegertracing/jaeger-tracegen:latest
    environment:
      - OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://jaeger:4318/v1/traces
    command: ["-duration", "10s", "-workers", "3", "-pause", "250ms"]
    depends_on:
      - jaeger

This doesn't seem to work and gives me the following error:

tracegen-1        | 2024/08/24 20:42:32 traces export: Post "http://jaeger:4318/v1/traces": dial tcp: lookup jaeger on 127.0.0.11:53: no such host
tracegen-1 exited with code 0

Do you know what I'm missing? Here is the full docker-compose setup if that helps.

yurishkuro · 2024-08-24T20:54:17Z

You're missing the network setting in the tracegen compose config.

mahadzaryab1 · 2024-08-24T21:50:35Z

@yurishkuro Thank you so much! As a follow-up, do you have any guidance (or an example, if we're already doing this somewhere) on how I can query the spans that are stored? If we set up a policy on the service name, then we can do an A/B test with and without sampling, as you suggested, by querying the spans from the store.

mahadzaryab1 · 2024-08-24T23:51:10Z

@yurishkuro I've been reading through the storage integration tests. If we add a filter on the service name in our tail sampling processor config, then can we do something similar to https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/integration/integration.go#L136 to get all the services from the traces we have collected (in memory) and ensure that we only have the ones we listed in our policy?

mahadzaryab1 · 2024-08-24T23:57:47Z

Also, do you have any thoughts on where the integration tests should live? It seems like the storage ones are in cmd/jaeger/internal/integration.

yurishkuro · 2024-08-25T00:27:22Z

> to get all the services from the traces we have collected (in memory) and ensure that we only have the ones we listed in our policy?

Conceptually yes, but sampling works at the trace level (all spans or nothing), so it still needs a bit more thought about how you'd test it.
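The trace-level behaviour can be made concrete with a simplified model of a string-attribute policy: once any span of a trace matches, every span of that trace is kept. This is not the tail_sampling processor's implementation; `span`, `groupByTrace`, and `sampleByAttribute` are hypothetical, attributes are flattened into one map, and the real processor also buffers spans for its configured decision_wait before deciding.

```go
package main

// span is a hypothetical, minimal span: a trace ID, a name, and string attributes.
type span struct {
	TraceID string
	Name    string
	Attrs   map[string]string
}

// groupByTrace buckets spans by trace ID, which a tail sampler must do before
// it can make a decision for the trace as a whole.
func groupByTrace(spans []span) map[string][]span {
	traces := map[string][]span{}
	for _, s := range spans {
		traces[s.TraceID] = append(traces[s.TraceID], s)
	}
	return traces
}

// sampleByAttribute keeps a whole trace (all of its spans) if any span has
// attrs[key] equal to one of values; otherwise the whole trace is dropped.
// The order of the returned spans is unspecified (map iteration order).
func sampleByAttribute(spans []span, key string, values ...string) []span {
	allowed := map[string]bool{}
	for _, v := range values {
		allowed[v] = true
	}
	var kept []span
	for _, tr := range groupByTrace(spans) {
		for _, s := range tr {
			if allowed[s.Attrs[key]] {
				kept = append(kept, tr...)
				break
			}
		}
	}
	return kept
}
```

One consequence for the test: if a trace rooted in tracegen-02 also contains a span from another service, that span is stored too, so assertions are more reliable when phrased in terms of whole traces rather than individual spans.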

yurishkuro · 2024-08-25T00:30:02Z

The test needs a new workflow yaml file and a new shell script to orchestrate. Then depending on how you want to do the analysis you might need more - at this point I don't know what you have in mind. If you need go code then cmd/jaeger/internal/integration would be a good place.

mahadzaryab1 · 2024-08-25T00:59:12Z

@yurishkuro Would it not be enough to have a string_attribute policy in the tail sampling processor that keeps only a single service, say tracegen-02? We can then use tracegen to generate traces for some number of services, say 5. If the tail sampling processor works as expected, we would store only tracegen-02 and discard the rest.

yurishkuro · 2024-08-25T01:34:33Z

That's fine as a first version, but it's not quite a robust test because it doesn't prove that service02 was actually generated in the first place. An A/B test wouldn't have that problem.

mahadzaryab1 · 2024-08-25T02:09:29Z

@yurishkuro I see. How about something like this?

1. Start the jaeger backend with the batch processor
2. Use tracegen to generate traces for 5 services
3. Query the data store to verify that traces are present for all 5 services
4. Flush the data store
5. Start the jaeger backend with a tail sampling processor that has a policy to sample only service.name=tracegen-02
6. Query the data store to verify that traces are present only for tracegen-02

mahadzaryab1 · 2024-08-25T17:29:20Z

@yurishkuro Would you be able to help me with the setup of the test? I'm stuck on the following:

- How do I query the data store?
- Can all of this be done through the shell script, or should I be writing Go code?
- If Go code needs to be written, does it make sense for it to go in cmd/jaeger/internal/integration? It seems like this directory is used to run all the storage-related integration tests from the Makefile (https://github.com/jaegertracing/jaeger/blob/main/Makefile#L7)
- Is the load balancer required for the integration tests? I'm not sure if there's something wrong with my setup, but if I comment out the load-balancing otel collector in the docker-compose file, the behaviour looks the same. Is this expected? (https://gist.github.com/mahadzaryab1/0b8ccc194421e00cd2e6dce6f450c424)

yurishkuro · 2024-08-25T22:43:10Z

It is easiest to query data store if you write Go code in cmd/jaeger/internal/integration, because that's exactly what the tests located there are doing - they are using an RPC implementation of SpanReader which proxies the requests to the query service running in the jaeger-v2 collector.

Load balancer is not necessary since you will only be running a single instance of the collector. The objective of the load balancer is to ensure that all spans for the same trace ID end up in the same instance of the collector that runs tail sampling logic.
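The routing idea behind the load balancer fits in a few lines: hash the trace ID and map it to one backend, so every span of a trace reaches the same collector. This is a deliberately simplified sketch using hash-mod routing over a fixed list; the real loadbalancingexporter uses a consistent-hashing ring so that adding or removing backends reshuffles fewer traces. `pickBackend` and the backend addresses are hypothetical.

```go
package main

import "hash/fnv"

// pickBackend deterministically maps a trace ID to one backend address.
// Every span carrying the same trace ID gets the same answer, which is what a
// tail-sampling tier needs in order to see complete traces.
func pickBackend(traceID string, backends []string) string {
	h := fnv.New32a()
	h.Write([]byte(traceID)) // writes to a hash.Hash never return an error
	return backends[h.Sum32()%uint32(len(backends))]
}
```

With a single backend every trace ID maps to that one instance, which is why the load balancer makes no observable difference in a single-collector test setup.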

mahadzaryab1 · 2024-08-26T03:07:11Z

@yurishkuro Got it! Thank you so much. A couple of follow-ups I had:

- Can we simply use the SpanWriter to send spans manually? Is there a reason why we want to use tracegen?
- If I write the tests in cmd/jaeger/internal/integration, will they be run as part of https://github.com/jaegertracing/jaeger/blob/main/Makefile#L140-L146? Is that fine? Do we need to set an environment variable to run this test, similar to STORAGE=?

yurishkuro · 2024-08-26T04:12:46Z

Yes, if you have to write code to read the traces, you might as well save whichever traces you need via the SpanWriter.

The test won't run automatically, because all of those integration tests require an environment variable to activate them. Otherwise they would all run at once, and most of them would fail because their storage backend wouldn't be available.

mahadzaryab1 · 2024-08-30T23:42:47Z

@yurishkuro I've completed the integration test and pushed it to #5878. Working on the final README task - should this go in jaeger/docker-compose/tail-sampling?

yurishkuro · 2024-08-31T03:09:33Z

yes please

mahadzaryab1 · 2024-08-31T16:59:41Z

@yurishkuro Thanks for all your help and guidance in completing this issue. I learnt a lot about OpenTelemetry, Jaeger, and distributed tracing. I'm very excited about this project and would like to keep contributing to it. Do you have any recommendations for what I can pick up next?

yurishkuro · 2024-08-31T17:19:53Z

@mahadzaryab1 appreciate the help. The top priority is completing the work on Jaeger-v2, which you can see in the project board https://github.com/orgs/jaegertracing/projects/3/views/2

dosubot bot added the area/sampling, changelog:new-feature, and v2 labels on Aug 20, 2024

yurishkuro added the help wanted and good first issue labels on Aug 20, 2024

yurishkuro mentioned this issue on Aug 24, 2024: Support Tail Based Sampling Processor From OTEL Collector Extension #5878 (merged)

yurishkuro closed this as completed in #5878 Aug 31, 2024

yurishkuro closed this as completed in 8ad6ed0 Aug 31, 2024
