Skip to content

Commit

Permalink
Support Tail Based Sampling Processor From OTEL Collector Extension (#…
Browse files Browse the repository at this point in the history
…5878)

## Which problem is this PR solving?
- Closes #5867

## Description of the changes
- Added the [tail-based sampling processor
extension](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md)
from otel to jaeger
- Added a `docker compose` to demonstrate usage of the tail-based
sampling processor extension in jaeger.
- Added an end to end integration test to test that the new processor
works as expected
- Added a README to the `docker compose` setup describing the setup and
usage of the new processor

## How was this change tested?
- An end to end integration test was added and is run from the CI

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <mahadzaryab1@gmail.com>
Signed-off-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <github@ysh.us>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
  • Loading branch information
3 people committed Aug 31, 2024
1 parent 9a30dfc commit 8ad6ed0
Show file tree
Hide file tree
Showing 13 changed files with 437 additions and 0 deletions.
41 changes: 41 additions & 0 deletions .github/workflows/ci-e2e-tailsampling-processor.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
name: Test Tail Sampling Processor

on:
push:
branches: [main]

pull_request:
branches: [main]

concurrency:
group: ${{ github.workflow }}-${{ (github.event.pull_request && github.event.pull_request.number) || github.ref || github.run_id }}
cancel-in-progress: true

# See https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions
permissions: # added using https://github.com/step-security/secure-workflows
contents: read

jobs:
tailsampling-processor:
runs-on: ubuntu-latest
steps:
- name: Harden Runner
uses: step-security/harden-runner@0d381219ddf674d61a7572ddd19d7941e271515c # v2.9.0
with:
egress-policy: audit # TODO: change to 'egress-policy: block' after couple of runs

- uses: actions/checkout@692973e3d937129bcbf40652eb9f2f61becf3332 # v4.1.7

- uses: actions/setup-go@0a12ed9d6a96ab950c8f026ed9f722fe0da7ef32 # v5.0.2
with:
go-version: 1.23.x

- name: Run Tail Sampling Processor Integration Test
run: |
make tail-sampling-integration-test
- name: Upload coverage to codecov
uses: ./.github/actions/upload-codecov
with:
files: cover.out
flags: tailsampling-processor
4 changes: 4 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -160,6 +160,10 @@ index-cleaner-integration-test: docker-images-elastic
index-rollover-integration-test: docker-images-elastic
$(MAKE) storage-integration-test COVEROUT=cover-index-rollover.out

.PHONY: tail-sampling-integration-test
tail-sampling-integration-test:
SAMPLING=tail $(MAKE) jaeger-v2-storage-integration-test

.PHONY: cover
cover: nocover
bash -c "set -e; set -o pipefail; STORAGE=memory $(GOTEST) -timeout 5m -coverprofile $(COVEROUT) ./... | tee test-results.json"
Expand Down
38 changes: 38 additions & 0 deletions cmd/jaeger/config-tail-sampling-always-sample.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
service:
extensions: [jaeger_storage, jaeger_query, healthcheckv2]
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
exporters: [jaeger_storage_exporter]
telemetry:
logs:
level: DEBUG

extensions:
healthcheckv2:
use_v2: true
http:
jaeger_query:
trace_storage: some_storage
jaeger_storage:
backends:
some_storage:
memory:
max_traces: 100000

receivers:
otlp:
protocols:
grpc:
http:
endpoint: "0.0.0.0:4318"

processors:
tail_sampling:
decision_wait: 5s
policies: [{ name: test-policy-1, type: always_sample }]

exporters:
jaeger_storage_exporter:
trace_storage: some_storage
46 changes: 46 additions & 0 deletions cmd/jaeger/config-tail-sampling-service-name-policy.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
service:
extensions: [jaeger_storage, jaeger_query, healthcheckv2]
pipelines:
traces:
receivers: [otlp]
processors: [tail_sampling]
exporters: [jaeger_storage_exporter]
telemetry:
logs:
level: DEBUG

extensions:
healthcheckv2:
use_v2: true
http:
jaeger_query:
trace_storage: some_storage
jaeger_storage:
backends:
some_storage:
memory:
max_traces: 100000

receivers:
otlp:
protocols:
grpc:
http:
endpoint: "0.0.0.0:4318"

processors:
tail_sampling:
decision_wait: 5s
policies:
[
{
name: filter-by-attribute,
type: string_attribute,
string_attribute:
{ key: service.name, values: [tracegen-00, tracegen-03] },
},
]

exporters:
jaeger_storage_exporter:
trace_storage: some_storage
2 changes: 2 additions & 0 deletions cmd/jaeger/internal/components.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@ import (
"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/kafkaexporter"
"github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusexporter"
"github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckv2extension"
"github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor"
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/jaegerreceiver"
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/kafkareceiver"
"github.com/open-telemetry/opentelemetry-collector-contrib/receiver/zipkinreceiver"
Expand Down Expand Up @@ -104,6 +105,7 @@ func (b builders) build() (otelcol.Factories, error) {
// standard
batchprocessor.NewFactory(),
memorylimiterprocessor.NewFactory(),
tailsamplingprocessor.NewFactory(),
// add-ons
adaptivesampling.NewFactory(),
)
Expand Down
95 changes: 95 additions & 0 deletions cmd/jaeger/internal/integration/tailsampling_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package integration

import (
"context"
"os"
"os/exec"
"sort"
"testing"
"time"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"

"github.com/jaegertracing/jaeger/plugin/storage/integration"
)

// TailSamplingIntegration contains the test components to perform an integration test
// for the Tail Sampling Processor.
type TailSamplingIntegration struct {
E2EStorageIntegration

// expectedServices contains a list of services that should be sampled in the test case.
expectedServices []string
}

// TestTailSamplingProcessor_EnforcesPolicies runs an A/B test to perform an integration test
// for the Tail Sampling Processor.
// - Test A uses a Jaeger config file with a tail sampling processor that has a policy for sampling
// all traces. In this test, we check that all services that are samples are stored.
// - Test B uses a Jaeger config file with a tail sampling processor that has a policy to sample
// traces using on the `service.name` attribute. In this test, we check that only the services
// listed as part of the policy in the config file are stored.
func TestTailSamplingProcessor_EnforcesPolicies(t *testing.T) {
if env := os.Getenv("SAMPLING"); env != "tail" {
t.Skipf("This test requires environment variable SAMPLING=tail")
}

expectedServicesA := []string{"tracegen-00", "tracegen-01", "tracegen-02", "tracegen-03", "tracegen-04"}
tailSamplingA := &TailSamplingIntegration{
E2EStorageIntegration: E2EStorageIntegration{
ConfigFile: "../../config-tail-sampling-always-sample.yaml",
StorageIntegration: integration.StorageIntegration{
CleanUp: purge,
},
},
expectedServices: expectedServicesA,
}

expectedServicesB := []string{"tracegen-00", "tracegen-03"}
tailSamplingB := &TailSamplingIntegration{
E2EStorageIntegration: E2EStorageIntegration{
ConfigFile: "../../config-tail-sampling-service-name-policy.yaml",
StorageIntegration: integration.StorageIntegration{
CleanUp: purge,
},
},
expectedServices: expectedServicesB,
}

t.Run("sample_all", tailSamplingA.testTailSamplingProccessor)
t.Run("sample_some", tailSamplingB.testTailSamplingProccessor)
}

// testTailSamplingProccessor performs the following steps:
// 1. Initialize the test case by starting the Jaeger V2 collector
// 2. Generate 5 traces using `tracegen` with one service per trace
// 3. Read the stored services from the memory store
// 4. Check that the sampled services match what is expected
func (ts *TailSamplingIntegration) testTailSamplingProccessor(t *testing.T) {
ts.e2eInitialize(t, "memory")
ts.generateTraces(t)

var actual []string
assert.Eventually(t, func() bool {
var err error
actual, err = ts.SpanReader.GetServices(context.Background())
require.NoError(t, err)
sort.Strings(actual)
return assert.ObjectsAreEqualValues(ts.expectedServices, actual)
}, 100*time.Second, 15*time.Second)

t.Logf("Expected: %v", ts.expectedServices)
t.Logf("Actual : %v", actual)
}

// generateTraces generates 5 traces using `tracegen` with one service per trace
func (*TailSamplingIntegration) generateTraces(t *testing.T) {
tracegenCmd := exec.Command("go", "run", "../../../../cmd/tracegen", "-traces", "5", "-services", "5")
out, err := tracegenCmd.CombinedOutput()
require.NoError(t, err)
t.Logf("tracegen completed: %s", out)
}
30 changes: 30 additions & 0 deletions docker-compose/tail-sampling/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Copyright (c) 2024 The Jaeger Authors.
# SPDX-License-Identifier: Apache-2.0

BINARY ?= jaeger

.PHONY: build
build: clean-jaeger
cd ../../ && make build-$(BINARY) GOOS=linux
cd ../../ && make create-baseimg PLATFORMS=linux/$(shell go env GOARCH)
cd ../../ && docker buildx build --target release \
--tag jaegertracing/$(BINARY):dev \
--build-arg base_image=localhost:5000/baseimg_alpine:latest \
--build-arg debug_image=not-used \
--build-arg TARGETARCH=$(shell go env GOARCH) \
--load \
cmd/$(BINARY)

.PHONY: dev
dev: export JAEGER_IMAGE_TAG = dev
dev: build
docker compose -f docker-compose.yml up $(DOCKER_COMPOSE_ARGS)

.PHONY: clean-jaeger
clean-jaeger:
# Also cleans up intermediate cached containers.
docker system prune -f

.PHONY: clean-all
clean-all: clean-jaeger
docker rmi -f otel/opentelemetry-collector-contrib:latest
37 changes: 37 additions & 0 deletions docker-compose/tail-sampling/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Tail-Based Sampling Processor

This `docker compose` environment provides a sample configuration of a Jaeger collector utilizing the
[Tail-Based Sampling Processor in OpenTelemtry](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md).

## Description of Setup

The `docker-compose.yml` contains three services and their functions are outlined as follows:

1. `jaeger` - This is the Jaeger V2 collector that samples traces using the `tail_sampling` processor.
The configuration for this service is in [jaeger-v2-config.yml](./jaeger-v2-config.yml).
The `tail_sampling` processor has one policy that only captures traces from the services `tracegen-02` and `tracegen-04`.
For a full list of policies that can be added to the `tail_sampling` processor, check out [this README](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md).
2. `otel_collector` - This is an OpenTelemtry collector with a `loadbalancing` exporter that routes requests to `jaeger`.
The configuration for this service is in [otel-collector-config-connector.yml](./otel-collector-config-connector.yml).
The purpose of this collector is to collect spans from different services and forward all spans with the same `traceID`
to the same downstream collector instance (`jaeger` in this case), so that sampling decisions for a given trace can be
made in the same collector instance.
3. `tracegen` - This is a service that generates traces for 5 different services and sends them to `otel_collector`
(which will in turn send them to `jaeger`).

Note that in this minimal setup, a `loadbalancer` collector is not necessary since we are only running a
single instance of the `jaeger` collector. In a real-world distributed system running multiple instances
of the `jaeger` collector, a load balancer is necessary to avoid spans from the same trace being routed
to different collector instances.

## Running the Example

The example can be run using the following command:

```bash
make dev
```

To see the tail-based sampling processor in action, go to the Jaeger UI at <http://localhost:16686/>.
You will see that only traces for the services outlined in the policy in [jaeger-v2-config.yml](./jaeger-v2-config.yml)
are sampled.
36 changes: 36 additions & 0 deletions docker-compose/tail-sampling/docker-compose.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
services:
jaeger:
networks:
backend:
image: jaegertracing/jaeger:${JAEGER_IMAGE_TAG:-latest}
volumes:
- "./jaeger-v2-config.yml:/etc/jaeger/config.yml"
command: ["--config", "/etc/jaeger/config.yml"]
ports:
- "16686:16686"
- "4317"

otel_collector:
networks:
backend:
image: otel/opentelemetry-collector-contrib:${OTEL_IMAGE_TAG:-0.108.0}
volumes:
- ${OTEL_CONFIG_SRC:-./otel-collector-config-connector.yml}:/etc/otelcol/otel-collector-config.yml
command: --config /etc/otelcol/otel-collector-config.yml
depends_on:
- jaeger
ports:
- "4318"

tracegen:
networks:
- backend
image: jaegertracing/jaeger-tracegen:latest
environment:
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://otel_collector:4318/v1/traces
command: ["-workers", "3", "-pause", "250ms", "-services", "5", "-duration", "10s"]
depends_on:
- jaeger

networks:
backend:
Loading

0 comments on commit 8ad6ed0

Please sign in to comment.