feat(llmobs): submit span events rather than llmobs records #8339

Merged: 38 commits merged from yunkim/llmobs-span into main on Feb 15, 2024

Conversation


@Yun-Kim Yun-Kim commented Feb 8, 2024

Summary

This PR does a few things:

  • Replaces LLMObs records with LLMObs span events (to conform to the LLMObs intake payload structure changes).
  • The LLMObsWriter class switches from submitting LLMObs records to submitting span events.
  • LLMObsWriter is now a single class owned by the LLMObs service instance, rather than each LLMIntegration class owning its own LLMObsWriter instance.
  • LLMIntegration classes now mark LLM (completion/chat) spans and set temporary ml_obs tags, which the LLMObs service's LLMObsTraceProcessor extracts to create span events and submit them to the LLMObsWriter, which writes them to the LLMObs intake.
  • Removes the now-unnecessary DD_APP_KEY config option.
  • Introduces a new span type, llm, which the LLMObsTraceProcessor uses to identify spans that should also be converted to LLMObs span events and submitted to LLMObs.
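The new `llm` span type check can be illustrated with a minimal sketch (the `Span` stand-in and the `extract_llm_spans` helper are hypothetical, not the actual ddtrace implementation):

```python
from typing import Dict, List


class Span:
    """Minimal stand-in for an APM span (hypothetical, for illustration only)."""

    def __init__(self, span_type: str, tags: Dict[str, str]):
        self.span_type = span_type
        self.tags = tags


def extract_llm_spans(trace: List[Span]) -> List[Span]:
    """Keep only spans marked with the new 'llm' span type."""
    return [span for span in trace if span.span_type == "llm"]


trace = [Span("llm", {"ml_obs.model_name": "gpt-4"}), Span("http", {})]
llm_spans = extract_llm_spans(trace)
```

A processor along these lines would only ever hand LLM-type spans to the span event conversion step, leaving ordinary APM spans untouched.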

LLMObs Records --> LLMObs Span Event

The LLMIntegration classes currently generate LLMObs records and pass them to their LLMObsWriter instances, which submit them to the LLMObs record intake. However, the LLMObs intake is being updated to accept span events, so this PR adds support for submitting span events (and removes support for submitting LLMObs records). The new span event structure looks something like this:

from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LLMObsEvent:
    span_id: str
    trace_id: str
    parent_id: str
    session_id: str
    tags: List[str]
    name: str
    start_ns: int
    duration: float
    error: int
    meta: Dict[str, Any]
    metrics: Dict[str, Any]

This also includes some changes to the LLMObs writer:

  • Submits payloads to a new intake URL: https://llmobs-intake.{DD_SITE}/api/v2/llmobs.
  • Removes all previous workarounds for submitting only one record per interval, as the new span event structure has none of the previous limitations.
  • Removes DD_APP_KEY as a config option, as the new endpoint does not need it (only the API key is required).
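As a rough illustration of how the new intake URL could be derived from DD_SITE (the helper name and the default-site fallback are assumptions, not the actual ddtrace code):

```python
import os
from typing import Optional


def llmobs_intake_url(site: Optional[str] = None) -> str:
    """Build the new LLMObs span event intake URL from DD_SITE.

    Hypothetical sketch: the default fallback site is an assumption.
    """
    site = site or os.environ.get("DD_SITE", "datadoghq.com")
    return f"https://llmobs-intake.{site}/api/v2/llmobs"


url = llmobs_intake_url("datadoghq.com")
```

Because the endpoint only needs the API key, a writer posting to this URL would send a `DD-API-KEY` style header rather than requiring an application key.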

New Workflow (who owns the LLMObsWriter now?)

Currently, the workflow for LLMObs involves something like this:

1. The OpenAI integration traces an LLM call.
2. If `DD_LLMOBS_ENABLED=1`, the OpenAIIntegration instance generates an LLMObs record and passes it to its `LLMObsWriter` thread to submit to LLMObs.

The proposed workflow looks something like this:

1. If `DD_LLMOBS_ENABLED=1`, the LLMObs service is enabled automatically on ddtrace-run/openai/bedrock integration load.
2. The LLMObs service starts its own LLMObsWriter thread (instead of LLMIntegrations owning their own writer threads).
3. The LLMObs service also adds an LLMObsTraceProcessor to the active tracer.
4. The OpenAIIntegration traces an LLM call (using the same tracer) and marks that span as LLM-type. It also sets temporary `ml_obs.*` tags on the span.
5. Once the span finishes, it triggers the tracer's trace processors.
6. The LLMObsTraceProcessor takes all traces containing LLM-type spans, removes the `ml_obs.*` tags to create an LLMObs span event, and submits it to LLMObs via its LLMObsWriter thread.

Note: in this approach, we create temporary ml_obs.* tags on the APM spans and then remove them to populate the LLMObs span event submitted to LLMObs. These temporary tags are only created if LLMObs is enabled, and they are never submitted on the APM span object.
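Step 6 above can be sketched as follows (the `pop_ml_obs_tags` helper is hypothetical; it only illustrates popping the temporary tags so they populate the span event's meta without ever shipping on the APM span):

```python
from typing import Dict

PREFIX = "ml_obs."


def pop_ml_obs_tags(span_tags: Dict[str, str]) -> Dict[str, str]:
    """Extract and remove temporary ml_obs.* tags from a span's tag dict.

    Returns the extracted values (prefix stripped) for the span event;
    mutates span_tags so the APM span never carries them.
    """
    extracted = {
        key[len(PREFIX):]: value
        for key, value in span_tags.items()
        if key.startswith(PREFIX)
    }
    for key in list(span_tags):
        if key.startswith(PREFIX):
            del span_tags[key]
    return extracted


tags = {"ml_obs.model_name": "gpt-4", "http.status_code": "200"}
event_meta = pop_ml_obs_tags(tags)
```

After the call, `event_meta` holds the LLMObs-specific data and `tags` retains only the ordinary APM tags, matching the guarantee in the note above.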

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.
  • If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim added the changelog/no-changelog (A changelog entry is not required for this PR.) and MLObs (ML Observability (LLMObs)) labels Feb 8, 2024
@Yun-Kim Yun-Kim changed the base branch from main to yunkim/llmobs-service February 8, 2024 21:34
pr-commenter bot commented Feb 8, 2024

Benchmarks

Benchmark execution time: 2024-02-14 23:07:55

Comparing candidate commit c7d15a1 in PR branch yunkim/llmobs-span with baseline commit 56422f3 in branch main.

Found 20 performance improvements and 17 performance regressions! Performance is the same for 158 metrics, 9 unstable metrics.

scenario:coreapiscenario-context_with_data_no_listeners

  • 🟩 max_rss_usage [-859.684KB; -686.965KB] or [-2.867%; -2.291%]

scenario:coreapiscenario-core_dispatch_no_listeners

  • 🟥 max_rss_usage [+614.164KB; +791.583KB] or [+2.104%; +2.711%]

scenario:coreapiscenario-core_dispatch_only_all_listeners

  • 🟩 max_rss_usage [-844.583KB; -688.960KB] or [-2.814%; -2.296%]

scenario:coreapiscenario-core_dispatch_with_results_no_listeners

  • 🟩 max_rss_usage [-863.504KB; -704.035KB] or [-2.870%; -2.340%]

scenario:flasksimple-appsec-telemetry

  • 🟥 execution_time [+215.377µs; +264.372µs] or [+3.408%; +4.183%]

scenario:httppropagationextract-datadog_tracecontext_tracestate_propagated_on_trace_id_match

  • 🟩 max_rss_usage [-747.464KB; -630.840KB] or [-2.491%; -2.102%]

scenario:httppropagationextract-empty_headers

  • 🟥 max_rss_usage [+697.845KB; +813.989KB] or [+2.384%; +2.780%]

scenario:httppropagationextract-invalid_span_id_header

  • 🟩 max_rss_usage [-761.802KB; -645.583KB] or [-2.539%; -2.151%]

scenario:httppropagationextract-invalid_trace_id_header

  • 🟥 max_rss_usage [+635.393KB; +747.416KB] or [+2.171%; +2.553%]

scenario:httppropagationextract-large_valid_headers_all

  • 🟩 max_rss_usage [-831.718KB; -724.762KB] or [-2.766%; -2.410%]

scenario:httppropagationextract-medium_valid_headers_all

  • 🟩 max_rss_usage [-758.594KB; -637.323KB] or [-2.526%; -2.122%]

scenario:httppropagationextract-none_propagation_style

  • 🟩 max_rss_usage [-820.634KB; -704.307KB] or [-2.735%; -2.348%]

scenario:httppropagationextract-valid_headers_all

  • 🟩 max_rss_usage [-768.650KB; -642.422KB] or [-2.565%; -2.144%]

scenario:httppropagationextract-valid_headers_basic

  • 🟥 max_rss_usage [+686.972KB; +803.562KB] or [+2.348%; +2.746%]

scenario:httppropagationextract-wsgi_invalid_trace_id_header

  • 🟥 max_rss_usage [+666.992KB; +770.704KB] or [+2.279%; +2.633%]

scenario:httppropagationextract-wsgi_large_header_no_matches

  • 🟥 max_rss_usage [+669.199KB; +767.678KB] or [+2.282%; +2.618%]

scenario:httppropagationextract-wsgi_large_valid_headers_all

  • 🟩 max_rss_usage [-732.367KB; -610.711KB] or [-2.443%; -2.038%]

scenario:httppropagationextract-wsgi_medium_header_no_matches

  • 🟥 max_rss_usage [+678.936KB; +774.325KB] or [+2.318%; +2.644%]

scenario:httppropagationextract-wsgi_medium_valid_headers_all

  • 🟩 max_rss_usage [-749.522KB; -642.708KB] or [-2.499%; -2.143%]

scenario:httppropagationextract-wsgi_valid_headers_all

  • 🟩 max_rss_usage [-817.060KB; -703.785KB] or [-2.720%; -2.343%]

scenario:httppropagationinject-with_all

  • 🟥 max_rss_usage [+713.624KB; +788.789KB] or [+2.437%; +2.694%]

scenario:httppropagationinject-with_dd_origin

  • 🟥 max_rss_usage [+705.559KB; +790.710KB] or [+2.412%; +2.703%]

scenario:httppropagationinject-with_priority_and_origin

  • 🟩 max_rss_usage [-1042.517KB; -956.331KB] or [-3.472%; -3.185%]

scenario:httppropagationinject-with_sampling_priority

  • 🟥 max_rss_usage [+697.334KB; +783.779KB] or [+2.386%; +2.682%]

scenario:httppropagationinject-with_tags

  • 🟩 max_rss_usage [-784.588KB; -692.020KB] or [-2.615%; -2.307%]

scenario:httppropagationinject-with_tags_max_size

  • 🟩 max_rss_usage [-734.773KB; -668.926KB] or [-2.450%; -2.230%]

scenario:otelspan-start

  • 🟥 max_rss_usage [+4.014MB; +4.126MB] or [+8.520%; +8.757%]

scenario:otelspan-start-finish-telemetry

  • 🟩 max_rss_usage [-1014.751KB; -893.576KB] or [-3.238%; -2.852%]

scenario:sethttpmeta-all-disabled

  • 🟥 max_rss_usage [+624.462KB; +727.628KB] or [+2.109%; +2.457%]

scenario:sethttpmeta-collectipvariant_exists

  • 🟩 max_rss_usage [-745.564KB; -616.765KB] or [-2.457%; -2.032%]

scenario:sethttpmeta-no-collectipvariant

  • 🟥 max_rss_usage [+604.403KB; +721.063KB] or [+2.040%; +2.434%]

scenario:sethttpmeta-obfuscation-no-query

  • 🟥 max_rss_usage [+642.239KB; +749.581KB] or [+2.169%; +2.531%]

scenario:sethttpmeta-obfuscation-regular-case-implicit-query

  • 🟩 max_rss_usage [-764.345KB; -654.100KB] or [-2.506%; -2.144%]

scenario:sethttpmeta-obfuscation-send-querystring-disabled

  • 🟩 max_rss_usage [-787.907KB; -654.294KB] or [-2.580%; -2.143%]

scenario:sethttpmeta-obfuscation-worst-case-implicit-query

  • 🟩 max_rss_usage [-802.142KB; -678.152KB] or [-2.626%; -2.220%]

scenario:sethttpmeta-useragentvariant_not_exists_1

  • 🟥 max_rss_usage [+618.106KB; +753.644KB] or [+2.087%; +2.545%]

scenario:span-start-finish

  • 🟥 max_rss_usage [+0.947MB; +1.068MB] or [+3.259%; +3.678%]

@Yun-Kim Yun-Kim mentioned this pull request Feb 9, 2024
@Yun-Kim Yun-Kim marked this pull request as ready for review February 13, 2024 22:17
@Yun-Kim Yun-Kim requested review from a team as code owners February 13, 2024 22:17
@sabrenner sabrenner left a comment

lgtm!

- Remove apm_context
- Move kind --> meta.span.kind
- Replace status --> error (0=ok, 1=err)
- Move status_message --> meta.error.message
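The suggested field moves could be sketched as a transform over an event dict (the field names come from the comment above; the transform function itself is hypothetical, not code from this PR):

```python
from typing import Any, Dict


def apply_review_field_moves(event: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the reviewer's suggested restructuring to a span event dict."""
    event = dict(event)  # shallow copy; don't mutate the caller's dict
    event.pop("apm_context", None)  # remove apm_context
    meta = event.setdefault("meta", {})
    if "kind" in event:
        meta["span.kind"] = event.pop("kind")  # kind -> meta.span.kind
    if "status" in event:
        # status -> error (0=ok, 1=err)
        event["error"] = 0 if event.pop("status") == "ok" else 1
    if "status_message" in event:
        # status_message -> meta.error.message
        meta["error.message"] = event.pop("status_message")
    return event
```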
@Yun-Kim Yun-Kim enabled auto-merge (squash) February 15, 2024 19:20
@Yun-Kim Yun-Kim merged commit 45e677c into main Feb 15, 2024
149 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/llmobs-span branch February 15, 2024 19:55
Yun-Kim added a commit that referenced this pull request Apr 16, 2024
…8339 to 2.8] (#8986)

Backport #8339 to 2.8.

This PR changes the langchain integration such that it will check for
partner libraries before attempting to patch them.

The langchain integration patches `langchain_openai.OpenAIEmbeddings.*`
and `langchain_pinecone.PineconeVectorStore.*`, which are partner
libraries that are not required to be installed. Currently if they are
not available, we raise `ModuleNotFoundError`. This PR fixes this so
that we'll skip patching those methods if the corresponding partner
library is not available.

Additionally, this PR also adds importing
`langchain_community.llm/chat_models` as those are not automatically
imported by importing `langchain_community`. This is important as we
reference those submodules in our patch code later on.


## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance

policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)