feat(llmobs): submit span events rather than llmobs records #8339

Merged: 38 commits merged from yunkim/llmobs-span into main on Feb 15, 2024

Conversation


@Yun-Kim Yun-Kim commented Feb 8, 2024

Summary

This PR does a few things:

  • Replaces LLMObs records with LLMObs span events (to conform to the LLMObs intake payload structure changes).
  • The LLMObsWriter class switches from submitting LLMObs records to submitting span events.
  • LLMObsWriter is now a single class owned by the LLMObs service instance, rather than each LLMIntegration class owning its own LLMObsWriter instance.
  • LLMIntegration classes now mark LLM (completion/chat) spans and set temporary ml_obs tags, which the LLMObs service's LLMObsTraceProcessor extracts to create span events and submit them to the LLMObsWriter, which writes them to the LLMObs intake.
  • Removes the now-unnecessary DD_APP_KEY config option.
  • Introduces a new span type, llm, which the LLMObsTraceProcessor uses to identify spans that should also be converted to LLMObs span events and submitted to LLMObs.
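The new `llm` span type check can be illustrated with a minimal sketch (the `Span` stand-in and the `extract_llm_spans` helper are hypothetical, not the actual ddtrace implementation):

```python
from typing import Dict, List


class Span:
    """Minimal stand-in for an APM span (hypothetical, for illustration only)."""

    def __init__(self, span_type: str, tags: Dict[str, str]):
        self.span_type = span_type
        self.tags = tags


def extract_llm_spans(trace: List[Span]) -> List[Span]:
    """Keep only spans marked with the new 'llm' span type."""
    return [span for span in trace if span.span_type == "llm"]


trace = [Span("llm", {"ml_obs.model_name": "gpt-4"}), Span("http", {})]
llm_spans = extract_llm_spans(trace)
```

A processor along these lines would only ever hand LLM-type spans to the span event conversion step, leaving ordinary APM spans untouched.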

LLMObs Records --> LLMObs Span Event

The LLMIntegration classes currently generate LLMObs records and pass them to their LLMObsWriter instances, which submit them to the LLMObs record intake. However, the LLMObs intake is being updated to accept span events, so this PR adds support for submitting span events (and removes support for submitting LLMObs records). The new span event structure looks something like this:

from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class LLMObsEvent:
    span_id: str
    trace_id: str
    parent_id: str
    session_id: str
    tags: List[str]
    name: str
    start_ns: int
    duration: float
    error: int
    meta: Dict[str, Any]
    metrics: Dict[str, Any]

This also includes some changes to the LLMObs writer:

  • Submits payloads to a new intake URL: https://llmobs-intake.{DD_SITE}/api/v2/llmobs.
  • Removes all previous workarounds for submitting only one record per interval, as the new span event structure has none of the previous limitations.
  • Removes DD_APP_KEY as a config option, as the new endpoint does not need it (only the API key is required).
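As a rough illustration of how the new intake URL could be derived from DD_SITE (the helper name and the default-site fallback are assumptions, not the actual ddtrace code):

```python
import os
from typing import Optional


def llmobs_intake_url(site: Optional[str] = None) -> str:
    """Build the new LLMObs span event intake URL from DD_SITE.

    Hypothetical sketch: the default fallback site is an assumption.
    """
    site = site or os.environ.get("DD_SITE", "datadoghq.com")
    return f"https://llmobs-intake.{site}/api/v2/llmobs"


url = llmobs_intake_url("datadoghq.com")
```

Because the endpoint only needs the API key, a writer posting to this URL would send a `DD-API-KEY` style header rather than requiring an application key.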

New Workflow (who owns the LLMObsWriter now?)

Currently, the workflow for LLMObs involves something like this:

1. The OpenAI integration traces an LLM call.
2. If `DD_LLMOBS_ENABLED=1`, the OpenAIIntegration instance generates an LLMObs record and passes it to its `LLMObsWriter` thread to submit to LLMObs.

The proposed workflow looks something like this:

1. If `DD_LLMOBS_ENABLED=1`, the LLMObs service is enabled automatically on ddtrace-run/openai/bedrock integration load.
2. The LLMObs service starts its own LLMObsWriter thread (instead of LLMIntegrations owning their own writer threads).
3. The LLMObs service also adds an LLMObsTraceProcessor to the active tracer.
4. The OpenAIIntegration traces an LLM call (using the same tracer) and marks that span as LLM-type. It also sets temporary `ml_obs.*` tags on the span.
5. Once the span finishes, it triggers the tracer's trace processors.
6. The LLMObsTraceProcessor takes all traces containing LLM-type spans, removes the `ml_obs.*` tags to create an LLMObs span event, and submits it to LLMObs via its LLMObsWriter thread.

Note: in this approach, we create temporary ml_obs.* tags on the APM spans and then remove them to populate the LLMObs span event submitted to LLMObs. These temporary tags are only created if LLMObs is enabled, and they are never submitted on the APM span object.
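Step 6 above can be sketched as follows (the `pop_ml_obs_tags` helper is hypothetical; it only illustrates popping the temporary tags so they populate the span event's meta without ever shipping on the APM span):

```python
from typing import Dict

PREFIX = "ml_obs."


def pop_ml_obs_tags(span_tags: Dict[str, str]) -> Dict[str, str]:
    """Extract and remove temporary ml_obs.* tags from a span's tag dict.

    Returns the extracted values (prefix stripped) for the span event;
    mutates span_tags so the APM span never carries them.
    """
    extracted = {
        key[len(PREFIX):]: value
        for key, value in span_tags.items()
        if key.startswith(PREFIX)
    }
    for key in list(span_tags):
        if key.startswith(PREFIX):
            del span_tags[key]
    return extracted


tags = {"ml_obs.model_name": "gpt-4", "http.status_code": "200"}
event_meta = pop_ml_obs_tags(tags)
```

After the call, `event_meta` holds the LLMObs-specific data and `tags` retains only the ordinary APM tags, matching the guarantee in the note above.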

Checklist

  • Change(s) are motivated and described in the PR description
  • Testing strategy is described if automated tests are not included in the PR
  • Risks are described (performance impact, potential for breakage, maintainability)
  • Change is maintainable (easy to change, telemetry, documentation)
  • Library release note guidelines are followed or label changelog/no-changelog is set
  • Documentation is included (in-code, generated user docs, public corp docs)
  • Backport labels are set (if applicable)
  • If this PR changes the public interface, I've notified @DataDog/apm-tees.
  • If change touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.

Reviewer Checklist

  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Description motivates each change
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Change is maintainable (easy to change, telemetry, documentation)
  • Release note makes sense to a user of the library
  • Author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim added the changelog/no-changelog (A changelog entry is not required for this PR.) and MLObs (ML Observability (LLMObs)) labels Feb 8, 2024
@Yun-Kim Yun-Kim changed the base branch from main to yunkim/llmobs-service February 8, 2024 21:34
pr-commenter bot commented Feb 8, 2024

Benchmarks

Benchmark execution time: 2024-02-14 23:07:55

Comparing candidate commit c7d15a1 in PR branch yunkim/llmobs-span with baseline commit 56422f3 in branch main.

Found 20 performance improvements and 17 performance regressions! Performance is the same for 158 metrics, 9 unstable metrics.

scenario:coreapiscenario-context_with_data_no_listeners

  • 🟩 max_rss_usage [-859.684KB; -686.965KB] or [-2.867%; -2.291%]

scenario:coreapiscenario-core_dispatch_no_listeners

  • 🟥 max_rss_usage [+614.164KB; +791.583KB] or [+2.104%; +2.711%]

scenario:coreapiscenario-core_dispatch_only_all_listeners

  • 🟩 max_rss_usage [-844.583KB; -688.960KB] or [-2.814%; -2.296%]

scenario:coreapiscenario-core_dispatch_with_results_no_listeners

  • 🟩 max_rss_usage [-863.504KB; -704.035KB] or [-2.870%; -2.340%]

scenario:flasksimple-appsec-telemetry

  • 🟥 execution_time [+215.377µs; +264.372µs] or [+3.408%; +4.183%]

scenario:httppropagationextract-datadog_tracecontext_tracestate_propagated_on_trace_id_match

  • 🟩 max_rss_usage [-747.464KB; -630.840KB] or [-2.491%; -2.102%]

scenario:httppropagationextract-empty_headers

  • 🟥 max_rss_usage [+697.845KB; +813.989KB] or [+2.384%; +2.780%]

scenario:httppropagationextract-invalid_span_id_header

  • 🟩 max_rss_usage [-761.802KB; -645.583KB] or [-2.539%; -2.151%]

scenario:httppropagationextract-invalid_trace_id_header

  • 🟥 max_rss_usage [+635.393KB; +747.416KB] or [+2.171%; +2.553%]

scenario:httppropagationextract-large_valid_headers_all

  • 🟩 max_rss_usage [-831.718KB; -724.762KB] or [-2.766%; -2.410%]

scenario:httppropagationextract-medium_valid_headers_all

  • 🟩 max_rss_usage [-758.594KB; -637.323KB] or [-2.526%; -2.122%]

scenario:httppropagationextract-none_propagation_style

  • 🟩 max_rss_usage [-820.634KB; -704.307KB] or [-2.735%; -2.348%]

scenario:httppropagationextract-valid_headers_all

  • 🟩 max_rss_usage [-768.650KB; -642.422KB] or [-2.565%; -2.144%]

scenario:httppropagationextract-valid_headers_basic

  • 🟥 max_rss_usage [+686.972KB; +803.562KB] or [+2.348%; +2.746%]

scenario:httppropagationextract-wsgi_invalid_trace_id_header

  • 🟥 max_rss_usage [+666.992KB; +770.704KB] or [+2.279%; +2.633%]

scenario:httppropagationextract-wsgi_large_header_no_matches

  • 🟥 max_rss_usage [+669.199KB; +767.678KB] or [+2.282%; +2.618%]

scenario:httppropagationextract-wsgi_large_valid_headers_all

  • 🟩 max_rss_usage [-732.367KB; -610.711KB] or [-2.443%; -2.038%]

scenario:httppropagationextract-wsgi_medium_header_no_matches

  • 🟥 max_rss_usage [+678.936KB; +774.325KB] or [+2.318%; +2.644%]

scenario:httppropagationextract-wsgi_medium_valid_headers_all

  • 🟩 max_rss_usage [-749.522KB; -642.708KB] or [-2.499%; -2.143%]

scenario:httppropagationextract-wsgi_valid_headers_all

  • 🟩 max_rss_usage [-817.060KB; -703.785KB] or [-2.720%; -2.343%]

scenario:httppropagationinject-with_all

  • 🟥 max_rss_usage [+713.624KB; +788.789KB] or [+2.437%; +2.694%]

scenario:httppropagationinject-with_dd_origin

  • 🟥 max_rss_usage [+705.559KB; +790.710KB] or [+2.412%; +2.703%]

scenario:httppropagationinject-with_priority_and_origin

  • 🟩 max_rss_usage [-1042.517KB; -956.331KB] or [-3.472%; -3.185%]

scenario:httppropagationinject-with_sampling_priority

  • 🟥 max_rss_usage [+697.334KB; +783.779KB] or [+2.386%; +2.682%]

scenario:httppropagationinject-with_tags

  • 🟩 max_rss_usage [-784.588KB; -692.020KB] or [-2.615%; -2.307%]

scenario:httppropagationinject-with_tags_max_size

  • 🟩 max_rss_usage [-734.773KB; -668.926KB] or [-2.450%; -2.230%]

scenario:otelspan-start

  • 🟥 max_rss_usage [+4.014MB; +4.126MB] or [+8.520%; +8.757%]

scenario:otelspan-start-finish-telemetry

  • 🟩 max_rss_usage [-1014.751KB; -893.576KB] or [-3.238%; -2.852%]

scenario:sethttpmeta-all-disabled

  • 🟥 max_rss_usage [+624.462KB; +727.628KB] or [+2.109%; +2.457%]

scenario:sethttpmeta-collectipvariant_exists

  • 🟩 max_rss_usage [-745.564KB; -616.765KB] or [-2.457%; -2.032%]

scenario:sethttpmeta-no-collectipvariant

  • 🟥 max_rss_usage [+604.403KB; +721.063KB] or [+2.040%; +2.434%]

scenario:sethttpmeta-obfuscation-no-query

  • 🟥 max_rss_usage [+642.239KB; +749.581KB] or [+2.169%; +2.531%]

scenario:sethttpmeta-obfuscation-regular-case-implicit-query

  • 🟩 max_rss_usage [-764.345KB; -654.100KB] or [-2.506%; -2.144%]

scenario:sethttpmeta-obfuscation-send-querystring-disabled

  • 🟩 max_rss_usage [-787.907KB; -654.294KB] or [-2.580%; -2.143%]

scenario:sethttpmeta-obfuscation-worst-case-implicit-query

  • 🟩 max_rss_usage [-802.142KB; -678.152KB] or [-2.626%; -2.220%]

scenario:sethttpmeta-useragentvariant_not_exists_1

  • 🟥 max_rss_usage [+618.106KB; +753.644KB] or [+2.087%; +2.545%]

scenario:span-start-finish

  • 🟥 max_rss_usage [+0.947MB; +1.068MB] or [+3.259%; +3.678%]

@Yun-Kim Yun-Kim mentioned this pull request Feb 9, 2024
@Yun-Kim Yun-Kim marked this pull request as ready for review February 13, 2024 22:17
@Yun-Kim Yun-Kim requested review from a team as code owners February 13, 2024 22:17
@sabrenner sabrenner left a comment

lgtm!

- Remove apm_context
- Move kind --> meta.span.kind
- Replace status --> error (0=ok, 1=err)
- Move status_message --> meta.error.message
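The suggested field moves could be sketched as a transform over an event dict (the field names come from the comment above; the transform function itself is hypothetical, not code from this PR):

```python
from typing import Any, Dict


def apply_review_field_moves(event: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the reviewer's suggested restructuring to a span event dict."""
    event = dict(event)  # shallow copy; don't mutate the caller's dict
    event.pop("apm_context", None)  # remove apm_context
    meta = event.setdefault("meta", {})
    if "kind" in event:
        meta["span.kind"] = event.pop("kind")  # kind -> meta.span.kind
    if "status" in event:
        # status -> error (0=ok, 1=err)
        event["error"] = 0 if event.pop("status") == "ok" else 1
    if "status_message" in event:
        # status_message -> meta.error.message
        meta["error.message"] = event.pop("status_message")
    return event
```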
@Yun-Kim Yun-Kim enabled auto-merge (squash) February 15, 2024 19:20
@Yun-Kim Yun-Kim merged commit 45e677c into main Feb 15, 2024
149 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/llmobs-span branch February 15, 2024 19:55
Yun-Kim added a commit that referenced this pull request Apr 16, 2024
…8339 to 2.8] (#8986)

Backport #8339 to 2.8.

This PR changes the langchain integration such that it will check for
partner libraries before attempting to patch them.

The langchain integration patches `langchain_openai.OpenAIEmbeddings.*`
and `langchain_pinecone.PineconeVectorStore.*`, which are partner
libraries that are not required to be installed. Currently if they are
not available, we raise `ModuleNotFoundError`. This PR fixes this so
that we'll skip patching those methods if the corresponding partner
library is not available.

Additionally, this PR also adds importing
`langchain_community.llm/chat_models` as those are not automatically
imported by importing `langchain_community`. This is important as we
reference those submodules in our patch code later on.


## Checklist

- [x] Change(s) are motivated and described in the PR description
- [x] Testing strategy is described if automated tests are not included
in the PR
- [x] Risks are described (performance impact, potential for breakage,
maintainability)
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] [Library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
are followed or label `changelog/no-changelog` is set
- [x] Documentation is included (in-code, generated user docs, [public
corp docs](https://github.com/DataDog/documentation/))
- [x] Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))
- [x] If this PR changes the public interface, I've notified
`@DataDog/apm-tees`.

## Reviewer Checklist

- [x] Title is accurate
- [x] All changes are related to the pull request's stated goal
- [x] Description motivates each change
- [x] Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- [x] Testing strategy adequately addresses listed risks
- [x] Change is maintainable (easy to change, telemetry, documentation)
- [x] Release note makes sense to a user of the library
- [x] Author has acknowledged and discussed the performance implications
of this PR as reported in the benchmarks PR comment
- [x] Backport labels are set in a manner that is consistent with the
[release branch maintenance

policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)