
Add LLM semantic conventions #639

Closed
wants to merge 12 commits
25 changes: 25 additions & 0 deletions docs/ai/README.md
@@ -0,0 +1,25 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: AI
Contributor @lmolkova, Jan 31, 2024:

nit: since it's LLM semconv, I think it should be in the llm folder and should have LLM title

path_base_for_github_subdir:
from: content/en/docs/specs/semconv/ai/_index.md
to: ai/README.md
--->

# Semantic Conventions for AI systems
Contributor:

Suggested change: replace "# Semantic Conventions for AI systems" with "# Semantic Conventions for LLM clients".


**Status**: [Experimental][DocumentStatus]

This document defines semantic conventions for the following kinds of AI systems:

* LLMs

Semantic conventions for LLM operations are defined for the following signals:

* [LLM Spans](llm-spans.md): Semantic Conventions for LLM requests - *spans*.

Technology specific semantic conventions are defined for the following LLM providers:

* [OpenAI](openai.md): Semantic Conventions for *OpenAI* spans.
* [OpenAI Metrics](openai-metrics.md): Semantic Conventions for *OpenAI* metrics.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
86 changes: 86 additions & 0 deletions docs/ai/llm-spans.md
@@ -0,0 +1,86 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: LLM Calls
--->

# Semantic Conventions for LLM requests

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [Configuration](#configuration)
- [LLM Request attributes](#llm-request-attributes)
- [Events](#events)

<!-- tocstop -->

A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
It MAY be a name of the API endpoint for the LLM being called.
Contributor:

we don't usually put endpoints in the span names. Perhaps we can stay vague and say that it should contain a specific operation name (e.g. create_chat_completions).

See also a comment on the metrics regarding introducing an llm.operation attribute.
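To make the span model above concrete, here is a minimal, non-normative sketch in Python using the OpenTelemetry API. The span name `chat_completions` and the tracer name are assumptions chosen for illustration; the convention only requires a low-cardinality name.

```python
# Non-normative sketch: an LLM request modeled as a span.
# "chat_completions" is an assumed low-cardinality operation name.
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

with tracer.start_as_current_span("chat_completions") as span:
    # Issue the LLM request here and set the request/response
    # attributes described in the sections below on `span`.
    ...
```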


## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons:
Contributor @lmolkova, Jan 31, 2024:

please set requirement levels on corresponding attributes to opt-in - then there will be no need to specify this requirement - https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attribute-requirement-level.md

We can just say that prompts and completions could be sensitive (and keep explanation below)


1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.
3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application.

By default, these configurations SHOULD NOT capture prompts and completions.
Contributor:

Suggested change: remove the line "By default, these configurations SHOULD NOT capture prompts and completions."

we need to change the requirement level to opt-in and then it's redundant
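
A minimal sketch of what such an opt-in switch could look like. The environment variable name below is hypothetical; this document only requires that instrumentations offer a switch, not how it is exposed.

```python
# Hypothetical opt-in switch; the variable name is an assumption for illustration only.
import os

def capture_content_enabled() -> bool:
    """Return True only when prompt/completion capture has been explicitly enabled."""
    value = os.environ.get("OTEL_INSTRUMENTATION_LLM_CAPTURE_CONTENT", "false")
    return value.strip().lower() == "true"
```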


## LLM Request attributes

These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv llm.request -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended |
Contributor:

BTW, is it important for observability? How would I use it?

| [`llm.request.max_tokens`](../attributes-registry/llm.md) | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
| [`llm.request.model`](../attributes-registry/llm.md) | string | The name of the LLM a request is being made to. [1] | `gpt-4` | Required |
| [`llm.request.stop_sequences`](../attributes-registry/llm.md) | string | Array of strings the LLM uses as a stop sequence. | `stop1` | Recommended |
| [`llm.request.temperature`](../attributes-registry/llm.md) | double | The temperature setting for the LLM request. | `0.0` | Recommended |
| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended |
| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM that produced the response. [2] | `gpt-4-0613` | Required |

is this exclusive with llm.request.model?

Member:

Yes, the requested model identifier is sometimes different than the response model identifier. For example, Azure OpenAI allows for a deployment name as the request model, but responds with the actual LLM model name. Other systems will add the current variant at the end of the model name in the response.

| [`llm.system`](../attributes-registry/llm.md) | string | The name of the LLM foundation model vendor, if applicable. [3] | `openai` | Recommended |
| [`llm.usage.completion_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
| [`llm.usage.prompt_tokens`](../attributes-registry/llm.md) | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
| [`llm.usage.total_tokens`](../attributes-registry/llm.md) | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended |

**[1]:** The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.

**[2]:** The name of the LLM that produced the response. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value should have a more specific name than the base model that's been fine-tuned.

**[3]:** The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank.
<!-- endsemconv -->
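
As a non-normative illustration, an instrumentation might populate these attributes as sketched below (Python, OpenTelemetry API); the attribute values are the examples from the table above.

```python
# Non-normative sketch: populating the LLM request/response attributes.
from opentelemetry import trace

span = trace.get_current_span()  # the LLM request span started earlier

# Request attributes, known when the call is issued.
span.set_attribute("llm.system", "openai")
span.set_attribute("llm.request.model", "gpt-4")
span.set_attribute("llm.request.max_tokens", 100)
span.set_attribute("llm.request.temperature", 0.0)
span.set_attribute("llm.request.top_p", 1.0)
span.set_attribute("llm.request.is_stream", False)

# ... perform the actual LLM call here ...

# Response and usage attributes, known once the response is available.
span.set_attribute("llm.response.model", "gpt-4-0613")
span.set_attribute("llm.response.id", ["chatcmpl-123"])
span.set_attribute("llm.response.finish_reason", "stop")
span.set_attribute("llm.usage.prompt_tokens", 100)
span.set_attribute("llm.usage.completion_tokens", 180)
span.set_attribute("llm.usage.total_tokens", 280)
```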

## Events

In the lifetime of an LLM span, events for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.
Contributor:

Is there a benefit in reporting the event without prompt text? E.g. to add an event for each message and have a place to describe individual message properties such as role, token counts, and whatnot?


<!-- semconv llm.content.prompt -->
The event name MUST be `llm.content.prompt`.

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |
@zzn2, Mar 7, 2024:

Besides the plain text prompts, OpenAI's chat completion API also supports complicated inputs/outputs like images, function calls, etc. How do we plan to record these kinds of payloads into trace?


Besides the chat completion API, OpenAI also has a lot of other APIs like Embeddings, Images, Assistants, etc. How do we plan to support those scenarios?

Member:

We are discussing, in the working group, the requirements for an initial minimum PR to get into semantic-conventions. After this initial merge, we will be able to create additional proposals, issues, and PRs. We will likely reduce the surface area of this initial PR. Proposals can then be submitted for embeddings, images, etc.


**[1]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the prompt is instead captured in an event determined by the specific LLM technology semantic convention.
Contributor:

It's OK to put JSON into this attribute. Once we have a way to specify what goes into the event payload, we'll move it there, and JSON, XML or plain text would be perfectly fine.


Prompt vs. completion reflects the old Completions API, which OpenAI has deprecated; many other model providers don't even offer a Completions API to begin with (e.g. Mistral: https://docs.mistral.ai/api/).

The current OpenAI API is Chat Completions, which takes a messages array as input and returns one message as output.
https://platform.openai.com/docs/guides/text-generation/chat-completions-api

Also, if we consider the LLM span to be the "base" that gets extended for different model providers and APIs, I'm not sure we should include things like prompt/completion (input/output) as part of it; different models/APIs will have totally different inputs and outputs, so trying to define a base input/output here doesn't help.


It probably makes more sense to define a span type for each API, not each vendor?
OpenAI has chat completion, embedding, image generation, gpt-4v ... it will be hard to capture all of that in a single span type 'openai'.

Member:

> It probably makes more sense to define a span type for each API, not each vendor? OpenAI has chat completion, embedding, image generation, gpt-4v ... it will be hard to capture all of that in a single span type 'openai'.

Yes, I have the same concern. Do you think "inputs" and "outputs" would be the more generic representation across various APIs? Then we could add API-specific attributes for chat completions, image generation, etc. (gen_ai.openai.chatcompletions.*, gen_ai.openai.images.*, etc.). We should discuss this in the working group.

<!-- endsemconv -->
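
A non-normative sketch of emitting this event; `capture_content_enabled()` is the hypothetical opt-in helper from the Configuration sketch above, and the prompt text is just an example value.

```python
# Non-normative sketch: recording the prompt as an event on the LLM span.
from opentelemetry import trace

span = trace.get_current_span()  # the enclosing LLM request span
if capture_content_enabled():  # hypothetical opt-in helper sketched earlier
    span.add_event(
        "llm.content.prompt",
        attributes={
            "llm.prompt": "\n\nHuman: Tell me a joke about OpenTelemetry.\n\nAssistant:"
        },
    )
```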

<!-- semconv llm.content.completion -->
The event name MUST be `llm.content.completion`.

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended |

**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.
Contributor:

I don't understand this sentence.
Why leave this attribute blank and not put JSON there?
Also, I think we should create an event per message in the completion, at least when the response is streamed.

<!-- endsemconv -->
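
And the completion counterpart, under the same assumptions as the prompt-event sketch above:

```python
# Non-normative sketch: recording the completion as an event when capture is enabled.
# `span` and capture_content_enabled() are as in the prompt-event sketch above.
if capture_content_enabled():
    span.add_event(
        "llm.content.completion",
        attributes={
            "llm.completion": "Why did the developer stop using OpenTelemetry? "
            "Because they couldn't trace their steps!"
        },
    )
```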

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md