
Commit

chore: @lmolkova reviews
nirga committed Jan 28, 2024
1 parent 0203aea commit 0891f91
Showing 5 changed files with 225 additions and 12 deletions.
2 changes: 1 addition & 1 deletion docs/ai/README.md
@@ -21,4 +21,4 @@ Technology specific semantic conventions are defined for the following LLM provi

* [OpenAI](openai.md): Semantic Conventions for *OpenAI*.

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.26.0/specification/document-status.md
99 changes: 99 additions & 0 deletions docs/ai/llm-spans.md
@@ -0,0 +1,99 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: LLM Calls
--->

# Semantic Conventions for LLM requests

**Status**: [Experimental][DocumentStatus]

<!-- Re-generate TOC with `markdown-toc --no-first-h1 -i` -->

<!-- toc -->

- [LLM Request attributes](#llm-request-attributes)
- [Configuration](#configuration)
- [Semantic Conventions for specific LLM technologies](#semantic-conventions-for-specific-llm-technologies)

<!-- tocstop -->

A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low-cardinality value representing the request made to an LLM.
It MAY be the name of the API endpoint for the LLM being called.

## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons:

1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
2. Data size concerns. Although there is no specified size limit, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.
3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application.

By default, instrumentations SHOULD NOT capture prompts and completions.
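
For illustration, a minimal sketch of such a configuration switch in Python, assuming a hypothetical environment variable (the convention does not prescribe an option name or mechanism):

```python
import os

# Hypothetical opt-in switch; the convention does not define its name.
# Prompt/completion capture stays off unless the user explicitly enables it.
CAPTURE_CONTENT = os.getenv(
    "OTEL_INSTRUMENTATION_LLM_CAPTURE_CONTENT", "false"
).lower() == "true"


def record_prompt(span, prompt: str) -> None:
    """Attach the prompt to the span only when content capture is enabled."""
    if CAPTURE_CONTENT:
        # The attribute name follows this document; the event name is illustrative.
        span.add_event("llm.prompt", attributes={"llm.prompt": prompt})
```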

## LLM Request attributes

These attributes track input data and metadata for a request to an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-request) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended |
| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended |
| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended |
| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended |

`llm.request.model` has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description |
|---|---|
| `gpt-4` | GPT-4 |
| `gpt-4-32k` | GPT-4 with 32k context window |
| `gpt-3.5-turbo` | GPT-3.5-turbo |
| `gpt-3.5-turbo-16k` | GPT-3.5-turbo with 16k context window |
| `claude-instant-1` | Claude Instant (latest version) |
| `claude-2` | Claude 2 (latest version) |
| `other-llm` | Any LLM not listed in this table. Use for any fine-tuned version of a model. |
<!-- endsemconv -->
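
As a non-normative sketch, an instrumentation written with the OpenTelemetry Python API might populate these request attributes as follows (the span name and request values are placeholders):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example.llm.instrumentation")

# Placeholder span name; it should be a low-cardinality value such as
# the API endpoint being called.
with tracer.start_as_current_span("llm.chat_completions") as span:
    span.set_attribute("llm.vendor", "openai")
    span.set_attribute("llm.request.model", "gpt-4")
    span.set_attribute("llm.request.max_tokens", 100)
    span.set_attribute("llm.temperature", 0.0)
    span.set_attribute("llm.top_p", 1.0)
    span.set_attribute("llm.stream", False)
    span.set_attribute("llm.stop_sequences", ["stop1"])
    # ... issue the actual LLM request here ...
```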

## LLM Response attributes

These attributes track output data and metadata for a response from an LLM. Each attribute represents a concept that is common to most LLMs.

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
| `llm.response.model` | string | The name of the LLM that generated the response. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required |
| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended |

`llm.response.finish_reason` MUST be one of the following:

| Value | Description |
|---|---|
| `stop` | If the model hit a natural stop point or a provided stop sequence. |
| `max_tokens` | If the maximum number of tokens specified in the request was reached. |
| `tool_call` | If a function / tool call was made by the model (for models that support such functionality). |
<!-- endsemconv -->
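
Continuing the sketch above, the response attributes could be recorded on the same span once the call returns; the `response` mapping below is a placeholder rather than a specific client library object:

```python
def record_response_attributes(span, response: dict) -> None:
    # Map fields of the (placeholder) response onto the attributes above.
    span.set_attribute("llm.response.id", response["id"])
    span.set_attribute("llm.response.model", response["model"])
    span.set_attribute("llm.response.finish_reason", response["finish_reason"])
    usage = response["usage"]
    span.set_attribute("llm.usage.prompt_tokens", usage["prompt_tokens"])
    span.set_attribute("llm.usage.completion_tokens", usage["completion_tokens"])
    span.set_attribute("llm.usage.total_tokens", usage["total_tokens"])
```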

## Events

In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.

<!-- semconv ai(tag=llm-prompt) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is blank, and the prompt is instead captured in an event determined by the specific LLM technology semantic convention. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Recommended |
<!-- endsemconv -->

<!-- semconv ai(tag=llm-completion) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Recommended |
<!-- endsemconv -->
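
A sketch of how these events might be attached when content capture is enabled; the event names simply mirror the attribute names and are illustrative, not prescribed by this document:

```python
def record_content_events(span, prompt: str, completion: str, capture_content: bool) -> None:
    # Only attach content when the instrumentation is configured to do so
    # (see the Configuration section above).
    if not capture_content:
        return
    span.add_event("llm.prompt", attributes={"llm.prompt": prompt})
    span.add_event("llm.completion", attributes={"llm.completion": completion})
```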

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
114 changes: 114 additions & 0 deletions docs/ai/openai.md
@@ -0,0 +1,114 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: OpenAI
--->

# Semantic Conventions for OpenAI Spans

**Status**: [Experimental][DocumentStatus]

This document outlines the Semantic Conventions specific to
[OpenAI](https://platform.openai.com/) spans, extending the general semantics
found in the [LLM Semantic Conventions](llm-spans.md). These conventions are
designed to standardize telemetry data for OpenAI interactions, particularly
focusing on the `/chat/completions` endpoint. By following these guidelines,
developers can ensure consistent, meaningful, and easily interpretable telemetry
data across different applications and platforms.

## Chat Completions

The span name for OpenAI chat completions SHOULD be `openai.chat`
to maintain consistency and clarity in telemetry data.

## Request Attributes

These are the attributes when instrumenting OpenAI LLM requests with the
`/chat/completions` endpoint.

<!-- semconv llm.openai(tag=llm-request-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended |
| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
| `llm.temperature` | float | The temperature setting for the LLM request. | `0.0` | Recommended |
| `llm.top_p` | float | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| `llm.stream` | bool | Whether the LLM responds with a stream. | `false` | Recommended |
| `llm.stop_sequences` | array | Array of strings the LLM uses as a stop sequence. | `["stop1"]` | Recommended |
| `llm.openai.n` | int | The number of completions to generate. | `1` | Recommended |
| `llm.openai.presence_penalty` | float | If present, the `presence_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended |
| `llm.openai.frequency_penalty` | float | If present, the `frequency_penalty` used in an OpenAI request. Value is between -2.0 and 2.0. | `-0.5` | Recommended |
| `llm.openai.logit_bias` | string | If present, the JSON-encoded string of a `logit_bias` used in an OpenAI request. | `{2435:-100, 640:-100}` | Recommended |
| `llm.openai.user` | string | If present, the `user` used in an OpenAI request. | `bob` | Opt-in |
| `llm.openai.response_format` | string | An object specifying the format that the model must output. Either `text` or `json_object` | `text` | Recommended |
| `llm.openai.seed` | int | Seed used in request to improve determinism. | `1234` | Recommended |
<!-- endsemconv -->
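
A sketch of recording these attributes on an `openai.chat` span around a `/chat/completions` request; the `request` dictionary is a hand-built placeholder, not a call into any particular OpenAI client version:

```python
import json

from opentelemetry import trace

tracer = trace.get_tracer("example.openai.instrumentation")

request = {
    "model": "gpt-4",
    "max_tokens": 100,
    "temperature": 0.0,
    "top_p": 1.0,
    "stream": False,
    "n": 1,
    "seed": 1234,
    "logit_bias": {"2435": -100, "640": -100},
}

with tracer.start_as_current_span("openai.chat") as span:
    span.set_attribute("llm.vendor", "openai")
    span.set_attribute("llm.request.model", request["model"])
    span.set_attribute("llm.request.max_tokens", request["max_tokens"])
    span.set_attribute("llm.temperature", request["temperature"])
    span.set_attribute("llm.top_p", request["top_p"])
    span.set_attribute("llm.stream", request["stream"])
    span.set_attribute("llm.openai.n", request["n"])
    span.set_attribute("llm.openai.seed", request["seed"])
    # logit_bias is recorded as a JSON-encoded string.
    span.set_attribute("llm.openai.logit_bias", json.dumps(request["logit_bias"]))
    # ... send the request to the /chat/completions endpoint here ...
```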

## Response attributes

Attributes for chat completion responses SHOULD follow these conventions:

<!-- semconv llm.openai(tag=llm-response-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
| `llm.response.model` | string | The name of the LLM that generated the response. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required |
| `llm.response.finish_reason` | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| `llm.usage.prompt_tokens` | int | The number of tokens used in the LLM prompt. | `100` | Recommended |
| `llm.usage.completion_tokens` | int | The number of tokens used in the LLM response (completion). | `180` | Recommended |
| `llm.usage.total_tokens` | int | The total number of tokens used in the LLM prompt and response. | `280` | Recommended |
| `llm.openai.created` | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | Recommended |
| `llm.openai.system_fingerprint` | string | This fingerprint represents the backend configuration that the model runs with. | `asdf987123` | Recommended |
<!-- endsemconv -->
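
The response attributes might then be mapped from a `/chat/completions` response body as in the sketch below; field access assumes the documented response shape and should be adapted to the client library actually in use:

```python
def record_openai_response(span, response: dict) -> None:
    # `response` is a plain dict mirroring the /chat/completions body.
    span.set_attribute("llm.response.id", response["id"])
    span.set_attribute("llm.response.model", response["model"])
    span.set_attribute("llm.openai.created", response["created"])
    if "system_fingerprint" in response:
        span.set_attribute("llm.openai.system_fingerprint", response["system_fingerprint"])
    usage = response["usage"]
    span.set_attribute("llm.usage.prompt_tokens", usage["prompt_tokens"])
    span.set_attribute("llm.usage.completion_tokens", usage["completion_tokens"])
    span.set_attribute("llm.usage.total_tokens", usage["total_tokens"])
```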

## Request Events

In the lifetime of an LLM span, an event for prompts sent and completions received MAY be created, depending on the configuration of the instrumentation.
Because OpenAI uses a more complex prompt structure, these events will be used instead of the generic ones detailed in the [LLM Semantic Conventions](llm-spans.md).

### Prompt Events

Prompt event name SHOULD be `llm.openai.prompt`.

<!-- semconv llm.openai(tag=llm-prompt-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `role` | string | The role of the prompt author, can be one of `system`, `user`, `assistant`, or `tool` | `system` | Required |
| `content` | string | The content of the chat message sent to the model. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
| `tool_call_id` | string | If the role is `tool` or `function`, the ID of the tool call that this message is responding to. | `get_current_weather` | Conditionally Required: If `role` is `tool`. |
<!-- endsemconv -->
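
A sketch of emitting one `llm.openai.prompt` event per chat message in the request, assuming the standard message shape and that content capture is enabled:

```python
def record_prompt_events(span, messages: list) -> None:
    # One event per chat message sent in the request.
    for message in messages:
        attributes = {
            "role": message["role"],
            "content": message["content"],
        }
        if message["role"] == "tool" and "tool_call_id" in message:
            attributes["tool_call_id"] = message["tool_call_id"]
        span.add_event("llm.openai.prompt", attributes=attributes)
```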

### Tools Events

Tools event name SHOULD be `llm.openai.tool`, specifying potential tools or functions the LLM can use.

<!-- semconv llm.openai(tag=llm-tools-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `type` | string | The type of the tool. Currently, only `function` is supported. | `function` | Required |
| `function.name` | string | The name of the function to be called. | `get_weather` | Required |
| `function.description` | string | A description of what the function does, used by the model to choose when and how to call the function. | `` | Required |
| `function.parameters` | string | JSON-encoded string of the parameter object for the function. | `{"type": "object", "properties": {}}` | Required |
<!-- endsemconv -->
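
Tool definitions supplied in the request might be recorded as one `llm.openai.tool` event each, with the parameter schema JSON-encoded; this is a sketch assuming the documented tool shape:

```python
import json


def record_tool_events(span, tools: list) -> None:
    # One event per tool/function definition supplied in the request.
    for tool in tools:
        function = tool["function"]
        span.add_event(
            "llm.openai.tool",
            attributes={
                "type": tool["type"],  # currently always "function"
                "function.name": function["name"],
                "function.description": function.get("description", ""),
                "function.parameters": json.dumps(function.get("parameters", {})),
            },
        )
```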

### Choice Events

Recording details about Choices in each response MAY be included as
Span Events.

Choice event name SHOULD be `llm.openai.choice`.

If there is more than one `tool_call`, separate events SHOULD be used.

<!-- semconv llm.openai(tag=llm-completion-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `type` | string | Either `delta` or `message`. | `message` | Required |
| `finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended |
| `role` | string | The assigned role for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
| `content` | string | The content for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
| `tool_call.id` | string | If exists, the ID of the tool call. | `call_BP08xxEhU60txNjnz3z9R4h9` | Required |
| `tool_call.type` | string | Currently only `function` is supported. | `function` | Required |
| `tool_call.function.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `get_weather_report` | Required |
| `tool_call.function.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required |
<!-- endsemconv -->
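
A sketch of emitting `llm.openai.choice` events from a non-streaming response, using a separate event per tool call as noted above; field access assumes the documented choice shape:

```python
def record_choice_events(span, choices: list) -> None:
    for choice in choices:
        message = choice["message"]
        base = {
            "type": "message",  # "delta" would be used for streamed chunks
            "finish_reason": choice["finish_reason"],
            "role": message["role"],
            "content": message.get("content") or "",
        }
        tool_calls = message.get("tool_calls") or []
        if not tool_calls:
            span.add_event("llm.openai.choice", attributes=base)
            continue
        # Emit a separate event for each tool call in this choice.
        for tool_call in tool_calls:
            attributes = dict(base)
            attributes.update({
                "tool_call.id": tool_call["id"],
                "tool_call.type": tool_call["type"],
                "tool_call.function.name": tool_call["function"]["name"],
                "tool_call.function.arguments": tool_call["function"]["arguments"],
            })
            span.add_event("llm.openai.choice", attributes=attributes)
```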

[DocumentStatus]: https://github.com/open-telemetry/opentelemetry-specification/tree/v1.22.0/specification/document-status.md
6 changes: 3 additions & 3 deletions model/registry/llm.yaml
@@ -5,7 +5,7 @@ groups:
brief: >
This document defines the attributes used to describe telemetry in the context of LLM (Large Language Models) requests and responses.
attributes:
- id: request.vendor
- id: system
type: string
brief: The name of the LLM foundation model vendor, if applicable.
examples: 'openai'
@@ -30,7 +30,7 @@
brief: The top_p sampling setting for the LLM request.
examples: [1.0]
tag: llm-generic-request
- id: request.stream
- id: request.is_stream
type: boolean
brief: Whether the LLM responds with a stream.
examples: [false]
@@ -41,7 +41,7 @@
examples: ["stop1"]
tag: llm-generic-request
- id: response.id
type: string
type: string[]
brief: The unique identifier for the completion.
examples: ['chatcmpl-123']
tag: llm-generic-response
16 changes: 8 additions & 8 deletions model/trace/llm.yaml
Expand Up @@ -4,7 +4,7 @@ groups:
brief: >
A request to an LLM is modeled as a span in a trace. The span name should be a low cardinality value representing the request made to an LLM, like the name of the API endpoint being called.
attributes:
- ref: llm.request.vendor
- ref: llm.system
requirement_level: recommended
note: >
The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank.
@@ -18,7 +18,7 @@
requirement_level: recommended
- ref: llm.request.top_p
requirement_level: recommended
- ref: llm.request.stream
- ref: llm.request.is_stream
requirement_level: recommended
- ref: llm.request.stop_sequences
requirement_level: recommended
@@ -65,9 +65,9 @@
- id: llm.openai
type: span
brief: >
These are the attributes when instrumenting OpenAI LLM requests with the `/chat/completions` endpoint.
A span representing a request to OpenAI's API, providing additional information on top of the generic llm.request.
attributes:
- ref: llm.request.vendor
- ref: llm.system
requirement_level: recommended
examples: ['openai', 'microsoft']
tag: tech-specific-openai-request
@@ -82,7 +82,7 @@
tag: tech-specific-openai-request
- ref: llm.request.top_p
tag: tech-specific-openai-request
- ref: llm.request.stream
- ref: llm.request.is_stream
tag: tech-specific-openai-request
- ref: llm.request.stop_sequences
tag: tech-specific-openai-request
@@ -119,7 +119,7 @@
name: llm.content.openai.prompt
type: event
brief: >
These are the attributes when instrumenting OpenAI LLM requests and recording prompts in the request.
This event is fired when a completion request is sent to OpenAI, specifying the prompt that was sent.
attributes:
- ref: llm.openai.role
requirement_level: required
@@ -134,7 +134,7 @@
name: llm.content.openai.tool
type: event
brief: >
These are the attributes when instrumenting OpenAI LLM requests that specify tools (or functions) the LLM can use.
This event is fired when a completion request is sent to OpenAI, specifying tools that the LLM can use.
attributes:
- ref: llm.openai.tool.type
requirement_level: required
@@ -149,7 +149,7 @@
name: llm.content.openai.completion.choice
type: event
brief: >
These are the attributes when instrumenting OpenAI LLM requests and recording choices in the result.
This event is fired when a completion response is returned from OpenAI, specifying one possible completion returned by the LLM.
attributes:
- ref: llm.openai.choice.type
requirement_level: required
