Add LLM semantic conventions #639

Closed · nirga wants to merge 12 commits

Conversation

@nirga (Contributor) commented Jan 12, 2024

Advancement towards #327

Changes


Continuing the work from #483, this introduces semantic conventions for modern AI systems.
I tried to keep the set minimal: general LLM conventions, plus some OpenAI-specific ones since its API is far more complex than others such as Anthropic's. Future PRs will address more foundation models as well as vector DBs and frameworks.

I'm trying to match this to what we've already started building with OpenLLMetry and will make the needed changes there once this is approved.

Merge requirement checklist

@nirga requested review from a team, January 12, 2024 10:56
@joaopgrassi (Member):

Hi @nirga, thanks for the contribution!

Please refer to the CONTRIBUTING guide. The markdown attribute tables are generated automatically, and you need to define the attributes via YAML files.

@gyliu513 (Member) left a comment:

Good start, thanks @nirga !

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
Member:

How about adding llm.response.duration? It has been requested for checking latency.

@nirga (author):

Isn't it covered already by the fact that an LLM request is a single span, which has a duration?

Member:

You mean getting the info just from the OTel span, right?

Member:

Yes, this information would be available from the span. In the case of a streaming response, some people want to know "time to first token" and "max time/pause between tokens".
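
As a rough sketch of the streaming case raised above (a hypothetical instrumentation, not something defined in this PR): time-to-first-token can be captured by timestamping the first chunk of a streamed response. The attribute names and the `client.stream` call below are illustrative only.

```python
import time

from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")


def stream_completion(client, request):
    # One span per LLM request; its duration already covers total latency,
    # so only the streaming-specific timings are added as attributes.
    with tracer.start_as_current_span("llm.chat_completion") as span:
        start = time.monotonic()
        first_chunk_at = None
        last_chunk_at = start
        max_gap = 0.0
        chunks = []
        for chunk in client.stream(request):  # hypothetical streaming client
            now = time.monotonic()
            if first_chunk_at is None:
                first_chunk_at = now
                span.set_attribute("llm.response.time_to_first_token", now - start)
            max_gap = max(max_gap, now - last_chunk_at)
            last_chunk_at = now
            chunks.append(chunk)
        span.set_attribute("llm.response.max_time_between_tokens", max_gap)
        return chunks
```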

@nirga (author) commented Jan 12, 2024

> Hi @nirga, thanks for the contribution!
>
> Please refer to the CONTRIBUTING guide. The markdown attribute tables are generated automatically, and you need to define the attributes via YAML files.

Thanks! I was merely copying and adapting #483. I'll work on converting this to YAML as well.

@drewby (Member) left a comment:

Some initial feedback. Once we define the YAML model files, some other efficiencies (around duplication and naming conventions) will become evident.

Also, can we add the OpenAI metrics back into this PR, or do you want that in a separate PR?

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.response.id` | string | The unique identifier for the completion. | `chatcmpl-123` | Recommended |
| `llm.response.model` | string | The name of the LLM a response is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model actually used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4-0613` | Required |
Member:

Do we expect llm.request.model and llm.response.model to be different? The request and response are all recorded on one span, so wouldn't these be redundant?

@nirga (author):

Yes, for example in OpenAI you ask for gpt-4 and then get a specific version like gpt-4-0613 (I've also seen this in Anthropic, Replicate, and others).
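
For illustration, a sketch of how both attributes could land on the same span when the provider resolves an alias to a concrete model version (the response payload shape and values here are hypothetical):

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")

with tracer.start_as_current_span("llm.chat_completion") as span:
    span.set_attribute("llm.request.model", "gpt-4")

    # Hypothetical response payload; providers typically echo back the concrete
    # model version they served, e.g. "gpt-4-0613" for a "gpt-4" request.
    response = {"id": "chatcmpl-123", "model": "gpt-4-0613"}
    span.set_attribute("llm.response.model", response["model"])
    span.set_attribute("llm.response.id", response["id"])
```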

|---|---|---|---|---|
| `llm.vendor` | string | The name of the LLM foundation model vendor, if applicable. If not using a vendor-supplied model, this field is left blank. | `openai` | Recommended |
| `llm.request.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model requested. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.request.max_tokens` | int | The maximum number of tokens the LLM generates for a request. | `100` | Recommended |
Member:

max_tokens is prefixed with request, whereas other parameters such as temperature are not prefixed. Perhaps we should remove the request prefix or add the prefix to the others. Instead of request, perhaps parameter is better.

@nirga (author):

Hmm, parameter would sound weird for model, no? (llm.parameter.model.) I've added the request prefix to all request parameters.

linux-foundation-easycla bot commented Jan 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@nirga (author) commented Jan 23, 2024

> Hi @nirga, thanks for the contribution!
>
> Please refer to the CONTRIBUTING guide. The markdown attribute tables are generated automatically, and you need to define the attributes via YAML files.

@joaopgrassi YAML files were added.

@lmolkova (Contributor) left a comment:

Great start!
Left some suggestions and questions.

brief: The name of the LLM foundation model vendor, if applicable.
examples: 'openai'
tag: llm-generic-request
- id: request.model
Contributor:

Why not just llm.model?

Also, I assume there could be multiple model properties; perhaps llm.model.name would be more future-proof?

@nirga (author):

The reason is that there are many providers (like OpenAI, Anthropic, Render, etc.) where you ask for a general version (like gpt-4-turbo-preview) but then get a specific version (like gpt-4-0125-preview). So we need a separation between the "request" model and the "actual" model.

Contributor:

makes sense. Is it a common case that request and response models are different?

Member:

The question still applies for llm.model.name: could the model contain more info than just the name, like llm.request.model.name|version etc.? Do we want to make "model" a top namespace that request and response can then reuse?

Member:

> The question still applies for llm.model.name: could the model contain more info than just the name, like llm.request.model.name|version etc.? Do we want to make "model" a top namespace that request and response can then reuse?

This is a single input parameter in the services I've seen, not a separate name and version. The response model could be a different qualified identifier. For example, the request could be for 'gpt4' and the response could say 'gpt4-32k-turbo'.

tag: llm-generic-request
- id: request.stop_sequences
type: string
brief: Array of strings the LLM uses as a stop sequence.
Contributor:

If it's an array, should the type be `string[]`? If there are good reasons (such as perf) to keep it as a string, how are values separated? Could you also provide an example of an array in examples?

Contributor:

What's the resolution here? Should it be of type string[]?

Member:

Yes, it should likely be string[]
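
OpenTelemetry attributes can be homogeneous arrays of primitives, so a string[] type maps directly to a list of strings. A minimal sketch (the attribute name follows this PR's proposal; the values are made up):

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")

with tracer.start_as_current_span("llm.chat_completion") as span:
    # An array-valued attribute avoids having to invent a separator character.
    span.set_attribute("llm.request.stop_sequences", ["\n\nHuman:", "###"])
```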

- llm.content.openai.tool
- llm.content.openai.completion.choice

- id: llm.content.openai.prompt
Contributor:

Do we need this event?
It could just be llm.content.prompt with OpenAI-specific attributes.

@nirga (author):

That's a good question. Given that OpenAI specifically has a really different way of modeling prompts and completions, I wonder whether it would be cumbersome to use the same event for both.

Contributor:

We can start with one event and add more once it proves too difficult; the spec would stay experimental for now anyway.
For the time being, we can just list the OpenAI-specific attributes and mention that they would appear on the events.

(Unless you already have good reasons to keep the events separate.)

Commenter:

Some frameworks (e.g. vllm) have OpenAI-compatible serving APIs.

Using the same event name can benefit this use case.
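
A sketch of the "single generic event" approach being discussed: one llm.content.prompt event per message, with any OpenAI-specific fields carried as additional attributes. Event and attribute names here are illustrative, not settled by this PR.

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")

messages = [
    {"role": "system", "content": "You are an AI assistant that tells jokes."},
    {"role": "user", "content": "Tell me a joke about OpenTelemetry."},
]

with tracer.start_as_current_span("llm.chat_completion") as span:
    for index, message in enumerate(messages):
        # One generic event per prompt message; vendor-specific details
        # (OpenAI roles, tool calls, ...) ride along as extra attributes.
        span.add_event(
            "llm.content.prompt",
            attributes={
                "llm.prompt.index": index,
                "llm.prompt.role": message["role"],
                "llm.prompt.content": message["content"],
            },
        )
```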

A request to an LLM is modeled as a span in a trace.

The **span name** SHOULD be set to a low cardinality value representing the request made to an LLM.
It MAY be a name of the API endpoint for the LLM being called.
Contributor:

We don't usually put endpoints in span names. Perhaps we can stay vague and say that it should contain a specific operation name (e.g. create_chat_completions).

See also the comment on metrics regarding introducing an llm.operation attribute.


## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of prompts and completions. This is for three primary reasons:
@lmolkova (Contributor), Jan 31, 2024:

Please set the requirement level on the corresponding attributes to opt-in; then there will be no need to specify this requirement: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attribute-requirement-level.md

We can just say that prompts and completions could be sensitive (and keep the explanation below).

2. Data size concerns. Although there is no specified limit to sizes, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.
3. Performance concerns. Sending large amounts of data to a telemetry backend may cause performance issues for the application.

By default, these configurations SHOULD NOT capture prompts and completions.
Contributor:

Suggested change:
- By default, these configurations SHOULD NOT capture prompts and completions.

We need to change the requirement level to opt-in, and then this line is redundant.
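
A sketch of what opt-in capture could look like on the instrumentation side once the attribute is marked opt-in. The environment variable name is hypothetical, not defined by this PR.

```python
import json
import os

from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")

# Hypothetical switch: content capture stays off unless explicitly enabled.
CAPTURE_CONTENT = os.getenv("LLM_CAPTURE_CONTENT", "false").lower() == "true"


def record_prompt(span, messages):
    # Prompts can be sensitive and large, so record them only on opt-in.
    if CAPTURE_CONTENT:
        span.set_attribute("llm.prompt", json.dumps(messages))


messages = [{"role": "user", "content": "Tell me a joke about OpenTelemetry."}]
with tracer.start_as_current_span("llm.chat_completion") as span:
    record_prompt(span, messages)
```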

<!-- semconv llm.request -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.request.is_stream`](../attributes-registry/llm.md) | boolean | Whether the LLM responds with a stream. | `False` | Recommended |
Contributor:

BTW, is it important for observability? How would I use it?

|---|---|---|---|---|
| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |

**[1]:** The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object, this field is blank, and the response is instead captured in an event determined by the specific LLM technology semantic convention.
Contributor:

It's OK to put JSON into this attribute. Once we have a way to specify what goes into the event payload, we'll move it there, and JSON, XML, or plain text would be perfectly fine.

Commenter:

prompt vs. completion seems to follow the old Completions API, which is deprecated by OpenAI; many other model providers don't even have a Completions API (e.g. Mistral: https://docs.mistral.ai/api/).

The current OpenAI API is Chat Completions, which takes a messages array as input and returns one message as output:
https://platform.openai.com/docs/guides/text-generation/chat-completions-api

Also, if we consider the llm span to be the "base" that gets extended for different model providers and APIs, I'm not sure we should include things like prompt/completion (input/output) as part of it. Different models/APIs will have totally different inputs/outputs, so trying to define a base input/output here doesn't help.

Commenter:

It probably makes more sense to define a span type for each API, not per vendor? OpenAI has chat completion, embedding, image generation, gpt-4v... it will be hard to capture all of that in a single 'openai' span type.

Member:

> It probably makes more sense to define a span type for each API, not per vendor? OpenAI has chat completion, embedding, image generation, gpt-4v... it will be hard to capture all of that in a single 'openai' span type.

Yes, I have the same concern. Do you think "inputs" and "outputs" would be the more generic representation across the various APIs? Then we could add API-specific attributes for chat completions, image generation, etc. (gen_ai.openai.chatcompletions.*, gen_ai.openai.images.*, and so on). We should discuss this in the working group.

|---|---|---|---|---|
| [`llm.completion`](../attributes-registry/llm.md) | string | The full response string from an LLM in a response. [1] | `Why did the developer stop using OpenTelemetry? Because they couldnt trace their steps!` | Recommended |

**[1]:** The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an event determined by the specific LLM technology semantic convention.
Contributor:

I don't understand this sentence. Why leave this attribute blank and not put JSON there?

Also, I think we should create an event per message in the completion, at least when the response is streamed.
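
A sketch of the "event per completion message" idea from the comment above, assuming a hypothetical multi-choice response (event and attribute names are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer("llm-instrumentation-sketch")

# Hypothetical multi-choice response, e.g. from a request made with n=2.
choices = [
    {"index": 0, "finish_reason": "stop", "content": "Why did the span cross the service boundary?"},
    {"index": 1, "finish_reason": "stop", "content": "To finish the trace on the other side."},
]

with tracer.start_as_current_span("llm.chat_completion") as span:
    for choice in choices:
        # One event per generated message instead of a single flat attribute.
        span.add_event(
            "llm.content.completion.choice",
            attributes={
                "llm.completion.index": choice["index"],
                "llm.completion.finish_reason": choice["finish_reason"],
                "llm.completion.content": choice["content"],
            },
        )
```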

@@ -0,0 +1,372 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: MEtrics
@lmolkova (Contributor), Jan 31, 2024:

Suggested change:
- linkTitle: MEtrics
+ linkTitle: LLM metrics

(or OpenAI metrics, depending on the discussion below)

@@ -0,0 +1,25 @@
<!--- Hugo front matter used to generate the website version of this page:
linkTitle: AI
@lmolkova (Contributor), Jan 31, 2024:

Nit: since this is the LLM semconv, I think it should be in the llm folder and have an LLM title.

to: database/README.md
--->

# Semantic Conventions for AI systems
Contributor:

Suggested change:
- # Semantic Conventions for AI systems
+ # Semantic Conventions for LLM clients

brief: The total number of tokens used in the LLM prompt and response.
examples: [280]
tag: llm-generic-response
- id: prompt
Member:

Should this be inside request.prompt?

Member:

> Should this be inside request.prompt?

These were intended to be attributes on span events, but we will be moving them to the Event body.

brief: The full prompt string sent to an LLM in a request.
examples: ['\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:']
tag: llm-generic-events
- id: completion
Member:

Suggested change:
-   - id: completion
+   - id: response.completion

No?

@zzn2 left a comment:

Thanks @nirga for the great work!
Added some comments.

| Value | Description |
|---|---|
| `prompt` | prompt |
| `completion` | completion |
Commenter:

Does embedding have a completion token type?

examples: ["stop1"]
tag: llm-generic-request
- id: response.id
type: string[]
Commenter:

Should response.id be of string type (instead of string[])?

- ref: llm.request.max_tokens
tag: tech-specific-openai-request
- ref: llm.request.temperature
tag: tech-specific-openai-request
Commenter:

What is the default requirement level if not specified?

Member:

Default is Recommended.


| [`llm.request.top_p`](../attributes-registry/llm.md) | double | The top_p sampling setting for the LLM request. | `1.0` | Recommended |
| [`llm.response.finish_reason`](../attributes-registry/llm.md) | string | The reason the model stopped generating tokens. | `stop` | Recommended |
| [`llm.response.id`](../attributes-registry/llm.md) | string[] | The unique identifier for the completion. | `[chatcmpl-123]` | Recommended |
| [`llm.response.model`](../attributes-registry/llm.md) | string | The name of the LLM a response is being made to. [2] | `gpt-4-0613` | Required |
Commenter:

is this exclusive with llm.request.model?

Member:

Yes, the requested model identifier is sometimes different than the response model identifier. For example, Azure OpenAI allows for a deployment name as the request model, but responds with the actual LLM model name. Other systems will add the current variant at the end of the model name in the response.


| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| [`llm.prompt`](../attributes-registry/llm.md) | string | The full prompt string sent to an LLM in a request. [1] | `\\n\\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\\n\\nAssistant:` | Recommended |
@zzn2, Mar 7, 2024:

Besides plain-text prompts, OpenAI's chat completion API also supports complicated inputs/outputs like images, function calls, etc. How do we plan to record these kinds of payloads in the trace?

Commenter:

Besides the chat completion API, OpenAI also has a lot of other APIs like Embeddings, Images, Assistants, etc. How do we plan to support those scenarios?

Member:

We are discussing, in the working group, the requirements for an initial minimal PR to get into semantic-conventions. After this initial merge, we will be able to create additional proposals, issues, and PRs. We will likely reduce the surface area of this initial PR; then proposals can be submitted for embeddings, images, etc.

@nirga (author) commented Mar 19, 2024

Work continued in #825
