
Add the beginnings of AI Semantic conventions #483

Closed
wants to merge 25 commits

Conversation

@cartermp (Contributor) commented Nov 1, 2023

Fixes #327

Changes

As mentioned in #327, this introduces semantic conventions for modern AI systems. While there's a lot of machine learning work that doesn't involve LLMs or vector DBs, adoption of this tech is so high and still growing that it's a good place to start. Furthermore, with projects like OpenLLMetry likely moving into the CNCF space, there's no time like the present to get started here.

@drewby (Member) left a comment


We have many customers that would benefit from these semantic conventions getting merged in the short term even if in Experimental status.

It's a great start. I think the placeholders (TODOs and empty files) would need to be removed and added back at a later date. I'd also reduce the list down to the essentials in order to get a PR approved, and then we can add more in future PRs. For example, the OpenAI list can be greatly reduced by eliminating the deprecated Chat API and combining ChatCompletions into one list for streaming and non-streaming.

It's also important to have metrics defined. We started a draft some time ago for OpenAI: https://github.com/lmolkova/semantic-conventions/tree/openai/docs/openai. Feel free to cherry-pick.

I'm happy to help any way I can to get this into main. I can do a PR to your branch with some updates if that helps.

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an attribute determined by the specific LLM technology semantic convention for responses. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
Member

In OpenAI, you have `completion_tokens`, `prompt_tokens`, etc. Is that not generally applicable here?

On multiple responses from LLM, if these are captured as events (see my earlier suggestion) then this could be handled by adding multiple events to the Span.
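For illustration, a rough sketch of the event-based shape being suggested here, using the OTel Python API (the event and attribute names are hypothetical, not defined by this PR):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

# Stub standing in for an OpenAI-style chat response with multiple choices.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Joke #1 ..."}},
        {"message": {"role": "assistant", "content": "Joke #2 ..."}},
    ]
}

with tracer.start_as_current_span("openai.chat_completions") as span:
    span.set_attribute("llm.model", "gpt-4")
    # One event per returned choice, instead of indexed span attributes.
    for i, choice in enumerate(response["choices"]):
        span.add_event(
            "llm.completion",  # hypothetical event name
            attributes={
                "llm.openai.choice.index": i,
                "llm.openai.choice.role": choice["message"]["role"],
                "llm.openai.choice.content": choice["message"]["content"],
            },
        )
```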

Contributor Author

Unfortunately not every LLM supports this in their response. For example, Anthropic's client SDK has a separate `count_tokens` function that you pass your prompt and/or response to in order to get this information.

Perhaps this could be done as an optional attribute, since the reality is that most people are using OpenAI.

1. Data privacy concerns. End users of LLM applications may input sensitive information or personally identifiable information (PII) that they do not wish to be sent to a telemetry backend.
2. Data size concerns. Although there is no specified limit to the size of an attribute, there are practical limitations in programming languages and telemetry systems. Some LLMs allow for extremely large context windows that end users may take full advantage of.

By default, these configurations SHOULD capture inputs and outputs.
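For illustration, a minimal sketch of such a switch in an instrumentation, assuming a hypothetical `capture_content` option (none of these names are defined by this PR):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

def traced_completion(prompt: str, capture_content: bool = True) -> str:
    """Wrap an LLM call, recording prompt/completion only when enabled."""
    with tracer.start_as_current_span("llm.request") as span:
        span.set_attribute("llm.model", "gpt-4")
        if capture_content:  # the opt-out switch this section requires
            span.set_attribute("llm.prompt", prompt)
        completion = "..."  # placeholder for the real LLM call
        if capture_content:
            span.set_attribute("llm.completion", completion)
        return completion
```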
Member

Should these inputs and outputs be added as Events instead of directly to the span? They aren't directly used for querying, and Events in some systems have higher limits on attribute size.

Contributor Author

I would disagree with that. Inputs and outputs are definitely used for querying, such as:

"For a system doing text -> json, show me all groups of inputs and outputs where we failed to parse a json response"

Or:

"Group inputs by feedback responses"

Or:

"For input , show all grouped outputs"

While a backend could in theory assemble these from span events, I think it's far more likely that a tracing backend would just look for this data directly on the spans. I also don't think it fits the conceptual model for span events, as there's no meaningful timestamp to assign to this data; it'd have to be contrived or zeroed out.

Contributor

It's common for backends to have limitations on attribute length.

In addition to backend limitations, attribute values will stay in memory until spans are exported and may significantly increase OTel memory consumption.
Events have the same limitations, so logs seem the only reasonable option given verbosity and the ability to export them right away.

It's still possible to query logs/events (as long as they are in the same backend).
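A minimal sketch of the log-based approach, assuming the (experimental, underscore-prefixed) OTel Python logs API and an already-configured logger provider; the `event.name` value is illustrative:

```python
from opentelemetry._logs import LogRecord, SeverityNumber, get_logger

# Assumes the global LoggerProvider has been configured to export logs.
logger = get_logger("example-llm-instrumentation")

# Emit the completion as a log record, so the (potentially huge) content can
# be exported right away instead of being held in memory on the span.
logger.emit(
    LogRecord(
        severity_number=SeverityNumber.INFO,
        body="Why did the developer stop using OpenTelemetry? ...",
        attributes={"event.name": "llm.content.completion", "llm.model": "gpt-4"},
    )
)
```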

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.messages.<index>.role` | string | The assigned role for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
| `llm.openai.messages.<index>.message` | string | The message for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `You are an AI system that tells jokes about OpenTelemetry.` | Required |
| `llm.openai.messages.<index>.name` | string | If present, the message for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `You are an AI system that tells jokes about OpenTelemetry.` | Required |
Member

This description and example are redundant with the line above.
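For context, the indexed pattern under discussion would be populated roughly like this (a sketch; the message list is made up):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

messages = [
    {"role": "system", "content": "You are an AI system that tells jokes about OpenTelemetry."},
    {"role": "user", "content": "Tell me a joke."},
]

with tracer.start_as_current_span("openai.chat_completions") as span:
    # Flatten each request message into `llm.openai.messages.<index>.*` attributes.
    for i, msg in enumerate(messages):
        span.set_attribute(f"llm.openai.messages.{i}.role", msg["role"])
        span.set_attribute(f"llm.openai.messages.{i}.message", msg["content"])
```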

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.functions.<index>.name` | string | If present, name of an OpenAI function for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `get_weather_forecast` | Required |
| `llm.openai.functions.<index>.parameters` | string | If present, JSON-encoded string of the parameter object of an OpenAI function for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {}}` | Required |
| `llm.openai.functions.<index>.description` | string | If present, description of an OpenAI function for a given OpenAI request, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `Gets the weather forecast.` | Required |
| `llm.openai.n` | int | If present, the number of messages an OpenAI request responds with. | `2` | Recommended |
Member

If using Span Events, this won't be needed.

<!-- semconv llm.openai(tag=llm-response-tech-specific) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.role` | string | The assigned role for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
Member

We should consider using Span Events instead of "indexed" attributes here.

@nirga (Contributor) commented Nov 14, 2023

Why would span events make more sense here than attributes?

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.role` | string | The assigned role for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
| `llm.openai.choices.<index>.content` | string | The content for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
| `llm.openai.choices.<index>.function_call.name` | string | If exists, the name of a function call for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `get_weather_report` | Required |
| `llm.openai.choices.<index>.function_call.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `{"type": "object", "properties": {"some":"data"}}` | Required |
Member

Again, these could be Span Events with a `type` attribute of `function`.
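A sketch of that shape (event and attribute names here are hypothetical):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

with tracer.start_as_current_span("openai.chat_completions") as span:
    # One event per choice, with a `type` attribute distinguishing plain
    # content from function calls.
    span.add_event(
        "llm.openai.choice",
        attributes={
            "type": "function",
            "llm.openai.function_call.name": "get_weather_report",
            "llm.openai.function_call.arguments": '{"location": "Seattle"}',
        },
    )
```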

<!-- semconv llm.openai(tag=llm-response-tech-specific-chunk) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.delta.role` | string | The assigned role for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
Member

I'm not sure I completely understand the use case, but this seems like it would be an awful lot of attributes for each stream delta (really, one for every token?). Instead of having a separate set of attributes for streaming, why not just combine it with ChatCompletions with an attribute that says it was a "Stream"?

Member

I think for the short term, to get a PR approved, I'd focus this list on just ChatCompletions. Chat is deprecated and tied to older models. And it will be much simpler to start if it's just one list for both non-streaming and streaming.

Contributor Author

Do you mean the completions endpoint? I initially added a section there because (at the time) GPT-3.5-turbo-instruct was added. The docs are a little confusing, though, as the endpoint is considered legacy, but the model is quite new.

Happy to remove it for now, though.

@morningspace commented Nov 13, 2023

Hi @cartermp @drewby, I happened to come across this PR. I see there are vendor-specific conventions for OpenAI, Anthropic, etc. Just curious whether it would also cover watsonx, in case there's anything specific?

@drewby (Member) commented Nov 13, 2023

> Hi @cartermp @drewby, I happened to come across this PR. I see there are vendor-specific conventions for OpenAI, Anthropic, etc. Just curious whether it would also cover watsonx, in case there's anything specific?

We should push as much as possible to find a common set of attributes. But if you look at other areas like the Database semantic conventions, there is a pattern for including vendor-specific additions that build on the core set. So yes, I'd expect some specific conventions for OpenAI, watsonx, etc.

For this PR, I'd focus on a small set to start, and we can add more via further PRs. It will be at the "Experimental" level, so changes are to be expected.

@cartermp (Contributor Author)

Yeah, I'd prefer to keep the scope smaller here. As far as I'm aware, once you're past OpenAI/Anthropic/Cohere there are very few end users for other commercial options. Open source is trickier, since a fine-tuned model can emit just about anything in any format, so the generic attributes are about as good as we can get for now.

@cartermp (Contributor Author)

@drewby Feel free to PR against my branch! I have time to address things and get this over the hump, but the more contributions, the better 🙂


## Configuration

Instrumentations for LLMs MUST offer the ability to turn off capture of raw inputs to LLM requests and the completion response text for LLM responses. This is for two primary reasons:
Contributor

In other semconvs we control this with the Opt-In requirement level.

Opt-in attributes are always off by default, and instrumentations MAY provide configuration.
Given the privacy, verbosity, and consistency concerns, I believe we should do the same here.

@lmolkova (Contributor) left a comment

I have concerns around:

  • capturing extensive amounts of data by default
  • fitting it into potentially strictly limited attribute values
  • capturing sensitive data (by default)
  • capturing contents - we never capture contents of HTTP requests/responses, DB responses (even queries are controversial), messaging payloads, etc., and we do not have a good approach for this in OTel

I suggest starting with the noncontroversial parts that do not include prompts/completions, and then evolving it to potentially include contents.

JFYI: we've been baking something around Azure OpenAI that's consistent with the current stuff in OTel semconv in case you want to take a look - https://github.com/open-telemetry/semantic-conventions/pull/513/files

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.model` | string | The name of the LLM a request is being made to. If the LLM is supplied by a vendor, then the value must be the exact name of the model used. If the LLM is a fine-tuned custom model, the value SHOULD have a more specific name than the base model that's been fine-tuned. | `gpt-4` | Required |
| `llm.prompt` | string | The full prompt string sent to an LLM in a request. If the LLM accepts a more complex input like a JSON object made up of several pieces (such as OpenAI's different message types), this field is that entire JSON object encoded as a string. | `\n\nHuman:You are an AI assistant that tells jokes. Can you tell me a joke about OpenTelemetry?\n\nAssistant:` | Required |
Contributor

Given the verbosity, and that it can contain sensitive and private data, this attribute should be opt-in.
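Combining this table with the opt-in suggestion, an instrumentation might look roughly like this (the environment variable is a made-up example of an opt-in mechanism):

```python
import os

from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

# Hypothetical opt-in switch: content attributes stay off unless the user
# explicitly enables them, per the Opt-In requirement level.
CAPTURE_CONTENT = os.getenv("EXAMPLE_LLM_CAPTURE_CONTENT", "false") == "true"

prompt = "\n\nHuman: Tell me a joke about OpenTelemetry.\n\nAssistant:"
with tracer.start_as_current_span("llm.request") as span:
    span.set_attribute("llm.model", "gpt-4")  # Required
    if CAPTURE_CONTENT:  # Opt-In
        span.set_attribute("llm.prompt", prompt)
```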

<!-- semconv ai(tag=llm-response) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.completion` | string | The full response string from an LLM. If the LLM responds with a more complex output like a JSON object made up of several pieces (such as OpenAI's message choices), this field is the content of the response. If the LLM produces multiple responses, then this field is left blank, and each response is instead captured in an attribute determined by the specific LLM technology semantic convention for responses. | `Why did the developer stop using OpenTelemetry? Because they couldn't trace their steps!` | Required |
Contributor

For the same reasons as prompt, this should be opt-in (and probably an event/log).

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended |
| `llm.openai.id` | string | The unique identifier for the chat completion. | `chatcmpl-123` | Recommended |
| `llm.openai.created` | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | Recommended |
| `llm.openai.model` | string | The name of the model used for the completion. | `gpt-3.5-turbo` | Recommended |
Contributor

Shouldn't this be covered by `llm.model` and thus not be necessary?

| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.delta.function_call.arguments` | string | If exists, the arguments to call a function call with for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `{"type": "object",` | Required |
| `llm.openai.choices.<index>.finish_reason` | string | The reason the OpenAI model stopped generating tokens for this chunk. | `stop` | Recommended |
| `llm.openai.id` | string | The unique identifier for the chat completion. | `chatcmpl-123` | Recommended |
| `llm.openai.created` | int | The UNIX timestamp (in seconds) of when the completion was created. | `1677652288` | Recommended |
Contributor

This would be a good timestamp for the log/event.

<!-- semconv llm.openai(tag=llm-response-tech-specific-chunk) -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `llm.openai.choices.<index>.delta.role` | string | The assigned role for a given OpenAI response, denoted by `<index>`. The value for `<index>` starts with 0, where 0 is the first message. | `system` | Required |
Contributor

Instead of representing the whole response as one span, would it perhaps be better to represent each completion as an individual span and avoid having indexed attributes?
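A sketch of that per-completion shape (span and attribute names are illustrative):

```python
from opentelemetry import trace

tracer = trace.get_tracer("example-llm-instrumentation")

choices = [
    {"role": "assistant", "content": "Joke #1 ..."},
    {"role": "assistant", "content": "Joke #2 ..."},
]

# One parent span for the request and one child span per completion choice,
# so no `<index>` ever appears in attribute names.
with tracer.start_as_current_span("openai.chat_completions"):
    for choice in choices:
        with tracer.start_as_current_span("llm.openai.choice") as child:
            child.set_attribute("llm.openai.choice.role", choice["role"])
            child.set_attribute("llm.openai.choice.content", choice["content"])
```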

linux-foundation-easycla bot commented Nov 15, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@cartermp (Contributor Author)

@drewby I pulled in yours and @lmolkova's work. Looks like you need to sign the CLA though!

@drewby (Member) commented Nov 15, 2023

> @drewby I pulled in yours and @lmolkova's work. Looks like you need to sign the CLA though!

Signed.

@sudivate commented

I would also like to request a review from @mikeldking from Arize. The Arize team started with the Open-Inference Spec initiative, which includes Semantic Conventions for Traces.

@mikeldking commented

> I would also like to request a review from @mikeldking from Arize. The Arize team started with the Open-Inference Spec initiative, which includes Semantic Conventions for Traces.

Thanks for the nomination @sudivate - this initiative is very much something we've been looking for, and we have a fair amount of learnings from our implementation of the OpenInference semantic conventions. I'll follow along and try to give informed feedback as I see it. Exciting progress!

@morningspace commented

Hi @drewby, also others: I saw you mentioned adding metrics to this PR, but they're specific to OpenAI. Generally, I thought the conventions for metrics, just like the tracing in this PR, need to be categorized into common stuff plus vendor-specific stuff. Will this be updated later?

Besides that, I originally thought this PR was mainly for tracing, but now that I've seen the metrics for OpenAI added, will this PR also cover metrics?

@cartermp (Contributor Author) commented Dec 9, 2023

Unfortunately (as evidenced by my activity here), I don't really have the time/space to make reasonable progress on this PR anymore. @drewby @lmolkova please feel free to take over or start anew. I'm more than happy to offer a drive-by review.

@nirga (Contributor) commented Dec 9, 2023

@cartermp would love to take this over

@cartermp (Contributor Author) commented Dec 9, 2023

@nirga go for it! I don't have any staged changes, so feel free to carry on from here. My main TODO was to redefine the request/response as logs.

@drewby (Member) commented Dec 11, 2023

> @cartermp would love to take this over

@nirga, would a call make sense to sync up on scope for this? We may also want to have more discussion in a Slack thread in the SIG channel for semantic conventions.

I'm normally in Japan time, but will be in the US for two weeks starting 12/14 and will have more time through the end of the year.

@drewby (Member) commented Dec 11, 2023

> Hi @drewby, also others: I saw you mentioned adding metrics to this PR, but they're specific to OpenAI. Generally, I thought the conventions for metrics, just like the tracing in this PR, need to be categorized into common stuff plus vendor-specific stuff. Will this be updated later?
>
> Besides that, I originally thought this PR was mainly for tracing, but now that I've seen the metrics for OpenAI added, will this PR also cover metrics?

We could focus a PR on tracing first, but it would also be useful to have some common data model / semantic conventions for metrics.

@nirga (Contributor) commented Dec 11, 2023

@drewby I'll ping you on Slack

<!-- endsemconv -->


### Metric: `llm.openai.chat_completions.duration`
Member

I did not see that chat_completions has a duration attribute at https://platform.openai.com/docs/api-reference/chat/object; am I missing anything?
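Presumably the duration is not part of the API response at all, but is measured client-side by the instrumentation. A sketch of recording such a histogram with the OTel Python metrics API:

```python
import time

from opentelemetry import metrics

meter = metrics.get_meter("example-llm-instrumentation")
duration_histogram = meter.create_histogram(
    "llm.openai.chat_completions.duration",
    unit="s",
    description="Duration of OpenAI ChatCompletions requests.",
)

start = time.monotonic()
# ... perform the ChatCompletions call here ...
duration_histogram.record(
    time.monotonic() - start,
    attributes={"llm.model": "gpt-3.5-turbo"},
)
```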

@nirga mentioned this pull request on Jan 12, 2024
@nirga (Contributor) commented Jan 12, 2024

@drewby @lmolkova @gyliu513 @mikeldking and others I might have missed -
I'm continuing @cartermp's great work in #639. Let's get this merged :)

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions bot added the Stale label on Jan 28, 2024
github-actions bot commented Feb 4, 2024

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions bot closed this on Feb 4, 2024
@arminru (Member) commented Feb 5, 2024

^ continued in #639

@nirga mentioned this pull request on Mar 19, 2024
@nirga (Contributor) commented Apr 19, 2024

FYI, a first version of this is now merged with #825

Successfully merging this pull request may close these issues.

Introduce semantic conventions for modern AI (LLMs, vector databases, etc.)