Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename/replace (client|server).socket.(address|port) attributes with network.(peer|local).(address|port). #342

Merged
merged 30 commits into from
Oct 10, 2023

Conversation

trask
Copy link
Member

@trask trask commented Sep 25, 2023

Renames/replaces (client|server).socket.(address|port) attributes with network.(peer|local).(address|port).

Motivation:

  • Moving these under network.* makes it clear that these describe the network connection.

  • Renaming these to peer and local makes it extra clear that these describe the direct network peer connection. I think enough so that the extra notes are no longer needed:

    When observed from the client side, this SHOULD represent the immediate server peer address.
    When observed from the server side, this SHOULD represent the physical server address.

    When observed from the client side, this SHOULD represent the immediate server peer port.
    When observed from the server side, this SHOULD represent the physical server port.

Removes server.socket.domain. This was for modeling proxies, but this use case could be addressed after stability (e.g. possibly with proxy.*.

Note: I don't think schema transformation is possible for this change.

cc @lmolkova @Oberon00 @AlexanderWert

@trask trask requested review from a team September 25, 2023 04:13
@AlexanderWert
Copy link
Member

... but it does not address the duplication issue ...

Is avoiding duplication a strict guideline that is written down somewhere? I'm asking because I have the feeling that this guideline contributes to some of the confusion in this context of semantic conventions.

I think it also helps by making a slightly clearer separation between the logical client./server. attributes and the physical network.client./network.server. attributes.

I really like that clarity about separation between the logical and physical connection. BUT it gets mixed up and unclear, once we apply the guideline to avoid duplication and have this: network.server.address is recommended but only if different than server.address (and same with client). So, for instrumentation logic and for end users one piece of information (in one attribute e.g. network.server.address) depends on the availability of another piece of information (in another attribute).

Concrete example:

A web-server instrumentation retrieves the IP from the incoming request. Now, where to put that IP, client.address or network.client.address ? Well, it depends on whether the request also contains a Forwarded-For header, right?

So, I'm just wondering how problematic it would actually be if we would just consistently set client/server.* attributes to the logical connection levels, and network.client/server.* attributes to the physical connection level, disregarding the duplication concern?
Does avoiding duplication needs to be a strict rule if there are legitimate semantical reasons for doing so?

Not blocking this PR, this was just on my mind when reflecting the discussions we had around this specific context.

@lmolkova
Copy link
Contributor

lmolkova commented Sep 25, 2023

So, I'm just wondering how problematic it would actually be if we would just consistently set client/server.* attributes to the logical connection levels, and network.client/server.* attributes to the physical connection level, disregarding the duplication concern?
Does avoiding duplication needs to be a strict rule if there are legitimate semantical reasons for doing so?

I believe the guidance for instrumentations today is:

  • server.address: if there is a DNS, use it, otherwise use IP
  • server.socket.address: if socket info is available, check if it's not the same as server.address and (if not) set the attribute value
  • client.address: if there is a forwarded header, use it, otherwise use direct IP
  • client.socket.address: if socket info is available, check if it's not the same as client.address and then (if not) set the attribute value

I.e. server \ client are not logical attributes, but best known - either logical or physical.
Consumers can always use server \ client as the primary source of information and consider *.socket.* attributes as an extra detail.

This way there is no duplication problem and it's still simple to use.

@trask
Copy link
Member Author

trask commented Sep 25, 2023

I.e. server \ client are not logical attributes, but best known - either logical or physical.

good point 👍

@lmolkova
Copy link
Contributor

lmolkova commented Sep 25, 2023

I'd like to suggest the following:

  • the only socket attribute we have to figure out prior to HTTP stability is server.socket.address since it's the only one used on metrics
  • we can still rename server.socket.address to server.ip attribute to describe the best known IP - either of the server or a proxy (it's rarely possible to know the actual IP of the server anyway)
  • we can remove all other *.socket.* attributes from HTTP semconv (or all semconv) and they can be added later after stability
    • I believe we need to model them better with symmetrical protocols and see how things will play out with proxies

[EDIT]

Another approach is that we keep physical attributes (with whatever names) and start rendering attribute stability explicitly identifying physical socket attributes as experimental (on spans).

@lmolkova
Copy link
Contributor

lmolkova commented Sep 25, 2023

I can also imagine a future when we won't want physical connection attributes on HTTP spans. E.g.:

  • we have a span per connection measuring connection duration and capturing it's information and how it ends
  • we link HTTP spans to connection spans

@trask
Copy link
Member Author

trask commented Sep 26, 2023

we can still rename server.socket.address to server.ip attribute to describe the best known IP - either of the server or a proxy (it's rarely possible to know the actual IP of the server anyway)

for some reason I'm thinking of server.ip as the IP address of the best known server. the distinction being whether server.ip should be an IP address for server.address, or if it's ok for server.ip to be captured from the socket connection (in which case it may be the IP address of a proxy server)

@trask
Copy link
Member Author

trask commented Sep 26, 2023

I liked @lmolkova's suggestion above (and I think @Oberon00 made similar previously) about network.local.* and network.peer.*.

I put together a new proposal in diagrams: https://gist.github.com/trask/711f91feda06115353e4e56cfebedf5d

a few notes

  • it includes network.local.* and network.peer.*
  • it includes proxy.* (but this would only be layered on by instrumentation which is proxy-aware)
  • it includes network.(local|peer).port even if it's a duplicate of (client|server).port because I thought it felt right to capture network layer address/port pairs together
  • it does not include network.(local|peer).* if there is no corresponding "best known" client/server and so "best known" client/server is captured at the network layer

I think duplication is an orthogonal concern, so would be good to get thoughts on both the modeling, and separately on the duplication.

@trask trask force-pushed the alt-network branch 2 times, most recently from 950c3f8 to a45beec Compare September 26, 2023 15:34
docs/http/http-metrics.md Outdated Show resolved Hide resolved
docs/rpc/rpc-metrics.md Outdated Show resolved Hide resolved
model/metrics/rpc-metrics.yaml Outdated Show resolved Hide resolved
@trask trask changed the title Rename/replace (client|server).socket.(address|port|domain) attributes with network.(client|server).(address|port|ip). Rename/replace (client|server).socket.(address|port) attributes with network.(peer|local).(address|port). Sep 26, 2023
@trask
Copy link
Member Author

trask commented Sep 26, 2023

@lmolkova @Oberon00 @AlexanderWert I updated the PR, and the title and description, ptal, thx!

model/proxy.yaml Outdated Show resolved Hide resolved
model/trace/http.yaml Outdated Show resolved Hide resolved
@trask
Copy link
Member Author

trask commented Sep 27, 2023

I'm thinking that maybe it's not worth trying to incorporate client.ip and server.ip into the HTTP semconv general modeling.

I think server.ip (when captured) should be a resolved IP address for server.address, which means we wouldn't want to use server.ip like in #321 (comment):

image

or on the client side to capture server.ip from the network connection, which could point to a forward proxy and not be a resolved IP address for server.address.

I think client.ip and server.ip can still be brought into OpenTelemetry, but maybe they will have less significance in OpenTelemetry if we go forward with network.peer.address and network.local.address as proposed in this PR.

Which brings us to the question of whether we should move forward with this PR at all, since one of the drivers for #321 was to try to incorporate client.ip and server.ip.

I still like a few things about this PR:

  • moving all the network-layer stuff under network.* making a clear separation. network.* is now the place to go when you need to dig into that layer (and is the stuff you can generally ignore otherwise 😅).
  • I didn't like the peer term when it was describing the stuff that you need to care about even before going down to the network layer. But I don't mind it when it's used only for these low-level details. In fact, I maybe even like it, since it lends extra precision to these low-level details (and again, I can generally ignore it 😅).

@AlexanderWert
Copy link
Member

I'm thinking that maybe it's not worth trying to incorporate client.ip and server.ip into the HTTP semconv general modeling.
...
I think client.ip and server.ip can still be brought into OpenTelemetry, but maybe they will have less significance in OpenTelemetry if we go forward with network.peer.address and network.local.address as proposed in this PR.

I think that's a valid point for HTTP semconv!
As long as we are not blocking that client/server.ip can be added to OTel attributes registry (and be used for other use cases in the future), a +1 from me on your proposal @trask !
I think, once we dive deeper into more specific logging and security scenarios with OTel, client/server.ip will be needed as attributes, but agree that with your proposal here they are not needed for HTTP semconv.

I still like a few things about this PR:

  • moving all the network-layer stuff under network.* making a clear separation. network.* is now the place to go when you need to dig into that layer (and is the stuff you can generally ignore otherwise 😅).
  • I didn't like the peer term when it was describing the stuff that you need to care about even before going down to the network layer. But I don't mind it when it's used only for these low-level details. In fact, I maybe even like it, since it lends extra precision to these low-level details (and again, I can generally ignore it 😅).

Also +1 on moving to network.* instead of client/server.socket.*. I agree with both of your points above!

@trask
Copy link
Member Author

trask commented Oct 3, 2023

any thoughts how this can/will play out on logs where there is no concept of which side we are observing from, e.g. https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model-appendix.md#apache-http-server-access-log?

For some logs, e.g. Apache Server access logs (e.g. open-telemetry/opentelemetry-specification#3712), network.(peer|local).* seems ok for network-layer attributes since the logs are "server" access logs, and clearly from the perspective of the server side.

  • %a - network.peer.address
  • %A - network.local.address
  • %h - client.address

But what if we want to log HTTP requests from a reverse proxy, e.g. in this picture:

reverse-proxy

The reverse proxy could report an HTTP log record with:

client.address:        101.102.103.104
client.port:           50101

server.address:        server.io
server.port:           9876

But the reverse proxy couldn't report its direct network peers (at least not both on that same HTTP log record):

network.peer.address:  101.102.103.104
network.peer.port:     50101

network.peer.address:  10.11.12.13
network.peer.port:     5678

Whereas if we went with network.(server|client).* instead of network.(peer|local).*, it could emit an HTTP log record with:

client.address:        101.102.103.104
client.port:           50101

server.address:        server.io
server.port:           9876

network.client.address:  101.102.103.104
network.client.port:     50101

network.server.address:  10.11.12.13
network.server.port:     5678

But then again, other network.* values could be different between the client-side connection and the server-side connection:

  • network.transport
  • network.type
  • network.protocol.name
  • network.protocol.version

so maybe it's good that network.(peer|local).* very specifically limits us to a single network connection, and we have to find other ways (and other attributes) to model more complex scenarios as needed.

@trask
Copy link
Member Author

trask commented Oct 3, 2023

Trying to summarize:

  • Moving these (to somewhere) under network.* makes it clear that these describe the network connection.
  • These are only populated when (additionally) instrumenting the network connection (as opposed to client.*, server.*, source.*, and domain.*, which will continue to be captured by higher-level instrumentation)

These are the two primary options for where to put them under network.*:

  • network.(peer|local).*
  • network.(client|server|source|destination).*

Soft preference for network.(peer|local).* because

  • It's less likely to be confused with the high-level (client|server|source|destination).* attributes
  • It's extra clear that it only models (only) a direct peer connection

Open questions:

  • Does option 1 cause a problem for (e.g. HTTP) logs? Logs don't (currently) have an attribute like SpanKind to know which side they are observed from. The high-level attributes are modeled as (client|server|source|destination).*, but it's not clear whether this is beneficial for the network layer attributes.
  • In option 2, would we want all four namespaces network.(client|server|source|destination).*? Can lower-level network instrumentation know the difference to choose between client/server or source/destination?

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of peer and local.

@reyang
Copy link
Member

reyang commented Oct 3, 2023

I like the use of peer and local.

+1

@AlexanderWert
Copy link
Member

Does option 1 cause a problem for (e.g. HTTP) logs? Logs don't (currently) have an attribute like SpanKind to know which side they are observed from. The high-level attributes are modeled as (client|server|source|destination).*, but it's not clear whether this is beneficial for the network layer attributes.

@trask I think that's fine! I think the logical addresses are more important for logs (e.g. access logs). And depending on certain concrete use cases in the future (e.g. we would explicitly define / document semantic conventions for NGINX access logs), we can still overwrite / precise the explicit meaning of network.peer in that context.

@Oberon00
Copy link
Member

Oberon00 commented Oct 4, 2023

we can still overwrite / precise the explicit meaning of network.peer in that context.

IMHO, the network.peer/local attributes should be so low-level and technical that we should never (need to) override them. Specific instructions for how to get these pieces of information (e.g. formatter code, getter to get to the socket, etc.) are fine but even slightly overwriting the meaning should be a no-go for these IMHO.

@joaopgrassi
Copy link
Member

joaopgrassi commented Oct 6, 2023

But the reverse proxy couldn't report its direct network peers (at least not both on that same HTTP log record):

network.peer.address:  101.102.103.104
network.peer.port:     50101

network.peer.address:  10.11.12.13
network.peer.port:     5678

@trask maybe it's just me not getting it, but in the diagrams you mentioned before https://gist.github.com/trask/711f91feda06115353e4e56cfebedf5d, isn't the last scenario exactly this one? In there you have network.local.address|port describing the server the reverse proxy is talking to and in network.peer.address|port you have the proxy itself. Isn't that modeling all scenarios for logs you mentioned?

Image for context:
image

@AlexanderWert
Copy link
Member

IMHO, the network.peer/local attributes should be so low-level and technical that we should never (need to) override them. Specific instructions for how to get these pieces of information (e.g. formatter code, getter to get to the socket, etc.) are fine but even slightly overwriting the meaning should be a no-go for these IMHO.

@Oberon00 The meaning of network.peer/local is inherently depending on the context perspective, right? In a tracing scenario it's clear because we have the SpanKind that tells us what local is and what peer is.
But, what about a scenario where we report two network connections in a single log entry (e.g. access logs on a reverse proxy (NGINX, etc.), or a load balancer, where we have the incoming request / connection and the downstream connection where the request is being forwarded to). It would be not clear what is local and what is peer. So in this scenario specifying the concrete meaning of network.peer/local is a MUST. So by overwriting I mean specifying which of the two network connections the network.peer/local attributes belong to.

@Oberon00
Copy link
Member

Oberon00 commented Oct 6, 2023

You are right, I did not think of a situation where multiple network connections are involved in a single telemetry item (could also happen with spans).

@trask
Copy link
Member Author

trask commented Oct 9, 2023

specifying which of the two network connections the network.peer/local attributes belong to.

just wanted to mention that this extends beyond network.peer/local.* and is an issue for all network.* attributes (e.g. one network connection could be over ipv4 and the other over ipv6)

@trask
Copy link
Member Author

trask commented Oct 9, 2023

@trask maybe it's just me not getting it, but in the diagrams you mentioned before https://gist.github.com/trask/711f91feda06115353e4e56cfebedf5d, isn't the last scenario exactly this one? In there you have network.local.address|port describing the server the reverse proxy is talking to and in network.peer.address|port you have the proxy itself. Isn't that modeling all scenarios for logs you mentioned?

this diagram shows what's reported from the server (as opposed to what's reported from reverse proxy). the server only has a single network connection (to the reverse proxy).

if you were emitting telemetry from the reverse proxy itself, you'd have two network connections to deal with (one to the client, and one to the server), and in that case it's not clear which one network.peer.address refers to

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.