Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow field-presence to be used in OTLP #336

Closed
wants to merge 2 commits into from

Conversation

jsuereth
Copy link
Contributor

@jsuereth jsuereth commented Oct 4, 2021

DO NOT MERGE UNTIL GOGOPROTO SOLUTION IS FOUND

Update the build to allow field presence to be used in OTLP.

NOTE: GoGoProto does not support optional field presence. Investigating how long adding that would take, but it would necessitate forking GoGo proto.

@jsuereth
Copy link
Contributor Author

jsuereth commented Oct 4, 2021

cc @jack-berg This is also blocking min/max on histogram using optional. Same fix for builds for non-go libraries

Copy link
Member

@tigrannajaryan tigrannajaryan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on an experimental capability of protoc. My understanding is that capability can be removed/changed anytime and thus does not belong to a stable protocol specification.

The official definition of the optional tag is the following:

When a message is parsed, if it does not contain an optional element, the corresponding field in the parsed object is set to the default value for that field.

My reading of this is that the missing optional field cannot be distinguished from a present field that has the default value. So, if I understand correctly this proposal cannot solve #336 regardless of what Gogo proto does.

@@ -442,7 +442,7 @@ message HistogramDataPoint {
// Negative events *can* be recorded, but sum should not be filled out when
// doing so. This is specifically to enforce compatibility w/ OpenMetrics,
// see: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#histogram
double sum = 5;
optional double sum = 5;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity (and not having read any prior context), is there a semantic meaning to optional fields in OTLP? i.e., why would we allow SDKs to not populate this, given it can provide value? Or if it doesn't provide value why do we not just deprecate it for removal?

If it's about some fields being possibly useful if the backend understands it, then it's a concept I haven't seen before in OTel, which generally leans toward sending as much data as possible. If it is about that sort of efficiency though, I think we first need some sort of negotiation protocol defined, where the required fields (and semantic attributes!) are negotiated beforehand before sending - that would then make this sort of optionality make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this one is about Prometheus compatibility. Prometheus histogram sum only allows monotonic sums, and OpenTelemetry can compute histograms with non-monotonic sums. This bit of OTLP only exists for prometheus compatibility, specifically knowing that sum-in-histogram can be a sum-in-prometheus. There are alternative ways to solve this that were discarded. If you read the description of the proto, this change merely brings this in line with what was decided for the v0.9 version of OTLP.

Copy link

@anuraaga anuraaga Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I figured it's something like that - but since SDKs currently don't know if their data will end up in Prometheus, I think either 1) they always fill it or 2) the ability to negotiate what they are required to fill is needed, isn't it?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsuereth Would be good to get input on this point about OTLP negotiation. I am curious on whether optional fields really make sense for OTLP given there would probably need to be some out-of-band way of telling what needs to be sent, which may negate the need for optional fields in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We chatted a bit offline about this.

The Sum field was specified as optional, we just didn't encode that in the proto correctly. The current specification states that sum will only be filled out when it is monotonic (and positive) as required for prometheus compatibility. We may alllow non-monotonic sums, but marking it optional does NOT fix that issue, that's a separate concern.

The note above this line is basically describing how negative events can be recorded (i.e. if you wire a Gauge or UpDownCounter up to this form of histogram, you cannot report sum in OTLP right now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTLP negotiation

Just to add some colour to this.

By negotiation I assume you mean the receiver giving some instructions to the sender. This would be a significant shift in what OTLP today is and how it is used. It is essentially a uni-directional protocol from logical data flow perspective (plus acks and flow control in the opposite direction). We can add the opposite direction of data flow to OTLP, however the problem is that the entire Collector is built with a single direction of data flow in mind. It won't be an easy change to allow negotiations to happen between SDKs and backends if there is an intermediary Collector present.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had misinterpreted the field as "only fill when backend is Prometheus", not "only fill for monotonic metrics, because a backend may be Prometheus" which seems to be the case. I don't think there's any need for negotiation for the latter and it keeps things simpler.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Oct 4, 2021

A possible alternate solution is to add a bit to flags to indicate whether the sum is 0 or unassigned. (Does it need to be different from FLAG_NO_RECORDED_VALUE?). We will need to think through the backward compatibility aspect of this too since metrics proto is considered stable.

@jsuereth
Copy link
Contributor Author

jsuereth commented Oct 4, 2021

@tigrannajaryan As of this v3.15, field presence is part of protocol buffers (no longer experimental), see these notes.

A possible alternate solution is to add a bit to flags to indicate whether the sum is 0 or unassigned.

We'd actually like to do better, but that PR was rejected for various reasons, see: #320

As a quick note, two important points:

  1. The optional bit is not the important part to the fix. I just needed an optional to demonstrate this issue.
  2. The important aspect is whether or not we CAN rely on field-presence to evolve our protocol (see the Min/Max additions to histogram). This is the most important aspect of this PR and it's blocked on gogo-proto.

@tigrannajaryan
Copy link
Member

Does this need to be distinguishable in the Collector (e.g. converting from OTLP to Prometheus)? We won't be able to implement it since Gogoproto does not support it.

The important aspect is whether or not we CAN rely on field-presence to evolve our protocol (see the Min/Max additions to histogram). This is the most important aspect of this PR and it's blocked on gogo-proto.

Short-term I think we can't since we rely on Gogoproto in the Collector. Long term, probably we can, assuming we can move away from Gogoproto.

@jsuereth
Copy link
Contributor Author

jsuereth commented Oct 4, 2021

@tigrannajaryan You missed the collector SiG where @bogdandrutu and I discussed this. I'm not willing to make this change until we have a soution for gogoproto. I'm investigating updating goproto itself (this change actually isn't too hard to add, it's just impossible to submit + merge a PR to gogoproto, so we'd need to fork/vendor)

@tigrannajaryan tigrannajaryan self-requested a review October 4, 2021 14:25
@tigrannajaryan
Copy link
Member

@jsuereth SGTM. I removed my block. Let's see if there is a solution with gogoproto.

@jmacd
Copy link
Contributor

jmacd commented Oct 5, 2021

See #279.
Is it worth waiting so that 0.11 contains both the exponential histogram and min/max? Requires this change first.

@jsuereth
Copy link
Contributor Author

jsuereth commented Oct 5, 2021

@jmacd Might be worth discussing offline. Any solution for Go (likely) requires vendoring GoGo proto for a patch allowing optional or creating our own proto generation plugin with go-specific optimsiations from GoGo proto. I'm investigating how big that would be right now.

@anuraaga
Copy link

anuraaga commented Oct 13, 2021

Just curious, was there any consideration given to using a one-field oneof instead of optional? FWIU, they are equivalent (and optional is even implemented in proto reflection almost identical to a oneof to avoid breaking reflection), except the generated code and proto syntax for optional is a bit smaller. For example, the generated code for Metric.java already has methods like hasGauge(), which is why we'd want optional, but in addition has an enum DataCase to allow switch, which isn't needed for a single field.

It does have the advantage of working with any proto3 implementation out there, not just modern ones so I think it deserves some consideration if it hasn't been given yet. Taking advantage of modern features where helpful is of course a fine reason to use optional though if consideration has been given. FWIW, switching from a single-field oneof to optional later would be compatible except for if anyone used the *Case generated code for some reason (the oneof names could follow some internal-only naming scheme to discourage this). At least in Java, not sure about other languages but I'd expect them to act the same.

@tigrannajaryan
Copy link
Member

FWIU, they are equivalent (and optional is even implemented in proto reflection almost identical to a oneof to avoid breaking reflection), except the generated code and proto syntax for optional is a bit smaller.

If I remember correctly In Go oneof is slower than a regular optional field. It generates an additional indirection via an interface, which results in one extra allocation.

@jsuereth
Copy link
Contributor Author

Quick update on this:

  1. I have a version of GoGo proto that seemingly creates optional primitive types
  2. Based on @tigrannajaryan's comment, this one does NOT export optional as a oneof field. As @anuraaga points out, if all you do to gogoproto is flag that is supports proto3 optional, you get them encoded like oneof fields, and the behavior is correct, but wrapped in an additional struct. While I'm still working on tests (go is not one of my strengths), here's an example of the generated code . Is there anything obviously wrong here? The serialization library appears to handle the field-tag oneof appropriately for the raw primitive. Still working on more tests to make sure this covers corner cases well.

@codeboten
Copy link
Contributor

Update on this, I believe we have a working solution for adding optional fields support in the collector. Please take a look at open-telemetry/opentelemetry-collector#4750 and add comments there if there are any concerns.

Note: that PR was using the proto from #279, but the same work can be used to support this PR.

@bogdandrutu
Copy link
Member

@codeboten we really have a good solution now correct?

@codeboten
Copy link
Contributor

@bogdandrutu yes the current fix in open-telemetry/opentelemetry-collector#4929 allows the collector to continue using the proto generation moving forward. This is good to move forward 👍

@jmacd
Copy link
Contributor

jmacd commented Mar 4, 2022

Close in favor of #366.

@jmacd jmacd closed this Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot distinguish Histogram's Sum field between 0 vs not assigned.
6 participants