OTLP/HTTP: Retry or no on status code 401, 403? #2915
Comments
I think this needs to be discussed first. An argument can be made that only responses that indicate that the request data is wrong should be permanent (since retrying it will never succeed). Everything else seems useful to retry using our usual backoff strategy. For example auth failures may be because of misconfiguration on the server side which can be fixed over time and a retry may succeed. It is not obvious to me that dropping the data is more useful than retrying it in this case. |
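To make the position in the comment above concrete, here is a minimal sketch of a policy where only "the request data is wrong" responses are permanent and everything else, including 401 and 403, is retried with backoff. The choice of 400 as the only permanent code, the names `should_retry` and `backoff_delays`, and the backoff parameters are all assumptions for illustration; the thread does not define what "our usual backoff strategy" is.

```python
# Hypothetical policy: only responses that say the data itself is bad are permanent.
PERMANENT_STATUS_CODES = frozenset({400})  # assumption: 400 means malformed request data


def should_retry(status_code: int) -> bool:
    """Return True unless the response is a success or a permanent failure."""
    if 200 <= status_code < 300:
        return False  # success, nothing to retry
    return status_code not in PERMANENT_STATUS_CODES


def backoff_delays(initial: float, multiplier: float, maximum: float, attempts: int) -> list:
    """Exponential backoff delays in seconds, capped at `maximum` (no jitter, for clarity)."""
    return [min(initial * multiplier ** i, maximum) for i in range(attempts)]
```

Under this policy an auth failure is retried on the same schedule as a 503, which is exactly the behavior the issue asks to change.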
I think this discussion belongs to the spec, moving there. |
@open-telemetry/specs-approvers any thoughts on this? |
Related to #2217. Without clarification from the spec, Java decided that 429, 502, 503, and 504 are retryable, as seen here. We've left the retry feature in Java unstable due to this lack of specificity (and also #1742), which really is a shame because retry is super important for production use cases. |
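For comparison, the allowlist approach described in the comment above can be sketched as follows. The four status codes come from the comment; the names `RETRYABLE_STATUS_CODES` and `is_retryable` are hypothetical, and this is written in Python for illustration rather than being the Java implementation.

```python
# Allowlist of retryable HTTP status codes, as listed in the comment above.
RETRYABLE_STATUS_CODES = frozenset({429, 502, 503, 504})


def is_retryable(status_code: int) -> bool:
    """Return True only for status codes on the explicit allowlist."""
    return status_code in RETRYABLE_STATUS_CODES
```

With an allowlist, 401 and 403 are not retried by default, since anything not listed is treated as permanent.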
gRPC retry is largely based on the grpc status code (not http status code) returned.
gRPC clients can dictate a retry policy as described here. |
Doesn't answer my question - the status codes are conceptually similar, and the approach should be the same across transports.
Also doesn't answer my question. Yes, the client can do it, but why make it a client (distributed) problem instead of a server (centralized) problem? Server should know better anyway if the request is retryable. |
For the context, the spec today says:
For OTLP/gRPC we have a more explicit table that lists every possible gRPC response code and tells whether it is retryable. We can introduce a similar table for OTLP/HTTP and borrow from OTLP/gRPC's equivalent lines (e.g. UNAUTHENTICATED in gRPC corresponds to HTTP 401). I still wanted to discuss this to make sure we agree this is the right approach. I am not certain that it is. |
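A rough sketch of what such an OTLP/HTTP table could look like if it borrowed rows from gRPC equivalents. Only the UNAUTHENTICATED/401 pairing comes from the comment above; the other two pairings, every retryable flag, and the names `Row`, `HTTP_TABLE`, and `lookup` are assumptions that would need to be checked against the OTLP/gRPC table in the spec.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    grpc_code: str   # gRPC status code this HTTP status is borrowed from
    retryable: bool  # assumed retryability, to be verified against the spec


# Hypothetical, deliberately incomplete table keyed by HTTP status code.
HTTP_TABLE = {
    401: Row("UNAUTHENTICATED", False),    # pairing mentioned in the thread
    403: Row("PERMISSION_DENIED", False),  # assumed pairing
    503: Row("UNAVAILABLE", True),         # assumed pairing
}


def lookup(status_code: int) -> Optional[bool]:
    """Return the retryable flag, or None when the table has no row for the code."""
    row = HTTP_TABLE.get(status_code)
    return None if row is None else row.retryable
```

Returning `None` for unlisted codes keeps "not yet specified" distinct from "explicitly not retryable", which is the gap this issue is about.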
Do you anticipate some sort of downside from being explicit about which are retryable? |
No. We should be explicit. I had some hypothetical situations in mind that, after thinking them through, I no longer think should be considered. |
Also see #2993 |
Thanks for moving the issue to the right place. Any idea how this can be moved forward?
IMHO this may or may not be the case: if the server failed to authenticate the request because of an external dependency, say the auth server, shouldn't it be a 5xx error, e.g. 502? |
PRs to fix this are welcome. I think we can mirror what we do for OTLP/gRPC. |
Is your feature request related to a problem? Please describe.
401 and 403 indicate auth failures; typically a retry should not be attempted.
Describe the solution you'd like
Add 401 and 403 as PermanentClientFailure here.