-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sporadic 502 errors for HTTP/2 POST/PUT requests after upgrade to 2.8.x #571
Comments
So we went through three separate stages (
1. Known GoodWith 2.7 we didn't observe any particular issues. 2. Broken BehaviourWith 2.8 some of the scenarios broke. We suspect that this broken behaviour was introduced with haproxy/haproxy@f2b02cf but we don't know for sure. The observed behaviour was that for 0.1% to 1% of requests with a body the backend responded with a 200 but HAProxy detected some broken backend state and responded with a 502. My current understanding is that this was previously masked because of the way the states were checked. With 2.8 there was an over-haul of error checking. The specific case is if HAProxy still has data to send (the request body) but the server already responded with a header frame which had the 3. Fix AppliedWith haproxy/haproxy@d3e379b HAProxy still processes the response even if the server already closed the stream as the closing of the stream and the response can be a single frame. There's probably a lot more details to this but for me this explanation is sound enough to accept that it's fixed with 2.8.4. The error reporting is still broken but that seems to depend on a pending refactor. |
I'd say for the purpose of this issue it should be sufficient to bump to 2.8.4 and close it as the traffic is no longer hampered. We may or may not decide to roll out the change or wait for a "better" fix. However, upstream clients may still want to use 2.8 and should not be forced to wait because of a seemingly visual-only error. |
Fixed in https://github.com/cloudfoundry/haproxy-boshrelease/releases/tag/v13.3.0%2B2.8.4. As mentioned above, the new version might contain some misleading 200s with termination state Edit: Adding additional feedback from the HAProxy community
|
Just a little update about this issue. A recent fix was pushed to haproxy-3.0-DEV. It may help to limit false |
Description
This was observed in Cloud Foundry landscapes with the following ingress path: Client -> LB -> HAProxy -> gorouter -> app. On the frontend, HTTP/2 is enabled and apps are free to choose between HTTP/1.1 and HTTP/2. If the speaks H/2, H/2 is used towards gorouter who might to a downgrade and opens several HTTP/1.1 connection towards the respective app for a paralllel H/2 request at the frontend. With the HAProxy upgrade from 2.7.10 to 2.8.3, we started observing a failure rate of ~1% for POST and PUT requests.
Initial Analysis
In the platform logs, we observed differing status codes for HAProxy and Gorouter respectively. If gorouter logs 200 response, HAProxy might log a 502 for the same request. HAProxy also logs termination state
SH--
, from the docs:The issue could be reproduced with 100 req/s, e.g. via:
Explanation: execute HTTP/2 POST requests with the given string as body for 1 minute with 2 requests per worker per second using the default worker pool of 50. This results in 100 requests per second using 50 connections for 60s = 6000 requests (if everything goes well, might be a bit less as the program hard-stops after one minute).
Countermeasures
Do not move to 2.8.x if you run a similar setup. We keep maintaining the 2.7.y-line as well.
The text was updated successfully, but these errors were encountered: