-
Notifications
You must be signed in to change notification settings - Fork 563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc: reset transport version via reconnect_transport #5520
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the quick turnaround here!
@@ -112,6 +112,8 @@ class transport final : public net::base_transport { | |||
* version level used when dispatching requests. this value may change | |||
* during the lifetime of the transport. for example the version may be | |||
* upgraded if it is discovered that a server supports a newer version. | |||
* | |||
* reset to v1 in reset_state() to support reconnect_transport. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
@andrwng looks like there are several new ducktape bugs that popped up after this change. i'll have to address those before merging. |
When an error occurs while processing a request (e.g. a handler throws or the server receives an unsupported api request) a default initialized netbuf is created to hold the error and is sent back to the client. Prior to this commit the header included v0 as the version for such error replies. Since the error may be returned to a client whose transport was upgraded to v2 a protocol violation error would be logged by the client. This commit matches the reply version to the request version for error replies under the reasonable assumption that (1) the rpc header is compatibile across transport versions and (2) the errors contain no decodable payloads. Signed-off-by: Noah Watkins <[email protected]>
Other exceptions log messages when returning timeout errors to the client. However, an actual timeout exception didn't log so an observer would have to deduce exactly what happened based on an omission in the logs. Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
This is needed for the case in which a peer on the other end of a reconnect_transport is downgraded. unless we reset the version on reconnect then we'll get a policy violation if the peer responds with a lower version than the transport had been upgraded to. Signed-off-by: Noah Watkins <[email protected]>
Signed-off-by: Noah Watkins <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall makes sense. I've checked out the new patch and am running node op fuzz test with mixed versions (previously also tripped on this), initial results seem positive
@@ -153,6 +153,7 @@ simple_protocol::dispatch_method_once(header h, net::server::resources rs) { | |||
ctx->res.conn->addr); | |||
rs.probe().method_not_found(); | |||
netbuf reply_buf; | |||
reply_buf.set_version(ctx->get_header().version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes me think it'd be nice to be able to add a vassert
that we never send a v0 netbuf
over the wire, e.g. in send_reply()
. I don't think we can though since we'll still send v0 over the wire if talking to an old server (though it gets ignored regardless)
Another thought is that we could use some simple_reply_buf
around the simple protocol instead of netbuf
that is constructed with an initial version (even if it's just _version
).
I don't feel strongly that either of these would be much better than what's there, but just brainstorming ways we can be sure our sent bytes are reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can though since we'll still send v0 over the wire if talking to an old server (though it gets ignored regardless)
correct
Another thought is that we could use some simple_reply_buf around the simple protocol instead of netbuf that is constructed with an initial version (even if it's just _version).
i think the right move would be to add a constructor to netbuf so it can't be constructed without providing a version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea that seems reasonable too. I was initially worried we'd end up with some lifecycle issues, e.g. not knowing what the right version is at time of construction, but I agree baking this into netbuf would feel better
Failure is #4848 |
Cover letter
reconnect_transport may be used on a peer that is downgraded, in which case, we need to also reset the transport version to restart version negotiation.
Problem found via test #5488
Fixes: #5506
Release notes