rpc: reset transport version via reconnect_transport #5520

dotnwat · 2022-07-19T21:24:44Z

Cover letter

reconnect_transport may be used on a peer that has been downgraded, in which case we also need to reset the transport version so that version negotiation restarts.

Problem found via test #5488

Fixes: #5506

Release notes

None

andrwng

LGTM! Thanks for the quick turnaround here!

andrwng · 2022-07-19T21:41:30Z

src/v/rpc/transport.h

@@ -112,6 +112,8 @@ class transport final : public net::base_transport {
 * version level used when dispatching requests. this value may change
 * during the lifetime of the transport. for example the version may be
 * upgraded if it is discovered that a server supports a newer version.
+ *
+ * reset to v1 in reset_state() to support reconnect_transport.


dotnwat · 2022-07-20T01:36:06Z

@andrwng looks like there are several new ducktape bugs that popped up after this change. i'll have to address those before merging.

When an error occurs while processing a request (e.g. a handler throws or the server receives an unsupported api request), a default-initialized netbuf is created to hold the error and is sent back to the client. Prior to this commit, the header of such error replies carried v0 as the version. Since the error may be returned to a client whose transport was upgraded to v2, the client would log a protocol violation error. This commit matches the reply version to the request version for error replies under the reasonable assumption that (1) the rpc header is compatible across transport versions and (2) the errors contain no decodable payloads.

Signed-off-by: Noah Watkins <[email protected]>
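As a rough illustration of the commit message above, the sketch below mirrors the request version into an error reply. All type and function names here (`transport_version`, `header`, `reply_buf`, `make_error_reply`, `violates_protocol`) are hypothetical stand-ins rather than the real redpanda rpc types, and the client-side check is an assumed simplification of whatever the client actually validates.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the rpc types; not the real redpanda definitions.
enum class transport_version : std::uint8_t { v0 = 0, v1 = 1, v2 = 2 };

struct header {
    transport_version version{transport_version::v0};
    std::uint32_t status{0};
};

struct reply_buf {
    header hdr; // default-initialized, so the version starts out as v0
    void set_version(transport_version v) { hdr.version = v; }
    void set_status(std::uint32_t s) { hdr.status = s; }
};

// Build an error reply whose version mirrors the version of the request,
// instead of leaving the default v0 in the header.
reply_buf make_error_reply(const header& request, std::uint32_t status) {
    reply_buf reply;
    reply.set_version(request.version);
    reply.set_status(status);
    return reply;
}

// Assumed client-side check: a reply whose version differs from the version
// the client transport is currently using counts as a violation.
bool violates_protocol(transport_version client_version, const header& reply) {
    return reply.version != client_version;
}
```

Under these assumptions, a v2 client that receives a reply from `make_error_reply` passes the check, while a default-constructed `reply_buf` (still at v0) would trip it.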

Other exceptions log messages when returning timeout errors to the client. However, an actual timeout exception did not log anything, so an observer would have to deduce what happened from an omission in the logs.

Signed-off-by: Noah Watkins <[email protected]>

Signed-off-by: Noah Watkins <[email protected]>

This is needed for the case in which the peer on the other end of a reconnect_transport is downgraded. Unless we reset the version on reconnect, we'll get a policy violation if the peer responds with a lower version than the one the transport had been upgraded to.

Signed-off-by: Noah Watkins <[email protected]>
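The downgrade scenario can be sketched with a toy client transport. This is a minimal model under stated assumptions: `toy_transport`, `on_reply`, and `reconnect` are hypothetical names, and the upgrade rule (adopt any higher version seen in a reply) is assumed for illustration. Only the reset to v1 in `reset_state()` comes from the diff comment in this PR.

```cpp
#include <cstdint>

// Hypothetical stand-in for the transport version enum.
enum class transport_version : std::uint8_t { v0 = 0, v1 = 1, v2 = 2 };

// Toy model of a client transport; not the real rpc::transport.
class toy_transport {
public:
    transport_version version() const { return _version; }

    // Process the version carried by a reply. Returns false when the peer
    // answers with a lower version than the transport was upgraded to,
    // which models the policy violation described above.
    bool on_reply(transport_version reply_version) {
        if (reply_version < _version) {
            return false;
        }
        _version = reply_version; // assumed upgrade rule
        return true;
    }

    // Reset to v1 so that version negotiation starts over.
    void reset_state() { _version = transport_version::v1; }

    // Reconnecting resets state; re-establishing the connection is omitted.
    void reconnect() { reset_state(); }

private:
    transport_version _version{transport_version::v1};
};
```

In this model, a transport that was upgraded to v2 rejects a v1 reply from a downgraded peer, but accepts the same reply after `reconnect()` because the version is back at v1.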

Signed-off-by: Noah Watkins <[email protected]>

andrwng

Overall makes sense. I've checked out the new patch and am running the node op fuzz test with mixed versions (which previously also tripped on this). Initial results seem positive.

andrwng · 2022-07-20T22:33:33Z

src/v/rpc/simple_protocol.cc

@@ -153,6 +153,7 @@ simple_protocol::dispatch_method_once(header h, net::server::resources rs) {
 ctx->res.conn->addr);
 rs.probe().method_not_found();
 netbuf reply_buf;
+ reply_buf.set_version(ctx->get_header().version);


This makes me think it'd be nice to be able to add a vassert that we never send a v0 netbuf over the wire, e.g. in send_reply(). I don't think we can though since we'll still send v0 over the wire if talking to an old server (though it gets ignored regardless)

Another thought is that we could use some simple_reply_buf around the simple protocol instead of netbuf that is constructed with an initial version (even if it's just _version).

I don't feel strongly that either of these would be much better than what's there, but just brainstorming ways we can be sure our sent bytes are reasonable

> I don't think we can though since we'll still send v0 over the wire if talking to an old server (though it gets ignored regardless)

correct

> Another thought is that we could use some simple_reply_buf around the simple protocol instead of netbuf that is constructed with an initial version (even if it's just _version).

i think the right move would be to add a constructor to netbuf so it can't be constructed without providing a version

Yea that seems reasonable too. I was initially worried we'd end up with some lifecycle issues, e.g. not knowing what the right version is at time of construction, but I agree baking this into netbuf would feel better
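The constructor idea discussed in this thread might look roughly like the following. This is only a sketch: `versioned_netbuf` is a hypothetical name, not the real netbuf, and the real class carries far more state than a version.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the transport version enum.
enum class transport_version : std::uint8_t { v0 = 0, v1 = 1, v2 = 2 };

// Sketch of a buffer that cannot be constructed without a version.
class versioned_netbuf {
public:
    versioned_netbuf() = delete; // no silent default to v0
    explicit versioned_netbuf(transport_version v) : _version(v) {}

    transport_version version() const { return _version; }

private:
    transport_version _version;
};

// Compile-time check that a version must always be supplied.
static_assert(
  !std::is_default_constructible<versioned_netbuf>::value,
  "versioned_netbuf requires an explicit version");
```

With this shape, forgetting to pick a version becomes a compile error rather than a v0 header on the wire. The caller still has to know the right version at construction time, which is the lifecycle concern raised above.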

dotnwat · 2022-07-21T00:50:24Z

Failure is #4848

dotnwat requested review from mmaslankaprv and jcsp as code owners July 19, 2022 21:24

github-actions bot added the area/redpanda label Jul 19, 2022

dotnwat changed the title ~~Fix 5506~~ rpc: reset transport version via reconnect_transport Jul 19, 2022

dotnwat requested review from andrwng and graphcareful July 19, 2022 21:25

dotnwat force-pushed the fix-5506 branch from 3d35dad to 24e055d Compare July 19, 2022 21:27

andrwng previously approved these changes Jul 19, 2022

dotnwat dismissed andrwng’s stale review via 9382cc5 July 19, 2022 22:51

dotnwat force-pushed the fix-5506 branch from 24e055d to 9382cc5 Compare July 19, 2022 22:51

dotnwat requested a review from andrwng July 19, 2022 22:51

andrwng previously approved these changes Jul 19, 2022

mmedenjak added the kind/bug Something isn't working label Jul 20, 2022

dotnwat added 5 commits July 20, 2022 14:17

rpc: privatize internal apis

dfe2e13

Signed-off-by: Noah Watkins <[email protected]>

rpc: log policy violations at error level

f3c6b5f

Signed-off-by: Noah Watkins <[email protected]>

dotnwat dismissed andrwng’s stale review via f3c6b5f July 20, 2022 22:00

dotnwat force-pushed the fix-5506 branch from 9382cc5 to f3c6b5f Compare July 20, 2022 22:00

dotnwat requested a review from andrwng July 20, 2022 22:00

andrwng approved these changes Jul 20, 2022

dotnwat merged commit b2403ff into redpanda-data:dev Jul 21, 2022
