
kafka: fix metadata requests during advertised listener config changes #3589

Merged: 1 commit into redpanda-data:dev on Jan 26, 2022

Conversation

@jcsp (Contributor) commented Jan 24, 2022

Cover letter

The existing code assumed that members_table would contain
listener info whose `name` field matches the name of the listener
on which we receive a kafka metadata request.

This may not be the case:

  • With a badly written configuration file that has suitable
    listener addresses on all nodes but fails to match up the names
  • During a rolling restart for a configuration change that
    modifies a listener address
  • Immediately after restarting a node that has changed its listener
    names, where the node_config listener name has not yet
    propagated to the members_table via raft0

Fixes: #3588

Release notes

Improvements

  • Improved handling of configurations where the advertised_kafka_api or kafka_api listeners have different names on different nodes, for example during a configuration change and rolling restart.
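
For illustration only, here is a minimal sketch of the fallback idea described in the cover letter. The `broker_endpoint` struct and `resolve_advertised_endpoint` function are hypothetical names, not the actual redpanda code; the point is simply that a metadata handler can prefer an exact listener-name match and guess an endpoint when no name matches, rather than returning no address for that broker.

```cpp
// Hypothetical sketch (not redpanda source): resolve a broker's advertised
// address for the listener a metadata request arrived on, guessing when the
// listener names in members_table do not line up with the request's listener.
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct broker_endpoint {
    std::string name;     // listener name; may be empty for an anonymous listener
    std::string address;  // advertised host
    uint16_t port;
};

std::optional<broker_endpoint> resolve_advertised_endpoint(
  const std::vector<broker_endpoint>& endpoints,
  const std::string& request_listener_name) {
    if (endpoints.empty()) {
        return std::nullopt;
    }
    // Normal, well-configured case: an endpoint name matches the request's listener.
    for (const auto& ep : endpoints) {
        if (ep.name == request_listener_name) {
            return ep;
        }
    }
    // Mismatched names (bad config file, mid rolling-restart, stale
    // members_table entry): guess the first advertised endpoint instead of
    // returning an empty metadata response that breaks clients.
    return endpoints.front();
}
```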

@jcsp (Contributor, Author) commented Jan 24, 2022

CI failure was the RpkConfigTest one fixed in #3590, rerunning

@jcsp (Contributor, Author) commented Jan 25, 2022

second CI failure was #3595

@dotnwat (Member) left a comment

code lgtm.

I'm wondering how useful guessing is in practice. Afaict listener names are usually used in contexts where there is no reasonable backup option, such as instructing clients to connect to a host that is routable for the client's network.

Does guessing have the potential to leak internal configuration details that we might want to keep private?

@jcsp (Contributor, Author) commented Jan 25, 2022

> I'm wondering how useful guessing is in practice. Afaict listener names are usually used in contexts where there is no reasonable backup option, such as instructing clients to connect to a host that is routable for the client's network.

I think the main scenario where this will come up is reconfiguration where the initial config is an anonymous listener, and the new config is two listeners with names -- imagine users starting out with a single non-tls listener, then later adding a tls listener and giving both nice names.

A really clever operator might monitor for kafka client connectivity during the rolling restart, and in this case might end up backing out the change because they saw kafka go goofy while the cluster was in a mixed-config state. In the absence of those smarts, the users would probably see client applications have issues during the restart and file it under general superstition about not trusting our rolling restarts to be safe.

Not frequent in wall-time terms, but probably reasonably frequent as a percentage of people trying out redpanda and evolving their config.
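
(Purely as an illustration of this scenario, reusing the hypothetical `broker_endpoint`/`resolve_advertised_endpoint` sketch from the cover letter section above: a not-yet-restarted peer is still advertised under the old anonymous listener, the request arrives on the restarted node's new named listener, and the fallback keeps the peer reachable instead of dropping it from the metadata response.)

```cpp
#include <cassert>

// Assumes the broker_endpoint/resolve_advertised_endpoint sketch above.
int main() {
    // A not-yet-restarted peer still advertises a single anonymous listener
    // in members_table.
    std::vector<broker_endpoint> old_style_peer = {
      {"", "redpanda-1.example.com", 9092},
    };
    // The restarted node receives the metadata request on its new, named
    // tls listener; no name matches, so the fallback guesses the only
    // endpoint instead of omitting the peer.
    auto ep = resolve_advertised_endpoint(old_style_peer, "external_tls");
    assert(ep && ep->address == "redpanda-1.example.com");
    return 0;
}
```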

> Does guessing have the potential to leak internal configuration details that we might want to keep private?

Interesting point. Yes, in quite limited circumstances. For example, if on our cloud instances we removed the external listener from one node and restarted it, then other nodes would end up including that node's internal pod IP in their metadata responses. I'd put that under the umbrella of "bad configuration" and say that while it leaks an internal IP, that isn't a secret per se, and bad configurations can always leak information (like if the configuration mixed up the listener names, they'd broadcast the internal name to external clients).

@dotnwat (Member) commented Jan 25, 2022

> where the initial config is an anonymous listener, and the new config is two listeners with names

ahh of course that makes sense. and thanks for the rest of the discussion. it does seem like there could be operator smarts added in the future.

> put that under the umbrella of "bad configuration" and say that while it leaks an internal IP

yeh the internal IP is what I had in mind, but this taxonomy of bad configuration seems reasonable.

let's merge it!

@jcsp merged commit f6b6ffe into redpanda-data:dev on Jan 26, 2022
@jcsp deleted the issue-3588 branch on January 26, 2022 at 15:33
@mmaslankaprv mentioned this pull request on Mar 14, 2022
Linked issue closed by this pull request: #3588 — kafka: when changing advertised address name, we may return empty metadata response