kafka: fix metadata requests during advertised listener config changes #3589
Conversation
The existing code assumed that members_table would contain listener info with a `name` field matching the name of the listener on which we receive a Kafka metadata request. This may not be the case:

- On a badly written configuration file, which has suitable listener addresses on all nodes but has failed to match up the names
- During a rolling restart for a configuration change that modifies a listener address
- Immediately after restart for a node changing its listener names, where the node_config listener name has not yet propagated to the members_table via raft0

Fixes: redpanda-data#3588
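Roughly, the fix turns a strict name lookup into a lookup-plus-fallback. A minimal sketch of that shape (hypothetical types and names for illustration, not the actual redpanda code):

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-node listener info kept in members_table;
// the real redpanda types differ.
struct listener {
    std::string name;
    std::string host;
    uint16_t port;
};

// Look up the advertised listener whose name matches the listener the
// metadata request arrived on. If the peer's members_table entry has no
// listener with that name (e.g. mid rolling restart), guess by falling back
// to the first advertised listener rather than omitting an address entirely.
std::optional<listener> find_advertised_listener(
  const std::vector<listener>& advertised, const std::string& request_name) {
    for (const auto& l : advertised) {
        if (l.name == request_name) {
            return l; // steady state: names match across the cluster
        }
    }
    if (!advertised.empty()) {
        return advertised.front(); // fallback guess, discussed below
    }
    return std::nullopt; // nothing advertised: leave node out of the response
}
```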
CI failure was the RpkConfigTest one fixed in #3590; rerunning.

Second CI failure was #3595.
Code LGTM.

I'm wondering how useful guessing is in practice. AFAICT listener names are usually used in contexts where there is no reasonable backup option, such as instructing clients to connect to a host that is routable from the client's network.

Does guessing have the potential to leak internal configuration details that we might want to keep private?
I think the main scenario where this will come up is a reconfiguration where the initial config is an anonymous listener and the new config is two listeners with names -- imagine users starting out with a single non-TLS listener, then later adding a TLS listener and giving both nice names (see the sketch below). A really clever operator might monitor Kafka client connectivity during the rolling restart, and in this case might end up backing out the change because they saw Kafka go goofy while the cluster was in a mixed-config state. In the absence of those smarts, users would probably see client applications have issues during the restart and file it under general superstition about not trusting our rolling restarts to be safe. Not frequent in wall-time terms, but probably reasonably frequent as a percentage of people trying out Redpanda and evolving their config.
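A hypothetical `redpanda.yaml` fragment illustrating that scenario (field names follow the `kafka_api` listener schema; the exact values are illustrative):

```yaml
# Old config: a single anonymous, non-TLS listener.
redpanda:
  kafka_api:
    - address: 0.0.0.0
      port: 9092
---
# New config: two named listeners. While the rolling restart is in flight,
# nodes still on the old config have no listener named "internal" or
# "external" in members_table, so a strict name match finds nothing.
redpanda:
  kafka_api:
    - name: internal
      address: 0.0.0.0
      port: 9092
    - name: external
      address: 0.0.0.0
      port: 9093
```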
Interesting point. Yes, in quite limited circumstances. For example, if on our cloud instances we removed the external listener from one node and restarted it, then other nodes would end up including that node's internal pod IP in their metadata responses. I'd put that under the umbrella of "bad configuration" and say that while it leaks an internal IP, that isn't a secret per se, and bad configurations can always leak information (like if the configuration mixed up the listener names, they'd broadcast the internal name to external clients).
Ahh, of course, that makes sense. And thanks for the rest of the discussion. It does seem like there could be operator smarts added in the future.

Yeh, the internal IP is what I had in mind, but this taxonomy of bad configuration seems reasonable. Let's merge it!
Release notes

Improvements

- Fix Kafka metadata requests when the `advertised_kafka_api` or `kafka_api` property has different names between nodes, for example during a configuration change & rolling restart.