
kafka: fix metadata requests during advertised listener config changes #3589

Merged: 1 commit into redpanda-data:dev on Jan 26, 2022

Conversation

@jcsp (Contributor) commented Jan 24, 2022

Cover letter

The existing code assumed that members_table would contain
listener info whose `name` field matches the name of the listener
on which we receive a kafka metadata request.

This may not be the case:

  • With a badly written configuration file that has suitable
    listener addresses on all nodes but fails to match up the names
  • During a rolling restart for a configuration change that
    modifies a listener address
  • Immediately after restarting a node that has changed its listener
    names, where the node_config listener name has not yet
    propagated to the members_table via raft0

Fixes: #3588

Release notes

Improvements

  • Improved handling of configurations where the advertised_kafka_api or kafka_api listeners have different names on different nodes, for example during a configuration change and rolling restart.
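
For illustration only, here is a minimal sketch of the fallback idea described in the cover letter. The `broker_endpoint` struct and `resolve_advertised_endpoint` function are hypothetical names, not the actual redpanda code; the point is simply that a metadata handler can prefer an exact listener-name match and guess an endpoint when no name matches, rather than returning no address for that broker.

```cpp
// Hypothetical sketch (not redpanda source): resolve a broker's advertised
// address for the listener a metadata request arrived on, guessing when the
// listener names in members_table do not line up with the request's listener.
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

struct broker_endpoint {
    std::string name;     // listener name; may be empty for an anonymous listener
    std::string address;  // advertised host
    uint16_t port;
};

std::optional<broker_endpoint> resolve_advertised_endpoint(
  const std::vector<broker_endpoint>& endpoints,
  const std::string& request_listener_name) {
    if (endpoints.empty()) {
        return std::nullopt;
    }
    // Normal, well-configured case: an endpoint name matches the request's listener.
    for (const auto& ep : endpoints) {
        if (ep.name == request_listener_name) {
            return ep;
        }
    }
    // Mismatched names (bad config file, mid rolling-restart, stale
    // members_table entry): guess the first advertised endpoint instead of
    // returning an empty metadata response that breaks clients.
    return endpoints.front();
}
```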

@jcsp (Contributor, Author) commented Jan 24, 2022

CI failure was the RpkConfigTest one fixed in #3590, rerunning

@jcsp (Contributor, Author) commented Jan 25, 2022

second CI failure was #3595

@dotnwat (Member) left a comment

code lgtm.

I'm wondering how useful guessing is in practice. Afaict listener names are usually used in contexts where there is no reasonable backup option, such as instructing clients to connect to a host that is routable for the client's network.

Does guessing have the potential to leak internal configuration details that we might want to keep private?

@jcsp (Contributor, Author) commented Jan 25, 2022

> I'm wondering how useful guessing is in practice. Afaict listener names are usually used in contexts where there is no reasonable backup option, such as instructing clients to connect to a host that is routable for the client's network.

I think the main scenario where this will come up is reconfiguration where the initial config is an anonymous listener, and the new config is two listeners with names -- imagine users starting out with a single non-tls listener, then later adding a tls listener and giving both nice names.

A really clever operator might monitor for kafka client connectivity during the rolling restart, and in this case might end up backing out the change because they saw kafka go goofy while the cluster was in a mixed-config state. In the absence of those smarts, the users would probably see client applications have issues during the restart and file it under general superstition about not trusting our rolling restarts to be safe.

Not frequent in wall-time terms, but probably reasonably frequent as a percentage of people trying out redpanda and evolving their config.
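
(Purely as an illustration of this scenario, reusing the hypothetical `broker_endpoint`/`resolve_advertised_endpoint` sketch from the cover letter section above: a not-yet-restarted peer is still advertised under the old anonymous listener, the request arrives on the restarted node's new named listener, and the fallback keeps the peer reachable instead of dropping it from the metadata response.)

```cpp
#include <cassert>

// Assumes the broker_endpoint/resolve_advertised_endpoint sketch above.
int main() {
    // A not-yet-restarted peer still advertises a single anonymous listener
    // in members_table.
    std::vector<broker_endpoint> old_style_peer = {
      {"", "redpanda-1.example.com", 9092},
    };
    // The restarted node receives the metadata request on its new, named
    // tls listener; no name matches, so the fallback guesses the only
    // endpoint instead of omitting the peer.
    auto ep = resolve_advertised_endpoint(old_style_peer, "external_tls");
    assert(ep && ep->address == "redpanda-1.example.com");
    return 0;
}
```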

> Does guessing have the potential to leak internal configuration details that we might want to keep private?

Interesting point. Yes, in quite limited circumstances. For example, if on our cloud instances we removed the external listener from one node and restarted it, then other nodes would end up including that node's internal pod IP in their metadata responses. I'd put that under the umbrella of "bad configuration" and say that while it leaks an internal IP, that isn't a secret per se, and bad configurations can always leak information (like if the configuration mixed up the listener names, they'd broadcast the internal name to external clients).

@dotnwat (Member) commented Jan 25, 2022

> where the initial config is an anonymous listener, and the new config is two listeners with names

ahh of course that makes sense. and thanks for the rest of the discussion. it does seem like there could be operator smarts added in the future.

> put that under the umbrella of "bad configuration" and say that while it leaks an internal IP

yeh the internal IP is what I had in mind, but this taxonomy of bad configuration seems reasonable.

let's merge it!

@jcsp merged commit f6b6ffe into redpanda-data:dev on Jan 26, 2022
@jcsp deleted the issue-3588 branch on January 26, 2022 at 15:33
@mmaslankaprv mentioned this pull request on Mar 14, 2022
Linked issue closed by this pull request: #3588 — kafka: when changing advertised address name, we may return empty metadata response