operator: move centralized-configuration Kuttl tests to main bucket. #5609
What was the outcome of investigating the most recent failures of these? (https://redpandadata.slack.com/archives/C01H6JRQX1S/p1658740986316359?thread_ts=1658736130.531769&cid=C01H6JRQX1S) If something is still up with them, let's fix it before we reinstate them.
k8s-unstable-tests had two failures in the last 24h: @nicolaferraro it would be good to get a signal on whether we have a bug here before we release 22.2
@jcsp the logs highlight a strange behavior of the configuration system in redpanda:
It's a 2-replica cluster and the flow can be read as:
So, there might be some error in the way the query is performed, or redpanda changed how it handles these cases of configuration changes. Wdyt?
The information about which config every node is using comes from
Updating the status is asynchronous: there is no guarantee that the version in the status will reflect the version in the response from the PUT. That's because status updates are themselves persistent writes to the controller log, separate from the write that updates the configuration. We could make the API a bit friendlier by waiting for status updates inside the PUT handler, but that would not be 100% reliable either, because it's possible for the controller to lose leadership between writing the config update and writing the status update. For testing on existing code, the solution is to have a retry-wait for the status, rather than expecting it to be updated synchronously. Making this a bit friendlier in the API is #5833
I remember we made some changes in the v22.1 branch to direct calls to the leader, since it was supposed to apply the configuration before returning. The current operator code requires some consistency between the two calls to save-config and get-config.
Getting the config on the leader after setting it on the leader is synchronous; it's just the status specifically that's asynchronous.
Previously, after writing a config update, API clients could do a /status query to the same node and not see any nodes (including the leader that they just PUT to) reflect the new version. With this change, if the client is talking to the controller leader, it will reliably see the new config version reflected in the /status result when querying the same node again after a PUT. This is a little subtle and later we should make simpler rules for this via a higher level "wait for status updates" as part of the PUT call itself: redpanda-data#5833 Related: redpanda-data#5609
Previously, after writing a config update, API clients could do a /status query to the same node and not see any nodes (including the leader that they just PUT to) reflect the new version. With this change, if the client is talking to the controller leader, it will reliably see the new config version reflected in the /status result when querying the same node again after a PUT. This is a little subtle and later we should make simpler rules for this via a higher level "wait for status updates" as part of the PUT call itself: redpanda-data#5833 Related: redpanda-data#5609 (cherry picked from commit 6ba1128)
Needs rebase
Centralized configuration and related tests seem to be stable.
LGTM, the tests are passing. I will merge even though the upgrade tests are failing; #9400 will address that.
cc: @jcsp