-
Notifications
You must be signed in to change notification settings - Fork 577
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
admin: read-after-write consistency for config status on leader node #5835
admin: read-after-write consistency for config status on leader node #5835
Conversation
Previously, after writing a config update, API clients could do a /status query to the same node and not see any nodes (including the leader that they just PUT to) reflect the new version. With this change, if the client is talking to the controller leader, it will reliably see the new config version reflected in the /status result when querying the same node again after a PUT. This is a little subtle and later we should make simpler rules for this via a higher level "wait for status updates" as part of the PUT call itself: redpanda-data#5833 Related: redpanda-data#5609
This tests the new behaviour in the previous commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
This had a couple failures in ClusterConfigTest, will need to take a look at whether those tests have incorrect assumptions or if something else is up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Clearly doing fast reads isn't a guarantee of strict consistency | ||
rules, but it will detect violations on realistic timescales. This | ||
test did fail in practice before the change to have /status return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do i understand correctly that what you are saying here is that a read-your-own-write, provided by this patch, might not yet be replicated such that a write may appear to disappear under failure scenarios? if not, i guess i'm a bit confused about what is being said here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is really about the test more than the main code: pointing out that for tests, doing reads after writes does not in itself prove read-after-write consistency (we might just get lucky), but that in practice i have confidence in this test because it did indeed fail when run against a redpanda without the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might not yet be replicated such that a write may appear to disappear under failure scenarios?
No, the node where the config update has been applied will not rewind its view of its own status: the ack of PUT is a replicate_and_wait of the configuration delta. It isn't waiting for the configuration status to be persisted, but that's the thrust of the change in this PR: we will now have nodes report a non-persistent status for themselves if the persistent status hasn't advanced yet.
If we query another node, it is possible to see persistent status updates for the nodes _other_ than the one we are querying, and non-persistent update to the stauts of the node we are querying, that passes the version check. Then if we query status on a different node a moment later, we will see an older state for the node we first queried. This only matters for tests that are actively trying to read the status _again_ after wait_for_version_sync. wait_for_version_sync was already correct inasmuchas when it complete the config has been applied everywhere.
restarted ci since the pr is a bit old, but otherwise looks good |
CI failures are:
|
/backport v22.2.x |
/backport v22.1.x |
Cover letter
Related: #5609
Backport Required
UX changes
None
Release notes