admin: read-after-write consistency for config status on leader node #5835

jcsp · 2022-08-04T10:40:23Z

Cover letter

admin: read-after-write consistency for config status on leader node

Previously, after writing a config update, API clients could do
a /status query to the same node and not see any nodes (including
the leader that they just PUT to) reflect the new version.

With this change, if the client is talking to the controller leader,
it will reliably see the new config version reflected in the /status
result when querying the same node again after a PUT.

This is a little subtle and later we should make simpler rules
for this via a higher level "wait for status updates" as part
of the PUT call itself: https://github.com/redpanda-data/redpanda/issues/5833

Related: #5609

Backport Required

UX changes

None

Release notes

none

nicolaferraro

Sounds good

jcsp · 2022-08-05T17:40:47Z

This had a couple failures in ClusterConfigTest, will need to take a look at whether those tests have incorrect assumptions or if something else is up.

dotnwat

lgtm

dotnwat · 2022-08-06T01:19:33Z

tests/rptest/tests/cluster_config_test.py

+        Clearly doing fast reads isn't a guarantee of strict consistency
+        rules, but it will detect violations on realistic timescales.  This
+        test did fail in practice before the change to have /status return


do i understand correctly that what you are saying here is that a read-your-own-write, provided by this patch, might not yet be replicated such that a write may appear to disappear under failure scenarios? if not, i guess i'm a bit confused about what is being said here.

This comment is really about the test more than the main code. It points out that, in a test, doing reads after writes does not in itself prove read-after-write consistency (we might just get lucky), but that in practice I have confidence in this test because it did indeed fail when run against a redpanda without the change.
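As a rough illustration of that testing approach, here is a minimal sketch against a toy stand-in for the leader. `FakeLeader`, `put`, `status_version`, and `tick` are hypothetical names invented for this example; they are not the rptest or admin API. The loop writes, reads back immediately, and reports the first iteration where the read lags the write.

```python
class FakeLeader:
    """Toy stand-in for a controller leader (hypothetical, not the real API)."""

    def __init__(self, read_your_writes):
        self.read_your_writes = read_your_writes
        self.applied = 0    # version applied locally when the PUT is acked
        self.persisted = 0  # version recorded in the persistent status

    def put(self):
        """Apply a config update and return the new version."""
        self.applied += 1
        return self.applied

    def status_version(self):
        """Version this node reports for itself in its status."""
        return self.applied if self.read_your_writes else self.persisted

    def tick(self):
        """Let the persistent status catch up with the applied version."""
        self.persisted = self.applied


def find_violation(leader, iterations=100):
    """Return the first iteration where a read lags the write, else None."""
    for i in range(iterations):
        version = leader.put()
        if leader.status_version() < version:
            return i
        leader.tick()
    return None


print(find_violation(FakeLeader(read_your_writes=False)))
print(find_violation(FakeLeader(read_your_writes=True)))
```

A run with no violation proves nothing by itself; the value of the loop is that it fails quickly against the model without read-your-writes. The toy model is deterministic, whereas the real test depends on timing.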

> might not yet be replicated such that a write may appear to disappear under failure scenarios?

No, the node where the config update has been applied will not rewind its view of its own status: the ack of PUT is a replicate_and_wait of the configuration delta. It isn't waiting for the configuration status to be persisted, but that's the thrust of the change in this PR: we will now have nodes report a non-persistent status for themselves if the persistent status hasn't advanced yet.
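A minimal sketch of that reporting rule, assuming a simplified model in which each node tracks one locally applied version and a table of persisted per-node statuses. The `Node` class and its fields are hypothetical and only illustrate the idea; the actual implementation lives in redpanda's C++ code and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one node's view of config status (hypothetical names)."""

    node_id: int
    applied_version: int = 0  # applied locally, status not yet persisted
    persisted: dict = field(default_factory=dict)  # node_id -> persisted version

    def status(self):
        """Return node_id -> config version, as a status query might."""
        result = dict(self.persisted)
        # For this node's own entry, report the locally applied version
        # if the persisted status has not advanced that far yet.
        own = result.get(self.node_id, 0)
        result[self.node_id] = max(own, self.applied_version)
        return result


leader = Node(node_id=0, persisted={0: 1, 1: 1, 2: 1})
leader.applied_version = 2  # PUT acked: the delta is applied on this node
print(leader.status())  # {0: 2, 1: 1, 2: 1}
```

Only the node's own entry is overridden; entries for other nodes still come from the persisted status, so they may lag.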

If we query another node, it is possible to see persistent status updates for the nodes _other_ than the one we are querying, and a non-persistent update to the status of the node we are querying, which together pass the version check. Then if we query status on a different node a moment later, we will see an older state for the node we first queried. This only matters for tests that are actively trying to read the status _again_ after wait_for_version_sync. wait_for_version_sync was already correct inasmuch as when it completes, the config has been applied everywhere.
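The cross-node case can be sketched with the same simplified model. `status_view` and `all_synced` are hypothetical helpers written for this example, and the version numbers are made up.

```python
def status_view(node_id, applied_version, persisted):
    """Hypothetical status view for one node: the persisted statuses, with
    the node's own entry raised to its locally applied version if newer."""
    view = dict(persisted)
    view[node_id] = max(view.get(node_id, 0), applied_version)
    return view


def all_synced(view, version):
    """True if every node in the view reports at least `version`."""
    return all(v >= version for v in view.values())


# Both nodes have applied version 2. Node 1's status update has been
# persisted, but node 0's has not.
persisted = {0: 1, 1: 2}

view_from_node_0 = status_view(0, 2, persisted)
view_from_node_1 = status_view(1, 2, persisted)

print(view_from_node_0, all_synced(view_from_node_0, 2))
print(view_from_node_1, all_synced(view_from_node_1, 2))
```

In this model the query to node 0 passes the version check, while a later query to node 1 still shows node 0 at the older version, even though both nodes have applied the update.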

jcsp · 2022-08-09T12:44:41Z

CI failures:

dotnwat · 2022-08-16T04:52:23Z

restarted ci since the pr is a bit old, but otherwise looks good

jcsp · 2022-08-16T08:47:08Z

CI failures are:

Failure of ConnectionRateLimitTest.connection_rate_test #5276
test_cancelling_partition_move_x_core (there are three issues, I didn't check which one it was!)

jcsp · 2022-08-16T08:47:21Z

/backport v22.2.x

jcsp · 2022-08-16T08:47:34Z

/backport v22.1.x

jcsp added 2 commits August 4, 2022 11:39

test: add config test_status_read_after_write_consistency

c35eb1d

This tests the new behaviour in the previous commit.

jcsp added kind/enhance (New feature or request) and area/redpanda labels Aug 4, 2022

jcsp mentioned this pull request Aug 4, 2022

operator: move centralized-configuration Kuttl tests to main bucket. #5609

Merged

jcsp marked this pull request as ready for review August 4, 2022 11:02

jcsp requested review from dotnwat, NyaliaLui, mmaslankaprv, ztlpn and VadimPlh as code owners August 4, 2022 11:02

jcsp requested a review from nicolaferraro August 4, 2022 11:02

nicolaferraro previously approved these changes Aug 4, 2022

dotnwat reviewed Aug 6, 2022

nicolaferraro mentioned this pull request Aug 8, 2022

When specifying/updating cloudStorage in the CRD, the cluster config is not updated #5876

Closed

jcsp dismissed nicolaferraro’s stale review via 13d211a August 8, 2022 14:50

jcsp requested review from dotnwat and nicolaferraro August 9, 2022 12:44

dotnwat approved these changes Aug 16, 2022

jcsp merged commit b40ed60 into redpanda-data:dev Aug 16, 2022

jcsp deleted the issue-5609-config-status-consistency branch August 16, 2022 08:47

This was referenced Aug 16, 2022

[v22.2.x] admin: read-after-write consistency for config status on leader node #6049

Merged

[v22.1.x] admin: read-after-write consistency for config status on leader node #6050

Merged
