cloud_storage: adjust condition to remove archiver #5934
14c8522 to e524e69
LGTM, one small nit
006402d to f0a7e39
LGTM
Nice. I was also looking at this the other day.
At least one test failure looks concerning:
I will wait for the debug build to finish and then trigger another run, to see whether it is related to this change. EDIT: it did not fail in debug; doing a repeatx5 to check what fails.
Most test failures are variants of failures=False, e.g. https://buildkite.com/redpanda/redpanda/builds/13929#0182885d-f1ef-4b0a-bb56-c7715aa0f404, plus a couple of others which are mentioned on the PRs. test_id: rptest.tests.consumer_offsets_migration_test.ConsumerOffsetsMigrationTest.test_migrating_consume_offsets.failures=False.cpus=1
An archiver is removed in the reconcile loop if it is stopped. The stop condition checks whether the upload loop is stopped. With read replicas there are two status variables that denote a stopped loop. Both start out false and turn true in mutually exclusive code paths, so this change makes the condition treat the archiver as stopped when either of the two variables is true. The archiver must be marked as removed in order for a stopped upload loop to restart.
f0a7e39 to a7ff954
LGTM. Thank you for the fix.
/backport 22.2.x
lgtm
An archiver is removed in the reconcile loop if it is considered to be stopped. The stop condition checks whether the upload loop is stopped. With read replicas there are two status variables that denote a stopped loop: one for the normal upload loop and one for the read replica loop.
Both variables start out false and turn true in mutually exclusive code paths, so both can never be true for the same archiver. This change adjusts the condition so that either variable being true marks the archiver as stopped.
The archiver must be removed in order for a stopped upload loop to restart later. This matters when a node loses leadership, the archiver is stopped, and the node immediately regains leadership. The archiver can only start again if it was correctly removed earlier.
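The condition and the leadership scenario described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: the `archiver` struct, its two flag names, `archiver_map`, and `reconcile` are all hypothetical stand-ins chosen for the example.

```cpp
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an archiver; the names are illustrative only
// and are not taken from the Redpanda source.
struct archiver {
    bool upload_loop_stopped = false;        // set when the normal upload loop exits
    bool sync_manifest_loop_stopped = false; // set when the read replica loop exits

    // The two flags flip in mutually exclusive code paths, so a check that
    // required both would never hold. Either flag being set is enough.
    bool is_stopped() const {
        return upload_loop_stopped || sync_manifest_loop_stopped;
    }
};

using archiver_map = std::map<std::string, archiver>;

// One reconcile pass: drop stopped archivers first, then create a fresh
// archiver for every led partition that does not currently have one.
void reconcile(archiver_map& archivers, const std::vector<std::string>& led) {
    for (auto it = archivers.begin(); it != archivers.end();) {
        it = it->second.is_stopped() ? archivers.erase(it) : std::next(it);
    }
    for (const auto& ntp : led) {
        archivers.try_emplace(ntp); // no-op if an archiver is still present
    }
}
```

In this sketch, if a stopped archiver were not erased in the first loop, `try_emplace` would find the stale entry and leave it in place, so no new archiver would be started after leadership is regained. That mirrors the failure mode the description attributes to an incorrect stop condition.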
fixes #5928
Backport Required
UX changes
None
Release notes
None