extreme_recovery_test.py fails to upload one manifest #5928
Missing manifest here was
Searching the logs for this particular partition, it looks like the affected partition may simply not have had any non-data batches? Edit: this is very unlikely, given that kgo-verifier chooses partitions at random when producing, and these tests produce ~10000 messages per partition.
It looks like here
the node stepped down and the archiver loop stopped, then the node became leader again a couple of seconds later but the archiver was not started again. Later on the segment did have some data, but it did not get uploaded. The segment names logged by storage later on indicate that its offset was large enough for it to contain some data:
so it should have been uploaded but wasn't.
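To make the suspected sequence concrete, here is a minimal toy model of it in Python. This is only a sketch of the hypothesis described above, not Redpanda code: `ToyArchiver`, its methods, the `restart_on_reelection` flag, and the segment names are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyArchiver:
    """Hypothetical model of a per-partition upload loop (not Redpanda code)."""

    restart_on_reelection: bool
    running: bool = False
    uploaded: list = field(default_factory=list)

    def on_leadership_change(self, is_leader: bool) -> None:
        if not is_leader:
            # Stepping down always stops the upload loop.
            self.running = False
        elif self.restart_on_reelection:
            # Only restart the loop if the model is told to do so.
            self.running = True

    def on_segment_closed(self, name: str) -> None:
        # A closed segment is uploaded only while the loop is running.
        if self.running:
            self.uploaded.append(name)


def simulate(restart_on_reelection: bool) -> list:
    archiver = ToyArchiver(restart_on_reelection)
    archiver.running = True                 # node starts out as leader
    archiver.on_segment_closed("segment-0")
    archiver.on_leadership_change(False)    # node steps down
    archiver.on_leadership_change(True)     # re-elected moments later
    archiver.on_segment_closed("segment-1")
    return archiver.uploaded
```

In this model, `simulate(False)` leaves `segment-1` out of the uploaded list even though the node is leader when the segment closes, which mirrors the symptom seen in the logs; `simulate(True)` uploads both segments.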
I've reproduced this a couple of times when working on #5818.
This "scale test" for recovery from cloud storage basically does this:
redpanda.remote.recovery to recover each topic.

The test occasionally fails in step 3: the test waits a long time (say, approx. two hours for 12 GiB of data), but fails with one missing segment manifest file like this:

Both times I reproduced this, there was exactly one segment missing when the test timed out and failed: