ducktape: Change condition for upload check to pass when all segments are uploaded #11433

Merged

Conversation

abhijat
Contributor

@abhijat abhijat commented Jun 14, 2023

The success condition in the test checks that the number of uploaded segments is exactly one less than the local segment count, i.e. every segment except the open one.

Since the max upload interval is set, we can end up uploading all segments to the cloud, including the open segment (as seen in CI failures). This change adjusts the success condition to also accept the case where all segments are uploaded.
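
For illustration, the adjusted check can be sketched as below. This is a minimal sketch rather than the actual diff: the function name `segments_uploaded` and its two integer parameters are hypothetical, and the real test may obtain the counts and express the comparison differently.

```python
def segments_uploaded(uploaded_count: int, local_count: int) -> bool:
    """Hypothetical helper: decide whether the upload check should pass.

    Old condition: uploaded_count == local_count - 1
        (all closed segments uploaded, open segment still local only).
    Adjusted condition: additionally accept uploaded_count == local_count
        (the max upload interval caused the open segment to be uploaded too).
    """
    return uploaded_count in (local_count - 1, local_count)
```

With five local segments, both `segments_uploaded(4, 5)` and `segments_uploaded(5, 5)` pass, while `segments_uploaded(3, 5)` still fails because closed segments are missing from the cloud.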

Fixes #9587

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

  • none

@abhijat abhijat marked this pull request as ready for review June 14, 2023 15:34
@abhijat
Contributor Author

abhijat commented Jun 15, 2023

Hmm, the crash reported in https://ci-artifacts.dev.vectorized.cloud/redpanda/31299/0188bad1-db3b-4ba6-a9fa-ef7802ee8563/vbuild/ducktape/results/2023-06-14--001/report.html seems new.

It looks like this is not a crash: the process was killed by a failure injector and never brought up again, so it makes sense that it was not running:

[INFO  - 2023-06-14 17:21:42,580 - failure_injector - inject_failure - lineno:79]: injecting failure: type: 0, length: 0 seconds, node: docker-rp-19
[INFO  - 2023-06-14 17:21:42,580 - failure_injector - _kill - lineno:143]: killing redpanda on docker-rp-19
[DEBUG - 2023-06-14 17:21:42,581 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: pgrep --exact redpanda
[DEBUG - 2023-06-14 17:21:42,586 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: kill -9 5386
[DEBUG - 2023-06-14 17:21:42,634 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: pgrep --exact redpanda
[DEBUG - 2023-06-14 17:21:42,785 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: pgrep --exact redpanda
[DEBUG - 2023-06-14 17:21:42,890 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: pgrep --exact redpanda
[DEBUG - 2023-06-14 17:21:42,896 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command 'pgrep --exact redpanda' exited with status 1 and message: b''
[DEBUG - 2023-06-14 17:21:42,896 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: pgrep --exact redpanda
[DEBUG - 2023-06-14 17:21:42,905 - remoteaccount - _log - lineno:166]: root@docker-rp-19: Running ssh command: /var/lib/buildkite-agent/builds/buildkite-bk-amd64-xfs-builders-i-0cc38ccba52eafc2c-1/redpanda/redpanda/vbuild/redpanda_installs/ci/bin/redpanda --version
[INFO  - 2023-06-14 17:21:42,956 - node_operations - _worker - lineno:500]: waiting 55 seconds before next failure

The test itself might need to be looked into. Note that the other two nodes were terminated (not killed) and were started again, but this process, which was killed, was never restarted.
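
To find such kill events when triaging a similar report, a small script along these lines could help. It is only a sketch: the function name `killed_nodes` is hypothetical, and the regular expression assumes the failure injector always logs kills in the same form as the `killing redpanda on ...` line quoted above.

```python
import re

# Assumed format, taken from the quoted log line:
# "... - failure_injector - _kill - lineno:143]: killing redpanda on docker-rp-19"
KILL_RE = re.compile(
    r"failure_injector - _kill - lineno:\d+\]: killing redpanda on (\S+)"
)


def killed_nodes(log_lines):
    """Return the node names from every kill event, in log order."""
    nodes = []
    for line in log_lines:
        match = KILL_RE.search(line)
        if match:
            nodes.append(match.group(1))
    return nodes
```

Any node returned here that has no later start event in the same log is a candidate for the "killed and never restarted" case described above.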

@abhijat
Contributor Author

abhijat commented Jun 15, 2023

The two CI failures are #11044 and #9052.

@piyushredpanda piyushredpanda merged commit 3f9698f into redpanda-data:dev Jun 15, 2023
18 checks passed
@vbotbuildovich
Collaborator

/backport v23.1.x

Development

Successfully merging this pull request may close these issues.

CI Failure "Segments not uploaded" in ShadowIndexingCloudRetentionTest.test_cloud_time_based_retention
4 participants