Semaphore timeout in FranzGoVerifiableWithSiTest.test_si_with_timeboxed #5895

Closed
VladLazar opened this issue Aug 8, 2022 · 1 comment · Fixed by #5921
Labels: area/cloud-storage (Shadow indexing subsystem), area/tests, ci-failure, kind/bug (Something isn't working)

Comments

@VladLazar
Contributor

A semaphore timed out in the NTP archiver upload loop, and the exception bubbled up and stopped the loop:

upload loop error: seastar::semaphore_timed_out (Semaphore timedout)

The latest occurrence is from Friday (5 Aug), but there are more in the past: https://buildkite.com/redpanda/vtools/builds/3123#01826bf6-06ed-43d7-b0cf-5124782ac879.

VladLazar added the kind/bug (Something isn't working) and ci-failure labels on Aug 8, 2022
VladLazar self-assigned this on Aug 8, 2022
@VladLazar
Contributor Author

The important clue in the error message is that the semaphore is not named. All the semaphores explicitly created by redpanda are named, so the timeout must come from some other concurrency primitive that uses seastar::semaphore under the hood. After some digging, I found the segment_read_lock in the archiver, which matches that description.

The reason for the failure is that the test causes contention on the lock. In the current configuration, every segment fetch from SI causes a cache eviction (see #5915). The fix for that issue might help here too.

The underlying problem of not handling semaphore timeouts still remains, though. Perhaps we should catch the exception here and silence it, as we do for other exceptions in the upload loop.
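
To make the "catch and silence" idea concrete, here is a minimal sketch in plain standard C++ rather than Seastar. It is not the archiver's actual code: `semaphore_timed_out` is a local stand-in for `seastar::semaphore_timed_out`, and `run_upload_loop`, `loop_stats` and the `upload_next` callback are hypothetical names introduced only for illustration. The point is that a timeout on one iteration is counted and swallowed so the loop proceeds to the next iteration instead of terminating.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for seastar::semaphore_timed_out, so that the
// sketch compiles without Seastar.
struct semaphore_timed_out : std::runtime_error {
    semaphore_timed_out() : std::runtime_error("Semaphore timedout") {}
};

// Illustrative counters; not part of any real API.
struct loop_stats {
    int uploaded = 0;
    int timed_out = 0;
};

// Runs `upload_next` once per iteration. A lock timeout is treated as a
// transient condition: it is counted and the loop continues. Any other
// exception type still propagates to the caller.
loop_stats run_upload_loop(int iterations,
                           const std::function<void(int)>& upload_next) {
    assert(iterations >= 0);
    loop_stats stats;
    for (int i = 0; i < iterations; ++i) {
        try {
            upload_next(i);
            ++stats.uploaded;
        } catch (const semaphore_timed_out&) {
            // Assumed handling: record the timeout and retry on the
            // next iteration rather than stopping the loop.
            ++stats.timed_out;
        }
    }
    return stats;
}
```

In a real fix the catch block would presumably log the timeout at a low severity, in the same way the loop treats its other expected exceptions.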

This issue was closed.