Fix readers cache truncation deadlock #5681

Conversation

@mmaslankaprv (Member) commented Jul 27, 2022

Cover letter

Fixed the condition that checks whether a range is locked. The incorrect
check resulted in situations in which log truncation was blocked by
pending readers.

Log truncation is a multi-step process that ends with grabbing write
locks on the necessary segments and deleting data. In order for
truncation to grab the write lock, all readers which own read lock units
from the range being truncated must be evicted from readers_cache. Since
log truncation contains multiple scheduling points, it may interleave
with another fiber creating a reader for the log that is currently being
truncated. This reader MUST NOT be cached, as truncation would have to
wait for it to be evicted. Additionally, no new readers can be created,
as the truncation-related write lock request is waiting in the
read_write_lock's underlying semaphore waiters queue.
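
To make the ordering above concrete, below is a minimal, thread-based
sketch of a writer-preferring reader/writer lock. It is an illustration
only (the names are hypothetical and this is not the fiber-based
`read_write_lock` used by Redpanda), but it shows why a pending
write-lock request keeps new readers out until every existing reader
has released its units.

```cpp
#include <condition_variable>
#include <mutex>

// Illustrative only: a writer-preferring reader/writer lock. Once a
// writer is waiting, new readers queue behind it, and the writer itself
// cannot proceed until the last existing reader unlocks.
class fair_rwlock {
public:
    void read_lock() {
        std::unique_lock lk(_m);
        // New readers are blocked while a writer holds or waits for the
        // lock; this is the situation truncation creates for new log readers.
        _cv.wait(lk, [this] { return !_writer_active && _writers_waiting == 0; });
        ++_readers;
    }
    void read_unlock() {
        std::unique_lock lk(_m);
        if (--_readers == 0) { _cv.notify_all(); }
    }
    void write_lock() {
        std::unique_lock lk(_m);
        ++_writers_waiting;
        // The writer must wait for every reader holding read units, which
        // is why readers covering the truncated range have to be evicted
        // from the cache rather than kept alive there.
        _cv.wait(lk, [this] { return !_writer_active && _readers == 0; });
        --_writers_waiting;
        _writer_active = true;
    }
    void write_unlock() {
        std::unique_lock lk(_m);
        _writer_active = false;
        _cv.notify_all();
    }

private:
    std::mutex _m;
    std::condition_variable _cv;
    int _readers = 0;
    int _writers_waiting = 0;
    bool _writer_active = false;
};
```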

In order to prevent readers requesting the truncated range from being
cached, the readers cache maintains a list of locked ranges, i.e. ranges
for which readers cannot be cached.

Previously, an incorrect condition checking whether a reader belongs to
a locked range allowed it to be cached, preventing the truncate action
from continuing. This stopped all other writes and truncation for 60
seconds; after this duration the reader was evicted from the cache, its
lease was released, and truncation was able to finish.

Fixed the incorrect condition checking whether a reader is within the
locked range.
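
As an illustration of the fixed check, here is a sketch using simplified
stand-in types (`offset`, `offset_range`, and `intersects_locked_range`
are hypothetical names, not the actual `readers_cache` code). Two closed
offset intervals overlap exactly when each one starts no later than the
other one ends; a reader may be cached only if this predicate is false
for every locked range.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-ins for model::offset and the cache's locked-range list.
using offset = int64_t;
using offset_range = std::pair<offset, offset>; // inclusive [first, second]

// True when the reader's lease [reader_base, reader_end] intersects any
// locked (i.e. being-truncated) range, in which case it must not be cached.
bool intersects_locked_range(
  const std::vector<offset_range>& locked_ranges,
  offset reader_base,
  offset reader_end) {
    return std::any_of(
      locked_ranges.begin(),
      locked_ranges.end(),
      [reader_base, reader_end](const offset_range& range) {
          return reader_base <= range.second && reader_end >= range.first;
      });
}
```

With inclusive bounds, [0, 1] intersects [0, 10] while [11, 20] does
not. The old predicate matched ranges the reader does not touch, so a
reader overlapping the truncated range could produce no match at all
and end up cached.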

Fixes: #5510

Release notes

Fixed the move constructor and move assignment operator of
`readers_cache::range_lock_holder`. Previously, moving a
`range_lock_holder` would result in the lock being released.
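
For context, a minimal sketch of the pattern the fix relies on, with
simplified hypothetical types rather than the actual `readers_cache`
implementation: keeping the locked range in a `std::optional` and
emptying it on move means a moved-from holder's destructor no longer
releases the lock.

```cpp
#include <optional>
#include <utility>
#include <vector>

// Hypothetical, simplified cache owning the list of locked offset ranges.
struct cache {
    std::vector<std::pair<long, long>> locked_offset_ranges;
};

class range_lock_holder {
public:
    range_lock_holder(std::pair<long, long> range, cache* c)
      : _range(range)
      , _cache(c) {}

    // Moving transfers ownership: std::exchange empties the source's
    // optional, so the moved-from holder's destructor releases nothing.
    range_lock_holder(range_lock_holder&& other) noexcept
      : _range(std::exchange(other._range, std::nullopt))
      , _cache(other._cache) {}

    range_lock_holder& operator=(range_lock_holder&& other) noexcept {
        if (this != &other) {
            release();
            _range = std::exchange(other._range, std::nullopt);
            _cache = other._cache;
        }
        return *this;
    }

    ~range_lock_holder() { release(); }

private:
    void release() {
        // Only a holder that still owns a range unlocks it.
        if (_range) {
            std::erase(_cache->locked_offset_ranges, _range.value());
            _range.reset();
        }
    }

    std::optional<std::pair<long, long>> _range;
    cache* _cache;
};
```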

Signed-off-by: Michal Maslanka <michal@redpanda.com>
  _locked_offset_ranges.end(),
  [reader_base_offset, reader_end_offset](const offset_range& range) {
      return reader_base_offset <= range.second
             && reader_end_offset >= range.first;
@ajfabbri (Contributor) commented Jul 27, 2022

Story time: When I finished grad school and interviewed at my first big job, this was the last coding problem in a long day of interviews. This part in particular is error prone. Also, the guy who gave me this problem is now making autonomous robots that kill things (weeds) with lasers.

  _locked_offset_ranges.begin(),
  _locked_offset_ranges.end(),
  [reader_base_offset, reader_end_offset](const offset_range& range) {
      return reader_base_offset <= range.second
@ajfabbri (Contributor) commented Jul 27, 2022

Good find. This condition looks correct to me. Previous one didn't make any sense to me. Good call putting it in a helper function.

} // namespace storage
using namespace storage;

FIXTURE_TEST(test_range_is_correctly_locked, readers_cache_test_fixture) {
@ajfabbri (Contributor) commented Jul 27, 2022

Thanks for adding a unit test.

});
});

produce.get();
A contributor commented:

I have some questions about this integration test... nothing that should block this from being merged, IMO. We can discuss separately.

ajfabbri previously approved these changes Jul 27, 2022

@ajfabbri (Contributor) left a comment

+1 (pending CI): The new range overlap condition makes much more sense than before. Thanks for adding tests. For "continuous improvement", let's remind each other during review that all new APIs / modules should come with a unit test.

Comment on lines -57 to -63
auto lock_it = std::find_if(
  _locked_offset_ranges.begin(),
  _locked_offset_ranges.end(),
  [&reader](const offset_range& range) {
      return reader->lease_range_base_offset() > range.second
             || reader->lease_range_end_offset() < range.first;
  });
A contributor commented:

Nice catch! The fix seems reasonable, though I'm curious how else this bug may manifest itself. Is the worst that can happen a minute-long deadlock? Or is there any way this can result in a segfault or segments being deleted incorrectly?

@mmaslankaprv (Member, Author) replied:

A one-minute deadlock is how it manifests; it cannot lead to any other, more serious issue, as read/write/truncate concurrency control is separate from the readers cache.

dotnwat previously approved these changes Jul 28, 2022

@dotnwat (Member) left a comment

lgtm pending the CI failure audit

nice catch. that's a great find.

Comment on lines -52 to +64
std::erase(_cache->_locked_offset_ranges, _range);
if (_range) {
    std::erase(_cache->_locked_offset_ranges, _range.value());
}
A member commented:

gosh. nice catch

Resolved review threads on src/v/storage/readers_cache.cc and src/v/storage/tests/storage_e2e_test.cc.
auto holder = cache.evict_prefix_truncate(model::offset(10)).get();

// [0,1] is in [0,10]
test_within_range(0, 1, true);
A member commented:

nice

@mmaslankaprv mmaslankaprv merged commit 16d30b1 into redpanda-data:dev Jul 28, 2022