Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race between flush error recovery and db destruction #12002

Closed
wants to merge 1 commit into from

Conversation

hx235
Copy link
Contributor

@hx235 hx235 commented Oct 24, 2023

Context:
DB destruction will wait for ongoing error recovery through EndAutoRecovery() and join the recovery thread:

error_handler_.CancelErrorRecovery();
->
EndAutoRecovery();
->

rocksdb/db/error_handler.cc

Lines 808 to 823 in 519f2a4

void ErrorHandler::EndAutoRecovery() {
db_mutex_->AssertHeld();
if (!end_recovery_) {
end_recovery_ = true;
}
if (recovery_thread_) {
// Ensure only one thread can execute the join().
std::unique_ptr<port::Thread> old_recovery_thread(
std::move(recovery_thread_));
db_mutex_->Unlock();
cv_.SignalAll();
old_recovery_thread->join();
db_mutex_->Lock();
}
return;
}

However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part the db destruction. It then crashes the program as below.

std::terminate() 
std::default_delete<std::thread>::operator()(std::thread*) const 
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)

Summary:
This PR fixed it by considering whether EndAutoRecovery() has been called before creating such thread. This fix is similar to how we currently handle such case inside the created recovery thread.

Test plan:
A new UT repro-ed the crash before this fix and and pass after.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We should see if there's some patterns in the recent ErrorHandler race conditions. #11955 #11950 #11939 #11937 #11890 #11880 #11991. Personally I think it has too much state and maybe could be implemented as functions on DBImpl

@facebook-github-bot
Copy link
Contributor

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@hx235 merged this pull request in 0f14135.

rockeet pushed a commit to topling/toplingdb that referenced this pull request Dec 23, 2023
Summary:
**Context:**
DB destruction will wait for ongoing error recovery through `EndAutoRecovery()` and join the recovery thread: https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/db_impl/db_impl.cc#L525 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L250 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L808-L823

However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part the db destruction. It then crashes the program as below.

```
std::terminate()
std::default_delete<std::thread>::operator()(std::thread*) const
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)
```

**Summary:**
This PR fixed it by considering whether EndAutoRecovery() has been called before creating such thread. This fix is similar to how we currently [handle](https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L688-L694) such case inside the created recovery thread.

Pull Request resolved: facebook/rocksdb#12002

Test Plan: A new UT repro-ed the crash before this fix and and pass after.

Reviewed By: ajkr

Differential Revision: D50586191

Pulled By: hx235

fbshipit-source-id: b372f6d7a94eadee4b9283b826cc5fb81779a093
rockeet pushed a commit to topling/toplingdb that referenced this pull request Sep 1, 2024
Summary:
**Context:**
DB destruction will wait for ongoing error recovery through `EndAutoRecovery()` and join the recovery thread: https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/db_impl/db_impl.cc#L525 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L250 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L808-L823

However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part the db destruction. It then crashes the program as below.

```
std::terminate()
std::default_delete<std::thread>::operator()(std::thread*) const
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)
```

**Summary:**
This PR fixed it by considering whether EndAutoRecovery() has been called before creating such thread. This fix is similar to how we currently [handle](https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L688-L694) such case inside the created recovery thread.

Pull Request resolved: facebook/rocksdb#12002

Test Plan: A new UT repro-ed the crash before this fix and and pass after.

Reviewed By: ajkr

Differential Revision: D50586191

Pulled By: hx235

fbshipit-source-id: b372f6d7a94eadee4b9283b826cc5fb81779a093
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants