Fix race between flush error recovery and db destruction #12002

hx235 · 2023-10-24T03:00:44Z

Context:
DB destruction will wait for ongoing error recovery through EndAutoRecovery() and join the recovery thread:

rocksdb/db/db_impl/db_impl.cc

Line 525 in 519f2a4

error_handler_.CancelErrorRecovery();

->

rocksdb/db/error_handler.cc

Line 250 in 519f2a4

EndAutoRecovery();

->

rocksdb/db/error_handler.cc

Lines 808 to 823 in 519f2a4

    
    void ErrorHandler::EndAutoRecovery() {
      db_mutex_->AssertHeld();
      if (!end_recovery_) {
        end_recovery_ = true;
      }
      if (recovery_thread_) {
        // Ensure only one thread can execute the join().
        std::unique_ptr<port::Thread> old_recovery_thread(
            std::move(recovery_thread_));
        db_mutex_->Unlock();
        cv_.SignalAll();
        old_recovery_thread->join();
        db_mutex_->Lock();
      }
      return;
    }

However, because of a race between flush error recovery and DB destruction, recovery can start after that wait has already completed during DB shutdown. The recovery thread created by this late recovery is then never joined, so it is still joinable when it is destroyed as part of the DB destruction. That calls std::terminate() and crashes the program, as shown below.

std::terminate() 
std::default_delete<std::thread>::operator()(std::thread*) const 
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)
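
The trace above follows from a standard C++ rule: destroying a std::thread that is still joinable calls std::terminate(). The following is a minimal sketch of that rule only, not RocksDB code; `ToyHandler`, `JoinIfRunning` and `LateThreadIsJoinable` are hypothetical names, and plain std::thread is assumed in place of port::Thread.

```cpp
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical stand-in for an object that owns a thread via unique_ptr,
// as ErrorHandler does with recovery_thread_.
struct ToyHandler {
  std::unique_ptr<std::thread> recovery_thread;

  // Safe teardown: join before the unique_ptr deletes the std::thread.
  // Returns true if a thread was joined.
  bool JoinIfRunning() {
    if (recovery_thread && recovery_thread->joinable()) {
      recovery_thread->join();
      recovery_thread.reset();
      return true;
    }
    return false;
  }
};

// Simulates a thread created after the "wait and join" step has already run.
// Reports whether that thread was joinable at teardown time.
bool LateThreadIsJoinable() {
  ToyHandler h;
  h.recovery_thread = std::make_unique<std::thread>([] {});
  // A thread stays joinable until join() or detach() is called, even if its
  // function has already returned. Letting `h` go out of scope here without
  // joining would call std::terminate().
  bool joinable = h.recovery_thread->joinable();
  h.JoinIfRunning();  // Join so that this sketch itself exits cleanly.
  return joinable;
}
```

Calling `LateThreadIsJoinable()` returns true, which is the state the destructor hits in the crash: the thread exists, nobody joined it, and the owning unique_ptr is about to delete it.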

Summary:
This PR fixes the race by checking whether EndAutoRecovery() has already been called before creating the recovery thread. This mirrors the check we already perform inside the recovery thread itself.
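
The idea of the fix can be sketched as follows. This is an illustration under stated assumptions and not the actual patch: `ToyErrorHandler`, `StartRecovery` and `HasThread` are hypothetical names, std::mutex and std::thread stand in for db_mutex_ and port::Thread, and the recovery work is an empty placeholder.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical, simplified model of the pattern described above.
class ToyErrorHandler {
 public:
  // Returns true if a recovery thread was started. Refuses to start one once
  // EndAutoRecovery() has run, so no thread can appear after the final join.
  bool StartRecovery() {
    std::lock_guard<std::mutex> lock(mu_);
    if (end_recovery_ || recovery_thread_) {
      return false;
    }
    recovery_thread_ =
        std::make_unique<std::thread>([] { /* placeholder recovery work */ });
    return true;
  }

  // Marks recovery as ended and joins any thread that is already running.
  void EndAutoRecovery() {
    std::unique_ptr<std::thread> old_thread;
    {
      std::lock_guard<std::mutex> lock(mu_);
      end_recovery_ = true;
      // Move the thread out under the lock so only one caller can join it.
      old_thread = std::move(recovery_thread_);
    }
    if (old_thread) {
      old_thread->join();  // Join outside the lock.
    }
  }

  bool HasThread() {
    std::lock_guard<std::mutex> lock(mu_);
    return recovery_thread_ != nullptr;
  }

 private:
  std::mutex mu_;
  bool end_recovery_ = false;
  std::unique_ptr<std::thread> recovery_thread_;
};
```

Because the flag is set and checked under the same mutex, a StartRecovery() call that loses the race with EndAutoRecovery() returns false instead of leaving behind an unjoined thread.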

Test plan:
A new unit test reproduces the crash before this fix and passes after it.

facebook-github-bot · 2023-10-24T03:01:17Z

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-10-24T04:55:30Z

@hx235 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2023-10-24T04:55:45Z

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr

LGTM. We should see whether there are any patterns in the recent ErrorHandler race conditions: #11955 #11950 #11939 #11937 #11890 #11880 #11991. Personally, I think ErrorHandler has too much state and could maybe be implemented as functions on DBImpl.

facebook-github-bot · 2023-10-25T05:02:08Z

@hx235 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2023-10-25T05:02:26Z

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-10-25T19:02:59Z

@hx235 merged this pull request in 0f14135.

Summary:

**Context:** DB destruction will wait for ongoing error recovery through `EndAutoRecovery()` and join the recovery thread: https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/db_impl/db_impl.cc#L525 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L250 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L808-L823

However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part of the db destruction. It then crashes the program as below.

```
std::terminate()
std::default_delete<std::thread>::operator()(std::thread*) const
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)
```

**Summary:** This PR fixed it by considering whether EndAutoRecovery() has been called before creating such thread. This fix is similar to how we currently [handle](https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L688-L694) such case inside the created recovery thread.

Pull Request resolved: facebook/rocksdb#12002

Test Plan: A new UT repro-ed the crash before this fix and passes after.

Reviewed By: ajkr

Differential Revision: D50586191

Pulled By: hx235

fbshipit-source-id: b372f6d7a94eadee4b9283b826cc5fb81779a093

facebook-github-bot added the CLA Signed label Oct 24, 2023

hx235 force-pushed the error_handler_crash branch from 50fc35a to fff71a7 Compare October 24, 2023 04:55

ajkr approved these changes Oct 25, 2023


Fix race between flush error recovery VS db destruction

dd11428

hx235 force-pushed the error_handler_crash branch from fff71a7 to dd11428 Compare October 25, 2023 05:02

facebook-github-bot closed this in 0f14135 Oct 25, 2023

facebook-github-bot added the Merged label Oct 25, 2023

jaykorean mentioned this pull request Oct 31, 2023

Fix for RecoverFromRetryableBGIOError starting with recovery_in_prog_ false #11991

Closed

igorcanadi mentioned this pull request Jan 17, 2024

[SYS-6913] Upgrade RocksDB-Cloud to 8.9.1 rockset/rocksdb-cloud#315

Closed
