Skip to content

Commit

Permalink
Force thread cleanup in threadpool destructor
Browse files Browse the repository at this point in the history
The current threadpool design expects that host/client
programs that use RocksDB will explicitly call
ThreadPoolImpl::Impl::JoinThreads before the threadpool
destructor is called. It has been observed that a case
may arise in which the join on the threads is not done
and the destructor is called by the exit handler of
the host program. When this happens, the threadpool
destructor will hang on the call to pthread_cond_destroy as
show in the truncated stack trace of a hung process below:

    $ pstack 3885
    #0  0x000014e2288ebe63 in pthread_cond_destroy@@GLIBC_2.3.2 () from target:/lib64/libpthread.so.0
    facebook#1  0x000014e225a969af in rocksdb::ThreadPoolImpl::Impl::~Impl (this=0x14e227c472a0, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/util/threadpool_imp.cc:147
    facebook#2  std::default_delete<rocksdb::ThreadPoolImpl::Impl>::operator() (this=<optimized out>, __ptr=0x14e227c472a0) at /gcc-x.5.0/include/c++/x.5.0/bits/unique_ptr.h:78
    facebook#3  std::unique_ptr<rocksdb::ThreadPoolImpl::Impl, std::default_delete<rocksdb::ThreadPoolImpl::Impl> >::~unique_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/unique_ptr.h:263
    facebook#4  rocksdb::ThreadPoolImpl::~ThreadPoolImpl (this=<optimized out>, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/util/threadpool_imp.cc:427
    facebook#5  0x000014e225af7186 in std::_Destroy<rocksdb::ThreadPoolImpl> (__pointer=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:98
    facebook#6  std::_Destroy_aux<false>::__destroy<rocksdb::ThreadPoolImpl*> (__last=<optimized out>, __first=0x14e227c0f190) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:108
    facebook#7  std::_Destroy<rocksdb::ThreadPoolImpl*> (__last=<optimized out>, __first=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:137
    facebook#8  std::_Destroy<rocksdb::ThreadPoolImpl*, rocksdb::ThreadPoolImpl> (__last=0x14e227c0f1c0, __first=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_construct.h:206
    facebook#9  std::vector<rocksdb::ThreadPoolImpl, std::allocator<rocksdb::ThreadPoolImpl> >::~vector (this=0x14e225f65a40 <rocksdb::Env::Default()::default_env+32>, __in_chrg=<optimized out>) at /gcc-x.5.0/include/c++/x.5.0/bits/stl_vector.h:434
    facebook#10 rocksdb::(anonymous namespace)::PosixEnv::~PosixEnv (this=0x14e225f65a20 <rocksdb::Env::Default()::default_env>, __in_chrg=<optimized out>) at MariaDB/storage/rocksdb/rocksdb/env/env_posix.cc:133
    facebook#11 0x000014e2285685ac in __run_exit_handlers () from target:/lib64/libc.so.6
    facebook#12 0x000014e2285685fa in exit () from target:/lib64/libc.so.6
    ...

The reason this hang occurs is because the threadpool
condition variable may still be locked by other threads.
To fix the issue, a cleanup of the threads
is forced by calling join before the threadpool proceeds
with the rest of it's object destruction process. In
this way, we handle any corner cases were the destructor is called,
but the threadpool's condition variable is still locked
by another thread.

Lastly in our call to ThreadPoolImpl::Impl::JoinThreads
we set the boolean wait_for_jobs_to_complete = false
so that the effect of destructing the threadpool is
immediate.

This change was tested by using MariaDB as a host program,
and making it follow an execution path that directly calls
the threadpool destructor as MariaDB is exiting:
1) Remove the addr2line binary on the host machine. This will make
the MariaDB cleanup handler to directly call the RocksDB
threadpool destructor when a MariaDB crash is instigated.
2) Instigate a crash of MariaDB by sending signal 11 to its running
process `pkill -11 mysqld`.
3) Check that MariaDB is no longer running:
`ps -ef | grep mysqld`. Before this change, the process would
still be in existence in a hang state due to the RocksDB
threadpool, but after the change the process properly exits.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
  • Loading branch information
lottaquestions committed Apr 4, 2023
1 parent e5a560e commit acd2124
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion util/threadpool_imp.cc
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,15 @@ inline ThreadPoolImpl::Impl::Impl()
bgsignal_(),
bgthreads_() {}

inline ThreadPoolImpl::Impl::~Impl() { assert(bgthreads_.size() == 0U); }
inline ThreadPoolImpl::Impl::~Impl() {
// In case destructor is called by host program before
// first joining the threads, force join the threads
// so that the program does not hang on shutdown due
// to the threadpool condition variable being locked
// by other RocksDB threads.
JoinThreads(false);
assert(bgthreads_.size() == 0U);
}

void ThreadPoolImpl::Impl::JoinThreads(bool wait_for_jobs_to_complete) {
std::unique_lock<std::mutex> lock(mu_);
Expand Down

0 comments on commit acd2124

Please sign in to comment.