
Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ #10610

Closed
wants to merge 2 commits

Conversation

@hx235 (Contributor) commented Aug 30, 2022

Context/Summary:
According to https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_job.h#L328-L332, any read of *bg_compaction_scheduled_ or *bg_bottom_compaction_scheduled_ should be protected by the mutex, which isn't the case for some assert statements. This leads to a data race that can be reproduced with the following command:

db=/dev/shm/rocksdb_crashtest_blackbox
exp=/dev/shm/rocksdb_crashtest_expected
rm -rf $db $exp
mkdir -p $exp

./db_stress --clear_column_family_one_in=0 --column_families=1 --db=$db --delpercent=10 --delrangepercent=0 --destroy_db_initially=1 --expected_values_dir=$exp --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=1000000 --max_key_len=3 --prefixpercent=0 --readpercent=0 --reopen=0 --ops_per_thread=100000000 --value_size_mult=32 --writepercent=90  --compaction_pri=4 --use_txn=1 --level_compaction_dynamic_level_bytes=True  --compaction_ttl=0  --compact_files_one_in=1000000 --compact_range_one_in=1000000 --value_size_mult=32 --verify_db_one_in=1000  --write_buffer_size=65536 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_key=25000000 --max_key_len=3 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --target_file_size_base=2097152 --target_file_size_multiplier=2
WARNING: ThreadSanitizer: data race (pid=73424)
  Read of size 4 at 0x7b8c0000151c by thread T13:
    #0 ReleaseSubcompactionResources internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:390 (db_stress+0x630aa3)
    #1 rocksdb::CompactionJob::Run() internal_repo_rocksdb/repo/db/compaction/compaction_job.cc:741 (db_stress+0x630aa3)
    #2 rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3436 (db_stress+0x60b2cc)
    #3 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2950 (db_stress+0x606d79)
    #4 rocksdb::DBImpl::BGWorkCompaction(void*) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a)

  Previous write of size 4 at 0x7b8c0000151c by thread T12 (mutexes: write M438955329917552448):
    #0 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:3018 (db_stress+0x6072a1)
    #1 rocksdb::DBImpl::BGWorkCompaction(void*) internal_repo_rocksdb/repo/db/db_impl/db_impl_compaction_flush.cc:2693 (db_stress+0x60356a)

Location is heap block of size 6720 at 0x7b8c00000000 allocated by main thread:
    #0 operator new(unsigned long, std::align_val_t) <null> (db_stress+0xbab5bb)
    #1 rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool) internal_repo_rocksdb/repo/db/db_impl/db_impl_open.cc:1811 (db_stress+0x69769a)
    #2 rocksdb::TransactionDB::Open(rocksdb::DBOptions const&, rocksdb::TransactionDBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::TransactionDB**) internal_repo_rocksdb/repo/utilities/transactions/pessimistic_transaction_db.cc:258 (db_stress+0x8ae1f4)
    #3 rocksdb::StressTest::Open(rocksdb::SharedState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:2611 (db_stress+0x32b927)
    #4 rocksdb::StressTest::InitDb(rocksdb::SharedState*) internal_repo_rocksdb/repo/db_stress_tool/db_stress_test_base.cc:290 (db_stress+0x34712c)

This PR adds the missing mutex acquisitions that should have been in place.
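
For illustration, here is a standalone sketch of the fix pattern (std::mutex stands in for RocksDB's InstrumentedMutex, and all names here are hypothetical):

#include <cassert>
#include <mutex>

// A counter shared with background threads must be read under the same
// mutex its writers hold; an unguarded assert on it is a data race.
struct DBState {
  std::mutex mu;                    // plays the role of DBImpl's mutex
  int bg_compaction_scheduled = 0;  // written by background threads under mu
};

void CheckScheduledCount(DBState* db) {
  std::lock_guard<std::mutex> lock(db->mu);  // the "missing mutex" of this PR
  assert(db->bg_compaction_scheduled >= 0);  // read is now race-free
}

int main() {
  DBState db;
  CheckScheduledCount(&db);
  return 0;
}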

Test:

  • The repro command above
  • Existing CI

@facebook-github-bot (Contributor)

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@@ -321,6 +321,7 @@ void CompactionJob::AcquireSubcompactionResources(
                   ->write_controller()
                   ->NeedSpeedupCompaction())
              .max_compactions;
+  db_mutex_->Lock();
Contributor

Maybe use InstrumentedMutexLock

Contributor Author

Fixed.

@@ -380,6 +380,7 @@ void CompactionJob::ReleaseSubcompactionResources() {
   if (extra_num_subcompaction_threads_reserved_ == 0) {
     return;
   }
+  db_mutex_->Lock();
Contributor

Ditto

Contributor Author

Unfortunately I can't easily use InstrumentedMutexLock for this one, because ShrinkSubcompactionResources(extra_num_subcompaction_threads_reserved_), which runs right after the unlock but before the scope ends, needs to acquire the lock again. So I won't be fixing this one.

@riversand963 (Contributor) commented Aug 30, 2022

How about

{
  InstrumentedMutexLock lock(db_mutex_);
  assert(...);
}
ShrinkSubcompactionResources(extra_num_subcompaction_threads_reserved_);

Furthermore, the mutex lock/unlock is not needed in opt mode since the asserts compile away there, but if we differentiate between debug and opt mode there may be a gap in test coverage, so I am fine with the current approach.
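
For reference, a standalone sketch of this scoping pattern, with std::lock_guard standing in for InstrumentedMutexLock and all names hypothetical:

#include <cassert>
#include <mutex>

std::mutex mu;           // stand-in for the DB mutex
int shared_counter = 0;  // stand-in for *bg_compaction_scheduled_

void ShrinkResources() {
  std::lock_guard<std::mutex> lock(mu);  // this callee takes the lock itself
  ++shared_counter;
}

void Release() {
  {
    std::lock_guard<std::mutex> lock(mu);  // guard's scope ends at the brace
    assert(shared_counter >= 0);
  }  // mutex released here, before the callee re-acquires it
  ShrinkResources();  // safe: no lock held at this point
}

int main() {
  Release();
  return 0;
}

Limiting the guard to an inner braced scope keeps the RAII style for the assert while still letting the following call take the lock on its own.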

Contributor Author

Got you - yeah, I can change to your suggested version! That's the easy way I didn't think of!

Contributor Author

Fixed

@hx235 hx235 changed the title Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ [draft]Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ Aug 30, 2022
@hx235 hx235 added the WIP Work in progress label Aug 30, 2022
@facebook-github-bot (Contributor)

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor)

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@hx235 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor)

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 changed the title [draft]Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ Aug 30, 2022
@hx235 hx235 removed the WIP Work in progress label Aug 30, 2022
@riversand963 (Contributor) left a comment

LGTM as long as it fixes the data race.

Not related to this PR, but I find the design of having one class keeping a pointer to a non-public data member difficult to track when trying to reason about data access safety. We should think of a better way of doing it.

@facebook-github-bot (Contributor)

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 (Contributor Author) commented Aug 30, 2022

> LGTM as long as it fixes the data race.
>
> Not related to this PR, but I find the design of having one class keeping a pointer to a non-public data member difficult to track when trying to reason about data access safety. We should think of a better way of doing it.

Thanks. Yeah, this was one of those difficult designs encountered during the internship, where we decided to pursue the current approach. Need to think harder about your suggestion ...

@hx235 (Contributor Author) commented Aug 30, 2022

> LGTM as long as it fixes the data race.
>
> Not related to this PR, but I find the design of having one class keeping a pointer to a non-public data member difficult to track when trying to reason about data access safety. We should think of a better way of doing it.

Tracked this internally as tech debt.
