Support parallel read and write/delete to same key in NonBatchedOpsStressTest #11058

Closed
wants to merge 5 commits into from

@hx235 hx235 commented Dec 27, 2022

Context:
Currently, NonBatchedOpsStressTest does not allow multi-threaded reads (i.e., Get, Iterator) and writes (i.e., Put, Merge) or deletes to the same key. Every read or write/delete operation acquires a lock (GetLocksForKeyRange) on the target key to gain exclusive access to it. This does not align with RocksDB's nature of allowing multi-threaded reads and writes/deletes to the same key; that is, concurrent threads can issue reads/writes/deletes to RocksDB without external locking. This is therefore a gap in our testing coverage.

To close the gap, the biggest challenge is verifying the db value against the expected state in the presence of parallel reads and writes/deletes. The challenge arises because a read/write/delete to the db and the corresponding read/write to the expected state do not happen within one atomic operation. Therefore we may not know the exact expected state for a given db read: by the time we read the expected state for that db read, a write to the expected state for another db write to the same key might have changed it.

Summary:
Credit to @ajkr for the idea: we now solve this challenge by breaking the 32-bit expected value of a key into different parts that can be read and written in parallel.

Basically, we divide the 32-bit expected value into value_base (corresponding to the previous whole 32 bits, but now with a smaller allowed value base range), pending_write (i.e., whether there is an ongoing concurrent write), del_counter (i.e., the number of times a value has been deleted, analogous to value_base for writes), pending_delete (similar to pending_write), and deleted (i.e., whether a key is deleted).
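
As a rough illustration of the split described above, here is a minimal sketch of a packed 32-bit expected value. The class name `ExpectedValueSketch`, the method names, and the exact bit positions and widths are assumptions made for this example and are not taken from the PR text; the sketch is also single-threaded, whereas a shared expected state would need atomic storage.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (illustrative only, not confirmed by the PR description):
//   bits  0-14  value_base
//   bit     15  pending_write
//   bits 16-29  del_counter
//   bit     30  pending_delete
//   bit     31  deleted
constexpr uint32_t kValueBaseMask = 0x00007fffu;
constexpr uint32_t kPendingWriteMask = 1u << 15;
constexpr uint32_t kDelCounterMask = 0x3fff0000u;
constexpr uint32_t kDelCounterShift = 16;
constexpr uint32_t kPendingDeleteMask = 1u << 30;
constexpr uint32_t kDeletedMask = 1u << 31;

// Hypothetical stand-in for an ExpectedValue-style wrapper.
class ExpectedValueSketch {
 public:
  explicit ExpectedValueSketch(uint32_t raw) : raw_(raw) {}

  uint32_t Raw() const { return raw_; }
  uint32_t ValueBase() const { return raw_ & kValueBaseMask; }
  uint32_t DelCounter() const {
    return (raw_ & kDelCounterMask) >> kDelCounterShift;
  }
  bool PendingWrite() const { return (raw_ & kPendingWriteMask) != 0; }
  bool PendingDelete() const { return (raw_ & kPendingDeleteMask) != 0; }
  bool IsDeleted() const { return (raw_ & kDeletedMask) != 0; }

  // A write is split in two steps: flag it as pending before touching the
  // db, then bump value_base and clear the flag once the db write returns.
  void BeginWrite() { raw_ |= kPendingWriteMask; }
  void CommitWrite() {
    assert(PendingWrite());
    const uint32_t next = (ValueBase() + 1) & kValueBaseMask;  // wraps to 0
    raw_ &= ~(kValueBaseMask | kPendingWriteMask | kDeletedMask);
    raw_ |= next;
  }

  // A delete follows the same two-step pattern using del_counter.
  void BeginDelete() { raw_ |= kPendingDeleteMask; }
  void CommitDelete() {
    assert(PendingDelete());
    const uint32_t next =
        ((DelCounter() + 1) << kDelCounterShift) & kDelCounterMask;
    raw_ &= ~(kDelCounterMask | kPendingDeleteMask);
    raw_ |= next | kDeletedMask;
  }

 private:
  uint32_t raw_;
};
```

In this sketch a reader that sees `PendingWrite()` set knows that either the old or the next value base may be visible in the db, which is what lets a read be checked without taking the key lock.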

Also, we need to use an incremental value_base instead of a random value base as before, because we want to control the range of value bases that a correct db read result can fall in, in the presence of parallel reads and writes. That way, we can verify the correctness of the read against the expected state more easily. This comes at the cost of reducing the randomness of the values generated in NonBatchedOpsStressTest, which we are willing to accept.
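
To make the range idea concrete, here is a hedged sketch of how a read could be checked against two expected-state snapshots, one taken before the db read and one taken after it. The struct `Snapshot`, the helper names, and the modulus of 2^15 are hypothetical choices for this example; the actual algorithm is in the PR's code and may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed number of distinct value bases before wrapping (illustrative).
constexpr uint32_t kValueBaseModulus = 1u << 15;

// Hypothetical view of the expected state at one point in time.
struct Snapshot {
  uint32_t value_base;
  bool pending_write;
};

// With an incremental value_base, the next write always produces base + 1.
inline uint32_t NextValueBase(uint32_t value_base) {
  return (value_base + 1) % kValueBaseModulus;
}

// True if `observed` lies in the closed range [lo, hi], where the range may
// wrap around past kValueBaseModulus - 1 back to 0.
inline bool InCyclicRange(uint32_t observed, uint32_t lo, uint32_t hi) {
  if (lo <= hi) {
    return lo <= observed && observed <= hi;
  }
  return observed >= lo || observed <= hi;
}

// A db read is plausible if its value base is no older than what was
// expected before the read and no newer than what a still-pending write
// could have produced by the time the read finished.
inline bool ReadIsPlausible(uint32_t observed, const Snapshot& pre_read,
                            const Snapshot& post_read) {
  const uint32_t lo = pre_read.value_base;
  const uint32_t hi = post_read.pending_write
                          ? NextValueBase(post_read.value_base)
                          : post_read.value_base;
  return InCyclicRange(observed, lo, hi);
}
```

With random value bases there would be no ordering to bound the read by, which is why this kind of check relies on the incremental scheme.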

(For the detailed algorithm of how to use these parts to infer the expected state of a key, see the PR.)

Misc: hide value_base details from callers of ExpectedState by abstracting the related logic into an ExpectedValue class

Test:

  • Manual test with a small number of keys (i.e., a high chance of parallel reads and writes/deletes to the same key) with equally distributed reads/writes/deletes for 30 min
python3 tools/db_crashtest.py --simple {blackbox|whitebox} --sync_fault_injection=1 --skip_verifydb=0 --continuous_verification_interval=1000 --clear_column_family_one_in=0 --max_key=10 --column_families=1 --threads=32 --readpercent=25 --writepercent=25 --nooverwritepercent=0 --iterpercent=25 --verify_iterator_with_expected_state_one_in=1 --num_iterations=5 --delpercent=15 --delrangepercent=10 --range_deletion_width=5 --use_merge={0|1} --use_put_entity_one_in=0 --use_txn=0 --verify_before_write=0 --user_timestamp_size=0 --compact_files_one_in=1000 --compact_range_one_in=1000 --flush_one_in=1000 --get_property_one_in=1000 --ingest_external_file_one_in=100 --backup_one_in=100 --checkpoint_one_in=100 --approximate_size_one_in=0 --acquire_snapshot_one_in=100 --use_multiget=0 --prefixpercent=0 --get_live_files_one_in=1000 --manual_wal_flush_one_in=1000 --pause_background_one_in=1000 --target_file_size_base=524288 --write_buffer_size=524288 --verify_checksum_one_in=1000 --verify_db_one_in=1000
  • Rehearsal stress test with normal and aggressive parameters to see whether this change can find what the existing stress test can find (i.e., no regression in testing capability)
  • [Ongoing] Try to find new bugs with this change that are not found by the current NonBatchedOpsStressTest, which has no parallel read and write/delete to the same key

@hx235 hx235 added the WIP Work in progress label Dec 27, 2022
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch 2 times, most recently from f703d6a to ed8c324 Compare December 28, 2022 00:37
@facebook-github-bot

@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from ed8c324 to 21ab545 Compare December 28, 2022 19:42
@facebook-github-bot

@hx235 has updated the pull request. You must reimport the pull request before landing.

@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 21ab545 to 9a53dd5 Compare December 28, 2022 21:07
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 9a53dd5 to 06e5d7e Compare December 30, 2022 19:17
@hx235 hx235 changed the title Allow parallel read and write to same key in NonBatchedOpsStressTest Allow parallel read and write/delete to same key in NonBatchedOpsStressTest Dec 30, 2022
@hx235 hx235 changed the title Allow parallel read and write/delete to same key in NonBatchedOpsStressTest Support parallel read and write/delete to same key in NonBatchedOpsStressTest Dec 30, 2022
@hx235 hx235 removed the WIP Work in progress label Dec 31, 2022
@hx235 hx235 requested a review from ajkr December 31, 2022 20:59
@ajkr ajkr removed their request for review January 26, 2023 18:39

@ajkr ajkr left a comment

LGTM. There are a few ways to improve readability but overall it's already pretty nice to read - thanks for the great PR!

db_stress_tool/expected_state.h (outdated, resolved)
db_stress_tool/expected_state.cc (outdated, resolved)
db_stress_tool/expected_state.cc (resolved)
db_stress_tool/expected_state.cc (outdated, resolved)
db_stress_tool/expected_state.cc (outdated, resolved)
@hx235

hx235 commented Mar 4, 2023

Thanks for the review - will get back to this next week Mon/Tue!

@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 06e5d7e to 4a7ae09 Compare March 28, 2023 03:01
@hx235

hx235 commented Mar 28, 2023

Mainly a rebase, done by working my way through the related merged PRs (#11133, #11147, #11144, #11228, #11249, #11303), as the rebase is non-trivial (i.e., can't fully trust git rebase)

@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 4a7ae09 to 3d57beb Compare March 28, 2023 03:05
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from d5ab031 to 7841783 Compare May 11, 2023 23:07
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 7841783 to 8ed8109 Compare May 11, 2023 23:11
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 8ed8109 to 756af42 Compare May 12, 2023 00:30
@hx235 hx235 force-pushed the parallel_read_write_stress_test branch from 756af42 to 5440fca Compare May 12, 2023 00:33

@ajkr ajkr left a comment

LGTM, thanks for the updates

db_stress_tool/db_stress_shared_state.h (outdated, resolved)
db_stress_tool/expected_state.h (outdated, resolved)
@facebook-github-bot

@hx235 merged this pull request in 5fc57ee.

facebook-github-bot pushed a commit that referenced this pull request Aug 11, 2023
Summary:
**Context/Summary**
After #11058, we no longer lock the key range to iterate in TestIterateAgainstExpected, except for working with timestamp feature.

Pull Request resolved: #11695

Test Plan: no code change

Reviewed By: ajkr

Differential Revision: D48276668

Pulled By: hx235

fbshipit-source-id: dc92a3708b2281dc737c0877fb755548bf03a9fc
rockeet pushed a commit to topling/toplingdb that referenced this pull request Dec 18, 2023
Summary:
**Context/Summary**
After facebook/rocksdb#11058, we no longer lock the key range to iterate in TestIterateAgainstExpected, except for working with timestamp feature.

Pull Request resolved: facebook/rocksdb#11695

Test Plan: no code change

Reviewed By: ajkr

Differential Revision: D48276668

Pulled By: hx235

fbshipit-source-id: dc92a3708b2281dc737c0877fb755548bf03a9fc
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request May 29, 2024
Summary: This is most likely copypasta from `TestGet` from before facebook#11058 . There is no need to lock the mutex for the key for reads; in fact, doing so is detrimental to test coverage since it locks out concurrent writers.

Differential Revision: D57915207
ltamasi added a commit to ltamasi/rocksdb that referenced this pull request May 29, 2024
…ity (facebook#12709)

Summary:

This is most likely copypasta from `TestGet` from before facebook#11058 . There is no need to lock the mutex for the key for reads; in fact, doing so is detrimental to test coverage since it locks out concurrent writers.

Differential Revision: D57915207
facebook-github-bot pushed a commit that referenced this pull request May 29, 2024
…ity (#12709)

Summary:
Pull Request resolved: #12709

This is most likely copypasta from `TestGet` from before #11058 . There is no need to lock the mutex for the key for reads; in fact, doing so is detrimental to test coverage since it locks out concurrent writers.

Reviewed By: jowlyzhang

Differential Revision: D57915207

fbshipit-source-id: eb0dbf6b84e5408b87d96dd47597511996e206a7
rockeet pushed a commit to topling/toplingdb that referenced this pull request Sep 1, 2024
Summary:
**Context/Summary**
After facebook/rocksdb#11058, we no longer lock the key range to iterate in TestIterateAgainstExpected, except for working with timestamp feature.

Pull Request resolved: facebook/rocksdb#11695

Test Plan: no code change

Reviewed By: ajkr

Differential Revision: D48276668

Pulled By: hx235

fbshipit-source-id: dc92a3708b2281dc737c0877fb755548bf03a9fc