Improve memory efficiency of many OptimisticTransactionDBs #11439

Closed
wants to merge 4 commits

Conversation

pdillinger (Contributor):

Summary: Currently it's easy to use a ton of memory with many small OptimisticTransactionDB instances, because each one by default allocates a million mutexes (40 bytes each on my compiler) for validating transactions. It even puts a lot of pressure on the allocator by allocating each one individually!

In this change:

  • Create a new object and option that enable sharing these buckets of mutexes between instances. This is generally good for load balancing potential contention as various DBs become hotter or colder with txn writes. About the only cases where this sharing wouldn't make sense (e.g. each DB usually written by one thread) are cases that would be better off with OccValidationPolicy::kValidateSerial, which doesn't use the buckets anyway.
  • Allocate the mutexes in a contiguous array, for efficiency.
  • Add an option to ensure the mutexes are cache-aligned. In several other places we use cache-aligned mutexes, but OptimisticTransactionDB historically does not. It should be a space-time trade-off the user can choose.
  • Provide some visibility into the memory used by the mutex buckets with an ApproximateMemoryUsage() function (also used in unit testing).
  • Share code with other users of "striped" mutexes, with appropriate refactoring for customization & efficiency (e.g. using FastRange instead of modulus). A standalone sketch of these striped-mutex ideas follows this list.
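
For illustration only, here is a minimal standalone sketch of those striped-mutex ideas (contiguous allocation, optional cache alignment, FastRange-style bucket selection). It is not the RocksDB implementation; it assumes a 64-byte cache line, C++17 over-aligned new, and a compiler providing unsigned __int128 (GCC/Clang):

#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>

// Assumption: 64-byte cache lines; alignas(64) gives each mutex its own line.
struct alignas(64) CacheAlignedMutex {
  std::mutex m;
};

class StripedMutexSketch {
 public:
  explicit StripedMutexSketch(size_t bucket_count)
      : count_(bucket_count),
        // One contiguous allocation instead of bucket_count separate ones
        // (over-aligned new[] requires C++17).
        buckets_(std::make_unique<CacheAlignedMutex[]>(bucket_count)) {}

  std::mutex& BucketFor(uint64_t hash) {
    // FastRange idea: map a 64-bit hash to [0, count_) with a multiply and
    // shift instead of a modulus.
    uint64_t idx = static_cast<uint64_t>(
        (static_cast<unsigned __int128>(hash) * count_) >> 64);
    return buckets_[idx].m;
  }

  size_t ApproximateMemoryUsage() const {
    return count_ * sizeof(CacheAlignedMutex);
  }

 private:
  uint64_t count_;
  std::unique_ptr<CacheAlignedMutex[]> buckets_;
};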

Test Plan: unit tests added. Ran sized-up versions of the stress test in the unit tests, including a before-and-after performance test showing no consistent difference. (NOTE: OptimisticTransactionDB is not currently covered by db_stress!)
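
As a usage illustration, many small DBs could share one set of lock buckets roughly like this. This is a hedged sketch, not code from the PR: the factory MakeSharedOccLockBuckets, the shared_lock_buckets field of OptimisticTransactionDBOptions, and the Open() overload shown should be checked against the released optimistic_transaction_db.h header.

#include <string>
#include <vector>

#include "rocksdb/utilities/optimistic_transaction_db.h"

using namespace ROCKSDB_NAMESPACE;

// Open several small OptimisticTransactionDBs that validate against one
// shared, contiguous set of lock buckets instead of ~1M mutexes each.
Status OpenManySmallDBs(const std::vector<std::string>& paths,
                        std::vector<OptimisticTransactionDB*>* dbs) {
  // Second argument (assumed) requests cache-aligned mutexes.
  auto buckets = MakeSharedOccLockBuckets(1 << 20, /*cache_aligned=*/false);

  Options options;
  options.create_if_missing = true;
  OptimisticTransactionDBOptions occ_options;
  occ_options.shared_lock_buckets = buckets;

  for (const auto& path : paths) {
    std::vector<ColumnFamilyDescriptor> cfs;
    cfs.emplace_back(kDefaultColumnFamilyName, ColumnFamilyOptions(options));
    std::vector<ColumnFamilyHandle*> handles;
    OptimisticTransactionDB* db = nullptr;
    Status s = OptimisticTransactionDB::Open(DBOptions(options), occ_options,
                                             path, cfs, &handles, &db);
    if (!s.ok()) {
      return s;
    }
    dbs->push_back(db);  // caller owns db and handles
  }
  return Status::OK();
}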

@facebook-github-bot (Contributor):

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines 49 to 50
// Most details in internal derived class
// Users should not derive from this class
Reviewer (Contributor):

We could consider enforcing this (instead of relying on users) using the pimpl idiom:

#include <memory>  // std::unique_ptr

class OccLockBuckets {
 public:
  ~OccLockBuckets();  // non-virtual, implemented in .cc

  // Non-virtual; forwarded to impl_->ApproximateMemoryUsage(), implemented in .cc
  size_t ApproximateMemoryUsage() const;

 private:
  // private ctor, implemented in .cc
  // friends as needed (e.g. factory method)
  class ImplBase;
  std::unique_ptr<ImplBase> impl_;
};

pdillinger (Author):

I don't like the additional level of indirection, and wanted to avoid some boilerplate, but I'll include the version without indirection.
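
For contrast, the version without indirection presumably looks something like the following hedged reconstruction (not necessarily the merged code): a thin abstract base whose details live in an internal derived class, with a private constructor plus friend declaration standing in for the "users should not derive" request.

#include <cstddef>

class OccLockBuckets {
 public:
  // Most details in internal derived class.
  // Users should not derive from this class.
  virtual ~OccLockBuckets() {}
  virtual size_t ApproximateMemoryUsage() const = 0;

 private:
  // Private ctor + friend keep outside code from constructing or deriving
  // (a derived class cannot call the private base constructor).
  friend class OccLockBucketsImplBase;
  OccLockBuckets() {}
};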

Comment on lines 127 to 134
for (auto v : lk_ptrs) {
  v->Lock();
}
Defer unlocks([&]() {
  for (auto v : lk_ptrs) {
    v->Unlock();
  }
});
Reviewer (Contributor):

I realize that (at least per the style guide) exceptions don't exist ;) but this is slightly less exception-safe than the original. Building a vector of unique_locks as we iterate means that if an exception is thrown during a call to Lock, the mutexes locked earlier get automatically unlocked; here, we only ensure that if every mutex was successfully locked, each will also get unlocked.
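
To make that concrete, here is a minimal sketch of the unique_lock approach being described (illustrative only: it uses std::mutex with lock()/unlock(), whereas the bucket mutexes in this PR expose Lock()/Unlock()):

#include <mutex>
#include <vector>

void LockAllSketch(const std::vector<std::mutex*>& lk_ptrs) {
  std::vector<std::unique_lock<std::mutex>> held;
  held.reserve(lk_ptrs.size());
  for (auto* m : lk_ptrs) {
    // Locks *m now. If a later lock attempt throws, the unique_locks already
    // in `held` release their mutexes as the vector is destroyed during
    // stack unwinding.
    held.emplace_back(*m);
  }
  // ... critical section ...
}  // every held lock is also released here on normal exit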

pdillinger (Author):

I'll note this in a comment.

template <bool cache_aligned>
class OccLockBucketsImpl : public OccLockBucketsImplBase {
 public:
  OccLockBucketsImpl(size_t bucket_count) : locks_(bucket_count) {}
Reviewer (Contributor):

Might want to mark this ctor explicit, or the linter might complain.
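
A minimal sketch of the suggestion (the class name and the locks_ member shown are illustrative stand-ins, not the internal code): marking the single-argument constructor explicit stops a bare size_t from implicitly converting into a bucket object.

#include <cstddef>
#include <mutex>
#include <vector>

template <bool cache_aligned>  // unused in this sketch
class OccLockBucketsSketch {
 public:
  // explicit: callers must write OccLockBucketsSketch<...>(n), not just n.
  explicit OccLockBucketsSketch(size_t bucket_count) : locks_(bucket_count) {}

 private:
  std::vector<std::mutex> locks_;  // stand-in for the real bucket storage
};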

@facebook-github-bot (Contributor):

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor):

@pdillinger has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor):

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor):

@pdillinger merged this pull request in 17bc277.
