
memdb: Make iterators not channel based #188

Open
ValarDragon opened this issue Sep 13, 2021 · 14 comments

@ValarDragon
Contributor

Currently the memdb iterator is channel based: https://github.com/tendermint/tm-db/blob/master/memdb/iterator.go#L41-L86

This leaves an extra goroutine lying around, plus a lot of dead time lost to channel waits, in what should be a really fast hot loop. In the cacheKVStore use case, the single-core time for a Next() operation is 2-3x worse than before, and it's actually using two threads, so potentially 4-6x (perhaps double-counting, it's a bit hard to tell haha). Looking at pprof, ~none of the actual execution time is spent in the btree methods here.

This extra thread also makes debugging performance issues harder, and saturates other cores, which is pretty sub-ideal as we try to push more code to be multi-threaded!

The only thing the channel structure buys is letting two different processes read from the same iterator, with their reads interleaved. That does not make sense as a use case to me. You really shouldn't have multiple parallel processes reading from the same iterator in a blocking fashion as a general API; if that's needed, it should be pushed to the iterator consumer. In general, parallelism should push toward each parallel process 'owning' the data it's processing, with care taken in the cases where it doesn't get to own the data for the duration of its execution. (E.g. Rust's par_iter_mut gives each process one 'chunk' of the iterable space, not interleaved items. That is better for data ownership / simplicity, reduced dead time, and cache efficiency.)
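To make the contrast concrete, here is a minimal sketch (not memdb's actual code; the kv, chanIterator, and cursorIterator names are made up for illustration) of the two iterator shapes: a channel-backed one that spawns a goroutine and pays a channel receive per Next(), versus a plain cursor that advances entirely in the caller's goroutine.

```go
package main

import "fmt"

type kv struct{ key, value string }

// Channel-backed shape: a goroutine walks the data and pushes items into a
// channel; every Next() is a channel receive, which costs scheduler hand-offs.
type chanIterator struct {
	ch     chan kv
	cancel chan struct{}
	cur    kv
	ok     bool
}

func newChanIterator(items []kv) *chanIterator {
	it := &chanIterator{ch: make(chan kv), cancel: make(chan struct{})}
	go func() {
		defer close(it.ch)
		for _, item := range items {
			select {
			case it.ch <- item:
			case <-it.cancel:
				return
			}
		}
	}()
	return it
}

func (it *chanIterator) Next() bool { it.cur, it.ok = <-it.ch; return it.ok }
func (it *chanIterator) Close()     { close(it.cancel) }

// Cursor shape: state lives in the struct and Next() is a plain field update,
// with no extra goroutine and no channel synchronization.
type cursorIterator struct {
	items []kv
	pos   int
}

func (it *cursorIterator) Next() bool { it.pos++; return it.pos < len(it.items) }
func (it *cursorIterator) Item() kv   { return it.items[it.pos] }

func main() {
	items := []kv{{"a", "1"}, {"b", "2"}}

	ci := newChanIterator(items)
	for ci.Next() {
		fmt.Println("chan:", ci.cur)
	}
	ci.Close()

	cu := &cursorIterator{items: items, pos: -1}
	for cu.Next() {
		fmt.Println("cursor:", cu.Item())
	}
}
```

The cursor shape is what a non-channel memdb iterator would look like in spirit: no scheduler hand-off per Next(), and Close() doesn't have to tear down a goroutine.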

@ValarDragon
Contributor Author

ValarDragon commented Sep 14, 2021

It's fairly hard to pin down with certainty, but in some of our benchmarks that aren't doing much beyond creating many iterators, there's a ton of time going into mstart() and mcall routines (both part of the Go runtime's goroutine scheduling). It appears to be a 50x overhead over our entire actual computation.

I can't say with certainty that it's coming from these goroutines, but this is the only component of the code I'm aware of that uses goroutines. (And these are the only applicable sources of mcall and mstart I'm aware of.)
[Screenshot: pprof output, 2021-09-13 11:08 PM]

@ValarDragon
Contributor Author

ValarDragon commented Sep 14, 2021

Looked into this some more; it definitely appears like our benchmark delay (50x) is coming from here, and this probably isn't from unclosed iterators. In the benchmark, 859,926 iterators are created, and the internal function registered closing at least 846,118 of them. I highly doubt the last ~13k unclosed iterators are the delay, seeing as pprof shows the delay is in mstart.

Unfortunately, the way google/btree's iteration API is designed essentially forces the iterator to be written the way it is. However, I think the performance demand here merits switching libraries.

I recommend we switch to tidwall's btree library (which is IMO a significant improvement over google/btree), and as a first pass use its path-hint logic to make this iterator fully single-threaded, just via Get calls.

Then, if/when we want more performance, it's a very easy lift to just make the ascend and descend functions public (https://github.com/tidwall/btree/blob/master/btree.go#L528), unlike in google/btree.
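Roughly, the first-pass shape could look like the sketch below: a fully single-threaded, range-bounded iterator where each Next() does a point-style successor lookup instead of pulling from a channel. This is a sketch, not the actual proposed patch; it assumes tidwall/btree's pre-generics API (New(less), Set, Ascend(pivot, iter)) and approximates the "lookup per Next" idea with Ascend plus an early stop, since the path-hint helpers (GetHint and friends) would just be an optimization on top of the same structure. The rangeIterator and item names are illustrative.

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/tidwall/btree"
)

// item is an illustrative key/value pair stored in the tree.
type item struct{ key, value []byte }

func byKeys(a, b interface{}) bool {
	return bytes.Compare(a.(*item).key, b.(*item).key) < 0
}

// rangeIterator walks [start, end) entirely in the caller's goroutine:
// each Next() re-descends from the last key and stops after one step,
// so there is no background goroutine and no channel to synchronize on.
type rangeIterator struct {
	tree *btree.BTree
	end  []byte // exclusive upper bound; nil means unbounded
	cur  *item
}

func newRangeIterator(tree *btree.BTree, start, end []byte) *rangeIterator {
	it := &rangeIterator{tree: tree, end: end}
	tree.Ascend(&item{key: start}, func(i interface{}) bool {
		it.cur = i.(*item)
		return false // take only the first item >= start
	})
	it.checkBound()
	return it
}

func (it *rangeIterator) checkBound() {
	if it.cur != nil && it.end != nil && bytes.Compare(it.cur.key, it.end) >= 0 {
		it.cur = nil // past the end of the range
	}
}

func (it *rangeIterator) Valid() bool { return it.cur != nil }

func (it *rangeIterator) Next() {
	prev := it.cur
	it.cur = nil
	if prev == nil {
		return // Next on an exhausted iterator is a no-op
	}
	// Find the first item strictly greater than the previous key.
	it.tree.Ascend(prev, func(i interface{}) bool {
		cand := i.(*item)
		if bytes.Equal(cand.key, prev.key) {
			return true // skip the pivot itself, keep scanning
		}
		it.cur = cand
		return false
	})
	it.checkBound()
}

func main() {
	tree := btree.New(byKeys)
	for _, k := range []string{"a", "b", "c", "d"} {
		tree.Set(&item{key: []byte(k), value: []byte("v-" + k)})
	}
	for it := newRangeIterator(tree, []byte("b"), []byte("d")); it.Valid(); it.Next() {
		fmt.Printf("%s => %s\n", it.cur.key, it.cur.value) // prints b, c
	}
}
```

Each Next() here is O(log n) because it re-descends the tree; path hints (or driving the whole scan from ascend/descend once they're public) would amortize that, but even this shape removes the goroutine and channel entirely.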

@tac0turtle
Contributor

If the change can be introduced in a non-breaking way we could make this happen; otherwise a similar issue should be opened in the SDK, as the memdb was added there as well.

Love to see a db meant for testing purposes used in prod lol

@ValarDragon
Contributor Author

ValarDragon commented Sep 14, 2021

Oh yikes lol. I thought it was already used throughout the codebase, otherwise would've just re-imported a BTree :(

This can be added in a non-breaking way

@tac0turtle
Contributor

The memdb in the SDK is slated for another release. If you want this change now, it would have to happen here.

@ValarDragon
Contributor Author

agreed/understood, working on it here!

@ValarDragon
Contributor Author

ValarDragon commented Sep 29, 2021

No perms to re-open this. Can this remain open though? I am very much hoping to find someone with availability to work on this; it's very much needed. This problem causes Go's race detector to catch lots of false positives =/

@ValarDragon
Contributor Author

I was investigating with traces, and basically ~all of our time is spent in dead time waiting for these chan.recv's and mutex unlocks to line up, and for goroutines to start and stop, between here and the IAVL iterator.
Every get operation requires a sync between channels, which in these benchmarks takes 70,000ns each in a tight get loop.

This would also explain all the crazy delays in queries that involve iteration. I'm going to prioritize patches for these, maybe our nodes will finally be saved 😅
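For anyone reproducing this, a micro-benchmark in roughly the following shape is enough to see the per-Next cost in a tight loop. This is a sketch, assuming tm-db's usual NewMemDB/Iterator API; the exact import path and constructor may differ between tm-db versions.

```go
package memdb_test

import (
	"fmt"
	"testing"

	tmdb "github.com/tendermint/tm-db"
)

// BenchmarkMemDBIteratorScan measures the cost of creating an iterator and
// walking a small keyspace, which is where the channel-sync overhead shows up.
func BenchmarkMemDBIteratorScan(b *testing.B) {
	db := tmdb.NewMemDB()
	for i := 0; i < 1000; i++ {
		if err := db.Set([]byte(fmt.Sprintf("key-%04d", i)), []byte("value")); err != nil {
			b.Fatal(err)
		}
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		it, err := db.Iterator(nil, nil) // nil bounds = full keyspace
		if err != nil {
			b.Fatal(err)
		}
		for ; it.Valid(); it.Next() {
			_ = it.Key()
		}
		it.Close()
	}
}
```

Dividing the reported ns/op by the number of keys gives a rough per-Next figure to compare against the ~70,000ns channel-sync number above.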

@faddat
Contributor

faddat commented May 7, 2022

@ValarDragon I missed this somehow.

We'll begin work on it, but we expect this to be state breaking, right?

@ValarDragon
Contributor Author

It's not state breaking; this is purely an in-memory representation. I don't think this is where the big wins are though, as it only impacts the cacheKV store.

@faddat
Contributor

faddat commented May 7, 2022

The thinking is to start at the bottom, and work our way up-- especially since I'm learning as I go.

I figure that even if this dependency (tm-db) is slated to be deprecated, it has a year left in its lifespan

(time wasted on perf) x 100+ engineers = worth it I guess?

@faddat
Contributor

faddat commented May 7, 2022

Ascend and Descend are now public functions in tidwall/btree

@ValarDragon
Contributor Author

Yes, but the CacheKV store is not where any bottleneck is coming from right now.

@catShaark
Contributor

Ascend and Descend are now public functions in tidwall/btree

but there's no AscendRange or similar function.
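For what it's worth, a half-open range scan can be emulated with Ascend plus an early stop in the callback. A minimal sketch, again assuming tidwall/btree's pre-generics API; the entry type and ascendRange helper are illustrative, not part of the library:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/tidwall/btree"
)

type entry struct{ key, value []byte }

func less(a, b interface{}) bool {
	return bytes.Compare(a.(*entry).key, b.(*entry).key) < 0
}

// ascendRange visits items in [start, end) by starting Ascend at start
// and returning false from the callback once end is reached.
func ascendRange(tr *btree.BTree, start, end []byte, fn func(e *entry) bool) {
	tr.Ascend(&entry{key: start}, func(i interface{}) bool {
		e := i.(*entry)
		if end != nil && bytes.Compare(e.key, end) >= 0 {
			return false // past the exclusive upper bound, stop
		}
		return fn(e)
	})
}

func main() {
	tr := btree.New(less)
	for _, k := range []string{"a", "b", "c", "d", "e"} {
		tr.Set(&entry{key: []byte(k), value: []byte("v-" + k)})
	}
	ascendRange(tr, []byte("b"), []byte("e"), func(e *entry) bool {
		fmt.Printf("%s => %s\n", e.key, e.value) // prints b, c, d
		return true
	})
}
```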
