EPIC: Storage #12986

Closed
9 of 22 tasks
tac0turtle opened this issue Aug 22, 2022 · 17 comments
@tac0turtle
Member

tac0turtle commented Aug 22, 2022

Summary

Storage is where the cosmos-sdk spends the most execution time. The original store package was written many years ago and left to ferment. As the needs of blockchains expanded and we gained better insight into how applications built on the cosmos-sdk read and write data, we have learned how to design the storage layer better.

Problem Definition

The cosmos-sdk spends an immense amount of time at the storage layer, and this causes many bottlenecks.

Proposal

We need to refactor many parts of the storage layer so that users of the cosmos-sdk can get the highest performance for their applications.

Things we can do today:

  • Refactor the cache KV store
  • Refactor the multistore
  • Write benchmarks for many things in the store package (see the sketch below)
  • Profile the current store and bring performance improvements to users
  • Refactor the key structure in IAVL to take advantage of data locality
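
A minimal sketch of the kind of Go micro-benchmark the benchmarking item could start from; `fakeKV` is a hypothetical stand-in, not the actual store API:

```go
package store_test

import (
	"encoding/binary"
	"testing"
)

// fakeKV is a minimal in-memory KV store standing in for the real backend.
type fakeKV struct{ m map[string][]byte }

func (s *fakeKV) Set(k, v []byte) { s.m[string(k)] = v }

func (s *fakeKV) Get(k []byte) []byte { return s.m[string(k)] }

// BenchmarkSetGet measures a simple write-then-read cycle; real benchmarks
// would target cachekv, the multistore, and iavl separately.
func BenchmarkSetGet(b *testing.B) {
	s := &fakeKV{m: make(map[string][]byte)}
	key := make([]byte, 8)
	val := []byte("value")
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		binary.BigEndian.PutUint64(key, uint64(i))
		s.Set(key, val)
		if s.Get(key) == nil {
			b.Fatal("missing key")
		}
	}
}
```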

Things to do in the future:

  • Refactor the design of storage to do something along the lines of ADR-40
  • Use a different tree structure
  • Use a different database

There was previous work on a new storage layer (store/v2 and ADR-40); that work and design are currently under review by the core sdk team.

Work Breakdown

Phase 1

  • Write a specification of the store package and how it works with the current tree (iavl)
    • cachekv
    • multistore
  • Spec: Produce a spec for store v2 #14667
  • Refactor store package (API cleaning and improvement)
    • cachekv store docs and refactor to provide strong guarantees and more performance
      • It currently mixes too many responsibilities and handles them poorly, especially around iterators
      • A ground-up rewrite, together with a written statement of the guarantees we want, will save a great deal of headache (see the sketch at the end of this phase)
      • Make the expectations around parallel iterators explicit
    • Store: Write regression tests & more benchmarks/tests #13224

ref: #7101, #6687
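
To make the cachekv iterator guarantees concrete, here is a rough sketch of the merge an iterator over a write cache layered on a parent store has to perform; the `Store` type and `sortedKeys` helper are illustrative assumptions, not the current implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// Store layers a write cache over a parent store; a nil cache value marks a
// delete. Plain maps stand in for the real KVStore interfaces.
type Store struct {
	parent map[string][]byte
	cache  map[string][]byte
}

// sortedKeys returns every key an iterator must yield, in ascending order,
// applying cached writes and deletes on top of the parent's contents.
func (s *Store) sortedKeys() []string {
	keys := make([]string, 0, len(s.parent)+len(s.cache))
	for k := range s.parent {
		if v, dirty := s.cache[k]; dirty && v == nil {
			continue // deleted in the cache layer
		}
		keys = append(keys, k)
	}
	for k, v := range s.cache {
		if _, inParent := s.parent[k]; v != nil && !inParent {
			keys = append(keys, k) // new key, present only in the cache
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	s := &Store{
		parent: map[string][]byte{"a": {1}, "b": {2}},
		cache:  map[string][]byte{"b": nil, "c": {3}}, // delete b, add c
	}
	fmt.Println(s.sortedKeys()) // [a c]
}
```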

Phase 2

  • POC to prove the iavl key format change yields higher performance
  • Refactor the iavl key layout to exploit data locality
  • Tests and benchmarks
  • Migration for current state

Phase 3

  • Evaluate different data structures
  • Evaluate the ADR-040 design and implementation, and how to progress toward separating state storage (SS) and state commitment (SC)
  • .... TBD
@alexanderbez alexanderbez self-assigned this Aug 22, 2022
@alexanderbez
Contributor

Thanks for writing this up @marbar3778! Seems like a pretty good summary to me. Just to highlight and reiterate, the things that I think are paramount:

  • Single logical DB utilization for atomic commits, giving optionality for separate logical DBs when desired by a module (see the sketch below)
  • Refactor store APIs and how stores are registered

One thing I'm still hazy on is IAVL vs some future better accumulator tree structure (e.g. JMT). How much time do we want to spend on IAVL and say refactoring the key layout vs just switching to something better like the JMT?
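
On the single-logical-DB bullet, a rough sketch of what it buys: module stores namespaced by key prefix inside one physical DB, so one batch commit covers all of them atomically. The `batch` type here is a hypothetical stand-in for any backend write batch, not a concrete SDK API:

```go
package main

import "fmt"

// batch is a hypothetical stand-in for a backend write batch, not a
// concrete SDK type.
type batch struct{ writes map[string][]byte }

// set namespaces each module by a key prefix, so every module store lives
// in the same physical DB.
func (b *batch) set(module, key string, value []byte) {
	b.writes[module+"/"+key] = value
}

// commit flushes everything in one shot; with a real backend this is a
// single atomic batch write covering all module stores.
func (b *batch) commit() {
	for k, v := range b.writes {
		fmt.Printf("write %s = %q\n", k, v)
	}
}

func main() {
	b := &batch{writes: make(map[string][]byte)}
	b.set("bank", "balance/alice", []byte("100"))
	b.set("staking", "validator/val1", []byte("bonded"))
	b.commit() // both modules' writes land together or not at all
}
```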

@tac0turtle
Member Author

One thing I'm still hazy on is IAVL vs some future better accumulator tree structure (e.g. JMT). How much time do we want to spend on IAVL and say refactoring the key layout vs just switching to something better like the JMT?

Talking with @ValarDragon, he believes the change would take a week or less of work after a repo cleanup is upstreamed to our version. The JMT work would take much longer. This could be an easy win until we land the JMT or something similar.

Would you want to dive into the key format change?

@alexanderbez
Contributor

alexanderbez commented Aug 25, 2022

Awesome!

Would you want to dive into the key format change?

Sure! From what I've read in the JMT paper, we can adopt a similar tactic, i.e. keys take the form version || nibble path (e.g. 45 || 0010101), where || denotes concatenation. I'm not sure IAVL uses nibble paths for keys, but it is a binary tree, so we can use a similar/identical format. Correct me if I'm wrong @ValarDragon.
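
For illustration, a minimal sketch of such a version-prefixed key layout; the `nodeKey` helper and exact encoding are assumptions, not the actual IAVL or JMT format:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// nodeKey lays out a big-endian version followed by the node's path bytes
// (version || path), so nodes written at the same version sit next to each
// other on disk. The widths and path encoding here are assumptions.
func nodeKey(version uint64, path []byte) []byte {
	key := make([]byte, 8+len(path))
	binary.BigEndian.PutUint64(key[:8], version)
	copy(key[8:], path)
	return key
}

func main() {
	// e.g. version 45 with a short binary path, as in the example above
	fmt.Printf("%x\n", nodeKey(45, []byte{0b0010101}))
}
```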

@adu-crypto
Contributor

Are we planning on switching to the JMT and refactoring the whole storage layer?

@adu-crypto
Contributor

For the current store v2 work, besides the SMT itself compared with the IAVL tree, I personally think the new store v2 comes with two major differences from store v1:

  • store v2 uses the backing db transaction to commit and control versions, which means every time the store is committed, the db tx is discarded and the root store needs to be reloaded before the next read/write. Store v1 manages versions in the tree itself and doesn't need to discard the tx and reload the store after a commit.
  • store v1's iavlStore caches store nodes in memory; I don't think store v2 caches the smt/databucket/indexbucket in memory (a rough sketch of such a node cache follows)
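
For reference, the node cache meant in the second point is essentially an LRU keyed by node key; a minimal sketch under that assumption (the real iavl cache and node types differ):

```go
package main

import (
	"container/list"
	"fmt"
)

// nodeCache is a minimal LRU for tree nodes; it only illustrates the idea.
type nodeCache struct {
	max   int
	order *list.List               // front = most recently used
	items map[string]*list.Element // node key -> element holding the entry
}

type entry struct {
	key  string
	node []byte
}

func newNodeCache(max int) *nodeCache {
	return &nodeCache{max: max, order: list.New(), items: map[string]*list.Element{}}
}

// get returns a cached node and refreshes its recency.
func (c *nodeCache) get(key string) ([]byte, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).node, true
	}
	return nil, false
}

// put inserts or refreshes a node, evicting the least recently used one
// once the cache is full.
func (c *nodeCache) put(key string, node []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).node = node
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, node})
	if c.order.Len() > c.max {
		el := c.order.Back()
		c.order.Remove(el)
		delete(c.items, el.Value.(*entry).key)
	}
}

func main() {
	c := newNodeCache(2)
	c.put("nodeA", []byte{1})
	c.put("nodeB", []byte{2})
	c.get("nodeA")            // refresh nodeA
	c.put("nodeC", []byte{3}) // evicts nodeB
	_, ok := c.get("nodeB")
	fmt.Println("nodeB cached:", ok) // false
}
```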

@alexanderbez
Contributor

Are we planning on switching to the JMT and refactoring the whole storage layer?

The strategy will be as follows (in order):

  • Refactor store package (API cleaning and improvement; I'd like to use a single logical DB)
  • Relatively low-lift/small refactor & improvement of IAVL
  • Eventual implementation of JMT (~Q1 2023)

@alexanderbez
Contributor

As for the current v2, I think there's a chance it is actually discarded; I'll let @marbar3778 chime in on that.

@tac0turtle
Member Author

V2 is still being evaluated. Much of the team has been in and out the past couple of weeks, so we have been unable to review it in a timely manner. There are some concerns with the current design, and the performance of v2 as it currently stands is not drastically better than v1 with iavl.

With a week or two of work on iavl and another week or two in the store package, there is a high probability that v1+iavl will become more performant than v2+smt. For now we are opting to move forward with v1+iavl until we fully evaluate the design decisions of v2+smt.

@tac0turtle
Member Author

tac0turtle commented Aug 29, 2022

The strategy will be as follows (in order):

  • Refactor store package (API cleaning and improvement; I'd like to use a single logical DB)
  • Relatively low-lift/small refactor & improvement of IAVL
  • Eventual implementation of JMT (~Q1 2023)

Let's start with the first two and add specs/docs for the store package. Do you want to lead it @alexanderbez?

@alexanderbez
Contributor

The strategy will be as follows (in order):

  • Refactor store package (API cleaning and improvement; I'd like to use a single logical DB)
  • Relatively low-lift/small refactor & improvement of IAVL
  • Eventual implementation of JMT (~Q1 2023)

Let's start with the first two and add specs/docs for the store package. Do you want to lead it @alexanderbez?

yes ser

@yihuang
Collaborator

yihuang commented Sep 14, 2022

FYI, the v2 store looks not bad in terms of db size reduction: evmos/ethermint#1304 (comment). But this benchmark didn't account for the cost of snapshot versioning.

@yihuang
Collaborator

yihuang commented Sep 14, 2022

The low-level db snapshots/checkpoints hard-link the db files; intuitively that seems pretty costly in terms of db size (needs some tests to confirm).

https://hackmd.io/@2O2cXDfdQpijemGc_vlHCA/SkXxuTAli
I wrote a proposal to store historical versions explicitly, in a similar way to erigon, so as not to rely on low-level db checkpoints.
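
Roughly, the idea: record each write under key || version instead of snapshotting the whole DB, then a read at height v seeks the latest entry at or below v. A simplified sketch; the `historyStore` layout and helpers here are illustrative, not the exact design in the proposal:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// historyStore records every write under key || version. A real backend
// would be a sorted KV store (leveldb, mdbx, ...); a map keeps this short.
type historyStore struct {
	rows map[string][]byte
}

// histKey appends a big-endian version to the logical key so all versions
// of one key sort together.
func histKey(key string, version uint64) string {
	var v [8]byte
	binary.BigEndian.PutUint64(v[:], version)
	return key + "/" + string(v[:])
}

// set records the value of key as of version.
func (h *historyStore) set(key string, version uint64, value []byte) {
	h.rows[histKey(key, version)] = value
}

// getAt returns the latest value written at or before version. A linear
// scan keeps the sketch short; a sorted store would do a reverse seek.
func (h *historyStore) getAt(key string, version uint64) ([]byte, bool) {
	var best uint64
	var out []byte
	found := false
	for k, v := range h.rows {
		if !strings.HasPrefix(k, key+"/") {
			continue
		}
		ver := binary.BigEndian.Uint64([]byte(k[len(key)+1:]))
		if ver <= version && (!found || ver > best) {
			best, out, found = ver, v, true
		}
	}
	return out, found
}

func main() {
	h := &historyStore{rows: map[string][]byte{}}
	h.set("balance/alice", 10, []byte("100"))
	h.set("balance/alice", 20, []byte("90"))
	v, _ := h.getAt("balance/alice", 15) // sees the version-10 write
	fmt.Println(string(v))               // prints "100"
}
```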

@adu-crypto
Contributor

FYI, the v2 store looks not bad in terms of db size reduction: evmos/ethermint#1304 (comment). But this benchmark didn't account for the cost of snapshot versioning.

I agree that if we really want to make the blockchain network more usable, we have to take the db size of archive/full nodes into consideration.

@tac0turtle
Member Author

Seems like the size comparison is smt vs iavl. That smt has a smaller footprint is known; that is not the worry. The worry is that the semantics of iavl and store v1 are not well understood, and a new implementation was done without that understanding.

Did the benchmark account for the historical versions on disk, or only what is in the tree? The data in the smt is not historical but only current, so it makes sense that it's smaller.

@JayT106
Contributor

JayT106 commented Sep 15, 2022

The low-level db snapshots/checkpoints hard-link the db files; intuitively that seems pretty costly in terms of db size (needs some tests to confirm).

Not sure why the checkpoints consume a lot of disk space, but it definitely needs to be checked out:
#12251 (comment)

@yihuang
Collaborator

yihuang commented Oct 6, 2022

An alternative db design for historical states; moving the discussion here:

  • reduce archive node db size dramatically (10x)
  • not consensus breaking

@tac0turtle
Member Author

Closing this epic for now. @alexanderbez, when you have a chance could you create a new one for the upcoming storage work?
