
feat(ADR): ADR 062: Collections -- a new storage layer for cosmos-sdk modules. #14107

Merged (11 commits, Dec 6, 2022)
1 change: 1 addition & 0 deletions docs/architecture/README.md
@@ -82,6 +82,7 @@ When writing ADRs, follow the same best practices for writing RFCs. When writing
* [ADR 046: Module Params](./adr-046-module-params.md)
* [ADR 057: App Wiring](./adr-057-app-wiring.md)
* [ADR 059: Test Scopes](./adr-059-test-scopes.md)
* [ADR 062: Collections State Layer](./adr-062-collections-state-layer.md)

### Draft

117 changes: 117 additions & 0 deletions docs/architecture/adr-062-collections-state-layer.md
@@ -0,0 +1,117 @@
# ADR 062: Collections, a simplified storage layer for cosmos-sdk modules.

## Changelog

* 30/11/2022: PROPOSED
Contributor (review comment), suggested change:

* 30/11/2022: PROPOSED → * 30/11/2022: Initial draft (@testinginprod)

Member (review comment):

Oops... missed this before hitting merge 🤦‍♂️


## Status

PROPOSED - Implemented

## Abstract

We propose a simplified module storage layer which leverages golang generics to allow module developers to handle module
storage in a simple and straightforward manner, whilst offering safety, extensibility and standardisation.

## Context

Module developers are forced to manually implement storage functionality in their modules. This functionality includes,
but is not limited to:

- Defining key to bytes formats.
- Defining value to bytes formats.
- Defining secondary indexes.
- Defining query methods to expose storage to the outside.
- Defining local methods to deal with storage writing.
- Dealing with genesis imports and exports.
- Writing tests for all the above.


This causes a number of problems:
- It blocks developers from focusing on the most important part: writing business logic.
- Key to bytes formats are complex and their definition is error-prone, for example:
  - how do I format time to bytes in such a way that the bytes sort correctly? (see the sketch after this list)
  - how do I ensure I don't have namespace collisions when dealing with secondary indexes?
- The lack of standardisation makes life hard for clients, and the problem is exacerbated when it comes to providing proofs for objects present in state. Clients are forced to maintain a list of object paths to gather proofs.
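
To make the key-format pitfall concrete, here is a small, self-contained Go snippet (illustrative only, not part of the proposal): a naive decimal encoding of a block height breaks iteration order, while a fixed-width big-endian encoding preserves it. The same idea applies to `time.Time`, e.g. by encoding the Unix timestamp as big-endian bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// naiveKey encodes a height as a decimal string: as bytes, "9" > "10",
// so store iteration order no longer matches numeric order.
func naiveKey(height uint64) []byte {
	return []byte(fmt.Sprintf("%d", height))
}

// sortableKey encodes the height as fixed-width big-endian bytes,
// which preserves numeric ordering under lexicographic comparison.
func sortableKey(height uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, height)
	return b
}

func main() {
	fmt.Println(bytes.Compare(naiveKey(9), naiveKey(10)) < 0)       // false: wrong order
	fmt.Println(bytes.Compare(sortableKey(9), sortableKey(10)) < 0) // true: correct order
}
```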

### Current Solution: ORM

The current SDK proposed solution to this problem is [ORM](https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-055-orm.md).
Whilst ORM offers a lot of good functionality aimed at solving these specific problems, it has some downsides:
- It requires migrations.
- It uses the newest protobuf golang API, whilst the SDK still mainly uses gogoproto.
- Integrating ORM into a module would require the developer to deal with two different golang frameworks (golang protobuf + gogoproto) representing the same API objects.
- It has a high learning curve, even for simple storage layers, as it requires developers to have knowledge of protobuf options, custom cosmos-sdk storage extensions, and tooling downloads; after that, they still need to learn the code-generated API.
Comment on lines +39 to +44

Contributor (review comment):

I agree there are some downsides at the moment, but the downsides related to gogoproto seem more along the lines of an issue with the SDK needing to be updated rather than a downside of the orm module itself.

Also, developers should already have a working knowledge of protobuf through module development, and learning the code-generated API was not difficult for me, so I don't really see the orm as having a steep learning curve (although maybe a bit steeper when compared to the current proposal for the collections layer).

I think the advantages of the orm are missing from this section and listing them might help a user understand when they would want to use the orm versus when they would want to use the collections layer. The advantages that have already been discussed in this pull request might be a good starting point.

We are using the orm in regen ledger and I don't see us switching as we have had no issues with the orm and we are leveraging the more complex functionality it provides to meet our needs, which includes multiple indexes, unique constraints on multiple indexes, and multi-column indexes.

Maybe some of this is planned for the collections layer but I would still prefer to define the state api using protobuf and to see protobuf more widely adopted for module development, "write once use forever".


### CosmWasm Solution: cw-storage-plus

The collections API takes inspiration from [cw-storage-plus](https://docs.cosmwasm.com/docs/1.0/smart-contracts/state/cw-plus/),
which has proven to be a powerful tool for dealing with storage in CosmWasm contracts.
It is simple, requires no extra tooling, and makes it easy to deal with complex storage structures (indexes, snapshots, etc.).
The API is straightforward and explicit.

## Decision

We propose to port the `collections` API, whose implementation lives in [NibiruChain/collections](https://github.com/NibiruChain/collections) to cosmos-sdk.

Collections implements five different storage handler types:

- `Map`: which deals with simple `key => object` mappings.
- `KeySet`: which acts as a `Set` and only retains keys, with no objects (use case: allow-lists).
- `Item`: which always contains exactly one object (use case: params).
- `Sequence`: which implements a simple, always-increasing number (use case: nonces).
- `IndexedMap`: which builds on top of `Map` and `KeySet` and allows creating relationships between objects and objects' secondary keys.

All the collection APIs build on top of the simple `Map` type.

Collections is fully generic, meaning that anything can be used as `Key` and `Value`: it can be a protobuf object, but it does not have to be.
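
As an illustration of what module code could look like, here is a hedged sketch of a keeper built on these types. The import path, constructor names and encoder names (`NewMap`, `AccAddressKeyEncoder`, `IntValueEncoder`, ...) are placeholders modelled on NibiruChain/collections and may differ from the final SDK API.

```go
package keeper

import (
	// Hypothetical import path for the ported package.
	"github.com/cosmos/cosmos-sdk/collections"

	storetypes "github.com/cosmos/cosmos-sdk/store/types"
	sdk "github.com/cosmos/cosmos-sdk/types"
)

// Keeper sketches how the proposed types could replace hand-rolled store
// accessors. Every name below is illustrative, not the final API.
type Keeper struct {
	Counter   collections.Sequence                     // always-increasing number (nonces/IDs)
	Params    collections.Item[sdk.Dec]                // exactly one object (params-like data)
	Balances  collections.Map[sdk.AccAddress, sdk.Int] // key => value mapping
	AllowList collections.KeySet[sdk.AccAddress]       // keys only, no values (allow-list)
}

func NewKeeper(sk storetypes.StoreKey) Keeper {
	return Keeper{
		// Each collection gets its own namespace prefix plus key/value encoders;
		// together they fully determine the byte(key) => byte(value) layout.
		Counter:   collections.NewSequence(sk, 0),
		Params:    collections.NewItem(sk, 1, collections.DecValueEncoder),
		Balances:  collections.NewMap(sk, 2, collections.AccAddressKeyEncoder, collections.IntValueEncoder),
		AllowList: collections.NewKeySet(sk, 3, collections.AccAddressKeyEncoder),
	}
}

// SetBalance shows the intended ergonomics: no manual key construction,
// prefixing or marshalling in module code.
func (k Keeper) SetBalance(ctx sdk.Context, addr sdk.AccAddress, amount sdk.Int) {
	k.Balances.Insert(ctx, addr, amount) // method name is a placeholder (could be Set)
}
```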

Collection types delegate the serialisation of keys and values to secondary collections API components called `ValueEncoders` and `KeyEncoders`.

`ValueEncoders` take care of converting a value to bytes (relevant only for `Map`), and offer a plug-and-play layer which allows us to change how we encode objects,
which is relevant for swapping serialisation frameworks and enhancing performance.
`Collections` already ships with default `ValueEncoders`, specifically for protobuf objects and special SDK types (sdk.Int, sdk.Dec).
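
To illustrate the plug-and-play aspect, here is a minimal, self-contained sketch of what a value encoder could look like (the interface and names are illustrative, not the actual collections API): swapping the serialisation framework only means constructing the `Map` with a different encoder, business logic stays untouched.

```go
package valuecodec

import "encoding/json"

// ValueEncoder is an illustrative sketch of the concept: it is the only
// component that knows how to turn a value into bytes and back.
type ValueEncoder[V any] interface {
	Encode(value V) []byte
	Decode(b []byte) V
}

// jsonValue is one possible implementation; a gogoproto- or protobuf-based
// encoder would implement the same interface, letting a Map swap
// serialisation frameworks without touching module code.
type jsonValue[V any] struct{}

func (jsonValue[V]) Encode(value V) []byte {
	b, err := json.Marshal(value)
	if err != nil {
		panic(err)
	}
	return b
}

func (jsonValue[V]) Decode(b []byte) V {
	var v V
	if err := json.Unmarshal(b, &v); err != nil {
		panic(err)
	}
	return v
}
```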

`KeyEncoders` take care of converting keys to bytes; `collections` already ships with default `KeyEncoders` for some primitive golang types
Contributor (review comment):

One downside in my opinion of KeyEncoders being an open interface is that the number of encoding formats is unbounded.

For each new custom KeyEncoder that a chain decides to create, the same code would need to be rewritten in JS, to be used for light client proofs.

In the proto ORM, the number of encoders was standardized, and all info to understand key format was in proto files (queryable by clients), so it was "write once use forever" for clients. In this ADR, afaiu, reflection only returns a string of the key encoder type, so the client would need a custom registry of key encoder type to implementation, and possibly write new implementations for custom chains.

Hopefully in practice, the community would converge to a set of 10-12 key encoders. (same applies for value encoders)

Member (review comment):

Yes, that is a downside. Hopefully we can still have a standardized set of key + value encoders to simplify client integration.

Contributor Author (review comment):

Yes, this is an unfortunate downside, but it is a desired property in order to incentivise developers to migrate in a way that is backwards compatible with respect to their state.

In the future nothing forbids us from adding a doNotImplement method on KeyCodec; I even wonder if this could be done gradually, type by type, for example starting from uint64, string...

(uint64, string, time.Time, ...) and some widely used sdk types (sdk.Acc/Val/ConsAddress, sdk.Int/Dec, ...).
These default implementations also offer safety around proper lexicographic ordering and namespace collisions.
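
The ordering and collision guarantees are the subtle part these default implementations hide from module developers. Below is a self-contained sketch (illustrative names, not the actual API) of a `uint64` key encoder that keeps byte order aligned with numeric order, and of a composite key whose string component is length-prefixed so that, for example, a prefix scan over owner `"a"` does not also pick up keys belonging to owner `"ab"`.

```go
package keycodec

import "encoding/binary"

// KeyEncoder is an illustrative sketch of the concept: it turns a typed key
// into bytes suitable for ordered iteration in a KV store.
type KeyEncoder[K any] interface {
	Encode(key K) []byte
}

// Uint64Key encodes as fixed-width big-endian bytes, so numeric order and
// byte order coincide.
type Uint64Key struct{}

func (Uint64Key) Encode(key uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, key)
	return b
}

// PairKey is a composite key made of a string and a uint64, e.g. a secondary
// index "owner => object id".
type PairKey struct {
	Owner string
	ID    uint64
}

// PairKeyEncoder length-prefixes the string component. Without the prefix,
// iterating over all entries of owner "a" would also return entries of owner
// "ab", since both raw keys start with the byte 'a'; the prefix keeps each
// owner's range unambiguous and contiguous.
type PairKeyEncoder struct{}

func (PairKeyEncoder) Encode(key PairKey) []byte {
	owner := []byte(key.Owner)
	b := make([]byte, 0, 1+len(owner)+8)
	b = append(b, byte(len(owner))) // assumes owner < 256 bytes, fine for a sketch
	b = append(b, owner...)
	b = binary.BigEndian.AppendUint64(b, key.ID)
	return b
}

// Both implementations satisfy the sketched interface.
var (
	_ KeyEncoder[uint64]  = Uint64Key{}
	_ KeyEncoder[PairKey] = PairKeyEncoder{}
)
```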

Examples of the collections API can be found here:
- introduction: https://github.com/NibiruChain/collections/tree/main/examples
- usage in nibiru: [x/oracle](https://github.com/NibiruChain/nibiru/blob/master/x/oracle/keeper/keeper.go#L32), [x/perp](https://github.com/NibiruChain/nibiru/blob/master/x/perp/keeper/keeper.go#L31)
- cosmos-sdk's x/staking migrated: https://github.com/testinginprod/cosmos-sdk/pull/22


## Consequences

### Backwards Compatibility

The design of `ValueEncoders` and `KeyEncoders` allows modules to retain the same `byte(key) => byte(value)` mappings, making
the upgrade to the new storage layer non-state-breaking.
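
A sketch of why this holds, using a hypothetical legacy key layout (one prefix byte followed by a big-endian id; not taken from any real module): as long as the chosen namespace prefix and encoders reproduce the exact bytes a module writes today, moving to collections does not change what is stored.

```go
package compat

import "encoding/binary"

// Legacy, hand-rolled key construction as modules do today
// (hypothetical layout: one prefix byte followed by a big-endian id).
var legacyPrefix = []byte{0x01}

func legacyKey(id uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, id)
	return append(legacyPrefix, b...)
}

// A collections-style key encoder chosen to reproduce the same bytes.
// As long as namespace prefix + KeyEncoder output match the legacy layout
// (and the ValueEncoder keeps marshalling with the same codec, e.g.
// gogoproto), the migration is not state-breaking.
type uint64Key struct{}

func (uint64Key) Encode(id uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, id)
	return b
}

func collectionsKey(id uint64) []byte {
	return append([]byte{0x01} /* same namespace prefix */, uint64Key{}.Encode(id)...)
}

// bytes.Equal(legacyKey(42), collectionsKey(42)) == true, so existing state
// can be read back unchanged after the migration.
```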


### Positive

- This ADR aims at removing code from the SDK rather than adding it. Migrating just `x/staking` to collections would yield a net decrease in LOC (even considering the addition of collections itself).
- Simplifies and standardises storage layers across modules in the SDK.
- Does not require dealing with protobuf.
- It's pure golang code.
- Generalisation over `KeyEncoders` and `ValueEncoders` allows us not to tie ourselves to any specific data serialisation framework.
- `KeyEncoders` and `ValueEncoders` can be extended to provide schema reflection.

### Negative

- Golang generics are not as battle-tested as other Golang features, despite being used in production right now.
- Collection type instantiation needs to be improved.

### Neutral

{neutral consequences}

## Further Discussions

- Automatic genesis import/export (not implemented because of API breakage)
- Schema reflection


## References