
feat(collections): Initialise core (Prefix, KeyEncoder, ValueEncoder, Map) #14134

Merged · 42 commits · Dec 8, 2022

Conversation

Contributor

@testinginprod testinginprod commented Dec 2, 2022

Description

Optimistically porting the core API of collections, pending ADR approval, with some slight modifications to address initial reviews.

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@testinginprod
Contributor Author

Iteration is not ported into this PR in order to have a conversation around API friendliness.

collections/collections.go — resolved (outdated) review threads

// ValueEncoder defines a generic interface which is implemented
// by types that are capable of encoding and decoding collection values.
type ValueEncoder[T any] interface {


Encoding should in general take an io.Writer which is written-to. This provides maximum flexibility and performance. Callers can pass an e.g. bytes.Buffer if they want to encode to bytes in memory. Similarly decoding should in general take an io.Reader which is read-from. Same rationale. Callers can buffer some bytes into an e.g. bytes.Buffer or pass a bytes.NewReader(b) or etc. if they want to operate on bytes in memory.

Member

I think Reader/Writer is a reasonable choice for ValueEncoder but it is likely not a big deal either way as most marshaling methods only use []byte

collections/keys.go, collections/map.go (×2), collections/item.go — resolved (outdated) review threads
@testinginprod
Contributor Author

Changed the API to return an error in case of serialisation or deserialisation failures. Feel free to comment on the latest API changes.


@peterbourgon peterbourgon left a comment


This PR defines encoding and decoding methods on individual types, e.g. Uint64Key. The idiomatic solution to this problem is to define a (non-generic) interface which those types may implement.

type KeyEncoder interface { EncodeKey(io.Writer) (int, error) }
type KeyDecoder interface { DecodeKey(io.Reader) (int, error) }

type Uint64Key uint64

func (k *Uint64Key) EncodeKey(w io.Writer) (int, error) { /* encode k to bytes, write bytes to w     */ }
func (k *Uint64Key) DecodeKey(r io.Reader) (int, error) { /* read bytes from r, decode and save to k */ }

The {Key,Value}{Encoder,Decoder} interfaces as defined in this PR are generic over the any type, which would only make sense if there were Encoder and Decoder types which were also generic over a wide set of types. For example, if there were a JSONKeyEncoder type, with an EncodeKey method that took a key T and could deal with a bunch of possible T values.

But that doesn't seem to be the case here. Can you document the rationale for this approach?

// PutKey writes the key bytes into the buffer. Returns the number of
// bytes written.
// The bytes written must match the value returned by the Size method.
PutKey(buffer []byte, key T) (int, error)

@peterbourgon peterbourgon Dec 5, 2022


Callers are free to pass buffers of any size. If that size is less than the assumed size of the key type, implementations of this method will panic. That's not really an acceptable API, right?

What motivates the method signature here? Why do Put and Read take byte slices directly instead of the more common io.Reader/Writer idiom?

Member

@aaronc aaronc Dec 6, 2022


io.Reader/Writer makes sense too, but I think this API can work too. The documentation just needs to be a bit clearer and state that the buffer must be at least as large as the value returned by Size.


@peterbourgon peterbourgon Dec 6, 2022


An interface with methods that accept a byte slice which is expected to be of a certain minimum size defined by the return value of a different method of that interface is, at best, highly nonidiomatic. For those methods which accept byte slices to panic when that expectation is not met is basically non-viable. Many issues here.

  • First of all, because this isn't a reasonable invariant in the first place; there is no idiomatic Go API which takes a byte slice that is assumed to be of a dynamic size
  • Second, because this invariant is dynamic based on implementation, which means failures are runtime errors rather than programmer errors, and (basically) not the kind of thing that should induce panics in the first place
  • Third, because these methods return errors, and so any violation of this expectation is far more effectively communicated to callers that way
  • Fourth, because the interface documents nothing re: concurrency, meaning there is no guarantee that the Size returned by an implementation of this interface at time t1 will be the same value that's expected by the e.g. PutKey method called at some future time t2, which makes it unsound at the level of the language memory model
  • (several more things I elide for brevity)

Happy to explain any of these points in more detail, and/or to help improve the PR in any of these dimensions.

Contributor Author

@testinginprod testinginprod Dec 6, 2022


@peterbourgon Go's binary encoding API follows the same semantics. Only difference being:

  • For certain types, you have constant size (int64, uint64, ...) [perfectly inline with go's binary encoding API]
  • For other types, you have variable size ([]byte, string).

which means failures are runtime errors rather than programmer errors

A failure of slice out of bounds kind indicates a bad interface implementation, should I panic or not in this case? We can also error for instance by checking if the length matches the expected size.

There's also a second variant to this API which works with append.

PutKey would become: AppendKey(buffer []byte, key K) (newBuffer []byte, err error), would this satisfy your expectations?

I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external. The KeyEncoder implementation is consumed here: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R100.


@peterbourgon peterbourgon Dec 8, 2022


@testinginprod

Go's binary encoding API follows the same semantics

encoding/binary offers two "levels" of API. The lower-level API you're pointing out here operates on byte slices directly and panics when invariants aren't met, e.g. when the slices aren't big enough. The higher-level API operates on io.Readers and io.Writers and doesn't panic. The lower-level API is provided for niche use cases and isn't meant to be general-purpose. The higher-level API is the one meant for general use.

A failure of slice out of bounds kind indicates a bad interface implementation,

I'm assuming that "a failure of slice out of bounds" means "providing a slice to PutKey which is smaller than the value returned by Size", is that right? Assuming so, this has nothing to do with the type that implements the interface, right? The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them. If an implementation defines Size to return 8, and PutKey is called with a buffer of size 4, this isn't anything to do with the interface implementation, it's a function of the caller, and definitely isn't something that can be asserted as a programmer error.

should I panic or not in this case? We can also error for instance by checking if the length matches the expected size.

Calling PutKey with a buffer slice that's too-small is a bug, but it's not a programmer error, and should definitely not panic. It should return an error. Better yet, the API shouldn't use byte slices at all, it should operate on io.Readers and io.Writers. They are in fact faster than operating on byte slices in memory, in general; anyone claiming otherwise without concrete performance data is, bluntly, lying.

I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external.

Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍

Contributor Author

@testinginprod testinginprod Dec 8, 2022


The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them.

This is true, but the consumer of the collections KeyEncoder API is the collections.Map itself, in its encodeKey function: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R125-R129

If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics with a slice-out-of-bounds error, then that is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error from its Encode function, which can include the buffer-length check, if desired.

They are in fact faster than operating on byte slices in memory

TBH, I'd like this to be proven, because I am not seeing how an io.Writer/Reader can give me any performance benefits, in KeyEncoders specifically, when I know that:

  • Key byte size is tiny in 90% of cases: 8-20-40-100 bytes, and I am exaggerating.
  • This key ends up in KVStore which does not accept io.Reader/Writer but works on []byte.

For the collections.Map I'm not seeing how an abstraction over []byte can be more performant than the raw []byte itself, especially considering we don't have memory constraints in this context.

Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍

This is unfortunately impossible because, for backwards compatibility reasons, I need to allow developers to implement their own KeyEncoders in order to retain their serialisation semantics around, let's say, string rather than using what we provide as a StringKeyEncoder.


If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics with a slice-out-of-bounds error, then that is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error from its Encode function, which can include the buffer-length check, if desired.

KeyEncoder is an interface which types can implement. And StringKeyEncoder is a type that would implement KeyEncoder, it isn't itself a value that callers would call methods on. So you wouldn't ever call StringKeyEncoder.Size, you'd always construct an enc = NewStringKeyEncoder, and then call enc.Size.

But wait. You define Size(value T) int, and then you say that Size("hello") would return 5, implying e.g. Size("xxx") would return 3. That means Size isn't a function of the type, it's a function of the value. That would mean Encode(buf []byte, val T) would demand buf sizes which were not established by the type T itself, but rather by the specific val of type T. How can this possibly work? What do you expect your callers to do?

enc := NewKeyEncoder[keytype, valtype](...)
key := keytype(...)
keysz := enc.Size(key)
buf := make([]byte, keysz)
err := enc.Encode(buf, key)
// use buf

This is clearly non-viable, right?

@aaronc
Member

aaronc commented Dec 6, 2022

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal. The generics propagate the type of the collection (Map, etc.) to the encoder. We do not want to require some set of wrapper types for each encoding as this requires using those wrapper types in the public API of collections. That pushes unnecessary complexity onto the user. I encourage you to spend more time studying the API of collections as is. I think it does make sense and that the Encoder/Decoder use of generics is correct. I do agree that more documentation on rationale would be good, but also keep in mind that this is actually still a draft PR where API details are being worked out.

@peterbourgon

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal.

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

The generics propagate the type of the collection (Map, etc.) to the encoder. We do not want to require some set of wrapper types for each encoding as this requires using those wrapper types in the public API of collections. That pushes unnecessary complexity onto the user. I encourage you to spend more time studying the API of collections as is. I think it does make sense and that the Encoder/Decoder use of generics is correct. I do agree that more documentation on rationale would be good, but also keep in mind that this is actually still a draft PR where API details are being worked out.

How do you envision the "end state" of encoding and decoding of keys and values? What do you expect would be the concrete implementations that satisfy the e.g. KeyEncoder interface? Would there be one per concrete key type (big-endian-uint64, length-prefixed-string, etc.)? Or one per encoding form (big-endian-encoder, JSON-encoder, etc.)? Something else? This PR suggests the first option, which is fine, but not a good match for the implementation.

@testinginprod
Contributor Author

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal.

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

#14134 (comment)

Would there be one per concrete key type

Yes, I assume initially (not at cosmos-sdk level, but ecosystem level) there will be a lot of different KeyEncoders to match whatever people were implementing manually as keys. Then my assumption is that there will be a convergence towards the SDK's provided encoders for new modules, and optimistically for old modules too as state migrations are written.

@peterbourgon

Yes, I assume initially (not at comsos-sdk level, but ecosystem level) there will be a lot of different KeyEncoders to match whatever people were implementing manually as keys

KeyEncoder[T any] expresses something which models encoding formats or schemes such as JSON or XML, and not encodable key types like uint64 or string. Is this understood? If not, what can I help to clarify? This is an important point which seems to be fundamental to the PR and it's important that it's clear.

@testinginprod
Contributor Author

KeyEncoder[T any] expresses something which models encoding formats or schemes such as JSON or XML, and not encodable key types like uint64 or string. Is this understood?

This is not the point of KeyEncoder. It's not going to encode XML or JSON. It's rather going to be string => []byte(string), uint64 => big-endian, etc.

If you feel how the API is expressed leads to this then we can iterate on how to better express its meaning.

ValueEncoder expresses the relationship you mentioned.

@peterbourgon

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

#14134 (comment)

Defining an e.g. KeyEncoder type as a generic over any doesn't mean that any type is a valid key, and it doesn't allow downstream code to assert that constraint. Is that a goal or assumption of this code? (It can't be!)

collections/collections.go, collections/item.go (outdated), collections/map.go (outdated) — resolved review threads
Member

@tac0turtle tac0turtle left a comment

utACK. love the simplicity, left one question

@testinginprod testinginprod enabled auto-merge (squash) December 8, 2022 09:46
@@ -0,0 +1,53 @@
module cosmossdk.io/collections
Member

@julienrbrt julienrbrt Dec 8, 2022

A few more things need to be done when adding a new module:

  • Can you add it to dependabot
  • Can you add it in the github workflow test.yml (under submodules)
  • Can you create a sonar-project.properties file (like in math) - with project key cosmos-sdk-collections
  • Can you add a changelog under collections/CHANGELOG.md
  • Add a vanity URL (see: feat: add collections vanity url, vanity#29)

Member

Let's do this in a follow-up PR. Ideally we can scope PRs better: tooling vs logic.

Member

@julienrbrt julienrbrt Dec 8, 2022

Makes sense, except for the addition to CI. That should be done in the same PR, as right now our CI does not test it.
