feat(collections): Initialise core (Prefix, KeyEncoder, ValueEncoder, Map) #14134
Conversation
Iteration is not ported into this PR in order to have a conversation around API friendliness.
collections/collections.go (outdated):

```go
// ValueEncoder defines a generic interface which is implemented
// by types that are capable of encoding and decoding collection values.
type ValueEncoder[T any] interface {
```
Encoding should in general take an io.Writer which is written-to. This provides maximum flexibility and performance. Callers can pass an e.g. bytes.Buffer if they want to encode to bytes in memory. Similarly decoding should in general take an io.Reader which is read-from. Same rationale. Callers can buffer some bytes into an e.g. bytes.Buffer or pass a bytes.NewReader(b) or etc. if they want to operate on bytes in memory.
I think Reader/Writer is a reasonable choice for ValueEncoder, but it is likely not a big deal either way, as most marshaling methods only use []byte.

Moved the API to error in case of serialisation or deserialisation errors. Feel free to comment on the latest API changes.
This PR defines encoding and decoding methods on individual types, e.g. Uint64Key. The idiomatic solution to this problem is to define a (non-generic) interface which those types may implement.
```go
type KeyEncoder interface { EncodeKey(io.Writer) (int, error) }
type KeyDecoder interface { DecodeKey(io.Reader) (int, error) }

type Uint64Key uint64

func (k *Uint64Key) EncodeKey(w io.Writer) (int, error) { /* encode k to bytes, write bytes to w */ }
func (k *Uint64Key) DecodeKey(r io.Reader) (int, error) { /* read bytes from r, decode and save to k */ }
```
The {Key,Value}{Encoder,Decoder} interfaces as defined in this PR are generic over the any type, which would only make sense if there were Encoder and Decoder types which were also generic over a wide set of types. For example, if there were a JSONKeyEncoder type, with an EncodeKey method that took a key T and could deal with a bunch of possible T values.
But that doesn't seem to be the case here. Can you document the rationale for this approach?
collections/collections.go (outdated):

```go
// PutKey writes the key bytes into the buffer. Returns the number of
// bytes written.
// The bytes written must match the value returned by the Size method.
PutKey(buffer []byte, key T) (int, error)
```
Callers are free to pass buffers of any size. If that size is less than the assumed size of the key type, implementations of this method will panic. That's not really an acceptable API, right?
What motivates the method signature here? Why do Put and Read take byte slices directly instead of the more common io.Reader/Writer idiom?
io.Reader/Writer makes sense too, but I think this API can work as well. The documentation just needs to be a bit clearer and state that the buffer must be at least as large as the value returned by Size.
An interface with methods that accept a byte slice which is expected to be of a certain minimum size defined by the return value of a different method of that interface is, at best, highly nonidiomatic. For those methods which accept byte slices to panic when that expectation is not met is basically non-viable. Many issues here.
- First of all, because this isn't a reasonable invariant in the first place; there is no idiomatic Go API which takes a byte slice that is assumed to be of a dynamic size
- Second, because this invariant is dynamic based on implementation, which means failures are runtime errors rather than programmer errors, and (basically) not the kind of thing that should induce panics in the first place
- Third, because these methods return errors, and so any violation of this expectation is far more effectively communicated to callers that way
- Fourth, because the interface documents nothing re: concurrency, meaning there is no guarantee that the Size returned by an implementation of this interface at time t1 will be the same value that's expected by the e.g. PutKey method called at some future time t2, which makes it unsound at the level of the language memory model
- (several more things I elide for brevity)
Happy to explain any of these points in more detail, and/or to help improve the PR in any of these dimensions.
@peterbourgon Go's binary encoding API follows the same semantics. The only difference being:
- For certain types, you have a constant size (int64, uint64, ...) [perfectly in line with Go's binary encoding API].
- For other types, you have a variable size ([]byte, string).

> which means failures are runtime errors rather than programmer errors

A failure of the slice-out-of-bounds kind indicates a bad interface implementation; should I panic or not in this case? We could also return an error, for instance by checking whether the length matches the expected size.

There's also a second variant of this API which works with append. PutKey would become AppendKey(buffer []byte, key K) (newBuffer []byte, err error). Would this satisfy your expectations?
I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external. The KeyEncoder implementation is consumed here: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R100.
> Go's binary encoding API follows the same semantics

encoding/binary offers two "levels" of API. The lower-level API you're pointing out here operates on byte slices directly and panics when invariants aren't met, e.g. when the slices aren't big enough. The higher-level API operates on io.Readers and io.Writers and doesn't panic. The lower-level API is provided for niche use cases and isn't meant to be general-purpose; the higher-level API is the one meant for general use.
> A failure of slice out of bounds kind indicates a bad interface implementation,
I'm assuming that "a failure of slice out of bounds" means "providing a slice to PutKey which is smaller than the value returned by Size", is that right? Assuming so, this has nothing to do with the type that implements the interface, right? The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them. If an implementation defines Size to return 8, and PutKey is called with a buffer of size 4, this isn't anything to do with the interface implementation, it's a function of the caller, and definitely isn't something that can be asserted as a programmer error.
> should I panic or not in this case? We can also error for instance by checking if the length matches the expected size.
Calling PutKey with a buffer slice that's too small is a bug, but it's not a programmer error, and should definitely not panic. It should return an error. Better yet, the API shouldn't use byte slices at all; it should operate on io.Readers and io.Writers. They are in fact faster than operating on byte slices in memory, in general; anyone claiming otherwise without concrete performance data is, bluntly, lying.
> I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external.
Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍
> The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them.

This is true, but the consumer of the collections KeyEncoder API is the collections.Map itself, in its encodeKey function: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R125-R129
If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-length byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics because of slice out of bounds, then it is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error in its Encode function, which can contain the buf length check, if desired.
> They are in fact faster than operating on byte slices in memory

TBH, I'd like this to be proven, because I am not seeing how an io.Writer/Reader can give me any performance benefits in KeyEncoders specifically, when I do have knowledge that:
- Key byte size is small in 90% of cases (8, 20, 40, 100 bytes, and I am exaggerating).
- This key ends up in KVStore, which does not accept io.Reader/Writer but works on []byte.

For the collections.Map, I'm not seeing how an abstraction over []byte can be more performant than the raw []byte itself, especially considering we don't have memory constraints in this context.
> Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍

This is unfortunately impossible because, for backwards-compatibility reasons, I need to allow developers to implement their own KeyEncoders in order to retain their serialisation semantics around, let's say, string, rather than using what we provide as StringKeyEncoder.
> If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-length byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics because of slice out of bounds, then it is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error in its Encode function, which can contain the buf length check, if desired.
KeyEncoder is an interface which types can implement. And StringKeyEncoder is a type that would implement KeyEncoder, it isn't itself a value that callers would call methods on. So you wouldn't ever call StringKeyEncoder.Size, you'd always construct an enc = NewStringKeyEncoder, and then call enc.Size.
But wait. You define Size(value T) int, and then you say that Size("hello") would return 5, implying e.g. Size("xxx") would return 3. That means Size isn't a function of the type, it's a function of the value. That would mean Encode(buf []byte, val T) would demand buf sizes which were not established by the type T itself, but rather by the specific val of type T. How can this possibly work? What do you expect your callers to do?
```go
enc := NewKeyEncoder[keytype, valtype](...)
key := keytype(...)
keysz := enc.Size(key)
buf := make([]byte, keysz)
err := enc.Encode(buf, key)
// use buf
```
This is clearly non-viable, right?
@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal. The generics propagate the type of the collection (
I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?
How do you envision the "end state" of encoding and decoding of keys and values? What do you expect would be the concrete implementations that satisfy the e.g. KeyEncoder interface? Would there be one per concrete key type (big-endian-uint64, length-prefixed-string, etc.)? Or one per encoding form (big-endian-encoder, JSON-encoder, etc.)? Something else? This PR suggests the first option, which is fine, but not a good match for the implementation.
Yes, I assume initially (not at the cosmos-sdk level, but at the ecosystem level) there will be a lot of different KeyEncoders to match whatever people were implementing manually as keys; then my assumption is that there will be a convergence towards the SDK's provided encoders for new modules and, optimistically, for old modules too as state migrations are written.
KeyEncoder[T any] expresses something which models encoding formats or schemes such as JSON or XML, and not encodable key types like uint64 or string. Is this understood? If not, what can I help to clarify? This is an important point which seems to be fundamental to the PR, and it's important that it's clear.
That is not the point of the generic parameter. If you feel the way the API is expressed leads to this reading, then we can iterate on how to better express its meaning.

Defining an e.g. KeyEncoder type as a generic over
utACK. Love the simplicity; left one question.
collections/go.mod:

```
module cosmossdk.io/collections
```
A few more things need to be done when adding a new module:
- Can you add it to dependabot?
- Can you add it to the GitHub workflow test.yml (under submodules)?
- Can you create a sonar-project.properties file (like in math), with project key cosmos-sdk-collections?
- Can you add a changelog under collections/CHANGELOG.md?
- Added a vanity URL here: feat: add collections vanity url vanity#29
Let's do this in a follow-up PR. Ideally we can scope PRs better: tooling vs logic.
Makes sense, except for the addition to CI. That should be done in the same PR, as right now our CI does not test it.
Description
Optimistically porting the core API of collections, pending ADR approval, with some slight modifications to address initial reviews.
Closes: #XXXX
Author Checklist
All items are required. Please add a note to the item if the item is not applicable, and please add links to any relevant follow-up issues.
I have...
- added `!` to the type prefix if API or client breaking change
- added an entry to CHANGELOG.md
Reviewers Checklist
All items are required. Please add a note if the item is not applicable, and please add your handle next to the items reviewed if you only reviewed selected items.
I have...
- confirmed `!` in the type prefix if API or client breaking change