
feat(collections): Initialise core (Prefix, KeyEncoder, ValueEncoder, Map) #14134

Merged · 42 commits · Dec 8, 2022

Conversation

Contributor

@testinginprod testinginprod commented Dec 2, 2022

Description

Optimistically porting the core API of collections, pending ADR approval, with some slight modifications to address initial reviews.

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title
  • added ! to the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • followed the guidelines for building modules
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • included comments for documenting Go code
  • updated the relevant documentation or specification
  • reviewed "Files changed" and left comments if necessary
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed ! in the type prefix if API or client breaking change
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic
  • reviewed API design and naming
  • reviewed documentation is accurate
  • reviewed tests and test coverage
  • manually tested (if applicable)

@testinginprod
Contributor Author

Iteration is not ported into this PR in order to have a conversation around API friendliness.

collections/collections.go — resolved (outdated) review threads

// ValueEncoder defines a generic interface which is implemented
// by types that are capable of encoding and decoding collection values.
type ValueEncoder[T any] interface {


Encoding should in general take an io.Writer which is written-to. This provides maximum flexibility and performance. Callers can pass an e.g. bytes.Buffer if they want to encode to bytes in memory. Similarly decoding should in general take an io.Reader which is read-from. Same rationale. Callers can buffer some bytes into an e.g. bytes.Buffer or pass a bytes.NewReader(b) or etc. if they want to operate on bytes in memory.

Member

I think Reader/Writer is a reasonable choice for ValueEncoder but it is likely not a big deal either way as most marshaling methods only use []byte

collections/keys.go, collections/map.go (×2), collections/item.go — resolved (outdated) review threads
@testinginprod
Contributor Author

Changed the API to return an error in case of serialisation or deserialisation failures. Feel free to comment on the latest API changes.


@peterbourgon peterbourgon left a comment


This PR defines encoding and decoding methods on individual types, e.g. Uint64Key. The idiomatic solution to this problem is to define a (non-generic) interface which those types may implement.

type KeyEncoder interface { EncodeKey(io.Writer) (int, error) }
type KeyDecoder interface { DecodeKey(io.Reader) (int, error) }

type Uint64Key uint64

func (k *Uint64Key) EncodeKey(w io.Writer) (int, error) { /* encode k to bytes, write bytes to w     */ }
func (k *Uint64Key) DecodeKey(r io.Reader) (int, error) { /* read bytes from r, decode and save to k */ }

The {Key,Value}{Encoder,Decoder} interfaces as defined in this PR are generic over the any type, which would only make sense if there were Encoder and Decoder types which were also generic over a wide set of types. For example, if there were a JSONKeyEncoder type, with an EncodeKey method that took a key T and could deal with a bunch of possible T values.

But that doesn't seem to be the case here. Can you document the rationale for this approach?

// PutKey writes the key bytes into the buffer. Returns the number of
// bytes written.
// The bytes written must match the value returned by the Size method.
PutKey(buffer []byte, key T) (int, error)

@peterbourgon peterbourgon Dec 5, 2022


Callers are free to pass buffers of any size. If that size is less than the assumed size of the key type, implementations of this method will panic. That's not really an acceptable API, right?

What motivates the method signature here? Why do Put and Read take byte slices directly instead of the more common io.Reader/Writer idiom?

Member

@aaronc aaronc Dec 6, 2022


io.Reader/Writer makes sense too, but I think this API can work too. The documentation just needs to be a bit clearer and state that the buffer must be at least as large as the value returned by Size.


@peterbourgon peterbourgon Dec 6, 2022


An interface with methods that accept a byte slice which is expected to be of a certain minimum size defined by the return value of a different method of that interface is, at best, highly nonidiomatic. For those methods which accept byte slices to panic when that expectation is not met is basically non-viable. Many issues here.

  • First of all, because this isn't a reasonable invariant in the first place; there is no idiomatic Go API which takes a byte slice that is assumed to be of a dynamic size
  • Second, because this invariant is dynamic based on implementation, which means failures are runtime errors rather than programmer errors, and (basically) not the kind of thing that should induce panics in the first place
  • Third, because these methods return errors, and so any violation of this expectation is far more effectively communicated to callers that way
  • Fourth, because the interface documents nothing re: concurrency, meaning there is no guarantee that the Size returned by an implementation of this interface at time t1 will be the same value that's expected by the e.g. PutKey method called at some future time t2, which makes it unsound at the level of the language memory model
  • (several more things I elide for brevity)

Happy to explain any of these points in more detail, and/or to help improve the PR in any of these dimensions.

Contributor Author

@testinginprod testinginprod Dec 6, 2022


@peterbourgon Go's binary encoding API follows the same semantics. Only difference being:

  • For certain types, you have constant size (int64, uint64, ...) [perfectly inline with go's binary encoding API]
  • For other types, you have variable size ([]byte, string).

which means failures are runtime errors rather than programmer errors

A failure of slice out of bounds kind indicates a bad interface implementation, should I panic or not in this case? We can also error for instance by checking if the length matches the expected size.

There's also a second variant to this API which works with append.

PutKey would become: AppendKey(buffer []byte, key K) (newBuffer []byte, err error), would this satisfy your expectations?

I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external. The KeyEncoder implementation is consumed here: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R100.


@peterbourgon peterbourgon Dec 8, 2022


@testinginprod

Go's binary encoding API follows the same semantics

encoding/binary offers two "levels" of API. The lower-level API you're pointing out here operates on byte slices directly and panics when invariants aren't met, e.g. when the slices aren't big enough. The higher-level API operates on io.Readers and io.Writers and doesn't panic. The lower-level API is provided for niche use cases and isn't meant to be general-purpose. The higher-level API is the one meant for general use.

A failure of slice out of bounds kind indicates a bad interface implementation,

I'm assuming that "a failure of slice out of bounds" means "providing a slice to PutKey which is smaller than the value returned by Size", is that right? Assuming so, this has nothing to do with the type that implements the interface, right? The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them. If an implementation defines Size to return 8, and PutKey is called with a buffer of size 4, this isn't anything to do with the interface implementation, it's a function of the caller, and definitely isn't something that can be asserted as a programmer error.

should I panic or not in this case? We can also error for instance by checking if the length matches the expected size.

Calling PutKey with a buffer slice that's too-small is a bug, but it's not a programmer error, and should definitely not panic. It should return an error. Better yet, the API shouldn't use byte slices at all, it should operate on io.Readers and io.Writers. They are in fact faster than operating on byte slices in memory, in general; anyone claiming otherwise without concrete performance data is, bluntly, lying.

I want to also mention that the consumer of the KeyEncoder API is collections itself, it's not external.

Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍

Contributor Author

@testinginprod testinginprod Dec 8, 2022


The implementation defines the PutKey and Size methods, but it can't make any guarantees about anything between them.

This is true, but the consumer of the collections KeyEncoder API is the collections.Map itself, in its encodeKey function: https://github.com/cosmos/cosmos-sdk/pull/14134/files#diff-7ac7fb13c90a4739690a55f679a39d02a650034f043b44875b4e205ea20807d0R125-R129

If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics with a slice-out-of-bounds error, then that is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error from its Encode function, which can include the buffer-length check, if desired.

They are in fact faster than operating on byte slices in memory

TBH, I'd like this to be proven, because I am not seeing how an io.Writer/Reader can give me any performance benefits, in KeyEncoders specifically, when I know that:

  • Key byte size is tiny in 90% of cases: 8-20-40-100 bytes, and I am exaggerating.
  • This key ends up in KVStore which does not accept io.Reader/Writer but works on []byte.

For the collections.Map I'm not seeing how an abstraction over []byte can be more performant than the raw []byte itself, especially considering we don't have memory constraints in this context.

Oh! Great! In that case, un-export the type and its methods, and all of my concerns go away 👍

This is unfortunately impossible because, for backwards compatibility reasons, I need to allow developers to implement their own KeyEncoders in order to retain their serialisation semantics around, let's say, string rather than using what we provide as a StringKeyEncoder.


If calling StringKeyEncoder.Size("hello") returns 5, and I initialise a 5-byte slice and pass it to StringKeyEncoder.Encode(bufferOfLength5, "hello"), and it panics with a slice-out-of-bounds error, then that is in fact a bad interface implementation. Also, KeyEncoder is still free to return an error from its Encode function, which can include the buffer-length check, if desired.

KeyEncoder is an interface which types can implement. And StringKeyEncoder is a type that would implement KeyEncoder, it isn't itself a value that callers would call methods on. So you wouldn't ever call StringKeyEncoder.Size, you'd always construct an enc = NewStringKeyEncoder, and then call enc.Size.

But wait. You define Size(value T) int, and then you say that Size("hello") would return 5, implying e.g. Size("xxx") would return 3. That means Size isn't a function of the type, it's a function of the value. That would mean Encode(buf []byte, val T) would demand buf sizes which were not established by the type T itself, but rather by the specific val of type T. How can this possibly work? What do you expect your callers to do?

enc := NewKeyEncoder[keytype, valtype](...)
key := keytype(...)
keysz := enc.Size(key)
buf := make([]byte, keysz)
err := enc.Encode(buf, key)
// use buf

This is clearly non-viable, right?

@aaronc
Member

aaronc commented Dec 6, 2022

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal. The generics propagate the type of the collection (Map, etc.) to the encoder. We do not want to require some set of wrapper types for each encoding as this requires using those wrapper types in the public API of collections. That pushes unnecessary complexity onto the user. I encourage you to spend more time studying the API of collections as is. I think it does make sense and that the Encoder/Decoder use of generics is correct. I do agree that more documentation on rationale would be good, but also keep in mind that this is actually still a draft PR where API details are being worked out.

@peterbourgon

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal.

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

The generics propagate the type of the collection (Map, etc.) to the encoder. We do not want to require some set of wrapper types for each encoding as this requires using those wrapper types in the public API of collections. That pushes unnecessary complexity onto the user. I encourage you to spend more time studying the API of collections as is. I think it does make sense and that the Encoder/Decoder use of generics is correct. I do agree that more documentation on rationale would be good, but also keep in mind that this is actually still a draft PR where API details are being worked out.

How do you envision the "end state" of encoding and decoding of keys and values? What do you expect would be the concrete implementations that satisfy the e.g. KeyEncoder interface? Would there be one per concrete key type (big-endian-uint64, length-prefixed-string, etc.)? Or one per encoding form (big-endian-encoder, JSON-encoder, etc.)? Something else? This PR suggests the first option, which is fine, but not a good match for the implementation.

@testinginprod
Contributor Author

@peterbourgon, @testinginprod already addressed why the non-generic API you are proposing isn't ideal.

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

#14134 (comment)

Would there be one per concrete key type

Yes, I assume initially (not at cosmos-sdk level, but ecosystem level) there will be a lot of different KeyEncoders to match whatever people were implementing manually as keys. Then my assumption is that there will be a convergence towards the SDK's provided encoders for new modules, and optimistically for old modules too as state migrations are written.

@peterbourgon

Yes, I assume initially (not at comsos-sdk level, but ecosystem level) there will be a lot of different KeyEncoders to match whatever people were implementing manually as keys

KeyEncoder[T any] expresses something which models encoding formats or schemes such as JSON or XML, and not encodable key types like uint64 or string. Is this understood? If not, what can I help to clarify? This is an important point which seems to be fundamental to the PR and it's important that it's clear.

@testinginprod
Contributor Author

KeyEncoder[T any] expresses something which models encoding formats or schemes such as JSON or XML, and not encodable key types like uint64 or string. Is this understood?

This is not the point of KeyEncoder. It's not going to encode XML or JSON. It's rather going to be string => []byte(string), uint64 => big-endian, etc.

If you feel how the API is expressed leads to this then we can iterate on how to better express its meaning.

ValueEncoder expresses the relationship you mentioned.

@peterbourgon

I've read every comment in the PR and haven't found this rationale, can you point me to what you're referring to?

#14134 (comment)

Defining an e.g. KeyEncoder type as a generic over any doesn't mean that any type is a valid key, and it doesn't allow downstream code to assert that constraint. Is that a goal or assumption of this code? (It can't be!)

collections/collections.go, collections/item.go (outdated), collections/map.go (outdated) — resolved review threads
Member

@tac0turtle tac0turtle left a comment

utACK. love the simplicity, left one question

@testinginprod testinginprod enabled auto-merge (squash) December 8, 2022 09:46
@@ -0,0 +1,53 @@
module cosmossdk.io/collections
Member

@julienrbrt julienrbrt Dec 8, 2022

A few more things need to be done when adding a new module:

  • Can you add it to dependabot
  • Can you add it in the github workflow test.yml (under submodules)
  • Can you create a sonar-project.properties file (like in math) - with project key cosmos-sdk-collections
  • Can you add a changelog under collections/CHANGELOG.md
  • Add a vanity URL (see: feat: add collections vanity url, vanity#29)

Member

Let's do this in a follow-up PR. Ideally we can scope PRs better: tooling vs logic.

Member

@julienrbrt julienrbrt Dec 8, 2022

Makes sense, except for the addition to CI. That should be done in the same PR, as right now our CI does not test it.
