Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: require varints to be minimally encoded #19

Merged
merged 2 commits into from
Sep 18, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
27 changes: 10 additions & 17 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# unsigned-varint

[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](https://protocol.ai)
[![](https://img.shields.io/badge/project-multiformats-blue.svg?style=flat-square)](https://github.com/multiformats/multiformats)
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](https://webchat.freenode.net/?channels=%23ipfs)
[![](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
Expand All @@ -21,6 +21,7 @@ The encoding is:
- unsigned integers are serialized 7 bits at a time, starting with the least significant bits
- the most significant bit (msb) in each output byte indicates if there is a continuation byte (msb = 1)
- there are no signed integers
- integers are minimally encoded

Examples:

Expand Down Expand Up @@ -77,28 +78,20 @@ For the forseeable future:
- A multiformat spec MAY explicitly declare a smaller maximum when using varints.
- A multiformat spec MAY NOT explicitly declare a larger maximum when using varints without first changing this spec.

### Max differences from Go Varint
### Main differences from Go Varint

This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one, and that one in turn was based on ...
This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one.

However, we have two modifications:

- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding)
- Multiformats varint does not use the 9th byte's msb as part of the number. It never interprets 64-bit numbers from 9 bytes. The Go varint does do that.
- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding).
- Multiformats varints must be minimally encoded. That is, numbers must be encoded in the least number of bytes possible.

> What is this about 9th byte msb in Go varint ...
> What do we mean by minimally encoded?
In the Go implementation, the target is a "64-bit integer". Since the 64th bit bumps to 10 varint bytes, the authors chose to restrict the maximum size to 9-bytes and made the last byte's msb be part of the number. This means the Go implementation is incompatible with 128bit varints (protobuf), see the design note in [varint.go](https://golang.org/src/encoding/binary/varint.go). This also means growing the varint may be difficult or break things as numbers might then mean two different things.

Instead, in the multiformats unsigned-varint, we explicitly declare that our unsigned varints are _theoretically_ infinite, but _in practice_ limited to 9 bytes for security. This means:

- There may always be a continuation bit.
- A continuation bit in the 9th byte MUST be ignored, and an error returned.
- Our unsigned ints are compatible with much larger integers (like 128-bit unsigned protobuf varints)
- Leaves door open for growing in the future if it is absolutely needed.
- This gives us a large window of numbers (2^63 is a huge number), plenty big for these use cases.

This format is simpler, and our varints are not expected to ever get beyond 63bits, as opposed to what you might find with group varints.
Multiformat varints must be encoded in as few bytes as possible. To illustrate
the issue, take `{0x81 0x00}`. This is a valid golang varint encoding of 0x1.
However, the _minimal_ encoding of 0x1 is `{0x1}`.

## Maintainers

Expand Down