Merge pull request #19 from multiformats/feat/minimal-encoding
feat: require varints to be minimally encoded
Stebalien committed Sep 18, 2019
2 parents 8097523 + fb4968a commit b30e694
Showing 1 changed file with 10 additions and 17 deletions.
27 changes: 10 additions & 17 deletions README.md
@@ -1,6 +1,6 @@
# unsigned-varint

[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](https://protocol.ai)
[![](https://img.shields.io/badge/project-multiformats-blue.svg?style=flat-square)](https://github.com/multiformats/multiformats)
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](https://webchat.freenode.net/?channels=%23ipfs)
[![](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
@@ -21,6 +21,7 @@ The encoding is:
- unsigned integers are serialized 7 bits at a time, starting with the least significant bits
- the most significant bit (msb) in each output byte indicates if there is a continuation byte (msb = 1)
- there are no signed integers
- integers are minimally encoded
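
The encoding rules above can be sketched as a short encoder. This is only an illustration, not code from this repository: the function name `encode_uvarint` is hypothetical, and the sketch does not enforce any maximum size. Because the loop stops as soon as the remaining value fits in 7 bits, the output is always minimal.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an MSB-continuation varint (hypothetical helper)."""
    if n < 0:
        # There are no signed integers in this format.
        raise ValueError("unsigned varints cannot encode negative integers")
    out = bytearray()
    # Emit 7 bits at a time, least significant group first.
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # msb = 1: another byte follows
        n >>= 7
    out.append(n)  # final byte, msb = 0
    return bytes(out)


if __name__ == "__main__":
    for value in (1, 127, 128, 300, 16384):
        print(value, encode_uvarint(value).hex())
```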

Examples:

@@ -77,28 +78,20 @@ For the forseeable future:
- A multiformat spec MAY explicitly declare a smaller maximum when using varints.
- A multiformat spec MAY NOT explicitly declare a larger maximum when using varints without first changing this spec.

### Max differences from Go Varint
### Main differences from Go Varint

This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one, and that one in turn was based on ...
This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one.

However, we have two modifications:

- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding)
- Multiformats varint does not use the 9th byte's msb as part of the number. It never interprets 64-bit numbers from 9 bytes. The Go varint does do that.
- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding).
- Multiformats varints must be minimally encoded. That is, numbers must be encoded in the least number of bytes possible.

> What is this about 9th byte msb in Go varint ...
> What do we mean by minimally encoded?
In the Go implementation, the target is a "64-bit integer". Since the 64th bit bumps to 10 varint bytes, the authors chose to restrict the maximum size to 9-bytes and made the last byte's msb be part of the number. This means the Go implementation is incompatible with 128bit varints (protobuf), see the design note in [varint.go](https://golang.org/src/encoding/binary/varint.go). This also means growing the varint may be difficult or break things as numbers might then mean two different things.

Instead, in the multiformats unsigned-varint, we explicitly declare that our unsigned varints are _theoretically_ infinite, but _in practice_ limited to 9 bytes for security. This means:

- There may always be a continuation bit.
- A continuation bit in the 9th byte MUST be ignored, and an error returned.
- Our unsigned ints are compatible with much larger integers (like 128-bit unsigned protobuf varints)
- Leaves door open for growing in the future if it is absolutely needed.
- This gives us a large window of numbers (2^63 is a huge number), plenty big for these use cases.

This format is simpler, and our varints are not expected to ever get beyond 63bits, as opposed to what you might find with group varints.
Multiformat varints must be encoded in as few bytes as possible. To illustrate
the issue, take `{0x81 0x00}`. This is a valid golang varint encoding of 0x1.
However, the _minimal_ encoding of 0x1 is `{0x1}`.
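
A decoder can enforce this rule with a single check: a multi-byte varint whose final byte is `0x00` carries no information in its most significant group, so it is not minimal. The sketch below is an illustration under stated assumptions, not this project's implementation: `decode_uvarint` and `MAX_VARINT_BYTES` are hypothetical names, and the 9-byte cap is assumed from the practical limit this README describes.

```python
from typing import Tuple

# Assumed practical limit (9 bytes), taken from the README's maximum-size discussion.
MAX_VARINT_BYTES = 9


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Return (value, bytes_read); reject non-minimal, oversized, or truncated input."""
    value = 0
    for i, byte in enumerate(buf[:MAX_VARINT_BYTES]):
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            # Last byte. A zero final byte after the first byte means
            # the same number fits in fewer bytes, e.g. {0x81 0x00} for 1.
            if byte == 0 and i > 0:
                raise ValueError("varint is not minimally encoded")
            return value, i + 1
    if len(buf) >= MAX_VARINT_BYTES:
        raise ValueError("varint is longer than 9 bytes")
    raise ValueError("truncated varint")


if __name__ == "__main__":
    print(decode_uvarint(bytes([0x01])))
    try:
        decode_uvarint(bytes([0x81, 0x00]))
    except ValueError as err:
        print("rejected:", err)
```

Rejecting `{0x81 0x00}` outright, rather than silently decoding it as 1, keeps a one-to-one mapping between numbers and byte strings.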

## Maintainers
