diff --git a/README.md b/README.md
index fb1f62c..19e637c 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # unsigned-varint
 
-[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
+[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](https://protocol.ai)
 [![](https://img.shields.io/badge/project-multiformats-blue.svg?style=flat-square)](https://github.com/multiformats/multiformats)
 [![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](https://webchat.freenode.net/?channels=%23ipfs)
 [![](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
@@ -21,6 +21,7 @@ The encoding is:
 - unsigned integers are serialized 7 bits at a time, starting with the least significant bits
 - the most significant bit (msb) in each output byte indicates if there is a continuation byte (msb = 1)
 - there are no signed integers
+- integers are minimally encoded
 
 Examples:
 
@@ -77,28 +78,20 @@ For the forseeable future:
 - A multiformat spec MAY explicitly declare a smaller maximum when using varints.
 - A multiformat spec MAY NOT explicitly declare a larger maximum when using varints without first changing this spec.
 
-### Max differences from Go Varint
+### Main differences from Go Varint
 
-This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one, and that one in turn was based on ...
+This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one.
 
 However, we have two modifications:
 
-- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding)
-- Multiformats varint does not use the 9th byte's msb as part of the number. It never interprets 64-bit numbers from 9 bytes. The Go varint does do that.
+- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding).
+- Multiformats varints must be minimally encoded. That is, numbers must be encoded in the least number of bytes possible.
 
-> What is this about 9th byte msb in Go varint ...
+> What do we mean by minimally encoded?
 
-In the Go implementation, the target is a "64-bit integer". Since the 64th bit bumps to 10 varint bytes, the authors chose to restrict the maximum size to 9-bytes and made the last byte's msb be part of the number. This means the Go implementation is incompatible with 128bit varints (protobuf), see the design note in [varint.go](https://golang.org/src/encoding/binary/varint.go). This also means growing the varint may be difficult or break things as numbers might then mean two different things.
-
-Instead, in the multiformats unsigned-varint, we explicitly declare that our unsigned varints are _theoretically_ infinite, but _in practice_ limited to 9 bytes for security. This means:
-
-- There may always be a continuation bit.
-- A continuation bit in the 9th byte MUST be ignored, and an error returned.
-- Our unsigned ints are compatible with much larger integers (like 128-bit unsigned protobuf varints)
-- Leaves door open for growing in the future if it is absolutely needed.
-  - This gives us a large window of numbers (2^63 is a huge number), plenty big for these use cases.
-
-This format is simpler, and our varints are not expected to ever get beyond 63bits, as opposed to what you might find with group varints.
+Multiformat varints must be encoded in as few bytes as possible. To illustrate
+the issue, take `{0x81 0x00}`. This is a valid golang varint encoding of 0x1.
+However, the _minimal_ encoding of 0x1 is `{0x1}`.
 
 ## Maintainers
 
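
Note on the minimal-encoding rule added in this diff: the Python sketch below is one possible way to check it, not a reference implementation. The function names `encode_uvarint` and `decode_uvarint` are hypothetical and do not come from the spec or from any existing library. The decoder relies on one observation: once at least one continuation byte has been read, a final byte of `0x00` contributes only zero bits, so the same value would fit in fewer bytes. The sketch does not enforce any maximum length.

```python
def encode_uvarint(n):
    """Encode a non-negative integer as a minimally encoded unsigned varint."""
    if n < 0:
        raise ValueError("unsigned varints cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the 7 least significant bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # msb = 1: a continuation byte follows
        else:
            out.append(low7)         # msb = 0: this is the last byte
            return bytes(out)


def decode_uvarint(buf):
    """Decode one varint from the start of buf and return (value, bytes_read).

    Hypothetical strict decoder: raises ValueError for truncated input
    and for encodings that are not minimal.
    """
    value = 0
    for i, byte in enumerate(buf):
        value |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:
            # A zero final byte after a continuation byte adds no bits,
            # so the value had a shorter encoding.
            if byte == 0 and i > 0:
                raise ValueError("varint is not minimally encoded")
            return value, i + 1
    raise ValueError("varint is truncated")


if __name__ == "__main__":
    print(encode_uvarint(1).hex())        # 01
    print(decode_uvarint(b"\x01"))        # (1, 1)
    try:
        decode_uvarint(b"\x81\x00")       # the non-minimal example from the diff
    except ValueError as err:
        print("rejected:", err)
```

With this check, `{0x81 0x00}` is rejected, while `{0x1}` decodes to 1.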