From 1907e6a078cca5572a2140d1e9f713188f93ab06 Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Wed, 18 Sep 2019 00:44:06 -0700
Subject: [PATCH 1/2] feat: require varints to be minimally encoded

---
 README.md | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index fb1f62c..6173763 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # unsigned-varint
 
-[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
+[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](https://protocol.ai)
 [![](https://img.shields.io/badge/project-multiformats-blue.svg?style=flat-square)](https://github.com/multiformats/multiformats)
 [![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](https://webchat.freenode.net/?channels=%23ipfs)
 [![](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
@@ -21,6 +21,7 @@ The encoding is:
 - unsigned integers are serialized 7 bits at a time, starting with the least significant bits
 - the most significant bit (msb) in each output byte indicates if there is a continuation byte (msb = 1)
 - there are no signed integers
+- integers are minimally encoded
 
 Examples:
 
@@ -77,13 +78,14 @@ For the forseeable future:
 - A multiformat spec MAY explicitly declare a smaller maximum when using varints.
 - A multiformat spec MAY NOT explicitly declare a larger maximum when using varints without first changing this spec.
 
-### Max differences from Go Varint
+### Main differences from Go Varint
 
-This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one, and that one in turn was based on ...
+This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one.
 
-However, we have two modifications:
+However, we have three modifications:
 
-- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding)
+- Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding).
+- Multiformats varints must be minimally encoded. That is, numbers must be encoded in the least number of bytes possible.
 - Multiformats varint does not use the 9th byte's msb as part of the number. It never interprets 64-bit numbers from 9 bytes. The Go varint does do that.
 
 > What is this about 9th byte msb in Go varint ...
@@ -100,6 +102,12 @@ Instead, in the multiformats unsigned-varint, we explicitly declare that our uns
 
 This format is simpler, and our varints are not expected to ever get beyond 63bits, as opposed to what you might find with group varints.
 
+> What do we mean by minimally encoded?
+
+Multiformat varints must be encoded in as few bytes as possible. To illustrate
+the issue, take `{0x81 0x00}`. This is a valid golang varint encoding of 0x1.
+However, the _minimal_ encoding of 0x1 is `{0x1}`.
+
 ## Maintainers
 
 Captain: [@jbenet](https://github.com/jbenet).

From fb4968ae29c050ca2a0f39667c0290a041a83715 Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Wed, 18 Sep 2019 00:56:05 -0700
Subject: [PATCH 2/2] fix: remove outdated information about go

---
 README.md | 17 +----------------
 1 file changed, 1 insertion(+), 16 deletions(-)

diff --git a/README.md b/README.md
index 6173763..19e637c 100644
--- a/README.md
+++ b/README.md
@@ -82,25 +82,10 @@ For the forseeable future:
 
 This MSB-based unsigned varint is based on the [varint of the Go standard library](https://golang.org/src/encoding/binary/varint.go), which itself was based on the protocol buffers one.
 
-However, we have three modifications:
+However, we have two modifications:
 
 - Multiformats varint only supports unsigned integers, the Go varint supports signed (using zig-zag encoding).
 - Multiformats varints must be minimally encoded. That is, numbers must be encoded in the least number of bytes possible.
-- Multiformats varint does not use the 9th byte's msb as part of the number. It never interprets 64-bit numbers from 9 bytes. The Go varint does do that.
-
-> What is this about 9th byte msb in Go varint ...
-
-In the Go implementation, the target is a "64-bit integer". Since the 64th bit bumps to 10 varint bytes, the authors chose to restrict the maximum size to 9-bytes and made the last byte's msb be part of the number. This means the Go implementation is incompatible with 128bit varints (protobuf), see the design note in [varint.go](https://golang.org/src/encoding/binary/varint.go). This also means growing the varint may be difficult or break things as numbers might then mean two different things.
-
-Instead, in the multiformats unsigned-varint, we explicitly declare that our unsigned varints are _theoretically_ infinite, but _in practice_ limited to 9 bytes for security. This means:
-
-- There may always be a continuation bit.
-- A continuation bit in the 9th byte MUST be ignored, and an error returned.
-- Our unsigned ints are compatible with much larger integers (like 128-bit unsigned protobuf varints)
-- Leaves door open for growing in the future if it is absolutely needed.
-  - This gives us a large window of numbers (2^63 is a huge number), plenty big for these use cases.
-
-This format is simpler, and our varints are not expected to ever get beyond 63bits, as opposed to what you might find with group varints.
 
 > What do we mean by minimally encoded?
 
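The minimal-encoding rule introduced by this series can be illustrated with a short Go sketch. It is not code from the multiformats repository or from Go's `encoding/binary`: `encodeUvarint`, `decodeUvarint`, and the `err*` values are hypothetical names chosen for this example. The 9-byte cap (values below 2^63) follows the README's "limited to 9 bytes for security" wording. The decoder rejects `{0x81 0x00}`, which the README describes as a valid Go varint encoding of 0x1.

```go
package main

import (
	"errors"
	"fmt"
)

// maxLen is the 9-byte practical limit taken from the README (values < 2^63).
const maxLen = 9

// Hypothetical error values for this sketch.
var (
	errTooLarge   = errors.New("varint: value does not fit in 9 bytes")
	errTooLong    = errors.New("varint: encoding longer than 9 bytes")
	errTruncated  = errors.New("varint: input ends before the last byte")
	errNotMinimal = errors.New("varint: not minimally encoded")
)

// encodeUvarint writes x 7 bits at a time, least significant group first,
// setting the msb of every byte except the last. Its output is always minimal.
func encodeUvarint(x uint64) ([]byte, error) {
	if x >= 1<<63 {
		return nil, errTooLarge
	}
	var out []byte
	for x >= 0x80 {
		out = append(out, byte(x)|0x80)
		x >>= 7
	}
	return append(out, byte(x)), nil
}

// decodeUvarint reads one varint from the start of buf and returns the value
// and the number of bytes consumed. It rejects encodings that use more bytes
// than necessary.
func decodeUvarint(buf []byte) (uint64, int, error) {
	var x uint64
	for i, b := range buf {
		if i == maxLen {
			return 0, 0, errTooLong
		}
		x |= uint64(b&0x7f) << (7 * uint(i))
		if b&0x80 == 0 {
			// Final byte. A 0x00 here after a continuation byte adds no bits,
			// so a shorter encoding of the same value exists.
			if b == 0 && i > 0 {
				return 0, 0, errNotMinimal
			}
			return x, i + 1, nil
		}
	}
	return 0, 0, errTruncated
}

func main() {
	enc, _ := encodeUvarint(300)
	fmt.Printf("% x\n", enc) // ac 02

	for _, in := range [][]byte{{0x01}, {0x81, 0x00}} {
		x, n, err := decodeUvarint(in)
		fmt.Println(x, n, err)
	}
	// 1 1 <nil>
	// 0 0 varint: not minimally encoded
}
```

The check relies on the fact that a multi-byte encoding is non-minimal exactly when its final byte is 0x00: that byte contributes no bits, so dropping it and clearing the continuation bit before it gives a shorter encoding of the same value. The single byte 0x00 remains the valid encoding of zero.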