Add support for unknown fields #2

danburkert · 2017-06-26T01:28:06Z

See https://docs.google.com/document/d/1KMRX-G91Aa-Y2FkEaHeeviLRRNblgIahbsk4wA14gRk/view and protocolbuffers/protobuf#272.

per-gron · 2018-07-05T16:04:39Z

I'm interested in this.

In order to make this work, unknown fields have to be stored in the structs somehow, something along the lines of adding an UnknownFieldSet to every generated message struct. It could be made to be just a null pointer and nothing more unless there are unknown fields.
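A minimal sketch of the storage idea described above, not prost's actual generated code. `ExampleMessage`, the `unknown_fields` member, the layout of `UnknownFieldSet`, and `push_unknown` are all hypothetical names chosen for illustration. It relies on the fact that `Option<Box<T>>` is represented as a single nullable pointer in Rust, so a message without unknown fields pays for one pointer and nothing more.

```rust
/// Hypothetical container for fields the decoder did not recognize.
/// Each entry is (field number, raw encoded bytes of that field).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct UnknownFieldSet {
    pub fields: Vec<(u32, Vec<u8>)>,
}

/// Stand-in for a generated message struct (assumed shape, for illustration).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ExampleMessage {
    pub name: String,
    /// `None` until an unknown field is seen. `Option<Box<T>>` is one
    /// pointer wide, so the common case allocates nothing.
    pub unknown_fields: Option<Box<UnknownFieldSet>>,
}

impl ExampleMessage {
    /// Records an unknown field, allocating the set on first use.
    pub fn push_unknown(&mut self, field_number: u32, raw: Vec<u8>) {
        self.unknown_fields
            .get_or_insert_with(Default::default)
            .fields
            .push((field_number, raw));
    }
}
```

A decoder would call something like `push_unknown` only on the slow path where a field number is not recognized, so messages that match the schema never touch the heap for this member.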

This would be an API break, since existing struct literals would stop compiling once they no longer set every field. A seemingly sensible way to handle this is to recommend that everyone instantiate structs like this:

syntax = "proto3";

message Person {
  string name = 1;
}

Person { name: "Per".to_string(), ..Default::default() };

This would work if all generated structs had #[derive(Default)] (or equivalent), which seems like a sensible thing to do anyway because Protobufs generally assume that no code breaks from just adding a field. (Not sure how this would interact with proto2 default values?) This also seems compatible with https://github.com/danburkert/prost/issues/43 (#[non_exhaustive]).
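To illustrate why the struct update syntax helps, here is a small sketch assuming the generated struct derives `Default`. The `unknown_fields` member and the `make_person` helper are hypothetical; the point is only that the literal inside `make_person` does not need to change when a member is added to the struct.

```rust
/// Stand-in for the struct generated from the `Person` message above.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Person {
    pub name: String,
    /// Hypothetical member added by a later library version. Call sites
    /// that use `..Default::default()` keep compiling without edits.
    pub unknown_fields: Vec<u8>,
}

/// Construction site written before `unknown_fields` existed.
pub fn make_person() -> Person {
    Person {
        name: "Per".to_string(),
        // Every member not named above takes its `Default` value.
        ..Default::default()
    }
}
```

A literal that lists every member explicitly, such as `Person { name: "Per".to_string() }`, would fail to compile against this struct, which is the API break discussed above.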

For completeness, the parsing code would probably have to be updated to at least not discard groups, even if it's not made capable of actually parsing them.
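As a rough sketch of what "not discarding" could mean at the wire level, the code below measures each encoded field, including groups, and copies the bytes of any field whose number is not known. This is not prost's decoder: `read_varint`, `field_len`, and `collect_unknown` are hypothetical helpers, and the only assumption taken from the Protobuf encoding is the set of wire types (0 varint, 1 64-bit, 2 length-delimited, 3 start group, 4 end group, 5 32-bit).

```rust
use std::convert::TryFrom;

/// Reads a base-128 varint; returns (value, bytes consumed).
fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated or longer than 10 bytes
}

/// Length in bytes (key plus payload) of the field starting at `buf[0]`.
/// Groups are measured by recursing until the matching end-group key;
/// a real decoder would also cap the recursion depth.
pub fn field_len(buf: &[u8]) -> Option<usize> {
    let (key, mut pos) = read_varint(buf)?;
    let field_number = key >> 3;
    match key & 0x7 {
        0 => pos += read_varint(buf.get(pos..)?)?.1,
        1 => pos += 8,
        2 => {
            let (len, n) = read_varint(buf.get(pos..)?)?;
            pos = pos.checked_add(n)?.checked_add(usize::try_from(len).ok()?)?;
        }
        3 => loop {
            let (inner_key, n) = read_varint(buf.get(pos..)?)?;
            if inner_key & 0x7 == 4 {
                // End-group key must carry the same field number.
                if inner_key >> 3 != field_number {
                    return None;
                }
                pos += n;
                break;
            }
            pos += field_len(buf.get(pos..)?)?;
        },
        5 => pos += 4,
        _ => return None, // stray end-group or reserved wire type
    }
    if pos <= buf.len() { Some(pos) } else { None }
}

/// Concatenates the raw bytes of every top-level field whose number is
/// not listed in `known`, preserving their original order.
pub fn collect_unknown(mut buf: &[u8], known: &[u64]) -> Option<Vec<u8>> {
    let mut unknown = Vec::new();
    while !buf.is_empty() {
        let (key, _) = read_varint(buf)?;
        let len = field_len(buf)?;
        if !known.contains(&(key >> 3)) {
            unknown.extend_from_slice(&buf[..len]);
        }
        buf = &buf[len..];
    }
    Some(unknown)
}
```

Because the unknown fields are kept as raw bytes, re-encoding can append them unchanged after the known fields; the group's contents never need to be interpreted.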

I don't know if known fields with unexpected type should go into the unknown fields set or not.

Does this make sense?

per-gron · 2018-07-11T16:17:09Z

I have started hacking on this.

There are some ugly corner cases that I'm not entirely sure how to deal with. For example, the well-known types such as BoolValue, when represented as their Rust native types, obviously won't be able to keep unknown fields. The same applies to maps.

danburkert · 2018-07-29T19:36:06Z

@per-gron thanks for the PR. Could you explain the motivation behind the unknown fields feature? I've read through the upstream docs linked in the first comment, but I don't really understand why it's become such a priority for Google to support this feature. My take is that unknown fields are full of subtle footguns, and the described use cases are either not a good fit for protobuf (RMW) or better done through preserving the original encoded message (intermediary servers).

per-gron · 2018-07-29T20:16:00Z

@danburkert I honestly don't know exactly why either, but I can speculate a bit: It seems like the fact that proto3 is different from proto2 in this regard has made it difficult for a lot of internal projects to adopt proto3. It's an API break that simply makes it scary to upgrade when you have a huge code base, and the difference is not important enough to justify a change of behavior.

I agree that this feature is footgun-prone; yet I think it is better to behave as similarly as possible to the other protobuf libraries than to not have it at all. (It is of course possible to provide some kind of Prost setting to disable the feature for projects that don't care about having the same behavior across languages.)

For me personally, I don't really care much about this particular feature or the lack of it. The reason I wanted to work on this is that Prost seems like a very nice Rust Protobuf library, and I think it would be good for Rust to have a more "officially endorsed" version that people can trust will work as expected, including across languages. When working with large projects and across different languages, uniformity really helps a lot, even with really small details.

(Note: Even though I work at Google I don't have any special power to "officialize" any particular Rust Protobuf library, but I hope that fixing details like this one so that its behavior is very close to the main Protobuf libraries along with having people from the Rust community agree that Prost's struct-based API is nice will go a long way.)

danburkert · 2018-08-02T06:07:36Z

Just to set expectations, it's never been a goal of mine to have prost officially endorsed by Google or the upstream Protobuf authors or community. Nor has it been a goal to have feature compatibility with other Protobuf libraries, since there are so many Protobuf features (at least in the upstream C++ implementation). My goals with prost are roughly (and probably omitting some important ones):

Idiomatic generated code
Performance
Straightforward build system integration
Compatibility with other Protobuf libraries (wouldn't really be protobuf otherwise)
Support the core proto2/proto3 features

The upstream project has done a great job publishing conformance tests which prost uses to ensure compatibility.

As far as adding features for the sake of parity, there's no way I can feasibly do that as a single maintainer, nor do I think I could even maintain such a library if the features were contributed by others. prost is currently 'feature-complete' for the purposes of the application for which I wrote it, so there isn't a burning need for me to keep adding features. I'm especially disinclined to add features which I think are harmful to creating flexible applications, and while I haven't completely made up my mind, I tend to think unknown fields fall in that category. I've yet to come across a solid use case for them that isn't error prone or better solved in a different way.

per-gron · 2018-08-06T11:10:40Z

Thanks for the information about your priorities with Prost. (I've been on vacation hence my slow reply.)

The reasons that I wrote this PR are 1) there was an open issue about it written by you, and 2) unlike JSON/groups/reflection, this is something that is or will soon be required by the spec, so it is not quite the same as those other features.

Given help with maintenance, would you be interested in expanding the scope of this library to include features that make it possible to get broader adoption? If not, would you support a fork with that goal?

scottlamb · 2018-08-06T20:06:27Z

> I don't really understand why it's become such a priority for Google to support this feature. My take is that unknown fields are full of subtle footguns, and the described use cases are either not a good fit for protobuf (RMW) or better done through preserving the original encoded message (intermediary servers).

I happen to work at Google (not on protobufs!) and know folks who really pushed for this (reversing the proto 2->3 decision to drop unknown field handling). Those two use cases are the reason why.

I'm not sure why you say protobuf is not a good fit for RMW; it's overwhelmingly common for databases to be full of protobuf values which servers do RMW on. It generally works well...but most teams are still using proto 2 (with no plans to adopt proto 3). Some that started using proto 3 got nasty surprises; thus the push.

I see your point that intermediary servers could preserve the original encoded message, but there are also reasons it's easier to work with a message field embedded in a message field rather than a bytes field embedded in a message field. Think ease of writing an ASCII message (to check in as test data or to send an RPC by hand on the command line while debugging), ease of understanding debug representations (inverse of the above), and not having to explicitly have code do additional serialization/deserialization steps and possibly mess up the type of the contained proto.

The oneof footgun doesn't concern me too much, fwiw. It's wrong to have a message that has more than one of the oneof fields present; inconsistent behavior in which one is used should probably be expected. There are bigger footguns involved in proto compatibility (reusing field numbers or changing the types of fields is a hugely common way to screw up; there are a lot of variations of how to make this mistake and how it can bite), but dealing with backward and forward compatibility can't be avoided in many applications.

> Given help with maintenance, would you be interested in expanding the scope of this library to include features that make it possible to get broader adoption? If not, would you support a fork with that goal?

I'm also interested in the answer here (though I can't commit to significant help myself). I appreciate the hard work you've done on this project. Nonetheless, it's a bit frustrating that there are at least three apparently-commonly-used Rust proto implementations, with none being a superset of the others (much less matching Google's official C++ protobuf implementation). I'd love a path to resolve that, and that probably starts with knowing if the base should be prost or one of the others.

dg-builder · 2023-09-28T01:39:58Z

Hi, what's the current status for supporting unknown fields? I see multiple attempts over the past couple of years but no progress.

It would be great to rm this file :) https://github.com/tokio-rs/prost/blob/master/conformance/failing_tests.txt

danburkert added enhancement help wanted labels Jul 23, 2017

kestred added a commit to wyyerd-contrib/prost that referenced this issue Feb 3, 2018

readme: remove FAQ tokio-rs#2 and document type/tag inference

08c5f6a

per-gron mentioned this issue Jul 13, 2018

Keep unknown fields in decode/encode roundtrips #117

Closed

8 tasks

timthelion mentioned this issue Nov 19, 2018

It would be nice to derive Default for message structs #136

Closed

illicitonion mentioned this issue Dec 15, 2020

refactor all gRPC usages to use Tonic instead of grpcio pantsbuild/pants#11307

Merged

akhilles mentioned this issue Mar 30, 2021

Add ability to detect unknown fields #451

Closed

ancazamfir mentioned this issue Nov 3, 2021

Upgrade proto to ibc-go v1.2 and sdk v0.44 informalsystems/hermes#1438

Merged

6 tasks

andrewhickman linked a pull request Dec 20, 2021 that will close this issue

Add opt-in support for unknown fields #574

Open

bergundy mentioned this issue Jul 11, 2022

Add property/mutable state updating events temporalio/api#196

Merged

jasonahills mentioned this issue Nov 30, 2022

Draft: Unknown fields #769

Open

andrewhickman mentioned this issue Aug 28, 2023

Stabilise useful functions from encoding module #903

Open

0xcaff mentioned this issue Jul 13, 2024

add example for rust-protobuf codec hyperium/tonic#1789

Closed

caspermeijn mentioned this issue Jul 19, 2024

support message option? #425

Open
