Add support for unknown fields #2

Open
danburkert opened this issue Jun 26, 2017 · 8 comments · May be fixed by #574

Comments

@danburkert
Collaborator

See https://docs.google.com/document/d/1KMRX-G91Aa-Y2FkEaHeeviLRRNblgIahbsk4wA14gRk/view and protocolbuffers/protobuf#272.

@per-gron

per-gron commented Jul 5, 2018

I'm interested in this.

In order to make this work, unknown fields have to be stored in the structs somehow, something along the lines of adding an UnknownFieldSet to every generated message struct. It could be made to be just a null pointer and nothing more unless there are unknown fields.
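A minimal sketch of that storage idea, assuming a hypothetical `UnknownFieldSet` type and a hand-written `Person` struct rather than anything prost actually generates. Because `Option<Box<T>>` uses the null-pointer niche, the extra field costs one pointer per message and allocates only when an unknown field is first recorded:

```rust
use std::mem::size_of;

/// Hypothetical container for fields the schema does not know about.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct UnknownFieldSet {
    /// (field number, raw encoded value) pairs, in the order they were read.
    pub fields: Vec<(u32, Vec<u8>)>,
}

/// Hand-written stand-in for a generated message struct (an assumption,
/// not prost's real output).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Person {
    pub name: String,
    /// `None` until the decoder meets a field it does not recognise.
    pub unknown_fields: Option<Box<UnknownFieldSet>>,
}

impl Person {
    /// Records an unknown field, allocating the set on first use.
    pub fn push_unknown(&mut self, number: u32, raw: Vec<u8>) {
        self.unknown_fields
            .get_or_insert_with(Default::default)
            .fields
            .push((number, raw));
    }
}

/// Per-message overhead of the extra field, in bytes.
pub fn unknown_fields_overhead() -> usize {
    size_of::<Option<Box<UnknownFieldSet>>>()
}
```

Messages that never see an unknown field pay only the pointer-sized slot; the heap allocation happens in `push_unknown`.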

This would be an API break, since existing struct instantiations would break because of not setting this field. A seemingly sensible way to make this sane is to recommend everyone to instantiate structs like this:

syntax = "proto3";

message Person {
  string name = 1;
}

Person { name: "Per".to_string(), ..Default::default() };

This would work if all generated structs would #[derive(Default)] (or equivalent), which seems like a sensible thing to do in general because Protobufs in general assume that no code breaks from just adding a field. (Not sure how this would interact with proto2 default values?) This also seems compatible with https://github.com/danburkert/prost/issues/43 (#[non_exhaustive]).
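To illustrate why the struct-update form survives added fields, here is a small sketch. The `Person` struct is hand-written for illustration, and `unknown_fields` is a hypothetical stand-in for whatever field a later version of the code generator might add:

```rust
/// Hand-written stand-in for a generated message struct.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Person {
    pub name: String,
    /// Hypothetical field added later; callers below never mention it.
    pub unknown_fields: Vec<u8>,
}

/// Builds a `Person` while naming only the fields the caller cares about.
/// `..Default::default()` fills in every other field, so this function keeps
/// compiling when new fields are added to the struct.
pub fn new_person(name: &str) -> Person {
    Person {
        name: name.to_string(),
        ..Default::default()
    }
}
```

By contrast, an exhaustive literal such as `Person { name: "Per".to_string() }` stops compiling as soon as the struct gains a field, which is the API break described above.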

For completeness, the parsing code would probably have to be updated to at least not discard groups, even if it's not made capable of actually parsing them.

I don't know if known fields with unexpected type should go into the unknown fields set or not.

Does this make sense?

@per-gron

I have started hacking on this.

There are some ugly corner cases that I'm not entirely sure how to deal with. For example the well-known types such as BoolValue when represented as their Rust native types obviously won't be able to keep unknown values. The same applies to maps.

@danburkert
Collaborator Author

danburkert commented Jul 29, 2018

@per-gron thanks for the PR. Could you explain the motivation behind the unknown fields feature? I've read through the upstream docs linked in the first comment, but I don't really understand why it's become such a priority for Google to support this feature. My take is that unknown fields are full of subtle footguns, and the described use cases are either not a good fit for protobuf (read-modify-write, RMW) or better served by preserving the original encoded message (intermediary servers).

@per-gron

@danburkert I honestly don't know exactly why either, but I can speculate a bit: It seems like the fact that proto3 is different from proto2 in this regard has made it difficult for a lot of internal projects to adopt proto3. It's an API break that simply makes it scary to upgrade when you have a huge code base, and the difference is not important enough to justify a change of behavior.

I agree that this feature is footgun-prone; yet I think it is better to behave as similarly as possible to the other protobuf libraries than to not have it at all. (It is of course possible to provide some kind of Prost setting to disable the feature for projects that don't care about having the same behavior across languages.)

For me personally, I don't really care much about this particular feature or the lack of it. The reason I wanted to work on this is that Prost seems like a very nice Rust Protobuf library, and I think it would be good for Rust to have a more "officially endorsed" version that people can trust will work as expected, including across languages. When working with large projects and across different languages, uniformity really helps a lot, even with really small details.

(Note: Even though I work at Google I don't have any special power to "officialize" any particular Rust Protobuf library, but I hope that fixing details like this one so that its behavior is very close to the main Protobuf libraries along with having people from the Rust community agree that Prost's struct-based API is nice will go a long way.)

@danburkert
Collaborator Author

Just to set expectations, it's never been a goal of mine to have prost officially endorsed by Google or the upstream Protobuf authors or community. Nor has it been a goal to have feature compatibility with other Protobuf libraries, since there are so many Protobuf features (at least in the upstream C++ implementation). My goals with prost are roughly (and probably omitting some important ones):

  • Idiomatic generated code
  • Performance
  • Straightforward build system integration
  • Compatibility with other Protobuf libraries (wouldn't really be protobuf otherwise)
  • Support for the core proto2/proto3 features

The upstream project has done a great job publishing conformance tests which prost uses to ensure compatibility.

As far as adding features for the sake of parity, there's no way I can feasibly do that as a single maintainer, nor do I think I could even maintain such a library if the features were contributed by others. prost is currently 'feature-complete' for the purposes of the application for which I wrote it, so there isn't a burning need for me to keep adding features. I'm especially disinclined to add features which I think are harmful to creating flexible applications, and while I haven't completely made up my mind, I tend to think unknown fields fall in that category. I've yet to come across a solid use case for them that isn't error prone or better solved in a different way.

@per-gron

per-gron commented Aug 6, 2018

Thanks for the information about your priorities with Prost. (I've been on vacation hence my slow reply.)

The reasons that I wrote this PR are 1) there was an open issue about it written by you, 2) unlike JSON/groups/reflection this is something that is or will soon be required by the spec so this is not quite the same as those other features.

Given help with maintenance, would you be interested in expanding the scope of this library to include features that make it possible to get broader adoption? If not, would you support a fork with that goal?

@scottlamb

> I don't really understand why it's become such a priority for Google to support this feature. My take is that unknown fields are full of subtle footguns, and the described use cases are either not a good fit for protobuf (RMW) or better served by preserving the original encoded message (intermediary servers).

I happen to work at Google (not on protobufs!) and know folks who really pushed for this (reversing the proto 2->3 decision to drop unknown field handling). Those two use cases are the reason why.

I'm not sure why you say protobuf is not a good fit for RMW; it's overwhelmingly common for databases to be full of protobuf values which servers do RMW on. It generally works well...but most teams are still using proto 2 (with no plans to adopt proto 3). Some that started using proto 3 got nasty surprises; thus the push.
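A rough sketch of the kind of surprise involved, under stated assumptions: a stored record was written with a newer schema (`name = 1`, plus a varint field `2` that the reader does not know), and a server built against the older schema updates `name` and writes the record back. The tiny decoder below is hand-rolled for illustration, handles only varint and length-delimited wire types, and is not prost's API; `Person`, `unknown`, and `rewrite_name` are hypothetical names.

```rust
/// Reads a base-128 varint from `buf` at `*pos`, advancing `*pos`.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    while shift < 64 {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
    None
}

/// Appends `value` to `out` as a base-128 varint.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// The reader's (older) view of the message: only field 1 is known.
#[derive(Debug, Default, PartialEq)]
pub struct Person {
    pub name: String,
    /// Raw bytes of every field this schema version does not recognise.
    pub unknown: Vec<u8>,
}

impl Person {
    pub fn decode(buf: &[u8], keep_unknown: bool) -> Option<Person> {
        let mut person = Person::default();
        let mut pos = 0;
        while pos < buf.len() {
            let start = pos;
            let key = read_varint(buf, &mut pos)?;
            let (field, wire_type) = (key >> 3, key & 7);
            match wire_type {
                0 => {
                    read_varint(buf, &mut pos)?;
                }
                2 => {
                    let len = read_varint(buf, &mut pos)? as usize;
                    let end = pos.checked_add(len)?;
                    let payload = buf.get(pos..end)?;
                    pos = end;
                    if field == 1 {
                        person.name = String::from_utf8(payload.to_vec()).ok()?;
                        continue;
                    }
                }
                _ => return None, // fixed-width and group wire types omitted
            }
            // Anything that reaches this point is unknown to this schema.
            if keep_unknown {
                person.unknown.extend_from_slice(&buf[start..pos]);
            }
        }
        Some(person)
    }

    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::new();
        if !self.name.is_empty() {
            write_varint((1 << 3) | 2, &mut out);
            write_varint(self.name.len() as u64, &mut out);
            out.extend_from_slice(self.name.as_bytes());
        }
        out.extend_from_slice(&self.unknown);
        out
    }
}

/// Read-modify-write: decode the stored bytes, change `name`, re-encode.
pub fn rewrite_name(stored: &[u8], new_name: &str, keep_unknown: bool) -> Option<Vec<u8>> {
    let mut person = Person::decode(stored, keep_unknown)?;
    person.name = new_name.to_string();
    Some(person.encode())
}
```

With `keep_unknown` set to `false`, the bytes for field `2` are missing from the value written back, so the older server silently erases data written by newer clients. With `true`, they round-trip untouched.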

I see your point that intermediary servers could preserve the original encoded message, but there are also reasons it's easier to work with a message field embedded in a message field rather than a bytes field embedded in a message field. Think ease of writing an ASCII message (to check in as test data or to send an RPC by hand on the command line while debugging), ease of understanding debug representations (the inverse of the above), and not having to explicitly have code do additional serialization/deserialization steps or possibly mess up the type of the contained proto.

The oneof footgun doesn't concern me too much, fwiw. It's wrong to have a message that has more than one member of a oneof present; inconsistent behavior in which one is used should probably be expected. There are bigger footguns involved in proto compatibility (reusing field numbers or changing the types of fields is a hugely common way to screw up; there are a lot of variations of how to make this mistake and how it can bite), but dealing with backward and forward compatibility can't be avoided in many applications.

> Given help with maintenance, would you be interested in expanding the scope of this library to include features that make it possible to get broader adoption? If not, would you support a fork with that goal?

I'm also interested in the answer here (though I can't commit to significant help myself). I appreciate the hard work you've done on this project. Nonetheless, it's a bit frustrating that there are at least three apparently-commonly-used Rust proto implementations, with none being a superset of the others (much less matching Google's official C++ protobuf implementation). I'd love a path to resolve that, and that probably starts with knowing if the base should be prost or one of the others.

@dg-builder

Hi, what's the current status for supporting unknown fields? I see multiple attempts over the past couple of years but no progress.

It would be great to rm this file :) https://github.com/tokio-rs/prost/blob/master/conformance/failing_tests.txt
