Typed messaging and validation #196

Open
3 tasks
goodboy opened this issue Feb 24, 2021 · 14 comments
Labels
api discussion enhancement New feature or request help wanted Extra attention is needed IPC and transport

Comments

@goodboy
Owner

goodboy commented Feb 24, 2021

I was originally going to make a big post on pydantic and how we could offer typed messages using that very very nice project despite there being a couple holdups for integration with msgpack.

However, it turns out just today an even faster and msgpack specific project was released: msgspec 🏄🏼

It claims not only to be faster than msgpack-python but also to support schema evolution and other niceties.
It also has perf bumps when making multiple repeated encode/decode calls, which is exactly how we're currently using msgpack inside our Channel.

Overall there looks to be no downside and we'll get typed message semantics fast and free 👍🏼

For reference, I'll leave a bunch of links I'd previously gathered regarding making pydantic work with msgpack:

TODO
  • support for a msgpack-python custom type serializer for pydantic.BaseModel such that we just implicitly render with .dict() at pack time and load via `Model(**message)` at decode time?
  • write ourselves a small byte-length-prefixed framing protocol for msgspec as per the comments in Try msgspec #212
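      import struct

      # receive_all_or_none() and receive_exactly() are hypothetical stream helpers
      while header := await stream.receive_all_or_none(4):
          (size,) = struct.unpack("<I", header)  # 4-byte little-endian length prefix
          # probably want to sanity-check size for not being unreasonably huge
          chunk = await stream.receive_exactly(size)
          # do something with chunk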
  • consider offering msgspec as an optional dependency if we end up liking it?
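The framing bullet above can be sketched without any stream API at all. This is a minimal sketch under assumptions: `pack_frame`, `unpack_frames` and the `MAX_FRAME` limit are hypothetical names, and only the 4-byte little-endian length prefix comes from the snippet in the TODO.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct("<I")  # 4-byte little-endian unsigned length prefix
MAX_FRAME = 16 * 1024 * 1024  # hypothetical sanity limit, not from the thread


def pack_frame(payload: bytes) -> bytes:
    """Prefix an already-encoded message with its byte length."""
    if len(payload) > MAX_FRAME:
        raise ValueError("payload too large to frame")
    return HEADER.pack(len(payload)) + payload


def unpack_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Split a receive buffer into complete payloads plus the unconsumed tail."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (size,) = HEADER.unpack_from(buffer, offset)
        if size > MAX_FRAME:
            raise ValueError(f"refusing frame of {size} bytes")
        end = offset + HEADER.size + size
        if end > len(buffer):
            break  # partial frame: keep it in the tail until more bytes arrive
        frames.append(buffer[offset + HEADER.size:end])
        offset = end
    return frames, buffer[offset:]
```

A receive loop would append each chunk read from the transport to the tail returned by the previous call and hand every complete payload to the decoder.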
@ryanhiebert
Collaborator

That's really neat! I was looking at implementing Pydantic in a project a little while ago, and chose not to. It seemed like the API wasn't quite what I was looking for. I was wanting data classes, and confidence that serialization and deserialization were both strict. I'm not quite sure why I concluded that, unfortunately. I knew about the data classes integration with Pydantic, but there was something missing with it that I felt I needed.

msgspec looks pretty cool for when you control the data format, but that definitely wasn't part of what I was doing. (I was writing an API wrapper over a JSON API).

I know many people have gotten a lot of mileage out of Pydantic. It's a great project.

@goodboy
Owner Author

goodboy commented Feb 24, 2021

Yeah alternatively we've been thinking about using capnproto and in particular seeing if we can auto-gen schema from type annotated Python functions.

I think this would be a huge boon since we'd get CBS (capability based sec) for free 🏄🏼.

The only holdup will be figuring out how pycapnp can work with async stuff and if it can help us with the schema gen/loading.
There appears to now be asyncio support but not sure how/if that will get in our way or if we can work off that impl to support trio.
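As a rough illustration of the "auto-gen schema from type annotated Python functions" idea, the annotations can be read with the standard library alone. This is only a sketch: `schema_from_func` and the example function `add` are hypothetical, and turning the resulting mapping into a capnproto schema is not shown.

```python
import inspect
from typing import Dict, get_type_hints


def schema_from_func(func) -> Dict[str, type]:
    """Map each parameter of `func`, plus 'return', to its annotated type."""
    hints = get_type_hints(func)
    schema = {}
    for name in inspect.signature(func).parameters:
        if name not in hints:
            raise TypeError(f"parameter {name!r} of {func.__name__} lacks an annotation")
        schema[name] = hints[name]
    # A missing return annotation is treated as "returns nothing".
    schema["return"] = hints.get("return", type(None))
    return schema


async def add(x: int, y: int) -> int:  # hypothetical remotely callable function
    return x + y
```

Calling `schema_from_func(add)` yields a plain dict that a schema generator could consume; unannotated parameters are rejected rather than guessed.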

@goodboy
Owner Author

goodboy commented Feb 24, 2021

Oh, another notable project (for a tractor dependent that will likely soon be broken out into its own repo) is
nptyping, which may prove useful for automatic serialization of arrays.

@goodboy
Owner Author

goodboy commented Mar 7, 2021

Linking to jcrist/msgspec#25 since we'll likely need nested Structs to make this easiest to implement (messages containing strictly typed payloads that are also defined as structs). Otherwise there may need to be some finagling: either hack a standard message schema where payloads are decoded specifically as structs, or just always decode to a dict. The former would be better considering the supposed speed improvement:

Depending on the schema, deserializing a message into a Struct can be roughly twice as fast as deserializing it into a dict.
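To make the "message containing a strictly typed payload" shape concrete, here is a sketch that uses standard-library dataclasses as a stand-in for msgspec Structs (it is not msgspec's API). `from_dict`, `Msg` and `Payload` are hypothetical names, and only plain class annotations are type-checked.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Type, TypeVar, get_type_hints

T = TypeVar("T")


def from_dict(cls: Type[T], data: Dict[str, Any]) -> T:
    """Build `cls` from a decoded dict, recursing into dataclass-typed fields."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        value = data[f.name]
        ftype = hints[f.name]
        if is_dataclass(ftype):
            value = from_dict(ftype, value)  # nested struct payload
        elif isinstance(ftype, type) and not isinstance(value, ftype):
            raise TypeError(f"{cls.__name__}.{f.name}: expected {ftype.__name__}")
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class Payload:  # hypothetical typed payload
    symbol: str
    size: int


@dataclass
class Msg:  # hypothetical message envelope
    cid: str
    payload: Payload
```

The alternative mentioned above, always decoding to a dict, would skip `from_dict` entirely and leave validation to the receiving code.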

@gc-ss

gc-ss commented May 12, 2021

in particular seeing if we can auto-gen schema from type annotated Python functions.

Is there an issue for this?

Essentially to do this, we need to:

  1. Parse dataclasses and save Field attributes
  2. Feed this into networkx to build a graph with `child`, `isa` and `hasa` relationships
  3. Use the builder pattern over the networkx graph with a dialect (capnproto or protobuf etc)
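Steps 1 and 3 can be sketched for a single flat dataclass, leaving the graph step aside. Everything here is an assumption for illustration: `render_capnp_struct`, the `CAPNP_TYPES` mapping and the `Point` class are hypothetical, and the output is only a struct fragment, not a complete Cap'n Proto schema file.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints

# Hypothetical mapping from Python builtins to Cap'n Proto type names.
CAPNP_TYPES = {int: "Int64", float: "Float64", str: "Text", bool: "Bool", bytes: "Data"}


def render_capnp_struct(cls) -> str:
    """Render one dataclass as a Cap'n Proto-style struct definition."""
    hints = get_type_hints(cls)
    lines = [f"struct {cls.__name__} {{"]
    for ordinal, f in enumerate(fields(cls)):
        ftype = hints[f.name]
        # Fall back to the class name for nested, user-defined struct types.
        type_name = CAPNP_TYPES.get(ftype, getattr(ftype, "__name__", str(ftype)))
        lines.append(f"  {f.name} @{ordinal} :{type_name};")
    lines.append("}")
    return "\n".join(lines)


@dataclass
class Point:  # hypothetical example type
    x: int
    label: str
```

Note that `fields()` flattens inherited fields, so a subclass would be rendered with its parent's fields inlined; a different dialect might want to handle inheritance differently.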

@goodboy
Owner Author

goodboy commented May 12, 2021

@gc-ss not yet specifically; feel free to make one of course if you have some ideas and/or want to try it out.

Also, i think this could be easily wrapped in an external repo for use as well; it doesn't have to be tractor specific.

@goodboy
Owner Author

goodboy commented May 12, 2021

Feed this into networkx to build a graph with `child`, `isa` and `hasa` relationships

@gc-ss wait why would you need this?
Afaiu graph relations aren't relevant here; are you talking about building nested structs as trees or?

@gc-ss

gc-ss commented May 12, 2021

Afaiu graph relations aren't relevant here; are you talking about building nested structs as trees or?

Consider this:

class A:
    a: int


class B(A):
    b: int

class C(A):
    c: int

class D:
    composes_c: C

Now if we wanted auto-gen schema for type D, we don't want to spit out B. Also, it's possible some schema libraries might want schemas to be ordered in a certain way depending on the dependency tree.

So you need graphs

What do you think?

If this makes sense, I can move these into a different repo and send you a link.
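The dependency argument above can be checked with a small sketch. It swaps networkx for a plain depth-first traversal and adds `@dataclass` to the example classes so their fields can be introspected; `schema_order` is a hypothetical helper name.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import List, get_type_hints


@dataclass
class A:
    a: int


@dataclass
class B(A):
    b: int


@dataclass
class C(A):
    c: int


@dataclass
class D:
    composes_c: C


def schema_order(root: type) -> List[type]:
    """Return `root` and every dataclass it depends on, dependencies first."""
    ordered: List[type] = []
    seen = set()

    def visit(cls: type) -> None:
        if cls in seen:
            return
        seen.add(cls)
        # "isa" edges: dataclass base classes.
        for base in cls.__bases__:
            if is_dataclass(base):
                visit(base)
        # "hasa" edges: fields whose annotated type is itself a dataclass.
        hints = get_type_hints(cls)
        for f in fields(cls):
            if is_dataclass(hints[f.name]):
                visit(hints[f.name])
        ordered.append(cls)

    visit(root)
    return ordered
```

For `D` this visits `A`, then `C`, then `D`, and never reaches `B`, which matches the point that generating a schema for `D` should not emit `B`.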

@goodboy
Owner Author

goodboy commented May 12, 2021

@gc-ss yah, as I was thinking you mean for composed structs/types.

If this makes sense, I can move these into a different repo and send you a link.

Cool, yeah if you're interested in working on this then for sure.
We can also experiment here around the tractor IPC apis and see how it shapes up with tinkering, then move it to a new project.

Up to you, I don't have immediate bandwidth for this.

@goodboy
Owner Author

goodboy commented May 31, 2021

The first holdup with msgspec is mentioned in jcrist/msgspec#27: they have no streaming decoder API.

No longer a problem; we just have to write a length-prefix framing stream packer (see the TODO above).

goodboy added a commit that referenced this issue Jun 2, 2021
Can only really use an encoder currently since there is no streaming api
in `msgspec` as of now. See jcrist/msgspec#27.

Not sure if any encoding speedups are currently noticeable especially
without any validation going on yet XD.

First experiments toward #196
@goodboy
Owner Author

goodboy commented Jun 6, 2021

Hmm, alternatively, to get typing going sooner than later we could just make some pydantic message type handlers. Pretty sure all we'd need is detection of a BaseModel, then serialization with .dict() on encode and decoding into a BaseModel(**dict).

Pretty sure we could offer this as an extras dependency as well?
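The detect-on-encode, rebuild-on-decode idea can be sketched with the standard library only. This swaps in dataclasses for pydantic's BaseModel and `json` for msgpack; msgpack-python's `packb(default=...)` and `unpackb(object_hook=...)` accept the same kind of hooks. The `__type__` tag, the `MSG_TYPES` registry and the `Started` message are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass

MSG_TYPES = {}  # hypothetical registry: type name -> message class


def register(cls):
    """Class decorator that makes a message type known to the decoder."""
    MSG_TYPES[cls.__name__] = cls
    return cls


def encode_hook(obj):
    """`default=` hook: render a registered message as a tagged dict."""
    if is_dataclass(obj) and type(obj).__name__ in MSG_TYPES:
        return {"__type__": type(obj).__name__, **asdict(obj)}
    raise TypeError(f"cannot serialize {type(obj).__name__}")


def decode_hook(data):
    """`object_hook=` hook: rebuild a message from a tagged dict."""
    cls = MSG_TYPES.get(data.get("__type__"))
    if cls is None:
        return data  # untagged dicts pass through unchanged
    return cls(**{k: v for k, v in data.items() if k != "__type__"})


@register
@dataclass
class Started:  # hypothetical message type
    cid: str
    value: int
```

Usage would look like `json.dumps(msg, default=encode_hook)` on send and `json.loads(raw, object_hook=decode_hook)` on receive. Only flat messages round-trip in this sketch, since `asdict()` drops the tag from nested dataclasses.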

goodboy added 11 further commits that referenced this issue, each with the same message as the Jun 2, 2021 commit above: Jun 11, Jun 14, Jul 1 (two), Sep 5, Sep 8, Sep 18 (two), Oct 4 (two), and Oct 5, 2021.
@goodboy
Owner Author

goodboy commented Feb 9, 2022

Linking explanation from jcrist/msgspec#25

goodboy added a commit that referenced this issue Jul 7, 2022
The greasy details are strewn throughout a `msgspec` issue:
jcrist/msgspec#140

and specifically this code was mostly written as part of POC example in
this comment:
jcrist/msgspec#140 (comment)

This work obviously pertains to our desire and prep for typed messaging
and capabilities aware msg-oriented-protocols in #196, caps sec nods in

I added a "wants to have" method to `Context` showing how I think we
could offer a pretty neat msg-type-set-as-capability-for-protocol
system.
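One way to read "msg-type-set-as-capability" is that a context only accepts messages whose type is in an agreed set. The sketch below is purely illustrative and is not the method added to `Context` in the commit: `CapsContext`, `MsgTypeNotAllowed`, `Ping` and `Shutdown` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class Ping:  # hypothetical message type
    seq: int


@dataclass(frozen=True)
class Shutdown:  # hypothetical message type
    reason: str


class MsgTypeNotAllowed(Exception):
    """Raised when a message falls outside the agreed type set."""


class CapsContext:
    """Hypothetical stand-in for a context limited to a set of message types."""

    def __init__(self, allowed: Iterable[type]) -> None:
        self._allowed = frozenset(allowed)

    def check(self, msg: Any) -> Any:
        # Exact type match: a subclass would need its own grant.
        if type(msg) not in self._allowed:
            raise MsgTypeNotAllowed(type(msg).__name__)
        return msg
```

A context created with `CapsContext({Ping})` would pass `Ping` messages through and reject `Shutdown`, so the type set itself acts as the capability.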
goodboy added a commit that referenced this issue May 15, 2023, with the same message as the Jul 7, 2022 commit above.
@goodboy
Owner Author

goodboy commented May 26, 2023

Probably worth noting are dataclass union libs like https://github.com/yukinarit/pyserde

@goodboy
Owner Author

goodboy commented May 26, 2023

Hilarious to see a writeup of what we've been doing in this repo for years 😂
https://kobzol.github.io/rust/python/2023/05/20/writing-python-like-its-rust.html#fnref:2

the part on ADTs is particularly notable as part of this feature work 🏄🏼


3 participants