Consideration for a streaming Decoder #27

Closed
goodboy opened this issue May 30, 2021 · 4 comments

goodboy commented May 30, 2021

I was able to get basic functionality integrated into a project, but got hung up on not having an easy way to decode streamed msgpack data into objects incrementally.

msgpack-python offers a streaming Unpacker API, which is implemented in Cython.
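
For reference, incremental decoding with msgpack-python's Unpacker looks roughly like this (a minimal sketch; `sock` is assumed to be a connected socket and `handle` a placeholder for application logic):

import msgpack

unpacker = msgpack.Unpacker()
while True:
    chunk = sock.recv(4096)      # `sock` is an assumed connected socket
    if not chunk:
        break
    unpacker.feed(chunk)         # buffer the partial data internally
    for obj in unpacker:         # yields each complete message as it becomes available
        handle(obj)              # `handle` is a placeholder for application logic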

It would be great to get something similar in this project, for convenient stream processing without having to manually parse object message delimiters.

I would be interested in providing a patch for this support.
Would you accept one written in Cython, or would you insist on C?

goodboy added a commit to goodboy/tractor that referenced this issue Jun 2, 2021
Can only really use an encoder currently since there is no streaming API
in `msgspec` as of yet. See jcrist/msgspec#27.

Not sure if any encoding speedups are currently noticeable, especially
without any validation going on yet XD.

First experiments toward #196
goodboy mentioned this issue Jun 2, 2021

jcrist commented Jun 7, 2021

A streaming API for a deserializer is hard to support, since the internal state of the decoder needs to be kept across all operations so that it can be resumed later on. This comes with a performance hit for non-streaming decoding, and would complicate the codebase. Implementing it would require changes to the parser state machine, which is implemented entirely at the C level (so no, Cython wouldn't work here).

Rather, I recommend implementing message framing at a higher level. A simple protocol would be length-prefix framing (see e.g. this blog post for reference). This has a few benefits besides simplifying our codebase:

  • It lets you easily swap out the serialization mechanism without changing the framing (so you could try out e.g. json, pickle, quickle, etc. instead).
  • It lets you set a limit on the size of received messages, so a client can't send a giant message that would crash the server. This is harder to handle at the msgspec level, but easy at the framing level (where you can set a max size on the total message).


goodboy commented Jun 7, 2021

Rather, I recommend implementing message framing at a higher level.

@jcrist cool, this def makes sense to me.

Are you suggesting that a framing protocol could be added to this project, or that client code should implement it in its own codebase?

Thanks again for the in-depth answers, btw.


jcrist commented Jun 7, 2021

This would be something you'd handle in your own codebase. A naive asyncio implementation might look like this (untested; please note I don't have these APIs memorized):

async def write(stream, msg: bytes) -> None:
    n = len(msg)
    stream.write(n.to_bytes(4, "big"))
    stream.write(msg)
    await stream.drain()

async def read(stream) -> bytes:
    prefix = await stream.readexactly(4)
    n = int.from_bytes(prefix, "big")
    return await stream.readexactly(n)
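
To enforce the message-size limit mentioned above at the framing layer, the read side could reject oversized prefixes before buffering the body. A minimal sketch, assuming the same asyncio stream API and a hypothetical MAX_MSG_SIZE constant:

MAX_MSG_SIZE = 2 ** 20  # hypothetical cap of 1 MiB; tune to your application

async def read_limited(stream) -> bytes:
    prefix = await stream.readexactly(4)
    n = int.from_bytes(prefix, "big")
    if n > MAX_MSG_SIZE:
        # reject the frame before reading a potentially huge payload
        raise ValueError(f"message of {n} bytes exceeds limit of {MAX_MSG_SIZE}")
    return await stream.readexactly(n)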


goodboy commented Jun 7, 2021

@jcrist if you're ok with it, I might also put an example of this in the docs PR for #25, just so any newcomers have an example to work from. I'll probably link to the protobuf post you sent as well. IMO it'd be pretty handy to have examples for multiple async frameworks too.
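
For example, a roughly equivalent (untested) sketch using trio might look like the following; `_receive_exactly` is a hypothetical helper, needed because trio's `receive_some` may return fewer bytes than requested:

import trio

async def write(stream: trio.abc.Stream, msg: bytes) -> None:
    # send the 4-byte big-endian length prefix followed by the payload
    await stream.send_all(len(msg).to_bytes(4, "big") + msg)

async def _receive_exactly(stream: trio.abc.Stream, n: int) -> bytes:
    # receive_some may return short reads, so loop until n bytes arrive
    buf = bytearray()
    while len(buf) < n:
        chunk = await stream.receive_some(n - len(buf))
        if not chunk:
            raise EOFError("stream closed before a full message arrived")
        buf.extend(chunk)
    return bytes(buf)

async def read(stream: trio.abc.Stream) -> bytes:
    prefix = await _receive_exactly(stream, 4)
    n = int.from_bytes(prefix, "big")
    return await _receive_exactly(stream, n)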

goodboy added further commits to goodboy/tractor referencing this issue between Jun 11 and Oct 5, 2021, each with the same message as above.
jcrist closed this as completed Feb 8, 2022