asyncio tcp transport on Windows reads bytearray instead of bytes #99941

DarioDaF · 2022-12-02T02:58:13Z

Bug report

asyncio.Protocol.data_received prototype not respected: when using create_connection to create a tcp transport data_received is being called with a bytearray object instead of bytes.
If this is expected behaviour libraries like httpx and such should be warned or the prototype modified, although I doubt it's intended because it would suppose the generator doesn't hold reference to it otherwise data could change while stored in buffered stuff.

import sys
print(f'Python: {sys.version}\n')

import asyncio

class MyProto(asyncio.Protocol):
    def data_received(self, data: bytes) -> None:
        print('@@@@@@@@@@ ', data)
    def eof_received(self):
        print('##########')

async def main():
    t, proto = await asyncio.get_running_loop().create_connection(MyProto, 'example.com', 80)
    t.write(b'SITE BE MAD\n\n')
    await asyncio.sleep(1)
    t.close()

asyncio.run(main())

Correct output: @@@@@@@@@@ b'HTTP/1.0 ...WHATEVER THE SERVER ANSWERS...'
Faulty output: @@@@@@@@@@ bytearray(b'HTTP/1.0 ...WHATEVER THE SERVER ANSWERS...')

Your environment

On Windows 11 x64

Tested on:

3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)]

Working as expected in:

3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]

Related problems

Link to the thread where I found a conflicting thing (it's being solved there as a workaround)
encode/httpx#2305 (comment)

EDIT: simplified the minimal example

Linked PRs

gh-99941: asyncio.Protocol.data_received now received immutable bytes #100053

The text was updated successfully, but these errors were encountered:

gvanrossum · 2022-12-03T05:48:11Z

Is it really only happening on Windows?

I'm not sure what you mean by asyncio.Protocol.data_received prototype -- are you referring to the type stubs used by static type checkers in typeshed perhaps? Those are not considered normative (they are often created using reverse engineering and bugs may be fixed when users complain). What does the doc say? That would be considered normative (in most cases).

Fixing this might require copying the data which would adversely affect performance, effectively copying all data received an extra time.

DarioDaF · 2022-12-03T18:44:23Z

I've not tested it on linux, the docs clearly says https://docs.python.org/3/library/asyncio-protocol.html#asyncio.Protocol.data_received non empty bytes object although could make sense for perfomrance to return a bytes-like object (alhough non hashable that is the main problem, and slices of bytearrays are still byte arrays even tho they don't share memory and are a copy) https://github1s.com/python/cpython/blob/HEAD/Lib/_pyio.py#L621 as you can see searching for read functions it's always either casted to bytes or calling other reads that finally lead to os.read which correctly returns bytes, will test on linux probably on monday and see if I can setup a test env to delve deeper in where the bytearray comes from

pochmann · 2022-12-03T20:29:51Z

the docs clearly says non empty bytes object

Hmm, actually it says "bytes object", not "bytes object".

That said, the doc for bytes and bytearray often says "bytes object" and "bytearray object", referring to bytes and bytearray, respectively.

gvanrossum · 2022-12-04T01:12:38Z

I agree that the phrase "bytes object" in the docs refers to bytes, not bytearray.

The cause of the regression (if it is that) is presumably GH-21442, which switches to bytearray for performance reasons. It does this only for the proactor event loop, so Windows only.

Now, I am reluctant to switch back to bytes, because that would lose some of the performance gain. I'd rather update the docs. If a particular protocol class really want bytes, it's easy to put data = bytes(data) at the top of the data_received method.

DarioDaF · 2022-12-04T04:26:15Z

Fair as long as the bytearray is not reused and is released by the source, otherwise if any protocol buffers stuff storing the object it will get corrupted, although I see self._data[:length] in the deferred call, so it's already copying, making it a bytes shouldn't have speed downsides cause slices for bytearrays are copies

kumaraditya303 · 2022-12-04T04:46:21Z

+1 for updating the docs to something like data is a bytes-like object. An optimized protocol may choose to return memoryview of the data to avoid excessive copying so I would like to keep the docs a bit generic for future improvements.

gvanrossum · 2022-12-04T05:12:02Z

Hm, memoryview is probably too different from bytes and bytearray to be acceptable. (In the typeshed world there's been discussion of dropping its inclusion in the bytes-like concept there, though this is not so simple either.)

It is indeed the case that the bytesarray passed to data_received() is a fresh slice. Unfortunately I think there's no API that I know of that creates a slice of a bytesarray and turns it into a bytes without an extra copy. (Unless that's possible using memoryview as an intermediate?)

I think it's fine if the docs state bytes-like.

DarioDaF · 2022-12-04T20:36:16Z

Totally agree, should be possible (ofc requiring testing) to wrap a memoryview, slice that which is a noop and bytes it instead with almost no performance drop

gvanrossum · 2022-12-04T20:50:07Z

So instead of self._data[:length] we could write bytes(memoryview(self._data)[:length]) and that doesn't copy bytes redundantly. It does create two memoryview objects (one for self._data and one for the slice) and each memoryview object costs 184 bytes on my machine, so maybe we could do something like bytes(memoryview(self._data)[:length]) if length >= 500 else bytes(self._data[:length]) (the 500 is a hunch, we could tune it using a micro-benchmark if you feel like it).

But since the current behavior is already out in 3.10 and 3.11 maybe it's not worth fixing and we should just document that it's "bytes-like". Thoughts?

DarioDaF · 2022-12-04T21:25:40Z

In my opinion I'd keep bytes, cause passing mutable objects in the stream stack means that any protocol in between could mess up and break reading stuff (same with write in fact there was a discussion to disallow passing bytearray in writes, I think they settled in copying it with bytes on function call). Also all the reads (file like objects and everything) documentations must be changed or al least checked if we change this protocol. I'm not in the dev team for python tho, so I might not be the best person to ask this

gvanrossum · 2022-12-06T00:03:54Z

Do you feel comfortable submitting a PR? I think there are only two or three places that need to be changed, plus a unittest for each.

DarioDaF · 2022-12-06T08:26:07Z

Sure no problem, I'd see how to test performance with the example, I've also found that a similar thing is done in streams.py to uniform that we should decide what to do, imo unless the interpreter can figure out bytes(my_bytearray[:n]) as a simplified slice (since the slice reference is never stored) it's always best to go with the memoryview intermediary (if someone is reading with a very small buffer they are clearly not going for performance and adding a conditional will slow stuff down) so I'd change to that also in https://github1s.com/python/cpython/blob/HEAD/Lib/asyncio/streams.py#L691 and https://github1s.com/python/cpython/blob/HEAD/Lib/asyncio/streams.py#L733

…le bytes (#100053)

msoxzw · 2023-01-28T23:26:26Z

Another PR, #21446, efficiently implementing BufferedProtocol for proactor event loop, addresses this issue as well.

DarioDaF · 2023-01-29T01:01:04Z

I'll give a look at the PR, it seems more broad then the actual problem. Now there are test cases for this so if it passes should be fine

DarioDaF added the type-bug An unexpected behavior, bug, or error label Dec 2, 2022

kumaraditya303 added the topic-asyncio label Dec 2, 2022

DarioDaF added a commit to DarioDaF/cpython that referenced this issue Dec 6, 2022

Added test case for python#99941

1a5af35

bedevere-bot mentioned this issue Dec 6, 2022

gh-99941: asyncio.Protocol.data_received now received immutable bytes #100053

Merged

gvanrossum pushed a commit that referenced this issue Dec 10, 2022

gh-99941: Ensure that asyncio.Protocol.data_received receives immutab…

1bb68ba

…le bytes (#100053)

kumaraditya303 closed this as completed Dec 15, 2022

farmio mentioned this issue Feb 11, 2023

Fix Windows TCP transport XKNX/xknx#1191

Merged

7 tasks

jonathanslenders mentioned this issue Aug 30, 2024

asyncio.BaseProactorEventLoop feeds bytearray objects to the Protocol.data_received callback, contradicting docs #123496

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

asyncio tcp transport on Windows reads bytearray instead of bytes #99941

asyncio tcp transport on Windows reads bytearray instead of bytes #99941

DarioDaF commented Dec 2, 2022 •

edited by bedevere-bot

Loading

gvanrossum commented Dec 3, 2022

DarioDaF commented Dec 3, 2022

pochmann commented Dec 3, 2022 •

edited

Loading

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

kumaraditya303 commented Dec 4, 2022 •

edited

Loading

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

gvanrossum commented Dec 6, 2022

DarioDaF commented Dec 6, 2022

msoxzw commented Jan 28, 2023

DarioDaF commented Jan 29, 2023

asyncio tcp transport on Windows reads bytearray instead of bytes #99941

asyncio tcp transport on Windows reads bytearray instead of bytes #99941

Comments

DarioDaF commented Dec 2, 2022 • edited by bedevere-bot Loading

Bug report

Your environment

Related problems

Linked PRs

gvanrossum commented Dec 3, 2022

DarioDaF commented Dec 3, 2022

pochmann commented Dec 3, 2022 • edited Loading

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

kumaraditya303 commented Dec 4, 2022 • edited Loading

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

gvanrossum commented Dec 4, 2022

DarioDaF commented Dec 4, 2022

gvanrossum commented Dec 6, 2022

DarioDaF commented Dec 6, 2022

msoxzw commented Jan 28, 2023

DarioDaF commented Jan 29, 2023

DarioDaF commented Dec 2, 2022 •

edited by bedevere-bot

Loading

pochmann commented Dec 3, 2022 •

edited

Loading

kumaraditya303 commented Dec 4, 2022 •

edited

Loading