Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(client) connections cannot be really closed before EOF with uvloop enabled #6762

Closed
1 task done
Rongronggg9 opened this issue May 21, 2022 · 20 comments
Closed
1 task done
Labels

Comments

@Rongronggg9
Copy link
Contributor

Rongronggg9 commented May 21, 2022

Describe the bug

If uvloop is enabled, as long as EOF is not reached, even if the response, connection, and session are all closed, the server from which aiohttp requested resources, will still keep sending packets to the aiohttp-client for some time. However, the bug is only reproducible with uvloop enabled.

To Reproduce

aiohttp[speedups]==3.8.1
uvloop==0.16.0
import asyncio
import aiohttp
import uvloop
import logging

logging.basicConfig(level=logging.DEBUG)


async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            logging.debug(response)
            logging.debug(response.content)


async def reproduce(wait):
    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
    await asyncio.sleep(wait)  # now you will see that the file is still being downloaded


def main(enable_uvloop):
    if enable_uvloop:
        logging.info('********** Using uvloop **********')
        uvloop.install()  # the bug is only reproducible with uvloop
    else:
        logging.info('********** Using asyncio **********')
    loop = asyncio.new_event_loop()
    loop.set_debug(True)  # or set env PYTHONASYNCIODEBUG=1, not necessary to reproduce the bug
    asyncio.set_event_loop(loop)
    loop.run_until_complete(reproduce(15 if enable_uvloop else 5))
    loop.close()


if __name__ == '__main__':
    main(enable_uvloop=False)
    main(enable_uvloop=True)
    input('Press Enter to exit')

Expected behavior

The connection should be closed and no packet from the server could be sent to the client anymore, no matter with or without uvloop enabled, no matter reaching or not reaching EOF.

Logs/tracebacks

INFO:root:********** Using asyncio **********
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:asyncio:Get address info sgp-ping.vultr.com:443, type=<SocketKind.SOCK_STREAM: 1>, flags=<AddressInfo.AI_ADDRCONFIG: 32>
INFO:asyncio:Getting address info sgp-ping.vultr.com:443, type=<SocketKind.SOCK_STREAM: 1>, flags=<AddressInfo.AI_ADDRCONFIG: 32> took 209.423ms: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('45.32.100.168', 443))]
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0> starts SSL handshake
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0>: SSL handshake took 225.2 ms
DEBUG:asyncio:<asyncio.TransportSocket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.1.101', 51708), raddr=('45.32.100.168', 443)> connected to 45.32.100.168:443: (<asyncio.sslproto._SSLProtocolTransport object at 0x7fc7ff9b5a00>, <aiohttp.client_proto.ResponseHandler object at 0x7fc800205b20>)
DEBUG:root:<ClientResponse(https://sgp-ping.vultr.com/vultr.com.1000MB.bin) [200 OK]>
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Sat, 21 May 2022 00:29:55 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '1048576000', 'Last-Modified': 'Mon, 20 Sep 2021 19:54:01 GMT', 'Connection': 'keep-alive', 'Etag': '"6148e6d9-3e800000"', 'Expires': 'Sun, 22 May 2022 00:29:55 GMT', 'Cache-Control': 'max-age=86400', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'Access-Control-Allow-Origin': '*', 'Accept-Ranges': 'bytes')>

DEBUG:root:<StreamReader 15968 bytes>
DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fc80064dfa0>: SSL error in data received
Traceback (most recent call last):
  File "/usr/lib/python3.9/asyncio/sslproto.py", line 534, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/usr/lib/python3.9/asyncio/sslproto.py", line 206, in feed_ssldata
    self._sslobj.unwrap()
  File "/usr/lib/python3.9/ssl.py", line 948, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)
DEBUG:asyncio:Close <_UnixSelectorEventLoop running=False closed=False debug=True>
INFO:root:********** Using uvloop **********
DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7fc8040f8b40> starts SSL handshake
DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7fc8040f8b40>: SSL handshake took 227.0 ms
DEBUG:root:<ClientResponse(https://sgp-ping.vultr.com/vultr.com.1000MB.bin) [200 OK]>
<CIMultiDictProxy('Server': 'nginx', 'Date': 'Sat, 21 May 2022 00:30:01 GMT', 'Content-Type': 'application/octet-stream', 'Content-Length': '1048576000', 'Last-Modified': 'Mon, 20 Sep 2021 19:54:01 GMT', 'Connection': 'keep-alive', 'Etag': '"6148e6d9-3e800000"', 'Expires': 'Sun, 22 May 2022 00:30:01 GMT', 'Cache-Control': 'max-age=86400', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'Access-Control-Allow-Origin': '*', 'Accept-Ranges': 'bytes')>

DEBUG:root:<StreamReader 15968 bytes>
Press Enter to exit

Python Version

Python 3.9.12 (main, Mar 24 2022, 13:02:21) [GCC 11.2.0] on linux

aiohttp Version

Name: aiohttp
Version: 3.8.1
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: 
Author-email: 
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: aiosignal, async-timeout, attrs, charset-normalizer, frozenlist, multidict, yarl
Required-by: aiohttp-retry, aiohttp-socks

multidict Version

Name: multidict
Version: 6.0.2
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: 
Required-by: aiohttp, yarl

yarl Version

Name: yarl
Version: 1.7.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /home/rongrong/venv/RSStT/lib/python3.9/site-packages
Requires: idna, multidict
Required-by: aiohttp

OS

       _,met$$$$$gg.          rongrong@envy
    ,g$$$$$$$$$$$$$$$P.       -------------
  ,g$$P"     """Y$$.".        OS: Debian GNU/Linux bookworm/sid x86_64
 ,$$P'              `$$$.     Host: HP ENVY x360 Convertible 15-dr1xxx
',$$P       ,ggs.     `$$b:   Kernel: 5.17.0-1-amd64
`d$$'     ,$P"'   .    $$$    Uptime: 7 hours, 43 mins
 $$P      d$'     ,    $$P    Packages: 3453 (dpkg)
 $$:      $$.   -    ,d$$'    Shell: bash 5.1.16
 $$;      Y$b._   _,d$P'      Resolution: 1080x1920, 1920x1080
 Y$$.    `.`"Y$$$$P"'         DE: Plasma 5.24.5
 `$$b      "-.__              WM: KWin
  `Y$$                        WM Theme: Breeze 微风
   `Y$$.                      Theme: Breeze Light [Plasma], Breeze [GTK2/3]
     `$$b.                    Icons: breeze-dark [Plasma], breeze-dark [GTK2/3]
       `Y$$b.                 Terminal: konsole
          `"Y$b._             CPU: Intel i7-10510U (8) @ 4.900GHz
              `"""            GPU: NVIDIA GeForce MX250
                              GPU: Intel CometLake-U GT2 [UHD Graphics]
                              Memory: 16156MiB / 31774MiB

Related component

Client

Additional context

The bug has also been reported to uvloop: MagicStack/uvloop#471

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
Rongronggg9 added a commit to Rongronggg9/RSS-to-Telegram-Bot that referenced this issue May 21, 2022
@Rongronggg9 Rongronggg9 changed the title (aiohttp-client) Connections never really closed with uvloop enabled (client) connections cannot be really closed before EOF with uvloop enabled May 21, 2022
@Rongronggg9
Copy link
Contributor Author

Some statistics in my production environment. (In my use case only the response headers and/or the beginning 4KB of the response body is needed.)

@Dreamsorcerer
Copy link
Member

That probably explains some of the frequent test failures with uvloop. Would be great if someone figures out the fix.

@Rongronggg9
Copy link
Contributor Author

Rongronggg9 commented May 22, 2022

I'd done some deep-dive and found that the bug is a TLS/SSL-relevant bug. Non-TLS connections can be closed even if uvloop is enabled.

-    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
+    await fetch('http://sgp-ping.vultr.com/vultr.com.1000MB.bin')

@Rongronggg9
Copy link
Contributor Author

I also analyzed the web traffic with Wireshark. With asyncio, RST is immediately sent once the connection was closed; but with uvloop, closing the connection won't change anything as if is it not closed, RST is only sent after the event loop is closed.

@Rongronggg9
Copy link
Contributor Author

Another deep-dive:

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            logging.debug(response)
            logging.debug(response.content)
+            await asyncio.sleep(5)  # at this point, only a little part of the response is read, then pends
+            return  # after the connection is "closed", uvloop will resume reading the response

@Rongronggg9
Copy link
Contributor Author

Rongronggg9 commented May 24, 2022

A deeper dive shows the direct cause of the bug.

async def reproduce(wait):
    await fetch('https://sgp-ping.vultr.com/vultr.com.1000MB.bin')
-    await asyncio.sleep(wait)  # now you will see that the file is still being downloaded
+    await asyncio.sleep(35)  # the download will last for exactly 30s, then abort

aiohttp closing its high-level connection causes uvloop to call _SSLProtocolTransport.close().

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L42-L51

     def close(self): 
         """Close the transport. 
  
         Buffered data will be flushed asynchronously.  No more data 
         will be received.  After all buffered data is flushed, the 
         protocol's connection_lost() method will (eventually) called 
         with None as its argument. 
         """ 
         self._closed = True 
         self._ssl_protocol._start_shutdown(self.context.copy()) 

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L559-L574

 cdef _start_shutdown(self, object context=None): 
     if self._state in (FLUSHING, SHUTDOWN, UNWRAPPED): 
         return 
     # we don't need the context for _abort or the timeout, because 
     # TCP transport._force_close() should be able to call 
     # connection_lost() in the right context 
     if self._app_transport is not None: 
         self._app_transport._closed = True 
     if self._state == DO_HANDSHAKE: 
         self._abort(None) 
     else: 
         self._set_state(FLUSHING) 
         self._shutdown_timeout_handle = \
             self._loop.call_later(self._ssl_shutdown_timeout, 
                                   lambda: self._check_shutdown_timeout()) 
         self._do_flush(context) 

We can easily verify that the low-level connection is closed 30 seconds after aiohttp have "closed" its high-level connection (as long as the response body is big enough). This complies with SSL_SHUTDOWN_TIMEOUT. That is to say, it is SSLProtocol._ssl_shutdown_timeout() that eventually closes the connection.

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/includes/consts.pxi#L24

 DEF SSL_SHUTDOWN_TIMEOUT = 30.0 

Now that SSLProtocol._ssl_shutdown_timeout() is called, the conclusion is evident that SSLProtocol._do_flush() causes the reading to be resumed but meanwhile is stuck in something until EOF or timeout is reached, causing the reading sustained.

https://github.com/MagicStack/uvloop/blob/3e71ddc3383a8e0546519117c8775323c19eb6d7/uvloop/sslproto.pyx#L600-L618

     cdef _do_flush(self, object context=None): 
         """Flush the write backlog, discarding new data received. 
  
         We don't send close_notify in FLUSHING because we still want to send 
         the remaining data over SSL, even if we received a close_notify. Also, 
         no application-level resume_writing() or pause_writing() will be called 
         in FLUSHING, as we could fully manage the flow control internally. 
         """ 
         try: 
             self._do_read_into_void(context) 
             self._do_write() 
             self._process_outgoing() 
             self._control_ssl_reading() 
         except Exception as ex: 
             self._on_shutdown_complete(ex) 
         else: 
             if not self._get_write_buffer_size(): 
                 self._set_state(SHUTDOWN) 
                 self._do_shutdown(context) 

Before uvloop fixes the bug, here's a simple hotfix to stop uvloop from reading the response happily and not willing to close the connection.

class AiohttpUvloopTransportHotfix(contextlib.AbstractAsyncContextManager):
    def __init__(self, response: aiohttp.ClientResponse):
        self.response = response

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if UVLOOP_ENABLED:
            self.response.connection.transport.abort()

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            async with AiohttpUvloopTransportHotfix(response):
                ...

CC @Dreamsorcerer

Rongronggg9 added a commit to Rongronggg9/RSS-to-Telegram-Bot that referenced this issue May 25, 2022
@Dreamsorcerer
Copy link
Member

Impressive deep dive. If this is a uvloop bug, have you reported it there? A link to it here would be useful for tracking.

@Rongronggg9
Copy link
Contributor Author

I'd already opened a mirror issue before.

Additional context

The bug has also been reported to uvloop: MagicStack/uvloop#471

@Dreamsorcerer
Copy link
Member

Ah, great. I wonder if there's an easy way to put a temp fix in aiohttp, which would hopefully get our CI tests stable.

@Dreamsorcerer
Copy link
Member

Maybe we can just add those lines into ClientSession, possibly even just monkey patching it in the test code.

@Rongronggg9
Copy link
Contributor Author

Rongronggg9 commented May 25, 2022

Ah, great. I wonder if there's an easy way to put a temp fix in aiohttp, which would hopefully get our CI tests stable.

Maybe we can just add those lines into ClientSession, possibly even just monkey patching it in the test code.

Temporarily fixing it is not difficult, just call transport.abort() instead of transport.close() if the transport is a uvloop SSL protocol transport. But there is indeed a lot to do since there are a lot of transport.close() calls scattered in aiohttp. I am not really sure which ones are not affected by the uvloop bug. The problem also exists in the monkey-patching-in-test-code solution - when to monkey-patch and when not?

asyncio: graceful close (general behavior)

uvloop: hard abort (with my hotfix)

@beesaferoot
Copy link

beesaferoot commented Sep 7, 2022

Hi @Rongronggg9, could this somehow lead to a memory leak? I'm facing an issue where after switching to aiohttp (3.8.1) we began getting high RAM consumption after spikes in requests but which doesn't drop afterwards.

We also use uvloop and make lots of TLS/SSL connections. I will try to test without uvloop to confirm.

@Rongronggg9
Copy link
Contributor Author

@beesaferoot Hmmm... I don't think so, because uvloop will eventually force abort the connection after 30s as I've described before. Also, in my usage, either with/without uvloop or with/without my uvloop hotfix, aiohttp never shows any potential to leak memory.

My advice is to check if there are unclosed responses, sessions, or so on. Also, if you are using CPython built with Glibc (in most cases it is exactly what you are using on Linux), you should consider the possibility of memory fragmentation (for more details: kurtmckee/feedparser#287).

@beesaferoot
Copy link

I see, I will investigate inline with your suggestion. Thanks for the insight

@beesaferoot
Copy link

It appears to be a fragmentation problem, as setting the env variable MALLOC_MMAP_THRESHOLD_ stops the ever increasing memory!

@Rongronggg9
Copy link
Contributor Author

uvloop v0.17.0 was just released, but still buggy

@Rongronggg9
Copy link
Contributor Author

Rongronggg9 commented Sep 15, 2022

I've also tested the vanilla asyncio in CPython 3.11.0rc1/rc2 since it has "borrowed"1 the SSL implementation from uvloop. Still, the connection is closed as expected.

But a fatal error is logged:

DEBUG:asyncio:<asyncio.sslproto.SSLProtocol object at 0x7fdacf7879d0>: Fatal error on transport
Traceback (most recent call last):
  File "/usr/lib/python3.11/asyncio/sslproto.py", line 646, in _do_shutdown
    self._sslobj.unwrap()
  File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
    return self._sslobj.shutdown()
           ^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2672)

I consider the fatal error is pretty OK:

Footnotes

  1. https://github.com/python/cpython/issues/88177, https://github.com/python/cpython/pull/31275

@Rongronggg9
Copy link
Contributor Author

Rongronggg9 commented Sep 15, 2022

So here locates the buggy code:

cdef _do_flush(self, object context=None):
    try:
        self._do_read_into_void(context)
        self._do_write()
        self._process_outgoing()
        self._control_ssl_reading()
    except Exception as ex:
        self._on_shutdown_complete(ex)
    else:
        if not self._get_write_buffer_size():
            self._set_state(SHUTDOWN)
            self._do_shutdown(context)

cdef _do_shutdown(self, object context=None):
    try:
        # we must skip all application data (if any) before unwrap
        self._do_read_into_void(context)  # <--- I am so sad because I am buggy
        try:
            self._sslobj.unwrap()
        except ssl_SSLAgainErrors as exc:
            self._process_outgoing()
        else:
            self._process_outgoing()
            if not self._get_write_buffer_size():
                self._on_shutdown_complete(None)
    except Exception as ex:
        self._on_shutdown_complete(ex)

Being commented out, uvloop shows 100% the same behavior as vanilla asyncio in CPython 3.11 (closing the connection as expected and logging the fatal error):

DEBUG:asyncio:<uvloop.loop.SSLProtocol object at 0x7f9cbfbcf5e0>: Error occurred during shutdown
Traceback (most recent call last):
  File "uvloop/sslproto.pyx", line 624, in uvloop.loop.SSLProtocol._do_shutdown
  File "/usr/lib/python3.11/ssl.py", line 983, in unwrap
    return self._sslobj.shutdown()
           ^^^^^^^^^^^^^^^^^^^^^^^
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2672)

You may also inspect the source code of both uvloop and asyncio yourself. Vanilla asyncio in CPython 3.11 indeed does NOT read before unwrap in the shutdown stage.

def _do_flush(self):
    self._do_read()
    self._set_state(SSLProtocolState.SHUTDOWN)
    self._do_shutdown()

def _do_shutdown(self):
    try:
        if not self._eof_received:
            self._sslobj.unwrap()
    except SSLAgainErrors:
        self._process_outgoing()
    except ssl.SSLError as exc:
        self._on_shutdown_complete(exc)
    else:
        self._process_outgoing()
        self._call_eof_received()
        self._on_shutdown_complete(None)

https://github.com/python/cpython/blob/3.11/Lib/asyncio/sslproto.py#L643

@Rongronggg9
Copy link
Contributor Author

CC

The issue has been located:
MagicStack/uvloop#471 (comment)
MagicStack/uvloop#471 (comment)
MagicStack/uvloop#471 (comment)

However, more discussion is needed since the issue is actually caused by a widespread implementation practice violating the specification:
MagicStack/uvloop#471 (comment)

silvered-shark added a commit to silvered-shark/RSS-to-Telegram-Bot that referenced this issue Jun 17, 2024
silvered-shark added a commit to silvered-shark/RSS-to-Telegram-Bot that referenced this issue Jun 17, 2024
@Dreamsorcerer
Copy link
Member

The last comments on uvloop suggest it will get fixed there (eventually), so I don't think there's anything for us to do here.

@Dreamsorcerer Dreamsorcerer closed this as not planned Won't fix, can't repro, duplicate, stale Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants