Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite loop releasing the connection when the writer is not finished #7798

Closed
wants to merge 9 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGES/7798.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix infinite loop releasing the connection when the writer is not finished
15 changes: 14 additions & 1 deletion aiohttp/client_reqrep.py
Original file line number Diff line number Diff line change
Expand Up @@ -968,13 +968,26 @@ def raise_for_status(self) -> None:
headers=self.headers,
)

def _cleanup_writer_and_release_connection(self, _: asyncio.Future) -> None:
"""Cleanup writer and release connection.

This is called when the writer is not finished
and _release_connection is called.
"""
if self._connection is not None:
self._cleanup_writer()
self._connection.release()
self._connection = None

def _release_connection(self) -> None:
if self._connection is not None:
if self._writer is None:
self._connection.release()
self._connection = None
else:
self._writer.add_done_callback(lambda f: self._release_connection())
self._writer.add_done_callback(
self._cleanup_writer_and_release_connection
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it a separate function to ensure any future refactoring never generates a loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there definitely an issue here? The writer task itself does self._writer = None, which is why I originally left this code like this. It might be worth making a change to make it clearer that the behaviour is correct, I'm just aware that we already have multiple blocks of code in this class doing almost identical things, so I was trying to avoid adding another one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop is

_release_connection is called and self._writer is not None so it calls the self._writer.add_done_callback, the writer finishes and the callback fires, so _release_connection is called again, and since self._writer was never unset on ClientResponse it does the self._writer.add_done_callback again and since its already done, it fires via call_soon, and the loop repeats.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it happened without intervention and I only found the issue because the container was using 100% cpu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but why does the callback happen when the writer has not been reset? The task itself resets the attribute before completing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, just realised, that assignment in the init needs to trigger the callback logic. Just pushed a commit to ensure that happens. Can you try again with that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a shot as soon as I get back home < 1h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cherry-picked f45d9e4 cleanly, results shortly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came up cleanly, loop did not happen right away as before.

profile is clean, py-spy is clean

Will report back tomorrow as the original symptoms only happened after about 12 hours (not sure one which request)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything ran fine overnight.
I rebooted a few switches and routers to generate some network chaos and everything recovered just fine

Closing this PR in favor of #7815


async def _wait_released(self) -> None:
if self._writer is not None:
Expand Down
Loading