Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix infinite loop releasing the connection when the writer is not finished #7798

Closed
wants to merge 9 commits into from

Conversation

bdraco
Copy link
Member

@bdraco bdraco commented Nov 7, 2023

What do these changes do?

#7764 ensured the connection was released if the writer was not done yet, but it never cleared the writer so it would schedule callbacks in a loop

Screenshot 2023-11-07 at 7 53 35 AM

Are there changes in behavior for the user?

Related issue number

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes
  • If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • Add a new news fragment into the CHANGES folder
    • name it <issue_id>.<type> for example (588.bugfix)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

…ished

aio-libs#7764 ensured the connection was released if the writer was
not done yet, but it never cleared the writer so it would
schedule callbacks in a loop
@bdraco bdraco mentioned this pull request Nov 7, 2023
8 tasks
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Nov 7, 2023
Comment on lines 977 to 990
self._writer.add_done_callback(lambda f: self._release_connection())
self._writer.add_done_callback(
self._cleanup_writer_and_release_connection
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made it a separate function to ensure any future refactoring never generates a loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there definitely an issue here? The writer task itself does self._writer = None, which is why I originally left this code like this. It might be worth making a change to make it clearer that the behaviour is correct, I'm just aware that we already have multiple blocks of code in this class doing almost identical things, so I was trying to avoid adding another one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loop is

_release_connection is called and self._writer is not None so it calls the self._writer.add_done_callback, the writer finishes and the callback fires, so _release_connection is called again, and since self._writer was never unset on ClientResponse it does the self._writer.add_done_callback again and since its already done, it fires via call_soon, and the loop repeats.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it happened without intervention and I only found the issue because the container was using 100% cpu.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but why does the callback happen when the writer has not been reset? The task itself resets the attribute before completing...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, just realised, that assignment in the init needs to trigger the callback logic. Just pushed a commit to ensure that happens. Can you try again with that one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a shot as soon as I get back home < 1h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cherry-picked f45d9e4 cleanly, results shortly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came up cleanly, loop did not happen right away as before.

profile is clean, py-spy is clean

Will report back tomorrow as the original symptoms only happened after about 12 hours (not sure one which request)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything ran fine overnight.
I rebooted a few switches and routers to generate some network chaos and everything recovered just fine

Closing this PR in favor of #7815

@bdraco bdraco marked this pull request as ready for review November 7, 2023 14:28
@bdraco bdraco requested a review from asvetlov as a code owner November 7, 2023 14:28
Copy link

codecov bot commented Nov 7, 2023

Codecov Report

Merging #7798 (885afdb) into master (27c308b) will decrease coverage by 0.01%.
Report is 1 commits behind head on master.
The diff coverage is 83.33%.

@@            Coverage Diff             @@
##           master    #7798      +/-   ##
==========================================
- Coverage   97.42%   97.41%   -0.01%     
==========================================
  Files         106      106              
  Lines       32111    32115       +4     
  Branches     3726     3725       -1     
==========================================
+ Hits        31283    31286       +3     
- Misses        626      627       +1     
  Partials      202      202              
Flag Coverage Δ
CI-GHA 97.33% <83.33%> (-0.01%) ⬇️
OS-Linux 97.01% <83.33%> (-0.01%) ⬇️
OS-Windows 95.50% <83.33%> (-0.01%) ⬇️
OS-macOS 96.68% <83.33%> (-0.01%) ⬇️
Py-3.10.11 95.42% <83.33%> (-0.01%) ⬇️
Py-3.10.13 96.87% <83.33%> (-0.01%) ⬇️
Py-3.11.6 96.53% <83.33%> (-0.01%) ⬇️
Py-3.8.10 95.39% <83.33%> (-0.01%) ⬇️
Py-3.8.18 96.79% <83.33%> (-0.01%) ⬇️
Py-3.9.13 95.39% <83.33%> (-0.01%) ⬇️
Py-3.9.18 96.83% <83.33%> (-0.01%) ⬇️
Py-pypy7.3.11 96.23% <83.33%> (-0.02%) ⬇️
VM-macos 96.68% <83.33%> (-0.01%) ⬇️
VM-ubuntu 97.01% <83.33%> (-0.01%) ⬇️
VM-windows 95.50% <83.33%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
aiohttp/client_reqrep.py 98.41% <83.33%> (-0.15%) ⬇️

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@bdraco
Copy link
Member Author

bdraco commented Nov 11, 2023

closing in favor of #7815

@bdraco bdraco closed this Nov 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided There is a change note present in this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants