Memory leak when generating reports #1791
Comments
As I asked on that issue, I need clear complete instructions for how to reproduce the issue. No one has mentioned what version of coverage.py they are using. I don't know how to make the problem appear so that I can debug it. If you know of a way for me to do that, please provide the instructions.
I'm sorry, that still doesn't give me a way to reproduce the issue. I'm available under contract if you need me to sign an NDA to see your private repo.
Sounds like #1785 to me: affects Python 3.12, eats up a lot of memory. @0x78f1935, does your code happen to have very long lines in it?
I'm sorry, I cannot do that. Although @sebastian-muthwill mentioned he was able to reproduce the issue I had encountered.
No, my unit tests do not use a large mock data set. That being said, I also use Flake8 across my codebase.
@nedbat I'll take up the discussion here. My project is much bigger and is failing. I'll try to reproduce the issue and come back.
I can confirm that this happens to me with big projects too. At first I thought my project was the issue, but now that @sebastian-muthwill was able to reproduce it, I'm more confident we are onto something. I would love to help; unfortunately, I'm unable to share my codebase. Like Sebastian's codebase, mine is big. I took some time to take my codebase apart and removed every component I didn't want to share, so that I could make a repository that reproduces the issue. After removing the frontend and Docker environments, I was left with my tests and my backend. I trimmed my tests down to a single test and was still able to reproduce the memory leak. The moment I started touching endpoints in my backend (which I don't want to share), things started to work, which is why I'm a bit uncertain. The strange part is that it doesn't matter which components I turn on or off, or in which order. When they are turned on, there is a memory leak; when they are turned off, everything is okay (though there is then no data to report, which might explain why that approach appears successful). This makes it difficult to pinpoint a specific component. That isn't very helpful, so I tried to find an existing project from someone else and turn it into a reproducible environment, but so far I have had no luck reproducing the issue. I tried various things; I even tried my own project.toml file. No luck. I hate to say it, but the best workaround I have right now, if you really need your coverage, is to downgrade to Python 3.11 or skip generating reports altogether.
There is a memory issue that only affects Python 3.12, due to a bug in that Python version: python/cpython#119118. I believe this is the issue affecting you. There's a test you can run to check whether this is the issue we think it is. In coveragepy/coverage/phystokens.py, change the line:
To:
It should lessen the memory impact, if it's the same issue.
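For anyone who wants to check the effect outside coverage.py, here is a minimal sketch that compares peak memory when the token stream is materialized with `list()` versus consumed lazily. It assumes, based on the snippets posted later in this thread, that the change under discussion is about dropping the `list()` call. The helper name `peak_bytes` and the sample input are hypothetical and not part of coverage.py.

```python
import io
import tokenize
import tracemalloc


def peak_bytes(text: str, materialize: bool) -> tuple[int, int]:
    """Tokenize `text` and return (token_count, peak_traced_bytes)."""
    tracemalloc.start()
    try:
        readline = io.StringIO(text).readline
        tokens = tokenize.generate_tokens(readline)
        if materialize:
            # Keeps every TokenInfo (and the line it references) alive at once.
            count = len(list(tokens))
        else:
            # Each token can be freed as soon as the next one is produced.
            count = sum(1 for _ in tokens)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return count, peak


# Hypothetical sample input: many short lines plus one fairly long line.
sample = "x = 1\n" * 2000 + "y = [" + "0, " * 500 + "0]\n"

list_count, list_peak = peak_bytes(sample, materialize=True)
lazy_count, lazy_peak = peak_bytes(sample, materialize=False)
print(f"list(): {list_count} tokens, peak {list_peak} bytes")
print(f"lazy:   {lazy_count} tokens, peak {lazy_peak} bytes")
```

Running this under both Python 3.11 and 3.12 should give a rough idea of how much of the difference comes from the interpreter rather than from your project.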
@devdanzin Thanks for pointing that out. @nedbat I just tested it and can confirm that the issue is gone when setting the return as @devdanzin described.
Since it is not reproducible with a fresh installation, I could imagine it has something to do with the number of tests run?
So far, we had only seen this issue manifest on files with very long (many thousand characters) lines, where memory ballooned very fast and resulted in a Since it comes from a Python 3.12 bug, it seemed OK to keep the However, given that your case indicates the |
Without having measured it, I would say it did not take longer than in 3.11. However, I have only 53 tests, so the impact may not be that big.
Thinking about it, the use of Which brings the question: is the removal of the list() call correct given presence of the cache? Wouldn't we be caching an exhausted iterator in that case? |
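The concern about caching an exhausted iterator can be illustrated with a small standalone sketch. The function name `cached_lazy_tokens` is hypothetical; it only mimics the shape of the function discussed in this thread and is not coverage.py's code.

```python
import functools
import io
import tokenize


@functools.lru_cache(maxsize=100)
def cached_lazy_tokens(text: str):
    """Return the lazy token generator directly (no list() call)."""
    readline = io.StringIO(text).readline
    return tokenize.generate_tokens(readline)


source = "x = 1\n"

# The first call creates the generator, and consuming it exhausts it.
first_pass = list(cached_lazy_tokens(source))

# The second call is a cache hit: lru_cache hands back the very same
# generator object, which has nothing left to yield.
second_pass = list(cached_lazy_tokens(source))

print(len(first_pass), len(second_pass))
```

So with the cache in place, a second consumer of the same text would see no tokens at all, which is why removing `list()` and keeping `lru_cache` do not combine safely.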
Confirmed that just removing the Given that tokenizing seems much faster on 3.12 compared to 3.9 (and uses a lot more memory), maybe we could define |
I've tested out your theory.

```python
@functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it. Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many. Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    # return list(tokenize.generate_tokens(readline))
    return tokenize.generate_tokens(readline)
```

The behaviour at the end changed drastically.
I think your issue might be related, yes! My current run command looks like
I don't think that is the case. While I was stripping down to a reproducible version, I was able to run into this issue with just one test.
I would say 3.11 is significantly faster in comparison. Which is weird, because wasn't 3.12 supposed to be 40% faster?
Let's test it:

```python
# @functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it. Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many. Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    return list(tokenize.generate_tokens(readline))
    # return tokenize.generate_tokens(readline)
```

This causes the same issue to arise: a memory leak. I'll try without `list` for good measure.

```python
# @functools.lru_cache(maxsize=100)
def generate_tokens(text: str) -> TokenInfos:
    """A cached version of `tokenize.generate_tokens`.

    When reporting, coverage.py tokenizes files twice, once to find the
    structure of the file, and once to syntax-color it. Tokenizing is
    expensive, and easily cached.

    Unfortunately, the HTML report code tokenizes all the files the first time
    before then tokenizing them a second time, so we cache many. Ideally we'd
    rearrange the code to tokenize each file twice before moving onto the next.
    """
    readline = io.StringIO(text).readline
    # return list(tokenize.generate_tokens(readline))
    return tokenize.generate_tokens(readline)
```

This seems to work, but it doesn't increase the speed. Just like the first attempt, it's stuck on 100%, memory is idle, and it takes a couple of minutes before HTML files start appearing. But it takes ages to complete. @devdanzin You beat me to it :P Thank you for your research!
Very odd that tokenize makes new copies of the line for every token. That seems unnecessary. I wrote a CPython issue about it: python/cpython#119654 |
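A quick way to observe this on a given interpreter is to count how many distinct string objects back the `line` attribute of the tokens from a single physical line. This is only a sketch; the helper name `line_object_stats` is hypothetical, and the result is expected to differ between Python versions.

```python
import io
import tokenize


def line_object_stats(text: str) -> tuple[int, int]:
    """Return (tokens_with_a_line, distinct_line_objects) for `text`."""
    readline = io.StringIO(text).readline
    # Keep all tokens alive so that id() values cannot be reused.
    tokens = [tok for tok in tokenize.generate_tokens(readline) if tok.line]
    distinct = len({id(tok.line) for tok in tokens})
    return len(tokens), distinct


token_count, distinct_lines = line_object_stats("value = alpha + beta + gamma\n")
print(f"{token_count} tokens share {distinct_lines} distinct line object(s)")
```

If every token carries its own copy of the line, the second number equals the first; if the line is shared, it is 1. Memory use for a very long line then grows with tokens times line length in the first case.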
I did some quick experiments with both many-line files and very-long-line files, and found that on both 3.11 and 3.12 it was faster to not cache and to not use |
This is now released as part of coverage 7.5.3.
Release notes for coverage 7.5.3 (2024-05-28), as quoted in the Renovate update PRs that reference this issue:
- Performance improvements for combining data files, especially when measuring line coverage. A few different quadratic behaviors were eliminated. In one extreme case of combining 700+ data files, the time dropped from more than three hours to seven minutes. Thanks to Kraken Tech for funding the fix.
- Performance improvements for generating HTML reports, with a side benefit of reducing memory use, closing issue 1791. Thanks to Daniel Diniz for helping to diagnose the problem.
Describe the bug
Please see this issue
To Reproduce
Generate any reports, json / html, doesn't matter.
Expected behavior
Reports will be generated, without eating away all my memory.
Additional context
This issue has been going on for a couple of months now and contains way more information and things that have been tried to solve this issue.
So far we have worked out that the issue occurs specifically with `coverage report`. Any further tips?