CodeParser not opening source files with proper decoder #107

Closed
nedbat opened this issue Jan 24, 2011 · 3 comments
Labels: bug, html

Comments

@nedbat
Owner

nedbat commented Jan 24, 2011

Originally reported by Brett Cannon (Bitbucket: brettcannon, GitHub: brettcannon)


In CodeParser.__init__() you will notice that it opens a source file and reads it, relying on the default encoding for open() (on Python 3, the locale's preferred encoding). On Python 3 this can raise a UnicodeDecodeError if the source file declares an explicit encoding other than UTF-8.

For example, in Python's stdlib, Lib/sqlite3/test/dbapi.py declares an encoding of ISO-8859-1. Because CodeParser doesn't use something like tokenize.detect_encoding() (http://docs.python.org/py3k/library/tokenize.html#tokenize.detect_encoding), the read fails: the file contains bytes that are invalid under UTF-8 but valid under ISO-8859-1.
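To make the failure mode concrete, here is a minimal sketch assuming a small in-memory Latin-1 source rather than the real dbapi.py; the helper name decode_source is hypothetical. It shows that a strict UTF-8 decode of such bytes fails, while tokenize.detect_encoding() picks up the coding cookie (or a UTF-8 BOM) and lets the bytes decode correctly.

```python
import io
import tokenize

# Hypothetical stand-in for a Latin-1 source file such as dbapi.py:
# the coding cookie declares ISO-8859-1, and b"\xe9" ("é" in Latin-1)
# is not a valid UTF-8 sequence in this position.
LATIN1_SOURCE = b"# -*- coding: iso-8859-1 -*-\nname = 'caf\xe9'\n"


def decode_source(source_bytes):
    """Decode Python source bytes using the encoding the file declares.

    tokenize.detect_encoding() looks at no more than the first two lines
    for a UTF-8 BOM or a PEP 263 coding cookie, defaulting to "utf-8".
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding, source_bytes.decode(encoding)


if __name__ == "__main__":
    try:
        LATIN1_SOURCE.decode("utf-8")  # what a UTF-8 default would attempt
    except UnicodeDecodeError as exc:
        print("UTF-8 decode failed:", exc.reason)

    encoding, text = decode_source(LATIN1_SOURCE)
    print(encoding)  # iso-8859-1
```

Note that a file starting with a UTF-8 BOM is reported as "utf-8-sig", so decoding with that name also strips the BOM from the text.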


@nedbat
Owner Author

nedbat commented Jan 24, 2011

Original comment by Brett Cannon (Bitbucket: brettcannon, GitHub: brettcannon)


This also rears its head in CodeUnit.source_file().

@nedbat
Owner Author

nedbat commented Jan 24, 2011

Original comment by Brett Cannon (Bitbucket: brettcannon, GitHub: brettcannon)


Attached is a patch that uses Python 3.2's tokenize.open() when available. A solution that works for Python 3.0 and 3.1 could be created by copying the implementation of tokenize.open(), but I went the easier route. =)

BTW, Ned, do you prefer patches or pull requests?

@nedbat
Owner Author

nedbat commented Jan 30, 2011

Fixed in <<changeset bfb4640496bf (bb)>>. I made similar changes in a few more places that seemed like they would also need them.

nedbat closed this as completed Jan 30, 2011
nedbat added the major, bug, and html labels Jun 23, 2018
agronholm added a commit to agronholm/coveragepy that referenced this issue Aug 16, 2020