Missing spaces in extract_text() method #1328
Labels:
is-bug: From a user's perspective, this is a bug (a violation of the expected behavior with a compliant PDF).
whitespace: While doing extract_text, getting the right number of whitespaces (spaces and newlines) is hard.
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow.
The extract_text() method drops spaces. See the attached PDFs.
The text itself is extracted well, but the spaces are missing from almost all fields.
Environment
```
$ python -c "import pypdf;print(pypdf.__version__)"
pypdf==3.14.0
```
Code + PDF
PDF: 0004.pdf
gives:
expected (copy-pasted from Google Chrome):
0000.pdf
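To check that the extract_text() output and the Chrome copy-paste differ only in whitespace (and not in the characters themselves), one option is to compare them with all whitespace removed, then locate the missing spaces with `difflib`. This is a minimal sketch; the helper names `whitespace_only_diff` and `missing_space_positions` are hypothetical and are not part of pypdf.

```python
import difflib
import re
from typing import List


def strip_ws(text: str) -> str:
    """Remove every whitespace character (spaces, tabs, newlines)."""
    return re.sub(r"\s+", "", text)


def whitespace_only_diff(extracted: str, expected: str) -> bool:
    """True if the two strings differ, but only in their whitespace."""
    return extracted != expected and strip_ws(extracted) == strip_ws(expected)


def missing_space_positions(extracted: str, expected: str) -> List[int]:
    """Indices in `expected` of whitespace characters absent from `extracted`."""
    # autojunk=False so frequent characters (like spaces) are not ignored
    matcher = difflib.SequenceMatcher(a=extracted, b=expected, autojunk=False)
    positions: List[int] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert"/"replace" spans in `expected` are text the extraction lacks
        if tag in ("insert", "replace"):
            positions.extend(j for j in range(j1, j2) if expected[j].isspace())
    return positions
```

For example, `extracted` could be `PdfReader("0004.pdf").pages[0].extract_text()` and `expected` the text copied from Chrome. If `whitespace_only_diff` returns True, the problem is limited to space insertion rather than missing or mis-decoded glyphs.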
Yes, you may add it to the tests. It is public data from here: https://northdakota.hazconnect.com/ListIncidentPublic.aspx

P.S. Thank you for the great package!