Keep in-document hyperlinks after merged #22
Comments
Hi @no1xsyzy! Thanks for using pdfmerge. I ran a few tests to try and figure out what's going on. Inputs
Test 1:
Test 2:
Test 3:
Test 4:
I'm not exactly sure what is happening, but it seems that if the page with a link and its target aren't written to the output stream at the same time, the link gets broken. pdfmerge is built on PyPDF2, so I'm going to see if there's any information about how this works and whether there's anything I can do to prevent that from happening. Is there any other information about what you were trying to do that I should know in diagnosing this error?
This is very interesting because it disproved my hypothesis. I need to learn more about how links get put into the output stream, but for the record, this is where I'm adding pages to the output stream, which just calls addPage using PyPDF2. I'm not sure at which point the links are getting dropped.
I would also be very interested in a fix for this. My use case is merging multiple complete PDF documents (not picking specific pages from any). I get this warning when I do the merge (I'm not sure whether it's relevant):
Hi @exptom! Thanks for reaching out about your situation and the warning you're getting. This might all be related, so we'll keep it in this issue for now. I just did a test with adding a PDF to the end of a PDF that starts with a TOC, and the links still work. What are you using to generate the separate PDFs?
@metaist thanks for getting back to me. The initial PDF that includes the TOC is created using wkhtmltopdf (https://github.com/wkhtmltopdf/wkhtmltopdf), and the additional PDFs that are merged as appendices can come from anywhere (users upload them).
Oh, so they literally start out as HTML links and are converted to PDF links. Interesting. I will begin my deep dive into how PDF links actually work and are encoded. This may require an upstream patch to PyPDF2 once I figure out how their stuff works. I'm also looking at other places where people have issues with PDF links (e.g., combine_pdf) to see if I can learn anything from their general experience. Unfortunately, I do not have an easy short-term fix, but I will keep this issue open and post here as I learn new things.
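For readers following along, here is a rough toy model of the two ways a PDF link can name its target: an explicit reference to a page object, or a named destination that is looked up in a document-level table. Plain dicts stand in for PDF objects, and every name here (`resolve`, `merge_pages_only`, `named_dests`) is hypothetical. It is not PyPDF2 or pdfmerge code, and whether the merge in this issue actually drops that table is an assumption, not something established in this thread.

```python
# Toy model only: dicts stand in for PDF objects. This is NOT PyPDF2 code.

def resolve(link, doc):
    """Return the index of the page a link points to, or None if it dangles."""
    if link["kind"] == "named":
        # Named destinations are looked up in a document-level table.
        page_id = doc["named_dests"].get(link["name"])
    else:
        # Explicit destinations refer to the target page directly.
        page_id = link["page"]
    page_ids = [page["id"] for page in doc["pages"]]
    return page_ids.index(page_id) if page_id in page_ids else None


def merge_pages_only(docs):
    """Concatenate pages but (by assumption) ignore document-level tables."""
    merged = {"pages": [], "named_dests": {}}
    for doc in docs:
        merged["pages"].extend(doc["pages"])
    return merged


named_link = {"kind": "named", "name": "chapter-1"}
explicit_link = {"kind": "explicit", "page": "p1"}

thesis = {
    "pages": [
        {"id": "p0", "links": [named_link, explicit_link]},  # table of contents
        {"id": "p1", "links": []},                           # chapter 1
    ],
    "named_dests": {"chapter-1": "p1"},
}
appendix = {"pages": [{"id": "a0", "links": []}], "named_dests": {}}

merged = merge_pages_only([thesis, appendix])
print(resolve(named_link, thesis), resolve(named_link, merged))
print(resolve(explicit_link, thesis), resolve(explicit_link, merged))
```

In this model the named link resolves in the original document but dangles after the merge, because its lookup table was not carried over, while the explicit link still finds its page. That would match the symptom of links that are still clickable but go nowhere, though the real cause may be different.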
They aren't actually HTML links. What happens is that wkhtmltopdf converts the HTML page to a PDF document and scans the HTML pulling out all the heading tags ( |
I just released |
It seems like |
Background: I used pandoc+texlive for my thesis and pdfmerge for another submission (I study at a cooperative college).
After merging, the TOC links and reference links no longer work. They are still links, but clicking them does not navigate to the link target.
I think this is a bug because the links should have been preserved. They are hyperlinks, and their destinations are well defined.