Writing XMP metadata to PDFs skips my linked pdf file. Improve error messages #8278

Closed
2 tasks done
ThiloteE opened this issue Nov 26, 2021 · 25 comments · Fixed by #8307 or #8332
Labels
bug Confirmed bugs or reports that are very likely to be bugs external files

Comments

@ThiloteE
Member

JabRef version

Latest development branch build (please note build date below)

Operating system

Windows

Details on version and operating system

Windows 10 21H1

Checked with the latest development build

  • I made a backup of my libraries before testing the latest development version.
  • I have tested the latest development version and the problem persists

Steps to reproduce the behaviour

JabRef 5.4--2021-11-15--9955a33
Windows 10 10.0 amd64
Java 16.0.2
JavaFX 17.0.1+1

Writing XMP metadata to pdf(s) skips my linked pdf file

How to reproduce:

  1. Have a pdf file with linked XMP metadata

  2. Drag this file into JabRef (an entry emerges; let's call it X)

  3. Have another entry (let's call it Y) in another library that also links to this same file

  4. Write metadata from Y to pdf via the following feature:
    image

  5. Try to write metadata from X to pdf via the F6 feature.

As soon as you use Writing XMP metadata to PDFs (F6) feature the following message pops up, even though my file is linked:

image

@ThiloteE
Member Author

ThiloteE commented Nov 26, 2021

Minimal example with steps to reproduce:

  1. Have a pdf.
  2. Have an entry in Jabref
  3. Attach the pdf file to the entry.
  4. Write XMP metadata from entry to pdf file (F6).

The file gets skipped and the pop-up appears. Edit: XMP metadata is definitely NOT written; I verified this via Foxitreader.

Edit: If one uses the following feature, XMP metadata will be written, but there is no pop-up:

image

@ThiloteE
Member Author

ThiloteE commented Nov 26, 2021

Additional info:

  • I verified the written metadata by dragging the manipulated pdf file into Jabref. The new entry that was thereby generated still contained old library data and the data that was supposed to be written was missing.

  • I can open my pdfs from within Jabref, and linked the files via the search expression feature of Jabref. I use this search expression: **/.*[auth.etal:regex("\\."," & "):regex("\\& etal","et al.")].*[Organization].*[YEAR].*\\.[extension].

  • The issue also happens when I attach a file to an entry via right click on the entry and "attach file".

  • The issue also happens when I attach a file by editing the file path in the general tab of the entry editor (right click on the file, then use the edit function).

  • Maybe relevant: https://discourse.jabref.org/t/write-to-xmp-feature-seems-to-work-only-with-relative-path/1714

  • Jabref Documentation about XMP Metadata: https://docs.jabref.org/advanced/xmp

@ThiloteE
Member Author

Uuuuh XMP metadata actually WAS WRITTEN, but does not get imported into Jabref

  1. This is what i pushed to the file:
@Article{Thompson20200330MCR,
  author       = {Thompson, {Mark R.}},
  date         = {2020-03-30},
  journaltitle = {Global Asia},
  title        = {Middle-Class Remorse: Re-embracing Liberal Democracy in the Philippines and Thailand},
  language     = {English},
  number       = {1},
  pages        = {60--64},
  url          = {https://scholars.cityu.edu.hk/en/publications/publication(484a64e7-5da0-4ea6-affc-0d3e9ffdc7d7).html},
  urldate      = {2021-11-13},
  volume       = {15},
  abstract     = {Democracy in both Thailand and the Philippines has been crippled by support given to anti-democratic forces by intellectuals who foolishly endorsed leaders who have done serious damage to the prospects for democratic governance in both countries. The middle classes that supported those views are now seized by remorse, and it remains to be seen whether a course correction in both countries is on the horizon, writes Mark R. Thompson.},
  file         = {:Thompson (2020-03-30) middle-class-remorse-re-embracing-liberal-democracy-in-the-philippines-and-thailand.pdf:PDF},
  publisher    = {East Asia Foundation},
}
  2. With Foxitreader I can see that this is the metadata attached to the pdf file:
    image

  3. But when I drag the pdf into JabRef, THIS is the entry that will be generated:

@InProceedings{RemorseEtAl2021Fhc,
  author = {Middle-Class Remorse and Re-embracing Liberal and Democracy in the Philippines and Thailand and By Mark and R. Thompson},
  date   = {2021},
  title  = {Firefox https://www.globalasia.org/v15no1/cover/middle-class-remorse-re-emb},
  file   = {:E\:/server-t-150/FAU/3. Semester/AER Demokratische Transformation in Asien/00 Hausarbeit Buente 3/Thompson (2020-03-30) middle-class-remorse-re-embracing-liberal-democracy-in-the-philippines-and-thailand.pdf:PDF},
}

@ThiloteE
Member Author

ThiloteE commented Nov 29, 2021

@btut explained why the import leads to different metadata displayed than actually is attached via XMP (https://discourse.jabref.org/t/extract-information-from-pdf-import/2899):

Indeed, if Grobid is enabled the importer uses the following order to obtain metadata:

    Look for bibtex entry on first page of pdf
    Look for embedded bib file
    Grobid
    XMP metadata
    Attempt to find metadata on first page (not in bibtex format).

If you want to force an XMP import, you can go to file → import → either to current or new library and select XMP-annotated PDF (last in the drop-down list) in the bottom right corner.
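
To illustrate why the XMP data can be present in the file and still not show up on a drag-and-drop import, here is a minimal sketch of a "first source that finds something wins" chain. It is not JabRef's actual code: the function name `import_with_fallback` and the stand-in extractors are hypothetical, and only the order of the sources is taken from the explanation quoted above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical extractor type: takes a PDF path and returns a list of
# entries (plain dicts here), or an empty list when nothing was found.
Extractor = Callable[[str], List[dict]]


def import_with_fallback(
    pdf_path: str, extractors: List[Tuple[str, Extractor]]
) -> Tuple[Optional[str], List[dict]]:
    """Return (source name, entries) from the first extractor that finds entries."""
    for name, extract in extractors:
        entries = extract(pdf_path)
        if entries:
            return name, entries
    return None, []


# Stand-in extractors (assumptions for illustration only).
def no_result(_path: str) -> List[dict]:
    return []


def fake_grobid(_path: str) -> List[dict]:
    return [{"title": "Guessed from page layout"}]


def fake_xmp(_path: str) -> List[dict]:
    return [{"title": "Title stored in XMP"}]


# Same order as in the quoted explanation.
chain = [
    ("bibtex on first page", no_result),
    ("embedded bib file", no_result),
    ("grobid", fake_grobid),
    ("xmp", fake_xmp),
    ("first-page heuristics", no_result),
]

source, entries = import_with_fallback("example.pdf", chain)
```

In this sketch the XMP extractor is never reached, because an earlier source already returned an entry. Forcing the XMP import filter corresponds to calling the chain with only the XMP extractor in it.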

So out of the three issues i identified in this thread, only one is still open and the other two are solved:

  • Different XMP metadata is shown upon import of pdf than was written to pdf. Solution see above.
  • Metadata is not written to pdf. Solved --> it can be written by using
    image
  • Writing XMP metadata to PDFs skips my linked pdf file when using the Writing XMP metadata to PDFs (F6) feature. Not solved.

@btut
Contributor

btut commented Nov 29, 2021

Hi, sorry I only saw this issue now after you tagged me in the forum. I'll look into that last point. Thanks for the detailed report!

@btut btut self-assigned this Nov 29, 2021
@Siedlerchr added the bug, external files, and search labels and removed the search label on Dec 3, 2021
@btut btut mentioned this issue Dec 6, 2021
5 tasks
@ThiloteE
Member Author

ThiloteE commented Dec 7, 2021

Hey i just tried the new development version with the fix.

It works! Thank you! 👍

Unfortunately, it does not work for all linked files. Out of 55 entries, 11 returned an error.

Here is one example:

@Online{FreedomHouse2019Fit,
  date         = {2019},
  title        = {Freedom in the World 2019},
  url          = {https://freedomhouse.org/report/freedom-world/2019/democracy-retreat},
  organization = {Freedom House},
  subtitle     = {Democracy in Retreat},
  urldate      = {2021-09-30},
  comment      = {This is the overview on the website about their booklet and research "Freedom in the world 2019"},
  file         = {:Freedomhouse/Brandt et al. (2019) Freedomhouse. Freedom in the world 2019.pdf:PDF},
}

The corresponding error message was:

FreedomHouse2019Fit
  Error while writing 'E:\server-t-150\FAU\3. Semester\AER Demokratische Transformation in Asien\00 Hausarbeit Buente 3\Freedomhouse\Brandt et al. (2019) Freedomhouse. Freedom in the world 2019.pdf':
    null

I then checked the metadata that was attached to the file via foxitreader:
image

Then, i used the other method to write metadata (the one you can find in the general tab). This time it wrote correctly and this emerged:

image

@ThiloteE
Member Author

ThiloteE commented Dec 7, 2021

I checked another entry, and that one failed with both methods: neither was able to write metadata successfully. I suspect JabRef is not able to write to all PDF versions (e.g. 1.7 or PDF/A), but I would need to do some more tests to know for sure.

@InBook{Adetula2011Mda,
  author    = {Adetula, Victor},
  booktitle = {Governance in the 21st Century},
  date      = {2011},
  title     = {Measuring democracy and ‘good governance’in Africa: A critique of assumptions and methods},
  editor    = {Kondlo, Kwandiwe and Ejiogu, Chinenyengozi},
  location  = {Cape Town},
  pages     = {10--25},
  publisher = {HSRC Press},
  series    = {Africa in Focus},
  url       = {https://www.diva-portal.org/smash/record.jsf?pid=diva2:946552},
  urldate   = {2021-09-25},
  file      = {:Adetula (2011) Measuring democracy and good governance in africa kommentiert.pdf:PDF;:Adetula (2011) Measuring democracy and good governance in africa.pdf:PDF},
  keywords  = {Democracy, good governance, political reforms, neo-liberal, civil society, elections, aids},
  urn       = {urn:nbn:se:nai:diva-2051},
}

Here is the info from Foxitreader:

image

And this is the error message when i tried to import the pdf file with XMP annotated PDF:

org.jabref.logic.JabRefException: No entries found. Please make sure you are using the correct import filter.
	at org.jabref@5.4.540/org.jabref.gui.importer.ImportAction.lambda$automatedImport$1(Unknown Source)
	at org.jabref@5.4.540/org.jabref.gui.util.BackgroundTask$1.call(Unknown Source)
	at org.jabref@5.4.540/org.jabref.gui.util.DefaultTaskExecutor$1.call(Unknown Source)
	at org.jabref.merged.module@5.4.540/javafx.concurrent.Task$TaskCallable.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

@ThiloteE
Member Author

ThiloteE commented Dec 7, 2021

Still, I am happy that at least SOMETHING is written and I don't have to manually do the click for every single entry. xD A little step. And then another step and another. :) Suddenly we find ourselves on the shoulders of giants.

@ThiloteE
Member Author

ThiloteE commented Dec 7, 2021

@btut
Contributor

btut commented Dec 7, 2021

Wow - thank you so much for providing all those details! Will look into it.
The F6 action in tools now writes XMP and embeds the bib file in the PDF. Can you check if both of them failed for the failing cases?

@ThiloteE
Member Author

ThiloteE commented Dec 7, 2021

Pressing F6 also fails 😶 (For both cases)

@btut
Contributor

btut commented Dec 8, 2021

I am very sorry, that's not what I meant to ask. Let me rephrase. When you press F6 OR tools -> export metadata, you get an error for some files. Can you confirm that XMP metadata is missing from those files? I have the feeling that it is the embedded BibTeX export that fails; in that case XMP should be fine. We would still have to investigate why that is, but it would ease our investigation if you could confirm XMP is fine.
Or is that what you meant by both cases? XMP and embedded BibTeX?

@ThiloteE
Member Author

ThiloteE commented Dec 8, 2021

I am not sure how I can find out whether it was embedded BibTeX or XMP that was written to the file.

  1. More bad news: Now I get an error for every single entry in my library whenever I use tools>write metadata to pdf files or F6, except for the entries that were skipped (because no file was linked). This is very surprising to me, as I might have changed one or two entries but not every single entry in my library. The last command on my end that touched all of them was tools>write metadata to pdf files. I tried the same with F6 and there is no change. To make this clearer: the first export seems to have changed something so that every subsequent export leads to error messages.

    Here is a typical error message for pressing F6 or manually pressing tools>write metadata to pdf files:

    Abdullah202005NNN
    Error while writing 'E:\server-t-150\FAU\3. Semester\AER Demokratische Transformation in Asien\00 Hausarbeit Buente 3\Abdullah (2020) New normal no more. Democratic backsliding in Singapore after 2015.pdf':
     null
    
     Finished writing metadata for 0 file (0 skipped, 1 errors).
    
  2. Good news: It still seems to write something to the files that were not failing before everything started giving error messages!

    I tried something new: I installed exiftool (https://exiftool.org/) and checked for metadata by dragging PDFs onto the executable.

@ThiloteE
Member Author

ThiloteE commented Dec 8, 2021

I found a better command that extracts even more metadata.

"How do I extract absolutely all metadata from a file?"

    By default, duplicate tags, unknown tags, embedded tags, and System tags that require external utilities are not extracted. The main reason for this is performance; extracting these tags will significantly increase processing time for some files. The following command extracts everything possible with ExifTool:

    exiftool -ee3 -U -G3:1 -api requestall=3 -api largefilesupport FILE

    (The -G3:1 option is included in the above command only to give an indication of where the metadata was stored.)

exiftool data for Adetula ALL DATA EXTRACTED.txt
exiftool data for Abdullah ALL DATA EXTRACTED.txt

I think both xmp and bibtex are present in the abdullah file, but NONE in the Adetula file.
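
For checking many PDFs instead of dragging each one onto the executable, the command quoted above can be scripted. This is only a sketch: the helper names `build_exiftool_command` and `dump_all_metadata` are made up for illustration, the options are exactly those from the quoted command, and it assumes `exiftool` is installed and on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_exiftool_command(pdf_path: str) -> List[str]:
    """Build the 'extract everything' command quoted above for one file."""
    return [
        "exiftool",
        "-ee3",
        "-U",
        "-G3:1",
        "-api", "requestall=3",
        "-api", "largefilesupport",
        pdf_path,
    ]


def dump_all_metadata(pdf_path: str) -> Optional[str]:
    """Run exiftool on one file and return its text output.

    Returns None when exiftool is not found on the PATH (assumption:
    the executable is called 'exiftool' on this system).
    """
    if shutil.which("exiftool") is None:
        return None
    result = subprocess.run(
        build_exiftool_command(pdf_path),
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero on warnings; keep the output anyway
    )
    return result.stdout
```

Calling `dump_all_metadata(path)` for each linked file and saving the result to a text file would produce dumps like the two attached above.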

@ThiloteE
Member Author

ThiloteE commented Dec 8, 2021

Heading off to sleep xD Good night.

@Siedlerchr Siedlerchr reopened this Dec 8, 2021
@ThiloteE
Member Author

ThiloteE commented Dec 8, 2021

I did not want to make a new thread for this post as you already have all the data about the pdf file i will talk about here in this thread. This post is about import, rather than exporting metadata. In reference to #8311

I tried the new development version.

JabRef 5.4--2021-12-08--74a4edb
Windows 10 10.0 amd64
Java 16.0.2
JavaFX 17.0.1+1

  1. When using import>XMP-annotated-pdf for the Adetula file, it led to an error message that is not understandable for the average user:

    image

Log File
org.jabref.logic.JabRefException: No entries found. Please make sure you are using the correct import filter.
	at org.jabref@5.4.542/org.jabref.gui.importer.ImportAction.lambda$automatedImport$1(Unknown Source)
	at org.jabref@5.4.542/org.jabref.gui.util.BackgroundTask$1.call(Unknown Source)
	at org.jabref@5.4.542/org.jabref.gui.util.DefaultTaskExecutor$1.call(Unknown Source)
	at org.jabref.merged.module@5.4.542/javafx.concurrent.Task$TaskCallable.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Since there is no XMP metadata attached (at least I think so, according to exiftool), and if that is the cause of the error and also the reason why the entry does not show up in the import dialogue, a message saying that the entry could not be imported because no metadata was found might make more sense.

  2. When dragging the Adetula file into JabRef by mouse, there was no error message and the following bibliographic data emerged (which I think comes from the front page of the pdf):
@InProceedings{EtAl1997c,
  author = {Measuring democracy and and ‘good governance in Africa and A critique of assumptions and methods },
  date   = {1997},
  title  = {chAPter},
  file   = {:E\:/server-t-150/FAU/3. Semester/AER Demokratische Transformation in Asien/00 Hausarbeit Buente 3/Adetula (2011) Measuring democracy and good governance in africa kommentiert.pdf:PDF},
}

@Siedlerchr
Member

So importing should work fine again (except maybe for a better error message).
Exporting still fails?

@ThiloteE
Member Author

ThiloteE commented Dec 10, 2021

Summary of key points for

JabRef 5.4--2021-12-08--74a4edb
Windows 10 10.0 amd64
Java 16.0.2
JavaFX 17.0.1+1

  1. Yes, file>import>import into current library works fine for the entries i tested. The error message could be phrased better. See 8278#issuecomment-989323085
  2. Export (XMP and bibtex) works for most pdf-files, but not for all. Cause unclear.
  3. Export (tools>write metadata to pdf files or F6) works fine on the first run, but leads to an error message on the second and every later run, even when metadata was written successfully. This error message bug should definitely be fixed, as it gives wrong feedback to the user. See 8278#issuecomment-988441874

@ThiloteE
Member Author

One of the files that can't be exported to, Adetula (2011) Measuring democracy and good governance in africa.pdf, shows special write protection. Maybe that is it?

image

The only other thing I can think of is the various PDF formats that could mess things up ... (https://www.pdfa.org/resource/pdf-specification-index/)
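
Both suspicions (PDF version and write protection) can be checked roughly without a PDF library. The sketch below reads the version from the `%PDF-x.y` header and looks for an `/Encrypt` key, which is what a PDF with a permission password normally carries in its trailer. It assumes that the "special write protection" seen in the screenshot is such a PDF permission restriction, which is not confirmed in this thread. The function names are hypothetical, and the byte search is a crude heuristic that can produce false positives and ignores a `/Version` override in the document catalog.

```python
import re
from typing import Optional


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version from the '%PDF-x.y' header, e.g. '1.7', or None."""
    # The header is expected at the very start; searching the first 1024
    # bytes tolerates a little leading junk.
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None


def mentions_encryption(data: bytes) -> bool:
    """Crude heuristic: True if the raw bytes contain an /Encrypt key."""
    return b"/Encrypt" in data


# Tiny fake file contents, just to show the calls.
sample = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\ntrailer\n<< /Encrypt 5 0 R >>\n"
version = pdf_header_version(sample)
protected = mentions_encryption(sample)
```

For a real file, the bytes can be read with `pathlib.Path(path).read_bytes()`. If the three failing PDFs all report an `/Encrypt` key and the working ones do not, that would support the write protection theory.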

@ThiloteE
Member Author

  1. Having two different entries that have the same file linked leads to an error message for both entries. Data is written, but it mixes data from both entries. I suspect the entry that writes second overwrites part of the data from the entry that was written first. The error message could be better: nothing in it indicates such a case.

  2. I checked all my failing entries manually. Because of 1., far fewer PDFs fail than first estimated. In the end I could isolate only 3 of them. They are all write protected in a special way.

    Améry (1973) Wider den Strukturalismus. Das Beispiel des Michel Foucault.pdf
    Fukuyama (1995) Reflections on the end of history. Five years later.pdf
    Adetula (2011) Measuring democracy and good governance in africa.pdf
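
The "two entries, one file" case from point 1 can be detected before anything is written. The following is a minimal sketch, not JabRef code: it assumes the linked files have already been parsed out of each entry's `file` field into plain paths, and the name `find_shared_files` is hypothetical.

```python
import os
from collections import defaultdict
from typing import Dict, List


def find_shared_files(linked_files: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each file that is linked by more than one entry to those entries.

    linked_files maps a citation key to the file paths linked by that entry.
    """
    by_path: Dict[str, List[str]] = defaultdict(list)
    for key, paths in linked_files.items():
        for path in paths:
            # Normalize so that 'a/./b.pdf' and 'a/b.pdf' count as the same file.
            normalized = os.path.normcase(os.path.normpath(path))
            if key not in by_path[normalized]:
                by_path[normalized].append(key)
    return {path: keys for path, keys in by_path.items() if len(keys) > 1}


# X and Y link the same file, Z links a different one.
library = {
    "X": ["papers/./thompson.pdf"],
    "Y": ["papers/thompson.pdf"],
    "Z": ["papers/other.pdf"],
}
shared = find_shared_files(library)
```

With such a map, the export could warn about the affected entries by name, or ask whether to continue, instead of reporting a generic error for both.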

@ThiloteE changed the title from "Writing XMP metadata to PDFs skips my linked pdf file" to "Writing XMP metadata to PDFs skips my linked pdf file. Improve error messages" on Dec 10, 2021
@ThiloteE
Member Author

ThiloteE commented Dec 11, 2021

To do:

  • Find out why JabRef fails to export metadata to the 3 PDFs I uploaded in the last comment.

  • Change the error message. The following error message appears for 3 different phenomena.

    Error while writing 'filepath\FILE.pdf':
     null
    
    Finished writing metadata for 0 file (0 skipped, 1 errors).
    
    1. When two entries are linked to the same file and both are selected for tools>write metadata to pdf files or F6, it pops up for BOTH entries. Metadata is written, though. The entry that is written second overwrites the data from the first entry.

      • Error message should remain, but rephrased to inform user that "During the export process, metadata from two (or more) separate entries were written to the same file".
      • Optional: somehow show which of the entries are linked to the same file.
      • Optional: Create a dialogue/preference/option that allows user to continue, abort or skip the export of metadata for separate entries that write to the same file.
    2. When metadata was written successfully to a file, on every subsequent usage of tools>write metadata to pdf files or F6, the error message pops when used on that specific file again. It is irrelevant, if metadata is written or not written, in this case. It pops regardless. When a "clean new" pdf file is used for the first time, the error message does not emerge.

      • Remove this error message. There should be the normal "OK, metadata was written successfully" instead.
    3. It pops up for all definite metadata export failures. Those are the cases where I cannot observe any change in the metadata attached to the PDF files (the 3 PDFs I uploaded in my last comment).

      • Error message should remain, but rephrased to something like: "Metadata could not be exported/written". Ideally the cause for why metadata could not be written should be included, but if the cause is unknown, the former sentence is fine i think.
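
As a rough illustration of the three cases above getting three different messages, here is a small sketch. The `Outcome` enum, the `describe` function and the exact wording are assumptions made for this example, not existing JabRef messages. The main point is that a missing cause is spelled out as "cause unknown" rather than being printed as `null`.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    WRITTEN = "written"          # case 2: metadata was written (again)
    SHARED_FILE = "shared_file"  # case 1: several entries wrote to one file
    FAILED = "failed"            # case 3: nothing could be written


def describe(
    outcome: Outcome,
    path: str,
    cause: Optional[str] = None,
    other_keys: Sequence[str] = (),
) -> str:
    """Build a user-facing message for one file; never prints a bare null."""
    if outcome is Outcome.WRITTEN:
        return f"Metadata was written successfully to '{path}'."
    if outcome is Outcome.SHARED_FILE:
        others = ", ".join(other_keys) or "unknown entries"
        return (
            f"Metadata from two or more entries was written to the same file "
            f"'{path}' (also linked by: {others})."
        )
    detail = f": {cause}" if cause else " (cause unknown)"
    return f"Metadata could not be written to '{path}'{detail}."


print(describe(Outcome.WRITTEN, "a.pdf"))
print(describe(Outcome.SHARED_FILE, "a.pdf", other_keys=["Y"]))
print(describe(Outcome.FAILED, "a.pdf"))
```

Keeping the outcome separate from the message text also makes it easy to count successes, shared files and real failures separately in the final summary line.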

@ThiloteE
Member Author

  • Rephrase error message for XMP import, when the pdf-file does not have XMP-metadata attached. See comment-989323085. Instead of "No entries found" it should be something like "No XMP-Metadata was found to be attached to the file".
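
A sketch of how the import could tell "this PDF has no XMP packet" apart from other failures and word the message accordingly. It searches the raw bytes for the standard XMP markers (`<x:xmpmeta` and the `<?xpacket begin` wrapper). This is a heuristic under the assumption that the metadata stream is stored uncompressed and unencrypted, which is common but not guaranteed. The function names and the message text are hypothetical.

```python
XMP_MARKERS = (b"<x:xmpmeta", b"<?xpacket begin")


def has_xmp_packet(data: bytes) -> bool:
    """Heuristic: True if the raw file bytes contain a visible XMP packet."""
    return any(marker in data for marker in XMP_MARKERS)


def xmp_import_message(data: bytes, filename: str) -> str:
    """Return a message that names the actual problem when XMP is absent."""
    if not has_xmp_packet(data):
        return f"No XMP metadata was found attached to '{filename}'."
    return f"XMP metadata found in '{filename}'."


# Fake file contents, just to show both branches.
with_xmp = b'%PDF-1.4\n<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?><x:xmpmeta/>'
without_xmp = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n"

message_ok = xmp_import_message(with_xmp, "Abdullah.pdf")
message_missing = xmp_import_message(without_xmp, "Adetula.pdf")
```

A real implementation would ask the PDF library whether the document has a metadata stream instead of scanning bytes, but the message logic would look the same.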

I am done! I think i have tried and written everything i can and know about this issue. 😅 Sorry to have flooded you with posts!

@btut
Contributor

btut commented Dec 13, 2021

Hi! Sorry for the late reply - I did not have anything to add/ask and no time to actually work on the issue.
I do have some time now, so I will start looking into why some files still fail. I'll look into your suspicion that the failures may be caused by write protection or by the inability to overwrite existing metadata.
The error message can also be improved, I agree.

When metadata was written successfully to a file, on every subsequent usage of tools>write metadata to pdf files or F6, the error message pops when used on that specific file again. It is irrelevant, if metadata is written or not written, in this case. It pops regardless. When a "clean new" pdf file is used for the first time, the error message does not emerge.

Here I am unsure what to do. Is overwriting always the best solution? What if the 'old' metadata was not written by JabRef, do we still overwrite it? How would we know?

@ThiloteE
Member Author

ThiloteE commented Dec 14, 2021

Is overwriting always the best solution? What if the 'old' metadata was not written by JabRef, do we still overwrite it? How would we know?

Prior to your pull-request, Jabref would overwrite metadata regardless, the user just was not aware that the export was successful. I think this could be something for a new issue if you deem it important enough. So far, my workflow is not negatively affected by being able to overwrite metadata (except for i. in #8278 (comment), but that is a very specific issue as well.)

@btut
Contributor

btut commented Dec 15, 2021

Prior to your pull-request, Jabref would overwrite metadata regardless

The old way is not always the right way :)

I think this could be something for a new issue if you deem it important enough.

I just wanted to put this out there for discussion. In my opinion, the old way is indeed the right way. The user wants to write metadata -> then write metadata.

3 participants