Add logic for parsing references from last page of PDF #11156

koppor · 2024-04-07T08:56:52Z

A scientific paper has a "References" section. Especially when reviewing papers, it would be nice if all references listed there appeared as parsed entries in JabRef. This PR implements #10200 via offline parsing (no online services used!) and is a follow-up to #10437.

How to use:

Pre Condition

Steps

1. Create an entry in JabRef.
2. Attach the PDF to the entry.
3. Open the context menu.
4. Select "Extract references".
5. A dialog for importing is shown.
6. Select "Select all entries" and then "Import entries".

Status

The functionality is implemented. The UI should distinguish "online" and "offline" extraction more clearly; I am currently working on this.
Works for IEEE papers. This functionality will be used for 1,000+ papers in this field, so it is "OK for now". If other reviewers (e.g., of Springer papers) speak up, we can refine the parser.

Mandatory checks

Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
Tests created for changes (if applicable)
Manually tested changed features in running JabRef (always required)
Screenshots added in PR description (for UI changes)
Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

Siedlerchr

see comments

Siedlerchr · 2024-04-07T09:35:43Z

src/main/java/org/jabref/gui/maintable/ExtractReferencesAction.java

+        for (BibEntry importedEntry : result.getDatabase().getEntries()) {
+            count++;
+            Optional<String> citationKey = importedEntry.getCitationKey();
+            if (citationKey.isPresent()) {


citationKey.map(cites::add).orElseGet(() ->

I am not sure the new code is more readable: the result of "orElseGet" needs to be added to the list, too, and the lambda uses the outer variable "count", which is not final, so I needed to wrap it in an anonymous object.

Then better use the original code.
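For readers following this thread, here is a minimal sketch of the two variants under discussion. It is not the PR's code: the class name `CitationKeySketch`, the helper `fallback`, and the label it returns are hypothetical, because the excerpt does not show what is added when the citation key is missing. The second variant copies `count` into a final local instead of wrapping it in an anonymous object, which is one possible way around the "effectively final" restriction for lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CitationKeySketch {

    // Hypothetical fallback; the PR excerpt does not show the real one.
    static String fallback(int count) {
        return "entry " + count;
    }

    // Variant 1: explicit isPresent() check, as in the original code.
    static List<String> withIsPresent(List<Optional<String>> keys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : keys) {
            count++;
            if (citationKey.isPresent()) {
                cites.add(citationKey.get());
            } else {
                cites.add(fallback(count));
            }
        }
        return cites;
    }

    // Variant 2: orElseGet(); its result is added to the list in both cases.
    static List<String> withOrElseGet(List<Optional<String>> keys) {
        List<String> cites = new ArrayList<>();
        int count = 0;
        for (Optional<String> citationKey : keys) {
            count++;
            // A lambda may only capture effectively final locals,
            // so the mutable counter is copied first.
            final int current = count;
            cites.add(citationKey.orElseGet(() -> fallback(current)));
        }
        return cites;
    }

    public static void main(String[] args) {
        List<Optional<String>> keys = List.of(Optional.of("Smith2020"), Optional.empty());
        System.out.println(withIsPresent(keys));
        System.out.println(withOrElseGet(keys));
    }
}
```

Both variants produce the same list; which one reads better is the judgment call made in the comments above.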

Siedlerchr · 2024-04-07T09:37:39Z

src/main/java/org/jabref/logic/importer/fileformat/BibliographyFromPdfImporter.java

+        // Y. Shimosaki et al., “Lattice design for 5 MeV – 125 mA CW RFQ operation in LIPAc”, in Proc. IPAC’19, Mel- bourne, Australia, May 2019, pp. 977-979. doi:10.18429/ JACoW-IPAC2019-MOPTS051
+        int pos = reference.indexOf("doi:");
+        if (pos >= 0) {
+            String doi = reference.substring(pos + 4).trim();


Are you sure that this is always 4 characters?

I am pretty sure that the constant string "doi:" always has 4 characters. But in a parallel universe this might change. Thus, I will change it to "doi:".length() later.
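A minimal sketch of the change discussed here, deriving the offset from the prefix constant instead of hard-coding 4. The class name `DoiExtractionSketch`, the method `extractDoi`, and the constant `DOI_PREFIX` are illustrative assumptions, not names from the PR. The sketch also assumes that the DOI is the last element of the reference and that whitespace inside it comes from PDF line breaks, as in the example in the code comment above.

```java
import java.util.Optional;

public class DoiExtractionSketch {

    private static final String DOI_PREFIX = "doi:";

    static Optional<String> extractDoi(String reference) {
        int pos = reference.indexOf(DOI_PREFIX);
        if (pos < 0) {
            return Optional.empty();
        }
        // Skip the prefix using its own length rather than a magic number.
        String doi = reference.substring(pos + DOI_PREFIX.length()).trim();
        // Assumption: whitespace inside the DOI is a line-break artifact
        // of the PDF text extraction, so it is removed.
        doi = doi.replaceAll("\\s+", "");
        return doi.isEmpty() ? Optional.empty() : Optional.of(doi);
    }

    public static void main(String[] args) {
        // Shortened version of the reference quoted in the diff.
        String reference = "Y. Shimosaki et al., in Proc. IPAC'19, May 2019, "
                + "pp. 977-979. doi:10.18429/ JACoW-IPAC2019-MOPTS051";
        System.out.println(extractDoi(reference).orElse("<no DOI found>"));
    }
}
```

Returning an `Optional` keeps the "no DOI present" case explicit for the caller, in line with the later commit "Use "Optional" instead of null".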

Siedlerchr · 2024-04-07T09:38:37Z

You should resolve the conflicts in the changelog so that the tests run.

github-actions · 2024-04-08T07:15:21Z

The build for this PR is no longer available. Please visit https://builds.jabref.org/main/ for the latest build.

koppor added 3 commits April 7, 2024 09:40

Add logic for parsing references from last page of PDF (Bibliopgraphy…

ffa85c8

…FromPdfImporter) - Support more date formats - Increase log level for issues for date parsing

Remove unused method

3af4e24

Wire in ExtractReferencesAction

cb360ef

github-actions bot reviewed Apr 7, 2024

src/main/java/org/jabref/logic/importer/fileformat/BibliopgraphyFromPdfImporter.java (outdated)

koppor added 2 commits April 7, 2024 10:58

Add CHANGELOG.md entry

3a401be

Fix reviewdog

a944a65

Siedlerchr reviewed Apr 7, 2024

src/main/java/org/jabref/gui/maintable/ExtractReferencesAction.java (outdated)

Siedlerchr reviewed Apr 7, 2024

src/main/java/org/jabref/logic/importer/fileformat/BibliopgraphyFromPdfImporter.java (outdated)

Siedlerchr requested changes Apr 7, 2024

koppor added 2 commits April 7, 2024 11:30

Switch "... (online)" and "... (offline)" in context meno

c01dc0b

Fix filename

0705471

Siedlerchr reviewed Apr 7, 2024

Make Patterns static

37b7959

koppor added 8 commits April 7, 2024 12:24

Refine AuthorListParser for IEEE formatting

2b77e60

Example: "I. Podadera, J. M. Carmona, A. Ibarra, and J. Molla"

Fix authorlist

4912361

Complete test for tua3i2refpage()

ca33931

Use "Optional" instead of null

ce430fa

Rewrite

7adb334

Merge branch 'main' into parse-from-pdf

4169668

Remove empty lines

021e84d

Replace magic number by "constant"

d6ebcba

koppor changed the title ~~[WIP] Add logic for parsing references from last page of PDF~~ Add logic for parsing references from last page of PDF Apr 7, 2024

koppor mentioned this pull request Apr 7, 2024

Create documentation for "Extract references from PDF" JabRef/user-documentation#484

Open

koppor marked this pull request as ready for review April 7, 2024 11:31

koppor enabled auto-merge April 7, 2024 11:31

koppor mentioned this pull request Apr 7, 2024

[WIP] Extract PDF References #10437

Merged

6 tasks

Siedlerchr reviewed Apr 7, 2024

src/main/java/org/jabref/gui/maintable/ExtractReferencesAction.java (outdated)

Revert "Rewrite"

b33346c

This reverts commit 7adb334.

Add more comments

2875851

koppor added the "status: ready-for-review" label Apr 7, 2024

Update CHANGELOG.md

e6e8512

Siedlerchr approved these changes Apr 8, 2024

koppor added this pull request to the merge queue Apr 8, 2024

Merged via the queue into main with commit a0080ba Apr 8, 2024
21 checks passed

koppor deleted the parse-from-pdf branch April 8, 2024 09:05

koppor mentioned this pull request Jul 21, 2024

Improve .bib from .pdf #11522

Merged

6 tasks
