Unable to download some arXiv links if the "eprint" field is missing #7660

Closed
vlawhern opened this issue Apr 22, 2021 · 3 comments · Fixed by #7663
Labels
fetcher, good first issue, type: enhancement

Comments

vlawhern commented Apr 22, 2021

JabRef version 5.3 developmental portable JAR

I am unable to automatically download some arXiv links if the "eprint" field is missing. For example, this BibTeX reference works:

@Article{booth_bayes-trex_2020,
  author        = {Serena Booth and Yilun Zhou and Ankit Shah and Julie Shah},
  journal       = {arXiv:2002.10248v4 [cs]},
  title         = {Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example},
  year          = {2020},
  month         = dec,
  archiveprefix = {arXiv},
  eprint        = {2002.10248},
  url           = {http://arxiv.org/abs/2002.10248v4},
}

but this one doesn't:

@Article{booth_bayes-trex_2020,
  author        = {Serena Booth and Yilun Zhou and Ankit Shah and Julie Shah},
  journal       = {arXiv:2002.10248v4 [cs]},
  title         = {Bayes-TrEx: a Bayesian Sampling Approach to Model Transparency by Example},
  year          = {2020},
  month         = dec,
  archiveprefix = {arXiv},
  url           = {http://arxiv.org/abs/2002.10248v4},
}

The only difference is the missing eprint field.

Interestingly, this doesn't always happen: most of the time JabRef can fetch the arXiv PDF correctly without the eprint field, but for some reason it fails for other arXiv entries like the one above. I suspect it's a formatting issue similar to Issue #7633. The behavior is not consistent, but paper titles containing colons seem to trigger it more often than others.

Perhaps, when the url field is provided, JabRef could use it as a fallback to download the PDF.
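
As a rough illustration of the url fallback idea, here is a minimal sketch of pulling an arXiv identifier out of a URL such as the one in the entry above. The class and method names are hypothetical and are not JabRef code; the pattern only covers new-style identifiers like 2002.10248 and simply drops any version suffix.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JabRef.
final class ArxivUrlSketch {
    // Matches arxiv.org/abs/<id> or arxiv.org/pdf/<id>, with an optional version suffix like v4.
    private static final Pattern ARXIV_URL = Pattern.compile(
            "arxiv\\.org/(?:abs|pdf)/(\\d{4}\\.\\d{4,5})(?:v\\d+)?",
            Pattern.CASE_INSENSITIVE);

    /** Returns the identifier without its version suffix, or empty if none is found. */
    static Optional<String> idFromUrl(String url) {
        if (url == null) {
            return Optional.empty();
        }
        Matcher matcher = ARXIV_URL.matcher(url);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

With the url from the failing entry, `ArxivUrlSketch.idFromUrl("http://arxiv.org/abs/2002.10248v4")` yields the same value as the eprint field of the working entry.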

@tobiasdiez added the fetcher, good first issue, and type: enhancement labels Apr 23, 2021
@tobiasdiez
Member

This can be fixed by running the EprintCleanup on a copy of the entry at

private List<ArXivEntry> searchForEntries(BibEntry entry) throws FetcherException {
before proceeding to get the arXiv id from the eprint field.
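
The shape of that suggestion can be sketched as follows. This is not the JabRef implementation: a plain `Map` stands in for `BibEntry`, the class and method names are hypothetical, and the assumption is that the cleanup's relevant effect is to populate eprint from an arXiv identifier found in other fields (here only url and journal). The point is that the work happens on a copy, so the user's entry is left untouched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the suggested fix; a Map plays the role of a BibEntry.
final class EprintCopySketch {
    // Matches "arxiv.org/abs/<id>" and "arXiv:<id>", ignoring a version suffix such as v4.
    private static final Pattern ARXIV_ID = Pattern.compile(
            "arxiv(?:\\.org/abs/|:)\\s*(\\d{4}\\.\\d{4,5})(?:v\\d+)?",
            Pattern.CASE_INSENSITIVE);

    /** Returns a copy whose eprint field is filled from url or journal when it was missing. */
    static Map<String, String> cleanedCopy(Map<String, String> entry) {
        Map<String, String> copy = new HashMap<>(entry); // never mutate the original entry
        String eprint = copy.get("eprint");
        if (eprint != null && !eprint.trim().isEmpty()) {
            return copy; // eprint already present, nothing to do
        }
        for (String field : new String[] {"url", "journal"}) {
            String value = copy.get(field);
            if (value == null) {
                continue;
            }
            Matcher matcher = ARXIV_ID.matcher(value);
            if (matcher.find()) {
                copy.put("eprint", matcher.group(1));
                break;
            }
        }
        return copy;
    }
}
```

The fetcher would then read the arXiv id from the copy's eprint field, while the original entry keeps its fields exactly as the user wrote them.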

@JavuesZhang
Contributor

Hi, I think I have solved this problem with the help of @tobiasdiez. I will open a PR later.

JavuesZhang added a commit to JavuesZhang/jabref that referenced this issue Apr 23, 2021
1. Run EprintCleanup on a copy of the entry the ArXiv fetcher is fetching before getting arXiv id from the eprint field;
2. Add two test methods. One finds full text with title containing colon and journal, while the other finds full text with title containing colon and url.
Siedlerchr pushed a commit that referenced this issue Apr 23, 2021
1. Run EprintCleanup on a copy of the entry the ArXiv fetcher is fetching before getting arXiv id from the eprint field;
2. Add two test methods. One finds full text with title containing colon and journal, while the other finds full text with title containing colon and url.
Member

tobiasdiez commented Apr 24, 2021

Thanks to @JavuesZhang, this should be fixed in the latest development version. Could you please check the build from http://builds.jabref.org/main/? Thanks! Please remember to make a backup of your library before trying out this version.

Siedlerchr added a commit that referenced this issue Apr 24, 2021
…om.tngtech.archunit-archunit-junit5-api-0.18.0

* upstream/main:
  Fix exception when searching (#7659)
  Fixes #7660 (#7663)
  Fix for issue 5850: Journal abbreviations in UTF-8 not recognized (#7639)
  Fix SSLHandshake Exception by using bypass (#7657)
  Fix for issue 7633: Unable to download arXiv pdfs if Title contains curly brackets (#7652)
  Fix#7195 partly Opacity of disabled icon-buttons