Google search issue in version 3.6 [Fixed in DevBuilds] #1886

swamptromp · 2016-08-29T19:14:26Z

Hello,

Although JabRef 3.6 resolved the issues with Google Search, I am still having problems. Google search worked for me for a few hours (I am trying to set up my entire library, so I have been using the search function a lot), but then it suddenly stopped returning any search results. Meanwhile, Springer and other searches continue to work; it sounds just like the 3.5 issues described on the forum. The first time this happened, I reinstalled JabRef 3.6 and Google search started working again. The problem came up again, though, and this time reinstalling doesn't solve it. Is anyone else still having Google search issues? Any suggestions?

Thanks for your help!

// Edit by @matthiasgeiger: The problem with version 3.6 is fixed in the current development builds, which are available at https://builds.jabref.org/master (for details, see the discussion below)

Siedlerchr · 2016-08-29T19:53:47Z

Hello @swamptromp, thanks for your report. I guess you ran into the Google limit: Google blocks your IP for a while if you make too many automated requests (spam/bot protection), as noted in #1694 (comment).

A solution would be to use the browser add-on JabFox to import the entries.

There is currently nothing we can do about it, and I am not sure whether we could display a more specific dialog/error. However, I will create a new issue for that.
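A more specific dialog would first need to recognise the block. The following is a minimal sketch, not JabRef code: the class `ScholarBlockDetector` and its message text are hypothetical, and it assumes the block shows up as an HTTP 503 response (as in the error log reported in this thread) or a 429 response whose URL points to Google's `/sorry/` page.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Hypothetical helper, not part of JabRef: recognises Google's bot-protection
 * ("sorry") page so that a fetcher could show a specific error message.
 */
final class ScholarBlockDetector {

    private ScholarBlockDetector() {
    }

    /**
     * Returns true if a failed request looks like Google's block page.
     * Assumption: the block arrives as HTTP 503 (seen in this thread's logs)
     * or 429 (Too Many Requests), and the final URL has a path under /sorry/.
     */
    static boolean isBlocked(int responseCode, String finalUrl) {
        if (responseCode != 503 && responseCode != 429) {
            return false;
        }
        try {
            String path = new URI(finalUrl).getPath();
            return path != null && path.startsWith("/sorry/");
        } catch (URISyntaxException e) {
            // An unparsable URL cannot be classified as a block page.
            return false;
        }
    }

    /** Text a hypothetical error dialog could show instead of a generic error. */
    static String userMessage(int responseCode, String finalUrl) {
        if (isBlocked(responseCode, finalUrl)) {
            return "Google Scholar has temporarily blocked automated requests from your IP address. "
                    + "Try again later, or use Google Scholar in your browser.";
        }
        return "Error while fetching from Google Scholar (HTTP " + responseCode + ").";
    }
}
```

A fetcher would presumably pass the response code together with the connection's final URL (for `HttpURLConnection`, `getURL()` after redirects have been followed), which is the same URL that appears in the exception message.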

swamptromp · 2016-08-29T20:14:04Z

Ah, thanks so much for this @Siedlerchr! Great to know, glad it isn't a longer-term issue.

swamptromp · 2016-08-30T13:27:50Z

Hi, I have one more question. I verified my identity via the Google Scholar website, and although I now get results on Google Scholar, JabRef still returns nothing for Google searches, even after reinstalling it. Does this problem just go away after a set amount of time?

Siedlerchr · 2016-08-30T15:36:33Z

@swamptromp Since it works for me, I can only try to give an explanation. I still think that your JabRef is blocked (it uses a specific user agent).
Your Google Scholar settings in the browser (e.g. the fact that you are successfully authorized) are stored in a cookie. However, JabRef currently does not support any form of authentication and therefore cannot store your account info.

Maybe we can add this in the future, which would partly resolve the problem. In the meantime, I would suggest using other fetchers or importing manually. If you have a DOI or ISBN for a paper/book/..., try the DOI-to-BibTeX or ISBN-to-BibTeX fetchers, as they directly resolve the identifier to a BibTeX entry.
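For the DOI route, the public doi.org resolver supports HTTP content negotiation, so a BibTeX record can be requested directly. A minimal sketch, assuming the DOI's registration agency (e.g. Crossref or DataCite) serves `application/x-bibtex`; `DoiBibtexSketch` is a hypothetical name, and this is not necessarily how JabRef's DOI fetcher works. `readAllBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not JabRef's DOI fetcher: asks the doi.org resolver
 * for BibTeX via content negotiation (Accept: application/x-bibtex).
 */
final class DoiBibtexSketch {

    private DoiBibtexSketch() {
    }

    /** Prepares (but does not yet send) a request for the BibTeX record of a DOI. */
    static HttpURLConnection openBibtexRequest(String doi) throws IOException {
        URI uri;
        try {
            // The multi-argument URI constructor percent-encodes characters that are illegal in a path.
            uri = new URI("https", "doi.org", "/" + doi, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a usable DOI: " + doi, e);
        }
        HttpURLConnection connection = (HttpURLConnection) uri.toURL().openConnection();
        connection.setRequestProperty("Accept", "application/x-bibtex");
        return connection;
    }

    /** Sends the request and returns the response body decoded as UTF-8. */
    static String fetchBibtex(String doi) throws IOException {
        HttpURLConnection connection = openBibtexRequest(doi);
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            connection.disconnect();
        }
    }
}
```

For a registered DOI, `fetchBibtex(doi)` would return the BibTeX text; if the resolver has no record, `getInputStream()` throws an `IOException`. Unlike Google Scholar, this needs no screen scraping.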

tobiasdiez · 2016-08-31T03:13:42Z

@JabRef/developers have we tried to get clearance from Google for the user agent "JabRef"? This might be worth a try.

mlep · 2016-09-01T09:48:17Z

@JabRef/developers: The help page on Google Scholar states:

To unblock your IP, do a Google scholar search in your browser.
You will be asked to show that you are not a robot (a CAPTCHA challenge).

Is this trick still valid?

Siedlerchr · 2016-09-01T10:05:24Z

@tobiasdiez I remember that the user agent was previously set to JabRef, but that led to the problem with non-UTF-8 responses. Maybe we should contact Google?

oscargus · 2016-09-01T10:35:38Z

@mlep Yes, I think so. It was added quite recently.

oscargus · 2016-09-01T10:37:59Z

@Siedlerchr That was solved by explicitly asking for UTF-8, see #1785
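Explicitly asking for UTF-8 while keeping the `JabRef` user agent could look roughly like the sketch below. It is not the change made in #1785: `Utf8DownloadSketch` is hypothetical, and since a server is free to ignore the `Accept-Charset` header, the response is also decoded as UTF-8 explicitly rather than with the platform default charset. `readAllBytes` requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch, not the code from #1785: identify as "JabRef",
 * ask for UTF-8, and decode the body as UTF-8 regardless of platform defaults.
 */
final class Utf8DownloadSketch {

    private Utf8DownloadSketch() {
    }

    /** Opens (without connecting) a connection carrying the headers described above. */
    static URLConnection prepare(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("User-Agent", "JabRef");
        // A hint only: the server may still answer in another charset.
        connection.setRequestProperty("Accept-Charset", "UTF-8");
        return connection;
    }

    /** Reads the whole stream and decodes it as UTF-8. */
    static String readUtf8(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
}
```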

Tercus · 2016-09-21T23:34:41Z

I am running into the same problem. When I visit the Google Scholar page in my browser, I have no problems (although I get redirected to scholar.google.de). When I copy the URL from the error log, I DO get the message asking me to prove that I am human.
Even weirder is that my search term does not show up anywhere in the URL.

Error log (newly opened, searched for "test"):

01:30:53.124 [JabRef CachedThreadPool] WARN  net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher - Error fetching from Google Scholar
java.io.IOException: Server returned HTTP response code: 503 for URL: https://ipv4.google.com/sorry/IndexRedirect?continue=https://scholar.google.com/scholar%3Fhl%3Den%26oe%3DASCII%26num%3D20%26as_sdt%3D2006&hl=en&q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839) ~[?:1.8.0_60]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440) ~[?:1.8.0_60]
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254) ~[?:1.8.0_60]
    at net.sf.jabref.logic.net.URLDownload.downloadToString(URLDownload.java:123) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.runConfig(GoogleScholarFetcher.java:166) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.processQueryGetPreview(GoogleScholarFetcher.java:82) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GeneralFetcher.lambda$actionPerformed$4(GeneralFetcher.java:191) ~[JabRef-3.6.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]

stefan-kolb · 2016-09-22T09:02:43Z

That part should be your search term: &q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI. I don't know why it is encoded in some way.
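To see which request was actually blocked, one can decode the query parameters of the "sorry" URL from the log; the originally requested address is carried in the `continue` parameter. A minimal sketch, with the hypothetical helper name `SorryUrlInspector` (not JabRef code):

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of JabRef: splits a URL's query string into decoded parameters. */
final class SorryUrlInspector {

    private SorryUrlInspector() {
    }

    static Map<String, String> queryParameters(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        // getRawQuery() keeps the percent-escapes, so the top-level '&' separators
        // are not confused with the encoded "%26" inside the continue URL.
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(decode(key), decode(value));
        }
        return params;
    }

    // Decodes percent-escapes as UTF-8 (note: URLDecoder also turns '+' into a space).
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

Applied to the URL in the log above, `continue` decodes to `https://scholar.google.com/scholar?hl=en&oe=ASCII&num=20&as_sdt=2006`. That request carries no search term, which fits the stack trace: the 503 was raised in `runConfig()`, apparently before the actual query was sent.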

matthiasgeiger · 2016-09-28T07:48:05Z

I don't have much time at the moment, but I have started to look into it. The GoogleScholarFetcher currently seems to be broken for fetching entries.

The runConfig() method produces the error, and without the configuration the results do not have the expected format (BibTeX is not the default citation format; instead, some content is loaded via JavaScript).

Fixing the configuration step looks complicated to me, but I'm not a JS expert and haven't had the time to investigate in depth what happens when submitting https://scholar.google.com/scholar_settings and how to emulate this in JabRef.

Another approach would be to determine the "ID" of each displayed article and then to call https://scholar.google.de/scholar?q=info:**ID_HERE**:scholar.google.com/&output=cite&scirp=0&hl=de to get the links for the BibTeX format, e.g.: https://scholar.google.de/scholar?q=info:RExzBa3OlkQJ:scholar.google.com/&output=cite&scirp=0&hl=de

~~Does someone want to investigate this? Perhaps @JabRef/stupro? 😜~~

I'm working on it.
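The cite-link approach above boils down to two steps: collect the result IDs from the results page, then build one cite URL per ID. A minimal sketch, not JabRef code: `ScholarCiteLinks` is a hypothetical name, and the assumption that each result exposes its ID in a `data-cid="..."` attribute is unverified, since Google Scholar's markup is undocumented and changes often.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of the "cite link" approach; not JabRef code.
 * ASSUMPTION: each search result carries its ID in a data-cid="..." attribute.
 */
final class ScholarCiteLinks {

    private static final Pattern RESULT_ID = Pattern.compile("data-cid=\"([A-Za-z0-9_-]+)\"");

    private ScholarCiteLinks() {
    }

    /** Collects the result IDs in the order they appear in the results page. */
    static List<String> extractIds(String resultsHtml) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = RESULT_ID.matcher(resultsHtml);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    /** Builds the cite URL for one result, following the example URL in the comment above. */
    static String citeUrl(String id) {
        return "https://scholar.google.de/scholar?q=info:" + id
                + ":scholar.google.com/&output=cite&scirp=0&hl=de";
    }
}
```

The page at each cite URL contains the links for the BibTeX format, so fetching an entry takes a further request per result; each of these presumably also counts toward Google's request limit.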

lenhard · 2016-09-28T08:17:47Z

A very controversial suggestion: the Google Scholar fetcher is a pain in the ass. If they change their format/interface every two weeks, it is not something you can reasonably program against. Maybe we should drop support for it.

matthiasgeiger · 2016-09-28T10:07:58Z

Okay... I got it working again. However, Google now enforces a rather strict limit, so it is only possible to show and import the first 10 search results... (See #2082)

A fixed version is available at http://builds.jabref.org/fix-googlescholar/

koppor · 2016-09-28T10:37:48Z

Screen scraping is still the most popular method of doing EAI. The most common reason is the lack of an API, as in our case. As long as we find someone to update the code, we should keep it. (Also refs #1833)

Siedlerchr · 2016-09-28T11:19:23Z

I would also say that Google Scholar is an important fetcher, which is used by many users.

mlep · 2016-09-28T11:38:45Z

I concur, and stress that Google Scholar is (unfortunately for JabRef developers) a primary source of data: free to access and covering all scientific fields.

lenhard · 2016-09-28T14:55:32Z

I had expected no other reply :-)

We will continue to have to do weird hacks in the fetcher and manage constant questions from people when it breaks, but what choice do we have...

tobiasdiez · 2016-09-28T14:56:33Z

Support Microsoft's Bing Academics 😸 (at least they have an API)

Siedlerchr · 2016-09-28T15:12:00Z

@tobiasdiez it's literally dead: http://blogs.nature.com/news/2014/05/the-decline-and-fall-of-microsoft-academic-search.html

tobiasdiez · 2016-09-28T15:36:32Z

Well, "what is dead may never die, but rises again, harder and stronger."

Conclusion
In comparison to the Web of Science and Scopus, Microsoft Academic covers a far larger number of publications that are listed in Google Scholar and – importantly – covers all journal publications and books that are also covered in Google Scholar. This suggests that Microsoft Academic has excellent coverage of what are usually considered to be the most important academic outputs: journal articles and books.

http://www.harzing.com/download/mas.pdf

I think Microsoft changed how Bing Academics is fed with data. Previously, it used its own crawler, which was then suspended. Now it is connected to Bing's main crawler and thus gets good, up-to-date data. Otherwise it wouldn't be listed as part of the Cognitive Services, which until recently were a kind of test field for beta-stage services and are now moving into "production".

Tercus · 2016-09-30T00:54:54Z

It would be nice to have a plugin system for the whole search functionality. While it is nice to have a large choice of different search APIs, most are unknown to people not working in that field, and every change in an API requires a new version of JabRef. If the search part used plugins, you could alter them yourself and write your own. Also, you could split the maintenance of the search APIs from the main project and just have JabRef download them on demand... or something like that.

I also don't understand why some of the big players, such as WorldCat and many others, aren't included.

koppor · 2016-09-30T05:55:57Z

@Tercus You have probably heard this often in the context of open source projects, but I'll try to rephrase it for JabRef. The JabRef team consists only of volunteers who spend their free time on JabRef. They could be working on their PhD or postdoc instead, but they invest time in JabRef because they like to. There is no funding agency, and the donations are not used to cover our living costs; they are far from enough to do so. Nor are they enough to pay someone to do the work we don't like to do.

Regarding plugins, we decided to drop support for them in version 3.0. This was not because plugins are bad per se, but because they increase our maintenance effort tremendously. We decided that reducing the number of issues and having more (other) features in JabRef is more important. Moreover, having no plugin support ensures that all functions in JabRef stay up to date with the rest of the JabRef code: changing internal data structures cannot break a plugin, because we make sure that everything still works whenever we make an internal change.

Having the code integrated in JabRef ensures that we do not rely on third parties for maintenance. Our experience with JabRef is that people work on JabRef and its plugins during their PhD and then move on to new things. Thus, there is no guarantee that a plugin is maintained for a long time. Including it in JabRef considerably increases the probability that it stays maintained.

Using TravisCI and offering all builds at https://builds.jabref.org/ ensures that fixes in the fetchers are available to the public as fast as possible.

We are also working on integrating all plugins into JabRef (see #152), and we have already done that for the GVK fetcher (#378).

Regarding WorldCat - the JabRef user @ChristopherHackett volunteered to work on that: #1065.

Regarding other fetchers, I think the first paragraph partially answers this. If someone took some maintenance work off our hands (some hard tasks are listed in #111), that would leave us time to do these things in JabRef. We are also aware of more than 500 feature requests from the old SourceForge tracker (see https://github.com/JabRef/jabref/wiki/FeatureRequests-Sorted).

What would help us is if someone helped maintain our help pages (see https://github.com/JabRef/help.jabref.org/blob/gh-pages/CONTRIBUTING.md for a guide and https://github.com/JabRef/help.jabref.org/issues for a list of issues to start with), provided answers in the Discourse forum, and transferred answers from Discourse to our help page. Maybe this could be a way for you to support JabRef?

Tercus · 2016-10-04T00:31:29Z

@koppor Thank you for your explanation. I'll try to give back to the project as best I can. I understand that a plugin system would be more work, but at the same time I think it would be easier for people like me to contribute to a plugin written in a scripting language than to figure out how JabRef works. It is quite daunting, to be honest. But I'll still try.

I work in the social sciences, and so far none of the scrapers have been useful. Google Scholar worked until I got spam-banned because I was searching too often (apparently?). So right now I am stuck using Google Scholar in my browser, searching there and then importing the BibTeX code into JabRef.

I'll try to get WorldCat running, maybe I'm lucky....

mlep · 2016-10-04T06:53:49Z

@Tercus: Are you aware of JabFox? It is quite handy. See https://addons.mozilla.org/en-us/firefox/addon/jabfox/

matthiasgeiger · 2016-10-04T08:47:38Z

And the current development builds are also capable of searching Google Scholar again (with the limitation of showing only the first 10 results of a query).

DesBw · 2016-10-04T14:40:58Z

Thank you... the new version has fixed it.

stefan-kolb · 2016-10-19T14:31:33Z

@koppor wants to keep #2173 open for now.

swamptromp closed this as completed Aug 29, 2016

swamptromp reopened this Aug 30, 2016

lenhard added the search label Sep 2, 2016

lenhard mentioned this issue Sep 23, 2016

"Error while fetching from Google Scholar" and potential solution to the problem #2046

Closed

matthiasgeiger mentioned this issue Sep 28, 2016

Quick fix google scholar entry fetching #2082

Merged


AEgit mentioned this issue Sep 29, 2016

Performance issue with new search #1993

Closed

matthiasgeiger mentioned this issue Oct 12, 2016

Google Scholar search randomly stopped working #2156

Closed

matthiasgeiger added the fixed-in-devBuilds label Oct 12, 2016

matthiasgeiger added this to the v3.7 milestone Oct 12, 2016

matthiasgeiger changed the title ~~Google search issue in version 3.6~~ Google search issue in version 3.6 [Fixed in DevBuilds] Oct 12, 2016

koppor mentioned this issue Oct 18, 2016

Google Scholar fetching not working #2173

Closed

stefan-kolb added and then removed the status: waiting-for-feedback label Oct 19, 2016

stefan-kolb closed this as completed Oct 19, 2016

stefan-kolb added the status: duplicate label Oct 19, 2016

matthiasgeiger removed the fixed-in-devBuilds label Nov 14, 2016

systemoperator mentioned this issue Feb 13, 2020

Integrates Google Scholar's citation count functionality, a websocket client for JabRef and other extensions/fixes JabRef/JabRef-Browser-Extension#131

Merged
