Google search issue in version 3.6 [Fixed in DevBuilds] #1886

Closed
swamptromp opened this issue Aug 29, 2016 · 28 comments

@swamptromp

swamptromp commented Aug 29, 2016

Hello,

Although JabRef 3.6 resolved issues with Google Search, I am still having problems. Google search worked for me for a few hours (I am trying to set up my entire library, so I have been using the search function a lot), but suddenly stopped fetching any search results. In the meantime, Springer and other searches continue to work - it sounds just like the 3.5 issues described on the forum. The first time this happened, I reinstalled JabRef 3.6, and Google search started working again. The problem came up again, though, and this time reinstalling isn't solving it. Is anyone else still having Google search issues? Any suggestions?

Thanks for your help!

// Edit by @matthiasgeiger: The problem with version 3.6 is fixed in the current development builds which are available at https://builds.jabref.org/master (for details see discussion below)

@Siedlerchr
Member

Hello @swamptromp, thanks for your report. I guess you ran into the Google limit: Google blocks your IP for a while if you send too many automated requests (spam/bot protection), as noted in #1694 (comment).

A workaround would be to use the browser add-on JabFox to import the entries.

There is currently nothing we can do and I am not sure if we could display a more specific dialog/error. However, I will create a new issue for that.

@swamptromp
Author

Ah, thanks so much for this @Siedlerchr! Great to know, glad it isn't a longer-term issue.

@swamptromp
Author

Hi, I have one more question. I verified my identity via the Google Scholar website, and although I can now get results through Google Scholar in the browser, JabRef still won't return any Google search results, even after reinstalling it. Does this problem just go away after a set amount of time?

@swamptromp swamptromp reopened this Aug 30, 2016
@Siedlerchr
Member

@swamptromp As it works for me, I can only try to give an explanation. I still think that your JabRef is blocked (it uses a specific user agent).
Your Google Scholar settings in the browser (e.g. the fact that you are successfully authorized) are stored in a cookie. However, JabRef currently does not support any form of authentication and therefore cannot store your account info.

Maybe we can add this in the future, which would ease the problem a bit. In the meantime I would suggest using other fetchers or importing the entries manually. If you have a DOI or ISBN for a paper/book/..., try the DOI to BibTeX and ISBN fetchers, as they directly resolve the number to a BibTeX entry.
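For readers wondering how a DOI can be turned into BibTeX without any scraping, here is a minimal sketch using HTTP content negotiation against the doi.org resolver. It is not JabRef's fetcher code: the class `DoiBibtexRequest` and its methods are hypothetical, and the sketch assumes the DOI's registration agency honours the `application/x-bibtex` media type, which not every agency does.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

/** Hypothetical helper for illustration; not JabRef's fetcher code. */
final class DoiBibtexRequest {

    /** Builds the resolver URL for a DOI, tolerating a leading "doi:" prefix. */
    static String doiUrl(String doi) {
        String cleaned = doi.trim();
        if (cleaned.toLowerCase(Locale.ROOT).startsWith("doi:")) {
            cleaned = cleaned.substring(4).trim();
        }
        return "https://doi.org/" + cleaned;
    }

    /** Prepares (but does not send) a request that asks the resolver for BibTeX. */
    static HttpURLConnection prepare(String doi) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(doiUrl(doi)).openConnection();
        // Assumption: the agency behind this DOI supports BibTeX via content negotiation.
        connection.setRequestProperty("Accept", "application/x-bibtex");
        connection.setInstanceFollowRedirects(true);
        return connection;
    }

    public static void main(String[] args) throws IOException {
        // No network traffic happens here; the connection is only configured.
        HttpURLConnection connection = prepare("doi:10.1000/182");
        System.out.println(connection.getURL()
                + " with Accept: " + connection.getRequestProperty("Accept"));
    }
}
```

Reading `connection.getInputStream()` would send the request. Because this goes to the DOI resolver rather than to Google, the Scholar rate limit discussed in this thread does not apply to it.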

@tobiasdiez
Member

@JabRef/developers did we ever try to get clearance from Google for the user agent "JabRef"? This might be worth a try.

@mlep
Contributor

mlep commented Sep 1, 2016

@JabRef/developers : The help about Google Scholar states that

To unblock your IP, do a Google scholar search in your browser.
You will be asked to show that you are not a robot (a CAPTCHA challenge).

Is this trick currently valid?

@Siedlerchr
Member

@tobiasdiez I remember that the user agent was previously set to JabRef, but that led to the problem with the non-UTF-8 response. Maybe we should contact Google?

@oscargus
Contributor

oscargus commented Sep 1, 2016

@mlep Yes, I think so. It was added quite recently.

@oscargus
Contributor

oscargus commented Sep 1, 2016

@Siedlerchr That was solved by explicitly asking for UTF-8, see #1785
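How #1785 asks for UTF-8 is not shown in this thread. As a rough sketch of the general idea, assuming the `Accept-Charset` request header is used and the response bytes are decoded explicitly as UTF-8, it could look like the following. The class `Utf8Download` and its methods are hypothetical and are not JabRef's `URLDownload`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not JabRef's URLDownload. */
final class Utf8Download {

    /** Tells the server that UTF-8 is the preferred response charset. */
    static void requestUtf8(URLConnection connection) {
        // Assumption: the server honours this header; servers are free to ignore it.
        connection.setRequestProperty("Accept-Charset", "UTF-8");
    }

    /** Reads the whole stream and decodes it as UTF-8 instead of the platform default. */
    static String readAsUtf8(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulated response body containing a non-ASCII author name.
        byte[] body = "M\u00fcller".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAsUtf8(new ByteArrayInputStream(body)).length());
    }
}
```

Decoding with an explicit charset matters because `new String(bytes)` falls back to the platform default, which differs between machines.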

@lenhard lenhard added the search label Sep 2, 2016
@Tercus

Tercus commented Sep 21, 2016

I am running into the same problem. When I visit the google scholar page in my browser I have no problems (I get redirected to scholar.google.de though). When I copy the URL from the error log I DO get the message to prove that I am human.
Even weirder is that my search term does not show up anywhere in the URL.

Error log (newly opened, searched for "test"):

java.io.IOException: Server returned HTTP response code: 503 for URL: https://ipv4.google.com/sorry/IndexRedirect?continue=https://scholar.google.com/scholar%3Fhl%3Den%26oe%3DASCII%26num%3D20%26as_sdt%3D2006&hl=en&q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839) ~[?:1.8.0_60]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440) ~[?:1.8.0_60]
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254) ~[?:1.8.0_60]
    at net.sf.jabref.logic.net.URLDownload.downloadToString(URLDownload.java:123) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.runConfig(GoogleScholarFetcher.java:166) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.processQueryGetPreview(GoogleScholarFetcher.java:82) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GeneralFetcher.lambda$actionPerformed$4(GeneralFetcher.java:191) ~[JabRef-3.6.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
01:30:53.124 [JabRef CachedThreadPool] WARN  net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher - Error fetching from Google Scholar
java.io.IOException: Server returned HTTP response code: 503 for URL: https://ipv4.google.com/sorry/IndexRedirect?continue=https://scholar.google.com/scholar%3Fhl%3Den%26oe%3DASCII%26num%3D20%26as_sdt%3D2006&hl=en&q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1839) ~[?:1.8.0_60]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440) ~[?:1.8.0_60]
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254) ~[?:1.8.0_60]
    at net.sf.jabref.logic.net.URLDownload.downloadToString(URLDownload.java:123) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.runConfig(GoogleScholarFetcher.java:166) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GoogleScholarFetcher.processQueryGetPreview(GoogleScholarFetcher.java:82) ~[JabRef-3.6.jar:?]
    at net.sf.jabref.gui.importer.fetcher.GeneralFetcher.lambda$actionPerformed$4(GeneralFetcher.java:191) ~[JabRef-3.6.jar:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
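The log shows the pattern that a more specific error dialog could key on: an HTTP 503 whose final URL points at a `/sorry/` page on a Google host. A minimal sketch of such a check follows. The class `ScholarBlockDetector` is hypothetical and not part of JabRef, and the assumption that every block looks like this one rests only on the single log above.

```java
import java.net.URI;

/** Hypothetical helper, not part of JabRef: recognises Google's "sorry" redirect. */
final class ScholarBlockDetector {

    /** True if the status code and final URL match the pattern seen in the log above. */
    static boolean looksLikeCaptchaBlock(int statusCode, String finalUrl) {
        if (statusCode != 503) {
            return false;
        }
        URI uri = URI.create(finalUrl);
        String host = uri.getHost();
        String path = uri.getPath();
        // Assumption: blocks are served from a google.com host under /sorry/.
        boolean googleHost = host != null
                && (host.equals("google.com") || host.endsWith(".google.com"));
        return googleHost && path != null && path.startsWith("/sorry/");
    }

    public static void main(String[] args) {
        String blocked = "https://ipv4.google.com/sorry/IndexRedirect"
                + "?continue=https://scholar.google.com/scholar%3Fhl%3Den&hl=en";
        System.out.println(looksLikeCaptchaBlock(503, blocked));
        System.out.println(looksLikeCaptchaBlock(200, "https://scholar.google.com/scholar?q=test"));
    }
}
```

With a check like this, the fetcher could tell the user to solve the CAPTCHA in a browser instead of reporting a generic fetch error.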

@stefan-kolb
Member

That part should be your search term &q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI. Dunno why it is encoded in some way.
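One way to check what the logged URL actually contains is to split and percent-decode its query string. The sketch below does that with a hypothetical helper class, `SorryUrlInspector`. Decoding the `continue` parameter yields the request that was redirected, and in this log that request carries no search text, which fits a failure in `runConfig()` before the search itself is sent. What the `q` value encodes is not established in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration: splits and decodes a URL's query string. */
final class SorryUrlInspector {

    /** Returns the decoded query parameters in their original order. */
    static Map<String, String> queryParameters(String url) {
        Map<String, String> result = new LinkedHashMap<>();
        String rawQuery = URI.create(url).getRawQuery();
        if (rawQuery == null) {
            return result;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.put(decode(name), decode(value));
        }
        return result;
    }

    private static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String logged = "https://ipv4.google.com/sorry/IndexRedirect"
                + "?continue=https://scholar.google.com/scholar%3Fhl%3Den%26oe%3DASCII"
                + "%26num%3D20%26as_sdt%3D2006&hl=en"
                + "&q=CGMSBFhDZhcYrbCMvwUiGQDxp4NLfnqIFz9y9s9bYeUiRVwogv02XcI";
        queryParameters(logged).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```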

@matthiasgeiger
Member

matthiasgeiger commented Sep 28, 2016

I don't have much time at the moment, but I have started to look into it. The GoogleScholarFetcher for fetching entries seems to be broken at the moment.

The runConfig() method produces the error, and without the configuration the results won't have the expected format (BibTeX is not the default citation format; some content is loaded via JS instead).

Fixing the configuration step looks complicated to me, but I'm not a JS expert and have not had the time to investigate in depth what happens when submitting https://scholar.google.com/scholar_settings and how to emulate this in JabRef.

Another approach would be to determine the "ID" of each article shown and then to call https://scholar.google.de/scholar?q=info:**ID_HERE**:scholar.google.com/&output=cite&scirp=0&hl=de to get the links for the BibTeX format; e.g.: https://scholar.google.de/scholar?q=info:RExzBa3OlkQJ:scholar.google.com/&output=cite&scirp=0&hl=de
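The URL pattern from the second approach can be written down as a small helper. This is only a sketch of the URL construction described above; the class `ScholarCiteUrl` is hypothetical, extracting the IDs from the result page is left out, and whether Google keeps serving this endpoint is an open assumption.

```java
/** Hypothetical helper: builds the "cite" URL from the pattern quoted above. */
final class ScholarCiteUrl {

    /** Inserts an article ID into the URL template given in the comment above. */
    static String citeUrl(String articleId) {
        if (articleId == null || articleId.trim().isEmpty()) {
            throw new IllegalArgumentException("articleId must not be empty");
        }
        // Host and parameters are copied verbatim from the example in this thread.
        return "https://scholar.google.de/scholar?q=info:" + articleId.trim()
                + ":scholar.google.com/&output=cite&scirp=0&hl=de";
    }

    public static void main(String[] args) {
        // "RExzBa3OlkQJ" is the example ID used in the thread.
        System.out.println(citeUrl("RExzBa3OlkQJ"));
    }
}
```

Each citation lookup is one more request to Google, so this approach would count against the same rate limit that triggered the block in the first place.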

~~Someone wants to investigate this? Perhaps @JabRef/stupro? 😜 ~~

I'm working on it.

@lenhard
Member

lenhard commented Sep 28, 2016

A very controversial suggestion: The google scholar fetcher is a pain in the ass. If they change their format / interface every two weeks, it is not something you can reasonably program against. Maybe we should drop support for it.

@matthiasgeiger
Member

matthiasgeiger commented Sep 28, 2016

Okay... I got it working again. However, Google has now implemented a rather strict limit on its side, so it is only possible to show and import the first 10 search results... (See #2082)

Fixed version is available at http://builds.jabref.org/fix-googlescholar/

@koppor
Member

koppor commented Sep 28, 2016

Screen scraping is still the most popular method of doing EAI. The most common reason is the lack of an API, as in our case. As long as we find someone to update the code, we should keep it. (Also refs #1833)

@Siedlerchr
Member

I would also say that Google Scholar is an important fetcher, which is used by many users.

@mlep
Contributor

mlep commented Sep 28, 2016

I concur, and I want to stress that Google Scholar is (unfortunately for JabRef developers) a primary source of data: accessible for free and covering all scientific fields.

@lenhard
Member

lenhard commented Sep 28, 2016

I had expected no other reply :-)

We will continue to have to do weird hacks in the fetcher and manage constant questions from people when it breaks, but what choice do we have...

@tobiasdiez
Member

support Microsoft's Bing academics 😸 (they have at least an API)

@Siedlerchr
Member

@tobiasdiez
Member

Well, "what is dead may never die, but rises again, harder and stronger."

Conclusion
In comparison to the Web of Science and Scopus, Microsoft Academic covers a far larger number of publications that are listed in Google Scholar and – importantly – covers all journal publications and books that are also covered in Google Scholar. This suggests that Microsoft Academic has excellent coverage of what are usually considered to be the most important academic outputs: journal articles and books.

http://www.harzing.com/download/mas.pdf

I think Microsoft changed the way Bing Academics is fed with data. Previously, it used its own crawler, which was then suspended. Now it is connected to Bing's main crawler and thus gets good, up-to-date data. Otherwise it wouldn't be listed as part of the Cognitive Services, which until recently were a kind of test field or place for beta-stage services and are now moving into "production".

@Tercus

Tercus commented Sep 30, 2016

It would be nice to have a plugin system for the whole search part. While it is nice to have a large choice of different search APIs, most are unknown to people not working in the respective field, and every change in an API requires a new version of JabRef. If the search part used plugins, you could alter a fetcher yourself or write your own. You could also split the maintenance of the search APIs from the main project and have JabRef download them on demand... or something like that.

I also don't understand why some of the big players, such as WorldCat and many others, haven't been added.

@koppor
Member

koppor commented Sep 30, 2016

@Tercus I think you have often heard this in the context of open source projects, but I'll try to rephrase it for JabRef. The JabRef team consists solely of volunteers who spend their free time on JabRef. They could be finishing their PhD or postdoc phase instead, but they invest time in JabRef because they like to. There is no funding agency, and the donations are not used to cover our living costs. They are far from enough to do so, and they are also not enough to pay someone to do the work we don't like to do.

Regarding plugins, we decided to drop support for them in version 3.0. This was not because plugins are bad per se, but because they increase our maintenance effort tremendously. We decided that reducing the number of issues and having more (other) features in JabRef is more important. Moreover, having no plugin support ensures that all functions in JabRef remain up to date with the rest of the JabRef code. Thus, changing internal data structures does not break any plugin, because we ensure that everything works during an internal change.

Having the code integrated in JabRef ensures that we do not rely on maintenance of third parties. The experience we have in JabRef is that people are working for JabRef and its plugins during their PhD and then move on to new things. Thus, it is not ensured that a plugin is maintained for a long time. Including it in JabRef really increases the probability that it is maintained.

Using TravisCI and offering all builds at https://builds.jabref.org/ ensures that fixes in the fetchers are available to the public as fast as possible.

We are also working on integrating all plugins into JabRef (see #152). And we did that for the GVK Fetcher (#378).

Regarding WorldCat - the JabRef user @ChristopherHackett volunteered to work on that: #1065.

Regarding other fetchers, I think the answer is partially given in the first paragraph. If some maintenance work were taken off our hands (some hard tasks are listed in #111), that would leave us time to do these things in JabRef. We are also aware of more than 500 feature requests from the old SourceForge tracker (see https://github.com/JabRef/jabref/wiki/FeatureRequests-Sorted).

What would help us is if someone helped maintain our help pages (see https://github.com/JabRef/help.jabref.org/blob/gh-pages/CONTRIBUTING.md for a guide and https://github.com/JabRef/help.jabref.org/issues for a list of issues to start with), provided answers in the Discourse forum, and transferred answers from Discourse to our help pages. Maybe this could be a way for you to support JabRef?

@Tercus

Tercus commented Oct 4, 2016

@koppor Thank you for your explanation. I'll try to give back to the project as best I can. I understand that a plugin system would be more work, but at the same time I think it would be easier for people like me to contribute to a plugin written in a scripting language than to figure out how JabRef works. It is quite daunting, to be honest. But I'll still try.

I work in the social sciences and so far none of the scrapers has been useful. Google Scholar worked until I got spam-banned because I was searching too often (apparently?). So right now I am stuck with searching Google Scholar in my browser and then importing the BibTeX code into JabRef.

I'll try to get WorldCat running, maybe I'm lucky....

@mlep
Contributor

mlep commented Oct 4, 2016

@Tercus: Are you aware of jabFox? It is quite handy. See https://addons.mozilla.org/en-us/firefox/addon/jabfox/

@matthiasgeiger
Member

And the current development builds are also capable of searching Google Scholar again (with the limitation that only the first 10 results of a query are shown).

@DesBw

DesBw commented Oct 4, 2016

Thank you...the new version has fixed it

@matthiasgeiger matthiasgeiger added this to the v3.7 milestone Oct 12, 2016
@matthiasgeiger matthiasgeiger changed the title from "Google search issue in version 3.6" to "Google search issue in version 3.6 [Fixed in DevBuilds]" Oct 12, 2016
@stefan-kolb stefan-kolb added and removed the status: waiting-for-feedback label Oct 19, 2016
@stefan-kolb
Member

@koppor wants to keep #2173 open for now.
