Fix Google Scholar fetcher for downloading a single entry #7075

Closed
wants to merge 16 commits into from
11 changes: 11 additions & 0 deletions docs/advanced-reading/fetchers.md
@@ -14,6 +14,17 @@ Fetchers are the implementation of the [search using online services](https://do

On Windows, you have to log-off and log-on to let IntelliJ know about the environment variable change. Execute the gradle task "processResources" in the group "others" within IntelliJ to ensure the values have been correctly written. Now, the fetcher tests should run without issues.

## Change the log levels to enable debugging
Review comment (Member): You can also just start JabRef with `-debug` as a program argument.


1. Open `src/test/resources/log4j2-test.xml`
2. Add the following XML snippet:

```xml
<Logger name="org.jabref.logic.importer.fetcher" level="DEBUG">
    <AppenderRef ref="CONSOLE"/>
</Logger>
```

## Background on embedding the keys in JabRef

The keys are placed into the `build.properties` file.
56 changes: 56 additions & 0 deletions src/main/java/org/jabref/gui/dialogs/CaptchaSolverDialog.java
@@ -0,0 +1,56 @@
package org.jabref.gui.dialogs;

import java.util.concurrent.CountDownLatch;

import javafx.application.Platform;
import javafx.scene.control.ButtonType;
import javafx.scene.web.WebView;

import org.jabref.gui.util.BaseDialog;
import org.jabref.logic.l10n.Localization;
import org.jabref.logic.net.URLDownload;

import org.jsoup.helper.W3CDom;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.w3c.dom.Document;

public class CaptchaSolverDialog extends BaseDialog<String> implements org.jabref.logic.importer.fetcher.CaptchaSolver {

    public static final Logger LOGGER = LoggerFactory.getLogger(CaptchaSolverDialog.class);

    private WebView webView;

    public CaptchaSolverDialog() {
        super();
        this.setTitle(Localization.lang("Captcha Solver"));
        getDialogPane().getButtonTypes().add(ButtonType.CLOSE);
        getDialogPane().lookupButton(ButtonType.CLOSE).setVisible(true);

        webView = new WebView();
        webView.getEngine().setJavaScriptEnabled(true);
        webView.getEngine().setUserAgent(URLDownload.USER_AGENT);
        getDialogPane().setContent(webView);
    }

    @Override
    public String solve(String queryURL) {
        // slim implementation of https://news.kynosarges.org/2014/05/01/simulating-platform-runandwait/
        final CountDownLatch doneLatch = new CountDownLatch(1);
Review comment (Member): You need to listen for the web engine's ready event; see the preview tab viewer, where we add the highlighting JS stuff.

Review comment (Member):

    previewView.getEngine().getLoadWorker().stateProperty().addListener((observable, oldValue, newValue) -> {

        if (newValue != Worker.State.SUCCEEDED) {
            return;
        }

See https://openjfx.io/javadoc/11/javafx.web/javafx/scene/web/WebEngine.html

Reply (Member Author):

> You need to listen for the web engine ready event, see the preview Tab viewer where we add this highlight JS stuff

Does this happen synchronously? The interface for the captcha solver is designed to be synchronous. Otherwise, all fetchers need to be changed.

I'll be away for the next few days anyway, so you are free to experiment 😅


        Platform.runLater(() -> {
            webView.getEngine().load(queryURL);
            // For the quick implementation, we ignore the result
            // Later, at "webView", we directly extract it from the web view
            this.showAndWait();
            doneLatch.countDown();
        });
        try {
            doneLatch.await();
            Document document = webView.getEngine().getDocument();
            return W3CDom.asString(document, null);
        } catch (InterruptedException e) {
            LOGGER.error("Issues with the UI", e);
        }
        return "";
    }
}
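
The review thread in this file asks whether an asynchronous "page finished loading" event can be reconciled with the synchronous `solve(String)` contract. The following is a minimal sketch of one way to bridge the two, using only `java.util.concurrent` and no JavaFX. `AsyncLoader` and `BlockingBridge` are hypothetical names invented for this illustration; `AsyncLoader` merely stands in for an engine that reports completion through a callback, and the timeout parameter is an assumption that the PR code does not have.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in for an engine that reports completion asynchronously.
interface AsyncLoader {
    void load(String url, Consumer<String> onLoaded);
}

// Hypothetical helper; not part of JabRef.
final class BlockingBridge {

    private BlockingBridge() {
    }

    // Blocks the calling thread until the loader delivers a result or the timeout expires.
    // Returns an empty string on timeout or interruption.
    // Do not call this from the thread that delivers the callback:
    // the wait could then only end by timeout.
    static String loadAndWait(AsyncLoader loader, String url, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        AtomicReference<String> result = new AtomicReference<>("");
        loader.load(url, html -> {
            result.set(html);   // publish the result first
            done.countDown();   // then release the waiting thread
        });
        try {
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return "";
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
            return "";
        }
        return result.get();
    }
}
```

With this shape, the fetchers could keep calling a blocking method while the completion signal comes from a listener, provided the blocking call runs on a background thread rather than on the UI thread.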
@@ -18,11 +18,13 @@

import org.jabref.gui.DialogService;
import org.jabref.gui.StateManager;
import org.jabref.gui.dialogs.CaptchaSolverDialog;
import org.jabref.gui.importer.ImportEntriesDialog;
import org.jabref.gui.util.BackgroundTask;
import org.jabref.logic.importer.ParserResult;
import org.jabref.logic.importer.SearchBasedFetcher;
import org.jabref.logic.importer.WebFetchers;
import org.jabref.logic.importer.fetcher.GoogleScholar;
import org.jabref.logic.l10n.Localization;
import org.jabref.model.strings.StringUtil;
import org.jabref.preferences.PreferencesService;
@@ -43,6 +45,7 @@ public WebSearchPaneViewModel(PreferencesService preferencesService, DialogServi
this.dialogService = dialogService;
this.stateManager = stateManager;

WebFetchers.setCaptchaSolver(new CaptchaSolverDialog());
SortedSet<SearchBasedFetcher> allFetchers = WebFetchers.getSearchBasedFetchers(preferencesService.getImportFormatPreferences());
fetchers.setAll(allFetchers);

@@ -107,6 +110,9 @@ public void search() {
task = BackgroundTask.wrap(() -> new ParserResult(activeFetcher.performSearch(getQuery().trim())))
.withInitialMessage(Localization.lang("Processing %0", getQuery().trim()));
task.onFailure(dialogService::showErrorDialogAndWait);
if (activeFetcher instanceof GoogleScholar) {
task.showToUser(true);
}

ImportEntriesDialog dialog = new ImportEntriesDialog(stateManager.getActiveDatabase().get(), task);
dialog.setTitle(activeFetcher.getName());
69 changes: 57 additions & 12 deletions src/main/java/org/jabref/logic/importer/QueryParser.java
@@ -1,6 +1,7 @@
package org.jabref.logic.importer;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
@@ -11,7 +12,10 @@

import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.core.nodes.FieldQueryNode;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.queryparser.flexible.standard.parser.StandardSyntaxParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

@@ -26,24 +30,65 @@ public class QueryParser {
* Parses the given query string into a complex query using lucene.
* Note: For unique fields, the alphabetically and numerically first instance in the query string is used in the complex query.
*
* @param query The given query string
* @param query The given query string. E.g. <code>BPMN 2.0</code> or <code>author:"Kopp" AND title:"BPEL4Chor"</code>
* @return A complex query containing all fields of the query string
*/
public Optional<ComplexSearchQuery> parseQueryStringIntoComplexQuery(String query) {
try {
StandardQueryParser parser = new StandardQueryParser();
Query luceneQuery = parser.parse(query, "default");
Set<Term> terms = new HashSet<>();
// This implementation collects all terms from the leaves of the query tree independent of the internal boolean structure
// If further capabilities are required in the future the visitor and ComplexSearchQuery has to be adapted accordingly.
QueryVisitor visitor = QueryVisitor.termCollector(terms);
luceneQuery.visit(visitor);

List<Term> sortedTerms = new ArrayList<>(terms);
sortedTerms.sort(Comparator.comparing(Term::text).reversed());
return Optional.of(ComplexSearchQuery.fromTerms(sortedTerms));
StandardSyntaxParser parser = new StandardSyntaxParser();
QueryNode luceneQuery = parser.parse(query, "default");
QueryToComplexSearchQueryTransformator transformator = new QueryToComplexSearchQueryTransformator();
return Optional.of(transformator.handle(luceneQuery));
} catch (QueryNodeException | IllegalStateException | IllegalArgumentException ex) {
return Optional.empty();
}
}

    private static class QueryToComplexSearchQueryTransformator {

        ComplexSearchQuery.ComplexSearchQueryBuilder builder;

        public ComplexSearchQuery handle(QueryNode query) {
            builder = ComplexSearchQuery.builder();
            transform(query);
            return builder.build();
        }

        public void transform(QueryNode query) {
            if (query instanceof FieldQueryNode) {
                transform(((FieldQueryNode) query));
                return;
            }
            query.getChildren().forEach(this::transform);
        }

        private void transform(FieldQueryNode query) {
            final String fieldValue = query.getTextAsString();
            switch (query.getFieldAsString()) {
                case "author" -> {
                    builder.author(fieldValue);
                }
                case "journal" -> {
                    builder.journal(fieldValue);
                }
                case "title" -> {
                    builder.titlePhrase(fieldValue);
                }
                case "year" -> {
                    builder.singleYear(Integer.valueOf(fieldValue));
                }
                case "year-range" -> {
                    String[] years = fieldValue.split("-");
                    if (years.length != 2) {
                        return;
                    }
                    builder.fromYearAndToYear(Integer.valueOf(years[0]), Integer.valueOf(years[1]));
                }
                default -> {
                    builder.defaultFieldPhrase(fieldValue);
                }
            }
        }

    }
}
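
The `year-range` branch above silently ignores values that do not split into two parts, while non-numeric parts raise a `NumberFormatException`, which is a subclass of the `IllegalArgumentException` caught in `parseQueryStringIntoComplexQuery`. The sketch below isolates that parsing step and makes both outcomes explicit through an `Optional`. `YearRange` and `YearRangeParser` are hypothetical names for illustration only and are not part of JabRef; trimming whitespace is an added assumption.

```java
import java.util.Optional;

// Hypothetical value type for illustration; not part of JabRef.
final class YearRange {
    final int from;
    final int to;

    YearRange(int from, int to) {
        this.from = from;
        this.to = to;
    }
}

// Hypothetical helper; not part of JabRef.
final class YearRangeParser {

    private YearRangeParser() {
    }

    // Returns an empty Optional unless the value splits on "-" into
    // exactly two parts that both parse as integers.
    static Optional<YearRange> parse(String value) {
        String[] years = value.split("-");
        if (years.length != 2) {
            return Optional.empty();
        }
        try {
            int from = Integer.parseInt(years[0].trim());
            int to = Integer.parseInt(years[1].trim());
            return Optional.of(new YearRange(from, to));
        } catch (NumberFormatException e) {
            // Non-numeric or empty part, e.g. "a-b" or "-2020".
            return Optional.empty();
        }
    }
}
```

A caller could then write `YearRangeParser.parse(fieldValue).ifPresent(r -> builder.fromYearAndToYear(r.from, r.to))`, which keeps a malformed range from discarding the rest of the query.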
13 changes: 11 additions & 2 deletions src/main/java/org/jabref/logic/importer/WebFetchers.java
@@ -11,6 +11,7 @@
import org.jabref.logic.importer.fetcher.ApsFetcher;
import org.jabref.logic.importer.fetcher.ArXiv;
import org.jabref.logic.importer.fetcher.AstrophysicsDataSystem;
import org.jabref.logic.importer.fetcher.CaptchaSolver;
import org.jabref.logic.importer.fetcher.CiteSeer;
import org.jabref.logic.importer.fetcher.CollectionOfComputerScienceBibliographiesFetcher;
import org.jabref.logic.importer.fetcher.CompositeSearchBasedFetcher;
@@ -31,6 +32,7 @@
import org.jabref.logic.importer.fetcher.MathSciNet;
import org.jabref.logic.importer.fetcher.MedlineFetcher;
import org.jabref.logic.importer.fetcher.Medra;
import org.jabref.logic.importer.fetcher.NoneCaptchaSolver;
import org.jabref.logic.importer.fetcher.OpenAccessDoi;
import org.jabref.logic.importer.fetcher.RfcFetcher;
import org.jabref.logic.importer.fetcher.ScienceDirect;
@@ -51,6 +53,13 @@ public class WebFetchers {
private WebFetchers() {
}

// Default CaptchaSolver is the useless one (which just does not throw an exception)
private static CaptchaSolver captchaSolver = new NoneCaptchaSolver();

public static void setCaptchaSolver(CaptchaSolver captchaSolver) {
WebFetchers.captchaSolver = captchaSolver;
}

public static Optional<IdBasedFetcher> getIdBasedFetcherForField(Field field, ImportFormatPreferences preferences) {
IdBasedFetcher fetcher;

@@ -96,7 +105,7 @@ public static SortedSet<SearchBasedFetcher> getSearchBasedFetchers(ImportFormatP
set.add(new ZbMATH(importFormatPreferences));
// see https://github.com/JabRef/jabref/issues/5804
// set.add(new ACMPortalFetcher(importFormatPreferences));
set.add(new GoogleScholar(importFormatPreferences));
set.add(new GoogleScholar(importFormatPreferences, captchaSolver));
set.add(new DBLPFetcher(importFormatPreferences));
set.add(new SpringerFetcher());
set.add(new CrossRef());
@@ -170,7 +179,7 @@ public static Set<FulltextFetcher> getFullTextFetchers(ImportFormatPreferences i
fetchers.add(new ApsFetcher());
// Meta search
fetchers.add(new JstorFetcher(importFormatPreferences));
fetchers.add(new GoogleScholar(importFormatPreferences));
fetchers.add(new GoogleScholar(importFormatPreferences, captchaSolver));
fetchers.add(new OpenAccessDoi());

return fetchers;
12 changes: 12 additions & 0 deletions src/main/java/org/jabref/logic/importer/fetcher/CaptchaSolver.java
@@ -0,0 +1,12 @@
package org.jabref.logic.importer.fetcher;

public interface CaptchaSolver {

    /**
     * Instructs the user to solve the captcha presented at the given URL.
     *
     * @param queryURL the URL to query
     * @return the HTML content after solving the captcha
     */
    String solve(String queryURL);
}
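
`WebFetchers` falls back to a `NoneCaptchaSolver`, whose source is not shown in this part of the diff. The sketch below shows what such a no-op default and the static setter injection might look like. The body of `NoneCaptchaSolver` (returning an empty string) is an assumption, and `SolverRegistry` is a hypothetical stand-in for the static holder in `WebFetchers`; the interface is redeclared only so that the sketch compiles on its own.

```java
import java.util.Objects;

// Redeclared from the diff so that the sketch compiles on its own.
interface CaptchaSolver {
    String solve(String queryURL);
}

// Assumed behaviour: do nothing and return no content instead of throwing.
final class NoneCaptchaSolver implements CaptchaSolver {
    @Override
    public String solve(String queryURL) {
        return "";
    }
}

// Hypothetical stand-in for the static holder in WebFetchers.
final class SolverRegistry {

    // volatile so that a solver registered on one thread is visible to others.
    private static volatile CaptchaSolver captchaSolver = new NoneCaptchaSolver();

    private SolverRegistry() {
    }

    static void setCaptchaSolver(CaptchaSolver solver) {
        captchaSolver = Objects.requireNonNull(solver);
    }

    static CaptchaSolver getCaptchaSolver() {
        return captchaSolver;
    }
}
```

The point of the pattern is that the logic layer depends only on the interface: the GUI registers its dialog-based solver once, as `WebSearchPaneViewModel` does with `CaptchaSolverDialog`, while tests and headless use keep the no-op default.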