Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/enable lucene query parsing #6799

Merged
merged 25 commits into from
Aug 31, 2020
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
c8ebc91
Squashed 'src/main/resources/csl-styles/' changes from bf698acec7..82…
Aug 1, 2020
f5f02cf
Merge commit 'c8ebc91d4799e59518ac887f69e584040cf29994'
Aug 1, 2020
37c2d64
Squashed 'src/main/resources/csl-styles/' changes from 827b986621..eb…
Aug 15, 2020
b85ebe3
Merge commit '37c2d64e449ef617bb98c0ad3b39a903d8f3d886' into master
Aug 15, 2020
f10ea5f
Merge remote-tracking branch 'upstream/master'
DominikVoigt Aug 25, 2020
172b113
Enable Query String into ComplexQuery parsing using lucene.
DominikVoigt Aug 26, 2020
90a629e
Add ADR
DominikVoigt Aug 27, 2020
3af3781
Add Changelog entry
DominikVoigt Aug 27, 2020
26851a7
Merge remote-tracking branch 'upstream/master' into feature/enable-lu…
DominikVoigt Aug 27, 2020
8468ad4
Move Changelog entry
DominikVoigt Aug 27, 2020
d592a47
Increment ADR number
DominikVoigt Aug 27, 2020
4610451
Remove Optional from Lists
DominikVoigt Aug 27, 2020
86c6576
Rename slipped occurences of converter
DominikVoigt Aug 27, 2020
e2447f6
Merge remote-tracking branch 'upstream/master' into feature/enable-lu…
DominikVoigt Aug 28, 2020
0642d49
Mapping of unknown fields onto default field
DominikVoigt Aug 30, 2020
11afbe9
Ignores unknown fields
DominikVoigt Aug 30, 2020
6fde881
Merge branch 'master' into feature/enable-lucene-query-parsing
koppor Aug 30, 2020
e27ebcc
Merge remote-tracking branch 'upstream/master' into feature/enable-lu…
DominikVoigt Aug 31, 2020
239095e
Fix checkstyle issues
DominikVoigt Aug 31, 2020
467cbfc
Merge branch 'master' into feature/enable-lucene-query-parsing
koppor Aug 31, 2020
ff6a7d3
Merge branch 'feature/enable-lucene-query-parsing' of https://github.…
DominikVoigt Aug 31, 2020
542d3a7
Apply suggestions from code review
DominikVoigt Aug 31, 2020
4b2ab98
Handle year-range parsing in a more robust way
DominikVoigt Aug 31, 2020
8b3b7cd
Move extracted method into builder
DominikVoigt Aug 31, 2020
ec60734
Merge branch 'feature/enable-lucene-query-parsing' of https://github.…
DominikVoigt Aug 31, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 0 additions & 48 deletions src/main/java/org/jabref/logic/importer/QueryConverter.java

This file was deleted.

53 changes: 53 additions & 0 deletions src/main/java/org/jabref/logic/importer/QueryParser.java
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
package org.jabref.logic.importer;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;

import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.core.QueryNodeException;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

/**
* This class converts a query string written in lucene syntax into a complex search query.
*
* For simplicity this is limited to fielded data and the boolean AND operator.
*/
public class QueryParser {

/**
* Converts the given query string into a complex query using lucene.
* Note: For unique fields, the alphabetically first instance in the query string is used in the complex query.
*
* @param queryString The given query string
* @return A complex query containing all fields of the query string
* @throws QueryNodeException Error during parsing
*/
public Optional<ComplexSearchQuery> convertQueryStringIntoComplexQuery(String queryString) {
DominikVoigt marked this conversation as resolved.
Show resolved Hide resolved
try {
ComplexSearchQuery.ComplexSearchQueryBuilder builder = ComplexSearchQuery.builder();

StandardQueryParser parser = new StandardQueryParser();
Query luceneQuery = parser.parse(queryString, "default");
Set<Term> terms = new HashSet<>();
// This implementation collects all terms from the leaves of the query tree independent of the internal boolean structure
// If further capabilities are required in the future the visitor and ComplexSearchQuery has to be adapted accordingly.
QueryVisitor visitor = QueryVisitor.termCollector(terms);
luceneQuery.visit(visitor);

List<Term> sortedTerms = new ArrayList<>(terms);
sortedTerms.sort(Comparator.comparing(Term::text));
builder.terms(sortedTerms);
return Optional.of(builder.build());
} catch (QueryNodeException ex) {
return Optional.empty();
}
}
}
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
package org.jabref.logic.importer;

import java.util.List;
import java.util.Optional;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
import org.jabref.model.entry.BibEntry;
Expand All @@ -28,7 +27,7 @@ public interface SearchBasedFetcher extends WebFetcher {
*/
default List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
// Default Implementation behaves like perform search using the default field phrases as query
DominikVoigt marked this conversation as resolved.
Show resolved Hide resolved
Optional<List<String>> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return performSearch(defaultPhrases.map(strings -> String.join(" ", strings)).orElse(""));
List<String> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return performSearch(String.join(" ", defaultPhrases));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,6 @@
import java.net.URL;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

import org.jabref.logic.importer.fetcher.ComplexSearchQuery;
import org.jabref.model.cleanup.Formatter;
Expand Down Expand Up @@ -85,8 +84,8 @@ private List<BibEntry> getBibEntries(URL urlForQuery) throws FetcherException {

default URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException, FetcherException {
// Default Implementation behaves like getURLForQuery using the default field phrases as query
DominikVoigt marked this conversation as resolved.
Show resolved Hide resolved
Optional<List<String>> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return this.getURLForQuery(defaultPhrases.map(strings -> String.join(" ", strings)).orElse(""));
List<String> defaultPhrases = complexSearchQuery.getDefaultFieldPhrases();
return this.getURLForQuery(String.join(" ", defaultPhrases));
}

/**
Expand Down
6 changes: 3 additions & 3 deletions src/main/java/org/jabref/logic/importer/fetcher/ArXiv.java
Original file line number Diff line number Diff line change
Expand Up @@ -264,12 +264,12 @@ public List<BibEntry> performSearch(String query) throws FetcherException {
@Override
public List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery) throws FetcherException {
List<String> searchTerms = new ArrayList<>();
complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("au:" + author)));
complexSearchQuery.getTitlePhrases().ifPresent(title -> searchTerms.add("ti:" + title));
complexSearchQuery.getAuthors().forEach(author -> searchTerms.add("au:" + author));
complexSearchQuery.getTitlePhrases().forEach(title -> searchTerms.add("ti:" + title));
complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("jr:" + journal));
// Since ArXiv API does not support year search, we ignore the year related terms
complexSearchQuery.getToYear().ifPresent(year -> searchTerms.add(year.toString()));
complexSearchQuery.getDefaultFieldPhrases().ifPresent(searchTerms::addAll);
searchTerms.addAll(complexSearchQuery.getDefaultFieldPhrases());
String complexQueryString = String.join(" AND ", searchTerms);
return performSearch(complexQueryString);
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -31,16 +31,32 @@ private ComplexSearchQuery(List<String> defaultField, List<String> authors, List
this.singleYear = singleYear;
}

public Optional<List<String>> getDefaultFieldPhrases() {
return Optional.ofNullable(defaultField);
public static ComplexSearchQuery fromTerms(Collection<Term> terms) {
ComplexSearchQueryBuilder builder = ComplexSearchQuery.builder();
terms.forEach(term -> {
String termText = term.text();
switch (term.field().toLowerCase()) {
case "author" -> builder.author(termText);
case "title" -> builder.titlePhrase(termText);
case "journal" -> builder.journal(termText);
case "year" -> builder.singleYear(Integer.valueOf(termText));
case "year-range" -> builder.fromYearAndToYear(Integer.valueOf(termText.split("-")[0]), Integer.valueOf(termText.split("-")[1]));
koppor marked this conversation as resolved.
Show resolved Hide resolved
case "default" -> builder.defaultFieldPhrase(termText);
}
});
return builder.build();
}

public List<String> getDefaultFieldPhrases() {
return defaultField;
}

public Optional<List<String>> getAuthors() {
return Optional.ofNullable(authors);
public List<String> getAuthors() {
return authors;
}

public Optional<List<String>> getTitlePhrases() {
return Optional.ofNullable(titlePhrases);
public List<String> getTitlePhrases() {
return titlePhrases;
}

public Optional<Integer> getFromYear() {
Expand All @@ -63,10 +79,35 @@ public static ComplexSearchQueryBuilder builder() {
return new ComplexSearchQueryBuilder();
}

@Override
public boolean equals(Object o) {
if (this == o)
return true;
if (o == null || getClass() != o.getClass())
return false;

ComplexSearchQuery that = (ComplexSearchQuery) o;

// Just check for set equality, order does not matter
if (!(getDefaultFieldPhrases().containsAll(that.getDefaultFieldPhrases()) && that.getDefaultFieldPhrases().containsAll(getDefaultFieldPhrases())))
return false;
if (!(getAuthors().containsAll(that.getAuthors()) && that.getAuthors().containsAll(getAuthors())))
return false;
if (!(getTitlePhrases().containsAll(that.getTitlePhrases()) && that.getTitlePhrases().containsAll(getTitlePhrases())))
return false;
if (getFromYear().isPresent() ? !getFromYear().equals(that.getFromYear()) : that.getFromYear().isPresent())
return false;
if (getToYear().isPresent() ? !getToYear().equals(that.getToYear()) : that.getToYear().isPresent())
return false;
if (getSingleYear().isPresent() ? !getSingleYear().equals(that.getSingleYear()) : that.getSingleYear().isPresent())
return false;
return getJournal().isPresent() ? getJournal().equals(that.getJournal()) : !that.getJournal().isPresent();
}

public static class ComplexSearchQueryBuilder {
private List<String> defaultFieldPhrases;
private List<String> authors;
private List<String> titlePhrases;
private List<String> defaultFieldPhrases = new ArrayList<>();
private List<String> authors = new ArrayList<>();
private List<String> titlePhrases = new ArrayList<>();
private String journal;
private Integer fromYear;
private Integer toYear;
Expand All @@ -79,9 +120,6 @@ public ComplexSearchQueryBuilder defaultFieldPhrase(String defaultFieldPhrase) {
if (Objects.requireNonNull(defaultFieldPhrase).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
if (Objects.isNull(defaultFieldPhrases)) {
this.defaultFieldPhrases = new ArrayList<>();
}
// Strip all quotes before wrapping
this.defaultFieldPhrases.add(String.format("\"%s\"", defaultFieldPhrase.replace("\"", "")));
return this;
Expand All @@ -94,9 +132,6 @@ public ComplexSearchQueryBuilder author(String author) {
if (Objects.requireNonNull(author).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
if (Objects.isNull(authors)) {
this.authors = new ArrayList<>();
}
// Strip all quotes before wrapping
this.authors.add(String.format("\"%s\"", author.replace("\"", "")));
return this;
Expand All @@ -109,9 +144,6 @@ public ComplexSearchQueryBuilder titlePhrase(String titlePhrase) {
if (Objects.requireNonNull(titlePhrase).isBlank()) {
throw new IllegalArgumentException("Parameter must not be blank");
}
if (Objects.isNull(titlePhrases)) {
this.titlePhrases = new ArrayList<>();
}
// Strip all quotes before wrapping
this.titlePhrases.add(String.format("\"%s\"", titlePhrase.replace("\"", "")));
return this;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -205,9 +205,9 @@ public List<BibEntry> performComplexSearch(ComplexSearchQuery complexSearchQuery

private String constructComplexQueryString(ComplexSearchQuery complexSearchQuery) {
List<String> searchTerms = new ArrayList<>();
complexSearchQuery.getDefaultFieldPhrases().ifPresent(searchTerms::addAll);
complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("author:" + author)));
complexSearchQuery.getTitlePhrases().ifPresent(phrases -> searchTerms.add("allintitle:" + String.join(" ", phrases)));
searchTerms.addAll(complexSearchQuery.getDefaultFieldPhrases());
complexSearchQuery.getAuthors().forEach(author -> searchTerms.add("author:" + author));
searchTerms.add("allintitle:" + String.join(" ", complexSearchQuery.getTitlePhrases()));
complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("source:" + journal));
// API automatically ANDs the terms
return String.join(" ", searchTerms);
Expand Down
14 changes: 9 additions & 5 deletions src/main/java/org/jabref/logic/importer/fetcher/IEEE.java
Original file line number Diff line number Diff line change
Expand Up @@ -236,11 +236,15 @@ public Optional<HelpFile> getHelpPage() {
public URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URISyntaxException, MalformedURLException {
URIBuilder uriBuilder = new URIBuilder("https://ieeexploreapi.ieee.org/api/v1/search/articles");
uriBuilder.addParameter("apikey", API_KEY);
complexSearchQuery.getDefaultFieldPhrases().ifPresent(defaultFieldPhrases -> uriBuilder.addParameter("querytext", String.join(" AND ", defaultFieldPhrases)));
complexSearchQuery.getAuthors().ifPresent(authors ->
uriBuilder.addParameter("author", String.join(" AND ", authors)));
complexSearchQuery.getTitlePhrases().ifPresent(articleTitlePhrases ->
uriBuilder.addParameter("article_title", String.join(" AND ", articleTitlePhrases)));
if (!complexSearchQuery.getDefaultFieldPhrases().isEmpty()) {
uriBuilder.addParameter("querytext", String.join(" AND ", complexSearchQuery.getDefaultFieldPhrases()));
}
if (!complexSearchQuery.getAuthors().isEmpty()) {
uriBuilder.addParameter("author", String.join(" AND ", complexSearchQuery.getAuthors()));
}
if (!complexSearchQuery.getAuthors().isEmpty()) {
uriBuilder.addParameter("article_title", String.join(" AND ", complexSearchQuery.getTitlePhrases()));
}
complexSearchQuery.getJournal().ifPresent(journalTitle -> uriBuilder.addParameter("publication_title", journalTitle));
complexSearchQuery.getFromYear().map(String::valueOf).ifPresent(year -> uriBuilder.addParameter("start_year", year));
complexSearchQuery.getToYear().map(String::valueOf).ifPresent(year -> uriBuilder.addParameter("end_year", year));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -171,12 +171,12 @@ public URL getComplexQueryURL(ComplexSearchQuery complexSearchQuery) throws URIS

private String constructComplexQueryString(ComplexSearchQuery complexSearchQuery) {
List<String> searchTerms = new ArrayList<>();
complexSearchQuery.getAuthors().ifPresent(authors -> authors.forEach(author -> searchTerms.add("name:" + author)));
complexSearchQuery.getTitlePhrases().ifPresent(titlePhrases -> titlePhrases.forEach(title -> searchTerms.add("title:" + title)));
complexSearchQuery.getAuthors().forEach(author -> searchTerms.add("name:" + author));
complexSearchQuery.getTitlePhrases().forEach(title -> searchTerms.add("title:" + title));
complexSearchQuery.getJournal().ifPresent(journal -> searchTerms.add("journal:" + journal));
// Since Springer API does not support year range search, we ignore formYear and toYear and use "singleYear" only
complexSearchQuery.getSingleYear().ifPresent(year -> searchTerms.add("year:" + year.toString()));
complexSearchQuery.getDefaultFieldPhrases().ifPresent(searchTerms::addAll);
searchTerms.addAll(complexSearchQuery.getDefaultFieldPhrases());
return String.join(" AND ", searchTerms);
}

Expand Down
59 changes: 0 additions & 59 deletions src/test/java/org/jabref/logic/importer/QueryConverterTest.java

This file was deleted.

Loading