FASTMAIL: QueryParser: do not try to parse unbroken words as group terms
This is submitted upstream at xapian#331

--- Original commit message:
Fixes a bug where QueryParser fails with an error when parsing
the sequence of: term, whitespace, unbroken words, term. The
underlying issue is that unbroken words cannot be part of a
group according to the lemon grammar.

As the combination of unbroken words and terms is highly unlikely
to ever form a multi-term synonym, this patch changes the query
parser to leave group term mode after having parsed unbroken words.
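
The failure mode and the fix described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Xapian's lexer or grammar: the names `Piece`, `Tok`, `lex` and `grammar_accepts` are hypothetical, the two group modes (`IN_GROUP`, `IN_GROUP2`) are collapsed into one, and the grammar is reduced to a single assumed rule, namely that a group term may not directly follow unbroken words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model; this is NOT the real Xapian lexer.
enum class Mode { Default, InGroup };  // IN_GROUP and IN_GROUP2 collapsed into one
enum class Tok { Term, GroupTerm, UnbrokenWords };

// One whitespace-separated piece of the query string.
struct Piece {
    bool needs_word_break;  // true for text written without spaces between words
    std::string text;
};

// Turn pieces into tokens. `reset_after_unbroken` toggles the behaviour
// this commit adds: leaving group mode after emitting unbroken words.
std::vector<Tok> lex(const std::vector<Piece>& pieces, bool reset_after_unbroken) {
    std::vector<Tok> out;
    Mode mode = Mode::Default;
    for (const Piece& p : pieces) {
        if (p.needs_word_break) {
            out.push_back(Tok::UnbrokenWords);
            if (reset_after_unbroken) mode = Mode::Default;  // drop out of group mode
            continue;
        }
        if (mode == Mode::InGroup) {
            out.push_back(Tok::GroupTerm);
        } else {
            out.push_back(Tok::Term);
            mode = Mode::InGroup;  // later plain terms may continue the group
        }
    }
    return out;
}

// Assumed toy grammar rule: a group term must directly follow a term or
// another group term, never unbroken words.
bool grammar_accepts(const std::vector<Tok>& toks) {
    for (std::size_t i = 0; i < toks.size(); ++i) {
        if (toks[i] != Tok::GroupTerm) continue;
        if (i == 0 || toks[i - 1] == Tok::UnbrokenWords) return false;
    }
    return true;
}
```

In this model, the pieces for `x 我 y` lex to Term, UnbrokenWords, GroupTerm when `reset_after_unbroken` is false, which the toy grammar rejects. With the reset enabled the last token becomes a plain Term and the sequence is accepted.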
rsto committed Jan 25, 2024
1 parent 218e5c4 commit 52974ad
Showing 2 changed files with 5 additions and 0 deletions.
3 changes: 3 additions & 0 deletions xapian-core/queryparser/queryparser.lemony
@@ -1506,6 +1506,9 @@ phrased_term:

     if (needs_word_break) {
         Parse(&parser, UNBROKEN_WORDS, term_obj, &state);
+        // Drop out of IN_GROUP mode.
+        if (mode == IN_GROUP || mode == IN_GROUP2)
+            mode = DEFAULT;
         if (it == end) break;
         continue;
     }
2 changes: 2 additions & 0 deletions xapian-core/tests/api_queryparser.cc
@@ -741,6 +741,8 @@ static const test test_or_queries[] = {
     { "title:久有 归 天愿", "((XT久@1 AND XT有@1) OR 归@2 OR (天@3 AND 愿@3))" },

     { "h众ello万众", "(Zh@1 OR 众@2 OR Zello@3 OR (万@4 AND 众@4))" },
+    { "x 我y", "(Zx@1 OR 我@2 OR Zy@3)" }, // WORD_BREAK ends group term
+    { "x 我 y", "(Zx@1 OR 我@2 OR Zy@3)" }, // WORD_BREAK ends group term

     // Korean splits some words by whitespace, and there is no available tool
     // to crosscheck Korean word splits for these tests. So the expected values