QueryParser: do not try to parse unbroken words as group terms #331

Closed

Conversation

rsto
Contributor

@rsto rsto commented Jan 25, 2024

Fixes a bug where QueryParser fails with an error when parsing the sequence: term, whitespace, unbroken words, term. The underlying issue is that unbroken words cannot be part of a group according to the lemon grammar.

As the combination of unbroken words and terms is highly unlikely to ever form a multi-term synonym, this patch changes the query parser to leave group term mode after having parsed unbroken words.
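
To illustrate the mode change described above, here is a toy sketch in Python. It is not the Xapian implementation: the `Mode` enum, the `parse` function, the `patched` switch and the CJK-only `UNBROKEN` pattern are all hypothetical names made up for this example, and the real parser is a lemon grammar driven by a C++ lexer. The sketch only models the idea that a run of unbroken words ends group term mode, so the following term starts a fresh group instead of joining a group that the grammar cannot express.

```python
import re
from enum import Enum, auto


class Mode(Enum):
    DEFAULT = auto()
    IN_GROUP = auto()


# Assumption for this sketch: "unbroken words" are runs of CJK ideographs.
UNBROKEN = re.compile(r"[\u4e00-\u9fff]+")


def parse(query, patched=True):
    """Toy model of group term mode; returns a list of (kind, value) pairs."""
    mode = Mode.DEFAULT
    unbroken_in_group = False
    out = []
    for token in query.split():
        if UNBROKEN.fullmatch(token):
            out.append(("unbroken", token))
            if patched:
                # Patched behaviour: leave group term mode after unbroken words.
                mode = Mode.DEFAULT
            elif mode is Mode.IN_GROUP:
                # Old behaviour: stay in the group, which the grammar cannot express.
                unbroken_in_group = True
        elif mode is Mode.IN_GROUP:
            if unbroken_in_group:
                raise ValueError("syntax error: unbroken words inside a term group")
            out[-1][1].append(token)
        else:
            # A plain term outside any group starts a new group.
            out.append(("group", [token]))
            mode = Mode.IN_GROUP
    return out
```

With `patched=True`, `parse("hello 世界 world")` yields three separate items, while `patched=False` raises on the final term, mirroring the failure reported here.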

rsto added a commit to cyrusimap/xapian that referenced this pull request Jan 25, 2024
This is submitted upstream at xapian#331
rsto added a commit to cyrusimap/cyruslibs that referenced this pull request Jan 25, 2024
@ojwb
Contributor

ojwb commented Jan 29, 2024

Thanks. Seems reasonable, but I'd like to have a look at the wider context before merging.

@ojwb
Contributor

ojwb commented Mar 7, 2024

Sorry about taking so long to get around to looking into this.

I wondered if we should also be doing the EMPTY_GROUP_OK thing, but with that added I can get parse errors for cases which should parse, so it seems not.

I think you're right that we don't need to worry about trying multi-word synonyms in this situation - they'd need to include words from different scripts. If someone really wants that to work we can revisit.

I think it'd be good to exercise the IN_GROUP2 case as well as the IN_GROUP one so I'll add variants of your testcases which do that and merge via git.xapian.org in a few minutes.

@ojwb ojwb closed this in 624207c Mar 7, 2024
ojwb pushed a commit that referenced this pull request Mar 7, 2024
Fixes #331

(cherry picked from commit 624207c)