Add search option for "contains any text separated by ORs" #224

Closed
5 tasks done
fedarko opened this issue Sep 26, 2019 · 0 comments
Labels
enhancement New feature or request important Things that are critical for getting Qurro in a working/useful state
fedarko (Collaborator) commented Sep 26, 2019

progress

  • implement initial version of code
  • add basic tests
  • add corner-case tests (e.g. no |s)
  • either fix or document the trimming discrepancy with normal text searching in the no-| case
    • For reference: normal text searching doesn't remove any whitespace, not even leading or trailing whitespace. This searching mode, however, trims whitespace around each search option, so an input of abc | def | ghi|jk searches for matches against abc, def, ghi, and jk.
  • document this (it would be good to mention in the Moving Pictures tutorial that you can select multiple arrows at once by separating their IDs with |s)
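
For illustration, the trimming behavior and the no-| discrepancy noted in the list above can be sketched as follows. This is a minimal sketch, not Qurro's actual code: the function names `splitOrQuery`, `plainMatch`, and `orMatch` are hypothetical, and dropping empty fragments is an assumption this issue does not specify.

```javascript
// Hypothetical helper (not Qurro's actual code): split a query on "|"
// and trim the whitespace around each fragment.
function splitOrQuery(query) {
    return query
        .split("|")
        .map((fragment) => fragment.trim())
        // Assumption: empty fragments (e.g. from "a||b") are dropped.
        .filter((fragment) => fragment.length > 0);
}

// Normal text searching: the query is used as-is, whitespace included.
function plainMatch(text, query) {
    return text.includes(query);
}

// OR-style searching: match if any trimmed fragment occurs in the text.
function orMatch(text, query) {
    return splitOrQuery(query).some((fragment) => text.includes(fragment));
}

const fragments = splitOrQuery("abc | def | ghi|jk");
// fragments is ["abc", "def", "ghi", "jk"]

// The no-| discrepancy: a query with surrounding whitespace only
// matches in the OR-style mode, because that mode trims it first.
const plainResult = plainMatch("xabcx", " abc "); // false
const orResult = orMatch("xabcx", " abc "); // true
```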

description

The name of this search option is subject to change.

Essentially, this would let you search by multiple possible arbitrary substrings, without the limitations inherent to the current separated text fragment searching. The inputs would be separated by ORs, perhaps similar to conditions in a SQL query.

An example input string would be

g__thing; s__speciesname OR s__coli OR p__Proteobacteria

And any features with taxa containing g__thing; s__speciesname, or s__coli, or p__Proteobacteria would match. (We could just split up the input query by ORs, then trim off any surrounding whitespace from each substring.)
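
The split-then-trim idea above could look roughly like this. It is a sketch under stated assumptions rather than Qurro's implementation: the feature records, the `Taxonomy` field name, and the function `filterByAnyFragment` are all hypothetical, and the literal word OR (surrounded by whitespace) is used as the divider, as in the example query.

```javascript
// Hypothetical feature records; the IDs, the "Taxonomy" field name, and
// the taxonomy strings are made up for illustration.
const features = [
    { id: "F1", Taxonomy: "k__Bacteria; p__Proteobacteria; g__Escherichia; s__coli" },
    { id: "F2", Taxonomy: "k__Bacteria; p__Firmicutes; g__thing; s__speciesname" },
    { id: "F3", Taxonomy: "k__Bacteria; p__Firmicutes; g__other; s__different" },
];

// Return the features whose value in `field` contains at least one of
// the OR-separated fragments in `query`.
function filterByAnyFragment(featureList, field, query) {
    const fragments = query
        .split(/\s+OR\s+/) // divider: the word OR with whitespace around it
        .map((fragment) => fragment.trim())
        .filter((fragment) => fragment.length > 0);
    return featureList.filter((feature) =>
        fragments.some((fragment) => String(feature[field]).includes(fragment))
    );
}

const matchedIds = filterByAnyFragment(
    features,
    "Taxonomy",
    "g__thing; s__speciesname OR s__coli OR p__Proteobacteria"
).map((feature) => feature.id);
// matchedIds is ["F1", "F2"]
```

Note that each fragment is still matched as one contiguous substring, so g__thing; s__speciesname only matches features whose taxonomy contains that exact text, semicolon and space included.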

This would need to be properly documented (#123), but it's a much more natural way to search through multiple features at once than the current "contains the separated text fragment(s)" option. This is essentially another way of coming at #140, but I think this is much more doable in the short-term (and honestly it might be a bit more intuitive).

This would solve the case where you have polyphyletic taxa, e.g. the same species name in different genera. The current searching methods in Qurro are not fully equipped to handle that situation: you can search for one particular species using the "contains the text" searching option, but there is no way to search for multiple such features at once. An example of the "same species name in different genera" problem is P. gingivalis and H. gingivalis.

Thanks @lisa55asil for bringing this issue to my attention!

fedarko added the "enhancement" (New feature or request) label Sep 26, 2019
fedarko self-assigned this Sep 26, 2019
fedarko added the "important" (Things that are critical for getting Qurro in a working/useful state) label Feb 27, 2020
fedarko added a commit to fedarko/qurro that referenced this issue Mar 8, 2020
Relevant to biocore#225.

At least there's finally an official explanation for what "separated
text fragments" does, but sheesh, this highlights the need for better
searching methods (biocore#224).
fedarko changed the title from Add search option for "contains multiple possible text fragments" to Add search option for "contains any text separated by ORs" Mar 18, 2020
fedarko added a commit to fedarko/qurro that referenced this issue Mar 18, 2020
Still need to add tests for this code, but this seems to work fine.

Uses | characters as the dividers.

If the input text has no |s, then using this is identical to normal
searching (*with the exception* that this will trim leading/trailing
whitespace from the input text).
fedarko added this to the "By next lab meeting talk" milestone Mar 18, 2020
fedarko added a commit to fedarko/qurro that referenced this issue Apr 26, 2020
The "global leak" problem (with currVal not being declared before its
use inside orFilterFeatures()) was actually the first time I've seen
Mocha find this sorta problem! (That I remember, at least.) Really
cool that this infrastructure is paying off.
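
As a rough illustration of this class of bug: assigning to an undeclared variable silently creates a global in non-strict JavaScript, which is what a leak check in a test runner can then flag. The sketch below is not the actual orFilterFeatures() code. The function names are hypothetical, and it uses a function-level "use strict" directive (a language-level safeguard that is separate from Mocha's check) so that the mistake surfaces as a ReferenceError.

```javascript
// Buggy sketch: currVal is assigned without ever being declared.
function hasAnyMatchBuggy(values, fragment) {
    "use strict";
    for (let i = 0; i < values.length; i++) {
        currVal = values[i]; // throws a ReferenceError in strict mode
        if (currVal.includes(fragment)) {
            return true;
        }
    }
    return false;
}

// Fixed sketch: currVal is declared in the scope where it is used.
function hasAnyMatchFixed(values, fragment) {
    "use strict";
    for (let i = 0; i < values.length; i++) {
        const currVal = values[i];
        if (currVal.includes(fragment)) {
            return true;
        }
    }
    return false;
}

let leakError = null;
try {
    hasAnyMatchBuggy(["abc"], "b");
} catch (error) {
    leakError = error; // a ReferenceError for the undeclared currVal
}

const fixedResult = hasAnyMatchFixed(["abc"], "b"); // true
```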
fedarko added a commit to fedarko/qurro that referenced this issue Apr 26, 2020
This is close to being done. Want to test this works with taxonomy
strings (would be good to add an example similar to Lisa's), then
need to document this in the MP tutorial and address the whitespace
junk as discussed in biocore#224.
fedarko added a commit to fedarko/qurro that referenced this issue Apr 26, 2020
The error message improved has to do with what's thrown when
a nonexistent field name is passed to filterFeatures(). Should never
be seen by the user, but having a more detailed error message (that
includes the name of said nonexistent field) will help with debugging.
(I also modified the test for that error message to check that
the error message is actually thrown.)
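
A sketch of what a more detailed error of this kind, plus a check that it is really thrown, might look like. The function `checkFieldExists`, its signature, the field names, and the message wording are all assumptions for illustration and are not taken from filterFeatures().

```javascript
// Hypothetical validation helper: throws an error that names the
// nonexistent field, which makes the failure easier to debug.
function checkFieldExists(knownFields, field) {
    if (!knownFields.includes(field)) {
        throw new Error(
            'Field "' + field + '" not found. Known fields: ' +
            knownFields.join(", ")
        );
    }
}

// Checking that the error is actually thrown, and that its message
// includes the bad field name, rather than only that the call "fails".
let thrownMessage = null;
try {
    checkFieldExists(["Feature ID", "Taxonomy"], "Taxonomyy");
} catch (error) {
    thrownMessage = error.message;
}
const errorNamesField =
    thrownMessage !== null && thrownMessage.includes('"Taxonomyy"');
```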
fedarko added a commit to fedarko/qurro that referenced this issue Apr 27, 2020
fedarko added a commit to fedarko/qurro that referenced this issue Apr 27, 2020
fedarko added a commit to fedarko/qurro that referenced this issue Apr 27, 2020
you know, the last minute junk you think of while you're washing your
hair in the shower. standard stuff.

Think all that's left for biocore#224 now is documentation :)
fedarko added a commit to fedarko/qurro that referenced this issue May 1, 2020
Remembered that I wanted to bake a guarantee for this in
fedarko closed this as completed in 19446aa May 4, 2020