Experiment/regex search #1588

Draft
wants to merge 7 commits into base: develop

@blms (contributor) commented May 16, 2024

notes

  • regex_search is where the Lucene query is built
  • get_regex_highlight is where the results are highlighted manually
    • a bit of regex gets ~150 characters of context before and after each highlight, terminating at line breaks to prevent malformed HTML
    • some more complicated regex allows highlighting matches that span multiple lines or appear on incomplete lines (i.e., when a snippet includes an opening or closing <li> tag but not both)
  • right now it's accessible at a separate URL, protected by the change_document permission
    • when it's made available to the public, it will likely be a switch on the normal search page, with some help text shown after enabling it
  • updated the clean_html method to prevent extra whitespace from being added inside <em> and <li> tags, since it otherwise breaks the formatting of highlights
  • we now sometimes get matches across multiple transcriptions of the same document, so I added a small ellipsis to the template for that case
  • also added a feature flag and template logic/CSS for displaying the relevance score
  • no unit tests yet!
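
The notes say regex_search builds the Lucene query but don't show it. As a minimal sketch, assuming the transcription text is indexed in a single untokenized Solr field (the field name transcription_regex and the helper build_regex_query are hypothetical, not taken from this PR), the query string could be assembled like this. Lucene regular expressions must match an entire term, so the pattern is padded with .* to match anywhere in the text, and forward slashes are escaped because the query parser uses them to delimit the expression.

```python
def build_regex_query(pattern: str, field: str = "transcription_regex") -> str:
    """Build a Lucene regular-expression query string for Solr.

    Hypothetical helper: the default field name is an assumption,
    not the field used in this PR.
    """
    # Lucene's query parser delimits regexes with "/", so escape literal slashes
    escaped = pattern.replace("/", r"\/")
    # Lucene regexes are anchored to the whole term; pad with .* so the
    # pattern can match anywhere inside the indexed text
    return f"{field}:/.*{escaped}.*/"
```

For example, build_regex_query("ab/c") returns transcription_regex:/.*ab\/c.*/. This naive escaping does not account for slashes the user has already escaped, so a real implementation would need to handle that case.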

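get_regex_highlight does the highlighting manually rather than relying on Solr's highlighter. The snippet below is a simplified sketch of the context-window idea only: the name regex_highlight is hypothetical, it treats the transcription as plain text (escaping it) instead of working on transcription HTML, and it slices around each match rather than using a context regex. It does not attempt the multi-line and incomplete-<li> handling described in the notes.

```python
import html
import re


def regex_highlight(text: str, pattern: str, context: int = 150) -> list[str]:
    """Return one HTML snippet per regex match, with the match wrapped in <em>.

    Up to `context` characters are kept on each side of the match, but the
    context is cut at the nearest line break so no partial line is included.
    """
    snippets = []
    for match in re.finditer(pattern, text):
        if not match.group(0):
            continue  # skip zero-length matches such as those from "x*"
        start, end = match.span()
        # take up to `context` characters on each side of the match
        before = text[max(0, start - context):start]
        after = text[end:end + context]
        # stop the context at line breaks: keep only the last line before
        # the match and the first line after it
        before = before.rsplit("\n", 1)[-1]
        after = after.split("\n", 1)[0]
        # escape the text pieces so only our own <em> tags are emitted as markup
        snippets.append(
            f"{html.escape(before)}<em>{html.escape(match.group(0))}</em>{html.escape(after)}"
        )
    return snippets
```

For the text "line one\nfoo bar baz\nline three" and the pattern "bar", this returns ["foo <em>bar</em> baz"]: the context stops at the line breaks on either side of the match.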
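The notes mention updating clean_html so that whitespace inside <em> and <li> tags doesn't break highlight formatting. The function below is a hypothetical, regex-based sketch of that kind of cleanup; strip_inner_whitespace is not the PR's code, and the real clean_html may work differently (for example on a parsed tree). A regex is workable here only because it targets a small, fixed set of simple tags; it is not a general HTML parser.

```python
import re


def strip_inner_whitespace(markup: str, tags: tuple[str, ...] = ("em", "li")) -> str:
    """Remove whitespace just inside the given tags, e.g. "<em> word </em>" -> "<em>word</em>"."""
    names = "|".join(re.escape(tag) for tag in tags)
    # whitespace directly after an opening tag (attributes allowed)
    markup = re.sub(rf"(<(?:{names})\b[^>]*>)\s+", r"\1", markup)
    # whitespace directly before a closing tag
    markup = re.sub(rf"\s+(</(?:{names})>)", r"\1", markup)
    return markup
```

Whitespace outside the tags is left alone, so "foo <em>bar</em> baz" keeps the spaces around the highlighted word.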
@blms requested a review from rlskoeser May 16, 2024 19:22

codecov bot commented May 16, 2024

Codecov Report

Attention: Patch coverage is 31.50685%, with 50 lines in your changes missing coverage. Please review.

Project coverage is 98.45%. Comparing base (9a5452a) to head (43750b2).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1588      +/-   ##
===========================================
- Coverage    98.81%   98.45%   -0.36%     
===========================================
  Files          238      238              
  Lines        13809    13878      +69     
===========================================
+ Hits         13645    13664      +19     
- Misses         164      214      +50     

@blms temporarily deployed to staging May 20, 2024 15:19 (Inactive)
@blms temporarily deployed to staging May 23, 2024 16:41 (Inactive)
@rlskoeser requested a deployment to staging June 11, 2024 17:45 (Pending)
@rlskoeser temporarily deployed to staging June 11, 2024 17:52 (Inactive)
@princetoncdh deployed to staging June 11, 2024 18:46 (Active)