-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add an [exec_partial_detailed] function to re. #219
Conversation
This allows users to see the groups that matched if a match exists for [exec_partial], and if not, then it tells you an index before which a match cannot exist.
lib/core.ml
Outdated
matches for regex that have boundary conditions with respect | ||
to the input string. | ||
*) | ||
let finalize () = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't call this function in every branch, but we allocate the closure eagerly. Can we hoist it up outside of this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, somehow I missed the notification for this, and I went back to check on it, and found that there was a response. I pushed a new version that hoists this function.
CHANGES: * Add `Re.group_count` to get the number of groups in a compiled regex (ocaml/ocaml-re#218) * Add `Re.exec_partial_detailed` to allow resuming searches from partial inputs (ocaml/ocaml-re#219) * Re-export `Re.Perl`'s `Parse_error` and `Not_supported` exceptions in Pcre (ocaml/ocaml-re#222) * Add support for `DOTALL` flag in `Re.Pcre.regexp` (ocaml/ocaml-re#225) * Add support for named groups (ocaml/ocaml-re#223) * Add support for some control characters in `Re.Perl` (ocaml/ocaml-re#227)
CHANGES: * Add `Re.group_count` to get the number of groups in a compiled regex (ocaml/ocaml-re#218) * Add `Re.exec_partial_detailed` to allow resuming searches from partial inputs (ocaml/ocaml-re#219) * Re-export `Re.Perl`'s `Parse_error` and `Not_supported` exceptions in Pcre (ocaml/ocaml-re#222) * Add support for `DOTALL` flag in `Re.Pcre.regexp` (ocaml/ocaml-re#225) * Add support for named groups (ocaml/ocaml-re#223) * Add support for some control characters in `Re.Perl` (ocaml/ocaml-re#227)
This is similar to
exec_partial
, but it allows users to see the groups that matched if a match exists, and if not, then it tells you an index before which a match cannot start.Motivation
We had some code that was searching for a relatively small snippet within a large collection of text using regexes. This collection of text arrived in chunks, and the snippet could lie within the boundary of the chunks. We would only stop consuming chunks if we found the snippet. The most generic solution was to keep appending the chunks to a buffer, and redo the regex search from the beginning. However, that lead to$O(n^2)$ performance around larger inputs, and was resulting in significant slowdowns. Using this method, we could start the search at a later index, resulting in effectively $O(n)$ improvement.
Alternative solutions
We could export the current state of the match, but that seemed very complicated to fit into the existing code. The solution proposed here seemed to work well in practice.
Testing
Existing tests pass. Added additional tests for the new function.