Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an [exec_partial_detailed] function to re. #219

Merged
merged 3 commits into from
Aug 13, 2023

Conversation

mben-romdhane
Copy link
Contributor

This is similar to exec_partial, but it allows users to see the groups that matched if a match exists, and if not, then it tells you an index before which a match cannot start.

Motivation

We had some code that was searching for a relatively small snippet within a large collection of text using regexes. This collection of text arrived in chunks, and the snippet could lie within the boundary of the chunks. We would only stop consuming chunks if we found the snippet. The most generic solution was to keep appending the chunks to a buffer, and redo the regex search from the beginning. However, that lead to $O(n^2)$ performance around larger inputs, and was resulting in significant slowdowns. Using this method, we could start the search at a later index, resulting in effectively $O(n)$ improvement.

Alternative solutions

We could export the current state of the match, but that seemed very complicated to fit into the existing code. The solution proposed here seemed to work well in practice.

Testing

Existing tests pass. Added additional tests for the new function.

This allows users to see the groups that matched if a match
exists for [exec_partial], and if not, then it tells you
an index before which a match cannot exist.
lib/core.ml Outdated
matches for regex that have boundary conditions with respect
to the input string.
*)
let finalize () =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't call this function in every branch, but we allocate the closure eagerly. Can we hoist it up outside of this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, somehow I missed the notification for this, and I went back to check on it, and found that there was a response. I pushed a new version that hoists this function.

mben-romdhane and others added 2 commits August 3, 2023 16:52
Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
@rgrinberg rgrinberg added this to the 1.11.0 milestone Aug 13, 2023
@rgrinberg rgrinberg merged commit 2b529a8 into ocaml:master Aug 13, 2023
2 checks passed
rgrinberg added a commit to rgrinberg/opam-repository that referenced this pull request Aug 19, 2023
CHANGES:

* Add `Re.group_count` to get the number of groups in a compiled regex (ocaml/ocaml-re#218)
* Add `Re.exec_partial_detailed` to allow resuming searches from partial inputs
  (ocaml/ocaml-re#219)
* Re-export `Re.Perl`'s `Parse_error` and `Not_supported` exceptions
  in Pcre (ocaml/ocaml-re#222)
* Add support for `DOTALL` flag in `Re.Pcre.regexp` (ocaml/ocaml-re#225)
* Add support for named groups (ocaml/ocaml-re#223)
* Add support for some control characters in `Re.Perl` (ocaml/ocaml-re#227)
nberth pushed a commit to nberth/opam-repository that referenced this pull request Jun 18, 2024
CHANGES:

* Add `Re.group_count` to get the number of groups in a compiled regex (ocaml/ocaml-re#218)
* Add `Re.exec_partial_detailed` to allow resuming searches from partial inputs
  (ocaml/ocaml-re#219)
* Re-export `Re.Perl`'s `Parse_error` and `Not_supported` exceptions
  in Pcre (ocaml/ocaml-re#222)
* Add support for `DOTALL` flag in `Re.Pcre.regexp` (ocaml/ocaml-re#225)
* Add support for named groups (ocaml/ocaml-re#223)
* Add support for some control characters in `Re.Perl` (ocaml/ocaml-re#227)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants