Implement lint for regex::Regex compilation inside a loop #13412
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @y21 (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
e44de97 to 63c6dac
63c6dac to 711fd7f
I started reviewing this 2 days ago but then forgot to continue 😅 Overall looks great aside from a few things
if let Some((_, fun, arg)) = extract_regex_call(self.definitions, self.cx, expr)
    && (matches!(arg.kind, ExprKind::Lit(_)) || const_str(self.cx, arg).is_some())
{
    span_lint_and_help(
As is, this needs to use span_lint_hir_and_then to make #[allow]/#[expect] attributes on the Regex::new call work correctly, since this is emitting a warning on a different node (it would be good to have a test case showing that allowing the lint works).
    definitions: &self.definitions,
};

visitor.visit_block(block);
Right now this visits all expressions in loops twice (once in the visitor and again in the lint pass). We could just have a loop_stack: Vec<Span> in the lint pass that a span gets pushed onto in check_expr for loops and popped from in check_expr_post.
This would get rid of the visitor and the need to use span_lint_hir_and_then, though if it ends up being much more complicated than before then it's probably not worth it and we could just leave it. But it also seems like it could end up simpler: all we would need is a loop_stack.last() call in check_expr to get the enclosing loop span if there is one, and the rest of the already existing Regex::new() matching could be reused.
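A minimal sketch of the loop-stack idea, using a toy expression tree rather than rustc's HIR. The Expr enum, the numeric ids standing in for spans, and the walk driver are hypothetical; only the names loop_stack, check_expr and check_expr_post come from the comment above, and a real lint pass would look different.

```rust
/// Toy expression tree standing in for the compiler's HIR (hypothetical types).
#[derive(Debug)]
enum Expr {
    /// A loop; `id` stands in for the loop's span.
    Loop { id: u32, body: Vec<Expr> },
    /// A call such as `Regex::new("...")`; `id` stands in for the call's span.
    RegexNew { id: u32 },
    /// Any other expression that contains sub-expressions.
    Block(Vec<Expr>),
}

/// Lint-pass-like state: a stack of enclosing loops plus collected findings.
#[derive(Default)]
struct LoopPass {
    loop_stack: Vec<u32>,
    /// Pairs of (regex call id, innermost enclosing loop id).
    findings: Vec<(u32, u32)>,
}

impl LoopPass {
    /// Called before the children of `expr` are visited.
    fn check_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Loop { id, .. } => self.loop_stack.push(*id),
            Expr::RegexNew { id } => {
                // `last()` yields the innermost enclosing loop, if any.
                if let Some(&enclosing) = self.loop_stack.last() {
                    self.findings.push((*id, enclosing));
                }
            }
            Expr::Block(_) => {}
        }
    }

    /// Called after all children of `expr` have been visited.
    fn check_expr_post(&mut self, expr: &Expr) {
        if let Expr::Loop { .. } = expr {
            self.loop_stack.pop();
        }
    }
}

/// Hypothetical driver: check_expr, then the children, then check_expr_post.
fn walk(pass: &mut LoopPass, expr: &Expr) {
    pass.check_expr(expr);
    match expr {
        Expr::Loop { body, .. } | Expr::Block(body) => {
            for child in body {
                walk(pass, child);
            }
        }
        Expr::RegexNew { .. } => {}
    }
    pass.check_expr_post(expr);
}

/// Runs the pass over a tree and returns the findings.
fn run(expr: &Expr) -> Vec<(u32, u32)> {
    let mut pass = LoopPass::default();
    walk(&mut pass, expr);
    pass.findings
}

fn main() {
    let tree = Expr::Block(vec![
        Expr::RegexNew { id: 1 }, // outside any loop: not reported
        Expr::Loop {
            id: 10,
            body: vec![
                Expr::RegexNew { id: 2 },
                Expr::Loop { id: 20, body: vec![Expr::RegexNew { id: 3 }] },
                Expr::RegexNew { id: 4 }, // inner loop already popped
            ],
        },
    ]);
    println!("{:?}", run(&tree));
}
```

Each expression is visited once, and the pop in check_expr_post restores the outer loop as the enclosing one after a nested loop ends.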
I implemented it this way because having state in the lint itself made me worry that something could go wrong, such as expressions not being visited in the expected order... but that idea of a loop stack makes me want to give it a go.
A lot of things in clippy rely on the fact that the iteration order is at the very least check_node → <everything contained in node> → check_node_post. Grepping for check_.+_post shows a bunch of cases that do something similar with state like here.
/// ```
///
#[clippy::version = "1.83.0"]
pub REGEX_COMPILE_IN_LOOP,
Maybe regex_creation_in_loops? Ideally lint names should be pluralized and make sense when read as a sentence together with #[allow] (https://rust-lang.github.io/rfcs/0344-conventions-galore.html#lints)
/// ```no_run
/// # let haystacks = [""];
/// # const MY_REGEX: &str = "a.b";
/// let regex = regex::Regex::new(MY_REGEX).unwrap();
I could imagine a situation where the regex creation is in an unlikely (error) path in a loop, and the suggested change of simply moving the Regex::new() call outside the loop would go from compiling it almost never to always compiling it once.
Though in those cases one can still move it outside the loop (or even into a static) wrapped in a Lazy{Cell,Lock} so it's only compiled when accessed. The lint message/description doesn't contradict this or specify how it should be moved out of the loop, but I think it'd still be useful to mention that somewhere, because it might be non-obvious that that's an option.
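A rough sketch of the lazy option, assuming std::sync::LazyLock is available (stable since Rust 1.80). To keep it runnable without the regex crate, compile_pattern is a hypothetical stand-in for regex::Regex::new(..).unwrap(), and the counter exists only to show how often the "compilation" runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Counts how many times the expensive construction actually ran.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for `regex::Regex::new(pattern).unwrap()`.
fn compile_pattern(pattern: &str) -> String {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    pattern.to_uppercase()
}

/// Built on first access only, then reused for the rest of the program.
static ERROR_PATTERN: LazyLock<String> = LazyLock::new(|| compile_pattern("a.b"));

/// Counts negative inputs; the pattern is touched only on that rare path.
fn process(inputs: &[i32]) -> usize {
    let mut errors = 0;
    for &n in inputs {
        if n < 0 {
            // The first access initializes the value; later accesses reuse it.
            let pattern: &String = LazyLock::force(&ERROR_PATTERN);
            debug_assert!(!pattern.is_empty());
            errors += 1;
        }
    }
    errors
}

fn main() {
    let errors = process(&[1, -2, 3, -4]);
    println!(
        "errors: {errors}, compilations: {}",
        COMPILATIONS.load(Ordering::SeqCst)
    );
}
```

With this shape, a loop that never reaches the error path never pays for the construction, and one that reaches it many times pays once.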
It would also be good to link to regex's own description of the antipattern: https://docs.rs/regex/latest/regex/#avoid-re-compiling-regexes-especially-in-a-loop
Given regex's own advice, it makes me wonder if we could just have a lint for regex creations with literals outside of a static anywhere, as having it in a static with LazyLock would avoid recompiling it ever again, even across function calls, but I can see how that's more... controversial than just this specific pattern.
Yeah, that would be a separate (pedantic) lint that should probably have an issue opened for it, but definitely not this one.
Closes #598.
Seems like a pretty simple one, I'm not sure if I sorted out all the lint plumbing correctly because I was adding it to the existing regex pass, but seems to work. The name is a bit jank and I'm super open to suggestions for changing it.
changelog: [regex_compile_in_loop]: Added lint for Regex compilation inside loops.