Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_js_analyze): noControlCharactersInRegex #4656

Merged
merged 3 commits into from
Jul 15, 2023
Merged

feat(rome_js_analyze): noControlCharactersInRegex #4656

merged 3 commits into from
Jul 15, 2023

Conversation

unvalley
Copy link
Contributor

@unvalley unvalley commented Jul 4, 2023

Summary

Closes #3962

noControlCharactersInRegex

description

Prevents from having control characters and some escape sequences that match control characters in regular expressions.

Control characters are hidden special characters that are numbered from 0 to 31 in the ASCII system. They're not commonly used in JavaScript text. So, if you see them in a pattern (called a regular expression), it's probably a mistake.

The following elements of regular expression patterns are considered possible errors in typing and are therefore disallowed by this rule:

- Hexadecimal character escapes from \x00 to \x1F
- Unicode character escapes from \u0000 to \u001F
- Unicode code point escapes from \u{0} to \u{1F}
- Unescaped raw characters from U+0000 to U+001F

Control escapes such as \t and \n are allowed by this rule.

ESLint: https://eslint.org/docs/latest/rules/no-control-regex

Test Plan

cargo test -p rome_js_analyze -- no_control_characters_in_regex

@netlify
Copy link

netlify bot commented Jul 4, 2023

Deploy Preview for docs-rometools ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 047e901
🔍 Latest deploy log https://app.netlify.com/sites/docs-rometools/deploys/64b251ebd2d5780008f8ddda
😎 Deploy Preview https://deploy-preview-4656--docs-rometools.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@github-actions github-actions bot added A-Linter Area: linter A-Parser Area: parser A-Project Area: project configuration and loading A-Diagnostic Area: errors and diagnostics labels Jul 4, 2023
@unvalley unvalley marked this pull request as ready for review July 11, 2023 17:20
Copy link
Contributor

@ematipico ematipico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a suggestion. Feel free to take or reject my suggestion. And if you take it, feel free to apply the suggestion in the next PR. Doing it in the next PR would allow us to catch possible regressions :)

At the moment logic of the rule creates a string every time we see a control character. I believe this is not efficient because technically we already have that string inside the node.

My suggestion would be to store the range of where the control character is. With range I mean the index where the control character starts and the index where the control character ends. You can save the index as (usize, usize) or TextRange.

Doing so would allow us to do better things:

  • we don't allocate a new string for every control character
  • to retrieve the string, we just need do
     let control_character = original_string[start_index..end_index]; 
  • we can create a TextRange from those indexes and show the control character in the diagnostic


fn decode_unicode_escape_to_code_point(iter: &mut Peekable<Chars>) -> Option<(String, i64)> {
let mut digits = String::new();
for _ in 0..4 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave a comment to explain why 0..4? To me, as a first reader, don't make sense

decode: fn(&mut Peekable<Chars>) -> Option<(String, i64)>,
) {
if let Some((s, cp)) = decode(iter) {
if (0..32).contains(&cp) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment to explain 0..32

Comment on lines +239 to +241
markup! {
"Unexpected control character(s) in regular expression: "<Emphasis>{state.join(", ")}</Emphasis>""
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should let the user know what they should do here. For example, I see the diagnostic for the first time, and I don't know how to solve the issue 😄

@unvalley
Copy link
Contributor Author

@ematipico Thanks for the review! I addressed reviewed points.
After merging this PR, I'll create an issue for updating to use TextRange instead of creating new string.

test: add test cases

feat: restore previous implementation

docs: lint rule details

chore: use js instead of cjs

chore: codegen

chore: fix lint rule category order

refactor: logics in rule

chore: codegen

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

Update crates/rome_js_analyze/src/analyzers/nursery/no_control_characters_in_regex.rs

Co-authored-by: Emanuele Stoppa <my.burning@gmail.com>

refactor: update comments and naming

feat: add note to the rule and update docs

refactor: fix remaining
@unvalley unvalley merged commit 5510f5f into rome:main Jul 15, 2023
18 checks passed
@unvalley unvalley deleted the no-control-characters-in-regex branch July 15, 2023 09:18
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Diagnostic Area: errors and diagnostics A-Linter Area: linter A-Parser Area: parser A-Project Area: project configuration and loading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

noControlCharactersInRegex, no-control-regex
2 participants