Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pydocstyle] Escaped docstring in docstring (D301 ) #12192

Merged
merged 3 commits into from
Jul 18, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
58 changes: 58 additions & 0 deletions crates/ruff_linter/resources/test/fixtures/pydocstyle/D301.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,3 +35,61 @@ def make_unique_pod_id(pod_id: str) -> str | None:

def shouldnt_add_raw_here2():
u"Sum\\mary."


def shouldnt_add_raw_for_double_quote_docstring_contains_docstring():
"""
This docstring contains another double-quote docstring.

def foo():
\"\"\"Foo.\"\"\"
"""


def shouldnt_add_raw_for_double_quote_docstring_contains_docstring2():
"""
This docstring contains another double-quote docstring.

def bar():
\"""Bar.\"""
"""


def shouldnt_add_raw_for_single_quote_docstring_contains_docstring():
'''
This docstring contains another single-quote docstring.

def foo():
\'\'\'Foo.\'\'\'
'''


def shouldnt_add_raw_for_single_quote_docstring_contains_docstring2():
'''
This docstring contains another single-quote docstring.

def bar():
\'''Bar.\'''
'''

def shouldnt_add_raw_for_docstring_contains_escaped_double_triple_quotes():
"""
Escaped triple quote \""" or \"\"\".
"""

def shouldnt_add_raw_for_docstring_contains_escaped_single_triple_quotes():
'''
Escaped triple quote \''' or \'\'\'.
'''


def should_add_raw_for_single_double_quote_escape():
"""
This is single quote escape \".
"""


def should_add_raw_for_single_single_quote_escape():
'''
This is single quote escape \'.
'''
31 changes: 31 additions & 0 deletions crates/ruff_linter/src/rules/pydocstyle/rules/backslashes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -69,8 +69,39 @@ pub(crate) fn backslashes(checker: &mut Checker, docstring: &Docstring) {
// Docstring contains at least one backslash.
let body = docstring.body();
let bytes = body.as_bytes();
let mut backslash_index = 0;
let double_quote_docstring_backslashes_pattern = b"\"\\\"\\\"";
let single_quote_docstring_backslashes_pattern = b"\'\\\'\\\'";
if memchr_iter(b'\\', bytes).any(|position| {
let escaped_char = bytes.get(position.saturating_add(1));
// Allow escaped docstring.
if matches!(escaped_char, Some(b'"' | b'\'')) {
// If the next three characters are equal to """, it indicates an escaped docstring pattern.
let escaped_triple_quotes =
&bytes[position.saturating_add(1)..position.saturating_add(4)];
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if what comes after the \ is shorter than 3 characters. I would rewrite this to something like

Suggested change
let escaped_triple_quotes =
&bytes[position.saturating_add(1)..position.saturating_add(4)];
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
return false;
}
let after_first_backslash = &bytes[position + 1..];
let is_escaped_triple = after_first_backslash.starts_with(b"\"\"\"")
|| after_first_backslash.starts_with(b"\'\'\'");
if is_escaped_triple {
return false;
}

// For the `"\"\"` pattern, each iteration advances by 2 characters.
// For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this assumption is correct and this might actually a bug in the existing implementation. For example, the function passed to any will be called twice for \\, once for each backslash position but the offsets aren't to indices apart.

What I understand is that you want to track if you're at the beginning of an escape sequence.

This is not fully fledged out, but I think we may have to rewrite the entire loop

    while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
        let after_escape = &body[position + 1..];
        let next_char_len = after_escape.chars().next().unwrap_or_default();

        let Some(escaped_char) = &after_escape.chars().next() else {
            break;
        };

        if matches!(escaped_char, '"' | '\'') {
            let is_escaped_triple =
                after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");

            if is_escaped_triple {
                // don't add a diagnostic
            }
            
            if position != 0 && offset == position {
                // An escape sequence, e.g. `\a\b`
            }
        }
        
        

        offset = position + escaped_char.len_utf8();
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This helps a lot!

// Therefore, we utilize an index to keep track of the remaining characters.
let escaped_quotes_backslashes = &bytes
[position.saturating_add(1)..position.saturating_add(6 - backslash_index * 2)];
if escaped_quotes_backslashes
== &double_quote_docstring_backslashes_pattern[backslash_index * 2..]
|| escaped_quotes_backslashes
== &single_quote_docstring_backslashes_pattern[backslash_index * 2..]
{
backslash_index += 1;
// Reset to avoid overflow.
if backslash_index > 2 {
backslash_index = 0;
}
return false;
}
return true;
}
// Allow continuations (backslashes followed by newlines) and Unicode escapes.
!matches!(escaped_char, Some(b'\r' | b'\n' | b'u' | b'U' | b'N'))
}) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,43 @@ D301.py:37:5: D301 Use `r"""` if any backslashes in a docstring
|
= help: Add `r` prefix

D301.py:87:5: D301 [*] Use `r"""` if any backslashes in a docstring
|
86 | def should_add_raw_for_single_double_quote_escape():
87 | """
| _____^
88 | | This is single quote escape \".
89 | | """
| |_______^ D301
|
= help: Add `r` prefix

ℹ Unsafe fix
84 84 |
85 85 |
86 86 | def should_add_raw_for_single_double_quote_escape():
87 |- """
87 |+ r"""
88 88 | This is single quote escape \".
89 89 | """
90 90 |

D301.py:93:5: D301 [*] Use `r"""` if any backslashes in a docstring
|
92 | def should_add_raw_for_single_single_quote_escape():
93 | '''
| _____^
94 | | This is single quote escape \'.
95 | | '''
| |_______^ D301
|
= help: Add `r` prefix

ℹ Unsafe fix
90 90 |
91 91 |
92 92 | def should_add_raw_for_single_single_quote_escape():
93 |- '''
93 |+ r'''
94 94 | This is single quote escape \'.
95 95 | '''
Loading