Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pydocstyle] Escaped docstring in docstring (D301 ) #12192

Merged
merged 3 commits into from
Jul 18, 2024

Conversation

ukyen8
Copy link
Contributor

@ukyen8 ukyen8 commented Jul 4, 2024

Summary

This PR updates D301 rule to allow inclduing escaped docstring, e.g. \"""Foo.\""" or \"\"\"Bar.\"\"\", within a docstring.

Related issue: #12152

Test Plan

Add more test cases to D301.py and update the snapshot file.

@ukyen8 ukyen8 marked this pull request as ready for review July 4, 2024 21:44
@charliermarsh charliermarsh added bug Something isn't working docstring Related to docstring linting or formatting labels Jul 4, 2024
@@ -69,8 +69,35 @@ pub(crate) fn backslashes(checker: &mut Checker, docstring: &Docstring) {
// Docstring contains at least one backslash.
let body = docstring.body();
let bytes = body.as_bytes();
let mut backslash_index = 0;
let escaped_docstring_backslashes_pattern = b"\"\\\"\\\"";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But D301 is based on double quotes, do we need to cover single-quote docstring here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't verified this myself but the way I read the code is that docstrings are extracted from any string literal

/// Extract a docstring from a function or class body.
pub(crate) fn docstring_from(suite: &[Stmt]) -> Option<&ast::ExprStringLiteral> {
let stmt = suite.first()?;
// Require the docstring to be a standalone expression.
let Stmt::Expr(ast::StmtExpr { value, range: _ }) = stmt else {
return None;
};
// Only match strings.
value.as_string_literal_expr()
}

@ukyen8 ukyen8 force-pushed the d301-docstring-in-docstring branch from 84ba7f5 to 0d94f8e Compare July 12, 2024 23:00
Comment on lines 80 to 85
let escaped_triple_quotes =
&bytes[position.saturating_add(1)..position.saturating_add(4)];
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
return false;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will panic if what comes after the \ is shorter than 3 characters. I would rewrite this to something like

Suggested change
let escaped_triple_quotes =
&bytes[position.saturating_add(1)..position.saturating_add(4)];
if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
return false;
}
let after_first_backslash = &bytes[position + 1..];
let is_escaped_triple = after_first_backslash.starts_with(b"\"\"\"")
|| after_first_backslash.starts_with(b"\'\'\'");
if is_escaped_triple {
return false;
}

Comment on lines 86 to 87
// For the `"\"\"` pattern, each iteration advances by 2 characters.
// For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this assumption is correct and this might actually a bug in the existing implementation. For example, the function passed to any will be called twice for \\, once for each backslash position but the offsets aren't to indices apart.

What I understand is that you want to track if you're at the beginning of an escape sequence.

This is not fully fledged out, but I think we may have to rewrite the entire loop

    while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
        let after_escape = &body[position + 1..];
        let next_char_len = after_escape.chars().next().unwrap_or_default();

        let Some(escaped_char) = &after_escape.chars().next() else {
            break;
        };

        if matches!(escaped_char, '"' | '\'') {
            let is_escaped_triple =
                after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");

            if is_escaped_triple {
                // don't add a diagnostic
            }
            
            if position != 0 && offset == position {
                // An escape sequence, e.g. `\a\b`
            }
        }
        
        

        offset = position + escaped_char.len_utf8();
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This helps a lot!

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@charliermarsh charliermarsh merged commit 0ba7fc6 into astral-sh:main Jul 18, 2024
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working docstring Related to docstring linting or formatting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants