Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unescaping cleanups #103919

Merged
merged 10 commits into from
Nov 9, 2022
Merged

Unescaping cleanups #103919

merged 10 commits into from
Nov 9, 2022

Conversation

nnethercote
Copy link
Contributor

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? @matklad

These have been bugging me for a while.

- `literal_text`: `src` is also used and is shorter and better.
- `first_char`: used even when "first" doesn't make sense; `c` is
  shorter and better.
- `curr`: `c` is shorter and better.
- `unescaped_char`: `result` is also used and is shorter and better.
- `second_char`: these have a single use and can be elided.
There is some subtlety here.
It's passed to numerous places where we just need an `is_byte` bool.
Passing the bool avoids the need for some assertions.

Also rename `is_bytes()` as `is_byte()`, to better match `Mode::Byte`,
`Mode::ByteStr`, and `Mode::RawByteStr`.
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Nov 3, 2022
compiler/rustc_lexer/src/unescape.rs Outdated Show resolved Hide resolved
compiler/rustc_lexer/src/unescape.rs Show resolved Hide resolved
src/test/ui/attributes/key-value-non-ascii.stderr Outdated Show resolved Hide resolved
src/test/ui/parser/unicode-control-codepoints.stderr Outdated Show resolved Hide resolved
@@ -351,7 +338,7 @@ where
}
}

fn byte_from_char(c: char) -> u8 {
pub fn byte_from_char(c: char) -> u8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a bit dicey. I agree that in the context of compiler we don't really care about nice abstrction boundaries, and this is a simplification.

But we want rustc_lexer to also be a nice crates.io crates, and from that point of view this feels a bit too type-unsafe.

No strong opinion overall.

Though, if we do this, this one surely wants to be tagged as #[inline]? tiny cross-crate fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a fair point. A couple of questions:

  • Is rustc_lexer actually usable outside of rustc/rust-analyzer? the rustc-lexer crate is v0.1.0 and hasn't been updated in three years. I see some auto-updated variants but none of them have been updated in the past year.
  • I'm not sure how this is type-unsafe. You'll get u8/char type mismatches if you do something wrong?
  • Inlining is a good idea.

There are three kinds of "byte" literals: byte literals, byte string
literals, and raw byte string literals. None are allowed to have
non-ASCII chars in them.

Two `EscapeError` variants exist for when that constraint is violated.
- `NonAsciiCharInByte`: used for byte literals and byte string literals.
- `NonAsciiCharInByteString`: used for raw byte string literals.

As a result, the messages for raw byte string literals use different
wording, without good reason. Also, byte string literals are incorrectly
described as "byte constants" in some error messages.

This commit eliminates `NonAsciiCharInByteString` so the three cases are
handled similarly, and described correctly. The `mode` is enough to
distinguish them.

Note: Some existing error messages mention "byte constants" and some
mention "byte literals". I went with the latter here, because it's a
more correct name, as used by the Reference.
Remove a low-value comment, remove a duplicate comment, and correct a
third comment.
It deals with eight cases: ints, floats, and the six quoted types
(char/byte/strings). For ints and floats we have an early return, and
the other six types fall through to the code at the end, which makes the
function hard to read.

This commit rearranges things to avoid the early returns.
It has a single callsite, and is fairly small. The `Float` match arm
already has base-specific checking inline, so this makes things more
consistent.
@nnethercote
Copy link
Contributor Author

Thanks for the comments. I have updated.

  • result/res renaming is in a new commit.
  • Error message wording changes and incorrect suggestion changes are in the original commit.
  • Inlining of byte_from_char is in the original commit.

@rust-log-analyzer

This comment has been minimized.

@nnethercote
Copy link
Contributor Author

The job mingw-check failed!

This was a network error, when downloading some crates.

@matklad
Copy link
Member

matklad commented Nov 6, 2022

@bors r+

@bors
Copy link
Contributor

bors commented Nov 6, 2022

📌 Commit 43d21b5 has been approved by matklad

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 6, 2022
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 7, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? `@matklad`
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 7, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? ``@matklad``
The `usize` isn't needed in the error case.
@matklad
Copy link
Member

matklad commented Nov 8, 2022

Makes sense. We do use it here

https://github.com/rust-lang/rust-analyzer/blob/df4ed94f2fb7c16daefa1722657216609f0e8d31/crates/syntax/src/validation.rs#L160

but I think that’s not essential and using zero offfset would work.

btw, consider sending a PR to upgrade the crate to rust-analyzer, once we get a new ap version, it might be helpful to learn how we use it from that side.

@bors r+

@bors
Copy link
Contributor

bors commented Nov 8, 2022

📌 Commit d6c97a3 has been approved by matklad

It is now in the queue for this repository.

@nnethercote
Copy link
Contributor Author

We do use it here

Oh, I overlooked that. I thought rust-analyzer was getting built by the rustc build system. Is that not the case?

btw, consider sending a PR to upgrade the crate to rust-analyzer, once we get a new ap version, it might be helpful to learn how we use it from that side.

Can you expand on this? I don't really understand the rustc/rust-analyzer integration, so I don't understand everything in that sentence.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 8, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? `@matklad`
@matklad
Copy link
Member

matklad commented Nov 8, 2022

The main insight is that rust-analyzer is a bog-standard rust project. It’s build using cargo build with a stable compiler, and all of its deps are on crates.io.

In particular, this is how it gets rustc_lexer:

https://github.com/rust-lang/rust-analyzer/blob/977a029c1ec97a62810943bcdc8c6e540ad16baa/crates/syntax/Cargo.toml#L17

so what you’d need to do is to wait for the next auto-published version (rustc crates are published every night I think), and than change the dep in Cargo toml.

We could somehow entangle rust-analyzer and rustc repos, to make sure they use the same source, but that would break “ra is a normal cargo project”, and that I consider to be super-important property.

As an alternative, we could make both rustc and rust-analyzer depend on crates.io version of the crate with a nice semver-stable API. In general, this is a bad idea for compiler crates (semver-stability is very costly), but for lexer specifically it might make sense: pulling the “official” Rust lexer from crates.io sounds nifty.

Finally, a third alternative would be to change rustc itself to be a normal cargo project (only std needs to be special, see https://matklad.github.io/2020/09/12/rust-in-2021.html#compiling-the-compiler). If that happens (which I am not holding my breath for), merging rustc and rust-analyzer’s repos would be a no-brainer.

@nnethercote
Copy link
Contributor Author

Thanks for the info. My rust repo has src/tools/rust-analyzer... is that a git submodule, or something?

@matklad
Copy link
Member

matklad commented Nov 8, 2022

I think we are using some fancy fit feature here, but I don’t know, cc @Veykril

the source of truth for rust-analyzer is https://github.com/rust-lang/rust-analyzer

(And I most definitely not even thinking about reading “optimizing rust-analyzer in 2022” blog post :) )

@Veykril
Copy link
Member

Veykril commented Nov 8, 2022

Thanks for the info. My rust repo has src/tools/rust-analyzer... is that a git submodule, or something?

rust-analyzer is pulled in as a git subtree nowadays (and we are currently having trouble with syncing changes from rust-lang/rust ->rust-lang/rust-analyzer rust-lang/rust-analyzer#13459)

Note that we are currently 2 versions behind on the latest autopublished rustc lexer crate, updating it requires some slight changes due to the token prefix stuff that was added a while ago if I recall correctly

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 8, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? `@matklad`
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 8, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? ``@matklad``
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this pull request Nov 9, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? ```@matklad```
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 9, 2022
Rollup of 7 pull requests

Successful merges:

 - rust-lang#103570 (Stabilize integer logarithms)
 - rust-lang#103694 (Add documentation examples for `pointer::mask`)
 - rust-lang#103919 (Unescaping cleanups)
 - rust-lang#103933 (Promote {aarch64,i686,x86_64}-unknown-uefi to Tier 2)
 - rust-lang#103952 (Don't intra linkcheck reference)
 - rust-lang#104111 (rustdoc: Add mutable to the description)
 - rust-lang#104125 (Const Compare for Tuples)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
Manishearth added a commit to Manishearth/rust that referenced this pull request Nov 9, 2022
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? `````@matklad`````
@bors bors merged commit 4b50fb3 into rust-lang:master Nov 9, 2022
@rustbot rustbot added this to the 1.67.0 milestone Nov 9, 2022
@nnethercote nnethercote deleted the unescaping-cleanups branch November 10, 2022 06:39
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? ````@matklad````
Aaron1011 pushed a commit to Aaron1011/rust that referenced this pull request Jan 6, 2023
Rollup of 7 pull requests

Successful merges:

 - rust-lang#103570 (Stabilize integer logarithms)
 - rust-lang#103694 (Add documentation examples for `pointer::mask`)
 - rust-lang#103919 (Unescaping cleanups)
 - rust-lang#103933 (Promote {aarch64,i686,x86_64}-unknown-uefi to Tier 2)
 - rust-lang#103952 (Don't intra linkcheck reference)
 - rust-lang#104111 (rustdoc: Add mutable to the description)
 - rust-lang#104125 (Const Compare for Tuples)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
bors pushed a commit to rust-lang-ci/rust that referenced this pull request Jan 9, 2023
…=matklad

Unescaping cleanups

Some code improvements, and some error message improvements.

Best reviewed one commit at a time.

r? ````@matklad````
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants