Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Display spans correctly when there are zero-width or wide characters #45711

Merged
merged 1 commit into from
Nov 5, 2017

Conversation

tirr-c
Copy link
Contributor

@tirr-c tirr-c commented Nov 2, 2017

Hopefully...


Before:

error: invalid width `7` for integer literal
  --> unicode_2.rs:12:25
   |
12 |     let _ = ("a̐éö̲", 0u7);
   |                         ^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: invalid width `42` for integer literal
  --> unicode_2.rs:13:20
   |
13 |     let _ = ("아あ", 1i42);
   |                    ^^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: aborting due to 2 previous errors

After:

error: invalid width `7` for integer literal
  --> unicode_2.rs:12:25
   |
12 |     let _ = ("a̐éö̲", 0u7);
   |                     ^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: invalid width `42` for integer literal
  --> unicode_2.rs:13:20
   |
13 |     let _ = ("아あ", 1i42);
   |                      ^^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: aborting due to 2 previous errors

Spans might display incorrectly on the browser.

r? @estebank

@tirr-c
Copy link
Contributor Author

tirr-c commented Nov 2, 2017

Is this failure a travis issue?

@est31
Copy link
Member

est31 commented Nov 2, 2017

@tirr-c its no travis issue:

[00:03:44] tidy error: /checkout/src/libsyntax_pos/lib.rs:829: TODO is deprecated; use FIXME

[00:03:46] some tidy checks failed

You should also read contributing.md.

As one thing to check, does your code compute the correct number of ^ as well? E.g. in something like let _ = (42, a̐é); where should underline the a̐é with two ^ markers.

@tirr-c
Copy link
Contributor Author

tirr-c commented Nov 2, 2017

Oh, I've run tidy test before adding some comments. I thought the failure is because of travis since I didn't see any build logs after failure. Fixed.

And yes, it displays two ^'s:

error: non-ascii idents are not fully supported. (see issue #28979)
 --> test.rs:2:18
  |
2 |     let _ = (42, a̐é);
  |                  ^^
  |
  = help: add #![feature(non_ascii_idents)] to the crate attributes to enable

@kennytm kennytm added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 2, 2017
@tirr-c
Copy link
Contributor Author

tirr-c commented Nov 2, 2017

Maybe I need to rename fields and variables. The term "narrow" seems more appropriate than "half-width" here.

@est31
Copy link
Member

est31 commented Nov 2, 2017

"half-width" is an actual term so IMO it is okay: https://en.wikipedia.org/wiki/Halfwidth_and_fullwidth_forms

@tirr-c
Copy link
Contributor Author

tirr-c commented Nov 2, 2017

My understanding is that the term "narrow" is broader than "halfwidth." Also, most ASCII characters are defined as narrow, not halfwidth, so narrow would be better than halfwidth.

http://www.unicode.org/reports/tr11/

@tirr-c tirr-c changed the title Display spans correctly when there are non-half-width characters Display spans correctly when there are zero-width or wide characters Nov 2, 2017
Copy link
Contributor

@estebank estebank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with the approach, but I can't shake the feeling that performing the translation from bytepos to visual column in src/librustc_errors/emitter.rs's render_source_line by looking at the line itself and fixing the annotation positions there might be slightly less/more straightforward code.

That being said, the current approach is sound. Could you add the requested test? Once that's done, r=me.

@@ -618,7 +634,7 @@ impl Encodable for FileMap {
impl Decodable for FileMap {
fn decode<D: Decoder>(d: &mut D) -> Result<FileMap, D::Error> {

d.read_struct("FileMap", 6, |d| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the 6 a bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea, but for me 6 there didn't make sense. There were total of 7 (now 8) fields.

pub pos: BytePos,
/// The width of the character, 0 (zero-width) or 2 (full-width)
pub width: usize,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it'd make sense to make NonNarrowChar an enum with ZeroWidth(BytePos) and FullWidth(BytePos). The way it is being used below this is not a big issue, but it is good style to encode restrictions like this in the type system and provide either a Deref for usize or a method width(&self) that will give you the usize representation. This way you're assured no-one will come along in the future and try to use NonNarrowChar with a width of 12 😬

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree!

'\t' | '\n' =>
// Tabs will consume one column.
// Make newlines take one column so that displayed spans can point them.
1,
Copy link
Contributor

@estebank estebank Nov 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you're adding support for characters with varying widths, adding support for either 4 or 8 width tabs should be easy to add. Let's do so in a subsequent PR.

Done in #45953.

--> $DIR/unicode_2.rs:13:20
|
13 | let _ = ("아あ", 1i42);
| ^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per @est31's comment, could you add a case that points at the string itself to make sure regressions would get caught?

@tirr-c
Copy link
Contributor Author

tirr-c commented Nov 2, 2017

Made NonNarrowChar into a enum, and added a new test.

pub fn pos(&self) -> BytePos {
match *self {
NonNarrowChar::ZeroWidth(p) => p,
NonNarrowChar::Wide(p) => p,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to

match *self {
    NonNarrowChar::ZeroWidth(p) |
    NonNarrowChar::Wide(p) => p,
}

@estebank
Copy link
Contributor

estebank commented Nov 2, 2017

@bors r+

@bors
Copy link
Contributor

bors commented Nov 2, 2017

📌 Commit 272c2fa has been approved by estebank

@kennytm kennytm added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 2, 2017
@bors
Copy link
Contributor

bors commented Nov 3, 2017

⌛ Testing commit 272c2fa with merge 3e20b2c69630dc4e6752d30f9b3788c4456272a6...

@bors
Copy link
Contributor

bors commented Nov 3, 2017

💔 Test failed - status-travis

@kennytm
Copy link
Member

kennytm commented Nov 3, 2017

@bors retry

@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 3, 2017
@bors
Copy link
Contributor

bors commented Nov 4, 2017

⌛ Testing commit 272c2fa with merge 12e6b53...

bors added a commit that referenced this pull request Nov 4, 2017
Display spans correctly when there are zero-width or wide characters

Hopefully...
* fixes #45211
* fixes #8706

---

Before:
```
error: invalid width `7` for integer literal
  --> unicode_2.rs:12:25
   |
12 |     let _ = ("a̐éö̲", 0u7);
   |                         ^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: invalid width `42` for integer literal
  --> unicode_2.rs:13:20
   |
13 |     let _ = ("아あ", 1i42);
   |                    ^^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: aborting due to 2 previous errors
```

After:
```
error: invalid width `7` for integer literal
  --> unicode_2.rs:12:25
   |
12 |     let _ = ("a̐éö̲", 0u7);
   |                     ^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: invalid width `42` for integer literal
  --> unicode_2.rs:13:20
   |
13 |     let _ = ("아あ", 1i42);
   |                      ^^^^
   |
   = help: valid widths are 8, 16, 32, 64 and 128

error: aborting due to 2 previous errors
```

Spans might display incorrectly on the browser.

r? @estebank
@bors
Copy link
Contributor

bors commented Nov 5, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: estebank
Pushing 12e6b53 to master...

@bors bors merged commit 272c2fa into rust-lang:master Nov 5, 2017
@tirr-c tirr-c deleted the unicode-span branch November 5, 2017 07:02
bors added a commit that referenced this pull request Dec 6, 2017
Display `\t` in diagnostics code as four spaces

Follow up to #44386 using the unicode variable width machinery from #45711 to replace tabs in the source code when displaying a diagnostic error with four spaces (instead of only one), while properly accounting for this when calculating underlines.

Partly addresses #44618.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
6 participants