Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for RFC 2151, Raw Identifiers #48589

Closed
3 of 7 tasks
Centril opened this issue Feb 27, 2018 · 65 comments
Closed
3 of 7 tasks

Tracking issue for RFC 2151, Raw Identifiers #48589

Centril opened this issue Feb 27, 2018 · 65 comments
Assignees
Labels
B-RFC-implemented Blocker: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. finished-final-comment-period The final comment period is finished for this PR / Issue. P-high High priority T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@Centril
Copy link
Contributor

Centril commented Feb 27, 2018

This is a tracking issue for RFC 2151 (rust-lang/rfcs#2151).

Steps:

Unresolved questions:

  • Do macros need any special care with such identifier tokens?
    Probably not.
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
    Depends on the edition?
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
@Centril Centril added B-RFC-approved Blocker: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Feb 27, 2018
@nikomatsakis
Copy link
Contributor

@Centril you rock! @petrochenkov, think you could supply some mentoring instructions here?

@Lymia
Copy link
Contributor

Lymia commented Mar 6, 2018

I'd like to take a shot at this. It seems like it'd be a decent way to learn how rustc works.

@Manishearth
Copy link
Member

The relevant code is probably this, you'll want to make the ('r', Some('#'), _) case allow for the third character to be alphabetic or an underscore, and in that case skip the r and the # before running ident_continue.

We could add an is_raw boolean to token::Ident as well.

You'll also need to feature gate this but we can do that later.

lmk if you have questions

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 7, 2018

We could add an is_raw boolean to token::Ident as well.

This is possible, but would be unfortunate. Idents are used everywhere and supposed to be small.

Ideally we should limit the effect of r# to lexer, for example by interning r#keyword identifiers into separate slots (like gensyms) so r#keyword and keyword have different NameSymbols.

EDIT: The first paragraph is about ast::Ident, token::Ident may actually be the appropriate place.

@petrochenkov
Copy link
Contributor

Some clarifications are needed:

  • How r# affects context-dependent identifiers (aka weak keywords) like default.
    Do they lose their special context-dependent properties and turn into "normal identifiers"?
// `union` is a normal ident, this is not an error
union U {
    ....
}

// `union` is a raw ident, is this an error?
r#union U {
    ...
}
  • How does r# affect keywords that are "semantically special" and not "syntactically special"?
    I'm talking about path segment keywords specifically.
    For example, Self in Self::A::B is already treated as normal identifier during parsing, it only gains special abilities during name resolution when we resolve identifiers named Self (or self/super/etc) in a special way.
#[derive(Default)]
struct S;

impl S {
    fn f() -> S {
        r#Self::default() // Is this an error?
    }  
}

@Manishearth
Copy link
Member

oh, I didn't realize we reuse Ident from the lexer.

I think r#union is an error (when used to create a union). We'll need the ident lexing step to return a bool on the lexed ident's raw-ness.

I think it's ok for r#Self to work; but don't mind either way

@petrochenkov
Copy link
Contributor

Also, lifetime identifiers weren't covered by the RFC - r#'ident or 'r#ident.
(One more case of ident vs lifetime mismatch caused by lifetime token being a separate entity rather than a combination of ' and identifier, cc https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716).

@Manishearth
Copy link
Member

I think it's fine if we don't have raw lifetime identifiers. Lifetimes are crate-local, their identifiers never need to be used by consumers of your crate, so lifetimes clashing with keywords can simply be fixed on epochs. Admittedly, writing a lint that makes that automatic may be tricky.

Raw identifiers are primarily necessary because people may need to call e.g. functions named catch() in crates on an older epoch. This problem doesn't occur for lifetimes.

@petrochenkov
Copy link
Contributor

Yeah, it's mostly a consistency question rather than a practical issue.

@Lymia
Copy link
Contributor

Lymia commented Mar 7, 2018

From what I've been seeing while looking around the codebase, I think the best way to implement this is to add a new parameter to token::Ident, rather than messing with the Symbol itself?

I think this would make implementing epoch-specific keywords easier, since there's no question of what Symbol should be used when, and in what epoch. (For example, you'd have to make sure the Symbol for catch being used as an identifier in 2015 epoch code is the same as the Symbol for r#catch being used in a epoch where it's a full keyword.) This was already something I wasn't sure how to handle with contextual keywords.

My main questions, right now, would be:

  • Does this actually sound like the best approach?
  • Reading the code, it looks like most feature gating is done on the AST after parsing, and not during parsing. Since nothing in the AST would reflect the raw identifiers being there at all with this approach, the feature check would have to be in parser.rs. Would adding one there be an issue? How would I go about doing that, considering that module doesn't have other feature checks that I can use as a template?
  • A minor code style point: right now, token::Ident is declared as Ident(ast::Ident). To add an is_raw field would mean having a mystery unnamed bool field in a tuple struct, or making it use named fields, in which case, matching on token::Idents becomes nastier. One idea that did come to mind is adding an RawIdent(ast::Ident) variant, but then the compiler can't help me find places I might need to worry about raw identifiers. Any advice on this?

I'll implement lifetime parameters if it turns out to be easy to, I guess. As Manishearth said, it's not something you really need to escape ever.

@Manishearth
Copy link
Member

Yes, we should not be affecting Symbol.

Regarding the feature gate, we can solve the problem later, but I was thinking of doing a delayed error or something since we don't know what feature gates are available whilst lexing

a mystery unnamed bool field in a tuple struct

I think that's fine. Folks usually do this as Ident(ast::Ident, /* is_raw */ bool)

@Manishearth
Copy link
Member

I think it's best if we don't allow this to work for lifetime parameters, actually. We restrict the number of places where raw identifiers are allowed at a first pass, and if we need this for a later epoch, we add it then

@Manishearth
Copy link
Member

But yeah, your plan sounds good otherwise

@Lymia
Copy link
Contributor

Lymia commented Mar 7, 2018

.... right, that makes sense. The lexer obviously doesn't know what feature flags there are, because it's busy lexing them. :D

@nikomatsakis
Copy link
Contributor

@petrochenkov

In my opinion, r#foo should always be a "generic identifier" and hence not eligible as a contextual keyword. So:

How r# affects context-dependent identifiers (aka weak keywords) like default.

They are not special anymore. r#union Foo would be an error.

How does r# affect keywords that are "semantically special" and not "syntactically special"?

Is this just about self and Self? Does this apply to super too? We currently talk about self and Self as if they were keywords. I am therefore inclined to think that r#self would be "just another name" and not have the special properties that self ordinarily has.

So e.g. use r#self::foo would be an absolute path.

cc @rust-lang/lang -- do others agree?

@scottmcm
Copy link
Member

scottmcm commented Mar 8, 2018

@nikomatsakis That's exactly what I'd have expected.

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

(I wish it didn't, but it might be too late?)

@cuviper
Copy link
Member

cuviper commented Mar 8, 2018

I hesitate about treating self and r#self as distinct things, because it seems like this would allow overlapping names in the same scope, like:

impl Foo {
    fn foo(self, r#self: Bar) {
        println!("I have distinct {} and {} at the same time?", self, r#self);
    }
}

Maybe that's in fact OK, but if so it's at least a corner case to test...

(In general, foo and r#foo are supposed to refer to the same thing.)

@cuviper
Copy link
Member

cuviper commented Mar 8, 2018

@scottmcm I'd expect your foo!(union) to work, but not foo!(r#union). That is, macros will have to preserve the metadata whether a particular :ident is raw or not.

@Lymia
Copy link
Contributor

Lymia commented Mar 9, 2018

I've found some other unexpected places where raw identifiers might show up while implementing this. Should these be allowed?

  • #[r#struct]
  • macro_rules! foo { ($r#struct:expr) => { r#expr } }

There's some weirdness with the built-in procedural macros taking raw identifiers too:

  • concat_args!(r#abc, r#def)
  • format!("{struct}", r#struct = 0);

Also, libproc_macro seems to need to be able to deal with raw identifiers somehow too. I've been creating artificial Symbols with content like r#struct, but this doesn't seem like a very good solution since these Symbols wouldn't be used anywhere else. Any advice here?

@Manishearth
Copy link
Member

Procedural macros see them after lexing, so that will Just Work.

I don't think libproc_macro needs to know anything? Again, this is all after lexing.

I personally think it's fine to allow all those. Simplifies things.

@Lymia
Copy link
Contributor

Lymia commented Mar 9, 2018

libproc_macro has an unstable enum that represent a token: https://doc.rust-lang.org/nightly/proc_macro/enum.TokenNode.html

This needs to know about the new field in token::Ident to properly serialize/deserialize it. I don't know how much unstable stuff depends on it.

@Manishearth
Copy link
Member

Manishearth commented Mar 9, 2018 via email

@Lymia
Copy link
Contributor

Lymia commented Mar 9, 2018

That said, I don't know what that means with macros, since apparently this works:

macro_rules! foo {
    ($i:ident) => {
        $i Foo {
            x: u32,
            y: i32,
        }
    }
}

foo!(union);

Actually, checking on the playground, it looks like this isn't even unique to contextual keywords: https://play.rust-lang.org/?gist=98bba154f78cd9aba5838bf82ac2fbb4&version=stable

@Lymia
Copy link
Contributor

Lymia commented Mar 10, 2018

Hrm, another strange case that came up while writing tests:

Given this macro definition, which branch should test_macro!(r#a) match:

macro_rules! test_macro {
    (a) => { ... };
    (r#a) => { ... };
}

@Manishearth
Copy link
Member

@nikomatsakis ^ ?

Could even make this change based on the epoch. idk.

@Manishearth
Copy link
Member

macro matching is kinda-sorta-breakable already

@rfcbot
Copy link

rfcbot commented Jul 24, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Jul 24, 2018
@eddyb
Copy link
Member

eddyb commented Jul 31, 2018

You can't detect $foo properly in a macro_rules macro, but if you have a proc macro, it should work (at least now, maybe it has had issues in the past).

@rfcbot
Copy link

rfcbot commented Aug 3, 2018

The final comment period, with a disposition to merge, as per the review above, is now complete.

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Aug 3, 2018
@Mark-Simulacrum
Copy link
Member

I've nominated this mostly so that someone (lang team, maybe) can find or select someone to write up docs and stabilize this feature.

@cramertj cramertj self-assigned this Aug 9, 2018
@cramertj cramertj added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Aug 9, 2018
@cramertj
Copy link
Member

cramertj commented Aug 9, 2018

This is in need of a stabilization PR! There's a stabilization guide on the forge. This feature is already documented in the edition guide, so much of that documentation can probably be reused in the stabilization PRs. Please post here if you plan to take this issue! (I'll circulate it around and see if anyone wants to take it as a good first or third PR)

@alexreg
Copy link
Contributor

alexreg commented Aug 9, 2018

I'll have a go. This looks pretty straightforward.

@alexreg
Copy link
Contributor

alexreg commented Aug 9, 2018

When it comes to new documentation, is there anything that should be updated besides the Reference? I'm not sure it belongs in the Book.

kennytm added a commit to kennytm/rust that referenced this issue Aug 11, 2018
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril
kennytm added a commit to kennytm/rust that referenced this issue Aug 13, 2018
…mertj

Stabilise raw_identifiers feature

* [Reference PR](rust-lang/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes rust-lang#48589.

r? @cramertj
CC @cuviper @Centril
bors added a commit that referenced this issue Aug 21, 2018
Stabilise raw_identifiers feature

* [Reference PR](rust-lang/reference#395)
* [Book PR](rust-lang/book#1480)
* [Rust by Example PR](rust-lang/rust-by-example#1095)

Closes #48589.

r? @cramertj
CC @cuviper @Centril
@alexreg
Copy link
Contributor

alexreg commented Aug 21, 2018

Merged! I think some boxes can be ticked off now, @Centril. :-)

@alexreg
Copy link
Contributor

alexreg commented Aug 21, 2018

As for the unresolved questions:

  • Do macros need any special care with such identifier tokens?
    No, I can't see how this would affect macros. Perhaps @petrochenkov could second this, however.
  • Should diagnostics use the r# syntax when printing identifiers that overlap keywords?
    Yes, I think so, although possibly we should not do this if we're using an old edition and the keyword was only introduced in a later one.
  • Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
    I'm not sure of this. What do you mean by pub use old_epoch::*, @Centril?

@cuviper
Copy link
Member

cuviper commented Aug 21, 2018

Does rustdoc need to use the r# syntax? e.g. to document pub use old_epoch::*
I'm not sure of this. What do you mean by pub use old_epoch::*

For instance, if the old_epoch crate had a fn catch() which it could declare without bothering with raw identifiers, and a new_epoch crate used and re-exported it. The new crate would have had to use raw identifiers if it had declared that function itself. Should this affect the way rustdoc presents it?

@alexreg
Copy link
Contributor

alexreg commented Aug 21, 2018

@cuviper Right, makes sense. I think the rules should be the same as what I proposed for diagnostics, in that case.

@thatname
Copy link

thatname commented Apr 22, 2021

@Centril
I think it's still impossible for rust to reference external C++ symbols even with raw identifiers.
Because identifier names mangled by MS C++ ABI has characters such as '?', '@', which are not valid in rust identifiers.
For example, this is valid C++ code:

#ifdef _MSC_VER 
	// MS C++ ABI mangled identifier
	void* vtbl = __identifier("??_7BaseShader@pe3@@6B@");
#else
	// Itanium C++ ABI mangled name, all *nix toolchain use this.
	extern "C" void * _ZTVN3pe310BaseShaderE;
	void* vtbl = _ZTVN3pe310BaseShaderE ;
#endif

The __identifer("??_7BaseShader@pe3@@6B@") is a valid raw identifier in MS C++,
but r#??_7BaseShader@pe3@@6B@ is not valid in rust.
Will raw identifier be relaxed to accept these characters?

@Nemo157
Copy link
Member

Nemo157 commented Apr 22, 2021

Inter-language-operability was not a goal of this RFC, it was purely limited to inter-edition-operability within Rust. AFAIK #[link_name] should be enough for FFI purposes, there's no need to be able to have the identifiers in the Rust code identical to what is used to link to them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-RFC-implemented Blocker: Approved by a merged RFC and implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. finished-final-comment-period The final comment period is finished for this PR / Issue. P-high High priority T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests