RFC: Raw Identifiers #2151

cuviper · 2017-09-14T17:04:13Z

Add a raw identifier format r#ident, so crates written in future
language epochs/versions can still use an older API that overlaps with
new keywords.

(rendered)
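
A minimal sketch of what the proposed syntax looks like in use. The module `old_api` and its function are hypothetical, and `try` merely stands in for a word that became reserved after the older API was published:

```rust
// Hypothetical older API whose function name later became a reserved word.
mod old_api {
    // The definition is spelled with the raw form so it parses even where
    // `try` is reserved.
    pub fn r#try(x: i32) -> i32 {
        x + 1
    }
}

// Newer code reaches the keyword-named function through its raw identifier.
fn call_old_api(x: i32) -> i32 {
    old_api::r#try(x)
}

fn main() {
    println!("{}", call_old_api(41));
}
```

The `r#` prefix is not part of the name: it only tells the lexer to treat the following word as an identifier rather than a keyword.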

est31 · 2017-09-14T17:24:19Z

Generally I'm in support of the RFC. However, I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

There might also be a need for raw keywords in the other direction, e.g. so the
older epoch can still use the new catch functionality somehow. I think this
particular case is already served well enough by do catch { ... }, if we
choose to stabilize it that way.

In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

SimonSapin · 2017-09-14T17:25:23Z

To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

cuviper · 2017-09-14T17:35:07Z

@est31

However, I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

I prefer generality myself. I could see having a lint for "unnecessarily raw identifier", but I see no reason to forbid this.

@SimonSapin

To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

Correct. Some of the discussed alternatives could allow extended characters, but that's not what I'm proposing. If some people do want extended characters, then we might want to choose a syntax that would allow that, even if we don't extend it initially.

cuviper · 2017-09-14T17:38:56Z

@est31

There might also be a need for raw keywords in the other direction, [...]

In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

I dismissed the br# alternative as being unnecessary, but maybe it would work for this?
i.e. r#ident and br#keyword

scottmcm · 2017-09-14T17:46:08Z

I like not extending the identifier alphabet here.

the feature should only be available through a whitelist, where it's actually useful

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions. I want to be able to update my code to avoid a new-epoch keyword while still being able to compile it with the current stable that doesn't know about that keyword yet.

est31 · 2017-09-14T18:03:17Z

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions.

Epochs work differently. Any future compiler version will support the epoch of your code, that's what the epochs RFC guarantees. So if you say that your codebase uses the old epoch, you can freely use the identifier, and you are compatible with all future compilers. This will be even enforced in macros (macros will get epoch hygiene)! If you say that your codebase uses the new epoch, your crate can obviously only be compiled by compiler versions that support that epoch, this has nothing to do with the whitelist. But if you opt in to the new epoch, the whitelisted keywords will be available to you.

The only thing that a whitelist will make harder is wanting to be able to "support" multiple epochs, but this isn't really a legitimate real-world case IMO because your code will always be in exactly one epoch, as you must explicitly specify it (except for the 2015 epoch, which is the default).

There is one use case where badly deployed whitelists would be an issue: when you are migrating code from one epoch to another, and you are not doing it by invoking rustfix (despite rustfix being required to work with almost all code), it would show up as error. This use case can very easily be fixed though, simply by extending the whitelist in the old epoch as well.

scottmcm · 2017-09-14T19:18:18Z

I agree it's rare, but I don't think it deserves to be blocking. I'd be tempted to use r#catch in a Stack Overflow answer even in the 2015 epoch, for example. And targeting the preview epoch on nightly would want to be able to use r#throw before the keyword was added to the whitelist, if an RFC is accepted.

I do agree that a "unnecessary raw identifier" warning or clippy lint makes sense.

egilburg · 2017-09-14T20:50:32Z

Backslashes could connote escaping identifiers, like \ident, perhaps surrounded like \ident\, {ident}, etc. However, the infix RFC #1579 currently seems to be leaning towards \op syntax already.

It doesn't seem like that RFC has a lot of traction. Backslashes are intuitive as "escape" characters. I feel just \ident is also more ergonomic than \ident\.

Seeing a letter prefix like r# seems to imply more like literal casting. E.g. s"foo" as hypothetical shorthand of "foo".to_string()

cuviper · 2017-09-14T21:03:12Z

@egilburg

Seeing a letter prefix like r# seems to imply more like literal casting.

It's meant to seem more like raw strings, e.g. r#foo is equivalent to foo, just like r"foo" and r#"foo"# are equivalent to "foo". And such raw strings already exist, unlike your hypothetical, but I do take the point that this wasn't intuitive to you.
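
The analogy can be sketched as follows. This is only an illustration of the equivalence described here; the function names are made up for the example:

```rust
// Raw strings: all three spellings denote the same string value.
fn raw_strings_match() -> bool {
    "foo" == r"foo" && "foo" == r#"foo"#
}

// Raw identifiers: `r#foo` names the same binding as plain `foo`.
fn raw_ident_reads_plain_binding() -> i32 {
    let foo = 5;
    r#foo
}

fn main() {
    assert!(raw_strings_match());
    assert_eq!(raw_ident_reads_plain_binding(), 5);
}
```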

petrochenkov · 2017-09-14T21:21:13Z

This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

petrochenkov · 2017-09-14T21:26:03Z

There is also a minor technical issue with raw identifiers - some logic in the compiler relies on keywords being unusable as item names.
For example, it would be pretty unfortunate if you could create a type named Self, self or super. Maybe there are other cases, but I can't recall them right away.

est31 · 2017-09-14T21:28:24Z

@petrochenkov's argument that standard library additions mean a similar amount of breakage has convinced me that this feature is not required. I think it's better to simply change the identifiers to not use keywords again, maybe forcing an API bump.

burdges · 2017-09-14T21:39:40Z

You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;, so long as the new keyword is not treated as a keyword inside use statements.

cuviper · 2017-09-14T22:05:57Z

@petrochenkov

This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

AFAICS, catch is still explicitly mentioned as a motivator in the epochs RFC, along with the general desire for new keywords. If you think that there are reasonable rules for adding keywords without breaking epoch interoperability, then shouldn't that be spelled out in that RFC? (I confess I stopped reading that discussion a while ago though.)

@burdges

You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;

That's ok for free items, but you can't import associated items like methods this way. Maybe that can still use a UFCS form -- in the baseball example, you'd write Player::catch(&mut player, ball). I don't think there's any such workaround for struct fields though.

If new keywords are always considered identifiers in the context of paths (foo::catch) or fields/methods (foo.catch), then perhaps use-renaming can take care of the rest. I'm not sure.
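
A rough sketch of the three cases, written with the proposed raw identifiers. The module `old_api` and its items are hypothetical, and existing keywords (`match`, `move`, `loop`) stand in for a keyword added later:

```rust
// Hypothetical older API that uses keyword-like names in three positions.
mod old_api {
    // Free function.
    pub fn r#match() -> u32 {
        1
    }

    pub struct Player {
        // Struct field.
        pub r#loop: u32,
    }

    impl Player {
        // Method.
        pub fn r#move(&mut self, ball: u32) {
            self.r#loop += ball;
        }
    }
}

// Free items can be renamed once at the import site...
use self::old_api::r#match as old_match;
use self::old_api::Player;

fn demo() -> (u32, u32) {
    // ...but fields still have to be named at every use.
    let mut player = Player { r#loop: 0 };
    // Path-style call, as in the UFCS form mentioned above.
    Player::r#move(&mut player, 2);
    // Ordinary method-call syntax.
    player.r#move(3);
    (old_match(), player.r#loop)
}

fn main() {
    println!("{:?}", demo());
}
```

Only the free function benefits from use-renaming; the field and the method need the escaped name wherever they appear.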

burdges · 2017-09-14T22:40:28Z

We're only worried about catch, dyn, and default right now, yes? And default must stay contextual anyways. We cannot add keywords forever regardless, not without driving away users.

I think perhaps the best solution might be prefixing each usage by an attribute #[epoch(...)], so typically #[epoch(...)] use old_crate::dyn as old_crate_dyn;

I doubt struct fields would be too problematic in practice, but methods could maybe be renamed with local inherent impls for traits:

impl<T: Player> T {
    fn old_catch(...) {  #[epoch(...)] <T as Player>::catch(...)  }
}

I suppose use syntax could maybe rename struct fields and methods if push really came to shove, but the attribute can handle them directly if that ever happens.

withoutboats · 2017-09-14T22:52:20Z

I could see limiting this to only reserved words, but limiting to only those reserved words which were introduced in an epoch seems unnecessary & potentially confusing for users who encounter this feature and don't know when each keyword was introduced. In general, we have taken a very free hand with the syntax and use lints, social conventions and rustfmt to keep everyone on the same page, and I don't see a reason to do things differently here.

This seems like a straightforward solution to a basic problem to me.

kennytm · 2017-09-15T03:41:58Z

One more alternative: C# allows bare Unicode escapes as part of an identifier. (Very ugly, not recommending it, but still an alternative.)

class Class1
{
    static void M() {
        cl\u0061ss.st\u0061tic(true);
    }
}

(This "feature" is probably inspired by Java, but you can't define a keyword-identifier like this in Java.)

eddyb · 2017-09-16T06:02:34Z

Not necessarily an alternative, but Dart uses #ident (but also e.g. #+, to refer to operator+).

cuviper · 2017-09-17T22:32:29Z

OK, I noted Dart, but it looks like #ident would break macros-1.0 too.

est31 · 2017-09-17T22:49:57Z

@cuviper Not just that: I think people are also wondering whether to use them in macros 2.0 for escaping hygiene.

eddyb · 2017-09-18T03:58:29Z

@cuviper Hmm, so these are the official docs - but they don't mention # used with operators.
Anyway, I know # wouldn't work for Rust, but as I mentioned on the forums, r#+::r#+ could be a strange and interesting replacement for Add::add (not an entirely serious suggestion).

est31 · 2017-09-19T19:04:30Z

This proposal reminds me of C/C++ trigraphs which are on their way out with C++17. I'm sure like trigraphs this feature will be used more by people who want to write confusing code than for its actually intended purpose... Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

cuviper · 2017-09-19T19:32:24Z

This proposal reminds me of C/C++ trigraphs which are on their way out with C++17.

Come on, r#ident is not anywhere near as obfuscating as trigraphs!

Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

The RFC explicitly recommends using alternatives like krate when possible.

Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

I don't see how rustfix is relevant. The point is to have compatibility using older APIs that may not get updated, for whatever reason. Maybe said crate just doesn't want to make a breaking change to avoid the new keyword, maybe the maintainer is on holiday, etc.

This is just a means towards keeping Rust's overall compatibility goals.

scottmcm · 2017-09-19T20:15:58Z

This proposal reminds me of C/C++ trigraphs

This doesn't remind me of trigraphs in the slightest. Those are there for character sets without symbols used by the language, or for people who cannot type them. I agree we don't have that need.

Instead it reminds me of @class in C#, since different .Net languages can have different sets of keywords. Sure, people are discouraged from using certain things, but if you need to use them, you need to use them. And sometimes it leads to nice libraries, like how in Razor one can set HTML attributes with syntax like

new { style="max-width: 66ex", @class = "textcontent" }

It could tell them to use klass, but it's just as easy to tell them to use @class, and using klass for class there would prevent people from being able to set a klass attribute if they so wanted.

SimonSapin · 2018-02-10T22:30:12Z

The RFC as proposed does not change which characters can be used in an identifier. It only allows having identifiers that would otherwise be keywords. I’m not for or against this proposal.

I would be opposed to raw identifier allowing arbitrary characters. CSS does this, and it’s just nonsense. For example you can have a CSS custom property whose name is literally the ASCII space.

Serde already has #[serde(rename = "foo")] for names that are not valid Rust identifiers. For FFI we have #[link_name = "foo"].

rfcbot · 2018-02-14T22:15:54Z

🔔 This is now entering its final comment period, as per the review above. 🔔

cuviper · 2018-02-15T00:03:41Z

I didn't get around to adding an alternative about just renaming/aliasing. Is that still wanted?
(Personally, I feel that approach is too limited to really address the issue.)

burdges · 2018-02-15T01:07:35Z

Is the macro like syntax ident!("name") unworkable for parser or hygiene reasons?

cuviper · 2018-02-15T01:14:24Z

@burdges AFAIK macros don't work in ident positions, which is why concat_idents! can't actually do much.
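
To illustrate the limitation: a macro invocation cannot stand where only a name is expected (for example between `fn` and the parameter list), so a macro has to expand to the whole item instead. A small sketch, with a made-up macro name, which also shows a raw identifier being passed as an `ident` fragment:

```rust
// The macro emits a complete function item; it cannot emit just the name.
macro_rules! define_const_fn {
    ($name:ident, $value:expr) => {
        fn $name() -> u32 {
            $value
        }
    };
}

// A raw identifier is accepted wherever an `ident` fragment is expected.
define_const_fn!(r#match, 7);

fn call_generated() -> u32 {
    r#match()
}

fn main() {
    println!("{}", call_generated());
}
```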

bstrie · 2018-02-21T10:07:43Z

Re: syntax, just go with \foo. The symmetry with escaping is obvious and avoids the unjustified construction of r#foo (which, additionally, nobody here seems to be fond of even the slightest bit). The only objection given in the text is that a different RFC--one that will almost certainly never be accepted--had considered possibly using that syntax. This RFC is profoundly more important than that one (though FFI ought to be the primary motivation, rather than epoch breakage).

Re: extending this feature to putting arbitrary Unicode in identifiers: don't. That's a subject for its own RFC and its own bikeshed. Be maximally conservative here.

est31 · 2018-02-21T13:48:45Z

\foo is not ugly enough. This needs to be maximally ugly and minimally useful in order to be a strong enough deterrent. Just go with r#foo as well as a lint that checks for usage of the feature outside of a whitelist of recently introduced or to be introduced keywords.

bstrie · 2018-02-21T19:28:12Z

@est31 Though there do exist features that ought to be made syntactically ugly in order to discourage their use, this isn't one of them. There is nothing dangerous whatsoever about this feature, and nothing useful about it that risks overuse (or any use at all) except in unfortunate circumstances that will require users to use it. Any argument that people will deliberately use this to obfuscate their code is even more damning of r#foo, because that is more obfuscatory than \foo. Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility. And let's not give people any more reason than necessary to look at a random unfamiliar piece of Rust code and wonder, "what the hell could this possibly mean?".

Ixrec · 2018-02-21T19:40:37Z

Maybe this is just me, but if I had no idea Rust had a raw identifiers feature, and I saw r#foo or \foo in the wild, I'd probably be able to guess what r# was doing while \ I wouldn't even dare to guess. The "symmetry with escaping" is not obvious for me, because in every other language I know the very notion of "escaping" is unique to string/regex literals. When I see \foo my first thought is of Haskell's lambda syntax (and I've hardly ever used Haskell). On the other hand, any "r and a sigil" syntax immediately reminds me of things like C++'s raw string literals, which is "raw" in the same sense that raw identifiers are raw.

I don't object to the notion that r# is "ugly", but I do object to the notion that it's (in any non-purely-subjective sense) more obfuscating or more of a penalty than \ is. For me it's quite the opposite.

petrochenkov · 2018-02-21T19:44:39Z

If this feature has to happen, I'd rather use r#ident# that is fully symmetrical with raw strings and potentially extensible (to $ in identifiers, for example, or something else).

est31 · 2018-02-21T22:15:16Z

Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility.

Have you even read the epochs RFC? Code under the old epoch will always compile. If you switch epochs, this can already be seen by some as a semver-breaking change as most likely you are switching the minimum supported rustc version (it disrupts anyone stuck on an old compiler, so it is a breaking change!). So do it properly and just replace all the idents you have with proper new names for them.

aturon · 2018-02-21T22:19:14Z

@est31

Have you even read the epochs RFC?

Please tone it down.

While @bstrie is incorrect about the detail you mention, the rest of his post I think presents a fine argument for the proposed syntax (or something close to it).

burdges · 2018-02-21T23:02:58Z

I think \foo is problematic because \ has no similar meaning in other languages. Instead \ gets used for lambda expressions, set difference, left actions, shorter escapes, etc. Rust might want it for infix symbols or whatever. There is a vastly weaker but similar argument against using #, but afaik no such argument against using $ since Rust is not an ML style language.

I still personally like both use whatever::"ident" as my_ident; and ident!("name"), but those only work for importing strange symbols, not exporting them, which maybe people wish to do. We could have both ident!("ident") for rarely used imports and def_ident!(my_ident,"ident"); for both frequently used imports and exports though.

Aside from r$ident$ or r#ident#, one could even imagine $my_ident where previously const my_ident : &'static str = "ident";

Also, we'll need to use these identifiers anyways, right? Assuming so, a use wherever::"ident" as my_ident; makes good sense. Is it too strange to use self::"ident" as my_ident; for exporting?

eddyb · 2018-02-23T23:10:59Z

IMO we should be using \ for escaping tokens in macros, at the very least.
E.g. \$$name:ident matching $foo and $($x:ident)\++ using + as a separator.

Centril · 2018-02-24T03:35:04Z

Linking #1579 with respect to using a \dot b for infix method syntax vs. using it for raw identifiers.

rfcbot · 2018-02-24T22:18:07Z

The final comment period is now complete.

twmb · 2018-02-25T03:11:15Z

If there isn't, can there be a page for why words are reserved, and why these reserved words cannot be contextual?

Centril · 2018-02-27T18:54:43Z

Huzzah! The RFC is merged!

Tracking issue: rust-lang/rust#48589

ssokolow · 2018-02-28T05:19:27Z

I don't know how I missed both the initial announcement of this and the FCP call, but I just have to share a perspective on the intuitiveness of \ident vs. r#ident that I didn't see from anyone else.

As someone whose experience is more or less exclusively in imperative languages, whenever I see \ being used outside a literal DOS/Windows path, I have a strong expectation that it is a fixed-width token, consisting of the slash and the character which follows... so when I see \ident, I can't help but expect it to produce error: unknown character escape: i.

By contrast, r#ident gives me the impression that # is being used as some kind of namespacing operator similar to ::... which is essentially correct. You're conceptually unifying reserved words and identifiers and then restricting to the identifier side to override the default precedence... the only difference is that, instead of precedence being within the resolution of "an identifier node in the AST" level, you're operating more at the level of translating tokens into AST nodes.

RFC: Raw Identifiers

3f9a0f5

Add a raw identifier format `r#ident`, so crates written in future language epochs/versions can still use an older API that overlaps with new keywords.

scottmcm added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 14, 2017

cuviper added 2 commits September 15, 2017 12:45

mention br#keyword possibility

afcb41e

add a couple more references to other languages

b602bf0

note Dart's #ident

289c6f8

Use a better Dart link

935feba

aturon removed the I-nominated label Feb 8, 2018

rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Feb 14, 2018

Centril mentioned this pull request Feb 27, 2018

Tracking issue for RFC 2151, Raw Identifiers rust-lang/rust#48589

Closed

7 tasks

RFC 2151

d049d6c

Centril merged commit 0574612 into rust-lang:master Feb 27, 2018

Centril mentioned this pull request Jul 13, 2018

Hygiene opt-out (escaping) for declarative macros 2.0 #2498

Closed

Centril added A-syntax Syntax related proposals & ideas A-resolve Proposals relating to name resolution. labels Nov 23, 2018

cuviper mentioned this pull request May 2, 2020

Add the experimental_keywords ability #2919

Closed

ehuss mentioned this pull request Jun 4, 2024

keyword-idents-2024 unable to migrate loop labels or lifetimes rust-lang/rust#125986

Closed
