RFC: Raw Identifiers #2151

Merged
merged 6 commits into from
Feb 27, 2018

cuviper
Member

cuviper commented Sep 14, 2017

Add a raw identifier format r#ident, so crates written in future
language epochs/versions can still use an older API that overlaps with
new keywords.
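For reference, here is a minimal sketch of the proposed `r#ident` form as it was eventually stabilized in Rust. `old_api` is a hypothetical module standing in for an older crate. `match` is used because it is a keyword in every edition, so the sketch also has to declare the function in raw form, whereas a real crate from an older epoch would have declared it plainly.

```rust
// Hypothetical stand-in for an older crate whose public API uses a word
// that is a keyword for the caller. `match` is a keyword in every edition,
// so even the declaration needs the raw form in this single-file sketch.
pub mod old_api {
    pub fn r#match(pattern: &str, text: &str) -> bool {
        text.contains(pattern)
    }
}

// The caller writes the raw form at the call site; the function's actual
// name is just `match`, the `r#` prefix is not part of it.
pub fn contains_hello(text: &str) -> bool {
    old_api::r#match("hello", text)
}
```

The caller never has to rename anything or wait for the older crate to publish a breaking release.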

@est31
Member

est31 commented Sep 14, 2017

Generally I'm in support of the RFC. However I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

There might also be a need for raw keywords in the other direction, e.g. so the
older epoch can still use the new catch functionality somehow. I think this
particular case is already served well enough by do catch { ... }, if we
choose to stabilize it that way.

In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

@SimonSapin
Contributor

To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

@cuviper
Member Author

cuviper commented Sep 14, 2017

@est31

However I think that the feature should only be available through a whitelist, where it's actually useful. So only enable it for the newly introduced keywords like catch. This means it can't be used in general.

I prefer generality myself. I could see having a lint for "unnecessarily raw identifier", but I see no reason to forbid this.

@SimonSapin

To clarify: this allows using as an identifier what would otherwise be a keyword, but does not change the set of characters allowed in identifiers, right? If so, that sounds fine.

Correct. Some of the discussed alternatives could allow extended characters, but that's not what I'm proposing. If some people do want extended characters, then we might want to choose a syntax that would allow that, even if we don't extend it initially.

@cuviper
Member Author

cuviper commented Sep 14, 2017

@est31

There might also be a need for raw keywords in the other direction, [...]

In fact in the VLA RFC we were wondering how to get [V; dyn N] syntax working in the current epoch. So this is relevant beyond just catch.

I dismissed the br# alternative as being unnecessary, but maybe it would work for this?
i.e. r#ident and br#keyword

@scottmcm added the T-lang label (relevant to the language team, which will review and decide on the RFC) Sep 14, 2017
@scottmcm
Member

I like not extending the identifier alphabet here.

the feature should only be available through a whitelist, where it's actually useful

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions. I want to be able to update my code to avoid a new-epoch keyword while still being able to compile it with the current stable that doesn't know about that keyword yet.

@est31
Member

est31 commented Sep 14, 2017

I worry that such a restriction would make it harder to write code that compiles on multiple compiler versions.

Epochs work differently. Any future compiler version will support the epoch of your code; that's what the epochs RFC guarantees. So if you say that your codebase uses the old epoch, you can freely use the identifier, and you are compatible with all future compilers. This will even be enforced in macros (macros will get epoch hygiene)! If you say that your codebase uses the new epoch, your crate can obviously only be compiled by compiler versions that support that epoch; this has nothing to do with the whitelist. But if you opt in to the new epoch, the whitelisted keywords will be available to you.

The only thing that a whitelist will make harder is wanting to be able to "support" multiple epochs, but this isn't really a legitimate real-world case IMO, because your code will always be in exactly one epoch, as you must explicitly specify it (except for the 2015 epoch, which is the default).

There is one use case where badly deployed whitelists would be an issue: when you are migrating code from one epoch to another and you are not doing it by invoking rustfix (even though rustfix is required to work with almost all code), it would show up as an error. This use case can very easily be fixed, though, simply by extending the whitelist in the old epoch as well.

@scottmcm
Member

I agree it's rare, but I don't think it deserves to be blocking. I'd be tempted to use r#catch in a Stack Overflow answer even in the 2015 epoch, for example. And targeting the preview epoch on nightly would want to be able to use r#throw before the keyword was added to the whitelist, if an RFC is accepted.

I do agree that a "unnecessary raw identifier" warning or clippy lint makes sense.

@egilburg

egilburg commented Sep 14, 2017

Backslashes could connote escaping identifiers, like \ident, perhaps surrounded like \ident\, {ident}, etc. However, the infix RFC #1579 currently seems to be leaning towards \op syntax already.

It doesn't seem like that RFC has a lot of traction. Backslashes are intuitive as "escape" characters. I feel just \ident is also more ergonomic than \ident\.

Seeing a letter prefix like r# seems to imply something more like literal casting, e.g. s"foo" as a hypothetical shorthand for "foo".to_string().

@cuviper
Member Author

cuviper commented Sep 14, 2017

@egilburg

Seeing a letter prefix like r# seems to imply more like literal casting.

It's meant to seem more like raw strings, e.g. r#foo is equivalent to foo, just like r"foo" and r#"foo"# are equivalent to "foo". And such raw strings already exist, unlike your hypothetical, but I do take the point that this wasn't intuitive to you.
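A small sketch of that equivalence: the binding is declared as `r#value` and then used as plain `value`, next to the raw-string forms the comment compares it to. `raw_forms_agree` is a hypothetical name used only for illustration.

```rust
// `r#value` and `value` name the same binding: the raw prefix is not part
// of the identifier, much as `r"..."` and `r#"..."#` yield the same string
// as an ordinary literal when no escapes are involved.
pub fn raw_forms_agree() -> bool {
    let r#value = 41; // declared with the raw prefix...
    let plain = value + 1; // ...and referred to without it

    let s1 = "foo";
    let s2 = r"foo";
    let s3 = r#"foo"#;

    plain == 42 && s1 == s2 && s2 == s3
}
```

Because `value` is not a keyword, the `r#` here is unnecessary but still legal, which is the case an "unnecessary raw identifier" lint would flag.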

@petrochenkov
Contributor

This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

@petrochenkov
Contributor

petrochenkov commented Sep 14, 2017

There is also a minor technical issue with raw identifiers - some logic in the compiler relies on keywords being unusable as item names.
For example, it would be pretty unfortunate if you could create a type named Self, self or super. Maybe there are other cases, but I can't recall them right away.
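For context, the restriction eventually adopted in rustc rejects `r#crate`, `r#self`, `r#super` and `r#Self`, while other keywords may be used in raw form. Below is a hypothetical sketch of that rule as a plain string check. `is_allowed_raw_ident` is an illustrative name, not a compiler API, and it assumes the text after `r#` has already been validated as identifier characters.

```rust
// Path-related keywords that rustc refuses to accept in raw form.
const NOT_RAW_ABLE: [&str; 4] = ["crate", "self", "super", "Self"];

// Hypothetical helper: given the text after `r#`, report whether the raw
// identifier would be accepted. Character validity is assumed to be
// checked elsewhere; only the empty string and the keyword list are tested.
pub fn is_allowed_raw_ident(name: &str) -> bool {
    !name.is_empty() && !NOT_RAW_ABLE.contains(&name)
}
```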

@est31
Member

est31 commented Sep 14, 2017

@petrochenkov's argument that standard library additions mean a similar amount of breakage has convinced me that this feature is not required. I think it's better to simply change the identifiers to not use keywords again, maybe forcing an API bump.

@burdges

burdges commented Sep 14, 2017

You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;, so long as the new keywords do not appear in use statements.
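With raw identifiers as they eventually landed, this renaming idea still works; the raw form is needed only once, in the `use` item. `old_crate` is a hypothetical local module standing in for the external crate, and the body of its `dyn` function is made up for the sketch.

```rust
// Hypothetical stand-in for `old_crate`, written when `dyn` was not reserved.
pub mod old_crate {
    pub fn r#dyn(x: u32) -> u32 {
        x * 2
    }
}

// Rename once at import time; after this line the keyword never appears.
use old_crate::r#dyn as old_crate_dyn;

pub fn doubled(x: u32) -> u32 {
    old_crate_dyn(x)
}
```

As cuviper's reply below points out, this only covers free items; methods and fields need another route.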

@cuviper
Member Author

cuviper commented Sep 14, 2017

@petrochenkov

This RFC tries to solve a problem that doesn't exist and won't exist if epochs are done in a responsible way.

catch specifically is a bad motivating example because catch as a context-dependent identifier has exactly zero breakage in practice, i.e. infinitely less breakage than routinely done by standard library additions.

AFAICS, catch is still explicitly mentioned as a motivator in the epochs RFC, along with the general desire for new keywords. If you think that there are reasonable rules for adding keywords without breaking epoch interoperability, then shouldn't that be spelled out in that RFC? (I confess I stopped reading that discussion a while ago though.)

@burdges

You'd import this old API via use statements, right? I'd think use statements could address this, like use old_crate::dyn as old_crate_dyn;

That's ok for free items, but you can't import associated items like methods this way. Maybe that can still use a UFCS form -- in the baseball example, you'd write Player::catch(&mut player, ball). I don't think there's any such workaround for struct fields though.

If new keywords are always considered identifiers in the context of paths (foo::catch) or fields/methods (foo.catch), then perhaps use-renaming can take care of the rest. I'm not sure.
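With raw identifiers, methods and struct fields need no special workaround: the raw form works in method-call syntax, in the UFCS form, and for field access. In this sketch `Player`, its `r#type` field and its `r#try` method are hypothetical. `try` stands in for `catch` because `try` is reserved in Rust 2018 and later, and the catch rule (even-numbered balls) is invented just to give observable state.

```rust
// Hypothetical baseball-style type whose method and field names are
// keywords for the caller (`try` from Rust 2018 on, `type` always).
pub struct Player {
    pub r#type: &'static str,
    pub caught: u32,
}

impl Player {
    // Arbitrary rule for the sketch: even-numbered balls are caught.
    pub fn r#try(&mut self, ball: u32) -> bool {
        let ok = ball % 2 == 0;
        if ok {
            self.caught += 1;
        }
        ok
    }
}

pub fn play() -> (u32, &'static str) {
    let mut player = Player { r#type: "catcher", caught: 0 };
    player.r#try(2); // method-call syntax
    Player::r#try(&mut player, 4); // UFCS form
    Player::r#try(&mut player, 5); // odd ball, not caught
    (player.caught, player.r#type) // field access uses the raw form too
}
```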

@burdges

burdges commented Sep 14, 2017

We're only worried about catch, dyn, and default right now, yes? And default must stay contextual anyways. We cannot add keywords forever regardless, not without driving away users.

I think perhaps the best solution might be prefixing each usage by an attribute #[epoch(...)], so typically #[epoch(...)] use old_crate::dyn as old_crate_dyn;

I doubt struct fields would be too problematic in practice, but methods could maybe be renamed with local inherent impls for traits:

impl<T: Player> T {
    fn old_catch(...) {  #[epoch(...)] <T as Player>::catch(...)  }
}

I suppose use syntax could maybe rename struct fields and methods if push really came to shove, but the attribute can handle them directly if that ever happens.

@withoutboats
Contributor

I could see limiting this to only reserved words, but limiting to only those reserved words which were introduced in an epoch seems unnecessary & potentially confusing for users who encounter this feature and don't know when each keyword was introduced. In general, we have taken a very free hand with the syntax and use lints, social conventions and rustfmt to keep everyone on the same page, and I don't see a reason to do things differently here.

This seems like a straightforward solution to a basic problem to me.

@kennytm
Member

kennytm commented Sep 15, 2017

One more alternative: C# allows bare Unicode escapes as part of identifier. (Very ugly, not recommending it, but still an alternative.)

class Class1
{
    static void M() {
        cl\u0061ss.st\u0061tic(true);
    }
}

(This "feature" is probably inspired by Java, but you can't define a keyword-identifier like this in Java.)

@eddyb
Member

eddyb commented Sep 16, 2017

Not necessarily an alternative, but Dart uses #ident (but also e.g. #+, to refer to operator+).

@cuviper
Member Author

cuviper commented Sep 17, 2017

OK, I noted Dart, but it looks like #ident would break macros-1.0 too.

@est31
Member

est31 commented Sep 17, 2017

@cuviper Not just that; I think people are also wondering whether to use them in macros 2.0 for escaping hygiene.

@eddyb
Member

eddyb commented Sep 18, 2017

@cuviper Hmm, so these are the official docs - but they don't mention # used with operators.
Anyway, I know # wouldn't work for Rust, but as I mentioned on the forums, r#+::r#+ could be a strange and interesting replacement for Add::add (not entirely serious suggestion).

@est31
Member

est31 commented Sep 19, 2017

This proposal reminds me of C/C++ trigraphs, which are on their way out with C++17. I'm sure that, like trigraphs, this feature will be used more by people who want to write confusing code than for its intended purpose... Also, I don't think it will do any good if cargo and the Rust compiler now switch to using r#crate everywhere instead of krate.

Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

@cuviper
Member Author

cuviper commented Sep 19, 2017

This proposal reminds me of C/C++ trigraphs which are on their way out with C++17.

Come on, r#ident is not anywhere near as obfuscating as trigraphs!

Also, I don't think that it will be of any good if cargo and the rust compiler now switch to using r#crate everywhere instead of krate.

The RFC explicitly recommends using alternatives like krate when possible.

Do you really have to modify the language and add a whole new way of referring to identifiers just because you are scared of implementing analysis of which identifiers are still free in rustfix?

I don't see how rustfix is relevant. The point is to have compatibility using older APIs that may not get updated, for whatever reason. Maybe said crate just doesn't want to make a breaking change to avoid the new keyword, maybe the maintainer is on holiday, etc.

This is just a means towards keeping Rust's overall compatibility goals.

@scottmcm
Member

This proposal reminds me of C/C++ trigraphs

This doesn't remind me of trigraphs in the slightest. Those are there for character sets without symbols used by the language, or for people who cannot type them. I agree we don't have that need.

Instead it reminds me of @class in C#, since different .Net languages can have different sets of keywords. Sure, people are discouraged from using certain things, but if you need to use them, you need to use them. And sometimes it leads to nice libraries, like how in Razor one can set HTML attributes with syntax like

new { style="max-width: 66ex", @class = "textcontent" }

It could tell them to use klass, but it's just as easy to tell them to use @class, and using klass for class there would prevent people from being able to set a klass attribute if they so wanted.
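A rough Rust analogue of the Razor example, assuming a hypothetical `InputAttrs` struct rendered by hand: the field for HTML's `type` attribute can be spelled `r#type`, so users are not forced onto a substitute name like `kind` that might collide with a genuine attribute of that name.

```rust
use std::fmt;

// Hypothetical attribute set for an HTML <input>. `type` is a keyword, so
// the field is declared raw; the field's name is still just `type`.
pub struct InputAttrs {
    pub r#type: &'static str,
    pub name: &'static str,
}

impl fmt::Display for InputAttrs {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Attribute names are written out explicitly here rather than
        // derived from the field names.
        write!(f, "type=\"{}\" name=\"{}\"", self.r#type, self.name)
    }
}
```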

@aturon removed the I-nominated label Feb 8, 2018
@SimonSapin
Contributor

The RFC as proposed does not change which characters can be used in an identifier. It only allows having identifiers that would otherwise be keywords. I’m not for or against this proposal.

I would be opposed to raw identifier allowing arbitrary characters. CSS does this, and it’s just nonsense. For example you can have a CSS custom property whose name is literally the ASCII space.

Serde already has #[serde(rename = "foo")] for names that are not valid Rust identifiers. For FFI we have #[link_name = "foo"].

@rfcbot added the final-comment-period label (will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised) and removed the proposed-final-comment-period label (currently awaiting signoff of all team members in order to enter the final comment period) Feb 14, 2018
@rfcbot
Collaborator

rfcbot commented Feb 14, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@cuviper
Member Author

cuviper commented Feb 15, 2018

I didn't get around to adding an alternative about just renaming/aliasing. Is that still wanted?
(Personally, I feel that approach is too limited to really address the issue.)

@burdges

burdges commented Feb 15, 2018

Is the macro like syntax ident!("name") unworkable for parser or hygiene reasons?

@cuviper
Member Author

cuviper commented Feb 15, 2018

@burdges AFAIK macros don't work in ident positions, which is why concat_idents! can't actually do much.
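Raw identifiers sidestep that limitation because they are not macros: the lexer produces `r#type` as a single identifier token, so an ordinary `macro_rules!` matcher with a `$name:ident` fragment captures it like any other identifier. The `getter!` macro and `Config` type below are hypothetical.

```rust
// Generates a getter method whose name is whatever identifier is passed in,
// including a raw identifier such as `r#type`.
macro_rules! getter {
    ($name:ident, $field:ident, $ty:ty) => {
        pub fn $name(&self) -> $ty {
            self.$field
        }
    };
}

pub struct Config {
    kind: u8,
}

impl Config {
    // Expands to `pub fn r#type(&self) -> u8 { self.kind }`.
    getter!(r#type, kind, u8);
}
```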

@bstrie
Contributor

bstrie commented Feb 21, 2018

Re: syntax, just go with \foo. The symmetry with escaping is obvious and avoids the unjustified construction of r#foo (which, additionally, nobody here seems to be fond of even the slightest bit). The only objection given in the text is that a different RFC--one that will almost certainly never be accepted--had considered possibly using that syntax. This RFC is profoundly more important than that one (though FFI ought to be the primary motivation, rather than epoch breakage).

Re: extending this feature to putting arbitrary Unicode in identifiers: don't. That's a subject for its own RFC and its own bikeshed. Be maximally conservative here.

@est31
Member

est31 commented Feb 21, 2018

\foo is not ugly enough. This needs to be maximally ugly and minimally useful in order to be a strong enough deterrent. Just go with r#foo as well as a lint that checks for usage of the feature outside of a whitelist of recently introduced or to be introduced keywords.

@bstrie
Contributor

bstrie commented Feb 21, 2018

@est31 Though there do exist features that ought to be made syntactically ugly in order to discourage their use, this isn't one of them. There is nothing dangerous whatsoever about this feature, and nothing useful about it that risks overuse (or any use at all) except in unfortunate circumstances that will require users to use it. Any argument that people will deliberately use this to obfuscate their code is even more damning of r#foo, because that is more obfuscatory than \foo. Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility. And let's not give people any more reason than necessary to look at a random unfamiliar piece of Rust code and wonder, "what the hell could this possibly mean?".

@Ixrec
Contributor

Ixrec commented Feb 21, 2018

Maybe this is just me, but if I had no idea Rust had a raw identifiers feature, and I saw r#foo or \foo in the wild, I'd probably be able to guess what r# was doing while \ I wouldn't even dare to guess. The "symmetry with escaping" is not obvious for me, because in every other language I know the very notion of "escaping" is unique to string/regex literals. When I see \foo my first thought is of Haskell's lambda syntax (and I've hardly ever used Haskell). On the other hand, any "r and a sigil" syntax immediately reminds me of things like C++'s raw string literals, which is "raw" in the same sense that raw identifiers are raw.

I don't object to the notion that r# is "ugly", but I do object to the notion that it's (in any non-purely-subjective sense) more obfuscating or more of a penalty than \ is. For me it's quite the opposite.

@petrochenkov
Contributor

petrochenkov commented Feb 21, 2018

If this feature has to happen, I'd rather use r#ident# that is fully symmetrical with raw strings and potentially extensible (to $ in identifiers, for example, or something else).

@est31
Member

est31 commented Feb 21, 2018

Let's not penalize people who are already being penalized by being forced to annotate their code for the sake of forwards compatibility.

Have you even read the epochs RFC? Code under the old epoch will always compile. If you switch epochs, this can already be seen by some as a semver-breaking change as most likely you are switching the minimum supported rustc version (it disrupts anyone stuck on an old compiler, so it is a breaking change!). So do it properly and just replace all the idents you have with proper new names for them.

@aturon
Member

aturon commented Feb 21, 2018

@est31

Have you even read the epochs RFC?

Please tone it down.

While @bstrie is incorrect about the detail you mention, the rest of his post I think presents a fine argument for the proposed syntax (or something close to it).

@burdges

burdges commented Feb 21, 2018

I think \foo is problematic because \ has no similar meaning in other languages. Instead \ gets used for lambda expressions, set difference, left actions, shorter escapes, etc. Rust might want it for infix symbols or whatever. There is a vastly weaker but similar argument against using #, but afaik no such argument against using $ since Rust is not an ML style language.

I still personally like both use whatever::"ident" as my_ident; and ident!("name"), but those only work for importing strange symbols, not exporting them, which maybe people wish to do. We could have both ident!("ident") for rarely used imports and def_ident!(my_ident,"ident"); for both frequently used imports and exports though.

Aside from r$ident$ or r#ident#, one could even imagine $my_ident where previously const my_ident : &'static str = "ident";

Also, we'll need to use these identifiers anyways, right? Assuming so, a use wherever::"ident" as my_ident; makes good sense. Is it too strange to use self::"ident" as my_ident; for exporting?

@eddyb
Member

eddyb commented Feb 23, 2018

IMO we should be using \ for escaping tokens in macros, at the very least.
E.g. \$$name:ident matching $foo and $($x:ident)\++ using + as a separator.

@Centril
Contributor

Centril commented Feb 24, 2018

Linking #1579 with respect to using a \dot b for infix method syntax vs. using it for raw identifiers.

@rfcbot
Collaborator

rfcbot commented Feb 24, 2018

The final comment period is now complete.

@twmb

twmb commented Feb 25, 2018

If there isn't, can there be a page for why words are reserved, and why these reserved words cannot be contextual?

@Centril merged commit 0574612 into rust-lang:master Feb 27, 2018
@Centril
Contributor

Centril commented Feb 27, 2018

Huzzah! The RFC is merged!

Tracking issue: rust-lang/rust#48589

@ssokolow

ssokolow commented Feb 28, 2018

I don't know how I missed both the initial announcement of this and the FCP call, but I just have to share a perspective on the intuitiveness of \ident vs. r#ident that I didn't see from anyone else.

As someone whose experience is more or less exclusively in imperative languages, whenever I see \ being used outside a literal DOS/Windows path, I have a strong expectation that it is a fixed-width token, consisting of the slash and the character which follows... so when I see \ident, I can't help but expect it to produce error: unknown character escape: i.

By contrast, r#ident gives me the impression that # is being used as some kind of namespacing operator similar to ::... which is essentially correct. You're conceptually unifying reserved words and identifiers and then restricting to the identifier side to override the default precedence... The only difference is that, instead of precedence being resolved at the level of "an identifier node in the AST", you're operating more at the level of translating tokens into AST nodes.
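That reading, with `r#` acting at the point where word tokens become keywords or identifiers, can be sketched as a tiny classifier. This is illustrative only: `classify`, `Token` and the three-entry keyword table are hypothetical, and the sketch omits the compiler's rejection of forms like `r#self`.

```rust
#[derive(Debug, PartialEq)]
pub enum Token<'a> {
    Keyword(&'a str),
    Ident(&'a str),
}

// A deliberately tiny keyword table; the real list is much longer.
const KEYWORDS: [&str; 3] = ["match", "type", "fn"];

// Classify one word-like token. By default a reserved word wins over an
// identifier; the `r#` prefix overrides that and always yields an
// identifier, with the prefix stripped since it is not part of the name.
pub fn classify<'a>(word: &'a str) -> Token<'a> {
    if let Some(rest) = word.strip_prefix("r#") {
        Token::Ident(rest)
    } else if KEYWORDS.contains(&word) {
        Token::Keyword(word)
    } else {
        Token::Ident(word)
    }
}
```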

Labels: A-resolve (proposals relating to name resolution), A-syntax (syntax-related proposals and ideas), final-comment-period (will be merged/postponed/closed in ~10 calendar days unless new substantial objections are raised), T-lang (relevant to the language team, which will review and decide on the RFC)