Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review proc macro API 1.2 #50473

Merged
merged 7 commits into from
May 16, 2018
Merged

Review proc macro API 1.2 #50473

merged 7 commits into from
May 16, 2018

Conversation

petrochenkov
Copy link
Contributor

@petrochenkov petrochenkov commented May 6, 2018

cc #38356

Summary of applied changes:

  • Documentation for proc macro API 1.2 is expanded.
  • Renamed APIs: Term -> Ident, TokenTree::Term -> TokenTree::Ident, Op -> Punct, TokenTree::Op -> TokenTree::Punct, Op::op -> Punct::as_char.
  • Removed APIs: Ident::as_str, use Display impl for Ident instead.
  • New APIs (not stabilized in 1.2): Ident::new_raw for creating a raw identifier (I'm not sure new_x it's a very idiomatic name though).
  • Runtime changes:
    • Punct::new now ensures that the input char is a valid punctuation character in Rust.
    • Ident::new ensures that the input str is a valid identifier in Rust.
    • Lifetimes in proc macros are now represented as two joint tokens - Punct('\'', Spacing::Joint) and Ident("lifetime_name_without_quote") similarly to multi-character operators.
  • Stabilized APIs: None yet.

A bit of motivation for renaming (although it was already stated in the review comments):

  • With my compiler frontend glasses on Ident is the single most appropriate name for this thing, especially if we are doing input validation on construction. TokenTree::Ident effectively wraps token::Ident or ast::Ident + is_raw, its meaning is "identifier" and it's already named ident in declarative macros.
  • Regarding Punct, the motivation is that Op is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named it Punct (first proposed in the original RFC, then by @SimonSapin) , together with input validation it's now a subset of ASCII punctuation character category (u8::is_ascii_punctuation).

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 6, 2018
@petrochenkov
Copy link
Contributor Author

@dtolnay @alexcrichton @eddyb @nrc
Could you go through the comments marked with "REVIEW" and tell what you think?

@bors
Copy link
Contributor

bors commented May 6, 2018

☔ The latest upstream changes (presumably #50453) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -207,7 +236,7 @@ pub mod token_stream {

/// `quote!(..)` accepts arbitrary tokens and expands into a `TokenStream` describing the input.
/// For example, `quote!(a + b)` will produce a expression, that, when evaluated, constructs
/// the `TokenStream` `[Word("a"), Op('+', Alone), Word("b")]`.
/// the `TokenStream` `[Word("a"), Punct('+', Alone), Word("b")]`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Word is now Term, right?

/// An identifier or lifetime identifier.
Ident(Ident),
/// A single punctuation character (`+`, `,`, `$`, etc.).
Punct(Punct),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I didn't expect these changes. I should leave most of this up to @alexcrichton (I do not have a strong preference either way).

'^', '&', '|', '@', '.', ',', ';', ':', '#', '$', '?'];
if !LEGAL_CHARS.contains(&ch) {
panic!("unsupported character `{:?}`", ch)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation seems really nice, I think previously it's been deferred until conversion to the compiler-internal forms is needed.

#[derive(Copy, Clone, Debug)]
#[unstable(feature = "proc_macro", issue = "38356")]
pub struct Term {
pub struct Ident {
// REVIEW(INTERNAL) Symbol + Span is actually `ast::Ident`! We can use it here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that this would not cause issues with #49219.

@@ -1199,7 +1300,7 @@ impl TokenTree {
'#' => Pound,
'$' => Dollar,
'?' => Question,
_ => panic!("unsupported character {}", op),
_ => panic!("unsupported character {}", ch),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this become unreachable!() now? As only a finite set of characters can be used.

/// An identifier (`ident`) or lifetime identifier (`'ident`).
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Ident` to be `Copy`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do keep it Copy, that means we can never soundly offer an &self -> &str API. See #38356 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't mind if we remove the Copy here.

@shepmaster shepmaster added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 6, 2018
@petrochenkov petrochenkov changed the title [WIP] Review proc macro API 1.2 Review proc macro API 1.2 May 7, 2018
@petrochenkov petrochenkov added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 7, 2018
@alexcrichton
Copy link
Member

All the changes here look great to me, thanks @petrochenkov! While we've already FCP'd for stability in #38356 I think the Ident name is probably better than Term and I'm personally ambivalent on Op or Punct, but am happy to go with Punct.

I'd prefer to land this all at once if possible to reduce churn in the ecosystem, so I'd be mostly in favor of going ahead and stabilizing everything in this PR (if you're ok with that). I don't mind doing the work myself for the APIs here and pushing those up (annotation-wise). Along those lines, want to remove as_str as well from this?

/// May fail for a number of reasons, for example, if the string contains unbalanced delimiters
/// or characters not existing in the language.
///
/// REVIEW The function actually panics on any error and never returns `LexError`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this doesn't match the original implementation, when we first started we were sure to return an error I thought...

In any case I'd think it's a bug that this panics, it should definitely be returning an error (but we can fix that later)

/// Creates a token stream containing a single token tree.
///
/// REVIEW We don't generally have impls `From<T> for Collection<T>`, but I see why this exists
/// REVIEW from practical point of view.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is currently the main way to go from a TokenTree to a TokenStream, and I think we also accidentally stabilized it already so we're required to keep it.

@@ -227,6 +256,9 @@ pub fn quote_span(span: Span) -> TokenStream {
}

/// A region of source code, along with macro expansion information.
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Span` to be `Copy`? Yes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we're likely to always want Span to be Copy

TokenTree::Literal(ref tt) => tt.fmt(f),
}
}
}

/// REVIEW the impls below are kind of `From<T> for Option<T>`, not strictly necessary,
/// REVIEW but convenient. No harm, I guess. I'd actually like to see impls
/// REVIEW `From<Group/Ident/Punct/Literal> for TokenStream` to avoid stuttering like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect impls like these will grow and come to exist over time yeah, that's definitely the direction that libstd has taken.

/// forms of `Spacing` returned.
///
/// REVIEW ATTENTION: `Copy` impl on a struct with private fields.
/// REVIEW Do we want to guarantee `Punct` to be `Copy`?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is less critical than Span to be copy, and wouldn't mind going either way here. Making this not-Copy is probably the most conservative thing to do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's go with no Copy.

/// which can be further configured with the `set_span` method below.
///
/// REVIEW Why we even use `char` here? There's no reason to use unicode here.
/// REVIEW I guess because it's more convenient to write `new('+')` than `new(b'+')`, that's ok.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let's stick to char for maximal future flexibility.

///
/// REVIEW Again, there's no need for unicode here,
/// REVIEW except for maybe future compatibility in case Rust turns into APL,
/// REVIEW but if it's more convenient to use `char` then that's okay.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for future flexibility I think char's the way to go here

dtolnay
dtolnay previously requested changes May 7, 2018
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we decided against TokenTree::Ident was because the matcher $i:ident does not accept lifetimes. I would be opposed to calling it Ident here if it can hold lifetimes and $i:ident cannot.

@petrochenkov
Copy link
Contributor Author

petrochenkov commented May 7, 2018

@dtolnay
I'd be totally okay with accepting lifetimes in $i: ident (with or without dropping $l: lifetime) if we are ready to cement lifetimes as a "lexical class inside identifiers" as opposed to "' plus an identifier" (which proc macro API already does).
There's internals thread about this https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716/10. I don't think the original suggestion is a good idea anymore, but see https://internals.rust-lang.org/t/pre-rfc-splitting-lifetime-into-two-tokens/6716/11 in particular about the "lexical class inside identifiers" alternative.

The key questions here are probably:

  • Is it important to discern between identifiers and lifetimes in procedural macros? Why?
    • We want to minimize API surface. It's always possible to detect lifetimes using the first character.
  • Is it important to discern between identifiers and lifetimes in declarative macros? Why?
    • I'd personally want to minimize the language surface (and simplify compiler as well if we drop $l: lifetime). It's possible to detect lifetimes in macros left sides if we keep lifetime as a subset of ident (kinda like path is a subset of ty/expr/pat) or allow lifetime idents in macro "variables" ($'var).

@petrochenkov
Copy link
Contributor Author

@alexcrichton

I'd prefer to land this all at once if possible to reduce churn in the ecosystem, so I'd be mostly in favor of going ahead and stabilizing everything in this PR (if you're ok with that). I don't mind doing the work myself for the APIs here and pushing those up (annotation-wise). Along those lines, want to remove as_str as well from this?

Sure, I'll add stable annotations and remove as_str.

@eddyb
Copy link
Member

eddyb commented May 7, 2018

Is it important to discern between identifiers and lifetimes in declarative macros? Why?

Probably to parse anything looking like a list of generics, if you want to tell apart 'a and T.

@dtolnay
Copy link
Member

dtolnay commented May 7, 2018

It's possible to detect lifetimes in macros left sides if we keep lifetime as a subset of ident

I would need to play with this in nightly before I would feel comfortable committing to it. What I wouldn't want is:

  • We stabilize TokenTree::Ident under the expectation that we will make $:ident accept lifetimes.
  • We change $:ident in decl macros to accept lifetimes, while keeping $:lifetime.
  • After some use we discover this is worse this way than having $:ident and $:lifetime disjoint.
  • We revert $:ident to reject lifetimes.
  • Now proc macro and decl macro are inconsistent in an obnoxious way.

@dtolnay
Copy link
Member

dtolnay commented May 7, 2018

Probably to parse anything looking like a list of generics, if you want to tell apart 'a and T.

Hmm, can we imagine any macros that would need to handle lifetime parameters and type parameters differently? There is some appeal in being able to say the distinction never matters until much later in the compile process.

@dtolnay
Copy link
Member

dtolnay commented May 7, 2018

One place I can imagine running into trouble is if a macro somehow needs to handle optional lifetimes on a reference.

macro_rules! m {
    (& $($p:lifetime)? $i:ident) => {}
}

m!(&foo);
m!(&'a foo);
error: local ambiguity: multiple parsing options: built-in NTs lifetime ('p') or ident ('i').
 --> src/main.rs:4:5
  |
4 | m!(&foo);
  |     ^^^

@petrochenkov
Copy link
Contributor Author

@dtolnay

local ambiguity: multiple parsing options: built-in NTs lifetime ('p') or ident ('i').

This is a weird error. I never investigated how our macro matching works in detail, but I expected it to be modeled after regular expressions (greedy by default) and work in both cases, whether lifetime is a subset of ident or not.

extern crate regex;
use regex::Regex;

fn main() {
    let not_subset = Regex::new(r"a?b").unwrap();
    assert!(not_subset.is_match("ab"));
    assert!(not_subset.is_match("b"));
    let subset = Regex::new(r"a?\w").unwrap();
    assert!(subset.is_match("ab"));
    assert!(subset.is_match("aa"));
    assert!(subset.is_match("b"));
    assert!(subset.is_match("a"));
}

http://play.rust-lang.org/?gist=372d5a5f25219670dabc89a1dfb536ba&version=nightly&mode=debug

@petrochenkov
Copy link
Contributor Author

petrochenkov commented May 8, 2018

We have 6 weeks if we want to stabilize the API in 1.28.

@eddyb
Can the lifetime question get a slot in the next lang team meeting?

@petrochenkov petrochenkov changed the title Review and stabilize proc macro API 1.2 Review proc macro API 1.2 May 15, 2018
@petrochenkov
Copy link
Contributor Author

@bors r=alexcrichton

Stabilization commit for later cherry-picking: petrochenkov@3e71e49

@bors
Copy link
Contributor

bors commented May 15, 2018

📌 Commit dab8c0a has been approved by alexcrichton

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels May 15, 2018
@kennytm
Copy link
Member

kennytm commented May 16, 2018

@bors p=1

Unblocks #49219

@bors
Copy link
Contributor

bors commented May 16, 2018

⌛ Testing commit dab8c0a with merge 2a3f536...

bors added a commit that referenced this pull request May 16, 2018
Review proc macro API 1.2

cc #38356

Summary of applied changes:
- Documentation for proc macro API 1.2 is expanded.
- Renamed APIs: `Term` -> `Ident`, `TokenTree::Term` -> `TokenTree::Ident`, `Op` -> `Punct`, `TokenTree::Op` -> `TokenTree::Punct`, `Op::op` -> `Punct::as_char`.
- Removed APIs: `Ident::as_str`, use `Display` impl for `Ident` instead.
- New APIs (not stabilized in 1.2): `Ident::new_raw` for creating a raw identifier (I'm not sure `new_x` it's a very idiomatic name though).
- Runtime changes:
    - `Punct::new` now ensures that the input `char` is a valid punctuation character in Rust.
    - `Ident::new` ensures that the input `str` is a valid identifier in Rust.
    - Lifetimes in proc macros are now represented as two joint tokens - `Punct('\'', Spacing::Joint)` and `Ident("lifetime_name_without_quote")` similarly to multi-character operators.
- Stabilized APIs: None yet.

A bit of motivation for renaming (although it was already stated in the review comments):
- With my compiler frontend glasses on `Ident` is the single most appropriate name for this thing, *especially* if we are doing input validation on construction. `TokenTree::Ident` effectively wraps `token::Ident` or `ast::Ident + is_raw`, its meaning is "identifier" and it's already named `ident` in declarative macros.
- Regarding `Punct`, the motivation is that `Op` is actively misleading. The thing doesn't mean an operator, it's neither a subset of operators (there is non-operator punctuation in the language), nor superset (operators can be multicharacter while this thing is always a single character). So I named it `Punct` (first proposed in [the original RFC](rust-lang/rfcs#1566), then [by @SimonSapin](#38356 (comment))) , together with input validation it's now a subset of ASCII punctuation character category (`u8::is_ascii_punctuation`).
@bors
Copy link
Contributor

bors commented May 16, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 2a3f536 to master...

@bors bors merged commit dab8c0a into rust-lang:master May 16, 2018
alexcrichton added a commit to alexcrichton/rust that referenced this pull request May 21, 2018
This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens
produced by procedural macros ended up getting lost in translation in the
compiler and not actually producing parseable code. The issue lies in the fact
that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for
gluing joint tokens together forgot to take this into account so the lifetime
inside of `Ident` was missing the leading tick!

The `glue` implementation here is updated to create a new `Symbol` in these
situations to manufacture a new `Ident` with a leading tick to ensure it parses
correctly.

Closes rust-lang#50942
kennytm added a commit to kennytm/rust that referenced this pull request May 22, 2018
…petrochenkov

rustc: Fix procedural macros generating lifetime tokens

This commit fixes an accidental regression from rust-lang#50473 where lifetime tokens
produced by procedural macros ended up getting lost in translation in the
compiler and not actually producing parseable code. The issue lies in the fact
that a lifetime's `Ident` is prefixed with `'`. The `glue` implementation for
gluing joint tokens together forgot to take this into account so the lifetime
inside of `Ident` was missing the leading tick!

The `glue` implementation here is updated to create a new `Symbol` in these
situations to manufacture a new `Ident` with a leading tick to ensure it parses
correctly.

Closes rust-lang#50942
@petrochenkov petrochenkov deleted the pmapi branch June 5, 2019 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.