RFC: new lifetime elision rules #141

Merged: 4 commits, Jul 9, 2014

aturon
Member

@aturon aturon commented Jun 26, 2014

Rendered (draft)

text/

tracking issue: rust-lang/rust#15552

Note: the core idea for this RFC and the initial survey both came from @wycats.

* If there is exactly one input lifetime position (elided or not), that lifetime
is assigned to _all_ elided output lifetimes.

* If there are multiple input lifetime positions, but one of them is `&self` or
Member

I find this rule a bit surprising (the others make perfect sense). I can intuitively see the motivation that self ought to be privileged, but I can't really justify why that is so. Looking at the examples below, the ones using this rule took me a lot longer to grok.

Contributor

The rationale is that in several usage surveys, this was essentially the only pattern we saw when &self was involved.

I believe that the reason for this is that when you're borrowing something out of self, it makes sense to involve another ref for computation. In contrast, it's a very unusual pattern to borrow something out of a value as a method of some other object. It's just not really how people think about using methods and objects in general, so it doesn't happen (almost at all).

I suspect that in cases where this pattern could occur, people use standalone functions instead of methods.

Contributor

@wycats, what proportion of the cited 87% would be lost if this rule were not accepted? I don't personally object to it, but I can see how it's a bit more flimsy than the others, and I would be willing to live without it if the statistics bore it out.

I think that it should use the lifetime of the first input parameter, regardless of whether it is self or not, and only if it is an elided lifetime.

This avoids issues with UFC and makes method and non-method functions work the same.

Supporting elision of lifetimes only in the return value when they are explicit on self seems a bad idea, since it is counterintuitive. Also, it doesn't work for multiple explicit lifetimes (e.g. &'a Block<'b>).

@lilyball
Contributor

👍

* For `impl` headers, input refers to the lifetimes appears in the type
receiving the `impl`, while output refers to the trait, if any. So `impl<'a>
Foo<'a>` has `'a` in input position, while `impl<'a> SomeTrait<'a> Foo<'a>`
has `'a` in both input and output positions.
Member

I think the word `for` is lacking from the second example. It’s not an obvious example of where the lifetimes are, either: it could be rewritten as the probably-fairly-nonsensical “`impl<'a, 'b> SomeTrait<'a> for Foo<'b>` has `'a` in [the] output position and `'b` in [the] input position”.

(As for the “the”, I think that should be there in all these cases, or “an” as the case may be in some places. This affects much of the document.)

@huonw
Member

huonw commented Jun 26, 2014

I'm nervous about adding elision for output parameters since I'm slightly concerned that may make things less clear (a minor adjustment to a signature that otherwise compiles would make the compiler spew weird errors), but I am in favour of elision in input position in impl, that is:

impl BufReader { ... }
impl Reader for BufReader { ... }
impl Reader for (&str, &str) { ... }
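For comparison, here is a rough sketch of the fully explicit `impl` headers next to the closest elided spelling that current Rust accepts. The `BufReader` and `Reader` definitions are hypothetical stand-ins, not the 2014 `std::io` types, and the `'_` placeholder is today's syntax rather than anything proposed in this thread.

```rust
// Hypothetical reader over a borrowed byte slice.
struct BufReader<'a> {
    buf: &'a [u8],
    pos: usize,
}

// Hypothetical one-method trait standing in for `Reader`.
trait Reader {
    fn read_byte(&mut self) -> Option<u8>;
}

// Fully explicit header: the lifetime is declared and then used.
impl<'a> BufReader<'a> {
    fn new(buf: &'a [u8]) -> BufReader<'a> {
        BufReader { buf, pos: 0 }
    }
}

// Elided header: `'_` marks an input lifetime that never needs a name.
impl Reader for BufReader<'_> {
    fn read_byte(&mut self) -> Option<u8> {
        let byte = self.buf.get(self.pos).copied()?;
        self.pos += 1;
        Some(byte)
    }
}
```

The second header never mentions the lifetime again, which is the situation where naming it adds nothing.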

@chris-morgan
Member

@huonw Do you mean output parameters in general, or just in impls?

@glaebhoerl
Contributor

  • If there are multiple input lifetime positions, but one of them is &self or &mut self, the lifetime of self is assigned to all elided output lifetimes.

I don't like this rule. The other rules have the property that there's no other way the signature could possibly make sense: i.e., the desugaring is unambiguous. Here we're making an arbitrary choice. I don't think we should do that.

Subtlety for non-& types

There's an additional subtlety: lifetime parameters of & types are covariant. For other types, they may not be. For instance:

struct Callback<'s> {
    callback: fn(&'s str) -> int,
}

fn some_fn(cb: Callback) -> &str;

// Under proposed rules desugars to:
fn some_fn<'s>(cb: Callback<'s>) -> &'s str;

Here Callback has a contravariant lifetime parameter. And the desugaring doesn't make sense, because there's no way you can get something with a lifetime of 's out of a Callback<'s>; you can only "put one in". In other words, Callback's lifetime parameter is in an output position.

If you just take that into account when applying the rules, then I think they would keep working. But I'm not sure what the situation is with invariant or bivariant lifetime parameters, because I haven't thought about it yet.
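A small sketch of the contravariance point in current Rust syntax (so `usize` replaces `int`). The helper names `byte_len`, `narrow_to_static`, and `feed` are made up for illustration and do not come from the RFC.

```rust
// Modern rendering of the `Callback` type from the example above.
struct Callback<'s> {
    callback: fn(&'s str) -> usize,
}

// Hypothetical callback body: works for a string of any lifetime.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Compiles because `Callback` is contravariant in `'s`: a callback that
// accepts strings living for `'a` can stand in for one that is only ever
// handed `'static` strings.
fn narrow_to_static<'a>(cb: Callback<'a>) -> Callback<'static> {
    cb
}

// The lifetime only flows *into* the callback. Nothing borrowed for `'s`
// can be read back out of a `Callback<'s>`.
fn feed<'s>(cb: &Callback<'s>, input: &'s str) -> usize {
    (cb.callback)(input)
}
```

Since a `Callback<'s>` value holds no data borrowed for `'s`, a function returning `&'s str` would have nothing in it to borrow from, which is the objection to the desugaring above.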

@glaebhoerl
Contributor

OK, so in plain English, I think the rule should be: If there's exactly one readable lifetime and N writable ones, all the writable lifetimes are assumed to be the same as the readable one. Lifetime parameters in covariant position are readable, in contravariant writable, invariant both, bivariant neither.

@wycats
Contributor

wycats commented Jun 26, 2014

@huonw I think the proposed error messages will go a long way toward keeping the compiler from "spewing weird errors", no?

@pcwalton
Contributor

I was originally a bit nervous about this sort of thing, but now I have no objections.

I'm slightly more nervous about the self thing, but I'm fine with trying it and seeing how it goes. I think that the "suggest-a-lifetime" error messages that we now have make this sort of thing easier to deal with.

can avoid writing any lifetimes in ~87% of the cases where they are currently
required.

Doing so is a clear ergonomic win.
Member

This is the biggest part of this proposal for me. (Well, combined with the data that shows that it is.)

@steveklabnik
Member

A big 👍 from me. If the vast majority of code is doing something a certain way, then it's a good basis for making a rule. This should eliminate a lot of what is effectively boilerplate, and a good lifetimes tutorial / better errors will assist in the pedagogy sense.

Also, if you like the lifetimes, you can keep writing them.

@aturon
Member Author

aturon commented Jun 26, 2014

@glaebhoerl Great point about contravariance, which I hadn't thought about. I agree that a contravariant argument should not be considered as an input position.

Just to be clear, is the suggestion that contravariant positions swap the input/output distinction? (Which would be the typical type-theoretical thing to do.) Concretely, are you proposing that

fn some_fn(&self, cb: Callback) -> int;
fn other_fn(n: int) -> (&T, Callback);

expands to

fn some_fn<'a>(&'a self, cb: Callback<'a>) -> int;
fn other_fn<'a>(n: int) -> (&'a T, Callback<'a>);

The first case makes some sense, but the latter case is pretty surprising -- it would happen because the Callback's lifetime is considered an input position, and thus can establish the (sole) output position for &T.

We could also simply disallow eliding contravariant lifetimes, since it may be preferable to be explicit in those (rare) cases.

Finally, see @wycats's comment above re: the &self rule. It's not arbitrary: the &self parameter definitely plays a special role for methods, and the proposed rules are based on the most common patterns in the libstd corpus.

@bachm

bachm commented Jun 26, 2014

Just posting to express my support for this well-written RFC. With the proposed error messages there should be little confusion when a user first encounters unelidable lifetimes.

@glaebhoerl
Contributor

the latter case is pretty surprising -- it would happen because the Callback's lifetime is considered an input position, and thus can establish the (sole) output position for &T.

Even thinking about this example makes my head hurt... I think the "logic" of it, as it were, is that when the caller of other_fn invokes the second component of the returned tuple, which is the Callback, with something of lifetime 'a, other_fn can then use that to "produce" the first component of the tuple, also of lifetime 'a? Obviously that couldn't physically work without a time machine.

One distinction that I noticed, and I'm not sure if it has significance, is that while the return type of a function f, and an argument of a function g which is f's parameter, are both output positions, f is required to return a value, but it's not required to call the callback g. Again, I'm not sure whether this has implications for how inference should work.

I basically agree with you that it seems reasonable-but-not-imperative to desugar your first example, but not so much the second one. I don't have any concrete rules in mind which might accomplish this.

Finally, see @wycats's comment above re: the &self rule. It's not arbitrary

To avoid getting caught up in debating the meaning of the word "arbitrary" (I wasn't assuming that you flipped a coin): For the first and second rules, there's only one way it can make sense. If the user were to explicitly annotate lifetimes, they would annotate the same ones we infer 100% of the time. For the third rule, there's more than one way it can make sense, and we'd be choosing to favor one of them. Even if our favoring rests on a stronger basis than a coin flip, I don't think this kind of "probably what you meant" inference is something we should be doing.

@aturon
Member Author

aturon commented Jun 26, 2014

@glaebhoerl Thanks for the thoughtful comments.

My feeling about &self is that these rules are not inference, but rather shorthand: they are a systematic way of filling in what's been left off of a signature without looking at the body.

The rules are simple enough that it's easy to know, given the signature in your head, whether you can elide or not.

Put another way, the debate is whether

fn foo(&self, t: &T) -> &U;

is simply not allowed/usable as a signature, or whether it has a useful meaning based on the most common lifetime patterns. Once you know the rules, you know immediately that the above would expand into

fn foo<'a,'b>(&'a self, t: &'b T) -> &'a U;

and would only write the elided signature if that's what you wanted.

FWIW, I disagree that the other rules give the only sensible expansion. Not even today's rules do. If you write

fn bar(t: &T, u: &U);

you get distinct lifetimes for the two parameters. But it can also make sense for them to share the same lifetime, and some uses would require it. In that situation, you know you can't leave off the lifetimes, and you write an explicit signature. I think the same would be true with the &self rule.
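To make the shorthand reading concrete, here is a minimal sketch in current Rust. The `Config` type and its method names are hypothetical. The first two methods are meant to have the same signature, once elided and once written out, while the third shows a case where the default does not fit and lifetimes have to be spelled out.

```rust
// Hypothetical receiver type, used only for illustration.
struct Config {
    name: String,
}

impl Config {
    // Elided form: with `&self` present, the output borrow is tied to `self`.
    fn name_or(&self, _fallback: &str) -> &str {
        &self.name
    }

    // The same signature with every lifetime written out.
    fn name_or_explicit<'a, 'b>(&'a self, _fallback: &'b str) -> &'a str {
        &self.name
    }

    // The result may come from either argument, so the default does not
    // apply and both inputs are given the same named lifetime.
    fn name_or_fallback<'a>(&'a self, fallback: &'a str) -> &'a str {
        if self.name.is_empty() {
            fallback
        } else {
            &self.name
        }
    }
}
```

With the first two forms, a caller can drop the fallback string immediately after the call, because the returned borrow depends only on the receiver.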


The error case on `impl` is exceedingly rare: it requires (1) that the `impl` is
for a trait with a lifetime argument, which is uncommon, and (2) that the `Self`
type has multiple lifetime arguments.
Contributor

Does this example arise today in any known Rust codebase?

Member Author

@bstrie I don't know of any cases offhand, which is why the error message here is probably not so important.

@bstrie
Contributor

bstrie commented Jun 26, 2014

Above I draw a comparison between lifetime elision and type inference, and how the great thing is that people who choose to be explicit are still welcome to manually annotate lifetimes. However, there is one thing that would support the people who make such a decision and improve teachability for newcomers: make the --pretty typed compiler flag annotate the elided lifetimes just as it annotates the inferred types (or you could make it an entirely separate flag, I suppose).

@rkjnsn
Contributor

rkjnsn commented Jun 26, 2014

Quick question:

Would it be feasible to handle the multiple-input case by having something like

fn frob(s: &str, t: &str) -> &'t str;

expand to

fn frob<'a, 'b>(s: &'a str, t: &'b str) -> &'b str;

While such a shorthand would be mostly orthogonal to the elision rules of this RFC, I bring it up because it seems like it could impact whether we want to treat self specially (the third rule of the RFC), since one would be able to write

fn args<T:ToCStr>(&mut self, args: &[T]) -> &'self mut Command

Also, I realize the lookup rules would take some consideration if this were to be implemented, since lifetimes and parameter names are currently in different namespaces.
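The `&'t str` and `&'self mut` spellings above are proposed syntax, not valid Rust. As a sketch, the explicit signatures they would stand for can be written as below. The function bodies, the `Command` struct, and the use of `AsRef<str>` in place of `ToCStr` are assumptions made only so the example compiles.

```rust
// Explicit form of the proposed `fn frob(s: &str, t: &str) -> &'t str`.
fn frob<'a, 'b>(_s: &'a str, t: &'b str) -> &'b str {
    t.trim()
}

// Hypothetical builder type, not `std::process::Command`.
struct Command {
    args: Vec<String>,
}

impl Command {
    // Explicit form of the proposed `-> &'self mut Command`.
    fn args<'this, T: AsRef<str>>(&'this mut self, args: &[T]) -> &'this mut Command {
        for arg in args {
            self.args.push(arg.as_ref().to_string());
        }
        self
    }
}
```

Because the result of `frob` is tied only to `t`, the first argument can go out of scope while the result is still in use.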

@glaebhoerl
Contributor

@aturon You're right.

Now we have the interesting situation that you've shown that my stated arguments against "the self rule" are invalid, yet, for some reason, this hasn't convinced me to like it. Apparently, my stated arguments were not the real reason why it bothers me. When your subconscious is telling you something is wrong, it doesn't necessarily go into great detail about why, or which part...

I think a large part of it is because of the fact that I don't think we should semantically/syntactically distinguish the self argument in the first place, or even necessarily have a self keyword at all. (I have a proposal to this effect which I might hopefully have time to write down at some point in the next 5,000 years.) And here we're proposing to distinguish it in an additional way. When you write, "If there are multiple input lifetime positions, but one of them is &self", I read, "If there are multiple input lifetime positions, but one of them is the first argument, or is called "self""... I mean, maybe it holds up statistically, but statistically speaking, there are two popes per square kilometer in Vatican City. (Or currently, I suppose, four.)

@krdln
Contributor

krdln commented Jun 27, 2014

@bstrie
I think that not only should --pretty typed reveal all lifetimes, but each compiler error involving lifetimes should also print the fully annotated function signature. This is a problem even now, when the compiler reports errors involving unnamed lifetimes.

@aturon
Member Author

aturon commented Jul 14, 2014

@glaebhoerl I didn't mean to turn this discussion into a defense of rule 3; I'm also trying to understand better the relationships between the rules. I'm sorry I didn't make that more clear. (Text is hard.)

Let me try again. The initial question was whether I see a qualitative difference between rule 3 and the others. I do not, myself. But I can see how someone with a different perspective on methods (which I think you have?) would feel differently.

My general perspective on the rules is that they are simply shorthand, providing carefully-chosen defaults. Defaults are always heuristic and connected to common patterns of thought and code.

As with any defaults, in a purely semantic sense the rules are arbitrary, because there are other valid (and sometimes useful) lifetime assignments that the language allows.

As heuristics, the rules have a clear quantitative basis.

I think what's up for grabs is the qualitative basis -- how do they "feel", how well do they match our intuitions?

The intuitions that @wycats and I were most interested in come from borrowing/ownership, as opposed to lifetimes. If you write

fn foo(x: &Foo) -> &Bar

you know the function takes in borrowed data and produces borrowed data. The simplest intuition is that the output borrow takes its ownership from the input borrow. It's then not a hard conceptual leap to say that the borrowed ownership of the output is only good for as long as the input's was -- we hope that the elided form can build intuitions about borrowing that lead naturally into the mechanics of lifetimes.

I feel similarly about methods. I'm using a method to access or otherwise manipulate the receiver, so all things being equal I expect any output borrows to flow from my borrow of the receiver.

Does that help?
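As a small illustration of the single-input case, the sketch below pairs the elided signature with its written-out form. The `Foo` and `Bar` definitions are hypothetical, filled in only so the example compiles.

```rust
// Hypothetical types standing in for `Foo` and `Bar`.
struct Bar {
    id: u32,
}

struct Foo {
    bar: Bar,
}

// Elided: one input borrow, so the output borrow is derived from it.
fn foo(x: &Foo) -> &Bar {
    &x.bar
}

// The same signature with the lifetime named.
fn foo_explicit<'a>(x: &'a Foo) -> &'a Bar {
    &x.bar
}
```

Both forms say that the returned `&Bar` is usable only while the borrow of `Foo` is.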

@jfager

jfager commented Jul 14, 2014

What was the argument against @bill-myers's suggestion of using the first input lifetime? That covers more cases for regular functions and rule 3 falls out for free. It's not a particularly deep or profound unifying principle, but it's simple and seems less ad-hoc.

@lilyball
Contributor

First input lifetime seems a bit more ad-hoc, as strange as it sounds.

Methods are special, and self is special in these methods. Rule 3 seems perfectly natural to me given that perspective. But the first input lifetime is not special. It's actually rather arbitrary. There's no reason to believe that in fn foo(a: &str, b: &str) -> &str the output is necessarily more likely to be derived from a than from b.

"First input lifetime" will also cause some possibly surprising behavior in fn foo(self, x: &str) -> &str, where the output is derived from the second parameter x instead of from self. Of course, it usually can't be derived from self (the only way that makes sense is if the type of self contains a lifetime parameter), but that's not a good reason to arbitrarily select the second parameter as the inferred lifetime source.

Overall, "lifetime of self" is a more constrained rule than "first input lifetime", based as it is on the special nature of methods and self, and I believe is much more likely to be a correct heuristic than "first input lifetime".
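A sketch of why neither argument is an obvious default for a free function: both signatures below are plausible, so under the rules as proposed the author has to pick one explicitly. The function names are hypothetical.

```rust
// The output is derived from `a` only; `_b` keeps its own anonymous lifetime.
fn pick_first<'a>(a: &'a str, _b: &str) -> &'a str {
    a
}

// The output may come from either argument, so both share one lifetime.
fn pick_longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if b.len() > a.len() {
        b
    } else {
        a
    }
}
```

A "first input lifetime" rule would silently pick the first reading, which would be wrong for functions shaped like `pick_longer`.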

@jfager

jfager commented Jul 14, 2014

Under the currently proposed rules, fn foo(self, x: &str) -> &str's output lifetime would also be derived from x via rule 2, wouldn't it? Rule 3 only states it kicks in for &self or &mut self.
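A minimal sketch of that by-value case, using a hypothetical `Parser` type. Since `self` is taken by value and the type has no lifetime parameters, `x` supplies the only input lifetime, and the elided output is tied to it.

```rust
// Hypothetical type with no lifetime parameters of its own.
struct Parser {
    skip: usize,
}

impl Parser {
    // `self` is consumed, so the returned borrow can only come from `x`.
    fn rest(self, x: &str) -> &str {
        // Falls back to an empty string if `skip` is out of range
        // or not on a character boundary.
        x.get(self.skip..).unwrap_or("")
    }
}
```

The result stays valid after the parser itself has been consumed by the call.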

@lilyball
Contributor

@jfager Hrm, you're right. I hadn't considered the fn foo(self, x: &str) -> &str case until my previous comment, and there I only considered it in light of rule 3.

I think that, due to rule 3, it may be reasonable to adjust the rules such that fn foo(self, x: &str) -> &str cannot elide the lifetime. This would be a consequence of the fact that self is special, and therefore any method on self reasonably assumes an elided output lifetime is derived from self. My belief is this should be true even for by-value self methods.

That said, this particular case is I think something of an edge case, and I would not consider it a serious problem if the rules are left unchanged.

@jfager

jfager commented Jul 15, 2014

It's an edge case but now that it's come up I think it gets right at the discomfort of the current set of rules. The justification for rule 3 is 'methods are special', but this interaction with rule 2 says 'but maybe not that special'. They should either be uniform, or they should be different; it's straddling the fence that feels odd.

"You may elide lifetimes; output lifetimes are assigned the first input lifetime" is arbitrary and there's not a great intuitive reason it should be true, but it's uniform between fns and methods, and despite its arbitrariness it's simple and easy to understand, and it ends up giving you the same code and behavior in all but one of the examples given in this RFC, frob being the exception.

"Elided output lifetimes take the lifetime of self for methods, or the lifetime of a sole input lifetime for functions" is similarly straightforward and simple, but treats methods and fns clearly differently.

I could get behind either.

*Edit: sorry, posted early.

@glaebhoerl
Contributor

@aturon Yes, that's closer to what I was trying to get at. (Though I was also wondering if there might be some drier, more formal formulation of our intuitions.) How does rule 1 fit into these intuitions about borrowing, i.e. why is it more intuitive for each input lifetime to be different rather than tied together?

result types. So `fn foo(s: &str) -> (&str, &str)` has elided one lifetime in
input position and two lifetimes in output position.

* For `impl` headers, input refers to the lifetimes appears in the type
Member

Trait definitions themselves are also a form that offers lifetime positions. That may or may not be relevant (I'll be posting a question about that soon -- see a few lines up), but should probably be addressed explicitly.

pnkfelix added a commit to pnkfelix/rfcs that referenced this pull request Jul 15, 2014
Explicitly note that lifetimes from the `impl` (and `trait`/`struct`)
are not considered "input positions" for the purposes of expanded `fn`
definitions.

Added a collection of examples illustrating this.

Drive-by: Addressed a review comment from @chris-morgan
[here](rust-lang#141 (comment)).
@zwarich

zwarich commented Jul 17, 2014

@glaebhoerl In the absence of any lifetime variables in return types, the assignment of distinct lifetime parameters is the most general type that can be given. That is probably the intuition that is at play here. Of course, once return types with lifetime variables get involved, then this no longer applies, but everyone agreed that this case was broken today anyways.

@glaebhoerl
Contributor

I was thinking that maybe elided lifetimes in arguments of higher-order function parameters should be desugared to higher-rank lifetimes, because that's usually what you want:

// not legal, I believe?
fn print_with(text: &str, printer: |&str|) { ... }

=>

// you pretty much always want this, I think?
fn print_with<'a>(text: &'a str, printer: <'b> |&'b str|) { ... }

The question is, given that closures are going to be merely trait objects, how could we properly generalize this? (There may or may not be an easy answer; I've spent approximately two minutes thinking about it.)

@zwarich

zwarich commented Jul 26, 2014

@glaebhoerl Why does it matter in that particular case? The only lifetimes that can ever be passed to printer are 'a and 'static, so you can always choose 'a.

I assume you were thinking of another case where it does matter?

@glaebhoerl
Contributor

Maybe the example was bad. But:

The only lifetimes that can ever be passed to printer are 'a and 'static, so you can always choose 'a.

This is not true, because print_with could easily have local variables of type &'x str. (Which, in this case, might be weird, which in turn is why this might've been a bad example; then again, maybe print_with might want to prefix text with a timestamp or something.)

But to amend, imagine this:

fn print_two_with(text1: &str, text2: &str, printer: |&str|) { ... }

Now there are two &str arguments with different lifetimes and we want printer to work for both.

But the point is really that in general, do you ever want the lifetimes of the arguments of an argument function to be pre-determined by lifetime parameters on the outer HOF, instead of the (strictly-)more-general formulation where the argument function itself is parameterized over them?

@zwarich

zwarich commented Jul 26, 2014

@glaebhoerl My (potentially mistaken) assumption is that the legacy closures actually have higher-rank lifetimes, even though it isn't a feature exposed independently in the type system, and rust-lang/rust#15067 is tracking exposing that to the new unboxed closures. This code type-checks:

fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&str|) {
    if true {
        printer(text1);
    } else {
        printer(text2);
    }
}

whereas this code does not:

fn print_two_with<'a, 'b>(text1: &'a str, text2: &'b str, printer: |&'a str|) {
    if true {
        printer(text1);
    } else {
        printer(text2);
    }
}
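The `|&str|` closure type syntax no longer exists, but the same contrast can be sketched with the `Fn*` traits in current Rust. This is an approximation under that assumption, and `visit_two_with` is a hypothetical name. An elided `&str` inside the bound is read as higher-ranked, so the closure must accept a string of any lifetime.

```rust
// `impl FnMut(&str)` is shorthand for `impl for<'x> FnMut(&'x str)`.
fn visit_two_with<'a, 'b>(text1: &'a str, text2: &'b str, mut visitor: impl FnMut(&str)) {
    visitor(text1);
    visitor(text2);
    // A string local to this function works too, since the closure
    // is not tied to either `'a` or `'b`.
    let joined = format!("{}{}", text1, text2);
    visitor(&joined);
    // Writing the bound as `impl FnMut(&'a str)` instead would reject
    // both `visitor(text2)` and `visitor(&joined)`.
}
```

Called with a closure that records lengths, this visits the two arguments and then the local concatenation, even when the two arguments have unrelated lifetimes.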

@aturon
Member Author

aturon commented Jul 27, 2014

@glaebhoerl The current plan is that the elision rules apply recursively for the sugared form of unboxed closure types (i.e., the |x: T| -> U notation), as they have in the past. There's not currently a plan to generalize this to uses of traits directly, although that might actually be the right answer to covariant lifetime positions (see rust-lang/rust#15699 about _co_variance being the odd case, and rust-lang/rust#15907 about its relation to lifetime elision).

glaebhoerl pushed a commit to glaebhoerl/rfcs that referenced this pull request Aug 8, 2014
Explicitly note that lifetimes from the `impl` (and `trait`/`struct`)
are not considered "input positions" for the purposes of expanded `fn`
definitions.

Added a collection of examples illustrating this.

Drive-by: Addressed a review comment from @chris-morgan
[here](rust-lang#141 (comment)).
@ticki
Contributor

ticki commented Aug 7, 2015

👍

@b-jonas0

After this change, would there be any lifetime elision rules for lifetimes that appear in the head of a lambda expression (anonymous function)?

@Centril added the A-inference (type inference related proposals & ideas), A-typesystem (type system related proposals & ideas), and A-lifetimes (lifetime related proposals) labels on Nov 23, 2018