RFC: deref coercions #241

Merged
merged 5 commits into from
Jan 20, 2015

aturon
Member

@aturon aturon commented Sep 16, 2014

Add the following coercions:

  • From &T to &U when T: Deref<U>.
  • From &mut T to &U when T: Deref<U>.
  • From &mut T to &mut U when T: DerefMut<U>.

These coercions eliminate the need for "cross-borrowing" (things like &**v) and calls to as_slice.
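As a rough illustration of the three coercions listed above, here is a minimal sketch written against Rust as it exists today, where `Deref` uses an associated `Target` type rather than the `Deref<U>` parameter form used in this thread. The helper functions `byte_len`, `sum`, and `zero_first` are hypothetical and exist only to create coercion sites.

```rust
// Hypothetical helpers: each parameter type creates a coercion site.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn sum(xs: &[u8]) -> u32 {
    xs.iter().map(|&x| u32::from(x)).sum()
}

fn zero_first(xs: &mut [u8]) {
    if let Some(first) = xs.first_mut() {
        *first = 0;
    }
}

fn main() {
    let s = String::from("hello");
    let mut v = vec![1u8, 2, 3];

    // &String -> &str, because String derefs to str.
    assert_eq!(byte_len(&s), 5);

    // &mut Vec<u8> -> &[u8], because Vec<u8> derefs to [u8].
    assert_eq!(sum(&mut v), 6);

    // &mut Vec<u8> -> &mut [u8], because Vec<u8> implements DerefMut.
    zero_first(&mut v);
    assert_eq!(v, [0, 2, 3]);

    // The explicit forms the coercions make unnecessary.
    assert_eq!(sum(&*v), 5);
    assert_eq!(sum(v.as_slice()), 5);
}
```

In every call the `&` or `&mut` is still written by the caller; only the deref step is implicit.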

Rendered

@aturon
Member Author

aturon commented Sep 16, 2014

cc @nick29581 @nikomatsakis
cc #226

Of course, method dispatch can implicitly execute code via `Deref`. But `Deref`
is a pretty specialized tool:

* Each type `T` can only deref to *one* other type.
Member

Is this a formal restriction or an informal one? AFAIK it's currently possible to do something like

trait Foo { ... }
impl Foo for int { ... }
impl Foo for uint { ... }

struct Bar;

impl<T> Deref<T> for Bar where T: Foo { ... }

Member Author

Once associated types land, we'd make T an associated type, which would force it to be uniquely determined by the Self type.

That restriction is necessary for the coercion algorithm being proposed to work.
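For reference, a minimal sketch of what the associated-type form looks like in Rust today. `Wrapper` and `shout` are hypothetical names used only for illustration. Because `Target` is an associated type, each `Self` type has exactly one deref target, which is the uniqueness the coercion algorithm relies on.

```rust
use std::ops::Deref;

// Hypothetical wrapper type, for illustration only.
struct Wrapper {
    inner: String,
}

impl Deref for Wrapper {
    // Exactly one Target per Self type: a second `impl Deref for Wrapper`
    // with a different Target would be rejected as a conflicting impl.
    type Target = str;

    fn deref(&self) -> &str {
        &self.inner
    }
}

// Hypothetical function that expects a plain &str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let w = Wrapper { inner: String::from("deref") };
    // &Wrapper -> &str through the single Deref impl.
    assert_eq!(shout(&w), "DEREF");
    // Method dispatch also reaches str methods through Deref.
    assert_eq!(w.len(), 5);
}
```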

Member Author

@sfackler Updated the RFC to clarify.

This is a key difference from the
[cross-borrowing RFC](https://github.com/rust-lang/rfcs/pull/226).

### Limit implicitly execution of arbitrary code
Member

Typo: should be "implicit execution of" or "implicitly executed"

@pnkfelix
Member

Did any of the examples show an instance of the second bullet's coercion, namely: "From &mut T to &U when T: Deref<U>"?

I ask because that is a case where something that looks like an ownership transfer (of the &mut itself) will actually not be so, unlike (I believe) the first and third bullets.

It could be that the trade off here is in the RFC's favor. I just want to make sure we show an example justifying the second bullet in particular.

```rust
let v = vec![0u8, 1, 2];
foo(v); // is v moved here?
bar(v); // is v still available?
```
Member

My basic counter-argument to this is that the compiler tells you exactly this: if this is working code, then you know for sure that v is available in the call to bar, so you don't benefit from having to write &.

I think this is a fair assumption for experienced Rust developers, but this is a large disadvantage for people new to Rust, who will just be extremely confused when two function calls that look like they use v in exactly the same way actually use it completely differently.

@pnkfelix
Member

I ask because that is a case where something that looks like an ownership transfer (of the &mut itself) will actually not be so, unlike (I believe) the first and third bullets.

Actually I just remembered: of course there are other cases where a passed &mut is reborrowed rather than transferred. So I overstated the situation above.

Still, adding the requested example would benefit the RFC presentation.

The design satisfies both of the principles laid out in the Motivation:

* It does not introduce implicit borrows of owned data, since it only applies to
already-borrowed data.
Member

This is really nice!

@nikomatsakis
Contributor

@pnkfelix actually, coercing &mut T to &T is a kind of ownership transfer -- in particular, the &mut T is unusable during the time it is re-borrowed as an &T. So it's the same sort of transfer in particular as reborrowing an &mut T to another &mut T.
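A small sketch of the second bullet's coercion (`&mut T` to `&U`), assuming today's Rust semantics. The functions `total` and `push_total` are hypothetical. The `&mut Vec<u32>` is reborrowed as a shared slice for the duration of the call and becomes usable again afterwards, much like reborrowing an `&mut T` as another `&mut T`.

```rust
// Hypothetical function expecting a shared slice.
fn total(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

// Hypothetical caller that only holds a mutable reference.
fn push_total(v: &mut Vec<u32>) -> u32 {
    // `v` is `&mut Vec<u32>` but `total` expects `&[u32]`.
    // The reference is reborrowed and coerced, not moved.
    let t = total(v);
    // The shared reborrow has ended, so `v` can be used mutably again.
    v.push(t);
    t
}

fn main() {
    let mut v = vec![1, 2, 3];
    assert_eq!(push_total(&mut v), 6);
    assert_eq!(v, [1, 2, 3, 6]);
}
```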

@nikomatsakis
Contributor

@nick29581 it seems like the key question is whether, indeed, programmers need to be very aware of the levels of indirection. I contend that they do not.

In terms of capabilities, there is virtually no difference between a T and a Box<T>. In fact, the only difference between a T and a Rc<T> is that a T can be mutated ("unique ownership" vs "shared ownership"): basically, let x: Rc<T> and let y: T are equivalent to one another, except that the T in y can be moved.

Speaking personally, I find accounting for the precise amount of indirection to be tedious and I don't think it adds much information. It's not a big deal, but converting from &x to &*x to &**x is just a kind of regular irritant. When reading code, I basically ignore the number of * that I see. That is, if I see code like this (from rustc): check_expr_coercable_to_type(fcx, &**arg, formal_ty);, do you really care whether it says &*arg or &arg? The gist is the same. Certainly my eyes glaze over the distinction, and I tend to always just write & or &* and then let the compiler correct me. As further evidence, in other parts of the language, we often obscure the precise amount of indirection. Autoderef and closures come to mind, as well as autoref for operator overloading (though perhaps that should change, at least in some cases, but for independent reasons).

However, when we used to have cross-borrowing, I did find that confusing. I remember distinctly being surprised to see vectors that looked like they were being moved, but were in fact being (implicitly) borrowed, and also being surprised that a Box<T> behaved differently from a plain T. Of course this is anecdotal, but I think it's indicative of the kind of thinking we want to encourage: one that focuses on ownership and borrowing. Put another way, pointers are the means, not the end.
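To make the indirection-counting concrete, here is a hedged sketch in today's Rust. `Expr`, `expr_id`, and `ids` are hypothetical stand-ins, not the rustc code quoted above. Both call forms compile and mean the same thing; the second relies on the deref coercion this RFC proposes.

```rust
// Hypothetical stand-in for an AST node.
struct Expr {
    id: u32,
}

// Hypothetical function that wants a plain reference to the node.
fn expr_id(e: &Expr) -> u32 {
    e.id
}

fn ids(args: &[Box<Expr>]) -> Vec<u32> {
    let mut out = Vec::new();
    for arg in args {
        // `arg` has type `&Box<Expr>`.
        out.push(expr_id(&**arg)); // explicit: two derefs, then a borrow
        out.push(expr_id(arg)); // coerced: &Box<Expr> -> &Expr
    }
    out
}

fn main() {
    let args = vec![Box::new(Expr { id: 7 })];
    assert_eq!(ids(&args), vec![7, 7]);
}
```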

@nrc
Member

nrc commented Sep 16, 2014

@nikomatsakis yes, totally agree it boils down to whether programmers need to be concerned about the level of indirection. I could be persuaded. My experience is mostly from C++ where it is deadly important to know.

I agree that at first blush I ignore the &** stuff on actual parameters. When I do care is when I have a bug and I'm trying to work out what is going wrong with some code which compiles and gives the wrong result. In that case I really want to know exactly where something can be mutated or referenced. It seems to me that being imprecise about indirection will make that harder (perhaps there is a case that we will never obscure the difference between a pointer and a value with this system, only between pointers with different levels of indirection, so it is all OK).

I want to believe that we can not worry about pointers and only about ownership, it seems a much more attractive model. But my experience in the past has always been that if I don't know precisely what is going on, there is confusion, and that makes debugging harder. I guess I think of borrowing, deep down, as just a pointer to something, so trying to not think of pointers is difficult for me.

I'll try and think only of ownership and borrowing for a while and see if I can warm to the idea of reasoning without pointers :-)

@nikomatsakis
Contributor

On Tue, Sep 16, 2014 at 04:31:35AM -0700, Nick Cameron wrote:

I agree that at first blush I ignore the &** stuff on actual parameters. When I do care is when I have a bug and I'm trying to work out what is going wrong with some code which compiles and gives the wrong result. In that case I really want to know exactly where something can be mutated or referenced.

That's interesting. I don't think I've ever had a bug in Rust code
that boiled down to a mistake in layers of indirections. I could be
just failing to remember. It feels like these mismatches always
manifest at compilation time. Can you give me an example of the sort
of bug you are thinking of?

The closest example I can come up with is that I have seen errors
where something contained a Cell was implicitly being copied, and
thus mutation was occurring on a copy of the cell and not the original.
This was basically fixed by making Cell linear (in general I contend
that anything with interior mutability has "identity" and hence should
be linear). However, it seems to argue in favor of this
"ownership-focused" proposal, since the important thing is whether the
underlying value is being moved or borrowed.

@nrc
Member

nrc commented Sep 16, 2014

I am still not decided on whether this is the best approach, but a couple of thoughts if it is:

  • if we need a & to convert from String to &str and vecs (as in this proposal), I prefer to either not do it (i.e., don't implement Deref for String and Vec) or to remove the empty slicing notation. Otherwise you always have this choice of writing &s or s[] and neither is an obvious choice (I think converting &String to &str etc. would be rare, but perhaps I'm wrong).
  • Which leads to... do you have an idea how common it is to convert Smaht<T> to &T vs &Smaht<T> to &T? My intuition is the former is far more common, but the latter happens sometimes (i.e., I see a lot more &* than &**, but a non-negligible number of &**)?
  • I ask the above because, I am warming to the general idea here, but I think I would prefer if this were implemented as a change to & semantics rather than a coercion. The brief idea is that & would be the borrow operator, rather than the address of operator. It would do as many derefs as possible (i.e., using a built-in deref or with the trait), which might be zero and one address-of operation. So assuming T does not implement Deref, here is a table of examples of the type of e and &e (where e is any expression):
type of e    type of &e
T            &T
&T           &T
Rc<T>        &T
&Rc<T>       &T

and so forth. I think this gives similar results to this RFC, but also works for &T types. This does mean that if you need a &T you can always write &e no matter the type of e and it should work. This might lead people to be sloppy, I'm not sure how bad that is. I think this is as close as we can get to auto-borrowing in Rust (i.e., more so than this RFC) without losing some vital distinctions between values and references. Note that * would still do a single deref. To do an address-of rather than a borrow, I propose a &(e). Although that is a little ambiguous (although I think we can parse it sensibly, it might not be very ergonomic).

I believe the borrow operator has the following advantages over a cross-borrow-deref coercion:

  • clearer behaviour since it is not type driven,
  • better interaction with coercions (because you would still get the ones we want (e.g., unsizing) after applying the & operator, but we don't make coercions more complicated, nor check after every iteration),
  • makes the principle of borrowing even stronger because we have an operator for it,
  • better interaction with type inference (you can write let x = &y;, a coercion requires an explicit type here),
  • easier to explain.
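The type-inference point in the list above can be seen in how the coercion behaves in today's Rust. This sketch does not implement the proposed borrow operator; it only shows that a bare `&y` keeps its ordinary type unless an expected type is supplied. The function name `demo` is hypothetical.

```rust
use std::rc::Rc;

// Hypothetical demo function returning values that reveal the inferred types.
fn demo() -> (usize, i32) {
    let y = Rc::new(5i32);

    // No expected type: `a` is `&Rc<i32>`, so no coercion happens.
    let a = &y;

    // An explicit annotation supplies the expected type, so the
    // deref coercion &Rc<i32> -> &i32 applies.
    let b: &i32 = &y;

    // The explicit form gives `&i32` without any annotation.
    let c = &*y;

    // `Rc::strong_count` takes `&Rc<T>`, which confirms the type of `a`.
    (Rc::strong_count(a), *b + *c)
}

fn main() {
    assert_eq!(demo(), (1, 10));
}
```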

Does this sound like a sensible alternative or is it silly? If it is the former I can write up an RFC so we have a place to discuss, rather than doing so here.

@aturon
Member Author

aturon commented Sep 16, 2014

@nick29581

if we need a & to convert from String to &str and vecs (as in this proposal), I prefer to either not do it (i.e., don't implement Deref for String and Vec) or to remove the empty slicing notation. Otherwise you always have this choice of writing &s or s[] and neither is an obvious choice (I think converting &String to &str etc. would be rare, but perhaps I'm wrong).

I tend to agree; we could drop []. Converting from &String to &str is rare, but I suspect &mut Vec<T> to &[T] is somewhat more common.

Which leads to... do you have an idea how common it is to convert Smaht<T> to &T vs &Smaht<T> to &T? My intuition is the former is far more common, but the latter happens sometimes (i.e., I see a lot more &* than &**, but a non-negligible number of &**)?

Yes, though it depends on what implements Deref. For example, working with &Box<T> and &Vec<T> (or mutable versions) is not so uncommon.

I ask the above because, I am warming to the general idea here, but I think I would prefer if this were implemented as a change to & semantics rather than a coercion.

(details elided)

Does this sound like a sensible alternative or is it silly? If it is the former I can write up an RFC so we have a place to discuss, rather than doing so here.

It's an intriguing idea, but seems problematic. Consider cases like the following:

fn wants_vec_ref(v: &mut Vec<u8>) { ... }
fn has_vec(mut v: Vec<u8>) {
    wants_vec_ref(&mut v)
}

If Vec implements Deref, we'd have to use the special "address of" notation even though we really do want a borrow -- a borrow of the "smart pointer" itself rather than the slice it points to.
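For contrast, a sketch of how the coercion approach from this RFC handles the same case in today's Rust. The function bodies are hypothetical, since the original elides them. `&mut v` remains an ordinary borrow of the `Vec` itself, and the deref step happens only when the expected type differs.

```rust
// Hypothetical body: the original elides it.
fn wants_vec_ref(v: &mut Vec<u8>) {
    v.push(0);
}

// Hypothetical second consumer that expects a mutable slice.
fn wants_slice(s: &mut [u8]) {
    for b in s.iter_mut() {
        *b += 1;
    }
}

fn has_vec(mut v: Vec<u8>) -> Vec<u8> {
    // Types match exactly: a plain borrow of the Vec, no coercion.
    wants_vec_ref(&mut v);
    // Types differ: &mut Vec<u8> is coerced to &mut [u8] via DerefMut.
    wants_slice(&mut v);
    v
}

fn main() {
    assert_eq!(has_vec(vec![1, 2]), vec![2, 3, 1]);
}
```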

(Just wanted to jot that down for now -- more later.)

@CloudiDust
Contributor

@aturon @nick29581, another alternative would be making coercions explicit, but not too explicit.

Some ideas here: Semi-explicit coercion control with ~.

@nikomatsakis
Contributor

On Tue, Sep 16, 2014 at 01:43:23PM -0700, Nick Cameron wrote:

I think this gives similar results to this RFC, but also works for &T types.

Why does the current RFC not work for &T types? I mean, presumably
&T implements Deref<T> (it may not today, but obviously it
should), and hence &&T can be coerced to &T just as &Rc<T> can
be coerced to &T.
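A quick sketch of that point in today's Rust, where `&T` does implement `Deref` with target `T`. The names `double` and `demo` are hypothetical.

```rust
use std::rc::Rc;

// Hypothetical function expecting a single level of reference.
fn double(x: &i32) -> i32 {
    *x * 2
}

fn demo() -> (i32, i32) {
    let n = 21;
    let r: &i32 = &n;
    let rr: &&i32 = &r;
    let rc = Rc::new(21);

    // &&i32 -> &i32 and &Rc<i32> -> &i32 use the same coercion rule.
    (double(rr), double(&rc))
}

fn main() {
    assert_eq!(demo(), (42, 42));
}
```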

Regarding the slice notation, I agree that [] is not so
important. In fact, I'm wondering if maybe we actually do want
foo[a..b] to act "like an lvalue", so that one writes &foo[3..] or
&mut foo[3..]. The latter would replace foo[mut 3..]. Before I
thought this had bad ergonomics, but now it seems consistent with this
general trend towards making borrows apparent with the & operator,
as well as with &vec being (basically) the way to do what was
vec[].
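The "like an lvalue" slicing form described here matches how slicing is written in Rust today. A minimal sketch, with hypothetical helper names `tail_from` and `zero_prefix`:

```rust
// Hypothetical helper: shared borrow of a sub-slice, written &foo[a..].
fn tail_from(foo: &[i32], start: usize) -> &[i32] {
    &foo[start..]
}

// Hypothetical helper: mutable borrow of a sub-slice, written &mut foo[..b].
fn zero_prefix(foo: &mut Vec<i32>, n: usize) {
    for x in &mut foo[..n] {
        *x = 0;
    }
}

fn main() {
    let mut foo = vec![10, 20, 30, 40, 50];

    assert_eq!(tail_from(&foo, 3), [40, 50]);

    zero_prefix(&mut foo, 2);
    assert_eq!(foo, [0, 0, 30, 40, 50]);

    // With an expected slice type, plain &foo covers the whole vector.
    let whole: &[i32] = &foo;
    assert_eq!(whole.len(), 5);
}
```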

@aturon aturon mentioned this pull request Sep 17, 2014
@Valloric

@Thiez Syntax like &*foo would rightfully produce a WTF-stream from any Rust newcomer. It's not the API we should be proposing for a common operation like taking a slice out of a vector.

@Thiez

Thiez commented Jan 14, 2015

@Valloric And syntax like &'a foo would also produce a WTF-stream. And syntax like |&: n| { ... }. I think once a person understands the basics of pointers in Rust, (which are quite simple: & creates a reference, and * dereferences) and knows about Deref they should be able to understand &*foo, because it is very simple and works exactly how one would expect.

If you're going to learn a new language, you have to learn the syntax. That is not a bad thing. And while the syntax certainly shouldn't be made more complex than it needs to be, I don't think making the semantics of & more complex is going to make the language easier to learn.

@liigo
Contributor

liigo commented Jan 14, 2015

+1
On Jan 14, 2015, at 7:19 AM, "Val Markovic" notifications@github.com wrote:

@Thiez https://github.com/Thiez Syntax like &*foo would rightfully
produce a WTF-stream from any Rust newcomer. It's not the API we should be
proposing for a common operation like taking a slice out of a vector.



@retep998
Member

+1
This RFC seems like it manages to retain enough explicitness while making a relatively common task simpler to write.
I have one question though, given T: Deref<U> if a function takes a generic which is impl'd for both T and U, and I pass &T, does it result in ambiguity or simply take the option with the least derefs?

@nrc
Member

nrc commented Jan 19, 2015

There is an implementation now, and there seems to be a lot of positive sentiment for this change, should we push forward with it?

@aturon are there any open questions you're aware of? (I haven't re-read the whole conversation here, but I will if we are going to move on to this. But I'm mostly checking you don't have any new concerns which aren't already here).

I'm still a little uneasy about the 'searching' aspect of the coercion, but I think it is worth it for the ergonomic benefit.

@aturon
Member Author

aturon commented Jan 20, 2015

@retep998

I have one question though, given T: Deref<U> if a function takes a generic which is impl'd for both T and U, and I pass &T, does it result in ambiguity or simply take the option with the least derefs?

The & operator by itself keeps the same meaning it has always had: it gives you a &T. The only time the deref coercion kicks in is if you then use this &T value in a place where some &U is expected and T != U.

@aturon
Member Author

aturon commented Jan 20, 2015

@retep998

Note that this is one reason why the slice syntax &v[] is still desirable: when you're invoking a generic function and want to pass a slice, not a reference to the vector.
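A sketch of the generic case under discussion, assuming today's Rust, where the full-range slice is spelled `&v[..]` rather than the `&v[]` of this thread. The trait `Describe` and the function `which` are hypothetical. Because the generic parameter is inferred from the argument, `&v` selects the `Vec<u8>` impl with no coercion and no ambiguity; slicing explicitly is how the slice impl gets chosen.

```rust
// Hypothetical trait implemented for both T and its deref target U.
trait Describe {
    fn describe(&self) -> &'static str;
}

impl Describe for Vec<u8> {
    fn describe(&self) -> &'static str {
        "vec"
    }
}

impl Describe for [u8] {
    fn describe(&self) -> &'static str {
        "slice"
    }
}

// Hypothetical generic function; `?Sized` lets D be the unsized `[u8]`.
fn which<D: Describe + ?Sized>(d: &D) -> &'static str {
    d.describe()
}

fn main() {
    let v = vec![1u8, 2, 3];
    // D is inferred as Vec<u8>: no expected type forces a coercion.
    assert_eq!(which(&v), "vec");
    // Explicit slicing picks the slice impl.
    assert_eq!(which(&v[..]), "slice");
}
```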

@aturon
Member Author

aturon commented Jan 20, 2015

@nick29581

Yes, we're ready to move forward with this; this way we can gain some experience before the beta.

@aturon
Member Author

aturon commented Jan 20, 2015

Note: this RFC has now been merged, after some significant digestion on all our parts :-)

I won't recap the motivation/arguments here, which are well-represented in the RFC and thread. However, I will note that an implementation is nearly ready to land, and so we should have time to gain some experience here before shipping beta.

eddyb added a commit to eddyb/rust that referenced this pull request Jan 29, 2015
daramos added a commit to daramos/rust-memcmp that referenced this pull request Feb 18, 2015
withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
@Centril Centril added A-typesystem Type system related proposals & ideas A-coercions Proposals relating to coercions. labels Nov 23, 2018