Rewrite nomicon references section #27911

Closed
Gankra opened this issue Aug 20, 2015 · 14 comments

@Gankra
Contributor

Gankra commented Aug 20, 2015

https://doc.rust-lang.org/nightly/nomicon/references.html

This involves solving the incredibly difficult question of "what on earth are Rust's True Pointer Aliasing Rules".

CC @aturon @arielb1 @nikomatsakis @pnkfelix @sunfishcode

@Gankra
Contributor Author

Gankra commented Aug 20, 2015

It has been argued that the references section should be written in terms of lvalue paths. I believe this is what the borrow checker reasons in terms of, and it is at the very least a concrete concept. However, this section should not simply model how the borrow checker thinks. The entire point is that there needs to be a more fundamental model, of which the borrow checker captures a subset but which it is fundamentally unable to capture in full. This is the model unsafe code should be written against, and the one the borrow checker can grow into if improved (e.g. nonlexical borrows).

@Gankra
Contributor Author

Gankra commented Aug 20, 2015

CC @RalfJung and co who are working on formally modeling Rust's semantics.

@RalfJung
Member

Indeed, we'll have some fun figuring this out ;-)

Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.

I should also mention that "path" is not a thing in my formal model. I don't even have a stack. It's all about owning locations, or knowing the protocols that some locations are currently subject to. (Like, a shared borrow to a basic datatype follows the protocol that everybody can read it, and that multiple reads are guaranteed to deliver the same result. A mutable borrow to a basic datatype has the protocol that you can temporarily exchange your borrow for actual ownership of the referent, but until you change this back, it is impossible for the lifetime of the borrow to end.) The challenge will be to translate these protocols, and the even more implicit notions of separation/disjointness, back to something that makes sense when looking at surface Rust code...

@aturon
Member

aturon commented Aug 21, 2015

@RalfJung

> Speaking of which, "A reference cannot outlive its referent" is already something that's not actually enforced in my model. It's only when you use a reference that you have to prove that the referent is still alive, by showing that the lifetime of the reference is still active. As long as you don't use the reference, the model doesn't care whether it is valid.

This was essentially what the whole mem::forget/thread::scoped drama was about, and it's equally true of Rust the language: the type system ensures that lifetimes are not usefully reachable outside the scope they describe, but you can e.g. stash them in a leaked Rc cycle.

I think the "path" description in the current document is a bit of a dead end for what the book is ultimately trying to do -- describe the constraints on unsafe code. I do think that a more precise version of the path explanation would be a good way to explain borrow checking, though.
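The leaked `Rc` cycle mentioned above can be sketched as follows. This is only an illustration that destructors are not guaranteed to run in safe Rust; `Node` and the three `destructor_ran_*` functions are hypothetical names invented for this example, not anything from the nomicon or the standard library.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

/// Hypothetical node type: sets a shared flag when its destructor runs.
struct Node {
    dropped: Rc<Cell<bool>>,
    next: RefCell<Option<Rc<Node>>>,
}

impl Drop for Node {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

fn new_node(flag: &Rc<Cell<bool>>) -> Rc<Node> {
    Rc::new(Node {
        dropped: Rc::clone(flag),
        next: RefCell::new(None),
    })
}

/// Baseline: the node goes out of scope and its destructor runs.
fn destructor_ran_normally() -> bool {
    let flag = Rc::new(Cell::new(false));
    {
        let _node = new_node(&flag);
    }
    flag.get()
}

/// The node points at itself, so its strong count never reaches zero.
fn destructor_ran_after_cycle() -> bool {
    let flag = Rc::new(Cell::new(false));
    {
        let node = new_node(&flag);
        *node.next.borrow_mut() = Some(Rc::clone(&node));
    }
    flag.get()
}

/// `mem::forget` is a safe function that skips the destructor outright.
fn destructor_ran_after_forget() -> bool {
    let flag = Rc::new(Cell::new(false));
    std::mem::forget(new_node(&flag));
    flag.get()
}

fn main() {
    println!("normal: {}", destructor_ran_normally()); // true
    println!("cycle:  {}", destructor_ran_after_cycle()); // false
    println!("forget: {}", destructor_ran_after_forget()); // false
}
```

Because both leaking paths are entirely safe code, an unsafe abstraction cannot rely on a destructor running in order to stay sound.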

@Parakleta

I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that

let _ = Iron::new(hello_world).http("localhost:3000").unwrap();

for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but

let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();

extends the lifetime to the enclosing block.

I can understand that

let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();

extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of let _ = <rvalue> differs from let _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?

@Gankra
Contributor Author

Gankra commented Nov 10, 2015

Interesting! @eddyb any thoughts on this?

@nikomatsakis
Contributor

On Mon, Nov 09, 2015 at 06:05:20PM -0800, Parakleta wrote:

> I assume this is the correct issue to add this question to. I've discovered through some experimentation and by reading #10488 that
>
> let _ = Iron::new(hello_world).http("localhost:3000").unwrap();
>
> for example causes the destructor to be run immediately (i.e. the end of the statement) and so joins the thread and blocks further execution, but
>
> let _ = &Iron::new(hello_world).http("localhost:3000").unwrap();
>
> extends the lifetime to the enclosing block.
>
> I can understand that
>
> let _listen = Iron::new(hello_world).http("localhost:3000").unwrap();
>
> extends the lifetime to the enclosing block because that is the scope of the variable _listen, even though it is unused. What I don't understand is how the lifetime of _ = <rvalue> differs from _ = &<rvalue>. Is this a difference I should be relying on? What is the correct method to control the lifetime of unused/anonymous objects?

Yes, these are all different. It's kind of the intersection of two
distinct rules. The mental model is roughly that the initializer is
stored into a temporary which has the lifetime of the statement. When
you do let <pat> = <initializer>, then, the pattern is matched
against this temporary. It may move things out of the temporary and
place them into fresh bindings, which then live as long as the block,
but things it does not move get dropped along with the temporary.

So something like let (foo, _) = <expr> is roughly as if you did:

let foo;
{
    let temp = <expr>;
    foo = temp.0;
}

Note that _ is not an identifier, it is a pattern which means "ignore this value".
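The tuple case can be observed directly with a type that logs its own destruction. This is a minimal sketch: `Noisy`, `make_pair`, and `tuple_pattern_demo` are hypothetical names introduced for illustration, and the shared log exists only so the drop order can be inspected.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn make_pair(log: &Log) -> (Noisy, Noisy) {
    (
        Noisy { name: "kept", log: Rc::clone(log) },
        Noisy { name: "ignored", log: Rc::clone(log) },
    )
}

fn tuple_pattern_demo() -> Vec<String> {
    let log = Log::default();
    {
        // Field 0 is moved into `_kept`; field 1 matches `_`, so it stays in
        // the temporary tuple and is dropped at the end of this statement.
        let (_kept, _) = make_pair(&log);
        log.borrow_mut().push("after let".to_string());
    } // `_kept` is dropped here, at the end of the block.
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in tuple_pattern_demo() {
        println!("{event}");
    }
    // Prints:
    // drop ignored
    // after let
    // drop kept
}
```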

So in terms of your examples:

let _ = foo.unwrap() means: call unwrap and discard result (drops
immediately).

let _x = foo.unwrap() means: call unwrap and store result into a
variable called _x (drops when _x is dropped)

Meanwhile, orthogonally: &foo.unwrap() means "create a temporary
stack slot and store foo.unwrap() into it". Because it is being
stored into a let binding, the lifetime of this temporary is
extended to the enclosing block.
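The three forms discussed here can be compared side by side. This is a sketch under the same assumptions as the thread: `Noisy`, `noisy`, `finish`, and the three demo functions are hypothetical names, and a logging destructor stands in for the Iron listener.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical helper type: logs "drop" when its destructor runs.
struct Noisy {
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push("drop".to_string());
    }
}

fn noisy(log: &Log) -> Noisy {
    Noisy { log: Rc::clone(log) }
}

fn finish(log: Log) -> Vec<String> {
    let events = log.borrow().clone();
    events
}

/// `let _ = <rvalue>`: nothing is bound, so the value is dropped at once.
fn wildcard() -> Vec<String> {
    let log = Log::default();
    {
        let _ = noisy(&log);
        log.borrow_mut().push("end of block".to_string());
    }
    finish(log)
}

/// `let _x = <rvalue>`: the value is moved into a variable that lives
/// until the end of the block.
fn named() -> Vec<String> {
    let log = Log::default();
    {
        let _x = noisy(&log);
        log.borrow_mut().push("end of block".to_string());
    }
    finish(log)
}

/// `let _ = &<rvalue>`: the `&` creates a temporary whose lifetime is
/// extended to the enclosing block, even though the reference is ignored.
fn borrowed_wildcard() -> Vec<String> {
    let log = Log::default();
    {
        let _ = &noisy(&log);
        log.borrow_mut().push("end of block".to_string());
    }
    finish(log)
}

fn main() {
    println!("{:?}", wildcard()); // ["drop", "end of block"]
    println!("{:?}", named()); // ["end of block", "drop"]
    println!("{:?}", borrowed_wildcard()); // ["end of block", "drop"]
}
```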

It's possible that the lifetime of the temporary we create when doing
pattern matching in a let should be the enclosing block, rather than
the let statement. This would be perhaps more analogous with the &
rules. But I wonder if this would break existing code; it's hard to
know. I'm not 100% sure why I didn't do it this way at the time,
because I remember being annoyed that let _ = foo() and let _x = foo() were not equivalent. That said, there are many who believe
they should not be; I can't find the issue now, but there was at one
point specific code in trans to ensure that let _ = foo() would drop
the result of foo() immediately.

@Parakleta

The discussion in #10488 for the distinction between _ and _x makes sense to me, and I'm happy with the rationale that let _ = <rvalue> is essentially a no-op. I'm just confused by the & case. Does it mean that an & anywhere in an expression always creates a temporary that has the lifetime at least as long as the enclosing block? The statement &Iron::new().http().unwrap(); doesn't, but I assume that's because it's a statement and not an expression.

@nikomatsakis
Contributor

On Tue, Nov 10, 2015 at 12:39:02PM -0800, Parakleta wrote:

> Does it mean that an & anywhere in an expression always creates a
> temporary that has the lifetime at least as long as the enclosing
> block?

No. The rules are more subtle than that. Temporaries usually live
until the end of the current statement (the let, in this case) but
if they appear in specific places, they are extended until the end of
the block. Basically, if they appear in a place where it is unambiguous
that they would be stored into the result of a let.

Hence, the following temporaries will last until end of block:

let <pat> = &<expr>
let <pat> = StructName { field: &<expr>, ... }

But (under current rules) this would not:

let <pat> = method(&<expr>);

See http://doc.rust-lang.org/reference.html#temporary-lifetimes for
more details and more examples.
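The extended and non-extended positions listed above can be checked with a logging destructor. This is a minimal sketch: `Noisy`, `Holder`, `touch`, and `extension_demo` are hypothetical names chosen for illustration, and the drop order shown is what the syntactic rules described in this thread imply.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<String>>>;

/// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

fn noisy(name: &'static str, log: &Log) -> Noisy {
    Noisy { name, log: Rc::clone(log) }
}

/// Stands in for `StructName { field: &<expr>, ... }`.
struct Holder<'a> {
    field: &'a Noisy,
}

/// Stands in for `method(&<expr>)`; the result does not borrow from `n`.
fn touch(n: &Noisy) -> &'static str {
    n.name
}

fn extension_demo() -> Vec<String> {
    let log = Log::default();
    {
        // Extended: the operand of `&` is directly the let initializer.
        let r = &noisy("ref", &log);
        // Extended: the operand of `&` is a field of a struct expression.
        let h = Holder { field: &noisy("field", &log) };
        // Not extended: the temporary is a function argument, so it is
        // dropped at the end of this statement.
        let name = touch(&noisy("arg", &log));
        log.borrow_mut()
            .push(format!("end of block: {} {} {}", r.name, h.field.name, name));
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in extension_demo() {
        println!("{event}");
    }
    // Prints:
    // drop arg
    // end of block: ref field arg
    // drop field
    // drop ref
}
```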

@Parakleta

So the let _ = <rvalue>; statement lifetime and the let <pat> = &<rvalue>; statement lifetime are different, but my confusion comes from the idea (maybe I'm missing something) that _ is a valid <pat> and &<rvalue> is also a valid <rvalue>, so the statement let _ = &<rvalue> would match both rules?

Does this mean that let <pat> = &<rvalue> has higher priority than let _ = ... and should we assume this will always be true?

@eddyb
Member

eddyb commented Nov 12, 2015

@Parakleta let _ = <rvalue>; drops the RHS immediately but always evaluates it, so the rules for &<rvalue> still apply, even if the reference is dropped (which is a no-op because references are Copy).

@Parakleta

Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.

@nikomatsakis
Contributor

On Thu, Nov 12, 2015 at 04:06:03PM -0800, Parakleta wrote:

> Thanks, I just noticed the text "The compiler uses simple syntactic rules to decide" in the reference manual. This all seems a bit fragile, considering that the decision is made on Syntax rather than Semantics. For example, if I have let _ = nop!(&<rvalue>); the lifetime is unknown without knowing exactly the contents of the macro. It seems RFC66 (Issue #15023) has the potential to end up changing this anyway. I'll steer clear of let _ = &<rvalue> for now and stick with let _tmp = <rvalue> instead.

It is true that you have to know the contents of the macro. However,
the motivation for using syntax was precisely to make it easier to
follow -- people were nervous about relying on inference to decide
when a destructor runs, since inference algorithms might be overly
conservative, or change over time.

@steveklabnik
Member

Moving this to rust-lang/nomicon#7
