Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make MIR drop terminators borrow the dropped location #61069

Merged
merged 1 commit into from
Jun 5, 2019

Conversation

Zoxc
Copy link
Contributor

@Zoxc Zoxc commented May 23, 2019

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 23, 2019
@Centril
Copy link
Contributor

Centril commented May 23, 2019

@bors rollup

// Drop terminators borrows the location
TerminatorKind::Drop { location, .. } |
TerminatorKind::DropAndReplace { location, .. } => {
if let Some(local) = find_local(location) {
Copy link
Contributor

@Centril Centril May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BorrowedLocalsVisitor {
sets,
}.visit_terminator(self.mir[loc.block].terminator(), loc);
}.visit_terminator(terminator, loc);
match &terminator.kind {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use if let?

@eddyb
Copy link
Member

eddyb commented May 23, 2019

r? @pnkfelix cc @nikomatsakis

@rust-highfive rust-highfive assigned pnkfelix and unassigned eddyb May 23, 2019
@Zoxc
Copy link
Contributor Author

Zoxc commented May 24, 2019

HaveBeenBorrowedLocals is used to find when references and raw pointers to a local can exist and is only used for immovable generators when computing liveness of locals. User code may capture pointers to locals with Drop impls and access them using unsafe code.

@eddyb
Copy link
Member

eddyb commented May 24, 2019

@Zoxc Isn't it UB to capture self in a Drop impl? cc @rust-lang/wg-unsafe-code-guidelines

@RalfJung
Copy link
Member

@eddyb I don't understand the question. What is "capturing self"? Do you have some example code?

@eddyb
Copy link
Member

eddyb commented May 24, 2019

@RalfJung I mean, I'm assuming @Zoxc meant something like this:

struct Foo<T>(T, &mut *const T);
impl<T> Drop for Foo<T> {
    fn drop(&mut self) {
        *self.1 = &self.0 as *const T;
    }
}

@RalfJung
Copy link
Member

I don't see any grounds for making this UB. However, if the caller does a StorageDead or otherwise deallocates the backing storage of the drop, then ever using that pointer to access memory is UB (as part of the normal rule that accessing a dangling pointer is UB).

@RalfJung
Copy link
Member

Also, why the question? What is the analysis/optimization that would benefit from that code being UB? Honestly I have no idea what this PR even does, drop already borrows the dropped location because it receives an &mut, does it not?

@eddyb
Copy link
Member

eddyb commented May 24, 2019

@RalfJung Well, the issue is that in MIR it's Drop(Place), i.e. drop(x) not drop(&mut x).
So there's no explicit borrow involved, and this PR takes into account the implicit borrow.
(A borrow would technically be wrong anyway, without it being &move / &own)

I guess we shouldn't just assume it's UB, just like keeping a raw pointer to a moved-from location?

It does throw a wrench into my plan of "assume borrows are cancelled by moves/drops" to limit the effects of borrows, but at least for some optimizations it all works out if there are no derefs/calls between Move(x)/Drop(x) and StorageDead(x) (since then any raw pointers couldn't have been used anyway).

@RalfJung
Copy link
Member

this PR takes into account the implicit borrow

Takes into account for what? The borrow checker?

"assume borrows are cancelled by moves/drops"

Yeah, that is hard to realize. Even if we make moves write Undef in the storage they come from (which would be annoying to do in Miri because then reads need mutable access to the abstract machine memory), pointers to that storage would still be live and could be written to.

@eddyb
Copy link
Member

eddyb commented May 24, 2019

Just because I got a bit confused and thought this PR might not have any effect, here's an example:

let mut x = foo();
x = bar(); // DropAndReplace
drop(x);
yield;
baz(); // may access `x` via raw pointers, `x` must be in generator state
// no Drop at the end of scope, only StorageDead

Note that without the drop(x) you'd have an end-of-scope Drop which I would expect to keep x in the generator state across the yield point, even without this PR.

And I think that for regular end-of-scope Drops, that are followed by StorageDead without any intervening yields, this PR has no effect, so it's only Drops from DropAndReplace (assuming drop elaboration ran already) that will cause more locals to be kept in the generator state.

cc @cramertj @tmandry How can we measure the impact this has on async fn generators?
It may result in bloat, in certain specific conditions (which could be rare but I'm not sure).

@tmandry
Copy link
Member

tmandry commented May 24, 2019

Maybe I'm similarly confused, but we primarily use liveness_of_locals to determine which locals to save, and that already says it takes drops into account. Assuming that's true I actually don't think this would have an impact today.

That being said, it makes me nervous that we allow such things like capturing the pointer to a local inside a drop impl and later using it. Like if I were writing unsafe code, I wouldn't expect this to be allowed, at least without consulting a reference of some sort. It sounds like I've missed some previous discussions on the topic, but keeping our options open for future optimizations is important if we want to deliver on Rust's promises about performance.

@RalfJung
Copy link
Member

keeping our options open for future optimizations is important if we want to deliver on Rust's promises about performance.

At the same time, keeping our semantics concrete and operational is important if we want it to be possible for mere mortals to write correct unsafe code.

Right now, I consider drop to be just another function call. There's nothing special about it. drop(place) (in the MIR) as a terminator is equivalent to _tmp = &mut place; Drop::drop(_tmp). That's an easy to implement, easy to remember, easy to reason about rule.

"pointers may not escape from drop" is an axiom, a wish that you say you want to be true. To make things worse, "escape" is a rather fuzzy notion. To make that wish come true we have to figure out what exactly "escape" means, and we have to add clauses to our semantics that say, in a concrete operational way, where the UB here comes from. Think of implementing this as a check in Miri.

To give one example of what "escape" can not mean: it cannot mean "the address is not written to any memory outside the drop stack frame". It is legal today to write safe code in drop that calls rayon::join to send the &mut self of drop temporarily to another thread.

I also don't see the benefit in terms of optimization: if the compiler is supposed to know that some memory is not reachable any more through pointers "escaped" into somewhere, it should deallocate that memory (e.g. using StorageDead). Alternatively we might want to argue based on aliasing and provenance using Stacked Borrows. Which extra power is this clause you are suggesting giving?

@pnkfelix
Copy link
Member

Okay, after some puzzling and re-reading the dialogue here, I think I understand what this PR is doing.

The important bit is what eddyb said up above; in terms of runtime semantics, the MIR operator drop(local) really behaves as _tmp = &mut place; Drop::drop(_tmp); (or if you prefer, _tmp = &move place; Drop::drop(_tmp)). And that's important, because that means that drop(local) has an implicit borrow of local that is not reflected in a MIR (and thus is not exposed by the visit code's traversal over the MIR that the borrowed-locals code is otherwise using to identify the borrows).

  • Maybe that is a footgun we should think about trying to address, in terms of the design of MIR itself or in the design of the mir::visit code.

@pnkfelix
Copy link
Member

pnkfelix commented May 29, 2019

@tmandry wrote:

That being said, it makes me nervous that we allow such things like capturing the pointer to a local inside a drop impl and later using it

Well, at this point we have set down our drop order for e.g. struct fields.

So one can imagine examples like this, which I think should be valid (play):

use std::fmt::Debug;

struct D<A: Debug>(*const A);
struct Two<A: Debug> { d: D<A>, a: A }

impl<A: Debug> Drop for Two<A> {
    fn drop(&mut self) {
        println!("Dropping Two");
        self.d.0 = &mut self.a as *const A;
    }
}
    
impl<A: Debug> Drop for D<A> {
    fn drop(&mut self) {
        let a: A = unsafe { std::ptr::read(self.0) };
        println!("Dropping D({:?})", a);
        std::mem::forget(a);
    }
}

fn main() {
    let t = Two { d: D(std::ptr::null()), a: format!("Hello") };
}

and examples like this, which I think are UB (play):

use std::fmt::Debug;

struct D<A: Debug>(*const A);
struct Two<A: Debug> { a: A, d: D<A> }

impl<A: Debug> Drop for Two<A> {
    fn drop(&mut self) {
        println!("Dropping Two");
        self.d.0 = &mut self.a as *const A;
    }
}
    
impl<A: Debug> Drop for D<A> {
    fn drop(&mut self) {
        let a: A = unsafe { std::ptr::read(self.0) };
        println!("Dropping D({:?})", a);
        std::mem::forget(a);
    }
}

fn main() {
    let t = Two { d: D(std::ptr::null()), a: format!("Hello") };
}

(Note that the only difference between those two examples is the order of the struct field declarations in struct Two. When d comes before a lexically, as in the first case, that means d will be dropped before a, and so the unsafe pointer it carries will remain valid long enough for the destructor for d to run successfully. When d comes after a, then a is dropped before d's destructor runs, and we get haywire.)

@tmandry
Copy link
Member

tmandry commented May 29, 2019

Okay, what I'm gathering from this discussion is that we can use StorageDead to get the semantics that I really want. I'm working on a PR that emits StorageDead after every drop. Unfortunately, it can't be used in functions that aren't generators due to the compile time performance hit of doing this, but I guess that's another matter.

The important bit is what eddyb said up above; in terms of runtime semantics, the MIR operator drop(local) really behaves as _tmp = &mut place; Drop::drop(_tmp); (or if you prefer, _tmp = &move place; Drop::drop(_tmp)). And that's important, because that means that drop(local) has an implicit borrow of local that is not reflected in a MIR (and thus is not exposed by the visit code's traversal over the MIR that the borrowed-locals code is otherwise using to identify the borrows).

Semantically I agree with this change then. I don't think it changes anything we do today, but it is more correct. I think this reasoning should be laid out a bit more explicitly in comments, since a few of us were confused by what the change was doing.

@eddyb
Copy link
Member

eddyb commented May 29, 2019

Unfortunately, it can't be used in functions that aren't generators due to the compile time performance hit of doing this, but I guess that's another matter.

Just to be clear, there's a significant hit from StorageDeads added only to the unwind path?
Are you suppressing the codegen of llvm.lifetime.end calls in the unwind path? If not, I'd try it.

@tmandry
Copy link
Member

tmandry commented May 29, 2019

Just to be clear, there's a significant hit from StorageDeads added only to the unwind path?
Are you suppressing the codegen of llvm.lifetime.end calls in the unwind path? If not, I'd try it.

Yes; some benchmark results are here. Keep in mind that this does cause new basic blocks to be created. I have not tried suppressing those in the codegen yet.

@RalfJung
Copy link
Member

I'm working on a PR that emits StorageDead after every drop.

Just to be clear, this would happen during MIR construction when we know that this is really a "final" drop because something went out of scope, and not a "drop-and-replace", right?

@tmandry
Copy link
Member

tmandry commented May 29, 2019

Just to be clear, this would happen during MIR construction when we know that this is really a "final" drop because something went out of scope, and not a "drop-and-replace", right?

Yep.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 3, 2019

@tmandry are you saying I should close this PR and wait for you to implement the semantics you actually desire?

As far as I can tell, this PR is putting in a fix, though its not 100% clear to me whether the fix is actually observable. (The lack of tests in the PR may be a reason to r- for the short term...)

@tmandry
Copy link
Member

tmandry commented Jun 3, 2019

@pnkfelix No sorry, I agree with your conclusion that this PR does implement a fix, which probably isn't observable today (but should still be merged IMO). My work doesn't change that.

Sorry for throwing the discussion off track a bit, I was just trying to clarify my own understanding.

@pnkfelix
Copy link
Member

pnkfelix commented Jun 4, 2019

@bors r+

@bors
Copy link
Contributor

bors commented Jun 4, 2019

📌 Commit cb6beef has been approved by pnkfelix

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 4, 2019
Centril added a commit to Centril/rust that referenced this pull request Jun 4, 2019
Make MIR drop terminators borrow the dropped location

r? @eddyb
cc @tmandry
pietroalbini added a commit to pietroalbini/rust that referenced this pull request Jun 4, 2019
Make MIR drop terminators borrow the dropped location

r? @eddyb
cc @tmandry
bors added a commit that referenced this pull request Jun 4, 2019
Rollup of 5 pull requests

Successful merges:

 - #61069 (Make MIR drop terminators borrow the dropped location)
 - #61453 (Remove unneeded feature attr from atomic integers doctests)
 - #61488 (Fix NLL typeck ICEs)
 - #61500 (Fix regression 61475)
 - #61523 (Hide gen_future API from documentation)

Failed merges:

r? @ghost
@bors bors merged commit cb6beef into rust-lang:master Jun 5, 2019
@Zoxc Zoxc deleted the drop-borrow-fix branch June 5, 2019 03:39
bors added a commit that referenced this pull request Feb 19, 2020
…sleywiser

Combine `HaveBeenBorrowedLocals` and `IndirectlyMutableLocals` into one dataflow analysis

This PR began as an attempt to port `HaveBeenBorrowedLocals` to the new dataflow framework (see #68241 for prior art). Along the way, I noticed that it could share most of its code with `IndirectlyMutableLocals` and then found a few bugs in the two analyses:
- Neither one marked locals as borrowed after an `Rvalue::AddressOf`.
- `IndirectlyMutableLocals` was missing a minor fix that `HaveBeenBorrowedLocals` got in #61069. This is not a problem today since it is only used during const-checking, where custom drop glue is forbidden. However, this may change some day.

I decided to combine the two analyses so that they wouldn't diverge in the future while ensuring that they remain distinct types (called `MaybeBorrowedLocals` and `MaybeMutBorrowedLocals` to be consistent with the `Maybe{Un,}InitializedPlaces` naming scheme). I fixed the bugs and switched to exhaustive matching where possible to make them less likely in the future. Finally, I added comments explaining some of the finer points of the transfer function for these analyses (see #61069 and #65006).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants