Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change the strategy for diverging types #40224

Merged
merged 36 commits into from
Mar 30, 2017
Merged

Conversation

nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Mar 3, 2017

The new strategy is as follows. First, the ! type is assigned
in two cases:

  • a block with a diverging statement and no tail expression (e.g.,
    {return;});
  • any expression with the type ! is considered diverging.

Second, we track when we are in a diverging state, and we permit a value
of any type to be coerced into ! if the expression that produced
it is diverging. This means that fn foo() -> ! { panic!(); 22 }
type-checks, even though the block has a type of usize.

Finally, coercions from the ! type to any other are always
permitted.

Fixes #39808.
Fixes #39984.

@rust-highfive
Copy link
Collaborator

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@nikomatsakis
Copy link
Contributor Author

cc @eddyb @canndrew

@nikomatsakis
Copy link
Contributor Author

Still experimenting here but this shows promise. run-pass and compile-fail both basically pass, though there is one stubborn compile-fail test giving me a double warning for some reason I've not yet figured out. =)

@brson
Copy link
Contributor

brson commented Mar 3, 2017

Thanks for fixing stable regressions @nikomatsakis!

@canndrew
Copy link
Contributor

canndrew commented Mar 3, 2017

Thanks for this! I just encountered yet another package that was broken by the last ! changes, so something like this is sorely needed.

a block with a diverging statement and no tail expression (e.g.,
{return;});

Do we need to assign ! there or could we treat it the same as {return; ()}?

/// side produced `a` and one side produced `b`. If either of
/// these types is `!`, then the result is the other. Otherwise,
/// returns the LUB of the two types.
pub fn join_tys(&self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😻

@nikomatsakis
Copy link
Contributor Author

@canndrew

Do we need to assign ! there or could we treat it the same as {return; ()}?

I believe we have to treat it as ! because otherwise stuff like this will not type check:

let x = if foo {
    22
} else {
    return bar;
};

That is, if you wrote the () explicitly, the RHS has type (), and we error out because we can't unify () and usize. (If you left off the semicolon, everything would work.) So propagating the ! in this case is basically the hack that lets this code work with or without a semicolon (which I believe it should).

I agree it's a bit undisciplined, but only a bit. There seems to be no perfect answer in terms of "how much" to check in cases like these.

@nikomatsakis nikomatsakis force-pushed the issue-39808 branch 2 times, most recently from 0535e3e to f2c980c Compare March 4, 2017 02:16
@nikomatsakis
Copy link
Contributor Author

I did a crater run, though from an earlier revision (f63d9cc81c9ee3b9637b0d0343916371235967ea). There were 2 regressions and 1 timeout:

I haven't had time to investigate yet. I'll start a run in any case with the latest revision.

@nikomatsakis
Copy link
Contributor Author

New crater run gives the same two regressions. I think these are also the same problem causing the build to die. In my newer code, I am basically never creating diverging type variables; this is intentional, I wanted to see how far we could get without any fallback. It works well but we are failing with an unresolved type variable in this doctest:

```rust
use std::thread;

let handle = thread::spawn(move || {
    panic!("oops!");
});

let result = handle.join();

assert!(result.is_err());

At the moment, I'm not interesting in spending my weekend time on this bug any further, so I'll investigate later...

// None |
// Some(&Adjustment { kind: Adjust::NeverToAny, .. }) => (),
// _ => bug!("expr already has an adjustment on it!"),
// };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this used to be an assert ensuring coercions are not overwritten.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

er dang it, I meant to keep the bug!, but not the match

@nikomatsakis nikomatsakis force-pushed the issue-39808 branch 2 times, most recently from 6b4eeb5 to 5984dd7 Compare March 7, 2017 20:50
@nikomatsakis nikomatsakis changed the title WIP: change the strategy for diverging types change the strategy for diverging types Mar 7, 2017
@nikomatsakis
Copy link
Contributor Author

Crater report shows zero regressions:

  • 8350 crates tested: 6260 working / 2071 broken / 0 regressed / 0 fixed / 19 unknown.

I'm a little surprised not to see any "fixed" entries, but I think that's because many of the regressions were either "fixed" by adding explicit type annotations or occurred in tests etc.

@@ -3311,6 +3347,20 @@ impl<'a, 'gcx, 'tcx> FnCtxt<'a, 'gcx, 'tcx> {
_ => self.warn_if_unreachable(expr.id, expr.span, "expression")
}

// Non-lvalues that produce values of type `!` indicate a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lvalue thing looks like a hack. For example, isn't a dereference of an *const ! an lvalue, as in let x = *(ptr: *const !);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's interesting. I agree the lvalue rule may be a bit overly broad, although it's not obvious to me that this is wrong. For example, consider x: &! -- referencing *x does not cause divergence, rather it's supposed to just never happen. That is, the fact that x has type &! and you are able to access it reflects the fact that this code is supposed to be dead. The same is true of dereferencing a pointer of type *const !, I would argue. It does seem like it would make sense to issue some sort of warning here, though!

In any case, failing to set the "diverges!" flag then only affects unreachable code warnings and whether something like let x: ! = Some(*(ptr: *const !)); is legal (it would not be). Put another way, the diverges flag is a conservative approximation, and it should always be safe to set it less (right?).

In any case, I have not yet understood your alternative proposal, so let me try to read and process that.

@arielb1
Copy link
Contributor

arielb1 commented Mar 8, 2017

I should have looked at the code earlier, but, here am I.

The lvalue thing looks like a hack. For example, isn't a dereference of an *const ! an lvalue, as in let x = *(ptr: *const !);?

I think the cleanest way to solve this is to have both "bottom up" divergence (if node is entered, will node exit?) and "top down" (can node be reached from the start) reachability, where reachability is only used for "unreachable code" warnings.

Because the main thing divergence affects is whether blocks that return into ! can return any type (correct me if I'm wrong, but I think everybody else sets divergence to Maybe before going into a coercion), it should probably be passed "downwards" in the block code and returned everywhere else.

Reachability should probably be passed "by &mut" and cloned when you need a lattice join. I'm not sure about whether the reachability lattice should be Reachable < UnreachableWarned < Unreachable or Reachable < Unreachable < UnreachableWarned. This affects how many warnings we get in a case such as

let x = if cond {
    panic!()
} else {
    panic!();
    42 //~ WARNING unreachable code
};
42 //~ WARNING? unreachable code

Top down unreachability means we don't emit unreachable code warnings if the diverging statement was already unreachable (e.g. panic!(); { panic!(); 42 }), but I am not sure that's bad. It sounds less hacky than that lvalue thing.

// this match to get the same treatment:
//
// let _: Option<?T> = match x {
// Ok(_) => Default::default(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _match code uses try_coerce for the first arm, so you still have this problem if the arms are reversed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch! I am not entirely happy with that bit of code, but it seemed like the most expedient way to get the correct behavior. I had hoped to revisit it after introducing subtyping obligations (?T <: ?U). In any case, I'll add some examples with the match arms reversed and see what happens...

@nikomatsakis
Copy link
Contributor Author

The lvalue thing looks like a hack. For example, isn't a dereference of an *const ! an lvalue, as in let x = *(ptr: *const !);?

So, using is_lvalue was really supposed to be a convenient short-hand (aka a hack 😄) of sorts for narrowing down the kinds of expressions I care about, but I think the more interesting question is what I mean when I say causes divergence (and I'm not sure I am entirely consistent; in particular, I think I got match x { } wrong, and it does not "cause divergence").

In my view, accessing an lvalue of type ! (including *ptr where ptr: &const !) is undefined behavior. This may indeed diverge, but it may also do many other things. Luckily, it's supposed to be impossible.

In contrast, return, break, panic!() and foo() (where fn foo() -> !) cause divergence in the sense that is perfectly well-defined for control-flow to enter (and then it will, reliably, never exit normally).

In this light, match x { } does not cause divergence. A match with no arms is supposed to indicate a case where control-flow can't reach, but I am (incorrectly) setting the diverges flag here. I think that the right way to set the diverges flag is if there are >0 arms and all of them themselves diverge (this reflects the fact that match with no arms is a sort of special case, as we've seen elsewhere).

Anyway, I've got to run, so I'll try to do a better comparison to your proposal after I get my daughter out to school..

@nikomatsakis
Copy link
Contributor Author

OK, I've thought more about this but not yet reached a firm conclusion. I started preparing a super-long gist with some thoughts, but I'm still churning on those, so I just wanted to post a more limited comment amending my previous one. =) In particular, I think everything I wrote about match x { } is true -- it does not "cause divergence" in the sense that I meant -- but I think I was correct to consider it as a "diverging expression" none-the-less.

This is because it essentially plays the role of a kind of cast -- a way for users to signal that they know the code is unreachable. Kind of a safer, statically proven form of unreachable!(). An example of where this might arise is something like this code from the Void crate:

pub fn unreachable(x: Void) -> ! {
    match x {}
}

But this still builds on this notion I was going for that the set of things we consider to be "diverging expressions" ought to be a largely static set, relatively isolated from type inference (it cannot be fully isolated, because of method dispatch). I think the set of diverging expressions would be as follows:

  • return / break / continue
  • loop (without a corresponding break, which we can statically see)
  • invoking a fn that returns ! (the tricky one, see below)
  • a match where all the arms in ... diverge (this also applies if there are zero arms)
  • an if where both arms diverge (an if without an else cannot meet this definition, of course)

This property propagates upwards through enclosing expressions by default unless they join multiple control-flow paths (in which case it the upward propagation is included in the rules above).

Making the set of expressions that the compiler considers to signal "unreachability" be relatively clear and syntactic seems to fit the Rust tradition to me (e.g., it is the reason we added loop { } rather than building on while true { }). But my other reason was that I want to avoid, to the extent we can, dependence on the vagaries of type inference.

However, I am beginning to question some of the constraints I was starting from. For example, I assumed that we wanted this to type-check (because a test exists, iirc):

fn foo() -> ! {
    panic!();
    22
}

But in retrospect I'm not sure we do. After all, I broke the unit-fallback test because it is wrong-headed, and I think this may well be wrong-headed too. It's kind of hard to make sense of type-checking dead-code, but I think that one reasonable interpretation is that "one could replace the diverging expression with something that would lead to a correct type-check". That seems to not be true in this case -- so why do we apply this dead tail expression? But there may well be a reasonable reason.

One thing that I definitely think we do want to work is guaranteeing that the ; after a return is not significant. In other words, if { return } type-checks in some context, then { return; } ought to as well. This is a touch unfortunate since it means that { return; () } is not necessarily equivalent. I can live with that. But I generalized this to any statement in the block diverging, which now seems stronger than necessary -- in other words, should { return; 22; } also work? I don't think that's so important.

Anyway, this "short" comment grew to be nearly as long and disjoint as the other, but I'm going to post it anyhow. I'll probably iterate a bit more on this branch -- but I want to try and land something soon, even if we wind up tweaking it some more, since the current behavior really ought not to make it into a beta.

@nikomatsakis
Copy link
Contributor Author

@arielb1

I think the cleanest way to solve this is to have both "bottom up" divergence (if node is entered, will node exit?) and "top down" (can node be reached from the start) reachability, where reachability is only used for "unreachable code" warnings.

I agree with this. The current diverges flag is sort of playing both of these rules, and it seems a bit confusing to me. I'd probably prefer to have two distinct flags, as you say, and also to thread using parameters and return values (but I've not tried to do this yet; it may be quite painful). The current system just seems a touch error-prone to me. I'm not sure I want to do this in this branch though. Maybe.

@nikomatsakis
Copy link
Contributor Author

OK, I'm still going back and forth about how to treat match x { }. It is clearly a kind of coercion -- but I think I was perhaps right originally to say that it does not "cause divergence", rather it allows you to convert (e.g.) an empty enum -- or other uninhabited type -- into the ! type, which in turn gives access to the various special coercions that values of type ! permit. This does not necessarily imply you want an unreachable code warning though (and in fact the new warnings we were getting in the source seemed to mostly be a net negative -- the empty match was there to express that the code was ready for all cases).

@nikomatsakis
Copy link
Contributor Author

Fixing the match bug that @arielb1 identified led to some more old code not compiling. One case seemed correct to me: in the parse, there was a call like self.unreachable()?, where unreachable returns PResult<'a, T> for any T. This T is, indeed, unconstrained.

One thing that annoys me though is this case:

    tls::with_opt(move |tcx| {
        let msg = format!("{}:{}: {}", file, line, args);
        match (tcx, span) {
            (Some(tcx), Some(span)) => tcx.sess.diagnostic().span_bug(span, &msg),
            (Some(tcx), None) => tcx.sess.diagnostic().bug(&msg),
            (None, _) => panic!(msg)
        }
    });

this is judged to have an unconstrained return type in my code, and it is because the way I am typing match is leaving its result as an unconstrained type variable. This is correct in a certain sense, but not really the result I want. I want the type to be !.

The easiest way to fix this would be to have the match code check if all the arms are known to have type ! (as in this case) and then assign the match the type !. I am not thrilled about this but I guess it's not all that different from other such coercions that we do. This is the certainly the sort of "inference entanglements" I had hoped to avoid, though.

@nikomatsakis nikomatsakis force-pushed the issue-39808 branch 2 times, most recently from e3db7a1 to 03c2389 Compare March 10, 2017 14:31
For the most part, the current code performs similarly, although it
differs in some particulars. I'll be nice to have these tests for
judging future changes, as well.
Before I was checking this in `demand_coerce` but that's not really the
right place. The right place is to move that into the coercion routine
itself.
The `try_coerce` method coerces from a source to a target
type, possibly inserting adjustments. It should guarantee
that the post-adjustment type is a subtype of the target type
(or else that some side-constraint has been registered which will lead
to an error). However, it used to return the (possibly adjusted) source
as the type of the expression rather than the target. This led to
less good downstream errors.

To work around this, the code around blocks -- and particular tail
expressions in blocks -- had some special case manipulation. However,
since that code is now using the more general `CoerceMany` construct (to
account for breaks), it can no longer take advantage of that. This lead
to some regressions in compile-fail tests were errors were reported at
"less good" locations than before.

This change modifies coercions to return the target type when successful
rather the source type. This extends the behavior from blocks to all
coercions. Typically this has limited effect but on a few tests yielded
better errors results (and avoided regressions, of course).

This change also restores the hint about removing semicolons which went
missing (by giving 'force-unit' coercions a chance to add notes etc).
They are so annoying to update, and haven't caught any bugs afaik.
We no longer give suggestions; this is presumably related to the changes
I made in coercion. However, those suggestions appear to be wrong
anyhow!
@nikomatsakis
Copy link
Contributor Author

@bors r=eddyb

@bors
Copy link
Contributor

bors commented Mar 30, 2017

📌 Commit 2414222 has been approved by eddyb

@bors
Copy link
Contributor

bors commented Mar 30, 2017

⌛ Testing commit 2414222 with merge 7ae95e5...

bors added a commit that referenced this pull request Mar 30, 2017
change the strategy for diverging types

The new strategy is as follows. First, the `!` type is assigned
in two cases:

- a block with a diverging statement and no tail expression (e.g.,
  `{return;}`);
- any expression with the type `!` is considered diverging.

Second, we track when we are in a diverging state, and we permit a value
of any type to be coerced **into** `!` if the expression that produced
it is diverging. This means that `fn foo() -> ! { panic!(); 22 }`
type-checks, even though the block has a type of `usize`.

Finally, coercions **from** the `!` type to any other are always
permitted.

Fixes #39808.
Fixes #39984.
@bors
Copy link
Contributor

bors commented Mar 30, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: eddyb
Pushing 7ae95e5 to master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.