Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abort by default v2 #1765

Closed
wants to merge 5 commits into from
Closed

Abort by default v2 #1765

wants to merge 5 commits into from

Conversation

ticki
Copy link
Contributor

@ticki ticki commented Oct 5, 2016

This version is updated to reflect all the concerns mentioned in the previous proposal. Especially the concerns for stability are now fixed. Instead of switching the default, we switch what cargo new generates as well as making cargo build add panic=unwind to smoothen the transition to mandatory choice of panicking strategy.

Rendered

cc. @Ericson2314 @eddyb @rkruppe @petrochenkov @apasel422 @steveklabnik @sfackler @Amanieu @frewsxcv @aturon @brson @Valloric @gnzlbg @nikomatsakis (participants in the previous thread).

@ticki ticki mentioned this pull request Oct 5, 2016
@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 5, 2016

If I specify panic=abort for a binary but try to use a library that explicitly specifies panic=unwind (or nothing) does cargo currently produce a build time error?

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

@gnzlbg Yes.

@hanna-kruppe
Copy link

@gbzlbg It does not.

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

@rkruppe Are you sure about that? If so, I'll add it to the RFC.

@hanna-kruppe
Copy link

hanna-kruppe commented Oct 5, 2016

I tested it and reportedmy findings in the old thread. It's easy to try for yourself, too.

# Detailed design
[design]: #detailed-design

Have `cargo` generate `panic=abort` to the `Cargo.toml`. Whenever the user inteacts with `cargo` (by using a command) and the `Cargo.toml` has no specified panicking strategy, it will add `panic=unwind` and leave a note to the user:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inteacts -> interacts


Warning: No panic strategy was specified, added unwinding to the `Cargo.toml` file, please consider change it to your needs.

For libraries, nothing is modified, as they inherit the strategy of the binary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens for crates with both? I believe this is a fairly common pattern; I assume they would set it to unwind since they have the binary component?

Use unwind in `debug` and abort in `release`.

Make use of more exotic instructions like `bound`, allowing e.g. overflow or bound checking without branches. This comes at the expense of error output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember seeing stack maps discussed as an alternative, can someone speak to that? I don't know much about whether those are beneficial here/related to this....

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 5, 2016

For libraries, nothing is modified, as they inherit the strategy of the binary.

It seems to me that we should make the panic strategy always explicit, and if a library can work with both abort and unwind we should explicitly say so with panic=all (or a bikeshed thereof), which should probably be the default for libraries.

@Idyllei
Copy link

Idyllei commented Oct 5, 2016

I don't quite understand why we should force a program into aborting as a binary.

Is it possible that the program was specifically designed to take advantage of unwinding and that removing that possibility could hurt the performance/reliability in some way?

Maybe the performance hit from unwinding is negligible for a certain application, or the program is (hopefully not) developed in a way that unwinding is required for it to function correctly.

I do not think a programmer should be forced into using one method of recovering from errors just because it might save space or decrease compile time.

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

@Idyllei

Is it possible that the program was specifically designed to take advantage of unwinding and that removing that possibility could hurt the performance/reliability in some way?

Chances are that it's not, and thus aborting is IMO a saner default. You don't "force" the program into anything, you change the default, and if you believe your program is better off w/ unwinding, then you can just change the default.

@Mark-Simulacrum
Copy link
Member

I might be wrong, but won't abort by default make hyper and it's stack unable to survive panics in the request handler (without changing the behavior to unwinding)?

I believe the RFC should discuss the impact on libtest (which depends on non-abort), hyper/iron, and other programs which likely need to survive a panic. While changing the panic behavior is an option, I don't believe it should be required to alter defaults when adding tests.

I believe @ticki is arguing that changing the default is easy when required, but I think it still is an overhead, especially for newcomers to the language, that we should try to avoid.

@gnzlbg
Copy link
Contributor

gnzlbg commented Oct 5, 2016

@Idyllei

I don't quite understand why we should force a program into aborting as a binary.

One pillar of Rust is zero-cost abstractions. This goes hand-in-hand with "don't pay for what you don't use".

Currently, the default is that all binaries pay for unwinding in terms of performance, binary size, and compilation speed, with respect to aborting, even if they do not use unwinding or anything that does.

This RFC just argues that it would be a better default for new projects to abort by default, and if they want to use a library that requires unwinding (or use unwinding themselves), then opt into it. Instead of the other way around.

As a side-effect, this discourages library writers even more from writing libraries that implicitly assume that most binaries will use unwinding as a panicking strategy. This might encourage library writers to assume that most binaries use abort as a panicking strategy, but this might be a good thing. Since while panics should only be used for fatal unrecoverable errors, it is tempting for some authors to use them for error handling just because they think they might get away with it. Switching to abort as a default is a clear statement that this isn't the intended usage of panic, which I think is a nice side-effect of this RFC, but not the main reason while it is being proposed.

The main reason is "pay for what you use, don't pay for what you don't use, by default".

@Mark-Simulacrum
Copy link
Member

Another argument that I see is, as far as I know, there has been discussion treating OOM and other similar conditions as "panic" which could then be caught when necessary.

However, despite my earlier comments, I think this RFC in general is a good idea especially considering the "don't pay for what you don't use" argument. However, I believe that one thing to consider is that rustc itself has to use unwinding (as far as I know), and I suspect this will be common. It would be interesting and helpful to have some numbers as to the percent of crates that depend at some level on unwinding instead of aborting.

@ticki
Copy link
Contributor Author

ticki commented Oct 5, 2016

@Mark-Simulacrum Your 2nd last comment is a good point. I've tried to address it in the latest commit.

@Ericson2314
Copy link
Contributor

@ticki I'd also add a benefit on simpler programming model---use what you need in terms of cognitive burden too.

One hitch is that as per rust-lang/rust#32837 (comment) I don't think the current Cargo option is very well designed. Relatedly, post-needs-provides I'd like rustc to only think of the ABI concerns of various panic strategies, not the strategies themselves. Together this means while I am 100% on board the goals of this RFC, I also don't want to stabilize Cargo's current interface any time soon :/.


Keep unwinding as default.

Make Cargo set `panic=abort` in all new binaries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer an alternative but the main proposal.

@Mark-Simulacrum
Copy link
Member

Regarding the unresolved question, searching for catch_unwind is a start, but I believe at most (probably a subset of) all crates which spawn threads depend on unwinding. For example, Iron and Hyper both do not have catch_unwind in their codebase, but do catch unwinds at the thread boundary.


Will the proposed panic=any be passed to rustc with -Cpanic=any? Or will it be Cargo which checks crate compatibility?

@Idyllei
Copy link

Idyllei commented Oct 5, 2016

As I understand it, if one uses panic=abort in their code, they are not able to use (binary?) libraries that have panic=unwind without compilation errors.

Would it be feasibly possible to allow interconnected libraries/crates to implement differing panic=_ settings?

For example: library A uses panic=unwind, library B uses panic=abort, and B imports/uses library A. Should a panic cause the section of the stack for A to be unwound? If it does, then would the unwinding cause the program to abort upon reaching a section of the stack that uses panic=abort?

The process might be like this:

  1. panic!() is called from within A.
  2. The stack is unwound until either:
    1. a solution is found, or
    2. it reaches a panic boundary (a stack frame which belongs to a library that uses panic=abort).
  3. If it reaches a Panic Boundary, it aborts execution.

@codyps
Copy link

codyps commented Oct 5, 2016

While I'd like to get abort by default, I'm not sure this is really the best way to do it:

Crates requiring panic=unwind is (in some sense) an unstated dependency (of a sort: a crate "depending" on panic=unwind would, in a conservative setup, require that all other crates it links to are built with panic=unwind), and I'd much rather we have a way for that dependency to be stated rather than requiring the end consumer crate to know. This RFC still seems to try to work within the existing cargo configuration, which I don't think can represent that dependency.

Anyhow, a more specific RFC comment:

  • I'd recommend against having cargo modify an already created Cargo.toml. Instead, just keep warning about the lack of a specified panic strategy, defaulting to the target's default (targets recently grew support to specify the panic strategy).

@hanna-kruppe
Copy link

@Idyllei

As I understand it, if one uses panic=abort in their code, they are not able to use (binary?) libraries that have panic=unwind without compilation errors.

No, this scenario works. std is distributed as binary with unwinding support, and almost all panic=abort applications happily link against it. What doesn't work today is linking a panic=abort library into a panic=unwind application. When @ticki brought this up in the previous RFC PR, we talked about it on IRC about why it's difficult (in short: you can't unwind through code compiled without unwinding support, and because of function pointers and trait objects and other things you may need to do that). Your proposal works around that problem by aborting when unwinding up to a frame without unwinding support, but I'm unsure whether that behavior is desirable, and what's more, I'm unsure whether it's possible: unwinding across an FFI boundary is explicitly undefined behavior (rather than, say, aborting). Some of the difficulties with handling foreign stack frames may carry over to handling Rust-with-panic=abort stack frames.

I'm also fuzzy on what the benefit of such a change would be.

@seanmonstar
Copy link
Contributor

Iron and Hyper both do not have catch_unwind in their codebase, but do catch unwinds at the thread boundary.

Regarding hyper: in version 0.9.x, it does spawn threads (since thats required to handle more than 1 connection thanks to synchronous IO), it doesn't depend on the the threads catching panics. If the process aborted instead, that'd be fine. That may not be something a user wants of their server, and instead wants the other connections to be able to safely shutdown or whatever, but that's not something hyper cares about.

In version 0.10, once using async IO, hyper won't even be spawning threads, so it's even less of an issue. Instead, any panic during the event loop means it could all die in a fire.


## Libraries

For libraries, the `Cargo.toml` is not modified, as they inherit the strategy of the binary.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conflicts with the above, which says a default of any will be added.

@hanna-kruppe
Copy link

One aspect that the RFC, and most of the discussion so far, has ignored is that the panic strategy is per profile. There is not one panic strategy but potentially many, and in fact I will later argue that they should not all be the same.

I build some crates with panic=abort, but I only do this for the release profile (I'd also add it to the bench profile if I had any benchmarks that go through Cargo). The dev and test profiles in my projects default to unwinding and I don't really see a problem with that:

  • The performance gains will evaporate without optimizations
  • The size gains will be greatly reduced since the binary is much larger anyway
  • The compilation speed gain still applies, but probably slightly less so (smaller code is generated, but there also aren't as many optimization passes combing through all that code)

Furthermore, unwinding is virtually unavoidable for testing, since it's needed for #[should_panic] tests! By the way, this is not a weakness of the current testing harness, anything which tests for panics will need to detect panics and carry on, unless it launches a whole new process for every such test.

Additionally, it has been pointed that there are some downsides to 99% of projects being built with the same panic strategy, regardless of which one it is:

  • predominance of unwinding means people could more easily use catch_unwind for error handling
  • predominance of aborting means unsafe code may not be written and tested for exception safety

Therefore, it seems good to "mix things up". That is, if I could choose what panic strategy 99% of crates should have, I would opt for:

  • abort in release, bench — default to maximum performance
  • unwind in dev, test, doc — necessary for a significant chunk of testing, and avoids a panic=abort monoculture

@internetionals
Copy link

The only way I could see this working, would be to require a crate to explicitly state panic=unwind or panic=any in order to make 'catch_unwind' available. This way any crate that relies on unwind will automatically need annotations.

All other crates should not have to specify anything and thus should follow the dependee.

Crates can always explicitly set panic={unwind,abort,any} and all when compiling a program all crates should have compatible settings. But the idea is that almost no library crate should need such an explicit setting and mismatches should thus very rarely occur.

The behaviour of thread exit is not tied to thus setting, since it represents a totally different isolation boundary. The most common form if panic recovery in this case is around worker pools and letting the panic policy be controlled by the dependee is idiomatic in this case. In the very rare case that a crate is absolutely relying on error recover through threads panicing it can still explictly state panic=unwind.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 5, 2016
@sfackler
Copy link
Member

With that argument we should not have made mem::forget safe in the first place. And I believe several leakages can happen with panic=unwind too, Vec::drain comes to mind e g.

unsafe does not mean "a bad idea". It has a very specific meaning.

@diwic
Copy link

diwic commented Oct 13, 2016

@sfackler Fair enough; forget about unsafe - I meant to say that if you want to avoid leaking memory, you likely have more important things to worry about (mem::forget, Rc cycles, etc). And if panicking is something that's going to happen regularly during the lifetime of your program, surely the middle ground panic mode is not for you.

And all I'm asking for is that you give it a thought and evaluate the upsides of my proposal, before just dismissing it because of what you see as one specific downside?

@sfackler
Copy link
Member

In what way are mem::forget, Rc cycles, etc "more important things" than all of the memory associated with a a thread leaking all at once? Is this some kind of absolutist mindset of "unless we can formally verify that literally no memory will leak forever, we may as well leak all of it"?

Adding a new unwinding strategy has costs. Library authors like @nikomatsakis will need to go back and review their code to decide if it is or is not sound in the presence of this new strategy. It is yet another subtle variant of unwinding strategies a user considering their pick will need to think about.

How common is this middle ground use case of not wanting to pay whatever cost landing pads have, but still want to do some amount of cleanup after a panic (that I guess is recognized at the thread join?) before exiting?

@arielb1
Copy link
Contributor

arielb1 commented Oct 13, 2016

@sfackler

Well, you can implement begin_unwind as an infinite loop (either a spinloop or with sleep). No soundness problems then - this can be done in entirely safe code.

@diwic
Copy link

diwic commented Oct 13, 2016

In what way are mem::forget, Rc cycles, etc "more important things" than all of the memory associated with a a thread leaking all at once?

They are more important because they can happen in non-panicking code.

Is this some kind of absolutist mindset of "unless we can formally verify that literally no memory will leak forever, we may as well leak all of it"?

No; it's because panicking happens so rarely, I'm not willing to pay any performance cost in my ordinary non-panicking code.

Adding a new unwinding strategy has costs. Library authors like @nikomatsakis will need to go back and review their code to decide if it is or is not sound in the presence of this new strategy. It is yet another subtle variant of unwinding strategies a user considering their pick will need to think about.

Only if you rely on a specific destructor behaviour to be safe, which the nomicon explicitly says that you cannot do.

(Side note: Thinking more about it, perhaps rayon can replace its current strategy with a catch_unwind instead?)

How common is this middle ground use case of not wanting to pay whatever cost landing pads have, but still want to do some amount of cleanup after a panic (that I guess is recognized at the thread join?) before exiting?

If you're writing a library, to be called by a non-Rust host process, and also want maximum performance, this middle-ground mode is your best option. Because killing a host process is not nice behaviour; you don't know if the host process wants to clean something up.

For my own part, I'd just use it by default everywhere.

@sfackler
Copy link
Member

Leaking memory of your host process until it gets OOM killed is also not nice behavior, and is in some ways worse since it can be far more difficult to track down the cause.

@diwic
Copy link

diwic commented Oct 13, 2016

@sfackler

Leaking memory of your host process until it gets OOM killed is also not nice behavior, and is in some ways worse since it can be far more difficult to track down the cause.

That is unlikely to happen for two reasons:

  1. Because panicking is rare, like it is for most applications - if it is not, then (as stated earlier) the middle-ground option is not for you.

  2. Because you will not silently drop the error at the catch_panic boundary but instead do something wiser with it, that enables someone to track down the issue.

@ticki
Copy link
Contributor Author

ticki commented Oct 13, 2016

Memory leaking is a no-no, and honestly I think that this discussion has gone way out of the actual scope

@diwic
Copy link

diwic commented Oct 13, 2016

@ticki

Memory leaking is a no-no

But panic=unwind can leak memory too (e g in Vec::drain) and panic=abort leaks everything that the OS does not clean up for you. And there are many other ways of memory leaking, even in non-panicking code.

So I disagree with that statement.

@ticki
Copy link
Contributor Author

ticki commented Oct 13, 2016

But panic=unwind can leak memory too (e g in Vec::drain)

First of all, it's pretty rare that'll happen. Second of all, if it does it is a ridiculously small amount. By leaking everything owned in the call stack, you leak potentially enormous amounts of memory.

and panic=abort leaks everything that the OS does not clean up for you.

Like what? Files, resources, virtual memory, poll queues, etc. etc. are all cleared for you.

@diwic
Copy link

diwic commented Oct 14, 2016

@ticki a quick googling came up with this stackoverflow link for a list of things not cleaned up by Windows. Other operating systems may have other lists of things they do not clean up.

@rustonaut
Copy link

I think the RFC should mention that certain parts of std and possible other libraries should not be usable => cause an error on compilation

currently this is mainly catch_unwind but I don't think the compiler should know about a specific function to do this, so why not add an annotation e.g.: #[needs_unwinding].

With this if a library uses a part marked with it and panic=abort or panic=any then the compiler should emit an error. If the library does not specify panic=XX the compiler could emit a warning, and can treat it as if panic=unwind was specified. Optionally cargo could additionally automatically add panic=unwind to the libraries Cargo.toml file

@ticki
Copy link
Contributor Author

ticki commented Oct 14, 2016

@dathinab Using catch_unwind is not necessarily indicative of not being compatible with abort.

@rustonaut
Copy link

@ticki your right using catch_unwind in your library does not mean you can't support panic=aboard.

But, if you specify panic=abort in your library then using catch_unwind is definitely a bug.

Also I would guess that many crates where catch_unwind is used would behave unexpected in some situations if used with panic=abort, mainly because the usage catch_unwind kind of implies that you expect to be able to handle panics gracefully.

The only situation I can think of where catch_unwind (in --lib) + panic=abort (in --bin) makes sense are:

  1. catch_unwind is used by a feature/library part unused by the end-program (--bin)
  2. it is used to get some stability through error kernels or similar but the end-program (--bin) does not rely on this

So yes, erroring on panic=any(in --lib) + catch_unwind (in --lib) and the (optinal) adding of panic=unwind I mentioned are not working out. But it might still be good to:

  • error if a library does specifies panic=abort and uses catch_unwind itself
  • warn if a library does not specifies panic=XX and uses catch_unwind (as well as some attribute to mark catch_unwind and other possible functionality "for unwinding")
  • don't limit this to the std's catch_unwind function but use some annotation to mark catch_unwind and possible other functions (through this could be a implementation detail which could be limited to nightly forever)
    • but don't name the annotation #[needs_unwinding], this was just an example name anyway and would not really capture it's intended meaning, more like: #[some_functionallity_which_only_makes_sense_with_panic_eq_unwinding_without_preventing_the_usage_of_panic_eq_abort] but much shorter ¯_(ツ)_/¯

@ticki
Copy link
Contributor Author

ticki commented Nov 3, 2016

rust-lang/rust#37551 (comment)

https://github.com/redox-os/ralloc/blob/skiplist/src/bk/node.rs#L52

https://github.com/aji/ircd-oxide/blob/master/src/irc/client.rs#L100

I see this pattern all the time. Search for unreachable! on github. You'll find that a significant amount of these could be removed using replace_with.

Wrong thread.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 3, 2016

Cross-linking to another discussion of this RFC and related themes.
https://groups.google.com/a/isocpp.org/forum/?fromgroups#!topic/sg14/k0VWxA52c5M

TLDR:
Side 1: I'd be nice to standardize something similar to panic abort for C++.
Side 2: Exceptions are misunderstood, Rust error handling doesn't work well in practice.

@DemiMarie
Copy link

Debug builds should definitely unwind. Performance is irrelevant here; being able to log a detailed error is.

@nikomatsakis
Copy link
Contributor

@rfcbot fcp close

There hasn't been a lot of activity on this thread lately. I think the basic tradeoffs are well understood, so let me summarize:

  • The RFC proposes to make abort-by-default the default strategy for projects created by cargo new, as opposed to unwinding.
  • This does not affect the semantics of any existing projects.
  • The motivation is:
    • potential space/performance/compile-time benefits
    • helping to ensure people don't abuse catch-unwind
  • However:
    • it's unclear how representative the performance etc benefits really are
    • both controlled unwind and abort are good choices for many applications
      • no obvious best choice, might as well stay with the existing one
    • there are also risks to abort by default
      • e.g. that people will overlook the need for exn safety if they don't realize unwinding is possible

I think ultimately there isn't enough reason to make a change here. Hence I nominate that we close this RFC. Thoughts?

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 5, 2017

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@michaelwoerister
Copy link
Member

@rfcbot reviewed

3 similar comments
@vadimcn
Copy link
Contributor

vadimcn commented Jan 5, 2017

@rfcbot reviewed

@japaric
Copy link
Member

japaric commented Jan 5, 2017

@rfcbot reviewed

@pnkfelix
Copy link
Member

pnkfelix commented Jan 6, 2017

@rfcbot reviewed

@brson
Copy link
Contributor

brson commented Jan 6, 2017

I agree that switching to abort by default isn't something to go for right now.

@aturon
Copy link
Member

aturon commented Jan 6, 2017

The bot appears to be stuck, but everyone has signed off, so:

🔔 This RFC is entering its final comment period with disposition to close. 🔔

See the summary for more details.

@aturon aturon added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jan 6, 2017
@ticki
Copy link
Contributor Author

ticki commented Jan 13, 2017

😢

@aturon
Copy link
Member

aturon commented Jan 13, 2017

The final comment period for this RFC has elapsed. There hasn't been any further discussion since the lang team summary, so I'm going to go ahead and close.

Thanks, @ticki for the RFC!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.