SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

geofft · 2015-06-20T18:10:38Z

fn main() {
    unsafe {*(0 as *mut u32) = 0};
}

playpen: application terminated abnormally with signal 4 (Illegal instruction)

The reason for this is that, when a UNIX signal is being handled, UNIX blocks that signal while the handler is running, by default. However, this particular signal handler tries to re-raise the signal when it's done with processing. This doesn't do anything since the handler is still running, so execution continues with the next line, intrinsics::abort(), which induces SIGILL. (The code tries to reset the signal disposition to avoid re-entering the same handler, but that's not enough; the mask of blocked signals also needs to be cleared.)

There are a couple of possible solutions to this. The straightforward one is to cause the signal not to be masked. I have a patch for this that depends on my signal-FFI-bindings refactoring in #25784; I can submit it as a PR once that lands.

Another one, which I prefer, is to just let the handler terminate instead of re-raising. There was discussion about this previously, where it was noted that glibc's manual says this is undefined for "program error signals" like SIGSEGV and SIGBUS. POSIX agrees with that. However, most platforms in practice define this behavior, and allow you to return from these handlers, so that you can do things like userspace page-fault handling.

For example,

Google's Breakpad crash-handling library returns from SIGSEGV on Linux and on Solaris. (On Darwin they use Mach exceptions.)
The Oracle JVM relies on being able to return from a SIGSEGV handler: they use it as a trick for stop-the-world pauses with low overhead. (Periodically each thread will do a single read from a special page, which is ~one instruction. If the GC wants to stop the world, it'll unmap the page to cause each thread to fault, and once it's ready to continue the world, it'll remap the page and make all threads return from their signal handlers.) So any UNIX platform where the JVM works should support this.
GNU libsigsegv is a library for doing things like userspace paging, and it's predicated on the assumption that returning from SIGSEGV is possible. The PORTING file has a wide list of supported platforms, including Linux, Darwin, FreeBSD, OpenBSD, NetBSD, Solaris, and MinGW.

So I think that it's merely the case that returning from a program error handler is unspecified in POSIX, but just about all actual OSes we care about support it.

This has the advantage that the exact signal is re-delivered to kill the program, with the right siginfo (indicating it died because of a memory error, not because someone manually sent SIGSEGV), so the last frame in a coredump is right, dmesg prints a line, etc. We can keep the re-raise for unknown platforms, but for platforms where we know that returning from the handler works, that seems both simpler and better. (Breakpad takes this approach for the same reasons.)

The final option is just to remove this code. What it does, as far as I see, is to print an error message and die if the segfault was on the guard page, and just die otherwise. I don't think there's a compelling reason to print a message for stack overflow, especially as caught by SIGSEGV (it makes more sense if it's caught by stack probes or morestack). In any other systems language, overflowing the stack just gets you killed with SIGSEGV, and installing a special handler doesn't seem to match the runtime-removal philosophy. It made sense in the librustrt world, but I'd argue it's not useful now. But if other people are finding the handler / the error message useful, that's fine.

Cc @Zoxc for advice, as the original author of the segfault handling code.

The text was updated successfully, but these errors were encountered:

Zoxc · 2015-06-20T18:54:18Z

If we can return from SIGSEGV and SIGBUS on some platforms, I support that option. For other platforms we should reset the mask as you suggest.

I find it very important to be able to quickly identify a stack overflow, as to not raise confusion with way more scary cases of signals. As long as you stick with safe Rust, you should get friendly error messages.

alexcrichton · 2015-06-21T18:13:56Z

I agree with @Zoxc that having the error message is quite nice and I'd prefer to not lose it, but it sounds like the option of just returning from the signal handler on known platforms is a good idea! I like the idea of re-raising the exact same signal.

Thanks for the investigation @geofft, and feel free to submit a PR!

geofft · 2015-07-29T23:23:04Z

All the non-Windows ports currently in mk/cfg support returning from a SIGSEGV handler:

Per Breakpad, Linux, at least i386, x86-64, arm, aarch64, and mips.
Per the JVM, OS X, FreeBSD, OpenBSD, and DragonFlyBSD on x86 and x86-64.
Per Mono, which installs a signal handler to convert NULL pointer dereferences into C# exceptions, iOS. (I think they even stuff a new stack frame on the main stack, for extra awesome.)
Per libsigsegv, Linux on PPC.

(There are some overIaps and some of these sources attest to way more, I just wanted to confirm that at least one source attests to each current Rust target. I don't actually think this is arch-specific on any OS.) So this is safe to do unconditionally, as @alexcrichton has done in PR #27338, and I'm happy to let that PR close this.

Porters to new UNIXes may need to watch for this for correctness. This is also safe on Solaris (per Breakpad and JVM); I think the only non-supporting UNIX that Rust might realistically get ported to is MINIX 3.

alexcrichton · 2015-07-30T00:17:25Z

Awesome, thanks for the investigation @geofft!

This commit removes all morestack support from the compiler which entails: * Segmented stacks are no longer emitted in codegen. * We no longer build or distribute libmorestack.a * The `stack_exhausted` lang item is no longer required The only current use of the segmented stack support in LLVM is to detect stack overflow. This is no longer really required, however, because we already have guard pages for all threads and registered signal handlers watching for a segfault on those pages (to print out a stack overflow message). Additionally, major platforms (aka Windows) already don't use morestack. This means that Rust is by default less likely to catch stack overflows because if a function takes up more than one page of stack space it won't hit the guard page. This is what the purpose of morestack was (to catch this case), but it's better served with stack probes which have more cross platform support and no runtime support necessary. Until LLVM supports this for all platform it looks like morestack isn't really buying us much. cc rust-lang#16012 (still need stack probes) Closes rust-lang#26458 (a drive-by fix to help diagnostics on stack overflow)

@brson

This commit removes all morestack support from the compiler which entails: * Segmented stacks are no longer emitted in codegen. * We no longer build or distribute libmorestack.a * The `stack_exhausted` lang item is no longer required The only current use of the segmented stack support in LLVM is to detect stack overflow. This is no longer really required, however, because we already have guard pages for all threads and registered signal handlers watching for a segfault on those pages (to print out a stack overflow message). Additionally, major platforms (aka Windows) already don't use morestack. This means that Rust is by default less likely to catch stack overflows because if a function takes up more than one page of stack space it won't hit the guard page. This is what the purpose of morestack was (to catch this case), but it's better served with stack probes which have more cross platform support and no runtime support necessary. Until LLVM supports this for all platform it looks like morestack isn't really buying us much. cc #16012 (still need stack probes) Closes #26458 (a drive-by fix to help diagnostics on stack overflow) r? @brson

steveklabnik added the A-libs label Jun 23, 2015

alexcrichton mentioned this issue Jul 27, 2015

Remove morestack support #27338

Merged

bors closed this as completed in #27338 Aug 11, 2015

lambda mentioned this issue Feb 1, 2016

Stack overflow should abort the process normally, not segfault and dump core #31273

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

geofft commented Jun 20, 2015

Zoxc commented Jun 20, 2015

alexcrichton commented Jun 21, 2015

geofft commented Jul 29, 2015

alexcrichton commented Jul 30, 2015

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

Comments

geofft commented Jun 20, 2015

Zoxc commented Jun 20, 2015

alexcrichton commented Jun 21, 2015

geofft commented Jul 29, 2015

alexcrichton commented Jul 30, 2015