
Segfault due to stack size limit when recursing? #30

Closed
critiqjo opened this issue Jan 30, 2016 · 22 comments

Comments

@critiqjo

use coio::Scheduler;

fn crazy_recurse(d: usize) -> usize {
    if d == 0 { d }
    else { 1 + crazy_recurse(d - 1) }
}

fn main() {
    println!("Heil Hydra {}", crazy_recurse(16000));
    Scheduler::new().with_workers(1)
        .run(|| {
            for i in 0..4 {
                Scheduler::spawn(move || {
                    println!("Heil Hydra {}", crazy_recurse(1964)); //1963 works fine!
                });
            }
        }).unwrap();
}

The issue only occurs in debug builds, probably because the recursion gets optimized out in release builds. If this can't be resolved quickly, I think it is worth a mention in the readme.

@lhecker
Collaborator

lhecker commented Jan 30, 2016

The default stack size of new coroutines is 128KB. This means that you're using about 64 bytes per recursion.
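(Concretely: 128 KiB = 131072 bytes, and 131072 / 1964 ≈ 67 bytes per frame, which lines up with your observation that 1963 recursions succeed and 1964 overflow.)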

BUT: You can optionally specify the stack size of newly spawned coroutines. Just replace your call to spawn(func) with spawn_opts(func, opts), where opts is a coio::Options instance.
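For illustration, that might look roughly like this (spawn_opts and coio::Options are the names given above; the Options::new().stack_size(..) builder chain is my guess at the API, not confirmed):

use coio::{Options, Scheduler};

// Hypothetical sketch: give this coroutine a 1 MiB stack instead of the
// 128 KB default. The exact constructor/setter names are assumptions.
Scheduler::spawn_opts(move || {
    println!("Heil Hydra {}", crazy_recurse(16000));
}, Options::new().stack_size(1024 * 1024));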

If you have any idea how to improve this API you can mention it here. 😃
It might take some time until we implement it, though, since I'd consider this low-priority (PRs are always welcome).

@critiqjo
Author

A quick resolution could simply be emitting a panic instead of a segfault. Resizing the stack at runtime seems hard.

@critiqjo
Author

I don't know if it can be "quick", but it should be "easier", right? A segfault is really bad!

@lhecker
Collaborator

lhecker commented Jan 30, 2016

You do realize that not even plain Rust itself panics on a stack overflow, right? In practice it simply does not happen because your stack is usually so large that it will never overflow. So... I think it's unlikely that we will ever implement it.

It might be possible though by using some kind of guard page that triggers an interrupt when it's accessed, or something like that. But I've never done anything like that before and wouldn't really know where to start.

What is possible though is to have growing stacks on x64 by using lazily allocated pages. But that is also something I've never done before, and it's not exactly an "easy" thing to achieve, so it would be low-priority too.
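For the record, the usual shape of that guard-page-plus-signal idea is sketched below (Linux-flavored, using the libc crate; purely illustrative, not something coio does today):

use std::{mem, ptr};
use libc::{c_int, c_void, malloc, sigaction, sigaltstack, siginfo_t, stack_t,
           SA_ONSTACK, SA_SIGINFO, SIGSEGV, SIGSTKSZ};

extern "C" fn on_overflow(_: c_int, _: *mut siginfo_t, _: *mut c_void) {
    // The overflowed stack is unusable, so we run on the alternate stack
    // installed below. Only async-signal-safe calls are allowed here.
    let msg = b"stack overflow (guard page hit)\n";
    unsafe { libc::write(2, msg.as_ptr() as *const c_void, msg.len()); }
    std::process::abort();
}

fn install_overflow_handler() {
    unsafe {
        // Give SIGSEGV its own stack, otherwise the handler itself would
        // fault while pushing its frame onto the exhausted stack.
        let alt = stack_t {
            ss_sp: malloc(SIGSTKSZ),
            ss_flags: 0,
            ss_size: SIGSTKSZ,
        };
        sigaltstack(&alt, ptr::null_mut());

        let mut sa: sigaction = mem::zeroed();
        sa.sa_sigaction = on_overflow as usize;
        sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
        sigaction(SIGSEGV, &sa, ptr::null_mut());
    }
}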

@critiqjo
Author

I see... By lazily allocating, do you mean: issuing a very large malloc assuming that the physical pages will be mapped lazily? Or is it something else?

@lhecker
Collaborator

lhecker commented Jan 30, 2016

I've only remotely heard about this before, but AFAIR: yes, by using a large allocation and letting the kernel map in physical pages lazily (on first-touch page faults).

@zonyitoo
Owner

It is impossible to panic when you trigger a stack overflow. Currently, Coio uses a protected page at the end of the allocated stack space. When you access that page, the OS kills the program for the invalid access, which is why you get the "unknown error occur" message in your program.

I personally don't want to support segmented stacks or stack reallocation like the latest Go does, because it just adds complexity to the runtime. If you want to run your crazy recursion function, you can increase the stack size like this:

use coio::Builder;

Builder::new().stack_size(whatever_stack_size_you_want).spawn(|| { ... });

And yes, the memory allocation is lazy. The OS will allocate a page only when you actually touch it.
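For the curious, the mechanism looks roughly like this (a minimal sketch using the libc crate; illustrative only, not coio's actual code):

use std::ptr;
use libc::{mmap, mprotect, MAP_ANON, MAP_FAILED, MAP_PRIVATE,
           PROT_NONE, PROT_READ, PROT_WRITE};

const PAGE: usize = 4096;

// Reserve `size` bytes of stack plus one guard page at the low end. The
// mapping is lazy: the OS commits a physical page only on first touch.
unsafe fn alloc_stack(size: usize) -> *mut u8 {
    let total = size + PAGE;
    let base = mmap(ptr::null_mut(), total,
                    PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANON, -1, 0);
    assert!(base != MAP_FAILED, "mmap failed");
    // Stacks grow downward, so protect the lowest page; touching it kills
    // the process (SIGSEGV/SIGBUS) instead of silently corrupting memory.
    assert_eq!(mprotect(base, PAGE, PROT_NONE), 0);
    (base as *mut u8).offset(total as isize) // high end = initial stack top
}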

@critiqjo
Author

I personally don't want to support segmented stacks or stack reallocation like the latest Go does, because it just adds complexity to the runtime.

Fair enough... Just a suggestion that it might be good to add a "Limitations" section in the Readme.

Closing.

@critiqjo
Author

A draft:

Limitations

  • Stack size per coroutine is limited to 128 KiB by default; you can change this by using spawn_opts or a custom Builder.
  • The last page of the stack is "protected": hitting it triggers an invalid access and terminates the process. (This is not guaranteed when using FFI ... [reasons]??)

@zonyitoo
Owner

Let me explain why I don't want to add segmented stacks or continuous stacks:

  1. Segmented stacks clearly have a huge performance issue. Imagine a loop right at the boundary of the current stack: a new stack segment would be created and destroyed on every iteration (see the sketch after this list).
  2. Continuous stacks are bad for FFI calls. It is very hard to do right without help from the compiler, and many Rust libraries use FFI to call C libraries, so it is not an option.
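To make point 1 concrete, this is the kind of code that hits the worst case, assuming a hypothetical segmented-stack runtime (which neither Rust nor coio has):

// Hypothetical "hot split": if the current segment has just under leaf's
// frame size left, every call grows a new segment and every return frees
// it, so the tight loop degenerates into a loop of stack allocations.
fn leaf(x: u64) -> u64 {
    let buf = [x; 64]; // a 512-byte frame, enough to cross the boundary
    buf.iter().sum()
}

fn hot_loop() -> u64 {
    (0u64..1_000_000).map(leaf).sum()
}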

You could open another issue for adding a limitations section to the README, or just open a PR; I am happy to merge it.

@zonyitoo
Owner

This is not guaranteed when using FFI

No, it is guaranteed. The process will be killed by the OS when you access the last page (4 KiB on most OSes).

@critiqjo
Author

No, it is guaranteed.

Are these comments outdated, then?

@zonyitoo
Owner

Ah, I noticed a possible situation. The last page is 4 KiB. What if a function call eventually skips that page and accesses the pages beyond it? For example, you allocate a huge array on the stack but don't use it, and it happens to cover the whole guard page. Then other local variables may end up beyond the guard page.

@zonyitoo
Owner

This could be added to that comment. I nearly forgot about it.

@critiqjo
Author

For example, you allocate a huge array on the stack but don't use it, and it happens to cover the whole guard page.

Is it limited to FFI, or can it happen in safe Rust too?

@zonyitoo
Owner

Nope. You can never use uninitialized memory in safe Rust. So it only affects unsafe Rust (including FFI).

@critiqjo
Author

You can never use uninitialized memory in safe Rust.

Did you mean: you can never have allocated but uninitialized stack regions in safe Rust?

Yay, got "an unknown error" here ✌️ :

use coio::Scheduler;

fn print_test() -> usize {
    println!("Hello!");
    return 128;
}

fn main() {
    Scheduler::new()
        .with_workers(1)
        .run(|| {
            let x = print_test();
            if x != 128 {
                let a = [0usize; 16000]; // thought this would be uninitialized
                                         // until the program counter reached here
                println!("{}", a[8000]); // guess not!
            }
        }).unwrap();
}

@zonyitoo
Owner

The expression [0usize; 16000] is equivalent to a memset of the allocated memory to 0 in C. So it is initialized, and the program will hit the protected page.

This is the unsafe Rust version of your program:

use std::mem;

use coio::Scheduler;

fn print_test() -> usize {
    println!("Hello!");
    return 128;
}

fn main() {
    Scheduler::new()
        .with_workers(1)
        .run(|| {
            let x = print_test();
            if x != 128 {
                let a: [usize; 16000] = unsafe { mem::uninitialized() }; // Allocate without initialization
                println!("{}", a[8000]);
            }
        }).unwrap();
}

This should work, but it may be killed by the OS somehow.

On my laptop, it reports SIGBUS.

@zonyitoo
Owner

I got the emitted LLVM IR:

; Function Attrs: inlinehint uwtable
define internal void @"_ZN4main16_$LT$closure$GT$12closure.7156E"(%closure) unnamed_addr #9 {
entry-block:
  %x = alloca i64
  %_a = alloca [16000 x i64]
  %1 = call i64 @_ZN10print_test20hddcb26cfc9bc004ahaaE(), !dbg !12035
  store i64 %1, i64* %x, align 8, !dbg !12035
  call void @llvm.dbg.declare(metadata i64* %x, metadata !12038, metadata !6391), !dbg !12035
  %2 = load i64, i64* %x, align 8, !dbg !12039
  %3 = icmp ne i64 %2, 128, !dbg !12039
  br i1 %3, label %then-block-61-, label %next-block, !dbg !12039

then-block-61-:                                   ; preds = %entry-block
  call void @_ZN3mem13uninitialized21h12808030131495532484E([16000 x i64]* noalias nocapture sret dereferenceable(128000) %_a), !dbg !12040
  call void @llvm.dbg.declare(metadata [16000 x i64]* %_a, metadata !12043, metadata !6391), !dbg !12044
  br label %next-block

next-block:                                       ; preds = %entry-block, %then-block-61-
  ret void, !dbg !12045
}

It seems it is still doing something with the allocated %_a. Note the alloca [16000 x i64] in the entry block: the whole 128000-byte frame is reserved up front, before the branch is even evaluated, which is presumably what trips the guard page and actually triggered the SIGBUS.

@critiqjo
Author

I got the emitted LLVM IR:
... too many symbols here ...
... actually triggered SIGBUS.

I see... (well, not really 😄)

I will open a PR with a "Limitations" section within 3 days (hopefully)...

@critiqjo
Author

critiqjo commented Feb 4, 2016

rust-lang/rust#31273 might be of interest.

@zonyitoo
Owner

zonyitoo commented Feb 4, 2016

Good idea. But they are going to implement it in the signal handler, right? Then we cannot define it ourselves.
