Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bevy, on Web and WebGL, doesn't exit winit's Event Loop cleanly after a GPU crash. It panics twice with already borrowed: BorrowMutError #12979

Open
Maximetinu opened this issue Apr 15, 2024 · 9 comments
Labels
A-Windowing Platform-agnostic interface layer to run your app in C-Bug An unexpected or incorrect behavior O-Web Specific to web (WASM) builds O-WebGL2 Specific to the WebGL2 render API P-Crash A sudden unexpected crash

Comments

@Maximetinu
Copy link
Contributor

Maximetinu commented Apr 15, 2024

Bevy version

Latest released Bevy atow, which is 0.13.2

This can be reproduced at the Bevy examples posted on the web

Relevant system information

I am able to reproduce this both with Google Chrome and Firefox, both in a M1 Macbook and a Windows 11 laptop

What you did (repro steps)

image
  • Click on the example canvas to gain focus
  • Press P to panic, as usual
  • You will see an error like this. This is normal and expected, it's a deliberated panic, part of the example:
image

Now, let's reproduce the actual issue

image

Unexpected behaviour comes here.

  • Instead of showing only 1 panic: the panic that occurred
  • FAIL: It shows an additional BorrowMutError panic. See:
image

wasm_example.js:502 panicked at [...]/winit-0.29.15/src/platform_impl/web/event_loop/runner.rs:625:30: already borrowed: BorrowMutError

What went wrong

- what were you expecting?
I was expecting to see a single crash. This is needed in production environments to uniquely report crashes for analytics and error tracking, and avoiding redundancies.

- what actually happened?
Winit seems to be exiting the event loop in a wrong way, causing yet another panic, which is then handled by the console_error_panic_hook and logged as well

Additional information

  • As an alternative to simulating a GPU crash by calling the WebGL2 extension, in Google Chrome, we can also write chrome://gpucrash/ in the address bar and press enter to simulate it at a browser level. Same happens
  • This can't be reproduced on the WebGPU example because WebGPU doesn't crash upon a GPU context loss
  • I'm opening the issue here instead of on Winit's repo because, since Winit web examples don't use WebGL nor graphics, I couldn't reproduce this on them.
  • Update: I've tried setting up a minimal Winit loop example that crases when losing the WebGL context, and I haven't been able to reproduce this, which suggests it may be caused by Bevy, or by how Bevy sets up the Winit's loop?
@Maximetinu Maximetinu added C-Bug An unexpected or incorrect behavior S-Needs-Triage This issue needs to be labelled labels Apr 15, 2024
@daxpedda
Copy link
Contributor

Without diving to deep into this, my assumption is that this is a re-entrancy issue, ergo after a panic you should not re-enter the Wasm module. See rust-lang/rust#117344.

Please let me know if I got this wrong!

@Maximetinu
Copy link
Contributor Author

Yes, that's exactly what this is. In native it's impossible that a program panics twice. However, in this case something (the event loop? Some browser's animation frame call back?) is calling the WASM module after it has originally crashed.

It is a bit random, and can be influenced by the browser tab being on focus or not. In fact, I have seen the 2nd BorrowMutError panic happen a considerable amount of time after the original panic occurred.

Thankfully, despite of its randomness, I was able to find reliable repro steps to reproduce it with this Bevy example, hence I decided to open the issue to share it.

@daxpedda
Copy link
Contributor

daxpedda commented Apr 16, 2024

Just to clarify: this is not something that can be fixed by Bevy or Winit, at least not properly.

This is a Rust issue (rust-lang/rust#117344) and should be fixed in Rustc itself. It could also be fixed in wasm-bindgen (rustwasm/wasm-bindgen#3687), but there are no current plans to do so because it has to be solved in Rustc (because not all wasm32-unknown-unknown users use wasm-bindgen) either way.

@Maximetinu
Copy link
Contributor Author

Just to clarify: this is not something that can be fixed by Bevy or Winit, at least not properly.

Yep indeed, thx!

And, as an update for Bevy side, I was able to reproduce this problem on Winit's minimal web example with this code, which at least discards Bevy as being the culprit.

Because of that, I guess this issue can be closed as not planned to wait for Rustc to fix this in their side. I'll leave that to Bevy maintainers in any case.

@daxpedda, for the same reason, I guess it is not necessary to open the same issue to Winit with these steps to reproduce (the repro steps in the code linked above), right? Let me know if you think otherwise 🙂

@Maximetinu
Copy link
Contributor Author

Maximetinu commented Apr 17, 2024

Hey @daxpedda, can I make you a question about this?

If I understand it correctly, what is happening here is that the Rust's WASM module panics, but later on "something" calls into it again.

This "something" is, in this case, the browser. Winit's evt loop subscribes to a callback from web_sys, and the problem is that, when the app panics, it never has the chance to unsubscribe from it, right? So, when that callback is called next, after the panic, it panics again because of this BorrowMutError (or, in other words, it throws an RuntimeError: unreachable exception).

So, my question is, if I'm correct about this, shouldn't this be handled by Winit, by setting up a panic hook that unsubscribes from this callback?

@daxpedda
Copy link
Contributor

daxpedda commented Apr 18, 2024

Your description of the problem is correct.

So, my question is, if I'm correct about this, shouldn't this be handled by Winit, by setting up a panic hook that unsubscribes from this callback?

Panic hooks can only be set by the application, not a library.

For arguments sake, Winit could find a way to unhook all events when at least panicking inside of Winit itself, though this would be difficult and introduce significant code complexity. It doesn't really solve the problem, if the user panics or any other library, it would hit the same issue again. Bevy itself uses other libraries that hook up into events, e.g. Wgpu. If you are using Egui you would also hit similar problems.

It is safe to say that any reasonable solution to this problem has to live in either Rustc or wasm-bindgen.
In the meantime an effective solution has to live in JS and therefor deployed by the user and can't be handled by a Rust library. Alternatively users can set up some sort of application shutdown inside their panic handler when running in the browser, this is something Bevy might want to look into (because I think it provides ways to overwrite the panic handler already?).

@Maximetinu
Copy link
Contributor Author

Panic hooks can only be set by the application, not a library.

You sure? Bevy set ups its own one here:

#[cfg(target_arch = "wasm32")]
{
console_error_panic_hook::set_once();
finished_subscriber = subscriber.with(tracing_wasm::WASMLayer::new(
tracing_wasm::WASMLayerConfig::default(),
));
}

I don't know what happens when user code set ups its own. Is it overriden? They can be chained with update_hook or take_hook

though this would be difficult and introduce significant code complexity

totally true, I gave a quick try to this myself and I wasn't able

if the user panics or any other library, it would hit the same issue again

Oh, really? I didn't know that, I thought a installed panic hook was called for any code panicking in that thread, like being totally global, not per library.

It is safe to say that any reasonable solution to this problem has to live in either Rustc or wasm-bindgen

Yep. There is something I don't still get about that core solution: if the WASM module get poisoned after a panic so that it cannot be accessed further in any way, what will happen when a browser callback calls into it like it's currently happening? Will it throw an exception? If that's the case, from the JS point of view, won't it be the same, just with a different type of exception?

I found this stack overflow answer to what I think is the same issue, but it sounds too easy to be true, just using the parking_lot crate, that Winit already stopped using, I guess for a good reason:

@daxpedda
Copy link
Contributor

daxpedda commented Apr 18, 2024

Panic hooks can only be set by the application, not a library.

You sure? Bevy set ups its own one here:

Apologies, I meant "should".

While it is true that Winit could "update" the panic hook, instead of overwriting it, it would be difficult to remove it again after Winit is finished. If Winit does this, other libraries might want to do the same, registering multiple competing panic hooks which would be difficult to remove/untangle again.

Among other reasons, e.g. users should ultimately be in control of what happens in case of a panic, it is not recommended to set the panic hook in libraries (though I couldn't find that in the Rust API guidelines).

if the user panics or any other library, it would hit the same issue again

Oh, really? I didn't know that, I thought a installed panic hook was called for any code panicking in that thread, like being totally global, not per library.

I meant versus trying to unhook all event listeners when panicking inside of Winit.

It is safe to say that any reasonable solution to this problem has to live in either Rustc or wasm-bindgen

Yep. There is something I don't still get about that core solution: if the WASM module get poisoned after a panic so that it cannot be accessed further in any way, what will happen when a browser callback calls into it like it's currently happening? Will it throw an exception? If that's the case, from the JS point of view, won't it be the same, just with a different type of exception?

It would trap, which in JS land is exposed as an exception. I think you mean to say that this isn't ideal, which I agree to. This will probably only be totally solved when Rust Wasm moves to the component model 🤞, which supports module poisoning.

I found this stack overflow answer to what I think is the same issue, but it sounds too easy to be true, just using the parking_lot crate, that Winit already stopped using, I guess for a good reason:

I didn't look too closely at this, but I think this was specifically talking about parking_lot types not having a poison state, therefore they remain usable after panicking. This is not the issue we are facing here, because we lack unwinding entirely, the Mutex (or in this case the RefCell) isn't being unlocked in the first place.

@Maximetinu
Copy link
Contributor Author

ook gotcha, all clear now, thx! 🙂🙏

@NthTensor NthTensor added A-Windowing Platform-agnostic interface layer to run your app in P-Crash A sudden unexpected crash O-Web Specific to web (WASM) builds O-WebGL2 Specific to the WebGL2 render API and removed S-Needs-Triage This issue needs to be labelled labels Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Windowing Platform-agnostic interface layer to run your app in C-Bug An unexpected or incorrect behavior O-Web Specific to web (WASM) builds O-WebGL2 Specific to the WebGL2 render API P-Crash A sudden unexpected crash
Projects
None yet
Development

No branches or pull requests

3 participants