longjmp in Lua producing unexpected behavior in Emscripten 2.0.11 #13166

Open
MCJack123 opened this issue Jan 2, 2021 · 6 comments

@MCJack123

I've been working on updating the WebAssembly version of an app I develop to the latest release, and I've run into an odd bug after updating to Emscripten 2.0.11. When calling longjmp to yield a coroutine in Lua, execution resumes not at the setjmp as expected, but right after the call to the function (chain) in which the longjmp was triggered. Here's the code that's being run (with a bunch of print statements for debugging):

void luaD_throw (lua_State *L, int errcode) {
  struct lua_longjmp *lj = L->errorJmp;
  if ((errcode >= LUA_ERRRUN && errcode <= LUA_ERREXC) && L->hookmask & LUA_MASKERROR)
    luaD_callhook(L, LUA_HOOKERROR, -1);
  if (lj) {
    if (errcode == LUA_ERRRUN)
      errcode = call_errfunc(L);
    printf("throw %d %p\n", errcode, lj);
    showStackTrace();
    lj->status = errcode;
    LUAI_THROW(L, lj); // macro for longjmp(lj->b, 1)
  }
  else {
    L->status = cast_byte(errcode);
    if (G(L)->panic) {
      resetstack(L, errcode);
      lua_unlock(L);
      G(L)->panic(L);
    }
    exit(EXIT_FAILURE);
  }
}


int luaD_rawrunprotected (lua_State *L, Pfunc f, void *ud) {
  struct lua_longjmp lj;
  lj.status = 0;
  lj.previous = L->errorJmp;  /* chain new error handler */
  L->errorJmp = &lj;
  int status = setjmp((&lj)->b);
  printf("rawrunprotected_setjmp %d\n", status);
  if (status == 0) {
    printf("rawrunprotected starting\n");
    lj.status = (*f)(L, ud);
    // BUG: after longjmp, execution resumes here instead of at the setjmp above
    printf("rawrunprotected ok\n");
  } else printf("longjmp triggered\n");
  showStackTrace();
  printf("rawrunprotected %d %p\n", lj.status, &lj);
  L->errorJmp = lj.previous;  /* restore old error handler */
  return lj.status;
}

When this runs, I get output in the console similar to this:

throw 1 0xab9e60
rawrunprotected ok
rawrunprotected 0 0xab9e60

I'd expect execution to resume at the setjmp call (returning a second time, with a value of 1 in this case), but instead it continues right after the call to the protected function, as if that function had returned normally.
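For reference, the expected control flow can be reduced to a few lines of standard C. This is only a sketch of the behavior described above, not code from the Lua sources: `env`, `thrower`, and `second_setjmp_result` are hypothetical names, and `setjmp` is used directly as the controlling expression of a `switch`, which is one of the contexts the C standard permits.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical jump buffer, standing in for lj->b in the Lua code. */
static jmp_buf env;

/* Stand-in for luaD_throw: never returns normally. */
static void thrower(void) {
  longjmp(env, 1);
}

/*
 * Returns the value setjmp yields on its second return.
 * Returns -1 if control instead continues after the call to thrower(),
 * which is the misbehavior reported in this issue.
 */
int second_setjmp_result(void) {
  switch (setjmp(env)) {
    case 0:
      /* First return from setjmp: run the code that will longjmp. */
      thrower();
      return -1; /* only reachable if longjmp did not resume at setjmp */
    case 1:
      /* Second return from setjmp, triggered by longjmp(env, 1). */
      return 1;
    default:
      return -2; /* any other value would also be unexpected */
  }
}
```

On a conforming implementation `second_setjmp_result()` should return 1. A return of -1 would correspond to the "rawrunprotected ok" path seen in the output above.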

I'm not sure why this is happening at all. The last version I tested this with was 1.40.1, which worked just fine (it's currently in production as well). I don't really have much time to try to bisect this issue right now, but I might be able to check later.

The code I'm using (minus the print statements) is available at MCJack123/craftos2-lua. It's based on Lua 5.1.5, with a number of patches including a fix for yielding across C-calls. I've also attached the built Lua library (with prints) if desired: liblua.a.gz

I'm running Emscripten 2.0.11 on macOS 11.0 x64, using emsdk to download the tools.

@sbc100
Collaborator

sbc100 commented Jan 2, 2021

When you had this working with 1.40.1, do you know if you were using fastcomp or the LLVM backend? Can you confirm by trying with 1.40.1-fastcomp and then with 1.40.1-upstream? It could be that this code just never worked with the upstream LLVM backend for some reason.

@MCJack123
Author

MCJack123 commented Jan 2, 2021

It works just fine on both 1.40.1-upstream and 1.40.1-fastcomp:

throw 1 0xab9d40
rawrunprotected_setjmp 1
longjmp triggered
rawrunprotected 1 0xab9d40

I believe I was using LLVM before as well. I remember getting notices about fastcomp and I think I switched to LLVM then. Shouldn't really affect the outcome of this issue, as both fastcomp & LLVM work correctly on 1.40.1.

I'm going to continue testing on each version until the bug appears again.

@MCJack123
Author

I've tracked down the issue's introduction to 2.0.1. I don't see any commits related to setjmp/longjmp between 2.0.0 and 2.0.1, but I'll keep looking.

@MCJack123
Author

After a lot more testing, I've come to these conclusions:

  1. 2.0.1's failure appears to have been a red herring. I can't reproduce the issue on 2.0.1 anymore. Not sure why it appeared before, but it isn't happening now. Maybe I mistyped 2.0.11 instead of 2.0.1?
  2. Versions 2.0.4 and below work as intended.
  3. Versions 2.0.5 - 2.0.8 are causing my code to segfault a lot in various places, crashing the application and not letting me test the issue.
  4. Versions 2.0.9+ no longer segfault, but the longjmp issue now appears.

I've also looked through the changelog, and I wonder if #12056 in 2.0.5 may have caused or otherwise influenced this issue? I currently can't test between 2.0.4 and 2.0.5 due to the aforementioned segfaults, but I might be able to put together a proper minimal reproduction example to fully test this later.

@sbc100
Collaborator

sbc100 commented Jan 3, 2021

Thanks for investigating. If you could come up with a minimal test case, that would be great.
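One possible starting point for such a test case is sketched below. It is an assumption about what a reduction might look like, not something taken from craftos2-lua: it mirrors the handler chain in `luaD_rawrunprotected` and `luaD_throw` without any Lua state, and every name in it (`struct handler`, `throw_code`, `run_protected`, `nested_status`, `normal_returns`) is hypothetical. Whether it actually triggers the problem on the affected Emscripten versions is untested.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical mirror of Lua's struct lua_longjmp: a chain of handlers. */
struct handler {
  struct handler *previous;
  jmp_buf b;
  volatile int status; /* written between setjmp and longjmp */
};

typedef void (*protected_fn)(void *ud);

static struct handler *top_handler = NULL;
static int normal_returns = 0; /* counts bodies that returned without jumping */

/* Mirrors luaD_throw: record the code, then jump to the innermost handler. */
static void throw_code(int code) {
  assert(top_handler != NULL);
  top_handler->status = code;
  longjmp(top_handler->b, 1);
}

/* Mirrors luaD_rawrunprotected: push a handler, run f, pop the handler. */
static int run_protected(protected_fn f, void *ud) {
  struct handler h;
  h.status = 0;
  h.previous = top_handler; /* chain new handler */
  top_handler = &h;
  if (setjmp(h.b) == 0) {
    f(ud);
    normal_returns++; /* every body below throws, so this should not run */
  }
  top_handler = h.previous; /* restore old handler */
  return h.status;
}

/* Inner body: throws the code passed through ud. */
static void inner_body(void *ud) {
  throw_code(*(int *)ud);
}

/* Outer body: runs a nested protected call, then throws a combined code. */
static void outer_body(void *ud) {
  int inner_code = 2;
  int inner_status = run_protected(inner_body, &inner_code);
  throw_code(inner_status + *(int *)ud);
}

/* Runs the nested case and returns the status seen by the outer handler. */
int nested_status(void) {
  int outer_code = 10;
  return run_protected(outer_body, &outer_code);
}
```

Calling `nested_status()` from a small `main` and building it with `emcc` under each Emscripten version would show whether both levels resume at their own `setjmp`. A nonzero `normal_returns` afterwards would indicate that control continued after a body's call instead, matching the symptom reported here.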

@aheejin
Member

aheejin commented Oct 26, 2021

Does this still happen at ToT (tip of tree)? Have you checked with #12056 reverted?
