Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async terminateVat has surprising synchronous expectations from callers #10073

Open
mhofman opened this issue Sep 11, 2024 · 0 comments
Open
Labels
enhancement New feature or request

Comments

@mhofman
Copy link
Member

mhofman commented Sep 11, 2024

What is the Problem Being Solved?

Figuring it out was an interesting dive... vat-vat-admin.js terminateWithFailure ends with a synchronous call to D(vatAdminDev).terminateWithFailure, which through further synchronous calls in device-vat-admin.jskernel/deviceManager.jskernel.jskernelSyscall.jsvat-admin-hooks.js
invokes void terminateVat(vatID, true, marshalledReason), which means that only the synchronous prelude of terminateVat has an effect before the result promise of E(vatAdminNode).terminateWithFailure(reason) settles and enqueues notifies.

Before this PR, kernelKeeper.removeVatFromSwingStoreExports was synchronous and therefore could not impede the following step's rejection of promises yet to be decided by the terminating vat (which therefore preceded fulfillment of the terminateWithFailure result promise).

But now that this PR is making removeVatFromSwingStoreExports async, a naïve await thereof would end the synchronous prelude before dead promise rejection, which is what caused the assertion in terminate.test.js to fail. Deferring the await restores the expected ordering, although we may want to move it even further down to avoid bumping following logic out of the synchronous prelude, e.g.:

    if (critical) {
      // The following error construction is a bit awkward, but (1) it saves us
      // from doing unmarshaling while in the kernel, while (2) it protects
      // against the info not actually encoding an error, but (3) it still
      // provides some diagnostic information back to the host, and (4) this
      // should happen rarely enough that if you have to do a little bit of
      // extra interpretive work on the receiving end to diagnose the problem,
      // it's going to be a small cost compared to the trouble you're probably
      // already in anyway if this happens.
      panic(`critical vat ${vatID} failed`, Error(info.body));
    } else if (vatAdminRootKref) {
      // static vat termination can happen before vat admin vat exists
      notifyTermination(
        vatID,
        vatAdminRootKref,
        shouldReject,
        info,
        queueToKref,
      );
    } else {
      console.log(
        `warning: vat ${vatID} terminated without a vatAdmin to report to`,
      );
    }

    // worker, if present, needs to be stopped (note that this only applies to ephemeral vats)
    deferred.push(() => vatWarehouse.stopWorker(vatID));

    await Promise.all(deferred);

Originally posted by @gibson042 in #10055 (comment)

The problem really is that we're voiding the result of terminateVat, meaning that we also ignore any errors that would happen during that call (unhandled rejection). This was introduced in bd93950 along with the following note:

      // TODO: terminateVat is async, result doesn't fire until worker
      // is dead. To fix this we'll probably need to move termination
      // to a run-queue ['terminate-vat', vatID] event, like createVat

Description of the Design

Do what is described in the TODO, or at least change terminateVat to no longer be an async function, but instead return a promise for the completion of the async parts. That latter part wasn't done in #10055 as it would have cause errors triggered in terminateVat to throw instead of rejecting the result promise, and the consequences are somewhat unclear (this is also a form of releasing zalgo)

Security Considerations

When changing the termination behavior, we run the risk of making some errors synchronous, preventing some forward progress. It might however be for the best as these errors might have previously been ignored.

Scaling Considerations

None

Test Plan

Both termination paths must be tested: vatAdmin triggered, and controller triggered, as well as attempting to trigger some of the error cases of vat termination to verify the handling behavior.

Upgrade Considerations

This behavior is part of consensus, and as such requires a chain software upgrade. It potentially might require a vat-admin upgrade as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant