Ergonomic way to move data between workers #10078

Open
jakearchibald opened this issue Jan 19, 2024 · 9 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: workers

Comments

@jakearchibald
Contributor

What problem are you trying to solve?

Right now, call-and-response communications with a worker are pretty cumbersome. Complex libraries have been created to try and make this easier.

What solutions exist today?

Message ports, and utilities like comlink.
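
For context, this is roughly the boilerplate the call-and-response pattern involves today. A minimal sketch; the file names and the callWorker helper are illustrative, not part of any proposal:

// main.js
let nextId = 0;
const pending = new Map();
const worker = new Worker('./worker.js', { type: 'module' });

worker.addEventListener('message', ({ data: { id, result } }) => {
  pending.get(id)?.(result);
  pending.delete(id);
});

function callWorker(method, ...args) {
  return new Promise((resolve) => {
    const id = nextId++;
    pending.set(id, resolve);
    worker.postMessage({ id, method, args });
  });
}

// worker.js needs the matching half:
// const methods = { createNumbersArray: (length) => Array.from({ length }, (_, i) => i) };
// addEventListener('message', async ({ data: { id, method, args } }) => {
//   postMessage({ id, result: await methods[method](...args) });
// });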

How would you solve it?

This builds on the blank worker proposal.

ShadowRealm is now stage 3, and I think it has ideas we could borrow.

const worker = new Worker('about:blank', { type: 'module' });

// Import into worker:
await worker.importValue(specifier);

// Import and get export:
const value = await worker.importValue(specifier, exportName);

importValue would throw if the worker is not type: 'module'.

The export can be anything structured-cloneable, but it can also be, or include, functions.

When functions are called, the args are cloned and the function in the worker is called with those args. The return value is cloned and used to resolve the promise returned on the caller's side.

For example:

worker-utils.js

export function createNumbersArray(length) {
  return Array.from({ length }, (_, i) => i);
}

index.js

const worker = new Worker('about:blank', { type: 'module' });
const createNumbersArray = await worker.importValue('./worker-utils.js', 'createNumbersArray');

const numbersArray = await createNumbersArray(3);
// [0, 1, 2]

Anything else?

It'd be nice if certain values could be marked as "transferable" rather than cloneable. Transferable streams would benefit from this too.

@jakearchibald jakearchibald added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Jan 19, 2024
@guybedford
Contributor

guybedford commented Jan 19, 2024

Very interesting, would be great to see something like this. While having this shape of API for the blank worker use case makes a lot of sense, for the use case where the worker itself has a "top-level" module definition, it might also make sense to simplify the interface further to some kind of top-level exports use:

const worker = new Worker('./mod.mjs', { type: 'module' });
const { createNumbersArray, anotherExport } = worker.getExports(['createNumbersArray', 'anotherExport']);

@asutherland

The dynamic behavior allowed by importValue()'s exportName argument makes sense in the context of shadow realms which are heavy on eval-semantics already, but I am concerned it would lend itself to a style that makes it harder for static analysis for security purposes, like for auditing WebExtensions where there's definitely problems with bad actors trying to do tricky things via obfuscated dynamism. Like, it's fine if there's always a string literal there, but the API shape feels like it would be just as idiomatic to use a variable, which opens up all kinds of avenues of dynamism and means code auditors would potentially have to re-litigate the exact same debate about how submitted code should use the API. In particular, I can imagine developers wanting to use a for loop there.

I feel like there had previously been discussion about enabling something like await worker.remoteImport('the-script.mjs') that could have similar semantics to import(), in particular returning a module namespace object? Are there major spec complexities preventing a solution like that? While this of course still allows nefarious dynamic behavior, it's already the identical problem to import() and allows consistent policy enforcement, like accessing the module via remoteModule.createNumbersArray().
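
To make that shape concrete, a rough sketch of the usage, with remoteImport being hypothetical and not specified anywhere:

const worker = new Worker('./worker.js', { type: 'module' });

// Hypothetical: resolves to a namespace-like object whose members proxy calls into the worker.
const remoteModule = await worker.remoteImport('./worker-utils.js');

const numbersArray = await remoteModule.createNumbersArray(3);
// [0, 1, 2]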

The downside with this alternative is of course that it would potentially do significantly more work than required if the imported script has more exported functions than the caller wants to call. Although it's obviously possible with modules to export only what is actually meant to be proxied, and this approach could arguably benefit static-analysis-based auditors, since the performance cost of wasted exports would encourage authors to limit their number of exports.

@Jamesernator

The dynamic behavior allowed by importValue()'s exportName argument makes sense in the context of shadow realms which are heavy on eval-semantics already, but I am concerned it would lend itself to a style that makes it harder for static analysis for security purposes, like for auditing WebExtensions where there's definitely problems with bad actors trying to do tricky things via obfuscated dynamism.

This is a wider concern with string specifiers as a dynamic import mechanism. The problem would be generally solved for all forms of dynamic import (including this one) with the module expression blocks proposal.
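
For illustration, a rough sketch of how that could combine with this proposal, assuming the module { } syntax from the module expressions proposal and assuming importValue accepted a module object in place of a string specifier (neither is specified behaviour):

const mod = module {
  export function createNumbersArray(length) {
    return Array.from({ length }, (_, i) => i);
  }
};

const worker = new Worker('about:blank', { type: 'module' });
const createNumbersArray = await worker.importValue(mod, 'createNumbersArray');
const numbersArray = await createNumbersArray(3); // [0, 1, 2]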

@jakearchibald
Contributor Author

There are cases where blocks are handy, but I don't think they should be required to improve worker communication. Having worker code in another file is usually a benefit.

@jakearchibald
Contributor Author

@guybedford yeah, I agree that would be handy (although getExports would need to return a promise). I was trying to avoid creating an API shape that was so different to shadow realms, but maybe that doesn't matter.

@jakearchibald
Contributor Author

jakearchibald commented Jan 23, 2024

I don't think we should get too bogged down in the transferable issue, but I think the solution here should be the same as it would be for transferred streams.

Something like:

const blob = new Blob();
markForTransfer(blob);

At this point blob is not detached, but the realm has a "marked for transfer" set containing the blob.

Then, in StructuredSerializeWithTransfer, if an object is in the "marked for transfer" set, it's transferred, and removed from the set.
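
For example, combined with importValue, a worker export could hand a stream back by transfer rather than by copy. A rough sketch, assuming the markForTransfer shape above:

// worker-utils.js
export function getDataStream() {
  const readable = new ReadableStream({
    start(controller) {
      controller.enqueue('hello');
      controller.close();
    },
  });
  markForTransfer(readable); // hypothetical, as sketched above
  return readable; // transferred rather than cloned when the return value is serialized
}

// index.js
const getDataStream = await worker.importValue('./worker-utils.js', 'getDataStream');
const readable = await getDataStream();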

Having one API that's transfer by default seems weird.

@asutherland

I like the markForTransfer idea; it definitely feels ergonomic in a way that transfer lists do not, and transfer lists would be a real problem here. In general, I think the lesson from CORBA and other APIs is that it's hard/inadvisable to hide RPC boundaries, so it would be quite reasonable for a proposal like this to make it more ergonomic to perform what amounts to RPC, but that code would still need to account for the boundary, including participating in semantics-impacting decisions like marking things for transfer.

Have there been similar discussions of this proposal elsewhere, in particular any that have been TAG reviewed? I'm having trouble finding other examples of markForTransfer specifically.

I should also note:

  • Per the File API WebIDL, Blobs and Files are not currently marked Transferable.
  • Blob.close() was explicitly removed from the spec, and markForTransfer presumably amounts to the same thing: it seems like one would specify the transfer set to be cleared at the conclusion of the task, and if the object wasn't transferred, it's now effectively closed. This potentially creates foot-guns if the method ever goes async after calling markForTransfer. One would really want an explicit and clear lifetime if allowing the pending transfer to outlive a request, like let t = new TransferDecorator(); t.markForTransfer(transferableThing); await somePromise; return t (see the sketch below). But that still sounds like it's similar to the comlink approach?
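
To make the foot-gun and the explicit-lifetime alternative concrete, a rough sketch in which markForTransfer, TransferDecorator and somethingSlow are all hypothetical:

// Foot-gun: with a per-task "marked for transfer" set, going async after marking
// could leave the object effectively closed by the time it is returned.
export async function getStreamFootgun() {
  const { readable } = new TransformStream();
  markForTransfer(readable);   // hypothetical
  await somethingSlow();       // illustrative; the marking task ends here...
  return readable;             // ...so the pending transfer may already have been cleared
}

// Explicit-lifetime alternative, per the TransferDecorator idea:
export async function getStreamExplicit() {
  const t = new TransferDecorator();   // hypothetical
  const { readable } = new TransformStream();
  t.markForTransfer(readable);         // the pending transfer lives on `t`, not on the task
  await somethingSlow();               // illustrative
  return t;                            // the decorator carries the transfer across the boundary
}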

@ggoodman

In chatting with @guybedford, we discussed a challenge across JavaScript runtimes right now: how to run an untrusted guest via WebAssembly. The current design of that interface seems to be more aligned with a WASM host and guest being within more-or-less the same trust boundary. A WASM guest can deny service to the host's event loop, and the host has nothing it can do to prevent that.

For a host to prevent that, a controlling parent event loop is required. We can accomplish that by having a Worker be the host of the WASM instance. The event loop that spawns the worker can then enforce deadlines and cancellations by managing the lifetime of the whole Worker, as sketched below.
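
A minimal sketch of that pattern as it exists today, where the file names, message shape, and one-second deadline are all illustrative:

// main.js: enforce a deadline on untrusted WASM by owning the worker's lifetime
const worker = new Worker('./wasm-host-worker.js', { type: 'module' });

const result = await Promise.race([
  new Promise((resolve) => {
    worker.addEventListener('message', ({ data }) => resolve(data), { once: true });
    worker.postMessage({ wasmUrl: './untrusted-guest.wasm', input: 42 });
  }),
  new Promise((_, reject) =>
    setTimeout(() => {
      worker.terminate(); // the guest cannot block this
      reject(new Error('deadline exceeded'));
    }, 1000),
  ),
]);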

However, the ergonomics of this leave a lot to be desired; packaging re-usable code across runtimes so that some is run in a worker and some in the main thread is challenging. Setting up the communication channels in a cross-platform way is also challenging for the same reasons motivating this discussion.

It seems like a version of this proposal might offer a more ergonomic way of handling this intermediary Worker, which is why I bring up the use-case.
