WASM application security considerations #304

Closed
kg opened this issue Aug 18, 2015 · 17 comments

@kg
Contributor

kg commented Aug 18, 2015

#302 and a few other discussions have raised an issue we need to at least consider for the MVP: Application security.

The wasm sandbox provides security for the OS/browser hosting the wasm application, so we're all good there. Security for the application itself - the safety of its data, for example - is another concern.

For JavaScript/DOM applications this is currently addressed by the same-origin policy, data hiding (inside closures), etc. The approach used in JS+DOM is, to put it mildly, rather complicated.

When thinking about JS->wasm and wasm->JS interop, along with scenarios where multiple wasm applications are loaded in a page (from different origins), we need to make sure nothing we spec allows for the integrity of a wasm module's heap to be significantly compromised. Otherwise, a couple years down the road we have some 'worst case' scenarios equivalent to someone being able to pull your whole email history out of the heap of your wasm email client.

This mostly comes up when addressing design considerations like how the wasm heap is exposed to JS (if at all), how data crosses the JS<->wasm boundary, how function pointers work, etc. The 'obvious' solution to some of these problems will have some significant security consequences.

@jfbastien
Member

See #205 for my TODO on this :-)
I think this dupes it?

@kg
Contributor Author

kg commented Aug 18, 2015

This isn't about bugs, it's about us specifically security-hardening interfaces like the externally visible heap. It does overlap with #205 somewhat, though! In this case none of the wasm modules would be buggy or malicious. Think of this as an equivalent to CSRF, not an equivalent to a buffer overflow.

@lukewagner
Member

As currently described, wasm modules would have an associated origin, like every JS script/function, and would be subject to the same same-origin security policy, which basically blocks all cross-origin access (except for a tiny set of white-listed DOM objects+properties: Location, Window). There is also the question of whether to allow cross-origin script loading (as JS does) without requiring CORS (personally, I'd like to avoid that if we can). Are there other specific concerns?

@kg
Contributor Author

kg commented Aug 19, 2015

It might be worthwhile to consider whether we want to allow JS (even from the same origin) full access to the heap or only to specific regions of the heap. Essentially, does the wasm module decide what external JS has access to, or is it all-or-nothing? I can see arguments for both.

It's not clear to me how a loaded wasm module is exposed in the DOM yet, so I'm not sure exactly how the same-origin policy governs it. In current emscripten an asm.js module is just crammed into the global scope, where third-party JS could definitely access the heap. I assume we don't want this, but what specific steps are we taking to prevent it? Presumably wasm modules are loaded via module loaders (imports), but in that case do we block cross-origin imports? If I import a module a second time, it has to provide the same module so that each ES6 module's imports are the same. How do we secure that across origins? Do we want to take steps to prevent a module's heap from 'leaking out' of the origin and being accessed by malicious actors?

For the record, I don't think we need to necessarily protect against most attacks, but we should think through our security model and lay out a case for why we chose our final set of primitives.

This interacts with the question of exactly how to expose the heap to JS for interop (a current point of discussion, I believe), and potentially with address space management as well: if we introduce PROT_READ pages, do we also enforce that read-only status for external JS? Do we introduce PROT_JS_READ and PROT_JS_WRITE?

@lukewagner
Member

@kg I agree that we should make sharing-linear-memory-with-JS opt-in, and that is, subtly, already the wording in Web.md: "If allowed by the module, JavaScript can alias a loaded module's linear memory via Typed Arrays". I had been thinking an all-or-nothing mode, but it is interesting to consider something finer-grained. One idea is that we could start going in the direction described by the WebIDL integration and define a "memory region" opaque reference type. When a "memory region" was passed to JS, a typed array view would pop out the other side.
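To make the finer-grained idea concrete, here is a minimal userland sketch of a "memory region", assuming a plain `ArrayBuffer` stands in for linear memory. The names `MemoryRegion` and `makeRegion` are hypothetical and are not part of any wasm proposal. It uses accessor functions rather than handing out a typed array directly, because in today's JS a `Uint8Array` over a sub-range still exposes the whole buffer through its `.buffer` property; an engine-level "memory region" type would presumably have to avoid that.

```typescript
// Hypothetical userland model of a "memory region"; not a WebAssembly API.
interface MemoryRegion {
  readonly length: number;
  read(index: number): number;
  write(index: number, value: number): void;
}

function makeRegion(linearMemory: ArrayBuffer, offset: number, length: number): MemoryRegion {
  const valid =
    Math.floor(offset) === offset && Math.floor(length) === length &&
    offset >= 0 && length >= 0 && offset + length <= linearMemory.byteLength;
  if (!valid) {
    throw new RangeError("region lies outside linear memory");
  }
  // The view stays private to this closure, so callers never reach view.buffer.
  const view = new Uint8Array(linearMemory, offset, length);
  const check = (index: number): void => {
    if (Math.floor(index) !== index || index < 0 || index >= length) {
      throw new RangeError(`index ${index} is outside the region`);
    }
  };
  return {
    length,
    read(index: number): number {
      check(index);
      return view[index];
    },
    write(index: number, value: number): void {
      check(index);
      view[index] = value;
    },
  };
}

// Usage: expose bytes 16..31 of a 64-byte stand-in heap and nothing else.
const demoHeap = new ArrayBuffer(64);
const demoRegion = makeRegion(demoHeap, 16, 16);
demoRegion.write(0, 42); // lands at byte 16 of demoHeap
```

The accessor design trades speed for isolation; the proposal in the comment above would instead let the engine produce a real typed array view.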

> It's not clear to me how a loaded wasm module is exposed in the dom yet

The current proposal is to load wasm modules just like ES6 modules. So if you load wasm via <script src=... type='module'> there would be a script element in the DOM (just like today), but it doesn't really give you any access to the module state. In general, the stated high-level goal is that all the security would be the same as if the wasm module was es6 code.

> In current emscripten an asm.js module is just crammed into the global scope where third-party JS could definitely access the heap.

Only if it's third-party JS you loaded into your origin and the heap was exposed to JS as discussed above. If you load malicious third-party code into your origin, it can already do all manner of other bad things like take over the DOM, call arbitrary JS and wasm exports, etc. If you want sandboxing for untrusted code, you want iframes (although some o-cap advocates might argue it's technically achievable within a single origin by limiting what the untrusted code has access to...).
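As a rough illustration of the "only if the heap was exposed" point, the sketch below keeps a heap inside a closure and publishes it only when the embedder opts in. Everything here (`MailModule`, `instantiateMail`, `exposeHeap`) is a hypothetical shape invented for the example, not how Emscripten or any wasm loader actually works.

```typescript
// Hypothetical module shape; names are illustrative only.
interface MailModule {
  storeMessage(text: string): number; // returns the byte offset ("pointer")
  exposedHeap?: Uint8Array;           // present only when sharing was opted into
}

function instantiateMail(options: { exposeHeap: boolean }): MailModule {
  // Lives in the closure: other scripts cannot reach it through the global scope.
  const privateHeap = new Uint8Array(1024);
  let used = 0;
  const mod: MailModule = {
    storeMessage(text: string): number {
      if (used + text.length > privateHeap.length) {
        throw new RangeError("heap full");
      }
      const ptr = used;
      for (let i = 0; i < text.length; i++) {
        privateHeap[used++] = text.charCodeAt(i) & 0xff; // one byte per char, ASCII only
      }
      return ptr;
    },
  };
  if (options.exposeHeap) {
    mod.exposedHeap = privateHeap; // all-or-nothing opt-in
  }
  return mod;
}

const sealedModule = instantiateMail({ exposeHeap: false });
sealedModule.storeMessage("private mail"); // callers get a pointer but no heap to read it from
```

Any script that holds `sealedModule` can still call its exports, which matches the point above: code loaded into the same origin is trusted with whatever the module chooses to export.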

> Presumably wasm modules are loaded via module loaders (imports), but in that case do we block cross-origin imports?

That would be the default, but as I said above, I'd rather require CORS so we don't have to worry about things like error sanitization that we have to worry about now in the JS engine.

> If I import a module a 2nd time, it has to provide the same module so that each ES6 module's imports are the same. How do we secure that across origins?

This will be defined by the loader spec in stage 0, under "memoization". The short answer is that the "registry" used to memoize is (as you might expect) per-realm.

> Do we want to take steps to prevent a module's heap from 'leaking out' of the origin and being accessed by malicious actors?

The same-origin policy already does this; a module (es6 or wasm) in one origin is just not reachable/visible to any other origin.

> if we introduce PROT_READ pages, do we also enforce that read-only status for external JS?

We can specify that any typed array views of memory that are made partially or fully inaccessible would be detached; otherwise we'd need to change the semantics of typed array access in JS. This is already the proposal for what to do when memory is resized (to avoid opening a can of engine worms).
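The resize half of this can be observed in the WebAssembly JS API as it later shipped, which postdates this thread: growing a non-shared `WebAssembly.Memory` detaches the previous `ArrayBuffer`, so existing views drop to length zero. The sketch below shows only that resize behavior; the function name is made up for the example, and nothing here covers the page-protection case, which is still hypothetical.

```typescript
// Relies on the shipped WebAssembly JS API (non-shared memory); 1 page = 64 KiB.
function viewLengthsAcrossGrow(): { before: number; after: number; fresh: number } {
  const linearMemory = new WebAssembly.Memory({ initial: 1 });
  const staleView = new Uint8Array(linearMemory.buffer);
  const before = staleView.length;

  linearMemory.grow(1); // detaches the ArrayBuffer that staleView aliases

  const after = staleView.length; // a view over a detached buffer has length 0
  const fresh = new Uint8Array(linearMemory.buffer).length; // re-fetch the buffer to see the new size
  return { before, after, fresh };
}

const growResult = viewLengthsAcrossGrow();
console.log(growResult.before, growResult.after, growResult.fresh);
```

JS code that caches a view therefore has to re-create it from `memory.buffer` after any operation that may grow memory.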

@AndrewScheidecker

One way to deal with this would be to expose a JS implementation of map_shmem and/or shmem_create that returns a TypedArray. Then you can make JavaScript go through the proposed shared memory mapping mechanism (shmem_create&map_shmem). That provides opt-in sharing with pretty minimal constraints on the WebAssembly VM.

@jfbastien
Member

@AndrewScheidecker are you proposing this mechanism to share memory between JS and wasm?

@AndrewScheidecker

@jfbastien Yes, and also that it should be the only way that JS is allowed to access WASM memory.

@jfbastien
Member

@AndrewScheidecker I suggest moving this to a separate issue. It has deeply constraining implications on what wasm is allowed to do going forward, including heap resizing, page table management, GC, debugging, ...

@lukewagner
Member

shmem is an interesting idea. However, it seems like we don't actually need the full power that shmem provides (the ability for one page to be simultaneously mapped into several different virtual address ranges); all JS needs is a pointer. An unpleasant consequence would be that JS wouldn't be able to view any regions of linear memory that were map_filed and vice versa (there can be only one file mapping for a given address range at the OS level). Also, it would mean putting shmem in the MVP when it's not an easy feature to implement (it requires OS support and it's hard on Windows because of MapViewOfFile limitations). Lastly, I think my proposal above (wasm can pass out "memory regions" that turn into typed array views in JS) would offer the same fine-grained access control.

A general point that I should have made earlier, though: in the MVP, with all Web API access going through JS, the most natural strategy for Emscripten/llvm-wasm is to alias the entire linear memory (so that pointers work on both sides). So I'm not even really sure we'd benefit from this fine-grained control in the MVP.
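To show what "pointers work on both sides" means in practice, here is a minimal sketch in which JS treats a wasm-side pointer as a byte offset into a view of the whole linear memory. The helper `readCString` and the stand-in buffer are assumptions made for the example; a real toolchain would supply its own glue code and its own memory.

```typescript
// Read a NUL-terminated UTF-8 string at a wasm-side "pointer" (a byte offset).
function readCString(heapU8: Uint8Array, ptr: number): string {
  let end = ptr;
  while (end < heapU8.length && heapU8[end] !== 0) {
    end++;
  }
  return new TextDecoder("utf-8").decode(heapU8.subarray(ptr, end));
}

// Stand-in for linear memory; a real module would provide its own buffer.
const linearBytes = new Uint8Array(new ArrayBuffer(64));
linearBytes.set([104, 105, 0], 8); // the bytes of "hi\0" at offset 8, as wasm code might write them
const greetingText = readCString(linearBytes, 8);
console.log(greetingText);
```

Because any offset is a valid argument, this style only works when JS can see the whole memory, which is why finer-grained sharing buys little for this kind of glue code.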

@AndrewScheidecker

> An unpleasant consequence would be that JS wouldn't be able to view any regions of linear memory that were map_filed and vice versa

I hadn't thought about the interaction with map_file, but I think it's also an argument for applying those restrictions to whatever mechanism JavaScript can use to see wasm memory. If you provide more power, it forces the wasm VM to run in the same OS process as the JS. Executing a wasm module in separate OS processes seems necessary to support a truly 64-bit address space, or to even support allocating most of a 32-bit address space when invoked by a 32-bit browser, so I think we should make sure the spec doesn't make that impractical.

It's pointless if the polyfill needs OS support to implement this functionality, but if we can accept the polyfill only sharing memory within the browser process, that's not necessary.

@lukewagner
Member

Synchronous calls to/from JS and Web APIs (something that was questioned and which multiple browsers were strongly in favor of to support the high-level goal of tight integration with the existing web platform) already practically (though not theoretically) force wasm into the same process (and callstack) as JS. In the future, once wasm can access all APIs without going through JS (see GC.md), it will be possible to create a Web Worker containing only wasm that could easily be launched in a separate OS process. In fact, Web Workers started out in separate processes in some browsers but were brought back in for various reasons. There is also a (rather vague) multiprocess support future feature that could allow wasm to more explicitly request a separate process.

I don't quite follow what you mean by "truly 64-bit address space". Even in native code, OSes put limitations on apps from mapping more than a few TB (in some cases, more than a few GB). Post-MVP, wasm will allow int64 pointers and thus >4GiB heaps. I'm not sure what other qualities of 64-bit address space are missing.

@AndrewScheidecker

> Synchronous calls to/from JS and Web APIs (something that was questioned and which multiple browsers were strongly in favor of to support the high-level goal of tight integration with the existing web platform) already practically (though not theoretically) force wasm into the same process (and callstack) as JS.

I think it's important that it's possible to run a WebAssembly process in a separate OS process, so you're not competing for address space with whatever else is going on in the browser process. Requiring you to explicitly choose to do that is fine. That would also imply that you have to go through shmem_create/map_shmem to share memory with that process.

> I don't quite follow what you mean by "truly 64-bit address space".

Sorry, that wasn't clear. I mean that if a WebAssembly process is to use anywhere near the virtual address space allowed by the OS, then it must do so in a separate OS process from the browser and other WebAssembly processes.

@lukewagner
Member

> I think it's important that it's possible to run a WebAssembly process in a separate OS process,...

It would be possible to run a wasm process in a separate OS process, even with JS typed array views on linear memory, because at the implementation level both map_file and sharing memory with JS would involve mapping a file, so the engine would just need to keep track of mappings and map the right file for the right ranges. It's only at the semantic level that you'd have a conflict between two shmems.

> I mean that if a WebAssembly process is to use anywhere near the virtual address space allowed by the OS, then it must do so in a separate OS process from the browser and other WebAssembly processes.

Agreed on wanting to unlock this capability (I think it's way more valuable for 32-bit). Given that browsers will start with multiple wasm modules (and the JS engine) sharing the same process (many performance reasons), I think having an explicit way to request a separate process (and avoid the sync RPC issues by design) is the best path forward.

@jfbastien
Member

> > I mean that if a WebAssembly process is to use anywhere near the virtual address space allowed by the OS, then it must do so in a separate OS process from the browser and other WebAssembly processes.
>
> Agreed on wanting to unlock this capability (I think it's way more valuable for 32-bit). Given that browsers will start with multiple wasm modules (and the JS engine) sharing the same process (many performance reasons), I think having an explicit way to request a separate process (and avoid the sync RPC issues by design) is the best path forward.

I tentatively agree with @lukewagner because it seems difficult to make guarantees about APIs when developers expect them to be synchronous. I'd like this to be disproved though: it would be wonderful if it were possible to just put wasm into its own process when convenient!

@lukewagner
Member

@jfbastien FWIW, even before specific multi-process features, Web Workers containing only wasm code would be an easier first target to move out of process.

@lukewagner
Member

Tentatively closing this now; new questions/ideas welcome in new issues.
