From bf2555a63e3e59dd5c021a78d9d4e4d1874e2a74 Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 13:12:27 +0200 Subject: [PATCH 1/7] Refactor future features This refactoring clarifies points made in #53, #99, #81, and overall tries to make the text more self-coherent, less bullet-pointy. --- FutureFeatures.md | 116 ++++++++++++++++++++++++++-------------------- 1 file changed, 66 insertions(+), 50 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index f33bc58a..cc6cc39b 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -8,69 +8,85 @@ to be standardized immediately after the MVP. These will be prioritized based on developer feedback, and will be available under [feature tests](FeatureTest.md). ## Great tooling support + This is covered in the [tooling](Tooling.md) section. ## Dynamic linking - * [Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but all loaded modules have - their own [separate heaps](MVP.md#heap) and cannot share [function pointers](MVP.md#function-pointers). - * Support both load-time and run-time (`dlopen`) dynamic linking of both - WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6 - ones containing JS), sharing the heap as well as function pointers. - * TODO + +[Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but +all loaded modules have their own [separate heaps](MVP.md#heap) and cannot share +[function pointers](MVP.md#function-pointers). Dynamic linking will allow +developers to share heaps and function pointers between WebAssembly modules, but +requires an implementation which properly handle ABI compatibility. + +WebAssembly will support both load-time and run-time (`dlopen`) dynamic linking +of both WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6 +ones containing JavaScript). 
+ +Dynamic linking is especially useful when combined with a Content Distribution +Network (CDN) such as [hosted libraries][] because the library is only ever +downloaded and compiled once per user device. It can also allow for smaller +differential updates, which could be implemented in collaboration with +[service workers][]. + +Security-wise, dynamic linking and CDNs should be combine with [CORS][] and +[subresource integrity][]. + + [hosted libraries]: https://developers.google.com/speed/libraries/ + [service workers]: http://www.w3.org/TR/service-workers/ + [CORS]: http://www.w3.org/TR/cors/ + [subresource integrity]: http://www.w3.org/TR/SRI/ ## Finer-grained control over memory - * `mmap` of File, `madvise(MADV_DONTNEED)`, ... - * TODO + +* `mmap` of files. +* `mmap` with `MAP_FIXED`, which is often used as a performance optimization for + tools such as address sanitizer for its shadow memory. +* `madvise(MADV_DONTNEED)`. +* Shared memory, in the same WebAssembly module as well as across modules. ## More expressive control flow - * Some types of control flow (esp. irreducible and indirect) cannot be - expressed with maximum efficiency in WebAssembly without patterned output by - the relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading) - optimizations in the engine. - * Options under consideration: - * No action, while+switch and jump-threading are enough. - * Just add goto (direct and indirect). - * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls). - * Add new control-flow primitives that address common patterns. + +* Some types of control flow (esp. irreducible and indirect) cannot be expressed + with maximum efficiency in WebAssembly without patterned output by the + relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading) + optimizations in the engine. +* Options under consideration: + * No action, while+switch and jump-threading are enough. 
+ * Just add goto (direct and indirect). + * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls). + * Add new control-flow primitives that address common patterns. ## GC/DOM Integration - * Access to certain kinds of GC things from variables/arguments/expressions - * Ability to GC-allocate certain kinds of GC things - * Initially, things with fixed structure: - * JS strings - * JS functions (as callable closures) - * Typed Arrays - * [Typed objects](https://github.com/nikomatsakis/typed-objects-explainer/) - * DOM objects via WebIDL - * Perhaps a rooting API for safe reference from the linear address space - * TODO + +* Access to certain kinds of Garbage-Collected (GC) objects from variables, + arguments, expressions. +* Ability to GC-allocate certain kinds of GC objects. +* Initially, things with fixed structure: + * JavaScript strings; + * JavaScript functions (as callable closures); + * Typed Arrays; + * [Typed objects](https://github.com/nikomatsakis/typed-objects-explainer/); + * DOM objects via WebIDL. +* Perhaps a rooting API for safe reference from the linear address space. ## Heaps bigger than 4GiB -* Allow heaps greater than 4GiB. -* Provide load/store operations that take 64-bit address operands; `int64` becomes the - canonical pointer type. -* On a 32-bit system, heaps must still be <4GiB so all the int64 arithmetic (which will be much - slower than 32-bit arithmetic) will be unnecessary. - * Should we provide a uintptr_t (only 64-bit on 64-bit systems)? - * This feature alone would not allow a C++ compiler to write size-polymorphic code since the word - size is also baked into the code in a hundred other ways (consider `offsetof`). - * The compiler *could* inflate all pointer types that are used in heap storage to 64-bit (so the - uintptr_t type was only used for local variable/expression types). 
- * This would imply an implicit truncation of any load of a pointer from the heap which could cause - subtle bugs if the pointer was storing a real int64-width value. - * This would still unnecessarily increase heap size on 32-bit; applications sensitive to OOM would - still want a separate 32-bit build. - * Now there are three compile targets: all-32, all-64, and this uintptr_t hybrid. - * More discussion and experimentation needed. - * Would the hybrid mostly Just Work? - * Are there users who would want to ship a hybrid build instead of two 32- and 64-bit builds - (conditionally loaded after a feature test)? + +WebAssembly will eventually allow heaps greater than 4GiB by providing +load/store operations that take 64-bit address operands. Modules which opt-in to +this feature have `int64` as the canonical pointer type. + +On a 32-bit system, heaps must still be smaller than 4GiB. All 64-bit pointer +arithmetic arithmetic (which will be much slower than 32-bit arithmetic) will be +therefore unnecessary. ## Source maps integration - * Add a new source maps [module section type](MVP.md#module-structure). - * Either embed the source maps directly or just a URL from which source maps can be downloaded. - * Text source maps become intractably large for even moderate-sized compiled codes, so probably - need to define new binary format for source maps. + +* Add a new source maps [module section type](MVP.md#module-structure). +* Either embed the source maps directly or just a URL from which source maps can + be downloaded. +* Text source maps become intractably large for even moderate-sized compiled + codes, so probably need to define new binary format for source maps. ## Signature-restricted Proper Tail Calls * See the [asm.js RFC](http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js). 
From ad83b196041dcbffda35ab9ec9afc1504b9ae48c Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 13:33:49 +0200 Subject: [PATCH 2/7] Finish refactoring. Address #49. --- FutureFeatures.md | 197 ++++++++++++++++++++++++++-------------------- 1 file changed, 110 insertions(+), 87 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index cc6cc39b..a7843cb5 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -47,15 +47,16 @@ Security-wise, dynamic linking and CDNs should be combine with [CORS][] and ## More expressive control flow -* Some types of control flow (esp. irreducible and indirect) cannot be expressed - with maximum efficiency in WebAssembly without patterned output by the - relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading) - optimizations in the engine. -* Options under consideration: - * No action, while+switch and jump-threading are enough. - * Just add goto (direct and indirect). - * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls). - * Add new control-flow primitives that address common patterns. +Some types of control flow (especially irreducible and indirect) cannot be +expressed with maximum efficiency in WebAssembly without patterned output by the +relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading) +optimizations in the engine. + +Options under consideration: +* No action, `while` and `switch` combined with jump-threading are enough. +* Just add `goto` (direct and indirect). +* Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls). +* Add new control-flow primitives that address common patterns. ## GC/DOM Integration @@ -89,97 +90,119 @@ therefore unnecessary. codes, so probably need to define new binary format for source maps. 
## Signature-restricted Proper Tail Calls -* See the [asm.js RFC](http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js). -* Useful properties of signature-restricted PTCs: - * In most cases, can be compiled to a single jump. - * Can express indirect `goto` via function-pointer calls. - * Can be used as a compile target for languages with unrestricted PTCs; - the code generator can use a stack in the heap to effectively implement a - custom call ABI on top of signature-restricted PTCs. - * An engine that wishes to perform aggressive optimization can fuse a graph of PTCs into a - single function. - * To reduce compile time, a code generator can use PTCs to break up - ultra-large functions into smaller functions at low overhead using PTCs. - * A compiler can exert some amount of control over register allocation via the ordering of - arguments in the PTC signature. - + +See the [asm.js RFC][] for a full description of signature-restricted Proper +Tail Calls (PTC). + +Useful properties of signature-restricted PTCs: + +* In most cases, can be compiled to a single jump. +* Can express indirect `goto` via function-pointer calls. +* Can be used as a compile target for languages with unrestricted PTCs; the code + generator can use a stack in the heap to effectively implement a custom call + ABI on top of signature-restricted PTCs. +* An engine that wishes to perform aggressive optimization can fuse a graph of + PTCs into a single function. +* To reduce compile time, a code generator can use PTCs to break up ultra-large + functions into smaller functions at low overhead using PTCs. +* A compiler can exert some amount of control over register allocation via the + ordering of arguments in the PTC signature. + + [asm.js RFC]: http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js + ## Proper Tail Calls - * Expands upon Signature-restricted Proper Tail Calls. 
- * TODO + +Expands upon signature-restricted Proper Tail Calls, and makes it easier to +support other languages, especially functional programming languages. ## Asynchronous Signals - * TODO + +TODO ## "Long SIMD" -* The initial SIMD API will be a "short SIMD" API, centered around fixed-width - 128-bit types and explicit SIMD operations. This is quite portable and useful, - but it won't be able to deliver the full performance capabilities of some of - today's popular hardware. There is [a proposal in the SIMD.js repository][] - for a "long SIMD" model which generalizes to wider hardware vector lengths, - making more natural use of advanced features like vector lane predication, - gather/scatter, and so on. Interesting questions to ask of such an model will - include: - * How will this model map onto popular modern SIMD hardware architectures? - * What is this model's relationship to other hardware parallelism features, - such as GPUs and threads with shared memory? - * How will this model be used from higher-level programming languages? - For example, the C++ committee is considering a wide variety of possible - approaches; which of them might be supported by the model? - * What is the relationship to the "short SIMD" API? "None" may be an - acceptable answer, but it's something to think about. - * What non-determinism does this model introduce into the overall platform? - * What happens when code uses long SIMD on a hardware platform which doesn't - support it? Reasonable options may include emulating it without the - benefit of hardware acceleration, or indicating a lack of support through - feature tests. + +The initial SIMD API will be a "short SIMD" API, centered around fixed-width +128-bit types and explicit SIMD operations. This is quite portable and useful, +but it won't be able to deliver the full performance capabilities of some of +today's popular hardware. 
There is [a proposal in the SIMD.js repository][] for +a "long SIMD" model which generalizes to wider hardware vector lengths, making +more natural use of advanced features like vector lane predication, +gather/scatter, and so on. Interesting questions to ask of such a model will +include: + +* How will this model map onto popular modern SIMD hardware architectures? +* What is this model's relationship to other hardware parallelism features, such + as GPUs and threads with shared memory? +* How will this model be used from higher-level programming languages? For + example, the C++ committee is considering a wide variety of possible + approaches; which of them might be supported by the model? +* What is the relationship to the "short SIMD" API? "None" may be an acceptable + answer, but it's something to think about. +* What non-determinism does this model introduce into the overall platform? +* What happens when code uses long SIMD on a hardware platform which doesn't + support it? Reasonable options may include emulating it without the benefit of + hardware acceleration, or indicating a lack of support through feature tests. [a proposal in the SIMD.js repository]: https://github.com/johnmccutchan/ecmascript_simd/issues/180 ## Operations which may not be available or may not perform well on all platforms - * Fused multiply-add. - * Reciprocal square root approximate. - * 16-bit floating point. - * and more! -## Platform-independent Just-in-Time compilation -* Minimally, we need mechanisms to make this possible. - * Producing a dynamic library and loading it is very likely the first step, as - it will be easy to get working. +* Fused multiply-add. +* Reciprocal square root approximate. +* 16-bit floating point. +* and more! 
- * After that, it may become desirable to define lighter-weight mechanisms, such - as the ability to add a function to an existing module, or even the ability to - define explicitly patchable constructs within functions to allow for very - fine-grained JITing. +## Platform-independent Just-in-Time compilation -* Potential enhancements include: - * Provide JITs access to profile feedback for their JITed code. +WebAssembly is a new virtual ISA, and as such applications won't be able to +simply reuse their existing JIT-compiler backends. Applications will instead +have to interface with WebAssembly's instructions as if they were a new ISA. + +Applications expect a wide variety of JIT-compilation capabilities. WebAssembly +should support: + +* Producing a dynamic library and loading it into the current WebAssembly + module. +* Define lighter-weight mechanisms, such as the ability to add a function to an + existing module. +* Support explicitly patchable constructs within functions to allow for very + fine-grained JIT-compilation. This includes: + * Code patching for polymorphic inline caching; + * Call patching to chain JIT-compiled functions together; + * Temporary halt-insertion within functions, to trap if a function starts + executing while a JIT-compiler's runtime is performing operations + dangerous to that function. +* Provide JITs access to profile feedback for their JIT-compiled code. +* Code unloading capabilities, especially in the context of code garbage + collection and defragmentation. ## Multiprocess support - * `vfork`. - * Inter-process communication. - * Inter-process `mmap`. + +* `vfork`. +* Inter-process communication. +* Inter-process `mmap`. ## Trapping or non-trapping strategies. -* Presently, when an instruction traps, the program is immediately terminated. 
- This suits C/C++ code, where trapping conditions indicate Undefined Behavior - at the source level, and it's also nice for handwritten code, where trapping - conditions typically indicate an instruction being asked to perform outside - its supported range. However, the current facilities do not cover some - interesting use cases: - - * Not all likely-bug conditions are covered. For example, it would be very - nice to have a signed-integer add which traps on overflow. Such a construct - would add too much overhead on today's popular hardware architectures to be - used in general, however it may still be useful in some contexts. - - * Some higher-level languages define their own semantics for conditions like - division by zero and so on. It's possible for compilers to add explicit - checks and handle such cases manually, though more direct support from the - platform could have advantages: - * Non-trapping versions of some opcodes, such as an integer division - instruction that returns zero instead of trapping on division by zero, - could potentially run faster on some platforms. - * The ability to recover gracefully from traps in some way could make many - things possible. Possibly this could involve throwing or possibly by - resuming execution at the trapping instruction with the execution state - altered, if there can be a reasonable way to specify how that should work. + +Presently, when an instruction traps, the program is immediately terminated. +This suits C/C++ code, where trapping conditions indicate Undefined Behavior at +the source level, and it's also nice for handwritten code, where trapping +conditions typically indicate an instruction being asked to perform outside its +supported range. However, the current facilities do not cover some interesting +use cases: + +* Not all likely-bug conditions are covered. For example, it would be very nice + to have a signed-integer add which traps on overflow. 
Such a construct would + add too much overhead on today's popular hardware architectures to be used in + general, however it may still be useful in some contexts. +* Some higher-level languages define their own semantics for conditions like + division by zero and so on. It's possible for compilers to add explicit checks + and handle such cases manually, though more direct support from the platform + could have advantages: + * Non-trapping versions of some opcodes, such as an integer division + instruction that returns zero instead of trapping on division by zero, could + potentially run faster on some platforms. + * The ability to recover gracefully from traps in some way could make many + things possible. Possibly this could involve throwing or possibly by + resuming execution at the trapping instruction with the execution state + altered, if there can be a reasonable way to specify how that should work. From d69b6dba446cbfe19438ab16b6813d525088945a Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 17:15:27 +0200 Subject: [PATCH 3/7] Drop ABI mention, it's hard to explain concisely. --- FutureFeatures.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index a7843cb5..e3c69210 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -16,8 +16,7 @@ This is covered in the [tooling](Tooling.md) section. [Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but all loaded modules have their own [separate heaps](MVP.md#heap) and cannot share [function pointers](MVP.md#function-pointers). Dynamic linking will allow -developers to share heaps and function pointers between WebAssembly modules, but -requires an implementation which properly handle ABI compatibility. +developers to share heaps and function pointers between WebAssembly modules. 
WebAssembly will support both load-time and run-time (`dlopen`) dynamic linking of both WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6 From 7f7713be0e489fdd8f829b8fe841ef93f9c3beba Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 17:17:59 +0200 Subject: [PATCH 4/7] Drop CORS and SRI. Move to MVP in another PR. --- FutureFeatures.md | 5 ----- 1 file changed, 5 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index e3c69210..3c994569 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -28,13 +28,8 @@ downloaded and compiled once per user device. It can also allow for smaller differential updates, which could be implemented in collaboration with [service workers][]. -Security-wise, dynamic linking and CDNs should be combine with [CORS][] and -[subresource integrity][]. - [hosted libraries]: https://developers.google.com/speed/libraries/ [service workers]: http://www.w3.org/TR/service-workers/ - [CORS]: http://www.w3.org/TR/cors/ - [subresource integrity]: http://www.w3.org/TR/SRI/ ## Finer-grained control over memory From 82bd80a0613f269bf3293013c0b953eb8f69140d Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 18:12:40 +0200 Subject: [PATCH 5/7] Clarify shared memory. --- FutureFeatures.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index 3c994569..0653d1d4 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -37,7 +37,8 @@ differential updates, which could be implemented in collaboration with * `mmap` with `MAP_FIXED`, which is often used as a performance optimization for tools such as address sanitizer for its shadow memory. * `madvise(MADV_DONTNEED)`. -* Shared memory, in the same WebAssembly module as well as across modules. +* Shared memory, where a physical address range is mapped to multiple physical + pages in a single WebAssembly module as well as across modules. 
## More expressive control flow From fc88967d7bb150a3a82101d631025fa6da7d24ee Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 18:16:04 +0200 Subject: [PATCH 6/7] Clarify 64-bit heap. --- FutureFeatures.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index 0653d1d4..0e23e390 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -72,9 +72,9 @@ WebAssembly will eventually allow heaps greater than 4GiB by providing load/store operations that take 64-bit address operands. Modules which opt-in to this feature have `int64` as the canonical pointer type. -On a 32-bit system, heaps must still be smaller than 4GiB. All 64-bit pointer -arithmetic arithmetic (which will be much slower than 32-bit arithmetic) will be -therefore unnecessary. +On a 32-bit system, heaps must still be smaller than 4GiB. A WebAssembly +implementation running on such a platform may restrict allocations to the lower +4GiB, and leave the upper 32 bits untouched. ## Source maps integration From 3c6c2f67f3438d0c324b357711db4f4804c79566 Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 23:17:22 +0200 Subject: [PATCH 7/7] Drop MAP_FIXED, add to Tooling.md in a separate PR. --- FutureFeatures.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/FutureFeatures.md b/FutureFeatures.md index 0e23e390..92415414 100644 --- a/FutureFeatures.md +++ b/FutureFeatures.md @@ -34,8 +34,6 @@ differential updates, which could be implemented in collaboration with ## Finer-grained control over memory * `mmap` of files. -* `mmap` with `MAP_FIXED`, which is often used as a performance optimization for - tools such as address sanitizer for its shadow memory. * `madvise(MADV_DONTNEED)`. * Shared memory, where a physical address range is mapped to multiple physical pages in a single WebAssembly module as well as across modules.
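
Note on PATCH 6/7: the 32-bit restriction can be illustrated with a small model. This is only a sketch under stated assumptions, not how any engine is known to implement it. The names `HEAP_LIMIT`, `OutOfBoundsError`, and `to_host_address` are hypothetical, and trapping on an address with non-zero high bits is an assumption; the patch itself only says an implementation may restrict allocations to the lower 4GiB.

```python
# Hypothetical model, not a WebAssembly API: a 32-bit host handling 64-bit
# address operands by only ever allocating in the lower 4GiB.

HEAP_LIMIT = 1 << 32      # 4GiB: heaps on a 32-bit host must stay below this
MASK_64 = (1 << 64) - 1   # addresses are modelled as unsigned 64-bit values


class OutOfBoundsError(Exception):
    """Raised when an address cannot be backed by the 32-bit host heap."""


def to_host_address(address: int, heap_size: int) -> int:
    """Map a 64-bit module address to a 32-bit host offset.

    Assumes the allocator never hands out addresses at or above 4GiB, so the
    high half of every valid address is zero and never needs to be examined
    beyond this check.
    """
    if not 0 <= heap_size < HEAP_LIMIT:
        raise ValueError("a 32-bit host needs a heap smaller than 4GiB")
    address &= MASK_64  # interpret the operand as an unsigned 64-bit value
    if address >> 32 != 0 or address >= heap_size:
        raise OutOfBoundsError(hex(address))
    return address & 0xFFFFFFFF  # only the low 32 bits are ever needed
```

With a 1MiB heap, `to_host_address(0x1000, 1 << 20)` yields the same offset back, while any address at or above `1 << 32` is rejected. A real engine could choose a different policy for out-of-range addresses; the point is only that valid addresses fit in the low 32 bits.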