From bf2555a63e3e59dd5c021a78d9d4e4d1874e2a74 Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 13:12:27 +0200
Subject: [PATCH 1/7] Refactor future features

This refactoring clarifies points made in #53, #99, and #81, and overall tries to make the text more coherent and less bullet-pointy.
---
 FutureFeatures.md | 116 ++++++++++++++++++++++++++--------------------
 1 file changed, 66 insertions(+), 50 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index f33bc58a..cc6cc39b 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -8,69 +8,85 @@ to be standardized immediately after the MVP. These will be prioritized based on
 developer feedback, and will be available under [feature tests](FeatureTest.md).
 
 ## Great tooling support
+
 This is covered in the [tooling](Tooling.md) section.
 
 ## Dynamic linking
- * [Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but all loaded modules have
-   their own [separate heaps](MVP.md#heap) and cannot share [function pointers](MVP.md#function-pointers).
- * Support both load-time and run-time (`dlopen`) dynamic linking of both
-   WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6
-   ones containing JS), sharing the heap as well as function pointers.
- * TODO
+
+[Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but
+all loaded modules have their own [separate heaps](MVP.md#heap) and cannot share
+[function pointers](MVP.md#function-pointers). Dynamic linking will allow
+developers to share heaps and function pointers between WebAssembly modules, but
+requires an implementation which properly handle ABI compatibility.
+
+WebAssembly will support both load-time and run-time (`dlopen`) dynamic linking
+of both WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6
+ones containing JavaScript).
+
+Dynamic linking is especially useful when combined with a Content Distribution
+Network (CDN) such as [hosted libraries][] because the library is only ever
+downloaded and compiled once per user device. It can also allow for smaller
+differential updates, which could be implemented in collaboration with
+[service workers][].
+
+Security-wise, dynamic linking and CDNs should be combine with [CORS][] and
+[subresource integrity][].
+
+  [hosted libraries]: https://developers.google.com/speed/libraries/
+  [service workers]: http://www.w3.org/TR/service-workers/
+  [CORS]: http://www.w3.org/TR/cors/
+  [subresource integrity]: http://www.w3.org/TR/SRI/
 
 ## Finer-grained control over memory
- * `mmap` of File, `madvise(MADV_DONTNEED)`, ...
- * TODO
+
+* `mmap` of files.
+* `mmap` with `MAP_FIXED`, which is often used as a performance optimization for
+  tools such as address sanitizer for its shadow memory.
+* `madvise(MADV_DONTNEED)`.
+* Shared memory, in the same WebAssembly module as well as across modules.
 
 ## More expressive control flow
- * Some types of control flow (esp. irreducible and indirect) cannot be
-   expressed with maximum efficiency in WebAssembly without patterned output by
-   the relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading)
-   optimizations in the engine.
- * Options under consideration:
-   * No action, while+switch and jump-threading are enough.
-   * Just add goto (direct and indirect).
-   * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls).
-   * Add new control-flow primitives that address common patterns.
+
+* Some types of control flow (esp. irreducible and indirect) cannot be expressed
+  with maximum efficiency in WebAssembly without patterned output by the
+  relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading)
+  optimizations in the engine.
+* Options under consideration:
+  * No action, while+switch and jump-threading are enough.
+  * Just add goto (direct and indirect).
+  * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls).
+  * Add new control-flow primitives that address common patterns.
 
 ## GC/DOM Integration
- * Access to certain kinds of GC things from variables/arguments/expressions
- * Ability to GC-allocate certain kinds of GC things
- * Initially, things with fixed structure:
-   * JS strings
-   * JS functions (as callable closures)
-   * Typed Arrays
-   * [Typed objects](https://github.com/nikomatsakis/typed-objects-explainer/)
-   * DOM objects via WebIDL
- * Perhaps a rooting API for safe reference from the linear address space
- * TODO
+
+* Access to certain kinds of Garbage-Collected (GC) objects from variables,
+  arguments, and expressions.
+* Ability to GC-allocate certain kinds of GC objects.
+* Initially, things with fixed structure:
+  * JavaScript strings;
+  * JavaScript functions (as callable closures);
+  * Typed Arrays;
+  * [Typed objects](https://github.com/nikomatsakis/typed-objects-explainer/);
+  * DOM objects via WebIDL.
+* Perhaps a rooting API for safe reference from the linear address space.
 
 ## Heaps bigger than 4GiB
-* Allow heaps greater than 4GiB.
-* Provide load/store operations that take 64-bit address operands; `int64` becomes the
-  canonical pointer type.
-* On a 32-bit system, heaps must still be <4GiB so all the int64 arithmetic (which will be much
-  slower than 32-bit arithmetic) will be unnecessary.
-  * Should we provide a uintptr_t (only 64-bit on 64-bit systems)?
-    * This feature alone would not allow a C++ compiler to write size-polymorphic code since the word
-      size is also baked into the code in a hundred other ways (consider `offsetof`).
-    * The compiler *could* inflate all pointer types that are used in heap storage to 64-bit (so the
-      uintptr_t type was only used for local variable/expression types).
-      * This would imply an implicit truncation of any load of a pointer from the heap which could cause
-        subtle bugs if the pointer was storing a real int64-width value.
-      * This would still unnecessarily increase heap size on 32-bit; applications sensitive to OOM would
-        still want a separate 32-bit build.
-      * Now there are three compile targets: all-32, all-64, and this uintptr_t hybrid.
-    * More discussion and experimentation needed.
-      * Would the hybrid mostly Just Work?
-      * Are there users who would want to ship a hybrid build instead of two 32- and 64-bit builds
-        (conditionally loaded after a feature test)?
+
+WebAssembly will eventually allow heaps greater than 4GiB by providing
+load/store operations that take 64-bit address operands. Modules which opt-in to
+this feature have `int64` as the canonical pointer type.
+
+On a 32-bit system, heaps must still be smaller than 4GiB. All 64-bit pointer
+arithmetic arithmetic (which will be much slower than 32-bit arithmetic) will be
+therefore unnecessary.
 
 ## Source maps integration
- * Add a new source maps [module section type](MVP.md#module-structure).
- * Either embed the source maps directly or just a URL from which source maps can be downloaded.
- * Text source maps become intractably large for even moderate-sized compiled codes, so probably
-   need to define new binary format for source maps.
+
+* Add a new source maps [module section type](MVP.md#module-structure).
+* Either embed the source maps directly, or embed just a URL from which the
+  source maps can be downloaded.
+* Text source maps become intractably large for even moderate-sized compiled
+  codes, so probably need to define new binary format for source maps.
 
 ## Signature-restricted Proper Tail Calls
 * See the [asm.js RFC](http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js).

From ad83b196041dcbffda35ab9ec9afc1504b9ae48c Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 13:33:49 +0200
Subject: [PATCH 2/7] Finish refactoring. Address #49.

---
 FutureFeatures.md | 197 ++++++++++++++++++++++++++--------------------
 1 file changed, 110 insertions(+), 87 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index cc6cc39b..a7843cb5 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -47,15 +47,16 @@ Security-wise, dynamic linking and CDNs should be combine with [CORS][] and
 
 ## More expressive control flow
 
-* Some types of control flow (esp. irreducible and indirect) cannot be expressed
-  with maximum efficiency in WebAssembly without patterned output by the
-  relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading)
-  optimizations in the engine.
-* Options under consideration:
-  * No action, while+switch and jump-threading are enough.
-  * Just add goto (direct and indirect).
-  * Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls).
-  * Add new control-flow primitives that address common patterns.
+Some types of control flow (especially irreducible and indirect) cannot be
+expressed with maximum efficiency in WebAssembly without patterned output by the
+relooper and [jump-threading](http://en.wikipedia.org/wiki/Jump_threading)
+optimizations in the engine.
+
+Options under consideration:
+* No action: `while` and `switch` combined with jump-threading are enough.
+* Just add `goto` (direct and indirect).
+* Add [signature-restricted Proper Tail Calls](FutureFeatures.md#signature-restricted-proper-tail-calls).
+* Add new control-flow primitives that address common patterns.
 
 ## GC/DOM Integration
 
@@ -89,97 +90,119 @@ therefore unnecessary.
   codes, so probably need to define new binary format for source maps.
 
 ## Signature-restricted Proper Tail Calls
-* See the [asm.js RFC](http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js).
-* Useful properties of signature-restricted PTCs:
-  * In most cases, can be compiled to a single jump.
-  * Can express indirect `goto` via function-pointer calls.
-  * Can be used as a compile target for languages with unrestricted PTCs;
-    the code generator can use a stack in the heap to effectively implement a
-    custom call ABI on top of signature-restricted PTCs.
-  * An engine that wishes to perform aggressive optimization can fuse a graph of PTCs into a
-    single function.
-  * To reduce compile time, a code generator can use PTCs to break up
-    ultra-large functions into smaller functions at low overhead using PTCs.
-  * A compiler can exert some amount of control over register allocation via the ordering of
-    arguments in the PTC signature.
- 
+
+See the [asm.js RFC][] for a full description of signature-restricted Proper
+Tail Calls (PTC).
+
+Useful properties of signature-restricted PTCs:
+
+* In most cases, can be compiled to a single jump.
+* Can express indirect `goto` via function-pointer calls.
+* Can be used as a compile target for languages with unrestricted PTCs; the code
+  generator can use a stack in the heap to effectively implement a custom call
+  ABI on top of signature-restricted PTCs.
+* An engine that wishes to perform aggressive optimization can fuse a graph of
+  PTCs into a single function.
+* To reduce compile time, a code generator can use PTCs to break up ultra-large
+  functions into smaller functions at low overhead.
+* A compiler can exert some amount of control over register allocation via the
+  ordering of arguments in the PTC signature.
+
+  [asm.js RFC]: http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js
+
 ## Proper Tail Calls
- * Expands upon Signature-restricted Proper Tail Calls.
- * TODO
+
+Expands upon signature-restricted Proper Tail Calls, and makes it easier to
+support other languages, especially functional programming languages.
  
 ## Asynchronous Signals
- * TODO
+
+TODO
 
 ## "Long SIMD"
-* The initial SIMD API will be a "short SIMD" API, centered around fixed-width
-  128-bit types and explicit SIMD operations. This is quite portable and useful,
-  but it won't be able to deliver the full performance capabilities of some of
-  today's popular hardware. There is [a proposal in the SIMD.js repository][]
-  for a "long SIMD" model which generalizes to wider hardware vector lengths,
-  making more natural use of advanced features like vector lane predication,
-  gather/scatter, and so on. Interesting questions to ask of such an model will
-  include:
-    * How will this model map onto popular modern SIMD hardware architectures?
-    * What is this model's relationship to other hardware parallelism features,
-      such as GPUs and threads with shared memory?
-    * How will this model be used from higher-level programming languages?
-      For example, the C++ committee is considering a wide variety of possible
-      approaches; which of them might be supported by the model?
-    * What is the relationship to the "short SIMD" API? "None" may be an
-      acceptable answer, but it's something to think about.
-    * What non-determinism does this model introduce into the overall platform?
-    * What happens when code uses long SIMD on a hardware platform which doesn't
-      support it? Reasonable options may include emulating it without the
-      benefit of hardware acceleration, or indicating a lack of support through
-      feature tests.
+
+The initial SIMD API will be a "short SIMD" API, centered around fixed-width
+128-bit types and explicit SIMD operations. This is quite portable and useful,
+but it won't be able to deliver the full performance capabilities of some of
+today's popular hardware. There is [a proposal in the SIMD.js repository][] for
+a "long SIMD" model which generalizes to wider hardware vector lengths, making
+more natural use of advanced features like vector lane predication,
+gather/scatter, and so on. Interesting questions to ask of such a model will
+include:
+
+* How will this model map onto popular modern SIMD hardware architectures?
+* What is this model's relationship to other hardware parallelism features, such
+  as GPUs and threads with shared memory?
+* How will this model be used from higher-level programming languages? For
+  example, the C++ committee is considering a wide variety of possible
+  approaches; which of them might be supported by the model?
+* What is the relationship to the "short SIMD" API? "None" may be an acceptable
+  answer, but it's something to think about.
+* What non-determinism does this model introduce into the overall platform?
+* What happens when code uses long SIMD on a hardware platform which doesn't
+  support it? Reasonable options may include emulating it without the benefit of
+  hardware acceleration, or indicating a lack of support through feature tests.
 
   [a proposal in the SIMD.js repository]: https://github.com/johnmccutchan/ecmascript_simd/issues/180
 
 ## Operations which may not be available or may not perform well on all platforms
- * Fused multiply-add.
- * Reciprocal square root approximate.
- * 16-bit floating point.
- * and more!
 
-## Platform-independent Just-in-Time compilation
-* Minimally, we need mechanisms to make this possible.
-  * Producing a dynamic library and loading it is very likely the first step, as
-    it will be easy to get working.
+* Fused multiply-add.
+* Reciprocal square root approximate.
+* 16-bit floating point.
+* and more!
 
-  * After that, it may become desirable to define lighter-weight mechanisms, such
-    as the ability to add a function to an existing module, or even the ability to
-    define explicitly patchable constructs within functions to allow for very
-    fine-grained JITing.
+## Platform-independent Just-in-Time compilation
 
-* Potential enhancements include:
-  * Provide JITs access to profile feedback for their JITed code.
+WebAssembly is a new virtual ISA, and as such applications won't be able to
+simply reuse their existing JIT-compiler backends. Applications will instead
+have to target WebAssembly's instructions as they would any other new ISA.
+
+Applications expect a wide variety of JIT-compilation capabilities. WebAssembly
+should support:
+
+* Producing a dynamic library and loading it into the current WebAssembly
+  module.
+* Lighter-weight mechanisms, such as the ability to add a function to an
+  existing module.
+* Explicitly patchable constructs within functions to allow for very
+  fine-grained JIT-compilation. This includes:
+    * Code patching for polymorphic inline caching;
+    * Call patching to chain JIT-compiled functions together;
+    * Temporary halt-insertion within functions, to trap if a function starts
+      executing while a JIT-compiler's runtime is performing operations
+      dangerous to that function.
+* Providing JITs access to profile feedback for their JIT-compiled code.
+* Code unloading capabilities, especially in the context of code garbage
+  collection and defragmentation.
 
 ## Multiprocess support
- * `vfork`.
- * Inter-process communication.
- * Inter-process `mmap`.
+
+* `vfork`.
+* Inter-process communication.
+* Inter-process `mmap`.
 
 ## Trapping or non-trapping strategies.
-* Presently, when an instruction traps, the program is immediately terminated.
-  This suits C/C++ code, where trapping conditions indicate Undefined Behavior
-  at the source level, and it's also nice for handwritten code, where trapping
-  conditions typically indicate an instruction being asked to perform outside
-  its supported range. However, the current facilities do not cover some
-  interesting use cases:
-
-  * Not all likely-bug conditions are covered. For example, it would be very
-    nice to have a signed-integer add which traps on overflow. Such a construct
-    would add too much overhead on today's popular hardware architectures to be
-    used in general, however it may still be useful in some contexts.
-
-  * Some higher-level languages define their own semantics for conditions like
-    division by zero and so on. It's possible for compilers to add explicit
-    checks and handle such cases manually, though more direct support from the
-    platform could have advantages:
-    * Non-trapping versions of some opcodes, such as an integer division
-      instruction that returns zero instead of trapping on division by zero,
-      could potentially run faster on some platforms.
-    * The ability to recover gracefully from traps in some way could make many
-      things possible. Possibly this could involve throwing or possibly by
-      resuming execution at the trapping instruction with the execution state
-      altered, if there can be a reasonable way to specify how that should work.
+
+Presently, when an instruction traps, the program is immediately terminated.
+This suits C/C++ code, where trapping conditions indicate Undefined Behavior at
+the source level, and it's also nice for handwritten code, where trapping
+conditions typically indicate an instruction being asked to perform outside its
+supported range. However, the current facilities do not cover some interesting
+use cases:
+
+* Not all likely-bug conditions are covered. For example, it would be very nice
+  to have a signed-integer add which traps on overflow. Such a construct would
+  add too much overhead on today's popular hardware architectures to be used in
+  general; however, it may still be useful in some contexts.
+* Some higher-level languages define their own semantics for conditions like
+  division by zero and so on. It's possible for compilers to add explicit checks
+  and handle such cases manually, though more direct support from the platform
+  could have advantages:
+  * Non-trapping versions of some opcodes, such as an integer division
+    instruction that returns zero instead of trapping on division by zero, could
+    potentially run faster on some platforms.
+  * The ability to recover gracefully from traps in some way could make many
+    things possible. This could involve throwing an exception, or resuming
+    execution at the trapping instruction with the execution state altered, if
+    there is a reasonable way to specify how that should work.

From d69b6dba446cbfe19438ab16b6813d525088945a Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 17:15:27 +0200
Subject: [PATCH 3/7] Drop ABI mention; it's hard to explain concisely.

---
 FutureFeatures.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index a7843cb5..e3c69210 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -16,8 +16,7 @@ This is covered in the [tooling](Tooling.md) section.
 [Dynamic loading](MVP.md#code-loading-and-imports) is in [the MVP](MVP.md), but
 all loaded modules have their own [separate heaps](MVP.md#heap) and cannot share
 [function pointers](MVP.md#function-pointers). Dynamic linking will allow
-developers to share heaps and function pointers between WebAssembly modules, but
-requires an implementation which properly handle ABI compatibility.
+developers to share heaps and function pointers between WebAssembly modules.
 
 WebAssembly will support both load-time and run-time (`dlopen`) dynamic linking
 of both WebAssembly modules and non-WebAssembly modules (e.g., on the web, ES6

From 7f7713be0e489fdd8f829b8fe841ef93f9c3beba Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 17:17:59 +0200
Subject: [PATCH 4/7] Drop CORS and SRI. Move to MVP in another PR.

---
 FutureFeatures.md | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index e3c69210..3c994569 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -28,13 +28,8 @@ downloaded and compiled once per user device. It can also allow for smaller
 differential updates, which could be implemented in collaboration with
 [service workers][].
 
-Security-wise, dynamic linking and CDNs should be combine with [CORS][] and
-[subresource integrity][].
-
   [hosted libraries]: https://developers.google.com/speed/libraries/
   [service workers]: http://www.w3.org/TR/service-workers/
-  [CORS]: http://www.w3.org/TR/cors/
-  [subresource integrity]: http://www.w3.org/TR/SRI/
 
 ## Finer-grained control over memory
 

From 82bd80a0613f269bf3293013c0b953eb8f69140d Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 18:12:40 +0200
Subject: [PATCH 5/7] Clarify shared memory.

---
 FutureFeatures.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index 3c994569..0653d1d4 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -37,7 +37,8 @@ differential updates, which could be implemented in collaboration with
 * `mmap` with `MAP_FIXED`, which is often used as a performance optimization for
   tools such as address sanitizer for its shadow memory.
 * `madvise(MADV_DONTNEED)`.
-* Shared memory, in the same WebAssembly module as well as across modules.
+* Shared memory, where a physical address range is mapped to multiple physical
+  pages in a single WebAssembly module as well as across modules.
 
 ## More expressive control flow
 

From fc88967d7bb150a3a82101d631025fa6da7d24ee Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 18:16:04 +0200
Subject: [PATCH 6/7] Clarify 64-bit heap.

---
 FutureFeatures.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index 0653d1d4..0e23e390 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -72,9 +72,9 @@ WebAssembly will eventually allow heaps greater than 4GiB by providing
 load/store operations that take 64-bit address operands. Modules which opt-in to
 this feature have `int64` as the canonical pointer type.
 
-On a 32-bit system, heaps must still be smaller than 4GiB. All 64-bit pointer
-arithmetic arithmetic (which will be much slower than 32-bit arithmetic) will be
-therefore unnecessary.
+On a 32-bit system, heaps must still be smaller than 4GiB. A WebAssembly
+implementation running on such a platform may restrict allocations to the lower
+4GiB, so that the upper 32 bits of every address are always zero.
 
 ## Source maps integration
 

From 3c6c2f67f3438d0c324b357711db4f4804c79566 Mon Sep 17 00:00:00 2001
From: JF Bastien <jfb@chromium.org>
Date: Tue, 9 Jun 2015 23:17:22 +0200
Subject: [PATCH 7/7] Drop MAP_FIXED, add to Tooling.md in a separate PR.

---
 FutureFeatures.md | 2 --
 1 file changed, 2 deletions(-)

diff --git a/FutureFeatures.md b/FutureFeatures.md
index 0e23e390..92415414 100644
--- a/FutureFeatures.md
+++ b/FutureFeatures.md
@@ -34,8 +34,6 @@ differential updates, which could be implemented in collaboration with
 ## Finer-grained control over memory
 
 * `mmap` of files.
-* `mmap` with `MAP_FIXED`, which is often used as a performance optimization for
-  tools such as address sanitizer for its shadow memory.
 * `madvise(MADV_DONTNEED)`.
 * Shared memory, where a physical address range is mapped to multiple physical
   pages in a single WebAssembly module as well as across modules.