This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Appropriateness for wasm #59

Closed
lars-t-hansen opened this issue Feb 8, 2016 · 18 comments

@lars-t-hansen
Collaborator

(Visibility: @jfbastien, @lukewagner, @sunfishcode)

Wasm does not yet have a model for shared memory and atomics but I know it has been discussed. And I've already received some feedback from the wasm stakeholders (some of whom are cc'd). What I'd like to have at this point is written feedback, specifically in this ticket, that the design outlined for shared memory in JS is not seen as conflicting in significant ways with the design one has envisioned for wasm. (Or, of course, a discussion of any conflicts :) This is due-diligence work for getting the shared memory spec to Stage 3 in TC39.

I realize that what we have for JS is a subset of what we want for wasm, e.g., in the JS spec there are only seq_cst atomics, and the definition of isLockFree is fairly weak. What I want to be sure of is that any limitations in the JS spec do not get in the way of providing wasm with something more; that, while the limitations will impact wasm code that interacts with JS, they will be compatible with whatever wasm will eventually design in.

@lars-t-hansen added this to the Stage 3 milestone Feb 8, 2016
@jfbastien
Contributor

At a high level, yes, SAB's design is in line with wasm. We'll want both to be able to communicate through atomics and futex / synchronic, so having a compatible model is important.

Specifically:

  • non-lock-free atomics should use the same synchronization mechanism between the two (e.g. lock shard).
  • Futex / synchronic must be able to wait on the same ID and wake each other up.
  • Atomics must remain address-free in both cases.
  • If JS ends up only having seq_cst and wasm has more, then we have to ensure the memory models work together (the memory model must ensure accesses on both sides synchronize with each other as if they were from the same memory model).
  • JS can't start supporting other memory model primitives which wasm doesn't know about without making sure both memory models are compatible.
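
To make the atomics and wait/wake bullets above a little more concrete, here is a minimal JavaScript sketch of seq_cst atomics and a wait/wake pair on one shared buffer. It assumes the API names that were standardized after this thread (`Atomics.wait`, `Atomics.notify`) rather than the futex / synchronic names discussed here, and it assumes a runtime that permits `Atomics.wait` on the calling thread (Node.js does). It runs on a single thread, so it only shows the non-blocking outcomes, not real cross-agent synchronization.

```javascript
// Sketch only: single-threaded, so nothing here actually blocks or wakes another agent.
const sab = new SharedArrayBuffer(16);
const cells = new Int32Array(sab); // any agent that shares `sab` sees the same cells

// All JS atomics are sequentially consistent; there is no weaker ordering to choose.
Atomics.store(cells, 0, 1);
const previous = Atomics.add(cells, 0, 41); // read-modify-write, returns the old value
const current = Atomics.load(cells, 0);

// isLockFree only answers for an access size in bytes; size 4 is required to be lock-free.
const lockFree4 = Atomics.isLockFree(4);

// wait only blocks if the cell still holds the expected value.
const mismatch = Atomics.wait(cells, 1, 123, 0); // cell holds 0, not 123
const timedOut = Atomics.wait(cells, 1, 0, 0);   // value matches, but the timeout is 0 ms

// notify returns how many waiters were woken; nobody is waiting in this sketch.
const woken = Atomics.notify(cells, 1, 1);

console.log({ previous, current, lockFree4, mismatch, timedOut, woken });
```

For JS and wasm to interoperate in the way the list asks, a wasm-side wait on the same cell would have to be wakeable by this `notify` call and vice versa.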

In C++ this would look like shared memory (from C++ wasm's perspective JS is doing external modifications to the shared heap), so C++ wasm code should use volatile atomic. This implies that atomics translated from C++ have volatile atomic semantics, or that we add a volatile qualifier to identify memory locations which are potentially externally modified (or we add a "not-volatile" qualifier in a future revision, loosening the memory model without breaking existing code).

@lars-t-hansen
Collaborator Author

cc @littledan

I was chatting to @lukewagner and he suggested to me that one possible model (not memory model, really) for wasm is to go with a SAB-like system with an additional reserve/commit mechanism, to allow heap growth.
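
As a rough illustration of what a growable, SAB-like shared heap can look like from JS, the sketch below uses the `WebAssembly.Memory` JS API as it shipped after this discussion. Treating `maximum` as the "reserve" and `grow` as the "commit" is an analogy assumed here for illustration, not something the thread specifies.

```javascript
// Sketch only: uses the WebAssembly JS API that shipped after this thread.
const PAGE = 65536; // wasm page size in bytes

// A shared memory must declare a maximum, which bounds all future growth.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });
const before = memory.buffer; // a SharedArrayBuffer over the initial pages
const isShared = before instanceof SharedArrayBuffer;

const oldPages = memory.grow(1); // returns the previous size in pages
const after = memory.buffer;     // re-read the buffer to see the grown memory

console.log(isShared, oldPages, after.byteLength / PAGE);
```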

One big question to me is whether wasm will try to re-introduce the atomic objects of C++ and C, or whether it will go with the low-level model that is proposed for JS and asm.js, where the translator implements those objects on top of a flat memory. I assume it almost has to go the latter direction, in which case aligning the memory models would be simpler.

Another question is about the formalism used for defining the memory models. Currently the JS spec uses an axiomatic formalism based on happens-before rules and a somewhat loose idea of what might happen during races (since UB is not an option). This formalism has some known problems, notably with memory that can be accessed both as atomic and non-atomic, and there are also concerns about whether it can properly limit out-of-thin-air results without ad-hoc rules to prohibit that. That formalism is probably adequate, though, for the simple system in the JS spec, where everything is SC. But it is probably not adequate for wasm, and wasm needs something hairier anyway to deal with release/acquire and relaxed accesses. There's also the question (Issue #71) of whether the memory model needs to address the outcome of races more directly.

It would be good to know, as soon as we can, how wasm is planning to formalize its memory model, in case there's a chance that we can choose known-compatible formalizations.

@jfbastien
Contributor

I agree that wasm should go the SAB route, which is why I want to make sure we get SAB right.

@lukewagner

> One big question to me is whether wasm will try to re-introduce the atomic objects of C++ and C, or whether it will go with the low-level model that is proposed for JS and asm.js, where the translator implements those objects on top of a flat memory. I assume it almost has to go the latter direction, in which case aligning the memory models would be simpler.

Agreed. I think the litmus test is: can we reuse all the non-JS-specific bits of SAB/Atomics in the impl.

To your broader question, I think another litmus test is that, when it comes to the memory model and ordering constraints, we shouldn't need a "wasm" mode in the compiler that affects the codegen or optimizations on Atomic ops. I'm not aware of any such cases atm, but I'll admit I don't have an accurate picture in my head right now of what precisely is in the spec.

> It would be good to know, as soon as we can, how wasm is planning to formalize its memory model, in case there's a chance that we can choose known-compatible formalizations.

I'm not an expert (or even practitioner) on this topic, but I spent a bit of time reading into it last summer and became tentatively convinced that, for wasm, especially since we're starting with an operational semantics anyway, we should specify the memory model operationally. I won't attempt to summarize them here, but I found the arguments given at the beginning of Gustavo Petri's thesis really intuitive and compelling.

One idea I've had in the back of my mind is that we could factor out the memory model spec so that it can be literally shared by JS/wasm. Specifically, I was thinking the shared shmem spec would define a representative mini-language (perhaps just taking wasm and unifying nodes that are no different wrt the memory model) and then JS/wasm specs would define mapping into the mini-language (the wasm one perhaps being trivial). For JS, having to explicitly enumerate all features to define this mapping may end up sussing out interesting corner cases. Given the size/history of this field, I'm sure this approach has been considered before, so I'd be interested to hear whether it's worked in the past. cc @rossberg-chromium.

@lars-t-hansen
Collaborator Author

Agreed we should not have a "wasm" mode.

Re operational models, there was a nice paper in POPL this year that argues that it's possible to get around the thin-air values problem using one. (Addressing C++, they have to contend with lock/unlock but are allowed to assume that atomic objects don't overlap non-atomic ones.)

I'm not opposed to trying to share the JS and wasm models, but we have deadlines for the JS work that I think are still important (stage 3 in September is probably the latest we can go, and even that makes me very, very nervous, see also #91). Wasm has not yet started its shared memory work (for obvious, good reasons) and blocking on wasm is not at all appealing. The JS spec can define a mini-language as you suggest, but of what concrete value will it be if we can't hold it up against the wasm model? For all we know it will be unsuitable for what wasm ends up with. JS will not have any need to deal with release/acquire or relaxed atomics, for example, modulo our discussion of the outcomes of data races.

@littledan
Member

I strongly support sharing the memory model between JS and wasm. Ultimately, it seems very possible that the wasm heap will be accessible to JS as an SAB or something like it, or implicitly through the FFI; having two different memory models accessible to JS sounds like a very suboptimal outcome for both implementors and users.

Seems like there are two separate, orthogonal questions--how to integrate weaker atomics/other operations, and what subset of that to expose to "memory model embedders" like JS and wasm. Ideally, there would be one shared, factored-out piece of text that both would reference.

If there's not enough time for wasm contributions at the moment, but there are later discoveries of a bug or missing feature in the shared memory model, then of course we can take pull requests. At any stage in the process, and also in the main spec, pull requests can be made and accepted. This includes for integrating things with other operations, even if those operations aren't exposed to JS.

I like @lukewagner's idea of a mini-language to define the memory model in. In ECMASpeak terms, this could be a list of records which are operations. Some may object to mapping down to wasm as it could increase the effective size of a self-contained ECMAScript spec significantly (probably accidentally making references to lots of other parts of wasm), but personally I wouldn't mind.

@lukewagner

@lars-t-hansen Sorry, I failed to add that I think it's reasonable to start off with a natural-language-prose, axiomatic definition in JS (exactly like you have now) and "factor out" the memory model into a JS-wasm shared spec when we get serious about shmem in wasm (switching to operational at that point). There's some risk we'd put ourselves in a corner with the first version, but I expect we have a bit of leeway to tweak as long as we move on to the operational model quickly.

@littledan

> Ultimately, it seems very possible that the wasm heap will be accessible to JS as an SAB or something like it

Near-certainty, I'd say :)

> some may object to mapping down to wasm as it could increase the effective size of a self-contained ECMAScript spec significantly

I'd been hoping that there was a handwavy scheme that allowed us to sidestep or reuse most of the JS spec; if we had to compile many complex JS operations to wasm, that'd be a nonstarter. If there isn't a good handwavy way to avoid reimplementation, maybe an alternative to a mini-language is to have the shared shmem spec define an abstract shared-memory object and a set of abstract operations on it that would get called from both the JS and wasm specs (just like they both call abstract IEEE754 operations today). This might be overly simplistic, though, given that memory models involve an interplay between control flow and the actual memory access; I should probably stop guessing and someone more experienced in memory models should step in :)

@lars-t-hansen
Collaborator Author

@lukewagner, I could live with that, we'll see what others think. I'm actually a little concerned about whether the axiomatic model is strong enough to express desirable properties of what might happen for races (ref #71), and of course there's the thin-air issue (ref #82); there might end up being a number of prose side-conditions, which is not a great situation. An operational model might be clearer on those points (and more in the spirit of the ES spec).

@jfbastien
Contributor

One difference we'll observe between SAB and wasm is w.r.t. alignment: wasm mandates that unaligned accesses function properly.

The memory model currently assumes that SAB accesses are naturally aligned because there's no way to create unaligned accesses. wasm's current model will be different when we add atomics.

Should we:

  1. Design SAB's memory model with misaligned access support (even though SAB can't use it).
  2. Add misaligned access support to wasm only, and figure out how that affects SAB↔wasm interactions.
  3. Revisit wasm's misaligned access support.
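
The claim that typed-array views on a SAB cannot produce unaligned accesses can be checked directly. The sketch below is only an illustration of that JS-side restriction; it says nothing about what wasm should do.

```javascript
// Sketch only: shows why a typed-array view cannot express a misaligned access.
const sab = new SharedArrayBuffer(16);

let misalignedViewError = null;
try {
  new Int32Array(sab, 1, 1); // byte offset 1 is not a multiple of 4
} catch (e) {
  misalignedViewError = e;   // the constructor rejects the offset with a RangeError
}

// A naturally aligned offset is accepted, and atomics work on the resulting view.
const aligned = new Int32Array(sab, 4, 1);
Atomics.store(aligned, 0, 7);
const readBack = Atomics.load(aligned, 0);

console.log(misalignedViewError instanceof RangeError, readBack);
```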

@lars-t-hansen
Collaborator Author

I'm surprised that misaligned accesses are supported in wasm, but my surprise could stem from my belief that misaligned access support is disabled in most systems that use ARM and that the cost of emulating that in software could be prohibitive. I could be wrong about that. (Data welcome.)

Support for unaligned accesses is not necessarily a huge problem for a memory model since we could always say that non-atomic accesses are sets of byte accesses, but it perhaps makes it awkward to incorporate single-copy atomicity, if we want to talk about that (see #71). And unaligned atomics would be a no-no, but I assume that is already assumed by the wasm group.

I must confess I don't much feel like taking a bullet for wasm in this case, so options 2 and 3 have my vote, but option 1 does not. (Also we already know it's going to be challenging to formalize what we have to have for SAB.)

@jfbastien
Contributor

> I'm surprised that misaligned accesses are supported in wasm, but my surprise could stem from my belief that misaligned access support is disabled in most systems that use ARM and that the cost of emulating that in software could be prohibitive. I could be wrong about that. (Data welcome.)

Context from wasm is:

> And unaligned atomics would be a no-no but I assume that is already assumed by the wasm group.

How would you specify it, though?

> I must confess I don't much feel like taking a bullet for wasm in this case, so options 2 and 3 have my vote, but option 1 does not. (Also we already know it's going to be challenging to formalize what we have to have for SAB.)

As you'll note I'm not a fan of specifying that misaligned accesses "just work". If you want to push for option 3 then I'm happy to help. I think we can spec very sensible things, not all-out-undefined, while making the memory model sane.

@sunfishcode
Member

What if we just say that all unaligned accesses, plain or atomic, properly hinted or not, are sets of byte accesses?

@sunfishcode
Member

... plus permission to access additional bytes above and below, out to the nearest reasonable alignment boundary.

Big picture: Assuming this basic approach is reasonable, the SAB spec itself doesn't need to address it specifically; we can work out the details as part of the wasm work.

@lukewagner

Regarding "no unaligned accesses in JS": I thought that, if it's not in the proposal already, we would eventually allow creating a DataView on a SAB and this would give JS unaligned accesses.

Treating unaligned accesses semantically as a sequence of byte accesses has another advantage: if we disallow thin-air values for aligned accesses as is proposed in the other issue (defining it to be one of the racing values), then we'd transitively be disallowing thin-air values for unaligned accesses too (effectively constraining the result to a byte-level scrambling of all the racing stores).
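
A small sketch of the "sequence of byte accesses" reading: the hypothetical helper `loadU32ByBytes` assembles an unaligned 32-bit little-endian value from four independent byte loads and compares it with a `DataView` read at the same unaligned offset. Under a race, each of the four byte loads could observe a different store, which is where the byte-level scrambling would come from; this single-threaded sketch cannot show that part.

```javascript
// Sketch only: models an unaligned 32-bit load as four independent byte loads.
const sab = new SharedArrayBuffer(8);
const bytes = new Uint8Array(sab);
const view = new DataView(sab);

bytes.set([0x00, 0x11, 0x22, 0x33, 0x44, 0x00, 0x00, 0x00]);

// Hypothetical helper: little-endian load assembled one byte at a time.
function loadU32ByBytes(u8, byteOffset) {
  return (
    (u8[byteOffset] |
      (u8[byteOffset + 1] << 8) |
      (u8[byteOffset + 2] << 16) |
      (u8[byteOffset + 3] << 24)) >>> 0 // >>> 0 reinterprets the result as unsigned
  );
}

const viaBytes = loadU32ByBytes(bytes, 1);   // offset 1 is not 4-byte aligned
const viaDataView = view.getUint32(1, true); // DataView accepts unaligned offsets

console.log(viaBytes.toString(16), viaDataView.toString(16));
```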

@lukewagner

@lars-t-hansen The key hard constraints that motivated the current wasm design are:

  • avoid adding overhead to x86 and new ARM where unaligned accesses just work
  • avoid making unaligned accesses nondeterministic

Thus, the obvious clean design of having unaligned accesses throw would break the first constraint by requiring extra branches (I checked a long time ago and no, we can't use the x86 alignment flag for this purpose; it requires kernel permissions which often aren't present). The second obvious design, allowing unaligned access to fault only on archs where the hardware faults, breaks the second constraint and would be a super-common portability hazard (given that unaligned access is a relatively common bug in big codebases).

@lars-t-hansen
Collaborator Author

@lukewagner, the spec allows DataView on SAB but all accesses are racy. The ES spec is probably not specific on what the access granularity is; the implication is probably byte-by-byte access, but the implication has no teeth in unshared memory apart from avoiding alignment faults. In shared memory it may actually be observable how the accesses are implemented (byte-by-byte or wider), and if we start talking about single-copy atomicity in the spec (#71) that may start to matter. Not saying we can't do it, just that there are some interesting problems here.

@lars-t-hansen
Collaborator Author

Just to wrap this up:

We'll stick with the current axiomatic formalism for now. It'll be a goal to extract a joint memory model for JS and wasm later, probably using some operational formalism. The current JS work is so constrained that it does not seem to endanger whatever wasm wants to do in the future, and likely the work on wasm will be underway before the JS model needs to expand further.

If/when the axiomatic formalism collapses we'll revisit, but then with wasm in mind.

A stray observation re @jfbastien's "how to spec the prohibition on unaligned accesses": wasm should specify its atomic operations as operating on naturally aligned addresses exclusively (most hardware requires this) with (probably) a built-in cross-platform masking of the low bits of the effective address before the access or (less probably) a trap for non-aligned effective addresses. For atomics, unlike for normal heap accesses, the cost of the mask would be negligible.
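
The two options in that observation can be sketched as follows. Both helpers are hypothetical and written in JavaScript purely for illustration; the trap is modelled as a thrown `RangeError`, and neither is meant as wasm's actual semantics.

```javascript
// Sketch only: two hypothetical policies for a 32-bit atomic load at a byte address.
const sab = new SharedArrayBuffer(16);
const i32 = new Int32Array(sab);
Atomics.store(i32, 1, 99); // this cell occupies bytes 4..7

// Policy A: mask the low bits so the access always lands on an aligned cell.
function atomicLoad32Masked(cells, byteAddr) {
  const alignedAddr = byteAddr & ~3; // clear the two low address bits
  return Atomics.load(cells, alignedAddr >>> 2);
}

// Policy B: trap on a misaligned effective address instead of adjusting it.
function atomicLoad32Trapping(cells, byteAddr) {
  if ((byteAddr & 3) !== 0) throw new RangeError("misaligned atomic access");
  return Atomics.load(cells, byteAddr >>> 2);
}

console.log(atomicLoad32Masked(i32, 6), atomicLoad32Trapping(i32, 4));
```

With masking, byte address 6 silently resolves to the cell at byte 4; with trapping, the same address is rejected, which costs a compare and branch per access.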

@lars-t-hansen
Collaborator Author

Summary of this discussion added to DISCUSSION.md.
