Non-lock-free atomics + non-atomic exposes tearing #96
Comments
Is there any objection to standardizing the results people already expect in cases that C++ classifies as undefined?
What I mean by that is the platform-specific behavior that seems to persist across many kinds of hardware, even though it's not supposed to be "predictable". I'm feeling deja vu with signed shift right (of negative numbers) being "undefined" in a bunch of ye olde languages, even though most programmers were never aware that it was, or that it once had a good reason.
That's what I'm advocating for, yes.
What hardware would be a valid argument against this, then? It reminds me of ones' complement machines in the 70s being a driving force behind the undefined ASR special case.
I don't understand what you're asking. What's "this"? What I'm proposing is to clarify the memory model.
JF: At TC39 you made the claim that the observation of tearing by using
non-atomic operations to look at atomic values depends on whether the
atomics are lock-free or not. That's what I was repeatedly saying was
incorrect.
As you mentioned in the message above, when isLockFree(8) == 0, tearing can
be observed. However, when isLockFree(8) == 1, then tearing can still be
observed. Just because atomics can use an expensive store that stores
without tearing does not imply that all non-atomic loads are also done as
indivisible units. I've worked with architectures that don't do this, and
it's also perfectly reasonable for a compiler to split the 8-byte
non-atomic load into two 4-byte non-atomic loads.
I'm asking what hardware would give a result not outlined above.
You indeed were loudly repeating that something was incorrect, but that wasn't what I was talking about. It's now clear that what I said isn't what you understood, and I suggest that giving me the chance to finish what I'm saying would reduce miscommunication.
Are these architectures relevant to SAB / wasm? That would be an interesting datapoint which I believe we've ruled out so far.
Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, of that size. Remember that alignment isn't an issue for SAB: we know that accesses are naturally aligned.
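A small Python sketch of why alignment drops out. It assumes, as JavaScript typed array constructors require, that a view's byte offset is a multiple of its element size (otherwise construction throws a `RangeError`), and, as a further assumption, that the buffer itself starts suitably aligned. `element_byte_offset` is a hypothetical helper.

```python
def element_byte_offset(view_byte_offset, index, elem_size):
    """Byte offset, within the buffer, of element `index` of a typed view
    (hypothetical helper; models the typed array offset rule)."""
    if view_byte_offset % elem_size != 0:
        # Typed array constructors reject such offsets with a RangeError.
        raise ValueError("view offset must be a multiple of the element size")
    return view_byte_offset + index * elem_size

# Every element of an 8-byte view lands on an 8-byte boundary.
offsets = [element_byte_offset(16, i, 8) for i in range(4)]
print(offsets)  # [16, 24, 32, 40]
assert all(off % 8 == 0 for off in offsets)
```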
On Wed, Mar 30, 2016 at 4:52 PM, JF Bastien notifications@github.com
Yes.
Agreed. I can't find the reference, but one of the discussions was around
What I meant by "basically" is "not spelled out". Another suggestion in #59 was to specify all non-atomic accesses as "byte accesses", so the example I gave above could show much more tearing and reordering but would still seem sane.
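If every non-atomic access were specified as independent byte accesses, as suggested in #59, a racy 8-byte load could mix old and new bytes in any combination. A Python sketch that enumerates that set, assuming little-endian byte order and ignoring reordering; `byte_access_outcomes` is a hypothetical name.

```python
from itertools import product

def byte_access_outcomes(old, new, size=8):
    """All values a racy `size`-byte load could return if each byte is read
    independently and may see either the old or the new byte
    (hypothetical model, little-endian)."""
    old_bytes = old.to_bytes(size, "little")
    new_bytes = new.to_bytes(size, "little")
    outcomes = set()
    for picks in product((False, True), repeat=size):
        mixed = bytes(n if take_new else o
                      for o, n, take_new in zip(old_bytes, new_bytes, picks))
        outcomes.add(int.from_bytes(mixed, "little"))
    return outcomes

outcomes = byte_access_outcomes(0, 0xDEADBEEFCAFEC0DE)
print(len(outcomes))  # 256
```

Because every byte of `0xDEADBEEFCAFEC0DE` is non-zero, all 256 old/new byte combinations are distinct here, versus four when only 32-bit halves can tear. Every result is still assembled from bytes that were actually stored.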
FWIW, non-tearing for non-atomic (racy) loads of size n even when isLockFree(n)==true was not intended by me when I wrote the spec, so if it is implied by the prose then it is IMO overspecified. I may well have misunderstood the guarantees of is_lock_free() in C++ when I wrote the prose for SAB, and the prose for SAB is pretty dodgy and leaves too much to the imagination (issue #94 alludes to that), but the only thing I intended to guarantee was that atomic, non-racy accesses to the location would not acquire a lock. The early attempts we've made to specify the outcome of races should not affect that, though addressing #71 might.
Since we only have atomics up to 4 bytes at the moment, it's actually true on all hardware we currently care about that isLockFree(n) implies that a racy load of size n will not tear when the obvious code is generated. But (a) the intent is to allow non-obvious code, and (b) if we had int64 types then they could still be lock-free on ARMv7 with native atomic instructions (use LDREXD to load and then cancel the reservation) but tear when loaded with obvious native non-atomic instructions (LDRD is single-copy atomic only with the large address extension), and the intent, again, is to allow that mapping.
In C++, is_lock_free applies only to atomic accesses. It doesn't change their meaning other than imposing the guarantee that they are not blocked by suspended threads. Data races are simply undefined behavior.
My mistake then, I'd read it as implied without being specified.
Yes, this is critical to maintain: wasm will likely add support for signals, and non-lock-free atomics aren't signal-safe (note: that part of the standard will be overhauled for C++17; the current broken concept of "Plain Old Function" should be gone).
Would there be a benefit to splitting isLockFree into two versions that report the minimum and maximum atomicities instead? That is, atomic even when not specified as atomic, versus atomic only when accessed as atomic.
That might be a more interesting question in the context of #71, and/or in the discussion around single-copy atomicity guarantees, though I confess my initial reaction is that it won't be very useful. The existing isLockFree is only marginally useful as it is, now that we've decreed that int32 and uint32 are always lock-free; almost all code will likely use four-byte ints for guaranteed-fastest atomic operations except when extremely memory-constrained.
I will add a short comment to the section on lock-freedom to clarify the matter raised here, so that there's less scope for confusion in the future.
From a discussion at TC39, @waldemarhorwat disagreed when I said that non-lock-free atomics allowed the user to observe tearing when using non-atomics.
This falls out of the spec but isn't explicit, so I'd like to get @waldemarhorwat's agreement, and I think we'll want to state it explicitly in the spec.
Consider this pseudo-code:
What are the allowed outcomes?
When `isLockFree(8) === 1`, the allowed outcomes are that thread 2 prints either `0` or `0xDEADBEEFCAFEC0DE`.
When `isLockFree(8) === 0`, tearing can be observed:
- `Atomics.store` of a 64-bit value operates by calling `__atomic_*` operations. These are in a shared library and usually acquire a lock from a lock-shard (hashed based on the address of the atomic, and potentially the size) to reduce contention. See the compiler-rt and libatomic implementations.
- The non-atomic load is racy. In C++ that's UB, but for SAB we'd like to define what the race can produce. Since we already guarantee that 32-bit operations are lock-free, the only sane outcomes here are that thread 2 prints one of `0`, `0xDEADBEEFCAFEC0DE`, `0xDEADBEEF00000000`, or `0x00000000CAFEC0DE`.
There are some other interesting things to spec out when non-lock-free accesses are used: what are valid lock implementations? That will affect what type of tearing can be observed when mixing accesses of different sizes, or when mixing atomic and non-atomic accesses. Again, I think specifying that some interleaving of old and new sub-values can be observed makes sense, without specifying their order.
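The outcome sets above can be checked with a small Python model that enumerates every schedule of a racing 64-bit store and plain 64-bit load, treating each access either as one indivisible step or as two 32-bit steps in either order. It is a sketch under those assumptions, not a statement of what the spec allows. `interleavings` and `racy_outcomes` are hypothetical names, and any lock taken by a non-lock-free store is left out, because the plain load never acquires it and so the lock cannot constrain the schedule.

```python
from itertools import permutations

MASK32 = 0xFFFF_FFFF

def interleavings(a, b):
    """Every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def steps(kind, split):
    """Possible step sequences for one 64-bit access (hypothetical model)."""
    if split:  # two 32-bit accesses, low/high in either order
        return [[(kind, (i,)), (kind, (j,))] for i, j in permutations((0, 1))]
    return [[(kind, (0, 1))]]  # one indivisible 64-bit access

def racy_outcomes(old, new, store_split, load_split):
    """Values a plain load can return while a store of `new` races with it."""
    new_words = [new & MASK32, new >> 32]
    results = set()
    for store in steps("store", store_split):
        for load in steps("load", load_split):
            for schedule in interleavings(store, load):
                mem = [old & MASK32, old >> 32]   # [low word, high word]
                seen = [None, None]               # words observed by the load
                for kind, words in schedule:
                    for w in words:
                        if kind == "store":
                            mem[w] = new_words[w]
                        else:
                            seen[w] = mem[w]
                results.add((seen[1] << 32) | seen[0])
    return results

for store_split in (False, True):
    for load_split in (False, True):
        found = racy_outcomes(0, 0xDEADBEEFCAFEC0DE, store_split, load_split)
        print(store_split, load_split, sorted(hex(v) for v in found))
```

In this model, only when both the store and the load are indivisible does the set shrink to the old and new values. Splitting either side (the lock-based store, or a load the compiler splits) yields exactly the four values listed above.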