Non-lock-free atomics + non-atomic exposes tearing #96

jfbastien · 2016-03-30T20:46:36Z

From a discussion at TC39, @waldemarhorwat disagreed when I said that non-lock-free atomics allowed the user to observe tearing when using non-atomics.

This falls out of the spec but isn't explicit, so I'd like to get @waldemarhorwat's agreement, and I think we'll want to state it explicitly in the spec.

Consider this pseudo-code:

i64 global = 0;

// Thread 1:                                   | // Thread 2:
Atomics.store(global, 0xDEADBEEFCAFEC0DE);     | i64 local = global; // Non-atomic load!
                                               | print(local);

What are the allowed outcomes?

When isLockFree(8) === 1 then the allowed outcomes are that thread 2 prints either 0 or 0xDEADBEEFCAFEC0DE.

When isLockFree(8) === 0 then tearing can be observed: Atomics.store of a 64-bit value operates by:

1. Acquiring a lock in the implementation.
   - In C++ this is provided by gccmm, the non-lock-free __atomic_* operations are in a shared library and usually acquire a lock from a lock-shard (hashed based on the address of the atomic, and potentially the size) to reduce contention. See compiler-rt and libatomic implementations.
   - In SAB / wasm this would be provided by the VM.
2. Doing the store non-atomically.
3. Unlocking.
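
The three steps above can be sketched in JavaScript on top of 32-bit atomics. This is only an illustration under stated assumptions, not how any VM or libatomic actually does it: `lockedStore64`, `shardFor`, the shard count, and the buffer layout (lock words first, then a low-half/high-half data slot) are all hypothetical.

```javascript
// Hypothetical layout: the first SHARDS 32-bit words are locks, the data follows.
const SHARDS = 4;
const sab = new SharedArrayBuffer(SHARDS * 4 + 8);
const locks = new Int32Array(sab, 0, SHARDS);
const data = new Uint32Array(sab, SHARDS * 4, 2); // [low half, high half]

// Pick a lock shard by hashing the address (here: the 8-byte slot index).
function shardFor(byteOffset) {
  return (byteOffset >>> 3) % SHARDS;
}

function lockedStore64(byteOffset, value) {
  const shard = shardFor(byteOffset);
  const i = byteOffset >>> 2;
  // 1. Acquire the lock (spin until we swap 0 -> 1).
  while (Atomics.compareExchange(locks, shard, 0, 1) !== 0) {}
  // 2. Do the store non-atomically, as two 32-bit halves.
  data[i] = Number(value & 0xFFFFFFFFn);
  // A racing plain load that runs here sees the new low half and the old high half.
  data[i + 1] = Number(value >> 32n);
  // 3. Unlock.
  Atomics.store(locks, shard, 0);
}

lockedStore64(0, 0xDEADBEEFCAFEC0DEn);
const stored = (BigInt(data[1]) << 32n) | BigInt(data[0]);
console.log(stored.toString(16)); // deadbeefcafec0de
```

The lock only excludes other code that takes the same lock. A plain load never takes it, which is why the window between the two half-stores is visible to a racy reader.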

The non-atomic load is racy. In C++ that's UB, but for SAB we'd like to define what the race can produce. Since we already guarantee that 32-bit operations are lock-free, the only sane outcome here is that you can print one of 0, 0xDEADBEEFCAFEC0DE, 0xDEADBEEF00000000, or 0x00000000CAFEC0DE.
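
As a quick check of that list, the four values are exactly the combinations of an old or new high 32-bit half with an old or new low 32-bit half. A minimal sketch (the names `oldValue`, `newValue`, and `outcomes` are made up for illustration):

```javascript
const oldValue = 0n;
const newValue = 0xDEADBEEFCAFEC0DEn;
const LOW = 0xFFFFFFFFn;
const HIGH = LOW << 32n;

// Each 32-bit half independently comes from the old or the new value.
const outcomes = new Set();
for (const hi of [oldValue, newValue]) {
  for (const lo of [oldValue, newValue]) {
    outcomes.add((hi & HIGH) | (lo & LOW));
  }
}

for (const v of outcomes) {
  console.log("0x" + v.toString(16).padStart(16, "0"));
}
```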

There are some other interesting things to spec out when non-lock-free accesses are used: what are valid lock implementations? That'll affect what type of tearing can be observed when mixing accesses of different size or when mixing atomic and non-atomic accesses. Again, I think specifying that some interleaving of old+new sub-values can be observed makes sense, without specifying their order.


taisel · 2016-03-30T22:10:18Z

Any issue against standardizing the already expected results that are classified as undefined in C++?

taisel · 2016-03-30T22:10:56Z

What I mean for that is the platform specific behavior that seems to persist across multiple hw, even though it's not supposed to be "predictable".

I'm feeling deja vu with signed shift right (of negative numbers) being "undefined" in a bunch of ye olde languages even though most programmers were never aware it was or that it once had a good reason.

jfbastien · 2016-03-30T22:26:46Z

Any issue against standardizing the already expected results that are classified as undefined in C++?

That's what I'm advocating for, yes.

taisel · 2016-03-30T23:11:32Z

What hardware would be a valid opposition to this then? Reminds me of ones' complement machines in the 70s being a driving force behind the undefined ASR special case.

jfbastien · 2016-03-30T23:35:44Z

What hardware would be a valid opposition to this then?

I don't understand what you're asking. What's "this"? What I'm proposing is to clarify the memory model.

waldemarhorwat · 2016-03-30T23:36:59Z

JF: At TC39 you made the claim that the observation of tearing by using non-atomic operations to look at atomic values depends on whether the atomics are lock-free or not. That's what I was repeatedly saying was incorrect. As you mentioned in the message above, when isLockFree(8) == 0, tearing can be observed. However, when isLockFree(8) == 1, then tearing can still be observed. Just because atomics can use an expensive store that stores without tearing does not imply that all non-atomic loads are also done as indivisible units. I've worked with architectures that don't do this, and it's also perfectly reasonable for a compiler to split the 8-byte non-atomic load into two 4-byte non-atomic loads.
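
The split-load scenario can be replayed deterministically in a single thread. This is a sketch of one possible interleaving, not a real race: the two 4-byte loads are written out by hand to mimic what a compiler might emit, and it assumes an engine that supports `Atomics` on `BigUint64Array`.

```javascript
const buf = new SharedArrayBuffer(8);
const asU64 = new BigUint64Array(buf); // view used by the atomic store
const asU32 = new Uint32Array(buf);    // view used by the split plain load
const NEW_VALUE = 0xDEADBEEFCAFEC0DEn;

// Reader, step 1: first 4-byte plain load (sees the old value, 0).
const firstHalf = asU32[0];
// Writer: one indivisible 64-bit atomic store lands between the two loads.
Atomics.store(asU64, 0, NEW_VALUE);
// Reader, step 2: second 4-byte plain load (sees part of the new value).
const secondHalf = asU32[1];

// Reassemble the halves in the same memory order in which they were read.
const scratch = new Uint32Array([firstHalf, secondHalf]);
const observed = new BigUint64Array(scratch.buffer)[0];
console.log(observed !== 0n && observed !== NEW_VALUE); // true: a torn value
```

The store itself never tears here; the tearing comes entirely from the reader's two separate loads.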

taisel · 2016-03-30T23:41:26Z

I'm asking what hardware would give a result not outlined above.

jfbastien · 2016-03-30T23:52:02Z

JF: At TC39 you made the claim that the observation of tearing by using
non-atomic operations to look at atomic values depends on whether the
atomics are lock-free or not. That's what I was repeatedly saying was
incorrect.

You indeed were loudly repeating that something was incorrect, but that wasn't what I was talking about. It's now clear that what I said isn't what you understood, and I suggest that giving me the chance to finish what I'm saying would reduce miscommunication.

As you mentioned in the message above, when isLockFree(8) == 0, tearing can
be observed. However, when isLockFree(8) == 1, then tearing can still be
observed. Just because atomics can use an expensive store that stores
without tearing does not imply that all non-atomic loads are also done as
indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an interesting datapoint which I believe we've ruled out so far.

it's also perfectly reasonable for a compiler to split the 8-byte
non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.

Remember that alignment isn't an issue for SAB: we know that accesses are naturally aligned.

waldemarhorwat · 2016-03-31T07:37:09Z

On Wed, Mar 30, 2016 at 4:52 PM, JF Bastien notifications@github.com
wrote:

JF: At TC39 you made the claim that the observation of tearing by using
non-atomic operations to look at atomic values depends on whether the
atomics are lock-free or not. That's what I was repeatedly saying was
incorrect.

You indeed were loudly repeating that something was incorrect, but that
wasn't what I was talking about. It's now clear that what I said isn't what
you understood, and I suggest that giving me the chance to finish what I'm
saying would reduce miscommunication.

To set the record straight, it is you who started loudly repeating that I
was wrong while providing no other information. You kept on repeating I was
wrong when I said that making the atomic operations lock-free is not
sufficient to prevent tearing without giving me a chance to finish or
offering any explanation. Please reconsider your behavior; it wasn't
appropriate.

As you mentioned in the message above, when isLockFree(8) == 0, tearing can
be observed. However, when isLockFree(8) == 1, then tearing can still be
observed. Just because atomics can use an expensive store that stores
without tearing does not imply that all non-atomic loads are also done as
indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an
interesting datapoint which I believe we've ruled out so far.

Is x86 relevant to SAB/wasm? 32-bit x86 has that characteristic — it
readily supports 64-bit lock-free atomics via CMPXCHG8B but you definitely
wouldn't want to use that for all plain loads and stores. 64-bit x86 has
the same property, just with double the bits. Other architectures I've
worked with also have this property.

it's also perfectly reasonable for a compiler to split the 8-byte
non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting
for accesses, atomic and non-atomic, if that size isLockFree. It's
reasonable but forbidden.

It is? Where is it mandating that?

jfbastien · 2016-03-31T16:02:59Z

As you mentioned in the message above, when isLockFree(8) == 0, tearing can be observed. However, when isLockFree(8) == 1, then tearing can still be observed. Just because atomics can use an expensive store that stores without tearing does not imply that all non-atomic loads are also done as indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an interesting datapoint which I believe we've ruled out so far.

Is x86 relevant to SAB/wasm?

Yes.

32-bit x86 has that characteristic — it readily supports 64-bit lock-free atomics via CMPXCHG8B but you definitely wouldn't want to use that for all plain loads and stores. 64-bit x86 has the same property, just with double the bits. Other architectures I've worked with also have this property.

Agreed. I can't find the reference, but one of the discussions was about isLockFree being guaranteed for pointer size but no more. As you point out, this means that the example I provided at TC39 (a linked list with two pointers) won't be very useful.

it's also perfectly reasonable for a compiler to split the 8-byte non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.

It is? Where is it mandating that?

What I meant by "basically": "not spelled out".

Another suggestion in #59 was to spec all non-atomic accesses as "byte accesses", so the example I gave above could have much more tearing and reordering but still seems sane.
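
To give a sense of how much more tearing a byte-access model would allow for the same example, here is a small sketch that enumerates every byte-wise mix of the old and new values. It ignores reordering and only counts values; `OLD`, `NEW`, and `byteOutcomes` are names invented for the illustration.

```javascript
const OLD = 0n;
const NEW = 0xDEADBEEFCAFEC0DEn;

const byteOutcomes = new Set();
// Each bit of `mask` decides whether the matching byte comes from NEW or OLD.
for (let mask = 0; mask < 256; mask++) {
  let value = 0n;
  for (let byte = 0; byte < 8; byte++) {
    const source = (mask >> byte) & 1 ? NEW : OLD;
    value |= source & (0xFFn << BigInt(8 * byte));
  }
  byteOutcomes.add(value);
}

console.log(byteOutcomes.size); // 256
```

Every byte of the new value differs from the corresponding old byte, so all 256 combinations are distinct, compared with 4 under 32-bit granularity.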

lars-t-hansen · 2016-03-31T19:50:25Z

it's also perfectly reasonable for a compiler to split the 8-byte non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.
It is? Where is it mandating that?

What I meant by "basically": "not spelled out".

FWIW, non-tearing for non-atomic (racy) loads of size n even when isLockFree(n)==true was not intended by me when I wrote the spec, so if it is implied by the prose then it is IMO overspecified.

I may well have misunderstood the guarantees of is_lock_free() in C++ when I wrote the prose for SAB, and the prose for SAB is pretty dodgy and leaves too much to the imagination - issue #94 alludes to that - but the only thing I intended to guarantee was that atomic, non-racy accesses to the location would not acquire a lock. The early attempts we've made to specify the outcome of races should not affect that, though addressing #71 might.

Since we only have atomics up to 4 bytes at the moment it's actually true on all hardware we currently care about that isLockFree(n) implies that a racy load of size n will not tear when the obvious code is generated, but (a) the intent is to allow non-obvious code and (b) if we had int64 types then they could still be lock-free on ARMv7 with native atomic instructions (use LDREXD to load and then cancel the reservation) but tear when loaded with obvious native non-atomic instructions (LDRD is single-copy atomic only with the large address extension), and the intent, again, is to allow that mapping.

waldemarhorwat · 2016-03-31T20:09:24Z

In C++ is_lock_free applies only to atomic accesses. It doesn't change their meaning other than imposing performance guarantees of not being blocked by suspended threads. Data races are simply undefined behavior.

jfbastien · 2016-03-31T20:17:42Z

FWIW, non-tearing for non-atomic (racy) loads of size n even when isLockFree(n)==true was not intended by me when I wrote the spec, so if it is implied by the prose then it is IMO overspecified.

My mistake then, I'd read it as implied without being specified.

atomic, non-racy accesses to the location would not acquire a lock

Yes, this is critical to maintain: wasm will likely add support for signals and non-lock-free atomics aren't signal-safe (note: that part of the standard will be overhauled for C++17, the current broken concept of "Plain Old Function" should be gone).

taisel · 2016-04-01T01:44:32Z

Would there be a benefit to splitting isLockFree into two versions to report the minimum and maximum atomicities instead? That is, atomic even when not specified as atomic versus atomic only when accessed as atomic.

lars-t-hansen · 2016-04-01T02:07:47Z

Would there be a benefit to splitting isLockFree into two versions to report the minimum and maximum atomicities instead? That is, atomic even when not specified as atomic versus atomic only when accessed as atomic.

That might be a more interesting question in the context of #71, and/or in the discussion around single-copy atomicity guarantees, though I confess my initial reaction is that it won't be very useful. The existing isLockFree is only marginally useful as it is, now that we've decreed that int32 and uint32 are always lock-free; almost all code will likely use four-byte ints for guaranteed-fastest atomic operations except when extremely memory-constrained.
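
For reference, a minimal sketch of querying `Atomics.isLockFree` for the usual element sizes. The `report` object is just for illustration; only the result for size 4 is guaranteed, the others depend on the engine and hardware.

```javascript
const report = {};
for (const size of [1, 2, 4, 8]) {
  report[size] = Atomics.isLockFree(size);
}
console.log(report);
```

As discussed earlier in this thread, a `true` result says nothing about whether plain (non-atomic) accesses of that size can tear.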

lars-t-hansen · 2016-05-04T13:40:15Z

I will add a short comment to the section on lock-freedom to clarify the matter raised here, so that there's less scope for confusion in the future.

lars-t-hansen added the Memory model label Mar 31, 2016

lars-t-hansen closed this as completed in c0800ce May 4, 2016
