This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Non-lock-free atomics + non-atomic exposes tearing #96

Closed
jfbastien opened this issue Mar 30, 2016 · 16 comments

Comments

@jfbastien
Contributor

From a discussion at TC39, @waldemarhorwat disagreed when I said that non-lock-free atomics allowed the user to observe tearing when using non-atomics.

This falls out of the spec but isn't explicit, so I'd like to get @waldemarhorwat's agreement, and I think we'll want to state it explicitly in the spec.

Consider this pseudo-code:

i64 global = 0;

// Thread 1:                                   | // Thread 2:
Atomics.store(global, 0xDEADBEEFCAFEC0DE);     | i64 local = global; // Non-atomic load!
                                               | print(local);

What are the allowed outcomes?

When isLockFree(8) === 1 then the allowed outcomes are that thread 2 prints either 0 or 0xDEADBEEFCAFEC0DE.

When isLockFree(8) === 0 then tearing can be observed: Atomics.store of a 64-bit value operates by:

  1. Acquiring a lock in the implementation.
    • In C++ this is provided by gccmm, the non-lock-free __atomic_* operations are in a shared library and usually acquire a lock from a lock-shard (hashed based on the address of the atomic, and potentially the size) to reduce contention. See compiler-rt and libatomic implementations.
    • In SAB / wasm this would be provided by the VM.
  2. Doing the store non-atomically.
  3. Unlocking.
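
A minimal sketch of those three steps, assuming a little-endian layout and a made-up shard count. The names `SHARDS`, `shardFor`, `lockedStore64` and `plainLoad64` are hypothetical and do not come from any VM, compiler-rt, or libatomic; the point is only to show where a racy plain load could land between the two half-stores.

```javascript
// Hypothetical sketch: none of these names come from a real VM or library.
const SHARDS = 16; // assumed shard count
const locks = new Int32Array(new SharedArrayBuffer(SHARDS * 4));
const heap = new Uint32Array(new SharedArrayBuffer(64)); // shared memory as 32-bit words

// Pick a lock from the address of the 8-byte location.
function shardFor(byteOffset) {
  return (byteOffset >>> 3) % SHARDS;
}

function lockedStore64(byteOffset, value) {
  const shard = shardFor(byteOffset);
  // 1. Acquire the lock: spin until we swap 0 -> 1.
  while (Atomics.compareExchange(locks, shard, 0, 1) !== 0) { /* spin */ }
  // 2. Do the store non-atomically, as two 32-bit halves (low half first).
  const word = byteOffset >>> 2;
  heap[word] = Number(value & 0xFFFFFFFFn);
  // A racy plain load that runs here does not take the lock and sees a torn value.
  heap[word + 1] = Number(value >> 32n);
  // 3. Unlock.
  Atomics.store(locks, shard, 0);
}

// Plain (non-atomic) 64-bit load built from two 32-bit loads; it ignores the lock.
function plainLoad64(byteOffset) {
  const word = byteOffset >>> 2;
  return (BigInt(heap[word + 1]) << 32n) | BigInt(heap[word]);
}

lockedStore64(8, 0xDEADBEEFCAFEC0DEn);
```

The lock only excludes other users of the same lock, which is why a plain load that never takes it can observe the intermediate state.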

The non-atomic load is racy. In C++ that's UB, but for SAB we'd like to define what the race can produce. Since we already guarantee that 32-bit operations are lock-free, the only sane outcome here is that you can print one of 0, 0xDEADBEEFCAFEC0DE, 0xDEADBEEF00000000, or 0x00000000CAFEC0DE.
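
The four values can be derived mechanically by letting each 32-bit half come independently from the old or the new value. This is only an illustration of that counting argument; `tornOutcomes32` is a hypothetical helper, not part of any API.

```javascript
const OLD = 0n;
const NEW = 0xDEADBEEFCAFEC0DEn;
const LOW_MASK = 0xFFFFFFFFn;

// Every combination of {old, new} high half with {old, new} low half.
function tornOutcomes32(oldValue, newValue) {
  const outcomes = new Set();
  for (const high of [oldValue, newValue]) {
    for (const low of [oldValue, newValue]) {
      outcomes.add(((high >> 32n) << 32n) | (low & LOW_MASK));
    }
  }
  return outcomes;
}

const halfOutcomes = tornOutcomes32(OLD, NEW);
for (const v of halfOutcomes) {
  console.log("0x" + v.toString(16).toUpperCase().padStart(16, "0"));
}
```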

There are some other interesting things to spec out when non-lock-free accesses are used: what are valid lock implementations? That'll affect what type of tearing can be observed when mixing accesses of different size or when mixing atomic and non-atomic accesses. Again, I think specifying that some interleaving of old+new sub-values can be observed makes sense, without specifying their order.

@taisel

taisel commented Mar 30, 2016

Any issue against standardizing the already expected results that are classified as undefined in C++?

@taisel

taisel commented Mar 30, 2016

What I mean by that is the platform-specific behavior that seems to persist across multiple hw, even though it's not supposed to be "predictable".

I'm feeling deja vu with signed shift right (of negative numbers) being "undefined" in a bunch of ye olde languages even though most programmers were never aware it was or that it once had a good reason.

@jfbastien
Contributor Author

Any issue against standardizing the already expected results that are classified as undefined in C++?

That's what I'm advocating for, yes.

@taisel

taisel commented Mar 30, 2016

What hardware would be a valid opposition to this then? Reminds me of ones' complement machines in the 70s being a driving force behind the undefined ASR special case.

@jfbastien
Contributor Author

What hardware would be a valid opposition to this then?

I don't understand what you're asking. What's "this"? What I'm proposing is to clarify the memory model.

@waldemarhorwat

waldemarhorwat commented Mar 30, 2016 via email

@taisel

taisel commented Mar 30, 2016

I'm asking what hardware would give a result not outlined above.

@jfbastien
Contributor Author

JF: At TC39 you made the claim that the observation of tearing by using
non-atomic operations to look at atomic values depends on whether the
atomics are lock-free or not. That's what I was repeatedly saying was
incorrect.

You indeed were loudly repeating that something was incorrect, but that wasn't what I was talking about. It's now clear that what I said isn't what you understood, and I suggest that giving me the chance to finish what I'm saying would reduce miscommunication.

As you mentioned in the message above, when isLockFree(8) == 0, tearing can
be observed. However, when isLockFree(8) == 1, then tearing can still be
observed. Just because atomics can use an expensive store that stores
without tearing does not imply that all non-atomic loads are also done as
indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an interesting datapoint which I believe we've ruled out so far.

it's also perfectly reasonable for a compiler to split the 8-byte
non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.

Remember that alignment isn't an issue for SAB: we know that accesses are naturally aligned.

@waldemarhorwat

On Wed, Mar 30, 2016 at 4:52 PM, JF Bastien notifications@github.com
wrote:

JF: At TC39 you made the claim that the observation of tearing by using
non-atomic operations to look at atomic values depends on whether the
atomics are lock-free or not. That's what I was repeatedly saying was
incorrect.

You indeed were loudly repeating that something was incorrect, but that
wasn't what I was talking about. It's now clear that what I said isn't what
you understood, and I suggest that giving me the chance to finish what I'm
saying would reduce miscommunication.

To set the record straight, it is you who started loudly repeating that I
was wrong while providing no other information. You kept on repeating I was
wrong when I said that making the atomic operations lock-free is not
sufficient to prevent tearing without giving me a chance to finish or
offering any explanation. Please reconsider your behavior; it wasn't
appropriate.

As you mentioned in the message above, when isLockFree(8) == 0, tearing can
be observed. However, when isLockFree(8) == 1, then tearing can still be
observed. Just because atomics can use an expensive store that stores
without tearing does not imply that all non-atomic loads are also done as
indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an
interesting datapoint which I believe we've ruled out so far.

Is x86 relevant to SAB/wasm? 32-bit x86 has that characteristic — it
readily supports 64-bit lock-free atomics via CMPXCHG8B but you definitely
wouldn't want to use that for all plain loads and stores. 64-bit x86 has
the same property, just with double the bits. Other architectures I've
worked with also have this property.

it's also perfectly reasonable for a compiler to split the 8-byte
non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting
for accesses, atomic and non-atomic, if that size isLockFree. It's
reasonable but forbidden.

It is? Where is it mandating that?

@jfbastien
Contributor Author

As you mentioned in the message above, when isLockFree(8) == 0, tearing can be observed. However, when isLockFree(8) == 1, then tearing can still be observed. Just because atomics can use an expensive store that stores without tearing does not imply that all non-atomic loads are also done as indivisible units. I've worked with architectures that don't do this, and

Are these architectures relevant to SAB / wasm? That would be an interesting datapoint which I believe we've ruled out so far.

Is x86 relevant to SAB/wasm?

Yes.

32-bit x86 has that characteristic — it readily supports 64-bit lock-free atomics via CMPXCHG8B but you definitely wouldn't want to use that for all plain loads and stores. 64-bit x86 has the same property, just with double the bits. Other architectures I've worked with also have this property.

Agreed. I can't find the reference, but one of the discussions was about isLockFree being guaranteed for pointer size but no more. As you point out, this means that the example I provided at TC39 (linked-list with two pointers) won't be very useful.

it's also perfectly reasonable for a compiler to split the 8-byte non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.

It is? Where is it mandating that?

What I meant by "basically": "not spelled out".

Another suggestion in #59 was to spec all non-atomic accesses as "byte accesses", so the example I gave above could have much more tearing and reordering but still seems sane.
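
As a rough illustration of how much more tearing the byte-access reading allows, the sketch below lets each of the 8 bytes come independently from the old or the new value. `tornOutcomesBytes` is a hypothetical helper, and the model ignores reordering entirely; it only counts values.

```javascript
// Each bit of `pick` selects, for one byte, whether it comes from the new or the old value.
function tornOutcomesBytes(oldValue, newValue) {
  const outcomes = new Set();
  for (let pick = 0; pick < 256; pick++) {
    let value = 0n;
    for (let byte = 0; byte < 8; byte++) {
      const source = ((pick >> byte) & 1) ? newValue : oldValue;
      const shift = BigInt(8 * byte);
      value |= ((source >> shift) & 0xFFn) << shift;
    }
    outcomes.add(value);
  }
  return outcomes;
}

const byteOutcomes = tornOutcomesBytes(0n, 0xDEADBEEFCAFEC0DEn);
console.log(byteOutcomes.size);
```

For this particular pair of values every byte of the new value differs from the old one, so the set has 256 members instead of 4.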

@lars-t-hansen
Collaborator

it's also perfectly reasonable for a compiler to split the 8-byte non-atomic load into two 4-byte non-atomic loads.

Right: the current spec is basically mandating that there be no splitting for accesses, atomic and non-atomic, if that size isLockFree. It's reasonable but forbidden.

It is? Where is it mandating that?

What I meant by "basically": "not spelled out".

FWIW, non-tearing for non-atomic (racy) loads of size n even when isLockFree(n)==true was not intended by me when I wrote the spec, so if it is implied by the prose then it is IMO overspecified.

I may well have misunderstood the guarantees of is_lock_free() in C++ when I wrote the prose for SAB, and the prose for SAB is pretty dodgy and leaves too much to the imagination - issue #94 alludes to that - but the only thing I intended to guarantee was that atomic, non-racy accesses to the location would not acquire a lock. The early attempts we've made to specify the outcome of races should not affect that, though addressing #71 might.

Since we only have atomics up to 4 bytes at the moment it's actually true on all hardware we currently care about that isLockFree(n) implies that a racy load of size n will not tear when the obvious code is generated, but (a) the intent is to allow non-obvious code and (b) if we had int64 types then they could still be lock-free on ARMv7 with native atomic instructions (use LDREXD to load and then cancel the reservation) but tear when loaded with obvious native non-atomic instructions (LDRD is single-copy atomic only with the large address extension), and the intent, again, is to allow that mapping.
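
A small deterministic model of that situation, under the assumption that the 64-bit store is indivisible while the plain load is split into two 4-byte loads (low half first). `readerSees` is a hypothetical helper, and the single-threaded interleaving stands in for a real race.

```javascript
// Two 32-bit words model one 64-bit location (index 0 = low half, assumed).
// storeAtStep chooses where the indivisible store lands relative to the two loads.
function readerSees(storeAtStep) {
  const mem = new Uint32Array(2); // location starts as 0
  // The lock-free store is modeled as one step: nothing can run inside it.
  const store64 = () => { mem[0] = 0xCAFEC0DE; mem[1] = 0xDEADBEEF; };

  if (storeAtStep === 0) store64(); // store before both loads
  const low = mem[0];               // first 4-byte load
  if (storeAtStep === 1) store64(); // store lands between the two loads
  const high = mem[1];              // second 4-byte load
  if (storeAtStep === 2) store64(); // store after both loads

  return (BigInt(high) << 32n) | BigInt(low);
}

for (const step of [0, 1, 2]) {
  console.log(step, "0x" + readerSees(step).toString(16).padStart(16, "0"));
}
```

Even though the store never tears, the middle interleaving yields an old low half and a new high half, which is the tearing the split load exposes.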

@waldemarhorwat

In C++ is_lock_free applies only to atomic accesses. It doesn't change their meaning other than imposing performance guarantees of not being blocked by suspended threads. Data races are simply undefined behavior.

@jfbastien
Contributor Author

FWIW, non-tearing for non-atomic (racy) loads of size n even when isLockFree(n)==true was not intended by me when I wrote the spec, so if it is implied by the prose then it is IMO overspecified.

My mistake then, I'd read it as implied without being specified.

atomic, non-racy accesses to the location would not acquire a lock

Yes, this is critical to maintain: wasm will likely add support for signals and non-lock-free atomics aren't signal-safe (note: that part of the standard will be overhauled for C++17, the current broken concept of "Plain Old Function" should be gone).

@taisel

taisel commented Apr 1, 2016

Would there be a benefit to splitting isLockFree into two versions to report the minimum and maximum atomicities instead? That is, atomic even when not specified as atomic versus atomic only when accessed as atomic.

@lars-t-hansen
Collaborator

Would there be a benefit to splitting isLockFree into two versions to report the minimum and maximum atomicities instead? That is, atomic even when not specified as atomic versus atomic only when accessed as atomic.

That might be a more interesting question in the context of #71, and/or in the discussion around single-copy atomicity guarantees, though I confess my initial reaction is that it won't be very useful. The existing isLockFree is only marginally useful as it is, now that we've decreed that int32 and uint32 are always lock-free; almost all code will likely use four-byte ints for guaranteed-fastest atomic operations except when extremely memory-constrained.
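
For reference, a sketch of how code might consult isLockFree today. It assumes the `Atomics.isLockFree` API as it later shipped in ECMAScript, where the result is a boolean rather than the 0/1 used earlier in this thread and a size of 4 always reports true. `widestLockFreeWidth` is a hypothetical helper.

```javascript
// Return the widest candidate size (in bytes) that the engine reports as lock-free.
function widestLockFreeWidth(candidates = [8, 4, 2, 1]) {
  return candidates.find((n) => Atomics.isLockFree(n));
}

const width = widestLockFreeWidth();
console.log("widest lock-free atomic width in bytes:", width);
```

Because 4 is always lock-free, the helper never returns less than 4 with the default candidates, which matches the observation that most code can simply use four-byte ints.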

@lars-t-hansen
Collaborator

I will add a short comment to the section on lock-freedom to clarify the matter raised here, so that there's less scope for confusion in the future.
