This repository has been archived by the owner on Jan 25, 2022. It is now read-only.

Data race spec allows an implementation to reveal passwords #71

Closed

erights opened this issue Feb 28, 2016 · 22 comments

@erights

erights commented Feb 28, 2016

At http://tc39.github.io/ecmascript_sharedmem/shmem.html#Atomics.WithAtomicAccessTo

A write involved in a data race stores a poison value that can read back as anything whatsoever.
[emphasis added]

I appreciate that you tried to spec something tighter, such as requiring that the value read back be one of the values written at some point. I would like to understand why you had to back off. There are statements that it made the spec too complicated, or something to that effect. Was there a thread on this?

In any case, "anything whatsoever" means that an implementation with a backdoor that selectively reveals the passwords of other users through this channel, even though none of that information went into the data race, would still be ruled a conforming implementation. Certainly we cannot prevent such backdoors by writing spec language, but we don't need to bless them either.

I acknowledge there may be no simple way to rule this out, in which case it may not be worth addressing. But we should give it some thought.

@lars-t-hansen
Collaborator

There were mail threads discussing a tighter spec based on the wording in the original proposal, as well as comment threads on that proposal, but there's nothing on the bug tracker about this.

The Java memory model tries to fix the problem, because it needs to ensure that pointer values that are written racily cannot create bogus pointers. The Java MM is widely considered to be very complicated, probably a little buggy still, and (if I understand correctly) to be somewhat out of touch with what Java compilers want to do and in fact do. I doubt a model as hairy as that would pass muster with TC39. (I won't claim to understand it, either.) JS might need a more complicated model still, since our memory arrays can be aliased.

What you're getting at is similar in spirit to a phrasing that is in the spec already, which is that in a data race the affected locations are limited to being the union of the locations touched by the racing accesses. That, too, has been a little controversial, because the garbage in a race tends to bleed into the rest of memory once it is used in the subsequent computation. JS is safe, so the program can't do just anything, but the garbage can spread from shared memory into non-shared memory in the agents, for example, by being used in control flow, computation, and copy operations. Optimizers likely make the situation worse too. Saying that only the locations touched are poisoned is perhaps technically correct at a low enough level but does not reveal the full impact of the situation.

So when you suggest that we could tighten the spec so that "anything whatsoever" is reduced down to "information that goes into the race" I am sympathetic, but it will be technically correct only in the above sense. If garbage is picked up from a race and used in a computation there is no guarantee that that computation won't accidentally reveal information lifted from elsewhere.

@lars-t-hansen
Collaborator

Concerns similar to Mark's have been voiced by @pizlonator in Issue #77.

@lars-t-hansen
Collaborator

If we can agree about the following two rules for how shared memory behaves then I think we can make some progress:

  • A read from shared memory does not fail to produce a value.
  • The shared memory does not fail to honor a write, but if there is a race the memory can arbitrarily select bits from the operands of the racing writes and the previous memory contents at the racy written-to locations.

Mainstream memories behave that way already, so these rules are really restrictions on the compiler: if there is a race, and the compiler can prove that there is a race, it cannot produce a completely arbitrary value (thus risking revealing a secret by, e.g., reusing register contents), but must produce a value that could be a combination of some values written to memory. (It could produce the last value written by the thread executing the read.)
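
[Editor's sketch] To make these rules concrete, here is a minimal sketch assuming an Int32Array view over a SharedArrayBuffer and two racing workers; the agent roles and values are purely illustrative:

// Sketch only: the worker-side writes are shown as comments.
const sab = new SharedArrayBuffer(4);
const i32 = new Int32Array(sab); // i32[0] starts out 0

// Two workers, given sab via postMessage, race on i32[0]:
//   worker A: i32[0] = 0x0000ffff;
//   worker B: i32[0] = -65536;    // 0xffff0000 as a signed int32

const v = i32[0]; // a racy read

// Rule 1: the read always yields some value; it cannot fail.
// Rule 2: v must be assembled from bits of the racing writes and the
// previous contents of i32[0] -- e.g. 0, 0x0000ffff, -65536, or some
// bitwise mix of those -- never an unrelated value such as stale
// register contents.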

The rules would come in addition to rules we already have (or should have, or should follow from other rules):

  • The racing locations are the union of the locations touched by all accesses involved in the race (Issue 22 has more)
  • The values at racy locations are not necessarily stable (due to propagation delay and differences in observed modification order in weak memories)

@pizlonator

On Mar 17, 2016, at 4:11 AM, Lars T Hansen notifications@github.com wrote:

If we can agree about the following two rules for how shared memory behaves then I think we can make some progress:

  • A read from shared memory does not fail to produce a value.
  • The shared memory does not fail to honor a write, but if there is a race the memory can arbitrarily select bits from the operands of the racing writes and the previous memory contents at the racy written-to locations.

This is still too weak.

I think that we should promise that writes to an aligned atomic word either happen or don't happen, and that there is no mixing of those bits with the bits that were already there.

-Filip


@lars-t-hansen
Collaborator

This is still too weak.

Why?

I think that we should promise that writes to an aligned atomic word either happen or don't happen, and that there is no mixing of those bits with the bits that were already there.

What do you mean, "atomic word"? This bug is specifically about what happens during races. (Also we don't have "atomic" vs "non-atomic" memory so presumably you mean something else.)

@pizlonator


When I say “atomic word”, I mean a word whose size is small enough that the CPU can operate on it atomically. I’m going to use this term in the rest of this e-mail. Feel free to suggest a different term for this concept.

I’m not referring to atomic memory. I don’t know what that is.

On every CPU that we will have to worry about, if the atomic word size is int32, then any aligned int32 store is guaranteed to either happen or not happen. It is an atomic operation. Same thing for loads. If the atomic word size is int32, and you do an aligned int32 load, then you will load what the bits were at some moment in time.

Note that whether it’s an atomic operation (it happens all at once) is separate from how it’s ordered with respect to other operations. I’m not suggesting sequential consistency or anything like that.

This is an important feature of any memory model. Currently we are specifying the behavior of shared memory that, from ES’s standpoint, does not contain pointers. But whatever model we define here may be used in a world that has pointers - for example, in asm.js an int32 is used as a pointer. Also, if shared memory semantics spread to other parts of the ES spec, then this memory model will be a starting point.

Something that makes shared memory behave somewhat sensibly in the presence of pointers is that if you have a race on a pointer, like this:

void *ptr; // global variable; zero-initialized, so it starts out null

// called from thread A without any locking
void foo(void *v)
{
    ptr = v;
}

// called from thread B without any locking
void *bar(void)
{
    return ptr;
}

(The equivalent of this in ES with a shared array buffer would be that “ptr” is an int32 at some 4-byte-aligned offset in a shared array buffer.)

Then bar() is guaranteed to return either null or some value ‘v’ stored by thread A. This is guaranteed by all compilers on all hardware platforms that I’m aware of. If this was not guaranteed, then you would not be able to write the above program. It turns out that people write such programs. WebKit is an example of a program that does this. I suspect that using data races over pointers comes up frequently in large parallel programs. I’ve used this trick in the past in parallel numerical programs. We want shared array buffers to be used by large parallel programs. So, I think that we want to guarantee that even in the presence of races, an aligned load of an atomic word will not load any value other than what was stored into that word in the past, so long as the past stores were to that entire word and they used an atomic word size.
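
[Editor's sketch] The ES analogue Filip describes might look as follows, assuming the buffer is shared with workers via postMessage; here the "pointer" is an int32 index into the buffer, and all names and values are illustrative:

// i32[0] plays the role of ptr and starts out 0 (the analogue of null).
const sab = new SharedArrayBuffer(1024);
const i32 = new Int32Array(sab);

// Thread A (a worker), without any locking:
//   i32[1] = 42;  // initialize the "object"
//   i32[0] = 1;   // publish its index with a plain racy store

// Thread B, without any locking:
function bar() {
  // Single-copy atomicity would guarantee this returns 0 or 1,
  // never a mix of bits from the two values.
  return i32[0];
}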

We can still have word tearing if you load or store something larger than the atomic word size. For example, storing and loading doubles may cause word tearing.
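
[Editor's sketch] For instance (values illustrative), a Float64Array element spans two 32-bit words, so racing 64-bit accesses may tear at word granularity:

const sab = new SharedArrayBuffer(8);
const f64 = new Float64Array(sab);

// Worker A: f64[0] = 1.0;  // may be performed as two 32-bit stores
// Worker B: f64[0] = -1.0; // races on the same element

// A racy read may combine the low word of one store with the high word
// of the other, yielding a double that neither worker ever stored:
const torn = f64[0];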

I suspect that the “aligned” requirement is automatically satisfied in our context, since the rules for mapping a typed array onto an array buffer require that the elements of the typed array be native-aligned.

This means that the only case in which performing an atomic load would give anything other than a value someone atomically stored is if there were stores of different sizes, for example byte stores used to populate an int32. I suspect that we could spec quite precisely what the possible outcomes would be. It’s true that the programmer can only rely on some random interleaving of the various stored bits; at a minimum, stores to bytes are always atomic, even in the case of misaligned access.

-Filip

@lars-t-hansen
Collaborator

@pizlonator,

I see "access-atomic" or "single-copy atomic" used in the literature to describe your "atomic word", but in any case we agree on what we're talking about.

The early drafts of this spec actually baked that feature in, notably for int32, for the use cases you quote, and precisely because it seemed that we could rely on all (modern) hardware doing the right thing with regular load and store instructions.

I got a fair amount of pushback on that, some of it along the lines that if I want relaxed atomics I should just add them, not try to smuggle them in. In any event, since my then-lofty goal was to be able to support translation from JVM/CIL and I would need many more things for that than just this feature, it was not a hardship to take it out.

I'm not sure this would be as easy to specify as you think: I really want to stay away from relaxed atomics for version 1 if I can, but this feature is not far from that. On the other hand, the spec currently ignores a similar issue, namely that it's hard to prove soundness of the memory model if memory can be accessed both by atomic ops (i.e., those on the Atomics object) and by regular data accesses (single-copy atomic or not), so perhaps the two issues can be tackled together, as they are probably related.

Waving my hands some, it seems that this could be made somewhat well-defined: if all racy accesses to a location from some set of threads are of the same size and happened-after some kind of synchronizing action that the threads all participated in, then we have a guarantee of single-copy atomicity on that location. (Restrict to int32 at will.) Whether that fits into the memory model I'm not sure. Whether it's useful I'm also not sure; threads like to allocate memory and pass it on to other threads without that type of global synchronization.
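
[Editor's sketch] One reading of such a favorable race, assuming an Atomics operation on a separate flag serves as the synchronizing action (layout and values illustrative):

const sab = new SharedArrayBuffer(8);
const i32 = new Int32Array(sab);

// Main thread, after setting up workers A and B:
//   Atomics.store(i32, 1, 1); // the synchronizing action

// Workers A and B each spin until Atomics.load(i32, 1) === 1, then race
// on i32[0] with plain, same-size int32 accesses:
//   worker A: i32[0] = 7;
//   worker B: const v = i32[0];

// Under the suggested guarantee, v is single-copy atomic: it is either
// 0 or 7, never a bitwise mix of the two.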

@pizlonator

@lars-t-hansen Can you tell me more about the kind of pushback you got? I don't see this as smuggling in relaxed atomics. Unless the spec mandates SC (which isn't what anyone advocates AFAICT), basic memory access operations will exhibit relaxed behavior. So, saying nothing about this means that we have already "smuggled" them in. This proposal could specify a memory model that provides a more useful constraint on what you might load from memory without also including a concept of relaxed atomics in the API. This is what I'm advocating for.

Also, I'm not implying anything about the relative difficulty of specifying this. I'm aware of the complexity of memory models and the effort required in creating them. I'm just not sure it's a good idea to proceed with a memory model that is so loose. We want to be able to deflect criticisms about this introducing unnecessary nondeterminism. We also want to preclude implementations from overoptimizing atomic operations on the grounds that the spec permits poisoning on any race.

@binji

binji commented Mar 23, 2016

Unless the spec mandates SC (which isn't what anyone advocates AFAICT), basic memory access operations will exhibit relaxed behavior.

Relaxed with regard to memory ordering, yes. But are you suggesting that they should not be allowed to be elided or duplicated by the compiler either?

@jfbastien
Contributor

Agreed with @binji: I'm not sure I understand what you're advocating w.r.t. non-atomic operations.

@pizlonator

@binji I'm not suggesting that.

The current memory model appears to follow the C++ model with respect to races. I think that's too weak for ES. It's strange that the spec creates a situation where a racing program could see values that were never written to the memory location from which it is loading.

Basically, I'm curious why the C++ model was chosen when stronger alternatives are available. For example, I believe that the ARM memory model would guarantee that a racing program would still only see those values that were ever written rather than crazy town.

I appreciate why C++ went for a very loose model - C++ is all about not compromising on performance even if it means having a spec that leaves a lot of undefined behavior. But that's not how the web usually works. It would be weird to allow nasal demons to sneak in through shared array buffers when we are going to such great lengths to avoid them elsewhere.

@binji

binji commented Mar 25, 2016

@pizlonator I seem to remember there being an example of a compiler generating speculative writes to memory, which are valid for single-threaded code, but would break this model. Perhaps there are other scenarios like this?
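
[Editor's sketch] The classic example of such a speculative write, sketched here in JS on a hypothetical array argument (this is not necessarily the specific example binji recalls):

// Original program:
function set(a, cond) {
  if (cond) a[0] = 1;
}

// A compiler might produce branch-free code like this:
function setSpeculative(a, cond) {
  const tmp = a[0];
  a[0] = 1;              // speculative store, happens even when cond is false
  if (!cond) a[0] = tmp; // undo
}

// Single-threaded, the two are equivalent. If a is backed by a
// SharedArrayBuffer, another thread can observe the transient 1, and the
// undo store can overwrite a concurrent write -- so a model that honors
// stores rules this transformation out.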

@pizlonator

@binji @jfbastien I think that what you guys are talking about is orthogonal to the question I'm posing.

@lars-t-hansen appears to advocate in #71 (comment) that the memory model be constrained so that races return something other than pure garbage. I'm advocating that this constraint be strengthened to respect the atomicity of writes. @lars-t-hansen's wording in that comment implies that word tearing can happen at the bit level. That's not how memories work. I've asked for clarification on why we would want bit-level tearing instead of word-level tearing, and I'd still like to hear more on that. @lars-t-hansen provides some insight in #71 (comment), but it only refers to some pushback without going into details. I want the details, please.

Note that this discussion has an interesting overlap with the Zappa paper. That paper claims that preventing out-of-thin-air results is super hard, and I would tend to agree. But I think that the constraint that @lars-t-hansen suggests, and that I'm asking to strengthen, doesn't stumble upon the badness that the Zappa paper describes. If it did, then I believe that even constraining a loaded value to the "bit mixing" of stores as @lars-t-hansen suggests would not be any more sound than the word mixing that I want.

This has nothing to do with the elimination of loads that @binji asked about. Whether a compiler can eliminate a load doesn't affect the atomicity of the load. @lars-t-hansen already advocates that the spec should honor stores, so speculative writes to memory would become illegal. @binji, when you bring up this case, are you claiming that the spec should allow it, or that it shouldn't? It feels like this observation is unrelated to the discussion about bit-level versus word-level tearing.

Finally, so far I've seen arguments defending the current very weak model on the grounds that the JMM is too complex and that the C++ MM is broken. Those are both true statements. But both of those memory models were created in an environment where no performance compromises were permitted. I'm not sure we are in the same world. An asm.js or WebAssembly memory access is already more expensive than a native one; from what I understand, that overhead is about 20%-40%, depending on whether you do virtual-memory tricks. This means that any additional overhead from the memory model will have a smaller impact than it would have had in those other languages. Also, I think that compromising on semantics to get performance would go against the spirit of the web.

Therefore, an argument in favor of this very weak memory model cannot just rely on the fact that the alternatives might be too slow. I think that at a minimum you'd have to prove decisively that the alternatives really are way too slow. I don't think anyone has done that.

For example, in Sec 8 of the Zappa paper, they cite four articles about memory models that tighten things up while introducing some non-negligible cost. I'd like to understand why shared array buffers don't consider such memory models. If the issue is pragmatism, then this cuts both ways: the pragmatic path historically taken in web specs is to avoid nasal demons, since nasal demons tend to get a lot of push-back. So, choosing a memory model that relies on poisoning may not actually be all that pragmatic.

@binji

binji commented Mar 25, 2016

Sorry, I see what you're arguing for now -- just bit tearing vs. word tearing, as Lars mentioned in the comment above. Yes, I agree bit tearing doesn't make sense; my understanding of the pushback Lars got about the original proposal was that it was not about arbitrary tearing but about reading completely arbitrary values produced by compiler optimizations on single-threaded code. I think we're OK taking some amount of performance hit for memory safety. But if we constrain the values read by a racy normal read, doesn't this negatively affect the optimization possibilities for single-threaded code that never uses shared memory? Perhaps we're saying the constraint applies only when a SAB is used, or whenever the compiler can't determine whether the typed array is backed by a SAB; in that case this may be acceptable.

@pizlonator

@binji,

It's true that JIT compilers will be able to speculate that an array buffer is not shared and then they will be able to assume that the only way that someone else could perform an aliased memory access is by doing it on the current thread. This unlocks optimization opportunities that the memory model would otherwise prohibit. I'm glad you brought it up, but that's not the argument that I'm making.

I think that the discussion about optimizations inhibited by limits on what a racy read can return should be explicit about which optimizations are actually prohibited. For example, in WebKit we've toyed with speculative stores and have generally not done them, because they are so unlikely to be profitable; they are more trouble than they're worth even in the single-threaded case. Because of this, when someone objects to a tightening of the memory model on the grounds that it might inhibit optimizations, I think the burden is on that person to say specifically which optimization they are referring to, why they think it's a good optimization in the single-threaded case, and why that optimization would be rendered invalid.

So, do you know of a specific optimization that would be inhibited by instituting the constraint that I'm advocating?

As for taking performance hits to tighten the memory model: I brought that up because this is my broader vision for how a memory model for the web should work. I did not mean to imply that constraining out-of-thin-air values would actually hurt performance. I don't have any data that it would, and I don't know if anyone else has any such data.

@lars-t-hansen
Collaborator

(Not ignoring this, just no cycles for it yet.)

@lukewagner

One additional argument in favor of a stronger memory model is that Java/C++ compilers have a bigger need for optimization headroom than wasm, since Java/C++ compiler inputs are source programs, often written in an "obviously" suboptimal form in the name of readability and maintainability. As a virtual ISA, wasm code will often have been optimized already, and this further reduces the overall potential for speedup from the optimizations that a stronger memory model would inhibit.

@jfbastien
Contributor

To supplement what @lukewagner says: IMO the "right" place to draw the line on optimization for SAB is "compilers can do things the hardware can also do". The difficulty is in quantifying what the hardware can do! Write buffers, speculative reads, and reorder buffers are somewhat easy to wrap our minds around, but we can't dictate where hardware goes in the future with, say, transactional memory, non-temporals, cache writeback, etc.

C++ is wrestling with this issue as well: http://wg21.link/p0062r0

@pizlonator

Also, if both LLVM and the wasm backend go to town with optimizations that are shady under concurrency, then I fear a kind of double jeopardy for racy code.

In my experience the worst codegen bugs happen when two seemingly unrelated phases conspire: the first transforms the code into a form that confuses the second, and then the second breaks the code. The way we prevent this when designing a compiler is to describe the semantics of the IR so that we can reason about whether a transformation is valid. Then we don't have to think in terms of inter-phase contracts, but rather a single contract that the whole compiler obeys.

Wasm will be an IR in the sense that it will enable one compiler to pass information to another. If concurrency is permitted then I think we need to have some semantics for it that are stronger than "all bets are off", since otherwise we will increase the likelihood of compiler bugs caused by semantic ambiguity. This is particularly risky in wasm since we will have N producers and M consumers.

-Filip


@sunfishcode
Member

Also, the shared_ptr optimization example in the C++ standards discussions illustrates the difference between the needs of a source language and the needs of a platform: If C++ defines special rules that enable the atomic operations in shared_ptr to be optimized, they will be optimized in the C++ compiler optimizer before it generates its output. This means the underlying platform doesn't need the special rules, and we can keep it simpler and stronger.

@lars-t-hansen
Collaborator

OK, lots of topics in this bug:

Originally we were talking about the possibility of returning garbage. That was then closed off by at least requiring a bitwise interleaving, which, while unrealistic on one level (hardware doesn't work at the bit level and is actually stronger), is realistic on another (hardware doesn't manufacture bits; you get out some combination of what you put in). So no more revealing secrets, hopefully. The language I used was possibly not good enough, though; see below.

Then we got to talking about realistic interleavings and how strong the memory model should be. As I said, in the early days of this proposal I had some provision for single-copy atomicity / access-atomicity when conditions were favorable, that is, when a race only involved accesses to the same cells in memory using single operations. I no longer have those comments; they were lost. But the gist was that if that's the behavior I want, I should just add relaxed atomics instead. I can't speak to what precisely motivated those comments; maybe it's a C++-centric point of view.

Anyway, I've rewritten the memory model again (new draft coming later today), and I have tried to provide much clearer rules for races: they attempt to draw the line between legal and illegal program transformations in the presence of races, and to describe the kinds of interleavings that can happen. If the race is favorable, as just described, then we get access-atomicity. This is decoupled from relaxed atomics.

I'm going to close this bug because it is meandering, but (a) do reopen if that seems most reasonable or, better, (b) open new bugs for specific issues with the new draft please.
