Spinloop relaxation instruction #15

Open
lars-t-hansen opened this issue May 24, 2017 · 12 comments

@lars-t-hansen

(Forked from issue #11)

One operation that never made it into ES was an instruction to relax in a spinloop (i.e., the PAUSE instruction on x86, IIRC). It was in the very earliest drafts of the spec, but I think that, more than anything, it was considered too low-level for ES and likely to create controversy. (I experimented with a related idea, a micro-wait primitive with a back-off scheme, and I called that "pause" too, but it's not what I'm talking about here.)

I know that PAUSE is important for performance on x86. I don't know if ARM has anything similar, e.g. in its event instructions (WFE). I don't know if it's a good idea to push this through for the MVP, since we won't be able to remove it again if it turns out to be the wrong thing.

@jfbastien
Member

If we can get a rough sketch and some performance information then this is worth polling at the meeting.

@lars-t-hansen
Author

lars-t-hansen commented May 29, 2017

There's a writeup from Intel here: https://software.intel.com/en-us/articles/benefitting-power-and-performance-sleep-loops. Numbers at the end, though exactly how much they mean is a little vague because the paper mixes the discussion of PAUSE with a discussion of backoff. Generally it appears to be seen as a substantial benefit for hot spinloops because it lowers power consumption and relinquishes processor resources such as holds on the bus.

On ARM this is the YIELD instruction (since ARMv6K).

On POWER this is provided by different instructions in different subarchitectures, see eg https://stackoverflow.com/questions/5425506/equivalent-of-x86-pause-instruction-for-ppc/7588941#7588941.

Apparently some C/C++ compilers provide _mm_pause() as an intrinsic to allow the compiler to emit this instruction.

@lars-t-hansen
Author

Oh, and as for a sketch, this would be an operand-free instruction, call it PAUSE or YIELD, that acts as a nop and has at most a performance impact. It would always be correct for a compiler to generate no code for the instruction.
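For illustration, here is a minimal sketch of where such a hint sits in a spin loop, written in Rust rather than wasm. It uses the standard library's `std::hint::spin_loop()` as a stand-in for the proposed instruction; the `SpinLock` type and the `count_with_threads` helper are hypothetical names invented for this example, not part of any proposal. Deleting the hint call leaves the program's result unchanged, which is the "at most a performance impact" property described above.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical minimal spinlock guarding a counter (illustration only).
pub struct SpinLock {
    locked: AtomicBool,
    value: UnsafeCell<u64>,
}

// Safety: every access to `value` happens while `locked` is held.
unsafe impl Sync for SpinLock {}

impl SpinLock {
    pub fn new() -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(0),
        }
    }

    fn lock(&self) {
        // Try to take the lock; on failure, spin on a plain load.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            while self.locked.load(Ordering::Relaxed) {
                // Relaxation hint: semantically a no-op. Removing this
                // line changes only performance, never the result.
                hint::spin_loop();
            }
        }
    }

    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }

    pub fn increment(&self) {
        self.lock();
        unsafe { *self.value.get() += 1 };
        self.unlock();
    }

    pub fn get(&self) -> u64 {
        self.lock();
        let v = unsafe { *self.value.get() };
        self.unlock();
        v
    }
}

/// Hypothetical driver: `threads` workers each increment `per_thread` times.
pub fn count_with_threads(threads: usize, per_thread: u64) -> u64 {
    let lock = Arc::new(SpinLock::new());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.increment();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    lock.get()
}
```

Calling `count_with_threads(4, 1_000)` from a `main` function should return the same total with or without the hint; only the cost of contended spinning is expected to differ.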

@binji
Member

binji commented Jun 6, 2017

Oops, maybe better to put my comment from #11 here:

There wasn't a poll about this in the CG meeting, but there seemed to be interest in exploring adding a yield operation (as opposed to pause). This would be defined in the spec as being a no-op, but would be a hint to implementations to yield execution to another thread.

@bitter

bitter commented Jan 11, 2023

Hi! I'm looking at spin loop relaxation in Unity, and without explicit instruction support in wasm it's tricky to get right. We either go with something like nop, which does no relaxation, or we do a scheduler yield, something like atomic_wait with a timeout of zero. Neither of these is a very good option. What we really want is to make sure we don't consume massive amounts of power when hitting spin loop constructs.

What are the chances of getting something like yield into wasm?

Cheers,
Martin

ps. Let me know if you want me to open a WebAssembly/design issue on this instead.
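The trade-off described in this comment can be sketched in native Rust as a two-phase wait: spin with a CPU-level relaxation hint for a bounded number of iterations, then fall back to a scheduler yield. This is only an analogy under stated assumptions: `std::hint::spin_loop()` stands in for the missing wasm instruction, `std::thread::yield_now()` stands in for the zero-timeout atomic wait, and `SPIN_LIMIT`, `wait_for_flag`, and `demo_wait` are hypothetical names chosen for the example.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical spin budget before falling back to the scheduler.
const SPIN_LIMIT: u64 = 100;

/// Waits until `flag` becomes true.
/// Returns how many (hinted spins, scheduler yields) were performed.
pub fn wait_for_flag(flag: &AtomicBool) -> (u64, u64) {
    let mut spins = 0u64;
    let mut yields = 0u64;
    while !flag.load(Ordering::Acquire) {
        if spins < SPIN_LIMIT {
            // Cheap phase: stay on the core, but tell it we are spinning.
            hint::spin_loop();
            spins += 1;
        } else {
            // Expensive phase: give the time slice back to the OS scheduler.
            thread::yield_now();
            yields += 1;
        }
    }
    (spins, yields)
}

/// Hypothetical driver: another thread sets the flag after a short delay.
pub fn demo_wait() -> (u64, u64) {
    let flag = Arc::new(AtomicBool::new(false));
    let setter = {
        let flag = Arc::clone(&flag);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            flag.store(true, Ordering::Release);
        })
    };
    let counts = wait_for_flag(&flag);
    setter.join().unwrap();
    counts
}
```

Without a relaxation instruction, the cheap phase degenerates to a plain busy loop, so only the second phase offers any relief, and it does so at scheduler granularity rather than at the granularity of a few cycles.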

@bitter

bitter commented Jan 11, 2023

FWIW, I like the old Rust name for this, spin_loop_hint, as it's very explicit. And it doesn't necessarily map directly to one of the instructions mentioned in this thread. On aarch64, for instance, they map it to isb sy.

rust-lang/rust@c064b65

@tlively
Member

tlively commented Jan 11, 2023

I think adding a yield instruction would be a good idea. Assuming folks would want to see real-world performance data, I would be happy to help prototype it in LLVM, Binaryen, and Emscripten if any engine implementers would be interested in prototyping it on the engine side.

@conrad-watt
Collaborator

conrad-watt commented Jan 11, 2023

No objections here - we previously had a short discussion in the in-person meeting where I advanced the opinion that this should be a hint on a loop or branch opcode, but given that it seems to be exposed as a separate instruction both above and below us in the compilation chain, that's probably not a good idea.

One drop of bikeshedding - yield is quite a generic term and we already have the potential for multiple brands of concurrency in the spec competing for names (threads, continuations etc). How would we feel if the instruction was called something like skip_busy?

@tlively
Member

tlively commented Jan 11, 2023

Maybe nop_yield or nop_pause to make the semantics very clear and still associate the instruction with its purpose using a well-known, unsurprising name?

@conrad-watt
Collaborator

Oops, I forgot that our "nop" instruction is called nop instead of skip. Then I'd suggest nop_busy, but I'd also be fine with nop_pause.

@lars-t-hansen
Author

In the old discussion the best suggestion was probably pause. FWIW, I think spin_loop_hint as suggested above is less mysterious than any of the other suggestions here. The only function of this instruction is to relax the spin loop; a name that does not mention that fact seems amiss.

@titzer

titzer commented Jan 13, 2023

(Just to weigh in on the bikeshedding of the name). I think the name should connote its rough hardware effect rather than its use, because people end up finding new uses for things in the future. Yet, the Intel instruction to which this will map is called "pause" and its documentation reads:

Improves the performance of spin-wait loops. When executing a “spin-wait loop,” processors will suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.

An additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor’s power consumption.

This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).

tlively added a commit to WebAssembly/shared-everything-threads that referenced this issue Apr 23, 2024
This instruction was originally discussed in WebAssembly/threads#15, but did not make it into the original threads spec.
tlively added a commit to WebAssembly/shared-everything-threads that referenced this issue Apr 26, 2024
This instruction was originally discussed in WebAssembly/threads#15, but did not make it into the original threads spec.

7 participants