Scatter and gather #12

penzn · 2020-06-13T01:41:05Z

As @Maratyszcza and @lemaitre point out in #7, we should consider scatter and gather operations. This is an issue to track that.

Potential topics to discuss:

Emulation (it is only supported by AVX512 and SVE)
Compiler support - are there any ways to enable it aside from intrinsics?
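
For reference while discussing emulation, here is a minimal scalar sketch of what gather and scatter compute, one lane at a time. The function names and the use of `std::vector` to stand in for a vector register of runtime length are assumptions made for illustration; nothing here is a proposed API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference semantics: idx.size() plays the role of the
// runtime lane count.
std::vector<float> gather_emulated(const float* base,
                                   const std::vector<std::uint32_t>& idx) {
    std::vector<float> out(idx.size());
    for (std::size_t lane = 0; lane < idx.size(); ++lane) {
        out[lane] = base[idx[lane]];  // out[lane] = memory[index[lane]]
    }
    return out;
}

void scatter_emulated(float* base,
                      const std::vector<std::uint32_t>& idx,
                      const std::vector<float>& values) {
    assert(idx.size() == values.size());
    for (std::size_t lane = 0; lane < idx.size(); ++lane) {
        // With duplicate indices, the highest lane wins in this sketch.
        base[idx[lane]] = values[lane];
    }
}
```

How duplicate indices should behave in a scatter is a design choice; the lane ordering above is only what this loop happens to do.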

jan-wassenberg · 2020-06-15T07:04:01Z

Developers made do quite successfully without it on native hardware, why is it a must for Wasm?

IMHO, I think it is a must for flexible vectors (much less for WASM SIMD in general). If you know the size of your SIMD register

I am surprised to hear it described as a must, haven't yet seen an application that really required it.
We do know the size of the register, right? We have to have some function that tells us the loop increment, which is (by definition) the register size.

lemaitre · 2020-06-15T08:14:56Z

We do know the size of the register, right? We have to have some function that tells us the loop increment, which is (by definition) the register size.

There seems to be a misconception here: We do not know, as developers, the SIMD width.
We only know how to get it at runtime via a specific instruction (or global).

For instance, one way to emulate scatter (or gather for that matter) is to implement a full in-register transposition.
This means that, at some point, you need as many registers as their width to store the data.
If you transpose floats in SSE, you need 4 registers. With AVX2, 8 registers, and so on.
So you cannot do this trick if you don't know at code time (or compile time) the size of the registers.
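
A rough sketch of the transposition trick, with registers modelled as `std::array<float, N>` (an assumption for illustration, as are the names `Reg`, `transpose` and `gather_records`). The point is that `N` has to be a template parameter: the routine needs `N` registers of `N` lanes each, so the lane count must be fixed when the code is compiled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// N models the number of lanes per register and must be a compile-time constant.
template <std::size_t N>
using Reg = std::array<float, N>;

// Square transpose: N input registers in, N output registers out.
template <std::size_t N>
std::array<Reg<N>, N> transpose(const std::array<Reg<N>, N>& rows) {
    std::array<Reg<N>, N> cols{};
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            cols[c][r] = rows[r][c];
    return cols;
}

// Gather N records of N contiguous floats each, then transpose so that
// output register f holds field f of every gathered record.
template <std::size_t N>
std::array<Reg<N>, N> gather_records(const float* base,
                                     const std::array<std::size_t, N>& idx) {
    assert(base != nullptr);
    std::array<Reg<N>, N> rows{};
    for (std::size_t r = 0; r < N; ++r)        // N contiguous loads
        for (std::size_t c = 0; c < N; ++c)
            rows[r][c] = base[idx[r] * N + c];
    return transpose<N>(rows);
}
```

A real implementation would replace the inner loops with vector loads and shuffles, but the dependence on a known `N` stays the same.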

Even the extract pattern for scatter would be problematic, as the extract index would most likely be an immediate (and certainly is on most architectures). So here, either you completely unroll the loop at compile time and check for each index that it is less than the actual width, or we make the extract take runtime indices and hope code generation will see that the index is actually a compile-time constant...

Neither solution sounds appealing.
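
To make the first option concrete, here is a sketch of the fully unrolled extract pattern. It assumes a hypothetical upper bound `kMaxLanes` on the lane count and models a register as a fixed-size array; `extract_lane`, `store_lane` and `scatter_extract` are illustrative names, not existing instructions or APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

constexpr std::size_t kMaxLanes = 16;  // assumed upper bound (hypothetical)
using Vec = std::array<float, kMaxLanes>;
using Idx = std::array<std::uint32_t, kMaxLanes>;

// Stand-in for an extract instruction whose lane index is an immediate.
template <std::size_t Lane>
float extract_lane(const Vec& v) { return std::get<Lane>(v); }

// One guarded store: the lane is a compile-time constant, the width is not.
template <std::size_t Lane>
void store_lane(float* base, const Idx& idx, const Vec& v, std::size_t runtime_lanes) {
    if (Lane < runtime_lanes) base[std::get<Lane>(idx)] = extract_lane<Lane>(v);
}

template <std::size_t... Lane>
void scatter_unrolled(float* base, const Idx& idx, const Vec& v,
                      std::size_t runtime_lanes, std::index_sequence<Lane...>) {
    // Expands to kMaxLanes guarded stores, in ascending lane order.
    (store_lane<Lane>(base, idx, v, runtime_lanes), ...);
}

void scatter_extract(float* base, const Idx& idx, const Vec& v, std::size_t runtime_lanes) {
    assert(runtime_lanes <= kMaxLanes);
    scatter_unrolled(base, idx, v, runtime_lanes, std::make_index_sequence<kMaxLanes>{});
}
```

The cost is visible in the shape of the code: every call carries `kMaxLanes` comparisons regardless of the actual width.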

jan-wassenberg · 2020-06-15T10:09:36Z

Yes, to be clear: having the function could allow us to compare the runtime value against a small set of candidates, and use the corresponding code pregenerated for each. Which raises an interesting question: is there some abstraction we can provide that allows developers to know that SVE will always have n*128 bit, x86 will have {1,2,4}x128?
RiscV V has no such limitation, but if the function returns something the app doesn't expect (e.g. 16K bits) then the app can fall back to some codepath that doesn't do in-register transposition.
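
A sketch of that dispatch, assuming the width query returns a size in bits. The kernels are placeholders that only return a description string, and `transpose_kernel`, `generic_kernel` and `run` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Placeholder for code pregenerated for one specific register width.
template <std::size_t Bits>
std::string transpose_kernel() {
    return "in-register transpose, " + std::to_string(Bits) + " bits";
}

// Placeholder for a codepath that does not depend on the register width.
std::string generic_kernel() {
    return "fallback without in-register transposition";
}

// `runtime_bits` stands for whatever the width query would return.
std::string run(std::size_t runtime_bits) {
    assert(runtime_bits > 0);
    switch (runtime_bits) {
        case 128: return transpose_kernel<128>();
        case 256: return transpose_kernel<256>();
        case 512: return transpose_kernel<512>();
        default:  return generic_kernel();  // unexpected width, e.g. 16K bits
    }
}
```

The candidate set {128, 256, 512} is only an example; which widths deserve a pregenerated kernel is exactly the open question.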

penzn · 2020-06-15T17:37:26Z

For Arm and x86 ISAs it would be perfectly legal to say that maximum width is always a multiple of 128 bits, though I am not sure how that would map to RiscV.

lemaitre · 2020-06-15T17:52:49Z

According to the RISC-V V spec (https://riscv.github.io/documents/riscv-v-spec/riscv-v-spec.pdf#_implementation_defined_constant_parameters), maximum width should be a power of 2 larger than or equal to 32 bits. (EDIT: I got confused in a previous version of this message)

Only SVE does not require that maximum width should be a power of 2.

programmerjake · 2021-04-09T05:51:20Z

There's also SimpleV, a WIP extension on OpenPower that guarantees availability of any vector length from 1 to 64 (not limited to powers of 2, so e.g. 35 is a valid vector length), and allows (like RISC-V V) the length to be set dynamically. It supports gather-load, scatter-store, and gather register-to-register moves.

lemaitre · 2021-04-09T08:10:03Z

Thanks @programmerjake, I was not aware of SimpleV. To me, the interesting point is the guarantee that vector length of 64 is available. So on SimpleV, we can force the vl to a power of 2 if required.

Also, you mention "gather register-to-register moves". While in hardware it makes sense to group it with gather loads, from the software point of view such a connection is not required, and the terminology used is more shuffle/swizzle.

programmerjake · 2021-04-09T09:34:05Z

Some additional comments on SimpleV on Libre-SOC's mailing list:
http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-April/002318.html

programmerjake · 2021-04-09T09:42:32Z

Thanks @programmerjake, I was not aware of SimpleV. To me, the interesting point is the guarantee that vector length of 64 is available.

Yup! We basically picked 64 as the max since the general purpose integer registers are 64-bits wide allowing 1 predicate bit per vector element.

So on SimpleV, we can force the vl to a power of 2 if required.

Yup, though if you're forcing VL to be bigger than necessary just so it's a power of 2, it will probably run slower, since it's implemented using a hardware-level loop over vector elements.
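
To illustrate the two ways of forcing a power of 2, here is a small sketch. It assumes only the 64-element maximum mentioned above; the helper names are made up for illustration and do not correspond to any SimpleV instruction.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxVl = 64;  // maximum vector length quoted in the thread

// Largest power of two <= min(remaining, kMaxVl): no wasted lanes,
// but possibly more loop iterations.
std::size_t pow2_vl_round_down(std::size_t remaining) {
    assert(remaining > 0);
    std::size_t limit = remaining < kMaxVl ? remaining : kMaxVl;
    std::size_t vl = 1;
    while (vl * 2 <= limit) vl *= 2;
    return vl;
}

// Smallest power of two >= remaining, capped at kMaxVl: lanes past
// `remaining` would have to be masked off and still cost time.
std::size_t pow2_vl_round_up(std::size_t remaining) {
    assert(remaining > 0);
    std::size_t vl = 1;
    while (vl < remaining && vl < kMaxVl) vl *= 2;
    return vl;
}
```

For 35 remaining elements, rounding down gives 32 and rounding up gives 64; the second is the case that would run slower on a hardware-level loop.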

Also, you mention "gather register-to-register moves". While in hardware it makes sense to group it with gather loads, from the software point of view such a connection is not required, and the terminology used is more shuffle/swizzle.

Yup!

penzn mentioned this issue Jun 13, 2020

Hardware support and priorities #7

Open

padenot mentioned this issue Nov 22, 2021

Planar-only considered harmful WebAudio/web-audio-api#2458

Open
