Scatter and gather #12
Comments
I am surprised to hear it described as a must; I haven't yet seen an application that really required it. |
There seems to be a misconception here: as developers, we do not know the SIMD width. For instance, one way to emulate scatter (or gather, for that matter) is to implement a full in-register transposition. Even the extract pattern for scatter would be problematic, as the extract index would most likely be an immediate (and certainly is on most architectures). So here, either you completely unroll the loop at compile time and check for each index that it is less than the actual width, or we do the extract with runtime indices and hope that code generation sees that the index is actually a compile-time constant... Neither solution sounds appealing. |
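To make the first option concrete, here is a minimal sketch of the "unroll completely and check each index against the actual width" approach. Everything in it is an assumption for illustration: `Reg`, `extract_lane`, `scatter` and `kMaxLanes` are hypothetical names, and a `std::array` stands in for a real SIMD register so that the lane index can only be supplied as a template argument, mimicking an immediate operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed upper bound on the number of lanes (hypothetical).
constexpr std::size_t kMaxLanes = 16;

// Stand-in for a SIMD register; real code would use a vendor vector type.
template <typename T>
using Reg = std::array<T, kMaxLanes>;

// Stand-in for an extract instruction: the lane is a compile-time
// constant, like an immediate operand.
template <std::size_t Lane, typename T>
T extract_lane(const Reg<T>& r) {
    return std::get<Lane>(r);
}

// Fully unrolled over every possible lane; each store is guarded by a
// runtime comparison against the actual width.
template <std::size_t... Lanes>
void scatter_impl(float* base, const Reg<int>& idx, const Reg<float>& v,
                  std::size_t actual_lanes, std::index_sequence<Lanes...>) {
    int unused[] = {0, (Lanes < actual_lanes
                            ? (base[extract_lane<Lanes>(idx)] =
                                   extract_lane<Lanes>(v), 0)
                            : 0)...};
    (void)unused;
}

// Stores v[i] to base[idx[i]] for every lane i below actual_lanes.
void scatter(float* base, const Reg<int>& idx, const Reg<float>& v,
             std::size_t actual_lanes) {
    assert(actual_lanes <= kMaxLanes);
    scatter_impl(base, idx, v, actual_lanes,
                 std::make_index_sequence<kMaxLanes>{});
}
```

The cost is visible in the sketch: the code size scales with `kMaxLanes` rather than with the width actually available, and every lane pays for a comparison.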
Yes, to be clear: having the function could allow us to compare the runtime value against a small set of candidates, and use the corresponding code pregenerated for each. Which raises an interesting question: is there some abstraction we can provide that allows developers to know that SVE will always have n*128 bits, and x86 will have {1,2,4}x128? |
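A rough sketch of that dispatch idea, assuming a candidate set of 128, 256 and 512 bits. The names `gather_kernel` and `gather_dispatch` are hypothetical, and a scalar loop stands in for the real per-width SIMD code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel, instantiated ("pregenerated") once per candidate
// width. The inner scalar loop is a placeholder for real SIMD code.
template <std::size_t WidthBits>
void gather_kernel(const float* base, const int* idx, float* out,
                   std::size_t n) {
    constexpr std::size_t lanes = WidthBits / 32;  // float lanes per vector
    for (std::size_t i = 0; i < n; i += lanes) {
        for (std::size_t l = 0; l < lanes && i + l < n; ++l) {
            out[i + l] = base[idx[i + l]];
        }
    }
}

// Compares the runtime width against a small set of candidates and runs
// the matching instantiation. Returns false when no candidate matches.
bool gather_dispatch(std::size_t runtime_width_bits, const float* base,
                     const int* idx, float* out, std::size_t n) {
    // Assumption taken from the discussion: widths are multiples of 128.
    assert(runtime_width_bits % 128 == 0);
    switch (runtime_width_bits) {
        case 128: gather_kernel<128>(base, idx, out, n); return true;
        case 256: gather_kernel<256>(base, idx, out, n); return true;
        case 512: gather_kernel<512>(base, idx, out, n); return true;
        default:  return false;  // e.g. 384 bits: no pregenerated code
    }
}
```

A width such as 384 bits satisfies the "multiple of 128" guarantee but still falls through to the default case, which is why knowing the exact set of possible widths matters for this approach.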
For Arm and x86 ISAs it would be perfectly legal to say that the maximum width is always a multiple of 128 bits, though I am not sure how that would map to RISC-V. |
According to the RISC-V V spec (https://riscv.github.io/documents/riscv-v-spec/riscv-v-spec.pdf#_implementation_defined_constant_parameters), the maximum width should be a power of 2 larger than or equal to 32 bits. (EDIT: I got confused in a previous version of this message.) Only SVE does not require the maximum width to be a power of 2. |
There's also SimpleV, a WIP extension on OpenPower that guarantees availability of any vector length from 1 to 64 (not limited to powers of 2, so e.g. 35 is a valid vector length), and allows (like RISC-V V) the length to be set dynamically. It supports gather-load, scatter-store, and gather register-to-register moves. |
Thanks @programmerjake, I was not aware of SimpleV. To me, the interesting point is the guarantee that a vector length of 64 is available. So on SimpleV, we can force the vl to a power of 2 if required. Also, you mention "gather register-to-register moves". While in hardware it makes sense to group them with gather loads, from the software point of view such a connection is not required, and the terminology used is more shuffle/swizzle. |
Some additional comments on SimpleV on Libre-SOC's mailing list: |
Yup! We basically picked 64 as the max since the general purpose integer registers are 64-bits wide allowing 1 predicate bit per vector element.
Yup, though if you're forcing VL to be bigger than necessary just so it's a power of 2, it will probably run slower, since it's implemented using a hardware-level loop over vector elements.
Yup! |
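To put a number on the padding cost discussed above, here is a small sketch that rounds a requested vector length up to the next power of 2 within the 1 to 64 range mentioned for SimpleV. The helper names `round_vl_to_pow2` and `padding_elements` are made up for this example, and the assumption that cost grows with each padded element follows only from the "hardware-level loop over vector elements" remark:

```cpp
#include <cassert>
#include <cstddef>

// Maximum vector length quoted in the thread for SimpleV.
constexpr std::size_t kMaxVl = 64;

// Smallest power of 2 that is >= n, for n in [1, kMaxVl].
std::size_t round_vl_to_pow2(std::size_t n) {
    assert(n >= 1 && n <= kMaxVl);
    std::size_t vl = 1;
    while (vl < n) {
        vl <<= 1;
    }
    return vl;
}

// Extra elements the hardware loop would iterate over purely because of
// the rounding.
std::size_t padding_elements(std::size_t n) {
    return round_vl_to_pow2(n) - n;
}
```

For the vector length of 35 used as an example earlier, rounding gives 64, so 29 of the 64 element iterations would be padding.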
As @Maratyszcza and @lemaitre point out in #7, we should consider scatter and gather operations. This is an issue to track that.
Potential topics to discuss: