Concerns about amount of allocated opcode space #47
Comments
Let me start by stating two observations:
Here are some results on code size and "performance" using the Embench benchmark suite and a prototype compiler. Results will improve as the compiler matures (you can already see some clearly bad uses of Zilsd and missed opportunities).
I agree this is consistent with the existing instructions, but this new instruction doesn't necessarily need to follow the same inefficient encoding. I'd imagine a 5-bit scaled immediate would be sufficient for the majority of cases? Can you run objdump on the code generated by your prototype compiler to create a histogram of the immediates it uses?
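As a sketch of the requested histogram, something like the following could be run over `objdump -d` output. The mnemonics and regex here are illustrative assumptions (Zilsd reuses the `ld`/`sd` mnemonics on RV32, but the exact disassembly format depends on the prototype toolchain), and `fits_scaled_5bit` encodes the 5-bit scaled-immediate proposal from the comment above, not anything from the spec:

```python
import re
from collections import Counter

# Match "ld rd,imm(rs1)" / "sd rs2,imm(rs1)" lines in objdump -d output.
# Adjust the pattern to whatever the Zilsd prototype toolchain actually emits.
PATTERN = re.compile(r'\b(?:ld|sd)\s+\w+,\s*(-?\d+)\(\w+\)')

def offset_histogram(disassembly: str) -> Counter:
    """Count how often each immediate offset appears in ld/sd instructions."""
    return Counter(int(m.group(1)) for m in PATTERN.finditer(disassembly))

def fits_scaled_5bit(offset: int) -> bool:
    """Would this offset fit a 5-bit unsigned immediate scaled by 8 (0..248)?"""
    return offset % 8 == 0 and 0 <= offset // 8 < 32

# Tiny hypothetical disassembly fragment, just to show the mechanics.
sample = """
  ld a0,16(sp)
  sd a2,248(sp)
  ld a4,-8(s0)
"""
hist = offset_histogram(sample)
covered = sum(n for off, n in hist.items() if fits_scaled_5bit(off))
print(hist, covered)
```

Running this over the Embench binaries (or, better, some larger project) would show directly what fraction of pair accesses a narrower scaled field covers.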
These numbers look good for some of those benchmarks, but I'd like to know which ISA string this was built with. Since you use the entire Zcf opcode space, this extension is incompatible with Zcmp, and I would imagine push/pop has a larger impact on this benchmark overall. I am also very surprised by the cubic numbers: looking at the code, it only performs floating-point operations, so I assume you were building for soft-float?
I should have shared the ISA string. Baseline is
The benchmarks with a very high code-size reduction are those with high exposure to double. I do not have statistics for the immediate distribution, but Embench would clearly not be representative here (in fact, most of its benchmarks are simply too small to have a realistic immediate distribution). Please also understand that the specification is currently in the final phases of architecture review. I am happy to get questions and input in all phases, but it is best to provide them during the internal review period, which is now long past.
In general I think this extension makes a lot of sense, but I am slightly concerned about how much opcode space is being used here.
While I see that just using the "double-word" encoding makes a lot of sense from a simplicity point of view, it burns a lot of opcode space: do we really need a 12-bit immediate for the offset?
Additionally, that immediate is unscaled even though it only really makes sense to use it for multiples of 8, wasting 3 bits of the encoding.
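To make the scaling argument concrete, here is a minimal sketch of encoding a doubleword offset as a scaled immediate. The 5-bit width is the one floated in this thread, not anything from the spec; the point is only that for 8-byte accesses the low 3 offset bits are always zero, so an unscaled field spends 3 bits encoding known zeros:

```python
# Sketch: scaled doubleword offsets. Storing offset >> 3 instead of the raw
# offset gives the same reach with 3 fewer encoding bits.

def encode_scaled(offset: int, bits: int = 5) -> int:
    """Encode an 8-byte-aligned offset into a `bits`-wide scaled immediate."""
    assert offset % 8 == 0, "doubleword offsets are multiples of 8"
    imm = offset >> 3
    assert 0 <= imm < (1 << bits), "offset out of range for this field width"
    return imm

def decode_scaled(imm: int) -> int:
    """Recover the byte offset from the scaled immediate."""
    return imm << 3

# A 5-bit scaled field reaches 0..248 in steps of 8 -- the same reach as an
# 8-bit unscaled field would need.
print(decode_scaled(encode_scaled(248)))  # -> 248
```

Whether 5 bits is actually enough is exactly what the offset histogram above would settle; the mechanism itself is the same one RVC already uses for `c.ldsp`-style immediates.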
Do you have any data showing which immediate values are used when building some larger projects? Inside loops I'd imagine the offset to be very small, since the base register would be modified, and for stack loads/stores the most common offsets would also be quite small (and there is push/pop, which replaces much of the ldp/stp you see in AArch64 function prologues/epilogues).
I am also not sure this extension needs compressed opcodes - is it really that common? I imagine you have a compiler prototype that can show how often it is being used?
For the compressed instructions we would end up using essentially all the remaining encodings freed up by disabling Zcf, which seems like quite a large cost for what I would expect to be a rather small code-size improvement.
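Some back-of-the-envelope arithmetic illustrates the scale of that concern. This is a deliberately rough model of the RVC space: it treats each (quadrant, funct3) pair as one uniform major slot and takes the four Zcf slots (C.FLW/C.FSW in quadrant 00, C.FLWSP/C.FSWSP in quadrant 10) from the base C extension's RV32 layout; real slots are subdivided further, so treat the fraction as indicative only:

```python
# Rough RVC encoding-space arithmetic. The three compressed quadrants
# (inst[1:0] = 00, 01, 10) each have 8 funct3 major slots; each slot spans
# the remaining 11 instruction bits.
SLOT = 2 ** 11            # encodings per (quadrant, funct3) slot
total = 3 * 8 * SLOT      # all 16-bit encodings outside the 11 (uncompressed) quadrant
zcf = 4 * SLOT            # C.FLW, C.FSW, C.FLWSP, C.FSWSP on RV32

print(zcf, total, zcf / total)  # 8192 of 49152 encodings, i.e. 1/6
```

If the compressed Zilsd forms consume roughly those four slots, that is a sixth of the 16-bit space spent on one extension, which is the trade-off being questioned here.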