Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate codegen for Vector128.GetUpper #59838

Closed
tannergooding opened this issue Sep 30, 2021 · 8 comments
Closed

Investigate codegen for Vector128.GetUpper #59838

tannergooding opened this issue Sep 30, 2021 · 8 comments
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration tenet-performance Performance related issue
Milestone

Comments

@tannergooding
Copy link
Member

tannergooding commented Sep 30, 2021

As per #53450 (comment), it would like be beneficial to investigate changing the codegen of Vector128.GetUpper to use DUP rather than EXT on ARM64.

category:cq
theme:codegen

@tannergooding tannergooding added arch-arm64 tenet-performance Performance related issue labels Sep 30, 2021
@tannergooding tannergooding self-assigned this Sep 30, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Sep 30, 2021
@ghost
Copy link

ghost commented Sep 30, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

As per #53450 (comment), it would like be beneficial to investigate changing the codegen of Vector128.GetUpper to use DUP rather than EXT on ARM64.

Author: tannergooding
Assignees: tannergooding
Labels:

arch-arm64, tenet-performance, area-CodeGen-coreclr, untriaged

Milestone: -

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Sep 30, 2021
@JulieLeeMSFT
Copy link
Member

Cc @dotnet/jit-contrib

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Sep 30, 2021
@tannergooding tannergooding modified the milestones: 7.0.0, 8.0.0 Aug 1, 2022
@TIHan TIHan assigned TIHan and unassigned tannergooding Feb 2, 2023
@TIHan
Copy link
Member

TIHan commented Mar 8, 2023

@TamarChristinaArm , I have a question, looking at the A78 Optimization Guide, EXT D-form and DUP D-form both have the same latency and throughput, is this indeed true? If so, then there wouldn't be a benefit to emitting DUP instead of EXT here.

@TIHan TIHan added the needs-author-action An issue or pull request that requires more info or actions from the author. label Mar 8, 2023
@ghost
Copy link

ghost commented Mar 8, 2023

This issue has been marked needs-author-action and may be missing some important information.

@TamarChristinaArm
Copy link
Contributor

TamarChristinaArm commented Mar 8, 2023

@TIHan yeah they have the same latency and throughput across all Arm Cortex and Neoverse cores.
The only improvement here is if one does Vector128.GetUpper().AsScalar() in which case the UMOV is faster than EXT + UMOV.

@tannergooding
Copy link
Member Author

GetUpper/Lower() followed be a GetElement (including ToScalar()) seems like a general morph optimization.

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 8, 2023
@TamarChristinaArm
Copy link
Contributor

GetUpper/Lower() followed be a GetElement (including ToScalar()) seems like a general morph optimization.

Indeed, we also do some of the following, like Vector128<uint>.GetUpper().ToScalar() >> 16 would instead generate an UMOV on shorts instead by remapping the index. I don't know how common that is in .NET code, we typically see it during some image/video processing workloads. Just mentioning in case useful.

@TIHan
Copy link
Member

TIHan commented Mar 9, 2023

Thank you for the information @TamarChristinaArm .

Closing this issue as we confirmed the optimization won't benefit. I'll leave it up to @tannergooding to create another issue regarding Vector128.GetUpper().AsScalar() if we plan to do that.

@TIHan TIHan closed this as completed Mar 9, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

4 participants