Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: gathervector extends #103370

Merged
merged 7 commits into from
Jun 14, 2024
Merged

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Jun 12, 2024

Does not include GatherVectorWithByteOffsets.

There were handful of reviewed APIs that did not make sense, as they were all zero/sign extending from 32bits to 32bits.
eg:
Vector<int> GatherVectorUInt32ZeroExtend(Vector<int> mask, uint* address, Vector<int> indices);
For these, there was no matching C++ API and the SVE instruction it pointed to was invalid (as you can't extend 32bits to 32bits). Not quite sure how they got into the API. I did not include any of these in this PR.

As per #103159, all 32bit address APIs have been commented out.

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jun 12, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 12, 2024

@dotnet/arm64-contrib @kunalspathak : This one is ready.

@a74nh
Copy link
Contributor Author

a74nh commented Jun 13, 2024

stress test results. Failures look like predicate issues and the same as those for gathervector()

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for writing reusable test templates :) One comment about the memory load flag.

@@ -76,6 +76,16 @@ HARDWARE_INTRINSIC(Sve, FusedMultiplySubtract,
HARDWARE_INTRINSIC(Sve, FusedMultiplySubtractBySelectedScalar, -1, 4, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fmls, INS_sve_fmls}, HW_Category_SIMDByIndexedElement, HW_Flag_Scalable|HW_Flag_HasImmediateOperand|HW_Flag_HasRMWSemantics|HW_Flag_FmaIntrinsic|HW_Flag_LowVectorOperation)
HARDWARE_INTRINSIC(Sve, FusedMultiplySubtractNegated, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_fnmls, INS_sve_fnmls}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_EmbeddedMaskedOperation|HW_Flag_HasRMWSemantics|HW_Flag_LowMaskedOperation|HW_Flag_FmaIntrinsic|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(Sve, GatherVector, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_sve_ld1w, INS_sve_ld1d, INS_sve_ld1d, INS_sve_ld1w, INS_sve_ld1d}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, GatherVectorByteZeroExtend, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize earlier but all the load/gatherload need to be marked as HW_Category_MemoryLoad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize earlier but all the load/gatherload need to be marked as HW_Category_MemoryLoad

That's going to require some debugging, as they don't fit a normal style of a load API (I get an assert failure).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hwintrinsic.cpp has a if ((category == HW_Category_MemoryLoad) && op1->OperIs(GT_CAST)) check. This is only checked for memory loads with a single argument. All other types of memory loads are unchecked.

I moved this code so that it is checked for all HW_Category_MemoryLoad nodes. I reused OperIsMemoryLoad() but that means it needs doing after the ret node is created.

Inside OperIsMemoryLoad() I'm not sure what to do for the cases where the load is a vector of addresses. My thought was that we should not set paddr in these cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hwintrinsic.cpp has a if ((category == HW_Category_MemoryLoad) && op1->OperIs(GT_CAST)) check. This is only checked for memory loads with a single argument. All other types of memory loads are unchecked.

can we leave it as is then and not touch it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we leave it as is then and not touch it?

My assumption was this was a bug. The check needs to be added for all loads where the address is not in arg1.

What happens if the address is a GT_CAST and it is not removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the GT_CAST changes for now. I'll raise a bug so we can double check it later. PR is all still working.

@kunalspathak
Copy link
Member

/ba-g the failures are known and test time out.

@kunalspathak kunalspathak merged commit 0d60428 into dotnet:main Jun 14, 2024
160 of 167 checks passed
@a74nh a74nh deleted the gatherload_extend_github branch June 17, 2024 09:57
if (varTypeIsSIMD(addr->gtType))
{
// The address is a vector of addresses.
// Return true, but do not set pAddr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a74nh - do you remember why we do not set pAddr here? We get AV in value numbering because of that.

Sve_GatherVector_Bases_double_ulong was failing for me with below call stack:

>	clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberAddExceptionSetForIndirection(GenTree * tree, GenTree * baseAddr) Line 13618	C++
 	clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberHWIntrinsic(GenTreeHWIntrinsic * tree) Line 12461	C++
 	clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberTree(GenTree * tree) Line 12056	C++
 	clrjit_universal_arm64_x64.dll!Compiler::fgValueNumberBlock(BasicBlock * blk) Line 10554	C++
 	clrjit_universal_arm64_x64.dll!Compiler::fgValueNumber() Line 10340	C++
 	clrjit_universal_arm64_x64.dll!CompilerPhaseWithStatus::DoPhase() Line 132	C++

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #103606

@github-actions github-actions bot locked and limited conversation to collaborators Jul 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime.Intrinsics arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants