Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Added SVE GetFfr, SetFfr, LoadVectorFirstFaulting, GatherVectorFirstFaulting #104502

Merged
merged 47 commits into from
Jul 27, 2024

Conversation

TIHan
Copy link
Member

@TIHan TIHan commented Jul 6, 2024

Contributes to #99957

Adds

  • GetFfr
  • SetFfr
  • LoadVectorFirstFaulting
  • GatherVectorFirstFaulting

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

@TIHan TIHan mentioned this pull request Jul 6, 2024
/// svbool_t svrdffr()
/// RDFFR Presult.B
/// </summary>
public static unsafe Vector<byte> GetFfr() => GetFfr();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannergooding @terrajobst I think the APIs for GetFfr need another look. We can't provide overloads by return type, so if we were to expose a way to do that, we need to use suffix versions, i.e. GetFfrByte , GetFfrUInt16, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you've done is actually how API review discussed and approved it, it just didn't get captured in the approval notes.

We have a standard convention of suffixing hwintrinsic APIs with the .NET type name where we need to disambiguate, such as ConvertToSingle or LoadVector128. This will always be a Vector<T> return and so we only needed to encode the T, such as Byte, Int16, Int32, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TIHan - can you please update or add a comment to the original issue #94004 the new names that we decided here? It will help us to know the final API name, rather than searching for this comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about it, we do not have individual type variants in C++ APIs. wondering why we have them at all? https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=svrdffr. There should be just 1 API for this IMO - Vector<byte> GetFfr()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering why we have them at all?

C/C++ doesn't use templates and has an explicit mask register type (svbool_t). By default you'd presume that svbool_t is byte, but that isn't always the case and how that svbool_t is interpreted depends on the consuming operation. For example CNTP which finds the number of active elements set reads every nth bit:

if !HaveSVE() && !HaveSME() then UNDEFINED;
constant integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Rd);

CheckSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
bits(PL) mask = P[g, PL];
bits(PL) operand = P[n, PL];
bits(64) sum = Zeros(64);

for e = 0 to elements-1
    if ActivePredicateElement(mask, e, esize) && ActivePredicateElement(operand, e, esize) then
        sum = sum + 1;
X[d, 64] = sum;

boolean ActivePredicateElement(bits(N) pred, integer e, integer esize)
    assert esize IN {8, 16, 32, 64, 128};
    integer n = e * (esize DIV 8);
    assert n >= 0 && n < N;
    return pred<n> == '1';

And so, a svbool_t for int and for byte are both the same size, which is 1/8th the size of Vector<T> (thus for a 2048-bit Vector, we have 256-bit VectorMask<T>). Where VectorMask<byte> means check every bit and VectorMask<int> means check every 4th bit.

Since .NET uses generics and doesn't have an explicit mask type, we want to explicitly expose the underlying T so that devs can take the mask produced by GetFfr and pass it down to other APIs without casting, which is the same thing they could do in C/C++.

src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
@TIHan TIHan marked this pull request as ready for review July 8, 2024 21:14
@TIHan
Copy link
Member Author

TIHan commented Jul 8, 2024

@dotnet/arm64-contrib @kunalspathak @tannergooding this is ready.

stress test results:

C:\Users\dotnetperfuser\Desktop\work>py stress_tester.py windows.arm64.Checked/Tests/Core_Root/corerun.exe windows.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll FirstFaulting
Starting test: windows.arm64.Checked/Tests/Core_Root/corerun.exe windows.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll FirstFaulting
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_float_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_int_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_uint_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_float_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_int_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_uint_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_double_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_long_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_ulong_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_double_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_long_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_ulong_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_float() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_double() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_sbyte() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_short() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_int() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_long() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_byte() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_ushort() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_uint() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_ulong() : 9
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@TIHan
Copy link
Member Author

TIHan commented Jul 8, 2024

Ok, this isn't quite as ready as I would like. I had my test wrong for GatherVector.

@TIHan
Copy link
Member Author

TIHan commented Jul 8, 2024

@dotnet/arm64-contrib @kunalspathak @tannergooding Now this is ready. :)

stress results:

Starting test: windows.arm64.Checked/Tests/Core_Root/corerun.exe windows.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll FirstFaulting
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_float_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_int_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_uint_int() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_float_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_int_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_uint_uint() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_double_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_long_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_ulong_long() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_double_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_long_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorFirstFaulting_Indices_ulong_ulong() : 13
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_float() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_double() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_sbyte() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_short() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_int() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_long() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_byte() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_ushort() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_uint() : 9
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorFirstFaulting_ulong() : 9
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Jul 9, 2024
@@ -105,6 +105,7 @@ HARDWARE_INTRINSIC(Sve, GatherPrefetch64Bit,
HARDWARE_INTRINSIC(Sve, GatherPrefetch8Bit, -1, -1, false, {INS_sve_prfb, INS_sve_prfb, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Special, HW_Flag_Scalable|HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_HasImmediateOperand|HW_Flag_HasEnumOperand)
HARDWARE_INTRINSIC(Sve, GatherVector, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1w, INS_sve_ld1w, INS_sve_ld1d, INS_sve_ld1d, INS_sve_ld1w, INS_sve_ld1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, GatherVectorByteZeroExtend, -1, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, GatherVectorFirstFaulting, -1, 3, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ldff1w, INS_sve_ldff1w, INS_sve_ldff1d, INS_sve_ldff1d, INS_sve_ldff1w, INS_sve_ldff1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_SpecialCodeGen|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_SpecialSideEffectMask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this have SpecialEffects flag? Having memoryload should be good enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It causes a side-effect. If I do not have that flag on there, this API can get dead-code eliminated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is different for this API from other API? Do you have an example where this gets dead-code eliminated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, it's simple to write something that will be dead-code eliminated:

void SomeMethod()
{
    var notUsingResult = GatherVectorFirstFaulting*; // or LoadVectorFirstFaulting
}

@@ -136,6 +145,7 @@ HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToInt64,
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt16, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt32, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorByteZeroExtendToUInt64, -1, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sve_ld1b, INS_invalid, INS_invalid}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation)
HARDWARE_INTRINSIC(Sve, LoadVectorFirstFaulting, -1, 2, true, {INS_sve_ldff1b, INS_sve_ldff1b, INS_sve_ldff1h, INS_sve_ldff1h, INS_sve_ldff1w, INS_sve_ldff1w, INS_sve_ldff1d, INS_sve_ldff1d, INS_sve_ldff1w, INS_sve_ldff1d}, HW_Category_MemoryLoad, HW_Flag_Scalable|HW_Flag_ExplicitMaskedOperation|HW_Flag_LowMaskedOperation|HW_Flag_SpecialCodeGen|HW_Flag_SpecialSideEffectMask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. why do we need SpecialEffect flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this API causes a side-effect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the MemoryLoad is going to force it to be a global ref, so it should take care of it?

Copy link
Member Author

@TIHan TIHan Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In liveness, there is this logic:

#ifdef FEATURE_HW_INTRINSICS
            case GT_HWINTRINSIC:
            {
                GenTreeHWIntrinsic* hwintrinsic = node->AsHWIntrinsic();
                NamedIntrinsic      intrinsicId = hwintrinsic->GetHWIntrinsicId();

                if (hwintrinsic->OperIsMemoryStore())
                {
                    // Never remove these nodes, as they are always side-effecting.
                    break;
                }
                else if (HWIntrinsicInfo::HasSpecialSideEffect(intrinsicId))
                {
                    // Never remove these nodes, as they are always side-effecting
                    // or have a behavioral semantic that is undesirable to remove
                    break;
                }

                fgTryRemoveNonLocal(node, &blockRange);
                break;
            }
#endif // FEATURE_HW_INTRINSICS

So, this is what I believe I encountered without the flag. The nodes would get removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handling of GatherVectorFirstFaulting should be exactly similar to GatherVectorByteZeroExtend or any other GatherVector* APIs in terms of liveness, etc. Can you double check why other APIs are not removed while this one is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FirstFaulting is side-effectful because it sets the FFR register. The non-FirstFaulting APIs do not do this, therefore, we don't care if they get dead-code eliminated.

src/coreclr/jit/hwintrinsiclistarm64sve.h Outdated Show resolved Hide resolved
src/coreclr/jit/fgdiagnostic.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/gentree.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/hwintrinsiccodegenarm64.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/hwintrinsiccodegenarm64.cpp Outdated Show resolved Hide resolved

public void RunStructLclFldScenario()
{
TestLibrary.TestFramework.BeginScenario(nameof(RunStructLclFldScenario));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is different in this template that other templates do not have?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to have a new validation method for first-faulting and I also need to initialize the inBounded.

If we really want to start re-using stuff from other templates without all the duplication, we shouldn't continue to write tests this way. We should try to extract more and more stuff out into helpers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knowing the original template that was referenced for creating this one helps me in diffing to spot what was the new addition that was added in the new template.

break;
}
}
var succeeded = Helpers.CheckGatherVectorBehavior<{RetBaseType}, {ExtendedElementType}, {Op3BaseType}>(firstOp, secondOp, thirdOp, result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was there anything we were missing for this test, or is the change just trying to reuse the new logic that is written in Helpers.cs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-using the logic. If we can extract out how we validate/check the APIs in the helpers, we should.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Please run all the tests (with stress) that uses this template and confirm that they all pass.

@TIHan
Copy link
Member Author

TIHan commented Jul 9, 2024

There should be just 1 API for this IMO - Vector GetFfr()

I don't disagree, the user can always convert it to the Vector<T> of their choosing like I did before in the tests. But it's up to API reviewers to determine this.

@kunalspathak
Copy link
Member

kunalspathak commented Jul 26, 2024

@TIHan - I have pushed a change to handle the intrinsics correctly with FFR register handling. Please validate the tests with these changes, specifically around ffr getting restored after function call.

return TFault.One;
}

var index = int.CreateChecked(indices[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this validation work? I was hoping that we check if the address that we want to read for ith lane is invalid and then we would say hasFaulted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not checking the bases which contain the addresses, the CheckGatherVectorBasesFirstFaultingBehavior is the one that looks at the address.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you point me to the line of code that does it?

return (maskResult != T.Zero) ? result : falseResult;
}

private static T ConditionalSelectTrueResult<T>(T maskResult, T result, T trueResult) where T : INumberBase<T>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConditionalSelectTrueResult

I don't see place where we are validating this requirement:

Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.

"and are set to zero in the destination vector.".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its ok to add this in a follow-up PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a follow-up PR would be fine. We should do this for all test cases that use ConditionalSelectTrueResult.


Vector<{VectorBaseType}> loadMask = Sve.CreateTrueMask{VectorBaseType}(SveMaskPattern.All);

Sve.SetFfr(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return TFault.One;
}

var index = int.CreateChecked(indices[i]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not checking the bases which contain the addresses, the CheckGatherVectorBasesFirstFaultingBehavior is the one that looks at the address.

return (maskResult != T.Zero) ? result : falseResult;
}

private static T ConditionalSelectTrueResult<T>(T maskResult, T result, T trueResult) where T : INumberBase<T>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a follow-up PR would be fine. We should do this for all test cases that use ConditionalSelectTrueResult.

@kunalspathak
Copy link
Member

We should do this for all test cases that use ConditionalSelectTrueResult.

Yes, for the FirstFaulting and probably NonFaulting too.

@TIHan - I have pushed a change to handle the intrinsics correctly with FFR register handling. Please validate the tests with these changes, specifically around ffr getting restored after function call.

I actually went ahead and did the validation. Spotted few issues that I pushed on this PR. All tests are passing - https://gist.github.com/kunalspathak/ad873d6b11f8e9959c3ecdf1df992e85

@kunalspathak
Copy link
Member

I think a follow-up PR would be fine. We should do this for all test cases that use ConditionalSelectTrueResult.

#105580

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@kunalspathak
Copy link
Member

/ba-g Build Analysis seems to be stuck

@kunalspathak kunalspathak merged commit 991ae97 into dotnet:main Jul 27, 2024
150 of 171 checks passed
@jkotas
Copy link
Member

jkotas commented Jul 27, 2024

This was merged with number of failures that are clearly caused by the PR. For example:

   System.InvalidOperationException : Failed to mark page as a poison page using mprotect with error :22.
      Stack Trace:
        /_/src/libraries/Common/tests/TestUtilities/System/Buffers/BoundedMemory.Unix.cs(159,0): at System.Buffers.BoundedMemory.AllocHGlobalHandle.Allocate(IntPtr byteLength, PoisonPagePlacement placement)

in number of libraries legs.

I am reverting the change.

@TIHan
Copy link
Member Author

TIHan commented Jul 27, 2024

I see this is related to the BoundedMemory Unix changes. I think we should just have a separate impl for the FirstFaulting tests to reduce risk of other tests failing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants