ARM64-SVE: Fix conditional select for Zeroing predicates #102904 #105737

Merged
merged 9 commits into from
Aug 13, 2024

Conversation

a74nh
Contributor

@a74nh a74nh commented Jul 31, 2024

Fixes #102904

Some SVE instructions use zeroing predication (Pg/Z) instead of merging predication (Pg/M).
This means they cannot be wrapped in a conditional select in the standard way: if they are, any inactive lanes will end up with zeros instead of the false value.

For example:
Sve.ConditionalSelect(mask, Sve.LoadVector(address), falseOp);

This cannot be converted into a single load:
LD1W zFalseVal, pMask/Z, xAddress

Instead two instructions are required:

LD1W zDest, pTrue/Z, xAddress
SEL zDest, pMask/M, zDest, zFalseVal

Fix by marking all relevant instructions and disabling the conditional select optimisation for them.
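To illustrate why the folded form is wrong, here is a minimal C++ sketch that models the lanes in scalar code. It is not JIT code: the fixed 4-lane width and the names `LoadZeroing`, `Select`, `FoldedLoad`, `LoadThenSelect` and `Demo` are all hypothetical and exist only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model: a fixed 4-lane vector and predicate (real SVE vectors are scalable).
constexpr std::size_t kLanes = 4;
using Vec  = std::array<std::int32_t, kLanes>;
using Pred = std::array<bool, kLanes>;

// Models a load with a zeroing predicate (Pg/Z): inactive lanes become zero.
inline Vec LoadZeroing(const Pred& pg, const std::int32_t* address)
{
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        result[i] = pg[i] ? address[i] : 0;
    }
    return result;
}

// Models SEL: take onTrue where the predicate is set, onFalse elsewhere.
inline Vec Select(const Pred& pg, const Vec& onTrue, const Vec& onFalse)
{
    Vec result{};
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        result[i] = pg[i] ? onTrue[i] : onFalse[i];
    }
    return result;
}

// Incorrect folding: the select mask is pushed into the zeroing load,
// so the false value never reaches the inactive lanes.
inline Vec FoldedLoad(const Pred& mask, const std::int32_t* address)
{
    return LoadZeroing(mask, address);
}

// Two-step form: load under an all-true predicate, then select.
inline Vec LoadThenSelect(const Pred& mask, const std::int32_t* address, const Vec& falseVal)
{
    const Pred allTrue{true, true, true, true};
    return Select(mask, LoadZeroing(allTrue, address), falseVal);
}

// Usage example: lanes 1 and 3 are inactive in the mask.
inline void Demo()
{
    const Vec  memory{10, 20, 30, 40};
    const Pred mask{true, false, true, false};
    const Vec  falseVal{-1, -2, -3, -4};

    const Vec folded  = FoldedLoad(mask, memory.data());
    const Vec twoStep = LoadThenSelect(mask, memory.data(), falseVal);

    assert(folded[1] == 0);   // zero, not the false value
    assert(twoStep[1] == -2); // the false value is preserved
}
```

In this model the two forms agree only when the false value is all zeros, which is the one case where folding into a Pg/Z instruction stays safe.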

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 31, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 31, 2024
@a74nh
Contributor Author

a74nh commented Jul 31, 2024

Stress test results:

❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_GatherVector
Starting test: /home/alahay01/dotnet/runtime_sve/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_GatherVector
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Bases_double_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_float_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_float_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_double_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_double_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVector_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorByteZeroExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16SignExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt16WithByteOffsetsSignExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32SignExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32WithByteOffsetsSignExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32WithByteOffsetsSignExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32WithByteOffsetsSignExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorInt32WithByteOffsetsSignExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorSByteSignExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16WithByteOffsetsZeroExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt16ZeroExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32WithByteOffsetsZeroExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Bases_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Bases_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_ulong_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorUInt32ZeroExtend_Indices_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_float_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_int_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_uint_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_float_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_int_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_uint_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_double_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_long_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_ulong_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_double_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_long_ulong() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_GatherVectorWithByteOffsets_ulong_ulong() : 2
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
❯ ~/stress_tester.py $CORE_ROOT/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_LoadVectorNonFaulting
Starting test: /home/alahay01/dotnet/runtime_sve/artifacts/tests/coreclr/linux.arm64.Checked/Tests/Core_Root/corerun ./artifacts/tests/coreclr/linux.arm64.Checked/JIT/HardwareIntrinsics/HardwareIntrinsics_Arm_ro/HardwareIntrinsics_Arm_ro.dll Sve_LoadVectorNonFaulting
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_float() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_double() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_sbyte() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_short() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_int() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_long() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_byte() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_ushort() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_uint() : 2
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_LoadVectorNonFaulting_ulong() : 2
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

@a74nh a74nh marked this pull request as ready for review July 31, 2024 07:50
@a74nh
Contributor Author

a74nh commented Jul 31, 2024

@dotnet/arm64-contrib @kunalspathak

@a74nh a74nh requested a review from tannergooding July 31, 2024 07:51
@a74nh a74nh changed the title ARM64-SVE: Fix conditional select for Zeroing predicates ARM64-SVE: Fix conditional select for Zeroing predicates (#102904) Jul 31, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 9.0.0 milestone Aug 1, 2024
@a74nh a74nh changed the title ARM64-SVE: Fix conditional select for Zeroing predicates (#102904) ARM64-SVE: Fix conditional select for Zeroing predicates #102904 Aug 2, 2024
@amanasifkhalid
Member

ping @tannergooding
@a74nh looks like the recent refactor of hwintrinsiclistarm64sve.h created merge conflicts.

@a74nh
Contributor Author

a74nh commented Aug 5, 2024

> @a74nh looks like the recent refactor of hwintrinsiclistarm64sve.h created merge conflicts.

Done! Thanks.

@a74nh
Contributor Author

a74nh commented Aug 12, 2024

ping @tannergooding

@a74nh a74nh requested a review from TIHan August 12, 2024 14:44
Comment on lines +4083 to +4087
// If the nested op uses Pg/Z, then inactive lanes will result in zeros, so can only transform if
// op3 is all zeros.

if (nestedOp1->IsMaskAllBitsSet() &&
    (!HWIntrinsicInfo::IsZeroingMaskedOperation(nestedOp2Id) || op3->IsVectorZero()))
Member

Are there any instructions that only allow Pg/M?

Basically we have:

  • Pg/Z only
  • Pg/M -or- Pg/Z

So I'm wanting to discern if we also have:

  • Pg/M only

Contributor Author

Yes, quite a few.
eg:
https://docsmirror.github.io/A64/2023-09/add_z_p_zz.html
ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

Member

We likely need something that indicates that and avoids containment of zero for that case then, correct?

Contributor Author

If it's <Pg>/M only, then the optimisation will continue to work the same as it does in HEAD (because IsZeroingMaskedOperation() will be false).
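The condition discussed in this thread can be written out as a small standalone predicate. This is only a sketch: `FoldQuery` and `CanFoldIntoConditionalSelect` are hypothetical names, and each boolean stands in for the JIT query named in its comment.

```cpp
// Hypothetical, simplified stand-in for the state checked by the reviewed condition.
struct FoldQuery
{
    bool nestedMaskIsAllTrue; // stands in for nestedOp1->IsMaskAllBitsSet()
    bool nestedOpIsZeroing;   // stands in for HWIntrinsicInfo::IsZeroingMaskedOperation(nestedOp2Id)
    bool falseValueIsZero;    // stands in for op3->IsVectorZero()
};

// Mirrors the shape of the reviewed condition: a merging (Pg/M) operation can
// always be folded, while a zeroing (Pg/Z) operation can be folded only when
// the false value is itself all zeros.
inline bool CanFoldIntoConditionalSelect(const FoldQuery& q)
{
    return q.nestedMaskIsAllTrue && (!q.nestedOpIsZeroing || q.falseValueIsZero);
}
```

With these inputs, an instruction that is not marked as zeroing takes the same path as before the change, which matches the answer above about `<Pg>/M`-only instructions.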

Member

@tannergooding tannergooding left a comment

LGTM assuming we don't have any Pg/M only instructions

@amanasifkhalid amanasifkhalid merged commit 329bff4 into dotnet:main Aug 13, 2024
115 of 117 checks passed
@a74nh a74nh deleted the selectz_github branch August 13, 2024 15:08
@github-actions github-actions bot locked and limited conversation to collaborators Sep 14, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Arm64/Sve: ConditionalSelect(LoadVector*NonFaultingSignExtend*) codegen
5 participants