Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: Add Not, InsertIntoShiftedVector #103725

Merged
merged 4 commits into from
Jun 21, 2024

Conversation

amanasifkhalid
Copy link
Member

Part of #99957. InsertIntoShiftedVector requires a new test template, _SveVecAndScalarOpTest.template. This template is a clone of _SveMasklessUnaryOpTestTemplate.template, except that a second argument of type T is passed to Sve.InsertIntoShiftedVector<T> and handled elsewhere accordingly. I could've implemented InsertIntoShiftedVector without any special codegen, but the path for handling unpredicated instructions with two operands in CodeGen::genHWIntrinsic checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method. Existing instructions seem to be dependent on this order of checking, so I cannot flip the order of the if-elseif-else statement without breaking existing tests, and I wasn't sure if duplicating the RMW logic on the scalable path would be ideal if it's only for one instruction -- if we prefer to do that now, I can get rid of the special path for this intrinsic, and handle the scalable RMW case on the general path.

Not tests:

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_sbyte() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_short() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_int() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_long() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_byte() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ushort() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_uint() : 16
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ulong() : 16
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_sbyte() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_short() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_int() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_long() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_byte() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ushort() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_uint() : 16
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_Not_ulong() : 16
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

InsertIntoShiftedVector tests:

===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_float() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_double() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_sbyte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_short() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_int() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_long() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_byte() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ushort() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_uint() : 7
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_float() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_double() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_sbyte() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_short() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_int() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_long() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_byte() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ushort() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_uint() : 7
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_InsertIntoShiftedVector_ulong() : 7
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

cc @dotnet/arm64-contrib

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jun 19, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checks if the instruction is scalable before checking if it has RMW semantics (which this one does), and thus ends up calling the wrong emitIns* method.

Can you confirm the exact place where this is happening?

/// INSR Ztied1.D, Xop2
/// INSR Ztied1.D, Dop2
/// </summary>
public static unsafe Vector<double> InsertIntoShiftedVector(Vector<double> left, double right) { throw new PlatformNotSupportedException(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API, as per the docs is implementing both INSR - SIMD&FP and INSR - scalar, but it is not clear to me, how we decide which one to pick. @a74nh - any idea?

if (targetReg != op1Reg)
{
assert(targetReg != op2Reg);
GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work with both op1Reg being gpr or SIMD/FP?

Copy link
Member Author

@amanasifkhalid amanasifkhalid Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, though I'm struggling to get the JIT to use a gpr for op1Reg under the stress modes -- I'm only ever getting the "easy" case, where when op1Reg and targetReg differ, op1Reg is already a vector register, so we're moving from a vector reg to a vector reg. Since the first argument in Sve.InsertIntoShiftedVector<T> is of type Vector<T>, we'd expect op1Reg to always be a vector register, right? I could add an assert here clarify this. (Though looking at emitIns_Mov, it does have a path for emitting mov instructions from a gpr to a vector register.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I meant that for op2Reg. Sorry. So ideally, where you have double or float as 2nd argument, we should have op2Reg as SIMD/floating point and otherwise should be gpr. Can you verify that please? For op1Reg it should always be scalable register and we already assert that in emitter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I've added an assert to check this. Stress tests for unoptimized and optimized tests are still passing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind sharing small section of disassembly for both the categories?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all. Here's a snippet from the double tests:

            ldr     q16, [x0]
            ldr     d17, [fp, #0x30]    // [V01 loc0]
            insr    z16.d, d17

And from the uint tests:

            ldr     q16, [x0]
            ldr     w0, [fp, #0x34]     // [V01 loc0]
            insr    z16.s, w0

@amanasifkhalid
Copy link
Member Author

Can you confirm the exact place where this is happening?

Sure; right here. Notice how in the if-else statement, we check if the instruction is scalable before we check if it has RMW semantics. insr has both flags, so we end up calling emitIns_R_R_R for it, when we should be calling emitIns_R_R (and doing any necessary moves beforehand).

@amanasifkhalid
Copy link
Member Author

I made a small change to InsertIntoShiftedVector's test helper to avoid an extra array allocation.

@amanasifkhalid
Copy link
Member Author

@kunalspathak @a74nh does this PR need anything else?

@kunalspathak
Copy link
Member

Sure; right here.

Your change looks good except that I realized that after #103620, we do not need condition anymore for below code because both branches are doing same thing.

if (HWIntrinsicInfo::IsExplicitMaskedOperation(intrin.id))
{
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}
else
{
    // This generates an unpredicated version
    // Implicitly predicated should be taken care above `intrin.op2->IsEmbMaskOp()`
    GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
}

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@amanasifkhalid
Copy link
Member Author

Your change looks good except that I realized that after #103620, we do not need condition anymore for below code because both branches are doing same thing.

Got it, I'll fix that in my next PR.

@amanasifkhalid
Copy link
Member Author

/ba-g Build Analysis blocked by timeouts

@amanasifkhalid amanasifkhalid merged commit 18bc115 into dotnet:main Jun 21, 2024
150 of 167 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-bitwise-not branch June 21, 2024 15:11
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request Jun 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants