Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64-SVE: Add TrigonometricMultiplyAddCoefficient #104697

Merged
merged 6 commits into from
Jul 15, 2024

Conversation

amanasifkhalid
Copy link
Member

Part of #99957. @dotnet/arm64-contrib PTAL, thanks!

Test output:

Starting test: .\Core_Root\corerun.exe .\HardwareIntrinsics_Arm_r\HardwareIntrinsics_Arm_r.dll Sve_TrigonometricMultiplyAddCoefficient
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_TrigonometricMultiplyAddCoefficient_float() : 31
Passed test: _Sve_r::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_TrigonometricMultiplyAddCoefficient_double() : 31
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

Starting test: .\Core_Root\corerun.exe .\HardwareIntrinsics_Arm_ro\HardwareIntrinsics_Arm_ro.dll Sve_TrigonometricMultiplyAddCoefficient
===================Running default===================
------------------- {} -------------------
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_TrigonometricMultiplyAddCoefficient_float() : 31
Passed test: _Sve_ro::JIT.HardwareIntrinsics.Arm._Sve.Program.Sve_TrigonometricMultiplyAddCoefficient_double() : 31
===================Running jitstress===================
------------------- {'JitMinOpts': '1'} -------------------
------------------- {'JitStress': '1'} -------------------
------------------- {'JitStress': '2'} -------------------
------------------- {'JitStress': '1', 'TieredCompilation': '1'} -------------------
------------------- {'JitStress': '2', 'TieredCompilation': '1'} -------------------
------------------- {'TailcallStress': '1'} -------------------
------------------- {'ReadyToRun': '0'} -------------------
===================Running jitstressregs===================
------------------- {'JitStressRegs': '1'} -------------------
------------------- {'JitStressRegs': '2'} -------------------
------------------- {'JitStressRegs': '3'} -------------------
------------------- {'JitStressRegs': '4'} -------------------
------------------- {'JitStressRegs': '8'} -------------------
------------------- {'JitStressRegs': '0x10'} -------------------
------------------- {'JitStressRegs': '0x80'} -------------------
------------------- {'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStressRegs': '0x2000'} -------------------
===================Running jitstress2-jitstressregs===================
------------------- {'JitStress': '2', 'JitStressRegs': '1'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '2'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '3'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '4'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '8'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x10'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x80'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x1000'} -------------------
------------------- {'JitStress': '2', 'JitStressRegs': '0x2000'} -------------------

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

1 similar comment
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Jul 10, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions around test.

src/coreclr/jit/hwintrinsiccodegenarm64.cpp Show resolved Hide resolved
@@ -150,7 +150,7 @@ namespace JIT.HardwareIntrinsics.Arm

public void RunStructFldScenario({TemplateName}BinaryOpTest__{TestName} testClass)
{
var result = {Isa}.{Method}(_fld1, _fld2, {Imm});
var result = {Isa}.{Method}(_fld1, _fld2, Imm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing a test where we pass invalidImm as input. See test.RunBasicScenario_UnsafeRead_InvalidImm in _SveImmUnaryOpTestTemplate.template for example.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I noticed the InvalidImm tests in other templates look like this:

bool succeeded = false;
try
{
    var result = {Isa}.{Method}(
        Unsafe.Read<{Op1VectorType}<{Op1BaseType}>>(_dataTable.inArrayPtr),
        {InvalidImm}
    );
}
catch (ArgumentOutOfRangeException)
{
    succeeded = true;
}

We aren't doing anything with succeeded. Is this intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't doing anything with succeeded. Is this intentional?

Unfortunately, it is not. It is missing this code:

            if (!succeeded)
            {
                Succeeded = false;
            }

Opened #104809 to track it. Feel free to just update this one and for others we will take care of as part of fixing the overall issue.

public static float TrigonometricMultiplyAddCoefficient(float op1, float op2, byte imm)
{
int index = (op2 < 0) ? (imm + 8) : imm;
uint coeff = index switch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SwapnilGaikwad - can you please confirm this logic verifies the operation of the instruction?

@@ -233,7 +255,7 @@ namespace JIT.HardwareIntrinsics.Arm
.Invoke(null, new object[] {
Unsafe.Read<{Op1VectorType}<{Op1BaseType}>>(_dataTable.inArray1Ptr),
Unsafe.Read<{Op1VectorType}<{Op1BaseType}>>(_dataTable.inArray2Ptr),
(byte){Imm}
(byte)Imm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By having Imm, we will always generate table driven logic, while with {Imm}, we will always generate the single instruction, with constant embedded. I think we need to have a test for both and I have captured it in #104809. Please revert these changes and we will address them together as part of that issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I did this out of convenience for writing the test definitions: If we do {Imm}, then we cannot dynamically generate the immediate value using something like TestLibrary.Generator.GetByte() because the value will change each time we use {Imm}. Instead, we'll have to hard-code the immediate into the test with something like ["Imm"] = "0". I don't mind doing this; it will just result in more tests for this API.

@@ -5218,6 +5218,33 @@ public static float MultiplyExtended(float op1, float op2)
}
}

public static float TrigonometricMultiplyAddCoefficient(float op1, float op2, byte imm)
{
int index = (op2 < 0) ? (imm + 8) : imm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have we verified the behavior of this API with invalid values? If it is undefined, are we making the test robust to handle it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in #104681, the tests seem to work fine with out-of-bound values, though I can easily modify the tests to skip validation for such values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the way you have combined the table of sin and cos in a single lookup.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in #104681, the tests seem to work fine with out-of-bound values, though I can easily modify the tests to skip validation for such values.

That will be great.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the way you have combined the table of sin and cos in a single lookup.

Thanks, I wish I could say that was my idea, but the ARM docs suggested it

@amanasifkhalid
Copy link
Member Author

@kunalspathak I addressed your feedback. I had to add a usage of the intrinsic's result in the InvalidImm scenario so that the intrinsic itself doesn't get optimized away and fail the test. Stress tests are passing.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. please make sure we skip the validation for invalid input as well as make sure the other tests using the modified template passes.

@@ -62,6 +62,9 @@ namespace JIT.HardwareIntrinsics.Arm

// Validates executing the test inside conditional, with op3 as zero
test.ConditionalSelect_ZeroOp();

// Validates basic functionality fails with an invalid imm, using Unsafe.ReadUnaligned
test.RunBasicScenario_UnsafeRead_InvalidImm();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This template is used by *MultiplyBySelectedScalar* too. Have you verified if those tests pass, including the stress?

@@ -5218,6 +5218,33 @@ public static float MultiplyExtended(float op1, float op2)
}
}

public static float TrigonometricMultiplyAddCoefficient(float op1, float op2, byte imm)
{
int index = (op2 < 0) ? (imm + 8) : imm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the way you have combined the table of sin and cos in a single lookup.

@@ -5218,6 +5218,33 @@ public static float MultiplyExtended(float op1, float op2)
}
}

public static float TrigonometricMultiplyAddCoefficient(float op1, float op2, byte imm)
{
int index = (op2 < 0) ? (imm + 8) : imm;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like in #104681, the tests seem to work fine with out-of-bound values, though I can easily modify the tests to skip validation for such values.

That will be great.

@amanasifkhalid
Copy link
Member Author

@kunalspathak I merged from main and reran the stress tests for MultiplyBySelectedScalar and TrigonometricMultiplyAddCoefficient, and they're passing. Is this ready to be merged?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@amanasifkhalid amanasifkhalid merged commit 5354e0f into dotnet:main Jul 15, 2024
163 of 167 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-ftmad branch July 15, 2024 16:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants