Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoongArch64] add Intrinsics' API for LoongArch64. #94400

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

shushanhf
Copy link
Contributor

@shushanhf shushanhf commented Nov 6, 2023

We have finished the SIMD on the runtime6.0 and the tests passed.

I will push the SIMD for LoongArch64.

This is the first PR about the API's name.

The [API Proposal]: LoongArch64: add Intrinsics' API for LoongArch64
#94445

@tannergooding
Can you give me some advices ?
Thanks

Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Nov 6, 2023
@ghost
Copy link

ghost commented Nov 6, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

We have finished the SIMD on the runtime6.0 and the tests passed.

I will push the SIMD for LoongArch64.

This is the first PR about the API's name.

@tannergooding
Can you give me some advices ?
Thanks

Author: shushanhf
Assignees: -
Labels:

area-System.Runtime.Intrinsics, new-api-needs-documentation

Milestone: -

@shushanhf
Copy link
Contributor Author

@tannergooding
Can you give me some advices ?

This is just the API's name, and first focus on the class name and the API name.

Later I will update this PR to amend some details.

Thanks

@huoyaoyuan
Copy link
Member

As a new architecture, it's more risky to expose public APIs comparing to mature architectures. I'd suggest keeping them internal, and focusing on cross-platform Vector128/256 intrinsics now.

@huoyaoyuan
Copy link
Member

For API names, you can open API proposal like #94011. API definition without JIT implementation should be unwanted.

@shushanhf
Copy link
Contributor Author

For API names, you can open API proposal like #94011. API definition without JIT implementation should be unwanted.

If the API is OK for LoongArch64, I will push the JIT implementation.

@shushanhf
Copy link
Contributor Author

As a new architecture, it's more risky to expose public APIs comparing to mature architectures. I'd suggest keeping them internal, and focusing on cross-platform Vector128/256 intrinsics now.

yes, the Vector128/256 is independent of the CPU.

Now the API for architecture is the most important for LoongArch64, I want to confirm them for LoongArch64.
And then I will push all the Hardware-Intrinsics and SIMD.

@tannergooding tannergooding added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 6, 2023
#pragma warning disable IDE0060 // unused parameters
using System.Runtime.CompilerServices;

namespace System.Runtime.Intrinsics.LoongArch64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've marked this as NO-MERGE since we cannot take it until after an API review has occurred. See https://github.com/dotnet/runtime/blob/main/docs/project/api-review-process.md

We need an API proposal, following the standard template, created first. We'll have the discussion on relevant name changes and other bits there, then I can then champion that and take it to API review. Once approved, we can then implement the API surface.

Until then, LoongArch would be relegated to only supporting the existing cross platform API surface. For example, Leading/TrailingZeroCount can be supported by accelerating int.Leading/TrailingZeroCount and the same methods on the other primitive types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !
Reviewing the API for LoongArch64 based on a PR maybe more clear. So I pushed this PR.
I will create an API proposal for LoongArch64's API.

/// float64x4_t xvfmin_d_f64 (float64x4_t a, float64x4_t b)
/// LASX: XVFMIN.D Xd.4D, Xj.4D, Xk.4D
/// </summary>
public static Vector256<double> Min(Vector256<double> left, Vector256<double> right) => Min(left, right);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the semantics around NaN and -0 handling on LoongArch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The float operation is implemented within the IEEE-754-2008, here is MinNum(x,y).

/// float32x8_t xvfrecip_s_f32 (float32x8_t a)
/// LASX: XVFRECIP.S Xd.8S Xj.8S
/// </summary>
public static Vector256<float> Reciprocal(Vector256<float> value) => Reciprocal(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this exact, or is it an estimate with more than 0.5 ULP error allowed, like on several other platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Reciprocal is implemented with the IEEE754-2008 division(1.0,x).

Copy link
Contributor Author

@shushanhf shushanhf Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the FRECIPE and FRSQRTE within the LoongArchBase class are estimate.
But the FRECIP and FRSQRT are exact.

/// bool xvsetnez_v_u8 (uint8x32_t value)
/// LASX: XVSETNEZ.V cd, Xj.32B
/// </summary>
public static bool HasElementsNotZero(Vector256<byte> value) => HasElementsNotZero(value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this instruction work at the hardware level?

Xj.32B is clearly the input register, but I'm not familiar with cd here. Is it a general purpose register, a flag register, something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will answer these together later.

Copy link
Contributor Author

@shushanhf shushanhf Nov 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this instruction work at the hardware level?

Xj.32B is clearly the input register, but I'm not familiar with cd here. Is it a general purpose register, a flag register, something else?

The cd is a float flag register which indicating the floats comparing results.
There are 8 cd float flag registers.

Of course here I didn't expose the cd within the API just for simple usage.

Update the API within the LoongArchBase class.
/// </summary>
[Intrinsic]
[CLSCompliant(false)]
public abstract class LoongArchBase
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Rename this file as LoongArchBase.cs, is it OK?
    Or Just name this file as LABase.cs?
  • Naming this class as LoongArchBase, is it OK ?

Comment on lines +29 to +34
public static int LeadingSignCount(int value) => LeadingSignCount(value);

/// <summary>
/// LA64: CLO.W rd, rj
/// </summary>
public static int LeadingSignCount(uint value) => LeadingSignCount(value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed to add two types API with int value and uint value ?

Comment on lines +129 to +134
public static long ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: BITREV.W rd, rj
/// </summary>
public static ulong ReverseElementBits(uint value) => ReverseElementBits(value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed to add the int value and uint value for the API ReverseElementBits() ?

Copy link
Contributor Author

@shushanhf shushanhf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +139 to +154
public static int ReverseElementBits(int value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.2W rd, rj
/// </summary>
public static uint ReverseElementBits(uint value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static long ReverseElementBits(long value) => ReverseElementBits(value);

/// <summary>
/// LA64: REVB.D rd, rj
/// </summary>
public static ulong ReverseElementBits(ulong value) => ReverseElementBits(value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are part of instructions liking the Arm64's REV, REV16, REV32, REV64, but the ArmBase class doesn't support these, Why?

Is it needed to add these for LoongArch64.

…unding.

Count leading ones/zeros and elements' bit clear.
bitwise shift, shuffle, compare and float operations.
    add LoadElementReplicateVector, Vector elements' operations and AverageRounded.
@shushanhf
Copy link
Contributor Author

Hi, @tannergooding
I finished the first version of this PR, could you please review this PR?
Thanks

@shushanhf shushanhf force-pushed the LA_SIMD_API branch 6 times, most recently from 424a8a1 to 411b9f5 Compare November 30, 2023 03:20
@tannergooding
Copy link
Member

could you please review this PR?

I can potentially give it a pass today or tomorrow, but its still blocked until API review can happen. That probably won't happen until the new year as API review typically doesn't happen in December when most people are on holiday/vacation.

@shushanhf
Copy link
Contributor Author

could you please review this PR?

I can potentially give it a pass today or tomorrow, but its still blocked until API review can happen. That probably won't happen until the new year as API review typically doesn't happen in December when most people are on holiday/vacation.

OK, Thanks
I can wait more reviewers.

I will push other PRs that are independent of these APIs liking the SIMD's instructions within the emitter #95456

Also amend some code-formate.
@tannergooding
Copy link
Member

I'm still waiting for response to the question asked on the API proposal:

Where is the spec for the LA64 SIMD ISA?

Based on https://loongson.github.io/LoongArch-Documentation/README-EN.html, it looks like it should be Volume 2, but there doesn't appear to be any version published for it yet and the backing GitHub repo looks to be archived now and I do not see a replacement.

@shushanhf
Copy link
Contributor Author

I'm still waiting for response to the question asked on the API proposal:

Where is the spec for the LA64 SIMD ISA?
Based on https://loongson.github.io/LoongArch-Documentation/README-EN.html, it looks like it should be Volume 2, but there doesn't appear to be any version published for it yet and the backing GitHub repo looks to be archived now and I do not see a replacement.

I'm very sorry for late response.
In fact the Loongson does't publish the offical Intrinsic manual of English Version yet.

Although the GCC had merged the LoongArch's SIMD.
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/lsxintrin.h
https://github.com/gcc-mirror/gcc/blob/master/gcc/config/loongarch/lasxintrin.h

And the LLVM is same.

There is an unofficial intrinsics manual:
https://github.com/jiegec/unofficial-loongarch-intrinsics-guide?tab=readme-ov-file
https://jia.je/unofficial-loongarch-intrinsics-guide/

@tannergooding
Copy link
Member

Thanks! This is still on my backlog but is lower priority than some other work due to the API review not having happened yet (and this PR being blocked until that can happen).

I'll try to set some time aside in the next week or two to go through the SIMD ISA guide and compare it to the proposed API surface so that it can get marked ready-for-review

@tannergooding tannergooding self-assigned this Jan 30, 2024
@shushanhf
Copy link
Contributor Author

Thanks! This is still on my backlog but is lower priority than some other work due to the API review not having happened yet (and this PR being blocked until that can happen).

I'll try to set some time aside in the next week or two to go through the SIMD ISA guide and compare it to the proposed API surface so that it can get marked ready-for-review

OK, Thanks very much.
I will do it after the Feb. 17 for the Chinese Spring Festival.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-loongarch64 area-System.Runtime.Intrinsics community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants