Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Arm64 vector ABI #23675

Merged
merged 7 commits into from
Apr 16, 2019
Merged

Arm64 vector ABI #23675

merged 7 commits into from
Apr 16, 2019

Conversation

CarolEidt
Copy link

@CarolEidt CarolEidt commented Apr 2, 2019

Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.

Fix #16022

@CarolEidt
Copy link
Author

@dotnet/jit-contrib @tannergooding @sdmaclea @TamarChristinaArm @jkotas @jkoritzinsky
This is not yet ready to merge (in fact, there's a conflict with @jkoritzinsky 's recent commit), but I would like to get feedback at this point.

src/jit/compiler.h Outdated Show resolved Hide resolved
src/jit/vartype.h Outdated Show resolved Hide resolved
bool _isHfaArg : 1; // True when the argument is an HFA type.
bool _isDoubleHfa : 1; // True when the argument is an HFA, with an element type of DOUBLE.
bool _isHfaArg : 1; // True when the argument is an HFA type.
unsigned char _hfaLogSize : 2; // The base 2 log of the HFA element size in 4-byte units (e.g. 0 for TYP_FLOAT)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there other terminology that could be used here? "size" most of the type refers to byte size. In call morphing code "size" also refers to the number of slots. And now here it refers to the number of 4-byte units.

Copy link
Author

@CarolEidt CarolEidt Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about _hfaLogWidth? Or _hfaLogRelSize for relative size?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, width is also used in some cases to denote byte size (some block ops code does that). But at least it's not used within LclVarDsc (which has size and exactSize) so it's a better alternative.

Though I'm not sure why this is needed to begin with. Is it just to "compress" the information down to 2 bits in the bitfield?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's just to keep the information small as opposed to making it a var_types field, or creating a new enum.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, then leave it as it is. I have a feeling that this HFA info will end up in the ClassLayout thing I have in #21705 sooner or later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In looking at my response, I'm thinking a new enum would be better - it could be called something like HfaElemKind and have values HFA_ELEM_FLOAT, HFA_ELEM_DOUBLE and HFA_ELEM_VECTOR128. And then we could rely on the C++ compiler to generate good code rather than trying to pre-optimize by using shifts. (I try to make it a principle to rely on the code generator and not pre-optimize, but I don't always manage to do so!)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, an enum should be fine as long as it has underlying type unsigned char to match the other bitfield members. Otherwise AFAIR VC++ does not pack data member of different types into the same byte of the bitfield.

@TamarChristinaArm
Copy link

Ah great, I'm out of office until Monday but will take a look first thing Monday morning!

src/jit/codegencommon.cpp Outdated Show resolved Hide resolved
@CarolEidt
Copy link
Author

@dotnet-bot test Windows_NT arm Cross Checked Innerloop Build and Test

@CarolEidt CarolEidt force-pushed the Arm64VectorABI branch 2 times, most recently from 6aaedf9 to e31c470 Compare April 10, 2019 14:54
@CarolEidt
Copy link
Author

I believe I've incorporated all the feedback, and this is now green. I'm going to do some outerloop testing, but would appreciate any additional feedback on the changes.

@CarolEidt
Copy link
Author

/azp run coreclr-outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sdmaclea
Copy link

Are there any tests which are testing this is working correctly? Last I knew the tests were missing.
Are we supporting interop with external libraries using HVA? Especially this class of tests.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM and looks to match the spec.

I think it is a bit odd that we aren't fully classifying everything independently as HVAs vs HFAs (it looked like we were sharing the logic a lot, where possible, since they are very similar), but I don't think it is something we need to block this PR on. Ultimately, I think it would be good to have all the logic more clearly defined/classified as it might make the code easier to follow and make it easier to bring new platforms online...

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments/questions

@@ -50,8 +50,7 @@ struct ArgLocDesc
#endif // UNIX_AMD64_ABI

#if defined(_TARGET_ARM64_)
bool m_isSinglePrecision; // For determining if HFA is single or double
// precision
int m_hfaFieldSize; // Size of HFA field
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful if the comment stated "... in bytes" (assuming that's what it is)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this is never negative, right, so should this be declared unsigned?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that makes sense.

case ELEMENT_TYPE_R4: hfaFieldSize = 4; break;
case ELEMENT_TYPE_R8: hfaFieldSize = 8; break;
#ifdef _TARGET_ARM64_
case ELEMENT_TYPE_VALUETYPE: hfaFieldSize = 16; break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems odd that ELEMENT_TYPE_VALUETYPE implies hfaFieldSize = 16. Don't you need to somehow look up the value type size? I guess you're just overloading this existing type for this particular purpose?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code (figuring out hfaFieldSize) is also duplicated 3 times; can it be extracted to a func?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you're just overloading this existing type for this particular purpose?

Yes, it's a bit of a hack, but avoids having to create a new VM type.

@@ -580,7 +587,13 @@ class EEClassLayoutInfo
void SetNativeHFAType(CorElementType hfaType)
{
LIMITED_METHOD_CONTRACT;
m_bFlags |= (hfaType == ELEMENT_TYPE_R4) ? e_R4_HFA : e_R8_HFA;
switch (hfaType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function ever called more than once?

Seems you could add assert(m_bFlags & e_HFATypeFlags == 0) if not.

or, change the assigns to something like m_bFlags = (m_bFlags & ~e_HFATypeFlags) | e_..._HFA

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should never be called more than once. I'll add the assert.

HFA_ELEM_NONE,
HFA_ELEM_FLOAT,
HFA_ELEM_DOUBLE,
HFA_ELEM_SIMD16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is HFA_ELEM_SIMD16 only arm64? Should it be #ifdef'ed, or is that too much? Maybe asserts someplace that other platforms don't see it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It didn't seem worth ifdef'ing this out

//
// Return Value:
// On Arm64 - Returns 1-4 indicating the number of register slots used by the HFA
// On Arm32 - Returns the total number of single FP register slots used by the HFA, max is 8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This asymmetry is kind of annoying...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, but it's pre-existing and I'm pretty sure that changing it would be non-trivial.

slots = lvExactSize >> 3;
break;
case HFA_ELEM_SIMD16:
slots = lvExactSize >> 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you add asserts that lvExactSize is a multiple of the appropriate size, e.g., assert(lvExactSize % 16 == 0)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense.

Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.

Fix #16022
@CarolEidt
Copy link
Author

The "Test pri0 Linux arm64 checked" leg failed with a timeout in /JIT/Methodical/tailcall/Desktop/_il_relthread-race/_il_relthread-race.sh
and "There was an I/O error during the build." on the Windows arm64 leg.

@CarolEidt
Copy link
Author

test Windows_NT arm64 Cross Checked Innerloop Build and Test

@CarolEidt
Copy link
Author

/azp run coreclr-outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@CarolEidt
Copy link
Author

test Windows_NT arm64 Cross Checked Innerloop Build and Test

@CarolEidt
Copy link
Author

@BruceForstall @dotnet/jit-contrib - I've addressed all of the PR feedback, and the only remaining outerloop failures are the recently added test:
JIT/Methodical/largeframes/skip4/skippage4/skippage4.sh

I have the changes from #23881, but this test is a pri1 test and doesn't seem to have been run on that PR (or since). The failure is:

Unhandled Exception: System.TypeLoadException: Could not load type 'BigFrames.LargeStructWithRef' from assembly 'skippage4, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null' because it contains an object field at offset 65500 that is incorrectly aligned or overlapped by a non-object field.

Can I get another review?

@CarolEidt
Copy link
Author

@dotnet/jit-contrib ping

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@CarolEidt CarolEidt merged commit 5be6b66 into dotnet:master Apr 16, 2019
@CarolEidt
Copy link
Author

Thanks @briansull !

@CarolEidt CarolEidt deleted the Arm64VectorABI branch April 16, 2019 17:20
@echesakov
Copy link

@CarolEidt I am seeing 8 R2R Linux arm64 and Linux musl_arm64 tests failing during crossgen run on this commit validation https://dev.azure.com/dnceng/public/_build/results?buildId=157586&view=ms.vss-test-web.build-test-results-tab

@CarolEidt
Copy link
Author

@echesakovMSFT - I'll take a look; are these not run in outerloop testing?

@echesakov
Copy link

@CarolEidt We don't run R2R tests in coreclr-outerloop, only during commit validations and official builds.

@RussKeldorph We probably should add coreclr-outerloop-r2r and coreclr-outerloop-r2r-jistress/gcstress build definitions for folks who want to trigger those and also have them running on schedule.

@CarolEidt
Copy link
Author

@echesakovMSFT - can you remind me how to invoke a single test with R2R? Is it just the crossgen command with the appropriate "/Platform_Assemblies_Paths" option? Or is there a better way?

@RussKeldorph
Copy link

@CarolEidt To crossgen the test, set RunCrossGen=1 in the environment before running the test script. If you need the framework compiled like the R2R jobs, pass the crossgen option to build-test's command line.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Support for Arm64 Vector ABI

Extend HFA support to support vectors as well as floating point types.
This requires that the JIT recognize vector types even during crossgen,
so that the ABI is supported consistently.

Also, fix and re-enable the disabled Arm64 Simd tests.

Fix dotnet/coreclr#16022

Commit migrated from dotnet/coreclr@5be6b66
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Arm64] Add HVA ABI support for HW Intrinsics
10 participants