[mono] Reenable some amd64 intrinsic tests, enable amd64 ISA extensions when AOTing, several intrinsics fixes #53752

Merged 16 commits into dotnet:main on Jun 13, 2021


imhameed
Contributor

@imhameed imhameed commented Jun 5, 2021

Changes:

  • Consolidate SSE shuffle constant unrolling

    Remove OP_SSE2_SHUFFLE, which is unused.

    Rename OP_SSE_SHUFFLE to OP_SSE_SHUFPS, to make this more consistent with
    the naming convention used for other SSE shuffles.

    Use immediate_unroll_* instead of hand-writing branch emission. These
    branch tables are huge (in the simplest case, with 256 different constant
    values, we can spend over 1KB of code on nothing but shufps and jmps, and
    the cost gets worse if any tail duplication happens), and are currently
    emitted inline. Future work ought to:

    1. use a sequence of extractelement/insertelement instructions, which can be
      optimized into a constant shuffle when the shuffle control parameter is
      constant, and otherwise generates a high-latency but low-code-size fallback
      (note that this only works for shuffles); or

    2. emit the fallback branch tables out of line and use llvm.is.constant to
      generate either a constant shuffle or a call to a fallback shuffle branch
      table function (the cost isn't too bad: a direct-call/ret pair would add ~4-5
      cycles and eat an RSB slot on top of the cost of the branch table).

    Fixes JIT/HardwareIntrinsics/X86/Regression/GitHub_21855/GitHub_21855_r.

  • Fix intrinsification for MathF.Round

    OP_SSE41_ROUNDS takes two source registers, not one.

    TODO: Investigate what happens with llvm.round and
    llvm.experimental.constrained.round.

    Fixes JIT/Intrinsics/MathRoundSingle_r,
    JIT/Math/Functions/Functions_r, and
    JIT/Performance/CodeQuality/Math/Functions/Functions.

  • Clean up intrinsic group lookup

    Use a dummy never-supported intrinsic group as a default fallback, instead of
    adding a special-case "intrinsic group not present" branch.

    Correctly intrinsify get_IsSupported even when not using LLVM

    Fixes spurious System.PlatformNotSupportedExceptions when calling
    get_IsSupported when the LLVM backend isn't being used.

  • The "not" SSE comparisons are unordered, so use the appropriate unordered LLVM
    IR comparisons.

    Add labeled constants for the immediate parameter we pass to CMPSS/CMPSD.

    Fixes Regressions.coreclr/GitHub_34094/Test34094.

  • Fix LoadAndDuplicateToVector128

    LoadAndDuplicateToVector128 should load exactly one 8-byte value from memory
    before broadcasting it into both lanes in a 128-bit result vector.

    Fixes JIT/HardwareIntrinsics/X86/Sse3/LoadAndDuplicateToVector128_r.

  • Implement constant unrolling for Sse41.DotProduct

    As with shuffles, the fallback jump table should probably be kept out of line
    someday; vdpps uses 6 bytes of space, so any fallback jump table for the
    selection control mask (256 possible values) will be at least 1.5 KB.

    Fixes JIT/HardwareIntrinsics/X86/Sse41/DotProduct_r.

  • Implement constant unrolling for Sse41.Blend

    As usual, the large fallback jump table should be emitted out of line, and it
    may be possible to use extractelement/insertelement instead.

  • Zero is part of the domain of lzcnt and shouldn't yield an undef.

    Use fully-defined llvm.ctlz when implementing OP_LZCNT32/64.

    Fixes JIT/HardwareIntrinsics/X86/Regression/GitHub_21666/GitHub_21666_r.

  • Unify amd64/arm64 vector extraction handling

    Removes OP_EXTRACT_U1 and OP_EXTRACT_U2. Instead, sign/zero extension is
    determined via inst_c1 for OP_EXTRACT_* and OP_XEXTRACT_* (and
    OP_EXTRACTX_U2, which doesn't seem to be generated as part of intrinsic
    translation), which must be set to a MonoTypeEnum.

    Replaces OP_EXTRACT_VAR_* with OP_XEXTRACT_*.

    Fixes JIT/Regression/JitBlue/GitHub_23159/GitHub_23159 and
    JIT/Regression/JitBlue/GitHub_13568/GitHub_13568.

  • Remove OP_DPPS; it is unused

  • Disable JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 when running with mono LLVM AOT

  • Disable finalizearray when running with mono LLVM AOT

  • Disable Vector256_1/Vector128_1 tests on wasm

  • Enable sse4.2, popcnt, lzcnt, bmi, and bmi2 when AOT compiling the runtime
    tests.

  • Pass the runtime variant to helixpublishwitharcade.proj, and forward this
    runtime variant to testenvironment.proj.

    This is used to selectively enable LLVM JIT on the LLVM AOT lanes. Removes
    the hack added to CLRTest.Execute.Bash.targets that did this for arm64 (which
    happens to only have an LLVM AOT lane for runtime tests right now).

  • Enable JIT/HardwareIntrinsics/General/Vector128_1/**,
    JIT/HardwareIntrinsics/General/Vector256/**,
    JIT/HardwareIntrinsics/General/Vector256_1/**, and
    JIT/HardwareIntrinsics/X86/General/IsSupported*/** for LLVM AOT on amd64.
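
For the shuffle item above, option 1 (a low-code-size fallback for a non-constant control byte) can be illustrated with a plain C reference of what shufps computes. This is only a sketch of the instruction's semantics, not the code in this PR; `vec4f` and `shufps_variable` are hypothetical names made up for the example.

```c
#include <stdint.h>

/* Hypothetical 4-lane float vector; not a Mono or LLVM type. */
typedef struct { float lane[4]; } vec4f;

/* Reference semantics of shufps with a non-constant control byte:
 * result lanes 0-1 are selected from a, lanes 2-3 from b,
 * using two control bits per result lane. */
static vec4f shufps_variable(vec4f a, vec4f b, uint8_t control)
{
    vec4f r;
    r.lane[0] = a.lane[control & 3];
    r.lane[1] = a.lane[(control >> 2) & 3];
    r.lane[2] = b.lane[(control >> 4) & 3];
    r.lane[3] = b.lane[(control >> 6) & 3];
    return r;
}
```

Each lane is one extract plus one insert, so the code size is independent of the 256 possible control values, at the cost of higher latency than a single shufps.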
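
For the CMPSS/CMPSD item above, the following sketch spells out the eight legacy imm8 predicates and why the "not" forms need unordered LLVM comparisons: they are true whenever either operand is NaN. The enum and function names are illustrative assumptions and are not necessarily the constants added by this PR.

```c
#include <math.h>
#include <stdbool.h>

/* imm8 predicate encodings for CMPSS/CMPSD (illustrative names). */
enum sse_cmp_predicate {
    CMP_EQ = 0, CMP_LT = 1, CMP_LE = 2, CMP_UNORD = 3,
    CMP_NEQ = 4, CMP_NLT = 5, CMP_NLE = 6, CMP_ORD = 7
};

/* Scalar reference for the comparison result; the comments name the
 * LLVM fcmp predicate with the same truth table. */
static bool cmpss_reference(float a, float b, enum sse_cmp_predicate p)
{
    bool unordered = isnan(a) || isnan(b);
    switch (p) {
    case CMP_EQ:    return !unordered && a == b;    /* fcmp oeq */
    case CMP_LT:    return !unordered && a < b;     /* fcmp olt */
    case CMP_LE:    return !unordered && a <= b;    /* fcmp ole */
    case CMP_UNORD: return unordered;               /* fcmp uno */
    case CMP_NEQ:   return unordered || a != b;     /* fcmp une */
    case CMP_NLT:   return unordered || !(a < b);   /* fcmp uge */
    case CMP_NLE:   return unordered || !(a <= b);  /* fcmp ugt */
    case CMP_ORD:   return !unordered;              /* fcmp ord */
    }
    return false;
}
```

Translating not-less-than as an ordered `oge` would wrongly yield false for NaN inputs, which is the kind of mismatch the unordered predicates avoid.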
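
For the lzcnt item above: the lzcnt instruction returns the operand width for a zero input, which matches llvm.ctlz when its second argument says a zero input is not undefined. A minimal C reference of that behavior, with hypothetical function names, might look like this:

```c
#include <stdint.h>

/* Reference for 32-bit lzcnt: zero is a valid input and yields 32. */
static uint32_t lzcnt32_reference(uint32_t x)
{
    uint32_t n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Reference for 64-bit lzcnt: zero is a valid input and yields 64. */
static uint64_t lzcnt64_reference(uint64_t x)
{
    uint64_t n = 0;
    if (x == 0)
        return 64;
    while ((x & 0x8000000000000000ull) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```

By contrast, C's `__builtin_clz(0)` is undefined, much like llvm.ctlz with the zero-is-undefined flag set, so the zero case has to be handled explicitly.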

@dotnet-issue-labeler

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@imhameed imhameed added the area-Infrastructure-mono and runtime-mono labels Jun 5, 2021
@ghost

ghost commented Jun 5, 2021

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: imhameed
Assignees: -
Labels:

area-Infrastructure-mono, runtime-mono

Milestone: -

@fanyang-mono
Member

@imhameed If you rebase this PR to get b13715b, you should be able to get all the logs generated by runtime tests, which could help you investigate these test failures.

@imhameed imhameed force-pushed the monoamd64intrintests branch 23 times, most recently from 580a326 to a90ccef Compare June 12, 2021 07:52
@imhameed imhameed force-pushed the monoamd64intrintests branch 2 times, most recently from 7dd61e8 to 7fcab9c Compare June 13, 2021 00:01
@imhameed imhameed force-pushed the monoamd64intrintests branch 2 times, most recently from b70d3bc to 48dd2d8 Compare June 13, 2021 15:04
@imhameed imhameed changed the title from "[mono] Reenable some amd64 intrinsic tests and enable a bunch of ISA extensions when AOTing" to "[mono] Reenable some amd64 intrinsic tests, enable amd64 ISA extensions when AOTing, several intrinsics fixes" Jun 13, 2021
@imhameed imhameed marked this pull request as ready for review June 13, 2021 22:49
@imhameed imhameed merged commit b8b3ef1 into dotnet:main Jun 13, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2021