[JIT] ARM64 - Temporary fix for ldp/stp optimizations #90534

Merged
merged 5 commits into from
Aug 16, 2023

TIHan
Contributor

@TIHan TIHan commented Aug 14, 2023

Resolves #85765

On the latest main, the codegen differs substantially from what was reported in the issue, so the original sample no longer reproduces the bug. The underlying issue still exists, however, and can be reproduced with a different sample:

using System;
using System.Runtime.CompilerServices;

// Expected: 515
// Actual: 0
public unsafe class Program
{
    public static void Main()
    {
        byte* bytes = stackalloc byte[1024];
        bytes[0x1A] = 1;
        bytes[0x1B] = 2;
        int sum = Foo(bytes);
        Console.WriteLine(sum);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Foo(byte* b)
    {
        return Unsafe.ReadUnaligned<int>(ref b[0x1A]) + Unsafe.ReadUnaligned<int>(ref b[0x1B]);
    }
}

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 14, 2023
@ghost ghost assigned TIHan Aug 14, 2023
@ghost

ghost commented Aug 14, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Resolves #85765

With the latest, the code-gen is quite different from what was reported in the issue.

Current code-gen:

; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; invoked as altjit
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op <S1>
;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline ldloca(s) first use temp" <S1>
;* V03 tmp2         [V03    ] (  0,  0   )  struct ( 8) zero-ref    ld-addr-op "Inline ldloca(s) first use temp" <S0>
;* V04 tmp3         [V04    ] (  0,  0   )   ubyte  ->  zero-ref    "field V00.F0 (fldOffset=0x0)" P-INDEP
;* V05 tmp4         [V05    ] (  0,  0   )    bool  ->  zero-ref    single-def "field V00.F1 (fldOffset=0x1)" P-INDEP
;* V06 tmp5         [V06    ] (  0,  0   )    bool  ->  zero-ref    "field V00.F2 (fldOffset=0x2)" P-INDEP
;* V07 tmp6         [V07    ] (  0,  0   )   ubyte  ->  zero-ref    single-def "field V02.F0 (fldOffset=0x0)" P-INDEP
;* V08 tmp7         [V08    ] (  0,  0   )    bool  ->  zero-ref    single-def "field V02.F1 (fldOffset=0x1)" P-INDEP
;* V09 tmp8         [V09    ] (  0,  0   )    bool  ->  zero-ref    single-def "field V02.F2 (fldOffset=0x2)" P-INDEP
;
; Lcl frame size = 0

G_M27646_IG01:
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
						;; size=8 bbWeight=1 PerfScore 1.50
G_M27646_IG02:
            mov     w0, wzr
            movz    x1, #0xD1FFAB1E      // code for System.Console:WriteLine(bool)
            movk    x1, #0xD1FFAB1E LSL #16
            movk    x1, #0xD1FFAB1E LSL #32
            ldr     x1, [x1]
            blr     x1
						;; size=24 bbWeight=1 PerfScore 6.00
G_M27646_IG03:
            ldp     fp, lr, [sp], #0x10
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 40, prolog size 8, PerfScore 13.50, instruction count 10, allocated bytes for code 40 (MethodHash=cb019401) for method Program:Main() (FullOpts)
Author: TIHan
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Member

jakobbotsch commented Aug 14, 2023

This repros up until #86491 was merged. Can you check out e62cb64 (the parent of #86491) and repro the problem there? With altjit and TC=0 I get the same codegen as in the issue on that commit.

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

@jakobbotsch What did you do to determine which PR could repro this? Did you just do a bisect?

@jakobbotsch
Member

@jakobbotsch What did you do to determine which PR could repro this? Did you just do a bisect?

Yes. I keep Core_Roots compiled for all JIT commits so that I can quickly do that.

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

Yes. I keep Core_Roots compiled for all JIT commits so that I can quickly do that.

That's a lot of Core_Roots :)

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

I checked out e62cb64 and did a fresh/clean checked build and the codegen is the same. Maybe I should try to go back further.

@jakobbotsch
Member

What if you mark M4 as NoInlining? It is not being inlined for me, but it seems it is inlined in your codegen.

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

After marking it as NoInlining, I was able to reproduce it, but only on that commit.

Latest code-gen is:

; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; invoked as altjit
; Final local variable assignments
;
;  V00 loc0         [V00    ] (  5,  5   )  struct ( 8) [fp+0x18]  do-not-enreg[SB] ld-addr-op <S1>
;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [sp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V02 tmp1         [V02,T03] (  1,  1   )   ubyte  ->  [fp+0x18]  do-not-enreg[] "field V00.F0 (fldOffset=0x0)" P-DEP
;  V03 tmp2         [V03,T02] (  2,  2   )    bool  ->  [fp+0x19]  do-not-enreg[] "field V00.F1 (fldOffset=0x1)" P-DEP
;  V04 tmp3         [V04,T00] (  4,  4   )    bool  ->  [fp+0x1A]  do-not-enreg[] single-def "field V00.F2 (fldOffset=0x2)" P-DEP
;  V05 rat0         [V05,T01] (  2,  4   )  struct ( 8) [fp+0x10]  do-not-enreg[SF] "Return value temp for an odd struct return size" <S1>
;
; Lcl frame size = 16

G_M27646_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x20]!
            mov     fp, sp
						;; size=8 bbWeight=1 PerfScore 1.50
G_M27646_IG02:  ;; offset=0x0008
            movz    x0, #0xC408      // code for Program:M4():S1
            movk    x0, #0x5CDC LSL #16
            movk    x0, #0x7FFD LSL #32
            ldr     x0, [x0]
            blr     x0
            str     w0, [fp, #0x10]	// [V05 rat0]
            ldrh    w0, [fp, #0x10]
            strh    w0, [fp, #0x18]
            ldrb    w0, [fp, #0x12]
            strb    w0, [fp, #0x1A]
            ldrb    w0, [fp, #0x1A]	// [V04 tmp3]
            ldrb    w1, [fp, #0x19]	// [V03 tmp2]
            orr     w0, w0, w1
            strb    w0, [fp, #0x1A]	// [V04 tmp3]
            ldrb    w0, [fp, #0x1A]	// [V04 tmp3]
            movz    x1, #0x4CD8      // code for System.Console:WriteLine(bool)
            movk    x1, #0x5CFF LSL #16
            movk    x1, #0x7FFD LSL #32
            ldr     x1, [x1]
            blr     x1
						;; size=80 bbWeight=1 PerfScore 25.50
G_M27646_IG03:  ;; offset=0x0058
            ldp     fp, lr, [sp], #0x20
            ret     lr
						;; size=8 bbWeight=1 PerfScore 2.00

; Total bytes of code 96, prolog size 8, PerfScore 38.60, instruction count 24, allocated bytes for code 96 (MethodHash=cb019401) for method Program:Main() (FullOpts)
; ============================================================

@jakobbotsch
Member

After marking it as NoInlining, I was able to reproduce it, but only on that commit.

This is in the backend, so we should fix it even if it no longer repros with this specific example. #86491 is a change in morph so it did not fix the backend bug.

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

This is in the backend, so we should fix it even if it no longer repros with this specific example. #86491 is a change in morph so it did not fix the backend bug.

Looks like it is, but all I can do is try to fix it based on this commit and hope that it is correct considering there are no other examples that reproduce it in the latest.

@jakobbotsch
Member

Looks like it is, but all I can do is try to fix it based on this commit and hope that it is correct considering there are no other examples that reproduce it in the latest.

Fixing the backend bug on a commit that is a few months older is just fine. You should be able to test your fix on that commit. I did the same in #90246.

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

It doesn't leave me feeling confident knowing that this cannot be reproduced in latest.

@jakobbotsch
Member

How would you know that the problem cannot be reproduced in main? Once you understand the problem you might even be able to construct an example yourself.

@jakobbotsch
Member

Here is an example that reproduces the problem on main:

using System;
using System.Runtime.CompilerServices;

public unsafe class Program
{
    public static void Main()
    {
        byte* bytes = stackalloc byte[1024];
        bytes[0x1A] = 1;
        bytes[0x1B] = 2;
        int sum = Foo(bytes);
        Console.WriteLine(sum);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Foo(byte* b)
    {
        return Unsafe.ReadUnaligned<int>(ref b[0x1A]) + Unsafe.ReadUnaligned<int>(ref b[0x1B]);
    }
}

Expected: 515
Actual: 0
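
For reference, the expected value of 515 can be worked out by hand. The following is a minimal C++ sketch of that arithmetic, not code from this PR: `readLE32` and `expectedSum` are hypothetical helper names, and it assumes a little-endian target and that the untouched bytes of the buffer are zero.

```cpp
#include <cstdint>

// Hypothetical helper: read 4 bytes at p as a little-endian 32-bit value,
// with no alignment requirement (a stand-in for Unsafe.ReadUnaligned<int>).
constexpr std::uint32_t readLE32(const std::uint8_t* p)
{
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Models Foo(bytes) from the sample, assuming the rest of the buffer is zero.
inline int expectedSum()
{
    std::uint8_t bytes[1024] = {};
    bytes[0x1A] = 1;
    bytes[0x1B] = 2;
    // The load at 0x1A sees bytes {1, 2, 0, 0}, i.e. 0x0201 = 513.
    // The load at 0x1B sees bytes {2, 0, 0, 0}, i.e. 2.
    return static_cast<int>(readLE32(bytes + 0x1A) + readLE32(bytes + 0x1B));
}
```

The two 4-byte reads overlap by three bytes, so they cannot legally be merged into a single pair load.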

@TIHan
Contributor Author

TIHan commented Aug 14, 2023

Once you understand the problem you might even be able to construct an example yourself.

That's the tricky part for this problem. I have no idea what I'm looking at as it's new to me.

@TIHan
Contributor Author

TIHan commented Aug 15, 2023

Interestingly, that new example outputs 515 on e62cb64.

@jakobbotsch
Member

Interestingly, that new example outputs 515 on e62cb64.

Odd, I get the same codegen on main and e62cb64: an ldp w1, w0, [x0, #0x68] that incorrectly loads at offset 0x1A * 4 (0x68) instead of 0x1A.
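
To make the arithmetic concrete: for 32-bit registers, the `ldp` offset is encoded in units of 4 bytes, so a byte offset of 0x1A that is mistaken for an already-scaled immediate becomes 0x1A * 4 = 0x68. Below is a minimal C++ sketch of that address computation. The names `PairOffsets` and `ldpOffsets` are hypothetical and this is not the JIT's code.

```cpp
#include <cstdint>

// Byte offsets (relative to the base register) touched by a pair load.
struct PairOffsets
{
    std::int64_t first;
    std::int64_t second;
};

// Offsets read by `ldp rA, rB, [base, #(scaledImm * size)]`, where
// scaledImm is the immediate in units of the access size.
constexpr PairOffsets ldpOffsets(std::int64_t scaledImm, int size)
{
    return { scaledImm * size, scaledImm * size + size };
}

// Treating the byte offset 0x1A as a scaled immediate for 4-byte loads
// yields the offsets seen in the bad codegen.
constexpr PairOffsets emitted = ldpOffsets(0x1A, 4);
static_assert(emitted.first == 0x68, "first register is loaded from base + 0x68");
static_assert(emitted.second == 0x6C, "second register is loaded from base + 0x6C");
```

The sample never writes to offsets 0x68 or 0x6C, so if those bytes are zero, both loads return 0, which would match the reported "Actual: 0".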

@TIHan
Contributor Author

TIHan commented Aug 15, 2023

@jakobbotsch I made a quick fix, but I put in a "TODO".

The problem is a little complicated. When the emitter tries to combine a pair of ldr/str instructions into an ldp/stp, it assumes that 'imm' and 'prevImm' are "scaled". The bug is that one or both of them may not be scaled, so the comparisons use the wrong values. We should fix this properly, but that can be done later.
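
As an illustration of the scaled/unscaled confusion described above, here is a minimal C++ sketch. It is not the emitter's actual check, whose exact shape is not shown in this thread: `MemAccess`, `adjacentIfScaled`, and `adjacentInBytes` are hypothetical names, and the sketch only assumes that adjacency is decided by comparing the two immediates.

```cpp
#include <cstdint>

// Hypothetical description of one load or store: byte offset from the
// base register and access size in bytes.
struct MemAccess
{
    std::int64_t byteOffset;
    int          size;
};

// Adjacency test that is only meaningful when both immediates are scaled,
// i.e. expressed in units of the access size.
constexpr bool adjacentIfScaled(std::int64_t prevImm, std::int64_t imm)
{
    return imm == prevImm + 1 || prevImm == imm + 1;
}

// The same question asked in byte offsets, which works for any access.
constexpr bool adjacentInBytes(MemAccess prev, MemAccess cur)
{
    return prev.size == cur.size &&
           (cur.byteOffset == prev.byteOffset + prev.size ||
            prev.byteOffset == cur.byteOffset + cur.size);
}

// Byte offsets 0x1A and 0x1B look adjacent when misread as scaled values...
static_assert(adjacentIfScaled(0x1A, 0x1B), "misread as adjacent slots");
// ...but the two 4-byte accesses actually overlap.
static_assert(!adjacentInBytes({0x1A, 4}, {0x1B, 4}), "accesses overlap");
// Offsets 0x68 and 0x6C are a genuinely adjacent 4-byte pair.
static_assert(adjacentInBytes({0x68, 4}, {0x6C, 4}), "valid pair");
```

Comparing in bytes is one way to sidestep the problem, in the spirit of having the peephole deal in actual values instead of encoded ones.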

@TIHan
Contributor Author

TIHan commented Aug 15, 2023

@dotnet/jit-contrib @BruceForstall @jakobbotsch this is ready, pending CI.

@TIHan TIHan changed the title from "Added regression test 85765" to "[JIT] ARM64 - Temporary fix for ldp/stp optimizations" Aug 15, 2023
@BruceForstall
Member

No diffs

Member

@BruceForstall BruceForstall left a comment

LGTM

We should back-port this to .NET 8.

In the future, I'd like to see the "peephole optimization" code deal in "actual" values, not "encoded" values.

@TIHan TIHan merged commit 99a60c6 into dotnet:main Aug 16, 2023
120 of 129 checks passed
@TIHan
Contributor Author

TIHan commented Aug 16, 2023

/backport to release/8.0

@github-actions
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5881464622

@github-actions
Contributor

@TIHan an error occurred while backporting to release/8.0, please check the run log for details!

Error: @TIHan is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your dotnet team membership visibility is set to Public on https://github.com/orgs/dotnet/people?query=TIHan

@TIHan
Contributor Author

TIHan commented Aug 16, 2023

/backport to release/8.0

@github-actions
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5882707392

@jakobbotsch
Member

@TIHan The test added here doesn't build -- seems like CI was red when this PR was merged.

@elinor-fung
Member

@TIHan the test added doesn't build: src/tests/JIT/Regression/JitBlue/Runtime_85765/Runtime_85765.cs(59,27): error CS0214: Pointers and fixed size buffers may only be used in an unsafe context [/__w/1/s/src/tests/JIT/Regression/JitBlue/Runtime_85765/Runtime_85765.csproj]

@TIHan
Contributor Author

TIHan commented Aug 16, 2023

Ok, will make a quick PR to fix this.

@TIHan
Contributor Author

TIHan commented Aug 16, 2023

#90698

@jkotas
Member

jkotas commented Aug 16, 2023

I am going to revert this. This could be introducing a number of other problems, since the CI was all red when this was merged.

jkotas added a commit that referenced this pull request Aug 16, 2023
@TIHan
Contributor Author

TIHan commented Aug 16, 2023

#90698 is the same as this PR, but with the test fix.

My fault for not checking.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

JIT: Invalid ldp optimization with locals
6 participants