JIT: Improve call args interference checking when stores are involved #97409

Merged
merged 7 commits into from
Jan 25, 2024


jakobbotsch
Member

@jakobbotsch jakobbotsch commented Jan 23, 2024

  • Fix the propagation of GTF_ASG during call args morphing
  • Introduce Compiler::gtMayHaveStoreInterference that can check whether two trees interfere with each other due to a store in one tree that stores to a local read by the other tree
  • Use the new helper when checking whether we should reverse the operand evaluation order of GT_STOREIND nodes
  • Use the new helper when deciding whether previous args need to be evaluated to temps because we see an argument with an embedded store (typically a call now that we propagate flags correctly).
  • Use the new helpers when checking interference in optRedundantRelop

Fix #13758
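
To make the notion of store interference concrete, here is a toy model of the check described in the list above. It is only a sketch and not the JIT's implementation: `Node`, `collectLocals`, and `mayHaveStoreInterference` are hypothetical names over a simplified tree, standing in for the real `GenTree` walk. Two trees are reported as interfering when one contains a store to a local that the other reads.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified IR node; NOT the JIT's GenTree.
enum class Kind { LocalRead, LocalStore, Other };

struct Node
{
    Kind                               kind;
    unsigned                           lclNum; // only meaningful for LocalRead/LocalStore
    std::vector<std::shared_ptr<Node>> operands;
};

using NodePtr = std::shared_ptr<Node>;

NodePtr makeRead(unsigned lcl)
{
    return std::make_shared<Node>(Node{Kind::LocalRead, lcl, {}});
}

NodePtr makeStore(unsigned lcl, NodePtr value)
{
    return std::make_shared<Node>(Node{Kind::LocalStore, lcl, {std::move(value)}});
}

// Walk the tree and record which locals it reads and which it stores to.
void collectLocals(const NodePtr& tree, std::set<unsigned>& reads, std::set<unsigned>& stores)
{
    if (tree->kind == Kind::LocalRead)
        reads.insert(tree->lclNum);
    if (tree->kind == Kind::LocalStore)
        stores.insert(tree->lclNum);
    for (const NodePtr& op : tree->operands)
        collectLocals(op, reads, stores);
}

// True if any stored local also appears in the read set.
bool storesHitReads(const std::set<unsigned>& stores, const std::set<unsigned>& reads)
{
    for (unsigned lcl : stores)
        if (reads.count(lcl) != 0)
            return true;
    return false;
}

// Symmetric check: a store in either tree to a local read by the other tree.
bool mayHaveStoreInterference(const NodePtr& a, const NodePtr& b)
{
    std::set<unsigned> readsA, storesA, readsB, storesB;
    collectLocals(a, readsA, storesA);
    collectLocals(b, readsB, storesB);
    return storesHitReads(storesA, readsB) || storesHitReads(storesB, readsA);
}
```

In this model a read of V1 interferes with a tree that stores to V1, but not with a tree that only stores to V3, even though both trees "contain a store".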

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 23, 2024
@ghost ghost assigned jakobbotsch Jan 23, 2024

@jakobbotsch
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, Fuzzlyn


Azure Pipelines successfully started running 3 pipeline(s).

@jakobbotsch
Member Author

jitstress failures are #97437

@jakobbotsch
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. Wins across the board from enabling this optimization and also doing it in MinOpts.

Initially I just wanted to fix #13758. However, fixing the propagation causes several regressions in places that use GTF_ASG for interference checks. I identified these three places:

  • Call args morphing
  • optRedundantRelop
  • gtSetEvalOrder in the GT_STOREIND case

and switched them to more precise checking. The first seems to be by far the most impactful.
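
The difference between a flag-style check and the more precise check can be sketched for the call args case. This is a simplified illustration under stated assumptions, not the morph code itself: `ArgSummary`, `countTempsCoarse`, and `countTempsPrecise` are hypothetical, and the coarse policy is modeled as "any store in a later argument forces earlier arguments into temps".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical per-argument summary; the real JIT inspects GenTree nodes instead.
struct ArgSummary
{
    std::set<unsigned> localsRead;
    std::set<unsigned> localsStored;
};

// Coarse policy (models a GTF_ASG-style flag check): an earlier argument is
// evaluated to a temp as soon as any later argument contains a store at all.
std::size_t countTempsCoarse(const std::vector<ArgSummary>& args)
{
    std::size_t temps = 0;
    for (std::size_t i = 0; i < args.size(); i++)
    {
        for (std::size_t j = i + 1; j < args.size(); j++)
        {
            if (!args[j].localsStored.empty())
            {
                temps++;
                break;
            }
        }
    }
    return temps;
}

// True if 'later' stores to a local that 'earlier' reads.
bool storesToLocalReadBy(const ArgSummary& later, const ArgSummary& earlier)
{
    for (unsigned lcl : later.localsStored)
        if (earlier.localsRead.count(lcl) != 0)
            return true;
    return false;
}

// Precise policy: an earlier argument needs a temp only if a later argument
// stores to a local that the earlier argument actually reads.
std::size_t countTempsPrecise(const std::vector<ArgSummary>& args)
{
    std::size_t temps = 0;
    for (std::size_t i = 0; i < args.size(); i++)
    {
        for (std::size_t j = i + 1; j < args.size(); j++)
        {
            if (storesToLocalReadBy(args[j], args[i]))
            {
                temps++;
                break;
            }
        }
    }
    return temps;
}
```

For a call whose first argument reads V1 and whose second argument stores to an unrelated V5, the coarse policy creates one temp and the precise policy creates none, which is the kind of "argument with side effect" temp that disappears in the diffs.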

I've looked at most of the benchmarks.run_pgo FullOpts regressions. Almost all of them are because the removal of these temps means less live range splitting for LSRA. This doesn't usually result in new spills, but it does mean we sometimes pick registers that take more space to encode, or where we end up needing another reg-reg mov. For example, most regressions are similar in spirit to the following:

 ; Assembly listing for method Microsoft.Cci.MetadataWriter:GetTypeAttributes(Microsoft.Cci.ITypeDefinition):int:this (Tier1)
 ; Emitting BLENDED_CODE for X64 with AVX512 - Windows
 ; Tier1 code
 ; optimized code
 ; rsp based frame
 ; partially interruptible
 ; No matching PGO data
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  3,  3   )     ref  ->  rcx         this class-hnd single-def <Microsoft.Cci.MetadataWriter>
-;  V01 arg1         [V01,T01] (  3,  3   )     ref  ->  rdx         class-hnd single-def <Microsoft.Cci.ITypeDefinition>
+;  V01 arg1         [V01,T01] (  3,  3   )     ref  ->  rax         class-hnd single-def <Microsoft.Cci.ITypeDefinition>
 ;  V02 OutArgs      [V02    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
 ;  V03 tmp1         [V03    ] (  2,  4   )  struct (48) [rsp+0x28]  do-not-enreg[XS] addr-exposed "by-value struct argument" <Microsoft.CodeAnalysis.Emit.EmitContext>
-;  V04 tmp2         [V04,T02] (  2,  4   )     ref  ->  rdx         single-def "argument with side effect"
 ;
 ; Lcl frame size = 88
 
 G_M23091_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
        sub      rsp, 88
        vzeroupper 
-						;; size=7 bbWeight=1 PerfScore 1.25
-G_M23091_IG02:        ; bbWeight=1, gcrefRegs=0006 {rcx rdx}, byrefRegs=0000 {}, byref, nogc
-       ; gcrRegs +[rcx rdx]
+       mov      rax, rdx
+       ; gcrRegs +[rax]
+						;; size=10 bbWeight=1 PerfScore 1.50
+G_M23091_IG02:        ; bbWeight=1, gcrefRegs=0003 {rax rcx}, byrefRegs=0000 {}, byref, nogc
+       ; gcrRegs +[rcx]
        vmovdqu  ymm0, ymmword ptr [rcx+0xD0]
        vmovdqu  ymmword ptr [rsp+0x28], ymm0
        vmovdqu  xmm0, xmmword ptr [rcx+0xF0]
        vmovdqu  xmmword ptr [rsp+0x48], xmm0
 						;; size=28 bbWeight=1 PerfScore 11.00
 G_M23091_IG03:        ; bbWeight=1, extend
-       mov      rcx, rdx
        lea      rdx, [rsp+0x28]
-       ; gcrRegs -[rdx]
+       mov      rcx, rax
        call     [<unknown method>]
-       ; gcrRegs -[rcx]
+       ; gcrRegs -[rax rcx]
        ; gcr arg pop 0
        nop      
 						;; size=15 bbWeight=1 PerfScore 4.00
 G_M23091_IG04:        ; bbWeight=1, epilog, nogc, extend
        add      rsp, 88
        ret      
 						;; size=5 bbWeight=1 PerfScore 1.25
 
-; Total bytes of code 55, prolog size 7, PerfScore 17.50, instruction count 12, allocated bytes for code 55 (MethodHash=af30a5cc) for method Microsoft.Cci.MetadataWriter:GetTypeAttributes(Microsoft.Cci.ITypeDefinition):int:this (Tier1)
+; Total bytes of code 58, prolog size 7, PerfScore 17.75, instruction count 13, allocated bytes for code 58 (MethodHash=af30a5cc) for method Microsoft.Cci.MetadataWriter:GetTypeAttributes(Microsoft.Cci.ITypeDefinition):int:this (Tier1)
 ; ============================================================

which results in one additional reg-reg mov.

@jakobbotsch jakobbotsch marked this pull request as ready for review January 24, 2024 14:46
@@ -2148,7 +2139,7 @@ bool Compiler::optRedundantRelop(BasicBlock* const block)
 
         for (unsigned int i = 0; i < definedLocalsCount; i++)
         {
-            if (gtHasRef(prevTreeData, definedLocals[i]))
+            if (gtTreeHasLocalRead(prevTreeData, definedLocals[i]))
Member Author

Also a small bug fix here -- gtHasRef does not take promotion into account.
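
Why promotion matters here can be shown with a small model. The sketch below is an assumption-laden illustration rather than the JIT's code: `ParentMap`, `naiveHasRef`, and `promotionAwareHasRead` are hypothetical, and a promoted struct local is modeled simply as a map from each field local to its parent local.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical model: maps a promoted field local to its parent struct local.
using ParentMap = std::map<unsigned, unsigned>;

// Naive check: only an exact local-number match counts as a reference.
bool naiveHasRef(const std::vector<unsigned>& readLocals, unsigned lcl)
{
    for (unsigned r : readLocals)
        if (r == lcl)
            return true;
    return false;
}

// Promotion-aware check: a read of a field local also counts as a read of its
// parent struct local, and a read of the parent counts as a read of each field.
bool promotionAwareHasRead(const std::vector<unsigned>& readLocals, unsigned lcl, const ParentMap& parentOf)
{
    for (unsigned r : readLocals)
    {
        if (r == lcl)
            return true;

        auto readParent = parentOf.find(r);
        if (readParent != parentOf.end() && readParent->second == lcl)
            return true; // r is a promoted field of lcl

        auto lclParent = parentOf.find(lcl);
        if (lclParent != parentOf.end() && lclParent->second == r)
            return true; // lcl is a promoted field of r
    }
    return false;
}
```

With V5 and V6 modeled as fields of V1, a tree that reads V5 is missed by the naive check when asked about V1, while the promotion-aware check reports it.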

@ryujit-bot

Diff results for #97409

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,501,156 contexts (1,003,806 MinOpts, 1,497,350 FullOpts).

MISSED contexts: base: 4,060 (0.16%), diff: 4,061 (0.16%)

Overall (-1,073,216 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,577,120 -2,576
benchmarks.run_pgo.linux.arm64.checked.mch 81,135,316 -67,672
benchmarks.run_tiered.linux.arm64.checked.mch 24,708,980 -59,456
coreclr_tests.run.linux.arm64.checked.mch 509,824,836 -372,928
libraries.crossgen2.linux.arm64.checked.mch 55,738,036 -12,004
libraries.pmi.linux.arm64.checked.mch 76,022,852 -11,096
libraries_tests.run.linux.arm64.Release.mch 381,446,624 -507,536
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 162,656,488 -31,200
realworld.run.linux.arm64.checked.mch 15,906,904 -8,516
smoke_tests.nativeaot.linux.arm64.checked.mch 2,949,332 -232
MinOpts (-911,184 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 24,936,460 -65,264
benchmarks.run_tiered.linux.arm64.checked.mch 19,784,856 -58,804
coreclr_tests.run.linux.arm64.checked.mch 349,225,056 -316,420
libraries_tests.run.linux.arm64.Release.mch 215,297,140 -465,236
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 13,481,212 -1,016
realworld.run.linux.arm64.checked.mch 585,368 -4,444
FullOpts (-162,032 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm64.checked.mch 15,274,112 -2,576
benchmarks.run_pgo.linux.arm64.checked.mch 56,198,856 -2,408
benchmarks.run_tiered.linux.arm64.checked.mch 4,924,124 -652
coreclr_tests.run.linux.arm64.checked.mch 160,599,780 -56,508
libraries.crossgen2.linux.arm64.checked.mch 55,736,400 -12,004
libraries.pmi.linux.arm64.checked.mch 75,902,868 -11,096
libraries_tests.run.linux.arm64.Release.mch 166,149,484 -42,300
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,175,276 -30,184
realworld.run.linux.arm64.checked.mch 15,321,536 -4,072
smoke_tests.nativeaot.linux.arm64.checked.mch 2,948,384 -232
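
As a reading aid for these tables, the section totals can be cross-checked against the per-collection rows. The snippet below is a small sketch that copies the linux/arm64 "Diff size (bytes)" values from the tables above; the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-collection "Diff size (bytes)" values copied from the linux/arm64 tables.
const std::vector<long long> kOverallDiffs  = {-2576, -67672, -59456, -372928, -12004,
                                               -11096, -507536, -31200, -8516, -232};
const std::vector<long long> kMinOptsDiffs  = {-65264, -58804, -316420, -465236, -1016, -4444};
const std::vector<long long> kFullOptsDiffs = {-2576, -2408, -652, -56508, -12004,
                                               -11096, -42300, -30184, -4072, -232};

// Sum one table's diff column.
long long sumDiffs(const std::vector<long long>& diffs)
{
    return std::accumulate(diffs.begin(), diffs.end(), 0LL);
}

// True when the MinOpts and FullOpts totals add up to the Overall total.
bool totalsAreConsistent()
{
    return sumDiffs(kMinOptsDiffs) + sumDiffs(kFullOptsDiffs) == sumDiffs(kOverallDiffs);
}
```

The rows sum to the headline figures: -911,184 bytes for MinOpts and -162,032 bytes for FullOpts, which together give the Overall -1,073,216 bytes.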

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,595,003 contexts (1,052,329 MinOpts, 1,542,674 FullOpts).

MISSED contexts: base: 3,628 (0.14%), diff: 3,632 (0.14%)

Overall (-1,002,204 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 13,735,124 -3,912
benchmarks.run_pgo.linux.x64.checked.mch 68,635,056 -34,985
benchmarks.run_tiered.linux.x64.checked.mch 17,373,152 -2,107
coreclr_tests.run.linux.x64.checked.mch 459,551,078 -378,596
libraries.crossgen2.linux.x64.checked.mch 38,670,232 -20,898
libraries.pmi.linux.x64.checked.mch 60,144,132 -25,391
libraries_tests.run.linux.x64.Release.mch 333,551,492 -488,707
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 130,468,363 -41,266
realworld.run.linux.x64.checked.mch 13,194,158 -6,001
smoke_tests.nativeaot.linux.x64.checked.mch 4,198,166 -341
MinOpts (-727,753 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 19,829,757 -36,358
benchmarks.run_tiered.linux.x64.checked.mch 13,677,760 -889
coreclr_tests.run.linux.x64.checked.mch 326,558,135 -279,637
libraries_tests.run.linux.x64.Release.mch 184,389,521 -410,869
FullOpts (-274,451 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.x64.checked.mch 13,468,331 -3,912
benchmarks.run_pgo.linux.x64.checked.mch 48,805,299 +1,373
benchmarks.run_tiered.linux.x64.checked.mch 3,695,392 -1,218
coreclr_tests.run.linux.x64.checked.mch 132,992,943 -98,959
libraries.crossgen2.linux.x64.checked.mch 38,669,030 -20,898
libraries.pmi.linux.x64.checked.mch 60,031,262 -25,391
libraries_tests.run.linux.x64.Release.mch 149,161,971 -77,838
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 119,809,892 -41,266
realworld.run.linux.x64.checked.mch 12,805,052 -6,001
smoke_tests.nativeaot.linux.x64.checked.mch 4,197,255 -341

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,707 contexts (930,876 MinOpts, 1,331,831 FullOpts).

MISSED contexts: base: 3,256 (0.14%), diff: 3,258 (0.14%)

Overall (-803,772 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,178,360 -2,736
benchmarks.run_pgo.osx.arm64.checked.mch 34,671,272 -41,660
benchmarks.run_tiered.osx.arm64.checked.mch 15,557,888 -13,832
coreclr_tests.run.osx.arm64.checked.mch 485,381,240 -229,068
libraries.crossgen2.osx.arm64.checked.mch 55,622,104 -12,124
libraries.pmi.osx.arm64.checked.mch 79,954,540 -11,800
libraries_tests.run.osx.arm64.Release.mch 312,899,776 -453,244
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,911,076 -30,976
realworld.run.osx.arm64.checked.mch 15,072,368 -8,332
MinOpts (-652,688 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 16,473,280 -37,300
benchmarks.run_tiered.osx.arm64.checked.mch 11,515,076 -13,296
coreclr_tests.run.osx.arm64.checked.mch 332,306,456 -173,632
libraries_tests.run.osx.arm64.Release.mch 203,940,504 -423,000
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 13,137,528 -1,008
realworld.run.osx.arm64.checked.mch 568,404 -4,452
FullOpts (-151,084 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.osx.arm64.checked.mch 11,177,824 -2,736
benchmarks.run_pgo.osx.arm64.checked.mch 18,197,992 -4,360
benchmarks.run_tiered.osx.arm64.checked.mch 4,042,812 -536
coreclr_tests.run.osx.arm64.checked.mch 153,074,784 -55,436
libraries.crossgen2.osx.arm64.checked.mch 55,620,476 -12,124
libraries.pmi.osx.arm64.checked.mch 79,833,412 -11,800
libraries_tests.run.osx.arm64.Release.mch 108,959,272 -30,244
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,773,548 -29,968
realworld.run.osx.arm64.checked.mch 14,503,964 -3,880

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,318,205 contexts (931,543 MinOpts, 1,386,662 FullOpts).

MISSED contexts: base: 2,687 (0.12%), diff: 2,689 (0.12%)

Overall (-780,956 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,961,236 -2,460
benchmarks.run_pgo.windows.arm64.checked.mch 47,394,624 -42,728
benchmarks.run_tiered.windows.arm64.checked.mch 15,343,840 -13,924
coreclr_tests.run.windows.arm64.checked.mch 495,372,096 -228,492
libraries.crossgen2.windows.arm64.checked.mch 58,964,780 -11,900
libraries.pmi.windows.arm64.checked.mch 79,594,724 -11,288
libraries_tests.run.windows.arm64.Release.mch 310,503,812 -429,744
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 169,134,348 -31,556
realworld.run.windows.arm64.checked.mch 15,891,116 -8,524
smoke_tests.nativeaot.windows.arm64.checked.mch 3,973,164 -340
MinOpts (-643,252 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 16,250,384 -39,960
benchmarks.run_tiered.windows.arm64.checked.mch 11,189,376 -13,432
coreclr_tests.run.windows.arm64.checked.mch 339,091,528 -180,424
libraries_tests.run.windows.arm64.Release.mch 201,581,504 -403,976
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 13,137,464 -1,016
realworld.run.windows.arm64.checked.mch 568,424 -4,444
FullOpts (-137,704 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.arm64.checked.mch 10,960,700 -2,460
benchmarks.run_pgo.windows.arm64.checked.mch 31,144,240 -2,768
benchmarks.run_tiered.windows.arm64.checked.mch 4,154,464 -492
coreclr_tests.run.windows.arm64.checked.mch 156,280,568 -48,068
libraries.crossgen2.windows.arm64.checked.mch 58,963,144 -11,900
libraries.pmi.windows.arm64.checked.mch 79,474,740 -11,288
libraries_tests.run.windows.arm64.Release.mch 108,922,308 -25,768
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 155,996,884 -30,540
realworld.run.windows.arm64.checked.mch 15,322,692 -4,080
smoke_tests.nativeaot.windows.arm64.checked.mch 3,972,192 -340

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,492,895 contexts (983,689 MinOpts, 1,509,206 FullOpts).

MISSED contexts: base: 3,899 (0.16%), diff: 3,916 (0.16%)

Overall (-2,840,990 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 42,178,999 -126,131
benchmarks.run.windows.x64.checked.mch 8,747,477 -2,284
benchmarks.run_pgo.windows.x64.checked.mch 35,391,293 -134,490
benchmarks.run_tiered.windows.x64.checked.mch 12,661,498 -76,974
coreclr_tests.run.windows.x64.checked.mch 393,404,923 -994,701
libraries.crossgen2.windows.x64.checked.mch 39,443,922 -32,948
libraries.pmi.windows.x64.checked.mch 61,386,973 -16,895
libraries_tests.run.windows.x64.Release.mch 281,632,278 -1,346,220
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 133,912,278 -96,807
realworld.run.windows.x64.checked.mch 14,170,687 -13,266
smoke_tests.nativeaot.windows.x64.checked.mch 5,092,364 -274
MinOpts (-2,692,403 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 14,658,725 -118,356
benchmarks.run_pgo.windows.x64.checked.mch 14,234,977 -135,881
benchmarks.run_tiered.windows.x64.checked.mch 9,185,266 -76,584
coreclr_tests.run.windows.x64.checked.mch 273,542,992 -972,566
libraries_tests.run.windows.x64.Release.mch 178,368,316 -1,315,308
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 10,423,370 -70,606
realworld.run.windows.x64.checked.mch 389,705 -3,102
FullOpts (-148,587 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 27,520,274 -7,775
benchmarks.run.windows.x64.checked.mch 8,747,116 -2,284
benchmarks.run_pgo.windows.x64.checked.mch 21,156,316 +1,391
benchmarks.run_tiered.windows.x64.checked.mch 3,476,232 -390
coreclr_tests.run.windows.x64.checked.mch 119,861,931 -22,135
libraries.crossgen2.windows.x64.checked.mch 39,442,733 -32,948
libraries.pmi.windows.x64.checked.mch 61,273,454 -16,895
libraries_tests.run.windows.x64.Release.mch 103,263,962 -30,912
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 123,488,908 -26,201
realworld.run.windows.x64.checked.mch 13,780,982 -10,164
smoke_tests.nativeaot.windows.x64.checked.mch 5,091,455 -274



Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,237,690 contexts (827,812 MinOpts, 1,409,878 FullOpts).

MISSED contexts: 74,588 (3.23%)

Overall (-277,852 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 15,303,134 -1,294
benchmarks.run_pgo.linux.arm.checked.mch 61,262,128 +39,336
benchmarks.run_tiered.linux.arm.checked.mch 22,643,448 -1,326
coreclr_tests.run.linux.arm.checked.mch 321,791,116 -209,470
libraries.crossgen2.linux.arm.checked.mch 35,175,262 -5,168
libraries.pmi.linux.arm.checked.mch 49,615,946 -6,010
libraries_tests.run.linux.arm.Release.mch 242,762,120 -83,274
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,201,612 -8,510
realworld.run.linux.arm.checked.mch 13,613,446 -2,136
MinOpts (-137,640 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 10,797,602 -6,080
benchmarks.run_tiered.linux.arm.checked.mch 9,107,156 -74
coreclr_tests.run.linux.arm.checked.mch 212,730,134 -77,912
libraries_tests.run.linux.arm.Release.mch 122,002,944 -53,574
FullOpts (-140,212 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.linux.arm.checked.mch 14,913,878 -1,294
benchmarks.run_pgo.linux.arm.checked.mch 50,464,526 +45,416
benchmarks.run_tiered.linux.arm.checked.mch 13,536,292 -1,252
coreclr_tests.run.linux.arm.checked.mch 109,060,982 -131,558
libraries.crossgen2.linux.arm.checked.mch 35,174,032 -5,168
libraries.pmi.linux.arm.checked.mch 49,509,442 -6,010
libraries_tests.run.linux.arm.Release.mch 120,759,176 -29,700
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,117,792 -8,510
realworld.run.linux.arm.checked.mch 13,163,500 -2,136

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,296,119 contexts (841,817 MinOpts, 1,454,302 FullOpts).

MISSED contexts: base: 5,093 (0.22%), diff: 5,251 (0.23%)

Overall (-182,725 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,104,846 -1,078
benchmarks.run_pgo.windows.x86.checked.mch 45,219,159 -7,368
benchmarks.run_tiered.windows.x86.checked.mch 9,509,296 -932
coreclr_tests.run.windows.x86.checked.mch 309,170,651 -39,164
libraries.crossgen2.windows.x86.checked.mch 31,620,161 -3,178
libraries.pmi.windows.x86.checked.mch 48,795,839 -4,664
libraries_tests.run.windows.x86.Release.mch 185,793,630 -119,035
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 102,157,669 -6,712
realworld.run.windows.x86.checked.mch 11,359,539 -594
MinOpts (-105,402 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 6,629,490 -8,597
benchmarks.run_tiered.windows.x86.checked.mch 4,269,809 -101
coreclr_tests.run.windows.x86.checked.mch 201,671,769 -20,973
libraries_tests.run.windows.x86.Release.mch 98,331,507 -75,731
FullOpts (-77,323 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run.windows.x86.checked.mch 7,104,567 -1,078
benchmarks.run_pgo.windows.x86.checked.mch 38,589,669 +1,229
benchmarks.run_tiered.windows.x86.checked.mch 5,239,487 -831
coreclr_tests.run.windows.x86.checked.mch 107,498,882 -18,191
libraries.crossgen2.windows.x86.checked.mch 31,619,104 -3,178
libraries.pmi.windows.x86.checked.mch 48,700,525 -4,664
libraries_tests.run.windows.x86.Release.mch 87,462,123 -43,304
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 93,487,877 -6,712
realworld.run.windows.x86.checked.mch 11,063,839 -594



Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (-0.17% to -0.02%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.07%
benchmarks.run_pgo.linux.arm64.checked.mch -0.06%
benchmarks.run_tiered.linux.arm64.checked.mch -0.17%
coreclr_tests.run.linux.arm64.checked.mch -0.05%
libraries.crossgen2.linux.arm64.checked.mch -0.10%
libraries.pmi.linux.arm64.checked.mch -0.08%
libraries_tests.run.linux.arm64.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.07%
realworld.run.linux.arm64.checked.mch -0.13%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%
MinOpts (-0.36% to +0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch -0.20%
benchmarks.run_tiered.linux.arm64.checked.mch -0.28%
coreclr_tests.run.linux.arm64.checked.mch -0.05%
libraries.pmi.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch -0.15%
realworld.run.linux.arm64.checked.mch -0.36%
FullOpts (-0.13% to -0.02%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch -0.07%
benchmarks.run_pgo.linux.arm64.checked.mch -0.04%
benchmarks.run_tiered.linux.arm64.checked.mch -0.04%
coreclr_tests.run.linux.arm64.checked.mch -0.05%
libraries.crossgen2.linux.arm64.checked.mch -0.10%
libraries.pmi.linux.arm64.checked.mch -0.08%
libraries_tests.run.linux.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.07%
realworld.run.linux.arm64.checked.mch -0.13%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (-0.06% to -0.01%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch -0.02%
benchmarks.run_pgo.linux.x64.checked.mch -0.03%
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
coreclr_tests.run.linux.x64.checked.mch -0.03%
libraries.crossgen2.linux.x64.checked.mch -0.04%
libraries.pmi.linux.x64.checked.mch -0.05%
libraries_tests.run.linux.x64.Release.mch -0.06%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.06%
realworld.run.linux.x64.checked.mch -0.05%
smoke_tests.nativeaot.linux.x64.checked.mch -0.03%
MinOpts (-0.09% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.linux.x64.checked.mch -0.08%
benchmarks.run_tiered.linux.x64.checked.mch +0.01%
coreclr_tests.run.linux.x64.checked.mch -0.03%
libraries.crossgen2.linux.x64.checked.mch -0.01%
libraries_tests.run.linux.x64.Release.mch -0.09%
realworld.run.linux.x64.checked.mch -0.01%
smoke_tests.nativeaot.linux.x64.checked.mch -0.01%
FullOpts (-0.06% to -0.02%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch -0.02%
benchmarks.run_pgo.linux.x64.checked.mch -0.03%
benchmarks.run_tiered.linux.x64.checked.mch -0.02%
coreclr_tests.run.linux.x64.checked.mch -0.04%
libraries.crossgen2.linux.x64.checked.mch -0.04%
libraries.pmi.linux.x64.checked.mch -0.05%
libraries_tests.run.linux.x64.Release.mch -0.05%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.06%
realworld.run.linux.x64.checked.mch -0.05%
smoke_tests.nativeaot.linux.x64.checked.mch -0.03%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (-0.13% to -0.04%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.06%
benchmarks.run_pgo.osx.arm64.checked.mch -0.08%
benchmarks.run_tiered.osx.arm64.checked.mch -0.04%
coreclr_tests.run.osx.arm64.checked.mch -0.04%
libraries.crossgen2.osx.arm64.checked.mch -0.10%
libraries.pmi.osx.arm64.checked.mch -0.08%
libraries_tests.run.osx.arm64.Release.mch -0.10%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.07%
realworld.run.osx.arm64.checked.mch -0.13%
MinOpts (-0.38% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.osx.arm64.checked.mch -0.16%
benchmarks.run_tiered.osx.arm64.checked.mch -0.07%
coreclr_tests.run.osx.arm64.checked.mch -0.03%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch -0.14%
realworld.run.osx.arm64.checked.mch -0.38%
FullOpts (-0.13% to -0.02%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch -0.06%
benchmarks.run_pgo.osx.arm64.checked.mch -0.06%
benchmarks.run_tiered.osx.arm64.checked.mch -0.02%
coreclr_tests.run.osx.arm64.checked.mch -0.05%
libraries.crossgen2.osx.arm64.checked.mch -0.10%
libraries.pmi.osx.arm64.checked.mch -0.08%
libraries_tests.run.osx.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch -0.07%
realworld.run.osx.arm64.checked.mch -0.13%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (-0.13% to -0.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.06%
benchmarks.run_pgo.windows.arm64.checked.mch -0.08%
benchmarks.run_tiered.windows.arm64.checked.mch -0.05%
coreclr_tests.run.windows.arm64.checked.mch -0.04%
libraries.crossgen2.windows.arm64.checked.mch -0.10%
libraries.pmi.windows.arm64.checked.mch -0.08%
libraries_tests.run.windows.arm64.Release.mch -0.09%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.07%
realworld.run.windows.arm64.checked.mch -0.13%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%
MinOpts (-0.38% to +0.01%)
Collection PDIFF
benchmarks.run_pgo.windows.arm64.checked.mch -0.17%
benchmarks.run_tiered.windows.arm64.checked.mch -0.07%
coreclr_tests.run.windows.arm64.checked.mch -0.03%
libraries.pmi.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch -0.14%
realworld.run.windows.arm64.checked.mch -0.38%
FullOpts (-0.13% to -0.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch -0.06%
benchmarks.run_pgo.windows.arm64.checked.mch -0.06%
benchmarks.run_tiered.windows.arm64.checked.mch -0.04%
coreclr_tests.run.windows.arm64.checked.mch -0.04%
libraries.crossgen2.windows.arm64.checked.mch -0.10%
libraries.pmi.windows.arm64.checked.mch -0.08%
libraries_tests.run.windows.arm64.Release.mch -0.07%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch -0.07%
realworld.run.windows.arm64.checked.mch -0.13%
smoke_tests.nativeaot.windows.arm64.checked.mch -0.02%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.31% to -0.08%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.16%
benchmarks.run.windows.x64.checked.mch -0.13%
benchmarks.run_pgo.windows.x64.checked.mch -0.13%
benchmarks.run_tiered.windows.x64.checked.mch -0.18%
coreclr_tests.run.windows.x64.checked.mch -0.08%
libraries.crossgen2.windows.x64.checked.mch -0.21%
libraries.pmi.windows.x64.checked.mch -0.14%
libraries_tests.run.windows.x64.Release.mch -0.19%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch -0.14%
realworld.run.windows.x64.checked.mch -0.31%
smoke_tests.nativeaot.windows.x64.checked.mch -0.12%
MinOpts (-0.45% to +0.01%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.36%
benchmarks.run_pgo.windows.x64.checked.mch -0.43%
benchmarks.run_tiered.windows.x64.checked.mch -0.36%
coreclr_tests.run.windows.x64.checked.mch -0.10%
libraries.pmi.windows.x64.checked.mch +0.01%
libraries_tests.run.windows.x64.Release.mch -0.35%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch -0.45%
realworld.run.windows.x64.checked.mch -0.42%
FullOpts (-0.31% to -0.06%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.13%
benchmarks.run.windows.x64.checked.mch -0.13%
benchmarks.run_pgo.windows.x64.checked.mch -0.08%
benchmarks.run_tiered.windows.x64.checked.mch -0.06%
coreclr_tests.run.windows.x64.checked.mch -0.06%
libraries.crossgen2.windows.x64.checked.mch -0.21%
libraries.pmi.windows.x64.checked.mch -0.14%
libraries_tests.run.windows.x64.Release.mch -0.13%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch -0.13%
realworld.run.windows.x64.checked.mch -0.31%
smoke_tests.nativeaot.windows.x64.checked.mch -0.12%



Throughput diffs for linux/arm ran on windows/x86

Overall (-0.03% to +0.02%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.01%
benchmarks.run_pgo.linux.arm.checked.mch +0.01%
benchmarks.run_tiered.linux.arm.checked.mch +0.02%
coreclr_tests.run.linux.arm.checked.mch -0.03%
libraries_tests.run.linux.arm.Release.mch -0.02%
realworld.run.linux.arm.checked.mch -0.01%
MinOpts (-0.03% to +0.02%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.02%
benchmarks.run_pgo.linux.arm.checked.mch -0.03%
benchmarks.run_tiered.linux.arm.checked.mch +0.02%
coreclr_tests.run.linux.arm.checked.mch -0.03%
libraries.pmi.linux.arm.checked.mch +0.02%
libraries_tests.run.linux.arm.Release.mch -0.02%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.01%
realworld.run.linux.arm.checked.mch +0.01%
FullOpts (-0.03% to +0.01%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.01%
benchmarks.run_pgo.linux.arm.checked.mch +0.01%
benchmarks.run_tiered.linux.arm.checked.mch +0.01%
coreclr_tests.run.linux.arm.checked.mch -0.03%
libraries_tests.run.linux.arm.Release.mch -0.02%
realworld.run.linux.arm.checked.mch -0.01%

Throughput diffs for windows/x86 ran on windows/x86

Overall (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_pgo.windows.x86.checked.mch -0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
libraries.crossgen2.windows.x86.checked.mch +0.02%
libraries.pmi.windows.x86.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.03%
realworld.run.windows.x86.checked.mch +0.02%
MinOpts (-0.03% to +0.03%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_pgo.windows.x86.checked.mch -0.03%
benchmarks.run_tiered.windows.x86.checked.mch +0.03%
coreclr_tests.run.windows.x86.checked.mch +0.01%
libraries.crossgen2.windows.x86.checked.mch +0.01%
libraries.pmi.windows.x86.checked.mch +0.03%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.02%
realworld.run.windows.x86.checked.mch +0.02%
FullOpts (-0.01% to +0.03%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_pgo.windows.x86.checked.mch -0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
libraries.crossgen2.windows.x86.checked.mch +0.02%
libraries.pmi.windows.x86.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.03%
realworld.run.windows.x86.checked.mch +0.02%



Throughput diffs for linux/arm64 ran on linux/x64

Overall (-0.17% to -0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.06%
benchmarks.run_pgo.linux.arm64.checked.mch -0.07%
libraries.crossgen2.linux.arm64.checked.mch -0.11%
realworld.run.linux.arm64.checked.mch -0.14%
libraries_tests.run.linux.arm64.Release.mch -0.10%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%
benchmarks.run.linux.arm64.checked.mch -0.08%
libraries.pmi.linux.arm64.checked.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.08%
benchmarks.run_tiered.linux.arm64.checked.mch -0.17%
MinOpts (-0.36% to +0.00%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.05%
benchmarks.run_pgo.linux.arm64.checked.mch -0.20%
libraries.crossgen2.linux.arm64.checked.mch -0.01%
realworld.run.linux.arm64.checked.mch -0.36%
libraries_tests.run.linux.arm64.Release.mch -0.16%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.01%
benchmarks.run_tiered.linux.arm64.checked.mch -0.28%
FullOpts (-0.14% to -0.02%)
Collection PDIFF
coreclr_tests.run.linux.arm64.checked.mch -0.06%
benchmarks.run_pgo.linux.arm64.checked.mch -0.05%
libraries.crossgen2.linux.arm64.checked.mch -0.11%
realworld.run.linux.arm64.checked.mch -0.14%
libraries_tests.run.linux.arm64.Release.mch -0.08%
smoke_tests.nativeaot.linux.arm64.checked.mch -0.02%
benchmarks.run.linux.arm64.checked.mch -0.08%
libraries.pmi.linux.arm64.checked.mch -0.09%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch -0.08%
benchmarks.run_tiered.linux.arm64.checked.mch -0.05%

Throughput diffs for linux/x64 ran on linux/x64

Overall (-0.07% to -0.01%)
Collection PDIFF
realworld.run.linux.x64.checked.mch -0.05%
benchmarks.run_pgo.linux.x64.checked.mch -0.04%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.07%
smoke_tests.nativeaot.linux.x64.checked.mch -0.03%
libraries.pmi.linux.x64.checked.mch -0.05%
benchmarks.run_tiered.linux.x64.checked.mch -0.01%
libraries_tests.run.linux.x64.Release.mch -0.06%
coreclr_tests.run.linux.x64.checked.mch -0.04%
benchmarks.run.linux.x64.checked.mch -0.02%
libraries.crossgen2.linux.x64.checked.mch -0.04%
MinOpts (-0.10% to +0.00%)
Collection PDIFF
realworld.run.linux.x64.checked.mch -0.03%
benchmarks.run_pgo.linux.x64.checked.mch -0.08%
smoke_tests.nativeaot.linux.x64.checked.mch -0.02%
libraries_tests.run.linux.x64.Release.mch -0.10%
coreclr_tests.run.linux.x64.checked.mch -0.03%
libraries.crossgen2.linux.x64.checked.mch -0.01%
FullOpts (-0.07% to -0.02%)
Collection PDIFF
realworld.run.linux.x64.checked.mch -0.05%
benchmarks.run_pgo.linux.x64.checked.mch -0.03%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch -0.07%
smoke_tests.nativeaot.linux.x64.checked.mch -0.03%
libraries.pmi.linux.x64.checked.mch -0.05%
benchmarks.run_tiered.linux.x64.checked.mch -0.02%
libraries_tests.run.linux.x64.Release.mch -0.05%
coreclr_tests.run.linux.x64.checked.mch -0.04%
benchmarks.run.linux.x64.checked.mch -0.02%
libraries.crossgen2.linux.x64.checked.mch -0.04%



Member

@AndyAyersMS AndyAyersMS left a comment

Changes look good.

I wonder if it would be more efficient to generalize this to sets of locals? You could have the same kind of cap where if the sets grow too large you just bail out.

Member

@AndyAyersMS AndyAyersMS left a comment

Meant to approve, comments above are for possible follow up.

@jakobbotsch
Member Author

> Changes look good.
>
> I wonder if it would be more efficient to generalize this to sets of locals? You could have the same kind of cap where if the sets grow too large you just bail out.

Seems likely it would be, e.g. I could factor SmallValueNumSet to a general set type and try to reuse it here. I'll consider that for a follow up.
We also have ALLVARSET, but I dislike it given how it pushes the handling of cases where you have more than lclMAX_ALLSET_TRACKED locals onto the user.
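
For illustration only, the capped-set idea discussed here might look roughly like the sketch below. `CappedLocalSet`, `MaxSize`, `Add`, and `MayIntersect` are hypothetical names invented for this example; they are not part of the JIT and do not reflect how `SmallValueNumSet` or `ALLVARSET` are implemented. The assumption is that each tree contributes one set of stored locals and one set of read locals, and that a set which grows past the cap simply answers "may intersect" for any non-empty counterpart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical capped set of local numbers (not an actual JIT type).
// Once more than MaxSize distinct locals are added, the set stops tracking
// members and every later query is answered conservatively.
class CappedLocalSet
{
    static constexpr std::size_t MaxSize = 8; // illustrative cap, chosen arbitrarily

    unsigned    m_locals[MaxSize] = {};
    std::size_t m_count           = 0;
    bool        m_overflowed      = false;

public:
    // Records a local number, or marks the set as overflowed if the cap is hit.
    void Add(unsigned lclNum)
    {
        if (m_overflowed || ContainsExact(lclNum))
        {
            return;
        }

        if (m_count == MaxSize)
        {
            m_overflowed = true; // bail out: give up on precise tracking
            return;
        }

        m_locals[m_count++] = lclNum;
        assert(m_count <= MaxSize);
    }

    bool Overflowed() const { return m_overflowed; }
    bool IsEmpty() const { return !m_overflowed && (m_count == 0); }

    // Returns false only when the two sets provably share no local.
    bool MayIntersect(const CappedLocalSet& other) const
    {
        if (IsEmpty() || other.IsEmpty())
        {
            return false; // nothing to interfere with
        }

        if (m_overflowed || other.m_overflowed)
        {
            return true; // precise information was dropped, so assume overlap
        }

        for (std::size_t i = 0; i < m_count; i++)
        {
            if (other.ContainsExact(m_locals[i]))
            {
                return true;
            }
        }

        return false;
    }

private:
    // Exact membership test; only meaningful while the set has not overflowed.
    bool ContainsExact(unsigned lclNum) const
    {
        for (std::size_t i = 0; i < m_count; i++)
        {
            if (m_locals[i] == lclNum)
            {
                return true;
            }
        }

        return false;
    }
};
```

Under this sketch, a store-interference query between two trees would reduce to checking the stored-locals set of one tree against the read-locals set of the other (and vice versa), with the overflow flag playing the role of the bail-out cap mentioned in the review.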

@jakobbotsch jakobbotsch merged commit 2558ff9 into dotnet:main Jan 25, 2024
126 of 129 checks passed
@jakobbotsch jakobbotsch deleted the call-args-morphing-flags branch January 25, 2024 08:44
@cincuranet
Contributor

Improvements:

@github-actions github-actions bot locked and limited conversation to collaborators Mar 3, 2024
Development

Successfully merging this pull request may close these issues.

Missing GTF_ASG and GTF_ALL_EFFECT on GenTree nodes.