Add SPGO support and MIBC comparison in dotnet-pgo #52765

jakobbotsch · 2021-05-14T15:36:47Z

This allows dotnet-pgo to generate .mibc files using the sample data
stored in the trace that it is processing. It implements support for
both last branch record (LBR) data and normal IP samples. The latter can
be produced using PerfView as normal while the former currently requires
using xperf with LBR mode enabled. For posterity, to enable logging of
both the required .NET events and LBR, the following commands can be
used (on Windows):

xperf.exe -start "NT Kernel Logger" -on LOADER+PROC_THREAD+PMC_PROFILE -MinBuffers 4096 -MaxBuffers 4096 -BufferSize 4096 -pmcprofile BranchInstructionRetired -LastBranch PmcInterrupt -setProfInt BranchInstructionRetired 65537 -start clr -on e13c0d23-ccbc-4e12-931b-d9cc2eee27e4:0x40000A0018:0x5 -MinBuffers 4096 -MaxBuffers 4096 -BufferSize 4096
scenario.exe
xperf.exe -stop "NT Kernel Logger" -stop clr -d xperftrace.etl

SPGO does not currently do well with optimized code as the IP<->IL
mappings the JIT produces there are not sufficiently accurate.
To collect data in tier-0 one can enable two environment variables
before running the scenario:

$env:COMPlus_TC_QuickJitForLoops=1
$env:COMPlus_TC_CallCounting=0
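
To illustrate why the IP<->IL mapping matters, here is a rough sketch of how a sampled instruction pointer could be attributed to a basic block: look up the native offset in the JIT's mapping to get an IL offset, then find the block containing that IL offset. This is illustrative Python, not the C# code in dotnet-pgo; the function name and the data layout (sorted lists of offsets) are assumptions made for the example.

```python
import bisect
from collections import Counter

def attribute_samples(sample_offsets, native_to_il, block_starts):
    """Count samples per basic block.

    sample_offsets: native code offsets of the samples within one method.
    native_to_il:   list of (native_offset, il_offset), sorted by native_offset
                    (hypothetical stand-in for the JIT's IP<->IL mapping).
    block_starts:   sorted IL offsets at which basic blocks begin.
    """
    native_keys = [native for native, _ in native_to_il]
    counts = Counter()
    for ip in sample_offsets:
        # Last mapping entry whose native offset is <= the sample.
        i = bisect.bisect_right(native_keys, ip) - 1
        if i < 0:
            continue  # sample precedes every mapped range; drop it
        il_offset = native_to_il[i][1]
        # Last basic block starting at or before that IL offset.
        j = bisect.bisect_right(block_starts, il_offset) - 1
        if j < 0:
            continue
        counts[block_starts[j]] += 1
    return counts
```

If the mapping is coarse or reordered, as it can be in optimized code, the first lookup lands on the wrong IL offset and the sample is credited to the wrong block, which is the inaccuracy described above.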

When samples are used the associated counts will not typically look
valid, i.e. they won't satisfy flow conservation. To remedy this,
dotnet-pgo performs a smoothing step after assigning samples to the
flow-graph of each method. The smoothing is based on [1] and the code
comes from Midori.
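
To make "flow conservation" concrete: in a valid profile, each block's count equals the sum of the counts on its incoming edges (except for the entry block) and on its outgoing edges (except for exit blocks). The sketch below only checks that property; it is not the minimum cost circulation smoothing from [1], and it is illustrative Python with hypothetical names rather than the tool's C# code.

```python
from collections import defaultdict

def conservation_violations(block_counts, edge_counts, entry, exits):
    """Return the blocks whose count disagrees with their inflow or outflow.

    block_counts: {block: count}
    edge_counts:  {(src, dst): count}
    entry:        the entry block (its inflow is not checked)
    exits:        set of exit blocks (their outflow is not checked)
    """
    inflow = defaultdict(int)
    outflow = defaultdict(int)
    for (src, dst), count in edge_counts.items():
        outflow[src] += count
        inflow[dst] += count

    violations = []
    for block, count in block_counts.items():
        if block != entry and inflow[block] != count:
            violations.append(block)
        elif block not in exits and outflow[block] != count:
            violations.append(block)
    return violations
```

For a diamond-shaped flow graph A -> {B, C} -> D with edge counts 70 and 30 on the two paths, block counts of 100/70/30/100 pass the check, while a sampled count of 90 for D would be reported as a violation that smoothing needs to repair.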

The commit adds some new options to dotnet-pgo. The --spgo flag can be
specified to create-mibc to use samples to create the .mibc file. Even
if --spgo is specified, instrumented data will still be preferred if
available in the trace. If --spgo is not specified, the behavior should
be the same as before.

--spgo-with-block-counts and --spgo-with-edge-counts control whether
dotnet-pgo outputs the smoothed block or edge counts (or both). By
default block counts are output. The JIT can use both forms of counts
but will be most happy if only one kind is present for each method.

--spgo-min-samples controls how many samples must be in each method
before smoothing is applied and the result included in the .mibc. SPGO
is quite sensitive to low sample counts and the produced results are not
good when the number of samples is low. By default, this value is 50.

The commit also adds a new compare-mibc command that compares two
.mibc files. Usage is dotnet-pgo compare-mibc --input file1.mibc
--input file2.mibc. For example, comparing a .mibc produced via
instrumentation and one produced via SPGO (in tier-0) for some JIT
benchmarks produces the following:

Comparing instrumented.mibc to spgo.mibc
Statistics for instrumented.mibc
# Methods: 3490
# Methods with any profile data: 865
# Methods with 32-bit block counts: 0
# Methods with 64-bit block counts: 865
# Methods with 32-bit edge counts: 0
# Methods with 64-bit edge counts: 0
# Methods with type handle histograms: 184
# Methods with GetLikelyClass data: 0
# Profiled methods in instrumented.mibc not in spgo.mibc: 652

Statistics for spgo.mibc
# Methods: 1107
# Methods with any profile data: 286
# Methods with 32-bit block counts: 286
# Methods with 64-bit block counts: 0
# Methods with 32-bit edge counts: 0
# Methods with 64-bit edge counts: 0
# Methods with type handle histograms: 0
# Methods with GetLikelyClass data: 0
# Profiled methods in spgo.mibc not in instrumented.mibc: 73

Comparison
# Methods with profile data in both .mibc files: 213
  Of these, 213 have matching flow-graphs and the remaining 0 do not

When comparing the flow-graphs of the matching methods, their overlaps break down as follows:
100% █ (1.9%)
>95% █████████████████████████████████▌ (61.0%)
>90% ████████ (14.6%)
>85% ████▏ (7.5%)
>80% ████▋ (8.5%)
>75% █▊ (3.3%)
>70% █ (1.9%)
>65% ▎ (0.5%)
>60% ▎ (0.5%)
>55% ▏ (0.0%)
>50% ▏ (0.0%)
>45% ▏ (0.0%)
>40% ▎ (0.5%)
>35% ▏ (0.0%)
>30% ▏ (0.0%)
>25% ▏ (0.0%)
>20% ▏ (0.0%)
>15% ▏ (0.0%)
>10% ▏ (0.0%)
> 5% ▏ (0.0%)
> 0% ▏ (0.0%)
(using block counts)
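
The output above does not spell out how "overlap" is computed. One common definition, assumed here purely for illustration, normalizes each profile's block counts so they sum to 1 and then adds up the per-block minimum of the two normalized weights. Identical distributions score 100% regardless of absolute scale, which matters when comparing instrumented counts against much smaller sample counts. This is illustrative Python with a hypothetical function name, not the tool's C# code.

```python
def overlap(counts_a, counts_b):
    """Overlap of two block-count profiles of the same method, in [0, 1].

    counts_a, counts_b: {block_id: count}. Assumes both profiles were
    collected over the same flow graph.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        return 0.0  # no data in one profile: treat as no overlap
    shared = counts_a.keys() & counts_b.keys()
    return sum(
        min(counts_a[block] / total_a, counts_b[block] / total_b)
        for block in shared
    )
```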

I also made the dump command print some statistics about the .mibc that
was dumped. Hopefully some of this tooling can help track down #51908.

[1] Levin R., Newman I., Haber G. (2008) Complementing Missing and
Inaccurate Profiling Using a Minimum Cost Circulation Algorithm. In:
Stenström P., Dubois M., Katevenis M., Gupta R., Ungerer T. (eds) High
Performance Embedded Architectures and Compilers. HiPEAC 2008. Lecture
Notes in Computer Science, vol 4917. Springer, Berlin, Heidelberg.
https://doi.org/10.1007/978-3-540-77560-7_20

cc @davidwrighton @AndyAyersMS


jakobbotsch · 2021-05-14T15:38:02Z

src/coreclr/tools/Common/Pgo/PgoFormat.cs

                        case PgoInstrumentationKind.EdgeIntCount:
+                        case PgoInstrumentationKind.EdgeLongCount:


I missed this in #51625 so merging files with 64-bit counts currently does not work.

Is there anything else you know of that would block us from using 64 bit counts across dynamic/static PGO?

I think we might want to update the class probe counter, but I don't think we use the value for anything right now, so perhaps it's not urgent. Changing this is a bit tricky as the runtime helper also knows the layout of this data.

I'm not familiar with anything, no. At least I believe I have modified all places using EdgeIntCount and BlockIntCount in the code base now, but maybe something else could be hiding somewhere.

I do now see that I missed the class probe counter. I'll submit a PR for that.

There is actually still this left:

runtime/src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs

Lines 2456 to 2457 in 3b8adb7

if (pSchema[iSchema].InstrumentationKind != PgoInstrumentationKind.BasicBlockIntCount)

return HRESULT.E_NOTIMPL;

I don't know if this is currently being used, however. Do we ever crossgen with instrumentation turned on?

davidwrighton

@jakobbotsch, this looks fantastic. I'm not going to review the spgo algorithm for correctness, as I'm no expert in the space, but the integration of it into the dotnet-pgo tool is absolutely first class. I'd like to see @AndyAyersMS do the review of the actual SPGO algorithm.

The other detail I'd like to see is moving the FlowGraph class from being part of your SPGO directory, to instead place it in src/coreclr/tools/Common/TypeSystem/IL next to ILOpcodeHelper.cs. Looking at FlowGraph, it looks general purpose, and may actually be useful for several purposes in crossgen2. For instance, we could record the flowgraph in some way into the mibc file and avoid causing edge/block count mismatches on merge. Alternatively, we could use it as some sort of complexity detector, and tweak how we choose which methods to precompile.

AndyAyersMS · 2021-05-14T18:49:56Z

Profiled methods in spgo.mibc not in instrumented.mibc: 73

I wonder if this happens because "minimal profiling" is enabled by default when jitting. That is, if a method has no interesting control flow, the jit won't add any count probes. I added this to reduce overhead for DynamicPGO.

You might play around with setting COMPlus_JitMinimalJitProfiling=0 to force count instrumentation in all methods. Or else mimic this behavior for SPGO reconstruction and not bother with trivial methods (presumably there's not much interesting to learn here).

We also might want to generally set COMPlus_JitMinimalJitProfiling=0 when gathering static PGO data (especially if we also force methods to remain in Tier0) so that absolute profile counts are meaningful across methods, e.g. for computing which methods are globally important or for informing method layout algorithms.

AndyAyersMS

I want to echo what David said, this looks great.

I will be quite interested to try out the mibc comparison on some problems I'm puzzling through.

AndyAyersMS · 2021-05-14T20:41:05Z

src/coreclr/tools/Common/Pgo/PgoFormat.cs

                        case PgoInstrumentationKind.EdgeIntCount:
+                        case PgoInstrumentationKind.EdgeLongCount:


Is there anything else you know of that would block us from using 64 bit counts across dynamic/static PGO?

I think we might want to update the class probe counter, but I don't think we use the value for anything right now, so perhaps it's not urgent. Changing this is a bit tricky as the runtime helper also knows the layout of this data.

davidwrighton · 2021-05-14T21:46:01Z

@AndyAyersMS We should decide if we want 64bit counts always or 32bit counts in some cases. Right now the merger isn't able to merge a 64bit and 32bit count, which doesn't matter too much, but if we are going to have combinations of the two, we should be able to merge them together into 64bit counts for the edge/block data.

Also, getting rid of 32bit counts will involve removing support for the old IBC pipeline. We may wish to keep the IBC parser for csc scenarios as they still feed that data to us. Or perhaps we should just stop that too.
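
As a rough sketch of the mixed-width merge described above: when one input carries 32-bit counts and the other 64-bit counts, the merged result can be widened to the 64-bit kind and the counts summed elementwise. The kind names EdgeIntCount and EdgeLongCount come from the diff in this PR; everything else (the function names, representing a schema entry as a kind string plus a list, and saturating instead of wrapping on overflow) is an assumption for illustration, written in Python rather than the merger's C#.

```python
U32_MAX = 2**32 - 1
U64_MAX = 2**64 - 1

def widen(kind):
    """Map a 32-bit count kind to its 64-bit counterpart (name-based, hypothetical)."""
    return kind.replace("IntCount", "LongCount")

def merge_counts(kind_a, counts_a, kind_b, counts_b):
    """Merge two count arrays for the same method, widening if the widths differ."""
    if widen(kind_a) != widen(kind_b):
        raise ValueError("cannot merge different kinds of counts")
    if len(counts_a) != len(counts_b):
        raise ValueError("count arrays differ in length (flow-graph mismatch?)")

    # Keep the narrow kind only when both inputs are narrow.
    merged_kind = kind_a if kind_a == kind_b else widen(kind_a)
    limit = U32_MAX if merged_kind.endswith("IntCount") else U64_MAX

    # Assumption: saturate at the maximum value rather than wrapping around.
    merged = [min(a + b, limit) for a, b in zip(counts_a, counts_b)]
    return merged_kind, merged
```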

jakobbotsch · 2021-05-17T11:20:36Z

@jakobbotsch, this looks fantastic. I'm not going to review the spgo algorithm for correctness, as I'm no expert in the space, but the integration of it into the dotnet-pgo tool is absolutely first class. I'd like to see @AndyAyersMS do the review of the actual SPGO algorithm.

Thank you!

The other detail I'd like to see is moving the FlowGraph class from being part of your SPGO directory, to instead place it in src/coreclr/tools/Common/TypeSystem/IL next to ILOpcodeHelper.cs. Looking at FlowGraph, it looks general purpose, and may actually be useful for several purposes in crossgen2. For instance, we could record the flowgraph in some way into the mibc file and avoid causing edge/block count mismatches on merge. Alternatively, we could use it as some sort of complexity detector, and tweak how we choose which methods to precompile.

@davidwrighton Should I move the file there and include it in the ILCompiler.TypeSystem.ReadyToRun project, or should I just move it there and link it into dotnet-pgo?

I wonder if this happens because "minimal profiling" is enabled by default when jitting. That is, if a method has no interesting control flow, the jit won't add any count probes. I added this to reduce overhead for DynamicPGO.

You might play around with setting COMPlus_JitMinimalJitProfiling=0 to force count instrumentation in all methods. Or else mimic this behavior for SPGO reconstruction and not bother with trivial methods (presumably there's not much interesting to learn here).

We also might want to generally set COMPlus_JitMinimalJitProfiling=0 when gathering static PGO data (especially if we also force methods to remain in Tier0) so that absolute profile counts are meaningful across methods, e.g. for computing which methods are globally important or for informing method layout algorithms.

@AndyAyersMS I looked at a few of them and indeed, looks like they are very simple methods without interesting control flow. I think it makes sense to do the same in SPGO and ignore them, depending on whether we need a globally consistent view or not.

davidwrighton · 2021-05-17T17:59:10Z

@jakobbotsch For the FlowGraph, either build approach is good for me. If we need it, the code will be there to reference without trouble.

dotnet-issue-labeler bot added the area-crossgen2-coreclr label May 14, 2021

jakobbotsch commented May 14, 2021

davidwrighton approved these changes May 14, 2021

AndyAyersMS approved these changes May 14, 2021

Small expression and help string fix

04368f5

Move FlowGraph.cs to be more generally available

0b706b1

jakobbotsch merged commit 62df492 into dotnet:main May 17, 2021

jakobbotsch deleted the spgo branch May 17, 2021 21:52

jakobbotsch mentioned this pull request May 18, 2021

Dynamic PGO #43618

Closed

54 tasks

karelz added this to the 6.0.0 milestone May 20, 2021

ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
