JIT: have inlining heuristics look for cases where inlining might enable devirtualization #10303

Closed
AndyAyersMS opened this issue May 9, 2018 · 4 comments · Fixed by #52708
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), enhancement (Product code improvement that does NOT require public API changes/additions), JitUntriaged (CLR JIT issues needing additional triage), optimization
Milestone: Future

Comments

@AndyAyersMS
Member

When a caller argument is an exact type and feeds a virtual or interface call in the callee, we might want to inline more aggressively.

A toy example of this can be found in this BenchmarkDotNet sample. Here Run, if inlined, would allow the interface calls to devirtualize.
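The sample itself isn't reproduced here, so the following is only a minimal sketch of the shape of the pattern; the names match the jit output below, but the method bodies are illustrative guesses, not the actual benchmark code:

```csharp
// Sketch only: Jit_InterfaceMethod, Run1, Run, and Foo1 appear in the jit
// output below, but these bodies are guesses rather than the real sample.
interface IFoo
{
    double Step(double x);
}

sealed class Foo1 : IFoo
{
    public double Step(double x) => x * 1.0000001;
}

class Jit_InterfaceMethod
{
    public double Run1()
    {
        Foo1 f = new Foo1();       // the caller sees the exact type Foo1
        return Run(f);
    }

    double Run(IFoo f)
    {
        double r = 1.0;
        for (int i = 0; i < 1000; i++)
            r = f.Step(r);         // interface call; devirtualizable once Run is inlined into Run1
        return r;
    }
}
```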

Run is currently pretty far from being a viable inline candidate:

Inline candidate callsite is boring.  Multiplier increased to 1.3.
calleeNativeSizeEstimate=545
callsiteNativeSizeEstimate=115
benefit multiplier=1.3
threshold=149
Native estimate for function size exceeds threshold for inlining 54.5 > 14.9 (multiplier = 1.3)

Inline expansion aborted, inline not profitable

INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for
'Jit_InterfaceMethod:Run1():double:this' calling 'Jit_InterfaceMethod:Run(ref):double:this'

Caller knows that the argument to Run is exact:

lvaUpdateClass: Updating class for V01 (00007FF966344CD0) Foo1 to be exact

               [000029] ------------              *  STMT      void  (IL 0x00C...  ???)
               [000027] I-C-G-------              \--*  CALL      double Jit_InterfaceMethod.Run
               [000025] ------------ this in rcx     +--*  LCL_VAR   ref    V00 this         
               [000026] ------------ arg1            \--*  LCL_VAR   ref    V01 loc0         

Likely we would not give a ~3.65x boost to the inlining benefit based on one argument reaching one call site. But if we also realized the call site was in a loop, perhaps the net effect would be enough to justify the inline.

Currently, when observing arg uses, we don't know whether a given use is in a loop or not. But if we were to associate uses with callee IL offsets, we could circle back after finding all the branch targets, develop a crude estimator for loop depth, and then sum up the weighted observations.
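A rough illustration of that weighting idea, in C# for readability (the real heuristic would live in the jit's C++ code, and every name and factor here is hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the idea above: record each "exact arg feeds a virtual call"
// observation with its callee IL offset, then weight it by a crude
// loop-depth estimate before folding it into the benefit multiplier.
static class DevirtObservations
{
    public readonly record struct Observation(int IlOffset);

    public static double WeightedBoost(
        IEnumerable<Observation> observations,
        Func<int, int> estimatedLoopDepthAt,   // crude estimate from backward-branch spans
        double perObservationBoost)
    {
        double boost = 0;
        foreach (var obs in observations)
        {
            int depth = estimatedLoopDepthAt(obs.IlOffset);
            // Assume each loop level makes the call site ~4x hotter; the
            // factor is illustrative, not a tuned value.
            boost += perObservationBoost * Math.Pow(4, depth);
        }
        return boost;
    }
}
```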

It would also be nice to tabulate a few more opportunities of this kind. The basic observational part of the change is simple enough to prototype that perhaps just building it is one way to make forward progress.

category:cq
theme:inlining
skill-level:intermediate
cost:medium

@AndyAyersMS
Member Author

Took a look at this and was reminded that the jit's early modelling of prospective inlinees is quite crude.

In a case like this we have A -> B -> C, and we're trying to decide if inlining B into A will allow us to devirtualize the call from B to C.

So we need to know:

  • Is the call from B to C a virtual call? We can easily see if it has a CALLVIRT opcode, but that's not sufficient (callvirt is also used for ordinary non-virtual instance calls).
    • So we'll need to resolve the token at the call site and query some properties: is the method virtual, and what is its arity? This is potentially costly.
  • If it is virtual and has N args, we then need to know what expression in B is the Nth one on the eval stack. Currently the stack model for B is crude: it only tracks 2 levels and ignores pops. So we'd likely want it to run a bit deeper; 4 or so would likely be sufficient, as anything that takes more args is probably too big to inline by default anyway. We can ignore pops for now since this is a heuristic and it's OK to be wrong sometimes (as long as it's not too often).
  • Map the expression back into A and see if we know the type is exact. We know the expression in A, and we can just track the simple case where that expression reaches the call site (e.g. it is an argument reference).

To reduce the token resolution cost we could instead first see whether any of the tracked stack locations holds an exactly known ref type; if not, we could skip the lookup. If so, we could do the lookup and verify that we're making a virtual/interface call AND that the arity is within our tracked range AND that the object pointer in the call is one of the exactly known ref types (AND perhaps that the type has interesting devirtualizations, e.g. we may want to punt on string).
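A sketch of that screening order, again in C# for readability; the names and types are illustrative, not actual jit APIs:

```csharp
using System;

// Cheap checks first, costly token resolution only when they pass.
static class DevirtScreen
{
    public static bool MightEnableDevirt(
        bool[] trackedSlotIsExactRef,                       // crude fixed-depth eval-stack model
        Func<(bool isVirtual, int argCount)> resolveToken,  // potentially costly lookup
        int objectPointerSlot)
    {
        // Only pay for token resolution if some tracked stack slot is known
        // to hold an exactly typed ref.
        if (!Array.Exists(trackedSlotIsExactRef, isExact => isExact))
            return false;

        var (isVirtual, argCount) = resolveToken();
        if (!isVirtual)
            return false;
        if (argCount > trackedSlotIsExactRef.Length)        // arity outside the tracked range
            return false;

        // The object pointer for the call must itself be one of the exactly
        // known slots (and perhaps also a type with interesting
        // devirtualizations, e.g. we may want to punt on string).
        return objectPointerSlot < trackedSlotIsExactRef.Length
            && trackedSlotIsExactRef[objectPointerSlot];
    }
}
```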

@redknightlois

@AndyAyersMS I have been thinking about that problem too (not at the JIT level). There are specific cases where this can be handled in a different way (probably deserves its own issue), but let me elaborate in case there is some fundamental issue involved that I am not aware of.

On very hot paths, a call may be made where the type is already known, especially when the method is a candidate for inlining but fails because inlining is actually not profitable. That doesn't prevent us from being able to devirtualize it. We could explicitly mark the method as opt-in "specializable" somehow (an attribute, Jit.Specialize(), or something similar). That way, the JIT would emit as many versions of the method as the parameter types it sees it called with. If the type of the variable is known to be a specific type, even when the parameter is an interface type, it would generate a specialization and call it. It could cause code bloat if not used properly, but this is not a one-size-fits-all case.
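Purely as an illustration of that suggestion (neither this attribute nor the specialization behavior exists in the runtime today; the names are hypothetical):

```csharp
using System;

// Hypothetical opt-in marker for the suggestion above.
[AttributeUsage(AttributeTargets.Method)]
sealed class JitSpecializeAttribute : Attribute { }

interface IAllocator
{
    IntPtr Allocate(int size);
}

static class Buffers
{
    // Under the suggestion, the JIT could emit one version of Grow per
    // concrete allocator type it observes at call sites, devirtualizing the
    // interface call inside each specialized body even if Grow itself is
    // never inlined.
    [JitSpecialize]
    public static IntPtr Grow(IAllocator allocator, int size)
        => allocator.Allocate(size * 2);
}
```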

The interesting outcome is that this feature would bring some of the 'struct' specialization optimizations into reference-type land, which today is a no-no. I have the feeling that there is an 'underlying' limitation at JIT time, but it can probably be done.

@AndyAyersMS
Member Author

Pointers to specific examples might be helpful, as I can imagine a number of things that might map onto your suggestions.

There is an open issue, #9682, to consider generating specialized versions of generics for particular reference types. I haven't looked at it in depth to determine what all would be involved; most of the work for it would be outside the jit. And as you note, any solution must somehow come to grips with the potential for code bloat.

From past experience I never found compiler-driven code specialization by cloning methods to be particularly useful -- the costs were high, the benefits modest, and inlining seemed like the superior way to benefit from specialization.

But that was before the days of widespread metaprogramming. So perhaps I ought to reconsider.

@redknightlois

redknightlois commented May 26, 2018

@AndyAyersMS That is one case, but I have plenty of those situations at a higher level of abstraction, where evicting the interface at the class level would have a deep architectural impact. My latest Generalized Allocators work has to do pretty nasty stuff to be able to do everything struct-based because of the usage pattern; a class-level #9682 would simplify it a lot. I don't care about code bloat in those cases because I am making an absurd effort to ensure that code paths are created for each type of allocator in a custom way.

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@AndyAyersMS mentioned this issue Oct 19, 2020
@BruceForstall added the JitUntriaged (CLR JIT issues needing additional triage) label Oct 28, 2020
@ghost added the in-pr (There is an active PR which will close this issue when it is merged) label Jun 21, 2021
@ghost removed the in-pr label Jul 1, 2021
@ghost locked as resolved and limited conversation to collaborators Jul 31, 2021