JIT: have inlining heuristics look for cases where inlining might enable devirtualization #10303

Closed
AndyAyersMS opened this issue May 9, 2018 · 4 comments · Fixed by #52708
Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), enhancement (Product code improvement that does NOT require public API changes/additions), JitUntriaged (CLR JIT issues needing additional triage), optimization
Milestone: Future

Comments

@AndyAyersMS
Member

When a caller argument is an exact type and feeds a virtual or interface call in the callee, we might want to inline more aggressively.

A toy example of this can be found in this BenchmarkDotNet sample. Here Run, if inlined, would allow the interface calls to devirtualize.
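The sample itself isn't reproduced here, so the following is only a minimal sketch of the shape of the pattern; the names match the jit output below, but the method bodies are illustrative guesses, not the actual benchmark code:

```csharp
// Sketch only: Jit_InterfaceMethod, Run1, Run, and Foo1 appear in the jit
// output below, but these bodies are guesses rather than the real sample.
interface IFoo
{
    double Step(double x);
}

sealed class Foo1 : IFoo
{
    public double Step(double x) => x * 1.0000001;
}

class Jit_InterfaceMethod
{
    public double Run1()
    {
        Foo1 f = new Foo1();       // the caller sees the exact type Foo1
        return Run(f);
    }

    double Run(IFoo f)
    {
        double r = 1.0;
        for (int i = 0; i < 1000; i++)
            r = f.Step(r);         // interface call; devirtualizable once Run is inlined into Run1
        return r;
    }
}
```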

Run is currently pretty far from being a viable inline candidate:

Inline candidate callsite is boring.  Multiplier increased to 1.3.
calleeNativeSizeEstimate=545
callsiteNativeSizeEstimate=115
benefit multiplier=1.3
threshold=149
Native estimate for function size exceeds threshold for inlining 54.5 > 14.9 (multiplier = 1.3)

Inline expansion aborted, inline not profitable

INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for
'Jit_InterfaceMethod:Run1():double:this' calling 'Jit_InterfaceMethod:Run(ref):double:this'

Caller knows that the argument to Run is exact:

lvaUpdateClass: Updating class for V01 (00007FF966344CD0) Foo1 to be exact

               [000029] ------------              *  STMT      void  (IL 0x00C...  ???)
               [000027] I-C-G-------              \--*  CALL      double Jit_InterfaceMethod.Run
               [000025] ------------ this in rcx     +--*  LCL_VAR   ref    V00 this         
               [000026] ------------ arg1            \--*  LCL_VAR   ref    V01 loc0         

Likely we would not give a ~3.65x boost to the inlining benefit based on one argument reaching one call site. But if we also realized the call site was in a loop, perhaps the net effect would be enough to justify the inline.

Currently, when observing arg uses, we don't know whether a given use is in a loop or not. But if we were to associate uses with callee IL offsets, we could circle back after finding all the branch targets, develop a crude estimator for loop depth, and then sum up the weighted observations.
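A rough illustration of that weighting idea, in C# for readability (the real heuristic would live in the jit's C++ code, and every name and factor here is hypothetical):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the idea above: record each "exact arg feeds a virtual call"
// observation with its callee IL offset, then weight it by a crude
// loop-depth estimate before folding it into the benefit multiplier.
static class DevirtObservations
{
    public readonly record struct Observation(int IlOffset);

    public static double WeightedBoost(
        IEnumerable<Observation> observations,
        Func<int, int> estimatedLoopDepthAt,   // crude estimate from backward-branch spans
        double perObservationBoost)
    {
        double boost = 0;
        foreach (var obs in observations)
        {
            int depth = estimatedLoopDepthAt(obs.IlOffset);
            // Assume each loop level makes the call site ~4x hotter; the
            // factor is illustrative, not a tuned value.
            boost += perObservationBoost * Math.Pow(4, depth);
        }
        return boost;
    }
}
```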

It would also be nice to tabulate a few more opportunities of this kind. The basic observational part of the change is simple enough to prototype that perhaps just building it is one way to make forward progress.

category:cq
theme:inlining
skill-level:intermediate
cost:medium

@AndyAyersMS
Member Author

Took a look at this and was reminded that the jit's early modelling of prospective inlinees is quite crude.

In a case like this we have A -> B -> C, and we're trying to decide if inlining B into A will allow us to devirtualize the call from B to C.

So we need to know:

  • Is the call from B to C a virtual call? We can easily see if it has a CALLVIRT opcode, but that's not sufficient (callvirt is also used for ordinary non-virtual instance calls).
    • So we'll need to resolve the token at the call site and query some properties: is the method virtual, and what is its arity? This is potentially costly.
  • If it is virtual and has N args, we then need to know what expression in B is the Nth one on the eval stack. Currently the stack model for B is crude: it only tracks 2 levels and ignores pops. So we'd likely want it to run a bit deeper; 4 or so would likely be sufficient, as anything that takes more args is probably too big to inline by default anyway. We can ignore pops for now since this is a heuristic and it's OK to be wrong sometimes (as long as it's not too often).
  • Map the expression back into A and see if we know the type is exact. We know the expression in A, and we can just track the simple case where that expression reaches the call site (e.g. it is an argument reference).

To reduce the token resolution cost we could instead first see whether any of the tracked stack locations holds an exactly known ref type; if not, we could skip the lookup. If so, we could do the lookup and verify that we're making a virtual/interface call AND that the arity is within our tracked range AND that the object pointer in the call is one of the exactly known ref types (AND perhaps that the type has interesting devirtualizations, e.g. we may want to punt on string).
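A sketch of that screening order, again in C# for readability; the names and types are illustrative, not actual jit APIs:

```csharp
using System;

// Cheap checks first, costly token resolution only when they pass.
static class DevirtScreen
{
    public static bool MightEnableDevirt(
        bool[] trackedSlotIsExactRef,                       // crude fixed-depth eval-stack model
        Func<(bool isVirtual, int argCount)> resolveToken,  // potentially costly lookup
        int objectPointerSlot)
    {
        // Only pay for token resolution if some tracked stack slot is known
        // to hold an exactly typed ref.
        if (!Array.Exists(trackedSlotIsExactRef, isExact => isExact))
            return false;

        var (isVirtual, argCount) = resolveToken();
        if (!isVirtual)
            return false;
        if (argCount > trackedSlotIsExactRef.Length)        // arity outside the tracked range
            return false;

        // The object pointer for the call must itself be one of the exactly
        // known slots (and perhaps also a type with interesting
        // devirtualizations, e.g. we may want to punt on string).
        return objectPointerSlot < trackedSlotIsExactRef.Length
            && trackedSlotIsExactRef[objectPointerSlot];
    }
}
```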

@redknightlois

@AndyAyersMS I have been thinking about that problem too (not at the JIT level). There are specific cases where this can be handled in a different way (probably deserves its own issue), but let me elaborate in case there is some fundamental issue involved that I am not aware of.

On very hot paths, a call may be made where the type is already known, especially when the method is a candidate for inlining but fails because inlining is actually not profitable. That doesn't prevent us from being able to devirtualize it. We could explicitly mark the method as opt-in "specializable" somehow (an attribute, Jit.Specialize(), or something similar). That way, the JIT would emit as many versions of the method as the parameter types it sees it called with. If the type of the variable is known to be a specific type, even when the parameter is an interface type, it would generate a specialization and call it. It could cause code bloat if not used properly, but this is not a one-size-fits-all case.
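Purely as an illustration of that suggestion (neither this attribute nor the specialization behavior exists in the runtime today; the names are hypothetical):

```csharp
using System;

// Hypothetical opt-in marker for the suggestion above.
[AttributeUsage(AttributeTargets.Method)]
sealed class JitSpecializeAttribute : Attribute { }

interface IAllocator
{
    IntPtr Allocate(int size);
}

static class Buffers
{
    // Under the suggestion, the JIT could emit one version of Grow per
    // concrete allocator type it observes at call sites, devirtualizing the
    // interface call inside each specialized body even if Grow itself is
    // never inlined.
    [JitSpecialize]
    public static IntPtr Grow(IAllocator allocator, int size)
        => allocator.Allocate(size * 2);
}
```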

The interesting outcome is that this feature would bring some of the 'struct' specialization optimizations into reference-type land, which today is a no-no. I have the feeling that there is an 'underlying' limitation at JIT time, but it can probably be done.

@AndyAyersMS
Member Author

Pointers to specific examples might be helpful, as I can imagine a number of things that might map onto your suggestions.

There is an open issue, #9682, to consider generating specialized versions of generics for particular reference types. I haven't looked at it in depth to determine what all would be involved; most of the work for it would be outside the jit. And as you note, any solution must somehow come to grips with the potential for code bloat.

From past experience I never found compiler-driven code specialization by cloning methods to be particularly useful -- the costs were high, the benefits modest, and inlining seemed like the superior way to benefit from specialization.

But that was before the days of widespread metaprogramming. So perhaps I ought to reconsider.

@redknightlois

redknightlois commented May 26, 2018

@AndyAyersMS That is one case, but I have plenty of those situations at a higher level of abstraction, where evicting the interface at the class level would have a deep architectural impact. My latest Generalized Allocators work has to do pretty nasty stuff to be able to do everything struct-based because of the usage pattern; a class-level #9682 would simplify it a lot. I don't care about code bloat in those cases because I am making an absurd effort to ensure that code paths are created for each type of allocator in a custom way.

@msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits added this to the Future milestone Jan 31, 2020
@AndyAyersMS mentioned this issue Oct 19, 2020
@BruceForstall added the JitUntriaged (CLR JIT issues needing additional triage) label Oct 28, 2020
@ghost added the in-pr (There is an active PR which will close this issue when it is merged) label Jun 21, 2021
@ghost removed the in-pr label Jul 1, 2021
@ghost locked as resolved and limited conversation to collaborators Jul 31, 2021