Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: optimize redundant type tests created by PGO #46887

Open
Tracked by #74873
AndyAyersMS opened this issue Jan 12, 2021 · 5 comments
Open
Tracked by #74873

JIT: optimize redundant type tests created by PGO #46887

AndyAyersMS opened this issue Jan 12, 2021 · 5 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

It seems likely that many of the type tests introduced by the jit will be redundant, as multiple virtual or interface calls can be made on the same object.

A first step towards optimizing these is the jump threading introduced in #46257, but this should be generalized to handle the case where there is computation in between the calls. In general this computation must be duplicated so in addition to the logistics of properly duplicating the code, some heuristics will be needed to decide when the duplication is worthwhile.

category:cq
theme:profile-feedback
skill-level:expert
cost:medium

@AndyAyersMS AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 12, 2021
@AndyAyersMS AndyAyersMS added this to the 6.0.0 milestone Jan 12, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Jan 12, 2021
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Jan 12, 2021
@AndyAyersMS AndyAyersMS mentioned this issue Jan 12, 2021
54 tasks
@AndyAyersMS
Copy link
Member Author

It might also make sense to try and catch this earlier during fgOptimizeUncondBranchToSimpleCond so that in cases with back to back type tests on the same object the paths are disambiguated before we get to the optimizer.

@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@AndyAyersMS AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 30, 2021
@AndyAyersMS AndyAyersMS modified the milestones: 6.0.0, Future Mar 30, 2021
@AndyAyersMS
Copy link
Member Author

#51890 gets many cases of this even earlier (during guarded devirt transform). But likely we'll want enhancements elsewhere as opportunities will arise downstream.

@AndyAyersMS
Copy link
Member Author

#69022 also helps improve jump threading, but we still will bail out when the block has multiple statements.

@AndyAyersMS
Copy link
Member Author

Worked on this a bit.

Mechanically this isn't too bad. We are still trying to avoid creating new blocks. We potentially need to duplicate statements for some subset of preds but can hold costs down by reusing the current block. Looking at potential cost of duplication the bulk of the opportunities happen with duplication costs (as measured by GetCostSz, windows x64 SPMI) of 20 or less:

image

However, we need to figure out how to handle updating SSA. Since we lack a coherent SSA update story, it may be challenging.

If we duplicate uses, we may need to map them to the corresponding phi input ssa nums. That might be tractable, though IIRC we currently do not strongly associate PHI input operands with control flow. If we duplicate defs we may need new SSA numbers and the ability to update the appropriate subset of uses downstream. That seems difficult as we do not track SSA uses for each def.

Here's now things currently work. We insist the jump threaded block have just one side-effect free tree, and we eliminate that tree. So we do not introduce new defs. We do not try and fix the downstream uses. For instance int the AFTER picture below, X's IR should refer to t_0, J should be empty, and Y should refer to t_1.

BEFOREAFTER (current JT)

Furthermore, any downstream uses of t_2 that are reachable from X only should also be updated to t_0; any reachable from Y only should also be updated to t_1, and any reachable from both should inspire an appropriate phi (with subsequent uses likely updated, and possibly more phis, etc).

With this odd state of affairs, if a downstream phase walks the SSA graph it will see extra definitions in some places but will never miss seeing an actual definition. So we have argued (perhaps incorrectly) that this state of affairs is conservatively correct and the only downside of not updating SSA (and related things like VNs) is that downstream phases will miss out on follow-on opportunities.

But if jump threading starts duplicating SSA defs it gets harder to envision what a conservatively correct result looks like. Blind duplication doesn't work as range prop notices the multiple defs.

And if jump threading starts to duplicate SSA uses into X that are defined by a PHI in J, then the uses in X are nonsensical; these should be remapped to whatever use is available at the end of X (which as noted above we may not be able to figure out easily right now).

It may be possible to get some subset of cases by not allowing SSA defs or phis in block J, so all we end up doing is duplicating uses in X that are already available there, but this is likely way too restrictive. I'll take a look, but don't expect to find many cases like this.

To sum up: I'm not sure how to address these problems; seems like we need to tackle some kind of SSA update first.

@AndyAyersMS
Copy link
Member Author

Given the above, I'm going to move this to future.

@AndyAyersMS AndyAyersMS modified the milestones: 7.0.0, Future Jun 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Development

No branches or pull requests

2 participants