JIT: split up some parts of flowgraph.cpp #47072

Merged: 8 commits merged into dotnet:master on Jan 26, 2021

AndyAyersMS
Member

Move code into files organized by purpose:

  • ehopts - eh optimizations
  • fgcheck - flow graph and IR checks
  • fgdump - flow graph dumping
  • profile - profile instrumentation and reading

This reduces the size of flowgraph.cpp by about 25%, but it is still over 20K lines.

More splitting is potentially warranted, e.g., splitting out analyses, optimizations,
inlining support, etc.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 16, 2021
@AndyAyersMS
Member Author

cc @dotnet/jit-contrib ... I could take this further.

@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Jan 16, 2021
@JulieLeeMSFT
Member

Great to get 25% code size reduction!

@sandreenko
Contributor

@AndyAyersMS do you have any insights if any of these files can be transformed into separate classes easily?

I could take this further.

I would appreciate this.

contributes to #13359

@AndyAyersMS
Member Author

you have any insights if any of these files can be transformed into separate classes easily

I suspect the ones that are phase-like (EH opts) should not be too hard to transform, but if they still densely interact with the compiler instance then I'm not sure much is gained.

We should think more about what we'd hope to achieve by creating these kinds of classes, beyond having them in a separate namespace of sorts.

@AndyAyersMS
Member Author

Here's one attempt at logically grouping the remainder of the methods. I suspect we may want to combine some of these.

I don't know how many LOC are in each group yet; will try to add that subsequently.

analysis

fgDominate
fgReachable
fgComputeReachabilitySets
fgUpdateChangedFlowGraph
fgComputeEnterBlocksSet
fgComputeReachability
fgDfsInvPostOrder
fgDomFindStartNodes
fgDfsInvPostOrderHelper
fgComputeDoms
fgBuildDomTree
fgNumberDomTree
fgIntersectDom
fgGetDominatorSet
fgInitBlockVarSets
fgGetCodeEstimate
fgMeasureIR
fgCompDominatedByExceptionalEntryBlocks

optimization

fgRemoveUnreachableBlocks
fgRemoveEmptyBlocks
fgCanCompactBlocks
fgCompactBlocks
fgUpdateLoopsAfterCompacting
fgUnreachableBlock
fgRemoveConditionalJump

fgOptimizeBranchToEmptyUnconditional
fgOptimizeEmptyBlock
fgOptimizeSwitchBranches
fgBlockEndFavorsTailDuplication
fgBlockIsGoodTailDuplicationCandidate
fgOptimizeUncondBranchToSimpleCond
fgOptimizeBranchToNext
fgOptimizeBranch
fgOptimizeSwitchJumps
fgReorderBlocks

fgUpdateFlowGraph

statements

fgInsertStmtAtBeg
fgNewStmtAtBeg
fgInsertStmtAtEnd
fgNewStmtAtEnd
fgInsertStmtNearEnd
fgNewStmtNearEnd
fgInsertStmtAfter
fgInsertStmtBefore
fgInsertStmtListAfter

fgNewStmtFromTree

fgRemoveStmt
fgCheckRemoveStmt

maintenance

fgEnsureFirstBBisScratch
fgFirstBBisScratch
fgBBisScratch

fgChangeSwitchBlock
fgReplaceSwitchJumpTarget
fgReplaceJumpTarget
fgReplacePred

fgSplitBlockAtEnd
fgSplitBlockAfterStatement
fgSplitBlockAfterNode
fgSplitBlockAtBeginning
fgSplitEdge

fgUnlinkBlock
fgUnlinkRange
fgRemoveBlock

fgConnectFallThrough

fgRenumberBlocks

fgEhAllowsMoveBlock
fgMoveBlocksAfter

fgRelocateEHRange

fgNewBBbefore
fgNewBBafter
fgInsertBBbefore
fgInsertBBafter

fgFindInsertPoint

fgNewBBinRegion
fgNewBBinRegionWorker

pred lists

fgGetPredForBlock
fgSpliceOutPred
fgAddRefPred
fgRemoveRefPred
fgRemoveAllRefPreds
fgRemoveAllRefPreds
fgRemoveBlockAsPred

fgComputeCheapPreds
fgAddCheapPred
fgRemoveCheapPred
fgRemovePreds
fgComputePreds

gc polls

blockNeedsGCPoll
fgInsertGCPolls
fgCreateGCPoll

construction

fgInit
fgNewBasicBlock
fgInitBBLookup
fgLookupBB
fgFindJumpTargets (also inlining)
fgAdjustForAddressExposedOrWrittenThis
fgMarkBackwardJump
fgLinkBasicBlocks
fgMakeBasicBlocks
fgFindBasicBlocks
fgCheckBasicBlockControlFlow
fgControlFlowPermitted
fgFlowToFirstBlockOfInnerTry

importing

fgImport

inlining

class FgStack
fgObserveInlineConstants
fgCheckInlineDepthAndRecursion
fgInline
fgFindNonInlineCandidate
fgNoteNonInlineCandidate
fgAssignStructInlineeToVar
fgAttachStructInlineeToAsg
fgUpdateInlineReturnExpressionPlaceHolder
fgLateDevirtualization
fgDebugCheckInlineCandidates
fgInvokeInlineeCompiler
fgInsertInlineeBlocks
fgInlinePrependStatements
fgInlineAppendStatements
fgNeedReturnSpillTemp

hot/cold, funclets

fgInDifferentRegions
fgIsBlockCold
fgLastBBInMainFunction
fgEndBBAfterMainFunction

fgInsertFuncletPrologBlock
fgCreateFuncletPrologBlocks
fgCreateFunclets

fgDetermineFirstColdBlock

late exception flow

acdHelper
fgAddCodeRef
fgFindExcptnTarget
fgRngChkTarget

misc

fgCanSwitchToOptimized
fgSwitchToOptimized
fgMayExplicitTailCall

fgBlockContainsStatementBounded

fgNSuccsOfFinallyRet
fgSuccOfFinallyRet
fgSuccOfFinallyRetWork
GetDescriptorForSwitch
SwitchUniqueSuccSet::UpdateTarget
fgInvalidateSwitchDescMapEntry
UpdateSwitchTableTarget
fgFirstBlockOfHandler
fgGetNestingLevel
fgFindBlockILOffset

fgIsForwardBranch

fgMightHaveLoop

fgIsBetterFallThrough

fgUseThrowHelperBlocks

eh

fgClearFinallyTargetBit
fgIsIntraHandlerPred
fgAnyIntraHandlerPreds
fgRelocateEHRegions

fgExtendEHRegionBefore
fgExtendEHRegionAfter

fgCheckEHCanInsertAfterBlock

edge weights

setEdgeWeightMinChecked
setEdgeWeightMaxChecked
setEdgeWeights

fgPrintEdgeWeights (should go in fgdump)

sync methods, pinvoke

fgGetCritSectOfStaticMethod
fgAddSyncMethodEnterExit
fgCreateMonitorTree
fgConvertSyncReturnToLeave
fgAddInternal
fgAddReversePInvokeEnterExit

phases

fgExpandRarelyRunBlocks
fgFindOperOrder
fgSimpleLowering
fgSetBlockOrder
fgSetBlockOrder

return merging

fgMoreThanOneReturnBlock
class MergedReturns

loops

fgLoopCallTest
fgLoopCallMark
fgMarkLoopHead

things that should be gentree

fgIsThrow
fgIsCommaThrow
fgIsIndirOfAddrOfLocal
fgAddrCouldBeNull
fgCastNeeded
fgDoNormalizeOnStore
OperIsControlFlow
fgSetTreeSeq
fgSetTreeSeqHelper
fgSetTreeSeqFinish
fgSetStmtSeq
fgGetFirstNode
fgGetStructAsStructPtr

fgChkLocAllocCB
fgChkQmarkCB

fgCheckCallArgUpdate

things that should be in lclvars

fgLclFldAssign

things that should be importer

fgGetStaticsCCtorHelper
fgGetSharedCCtor

fgOptimizeDelegateConstructor (?)

@kunalspathak
Member

For easier reviewing, could you confirm if you have just moved the code into new files or were there any edits in them?

@BruceForstall
Member

I like the idea of breaking up flowgraph.cpp further if it can be logically done (the initial effort years ago to remove fgMorph* from flowgraph was a simple start).

However, I think we need to do it "all at once" (or nearly so), and soon (very early in the product cycle). It breaks function history, which is pretty annoying, but probably worth it if the result is easier to work with.

It should be very simple to find something you're looking for, so breaking things up "too much" can also be counter-productive.

@sandreenko
Contributor

I suspect the ones that are phase-like (EH opts) should not be too hard to transform, but if they still densely interact with the compiler instance then I'm not sure much is gained.

We should think more about what we'd hope to achieve by creating these kinds of classes, beyond having them in a separate namespace of sorts

If these classes are not made friends of the compiler, it is easier to control their individual responsibilities and to avoid what happened with morph, which is called from many different places without any control.

@BruceForstall
Member

I like a lot of the classification you list above.

It makes sense to me to break into separate files:

  • analysis
  • optimization, gc polls, hot/cold, late exception flow, sync methods, pinvoke, phases, return merging, loops
  • "general manipulation": statements, maintenance, pred lists, construction, misc, edge weights
  • importing (not already part of importer.cpp?)
  • inlining
  • funclets, eh (in pre-existing jiteh.cpp) (although some functions listed, like fgExtendEHRegionBefore and fgExtendEHRegionAfter, are general "manipulation" functions that should live in a general "manipulation" file)

@BruceForstall
Member

btw, I like the work done so far in this PR. The only question is whether to do more function extraction from flowgraph.cpp, or leave it as currently proposed.

@AndyAyersMS
Member Author

The notes over in #13359 suggest we should try to keep source files under 400K of text, so perhaps further splitting makes sense (flowgraph.cpp went from ~980K -> 750K with the above).

Maybe 2-4 more spinoff files would be needed to accomplish this? So some sort of grouping of the above.

I'll create a tool to estimate and split up the code and come back with an updated proposal.
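
As a rough check on the "2-4 more spinoff files" estimate, the arithmetic can be sketched as below. This is only an illustration: it assumes "K" means thousands of characters, uses the approximate 750K and 400K figures quoted above, and the function names are made up for this note.

```python
import math

BUDGET_K = 400   # suggested per-file ceiling mentioned in #13359
CURRENT_K = 750  # approximate flowgraph.cpp size after the first split


def excess_over_budget(current_k, budget_k):
    """Amount of text that must leave the file to get it under budget."""
    return max(0, current_k - budget_k)


def min_spinoffs(current_k, budget_k):
    """Fewest new files needed if each new file may be filled to the budget."""
    return math.ceil(excess_over_budget(current_k, budget_k) / budget_k)


def average_spinoff_size(current_k, budget_k, spinoffs):
    """Average size of each new file if the excess is spread evenly."""
    return excess_over_budget(current_k, budget_k) / spinoffs


if __name__ == "__main__":
    print("excess:", excess_over_budget(CURRENT_K, BUDGET_K))
    print("minimum spinoffs:", min_spinoffs(CURRENT_K, BUDGET_K))
    for n in (2, 3, 4):
        print(n, "spinoffs ->", average_spinoff_size(CURRENT_K, BUDGET_K, n), "K each")
```

Under these assumptions one spinoff would be enough to meet the budget on size alone. Spreading the excess over 2-4 files instead gives files of roughly 90K to 175K each, which leaves room for grouping by purpose.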

@AndyAyersMS
Member Author

Ok, tool is more or less working. Detailed analysis here: https://gist.github.com/AndyAyersMS/5aee6165808e123f1c97f19f30858837

Summary

[Construction]: 15 items,  2952 lines
[Profiling]: 13 items,  1622 lines
[Maintenance]: 27 items,  2460 lines
[Statements]: 16 items,  508 lines
[Predecessors]: 13 items,  936 lines
[Analysis]: 19 items,  1112 lines
[Optimization]: 18 items,  4263 lines
[Successors]: 7 items,  288 lines
[Queries]: 8 items,  169 lines
[GCPolls]: 3 items,  482 lines
[Inlining]: 16 items,  2161 lines
[Phases]: 6 items,  745 lines
[HotColdFunclet]: 8 items,  534 lines
[Loops]: 3 items,  178 lines
[SyncPinvoke]: 6 items,  860 lines
[Returns]: 2 items,  534 lines
[EH]: 7 items,  560 lines
[Dump]: 18 items,  1494 lines
[Check]: 18 items,  1508 lines
[LateEHFlow]: 4 items,  280 lines
[Lclvar]: 1 items,  9 lines
[EHOpt]: 13 items,  2121 lines

------ things that belong elsewhere  ---?

[Gentree]: 17 items,  1007 lines
[Importer]: 3 items,  340 lines
[Compiler]: 3 items,  99 lines
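
Rows in the summary above follow a regular `[Name]: N items, M lines` shape, so they are easy to total mechanically. The sketch below is not the tool from the gist. It is a minimal parser written for this note, with hypothetical names (`parse_summary`, `total_lines`) and a shortened sample.

```python
import re

# Matches rows such as "[Construction]: 15 items,  2952 lines".
SUMMARY_ROW = re.compile(
    r"^\[(?P<name>\w+)\]:\s*(?P<items>\d+)\s+items,\s*(?P<lines>\d+)\s+lines$"
)


def parse_summary(text):
    """Return {category: (items, lines)}; rows that do not match are skipped."""
    summary = {}
    for raw in text.splitlines():
        match = SUMMARY_ROW.match(raw.strip())
        if match:
            summary[match.group("name")] = (
                int(match.group("items")),
                int(match.group("lines")),
            )
    return summary


def total_lines(summary):
    """Sum the line counts over all parsed categories."""
    return sum(lines for _items, lines in summary.values())


# Shortened sample copied from the summary; the separator row is ignored.
SAMPLE = """\
[Construction]: 15 items,  2952 lines
[Profiling]: 13 items,  1622 lines
[Lclvar]: 1 items,  9 lines
------ things that belong elsewhere  ---?
[Gentree]: 17 items,  1007 lines
"""

if __name__ == "__main__":
    parsed = parse_summary(SAMPLE)
    print(parsed)
    print(total_lines(parsed))
```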

@AndyAyersMS
Member Author

Here's one take on how to split the above. It would create 7 new files, none bigger than 5K lines or so. I would not move the gentree-related stuff since gentree is itself already 19K lines and likely should also be split up.

We can iterate on this; I'll use the tool to do the actual splitting. Will post an updated PR with tool generated splits shortly.

Feel free to suggest alternate groupings, better names, etc.

----------- fgBasic.cpp (new)

[Construction]: 15 items,  2952 lines
[Maintenance]: 27 items,  2460 lines

------------ fgOpt.cpp (new)

[Analysis]: 19 items,  1112 lines
[Optimization]: 18 items,  4263 lines

----------- fgInfo.cpp (new)

[Statements]: 16 items,  508 lines
[Predecessors]: 13 items,  936 lines
[Successors]: 7 items,  239 lines
[Queries]: 8 items,  228 lines
[Gentree]: 17 items,  1007 lines

----------- fgInline.cpp (new)

[Inlining]: 16 items,  2161 lines

----------- fgProfile.cpp (new)

[Profiling]: 13 items,  1622 lines

----------- flowgraph.cpp

[Compiler]: 3 items,  99 lines
[GCPolls]: 3 items,  482 lines
[Phases]: 6 items,  745 lines
[HotColdFunclet]: 8 items,  534 lines
[Importer]: 3 items,  340 lines
[Loops]: 3 items,  178 lines
[SyncPinvoke]: 6 items,  860 lines
[Returns]: 2 items,  534 lines
[LateEHFlow]: 4 items,  280 lines
[Lclvar]: 1 items,  9 lines

------------ jiteh.cpp (already 4K lines)

[EH]: 7 items,  560 lines

------------ jitehOpt.cpp (new)

[EHOpt]: 13 items,  2121 lines

------------- fgCheckDump.cpp (new)

[Dump]: 18 items,  1494 lines
[Check]: 18 items,  1508 lines
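
To see how the proposal above compares with the "none bigger than 5K lines or so" goal, the per-file totals can be summed from the listed categories. This is a sketch for illustration only. The table is transcribed from the comment for the seven new files, and the helper names are hypothetical.

```python
# New file -> [(category, lines)], transcribed from the proposal above.
PROPOSAL = {
    "fgBasic.cpp": [("Construction", 2952), ("Maintenance", 2460)],
    "fgOpt.cpp": [("Analysis", 1112), ("Optimization", 4263)],
    "fgInfo.cpp": [
        ("Statements", 508),
        ("Predecessors", 936),
        ("Successors", 239),
        ("Queries", 228),
        ("Gentree", 1007),
    ],
    "fgInline.cpp": [("Inlining", 2161)],
    "fgProfile.cpp": [("Profiling", 1622)],
    "jitehOpt.cpp": [("EHOpt", 2121)],
    "fgCheckDump.cpp": [("Dump", 1494), ("Check", 1508)],
}


def file_totals(proposal):
    """Return {file: total lines} by summing each file's categories."""
    return {
        name: sum(lines for _category, lines in groups)
        for name, groups in proposal.items()
    }


def largest(totals):
    """Return the (file, lines) pair with the highest line count."""
    return max(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    totals = file_totals(PROPOSAL)
    for name, lines in sorted(totals.items(), key=lambda item: -item[1]):
        print(f"{lines:6d}  {name}")
    print("largest:", largest(totals))
```

By these numbers the two largest new files, fgBasic.cpp and fgOpt.cpp, come out a little over 5,000 lines each, consistent with "5K lines or so".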

@AndyAyersMS
Member Author

Hmm, looks like there is a bit more coupling around the pushedStack class than I'd like, and so parts of the inlining support have to be bundled with fgBasic for now, around 200 lines or so....

@AndyAyersMS
Member Author

File sizes with second iteration:

01/22/2021  08:54 AM           204,870 fgBasic.cpp
01/22/2021  08:50 AM           100,421 fgCheckDump.cpp
01/22/2021  08:49 AM            78,894 fgEhOpt.cpp
01/22/2021  08:49 AM            94,226 fgInfo.cpp
01/22/2021  08:49 AM            77,816 fgInline.cpp
01/22/2021  08:49 AM           193,702 fgOpt.cpp
01/22/2021  08:49 AM            60,726 fgProfile.cpp
01/22/2021  08:49 AM           148,600 flowgraph.cpp
01/21/2021  06:51 PM           183,668 jiteh.cpp

@AndyAyersMS
Member Author

Also note all the methods are in the same order in their respective files as they were before; I could (with some extra work) group methods by category, but would have to update how my rewriting tool tracks ifdef state.

Gist updated with new classification, and a copy of the splitting program.
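
Keeping methods "in the same order in their respective files as they were before" amounts to a stable partition. The sketch below illustrates that idea only and is not the splitting program from the gist. The method names appear in the lists earlier in the thread, but their relative order here and the helper name `split_preserving_order` are assumptions.

```python
from collections import defaultdict


def split_preserving_order(methods, file_for):
    """Partition methods into per-file lists without reordering within a file."""
    files = defaultdict(list)
    for method in methods:
        files[file_for[method]].append(method)
    return dict(files)


# Hypothetical original order in flowgraph.cpp (illustrative only).
ORIGINAL_ORDER = ["fgInit", "fgDominate", "fgNewBasicBlock", "fgReachable", "fgInline"]

# Target files follow the grouping proposed earlier in the thread.
FILE_FOR = {
    "fgInit": "fgBasic.cpp",           # construction
    "fgNewBasicBlock": "fgBasic.cpp",  # construction
    "fgDominate": "fgOpt.cpp",         # analysis
    "fgReachable": "fgOpt.cpp",        # analysis
    "fgInline": "fgInline.cpp",        # inlining
}

if __name__ == "__main__":
    for name, members in split_preserving_order(ORIGINAL_ORDER, FILE_FOR).items():
        print(name, members)
```

Grouping methods by category within each file would be a further sort on top of this partition, which is the extra work mentioned above.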

@sandreenko
Contributor

Also note all the methods are in the same order in their respective files as they were before; I could (with some extra work) group methods by category, but would have to update how my rewriting tool tracks ifdef state.
Gist updated with new classification, and a copy of the splitting program.

Should it be "if (text[j].StartsWith("#if"))" to track both "if defined" and "ifdef"?

In general, I like the current splitting.

@AndyAyersMS
Member Author

Should it be "if (text[j].StartsWith("#if"))" to track both "if defined" and "ifdef"?

Could be, yes ... at this point the automated split is close enough that I'm doing manual repairs.
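
The point about the `#if` prefix can be illustrated with a small depth tracker. The tool under discussion is C#. The sketch below is a separate illustration that assumes the narrower check was a prefix such as `#ifdef`. The macro and function names in the sample are hypothetical.

```python
def nesting_depths(lines, open_prefix="#if"):
    """Return the conditional-compilation depth after each line.

    open_prefix marks the start of a conditional block. "#if" also matches
    "#ifdef" and "#ifndef", whereas "#ifdef" alone misses "#if defined(...)".
    Lines starting with "#else" or "#elif" leave the depth unchanged.
    """
    depth = 0
    depths = []
    for line in lines:
        directive = line.lstrip()
        if directive.startswith("#endif"):
            depth -= 1
        elif directive.startswith(open_prefix):
            depth += 1
        depths.append(depth)
    return depths


SOURCE = [
    "#ifdef DEBUG",
    "void fgDebugHelper();",      # hypothetical declaration
    "#endif",
    "#if defined(SOME_FEATURE)",  # hypothetical macro
    "void fgFeatureHelper();",    # hypothetical declaration
    "#endif",
]

if __name__ == "__main__":
    print(nesting_depths(SOURCE))            # broad prefix: balanced
    print(nesting_depths(SOURCE, "#ifdef"))  # narrow prefix: ends unbalanced
```

With the narrow prefix the second block's `#endif` is counted but its opener is not, so the depth goes negative. A splitter relying on it could cut a method out from inside a conditional region.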

@AndyAyersMS
Member Author

Linker test failures are quite likely unrelated.

Think this is more or less ready for review. cc @dotnet/jit-contrib

@BruceForstall
Member

Misc. comments:

  • filenames need to be all lower case. Mixing case is just asking for trouble/confusion/pain
  • Now "flowgraph" is kind of a grab-bag. I guess that's ok.
  • Do fginfo and fgbasic need to be separate? And there are some functions in "info" that modify, not just query, as the name might imply
  • "fgcheckdump" seems like an awkward name.. maybe? Maybe fgdebug? doesn't really have the right implication perhaps.

It's a little hard to review something like this. I really like the split. We'll probably realize more improvements that could be made after living with it.

@AndyAyersMS
Member Author

For file names, how about:

-  fgBasic.cpp
-  fgCheckDump.cpp
-  fgEhOpt.cpp
-  fgInfo.cpp
-  fgInline.cpp
-  fgOpt.cpp
-  fgProfile.cpp
+  fgbasic.cpp
+  fgdiagnostic.cpp
+  fgehopt.cpp
+  fginfo.cpp
+  fginline.cpp
+  fgopt.cpp
+  fgprofile.cpp

Do fginfo and fgbasic need to be separate?

I think some sort of split here is a good idea; otherwise the merged file is ~300K bytes. If there are specific "info" methods you don't think belong, feel free to point them out.
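
The renaming proposed above is a plain lower-casing of each name, with one explicit rename (fgCheckDump.cpp to fgdiagnostic.cpp). A minimal sketch of that mapping follows, together with a check for names that differ only by case, which is the kind of trouble mixed-case file names can cause on case-insensitive file systems. The helper names are hypothetical.

```python
from collections import defaultdict

# The one rename in the proposal that is not a plain lower-casing.
EXPLICIT_RENAMES = {"fgCheckDump.cpp": "fgdiagnostic.cpp"}

OLD_NAMES = [
    "fgBasic.cpp",
    "fgCheckDump.cpp",
    "fgEhOpt.cpp",
    "fgInfo.cpp",
    "fgInline.cpp",
    "fgOpt.cpp",
    "fgProfile.cpp",
]


def new_name(old):
    """Apply an explicit rename if one exists, else lower-case the name."""
    return EXPLICIT_RENAMES.get(old, old.lower())


def case_collisions(names):
    """Return {lowercased name: [originals]} for names differing only by case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    for old in OLD_NAMES:
        print(old, "->", new_name(old))
    print(case_collisions(["fgOpt.cpp", "fgopt.cpp", "fginfo.cpp"]))
```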

@AndyAyersMS
Member Author

@dotnet/jit-contrib ping

@BruceForstall
Member

If there are specific "info" methods you don't think belong, feel free to point them out.

What is your "mental model" for what goes in basic vs. info?

By the name, I would assume basic includes functions that construct and modify the flow graph, including preds. But "info" contains a bunch of fgInsert* and fgNew* functions, creation of cheap preds and normal preds. Some things that seem like "info" would be fgGetNestingLevel, fgIsThrow, fgIsIndirOfAddrOfLocal, fgAddrCouldBeNull (seems like this should be on GenTree), fgIsBetterFallThrough.

There aren't many "pure" functions, which is what I was thinking "info" would contain.

@AndyAyersMS
Member Author

Currently info contains these categories:

            case Category.Statements:
            case Category.Predecessors:
            case Category.Successors:
            case Category.Queries:
            case Category.Gentree:
                return Files.Info;

I see your point; arguably only Queries qualifies as info, but I also don't think all this should be lumped into basic.

Perhaps there's some other more fitting name for this group?

@BruceForstall
Member

I would put Predecessors/Successors/Queries in basic, and leave Statements/Gentree in flowgraph.cpp.

Or, put Predecessors/Successors in a new "fgflow.cpp", Queries in basic.cpp, and Statements/Gentree in flowgraph.cpp. (Maybe that's just punting on flowgraph.cpp because it's hard to think of a sufficiently large subset with tight cohesion?). I suppose there could be fgstmt.cpp -- if ~500 lines is "big enough" to warrant its own file.

I think there's a risk of "too much" splitting unless there is strong cohesion to the split pieces.

Comments?

@AndyAyersMS
Member Author

I don't mind splitting more, which is why I originally had the finer-grained categories. So how about:

Pred/Succ   -> fgflow
Queries     -> fgbasic
Statements  -> fgstmt
Gentree     -> flowgraph

@BruceForstall
Member

That looks good to me.

@AndyAyersMS
Member Author

That would give us

01/25/2021  12:48 PM           213,136 fgbasic.cpp
01/25/2021  12:48 PM           100,463 fgdiagnostic.cpp
01/25/2021  12:48 PM            78,894 fgehopt.cpp
01/25/2021  12:48 PM            40,692 fgflow.cpp
01/25/2021  12:48 PM            77,816 fginline.cpp
01/25/2021  12:48 PM           193,702 fgopt.cpp
01/25/2021  12:48 PM            60,726 fgprofile.cpp
01/25/2021  12:48 PM            16,880 fgstmt.cpp
01/25/2021  12:48 PM           177,938 flowgraph.cpp
01/25/2021  12:48 PM            23,015 jiteh.cpp (bytes to be added to current jiteh.cpp, not full size)
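
The listing above can be checked against the roughly 400K-per-file guideline mentioned earlier in the thread. The sketch below assumes "400K" means 400,000 bytes and parses rows in the `dir`-style layout shown. The regular expression and helper names are written for this note and are not part of any tool from the PR.

```python
import re

# Date, time, AM/PM, comma-grouped size, then the file name; trailing notes are ignored.
ROW = re.compile(
    r"^\d{2}/\d{2}/\d{4}\s+\d{2}:\d{2}\s+[AP]M\s+(?P<size>[\d,]+)\s+(?P<name>\S+)"
)

BUDGET_BYTES = 400_000  # assumption: "400K of text" read as 400,000 bytes

LISTING = """\
01/25/2021  12:48 PM           213,136 fgbasic.cpp
01/25/2021  12:48 PM           100,463 fgdiagnostic.cpp
01/25/2021  12:48 PM            78,894 fgehopt.cpp
01/25/2021  12:48 PM            40,692 fgflow.cpp
01/25/2021  12:48 PM            77,816 fginline.cpp
01/25/2021  12:48 PM           193,702 fgopt.cpp
01/25/2021  12:48 PM            60,726 fgprofile.cpp
01/25/2021  12:48 PM            16,880 fgstmt.cpp
01/25/2021  12:48 PM           177,938 flowgraph.cpp
01/25/2021  12:48 PM            23,015 jiteh.cpp (bytes to be added to current jiteh.cpp, not full size)
"""


def parse_listing(text):
    """Return {file name: size in bytes} for every row that matches ROW."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sizes[match.group("name")] = int(match.group("size").replace(",", ""))
    return sizes


def over_budget(sizes, budget):
    """Return the sorted names of files whose size exceeds the budget."""
    return sorted(name for name, size in sizes.items() if size > budget)


if __name__ == "__main__":
    sizes = parse_listing(LISTING)
    print(max(sizes.items(), key=lambda item: item[1]))
    print(over_budget(sizes, BUDGET_BYTES))
```

Note that the jiteh.cpp row is only the added amount, so its full size has to be checked separately.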

@AndyAyersMS
Member Author

Had to move helper method inline bool OperIsControlFlow(genTreeOps oper) into fgstmt, but otherwise as above.

@AndyAyersMS
Member Author

Need to fix an ifdef...

Member

@BruceForstall BruceForstall left a comment

Ship it!

@AndyAyersMS
Member Author

@dotnet/jit-contrib here's one last chance to weigh in on this ...

Contributor

@sandreenko sandreenko left a comment

LGTM!

@AndyAyersMS AndyAyersMS merged commit a1f137e into dotnet:master Jan 26, 2021
@AndyAyersMS AndyAyersMS mentioned this pull request Jan 28, 2021
54 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2021
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
5 participants