Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] JIT: Testing field-wise analysis #105162

Closed
wants to merge 72 commits into from

Conversation

hez2010
Copy link
Contributor

@hez2010 hez2010 commented Jul 19, 2024

Built on-top-of the array analysis. I have been trying to make the analysis conservative, still open a PR to check if anything is broken. I expect many AVs can happen in the current stage.

/cc: @AndyAyersMS

hez2010 and others added 30 commits July 15, 2024 20:23
These flags are strictly optimizations. Having them required to be set
for certain indirs based on context of the operand introduces IR
invariants that are annoying to work with since it suddenly becomes
necessary for all transformations to consider "did we just introduce
this IR shape?". Instead, handle these patterns during morph as the
optimization it is and remove the strict flags checking from
`fgDebugCheckFlags`.
@hez2010
Copy link
Contributor Author

hez2010 commented Jul 20, 2024

The current field-wise analysis is too conservative and is making a lot of false negatives where it stack-allocates less than before the field-wise analysis was added. The analysis needs more work to unblock more cases.

@hez2010
Copy link
Contributor Author

hez2010 commented Jul 21, 2024

Added back the array for testing as it needs INDEX_ADDR to work correctly.
Also altered the delegate support a bit.

@hez2010
Copy link
Contributor Author

hez2010 commented Jul 21, 2024

Closing as the analysis still need more work.

@AndyAyersMS
Copy link
Member

Can you summarize what you tried and what roadblocks you ran into?

@hez2010
Copy link
Contributor Author

hez2010 commented Jul 22, 2024

The analysis in my prototype tried to recognize the following cases:

Case 1:

* STOREIND/STORE_BLK
+--* GT_FIELD_ADDR
|  \--* AnyLocal VTarget
\--* AnyLocal VSource

Then it extracts the VTarget and see if it is the this pointer.

  • If yes, it marks the VSource as escaping.
  • If not, it adds an edge to the connection graph g[target] = source, and continue to check the parent tree.

Case 2:

* STOREIND/STORE_BLK
+--* AnyLocal VTarget
\--* AnyLocal VSource

In this case, it extracts the VTarget and see if it is the this pointer or a byref parameter.

  • If yes, it marks the VSource as escaping.
  • If not, it adds an edge to the connection graph g[target] = source, and continue to check the parent tree.

What's more, now a local may escape through BLK and IND as we start to analyze address, so make it keep check the parent if we see BLK or IND.

However, actually this cannot handle the address correctly, as we may dup a local to another local directly lie:

* LCL_STORE
+--* LCL_VAR V02
\--* LCL_VAR V01

Consider this code:

var foo = new Foo(new Bar());
return foo.B;

class Foo
{
    public Bar B { get; set; }
    public Foo(Bar? b)
    {
        B = b ?? new Bar();
    }
}
class Bar { }

which produces the following GenTree:

***** BB01 [0000]
STMT00000 ( 0x000[E-] ... 0x00A )
               [000002] DA---------                         *  STORE_LCL_VAR ref    V02 tmp1
               [000001] -----------                         \--*  ALLOCOBJ  ref
               [000000] H----------                            \--*  CNS_INT(h) long   0x7ffea4f2dc20 class Program+Bar

***** BB01 [0000]
STMT00002 ( 0x005[--] ... ??? )
               [000008] DA---------                         *  STORE_LCL_VAR ref    V03 tmp2
               [000007] -----------                         \--*  ALLOCOBJ  ref
               [000006] H----------                            \--*  CNS_INT(h) long   0x7ffea4f610f8 class Program+Foo

------------ BB02 [0003] [000..000) -> BB04(0.5),BB03(0.5) (cond), preds={BB01} succs={BB03,BB04}

***** BB02 [0003]
STMT00010 ( INL03 @ ??? ... ??? ) <- INLRT @ ???
               [000026] DA---------                         *  STORE_LCL_VAR ref    V04 tmp3
               [000021] -----------                         \--*  LCL_VAR   ref    V03 tmp2

***** BB02 [0003]
STMT00011 ( INL03 @ ??? ... ??? ) <- INLRT @ ???
               [000028] DA---------                         *  STORE_LCL_VAR ref    V05 tmp4
               [000005] -----------                         \--*  LCL_VAR   ref    V02 tmp1

***** BB02 [0003]
STMT00009 ( INL03 @ 0x006[E-] ... ??? ) <- INLRT @ ???
               [000025] -----------                         *  JTRUE     void
               [000024] -----------                         \--*  NE        int
               [000022] -----------                            +--*  LCL_VAR   ref    V02 tmp1
               [000023] -----------                            \--*  CNS_INT   ref    null

------------ BB03 [0004] [000..000) -> BB04(1) (always), preds={BB02} succs={BB04}

***** BB03 [0004]
STMT00013 ( INL03 @ ??? ... ??? ) <- INLRT @ ???
               [000037] DA---------                         *  STORE_LCL_VAR ref    V06 tmp5
               [000036] -----------                         \--*  ALLOCOBJ  ref
               [000035] H----------                            \--*  CNS_INT(h) long   0x7ffea4f2dc20 class Program+Bar

***** BB03 [0004]
STMT00015 ( INL03 @ ??? ... ??? ) <- INLRT @ ???
               [000043] DA---------                         *  STORE_LCL_VAR ref    V05 tmp4
               [000040] -----------                         \--*  LCL_VAR   ref    V06 tmp5

------------ BB04 [0005] [000..000) -> BB05(1) (always), preds={BB02,BB03} succs={BB05}

***** BB04 [0005]
STMT00017 ( INL07 @ 0x000[E-] ... ??? ) <- INL03 @ ??? <- INLRT @ ???
               [000051] hA-XG------                         *  STOREIND  ref
               [000050] ---X-------                         +--*  FIELD_ADDR byref  Program+Foo:<B>k__BackingField
               [000032] -----------                         |  \--*  LCL_VAR   ref    V04 tmp3
               [000033] -----------                         \--*  LCL_VAR   ref    V05 tmp4

------------ BB05 [0006] [00A..012) (return), preds={BB04} succs={}

***** BB05 [0006]
STMT00004 ( 0x00A[--] ... ??? )
               [000012] DA---------                         *  STORE_LCL_VAR ref    V00 loc0
               [000011] -----------                         \--*  LCL_VAR   ref    V03 tmp2

***** BB05 [0006]
STMT00006 ( 0x00B[E-] ... ??? )
               [000016] --C--------                         *  RETURN    ref
               [000054] n--XG------                         \--*  IND       ref
               [000053] ---X-------                            \--*  FIELD_ADDR byref  Program+Foo:<B>k__BackingField
               [000013] -----------                               \--*  LCL_VAR   ref    V00 loc0

We added an edge from V02 to V05, from V06 to V05 and V03 to V04, and due to the fact the V05 is assigned to a field of V04, we added an edge from V05 to V04.

In the end the V03 escapes through V00 (this), but we failed to track it (because in this case we didn't consider V04 to escape), which results in incorrect analysis and stack allocate all those locals.

To address this, I conservatively add a bidirectional edge from both source to target and target to source when we see LCL_STORE. This covers most cases we were ignored, but it's too conservative and results in less objects to be able to stack allocate than before. To make it less conservative, I run another pass after we build the whole graph, that is, to make the edge bidirectional only for locals which have been stored into via indirect access before (I used m_IndirectRefStoredPointers to track this). This gives us the additional information like this:

V04 may point to objects that V03 points to
V04 may point to objects that V05 points to

While the current analysis still has correctness issue. Actually, the analysis needs to follow the Anderson's algorithm, and we need to track it following the below 4 rules:

Here introducing a PointsTo(x) function which indicates all nodes pointing to x.

  • x = &y: $y \in \text{PointsTo}(x)$
  • x = y: $\text{PointsTo}(y) \subseteq \text{PointsTo}(x)$
  • x = *y: $\forall a \in \text{PointsTo}(y). \text{PointsTo}(a) \subseteq \text{PointsTo}(x)$
  • *x = y: $\forall a \in \text{PointsTo}(x). \text{PointsTo}(y) \subseteq \text{PointsTo}(a)$

So apparently the prototype I built is both too conservative and incomplete (we need to treat LCL_VAR and LCL_ADDR differently). While doing an operation like $\forall a \in \text{PointsTo}(y). \text{PointsTo}(a) \subseteq \text{PointsTo}(x)$ in the analysis is a bit complicated, as we need the whole graph but we are yet reaching that point. And it seems the complexity of this algorithm will not be in linear time, either. I start to think about maybe we can use a union-find set to merge nodes for the analysis and seems that that's also a technique used by Steensgaard's algorithm.

Finally, I decide to stop the current prototype and move to Steensgaard's algorithm, which is fast as it's an algorithm in linear time. I'm running out of bandwidth so I'm closing this PR. I may revisit it to try the new algorithm if I have more time in the future.

Also, spilling all fields to their own temps may simplify the analysis in my opinion, though I'm not confident about this.

@AndyAyersMS
Copy link
Member

Thanks for the detailed notes. Not too surprising that this sort of analysis requires careful thinking.

If we end up needing to implement union-find as part of a revised approach, it would be good to think about how to make it into something generally useful.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants