Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Fix profiler tail call insertion logic for FIELD_LIST #76883

Merged
merged 6 commits into from
Oct 13, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3069,9 +3069,6 @@ void CodeGen::genCall(GenTreeCall* call)
CallArgABIInformation& abiInfo = arg.AbiInfo;
GenTree* argNode = arg.GetLateNode();

// GT_RELOAD/GT_COPY use the child node
argNode = argNode->gtSkipReloadOrCopy();
Comment on lines -3072 to -3073
Copy link
Member Author

@jakobbotsch jakobbotsch Oct 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing/skipping these here would be illegal IIUC. SPMI replay does not find any occurrences of this, and codegenxarch.cpp also does not have the equivalent here.


if (abiInfo.GetRegNum() == REG_STK)
continue;

Expand Down
112 changes: 86 additions & 26 deletions src/coreclr/jit/lower.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1711,45 +1711,105 @@ void Lowering::InsertProfTailCallHook(GenTreeCall* call, GenTree* insertionPoint

if (insertionPoint == nullptr)
{
for (CallArg& arg : call->gtArgs.EarlyArgs())
{
assert(!arg.GetEarlyNode()->OperIs(GT_PUTARG_REG)); // We don't expect to see these in early args

if (arg.GetEarlyNode()->OperIs(GT_PUTARG_STK))
{
// found it
insertionPoint = arg.GetEarlyNode();
break;
}
}
insertionPoint = FindEarliestPutArg(call);

if (insertionPoint == nullptr)
{
for (CallArg& arg : call->gtArgs.LateArgs())
{
if (arg.GetLateNode()->OperIs(GT_PUTARG_REG, GT_PUTARG_STK))
{
// found it
insertionPoint = arg.GetLateNode();
break;
}
}

// If there are no args, insert before the call node
if (insertionPoint == nullptr)
{
insertionPoint = call;
}
insertionPoint = call;
}
}

#endif // !defined(TARGET_X86)

assert(insertionPoint != nullptr);
JITDUMP("Inserting profiler tail call before [%06u]\n", comp->dspTreeID(insertionPoint));

GenTree* profHookNode = new (comp, GT_PROF_HOOK) GenTree(GT_PROF_HOOK, TYP_VOID);
BlockRange().InsertBefore(insertionPoint, profHookNode);
}

//------------------------------------------------------------------------
// FindEarliestPutArg: Find the earliest direct PUTARG operand of a call node in
// linear order.
//
// Arguments:
// call - the call
//
// Returns:
// A PUTARG_* node that is the earliest of the call, or nullptr if the call
// has no arguments.
//
GenTree* Lowering::FindEarliestPutArg(GenTreeCall* call)
{
size_t numMarkedNodes = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a 2-pass system? Is it the case that we have a required ordering where the putarg under the field_list are in order, such that we can (1) look for the first field_list and then (2) look for the first putarg under that, in list order?

I guess your solution seems more robust and doesn't require strict list ordering, but does require more traversals.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I'd prefer that we don't make any assumptions on linear order vs operand order in LIR after rationalization. For example, it might be reasonable in the future to allow some trivial scheduling/reordering of the PUTARG_REG order in lowering. One place today we are moving these nodes already is when control-flow guard is enabled (MoveCFGCallArg -- but I think this ends up keeping the nodes in order).

Copy link
Contributor

@SingleAccretion SingleAccretion Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One place where the operand order does not match the execution order is on x86, where the use list is reversed.

Copy link
Member Author

@jakobbotsch jakobbotsch Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really true? I think the order matches execution order, but the arguments are pushed in that order so they end up on stack in opposite order of the other platforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure?

// The code generator will push these fields in reverse order by offset. Reorder the list here s.t. the order
// of uses is visible to LSRA.
assert(fieldList->Uses().IsSorted());
fieldList->Uses().Reverse();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, for the FIELD_LIST operands. Ok, interesting, was not aware of this. This logic skips x86 anyway.

for (CallArg& arg : call->gtArgs.Args())
{
if (arg.GetEarlyNode() != nullptr)
{
numMarkedNodes += MarkPutArgNodes(arg.GetEarlyNode());
}

if (arg.GetLateNode() != nullptr)
{
numMarkedNodes += MarkPutArgNodes(arg.GetLateNode());
}
}

if (numMarkedNodes <= 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: size_t is unsigned, so < is unnecessary (maybe confusing? Or just "defense in depth"?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a habit of mine. I always write this check like this.

{
return nullptr;
}

GenTree* node = call;
do
{
node = node->gtPrev;

assert((node != nullptr) && "Reached beginning of basic block while looking for marked nodes");

if ((node->gtLIRFlags & LIR::Flags::Mark) != 0)
{
node->gtLIRFlags &= ~LIR::Flags::Mark;
numMarkedNodes--;
}
} while (numMarkedNodes > 0);

assert(node->OperIsPutArg());
return node;
}

//------------------------------------------------------------------------
// MarkPutArgNodes: Mark all direct operand PUTARG nodes with a LIR mark.
//
// Arguments:
// node - the node (either a field list or PUTARG node)
//
// Returns:
// The number of marks added.
//
size_t Lowering::MarkPutArgNodes(GenTree* node)
{
assert(node->OperIsPutArg() || node->OperIsFieldList());

size_t result = 0;
if (node->OperIsFieldList())
{
for (GenTreeFieldList::Use& operand : node->AsFieldList()->Uses())
{
assert(operand.GetNode()->OperIsPutArg());
result += MarkPutArgNodes(operand.GetNode());
}
}
else
{
assert((node->gtLIRFlags & LIR::Flags::Mark) == 0);
node->gtLIRFlags |= LIR::Flags::Mark;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be appropriate to add:

assert((node->gtLIRFlags & LIR::Flags::Mark) == 0);

I.e., how do we know the LIR doesn't have any Mark set at this point of compilation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be appropriate, I can add the assert.

I.e., how do we know the LIR doesn't have any Mark set at this point of compilation?

Same assumption is made by LIR::Range::GetTreeRange that we are using from lowering already. This code is somewhat like LIR::Range::GetMarkedRange.

result++;
}

return result;
}

//------------------------------------------------------------------------
// LowerFastTailCall: Lower a call node dispatched as a fast tailcall (epilog +
// jmp).
Expand Down
2 changes: 2 additions & 0 deletions src/coreclr/jit/lower.h
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,8 @@ class Lowering final : public Phase
GenTree* lookForUsesStart,
GenTreeCall* callNode);
void InsertProfTailCallHook(GenTreeCall* callNode, GenTree* insertionPoint);
GenTree* FindEarliestPutArg(GenTreeCall* call);
size_t MarkPutArgNodes(GenTree* node);
GenTree* LowerVirtualVtableCall(GenTreeCall* call);
GenTree* LowerVirtualStubCall(GenTreeCall* call);
void LowerArgsForCall(GenTreeCall* call);
Expand Down