Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for stress failure when adjusting effective IP while stackwalking may put it on a wrong instruction. #100376

Merged
merged 3 commits into from
Mar 30, 2024

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented Mar 27, 2024

Fixes: #86273

The change makes sure that an instruction after a throwing call cannot be branched to (and have different GC liveness.
In many cases that was already guaranteed, just picking up remaining cases.

This allows not to do -1 adjustment of the effective IP when stackwalking through exceptional/aborting frames. The root cause of #86273 was that this adjustment is not safe under some conditions.

One solution would be to:

  1. detect those conditions (which appear to be nontrivial, and we may not know all the cases), or
  2. make the adjustment unnecessary regardless of the context.

This change does the #2

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 27, 2024
@VSadov
Copy link
Member Author

VSadov commented Mar 27, 2024

/azp run runtime-coreclr gcstress-extra, runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@VSadov
Copy link
Member Author

VSadov commented Mar 28, 2024

The few failures that were observed in the stress runs are not new, so I consider this a pass.

@VSadov VSadov marked this pull request as ready for review March 28, 2024 20:37
@VSadov VSadov requested a review from AndyAyersMS March 28, 2024 20:37
@VSadov
Copy link
Member Author

VSadov commented Mar 28, 2024

CC. @jkotas @jakobbotsch

@VSadov
Copy link
Member Author

VSadov commented Mar 28, 2024

I think this is ready for a review.

@VSadov
Copy link
Member Author

VSadov commented Mar 28, 2024

I've just realized that we can now stop doing this adjustment on x86 and on NativeAOT as well.
And then can get rid of ICodeManagerFlags::AbortingCall entirely.

So, there is another commit. It is not strictly necessary for the fix, just some cleanup that we can do.

((call->AsCall()->gtCallType == CT_HELPER) &&
Compiler::s_helperCallProperties.AlwaysThrow(call->AsCall()->GetHelperNum())))
{
// NOTE: We should probably never see a BBJ_ALWAYS block ending with a throw in a first place.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you did see such cases? Can you open an issues for us to follow-up and switch them over to BBJ_THROWs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this might largely clean itself up by adding the logic to IsNoReturn, though there is one place in morph where we still check the flag rather than the property.

Copy link
Member Author

@VSadov VSadov Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it you did see such cases?

Yes, I wanted this to be an assert, so it would not happen by accident in the future. To my surprise, the assert was triggered by existing code.

@@ -712,7 +712,9 @@ void CodeGen::genCodeForBBlist()

if ((call != nullptr) && (call->gtOper == GT_CALL))
{
if ((call->AsCall()->gtCallMoreFlags & GTF_CALL_M_DOES_NOT_RETURN) != 0)
if ((call->AsCall()->gtCallMoreFlags & GTF_CALL_M_DOES_NOT_RETURN) != 0 ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend GenTreeCall::IsNoReturn() to cover the helper case too? Either by setting the no return flag when the helper is set, or by putting this check there?

Copy link
Member Author

@VSadov VSadov Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried, naively, to put

    if (s_helperCallProperties.AlwaysThrow((CorInfoHelpFunc)helper))
    {
        result->gtCallMoreFlags |= GTF_CALL_M_DOES_NOT_RETURN;
    }

in gtNewHelperCallNode

, but that resulted in

  Assertion failed 'call->gtCallType == CT_USER_FUNC' in 'System.ModuleHandle:ResolveFieldHandle(int,System.RuntimeTypeHandle[],System.Run
  timeTypeHandle[]):System.RuntimeFieldHandle:this' during 'Merge throw blocks' (IL size 283; hash 0x040825d3; FullOpts)
        if (!call->IsNoReturn())
        {
            continue;
        }

        // Sanity check -- only user funcs should be marked do not return
        assert(call->gtCallType == CT_USER_FUNC);

I could not easily tell if the assert is just an observation, that could be relaxed or indeed guarding some assumptions made in the code, so I did not continue on that path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commenting out that assert leads to others:

  Assert failure(PID 35472 [0x00008a90], Thread: 41468 [0xa1fc]): Assertion failed 'updateCount < optNoReturnCallCount' in 'System.Tests.S
  tringComparerTests:Create_CreatesValidComparer():this' during 'Merge throw blocks' (IL size 543; hash 0xc45bc068; FullOpts)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd prefer entering an issue on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, looks like there are some assumptions in the throw helper merging to sort out.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JIT changes LGTM.

src/coreclr/inc/eetwain.h Outdated Show resolved Hide resolved
@jkotas
Copy link
Member

jkotas commented Mar 29, 2024

Do you plan to get rid of the adjustment in a follow up PR?

@VSadov
Copy link
Member Author

VSadov commented Mar 29, 2024

Do you plan to get rid of the adjustment in a follow up PR?

In this PR we can only avoid adjustment in a throwing case.

That is possible because on this path we will care only about fully-interruptible methods that faulted or called methods that throw (most commonly it is IL_Throw). We do not need to adjust in these cases because:

  • we only care about fully interruptible methods as only those may have EH, and otherwise all locals are dead anyways.
  • in a leaf that faulted we never wanted to adjust during stackwalk.
  • and in nonleaf, that called a throwing helper, the <brk> after the call makes it irrelevant if the IP is on the instruction following the throwing call or at -1 relative to that.

We stopped adjusting in this case in this PR - including x86 and AOT cases.

To get rid of the adjustment in all cases is a bigger change because:

  • we'd need to solve the issues uncovered in Allow async interruptions on safepoints #95565
    I am getting warmer towards "just add a NOP" solution, just want to see if I can scope it a bit.
    Adding <brk> after throwing calls is one thing, since those are rare and generally a slow path, adding NOP in a more general case where call is at the end of a BB still kind of bothers me.
    This will need to have a solution first. There are many benefits, not just that it leads to complete removal of -1 (although not sure if that can be done for x86 as well, maybe...)
  • another issue is that partially interruptible case includes -1 skew in the GCInfo for safepoints, so we'd need to change the format, which we can do in this release.

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@VSadov
Copy link
Member Author

VSadov commented Mar 30, 2024

Thanks!!!

@VSadov VSadov merged commit b7d91f2 into dotnet:main Mar 30, 2024
110 checks passed
@VSadov VSadov deleted the stress1fix branch March 30, 2024 00:52
@github-actions github-actions bot locked and limited conversation to collaborators Apr 29, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assertion failed: (GetComponentSize() <= 2) || IsArray()
3 participants