Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test failure: HandleRefTest.Validate_NoGC() #97334

Closed
BruceForstall opened this issue Jan 22, 2024 · 7 comments · Fixed by #98242
Closed

Test failure: HandleRefTest.Validate_NoGC() #97334

BruceForstall opened this issue Jan 22, 2024 · 7 comments · Fixed by #98242

Comments

@BruceForstall
Copy link
Member

This test is failing on osx-arm64 with GCStress=3 with a timeout/hang.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=535333&view=ms.vss-test-web.build-test-results-tab&runId=12652286&resultId=116250&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

05:52:21.755 Running test: global::HandleRefTest.Validate_NoGC()
GC Callback Begin
GC Callback before WaitForPendingFinalizers()
['Interop.0.1' END OF WORK ITEM LOG: Command timed out, and was killed]

@AaronRobinsonMSFT @cshung

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 22, 2024
@ghost
Copy link

ghost commented Jan 23, 2024

Tagging subscribers to this area: @dotnet/gc
See info in area-owners.md if you want to be subscribed.

Issue Details

This test is failing on osx-arm64 with GCStress=3 with a timeout/hang.

https://dev.azure.com/dnceng-public/public/_build/results?buildId=535333&view=ms.vss-test-web.build-test-results-tab&runId=12652286&resultId=116250&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab

05:52:21.755 Running test: global::HandleRefTest.Validate_NoGC()
GC Callback Begin
GC Callback before WaitForPendingFinalizers()
['Interop.0.1' END OF WORK ITEM LOG: Command timed out, and was killed]

@AaronRobinsonMSFT @cshung

Author: BruceForstall
Assignees: -
Labels:

GCStress, area-GC-coreclr, untriaged, blocking-clean-ci-optional

Milestone: -

@AaronRobinsonMSFT
Copy link
Member

I'm moving this to GC since the original issue that was mitigated involved frozen objects - PR #97109

@cshung
Copy link
Member

cshung commented Feb 9, 2024

@markples told me perhaps we shouldn't be running this test under GCStress since WaitForPendingFinalizers is not supported there

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 9, 2024
@ghost ghost removed in-pr There is an active PR which will close this issue when it is merged untriaged New issue has not been triaged by the area owner labels Feb 9, 2024
@BruceForstall
Copy link
Member Author

@markples told me perhaps we shouldn't be running this test under GCStress since WaitForPendingFinalizers is not supported there

Is that really true? Why?

There are hundreds of tests that call WaitForPendingFinalizers; I don't think they are all marked as incompatible with GCStress.

@markples
Copy link
Member

There was some discussion earlier:

#68529 (comment)
#68060 (comment)

It could be that WaitForPendingFinalizers is fine but not if the test (where test can mean an entire process - or merged group) has a ReRegisterForFinalize, but I don't think anyone knows. gcstress does seem to create a rather artificial environment - an infinite loop could very well be the correct behavior for the finalization mechanisms given the code+gcstress.

@BruceForstall
Copy link
Member Author

My mental model is that if a test can fail under GCStress then it can fail without GCStress if the set of GCs that happens is "unlucky". So if WaitForPendingFinalizers + (possibly) ReRegisterForFinalize can cause failures under GCStress but not without it, then we're assuming/hoping that GC doesn't happen in some subset of that scenario, which seems risky.

@markples
Copy link
Member

WaitForPendingFinalizers + allocation is enough to be in this situation, and we can force the (otherwise unlikely/unlucky) failure with an explicit Collect:

using System;
using System.Runtime.CompilerServices;

FFF.Alloc();
GC.Collect();
GC.WaitForPendingFinalizers();

class FFF
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Alloc()
    {
        FFF f = new FFF();
    }
    ~FFF()
    {
        Console.Write('~');
        Alloc();
        GC.Collect();
    }
}

It doesn't quite work just calling ReRegisterForFinalize instead of Alloc because the finalizer holds onto this, so the Collect doesn't put it back on the queue. However, all we need is a second object so that whichever runs second is able to put the first one back on the queue (and then they can go back and forth after that).

using System;
using System.Runtime.CompilerServices;

FFF.Alloc();
FFF.Alloc();
GC.Collect();
GC.WaitForPendingFinalizers();

class FFF
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Alloc()
    {
        FFF f = new FFF();
    }
    ~FFF()
    {
        Console.Write('~');
        GC.ReRegisterForFinalize(this);
        GC.Collect();
    }
}

So yes, the test pattern is risky - waiting for a queue to empty where the objects put themselves back into it. A "delayed reregister" would fix it, assuming that it didn't ruin the intent of the test(s). The finalization feature isn't used that way in practice, so it seems like this would be artificial.

using System;
using System.Runtime.CompilerServices;

FFF.Alloc();
FFF.Alloc();
GC.Collect();
GC.WaitForPendingFinalizers();
while (FFF.WaitToRegister.TryPop(out var f)) GC.ReRegisterForFinalize(f);
GC.Collect();
GC.WaitForPendingFinalizers();

class FFF
{
    public static Stack<FFF> WaitToRegister = new Stack<FFF>();

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void Alloc()
    {
        FFF f = new FFF();
    }
    ~FFF()
    {
        Console.Write('~');
        WaitToRegister.Push(this);
        GC.Collect();
    }
}

@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants