

Fix debugger funceval deadlock #72179

Merged

merged 5 commits on Jul 22, 2022
Conversation

noahfalk
Member

Fixes #60565

Visual Studio devs reported that debugger funcevals were deadlocking because the
PinnedHeapHandleTableCrst was held while threads were suspended. This change
refactors that code path so that the AllocateHandles() operation where the lock is held
gets split into potentially multiple iterations and the GC allocation where the debugger
could suspend is extracted outside the region where the critical section is held. There is
another path into AllocateHandles that takes a different lock and theoretically it could
generate a similar deadlock, but I didn't do all the refactoring necessary to prevent
that path.
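The shape of the fix described above can be sketched as a two-phase pattern: try to satisfy the request from cached slots under the lock, allocate outside the lock, then re-take the lock to record the new storage. A minimal illustrative sketch follows; `HandleTable`, `TryGetPreallocated`, `AddBucket`, and the use of `std::mutex` are all hypothetical stand-ins, not the actual CoreCLR code:

```cpp
// Illustrative sketch only -- not the actual CoreCLR implementation.
#include <mutex>
#include <vector>

class HandleTable {
    std::mutex m_lock;        // stands in for PinnedHeapHandleTableCrst
    std::vector<int*> m_free; // preallocated handle slots
public:
    // Phase 1: under the lock, try to satisfy the request from cached slots.
    bool TryGetPreallocated(int*& out) {
        std::lock_guard<std::mutex> g(m_lock);
        if (m_free.empty()) return false;
        out = m_free.back();
        m_free.pop_back();
        return true;
    }
    // Phase 3: under the lock again, record the newly allocated storage.
    void AddBucket(std::vector<int*>&& slots) {
        std::lock_guard<std::mutex> g(m_lock);
        for (int* s : slots) m_free.push_back(s);
    }
};

int* AllocateHandle(HandleTable& table) {
    int* h = nullptr;
    while (!table.TryGetPreallocated(h)) {
        // Phase 2: allocation happens with NO lock held, so a
        // debugger-induced suspension here cannot deadlock on m_lock.
        std::vector<int*> fresh;
        fresh.push_back(new int(0));
        table.AddBucket(std::move(fresh));
    }
    return h;
}
```

Because the allocation in `AllocateHandle` happens with no lock held, a debugger suspending all threads at that point no longer finds the table's lock held hostage.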

I'm still testing to confirm this works as intended, but please take a look
@davidwr @hoyosjs @mikem8361 @cshung

@ghost

ghost commented Jul 14, 2022

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.


Author: noahfalk
Assignees: -
Labels:

area-Diagnostics-coreclr

Milestone: -

@davidwr

davidwr commented Jul 14, 2022

Sorry, I think you mistyped my username XD

Maybe you meant @davidwrighton

@jkotas
Member

jkotas commented Jul 14, 2022

The tests are hitting intermittent crashes and asserts, e.g.:

Assert failure(PID 32290 [0x00007e22], Thread: 32296 [0x7e28]): result == NULL
    File: /__w/1/s/src/coreclr/vm/appdomain.cpp Line: 972

I think that the logic in this PR is close to impossible to follow. Can this be refactored so that it is easier to follow? For example, can we have a method that looks like this?

{
    take the lock
    use preallocated handles if we have enough
    release the lock
}

allocate enough handles

{
     take the lock
     add the new handles for bookkeeping
     release the lock
}

and still resolve the handle from the cache on the 2nd attempt.
Refactored using a NewHolder and initializing the outsideLockWork
at the top of the function to reduce the logic complexity and make
it more robust to overlooked return points.
@noahfalk
Member Author

@jkotas - yeah, I saw the test failure log earlier and recognized the case I missed after seeing my assert failure. I also refactored the logic to hopefully make it simpler for the reader. Let me know if you think I was successful or if you have other suggestions.

For example, can we have a method that looks like this? ...

Unfortunately two passes isn't guaranteed to be enough; we need a retry loop. It is possible that thread A does the first step, then thread B does both steps, and then when thread A returns to complete step 2, it discovers the internal state has changed and the preallocated bucket is no longer the correct size.

Another alternative if folks prefer is that I can probably write a lock-free version. The downside is that it adds more complexity in terms of potential thread interleavings and memory ordering issues.
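The interleaving described above is why a loop is needed: the size decided under the lock can be stale by the time the allocation is committed. A minimal sketch of such a retry loop, with hypothetical names standing in for the real table state (not the actual CoreCLR code):

```cpp
// Hypothetical sketch of the retry concern -- names are not the real
// CoreCLR identifiers.
#include <mutex>
#include <cstddef>

struct Table {
    std::mutex lock;
    size_t nextBucketSize = 1; // internal state another thread may advance
    size_t freeSlots = 0;
};

void AllocateHandles(Table& t, size_t want) {
    for (;;) {
        size_t need = 0;
        {
            std::lock_guard<std::mutex> g(t.lock);
            if (t.freeSlots >= want) {   // fast path: enough cached slots
                t.freeSlots -= want;
                return;
            }
            need = t.nextBucketSize;     // decide how much to allocate
        }
        // Allocation happens here with the lock released; meanwhile another
        // thread may run both phases and change nextBucketSize under us.
        size_t allocated = need;         // stands in for the GC allocation
        {
            std::lock_guard<std::mutex> g(t.lock);
            if (allocated == t.nextBucketSize) { // state unchanged: commit
                t.freeSlots += allocated;
                t.nextBucketSize *= 2;
            }
            // else: state moved on under us -- loop and try again rather
            // than threading a mis-sized bucket into the list.
        }
    }
}
```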

@jkotas
Member

jkotas commented Jul 14, 2022

the preallocated bucket is no longer the correct size.

The array that was allocated outside the lock should still be sufficient to fulfill the request on the current thread. I do not think we need any retry loops.

@noahfalk
Member Author

The array that was allocated outside the lock should still be sufficient to fulfill the request on the current thread. I do not think we need any retry loops.

I can certainly thread them onto the linked list, but what are you thinking should happen afterwards? As a concrete example, two threads could concurrently each ask for 1 handle at a point where the NextBucketSize is 4000, and we'd thread both of those large arrays into the list. Currently there is a special case for allocations of size 1 where it searches beyond the head node, but for any other request size, 3999 of the newly added array slots would be unusable. Were you thinking that we'd couple this with changes to the searching algorithm that try harder to reuse previously allocated nodes, or would we just count on re-entrant operations being rare and accept the loss of efficiency if it happened?

@noahfalk
Member Author

Is the core of your concern the performance/correctness/efficiency of the algorithm with the loop, or just that the logic is hard for a reader to understand?

@jkotas
Member

jkotas commented Jul 15, 2022

Yes, this change has a risk of pathological performance issues. This risk exists for both designs (abandoning the redundant allocation, or adding the redundant allocation to the list). The problem is that these allocations go to the pinned heap. The pinned heap is used rarely, and it cannot be compacted. That means that once you create a fragment on this heap, there is a good chance it won't be reused for anything. We ran into similar problems when we explored use of the pinned heap for string literals.

The core of my concern is that this is turning something simple (allocating a piece of managed memory under a lock) into something pretty complex, hard to reason about, and a potential source of pathological performance issues. Allocating managed memory under a lock is done all the time in C#, and I am sure there are a number of places where it is done in the VM. The solution implemented here won't scale as a fix.

@noahfalk noahfalk marked this pull request as draft July 19, 2022 08:40
@noahfalk
Member Author

@jkotas - I haven't tested this updated version beyond confirming that it builds, but feel free to take a look to see if it aligns with what we discussed.

I'll ping again and convert back to a real PR once I've had a chance to better validate that it works as intended.

@jkotas
Member

jkotas commented Jul 19, 2022

Yes, this looks good. Thanks!

@noahfalk
Member Author

Failure is known issue: #70969

@noahfalk noahfalk marked this pull request as ready for review July 21, 2022 23:21
@noahfalk
Member Author

OK, this is looking good again. In the old code the PinnedHeapHandleTable was synchronized by one of two different locks: either the AppDomainHandleTable lock or the GlobalStringLiteralMap lock. In the new code the AppDomainHandleTable lock is renamed to the PinnedHeapHandleTable lock, and this lock always synchronizes the PinnedHeapHandleTable code. In the string literal code path the GlobalStringLiteralMap lock is taken first and the PinnedHeapHandleTable lock second, but the PinnedHeapHandleTable no longer relies on that outer GlobalStringLiteralMap lock for correctness.

In terms of testing, I can verify under a debugger that I can suspend at the GC allocation point and the PinnedHeapHandleTable lock isn't held. This doesn't, of course, prevent other locks from being held, so at best it is a partial fix for the issue. Nobody had a known repro, so I wasn't able to verify anything more specific. I also confirmed that the race cases on the InterlockedExchange paths worked as intended by forcing them with a native debugger.
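For illustration, the kind of InterlockedExchange-style race mentioned here can be forced by freezing one thread between the out-of-lock allocation and the publish. A hedged sketch using C++ `std::atomic` compare-exchange (hypothetical names; not the CoreCLR code, which uses the Win32-style Interlocked primitives):

```cpp
// Illustrative only: a compare-exchange publish whose race window sits
// between the allocation and the exchange. Not the actual CoreCLR code.
#include <atomic>

struct Bucket { int data = 0; };

std::atomic<Bucket*> g_bucket{nullptr};

Bucket* GetOrCreateBucket() {
    Bucket* cur = g_bucket.load(std::memory_order_acquire);
    if (cur != nullptr)
        return cur;
    Bucket* fresh = new Bucket(); // allocated outside any lock
    Bucket* expected = nullptr;
    if (g_bucket.compare_exchange_strong(expected, fresh,
                                         std::memory_order_acq_rel)) {
        return fresh;             // this thread won the race
    }
    delete fresh;                 // another thread won; discard ours
    return expected;              // use the winner's bucket
}
```

Freezing a thread (in a native debugger) right after `new Bucket()` lets another thread win the exchange, exercising the loser path deterministically.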

@jkotas
Member

jkotas commented Jul 21, 2022

In the string literal code path the GlobalStringLiteralMap is taken first

I suspect that this lock is also exposed to the debugger deadlock.

@noahfalk noahfalk requested a review from janvorli July 22, 2022 02:32
@noahfalk
Member Author

I suspect that this lock is also exposed to the debugger deadlock.

Yeah I expect so as well.

@janvorli
Member
LGTM, thank you!

@noahfalk
Member Author

Merging, failure is known issue: #70969

@noahfalk noahfalk merged commit 463efbb into dotnet:main Jul 22, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 22, 2022
Successfully merging this pull request may close these issues.

m_UnresolvedClassLock is held in an unsafe way for Func Evals
4 participants