Add ee_alloc_context #104849

Open · wants to merge 7 commits into main
Conversation

@noahfalk (Member) commented Jul 13, 2024

This change is some preparatory refactoring for the randomized allocation sampling feature. We need to add more state to the allocation context, but we don't want to make a breaking change to the GC interface. The new state only needs to be visible to the EE, but we want it physically near the existing alloc context state for good cache locality. To accomplish this, we created a new ee_alloc_context struct which contains an instance of gc_alloc_context within it.

A new field, ee_alloc_context::combined_limit, should be used by fast allocation helpers to determine when to go down the slow path. Most of the time combined_limit has the same value as alloc_limit, but periodically we need to emit an allocation sampling event on an object that is somewhere in the middle of an allocation context (AC). Using combined_limit rather than alloc_limit as the slow-path trigger allows us to keep all the sampling event logic in the slow path.
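As a rough illustration of the layout being described (a sketch only; member names other than combined_limit are assumptions, not necessarily what this PR uses):

```cpp
// Sketch of the shape implied by the description above, not the literal PR code.
// gc_alloc_context itself is unchanged, so the GC interface is not broken; the new
// EE-only state simply sits adjacent to it for cache locality.
struct ee_alloc_context
{
    // EE-only limit that the fast allocation helpers compare against. Usually equal
    // to gc_context.alloc_limit, but pulled in closer when the next sampled object
    // should land partway through this allocation context.
    uint8_t*         combined_limit;

    // The existing GC-owned allocation context (illustrative member name).
    gc_alloc_context gc_context;
};

// Fast-path pseudocode: only the comparison target changes, so all sampling event
// logic can stay on the slow path.
//
//   uint8_t* result = ctx->gc_context.alloc_ptr;
//   if (result + size <= ctx->combined_limit)        // was: gc_context.alloc_limit
//   {
//       ctx->gc_context.alloc_ptr = result + size;   // bump-allocate as before
//       return (Object*)result;
//   }
//   return AllocSlow(ctx, size);                     // sampling event emitted here
```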

There is another PR for NativeAOT making the same change: #104851

Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@noahfalk (Member, Author) commented:

This is ready for review now.
All failures are known pre-existing failures according to build analysis. I believe I've applied your feedback as well @jkotas

@noahfalk marked this pull request as draft July 15, 2024 05:51
@noahfalk (Member, Author) commented:

I converted back to draft because testing on NativeAOT revealed a race condition that I believe affects the CoreCLR work as well. I am investigating.

@noahfalk (Member, Author) commented:

All issues I know of are resolved and build analysis is green, so this is ready for review once more. Thanks!

@jkotas (Member) commented Jul 16, 2024

Is it that hard to implement it in a way that it does not introduce cache contention from thread enumerations running in parallel? I think it is preferable to go with designs that do not have performance issues by construction.

@noahfalk
Copy link
Member Author

> Is it that hard to implement it in a way that it does not introduce cache contention from thread enumerations running in parallel? I think it is preferable to go with designs that do not have performance issues by construction.

I can implement it using the RestartEE approach I had earlier if you prefer that? @Maoni0 did not believe it was worth prioritizing time to change how the GC currently does that enumeration.

@noahfalk (Member, Author) commented:

@jkotas Following up on perf, I attempted to stress the threading by running gcperfsim with 400 threads and the server GC config, allocating 500 GB of objects. I used two different configs: one with a 50 MB heap size and small objects, and the other with a 2 GB heap size allowing a 20% LOH allocation mix to introduce more BGCs. My machine has 20 hardware threads. Median execution time in scenario 1 was 34.5 sec for both the PR and the baseline. Scenario 2 was 38.4 sec for the baseline and 38.5 sec for the PR. Noise on the machine causes runs to vary by up to 1 second, so the 0.1 sec difference isn't statistically significant. It's possible that with a larger sample size and a low-noise environment we could discern some cost from the noise, but it certainly doesn't jump out. I also took CPU sampling traces, which showed GCEnumAllocContexts consistently consumed 0.3% inclusive time before and after the change.

Also fyi I lost internet service at my house. I'm trying to get it restored but for now I only have GH access from my cell phone.

@noahfalk (Member, Author) commented:

@jkotas - anything remaining before you can sign off? I believe everything has been resolved aside from still waiting to hear from @elinor-fung or @lambdageek.

@noahfalk (Member, Author) commented:

I did some manual testing of the debugging scenario. In the normal DAC scenario I found and fixed one bug in the new commit. The cDAC appeared to be working correctly. My test was running windbg + SOS with !threads and !DumpHeap to exercise reading gc_alloc_context. I confirmed, using a debugger on the debugger, that the cDAC case was executing down the cDAC branch of the DAC code as intended.

@elinor-fung (Member) commented:

cDAC changes look good to me.

@jkotas (Member) commented Jul 29, 2024

> perf

To assess perf impact for changes like this, we typically try to craft a microbenchmark that shows the worst regression or best improvement (one example: #96150 (comment)). Unless you introduce a very bad bug, the impact will be lost in the noise from all the other things going on in a general-purpose allocation throughput microbenchmark.

I am not worried about the perf impact too much with the latest version of the change, so I do not think it is necessary to do that.

@jkotas (Member) commented Jul 29, 2024

/azp run runtime-coreclr gcstress-extra


Azure Pipelines successfully started running 1 pipeline(s).

@jkotas (Member) commented Jul 29, 2024

GC stress crash at:

      Assert failure(PID 2208 [0x000008a0], Thread: 9372 [0x249c]): Consistency check failed: AV in clr at this callstack:
      ------
      CORECLR! Thread::CooperativeCleanup + 0xFF (0x6fff547e)
      CORECLR! Thread::DetachThread + 0x124 (0x6fff69ec)
      CORECLR! TlsDestructionMonitor::~TlsDestructionMonitor + 0x8C (0x7043d0d8)
      CORECLR! _dyn_tls_dtor + 0x8A (0x7045116a)

It looks like a regression introduced by this change (Thread::CooperativeCleanup is modified by this change).

noahfalk and others added 7 commits July 30, 2024 01:08
This change is some preparatory refactoring for the randomized allocation sampling feature. We need to add more state onto allocation context but we don't want to do a breaking change of the GC interface. The new state only needs to be visible to the EE but we want it physically near the existing alloc context state for good cache locality. To accomplish this we created a new ee_alloc_context struct which contains an instance of gc_alloc_context within it.

The new ee_alloc_context.combined_limit field should be used by fast allocation helpers to determine when to go down the slow path. Most of the time combined_limit has the same value as alloc_limit, but periodically we need to emit an allocation sampling event on an object that is somewhere in the middle of an AC. Using combined_limit rather than alloc_limit as the slow path trigger allows us to keep all the sampling event logic in the slow path.
combined_limit is now synchronized in GcEnumAllocContexts instead of RestartEE. This requires the GC to be constrained in how it updates alloc_ptr and alloc_limit. No GC behavior changed in practice, but the constraints are now part of the EE<->GC contract so that we can rely on them in the EE code.
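A minimal sketch of what that synchronization point could look like, reusing the illustrative ee_alloc_context layout sketched in the description above (the helper name and the sampling-budget parameter are assumptions, not the PR's actual API):

```cpp
// Sketch only: recompute the EE-only limit for one allocation context from the
// GcEnumAllocContexts callback, after the GC has updated alloc_ptr/alloc_limit under
// the constraints described above.
void UpdateCombinedLimit(ee_alloc_context* ctx, size_t bytesUntilNextSample)
{
    uint8_t* samplingLimit = ctx->gc_context.alloc_ptr + bytesUntilNextSample;

    // combined_limit never exceeds what the GC handed out; it is only pulled in so
    // that the allocation that crosses the sampling point takes the slow path.
    ctx->combined_limit = (samplingLimit < ctx->gc_context.alloc_limit)
                              ? samplingLimit
                              : ctx->gc_context.alloc_limit;
}
```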
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
Co-authored-by: Jan Kotas <jkotas@microsoft.com>
The code of GetAllocContext() was constructing a PTR_gc_alloc_context, which does a host->target pointer conversion. Those conversions work by doing a lookup in a dictionary of blocks of memory that we have previously marshalled, and the pointer being converted is expected to be the start of a memory block. In this case we had never previously marshalled the gc_alloc_context on its own; we had only marshalled the m_pRuntimeThreadLocals block, which includes the gc_alloc_context inside of it at a non-zero offset. This caused the host->target pointer conversion to fail, which in turn meant commands like !threads in SOS would fail.

The fix is pretty trivial. We don't need to do a host->target conversion here at all because the calling code in the DAC is going to immediately convert right back to a host pointer. We can avoid the conversion in both directions by eliminating the cast and returning the host pointer directly.
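A hedged before/after sketch of the shape of that fix (member names and the exact conversion call are illustrative, not the literal diff):

```cpp
// Before (illustrative): building a PTR_gc_alloc_context requires a host->target
// address lookup, which fails because only the enclosing m_pRuntimeThreadLocals block
// was ever marshalled as a DAC instance, not the embedded alloc context by itself.
//
//   return PTR_gc_alloc_context(PTR_HOST_TO_TADDR(/* embedded gc_alloc_context */));
//
// After (illustrative): the calling DAC code immediately converts back to a host
// pointer anyway, so return the host pointer to the embedded member directly and
// skip both conversions.
//
//   return &m_pRuntimeThreadLocals->alloc_context;   // member name is illustrative
```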
Somehow a test reached Thread::CooperativeCleanup() with m_pRuntimeThreadLocals==NULL. Looking at the code, I expected that would mean GetThread()==NULL or that ThreadState contains TS_Dead, but neither of those conditions was true, so it is unclear how it executed to that state. The callstack was:
>	coreclr.dll!Thread::CooperativeCleanup() Line 2762	C++
 	coreclr.dll!Thread::DetachThread(int fDLLThreadDetach) Line 936	C++
 	coreclr.dll!TlsDestructionMonitor::~TlsDestructionMonitor() Line 1745	C++
 	coreclr.dll!__dyn_tls_dtor(void * __formal, const unsigned long dwReason, void * __formal) Line 122	C++
 	ntdll.dll!_LdrxCallInitRoutine@16()	Unknown
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrpCallTlsInitializers()	Unknown
 	ntdll.dll!LdrShutdownThread()	Unknown
 	ntdll.dll!RtlExitUserThread()	Unknown

Regardless, the previous code wouldn't have hit that issue because it obtained the pointer through t_runtime_thread_locals rather than m_pRuntimeThreadLocals. I restored using t_runtime_thread_locals in the Cleanup routine. Out of caution I also searched for any other places that were previously accessing the alloc_context through the thread local address and ensured they don't switch to use m_pRuntimeThreadLocals either.
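A hedged sketch of the shape of that fix, using the illustrative member names from the earlier sketches (not the literal diff):

```cpp
// In Thread::CooperativeCleanup, read the allocation context through the thread-local
// again (the pre-change behavior), since m_pRuntimeThreadLocals was observed to be
// NULL at this point during TLS destruction:
//
//   // before (this PR, crashed under GC stress):
//   // gc_alloc_context* ac = &m_pRuntimeThreadLocals->alloc_context.gc_context;
//
//   // after (restored):
//   gc_alloc_context* ac = &t_runtime_thread_locals.alloc_context.gc_context;
```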
@noahfalk (Member, Author) commented:

I've updated for the feedback so far, plus the test failure that showed up in the GC stress run. Looking at the time, I think I may need to call it quits on this for .NET 9 and instead treat it as .NET 10 work on a slower cadence. At this point I don't think I have enough time available to do a staged checkin, revalidate testing after the ongoing changes, and still keep time in reserve to respond to potential post-checkin issues.
