[mono][wasm] Assertion at mono/utils/lock-free-alloc.c:210, condition `!desc->in_use' not met #106007

Closed
caaavik-msft opened this issue Aug 6, 2024 · 5 comments · Fixed by #106080
Labels: arch-wasm (WebAssembly architecture), area-GC-mono, in-pr (There is an active PR which will close this issue when it is merged), os-browser (Browser variant of arch-wasm)

@caaavik-msft
Contributor

Description

In the dotnet-runtime-perf pipeline, the wasm BenchmarkDotNet tests are failing when running the Perf_Timer.ShortScheduleAndDisposeWithFiringTimers benchmark with the following stacktrace:

[2024/07/30 18:43:28][INFO] [MONO] * Assertion at /__w/1/s/src/mono/mono/utils/lock-free-alloc.c:210, condition `!desc->in_use' not met
[2024/07/30 18:43:28][INFO] 
[2024/07/30 18:43:28][INFO] Error
[2024/07/30 18:43:28][INFO]     at Cc (/home/helixbot/work/A386091A/w/AEC80959/e/performance/artifacts/bin/for-running/MicroBenchmarks/Job-BYBKHP/bin/Release/net9.0/browser-wasm/AppBundle/_framework/dotnet.runtime.js:3:167892)
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.wasm_trace_logger (wasm_trace_logger (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[13362]:0x22cc00))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.eglib_log_adapter (eglib_log_adapter (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[935]:0x3f1e9))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.monoeg_g_logv_nofree (monoeg_g_logv_nofree (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[835]:0x3cd7b))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.monoeg_assertion_message (monoeg_assertion_message (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[839]:0x3ce9f))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.mono_assertion_message (mono_assertion_message (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[841]:0x3cee2))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.mono_lock_free_alloc (mono_lock_free_alloc (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1389]:0x4eb10))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_alloc_internal (sgen_alloc_internal (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1402]:0x4f405))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_gray_object_alloc_queue_section (sgen_gray_object_alloc_queue_section (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1369]:0x4de54))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_gray_object_enqueue (sgen_gray_object_enqueue (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1370]:0x4dee7))
[2024/07/30 18:43:28][INFO] [MONO] /__w/1/s/src/mono/mono/sgen/sgen-gc.c:3984 <disabled>
[2024/07/30 18:43:28][INFO] Error
[2024/07/30 18:43:28][INFO]     at Cc (/home/helixbot/work/A386091A/w/AEC80959/e/performance/artifacts/bin/for-running/MicroBenchmarks/Job-BYBKHP/bin/Release/net9.0/browser-wasm/AppBundle/_framework/dotnet.runtime.js:3:167892)
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.wasm_trace_logger (wasm_trace_logger (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[13362]:0x22cc00))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.eglib_log_adapter (eglib_log_adapter (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[935]:0x3f1e9))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.monoeg_g_logv_nofree (monoeg_g_logv_nofree (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[835]:0x3cd7b))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.monoeg_g_log (monoeg_g_log (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[837]:0x3ce41))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.monoeg_g_log_disabled (monoeg_g_log_disabled (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[838]:0x3ce74))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_stop_world (sgen_stop_world (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1302]:0x49d18))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_perform_collection_inner (sgen_perform_collection_inner (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1301]:0x49b3f))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_perform_collection (sgen_perform_collection (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1299]:0x49aa5))
[2024/07/30 18:43:28][INFO]     at dotnet.native.wasm.sgen_ensure_free_space (sgen_ensure_free_space (wasm://wasm/dotnet.native.wasm-03460462:wasm-function[1298]:0x49a38))

Example CI showing failure: https://dev.azure.com/dnceng/internal/_build/results?buildId=2509023&view=logs&j=0f08c62b-ed4c-50b2-1260-59a23cc961c9&t=db1d6df3-8551-5183-6637-b677dadc9bee

Reproduction Steps

I was able to reproduce this by running the following in Ubuntu 20.04 on WSL:

  1. Build the runtime: ./build.sh mono+libs -os browser -c Release
  2. Build the wasm-tools workload: ./dotnet.sh build -p:TargetOS=browser -p:TargetArchitecture=wasm -c Release src/mono/wasm/Wasm.Build.Tests /t:InstallWorkloadUsingArtifacts
  3. Clone the performance repo: https://github.com/dotnet/performance
  4. Run the following in the root of the performance repo: python3 scripts/benchmarks_ci.py -f net9.0 --dotnet-path /path/to/runtime/artifacts/bin/dotnet-latest --wasm --run-isolated --bdn-arguments="--anyCategories Libraries Runtime --category-exclusion-filter NoInterpreter NoWASM NoMono --logBuildOutput --wasmDataDir /path/to/runtime/src/mono/browser --filter *Perf_Timer.ShortScheduleAndDisposeWithFiringTimers* --wasmArgs \" --expose_wasm --module\"".
    Note that the --dotnet-path and --wasmDataDir arguments must be absolute paths, because ~ expansion is not supported.

Expected behavior

The benchmark runs and collects performance results

Actual behavior

The benchmark fails to run and throws the assertion error

dotnet-issue-labeler (bot) added the needs-area-label label on Aug 6, 2024
dotnet-policy-service (bot) added the untriaged label on Aug 6, 2024
caaavik-msft added the arch-wasm label on Aug 6, 2024
lambdageek added the area-GC-mono and os-browser labels and removed the needs-area-label label on Aug 6, 2024
@caaavik-msft
Contributor Author

I forgot to mention: this issue seems to have been happening for at least two months without being caught. That is longer than our current pipeline retention, so I can't pinpoint the exact commit that introduced it. The issue occurred on a build from June 5th (db0eb5d), and a run that was manually retained from May 19th (5474ab5) doesn't seem to have hit it, so I suspect it was introduced between those two commits. Looking at the commit range, I think it might be caused by this commit: 4219e45

mkhamoyan added this to the 9.0.0 milestone on Aug 7, 2024
@mkhamoyan
Member

@kg could you please check whether this issue is related to 4219e45?

dotnet-policy-service (bot) removed the untriaged label on Aug 7, 2024
@lambdageek
Member

lambdageek commented Aug 7, 2024

Hmm... I think part of the work in 4219e45 was to sometimes return non-zeroed pages from the low-level allocator. So perhaps all that is missing is a memset(desc, 0, sizeof(*desc)) around here:

desc = (Descriptor *) mono_valloc (NULL, desc_size * NUM_DESC_BATCH, prot_flags_for_activate (TRUE), type);
g_assertf (desc, "Failed to allocate memory for the lock free allocator");

@kg we should perhaps audit the other uses of mono_valloc and be a bit more conservative about when we hand out non-zeroed pages
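
To illustrate the failure mode described in this comment, here is a minimal, self-contained sketch. It is not the runtime's code: `SketchDescriptor`, `sketch_valloc_dirty`, `sketch_alloc_desc_batch`, and `sketch_count_in_use` are hypothetical names, and `malloc` plus a fill pattern stands in for a page allocator that may hand back non-zeroed memory. The real `Descriptor` layout and `mono_valloc` behavior are assumed, not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NUM_DESC_BATCH 64

/* Hypothetical, simplified stand-in for the allocator's descriptor. */
typedef struct SketchDescriptor {
    struct SketchDescriptor *next;
    int in_use; /* nonzero means "currently handed out" */
} SketchDescriptor;

/* Stand-in for a page allocator that may return non-zeroed memory.
 * The 0xAB fill simulates recycled pages with stale contents. */
static void *sketch_valloc_dirty(size_t size)
{
    void *p = malloc(size);
    if (p)
        memset(p, 0xAB, size);
    return p;
}

/* Allocate a batch of descriptors; optionally zero the whole batch so
 * every descriptor starts out with in_use == 0. */
static SketchDescriptor *sketch_alloc_desc_batch(bool zero_batch)
{
    size_t size = sizeof(SketchDescriptor) * NUM_DESC_BATCH;
    SketchDescriptor *batch = sketch_valloc_dirty(size);
    if (batch && zero_batch)
        memset(batch, 0, size);
    return batch;
}

/* Count descriptors that would trip a check like assert(!desc->in_use). */
static int sketch_count_in_use(const SketchDescriptor *batch)
{
    int count = 0;
    for (int i = 0; i < NUM_DESC_BATCH; i++)
        if (batch[i].in_use)
            count++;
    return count;
}
```

Calling `sketch_alloc_desc_batch(false)` yields a batch in which every descriptor already looks in use, which is the condition the assertion rejects; calling it with `true` yields a batch that passes the check.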

@kg
Contributor

kg commented Aug 7, 2024

The default is zeroed pages, but maybe I turned on non-zeroed pages in a place where I shouldn't have; I don't remember. I'll take a look.

@kg
Contributor

kg commented Aug 7, 2024

> Hmm... I think part of the work in 4219e45 was to sometimes return non-zeroed pages from the low-level allocator. So perhaps all that is missing is a memset(desc, 0, sizeof(*desc)) around here:
>
> desc = (Descriptor *) mono_valloc (NULL, desc_size * NUM_DESC_BATCH, prot_flags_for_activate (TRUE), type);
> g_assertf (desc, "Failed to allocate memory for the lock free allocator");
>
> @kg we should perhaps audit the other uses of mono_valloc and be a bit more conservative about when we hand out non-zeroed pages

It looks like this specific call site is actually allocating a big block of descriptors, so it should all be zeroed, unlike alloc_sb, which appears to be allocating space for objects with a header.
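
One way to read this comment is that zeroing a single descriptor, as in the quoted memset(desc, 0, sizeof(*desc)), would not cover the rest of the batch. The sketch below shows that difference under stated assumptions: `SketchDesc`, `SKETCH_BATCH`, and `sketch_dirty_after_zeroing` are hypothetical names, `malloc` with a fill pattern simulates non-zeroed pages, and the real allocator's `desc_size` (which may differ from the struct size) is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BATCH 8

/* Hypothetical, simplified descriptor; not the runtime's layout. */
typedef struct {
    void *next;
    int in_use;
} SketchDesc;

/* Simulate a dirty batch, zero only the first bytes_to_zero bytes, and
 * report how many descriptors still look in use afterwards.
 * Returns -1 if the simulated allocation fails. */
static int sketch_dirty_after_zeroing(size_t bytes_to_zero)
{
    size_t total = sizeof(SketchDesc) * SKETCH_BATCH;
    SketchDesc *batch = malloc(total);
    if (!batch)
        return -1;

    memset(batch, 0xAB, total); /* stale contents from a recycled page */

    if (bytes_to_zero > total)
        bytes_to_zero = total;
    memset(batch, 0, bytes_to_zero);

    int dirty = 0;
    for (int i = 0; i < SKETCH_BATCH; i++)
        if (batch[i].in_use)
            dirty++;

    free(batch);
    return dirty;
}
```

Passing `sizeof(SketchDesc)` leaves every descriptor after the first one dirty, while passing `sizeof(SketchDesc) * SKETCH_BATCH` leaves none, which matches the point that the whole block should be zeroed at this call site.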

dotnet-policy-service (bot) added the in-pr label on Aug 7, 2024
kg closed this as completed in #106080 (commit 68511fd) on Aug 8, 2024
github-actions (bot) locked and limited conversation to collaborators on Sep 8, 2024