Marshal.AllocHGlobal/FreeHGlobal is ~150x slower in .NET than legacy mono on device (tvOS) #58939
Comments
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label. |
Unrelated, just a note: there is also |
On Unix systems |
|
I agree it would be a correctness issue to do so. I am still thinking about benchmarking the impact though. Depending on the particular use case, there may be other, less costly ways to allocate the memory (e.g. stackalloc for small allocations). |
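To illustrate the stackalloc suggestion, here is a minimal sketch, not code from this thread. It assumes the caller only needs a short-lived scratch buffer; the `ScratchBuffer` and `WithScratch` names and the 256-byte cutoff are hypothetical, and the project must allow unsafe code (`AllowUnsafeBlocks`).

```csharp
using System;
using System.Runtime.InteropServices;

static class ScratchBuffer
{
    // Hypothetical cutoff, chosen for illustration; tune it for the real workload.
    const int StackLimit = 256;

    // Runs `use` with a temporary unmanaged buffer of `size` bytes.
    // The pointer must not be kept after `use` returns.
    // Buffer contents are not guaranteed to be zeroed on either path.
    public static unsafe void WithScratch(int size, Action<IntPtr, int> use)
    {
        if (size <= 0)
            throw new ArgumentOutOfRangeException(nameof(size));

        if (size <= StackLimit)
        {
            // Small request: carve the buffer out of the current stack frame,
            // which needs no P/Invoke transition.
            byte* stack = stackalloc byte[size];
            use((IntPtr)stack, size);
            return;
        }

        // Large request: fall back to the native heap.
        IntPtr heap = Marshal.AllocHGlobal(size);
        try
        {
            use(heap, size);
        }
        finally
        {
            // Always release the native allocation, even if `use` throws.
            Marshal.FreeHGlobal(heap);
        }
    }
}
```

A call such as `ScratchBuffer.WithScratch(64, (p, n) => Marshal.WriteByte(p, 0, 1));` stays on the stack path, so it never reaches Marshal.AllocHGlobal. This only helps when the buffer does not need to outlive the call.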
I think That is, these calls that make it "incompatible" with It would perhaps be interesting to see if there was something we could do that could help support this kind of scenario. |
like emitting it as
? where someLimit is some small value that won't cause any mmap under the hood |
You cannot tell. Any |
Right, just discovered that in glibc sources 👍 |
I think the root cause of this problem is the high overhead implementation strategy used for PInvoke transitions on tvOS. This high overhead is a problem for every other PInvoke. For example, globalization PInvokes will hit it too. |
Also add a 'SupportedOSPlatformVersion' value to the .NET perftest project file, to cope with recent changes in our .NET support. Ref: dotnet/runtime#58939.
I'll have to revisit mono/mono#17110 to see if the optimizations can be done correctly. On my MacBook Air M1 I get these numbers with the provided test case:
|
There's something weird with my local runtime builds because they behave quite differently to the official ones. I'll have to figure that out first. Local dotnet/runtime, MacCatalyst ARM64 Debug, Release builds are way more comparable to the numbers above. My local changes produce
Another experiment is measuring the overhead by adding
So, yeah, the transitions are crazy expensive. |
I've done an experiment locally that ensures registers are saved to the stack and then saves only part of the context on the thread state transition, which saves some memory copying. It would need a lot of polishing and validation to ensure it does not break anything. Mono ARM64 gets
Not sure if I can get it ready anytime soon, but here's a gist of what I was testing:
|
/cc @vargaz |
Fixing or improving this would require risky changes, so it is unlikely to be fixed for 6.0. |
…elease builds (#59269) Backport of #59029. Profiling shows that a large part of the GC transition overhead (~30%) in #58939 is caused by assert-style checks. Disabling them seems to be the best bang-for-the-buck option for reducing the overhead without fundamental changes to the code. Co-authored-by: Filip Navara <navara@emclient.com>
What about CoreCLR ARM64? |
Moving to 8.0 |
Description
Calling Marshal.AllocHGlobal / Marshal.FreeHGlobal is ~150x slower in .NET compared to legacy Mono when running on a tvOS device.
Sample test code: https://gist.github.com/rolfbjarne/b22b844e6f351ad40c4f30e20a2a36d8
Outputs something like this with legacy Mono (Xamarin.iOS from d16-10):
which is roughly 15.5M calls to Marshal.AllocHGlobal+FreeHGlobal per second.
Now in .NET I get this:
that's roughly 103k calls to Marshal.AllocHGlobal+FreeHGlobal per second; ~150x slower.
This is on an Apple TV 4K from 2017.
There's a difference in the simulator too, just not so stark (on an iMac Pro)
Legacy Mono:
and with .NET:
so ~4x slower.
I profiled the .NET version on device using instruments:
Marshal.trace.zip
Here's a preview:
It seems most of the time is spent inside mono_threads_enter_gc_safe_region_unbalanced. This function isn't even called in legacy Mono.
Here's an Instruments trace:
MarshalMono.trace.zip
and a preview:
I don't know if this applies to other platforms as well; I only tested tvOS.
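For readers who cannot open the gist, the calls-per-second figures above come from a measurement of roughly the following shape. This is a minimal sketch, not the code from the gist: the `AllocBenchmark` and `Measure` names, the iteration count, and the allocation size are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class AllocBenchmark
{
    // Returns the measured number of AllocHGlobal+FreeHGlobal pairs per second.
    // `iterations` and `size` are hypothetical parameters, not values from the gist.
    public static double Measure(int iterations, int size)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Each pair crosses the managed/native boundary twice,
            // which is where the transition overhead is paid.
            IntPtr block = Marshal.AllocHGlobal(size);
            Marshal.FreeHGlobal(block);
        }
        stopwatch.Stop();
        return iterations / stopwatch.Elapsed.TotalSeconds;
    }
}
```

Something like `Console.WriteLine(AllocBenchmark.Measure(1_000_000, 32));` prints a rate that can be compared across runtimes. Dividing the two device rates quoted above, 15.5M by 103k, gives the ~150x figure.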