NativeAOT size conscious authoring observations #83902

NinoFloris · 2023-03-24T21:09:57Z

NativeAOT authoring observations

As I'm continuing to iterate to shave off overhead from the new serializer for Npgsql, and slowly getting more familiar with the code patterns that work well with NativeAOT I wanted to report on my findings so far. I will be using this issue to centralize future observations of a similar nature as well.

Async

One class of issues in particular I want to highlight is the cost of everything relating to async code. Having any of these methods in generic types is one major source of bloat due to their inherent IL type + codegen explosion` multiplied by the number of canonical instantiations.

It might be worthwhile for the C# and runtime team to take a look at what can be done to reduce its footprint, as this interaction is supremely problematic for more casual authors. If there is no significant improvement to be made here it could help to expose relevant tools for authors to reduce it on a case by case basis.

Some of those tools could be for instance having public methods on ValueTask and ValueTask to move from its generic form to non generic and back, allowing us external authors to support pooled IValueTaskSource(T) while doing so.
Tasks have inherent support of up/downcasting to do this. For ValueTask these apis could be used in the same way as up/downcasting Tasks, for the purpose of pushing this bulky async codegen to non generic types. There already is the internal method ValueTask.DangerousCreateFromTypedValueTask as a validation of this being useful internally.

Async Types

Next up I started looking at the general cost of even having apis returning ValueTask types, it's not cheap... Not only do you pay for the Task types but also for ValueTask<T>, IValueTaskSource<T>, ValueTask<T>.ValueTaskSourceAsTask, ValueTaskAwaiter<T> and others.

I have two links here, one before dropping about 20(?) mostly reference type instantiations of ValueTask<T> and one report after doing so. The difference is a hefty 80kb, with methods taking up just 24kb of that difference. The remainder is mostly types and type dictionary metadata.

Before: https://github.com/NinoFloris/Slon/actions/runs/4507705639
After: https://github.com/NinoFloris/Slon/actions/runs/4507808228

The mstats are attached if you want to explore the System.Private.Corelib types in more detail.

Reference Types

Onto the next papercut, reference type instantiations in particular. Now their methods are shared via __Canon, however all their concrete types are still required to exist in full, take a look:

These all add to the binary size, but I'm not sure what they're exactly uniquely adding. Could these EETypes/method tables (what is the correct name of this?) be shared in any way via __Canon? Only keeping a concrete generic context around per reference type instantiation? I understand the latter would always be needed to have correct type testing etc. inside methods.

Improving this situation seems like it may also reduce the previously discussed ValueTask<T> type bloat?

Value Type Code Sharing

Finally I would urge runtime devs to consider what it would take to share code across same size/layout value types, a la gc shapes in golang. For a type like int that could allow eligible code for say List<T> to be reused across int, uint, enums,ValueTuple<int>, DateOnly and other types wrapping a single int. The same would go for other primitives like long that have a lot of representationally isomorphic types to share code with.

I understand this cuts into the ability to do runtime intrinsic optimizations (like uints never being negative values etc). I also see how this may complicate codegen as this sharing is only possible when the different instantiations don't actually produce different bodies (i.e. int.Equals will produce different code than DateOnly.Equals), however there will be many methods that don't depend on the generic type's methods at all, just their data representation. For an initial good enough experiment it might just be sufficient to add a stage that aliases same-type method instantiations by their body being byte for byte identical?

Such a stage may also help with sharing code across all instantiations when it doesn't make use of the generic context at all.
IIRC there is already some global deduplication mode (which impacts stack traces) but only sharing per generic type seems to be more suitable to be enabled by default?

I can see the theoretical version of these things working but I'm obviously not sure what this would mean more concretely. (and the practical problems flowing from this, which I'm surely glossing over here)

Conclusion

If we're really serious about NativeAOT being 'effortlessly' competitive (so no crazy authoring) these issues must be explored. If only just to understand the problematic elements better.

All in all it's been challenging to keep size down to acceptable levels in this particular area of generics, async and serializer-like code.

@DamianEdwards Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?

The text was updated successfully, but these errors were encountered:

NinoFloris · 2023-03-24T21:43:17Z

The details below dive a bit deeper into some of the async overhead workarounds, this may be useful for other NativeAOT code authors. I've folded it to keep the issue focussed.

Workarounds

One of the hacks I've used to reduce size of async methods in generic classes (when they have a single await and no generic return type) can be seen being implemented in this commit NinoFloris/Slon@bcd8572

The basic steps come down to:

Create a non generic base class for the generic class
Define in it the async method which would otherwise live in the generic class, it should be taking the Task<T> value to be awaited, but upcasted to Task, the return type of this method should match the generic class method return type.
Define an abstract method to post the awaited task back to + any arguments that may be additionally required to do so.
Then, in the derived generic class:
Implement the abstract and downcast the task back to its generic form, pull out the result and work with it.
Where once was the async method you now call into the base class async method.

if you already inherit from a class you can't modify or when you are implementing a generic struct, an alternative is to move the async method to a static class which then takes a callback delegate as an alternative to the virtual method. This is worse in certain ways, better in others, see https://github.com/NinoFloris/Slon/blob/c7f2b856e35e91d1f025640785fae4172d960f00/Slon/Pg/Converters/CollectionConverter.cs#L242-L255 for an example.

When working with ValueTasks the generic class method should first check whether the ValueTask is completed and skip the base method call if so, saving the alloc and virtual call.

That's basically it, when dealing with ValueTask's that wrap an IValueTaskSource this technique won't work without allocating, AsTask() will have to allocate a Task to proxy the IValueTaskSource. This is where a public version of ValueTask.DangerousCreateFromTypedValueTask could be relevant.

That being said, if that method would exist it wouldn't be an easy pattern to get right. As you can only use an IValueTaskSource once you can't use normal async await code, it would call GetResult on the non generic interface. And by doing so it would free the source without you getting the result. It would instead require you to write your own state machine to call into the virtual/callback after awaiting (some help from the language or some api to simplify doing this could be useful again).

Once there you can proceed as usual, by converting the ValueTask back to its generic form ValueTask<T>.GetAwaiter.GetResult (or ValueTask.Result) now correctly goes through IValueTaskSource<T>.GetResult giving the desired free + result.

ghost · 2023-03-24T22:03:35Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

NativeAOT authoring observations

As I'm continuing to iterate to shave off overhead from the new serializer for Npgsql, and slowly getting more familiar with the code patterns that work well with NativeAOT I wanted to report on my findings so far. I will be using this issue to centralize future observations of a similar nature as well.

Async

One class of issues in particular I want to highlight is the cost of everything relating to async code. Having any of these methods in generic types is one major source of bloat due to their inherent IL type + codegen explosion` multiplied by the number of canonical instantiations.

It might be worthwhile for the C# and runtime team to take a look at what can be done to reduce its footprint, as this interaction is supremely problematic for more casual authors. If there is no significant improvement to be made here it could help to expose relevant tools for authors to reduce it on a case by case basis.

Some of those tools could be for instance having public methods on ValueTask and ValueTask to move from its generic form to non generic and back, allowing us external authors to support pooled IValueTaskSource(T) while doing so.
Tasks have inherent support of up/downcasting to do this. For ValueTask these apis could be used in the same way as up/downcasting Tasks, for the purpose of pushing this bulky async codegen to non generic types. There already is the internal method ValueTask.DangerousCreateFromTypedValueTask as a validation of this being useful internally.

Async Types

Next up I started looking at the general cost of even having apis returning ValueTask types, it's not cheap... Not only do you pay for the Task types but also for ValueTask<T>, IValueTaskSource<T>, ValueTask<T>.ValueTaskSourceAsTask, ValueTaskAwaiter<T> and others.

I have two links here, one before dropping about 20(?) mostly reference type instantiations of ValueTask<T> and one report after doing so. The difference is a hefty 80kb, with methods taking up just 24kb of that difference. The remainder is mostly types and type dictionary metadata.

Before: https://github.com/NinoFloris/Slon/actions/runs/4507705639
After: https://github.com/NinoFloris/Slon/actions/runs/4507808228

The mstats are attached if you want to explore the System.Private.Corelib types in more detail.

Reference Types

Onto the next papercut, reference type instantiations in particular. Now their methods are shared via __Canon, however all their concrete types are still required to exist in full, take a look:

These all add to the binary size, but I'm not sure what they're exactly uniquely adding. Could these EETypes/method tables (what is the correct name of this?) be shared in any way via __Canon? Only keeping a concrete generic context around per reference type instantiation? I understand the latter would always be needed to have correct type testing etc. inside methods.

Improving this situation seems like it may also reduce the previously discussed ValueTask<T> type bloat?

Value Type Code Sharing

Finally I would urge runtime devs to consider what it would take to share code across same size/layout value types, a la gc shapes in golang. For a type like int that could allow eligible code for say List<T> to be reused across int, uint, enums,ValueTuple<int>, DateOnly and other types wrapping a single int. The same would go for other primitives like long that have a lot of representationally isomorphic types to share code with.

I understand this cuts into the ability to do runtime intrinsic optimizations (like uints never being negative values etc). I also see how this may complicate codegen as this sharing is only possible when the different instantiations don't actually produce different bodies (i.e. int.Equals will produce different code than DateOnly.Equals), however there will be many methods that don't depend on the generic type's methods at all, just their data representation. For an initial good enough experiment it might just be sufficient to add a stage that aliases same-type method instantiations by their body being byte for byte identical?

Such a stage may also help with sharing code across all instantiations when it doesn't make use of the generic context at all.
IIRC there is already some global deduplication mode (which impacts stack traces) but only sharing per generic type seems to be more suitable to be enabled by default?

I can see the theoretical version of these things working but I'm obviously not sure what this would mean more concretely. (and the practical problems flowing from this, which I'm surely glossing over here)

Conclusion

If we're really serious about NativeAOT being 'effortlessly' competitive (so no crazy authoring) these issues must be explored. If only just to understand the problematic elements better.

All in all it's been challenging to keep size down to acceptable levels in this particular area of generics, async and serializer-like code.

@DamianEdwards Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?

Author:	NinoFloris
Assignees:	-
Labels:	`untriaged`, `area-NativeAOT-coreclr`, `needs-area-label`
Milestone:	-

jkotas · 2023-03-25T00:56:29Z

Could these EETypes/method tables (what is the correct name of this?) be shared in any way via __Canon?

260 bytes is the size after sharing via __Canon. In general, smaller size implies worse speed. Would you be willing to pay for smaller size by worse startup time or worse throughput?

there is already some global deduplication mode (which impacts stack traces) but only sharing per generic type seems to be more suitable to be enabled by default?

This undocumented mode can be enable by <IlcFoldIdenticalMethodBodies>true</IlcFoldIdenticalMethodBodies>. Enabling it will allow you to measure upper bound of what can be potentially achieved by this optimization. This mode breaks stacktraces and reflection in subtle ways, for both generic and non-generic code.

NinoFloris · 2023-03-25T04:03:05Z

260 bytes is the size after sharing via __Canon. In general, smaller size implies worse time. Would you be willing to pay for smaller size by worse startup time or worse throughput?

There's no sign of a __Canon entry, just a type definition, (non virtual method tables can get trimmed afaik?). (you can view the report here, its the last folded panel https://github.com/NinoFloris/Slon/actions/runs/4507808228)

If the BOTR broadly applies to NativeAOT I'm aware the entire virtual method table is not shared, but I don't quite understand why. Is it really just to have concrete method table pointers on object instance headers? Would the throughput loss mean adding an indirection here? While the startup time increase would be for synthesizing these copies? Could that be ad-hoc on first allocation?

The problem I'm dealing with is keeping code size small while having a reasonable supported set of types baked in, the chances that they'll all be used by the app is small, and the chances they're needed immediately even smaller. User directed rooting - which would certainly be preferable - is not a great story in ADO.NET, but that's a different matter.

To answer your question directly, it might be worth some form of perf loss yes, but it'd have to be quantified first. The bad alternative is working purely with System.Object results everywhere, downcasting to its known type at the edges instead. It gets rid of all the ValueTask types, the CollectionConverter types and all the other type bloat. I have a commit that went down this road but I backed out of it as it just adds more implementation complexity, I hope to avoid it.

Is this issue in any way related? #83438

This undocumented mode can be enable by true.

Yeah that's the one! Thanks, saves a me search :) I'll see what it does.

jkotas · 2023-03-25T04:14:08Z

Would the throughput loss mean adding an indirection here?

Yes, it would mean adding indirection in the method table and an extra instruction in all virtual callsites. It is not clear whether it would be a size win at the end.

While the startup time increase would be for synthesizing these copies?

Correct.

ghost · 2023-03-28T23:44:21Z

Tagging subscribers to 'linkable-framework': @eerhardt, @vitek-karas, @LakshanF, @sbomer, @joperezr, @marek-safar
See info in area-owners.md if you want to be subscribed.

Issue Details

NativeAOT authoring observations

As I'm continuing to iterate to shave off overhead from the new serializer for Npgsql, and slowly getting more familiar with the code patterns that work well with NativeAOT I wanted to report on my findings so far. I will be using this issue to centralize future observations of a similar nature as well.

Async

One class of issues in particular I want to highlight is the cost of everything relating to async code. Having any of these methods in generic types is one major source of bloat due to their inherent IL type + codegen explosion` multiplied by the number of canonical instantiations.

It might be worthwhile for the C# and runtime team to take a look at what can be done to reduce its footprint, as this interaction is supremely problematic for more casual authors. If there is no significant improvement to be made here it could help to expose relevant tools for authors to reduce it on a case by case basis.

Some of those tools could be for instance having public methods on ValueTask and ValueTask to move from its generic form to non generic and back, allowing us external authors to support pooled IValueTaskSource(T) while doing so.
Tasks have inherent support of up/downcasting to do this. For ValueTask these apis could be used in the same way as up/downcasting Tasks, for the purpose of pushing this bulky async codegen to non generic types. There already is the internal method ValueTask.DangerousCreateFromTypedValueTask as a validation of this being useful internally.

Async Types

Next up I started looking at the general cost of even having apis returning ValueTask types, it's not cheap... Not only do you pay for the Task types but also for ValueTask<T>, IValueTaskSource<T>, ValueTask<T>.ValueTaskSourceAsTask, ValueTaskAwaiter<T> and others.

I have two links here, one before dropping about 20(?) mostly reference type instantiations of ValueTask<T> and one report after doing so. The difference is a hefty 80kb, with methods taking up just 24kb of that difference. The remainder is mostly types and type dictionary metadata.

Before: https://github.com/NinoFloris/Slon/actions/runs/4507705639
After: https://github.com/NinoFloris/Slon/actions/runs/4507808228

The mstats are attached if you want to explore the System.Private.Corelib types in more detail.

Reference Types

Onto the next papercut, reference type instantiations in particular. Now their methods are shared via __Canon, however all their concrete types are still required to exist in full, take a look:

These all add to the binary size, but I'm not sure what they're exactly uniquely adding. Could these EETypes/method tables (what is the correct name of this?) be shared in any way via __Canon? Only keeping a concrete generic context around per reference type instantiation? I understand the latter would always be needed to have correct type testing etc. inside methods.

Improving this situation seems like it may also reduce the previously discussed ValueTask<T> type bloat?

Value Type Code Sharing

Finally I would urge runtime devs to consider what it would take to share code across same size/layout value types, a la gc shapes in golang. For a type like int that could allow eligible code for say List<T> to be reused across int, uint, enums,ValueTuple<int>, DateOnly and other types wrapping a single int. The same would go for other primitives like long that have a lot of representationally isomorphic types to share code with.

I understand this cuts into the ability to do runtime intrinsic optimizations (like uints never being negative values etc). I also see how this may complicate codegen as this sharing is only possible when the different instantiations don't actually produce different bodies (i.e. int.Equals will produce different code than DateOnly.Equals), however there will be many methods that don't depend on the generic type's methods at all, just their data representation. For an initial good enough experiment it might just be sufficient to add a stage that aliases same-type method instantiations by their body being byte for byte identical?

Such a stage may also help with sharing code across all instantiations when it doesn't make use of the generic context at all.
IIRC there is already some global deduplication mode (which impacts stack traces) but only sharing per generic type seems to be more suitable to be enabled by default?

I can see the theoretical version of these things working but I'm obviously not sure what this would mean more concretely. (and the practical problems flowing from this, which I'm surely glossing over here)

Conclusion

If we're really serious about NativeAOT being 'effortlessly' competitive (so no crazy authoring) these issues must be explored. If only just to understand the problematic elements better.

All in all it's been challenging to keep size down to acceptable levels in this particular area of generics, async and serializer-like code.

@DamianEdwards Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?

Author:	NinoFloris
Assignees:	-
Labels:	`linkable-framework`, `area-NativeAOT-coreclr`
Milestone:	Future

danmoseley · 2023-03-28T23:46:07Z

@DamianEdwards Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?

@eerhardt are there tasks where the warnings for a library (stage 2 or not) seem relatively well understood and community folks could work through fixing them? or perhaps that would too much parallelism right now.

eerhardt · 2023-03-29T01:14:15Z

are there tasks where the warnings for a library (stage 2 or not) seem relatively well understood and community folks could work through fixing them?

I'm not sure that is what @NinoFloris is asking for. But I will tag the issues for each area in dotnet/aspnetcore#45910 stage 2.a.

Is there a world in which we drive dotnet/aspnetcore#45910 stage 2 efforts across internal and external collaborators more effectively than just github issues? Is that something you're open to?

@NinoFloris - do you have a specific proposal? What other medium besides github issues would you use?

MichalPetryka · 2023-03-31T02:33:16Z

260 bytes is the size after sharing via __Canon. In general, smaller size implies worse speed. Would you be willing to pay for smaller size by worse startup time or worse throughput?

Personally I feel like games are an important usecase for AOTs and there the code size mostly doesn't matter since even if you have 100MB of binaries, you'll likely have multiple gigabytes of assets with it. As such, I believe that all size optimizations that negatively impact performance should be optional (or possible to disable via <IlcOptimizationPreference>Speed</IlcOptimizationPreference>).

MichalStrehovsky · 2023-04-03T02:14:43Z

The size of async is tracked in #79204 and similar.

The valuetype sharing would come at a throughput cost - consider code like this:

runtime/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/ArraySortHelper.cs

Lines 565 to 582 in 8ffad52

    
           [MethodImpl(MethodImplOptions.AggressiveInlining)] // compiles to a single comparison or method call 
        
           private static bool LessThan(ref T left, ref T right) 
        
           { 
        
               if (typeof(T) == typeof(byte)) return (byte)(object)left < (byte)(object)right; 
        
               if (typeof(T) == typeof(sbyte)) return (sbyte)(object)left < (sbyte)(object)right; 
        
               if (typeof(T) == typeof(ushort)) return (ushort)(object)left < (ushort)(object)right; 
        
               if (typeof(T) == typeof(short)) return (short)(object)left < (short)(object)right; 
        
               if (typeof(T) == typeof(uint)) return (uint)(object)left < (uint)(object)right; 
        
               if (typeof(T) == typeof(int)) return (int)(object)left < (int)(object)right; 
        
               if (typeof(T) == typeof(ulong)) return (ulong)(object)left < (ulong)(object)right; 
        
               if (typeof(T) == typeof(long)) return (long)(object)left < (long)(object)right; 
        
               if (typeof(T) == typeof(nuint)) return (nuint)(object)left < (nuint)(object)right; 
        
               if (typeof(T) == typeof(nint)) return (nint)(object)left < (nint)(object)right; 
        
               if (typeof(T) == typeof(float)) return (float)(object)left < (float)(object)right; 
        
               if (typeof(T) == typeof(double)) return (double)(object)left < (double)(object)right; 
        
               if (typeof(T) == typeof(Half)) return (Half)(object)left < (Half)(object)right; 
        
               return left.CompareTo(right) < 0 ? true : false; 
        
           }

If we do valuetype sharing here, what was originally a single instruction for long, now potentially becomes 4 comparisons (long/ulong/nint/nuint all have the same size). There's tons of high-performance code within the framework that assumes this code gets specialized at compile time. We do this especially when perf matters. We wouldn't be able to enable this by default, only if the user says "I care about size more than anything else". It would be a niche feature.

jkotas · 2023-08-07T16:01:05Z

I do not see anything actionable in this issue. The actionable suggestions are tracked in linked issues.

dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 24, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Mar 24, 2023

SingleAccretion added the area-NativeAOT-coreclr label Mar 24, 2023

SingleAccretion removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Mar 24, 2023

agocke added this to the Future milestone Mar 24, 2023

agocke removed the untriaged New issue has not been triaged by the area owner label Mar 24, 2023

danmoseley added the linkable-framework Issues associated with delivering a linker friendly framework label Mar 28, 2023

NinoFloris mentioned this issue Jun 21, 2023

Handler rework npgsql/npgsql#5123

Merged

8 tasks

jkotas closed this as completed Aug 7, 2023

ghost locked as resolved and limited conversation to collaborators Sep 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NativeAOT size conscious authoring observations #83902

NativeAOT size conscious authoring observations #83902

NinoFloris commented Mar 24, 2023

NinoFloris commented Mar 24, 2023

ghost commented Mar 24, 2023

NativeAOT authoring observations

Async

Async Types

Reference Types

Value Type Code Sharing

Conclusion

jkotas commented Mar 25, 2023 •

edited

Loading

NinoFloris commented Mar 25, 2023

jkotas commented Mar 25, 2023

ghost commented Mar 28, 2023

NativeAOT authoring observations

Async

Async Types

Reference Types

Value Type Code Sharing

Conclusion

danmoseley commented Mar 28, 2023

eerhardt commented Mar 29, 2023

MichalPetryka commented Mar 31, 2023 •

edited

Loading

MichalStrehovsky commented Apr 3, 2023

jkotas commented Aug 7, 2023

NativeAOT size conscious authoring observations #83902

NativeAOT size conscious authoring observations #83902

Comments

NinoFloris commented Mar 24, 2023

NativeAOT authoring observations

Async

Async Types

Reference Types

Value Type Code Sharing

Conclusion

NinoFloris commented Mar 24, 2023

ghost commented Mar 24, 2023

NativeAOT authoring observations

Async

Async Types

Reference Types

Value Type Code Sharing

Conclusion

jkotas commented Mar 25, 2023 • edited Loading

NinoFloris commented Mar 25, 2023

jkotas commented Mar 25, 2023

ghost commented Mar 28, 2023

NativeAOT authoring observations

Async

Async Types

Reference Types

Value Type Code Sharing

Conclusion

danmoseley commented Mar 28, 2023

eerhardt commented Mar 29, 2023

MichalPetryka commented Mar 31, 2023 • edited Loading

MichalStrehovsky commented Apr 3, 2023

jkotas commented Aug 7, 2023

jkotas commented Mar 25, 2023 •

edited

Loading

MichalPetryka commented Mar 31, 2023 •

edited

Loading