Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: RuntimeHelpers.GetAssemblyFromRvaSpan API #71268

Closed
Sergio0694 opened this issue Jun 24, 2022 · 7 comments
Closed

[API Proposal]: RuntimeHelpers.GetAssemblyFromRvaSpan API #71268

Sergio0694 opened this issue Jun 24, 2022 · 7 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Jun 24, 2022

Background and motivation

This is a follow up from a conversation with @GrabYourPitchforks on Discord around UTF8 strings, and aims to mitigate the pack of that type as well as extending support for passing readonly PE data around on the heap as a more general feature.

C# has support for easily declaring embedded PE data, via ReadOnlySpan<byte> values. For instance:

ReadOnlySpan<byte> span1 = new byte[] { 1, 2, 3, 4 };
ReadOnlySpan<byte> span2 = "Hello world"u8;

Both result in two spans wrapping embedded PE data. There is a big limitation with this though, in that there's no safe way to then pass this data around over the heap without incurring additional allocations. You can certainly create a custom MemoryManager<byte> and then use Unsafe.AsPointer(ref MemoryMarshal.GetReference(span)) to wrap the pointer to the data, which is fixed, but this doesn't account for assembly unloading, as the fact that underlying RVA field is being referenced will not be seen by the GC. This is especially problematic due to not having an Utf8String type (which will likely not come at all at this point), but it applies to all embedded binary blobs in general. The second you want to pass them around on the heap, you need to just call ToArray() on them and allocate, so you can then pass them as a ReadOnlyMemory<byte>, or you need to give up on safety and just YOLO it with the workaround mentioned above. Neither of these is ideal at all.

We should have a fast, safe, trimming-friendly way to pass embedded PE data around on the heap.

cc. @AaronRobinsonMSFT @tannergooding

API Proposal

The idea is to solve this by adding the following intrinsic API:

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static Assembly GetAssemblyForRvaSpan(ReadOnlySpan<byte> span);
}

This API would resolve the assembly from a span generated from an RVA field, and return it. If the span does not wrap an RVA field, it can just throw. This way, developers would be able to just pass around arbitrary types wrapping both the assembly and the raw pointer to the span data, allowing them to not allocate anything while still making the code safe for assembly unloading.

This could be useful in all sorts of cases where an Utf8String could've been used, and where arrays/memorys are instead used today after calling ToArray() on UTF8 literals, which is not ideal. Other cases where people might want to pass around embedded binary blobs could leverage this too. For instance, I'm currently using the hack mentioned above in ComputeSharp.D2D1, but I'd really like to just switch to a safe approach here instead without having to give up on keeping that code path allocation free.

API Usage

static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
    ReadOnlySpan<byte> data = "Hello world"u8;
    Assembly assembly = RuntimeHelpers.GetAssemblyForRvaSpan(data);
    void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));

    return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
}

Same thing for any other arbitrary embedded binary blob.

Alternative Designs

Not sure if a RuntimeFieldHandle also keeps the source assembly alive. If that's the case we might decide to have an intrinsic to get that from an RVA span, though it might be less useful than having an Assembly object. Not sure which would be easier to implement. I should note, this proposal is not necessarily with the best possible API shape, it's meant to also start a discussion on getting some way to do this though, because this being not doable at all is just really really unfortunate.

Open questions

C# 11 and .NET 7 introduced the ability to also have constant data for primitive types that can account for endianness of a system. This means that in some cases, the returned span might not directly wrap an RVA field. Do we want to support those too somehow, or is the current proposal to only support byte spans enough? We might also just decide to extend support to additional types in the future, so the two don't necessarily have to be done at the same time.

Risks

None that I can see.
Currently developers are forced to either use unsafe code or to waste allocations, so can't be worse than that.
The API would be hidden away in compiler services, so it's not meant for general use anyway.

@Sergio0694 Sergio0694 added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jun 24, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 24, 2022
@ghost
Copy link

ghost commented Jun 24, 2022

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

This is a follow up from a conversation with @GrabYourPitchforks on Discord around UTF8 strings, and aims to mitigate the pack of that type as well as extending support for passing readonly PE data around on the heap as a more general feature.

C# has support for easily declaring embedded PE data, via ReadOnlySpan<byte> values. For instance:

ReadOnlySpan<byte> span1 = new byte[] { 1, 2, 3, 4 };
ReadOnlySpan<byte> span2 = "Hello world"u8;

Both result in two spans wrapping embedded PE data. There is a big limitation with this though, in that there's no safe way to then pass this data around over the heap without incurring additional allocations. You can certainly create a custom MemoryManager<byte> and then use Unsafe.AsPointer(ref MemoryMarshal.GetReference(span)) to wrap the pointer to the data, which is fixed, but this doesn't account for assembly unloading, as the fact that underlying RVA field is being referenced will not be seen by the GC. This is especially problematic due to not having an Utf8String type (which will likely not come at all at this point), but it applies to all embedded binary blobs in general. The second you want to pass them around on the heap, you need to just call ToArray() on them and allocate, so you can then pass them as a ReadOnlyMemory<byte>, or you need to give up on safety and just YOLO it with the workaround mentioned above. Neither of these is ideal at all.

We should have a fast, safe, trimming-friendly way to pass embedded PE data around on the heap.

cc. @AaronRobinsonMSFT @tannergooding

API Proposal

The idea is to solve this by adding the following intrinsic API:

namespace System.Runtime.CompilerServices;

public static class RuntimeHelpers
{
    public static Assembly GetAssemblyForRvaSpan(ReadOnlySpan<byte> span);
}

This API would resolve the assembly from a span generated from an RVA field, and return it. If the span does not wrap an RVA field, it can just throw. This way, developers would be able to just pass around arbitrary types wrapping both the assembly and the raw pointer to the span data, allowing them to not allocate anything while still making the code safe for assembly unloading.

This could be useful in all sorts of cases where an Utf8String could've been used, and where arrays/memorys are instead used today after calling ToArray() on UTF8 literals, which is not ideal. Other cases where people might want to pass around embedded binary blobs could leverage this too. For instance, I'm currently using the hack mentioned above in ComputeSharp.D2D1, but I'd really like to just switch to a safe approach here instead without having to give up on keeping that code path allocation free.

API Usage

static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
    ReadOnlySpan<byte> data = "Hello world"u8;
    Assembly assembly = RuntimeHelpers.GetAssemblyForRvaSpan(data);
    void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));

    return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
}

Same thing for any other arbitrary embedded binary blob.

Alternative Designs

Not sure if a RuntimeFieldHandle also keeps the source assembly alive. If that's the case we might decide to have an intrinsic to get that from an RVA span, though it might be less useful than having an Assembly object. Not sure which would be easier to implement. I should note, this proposal is not necessarily with the best possible API shape, it's meant to also start a discussion on getting some way to do this though, because this being not doable at all is just really really unfortunate.

Risks

None that I can see.
Currently developers are forced to either use unsafe code or to waste allocations, so can't be worse than that.
The API would be hidden away in compiler services, so it's not meant for general use anyway.

Author: Sergio0694
Assignees: -
Labels:

api-suggestion, area-System.Memory

Milestone: -

@AaronRobinsonMSFT
Copy link
Member

I'm not really seeing the benefit here. Is the constant being reference from another assembly?

I think the above could just as easily be written as follows:

static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
    ReadOnlySpan<byte> data = "Hello world"u8;
    Assembly assembly = Assembly.GetExecutingAssembly();
    void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));

    return new RvaSpanMemoryManager(ptr, data.Length, assembly).Memory;
}

Even if that is the case sharing these constants seems odd as it exposes fields, which I thought was not desirable for public API surface.

@jkoritzinsky
Copy link
Member

Another option would be to only expose RvaSpanMemoryManager (likely with a different name like EmbeddedAssemblyDataMemoryManager) and have the "span -> Assembly" step be internal to its implementation. Then we'd avoid exposing this concept publicly for arbitrary spans.

@Sergio0694
Copy link
Contributor Author

"sharing these constants seems odd as it exposes fields, which I thought was not desirable for public API surface."

Could you share some context on this? Assuming the access is safe with the source assembly being properly tracked, why would that not be desireable? As in, the data would be abstracted away with a memory/span anyway, so why would the recommendation be to instead allocate an array and return that instead of the raw embedded data? 🤔

That said, I completely missed the fact in most cases Assembly.GetExecutingAssembly(); would do the trick 😄
I also wouldn't mind Jeremy's proposal of offering some nicer to use wrapper for this to avoid dealing with pointers directly.

@teo-tsirpanis
Copy link
Contributor

teo-tsirpanis commented Jun 24, 2022

What I would prefer is to make Roslyn optimize (ReadOnlyMemory<byte>)new byte[] {2, 2, 6} into emitting an internal memory manager type in the compiled assembly. It is safe, more general (would take advantage of primitives bigger than byte once #60948 is implemented), and more importantly, it needs no runtime support.

@Sergio0694
Copy link
Contributor Author

Given some more thoughts on this following @AaronRobinsonMSFT's comment, and realized we don't even actually need an assembly in the first place (which he also mentioned might be slower to retrieve than copying an array, if small enough), but a Type instance should be enough to root the containing assembly, and that can easily be retrieved from wherever the PE data is being fetched. Which means creating a safe memory manager should be doable just like so:

static ReadOnlyMemory<byte> GetSomeTextAsUtf8Memory()
{
    ReadOnlySpan<byte> data = "Hello world"u8;
    void* ptr = Unsafe.AsPointer(ref MemoryMarshal.GetReference(data));

    return new RvaSpanMemoryManager(ptr, data.Length, typeof(ThisContainingType)).Memory;
}

This is much faster than using Assembly.GetExecutingAssembly() and should be just as safe.

I still like @jkoritzinsky's suggestions of maybe having some type to support this so that developers don't have to worry about this logic and do everything manually (and I also personally would be nice to just have built-in support for passing safe constant data around with no allocation), but that can also be done in a separate, dedicated proposal if we wanted.

Feel free to either close this or repurpose this just for this new memory manager proposal 🙂

@AaronRobinsonMSFT
Copy link
Member

Feel free to either close this or repurpose this just for this new memory manager proposal 🙂

I don't think there is any need for a new API in this case. Either this can be done in a manner that avoids ALC unloading concerns using an allocation and/or this requires an update to ECMA to note the fixed nature of constants.

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jun 28, 2022
@teo-tsirpanis teo-tsirpanis closed this as not planned Won't fix, can't repro, duplicate, stale Jun 28, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jul 28, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Memory
Projects
None yet
Development

No branches or pull requests

4 participants