Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add PipeReader and PipeWriter overloads to JsonSerializer.SerilizeAsync/DeserializeAsync #68586

Open
davidfowl opened this issue Apr 27, 2022 · 33 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@davidfowl
Copy link
Member

davidfowl commented Apr 27, 2022

Background and motivation

When we originally did System.Text.Json there was some discussion about adding overloads to support PipeReader and PipeWriter. These discussion died because we didn't want to decide on forcing every API that had a Stream overload to have one for PipeReader/PipeWriter (some of that is here #28325). ASP.NET Core can save on allocations and excess buffer copies if we had support for these primitives in the JSON serializer. This likely isn't the only change we need to make to get great performance (there are some tweaks to pipelines that can be made to improve buffer sizes), but the idea is to add the APIs so that we can improve this on top.

Today in ASP.NET Core, we use the Stream overloads and:

  • On the Deserialize side, there's no need to further copy that into a buffer before parsing JSON.
  • On the Serialize side, we can avoid, the PooledBufferWriter allocation, and the extra buffer copies into the underlying Stream (which wrap the PipeWriter).

Of course we should measure and see how this materializes but we need to expose APIs to do so (at least privately so that ASP.NET Core can properly plumb them through the chain).

API Proposal

namespace System.Collections.Generic;

public static class JsonSerializer
{
    public static Task SerializeAsync<TValue>(
           PipeWriter utf8Json, 
           TValue value,  
           JsonTypeInfo<TValue> jsonTypeInfo, 
           CancellationToken cancellationToken = default);

    public static Task SerializeAsync<TValue>(
           PipeWriter utf8Json, 
           TValue value, 
           JsonSerializerOptions? options = null, 
           CancellationToken cancellationToken = default);
   
    public static Task SerializeAsync(
            PipeWriter utf8Json,
            object? value,
            JsonTypeInfo jsonTypeInfo,
            CancellationToken cancellationToken = default);
            
    public static Task SerializeAsync(
            PipeWriter utf8Json,
            object? value,
            Type inputType,
            JsonSerializerContext context,
            CancellationToken cancellationToken = default)
    
    public static ValueTask<TValue?> DeserializeAsync<TValue>(
            PipeReader utf8Json, 
            JsonSerializerOptions? options = null, 
            CancellationToken cancellationToken = default);
    
    public static ValueTask<TValue?> DeserializeAsync<TValue>(
            PipeReader utf8Json,
            JsonTypeInfo<TValue> jsonTypeInfo,
            CancellationToken cancellationToken = default);

    public static ValueTask<object?> DeserializeAsync(
            PipeReader utf8Json,
            JsonTypeInfo jsonTypeInfo,
            CancellationToken cancellationToken = default);
            
    public static ValueTask<object?> DeserializeAsync(
            PipeReader utf8Json,
            Type returnType,
            JsonSerializerContext context,
            CancellationToken cancellationToken = default);
}

API Usage

var pipe = new Pipe();
await JsonSerialize.SerilizeAsync(pipe.Writer, new Person { Name = "David" });
await pipe.Writer.CompleteAsync();
var result = await  JsonSerialize.DeserilizeAsync<Person>(pipe.Reader);
await pipe.Reader.CompleteAsync();

Alternative Designs

No response

Risks

No response

@davidfowl davidfowl added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Apr 27, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Apr 27, 2022
@ghost
Copy link

ghost commented Apr 27, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

When we originally did System.Text.Json there was some discussion about adding overloads to support PipeReader and PipeWriter. These discussion died because we didn't want to decide on forcing every API that had a Stream overload to have one for PipeReader/PipeWriter (some of that is here #28325). ASP.NET Core can save on allocations and excess buffer copies if we had support for these primitives in the JSON serializer. This likely isn't the only change we need to make to get great performance (there are some tweaks to pipelines that can be made to improve buffer sizes), but the idea is to add the APIs so that we can improve this on top.

Today in ASP.NET Core, we use the Stream overloads and:

  • On the Deserialize side, there's no need to further copy that into a buffer before parsing JSON.
  • On the Serialize side, we can avoid, the PooledBufferWriter allocation, and the extra buffer copies into the underlying Stream (which wrap the PipeWriter).

Of course we should measure and see how this materializes but we need to expose APIs to do so (at least privately so that ASP.NET Core can properly plumb them through the chain).

API Proposal

namespace System.Collections.Generic;

public static class JsonSerializer
{
    public static Task SerializeAsync<TValue>(PipeWriter utf8Json, TValue value, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default);
    public static ValueTask<TValue?> DeserializeAsync<TValue>(PipeReader utf8Json, JsonSerializerOptions? options = null, CancellationToken cancellationToken = default)
}

API Usage

var pipe = new Pipe();
await JsonSerialize.SerilizeAsync(pipe.Writer, new Person { Name = "David"  });
await pipe.Writer.CompleteAsync();
var result = await  JsonSerialize.DeserilizeAsync<Person>(pipe.Reader);
await pipe.Reader.CompleteAsync();

Alternative Designs

No response

Risks

No response

Author: davidfowl
Assignees: -
Labels:

api-suggestion, area-System.Text.Json, untriaged

Milestone: -

@davidfowl davidfowl added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Apr 27, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Apr 27, 2022
@davidfowl
Copy link
Member Author

cc @stephentoub

@eiriktsarpalis
Copy link
Member

If we are reviewing this, it should probably be looked at in conjunction with #63795. I'm not sure if STJ should be taking a dependency on pipes, but it should certainly be possible to write extensions targeting pipes or any other streaming abstraction.

@eiriktsarpalis eiriktsarpalis added this to the Future milestone Apr 27, 2022
@stephentoub
Copy link
Member

Of course we should measure and see how this materializes but we need to expose APIs to do so (at least privately so that ASP.NET Core can properly plumb them through the chain).

Nothing need be merged for such experimentation. A motivated developer can do it all in the privacy of their own machine.

Until there's meaningful data demonstrating this is a significant improvement, I don't want to pursue this, for the same reasons we've not done so in the past.

@stephentoub stephentoub added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 27, 2022
@davidfowl
Copy link
Member Author

If we are reviewing this, it should probably be looked at in conjunction with #63795. I'm not sure if STJ should be taking a dependency on pipes, but it should certainly be possible to write extensions targeting pipes or any other streaming abstraction.

Async converters aren't needed.

Nothing need be merged for such experimentation. A motivated developer can do it all in the privacy of their own machine.
Until there's meaningful data demonstrating this is a significant improvement, I don't want to pursue this, for the same reasons we've not done so in the past.

That's fair. I'll bring this back for API review when I have those numbers.

@eiriktsarpalis
Copy link
Member

If we are reviewing this, it should probably be looked at in conjunction with #63795. I'm not sure if STJ should be taking a dependency on pipes, but it should certainly be possible to write extensions targeting pipes or any other streaming abstraction.

Async converters aren't needed.

The linked issue title might be slightly misleading, but the proposal concerns exposing resumable serialization in the STJ public APIs. This starts with extending converters of course, but it would include exposing resumable TrySerialize and TryDeserialize methods in the JsonSerializer class. I expect it should be relatively straightforward to write a Pipes extension on top of such primitives.

@davidfowl
Copy link
Member Author

So the idea would be to implement these APIs in another package outside of the JsonSerializer class on another static class? While it would be nice to be able to do that, it would be nicer to have these on the main type. I think its fine to discuss dependency issues but that's not really a big problem.

@davidfowl
Copy link
Member Author

Maybe we can do the QUIC model?:

  • Add these APIs to the public surface area but not have them be in the ref assembly. This means we need some way of having a pipelines dependency so we can do this testing.
  • Flow these changes to ASP.NET Core, so they can be used in the public API surface.
  • Measure all the things, make tweaks etc.

Alternatively, we can build a new experimental package that references System.Text.Json and has IVT so that it can use the internal APIs to build new static serializer methods that add the pipelines API.

@stephentoub I am motivated to add these APIs but I want to reduce the friction and want to make it so that we can experiment with these changes and get benchmarks in a reasonable way. I'd love to help figuring out the most effective way to do that when changes cut across repos like this.

@stephentoub
Copy link
Member

We have dotnet/runtimelab. If we want to do pure experiments and merge and iterate on them, we can add them there. If we want to add a dotnet/aspnetcorelab, or have branches in dotnet/aspnetcore that consume dependencies from dotnet/runtimelab, we can do that. But I want to avoid churning the production code in dotnet/runtime for things that are pure experimentation. QUIC is a different beast: it's a major piece of work, that's fully intended to ship, that's used by production code in .NET 6 even if itself didn't expose public APIs, etc.

@davidfowl
Copy link
Member Author

So your suggestion for this change is to use dotnet/runtimelab to add 2 APIs to the JsonSerializer, and then make an aspcorelab (we have a labs repo already but its not a fork today) that consumes this change so we can make more public API changes. Feels like using a bazooka to kill an ant 😄. I'll spend some time investigating this. But you're right this experiment isn't the same level as QUIC, UTF8String, or native AOT. It's 2 APIs that we need to add.

@stephentoub
Copy link
Member

So your suggestion for this change is to use dotnet/runtimelab to add 2 APIs to the JsonSerializer, and then make an aspcorelab (we have a labs repo already but its not a fork today) that consumes this change so we can make more public API changes.

I'm suggesting if the goal is to make end-to-end experimentation easier that we make end-to-end experimentation easier, not limited to this specific case.

@davidfowl
Copy link
Member Author

That's fair, then I'd suggest runtimelab should contain ASP.NET Core. The 2 repos should be merged into one for experimentation purposes. IMO that's a more sane approach and would allow quicker experimentation.

@davidfowl
Copy link
Member Author

@eiriktsarpalis I'll make a branch with these API that I'd like your eyes on to make sure I didn't do anything dumb. I'll see if I can get some basic dependency flow going for this change so I can test it out in a meantime while we figure out the bigger labs changes.

@gragra33
Copy link

Is there any update on this API Proposal?

@davidfowl
Copy link
Member Author

Moving this out of future to 9.0.0. I'll update the API design with more overloads for JsonSerializationContext

@davidfowl
Copy link
Member Author

davidfowl commented Nov 22, 2023

The recommendation for this to go into the System.Text.Json assembly and for System.IO.Pipelines to go into the shared framework as a result.

Related to #28760

@davidfowl davidfowl added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Nov 22, 2023
@davidfowl davidfowl added api-suggestion Early API idea and discussion, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Nov 22, 2023
@davidfowl
Copy link
Member Author

davidfowl commented Nov 22, 2023

For the writing side, there's also a discussion around adding FlushAsync to IBufferWriter to make it usable for async IO.

public interface IBufferWriter<T>
{
+    ValueTask FlushAsync(CancellationToken cancellationToken = default) => default;
}

This would be a DIM and would only be implemented on .NET 9+. That would make the SerializeAsync overloads more general purpose:

public class JsonSerializer
{
public static class JsonSerializer
{
+     public static Task SerializeAsync<TValue>(
+      IBufferWriter<byte> utf8Json, 
+       TValue value, JsonSerializerOptions? options = null, 
+       CancellationToken cancellationToken = default);
   
+    public static Task SerializeAsync(
+        IBufferWriter<byte> utf8Json,
+        object? value,
+       JsonTypeInfo jsonTypeInfo,
+       CancellationToken cancellationToken = default);
            
+     public static Task SerializeAsync(
+        IBufferWriter<byte> utf8Json,
+         object? value,
+         Type inputType,
+         JsonSerializerContext context,
+        CancellationToken cancellationToken = default)
}

We could also consider:

interface IAsyncBufferWriter<T> : IBufferWriter<T>
{
    ValueTask FlushAsync(CancellationToken cancellationToken = default);
}

Since It's likely that most code paths with IBufferWriter<T> are synchronous and this makes it a bit more obvious that the code path might flush.

@eiriktsarpalis
Copy link
Member

Sharing from offline conversation: adding PipeReader based overloads would require using the ReadOnlySequence<byte> reading capability of Utf8JsonReader. It should be noted that this implementation isn't being used anywhere else within STJ and is known to have reliability issues:

  1. Resetting a Utf8JsonReader at a certain point causes discrepancy in BytePositionInLine #30751
  2. Discrepancies in Utf8JsonReader between single- and multi-segment modes for invalid JSON #30706
  3. Improve multi-segment Utf8JsonReader test coverage for ReadOnlySequence #27949

There are likely more latent bugs to be discovered here. Non-allocating converters need to use special logic for handling ReadOnlySequence so there is a chance there could be latent bugs, both in our built-in converters and user-defined custom converters.

If and when aspnetcore switches to PipeReader-based methods, we should make sure all these issues have been ironed out to avoid regressing users.

@gragra33
Copy link

I have a PipeReader implementation and found it very reliable on very large (GB) streams. I would love to retire my implementation for an official one. Here: [Utf8JsonAsyncStreamReader

@benaadams
Copy link
Member

We have a similar issue here with a chain of custom converters and large amounts of data (0xHEX rather than base64); one of the additional issues is even doing an await JsonSerializer.SerializeAsync while in the converters it builds up the entire tree of Json in that property to memory rather than flushing to the underlying Stream

The only way around this currently seems to be to base the serialization on an PipeWriter and use the IBufferWriter.Advance method to flush the data to the output; though this has potential dangerous sync/async interactions.

For a Kestrel output its more straightforward as it has handling for this sort of behaviour to Http pipeline, but we also need to support it via WebSocket and IpcSocket which don't so will have to construct a PipeWriter over their streams

@gragra33
Copy link

gragra33 commented Dec 2, 2023

@benaadams Have a look at the Utf8JsonAsyncStreamReader. This was designed to work with any stream. You could write the WebSocket/IpcSocket data to a MemoryStream and then use our custom reader to deserialize.

@benaadams
Copy link
Member

@gragra33 am going the other way and writing out; also blow the limits of MemoryStream if buffering to it

@gragra33
Copy link

gragra33 commented Dec 3, 2023

@benaadams You won't blow out memory if done correctly. There is a link to an article with sample code and data so you can try it out. The samples provided use eBay data as it was the largest public stream available for the purposes of demonstration. The samples provided also demonstrate both file and web streams, both raw and zipped.

FYI, we're processing JSON streams over 10GB in size.

@gragra33
Copy link

gragra33 commented Dec 4, 2023

It should be noted that this implementation isn't being used anywhere else within STJ and is known to have reliability issues

@eiriktsarpalis The issue I found was with the PipeReader and variable bytes returned. When using memory pooling, all blocks must be the same size. If they're not, null values are backfilled, so when you merge all chunks, they're not correctly aligned and get filled with invalid data. Instead, I use a MemoryStream to buffer the raw bytes before deserializing. I have made it as efficient as possible, and fully benchmarked. It works reliably on data streams that are over 10GB in size however I believe that it could be improved. I am using ReadOnlySequence, ReadOnlySpan, etc fully asynchronously.

@terrajobst
Copy link
Member

terrajobst commented Dec 12, 2023

We had a quick conversation among @davidfowl @halter73 @eiriktsarpalis and myself. Points discussed:

  1. Fuzzing. If we have System.Text.Json depend on pipelines because of our use in Kestrel, we likely want to plan around additional fuzzing as well.

  2. Layering. In order to depend on pipelines we'll have to move System.IO.Pipelines into the shared framework which we can't easily do with [Experimental] attribute. We could do unnatural gymnastics with packages on top and IVT (or invent an abstraction) but we'd rather not and instead have a conversation around layering. Specifically, is it sufficient to move System.IO.Pipelines into the shared framework or do we end up with having to move it to corlib anyway? It's around 200kb so it's worth thinking about. It seems we believe that moving it into the shared framework is sufficient.

  3. Versatility Are there any other components that likely need to depend on pipelines? What about networking, such as QUIC?

@bartonjs
Copy link
Member

bartonjs commented Feb 6, 2024

Video

  • We kept all existing overloads of SerializeAsync/DeserializeAsync (and added the missing one)
  • We also feel that DeserializeAsyncEnumerable is in scope, so it was added as well.
  • Double check the nullable annotations, and linker annotations.
  • Congratulations, System.IO.Pipelines.dll for moving into the shared runtime.
namespace System.Collections.Generic;

public static class JsonSerializer
{
    public static Task SerializeAsync<TValue>(
           PipeWriter utf8Json, 
           TValue value,  
           JsonTypeInfo<TValue> jsonTypeInfo, 
           CancellationToken cancellationToken = default);

    public static Task SerializeAsync<TValue>(
           PipeWriter utf8Json, 
           TValue value, 
           JsonSerializerOptions? options = null, 
           CancellationToken cancellationToken = default);
   
    public static Task SerializeAsync(
            PipeWriter utf8Json,
            object? value,
            JsonTypeInfo jsonTypeInfo,
            CancellationToken cancellationToken = default);
            
    public static Task SerializeAsync(
            PipeWriter utf8Json,
            object? value,
            Type inputType,
            JsonSerializerContext context,
            CancellationToken cancellationToken = default);

    public static Task SerializeAsync(
            Stream utf8Json,
            object? value,
            Type inputType,
            JsonSerializerOptions? options = null,
            CancellationToken cancellationToken = default);
    
    public static ValueTask<TValue?> DeserializeAsync<TValue>(
            PipeReader utf8Json, 
            JsonSerializerOptions? options = null, 
            CancellationToken cancellationToken = default);
    
    public static ValueTask<TValue?> DeserializeAsync<TValue>(
            PipeReader utf8Json,
            JsonTypeInfo<TValue> jsonTypeInfo,
            CancellationToken cancellationToken = default);

    public static ValueTask<object?> DeserializeAsync(
            PipeReader utf8Json,
            JsonTypeInfo jsonTypeInfo,
            CancellationToken cancellationToken = default);
            
    public static ValueTask<object?> DeserializeAsync(
            PipeReader utf8Json,
            Type returnType,
            JsonSerializerContext context,
            CancellationToken cancellationToken = default);

    public static ValueTask<object?> DeserializeAsync(
           Stream utf8Json,
           Type returnType,
           JsonSerializerOptions? options = null,
           CancellationToken cancellationToken = default);

    public static IAsyncEnumerable<TValue> DeserializeAsyncEnumerable<TValue>(
            Stream utf8Json,
            JsonSerializerOptions? options = null,
            CancellationToken cancellationToken = default);

    public static IAsyncEnumerable<TValue> DeserializeAsyncEnumerable<TValue>(
            Stream utf8Json,
            JsonTypeInfo<TValue> jsonTypeInfo,
            CancellationToken cancellationToken = default);

}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 6, 2024
@steveharter
Copy link
Member

FWIW this was prototyped when STJ was just being created, before being removed due to the assembly dependency on System.IO.Pipelines:
JsonSerializer.Read.Pipe.cs
JsonSerializer.Write.Pipe.cs

@BrennanConroy
Copy link
Member

SerializeAsync overloads have been added. Deserialize will need to add more validation due to heavier usage of ReadOnlySequence which hasn't had as much testing in Json.

@eiriktsarpalis
Copy link
Member

Moving to 10.0.0 since it's unlikely we'll get to PipeReader support for .NET 9.

@eiriktsarpalis eiriktsarpalis modified the milestones: 9.0.0, 10.0.0 Jun 24, 2024
@julealgon
Copy link

This whole Stream vs PipeReader/Writer overloads thing... has the team considered at one point that .NET might be lacking some sort of underlying abstraction, or that perhaps the Stream abstraction should be reconsidered altogether?

Whenever I see stuff that always takes the same pairs of overloads like this it looks like a design red flag to me.

Asking out of curiosity (I don't know if there are any proposals out there discussing this already, if so please share them).

@FLAMESpl
Copy link

FLAMESpl commented Sep 16, 2024

Hi, would new serialization API work in pair with IAsyncEnumerables (perhaps as TValue value or object? value parameters)?

I am interesed in solution of streaming data directly from sql database to one microservice and then to another.

@eiriktsarpalis
Copy link
Member

@FLAMESpl All SerializeAsync methods should support IAsyncEnumerable values, yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

No branches or pull requests