Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve blob reader performance #1179

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

richarddd
Copy link
Contributor

Description of changes:
Improve performance of collecting blob body by usingarrayBuffer()method.

If one or more of the packages in the /packages directory has been modified, be sure yarn changeset add has been run and its output has
been committed and included in this pull request. See CONTRIBUTING.md.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@richarddd richarddd requested review from a team as code owners March 4, 2024 15:19
@richarddd richarddd requested a review from syall March 4, 2024 15:19
@richarddd richarddd changed the title Improve blob reader Improve blob reader performance Mar 4, 2024
if (typeof Blob === "function" && stream instanceof Blob) {
return collectBlob(stream);
return new Uint8Array(await stream.arrayBuffer());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you update the unit tests for this?

what kind of performance diff does this bring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you update the unit tests for this?

Sure

what kind of performance diff does this bring?

It will depend on the payload size. But here we're using native (c++) apis to asynchronously populate an array buffer from the read stream in one go.

In the current implementation we do all these steps:

  1. we first have to base 64 encode the stream to a string (which requires a lot of memory).
  2. Then run a regex on the string to check its a valid base64 string
  3. Then do all this on each 4 bytes
    export const fromBase64 = (input: string): Uint8Array => {

So at least 2-5x is very pessimistic (for browsers).
For node is not as dramatic, but we do run a regex on the entire string to validate its base64.

Then we also have this code that does 95% the same thing:

Its out of scope for this PR but in fact, all base64 decoding that is being used to ultimately return a typed array/byte array can be removed.

@kuhe
Copy link
Contributor

kuhe commented Jun 17, 2024

https://caniuse.com/mdn-api_blob_arraybuffer checks out, and for Node.js if we have Blob (v18) global then it has .arrayBuffer() on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants