Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The http parser is not async, why does it hook into the async_hook machinery? #108

Open
mcollina opened this issue Sep 6, 2023 · 7 comments

Comments

@mcollina
Copy link
Member

mcollina commented Sep 6, 2023

The http parser does quite a bit of additional JS/C++ transitions by being observable via async_hooks. In fact, a wasm build is faster than C++ by avoiding that.

(this is just a hunch, but the benefits should be there).

@H4ad
Copy link
Member

H4ad commented Sep 11, 2023

Can you reference some files you saw?

I did a little exploration on http.Agent and it looks like it exists to fix issues like this nodejs/node#34401

Maybe after nodejs/node#48528 lands we can explore whether it's possible to remove it.

@ronag
Copy link
Member

ronag commented Sep 11, 2023

This is regarding the http server implementation. http.Agent is for the client, is already pretty much inofficially deprecated in favor of undici (which uses WASM).

@Qard
Copy link
Member

Qard commented Sep 11, 2023

Someone at some point decided they wanted to track http parsing as a resource as an http request would otherwise just look like a plain TCP socket. There's no new "resource" object created for an http request, it just upgrades a TCP socket to an HTTP connection. Because of how async_hooks is tied to the concept of a single id mapping to a single resource object the resource object of the TCP socket can't be reused for the HTTP socket events without breaking things. The choice was either don't differentiate HTTP connections in async_hooks or find some other object to be the "resource" of the http request. For whatever reason it was decided that making the http parser the resource was an appropriate solution. That decision then also led to discovering the http.Agent pooling problem on the client request side, but the use of the http parser as the resource object for an HTTP requests happens for both incoming and outgoing HTTP.

This is one of the countless reasons I want us making more purpose-focused APIs so we can mark the entire async_hooks API as deprecated or internal. Unfortunately removing this behaviour will probably break users, which is why I'm pushing so hard on getting AsyncLocalStorage off async_hooks so we can get the biggest use case off of it and not have to care so much anymore about breaking things.

@Flarna
Copy link
Member

Flarna commented Sep 12, 2023

It's all about tracking individual async "operations". For a long time async hooks was the thing to track this. By introducing AsyncLocalStore this was moved a bit towards using the async resource object instead keeping a separate map.
Still init hook is needed for propagation.

The current PR from @Qard moves parts of this propagation work into v8 as there was helpful functionality added.

There is work ongoing to move even more into the JS language (see https://github.com/tc39/proposal-async-context).

But still it's usercode which has to somehow "mark" individual operations (HTTP Requests) on shared resources (Socket). This is currently done via AsyncResource/AsyncLocalStore.run(),AsyncLocalStore.snapshot(),...
Besides HTTP there are quite a lot other locations inside node core which require special handling because v8 is not aware of it (fs, timers,...see async_wrap for details).

There is for example EventEmitterAsyncResource which enhances the sync event emitter api in a way to make it easy usable to signal async continuations from specific other (not current) operations.

@mcollina
Copy link
Member Author

Unfortunately removing this behaviour will probably break users

@Qard Why? All code in http parser is synchronous, so there is literally no reason for it. The only fix that might be required is adding an AsyncResource somewhere.


It's all about tracking individual async "operations".

@Flarna the http parser is not async at all. It's a purely synchronous library.


The fact that the http parser has this behavior implemented in C++ (vs JS) requires more back-and-forth across the C++/JS boundary, slowing everything down. Undici got a performance speed-up by dropping the native http parser and moving to the same code running inside WASM (wasm is slower than native code).

@Flarna
Copy link
Member

Flarna commented Sep 12, 2023

Yes, the http parser is sync but HTTP operations itself are not.

For whatever reason the parser instance was used as the resource object to identify/track an individual HTTP request at some point in time. As the HTTP Parser instance is reused between client and server and across requests it requires extra care unfortunately. One could say it was a bad choice but well that's a bit late now.

I think a much better object to track an HTTP request would be the IncomingMessage instance created for each request. I tried a bit a while ago to refactor HTTP implementation accordingly but failed. Doesn't mean it's impossible, just I found no non breaking solution in a reasonable amount of time (well, I'm not an http lib expert).

Another variant to reduce this binding would be to avoid reusing the HTTP parser instances. This was tried in nodejs/node#24330 which was replaced by nodejs/node#25094. Even nodejs/node#25094 says "do not reuse HTTPParser" in the headline it's misleading because the HTTPParser itself is still reused but on every reuse a new object is actually used as async resource.

I think a comparison to unidic is not that fair here. HTTP in node has a terrible long history and any refactoring holds quite some risk to break a lot. I'm quite sure a rewrite with knowledge of now and without thinking about compatibility issues would result in a significant performance improvement in HTTP. But I guess this issue is the wrong place to discuss a HTTP rewrite :o).

@Qard
Copy link
Member

Qard commented Sep 12, 2023

@mcollina It's not needed for async tracking. It is needed for differentiating raw TCP connections from HTTP connections.

It's probably possible to replace it with an equivalent AsyncResource somewhere else. All I meant to convey was that just plain removing it will likely break users because some might be using the different resource type to identify sockets as HTTP requests so without that we might break them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants