Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate AsyncLocalStorage with EventEmitter #33723

Closed
puzpuzpuz opened this issue Jun 4, 2020 · 30 comments
Closed

Integrate AsyncLocalStorage with EventEmitter #33723

puzpuzpuz opened this issue Jun 4, 2020 · 30 comments
Labels
async_hooks Issues and PRs related to the async hooks subsystem.

Comments

@puzpuzpuz
Copy link
Member

puzpuzpuz commented Jun 4, 2020

Is your feature request related to a problem? Please describe.

I'm going to port this library to AsyncLocalStorage. As a part of this migration I need to make ALS play nicely with events emitted by request and response objects. Current version of the library uses ns.bindEmitter() method available in cls-hooked (it's also present in continuation-local-storage).

Describe the solution you'd like

It's enough to add .bindEmitter(emitter, store) method into ALS:

const store = { foo: 'bar' };
als.bindEmitter(req, store);

req.on('close', () => {
  console.log(als.getStore()); // prints "{ foo: 'bar' }"
});

Describe alternatives you've considered

Of course, this integration can be implemented as a user-land module. But I believe that this will be a common case among ALS users, so it's probably worth to consider integrating ALS with EventEmitter in the core.

As more of such changes can bloat ALS API, I'd like to hear opinions from @nodejs/async_hooks before starting to work on the implementation.

Update. The consensus is to add something like AsyncResource#bind() method that would allow to do the following:

als.run({ foo: 'bar' }, () => {
  AsyncResource res = new AsyncResource('foobar');
  req.on('close', res.bind(() => {
    console.log(als.getStore()); // prints "{ foo: 'bar' }"
  }));
});
@puzpuzpuz puzpuzpuz added the async_hooks Issues and PRs related to the async hooks subsystem. label Jun 4, 2020
@benjamingr
Copy link
Member

benjamingr commented Jun 4, 2020

The goal of the method is to set the context on an existing emitter?

I'm a bit confused about why this is needed for the server case, wouldn't the request handler itself already run in the context?

Edit: closed (and reopened) due to butterfingers.

@benjamingr benjamingr reopened this Jun 4, 2020
@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jun 4, 2020

@benjamingr

The goal of the method is to set the context on an existing emitter?

Yes, that's right.

I'm a bit confused about why this is needed for the server case, wouldn't the request handler itself already run in the context?

The handler itself is run in the context, but some of events emitted for req/res may be run in a different context. Here is a simple test that illustrates the problem:

'use strict';
require('../common');
const assert = require('assert');
const { AsyncLocalStorage } = require('async_hooks');
const http = require('http');

const asyncLocalStorage = new AsyncLocalStorage();
const server = http.createServer((req, res) => {
  asyncLocalStorage.run(42, () => {
    req.on('close', () => {
      assert.strictEqual(asyncLocalStorage.getStore(), 42); // fails
    });
    assert.strictEqual(asyncLocalStorage.getStore(), 42);
    res.end('ok');
  });
});

server.listen(0, () => {
  http.get({ host: 'localhost', port: server.address().port }, () => {
    server.close();
  });
});

Edit: closed (and reopened) due to butterfingers.

No problems at all. Happens with everyone.

@Flarna
Copy link
Member

Flarna commented Jun 4, 2020

@puzpuzpuz Have you investigated where the context gets lost here?
It could be a bug somewhere in http implementation that wrong asyncIds are used somewhere resulting in new roots.

Besides the HTTP example I agree that this seems helpful - in special as there are a lot user space modules out there using emitters but don't care that much about async context used.

I'm not sure if binding a complete emitter is the right solution. Maybe this should at least allow to limit it to some events.

@targos
Copy link
Member

targos commented Jun 4, 2020

Wouldn't it make more sense to expose something that allows the user to bind any function / object method instead of specifically EventEmitter#emit ?

@puzpuzpuz
Copy link
Member Author

@puzpuzpuz Have you investigated where the context gets lost here?
It could be a bug somewhere in http implementation that wrong asyncIds are used somewhere resulting in new roots.

I don't think the same context for req/res's event listeners as for the request listener was even guaranteed, was it?

Some of them are emitted as a part of socket.on() listeners. See this one function as an example:

function abortIncoming(incoming) {

I'm not sure if binding a complete emitter is the right solution. Maybe this should at least allow to limit it to some events.

An optional filter argument for events could probably help.

@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jun 4, 2020

Wouldn't it make more sense to expose something that allows the user to bind any function / object method instead of specifically EventEmitter#emit ?

Probably, yes. Something similar to ns.bind() from cls-hooked might be even better than the .bindEmitter() method. With this approach user code would look like this one:

const store = { foo: 'bar' };
als.bind(req.emit, store);

req.on('close', () => {
  console.log(als.getStore()); // prints "{ foo: 'bar' }"
});

@Flarna
Copy link
Member

Flarna commented Jun 4, 2020

I don't think the same context for req/res's event listeners as for the request listener was even guaranteed, was it?

Not the same context but I would assume that context passing done via async hooks is able to link them similar as link via e.g. setTimeout works.
But sure, if events emitted from the net socket are just forwarded it's mostly clear that context is lost/wrong as socket doesn't know the current http request and may be used for several requests. It would be needed to handle the event in http first and reemit with correct context. More or less a userspace queuing problem.

@addaleax
Copy link
Member

addaleax commented Jun 4, 2020

So … trying to sort my thoughts here, and I’ll apologize for the long comment in advance.

Firstly, the idea to bind an existing EventEmitter to a specific ALS store seems odd, especially considering that an EventEmitter can have multiple consumers and those other consumers might not want that binding to a specific ALS store.

Secondly, unless I’m misunderstanding something, it feels like ALS is not the right level to approach this, because the problem is that the async tracking does not work as expected. In particular, in the example in #33723 (comment), the problem is that .on() is basically creating a new async operation, but it is not registered as such.

Thirdly, we bascially kind of ran into this kind of problem with domain as well. The solution for domain was (is) to basically monkey-patch EventEmitters so that they behave like AsyncResources where all events were emitted in the EE’s async scope.

That makes it seem to me like we want to do any of:

  1. Do the domain-like thing, and make EEs behave like AsyncResources. That would not solve the issue in the example above, though.
  2. Do a variant of that, which would be equivalent to making EventEmitter.on() create a new AsyncResource and have the corresponding .emit() call run the callback in that AsyncResource’s scope.

As an alternative, or additionally, we could also:

  1. Provide a way to bind the callback to the async scope in which it was created, i.e. req.on('close', async_hooks.bindToCurrentScope(() => { ... })). This could implicitly create an AsyncResource and run the callback in its scope whenever it is called. (Option 2 would basically mean doing this implicitly for ee.on(...).)

If I were to put myself in a position where I didn’t know a lot about Node.js internals, I think Option 2 is something that would make sense to be, because I would expect

process.nextTick(() => doSomething());

and

ee.on('event', () => doSomething());

to behave similarly – i.e. to run the callback in an async context that corresponds to where it was first registered.

I think all of these options would come with some perf impact, but also mean that we need to do less async tracking in the Node.js C++ internals overall, because we can let the wrapper objects (which often are EventEmitters) take care of that in a lot of cases. That might even lead to a net positive impact as far as Node.js-provided resources are concerned. I also think @Qard would like that move away from C++-based tracking, based on what I’ve heard him say 😉

Then, finally, I just had a short conversation with @ronag about how this would work in streams, and basically, the conclusion that I came to was that it would make sense for streams to also use custom AsyncResources to make sure that events coming from the stream are emitted in the correct async context – for example, .destroy() can be called from anywhere and subsequently lead to a 'close' event, which would mean that a .on('close') listener can basically run in any kind of async context that has no relation to when that listener was added. That seams like a special case of the situation for EventEmitters, though.

@Qard
Copy link
Member

Qard commented Jun 4, 2020

@addaleax Yep, idea 3 is basically what I’ve had in mind for awhile. I’m currently working on making a callback trampoline for InternalMakeCallback which I plan on being able to reuse for this sort of scenario to wrap JavaScript-side callbacks and emit the related events as much as possible purely on the JS-side.

By avoiding the barrier cross performance hit we should be able to make the cost of tracking additional relevant barriers like event emitters and streams less significant by relying on V8 to inline what it needs.

@devsnek
Copy link
Member

devsnek commented Jun 4, 2020

I'm also a fan of option 3.

@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jun 4, 2020

@addaleax

So … trying to sort my thoughts here, and I’ll apologize for the long comment in advance.

No need to apologize. Your comment is really valuable. 👍

Firstly, the idea to bind an existing EventEmitter to a specific ALS store seems odd, especially considering that an EventEmitter can have multiple consumers and those other consumers might not want that binding to a specific ALS store.

With the default behavior of EE consumers (listeners) of the same event will be executed within the same context, as the execution happen synchronously when .emit() is called. So, multiple consumers (of the same event, to be precise) will get the same store of a certain ALS instance. The proposed methods change nothing here.

BTW if one of listeners needs a separate store, it can be wrapped with als.run() (or proposed als.bind()).

Secondly, unless I’m misunderstanding something, it feels like ALS is not the right level to approach this, because the problem is that the async tracking does not work as expected. In particular, in the example in #33723 (comment), the problem is that .on() is basically creating a new async operation, but it is not registered as such.

EE itself has nothing to do with async resources, so I'm not sure where it's correct to say ".on() is basically creating a new async operation". The exact async context (resource) depends on each .emit() invocation and can even change between them.

That's why I'm not sure what should be the right level to approach this. But certainly EE API itself is not a problem and shouldn't be changed. But we may consider changing behavior of certain EEs returned by core APIs, like req/res, as users may have false expectations of their integration with async_hooks.

That makes it seem to me like we want to do any of:

  1. Do the domain-like thing, and make EEs behave like AsyncResources. That would not solve the issue in the example above, though.

Yes, that won't solve the discussed issue.

  1. Do a variant of that, which would be equivalent to making EventEmitter.on() create a new AsyncResource and have the corresponding .emit() call run the callback in that AsyncResource’s scope.

That could work if we properly integrate (say, by monkey-patching them) certain EEs used in the core with async_hooks. However, it's probably easier to change behavior of all EEs. In any case that has to be properly documented, so that users know what to expect.

As an alternative, or additionally, we could also:

  1. Provide a way to bind the callback to the async scope in which it was created, i.e. req.on('close', async_hooks.bindToCurrentScope(() => { ... })). This could implicitly create an AsyncResource and run the callback in its scope whenever it is called. (Option 2 would basically mean doing this implicitly for ee.on(...).)

That won't work in my case (and probably in other users's cases). I'm not in control of user code and other middlewares, so I can't force them to wrap EE listeners with something like async_hooks.bindToCurrentScope() function.

If I were to put myself in a position where I didn’t know a lot about Node.js internals, I think Option 2 is something that would make sense to be, because I would expect

process.nextTick(() => doSomething());

and

ee.on('event', () => doSomething());

to behave similarly – i.e. to run the callback in an async context that corresponds to where it was first registered.

I would have such expectations as EEs themselves do not imply async operations (resources), only CPS. But, once again, some of EEs (or all of them?) used in core APIs may be treated in a special way.

That seams like a special case of the situation for EventEmitters, though.

Yes, streams (certain streams used in core APIs, of course) are close to EEs.

@puzpuzpuz
Copy link
Member Author

@addaleax what would you say of .bind() method discussed in #33723 (comment) and #33723 (comment)? It's not coupled with EEs and provides some flexibility for users. Certainly, that's not a replacement of changes in EEs/streams behavior discussed above, but they're rather orthogonal.

@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented Jun 4, 2020

BTW .bind() could be added to AsyncResource instead of ALS. That would make it even more flexible.

Such API is close to async_hooks.bindToCurrentScope() function mentioned above, but it will give the user more control over the context. Thus, it'll be able to solve the original problem.

@Qard
Copy link
Member

Qard commented Jun 4, 2020

I think the correct place for this is in AsyncResource. The functionality is needed at every level of async correlation.

@puzpuzpuz
Copy link
Member Author

Many thanks for your inputs! I'm going to submit a PR for AsyncResource#bind(). Let's continue the conversation over there.

@puzpuzpuz
Copy link
Member Author

Here it goes: #33736

@puzpuzpuz
Copy link
Member Author

Closing this one as it can be implemented via language features and the consensus is to avoid bloating the API.

@addaleax
Copy link
Member

addaleax commented Jun 5, 2020

@puzpuzpuz I’ll reopen this since I’d still like to discuss whether we want to implement some form of built-in async tracking for EventEmitters in general, i.e. option 1 or 2 from above.

@addaleax addaleax reopened this Jun 5, 2020
@ronag
Copy link
Member

ronag commented Jun 5, 2020

Just copying in @jasnell's comment from #33749. I feel it might be relevant here:

EventEmitter is going to be a tricky one if only because of how performance sensitive it is. It absolutely makes sense for an EventEmitter to be tracked as an async resource but when I went through the exercise of making it one it ended up 2x-3x slower in regular use scenarios (streams, http servers, etc). I'm 100% in favor of doing something here but the performance loss problem absolutely needs to be addressed.

While simultaneously highlighting @addaleax's comment from earlier in this thread:

I think all of these options would come with some perf impact, but also mean that we need to do less async tracking in the Node.js C++ internals overall, because we can let the wrapper objects (which often are EventEmitters) take care of that in a lot of cases. That might even lead to a net positive impact as far as Node.js-provided resources are concerned. I also think @Qard would like that move away from C++-based tracking, based on what I’ve heard him say 😉

@puzpuzpuz
Copy link
Member Author

Just copying in @jasnell's comment from #33749. I feel it might be relevant here:

EventEmitter is going to be a tricky one if only because of how performance sensitive it is. It absolutely makes sense for an EventEmitter to be tracked as an async resource but when I went through the exercise of making it one it ended up 2x-3x slower in regular use scenarios (streams, http servers, etc).

@jasnell I wonder what was the approach? Did you try to implicitly create an AsyncResource on eventEmitter.on() call and then wrap the listener with runInAsyncScope() or it was something more complicated? After #32429 AsyncResources don't involve GC-tracking anymore when there are no destroy listeners and that could decrease the overhead, but I'm not sure about the exact improvement.

@jasnell
Copy link
Member

jasnell commented Jul 31, 2020

My apologies for missing the earlier notification from a month ago! Sigh...

Yes, the approach was to create an AsyncResource on eventEmitter.on(), in a generalized way that would enable it for all event emitter instances.

At this time, I'm heavily leaning towards @addaleax's option 3 here.

@eyalroth
Copy link

That won't work in my case (and probably in other users's cases). I'm not in control of user code and other middlewares, so I can't force them to wrap EE listeners with something like async_hooks.bindToCurrentScope() function.

I just want to add how important that this.

Consider this common use case of writing a simple express application and using body-parser to parse the JSON payload:

const express = require('express');
const bodyParser = require('body-parser');

const als = new AsyncLocalStorage();
const app = express();

app.use((req, res, next) => als.run({ requestId: generateId() }, next) );
app.use(bodyParser.json());
app.use((req, res, next) => {
  als.getStore(); // fails
  next();
});

Since body-parser uses event emitters, the requestId from the first middleware is not available in the third.

This extends to many other libraries that use events and streams -- remote calls, db, etc.

@mcollina
Copy link
Member

@eyalroth this is known and it's a problem inside express. If you would like some more details on this:

nestjs/nest#8837
fastify/fastify#3570
fastify/fastify#3571

That's the exact same issue you are facing. The problem is in body-parser and not Node.js. The fix is also linked in the issues and ideally should be added to express too.

@eyalroth
Copy link

@eyalroth this is known and it's a problem inside express. If you would like some more details on this:

@mcollina Interesting.

I tried to replace app.use(bodyParser.json()) with this:

const ee = new EventEmitter();
app.use((req, res, next) => {
  ee.prependOnceListener('foo', next);
  setTimeout(() => ee.emit('foo'), 100);
});

and indeed it worked.

It looks like under the hood body-parser is calling raw-body which only invokes next when an end event is emitted on req. Something like this:

app.use((req, res, next) => {
  req.on('data', () => console.log('data'));
  req.prependOnceListener('end', next);
});

which indeed doesn't work.

Perhaps this breaks something with express which expects all middlewares to complete before the end event on the request?

@mcollina
Copy link
Member

Perhaps this breaks something with express which expects all middlewares to complete before the end event on the request?

No. It's because the 'end' emitted by IncomingMessage is attached to the server context and not the one you are starting now. The only solution is to use AsyncResource.

@eyalroth
Copy link

eyalroth commented Jan 26, 2022

Perhaps this breaks something with express which expects all middlewares to complete before the end event on the request?

No. It's because the 'end' emitted by IncomingMessage is attached to the server context and not the one you are starting now. The only solution is to use AsyncResource.

@mcollina I see. I'm not familiar with AsyncResource and honestly the documentation isn't very clear on its usage.

Is that the intended usage?

const resources = {};

app.use((req, res, next) => {
  const requestId = generateId();
  res.locals.requestId = requestId;
  als.run({ requestId }, () => {
    resources[requestId] = new AsyncResource('foo');
    next();
  });
});

app.use(bodyParser.json());

app.use((req, res, next) => {
  const requestId = res.locals.requestId;
  resources[requestId].runInAsyncScope(() => {
    als.getStore(); // works
    next();
  });
});

If so, I'm assuming that there's no way to guarantee that my code -- that may invoke third-party async/callback functions -- will continue to run in the same execution context? After all, any third party async function might be resolved under the hood by a different execution context, and I have to manually handle each of these cases myself?

@mcollina
Copy link
Member

If so, I'm assuming that there's no way to guarantee that my code -- that may invoke third-party async/callback functions -- will continue to run in the same execution context? After all, any third party async function might be resolved under the hood by a different execution context, and I have to manually handle each of these cases myself?

Either it works or it doesn't, it won't be "random". Most libraries handle this internally, so you don't need to worry about it. Express is a fairly old codebase that was written before none of the mechanism existed and it has not been significantly updated.

@eyalroth
Copy link

eyalroth commented Jan 26, 2022

Either it works or it doesn't, it won't be "random". Most libraries handle this internally, so you don't need to worry about it. Express is a fairly old codebase that was written before none of the mechanism existed and it has not been significantly updated.

@mcollina Yep I didn't mean randomly but more like "it could potentially happen with any library".

Thank you for taking the time to explain all of this!

@mcollina
Copy link
Member

@Qard are you still interested in pursuing this issue or it could be closed?

@Qard
Copy link
Member

Qard commented Jan 26, 2022

Maybe at some point, but I think the intent of this issue is dealt with so can probably close this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
async_hooks Issues and PRs related to the async hooks subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.