Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process: introduce codeGenerationFromString event #35157

Closed
wants to merge 2 commits into from
Closed

process: introduce codeGenerationFromString event #35157

wants to merge 2 commits into from

Conversation

vdeturckheim
Copy link
Member

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

This is an alternative to #34863 following a suggestion from @addaleax.

As for #34863, the goal is to provide a way to listen to unsafe code generation from strings as it is not possible to monkeypatch eval.

The event will only be emitted if there is at least one listener on it and removing all listeners on this event will result in the handler in V8 to never be called. In other words, if there is no listener on this event, there should be no performance impact on calling eval or the Function constructor.

@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Sep 11, 2020
doc/api/process.md Outdated Show resolved Hide resolved
doc/api/process.md Outdated Show resolved Hide resolved
lib/internal/bootstrap/node.js Outdated Show resolved Hide resolved
lib/internal/bootstrap/node.js Outdated Show resolved Hide resolved
@devsnek
Copy link
Member

devsnek commented Sep 11, 2020

This still feels like the wrong abstraction level. What if we did --code-generation-handler ./file.js instead?

@mmarchini
Copy link
Contributor

A flag won't be usable on many serverless environments, being able to set it at runtime (like this PR proposes) seems more appropriate.

@devsnek
Copy link
Member

devsnek commented Sep 11, 2020

My main concern here is that some random library has no business knowing what i'm passing to eval and new Function. My secondary concern is that the event model is wonkey.

How about this:

const release = v8.setCodeGenerationFromStringHandler((code) => {});
// ...
release();

Once called, it can't be called again until release is called.

@devsnek devsnek closed this Sep 11, 2020
@devsnek devsnek reopened this Sep 11, 2020
@mmarchini
Copy link
Contributor

This is an overall problem with the events we have on process, even uncaughtException and unhandledRejections have use cases where the application doesn't want modules to attach handlers to it. Kinda outside the scope of this PR, but could we somehow use policies to limit which modules can attach process handlers?

@guybedford
Copy link
Contributor

Agreed it would be great to see this kind of stuff moved out into loaders or a similar API as a higher level execution context. We previously discussed disabling high-level permissions on process, I would still like to see us deprecate more and more features from the process global over time. Even a new builtin module import for these features could allow policy scoping for who has access to it.

But however it is cut, likely does seem to require a new API away from process and deprecating the old on process.

@devsnek
Copy link
Member

devsnek commented Sep 12, 2020

@mmarchini do you have a constraint that this needs to be a process event? If not, I think we should go with the single-callback api.

@mmarchini
Copy link
Contributor

No constraints, as long as the handler can be defined during runtime I'm fine with it.

@vdeturckheim
Copy link
Member Author

@devsnek

My main concern here is that some random library has no business knowing what i'm passing to eval and new Function.

That's a very fair point. As of today, almost everything in Node.js can be monkeypatched and leak implementation details. For instance. This is true for core modules, but also for built-in such as RegExp.prototype.exec. eval is the only one I know that is un-mokeypatchable. Most of the time (I'd say 90%), using eval is a bad idea. This helps actually monitor how/why it is used and by who.

I am not sure I understand your secondary concern. This current PR does not change the v8 module.

@devsnek
Copy link
Member

devsnek commented Sep 14, 2020

@vdeturckheim are you saying you want random libraries to be able to tune into all evals and whatnot in the main context?

@vdeturckheim
Copy link
Member Author

@devsnek well right now, any library can spy on most of the things other pieces of code do. That's pretty much how APMs work. eval being the main exception to that rule and being at the same time, often something that should be avoided. Providing runtime visibility on calls made to eval to multiple moniroting tool is pretty useful.

@devsnek
Copy link
Member

devsnek commented Sep 14, 2020

@vdeturckheim I can't tell if you're saying you need the event model or not. can you be a bit more direct? afaict the api I proposed above should work.

@vdeturckheim
Copy link
Member Author

@devsnek gotcha, sorry I missunderstood the question ^^. I only need to place one callback for my use case. But I am afraid there might be multiple libraries needing it (say, one RASP for security, one enterprise compliance tool and an APM), and a single callback would make the use of them exclusive. That's why I used an event emitter here.

@devsnek
Copy link
Member

devsnek commented Sep 14, 2020

I guess in my ideal world I would like to see this:

setCallback((code) => {
  security(code);
  compliance(code);
  apm(code);
})

do people think this is unworkable?

@vdeturckheim
Copy link
Member Author

It would require the end user to set this up and assume no library does it without the end user knowing.
Another solution would be to expose the callback so libraries could composes multiple ones but this would end up witht the same result as the event emitter.

Overall, I found out that in the scope of instrumentation, end users don't really want to know about implementation and vendors often do their own things without considering the other ones really.

@devsnek
Copy link
Member

devsnek commented Sep 15, 2020

Well I won't block on this, but it is certainly not ideal.

@vdeturckheim
Copy link
Member Author

@cjihrig thanks for the review, it should be good :)

@Trott
Copy link
Member

Trott commented Sep 16, 2020

@nodejs/process

@benjamingr
Copy link
Member

I don't understand the use case for this - why would I use this?

@vdeturckheim
Copy link
Member Author

@benjamingr there are multiple reasons why one could want to listen to eval. The one I see and would use is to monitor if some pieces of code use eval and dynamically compare it with inputs (say http request) to ensure there is no exploitable/exploited code injection though this method.

@vdeturckheim vdeturckheim added the wip Issues and PRs that are still a work in progress. label Sep 21, 2020
@vdeturckheim
Copy link
Member Author

Adding WIP flag as I need to fix a build issue (warning emitted when compiling the callback)

@vdeturckheim vdeturckheim removed the wip Issues and PRs that are still a work in progress. label Sep 22, 2020
@vdeturckheim vdeturckheim marked this pull request as draft October 2, 2020 08:27
Comment on lines 10 to 13
process.on('codeGenerationFromString', common.mustCall((code) => {
assert.strictEqual(code, 'item.foo++');
}));
Copy link
Member

@watson watson Nov 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @vdeturckheim pointed out in a comment on this PR, there's something not working as intend in this callback:

If an error is thrown inside of this callback, it doesn't always get picked up. In this example code, the callback will get called, but the error will be silenced:

process.on('codeGenerationFromString', (code) => {
  console.log('Hello from callback!')
  throw new Error('boom!!!') // error is silenced
})

eval('1+1')

However, this code will not silence the error:

process.on('codeGenerationFromString', (code) => {
  console.log('Hello from callback!')
  throw new Error('boom!!!') // error is not silenced
})

console.log('result of eval:', eval('1+1'))

I found that queing a microtask after the call to eval is also sufficient:

process.on('codeGenerationFromString', (code) => {
  console.log('Hello from callback!')
  throw new Error('boom!!!') // error is not silenced
})

eval('1+1')
queueMicrotask(() => {})

However, process.nextTick(() => {}) is not.

On top of this, if the listener is removed before the microtask is queued, the error is once again swallowed:

process.on('codeGenerationFromString', (code) => {
  console.log('Hello from callback!')
  throw new Error('boom!!!') // error is silenced
})

eval('1+1')

process.removeAllListeners('codeGenerationFromString')
queueMicrotask(() => {})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial guess was that the error was handled asynchronously and hence if the program ended or the listener was removed before Node.js had a chance to handle the error then it would be lost. However, if I swap the removeAllListeners and the queueMicrotask calls in the last code-example above, the error is actually retained, which I don't think it should according to that theory. So the very act of queueing the microtask - not actually waiting for it to be executed - seems to be enough for it to retain the error before the listener is removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New discovery: If I add a line of invalid code after removing the listeners, I don't get any error complaining about that line either. So it looks like it doesn't get to that line at all:

process.on('codeGenerationFromString', (code) => {
  console.log('Hello from callback!')
  throw new Error('boom!!!') // error is silenced
})

eval('1+1')

process.removeAllListeners('codeGenerationFromString')
bad_code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to compile Node.js with debug mode enabled (./configure --debug && make -j8) and ran it via lldb with breakpoints set to _exit and exit. And it looked like the program just ended normally like a regular program (which also explains the zero exit code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, with a lot of help from @addaleax I finally understand why the error was "silenced":

If V8 doesn't expect a callback given to it to call into JS-land, any JS exception thrown will not be properly handled. So an explicit try-catch has to be added to the C++ code that's calling into JS to ensure that the exception is handled. We have the same problem here:

node/src/node_task_queue.cc

Lines 153 to 159 in fa7cdd6

// V8 does not expect this callback to have a scheduled exceptions once it
// returns, so we print them out in a best effort to do something about it
// without failing silently and without crashing the process.
if (try_catch.HasCaught() && !try_catch.HasTerminated()) {
fprintf(stderr, "Exception in PromiseRejectCallback:\n");
PrintCaughtException(isolate, env->context(), try_catch);
}

I've updated this PR accordingly.

However, this unfortunately means that Node.js will not exit with an uncaught exception and so this isn't easily tested in our tests (except if we check the output to stderr I guess). And it also can be quite confusing for users if they except Node.js to crash.

Copy link
Member Author

@vdeturckheim vdeturckheim Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would process.on('uncaughtexception') still fire?
Thanks a lot for digging on this bug. this is amazing finding!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdeturckheim No it will not. The exception is now technically handled

@guybedford
Copy link
Contributor

Should dynamic import of data: URLs fall under this capacity?

@nodejs-github-bot
Copy link
Collaborator

@vdeturckheim
Copy link
Member Author

vdeturckheim commented Nov 24, 2021

@guybedford maybe 🤔 tbh, this mostly mimics the hook in V8 right now. In theory, but we could try to align on trusted types sinks. But it depends what V8 exposes. It is already a good first step in it's current form imho.

@watson watson marked this pull request as ready for review November 24, 2021 17:54
@devsnek
Copy link
Member

devsnek commented Nov 24, 2021

I think v8 should be updated to handle these exceptions rather than the current hack. Looking at the code I don't think it would be very complex and we should be able to backport it.

@watson
Copy link
Member

watson commented Nov 24, 2021

@devsnek that would be preferable sure (though we already use this hack in other parts of core - so I'm not sure how important it is??).

Do you know what the turnaround time for this sort of thing to be implemented, released and backported in V8 would normally be?

@devsnek
Copy link
Member

devsnek commented Nov 24, 2021

not sure how the thanksgiving holiday changes things but for a small change usually a day or two to get it into v8 and then you just have to backport it into node.

@watson
Copy link
Member

watson commented Nov 25, 2021

@devsnek haha that's great. I was just hoping we were not talking several months ☺️

The 'codeGenerationFromString' event is emitted
when a call is made to `eval` or `new Function`.

Co-authored-by: Thomas Watson <w@tson.dk>
@watson
Copy link
Member

watson commented Nov 25, 2021

@devsnek what's the process for getting that work started in V8 and do you know if there's anything I can do to help?

@watson
Copy link
Member

watson commented Nov 25, 2021

Small discovery which might just be related to crossing the JS/C++ boundary:

If I want to get the filename, line and column number from where eval was called in the JS source, the line/col number is not always 100% correct. Example:

process.on('codeGenerationFromString', (code) => {
  // print the frame in the stack trace where `eval` was called
  console.log('Detected code generation from string', new Error().stack.split('\n')[3].trim())
})

eval('1 + 1') // should be line 6, col 1 - this is also what is printed
someFn(eval('1 + 1')) // should be line 7, col 8 - is actually line 7, col 1
someFn(
   eval('1 + 1') // should be line 9, col 3 - is actually line 8, col 1
)
someFn(42, eval('1 + 1')) // should be line 11, col 12 - is actually line 11, col 1
someFn(
  42,
  eval('1 + 1') // should be line 14, col 3 - is actually line 12, col 1
)

Am I correct in my assumption that this is just an unfortunate side effect of crossing the JS/C++ boundary and that nothing can be done about it?

For reference, this is what the complete stack trace looks like for the first eval call in the code above:

Error
    at process.<anonymous> (my-app.js:3:55)
    at process.emit (node:events:390:28)
    at Object.<anonymous> (my-app.js:6:1)
    at Module._compile (node:internal/modules/cjs/loader:1097:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1149:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47

Update: I found that this is only an issue for eval. If using new Function, Function, Function.apply, Function.call or even indirect eval (where I assign eval to another variable before calling the function), the line/column numbers are correct.

@devsnek
Copy link
Member

devsnek commented Nov 25, 2021

if you want to wait for someone from v8 to do this it probably will be several months. it should be a pretty simple change if you want to do it though, you just basically need to go around adding ASSIGN_RETURN_FAILURE_ON_EXCEPTION to Compiler::ValidateDynamicCompilationSource calls and then maybe a few RETURN_EXCEPTION_IF_SCHEDULED_EXCEPTION elsewhere. i'm available on irc/discord/etc if you want to walk through it.

@guybedford
Copy link
Contributor

@vdeturckheim the issue is just in terms of it being a security property ensuring the property is exhaustive is important, and it could be possible to write a custom interception that shares the same hook as they are the same thing from a user perspective.

@jasnell
Copy link
Member

jasnell commented Nov 25, 2021

the goal is to provide a way to listen to unsafe code generation from strings as it is not possible to monkeypatch

Can I step this back just a bit further and ask why we'd want to do this? What are the use cases here outside of logging or preventing eval by throwing in the handler?

doc/api/process.md Outdated Show resolved Hide resolved
@watson
Copy link
Member

watson commented Nov 25, 2021

@jasnell in our case we want to audit code generation from strings as simply using --disallow-code-generation-from-strings will block all usage while in reality you might want to allow some occurrences based on calling frame

@jasnell
Copy link
Member

jasnell commented Nov 25, 2021

Ok. Do you actually need access to code string being evaluated for that? Or just a stack to get the calling frame?

@vdeturckheim
Copy link
Member Author

@guybedford That what I meant by mentioning trusted types. There are multiple sinks for code generation from strings and the API should/could cover them all, so +1 :D

@jasnell in my use case, I need to know the code to identify its origin, for instance, this is used to detect RCEs by comparing the argument in the evil eval call with an incoming HTTP request to identify wether we are facing injected payloads.

@watson
Copy link
Member

watson commented Nov 30, 2021

@jasnell In my case all I need is the calling stack frame

Co-authored-by: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants