Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assert: improve deepEqual perf for large input #12849

Closed
wants to merge 1 commit into from

Conversation

addaleax
Copy link
Member

@addaleax addaleax commented May 5, 2017

Use a Map instead of an array for checking previously found
cyclic references.

This reduces complexity for an array-of-objects case from
O(n²) to O(n·log n).

Case from the issue, new output:

2, 3
3, 0
5, 0
8, 4
12, 2
18, 1
27, 4
41, 3
62, 1
93, 3
140, 4
210, 10
315, 5
473, 11
710, 16
1065, 25
1598, 40
2397, 75
3596, 119
5394, 150
8091, 216
12137, 296
18206, 419
27309, 681
40964, 944
61446, 1439
92169, 2424

Fixes: #12842

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

@nodejs-github-bot nodejs-github-bot added the assert Issues and PRs related to the assert subsystem. label May 5, 2017
@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

Also, I found some weird behaviour while doing this:

a = {};
assert.deepStrictEqual([a,a], [a,{}]);

This does not throw, but I think it should; do you agree? Would you consider changing it semver-major? If not, I could do it in this PR, otherwise I’ll just wait until after this to avoid conflicts.

@mscdex mscdex added the performance Issues and PRs related to the performance of Node.js. label May 5, 2017
@addaleax addaleax requested review from joyeecheung and Trott May 5, 2017 09:18
Copy link
Contributor

@cjihrig cjihrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if the CI passes.

lib/assert.js Outdated
const actualIndex = memos.actual.indexOf(actual);
if (actualIndex !== -1) {
if (actualIndex === memos.expected.indexOf(expected)) {
memos = memos || {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead doing if (!memos) { memos = ... }?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why not. Updated

lib/assert.js Outdated
if (actualIndex !== -1) {
if (actualIndex === memos.expected.indexOf(expected)) {
if (!memos) {
memors = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

memos

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, sorry. 😄

lib/assert.js Outdated
} else {
memos.actual.map.set(actual, memos.actual.position++);
}
if (memos.expected.map.get(expected) === undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!memos.expected.map.has(expected)?

@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

if (actualIndex === memos.expected.indexOf(expected)) {
if (!memos) {
memos = {
actual: { map: new Map(), position: 0 },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a WeakMap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyeecheung That thought occurred to me and I couldn’t really find a difference … would there be one? Is that going to be more performant?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@addaleax I think the GC pressure would be smaller after we leave the function because the references don't have be traversed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyeecheung ok, updated :)

@refack
Copy link
Contributor

refack commented May 5, 2017

@addaleax I keep being surprised by your energy and dedication 🥇
You could also add an eager chack at https://github.com/nodejs/node/blob/master/lib/assert.js#L408 and only sort if first compare fails. Or if (!(a instanceof Array) || !(b instanceof Array))

@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

@refack That seems like a completely different issue, though? :D

@refack
Copy link
Contributor

refack commented May 5, 2017

It fit under the title "assert: improve deepEqual perf for large input". No need to sort keys of arrays. It is an O(n*log(n)) operation.

@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

@refack Arrays can have extra non-indexed properties, too ;) This might not be as easy as you think. If you think you can improve the situation, feel free to experiment and see if you can get a PR together, and I’ll be happy to review.

@refack
Copy link
Contributor

refack commented May 5, 2017

Challenge accepted 🖖

@Fishrock123
Copy link
Contributor

a = {};
assert.deepStrictEqual([a,a], [a,{}]);

Imo, no. deepStrictEqual is supposed to make sure the props are strictly equal, not if the object is the same memory address.

Copy link
Contributor

@Fishrock123 Fishrock123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI again, since the last one only ran linux(?): https://ci.nodejs.org/job/node-test-pull-request/7899/

@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

@joyeecheung Looks like doing this with WeakMaps is a bit more complicated than just switching the Map type, because at that part of the code it’s possible that one of the values is not a primitive (and it’s tricky to fix that because in for non-strict deepEqual a non-primitive and a primitive can compare equal).

New CI: https://ci.nodejs.org/job/node-test-commit/9684/

@addaleax
Copy link
Member Author

addaleax commented May 5, 2017

 a = {};
assert.deepStrictEqual([a,a], [a,{}]);

Imo, no. deepStrictEqual is supposed to make sure the props are strictly equal, not if the object is the same memory address.

Sorry, huh – what do memory addresses have to do with this? It’s not about the left and the right side of assert.deepStrictEqual having different object identities, it’s about different kinds of cyclic structures comparing equal.

To have a more explicit example:

a = {}; b = {};
assert.deepStrictEqual({key1:a,key2:a},{key1:a,key2:b});

should throw imo, because obj1.key1 === obj1.key2 but obj2.key1 !== obj2.key2.


const actualPosition = memos.actual.map.get(actual);
if (actualPosition !== undefined) {
if (actualPosition === memos.expected.map.get(expected)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing from Array.prototype.indexOf (which uses Strict Equality Comparison) to Map.prototype.get (which uses SameValueZero) can have subtle nuances that might result in incompatibilities, especially around NaN. Not sure if we need to care about it though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think that actually affects the logic here – if it matters at all, it should only affect cases where non-primitives compare deep-equal to primitives, but that doesn’t seem to happen anyway (e.g. assert.deepEqual([0], 0) throws even though [0] == 0).

@Trott
Copy link
Member

Trott commented May 5, 2017

a = {}; b = {};
assert.deepStrictEqual({key1:a,key2:a},{key1:a,key2:b});

One more reason to consider the implementation of deep*() in core a mistake and something that should have been left to userland.

IMO, that should not throw (so the current behavior is correct). The deep part of deepStrictEqual() is sorta kinda "hey, I'm recursive!" So it sees an object (or array, and ideally Map or Set too, basically any collection) as the value and is all like, "OK, let's go through this thing's values and see if they're all strict equal! No keys in both objects, they are deep strict equal!"

So, confusingly, the "strict" part doesn't apply to objects (or arrays or--ideally at least--Maps or Sets). It does mean "don't cast a string to a number" though. Whee!

But I do agree it's confusing and unfortunate.

Even if you disagree, I definitely wouldn't fix it in this PR because I predict a CITGM run will not be kind to the additional change.

Copy link
Member

@Trott Trott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change looks good to me if CI is green and if benchmark/assert doesn't show any perf regressions. (Maybe add a benchmark to show the benefit of this change if existing benchmarks don't already show it?)

@refack
Copy link
Contributor

refack commented May 6, 2017

@joyeecheung
Copy link
Member

joyeecheung commented May 7, 2017

@addaleax hmm wait, do we even need to put primitives into the memos? The memos are for cycles and is it even possible to make a cycle with primitives? (Or do we even care if they do since they are not compared recursively?)

Also FWIW I think the example above should not throw because objects are not primitives and as the docs and the comments stare, for non-primitives we don't compare them by using operators. Objects with the same properties(or no properties), prototypes and tags should be considered equal.

@addaleax
Copy link
Member Author

addaleax commented May 7, 2017

hmm wait, do we even need to put primitives into the memos? The memos are for cycles and is it even possible to make a cycle with primitives?

I don’t think so, for both questions – but I’m not sure it’s trivial to undo that without changing the logic here for really weird edge cases.

Use a Map instead of an array for checking previously found
cyclic references.

This reduces complexity for an array-of-objects case from
O(n²) to O(n·log n).

Fixes: nodejs#12842
PR-URL: nodejs#12849
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@addaleax
Copy link
Member Author

addaleax commented May 7, 2017

Landed in 7e5f500

@addaleax addaleax closed this May 7, 2017
addaleax added a commit that referenced this pull request May 7, 2017
Use a Map instead of an array for checking previously found
cyclic references.

This reduces complexity for an array-of-objects case from
O(n²) to O(n·log n).

Fixes: #12842
PR-URL: #12849
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@addaleax addaleax deleted the assert-deepequal-map branch May 7, 2017 19:43
anchnk pushed a commit to anchnk/node that referenced this pull request May 19, 2017
Use a Map instead of an array for checking previously found
cyclic references.

This reduces complexity for an array-of-objects case from
O(n²) to O(n·log n).

Fixes: nodejs#12842
PR-URL: nodejs#12849
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Jeremiah Senkpiel <fishrock123@rocketmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
@jasnell jasnell mentioned this pull request May 11, 2017
@gibfahn gibfahn mentioned this pull request Jun 15, 2017
3 tasks
@gibfahn
Copy link
Member

gibfahn commented Jun 20, 2017

@addaleax if this is to be backported to v6.x it would need to be benchmarked on that branch. Feel free to backport or just change the label.

Should this be backported to v6.x-staging? If yes please follow the guide and raise a backport PR, if no let me know or add the dont-land-on label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assert Issues and PRs related to the assert subsystem. performance Issues and PRs related to the performance of Node.js.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparing arrays with assert seems to take polynomial time.