Status of running node core benchmarks #127

Open

gareth-ellis opened this issue Aug 4, 2017 · 29 comments

@gareth-ellis
Member

Since the PR was closed once I'd merged in my changes, I figured I'd open this to track current progress on getting this working.

This is a continuation of #58 .

Current status is that I have the job set up; however, it currently fails because Rscript is missing. I have opened nodejs/build#821 to get this installed.

In the meantime I am going to create a branch in benchmarking with a change to not use Rscript, which we can then use for continued testing.

@gareth-ellis
Member Author

gareth-ellis commented Aug 10, 2017

This now seems to be working, and is ready for testing. @mcollina you mentioned you had some PRs to try it on.

At the moment it is set up to test a PR's impact against master.

There are four parameters available:
BRANCH - defaults to master
PULL_ID - the PR number
CATEGORY - the folder of benchmarks you want to run from node/benchmark
FILTER - to limit the selection of benchmarks in that folder.

Please take a look and let me know how it goes!

Example of output:
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/8/

URL to launch from:
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/build?delay=0sec

NOTE: Be careful to understand how long the run you are submitting will take.
Here is a guide that @AndreasMadsen wrote a year or so ago: each number (time in seconds) that you intend to run needs multiplying by 60, as the script will run 30 iterations of the "old" build and 30 of the "new" build.
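As a rough worked example (illustrative numbers only): a benchmark file with six configurations that each run for 5 seconds works out to about 6 × 5 × 60 = 1800 seconds, i.e. roughly half an hour for that one file.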

@mcollina
Member

I'm currently on vacation; I'll get this going ASAP when I come back (1 week).

@AndreasMadsen
Member

ProTip: to multiply by 60, just read the seconds as minutes.

@gareth-ellis
Member Author

Thoughts: currently it's taking ages to clone node core. What are thoughts on cloning once on the machine, then copying that over to the workspace and doing a pull of origin master?

Perhaps we could update the central clone once a week?

Investigating why the run @mcollina submitted failed - will update with findings.

@gareth-ellis
Member Author

gareth-ellis commented Aug 28, 2017

How to: Running core benchmarks on Node.js CI

Current function

The current Jenkins job aims to allow the impact of a PR on a particular branch to be compared. For example, how does PR XXXX impact performance on master?

How to configure the job

To start with, visit https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/build?delay=0sec
Here you have four values to enter:

BRANCH -

the branch or tag to compare against; defaults to master

PULL_ID -

the pull request to test
These two values provide the two copies of Node.js to compare: we build BRANCH, then take a copy, apply the diff of PULL_ID, and build again.

CATEGORY -

This is a category of benchmarks in the node/benchmark folder from core

FILTER -

this further reduces the benchmarks in scope, including only benchmarks where FILTER appears in the filename.

On running, a minimum of 30 iterations of each benchmark/variation is executed per build; once they have all finished, Rscript will summarise the results, suggesting improvements/regressions as well as confidence intervals.
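As an illustration (the values below are hypothetical - substitute the pull request you actually want to test), a run scoped to the net benchmarks whose filenames contain tcp-raw would use:

BRANCH=master
PULL_ID=<number of the pull request under test>
CATEGORY=net
FILTER=tcp-raw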

@mhdawson
Member

mhdawson commented Sep 8, 2017

@gareth-ellis It would probably make sense to PR this as a doc somewhere in the repo.

@gareth-ellis
Member Author

@AndreasMadsen @mcollina @mscdex, if you have some cycles, do you want to have a go with the job using the instructions above and see how you get on?

I've also got a PR open for the above doc, with a view to getting something together that we can send out to a wider audience to get people using this job and core benchmarks (PR in #144).

@AndreasMadsen
Member

I would love to see it in action but right now I don't have anything I would like to benchmark :(

@mcollina
Member

I have started https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/14/

@mcollina
Member

I think the script should automatically rebase the branch, because a difference in the V8 engine could skew the results and provide wrong feedback. Maybe this should be a checkbox, so we can verify across different release lines.

I ran the tests on my laptop in June and got no regressions (nodejs/node#12857 (comment)) - this was at the time of V8 5.9.

However the branch is still using an extremely old V8 engine, and we got some very weird results running it now:

                                                   improvement confidence     p.value
 net/net-c2s-cork.js dur=5 type="buf" len=1024          2.56 %            0.330368195
 net/net-c2s-cork.js dur=5 type="buf" len=128          -5.73 %          * 0.010873902
 net/net-c2s-cork.js dur=5 type="buf" len=16           -0.34 %            0.869100857
 net/net-c2s-cork.js dur=5 type="buf" len=32            3.41 %            0.218088510
 net/net-c2s-cork.js dur=5 type="buf" len=4            -1.04 %            0.491132059
 net/net-c2s-cork.js dur=5 type="buf" len=512          -5.45 %          * 0.032530436
 net/net-c2s-cork.js dur=5 type="buf" len=64           -1.61 %            0.579751020
 net/net-c2s-cork.js dur=5 type="buf" len=8            -2.84 %            0.242407708
 net/net-c2s.js dur=5 type="asc" len=102400            -0.86 %            0.500537810
 net/net-c2s.js dur=5 type="asc" len=16777216          -0.73 %            0.762753668
 net/net-c2s.js dur=5 type="buf" len=102400             0.55 %            0.664537769
 net/net-c2s.js dur=5 type="buf" len=16777216           1.11 %            0.557571579
 net/net-c2s.js dur=5 type="utf" len=102400            -0.39 %            0.822950308
 net/net-c2s.js dur=5 type="utf" len=16777216           1.38 %            0.453838521
 net/net-pipe.js dur=5 type="asc" len=102400            0.41 %            0.712092837
 net/net-pipe.js dur=5 type="asc" len=16777216          0.81 %            0.775137756
 net/net-pipe.js dur=5 type="buf" len=102400            0.76 %            0.466008198
 net/net-pipe.js dur=5 type="buf" len=16777216          0.90 %            0.558028886
 net/net-pipe.js dur=5 type="utf" len=102400            0.54 %            0.671323747
 net/net-pipe.js dur=5 type="utf" len=16777216          3.82 %          * 0.032465465
 net/net-s2c.js dur=5 type="asc" len=102400             0.84 %            0.509765096
 net/net-s2c.js dur=5 type="asc" len=16777216          -2.78 %            0.217136214
 net/net-s2c.js dur=5 type="buf" len=102400             0.11 %            0.925209854
 net/net-s2c.js dur=5 type="buf" len=16777216           0.03 %            0.986008241
 net/net-s2c.js dur=5 type="utf" len=102400            -2.64 %            0.129138848
 net/net-s2c.js dur=5 type="utf" len=16777216           2.26 %            0.088823248
 net/tcp-raw-c2s.js dur=5 type="asc" len=102400         2.51 %            0.292369145
 net/tcp-raw-c2s.js dur=5 type="asc" len=16777216      -0.08 %            0.972487579
 net/tcp-raw-c2s.js dur=5 type="buf" len=102400         6.07 %            0.052679575
 net/tcp-raw-c2s.js dur=5 type="buf" len=16777216       1.04 %            0.562458368
 net/tcp-raw-c2s.js dur=5 type="utf" len=102400         5.49 %            0.217685094
 net/tcp-raw-c2s.js dur=5 type="utf" len=16777216      -0.06 %            0.884432672
 net/tcp-raw-pipe.js dur=5 type="asc" len=102400        0.93 %            0.731389654
 net/tcp-raw-pipe.js dur=5 type="asc" len=16777216    -18.48 %            0.304436728
 net/tcp-raw-pipe.js dur=5 type="buf" len=102400       -0.78 %            0.781388921
 net/tcp-raw-pipe.js dur=5 type="buf" len=16777216     75.96 %            0.165276028
 net/tcp-raw-pipe.js dur=5 type="utf" len=102400       -5.66 %         ** 0.006928205
 net/tcp-raw-pipe.js dur=5 type="utf" len=16777216     -1.05 %            0.669799761
 net/tcp-raw-s2c.js dur=5 type="asc" len=102400         0.48 %            0.848273086
 net/tcp-raw-s2c.js dur=5 type="asc" len=16777216      -1.09 %            0.664140660
 net/tcp-raw-s2c.js dur=5 type="buf" len=102400         5.85 %          * 0.046938660
 net/tcp-raw-s2c.js dur=5 type="buf" len=16777216      -0.38 %            0.884173926
 net/tcp-raw-s2c.js dur=5 type="utf" len=102400        -4.80 %            0.174611084
 net/tcp-raw-s2c.js dur=5 type="utf" len=16777216      -0.71 %            0.364776237
Notifying upstream projects of job completion
Finished: SUCCESS

I kicked off the job again: https://ci.nodejs.org/job/benchmark-node-micro-benchmarks/15/.

It would be good to access the raw data so I can use R to plot graphs. Is that pushed somewhere?

@AndreasMadsen
Member

AndreasMadsen commented Sep 14, 2017

I think the script should automatically rebase the branch, because a difference in the V8 engine could skew the results and provide wrong feedback. Maybe this should be a checkbox, so we can verify across different release lines.

The script checks out $BRANCH (master) and builds node, then cherry-picks $PULL_ID on top and builds again. So it should effectively be rebased.

However the branch is still using an extremely old V8 engine, and we got some very weird results running it now:

What are the weird results? You have to remember that there is a small risk of a false positive, and with many independent tests that risk becomes large. At a 5% significance level (one star), the risk of getting 5 false positives in 44 tests is 4.590962%.

[rplot: binomial distribution of the number of false positives among 44 tests, per significance level]

The plot is only valid for 44 independent tests.

plot code
library(ggplot2)

# probability of exactly k false positives among 44 independent tests,
# at each of the three significance levels shown in the benchmark output
# (e.g. dbinom(5, 44, 0.05) is ~0.0459, the ~4.59 % mentioned above)
c1 = data.frame(
    false.positive=0:44,
    probability=dbinom(0:44, 44, 0.05),
    confidence='*'
)
c2 = data.frame(
    false.positive=0:44,
    probability=dbinom(0:44, 44, 0.01),
    confidence='**'
)
c3 = data.frame(
    false.positive=0:44,
    probability=dbinom(0:44, 44, 0.001),
    confidence='***'
)
c = rbind(c1, c2, c3)

p = ggplot(c, aes(x=false.positive)) +
  geom_bar(aes(y=probability, fill=confidence), stat="identity") +
  facet_grid(confidence ~ .) +
  scale_x_continuous(breaks=0:44, limits=c(-0.5, 8))

print(p)

@mcollina
Member

My bad. So, this is fine.

The script works beautifully then. I think we should document it in the main repo, it would be of great help to everyone.

@gareth-ellis
Member Author

Excellent - I will see what I can do about providing the raw data.

Yes, we plan to get the info out to collaborators so they can start using it, but I thought it would be good to have some people test it first :)

I'll add a bit of a description similar to what @AndreasMadsen wrote to explain how it works, and include in the doc in #144 .

Are there any other issues or quirks you noticed that should be documented?

@mcollina
Member

@gareth-ellis +1 to all that you just said.

Running all the http benchmarks without a filter takes forever (44 hours on my laptop and I only got to 60%). We should not allow that, or at least strongly suggest that users add a filter. Running a job for 3-4 days does not sound right. IMHO the filter might well be mandatory for this script.

Posting the result to the PR would be a huge plus, but I do not know if that is easy to do.

@AndreasMadsen
Member

Running all the http benchmarks without a filter takes forever (44 hours on my laptop and I only got to 60%). We should not allow that, or at least strongly suggest that users add a filter. Running a job for 3-4 days does not sound right. IMHO the filter might well be mandatory for this script.

I agree. Statistically, due to the false-positive risk, it is also not very useful to run that many tests. I think we should just check the expected benchmark time using a precompiled list, as shown earlier in #58 (comment).
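A minimal sketch of that check (R only for illustration; the file name, columns, and threshold here are hypothetical, not something the job has today):

# sketch: refuse a job whose selected benchmarks would exceed a time budget
# "expected-times.csv" is a hypothetical precompiled list with columns
# file (benchmark path, e.g. "net/tcp-raw-pipe.js") and seconds (expected
# time for a single run of that file)
expected <- read.csv("expected-times.csv", stringsAsFactors = FALSE)

category <- "net"      # hypothetical CATEGORY value
filter   <- "tcp-raw"  # hypothetical FILTER value
budget   <- 4 * 3600   # e.g. refuse anything expected to take over 4 hours

selected <- expected[grepl(paste0("^", category, "/"), expected$file) &
                     grepl(filter, expected$file, fixed = TRUE), ]

# 30 runs of the old build plus 30 runs of the new build per benchmark
total <- sum(selected$seconds) * 60
if (total > budget) stop("selected benchmarks exceed the time budget")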

@gareth-ellis
Member Author

@mcollina , for getting to the raw data, if you click on your job, you should be able to open the workspace, and then browse to
e.g. https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/ws/benchmarking/experimental/benchmarks/community-benchmark/node/output.csv.

I'll modify the script to move it to the top level so there is less moving around. I think the gotcha here could be that the workspace may be cleared before each run; I will rename the file so I can be sure. Otherwise, another option may be copying it to a folder on the benchmarking machine and having it synced over to the website for download. We do something similar for the charts (at https://benchmarking.nodejs.org), so it should in theory be possible. Then we could post a link in the job output to where the file will be.
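For reference, plotting from that CSV in R might look roughly like this (a sketch only - the column names binary, filename, configuration and rate are assumptions about the raw output format, so adjust to whatever the file actually contains):

library(ggplot2)

# read the raw per-run data downloaded from the job workspace
dat <- read.csv("output.csv", stringsAsFactors = FALSE)

# one boxplot of ops/sec per configuration, split by old/new binary,
# with a panel per benchmark file
p <- ggplot(dat, aes(x = configuration, y = rate, fill = binary)) +
  geom_boxplot() +
  facet_wrap(~ filename, scales = "free") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(p)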

For posting to a PR, I know there are some bots in the org - are there any I can pass data to, to post to a PR? In the job I already have the PR number, so I have the information to send and where it needs to go.

@mcollina
Member

Both approaches make sense. Even uploading it to a gist on GitHub (together with a chart, maybe) would be fantastic. Something that lasts for a bit of time, with some meta-info (the SHAs being tested).

@gareth-ellis
Member Author

OK, will look into that.

In the meantime, CSVs now get saved in the root of the workspace with a date/timestamp:

https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/ws/

@mhdawson
Member

Are we ready to announce to the rest of the team that they can start using this, or is there any other testing feedback we should get first?

@AndreasMadsen
Member

@mhdawson I don't think there is anything to lose. It is not like making this public can cause anything else to fail or break.

@refack

refack commented Nov 9, 2017

@gareth-ellis is there a reason that RUNS wasn't exposed as a parameter to the job?

@gareth-ellis
Member Author

The main reason was that we were using the default number of iterations set per benchmark. I can add the RUNS parameter if you like; just appreciate that if you set the number too low the results may not be particularly useful, and if you set it too high with enough benchmarks in scope, it could take a long time to run.

@refack

refack commented Nov 9, 2017

I looked at the code and saw that the script will use it, so I added it, because we have a PR (nodejs/node#16888 (comment)) that is very noisy, and AFAICT it would give better results with more runs but shorter iterations.

@AndreasMadsen
Member

AFAICT it would give better results with more runs but shorter iterations

In theory, it will only make a tiny difference. After 30-100 runs, the difference should be less than the numerical precision. In practice, decreasing the number of iterations can change the optimization behavior.
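For intuition (a toy calculation, not job data): the standard error of the mean only shrinks with the square root of the number of runs, so going from 30 to 120 runs merely halves it (1/√30 ≈ 0.18 versus 1/√120 ≈ 0.09 of a single run's standard deviation).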

@refack

refack commented Nov 9, 2017

Look at this distribution:
[plot: unpaired t-test data distribution]
I think all the outliers are cases where the GC kicks in, and that's not what we're trying to look at.

@AndreasMadsen
Member

@refack what are the outliers?

@AndreasMadsen
Member

@refack I have made a long comment in the other issue (nodejs/node#16888 (comment)), hopefully that will clarify things.

@mhdawson
Member

mhdawson commented Dec 4, 2017

Gareth opened nodejs/node#17359 to let collaborators know about it.

@mhdawson
Member

Anything more to be discussed in this issue? The benchmark job is now available and publicized, and the initial issues have been resolved.
