Skip to content
This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

refactor(rome_js_parser): Streamline parser events #2327

Merged
merged 1 commit into from
Mar 31, 2022

Conversation

MichaReiser
Copy link
Contributor

Reduce the size of a single parser event from 16 bytes to 8 bytes each by:

  • Using a NonZeroU32 for the forward parent. The forward parent can never be 0 because it stores the offset from the current event to the start of the "forwarded" parent.
  • Store the start of a node in the CompletedMarker (can't be computed because of forward parents)
  • Remove end from the Finish event and instead retrieve the last token of the node when queried (mainly to produce diagnostics).
  • Only store the end offset for each Token instead of the full range. The end offset is sufficient to reconstruct the length in the tree sink.

This reduces the memory consumption during the parse phase significantly:

  • jquery:
    • Current Bytes: 4.12 MB -> 2.12 MB
    • Max Bytes: 5.82 MB -> 3.82 MB
    • Total Bytes: 8.45 MB -> 4.37 MB
  • tex-chtml-full
    • Current bytes: 33.11 MB -> 17.11 MB
    • Max bytes: 46 MB -> 30 MB
    • Total bytes: 67.78 -> 34.92 MB

It also reduces the max bytes required during the tree sink phase.

The changes do improve parse times but not as much as I did hope for:

group                                    event                                  main
-----                                    -----                                  ----
parser/checker.ts                        1.00     63.6±1.84ms    40.9 MB/sec    1.00     63.8±0.45ms    40.8 MB/sec
parser/compiler.js                       1.00     36.3±0.77ms    28.9 MB/sec    1.03     37.5±0.38ms    27.9 MB/sec
parser/d3.min.js                         1.00     24.3±0.25ms    10.8 MB/sec    1.03     25.1±2.39ms    10.4 MB/sec
parser/dojo.js                           1.00      2.2±0.00ms    30.9 MB/sec    1.03      2.3±0.02ms    30.0 MB/sec
parser/ios.d.ts                          1.00     52.7±0.55ms    35.4 MB/sec    1.19     62.6±0.58ms    29.8 MB/sec
parser/jquery.min.js                     1.00      6.6±0.13ms    12.6 MB/sec    1.05      6.9±0.26ms    12.0 MB/sec
parser/math.js                           1.00     45.4±0.90ms    14.3 MB/sec    1.02     46.3±0.59ms    14.0 MB/sec
parser/parser.ts                         1.00  1525.9±16.73µs    31.7 MB/sec    1.02  1556.6±21.54µs    31.0 MB/sec
parser/pixi.min.js                       1.00     28.9±0.67ms    15.2 MB/sec    1.01     29.3±0.14ms    15.0 MB/sec
parser/react-dom.production.min.js       1.00      9.0±0.01ms    12.7 MB/sec    1.02      9.2±0.05ms    12.5 MB/sec
parser/react.production.min.js           1.00    466.9±1.03µs    13.2 MB/sec    1.03    481.5±3.49µs    12.8 MB/sec
parser/router.ts                         1.00   1186.9±8.65µs    50.4 MB/sec    1.03  1222.2±10.20µs    48.9 MB/sec
parser/tex-chtml-full.js                 1.00     60.5±0.68ms    15.1 MB/sec    1.10     66.4±1.53ms    13.7 MB/sec
parser/three.min.js                      1.00     32.1±0.24ms    18.3 MB/sec    1.03     33.0±0.43ms    17.8 MB/sec
parser/typescript.js                     1.00    279.9±4.87ms    33.9 MB/sec    1.04    292.2±2.93ms    32.5 MB/sec
parser/vue.global.prod.js                1.00     11.4±0.34ms    10.6 MB/sec    1.01     11.5±0.03ms    10.5 MB/sec

Tests

cargo test

@MichaReiser
Copy link
Contributor Author

!bench_parser

@@ -57,7 +52,7 @@ pub fn process(sink: &mut impl TreeSink, mut events: Vec<Event>, errors: Vec<Par
let mut forward_parents = Vec::new();

for i in 0..events.len() {
match mem::replace(&mut events[i], Event::tombstone(TextSize::default())) {
match &mut events[i] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to replace the events with tombstone because the iterator is only moving forward from here

// append `A`'s forward_parent `B`
fp = match mem::replace(&mut events[idx], Event::tombstone(TextSize::default()))
{
fp = match mem::replace(&mut events[idx], Event::tombstone()) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replacing is necessary here because we don't want to visit the start node of a forwarded parent again.

@github-actions
Copy link

Parser conformance results on ubuntu-latest

js/262

Test result main count This PR count Difference
Total 45250 45250 0
Passed 44310 44310 0
Failed 940 940 0
Panics 0 0 0
Coverage 97.92% 97.92% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 39 39 0
Passed 36 36 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.31% 92.31% 0.00%

ts/babel

Test result main count This PR count Difference
Total 584 584 0
Passed 506 506 0
Failed 78 78 0
Panics 0 0 0
Coverage 86.64% 86.64% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 15983 15983 0
Passed 12168 12168 0
Failed 3815 3815 0
Panics 0 0 0
Coverage 76.13% 76.13% 0.00%

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Mar 30, 2022

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 350dfe5
Status: ✅  Deploy successful!
Preview URL: https://f8ef6484.tools-8rn.pages.dev

View logs

@github-actions
Copy link

Parser Benchmark Results

group                                 main                                   pr
-----                                 ----                                   --
parser/checker.ts                     1.06    126.4±1.56ms    20.6 MB/sec    1.00    119.0±2.08ms    21.8 MB/sec
parser/compiler.js                    1.00     73.2±1.10ms    14.3 MB/sec    1.01     74.2±1.98ms    14.1 MB/sec
parser/d3.min.js                      1.04     48.4±0.58ms     5.4 MB/sec    1.00     46.3±0.29ms     5.7 MB/sec
parser/dojo.js                        1.02      4.5±0.00ms    15.4 MB/sec    1.00      4.4±0.00ms    15.7 MB/sec
parser/ios.d.ts                       1.11    114.1±1.01ms    16.4 MB/sec    1.00    103.1±2.72ms    18.1 MB/sec
parser/jquery.min.js                  1.02     13.0±0.03ms     6.3 MB/sec    1.00     12.8±0.06ms     6.5 MB/sec
parser/math.js                        1.03     90.3±1.01ms     7.2 MB/sec    1.00     87.9±0.84ms     7.4 MB/sec
parser/parser.ts                      1.03      3.1±0.00ms    15.6 MB/sec    1.00      3.0±0.00ms    16.0 MB/sec
parser/pixi.min.js                    1.04     56.6±0.76ms     7.8 MB/sec    1.00     54.5±0.44ms     8.0 MB/sec
parser/react-dom.production.min.js    1.02     17.7±0.03ms     6.5 MB/sec    1.00     17.5±0.04ms     6.6 MB/sec
parser/react.production.min.js        1.04    957.1±1.41µs     6.4 MB/sec    1.00   924.6±27.35µs     6.7 MB/sec
parser/router.ts                      1.02      2.5±0.01ms    24.3 MB/sec    1.00      2.4±0.02ms    24.8 MB/sec
parser/tex-chtml-full.js              1.08    124.5±1.93ms     7.3 MB/sec    1.00    114.8±1.22ms     7.9 MB/sec
parser/three.min.js                   1.04     63.5±0.79ms     9.2 MB/sec    1.00     61.3±0.44ms     9.6 MB/sec
parser/typescript.js                  1.03    510.4±6.54ms    18.6 MB/sec    1.00    495.2±7.10ms    19.2 MB/sec
parser/vue.global.prod.js             1.02     21.7±0.08ms     5.5 MB/sec    1.00     21.4±0.05ms     5.6 MB/sec

@MichaReiser MichaReiser changed the title refactor(rome_js_parser): Refactor Parser Events refactor(rome_js_parser): Streamline parser events Mar 30, 2022
Reduce the size of a single parser event from 16 bytes to 8 bytes each by:

* Using a `NonZeroU32` for the forward parent. The forward parent can never be 0 because it stores the offset from the current event to the start of the "forwarded" parent.
* Store the `start` of a node in the `CompletedMarker` (can't be computed because of forward parents)
* Remove `end` from the `Finish` event and instead retrieve the last token of the node when queried (mainly to produce diagnostics).
* Only store the end offset for each Token instead of the full range. The end offset is sufficient to reconstruct the length in the tree sink.

This reduces the memory consumption during the parse phase significantly:

* `jquery`:
  * Current Bytes: 4.12 MB -> 2.12 MB
  * Max Bytes: 5.82 MB -> 3.82 MB
  * Total Bytes: 8.45 MB -> 4.37 MB
* `tex-chtml-full`
  * Current bytes: 33.11 MB -> 17.11 MB
  * Max bytes: 46 MB -> 30 MB
  * Total bytes: 67.78 -> 34.92 MB

It also reduces the max bytes required during the tree sink phase.

The changes do improve parse times but not as much as I did hope for:

```
group                                    event                                  main
-----                                    -----                                  ----
parser/checker.ts                        1.00     63.6±1.84ms    40.9 MB/sec    1.00     63.8±0.45ms    40.8 MB/sec
parser/compiler.js                       1.00     36.3±0.77ms    28.9 MB/sec    1.03     37.5±0.38ms    27.9 MB/sec
parser/d3.min.js                         1.00     24.3±0.25ms    10.8 MB/sec    1.03     25.1±2.39ms    10.4 MB/sec
parser/dojo.js                           1.00      2.2±0.00ms    30.9 MB/sec    1.03      2.3±0.02ms    30.0 MB/sec
parser/ios.d.ts                          1.00     52.7±0.55ms    35.4 MB/sec    1.19     62.6±0.58ms    29.8 MB/sec
parser/jquery.min.js                     1.00      6.6±0.13ms    12.6 MB/sec    1.05      6.9±0.26ms    12.0 MB/sec
parser/math.js                           1.00     45.4±0.90ms    14.3 MB/sec    1.02     46.3±0.59ms    14.0 MB/sec
parser/parser.ts                         1.00  1525.9±16.73µs    31.7 MB/sec    1.02  1556.6±21.54µs    31.0 MB/sec
parser/pixi.min.js                       1.00     28.9±0.67ms    15.2 MB/sec    1.01     29.3±0.14ms    15.0 MB/sec
parser/react-dom.production.min.js       1.00      9.0±0.01ms    12.7 MB/sec    1.02      9.2±0.05ms    12.5 MB/sec
parser/react.production.min.js           1.00    466.9±1.03µs    13.2 MB/sec    1.03    481.5±3.49µs    12.8 MB/sec
parser/router.ts                         1.00   1186.9±8.65µs    50.4 MB/sec    1.03  1222.2±10.20µs    48.9 MB/sec
parser/tex-chtml-full.js                 1.00     60.5±0.68ms    15.1 MB/sec    1.10     66.4±1.53ms    13.7 MB/sec
parser/three.min.js                      1.00     32.1±0.24ms    18.3 MB/sec    1.03     33.0±0.43ms    17.8 MB/sec
parser/typescript.js                     1.00    279.9±4.87ms    33.9 MB/sec    1.04    292.2±2.93ms    32.5 MB/sec
parser/vue.global.prod.js                1.00     11.4±0.34ms    10.6 MB/sec    1.01     11.5±0.03ms    10.5 MB/sec
```

## Tests

`cargo test`
@MichaReiser MichaReiser merged commit f5eecdf into main Mar 31, 2022
@MichaReiser MichaReiser deleted the refactor/parser-events branch March 31, 2022 14:57
MichaReiser added a commit that referenced this pull request Apr 6, 2022
#2327 changed the parser to not store the last parsed token but instead
retrieve it by getting the kind from the last `Token` event in the parser events.

This didn't play well with skipping tokens because the parser didn't add any tokens to the `events` collection while in `skipping` mode.

This PR changes the `skipping` mode so that it only changes whatever the parser calls `bump` or `skip_as_token_trivia` but the parser will always add it to the `events` collection.

This adds a few unnecessary writes but this should be neglectable because

* Skipping token is rare and even then, often limited to very few tokens
* Moving the `bump` out of the condition may even help with branch prediction.

Fixes #2358
MichaReiser added a commit that referenced this pull request Apr 6, 2022
#2327 changed the parser to not store the last parsed token but instead
retrieve it by getting the kind from the last `Token` event in the parser events.

This didn't play well with skipping tokens because the parser didn't add any tokens to the `events` collection while in `skipping` mode.

This PR changes the `skipping` mode so that it only changes whatever the parser calls `bump` or `skip_as_token_trivia` but the parser will always add it to the `events` collection.

This adds a few unnecessary writes but this should be neglectable because

* Skipping token is rare and even then, often limited to very few tokens
* Moving the `bump` out of the condition may even help with branch prediction.

Fixes #2358
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants