Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use prism parser via translator #388

Merged
merged 4 commits into from
Feb 16, 2024
Merged

Conversation

exterm
Copy link
Contributor

@exterm exterm commented Feb 1, 2024

What are you trying to accomplish?

Proof of concept for discussion #387. Packwerk spends about 2/3 of the time it runs on a large code base in parsing Ruby. Prism is ~5x faster than the current parser (whitequark/parser). So this should be a significant speed boost for packwerk.

It looks like this could be a simple, almost trivial change.

What approach did you choose and why?

We could use Prism directly, but that would require significant changes to Packwerk. Luckily @kddnewton has included a translator with prism that provides compatibility with whitequark/parser, so that it should be a drop-in replacement. We won't get the same performance improvements, but we get some, and almost for free.

What should reviewers focus on?

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Additional Release Notes

  • Breaking change (fix or feature that would cause existing functionality to change)

Include any notes here to include in the release description. For example, if you selected "breaking change" above, leave notes on how users can transition to this version.

If no additional notes are necessary, delete this section or leave it unchanged.

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • It is safe to rollback this change.

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

The problem seems to be that Parser::Prism raises a StandardError here, whereas Parser::CurrentRuby raises a Parser::SyntaxError in that case.

Parser::Prism raises a more specific error on parser-prism's main branch. We could cath that one explicitly. However, it seems for compatibility parser-prism may want to raise the same exceptions as Parser::CurrentRuby?

@kddnewton
Copy link

@exterm Can you use prism v0.20.0 itself? It has Prism::Translation::Parser which is effectively this. I'm going to deprecate parser-prism

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

It has Prism::Translation::Parser which is effectively this. I'm going to deprecate parser-prism

Oooh!

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

I'm running into lots of errors that look like I'm not requiring the right thing.

require "prism/translation/parser" =>

NameError: uninitialized constant Prism::Compiler

      class Compiler < ::Prism::Compiler
                              ^^^^^^^^^^
Did you mean?  Complex
    /home/[...]/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:8:in `<class:Parser>'

When I add a require "prism/compiler" I get

NoMethodError: undefined method `parse' for Prism:Module

        result = unwrap(Prism.parse(source, filepath: source_buffer.name))
                             ^^^^^^
    /home/[...]/ruby/3.1.2/lib/ruby/gems/3.1.0/gems/prism-0.20.0/lib/prism/translation/parser.rb:45:in `parse'

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

Ah! It needs a require "prism" too. Might want to add that to the instructions in https://github.com/ruby/prism/blob/main/docs/parser_translation.md

@kddnewton
Copy link

Ahh sorry it's actually the opposite, you should only require prism and not require the nested one. Everything is autoloaded so it should work just fine with just the top-level require.

@exterm exterm force-pushed the use-prism-parser branch 2 times, most recently from 1f8b4a4 to a690e33 Compare February 1, 2024 22:09
@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

Done. I had to change a test because the error message was different. Ideally it would be the same as from Parser::CurrentRuby, no?

@kddnewton
Copy link

Error messages are going to be different because they're coming straight from prism. They're meant to be more informative than CRuby, and as such they're likely to be different from the parser gem. The message themselves we're not going to make any guarantees about, as they are likely to change as we continue to improve error recovery.

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

OK! I can live with that.

@exterm exterm changed the title Proof of concept: Use prism parser Use prism parser Feb 1, 2024
@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

Green tests! Thank you a lot for your help (and all the work on Prism of course), Kevin. Will try this out tomorrow on a substantial Rails app.

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

@gmcgibbon FYI it seems to "just work" according to the test suite. Real world testing before merge seems wise.

@exterm
Copy link
Contributor Author

exterm commented Feb 1, 2024

Seems to break on numbered parameters in blocks:

undefined method `parameters' for @ NumberedParametersNode (location: (22,27)-(22,65))
└── maximum: 1
:Prism::NumberedParametersNode /home/vscode/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:1752:in `visit_block'
/home/vscode/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/prism-0.20.0/lib/prism/translation/parser/compiler.rb:244:in `visit_call_node'

The code in question is something like self.something.find { _1.graphql_name == "somethingElse" }

I currently believe that this is a bug in prism, respectively the translator

@exterm
Copy link
Contributor Author

exterm commented Feb 2, 2024

I'll wait for the next prism release with the bugfix before I make this "ready for review".

I tested with the fix on the largest Rails app I currently have access to (~1m lines of Ruby code) and it works just fine. Speedup seems to be about 30% for a full packwerk check run according to ad-hoc benchmarks in a highly virtualized environment. About 50% on a more or less recent MacBook Pro.

@kddnewton
Copy link

@exterm I'll release a patch release on Monday first thing. It'll be v0.20.1.

Thank you very much for your work on this! I'm still very interested in replacing the parser with prism as a whole (that should be much better than using the translation layer) but I recognize that's quite a bit more involved. Is that something you're still interested in working on? I'd be happy to help support in any way I can.

@exterm
Copy link
Contributor Author

exterm commented Feb 2, 2024

@kddnewton What's the advantage to packwerk from skipping the translation layer?

@kddnewton
Copy link

It should be quite a bit faster (i.e., an order of magnitude), because it won't have to go through the whole translation.

The other very tangential, loosely-defined benefit is that it uses the same AST as Ruby, so anyone contributing to packwerk will be gaining knowledge of the AST that all of the Ruby implementations use. This means new contributors won't have to learn a new AST, and in a world where prism is the parser everywhere (the world I'm trying to create) this means a lower barrier to entry for new contributors.

@kddnewton
Copy link

The last thing is that we've worked hard to provide a really good Ruby API. Instead of dealing with a single class with a type field, all of our nodes have their own classes. We have utility functions on all of the classes that should make it quite a bit easier to use. Also, if you need anything, you have a line directly to the main contributor and I'll add whatever functions you may need.

@exterm
Copy link
Contributor Author

exterm commented Feb 3, 2024

I whipped up a quick benchmark and it looks like omitting the translation could indeed provide a huge boost.

@kddnewton
Copy link

@exterm v0.21.0 is out now with these bug fixes

@exterm
Copy link
Contributor Author

exterm commented Feb 5, 2024

@kddnewton thank you, will plug it in when I find some time. Probably later today.

Is that something you're still interested in working on? I'd be happy to help support in any way I can.

I was hoping to make a full parser replacement paid work, but I'm having trouble making a business case for it. Gusto now has a Rust reimplementation of packwerk that is a lot faster, and probably faster than we can make packwerk.

I'm still interested in this for fun, but I'll have to find time for it.

@shageman
Copy link
Contributor

shageman commented Feb 6, 2024

Testing this. I am atm stuck on prism 0.17 due to our version of rbi. Had to read through the discussion to see the version dependency.

@exterm
Copy link
Contributor Author

exterm commented Feb 6, 2024

mmmh, interesting. That might make a release incorporating this a little harder.

@kddnewton
Copy link

kddnewton commented Feb 6, 2024 via email

@exterm
Copy link
Contributor Author

exterm commented Feb 6, 2024

I was going to update prism and mark this PR ready for review, but in that case I'll wait for Ruby 2.7 support in prism.

@kddnewton
Copy link

@exterm version 0.22.0 supports Ruby 2.7, so you should be able to use that.

@exterm exterm changed the title Use prism parser Use prism parser via translator Feb 8, 2024
@exterm
Copy link
Contributor Author

exterm commented Feb 8, 2024

This is ready to go

@exterm exterm marked this pull request as ready for review February 8, 2024 01:48
@exterm exterm requested a review from a team as a code owner February 8, 2024 01:48
packwerk.gemspec Outdated
@@ -50,6 +50,7 @@ Gem::Specification.new do |spec|
# For Ruby parsing
spec.add_dependency("ast")
spec.add_dependency("parser")
spec.add_dependency("prism", ">= 0.22.0") # 0.21.0 fixes numbered block params, 0.22.0 adds Ruby 2.7 compatibility
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to use parser still or can we drop it as a dependency now that we're on prism?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bundler doesn't have peer dependencies, but if it did parser would be one. When the translation layer gets loaded it tries to require parser. We don't have it as a direct dependency because we don't want to impose that on all of prism's consumers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah besides, in this PR packwerk still uses Parser::AST and a few other things.

A "clean" migration to Prism without using the translation layer, and using prism's AST format instead, would let us get rid of parser AFAICT. But it's significantly more work.

I've looked into it a bit, and among other things NodeHelpers has to be completely rewritten. I'm thinking we may want to completely remove it though since prism's AST format is a lot more user friendly and may not need this kind of leaky wrapper around it.

(@kddnewton the only thing that's been a little annoying is that the types don't seem to represent shared interfaces - e.g. between symbol and string nodes, or between class and module nodes. What's a good place to discuss these things?)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can open a discussion on github.com/ruby/prism.

Copy link
Member

@gmcgibbon gmcgibbon Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bundler doesn't have peer dependencies

It does, but not directly. You can use gem entries like requires to detect if a gem exists in an application's bundle. At least, this is what I was inferring to do.

in this PR packwerk still uses Parser::AST and a few other things.

Yes, I see now we're using a translation layer, and we're halfway between using prism and parser. Peer dependencies aren't appropriate here. I don't see any actual benchmarks, but I too believe this should be faster. I'll put together a quick benchmark.

Copy link
Member

@gmcgibbon gmcgibbon Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before:

📦 Finished in 16.88 seconds

bin/packwerk check  0.30s user 0.04s system 1% cpu 33.053 total

After:

📦 Finished in 17.98 seconds

bin/packwerk check  0.29s user 0.04s system 0% cpu 34.413 total

Looks like we're at the same-ish difference to me now over multiple runs. How many times faster do we expect Prism to be using the translation layer? 5x seems like a stretch to me.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It heavily depends on the file unfortunately. The biggest issue is that the parser gem uses character offsets, whereas the prism gem uses byte offsets. If the file has any multibyte characters, we have to run everything through an offset filter that takes time to build and run. It's much faster if the file has no multibyte characters.

That being said, this is only going to impact the parsing speed. So if parsing isn't a huge bottleneck in packwerk itself, it's not going to be a massive win. Of course the bigger win would be translating the entire thing to using prism directly, but that's far more work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exterm do you have an application that benefits from this change? If you do, please provide benchmarks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kddnewton parsing is about 2/3 of packwerk runtime on large apps. So I'd expect a big speedup and I did get a big speedup on the app I was working on (~1 million Ruby LOC). I'll do another benchmark after updating to prism 0.23.0

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually @gmcgibbon and @exterm can you both try v0.24.0 (sorry). Aaron and I worked on a performance improvement for creating the AST which should yield some significant speed improvements.

@exterm
Copy link
Contributor Author

exterm commented Feb 13, 2024

@gmcgibbon can we get this merged? Shopify will save money in CI 😉

@exterm
Copy link
Contributor Author

exterm commented Feb 15, 2024

Up to date measurements from a reasonably sized Rails app, using spring. Sadly I can only currently access it through a Github Codespace, so it's a highly virtualized environment. I did not get a lot of variation on repeated runs though.

Packwerk 3.1.0

cached: ~2.6s
uncached: ~32s

Packwerk 3.1.0 with Prism::Translator::Parser from Prism 0.24.0

cached: ~2.6s
uncached: ~20s

Cached run times are the same, as expected - no parsing happens on repeated cached runs.

@exterm
Copy link
Contributor Author

exterm commented Feb 15, 2024

For a single 3k line file, the difference is about 0.8s vs 1.1s - it's not dramatic, but it's faster. Moving off the translator layer would be better but is more work. It could be done as a follow-up to this PR.

@gmcgibbon
Copy link
Member

Cached run times are the same, as expected - no parsing happens on repeated cached runs.

Right, I think I'm fixating on the wrong type of run. I believe there's an improvement on uncached runs on our app too.

Uncached before:

📦 Finished in 283.79 seconds

Uncached after:

📦 Finished in 187.03 seconds

That's considerably faster, and CI would use uncached, so that's good enough for me.

packwerk.gemspec Outdated
@@ -50,6 +50,7 @@ Gem::Specification.new do |spec|
# For Ruby parsing
spec.add_dependency("ast")
spec.add_dependency("parser")
spec.add_dependency("prism", ">= 0.23.0") # 0.24.0 fixes a performance issue with the parser translator
Copy link
Member

@gmcgibbon gmcgibbon Feb 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's change this to 0.24.0, or we can be more conservative and use 0.22.0 as the minimum with 2.7 support.

Copy link
Contributor Author

@exterm exterm Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to but somehow I only changed the comment 🤦🏼

@gmcgibbon gmcgibbon merged commit 433c4ed into Shopify:main Feb 16, 2024
20 checks passed
@gmcgibbon
Copy link
Member

Thank you @exterm, I'll run a release for this now.

@exterm
Copy link
Contributor Author

exterm commented Feb 16, 2024

@gmcgibbon do you want to add a note about the expected speedup (~30% for uncached runs on large apps) to the release notes? Or maybe just say "this makes it faster" or something... Right now it's not really clear from the release notes why the change was made.

I don't have a strong opinion, just making a suggestion.

@tylerwillingham
Copy link

tylerwillingham commented Feb 17, 2024

25% faster on the main Rails app I work in 👍
Thanks folks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants