Benchmark parser plus source code info generation, possibly optimize #153

Open
jhump opened this issue Jun 14, 2023 · 0 comments

Member

jhump commented Jun 14, 2023

The AST representation is very useful for some kinds of tools, and it also provides a nice API for extracting source position information that would otherwise be unavailable in a file descriptor. That's why the parser first creates the AST and then generates a file descriptor from it.

However, the AST is definitely a source of memory consumption that would be nice to avoid. So it would be useful if the parser had an alternate mode of execution where it directly generated a file descriptor. One difficulty with this is that we use a generated bottom-up parser, whereas protoc uses a top-down parser. It is much easier to compute source code info as you parse in a top-down approach, since the parser can accumulate source info paths as it descends. So the trick with skipping the AST step would be figuring out how to compute the data structures needed to create the source code info of a file descriptor.

It is possible that it's simply not worth having two separate parser paths. We first need to benchmark the parser, measuring the time to produce a descriptor proto plus the time for source code info generation, so that we know what portion of compile time is spent in just those phases. (The rest is spent in linking [almost certainly the great majority] and interpreting options.) If these parse phases make up only a small amount of the total time to compile, then it is probably not worth maintaining a separate parse path. (Though there could be sufficient memory/GC savings to make it still worthwhile.)

So the first step is to create a benchmark to measure the time of just parsing and source code info generation. That way we can assess whether there is sufficient room to improve performance by omitting the AST generation phase to justify the effort.
