Perfbot commands for evaluating only size #1888

workingjubilee · 2024-04-05T17:59:26Z

Hi, I don't know if this is the appropriate spot to mention this request, please feel free to redirect me if not! For WG-binary-size purposes, it might be nice to have a perfbot command to evaluate a PR strictly for its binary size impact. For a lot of such PRs, it's inappropriate to run the entire perf infrastructure because we don't expect any appreciable cycle or wall time impact, and if we do we'd probably be fine following it up reactively, and we'd rather not take up extra queue time if we don't have to.

This is especially the case with "library-only" PRs which affect fairly fiddly bits of standard library code and touch their code size.

Kobzol · 2024-04-05T18:03:19Z

Heh, I was just thinking about this yesterday. Well, the thing is, to evaluate the binary size, we need to compile the benchmarks. We don't need to do that repeatedly, so I was thinking of introducing a parameter to only run each benchmark once - and then realized that we already have it! You can do @rust-timer queue runs=1 to have a faster perf.run.

However, we only compile most benchmarks 1-3x times, so the boost won't be as big. I would estimate that only running everything once will perhaps halve the total duration (also because we now actually run everything twice even if you ask only for a single execution, because one execution is special, since it runs with the self-profiler - we'd probably need to change that).

Kobzol · 2024-04-05T18:04:28Z

Wait, unless you meant the size impact on the compiler artifacts (like the size of the standard library .so), and not benchmark binaries.

workingjubilee · 2024-04-05T18:10:47Z

I think when we evaluate small size changes we're interested in the benchmark binaries mostly?

Kobzol · 2024-04-05T18:14:29Z

Usually yeah, but some PRs also care about reducing the size of the compiler itself.

In any case, to get the binary size of benchmarks, we need to compile them, and that's the ultimate step of the benchmarking process, so it's not like we can stop sooner to make the result collection faster. The only thing that we can do is to limit the amount of compilations that we perform, which can be done using the runs=1 parameter (and also e.g. include=... to only compile a subset of benchmarks!). I don't think that there's any other obvious way to make the feedback loop shorter (of course, having more benchmark machines would help in general, hopefully there might be some progress on that front this year).

Kobzol · 2024-04-05T18:16:48Z

We actually only have about 3-4 binaries in the benchmark list, so if we want to focus e.g. only on binaries, they can be selected using the include= parameter, to make the collection much faster. But the try build itself takes about 1h 20 minutes anyway, so it's not like it would be instant.

...unless! We could do a much faster try build without LTO/PGO/BOLT, just to evaluate the binary size. That could be possible after rust-lang/rust#123451, once we'll have the option run an arbitrary CI job with @bors try.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perfbot commands for evaluating only size #1888

Perfbot commands for evaluating only size #1888

workingjubilee commented Apr 5, 2024

Kobzol commented Apr 5, 2024

Kobzol commented Apr 5, 2024

workingjubilee commented Apr 5, 2024

Kobzol commented Apr 5, 2024

Kobzol commented Apr 5, 2024

Perfbot commands for evaluating only size #1888

Perfbot commands for evaluating only size #1888

Comments

workingjubilee commented Apr 5, 2024

Kobzol commented Apr 5, 2024

Kobzol commented Apr 5, 2024

workingjubilee commented Apr 5, 2024

Kobzol commented Apr 5, 2024

Kobzol commented Apr 5, 2024