Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazify more work in builtins targets #122703

Closed
wants to merge 12 commits into from
Closed

Conversation

Urgau
Copy link
Member

@Urgau Urgau commented Mar 18, 2024

This PR make some fields in Target and TargetOptions lazy.

This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields.

In particular this PR introduce a MaybeLazy<T> struct to support a borrowed, owned and lazy state.

The fields that are changed are:

  • LinkArgs fields (access to env + allocate): 54 023 236 ins1 -> 53 478 072 ins
  • CrtObjects fields (allocate): 53 478 072 ins -> 53 127 832 ins
  • llvm_name field (access to env + allocate): 53 127 832 ins -> 53 057 110 ins

This brings around -1 000 000 instructions and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.

See the latest perf run for the actual improvements, #122703 (comment).

This PR is also a step further to completely static-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.

r? @petrochenkov

Footnotes

  1. This is the total number of instructions as measured with cachegrind for the entire rustc invocation on a cargo --lib hello world, variance was really low (<15 000 ins).
    In the scale of the check-cfg feature, last time I measured CheckCfg::fill_well_known was around ~2.8 millions ins.

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 18, 2024
@rustbot
Copy link
Collaborator

rustbot commented Mar 18, 2024

These commits modify compiler targets.
(See the Target Tier Policy.)

@rust-log-analyzer

This comment has been minimized.

@Noratrieb
Copy link
Member

0.2ms is not a lot, even if you amortize it across hundreds of rustc invocations, and this does make the code more complicated. Is this really worth it?

@Urgau
Copy link
Member Author

Urgau commented Mar 19, 2024

Actually, now that I remeasured it, the difference is big than I previously though. With stage1 the minimum wall-time measured is now the same, while before with nightly (so way more aggressive optimization) the minimum was never below 0.4ms; therefore this PR has the potential to nearly eliminate the cost of loading all the well known cfgs.

So yes, I definitively think it's worth it.

Note that the measurement were done on a laptop where the noise is non-negligible, that's why I have 50 rounds of warmup, so the wall-time has to be taken with some precaution; but what's clear even with the noise is the clear decrease in wall-time.

$ hyperfine --warmup 50 "rustc +stage1 --crate-type=lib src/lib.rs" "rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs"
Benchmark 1: rustc +stage1 --crate-type=lib src/lib.rs
  Time (mean ± σ):      20.3 ms ±   1.5 ms    [User: 11.3 ms, System: 8.0 ms]
  Range (min … max):    17.9 ms …  24.9 ms    132 runs

Benchmark 2: rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs
  Time (mean ± σ):      20.7 ms ±   1.7 ms    [User: 12.7 ms, System: 6.9 ms]
  Range (min … max):    17.9 ms …  25.5 ms    133 runs

Summary
  rustc +stage1 --crate-type=lib src/lib.rs ran
    1.02 ± 0.11 times faster than rustc +stage1 --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs
$ hyperfine --warmup 50 "rustc +nightly --crate-type=lib src/lib.rs" "rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs"
Benchmark 1: rustc +nightly --crate-type=lib src/lib.rs
  Time (mean ± σ):      17.8 ms ±   1.3 ms    [User: 8.2 ms, System: 8.6 ms]
  Range (min … max):    15.8 ms …  26.5 ms    159 runs

Benchmark 2: rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs
  Time (mean ± σ):      18.3 ms ±   1.3 ms    [User: 9.1 ms, System: 8.0 ms]
  Range (min … max):    16.3 ms …  25.5 ms    148 runs

Summary
  rustc +nightly --crate-type=lib src/lib.rs ran
    1.03 ± 0.10 times faster than rustc +nightly --crate-type=lib --check-cfg='cfg()' -Zunstable-options src/lib.rs

Regarding the code complexity, the vast majority of it comes from adapting (ie adding MaybeLazy::lazy calls) to the fields in each and every targets, while there are no code changes outside rustc_target. The code complexity increase is IMO limited and contained.

Adding new targets shouldn't be more difficult than before.

@petrochenkov
Copy link
Contributor

I dislike this for the same reason as #122207, but at much larger scale.
I'd also prefer to not do this, unless cfg checking is enabled by default, blocking on rust-lang/cargo#13571.
Maybe be we'll have to do this, but it won't be a good day for target specs.
@rustbot blocked

@rustbot rustbot added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 19, 2024
@Urgau
Copy link
Member Author

Urgau commented Mar 19, 2024

I dislike this for the same reason as #122207, but at much larger scale.

I understand why you don't like #122207, I also don't like it much, I mainly proposed that one for discussion and perf test.

However I think this PR is quite different, it goes into the direction of static target definitions, which IMO is sufficient in it self, even if it doesn't bring any perf improvements. @petrochenkov Could you elaborate on why you don't like this PR?

EDIT: To elaborate a bit more, I always found it weird that the target specifications were not immutable, for the most part they are, except for some Apple shenanigans; and if they were and const-eval was a bit more permissive we could have them in a simple static which IMO would be cleaner and as a bonus would be better for check-cfg.

EDIT2: Yes, I realized I'm going into the static discussions, even through I said I wouldn't; but I really think this goes into a good direction by permitting more data to be static.

@bors

This comment was marked as resolved.

@rustbot rustbot added the O-macos Operating system: macOS label May 6, 2024
@lqd
Copy link
Member

lqd commented May 6, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 6, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request May 6, 2024
Lazify more work in builtins targets

This PR make some fields in `Target` and `TargetOptions` lazifyable.

This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields.

In paticular this PR introduce a `MaybeLazy<T>` struct to support a borrowed, owned and lazy state.

The fields that are changed are:
 - `LinkArgs` fields (access to env + allocate): ~54 023 236 ins[^1] -> ~53 478 072 ins
 - `CrtObjects` fields (allocate): ~53 478 072 ins -> ~53 127 832 ins
 - `llvm_name` field (access to env + allocate): ~53 127 832 ins -> ~53 057 110 ins

This brings around -1 000 000 ins*tructions* and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.

*This PR is also a step further to completely `static`-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.*

[^1]: This is the total number of instructions as measured with `cachegrind` for the entire `rustc` invocation on a cargo --lib hello world, variance was really low (<15 000 ins).
In the scale of the check-cfg feature, last time I measured `CheckCfg::fill_well_known` was around ~2.8 millions ins.

r? `@petrochenkov`
@bors
Copy link
Contributor

bors commented May 6, 2024

⌛ Trying commit 2a57448 with merge c3871b5...

@bors
Copy link
Contributor

bors commented May 6, 2024

☀️ Try build successful - checks-actions
Build commit: c3871b5 (c3871b59880d404e451097b32bf6f68bcbd565f1)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (c3871b5): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.7% [0.2%, 1.4%] 4
Improvements ✅
(primary)
-1.3% [-2.2%, -0.3%] 4
Improvements ✅
(secondary)
-0.8% [-1.9%, -0.2%] 18
All ❌✅ (primary) -1.3% [-2.2%, -0.3%] 4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.1% [2.1%, 2.1%] 1
Regressions ❌
(secondary)
6.8% [5.5%, 8.0%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-5.1% [-5.1%, -5.1%] 1
All ❌✅ (primary) 2.1% [2.1%, 2.1%] 1

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.8% [2.4%, 3.4%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.2% [-2.2%, -2.2%] 1
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 676.654s -> 674.986s (-0.25%)
Artifact size: 315.91 MiB -> 315.95 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 6, 2024
@Urgau Urgau mentioned this pull request May 7, 2024
@Urgau
Copy link
Member Author

Urgau commented May 12, 2024

rust-lang/cargo#13571 has been merged, the PR has been rebased and the latest perf run shows some clear improvements across multiple benchmarks 🎉.

@rustbot labels -S-blocked +S-waiting-on-review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels May 12, 2024
@petrochenkov
Copy link
Contributor

petrochenkov commented May 28, 2024

@Urgau
To clarify the status, the default cfg checking is now in both stable and bootstrap rustc and cargo, and is not going to be reverted due to issues like #124800 and similar, right?

Could you give a link to the PR with perf regressions that first enables cfg checking in a way observable to perf checking on merge?

@bors
Copy link
Contributor

bors commented Jun 18, 2024

⌛ Trying commit d7ef5e6 with merge d013d1a...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 18, 2024
Lazify more work in builtins targets

This PR make some fields in `Target` and `TargetOptions` lazy.

This is done in order to reduce the impact of check-cfg which needs to load all the built-ins targets but doesn't need all the fields.

In particular this PR introduce a `MaybeLazy<T>` struct to support a borrowed, owned and lazy state.

~~The fields that are changed are:~~
 - ~~`LinkArgs` fields (access to env + allocate): 54 023 236 ins[^1] -> 53 478 072 ins~~
 - ~~`CrtObjects` fields (allocate): 53 478 072 ins -> 53 127 832 ins~~
 - ~~`llvm_name` field (access to env + allocate): 53 127 832 ins -> 53 057 110 ins~~

~~This brings around -1 000 000 ins*tructions* and -0.2/0.3ms of less wall-time (with some measurement even having the same minimum wall-time with or without check-cfg) 🎉.~~

See the latest perf run for the actual improvements, rust-lang#122703 (comment).

*This PR is also a step further to completely `static`-fying all the build-in targets, if we one day we decide to do it, but that's a separate conversion.*

[^1]: This is the total number of instructions as measured with `cachegrind` for the entire `rustc` invocation on a cargo --lib hello world, variance was really low (<15 000 ins).
In the scale of the check-cfg feature, last time I measured `CheckCfg::fill_well_known` was around ~2.8 millions ins.

r? `@petrochenkov`
@bors
Copy link
Contributor

bors commented Jun 18, 2024

☀️ Try build successful - checks-actions
Build commit: d013d1a (d013d1af6909ef48ddac409375e4f0746655c5ad)

@rust-timer

This comment has been minimized.

@rust-timer

This comment was marked as outdated.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 18, 2024
@Urgau
Copy link
Member Author

Urgau commented Jun 18, 2024

Standard LazyLock already supports stateful fn objects. I was able to avoid all MaybeLazy::lazy calls for CrtObjects this way - 2eefb88.

Indeed. This has now been done for CRT objects and link args. (pushed a separate commits, for easier review and maintenance for me)


The latest (and most accurate) perf run doesn't show as much perf improvement as the first one, I tried to improve more but it didn't lead anywhere.

(it's worth noting that they are not directly comparable given that they are more than a month apart - maybe worth redoing a perf run without the latest changes?)

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jun 18, 2024
compiler/rustc_target/src/spec/maybe_lazy.rs Show resolved Hide resolved
compiler/rustc_target/src/spec/maybe_lazy.rs Outdated Show resolved Hide resolved
compiler/rustc_target/src/spec/maybe_lazy.rs Show resolved Hide resolved
compiler/rustc_target/src/spec/maybe_lazy.rs Outdated Show resolved Hide resolved
compiler/rustc_target/src/spec/mod.rs Show resolved Hide resolved
compiler/rustc_target/src/spec/base/aix.rs Outdated Show resolved Hide resolved
@petrochenkov
Copy link
Contributor

Let's maybe split these changes and try to benchmark and land them separately.
MaybeLazy<str> first, then CRT objects, then link args.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 19, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 19, 2024
Lazify `Target::llvm_target` field

This PR lazify the `Target::llvm_target` field by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state).

Split from rust-lang#122703
r? `@petrochenkov`
bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 23, 2024
Lazify CRT objects fields initilization

This PR lazify the CRT objects by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state).

Split from rust-lang#122703
r? `@petrochenkov`
@bors
Copy link
Contributor

bors commented Jun 27, 2024

☔ The latest upstream changes (presumably #126907) made this pull request unmergeable. Please resolve the merge conflicts.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 19, 2024
Lazify `TargetOptions::*link_args` fields

This PR lazify the link args by introducing `MaybeLazy`, a 3-way lazy container (borrowed, owned and lazied state).

Split from rust-lang#122703
r? `@petrochenkov`
@petrochenkov
Copy link
Contributor

#127992 is closed so closing this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-macos Operating system: macOS perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants