Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always format to internal String in FmtPrinter #94131

Merged
merged 2 commits into from
Feb 24, 2022

Conversation

Mark-Simulacrum
Copy link
Member

This avoids monomorphizing for different parameters, decreasing generic code
instantiated downstream from rustc_middle -- locally seeing 7% unoptimized LLVM IR
line wins on rustc_borrowck, for example.

We likely can't/shouldn't get rid of the Result-ness on most functions, though some
further cleanup avoiding fmt::Error where we now know it won't occur may be possible,
though somewhat painful -- fmt::Write is a pretty annoying API to work with in practice
when you're trying to use it infallibly.

@rust-highfive
Copy link
Collaborator

Some changes occured to the CTFE / Miri engine

cc @rust-lang/miri

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 18, 2022
@rust-highfive
Copy link
Collaborator

r? @oli-obk

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 18, 2022
@Mark-Simulacrum

This comment was marked as resolved.

@rust-timer

This comment was marked as resolved.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 18, 2022
@bors

This comment was marked as resolved.

@bors
Copy link
Contributor

bors commented Feb 18, 2022

☀️ Try build successful - checks-actions
Build commit: e19edf03ef30826c4928f4a631b37e4871e43ab4 (e19edf03ef30826c4928f4a631b37e4871e43ab4)

@rust-timer
Copy link
Collaborator

Queued e19edf03ef30826c4928f4a631b37e4871e43ab4 with parent b8c56fa, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (e19edf03ef30826c4928f4a631b37e4871e43ab4): comparison url.

Summary: This benchmark run shows 32 relevant regressions 😿 to instruction counts.

  • Average relevant regression: 1.2%
  • Largest regression in instruction counts: 3.0% on full builds of deeply-nested-async check

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Feb 19, 2022
@oli-obk
Copy link
Contributor

oli-obk commented Feb 19, 2022

While we reduce bootstrap time by 9s, these regressions seem fairly bad. I see three other solutions that could be tried:

  • use dynamic dispatch
  • explicitly monomorphize the two cases and expose only those, preventing duplication in every crate
  • replace the generic param with fmt::Formatter and use some formatting wrapper for the string cases

@Mark-Simulacrum
Copy link
Member Author

I plan to dig into the regressions - my guess is that they're fixable along one of those lines. Maybe also preallocating some space in the buffer since we expect to print something.

@Mark-Simulacrum Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2022
@Mark-Simulacrum
Copy link
Member Author

I am pretty confident that the regressions here seem to be inlining decisions slightly changing -- looking at cachegrind diffs, it doesn't seem like the diffs belong to formatting code (mostly in ty_relate and similar generic code).

I'm going to try and see if I can confirm that more closely with some disassembly, and maybe see if there's a few optimizations that can be applied regardless (e.g., by making use of the new information that our destination buffer is infallible), but my initial sense is that these regressions are likely somewhat outside the direct control here. We're just moving a bunch of code to being codegen'd just the one time, which naturally means there's a little less information for LLVM to make use of in some places -- things like which flags were set on FmtPrinter, for example -- which leads to slightly worse performance.

impl<'a, 'tcx, F> Deref for FmtPrinter<'a, 'tcx, F> {
type Target = FmtPrinterData<'a, 'tcx, F>;
impl<'a, 'tcx> Deref for FmtPrinter<'a, 'tcx> {
type Target = FmtPrinterData<'a, 'tcx>;
fn deref(&self) -> &Self::Target {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be enough to slap #[inline] on all the functions in this file that appear in cachegrind for the first time after this PR

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these functions appear in cachegrind -- they're not the direct problem. For example on stm32f4-check-full, we see a diff like this:

245,045,449  PROGRAM TOTALS

--------------------------------------------------------------------------------
Ir             file:function
--------------------------------------------------------------------------------
  372,484,116  ???:<core::iter::adapters::map::Map<core::iter::adapters::enumerate::Enumerate<core::iter::adapters::zip::Zip<core::iter::adapters::copied::Copied<core::slice::iter::Iter<rustc_middle::ty::subst::>
 -326,700,323  ???:rustc_middle::ty::relate::relate_substs::<rustc_infer::infer::equate::Equate>
  123,186,012  ???:<rustc_middle::ty::subst::SubstFolder as rustc_middle::ty::fold::TypeFolder>::fold_ty
   86,186,243  ???:rustc_middle::ty::relate::super_relate_tys::<rustc_infer::infer::equate::Equate>
  -55,808,143  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::super_fold_with::<rustc_middle::ty::subst::SubstFolder>
   30,561,468  ???:<alloc::collections::btree::map::entry::OccupiedEntry<rustc_infer::infer::region_constraints::Constraint, rustc_infer::infer::SubregionOrigin>>::remove_entry
  -27,764,057  ???:<alloc::collections::btree::node::Handle<alloc::collections::btree::node::NodeRef<alloc::collections::btree::node::marker::Mut, rustc_infer::infer::region_constraints::Constraint, rustc_infer:>
   20,749,456  ???:<&mut alloc::vec::Vec<ena::unify::VarValue<rustc_type_ir::TyVid>> as core::convert::AsRef<[ena::unify::VarValue<rustc_type_ir::TyVid>]>>::as_ref
   18,135,210  ???:<rustc_middle::mir::Rvalue>::ty::<rustc_middle::mir::Body>
   15,490,475  ???:<rustc_middle::ty::fold::RegionFolder as rustc_middle::ty::fold::FallibleTypeFolder>::try_fold_ty
  -12,892,781  ???:<rustc_middle::ty::Ty as rustc_middle::ty::fold::TypeFoldable>::super_fold_with::<rustc_middle::ty::fold::RegionFolder>
   12,520,606  ???:<rustc_infer::infer::combine::CombineFields>::instantiate
   12,024,623  ???:<rustc_data_structures::stable_hasher::StableHasher>::finish::<u128>
  -11,945,001  ???:<core::iter::adapters::map::Map<std::collections::hash::map::Iter<rustc_span::def_id::LocalDefId, rustc_hir::hir_id::ItemLocalId>, rustc_data_structures::stable_hasher::stable_hash_reduce<rust>
  -10,922,924  ???:<rustc_middle::ty::subst::GenericArg as rustc_middle::ty::relate::Relate>::relate::<rustc_infer::infer::nll_relate::TypeRelating<rustc_borrowck::type_check::relate_tys::NllTypeRelatingDelegate>
    9,837,670  ???:<alloc::vec::Vec<rustc_type_ir::TyVid> as alloc::vec::spec_from_iter::SpecFromIter<rustc_type_ir::TyVid, core::iter::adapters::filter_map::FilterMap<core::ops::range::Range<usize>, <rustc_infe>
   -8,795,699  ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::relate_with_variance::<rustc_middle::ty::Ty>
    8,712,904  ???:<&mut rustc_middle::ty::relate::relate_substs<rustc_infer::infer::nll_relate::TypeRelating<rustc_borrowck::type_check::relate_tys::NllTypeRelatingDelegate>>::{closure#0} as core::ops::function>
    8,332,971  ???:<rustc_infer::infer::equate::Equate as rustc_middle::ty::relate::TypeRelation>::relate_with_variance::<rustc_middle::ty::Ty>
    8,262,191  ???:<rustc_infer::infer::sub::Sub as rustc_middle::ty::relate::TypeRelation>::tys
   -8,099,008  ???:<rustc_infer::infer::InferCtxt>::unsolved_variables

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah lol, yea, inlining cost boundaries are fun. Yea, just slap the attribute on all previously generic methods, that should make callers big enough again to stop them from being inlined. Alternatively add inline(never) with a comment to functions that disappeared from cachegrind.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not practical -- almost every single method became non-generic after removing the parameter; that's the whole reason for the big win on compile times. IMO trying to second-guess LLVM here on inlining (in either direction) is not the right approach, we're not likely to find a magic bullet.

The diff above is illustrative of typical inlining decisions slightly changing, but trying to recover from that by adding attributes isn't likely to really help that much -- the changes are too scattered and ultimately I suspect arise from things that we can't really influence easily.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the gains are from not instantiating the generics in 5 different crates and generating LLVM IR for them and optimizing them?

Having them only get compiled once is a different aspect than removing the implicit #[inline] that generic function have

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#[inline] on all of these methods will -- AFAIK -- force generation in every crate that uses them; it's equivalent to generics in that respect. It also adds an inline hint, but that's a separate effect and unlikely to matter as much given the large size of many of the functions here. We save a little of course on generating just one copy per crate instead of 2-3, but that's still a higher cost.

@Mark-Simulacrum
Copy link
Member Author

@bors try @rust-timer queue

FWIW, I tried adding some eprintln! to the FmtPrinter::new and FmtPrinter::into_buffer functions, and neither are used at all for deeply-nested-async check builds. I think that further confirms that this is not in and of itself likely causing the regressions, rather that other functions are being shuffled between CGUs or similar due to these functions being omitted.

I've added a commit to preallocate a 64-byte buffer which might help with some benchmarks and rebased, let's see if we still see the same impacts.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Feb 19, 2022
@bors
Copy link
Contributor

bors commented Feb 19, 2022

⌛ Trying commit 87bbb05815e1b44c940809f9ab261411bf0068fc with merge c65d716943343dccd7e2f9a1ad2455e7ebf9ae49...

@bors
Copy link
Contributor

bors commented Feb 19, 2022

☀️ Try build successful - checks-actions
Build commit: c65d716943343dccd7e2f9a1ad2455e7ebf9ae49 (c65d716943343dccd7e2f9a1ad2455e7ebf9ae49)

@rust-timer
Copy link
Collaborator

Queued c65d716943343dccd7e2f9a1ad2455e7ebf9ae49 with parent e08d569, future comparison URL.

@Mark-Simulacrum
Copy link
Member Author

Preliminary local results suggest the 64-byte preallocation is not likely to have a significant impact here, so probably doesn't matter that much.

My suggested approach I think is still to call these relatively minor regressions justified -- they appear to be not directly related to the changes in this PR based on investigatory work I've done locally (cachegrind diffs, the fact that the code modified doesn't execute at all in some of the regressed benchmarks); I am inclined to take the win on bootstrap time and leave the runtime performance aside for this case. The folding code may have a few orthogonal wins that will buy us back some of this regression, but those wins are never going to really relate to this PR in a direct fashion.

@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 19, 2022
This avoids monomorphizing for different parameters, decreasing generic code
instantiated downstream from rustc_middle.
@oli-obk
Copy link
Contributor

oli-obk commented Feb 23, 2022

@bors r+

@bors
Copy link
Contributor

bors commented Feb 23, 2022

📌 Commit 2ee6d55 has been approved by oli-obk

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 23, 2022
@bors
Copy link
Contributor

bors commented Feb 24, 2022

⌛ Testing commit 2ee6d55 with merge 8b9e41cd50112f711aef7f5d28d8458430bee486...

@bors
Copy link
Contributor

bors commented Feb 24, 2022

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 24, 2022
@Mark-Simulacrum
Copy link
Member Author

@bors treeclosed=100 retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 24, 2022
@Mark-Simulacrum
Copy link
Member Author

@bors treeclosed-

@bors
Copy link
Contributor

bors commented Feb 24, 2022

⌛ Testing commit 2ee6d55 with merge a26f75a3c88c9ff5b5bcc90d944e60a37ef8a0c3...

@rust-log-analyzer
Copy link
Collaborator

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@Mark-Simulacrum
Copy link
Member Author

@bors retry

@bors
Copy link
Contributor

bors commented Feb 24, 2022

⌛ Testing commit 2ee6d55 with merge 4b043fa...

@bors
Copy link
Contributor

bors commented Feb 24, 2022

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing 4b043fa to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 24, 2022
@bors bors merged commit 4b043fa into rust-lang:master Feb 24, 2022
@rustbot rustbot added this to the 1.61.0 milestone Feb 24, 2022
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4b043fa): comparison url.

Summary: This benchmark run did not return any relevant results. 30 results were found to be statistically significant but too small to be relevant.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

@rustbot label: -perf-regression

@rustbot rustbot removed the perf-regression Performance regression. label Feb 24, 2022
@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants