Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nonlinear compile time blow-up with deeply nested types #38528

Closed
sfackler opened this issue Dec 22, 2016 · 40 comments
Closed

Nonlinear compile time blow-up with deeply nested types #38528

sfackler opened this issue Dec 22, 2016 · 40 comments
Assignees
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@sfackler
Copy link
Member

sfackler commented Dec 22, 2016

Compiling the postgres-tokio crate at sfackler/rust-postgres@d27518b goes from 5 seconds to 45 seconds on nightly if the two .boxed() calls in the middle of this call chain are removed: https://github.com/sfackler/rust-postgres/blob/d27518ba76d76ccaa59b3ccd63e981bd8bd0ef33/postgres-tokio/src/lib.rs#L342-L408.

Looks like 15 seconds is spent in translation item collection, and 39 seconds is spent in translation:

time: 0.004	parsing
time: 0.005	recursion limit
time: 0.000	crate injection
time: 0.001	plugin loading
time: 0.000	plugin registration
time: 0.238	expansion
time: 0.000	maybe building test harness
time: 0.000	maybe creating a macro crate
time: 0.000	checking for inline asm in case the target doesn't support it
time: 0.000	complete gated feature checking
time: 0.008	early lint checks
time: 0.000	AST validation
time: 0.005	name resolution
time: 0.004	lowering ast -> hir
time: 0.002	indexing hir
time: 0.000	attribute checking
time: 0.001	language item collection
time: 0.001	lifetime resolution
time: 0.000	looking for entry point
time: 0.000	looking for plugin registrar
time: 0.001	region resolution
time: 0.000	loop checking
time: 0.000	static item recursion checking
time: 0.005	compute_incremental_hashes_map
time: 0.000	load_dep_graph
time: 0.000	stability index
time: 0.001	stability checking
time: 0.011	type collecting
time: 0.000	variance inference
time: 0.000	impl wf inference
time: 0.021	coherence checking
time: 0.008	wf checking
time: 0.003	item-types checking
time: 0.500	item-bodies checking
time: 0.000	drop-impl checking
time: 0.003	const checking
time: 0.001	privacy checking
time: 0.000	intrinsic checking
time: 0.000	effect checking
time: 0.001	match checking
time: 0.000	liveness checking
time: 0.002	rvalue checking
time: 0.014	MIR dump
  time: 0.001	SimplifyCfg
  time: 0.002	QualifyAndPromoteConstants
  time: 0.002	TypeckMir
  time: 0.000	SimplifyBranches
  time: 0.001	SimplifyCfg
time: 0.006	MIR cleanup and validation
time: 0.006	borrow checking
time: 0.000	reachability checking
time: 0.000	death checking
time: 0.000	unused lib feature checking
time: 1.366	lint checking
time: 0.000	resolving dependency formats
  time: 0.000	NoLandingPads
  time: 0.001	SimplifyCfg
  time: 0.001	EraseRegions
  time: 0.000	AddCallGuards
  time: 0.008	ElaborateDrops
  time: 0.000	NoLandingPads
  time: 0.001	SimplifyCfg
  time: 0.001	InstCombine
  time: 0.000	Deaggregator
  time: 0.000	CopyPropagation
  time: 0.001	SimplifyLocals
  time: 0.000	AddCallGuards
  time: 0.000	PreTrans
time: 0.015	MIR optimisations
  time: 0.001	write metadata
  time: 15.086	translation item collection
  time: 0.020	codegen unit partitioning
  time: 0.022	internalize symbols
time: 39.376	translation
time: 0.000	assert dep graph
time: 0.000	serialize dep graph
  time: 0.089	llvm function passes [0]
  time: 0.037	llvm module passes [0]
  time: 2.148	codegen passes [0]
  time: 0.001	codegen passes [0]
time: 2.432	LLVM passes
time: 0.000	serialize work products
time: 0.084	linking

Things are significantly worse on 1.13 - 2 minutes in translation!

Some discussion in IRC: https://botbot.me/mozilla/rust-internals/2016-12-22/?msg=78294648&page=1

cc @aturon


UPDATE: #40280 was closed as a duplicate of this. It had the following sample code:

let &(first, B, C, D, E, F, G, H) = self;
let iter = first.buffers_list();
let iter = iter.chain(B.buffers_list());
let iter = iter.chain(C.buffers_list());
let iter = iter.chain(D.buffers_list());
let iter = iter.chain(E.buffers_list());
let iter = iter.chain(F.buffers_list());
let iter = iter.chain(G.buffers_list());
let iter = iter.chain(H.buffers_list());
Box::new(iter)

--nmatsakis

@sfackler sfackler added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 22, 2016
@nikomatsakis
Copy link
Contributor

nikomatsakis commented Dec 22, 2016

Things are significantly worse on 1.13 - 2 minutes in translation!

This may be the effect of the projection cache.

@aturon
Copy link
Member

aturon commented Dec 22, 2016

cc @alexcrichton This is obviously a show-stopper for futures.

@nikomatsakis
Copy link
Contributor

I can do some profiling. I've had some plans for improving collection/trans that I think may be related. One question to try and answer is what %age of this is just "we are making more code" vs "we are wasting time doing things in trait selection that could be cached". I have observed the latter from time to time and had some thoughts on how to fix it.

@alexcrichton
Copy link
Member

I've also seen this before, with tokio-socks5 as well. Removing just a handful of the trait objects in that file makes the compile time of the crate shoot from 2.34s to 89.52s (!!)

@dwrensha
Copy link
Contributor

Here's a simple example of something that takes a very long time to compile:

future::ok::<(),()>(()).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(())).and_then(|()| Ok(()));

@nikomatsakis
Copy link
Contributor

Closed #40280 as a duplicate, moved some example code into the issue header.

@vorner
Copy link
Contributor

vorner commented Mar 18, 2017

Just a stupid question ‒ the „translation item collection“, which is one of the two culprits here, is listed in the „MIR optimisations“. Does it really have to happen at all on non-optimised debug build?

@bluss
Copy link
Member

bluss commented Mar 18, 2017

It's natural to read it that way, but the headers are actual after their group, so it's part of translation

@vorner
Copy link
Contributor

vorner commented Mar 27, 2017

This seems to be hard enough to take some time fixing. So I thought of a workaround, if someone is also interested. It uses the trick with placing trait objects to split the chains of modifiers, but only on testing/debug builds where the compilation speed matters, while it keeps the complex but hopefully faster concrete types in release build:

This one is for streams (that's what I needed), but can obviously work for futures or other things as well:

#[cfg(debug_assertions)]
fn test_boxed<T, E, S>(s: S) -> Box<Stream<Item = T, Error = E>>
    where S: Stream<Item = T, Error = E> + 'static
{
    Box::new(s)
}
#[cfg(not(debug_assertions))]
fn test_boxed<S>(s: S) -> S {
    s
}

(I didn't find any better config option than the debug_assertions, but I hope that one is good enough)

@Mark-Simulacrum
Copy link
Member

Nominating in place of #42941. cc @eddyb (nominator)

@jonhoo
Copy link
Contributor

jonhoo commented Jul 2, 2017

Repeating another bad case from #42941:

extern crate futures;
use futures::{future, IntoFuture, Future};
fn main() {
    let t: std::result::Result<(), ()> = Ok(());
    let f = t
            .into_future()
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()))
            .and_then(|_| future::ok(()));
    f.wait();
}

The code above takes ~750s to compile on my laptop (you can make it shorter/longer by removing/adding .and_then calls):

...
  time: 0.000; rss: 108MB       write metadata
  time: 472.840; rss: 110MB     translation item collection
  time: 0.001; rss: 112MB       codegen unit partitioning
  time: 0.001; rss: 133MB       internalize symbols
time: 732.741; rss: 133MB       translation
...

The extra ~250s is spent between codegen unit partitioning and internalize symbols.

@jonhoo
Copy link
Contributor

jonhoo commented Jul 6, 2017

Another observation from #42941 about why this is becoming even more important is that the newly-released hyper 0.11 uses futures everywhere, which will likely cause many more programs to use chains of futures like this going forward.

@nikomatsakis
Copy link
Contributor

triage: P-high

We are raising to "high priority" to at least do some investigation and try to determine whether revised trait solving strategies will be of use here.

@rust-highfive rust-highfive added P-high High priority and removed I-nominated labels Jul 6, 2017
@nikomatsakis nikomatsakis self-assigned this Jul 6, 2017
jonhoo added a commit to jonhoo/fantoccini that referenced this issue Jul 8, 2017
This commit should be reverted once that issue has been resolved.
@Mark-Simulacrum Mark-Simulacrum added the C-bug Category: This is a bug. label Jul 26, 2017
@arielb1
Copy link
Contributor

arielb1 commented Aug 10, 2017

status: waiting for niko

@ishitatsuyuki
Copy link
Contributor

Bump. This really hurts impl Trait (boxing hides this issue).

@aturon aturon added this to the impl period milestone Sep 4, 2017
@ishitatsuyuki
Copy link
Contributor

@arielb1 You seem to have created this FIXME. Can you give me some hints on implementing cross-infcx cache?

@ishitatsuyuki
Copy link
Contributor

I have launched a PoC in #48296 and I confirm that the exponential part should be resolved. I still observe some degree of polynomial time algorithms, which makes rustc chokes at about 50 chain().

@ishitatsuyuki
Copy link
Contributor

@hcpl I suspect you copied the wrong code for the specialized version above, as I observe the same time-passes behavior and the code lacks main() despite you specify --crate-type bin for the specialized version.

Can you check again, and if possible, provide the correct specialized version code?

@ishitatsuyuki
Copy link
Contributor

Update: although I have fixed the normalization part, the typeck still seems to be exponential on time, and very memory consuming. I'm investigating the cause of this second issue.

@ishitatsuyuki
Copy link
Contributor

I have implemented a fix for the typeck part too, and I believe there's no more exponential algorithms in rustc anymore.

@hcpl
Copy link

hcpl commented Feb 18, 2018

@ishitatsuyuki the code is correct, but both rustc invocations should have been with --crate-type=lib src/lib.rs (which were assumed to be in different crates). Sorry about that.

I've changed filenames to foo_generic.rs and foo_specialized.rs respectively to better reflect their purpose.

bors added a commit that referenced this issue Feb 18, 2018
Fix exponential projection complexity on nested types

This implements solution 1 from #38528 (comment).

The code quality is currently extremely poor, but we can improve them during review.

Blocking issues:

- we probably don't want a quadratic deduplication for obligations.
- is there an alternative to deduplication?

Based on #48315.

Needs changelog. Noticable improvement on compile time is expected.

Fix #38528
Close #39684
Close #43757
Manishearth added a commit to Manishearth/rust that referenced this issue Feb 24, 2018
…sakis

Fix exponential projection complexity on nested types

This implements solution 1 from rust-lang#38528 (comment).

The code quality is currently extremely poor, but we can improve them during review.

Blocking issues:

- we probably don't want a quadratic deduplication for obligations.
- is there an alternative to deduplication?

Based on rust-lang#48315.

Needs changelog. Noticable improvement on compile time is expected.

Fix rust-lang#38528
Close rust-lang#39684
Close rust-lang#43757
jonhoo added a commit to jonhoo/fantoccini that referenced this issue Feb 27, 2018
phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018
Compilation of tests is *really* slow, most likely due to
rust-lang/rust#41696
or
rust-lang/rust#38528
phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018
This commit should be reverted once that issue has been resolved.
phrohdoh pushed a commit to phrohdoh/fantoccini that referenced this issue May 11, 2018
kennytm added a commit to kennytm/rust that referenced this issue Aug 10, 2018
Consider changing assert! to debug_assert! when it calls visit_with

The perf run from rust-lang#52956 revealed that there were 3 benchmarks that benefited most from changing `assert!`s to `debug_assert!`s:

- issue rust-lang#46449: avg -4.7% for -check
- deeply-nested (AKA rust-lang#38528): avg -3.4% for -check
- regression rust-lang#31157: avg -3.2% for -check

I analyzed their fixing PRs and decided to look for potentially heavy assertions in the files they modified. I noticed that all of the non-trivial ones contained indirect calls to `visit_with()`.

It might be a good idea to consider changing `assert!` to `debug_assert!` in those places in order to get the performance wins shown by the benchmarks.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Aug 12, 2018
Consider changing assert! to debug_assert! when it calls visit_with

The perf run from rust-lang#52956 revealed that there were 3 benchmarks that benefited most from changing `assert!`s to `debug_assert!`s:

- issue rust-lang#46449: avg -4.7% for -check
- deeply-nested (AKA rust-lang#38528): avg -3.4% for -check
- regression rust-lang#31157: avg -3.2% for -check

I analyzed their fixing PRs and decided to look for potentially heavy assertions in the files they modified. I noticed that all of the non-trivial ones contained indirect calls to `visit_with()`.

It might be a good idea to consider changing `assert!` to `debug_assert!` in those places in order to get the performance wins shown by the benchmarks.
bors added a commit to rust-lang-ci/rust that referenced this issue Mar 3, 2022
remove obligation dedup from `impl_or_trait_obligations`

Looking at the examples from rust-lang#38528 they all seem to compile fine even without this and it seems like this might be unnecessary effort
phdavis1027 added a commit to phdavis1027/irods_rust that referenced this issue Apr 9, 2024
- we need buffers in the deserialization methods, so i want ahead
and added them to the serialization methods too
- `boxed` in the `and_then` chain apparently makes both runtime
  performance and compile time usable, see: [here]()rust-lang/rust#38528
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests