
Seemingly pathological behavior in typechecking, item collecting, and trans #33594

Closed
alexcrichton opened this issue May 12, 2016 · 12 comments

@alexcrichton (Member)

In working on a prototype futures library I've played around a bit with a small example program. Unfortunately this 20-line function has a 12 second compile time in debug mode. Looking at -Z time-passes, the worst offenders are:

$ rustup run nightly cargo rustc --example retry -- -Z time-passes
...
time: 2.266; rss: 87MB  item-bodies checking
...
time: 0.000; rss: 95MB  Prepare MIR codegen passes
  time: 3.745; rss: 106MB       translation item collection                                             
  time: 0.000; rss: 124MB       write metadata                                                          
time: 9.259; rss: 124MB translation                                                                     
time: 0.000; rss: 124MB assert dep graph
time: 0.000; rss: 124MB serialize dep graph
  time: 0.021; rss: 73MB        llvm function passes [0]
  time: 0.005; rss: 74MB        llvm module passes [0]  
  time: 0.365; rss: 79MB        codegen passes [0]      
  time: 0.000; rss: 71MB        codegen passes [0]      
time: 0.402; rss: 71MB  LLVM passes                     
  time: 0.042; rss: 71MB        running linker
time: 0.043; rss: 71MB  linking               

Some interesting data points:

  • Item-bodies checking is taking over two seconds? (cc @nikomatsakis)
  • Translation item collection is taking nearly four seconds. (cc @michaelwoerister)
  • Trans itself is taking around six seconds (cc @arielb1, @dotdash). I wouldn't have thought this program monomorphizes massive amounts of code, at least not six seconds' worth.

FWIW, the perf record of this compile looks like the hottest functions are (those above 1%):

 23.53%     rustc  librustc-031d5509.so                  [.] _$LT$ty..subst..Substs$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h67025cda0fa7ab1b                                                
 10.49%     rustc  librustc-031d5509.so                  [.] _$LT$collections..vec..Vec$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::ha06e4f3f77f47962                                                   
  4.92%     rustc  librustc-031d5509.so                  [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::h97a8507e2c811a37                                                          
  4.88%     rustc  librustc-031d5509.so                  [.] _$LT$std..collections..HashMap$LT$K$C$$u20$V$C$$u20$S$GT$$GT$::get::hfbcbea9f0e6ca77f                                                          
  2.79%     rustc  libstd-031d5509.so                    [.] mallocx                                                                                                                                        
  2.73%     rustc  libstd-031d5509.so                    [.] sdallocx                                                                                                                                       
  2.43%     rustc  librustc-031d5509.so                  [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h13defa770fe420c4                                            
  2.16%     rustc  librustc-031d5509.so                  [.] rustc::ty::fold::TypeFolder::fold_ty::hc4acd00a49a1567e                                                                                        
  1.71%     rustc  librustc-031d5509.so                  [.] _$LT$ty..sty..TypeVariants$LT$$u27$tcx$GT$$u20$as$u20$core..cmp..PartialEq$GT$::eq::he128d17c0cf9651e                                          
  1.57%     rustc  librustc-031d5509.so                  [.] rustc::ty::context::TyCtxt::intern_ty::h30fe821a54c8bb56                                                                                       
  1.53%     rustc  libc-2.19.so                          [.] __memmove_ssse3_back                                                                                                                           
  1.24%     rustc  librustc-031d5509.so                  [.] rustc::ty::context::TyCtxt::mk_substs::hf9ef6f493c09a1fb                                                                                       
  1.19%     rustc  librustc-031d5509.so                  [.] _$LT$ty..sty..Binder$LT$T$GT$$u20$as$u20$core..hash..Hash$GT$::hash::h1d6271393e0c767b                                                         
  1.05%     rustc  librustc-031d5509.so                  [.] traits..ObligationCause$LT$$u27$static$GT$::drop.80363::hae63a8e62161292a                                                                      
  1.04%     rustc  librustc-031d5509.so                  [.] _$LT$infer..resolve..OpportunisticTypeResolver$LT$$u27$a$C$$u20$$u27$tcx$GT$$u20$as$u20$ty..fold..TypeFolder$LT$$u27$tcx$GT$$GT$::fold_ty::hffa

We seem to be hashing... a lot!

@michaelwoerister (Member)

Yes, that definitely looks excessive. Thanks for the report!

@dotdash (Contributor) commented May 12, 2016

Reduced test case:

extern crate futures;

use futures::*;

fn main() {
    let a: Box<Future<Item=i32, Error=()>> = loop {};
    a
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t)
        .map(|t| t);
}
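The shape of the problem in the reduced test case can be sketched without the futures crate at all. In the following hypothetical example, `Wrap` and its `map` stand in for `Map<F, C>` and `Future::map`: each chained call wraps the receiver's type in another generic layer, so after N calls the type is N layers deep, and any pass that hashes or compares whole types (as the `Substs` hashing in the profile above does) performs work proportional to that depth at every step.

```rust
// Hypothetical stand-in for futures' Map combinator: each call to `map`
// nests the receiver's type one level deeper.
struct Wrap<T>(T);

impl<T> Wrap<T> {
    fn map(self) -> Wrap<Wrap<T>> {
        Wrap(self)
    }
}

// Helper to inspect the monomorphized type of a value.
fn name_of<T>(val: &T) -> &'static str {
    let _ = val;
    std::any::type_name::<T>()
}

fn main() {
    // Three chained calls produce Wrap<Wrap<Wrap<Wrap<i32>>>>.
    let w = Wrap(0i32).map().map().map();
    let name = name_of(&w);
    assert_eq!(name.matches("Wrap").count(), 4);
    println!("{name}");
}
```

With fourteen chained `.map(|t| t)` calls, as in the test case, the outermost type nests fourteen `Map` layers (each also carrying a distinct closure type), which is why even a 20-line function can be expensive to typecheck and translate.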

@Marwes (Contributor) commented May 12, 2016

I believe this is the same issue as in #22204 and #31849. @nikomatsakis attempted to add a cache for projections in #32287, but there were some issues with higher-ranked lifetimes.
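The idea behind such a cache can be illustrated with a toy sketch. This is not rustc's actual projection cache (the type names and the string-keyed map are invented for illustration); it just shows the memoization pattern: repeated requests to normalize the same projection should do the expensive trait-selection work once.

```rust
use std::collections::HashMap;

// Toy sketch of a projection cache (not rustc's real implementation):
// memoize normalization results keyed by the projection being resolved.
struct ProjectionCacheSketch {
    memo: HashMap<String, String>,
    misses: u32,
}

impl ProjectionCacheSketch {
    fn new() -> Self {
        Self { memo: HashMap::new(), misses: 0 }
    }

    fn normalize(&mut self, projection: &str) -> String {
        if let Some(hit) = self.memo.get(projection) {
            return hit.clone();
        }
        // Cache miss: stand-in for the expensive trait-selection work.
        self.misses += 1;
        let result = format!("normalized({projection})");
        self.memo.insert(projection.to_string(), result.clone());
        result
    }
}

fn main() {
    let mut cache = ProjectionCacheSketch::new();
    // Without a cache, every one of these queries would redo the work.
    for _ in 0..1000 {
        cache.normalize("<Map<F, C> as Future>::Item");
    }
    assert_eq!(cache.misses, 1);
    println!("misses: {}", cache.misses);
}
```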

@alexcrichton (Member, Author)

@dotdash that test case seems to stress the pathological performance in "item-bodies checking", but it didn't seem to reproduce the massive amount of time in trans (particularly the item collector). Perhaps that's the same as #22204, which @Marwes points out, but there may be something else lurking?

@Marwes yeah, I'd believe that the typechecking performance cliff is related to that, but that's only about 15% of the original test's compile time; the remaining parts were all in trans.

@dotdash (Contributor) commented May 12, 2016

@alexcrichton I think that the difference is just that trans hits the bad code path less often with this variant, not that there's actually a problem in trans. The profile you provided kind of supports that idea as well.

@Marwes (Contributor) commented May 12, 2016

@alexcrichton Hmm, I think I got the times of the chained iterator example mixed up with the times I get for compiling the parser for embed_lang, which is definitely slow due to the lack of cached projection types.

...
time: 9.079; rss: 101MB item-bodies checking
...
time: 0.009; rss: 108MB Prepare MIR codegen passes
  time: 198.739; rss: 157MB     translation item collection
  time: 0.129; rss: 324MB       write metadata
time: 636.004; rss: 315MB       translation
time: 0.000; rss: 315MB assert dep graph
time: 0.000; rss: 315MB serialize dep graph
  time: 0.601; rss: 198MB       llvm function passes [0]
  time: 0.197; rss: 200MB       llvm module passes [0]
  time: 13.754; rss: 199MB      codegen passes [0]
  time: 0.008; rss: 76MB        codegen passes [0]
time: 15.669; rss: 73MB LLVM passes
time: 0.633; rss: 75MB  linking

@michaelwoerister (Member)

> time: 198.739; rss: 157MB translation item collection

wat!
Is this for @dotdash's test case?

@Marwes (Contributor) commented May 12, 2016

@michaelwoerister Sorry, I wasn't clear enough! That was for compiling embed_lang's parser (cargo rustc --features test -- -Z time-passes, to be precise). To be fair, all times are actually doubled: passing --features test creates two concrete parsers (I accidentally passed that flag out of habit).

@nikomatsakis (Contributor)

So for reference, I tried Alex's futures stuff with my projection cache and it made type-checking basically free, as well as substantially lowering the cost of everything else -- but trans and friends remained fairly expensive. I don't have the measurements anymore, unfortunately. I hope to land the projection cache soon-ish; I have to close up the unsoundness that the cache revealed first.

@sophiajt (Contributor)

@nikomatsakis - if you want I can also try it in the crates timing thing I was working on with Alex.

@nikomatsakis (Contributor)

Note that the projection cache landed. So type-checking should be faster, if not the backend. More work is needed there -- this may be because the projection cache is tied to an inference context, and we make a fresh one each time in trans, but there may be other factors.
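The cache-lifetime problem described here (a cache tied to an inference context that trans recreates per query) can be sketched with a hypothetical miss counter. The `normalize_with` helper and the query strings below are invented for illustration; the point is only that a cache scoped to a short-lived context starts cold every time, while one long-lived cache amortizes the work.

```rust
use std::collections::HashMap;

// Hypothetical illustration: `normalize_with` stands in for projection
// normalization, counting how often the expensive path (a cache miss) runs.
fn normalize_with(
    cache: &mut HashMap<&'static str, u32>,
    key: &'static str,
    misses: &mut u32,
) -> u32 {
    *cache.entry(key).or_insert_with(|| {
        *misses += 1; // expensive work happens only on a miss
        42
    })
}

fn main() {
    let queries = ["<A as Future>::Item"; 100];

    // Fresh context (and thus fresh cache) per query: every query misses.
    let mut cold_misses = 0;
    for q in queries {
        let mut fresh = HashMap::new();
        normalize_with(&mut fresh, q, &mut cold_misses);
    }

    // One cache shared across all queries: only the first query misses.
    let mut warm_misses = 0;
    let mut shared = HashMap::new();
    for q in queries {
        normalize_with(&mut shared, q, &mut warm_misses);
    }

    assert_eq!((cold_misses, warm_misses), (100, 1));
    println!("cold: {cold_misses}, warm: {warm_misses}");
}
```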

@Mark-Simulacrum (Member)

I think this is the same issue as #38528, so closing in favor of that.
