Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

speed up static linking by combining ar invocations #15670

Merged
merged 1 commit into from
Jul 30, 2014
Merged

speed up static linking by combining ar invocations #15670

merged 1 commit into from
Jul 30, 2014

Conversation

spernsteiner
Copy link
Contributor

When rustc produces an rlib, it includes the contents of each static library required by the crate. Currently each static library is added individually, by extracting the library with ar x and adding the objects to the rlib using ar r. Each ar r has significant overhead - it appears to scan through the full contents of the rlib before adding the new files. This patch avoids most of the overhead by adding all library objects (and other rlib components) at once using a single ar r.

When building librustc (on Linux, using GNU ar), this patch gives a 60-80% reduction in linking time, from 90s to 10s one machine I tried and 25s to 8s on another. (Though librustc is a bit of a special case - it's a very large crate, so the rlib is large to begin with, and it also relies on a total of 45 static libraries due to the way LLVM is organized.) More reasonable crates such as libstd and libcore also get a small reduction in linking time (just from adding metadata, bitcode, and object code in one ar invocation instead of three), but this is not very noticeable since the time there is small to begin with (around 1s).

@@ -1064,11 +1067,12 @@ fn link_rlib<'a>(sess: &'a Session,
abi::OsMacos | abi::OsiOS => {}
_ => { a.update_symbols(); }
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be necessary now that the entire archive is built in one step, could you try omitting the call to update_symbols entirely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To eliminate the extra ar invocation here, I've moved update_symbols into ArchiveBuilder, where it simply sets a flag to indicate that the s (update symbols) option should be used when build runs ar. (Originally I had build pass s always, but after reading the comment above this line and #11162, it sounds like that will crash ar on OSX.)

@alexcrichton
Copy link
Member

Those are some nice speedups, nice catch!

@sfackler
Copy link
Member

needs a rebase

@spernsteiner
Copy link
Contributor Author

I've rebased this, which required a bit of refactoring - there was a conflict with #15511, which refactored rustc::back::archive and moved it into a different crate.

@alexcrichton
Copy link
Member

Well that's a surprising error! I suppose we could batch the commands on windows instead of doing it all in one go.

If you feel like going whole-hog on this issue, I know that LLVM has an llvm-ar tool which is in theory a replacement for ar. This also means that they should have all the library support in LLVM proper to write all of this without invoking an external ar tool, but that's also dreaming a bit!

@spernsteiner
Copy link
Contributor Author

I've modified ArchiveBuilder::build to use multiple ar invocations when the total length of the argument list would exceed 32k (the limit on Windows). I also made it run ar out of the temp directory, using just the filename instead of the absolute path for each archive member, which on its own is sufficient to get rustc_llvm's ar command under 32k.

I looked at llvm-ar, but the archive-writing code is in llvm-ar.cpp, not a library. The format is pretty simple, so it wouldn't be that hard to write a pure-Rust archive writer, but there are also some annoying details regarding GNU format vs. BSD format archives, and I figured it was best to leave that for the system binutils to sort out.

@alexcrichton
Copy link
Member

Sounds like a good plan.

@spernsteiner
Copy link
Contributor Author

I've fixed the error on OSX by changing how the rlib symbol index is updated on that platform. Previously, link_rlib produced rlibs that were unusable because they did not contain a symbol index at all (update_symbols was never run). The simple fix (always run update_symbols) doesn't work because ar apparently still has trouble dealing with non-object files when producing a symbol table. So link_rlib now makes two ar invocations: one to add all object files and update symbols, and a second to add non-object files (metadata and bitcode) without updating symbols. (link_staticlib may make a third invocation if there are more objects to add.)

On non-OSX platforms, link_rlib/link_staticlib still uses a single ar invocation to add all object and non-object members and generate symbols.

bors added a commit that referenced this pull request Jul 30, 2014
When rustc produces an rlib, it includes the contents of each static library required by the crate.  Currently each static library is added individually, by extracting the library with `ar x` and adding the objects to the rlib using `ar r`.  Each `ar r` has significant overhead - it appears to scan through the full contents of the rlib before adding the new files.  This patch avoids most of the overhead by adding all library objects (and other rlib components) at once using a single `ar r`.

When building `librustc` (on Linux, using GNU ar), this patch gives a 60-80% reduction in linking time, from 90s to 10s one machine I tried and 25s to 8s on another.  (Though `librustc` is a bit of a special case - it's a very large crate, so the rlib is large to begin with, and it also relies on a total of 45 static libraries due to the way LLVM is organized.)  More reasonable crates such as `libstd` and `libcore` also get a small reduction in linking time (just from adding metadata, bitcode, and object code in one `ar` invocation instead of three), but this is not very noticeable since the time there is small to begin with (around 1s).
@bors bors closed this Jul 30, 2014
@bors bors merged commit 4d8de63 into rust-lang:master Jul 30, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants