Rewrite after talking to @acrichto on IRC.
tl;dr:
 - Motivation is significantly expanded
 - Phase 1 is cut for being too half-assed
Ericson2314 committed May 29, 2015
1 parent 630ac97 commit 0d76898
Showing 1 changed file with 118 additions and 60 deletions.
178 changes: 118 additions & 60 deletions text/0000-cargo-libstd-awareness.md
@@ -5,92 +5,150 @@

# Summary

Currently, all packages implicitly depend on libstd. This makes Cargo unsuitable for packages that
need a custom-built libstd, or otherwise depend on crates with the same names as libstd and the
crates behind the facade. The proposed fixes also open the door to a future where libstd can be
Cargoized.
Currently, Cargo doesn't know whether packages depend on libstd. This makes Cargo unsuitable for
packages that need a cross-compiled or custom libstd, or otherwise depend on crates with the same
names as libstd and the crates behind the facade. The proposed fixes also open the door to a future
where libstd can be Cargoized.


# Motivation

Bare-metal work cannot use a standard build of libstd. But since any crate built with Cargo can link
with a system-installed libstd if the target matches, using Cargo for such projects can be irksome
or impossible.
First some background. The current situation seems to be more of an accident of `rustc`'s pre-Cargo
history than an explicit design decision. Cargo passes the location and name of all depended-on
crates to `rustc`. This method is good for a number of reasons stemming from its fine granularity,
such as:

Cargoizing libstd also generally simplifies the infrastructure, and makes cross compiling much
slicker, but that is a separate discussion.
- No undeclared dependencies can be used

Finally, I first raised this issue here: https://github.com/rust-lang/Cargo/issues/1096 Also, there
are some (heavily bit-rotted) projects at https://github.com/RustOS-Fork-Holding-Ground that depend
on each other in the way this RFC would make much more feasible.
- Conversely, `rustc` can warn against *unused* declared dependencies

# Detailed design
- Crate/symbol names are frobbed so that packages with overlapping names don't conflict


However, rather than passing in libstd and its deps, Cargo lets the compiler look for them as needed
in the compiler's sysroot [specifically `<sysroot>/lib/<target>`]. This is quite coarse in
comparison, and we lose all the advantages of the previous method:

- Packages may link or not link against libs in that directory as they please, with Cargo being
none the wiser.

- Cargo-built crates with the same name as those in there will collide, as the sysroot libs don't
have their names frobbed.

The current situation seems to be more of an accident of `rustc`'s pre-Cargo history than an
explicit design decision. Cargo passes the location and name of all depended-on crates to `rustc`.
This is good because it means that no undeclared dependencies on other Cargo packages can leak
through. However, it also passes in `--sysroot /path/to/some/libdir`, the directory being where
libstd is. This means packages are free to use libstd, the crates behind the facade, or none of the
above, with Cargo being none the wiser.
- Cross compiling may fail at build-time (as opposed to the much shorter
"gather-dependencies-time") because of missing packages

The only new interface proposed is a boolean field in the package metadata telling Cargo that the
package does not depend on libstd by default. This need not imply Rust's `no_std`, as one might want
to `use` their own build of libstd by default. To disambiguate, this field is called
`implicit-deps`; please, go ahead and bikeshed the name. `implicit-deps` is true by default to
maintain compatibility with existing packages.

The meaning of this flag is defined in 3 phases, where each phase extends the last. The idea is
that while earlier phases are easier to implement, later phases yield a more elegant system.
Cargo doesn't look inside the sysroot to see what is or isn't there, but it would hardly help if it
did, because it doesn't know what any package needs. Assuming all packages need libstd, for example,
means Cargo just flat-out won't build freestanding packages that just use libcore on a platform that
doesn't support libstd.

## Phase 1
For an anecdote: in https://github.com/RustOS-Fork-Holding-Ground I tried to rig up Cargo to cross
compile libstd for me. Since I needed to use an unstable compiler anyways, it was possible in
principle to build absolutely everything I needed with the same `rustc` version. Because of some
trouble with Cargo and target JSONs, I didn't use a custom target specification, and just used
`x86_64-unknown-linux-gnu`, meaning that depending on the platform I was compiling on, I may or may
not have been cross-compiling. In the case where I wasn't, I couldn't complete the build because
`rustc` complained about the libstd I was building overlapping with the libstd in the sysroot.

Add a `--use-sysroot=<true|false>` flag to `rustc`, where true is the default. Make Cargo pass
`--use-sysroot=false` to `rustc` in the case that `implicit-deps` is false.
For these reasons, most freestanding projects I know of avoid Cargo altogether, and just include
rust as a submodule and run make in it. Cargo can still be used if one manages to get the requisite
libraries in the sysroot. But this is a tedious operation that individual projects shouldn't need to
reimplement, and one that has serious security implications if the normal libstd is modified.

This hotfix is enough to allow us bare-metal devs to use Cargo for our own projects, but doesn't
suffice for creating an ecosystem of packages that depend on crates behind the facade but not libstd
itself. This is because the choices are all or nothing: Either one implicitly depends on libstd or
the crates behind the facade, or they don't depend on them at all.
The fundamental plan proposed in this RFC is to make sure that anything Cargo builds never blindly
links against libraries in the sysroot. This is achieved by making Cargo aware of all dependencies,
including those on libstd or its backing crates. That way, these problems are avoided.

## Phase 2
For the record, I first raised this issue [here](https://github.com/rust-lang/Cargo/issues/1096).

Since passing in a directory of crates is inherently more fragile than passing in a crate itself,
make Cargo use `--use-sysroot=false` in all cases.

Cargo would special-case package names corresponding to the crates behind the facade, such that if
the package doesn't exist, it would simply pass the corresponding system crate to `rustc`. I assume
the names are blacklisted on crates.io already, so by default the packages won't exist. But users
can use config files to extend the namespace so their own modded libstds can be used instead. Even
if they don't want to change libstd but just cross-compile it, this is frankly the easiest way as
Cargo will seamlessly cross compile both their project and its transitive dependencies.
# Detailed design

The only new interface proposed is a boolean field in `Cargo.toml` specifying that the package does
not depend on libstd by default. Note that this is technically orthogonal to Rust's `no_std`, as one
might want to `use` their own build of libstd by default, or implicitly depend on it but not
glob-import the prelude. To disambiguate, this field is called `implicit-deps`; please, go ahead and
bikeshed the name. `implicit-deps` is true by default to maintain compatibility with existing
packages. When true, "std" will be implicitly appended to the list of dependencies.
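
As a rough illustration of the defaulting rule above, here is a minimal sketch. The `Manifest`
struct and `effective_dependencies` function are hypothetical names invented for this example and
are not part of Cargo. Skipping the append when "std" is already listed is also an assumption of
the sketch, not something the RFC specifies.

```rust
/// Hypothetical in-memory view of the relevant part of a package manifest.
struct Manifest {
    /// Dependencies declared explicitly by the package.
    dependencies: Vec<String>,
    /// The proposed `implicit-deps` field; `None` means the field was omitted.
    implicit_deps: Option<bool>,
}

/// Returns the dependency list Cargo would go on to resolve under this proposal.
fn effective_dependencies(manifest: &Manifest) -> Vec<String> {
    let mut deps = manifest.dependencies.clone();
    // The field defaults to true so that existing packages keep working.
    let implicit = manifest.implicit_deps.unwrap_or(true);
    // Assumption of this sketch: do not append "std" twice if it is already declared.
    if implicit && !deps.iter().any(|d| d == "std") {
        deps.push("std".to_string());
    }
    deps
}
```

A package that omits the field gets "std" appended, while a freestanding package that sets
`implicit-deps = false` and lists only "core" keeps exactly the list it declared.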

When Cargo sees a package name it cannot resolve, it will query `rustc` for the default sysroot, and
look inside to see if it can find a matching rlib. [It is necessary to query `rustc` because the
`rustc` directory layout is not stabilized and `rustc` and Cargo are versioned independently. The
same version issues make giving Cargo a whitelist of potential standard library crate names
risky.] If a matching rlib is successfully found, Cargo will copy it (or symlink it) into the
project's build directory as if it had built the rlib. Each rlib in the sysroot must be paired with
some sort of manifest listing its dependencies, so Cargo can copy those too.
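
The fallback described above could look roughly like the following sketch. It models the sysroot as
an in-memory map rather than touching the filesystem or querying `rustc`. The `Sysroot` type and
`collect_from_sysroot` function are hypothetical names for illustration only, and the manifest
format is left abstract because this RFC does not define one.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical model of the sysroot: rlib name -> dependencies listed in its manifest.
/// `None` models an rlib that is present but has no dependency manifest.
type Sysroot = BTreeMap<String, Option<Vec<String>>>;

/// Collects `name` and its transitive dependencies from the sysroot into `to_copy`.
/// Fails if an rlib or its dependency manifest is missing.
fn collect_from_sysroot(
    name: &str,
    sysroot: &Sysroot,
    to_copy: &mut BTreeSet<String>,
) -> Result<(), String> {
    if to_copy.contains(name) {
        return Ok(()); // already scheduled for copying
    }
    let deps = match sysroot.get(name) {
        None => return Err(format!("standard library crate `{}` not found in sysroot", name)),
        Some(None) => return Err(format!("no dependency manifest for `{}`", name)),
        Some(Some(deps)) => deps,
    };
    // Record the crate before recursing so a dependency cycle cannot loop forever.
    to_copy.insert(name.to_string());
    for dep in deps {
        collect_from_sysroot(dep, sysroot, to_copy)?;
    }
    Ok(())
}
```

On success, `to_copy` holds every rlib that would be copied or symlinked into the build directory.
On failure, Cargo would report the error and stop rather than let `rustc` search the sysroot itself.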

In this way we can put packages on crates.io that depend on the crates behind the facade. Some
packages that already exist, like liblog and libbitflags, should be given features that optionally
allow them to avoid libstd and just depend directly on the crates behind the facade they really
need.
`rustc` will have a new `--use-sysroot=<true|false>` flag. When Cargo builds a package, it will
always pass `--use-sysroot=false` to `rustc`, as any rlibs it needs will have been copied to the
build directory. Cargo can and will then pass those rlibs directly just as it does with normal Cargo
deps.
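
A minimal sketch of how the dependency-related arguments might then be assembled. Note that
`--use-sysroot` is the flag proposed in this RFC and does not exist in `rustc` today, whereas
`--extern name=path` is the existing mechanism Cargo uses for ordinary dependencies. The
`dependency_args` function is a hypothetical helper, not Cargo's actual code.

```rust
use std::path::PathBuf;

/// Builds the dependency-related arguments for one `rustc` invocation.
/// `--use-sysroot` is the flag proposed in this RFC, not an existing rustc flag.
fn dependency_args(externs: &[(String, PathBuf)]) -> Vec<String> {
    // Never let rustc search the sysroot on its own.
    let mut args = vec!["--use-sysroot=false".to_string()];
    for (name, path) in externs {
        // Sysroot-derived rlibs are passed exactly like normal Cargo dependencies.
        args.push("--extern".to_string());
        args.push(format!("{}={}", name, path.display()));
    }
    args
}
```

The point of the design is that `rustc` sees only what Cargo passes explicitly, so an rlib copied
from the sysroot and one built from source are indistinguishable at this stage.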

## Phase 3
If Cargo cannot find the libraries it needs in the sysroot, or a library's dependency manifest is
missing, it will complain that the standard libraries needed for the current job are missing and
give up.

If/when the standard library is built with Cargo and put on crates.io, all the specially-cased
package names can be treated normally.
## Future Compatibility

The standard library is downloaded and built from crates.io. Or equivalently, Cargo comes with a
cache of that build, as Cargo should be able to cache builds between projects at this point. Just as
in phase 2, `implicit-deps = false` just prevents libstd from implicitly being appended to the list
of dependencies.
In the future, rather than giving up if libraries are missing, Cargo could attempt to download them
from some build cache. In the farther future, the stdlib libraries may be Cargoized, and Cargo may
be able to query pre-built binaries for any arbitrary package. In that scenario, we can remove all
code relating to falling back on the sysroot to look for rlibs.

In the meantime, developers living dangerously with an unstable compiler can package the standard
library themselves, and use their Cargo config file to get Cargo to cross compile libstd for them.

Again, to make this as uncontroversial as possible, this RFC does not propose outright that the
standard library should be Cargoized. This third phase just describes how this feature would work
were that to happen.

# Drawbacks

I really don't know of any. Development for hosted environments would hardly be very affected.
Cargo does more work than is strictly necessary for rlibs installed in sysroot; some more metadata
must be maintained by `rustc` or its installation.

- But in a future where Cargo can build stdlib like any other, all this cruft goes away.


# Alternatives

Make it so all dependencies, even libstd, must be explicit. Cf. Cabal and base.
- Simply have `implicit-deps = false` make Cargo pass `--use-sysroot=false` to `rustc`.

- This doesn't by itself provide a way for a package to depend on only some of the crates behind
the facade. That, in turn, means Cargo is little better at cross compiling those than before.

- While unstable compiler users can just package the standard library and depend on it as a
normal crate, it would be weird to have freestanding projects coalesce around some bootleg
libcore on crates.io.

- Make it so all dependencies, even libstd, must be explicit. Cf. Cabal and base. Slightly
simpler, but breaks nearly all existing packages.

- Don't track stdlib dependencies. Then, in the future when Cargo tries to obtain libs for cross
compiling, stick them in the sysroot instead. Cargo either assumes a package needs all of stdlib,
or examines the target to see what crates behind the facade are buildable and just goes for those.

- Cargo does extra work if you need less of the stdlib

- No nice migration into a world where Cargo can build stdlib without hacks.


# Unresolved questions

There are multiple lists of dependencies for different things (e.g. tests). Should libstd be
appended to all of them in phases 2 and 3?
- There are multiple lists of dependencies for different things (e.g. tests). Should libstd be
appended to all of them in phases 2 and 3?

- Should rlibs in the sysroot respect Cargo name-frobbing conventions? If they don't, should Cargo
frob the name when it copies it (e.g. with `ld -i`)?

- Just as we make libstd a real dependency, we can make `rustc` a real dev dependency. The standard
library can thus be built with Cargo by depending on the associated unstable compiler. There are
some challenges to be overcome, including:

- Teaching Cargo and its frobber an "x can build for y" relation for stable/unstable compiler
compatibility, rather than simply assuming all distinct compilers are mutually incompatible.

- Coalescing a "virtual package" out of many different packages with disjoint dependencies. This
is needed because different `rustc` versions have different library implementations that
present the same interface.

This almost certainly is better addressed in a later RFC.
