From 95bab3ed0a03ac255f3e798b3974612f392ef28e Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Sat, 24 Feb 2018 17:14:56 -0500 Subject: [PATCH 1/4] rework the MIR intro section, breaking out passes and visitors --- src/SUMMARY.md | 4 + src/background.md | 122 ++++++++++++++ src/glossary.md | 2 + src/mir-background.md | 122 ++++++++++++++ src/mir-borrowck.md | 57 ++++++- src/mir-passes.md | 169 +++++++++++++++++++ src/mir-regionck.md | 376 ++++++++++++++++++++++++++++++++++++++++++ src/mir-visitor.md | 45 +++++ src/mir.md | 303 +++++++++++++++++++++++++--------- 9 files changed, 1121 insertions(+), 79 deletions(-) create mode 100644 src/background.md create mode 100644 src/mir-background.md create mode 100644 src/mir-passes.md create mode 100644 src/mir-regionck.md create mode 100644 src/mir-visitor.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 4ead05705..29ad79998 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -24,11 +24,15 @@ - [Type checking](./type-checking.md) - [The MIR (Mid-level IR)](./mir.md) - [MIR construction](./mir-construction.md) + - [MIR visitor](./mir-visitor.md) + - [MIR passes: getting the MIR for a function](./mir-passes.md) - [MIR borrowck](./mir-borrowck.md) + - [MIR-based region checking (NLL)](./mir-regionck.md) - [MIR optimizations](./mir-optimizations.md) - [Constant evaluation](./const-eval.md) - [miri const evaluator](./miri.md) - [Parameter Environments](./param_env.md) - [Generating LLVM IR](./trans.md) +- [Background material](./background.md) - [Glossary](./glossary.md) - [Code Index](./code-index.md) diff --git a/src/background.md b/src/background.md new file mode 100644 index 000000000..92ae6507a --- /dev/null +++ b/src/background.md @@ -0,0 +1,122 @@ +# Background topics + +This section covers a number of common compiler terms that arise in +this guide. We try to give the general definition while providing some +Rust-specific context. + + + +## What is a control-flow graph? 
+ +A control-flow graph is a common term from compilers. If you've ever +used a flow-chart, then the concept of a control-flow graph will be +pretty familiar to you. It's a representation of your program that +exposes the underlying control flow in a very clear way. + +A control-flow graph is structured as a set of **basic blocks** +connected by edges. The key idea of a basic block is that it is a set +of statements that execute "together" -- that is, whenever you branch +to a basic block, you start at the first statement and then execute +all the remainder. Only at the end of the block is there the possibility of +branching to more than one place (in MIR, we call that final statement +the **terminator**): + +``` +bb0: { + statement0; + statement1; + statement2; + ... + terminator; +} +``` + +Many expressions that you are used to in Rust compile down to multiple +basic blocks. For example, consider an if statement: + +```rust +a = 1; +if some_variable { + b = 1; +} else { + c = 1; +} +d = 1; +``` + +This would compile into four basic blocks: + +``` +BB0: { + a = 1; + if some_variable { goto BB1 } else { goto BB2 } +} + +BB1: { + b = 1; + goto BB3; +} + +BB2: { + c = 1; + goto BB3; +} + +BB3: { + d = 1; + ...; +} +``` + +When using a control-flow graph, a loop simply appears as a cycle in +the graph, and the `break` keyword translates into a path out of that +cycle. + + + +## What is a dataflow analysis? + +*to be written* + + + +## What is "universally quantified"? What about "existentially quantified"? + +*to be written* + + + +## What is co- and contra-variance? + +*to be written* + + + +## What is a "free region" or a "free variable"? What about "bound region"? + +Let's describe the concepts of free vs bound in terms of program +variables, since that's the thing we're most familiar with. + +- Consider this expression: `a + b`. In this expression, `a` and `b` + refer to local variables that are defined *outside* of the + expression. 
We say that those variables **appear free** in the + expression. To see why this term makes sense, consider the next + example. +- In contrast, consider this expression, which creates a closure: `|a, + b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments + that the closure will be given when it is called. We say that the + `a` and `b` there are **bound** to the closure, and that the closure + signature `|a, b|` is a **binder** for the names `a` and `b` + (because any references to `a` or `b` within refer to the variables + that it introduces). + +So there you have it: a variable "appears free" in some +expression/statement/whatever if it refers to something defined +outside of that expression/statement/whatever. Equivalently, we can +then refer to the "free variables" of an expression -- which is just +the set of variables that "appear free". + +So what does this have to do with regions? Well, we can apply the +analogous concept to types and regions. For example, in the type `&'a +u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it +does not. diff --git a/src/glossary.md b/src/glossary.md index 5d76cf3cf..2aa9b52f1 100644 --- a/src/glossary.md +++ b/src/glossary.md @@ -18,6 +18,7 @@ HirId | identifies a particular node in the HIR by combining HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers. ICE | internal compiler error. When the compiler crashes. ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled. +inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents the value you are trying to find. Think of `X` in algebra. 
infcx | the inference context (see `librustc/infer`) IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. local crate | the crate currently being compiled. @@ -25,6 +26,7 @@ LTO | Link-Time Optimizations. A set of optimizations offer [LLVM] | (actually not an acronym :P) an open-source compiler backend. It accepts LLVM IR and outputs native binaries. Various languages (e.g. Rust) can then implement a compiler front-end that output LLVM IR and use LLVM to compile to all the platforms LLVM supports. MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html)) miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html)) +newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices. node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`. obligation | something that must be proven by the trait system ([see more](trait-resolution.html)) provider | the function that executes a query ([see more](query.html)) diff --git a/src/mir-background.md b/src/mir-background.md new file mode 100644 index 000000000..38fba5d16 --- /dev/null +++ b/src/mir-background.md @@ -0,0 +1,122 @@ +# MIR Background topics + +This section covers a number of common compiler terms that arise when +talking about MIR and optimizations. We try to give the general +definition while providing some Rust-specific context. + + + +## What is a control-flow graph? + +A control-flow graph is a common term from compilers. 
If you've ever +used a flow-chart, then the concept of a control-flow graph will be +pretty familiar to you. It's a representation of your program that +exposes the underlying control flow in a very clear way. + +A control-flow graph is structured as a set of **basic blocks** +connected by edges. The key idea of a basic block is that it is a set +of statements that execute "together" -- that is, whenever you branch +to a basic block, you start at the first statement and then execute +all the remainder. Only at the end of the block is there the possibility of +branching to more than one place (in MIR, we call that final statement +the **terminator**): + +``` +bb0: { + statement0; + statement1; + statement2; + ... + terminator; +} +``` + +Many expressions that you are used to in Rust compile down to multiple +basic blocks. For example, consider an if statement: + +```rust +a = 1; +if some_variable { + b = 1; +} else { + c = 1; +} +d = 1; +``` + +This would compile into four basic blocks: + +``` +BB0: { + a = 1; + if some_variable { goto BB1 } else { goto BB2 } +} + +BB1: { + b = 1; + goto BB3; +} + +BB2: { + c = 1; + goto BB3; +} + +BB3: { + d = 1; + ...; +} +``` + +When using a control-flow graph, a loop simply appears as a cycle in +the graph, and the `break` keyword translates into a path out of that +cycle. + + + +## What is a dataflow analysis? + +*to be written* + + + +## What is "universally quantified"? What about "existentially quantified"? + +*to be written* + + + +## What is co- and contra-variance? + +*to be written* + + + +## What is a "free region" or a "free variable"? What about "bound region"? + +Let's describe the concepts of free vs bound in terms of program +variables, since that's the thing we're most familiar with. + +- Consider this expression: `a + b`. In this expression, `a` and `b` + refer to local variables that are defined *outside* of the + expression. We say that those variables **appear free** in the + expression. 
To see why this term makes sense, consider the next + example. +- In contrast, consider this expression, which creates a closure: `|a, + b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments + that the closure will be given when it is called. We say that the + `a` and `b` there are **bound** to the closure, and that the closure + signature `|a, b|` is a **binder** for the names `a` and `b` + (because any references to `a` or `b` within refer to the variables + that it introduces). + +So there you have it: a variable "appears free" in some +expression/statement/whatever if it refers to something defined +outside of that expression/statement/whatever. Equivalently, we can +then refer to the "free variables" of an expression -- which is just +the set of variables that "appear free". + +So what does this have to do with regions? Well, we can apply the +analogous concept to types and regions. For example, in the type `&'a +u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it +does not. diff --git a/src/mir-borrowck.md b/src/mir-borrowck.md index 55bc9fc98..b632addc2 100644 --- a/src/mir-borrowck.md +++ b/src/mir-borrowck.md @@ -1 +1,56 @@ -# MIR borrowck +# MIR borrow check + +The borrow check is Rust's "secret sauce" -- it is tasked with +enforcing a number of properties: + +- That all variables are initialized before they are used. +- That you can't move the same value twice. +- That you can't move a value while it is borrowed. +- That you can't access a place while it is mutably borrowed (except through the reference). +- That you can't mutate a place while it is shared borrowed. +- etc + +At the time of this writing, the code is in a state of transition. The +"main" borrow checker still works by processing [the HIR](hir.html), +but that is being phased out in favor of the MIR-based borrow checker. 
+Doing borrow checking on MIR has two key advantages: + +- The MIR is *far* less complex than the HIR; the radical desugaring + helps prevent bugs in the borrow checker. (If you're curious, you + can see + [a list of bugs that the MIR-based borrow checker fixes here][47366].) +- Even more importantly, using the MIR enables ["non-lexical lifetimes"][nll], + which are regions derived from the control-flow graph. + +[47366]: https://github.com/rust-lang/rust/issues/47366 +[nll]: http://rust-lang.github.io/rfcs/2094-nll.html + +### Major phases of the borrow checker + +The borrow checker source is found in +[the `rustc_mir::borrow_check` module][b_c]. The main entry point is +the `mir_borrowck` query. At the time of this writing, MIR borrowck can operate +in several modes, but this text will describe only the mode when NLL is enabled +(what you get with `#![feature(nll)]`). + +[b_c]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check + +The overall flow of the borrow checker is as follows: + +- We first create a **local copy** C of the MIR. We will be modifying + this copy in place to modify the types and things to include + references to the new regions that we are computing. +- We then invoke `nll::replace_regions_in_mir` to modify this copy C. + Among other things, this function will replace all of the regions in + the MIR with fresh [inference variables](glossary.html). + - (More details can be found in [the regionck section](./mir-regionck.html).) +- Next, we perform a number of [dataflow analyses](./background.html#dataflow) + that compute what data is moved and when. The results of these analyses + are needed to do both borrow checking and region inference. +- Using the move data, we can then compute the values of all the regions in the MIR. + - (More details can be found in [the NLL section](./mir-regionck.html).) 
+- Finally, the borrow checker itself runs, taking as input (a) the + results of move analysis and (b) the regions computed by the region + checker. This allows us to figure out which loans are still in scope + at any particular point. + diff --git a/src/mir-passes.md b/src/mir-passes.md new file mode 100644 index 000000000..2fe471385 --- /dev/null +++ b/src/mir-passes.md @@ -0,0 +1,169 @@ +# MIR passes + +If you would like to get the MIR for a function (or constant, etc), +you can use the `optimized_mir(def_id)` query. This will give you back +the final, optimized MIR. For foreign def-ids, we simply read the MIR +from the other crate's metadata. But for local def-ids, the query will +construct the MIR and then iteratively optimize it by putting it +through various pipeline stages. This section describes those pipeline +stages and how you can extend them. + +To produce the `optimized_mir(D)` for a given def-id `D`, the MIR +passes through several suites of optimizations, each represented by a +query. Each suite consists of multiple optimizations and +transformations. These suites represent useful intermediate points +where we want to access the MIR for type checking or other purposes: + +- `mir_build(D)` – not a query, but this constructs the initial MIR +- `mir_const(D)` – applies some simple transformations to make MIR ready for constant evaluation; +- `mir_validated(D)` – applies some more transformations, making MIR ready for borrow checking; +- `optimized_mir(D)` – the final state, after all optimizations have been performed. + +### Seeing how the MIR changes as the compiler executes + +`-Zdump-mir=F` is a handy compiler option that will let you view the MIR +for each function at each stage of compilation. `-Zdump-mir` takes a **filter** +`F` which allows you to control which functions and which passes you are interested +in. For example: + +```bash +> rustc -Zdump-mir=foo ... 
+``` + +This will dump the MIR for any function whose name contains `foo`; it +will dump the MIR both before and after every pass. Those files will +be created in the `mir_dump` directory. There will likely be quite a +lot of them! + +```bash +> cat > foo.rs +fn main() { + println!("Hello, world!"); +} +^D +> rustc -Zdump-mir=main foo.rs +> ls mir_dump/* | wc -l + 161 +``` + +The files have names like `rustc.main.000-000.CleanEndRegions.after.mir`. These +names have a number of parts: + +``` +rustc.main.000-000.CleanEndRegions.after.mir + ---- --- --- --------------- ----- either before or after + | | | name of the pass + | | index of dump within the pass (usually 0, but some passes dump intermediate states) + | index of the pass + def-path to the function etc being dumped +``` + +You can also make more selective filters. For example, `main & CleanEndRegions` will select +for things that reference *both* `main` and the pass `CleanEndRegions`: + +```bash +> rustc -Zdump-mir='main & CleanEndRegions' foo.rs +> ls mir_dump +rustc.main.000-000.CleanEndRegions.after.mir rustc.main.000-000.CleanEndRegions.before.mir +``` + +Filters can also have `|` parts to combine multiple sets of +`&`-filters. 
For example `main & CleanEndRegions | main & +NoLandingPads` will select *either* `main` and `CleanEndRegions` *or* +`main` and `NoLandingPads`: + +```bash +> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs +> ls mir_dump +rustc.main-promoted[0].002-000.NoLandingPads.after.mir +rustc.main-promoted[0].002-000.NoLandingPads.before.mir +rustc.main-promoted[0].002-006.NoLandingPads.after.mir +rustc.main-promoted[0].002-006.NoLandingPads.before.mir +rustc.main-promoted[1].002-000.NoLandingPads.after.mir +rustc.main-promoted[1].002-000.NoLandingPads.before.mir +rustc.main-promoted[1].002-006.NoLandingPads.after.mir +rustc.main-promoted[1].002-006.NoLandingPads.before.mir +rustc.main.000-000.CleanEndRegions.after.mir +rustc.main.000-000.CleanEndRegions.before.mir +rustc.main.002-000.NoLandingPads.after.mir +rustc.main.002-000.NoLandingPads.before.mir +rustc.main.002-006.NoLandingPads.after.mir +rustc.main.002-006.NoLandingPads.before.mir +``` + +(Here, the `main-promoted[0]` files refer to the MIR for "promoted constants" +that appeared within the `main` function.) + +### Implementing and registering a pass + +A `MirPass` is some bit of code that processes the MIR, typically -- +but not always -- transforming it along the way. For +example, it might perform an optimization. The `MirPass` trait itself +is found in [the `rustc_mir::transform` module][mirtransform], and +it basically consists of one method, `run_pass`, that simply gets an +`&mut Mir` (along with the tcx and some information about where it +came from). + +A good example of a basic MIR pass is [`NoLandingPads`], which walks the +MIR and removes all edges that are due to unwinding -- this is used +when configured with `panic=abort`, which never unwinds. As you can see +from its source, a MIR pass is defined by first defining a dummy type, a struct +with no fields, something like: + +```rust +struct MyPass; +``` + +for which you then implement the `MirPass` trait. 
You can then insert +this pass into the appropriate list of passes found in a query like +`optimized_mir`, `mir_validated`, etc. (If this is an optimization, it +should go into the `optimized_mir` list.) + +If you are writing a pass, there's a good chance that you are going to +want to use a [MIR visitor] too -- it is a handy tool that +walks the MIR for you and lets you make small edits here and there. + +### Stealing + +The intermediate queries `mir_const()` and `mir_validated()` yield up +a `&'tcx Steal<Mir<'tcx>>`, allocated using +`tcx.alloc_steal_mir()`. This indicates that the result may be +**stolen** by the next suite of optimizations – this is an +optimization to avoid cloning the MIR. Attempting to use a stolen +result will cause a panic in the compiler. Therefore, it is important +that you do not read directly from these intermediate queries except as +part of the MIR processing pipeline. + +Because of this stealing mechanism, some care must also be taken to +ensure that, before the MIR at a particular phase in the processing +pipeline is stolen, anyone who may want to read from it has already +done so. Concretely, this means that if you have some query `foo(D)` +that wants to access the result of `mir_const(D)` or +`mir_validated(D)`, you need to have the successor pass "force" +`foo(D)` using `ty::queries::foo::force(...)`. This will force a query +to execute even though you don't directly require its result. + +As an example, consider MIR const qualification. It wants to read the +result produced by the `mir_const()` suite. However, that result will +be **stolen** by the `mir_validated()` suite. If nothing was done, +then `mir_const_qualif(D)` would succeed if it came before +`mir_validated(D)`, but fail otherwise. 
Therefore, `mir_validated(D)` +will **force** `mir_const_qualif` before it actually steals, thus +ensuring that the reads have already happened: + +``` +mir_const(D) --read-by--> mir_const_qualif(D) + | ^ + stolen-by | + | (forces) + v | +mir_validated(D) ------------+ +``` + +This mechanism is a bit dodgy. There is a discussion of more elegant +alternatives in [rust-lang/rust#41710]. + +[rust-lang/rust#41710]: https://github.com/rust-lang/rust/issues/41710 +[mirtransform]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/transform/mod.rs +[`NoLandingPads`]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/transform/no_landing_pads.rs +[MIR visitor]: mir-visitor.html diff --git a/src/mir-regionck.md b/src/mir-regionck.md new file mode 100644 index 000000000..8be4fc8b9 --- /dev/null +++ b/src/mir-regionck.md @@ -0,0 +1,376 @@ +# MIR-based region checking (NLL) + +The MIR-based region checking code is located in +[the `rustc_mir::borrow_check::nll` module][nll]. (NLL, of course, +stands for "non-lexical lifetimes", a term that will hopefully be +deprecated once they become the standard kind of lifetime.) + +[nll]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll + +The MIR-based region analysis consists of two major functions: + +- `replace_regions_in_mir`, invoked first, has two jobs: + - First, it analyzes the signature of the MIR and finds the set of + regions that appear in the MIR signature (e.g., `'a` in `fn + foo<'a>(&'a u32) { ... }`). These are called the "universal" or + "free" regions -- in particular, they are the regions that + [appear free][fvb] in the function body. + - Second, it replaces all the regions from the function body with + fresh inference variables. This is because (presently) those + regions are the results of lexical region inference and hence are + not of much interest. 
The intention is that -- eventually -- they + will be "erased regions" (i.e., no information at all), since we + won't be doing lexical region inference at all. +- `compute_regions`, invoked second: this is given as argument the + results of move analysis. It has the job of computing values for all + the inference variables that `replace_regions_in_mir` introduced. + - To do that, it first runs the [MIR type checker](#mirtypeck). This + is basically a normal type-checker but specialized to MIR, which + is much simpler than full Rust of course. Running the MIR type + checker will however create **outlives constraints** between + region variables (e.g., that one variable must outlive another + one) to reflect the subtyping relationships that arise. + - It also adds **liveness constraints** that arise from where variables + are used. + - More details to come, though the [NLL RFC] also includes fairly thorough + (and hopefully readable) coverage. + +[fvb]: background.html#free-vs-bound +[NLL RFC]: http://rust-lang.github.io/rfcs/2094-nll.html + +## Universal regions + +*to be written* -- explain the `UniversalRegions` type + +## Region variables and constraints + +*to be written* -- describe the `RegionInferenceContext` and +the role of `liveness_constraints` vs other `constraints`, plus + +## Closures + + + +## The MIR type-check + +## Representing the "values" of a region variable + +The value of a region can be thought of as a **set**; we call the +domain of this set a `RegionElement`. In the code, the value for all +regions is maintained in +[the `rustc_mir::borrow_check::nll::region_infer` module][ri]. For +each region we maintain a set storing what elements are present in its +value (to make this efficient, we give each kind of element an index, +the `RegionElementIndex`, and use sparse bitsets). 
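As a rough illustration of the paragraph above, here is a minimal sketch of how elements might be mapped to dense indices and collected into a per-region set. Everything in it is an assumption made for illustration: the names `ElementIndexer` and `RegionValue`, the layout (universal-region ends first, then one slot per statement plus one per terminator), and the use of `BTreeSet` as a stand-in for a sparse bitset. It is not the compiler's actual data structure.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for two of the element kinds described in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RegionElement {
    /// A point in the control-flow graph: (basic block, statement index).
    /// An index equal to the number of statements denotes the terminator.
    Location { block: usize, statement: usize },
    /// `end('r)` for the universal region numbered `r`.
    UniversalEnd(usize),
}

/// Assigns each element a dense index (assumed layout: universal ends first).
struct ElementIndexer {
    num_universal: usize,
    /// `block_starts[b]` is the index of the first location in block `b`.
    block_starts: Vec<usize>,
}

impl ElementIndexer {
    /// `statements_per_block[b]` is the number of statements in block `b`;
    /// each block gets one extra slot for its terminator.
    fn new(num_universal: usize, statements_per_block: &[usize]) -> Self {
        let mut block_starts = Vec::with_capacity(statements_per_block.len());
        let mut next = num_universal;
        for &n in statements_per_block {
            block_starts.push(next);
            next += n + 1;
        }
        ElementIndexer { num_universal, block_starts }
    }

    fn index(&self, elem: RegionElement) -> usize {
        match elem {
            RegionElement::UniversalEnd(r) => {
                assert!(r < self.num_universal, "unknown universal region");
                r
            }
            RegionElement::Location { block, statement } => {
                self.block_starts[block] + statement
            }
        }
    }
}

/// The value of one region: the set of element indices it contains.
#[derive(Default)]
struct RegionValue {
    elements: BTreeSet<usize>,
}

impl RegionValue {
    /// Returns `true` if the element was not already present.
    fn add(&mut self, index: usize) -> bool {
        self.elements.insert(index)
    }

    fn contains(&self, index: usize) -> bool {
        self.elements.contains(&index)
    }
}

fn main() {
    // Two universal regions; block 0 has three statements, block 1 has one.
    let indexer = ElementIndexer::new(2, &[3, 1]);
    let mut value = RegionValue::default();
    let entry = indexer.index(RegionElement::Location { block: 0, statement: 0 });
    let newly_added = value.add(entry);
    println!("index {} newly added: {}", entry, newly_added);
    let end1 = indexer.index(RegionElement::UniversalEnd(1));
    println!("contains end of region 1: {}", value.contains(end1));
}
```

Returning a `bool` from `add` is a deliberate choice in this sketch: a propagation loop can use it to tell whether a region's value actually grew and therefore whether more work is needed.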
+ +[ri]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/borrow_check/nll/region_infer/ + +The kinds of region elements are as follows: + +- Each **location** in the MIR control-flow graph: a location is just + the pair of a basic block and an index. This identifies the point + **on entry** to the statement with that index (or the terminator, if + the index is equal to `statements.len()`). +- There is an element `end('a)` for each universal region `'a`, + corresponding to some portion of the caller's (or caller's caller, + etc) control-flow graph. +- Similarly, there is an element denoted `end('static)` corresponding + to the remainder of program execution after this function returns. +- There is an element `!1` for each skolemized region `!1`. This + corresponds (intuitively) to some unknown set of other elements -- + for details on skolemization, see the section + [skolemization and universes](#skol). + +## Causal tracking + +*to be written* -- describe how we can extend the values of a variable + with causal tracking etc + + + +## Skolemization and universes + +(This section describes ongoing work that hasn't landed yet.) + +From time to time we have to reason about regions that we can't +concretely know. For example, consider this program: + +```rust +// A function that needs a static reference +fn foo(x: &'static u32) { } + +fn bar(f: for<'a> fn(&'a u32)) { + // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference + let x = 22; + f(&x); +} + +fn main() { + bar(foo); +} +``` + +This program ought not to type-check: `foo` needs a static reference +for its argument, and `bar` wants to be given a function that +accepts **any** reference (so it can call it with something on its +stack, for example). But *how* do we reject it and *why*? 
+ +### Subtyping and skolemization + +When we type-check `main`, and in particular the call `bar(foo)`, we +are going to wind up with a subtyping relationship like this one: + + fn(&'static u32) <: for<'a> fn(&'a u32) + ---------------- ------------------- + the type of `foo` the type `bar` expects + +We handle this sort of subtyping by taking the variables that are +bound in the supertype and **skolemizing** them: this means that we +replace them with +[universally quantified](background.html#quantified) +representatives, written like `!1`. We call these regions "skolemized +regions" -- they represent, basically, "some unknown region". + +Once we've done that replacement, we have the following types: + + fn(&'static u32) <: fn(&'!1 u32) + +The key idea here is that this unknown region `'!1` is not related to +any other regions. So if we can prove that the subtyping relationship +is true for `'!1`, then it ought to be true for any region, which is +what we wanted. (This number `!1` is called a "universe", for reasons +we'll get into later.) + +So let's work through what happens next. To check if two functions are +subtypes, we check if their arguments have the desired relationship +(fn arguments are [contravariant](./background.html#variance), so +we swap the left and right here): + + &'!1 u32 <: &'static u32 + +According to the basic subtyping rules for a reference, this will be +true if `'!1: 'static`. That is -- if "some unknown region `!1`" +outlives `'static`. Now, this *might* be true -- after all, `'!1` +could be `'static` -- but we don't *know* that it's true. So this +should yield up an error (eventually). + +### Universes and skolemized region elements + +But where does that error come from? The way it happens is like this. +When we are constructing the region inference context, we can tell +from the type inference context how many skolemized variables exist +(the `InferCtxt` has an internal counter). 
For each of those, we +create a corresponding universal region variable `!n` and a "region +element" `skol(n)`. This corresponds to "some unknown set of other +elements". The value of `!n` is `{skol(n)}`. + +At the same time, we also give each existential variable a +**universe** (also taken from the `InferCtxt`). This universe +determines which skolemized elements may appear in its value: For +example, a variable in universe U3 may name `skol(1)`, `skol(2)`, and +`skol(3)`, but not `skol(4)`. Note that the universe of an inference +variable controls what region elements **can** appear in its value; it +does not say which region elements **will** appear. + +### Skolemization and outlives constraints + +In the region inference engine, outlives constraints have the form: + + V1: V2 @ P + +where `V1` and `V2` are region indices, and hence map to some region +variable (which may be universally or existentially quantified). This +variable will have a universe, so let's call those universes `U(V1)` +and `U(V2)` respectively. (Actually, the only one we are going to care +about is `U(V1)`.) + +When we encounter this constraint, the ordinary procedure is to start +a DFS from `P`. We keep walking so long as the nodes we are walking +are present in `value(V2)` and we add those nodes to `value(V1)`. If +we reach a return point, we add in any `end(X)` elements. That part +remains unchanged. + +But then *after that* we want to iterate over the skolemized `skol(u)` +elements in V2 (each of those must be visible to `U(V2)`, but we +should be able to just assume that is true, we don't have to check +it). We have to ensure that `value(V1)` outlives each of those +skolemized elements. + +Now there are two ways that could happen. First, if `U(V1)` can see +the universe `u` (i.e., `u <= U(V1)`), then we can just add `skol(u)` +to `value(V1)` and be done. 
But if not, then we have to approximate: +we may not know what set of elements `skol(u)` represents, but we +should be able to compute some sort of **upper bound** for it -- +something that it is smaller than. For now, we'll just use `'static` +for that (since it is bigger than everything) -- in the future, we can +sometimes be smarter here (and in fact we have code for doing this +already in other contexts). Moreover, since `'static` is in U0, we +know that all variables can see it -- so basically if we find that +`value(V2)` contains `skol(u)` for some universe `u` that `V1` can't +see, then we force `V1` to `'static`. + +### Extending the "universal regions" check + +After all constraints have been propagated, the NLL region inference +has one final check, where it goes over the values that wound up being +computed for each universal region and checks that they did not get +'too large'. In our case, we will go through each skolemized region +and check that it contains *only* the `skol(u)` element it is known to +outlive. (Later, we might be able to know that there are relationships +between two skolemized regions and take those into account, as we do +for universal regions from the fn signature.) + +Put another way, the "universal regions" check can be considered to be +checking constraints like: + + {skol(1)}: V1 + +where `{skol(1)}` is like a constant set, and V1 is the variable we +made to represent the `!1` region. + +## Back to our example + +OK, so far so good. Now let's walk through what would happen with our +first example: + + fn(&'static u32) <: fn(&'!1 u32) @ P // this point P is not important here + +The region inference engine will create a region element domain like this: + + { CFG; end('static); skol(1) } + --- ------------ ------- from the universe `!1` + | 'static is always in scope + all points in the CFG; not especially relevant here + +It will always create two universal variables, one representing +`'static` and one representing `'!1`. 
Let's call them Vs and V1. They +will have initial values like so: + + Vs = { CFG; end('static) } // it is in U0, so can't name anything else + V1 = { skol(1) } + +From the subtyping constraint above, we would have an outlives constraint like + + '!1: 'static @ P + +To process this, we would grow the value of V1 to include all of Vs: + + Vs = { CFG; end('static) } + V1 = { CFG; end('static), skol(1) } + +At that point, constraint propagation is done, because all the +outlives relationships are satisfied. Then we would go to the "check +universal regions" portion of the code, which would test that no +universal region grew too large. + +In this case, `V1` *did* grow too large -- it is not known to outlive +`end('static)`, nor any of the CFG -- so we would report an error. + +## Another example + +What about this subtyping relationship? + + for<'a> fn(&'a u32, &'a u32) + <: + for<'b, 'c> fn(&'b u32, &'c u32) + +Here we would skolemize the supertype, as before, yielding: + + for<'a> fn(&'a u32, &'a u32) + <: + fn(&'!1 u32, &'!2 u32) + +then we instantiate the variable on the left-hand side with an existential +in universe U2, yielding: + + fn(&'?3 u32, &'?3 u32) + <: + fn(&'!1 u32, &'!2 u32) + +Then we break this down further: + + &'!1 u32 <: &'?3 u32 + &'!2 u32 <: &'?3 u32 + +and even further, yield up our region constraints: + + '!1: '?3 + '!2: '?3 + +Note that, in this case, both `'!1` and `'!2` have to outlive the +variable `'?3`, but the variable `'?3` is not forced to outlive +anything else. Therefore, it simply starts and ends as the empty set +of elements, and hence the type-check succeeds here. + +(This should surprise you a little. It surprised me when I first +realized it. We are saying that if we are a fn that **needs both of +its arguments to have the same region**, we can accept being called +with **arguments with two distinct regions**. That seems intuitively +unsound. 
But in fact, it's fine, as I +[tried to explain in this issue on the Rust issue tracker long ago][ohdeargoditsallbroken]. +The reason is that even if we get called with arguments of two +distinct lifetimes, those two lifetimes have some intersection (the +call itself), and that intersection can be our value of `'a` that we +use as the common lifetime of our arguments. -nmatsakis) + +[ohdeargoditsallbroken]: https://github.com/rust-lang/rust/issues/32330#issuecomment-202536977 + +## Final example + +Let's look at one last example. We'll extend the previous one to have +a return type: + + for<'a> fn(&'a u32, &'a u32) -> &'a u32 + <: + for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32 + +Despite seeming very similar to the previous example, this case is +going to get an error. That's good: the problem is that we've gone +from a fn that promises to return one of its two arguments, to a fn +that is promising to return the first one. That is unsound. Let's see how it plays out. + +First, we skolemize the supertype: + + for<'a> fn(&'a u32, &'a u32) -> &'a u32 + <: + fn(&'!1 u32, &'!2 u32) -> &'!1 u32 + +Then we instantiate the subtype with existentials (in U2): + + fn(&'?3 u32, &'?3 u32) -> &'?3 u32 + <: + fn(&'!1 u32, &'!2 u32) -> &'!1 u32 + +And now we create the subtyping relationships: + + &'!1 u32 <: &'?3 u32 // arg 1 + &'!2 u32 <: &'?3 u32 // arg 2 + &'?3 u32 <: &'!1 u32 // return type + +And finally the outlives relationships. 
Here, let V1, V2, and V3 be the variables +we assign to `!1`, `!2`, and `?3` respectively: + + V1: V3 + V2: V3 + V3: V1 + +Those variables will have these initial values: + + V1 in U1 = {skol(1)} + V2 in U2 = {skol(2)} + V3 in U2 = {} + +Now because of the `V3: V1` constraint, we have to add `skol(1)` into `V3` (and indeed +it is visible from `V3`), so we get: + + V3 in U2 = {skol(1)} + +then we have this constraint `V2: V3`, so we wind up having to enlarge +`V2` to include `skol(1)` (which it can also see): + + V2 in U2 = {skol(1), skol(2)} + +Now constraint propagation is done, but when we check the outlives +relationships, we find that `V2` includes this new element `skol(1)`, +so we report an error. + diff --git a/src/mir-visitor.md b/src/mir-visitor.md new file mode 100644 index 000000000..824ddd5b2 --- /dev/null +++ b/src/mir-visitor.md @@ -0,0 +1,45 @@ +# MIR visitor + +The MIR visitor is a convenient tool for traversing the MIR and either +looking for things or making changes to it. The visitor traits are +defined in [the `rustc::mir::visit` module][m-v] -- there are two of +them, generated via a single macro: `Visitor` (which operates on a +`&Mir` and gives back shared references) and `MutVisitor` (which +operates on a `&mut Mir` and gives back mutable references). + +[m-v]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir/visit.rs + +To implement a visitor, you have to create a type that represents +your visitor. Typically, this type wants to "hang on" to whatever +state you will need while processing MIR: + +```rust +struct MyVisitor<...> { + tcx: TyCtxt<'cx, 'tcx, 'tcx>, + ... +} +``` + +and you then implement the `Visitor` or `MutVisitor` trait for that type: + +```rust +impl<'tcx> MutVisitor<'tcx> for NoLandingPads { + fn visit_foo(&mut self, ...) { + // ...
+ self.super_foo(...); + } +} +``` + +As shown above, within the impl, you can override any of the +`visit_foo` methods (e.g., `visit_terminator`) in order to write some +code that will execute whenever a `foo` is found. If you want to +recursively walk the contents of the `foo`, you then invoke the +`super_foo` method. (NB. You never want to override `super_foo`.) + +A very simple example of a visitor can be found in [`NoLandingPads`]. +That visitor doesn't even require any state: it just visits all +terminators and removes their `unwind` successors. + +[`NoLandingPads`]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/transform/no_landing_pads.rs + diff --git a/src/mir.md b/src/mir.md index 34f6bbb8f..262082ed7 100644 --- a/src/mir.md +++ b/src/mir.md @@ -1,26 +1,229 @@ # The MIR (Mid-level IR) -MIR is Rust's _Mid-level Intermediate Representation_. It is constructed from -HIR (described in an earlier chapter). +MIR is Rust's _Mid-level Intermediate Representation_. It is +constructed from HIR (described in an earlier chapter). MIR was +introduced in [RFC 1211]. It is a radically simplified form of Rust +that is used for certain flow-sensitive safety checks -- notably the +borrow checker! -- and also for optimization and code generation. + +If you'd like a very high-level introduction to MIR, as well as some +of the compiler concepts that it relies on (such as control-flow +graphs and desugaring), you may enjoy the +[rust-lang blog post that introduced MIR][blog]. + +[blog]: https://blog.rust-lang.org/2016/04/19/MIR.html + +## Introduction to MIR MIR is defined in the [`src/librustc/mir/`][mir] module, but much of the code that manipulates it is found in [`src/librustc_mir`][mirmanip]. +[RFC 1211]: http://rust-lang.github.io/rfcs/1211-mir.html + +Some of the key characteristics of MIR are: + +- It is based on a [control-flow graph][cfg]. +- It does not have nested expressions. +- All types in MIR are fully explicit. 
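
To make the "no nested expressions" characteristic concrete, here is a minimal sketch of how a nested addition such as `x = a + b + c` could be flattened into single-operation statements with temporaries. The names `Expr`, `Lowerer`, and `lower_example` are hypothetical and exist only for this illustration; this is not how rustc's MIR construction is actually implemented.

```rust
/// A toy expression tree. This is NOT a rustc type; it only models
/// variables and binary additions.
enum Expr {
    Var(&'static str),
    Add(Box<Expr>, Box<Expr>),
}

/// Collects flattened, MIR-like statements (as strings) and hands out
/// fresh temporary names.
struct Lowerer {
    next_tmp: u32,
    statements: Vec<String>,
}

impl Lowerer {
    fn new() -> Self {
        Lowerer { next_tmp: 1, statements: Vec::new() }
    }

    /// Reduce `expr` to a simple operand name. A nested addition is first
    /// stored into a fresh temporary, so operands are never nested.
    fn operand(&mut self, expr: &Expr) -> String {
        match expr {
            Expr::Var(name) => name.to_string(),
            Expr::Add(..) => {
                let tmp = format!("TMP{}", self.next_tmp);
                self.next_tmp += 1;
                self.assign(&tmp, expr);
                tmp
            }
        }
    }

    /// Emit statements that compute `expr` and store the result in `dest`.
    fn assign(&mut self, dest: &str, expr: &Expr) {
        match expr {
            Expr::Var(name) => self.statements.push(format!("{} = {}", dest, name)),
            Expr::Add(lhs, rhs) => {
                let l = self.operand(lhs);
                let r = self.operand(rhs);
                self.statements.push(format!("{} = {} + {}", dest, l, r));
            }
        }
    }
}

/// Lower `x = (a + b) + c` and return the flattened statements.
fn lower_example() -> Vec<String> {
    let expr = Expr::Add(
        Box::new(Expr::Add(Box::new(Expr::Var("a")), Box::new(Expr::Var("b")))),
        Box::new(Expr::Var("c")),
    );
    let mut lowerer = Lowerer::new();
    lowerer.assign("x", &expr);
    lowerer.statements
}

fn main() {
    for statement in lower_example() {
        println!("{}", statement);
    }
}
```

Each emitted statement performs exactly one operation on plain operands, which is the shape that real MIR assignments take.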
+ +[cfg]: ./background.html#cfg + +## Key MIR vocabulary + +This section introduces the key concepts of MIR, summarized here: + +- **Basic blocks**: units of the control-flow graph, consisting of: + - **statements:** actions with one successor + - **terminators:** actions with potentially multiple successors; always at the end of a block + - (if you're not familiar with the term basic block, see the [MIR background chapter][bg]) +- **Locals:** Memory locations allocated on the stack (conceptually, at + least), such as function arguments, local variables, and + temporaries. These are identified by an index, written with a + leading underscore, like `_1`. There is also a special "local" + (`_0`) allocated to store the return value. +- **Places:** expressions that identify a location in memory, like `_1` or `_1.f`. +- **Rvalues:** expressions that produce a value. The "R" stands for + the fact that these are the "right-hand side" of an assignment. + - **Operands:** the arguments to an rvalue, which can either be a + constant (like `22`) or a place (like `_1`). + +You can get a feeling for how MIR is structured by translating simple +programs into MIR and reading the pretty printed output. In fact, the +playground makes this easy, since it supplies a MIR button that will +show you the MIR for your program. Try putting this program into play +(or [clicking on this link][sample-play]), and then clicking the "MIR" +button on the top: + +[sample-play]: https://play.rust-lang.org/?gist=30074856e62e74e91f06abd19bd72ece&version=stable + +```rust +fn main() { + let mut vec = Vec::new(); + vec.push(1); + vec.push(2); +} +``` + +You should see something like: + +``` +// WARNING: This output format is intended for human consumers only +// and is subject to change without notice. Knock yourself out. +fn main() -> () { + ... +} +``` + +This is the MIR format for the `main` function.
+ +**Variable declarations.** If we drill in a bit, we'll see it begins +with a bunch of variable declarations. They look like this: + +``` +let mut _0: (); // return place +scope 1 { + let mut _1: std::vec::Vec<i32>; // "vec" in scope 1 at src/main.rs:2:9: 2:16 +} +scope 2 { +} +let mut _2: (); +let mut _3: &mut std::vec::Vec<i32>; +let mut _4: (); +let mut _5: &mut std::vec::Vec<i32>; +``` + +You can see that variables in MIR don't have names, they have indices, +like `_0` or `_1`. We also intermingle the user's variables (e.g., +`_1`) with temporary values (e.g., `_2` or `_3`). You can tell the +difference because user-defined variables have a comment that gives +you their original name (`// "vec" in scope 1...`). + +**Basic blocks.** Reading further, we see our first **basic block** (naturally it may look +slightly different when you view it, and I am ignoring some of the comments): + +``` +bb0: { + StorageLive(_1); + _1 = const <std::vec::Vec<T>>::new() -> bb2; +} +``` + +A basic block is defined by a series of **statements** and a final **terminator**. +In this case, there is one statement: + +``` +StorageLive(_1); +``` + +This statement indicates that the variable `_1` is "live", meaning +that it may be used later -- this will persist until we encounter a +`StorageDead(_1)` statement, which indicates that the variable `_1` is +done being used. These "storage statements" are used by LLVM to +allocate stack space. -_NOTE: copy/pasted from README... needs editing_ +The **terminator** of the block `bb0` is the call to `Vec::new`: -# MIR definition and pass system +``` +_1 = const <std::vec::Vec<T>>::new() -> bb2; +``` + +Terminators are different from statements because they can have more +than one successor -- that is, control may flow to different +places. Function calls like the call to `Vec::new` are always +terminators because of the possibility of unwinding, although in the +case of `Vec::new` we are able to see that indeed unwinding is not +possible, and hence we list only one successor block, `bb2`.
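
The difference between statements and terminators can be sketched with a small toy model. The types below (`BlockId`, `ToyTerminator`, `ToyBlock`) are hypothetical stand-ins invented for this illustration and are much simpler than the real rustc definitions; the point is only that a block holds a list of statements plus exactly one terminator, and that only the terminator determines the successor blocks.

```rust
/// Toy stand-in for a basic block index; NOT the real rustc `BasicBlock`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockId(usize);

/// Toy terminator kinds, each with a different number of successors.
#[derive(Debug)]
enum ToyTerminator {
    /// Unconditional jump: exactly one successor.
    Goto { target: BlockId },
    /// Function call: returns to one block, and may unwind to another.
    Call { return_to: BlockId, unwind: Option<BlockId> },
    /// Return from the function: no successors.
    Return,
}

impl ToyTerminator {
    /// All blocks that control may flow to after this terminator.
    fn successors(&self) -> Vec<BlockId> {
        match *self {
            ToyTerminator::Goto { target } => vec![target],
            ToyTerminator::Call { return_to, unwind } => {
                let mut out = vec![return_to];
                out.extend(unwind); // adds the unwind edge only if present
                out
            }
            ToyTerminator::Return => vec![],
        }
    }
}

/// A block: statements (kept as strings here) followed by one terminator.
struct ToyBlock {
    statements: Vec<&'static str>,
    terminator: ToyTerminator,
}

fn main() {
    // Models `bb0` from the text: the call cannot unwind, so one successor.
    let bb0 = ToyBlock {
        statements: vec!["StorageLive(_1)"],
        terminator: ToyTerminator::Call { return_to: BlockId(2), unwind: None },
    };
    // Models a call that can unwind: two successors.
    let bb2 = ToyBlock {
        statements: vec!["StorageLive(_3)", "_3 = &mut _1"],
        terminator: ToyTerminator::Call { return_to: BlockId(3), unwind: Some(BlockId(4)) },
    };
    for (name, block) in [("bb0", &bb0), ("bb2", &bb2)] {
        println!("{}: {:?} -> {:?}", name, block.statements, block.terminator.successors());
    }
}
```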
+ +If we look ahead to `bb2`, we will see it looks like this: + +``` +bb2: { + StorageLive(_3); + _3 = &mut _1; + _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4]; +} +``` + +Here there are two statements: another `StorageLive`, introducing the `_3` temporary, +and then an assignment: -This file contains the definition of the MIR datatypes along with the -various types for the "MIR Pass" system, which lets you easily -register and define new MIR transformations and analyses. +``` +_3 = &mut _1; +``` -Most of the code that operates on MIR can be found in the -`librustc_mir` crate or other crates. The code found here in -`librustc` is just the datatype definitions, along with the functions -which operate on MIR to be placed everywhere else. +Assignments in general have the form: -## MIR Data Types and visitor +``` +<Place> = <Rvalue> +``` + +A place is an expression like `_3`, `_3.f` or `*_3` -- it denotes a +location in memory. An **Rvalue** is an expression that creates a +value: in this case, the rvalue is a mutable borrow expression, which +looks like `&mut <Place>`. So we can kind of define a grammar for +rvalues like so: + +``` +<Rvalue> = & (mut)? <Place> + | <Operand> + <Operand> + | <Operand> - <Operand> + | ... + +<Operand> = Constant + | copy Place + | move Place +``` + +As you can see from this grammar, rvalues cannot be nested -- they can +only reference places and constants. Moreover, when you use a place, +we indicate whether we are **copying it** (which requires that the +place have a type `T` where `T: Copy`) or **moving it** (which works +for a place of any type).
So, for example, if we had the expression `x += a + b + c` in Rust, that would get compiled to two statements and a +temporary: + +``` +TMP1 = a + b +x = TMP1 + c +``` + +([Try it and see, though you may want to do release mode to skip over the overflow checks.][play-abc]) + +[play-abc]: https://play.rust-lang.org/?gist=1751196d63b2a71f8208119e59d8a5b6&version=stable + +## MIR data types + +The MIR data types are defined in the [`src/librustc/mir/`][mir] +module. Each of the key concepts mentioned in the previous section +maps in a fairly straightforward way to a Rust type. + +The main MIR data type is `Mir`. It contains the data for a single +function (along with sub-instances of Mir for "promoted constants", +but [you can read about those below](#promoted)). + +- **Basic blocks**: The basic blocks are stored in the field + `basic_blocks`; this is a vector of `BasicBlockData` + structures. Nobody ever references a basic block directly: instead, + we pass around `BasicBlock` values, which are + [newtype'd] indices into this vector. +- **Statements** are represented by the type `Statement`. +- **Terminators** are represented by the type `Terminator`. +- **Locals** are represented by a [newtype'd] index type `Local`. The + data for a local variable is found in the `Mir` (the `local_decls` + vector). There is also a special constant `RETURN_PLACE` identifying + the special "local" representing the return value. +- **Places** are identified by the enum `Place`. There are a few variants: + - Local variables like `_1` + - Static variables `FOO` + - **Projections**, which are fields or other things that "project + out" from a base place. So e.g. the place `_1.f` is a projection, + with `f` being the "projection element" and `_1` being the base + path. `*_1` is also a projection, with the `*` being represented + by the `ProjectionElem::Deref` element. +- **Rvalues** are represented by the enum `Rvalue`. +- **Operands** are represented by the enum `Operand`.
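
The way places are built up from a base plus projections, and the way locals are newtype'd indices, can be sketched with a toy model. All of the `Toy*` names below are hypothetical and exist only for illustration; the real `Place`, `Local`, and `ProjectionElem` types in `src/librustc/mir/` carry considerably more information.

```rust
/// Toy newtype'd index for a local; NOT the real rustc `Local`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyLocal(u32);

/// By convention, local 0 stores the return value.
const TOY_RETURN_PLACE: ToyLocal = ToyLocal(0);

/// Toy projection elements: a dereference or a named field.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToyProjectionElem {
    Deref,
    Field(&'static str),
}

/// Toy place: a local, a static, or a projection out of a base place.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ToyPlace {
    Local(ToyLocal),
    Static(&'static str),
    Projection(Box<ToyPlace>, ToyProjectionElem),
}

impl ToyPlace {
    /// Walk down through projections to find the underlying local, if any.
    fn base_local(&self) -> Option<ToyLocal> {
        match self {
            ToyPlace::Local(local) => Some(*local),
            ToyPlace::Static(_) => None,
            ToyPlace::Projection(base, _) => base.base_local(),
        }
    }

    /// Render the place in a MIR-like textual form. Dereferences are
    /// parenthesized so that nested projections stay unambiguous.
    fn render(&self) -> String {
        match self {
            ToyPlace::Local(local) => format!("_{}", local.0),
            ToyPlace::Static(name) => name.to_string(),
            ToyPlace::Projection(base, ToyProjectionElem::Deref) => {
                format!("(*{})", base.render())
            }
            ToyPlace::Projection(base, ToyProjectionElem::Field(field)) => {
                format!("{}.{}", base.render(), field)
            }
        }
    }
}

fn main() {
    let local_1 = ToyPlace::Local(ToyLocal(1));
    let field = ToyPlace::Projection(Box::new(local_1.clone()), ToyProjectionElem::Field("f"));
    let deref = ToyPlace::Projection(Box::new(local_1), ToyProjectionElem::Deref);

    println!("{}", ToyPlace::Local(TOY_RETURN_PLACE).render());
    println!("{} (base: {:?})", field.render(), field.base_local());
    println!("{} (base: {:?})", deref.render(), deref.base_local());
    println!("{} (base: {:?})", ToyPlace::Static("FOO").render(), ToyPlace::Static("FOO").base_local());
}
```

Using a small index type rather than a reference keeps places cheap to copy and compare, which is the same motivation the text gives for passing around `BasicBlock` values instead of referencing block data directly.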
+ +## MIR Visitor The main MIR data type is `rustc::mir::Mir`, defined in `mod.rs`. There is also the MIR visitor (in `visit.rs`) which allows you to walk @@ -32,74 +235,18 @@ routines for visiting the MIR CFG in [different standard orders][traversal] [traversal]: https://en.wikipedia.org/wiki/Tree_traversal -## MIR pass suites and their integration into the query system - -As a MIR *consumer*, you are expected to use one of the queries that -returns a "final MIR". As of the time of this writing, there is only -one: `optimized_mir(def_id)`, but more are expected to come in the -future. For foreign def-ids, we simply read the MIR from the other -crate's metadata. But for local def-ids, the query will construct the -MIR and then iteratively optimize it by putting it through various -pipeline stages. This section describes those pipeline stages and how -you can extend them. - -To produce the `optimized_mir(D)` for a given def-id `D`, the MIR -passes through several suites of optimizations, each represented by a -query. Each suite consists of multiple optimizations and -transformations. These suites represent useful intermediate points -where we want to access the MIR for type checking or other purposes: - -- `mir_build(D)` – not a query, but this constructs the initial MIR -- `mir_const(D)` – applies some simple transformations to make MIR ready for constant evaluation; -- `mir_validated(D)` – applies some more transformations, making MIR ready for borrow checking; -- `optimized_mir(D)` – the final state, after all optimizations have been performed. - -### Stealing - -The intermediate queries `mir_const()` and `mir_validated()` yield up -a `&'tcx Steal>`, allocated using -`tcx.alloc_steal_mir()`. This indicates that the result may be -**stolen** by the next suite of optimizations – this is an -optimization to avoid cloning the MIR. Attempting to use a stolen -result will cause a panic in the compiler. 
Therefore, it is important -that you do not read directly from these intermediate queries except as -part of the MIR processing pipeline. - -Because of this stealing mechanism, some care must also be taken to -ensure that, before the MIR at a particular phase in the processing -pipeline is stolen, anyone who may want to read from it has already -done so. Concretely, this means that if you have some query `foo(D)` -that wants to access the result of `mir_const(D)` or -`mir_validated(D)`, you need to have the successor pass "force" -`foo(D)` using `ty::queries::foo::force(...)`. This will force a query -to execute even though you don't directly require its result. - -As an example, consider MIR const qualification. It wants to read the -result produced by the `mir_const()` suite. However, that result will -be **stolen** by the `mir_validated()` suite. If nothing was done, -then `mir_const_qualif(D)` would succeed if it came before -`mir_validated(D)`, but fail otherwise. Therefore, `mir_validated(D)` -will **force** `mir_const_qualif` before it actually steals, thus -ensuring that the reads have already happened: - -``` -mir_const(D) --read-by--> mir_const_qualif(D) - | ^ - stolen-by | - | (forces) - v | -mir_validated(D) ------------+ -``` - -### Implementing and registering a pass - -To create a new MIR pass, you simply implement the `MirPass` trait for -some fresh singleton type `Foo`. Once you have implemented a trait for -your type `Foo`, you then have to insert `Foo` into one of the suites; -this is done in `librustc_driver/driver.rs` by invoking `push_pass(S, -Foo)` with the appropriate suite substituted for `S`. 
+## Representing constants + +TBD + + + +### Promoted constants + +TBD [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir [mirmanip]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir +[newtype'd]: glossary.html From fe632f2756646ddc8bc2d216935ca5e2fa410804 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Sun, 25 Feb 2018 20:55:56 -0500 Subject: [PATCH 2/4] apply mark-i-m's suggestions --- src/SUMMARY.md | 2 +- src/background.md | 20 ++--- src/glossary.md | 12 ++- src/mir-background.md | 122 ------------------------------ src/mir-borrowck.md | 8 +- src/mir-passes.md | 36 +++++---- src/mir-regionck.md | 167 +++++++++++++++++++++++++++++++++++------- src/mir-visitor.md | 10 +++ src/mir.md | 34 +++------ 9 files changed, 209 insertions(+), 202 deletions(-) delete mode 100644 src/mir-background.md diff --git a/src/SUMMARY.md b/src/SUMMARY.md index 29ad79998..ab9da7295 100644 --- a/src/SUMMARY.md +++ b/src/SUMMARY.md @@ -24,7 +24,7 @@ - [Type checking](./type-checking.md) - [The MIR (Mid-level IR)](./mir.md) - [MIR construction](./mir-construction.md) - - [MIR visitor](./mir-visitor.md) + - [MIR visitor and traversal](./mir-visitor.md) - [MIR passes: getting the MIR for a function](./mir-passes.md) - [MIR borrowck](./mir-borrowck.md) - [MIR-based region checking (NLL)](./mir-regionck.md) diff --git a/src/background.md b/src/background.md index 92ae6507a..50c247774 100644 --- a/src/background.md +++ b/src/background.md @@ -17,9 +17,9 @@ A control-flow graph is structured as a set of **basic blocks** connected by edges. The key idea of a basic block is that it is a set of statements that execute "together" -- that is, whenever you branch to a basic block, you start at the first statement and then execute -all the remainder. 
Only at the end of the is there the possibility of -branching to more than one place (in MIR, we call that final statement -the **terminator**): +all the remainder. Only at the end of the block is there the +possibility of branching to more than one place (in MIR, we call that +final statement the **terminator**): ``` bb0: { @@ -88,7 +88,8 @@ cycle. ## What is co- and contra-variance? -*to be written* +Check out the subtyping chapter from the +[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html). @@ -97,18 +98,17 @@ cycle. Let's describe the concepts of free vs bound in terms of program variables, since that's the thing we're most familiar with. -- Consider this expression: `a + b`. In this expression, `a` and `b` - refer to local variables that are defined *outside* of the - expression. We say that those variables **appear free** in the - expression. To see why this term makes sense, consider the next - example. -- In contrast, consider this expression, which creates a closure: `|a, +- Consider this expression, which creates a closure: `|a, b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments that the closure will be given when it is called. We say that the `a` and `b` there are **bound** to the closure, and that the closure signature `|a, b|` is a **binder** for the names `a` and `b` (because any references to `a` or `b` within refer to the variables that it introduces). +- Consider this expression: `a + b`. In this expression, `a` and `b` + refer to local variables that are defined *outside* of the + expression. We say that those variables **appear free** in the + expression (i.e., they are **free**, not **bound** (tied up)). 
So there you have it: a variable "appears free" in some expression/statement/whatever if it refers to something defined diff --git a/src/glossary.md b/src/glossary.md index 2aa9b52f1..81eb62bc9 100644 --- a/src/glossary.md +++ b/src/glossary.md @@ -6,11 +6,16 @@ The compiler uses a number of...idiosyncratic abbreviations and things. This glo Term | Meaning ------------------------|-------- AST | the abstract syntax tree produced by the syntax crate; reflects user syntax very closely. +binder | a "binder" is a place where a variable or type is declared; for example, the `<T>` is a binder for the generic type parameter `T` in `fn foo<T>(..)`, and `|a| ...` is a binder for the parameter `a`. See [the background chapter for more](./background.html#free-vs-bound) +bound variable | a "bound variable" is one that is declared within an expression/term. For example, the variable `a` is bound within the closure expression `|a| a * 2`. See [the background chapter for more](./background.html#free-vs-bound) codegen unit | when we produce LLVM IR, we group the Rust code into a number of codegen units. Each of these units is processed by LLVM independently from one another, enabling parallelism. They are also the unit of incremental re-use. completeness | completeness is a technical term in type theory. Completeness means that every type-safe program also type-checks. Having both soundness and completeness is very hard, and usually soundness is more important. (see "soundness"). +control-flow graph | a representation of the control-flow of a program; see [the background chapter for more](./background.html#cfg) cx | we tend to use "cx" as an abbrevation for context. See also `tcx`, `infcx`, etc. DAG | a directed acyclic graph is used during compilation to keep track of dependencies between queries.
([see more](incremental-compilation.html)) +data-flow analysis | a static analysis that figures out what properties are true at each point in the control-flow of a program; see [the background chapter for more](./background.html#dataflow) DefId | an index identifying a definition (see `librustc/hir/def_id.rs`). Uniquely identifies a `DefPath`. +free variable | a "free variable" is one that is not bound within an expression or term; see [the background chapter for more](./background.html#free-vs-bound) 'gcx | the lifetime of the global arena ([see more](ty.html)) generics | the set of generic type parameters defined on a type or item HIR | the High-level IR, created by lowering and desugaring the AST ([see more](hir.html)) @@ -18,7 +23,7 @@ HirId | identifies a particular node in the HIR by combining HIR Map | The HIR map, accessible via tcx.hir, allows you to quickly navigate the HIR and convert between various forms of identifiers. ICE | internal compiler error. When the compiler crashes. ICH | incremental compilation hash. ICHs are used as fingerprints for things such as HIR and crate metadata, to check if changes have been made. This is useful in incremental compilation to see if part of a crate has changed and should be recompiled. -inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents value you are trying to find. Think of `X` in algebra. +inference variable | when doing type or region inference, an "inference variable" is a kind of special type/region that represents what you are trying to infer. Think of X in algebra. For example, if we are trying to infer the type of a variable in a program, we create an inference variable to represent that unknown type. infcx | the inference context (see `librustc/infer`) IR | Intermediate Representation. A general term in compilers. During compilation, the code is transformed from raw source (ASCII text) to various IRs. 
In Rust, these are primarily HIR, MIR, and LLVM IR. Each IR is well-suited for some set of computations. For example, MIR is well-suited for the borrow checker, and LLVM IR is well-suited for codegen because LLVM accepts it. local crate | the crate currently being compiled. @@ -27,14 +32,18 @@ LTO | Link-Time Optimizations. A set of optimizations offer MIR | the Mid-level IR that is created after type-checking for use by borrowck and trans ([see more](./mir.html)) miri | an interpreter for MIR used for constant evaluation ([see more](./miri.html)) newtype | a "newtype" is a wrapper around some other type (e.g., `struct Foo(T)` is a "newtype" for `T`). This is commonly used in Rust to give a stronger type for indices. +NLL | [non-lexical lifetimes](./mir-regionck.html), an extension to Rust's borrowing system to make it be based on the control-flow graph. node-id or NodeId | an index identifying a particular node in the AST or HIR; gradually being phased out and replaced with `HirId`. obligation | something that must be proven by the trait system ([see more](trait-resolution.html)) +promoted constants | constants extracted from a function and lifted to static scope; see [this section](./mir.html#promoted) for more details. provider | the function that executes a query ([see more](query.html)) +quantified | in math or logic, existential and universal quantification are used to ask questions like "is there any type T for which is true?" or "is this true for all types T?"; see [the background chapter for more](./background.html#quantified) query | perhaps some sub-computation during compilation ([see more](query.html)) region | another term for "lifetime" often used in the literature and in the borrow checker. sess | the compiler session, which stores global data used throughout compilation side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node. 
sigil | like a keyword but composed entirely of non-alphanumeric tokens. For example, `&` is a sigil for references. +skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir-regionck.html#skol) for more details. soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness"). span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more. substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`) @@ -45,6 +54,7 @@ token | the smallest unit of parsing. Tokens are produced aft trans | the code to translate MIR into LLVM IR. trait reference | a trait and values for its type parameters ([see more](ty.html)). ty | the internal representation of a type ([see more](ty.html)). +variance | variance determines how changes to a generic type/lifetime parameter affect subtyping; for example, if `T` is a subtype of `U`, then `Vec<T>` is a subtype of `Vec<U>` because `Vec` is *covariant* in its generic parameter. See [the background chapter for more](./background.html#variance).
[LLVM]: https://llvm.org/ [lto]: https://llvm.org/docs/LinkTimeOptimization.html diff --git a/src/mir-background.md b/src/mir-background.md deleted file mode 100644 index 38fba5d16..000000000 --- a/src/mir-background.md +++ /dev/null @@ -1,122 +0,0 @@ -# MIR Background topics - -This section covers a numbers of common compiler terms that arise when -talking about MIR and optimizations. We try to give the general -definition while providing some Rust-specific context. - - - -## What is a control-flow graph? - -A control-flow graph is a common term from compilers. If you've ever -used a flow-chart, then the concept of a control-flow graph will be -pretty familiar to you. It's a representation of your program that -exposes the underlying control flow in a very clear way. - -A control-flow graph is structured as a set of **basic blocks** -connected by edges. The key idea of a basic block is that it is a set -of statements that execute "together" -- that is, whenever you branch -to a basic block, you start at the first statement and then execute -all the remainder. Only at the end of the is there the possibility of -branching to more than one place (in MIR, we call that final statement -the **terminator**): - -``` -bb0: { - statement0; - statement1; - statement2; - ... - terminator; -} -``` - -Many expressions that you are used to in Rust compile down to multiple -basic blocks. For example, consider an if statement: - -```rust -a = 1; -if some_variable { - b = 1; -} else { - c = 1; -} -d = 1; -``` - -This would compile into four basic blocks: - -``` -BB0: { - a = 1; - if some_variable { goto BB1 } else { goto BB2 } -} - -BB1: { - b = 1; - goto BB3; -} - -BB2: { - c = 1; - goto BB3; -} - -BB3: { - d = 1; - ...; -} -``` - -When using a control-flow graph, a loop simply appears as a cycle in -the graph, and the `break` keyword translates into a path out of that -cycle. - - - -## What is a dataflow analysis? - -*to be written* - - - -## What is "universally quantified"? 
What about "existentially quantified"? - -*to be written* - - - -## What is co- and contra-variance? - -*to be written* - - - -## What is a "free region" or a "free variable"? What about "bound region"? - -Let's describe the concepts of free vs bound in terms of program -variables, since that's the thing we're most familiar with. - -- Consider this expression: `a + b`. In this expression, `a` and `b` - refer to local variables that are defined *outside* of the - expression. We say that those variables **appear free** in the - expression. To see why this term makes sense, consider the next - example. -- In contrast, consider this expression, which creates a closure: `|a, - b| a + b`. Here, the `a` and `b` in `a + b` refer to the arguments - that the closure will be given when it is called. We say that the - `a` and `b` there are **bound** to the closure, and that the closure - signature `|a, b|` is a **binder** for the names `a` and `b` - (because any references to `a` or `b` within refer to the variables - that it introduces). - -So there you have it: a variable "appears free" in some -expression/statement/whatever if it refers to something defined -outside of that expressions/statement/whatever. Equivalently, we can -then refer to the "free variables" of an expression -- which is just -the set of variables that "appear free". - -So what does this have to do with regions? Well, we can apply the -analogous concept to type and regions. For example, in the type `&'a -u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it -does not. diff --git a/src/mir-borrowck.md b/src/mir-borrowck.md index b632addc2..3c10191d4 100644 --- a/src/mir-borrowck.md +++ b/src/mir-borrowck.md @@ -37,9 +37,9 @@ in several modes, but this text will describe only the mode when NLL is enabled The overall flow of the borrow checker is as follows: -- We first create a **local copy** C of the MIR. 
We will be modifying - this copy in place to modify the types and things to include - references to the new regions that we are computing. +- We first create a **local copy** C of the MIR. In the coming steps, + we will modify this copy in place to modify the types and things to + include references to the new regions that we are computing. - We then invoke `nll::replace_regions_in_mir` to modify this copy C. Among other things, this function will replace all of the regions in the MIR with fresh [inference variables](glossary.html). @@ -51,6 +51,6 @@ The overall flow of the borrow checker is as follows: - (More details can be found in [the NLL section](./mir-regionck.html).) - Finally, the borrow checker itself runs, taking as input (a) the results of move analysis and (b) the regions computed by the region - checker. This allows is to figure out which loans are still in scope + checker. This allows us to figure out which loans are still in scope at any particular point. diff --git a/src/mir-passes.md b/src/mir-passes.md index 2fe471385..6d657ae70 100644 --- a/src/mir-passes.md +++ b/src/mir-passes.md @@ -4,9 +4,9 @@ If you would like to get the MIR for a function (or constant, etc), you can use the `optimized_mir(def_id)` query. This will give you back the final, optimized MIR. For foreign def-ids, we simply read the MIR from the other crate's metadata. But for local def-ids, the query will -construct the MIR and then iteratively optimize it by putting it -through various pipeline stages. This section describes those pipeline -stages and how you can extend them. +construct the MIR and then iteratively optimize it by applying a +series of passes. This section describes how those passes work and how +you can extend them. To produce the `optimized_mir(D)` for a given def-id `D`, the MIR passes through several suites of optimizations, each represented by a @@ -97,18 +97,19 @@ that appeared within the `main` function.) 
### Implementing and registering a pass A `MirPass` is some bit of code that processes the MIR, typically -- -but not always -- transforming it along the way in some way. For -example, it might perform an optimization. The `MirPass` trait itself -is found in in [the `rustc_mir::transform` module][mirtransform], and -it basically consists of one method, `run_pass`, that simply gets an +but not always -- transforming it along the way somehow. For example, +it might perform an optimization. The `MirPass` trait itself is found +in [the `rustc_mir::transform` module][mirtransform], and it +basically consists of one method, `run_pass`, that simply gets an `&mut Mir` (along with the tcx and some information about where it -came from). +came from). The MIR is therefore modified in place (which helps to +keep things efficient). -A good example of a basic MIR pass is [`NoLandingPads`], which walks the -MIR and removes all edges that are due to unwinding -- this is used -with when configured with `panic=abort`, which never unwinds. As you can see -from its source, a MIR pass is defined by first defining a dummy type, a struct -with no fields, something like: +A good example of a basic MIR pass is [`NoLandingPads`], which walks +the MIR and removes all edges that are due to unwinding -- this is +used when configured with `panic=abort`, which never unwinds. As you +can see from its source, a MIR pass is defined by first defining a +dummy type, a struct with no fields, something like: ```rust struct MyPass; @@ -120,8 +121,9 @@ this pass into the appropriate list of passes found in a query like should go into the `optimized_mir` list.) If you are writing a pass, there's a good chance that you are going to -want to use a [MIR visitor] too -- those are a handy visitor that -walks the MIR for you and lets you make small edits here and there. +want to use a [MIR visitor]. 
MIR visitors are a handy way to walk all +the parts of the MIR, either to search for something or to make small +edits. ### Stealing @@ -149,7 +151,9 @@ be **stolen** by the `mir_validated()` suite. If nothing was done, then `mir_const_qualif(D)` would succeed if it came before `mir_validated(D)`, but fail otherwise. Therefore, `mir_validated(D)` will **force** `mir_const_qualif` before it actually steals, thus -ensuring that the reads have already happened: +ensuring that the reads have already happened (remember that +[queries are memoized](./query.html), so executing a query twice +simply loads from a cache the second time): ``` mir_const(D) --read-by--> mir_const_qualif(D) diff --git a/src/mir-regionck.md b/src/mir-regionck.md index 8be4fc8b9..d9d854081 100644 --- a/src/mir-regionck.md +++ b/src/mir-regionck.md @@ -10,11 +10,11 @@ deprecated once they become the standard kind of lifetime.) The MIR-based region analysis consists of two major functions: - `replace_regions_in_mir`, invoked first, has two jobs: - - First, it analyzes the signature of the MIR and finds the set of - regions that appear in the MIR signature (e.g., `'a` in `fn - foo<'a>(&'a u32) { ... }`. These are called the "universal" or - "free" regions -- in particular, they are the regions that - [appear free][fvb] in the function body. + - First, it finds the set of regions that appear within the + signature of the function (e.g., `'a` in `fn foo<'a>(&'a u32) { + ... }`. These are called the "universal" or "free" regions -- in + particular, they are the regions that [appear free][fvb] in the + function body. - Second, it replaces all the regions from the function body with fresh inference variables. 
This is because (presently) those regions are the results of lexical region inference and hence are @@ -49,6 +49,8 @@ the role of `liveness_constraints` vs other `constraints`, plus ## Closures +*to be written* + ## The MIR type-check @@ -131,15 +133,14 @@ replace them with representatives, written like `!1`. We call these regions "skolemized regions" -- they represent, basically, "some unknown region". -Once we've done that replacement, we have the following types: +Once we've done that replacement, we have the following relation: fn(&'static u32) <: fn(&'!1 u32) The key idea here is that this unknown region `'!1` is not related to any other regions. So if we can prove that the subtyping relationship is true for `'!1`, then it ought to be true for any region, which is -what we wanted. (This number `!1` is called a "universe", for reasons -we'll get into later.) +what we wanted. So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship @@ -154,6 +155,118 @@ outlives `'static`. Now, this *might* be true -- after all, `'!1` could be `'static` -- but we don't *know* that it's true. So this should yield up an error (eventually). +### What is a universe + +In the previous section, we introduced the idea of a skolemized +region, and we denoted it `!1`. We call this number `1` the **universe +index**. The idea of a "universe" is that it is a set of names that +are in scope within some type or at some point. Universes are formed +into a tree, where each child extends its parents with some new names. +So the **root universe** conceptually contains global names, such as +the the lifetime `'static` or the type `i32`. In the compiler, we also +put generic type parameters into this root universe. So consider +this function `bar`: + +```rust +struct Foo { } + +fn bar<'a, T>(t: &'a T) { + ... +} +``` + +Here, the root universe would consider of the lifetimes `'static` and +`'a`. 
In fact, although we're focused on lifetimes here, we can apply +the same concept to types, in which case the types `Foo` and `T` would +be in the root universe (along with other global types, like `i32`). +Basically, the root universe contains all the names that +[appear free](./background.html#free-vs-bound) in the body of `bar`. + +Now let's extend `bar` a bit by adding a variable `x`: + +```rust +fn bar<'a, T>(t: &'a T) { + let x: for<'b> fn(&'b u32) = ...; +} +``` + +Here, the name `'b` is not part of the root universe. Instead, when we +"enter" into this `for<'b>` (e.g., by skolemizing it), we will create +a child universe of the root, let's call it U1: + +``` +U0 (root universe) +│ +└─ U1 (child universe) +``` + +The idea is that this child universe U1 extends the root universe U0 +with a new name, which we are identifying by its universe number: +`!1`. + +Now let's extend `bar` a bit by adding one more variable, `y`: + +```rust +fn bar<'a, T>(t: &'a T) { + let x: for<'b> fn(&'b u32) = ...; + let y: for<'c> fn(&'c u32) = ...; +} +``` + +When we enter *this* type, we will again create a new universe, which +let's call `U2`. It's parent will be the root universe, and U1 will be +its sibling: + +``` +U0 (root universe) +│ +├─ U1 (child universe) +│ +└─ U2 (child universe) +``` + +This implies that, while in U2, we can name things from U0 or U2, but +not U1. + +**Giving existential variables a universe.** Now that we have this +notion of universes, we can use it to extend our type-checker and +things to prevent illegal names from leaking out. The idea is that we +give each inference (existential) variable -- whether it be a type or +a lifetime -- a universe. That variable's value can then only +reference names visible from that universe. So for example if a +lifetime variable is created in U0, then it cannot be assigned a value +of `!1` or `!2`, because those names are not visible from the universe +U0. 
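As a rough illustration of the universe tree and the visibility rule described above (not the compiler's actual data structures), the idea could be sketched like this. `UniverseTree`, `InferenceVar`, and all of their methods are hypothetical names invented for this sketch.

```rust
/// Index of a universe; 0 stands for the root universe U0.
type Universe = usize;

/// A toy universe tree (hypothetical, for illustration only).
/// `parent[u]` is the parent of universe `u`; the root has no parent.
struct UniverseTree {
    parent: Vec<Option<Universe>>,
}

impl UniverseTree {
    /// Creates a tree containing only the root universe U0.
    fn new() -> Self {
        UniverseTree { parent: vec![None] }
    }

    /// Creates a new child of `parent` and returns its index.
    fn new_child(&mut self, parent: Universe) -> Universe {
        assert!(parent < self.parent.len(), "unknown parent universe");
        self.parent.push(Some(parent));
        self.parent.len() - 1
    }

    /// True if names introduced in `target` are visible from `from`,
    /// i.e. if `target` is `from` itself or one of its ancestors.
    fn can_see(&self, from: Universe, target: Universe) -> bool {
        let mut current = Some(from);
        while let Some(u) = current {
            if u == target {
                return true;
            }
            current = self.parent[u];
        }
        false
    }
}

/// An inference (existential) variable, tagged with the universe
/// in which it was created.
struct InferenceVar {
    universe: Universe,
}

impl InferenceVar {
    /// May this variable be assigned the skolemized region `!n`,
    /// which lives in universe `n`?
    fn can_name_skolemized(&self, tree: &UniverseTree, n: Universe) -> bool {
        tree.can_see(self.universe, n)
    }
}
```

With U1 and U2 created as children of U0, `can_see(2, 0)` holds while `can_see(2, 1)` does not, and a variable created in U0 can name neither `!1` nor `!2`.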
+ +**Representing universes with just a counter.** You might be surprised +to see that the compiler doesn't keep track of a full tree of +universes. Instead, it just keeps a counter -- and, to determine if +one universe can see another one, it just checks if the index is +greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see +U2, because 0 >= 2 is false. + +How can we get away with this? Doesn't this mean that we would allow +U2 to also see U1? The answer is that, yes, we would, **if that +question ever arose**. But because of the structure of our type +checker etc, there is no way for that to happen. In order for +something happening in the universe U1 to "communicate" with something +happening in U2, they would have to have a shared inference variable X +in common. And because everything in U1 is scoped to just U1 and its +children, that inference variable X would have to be in U0. And since +X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest +to see by using a kind of generic "logic" example: + +``` +exists<X> { + forall<Y> { ... /* Y is in U1 ... */ } + forall<Z> { ... /* Z is in U2 ... */ } +} +``` + +Here, the only way for the two foralls to interact would be through X, +but neither Y nor Z are in scope when X is declared, so its value +cannot reference either of them. + ### Universes and skolemized region elements But where does that error come from? The way it happens is like this. @@ -179,10 +292,11 @@ In the region inference engine, outlives constraints have the form: V1: V2 @ P where `V1` and `V2` are region indices, and hence map to some region -variable (which may be universally or existentially quantified). This -variable will have a universe, so let's call those universes `U(V1)` -and `U(V2)` respectively. (Actually, the only one we are going to care -about is `U(V1)`.) +variable (which may be universally or existentially quantified). 
The +`P` here is a "point" in the control-flow graph; it's not important +for this section. This variable will have a universe, so let's call +those universes `U(V1)` and `U(V2)` respectively. (Actually, the only +one we are going to care about is `U(V1)`.) When we encounter this constraint, the ordinary procedure is to start a DFS from `P`. We keep walking so long as the nodes we are walking @@ -190,24 +304,24 @@ are present in `value(V2)` and we add those nodes to `value(V1)`. If we reach a return point, we add in any `end(X)` elements. That part remains unchanged. -But then *after that* we want to iterate over the skolemized `skol(u)` +But then *after that* we want to iterate over the skolemized `skol(x)` elements in V2 (each of those must be visible to `U(V2)`, but we should be able to just assume that is true, we don't have to check it). We have to ensure that `value(V1)` outlives each of those skolemized elements. Now there are two ways that could happen. First, if `U(V1)` can see -the universe `u` (i.e., `u <= U(V1)`), then we can just add `skol(u1)` +the universe `x` (i.e., `x <= U(V1)`), then we can just add `skol(x)` to `value(V1)` and be done. But if not, then we have to approximate: -we may not know what set of elements `skol(u1)` represents, but we -should be able to compute some sort of **upper bound** for it -- -something that it is smaller than. For now, we'll just use `'static` -for that (since it is bigger than everything) -- in the future, we can -sometimes be smarter here (and in fact we have code for doing this -already in other contexts). Moreover, since `'static` is in U0, we -know that all variables can see it -- so basically if we find a that -`value(V2)` contains `skol(u)` for some universe `u` that `V1` can't -see, then we force `V1` to `'static`. +we may not know what set of elements `skol(x)` represents, but we +should be able to compute some sort of **upper bound** B for it -- +some region B that outlives `skol(x)`. 
For now, we'll just use +`'static` for that (since it outlives everything) -- in the future, we +can sometimes be smarter here (and in fact we have code for doing this +already in other contexts). Moreover, since `'static` is in the root +universe U0, we know that all variables can see it -- so basically if +we find that `value(V2)` contains `skol(x)` for some universe `x` +that `V1` can't see, then we force `V1` to `'static`. ### Extending the "universal regions" check @@ -258,7 +372,7 @@ To process this, we would grow the value of V1 to include all of Vs: Vs = { CFG; end('static) } V1 = { CFG; end('static), skol(1) } -At that point, constraint propagation is done, because all the +At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large. @@ -280,8 +394,9 @@ Here we would skolemize the supertype, as before, yielding: <: fn(&'!1 u32, &'!2 u32) -then we instantiate the variable on the left-hand side with an existential -in universe U2, yielding: +then we instantiate the variable on the left-hand side with an +existential in universe U2, yielding the following (`?n` is a notation +for an existential variable): fn(&'?3 u32, &'?3 u32) <: diff --git a/src/mir-visitor.md b/src/mir-visitor.md index 824ddd5b2..3a8b06c54 100644 --- a/src/mir-visitor.md +++ b/src/mir-visitor.md @@ -43,3 +43,13 @@ terminators and removes their `unwind` successors. [`NoLandingPads`]: https://github.com/rust-lang/rust/tree/master/src/librustc_mir/transform/no_landing_pads.rs +## Traversal + +In addition to the visitor, [the `rustc::mir::traversal` module][t] +contains useful functions for walking the MIR CFG in +[different standard orders][traversal] (e.g. pre-order, reverse +post-order, and so forth). 
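To make the two traversal orders named above concrete, here is a minimal sketch on a toy control-flow graph. It is not the API of `rustc::mir::traversal`; the `Cfg` type and the functions `preorder`, `postorder_visit`, and `reverse_postorder` are hypothetical and exist only for this illustration.

```rust
/// A toy control-flow graph (hypothetical, not a rustc type):
/// `successors[b]` lists the blocks that block `b` can branch to.
struct Cfg {
    successors: Vec<Vec<usize>>,
}

/// Depth-first pre-order: a block is emitted before its successors.
fn preorder(cfg: &Cfg, start: usize) -> Vec<usize> {
    let mut visited = vec![false; cfg.successors.len()];
    let mut order = Vec::new();
    let mut stack = vec![start];
    while let Some(block) = stack.pop() {
        if visited[block] {
            continue;
        }
        visited[block] = true;
        order.push(block);
        // Push in reverse so that the first successor is popped first.
        for &succ in cfg.successors[block].iter().rev() {
            if !visited[succ] {
                stack.push(succ);
            }
        }
    }
    order
}

/// Recursive helper: emits a block only after all of its
/// not-yet-visited successors have been emitted (post-order).
fn postorder_visit(cfg: &Cfg, block: usize, visited: &mut Vec<bool>, order: &mut Vec<usize>) {
    visited[block] = true;
    for &succ in &cfg.successors[block] {
        if !visited[succ] {
            postorder_visit(cfg, succ, visited, order);
        }
    }
    order.push(block);
}

/// Reverse post-order: the post-order sequence, reversed. In an acyclic
/// graph, every block appears after all of its predecessors.
fn reverse_postorder(cfg: &Cfg, start: usize) -> Vec<usize> {
    let mut visited = vec![false; cfg.successors.len()];
    let mut order = Vec::new();
    postorder_visit(cfg, start, &mut visited, &mut order);
    order.reverse();
    order
}
```

For an if/else diamond where block 0 branches to blocks 1 and 2, and both of those jump to block 3, `preorder` yields `[0, 1, 3, 2]` while `reverse_postorder` yields `[0, 2, 1, 3]`. Only the latter guarantees that the join block 3 is seen after both of its predecessors, which is why reverse post-order is a common choice for forward analyses.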
+ +[t]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir/traversal.rs +[traversal]: https://en.wikipedia.org/wiki/Tree_traversal + diff --git a/src/mir.md b/src/mir.md index 262082ed7..6e7ac0691 100644 --- a/src/mir.md +++ b/src/mir.md @@ -1,10 +1,10 @@ # The MIR (Mid-level IR) MIR is Rust's _Mid-level Intermediate Representation_. It is -constructed from HIR (described in an earlier chapter). MIR was -introduced in [RFC 1211]. It is a radically simplified form of Rust -that is used for certain flow-sensitive safety checks -- notably the -borrow checker! -- and also for optimization and code generation. +constructed from [HIR](./hir.html). MIR was introduced in +[RFC 1211]. It is a radically simplified form of Rust that is used for +certain flow-sensitive safety checks -- notably the borrow checker! -- +and also for optimization and code generation. If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow @@ -35,20 +35,20 @@ This section introduces the key concepts of MIR, summarized here: - **Basic blocks**: units of the control-flow graph, consisting of: - **statements:** actions with one successor - **terminators:** actions with potentially multiple successors; always at the end of a block - - (if you're not familiar with the term basic block, see the [MIR background chapter][bg]) + - (if you're not familiar with the term *basic block*, see the [background chapter][cfg]) - **Locals:** Memory locations alloated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like `_1`. There is also a special "local" (`_0`) allocated to store the return value. - **Places:** expressions that identify a location in memory, like `_1` or `_1.f`. -- **Rvalues:** expressions that product a value. The "R" stands for +- **Rvalues:** expressions that produce a value. 
The "R" stands for the fact that these are the "right-hand side" of an assignment. - **Operands:** the arguments to an rvalue, which can either be a constant (like `22`) or a place (like `_1`). You can get a feeling for how MIR is structed by translating simple -programs into MIR and ready the pretty printed output. In fact, the +programs into MIR and reading the pretty printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into play (or [clicking on this link][sample-play]), and then clicking the "MIR" @@ -96,7 +96,9 @@ You can see that variables in MIR don't have names, they have indices, like `_0` or `_1`. We also intermingle the user's variables (e.g., `_1`) with temporary values (e.g., `_2` or `_3`). You can tell the difference between user-defined variables have a comment that gives -you their original name (`// "vec" in scope 1...`). +you their original name (`// "vec" in scope 1...`). The "scope" blocks +(e.g., `scope 1 { .. }`) describe the lexical structure of the source +program (which names were in scope when). **Basic blocks.** Reading further, we see our first **basic block** (naturally it may look slightly different when you view it, and I am ignoring some of the comments): @@ -223,27 +225,15 @@ but [you can read about those below](#promoted)). - **Rvalues** are represented by the enum `Rvalue`. - **Operands** are represented by the enum `Operand`. -## MIR Visitor - -The main MIR data type is `rustc::mir::Mir`, defined in `mod.rs`. -There is also the MIR visitor (in `visit.rs`) which allows you to walk -the MIR and override what actions will be taken at various points (you -can visit in either shared or mutable mode; the latter allows changing -the MIR in place). Finally `traverse.rs` contains various traversal -routines for visiting the MIR CFG in [different standard orders][traversal] -(e.g. pre-order, reverse post-order, and so forth). 
- -[traversal]: https://en.wikipedia.org/wiki/Tree_traversal - ## Representing constants -TBD +*to be written* ### Promoted constants -TBD +*to be written* [mir]: https://github.com/rust-lang/rust/tree/master/src/librustc/mir From 55f75dc7bd666ae45452722382da9b6ba4b1fa76 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Tue, 27 Feb 2018 14:16:18 -0500 Subject: [PATCH 3/4] fix typo --- src/mir-regionck.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mir-regionck.md b/src/mir-regionck.md index d9d854081..571d7c673 100644 --- a/src/mir-regionck.md +++ b/src/mir-regionck.md @@ -20,7 +20,7 @@ The MIR-based region analysis consists of two major functions: regions are the results of lexical region inference and hence are not of much interest. The intention is that -- eventually -- they will be "erased regions" (i.e., no information at all), since we - don't be doing lexical region inference at all. + won't be doing lexical region inference at all. - `compute_regions`, invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variabes that `replace_regions_in_mir` introduced. From f45d419a1954d2412f1fe277d6b30d1d103bf806 Mon Sep 17 00:00:00 2001 From: Niko Matsakis Date: Wed, 28 Feb 2018 15:28:57 -0500 Subject: [PATCH 4/4] address nits --- src/glossary.md | 2 +- src/mir-regionck.md | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/src/glossary.md b/src/glossary.md index 81eb62bc9..bff2a6557 100644 --- a/src/glossary.md +++ b/src/glossary.md @@ -43,7 +43,7 @@ region | another term for "lifetime" often used in the literat sess | the compiler session, which stores global data used throughout compilation side tables | because the AST and HIR are immutable once created, we often carry extra information about them in the form of hashtables, indexed by the id of a particular node. sigil | like a keyword but composed entirely of non-alphanumeric tokens. 
For example, `&` is a sigil for references. -skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)` as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir-regionck.html#skol) for more details. +skolemization | a way of handling subtyping around "for-all" types (e.g., `for<'a> fn(&'a u32)`) as well as solving higher-ranked trait bounds (e.g., `for<'a> T: Trait<'a>`). See [the chapter on skolemization and universes](./mir-regionck.html#skol) for more details. soundness | soundness is a technical term in type theory. Roughly, if a type system is sound, then if a program type-checks, it is type-safe; i.e. I can never (in safe rust) force a value into a variable of the wrong type. (see "completeness"). span | a location in the user's source code, used for error reporting primarily. These are like a file-name/line-number/column tuple on steroids: they carry a start/end point, and also track macro expansions and compiler desugaring. All while being packed into a few bytes (really, it's an index into a table). See the Span datatype for more. substs | the substitutions for a given generic type or item (e.g. the `i32`, `u32` in `HashMap<i32, u32>`) diff --git a/src/mir-regionck.md b/src/mir-regionck.md index 571d7c673..e7b12405a 100644 --- a/src/mir-regionck.md +++ b/src/mir-regionck.md @@ -12,7 +12,7 @@ The MIR-based region analysis consists of two major functions: - `replace_regions_in_mir`, invoked first, has two jobs: - First, it finds the set of regions that appear within the signature of the function (e.g., `'a` in `fn foo<'a>(&'a u32) { - ... }`. These are called the "universal" or "free" regions -- in + ... }`). These are called the "universal" or "free" regions -- in particular, they are the regions that [appear free][fvb] in the function body. 
- Second, it replaces all the regions from the function body with @@ -164,7 +164,8 @@ are in scope within some type or at some point. Universes are formed into a tree, where each child extends its parents with some new names. So the **root universe** conceptually contains global names, such as the the lifetime `'static` or the type `i32`. In the compiler, we also -put generic type parameters into this root universe. So consider +put generic type parameters into this root universe (in this sense, +there is not just one root universe, but one per item). So consider this function `bar`: ```rust @@ -175,7 +176,7 @@ fn bar<'a, T>(t: &'a T) { } ``` -Here, the root universe would consider of the lifetimes `'static` and +Here, the root universe would consist of the lifetimes `'static` and `'a`. In fact, although we're focused on lifetimes here, we can apply the same concept to types, in which case the types `Foo` and `T` would be in the root universe (along with other global types, like `i32`). @@ -214,7 +215,7 @@ fn bar<'a, T>(t: &'a T) { ``` When we enter *this* type, we will again create a new universe, which -let's call `U2`. It's parent will be the root universe, and U1 will be +we'll call `U2`. Its parent will be the root universe, and U1 will be its sibling: ```