Skip to content

Commit

Permalink
analyze: new implementation of MIR-to-HIR rewrite lifting (#934)
Browse files Browse the repository at this point in the history
This replaces `rewrite::expr::hir_op` with three new modules:

**`rewrite::expr::unlower`:** Builds the "unlowering map", which maps a
MIR "precise location" (a `Location` plus a `Vec<SubLoc>` path) to
information about the HIR that was lowered to produce that piece of the
MIR. Specifically, it records the `HirId` of the HIR `Expr` that was
lowered along with a description of how the MIR relates to the HIR. For
example:

For example:

```Rust
fn f(a: i32) -> i32 {
    a + 1
}

fn g(x: i32, y: i32) -> i32 {
    x + f(y)
}
```

For `f`, the unlowering map annotates the MIR as follows:

```
block bb0:
  bb0[0]: StorageLive(_2)
  bb0[1]: _2 = _1
    []: StoreIntoLocal, `a`
    [Rvalue]: Expr, `a`
  bb0[2]: _0 = Add(move _2, const 1_i32)
    []: StoreIntoLocal, `a + 1`
    [Rvalue]: Expr, `a + 1`
  bb0[3]: StorageDead(_2)
  bb0[4]: Terminator { source_info: ..., kind: return }
```

The statement `_2 = _1` is associated with the expression `a`; the
statement as a whole is storing the result of evaluating `a` into a MIR
local, and the statement's rvalue `_1` represents the expression `a`
itself. Similarly, `_0 = Add(move _2, const 1)` stores the result of `a
+ 1` into a local. If needed, we could extend the `unlower` pass to also
record that `move _2` (a.k.a. `bb0[2]` `[Rvalue, RvalueOperand(0)]`) is
lowered from the `Expr` `a`.

On `g`, the unlowering map includes the following (among other entries):

```
bb0[5]: Terminator { source_info: ..., kind: _4 = f(move _5) -> [return: bb1, unwind: bb2] }
  []: StoreIntoLocal, `f(y)`
  [Rvalue]: Expr, `f(y)`
  [Rvalue, CallArg(0)]: Expr, `y`
bb1[1]: _0 = Add(move _3, move _4)
  []: StoreIntoLocal, `x + f(y)`
  [Rvalue]: Expr, `x + f(y)`
```

The call terminator `_4 = f(move _5)` computes `f(y)` and stores the
result into a local; its rvalue is `f(y)` itself, and the first argument
of the rvalue is `y`.

**`rewrite::expr::distribute`:** Given the MIR rewrites produced by
`mir_op` and the unlowering map from `unlower`, this module distributes
MIR rewrites to HIR nodes (identified by their `HirId`s). This step also
checks for ambiguous case, such as multiple rewrites from different MIR
locations being mapped to the same HIR node, where it's unclear in which
order to apply the rewrites, and also detects MIR rewrites that couldn't
be mapped to any HIR node.

Using the example from above:

```
bb0[5]: Terminator { source_info: ..., kind: _4 = f(move _5) -> [return: bb1, unwind: bb2] }
  []: StoreIntoLocal, `f(y)`
  [Rvalue]: Expr, `f(y)`
  [Rvalue, CallArg(0)]: Expr, `y`
```

A MIR rewrite on `bb0[5]` `[Rvalue, CallArg(0)]` would be attached to
the MIR
`Expr` `y`, and a rewrite on `bb0[5]` `[Rvalue`] would be attached to
`f(y)`.
A MIR rewrite on `bb0[5]` `[]` (i.e. on the call terminator itself)
would
result in an error, since there is no good place in the HIR to attach
such a
rewrite.

**`rewrite::expr::convert`:** Converts the `MirRewrite`s for each HIR
`Expr` into `Span`-based `rewrite::Rewrite`s. These are returned as the
final result of `expr::gen_expr_rewrites` and are later applied to the
input source code using `rewrite::apply`.


---

Overall, this branch simplifies the handling of `SubLoc`s; `distribute`
treats `Vec<SubLoc>` paths as mostly opaque (it expects each MIR
rewrite's `sub_loc` field to exactly match an entry in the unlowering
map), and `convert` doesn't deal with `SubLoc`s at all. More precise
`SubLoc` handling, which will be needed for some tricky
pointer-to-pointer cases, goes in the `unlower` module, whose sole
purpose is establishing the mapping between `SubLoc`s and HIR. And
`unlower` makes the mapping between MIR and HIR explicit (unlike
`hir_op`, where it was computed implicitly while building rewrites), so
it can be inspected for easier debugging.

All test cases produce the same rewrites as before this branch, except
for one place in `lighttpd-minimal`, where we now place a
`Cell::from_mut` cast in a more appropriate location.
  • Loading branch information
spernsteiner committed Jun 12, 2023
2 parents 88d1f0e + 2100f0f commit 62a077a
Show file tree
Hide file tree
Showing 12 changed files with 691 additions and 381 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions c2rust-analyze/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ indexmap = "1.9.2"
env_logger = "0.10.0"
log = "0.4.17"
backtrace = "0.3.67"
itertools = "0.10"

[build-dependencies]
c2rust-build-paths = { path = "../c2rust-build-paths", version = "0.18.0" }
Expand Down
103 changes: 67 additions & 36 deletions c2rust-analyze/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,7 +38,7 @@ use rustc_middle::ty::{Ty, TyCtxt, TyKind, WithOptConstParam};
use rustc_span::Span;
use std::collections::{HashMap, HashSet};
use std::env;
use std::fmt::{Debug, Display};
use std::fmt::{Debug, Display, Write as _};
use std::iter;
use std::ops::{Deref, DerefMut, Index};
use std::panic::AssertUnwindSafe;
Expand Down Expand Up @@ -576,10 +576,70 @@ fn run(tcx: TyCtxt) {
}
eprintln!("reached fixpoint in {} iterations", loop_count);

// Print results for each function in `all_fn_ldids`, going in declaration order. Concretely,
// we iterate over `body_owners()`, which is a superset of `all_fn_ldids`, and filter based on
// membership in `func_info`, which contains an entry for each ID in `all_fn_ldids`.
// Buffer debug output for each function. Grouping together all the different types of info
// for a single function makes FileCheck tests easier to write.
let mut func_reports = HashMap::<LocalDefId, String>::new();

// Generate rewrites for all functions.
let mut all_rewrites = Vec::new();
for &ldid in &all_fn_ldids {
if gacx.fn_failed(ldid.to_def_id()) {
continue;
}

let info = func_info.get_mut(&ldid).unwrap();
let ldid_const = WithOptConstParam::unknown(ldid);
let name = tcx.item_name(ldid.to_def_id());
let mir = tcx.mir_built(ldid_const);
let mir = mir.borrow();
let acx = gacx.function_context_with_data(&mir, info.acx_data.take());
let mut asn = gasn.and(&mut info.lasn);

let r = panic_detail::catch_unwind(AssertUnwindSafe(|| {
// Add the CELL permission to pointers that need it.
info.dataflow.propagate_cell(&mut asn);

acx.check_string_literal_perms(&asn);

if util::has_test_attr(tcx, ldid, TestAttr::SkipRewrite) {
return;
}

let hir_body_id = tcx.hir().body_owned_by(ldid);
let expr_rewrites = rewrite::gen_expr_rewrites(&acx, &asn, &mir, hir_body_id);
let ty_rewrites = rewrite::gen_ty_rewrites(&acx, &asn, &mir, ldid);
// Print rewrites
let report = func_reports.entry(ldid).or_default();
writeln!(
report,
"generated {} expr rewrites + {} ty rewrites for {:?}:",
expr_rewrites.len(),
ty_rewrites.len(),
name
)
.unwrap();
for &(span, ref rw) in expr_rewrites.iter().chain(ty_rewrites.iter()) {
writeln!(report, " {}: {}", describe_span(tcx, span), rw).unwrap();
}
writeln!(report).unwrap();
all_rewrites.extend(expr_rewrites);
all_rewrites.extend(ty_rewrites);
}));
match r {
Ok(()) => {}
Err(pd) => {
gacx.mark_fn_failed(ldid.to_def_id(), pd);
continue;
}
}

info.acx_data.set(acx.into_data());
}

// Print analysis results for each function in `all_fn_ldids`, going in declaration order.
// Concretely, we iterate over `body_owners()`, which is a superset of `all_fn_ldids`, and
// filter based on membership in `func_info`, which contains an entry for each ID in
// `all_fn_ldids`.
for ldid in tcx.hir().body_owners() {
// Skip any body owners that aren't present in `func_info`, and also get the info itself.
let info = match func_info.get_mut(&ldid) {
Expand All @@ -596,10 +656,7 @@ fn run(tcx: TyCtxt) {
let mir = tcx.mir_built(ldid_const);
let mir = mir.borrow();
let acx = gacx.function_context_with_data(&mir, info.acx_data.take());
let mut asn = gasn.and(&mut info.lasn);
info.dataflow.propagate_cell(&mut asn);

acx.check_string_literal_perms(&asn);
let asn = gasn.and(&mut info.lasn);

// Print labeling and rewrites for the current function.

Expand All @@ -622,34 +679,8 @@ fn run(tcx: TyCtxt) {
rewrite::dump_rewritten_local_tys(&acx, &asn, &mir, describe_local);

eprintln!();

if util::has_test_attr(tcx, ldid, TestAttr::SkipRewrite) {
continue;
}

let r = panic_detail::catch_unwind(AssertUnwindSafe(|| {
let hir_body_id = tcx.hir().body_owned_by(ldid);
let expr_rewrites = rewrite::gen_expr_rewrites(&acx, &asn, &mir, hir_body_id);
let ty_rewrites = rewrite::gen_ty_rewrites(&acx, &asn, &mir, ldid);
// Print rewrites
eprintln!(
"\ngenerated {} expr rewrites + {} ty rewrites for {:?}:",
expr_rewrites.len(),
ty_rewrites.len(),
name
);
for &(span, ref rw) in expr_rewrites.iter().chain(ty_rewrites.iter()) {
eprintln!(" {}: {}", describe_span(tcx, span), rw);
}
all_rewrites.extend(expr_rewrites);
all_rewrites.extend(ty_rewrites);
}));
match r {
Ok(()) => {}
Err(pd) => {
gacx.mark_fn_failed(ldid.to_def_id(), pd);
continue;
}
if let Some(report) = func_reports.remove(&ldid) {
eprintln!("{}", report);
}
}

Expand Down
Loading

0 comments on commit 62a077a

Please sign in to comment.