Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the way interpreters handle special functions #121273

Closed
wants to merge 6 commits into from

Conversation

WaffleLapkin
Copy link
Member

... by reusing the ExtraFnVal type.

This changes find_mir_or_eval_fn into find_mir_or_extra_fn, which helps structure code a little bit better. Having separate "find info for evaluation" and "evaluate" steps is nice on its own, but it's also required for tail calls.

(this is needed for #113128, as otherwise correctly implementing tail calls in the interpreter creates a dependency loop)

r? @oli-obk

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 18, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 18, 2024

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

The Miri subtree was changed

cc @rust-lang/miri

@rust-log-analyzer

This comment has been minimized.

By reusing the `ExtraFnVal` type.

This changes `find_mir_or_eval_fn` into `find_mir_or_extra_fn`, which helps
structure code a little bit better. Having separate "find info for evaluation"
and "evaluate" steps is nice on its own, but it's also required for tail calls.
@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented Feb 18, 2024

Stack overflow happens on this example:

#![feature(const_align_offset)]

const _: usize = (1 as *const ()).align_offset(2);

fn main() {}

With the following infinite recursion:

  • eval_fn_call -> call_extra_fn (ref)
  • call_extra_fn -> eval_fn_call (ref)

I'm not exactly sure how to resolve this... When handling align_offset we need to try special const eval version, but if it fails (the pointer doesn't point to an allocation) we need to actually execute the normal function body, which gets us to the same exact Instance which makes us retry the const eval path....

I can add a dont_use_extra_fn: bool to the eval_fn_call/find_mir_or_extra_fn so that find_mir_or_extra_fn returns the actual body if it's true but uuuhhhhh. I don't like this solution... Any ideas?

@@ -191,22 +192,17 @@ pub trait Machine<'mir, 'tcx: 'mir>: Sized {
/// nor just jump to `ret`, but instead push their own stack frame.)
/// Passing `dest`and `ret` in the same `Option` proved very annoying when only one of them
/// was used.
fn find_mir_or_eval_fn(
fn find_mir_or_extra_fn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation of this function needs an update

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it up, should be better now.

Athough it looks like there are other tests and comments mentioning __rust_maybe_catch_panic, which doesn't seem to exist anymore?... Possibly someone needs to go over them and update them too.

@WaffleLapkin
Copy link
Member Author

@bors r=oli-obk

@bors
Copy link
Contributor

bors commented Feb 19, 2024

📌 Commit 3f17b34 has been approved by oli-obk

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 19, 2024
Copy link
Member

@RalfJung RalfJung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I still have some concerns though. @bors r-

Having separate "find info for evaluation" and "evaluate" steps is nice on its own,

I buy this in the abstract, but looking at the PR I am not entirely sold:

  • You had to expose a fairly low-level function (eval_fn_call) as fully pub now, and add an even-lower-level helper (eval_body) as pub(crate), indicating maybe the new structure is worse than the old one in terms of encapsulation.
  • The phase separation doesn't seem to be entirely working, given that in a bunch of cases call_extra_fn has to push a stack frame with MIR in it. So now we have two different code paths that do something similar: find_mir_or_extra_fn returns MIR, or it returns an ExtraFnVal and then call_extra_fn gets the MIR and has to deal with it by itself. That's definitely worse than before. Is there a way to avoid this redundancy?

Maybe explain on a high level the problem you are trying to solve. We should look for a better design that solves your problem while avoiding introducing these issues.

// const eval extra fn which ends up here, so calling `eval_fn_call`
// would cause infinite recursion and stack overflow.
let body = ecx.load_mir(instance.def, None)?;
ecx.eval_body(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having to introduce and expose a new very low-level function here, would it work to have find_mir_or_extra_fn make the choice of whether to call the real function or the interpreter shim? Fundamentally the problem seems to be that we first said we want an ExtraFn but then later we decided we want MIR anyway.

OTOH, Miri also has that situation, I wonder how this is handled there.

Comment on lines +30 to +33
pub enum ExtraFnVal {
ForeignFn { link_name: Symbol },
DynSym(DynSym),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these represent function pointers to a named global symbol. Why do we need to distinguish these variants?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. They were handled differently before and I just kept the same difference — DynSym has an assert that makes sure that we actually used a shim, and not an exported symbol.

Copy link
Member

@RalfJung RalfJung Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, this type had a clear meaning: it represented "things a function pointer can point to". The interpreter natively supports function pointers that point to an Instance, but for dlsym support we needed function pointers that point to something else, and that's what this extra-fn stuff is about.

But with your change, now there can be a function pointer that points to DynSym("functionname"), and a function pointer that points to ForeignFn { link_name: "functionname" }, and you made those be different things. That's not pre-existing, that's new in this PR. And I don't see why we need these two kinds of function pointers. The assert indicates that either these should be two different types, or one type that's just a Symbol like DynSym before. But I can't see why it would be a good idea to make this a sum type with two variants that conceptually represent the same thing.

Using ExtraFnVal as return type in find_mir_or_extra_fn makes some sense, because it matches the same dichotomy: either a native Rust function (Instance/MIR body), or "something else". Probably things could be cleaned up a bit by actually using FnVal<ExtraFnVal> as return type, then the function pointer ("indirect call"/Ty::FnPtr) codepath and the direct call (Ty::FnDef) code path can be even more unified.

But that is something we can do later, currently there are bigger problems.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not pre-existing

I'm pretty sure it is. DynSyms are handled in emulate_dyn_sym here:

let res = self.emulate_foreign_item(sym.0, abi, args, dest, ret, unwind)?;
assert!(res.is_none(), "DynSyms that delegate are not supported");

While usual foreign function are handled in a branch in find_mir_or_eval_fn here:

return ecx.emulate_foreign_item(link_name, abi, &args, dest, ret, unwind);

Those are slightly distinct (see the assert) even though they result in basically the same emulate_foreign_item call.

I have nothing against having only a single type, but I'm not sure how to keep this assert then...

Copy link
Member

@RalfJung RalfJung Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure it is. DynSyms are handled in emulate_dyn_sym here:

While usual foreign function are handled in a branch in find_mir_or_eval_fn here:

Yes, but that's not what I was talking about. We just had two separate codepaths for handling calls to Rust items vs dynamically loaded items. The "obvious" thing for you to do would be having two different types to represent this. You decided to unify these types, which I can get behind (in fact I think it's a great idea), but it's a bad idea to unify them by introducing redundancy in the value type of the interpreter.

Here's what I said: there were before not two ways to represent a function pointer. You then pointed at two functions doing related things. That's a category error as I was talking about representation! So any counterargument can only point at data in the interpreter, not interpreter functions.

Those are slightly distinct (see the assert) even though they result in basically the same emulate_foreign_item call.

Yeah, the dynsym handling doesn't currently support calling a MIR-implemented function. We simply didn't need this so far. It may be that with your patch, supporting this becomes trivial, so then there's no more reason for the assert.

| "ExitProcess"
=> {
let exp_abi = if link_name.as_str() == "exit" {
Abi::C { unwind: false }
} else {
Abi::System { unwind: false }
};
let [code] = this.check_shim(abi, exp_abi, link_name, args)?;
let args = this.copy_fn_args(args)?; // FIXME: Should `InPlace` arguments be reset to uninit?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this duplicated so often now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because there is a code path which needs &[FnArg], but most need a &[OpTy].

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's now more of these asserts than before though, I think? So what changed?

@@ -497,15 +498,15 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
/// `with_caller_location` indicates whether the caller passed a caller location. Miri
/// implements caller locations without argument passing, but to match `FnAbi` we need to know
/// when those arguments are present.
pub(crate) fn eval_fn_call(
pub fn eval_fn_call(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not happy with making this more public, this is a rather low-level function. Can this be avoided?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can probably be avoided with the return Result<Option<Instance>> scheme.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like returning Option<Instance> is the same level of awkwardness...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR has the awkwardness of attempting to split "find the fn" from "call the fn", but then in quite a few cases "call the fn" ends up finding MIR elsewhere, completely breaking the structure.

If it is okay for call_extra_fn to call eval_fn_call, then we're just back to where we were before but with more messy code, I think? I don't see how you gained anything.

@@ -885,6 +709,198 @@ impl<'mir, 'tcx: 'mir, M: Machine<'mir, 'tcx>> InterpCx<'mir, 'tcx, M> {
}
}

pub(crate) fn eval_body(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a doc comment.

@RalfJung
Copy link
Member

@bors r-
since bors does not seem to be reading review comments

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 19, 2024
@WaffleLapkin
Copy link
Member Author

@RalfJung the problem I'm trying to solve is needing to pass arguments to find_mir_or_.... This is a problem because it creates the following circular dependency: to evaluate arguments for a tail call we need to create a new stack frame1, but to create a new stack frame, we need to know what function we are calling, which requires calling find_mir_or_..., and so on.


would it work to have find_mir_or_extra_fn make the choice of whether to call the real function or the interpreter shim?

I don't think so, no. This won't work because align_offset needs to check arguments to know if we want to call normal function or the shim (depending on if this is a pointer to an allocation or a pointer typed integer).

Although I guess we could implement both paths in a shim?... The problem is only the complexity of align_offset implementation.

But other than that, I think all other cases can decide based on just the instance.


Fundamentally the problem seems to be that we first said we want an ExtraFn but then later we decided we want MIR anyway.

Yes, that's the awkward thing. It both causes my original problem (evaluating special function requires arguments, but getting mir shouldn't) and the awkwardness of this PR (everything calling into each other, eval_body, etc).

Footnotes

  1. We need to copy arguments to a new stack frame, otherwise they die before being passed to the function

@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented Feb 19, 2024

Although. Mh. It looks like naive implementation of align_offset is not that complex, actually? (unless I've missed some edgecase ofc; UPD: I indeed missed the case where size is not a power of two)

(play)

@WaffleLapkin
Copy link
Member Author

WaffleLapkin commented Feb 19, 2024

Actually, InterpCx::align_offset already calls the original instance in the const eval case, so this might be easier than I thought?... Ah, nevermind, copying this call would also cause infinite recursion, I'm pretty sure.

@RalfJung
Copy link
Member

Yes, that's the awkward thing. It both causes my original problem (evaluating special function requires arguments, but getting mir shouldn't) and the awkwardness of this PR (everything calling into each other, eval_body, etc).

Ah okay so that's why you can't have find_mir_or_extra_fn check the argument for align_offset to decide whether to pick a MIR body or call a special handler? I'll have to spend more time to understand why you need to find the MIR without knowing the arguments (not sure when I will have the time to read all the things you wrote about that), but I can just take this constraint as given for now.

For align_offset, I think this means you need to always pick the MIR body (making align_offset not even be a special function any more), and then in the MIR body call a special intrinsic that allows to early return in a special case in const-eval. This could be done as a separate PR preparing for this one.

But the problem isn't solved then, Miri also has some cases where it wants to run a MIR body:

  • easy: panic_impl should call the #[panic_handler] function. That can just be done in find_mir_or_extra_fn rather than eval_extra_fn.
  • medium: the Rust allocator functions should call the #[global_allocator] when it exists, but be special magic functions without MIR when it does not. This may mean that both find_mir_or_extra_fn and eval_extra_fn need to match on these names, to handle these two cases.
  • hard: when a foreign (extern) function is not supported, we fall back to finding a function with that name as its #[link_name]. But of course in find_mir_or_extra_fn we can't know yet whether it is supported or not... I am not sure how to best to this without losing the ability to detect the situation where someone uses #[link_name] to overwrite a function Miri supports (which are generally well-known functions that it is UB to overwrite). We could completely refactor the "foreign item" handling in Miri such that there is a global table / registry from function names to handlers for these functions, and then find_mir_or_extra_fn could check #[link_name] and check if an entry exists in that table to make sure we don't overwrite such well-known functions... but that would be a huge refactoring, I hope there's an easier way.

@WaffleLapkin
Copy link
Member Author

For align_offset, I think this means you need to always pick the MIR body (making align_offset not even be a special function any more), and then in the MIR body call a special intrinsic that allows to early return in a special case in const-eval. This could be done as a separate PR preparing for this one.

Makes sense, sure.

But the problem isn't solved then, Miri also has some cases where it wants to run a MIR body:

I think all of those are solved by allowing find_mir_or_extra_fn to return a different instance/body pair (as it was before the current changes) + using more enums inside ExtraFnVal.

  • panic_implfind_mir_or_extra_fn returns the instance/body of the right function
  • global_allocatorfind_mir_or_extra_fn checks the allocator mode and either returns the instance/body of the right function or returns an enum describing the function to be emulated
  • foreign functions — same deal

@RalfJung
Copy link
Member

RalfJung commented Feb 20, 2024

foreign functions — same deal

How is find_mir_or_extra_fn supposed to know whether Miri supports a given foreign function?
This is one big match over the function name, and the match will live in eval_extra_fn with your approach. I certainly don't want to duplicate this list.

In fact it gets even worse -- Miri has support for loading a .so file and dynamically using functions from that .so file to implement foreign functions. So knowing whether we support a function involves checking whether the .so file contains a function of the given name.

@RalfJung
Copy link
Member

RalfJung commented Feb 20, 2024

Looking at the current align_offset handling (in master) -- woah that is messy.^^ align_offset calls eval_fn_call which recursively calls align_offset again but now with different arguments so there's no loop.

Miri foreign item handling also has to sometimes push a function, which is similar to what align_offset needs to do, but Miri does it quite differently.

And neither of these seem great, they're both accessing pretty low-level interpreter functionality. If I were to refactor this now I'd try to extract the common code between eval_fn_call and Miri's call_function: something that takes instance, MIR body, ABIs, etc, as well as arguments, and then pushes the stack frame and passes the arguments. Basically, the part of eval_fn_call that's after find_mir_or_eval_fn. Both CTFE align_offset and Miri should be able to use this. And this seems quite similar to your eval_body as well! I didn't realize this is what it does, the diff makes it quite hard to see. So maybe that would be a good first step -- make eval_fn_call and push_stack_frame private (or pub(super) if needed) and make the only pub thing something like eval_body -- though I'd name it something more like push_stack_frame_with_args or so.

That doesn't achieve your desired split of "find fn" and "call fn" though, which I still need to understand better.

@RalfJung
Copy link
Member

This is a problem because it creates the following circular dependency: to evaluate arguments for a tail call we need to create a new stack frame1, but to create a new stack frame, we need to know what function we are calling, which requires calling find_mir_or_..., and so on.

Hm... but normal calls also already have that issue, don't they? We evaluate arguments after pushing the stack frame. And it does work fine. What changes when considering tail calls?

align_offset is awkward indeed. I've now sketched two ways to improve it:

  • make it not be special by itself, but instead call an intrinsic to allow the interpreter to short-circuit its evaluation.
  • use push_stack_frame_with_args instead of eval_fn_call to avoid the circularity.

@oli-obk
Copy link
Contributor

oli-obk commented Feb 20, 2024

For intrinsics with fallback bodies we will need a similar scheme soon (fall back to a different instance than the one invoked)

@oli-obk
Copy link
Contributor

oli-obk commented Feb 20, 2024

I don't understand the state of the discussion here anymore. We can figure out how to untangle align_offset, but that seems orthogonal to this PR

IMO the PR I approved was a good refactoring. We could make it more codegen-like by not calling the bodies directly, but by bubbling the new instance outwards towards the regular call logic. But that's not strictly necessary

@RalfJung
Copy link
Member

RalfJung commented Feb 20, 2024

IMO this PR is a strict decrease in code organization and maintainability.

Currently we have a single clear entrypoint for "what happens on a fn call": find_mir_or_eval_fn. It either returns the MIR+Instance to use or just completely emulates handles the call itself. What align_offset does is messy, but this PR doesn't improve that I think?

With this PR we have one function (find_mir_or_extra_fn) that will sometimes find the MIR and sometimes return a ExtraFnVal, and then a second function (call_extra_fn) that sometimes finds the MIR and sometimes handles the call. When call_extra_fn finds the MIR it has to take care of initializing the stack frame itself, but when find_mir_or_extra_fn finds the MIR then the stack frame handling is done by the core interpreter. This is redundant, and inconsistent. That looks strictly worse than the status quo to me. We also have to make previously private interpreter functions public, which is never a good sign. And then there is the change to the value type for function pointers in Miri (which is ExtraFnVal's primary purpose), which I don't think makes sense.

Also if it is somehow necessary to find the MIR in find_mir_or_extra_fn for tail calls, this PR doesn't achieve that since sometimes it's call_extra_fn that finds the MIR. So I don't even understand how this resolves whatever issue there is with tail calls.

I can think of various ways to clean this up, since I agree that improvements are possible here. I've mentioned some of them above. But I don't think this PR is an improvement. And I don't understand how tail calls factor in so I don't know which of the refactorings I have in my mind are prepared for tail calls.

Without taking into account tail calls, the first refactor I would try is to notice that eval_fn_call kind of has 3 branches: call_extra_fn, call_intrinsic, find_mir_or_eval_fn. They differ quite a bit in what the hooks can do and which services are "offered" to them (like being able to just return a MIR body and have someone else do the work of initializing the stack frame). These could probably be merged into a single function (i.e., a single machine hook, probably still called find_mir_or_eval_fn) that takes FnVal and returns Option<(&'mir mir::Body<'tcx>, ty::Instance<'tcx>)>. And then I already mentioned above that it would be a good idea to make eval_fn_call and push_stack_frame private and only expose a push_stack_frame_with_args. (Though that does make me wonder which caller FnAbi Miri's call_function helper should use.)

What I don't see is why splitting this new version of find_mir_or_eval_fn into two parts is a good idea. This introduces a two-phase split with a "narrow waist" into the logic of that function, and that split is just not natural if we look at what this function does today. (If this truly did achieve separating of "find function" from "eval function", I could be convinced that this helps structure the code. But this is not what the PR does. The "eval function" hook in many cases still ends up just finding the function. And it's not surprising, the things we want to do in Miri just don't lend themselves to introducing such a "narrow waist". So looking at the machine trait I might be misled into thinking that things are organized one way, when actually that organization isn't present.) Intrinsic fallback bodies shouldn't need this. Tail calls maybe do but I haven't yet understood why. Once I understand the constraints I'm happy to help design a refactor that satisfies them. I don't want to block tail calls, but before we land a PR that imposes a structure on the code that the code just doesn't have, complete with the work-arounds required because of that mismatch, I'd like us to take a step back to the drawing board.

@RalfJung
Copy link
Member

(I've kept editing the post above, please read on Github of you're usually just reading the email notifications.)

@bors
Copy link
Contributor

bors commented Feb 27, 2024

☔ The latest upstream changes (presumably #121655) made this pull request unmergeable. Please resolve the merge conflicts.

@WaffleLapkin
Copy link
Member Author

@RalfJung

Also if it is somehow necessary to find the MIR in find_mir_or_extra_fn for tail calls, this PR doesn't achieve that since sometimes it's call_extra_fn that finds the MIR. So I don't even understand how this resolves whatever issue there is with tail calls.

I think what I had in mind is that if find_mir_or_extra_fn returns an extra fn, we downgrade the tail call to a normal call. This is questionable, but I believe would be correct given that extra fns don't use tail calls themselves.


Still, I agree that this PR is not a clear improvement. Let me try to explain the problem I'm trying to solve.

So the problem is here (TerminatorKind::TailCall PR):

let Some(prev_frame) = self.stack_mut().pop() else {
span_bug!(
terminator.source_info.span,
"empty stack while evaluating this tail call"
)
};
let StackPopCleanup::Goto { ret, unwind } = prev_frame.return_to_block else {
span_bug!(terminator.source_info.span, "tail call with the root stack frame")
};
self.eval_fn_call(
callee,
(fn_sig.abi, fn_abi),
&args,
with_caller_location,
&prev_frame.return_place,
ret,
unwind,
)?;

(this will be refactored into a replace_stack_frame function but that's not the point)

This currently is completely broken. stack_mut().pop() invalidates the arguments' locals, causing ICE when performing the call. To fix this I wanted to do the following (in replace_stack_frame):

  1. push the new stack frame
  2. move the arguments from the old stack frame to the new one
  3. delete the old stack frame with .remove(len - 2) (-ish)
  4. evaluate the function 🎉

The problem is that there is a circular dependency — 1 requires knowing the function (stack frame contains body & instance), but the only way to know the function is to call find_mir_or_eval_fn, which requires arguments, which we can"t have, because we haven't moved them yet, which we can't do because there is no new stack frame yet. The dependency cycle is new stack frame -> function -> arguments -> new stack frame.

My idea (granted, not fully implemented here) was to break the "function -> arguments" edge. I'm not sure what other options there are, breaking this edge seems like a requirement. Hopefully you have ideas now to solve the problem better, than this PR =)

LMK if something in this explanation is not clear.

@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2024

delete the old stack frame with .remove(len - 2) (-ish)

Uh, I'm not sure that will work. We have an optimized representation for local variables that never have their address taken: we store them not in an Allocation but directly in the locals map of the stack frame. We still need to be able to create places that point to them though, so the place is represented as a Local + the index of the frame in the stack. If you remove a non-top stack frame the indices shift and who knows what happens...

I once tried to get rid of this index and make it implicitly always the "current" stack frame, but it turns out when it comes to return values we actually do use cross-stack-frame indexing here: the callee stores the place in the caller that the return value should be put in, and that may be such an optimized place. We could make sure the return place is always a "proper" place in memory, and that would probably not cost a lot of performance. (The long-lived PlaceTy here does make me nervous, so making this a MPlaceTy would make things a bit less fragile anyway.) But then I got worried there might be other places where we're doing this and I didn't want to touch this system...


But anyway, taking a step back... in an idealized interpreter, we would evaluate all the arguments to values, then pop the stack frame (deallocating the backing memory for all its locals), then push the new one. In the rustc interpreter this does not directly work, since an evaluated argument (OpTy) might actually not contain the value, but just reference the value in memory -- yet another optimization. But you should still be able to pop the frame from the stack, as long as you don't deallocate the locals. Then you can push the new stack frame using the same method as today. And then you deallocate all the locals from the old frame. I think that should work?

So in other words:

  1. evaluate all arguments to OpTy
  2. pop the old stack frame without deallocating any locals (but keep it around, we need its local map later); this does not invalidate the OpTy (it would invalidate mir::Operand but we don't need those any more)
  3. push the new stack frame; we have the OpTy so this should work as today
  4. now that the arguments have been copied, deallocate the locals from the old frame

@RalfJung
Copy link
Member

RalfJung commented Mar 4, 2024

Ah, it's slightly more complicated since evaluated arguments are not OpTy, they are FnArg. Right, that's for in-place argument passing. That may store a PlaceTy and that may be invalidated by the pop.

But what even are the semantics of in-place argument passing with tail calls? IOW, what does become func(move mylocal) do?

@WaffleLapkin
Copy link
Member Author

But what even are the semantics of in-place argument passing with tail calls? IOW, what does become func(move mylocal) do?

I'm not exactly sure, since I don't even know what precisely is f(move local)... Am I interpreting your words correctly that in-place = by-ref? (i.e. func accepts a pointer to some place in the caller that stores mylocal)

From the backend perspective I think it means "move mylocal to the place of the same argument of the current function". Example with low level pseudo code:

fn a(s: Struct) {
    become b(String::new());
}

fn b(s: String) {}

==>

fn a(s_ptr: *mut Struct) {
    // evaluation of the arguments
    let arg0 = String::new();

    // drops of all the locals
    drop_in_place(s_ptr);

    // move of arguments to the appropriate place
    ptr::write(s_ptr, arg0);

    // reset everything to be as if the function is just called... somehow? idk

    // jump to b
    jump b;
}

fn b(s_ptr: *mut String) { ... }

Note that we can always do that, because tail calls (as currently designed) always require signatures/abis to match.

How to represent this in the Rust AM? I'm not sure... Somehow the arguments need to move from the replaced frame to the new frame (which is done in the backend by just reusing the same frame).

@RalfJung
Copy link
Member

RalfJung commented Mar 5, 2024

I'm not exactly sure, since I don't even know what precisely is f(move local)... Am I interpreting your words correctly that in-place = by-ref? (i.e. func accepts a pointer to some place in the caller that stores mylocal)

It means that the compiler is allowed but not required to make the callee directly use the caller memory for this place. In the AM, what we do is we create a copy as usual but we also mark the caller memory as "inaccessible" so any attempt to observe or overwrite it causes UB, meaning the compiler can then optimize to reuse that memory since it can't be used for anything else.

But with tail calls, the caller memory is deallocated before the callee even starts. So I think it just makes no sense to even have move in a tail call?

Well I guess it makes sense in your example because the actual location of the argument is even further up the stack, so it doesn't go away with the stack frame. Hm...

I guess in principle we could change how arguments are evaluated to FnArg. If it's an in-place argument in an optimized stack frame, then we may as well use OpTy, since there's anyway no memory to "protect" that could be reused. Then FnArg no longer contains a PlaceTy and we can keep them around even after popping the stack frame.

@WaffleLapkin
Copy link
Member Author

But with tail calls, the caller memory is deallocated before the callee even starts. So I think it just makes no sense to even have move in a tail call?

Well I guess it makes sense in your example because the actual location of the argument is even further up the stack, so it doesn't go away with the stack frame. Hm...

Yeah. The idea is that due to how tail call restrictions are currently done1 you can always find a stack frame where everything can live, because there was some original normal call to a function with the same signature, which had to pass arguments of the same types.

I guess in principle we could change how arguments are evaluated to FnArg. If it's an in-place argument in an optimized stack frame, then we may as well use OpTy, since there's anyway no memory to "protect" that could be reused. Then FnArg no longer contains a PlaceTy and we can keep them around even after popping the stack frame.

I have little idea what all of this means but 👍🏻 ahaha.

Footnotes

  1. NB: we potentially want to eventually add an ABI which does not have these restrictions, but that's a whole other issue for another time

@RalfJung
Copy link
Member

RalfJung commented Mar 5, 2024

Yeah. The idea is that due to how tail call restrictions are currently done1 you can always find a stack frame where everything can live, because there was some original normal call to a function with the same signature, which had to pass arguments of the same types.

Hm, interesting. But in the model I don't think we want to require that if a call was "move" then all tail calls it makes have to be "move" as well, so we can't exploit that.

I would be curious about which ICEs you ran into. The code you posted already does eval_fn_call_arguments before popping the stack frame and it seems to entirely omit cleaning up the locals of that call (-> memory leak), so not sure why it should ICE -- except maybe it is actually due to the FnArg issue I mentioned above. Does the call that ICEs involve move?

@WaffleLapkin
Copy link
Member Author

@RalfJung so.. I was a bit misremembering and I'm getting errors, not ICEs:

#![allow(incomplete_features, dead_code)]
#![feature(explicit_tail_calls)]

const fn f(x: u32) {
    become f(x);
}

const _: () = f(0);
error[E0080]: evaluation of constant value failed
 --> ./t.rs:8:15
  |
8 | const _: () = f(0);
  |               ^^^^ accessing a dead local variable

And looking at the mir it does contain a move:

// MIR for `f` after built

fn f(_1: u32) -> () {
    debug x => _1;
    let mut _0: ();
    let mut _2: !;
    let mut _3: u32;

    bb0: {
        StorageLive(_3);
        _3 = _1;
        tailcall f(Spanned { node: move _3, span: ./t.rs:5:14: 5:15 (#0) });
    }

    bb1: {
        StorageDead(_3);
        unreachable;
    }

    bb2: {
        return;
    }
}

@WaffleLapkin
Copy link
Member Author

Aha, found how to make an example without move (by passing a constant). The following hangs the compiler (as it should!):

#![allow(incomplete_features, dead_code)]
#![feature(explicit_tail_calls)]

const fn f(_: u32) {
    become f(1);
}

const _: () = f(1);
// MIR for `f` after built

fn f(_1: u32) -> () {
    let mut _0: ();
    let mut _2: !;

    bb0: {
        tailcall f(Spanned { node: const 1_u32, span: ./t.rs:5:14: 5:15 (#0) });
    }

    bb1: {
        FakeRead(ForMatchedPlace(None), _1);
        unreachable;
    }

    bb2: {
        unreachable;
    }

    bb3: {
        return;
    }
}

@RalfJung
Copy link
Member

RalfJung commented Mar 6, 2024

Okay so I think the first order of business would be to get rid of the PlaceTy in FnArg, replace it by MPlaceTy.

That would require changes in eval_fn_call_arguments: in the Move case, call eval_place. Then do something like:

if let Either::Left(mplace) = place.as_mplace_or_local() {
  FnArg::InPlace(mplace)
} else {
  // This argument doesn't live in memory, so there's no place
  // to make inaccessible during the call.
  // This is also crucial for tail calls, where we want the `FnArg` to
  // stay valid when the old stack frame gets popped.
  FnArg::Copy(self.place_to_op(place)?)
}

@WaffleLapkin WaffleLapkin deleted the find-mir-please branch March 6, 2024 10:07
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Mar 8, 2024
Tweak the way we protect in-place function arguments in interpreters

Use `MPlaceTy` instead of `PlaceTy` in `FnArg` and ignore (copy) locals in an earlier step ("Locals that don't have their address taken are as protected as they can ever be").

This seems to be crucial for tail call support (as they can't refer to caller's locals which are killed when replacing the stack frame).

r? `@RalfJung`
cc `@oli-obk`
see rust-lang#121273 (comment)
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Mar 8, 2024
Rollup merge of rust-lang#122076 - WaffleLapkin:mplace-args, r=RalfJung

Tweak the way we protect in-place function arguments in interpreters

Use `MPlaceTy` instead of `PlaceTy` in `FnArg` and ignore (copy) locals in an earlier step ("Locals that don't have their address taken are as protected as they can ever be").

This seems to be crucial for tail call support (as they can't refer to caller's locals which are killed when replacing the stack frame).

r? `@RalfJung`
cc `@oli-obk`
see rust-lang#121273 (comment)
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Mar 9, 2024
Tweak the way we protect in-place function arguments in interpreters

Use `MPlaceTy` instead of `PlaceTy` in `FnArg` and ignore (copy) locals in an earlier step ("Locals that don't have their address taken are as protected as they can ever be").

This seems to be crucial for tail call support (as they can't refer to caller's locals which are killed when replacing the stack frame).

r? `@RalfJung`
cc `@oli-obk`
see rust-lang/rust#121273 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants