Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add const-ub RFC #3016

Merged
merged 18 commits into from
May 4, 2021
Merged

add const-ub RFC #3016

merged 18 commits into from
May 4, 2021

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Nov 10, 2020

This RFC was drafted by me with lots of input from @oli-obk.

Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process.

Cc @rust-lang/wg-const-eval

Rendered (latest version)

@oli-obk oli-obk added the T-lang Relevant to the language team, which will review and decide on the RFC. label Nov 10, 2020
@vi
Copy link

vi commented Nov 12, 2020

Does this RFC tries to make unsafe things not-actually-unsafe in const context?

It is obvious that if unsafe code does something wrong, undefined behaviour happens during execution. So isn't it obvious that if const unsafe code does something wrong, undefined behaviour happens during compilation? unsafe is supposed to turn off checks and turn on freedom. Otherwise it's not a "real" unsafe.

Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB.

Maybe this way should be the main way instead of an alternative?

@ssokolow
Copy link

ssokolow commented Nov 12, 2020

unsafe is supposed to turn off checks and turn on freedom. Otherwise it's not a "real" unsafe.

unsafe doesn't and never did "turn off checks and turn on freedom". It grants access to additional language constructs that can cause UB or break memory safety if mis-used. That's it.

(All the invariants for the non-unsafe language constructs still must be upheld.)

More specifically, it grants access to constructs which would have to be checked by MIRI as part of their compile-time execution for UB to be detected by the compiler.

@RalfJung
Copy link
Member Author

The "freedom" turned on by unsafe is the freedom to shoot yourself in the foot. ;)

But it's still just as wrong to cause UB inside unsafe than anywhere else. It's just (supposed to be) impossible outside unsafe.

Does this RFC tries to make unsafe things not-actually-unsafe in const context?

It depends on what you mean by "not-actually-unsafe". unsafe is still the only way to cause these "you caused UB" errors during CTFE. The RFC also changes nothing about the fact that we do not make stability guarantees for incorrect unsafe code, including during CTFE. It just says that, for now (until a follow-up RFC), incorrect unsafe may not go completely crazy during CTFE.

Maybe this way should be the main way instead of an alternative?

Why would that be a good idea?

@chorman0773
Copy link

chorman0773 commented Nov 13, 2020

+1 to this.
Defining UB to fail constant evaluation is a good idea, imo.

A few questions though:

  • What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.
  • Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial
  • Perhaps increasing the rules for dereferencing raw pointers to be closer to C++'s requirements (either needs to be a pointer constant expression, and not a null or passed the end pointer, or point to an "object" for which the lifetime started since the evaluation of the constant expression). This may make the remainder of the requirements trivial. This may, however, be undermined by arbitrary pointer casts? Are those allowed in const presently?

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.

Promotion is completely independent. It should not be possible to cause UB in promotion, as the analysis that allows promotion essentially just allows basic math and some field accesses.

Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial

The RFC states that any UB in intrinsic invocation must be caught, so exact_div will error if you pass 0 as the second argument or if a remainder remains. Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.

Perhaps increasing the rules for dereferencing raw pointers to be closer to C++'s requirements (either needs to be a pointer constant expression, and not a null or passed the end pointer, or point to an "object" for which the lifetime started since the evaluation of the constant expression). This may make the remainder of the requirements trivial.

Restricting dereferences beyond "must be in a live allocation" is actually harder than not doing so. I don't think Rust has any notion of an "object".

This may, however, be undermined by arbitrary pointer casts? Are those allowed in const presently?

Even if we disallowed pointer casts, you could always use transmute, so this is impossible to protect against afaict.

@MikailBag
Copy link

Did I understand correctly that for following program:

const x: String = unsafe {std::hint::unreachable_unchecked()};

fn main() {
    printn!("{}", x);
}

RFC states that

  1. Current rustc version must reject this program.
  2. Any compilter is allowed to reject this program.
  3. Some future rustc version (or alternative implementation) however can accept this program (and say that is's behavior is never defined) and miscomplite it pretty awfully.
  4. But anyway the compiler itself can not perform arbitrarily bad things like RCE or running rm -rf / on the host?

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

For the specific example (unreachable_unchecked) the RFC states that

  • Incorrect use of compiler intrinsics (e.g., reaching an unreachable or violating the assumptions of exact_div).

will always be an error, and all compilers must emit an error

But there are cases of UB that we do not reliably detect:

A compiler may choose to emit an error, warning or nothing at all if it encounters this kind of UB. This means on 64 bit systems (not on 16 or 32 bit, due to transmute size checks) you are free to do

const x: String = unsafe { std::mem::transmute((1, 2, 3, 4, 5, 6)) };

fn main() {
    print!("{}", x);
}

which actually compiles on stable and will die horribly.

@chorman0773
Copy link

chorman0773 commented Nov 13, 2020

Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.

I'm not saying that it should not be allowed to error, rather it should generally be unspecified. It may be entirely unreasonable to do anything other than error. However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.

Even if we disallowed pointer casts, you could always use transmute

The issue with pointer casts and transmute involving pointers actually goes with usize, and was something I was made aware of when I looked to C++'s std-proposals list to see if it was viable to lift some reinterpert_cast restrictions. It may not be possible to implement constant conversion to integer values, as the actual runtime bit pattern may not be known at compile time (for example, transmute, if permitted).

Promotion is completely independent

And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.

Producing an invalid value (but not using it in one of the ways defined above).

I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.

@chorman0773
Copy link

Another question:
Would it be permitted to detect UB in any standard library function, again left as unspecified whether or not it does so?

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.

we have not encountered this yet, so we could also just wait until it comes up with a new intrinsic

And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.

yes, this is what happens with &(1/0) which has a compile-time warning (not an error) and a runtime panic before we ever encounter the constant that was never successfully computed.

I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.

it's very expensive to do that check, we could, but it would slow down a lot of totally fine code. We do check final values of constants though, just not the intermediate states. So you could have a bool with the bit pattern 3, and that's fine, until you try to branch on it, then you get an error. But if you just convert it to an integer, there's no error.

Would it be permitted to detect UB in any standard library function, again left as unspecified whether or not it does so?

Depends on the method you're suggesting. Ideally you'd just create a new intrinsic and invoke that. It could be as simple as invoking std::intrinsics::assume(your_condition_that_must_be_true);

@matu3ba
Copy link

matu3ba commented Nov 13, 2020

Since we must implement something in the interpreter for all inputs to the intrinsic that is being emulated, we would have to add additional code just to not error. It is easier to just error.

I'm not saying that it should not be allowed to error, rather it should generally be unspecified. It may be entirely unreasonable to do anything other than error. However, there are some cases where diagnosing errors for some intrinsics may be difficult itself.

@chorman0773 @oli-obk
Some code examples what things compiler intrinsics can do should help to provide insight into the RFC. I guess one can group them into categories? The reference does not explain what compiler intrinsics do.

Promotion is completely independent

And I presume that if Promotion and required Constant evaluation are evaluated together, it would still be permissible (and in fact, required) that failed promotion downgrades to runtime.

I dont understand what you mean with downgrading. It is not in the promotion RFC.
"only const fn that were explicitly marked with the #[rustc_promotable] attribute are subject to promotion"
"Those functions must be manually reviewed to never raise CTFE errors."

Producing an invalid value (but not using it in one of the ways defined above).

I would assume that producing an invalid value should always be an error, and, absent pointer transmutes, is probably quite easy to diagnose.

Every check of things you dont use is a performance loss, so you dont want to do it "in release builds".

@chorman0773
Copy link

so we could also just wait until it comes up with a new intrinsic

Keeping in mind that the intrinsics provided by the compiler are not specified as part of the stable language (with the possible exception of the limited ones marked rust_const_stable), ::__lccc::builtins::cxx::__builtin_launder is potentially an issue. It's not an intrinsic in the rust sense, per se, it's not obtained using extern"rust-intrinsic" (the intrinsics that can be live under ::__lccc::builtins::rust), but it's implemented purely by the compiler, and has equivalent semantics to the C++17 standard library function std::launder. It's allowed during constant evaluation if it's parameter is a constant expression, and I believe contains no union subobjects. If it's possible to create arbitrary pointers, enforcement of it's undefined behaviour (reachability restrictions in particular) could prove difficult. I'd have to look into the equivalent code in clang, and see how other interacting rules could alter how I perform the appropriate checks (previously I did not have to consider this, as in C++, it's unspecified whether the undefined behaviour of std::launder is diagnosed).

I dont understand what you mean with downgrading.

What I mean here by promotion is I want to convert constant expressions (which is a property of the expression, not of when it is evaluated) to evaluate at compile time, similar to what a C++ compiler can do. If attempting to perform constant evaluation of a non-madatory constant expression fails, "downgrade" here means that it just abandons constant evaluation and leaves it to runtime.

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

Keeping in mind that the intrinsics provided by the compiler are not specified as part of the stable language (with the possible exception of the limited ones marked rust_const_stable)

Intrinsics are never stable. The rustc_const_stable only specifies that the standard library may use this intrinsic in const fn that are stable.

I don't think Rust's memory is typed, and thus something like C++'s std::launder does not need to exist. I can't really come up with an example where the intrinsic itself would do something that is UB without us being able to detect it. It could always produce a result that if used will cause UB, but that's not UB happening in the intrinsic.

@chorman0773
Copy link

I don't think Rust's memory is typed, and thus something like C++'s std::launder does not need to exist.

As the name of the intrinsic in question may imply, it's an intrinsic provided by lccc (a work in progress implementation of rust designed primarily as a competitor for rustc), for the purposes of supporting C++ code (which lccc can also compile). When I say the intrinsic matches the semantics of std::launder I mean exactly. lccc does operate on typed memory (this is permissible, provided observable-behaviour remains intact as-if it operates on untyped memory). The specific issue is that I don't know how the reachability restriction of std::launder (which is passed onto the intrinsic because the reachability problem is actually something I am very much interested in optimizing) will interact with other features of rust, either current, in discussion, or future (particularily things like rust-lang/unsafe-code-guidelines#2).

@chorman0773
Copy link

What I would propose is the following:
Instead of specifying that all intrinsics are diagnosed, the following should be required instead:

  • It is unspecified if undefined behaviour caused by an intrinsic is diagnosed
  • Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions
    • core::hint::unreachable_unchecked()
    • core::mem::transmute()
    • core::mem::zeored()
    • and any other unspecified functions the implementation chooses.

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

a work in progress implementation of rust designed primarily as a competitor for rustc

Oh awesome! I didn't know about that.

Won't your const evaluator need to support checking whether a pointer points to a valid object of the appropriate type anyway?
In that case the intrinsic could just query that information. Basically you'd need to tag the const evaluator's virtual memory with the original type when the object was placed in that memory and drag that tag along with any memory moving operations.

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions

Ah interesting, we stop focusing on intrinsics, which are not part of the language anyway and only address the stable standard library API and language features. I wonder if we can specify this in a way that's better than listing specific things that are caught, but would not be averse to just having an explicit list that we grow.

One way to specify this in a general way could be to merge the rules for operations/branching and "intrinsics" by including the latter in "operations". Just like a + undef is not ok, even if an undef value by itself being moved around but not inspected is ok, intrinsics are simply treated as operations stating that they must not make decisions on an undef value or a value that is invalid for its type (e.g. a bool with value 3 or a null/OOB reference).

@oli-obk
Copy link
Contributor

oli-obk commented Nov 13, 2020

As a concrete example: I guess under this definition it would be ok for you to ignore overflow in <*const T>::offset, even if that is UB, but you would not be allowed to ignore undef input. Not sure if we can come up with a wording that doesn't also require us to validate the input and result of a transmute

text/0000-const-ub.md Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member Author

@chorman0773

What about constant promotion. Would a failure in constant evaluation be downgraded to runtime whenever it would be semantically permitted? It can be noted that code may never be evaluated at runtime, so making it a compile time error may be incorrect.

An RFC for this is under development.

Should this be guaranteed to extend to intrinsics? Intrinsics aren't stable, and vary by compiler. In C++ it is unspecified whether any UB in the standard library is diagnosed. Perhaps saying the same about the rust standard library would be beneficial

Good point -- I wasn't fully considering other implementations with a different set of intrinsics when writing the RFC.

I think intrinsics should likewise either error or do the "obvious" sensible thing (and if there is nothing "obvious" they must error). I am not sure how to make that wording more precise though. Do you think this makes sense?

As the name of the intrinsic in question may imply, it's an intrinsic provided by lccc (a work in progress implementation of rust designed primarily as a competitor for rustc), for the purposes of supporting C++ code (which lccc can also compile). When I say the intrinsic matches the semantics of std::launder I mean exactly. lccc does operate on typed memory (this is permissible, provided observable-behaviour remains intact as-if it operates on untyped memory).

If your memory behaves "as-if" it is untyped, then there cannot be any UB caused by "type mismatches" (as those cannot occur in untyped memory), so this should not be a concern relevant for this RFC.

Implementations shall diagnose undefined behaviour caused by or in calls to any of the following standard library functions

I don't think we want to go over all stdlib functions here and classify them like this. Plus, detecting UB even for the ones you stated would be very expensive (in the way rustc is currently architected), so this is not a good option IMO.

The RFC deliberately says nothing about library UB, it is only about language UB. The issue is that intrinsics make this line a bit harder to draw that I'd like.

@oli-obk

will always be an error, and all compilers must emit an error

Well, the RFC also says that this is not a stable guarantee. So it will always be an error until another RFC changes the rules.

@chorman0773
Copy link

chorman0773 commented Nov 15, 2020

then there cannot be any UB caused by "type mismatches"

I should rephrase: to the extent necessary to emulate the observable behaviour of the rust abstract machine, it operates as-if memory is untyped (which, pending answers to UCG #2, #84, and #236, merely requires that strict-aliasing is not applied to code compiled from rust). This is distinguishable via the launder builtin, which notably is absent from rust, and present only because it's a C++ intrinsic that needs to communicate with the optimizer.

so this should not be a concern relevant for this RFC.

It was primarily used as an expository argument against blanket requiring intrinsic UB to be diagnosed. Any compiler could come up with some wild intrinsic that has hard-to-diagnose UB. In this case, it's existence is serving a compelling interest, implementing the std::launder function from C++ in a way that generally preserves the optimizations it serves to inhibit.

Plus, detecting UB even for the ones you stated would be very expensive (in the way rustc is currently architected), so this is not a good option IMO

I believe the three I identified would already be diagnosed under the intrinsic rules (transmute is an intrinsic, zeroed calls init, an intrinsic, and unreachable_unchecked is an intrinsic, and rather easy to diagnose, imo). Also, production/copies/moves of trivially-invalid values should be some of the easiest to diagnose, as I presume rustc knows about the bitwise validity invariants of every type.

I think intrinsics should likewise either error or do the "obvious" sensible thing (and if there is nothing "obvious" they must error)

I stated, I think it should be unspecified whether intrinsics are diagnosed, though maybe promoting it to requiring the implementation to document the intrinsics where it might not diagnose. I'm sure any implementation would choose to diagnose as much as is possible and reasonable to implement. If this is a reasonable direction, I'd propose the following wording:

An implementation may support the constant evaluation of an implementation-defined set of intrinsics. For each such intrinsic which can cause undefined behaviour, the implementation shall document any which might not diagnose such undefined behaviour.

The RFC deliberately says nothing about library UB, it is only about language UB. The issue is that intrinsics make this line a bit harder to draw that I'd like

I'd argue that at the least, it should say that it is unspecified whether library UB is diagnosed. That way, implementations can do so.

@RalfJung
Copy link
Member Author

RalfJung commented Nov 15, 2020

I believe the three I identified would already be diagnosed under the intrinsic rules (transmute is an intrinsic, zeroed calls init, an intrinsic, and unreachable_unchecked is an intrinsic, and rather easy to diagnose, imo). Also, production/copies/moves of trivially-invalid values should be some of the easiest to diagnose, as I presume rustc knows about the bitwise validity invariants of every type.

mem::zeroed/mem::uninitialized UB arises from "UB due to invalid values". Detecting that is fairly non-trivial -- but we do have the required machinery because Miri needs it. However, it is also very expensive, so I quite explicitly do not want to mandate that (as the RFC says).

Having a special case just for mem::zeroed/mem::uninitialized would be possible, but I see no good reason to special-case just this. For example, MaybeUninit::zeroed().assume_init() is implemented without any intrinsic -- it would be strange to require detecting UB from mem::zeroed() but not from this.

EDIT: Actually, zeroed() just calls MaybeUninit::zeroed().assume_init(). So there is no intrinsic involved in either of them. And FWIW, mem::uninitialized() is also implemented without an intrinsic. Of course other implementations can choose to provide intrinsics for this, but it is by no means necessary. (These implementations do not rely on anything rustc-specific.)

It was primarily used as an expository argument against blanket requiring intrinsic UB to be diagnosed. Any compiler could come up with some wild intrinsic that has hard-to-diagnose UB.

The question is, is it easy to implement this intrinsic just ignoring UB? I assume this can be treated like we treat e.g. aliasing violations, where checking memory accesses for UB is actually highly non-trivial due to aliasing constraints, but there is "obvious" behavior to fall back to if this UB cannot be detected: just perform the access ignoring the aliasing rules.

I'd argue that at the least, it should say that it is unspecified whether library UB is diagnosed. That way, implementations can do so.

That's fair. OTOH, there is a tension here between documenting precisely what rustc does, and documenting what should apply to all implementations.

@nikomatsakis
Copy link
Contributor

@rfcbot fcp merge

Not everyone has read this yet (including me, ha!) but I'm going to propose FCP anyway based on the summary of what is in it, and because this has been extensively discussed.

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 6, 2021

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. labels Apr 6, 2021
@nikomatsakis
Copy link
Contributor

Folks in @rust-lang/lang can check their boxes.

@gilescope
Copy link

I'm assuming that if we hit const ub then rustc will still produce the same binary outputs bit for bit? It would be unfortunate if we couldn't get reproducable build outputs. (I.e. same file hashes)

@RalfJung
Copy link
Member Author

In rustc, we will. I don't think the RFC mandates that, though.

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Apr 14, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 14, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this RFC. and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Apr 24, 2021
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 24, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@pnkfelix pnkfelix merged commit 2654cf1 into rust-lang:master May 4, 2021
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 4, 2021
…metics_as_const, r=joshtriplett

Mark unsafe methods NonZero*::unchecked_(add|mul) as const.

Now that rust-lang/rfcs#3016 has landed, these two unstable `std` function can be marked `const`, according to this detail of rust-lang#84186.
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 4, 2021
…metics_as_const, r=joshtriplett

Mark unsafe methods NonZero*::unchecked_(add|mul) as const.

Now that rust-lang/rfcs#3016 has landed, these two unstable `std` function can be marked `const`, according to this detail of rust-lang#84186.
@RalfJung RalfJung deleted the const-ub branch September 30, 2022 14:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC. to-announce
Projects
None yet
Development

Successfully merging this pull request may close these issues.