Proposal: stabilize const_refs_to_static #128183

Open
nikomatsakis opened this issue Jul 25, 2024 · 16 comments · May be fixed by #129759
Labels
disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. F-const_refs_to_static `#![feature(const_refs_to_static)]` finished-final-comment-period The final comment period is finished for this PR / Issue. I-types-nominated Nominated for discussion during a types team meeting. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@nikomatsakis
Contributor

nikomatsakis commented Jul 25, 2024

Write-up source: https://hackmd.io/It7MysqPRzeuQZ2RZevz9w?edit

Summary

This document proposes to stabilize refs-to-static-in-constants. This feature permits one to create a constant expression that references a static:

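struct Vtable { ptr: *const u32 }
// Raw pointers are not `Sync`, so a static of this type needs an explicit opt-in.
unsafe impl Sync for Vtable {}
static VT: Vtable = Vtable { ptr: std::ptr::null() };
const C: &Vtable = &VT;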

On stable Rust, this stabilization does not introduce any surprising behavior. The resulting constant C will be equal to the address of VT at runtime -- note that this value is not knowable at compilation time (certainly not early on in the compilation) and so must be represented abstractly (i.e., the compiler thinks of the value of C as "the address of VT", whereas std::ptr::null() is an example of a constant pointer whose value is known to be 0).

Given the limited surface area, this stabilization has no interactions with stable const generics. However, it does have some implications for future const generics; those are discussed in the Future interactions section. The conclusion is that supporting refs-to-statics does not introduce new challenges for const generics that were not already present in some other form.

Procedural note

Const-refs-to-statics never had an RFC. This stabilization report could be made into an RFC if we think that makes sense. Niko's general opinion is that incremental extensions to const generics do not all seem worthy of RFCs, and yet there is some value in establishing principles like 'statics have significant addresses that ought to be preserved'.

What is being proposed for stabilization

Background: const evaluation and const values

The term const evaluation refers to evaluating a constant expression at compilation time. The result of const evaluation is a const value. Const values can contain abstract pointers (e.g., the result of &VT is "the address of the static VT") that are not truly known. We cannot always know whether two const values should be considered equal or whether they would compare as equal at runtime. Const values are used to store the initializer for a named static (static S: T = /* initializer */), the values of named constants (const C: T = ...), and the values of associated constants (<T as Something>::SIZE).
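To make the three uses of const values concrete, here is a minimal sketch that compiles on stable Rust. The names BASE, TABLE_LEN, DOUBLE, Packet, and const_values are hypothetical and chosen only for illustration; Something and SIZE echo the associated-constant example in the text.

```rust
// Hypothetical names; only the three kinds of items come from the text above.
const BASE: usize = 4;

// 1. The initializer of a named static is a const value.
static TABLE_LEN: usize = BASE * 2;

// 2. The value of a named constant is a const value.
const DOUBLE: usize = BASE * 4;

// 3. The value of an associated constant is a const value.
trait Something {
    const SIZE: usize;
}

struct Packet;

impl Something for Packet {
    const SIZE: usize = std::mem::size_of::<u64>() + BASE;
}

// Collects the three const values so they can be inspected at runtime.
fn const_values() -> (usize, usize, usize) {
    (TABLE_LEN, DOUBLE, <Packet as Something>::SIZE)
}
```

Calling const_values() yields (8, 16, 12); all three numbers were computed by const evaluation, not at runtime.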

What const_refs_to_static allows

Currently, const evaluation does not allow a const value to reference a static. A program like this one therefore requires a feature gate:

#![feature(const_refs_to_static)]

static S: u32 = 66;
const C: u32 = S;

fn main() {
    println!("{C}"); // prints 66
}

The feature gate allows not only reading the value of a static but also taking references (and even dereferencing them):

#![feature(const_refs_to_static)]

static S: u32 = 66;
const C: &u32 = &S;
const D: u32 = *C;

fn main() {
    println!("{D}"); // prints 66
}

The "significant address" property

The key distinguishing feature of a static versus any other form of variable is that it has a significant address. In short, &S for some static S is often expected to be the same pointer everywhere in the program whenever it occurs (but see the caveat below). This is distinct from a local variable, say, which may have different addresses on each invocation of the function (see footnote 1); it is also distinct from a constant like const { &22 }, which can also refer to different memory locations (though they will always contain the value 22).
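As a small illustration of the property on stable Rust, without involving constants at all, the sketch below takes the address of the same static from two different functions. The function names are hypothetical. No claim is made about the address of the local variable, since that is exactly what is not guaranteed.

```rust
static S: u32 = 7;
static T: u32 = 7;

// Two unrelated functions that each take the address of `S`.
fn addr_seen_by_a() -> *const u32 {
    &S
}

fn addr_seen_by_b() -> *const u32 {
    &S
}

// `&S` denotes the same address wherever it is written.
fn same_static_same_address() -> bool {
    std::ptr::eq(addr_seen_by_a(), addr_seen_by_b())
}

// `S` and `T` hold the same value but are distinct statics.
fn distinct_statics_distinct_addresses() -> bool {
    !std::ptr::eq(&S, &T)
}

// A local has no significant address: the number returned here may
// differ from one call to the next, and nothing should rely on it.
fn address_of_local() -> usize {
    let x: u32 = 7;
    &x as *const u32 as usize
}
```

Both same_static_same_address() and distinct_statics_distinct_addresses() return true; address_of_local() is included only for contrast.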

Const evaluation and constants preserve this property (playground):

#![feature(const_refs_to_static)]

static S: usize = 44;
const S_X: &usize = &S;
const S_Y: &usize = &S;

static T: usize = 44;
const T_X: &usize = &T;
const T_Y: &usize = &T;

fn main() {
    // These assertions are guaranteed to be true.
    
    assert!(std::ptr::eq(&S, S_X));
    assert!(std::ptr::eq(S_X, S_Y));
    
    assert!(std::ptr::eq(&T, T_X));
    assert!(std::ptr::eq(T_X, T_Y));
    
    assert!(!std::ptr::eq(&S, &T));
}

In contrast, the pointer values of constants are not guaranteed to be equal, and hence equivalent assertions would not be guaranteed to be true for these declarations (playground):

const S: usize = 44;
const S_X: &usize = &S;
const S_Y: &usize = &S;

const T: usize = 44;
const T_X: &usize = &T;
const T_Y: &usize = &T;

fn main() {
    // These assertions hold in practice because LLVM coalesces
    // pointers to constants, but that is a "best effort" optimization
    // and they are not guaranteed to hold:
    assert!(std::ptr::eq(&S, S_X));
    assert!(std::ptr::eq(S_X, S_Y));
    assert!(std::ptr::eq(&T, T_X));
    assert!(std::ptr::eq(T_X, T_Y));
    
    // As above, `S` and `T` are distinct constants, but they are coalesced
    // in practice (not guaranteed):
    assert!(std::ptr::eq(&S, &T));
}

Caveat (generic statics): Statics are currently forbidden from having generic parameters, in large part because it is not clear if and how the significant address property could be maintained given monomorphization. Future extensions of statics to support generics may revise the precise guarantee being offered here (e.g., to say that generic statics instantiated in distinct compilation units may sometimes have distinct addresses) and they would have to address how that interacts with constants.

Extern statics

Extern statics are treated conservatively. It is possible to get their address as a raw pointer, but it is not possible to read from them (what would the value be?) or to include a safe reference to them in your final value (playground):

#![feature(const_refs_to_static)]

extern {
    static S: u32;
}

// ERROR cannot access extern static
const C: u32 = unsafe { S };

// ERROR encountered reference to `extern` static in `const`
const D: &u32 = unsafe { &S };

// OK
const E: *const u32 = unsafe { std::ptr::addr_of!(S) };

Freeze requirement

Const evaluation is not allowed to

  • access the contents of any mutable static (whether that is via interior mutability or static mut).
  • result in values that safely reference anything mutable (whether that is via interior mutability or &mut).

"Safely reference" here refers to recursively traversing the value in the same way safe code could (but ignoring visibility), i.e. recursing through references but not through raw pointers or unions.

It is possible to create static values with UnsafeCell contents, but they cannot typically be used from constants except in very narrow ways. For example, creating a constant whose value includes a reference to an UnsafeCell (or a reference to memory contained in an unsafe cell) triggers an error that "it is undefined behavior to use this value":

#![feature(const_refs_to_static)]
#![feature(sync_unsafe_cell)]     // required to use `SyncUnsafeCell`, trivial to do on stable

use std::cell::SyncUnsafeCell;

static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: &SyncUnsafeCell<u32> = &S; // ERROR: undefined behavior to use this value

Similarly, attempting to access the contents of an unsafe cell results in "constant accesses mutable global memory":

#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]       // required to deref the raw pointer
#![feature(sync_unsafe_cell)]     // required to use `SyncUnsafeCell`, trivial to do on stable

use std::cell::SyncUnsafeCell;

static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: u32 = unsafe { *S.get() }; // ERROR: constant accesses mutable global memory

It is however possible to use statics that have UnsafeCell in other ways, e.g. returning a raw pointer to their contents:

#![feature(const_refs_to_static)]
#![feature(sync_unsafe_cell)]

use std::cell::SyncUnsafeCell;

static S: SyncUnsafeCell<u32> = SyncUnsafeCell::new(66);
const C: *mut u32 = S.get(); // OK

Static mut

Statics declared as static mut generally behave "as if" they were enclosed in an unsafe cell (playground):

#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]

static mut S: u32 = 0;

// ERROR constant accesses mutable global memory
const C: u32 = unsafe { S };

// ERROR it is undefined behavior to use this value
const D: &u32 = unsafe { &S };

// OK, requires feature(const_mut_refs)
const E: *mut u32 = unsafe { std::ptr::addr_of_mut!(S) };

The same is true of external statics (playground):

#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]

extern {
    static mut S: u32;
}

// ERROR constant accesses mutable global memory
const C: u32 = unsafe { S };

// ERROR it is undefined behavior to use this value
const D: &u32 = unsafe { &S };

// OK, requires feature(const_mut_refs)
const E: *mut u32 = unsafe { std::ptr::addr_of_mut!(S) };

Future interactions

Const generics refers to Rust items with generic parameters of kind const, such as fn foo<const C: usize>(). Stable Rust requires that const generic parameters have simple scalar types like usize or i32. This limitation means that there is no real interaction between the stable surface area of const generics and const_refs_to_static.

So long as we do not extend const generics to permit values of &-type, then there are no problems at all (but of course we limit what users can do, and in particular don't support &str values). If however we wish to extend const generics to permit parameters of &-type (e.g., fn foo<const C: &usize>()), then we will need to extend the current implementation to preserve the "significant address" property. This section dives into detail as to why that property is not currently preserved, the various options to fix that, and some related challenges.

Background: Const generics and monomorphization

Given a function fn foo<const C: SomeType>(), Rust's type system must be able to decide whether foo::<X> and foo::<Y> represent two different instances of the same generic function (or, equivalently, given struct Foo<const C: SomeType>, whether Foo<X> and Foo<Y> are the same type). This requires being able to determine whether X and Y are equal (i.e., the same value). This equality comparison cannot be done for all const values since some of them lack a well-defined notion of equality (e.g., two values of type fn()). Stable Rust sidesteps this issue by only permitting const generics where the type is a scalar value (e.g., u32) and the constant expression can be evaluated to a fixed constant (in particular, the expression is not allowed to reference generic types).
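The equality requirement can be seen on stable Rust with a scalar parameter. In the sketch below (Buffer, takes_four, and same_instance are hypothetical names), the compiler has to evaluate { 2 + 2 } and decide that it equals 4 before it can accept the call.

```rust
// A type with a const generic parameter of scalar type.
struct Buffer<const N: usize> {
    data: [u8; N],
}

impl<const N: usize> Buffer<N> {
    fn new() -> Self {
        Buffer { data: [0; N] }
    }

    fn capacity(&self) -> usize {
        self.data.len()
    }
}

// Accepts only the instance `Buffer<4>`.
fn takes_four(b: &Buffer<4>) -> usize {
    b.capacity()
}

fn same_instance() -> usize {
    // `{ 2 + 2 }` is const evaluated to 4, so `Buffer<{ 2 + 2 }>` and
    // `Buffer<4>` are the same type and this call type-checks.
    let b: Buffer<{ 2 + 2 }> = Buffer::new();
    takes_four(&b)
}
```

Replacing { 2 + 2 } with 5 turns the call into a type error, because Buffer<5> and Buffer<4> are different types.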

Introducing valtrees

To support a richer set of values in const generics, nightly Rust makes use of valtrees. A valtree ("value tree") is a simplified form of const value consisting of "branch nodes" and "leaf nodes", which carry simple scalar values. The "value" of a const generic parameter is always a valtree, not an arbitrary const value.

For the simple types supported in const generics today, valtree conversion is infallible -- simply convert the scalar value to a leaf node. The same is true for ADTs composed of those simple types. Converting an (i32, i32) tuple like (22, 44), for example, simply means you get a valtree like (I32LeafNode(22_i32), I32LeafNode(44_i32)).

Valtrees do not carry type information. The same valtree (I32LeafNode(22_i32), I32LeafNode(44_i32)) that represents a tuple would also represent a fixed-length array like [22, 44] or a value of struct Point { x: i32, y: i32 }. At monomorphization time, generic constants have both a type and an associated valtree suitable for that type, and that type can be used to instantiate the valtree into an actual value.

Values of more complex types may not have a well-defined valtree. For example, there is no way to represent a fn() value as a valtree. In the nightly version of const generics, whenever a const value is given as the value for a const generic, the compiler internally attempts to convert that const value to a valtree. This process can fail, in which case an error results. But if it succeeds, then the const generic can be compiled. Whenever the const generic argument is referenced, the valtree will be converted into a const value which can in turn be converted into a real value at runtime.
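A toy model may help here. The sketch below is not rustc's actual data structure; ValTree, Point, and every function in it are hypothetical and exist only to show three points from the text: values of different types can share one valtree, a type is needed to turn a valtree back into a value, and conversion can fail for types such as fn().

```rust
// Toy model of a valtree: leaves carry scalars, branches carry children.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ValTree {
    Leaf(i128),
    Branch(Vec<ValTree>),
}

#[derive(Debug, PartialEq, Eq)]
struct Point {
    x: i32,
    y: i32,
}

fn tuple_to_valtree(v: (i32, i32)) -> ValTree {
    ValTree::Branch(vec![
        ValTree::Leaf(i128::from(v.0)),
        ValTree::Leaf(i128::from(v.1)),
    ])
}

fn array_to_valtree(v: [i32; 2]) -> ValTree {
    ValTree::Branch(v.iter().map(|&n| ValTree::Leaf(i128::from(n))).collect())
}

fn point_to_valtree(p: &Point) -> ValTree {
    tuple_to_valtree((p.x, p.y))
}

// Instantiation needs the target type: the tree alone does not say
// whether it was a tuple, an array, or a `Point`.
fn valtree_to_point(t: &ValTree) -> Option<Point> {
    match t {
        ValTree::Branch(children) => match children.as_slice() {
            [ValTree::Leaf(x), ValTree::Leaf(y)] => Some(Point {
                x: i32::try_from(*x).ok()?,
                y: i32::try_from(*y).ok()?,
            }),
            _ => None,
        },
        ValTree::Leaf(_) => None,
    }
}

// Conversion is fallible: this model has no way to encode a function pointer.
fn fn_ptr_to_valtree(_f: fn()) -> Option<ValTree> {
    None
}
```

In this model, tuple_to_valtree((22, 44)), array_to_valtree([22, 44]), and point_to_valtree(&Point { x: 22, y: 44 }) all produce the same tree.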

Example. Let's walk through an example supported on stable today:

fn test<const C: u32>() {
    let x = C;
    println!("{x}");
}

fn main() {
    test::<{22 + 44}>();
}

  • In main, the expression 22 + 44 is const evaluated into a const value ConstVal(66).
  • ConstVal(66) is then converted into a valtree U32LeafNode(66).
  • During codegen time, the function test::<U32LeafNode(66)> is compiled.
  • When let x = C is compiled, U32LeafNode(66) is converted back to ConstVal(66) and from there the code is compiled to load a constant. Execution proceeds as expected.

Supporting references in valtrees

As currently implemented, references are ignored when creating a valtree, so the valtrees for 22 and &22 and even &&22 are all the same (just I32LeafNode(22)). This preserves the property that, given two values X and Y, if valtree(X) == valtree(Y) then X == Y. For references, this means that pointer equality ought not be considered part of identity, since the == operator for &T says that two references are equal if their referents are equal (and it doesn't consider the pointer address). Put another way, the Eq trait doesn't respect "significant addresses", and valtrees are currently defined to align with Eq, so they do not either.
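The difference between == on references and pointer identity is easy to observe on stable Rust. In this sketch (A, B, and compare are hypothetical names), two distinct statics hold the same value, so their references compare equal under == but not under std::ptr::eq.

```rust
static A: i32 = 22;
static B: i32 = 22;

// Returns (equality of referents, equality of addresses).
fn compare(a: &i32, b: &i32) -> (bool, bool) {
    (a == b, std::ptr::eq(a, b))
}
```

compare(&A, &B) returns (true, false), while compare(&A, &A) returns (true, true). A valtree that follows == therefore cannot distinguish &A from &B.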

The current definition of valtrees implies that const generics of type &usize (or any reference) will preserve the value of the referent but not its address (as that is not part of the valtree). This can create observable behavior on nightly. Consider this example from #120961:

#![feature(const_refs_to_static)]
#![feature(adt_const_params)]

static FOO: usize = 42;
const BAR: &usize = &FOO;
fn foo<const X: &'static usize>() {
    if std::ptr::eq(X, &FOO) {
        // Never prints! But isn't `X == BAR == &FOO`??
        println!("activating special mode");
    }
}

fn main() {
    foo::<BAR>();
}

When executed, this example does NOT print anything, even though you might expect that it would. What is happening?

  • The value of BAR is ConstVal(&FOO), which tracks that it is the address of the static FOO.
  • The value of BAR is converted into a valtree, which results in just 42 (the value of the static is used to create the valtree).
  • When foo::<Leaf(42)> is compiled, the valtree must be converted into a &usize. A new temporary value is synthesized. The std::ptr::eq (which observes the physical pointer address) compares the address of this temporary to FOO and they have different addresses.
    • In practice, an anonymous constant like const BAZ: &'static usize = &42 would typically be equal to X, but that is because LLVM deduplicates such constants into a single allocation; such deduplication is also not guaranteed to occur, particularly across codegen units.

There is general agreement that this behavior is surprising and not desirable. But note that it requires multiple feature gates -- const_refs_to_static AND adt_const_params (and as of very recently, unsized_const_params). Stabilizing just const_refs_to_static does not really change anything. In other words, the problem with the above example is not due to permitting references to statics in constants, it's due to valtrees encoding references in a surprising way (though if you didn't have references to statics, you couldn't observe it).

Options to support references in const generics

So, what are the options for supporting reference types in const generics, while avoiding surprising examples like the one from #120961 above?

Option A: Disallow creating valtrees from references to statics

We could make valtree construction fail if it encounters a reference to a static (but succeed for references to anonymous constants). This would avoid the issue, but only by preventing users from doing something they likely want to do. This program would not compile, for example, since it invokes foo with the constant &S:

fn foo<const C: &usize>() { }
static S: usize = 22;
fn main() {
    foo::<&S>(); // People will want to do this!
}

This option is not very appealing, because users likely want to create valtrees that reference statics.

Option B: Extend valtree to represent ref-to-static

A more appealing option is to extend valtrees so that "ref-to-static" is something they can directly encode, and thus sacrifice the invariant that valtree(X) == valtree(Y) implies X == Y. This recognizes the fact that there are additional properties to values that we may wish to preserve beyond what is compared by the Eq trait. Significant addresses are not the only examples of such properties; there are many that arise when const functions use unsafe code, such as the value of padding bits, provenance, and potentially things like which NaN is in use (if we wished to support f64). We will have to decide which of them we wish to make observable in const evaluation.
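To sketch what Option B might look like, the toy model below adds a StaticRef leaf to a simplified valtree. This is purely illustrative and says nothing about how rustc would implement it: ValTree, RefTarget, the use of a string name to identify a static, and both conversion functions are assumptions made for the example.

```rust
// Simplified toy valtree with a hypothetical "ref-to-static" leaf.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ValTree {
    Leaf(i128),
    // Hypothetical: identifies the static symbolically instead of by value.
    StaticRef(&'static str),
}

// Toy model of what a reference-typed const value points at.
#[derive(Debug, Clone, Copy)]
enum RefTarget {
    Anonymous { value: i128 },
    Static { name: &'static str, value: i128 },
}

// Behavior described in the text: the reference is ignored and only
// the referent's value ends up in the valtree.
fn valtree_ignoring_refs(r: RefTarget) -> ValTree {
    match r {
        RefTarget::Anonymous { value } | RefTarget::Static { value, .. } => {
            ValTree::Leaf(value)
        }
    }
}

// Option B sketch: a reference to a static keeps its identity.
fn valtree_option_b(r: RefTarget) -> ValTree {
    match r {
        RefTarget::Anonymous { value } => ValTree::Leaf(value),
        RefTarget::Static { name, .. } => ValTree::StaticRef(name),
    }
}
```

With a reference to a static FOO holding 42 and a reference to an anonymous 42, valtree_ignoring_refs produces the same tree for both, whereas valtree_option_b produces StaticRef("FOO") for the first and Leaf(42) for the second, so an instantiation step could hand back the real address of FOO.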

The upshot: Stuff to figure out, but refs-to-statics doesn't make it harder

As BoxyUwu put it:

Notably that any of the solutions to making refs to statics not behave weirdly in const generics, wind up being strongly related to existing problems in const generics that already need to be solved. So while there are open questions here they don't actually really make anything worse (in my opinion).

I wouldn't want anyone to read this and come away wondering whether the feature should be blocked for a while until const generics stuff is figured out.

Footnotes

  1. And potentially even within a single function call, if the value is moved or becomes dead -- though arguably that is a separate variable. Precise limitations here still TBD.

@nikomatsakis nikomatsakis added T-lang Relevant to the language team, which will review and decide on the PR/issue. F-const_refs_to_static `#![feature(const_refs_to_static)]` labels Jul 25, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 25, 2024
@nikomatsakis
Contributor Author

@rustbot labels +I-lang-nominated +I-types-nominated T-lang T-types

Hello @rust-lang/lang and @rust-lang/wg-const-eval (cc @rust-lang/types), I am proposing to stabilize the const_refs_to_static feature. There is a write-up in this issue (also available at https://hackmd.io/@rust-lang-team/Bk0BHtHOA) detailing the considerations.

Procedural note: I am not sure if this is T-lang or T-types or both, so I've opted to nominate and tag for both teams.

@rustbot rustbot added I-lang-nominated Nominated for discussion during a lang team meeting. I-types-nominated Nominated for discussion during a types team meeting. T-types Relevant to the types team, which will review and decide on the PR/issue. labels Jul 25, 2024
@traviscross traviscross removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jul 26, 2024
@traviscross
Contributor

traviscross commented Jul 26, 2024

@rustbot labels -T-types

On the procedural question, in earlier discussion, @oli-obk said:

T-types isn't the right team for this imo. The type system concerns of this feature are almost nonexistent.

...and this sounds right to me. To my eyes, this seems distinctly a lang matter, and that's also how the tracking issue was tagged by @RalfJung and how it has remained tagged. So I'm going to pull off the T-types label before proposing FCP here. But of course, if there's something I missed here, please let me know. I'll leave it nominated for types for visibility.

@rustbot rustbot removed the T-types Relevant to the types team, which will review and decide on the PR/issue. label Jul 26, 2024
@traviscross
Contributor

@rfcbot fcp merge

This stabilization looks correct and conservative to me. The analysis by @nikomatsakis is thorough and answers all the questions I had. I don't see any doors of any concern that we're closing here.

This feature is desirable. It's known to be useful for Rust-for-Linux and will be useful for async-related purposes in the standard library.

Thanks to @nikomatsakis for putting together this careful write-up and to @RalfJung, @oli-obk, @BoxyUwU, and others for helping through discussion to shape and refine it.

@rfcbot

rfcbot commented Jul 26, 2024

Team member @traviscross has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Jul 26, 2024
@lcnr
Contributor

lcnr commented Aug 1, 2024

Freeze requirement

Const evaluation is not allowed to

  • access the contents of an UnsafeCell;
  • result in values that contain an UnsafeCell.

This doesn't seem quite right to me 🤔 It's totally fine for const evaluation to result in values containing UnsafeCell; it's even necessary in order to have a static with interior mutability.

use std::cell::UnsafeCell;
const FOO: UnsafeCell<u32> = UnsafeCell::new(1);

fn main() {
    unsafe { *FOO.get() = 2 };
    unsafe { assert_eq!(*FOO.get(), 1) };
}

It's also fine for const eval to access the content of an UnsafeCell as long as the unsafe cell is 'local' to the current const (I don't know the exact rules):

#![feature(const_refs_to_cell)]
#![feature(const_mut_refs)]

use std::cell::UnsafeCell;
const S: u32 = {
    let x = UnsafeCell::new(1);
    unsafe { *x.get() }
};

I guess the rules should be something like the following?

Const evaluation is not allowed to

  • access the contents of an UnsafeCell from a static;
  • result in values that reference an UnsafeCell.

@RalfJung
Member

RalfJung commented Aug 1, 2024

I guess the rules should be something like the following?

Specifically, const-eval can never read from (or return a reference to) any mutable static -- that includes static mut as well as static !Freeze.

And indeed it cannot return a reference to UnsafeCell, either.

@workingjubilee
Member

As of #125834, the final examples can be written:

#![feature(const_refs_to_static)]
#![feature(const_mut_refs)]

extern {
    static S_EXT: u32;
}

static mut S_MUT: u32 = 0;


// OK
const EXT: *const u32 = std::ptr::addr_of!(S_EXT);

// OK, requires feature(const_mut_refs)
const MUT: *mut u32 = std::ptr::addr_of_mut!(S_MUT);

i.e. they are not unsafe because they perform no unsafe operation.

@nikomatsakis
Contributor Author

@rfcbot reviewed

I'll read over what @lcnr wrote but (if I'm not mistaken) it seems like a relatively minor tweak where I overstated the rules?

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Aug 21, 2024
@rfcbot

rfcbot commented Aug 21, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 21, 2024
@RalfJung
Member

I guess the rules should be something like the following?

Const evaluation is not allowed to

  • access the contents of an UnsafeCell from a static;
  • result in values that reference an UnsafeCell.

I would phrase this as: Const evaluation is not allowed to

  • access the contents of any mutable static (whether that is via interior mutability or static mut).
  • result in values that safely reference anything mutable (whether that is via interior mutability or &mut).

"Safely reference" here refers to recursively traversing the value in the same way safe code could (but ignoring visibility), i.e. recursing through references but not through raw pointers or unions.

@nikomatsakis
Contributor Author

Thanks @RalfJung, I'll update.

@traviscross
Contributor

@rustbot labels -I-lang-nominated

We discussed this in lang triage today. People felt good about this, and it's now in FCP.

@rustbot rustbot removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 21, 2024
@tmandry
Member

tmandry commented Aug 24, 2024

Sorry, I'd like to review this before stabilization but haven't had time yet.

@rfcbot concern tmandry review

@rfcbot rfcbot added the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 24, 2024
@rfcbot rfcbot removed the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Aug 24, 2024
@tmandry tmandry self-assigned this Aug 24, 2024
@tmandry
Member

tmandry commented Aug 27, 2024

@rfcbot resolve tmandry review
@rfcbot reviewed

@rfcbot rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 27, 2024
@rfcbot

rfcbot commented Aug 27, 2024

🔔 This is now entering its final comment period, as per the review above. 🔔

@tmandry tmandry removed their assignment Aug 27, 2024
@dingxiangfei2009 dingxiangfei2009 linked a pull request Aug 29, 2024 that will close this issue
@rfcbot rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 6, 2024
@rfcbot

rfcbot commented Sep 6, 2024

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@rfcbot rfcbot added the to-announce Announce this issue on triage meeting label Sep 6, 2024
workingjubilee added a commit to workingjubilee/rustc that referenced this issue Sep 11, 2024
…efs-to-static, r=petrochenkov

Stabilize `const_refs_to_static`

Close rust-lang#128183
Tracked by rust-lang#119618
cc `@nikomatsakis`

Meanwhile, I am cooking a sub-section in the language reference.
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Sep 19, 2024