Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constants that depend on type parameters in generic code #1062

Closed

Conversation

quantheory
Copy link
Contributor

Rendered

This attempts to lay down some rules involving possible uses of expressions like <T as Int>::ONE, where T is a type parameter. This is a bit tricky, and currently forbidden completely (even on my associated const branch). However, I believe that some cases can be allowed without major problems, and it may be good to explore this with associated constants before trying to deal with things like #1038.

If this is accepted (and for that matter, maybe even if this is rejected), #865 could be cut back to just mentioning simple syntax issues, and dealing with type object safety.

Note that I wrote this, then mostly rewrote it a couple of weeks later. I think that I got it roughly right in the end, but the RFC might be a bit weird in places if I missed something.

preceding section. That is to say, the initializer expression itself is invalid.

In order to implement this restriction, use of a type parameter in a constant
expression must be considered "contagious". That is, if the initializer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think contagious is the correct word. Maybe contiguous? I'm not really sure what word you're looking for.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contagious is the correct word. People sometimes use "viral" to have the same meaning, but with stronger negative connotations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. I thought it had to do with sickness. I even looked it up to check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A condition that is "contagious" is spread between things that contact one another. It often refers to disease, but not always. Maybe I could use a different word, like "inherited".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters anymore because it's been explained to me. Only if the misunderstanding is common should be it changed. I don't have any evidence that it is.

@aturon
Copy link
Member

aturon commented Apr 15, 2015

cc @nikomatsakis

@nikomatsakis nikomatsakis self-assigned this Apr 16, 2015
value, say `16`, in which case it is clear that both sides of the assignment
involve arrays of size `32`.

However, in generic code, if `T` is a type parameter, we are asking the type
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I object to this notion. I don't think we're asking the compiler to prove some generic equation. We're asking it to evaluate a kind of a poor man's static_assert at the point of template instantiation. To wit, given:

trait Foo {
    const VALUE: usize;
}

...then:

fn bar<T: Foo>() {
    let _: [u8; T::VALUE] = [0; 111];
}

...is effectively the same as (using C++ syntax for static_assert):

fn bar<T: Foo>() {
    static_assert(T::VALUE == 111, "Invalid value");
}

Or, in more general:

let _: [u8; x] = [0; y];

...is effectively the same as:

static_assert(x == y, "");

...and if either expression x or y depends on some template parameter, then checking such static assertion should be deferred to the point of instantiation of the given template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, I believe that, at least in theory, if a generic function is valid on its own, and it is instantiated with types that match its signature, instantiation should never fail. Or to put it differently, Rust generics are not like C++ templates in that they are fully type-checked before monomorphization. We use trait bounds and where clauses, which explicitly spell out requirements in a function's signature, instead of static assertions in function bodies.

This means that we need one of three things to be the case:

  1. We could give up and make generics more like C++ templates. This is a big change to the language with many real and potential downsides, and I don't see it happening. Being able to see all the requirements of a generic function in its signature is very useful.

  2. The user could be able to specify constraints on constants, as well as types, using where clauses. I fully expect this to happen eventually (in fact I plan to propose it), and it would provide other benefits (like specializing an impl based on a constant value?), but there are problems with relying on where clauses.

    Firstly, I think that it will take longer to figure out how to do this well; we need a syntax for such where clauses, we need to specify what kinds of constraints can be put on constants, and we need to spell out the impact on coherence checks, just to start with.

    Secondly, such where clauses can very easily clutter generic code. If a generic routine requires T::Value == 111, then a generic routine that calls it also needs to specify T::Value == 111, and that affects still other routines and containers, so that the constraint would start to clutter a large body of unrelated code unless we have a way to contain it. This may turn out to be inevitable for some cases, but it also simply looks ridiculous if your constraint is something that should always be true, such as T::N + T::N == 2*T::N.

  3. We could have the compiler become capable of figuring out certain things about constants on its own. I'm not really fond of the idea of having a solver that actually tries to do a lot of algebra for you. However the trivial case (T::VALUE == T::VALUE) seems to be a fairly safe thing for the compiler to understand, which is why this RFC bases its functionality for array sizes upon that extremely limited foundation.

    If I understood correctly, @freebroccolo suggested instead giving constants an inductive representation in terms of some type-level naturals, which could be provided via plugin or ideally in the standard library. This, in a sense, would use the type system to create an effective "solver" that's more predictable, limited, and rigorously specified, as opposed to some kind of SMT solver that would seem like pure compiler magic.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't really understand what the difference is between your case 2 (being able to specify template argument constraints on associated constants) and having a C++ style static_assert inside the function body (other than that the former is nicer because then you see the function requirements just by looking at the function signature). In case Rust gets function specialization in the future, then those aforementioned two things would be fundamentally different.

Assuming your case 2 has been implemented using the obvious syntax, the following would be effectively the same as doing the equivalent C++ style static assertion inside the function body:

fn bar<A, B>() where
    A: Foo,
    B: Foo,
    A::VALUE == B::VALUE
{}

trait Foo { const VALUE: usize; }

...in that we are deferring the evaluation of the assertion that the two VALUEs are equal to the point of template instantiation. So, I don't see why the compiler should be able to prove that the following constraint/assertion is always true, no matter what type you instantiate the template with:

fn bar<A>() where
    A: Foo,
    A::VALUE + A::VALUE == 2 * A::VALUE
{}

Evaluating all those kinds of equations in constraints can be deferred to the point of template instantiation.

I don't see any use case for allowing code like let a: [u8; T::N + T::N] = [0; 2*T::N]; where T is a template type argument and T::N is a trait-associated constant of type usize. Just reject it altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't really understand what the difference is between your case 2 [...] and having a C++ style static_assert inside the function body

Three differences I can think of:

  1. Constraints in where clauses interact across function boundaries. If I had some function funky_fn<I, J> that called some other function (that called another function...) that ultimately called your bar as bar::<I, J>(), then I might need to also specify I::VALUE == J::VALUE on funky_fn (and maintain this condition on all functions in between). As I mentioned, the downside is that this is more verbose. The upside is that this forces the restriction on funky_fn parameters to be specified up front.

    With static asserts, I might not know that there was a restriction on funky_fn until I tried calling it the wrong way, and then the error message would come from all the way down in bar, which has completely different names for its type parameters, and which might be an undocumented implementation detail that I'm not even supposed to have to deal with. This is one contributing factor to puzzling template error messages you sometimes get from C++ compilers.

  2. When the constraint is specified as part of the function signature, the compiler can tell that A::VALUE == 2 and A::VALUE == 3 are non-overlapping during coherence checking, and thus allow two implementations that differ only with respect to these constraints. This touches on function specialization, but I just want to clarify that this is something you get even if constants are treated just like types are now, without any of the things proposed for future improvements in specialization.

  3. If you put a constraint on a method's signature in a trait, it can be taken into account by users who have no idea which implementations of that method they could get in a piece of generic code. Depending on how we want to approach object safety, we could also have trait objects specified in types like Box<Trait<CONST=3>>. Sometimes you want to talk about constraints on a trait item even when you don't know what the actual implementation of the trait is at compile time.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only now realized that your argument was that given that template instantiations are not allowed to fail if their constraints are satisfied, then using something like let _: [u8; x] = [0; y]; is effectively asking the compiler to be able to prove an arbitrary math equation x=y. So, please ignore everything I said.

I think some kind of a mutual-agreement-based pruning of irrelevant discussion branches would be a nice feature here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understood correctly, @freebroccolo suggested instead giving constants an inductive representation in terms of some type-level naturals, which could be provided via plugin or ideally in the standard library. This, in a sense, would use the type system to create an effective "solver" that's more predictable, limited, and rigorously specified, as opposed to some kind of SMT solver that would seem like pure compiler magic.

IIRC, what I was suggesting was essentially some sort of reflection based type-level computation. I've been meaning to write something more concrete about this but until then here is some intuition about how it should work.

Take a piece of code like this from my length-indexed containers example:

    pub fn insert<M: Nat, SN: Nat>(mut self, index: M, element: T) -> LVec<T, SN> where
        Compare: Fn(M, N) -> order::LT,
           Succ: Fn(N) -> SN,
    {
        self.data.insert(index.reflect(), element);
        unsafe { self.reindex() }
    }

Now, this is an old example that used plain binary natural numbers, but it is efficient enough to use for vectors with at least thousands of elements, probably more. Scaling up to much larger numbers is possible with a different encoding.

However, it would still be nice to have near-native performance for computation when type-checking these things. The way to do that, I think, is to just provide some additional information to the Rust compiler. If we forget about the fact that singleton values are "more typed" than the usual runtime values, we can construct a bijection:

// Positive binary naturals in bijective base-2 style
struct One;
struct Mul2 <N: Nat>(N);
struct Mul2s<N: Nat>(N);

// type is the builtin type to use, trait is the trait classifying type data
#[reify(type = u64, trait = Nat)]
trait Nat: Inductive {}

// cons is the expression interpreting the type data constructor
#[reify(type = u64, cons = 1)]
impl Nat for One {}

#[reify(type = Fn(u64) -> u64, cons = |n| { n * 2 })]
impl<N: Nat> Nat for Mul2<N> {}

#[reify(type = Fn(u64) -> u64, cons = |n| { n * 2 + 1 })]
impl<N: Nat> Nat for Mul2S<N> {}

struct Div2;
struct Div2p;
struct Succ;
struct Pred;
struct Add;
struct Mul;#[reify(type = Fn(u64) -> u64, fun = |n| { n / 2 })]
impl<N: Nat> Fn<(Mul2<N>,)> for Div2 {
  type Output = N;
  extern “rust-call” fn call(&self, (m2n,): (Mul2<N>,)) -> N {
    m2n.0
  }
}

#[reify(type = Fn(u64) -> u64, fun = |n| { (n - 1) / 2 })]
impl<N: Nat> Fn<(Mul2s<N>,)> for Div2p {
  type Output = N;
  extern “rust-call” fn call(&self, (m2sn,): (Mul2s<N>,)) -> N {
    m2sn.0
  }
}

// arguments are rules name, domain Rust type, codomain Rust trait classifying type data
reflect_rules!(u64_to_nat, u64, Nat) {
  (0) => { impossible!() }
  (1) => { One }
  (n) if $n & 1 == 0 => { Mul2<u64_to_nat!($n / 2)> }
  (n) => { Mul2s<u64_to_nat!(($n - 1) / 2)> }
}

The idea is to provide additional information with type-level "constructors" that says how to interpret them in terms of a normal Rust type. The inverse is provided through a mapping like reflect_rules!. Next, type-level functions in terms of trait implementations can be mapped to Rust functions or closures.

So back to the original example, when Rust starts to check something like Compare: Fn(M, N) -> order::LT, it can instead interpret M and N as u64 (e.g., cmp([[M]],[[N]]) == Ordering::Less) and use the builtin comparison and check that the invariant holds (the ordering can also be reified, all first-order data is trivial to encode this way). In fact, the compiler could be even more clever with these rules and avoid unnecessary conversions for larger computations.

The main issue with this approach is that Rust isn't expressive enough to be able to also encode that these rules mapping back and forth are correct. This is up to the programmer to prove. Essentially the rules need to form a well behaved equational theory, so you want things like this (call [[_]] reflect and {{_}} reify), interpreted in a manner where term equality is compatible with unification:

[[ Op: Fn(X, Y) -> R ]] implies [[ Op ]]([[ X ]], [[ Y ]]) == [[ R ]]
{{ [[Op]](x, y) == r }} implies Op: Fn({{x}}, {{y}}) -> {{r}}

I think there are a number of advantages to using an approach like this. For one, it degrades nicely and (assuming correctness) doesn't change the semantics, it's just providing a potential speedup. It facilitates a bridge between runtime computation and static computation that doesn't require an entirely new evaluation mechanism and extensions to the checker since it would re-use existing facilities for this via the plugin mechanism. It would require the ability to load plugins during the type-checking phase, but from what I could tell this shouldn't cause any fundamental problems. And since the structure of this kind of code is very predictable, it would be possible to automatically derive a lot of it, which could also be used for generating quickcheck tests, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tommit:

Don't worry about it. I know that it these discussions cause some clutter, and I hope that that doesn't discourage people from reading, but at the same time I had almost exactly the same reaction as you only a few months ago, when I was still thinking of Rust generics as being like C++ templates (even though I thought I had absorbed enough from Haskell, maybe ML, to know better). So my hope is that people reading are either interested enough to get through it, ADHD enough to skip to the next part, or green enough that it's educational for them.

@freebroccolo:

Thanks for your explanation (as you may have guessed, I mentioned you in the hope of getting you involved here!). I think that in a very broad sense, I had the gist of what you intended, but I'll have to consider the implementation details you mentioned, which I either didn't see before or simply forgot.

I'll have to consider this when I have the time to do it justice. In the meantime, I'd appreciate your thoughts on this RFC in general, and particularly if it's compatible with the route you'd take, whether or not it lines up conceptually. (Preferably on the main thread rather than this line note, which is getting a bit excessive; admittedly that's my fault.)

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 15, 2015
statics. There are two justifications for this.

Firstly, nested static items, whether they are static variables or functions,
must have static locations in memory. If these items could use the type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not expect that. Tried it out in the playpen, and you're completely correct. I would have expected every parameter-combination to have its own static, but that would be impossible to get correct, since multiple crates might be using a crate with a generic function and then you'd get multiple statics for the same parameter-combination.

@jroesch
Copy link
Member

jroesch commented May 28, 2015

cc me

Consider the following line of code in a non-generic context:

```rust
let a: [u8; T::N + T::N] = [0u8; 2*T::N];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting, that with generic T T::N should have a constraint where T::N <= usize::MAX / 2, otherwise it will fail in trans since overflow is an error, making the situation closer to "giving up and making generics more like C++ templates". It looks like even most basic operations on (non-C++ style) type level numbers would be impossible without support for such constraints.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, if it's part of the signature, it will be detected before instantiating the function's body with the given type parameters (i.e. "monomorphization").
If it's not part of the signature, then the operation would generate a runtime panic if it were operating on runtime values.

The way I see it, such compiler errors are "early warnings" for runtime errors - you can still get a program that type-checks if you define the result of all operations which could fail.

It is actually possible to codegen such a function to panic at runtime instead of emitting a compile-time error.
It could also trigger a lint, so you can get an error for functions which unconditionally panic, if you want that (could be an error by default, too).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand. If I write something like

trait HasN {
    const N: usize;
}
fn f<T: HasN>() {
    let a = [0u8; 2 * T::N];
    println!("{}", a[0]);
}

shouldn't the type [u8; 2 * T::N] be "evaluated" during instantiation of f leading to an error? And there's no runtime here, because f can't even be instantiated with T::N > usize::MAX / 2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Evaluating [0u8; 2 * T::N] might fail due to overflow at compile-time and the compiler could instead produce an unconditional overflow panic.
All I'm saying is that it's possible - whether or not it's also crazy, that's another matter entirely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can still get a program that type-checks if you define the result of all operations which could fail.

I see, f can be instantiated if we substitute any usize value (like result of wrapping multiplication) instead of an error. Then runtime will exist and panic can be produced.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still feels like regression compared to C++, if error reporting is delayed until runtime, so early warnings would definitely be useful.

@huonw
Copy link
Member

huonw commented Jul 8, 2015

My SIMD work is weakly blocked on this. Specifically it would be useful to allow associated constants in expressions, since certain SIMD operations (shuffles, mainly) require compile-time constants to work properly. One example is something like

trait Num {
    const N: u32;
}
fn shuffle2<A: Num, B: Num>(v: f64x2, w: f64x2) -> f64x2 {
    shuffle_intrinsic(v, w, A::N, B::N)
}

struct Zero;
struct One;
// ...
impl Num for Zero { const N: u32 = 0; }
impl Num for One { const N: u32 = 1; }
// ...

This can then be used like shuffle2::<Three, One>(v, w). (This would be made nicer by more sugar, but that's not necessary for now.)

@archer884
Copy link

Would very much like to be able to use associated constants in default implementations (is that the right term?). As a matter of fact, I honestly thought that was the reason they existed up until yesterday. :)

@hauleth
Copy link

hauleth commented Oct 9, 2015

I'm also blocked by this. This would allow removing almost all allocations in my Octavo crate. I.e. user could make that:

let mut digest = [0; sha3::Sha3512];

digest.result(&mut digest[..]);

And all allocations in Hmac would be unnecessary.

Also error when using associated constant that was declared as struct param is very confusing.

impl<T: Digest + Default> Hmac<T> {
    fn expand_key(key: &[u8]) -> Vec<u8> {
        const SIZE: usize = T::BLOCK_SIZE + 7 / 8;
        // …

Produce:

src/mac/hmac.rs:29:29: 29:42 error: failed to resolve. Use of undeclared type or module `T` [E0433]
src/mac/hmac.rs:29         const SIZE: usize = T::BLOCK_SIZE + 7 / 8;
                                               ^~~~~~~~~~~~~
src/mac/hmac.rs:29:29: 29:42 help: run `rustc --explain E0433` to see a detailed explanation
src/mac/hmac.rs:29:29: 29:42 error: unresolved name `T::BLOCK_SIZE` [E0425]
src/mac/hmac.rs:29         const SIZE: usize = T::BLOCK_SIZE + 7 / 8;

@Zoxc
Copy link

Zoxc commented Mar 28, 2016

Substituting in defintions of constants and comparing the resulting AST would allow the type checker to prove some more values equal:

const Y = 2* PARAM;
const T = 2* Y;
let _: [u8; T] = [1; 2*Y]; // allowed
let _: [u8; T] = [1; 2*(2* PARAM)]; // allowed
let _: [u8; T] = [1; 2*2* PARAM]; // not allowed because * is not associative
let _: [u8; T] = [1; 4*PARAM]; // not allowed

@nikomatsakis
Copy link
Contributor

So this RFC has been sitting around open for a while. I think that with the MIR transition, as well as all the work that @tsion has been doing on miri, we're starting to make some progress here. It's also clear that we want to achieve the various aims of this RFC in one way or another.

But all that said, it seems that there is enough going on, and this particular RFC is not especially active just now. Therefore, I'm going to postpone this RFC (under issue #1038). I expect however that its content and ideas will make their way into whatever design/RFC we ultimately adopt.

Thanks to @quantheory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.