Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🚲🏠 Renaming validity and safety? #95

Closed
Centril opened this issue Mar 7, 2019 · 41 comments
Closed

🚲🏠 Renaming validity and safety? #95

Centril opened this issue Mar 7, 2019 · 41 comments
Labels
A-validity Topic: Related to validity invariants C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary

Comments

@Centril
Copy link
Contributor

Centril commented Mar 7, 2019

It has been noted on several occasions, e.g. rust-lang/rust#53491 (comment) and rust-lang/rust#53491 (comment) that "valid" is prone to confusion with "safe" due to the way the former sounds. In particular, it raises the question "Valid with respect to what"?

Thus it is probably a good idea to rename one or the other into something else.

@RalfJung suggested "initialization invariant" (because any initialized T must satisfy such invariants) as a replacement of "validity invariant" while I suggested "machine invariant" (because it signifies the invariants the abstract machine requires of a T). If we rename validity to either of those then "safety" might not need renaming; but if it does, "type system invariant" may be one candidate.

@RalfJung
Copy link
Member

RalfJung commented Mar 7, 2019

One reason why I suggested "initialziation invariant" is because it connects with MaybeUninit.

MaybeNoMachine doesn't really roll very well...

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2019

Also note that the validity invariant only applies to data that the compiler considers initialized -- this has been a point of confusion several times. Having something with "initialization" in its name should help clarify that.

@RalfJung
Copy link
Member

RalfJung commented Aug 4, 2019

In the MaybeUninit docs I am talking about the "initialization invariant". But I have since regretted that choice, because e.g. a MaybeUninit<u32> is actually "initialized" in uninitialized memory -- and we might end up actually allowing the same for u32.

"Machine invariant" seems like a reasonable choice. But then MaybeUninit should be MaybeMachine?^^

@Ixrec
Copy link
Contributor

Ixrec commented Aug 4, 2019

On one or two past threads I've suggested that we use "sound" to mean "has defined behavior" and "unsound" to mean "is UB", and I still think that's a good idea. So by analogy, I'd suggest renaming the "validity invariant" to the "soundness invariant". At least to me, that makes it immediately obvious what it's supposed to mean: if you violate this, then your code is broken, full stop.

Regardless of which particular word we choose, I do think that it's strongly desirable that "X-ness invariant" corresponds to "if your code is X, it does not have UB". I would not be surprised if "sound" and "unsound" have different connotations to other people and we have to come up with something else, but if we end up agreeing on (for example) "validity invariant" for this, I believe we should also agree to start using "valid" with the very specific meaning of "does not have UB".

IMO, that's the biggest disadvantage of "initialization" or "machine" invariant: it clearly doesn't make any sense to say "if your code is initialized/machined, it has no UB", so we can't hope to kill two birds with one piece of jargon that way.

Also, to me "machine invariant" sounds very strongly like it's trying to distinguish hardware properties from software properties, which is presumably not a distinction the Rust abstract machine would make. It didn't even occur to me the "machine" in "machine invariant" might refer to the Rust abstract machine until I re-read Centril's comment above.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 5, 2019

On one or two past threads I've suggested that we use "sound" to mean "has defined behavior" and "unsound" to mean "is UB", and I still think that's a good idea. So by analogy, I'd suggest renaming the "validity invariant" to the "soundness invariant". At least to me, that makes it immediately obvious what it's supposed to mean: if you violate this, then your code is broken, full stop.

Code that does not violate this isn't necessarily sound, e.g., if it violates the safety invariant.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 5, 2019

@RalfJung suggested "initialization invariant" (because any initialized T must satisfy such invariants) as a replacement of "validity invariant" while I suggested "machine invariant" (because it signifies the invariants the abstract machine requires of a T).

"Initialization" is super confusing. The other day some reddit users where super confused about why this is always correct: MaybeUninit::<MaybeUninit<T>>::uninit().assume_init().

Their argument was that the documentation of MaybeUninit<T>::assume_init() says that if the T is not initialized, the behavior is undefined. Some people were trying to explain them that "uninitialized memory is initialized" but that obviously did not go very far. I can't blame them.

This issue was raised multiple times during the RFC, FCP, etc. of MaybeUninit and I still have no idea why we ended up in a place where we have to explain people in Rust that sometimes "uninitialized memory is initialized", but I rather don't introduce the same issue in the UCGs. If someone wants to introduce the same issue here, they should IMO make a case again for it.

I'm personally fine with using the term "validity", and I believe that any Rust programmer dealing with unsafe code and uninitialized memory is smart enough to learn a new word and its meaning, particular a word that is used many times throughout the spec.

This means that when reading the spec they will probably have to first unlearn the "initialization" terminology used in the API docs, then learn a new word and its meaning, and afterwards mentally replace "initialization" in the API docs everywhere with that word for anything to make sense ("uninitialized memory is initialized" vs "uninitialized memory is valid at type T").

This situation is unfortunate, but better than introducing here another overloaded term like "machine" that might end up confusing users in different ways that "initialization" does today.


"Machine invariant" seems like a reasonable choice. But then MaybeUninit should be MaybeMachine?^^

What does the Maybe_ prefix tries to achieve ?

Why can't it just be Uninit<T> or Invalid<T> with an ::assume_valid method ?

A Machine<T>/MaybeMachine<T> with an ::assume_machine method is IMO much weirder than Invalid<T>. I wonder how this will impact the readability of the text. We currently use "valid"/"invalid" in a lot of places in the text and discussions, and with machine we would need to use "machine"/"not machine" or similar.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

On one or two past threads I've suggested that we use "sound" to mean "has defined behavior" and "unsound" to mean "is UB", and I still think that's a good idea.

I disagree strongly. Why invent new terms for something that already has a term, namely, "has (no) UB"? If "has no UB" sounds too negative, an alternative positive wording I sometimes use is "has well-defined behavior".

Worse, "sound" is already used to mean something else: A safe function / API is unsound if it is possible for safe code using it to cause UB. Otherwise, it is sound.
So, soundness is a property of a library; having UB is a property of a program. @gnzlbg already used "sound" the same way up there. I'd appreciate if we could be consistent about this -- let us not burn terms by using them in different ways for no good reason.

EDIT: I propose we add this term to the glossary: #190

it clearly doesn't make any sense to say "if your code is initialized/machined, it has no UB", so we can't hope to kill two birds with one piece of jargon that way.

"code" isn't "initialized", that makes no sense indeed. But I think this is actually in fact another argument against your terminology: code can have UB in other ways than violating the validity invariant! Say we have code that uses unchecked_add to cause an overflow, but the validity invariant is always satisfied. This code has UB. It would be very confusing if the invariant and UB-ness used related terms.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

"Initialization" is super confusing. The other day some reddit users where super confused about why this is always correct: MaybeUninit::<MaybeUninit>::uninit().assume_init().

I admitted exactly that a few posts up. :)

This issue was raised multiple times during the RFC, FCP, etc. of MaybeUninit and I still have no idea why we ended up in a place where we have to explain people in Rust that sometimes "uninitialized memory is initialized"

The name of what is now called assume_init was heavily bikeshed. Nobody suggested a better name. Pretending like you don't understand the process at which we arrived here is disrespectful of everyone involved.

Multiple people reviewed the stablization PR and also the others about documenting MaybeUninit, renaming assume_init, all of that. The reason this wording stuck is that nobody proposed a better one.

Yes, you raised a concern related to this. You made an "excruciatingly verbose" proposal but said yourself "I don't know whether the painfully excruciating approach is worth pursuing". And moreover, it's not even the same concern -- that one was about validity not being enough because there is also the safety invariant, now the issue is that "uninitialized memory" can be considered initialized. I do not think anyone raised that particular concern before stabilization. Prove me wrong.

I'm personally fine with using the term "validity"

The issue is that "this is a valid Vec<T>" will likely often be read as "oh so it's safe to use it". This was discussed pre-stabilization. People seemed to prefer MaybeUninit over MaybeValid. I would have been fine to go either way, but it's a bit late for that now.

Why can't it just be Uninit or Invalid with an ::assume_valid method ?

This was discussed pre-stabilization, the argument was that Uninit<T>/Invalid<T> sounds like this is definitely uninitialized/invalid, which is not true.

I wonder how this will impact the readability of the text. We currently use "valid"/"invalid" in a lot of places in the text and discussions, and with machine we would need to use "machine"/"not machine" or similar.

"machine" is indeed not suited as an adjective. We'd have to find another short-hand for "satisfies the machine invariant".

This means that when reading the spec they will probably have to first unlearn the "initialization" terminology used in the API docs, then learn a new word and its meaning, and afterwards mentally replace "initialization" in the API docs everywhere with that word for anything to make sense ("uninitialized memory is initialized" vs "uninitialized memory is valid at type T").

We will have to disentangle "(un)initialized memory" from "(un)initialized variable (of type T)". Not great, but I think it can be done. "A variable of type T is initialized if it satisfies the validity invariant", or so.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 5, 2019

Yes, you raised a concern related to this. You made an "excruciatingly verbose" proposal but said yourself "I don't know whether the painfully excruciating approach is worth pursuing". And moreover, it's not even the same concern -- that one was about validity not being enough because there is also the safety invariant, now the issue is that "uninitialized memory" can be considered initialized. I do not think anyone raised that particular concern before stabilization. Prove me wrong.

People in that thread were confused about the "initialized" vs "validity invariant" semantics of "into_init" because it has "initialized" in its name. I thought this issue was important enough to raise it in the first sentence of the comment you just cited: "I think it is worth it to call out again that the only thing that into_init asserts is that the value satisfies the validity invariant of T, which is not to be confused with T being "initialized" in any general sense of the word."

I proposed using Invalid<T> with into_valid instead, but this was briefly dismissed here: rust-lang/rust#53491 (comment): "This is correct. OTOH, I feel "initialized" is a reasonable proxy for this in a first explanation."

I felt the issue fell in deaf ears, and the throughput of the discussion was too high for me to keep up with it, so I left it at that.

So this precise issue was raised there and known, and from what I understood it was an explicit design decision to use "initialized" instead. Even though it could be confusing, as that discussion showed.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

"I think it is worth it to call out again that the only thing that into_init asserts is that the value satisfies the validity invariant of T, which is not to be confused with T being "initialized" in any general sense of the word."

You are quoting without context. The post goes on demonstrating the issue with an example of something that satisfies the validity invariant, but not the safety invariant. So, my reading of what you said -- and I still interpret your post the same way when I re-read it now -- is that "initialized" above means "safety invariant". Your concern (or rather, my interpretation of your concern) was that someone could "initialize" a Vec<T> (or an AlwaysTrue in your example) but fail to actually make it a safe instance of that type. To account for this, an appropriate paragraph was added to the MaybeUninit docs.

If you meant that particular post to raise the concern that "uninitialized memory can be initialized because some validity invariants allow that", then we heavily miscommunicated. As far as I can see, nothing in your post indicates that you are worried about the interaction of uninitialized memory and assume_init.

I proposed using Invalid with into_valid instead, but this was briefly dismissed here: rust-lang/rust#53491 (comment): "This is correct. OTOH, I feel "initialized" is a reasonable proxy for this in a first explanation."

I was responding to the issue that Vec<T> can be "initialized" without being safe to use -- we can satisfy the validity invariant, which makes assume_init non-UB, but fail to satisfy the safety invariant, so e.g. dropping that vector will cause UB. I still think "initialized" is a reasonable proxy here.

Note that this is not the problem that is biting us now. MaybeUninit::<MaybeUninit<T>>::uninit().assume_init() is fine both from a validity and safety invariant perspective, so there is not even an overlap with the issue you raised in that post. The problem we are having now is specifically about uninitialized memory being valid for some types (0xUU, in my terminology). Uninitialized memory during assume_init AFAIK never came up in the pre-stabilization discussion.

into_valid was part of your "excruciatingly verbose" proposal (your terms), and then of another proposal involving two types and two traits, which I would also call "excruciatingly verbose". I have not seen a proposal to keep the API as it was but just rename assume_init to assume_valid or so. AFAIK, nothing involving valid was suggested during the into_inner bikeshed. It is sure possible I missed that though.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

The other problem with enshrining "valid" anywhere in the API is that there is no consensus yet (that I know of) that this is the term we want to stick to -- source: this issue is still open.

I'd be totally up for that. Of course the API is fixed now (we could consider renaming assume_init, but we should wait at least a year or two otherwise this looks silly), but the docs could use this term.

If we decide that the validity invariant arises from typed copies alone, another good term might be "representation invariant": in that case, a bit string is "valid" for a type if and only if the bit string represents some value at the type. However, this would preclude some options in #77: the representation relation is "pure" and cannot demand things like "the pointer must not be dangling" or "the pointer must point to memory that is valid for T".

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 5, 2019

So, my reading of what you said -- and I still interpret your post the same way when I re-read it now -- is that "initialized" above means "safety invariant".

If this is the case, you are completely misunderstanding both the post, and also the confusion that the users in that thread had back then, as well as the confusion that people still have now.

By "initialized"/"uninitialized" most users just mean whether that memory was touched or not (was something written to it?). They don't know that there are things like "validity" or "safety" "invariants", or any other kind of "invariant" for that matter.

They are literally just confused that the memory was not touched or written to, therefore it is uninitialized, and claiming that it is initialized is incorrect, becasue that would mean that something was written to it, even if it is all just zeros (independently of whether anything has to be written at all).

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 5, 2019

To expand, the confusion of users is orthogonal to whether something must actually be written at all, or whether what's written satisfies the validity or safety invariants, e.g., many users do believe that

let mut x: MaybeUninit<T> = MaybeUninit::uninit(); // x is uninitialized
x = MaybeUninit::zeroed(); // x is initialized

holds for all T because they see it only from the POV of whether something was written to the memory. They don't look at this from the POV of a safety or validity invariant, mostly because they do not know that such a thing exists.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

If this is the case, you are completely misunderstanding both the post, and also the confusion that the users in that thread had back then, as well as the confusion that people still have now.

I don't recall anyone being confused about this (uninitialized memory), do you have pointers?

And I do understand that the discussion right now is about uninitialized memory. But where does your post, your example back then indicate that you are worried about uninitialized memory? The example specifically was about creating an instance of a type that is valid but not safe. We do agree that that is not the issue right now, right?

We have two confusions about assume_init:

  • The Vec<T> case:
    let mut u = MaybeUninit::<Vec<T>>::uninit();
    u.as_mut_ptr().write_bytes(1u8, 1);
    let v = u.assume_init(); // no UB here: this is valid!
    drop(v); // but it's not safe, so there is UB here.
    You raised this before, you gave the AlwaysTrue example, but I considered this to be fixable by docs. People were confused about this, yes. But the proposed alternatives were very complicated APIs that even you admitted were probably overcomplicating things.
  • The "nested MaybeUninit" case:
    let mut u = MaybeUninit::<MaybeUninit<T>>::uninit();
    let v = u.assume_init(); // no UB here: this is valid!
    // And in fact this is also safe, we could run any safe code here
    This only came up after stabilization. The fix could have been to call this assume_valid or so, but that wasn't proposed (or rather, only proposed as part of the much more complex proposals trying to solve the Vec<T> case).

I think these are very distinct cases. In the first case memory is initialized, it is even valid, but then the issue is that there's another invariant, the safety invariant, that is violated. In the second case, memory is not initialized and yet it is valid.
The fact that these are distinct is also evident in the observation that fixing them requires very different measures to be taken. The first requires teaching the two invariants, the second does not.

If you meant to raise the "nested MaybeUninit" problem in that post, you seriously failed to express that.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

To expand, the confusion of users is orthogonal to whether something must actually be written at all, or whether what's written satisfies the validity or safety invariants, e.g., many users do believe that

If people think that ::zeroed() produces an "initialized T", they didn't read the docs. Not sure we can do much for people calling unsafe functions without reading the docs.

I proposed back then to use "initialized" as proxy for "good enough to be taken out of the MaybeUninit". There were some objections (the Vec<T> case above), but no better proposal (only something involving two types and/or two traits reflecting both invariants separately). The idea was to "re-define" initialized as more than just "something was written there". And I think insofar as this is strictly more, this worked -- or do you have concrete feedback from someone who (after reading the docs) was confused about whether MaybeUninit::<&T>::zeroed().assume_init() is okay or not?

Where this failed -- what I did not foresee and what nobody raised before stabilization -- is that sometimes, our "initialized" is actually less than "something was written there" (the "nested MaybeUninit case above). This is what confuses people now.

@Lokathor
Copy link
Contributor

Lokathor commented Aug 5, 2019

Gentlemen, the historical details of how we arrived here aren't very important or particularly worth arguing about.

We are here, we just need to decide how to move forward with the situation we've got.

Here's some points I'll throw into the pile:

  • People outside the UCG already use the phrase "valid bit pattern" to describe some bits that are allowed within type T.
  • It's very good to distinguish "insta-UB" memory from other unsafe things. Settings a bool to the bit pattern 3 is very different from settings the len of a Vec far too high. One of these can be recovered from, the other cannot.
  • Anecdotal Account: I've seen a lot more data transmutation / type punning hackery than I've ever seen people misusing uninitialized. I'm sure that uninitialized gets used plenty in some parts of the ecosystem, but mostly just u8 buffers from what I've seen when people ask about unsafe code in #black-magic or from what has been brought up so far in safety dance.

From this my takeway is that "Initialization Invariant" is probably not a good term to try and transition to. Telling people that initialized memory might not meet an initialization invariant is a little funky, and then explaining on top of that about a safety invariant that also might be required means you have to back up and admit that the initialization invariant doesn't include all of the safety rules that a fully initialized value of the type must follow. It quickly becomes a mess.

I don't think we need to transition to any new term at all. Just improve the explanation of Validity Invariant and Safety Invariant by a bit in the glossary.

EDIT: Ralf you posted while I was typing and covered some of the same points as me. XD oh well.

@RalfJung
Copy link
Member

RalfJung commented Aug 5, 2019

@Lokathor so you are suggesting "validity invariant" and "safety invariant" are good enough terms, and we should start to use them "for real" because the concepts are already out there and it is important to get consistent and precise terminology?

I won't disagree. But this will need consensus, maybe even involving the lang team. These are key concepts of the language. (Similar to "soundness", which I recently had to see someone use in a different way so I opened #190.)

@Lokathor
Copy link
Contributor

Lokathor commented Aug 5, 2019

Right. I think the best path is to continue with Validity/Safety as terms, and then build consensus around those terms and increase education of those terms.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 6, 2019

I don't recall anyone being confused about this (uninitialized memory), do you have pointers?

Sure, here and here.

For these users,

let mut x: bool = uninitialized();
x = true;

was ok because they don't think about validity or safety invariants. They just see uninitialized memory and initialized memory, and as long as you don't use it as such, they expected uninitialized memory to be fine. That's the exact same confusion that we are seeing here, where MaybeUninit usage of the word "initialized" means something very specific, but Rust users understand it differently depending on their background. Some people might understand it as initialized in such a way that it does not violate safety, while others understand it as uninitialized/initialized memory (that's the specific issue that the reddit users were having in the reddit thread).

This was discussed pre-stabilization, the argument was that Uninit/Invalid sounds like this is definitely uninitialized/invalid, which is not true.

Where was this argument made? I definitely raised that issue, but this wasn't the answer I got, e.g.,
rust-lang/rust#53491 (comment).

But where does your post, your example back then indicate that you are worried about uninitialized memory?

First sentence, emphasis mine: "I think it is worth it to call out again that the only thing that into_init asserts is that the value satisfies the validity invariant of T, which is not to be confused with T being "initialized" in any general sense of the word."

Safety or validity invariants are extremely Rust specific, and not even general Rust terms yet. Using "initialized" to mean any of those was predestined to confuse every user that attempts to transfers what "initialized" means in their dictionary or in the PLs they are familiar with. Particular C and C++ programers, where bool x = 42; is ok and that x is "initialized" to an invalid value, as opposed to bool x; which is uninitialized.

Gentlemen, the historical details of how we arrived here aren't very important or particularly worth arguing about.

AFAIK, these issues were raised back then, therefore they are not new, and the decision to use the "initialized" terminology balanced the tradeoff of being potentially confusing to some users with other tradeoffs like using a familiar word that users do not have to look up.

If the RFC and FCP process in which the lang-team approved the "initialized" terminology did not consider the issues being raised about how confusing that terminology is, then these could be considered new issues, and we can change old decisions if new issues are raised.

Otherwise, we cannot. And the open question is whether the spec should deviate from the terminology used in the MaybeUninit API. I think it is unfortunate that the words used in the MaybeUninit API confuse some users, but instead of using confusing terminology in the spec I think it is ok to just add a note to the glossary where the validity invariant is explained and mention that the MaybeUninit API calls this "initialized" or something like that.

@RalfJung
Copy link
Member

RalfJung commented Aug 6, 2019

That's the exact same confusion that we are seeing here

I think that's a very far-fetched statement. There are lots of subtleties around uninitialized memory, and you are taking all of them and claiming they are the same. Just because two people are confused about the same thing doesn't mean they are confused the same way, and doesn't mean the same approach is best to help them.

That particular issue that you are now bringing up (which has basically nothing to do with what we were discussing before) was about someone not understanding that validity is required even on a typed copy, not just when the value is "used" in some obvious sense of the word. That doesn't even have anything to do with mem::uninitialized, it applies equally to transmute. Yes, we have to teach them this, but this is not what people are confused about in the reddit threads that triggered this.

In fact I think that the "initialized" terminology helps here because then we can say that a let b: bool = mem::uninitialized() does not actually "dynamically" initialize variable b because its validity/initialization invariant is not satisfied. I have used this analogy (static vs dynamic initialization, the latter being basically validity) successfully. "Variables must be initialized" makes sense, and then we just have to teach people that in Rust, "initialized variable" (not memory) comes with an extra invariant. This has the benefit that we don't have to declare a "typed copy" as a "use" in any sense, which trips people all the time.

This plan went fine, until people pointed out that sometimes uninitialized memory can be an initialized variable. And that's what triggered this discussion.


First sentence, emphasis mine: "I think it is worth it to call out again that the only thing that into_init asserts is that the value satisfies the validity invariant of T, which is not to be confused with T being "initialized" in any general sense of the word."

You said this already, and I responded to it before in great detail. Instead of repeating what you said, please reply to my post where I argue why that is a totally different problem that you raised there.

If the RFC and FCP process in which the lang-team approved the "initialized" terminology did not consider the issues being raised about how confusing that terminology is

It considered the existing issues that were raised for why the terminology is confusing ("the Vec<T> case" and "typed copy is a use"). Those issues were deemed manageable.

But after stabilization, new issues were discovered ("the nested MaybeUninit case"). I think this falls squarely into the "new evidence" category, and I do not understand why you are trying to convince us that it does not.


Where was this argument made? I definitely raised that issue, but this wasn't the answer I got, e.g.,
rust-lang/rust#53491 (comment).

Hm, I was sure I read this somewhere, but it's very hard to find anything in these discussions, and maybe I was wrong.
It may have been that I only saw the Invalid/Unsafe in your extremely complicated proposal (the one that also tried to reflect the safety invariant), which this and the following posts argue against (plus your own post introducing it didn't seem entirely convinced either).

Also remember at that point the type naming bikeshed in rust-lang/rust#56138 was 3 months done -- and in that bikeshed, I see you propose MaybeInvalid and @Centril propose MaybeValid, but a few people were worried about the term "valid" being too vague and it didn't seem like we could commit to "validity invariant" as a terminology. So then the name MaybeUninit was basically set -- if we keep re-opening old bikesheds, we'll never get anywhere. It is possible that because of this, Invalid didn't get as much consideration as it should have. But given the fate of MaybeValid as "too vague", I also don't think it would have fared well.

@comex
Copy link

comex commented Aug 7, 2019

FWIW, I don't like any of the alternate names. The name MaybeUninit is inaccurate in certain edge cases, but it does a much better job of conveying what the type is generally used for than MaybeValid or MaybeInvalid. However, in the documentation, where there isn't the same need for brevity, it makes sense to use different terminology.

@RalfJung
Copy link
Member

RalfJung commented Aug 7, 2019

All right, so I think we all agree that "initialization invariant" is a bad name for UCG purposes. We are having an argument about how foreseeable that was, but leaving that aside we have to see how we proceed. It is used in MaybeUninit docs currently and we should change those docs to align with UCG terminology, once UCG has terminology.

  1. We need to come to consensus on a name for "validity invariant".
  2. Then we need to update the docs to use that. Should this go with lang team FCP or so? Seems reasonable.
  3. Then maybe we should also update the MaybeUninit API itself. I agree with @comex though that the type name is reasonable and should stick. There might be arguments to rename assume_init to assume_valid (or whatever), but I think @comex' argument also applies there.

So, for step 1, which contenders do we have?

  • Validity invariant. UCG people are used to it by now, but there is still the counter-argument of vagueness: "valid for what?"
  • Machine invariant. Refers to what the Rust Abstract Machine knows about, which is appealing. We'd also need an adjective though, maybe "machine-valid"?
  • Anything else?

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 7, 2019

I think this falls squarely into the "new evidence" category, and I do not understand why you are trying to convince us that it does not.

I have no idea why you believe this is new evidence, and I disagree that it is, since it was pointed out, e.g., in the bikeshedding discussion, that using "uninitialized" as a word obscures that this is about "validity" (e.g. here), and sub-sequently, once it was decided that "uninitialized" should be used, it was requested that at least the documentation should talk about validity, because uninitialized-ness is irrelevant (e.g. here).

However, in the documentation, where there isn't the same need for brevity, it makes sense to use different terminology.

👍

There might be arguments to rename assume_init to assume_valid (or whatever), but I think @comex' argument also applies there.

I don't think the churn is worth it here. The brief API docs should use the term valid, and the body should explain it.

@Centril
Copy link
Contributor Author

Centril commented Aug 8, 2019

We are having an argument about how foreseeable that was, but leaving that aside we have to see how we proceed.

I agree with @Lokathor's sentiment that contrafactual arguments are not productive so let's please end that argument and start from the facts that cannot be changed. ;)

  1. Then we need to update the docs to use that. Should this go with lang team FCP or so? Seems reasonable.

I think that the language team is more interested in the concept itself rather than the label so I don't think FCP is necessary. Curious to hear what @nikomatsakis thinks.

  1. Then maybe we should also update the MaybeUninit API itself. I agree with @comex though that the type name is reasonable and should stick. There might be arguments to rename assume_init to assume_valid (or whatever), but I think @comex' argument also applies there.

I think renaming the MaybeUninit<T> API in any way that isn't just about docs is unnecessarily disruptive and a non-starter.

  • Machine invariant. Refers to what the Rust Abstract Machine knows about, which is appealing. We'd also need an adjective though, maybe "machine-valid"?

If not "initialization" I think this is a reasonable choice (though I'm biased) and machine-valid seems like a reasonable adjective. The way the adjective is phrased answers the basic question "Valid for what?"

  • Anything else?

There was the "representation invariant" which makes sense if the invariant fits the wording.

Maybe what we need now to gather all the pros & cons for the various choices in a document?

@Lokathor
Copy link
Contributor

Lokathor commented Aug 8, 2019

Valid for what?

My understanding of Validity Invariant has been that it meant references and typed accesses and such must have "valid bits for a value of that type to have".

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2019

The way the adjective is phrased answers the basic question "Valid for what?"

The problem with "initialized" is that everybody thinks they know what it means so they don't look it up and end up being wrong in subtle ways that bites them in the form of undefined behavior. This happens to Rust lang team, library team, ... team members, experienced users, and new users alike.

When "valid" was proposed, most people said is too vague, or asked: "Valid for what?". I agree that it is too vague, and some people might think they know what that means, and get it wrong. However, the people that said "Valid for what?" were all pointed to our definition in the glossary, which is a 10 seconds read, and then they knew precisely what it meant.

The proposal "machine-valid" makes it less vague, but it also incurs the danger of people thinking "Ah, valid for the machine, that is, the hardware, and bool x = 42 is ok _for the hardware, so that's ok for Rust as well.".

We need a word that is easy to use and remember once it is learned, but completely opaque for somebody that hasn't looked up its precise meaning, so that they will invest the 10 seconds it takes to look it up.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2019

From the two words we have used for this: many people actually asked "Valid for what?" but nobody asked anything about "initialized" (instead they just were surprised when it did not mean what they though it did). That makes "valid" and "validity invariant" much better terms from my point-of-view than "initialized" and "initialization invariant". I think it would be worth it to measure proposals to replace "valid" and "validity invariant" along this axis as well.

@Centril
Copy link
Contributor Author

Centril commented Aug 8, 2019

We need a word that is easy to use and remember once it is learned, but completely opaque for somebody that hasn't looked up its precise meaning, so that they will invest the 10 seconds it takes to look it up.

I'm starting to think this is a chimera -- "representation invariant" might be the only term that would not have any problems but on the other hand then my understanding is that we must also answer #77 in a certain way for the term to make sense.

The proposal "machine-valid" makes it less vague, but it also incurs the danger of people thinking "Ah, valid for the machine, that is, the hardware, and bool x = 42 is ok _for the hardware, so that's ok for Rust as well.".

If you want we can disambiguate to "abstract-machine-valid" or "virtual-machine-valid" (shortened to AM-valid or VM-valid respectively). Many won't know what the initialism "AM" will stand for but at least they will not think they know what it means so they will look it up.

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2019

I think "abstract-machine-valid" or "virtual-machine-valid" are better and I don't have a better proposal, but they are also quite a mouthful and I fear that we will just inadvertently abbreviate them as "valid", bringing us back to square one.

@Centril
Copy link
Contributor Author

Centril commented Aug 8, 2019

There might be no substitute for code review here to root out inadvertent abbreviations to "valid".

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 8, 2019

There might be no substitute for code review here to root out inadvertent abbreviations to "valid".

I wasn't even talking about the spec itself or the API docs, but just informal discussions, e.g., on Zulip, Discord, the forums, reddit, etc. Maybe everybody will stick to the script and write "abstract-machine-valid" or "AM-valid" everywhere, but I would find it reasonable if people abbreviate it to "valid" here and there, and this might result in "valid" becoming the de-facto term independently of what the spec or the docs say.

@Lokathor
Copy link
Contributor

Lokathor commented Aug 8, 2019

That is absolutely what will happen, there is no doubt.

@Centril
Copy link
Contributor Author

Centril commented Aug 8, 2019

@gnzlbg Maybe; I don't find "AM-valid" too burdensome. I suspect people will abbreviate to "valid" after having introduced "abstract-machine-valid" previously. E.g.:

Abstract-machine-valid (or "valid" for short) refers to...

...bla bla bla is valid...

which is fine IMO.

@RalfJung
Copy link
Member

RalfJung commented Aug 8, 2019

There was the "representation invariant" which makes sense if the invariant fits the wording.

A good point. This is a great name IMO if that's really how we end up justifying the entire validity thing. Which I think we should.

@RalfJung RalfJung added A-validity Topic: Related to validity invariants C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary labels Aug 14, 2019
@RalfJung
Copy link
Member

Currently, we de-facto use initialization invariant to refer to "validity invariant", but that is very confusing as some types do not required initialized memory to be "initialized"... so I think we can scratch that idea (still mentioned in the OP here).

Here's a naming scheme that occurred to me yesterday: language invariant and library invariant. After sleeping over it I still like it so I figured I would share it here.

@nikomatsakis
Copy link
Contributor

@RalfJung I do like that.

@Lokathor
Copy link
Contributor

I'm a fan of staying with "safety invariant" because it relates to the idea, "this is what you need to uphold for safe code to interact with your data"

@RalfJung
Copy link
Member

@Lokathor that's not entirely accurate though -- it's more, "this is what you need to uphold across a safe abstraction boundary". Safe code that you control may well work with data that violates the safety invariant.

@Lokathor
Copy link
Contributor

Well, code you control can always potentially go above and beyond the norm. I suppose to be more precise you'd say "this is what you need to uphold for unknown safe code to interact with your data".

I suppose I just like it to more clearly line up with the safe / unsafe terms rust already has.

  • "What do you call the invariants that all safe code upholds about some type of data?"
    • The Safety Invariant, of course.

@RalfJung
Copy link
Member

"this is what you need to uphold for unknown safe code to interact with your data"

Yes, the "unknown" part is crucial.

@JakobDegen
Copy link
Contributor

Closing for backlog bonanza. We probably can't change this and it doesn't seem useful to track in an issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-validity Topic: Related to validity invariants C-terminology Category: Discussing terminology -- which term to use, how to define it, adding it to the glossary
Projects
None yet
Development

No branches or pull requests

8 participants