Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict constants in patterns #1445

Merged

Conversation

nikomatsakis
Copy link
Contributor

@nikomatsakis nikomatsakis commented Jan 6, 2016

Feature-gate the use of constants in patterns unless those constants have simple types, like integers, booleans, and characters. The semantics of constants in general were never widely discussed and the compiler's current implementation is not broadly agreed upon (though it has many proponents). The intention of adding a feature-gate is to give us time to discuss and settle on the desired semantics in an "affirmative" way.

Because the compiler currently accepts a larger set of constants, this is a backwards incompatible change. This is justified as part of the "underspecified language semantics" clause of RFC 1122. A crater run found 14 regressions on crates.io, which suggests that the impact of this change on real code would be minimal.

Note: this was also discussed on an internals thread. Major points from that thread are summarized either inline or in alternatives.

Rendered view.
tracking issue: rust-lang/rust#31434

@nikomatsakis nikomatsakis added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jan 6, 2016
| u8 | u16 | u32 | u64 | usize // unsigned integers
| char // characters
| bool // booleans
| (B, ..., B) // tuples of builtin types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to consider including &str and arrays as well, since they have a well defined “primitive” memory-representation equality.

@nagisa
Copy link
Member

nagisa commented Jan 6, 2016

Pretty strongly in favour.

@retep998
Copy link
Member

retep998 commented Jan 6, 2016

This RFC is a good idea. Since it only affects matching on constants of non-primitive types which is quite the sticky situation, I am in favor of this RFC.

-- the same issues arise if you consider exhaustiveness checking.

On the other hand, it feels very silly for the compiler not to
understand that `match some_bool { true => ..., false => ... }` is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say true and false feel more like unit variants of "enum bool { false, true }" than opaque constants.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

This breaks lots of code of this kind:

pub struct Errno(pub c_int);

pub const Interrupted: Errno = Errno(EINTR);

My code contains at least 26 instances of this pattern (excluding flags) and an uncountable number of variants:

ack "pub struct .*\(pub .*\)" | grep -iv flags | wc -l

For instance, the Errno struct comes with 132 variants. How am I supposed to repair this?

@petrochenkov
Copy link
Contributor

pub struct Errno(pub c_int);

Newtypes like this are exactly the pattern which is responsible for the most of breakage on crates.io and in rustc itself.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

I've also no idea what this has to do with constants.

match x {
    Errno(1) => 1,
    _ => 0,
}

if x == Errno(1) { 1 } else { 0 }

are not guaranteed to produce the same result either. Whether Errno(1) is behind a constant or not is irrelevant here.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

Of course the same also applies to enums. E.g.

enum A {
    X(u8),
    Y,
}

impl PartialEq for A {
    fn partial_cmp(&self, other: &A) -> bool { false }
}

Matching will return results that are incompatible with the PartialEq results.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

Given that the behavior for enums is already fixed (at least the RFC doesn't suggest otherwise), and given that newtypes with all fields public are very similar to enums, it seems that the current behavior (which agrees with the enum behavior) is already the expected behavior.

@nagisa
Copy link
Member

nagisa commented Jan 6, 2016

@mahkoh the problem here is using enums has a well specified structural equivalency, but using a constant doesn’t make the equivalency used obvious. There’s a bunch of additional reasons outlined by the RFC why structural equivalency with constants is not satisfactory.

@nikomatsakis
Copy link
Contributor Author

@mahkoh

I've also no idea what this has to do with constants.

I see a difference between matching on a constant C (which is defined as Errno(1)) and matching on Errno(1) directly in the pattern. Basically, to me, the pattern translates into a set of "test-and-extract" operations that will be performed. So, the pattern Errno(pattern>) compiles to "test that the variant is Errno and extract the value". We then recursively apply <pattern> to that value. Sometimes there are no values to extract: if you match a nullary enum variant like None, it just means "test that the variant is None". Along these lines, matching a constant pattern like C (resp. 1) corresponds to "test that the value is equal (for some definition of equal) to C (resp. 1) and extract no values".

Looking at it like this, there is no reason to expect that

match foo { Errno(1) => ... }

and

const C: Errno = Errno(1);
match foo { C => ... }

would do the same thing, just as you would not necessarily expect that:

if errno.0 == 1 { ... }

and

if errno == C { ... }

would do the same thing.

PS, I am summarizing something I wrote on the internals thread, just for reference.

@nikomatsakis
Copy link
Contributor Author

@mahkoh

This breaks lots of code of this kind

Yes, the proposal is backwards incompatible, and this is exactly the kind of cases that would no longer work. You would have to either use the feature-gate or translate such code from match foo { BAR => ... } to match foo { c if c == BAR => ... }.

UPDATE: To be clear, I don't want to break code either, and I also really want constants of user-defined types to work in patterns! I don't mean to sound glib. But, based on the crater results, it does seem that we have room here to rollback to semantics everyone can agree on. This would allow us to have a productive discussion later on focusing on what the best overall semantics ought to be for constants in a pattern.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis

I'm fine with a feature gate if it can be enabled at the point where the type is defined. E.g.

#[structural_match]
pub struct Errno(pub c_int);

@nikomatsakis
Copy link
Contributor Author

@mahkoh

I'm fine with a feature gate if it can be enabled at the point where the type is defined.

By this, do you mean that the clients who are matching on Errno values would not need to use the feature-gate?

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis Exactly.

@nikomatsakis
Copy link
Contributor Author

@mahkoh Hmm. It is normally something we would not allow, since if we decided to use semantic equality (for example), then there might not be an equivalent behavior to #[structural_match] in the future (and I wouldn't want to preserve the attribute). OTOH, since I suspect you do define Eq for errno to act exactly as structural match acts, that might be ok (modulo exhaustiveness). I have to think about it.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis As long as the attribute is behind a feature gate, breaking it again doesn't seem to be a problem.

@seanmonstar
Copy link
Contributor

This sort of breaking change seems like it should be a semver major bump. It's not a security, bug, or soundness fix. It's instead trying to improve reasoning about code. Not that the goal is bad, but that breakage for that goal seems unacceptable for the 1.x versions.

@mahkoh
Copy link
Contributor

mahkoh commented Jan 6, 2016

@nikomatsakis Whether the behavior is the current one or one that uses PartialEq, the code I'm worried about behaves the same. This attribute is simply supposed to bridge the gap. Maybe call it #[deprecated_const_match] if that makes it more obvious.

@nikomatsakis
Copy link
Contributor Author

@seanmonstar

This sort of breaking change seems like it should be a semver major bump. It's not a security, bug, or soundness fix. It's instead trying to improve reasoning about code. Not that the goal is bad, but that breakage for that goal seems unacceptable for the 1.x versions.

Personally, I consider this a bug fix. That is, I did not expect that constants of arbitrary types should be matchable. In fact, I opened a bug about it before 1.0, but that bug was accidentally closed when I wrote a comment like "does not try to fix #20489", and hence it dropped off of my radar when triaging for "things that ought to be feature-gated before 1.0". However, clearly there is room for disagreement here.

@petrochenkov
Copy link
Contributor

Curiously, #[structural_match] already exists and is called #[derive(PartialEq)].
#[derive(PartialEq)] is essentially an assertion that structural and semantic matching do the same thing, so there's no choice left what to do when performing a match and we can unambiguously generate the same code as today.
Maybe #[derive(PartialEq)] should not only generate some code during expansion, but also emit a flag this_adt_is_usable_in_const_pattern_matching,which is considered in later stages of compilation.
this_adt_is_usable_in_const_pattern_matching can become a separate attribute eventually if necessary, but generating it as a part of #[derive(PartialEq)] allows to avoid breakage and generally looks like a reasonable way forward, covering most of practical cases.

Exhaustiveness checking is a separate orthogonal concern, it still may be better to turn it off for constants regardless of this_adt_is_usable_in_const_pattern_matching

@nikomatsakis
Copy link
Contributor Author

@petrochenkov

Curiously, #[structural_match] already exists and is called #[derive(PartialEq)].

That is mostly true, but only if all types embedded within the struct also #[derive(PartialEq)]. (Still, that's a very interesting thought.)

@petrochenkov
Copy link
Contributor

@nikomatsakis
#[structural_match] should not affect nested things, otherwise you could make an "opaque" structure transparent/matchable by using a newtype:

`#[structural_match]`
struct Transparent(Opaque);

So, yes, it was implied, that the both #[structural_match] or #[derive(PartialEq)] have to be checked recursively.
In theory marking a structure containing non-#[structural_match] fields with #[structural_match] can be made an error, but derive is not smart enough to emit #[structural_match] conditionally to avoid this error, so I don't consider this variant.

@nikomatsakis
Copy link
Contributor Author

@petrochenkov @mahkoh

So, on reflection, I quite like this idea. My assumption is that this would be something like #[fundamental] -- that is, an attribute that we intend to never stabilize but which lets us adopt the subset of semantics we know we want while we dicker about the remainder. In this case, using the semantics you proposed, we could ultimately resolve this in at least two ways:

  1. Removing the attribute and adopt the current "structural match" implementation.
  2. Removing the attribute and adopt the "semantic equality" interpretation. (*)

Shall I adjust the RFC? I think so.

(*) Actually, the attribute could even stay as a kind of performance optimization. That is, @pnkfelix and I have talked about the compiler recognizing when the PartialEq impl resulted from derive and generating better match code in that case, since its semantics are well understood. One could imagine adopting an unsafe attribute or something along those lines, which PartialEq implicitly provides.

@petrochenkov
Copy link
Contributor

#[structural_match] can probably be a (possibly unsafe) structural trait (OIBIT) and not an attribute.
Edit: It can't be an OIBIT, but it can be a normal trait, maybe even a sub-trait for PartialEq, kind of like Copy for Clone.

trait TrivialPartialEq: PartialEq {}

@petrochenkov
Copy link
Contributor

On interaction of #[structural_match] with floats.
(I assume that match can be used only with #[structural_match] types)
Edit: by "permitted in match" below I mean permitted in constants in match.

Variant 1.
The attribute gives hard guarantees about PartialEq properties.
In this case it can be used for optimizations, for example operators == for #[structural_match] types without padding can be translated into memcmps.
In this case floats simply can't be used in match, their equality comparison is not trivial.

Vartiant 2.
The attribute is a rubber stamp, it says "yeah, this can be used in match" and that's all. match in its turn performs a structural comparison OR runs partial_eq even if they do different things. In this case floats are allowed in match and compared structurally OR semantically, respectively.

Variant 3.
Floats are an exception, they are not #[structural_match], but permitted in match and compared structurally or semantically. Structures containing floats are not permitted in match,


Some bikeshedding, names for structural_match:
trivially_comparable, default_comparable - "comparable" can be confused with PartialOrd
trivial_equality
trivial_partial_eq (or TrivialPartialEq for a trait) - says exactly what it does, seems like a good if it provides hard guarantees.

@mrmonday
Copy link

Is there a nice way to deal with this, that doesn't require a loss of type safety, or far more verbose code? libpnet has a lot of code like:

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EtherType(pub u16);

pub const Ipv4: EtherType = EtherType(0x0800);
// ... list of constants ...
match some_ethertype {
    Ipv4 => {},
    Other => {}
}

The easy way to solve it is to remove the type safety, which is what @wictory's pull request did initially - these constants have been specifically designed like this though to minimise errors (there are lots of magical numbers for lots of different things), whilst still allowing the ability to use random values if you explicitly qualify it.

@eddyb
Copy link
Member

eddyb commented Mar 31, 2016

@mrmonday But that example compiles just fine (I added more variants to make sure there wasn't an exhaustiveness-related special casing).
@bluss pointed out on IRC that if you implement Eq manually you do get the errors, maybe that's what pnet is doing?

@eddyb
Copy link
Member

eddyb commented Mar 31, 2016

Turns out that syntex partially expands #[derive] so the code in libsyntax adding #[structural_match] on #[derive(..., PartialEq, ..., Eq, ...)] doesn't trigger. I'll try to fix this in the compiler.

EDIT: hang on, those attributes aren't stable, how was any of that compiling at all?!

@nikomatsakis
Copy link
Contributor Author

@eddyb sigh. I can readily fix the code to have two distinct attributes, I was just being lazy.

@nikomatsakis
Copy link
Contributor Author

Although, triggering on #[derive_Eq] is a bit harder. Hopefully not that hard. I'm trying to remember why I chose not to do so -- I feel like there was a reason for that.

lifthrasiir added a commit to chronotope/chrono that referenced this pull request Jul 25, 2016
@Centril Centril added A-patterns Pattern matching related proposals & ideas A-const Proposals relating to const items A-const-eval Proposals relating to compile time evaluation (CTFE). labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-const Proposals relating to const items A-const-eval Proposals relating to compile time evaluation (CTFE). A-patterns Pattern matching related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.