Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Idea: "compact" enums and structs #311

Open
rust-highfive opened this issue Sep 24, 2014 · 4 comments
Open

Idea: "compact" enums and structs #311

rust-highfive opened this issue Sep 24, 2014 · 4 comments
Labels
A-machine Proposals relating to Rust's abstract machine. A-optimization Optimization related proposals & ideas A-repr #[repr(...)] related proposals & ideas A-syntax Syntax related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC.

Comments

@rust-highfive
Copy link

Issue by glaebhoerl
Saturday Mar 01, 2014 at 12:33 GMT

For earlier discussion, see rust-lang/rust#12642

This issue was labelled with: B-RFC in the Rust repository


TL;DR By optionally taking away interior references, enums and structs can be represented much more compactly, potentially taking over most of the current use cases for manual bit twiddling code.


Rust leaves the layout of its enum types undefined for optimization purposes, but in practice most optimization possibilities are ruled out by the fact that interior references need to work. For instance, if you have an enum which contains another enum, you may want to collapse their discriminants together into a single field:

enum X { Foo(Option<bool>), Bar(bool) }

Here we could use a single discrimant field for both X and Option, so that it ranges 0..2, e.g. 0 => Foo(None), 1 => Foo(Some), 2 => Bar. Going further, we could pack in the bools as well, and represent X as a single u8: 0 => Foo(None), 1 => Foo(Some(false)), 2 => Foo(Some(true)), 3 => Bar(false), 4 => Bar(true). But in both cases, this makes interior references, e.g. match some_x { Foo(ref opt_bool) => ..., _ => ... } impossible, so we must refrain.

The idea would be to allow annotating an enum as "compact", in which case interior references are made illegal (but see below), and these optimization possibilities are opened back up: the compiler would be free to use any magic it wants to make the representation of the enum as compact as it is able, combining discriminants as above, taking advantage of tag bits in pointers, NaN-boxing, and so on. When generating code, the compiler would insert the necessary bit twiddling operations to pack the original values into this representation, and to take them back out again.

The same thing is possible for structs. For instance, a struct of bools could be represented as a bunch of bits:

compact struct Flags { 
    is_big: bool,
    is_red: bool,
    is_quick: bool,
    is_hungry: bool,
    ...
}

Now size_of::<Flags>() == size_of::<u8>()! If we let our imagination run further ahead, given types like i31, i30 and so on, we could also take advantage of those extra bits.

(It's important to note that this "compactifying" could always be deep, because the restriction against interior references would also be deep. If we have a non-compact struct type and store it in a compact one, references to the interior of the inner struct would likewise be forbidden, so it could be compacted as well. The example from above of representing enum X as a single u8 also relies on this property.)

I think this feature could be a much nicer solution for most of the use cases where manual bit twiddling code is currently necessary (and also most of the use cases for bitfields in C).


Up until now, for simplicity's sake I've been assuming that interior references would be forbidden. But it may be possible to allow them. The idea is that when you take an interior reference:

let hungry: &bool = &my_flags.is_hungry;

the value would first be copied out onto the stack (only physically, not semantically!), and then the reference would point to the stack value. This is not dissimilar to what happens if you write let nine = &9;. Because & is an immutable reference, the difference would not be observable.

Going deeper into the woods, assuming #12624, &mut borrows could also be represented by copying out the value at the beginning of the borrow, and then copying it back at the end. Again, because &mut would have exclusive access to the given object, the difference should not be observable.

I'm not sure if this would definitely work, but it seems like it might. Whether this is feasible has repercussions for syntax: if compact enums/structs are semantically identical to normal ones, and differ only in representation and performance, #[compact] could just be an attribute. If, however, the semantics differ, it would be more appropriate to use actual syntax to mark the difference.

@comex
Copy link

comex commented Dec 29, 2016

Bump. I just wrote some code that uses a manual bool flag to indicate the validity of another field, instead of Option, in order to make a struct more compact. Struct layouts where this could reduce size substantially are not uncommon; for example,

struct S {
    a: Option<u64>,
    b: Option<u64>,
    c: Option<u64>,
}

has a size/stride of 48 bytes (16 bytes for each Option<u64>) when it could be 32 (3 * 8 bytes for each u64, plus 3 bytes to indicate each enum variant, rounded up to a multiple of 8).

@glaebhoerl was incorrect about it being possible to fully preserve semantics, though. Local borrows can be simulated as they indicated, but with normal layout you can write fn foo(&S) -> &Option<u64>, which doesn't work if the layout is altered.

@creationix
Copy link

I love this idea. I would like to start writing interpreters for microcontrollers in rust, but manual bit twiddling is a pain and I stick with C bitfields or macros. (still painful).

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Feb 23, 2018
@Centril Centril added A-syntax Syntax related proposals & ideas A-repr #[repr(...)] related proposals & ideas A-optimization Optimization related proposals & ideas A-machine Proposals relating to Rust's abstract machine. labels Nov 26, 2018
@moonheart08
Copy link

I love this idea as well. It would be Very helpful in cases where you are constrained by memory. In fact, making data smaller has TWO advantages, and both can be used to great effect on modern systems:

  • Time needed to read the data from memory that's not in cache is lower, as the CPU has to make fewer requests to the vastly slower DRAM in order to load it in.
  • Memory locality can be improved (With the potential to be a VERY big improvement, on that note). Want to check if all fields are filled (Bunch of Options?)? Read 1-2 bytes in the same spot instead of a large number of sparsely packed bytes.

The time it takes to perform a single bitfield operation is flat out negated if the data isn't in L1 cache, and multiple bitfield operations can occur in the time it takes to read from L3 cache. The ability to improve memory locality using this is a very lucrative possibility for high performance code.

wycats pushed a commit to wycats/rust-rfcs that referenced this issue Mar 5, 2019
Introduce `<AngleBracketInvocationSyntax />`
@eonil
Copy link

eonil commented Oct 4, 2019

I like the idea of having arbitrary bit integers. Things like u31 gonna make memory optimization sweet. It's frequent to see situations needing 20-30bit resolution integer with optionality. Option<u16> is not enough and Option<u32> takes extra 32bit for no reason. This is something what we are paying for we don't use. This kind of stuffs sacrifices precious cache space and bus bandwidth. Bandwidth bottleneck often becomes an issue on graphics, therefore GPU likes bit-level packing. Actually Rust already have this one for optional pointers. Generalizing it gonna benefit many things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-machine Proposals relating to Rust's abstract machine. A-optimization Optimization related proposals & ideas A-repr #[repr(...)] related proposals & ideas A-syntax Syntax related proposals & ideas T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

6 participants