Skip to content

Rust for CXX programmers

Jeff Donner edited this page Mar 7, 2015 · 119 revisions

Rust for C++ programmers

Pointers and references

Pointers in C++ are vulnerable to bugs involving dangling or null pointers. In Rust, pointers are non-nullable and the compiler enforces the invariant that pointers are never dangling.

C++ Rust
T& foo foo: &mut T (also non aliased)
const T& foo foo: &T (immutable - not just read-only through this reference)
std::unique_ptr<T> owned pointer: Box<T>
auto p = make_unique(...) let p = box ...;
std::shared_ptr<T>
  • garbage collection: Gc<T>, (cycles allowed/collected)
  • reference counting: std::rc::{Rc, RcMut} (cycles must manually be broken or leak)
T *foo; unsafe{ ... foo:*mut T ... }
potentially null pointer/ref Option<Box<T>>, Option<&T>

Borrowed pointers (&) have the same representation as C pointers or C++ references at runtime, but to enforce safety at compile-time they are more restricted and have a concept of lifetimes. See the borrowed pointer tutorial for details. Note that unlike C++ references but like raw pointers, they can be treated as values and stored in containers.

For example, a container can return a reference to an object inside it, and the lifetime will ensure that the container outlives the reference and is not modified in a way that would make it invalid. This concept also extends to any iterator that wraps borrowed pointers, and makes them impossible to invalidate.

Owned pointers are almost identical to std::unique_ptr in C++, and point to memory on a shared heap so they can be moved between tasks.

Rc<T> pointers are reference counted , similar to std::shared_ptr<T>. The reference count is embedded in a single allocation with the underlying type.

If managed Gc<T> pointers are avoided, the garbage collection runtime is not used (the compiler can enforce this with the managed-heap-memory lint check). Garbage collection will be avoided in the standard library except for concepts like persistent (copy-on-write, with shared substructures) containers that require them.

Use Option<Box<T>> or Option<&T> to represent a potentially null pointer or reference accessible safely through match or its helper methods.

The use of Raw pointers is controlled through unsafe blocks.

Strings

C++ Rust
const char * &str (closest approx, similar use)
"foo" (const char *) "foo" (&'static str)
std::string String
std::string("foo") String::from_str("foo")

A String is a general purpose growable string; A string slice &str is a slice of a string, should be used as an immutable input where possible; you can obtain a &str from a String with .as_slice() Rust strings are sequences of variable length unicode chars, so are not cheaply randomly accessible. use .chars() to iterate the characters sequentially.

Rust strings are not null-terminated like C strings - length information is held in the collection or reference (e.g. a slice is a pointer + size)

Vectors & Arrays

The &[] and &mut [] types represent a fixed-size slice (immutable and mutable respectively) of any kind of vector.

Enums, sum types and option types

Rust's enum is a sum type like boost::variant (tagged union). The match expression is used to pattern match out of it, replacing the usage of a switch with a C++ enum, or a visitor with boost::variant. A match expression is required to cover all possible cases, although a default fallback can be used.

C++ Rust
enum T {Foo,Bar,Baz} enum T {}, C like enums
union {} enum T {}- tagged-union, variants have fields
boost::variant enum
boost::optional Option (an enum)
switch foo.kind { ` case BAR: ...break;` match foo { ` Bar => ...,`

Option and Either are very common patterns, so they're predefined along with utility methods. Option<Box<T>> and Option<&T> are optimised by the compiler to be stored as a single pointer, with a null value representing 'None'.

Nominal types

A Rust struct is similar to struct/class in C++, and uses a memory layout compatible with C structs.

Rust has no conversion operators - it is necessary to call explicit methods such as .to_str() to perform conversions. The philosophy is to make the cost (no hidden allocations) & code-path clearer.

RAII

Rust has no special syntax or semantics for constructors, and just uses static methods that return the type. Struct initializer syntax SomeStruct{field:value..} avoids the boilerplate of manually creating simple constructors that just copy values into fields.

Custom destructors for non-memory resources can be provided by implementing the Drop trait.

rust collections and smart pointers manage memory similarly to C++ RAII techniques.

Visibility

Struct fields and functions are private by default, but can be marked with pub. Privacy works at the module level: private fields are still visible to functions and methods throughout a module, and any sub-modules. This eliminates the need for C++ 'friend' or 'protected'; C++ can in some situations use a class as a module-like entity. See Modules. Where a C++ programmer might use a class with friends, a rust programmer places the group of classes in a module.

Tuples and destructuring

Tuples are built into the language syntax and along with destructuring assignment (available in arguments, match, let) they provide a convenient way of grouping multiple values. There are also 'tuple structs', where the type is named but the fields are not.

Copying and move semantics

Rust has no concept of a copy constructor and only shallow types are implicitly copyable. Assignment or passing by-value will only do an implicit copy or a move. Other types can implement the Clone trait, which provides a clone method.

Rust will implicitly provide the ability to move any type, along with a swap implementation using moves. The compiler enforces the invariant that no value can be read after it was moved from.

Concurrency

Rust does not allow sharing mutable data between threads, so it isn't vulnerable to races for in-memory data (data races). Instead of sharing data, message passing is used - either by copying data, or by moving owned pointers between tasks.

Exceptions

Rust does not include C++-style exceptions that can be caught, only uncatchable unwinding (panic) that can be dealt with at task boundaries. The lack of catch means that exception safety is much less of an issue, since calling a method that panics will prevent that object from ever being used again.

Generally, errors are handled by using a enum of the result type and the error type, and the result module implements related functionality.

Memory safety

In addition to the prevention of null or dangling pointers and data races, the Rust compiler also enforces that data be initialized before being read. Variables can still be declared and then initialized later, but there can never be a code path where they could be read first. All possible code paths in a function or expression must also return a correctly typed value, or fail.

This also means indexing a string/vector will do bounds checking like vector<T>::at() (though there are unsafe functions in vec::raw that allow bypassing the bounds checks)

unsafe {} blocks are used to enable the use of raw pointers, *T, on which you can do pointer-arithmetic and dereferencing to recover all the power of C and C++. Functions may be marked as unsafe, hence only callable from unsafe blocks.

Unsafe code is used to implement standard library types, providing safe abstractions for most tasks.

by contrast, an entire C++ program is effectively an 'unsafe block' - if a rust application does 'segfault', the fault lies with the design of an unsafe library, rather than any safe code using it.

Pointer Aliasing

Thanks to memory/pointer safety and more rigorous immutability, rust can guarantee when multiple writable pointers don't access the same object. This gives more opportunities for caching values in registers, reducing loads and stores, and low level scheduling optimizations by the compiler. This avoids the need for 'restrict' that is used as an extension in some C/C++ compilers for performance sensitive code.

Methods

C++ Rust
class Foo{ void bar()} struct Foo; impl Foo { fn bar(); }
size_t foo() const fn foo(&self) -> uint
size_t foo() fn foo(&mut self) -> uint
static size_t foo() fn foo() -> uint

Methods in Rust are similar to those in C++, but Rust uses explicit self, (which must be declared in the parameter list) so referring to a member/method from within a method is done with self.member.

Methods for T are added in impl T {} blocks, rather than being declared inside a struct/class. impl <Trait> for <Type> {} is required to declare methods useable in polymorphic code (generics or trait-objects)

It's also possible to take self by-value, which allows for moving out of self since the object will be consumed.

For an example, the PriorityQueue struct in the standard library has a to_sorted_vec method that takes self by-value, making PriorityQueue::from_vec(xs).to_sorted_vec() an in-place heap sort.

Parametric polymorphism

Rust generics are more like the proposed concepts extension for C++, not like the templates that exist today. This allows templates to be type-checked at the time you write them, not when you instantiate them.

In Rust, functions and structs can be given generic type annotations (like templated functions and structs), but to actually call a type's methods and use operators, the type annotations must be explicitly bounded by the traits required. Annotating the type bounds for functions is only necessary at the module-level, and not for nested closures where it is inferred.

Despite the different model in the type system, traits are compiled to code similar to what templates would produce.

Function Overloading

Rust can only select function overloads based on the receiver (and sometimes through return values due to Hindley-Milner type inference).

To 'overload' functions on multiple parameters, one must create an extra layer of traits for something that looks like double-dispatch, with each parameter being moved to the receiver by an intermediate function. Although this is significantly more boilerplate, the benefit is that it's also ready for runtime polymorphism through trait-objects if needed, and traits improve error messages for generics(templates).

Without overloading multiple parameters, the information you pay in writing more specific function calls is sometimes recovered with HM type inference feeding back through its' arguments.

You can also consider Enums for conveniently passing varied options to a function - or use rusts powerful macros for wrapping extra convenience around function calls (eg, it is possible to annotate arguments with keywords & special syntax, handle variadics etc)

Runtime Polymorphism

Instead of adding virtual functions to classes, Rust uses Trait Objects. A trait object is a fat pointer to data and an associated vtable, created by "&<expr> as &<traitname>". Any number of traits can be implemented for an existing type without needing to establish an inheritance heirachy, and there is no 'diamond inheritance problem'.

It is possible to attach a new vtable to an existing struct instance (with a borrowed pointer) - so one library can take data from another library, without the other having to have even known the interfaces required.

Metaprogramming

Rust includes macros (not textual ones) and syntax extensions which can be used for many of the other use cases of templates and constexpr. These macros handle repeated items in patterns, can match expressions,blocks parsed properly, and separate arguments with custom syntax elements. As such they can achieve what the 'x-macros' pattern in C does in a single macro definition/invocation. Syntax extensions can also be used in place of user-defined literals. Being handled within the AST, they do not have many of the hazards of C macros and are considered to be a useful tool, rather than an obsolete anti-feature as in C++.

Module System

Rust's modules don't work like most other languages. Modules are namespaces, not 'imported' into individual sources. mod <module_name>; is used from the crate root or directory 'mod.rs' file to reference other source files by mounting their contents in the module tree. 'mod.rs' files are a special workaround to allow rust's modules tree to mimic a directory structure.

Any public items can be referenced from other modules using the full root-relative path e.g. ::foo::bar, or relative using self:: , super::. "use" directives are a little like 'using namespace' or 'using', and filling some of the role of '#include' - providing shorter aliases for paths, or bringing individual symbols into scope.

A common point of confusion is that what appears in use directives is always crate relative, whilst throughout the rest of the source file, paths are relative to the current module.

'mod <name> { ..definitions..}' statements can create nested namespaces containing items, replacing the use of nested classes in C++.

All Categories:

Clone this wiki locally