std::unique_ptr and Rust

If you are a first-time reader of my blog series, a warm welcome to you! 😊 I recommend that you spend 30 seconds reading the Introduction, which explains how the articles are created.
Takeaways
Today I would like to discuss the replacement for std::unique_ptr in Rust. We will focus on similarities and differences, along with some extensions that we find in Rust. After reading, you will:
have a clear path for the transition from C++ to Rust when writing or rewriting code and looking for similar constructs,
know the similarities between C++ and Rust when it comes to unique ownership of dynamic allocations,
have a seed to continue exploring Rust around this topic.
Intro
Back in the old days, we all used plain old pointers in C++. Many of us knew that this leads to many issues and came up with different abstractions to solve them, until the C++ Standards Committee introduced std::unique_ptr in C++11 (with extensions in C++14). From the moment it landed in major compilers (e.g., GCC 4.5, who still remembers it?), it became the main way to manage dynamically allocated memory with ownership being taken care of.
By contrast, in Rust, pointers were never a real problem because, from the beginning and by design, raw pointers cannot be dereferenced in safe code [1]. Nevertheless, dynamic allocation and ownership of the underlying objects were needed as a fundamental construct that is used in almost all codebases. That's why the Box<T> type was provided.
Comparison
As you already know, Box<T> is a direct replacement for std::unique_ptr<T>. Let's start with a comparison:
| Feature | Box<T> | std::unique_ptr<T> |
| --- | --- | --- |
| Movable | Yes | Yes |
| Copyable | No | No |
| Automatic memory deallocation | Yes | Yes |
| Constructable from previously allocated memory | No [1] | Yes |
| Can leak resources | Only explicitly (e.g. Box::leak) | Yes |
| Allow in-place construction | No/Yes | Yes |
| Can be empty (aka nullptr) | No | Yes |
Let's take a detailed look at the above points.
Movability and Copyability
In both languages, the compiler makes sure that the allocated object has exactly one owner at a time. This is achieved by different means: in C++ by deleting the copy constructor and the copy assignment operator, in Rust by move semantics (the ownership rules) and by Box not implementing the Copy trait.
#include <memory>
#include <utility>

struct Data {};

int main() {
    auto dynamicData = std::make_unique<Data>();
    // auto newObject = dynamicData; // ❌ Will not compile, the copy constructor is deleted
    auto movedObject = std::move(dynamicData); // ✅ All fine, the owned memory was moved into this instance
    return 0;
}
And the equivalent in Rust:
struct Data {
    some_field: i32,
}

fn main() {
    let dynamic_data: Box<Data> = Box::new(Data { some_field: 0 });
    let new_object: Box<Data> = dynamic_data; // ✅ This is valid because ownership of the Box is moved to new_object
    // Copy cannot be made because Box does not implement the Copy trait
    // println!("{}", dynamic_data.some_field); // ❌ This would cause a compile-time error because dynamic_data has been moved to new_object
}
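Box not being Copy does not rule out duplication altogether. As a minimal sketch, assuming Data derives Clone (the example above does not) and using a made-up helper named clone_and_modify, an explicit .clone() creates a second, independent heap object, so the copy is always visible in the code:

```rust
#[derive(Clone, Debug, PartialEq)]
struct Data {
    some_field: i32,
}

/// Hypothetical helper: deep-copies the boxed value and changes only the copy.
fn clone_and_modify(original: &Box<Data>) -> Box<Data> {
    let mut copy = original.clone(); // New heap allocation, requested explicitly
    copy.some_field += 1;
    copy
}

fn main() {
    let original = Box::new(Data { some_field: 1 });
    let copy = clone_and_modify(&original);
    // Both boxes are alive and own separate allocations
    println!("{:?} {:?}", original, copy);
}
```

The original box stays usable after the call because it was only borrowed, never moved.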
Automatic memory deallocation
As before, the implementation, language, and compiler ensure that once the owner's lifetime ends, the destructor (or Drop in Rust) is called, which in turn calls the destructor (or Drop) of the held T and then frees the memory obtained from the allocator.
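To make this visible, here is a small sketch. The Noisy type, its log field, and the drop_order function are made up for this illustration; the only thing taken from the text is that dropping a Box runs Drop on the held value when the owner goes out of scope:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical type that records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

/// Creates two boxed values in nested scopes and returns the order of drops.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Box::new(Noisy { name: "outer", log: Rc::clone(&log) });
        {
            let _inner = Box::new(Noisy { name: "inner", log: Rc::clone(&log) });
        } // `_inner` leaves scope: Drop runs on Noisy, then the heap memory is freed
    } // the same happens for `_outer`
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order()); // prints ["inner", "outer"]
}
```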
The only subtle difference here is that std::unique_ptr supports custom deleters, which can be part of both the object instance and the type. This is currently not possible in stable Rust when using the plain Box type.
The closest counterpart, Box with a custom allocator (Box::new_in), is already available in nightly, so it may land in a stable release at some point.
Constructable from previously allocated memory
In C++, std::unique_ptr lets us construct its instance from a previously created T*. Moreover, you can change the managed memory during its lifetime via the .reset(T*) method call. This opens up a whole bunch of checks that developers have to do to ensure that, after the creation of std::unique_ptr, the underlying memory is correct:
nullptr check on T* upon creation
ensure via external tooling or extensive reviews that reset is not called with nullptr, or each subsequent dereference has a nullptr check beforehand
In the end, construction from a raw pointer is still dangerous because you also need to ensure that no one else has this pointer and will not attempt to release it. We can clearly see that even though C++ provided constructs that solve a lot of pointer issues for us, it's still error-prone, easy to misuse, and becomes tricky to catch in review or during the maintenance period as bugs emerge.
In Rust, a Box cannot be created from previously allocated memory [1]. That's it.
So simple, so safe. Box will take care of allocation, take over the pointer, and hide it inside so you cannot misuse it. No nullptr checks, no hard-to-spot replacement of managed objects. You are fully covered.
Let’s head into an example:
#include <cstdint>
#include <iostream>
#include <memory>

struct Data {
    int32_t someData;
};

int main() {
    auto* instance = new Data();
    std::unique_ptr<Data> ptr{instance};
    std::cout << ptr->someData << std::endl; // ✅ All good for now

    // Some time later in the code base
    ptr.reset(); // ❌ `ptr` now holds nullptr, so the next dereference is undefined behavior
    std::cout << ptr->someData << std::endl; // ❌ As above, will lead to a crash or not, depending on the compiler (try it on cpp.sh and godbolt)

    std::unique_ptr<Data> ptrOther{nullptr}; // ❌ Same issue as after reset, `ptrOther` contains nullptr so dereferencing is undefined behavior
    std::cout << ptrOther->someData << std::endl;
}
And the equivalent in Rust:
// None of these errors can happen in safe Rust 🎊
Can leak resources
Using std::unique_ptr, one can leak resources by simply calling release. This hands the raw pointer back to the caller (so now let's hope we remember to delete it) and leaves the instance holding nullptr, which brings us to the "Can be empty" problem.
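Box does not let you leak the resource by accident. In everyday code you only obtain references to the boxed value, so you will not end up with dangling pointers or forgotten deallocations. Giving up ownership is still possible, but only through explicitly named functions such as Box::leak or Box::into_raw. These are safe to call, while taking the raw pointer back into a Box (Box::from_raw) is unsafe [1].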
Can be empty (aka nullptr)
As in the previous points, std::unique_ptr can be constructed from any pointer, so it can hold nullptr! As mentioned, it is cumbersome to prove that every dereference accesses a valid memory location.
In Rust, you are covered. There is no way to have nullptr or a wrong memory location in a Box [1].
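When a pointer that may be empty is really needed, the emptiness goes into the type. A minimal sketch, assuming a hypothetical read_field helper: Option<Box<Data>> plays the role of a nullable std::unique_ptr<Data>, and take() roughly corresponds to moving the pointer out and leaving nullptr behind. The compiler forces both cases to be handled before the data can be touched:

```rust
struct Data {
    some_field: i32,
}

/// Hypothetical helper: reads the field only if the slot holds a value.
fn read_field(slot: &Option<Box<Data>>) -> Option<i32> {
    slot.as_ref().map(|data| data.some_field)
}

fn main() {
    let mut slot: Option<Box<Data>> = Some(Box::new(Data { some_field: 7 }));
    println!("{:?}", read_field(&slot)); // prints Some(7)

    // `take()` moves the Box out and leaves `None` behind
    let taken: Option<Box<Data>> = slot.take();
    println!("{:?}", read_field(&slot)); // prints None
    println!("{:?}", read_field(&taken)); // prints Some(7)
}
```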
Unsafe Rust
Everything that was written above for Box is true as long as you are using safe Rust. However, in practice, some of the mentioned operations, like taking over a raw pointer that was released earlier, are needed in complex implementations. That's why Box also provides raw-pointer APIs similar to those of std::unique_ptr: Box::into_raw hands the pointer out, and the unsafe Box::from_raw takes it back. The core difference is that the dangerous part has to be clearly annotated in code via unsafe {} blocks, which draws attention during reviews. Additionally, in practice, this is needed only by some libraries that are later used widely, so the majority of us will simply never need to use it, staying SAFE ☺️.
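As a rough sketch of what such code looks like (round_trip is a made-up function, used here only for illustration): Box::into_raw is the closest counterpart of release, and Box::from_raw is the counterpart of constructing std::unique_ptr from a T*. Only the second step requires unsafe, because only there the compiler has to trust the programmer about where the pointer came from:

```rust
struct Data {
    some_field: i32,
}

/// Hypothetical helper: releases a Box into a raw pointer and takes it back.
fn round_trip(value: i32) -> i32 {
    let boxed = Box::new(Data { some_field: value });

    // Safe: gives up ownership, nobody frees the memory automatically anymore
    let raw: *mut Data = Box::into_raw(boxed);

    // SAFETY: `raw` was produced by `Box::into_raw` just above and is not
    // used or freed anywhere else, so taking ownership back is sound.
    let restored: Box<Data> = unsafe { Box::from_raw(raw) };

    restored.some_field
} // `restored` is dropped here, so the allocation is freed exactly once

fn main() {
    println!("{}", round_trip(5)); // prints 5
}
```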
Rust extensions
Box is a fundamental building block in Rust. It is known to the compiler and comes with additional guarantees that std::unique_ptr does not have. For example, Box<T> is guaranteed to have the same memory footprint as T* with no overhead.
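The footprint claim can be checked directly. The sketch below uses u64 as an arbitrary sized payload and a made-up function name. It also checks Option<Box<u64>>, which is documented to be pointer-sized as well, because None is represented by the null pointer:

```rust
use std::mem::size_of;

/// Returns true if `Box<u64>` and `Option<Box<u64>>` are both pointer-sized.
fn box_is_pointer_sized() -> bool {
    size_of::<Box<u64>>() == size_of::<*const u64>()
        && size_of::<Option<Box<u64>>>() == size_of::<*const u64>()
}

fn main() {
    println!("{}", box_is_pointer_sized()); // prints true
}
```

Note that this concerns sized payloads. Boxes of slices or trait objects carry extra metadata (a length or a vtable pointer), just like the corresponding raw pointers do.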
Additionally, Box provides a very powerful and robust API, well-integrated with custom and base types, allowing conversions, extraction, and other manipulations. One notable feature is the ability to define methods whose self receiver is a Box. This enables developers to model APIs that are only available for a given type once it's wrapped in Box:
struct Data {
    some_field: i32,
}

impl Data {
    /// This method is only available on `Box<Data>`
    fn available_on_boxed(self: &Box<Self>) -> i32 {
        self.some_field * 3
    }

    fn square(&self) -> i32 {
        self.some_field * self.some_field
    }
}

fn main() {
    let dynamic_data: Box<Data> = Box::new(Data { some_field: 4 });
    println!("{}", dynamic_data.available_on_boxed());
    println!("{}", dynamic_data.square());

    let data: Data = Data { some_field: 0 };
    // println!("{}", data.available_on_boxed()); // Will cause "no method named `available_on_boxed` found for struct `Data` in the current scope"

    // Some other integrations
    let vector_data = vec![1, 2, 34];
    let boxed_vector_data: Box<[i32]> = vector_data.into_boxed_slice(); // Vector to dynamically allocated array (aka boxed slice)
    println!("{:?}", dynamic_data.some_field); // Accessing the field directly, without a need to dereference, as Rust does it automatically (Box implements the Deref trait)
}
Why would we need such a feature at all?
Because it opens up new, better possibilities for API design. One example is a recursive pattern where you can exchange the object between steps. Imagine this pseudo code:
trait SomeRecursiveTrait {
    fn next(self: Box<Self>) -> Box<dyn SomeRecursiveTrait>;
}
...
impl SomeRecursiveTrait for NodeA {
    fn next(self: Box<Self>) -> Box<dyn SomeRecursiveTrait> {
        Box::new(NodeA::new(self.field1))
    }
}
impl SomeRecursiveTrait for NodeB {
    fn next(self: Box<Self>) -> Box<dyn SomeRecursiveTrait> {
        Box::new(NodeC::new(self.field2))
    }
}
This allows the implementor to:
express that the object is consumed on each next call,
hide the real type by returning dyn SomeRecursiveTrait, which gives an untyped abstraction with dynamic dispatch (dyn Trait also allows, e.g., storing the objects in containers like Vec),
keep the trait object safe (Box<dyn Trait> has a known size at compile time, so it is Sized).
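The pseudo code above can be turned into a small runnable sketch. The State trait, the Idle, Running, and Done types, and the run function are hypothetical names chosen for this illustration only. They are not part of any library:

```rust
trait State {
    fn name(&self) -> &'static str;
    /// Consumes the current state and returns the next one.
    fn next(self: Box<Self>) -> Box<dyn State>;
}

struct Idle;
struct Running {
    ticks: u32,
}
struct Done;

impl State for Idle {
    fn name(&self) -> &'static str {
        "Idle"
    }
    fn next(self: Box<Self>) -> Box<dyn State> {
        Box::new(Running { ticks: 1 })
    }
}

impl State for Running {
    fn name(&self) -> &'static str {
        "Running"
    }
    fn next(self: Box<Self>) -> Box<dyn State> {
        if self.ticks >= 2 {
            Box::new(Done)
        } else {
            Box::new(Running { ticks: self.ticks + 1 })
        }
    }
}

impl State for Done {
    fn name(&self) -> &'static str {
        "Done"
    }
    fn next(self: Box<Self>) -> Box<dyn State> {
        self // Terminal state: hands the same box back
    }
}

/// Drives the state machine for `steps` transitions and records visited states.
fn run(steps: usize) -> Vec<&'static str> {
    let mut state: Box<dyn State> = Box::new(Idle);
    let mut visited = vec![state.name()];
    for _ in 0..steps {
        state = state.next(); // The old state is consumed and cannot be reused
        visited.push(state.name());
    }
    visited
}

fn main() {
    println!("{:?}", run(4)); // prints ["Idle", "Running", "Running", "Done", "Done"]
}
```

Because next takes self: Box<Self>, the caller cannot keep using a state after asking for its successor. The compiler rejects such code as a use after move.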
Placement allocation
Unlike C++, where placement new (new(ptr) T()) is available, Rust does not have a placement allocation concept (yet). This can cause issues when one wants to create a big object on the heap. Consider this:
#[derive(Clone, Copy)]
struct SomeType {
    value: u32,
}

struct BigData {
    data: [SomeType; 32000],
}

impl BigData {
    fn new() -> Self {
        BigData {
            data: [SomeType { value: 0 }; 32000],
        }
    }
}

fn main() {
    let instance = Box::new(BigData::new());
}
What will happen is that we first create the BigData instance on the stack, and then, once Box has allocated memory, this instance is moved into heap memory. This will probably not cause issues on x86_64 targets with a Unix OS, but once you try to run it in a more constrained environment like QNX on aarch64, it will likely cause a stack overflow. One solution would be to increase the stack size of the particular thread where the instance is created, but this is probably not what the developer wants, as their data should be on the heap! Unfortunately, there is no out-of-the-box solution for this problem. However, proper use of the Box API still makes it possible. Let’s have a look:
use std::mem::MaybeUninit;

#[derive(Clone, Copy)]
struct SomeType {
    value: u32,
}

struct BigData {
    data: [SomeType; 32000],
}

impl BigData {
    fn new() -> Self {
        BigData {
            data: [SomeType { value: 0 }; 32000],
        }
    }

    fn placement_new(mut memory: Box<MaybeUninit<BigData>>) -> Box<Self> {
        unsafe {
            // !!!!!!
            let data_ptr: *mut BigData = memory.as_mut_ptr(); // (3)
            let first_elem: *mut SomeType = &raw mut (*data_ptr).data as *mut SomeType;
            // In this case, we use an equivalent of memset to initialize our type.
            // The element count is written out so that no reference to
            // uninitialized memory is created just to ask for the length.
            std::ptr::write_bytes(first_elem, 0xAB, 32000); // (4)
            memory.assume_init() // (5)
        }
    }
}

fn main() {
    let instance = Box::new(BigData::new()); // Uses stack memory
    let uninit_memory: Box<MaybeUninit<BigData>> = Box::new_uninit(); // (1)
    let instance_no_stack: Box<BigData> = BigData::placement_new(uninit_memory);
    // (2)
}
First, we create a Box that holds uninitialized memory (already allocated by the allocator), ready to accept our type (1). The sharp eye will catch that our type is now not Box<BigData> but Box<MaybeUninit<BigData>>. The explanation of MaybeUninit would require a new post in this series. For now, you only need to know that it is a piece of memory that is not initialized yet, and that developers need to initialize it before claiming it really is. Filling the area with actual data has to be coded by us (2)! Inside placement_new, we obtain a pointer to the region (3), fill it with our desired values (4), and then claim that our data is ready to be used by calling assume_init (5). This way, we never put a byte of BigData onto the stack.
Those manipulations, especially the pointer dereference between points 3 and 4, are inherently unsafe. That’s why they need to be wrapped in an unsafe block. This gives reviewers and readers a clear indication that this piece of code in placement_new needs careful checking of all invariants to keep the rest of the code safe.
In the end, the purpose of this unsafe block is to allow certain manipulations of data, but also to contain and rule out all potential unsafe effects so that the result can be used safely in the rest of the code base. This means that the returned Box<Self> still keeps all the previous promises: it is never null, it is the only owner of the allocated memory, and so on.
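For the narrower case where the large part is a plain array, there is also a route that needs no unsafe code at all. The sketch below is an alternative under stated assumptions, not the approach used above: it builds the elements in a Vec, which lives on the heap from the start, and then converts the boxed slice into a boxed array. LEN and heap_array are names made up for this example, and the result is a Box<[SomeType; LEN]>, not a Box<BigData>, so it does not cover arbitrary structs.

```rust
use std::convert::TryFrom;

#[derive(Clone, Copy)]
struct SomeType {
    value: u32,
}

// Hypothetical constant mirroring the array length used above
const LEN: usize = 32000;

/// Builds the array on the heap using only safe code.
fn heap_array() -> Box<[SomeType; LEN]> {
    // `vec!` allocates on the heap and fills the elements there
    let slice: Box<[SomeType]> = vec![SomeType { value: 0 }; LEN].into_boxed_slice();
    // The conversion succeeds only if the slice length equals LEN
    match <Box<[SomeType; LEN]>>::try_from(slice) {
        Ok(array) => array,
        Err(_) => unreachable!("the slice was created with exactly LEN elements"),
    }
}

fn main() {
    let array = heap_array();
    println!("{} {}", array.len(), array[0].value); // prints 32000 0
}
```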
Summary
We have discussed the fundamental similarities and differences between std::unique_ptr and Box. As we can clearly see, thanks to the power of Rust, we can write the same low-level code as in C++, with the same overhead, while eliminating many fundamental problems during code development. This means that shipped code gains better quality immediately, takes less time to review, and does not introduce bugs that we would need to chase in production 😱.
Let me know what you think and what else you believe I should cover in this writing!
Notes
[1] - true as long as you use only safe Rust