Uninit Read/Write
— 2021-12-07

The AsyncRead and AsyncWrite traits are async versions of the Read and Write traits in Rust. They're core to async Rust, providing the interface to read and write bytes from, for example, the filesystem and the network. But both the async and non-async variants have an open issue: how can we use these traits to write data into uninitialized memory?

In this post we look at the problem of reading into uninitialized memory using Read/Write, and at possible solutions we could introduce to solve it.

Showing the problem

Both async and non-async Rust share the same issue for both the Read and Write traits. So let's use the non-async Read trait as our example. It's defined as follows:

pub trait Read {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
}

Calling Read::read will read bytes into a mutable slice of memory, and return an io::Result containing either the number of bytes read, or an error. Usage typically is something like this:

// open a file and init the buffer
let mut f = File::open("foo.txt")?;
let mut buffer = vec![0; 1024];

// read up to 1024 bytes
let read = f.read(&mut buffer)?;
let data = &buffer[..read];

This will read up to 1024 bytes of the file into the buffer. But it comes with a slight inefficiency: we're writing zeroes into the buffer, and then immediately overwriting them with the data we read. This means we're doing a double-write, which is not as efficient as it could be (report). Ideally we'd be able to do the following:

// open a file and init the buffer
let mut f = File::open("foo.txt")?;
let mut buf = Vec::with_capacity(1024); // reserve capacity, but don't initialize

// read up to 1024 bytes
let read = f.read(&mut buf)?;
let data = &buf[..read];

But this doesn't work. Even though the capacity is 1024, the vector's length is still zero, so it dereferences to an empty slice. Nothing gets read, and the following assertion will always hold:

assert_eq!(data.len(), 0);

And this is the problem this post is about: our slices can't have uninitialized memory, and our vectors which can have uninitialized memory are always dereferenced into slices first. In order to support writing into uninitialized memory over Read bounds we need to resolve this.
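To make the failure mode concrete, here is a minimal sketch. The helper name `read_into_reserved` is made up for illustration, and an in-memory byte slice stands in for the file so the example is self-contained.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads from `reader` into a vector that has
/// capacity but no length. Returns the number of bytes read together
/// with the vector's capacity.
fn read_into_reserved(mut reader: impl Read) -> io::Result<(usize, usize)> {
    let mut buf: Vec<u8> = Vec::with_capacity(1024);
    // `&mut buf` derefs to `&mut buf[..buf.len()]`, which is an empty
    // slice here, so the reader has nowhere to put any bytes.
    let read = reader.read(&mut buf)?;
    Ok((read, buf.capacity()))
}
```

Calling `read_into_reserved(&b"hello"[..])` reports zero bytes read even though at least 1024 bytes of capacity were reserved.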

Solution 1: uninitialized slices

The current thinking is that we can solve this issue by introducing "slices of uninitialized memory" and extending the IO traits with methods dedicated to taking these slices. This was proposed and subsequently merged as part of RFC 2930: read_buf (tracking issue).

The RFC is worth a read if you want a closer look at the problem space. But the gist of the solution it proposes is as follows:

Add a new ReadBuf type which wraps a &mut [MaybeUninit<u8>] and tracks how much of it has been filled and initialized.
Add an optional read_buf method which takes &mut ReadBuf<'_> instead of &mut [u8].
Users who want to read into uninitialized memory can implement and use read_buf.
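To get a feel for what such a wrapper has to track, here is a heavily simplified sketch. `MiniReadBuf` and its `append` method are hypothetical names of my own and are not the RFC's API; the real `ReadBuf` also tracks an "initialized" cursor separately from the "filled" one, which this sketch leaves out.

```rust
use std::mem::MaybeUninit;

/// Hypothetical, simplified stand-in for the RFC's `ReadBuf`.
struct MiniReadBuf<'a> {
    buf: &'a mut [MaybeUninit<u8>],
    /// Invariant: `buf[..filled]` has been written and holds data.
    filled: usize,
}

impl<'a> MiniReadBuf<'a> {
    fn uninit(buf: &'a mut [MaybeUninit<u8>]) -> Self {
        Self { buf, filled: 0 }
    }

    /// Copies as much of `data` as fits into the unfilled tail and
    /// returns the number of bytes copied.
    fn append(&mut self, data: &[u8]) -> usize {
        let n = data.len().min(self.buf.len() - self.filled);
        let tail = &mut self.buf[self.filled..self.filled + n];
        for (slot, byte) in tail.iter_mut().zip(data) {
            slot.write(*byte);
        }
        self.filled += n;
        n
    }

    /// Returns the bytes written so far.
    fn filled(&self) -> &[u8] {
        let init = &self.buf[..self.filled];
        // SAFETY: every element of `buf[..filled]` was written by `append`.
        unsafe { std::slice::from_raw_parts(init.as_ptr() as *const u8, init.len()) }
    }

    /// Forgets the filled bytes so the storage can be reused.
    fn clear(&mut self) {
        self.filled = 0;
    }
}
```

The unsafe code lives in one place, behind the `filled` invariant, so callers only ever see initialized bytes.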

The RFC provides the following usage example:

let mut buf = [MaybeUninit::<u8>::uninit(); 8192];
let mut buf = ReadBuf::uninit(&mut buf);

loop {
    some_reader.read_buf(&mut buf)?;
    if buf.filled().is_empty() {
        break;
    }
    process_data(buf.filled());
    buf.clear();
}

With the following definition for Read::read_buf provided by default, intended to be modified by implementers:

impl Read for MyReader {
    fn read_buf(&mut self, buf: &mut ReadBuf<'_>) -> io::Result<()> { 
        let n = self.read(buf.initialize_unfilled())?;
        buf.add_filled(n);
        Ok(())
    }
}

This achieves the goal it sets out to do: we can successfully read data into uninitialized memory. However, for users it requires a little more work to set up:

// Read into uninit memory.
let mut buf = [MaybeUninit::<u8>::uninit(); 1024];
let mut buf = ReadBuf::uninit(&mut buf);
file.read_buf(&mut buf)?;
let data = buf.filled();

// Read into init memory
let mut buf = [0; 1024];
let read = file.read(&mut buf)?;
let data = &buf[..read];

Overall this is not too bad, and gets us in the right direction. In order to write bytes to the heap instead of the stack I believe you can wrap the buf in a Box::new ¹ ². The main downsides are that it requires using a dedicated type just for this purpose, and it's slightly more verbose.

I believe with placement new (box keyword) this might save yet another copy.

Or as maybewaffle pointed out: this could be used with Vec::spare_capacity_mut as well.

Solution 2: uninitialized vectors

Since August 2020, the Vec::spare_capacity_mut method has been available on nightly. This returns the remaining spare capacity of the vector as a slice of MaybeUninit<T>. The docs show the following usage example:

let mut v = Vec::with_capacity(10);
let uninit = v.spare_capacity_mut();

uninit[0].write(0);
uninit[1].write(1);
uninit[2].write(2);

// Mark the first 3 elements of the vector as being initialized.
unsafe {
    v.set_len(3);
}
assert_eq!(&v, &[0, 1, 2]);

This API fills much the same role as the ReadBuf type did in the first solution, but without requiring a new type to be introduced. Usage of uninitialized vectors becomes a lot closer to the current Read::read behavior:

// Read into uninit memory.
let mut buf = Vec::with_capacity(1024);
let read = file.read_buf(&mut buf)?;
let data = &buf[..read];

// Read into init memory
let mut buf = [0; 1024];
let read = file.read(&mut buf)?;
let data = &buf[..read];

Implementers of "leaf" IO types (e.g. File, etc.) could use this method to implement a reader which reads directly into uninitialized memory. The default implementation of read_buf could become something like this:

impl Read for MyReader {
    fn read_buf(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> { 
        // zero-fill the uninitialized memory
        let uninit = buf.spare_capacity_mut();
        uninit.fill(MaybeUninit::new(0));
        unsafe { buf.set_len(buf.capacity()) };

        // read the data into the buffer
        self.read(buf)
    }
}
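A "leaf" implementation, by contrast, could skip the zero-fill entirely and write straight into the spare capacity. The following is a minimal sketch under my own assumptions: `SliceReader` is a made-up in-memory reader, and `read_to_capacity` is a hypothetical inherent method rather than part of any existing trait.

```rust
use std::mem::MaybeUninit;

/// Hypothetical leaf reader over an in-memory byte slice.
struct SliceReader<'a> {
    remaining: &'a [u8],
}

impl<'a> SliceReader<'a> {
    /// Hypothetical method: fills `buf`'s spare capacity without zeroing
    /// it first and without growing the vector. Returns the number of
    /// bytes appended.
    fn read_to_capacity(&mut self, buf: &mut Vec<u8>) -> usize {
        let spare: &mut [MaybeUninit<u8>] = buf.spare_capacity_mut();
        let n = spare.len().min(self.remaining.len());
        for (slot, byte) in spare.iter_mut().zip(&self.remaining[..n]) {
            slot.write(*byte);
        }
        self.remaining = &self.remaining[n..];
        let new_len = buf.len() + n;
        // SAFETY: the first `n` spare elements were initialized just above,
        // and `new_len` cannot exceed the capacity.
        unsafe { buf.set_len(new_len) };
        n
    }
}
```

Each byte is written exactly once, and the vector's length only ever covers memory that has been initialized.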

It's worth pointing out that the name read_buf at this point doesn't convey the intent particularly well anymore. A name such as read_to_capacity might more closely match our intent. To my knowledge, "read into the uninitialized bytes of a Vec, but don't grow beyond it" doesn't have any precedent in the stdlib. The closest I can think of is Vec::fill, but it's not quite it. If we choose to use this API, method naming and documentation are things we should take a closer look at.

Even reading into the stack becomes possible if we use Vec::with_capacity_in backed by a stack allocator. The ergonomics of that aren't ideal yet, but that's expected to significantly improve over time ³.

I know folks are actively looking at the ergonomics of this, and I'm excited for some of the designs I've seen.

As we mentioned, the main benefit of this approach is that it doesn't introduce a new type and as such maps nicely to existing patterns. Compared to the first solution we have better ergonomics when reading into the heap, and slightly worse ergonomics when reading into the stack. Overall I think this approach is promising, and I prefer it over the first solution.

Solution 3: specialization

Disclaimer: I'm by no stretch an authority on dynamic dispatch or specialization. It might be that I got details wrong, or misunderstood limitations. This section is "best effort", and I'm happy to update it if it turns out I got something wrong.

Going back to our earlier example, an ideal solution would be if, instead of defining a new read_buf method, we could keep using the read method:

// Read into uninit memory.
let mut buf = Vec::with_capacity(1024);
let read = file.read(&mut buf)?; // note: `read`, not `read_buf`
let data = &buf[..read];

The way to do this in Rust would be using specialization. Specialization is still considered "unstable", and has not yet been fully formed. So don't expect the code below to work anytime soon, if ever. But if we squint a little, we could imagine a trait could be defined as follows:

pub trait Read {
    // The "default" implementation.
    default fn read(&mut self, buf: &mut [u8]) -> Result<usize>;

    // Implementation specialized for `&mut Vec<u8>`
    fn read(&mut self, buf: &mut Vec<u8>) -> Result<usize> {
        // zero-fill the uninitialized memory
        let uninit = buf.spare_capacity_mut();
        uninit.fill(MaybeUninit::new(0));
        unsafe { buf.set_len(buf.capacity()) };

        // read the data into the buffer
        self.read(buf.as_mut_slice())
    }
}

In this example we define the trait Read with a "default" implementation for &mut [u8], and a built-in specialization for &mut Vec<u8>. The specialization needs to be built directly into the trait definition, to ensure that &mut Vec<u8> will never default to being interpreted as a zero-length slice. Trait implementers would be expected to provide their own specialization for &mut Vec<u8>, to make use of the ability to write directly into uninitialized memory.
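The specialization syntax above doesn't compile today. As a rough approximation on stable Rust, we can get "one entry point, behavior chosen by buffer type" through a helper trait and static dispatch. This is my own illustration and not something the RFC proposes: `ReadTarget` and `read_any` are hypothetical names, and the `Vec<u8>` impl assumes the vector starts out empty.

```rust
use std::io::{self, Read};

/// Hypothetical helper trait: a destination that decides how much room
/// it offers to the reader.
trait ReadTarget {
    fn read_from<R: Read>(&mut self, reader: &mut R) -> io::Result<usize>;
}

impl ReadTarget for [u8] {
    // Slices only expose their (initialized) length.
    fn read_from<R: Read>(&mut self, reader: &mut R) -> io::Result<usize> {
        reader.read(self)
    }
}

impl ReadTarget for Vec<u8> {
    // Vectors expose their full capacity. This fallback still zero-fills
    // first; it assumes the vector starts out empty.
    fn read_from<R: Read>(&mut self, reader: &mut R) -> io::Result<usize> {
        let cap = self.capacity();
        self.resize(cap, 0);
        let n = reader.read(self.as_mut_slice())?;
        self.truncate(n);
        Ok(n)
    }
}

/// One entry point, two behaviors, selected at compile time.
fn read_any<R: Read, B: ReadTarget + ?Sized>(reader: &mut R, buf: &mut B) -> io::Result<usize> {
    buf.read_from(reader)
}
```

Note that the selection happens through generics, so it relies on monomorphization.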

Another thing that's unclear about specializations is the interactions with semver. In our example we're now using Vec::capacity rather than Vec::len to determine how many bytes to write into the vector. All observable changes in behavior have the potential to be breaking, so any modification to this behavior should always use crater to determine impact. But it doesn't seem like we would break any intended use of the APIs.

At first glance this interface might seem ideal. It's the same interface that just works depending on what's passed. But as remarked in RFC 2930: read_buf, this approach has issues:

It must be compatible with dyn Read. Trait objects are used pervasively in IO code, so a solution can't depend on monomorphization or specialization.

Further elaboration on this point exists in this document and this interview. My interpretation of these materials is: "Specialization is not dyn safe, so if we add a specialization dyn Read would no longer be possible". That would mean that if Read were specialized as we showed, the following code would fail to compile:

// This alias would cause an error because `Read` is not dyn safe.
type MyType = Box<dyn Read>;

Intuitively I would've expected this to be possible if we wanted it to, even if it's currently not implemented in the compiler. But my understanding of dyn is limited compared to the folks who've written the docs, so I trust there are good reasons why they marked it as a non-option.
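For context on why losing dyn Read would hurt, here is a small sketch of the kind of code that depends on it today. The function names `open_source` and `read_all` are made up for illustration.

```rust
use std::io::{self, Cursor, Read};

/// Picks a reader at runtime; callers only ever see `dyn Read`.
fn open_source(use_empty: bool) -> Box<dyn Read> {
    if use_empty {
        Box::new(io::empty())
    } else {
        Box::new(Cursor::new(b"hello".to_vec()))
    }
}

/// Drains any boxed reader without knowing its concrete type.
fn read_all(mut reader: Box<dyn Read>) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    reader.read_to_end(&mut out)?;
    Ok(out)
}
```

Both branches return different concrete types behind the same trait object, which is only possible because Read is dyn safe.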

That said, the constraints around dyn in Rust are currently being reworked in order to enable things like an AsyncIterator with async fn next to work. Which makes me wonder: is the interaction between dyn and specialization something which can be reworked as well?

I asked my colleague Sy whether combining dynamic dispatch with specialization is possible in C++, and it appears it is! They walked me through this C++ example which combines a dynamic dispatch based on the class it's called on and whether a vec or span (slice) is passed. Even though the code is very different from Rust, the functionality matches how we would expect it to work.

If we think about the wider implications of this, it seems not great if specializations and dynamic dispatch would inherently remain mutually exclusive. Both are incredibly useful, and if they can't be used together that seems incredibly limiting.

Summary

In this post we've shown three examples for reading into uninitialized memory in Rust. As I mentioned throughout the post: it'd be great if we could make reading into uninitialized memory as close as possible to reading into initialized memory. And I think the second and third examples in particular have a lot of potential.

I was motivated to write this post after conversations with nrc last week. I remembered reading Amanieu's comment about spare_capacity_mut on the tracking issue for RFC 2930, and started thinking of what it could look like if we used that instead of ReadBuf.

In terms of timelines, I don't believe we're under great pressure to land the features described in this post in the stdlib. The performance benefits can be significant under certain workloads, but as we're collectively moving towards completion-based IO, traits such as AsyncBufRead will become more relevant. Because these traits manage buffers internally, uninitialized memory never needs to cross trait boundaries and this post doesn't apply.
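To illustrate what "managing buffers internally" looks like, here is a sketch using the synchronous BufRead trait, which I'm using as a stand-in because AsyncBufRead follows the same fill-then-consume shape. The function name `count_bytes` is hypothetical.

```rust
use std::io::{self, BufRead};

/// Counts the bytes in a reader by borrowing the reader's own buffer.
/// The caller never supplies a buffer, initialized or otherwise.
fn count_bytes(mut reader: impl BufRead) -> io::Result<usize> {
    let mut total = 0;
    loop {
        // The returned slice points into a buffer owned by the reader.
        let chunk = reader.fill_buf()?;
        if chunk.is_empty() {
            return Ok(total);
        }
        let n = chunk.len();
        total += n;
        // Tell the reader how many of those bytes we used.
        reader.consume(n);
    }
}
```

Since the reader hands out slices of memory it has already filled, the question of passing uninitialized memory across the trait boundary doesn't come up.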

I hope I was able to provide some insight on how we can enable reading into uninitialized memory for (async) IO traits. If you liked this post and would like to see my cats, you can follow me on Twitter.

Thanks to Sy Brand for showing me how dynamic dispatch and specialization interact in C++, and nrc for helping review this post.