Why Async Rust
β€” 2022-09-26

  1. language hierarchy
  2. async under the hood
  3. rust's async features
    1. ad-hoc cancellation
    2. ad-hoc concurrency
    3. combining cancellation and concurrency
  4. performance: workloads
  5. performance: optimizations
  6. ecosystem
  7. conclusion

A lot of system design is about thinking about the nature of the domains we encounter - and only later, once we understand them, encoding this understanding in a way that machines can verify it.

I often find async Rust to be misunderstood. Conversations around "why async" often focus on performance 1 - a topic which is highly dependent on workloads, and which results in people wholly talking past each other. While performance is not a bad reason to choose async Rust, we often only notice performance when we experience a lack of it. So I want to instead focus on the features async Rust provides which aren't present in non-async Rust. Though we'll talk a bit about performance too at the end of this post.


The introduction to the Rust async book summarizes the benefits of async Rust as follows: "In summary, asynchronous programming allows highly performant implementations that are suitable for low-level languages like Rust, while providing most of the ergonomic benefits of threads and coroutines."

Language Hierarchy

It's not uncommon to hear Rust and other languages described as "N languages in a trenchcoat". In Rust we have Rust's control flow constructs, we have the decl-macro meta language, we have the trait system (which is Turing-complete), we have the cfg annotation language - and the list keeps going. But if we consider Rust as it is provided out of the box to us as "base Rust", then there are some obvious modifiers to it:

  - const Rust
  - unsafe Rust
  - async Rust

All of these "modifier keywords" to the Rust language provide new capabilities which aren't present in "base Rust". But they may sometimes also take capabilities away. The way I've started thinking and talking about language features is in terms of "subsets of the language" or "supersets of the language". With that classification we can look at the modifier keywords again and make the following categorization:

unsafe Rust only adds the ability to use raw pointers. async only adds the ability to .await values. const, on the other hand, adds the ability to compute values during compilation, but removes the ability to use statics and access things like the network or filesystem.

If language features only add to base Rust then they're considered supersets. But if the features they add also require restricting other features, then they're considered subsets. In the case of const, all const functions can be executed at runtime. But not all code which can be executed at runtime can be marked as const.

The design of language features as sub/supersets of "base" Rust is essential: it ensures that the language keeps feeling cohesive. And more so than size or scope, uniformity is what leads to the feeling of simplicity.

async under the hood

At its core async/.await in Rust provides a standardized way to return types with a method on them that returns another type. Instead of returning a type directly, async functions return an intermediate type first. 2


Sure; "monad" πŸ™ƒ

/// This function returns a string
fn read_to_string(path: &Path) -> String { .. }

/// This function returns a type which eventually returns a string
async fn read_to_string(path: &Path) -> String { .. }

/// Instead of using `async fn` we can also write it like this.
/// `impl Future` here is an opaque return type
fn read_to_string(path: &Path) -> impl Future<Output = String> { .. }

A future is just a type with a method on it (fn poll). If we call the method at the right times and in the right way, then eventually it'll give us the equivalent of Option<T> where T is the value that we wanted.

"A future is just a type with a method we can call" has a few implications. First of all, just like we can choose to call the method, we can also choose to not call the method. If we don't call the method at all then the future won't execute any work 3. Or we can choose to call it, and then stop calling it for a while. Maybe we just call the method again later. We can even choose to drop the struct at any point, and then there's no more method to call and no more value to be obtained.


Yes, that's only the case for async fn / async {}-futures. If you write a plain fn returning -> impl Future, you can do work before constructing the future and returning it. But that's not considered a great pattern, and it's pretty rare in practice.

When we talk about "representing a computation in a type", we're actually talking about compiling down the async fn and all of its .await points into a state machine which knows how to suspend and resume from the various .await points. These state machines are just structs with some fields in them, and have an auto-generated Future::poll implementation which knows how to correctly transition between the various states. To learn more about how these state machines work, I recommend watching "life of an async fn" by tmandry.

The .await syntax provides a way to ensure that none of the underlying poll details surface in the user-syntax. Most usage of async/.await looks just like non-async Rust, but with async/.await annotations sprinkled on top.

Rust's Async Features

The core feature async/.await provides in Rust is control over execution. Instead of the contract being:

> "function call" -> "output"

We instead get access to an intermediate step:

> "function call" -> "computation" -> "output"

The computation isn't just something which is hidden away from us anymore. With async/.await we are empowered to manipulate the computation itself. This leads to several key capabilities:

ad-hoc cancellation

The ability to suspend/cancel/pause/resume any computation is incredibly useful. Of these, being able to cancel execution is likely the most useful one. In both sync and async code it's sometimes desirable to halt execution before it completes. But what's unique to async Rust is that any computation can be halted in a uniform way. Every future can be cancelled, and all futures need to account for that 4.


Yes I'm fully aware of the concept of "cancellation-safety", and I have a post coming up discussing it in more detail. The tldr: "cancellation-safety" as a concept is under-specified, but importantly: "cancellation-safety" is only relevant when using select! {}, which is something which should not be used. Correctly handling cancellation is something that manually authored futures still need to do though, and that can be tricky to do without "async Drop". But that's different from "cancellation-safety", or from the idea that futures don't need to account for being cancelled at all.

ad-hoc concurrency

The ability to execute computations concurrently is another hallmark capability of async Rust. Any number of async fns can be run concurrently 5, and .awaited together. In non-async Rust concurrency is typically tied to parallelism: many computations can be scheduled concurrently by using thread::spawn and splitting it up that way. But async Rust separates concurrency from parallelism, providing more control 6. In non-async Rust concurrency and parallelism are interlinked, which among other things has performance implications. We'll talk more about the differences later in this post.


Bar any runtime faults such as deadlocks, which can occur if two computations are run concurrently but share a resource.


Not all libraries make use of the separation between "concurrency" and "parallelism" though. We're still very much figuring out what async Rust even is, but many of the libraries in common use today don't necessarily surface this insight. I know many of my own older libraries sure don't.

combining cancellation and concurrency

Now finally: what happens when you combine cancellation and concurrency? It allows us to do some interesting things! In my blog post "Async Time III: Cancellation and Signals" I go in-depth on some of the things you can do with this. But the canonical example here is: timeouts. A timeout is a concurrent execution of some future and a timer future, mapped to a Result:

That's cancellation + concurrency combined to provide a new third type of operation. To get a sense for why being able to time-out any computation is a useful property, I highly recommend reading Crash-Only Software by Candea and Fox 7. But it doesn't just stop at timeouts: if we combine any of the suspend/cancel/pause/resume capabilities with concurrency, we unlock a myriad of new possible operations.


Shout out to Eric Holk for introducing me to this paper!

These are the features async Rust enables. In non-async Rust, concurrency, cancellation, and suspension often require calling out to the underlying operating system - and they're not always supported. For example: Rust doesn't have a built-in way to cancel threads. The way to do it is usually to pass a channel to a thread, and periodically check it to see if some "cancel" message has been passed.

In contrast in async Rust any computation can be paused, cancelled, or run concurrently. That doesn't mean that all computations should be run concurrently, or everything should have a timeout on it. But those decisions can be made on the basis of what we're implementing, rather than being limited by external factors such as system call availability.

Performance: workloads

When something is stated to perform better than something else, it's always worth asking: "under which circumstances?" Performance is always dependent on the workload. In graphics card benchmarks you'll often see differences between video cards based on which games are run. In CPU benchmarks it matters a lot whether a workload is primarily single-threaded or multi-threaded. And when we talk about software features, "performance" is not a binary either, but highly dependent on the workload. When talking about parallel processing we can distinguish between two general categories of workloads:

  - Throughput-oriented workloads usually care about processing the maximum number of things in the shortest amount of time.
  - Latency-oriented workloads care about processing each individual thing as quickly as possible.

Sounds confusing? Let's make it more clear.

An example of software designed with throughput in mind is Hadoop. It's built for "offline" batch processing of workloads, where the most important design goal is to minimize the total CPU time spent to process data. When data is put into the system it may often take minutes or even hours to be processed. And that's fine. We don't care when we get the results (within reason of course), we primarily care about using as few resources as possible to get the results.

Compare that to a public-facing HTTP server. Networking is typically latency-oriented. We often care less about how many requests we can handle, than how quickly we can respond to them. When a request comes in we don't want to take minutes or hours to generate a response. We want request-response roundtrips to be measured in at most milliseconds. And things like p99 tail-latencies are often used as key performance indicators.

Async Rust is generally considered to be more latency-oriented than throughput-oriented. Runtimes such as async-std and tokio primarily care about keeping overall latency low, and preventing sudden latency-spikes.

Understanding what type of workload is being discussed is often the first step in discussing performance. A key benefit of async Rust is that most of the systems which use it have been tuned heavily to provide good performance for latency-oriented workloads - of which networking is an example. If you want to handle more throughput-oriented workloads, non-async crates like rayon are often a better fit.

Performance: optimizations

Async Rust separates concurrency and parallelism from each other. Sometimes the two are confused with each other, but they are in fact different:

> parallelism is a resource, concurrency is a way of scheduling computation

It's best to think of "parallelism" as a maximum. For example, if your computer has two cores, the maximum amount of parallelism you have may be two 8. But parallelism is different from concurrency: computation can be interleaved on a single core, so while we're waiting on the network to perform work, we can run some other computations until we have a response. Even on single-threaded machines, computation can be interleaved and concurrent. And vice-versa: just because we've scheduled things in parallel doesn't mean computation is interleaved. No matter how many cores we're running on, if threads are taking turns by waiting on a single, shared lock, then the logical execution may in fact still happen sequentially.


This is a simplified example; for a longer explainer see the std::thread::available_parallelism docs.

Let's take a look at an example of a concurrent workload in async and non-async Rust. In non-async Rust it's most common to use threads to achieve concurrent execution 9. But since threads are also the abstraction to achieve parallel execution, it means that in non-async Rust concurrency and parallelism are often tightly interlinked.

In async Rust, we can separate concurrency from parallelism. If a workload is concurrent, that doesn't imply it is also parallel. This provides finer control over execution, which is the key strength of async Rust. Let's compare concurrency in non-async and async Rust:


Unless you start manually writing epoll(7) loops, and hand-roll state machines. At some point you may start thinking about creating abstractions to make these state machines compose better with each other, at which point you've basically arrived at futures again. Natively, in order to achieve: "I want to make this code run concurrently", threads are the easiest, most convenient abstraction available in non-async Rust.

// thread-based concurrent computation
let x = thread::spawn(|| 1 + 1);
let y = thread::spawn(|| 2 + 2);
let (x, y) = (x.join().unwrap(), y.join().unwrap()); // wait for both threads to return

// async-based concurrent computation
let x = async { 1 + 1 };
let y = async { 2 + 2 };
// `join` here comes from a concurrency adapter such as the
// futures-concurrency `Join` trait; plain tuples aren't awaitable
let (x, y) = (x, y).join().await; // resolve both futures concurrently

This may seem like a pretty silly example: the computation is synchronous, so both variants do the same thing, but the non-async variant has the overhead of needing to spawn actual threads. And it doesn't stop there: because the second example doesn't need threads, the compiler's inliner can kick in, and may be able to optimize it to the following 10:

// compiler-optimized async-based concurrent computation
let (x, y) = (2, 4);

In contrast, the best optimization the compiler can likely perform for the thread-based variant is:

// best-case optimization of the thread-based concurrent computation
let x = thread::spawn(|| 2);
let y = thread::spawn(|| 4);
let (x, y) = (x.join().unwrap(), y.join().unwrap()); // wait for both threads to return

Note that this exact example doesn't yet work in the optimizer, but I don't believe there is any strong reason why it couldn't either. It's all local reasoning, with no cross-thread synchronization needed. The closest example I have for optimizations of this kind is this example of a hand-rolled version of block_on which compiles down to absolutely nothing. This cheats a little bit by not using an atomic-based Arc, so I'm not sure how realistic it is. But it's definitely something to aspire to, and I'm optimistic that as async Rust sees more usage, we'll see more async-specific optimizations as well.

Separating concurrency from parallelism allows for more optimizations of computations. async in Rust is basically a fancy way of authoring a state machine, and nested async/.await calls allow types to be compiled down into singular state machines. Sometimes we may want to keep the state machines separate though, and that's the sort of control which async Rust provides us with, which is harder to achieve using non-async Rust.


Ecosystem

Before we close out, we should point out one last reason people may choose async Rust: the size of the ecosystem. Without keyword generics it can be a lot of work for library authors to publish and maintain libraries which work both in async and non-async Rust. Often it's easiest to just publish either an async or non-async library, and not account for the other use case. A lot of the network-related libraries on crates.io use async Rust though, which means that libraries building on top of this will also use async Rust. And in turn people looking to build websites without rewriting everything from scratch will often have a larger ecosystem to choose from when using async Rust.

Network effects are real and need to be acknowledged in this context. Not everyone wanting to build a website will be thinking in terms of language features, but may instead just be looking at the options they have in terms of ecosystem. And that's a perfectly valid reason to use async Rust for too.


Conclusion

Sometimes conversations pop up about async Rust with suggestions such as: "What if we disallowed cancellation in its entirety?" 11 Seeing this always confuses me, because it seems to carry a fundamental misunderstanding of what async Rust provides and why it should be used. If the focus is solely on performance, features such as cancellation or .await annotations may seem like a plain nuisance.


The goal of this is often to have linear futures, or "futures which are guaranteed to complete". The kind of cancellation that would be disallowed is only the "stop polling" kind: you should still be able to pass channels around to cancel things, at least in theory. Though I believe there may also be implications for concurrency, which would make this really tough.

But if the focus is more on the features async Rust enables, things like cancellation and timeouts quickly rise from nuisance to key reasons to adopt async Rust. Async Rust grants us the ability to control execution in a way which just isn't possible in non-async Rust. And frankly, it isn't even possible in many other programming languages featuring async/.await either. The fact that an async fn compiles down to a lazy state machine instead of an eager managed task is a crucial distinction. And it means that we can author concurrency primitives entirely in library code, rather than needing to build them into the compiler or runtime.

The features async Rust enables build on each other. Here's a brief summary of how they relate:

            Yosh's Hierarchy
        of Rust Async Capabilities
3.  β”‚   Timeouts, Timers, Signals   β”‚  …which can then be composed into…
2.  β”‚ Cancellation  β”‚  Concurrency  β”‚  …which in turn enable…
1.  β”‚    Control over Execution     β”‚  The core futures enable…

We also briefly covered the performance aspect of async Rust. Generally speaking it can be more performant than non-async Rust when you're doing async IO. But that will mostly be the case when the underlying system APIs are geared for that, which usually includes networking APIs, and more recently has also started including disk IO.

Thanks to Iryna Shestak for proof reading this post and providing helpful feedback along the way.