Async Iteration III: The Async Iterator Trait
— 2023-09-26
- fn poll_ vs async fn
- performance
- self-referential iterators
- unpin bounds
- implementation vs usage
- object safety
- "cancellation safety"
- pin ergonomics
- evolution
- conclusion
This post is part of the Async Iteration series:
- Async Iteration I: Async Iteration Semantics
- Async Iteration II: The Async Iterator Crate
- Async Iteration III: The Async Iterator Trait (this post)
Async Functions in Traits (AFIT) are in the process of being stabilized and I figured it would be a good time to look more closely at the properties they provide. In this post I want to compare AFIT-based traits with poll-based traits, using the "async iterator" trait as the driving example. But most everything which applies to async iterator, will also apply to other traits such as async read and async write.
In this post I will make the case that the best direction for the stdlib is to base its async traits on AFITs. The intended audience for this post is primarily my fellow members of WG-Async, as well as members of T-Lang and T-Libs. To read a summary of the findings jump ahead to the conclusion. This post assumes readers are familiar with the inner workings of Rust's async systems, as well as with the tradeoffs being discussed.
fn poll_ vs async fn
To provide some flavor to what I'm talking about, in this post we'll be
discussing the "async iterator" trait, asking the question whether we should
base it on fn poll_next or async fn next. Here are both variants side-by-side:
// Using `fn poll_next`.
trait AsyncIterator {
type Item;
fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}
// Using `async fn next`.
trait AsyncIterator {
type Item;
async fn next(&mut self) -> Option<Self::Item>;
}
I expect pretty much everyone will agree that on a first look the async fn next-based
trait seems easier to use. Rather than needing to think about what Pin is, or
how Poll works, we can just write our async functions the way we usually do,
and it will just work. Pretty neat!
But that's just on the surface. Does that still hold if we look more closely?
Concerns have been raised about the performance of async fn next, claiming it
would perform worse than fn poll_next. It's also alleged that async fn next
does not provide essential features, even going so far as to claim that
fn poll_next is fundamentally lower-level and thus the only reasonable choice
for a systems programming language. In the remainder of this post we'll be
going over those claims, and show why upon closer examination they do not
appear to hold.
Performance
Let's start with the most obvious one: performance. At its core Rust is a
systems programming language, and in order to properly cater to its niche it
tends to only provide abstractions which have comparable performance to their
hand-rolled versions. The claim is that poll_next should provide better
performance than async fn next since we're compiling it by hand. But when
actually measured, the two approaches appear to compile to identical assembly in
various configurations - meaning they will have identical performance.
But don't just take my word for it, we can use examples to substantiate this.
Let's create a simple "once" future which holds some data, and when polled it
will return that data. Rather than using complex async/.await machinery, we'll
be creating a new function poll_once which constructs a dummy waker in-line
and can be used to poll a future exactly once:
pub fn call_once() -> Poll<Option<usize>> {
let mut iter = once(12usize);
poll_once(iter.next())
}
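The post does not show poll_once itself. A minimal sketch of such a helper might look like the following. NoopWake is a hypothetical name, and this version builds its dummy waker through std's Arc-based Wake trait, so it corresponds to the "real" thread-safe waker configuration discussed later rather than the non-atomic one:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that ignores wake-ups (hypothetical helper, not from the post).
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and returns whatever that single poll produced.
pub fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    // Pin the future to the stack so it can be polled.
    let fut = pin!(fut);
    fut.poll(&mut cx)
}
```

A future which is immediately ready comes back as Poll::Ready, while one which never completes comes back as Poll::Pending after the single poll.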
Let's start by evaluating what this looks like when implemented using fn poll_next.
We could write this as follows:
struct Once<T>(Option<T>);
impl<T> AsyncIterator for Once<T> {
    type Item = T;
    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        // SAFETY: we're projecting into an unpinned field
        let this = unsafe { Pin::into_inner_unchecked(self) };
        // `Once` is a tuple struct, so the value lives in field `0`.
        Poll::Ready(this.0.take())
    }
}
When polled we project Self into its fields, which is just the Option type.
We then call .take to extract the value, leaving None in its place; if the
value was already taken we simply return None. This should be fairly
straightforward. If poll_once creates a non-atomic dummy waker (this is just
the first example), the compiler will compile this code down to the following
x86 assembly (compiler explorer):
example::call_once:
mov eax, 1
mov edx, 12
ret
This assembly basically means: "Hey I've got the constant '12' over here -
please move it into the return register and then exit the function". That's
about the smallest this function can be without being inlined. Now let's see
what happens if we implement this code using async fn next. Instead of
fn poll_next we can use an async function directly:
pub struct Once<T>(Option<T>);
impl<T> AsyncIterator for Once<T> {
type Item = T;
async fn next(&mut self) -> Option<T> {
self.0.take()
}
}
It's nice we don't have to perform pin projections anymore (more on that later). But what's the performance like? Well, if this was slower we'd expect it to generate more assembly. So let's take a look (compiler explorer):
example::call_once:
mov eax, 1
mov edx, 12
ret
The assembly is identical! Why is that? Well, for one: the Rust compiler is
pretty good at generating fast code. But we're also in a bit of a simplified
environment. So far in our examples we've not been using "real" thread-safe
wakers, instead basing our wakers on Rc. What happens if we switch to Arc-based
wakers? Here's the link to a compiler explorer comparing the two. It now
generates a lot more assembly than before (yay atomics), but luckily we can use
diff(1) to compare the output:
example::call_once:
; 21 lines of assembly + calls to another 118 lines
yosh@MacBook-Pro scratch % pbpaste > one.rs
yosh@MacBook-Pro scratch % pbpaste > two.rs
yosh@MacBook-Pro scratch % diff one.rs two.rs
yosh@MacBook-Pro scratch %
The diff output is empty, meaning there are no differences even if we use Arcs
to correctly construct our wakers, it just generates a lot more code. But okay
fine, maybe there are more differences? After all: fn poll_next has access to
the Waker and can return Poll, meaning it has low-level control over the
future state machine while async fn next does not. What happens if we want to
provide low-level control over the future state machine from async fn next?
Luckily we've stabilized a simple mechanism for this already:
std::future::poll_fn.
This function provides the ability to access the low-level internals of any
future, including AFITs. Let's lower our example to make use of this, shall we?
pub struct Once<T>(Option<T>);
impl<T> AsyncIterator for Once<T> {
    type Item = T;
    async fn next(&mut self) -> Option<T> {
        future::poll_fn(|_cx| /* -> Poll<Option<T>> */ {
            // We have access to `_cx` here which contains the `Waker`.
            // `Once` is a tuple struct, so the value lives in field `0`.
            Poll::Ready(self.0.take())
        }).await
    }
}
This seems simple enough: whenever we want to do anything low-level inside of an
async fn, we can use poll_fn to drop into the future state machine. This
should work not just for the async version of the iterator trait, but for all
async traits. There is more to be said about how this interacts with pinning and
self-referential types, but we'll cover that in more detail later on in the
post. To close this out though: what does this compile
to if we call it using "real" wakers? (compiler explorer):
example::call_once:
; 21 lines of assembly + calls to another 118 lines
yosh@MacBook-Pro scratch % pbpaste > two.rs
yosh@MacBook-Pro scratch % diff one.rs two.rs
yosh@MacBook-Pro scratch %
That's right: the output remains the same. This gives us a pretty good clue
about what is happening here. Inside the compiler async fn next is desugared
to a future, just like fn poll_next is. And because of basic inlining and
const-folding optimizations, the resulting state machines are identical - which
means that the resulting assembly is identical too. This is exactly how
zero-cost abstractions are supposed to work, and is the entire premise of Rust's
async system. If we ever find
a case where the optimizer doesn't perform those basic optimizations we can then
treat that as a bug in the compiler - not a limitation of the design.
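To make the poll_fn point a little more concrete, here is a sketch of an async fn next implementation which actually uses the waker and returns Poll::Pending, rather than ignoring the context. Countdown, CountingWake, run, and collect_countdown are hypothetical names invented for this illustration, and the busy-polling run helper is only a stand-in for a real executor:

```rust
use std::future::{self, Future};
use std::pin::pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

/// Hypothetical iterator: counts down, suspending once before each item.
struct Countdown {
    remaining: u32,
    yielded: bool,
}

impl AsyncIterator for Countdown {
    type Item = u32;
    async fn next(&mut self) -> Option<u32> {
        future::poll_fn(|cx| {
            if !self.yielded {
                // Low-level control: request a re-poll, then suspend.
                self.yielded = true;
                cx.waker().wake_by_ref();
                return Poll::Pending;
            }
            self.yielded = false;
            if self.remaining == 0 {
                return Poll::Ready(None);
            }
            self.remaining -= 1;
            Poll::Ready(Some(self.remaining))
        })
        .await
    }
}

/// A waker which counts how often it is woken.
struct CountingWake(AtomicUsize);

impl Wake for CountingWake {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// Busy-polls `fut` to completion; returns its output and the wake-up count.
fn run<F: Future>(fut: F) -> (F::Output, usize) {
    let state = Arc::new(CountingWake(AtomicUsize::new(0)));
    let waker = Waker::from(state.clone());
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, state.0.load(Ordering::SeqCst));
        }
    }
}

/// Drains a `Countdown` starting at 3 and reports the items and wake-ups.
fn collect_countdown() -> (Vec<u32>, usize) {
    run(async {
        let mut iter = Countdown { remaining: 3, yielded: false };
        let mut items = Vec::new();
        while let Some(item) = iter.next().await {
            items.push(item);
        }
        items
    })
}
```

Each call to next suspends once, so draining three items plus the final None wakes the waker four times. The state that must survive between polls lives in the iterator itself, which is the same place a hand-written fn poll_next would keep it.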
Self-Referential Iterators
When people say that "async iterator is not the async version of iterator" they are correct. Well, sort of. If we look at existing implementations that is true: it doesn't quite work like the async version of iterator. Instead what it really is is the async version of "pinned iterator". Which is not a trait we currently have, but there certainly is a case to be made for it. Instead it's better to ask whether async iterator should be the "async version of iterator" - and I certainly believe it should be 1.
Incidentally that has also been the framing of the trait WG-async has been communicating to T-lang and T-libs, who have signed off on it. I'm not suggesting that this decision should bind us (I don't like to rules lawyer). What I'm instead trying to show with this is that this has been an accepted framing of what the design should achieve for years now, and we've already rejected the framing that "async iterator" (or "stream") should be its own special thing. That certainly can be changed again, but it is not a novel insight by any stretch.
Let me explain what I mean by this using examples. In Rust the base iterator API
has an associated type Item, a function next which takes a mutable reference
to self, and returns an Option<Self::Item>:
trait Iterator {
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
If we did a direct translation to async Rust, we'd have an API which instead of
exposing an fn next exposed an async fn next. The only real difference here
is the addition of the async keyword:
trait AsyncIterator {
type Item;
async fn next(&mut self) -> Option<Self::Item>;
}
However, when we look at the ecosystem Stream trait, or the currently unstable
AsyncIterator API they are not implemented in terms of async fn next. Instead
they provide an fn poll_next which takes a pinned reference to self and a
mutable reference to the waker context, and wraps the return type in Poll:
trait AsyncIterator {
type Item;
fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}
In the previous section we've already discussed how you can get access to the
waker context from inside an async function by using poll_fn. So we can pretty
much ignore the waker context and the Poll in the return type. That leaves the
change in the self type. Our async fn next takes &mut self, while this
variant takes Pin<&mut Self>. This isn't necessary for the core functionality
of async iterator, since it is pinning more than needed. So simply put, what
we've just written is in fact the async version of this trait:
trait PinnedIterator {
type Item;
fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}
Here we have a non-async version of iterator which takes self as a pinned
reference. This is useful if you ever need to write an iterator which can
operate on self-referential structs. For example: if we ever start thinking of
stabilizing generator functions, we want them to be able to hold references
across yield points. That will require self-referential iterators.
An important insight of this is that the question of whether iterator should be pinned is orthogonal to whether it is async. Which is illustrated by the fact that we can reformulate a "pinned asynchronous iterator" just fine using AFITs:
trait PinnedAsyncIterator {
type Item;
async fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}
This can be combined with the poll_fn function as we showed in the previous
section to recreate the low-level semantics of fn poll_next, providing access
to both a pinned self-type and the future's waker argument. To put it plainly:
"is async" and "is pinned" are orthogonal features, and fn poll_next
needlessly combines both.
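As a sketch of that combination, the following adapts any poll-based iterator into the pinned AFIT shape using nothing but poll_fn. The trait name PollIterator (used to avoid a name clash), the Meows type, and the first_meow helper are assumptions made up for this example:

```rust
use std::future::{self, Future};
use std::marker::PhantomPinned;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// The poll-based shape (named `PollIterator` here to avoid a clash).
trait PollIterator {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// The AFIT shape with a pinned self-type.
trait PinnedAsyncIterator {
    type Item;
    async fn next(self: Pin<&mut Self>) -> Option<Self::Item>;
}

/// Every poll-based iterator can be exposed through the pinned AFIT trait.
impl<I: PollIterator> PinnedAsyncIterator for I {
    type Item = I::Item;
    async fn next(mut self: Pin<&mut Self>) -> Option<I::Item> {
        // `poll_fn` hands us the `Context`; `as_mut` reborrows the pinned self.
        future::poll_fn(|cx| self.as_mut().poll_next(cx)).await
    }
}

/// A `!Unpin` iterator which yields "meow" a fixed number of times.
struct Meows {
    left: u32,
    _pin: PhantomPinned,
}

impl PollIterator for Meows {
    type Item = &'static str;
    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<&'static str>> {
        // SAFETY: we only mutate an integer field and never move out of `self`.
        let this = unsafe { self.get_unchecked_mut() };
        if this.left == 0 {
            return Poll::Ready(None);
        }
        this.left -= 1;
        Poll::Ready(Some("meow"))
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the first `next()` call once with a no-op waker.
fn first_meow() -> Poll<Option<&'static str>> {
    let mut iter = pin!(Meows { left: 2, _pin: PhantomPinned });
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(iter.as_mut().next());
    let out = fut.as_mut().poll(&mut cx);
    out
}
```

The blanket impl needs no unsafe code of its own: the pinned self is simply forwarded into poll_next from inside the poll_fn closure.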
Unpin Bounds
People occasionally ask me about the Unpin trait when I talk about async
versions of traits. For example if you compare Iterator::next and
futures::stream::StreamExt::next, you will see that the latter has an extra
where Self: Unpin bound.
// `Iterator::next`
fn next(&mut self) -> Option<Self::Item>;
// `StreamExt::next`
fn next(&mut self) -> Next<'_, Self>
where
    Self: Unpin; // This is different
This extra Unpin bound is only needed when a trait is implemented in terms of
poll functions - which by design take Pin<&mut Self>. And so we need a way to
later on opt out of those bounds. You can see this same mechanism in action with
the other poll-based traits, such as AsyncWriteExt::write which also has a
Self: Unpin bound.
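To see where the bound comes from, here is a simplified sketch of how a poll-based next method can be layered on top of poll_next. It is loosely modeled on the shape used by the futures crate, but the definitions below are stand-ins written for this post, not that crate's actual source. Counter, NoopWake, and first_item are hypothetical names:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait Stream {
    type Item;
    fn poll_next(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<Self::Item>>;
}

/// Future returned by `next`; it only holds `&mut S`, not a pinned `S`.
struct Next<'a, S: ?Sized>(&'a mut S);

impl<S: Stream + Unpin + ?Sized> Future for Next<'_, S> {
    type Output = Option<S::Item>;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Pin::new` is only available when the target is `Unpin`, which is
        // where the extra bound comes from.
        Pin::new(&mut *self.0).poll_next(cx)
    }
}

trait StreamExt: Stream {
    fn next(&mut self) -> Next<'_, Self>
    where
        Self: Unpin,
    {
        Next(self)
    }
}

impl<S: Stream + ?Sized> StreamExt for S {}

/// A trivially `Unpin` stream which counts upwards from its start value.
struct Counter(u32);

impl Stream for Counter {
    type Item = u32;
    fn poll_next(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<u32>> {
        // `Counter` is `Unpin`, so plain field access through the pin is fine.
        let n = self.0;
        self.0 += 1;
        Poll::Ready(Some(n))
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `Counter(0).next()` once with a no-op waker.
fn first_item() -> Poll<Option<u32>> {
    let mut counter = Counter(0);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = counter.next();
    let out = Pin::new(&mut fut).poll(&mut cx);
    out
}
```

Because next only receives &mut self, the only safe way to produce the Pin<&mut S> that poll_next demands is Pin::new, and that requires S: Unpin.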
Instead if we recognize that the async counterparts to Rust's core traits don't
actually need to be pinned in order to be implemented, we can drop
Pin<&mut Self> from the signature. And since our type isn't pinned to begin
with, we no longer have to opt-out of it being pinned via Unpin, meaning all
the extra Unpin bounds go away. You can see this in action in the
async-iterator crate which provides a diverse range of methods on async
iterator, none of which require additional Unpin bounds to function.
Implementation vs Usage
To stay on the topic of API-shapes: one major downside of fn poll_next is that
the "method to implement" and "method to call" are different methods. In the
regular iterator trait, there is only one method next which is both
implemented and called. Instead the poll-API is only meant to be implemented,
and in virtually all cases the next function is the one you want to call. This
is a major deviation from how all other traits work in the stdlib today.
This isn't just limited to async Iterator either. Presumably we'd want to
adopt this approach for all traits in the stdlib. That means users of async
Rust would need to think of traits in the stdlib as somehow "different", and
remember that they cannot directly implement the methods they're calling. Among
others, the following APIs would be affected:
poll-based stdlib traits
trait name | to be implemented | to be called | is same? |
---|---|---|---|
async Read | fn poll_read | async fn read | ❌ |
async Write | fn poll_write | async fn write | ❌ |
async BufRead | fn poll_fill_buf | async fn fill_buf | ❌ |
async Seek | fn poll_seek | async fn seek | ❌ |
Instead, if we base these traits on async fn, the method to implement and the
method to call are identical. And as we've covered earlier, if anyone would want
to manually author a poll-based state machine for any of these traits,
poll_fn provides a uniform way to do so for all async traits:
async-fn based stdlib traits
trait name | to be implemented | to be called | is same? |
---|---|---|---|
async Read | async fn read | async fn read | ✅ |
async Write | async fn write | async fn write | ✅ |
async BufRead | async fn fill_buf | async fn fill_buf | ✅ |
async Seek | async fn seek | async fn seek | ✅ |
This might seem like a minor point, but we have to consider that every deviation from existing norms is a point of friction for users. To zoom out slightly: I don't believe that async Rust inherently needs to be much more difficult than regular Rust. But the missing language features, combined with subtle differences like these, eventually add up and create an experience which is sufficiently different that the resulting system feels like an entirely different language. When in reality it does not need to be. For good measure here are the existing non-async stdlib traits:
non-async stdlib traits
trait name | to be implemented | to be called | is same? |
---|---|---|---|
Read | fn read | fn read | ✅ |
Write | fn write | fn write | ✅ |
BufRead | fn fill_buf | fn fill_buf | ✅ |
Seek | fn seek | fn seek | ✅ |
Object Safety
So far we've only discussed the implementation side of the traits. However that
isn't the complete story, and we need to consider auto traits and other subtle
semantics too. So let's start looking at those, starting with object-safety.
Out of the box poll-based traits are dyn-safe. Say we wanted to implement a
poll-based version of async iterator which can produce an infinite number of
meows, we could create a dyn variant like so (playground):
struct Cat;
impl AsyncIterator for Cat {
type Item = String;
fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
Poll::Ready(Some("meow".to_string()))
}
}
fn dyn_iter() -> Box<dyn AsyncIterator<Item = String>> {
Box::new(Cat {})
}
This is the same as any other dyn-safe trait, and doesn't require any
additional steps. Nice! Now what happens if we try and rewrite it to use
async fn next? Well, we would probably try and write it like so (playground):
#![feature(async_fn_in_trait)]
struct Cat;
impl AsyncIterator for Cat {
type Item = String;
async fn next(&mut self) -> Option<Self::Item> {
Some("meow".to_string())
}
}
fn dyn_iter() -> Box<dyn AsyncIterator<Item = String>> {
Box::new(Cat {})
}
However if we try and compile this code we get the following error:
error[E0038]: the trait `AsyncIterator` cannot be made into an object
--> src/lib.rs:16:22
|
16 | fn dyn_iter() -> Box<dyn AsyncIterator<Item = String>> {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `AsyncIterator` cannot be made into an object
|
note: for a trait to be "object safe" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
--> src/lib.rs:5:14
|
3 | trait AsyncIterator {
| ------------- this trait cannot be made into an object...
4 | type Item;
5 | async fn next(&mut self) -> Option<Self::Item>;
| ^^^^ ...because method `next` is `async`
= help: consider moving `next` to another trait
This would not happen if we were using the async-trait crate, it only happens
if we use the AFIT language feature. But what exactly is going on here? In
order for async traits to work in dyn contexts, we need to find a place to
store them first. In the async-trait crate this is always a Box, but in stable
Rust we can't just do that because we've pinky-promised not to perform any
"hidden" allocations 2. Solving this is pretty complicated, and will require a
feature like dyn* to land. With dyn* rather than calling Box::new you'd need to
call a "dyn adapter" instead. For example:
fn dyn_iter() -> Box<dyn AsyncIterator<Item = String>> {
    Boxed::new(Cat {}) // NOTE: using the `Boxed` dyn-adapter, not `Box`.
}
Though in the past we have used allocations in language features as
placeholders: that's how async/.await was originally stabilized in 2019. It
wasn't until a year or so later that support landed for async functions which
didn't box in their desugaring.
Needing to replace Box::new with Boxed::new is not a major difference. And
it's still unclear what the upper bound on the ergonomics of dyn async traits
is, since work on it has been paused for the past year in favor of stabilizing
AFIT first. But it's going to be pretty confusing if certain async traits
require a Box, but other async traits require a Boxed notation.
I believe the better direction is for all async traits in the stdlib to use the same mechanism to work with dyn, even if it's slightly less convenient at first. We can then gradually work on improving those ergonomics for all async traits, both in the stdlib and ecosystem. This ensures consistency, enables us to design solutions which are shared by the entire async ecosystem, and prevents a possible permanent bifurcation of the async trait space.
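For readers who want to see what "finding a place to store the future" means in practice, here is a sketch of doing the boxing by hand on today's stable Rust. This is not the dyn* or Boxed design discussed above; it is a manual erasure in the spirit of what the async-trait crate does with Box. DynAsyncIterator, next_boxed, NoopWake, and first_meow are hypothetical names:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

trait AsyncIterator {
    type Item;
    async fn next(&mut self) -> Option<Self::Item>;
}

/// Hypothetical dyn-compatible companion trait: the future is boxed by hand.
trait DynAsyncIterator {
    type Item;
    fn next_boxed<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = Option<Self::Item>> + 'a>>;
}

/// Every `AsyncIterator` gets the boxed variant for free.
impl<T: AsyncIterator> DynAsyncIterator for T {
    type Item = T::Item;
    fn next_boxed<'a>(&'a mut self) -> Pin<Box<dyn Future<Output = Option<T::Item>> + 'a>> {
        // The explicit allocation is the "place to store" the future.
        Box::pin(self.next())
    }
}

struct Cat;

impl AsyncIterator for Cat {
    type Item = String;
    async fn next(&mut self) -> Option<String> {
        Some("meow".to_string())
    }
}

fn dyn_iter() -> Box<dyn DynAsyncIterator<Item = String>> {
    Box::new(Cat)
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls the first boxed `next` future once with a no-op waker.
fn first_meow() -> Poll<Option<String>> {
    let mut iter = dyn_iter();
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = iter.next_boxed();
    let out = fut.as_mut().poll(&mut cx);
    out
}
```

The cost is visible in the signature: every call allocates, which is exactly the kind of allocation the language is trying not to hide.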
"Cancellation Safety"
Another subtle outcome here is the interaction between async traits and the property that has historically been called "cancellation safety". I'm putting it in quotes because the property is not actually about whether it is "safe to cancel" a future. A few months ago I gave a talk about this, which I'll summarize in this section. I'll explain what "cancellation safety" is, how it currently falls short, and how we may be able to fix those shortcomings. I particularly want to show how we can bring the "cancellation safety" property into the type system, which could enable async functions (including AFIT) to automatically provide it.
"Cancellation safety" refers to the ability to drop and recreate futures without any "side-effects" being observable. It's important to note that the word "safety" in this context has nothing to do with memory safety: in the absence of linear types, all types in Rust must be memory-safe to drop, including futures. The "safety" in "cancellation safety" refers to logical correctness only. Being able to drop and recreate futures is the most relevant to the select! macro which commonly does that in a loop. But it can also come in useful if you're ever authoring future state machines by hand.
/// # Cancel safety
///
/// This method is cancel safe. If you use it as the event in a
/// `tokio::select!` statement and some other branch completes
/// first, then it is guaranteed that no data was read.
As I've mentioned, "cancellation safety" as a property currently only exists in
documentation. This means that in order to learn whether you can drop and
recreate futures without any observable side-effects you need to consult the
documentation for the method. The difference between async fn next and
fn poll_next is that for the former the property will be documented on the
implementation, whereas for the latter the property will be documented on the
trait definition. That practically means that with fn poll_next you'll be able
to remember that all calls to next are "cancellation-safe". Whereas with
async fn next you'll need to remember that for each implementation (or more
likely: remember which impls aren't "cancellation-safe").
But we should be legitimately asking ourselves whether using documentation for
this is the best we can do. RTFM is not really how we do things in the Rust
project, instead much preferring if the compiler can tell you when you've
messed up, which can then explain what to do instead. This is possible because
of Rust's "type safety" guarantee, and APIs such as select! are decidedly not
type-safe right now. Which is why even after successfully compiling code,
people regularly find runtime bugs in their select! statements.
I believe a better direction would be to bring "cancellation safety" out of the
docs and into the type system. We could do this by introducing a new subtrait,
which I'll refer to in this post as AtomicFuture (but we can call it anything
we like really, the semantics are what I care about most). We already have some
precedent for this in the iterator family of traits, where for example the
unstable TrustedLen trait provides additional guarantees on top of the existing
Iterator trait. I imagine this would look something like this:
//! ⚠️ This uses a placeholder name and is intended as a design-sketch only. ⚠️
/// A future which performs at most a single async operation.
trait AtomicFuture: Future {}
One of the downsides of the current "cancellation safety" system is that we have to manually inspect anonymous futures to determine whether they're cancellation-safe or not. I believe by bringing "cancellation safety" into the type system the compiler should be able to figure it out automatically if the following conditions are met:
- No local state: The async context must contain at most one .await point. That way if a future is cancelled, the cancellation will never occur between two .await points.
- Virality: All futures awaited inside the async context implement the trait.
Any async fn or async {} block would automatically be able to implement
AtomicFuture if those requirements are upheld, which I believe is something
which shouldn't be too hard to figure out from the generated state machine 3.
Maybe there is a case to be made for syntax for this too; I'm not sure. But
that's something we can figure out later.
Also: this is something we'll want to do regardless if we ever get generator
functions. It would be pretty bad if gen fn couldn't automatically implement
marker traits on the type it returns.
// ✅ -> impl Future + AtomicFuture
async fn foo() -> u32 { 12 }
// ✅ -> impl Future + AtomicFuture
async fn foo<F: AtomicFuture>(fut: F) -> F::Output {
    fut.await
}
// ❌ -> impl Future
async fn foo<F: Future>(fut: F) -> F::Output {
    fut.await
}
// ❌ -> impl Future
async fn foo<F: AtomicFuture>(fut1: F, fut2: F) {
    fut1.await;
    fut2.await;
}
The compiler will only be able to automatically figure out whether
AtomicFuture can be implemented for futures returned by async fn and
async {}. For manually implemented futures the author of the future will need
to uphold those guarantees themselves. The bar to meet there is that the future
should only perform a single operation, and then return. read is a good
example of a "stateless future", while the future join operation
is a good example of a future which is not.
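To illustrate why state held between two .await points matters, here is a small sketch in which dropping a future after a single poll silently loses an item. All names (recv, pause, recv_then_pause, cancel_after_one_poll, NoopWake) are hypothetical, and the VecDeque is only a stand-in for a real channel or socket:

```rust
use std::collections::VecDeque;
use std::future::{self, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once, then drops it.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let fut = pin!(fut);
    fut.poll(&mut cx)
}

/// A single operation with no `.await` points: nothing to lose on drop.
async fn recv(queue: &mut VecDeque<u32>) -> Option<u32> {
    queue.pop_front()
}

/// Suspends exactly once before completing.
async fn pause() {
    let mut paused = false;
    future::poll_fn(|cx| {
        if paused {
            return Poll::Ready(());
        }
        paused = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    })
    .await
}

/// Two `.await` points: `item` is local state held across the second one.
async fn recv_then_pause(queue: &mut VecDeque<u32>) -> Option<u32> {
    let item = recv(queue).await;
    pause().await;
    item
}

/// Polls `recv_then_pause` once, drops it, and reports what is left.
fn cancel_after_one_poll() -> (bool, Vec<u32>) {
    let mut queue = VecDeque::from([1, 2, 3]);
    let was_pending = poll_once(recv_then_pause(&mut queue)).is_pending();
    (was_pending, queue.into_iter().collect())
}
```

After the single poll the future is still pending, yet the first item has already been removed from the queue and is dropped together with the future. Under the rules sketched above, recv_then_pause would not qualify as an AtomicFuture, while recv on its own would.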
As a basis I think this is pretty good, but there are still some cases we
haven't covered yet. For example tokio's Mutex::lock operation only performs
one operation, but when it's dropped and recreated it'll be the last in the
unlock queue again - which in pathological cases could lead to certain unlocks
never being scheduled. It's not marked "cancellation-safe" in the tokio docs,
but the reason for that is not covered by the rules we've described so far.
I think practically we'll want to play around with the rules a little. I don't
think we want or even can practically nail down what a "side-effect" is in Rust.
But if we intentionally don't make the presence of AtomicFuture a safety
invariant, we can probably get away with adding broad rules such as:
"Cancelling and recreating an AtomicFuture should not cause logic errors"
What does that exactly mean? That's up for interpretation. But that might be fine for our purposes. I think it's worth experimenting a little, so we can settle on something that works for us. Ideally a definition which can be automatically implemented by the compiler for async blocks and functions, but if we can't that's probably also okay. The key takeaway here should be that if we care enough about "cancellation-safety" to make it a guarantee of our public APIs, we should be trying to bring it into the trait system instead.
Pin Ergonomics
I think if we're to have an honest conversation about poll-based APIs, we should
acknowledge that working with Pin is not a great experience for most people.
I'm a world-leading expert in async Rust, and four years post stabilization I
still regularly struggle to use it. I've seen folks on both T-lang and T-libs
struggle with it. I've seen very capable engineers at Microsoft struggle with
it. And if experienced folks struggle to use Pin, I don't think we can
reasonably expect those with less experience to use it without many problems
either.
I've covered some of the unique difficulties of Pin on this blog before. I
believe pin is hard to use in part because pin inverts Rust's usual semantics
via a combination of auto-traits and wrapper structs. Another reason is because
it relies heavily on the concept of "pin-projection" which requires unsafe and
is not directly represented in either the language or the stdlib. Which in turn
means that even the most common interactions with Rust's pinning system rely on
third-party ecosystem crates such as pin-project and until recently pin-utils.
We currently don't have a clear path to fix Pin's ergonomics. If we want Rust to
provide a way to make pin projections safe, we'll at least need to make a
breaking change to the Unpin trait 4. As well as a whole lot of design work to
integrate it into the language. And while this may be something we'll want to do
eventually, we really need to ask ourselves whether this is something we want to
do now. Because taking on one project necessarily means not taking on another.
Safe pin projections in the language necessarily require that Unpin is an
unsafe trait. It's currently not unsafe, and changing it would be a breaking
change.
I honestly think async functions in traits, async drop, async iteration, and
async closures all are far more important things to work on right now, and they
should take priority in WG-async over directly fixing Pin's rough edges. The
most effective strategy to reduce the cost of Pin we have in Rust right now is
to simply make pin less common. If Pin is only needed to implement intrusive
collections, self-referential types, and manual future state machines then for
the time being we can think of it as an experts-only system. And once we have
more bandwidth we can revisit it later to tidy it up.
Evolution
A thing that worries me in particular about fn poll_next is its inability to
co-evolve with Rust's needs. As we've seen we're not just considering adding an
"async version of iterator", we're also talking about "a pinned version
of iterator". But there are other flavors of iterator being discussed as well,
such as: "a fallible version of iterator", "a lending version of
iterator", "a const version of iterator", and so on. It's impossible to
know today which requirements Rust will have ten years from now. All we really
know is that the decisions we make then will be affected by the decisions we
make today 5.
Ten years ago we were unsure whether it was even possible to publish a new systems programming language. Five years ago we were unsure whether Rust would be able to escape its browser niche and reach mainstream adoption. Flash forward to today, and Rust is being used in the hearts of major operating systems (Android, Linux, and Windows). As well as every packet on the internet more likely than not being routed through some Rust code at some point (Cloudflare, AWS, and Azure). I don't think we can reasonably anticipate all the requirements Rust will face ten years from now. So I believe one of the most important things for a programming language is to find ways to keep evolving the language in the face of changing requirements.
Say for a second we did add an fn poll_next-based version of iterator. If we
then wanted to add, say, a lending version of iterator, we'd probably also want
to add a lending version of async iterator too. That would be four separate
families of traits, and that's just for two variants (async and lending). If we
actually wanted to add support for "pinned iterators" or "fallible iterators"
we'd be looking at eight or maybe even sixteen different families of traits,
since each additional variant doubles the count. And nobody reasonably wants
that.
This is the reason why I believe async fn next is the better direction. If we
add it to the stdlib via an effect-generic mechanism, both "iterator" and "async
iterator" could be served using a single trait which is generic over the async
effect. But we could use the same mechanism for other traits such as Read and
Write, but also maybe some less common traits like Into and Display.
//! A version of iterator which can be implemented as either sync or async.
//! This uses placeholder syntax just to illustrate the idea.
#[maybe_async]
trait Iterator {
type Item;
#[maybe_async]
fn next(&mut self) -> Option<Self::Item>;
}
fn poll_next forces us to duplicate existing interfaces in order to expose
those same capabilities. While async fn next enables us to extend existing
interfaces with new capabilities. It doesn't immediately solve all of the
limitations of iterator; effect generics won't provide an answer for how to add
support for "lending" or "pinned" iterator variants. But it provides an answer
to some other problems, like how to add support for async, const, and
fallibility. And in doing so encourages us to find similar solutions for the
problems which remain, even if we don't yet know what they'll look like.
Conclusion
In this post I've presented a case for basing the async Iterator trait on async fn next over fn poll_next. To summarize the arguments:
- Performance: Iterators based on async fn next and fn poll_next generate identical code, which means they have identical performance profiles. Implementations of async fn next which need to access the low-level future state machine can do so using poll_fn.
- Self-referential iterators: fn poll_next is the async version of PinnedIterator, async fn next is the async version of Iterator. Adding a trait for pinned async iteration could be useful, but realistically it should mirror a synchronous "pinned iterator" design. And it could be written using async fn next taking a pinned self-type.
- Unpin bounds: The presence of extra Unpin bounds is a way in which async traits presently meaningfully deviate from their synchronous counterparts. It makes it seem like deeper changes are needed to get to the same async functionality. By not requiring self: Pin<&mut Self>, none of the additional where Self: Unpin bounds seen on methods such as StreamExt::next are required.
- Implementation vs Usage: async fn next has one method to both implement and use. fn poll_next requires two methods: one which must be implemented, and another which must be called. This is inherently more difficult to use, and as a mechanism is unique to poll functions other than Future.
- Object Safety: async fn next and fn poll_next need to use subtly different mechanisms to create dynamically dispatched objects. Adding support for dynamic dispatching is still in-progress for AFITs, but that seems like it's mostly a matter of time. Neither approach is likely going to be much harder to use than the other, but the subtle differences may be difficult to internalize for users. Diagnostics seem like they'll play an important role.
- "Cancellation Safety": "cancellation safety" is currently a documentation-only property. If async iterator is based on fn poll_next users can blindly assume every implementation provides a cancellation-safe next method. If async iterator is based on async fn next users will have to check the implementation's docs to learn whether the next method is "cancellation-safe". However, rather than keeping "cancellation-safety" as a documentation-only property, we should probably be working to bring it into the type system instead.
- Pin Ergonomics: The Pin family of APIs in Rust is notorious for being difficult to use. One of the most effective ways we have to reduce the difficulty of Pin is by limiting users' exposure to it. By basing Rust's core async traits on async functions we can reduce the number of pin-based APIs, making async Rust more accessible to more people.
- Evolution: async fn next can be implemented as an extension to the existing iterator trait via effect generics. fn poll_next would most likely need to be implemented as a standalone trait, resulting in a manual duplication of the APIs. Support for async is not the last feature we'll want to add to iterators: there are currently ecosystem demands to support self-referential iteration, lending iteration, and fallible iteration. It's not practical to add individual traits for all of these and their combinations. Nor is it reasonable to expect we can anticipate all needs which may arise in the future. By extending rather than duplicating we
I believe basing the async Iterator trait on async fn next is superior
across all axes, and I hope this post sufficiently makes that case. In a future
post I'd like to round out this series by covering the desugaring of an async
iteration notation. Together with an RFC for effect-generic trait definitions,
this will be one of the last necessary steps before we can re-RFC RFC
2996: Async Iterator to cover the full scope of async iteration.
Thanks to Eric Holk for reviewing an earlier draft of this post.