The gen auto-trait problem
— 2025-01-13
- leaking auto-traits from gen bodies
- preventing the leakage
- what about other effects and auto-traits?
- conclusion
One of the open questions surrounding the unstable gen {} feature is whether
it should return Iterator or IntoIterator. People have had a feeling there
might be good reasons for it to return IntoIterator, but couldn't necessarily
articulate why, which is why it was included in the "unresolved questions"
section of the gen blocks RFC.
Because I'd like to see gen {} blocks stabilize sooner, I figured it would be
worth spending some time looking into this question and see whether there are
any reasons to pick one over the other. And I have found what I believe to be a
fairly annoying issue with gen returning Iterator that I've started calling
the gen auto-trait problem. In this post I'll walk through what this problem is,
as well as how gen returning IntoIterator would prevent it. So without
further ado, let's dive in!
Leaking auto-traits from gen bodies
The issue I've found has to do with auto-trait impls on reified gen {}
instances. Take the thread::spawn API: it takes a closure which returns a type
T. If you haven't seen its definition before, here it is:
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
This signature states that the closure F and return type T must be Send.
What it doesn't state is that all local values created inside the closure
F must be Send too, a requirement that is common in the async world. Now let's
create an iterator using a gen {} block that we try and send across threads. We
can create an iterator that yields a single u32:
let iter = gen {
    yield 12u32;
};
Now if we try and pass this to thread::spawn, there are no problems. Our
iterator implements Send and things will work as expected:
// ✅ Ok
thread::spawn(move || {
    for num in iter {
        println!("{num}");
    }
}).join().unwrap();
Now to show you the problem, let's try this again but this time with a gen {}
block that internally creates a std::rc::Rc. This type is !Send, which means
that any type that holds it will also be !Send. That means that iter in our
example here is now no longer thread-safe:
let iter = gen { // `-> impl Iterator + !Send`
    let rc = Rc::new(...);
    yield 12u32;
    rc.do_something();
};
That means if we try and send it across threads we'll get a compiler error:
// ❌ cannot send a `!Send` type across threads
thread::spawn(move || { // ← `iter` is moved
    for num in iter { // ← `iter` is used
        println!("{num}");
    }
}).join().unwrap();
And that right there is the problem. Even though iterators are lazy and the Rc
in our gen {} block isn't actually constructed until iteration begins on the
new thread, the generator block needs to reserve space for it internally and so
it inherits the !Send restriction. This leads to incompatibilities where
locals defined entirely within gen {} itself end up affecting the
public-facing API. This ends up being very subtle and tricky to debug, unless
you're already familiar with the generator desugaring.
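To make the "reserve space" point concrete, here is a minimal sketch on stable Rust of what a state machine for the gen {} block above might look like. This is not the actual compiler output: the GenState enum, its variants, and the Probe helper are all hypothetical names made up for illustration, and Rc::strong_count stands in for rc.do_something(). The sketch assumes the lowered type needs one state that stores the Rc across the yield, which is enough to make the whole type !Send.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Hypothetical hand-written stand-in for the state machine behind
/// `gen { let rc = Rc::new(..); yield 12u32; rc.do_something(); }`.
#[allow(dead_code)]
enum GenState {
    /// Nothing has run yet, so no `Rc` exists.
    Start,
    /// Suspended at the `yield`; `rc` has to be stored for later use.
    Suspended { rc: Rc<u32> },
    /// The body has run to completion.
    Done,
}

impl Iterator for GenState {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        match std::mem::replace(self, GenState::Done) {
            GenState::Start => {
                // The `Rc` is only created once iteration begins.
                let rc = Rc::new(0u32);
                *self = GenState::Suspended { rc };
                Some(12)
            }
            GenState::Suspended { rc } => {
                // Stand-in for `rc.do_something()`.
                let _ = Rc::strong_count(&rc);
                None
            }
            GenState::Done => None,
        }
    }
}

// Compile-time probe for `Send` (illustrative helper, not a std API):
// the inherent constant is picked when `T: Send` holds, otherwise the
// lookup falls back to the blanket trait constant.
trait Fallback {
    const IS_SEND: bool = false;
}
impl<T: ?Sized> Fallback for T {}

#[allow(dead_code)]
struct Probe<T: ?Sized>(PhantomData<T>);

#[allow(dead_code)]
impl<T: ?Sized + Send> Probe<T> {
    const IS_SEND: bool = true;
}

/// Reports whether the state machine as a whole is `Send`.
fn gen_state_is_send() -> bool {
    <Probe<GenState>>::IS_SEND
}

/// Sanity check for the probe on a type that is known to be `Send`.
fn u32_is_send() -> bool {
    <Probe<u32>>::IS_SEND
}
```

Even a freshly created GenState::Start value, which holds no Rc at all, is !Send here, because auto-traits are computed from every field the type could ever hold rather than from the state it is currently in.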
Preventing the leakage
Solving the gen auto-trait problem is not that hard. What we want is for the
!Send fields in the generator to not show up in the generated type until we
are ready to start iterating over it. That sounds a little scary, but in
practice all we have to do is have gen {} return an impl IntoIterator
rather than an impl Iterator. The actual Iterator will still be !Send, but
our type implementing IntoIterator will be Send:
let iter = gen { // `-> impl IntoIterator + Send`
    let rc = Rc::new(...);
    yield 12u32;
    rc.do_something();
};
Since our value iter implements Send we can now happily pass it across
thread boundaries. And our code continues to operate as expected, as for..in
operates on IntoIterator, which is implemented for all Iterator types:
// ✅ Ok
thread::spawn(move || { // ← `iter` is moved
    for num in iter { // ← `iter` is used
        println!("{num}");
    }
}).join().unwrap();
If you think about it, from a theoretical perspective this makes a lot of sense
too. We can think of for..in as our way of handling the iteration effect,
which expects an IntoIterator. gen {} is the dual of that, used to create
new instances of the iteration effect. It's not at all strange for it to return
the same trait that for..in expects, with the added bonus that it doesn't leak
auto-traits from impl bodies to the type's signature.
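The same idea can be approximated on stable Rust today, which may help show why the IntoIterator version is Send. The sketch below is only an analogy for what gen {} could do, not a proposed implementation: LazyGen and collect_on_other_thread are hypothetical names, and a closure stands in for the gen body. The closure captures nothing, so the wrapper is Send; the Rc only comes into existence once into_iter is called on the new thread.

```rust
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for the value a `gen {}` block could return:
/// it only becomes the real iterator once iteration starts.
struct LazyGen<F>(F);

impl<F, I> IntoIterator for LazyGen<F>
where
    F: FnOnce() -> I,
    I: Iterator,
{
    type Item = I::Item;
    type IntoIter = I;

    fn into_iter(self) -> I {
        // The `!Send` iterator is constructed here, on whichever
        // thread starts the iteration.
        (self.0)()
    }
}

/// Builds the lazy value on the current thread, moves it to a new
/// thread, and iterates over it there.
fn collect_on_other_thread() -> Vec<u32> {
    // `iter` is `Send`: the closure captures no `Rc`.
    let iter = LazyGen(|| {
        let rc = Rc::new(12u32);
        // The returned iterator owns `rc` and is therefore `!Send`.
        std::iter::once_with(move || *rc)
    });

    thread::spawn(move || {
        let mut seen = Vec::new();
        for num in iter {
            seen.push(num);
        }
        seen
    })
    .join()
    .unwrap()
}
```

Calling collect_on_other_thread() compiles and runs even though the iterator produced by into_iter holds an Rc, because that iterator is a local of the spawned closure rather than part of the value being sent.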
What about other effects and auto-traits?
This issue primarily affects effects which make use of the generator
transform. That is: iteration and async. In theory I believe that, yes, async {}
should probably return IntoFuture rather than Future. In WG Async we regularly
see issues that have to do with auto-traits leaking from inside function bodies
out into the type's signatures. If async (2019) had returned IntoFuture (2022)
rather than Future (2019), that certainly seems like it could have helped.
Though there would be a higher barrier to making that change today now that
things are already stable.
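For the async side, a comparable sketch can be written against the stable IntoFuture trait. Everything here is an assumption for illustration: LazyFuture, RcFuture, NoopWake, block_on, and await_on_other_thread are hypothetical names, RcFuture is a hand-written future standing in for an async {} block that keeps an Rc alive, and block_on is a deliberately tiny busy-polling executor rather than anything you would use in practice.

```rust
use std::future::{Future, IntoFuture};
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Hand-written future standing in for an `async {}` block that holds
/// an `Rc`. Because of the `Rc` field it is `!Send`.
struct RcFuture {
    rc: Rc<u32>,
}

impl Future for RcFuture {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        Poll::Ready(*self.rc)
    }
}

/// Hypothetical stand-in for what `async {}` could have returned:
/// a `Send` value that only becomes the future when awaited.
struct LazyFuture<F>(F);

impl<F, Fut> IntoFuture for LazyFuture<F>
where
    F: FnOnce() -> Fut,
    Fut: Future,
{
    type Output = Fut::Output;
    type IntoFuture = Fut;

    fn into_future(self) -> Fut {
        (self.0)()
    }
}

/// Waker that does nothing; enough for a busy-polling executor.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor: polls the future in a loop until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::yield_now();
    }
}

/// Sends the lazy value to another thread and drives it there.
fn await_on_other_thread() -> u32 {
    // The closure captures nothing, so `lazy` is `Send`.
    let lazy = LazyFuture(|| RcFuture { rc: Rc::new(12) });
    thread::spawn(move || block_on(lazy.into_future()))
        .join()
        .unwrap()
}
```

As with the iterator case, the !Send future only exists on the thread that calls into_future, so the value crossing the thread boundary carries no trace of the Rc.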
On the trait side this doesn't just affect Send either: it applies to all
auto-traits, present and future. Though Send is by far the most common trait
people will experience issues with today, to a lesser extent this also already
applies to Sync. And in the future possibly also auto-traits like Freeze, Leak,
and Move. Though this shouldn't be the main motivator, preventing potential
future issues is not worthless either.
Conclusion
Admittedly the worst part of gen {} returning IntoIterator is that the name
of the trait kind of sucks. IntoIterator sounds auxiliary to Iterator, and
so it feels a little wrong to make it the thing we return from gen {}. But on
top of that: that's a lot of characters to write.
I wonder what would have happened if we'd taken an approach more similar to
Swift. Swift has Sequence, which is like Rust's IntoIterator, and
IteratorProtocol, which is like Rust's Iterator. The primary interface people
are expected to use is short and memorable. The secondary interface isn't meant
to be used directly, and so it has a much longer and less memorable name. As
we're increasingly thinking of IntoIterator as the primary interface for async
iteration, maybe we'll want to revisit the trait naming scheme in the future.
In conclusion: it seems like a good idea to prevent auto-traits from leaking from gen bodies when reified. This is the kind of issue that we should prevent if we can, as it can be both difficult to diagnose and annoying to work around. Having auto-trait leakage be a non-issue for generator blocks seems worthwhile.