Fallible Iterator Adapters
— 2020-04-12
A great aspect of the Rust stdlib is that many common operations are provided with a shared idiom. Rather than say having to remember how to write a correct sort function by hand, the stdlib provides a fast one for you.
One of the places with the most such conveniences is the
std::iter::Iterator
trait. This trait provides a rich set of operations for filtering, combining
and merging streams of items. And because iterators are lazy they're
incredibly efficient too 1.
Work doesn't start until a driver method calls it. Examples include
next
, fold
, and for_each
. Also because of the way Rust's iterators are
setup the compiler can inline them until only the smallest possible state
machine remains. Not all languages can do this, and the result is really cool!
However one downside to iterators is that they don't play well with all
input. In particular there are some limits around Result
(and Try
in
general). When using any stream adapter with fallible input2, we generally
want to be able to do several things:
- We want to stop iterating (short-circuit) when we encounter an error.
- We want to be able to return the error out of the iterator so it can be bubbled up.
- We never want to have to ignore errors.
Someone who understands monads better would probably be able to state this more eloquently. But I'm not a language designer, so hopefully I can get away with waving my hands a bit here.
For example with the try_for_each
function we're able to take a vec of
names, and perform some IO operation on them. This will stop iterating over
file names if an error occurs, and will immediately return the error:
use std::io::{stdout, Write};
let data = ["no_tea.txt", "stale_bread.json", "torrential_rain.png"];
data.iter().try_for_each(|x| {
writeln!(stdout(), "{}", x)?;
Ok(())
})?;
This works pretty smoothly. Now imagine we wanted to do something with IO operations. Say we had a folder for a database, and we wanted to compare the content of several files with each other, finding the smallest string by index 3.
Yes I know this example is completely unreasonable, but it's late
here and I had to come up with something that did some IO. min_by
is the
actual method a friend brough up when building a database though, so at least
we're hitting the right method here (:
Now we would likely want to use something like fs::read
and
Iterator::min_by
, which we could do as such:
let offsets = [1, 4, 9, 18, 22, 17]
let res = db_files.iter().min_by(|x, y| {
let fx = fs::read_file(format!(".db/{}.index"), x).unwrap();
let fy = fs::read_file(format!(".db/{}.index"), y).unwrap();
fx.cmp(fy)
});
Now as you might have noticed, we need to use unwrap
here. Oh no! What this
practically means is that if any error occurs when reading files our program
will panic. Which means that Iterator::min_by
probably shouldn't be used
here, and we should turn this into manual loop logic instead.
Surveying Fallible Adapters
In order to figure out which methods are incompatible with fallible
operations we should take a look at all methods on Iterator
. The only
methods that have the potential to be incompatible are the ones that take
closures.
We've created an overview of all iterator methods that take closures. Here we look at the iterator adapter name, the signature of the closure that's passed, and whether it can short-circuit on error. Finally we also look at whether it has a fallible counterpart that can short-circuit.
In the following overview there are probably ways to make certain methods
work with errors, or perhaps they work if Result
has been passed into the
method, but does not originate from the method. The measure we'll be using
is whether the ?
operator can be used inside the closure.
Name | Closure Signature | Accepts try operator? | Fallible counterpart |
---|---|---|---|
all | FnMut(Self::Item) -> bool | No | - |
any | FnMut(Self::Item) -> bool | No | - |
cmp_by | FnMut(Self::Item, <I as IntoIterator>::Item) -> Ordering | No | - |
eq_by | FnMut(Self::Item, <I as IntoIterator>::Item) -> Ordering | No | - |
filter | FnMut(&Self::Item) -> bool | No | - |
filter_map | FnMut(Self::Item) -> Option<B> | Yes | - |
find | FnMut(&Self::Item) -> bool | No | try_find |
find_map | FnMut(Self::Item) -> Option<B> | Yes | - |
flat_map | FnMut(Self::Item) -> impl IntoIterator | Yes | - |
fold | FnMut(B, Self::Item) -> B | No | try_fold |
for_each | FnMut(Self::Item) | No | try_for_each |
inspect | FnMut(&Self::Item) | Yes | - |
is_partitioned | FnMut(Self::Item) -> bool | No | - |
is_sorted_by | FnMut(&Self::Item, &Self::Item) -> Option<Ordering> | No | - |
is_sorted_by_key | FnMut(&Self::Item) -> impl PartialOrd | No | - |
map | FnMut(Self::Item) -> B | Yes | - |
max_by | FnMut(&Self::Item, &Self::Item) -> Ordering | No | - |
max_by_key | FnMut(&Self::Item) -> impl Ord | No | - |
min_by | FnMut(&Self::Item, &Self::Item) -> Ordering | No | - |
min_by_key | FnMut(&Self::Item) -> impl Ord | No | - |
partial_cmp_by | FnMut(Self::Item, <I as IntoIterator>::Item) -> Option<Ordering> | No | - |
partition | FnMut(&Self::Item) -> bool | No | - |
partition_in_place | FnMut(&T) -> bool | No | - |
position | FnMut(Self::Item) -> bool | No | - |
rposition | FnMut(Self::Item) -> bool | No | - |
scan | FnMut(&mut St, Self::Item) -> Option<B> | No | - |
skip_while | FnMut(&Self::Item) -> bool | No | - |
take_while | FnMut(&Self::Item) -> bool | No | - |
try_find | FnMut(&Self::Item) -> impl Try<Ok = bool, Error = E> | Yes | - |
try_fold | FnMut(B, Self::Item) -> impl Try<Ok = B> | Yes | - |
try_for_each | FnMut(Self::Item) -> impl Try<Ok = ()> | Yes | - |
Out of the 63 current stream adapters available, 30 take closures. Of those
30 adapters only 10 can bubble up a Result
or have a counterpart that can
do so. That means that 20 methods, or almost a third of the functionality in
Iterator
is currently incompatible with fallible operations.
Implementing missing adapters
If we wanted to expose the full range of functionality to fallible iterators,
we would likely need to create try_
counterparts to the stream adapters
that currently don't support it. This is a list of methods and their proposed
fallible counterparts:
Name | Fallible counterpart |
---|---|
all | try_all |
any | try_any |
cmp_by | try_cmp_any |
eq_by | try_eq_by |
filter | try_filter |
inspect | try_inspect |
is_partitioned | try_is_partitioned |
is_sorted_by | try_is_sorted_by |
is_sorted_by_key | try_is_sorted_by_key |
max_by | try_max_by |
max_by_key | try_max_by_key |
min_by | try_min_by |
min_by_key | try_min_by_key |
partial_cmp_by | try_partial_cmp_by |
partition | try_partition |
partition_in_place | try_partition_in_place |
position | try_position |
rposition | try_rposition |
scan | try_scan |
skip_while | try_skip_while |
take_while | try_take_while |
Conclusion
In this post we've looked at the various adapters std::iter::Iterator
provides, talked about error handling, and surveyed how existing methods
interact with them. Finally we've provided an overview of methods that
currently don't provide support for the try operator in the closure body, and
which counterparts would need to be added so they could work.
Some readers may be wondering now whether it's actually worth adding several
new methods to Iterator
, and whether that's perhaps too much. But I don't
think counting methods is a particularly effective way of making API
decisions 4. Instead it's better to look at overall cohesion and
usability.
I think a more useful framework for answering this design question is to ask: "Are we okay that a third of the methods on iterator only work some of the time?" I think the answer there should be a clear "no". If a person wants to perform IO operations while filtering some input, they should be able to, without needing to start over and write manual loops from scratch.
I've definitely done this a lot in the past, and I was wrong for doing so. Growth.