State Machines III: Type States
— 2023-01-02
- setting the stage
- where type states run into trouble
- doing better with enums
- embedding in structs
- enum method unification
- enums as type params
- enum variant tracking
- narrowing of state modifications
- language feature overview
- conclusion
I've been writing a bit about state machines recently. In my last post I explained how we could potentially leverage arbitrary self-types and anonymous enums to support type-level state machines as a first-class construct. Which is cool, but it had some readers asking: "How does this improve on the existing type state pattern?" Which is fair!
Well, to start off by answering that question: it's only after authoring that post that I learned that "type states" have a name! I was familiar with the pattern, but I didn't realize it had a name. If you'd like to learn more about them, Will Chrichton gave an excellent presentation about API design which also covers type states.
Though type states are powerful, they're not the easiest to use - and can be limited in expressivity at times. In this post I want to show what some of those limitations are, and how we can over come them by encoding them using first-class type-level state machines instead.
editor's note: I tried capping this post off at 500 words and failed. I'm sorry, here's 2500 words instead.
Setting the stage
We'll continue using the example we've been using in the past two posts; a traffic light which can transition between states. We'll take the simpler variant of that for this post:
- green can transition to yellow
- yellow can transition to red
- red can transition to green
If we want to encode it with the type-state pattern, we have to create separate structs for each of the states, and then parameterize the traffic light struct over the state. This is roughly what that looks like (playground):
use std::marker::PhantomData;
struct Green;
struct Yellow;
struct Red;
struct TrafficLight<S> {
_phantom: PhantomData<S>
}
impl TrafficLight<Green> {
fn new() -> TrafficLight<Green> {
TrafficLight { _phantom: PhantomData }
}
}
impl TrafficLight<Green> {
fn next(self) -> TrafficLight<Yellow> {
TrafficLight { _phantom: PhantomData }
}
}
impl TrafficLight<Yellow> {
fn next(self) -> TrafficLight<Red> {
TrafficLight { _phantom: PhantomData }
}
}
impl TrafficLight<Red> {
fn next(self) -> TrafficLight<Green> {
TrafficLight { _phantom: PhantomData }
}
}
fn main() {
let light = TrafficLight::new(); // TrafficLight<Green>
let light = light.next(); // TrafficLight<Yellow>
let light = light.next(); // TrafficLight<Red>
let light = light.next(); // TrafficLight<Green>
}
This makes it so that if we attempt to call to_red
when we have a
TrafficLight<Green>
, the compiler will disallow it. Type states allow us to
encode program invariants into the type system, preventing it from causing
runtime bugs.
Where type states run into trouble
Type states become tricky when you start wanting to combine them with runtime logic. After all, a traffic light example is nice: but what if we're actually trying to run it? Take for example the following code, which can't be neatly represented using type states:
use futures_lite::prelude::*;
use futures_time::prelude::*;
use futures_time::time::Duration;
use futures_time::stream;
let mut light = TrafficLight::new();
let mut interval = stream::interval(Duration::from_secs(20))
while let Some(_) = interval.next().await {
light = light.next(); // Cycle to the next state every 20ms
}
We want to switch to the next state
every 20s, but this won't compile because
TrafficLight<Green>
and TrafficLight<Yellow>
aren't the same type, so they
can't be re-assigned to the same value. In order to do so we can match the
type states back to an actual enum, and wrap the type-state variant in it. This
could look something like this (playground):
use futures_lite::prelude::*;
use futures_time::prelude::*;
use futures_time::time::Duration;
use futures_time::stream;
enum TrafficLightWrapper {
Green(TrafficLight<Green>),
Yellow(TrafficLight<Yellow>),
Red(TrafficLight<Red>),
}
impl TrafficLightWrapper {
pub fn new() -> Self {
TrafficLightWrapper::Green(TrafficLight::new())
}
pub fn next(self) -> Self {
match self {
TrafficLightWrapper::Green(inner) => TrafficLightWrapper::Yellow(inner.next()),
TrafficLightWrapper::Yellow(inner) => TrafficLightWrapper::Red(inner.next()),
TrafficLightWrapper::Red(inner) => TrafficLightWrapper::Green(inner.next()),
}
}
}
let mut light = TrafficLightWrapper::new();
let mut interval = stream::interval(Duration::from_secs(20))
while let Some(_) = interval.next().await {
light = light.next();
}
This is a non-trivial amount of boilerplate, just to work with code which uses the type state pattern at runtime. 50 lines to encode just 3 states is probably too much to be practical for most uses, even if the resulting API is flexible enough to be used in a variety of scenarios. We should be able to do better than this!
Doing better with enums
Now, say we had access to
enum-variant-types
(enum
variants as first-class types), and
arbitrary-self-types
, we
could imagine a reformulation of the traffic light as follows:
//! A basic traffic light state machine
//! Novel language features: enum-variant-types, arbitrary-self-types
enum TrafficLight {
Green,
Yellow,
Red,
}
impl TrafficLight {
pub fn new() -> Self::Green {
Self::Green
}
pub fn next(self: Self::Green) -> Self::Yellow {
Self::Yellow
}
pub fn next(self: Self::Yellow) -> Self::Red {
Self::Red
}
pub fn next(self: Self::Red) -> Self::Green {
Self::Green
}
}
fn main() {
let mut light = TrafficLight::new(); // TrafficLight<Green>
light = light.next(); // TrafficLight<Yellow>
light = light.next(); // TrafficLight<Red>
light = light.next(); // TrafficLight<Green>
}
Because TrafficLight
is now a single type, it can be assigned to the same
variable without a problem. Paving over some details, we could imagine the
following could then work as well:
// ..imports..
let mut light = TrafficLight::new();
let mut interval = stream::interval(Duration::from_secs(20))
while let Some(_) = interval.next().await {
light = light.next(); // Cycle to the next state every 20ms
}
And that, in a nutshell, is why I believe that access to first-class state machines at the type level would be helpful. It would enable us to use type states in more places, which in turn would allow us to create more robust abstractions, which yet in turn lead to fewer bugs.
Embedding in structs
Being able to treat enums as type-level state machines has other benefits as well: it makes it so when you embed a state machine in another type, the state itself doesn't necessarily need to be observable from the outside either. Take the following code:
struct Green;
struct Yellow;
struct Red;
struct TrafficLight<S> {
_phantom: PhantomData<S>
}
// A new struct containing multiple traffic lights
struct Crosswalk<S1, S2> {
light1: TrafficLight<S1>,
light2: TrafficLight<S2>,
}
impl<S1, S2> Crosswalk<S1, S2> {
pub async fn run(self) {
// loop here.
}
}
The internal states for each of the traffic lights necessarily becomes externally observable, even if no external methods ever need to observe it. The internal state leaks to the external API surface, making it a leaky abstraction. We could of course define the wrapper type we saw earlier again. But say we did have first-class state machines, we could encode it directly like this instead:
enum TrafficLight {
Green,
Yellow,
Red,
}
// A new struct containing multiple traffic lights
struct Crosswalk {
light1: TrafficLight,
light2: TrafficLight,
}
impl CrossWalk {
pub async fn run(self) {
// loop here.
}
}
We could still choose to surface the enums to the outside world if we wanted to (more on that later in this post). But there is a real benefit to not having to if you don't want to, without needing to define pages of boiletplate!
Enum method unification
One assumption I've made is that if you have an enum with three states, and you define a method for all three states, then that method should become available even if we don't know which state we're in.
Using match
we can always find out what state we're in by inspecting the
enum's tag, meaning we can always figure out what state we're in and what state
we should transition to, at runtime.
This also means that, say, if a method is only available for 2/3 of the states,
as long as we can match
down to those states, the method should be available:
enum RGB { Red, Green, Blue }
impl RGB {
fn test(self: Self::Green | Self::Blue) {
println!("can print!");
}
}
fn try_to_print(rgb: RGB) {
match rgb {
RGB::Green | RGB::Blue => rgb.print(), // `print` is defined for both states here
_ => { /* do nothing */ }
}
}
Enums as type params
Even if we could hide the internal state sometimes, doesn't mean we'd always
want to do it. I think one other benefit of "enums as state machines" in this
model would be just to improve the existing type-state pattern. Take for example
my tasky
crate. In v3.0.0 we
distinguished between a "local builder" and "non-local builder" at the type
level. This was needed so we could add different IntoFuture
implementations to
them. This is how they're defined:
/// Sealed trait to determine what type of builder we got.
mod sealed {
pub trait Kind {}
}
/// A local builder.
pub struct Local;
impl sealed::Kind for Local {}
/// A nonlocal builder.
pub struct NonLocal;
impl sealed::Kind for NonLocal {}
/// Task builder that configures the settings of a new task.
pub struct Builder<Fut: Future, K: sealed::Kind> {
kind: PhantomData<K>,
future: Fut,
builder: ...,
}
That's a lot of work to encode the state 1. And just from glancing over the rustdoc output you'd never know that that's what this is for. Instead it'd be far nicer if we could define typestates using enums, and not actually have to bother with sealed traits and disjointed structs:
A note on sealed traits: this is the first time in the post we're using them, but they're useful to ensure that only types defined within the crate can ever be passed to it. If we had support for concrete types in bounds + enums as first-class types, then we'd probably have less need for sealed traits to be used in this way.
enum BuilderKind {
Local,
NonLocal,
}
/// Task builder that configures the settings of a new task.
#[derive(Debug)]
pub struct Builder<Fut: Future, K: BuilderKind> {
kind: K,
future: Fut,
builder: ...,
}
I believe doing this would require us to enable using concrete types as type-bounds. I believe this is something which comes up repeatedly, and we do want to resolve eventually for other reasons as well. Perhaps "better typestates" could be added to that list as well now?
Enum variant tracking
Another feature which would probably be helpful here would be something I call
"enum variant tracking". Take for example the following code, which inserts a
value into Option
if it doesn't exist, and then provides us with mutable
access to it:
let mut option = None;
if option.is_none() {
option = Some(12);
}
let num: &mut u32 = option.as_mut().unwrap(); // may panic
Sure, Option::unwrap_or_else
exists and does the same. But assume for a second
we've defined our own Option
, and we need to implement our own. We should
never have to unwrap
here, since we know from the code that we're
guaranteed to always have a Some
variant here. And the is_none
check is
redundant here, since we're guaranteed to always have a None
variant to start
with.
The idea is that if we know for a fact that we're dealing with a specific
variant, we should be able to skip the ceremony and checks to access that
specific variant. Instead of just knowing we have an Option
, with "enum
variant types" we should be able to accurately track that we in fact start with an
Option::None
, which always becomes Option::Some
after. The latter might be a
bit more tricky, but given both option
and is_some
are const
, we could
imagine that the compiler should be able to always know this.
To clarify though: while I think this feature might make sense from a
theoretical perspective, it may not make sense from a practical one. I'm not
trying to argue we should or shouldn't support this, more that it might be
within the realm of possibility if we get "enum variant types". This
feature would also only be intended for local reasoning only. Nothing should
ever cross function boundaries; other features should be available to enable us
to more granularly annotate things. I think if we ever start to get serious
about wanting to allow people to write panic-free code, this may become
interesting. I'm kind of reminded of the (unstable)
Error::into_ok
method, which is gated behind the unwrap_infallible
feature on nightly.
let mut option = None;
if option.is_none() {
option = Some(12);
}
let num: &mut u32 = option.as_mut().into(); // will never panic
narrowing of state modifications
Something we haven't yet covered, but probably should is the fact that all the
state transition things we've talked about only work with owned types. They
don't at all work with mutable references. When we're calling next
on an
iterator we get a &mut Self
type. When transitioning between state machines,
wouldn't it make sense for the state transitions to be able to mutate
Self
rather than always having to return a value?
I've saved this one for last, because this is where I really start to hesitate to be honest. All the features we've covered so far are in my opinion fairly natural extensions to what we already have. Subsets of enums, being able to use enums in more places, being able to narrow our views. What we're talking about here feels much further out there, and so I'm also way less sure whether it's a good idea.
The basic premise is as follows, we want to enable the following to work:
fn main() {
let mut light = TrafficLight::new(); // TrafficLight<Green>
light.next(); // TrafficLight<Yellow>
light.next(); // TrafficLight<Red>
light.next(); // TrafficLight<Green>
}
This means we'd need to find a syntax in the view type to not only describe what types go in, but also what the type will transition to. Niko's view type post didn't cover this, and my previous state machine post didn't cover this either. But at least conceptually, it seems to me that if we would enable a narrowing of function parameters and function return types, we might also want to provide a way to express a narrowing of value modifications.
I'm completely just riffing here, but maybe something like this? This is
supposed to indicate that after the function returns, the value of self
will
be Self::Yellow
.
impl TrafficLight {
pub fn next(self: Self::Green -> Self::Yellow) {
Self::Yellow
}
}
I'm really unsure about this. But I think if we end up doing anonymous enums and view types, then this might be a question we may want to find an answer for.
Language feature overview
To recap: here is the list of language features we've covered in this series so far. None of them are quite enough to bring us all the way, but if we combine them they might unlock the ability to improve the status quo of type states:
- enum variant types: Enable reasoning about enum members as first-class types. We can't specify say that a method returns a specific enum member. We can only ever return the full enum.
- arbitrary self-types / enum view-types: Enable declaring
Self: SomeEnum::Member
, allowing us to declare methods which are only available for certain enum variants. - anonymous enums / enum subsets: To enable us to declare sub-sets of enums using syntax
like:
Member1 | Member2
. For example, we may want to declare that a certain method can only return a specific sub-set of errors, or specific network status codes. - enum member unification: Say a method
next
is declared forSomeEnum::Blue
andSomeEnum::Red
. If we obtainBlue | Red
then we should be able to call that method in either case. From the perspective of the language it should be no different than if we'd have an enum with just those two members, and we'd have the method implemented for that entire enum. - enum variant tracking: If we know for a fact that we're dealing with a specific variant, we should be able to skip the ceremony and checks to access that specific variant.
- narrowing of state modifications: If we end up with anonymous enums and view types, we'll have syntax to express narrowings of both owned input and output function params. If that's the case, we may also find ourselves wanting to express narrowings of in-out params, which requires a way to express.
- concrete type bounds: So we can use
where S: SomeEnum::Member
directly as a bound, whereState
is an enum. Currently using sealed traits + individual structs seems like the best way to approximate this.
Conclusion
We've covered quite a number of language features so far. Just to be clear: I'm not trying to say that we should be implementing any of this, and even if we decided to: that we should be prioritizing this. I'm more interested in thinking out loud about how we could improve on type states, and summarizing which language features could be implemented to bring us closer to that.
With that in mind, I hope this made for an interesting read. Thanks for reading, and happy new year!
Thanks to Vagrant for proof reading this post!