State Machines: Introduction
— 2020-03-30
Every now and then I like to think about topics that don't directly relate to my daily work. One of those topics has been parsers: back in January I wrote about byte-ordered stream parsing and how I think we can improve streaming parsing in Rust with a small set of API additions.
I've recently been thinking of how to improve beyond that, and one topic of interest has been state machines. 1 Much of parsing logic includes sequences of: "If you see character X, read until character Y, then look for character Z." This sequence can best be described as a "transition between states", and state machines are a common technique/structure/pattern to describe these. 2
Back in December I was exploring parsers, and got a feel for the difference between "parser combinators" and "parser generators". Mountain Ghosts (James Coglan) pointed out on Twitter that there's another variant of parsers: Prolog's "Definite clause grammars". After reading the Wiki entry on it it felt like it definitely looked like state machines. And I happen to know that the compiler uses Prolog for type resolution. So I got to wonder: if Rust has state machines as part of the language, would that get us closer to being able to express parsers in a manner not too dissimilar from definite clause grammars? I bet someone with more knowledge about these things could say more on this.
It's 23:05 on a Monday here. I've been thinking about state machines all weekend for no good reason. So I'm writing this post in a single take to get it out of my mind and into a structured form before I lose my mind.
What are state machines?
In Rust states are often described as enums: a boolean can be true
or
false
. An Option
can be Some
or None
. State machines describe
the relations between states. For example a traffic light can go from red to
green, but never from red to yellow - it has to pass through green first 3.
In Germany traffic lights go green -> yellow -> red -> yellow -> green
, but let's pretend they're only green -> yellow -> red -> green
.
A graph depicting 4 states and 5 transitions, drawn by @_lrlna for our "Graphs" post.
State machines not only encode which transitions are valid, in doing so they also encode which transitions are invalid. For example in a financial system there might be some important check we need to perform before we declare success. A state machine would allows us to encode that the "success" state must be preceded by a "verifying" state.
An interesting property that I personally value a lot in state machines is how much utility they provide in active codebases. Instead of encoding relations in boolean logic, they can be extracted into their own site. And in doing so they become more resilient against emergent bugs.
State machines in Rust today
Now that we've looked at what state machines are, and hopefully convinced you of their usefulness, let's take a look at how to implement them in Rust. Much of what I'm saying here was inspired by Hoverbear's "Rust state machine pattern" post, which I recommend reading.
The ideal scenario for a state machine is to encode it entirely at compile time: that way if our program compiles we can be sure that the only state transitions occurring are the ones we've defined as being allowed.
The way to make this work in Rust today is through type params. If we take our "traffic light" example we mentioned earlier, we could encode it in Rust today as:
#[derive(Debug)]
pub struct Green;
#[derive(Debug)]
pub struct Yellow;
#[derive(Debug)]
pub struct Red;
#[derive(Debug)]
struct State<S> {
_inner: S
}
impl State<Green> {
pub fn new() -> State<Green> {
State { _inner: Green{} }
}
}
impl State<Green> {
pub fn next(self) -> State<Yellow> {
State { _inner: Yellow {} }
}
}
impl State<Yellow> {
pub fn next(self) -> State<Red> {
State { _inner: Red {} }
}
}
impl State<Red> {
pub fn next(self) -> State<Green> {
State { _inner: Green {} }
}
}
fn main() {
let state = State::new(); // green
let state = state.next(); // yellow
let state = state.next(); // red
let state = state.next(); // green
dbg!(state);
}
As you can see calling next
changes the state from one variant to another.
And even though the method might look the same and exist on the same struct:
they're very much different.
If we try and call a method with the incorrect state the compiler will provide some really good messages around this. Say, we wanted to make sure bikes can only go when the light is green. The following example tries to allow bikes in a "red" state, which provides a nice compilation error.
fn main() {
let state = State::new(); // green
let state = state.next(); // yellow
let state = state.next(); // red
allow_bikes(&state);
let state = state.next(); // green
}
fn allow_bikes(state: &State<Green>) {
todo!();
}
Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
--> src/main.rs:42:17
|
42 | allow_bikes(&state);
| ^^^^^^ expected struct `Green`, found struct `Red`
|
= note: expected reference `&State<Green>`
found reference `&State<Red>`
error: aborting due to previous error
However one big limitation of this is that this pattern cannot store the state inside another struct; it can only exist on the stack this way. So we cannot do the following:
struct Foo<S> {
state: State<S>,
}
The moment we initialize Foo
to take e.g. Green
as its parameter, it can
now no longer switch to Red
in safe Rust. This is what enums are for, and
unfortunately we can't use those here.
Promising state machine ecosystem crates to check out that work today are: machine and state_machine_future.
Future Directions
The P programming
language has state
machines as a first-class construct in the language. It defines states using
the state
keyword and uses on
, goto
, and raise
keywords to switch
between states 4. After seeing this I got to wonder: "Would it be possible for
Rust to have first-class state machines as part of the language with minimal
changes?" And well, I think, maybe?
I also found the Plaid language which talks about state machines a bunch, but most of it is written as PDFs, and source tarballs, so I didn't really research it further.
Before we continue, the usual disclaimer: I'm not a language designer nor on any lang team. This is all speculative. This all comes from a place of interest, and am by no means an authority on the matter.
So the pitch is: "If enums are a way of describing state, and methods are a way of describing behavior, would it be possible to put the two together to create a state machine?"
Currently enums have the limitation that its members aren't fully qualified types. But imagine for a second we they were. What that would mean is that we could reason about enum members as if they were full types.
Now imagine we could use enum members as arbitrary self types, and return enum members from functions 5. We could then rewrite our traffic example like this:
This probably needs specialization too. Insert hand waving / jazz hands here.
enum State {
Green,
Yellow,
Red,
}
impl State {
pub fn new() -> Self::Green {
Self::Green
}
pub fn next(self: Self::Green) -> Self::Yellow {
Self::Yellow
}
pub fn next(self: Self::Yellow) -> Self::Red {
Self::Red
}
pub fn next(self: Self::Red) -> Self::Green {
Self::Green
}
}
fn main() {
let mut state = State::new(); // green
state = state.next(); // yellow
state = state.next(); // red
state = state.next(); // green
}
As you can see this nicely tucks state back into a single enum, and uses named transitions to switch between one state to the other. This makes enums the single source of both states and transitions between states.
Additionally methods could not only return Self
: they could return results
or futures as well; enabling various kinds of transitions to occur. And
diagnostics could probably be quite good here as well:
Compiling playground v0.0.1 (/playground)
error[E0308]: mismatched types
--> src/main.rs:42:17
|
42 | allow_bikes(&state);
| ^^^^^^ expected `State::Green`, found `State::Red`
|
= note: expected reference `&State::Green`
found reference `&State::Red`
error: aborting due to previous error
However, there are a lot of big speculative "if"'s attached here. Is it possible to fully reason about enum members in this way? Can this even be made sound? Implementing this would leave a lot of questions that need to be answered by people with domain expertise.
Conclusion
In this rather hastily written post we've shared what state machines are, how they can be authored in Rust today, their limitations, and speculated on language extensions to make them easier to author.
In principle it's already possible to write state machines in Rust today. The compiler will verify them for you, and provide helpful messages when things go wrong. It's somewhat of a testatement to how useful linear types are. However they come with severe limitations, and don't always feel the most natural to write.
What I like about the future direction I've sketched in this post is that it looks like a fairly natural extension to today's Rust. But would enable compiler-checked general-purpose state machines to exist in Rust, which is something that doesn't seem common in programming languages.
What I like about the direction we've outlined in this post so far is that it sketches something that feels like a fairly natural extension to what Rust is like today.
Maybe this post will prove to be inspirational for first-class state machines in Rust at some point in the future. Or possibly I'm saying things that are strictly impossible. Either way, this has been a fun thought exercise, and figured it might be fun to share 6. Thanks for reading7!
In these trying times.