Totality
— 2023-07-20
- total functions
- functions which cannot panic
- functions which guarantee termination
- const functions
- evolving totality
- conclusion
Recently Eric and I have been chatting with Daan Leijen, of Koka fame, discussing among other things the distribution of effects. Daan estimated that for a typical program written in Koka the distribution of effects would roughly be something like:
- ~70% of a program could be "total" (as in: has no effects)
- ~15% of a program would have to deal with exceptions or divergence (may error, may panic, or may loop indefinitely)
- ~5% of a program ends up dealing directly with IO resources
That leaves another ~10% for wiggle room and other effects. And I think the broad implications of this are fascinating. At least for Koka, according to Daan the vast majority of code could be written as entirely un-effectful. No panics, no infinite loops, no IO - just functions operating on input and producing output.
total functions
Perhaps you may have heard people refer to Haskell as a "pure" language. Functions in Haskell may by default diverge (loop infinitely), and they may throw exceptions. Koka, by contrast, is what you could call a "total language". By default functions are considered "total" 1, and are only allowed to operate on their inputs and produce outputs. In Koka, if you want to write a "pure" function, you write a function which makes use of the `exn` effect ("may throw exceptions") and the `div` effect ("may diverge"):
I believe terminology here is plucked from math/category theory, which I know little about - so I hope I'm doing this description justice.
```koka
// Has no side-effects ("total")
fun square_total(x: int): total int {
  x * x
}

// Doesn't terminate + throws exceptions ("pure")
// "pure" is an alias for the `<div,exn>` effects
fun square_pure(x: int): pure int {
  throw("oops");
  x * square_pure(x)
}
```
In Rust however, we can't write total functions. While we are able to write functions which meet all the conditions to be total - we can't actually express in the type system that they're total. Meaning that by reading a function signature you have no idea whether a function will terminate, may panic, or perhaps open files - it doesn't say.
We get pretty close to the "pure" set of effects through the `const fn` notation. It can't heap-allocate, it can't access IO, and it can't access globals ("statics"). Right now it's also deterministic - in Koka, code which wants to be nondeterministic needs to be marked as `ndet`. But in order to declare "total" functions, we'd need to be able to strip the set of effects on functions further in Rust than what `const fn` does:
- The ability to declare functions which will not panic
- The ability to declare functions which "will not diverge" (guarantee termination)
So let's take a look at what that means, and what it would take to implement in Rust.
Functions which cannot panic
Rust distinguishes between "panic" for programmer errors and "result" for runtime errors. Both can be mapped into each other using a combination of:

|         | result           | panic                  |
| ------- | ---------------- | ---------------------- |
| Create  | `return Err(..)` | `panic_any` / `unwrap` |
| Consume | `match {}`       | `catch_unwind`         |
However, despite it being possible to convert between panics and errors, panics themselves do not exist anywhere in the Rust type system. You cannot construct a panic type and pass it around; you can only trigger a panic in-place. Similarly, functions don't return a `Panic<T>` or `impl Panic` when they can panic. Every function can panic, and the ability to panic lives outside of the type system. 2
I don't think it's necessarily bad that panics live outside of the type system though. I'm definitely not convinced that we should bring them into the type system. Maybe it's closer to `const` in the way that it just annotates the function, but doesn't live in the type system? idk.
"But Yosh", I hear you asking, "do people even want to write functions which can't panic?" Well dear reader, people certainly do. There's a thread full of people here discussing wanting the ability to opt out of panics - for reasons such as better FFI (unwinding across FFI boundaries is generally unsound in Rust), support for embedded systems, and more. Formal verification tools such as Prusti and Kani advertise their ability to prove the absence of panics as a key feature. And I know some folks working on Windows here at Microsoft who would love it if we could prove that code which should not panic in fact never panics.
A basic form of "will not panic" notations can be implemented today via third-party crates such as dtolnay's no-panic and japaric's panic-never. While these work okay, code like that is not composable and will eventually run into the effect sandwich problem. The better approach is probably to bring panics into the type system, and to reason about them as an effect.
While a "do not panic" notation is sufficient to declare that a function won't panic, inside the function body we're then obliged to actually write code which cannot panic. And even with such notations that's hard, for a number of reasons:
- In Rust any operation may currently panic - opting into "will not panic" semantics would be a pretty big shift.
- Proving panic-freedom in a way that is also ergonomic requires introducing verification tools into the language. Pattern types backed by something like Flux could be one approach. But this is an incredibly rich topic with lots of challenges - many of which we don't know if we can overcome.
In my post on pattern types I go into the first issue in some more detail. Roughly said: if we were able to encode more details about values in the types themselves, we could use that to prove to the compiler that certain operations are fine:
```rust
// ❌ A basic doubling function in today's Rust
// combined with a "no_panic" notation
//
// Compiler error: this could panic, because `num + num` may
// overflow past u32::MAX
#[no_panic] fn double(num: u32) -> u32 {
    num + num
}

// ✅ A basic doubling function using a hypothetical "pattern type"
// notation combined with a "no_panic" notation
const N: u32 = u32::MAX / 2;
#[no_panic] fn double(num: u32[0..N]) -> u32[0..u32::MAX] {
    num + num
}
```
functions which guarantee termination
In Rust by default functions do not guarantee termination. And that's not a bad property: we want to be able to write functions which don't guarantee termination, such as HTTP servers which keep listening indefinitely for requests, or other similar "main loops". Useful stuff.
But if every part of your program can loop indefinitely, that acts a little like every part of your program having access to raw pointers or IO: it can be hard to track down where exactly things are going wrong, because the potential site is undetermined.
Just like with the hypothetical "no_panic" notation, we could have some form of a "nobody_loops" 3 notation. This would signal to the compiler that a function guarantees termination, and that operations which may not terminate are disallowed. This is impossible to prove for the general case, but if we wanted to do something practical we could probably create a new unsafe marker trait, say `BoundedIterator`, which would be automatically implemented for things like looping over ranges, looping over items in collections, etc.
```rust
// ✅ This would compile fine since the loop in the body has an upper bound
#[nobody_loops] fn main() {
    for n in 0..100 {
        println!("{n}");
    }
}
```
Similar to Rust's borrow-checking rules, it's unlikely any implementation of "will terminate" will ever accept the full set of hypothetically compliant programs. That would require proving termination in the general case, which the halting problem tells us is impossible. Instead the focus should be on a practical subset of programs which provably terminate. This is another potential reason why generators as a first-class language feature would be useful: because the compiler generates the iterator implementation, it could create terminating iterators without manual unsafe trait impls:
```rust
// Made-up syntax. This would return:
// -> impl Iterator<Item = Cat> + BoundedIterator
#[nobody_loops] gen fn cats() -> Cat { .. }
```
const functions
In Rust the `const` effect isn't so much an effect unto itself as it is the subtraction of effects from the "base" set of effects. As we mentioned: const functions must (currently) be deterministic, don't get access to IO, don't get access to statics, and can't allocate on the heap. And in return they gain the ability to be evaluated during compilation.
But that's if we view it from the perspective of Rust's base set of effects. If we look at `const` from a "total" perspective, then "const" roughly represents the "may panic" and "may diverge" effects. And while we don't have any annotations for basic functions in Rust, they effectively act as a superset of "const", also gaining access to globals, the host environment (IO), etc.
So if we take a position that const functions somehow subtract effects from "base Rust", then it is indeed the case that "const" can probably not be considered an effect. But if we take a look at the effects from a perspective of totality, then "const" and "base" contexts simply represent different sets of effects:
- total: no effects
- const: panic + diverge
- base 4: panic + diverge + allocate + host environment + globals
By "base Rust" I mean the capabilities a function has when you write `fn foo` with no other annotations, compiled for a std-capable target. It doesn't have any specific effect annotations like `const` or `async`, hence the "base" qualifier.
Reasoning about effects with totality as the starting point is also far simpler. It might not necessarily map to the surface-level syntax Rust provides, but it certainly makes it easier to reason about what happens when you have, say, an `unsafe const fn` compared to an `async gen fn`. Both have added effects on top of "total", and while they slightly overlap they also differ somewhat. This is easier to reason about than speaking only in terms of subsets or supersets rooted in "base" Rust.
evolving totality
So far we've mainly focused on the "may panic" and "does not guarantee termination" effects. We've made the case that if we could remove both those effects, we'd achieve totality. But that's not quite true. What "total" means will differ from language to language. In a way you can think of "total" as the absence of all optional effects - but it still includes all non-optional assumptions.
Over time we may discover we want more control over which assumptions are baked in and which are optional. We've seen this with "const Rust" already, which provides fewer effects than "base Rust" does. Both "may panic" and "may diverge" are built-in assumptions which exist in Rust today, but could potentially be surfaced. And they aren't the only ones; there are potentially more, such as:
⚠️ Reader alert: I'm not proposing we add any of these effects. These are just intended as illustrative examples for a point I'm working up towards the end of this section. ⚠️
- Access to host environment: every function in Rust assumes it has access to a host environment, and can just call things like `File::open` to access resources on the host. Embedded systems can't assume a host. The difference between the two is currently not encoded in the type system, and a notion of "total" may want to do away with the assumption that a host is present.
- Drop is not guaranteed to be called: today any function in Rust can freely take any value and pass it to `mem::forget` or similar, leaking it without calling its destructor. In a way you can think of every function having access to a "leak" effect, and a notion of "total" may want to exclude that.
- Type conversions can be reversed: say a function takes an `impl Read` argument; using the `Any` trait it's possible to obtain the underlying type, as long as the type name is accessible. This means there is no guarantee that a function will only operate on the `impl Read` interface, because it may in fact cast the value back to the original type and operate on that instead. Being able to obtain the underlying type could be considered an effect which "total" may not want to have access to.
- Type sizes can be accessed: every type has a size and a layout. Functions can learn information about a type by calling the `size_of` function on it. A notion of totality could restrict obtaining any size or layout information about a type.
- Memory locations can be directly accessed: we assume values are backed by memory addresses in the address space. This means we can always find the memory address of a value. But what if we were to run, say, Rust on the JVM? The JVM doesn't expose an address space; it exposes objects and references. Which means that "memory locations can be accessed" could be considered an effect, one which could be absent from totality by default.
And there are probably lots more potential "effects" which fall outside of the notion of "totality" presented in this post so far. That's because in a way writing effects brings the assumptions we make about functions directly into the type system. And because our programs run on real physical hardware with memory models and operating systems, it's possible to map just about any detail of the underlying hardware into an effect. Even to the point of silliness, such as an effect which enables you to allocate stack space.
Choosing which effects to add means choosing which assumptions to surface. That is a real art, and requires thinking through the purpose and use cases of the effects. Ideally effects could even come in themes: for example "type conversions can be reversed" and "type sizes can be accessed" share a theme of "type privacy" - maybe they could be folded into each other. We've done something similar with "const" too, where we've combined "panic" and "diverge" into a single effect. What I'm trying to say is: what we mean by "total" depends on our current idea of what the language is and should do. And it's even something which can change over time as our model and needs of the language evolve.
conclusion
Total functions go a step beyond "pure" functions because they also disallow code from panicking, and require that functions provably terminate. Effect-wise, "total" represents the least a function could possibly do, so when talking about effects we can think of every other effect as "adding" on top of the "total" base. This will also be helpful if we ever manage to constify the majority of the stdlib and want to switch the default set of implied effects across an edition boundary: reasoning about effects rooted in totality provides us with a model for what would change.
As we're closing out, something I did want to briefly touch on is Rice's theorem. It states that "all non-trivial semantic properties of programs are undecidable". In effect, what we're doing with effects is attempting to push against that. The `unsafe` keyword in Rust exists because the compiler can't prove memory safety for all functions - just for most of them - and we need a way out for when it can't. Similarly we won't be able to prove panic-freedom for all functions, or prove that all functions terminate. But we can for a lot of functions; and statically proving interesting properties of our programs is kind of what type systems are about.
I find it fascinating that ~70% of code in a typical Koka program can be marked as total. For Rust there is a thriving research movement building verifiers for the language, and one of the properties they often start by proving is panic-freedom for functions. I can't help but wonder how we could expose these kinds of guarantees not just through extensions to Rust, but through Rust itself. And I think in order to do that, we may want to start by adding ways to reason about semantic properties such as "may panic" or "may diverge" as effects native to Rust.
Thanks to Eric Holk and Daan Leijen for reviewing this post prior to publishing, and chatting about effects over the past few months.