Totality
— 2023-07-20
- total functions
- functions which cannot panic
- functions which guarantee termination
- const functions
- evolving totality
- conclusion
Recently Eric and I have been chatting with Daan Leijen, of Koka fame, discussing among other things the distribution of effects. Daan estimated that for a typical program written in Koka the distribution of effects would roughly be something like:
- ~70% of a program could be "total" (as in: has no effects)
- ~15% of a program would have to deal with exceptions or divergence (may error, may panic, or may loop indefinitely)
- ~5% of a program ends up dealing directly with IO resources
That leaves another ~10% for wiggle room and other effects. And I think the broad implications of this are fascinating. At least for Koka, according to Daan the vast majority of code could be written as entirely un-effectful. No panics, no infinite loops, no IO - just functions operating on input and producing output.
total functions
Perhaps you may have heard people refer to Haskell as a "pure" language. Functions in Haskell may by default diverge (loop infinitely), and they may throw exceptions. Koka, by contrast, is what you could call a "total language". By default functions are considered "total" 1, and are only allowed to operate on their inputs and produce outputs. In Koka, if you want to write a "pure" function, you write a function which makes use of the `exn` effect ("may throw exceptions") and the `div` effect ("may diverge"):
I believe terminology here is plucked from math/category theory, which I know little about - so I hope I'm doing this description justice.
```koka
// Has no side-effects ("total")
fun square_total(x: int): total int {
  x * x
}

// Doesn't terminate + throws exceptions ("pure")
// "pure" is an alias for the `<div,exn>` effects
fun square_pure(x: int): pure int {
  throw("oops");
  x * square_pure(x)
}
```
In Rust however, we can't write total functions. While we are able to write functions which meet all the conditions to be total - we can't actually express in the type system that they're total. Meaning that by reading a function signature you have no idea whether a function will terminate, may panic, or perhaps open files - it doesn't say.
We get pretty close to the "pure" set of effects through the `const fn` notation. It can't heap-allocate, it can't access IO, and it can't access globals ("statics"). Right now it's also deterministic - in Koka, code which wants to be nondeterministic needs to be marked as `ndet`. But in order to declare "total" functions, we'd need to be able to strip the set of effects on functions further in Rust than what `const fn` does:
- The ability to declare functions which will not panic
- The ability to declare functions which "will not diverge" (guarantee termination)
So let's take a look at what that means, and what it would take to implement in Rust.
Functions which cannot panic
Rust distinguishes between "panic" for programmer errors and "result" for runtime errors. Both can be mapped into each other using a combination of:

|         | result           | panic                  |
| ------- | ---------------- | ---------------------- |
| Create  | `return Err(..)` | `panic_any` / `unwrap` |
| Consume | `match {}`       | `catch_unwind`         |
However, despite it being possible to convert between panics and errors, panics themselves do not exist anywhere in the Rust type system. You cannot construct a panic type and pass it around; you can only trigger a panic in-place. Similarly, functions don't return a `Panic<T>` or `impl Panic` when they can panic. Every function can panic, and the ability to panic lives outside of the type system. 2
I don't think it's necessarily bad that panics live outside of the type system though. I'm definitely not convinced that we should bring them into the type system. Maybe it's closer to `const` in the way that it just annotates the function, but doesn't live in the type system? idk.
"But Yosh", I hear you asking, "do people even want to write functions which can't panic?" Well dear reader, people certainly do. There's a thread full of people here discussing wanting the ability to opt out of panics - for reasons such as better FFI (unwinding across FFI boundaries is generally unsound in Rust), support for embedded systems, and more. Formal verification tools such as Prusti and Kani advertise their ability to prove the absence of panics as a key feature. And I know some folks working on Windows here at Microsoft who would love it if we could prove that code which should not panic in fact never panics.
A basic form of "will not panic" notations can be implemented today via third-party crates such as dtolnay's no-panic and japaric's panic-never. While these work okay, code like that is not composable and will eventually run into the effect sandwich problem. The better approach is probably to bring panics into the type system, and to reason about them as an effect.
While a "do not panic" notation is sufficient to declare that a function won't panic, inside the function body we're then obliged to actually write code which cannot panic. And even with such notations that's hard, for a number of reasons:
- In Rust any operation may currently panic - opting into "will not panic" semantics would be a pretty big shift.
- Proving panic-freedom in a way that is also ergonomic requires introducing verification tools into the language. Pattern types backed by something like Flux could be one approach. But this is an incredibly rich topic with lots of challenges - many of which we don't know if we can overcome.
In my post on pattern types I go into the first issue in some more detail. Roughly said: if we were able to encode more details about values in the types themselves, we could use that to prove to the compiler that certain operations are fine:
```rust
// ❌ A basic doubling function in today's Rust
// combined with a "no_panic" notation
//
// Compiler error: this could panic, because `num + num` may
// overflow past u32::MAX
#[no_panic] fn double(num: u32) -> u32 {
    num + num
}

// ✅ A basic doubling function using a hypothetical "pattern type"
// notation combined with a "no_panic" notation
const N: u32 = u32::MAX / 2;
#[no_panic] fn double(num: u32[0..N]) -> u32[0..u32::MAX] {
    num + num
}
```
functions which guarantee termination
In Rust by default functions do not guarantee termination. And that's not a bad property: we want to be able to write functions which don't guarantee termination, such as HTTP servers which keep listening indefinitely for requests, or other similar "main loops". Useful stuff.
But if every part of your program can loop indefinitely, that acts a little like every part of your program having access to raw pointers or IO: it can be hard to track down where exactly things are going wrong, because the potential site is undetermined.
Just like with the hypothetical "no_panic" notation, we could have some form of a "nobody_loops" 3 notation. This would signal to the compiler that a function guarantees termination, and that operations which may not terminate are disallowed. This is impossible to prove for the general case, but if we wanted to do something practical we could probably create a new unsafe marker trait, say `BoundedIterator`, which would be automatically implemented for things like looping over ranges, looping over items in collections, etc.
```rust
// ✅ This would compile fine since the loop in the body has an upper bound
#[nobody_loops] fn main() {
    for n in 0..100 {
        println!("{n}");
    }
}
```
Similar to Rust's borrow-checking rules, it's unlikely any implementation of "will terminate" will ever accept the full set of hypothetically compliant programs. That would require proving termination in the general case, which the halting problem tells us is impossible. Instead the focus should be on a practical subset of programs which provably terminate. This is another potential reason why generators as a first-class language feature would be useful: because the compiler generates the iterator implementation, it could create terminating iterators without manual unsafe trait impls:
```rust
// Made-up syntax. This would return:
// -> impl Iterator<Item = Cat> + BoundedIterator
#[nobody_loops] gen fn cats() -> Cat { .. }
```
const functions
In Rust the `const` effect isn't so much an effect unto itself as it is the subtraction of effects from the "base" set of effects. As we mentioned: const functions must (currently) be deterministic, don't get access to IO, don't get access to statics, and can't allocate on the heap. And in return they gain the ability to be evaluated during compilation.
But that's if we view it from the perspective of Rust's base set of effects. If we look at `const` from a "total" perspective, then "const" roughly represents the "may panic" and "may diverge" effects. And while we don't have any annotations for basic functions in Rust, they effectively act as a superset of "const", also gaining access to globals, the host environment (IO), etc.
So if we take a position that const functions somehow subtract effects from "base Rust", then it is indeed the case that "const" can probably not be considered an effect. But if we take a look at the effects from a perspective of totality, then "const" and "base" contexts simply represent different sets of effects:
- total: no effects
- const: panic + diverge
- base 4: panic + diverge + allocate + host environment + globals
By "base Rust" I mean the capabilities a function has when you write `fn foo` with no other annotations, compiled for a std-capable target. It doesn't have any specific effect annotations like `const` or `async`, hence the "base" qualifier.
Reasoning about effects with totality as the starting point is also far simpler. It might not necessarily map to the surface-level syntax Rust provides, but it certainly makes it easier to reason about what happens when you have, say, an `unsafe const fn` compared to an `async gen fn`. Both have added effects on top of "total", and while they slightly overlap they also differ somewhat. This is easier to reason about than speaking only in terms of subsets or supersets rooted in "base" Rust.
evolving totality
So far we've mainly focused on the "may panic" and "does not guarantee termination" effects. We've made the case that if we could remove both those effects, we'd achieve totality. But that's not quite true. What "total" means will differ from language to language. In a way you can think of "total" as the absence of all optional effects - but it still includes all non-optional assumptions.
Over time we may discover we want more control over which assumptions are baked in and which are optional. We've seen this with "const Rust" already, which provides fewer effects than "base Rust" does. Both "may panic" and "may diverge" are built-in assumptions which exist in Rust today, but could potentially be surfaced. And they aren't the only ones; there are potentially more, such as:
⚠️ Reader alert: I'm not proposing we add any of these effects. These are just intended as illustrative examples for a point I'm working up towards the end of this section. ⚠️
- Access to host environment: every function in Rust assumes it has access to a host environment, and can just call things like `File::open` to access resources on the host. Embedded systems can't assume a host. The difference between the two is currently not encoded in the type system, and a notion of "total" may want to do away with the assumption that a host is present.
- Drop is not guaranteed to be called: today any function in Rust can freely take any value and pass it to `mem::forget` or similar, leaking it without calling its destructor. In a way you can think of every function having access to a "leak" effect, and a notion of "total" may want to exclude that.
- Type conversions can be reversed: say a function takes an `impl Read` argument; using the `Any` trait it's possible to obtain the underlying type, as long as the type name is accessible. This means there is no guarantee that a function will only operate on the `impl Read` interface, because it may in fact cast the value back to the original type and operate on that instead. Being able to obtain the underlying type could be considered an effect which "total" may not want to have access to.
- Type sizes can be accessed: every type has a size and a layout. Functions can learn information about a type by calling the `size_of` function on it. A notion of totality could restrict obtaining any size or layout information about a type.
- Memory locations can be directly accessed: we assume values are backed by memory addresses in the address space. This means we can always find the memory address of a value. But what if we were to run, say, Rust on the JVM? The JVM doesn't expose an address space; it exposes objects and references. Which means that "memory locations can be accessed" could be considered an effect, one which could be absent from totality by default.
And there are probably lots more potential "effects" which fall outside of the notion of "totality" presented in this post so far. That's because in a way writing effects brings the assumptions we make about functions directly into the type system. And because our programs run on real physical hardware with memory models and operating systems, it's possible to map just about any detail of the underlying hardware into an effect. Even to the point of silliness, such as an effect which enables you to allocate stack space.
Choosing which effects to add means choosing which assumptions to surface. That is a real art, and requires thinking through the purpose and use cases of the effects. Ideally effects could even come in themes: for example "type conversions can be reversed" and "type sizes can be accessed" share a theme of "type privacy" - maybe they could be folded into each other. We've done something similar with "const" too, where we've combined "panic" and "diverge" into a single effect. What I'm trying to say is: what we mean by "total" depends on our current idea of what the language is and should do. And it's even something which can change over time as our model and needs of the language evolve.
conclusion
Total functions go a step beyond "pure" functions because they also disallow code from panicking, and require that functions provably terminate. Effect-wise, "total" represents the least a function could possibly do, so when talking about effects we can think of every other effect as "adding" on top of the "total" base. This will also be helpful if we ever manage to constify the majority of the stdlib and want to switch the default set of implied effects across an edition boundary: reasoning about effects rooted in totality provides us with a model for what would change.
As we're closing out, something I did want to briefly touch on is Rice's theorem. It states that "all non-trivial semantic properties of programs are undecidable". In effect, what we're doing with effects is attempting to push against that. The `unsafe` keyword in Rust exists because the compiler can't prove memory safety for all functions - just for most of them - and we need a way out for when it can't. Similarly we won't be able to prove panic-freedom for all functions, or prove that all functions terminate. But we can for a lot of functions; and statically proving interesting properties of our programs is kind of what type systems are about.
I find it fascinating that ~70% of code in a typical Koka program can be marked as total. For Rust there is a thriving research movement building verifiers for the language, and one of the properties they often start by proving is panic-freedom for functions. I can't help but wonder how we could expose these kinds of guarantees not just through extensions to Rust, but through Rust itself. And I think in order to do that, we may want to start by adding ways to reason about semantic properties such as "may panic" or "may diverge" as effects native to Rust.
Thanks to Eric Holk and Daan Leijen for reviewing this post prior to publishing, and chatting about effects over the past few months.