Hoisting Expressions

Introduction

There is an RFC open on Rust which proposes adding what I’m calling hoisting expressions to the language. These are expressions which can (for now) only appear inside closures, and which the compiler hoists to run before the rest of the closure does. To illustrate how they work, consider this example:

rust

print!("hello ");
hoist { print!("world!") };

Even though in the code we declared "hello " first and "world!" second, this will print: world!hello . In a way, you can think of hoist as the inverse of defer:

defer in other languages declares statements which are run after exiting the scope.
hoist in Rust would declare expressions which are run before entering the scope.

If you’re familiar with variable hoisting from JavaScript: this feature goes beyond that, not just hoisting variable (var) declarations to the top of the function, but hoisting the entire expression-evaluation to run before the rest of the closure.

The proposal does not call it hoist

RFC 3968 proposes the addition of move($expr), not hoist-expressions. But that’s only a syntactic difference; functionally it is identical. If I’m not mistaken this is an evolution of an earlier notation based on postfix .use. Here is our earlier example written using the experimental move-expressions feature on nightly (playground):

rust

#![allow(incomplete_features, unused)]
#![feature(move_expr)]

fn main() {
    (|| {
        print!("hello ");
        move({ print!("world!")});
    })();
}

It’s not hard to see how you could add (|| move({ $expr }))(); to the inside of any function to be able to use move({ $expr }) anywhere. With some macro magic, we can even convert move($expr) to be written as hoist! { ... } instead:

rust

// Enables: `hoist! { println!("world!") };`
macro_rules! hoist { ($x:expr) => { move($x) };}

Also on the semantics: move does not actually desugar to the top of the closure body, but to the point outside the closure where the closure is created. That means that if we zoom out a level, the earlier code actually looks like this:

rust

#![allow(incomplete_features, unused)]
#![feature(move_expr)]

fn main() {
    print!("world!");
    (|| {
        print!("hello ");
    })();
}

That means that even if we were to never execute the closure, the hoisted expression will still be unconditionally run. However from the perspective of the closure body, all we need to know is that expressions which are defined later in the body are now executed before¹ the remainder of the body.

happens-before, anyone?
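The "unconditionally run" part can be checked on stable Rust by writing the desugaring out by hand. This is only a sketch of my reading of the semantics, not the compiler's actual lowering: the `run` helper and the `log` vector are made-up names that record the order in which things happen.

```rust
use std::cell::RefCell;

/// Hand-written stand-in for a closure whose body contains a hoisted
/// `print!("world!")`. Returns the recorded evaluation order.
fn run(call_closure: bool) -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());

    // Assumption: the hoisted expression is evaluated here, at the point
    // where the closure is created, not where it is called.
    log.borrow_mut().push("world!");

    let closure = || {
        log.borrow_mut().push("hello ");
    };

    if call_closure {
        closure();
    }

    log.into_inner()
}

fn main() {
    println!("{:?}", run(true));
    println!("{:?}", run(false));
}
```

With `run(false)` the closure body never executes, yet the log still contains the hoisted entry.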

Arbitrary lookahead

My main worry here is that hoisting expressions change the evaluation order away from the source order. Closures defer execution of something to a later point in time. move($expr)-expressions travel backwards to execute something before they were ever declared. And that’s a problem, because that means we must know facts about code we haven’t even read yet to accurately reason about code we are evaluating in the present.

To illustrate this point, consider a 100-line closure. This takes a reference to a file and writes lines out to it. A question for the reader: what are the first three lines we’ve written to the file?

rust

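use std::{fs::File, io::Write};

fn main() -> std::io::Result<()> {
    // `File::create` opens the file for writing; `File::open` is read-only.
    let file = File::create("./my-file.md")?;
    (|| -> std::io::Result<()> {
        write!(&file, "first")?;
        write!(&file, "second")?;
        write!(&file, "third")?;
        // 97 more lines here
        Ok(())
    })()
}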

Once we introduce hoisting expressions, the only correct answer becomes: “I cannot say, I would have to read the remaining 97 lines to be able to answer that.” That means that to accurately answer questions you have to look ahead and actually scan whether any non-local effects are being applied in the remainder of the scope.

It’s hard to gauge how common this will be, and how much people would actually end up abusing this in practice. Maybe Rust users will only use it to add the odd in-line .clone() into their code and that’s that. Maybe Rust’s borrow checking rules will prevent most bad cases, and this only becomes a hazard for operations which rely on internal mutability or are inherently shared (like files and stdio).

But if something can happen, then with sufficient exposure it eventually will. Internal mutability is readily available, and all it takes is for people to start putting moderately complex things in hoisting expressions for what I’m describing to become a reality. Likely? Hard to say. Possible? Definitely.
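To show that the borrow checker does not stand in the way once internal mutability is involved, here is a stable-Rust sketch of the file example. A `RefCell<Vec<String>>` stands in for the file, and the hoisted write is placed by hand where I understand `move(...)` would evaluate it. The names `written_lines`, `append` and `body` are made up for this illustration.

```rust
use std::cell::RefCell;

/// Returns the lines in the order they were written.
fn written_lines() -> Vec<String> {
    // Stand-in for the file: a shared, internally mutable buffer.
    let file = RefCell::new(Vec::<String>::new());
    let append = |line: &str| file.borrow_mut().push(line.to_string());

    // Emulates a hoisted write buried somewhere in the "97 more lines":
    // under the assumed semantics it runs when the closure is created.
    append("surprise");

    let body = || {
        append("first");
        append("second");
        append("third");
        // 97 more lines here, one of which held the hoisted write.
    };
    body();

    file.into_inner()
}

fn main() {
    for line in written_lines() {
        println!("{line}");
    }
}
```

Reading only the top of `body`, you would guess the first line is "first". The shared borrow is all the closure needs, so nothing in the type system flags the reordering.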

Defer is a library feature

In Rust the primitives for running code at the end of a scope are the Drop and Destruct traits, not a dedicated defer {} language feature. My operating theory is that the majority of the benefit of a hypothetical defer language feature could be captured by a library API that takes a closure and runs it on drop at the end of the scope. That's why I started the process of adding std::mem::DropGuard last year. Using it looks like this:

rust

#![feature(drop_guard)]

use std::mem::DropGuard;

{
    // Create a new guard around a string that will
    // print its value when dropped.
    let s = String::from("Chashu likes tuna");
    let mut s = DropGuard::new(s, |s| println!("{s}"));

    // Modify the string contained in the guard.
    s.push_str("!!!");

    // The guard will be dropped here, printing:
    // "Chashu likes tuna!!!"
}

This implements the majority of defer {}’s functionality in maybe 20 lines of library code. It also solves some of the harder questions around defer {} which are usually left open like: “How would defer interact with lifetimes?” ¹ and “How can I conditionally prevent defer from executing?” ¹

DropGuard owns the type it wrapped and its handle implements Deref{,Mut} granting access to the internals.

DropGuard::dismiss returns the original T without running the drop guard.
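To give a feel for how little library code this takes, here is a rough sketch of such a guard on stable Rust. It is not the standard library's implementation: the `Guard` type, its `Option`-based layout, and the `demo` helper are assumptions made for illustration, with `dismiss` written as an associated function.

```rust
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the unstable `std::mem::DropGuard`.
pub struct Guard<T, F: FnOnce(T)> {
    // `Some` until the guard is dropped or dismissed.
    inner: Option<(T, F)>,
}

impl<T, F: FnOnce(T)> Guard<T, F> {
    pub fn new(value: T, on_drop: F) -> Self {
        Self { inner: Some((value, on_drop)) }
    }

    /// Returns the wrapped value without running the closure.
    pub fn dismiss(mut guard: Self) -> T {
        let (value, _on_drop) = guard.inner.take().expect("guard is live");
        value
    }
}

impl<T, F: FnOnce(T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner.as_ref().expect("guard is live").0
    }
}

impl<T, F: FnOnce(T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.inner.as_mut().expect("guard is live").0
    }
}

impl<T, F: FnOnce(T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // `inner` is `None` only after `dismiss`; then nothing runs.
        if let Some((value, on_drop)) = self.inner.take() {
            on_drop(value);
        }
    }
}

/// Returns what the drop closures logged, plus a dismissed value.
fn demo() -> (Vec<String>, String) {
    let log = RefCell::new(Vec::<String>::new());
    {
        let mut s = Guard::new(String::from("Chashu likes tuna"), |s| {
            log.borrow_mut().push(s)
        });
        s.push_str("!!!");
        // The guard is dropped here and hands the string to the closure.
    }
    let guard = Guard::new(String::from("kept"), |s| log.borrow_mut().push(s));
    let kept = Guard::dismiss(guard);
    (log.into_inner(), kept)
}

fn main() {
    let (log, kept) = demo();
    println!("{log:?} {kept}");
}
```

Storing the value and closure together in an `Option` keeps the sketch free of `unsafe`, at the cost of a runtime check in `deref`.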

So if defer {} can be implemented using a library type and a trait, is there perhaps a similar primitive mechanism to back hoist {}? Drop encodes part of the object lifecycle for us; maybe there is a primitive trait available for hoist {} too.

The motivating example

This is the motivating example at the start of RFC 3968 with identifiers shortened:

rust

let a = self.a.clone();
let b = self.b.clone();
let c = self.c.clone();
tokio::task::spawn(async move {
    foo(a, b, c)
});

I would have expected this example to be used again later in the RFC to show how move improves on it. I believe that if we were to actually rewrite it, it would look like this:

rust

tokio::task::spawn(async move {
    foo(
        move(self.a.clone()),
        move(self.b.clone()),
        move(self.c.clone())
    )
});

Though if I'm being honest, I don't like multi-line function arguments like that, and I would be inclined to create intermediate variables instead:

rust

tokio::task::spawn(async move {
    let a = move(self.a.clone());
    let b = move(self.b.clone());
    let c = move(self.c.clone());
    foo(a, b, c)
});

How much better is this actually compared to what we started with? Does it justify all the tradeoffs this feature comes with? I’m having my doubts, and I see explicit capture clauses as a better general direction here. Not perfect yet either, but the thrust seems good and I think it can be improved upon ¹.

I see some common themes here with self-referential borrows and view/pattern-types. All of these use short notations for “paths into types” (selectors, anyone?). I feel like something like an inverse of the ref keyword in patterns (own) might remove the requirement that a feature like this must be able to support arbitrary expressions. I’d hope to write more about this, but I’m strapped for time right now.

Conclusion

Features like async/.await and yield feel like they are about making code feel more sequential when the underlying systems are actually not. In many ways hoist {} feels like it does the opposite, and that does worry me a little since other features like this (callbacks, variable hoisting, atomics, etc.) are notoriously difficult for people to wrap their heads around ¹.

Yes yes, non-linear programming features (async, concurrency, stuff like that) are most of what I talk about on this blog. But most of it is about making that kind of code look and feel more linear, not less.

RFC 3968 makes the following claim in “Rationale and Alternatives”, which is what made me want to write this post:

Does move($expr) change evaluation order away from source order?

Arguably not more so than closures already do. The body of a closure is already deferred — none of it executes at the point where the closure expression appears in the source. […]

I hope this post sufficiently illustrates that move-expressions do in fact change the evaluation order away from the source order, but only if you also consider the evaluation order of the contents of the closure, and not just the seam between the closure and the enclosing context. I think that's something worth looking at more closely, to evaluate whether it is truly what we want.
