refcounts
— 2021-05-07

Something I've wondered about for a while now is how to improve the ergonomics of some of the structures in std::sync. There's quite a bit of excitement gathering about ghost-cell (and the paper), and I figured this would be as good a time as any to share some thoughts (questions?) on ergonomics.

Like my last post, the intent of this post is less to provide a solution or research result, and more to share my current thinking on the topic. Rather than asking questions in private, I figured I'd ask them here instead (:

std::sync::Barrier

Let's use std::sync::Barrier to center our examples around. This is a data structure which enables a fixed number of threads to synchronize with each other. One example of how you might want to use this is in an HTTP server. You may want to create a connection to a database, connect to a cache server, connect to perhaps some other backend, and only when all of those have succeeded do you proceed to accept connections. If one of them takes a while to initialize (or even fails entirely), our end-users won't be impacted, because none of their requests are being handled yet.

This structure enables multiple threads to synchronize before starting work. Its use usually looks something like this:

use std::sync::{Arc, Barrier};
use std::thread;

let mut handles = Vec::with_capacity(10);
let barrier = Arc::new(Barrier::new(10));
for _ in 0..10 {
    let c = Arc::clone(&barrier);
    // The same messages will be printed together.
    // You will NOT see any interleaving.
    handles.push(thread::spawn(move || {
        println!("before wait");
        c.wait();
        println!("after wait");
    }));
}
// Wait for other threads to finish.
for handle in handles {
    handle.join().unwrap();
}

We create an instance of Barrier by passing it a count, wrap it in an Arc (so it can be cloned), and then pass a clone of it to each thread. It's important to note that the number of threads we spawn needs to match exactly the number we've passed to Barrier::new. If the number is too high, our application will lock up and never continue. If the number is too low, the application will start before all threads are ready, defeating the point of using the structure in the first place.

dynamic Barrier

One difficulty with std::sync::Barrier is that we need to pass the count in up front. This means that in order to create the synchronization structure, we need to know ahead of time how many items we're trying to synchronize. But say we're dealing with an iterator of unknown length, and we want to exhaust all items before continuing. We don't know the number up front, so we can't use the structure:

use std::thread;
use std::sync::{Arc, Barrier};

fn from_iter(iter: impl Iterator<Item = MyData>) {
    let mut handles = Vec::new();
    let barrier = Arc::new(Barrier::new(/* ??? */));
    for item in iter {
        // ... create a thread and wait within
    }
    // ... join all handles
}

Counting at runtime is nothing exceptional, though. In our example Arc already counts entirely at runtime, working without being passed a count ahead of time. And though we don't have a dynamic version of Barrier in the stdlib, we do have one in the ecosystem in the form of crossbeam::sync::WaitGroup. With it we can rewrite our previous example like so:

use std::thread;
use crossbeam::sync::WaitGroup;

fn from_iter(iter: impl Iterator<Item = MyData>) {
    let mut handles = Vec::new();
    let wg = WaitGroup::new();    // 1. create a new instance.
    for item in iter {
        let wg = wg.clone();      // 2. clone it
        handles.push(thread::spawn(move || {
            println!("before wait");
            wg.wait();            // 3. wait to sync
            println!("after wait");
        }));
    }

    wg.wait();                    // 4. also sync our original wg instance

    // ... join all handles
}

The upside of using WaitGroup over Barrier is that it's much easier to use, and more widely applicable. The downside is that it's (marginally) less efficient, since we need to count the instances at runtime rather than having that information up front 1.

1

One could argue that manually passing the count up front acts as an extra safeguard. But there are inherent risks to miscounting the number of instances, so I'm not sure it's much of a safeguard in practice. More on that later.

I don't want to argue for including WaitGroup in the stdlib since it's unclear how common the use of Barrier is. But I hope to have illustrated that Barrier has limitations in how it can be used to create cross-thread synchronization points.

const Barrier

As we've seen in the first example there's another issue: we need to ensure that the number passed to Barrier::new matches the number of threads we spawn. One way we could fix this is by creating a variable, and passing that same variable to both the loop and Barrier::new:

let count = 10;
let mut handles = Vec::new();
let barrier = Arc::new(Barrier::new(count));
for _ in 0..count {
    let c = Arc::clone(&barrier);
    /* ... */
}

This definitely works, but it still requires manually keeping the value passed to Barrier::new in sync with the value used in the loop. Even though it's more efficient than using WaitGroup, it's less ergonomic to use. However, if we as humans look at the code, it's immediately clear what the value in Barrier::new should be: we're calling Arc::clone ten times, so we know ahead of time that the value we pass to Barrier::new should be 10.

Now here's what I wonder (and honestly, am not sure about). Given that the number of clones can be statically inferred by a human reading the code, could we find a way for this information to become statically available during const evaluation?

let count = 10;
let mut handles = Vec::new();
let barrier = Barrier::new();
for _ in 0..count {
    let c = barrier.clone();
    /* ... */
}

That would allow us to create a dynamic barrier and a compile-time barrier with similar APIs but different assurances. The dynamic barrier would be more flexible and work for unknown inputs, and the compile-time barrier would be more efficient, but only work if the number of items is known at compile time.

Conclusion

Using std::sync::Barrier here only acts as an example. The same reasoning could be applied to e.g. Arc, Rc, channel, and more. For channels, for example, it could be interesting if we could compute ahead of time the maximum number of items that might be in flight, and fill that in as the queue size. Caveats and circumstances apply, but the general premise is: "If some item count is known at compile time, is there a way we can pass that number back to our structures?"

Crates such as static_rc are getting close to this idea, but still require manually passing in the count. I'm wondering whether it could be possible to eliminate that count entirely.

edit: I found this thread by the author of static_rc which asks very similar questions to the ones here. Also, lcnr shared some thoughts on Twitter about the feasibility of refcounting in const.

Before closing out this post, I want to reiterate once more that none of this is a concrete proposal. I'm not even advocating for including crossbeam::sync::WaitGroup in the stdlib. The purpose of this post is to share thinking, and to ask questions about what is desirable and possible within (the future of) the Rust language.

And with that, thanks so much for reading this - happy weekend!