refcounts
— 2021-05-07
Something I've wondered about for a while now is how to improve the ergonomics of some of the structures in std::sync. There's quite a bit of excitement gathering about ghost-cell (and the paper), and I figured this would be as good a time as any to share some thoughts (questions?) on ergonomics.
Like my last post, the intent of this post is less to present a solution or the result of research, and more to share my current thinking on the topic. Rather than asking questions in private, I figured I'd ask them out here instead (:
std::sync::Barrier
Let's center our examples around std::sync::Barrier. This is a data structure which enables synchronization between any number of threads. One example of how you might want to use it is in an HTTP server: you may want to create a connection to a database, connect to a cache server, and connect to perhaps some other backend, and only when all of those have succeeded do you proceed to accept connections. If one of them takes a while to initialize (or even fails entirely), our end-users won't be impacted because none of their requests are being handled yet.

This structure enables multiple threads to synchronize before starting work. Its use usually looks something like this:
```rust
use std::sync::{Arc, Barrier};
use std::thread;

let mut handles = Vec::with_capacity(10);
let barrier = Arc::new(Barrier::new(10));
for _ in 0..10 {
    let c = Arc::clone(&barrier);
    // The same messages will be printed together.
    // You will NOT see any interleaving.
    handles.push(thread::spawn(move || {
        println!("before wait");
        c.wait();
        println!("after wait");
    }));
}

// Wait for other threads to finish.
for handle in handles {
    handle.join().unwrap();
}
```
We create an instance of Barrier by passing it a count, wrap it in an Arc (so it can be cloned), and then pass an instance of it to each thread. Important to note is that the number of threads we spawn needs to exactly match the number we've passed into Barrier::new. If the number is too high our application will lock up and never continue. If the number is too low, the application will start before all threads are ready, defeating the point of using the structure in the first place.
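To make the too-low case concrete, here's a small variation of the example above. With a count of 1, every call to wait is satisfied immediately, so the prints can interleave freely across threads and the barrier no longer synchronizes anything:

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// The count should have been 10; with 1, `wait` never blocks.
let barrier = Arc::new(Barrier::new(1));
let mut handles = Vec::new();
for _ in 0..10 {
    let c = Arc::clone(&barrier);
    handles.push(thread::spawn(move || {
        println!("before wait");
        c.wait(); // returns immediately: the count is already met
        println!("after wait");
    }));
}
for handle in handles {
    handle.join().unwrap();
}
```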
dynamic Barrier
One difficulty with std::sync::Barrier is that we need to statically pass the number in. This means that in order to create the synchronization structure we need to know up front how many items we're trying to synchronize. What if we're dealing with an iterator of unknown length, and we want to exhaust all items before continuing? We don't know the number up front, so we can't use the structure:
```rust
use std::thread;
use std::sync::{Arc, Barrier};

fn from_iter(iter: impl Iterator<Item = MyData>) {
    let mut handles = Vec::new();
    let barrier = Arc::new(Barrier::new(/* ??? */));
    for item in iter {
        // ... create a thread and wait within
    }
    // ... join all handles
}
```
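One workaround is to buffer the iterator before constructing the Barrier, so that its length is known up front. A sketch, reusing the MyData placeholder from above and assuming it is Send + 'static:

```rust
use std::thread;
use std::sync::{Arc, Barrier};

fn from_iter(iter: impl Iterator<Item = MyData>) {
    // Buffer all items so the count is known before we start.
    let items: Vec<MyData> = iter.collect();
    let barrier = Arc::new(Barrier::new(items.len()));
    let mut handles = Vec::with_capacity(items.len());
    for item in items {
        let c = Arc::clone(&barrier);
        handles.push(thread::spawn(move || {
            // ... do per-item setup with `item`, then synchronize
            let _item = item;
            c.wait();
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
}
```

This sidesteps the problem rather than solving it: we've traded the static count for an intermediate allocation, and it won't work for unbounded streams.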
Counting at runtime is not something exceptional though. In our example Arc already works entirely at runtime: it tracks its count as instances are created and dropped, without being told a number ahead of time.
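We can observe that runtime count directly through Arc::strong_count:

```rust
use std::sync::Arc;

// Arc tracks the number of live instances at runtime; no count is
// declared up front.
let data = Arc::new(5);
assert_eq!(Arc::strong_count(&data), 1);

let data2 = Arc::clone(&data);
assert_eq!(Arc::strong_count(&data), 2);

drop(data2);
assert_eq!(Arc::strong_count(&data), 1);
```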
Though we don't have a dynamic version of Barrier in the stdlib, we do have one in the ecosystem in the form of crossbeam::sync::WaitGroup. With it we can rewrite our previous example like so:
```rust
use std::thread;
use crossbeam::sync::WaitGroup;

fn from_iter(iter: impl Iterator<Item = MyData>) {
    let mut handles = Vec::new();
    let wg = WaitGroup::new();        // 1. create a new instance
    for item in iter {
        let wg = wg.clone();          // 2. clone it
        handles.push(thread::spawn(move || {
            println!("before wait");
            wg.wait();                // 3. wait to sync
            println!("after wait");
        }));
    }
    wg.wait();                        // 4. also sync our original wg instance
    // ... join all handles
}
```
The upside of using WaitGroup over Barrier is that it's much easier to use, and more widely applicable. The downside is that it's (marginally) less efficient, since we need to count the instances at runtime rather than having that information up front.¹

¹ One could argue that manually passing the count up front can act as an extra safeguard, but there are inherent risks to mis-counting the number of instances. So I'm not sure that's much of an extra safeguard. More on that later.
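To make "counting the instances at runtime" concrete, here's a minimal sketch of how a wait-group can be built: cloning bumps a shared counter, dropping decrements it, and wait blocks until the counter reaches zero. This is an illustration only, not crossbeam's actual implementation:

```rust
use std::sync::{Arc, Condvar, Mutex};

// Minimal wait-group sketch: every clone bumps a shared counter,
// every drop decrements it, and `wait` blocks until zero.
struct MiniWaitGroup {
    inner: Arc<Inner>,
}

struct Inner {
    count: Mutex<usize>,
    cvar: Condvar,
}

impl MiniWaitGroup {
    fn new() -> Self {
        MiniWaitGroup {
            inner: Arc::new(Inner {
                count: Mutex::new(1),
                cvar: Condvar::new(),
            }),
        }
    }

    // Consumes `self`: dropping it decrements the counter, after
    // which we block until every other instance has been dropped.
    fn wait(self) {
        let inner = Arc::clone(&self.inner);
        drop(self); // runs `Drop`, decrementing the count
        let mut count = inner.count.lock().unwrap();
        while *count > 0 {
            count = inner.cvar.wait(count).unwrap();
        }
    }
}

impl Clone for MiniWaitGroup {
    fn clone(&self) -> Self {
        *self.inner.count.lock().unwrap() += 1;
        MiniWaitGroup { inner: Arc::clone(&self.inner) }
    }
}

impl Drop for MiniWaitGroup {
    fn drop(&mut self) {
        let mut count = self.inner.count.lock().unwrap();
        *count -= 1;
        if *count == 0 {
            self.inner.cvar.notify_all();
        }
    }
}
```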
I don't want to argue for including WaitGroup in the stdlib, since it's unclear how common the use of Barrier is. But I hope to have illustrated that Barrier has limitations in how it can be used to create cross-thread synchronization points.
const Barrier
As we've seen in the first example there's another issue: we need to ensure that the number passed to Barrier::new matches the number of threads we spawn. One way we could fix this is by creating a variable, and passing that same variable to both the loop and Barrier::new:
```rust
let count = 10;
let mut handles = Vec::new();
let barrier = Arc::new(Barrier::new(count));
for _ in 0..count {
    let c = Arc::clone(&barrier);
    /* ... */
}
```
This definitely works, but it still requires manually keeping the value passed to Barrier::new in sync with the value in the loop. Even though it's more efficient than using WaitGroup, it's less ergonomic to use. However, if we as humans look at the code, it's immediately clear what the value in Barrier::new should be. Namely: we're calling Arc::clone ten times, so we know ahead of time that the value we pass to Barrier::new should be 10.
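In today's Rust we can at least make the count a single source of truth by threading it through a const-generic parameter. A sketch, using a hypothetical spawn_synced helper:

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical helper: a single const-generic parameter `N` drives
// both the barrier's count and the number of spawned threads, so
// the two can't drift apart.
fn spawn_synced<const N: usize>(work: impl Fn() + Send + Sync + 'static) {
    let barrier = Arc::new(Barrier::new(N));
    let work = Arc::new(work);
    let handles: Vec<_> = (0..N)
        .map(|_| {
            let barrier = Arc::clone(&barrier);
            let work = Arc::clone(&work);
            thread::spawn(move || {
                barrier.wait();
                work();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}

// Usage: spawn_synced::<10>(|| println!("synced"));
```

The number is still written by hand, but now in only one place.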
Now here's what I wonder (and honestly, am not sure about). Given that the number of clones can be statically inferred by humans looking at the code, could we find a way for this information to become statically available during const evaluation?
```rust
let count = 10;
let mut handles = Vec::new();
let barrier = Barrier::new(); // hypothetical: no count passed in
for _ in 0..count {
    let c = barrier.clone();
    /* ... */
}
```
That would allow us to create a dynamic barrier and a compile-time barrier with similar APIs, but different assurances. The dynamic barrier would be more flexible and work for unknown inputs; the compile-time barrier would be more efficient, but only work if the number of items is known at compile time.
Conclusion
Using std::sync::Barrier only acts as an example here. The same reasoning could be applied to e.g. Arc, Rc, channel, and more. For example, for channels it could be interesting if we could compute ahead of time the maximum number of items that might be in flight, and fill that in as the queue size. Caveats and circumstances apply, but the general premise is: "If some item count is known at compile time, is there a way we can pass that number back to our structures?"
Crates such as static_rc are getting close to this idea, but still require manually passing in the count. I'm wondering whether it could be possible to eliminate that count entirely.
edit: I found this thread by the author of static_rc which asks very similar questions to the ones we're asking here. Also, lcnr shared some thoughts on Twitter about the feasibility of refcounting in const.
Before closing out this post, I want to reiterate once more that none of this is a concrete proposal. I'm not even advocating for including crossbeam::sync::WaitGroup in the stdlib. The purpose of this post is to share thinking, and to ask questions about what is desirable and possible within (the future of) the Rust language.
And with that, thanks so much for reading this - happy weekend!