Keywords I: Unsafe Syntax
— 2022-07-09
- two meanings of unsafe
- bringing it back to traits
- do we have options?
- okay, so what if we start coloring outside the lines?
- but owo there's more!
- going even further?
- conclusion
Last week Oli and I were thinking through some examples involving modifiers and `impl` blocks. Namely: would it be possible, or even useful, to declare something like this:
```rust
// all methods in this block are `const`.
const impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}
```
We thought this looked alright. But something we realized is that if we allowed this for `const`, we'd likely want to allow it for all modifier keywords too, including `unsafe`. And that unfortunately causes some issues. So this post is a brief look at what those issues are, what plans exist to improve things, and how we might even be able to do better.
Note: The purpose of this post is mostly to share thoughts. I don't speak for Oli here, and I definitely don't speak for any Rust team. Don't think of this as a serious proposal; instead, think of it as notes which may come in useful at a later time.
Two meanings of unsafe
`unsafe` in Rust is actually two keywords in a trenchcoat. Depending on where you use it, it means something different:
- On definition: `unsafe fn` / `unsafe trait`
- On usage: `unsafe {}` / `unsafe impl`
Each provides the following contract:
- On definition: there are additional invariants which the compiler cannot check, so the programmer must check those by hand instead.
- On usage: I'm manually checking invariants here which the compiler cannot check for me.
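To make the two contracts concrete in today's Rust, here is a small sketch with one example of each use. The names `element_at` and `Zeroable` are hypothetical, made up for illustration; `get_unchecked` and `std::mem::zeroed` are real stdlib `unsafe fn`s.

```rust
use std::mem;

/// Hypothetical helper. Definition side (`unsafe fn`): callers must
/// uphold an invariant the compiler can't check.
///
/// # Safety
/// `index` must be less than `slice.len()`.
unsafe fn element_at(slice: &[u32], index: usize) -> u32 {
    // SAFETY: the caller guarantees `index` is in bounds.
    unsafe { *slice.get_unchecked(index) }
}

/// Hypothetical trait. Definition side (`unsafe trait`): implementors
/// must uphold an invariant the compiler can't check.
///
/// # Safety
/// An all-zero bit pattern must be a valid value of the implementing type.
unsafe trait Zeroable: Sized {
    fn zeroed() -> Self {
        // SAFETY: guaranteed by whoever wrote the `unsafe impl`.
        unsafe { mem::zeroed() }
    }
}

// Usage side (`unsafe impl`): "I checked": all-zero bits are a valid `u64`.
unsafe impl Zeroable for u64 {}

fn main() {
    let data = [10, 20, 30];
    // Usage side (`unsafe {}`): "I checked": 2 < data.len().
    let x = unsafe { element_at(&data, 2) };
    println!("{x} {}", u64::zeroed()); // prints "30 0"
}
```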
Together these form a pair, not unlike `async`/`.await`. The slight difference is that writing an `unsafe` block in a function body does not come with the additional requirement that the function itself must also be marked `unsafe`:
```rust
// using `.await` requires you wrap it in an `async` context, which propagates
// to the next caller:
async fn foo() {
    bar().await;
}

// however using `unsafe` doesn't require you wrap it in an `unsafe` context;
// `fn foo` doesn't also need to be `unsafe fn foo` here:
fn foo() {
    unsafe { bar() };
}
```
And this makes sense: pretty much all of Rust builds on some foundation of `unsafe`. So if `unsafe` were transitive the same way `async`/`.await` is, virtually every function would need to be marked `unsafe`, which would defeat the point of using `unsafe` as a signpost for where we're manually checking invariants.
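Here's a minimal sketch of that non-transitivity, using a hypothetical helper `ascii_str`. It checks the invariant itself, then calls the real stdlib `unsafe fn` `std::str::from_utf8_unchecked`, and exposes a fully safe signature, so callers never see `unsafe` at all.

```rust
/// Hypothetical helper: views `bytes` as a `&str` if they are pure ASCII.
/// The signature is fully safe; the `unsafe` stays inside.
fn ascii_str(bytes: &[u8]) -> Option<&str> {
    if !bytes.is_ascii() {
        return None;
    }
    // SAFETY: ASCII bytes are always valid UTF-8, and we just checked that.
    Some(unsafe { std::str::from_utf8_unchecked(bytes) })
}

fn main() {
    // No `unsafe` block needed at the call site: it doesn't propagate.
    println!("{:?}", ascii_str(b"hello")); // prints Some("hello")
    println!("{:?}", ascii_str(&[0xff])); // prints None
}
```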
Bringing it back to traits
So okay, that's all well and good. But I mentioned we got onto this because we were exploring the idea of allowing keyword modifiers on entire blocks. Which is a useful thing to have if, idk, you're in the process of marking the majority of the stdlib as `const` [1]. If you're able to declare things in a more general way it makes things, uh, flow a bit more.
[1] In practice very few APIs can never be called from `const fn` functions. Those are: all APIs which need knowledge of the host system (OS, filesystem, network devices, etc.) or access to statics. I think that's like, less than 10% of the total API surface in the stdlib, meaning the vast majority may eventually be constified. And that's not even getting into adding other types of effects (cough async cough) to the stdlib as well.
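For comparison, this is roughly what declaring `const` methods looks like today without a block-level modifier: each method opts in individually. The `Celsius` type is a made-up example, not a stdlib API.

```rust
// Hypothetical example type; not from the stdlib.
struct Celsius(i32);

impl Celsius {
    // Today every method has to opt in to `const` one at a time.
    const fn new(degrees: i32) -> Self {
        Celsius(degrees)
    }

    // Integer arithmetic only, which is allowed in `const fn`.
    const fn to_fahrenheit(&self) -> i32 {
        self.0 * 9 / 5 + 32
    }
}

// Because both methods are `const fn`, this is evaluated at compile time.
const BOILING_F: i32 = Celsius::new(100).to_fahrenheit();

fn main() {
    println!("{BOILING_F}"); // prints 212
}
```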
But here's where it gets odd. Take the following code:
```rust
struct MyStruct;
unsafe impl Send for MyStruct {}
```
This says: "we promise the implementation of `Send` here is safe". But if we adapt our `const impl` example from earlier to be an `unsafe impl` instead, we get this:
```rust
// The intended meaning here is:
// "all methods in this block are `unsafe` to call"
unsafe impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}
```
This has the exact opposite meaning of our earlier `unsafe impl`! Doing this means we'd have a semantic difference between inherent impls [2] and trait impls. That's a suuuuuper subtle distinction, and one which is almost guaranteed to trip people up!
[2] This is jargon for: "a regular impl block on a type", as opposed to a trait impl block.
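For reference, today's Rust rejects `unsafe impl` on an inherent impl outright, so "all methods in this block are unsafe to call" has to be written as one `unsafe fn` per method. A sketch reusing the `MyStruct`/`foo`/`bar` names from the examples above; the `data` field and the method bodies are hypothetical.

```rust
// Hypothetical field, added so the methods have something to do.
struct MyStruct {
    data: Vec<u8>,
}

// `unsafe impl MyStruct { ... }` is rejected for inherent impls today, so
// "every method here is unsafe to call" is spelled out one method at a time.
impl MyStruct {
    /// # Safety
    /// `i` must be less than `self.data.len()`.
    unsafe fn foo(&self, i: usize) -> u8 {
        // SAFETY: the caller guarantees `i` is in bounds.
        unsafe { *self.data.get_unchecked(i) }
    }

    /// # Safety
    /// `i` must be less than `self.data.len()`.
    unsafe fn bar(&mut self, i: usize, value: u8) {
        // SAFETY: the caller guarantees `i` is in bounds.
        unsafe { *self.data.get_unchecked_mut(i) = value }
    }
}

fn main() {
    let mut s = MyStruct { data: vec![1, 2, 3] };
    // SAFETY: index 1 is in bounds for a three-element vector.
    unsafe {
        s.bar(1, 42);
        println!("{}", s.foo(1)); // prints 42
    }
}
```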
Do we have options?
There are some things we might be able to do here. First option:
Do nothing!
Yes, we could definitely choose to not care about this and move on. But that's an intellectual dead end, and I don't like those. I'm choosing to care about this instead; at least enough to write about it and think things through a bit more.
Color within the lines!
So what if we restricted ourselves to making only additive changes here? No breaking changes allowed. Well, we'd have to concede that `unsafe impl` has the meaning it has, and we can't change that. So maybe we could, for example, change where we put the keyword to give it a different meaning. What if we could write something like this instead:
```rust
struct MyStruct;

// `unsafe impl` - this impl has been checked
unsafe impl Send for MyStruct {}

// `impl unsafe` - the methods here need to be checked when used
impl unsafe MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}
```
We could make it so `unsafe impl` means "this impl has been checked", and `impl unsafe` means "this implementation needs to be checked". That's still subtle, but at least the two would no longer look identical. Adapting it to `const` would then be:
```rust
// `const impl` would be disallowed; that's for `unsafe` only.
impl const MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}
```
This is workable, but imo not ideal. When I say things out loud I definitely say it's a "const impl" or "async impl". But we'd be writing it the other way around.
Okay, so what if we start coloring outside the lines?
What if we actually took a step back and asked ourselves: "if we could start from scratch, what would we change about `unsafe`?" Well, for starters, I think our memory model is looking pretty good! So we wouldn't want to bother too much there, and would focus more on the syntax.
Today, having an `unsafe fn` implies the function body is an `unsafe {}` block. RFC 2585 (2020) was merged to break that assumption, presumably over the course of a few editions (though afaict no decisions have been made on the exact process). That would move us one step closer to differentiating between the two meanings of `unsafe`.
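The RFC 2585 behavior can already be opted into today through the `unsafe_op_in_unsafe_fn` lint (allow-by-default on older editions). A minimal sketch with a hypothetical `read_u32` helper; the lint is applied to that single function so the snippet stays self-contained.

```rust
// Opt in to RFC 2585's behavior for this one function: inside it, the
// raw-pointer dereference needs its own `unsafe {}` block, or it's an error.
/// Hypothetical helper.
///
/// # Safety
/// `ptr` must be non-null, aligned, and point to an initialized `u32`.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn read_u32(ptr: *const u32) -> u32 {
    // SAFETY: the caller upholds the contract documented above.
    unsafe { *ptr }
}

fn main() {
    let value = 42u32;
    // SAFETY: `&value` coerces to a valid, aligned pointer to an initialized `u32`.
    let read = unsafe { read_u32(&value) };
    println!("{read}"); // prints 42
}
```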
But what if we deprecated `unsafe` entirely instead? RFC 117 (2014) suggested we do exactly this, albeit pre-1.0. Given that only the syntax would change and the semantics of `unsafe` would stay the same, I believe this is the type of change we could make across an edition boundary.
Personally I like `checked`/`unchecked` to differentiate between the two types of `unsafe` [3]. It's common terminology in the stdlib today already, and it would solve the problem of "what did you mean by `unsafe`?" entirely.
[3] I like "checked" more than I like "safe". When I hear "safe", it sounds to me like we've promised there are no bugs. And to quote Dijkstra: "Testing shows the presence, not the absence of bugs." At best we can guarantee we've checked for bugs, and it's up to others to decide whether they deem that safe enough.
Our code examples could then be written like this:
```rust
struct MyStruct;

// this implementation has been checked
checked impl Send for MyStruct {}

// these methods need to be checked
unchecked impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}
```
See! No more ambiguity. And I think that'd be a huge step up from the status quo! To spell out the other uses as well, here are all existing uses of `unsafe` translated to use "check" terminology:
```rust
// `unsafe fn`
unchecked fn foo() {}

// `unsafe trait`
unchecked trait MyTrait {}

// `unsafe impl`
checked impl Send for MyStruct {}

// `unsafe {}`
fn foo() {
    checked { bar() };
}
```
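The `checked`/`unchecked` naming this would borrow already exists in the stdlib: `checked_add` reports overflow as `None`, while `get_unchecked` is an `unsafe fn` that leaves the bounds check to the caller. A small sketch; the wrapper functions `add_checked` and `last_via_unchecked` are hypothetical.

```rust
/// Hypothetical wrapper around the stdlib's `checked_add`: the library
/// does the checking and reports overflow as `None`.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Hypothetical wrapper around `get_unchecked`, an `unsafe fn` that leaves
/// the bounds check to us, so we do it by hand first.
fn last_via_unchecked(v: &[i32]) -> Option<i32> {
    if v.is_empty() {
        return None;
    }
    // SAFETY: `v` is non-empty, so `v.len() - 1` is in bounds.
    Some(unsafe { *v.get_unchecked(v.len() - 1) })
}

fn main() {
    println!("{:?} {:?}", add_checked(250, 5), add_checked(250, 10)); // prints "Some(255) None"
    println!("{:?}", last_via_unchecked(&[1, 2, 3])); // prints "Some(3)"
}
```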
But owo there's more!
What? There's more?
Yes! Because we now have a keyword pair, we can take a page out of the book of other keyword pairs. Ergonomics matter in all parts of a language, but especially so in places where the stakes are high. So what if we allowed
In RFC 2585 Niko showed that many `unsafe` functions today are one-liners, and that making `unsafe fn` no longer imply `unsafe {}` would yield the following result:
```rust
unsafe fn foo(...) -> ... { unsafe {
    // Code goes here
} }
```
Now, what if... we did away with `unsafe {}` blocks (or at least supplemented them) and added a postfix operator? What if we could do this instead:
```rust
unchecked fn foo(...) -> ... {
    bar().checked
}
```
We can probably debate exact words and syntax all day long. But the idea doesn't seem too bad, does it? Now granted, perhaps there are solid reasons to keep blocks as well. But it's common advice that `unsafe` blocks should be scoped as tightly as possible. In today's Rust that often means we're writing things like:
```rust
unsafe fn foo(...) -> ... {
    unsafe { bar() }.baz()
}
```
And I think in comparison having a postfix notation would look a lot better:
```rust
unchecked fn foo(...) -> ... {
    bar().checked.baz()
}
```
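For comparison, here is a runnable version of today's tightly scoped form: only the unchecked conversion sits inside the `unsafe` block, and the rest of the method chain is ordinary safe code. `char_count` is a hypothetical function; `std::str::from_utf8_unchecked` is the real stdlib `unsafe fn`.

```rust
/// Hypothetical function: counts the chars in `bytes`.
///
/// # Safety
/// `bytes` must be valid UTF-8.
unsafe fn char_count(bytes: &[u8]) -> usize {
    // SAFETY: the caller guarantees `bytes` is valid UTF-8. Only the
    // conversion is inside the block; `.chars().count()` is safe code.
    unsafe { std::str::from_utf8_unchecked(bytes) }.chars().count()
}

fn main() {
    let text = "héllo";
    // SAFETY: the bytes of a `&str` are always valid UTF-8.
    let n = unsafe { char_count(text.as_bytes()) };
    println!("{n}"); // prints 5
}
```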
Going even further?
The core principle of Rust is that it can wrap unchecked internals into a package which exposes checked semantics. This allows us to clearly scope which parts of the program need to deal with manually checking invariants, and allows the rest of the program to not have to care about any of that. Instead the compiler helps us.
But not all checks are equal. I trust the stdlib devs to have carefully pored over every single line of `unsafe` and done their best to ensure it's correct. At worst there might be a bug. But when I'm downloading code from third parties, well, we can be exposed to far more than bugs [4].
[4] I believe this is the first supply chain attack we've become aware of. It does something bad during compilation, but a malicious crate might just as well choose to do something bad at runtime. Supply chain attacks are real, and as Rust grows, so will the frequency of attacks.
`unsafe` is the building block that everything else is built on. Allowing all code we download to do anything at all is not great. I mean: it's great insofar as we can wrap `openssl`, `libgit`, `winapi`, and more. But we tend to vet those packages and make sure they're from trusted sources. It's not great when that one random protocol parser I downloaded turns out to be a piece of ransomware.
Sunfish wrote a thought experiment about what it would take to create a sandbox backed by the Rust type system. At the very least, we'd need a knob to turn off unrestricted access to `unsafe` for every function in every dependency we have.
```rust
pub fn im_a_sweet_and_innocent_parser_uwu(reader: impl Read) -> ParsedTree {
    haha_no_im_not_lemme_break_into_ur_unpatched_kernel_real_quick();
    ParsedTree::new(reader)
}
```
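The closest existing knob is the `unsafe_code` lint: `#[forbid(unsafe_code)]` turns any `unsafe` block or `unsafe fn` declaration in its scope into a hard error. It only applies to code you annotate yourself, though; it can't stop a dependency from using `unsafe` internally, which is exactly the gap described here. A sketch with a hypothetical `parser` module and `ParsedTree` type (unrelated to the function above):

```rust
// A module opting out of `unsafe` entirely: any `unsafe` block or
// `unsafe fn` declared inside it becomes a compile error.
#[forbid(unsafe_code)]
mod parser {
    use std::io::Read;

    /// Hypothetical parse result: just records how many bytes were read.
    pub struct ParsedTree {
        pub len: usize,
    }

    /// Hypothetical parser. Calling safe stdlib APIs is still allowed,
    /// even though the stdlib uses `unsafe` internally.
    pub fn parse(mut reader: impl Read) -> std::io::Result<ParsedTree> {
        let mut buf = Vec::new();
        reader.read_to_end(&mut buf)?;
        Ok(ParsedTree { len: buf.len() })
    }
}

fn main() -> std::io::Result<()> {
    // `&[u8]` implements `Read`.
    let tree = parser::parse(&b"hello"[..])?;
    println!("{}", tree.len); // prints 5
    Ok(())
}
```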
The reason I'm raising it here is that, in a way, this goes in the opposite direction of RFC 2585. Instead of decoupling `unsafe fn` from `unsafe {}`, it almost creates a tighter relationship between the two, but through other means: in order to use `unsafe {}`/`checked {}` anywhere in a function, the function signature would need to reflect that.
We wouldn't want to make the `unsafe` keyword transitive, so this would need to behave differently than `async`/`.await`. But in order to enable `unsafe` to be used, we'd want to share the capability to do so via the function signature. The only code exempt from this mechanism would be the stdlib, since if you trust the compiler you should also trust the stdlib.
```rust
// `foo` needs maximum permissions ("ambient authority") to
// perform `unsafe`/`checked` operations:
fn foo() with ambient_authority {
    bar().checked
}
```
This would allow us to start flagging which functions depend on having the keys to the castle, and which do not. Obviously that's not the full story; capabilities, and especially capability safety, are a pretty deep topic in and of themselves. But because they would affect the signatures of functions that use `unsafe`, it's worth bringing up here.
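Until something like a `with ambient_authority` clause exists, the closest library-level approximation is passing an unforgeable token type. This is a sketch under loud assumptions: `AmbientAuthority`, `root`, `read_first`, and `sum` are all hypothetical, and nothing in the compiler stops a function without the token from using `unsafe`. It's a convention only, which is exactly why the language-level version would matter.

```rust
mod authority {
    /// Hypothetical capability token. The field is private, so code
    /// outside this module can't construct one by hand.
    pub struct AmbientAuthority(());

    impl AmbientAuthority {
        /// Hypothetical: in a real design only the entry point (or a trusted
        /// runtime) could mint this. Here it's just a public function.
        pub fn root() -> Self {
            AmbientAuthority(())
        }
    }
}

use authority::AmbientAuthority;

/// Takes the token, so its need for `unsafe` is visible in the signature.
fn read_first(data: &[u32], _auth: &AmbientAuthority) -> Option<u32> {
    if data.is_empty() {
        return None;
    }
    // SAFETY: we checked above that index 0 is in bounds.
    Some(unsafe { *data.get_unchecked(0) })
}

/// Doesn't take the token, so by convention it doesn't reach for `unsafe`.
fn sum(data: &[u32]) -> u32 {
    data.iter().sum()
}

fn main() {
    let auth = AmbientAuthority::root();
    let data = [3, 4, 5];
    println!("{:?} {}", read_first(&data, &auth), sum(&data)); // prints "Some(3) 12"
}
```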
Conclusion
Okay, so we kind of went hard on the changes we could plausibly want to make to `unsafe` in the future. But I think that's fine! We at least figured out a plausible way around the main drawback of RFC 2585, though I guess a postfix `unsafe` keyword could hold water even without the rename to `checked`/`unchecked`.
But equally importantly, this would actually open the path to allowing `const impl {}`, `async impl`, and possibly other effects in the future as well. And that's probably needed, because as we're looking to make our type system more powerful, we don't want people to have to actually type more (get it) to make use of it.
This initially was a thread on Twitter, but I figured it might be worth giving it some permanence. I hope you enjoyed it; love you.