Keywords I: Unsafe Syntax
— 2022-07-09

Last week Oli and I were thinking through some examples involving modifiers and impl blocks. Namely: would it be possible or even useful to declare something like this:

// all methods in this block are `const`.
const impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}

We thought this looked alright. But something we realized is that if we allowed this for const, we'd likely want to allow this for all modifier keywords too, including unsafe. And that unfortunately causes some issues. So this post is a brief look at what those issues are, what plans exist to improve it, and how we might even be able to do things better?
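
For reference, here is a minimal sketch of what stable Rust asks for today: the modifier is repeated on every method. The `Meters` type and its methods are made up for illustration and aren't from any real crate.

```rust
// Hypothetical type, used only for illustration.
struct Meters(u32);

impl Meters {
    // Today each method carries its own `const` modifier.
    const fn new(value: u32) -> Self {
        Meters(value)
    }

    const fn doubled(&self) -> u32 {
        self.0 * 2
    }
}

// Because both methods are `const fn`, they can be evaluated at compile time.
const DOUBLED: u32 = Meters::new(21).doubled();

fn main() {
    assert_eq!(DOUBLED, 42);
}
```

A block-level `const impl` would let us write the modifier once instead of once per method.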

Note: The purpose of this post is mostly to share thoughts. I don't speak for Oli here, and I definitely don't speak for any Rust team. Don't think of this as a serious proposal, but instead think of it as notes which may come in useful at a later time.

Two meanings of unsafe

unsafe in Rust is actually two keywords in a trenchcoat. Depending on where you use it, it'll mean something different:

On definition: unsafe fn / unsafe trait
On usage: unsafe {} / unsafe impl

Each provides the following contract:

On definition: there are additional invariants which the compiler cannot check, so the programmer must check those by hand instead.
On usage: I'm manually checking invariants here which the compiler cannot check for me.
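
To make the two roles concrete, here's a small sketch that uses all four forms in today's Rust. The names (`read_at`, `AlwaysNonEmpty`, `Greeting`, `first_byte`) are invented for this example; only `get_unchecked` comes from the stdlib.

```rust
/// Definition: the caller must guarantee `index < values.len()`.
unsafe fn read_at(values: &[u8], index: usize) -> u8 {
    // Usage: in bounds per this function's own contract.
    unsafe { *values.get_unchecked(index) }
}

/// Definition: implementors must guarantee `bytes` never returns an empty slice.
unsafe trait AlwaysNonEmpty {
    fn bytes(&self) -> &[u8];
}

struct Greeting;

// Usage: we checked by hand that `bytes` always returns two bytes.
unsafe impl AlwaysNonEmpty for Greeting {
    fn bytes(&self) -> &[u8] {
        b"hi"
    }
}

fn first_byte<T: AlwaysNonEmpty>(value: &T) -> u8 {
    // Usage: index 0 is in bounds because of the trait's contract.
    unsafe { read_at(value.bytes(), 0) }
}

fn main() {
    assert_eq!(first_byte(&Greeting), b'h');
}
```

Note how `first_byte` itself is a plain safe `fn`: the promise made by `unsafe impl` is what lets it discharge the obligation of `unsafe fn read_at`.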

Together these form a pair - not unlike async/.await. With the slight difference that defining an unsafe block in a function body does not come with the additional restriction that the function itself must also be marked as unsafe:

// using `.await` requires you wrap it in an `async` context, which propagates
 // to the next caller:
async fn foo() {
    bar().await;
}

// however using `unsafe` doesn't require you wrap it in an `unsafe` context;
// `fn foo` doesn't also need to be `unsafe fn foo` here:
fn foo() {
    unsafe { bar() };
}

And this makes sense: pretty much all of Rust builds on some foundation of unsafe. So if unsafe was transitive much the same way async/.await is, then virtually every function would need to be marked unsafe - which would defeat the point of using unsafe as a sign post for where we're manually checking invariants.

Bringing it back to traits

So okay, that's all well and good. But I mentioned we got onto this because we were exploring the idea of allowing keyword modifiers on entire blocks. Which is a useful thing to have if, idk, you're in the process of marking the majority of the stdlib as const ¹. If you're able to declare things in a more general way it makes things, uh, flow a bit more.

¹ In practice very few APIs cannot ever be called from const fn functions. Those are: all APIs which need to have knowledge of the host system (OS, filesystem, network devices, etc.) or have access to statics. I think that's like, less than 10% of the total API surface in the stdlib; meaning the vast majority may eventually be constified. Not to even begin talking about adding other types of effects (cough async cough) to the stdlib as well.

But here's where it gets odd. Take the following code:

struct MyStruct;

unsafe impl Send for MyStruct {}

This says: "we promise the implementation of Send here is safe". But if we adapt our const impl example from earlier to be unsafe impl instead we get this:

// The intended meaning here is:
// "all methods in this block are `unsafe` to call"
unsafe impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}

This has the exact opposite meaning of our earlier unsafe impl! Doing this means we'd have a semantic difference between inherent impls ² and trait impls. Which is a suuuuuper subtle distinction, and one which is almost guaranteed to trip people up!

² This is jargon for: "A regular impl block on a type". So not the trait impl block.

Do we have options?

There's some things we might be able to do here. First option:

Do nothing!

Yes, we could definitely choose to not care about this and move on. But that's an intellectual dead end, and I don't like those. I'm choosing to care about this instead. At least enough to write about it and think about it and think things through a bit more.

Color within the lines!

So what if we chose to restrict ourselves to making only additive changes here? No breaking changes allowed. Well, we could concede that unsafe impl has the meaning it has, and we can't change that. So maybe we could, for example, change where we put the keyword to give it a different meaning. What if we could make it something like this instead:

struct MyStruct;

// `unsafe impl` - this impl has been checked
unsafe impl Send for MyStruct {}

// `impl unsafe` - the methods here need to be checked when used
impl unsafe MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}

We could make it so unsafe impl means: "this impl has been checked". And impl unsafe means: "this implementation needs to be checked". That's still subtle, but at least it would no longer be identical. Adapting it to const would then be:

// `const impl` would be disallowed; that's for `unsafe` only.
impl const MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}

This is workable, but imo not ideal. When I say things out loud I definitely say it's a "const impl" or "async impl". But we'd be writing it the other way around.

Okay, so what if we start coloring outside the lines?

What if we actually took a step back and asked ourselves: "if we could start from scratch, what would we change about unsafe?" Well, for starters, I think our memory model is looking pretty good! So we wouldn't want to bother too much there, and focus more on the syntax.

Today having an unsafe fn implies the function body to be an unsafe {} block. RFC 2585 (2020) was merged to break that assumption; presumably over the course of a few editions (though afaict no decisions have been made on the exact process). But that would move us one step closer to differentiating between the two meanings of unsafe.
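
You can already opt into this behavior with the `unsafe_op_in_unsafe_fn` lint that came out of RFC 2585. Here's a minimal sketch, with a made-up `read_u32` function, applying the lint to a single item:

```rust
/// # Safety
/// `ptr` must be non-null, aligned, and point to a live `u32`.
#[deny(unsafe_op_in_unsafe_fn)]
unsafe fn read_u32(ptr: *const u32) -> u32 {
    // With the lint denied, a bare `*ptr` here would be rejected;
    // the unchecked operation needs its own block.
    unsafe { *ptr }
}

fn main() {
    let value = 7_u32;
    // SAFETY: the pointer comes from a live reference.
    let read = unsafe { read_u32(&value as *const u32) };
    assert_eq!(read, 7);
}
```

Here `unsafe fn` only describes the contract for callers, and `unsafe {}` marks the spot where we claim to have checked it.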

But what if we deprecated unsafe entirely instead? RFC 117 (2014) suggested we do exactly this, albeit pre-1.0. But given that only the syntax would change and the semantics of unsafe would stay the same, this is the type of change I believe we could roll out over an edition boundary.

Personally I like checked / unchecked to differentiate between the two types of unsafe ³. It's common terminology in the stdlib today already, and it would solve the problem of "what did you mean by unsafe" entirely. Our code examples could then be written like this:

³ I like "checked" more than I like "safe". When I hear safe to me it sounds like we've promised there are no bugs. And to quote Dijkstra: Testing shows the presence, not the absence of bugs. At best we can guarantee we've checked for bugs. And it's up to others to decide whether they deem that safe enough.

struct MyStruct;

// this implementation has been checked
checked impl Send for MyStruct {}

// these methods need to be checked
unchecked impl MyStruct {
    fn foo(&mut self) {}
    fn bar(&mut self) {}
}

See! No more ambiguity. And I think that'd be a huge step up from the status quo! To spell out the other uses as well, here's all existing uses of unsafe translated to use "check" terminology:

// `unsafe fn`
unchecked fn foo() {}

// `unsafe trait`
unchecked trait MyTrait {}

// `unsafe impl`
checked impl Send for MyStruct {}

// `unsafe {}`
fn foo() {
    checked { bar() };
}
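
As a point of comparison, here's a small sketch of how the "checked" / "unchecked" naming already shows up in the stdlib today. The helper `text_checked` is invented for the example; `str::from_utf8`, `str::from_utf8_unchecked`, and `checked_add` are real stdlib APIs. In the stdlib, "checked" means the library verifies things at runtime, which is a bit different from the "I checked this by hand" meaning proposed here.

```rust
use std::str;

/// Validates at runtime; returns `None` for invalid UTF-8.
fn text_checked(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    let bytes = b"owo";

    // The safe variant validates the bytes for us.
    let validated = text_checked(bytes).expect("valid UTF-8");

    // The `_unchecked` variant skips validation; we vouch for it ourselves.
    // SAFETY: `bytes` is an ASCII literal, which is valid UTF-8.
    let trusted = unsafe { str::from_utf8_unchecked(bytes) };
    assert_eq!(validated, trusted);

    // `checked_*` arithmetic reports overflow as `None`.
    assert_eq!(250_u8.checked_add(10), None);
}
```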

But owo there's more!

What? There's more?

Yes! Because we now have a keyword pair, we can take a page out of the book of other keyword pairs. Ergonomics matter in all parts of a language, but especially so in places where the stakes are high. So what if we allowed

In RFC 2585 Niko showed that many unsafe functions today are one-liners, and making it so unsafe fn does not imply unsafe {} would yield the following result:

unsafe fn foo(...) -> ... { unsafe {
  // Code goes here
} }

Now, what if... we did away with unsafe {} blocks (or at least supplemented them) and added a postfix operator? What if we could do this instead:

unchecked fn foo(...) -> ... { 
    bar().checked
}

We can probably debate exact words and syntax all day long. But the idea doesn't seem too bad, does it? Now granted, perhaps there are solid reasons to keep blocks as well. But it's common advice that unsafe blocks should be scoped as tightly as possible. In today's Rust that often means we're writing things like:

unsafe fn foo(...) -> ... { 
    unsafe { bar() }.baz()
}

And I think in comparison having a postfix notation would look a lot better:

unchecked fn foo(...) -> ... { 
    bar().checked.baz()
}

Going even further?

The core principle of Rust is that it can wrap unchecked internals into a package which exposes checked semantics. This allows us to clearly scope which parts of the program need to deal with manually checking invariants, and allows the rest of the program to not have to care about any of that. Instead the compiler helps us.
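
A minimal sketch of that principle, using a made-up `NonEmpty` wrapper: the unchecked operation lives inside the type, and the invariant that justifies it is enforced by the type's public API.

```rust
/// Hypothetical wrapper: a vector that is never empty.
pub struct NonEmpty<T> {
    items: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// The only constructor, so the "at least one item" invariant always holds.
    pub fn new(first: T) -> Self {
        NonEmpty { items: vec![first] }
    }

    pub fn push(&mut self, item: T) {
        self.items.push(item);
    }

    /// Safe to call: no `unsafe` leaks into the signature.
    pub fn first(&self) -> &T {
        // SAFETY: `items` starts with one element and nothing removes elements.
        unsafe { self.items.get_unchecked(0) }
    }
}

fn main() {
    let mut list = NonEmpty::new("a");
    list.push("b");
    assert_eq!(*list.first(), "a");
}
```

Callers of `first` never see `unsafe`; the manual check is scoped to one line inside the module.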

But not all checks are equal. I trust the stdlib devs to have carefully pored over every single line of unsafe and done their best to ensure it's correct. At worst there might be a bug. But when I'm downloading code from third parties, well, we can be exposed to far more than bugs ⁴.

⁴ I believe this is the first supply chain attack we've become aware of. It does something bad during compilation. But a malicious crate might just as well choose to do something bad during runtime. Supply chain attacks are real, and as Rust grows, so will the frequency of attacks.

unsafe is the building block that everything else is built on. Allowing all code we download to do anything at all is not great. I mean: it's great insofar that we can wrap openssl, libgit, winapi, and more. But we tend to vet those packages, and make sure they're from trusted sources. It's not great when that one random protocol parser I downloaded actually turned out to be a piece of ransomware.

Sunfish wrote a thought experiment about what it would take to create a sandbox backed by the Rust type system. At the very least, we'd need to have a knob to turn off giving unrestricted access to unsafe to every function in every dependency we have.

pub fn im_a_sweet_and_innocent_parser_uwu(reader: impl Read) -> ParsedTree {
    haha_no_im_not_lemme_break_into_ur_unpatched_kernel_real_quick();
    ParsedTree::new(reader)
}

The reason why I'm raising it here is because in a way this goes in the opposite direction of RFC 2585. Instead of decoupling unsafe fn from unsafe {} it almost creates a tighter relationship between the two - but through other means. In order to use unsafe {} / checked {} anywhere in a function, the function signature would need to reflect that.

We wouldn't want to make the unsafe keyword transitive so this would need to behave differently than async/.await. But in order to enable unsafe to be used, we'd want to share the capability to do so via the function signature. The only types excluded from this mechanism would be the stdlib; since if you trust the compiler you should also trust the stdlib.

// `foo` needs maximum permissions ("ambient authority") to
// perform `unsafe`/`checked` operations:
fn foo()
with
    ambient_authority
{
    bar().checked
}

This would allow us to start flagging which functions depend on having the keys to the castle, and which do not. Obviously that's not the full story: capabilities, and especially capability safety, are a pretty deep topic in and of themselves. But because they would affect the function signature of unsafe types, it's worth bringing up here.
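
To get a feel for the signature-level idea in today's Rust, here's a rough sketch that models the capability as an explicit token argument. Everything here (`AmbientAuthority`, `grant`, `read_raw`, `parse`) is hypothetical and invented for this example. It is only an emulation by convention: nothing in current Rust stops `parse` from writing `unsafe {}` itself, which is exactly the gap a language-level feature would close.

```rust
/// Hypothetical capability token; the private field prevents construction
/// from outside this module.
pub struct AmbientAuthority {
    _private: (),
}

impl AmbientAuthority {
    /// Stand-in for "the application root hands out authority".
    /// In a real design this would not be freely callable.
    pub fn grant() -> Self {
        AmbientAuthority { _private: () }
    }
}

/// Takes the token, so the signature advertises that it touches raw memory.
pub fn read_raw(_authority: &AmbientAuthority, value: &u32) -> u32 {
    let raw: *const u32 = value;
    // SAFETY: `raw` was derived from a live reference just above.
    unsafe { *raw }
}

/// No token in the signature, so this function cannot call `read_raw`.
pub fn parse(input: &str) -> usize {
    input.split_whitespace().count()
}

fn main() {
    let authority = AmbientAuthority::grant();
    assert_eq!(read_raw(&authority, &5), 5);
    assert_eq!(parse("a b c"), 3);
}
```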

Conclusion

Okay, so we kind of went hard on the changes we could plausibly want to make to unsafe in the future. But I think that's fine! We at least figured out a plausible way around the main drawback of RFC 2585, though I guess a postfix unsafe keyword could hold water even without the rename to checked/unchecked.

But equally importantly this would actually open the path to allowing const impl {}, async impl, and possibly other effects in the future as well. And that's probably needed, because as we're looking to make our type system more powerful, we don't want people to have to actually type more (get it) to make use of it.

This initially was a thread on twitter, but I figured it might be worth giving it some permanence. I hope you enjoyed it; love you.
