Pattern Extensions
— 2023-05-26

  1. shapes of types
  2. pattern initializers
  3. pattern matching
  4. pattern types
  5. the "is" keyword
  6. on product patterns
  7. on string patterns
  8. conclusion

Microsoft Build is happening this week, and with it come new announcements for the C# language and dotnet runtime. I was watching the video "What's new in C# 12 and beyond", in which the C# team talked about pattern matching and pattern constructors. I've been thinking a bit about patterns in Rust recently, and I wanted to share some of my doodles.

Disclaimer time: I'm not on the Rust language team, nor do I hold any direct decision-making power. This post is not intended to be a complete or consistent proposal, but rather to share some short ideas which seem neat. I'm not suggesting we should prioritize this work over any other work, nor arguing that this should be a high-priority item. I'm writing this because I like to keep up with programming languages - and C# made some cool announcements this week.

Shapes of types

For the purpose of this post I want to distinguish between three different shapes of types. The shapes are:

  1. list-shaped types, such as arrays
  2. product-shaped types, such as structs and maps
  3. enum-shaped types, such as Result and Option

This is intentionally not a comprehensive list, neither on the side of the shapes nor on the side of the examples. For example: in Rust we also have string-shaped items, and we also have vectors, which are a kind of list. But it's enough of a categorization to let us work through the remainder of this post.

Pattern Initializers

In Rust we have different kinds of initializer syntax available today:

let x = [0; 12];        // Initialize an array containing twelve zeros
let f = Foo {};         // Initialize a struct `Foo`
let b = Result::Ok(12); // Initialize the enum `Result` to `Ok`
let r = 1..5;           // Initialize a range from one up to (but not including) five
let s = "hello";        // Initialize a `&'static str` to "hello"

Let's take the list-initializer syntax as an example. Say we wanted to initialize not an array, but a Vec. Today we would write that as:

let x = vec![0; 12];

The vec![] macro ships with the standard library and is in scope by default, so this works for Vec. But if we wanted to initialize, say, a VecDeque, we'd be out of luck for the short-hand syntax and would need to go through a constructor function instead.

C# 12 adds a new feature called "collection literals" (demo, spec), which lets types opt in to the collection literal construction syntax. In the demo they take ImmutableList as an example and allow it to be constructed using the same syntax as a regular list. You can watch the demo or read the spec to get a full picture of how this works in C#. In Rust we could imagine doing something very similar via a trait. Roughly sketched out, it could look something like this:

let v = [0; 12];               // array
let v: Vec<_> = [0; 12];       // Vec
let v: VecDeque<_> = [0; 12];  // VecDeque
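
There is no such trait today, so the sketch below is entirely made up: a hypothetical FromListLiteral trait (name and signature invented here) that the compiler could route list literals through, roughly analogous to how C# lets types opt in to collection literals. The last two lines show how the annotated form above could desugar. Note that you can already get close today by writing [0; 12].into(), thanks to the existing From<[T; N]> impls on Vec and VecDeque.

use std::collections::VecDeque;

// Hypothetical: a trait the compiler could call whenever a list literal
// appears in a position expecting `Self`. Name and shape are invented here.
trait FromListLiteral<T> {
    fn from_list_literal<I: IntoIterator<Item = T>>(items: I) -> Self;
}

impl<T> FromListLiteral<T> for VecDeque<T> {
    fn from_list_literal<I: IntoIterator<Item = T>>(items: I) -> Self {
        items.into_iter().collect()
    }
}

// `let v: VecDeque<_> = [0; 12];` could then desugar to roughly this:
let v: VecDeque<i32> = FromListLiteral::from_list_literal([0; 12]);
assert_eq!(v.len(), 12);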

Pattern Matching

Pattern matching is another interesting feature in Rust. In C#, pattern matching was added in (if I'm not mistaken) C# 7, and it has seen steady improvements over the past number of releases. In Rust we've had pattern matching since the very beginning, but what we can match on has remained fairly limited to a number of built-in types.

Let's take a look again at the venerable list type in Rust: the array. We can match on arrays using the [] notation. Take for example the following, whose first arm matches because the pattern and the array's contents are the same:

let x = [12, 13];
match x {
    [12, 13] => {}
    _ => {}
}

For arrays that's all well and good, but for other list types it's more complicated. Take for example the Vec type: we can't match on it directly using the same [] syntax. Instead we must first obtain a slice, via the .as_slice method or the index syntax ([..]):

let x = vec![12, 13];
match x[..] {
    [12, 13] => {}
    _ => {}
}

But while we have a workaround to enable matching for Vec, we don't have that option for many other container types. For example: VecDeque doesn't necessarily hold its items in a single contiguous slice of memory, so we can't index into it directly using [..] like we can with Vec. The easiest workaround is to call VecDeque::make_contiguous, which rearranges the items in the buffer and returns a single unified slice. This does however come with a runtime cost, which means we're now trading performance for convenience:

let mut x = VecDeque::new();
x.push_front(12);
x.push_back(13);
match x.make_contiguous() { // this may move items around in memory
    [12, 13] => {}
    _ => {}
}

It would be far nicer if we could implement the matching behavior directly, rather than needing to work via slices. A lot of interesting container types don't keep their items in contiguous memory, so obtaining a slice can be difficult or expensive. But it is often reasonably cheap to visit the items in a container, which is all we would need for pattern matching to work on richer types. Just like with initializer patterns, it'd be nice if there were a way to make pattern matching implementable via the trait system. What if this was possible instead?

let mut x = VecDeque::new();
x.push_front(12);
x.push_back(13);
match x { // no rearranging, because we match directly
    [12, 13] => {}
    _ => {}
}
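
Here is a rough sketch of what such a trait could look like. Everything below is made up (the ListPattern name, its methods), but it compiles today, and it shows that cheap item-by-item access is all a lowered slice pattern would really need:

use std::collections::VecDeque;

// Hypothetical: a trait the `[..]` pattern syntax could be lowered through.
// Name and methods are invented here; the point is that per-item access is
// enough - no contiguous slice required.
trait ListPattern {
    type Item;
    fn pattern_len(&self) -> usize;
    fn pattern_get(&self, index: usize) -> Option<&Self::Item>;
}

impl<T> ListPattern for VecDeque<T> {
    type Item = T;
    fn pattern_len(&self) -> usize {
        self.len()
    }
    fn pattern_get(&self, index: usize) -> Option<&T> {
        self.get(index) // O(1) on the ring buffer, no rearranging needed
    }
}

// A `match x { [12, 13] => .. }` arm could then be lowered to checks like:
let mut x = VecDeque::new();
x.push_front(12);
x.push_back(13);
let is_match = x.pattern_len() == 2
    && x.pattern_get(0) == Some(&12)
    && x.pattern_get(1) == Some(&13);
assert!(is_match);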

Pattern Types

Another interesting potential extension to Rust's pattern system is the notion of "pattern types". I've covered some of this in my series on state machines, but Oli pointed out that most of it actually falls under a feature called "pattern types". This would allow us to reason not only about types, as we can today, but also about their refinements [1].

[1] I hope I'm using this word correctly here lol.

Let's say we have a "double" function, which takes a number and doubles it. For the result to fit in the same number type, the input can't be more than half the type's maximum value: anything bigger would overflow on doubling, or force us into a bigger container. If we hand-coded this today, we'd probably write something like this:

fn u32_double(num: u32) -> u32 {
    if num >= u32::MAX / 2 {
        panic!("number cannot be doubled without overflowing");
    }
    num + num
}

assert_eq!(u32_double(1), 2); // ✅ ok
u32_double(u32::MAX);         // ❌ panic: the number is too big

The issue here is that the check exists entirely at runtime. When we pass a literal, as in the passing example above, we can hope that the compiler's optimizer kicks in and removes the check, but there is no guarantee that it will. Using pattern types we could instead move this guarantee into the type system, and omit the runtime check in most cases. Using entirely made-up syntax, you could imagine it looking something like this:

// lol, idk what this should look like. let's just pretend ok?
const N: u32 = u32::MAX / 2;
fn u32_double(num: u32 is 0..N) -> u32 {
    num + num
}

assert_eq!(u32_double(1), 2); // ✅ ok
u32_double(u32::MAX);         // ❌ compile error: doesn't fit in the pattern range
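
Until something like that exists, the closest approximation is a hand-rolled refinement: a newtype whose constructor checks the range once at the boundary, so the doubling itself can't overflow. The check is still a runtime check, but it runs once instead of inside every function, and the rest of the code gets to rely on the invariant through the type. A rough sketch, with a made-up Doubleable type:

// A hand-rolled stand-in for `u32 is 0..N`: the range check happens once in
// the constructor, and the function body no longer needs one.
#[derive(Clone, Copy)]
struct Doubleable(u32);

impl Doubleable {
    fn new(num: u32) -> Option<Self> {
        (num < u32::MAX / 2).then_some(Self(num))
    }
}

fn u32_double(num: Doubleable) -> u32 {
    num.0 + num.0 // cannot overflow: the constructor enforced the bound
}

assert_eq!(u32_double(Doubleable::new(1).unwrap()), 2); // ✅ ok
assert!(Doubleable::new(u32::MAX).is_none());           // ❌ rejected up front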

Pattern types would be nice for a number of other reasons too. As previously mentioned, I believe they would make it easier to work with the type state pattern in Rust. I also believe they would allow us to be more specific about borrowing, and enable library authors to define their own NonZero-style types. And they could provide us with what I believe may be the most ergonomic syntax for try-functions:

// function notation
try fn foo() -> u32 throws Err<io::Error> {..}  // Result
try fn foo() -> u32 throws None {..}            // Option
try fn foo() -> u32 throws Break<()> {..}       // ControlFlow

// closure notation
try || {}                     // infer everything
try || throws None {}         // infer success path
try || -> u32 {}              // infer error path
try || -> u32 throws None {}  // infer nothing
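
For reference, here is what I'd read those function signatures as in terms of today's return types. This is just my own mapping of the made-up syntax above onto existing std types, nothing official, and the function names are only for illustration:

use std::io;
use std::ops::ControlFlow;

// try fn foo() -> u32 throws Err<io::Error>  ->  today's:
fn foo_result() -> Result<u32, io::Error> { todo!() }

// try fn foo() -> u32 throws None  ->  today's:
fn foo_option() -> Option<u32> { todo!() }

// try fn foo() -> u32 throws Break<()>  ->  today's:
fn foo_control_flow() -> ControlFlow<(), u32> { todo!() }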

Tangent time (it's my blog and you can't tell me what to do): what I like about this in particular is that it syntactically separates the fallibility aspects of the function from the base of the function. This is especially nice for closures, because it allows us to provide partial ascriptions. Most of the time you can probably just write try || and the inference engine will help you out, but it's nice to be able to spell out more of the signature when you need to. Anyway, to close this out: I think pattern types are pretty neat, and they feel like a bit of a non-feature, in the sense that they (at least to me) don't really change anything about the idea of Rust; they just allow more code to compile which people could reasonably assume should already compile today.

the "is" keyword

Rust recently (1.42, I know I know) stabilized the matches! macro which allows you to use patterns for comparisons. The way this works is as follows (taken from the docs):

let foo = 'f';
assert!(matches!(foo, 'A'..='Z' | 'a'..='z'));

let bar = Some(4);
assert!(matches!(bar, Some(x) if x > 2));

It takes an expression on the left side and a pattern on the right side, and tells you whether the expression matches the pattern. This is often nice to have when working within expressions, such as in macros. Prior to the macro it was common (and still is common) to add is_ methods to enums, such as Result::is_ok, to allow easy comparisons within expressions. In C# (and other languages too) there exists the is keyword, which elevates this to a language feature. With it we could rewrite the previous example as:

let foo = 'f';
assert!(foo is 'A'..='Z' | 'a'..='z');

let bar = Some(4);
assert!(bar is Some(x) if x > 2);

Whether this feature would carry its weight is a good question. Right now it's good practice to add is_ methods for each enum variant, which is a lot of typing - so much so that I added a generator to Rust-Analyzer to automate the process. If every variant in every enum would benefit from a corresponding method, maybe having it as a language feature would not be a bad idea? [2]

[2] Tangent: I'm still wondering about a different timeline where the as keyword would make use of Into instead of doing primitive casts, and as? could be mapped to TryInto. I don't know enough about the topic to say why that isn't the case / couldn't be made to work. But conceptually I kind of like the idea of having both as and is as a pair of keywords to operate on types.
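
Coming back from the tangent: to make the boilerplate concrete, here is what those per-variant is_ methods look like today, using a made-up State enum. Each one is just a matches! call that an is keyword would let you write inline at the call site instead:

// Today's idiom: one hand-written (or generated) method per variant.
enum State {
    Idle,
    Running,
    Done,
}

impl State {
    fn is_idle(&self) -> bool {
        matches!(self, State::Idle)
    }
    fn is_running(&self) -> bool {
        matches!(self, State::Running)
    }
    fn is_done(&self) -> bool {
        matches!(self, State::Done)
    }
}

let states = [State::Idle, State::Running, State::Done];
assert!(states[0].is_idle());
assert!(states[1].is_running());
assert!(states[2].is_done());
// With an `is` keyword, none of the methods above would need to exist:
// assert!(states[1] is State::Running);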

On Product Patterns

At the start of this post I mentioned there are a number of shapes types can be, but so far we've only looked at list types. I've been looking at C#, and while support for matching on product types (C# records) seems to roughly match what Rust is doing, I'm less sure what the story will be for constructing or matching on e.g. hashmap structures. It seems the following syntax (collection initializers) is supported today to construct hashmaps:

var data = new Dictionary<string, string> {
    { "test", "val" }, 
    { "test2", "val2" }
};

This isn't quite the same as collection literals, and I also don't know what the future plans for pattern matching are in C#. I'm also coming up empty for other languages on how matching on arbitrary product types should look. That's a long way of saying: for any record types that aren't just structs, I'm unsure what the syntax should be. But because we're fun-posting, here are three potential options, all of which I'm somewhat certain are not a good idea:

// Ehhhh option 1: reinterpret struct tokens as
// values. This seems really weird and bad?
// Downside: record names are not string literals,
// and string literals are not the only possible keys.
let x: HashMap<&'static str, u32> = {
    hello: 12,
    world: 13,
};
match x {
    { hello: 12, world: 13 } => {}
    _ => {}
};

// Ehhhh option 2: Literals in both the key and the value
// position kind of like JS Object notation.
// Downside: this syntax is ambiguous, because it's 
// the same as block scope syntax.
let x: HashMap<&'static str, u32> = {
    "hello": 12,
    "world": 13,
};
match x {
    { "hello": 12, "world": 13 } => {}
    _ => {}
};

// Ehhhh option 3: The only initializer syntax is [],
// so we tuple our way through it.
// Downside: this implies ordering where there is none.
let x: HashMap<&'static str, u32> = [
    ("hello", 12),
    ("world", 13),
];
match x {
    [("hello", 12), ("world", 13)] => {}
    _ => {}
};
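
For what it's worth, the construction half of option 3 already exists in today's Rust: HashMap implements From over an array of key-value pairs. It's the matching half that has no equivalent, so lookups stay as method calls:

use std::collections::HashMap;

// Construction from an array of tuples works today...
let x: HashMap<&'static str, u32> = HashMap::from([("hello", 12), ("world", 13)]);

// ...but there's no map pattern to match against, only methods:
assert_eq!(x.get("hello"), Some(&12));
assert_eq!(x.get("world"), Some(&13));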

Anyway tldr: where pattern initializers and matchers for list-like types seem like a relatively straightforward design, a syntax for product types (maps, sets) seems much less obvious to me.

On String Patterns

In Rust 2021 we already reserved token prefixes with the intent of allowing new string literals, such as f"hello {world}" for format strings or s"" for allocated strings. So we're already somewhat on the way to expanding the initializer syntax. I think it would make sense if, as we work on that, we also made the inverse work:

// Rust today
match String::from("hello") {
    "hello" => {}   // ❌ Compile error: expected `String`, found `&str`
    _ => {}
}

// What if something like this could work instead?
// (string literals + string matching)
match s"hello" {    // String literal constructor
    "hello" => {}   // String pattern matching (without casting to string slice)
    _ => {}
}
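
Until then, the workaround mirrors the Vec story: borrow a &str first, and string-literal patterns apply as usual:

// Rust today: match on the borrowed string slice, not the String itself
match String::from("hello").as_str() {
    "hello" => {}
    _ => {}
}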

Conclusion

As I mentioned at the start, this is a bit of a rambly/braindumpy post, because I saw some cool C# things today and wanted to write about them. I think it's fun to think through what it could look like if Rust's pattern system became more capable. To briefly recap what we've shown for pattern initializers + matchers:

// Creating + matching on a `VecDeque` today.
let mut x = VecDeque::new();
x.push_front(12);
x.push_back(13);
match x.make_contiguous() {
    [12, 13] => {}
    _ => {}
}

// Creating + matching on a `VecDeque` if we had
// implementable list initializers + list patterns
let x: VecDeque<_> = [12, 13];
match x {
    [12, 13] => {}
    _ => {}
}

That's all. ✌️