optimizing hashmaps even more
— 2021-05-08

In our last post we took a look at possible ways we could improve the ergonomics of Rust's refcounting APIs. In this post we'll be looking at Hashmap: how it's currently implemented, how we could optimize it further, and finally directions we could explore to enable the compiler to automatically apply these optimizations.

Disclaimer: I'm not an expert. I'm not on any team involved with const execution, nor do I hold any decision making power. This exists to share ideas and curiosities of what might be possible in Rust in the spirit of sharing posts with pals.

Hashmaps and hashing algorithms

In 2017 Google introduced a new kind of hashmap for C++: the Swisstable hashmap. This structure uses SIMD instructions to compute hashes, which is a great match for modern computer hardware. So any improvements to the hashing algorithm would have an outsize impact on performance, and in turn resource efficiency.

In the cppcon 2017 talk on Swisstable, the authors mention that across all of Google's code, they found a non-trivial amount of time was spent computing hashes ¹. Amanieu from the libs team created a Rust variant of the same algorithm for the Hashbrown crate, which has since become the default Hashmap implemention for Rust.

1% of CPU and 4% of RAM of all of Google's C++ code is spent doing things inside hashtables. Considering the scale of Google's deployment, the amount of CPU cycles spent is enormous. And it doesn't even consider software running on user devices such as Android and Chrome, or programming languages other than C++.

However we've known for a long time about a faster hashmap than Swisstable: the perfect hashtable. The idea is that if you know all your keys ahead of time, you can create a lookup table which never creates collisions, never has to re-index the hashmap, or grow in size². Though it's not always possible to know all keys ahead of time, so there are limits to when it can be used.

There are a lot of considerations and tradeoffs for hashing algorithms. Sy Brand gave a brief overview on Twitter in reply to this post.

Even so, let's take a look at when it does make sense to use perfect hashtables, and show examples of how to implement them.

enums as keys

Imagine we're building a protocol. The protocol has support for "metadata" and "body data". The metadata consists of keys and values. We could imagine its use could look something like this:

use std::collections::HashMap;

let mut metadata: HashMap<String, String> = HashMap::new();

metadata.insert("content_type".into(), "fast_proto".into());
metadata.insert("encryption_key".into(), generator.next_key());
metadata.insert("content_fingerprint".into(), bodyfingerprint());
metadata.insert("content_length".into(), bodylength().into());

request.send_metadata(metadata).await?;

We create a hashmap, insert several key-value pairs into it, and finally serialize it over the wire. Now imagine these are the only keys that are valid to insert into the hashmap. Since we know all possible representations up front, we could capture them within an enum:

use std::collections::HashMap;

enum KeyName {
    ContentType,
    EncryptionKey,
    ContentFingerprint,
    ContentLength,
}
impl Display for KeyName { ... }

let mut metadata: HashMap<KeyName, String> = HashMap::new();

metadata.insert(KeyName::ContentType, "fast_proto".into());
metadata.insert(KeyName::EncryptionKey, generator.next_key());
metadata.insert(KeyName::ContentFingerprint, body.fingerprint());
metadata.insert(KeyName::ContentLength, body.length().into());

request.send_metadata(metadata).await?;

Because we're using an enum we know all of our keys ahead of time ³, and can simply match on the keys. Yet the code above will still expand to roughly 1200 lines of assembly. This is because the Rust compiler currently isn't clever enough to lower the hashmap into a lookup table.

Yes yes, #[non_exhaustive] would change that. But we aren't using that here.

Now what would happen if we had a "sufficiently smart compiler". The compiler would be able to lower the hashmap to a fixed-sized array containing Option<String> which uses match to access the data within ⁴:

⁴

We may not always want to use an array to store the data. But we're dealing with a small key size here, so it's fine.

pub struct HashMap {
    inner: [Option<String>; 4],
}

impl HashMap {
    pub fn insert(&mut self, key: KeyName, value: String) -> Option<String> {
        match key {
            KeyName::ContentType => self.inner[0].replace(value),
            KeyName::EncryptionKey => self.inner[1].replace(value),
            KeyName::Fingerprint => self.inner[2].replace(value),
            KeyName::ContentLength => self.inner[3].replace(value),
        }
    }

    pub fn get(&self, key: KeyName) -> Option<&String> {
        match key {
            KeyName::ContentType => self.inner[0].as_ref(),
            KeyName::EncryptionKey => self.inner[1].as_ref(),
            KeyName::Fingerprint => self.inner[2].as_ref(),
            KeyName::ContentLength => self.inner[3].as_ref(),
        }
    }
}

What we're seeing here is that we've replaced the hashing function with a match statement. All data is stored within a fixed-sized array of Options, and each enum variant directly maps to one of the slots within array. This removes the overhead of hashing entirely, and opens the door for further optimizations (such as inlining).

Some might argue that: "We could just write this code ourselves", but writing custom data structures has costs associated with it: we need to realize we need the structure in the first place, then write the structure, maintain it, and teach everyone else working on the codebase about it. Even though it might be faster than Rust's default hashmap, going down this route has extra costs.

static strings as keys

In many ways our example of keying into a hashmap using an enum is a bit too perfect. It assumes the author has awareness that their data is in fact enumerable, and willingness to refactor the set of all keys into an enum. But it's also risky: what if want to add more keys later. What if those keys may not be known during compilation? I personally often prefer to keep things simple when I'm not sure how things might change in the future.

Code in the real world is often far less pristine. When trying to deliver a feature under time pressure we're often more focused on getting it working and tested, than on performance. After all it's usually fast enough anyway. We'll often key into hashmaps using strings, which have to be inlined into the created binary. The assembly output for those inlined strings will look somewhat like this:

.L__unnamed_1:
        .ascii  "content_type"

.L__unnamed_2:
        .ascii  "encryption_key"

.L__unnamed_3:
        .ascii  "content_fingerprint"

.L__unnamed_4:
        .ascii  "content_length"

A compiler not only knows which strings are being inlined. It also knows which strings are used as inputs to which methods. This is to say: a compiler is very much capable of capturing the full set of inputs during compilation ⁵, which is provides us with data not unlike a Rust enum. That means at least theoretically it should be possible to generate a perfect hashmap for the following input:

⁵

Not sure if this currently occurs in LLVM or not. But I know I'm able to obtain this data through Salsa in Rust-Analyzer.

use std::collections::HashMap;

let mut metadata: HashMap<&'static str, String> = HashMap::new();

metadata.insert("content_type", "fast_proto".into());
metadata.insert("encryption_key", generator.next_key());
metadata.insert("content_fingerprint", body.fingerprint());
metadata.insert("content_length", body.length().into());

request.send_metadata(metadata).await?;

And the generated perfect hashmap would be:

pub struct HashMap {
    inner: [Option<String>; 4],
}

impl HashMap {
    pub fn insert(&mut self, key: &str, value: String) -> Option<String> {
        match key {
            "content_type" => self.inner[0].replace(value),
            "encryption_key" => self.inner[1].replace(value),
            "content_fingerprint" => self.inner[2].replace(value),
            "content_length" => self.inner[3].replace(value),
            _ => unsafe { std::hint::unreachable_unchecked() },
        }
    }
    
    pub fn get(&self, key: &str) -> Option<&String> {
        match key {
            "content_type" => self.inner[0].as_ref(),
            "encryption_key" => self.inner[1].as_ref(),
            "content_fingerprint" => self.inner[2].as_ref(),
            "content_length" => self.inner[3].as_ref(),
            _ => unsafe { std::hint::unreachable_unchecked() },
        }
    }
}

hybrid static + dynamic keys

If our keys are only ever known during runtime, we know the best hashmap we can use is Hashbrown. But what if some of our keys are known during compilation?

This is where we can use a "hybrid" hashmap ⁶: we generate a lookup table for the known keys, and use a regular hashmap as a fallback for all other keys. This approach is used by hyperium/http's HeaderMap struct. It keeps track of commonly used HTTP headers in a lookup table, and falls back to a hashmap for all other headers.

⁶

I made up the term "hybrid hashmap" for the purpose of this post. Let me know if you know of a term used in literature to describe this same structure.

Here's a usage example of a hashmap which uses both dynamic and static data:

use std::collections::HashMap;

let mut metadata: HashMap<String, String> = HashMap::new();

metadata.insert("content_type".into(), "fast_proto".into());
metadata.insert("encryption_key".into(), generator.next_key());
metadata.insert(gen_keyname(), gen_headername()); // this generates values at runtime

request.send_metadata(metadata).await?;

For this code we're able to generate the following implementation. We create a lookup table for our known keys, and fall back to a hashmap for all other keys:

pub struct HashMap {
    inner: [Option<String>; 4],
    map: std::collections::HashMap<String, String>,
}

impl HashMap {
    pub fn insert(&mut self, key: String, value: String) -> Option<String> {
        match key.as_ref() {
            "content_type" => self.inner[0].replace(value),
            "encryption_key" => self.inner[1].replace(value),
            "content_fingerprint" => self.inner[2].replace(value),
            "content_length" => self.inner[3].replace(value),
            _ => self.map.insert(key, value),
        }
    }

    pub fn get(&self, key: &str) -> Option<&String> {
        match key {
            "content_type" => self.inner[0].as_ref(),
            "encryption_key" => self.inner[1].as_ref(),
            "content_fingerprint" => self.inner[2].as_ref(),
            "content_length" => self.inner[3].as_ref(),
            _ => self.map.get(key),
        }
    }
}

A nice property about this approach is that if we only ever use known keys, we'll get speeds similar to that of a lookup table. It's only once we start using custom keys that we branch into a slower path and start allocating for the hashmap.

Looking ahead

While it's possible for humans to look for instances where a hashmap can be converted into a lookup table, for Rust only a compiler will be able to catch all cases. This is because virtually no code in Rust is ever completely self-contained.

Say we have a hashmap defined in crate A. And we have a crate B which uses that hashmap. If B only ever uses A with static keys we're able to optimize for that use. However crate A will cannot know ahead of time what the usage patterns of B will be like.

Add in more crates, transient wrappers, and legacy code - and it quickly becomes clear why it's ideal to let machines handle these kinds of optimizations. However the question then becomes: how would we enable this in Rust?

I think if we had access to the following information during const / specialization we'd be able to create these kinds of optimizations:

Provide a way to enumerate all static inputs for a function.
Provide a way to tell if there are runtime inputs for a function.
Provide a way to count the number of invocations a function has within the codebase.

All of this information can be known by programmers ahead of time by reading the code they're working on. This means that a compiler can know this information too. The question really is how (and where) to expose this information.

My gut feeling is that we should be able to figure this out somewhere during const evaluation and specialization ⁷ ². Using an entirely made-up, fantasy pseudo version of Rust I'd imagine we'd be able to create a new hashmap using something like this:

⁷

Yes I know that's not a separate step, - less of a time, more of a place I guess.

I don't believe we can just specialize every hashmap + hasher combo. I'm thinking we might be able to specialize Rust's default hashmap though. As I understand it we don't guarantee much of anything other than it being fast and not being a DDoS vector.

#[this_is_a_specialization]
const fn create_hashmap<K: 'static, V>(m: Map<K, V>) -> HashMap {
    // how often is this map's `get` function called?
    if mem::invocation_count_of(m.get) == 0 {
        // return stub structure because we know the data is never accessed
    }

    // iterate over all static inputs provided to m.insert
    for (k, _) in mem::static_inputs_of(m.insert) {
        // iterate over all keys and return a lookup table
    }

    // count how often m.insert is called with non-static input
    if mem::dynamic_input_count_of(m.insert) != 0 {
        // if there are runtime (non-static) arguments we want to return a hybrid hashmap
    } else {
        // yay, return a lookup table
    }
}

Maybe there are other proposals in progress for const execution / specialization which cover these cases as well. Admittedly adding a bunch of new APIs backed by /* compiler built-in */ is a bit of a cop-out, and I'd love to learn if there could be ways to do the same which more closely integrate with what we already have.

I hope this paints a rough picture of what I'm thinking. I wish I knew more about the present and future of Rust's const engine to design a better API. But my goal with this post is less to propose something concrete, and more to convince the right people this is something worth exploring. I hope in that regard I'm doing alright so far (:

Conclusion

I hope everyone walks away from this post with two things: an understanding of how to create a lookup table/perfect hashmap in Rust, and excitement about a future where our compilers can apply these optimizations for us.

The idea of transparently optimizing existing patterns based on usage is fascinating to me. Usually when I think about these kinds of optimizations it's things like compilers optimizing math by converting it to bitwise instructions. But imagine how cool it would be if we could optimize entire data structures based on how they're used. This type of optimization doesn't need to stop with hashmaps either: I got interested in this topic because I think we can optimize channels significantly if we can inspect their use during compilation ⁸.

⁸

For example: we could optimize an MPMC channel into a one-shot SPSC channel if we know both the sender and receiver are never cloned, and only one item is ever sent.

Another thing which excites me about this is that if we're able to lower a hashmap into a match statement, we can proceed to apply all optimizations we use to match statements. This means optimizations such as removing unused enum variants, reordering keys for quicker lookups, and even lowering the match statement itself to more efficient structures such as prefix-tries. Rust's match performance is probably not where it could be yet either, and the more we can express code of one kind in terms of another, the more impactful optimizations can become.

Thanks for reading!

optimizing hashmaps even more— 2021-05-08