Inline Crates
— 2022-10-25

  1. what are crates?
  2. what are modules?
  3. comparing modules and crates
  4. in-line modules
  5. example
  6. implementation
  7. future directions
    1. pub crate
    2. wasi components
  8. conclusion

People sometimes jest that Rust is just ML dressed up to look like C++. And I don't think that's entirely off: Rust has many of the key features present in ML languages. We have the same kind of type system (Hindley-Milner), we have sum types, and we have a module system which isn't directly tied to a file hierarchy. I want to talk a bit more about Rust's module system here.

In Rust we distinguish between "crates" and "modules". To people just learning about Rust the distinction can be a bit confusing. But in practice it makes sense to have both. In this post we're going to take a look at Rust's module system, what the differences are, and how we could introduce some features to bring crates and modules closer together.

Disclaimer: In this post I'm sharing some ideas on language design that heavily tie into the current compiler architecture. I want to make it super clear that I'm not necessarily advocating we make these changes, and I'm especially not trying to say that this should be prioritized. I just wanted to have this written down so it can be referenced later. Because I think there's value in writing things down even if they're not fully formed, feasible, or quite the right time.

What are crates?

A crate is a unit of code with strict encapsulation rules. Crates are also the unit which can individually be published to and pulled from crates.io. In the compiler a crate represents a translation unit, and is the boundary of parallelization 1. We're only able to compile entire crates in parallel; not yet sub-components of crates. Crates also determine the boundaries of recompilation.

1

This is only about rustc specifically - the LLVM part of the compilation may parallelize individual crates. You can read more on this in the rustc dev guide and the rustc book.

Crates themselves also have strict rules on disallowing cyclic dependencies 2, and of course the orphan rules 3. A crate is always tied to a file hierarchy on disk, and needs an accompanying Cargo.toml file. Matklad (of Rust-Analyzer fame) goes into detail on their blog how crates affect compilation times, and how changing crate hierarchies can be used to significantly improve throughput.

2

Given a and b. a can depend on b, or b can depend on a. But they can't both depend on each other, because that would cause a cycle in the dependency graph.

3

Orphan rules come down to: if you want to implement a trait for a type, you must either define the trait or the type in your crate. You're not allowed to implement a trait you haven't defined for a type you haven't defined.
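To make the orphan rules a bit more concrete, here's a minimal sketch. The Describe trait and the Carrot type are made up for this example; u32, Vec, and fmt::Display stand in for "things defined in another crate", since they come from the standard library.

```rust
use std::fmt;

// A trait defined in this crate (hypothetical, for illustration only).
trait Describe {
    fn describe(&self) -> String;
}

// A type defined in this crate (hypothetical, for illustration only).
struct Carrot {
    length_cm: u32,
}

// Allowed: local trait, foreign type (`u32` comes from the standard library).
impl Describe for u32 {
    fn describe(&self) -> String {
        format!("the number {}", self)
    }
}

// Allowed: foreign trait (`fmt::Display`), local type.
impl fmt::Display for Carrot {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "a {} cm carrot", self.length_cm)
    }
}

// Rejected by the orphan rules (error E0117) if uncommented, because both
// the trait and the type are defined outside of this crate:
//
// impl fmt::Display for Vec<u32> { /* ... */ }

fn main() {
    println!("{}", 7u32.describe());
    println!("{}", Carrot { length_cm: 12 });
}
```

The commented-out impl is the interesting one: moving code into a new crate can turn a previously valid impl into exactly this case, which is part of what makes crate boundaries "strict".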

Finally, some features such as procedural macros can only be defined in crates which have a specific configuration that disallows any other type of code to be exported. Crates, more generally, also serve as a boundary of configuration.

backyard
├── Cargo.lock
├── Cargo.toml
└── src
    ├── garden
    │   └── vegetables.rs
    ├── garden.rs
    └── lib.rs

An example of a crate representation, copied from the Rust book.

For the purpose of this post we'll not be talking about the Rust 2015-era concept of extern crate.

What are modules?

A module is a unit of code with loose encapsulation rules: cycles between modules are allowed, and a module does not necessarily correspond to a file hierarchy. Modules are a lightweight way of splitting code up into logical components. Here's an example of a valid mod relationship (playground):

// example of cycles in modules

pub mod foo {
    use super::bar::*;
    pub struct First;
}

pub mod bar {
    use super::foo::*;
    pub struct Second;
}

But modules can also correspond to a file hierarchy. In the following example, backyard is the crate, with lib.rs representing the entry point, and internally contains crate::garden, and crate::garden::vegetables as sub-modules:

backyard
├── Cargo.lock
├── Cargo.toml
└── src
    ├── garden
    │   └── vegetables.rs
    ├── garden.rs
    └── lib.rs

Modules also accept visibility modifiers such as pub(crate) or pub(super). This restricts the encapsulation rules somewhat by making the code accessible from fewer sites. It also changes things like being able to access fields on structs. But while it's stricter, it still allows cycles between modules. That is not the same level of guarantee that crates provide.
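Here's a small sketch of how those modifiers behave in practice. The module names borrow from the backyard example, but the Plot and Carrot types and their fields are invented for illustration. Note that garden and vegetables import from each other, and that this compiles fine.

```rust
mod backyard {
    pub mod garden {
        use super::vegetables::Carrot; // `garden` uses `vegetables` ...

        pub struct Plot {
            // Readable anywhere in this crate, but not from other crates.
            pub(crate) name: &'static str,
            // Readable only inside `backyard`, the parent of this module.
            pub(super) carrots: Vec<Carrot>,
        }

        impl Plot {
            pub fn new(name: &'static str) -> Self {
                Plot { name, carrots: Vec::new() }
            }

            pub fn plant(&mut self, length_cm: u32) {
                self.carrots.push(Carrot::new(length_cm));
            }
        }
    }

    pub mod vegetables {
        use super::garden::Plot; // ... and `vegetables` uses `garden`: a cycle.

        pub struct Carrot {
            pub(super) length_cm: u32,
        }

        impl Carrot {
            pub(super) fn new(length_cm: u32) -> Self {
                Carrot { length_cm }
            }
        }

        // Reads `Plot::carrots`, which is allowed because this module lives
        // inside `backyard`.
        pub fn total_length_cm(plot: &Plot) -> u32 {
            plot.carrots.iter().map(|carrot| carrot.length_cm).sum()
        }
    }
}

use backyard::{garden::Plot, vegetables};

fn main() {
    let mut plot = Plot::new("north bed");
    plot.plant(12);
    plot.plant(9);

    // `name` is `pub(crate)`, so the crate root may read it.
    println!("{}: {} cm of carrot", plot.name, vegetables::total_length_cm(&plot));

    // Rejected if uncommented: `carrots` is only visible inside `backyard`.
    // let _ = plot.carrots.len();
}
```

If garden and vegetables were separate crates instead, the mutual imports would have to go: one of the two would need to stop depending on the other.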

Comparing modules and crates

To summarize what we've covered so far:

name    encapsulation   file repr   in-source repr   cost to define and maintain
crate   strict          yes         no               high
mod     loose           yes         yes              low

Crates and modules are both incredibly useful concepts, with benefits and tradeoffs. But today it's significantly harder to define new crates than it is to define new modules. Features like "workspace inheritance" lower the barrier somewhat, but there's still a pretty large difference between how easy they are to introduce to your project.

Adding a new module is basically just adding a mod {} and copying over the right imports - something which Rust-Analyzer can even do for you. But adding a new crate is far more involved: it requires creating a new module hierarchy, adding the right dependencies, adding the right imports to the workspace, and then linking the right imports back to the crates you want to use. This could be automated as well, but that only makes authoring crates easier - it doesn't help with readability.

And that's not even taking into account releasing crates. When you add a new mod to a crate it doesn't affect the ability to publish new crates. But when you create a new crate, you now have a new dependency. Which must be accurately versioned 4 and published before the main crate can be released. The fact that "strict encapsulation semantics" are necessarily tied to an on-disk hierarchy is not ideal.

4

"workspace inheritance" can also help with versioning here.

In-line modules

You can probably see where this is going. In my opinion Rust would benefit from detaching crates from the module hierarchy, and allowing them to also be defined in a manner similar to modules. A single "crate" should be able to contain many sub-crates, all of which enforce the same strict encapsulation rules as their on-disk counterparts. The only difference being in how they're declared. Much like workspace.inherit, sub-crates would inherit the dependencies and version number of their parent crate. And unlike on-disk crates, they wouldn't be individually publishable to crates.io, but instead just be sent along with their parent crates.

To give an example of what this would look like: I'm thinking we adopt a very similar syntax to modules.

mod foo {}   // define an in-line module
crate foo {} // define an in-line crate

mod bin;     // import a module from a file
crate bin;   // import a crate from a file

Example

Being able to distinguish at the source-level between "public" and "private" crates is something which would be useful for larger projects in particular. An example of such a project is probably Rust-Analyzer, which I happen to be familiar with.

You can see an overview of all crates in the code map section of the architecture.md file. Rust-Analyzer clearly distinguishes between "API boundary" crates, and all other crates. For example something like the ide crate is a "boundary", but internally uses other crates such as ide-db, and ide-assists which are not boundaries:

crates/
    hir-def/
    hir-expand/
    hir-ty/
    hir/             # boundary
    ide-assists/
    ide-completion/
    ide-db/
    ide-diagnostics/
    ide-ssr/
    ide/             # boundary

I've contributed to Rust-Analyzer in the past, but I'm not on the RA team - so I can't speak with any authority about the project. But it's not unreasonable that a project similar to Rust-Analyzer could benefit from only surfacing the API boundary crates in the top-level crate hierarchy - without needing to give up the strict encapsulation rules Rust crates provide.

crates/
    hir/             # boundary
        def/
        expand/
        ty/
    ide/             # boundary
        assists/
        completion/
        db/
        diagnostics/
        ssr/

There may be projects that prefer to use a flat module hierarchy even if in-line crates become available. But when starting out it seems easier to be able to quickly define a new crate, and only later move it to its own hierarchy. The point of the crate keyword is to enable strict encapsulation without immediately having to resort to separate on-disk representations. Which should make it possible to let the structure found during prototyping hold all the way. As opposed to the current status quo, where prototypes can be started using mod statements, but eventually you'll want to refactor into separate crates.

Implementation

I suspect implementing inline crates might not be an easy task. If I'm not mistaken 5, right now parallelism of compilation is driven by cargo spawning a bunch of instances of rustc which compile individual crates. If crates are represented as anything other than something you can point rustc(1) to compile, it might need some rethinking of the architecture.

I don't think that's an argument against inline crates though. But rather a reflection that the architecture of how we've organized the various Rust compiler projects is reflected in the features it provides 6. And features such as inline crates exist right at the seams of where the two projects meet.

5

Wesley checked this and said my understanding of this is indeed correct. I really wasn't sure about this in earlier drafts, but I'm thankful I was able to confirm ^^.

6

See: Conway's law.

Future Directions

pub crate

Alright, time to speculate a little. Conversations around "crates.io namespaces" pop up pretty regularly. The basic idea is that sometimes crates are correlated, and being able to group them together under a single namespace would be nice.

Take for example the windows crate. It's probably the single biggest crate on crates.io 7. Internally it's built up of lots of crates which are all versioned in lock-step, and hidden behind feature flags. Take for example win32's HTTP service API. To use it with the windows crate you'd need to define it in your toml like so:

7

I believe it exposes something like 30,000 unique types and traits. Like an order of magnitude more than e.g. web-sys does, which covers the entire web platform. windows is a pretty good example of a "big" crate that needs to be broken up into sub-components for the compiler to even attempt to build it in reasonable time.

[dependencies]
"windows" = { version = "0.41.0", features = ["Win32_Networking_HttpServer"] }

Instead, if the windows crate could be used as a namespace, we could imagine that instead of exclusively exposing APIs using feature flags, we may want to expose it as crates namespaced under windows. For example, we could imagine something like this:

[dependencies]
"windows/win32_networking_httpserver" = "0.41.0"

There are some big "ifs" involved here: we're assuming this can be neatly split out into a single crate. But let's assume we can. How would we split it out? Today that would mean creating a new hierarchy on disk, and putting the right code there. And then in turn resolving the dependency hierarchy ahead of time to correctly version and publish the dependencies in the right order.
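To illustrate what "publishing in the right order" involves, here's a rough sketch that takes a dependency graph and works out an order in which every crate comes after its dependencies. The publish_order function and the crate names are hypothetical and only meant to show the shape of the problem; this is not how cargo does it. It repeatedly picks the crates whose dependencies are all done already (Kahn's algorithm), and returns None when it finds a cycle.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns an order in which every crate appears after all of its
/// dependencies, or `None` if the graph contains a cycle.
/// (Hypothetical helper, written for this example.)
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Map every crate to the dependencies it is still waiting on. Crates
    // that only show up as a dependency get an empty entry.
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for (&name, dependencies) in deps {
        remaining.entry(name).or_default().extend(dependencies.iter().copied());
        for &dep in dependencies {
            remaining.entry(dep).or_default();
        }
    }

    let mut order = Vec::new();
    while !remaining.is_empty() {
        // Crates whose dependencies have all been published already.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, pending)| pending.is_empty())
            .map(|(&name, _)| name)
            .collect();

        // Nothing is ready but crates remain: they depend on each other.
        if ready.is_empty() {
            return None;
        }

        for name in &ready {
            remaining.remove(name);
        }
        for pending in remaining.values_mut() {
            for name in &ready {
                pending.remove(name);
            }
        }
        order.extend(ready);
    }
    Some(order)
}

fn main() {
    // Made-up crates: `backyard` depends on `garden` and `vegetables`,
    // and `garden` depends on `vegetables`.
    let mut deps = BTreeMap::new();
    deps.insert("backyard", vec!["garden", "vegetables"]);
    deps.insert("garden", vec!["vegetables"]);
    deps.insert("vegetables", vec![]);

    match publish_order(&deps) {
        Some(order) => println!("publish in this order: {:?}", order),
        None => println!("cyclic dependencies, cannot publish"),
    }
}
```

With on-disk crates this bookkeeping is spread across several manifests that have to be kept in sync by hand or by tooling.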

What if we could have a 1:N mapping to published crates from within crates? What if the definition inside the windows crate could look like this, and it would be enough to both provide the feature from inside the windows crate, and be publicly accessible as a sub-crate?

#[cfg(feature = "Win32_Networking_HttpServer")]
pub crate win32_networking_httpserver { ... }

I'd imagine the dependencies and versions would all be shared from the Cargo.toml definition. And dependencies between inline crates would be extrapolated and extracted into the manifest. I don't think this should replace workspaces in any way. Workspaces are a great feature, and give a great deal of control. But it seems like it could be a way to lower the overall burden of authoring new packages.

WASI components

And to continue the speculation: I've been wondering for a few months now whether we could provide a 1:N mapping of Rust programs to WASI components. In WebAssembly, components serve as a boundary of strict encapsulation, not unlike but also not exactly along the lines of crates in Rust.

I'm fairly optimistic that WASI will present a best-in-class target to compile Rust to for networked services. And if we're able to provide fine-grained and easy-to-use mappings from Rust to WASI components, it might give Rust an edge over other languages. I suspect being able to lean into the strict encapsulation semantics of Rust crates may have meaningful benefits with regard to compile targets that we're currently unable to leverage to their full potential 8.

8

I'm trying not to go all in on WASI encapsulation rules, and mapping Rust programs to WASI. But I think there's a lot there, and being able to define in-line crates may end up being a component in a full WASI story.

Conclusion

That's about it I think. I would love to see more equivalence between "crates" and "modules" in Rust. Even if we one day see a loosening of the orphan rules, or we see Rust move to fine-grained parallelism - having in-line crates will still be useful, as they're easier to create and modify than their on-disk counterparts. Even if it's just to prototype and experiment before wanting to commit to a final design.

To come entirely clean: I don't see this work panning out anytime soon. It would require an effective rearchitecture of the compiler, and by itself this feature wouldn't carry its weight to justify doing that. But perhaps if enough reasons build up, eventually we'll look to make the changes which would also enable this feature. Either way, I figured it'd be worth spelling this out.

I think we could ask some very interesting questions about what other benefits such a rearchitecture of the compiler could bring as well. Could compilation become more efficient? Could it become easier to optimize builds? What would we lose? What other features could be enabled? Even if we know we don't have the time for any of this now, I do think 1: it's fun to think about, and 2: perhaps something useful will come out of it.

I also want to at least state once that I don't believe this should be in place of any workspace improvements. Rather, I see it working in addition to workspace improvements. I view both approaches as improvements to the status quo, and I think they would end up complementing each other rather well!

Thanks to Wesley Wiser for proofreading!