Using HTML as a compile target
— 2023-02-03

If you've been around an industry for long enough you end up with Opinions™ about things. Here's one of mine: I think one of the biggest gaps in web programming is the lack of support for treating HTML as a compile target.

Most folks reading this blog may not be aware of this, but I started my career working with web technologies, which I kept at for the better part of a decade. I co-authored one of the (at least to my knowledge) first ¹ CSS-in-JS engines, I've written frontend frameworks, designed web compilers, and authored entire HTTP stacks. Even if web tech isn't really my current area of focus, it's something I've worked on a fair bit, and still care about a lot. So here's a post about an opinion I have, in a field I don't do much in currently, but which I'd still like to share.

¹ To my knowledge at the time at least. I'll totally buy that e.g. PHP or Google's web compiler may have supported similar stuff, but we didn't know about it at the time. When we designed sheetify in ~2015 or so, css-modules was all the rage, and we figured we could do something similar without needing to overload the meaning of require. And we didn't know of anything that worked similarly at the time. Later, styled-components became a thing, citing our work on sheetify as an inspiration. But that's all ancient history at this point.

A taste of HTML

There are well over 100 unique HTML elements in the spec. Let's zoom in on just one of those: the <li> element. It inherits from a class hierarchy which is 4 levels deep. It exposes two unique attributes, of which 1 has been deprecated and shouldn't be used. And it can't just be placed anywhere in the DOM: it has exactly 4 valid parent elements, of which 1 has also been obsoleted. Here are some examples of it in action:

<aside>
    <li>first</li>
    <li>second</li>
</aside>

<menu>
    <li>first</li>
    <li>second</li>
</menu>

Quiz time: which of these is valid? Which of these isn't? None, both, either one? When you write the code, you don't know. It's only when you try and render the code in the browser that you'll receive feedback.

Okay fine, the answer to the last question was that <aside><li> is invalid, but <menu><li> is valid. Quiz number two (no peeking at the reference): what is rendered here?

<ol>
    <li value="2">second</li>
    <li value="1">first</li>
</ol>

<menu>
    <li value="2">second</li>
    <li value="1">first</li>
</menu>

Okay, answer time: the first one is valid, the second one is not. But both will render, without error. It's just that in the first case value is respected, so the items are numbered as written ("2. second", then "1. first"), while in the second case value has no visible effect and you get two plain bullets ("second", then "first"). Did you know that? I sure didn't.

As you're probably aware I'm not actually trying to teach you the rules of <li>, but just trying to show that this teeny tiny basic element has a host of interactions that will affect what you build. And we're only scratching the surface of this element. It also interacts with styling, event handlers, ARIA roles, and elements contained within. The fact that value attributes on <li> are valid when nested within <ol>, but not within <menu> is just one interaction of thousands out there. Everyone gets things wrong all the time with this. And we need to do better. All these rules are specced, implemented, and (mostly) consistent. We need a way to automate this knowledge.

Type-Safe HTML

I think we need to start by acknowledging that HTML is complex. Not in a derogatory way, HTML is also really useful, but as a statement of fact. There are more rules and interactions than any one reasonable person can manage. Which means we shouldn't expect people to be the ones managing it.

Instead, I believe that HTML would benefit from type safety. Right now the way HTML is validated is by shipping it to a browser, and then checking the console tab for errors. Or sometimes people run tools such as Lighthouse or pa11y on it. Sometimes these things even happen in CI.

A shortcoming of this approach is that in order to validate whether you've done things right, you first have to load your web page into a browser engine, and only then you can validate whatever was rendered. But that's where it stops: it only validates what was rendered at that point. If your page has different components which can be in different states, or also has different pages, you need to ensure every one of those is validated. And that's often more work than it's worth, so people don't bother.

Instead the way I'm thinking about it is: what if we moved the validation to a compilation pass? That way we can prevent errors from occurring altogether. A compiler is able to reason about all states, and check that they're all valid, all in the blink of an eye. No browsers required. For people who're looking to reduce the number of runtime issues in their websites: this is by far the most feasible way to actually get the number of HTML bugs down to zero.
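To make that concrete, here is a minimal sketch in Rust of what moving the <li> rules from earlier into the type system could look like. Every name in it (Ol, Menu, Aside, Li, OlLi) is made up for illustration and is not the API of typed-html or any other crate. It only models two of the valid parents and leaves out <ul> and the obsolete <dir>.

```rust
/// Escape text so it is safe to place between tags.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

/// A plain `<li>`. It has no `value`, because this type is used for
/// parents where that attribute is not allowed.
pub struct Li {
    text: String,
}

impl Li {
    pub fn new(text: &str) -> Self {
        Li { text: text.to_string() }
    }
}

/// An `<li>` that can only be placed in an `<ol>`, and may carry `value`.
pub struct OlLi {
    text: String,
    value: Option<i32>,
}

impl OlLi {
    pub fn new(text: &str) -> Self {
        OlLi { text: text.to_string(), value: None }
    }

    pub fn value(mut self, value: i32) -> Self {
        self.value = Some(value);
        self
    }
}

/// `<ol>` only accepts `OlLi` children.
pub struct Ol {
    items: Vec<OlLi>,
}

impl Ol {
    pub fn new() -> Self {
        Ol { items: Vec::new() }
    }

    pub fn child(mut self, item: OlLi) -> Self {
        self.items.push(item);
        self
    }

    pub fn render(&self) -> String {
        let mut out = String::from("<ol>");
        for item in &self.items {
            match item.value {
                Some(v) => out.push_str(&format!("<li value=\"{}\">{}</li>", v, escape(&item.text))),
                None => out.push_str(&format!("<li>{}</li>", escape(&item.text))),
            }
        }
        out.push_str("</ol>");
        out
    }
}

/// `<menu>` only accepts plain `Li` children, so `value` cannot be set.
pub struct Menu {
    items: Vec<Li>,
}

impl Menu {
    pub fn new() -> Self {
        Menu { items: Vec::new() }
    }

    pub fn child(mut self, item: Li) -> Self {
        self.items.push(item);
        self
    }

    pub fn render(&self) -> String {
        let mut out = String::from("<menu>");
        for item in &self.items {
            out.push_str(&format!("<li>{}</li>", escape(&item.text)));
        }
        out.push_str("</menu>");
        out
    }
}

/// `<aside>` deliberately has no `child` method that takes a list item.
pub struct Aside;

impl Aside {
    pub fn new() -> Self {
        Aside
    }
}

// Neither of these compiles, which is the whole point:
// Aside::new().child(Li::new("first"));   // no method `child` on `Aside`
// Menu::new().child(Li::new("x").value(1)); // no method `value` on `Li`
```

With types shaped like this, both quiz questions from earlier turn into compile errors instead of something you find out in the browser. Whether separate item types or a trait-based design is the nicer API is an open design question; this sketch just picks the simplest thing that works.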

This isn't the first time I've written about this: I first pitched it in 2019 because I didn't have the bandwidth to implement it myself. But Bodil did, ran with it, and subsequently published the typed-html crate. This is good; I'd love to see different approaches to this same idea spring up in Rust and in other languages. All that's needed to get started is an idea of what the resulting API should look like, and then a compiler which takes .webidl node definitions and outputs types in the right shape on the other end ².

² It'd be amazing if the W3C kept a repo with these definitions up to date themselves, so everyone else could just pull them in. But one can wish.
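As a rough sketch of the generator step described above: the code below takes an interface definition and emits the source for a Rust struct. It skips the hard part, which is parsing real .webidl files, and assumes the definition has already been parsed into the hypothetical InterfaceDef and AttributeDef structs. The type mapping is an assumption too, and only covers three IDL types.

```rust
/// One attribute from an interface definition, e.g. `attribute long value;`.
/// Hypothetical, pre-parsed representation; not a real WebIDL parser's output.
pub struct AttributeDef {
    pub name: &'static str,
    pub idl_type: &'static str,
}

/// A pre-parsed interface such as `HTMLLIElement`.
pub struct InterfaceDef {
    pub name: &'static str,
    pub attributes: Vec<AttributeDef>,
}

/// Map a (small) subset of IDL types to Rust types.
fn rust_type(idl_type: &str) -> Option<&'static str> {
    match idl_type {
        "DOMString" => Some("String"),
        "long" => Some("i32"),
        "boolean" => Some("bool"),
        _ => None,
    }
}

/// Emit Rust source for a struct with one optional field per attribute.
/// Returns an error for IDL types this sketch does not know about.
pub fn generate(def: &InterfaceDef) -> Result<String, String> {
    let mut out = format!("pub struct {} {{\n", def.name);
    for attr in &def.attributes {
        let ty = rust_type(attr.idl_type)
            .ok_or_else(|| format!("unsupported IDL type: {}", attr.idl_type))?;
        // A real generator would also have to rename attributes that
        // collide with Rust keywords, such as `type`.
        out.push_str(&format!("    pub {}: Option<{}>,\n", attr.name, ty));
    }
    out.push_str("}\n");
    Ok(out)
}
```

Feeding it a definition named HTMLLIElement with a single `long` attribute called `value` produces a struct with one `pub value: Option<i32>` field. Parent/child rules are not part of the interface definitions, so a real tool would need a second source of truth for those.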

Type-Safe Semantic Representation

Once you have a base of checked HTML types that's pretty good. But that doesn't mean we're quite done yet. Remember when I said we weren't getting into ARIA roles? Well, we are now.

So see, if HTML is structure, and CSS is representation, then ARIA is "semantics". WAI-ARIA, or "ARIA roles" for short, enables you to ascribe meaning to your structure. When you see a bunch of <li> items in text form, what are you looking at? Are they tabs? Are they a menu? Are they a dropdown? ARIA roles enable you to meaningfully distinguish between them. How many of these roles are there? Over 250, and they all have different properties and interactions.

Again: the path away from trouble here is to create abstractions. Instead of having to remember which elements to apply the tablist, tab, and tabpanel roles to, you should be able to create something akin to Swift's TabView which encapsulates all of those differences for you.

This is a higher level of abstraction than just typed HTML provides. But it's equally important, since it's a requirement for useful keyboard navigation and accessibility. It'll take some more work than just plain typed HTML, and some more design work is required to incorporate best practices on how to actually structure elements, since you can't just scrape a spec for definitions. But once you can reason about e.g. "tabs" or "feeds", building more complex things becomes a lot easier. Check out the ARIA Patterns page to get a sense of the kind of stuff this could cover.
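Here is a minimal sketch of what such a "tabs" abstraction could look like in Rust. The Tabs type and its methods are hypothetical. The markup it emits follows the general shape of the tabs pattern (a tablist container, tab buttons, and tabpanel elements linked through aria-controls and aria-labelledby), but it leaves out keyboard handling and focus management, so treat it as an illustration rather than a complete accessible widget.

```rust
/// Escape text so it is safe to place between tags.
fn escape(text: &str) -> String {
    text.replace('&', "&amp;").replace('<', "&lt;").replace('>', "&gt;")
}

struct Tab {
    label: String,
    body: String,
}

/// A hypothetical tabs component. The caller never touches a role or an
/// ARIA attribute directly; `render` wires all of them up.
pub struct Tabs {
    /// Prefix for generated ids. Assumed to be a valid, unique HTML id.
    id: String,
    tabs: Vec<Tab>,
    selected: usize,
}

impl Tabs {
    pub fn new(id: &str) -> Self {
        Tabs { id: id.to_string(), tabs: Vec::new(), selected: 0 }
    }

    /// Add a tab with a text label and a text body.
    pub fn tab(mut self, label: &str, body: &str) -> Self {
        self.tabs.push(Tab { label: label.to_string(), body: body.to_string() });
        self
    }

    /// Choose which tab is initially selected (defaults to the first).
    pub fn select(mut self, index: usize) -> Self {
        self.selected = index;
        self
    }

    pub fn render(&self) -> String {
        let mut out = String::from("<div role=\"tablist\">");
        for (i, tab) in self.tabs.iter().enumerate() {
            let selected = i == self.selected;
            out.push_str(&format!(
                "<button role=\"tab\" id=\"{id}-tab-{i}\" aria-controls=\"{id}-panel-{i}\" aria-selected=\"{selected}\">{label}</button>",
                id = self.id,
                i = i,
                selected = selected,
                label = escape(&tab.label),
            ));
        }
        out.push_str("</div>");
        for (i, tab) in self.tabs.iter().enumerate() {
            // Panels that are not selected are hidden.
            let hidden = if i == self.selected { "" } else { " hidden" };
            out.push_str(&format!(
                "<div role=\"tabpanel\" id=\"{id}-panel-{i}\" aria-labelledby=\"{id}-tab-{i}\"{hidden}>{body}</div>",
                id = self.id,
                i = i,
                hidden = hidden,
                body = escape(&tab.body),
            ));
        }
        out
    }
}
```

A call like `Tabs::new("settings").tab("General", "...").tab("Privacy", "...").render()` is all the author of a page would write; the ids and roles that link tabs to panels can't get out of sync, because nobody writes them by hand.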

Embracing Compilation

To recap so far: we've established that HTML and ARIA are complex interfaces, that checking during compilation rather than at runtime has a lot of benefits, and that applying this to ARIA would enable us to reason about elements at a higher level of abstraction.

I believe that in order to make this all make the most sense, the web is missing "sourcemaps for HTML". If we're going to be compiling a bunch of semantic components down to HTML, then devtools should enable us to reason about them as components. When inspecting HTML in the DOM, we shouldn't have to look at a bunch of <div>, <p>, and <span> tags, but at more meaningful things such as <feed>, <profile>, and <calendar>.

The way I think about this is that devtools are like having access to a debugger. When debugging a compiled program we usually don't reason in terms of MOV rax jax and POWF bonk donk, but in terms of the operations we actually wrote, such as pet_cat and big_stretch. It's relatively rare that the actual output is what we're interested in, but it's always there if you need it.

Some of you web heads might be wondering: "What about Web Components? - Isn't that the same?" And, like, kind of! It's similar in how a reflection in a mirror is similar to the thing it reflects: similar at first sight, but actually everything is the opposite.

The point of the Custom Elements part of Web Components is that you can send <custom-tags> to the browser which are then interpreted at runtime to introduce interactivity, but preserve qualities such as cacheability of shared resources. A canonical example here is the relative-time-element tag used by GitHub. What we're doing is the opposite: the HTML we send to the browser is already fully expanded, and doesn't need to interpret any JS to enable interactivity. What we want is for the browser to be able to interpret all of our <li> and <span> tags generated by our components as the <calendar> and <feed> elements that we've authored. This is much closer to source maps: associate the building blocks we generated with the semantic components they actually came from.
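To give a feel for the data involved, here is a small sketch of a renderer that records, next to the HTML it emits, which component produced which byte range of the output. This is purely hypothetical: browsers have no mechanism for consuming a map like this today, and the Renderer, Mapping, and component_at names are invented for this example.

```rust
/// One entry in the map: a component name and the byte range of the
/// generated HTML that it produced.
pub struct Mapping {
    pub component: String,
    pub start: usize,
    pub end: usize,
}

/// Collects generated HTML and the component map side by side.
pub struct Renderer {
    html: String,
    mappings: Vec<Mapping>,
}

impl Renderer {
    pub fn new() -> Self {
        Renderer { html: String::new(), mappings: Vec::new() }
    }

    /// Append already-generated HTML to the output.
    pub fn raw(&mut self, html: &str) {
        self.html.push_str(html);
    }

    /// Run `body`, and record everything it emitted as belonging to `name`.
    pub fn component(&mut self, name: &str, body: impl FnOnce(&mut Renderer)) {
        let start = self.html.len();
        body(self);
        let end = self.html.len();
        self.mappings.push(Mapping { component: name.to_string(), start, end });
    }

    pub fn finish(self) -> (String, Vec<Mapping>) {
        (self.html, self.mappings)
    }
}

/// Find the innermost component covering a byte offset in the output,
/// the way a devtools panel might when you click on an element.
pub fn component_at(mappings: &[Mapping], offset: usize) -> Option<&str> {
    mappings
        .iter()
        .filter(|m| m.start <= offset && offset < m.end)
        .min_by_key(|m| m.end - m.start)
        .map(|m| m.component.as_str())
}
```

If a "feed" component wraps a <ul> and a nested "profile" component emits one of its <li> items, then looking up the offset of that <li> gives back "profile", and looking up the offset of the <ul> gives back "feed". The HTML that goes over the wire stays plain, and the map is the extra piece devtools would need to understand.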

The similarity with custom elements comes back up when we take a look at how custom-elements are rendered in the browser's Elements tab:

<relative-time
    tense="past"
    datetime="2016-11-23T11:12:43-08:00"
    data-view-component="true"
    title="Nov 23, 2016, 8:12 PM GMT+1">
    #shadow-root (open)
        "6 years ago"
    November 23, 2016 11:12
</relative-time>

This contains all the information anyone needs to debug the element. And what I'd love is to be able to access this view for server-rendered components without ever needing to go use any JS. And the uses don't stop there either: frameworks such as React also reason in terms of components and require custom devtools to re-surface that view. What we really need is a way to ascribe semantic meaning to our elements in a way that works for both server- and client-side applications.

Now, it could be that I've missed a capability that would enable doing exactly this with custom elements in browsers today - or perhaps some other mechanism. It's been a minute since I last used custom elements. But if that's the case I'd love to learn about it, because in my opinion every framework under the sun should start making use of that. I know I will!

Conclusion

In this post we've talked about the complexity of getting HTML right, how a compiler could be used to mitigate those issues, what ARIA roles are, how we could model higher-level type-safe components, and finally what's missing in the browser's devtools to round this out.

I strongly believe that type-safety would be a boon for the browser space. People have already experienced some of it with the adoption of TypeScript, and I believe leaning further into types and type-safety would enable the web platform to become far more accessible - both for developers and users.

The only piece we can't directly control is the ability to teach browsers to reason about the semantic meaning of elements, almost like being able to declare source maps for the resulting HTML. In an alternate universe this is what Custom Elements would have been. But I believe in order to make the web platform more accessible, we must make it possible to actually reason about <feed> and <calendar> as first-class items. And not just the <div> and <p> tags they're composed of.

update (2023-04-07): As coincidence would have it, two weeks after publishing this post the Chrome and Safari teams began discussing the upcoming "declarative shadow DOM" feature. This seems like it will resolve the outstanding issues involving shipping semantically meaningful elements from the server.

update (2023-04-11): I published the html crate implementing most of this blog post. You can find the source on my GitHub at yoshuawuyts/html.