Reliability
— 2020-05-02
Last December I turned 28, and last September marked 6 years since I started programming. Reflecting on the past, I'm quite pleased with what I've achieved. But every now and then I like to take a moment to look ahead and think about what I want to work on next. This is a post sharing some thoughts.
Time is a weird one. Last March felt like it lasted a quarter. April felt like it lasted at most two weeks. I don't know how many years I'll have in this industry, but it seems likely I have more time left working in it than I've spent so far. So it's worth planning ahead and considering what to invest in that will pay off in the long run.
Which things are hard for me?
There are an infinite number of things in the realm of computers that I'm not good at. Some include:
- Designing algorithms. I never put in the hours, so I'm just not good at it.
- Writing reliable software in one go. My work tends to need several bug-fixing cycles.
- Atomics and lock-free algorithms. This is like algorithms, but harder.
- Everything graphics and hardware. I haven't spent any time in either field.
- Optimizing performance. I'm not good at measuring performance, let alone improving it.
Some things I used to think were hard, but have since gotten better at by setting explicit goals:
- In 2018 a goal I set for myself was to become fluent in Rust.
- In 2019 I wanted to become fluent in async Rust and make it more accessible.
- In 2019 I wanted to gain both confidence and skill in writing.
This year I'd like to continue my work on async Rust, with a continued focus on the web space. This is a critical area for Rust to succeed in, as success in that field means more people will be able to use Rust at their jobs, which in turn means we can expect larger investments to go into Rust. But to me that seems mostly like a matter of putting in the hours now. A personal goal for professional growth I'd like to set is to become better at authoring reliable software.
Reliability
"It's been amazing working with <colleague>. They managed to implement <feature> in record time, and it's since worked flawlessly. I've never seen anything like it."
— A colleague describing another colleague's work last year, paraphrased.
This exchange has really stuck with me, because I realized that it didn't just describe some of my colleague's work. It described all of their work. It felt like their work consistently had fewer bugs than everyone else's, while often also being friendly to use and rich in features. Even when building complex and novel programs.
I aspire for my work to be of the same quality. I feel I'm able to produce designs that are friendly to use, fairly quickly. But my work is nowhere near as reliable as I want it to be, which means I spend a fair amount of time fixing bugs after publication. And I'd like to do better!
That's why I'd like to make it an explicit goal for myself over the next year to improve the reliability of my work. It'll be a fun challenge to bring down the number of bugs in my work in a structured way.
Approach
My initial thesis is that "reliable software" is made up of three parts that can be individually improved:
- The software that provides the functionality we actually want.
- Assertions to validate the software works as intended.
- Introspection tools to help debug problems.
Even slight progress on these facets will have a compounding effect on the reliability of my work.
My starting point for "better software" is the talk "PID Loops and the Art of Keeping Systems Stable" by Colm MacCárthaigh. The talk is about bringing practices from industrial control systems to everyday software engineering. It covers many useful concepts that seem worth trying out; in particular the notion of building "measure-first" software seems intriguing. It also seems state machines could play a part here.
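To make the state machine idea a bit more concrete, here is a minimal sketch of what I have in mind. It is not taken from the talk: the connection lifecycle, the `State` and `Event` names, and the `MAX_ATTEMPTS` limit are all hypothetical and only serve as an illustration. The point is that every allowed transition is written down in one place, and everything else is rejected.

```rust
/// Hypothetical connection lifecycle; the names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Connecting { attempt: u32 },
    Connected,
    Failed,
}

/// Hypothetical events that can drive the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Start,
    Success,
    Error,
}

/// Assumed retry limit for this sketch.
const MAX_ATTEMPTS: u32 = 3;

/// Returns the next state, or `None` when the event is not valid
/// in the current state. All allowed transitions live in this one match.
fn transition(state: State, event: Event) -> Option<State> {
    match (state, event) {
        (State::Idle, Event::Start) => Some(State::Connecting { attempt: 1 }),
        (State::Connecting { .. }, Event::Success) => Some(State::Connected),
        // Retry until the attempt limit is reached.
        (State::Connecting { attempt }, Event::Error) if attempt < MAX_ATTEMPTS => {
            Some(State::Connecting { attempt: attempt + 1 })
        }
        (State::Connecting { .. }, Event::Error) => Some(State::Failed),
        (State::Connected, Event::Error) => Some(State::Failed),
        // Anything else is an invalid transition.
        _ => None,
    }
}

fn main() {
    let mut state = State::Idle;
    for event in [Event::Start, Event::Error, Event::Success, Event::Start] {
        match transition(state, event) {
            Some(next) => {
                println!("{:?} --{:?}--> {:?}", state, event, next);
                state = next;
            }
            None => println!("rejected {:?} in state {:?}", event, state),
        }
    }
}
```

Because `transition` is a pure function over two small enums, it is also cheap to test exhaustively, which ties in with the testing ideas in this post.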
My starting point for "better tests" is fuzzing and property testing. For the datrs project I wrote property tests for most modules, which helped catch many bugs. cargo-fuzz now has support for structure-aware fuzzing, and I recently became aware Fred Hebert (of Erlang fame) wrote a proptest book.
Making fuzz testing a regular part of my workflow will likely yield large returns.
Fuzz / property testing in a nutshell: write a program that generates test cases, so the software can be tested much more thoroughly than with hand-written cases alone.
I gave a talk about it here.
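As a rough sketch of the idea, the example below generates random inputs and checks a roundtrip property on them. Everything in it is an assumption made for illustration: the run-length encoder is a made-up piece of code under test, and the tiny xorshift generator stands in for what a real property testing or fuzzing tool would provide (along with shrinking and corpus management, which this sketch does not attempt).

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Generates between 0 and `max_len - 1` bytes drawn from a small
    /// alphabet, so that repeated runs actually show up in the input.
    fn bytes(&mut self, max_len: usize) -> Vec<u8> {
        let len = (self.next_u64() as usize) % max_len;
        (0..len).map(|_| (self.next_u64() % 4) as u8).collect()
    }
}

/// Hypothetical code under test: run-length encoding as (count, value) pairs.
fn rle_encode(input: &[u8]) -> Vec<(u8, u8)> {
    let mut out: Vec<(u8, u8)> = Vec::new();
    for &byte in input {
        if let Some((count, value)) = out.last_mut() {
            if *value == byte && *count < u8::MAX {
                *count += 1;
                continue;
            }
        }
        out.push((1, byte));
    }
    out
}

/// Inverse of `rle_encode`.
fn rle_decode(runs: &[(u8, u8)]) -> Vec<u8> {
    runs.iter()
        .flat_map(|&(count, value)| std::iter::repeat(value).take(count as usize))
        .collect()
}

/// Property: decoding an encoded input yields the original input.
/// Returns the first counterexample found, if any.
fn check_roundtrip(seed: u64, cases: usize) -> Result<(), Vec<u8>> {
    let mut rng = Rng(seed);
    for _ in 0..cases {
        let input = rng.bytes(64);
        if rle_decode(&rle_encode(&input)) != input {
            return Err(input);
        }
    }
    Ok(())
}

fn main() {
    match check_roundtrip(0x2545_F491_4F6C_DD1D, 1_000) {
        Ok(()) => println!("roundtrip property held for all generated inputs"),
        Err(input) => println!("counterexample: {:?}", input),
    }
}
```

The useful part is the shape: a generator, a property, and a loop that reports the failing input. Dedicated tools do the same thing with far better generators and will minimize the counterexample for you.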
My starting point for "better introspection" is to really up the amount and quality of logs I write. I often don't add logs to my work until the very end, and even then it leaves much to be desired. Being much more proactive with this will make systems much easier to debug, which in turn means less time spent debugging. The work Eliza shared yesterday provides a glimpse of what better introspection tools can be like, and has me excited about possibilities here.
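To illustrate what writing log points alongside the logic could look like, here is a small sketch using only the standard library. The `log_point` helper, its key=value line format, and the `parse_port` function are all hypothetical; in a real project I would expect to reach for an established crate such as `log` or `tracing` rather than roll my own.

```rust
use std::io::Write;

/// Hypothetical helper: writes one key=value log line per call.
fn log_point<W: Write>(sink: &mut W, level: &str, msg: &str, fields: &[(&str, String)]) {
    let mut line = format!("level={} msg={:?}", level, msg);
    for (key, value) in fields {
        line.push_str(&format!(" {}={}", key, value));
    }
    // Write errors are ignored so that logging never changes program behavior.
    let _ = writeln!(sink, "{}", line);
}

/// Hypothetical unit of work with log points written as part of the logic,
/// rather than bolted on at the very end.
fn parse_port<W: Write>(sink: &mut W, input: &str) -> Option<u16> {
    log_point(sink, "debug", "parsing port", &[("input", format!("{:?}", input))]);
    match input.trim().parse::<u16>() {
        Ok(port) => {
            log_point(sink, "info", "port parsed", &[("port", port.to_string())]);
            Some(port)
        }
        Err(err) => {
            log_point(sink, "warn", "invalid port", &[("error", format!("{:?}", err.to_string()))]);
            None
        }
    }
}

fn main() {
    let mut stderr = std::io::stderr();
    parse_port(&mut stderr, "8080");
    parse_port(&mut stderr, "eighty");
}
```

Passing the sink in as a `Write` keeps the sketch testable: a `Vec<u8>` can capture the log lines in a test, while `stderr` is used when running for real.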
Conclusion
In this post I've covered why software reliability matters, and how I plan to improve:
- Adopt patterns to write inherently more reliable software.
- Use automation to test more often, and more thoroughly.
- Make writing log points a regular part of my workflow.
I'm sure as time goes on I'll learn about more approaches, and deepen my understanding of what makes reliable software. Some of these ideas might not work as expected, or take too much time to be practical. But I want to actively try and build more reliable software, and as always I'll be documenting the process.