Path-Sensitive

Monday, April 1, 2024

As you’re working day-to-day, every so often it’s nice to take a step back and improve things.

Some changes require having a strong understanding of how things are done and why, but can lead to huge benefits. Think: a chef finding a better way to brine poultry to preserve flavor, or finding a baking schedule that lets most customers get bread fresh out of the oven.

Other changes lead to a slightly pleasant environment, but can be knocked off without much thought when you walk by. Think: straightening the chairs in the dining area, or throwing away the cracked plates.

Refactoring a codebase can be intense and scary. Kent Beck’s “Tidy First?” suggests we reduce the barriers by starting small. It offers a catalogue of “tidyings”: small, uncontroversial improvements you can knock out right before you code. Sorta like the software equivalent of straightening the chairs.

And then it builds on that. It talks about whether you should straighten the chairs before or after you cook. About whether you should straighten the chairs and clean the cupboard in one batch or two. About the theory of why having straightened chairs leads to a more pleasant environment and improved business.

By the end of it, you may have forgotten that deeper improvements even exist.

Shorten first?

This book has one cardinal sin: not understanding when more details are helpful vs. superfluous.

On the side of harmful brevity, there’s Chapter 12 about when and how to extract helper methods. I think everything in the chapter is correct and good advice, but it’s also vague enough that I can’t see it being useful to anyone who doesn’t already understand it. And one of the core tidyings is described by just a few lines of text and example code featuring functions foo and foo_body, so that I myself had trouble understanding what he’s recommending, especially since it mostly applies to just a couple of languages.

But on the side of verbosity, there are a lot of paragraphs like this:

Behavior creates value. Rather than having to calculate a bunch of numbers by hand, the computer can calculate millions of them every second. Turns out people will pay not to have to calculate numbers by hand. If running the software costs $1 in electricity and you can charge folks $10 to run it on their behalf, then you have a business.

(Chapter 23)

You know what doesn’t create value? Spending 4 sentences to say that automating stuff is worth money.

This pattern — giving extra detail where it’s obvious, omitting detail where it would be helpful — repeats throughout the book. And so, even though “Tidy First?” is pretty short at around 100 pages, I feel like it’s constantly trying to fill space.

Sometimes the space issues go deeper. Like, I’m reminded in parts of a story I read about a “computer skills” class taught in a 3rd-world classroom without computers. Think: a lecturer stands in front of a chalkboard, and drones, “When you click the Start button, a menu would appear. Then you would mouse over…”

Chapter 17, “Chaining,” is like this. It’s supposed to be a list of examples of how doing one tidying might lead to another, like how you might see a new opportunity to reorder code after deleting a branch that never runs. But it’s tough to follow and impossible to remember if you haven’t tried tidying and seen this play out. And if you have, then your own experience will show you far better than this chapter could.

The actual list of tidyings is fine. Most are pretty uncontroversial small improvements. He says some things about reordering code that I find overly simplistic, and generally a smaller improvement than adding good code sections. He repeats the typical line of “Delete code that’s not used instead of commenting it out, because you can always recover it from VCS.” (That’s a view that I’ve [controversially] started to turn against, for the same reason that “you can recover it from backups” is not a compelling reason to delete files currently not in use.) He gives some examples of ways that programmers, even after being taught in intro classes not to use magic numbers, still litter their code with constants like 404. But I wish he’d tell Python programmers to stop designing APIs where you write string constants like “r--” and “bs” to denote that your scatterplot should use red dashes and blue squares. His catalogue lists the changes of “Add a comment that you wish you had when reading the code” and “Remove a comment that just says what the code says.” But I’d rather have a deeper discussion that offers a bigger toolbox of how to improve comments. (The discussion in A Philosophy of Software Design is my favorite part of that book.)
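To make the magic-constant complaint concrete, here is a minimal Python sketch. The names `Color`, `Mark`, `PlotStyle`, and `describe_response` are hypothetical, invented for this illustration; only `HTTPStatus` comes from the standard library. It shows one way a bare 404 and a terse string like “r--” could be given names, not how any particular plotting library works.

```python
from dataclasses import dataclass
from enum import Enum
from http import HTTPStatus


class Color(Enum):
    RED = "r"
    BLUE = "b"


class Mark(Enum):
    DASHED_LINE = "--"
    SQUARE = "s"


@dataclass(frozen=True)
class PlotStyle:
    """Hypothetical explicit replacement for a terse format string."""
    color: Color
    mark: Mark

    def to_format_string(self) -> str:
        # Recover the terse form for an API that still expects it.
        return self.color.value + self.mark.value


def describe_response(status_code: int) -> str:
    # HTTPStatus.NOT_FOUND names the constant instead of a bare 404.
    if status_code == HTTPStatus.NOT_FOUND:
        return "page not found"
    return "other"


RED_DASHES = PlotStyle(Color.RED, Mark.DASHED_LINE)
BLUE_SQUARES = PlotStyle(Color.BLUE, Mark.SQUARE)
```

The reader of `PlotStyle(Color.RED, Mark.DASHED_LINE)` doesn’t have to remember what each character means, and a typo becomes an error instead of a surprising plot.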

Connect the ideas last?

On the whole, Beck seeks to provide a repertoire of easy wins you can earn when you drive by some code. I think he succeeds in that goal, although it’s a rather impoverished repertoire. (E.g.: they can’t solve any of the refactoring challenges I’ve been running lately.) But if you’re someone who lacks a repertoire at all, I can see them as great training wheels.

But then he spends 2/3 of the book talking about how to schedule time for tidying and how reducing coupling reduces the cost of change and the benefits of making decisions reversible and other stuff that’s not very relevant to deleting useless comments and adding blank lines to break up code. It feels a bit like he wrote a book on these shallow tidyings, and then also wrote a book on deep refactoring, and then accidentally got the chapters mixed up.

And then when I asked him about this, he actually said that I’m right but should wait for his next book: “I'm giving concepts bottom up--tidyings--and top-down--theory. I *will* meet in the middle, I promise.” And he’s pursuing this strategy because: “I think having people practice designing consciously for a year is good prep for being able to understand the next layer.”

Maybe he envisions his readers growing up with the series as it matures from microscopic code improvements to aligning a team? Perhaps I have a higher opinion of other people than he does, but I don’t understand how he can say that while also marketing it to senior engineers.

If you read this book, I recommend just skimming the first part with the list of tidyings and figuring out what everything is. He does say some stuff later about how to send tidyings for code review and why they provide value, but I’d expect most readers will be able to produce the same insights themselves after trying them a few times.

But really, the interesting contents could be shortened to a list of Tweets. And I’ve done so here.

Kent Beck is a rather renowned software engineer, having created Test-Driven Development, Extreme Programming, and JUnit. His Substack has many pieces I enjoyed a lot, such as his posts on bitemporality and measuring developer productivity. In our interactions, he’s been an exemplar of a gentleman, and he also writes about the insights that have made him so. Overall, he has a lot of interesting things to say, both about software engineering and about dealing with people and emotions. But you won’t find them in this book.

So, if you’re considering buying this book, just purchase a subscription to his Substack instead. He’ll earn more and you’ll learn more. The only one who loses is the publisher.

Thanks to Hillel Wayne for comments on earlier drafts of this post.

Friday, December 1, 2023

You’re a line programmer for EvilCorp, and it’s just an average day working on some code to collapse the economy.

Then you realize you need some code for disrupting supply chains.

Should you split it into a new file?

Let’s say you do.

Pretty soon your directory looks like this:

It’s so well organized! You want to know what it tracks about the robot armies, it’s right there.

Except that all your files look like this:

And the control flow through the files looks like this:

Now you’re starting to regret having broken it up so aggressively.

Now back up a bit.

Let’s say you keep it in one file, at least for now.

Then you need to add some code for supply chain and robot info.

This repeats a few times. If you’re lucky, the file looks like this, where lines of the same color represent related code:

There, we can see related code is mostly together, but there’s some degradation, and a few things don’t fit neatly into any category. Needs some weeding, but overall a decently-kept garden.

If you’re less lucky, it looks more like this:

This is a file where there clearly used to be some structure, but now it’s overgrown with chaos.

Of course, if you don’t have the whole file committed in memory, it looks more like this:

Man, just figuring out how it directs the army is a headache. That really should be broken out into its own file. But don’t you wish you had done so earlier?

In nearly all ecosystems, programs consist of files consisting of text. When you break code into more files for more categories, it becomes easier to find and understand code for each category, but harder to read anything involving multiple categories. When you keep code together into fewer files, it becomes easier to track the control flow for individual operations, but harder to form a mental map of the code.

What if I told you that you can eat the cake and have it too?

Here’s how.

The magic third way

Let’s look at your colleague Tom in the service division of the robotics department. He works on the repair manual that keeps the whole company’s army running smoothly. One day he’s working on the section for how to maintain the mirrors in the laser cannons.

He realizes that he actually wants to add quite a few things about polishing the mirror. You see, the mirrors can only be polished with a custom nanoparticle solution, and so part of maintaining the mirror is really about maintaining the polish. Where to put this information?

Unlike in code, it’s a pretty big deal to “split stuff out into a new file,” since they like to keep everything in one volume for the technicians. Putting it in a new chapter would mean an awful lot of page flipping. And it’s quite messy to just mix in a lot of sections about maintaining the polish into the larger chapter on the laser cannon.

But he has no problem adding them with more organization:

That’s the normal way to organize books, with chapters and subchapters (and sub-subchapters). Or, in HTML: h1, h2, etc.

We have them in code too.

They look like this:

/*******************************************************
 **************** h1 in C/C++/Java/JS ******************
 *******************************************************/

/**************
 ********* This is an h2
 **************/


/********
 **** An h3
 ********/

Or this:

################ Python/Ruby/Bash H1 ################

############## An H2

##### An H3

Or this:

--------------------------------------------------------
--                   Haskell/Lua h1                   --
--------------------------------------------------------


-----------                An h2             -----------


------ An h3

Or any of countless other variations. They all get the job done. I tend to like the ones that more visibly break up the text, so: like the first bunch, except translated into whatever language I’m using.

I teach people a lot of things about software design. Some of them are things I dusted off from papers written in the 70’s. Many more I can claim to have invented myself.

This one, not in the slightest. In fact, here’s some CSS guys doing it, in the frontend framework “Semantic UI”:

/*******************************
             Types
*******************************/

/*-------------------
       Animated
--------------------*/

.ui.animated.button {
  position: relative;
  overflow: hidden;
  padding-right: 0em !important;
  vertical-align: @animatedVerticalAlign;
  z-index: @animatedZIndex;
}

And here’s some smart-contract developers:

But I can say that I’ve never seen anyone else write about it explicitly, nor take it as far as I do.

“Jimmy,” one listener commented. “At my workplace, I’ve seen a lot of code with these sections, but stuff keeps getting added in the middle, and then the sections become meaningless.”

“Is it too hard to just add a subsection for exactly the stuff added?”

“Actually,” he replied, “I don’t think I’ve ever seen subsections.”

But my code these days has it everywhere. On occasion to three or more levels of organization.

(What in the world is this code doing, just renaming strings? Another deep idea and another discussion. Short version: Trying to get the benefits of having a fancy datatype for identifiers without actually doing any work.)

Aggressively splitting files into sections and (sub-)subsections is the biggest way my code has changed in the last 5 years. It requires little skill, and, once you build the habit, little effort. But I’ve found it makes a huge difference in how pleasant it is to live in a codebase.
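As a minimal sketch of what three levels can look like, here is a small Python file using the `#`-style headers shown earlier. The function names and their logic are hypothetical, invented for this illustration; only the sectioning is the point.

```python
################ Supply-chain disruption ################

############## Offensive tactics

##### Surrounding a factory

def robots_per_gate(robots, gates):
    """Spread robots evenly across gates; leftovers go to the first gates."""
    base, extra = divmod(robots, gates)
    return [base + 1 if i < extra else base for i in range(gates)]


##### Invading a factory

def invasion_waves(robots, wave_size):
    """Number of waves needed to send every robot in."""
    return -(-robots // wave_size)  # ceiling division


############## Reporting

def summarize(robots, gates, wave_size):
    """Combine both tactics into one plan."""
    return {
        "per_gate": robots_per_gate(robots, gates),
        "waves": invasion_waves(robots, wave_size),
    }
```

If something new about surrounding factories shows up later, it has an obvious home, and if it doesn’t fit, that’s the cue to add another H3 rather than drop it in the middle.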

Cognitive load and design reconstruction

Hopefully I don’t have to argue too hard that code organized into sections and subsections is nicer to read, if not to write. Here are two versions of a code snippet (source) from Semantic UI: the original, and one with the section dividers removed. Personally, even at a glance, I find the version with them present more inviting.

There are actually some pretty deep reasons why sub-file organization works.

We saw that splitting a file comes with advantages and disadvantages. As does not splitting.

But, actually, for not splitting, all the disadvantages were in reading it.

When you start a new feature, you have some high-level intentions. By some process, you turn the high-level intentions into low-level intentions into code.

But when someone first looks at a file, it’s just a blob.

Then they start to read and build an understanding of each piece.

As they understand more pieces, they can begin to understand how they fit together into a bigger picture.

But all this is wasted work! The reader is just trying to reconstruct knowledge that was already known to the writer!

That’s a very general problem. Anything done to counter it falls under the umbrella of what I call the Embedded Design Principle. Splitting a file into sections is just one particularly effective instance of this broader idea. As poetically explained in The 11 Aspects of Good Code:

Good code makes it easy to recover the intent of the programmer

A programmer dreams a new entity. Her mind gradually turns dream into mechanism, mechanism into code, and the dreamed entity is given life.

A new programmer walks in and sees only code. But in his mind, as he reads and understands, the patterns emerge. In his mind, code shapes itself into mechanism, and mechanism shapes itself into dream. Only then can he work. For in truth, a modification to the code is a modification to the dream.

Much of a programmer's work is in recovering information that was already present in the mind of the creator. It is thus the creator's job to make this as simple as possible.

Back to the robot armies. The reader has started to piece together a bigger picture.

In this example, the code was written in three sections and then not edited. That brings an offer: understand the first three functions, and you understand the big ideas of the mechanics behind sending forth a robot army. Understand the next three, and you understand the bigger picture. But the reader in the picture hasn’t found that structure yet.

Piecing this code together is like a jigsaw puzzle. And in a jigsaw puzzle, if I were to give you a box with only pieces from the left half, and a box with only the pieces from the right half, it would be more than twice as easy.¹ That’s a lot like what you’re doing for the reader by labeling code sections.²

There’s one more benefit too. I and many others I showed this to report a sense of relaxation and calm from skimming through a well-sectioned file, a lot like coming home to a clean room. I think what’s going on here is cognitive ease: there’s a psychological phenomenon in which easy things literally cause happiness. There’s an entire chapter on it in Kahneman’s Thinking, Fast and Slow.

Oh, and then there’s also how naming is one of the two hard problems of computer science (the others being cache invalidation and off-by-one errors). If you put two functions that coordinate robots surrounding and invading a factory into a file, then you’re going to want to think of some general name that captures both these and everything similar that should go in the same file. That sounds kinda tough; my best is “offensive_tactics.ts.” But if you just cordon these off into a little section of a larger file containing the whole supply-chain disruption logic, then naming that section is a much lower bar. After you find yourself writing additional related functions, then you can break it off into a new file as easily as you can change a subchapter in a book into a full chapter.

So there's a lot of costs and benefits to breaking up a file vs. keeping it together, and we've seen that having sections and subsections does a lot to lower the cost of keeping it together. But actually, it's pretty rare that I've seen people go too aggressive in breaking up files. More often I see people who think breaking up a file would make it more organized, but there's just too much inertia. And the bigger a file grows, the harder it becomes to break out meaningful components.

That's why having a handful of giant files used to be the hallmark of a bad codebase, one completely disorganized. But this is the real greatness of sections: it's a way to get much of the benefits of splitting up files, but it feels more like jotting down a thought you had than actually doing work. And if you keep things organized in sections, then it's not any harder to break apart a file later than it is now.
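One way to see why a sectioned file stays easy to break apart: the top-level headers already mark the cut points. The sketch below assumes the Python-style H1 shown earlier (a title between two runs of at least eight `#` characters). `split_by_h1` is a hypothetical helper written for this post, not an existing tool.

```python
import re

# Matches the Python-style H1: a run of '#', a title, another run of '#'.
# H2 and H3 headers have no trailing run, so they do not match.
H1 = re.compile(r"^#{8,}\s+(?P<title>.+?)\s+#{8,}\s*$")


def split_by_h1(source):
    """Group the lines of a file under its top-level section titles.

    Lines before the first header are kept under the key "preamble".
    """
    sections = {"preamble": []}
    current = "preamble"
    for line in source.splitlines():
        match = H1.match(line)
        if match:
            current = match.group("title")
            sections.setdefault(current, [])
        else:
            sections[current].append(line)
    return sections
```

Each value in the result is a candidate new file, and the section title is a first draft of its name. Imports and cross-references would still need fixing by hand.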

So now we know that, just by recording a little bit more of your thinking when writing code, it’s possible to have files which are both large and well-organized. And doing so lets you read code faster, follow control-flow better, delay having to find good names, and literally injects happiness into your life. Let’s make our files large again!

Of course, this is still not the easiest thing you can do to lower the cost of large files.

That would be buying a bigger monitor.

Thank you to Jonathan Camenisch, James He, and Supachai “Champ” Suwanthip for discussion on the ideas behind this blog post. Thank you to Benoît Fleury, Torbjörn Gannholm, Oliver Chambers, and William Berglund for comments on earlier drafts.

¹ I had to check this one and, it turns out, average solving time for jigsaws is remarkably linear in the number of pieces. But, if you like jigsaws and know some computer science, we can reason about the complexity of each step of solving: first find the corners and edges (linear), then group pieces by region (linear-ish), then solve the parts of the puzzle where each piece looks distinct (linear to quadratic), then solve the parts of the puzzle where the pieces all look similar (near quadratic). Through this lens, a large jigsaw is actually composed of many subregions, each of which could take near-quadratic solving time. In the worst case, the jigsaw is just a solid color, and you’re stuck comparing each edge pairwise, which is clearly quadratic unless you’re really good at indexing on the shapes of the holes and protrusions. This invites the more accurate statement: for each of the quadratic-time subregions of a jigsaw puzzle, if I were to split the pieces into a left and right half, each half would take roughly a quarter as long to solve as the whole subregion, so solving both halves would take roughly half as long. This is both more accurate and a better metaphor for the effect of adding subdivisions to a source file.
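The quadratic arithmetic in this footnote can be checked by counting pairs. This is a rough sketch under a simplifying assumption: every piece in a subregion may need to be compared against every other piece, and the cost of joining the two halves afterwards is ignored. The piece count of 100 is arbitrary.

```python
def pairwise_comparisons(pieces):
    """Worst-case number of piece pairs to compare: n choose 2."""
    return pieces * (pieces - 1) // 2


whole = pairwise_comparisons(100)     # one box of 100 similar-looking pieces: 4950
one_half = pairwise_comparisons(50)   # one box holding only the left half: 1225
both_halves = 2 * one_half            # left and right solved separately: 2450

print(whole / one_half)      # about 4: each half is about four times cheaper
print(whole / both_halves)   # about 2: both halves together are about twice as cheap
```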

² The ideal would be to do the code equivalent of handing someone a painting instead of cutting it up into jigsaw pieces in the first place. The programming equivalent would be to actually make your designs into the program. Choices for approaching that include writing with declarative libraries, using symbolic program synthesis techniques, or using ChatGPT and letting natural language be the code.

Friday, November 17, 2023

There’s a ubiquitous piece of startup advice: “Sell painkillers, not vitamins.” With painkillers, the story goes, you fix something that’s been bothering the customer immediately. With vitamins, all you have to offer is some vague future wellness benefit. Through this lens, a lot of product ideas are bad businesses because they are vitamins. But is it true?

I think there’s a message here that a lot of engineer types interested in creating a product need to hear. Working on something new, it’s easy to get in a bubble where the only thing that matters is the thing you’re trying to improve, and it can be shocking to encounter people who don’t seem to care, even when your offering could dramatically improve their lives. Even a lot of sales types, when asked to “sell me this pen,” will immediately launch into a spiel on the product’s features, instead of engaging with the customer’s actual desires.

But all that is a little deeper than the words “painkillers, not vitamins.” And so what actually results from people applying this slogan is:

Debates about whether a given product is a vitamin or painkiller
Declaration that some company is doomed because they “sell vitamins”
People talking themselves out of attempting “vitamin ideas” at all

Simple models help us evaluate ideas. But this metaphor acts less as a model to guide decision-making and more as a thought-stopper, something that prevents a lot of useful products from getting built. Citing this analogy, Ben Horowitz once told me and a group of friends “You wouldn’t pay if someone built a 2x-better way of brushing your teeth.” But at that time, I was interested in optimizing my morning routine, and would have loved Andreessen Horowitz to fund exactly that. Something that makes your web browser 2x faster while using less battery? No; vitamin, not painkiller. Automating payback of student loans? No; vitamin, not painkiller. And my first company, something that fixes bugs in your program for you? Get out! Vitamin, not painkiller!

In all of these cases, the entrepreneur would have been better served by being given an actual model of customer behavior to evaluate potential plans. Browsing the examples above, this advice more often than not serves as a way for the speaker to feel smug in demeaning someone’s plan without engaging in the real question of customer profiles and how to reach them.

So maybe the metaphor has been misused, but it seems a bit extra to ask people to stop using a useful metaphor just because someone finds it annoying. If you talk about how painkillers outsell vitamins, won’t that help people build more useful products by focusing on the problem rather than the benefit?

No.

And not just because of this slogan’s tendency to outcompete useful thinking.

It is also literally wrong.

Sell literal vitamins, not literal painkillers

The metaphor rests on several supposed facts about the actual vitamin and painkiller markets.

Vitamins are cheap, painkillers are expensive.
Vitamins are about vague future wellness benefits, painkillers make the problem go away NOW.
People go out of their way for painkillers, but endlessly procrastinate on vitamins.
And because of these, painkillers far outsell vitamins.

And every single one of these is wrong.

Vitamins are cheap, painkillers are expensive? Well, here’s the painkiller section of my local CVS.

And here’s my local GNC.

At $59.99 for 30 capsules of the Vitapak Program, that’s $2.00 per capsule. Contrast that with the $0.05 to $0.325 per capsule of the painkillers at CVS.

That was the women’s version. Let’s look at the men’s versions:

If you zoom in really hard on the Strength Vitapak, it says “14 day supply.” In a $79.99 box. That’s $5.71 per day of vitamins. You can buy an entire 24-caplet box of acetaminophen at that price.

Perhaps I’m being unfair comparing a discount drug store to a notoriously expensive chain. But that’s part of the point. What’s the painkiller equivalent of GNC? Somewhere where pain-free enthusiasts will gather to optimize their lifestyle with the latest and greatest way to avoid pain? The chiropractor?

Still, we can go down-market for a more apples-to-apples comparison, showcasing vitamins sold to everyday people instead of fitness nuts. Here’s the vitamin section of my CVS:

These range from $0.061 to $0.433 per tablet, higher in aggregate than the painkillers. And a lot of people consume multiple kinds of vitamins every single day.
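For anyone who wants to check the shelf math, here is the arithmetic as a short Python sketch. The prices and counts are the ones quoted above; `unit_price` and the constant names are just labels chosen for this example.

```python
def unit_price(box_price, units):
    """Price per capsule (or per day) for one box."""
    return box_price / units


vitapak_per_capsule = unit_price(59.99, 30)  # women's Vitapak, 30 capsules
strength_per_day = unit_price(79.99, 14)     # Strength Vitapak, 14-day supply

# Per-unit price ranges read off the CVS shelves, in dollars.
CVS_PAINKILLER_RANGE = (0.05, 0.325)
CVS_VITAMIN_RANGE = (0.061, 0.433)

# The vitamins are pricier at both the cheap end and the expensive end.
vitamins_pricier_at_both_ends = (
    CVS_VITAMIN_RANGE[0] > CVS_PAINKILLER_RANGE[0]
    and CVS_VITAMIN_RANGE[1] > CVS_PAINKILLER_RANGE[1]
)

print(f"${vitapak_per_capsule:.2f} per capsule, ${strength_per_day:.2f} per day")
```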

Onto the next claim. Vitamins are about vague wellness benefits, eh? So, uh, have you seen actual vitamin advertisements?

Of course, this is a selection. Most vitamin advertisements basically just say “Pure vitamin!” In contrast, about all of the painkiller ads are of that form. I really tried to find good painkiller advertisements to compete, but mostly couldn’t. Here’s the best I found.

Maybe I cheated a bit. I already knew where to look for good vitamin ads, probably because I’ve been blasted with them all my life while the painkillers sat on the discount shelves. On the other hand, I cheated harder for painkillers, as the Tylenol ad is a famous one from 1975. It turns out there are a lot of ways to sell vitamins, while the painkillers are limited to “pure extra-strength aspirin,” “cures headaches,” and “fewer side effects.” Can you imagine someone writing a book titled “The Acetaminophen Advantage?”

Then we have the claim that people are more eager to take painkillers than vitamins. I would expect this to be true in aggregate, as a pure guess. But still we have medical providers complaining about patients who refuse pain meds and even laws to help people refuse them more. Googling for “vitamin refusal” gives me people who don’t want their babies injected with Vitamin K after birth. Googling for “vitamin procrastination” gives me people debating whether multivitamins can cure procrastination. I finally tried “can’t get mom to take vitamins.” Instead I got “How do I get Grandmother to stop taking supplements.”

And so, do painkillers really outsell vitamins?

Based on the things above, I was pretty sure the answer would be no. But even still I was shocked.

I collected sales numbers for an exhaustive list of vitamins, minerals and painkillers; over 40 chemicals in total. My methodology and data are in the appendix.

The result?

Not even close.

Vitamin and mineral sales total $46.723 billion, while over-the-counter painkillers total $17.232 billion. Vitamins and minerals outsell painkillers by over 2.5x.

If you add prescription painkillers but not prescription vitamins, then the painkillers do win out in my calculations, at $48.322 billion. But I think over-the-counter painkillers are the intended comparison, as the articles spouting the vitamins vs. painkillers metaphor usually talk about pills for relieving headaches rather than addictive substances given to cancer patients. Also, my estimates for vitamins and minerals actually omit a few that I couldn’t find numbers for, while the opioids market has probably shrunk since the size was reported. All is spelled out in the appendix.
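The ratios behind those statements, as a quick Python sketch. The three totals are the figures quoted above, in billions of dollars; the variable names are only labels for this example.

```python
# Market totals from the text, in billions of US dollars.
VITAMINS_AND_MINERALS = 46.723
OTC_PAINKILLERS = 17.232
PAINKILLERS_WITH_PRESCRIPTION = 48.322

# Over-the-counter comparison: vitamins and minerals vs. OTC painkillers.
otc_ratio = VITAMINS_AND_MINERALS / OTC_PAINKILLERS

# Adding prescription painkillers (but not prescription vitamins) flips the
# result, though only narrowly.
with_prescription_ratio = PAINKILLERS_WITH_PRESCRIPTION / VITAMINS_AND_MINERALS

print(f"vitamins vs. OTC painkillers: {otc_ratio:.2f}x")
print(f"all painkillers vs. vitamins: {with_prescription_ratio:.2f}x")
```

So the over-the-counter gap is a bit over 2.7x, while the prescription-inclusive lead for painkillers is only a few percent.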

So if you want to make money: sell vitamins, not painkillers.

Autopsy

By analyzing how the “vitamins vs. painkillers” metaphor failed, and trying to understand why it’s been repeated so much despite being based on a premise easily proven false, we can extract general lessons about how to create and sell products, and even about how people reason.

Let’s look first at the divergence in pricing. Both the vitamin and painkiller markets are in some ways static. New painkillers are rarely invented, and new vitamins are even more rarely discovered. But while people only need one painkiller at a time, they need all the vitamins and minerals all the time. And so there are far more ways to create a unique multivitamin than a unique painkiller.

This idea can be transported back to software with a new interpretation of what it means for software to be a painkiller. Rob Walling, in Finding Your SaaS Flywheel, uses “painkiller” as a startup metaphor in a different way: rather than using vitamins vs. painkillers as a metaphor for “nice to have vs. need to have,” he uses them as different points on the cube of whether the customer is sold on the product’s value, whether the need is obvious, and whether they’re actively looking for it.

In Walling’s model, most enterprise software, the kind that takes a multi-thousand dollar paid pilot just to figure out whether it’s helpful, is a secret third thing. And this brings up a salient feature of the actual painkiller market. His prototypical examples of “painkillers,” in his sense of the term, are invoicing and time-tracking software. Everyone knows they need them. And because of that, like actual aspirin, they are cheap and commoditized.

Why are the vitamin advertisements so much more creative than the painkiller ones? Actually, we can go further: by the traditional narrative that painkillers are supposed to be about reducing a problem the customer already has and notices, vitamins actually make for better painkillers than painkillers! Advil can only treat headaches, while The Magnesium Miracle treats headaches, insomnia, depression, and anxiety, and it can even help with weight loss!

The vitamin advertisers can make such bigger claims because they’re less regulated. But also because it might be true.

There is no stable distinction between something that solves a pain vs. something that merely provides a benefit. There is instead a spectrum of how large a pain is, and what profiles of customers care about it. Commenters attempting to classify products into this imaginary dichotomy thus often conflate it with “solves big problem” vs. “solves small problem.” In this article, for instance, the investor’s examples are both companies solving real pain points; it’s just that the “painkiller” one is solving a $35 billion problem, while the “vitamin” one’s is smaller. The Mighty browser got called a vitamin, but browsers being slow and draining battery is a real problem for laptop users of all stripes. They just didn’t solve it well enough.

Why is it that some procrastinate on vitamins and rush towards painkillers, while others refuse painkillers and spend thousands on vitamins and supplements? When people make a bold declaration about whether or not something is a painkiller that solves real problems for people, it papers over an obvious statement: there are many, many different kinds of people.

For instance, I sell software-engineering training products for a living. Some do it to get a promotion; some do it to help coach their team; others do it because they’re the kind of person that spends every weekend studying cool things for years on end. The correct response to “Does your product solve a pain that customers care about” is always “Let me go find different kinds of customers.”

And finally, why did people believe the “painkillers outsell vitamins” story when a quick stroll through the store so plainly contradicts it?

Facts invite contradiction, but stories have a way of suppressing thought and creating cached thoughts. When you adopt the “vitamins vs. painkillers” analogy, you are replacing concrete thinking with magical. Doing so replaces the current topic with something broader, traveling up the ladder of abstraction. The rich world of how a product is found and sold becomes replaced with the narrow aspect of which bottle at the drug store has a greater resemblance. Analogies may be good for generating ideas, but treating them as a source of truth is a path to madness. An argument over whether a startup is or isn’t a vitamin is not a failure in using the analogy: it’s an inevitable consequence.

In contrast to the lossiness of analogies, there’s another kind of comparison that only adds information: prototypes. If you call your delivery service “Uber for food,” the response is “tell me more,” not assuming details: “You’re going to get shut down for selling home-cooked food.”

And so it’s a coincidence that the typical “vitamins vs. painkillers” story is the one that caught on, instead of any of the others that can be extracted from the drug store. Contrast: “Everyone says they can relieve a little ache or pain, and a lot of them do. But with vitamins, you can really give them something new and exciting.” Then the same set of ideas might invite different discussion. But the reality would be the same.

So take these lessons and evaluate ideas as they are. Whether someone wants a faster browser is about what you can build and who you can offer it to and how you approach them. Learn to reason directly and not by analogy. Forget about selling vitamins and painkillers.

But if you do:

I once met a vitamin tycoon. His philanthropy exceeds $10 million per year.

Thanks to Benjamin Brule, Jun Hong “Nemo” Yap, Alexey Kommisarouk, and Jonathan Camenisch for comments on earlier drafts of this post.

Addendum, Just for Fun: Other Wellness-Inspired Product Positioning

Gym Memberships: Items whose sales pitch is based on “every healthy company is doing this.”

This is the positioning of my former employer Apptimize, which sold A/B testing software to mobile app developers. They rose during a zeitgeist where it was said that everyone should be doing A/B testing, and the buyers often saw it as part of shifting their company towards healthier data-driven habits. Products for agile development, software testing, and continuous delivery have all ridden similar hype trains.

Physical therapy / stretching: Something you keep doing because it’s good for you. You never notice it working when you do it, but if you’re physically active and you stop doing it for a few weeks, things start to hurt.

The prototypical example is Coca-Cola advertising. The product never changes and everyone knows about it, but somehow sales fall if they ever stop. Repeatedly reminding employees to follow some policy they should already know, such as handwashing, is in this category.

As the product becomes more passive and the protection it offers less certain, this morphs smoothly into the next category:

Helmets: Something that usually has no effect, but protects against catastrophic risk.

This is the sales pitch for every security and insurance product.

Heart surgery: Something you never want to do. But you’ll die if you don’t.

This was coined by my former employer Semantic Designs to describe their work, building bespoke tools that automatically transform large codebases. Time and time again, companies avoid solving problems that require massive change to their codebases, even when they can see disaster coming by not doing so. Consider Goldman Sachs’s Australian division, which was facing its own mini-Y2K problem by running out of daily transaction IDs. But they decided that buying a second mainframe so they could label their transactions “Mainframe A Transaction 1” and “Mainframe B Transaction 1” was easier than upgrading their codebase.

Goldman Sachs AU used 16-bit IDs for daily transactions, baked into 3M lines of PL/I. As daily volume approached 2^16, they considered making a refactoring tool. Instead, they bought another mainframe, and called it a day.

That was 2005. Wonder how many mainframes they have now
— Jimmy Koppel (@jimmykoppel) August 17, 2019

Semantic Designs’s most famous project, automatically converting the code for the B-2 stealth bomber to run on modern hardware, came after Northrop Grumman failed to upgrade it manually, and then failed again to build a tool in-house. Now the founder asks managers who come to him “Have you tried to do this kind of upgrade before?” If the answer is no, he knows that he’ll lose the sale as they try to do it in-house and fail.

Go out there and sell vitamins, sell painkillers, sell gym memberships, and sell candy. And be glad that someone else is trying to make a living providing heart surgery.

Appendix: Vitamins vs. Painkillers: Sales Numbers

If there’s one thing everyone agrees on about this question, it’s that the answer shifts with where you draw the boundaries. You can throw prescription drugs and even anesthetics into painkillers, but there are also prescription and injectable vitamins. You can add some types of massage to painkillers, and you can add other types of massage to vitamins. And if you want to consider products that are sometimes used and advertised for these purposes, you can toss the entire cannabis industry into painkillers, and the entire dairy industry into vitamins.

Below is my methodology for deciding what substances to include, followed by the analysis. The raw results for sales numbers of vitamins and painkillers are in this spreadsheet.

What is a vitamin?

Literally speaking, a vitamin is any molecule on the Official List of Vitamins™: 13 according to the WHO, and 14 according to Harvard Health. To be included, a molecule must (a) not be endogenously produced by the human body, and (b) cause health problems without adequate intake. Each vitamin is a family of molecules called vitamers. This is why you can get a toxic dose of Vitamin A from eating liver but not kale, even though a pound of kale contains 1400% of the daily recommended dose; they contain different vitamers. For instance, the Vitamin B3 family contains three vitamers: niacin (aka: nicotinic acid), nicotinamide, and nicotinamide riboside (NR).

So, by its narrowest definition, vitamins only include supplements containing this fixed set of molecules. But as a consumer market, there’s no reason to exclude other supplements which also rectify common deficiencies. For example, in spite of the above definition, Vitamin D actually is endogenously produced, using sunlight and cholesterol. It’s still on the Official List of Vitamins™ because it’s commonly deficient. This distinguishes it from Coenzyme Q10, where deficiency is a sign of a genetic disorder. But from a consumer and a commercial standpoint, there is no distinction; the bottles are sold next to each other on the shelves. And I expect NMN (nicotinamide mononucleotide), a supplement much hyped for its supposed anti-aging effects, to one day be considered a new vitamer of Vitamin B3, as the main pathway of Vitamin B3 first metabolizes them into NMN.

So we can similarly look at including other nutrients. Minerals in particular also have no consumer-facing difference with vitamins. Sure, there’s a chemical difference: minerals are single atoms. But it would be difficult to draw even a pharmacological distinction: copper and Vitamin B3, for instance, both play a similar role as cofactors in the electron transport chain in cellular respiration (Vitamin B3 through its metabolite NADH). So there’s a strong reason to also include mineral supplements in even a narrow analysis.

The other two kinds of essential micronutrients are essential fatty acids (think: the Omega-3’s in fish oil) and essential amino acids (which would be included in any protein powder, but are the focus of BCAA [branched-chain amino acid] powders). These are midway between supplements and food, and there’s a case to be made for including them as well, particularly with the global fish oil market in excess of $2 billion.

But a definition has to be made to do the analysis, and it’s natural to exclude fish oil and BCAA supplements because they’re much more different from vitamins than are minerals. Here’s the definition:

For the purpose of this article, I define vitamins and minerals to be any nutrient which (a) does not provide calories and where (b) not eating it will cause some kind of deficiency disorder for a sizable fraction of people. We then only consider over-the-counter vitamin and mineral supplements.

This includes the usual suspects, like Vitamin D and Iron. It also includes multivitamins, CoQ 10, choline, and NMN. It does not include fish oil (great relief for an empty stomach), nor creatine (which is produced endogenously in enough quantity to make deficiency symptoms rare), nor melatonin (ditto, and also not a nutrient), nor glucosamine (where deficiency has never been reported), nor whey protein (which is just a food). Probiotics, herbal supplements such as tribulus and horny goat weed, and hormones such as DHEA are all right out.

What is a painkiller?

A painkiller is a pharmaceutical that relieves pain. The most common families work by suppressing production of prostaglandins, molecules involved in the transmission of pain information.

The over-the-counter oral painkillers are acetaminophen (a.k.a. paracetamol, Tylenol), aspirin, ibuprofen (Motrin / Advil), and naproxen (Aleve). Also on the list of OTC painkillers is a cocktail containing sodium bicarbonate, citric acid, and aspirin, commonly sold as Alka-Seltzer. I left this out because I could not find sales, especially as its generic name is difficult to Google. And there’s a risk of double-counting: while I don’t know the methodology for the aspirin numbers because the report costs $4250, it probably already includes either Alka-Seltzer or its raw aspirin component.

On top of the oral ones, there is one primary over-the-counter topical painkiller: lidocaine. A gel containing 1% diclofenac, an arthritis treatment, was recently made over-the-counter, although diclofenac is still primarily a prescription drug, and the numbers I’ve found are still for the entire diclofenac market. Browsing a list of pain-relief creams, the other active ingredients I found were menthol and trolamine salicylate. Beyond that, there are also non-medicated ointments that people use for pain, such as hemp oil. Continuing down this path, I could also include everything else people do to relieve pain, like ice packs and ergonomic chairs. But at that point I should throw in the entire fitness industry under “vitamin.”

Beyond the over-the-counter painkillers, there are many prescription painkillers. My spreadsheet contains about 20 of them. Unfortunately, numbers on prescription painkillers are harder to find. Many numbers are for the US only, and some, such as the numbers for oxycodone and OxyContin (a particular form of oxycodone), are overlapping.

I could go even further and include actual anesthetics as well as the combined salary of every anesthesiologist. But this runs counter to the meaning of “painkiller” in the product metaphor. People ask for painkillers. But they usually don’t beg to be put under general anesthesia.

While I collected sales numbers for prescription medicines to the best of my ability, I believe an evaluation of this analogy should focus on over-the-counter painkillers, for two reasons. First, most of the articles about “vitamins vs. painkillers” focus on aspects of painkillers that are only true of over-the-counter painkillers. When someone writes an article arguing that customers want painkillers, they don’t mean that you should build a product which requires them to first undergo surgery to get permission to use it, and then causes lightheadedness and cramps if stopped too quickly. Second, prescription medications operate in a very different market with inflated pricing. For comparison, a bottle of prescription Vitamin D containing 7.5 milligrams total (300,000 IU) costs $28, while a bottle of over-the-counter Vitamin D containing 45 milligrams total (1,800,000 IU) costs $14, making the prescription version 12x more expensive on a per-milligram basis. And that’s for Vitamin D, as generic as it comes.
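The 12x figure is just two unit prices divided by each other. Here is a quick sketch of that arithmetic in Python, using only the prices and quantities quoted above. The function and variable names are mine, and the IU conversion assumes the standard 0.025 micrograms of Vitamin D per IU.

```python
# Prices and quantities are the ones quoted above; names are illustrative.
MICROGRAMS_PER_IU = 0.025  # assumed standard conversion for Vitamin D


def iu_to_mg(iu: float) -> float:
    """Convert Vitamin D international units to milligrams."""
    return iu * MICROGRAMS_PER_IU / 1000


def price_per_mg(price_usd: float, total_mg: float) -> float:
    """Unit price in dollars per milligram."""
    return price_usd / total_mg


prescription_rate = price_per_mg(28, iu_to_mg(300_000))  # 7.5 mg for $28
otc_rate = price_per_mg(14, iu_to_mg(1_800_000))         # 45 mg for $14
markup = prescription_rate / otc_rate

print(f"prescription: ${prescription_rate:.2f} per mg")
print(f"over-the-counter: ${otc_rate:.2f} per mg")
print(f"markup: {markup:.0f}x")
```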

Speaking of prescriptions, there’s another very lucrative painkiller market I haven’t mentioned: cannabis. But it’s not possible to separate its painkiller uses from its recreational ones, even if I limit the analysis to people who are “prescribed” it. Such is the way of black markets in economic data.

So: Painkillers shall include over-the-counter analgesic medications, both oral (e.g.: Advil [ibuprofen], Tylenol [acetaminophen], and aspirin) and topical (e.g.: lidocaine, Capzasin [capsaicin]). It shall also include other chemically-active pain-relieving substances, such as IcyHot (menthol). It does not include products which relieve pain physically such as ice packs, hand warmers, or air casts.

Methodology

I looked up lists of vitamins and painkillers, and then Googled endlessly for “<name of chemical> global sales” and similar phrases, for about 50 chemicals. For each, I recorded the source of the number, and the year and region it’s for. For prescription painkillers, I primarily only found numbers from the US, but I found global numbers for nearly everything else. Spurred on by the example of NMN, a supplement not officially considered a vitamin but which rightfully should be, I asked ChatGPT 4 for more supplements which are synthesized from vitamins. It told me that CoQ 10 is synthesized from Vitamin K, which is false, although CoQ 10 was already on my list.

Why didn’t I just get global numbers from the same year for everything? Those numbers don’t exist.

Numbers on global product sales come from market research companies. Their employees call pharma and retail executives and survey doctors and, for all I know, send spy planes over vitamin factories. Naturally, they don’t do this for every chemical every year. The output is long reports like this, which sell for $5,000 a pop. Whenever they release a report, they also issue a press release with the top-level number.

So all the details beyond the top-level number, I don’t know. How are the sales of combination calcium/magnesium supplements reflected in the individual reports for the calcium and magnesium markets? I don’t know. Do the Vitamin D sales numbers include the wholesale vitamin D sold for use in multivitamins and to fortify milk? Do they include a portion of the retail price of multivitamins? I don’t know. Do the aspirin numbers include Alka-Seltzer? I don’t know.

And then there are the minerals where a single report combines both nutritional and industrial use. Some people supplement phosphorus, but the market is dominated by its use to supplement crops. And there’s a report saying the global market for manganese gluconate is $600 million, but it lists a dozen uses other than supplementation. If I want the breakdown, then it’ll be $4250. I wound up just leaving numbers for those minerals out.

Spreadsheet of all raw numbers

Results

Adding together the available numbers for global vitamin sales, the total is $11.438 billion. Adding choline and CoQ 10 raises this to $13.723 billion; adding NMN further raises it to $14.003 billion. I am missing numbers only for Vitamin B6.

Again adding numbers from different years, supplement sales of calcium, magnesium, iron, zinc, chromium, and selenium total $32.720 billion. I refrained from adding in the $24.4 billion market for the most common type of sodium supplement, even if humans’ love of salty food is just a biologically-programmed urge for supplementation. I was unable to find supplement-specific numbers for potassium, phosphorus, manganese, iodine, or molybdenum. For some of these, supplement-specific reports do exist, but the headlines are infuriatingly stripped of numbers: “The global Potassium Supplement market is projected to reach US$ million by 2028 from an estimated US$ million in 2022, at a CAGR of % during 2023 and 2028.” (Yes, this is an exact quote.) I could put as a lower bound one popular kind of potassium supplement, NuSalt and its sales of $1.2 billion, except that a lot of people eat it to satisfy the biological craving for sodium while not eating sodium.

I was surprised by some of the mineral numbers. I expected iron and calcium to be the biggest sellers. But actually it’s chromium at $10.5 billion, which I’ve never even seen on a drug store shelf before. I had to triple-check this number, but it seems real. Apparently it’s very popular in India because it’s supposed to help weight loss and prevent diabetes. Magnesium was a close second at $10.1 billion; I guess that “Magnesium Miracle” lady was onto something.

These sums include numbers from 2018 through 2022. Assuming that the markets only grow over time as the world becomes richer and more populated, this comes to a combined vitamin and mineral market of at least $46.723 billion. This is suspiciously close to the $44.12 billion number that one report gave for the entire vitamin and supplement industry in 2020, even though that report’s press release says it contains multivitamins, which in turn include a lot of things definitely not on this list.

The over-the-counter oral painkillers of acetaminophen, ibuprofen, and aspirin total $12.24 billion. Adding the topical painkiller lidocaine brings the total to $13.723 billion. I did not find numbers for naproxen, but the most popular US brand, Aleve, has US sales of $323.6 million. I also did not find general numbers for trolamine salicylate and topical menthol, although popular brands containing these ingredients — IcyHot and Biofreeze for menthol, and Aspercreme and Blue Emu for trolamine salicylate — have combined US sales of $420 million. Another source gives the US-only sales numbers for topical pain relief — including these but also opioids — as $2.6 billion. Capsaicin is sometimes used as a painkiller; a report proudly proclaims it has a market size of $XX. That report includes all its uses, including bear spray and particularly malicious brands of hot sauce.

For cases where I only have US sales, I’ll use this method of estimating: I found the total sales of oral OTC painkillers in the US to be $2.917 billion, or $2.593 billion at most without naproxen. Compared to the global sales of these same painkillers, $12.24 billion, that suggests a 4.72 multiplier to go from the US sales to the global sales. Applying this estimator gives global naproxen sales of $1.527 billion and an estimate for the global sales of creams containing menthol or trolamine salicylate of $1.982 billion. This gives a combined total estimate of global over-the-counter painkiller sales of $17.232 billion.
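For anyone who wants to check the estimator, here is a minimal sketch in Python. It uses only the figures from the text, rounds at the same points the text does, and the names are my own.

```python
# All figures are in billions of US dollars and come from the text above.
US_ORAL_OTC_WITHOUT_NAPROXEN = 2.593
GLOBAL_ORAL_OTC = 12.24             # acetaminophen + ibuprofen + aspirin
GLOBAL_OTC_WITH_LIDOCAINE = 13.723

# Ratio of global to US sales where both are known,
# rounded to two decimals as in the text.
multiplier = round(GLOBAL_ORAL_OTC / US_ORAL_OTC_WITHOUT_NAPROXEN, 2)


def estimate_global(us_sales: float) -> float:
    """Scale a US-only figure to a global estimate, rounded to the nearest $1M."""
    return round(us_sales * multiplier, 3)


naproxen_global = estimate_global(0.3236)  # Aleve, US sales
creams_global = estimate_global(0.420)     # menthol and trolamine salicylate brands, US sales
otc_total = round(GLOBAL_OTC_WITH_LIDOCAINE + naproxen_global + creams_global, 3)

print(f"multiplier: {multiplier}")
print(f"naproxen (global estimate): ${naproxen_global}B")
print(f"creams (global estimate): ${creams_global}B")
print(f"over-the-counter painkillers: ${otc_total}B")
```

The estimate leans on one assumption: that naproxen and the creams have the same US share of global sales as the oral painkillers used to derive the multiplier.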

The largest-selling prescription painkillers are Diclofenac, Celecoxib, and the various opioids, including oxycodone, Tapentadol, codeine, and morphine. The two major non-opioids sum to $7.344 billion in global sales. The opioid numbers overlap in various ways (do the oxycodone numbers include OxyContin? I think so), and some of them are US-only. Fortunately, I have a separate number for global sales of all opioids: $22.66 billion in 2020.

Summing this, we get a global market for prescription painkillers of at least $30.004 billion. In the spreadsheet, I also have the US-only numbers for 5 less popular non-opioid painkillers, totalling $230 million. Using the same estimator as above, I estimate the global totals for these to be $1.086 billion, bringing us to $31.090 billion.

So, combined, over-the-counter and prescription painkillers total $48.322 billion, edging out the $46.723 billion estimate for vitamins and minerals. This is based on a mixture of numbers from different years, but you’d expect most markets to grow each year. But there’s an important exception: half of the painkiller market is opioids, a product under active attack and with expiring patents. And keep in mind that several minerals are outright missing from the estimate. So my money is on the vitamin and mineral market handily outpacing the painkiller market, if it hasn’t already.
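The totals can be re-added in a few lines. This is a sketch in Python using only the numbers reported in this section; the variable names are mine, and the 4.72 multiplier is the US-to-global estimator from above.

```python
# Billions of US dollars, all taken from the Results section above.
vitamins = 14.003           # vitamins plus choline, CoQ 10, and NMN
minerals = 32.720
otc_painkillers = 17.232    # includes the naproxen and cream estimates
major_non_opioids = 7.344   # diclofenac and celecoxib
opioids = 22.66
minor_non_opioids = round(0.230 * 4.72, 3)  # US-only figure scaled to a global estimate

vitamins_and_minerals = round(vitamins + minerals, 3)
prescription_painkillers = round(major_non_opioids + opioids + minor_non_opioids, 3)
all_painkillers = round(otc_painkillers + prescription_painkillers, 3)

opioid_share = opioids / all_painkillers
otc_ratio = vitamins_and_minerals / otc_painkillers

print(f"vitamins and minerals: ${vitamins_and_minerals}B")
print(f"all painkillers: ${all_painkillers}B")
print(f"opioid share of painkillers: {opioid_share:.0%}")
print(f"vitamins and minerals vs. OTC painkillers: {otc_ratio:.1f}x")
```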

And if you only compare vitamins and minerals with over-the-counter painkillers, as the analogy is usually deployed, then there is no contest.

Sell vitamins. Not painkillers.

Thursday, October 19, 2023

Resume-writing is a game.

There are two players.

There's you, trying to condense your whole life into one page in a way that presents you as the most impressive candidate possible.

And then there's the reviewer, trying to decode that page into a real person they can assess.

There’s been a lot written about writing resumes. And also on getting awards and grants, another variation of this game. My favorite piece is Steve Yegge’s Ten Tips for a (Slightly) Less Awful Resume. 15 years later, it’s still relevant. Turns out the ways people communicate about competence don’t change very fast.

But you don’t want a slightly less awful resume.

You want an actually good resume.

A few months ago, I finally shared my #1 piece of interviewing advice. I think it’s time I did likewise for this earlier part of job searching.

But there’s something you need to know about first.

Posers

Back at school at Carnegie Mellon, people were who they said they were. Classes were hard, and the number of CS majors was kept small. If someone told you they were a good coder, it meant they had stayed alive through labs that slaughtered their peers. Boast falsely, and there’s a fair chance someone in the room saw you spend 90 minutes on a programming final that took them 15.

But something that shocked me when I first moved to Silicon Valley was how often the “good programmers” aren’t. For the first time in my life, I met posers.

There’s the guy volunteering at a Stanford lab while claiming their project was actually his company. The person who was employee #9 at a $100M startup but told his housemates he’s the founder. The intern from the East coast who drew an audience with his tales of being a machine-learning researcher while brushing aside questions about his actual role in the research. The woman who introduced herself for years as the CEO of some startup that never seemed to do anything. And then there’s the countless Google and Facebook employees who thought getting a job at a billion-dollar company somehow made them the world’s best.

I think Silicon Valley tends to attract the extreme in this regard, and they tend to show up at parties frequented by ambitious young people. The rest of the world lies somewhere in-between that and my famously unpretentious alma mater.

But still, they’re out there. And some of them send in resumes.

Like there’s the guy applying for an enterprise sales job whose resume proudly listed “Business Sales - Apple, Cupertino, CA.” His resume spoke about all the companies he worked with, but soon we realized that he wasn’t putting together complex deals from Apple HQ, but just handing out Macbooks at a local Apple Store. (And not in Cupertino: around that time I learned the hard way that you can get Macbook t-shirts but not repairs at Apple’s only store in Cupertino.)

Then there’s the guy finishing up a master’s degree who talked his way into a first round software-engineering interview. He regaled me with the story of his past internship where he acted as both a software engineer and a project manager. Then we got to the coding part, where he happily churned out code full of unmatched braces and variables that don’t exist. We later learned from his former employer that they put him on project-management only after giving up on him producing useful code. And I learned not to take “graduate courses in machine learning and data science from _____ State University” as a meaningful signal, no matter how cool the project sounded in one sentence.

I saw it too when reviewing applications for the Thiel Fellowship. I distinctly remember an applicant who boasted about being “one of the only people able to program futuristic technology like Google Glass” and speaking in front of huge crowds at tech conferences despite not seeming to have accomplished anything of note. But he also claimed to have over 100,000 Twitter followers, and…that was true! That’s when I learned that Twitter followers can be bought. Now, unless there’s a clear reason for the follows, I see boasts about follower counts as a red flag. (I Googled this guy recently. He’s now claiming that Steve Jobs came to him for advice when he was 15.)

Real Evidence of Competence

All of this motivates my #1 piece of resume-writing advice.

So, here it is:

You have an incompetent evil twin who is trying to pass themselves off as you. You must say things they can’t.

That is, you must say things where “this person is competent” is a very likely explanation for you saying that, and “this person is overinflating themselves” is a very unlikely explanation.

There’s a certain law where, if you hear an ad for a game boast “Explore over 10 levels and fight with dozens of weapons,” then there are exactly 11 levels and 24 weapons. Likewise, jaded reviewers will interpret your resume as the weakest thing consistent with the text. So if you write “Helped launch new features with millions of users,” then the default assumption is that you took notes in the meetings and maybe built a few unit tests. But if you write “Sole developer of the Foobar feature, which is used by 500,000 people weekly,” then your note-taking non-coding doppelganger can’t compete, and all that’s left is to evaluate how impressive the Foobar feature actually is.

There’s a mathematical way to state this advice that I find illuminating.

Let P(t|c) represent “the probability this text was written given the person is competent.”
Let P(t|~c) represent “the probability this text was written given the person is not competent.”
You want to maximize P(t|c)/P(t|~c).

There’s a technical term for this: “Bayesian evidence of competence.”

So many of the mistakes people make in resume writing come from focusing on writing something that sounds like what an impressive and competent person would say — optimizing P(t|c) — without a corresponding focus on writing something that a poser couldn’t say without blatantly lying.
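To make the ratio concrete, here is a minimal sketch in Python. The probabilities are made-up numbers for illustration, not measurements, and the function names are my own.

```python
def likelihood_ratio(p_text_given_competent: float, p_text_given_poser: float) -> float:
    """P(t|c) / P(t|~c): how much more likely a competent person is to write this line."""
    return p_text_given_competent / p_text_given_poser


def updated_belief(prior: float, ratio: float) -> float:
    """Apply Bayes' rule in odds form and return P(c|t)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical numbers. A vague line is easy for anyone to write...
vague = likelihood_ratio(0.60, 0.50)
# ...while a specific, checkable line is one a poser can rarely write honestly.
specific = likelihood_ratio(0.30, 0.01)

prior = 0.20  # hypothetical belief that the applicant is competent before reading the line
for label, ratio in [("vague", vague), ("specific", specific)]:
    belief = updated_belief(prior, ratio)
    print(f"{label}: ratio {ratio:.1f}, belief {prior:.0%} -> {belief:.0%}")
```

Note that in this sketch the specific line has the lower P(t|c), since fewer competent people happen to have that exact story to tell, yet it moves the reviewer far more because the denominator is so small.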

There are parts of the tech and business worlds where expertise is hard to acquire, where the choices are endless and subtle, and their consequences are years off. Software architecture can be like that.

But whether to pass a resume to the next stage is a binary decision, and even a beginning interviewer can quickly review hundreds of resumes and get rapid feedback on how well they predict interview performance, if not actual job performance.

That means that the potential interviewer reading your resume will almost certainly be very good at playing their side of the game.

But they want you to win. So just a few tips can make you much better at playing yours.

Just remember: the real game is not to write an impressive resume.

It’s to be an impressive person.

Tuesday, July 18, 2023

Lessons on code quality start in the first few weeks of learning to program, when a newcomer to the field is taught the basics of variable naming and told why programming languages have comments. They continue in countless blog posts and in every debate on a pull request.

Avoid it or embrace it, code quality training permeates one's entire career.

But it is so easy to lose sight of why.

Says the skeptic: pretty code is a distraction, like the gargoyles on a cathedral or the curlicues of Baroque architecture.

Says the maximalist: Have you ever had a debugging session that demanded you turn over so many stones it felt as if the world was crumbling under you? You don't achieve 10x velocity by becoming 10x faster at debugging, but by writing code that doesn't need debugging at all.

And says the guru: that which is merely a pretty distraction is not code quality.

I've made a professional quest to clarify all the fuzzy terms of software engineering. Students of our course learn definitions of code knowledge and coupling in the first lessons. One of my research papers gives a rigorous definition of dependence.

Today, I carry this quest to its apotheosis: what is quality code?

To this question, there can be no short exhaustive answer. Asking “what is good code” is a lot like asking “how do chemicals work?” It's the subject of an entire field.

But we can more easily ask what is the purpose of chasing code quality, even if achieving it is a craft worthy of a lifetime of study. To recognize quality code, we begin by asking: what are the external and internal properties quality code should have?

External Properties

Good code is done code

We begin with the cliché. All discussions of quality are grounded in the ultimate purpose of the object being designed. The purpose of the vast majority of code is to be executed as software which accomplishes some goal, be it entertaining people, helping them with their taxes, shuttling data, or testing other code. There is also a minority of code built for other purposes: experiments to see if something is possible, examples to explain a library or algorithm, and code that does tricks such as printing its own source.

For all of these, code that fails to achieve its purpose cannot have extrinsic quality any more than an abandoned construction site can be a useful building.

But this does not justify single-mindedness in getting a program to work. There are also non-functional requirements such as performance. Software cannot be quality if sluggishness sends users back to pen-and-paper.

And though an entire business may be dedicated to helping a software product fulfill its purpose, that does not subordinate all other functions to “code quality.” Whether code is quality cannot depend on factors that lie entirely outside the realm of engineering. The failure to market a software package does not make it bad, and a top-down directive requiring people to use it does not make it good.

And so good code is done code.

But we cannot stop there. That is not the end of the story. It is just the beginning.

For if you say “We got it done and delivering value to the customer,” that is not an excuse to your boss when you explain why adding a feature to wish users “Happy birthday” will take several years. And it is not an excuse to yourself when you spend 4 days debugging an issue that turned out to be a typo. Done code is not good code.

Good code is understandable

By one definition, an engineer is someone who understands a system at a deep level.

And it follows that, for code to have good engineering, it must be understandable.

And sometimes, such as for teaching code, this is its entire purpose.

So you want code to be understandable. But understandable to whom?

To yourself and the people who need to read it.

Or more specifically: to those people at the time they need to read it.

The foolish engineer is offered a new skill, one that will shrink a segment of his code by a factor of 10. “No-one else will understand this” he says as he refuses to learn it. He thus reveals a low expectation of himself masquerading as a low expectation of others. He has hidden within his comfort zone of skill.

The arrogant engineer has a skill and knows it will be effective. “Anyone who cares about this code should be able to learn the technique I used.” If there are to be suitably ambitious readers, the choice turns out correct; but if the audience is one whose concerns lie elsewhere, then it does not. But an uninformed decision cannot be a good one. He has hidden within his comfort zone of empathy.

Either can improve by breaking out of their comfort zone, learning which walls to climb and how, and aiding the rest through construction and placement of ladders.

But the arch-engineer of engineers breaks the comfort zone itself. They are not concerned with climbing nor ladders, for those who follow shall suddenly find themselves atop mountains.

Good code is evolvable

Software is not a point in time but a system.

Precious few programs stand like museum pieces encased in glass, existing only for their own sake, or illustrating a piece of the frozen past. The rest are connected to other programs, to platforms, to growing businesses and rotating customers. They are connected to a changing world.

And now, as we spend our lives glued to screens that came from robotic factories and arrived via satellite-controlled ships, software is the changing world.

You cannot change the world without changing its software. Every software engineer carries the professional burden of building software that is easy to change.

And specifically, to change from one desirable state to another.

We must avoid creating rigid code that is difficult to change at all.

And we must also prevent brittle code that can all-too-easily be changed to something broken.

Good code is easy to extend and difficult to break. The power of a design lies not in what it can do, but rather what it can't do.

But why prepare for a future that may never come? It is impossible to predict the exact ways code will change.

Yet it is often easy to predict that code will change. Or even where. And that's all that's needed to create evolvable code.

Yes, it is a folly to design assuming certain changes will need to be made as life takes a certain path. But it is a greater folly to design as if no changes will occur at all.

Internal Properties

It is a lofty goal to say that a program must be correct, understandable, and evolvable.

It is an achievable goal to say that a single function should pass its tests, have few branches, and use abstract types.

And yet the summation of the latter yields the former. Extrinsic quality comes from intrinsic quality. These properties are presented below:

Good code can be understood modularly

Programs are composed of files. Files are composed of declarations. Declarations are composed of lines.

That is to say, programs are built out of pieces.

And every single piece has its purpose.

Every time a line is executed, a change occurs, and there are many true statements that can be said about each change. Most such statements are of no consequence, while some are crucial to the ensuing lines achieving their purpose.

But in some programs, there is a third category of statements. There are facts about the program state that become true on some line, and then are of no consequence until some distant line requires them to be true. Whereas most lines are of concern only to their neighbors, these two lines have grasped hands through a wormhole, their fates entangled. A change in one place can cause breakage on the other side of the universe.

In good code, the purpose of every line can be stated simply. Each line can be understood in isolation. For each, one can reason: if some simple fact about the state of the program is true, then, after running this line, some other simple fact will be true. The assumptions and guarantees of each line click together like Legos, forming simple and correct functions, which in turn click together into simple modules and simple programs.

In bad code, you read a function, ask whether it works, and then read a dozen more in order to have an answer. Changes must be made as tenderly as one playing Jenga, lest the tower collapse.

Good code works by design. Bad code by coincidence.
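As a minimal sketch of the wormhole and its remedy, in Python: the names `find_unsafe` and `SortedInts` are hypothetical, invented only for illustration. The first function is correct only if some distant caller remembered to sort the list. The second carries that fact alongside the data, so each line can be checked on its own.

```python
from bisect import bisect_left


def find_unsafe(items: list[int], target: int) -> bool:
    # Wormhole: binary search is only correct if `items` was sorted
    # by some other line, possibly far away from this one.
    i = bisect_left(items, target)
    return i < len(items) and items[i] == target


class SortedInts:
    """Keeps the fact 'these values are sorted' next to the values."""

    def __init__(self, values: list[int]) -> None:
        # The guarantee is established here, once, and nowhere else.
        self._values = sorted(values)

    def contains(self, target: int) -> bool:
        # Safe by construction: every SortedInts holds sorted values.
        i = bisect_left(self._values, target)
        return i < len(self._values) and self._values[i] == target
```

Given the unsorted input `[3, 1, 2]`, `find_unsafe` fails to find 1 while `SortedInts` finds it. Nothing in the first function changed; only a distant line did.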

Good code makes it easy to recover the intent of the programmer

A programmer dreams a new entity. Her mind gradually turns dream into mechanism, mechanism into code, and the dreamed entity is given life.

A new programmer walks in and sees only code. But in his mind, as he reads and understands, the patterns emerge. In his mind, code shapes itself into mechanism, and mechanism shapes itself into dream. Only then can he work. For in truth, a modification to the code is a modification to the dream.

Much of a programmer's work is in recovering information that was already present in the mind of the creator. It is thus the creator's job to make this as simple as possible.

But to do so is a constant struggle.

Every naming decision is a quest to find the word that conjures in the reader's mind the true purpose of the named while warding off misconceptions.

Every function, a quest to carve behavior into something meaningful.

Every module, a quest to create new words that give new powers to the wielder.

Through each such step, we climb towards the ideal of making the program written not in the language of the machine, but in the language of the dream.

They say that those who have reached the peak can simply dream changes to the program and it is instantly so. But great powers are had even by those who only make it partway.

Good code expresses intent in a single place

But it is not enough for it to be easy to go from code to design. It must also be easy to go from a changed design to new code.

The shaper of atoms walks into a room under construction, wide open and brightly lit. “No!” he cries. “I want it to be dark and intimate.” Before him a vast itinerary of work is created, as that one directive demands thousands of strokes of the paintbrush and new choices for every object so contained.

The shaper of bits walks into a website, and says she wants a dark mode. In good code, she speaks the new colors that comprise a dark mode, and it is so. In great code, she merely speaks “dark mode” and the colors are inferred.

Yet all too often such a change, though simple, requires tweaks in thousands of locations, like a thousand well-coordinated strokes of the brush.

The bits should be easier to change than the atoms, for they live inside the machine.

Yet they can be harder, for there are so many more of them.
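One way the dark mode story might look, as a rough Python sketch. `Theme`, `LIGHT`, `invert`, `dark_mode`, and `render_button` are all hypothetical names, and channel inversion is only a crude stand-in for however a real design system would infer its colors. The point is that colors are decided in one place and every component merely asks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theme:
    """The single place where the page's colors are decided."""
    background: str
    text: str


LIGHT = Theme(background="#ffffff", text="#111111")


def invert(color: str) -> str:
    """Crude illustrative rule: flip every channel of a '#rrggbb' color."""
    return "#{:06x}".format(0xFFFFFF - int(color[1:], 16))


def dark_mode(theme: Theme) -> Theme:
    """Infer a dark theme from a light one instead of listing its colors."""
    return Theme(background=invert(theme.background), text=invert(theme.text))


def render_button(label: str, theme: Theme) -> str:
    # Components ask the theme for colors; none of them names a color itself.
    return (
        f'<button style="background:{theme.background};'
        f'color:{theme.text}">{label}</button>'
    )
```

Speaking the new colors means writing a second `Theme` value. Speaking only “dark mode” means calling `dark_mode(LIGHT)`. Neither requires touching `render_button`.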

Good code is robust

If every line serves a purpose, then every line must be correct.

That means that every line is a new opportunity for a mistake to slip in unnoticed.

And how easy it is to make a mistake is something under the control of the software designer.

Some codebases are so treacherous that working in them is like a tightrope walk across the Grand Canyon. There are functions which require consulting a tome to invoke correctly. Writing to a data structure can produce nonsense. Reading from a data structure may produce only a partial story.

Other codebases are more like an elevator ride, to the point where not even deliberate effort can produce an accident. In such code, APIs have guardrails, where any misuse is either disallowed or can only be accomplished by spray-painting on a red flag. Try as you might, no write to a data structure can produce nonsense. If it compiles, it probably works.

As Tony Hoare says, one can write code so simple there are obviously no bugs, or so complex that there are no obvious bugs.

If you must think as hard as you can to check that a program works, it probably doesn't.

But in good code, you barely need think at all.
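A small sketch of a guardrail, assuming Python and a hypothetical `DateRange` type. Python checks this when the code runs, not when it compiles, so it is a weaker elevator than a static type system would build, but the idea is the same: no write can leave the structure holding nonsense.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """A range that can never end before it starts."""
    start: date
    end: date

    def __post_init__(self) -> None:
        # The only way to obtain a DateRange is to pass this check.
        if self.end < self.start:
            raise ValueError(f"end {self.end} is before start {self.start}")

    def days(self) -> int:
        # Inclusive of both endpoints; never negative, by construction.
        return (self.end - self.start).days + 1
```

Because the class is frozen, a later assignment to `start` or `end` is rejected as well. Code that receives a `DateRange` need not think about whether it is valid.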

Good code hides secrets

Software is not a point in time but a system.

And it is not one system, but many interacting ones.

And each is constantly morphing.

But if it looks and acts the same on the outside, no-one will ever know.

It does not matter to the driver when a car's engineer changes its wiring. Unless, that is, the manual had told her in great detail what to expect from its electrical system and she had come to depend on it. The one who learns the car's battery can charge 5 cell phones for exactly 433 minutes before dying is the one able to achieve maximum performance. But, in a changing world, the one who uses this forbidden knowledge tiptoes close to ruin.

Subsystems are joined when their creators' minds are joined, in conversations that should not occur, sharing details that should not be shared. Or when the single master fails to erect a firewall in his own mind.

The hotshot boasts about knowing everything. She creates software that can only be worked on by her fellow all-knowing.

The master's virtue is knowing nothing. And that's enough to maintain his software.
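To make the secret concrete, here is a hedged Python sketch with invented names (`VisitLog`, `VisitLogV2`, `is_regular`). Two classes make the same public promise and keep different secrets. The caller knows only the promise.

```python
class VisitLog:
    """Public promise: record visits, report how many a user has made."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}  # secret: a tally per user

    def record(self, user: str) -> None:
        self._counts[user] = self._counts.get(user, 0) + 1

    def visits(self, user: str) -> int:
        return self._counts.get(user, 0)


class VisitLogV2:
    """Same promise, different secret: stores every visit in order."""

    def __init__(self) -> None:
        self._events: list[str] = []  # secret: the full history

    def record(self, user: str) -> None:
        self._events.append(user)

    def visits(self, user: str) -> int:
        return self._events.count(user)


def is_regular(log, user: str) -> bool:
    # Knows nothing of dicts or lists, so it survives the rewiring.
    return log.visits(user) >= 3
```

Had `is_regular` reached into `_counts` directly, swapping in `VisitLogV2` would have broken it.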

Good code isolates assumptions

Minimizing use of knowledge is the path to evolvability. Secrets are but the extreme, known only to their owners. Every use of knowledge ties the program to the World That Was, hindering the creation of the World That Could Be.

For every datum, there are the components that create it, the components that use it, and those in between that merely deliver it. Do those components pass the datum along like a sealed package? Or are those couriers prying into its contents?

A value is passed from one end of the program to another. Every function on the way that calls the value an “int” is another barrier to making it a float. And every function on the way that calls it anything is another barrier to turning this value into two numbers.

The physical world is full of irreversible changes. Build a building and the town shapes around it; burn it back down and forever shall the wind be tainted with its ashes. But when it comes to reshaping the world of bits, the only thing in a programmer's way is himself.
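The courier that passes a value along like a sealed package can be sketched in Python with a type variable. All names here (`with_logging`, `describe_int`, `describe_pair`) are hypothetical. The courier never calls the value an “int”, or anything at all, so it is no barrier when the value becomes a float or a pair of numbers.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def with_logging(payload: T, consumer: Callable[[T], R], log: list[str]) -> R:
    """A courier: notes that a delivery happened, never opens the package."""
    log.append("delivered")
    return consumer(payload)


def describe_int(celsius: int) -> str:
    # Only the consumer at the far end knows what the value is.
    return f"{celsius} degrees"


def describe_pair(reading: tuple[float, float]) -> str:
    # The value has become two numbers; the courier did not change.
    low, high = reading
    return f"{low} to {high} degrees"
```

Both `with_logging(21, describe_int, log)` and `with_logging((19.5, 22.0), describe_pair, log)` pass through the same unmodified courier.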

Good code is open

Programs deal with a domain, and both program and domain can be sliced countlessly many ways into sets. Sets of options! Sets of fields! Sets of formats! Sets of formats of fields which represent options!

And as programs and domains change, such sets grow and shrink. When good code deals with such a set, it is to the extent possible agnostic to the set's size.

The simpleton sees an entity with two possible values, and builds the program using a boolean. The next day, the possibilities have grown to three. A rewrite is required.

There is a set of three kinds of entities, and a program is written that can work with each of them. Then comes the day where one is deprecated. If the program was open in this set, then the relevant code is already in a box that can be discarded. If the program was closed, then branches all throughout have become skeletons demanding burial.

The open-minded person accepts new things. So does the open program.
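One common way to stay agnostic to a set's size is a registry, sketched here in Python under assumed names (`EXPORTERS`, `exporter`, `export`). Each kind of entity lives in its own box. Adding a kind adds a box, and deprecating one means discarding a box, with no branches elsewhere to bury.

```python
import json
from typing import Callable

Exporter = Callable[[dict], str]

# The open set: one entry per supported format.
EXPORTERS: dict[str, Exporter] = {}


def exporter(fmt: str) -> Callable[[Exporter], Exporter]:
    """Decorator that places a function in the box for format `fmt`."""
    def register(fn: Exporter) -> Exporter:
        EXPORTERS[fmt] = fn
        return fn
    return register


@exporter("json")
def to_json(record: dict) -> str:
    return json.dumps(record, sort_keys=True)


@exporter("csv")
def to_csv(record: dict) -> str:
    return ",".join(str(record[key]) for key in sorted(record))


def export(record: dict, fmt: str) -> str:
    # No if/elif chain: this function does not know how many formats exist.
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt}")
    return EXPORTERS[fmt](record)
```

Deleting `to_csv` and its decorator removes the format entirely. `export` is untouched, since it never named the format in the first place.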

Good code uses a programmer's full wisdom

The journeyman programmer finds a list of 10 principles for good code. She studies them one by one, and after years of toil attains mastery. Before her the baffling complexity of programs stood as stalagmites of wax; before her gaze, it has now melted down and separated into buckets.

The path there is one of toil. Every place where intuition says the code could be simpler, she seeks how. Every issue that was hard to debug, she searches for how it could have been prevented.

And then she declares “that is all.” Her apparent mastery has brought her respect, and her skill cleaves problems that foil others. She accepts her place at the top and rests.

But the one with the potential to become a grandmaster does not rest. They notice the dregs of wax that fall outside the buckets and see in them opportunities to find new explanations. As they search ever deeper, the buckets dissolve and reveal the interconnected whole. As with the programmer learning a codebase, they have stepped into the dream behind the concepts, and are now ready to dream themselves.

This list came from years of observation, reflection, study, teaching, and refinement. It is yours now to study, criticize, preach, ridicule, and extend.

Resources

On modularity: The 3 Levels of Logic

On intent: My Favorite Principle for Code Quality

On robustness: State of emergency! The four ways your state might be wrong

On secrets: David Parnas, “The Secret History of Information Hiding”

I mostly lack public resources on openness and the sequestering of assumptions, although refunctionalization is one technique for achieving it.

But for all of these, the best way to learn them is through deliberate practice.

And for that, we have the Advanced Software Design Course.

Thank you to Nils Eriksson, Jun Hong “Nemo” Yap, Emmanuel Genard, Paul Weidinger, and Yongming Han for comments on earlier drafts of this essay.