Wednesday, July 18, 2018

When your data model means something else

One of my coaching clients was designing a program, and came to me with a data-modeling question. The program was for making interactive stories like those Choose Your Own Adventure books. A story contained many passages. In a passage, you might encounter a monster, and then flip to page 62 to fight it, or to page 187 to run away.

He had been choosing between two models. In one, the story contains a list of passages. In the other, each passage has a reference to a story. Which was better?

They differ in which operations they make easier or more complicated. But I told him that, before even discussing this, or looking for a third alternative, there’s a more basic question to answer first:

Do they even represent the same thing?

To think about this question, stay here.
To see the answer, scroll past this code.

Option 1:

  Story
    title: String
    passages: Passage[]

  Passage
    title: String
    content: String
    links: Passage[]

Option 2:

  Story
    title: String

  Passage
    title: String
    content: String
    links: Passage[]
    story: Story

I smiled as I gave my answer: They are equivalent if passages are immutable, but different if they are mutable.


At first glance, both options seem identical. Option 1 says that a story has many passages. Option 2 says that every passage is associated with a story. It’s a one-to-many relationship.

Or is it? It’s pretty obvious in Option 2 that no passage can be associated with two stories. But in Option 1 they can be, in the mutable case only.

Why not in the immutable case? You might think that you can create an immutable passage, stick it in two stories, and presto — sharing! But, in the immutable world, one copy used twice is the same as two independent copies.

Think about what happens when you go to implement the “edit passage” screen. In the mutable world, you modify the title and content in-place, and it gets updated in both stories. But in the immutable world, modifying the passage really means making a modified copy, and updating it into the story. It’s copy-on-write. To modify the other story similarly, you’d have to go find it and do the same operation to its passage. It really is a separate copy.
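To make the contrast concrete, here is a small TypeScript sketch (the field names and sample data are illustrative, not from the post): the same edit, performed once against a shared mutable passage and once in copy-on-write style.

```typescript
type Passage = { title: string; content: string };
type Story = { title: string; passages: Passage[] };

// Mutable world: one passage object, shared by two stories.
const shared: Passage = { title: "The Cave", content: "It is dark." };
const storyA: Story = { title: "A", passages: [shared] };
const storyB: Story = { title: "B", passages: [shared] };

shared.content = "It is pitch black."; // edit in place
// Both stories now show the new content: the passage has identity.

// Immutable world: "editing" means building a modified copy and
// swapping it into ONE story; the other story keeps its own copy.
const storyC: Story = { title: "C", passages: [{ title: "The Cave", content: "It is dark." }] };
const storyD: Story = { title: "D", passages: [{ title: "The Cave", content: "It is dark." }] };

const edited: Passage = { ...storyC.passages[0], content: "It is pitch black." };
const storyC2: Story = { ...storyC, passages: [edited] };
// storyD is untouched: what looked like sharing was two independent copies.
```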

What we’ve seen is a fundamental concept in programming: mutable values have identity. And immutables don’t.

With immutable values, there is no notion of “the same” story or passage beyond “these two have the same values.” There is no way to distinguish between two stories that have the same passage, vs. two stories that have identical copies of the passage. There is no true sharing.

With mutability, the opposite holds. If there are two passages that have the same values, and updating one also updates the other (i.e.: all reads and writes are identical), then they really are the same. Conversely, it’s now possible to have two identical passages which are not the same.

So, the mutable vs. immutable debate is not just about in-place updates vs. modified copies. Immutability means independent parts. Mutability means sharing.

One way to model this is to think of every mutable value as really a pair. The first element of this pair is the value itself. The second element is a magic “token,” created at allocation-time, which is guaranteed to be distinct from every other such token in the program. This explains why, if you do want proper sharing in the immutable world, you’ll need to change your data model to explicitly add some kind of identifier. This, by the way, is one of the reasons why it’s correct to think about immutability as the default, and mutability as an added feature.
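Here is a sketch of what adding that explicit identifier looks like (the `id` scheme and the lookup table are my invention, not part of the original options): passages live in one table, stories refer to them by id, and an "edit" replaces the table entry, which both stories see.

```typescript
type PassageId = number;
type Passage = { title: string; content: string };
type Story = { title: string; passageIds: PassageId[] };

// Immutable Passage values, with identity reintroduced as an explicit id.
const passages = new Map<PassageId, Passage>();

passages.set(1, { title: "The Cave", content: "It is dark." });
const storyA: Story = { title: "A", passageIds: [1] };
const storyB: Story = { title: "B", passageIds: [1] };

// An "edit" swaps in a new immutable value under the same id;
// both stories observe the change, recovering true sharing.
passages.set(1, { ...passages.get(1)!, content: "It is pitch black." });
```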

Bigger Differences

In the example above, there’s a simple fix if you like Option 1, but you don’t want sharing between stories: Add an extra rule, or invariant, forbidding sharing. Every rule means extra work to enforce, but this one should be easy. And you still have the option to just not care about sharing.

But adding mutability can mean more pervasive changes to a data model. Here are two more options for passages and stories, in stripped-down form. In Option 3, the story has a start passage, and every passage has a number of successors. In Option 4, we store the edges in reverse, with each passage having a parent.

Option 3:

  Story
    start: Passage

  Passage
    links: Passage[]

Option 4:

  Story
    passages: Passage[]
    startIdx: int

  Passage
    parent: optional<Passage>

  Extra invariant:
    Every passage is reachable from the start passage

If you squint at it right, when passages are immutable, these are the same. Option 3, naturally, represents trees (no cycles). But, by reversing the arrows, any tree can also be turned into the form of Option 4, so Option 4 also represents trees. This is a convenient lie up until you have to implement any kind of editing, and find that it’s just pretending to be a tree: you’ll need to rebuild large sections on every edit.

[Diagram: Option 3, immutable]
[Diagram: Option 4, immutable]

But, after adding mutability, Options 3 and 4 become unequivocally not the same. When passages have identity, Option 3 now represents arbitrary graphs with ease. Option 4, meanwhile, now represents these weird graphs that can have cycles, but where every passage can only have one predecessor. Picking between models goes from a game of “which one is more convenient to use” to “which one actually represents a story?”

[Diagram: Option 3, mutable]
[Diagram: Option 4, mutable]

Reversing the Polarity

We’ve seen two examples where adding mutability suddenly made two data models different. Is there a case where the story is reversed, and mutability can make two different models the same?

At first I thought there wasn’t — making things mutable just adds distinctions. We need the reverse. But for any rule about data structures, you can often get a reverse rule by putting the data structure in a “negative position” — that is, as the argument to a function, or as the key to a map. So consider Options 5 and 6, this time representing just passages:

Option 5:

  Passage
    title: string
    content: string
    summary: string

Option 6:

  Passage
    title: string
    content: string

  Elsewhere:
    summary: Map<Passage, string>

Here, we want to track some extra information on a passage. Perhaps you want to add auxiliary information like a Chinese translation or a summary. Or perhaps you want to compute some property of all the passages, like how many steps remain until the end of the story, and you need to cache the intermediate results. The most straightforward way is to add an extra field. But another approach is to create a hashmap elsewhere, associating each passage with some extra data. This pattern is often necessary when extending third-party APIs. And it only works when the objects are mutable.

Option 5 is always straightforward: each passage gets an extra string. But with immutable passages, Option 6 has an extra restriction: there cannot be two passages with the same title and content, but a different summary. Database people call this a functional dependency.

Perhaps you can hack it by making the map check for pointer-equality anyway, but chances are you’re breaking the assumptions of the rest of your code. Immutable code usually moves and copies objects willy-nilly, meaning that “the same object” will have different memory addresses over time.

On the other hand, when passages are mutable, you can think of every passage as coming with a magic token that distinguishes it from others with the same values. And so, that hashmap really does just mean that every passage gets some extra data.
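In TypeScript, a `Map` keys objects by reference, so Option 6 behaves exactly this way (a sketch; the sample passage is mine):

```typescript
type Passage = { title: string; content: string };
const summary = new Map<Passage, string>();

const p: Passage = { title: "Cave", content: "Dark." };
summary.set(p, "You enter a cave.");

// Mutable world: the object keeps its identity across in-place edits,
// so the map entry survives modification.
p.content = "Pitch black.";
const found = summary.get(p);       // still found

// Immutable-style update: a modified copy is a *different* key,
// even though its fields are identical.
const pCopy: Passage = { ...p };
const missing = summary.get(pCopy); // undefined
```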

The True Relationship

In software design, the first step is to discover the essence of the concepts you are implementing. I wrote in the first paragraph that a story contains many passages, so the basic idea is that a story contains passages, right?

Too fast. Since it makes sense to speak of a passage without a story, neither direction is more primitive. The primitive idea is just that the one-to-many relationship exists. If you were modeling stories in a database, the standard approach is to have a table just for storing the stories-passages relationship, separate from the tables for stories and passages. You can get the passages for a story, and the story for a passage. The relationship knows no direction.

How do you express this in a program? In 1987, a researcher at General Electric came up with an idea for a new style of programming in which relationships between objects can be created and manipulated as easily as the objects themselves. There’s been an uptick in the past decade, but first-class relationships still haven’t caught on even in the research world, and so I promise your language doesn’t have them. We can’t put the idea of a story-passage relationship directly into the program. We have to make do.

In databases, when you find yourself often using something in one table to look up things in another, you can speed things up by denormalizing the structure, fusing two tables together. In one possible denormalization, you fuse the Story-Passages table into the Stories table, giving each story a list of passages. In another denormalization, you fuse the Story-Passages table into Passages, giving each Passage a reference to its containing story. The two denormalizations correspond to the original Options 1 and 2 at the top of this post. (A third denormalization fuses Story-Passages into both.)
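The normalized starting point can be sketched like this (the table and helper names are invented for illustration): the relationship is its own structure, queryable in both directions.

```typescript
type StoryId = number;
type PassageId = number;

// The Story-Passages relationship as its own "table",
// separate from the stories and passages themselves.
const storyPassages: Array<[StoryId, PassageId]> = [
  [1, 10],
  [1, 11],
  [2, 12],
];

// Get the passages for a story...
function passagesOf(s: StoryId): PassageId[] {
  return storyPassages.filter(([sid]) => sid === s).map(([, pid]) => pid);
}

// ...and the story for a passage. The relationship knows no direction.
function storyOf(p: PassageId): StoryId | undefined {
  return storyPassages.find(([, pid]) => pid === p)?.[0];
}
```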

And now, we see that the reasons to choose one over the other are the same as the reasons to denormalize a table. For the choice between the passages and the story fields, it’s a matter of how often each would be accessed. Since data tends to be accessed hierarchically, I’d expect the need to fetch all the passages of a story to be great, and the need to work with a passage of an unknown story to be slight. This suggests Option 1 (well, Option 1 + more fields you need to actually write this app). But know that both are denormalizations of the true relationship.

Building Blocks

What about mutability gives values identity? How does it make two things of equal values suddenly inequal? Let’s dig into the mechanics of equality and find out.

Equality in programming languages comes from the idea of identity of indiscernibles, which states that if there is nothing you can do to tell two things apart, they are the same.

At their core, Story and Passage are both tuples. In Option 1, Story is a pair of a title and a list of passages, while Passage is a triple of a title, contents, and a list of links. At the end of the day, the only thing you can do with a Story or Passage is look at its components. So if both components of two Story’s are equal, then the Story’s are indistinguishable, and hence equal.

This generalizes easily to arbitrary pairs, and then arbitrary tuples. We can formalize that into a rule, like this:

    v1 = v1'    v2 = v2'
    ────────────────────── (eq-pair)
    (v1, v2) = (v1', v2')

This is the eq-pair rule, which defines equality for pairs exactly as we said above. You read this as “if the conditions on the top are true, then the statement on the bottom is true.” So this says that, if the corresponding components of two pairs are equal, then the pairs themselves are equal.

How does this change when one of those components, e.g.: the list of passages, is mutable?

First, let’s talk about mutable variables. When you make a mutable variable like in “int x = 1,” you should think of those as happening in two steps. First, you create a location in memory that can change, and initialize it with the value “1”. This location is called a mutable reference cell. You then assign this reference cell to the variable x. Variables are immutable containers that hold mutable reference cells.

In languages like OCaml/SML and Haskell, it looks exactly like this: you create a mutable reference cell using the ref or newIORef command, and store it into an immutable variable. This distinction is also apparent in languages like modern JavaScript and Scala, which have separate keywords (JS: let/const; Scala: var/val) for mutable and immutable variables: mutable variables act like immutable variables containing mutable objects. Yet even in C and Java, the execution works like this. When a program is compiled, there will typically be a memory slot on the stack containing a value that changes, and the variable itself, which is always “stack pointer + 16.” The difference between these is the difference between call-by-value and call-by-reference.

So, if Passage’s are mutable, when comparing two Story’s, we are not comparing a title and a list of Passage’s; we are comparing a title and a list of mutable reference cells.

Mutable reference cells can be thought of as an “address” which refers to some location in the heap. The address and heap usually wind up being a pointer and main memory, but this definition is abstract, and you can also implement it as a hashmap. To dereference a cell, you look up that address in the heap, return the value, and keep the current heap the same. Here’s a rule that says that:

    H(a) = v
    ───────────────── (deref)
    ⟨!a, H⟩ ⇓ ⟨v, H⟩

By the way, while this exact rule is specific to a handful of simple programming languages, the underlying idea is universal. In every language, when you write down what it means to look up a mutable variable, you’ll get something very close to this rule.

So, equality: Once it’s created, the only things you can do with a mutable reference cell are read from it and write to it. Following again the principle of Identity of Indiscernibles, two mutable reference cells are equal if and only if every read and write is the same. If two cells are equal, and you write a value to one of them, you had better be able to read it from the other. This means that equality of addresses is trivial: every address is equal to itself (and nothing else).

So, the interesting part is where these addresses come from when reference cells are created. In every language, the semantics look like this:

    a is a fresh address
    ──────────────────────────── (ref)
    ⟨ref v, H⟩ ⇓ ⟨a, H[a ↦ v]⟩

In English: to create a new reference cell with initial value v, you find an unused address a in the heap, and set the value at that location to v. The new reference cell is represented by that address a.

That top condition “a is a fresh address” is the magic. That condition provides the global property we need: that the address a is different from the address of every other mutable reference cell; otherwise, distinct cells could be confused with each other. This means that, if two reference cells are equal, they were created by the same call to newRef. Or: if two mutable Passage’s are equal, they were created by the same constructor call.
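Both rules can be implemented directly, with a hashmap playing the role of the heap and a counter guaranteeing freshness (a sketch; the function names are mine):

```typescript
type Addr = number;
const heap = new Map<Addr, unknown>();
let next: Addr = 0;

// (ref): pick a fresh address -- the counter guarantees it is
// distinct from every address handed out before.
function newRef(v: unknown): Addr {
  const a = next++;
  heap.set(a, v);
  return a;
}

// (deref): look up the address; the heap is left unchanged.
function deref(a: Addr): unknown {
  return heap.get(a);
}

function setRef(a: Addr, v: unknown): void {
  heap.set(a, v);
}

// Two cells created with equal contents are still different cells:
const c1 = newRef("dark");
const c2 = newRef("dark");
setRef(c1, "pitch black");
// deref(c2) is unaffected -- equality of cells is equality of addresses.
```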

Ergo, it’s the magic token that gives mutable values their identity.

Everything we’ve discussed about mutable values having identity boils down to this one rule. Yet this one rule has deep implications about what your data means. So much of software design reduces to just understanding these building blocks.

Liked this post? There are still some slots open in my Advanced Software Design Weekend Intensive, happening in San Francisco the weekend of July 28th.


Thanks to Dhash Srivathsa and Tikhon Jelvis for comments on drafts of this post.

Friday, February 23, 2018

The Practice is not the Performance: Why project-based learning fails

Last night, I encountered an old post by Zach Holman where he pushes the idea that traditional school-based CS is useless; project-based learning is the way to go. I’ve heard this idea repeatedly over the last 10 years, and know at least one person who’s started an education company with that premise.

I don’t want to debate the current way universities do things (I found my undergrad quite useful, thank you), but I do want to dispel the idea that everything would be better if only we switched to project-based learning. The opposite is closer to true. Project-based learning is not a superior new idea. Project-based learning is the most inefficient form of learning that still works.

To see why, we actually have to sit down and think about the learning process. I’ve learned a number of models of learning over the course of my teaching training, but the one I’ve found most useful is taught by the Center for Applied Rationality. It goes like this:

  1. Break down a skill into small components.
  2. Drill each component rapidly, with immediate feedback
  3. Integrate the pieces together into larger components; drill these similarly

In that light, project-based learning definitely has something going for it: if you do a project, you will practice all the skills needed to do projects.

There’s something missing, though: it completely gives up on trying to think about what components it’s trying to teach.

Here’s how project-based learning might play out: After finishing his first two Android apps, Bob decides he wants to learn network programming, and generally get better at working on a software team. He and Alice pair up and decide to build a chat app for sending cat gifs, with a distributed replicated backend. They decide to make a spreadsheet of tasks and claim them. Alice also wants to learn network programming, and she swoops in and takes most of the server components; Bob gets left with most of the client work. Over the next three weeks, Bob spends a lot of time building the GUI for the app, which he already knows how to do, and only a couple hours implementing the client protocol. They start writing tests because the teacher said they had to. Bob has trouble writing the tests, and realizes a couple ways he could have made his code differently to make testing easier. He also discovers that two of his tests were too fragile, and needed to be changed when he updated the code.

It’s now one month later. What has he gotten from the experience? He's learned he needed to be more proactive about taking on tasks that challenge him and grow his skills. He’s learned a smidgeon about network programming, and a couple ideas about how to write better tests. These are good lessons, sure, but expensive. Isn’t there a way for Bob to learn more in that month?

False Temptations

What are the common arguments in favor of project-based learning? Here are two of the main ones.

Real skills

The first big reason for project-based learning is that it teaches real skills used in industry. Why do many schools teach much of their curriculum in Haskell, which is not in the TIOBE top 10, or even in SML or OCaml, which are not in the top 50? Wouldn’t they serve their graduates better by teaching Node and React?

The first counterargument is that industrial technologies come and go. Proponents acknowledge this, sure, but still call CS departments “out of date” for not following trends. What really drove home the futility of this argument for me was this essay by software-engineering pioneer Mary Shaw. Had she followed that advice in the 60s, she points out, her students would have spent their time studying JCL, the language used to schedule jobs on IBM mainframes.

The second and bigger counterargument: learning concepts is much more important than learning applications, and the best environment to learn a concept is rarely the one in industrial demand.

People often ask me what’s the best language to learn to study software design. I ask them what’s the best instrument to learn to study music theory. Everyone answers piano. In piano, you can see the chords in a way that you can’t in, say, trombone. We see something similar in other domains. In The Art of Learning, Josh Waitzkin recounts how, unlike others, he started studying chess at the fundamentals, in situations with few pieces on the board. He ultimately beat competitors who studied many times harder. In The Art of Game Design: A Book of Lenses, Jesse Schell advocates looking past modern video games and instead studying the concepts in board games, dice games, playground games.

So, for programming, we need to (1) figure out the core concepts to teach, and (2) pick languages that make those concepts readily available. C and Java --- indeed, all top languages until Swift arrived --- lack elegant ways of expressing the idea of a sum, a value that can be one of several alternatives. Thus, when explaining why adding a new alternative sometimes breaks existing code and sometimes doesn’t, I find myself having to explain using its clumsy manifestations, unions and subclasses. The first time I read Bob Harper’s explanation of why they chose SML for the CMU undergraduate curriculum, I thought he was just rationalizing snobbery. Now, I quite agree.
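For a taste of what sums look like when the language does support them directly, here is a discriminated union in TypeScript (my example, not from the post):

```typescript
// A sum: a Shape is EITHER a circle OR a square, never both.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function area(s: Shape): number {
  switch (s.kind) {
    case "circle":
      return Math.PI * s.radius ** 2;
    case "square":
      return s.side ** 2;
  }
}
// Adding a { kind: "triangle"; ... } alternative makes this switch
// non-exhaustive; with strict settings the type checker flags every
// spot that needs updating, so the breakage is principled, not accidental.
```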

More like real jobs

The second big argument for project-based learning is that it more closely resembles what students will actually do on the job. This, in turn, is based on the idea that the best way to practice an activity is to do it.

This is false. “The only way to improve at X is to do it” is the advice you give when you actually have no idea how to improve. When you do know, you isolate subskills and drill them. Martial artists punch bags and do kata, and fight with extra constraints like no dodging. Musicians play scales, and practice a single measure over and over. Mathematicians rederive theorems from the book. And to condition a shot-putter, the best way is not to put weights in their hands and have them mimic the throwing motion, but rather to train the body’s ability to produce power, using squats and lots of heavy exercises that don’t even resemble shot-putting.1

Drilling programming

So what am I advocating? I’m advocating that you actually think about what you’re trying to teach, and design drills for it. The first drills should focus on one thing at a time.

So, for Bob trying to learn network programming and the general software engineering skills of being on a team project, here’s my 5-minute attempt to come up with an alternative way to teach these skills:

  1. Writing just the networking component of a larger system.
  2. Being asked to write the test cases for a small program given to you. The program is deliberately designed to tempt you into all the classic mistakes of test-writing.
  3. A simulation where you “coordinate” with the TAs to build something, committing pseudocode to version control. They troll you by deliberately misunderstanding things you say and seeing if their misunderstanding can go undetected.
  4. You and several teammates are assigned a small project. You’re asked to divide it amongst yourselves into unrealistically small units of work, and write test cases for each other’s components. (Integrates the skills of 2 and 3.)

These total much less time than the chat app, and teach much more. If the drills turn out not to be optimal for learning, I as the instructor have much more room to experiment. And if the students do choose to do a full project afterwards, they’ll be much better prepared.

I don’t teach network programming and haven’t tested these specific ideas. But, in my specialty of software design and code quality, I use exercises built on similar principles all the time. So I may teach a client some of the pitfalls of a naive understanding of information hiding, and then show them code from a real project that has that problem and ask them how they’d solve it. Or I’ll ask them to give just the type definitions and method signatures for a Minesweeper game; if they violate any design principles, then I can give them a feature request they can’t handle, or show why the implementation will be prone to bugs.

Is it better than just assigning projects? That’s the wrong question to ask, because project-based learning is incredibly easy to beat. My clients are mostly working professional software engineers; they’re already doing “project-based learning” every day. On my website, I claim:

Programmers learn by making bad design decisions, working on the codebase for a year, and then find themselves wishing they could go back and do things differently. I can give you that experience in an hour. 

Does this sound like a bold statement about my teaching prowess? It’s not. In fact, piano teachers put that claim to shame. You can spend hundreds of hours practicing a piece using too many muscles on every key press. If your body awareness isn’t great, you might not find out until your hand cramps up right before your performance. A couple seconds to catch that mistake, a couple minutes to tell you a story of that happening to others, and the piano teacher’s just saved you months.

(As an aside, this is why I believe in finding a competent private coach for every skill you really care about improving in.)

Replacing a traditional CS education with a “software engineering BFA,” as Spolsky and Atwood suggested, is no longer a hypothetical exercise. We’ve tried it. And now dev bootcamps are going bankrupt. Instead of treating them as a substitute for a traditional degree, recruiters are calling bootcamps jokes. Olin College of Engineering is famous for its project-based curriculum, but one student reports that she learned much more from traditional classes.

It’s time to stop looking for panaceas and shortcuts and realize that deliberate learning and deliberate practice --- as a separate activity from the everyday doing --- is the only way to mastery. As famed gymnastics coach Chris Sommer puts it, the fastest way to learn is to do things the slow way. Studying the fundamentals may seem like a distraction keeping you from getting your hands dirty making a Rails app using the Google Maps and Twilio APIs, but when you do get there, you’ll find there is less to learn if you’ve already compressed the knowledge into concepts.

Shameless Plug

My Advanced Software Design Web Course starts next week. It’s based on a lot of the learning principles I mentioned in this post, starting each concept with isolated drills and progressing to case studies from real software, and comes with personalized feedback from me on every assignment.


No, the sum total of knowledge about CS education is not to be found within this post. Yes, I do have some formal training in education; yes, other people have a lot more. Yes, there are a lot of things I didn’t bring up. Yes, the situation with bootcamps is more complicated than a simple referendum on project-based learning. The simple “turbocharging training” model of learning I gave is not a theory of everything. Yes, you need to run into problems in context, find motivation, try things out of order, and even eventually do a full project on a team without guidance. I believe realistic projects do have a place in education, but they still must be coupled with the principles of rapid feedback, and they are a poor substitute for learning the concepts piece-by-piece.


Thanks to Elliott Jin for comments on earlier drafts of this post.

1 When I was searching for a personal trainer, I asked about this to help screen candidates.

Wednesday, February 14, 2018

My favorite principle for code quality

Programming seems to become more about memorization every day, with advocates pushing for memorizing lists of design patterns and refactorings and the difference between “parameter coupling” and “invocation coupling.”

Forget all about that. There’s much more to gain by having general principles which apply to many situations. In this post, I’m going to show you one of my favorite software design principles in action, the Embedded Design principle. The Embedded Design principle, which I briefly introduced in a previous post, states that you should always code in a way that makes the design apparent. It’s like pixie dust in its ability to make code more readable, loosely coupled, and impervious to bugs. It’s also hard to grasp, because programmers rarely see a program’s design in first-class form. So, in this post, I’ll show you an example.

Conveniently, a principle can be used as a “machine” for generating examples that violate it. In software design, these examples always end up being code that looks reasonable to the untrained eye, but will spell doom and gloom — often in the form of hilarious bugs — for any team that tries to build on it. Last week, one of my coaching clients asked me how this principle applies to caching. I turned on my machine and it produced this:

The Problem

Meet Bob. Bob is an engineer at He’s only been there a few months, but he has a project that will affect everyone: he’s going to create the website’s stats page! Now everyone can see just how engaged AwesomeSauce’s users are.

Bob starts coding. There are three data items, and each one will print either a computed or cached value. Bob ended up using one chunk of code for each data item.

public void displayStats() {
  if (lastCachedTime <= lastMidnight()) {
    numUsers = countUsers();
    lastCachedTime =;
    print("Total Users: " + numUsers);
  } else {
    print("Total Users: " + numUsers);
  }

  if (lastCachedTime <= lastMidnight()) {
    numArticles = countArticles();
    lastCachedTime =;
    print("Articles written: " + numArticles);
  } else {
    print("Articles written: " + numArticles);
  }

  if (lastCachedTime <= lastMidnight()) {
    numWords = countWords();
    lastCachedTime =;
    print("Words written: " + numWords);
  } else {
    print("Words written: " + numWords);
  }
}
So simple! As the stats page gets bigger, it will be easy to add more by repeating the pattern. Bob sends it to Charlie for review.

Now, one thing about Charlie is that he’s not me. You see, I’d probably flip out and tell Bob about all the bad things that will happen because of this code, and give him a lecture on the Embedded Design Principle and a link to this blog post (which conveniently already has this code). Charlie, however, just takes a sip of coconut water, and says “There’s a bug over there because you re-used lastCachedTime. Otherwise, looks good. Ship it!”

- if (lastCachedTime <= lastMidnight()) {
+ if (lastCachedTimeUsers <= lastMidnight()) {
     numUsers = countUsers();
-    lastCachedTime =;
+    lastCachedTimeUsers =;

< similar for articles and words >

It’s 6 months later. has been growing faster than the national debt. AwesomeSauce needs more frequent updates to the stats page to show this. Team lead Denise decides to double the refresh rate, and Charlie does it.

- if (lastCachedTimeUsers <= lastMidnight()) {
+ if (lastCachedTimeUsers <= lastMidnight() || lastCachedTimeUsers <= lastNoon()) {
    numUsers = countUsers();

Bob is displeased. They had previously decided all stats should refresh simultaneously; see how the mockup just has one “Last updated” time. “Oops,” says Charlie, and together they merge the if-statements to prevent a future mistake.

public void displayDashboard() {
  if (lastCachedTime <= lastMidnight() || lastCachedTime <= lastNoon()) {
    numUsers = countUsers();
    numArticles = countArticles();
    numWords = countWords();
    lastCachedTime =;
  }
  print(numUsers);
  print(numArticles);
  print(numWords);
}

Another year passes. AwesomeSauce has been taking off like Bitcoin. Their stats page is now an information feast, with dozens of items. Time to start simplifying! Surveys show that no-one is actually wondering about the count of words, so they take it down:

- print(numWords);

AwesomeSauce has hit a jackpot! Writers are making 10,000 posts per day. One day, an AwesomeSauce engineer notices they have a big slowdown twice a day, at noon and midnight. After a full day of sleuthing, he discovers that it has something to do with how, at cache refresh time, the stats page is taking a full three minutes to load. With horror, he discovers the call to countWords(), still living, though its results are unused.

This was not merely an instance of “mistakes happen.” This was a heavy price paid for badly-designed code. Everything from the first bug to the site slowdown could have been prevented by structuring the code better, and the way to structure it better is a simple application of the Embedded Design Principle.

Nipping it in the Bud

Standard disclaimer: When reading software design advice, always imagine the examples given are 10x longer. Overengineering is bad. But, if you’re not sure whether applying a technique to your code would be overengineering, err on the side of doing it. Abstract early.

In the first version of the code, the computation, caching, and display of each statistic were all independent knobs, 3*N in total, where N is the number of stats to display. Merging the if-statements combined the N caching behaviors into one knob, bringing the total down to 2*N+1 knobs. Nonetheless, this still made it possible to compute a stat without displaying it. This was the rope that hung AwesomeSauce with the twice-daily slowdown.

This example was extreme, but ones like it are real. A static analysis researcher I know once visited eBay. She was shocked to discover that the checker that excited the engineers the most was one of the most shallow: detecting unused variables. Each unused computation eliminated was a database call saved, and a nibble taken out of server costs.

Conceptually, the set of stats and their caching behavior should be only N+1 knobs: each stat as a whole can be turned on or off, and the caching behavior can be changed. Achieving this will be a non-local change, and won’t be obvious by just thinking about what the code does, thinking at Level 2. Instead, we have to look at the design:

This diagram gives a rough picture of the concepts used in the design of the program, showing how the concept of a dashboard stat is instantiated multiple times, and how the concept of each stat is built into the code. The diagram is far from complete; it omits, e.g., the choice of the English language. But, with it, we can see several things.

The same caching behavior is implemented multiple times, which made the bug in Bob’s first version possible.

The code is indistinguishable from an alternate design where each stat has its own caching policy. This is another example of why design recovery is in general impossible.

And finally, each dashboard stat gets expressed in multiple independent lines of code, each its own “knob.” Hence, the slowdown that plagued AwesomeSauce.

All can be fixed — and the code rendered beautiful — if we take all those many arrows emanating from those boxes on the right, and reduce them down to one per box, one line of code per low-level concept.

This is the true version of the “Don’t Repeat Yourself” (DRY) principle.


In Zen and the Art of Motorcycle Maintenance, Robert Pirsig illustrates these higher-level concepts, which he calls forms, in the world of mechanics. A motorcycle is a power assembly and a running assembly; the power assembly can be divided into the engine and the power-delivery assembly; and so on. A mechanic who’s testing the spark plugs or adjusting a tappet is really working on these higher-level forms. And yet you cannot walk into a motorcycle parts shop and ask for a power assembly: these concepts only exist in the mechanic’s head.

But we’re programmers, and we have tools that translate our thoughts into CPU instructions. To a certain extent, we can actually build these forms directly into our code. Let’s create a value representing a DashboardStat.

public interface DashboardStatComputation {
  public Object compute(); // in real code, use a more specific type
}

public class DashboardStat {
  private DashboardStatComputation computation;
  private String textLabel;
  // constructors and getters
}

In the design, we already had a concept of a dashboard stat: it’s a computation and a label. Now that concept is in the code too. We can go one level further, and also put the collection of dashboard stats into the code too:

public class Dashboard {
  private List<DashboardStat> stats;
  private Time lastComputedTime;
  private Map<DashboardStat, Object> curValues;
  // constructors and getters

  /** Recomputes all statistics if cache is stale */
  public Map<DashboardStat, Object> getCurrentValues() { /* recompute if stale, then return curValues */ }
}

With these higher-level concepts turned into code, Bob’s original dashboard page can now be written declaratively. This code is just too simple to fail.

// Initialization (slightly simplified from actual Java)
dash = new Dashboard(list( new DashboardStat(countUsers,    "Total Users")
                         , new DashboardStat(countArticles, "Articles written")
                         , new DashboardStat(countWords,    "Words written")));

// Use
public void displayDashboard() {
  for (DashboardStat stat : dash.getStats())
    print(stat.getTextLabel() + ": " + dash.getCurrentValues().get(stat));
}

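To make the sketch concrete, here is one way the cache-refresh logic inside getCurrentValues might be fleshed out. The half-day boundary arithmetic and helper names below are my own assumptions for illustration, not the actual AwesomeSauce code:

```java
import java.util.*;

public class DashboardSketch {

  interface DashboardStatComputation { Object compute(); }

  static class DashboardStat {
    final DashboardStatComputation computation;
    final String textLabel;
    DashboardStat(DashboardStatComputation computation, String textLabel) {
      this.computation = computation;
      this.textLabel = textLabel;
    }
  }

  static class Dashboard {
    private final List<DashboardStat> stats;
    private long lastComputedTime = Long.MIN_VALUE;
    private final Map<DashboardStat, Object> curValues = new LinkedHashMap<>();

    Dashboard(List<DashboardStat> stats) { this.stats = stats; }

    List<DashboardStat> getStats() { return stats; }

    /** Recomputes all statistics if the cache is stale. */
    Map<DashboardStat, Object> getCurrentValues() {
      long now = System.currentTimeMillis();
      if (lastComputedTime < lastBoundary(now)) {
        for (DashboardStat s : stats) {
          curValues.put(s, s.computation.compute());
        }
        lastComputedTime = now;
      }
      return curValues;
    }

    // Most recent noon or midnight (UTC) before `now`; a stand-in for
    // the lastMidnight()/lastNoon() helpers from earlier in the post.
    private static long lastBoundary(long now) {
      long halfDay = 12L * 60 * 60 * 1000;
      return (now / halfDay) * halfDay;
    }
  }

  public static void main(String[] args) {
    Dashboard dash = new Dashboard(Arrays.asList(
        new DashboardStat(() -> 42, "Total Users"),
        new DashboardStat(() -> 7, "Articles written")));
    Map<DashboardStat, Object> values = dash.getCurrentValues();
    for (DashboardStat stat : dash.getStats()) {
      System.out.println(stat.textLabel + ": " + values.get(stat));
    }
  }
}
```

Note that the caching behavior now lives in exactly one place; adding or removing a stat is a one-line change, the N+1 knobs the design calls for.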

More (or less) Embedded Design

You will never turn your program into a pure expression of the design, at least not unless you’re a descendant of Zohar Manna. You are always making tradeoffs on how far to go down this spectrum. Here are some other choices for applying the Embedded Design principle to this example:

  • If this example refactoring was too much, there’s a simpler compromise. Don’t have a DashboardStat value; just have a DashboardData which hardcodes each of the computations. This is much less overhead than the way I did it above, but it still dictates that all stats should be computed in an all-or-nothing fashion. 
  • Extract the check for whether to recompute into a shouldUpdate method. Now your decisions about cache timing are reified in a separate chunk of code. To go one step further, the caching policy could be parameterized. 
  • Might you one day want to serve the website in multiple languages? Wrap each string in a call to an internationalization function. For now, this can just be the identity function, but doing so makes the choice of language explicit in your code, and marks all the text in your code which corresponds to a “translatable string” concept. (Another example of how, at the design level, two different identity functions are not identical.) 
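The second bullet’s reified, parameterized caching policy might be sketched like this; the interface and class names are my own assumptions:

```java
public class CachePolicyDemo {

  // The cache-timing decision, extracted into its own chunk of code.
  interface CachePolicy {
    boolean shouldUpdate(long lastComputedMillis, long nowMillis);
  }

  // Refresh whenever we have crossed a noon or midnight boundary.
  static class TwiceDailyPolicy implements CachePolicy {
    private static final long HALF_DAY_MS = 12L * 60 * 60 * 1000;
    public boolean shouldUpdate(long last, long now) {
      return last / HALF_DAY_MS < now / HALF_DAY_MS;
    }
  }

  public static void main(String[] args) {
    CachePolicy policy = new TwiceDailyPolicy();
    long noon = 12L * 60 * 60 * 1000; // first noon after the epoch
    // Crossing the boundary forces a refresh; staying within it does not.
    System.out.println(policy.shouldUpdate(noon - 1, noon + 1));
    System.out.println(policy.shouldUpdate(noon + 1, noon + 2));
  }
}
```

Now “how often do we refresh?” is one knob, swappable without touching any stat.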

Wrapping Up

Many programmers will get a sense that something was off about the first example. Every programmer I’ve shown this example to knew they should merge the if-statements and factor out the prints. But the ones who knew they should refactor out a “dashboard stat” value only did so in a flash of insight, and couldn’t articulate how they came up with it. By instead thinking about how the code is derived from the design, this refactoring becomes just a straightforward application of the Embedded Design Principle. Finding a good structure for the code was easy once we saw the structure of the design.

So always think beyond the code. Ask yourself about the concepts of your program, and the values that define them. As Pirsig would put it, this is what turns you from the mechanic turning screws into the engineer and artist building something to be proud of.

But what if you didn’t have a clear picture of how this code was derived, and you want to just be able to look at the code and realize there’s a deeper concept that should be extracted? There is a way to do that, and, for that, you’ll have to wait for my next post.

Like this post? You may be interested in my Advanced Software Design Web Course.


Thanks to Alex Reece, Elliott Jin, and Mimee Xu for comments on earlier drafts of this post.

Monday, January 22, 2018

The Benjamin Franklin Method of Reading Programming Books

Let’s face it, programming books suck. Those general books on distributed systems or data science or whatever can be tomes for a lifetime, but, with few exceptions, there’s something about the books on how to write code in a language/framework/database/cupcake-maker, the ones with the animal covers and the cutesy sample apps: they just tend to be so forgettable, so trite, so… uneducational.

I think I’ve figured out why I don’t like them, and it’s not just that they teach skills rapidly approaching expiration. It’s their pedagogical approach. The teaching algorithm seems to be: write these programs where we’ve told you everything to do, and you’ll come out knowing this language/framework/database/cupcake-maker. Central to these books are the long code listings for the reader to reproduce. Here’s an example, from one of the better books in this category:

class User < ApplicationRecord
  attr_accessor :remember_token
  before_save { = email.downcase }
  validates :name,  presence: true, length: { maximum: 50 }
  VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  validates :email, presence: true, length: { maximum: 255 },
                    format: { with: VALID_EMAIL_REGEX },
                    uniqueness: { case_sensitive: false }
  validates :password, presence: true, length: { minimum: 6 }

  # …another 30 lines follow...

Traditionally, there are two ways to study a page like this:

  1. Type out every line of code 
  2. Copy+paste the code from their website, maybe play around and make small changes 

Approach #1 is a method that, like a lecture, causes the code to go from the author’s page to the reader’s screen without passing through the heads of either. The second is like trying to learn how to make a car by taking apart a seatbelt and stereo: you’re just toying with small pieces. Neither is a sound way to learn.

If you had an expert tutor, they wouldn’t teach you by handing you a page of code. Still, these books are what we have. How can we read them in a way that follows the principles of learning? Read on.

Mental Representations

According to K. Anders Ericsson in his book Peak, expertise is a process of building mental representations. We can see this because expert minds store knowledge in a compressed fashion. Musicians can memorize a page of music far faster than a page of random notes. Expert chess players told to memorize a board position will do much better than amateurs, but, when they make a mistake, they’ll misplace whole groups of pieces.

This is possible because music and chess positions have structure that makes them look very different from a page of random notes or a random permutation of pieces. Technically speaking, they have lower perplexity than random noise. So, even though there are 26 letters in the English alphabet, Claude Shannon showed that the information content of English is about 1 bit per letter: given a random prefix of a paragraph, people can guess the next letter about half the time.

This is why a programmer skilled in a technology can look at code using it and read through it like fiction, only pausing at the surprising bits, while the novice is laboring line-by-line. This is also why a smart code-completion tool can guess a long sequence of code from the first couple lines. With a better mental representation, understanding code is simply less work.

(How do these mental representations work? My officemate Zenna Tavares argues they are distribution-sensitive data structures.)

This is exactly what’s missing from the “just type out the code” approach: there’s nothing forcing your mind to represent the program as anything better than a sequence of characters. Yet being able to force your mind to do this would mean being able to learn concepts more rapidly. Here’s a 200 year-old idea for doing so.

The Benjamin Franklin Method

I don’t know what’s more impressive: that Benjamin Franklin became a luminary in everything from politics to physics, or that he did this without modern educational techniques such as schools, teachers, or StackOverflow. As part of this, he discovered a powerful method of self-study. I’ll let him speak for himself (or go read someone else’s summary).

About this time I met with an odd volume of the Spectator. It was the third. I had never before seen any of them. I bought it, read it over and over, and was much delighted with it. I thought the writing excellent, and wished, if possible, to imitate it. With this view I took some of the papers, and, making short hints of the sentiment in each sentence, laid them by a few days, and then, without looking at the book, try'd to compleat the papers again, by expressing each hinted sentiment at length, and as fully as it had been expressed before, in any suitable words that should come to hand. Then I compared my Spectator with the original, discovered some of my faults, and corrected them.

—Benjamin Franklin, Autobiography

This process is a little bit like being a human autoencoder. An autoencoder is a neural network that tries to produce output the same as its input, but passing through an intermediate layer which is too small to fully represent the data. In doing so, it’s forced to learn a more compact representation. Here, the neural net in question is that den of dendrons in your head.

K. Anders Ericsson likens it to how artists practice by trying to imitate some famous work. Mathematicians are taught to attempt to prove most theorems themselves when reading a book or paper; even if they can’t, they’ll have an easier time compressing the proof to its basic insight. I used this process to get a better eye for graphical design; it was like LASIK.

But the basic idea, applied to programming books, is particularly simple yet effective.

Here’s how it works:

Read your programming book as normal. When you get to a code sample, read it over.

Then close the book.

Then try to type it up.

Simple, right? But try it and watch as you’re forced to learn some of the structure of the code.

It’s a lot like the way you may have already been doing it, just with more learning.


Thanks to Andrew Sheng and Billy Moses for comments on previous drafts of this post.

Monday, January 15, 2018

The Design of Software is A Thing Apart

Big up-front planning phases are out. Rapid iteration is in. With all this movement towards agile, it’s increasingly tempting to throw out the idea of having a separate design doc for software in favor of just getting started and having self-documenting code.

And that is a fatal mistake.

Last week, I explained that we speak about program behavior at one of three levels: executions, code, and specification. Most knowledge about software design is contained in Level 3, the specification. The flipside: the information of a program’s design is largely not present in its code.1 I don’t just mean that you have to really read the code carefully to understand the design; I mean that there are many designs that correspond to the exact same code, and so recovering all the design information is actually impossible.

In that post, I talked a lot about the example of describing malloc and free, and how it’s quite possible to have a semantic-level notion of “freeing memory” that corresponds to absolutely nothing in the implementation. In this post, I give three more examples. The first discusses the problem of testing too much, and how the decision of what to test is in general unrecoverable from the code. The second is a question of coupling which cannot be answered from the code alone. The third shows how an identical comparison function may have different meanings in different contexts, with different expectations about how the code may evolve. None of these examples will cause your code to break, at least not today. What they will cause are the characteristic problems of bad design: fragile tests, and the risk of extra work and bugs tomorrow. The first two examples are based on real code I’ve encountered (or written!), which caused exactly those problems.

Let’s begin!

1) What should you test?

“Write the tests first,” say advocates of test-driven development, “and the code will follow.” Doing so forces you to think about what the code should do before writing it. What it doesn’t do is get you to separate the code’s goals from its implementation details.

So you update the code, a test fails, and you think “Oh. One of the details changed.” Should you update the test to compensate? If you tried to use TDD as a substitute for thinking about the design, then you probably will. Congratulations, you now have a fragile test. Writing tests in this fashion is mistake #7 in my booklet “7 Mistakes that Cause Fragile Code”.

Here’s some code and its test. Is this a good test?

Code (pseudo-Java):
public void postProfileUpdateToServer(User u, Profile p) {
  HttpParams params = new HttpParams();
  addProfileToParams(params, p);
  HttpResult res =, params); // profileUpdateUrl: a URL constant, elided here
  logger.logUpdateEvent(u, p); // calls appState.saveCurrentState()
  if (res.getResponseCode() != Http.HTTP_SUCCESS)
    scheduleRetry(u, this.updateBlock);
}
Test checkPostProfileUpdateToServerAssumeSuccess (pseudocode):
  Call postProfileUpdateToServer with user Bob,
       and his profile with address changed to 123 Mulberry St
  Check that was called once
  Check that its first argument was the profile-update URL
  Check that its second argument held three parameters:
       the first parameter is the user’s E-mail address,
       the second their mailing address,
       and the third their preferred language
  Check that appState.saveCurrentState was called twice

(How do you check that was called? This is quite easy to do using a mocking framework such as Java’s Mockito. I quite recommend it as a way to unit-test functionality purely in terms of how it’s defined using other operations.)

There are generally two ways to write bad tests. One is to miss cases and not be rigorous enough. This is what happens when developers chase code coverage without actually thinking about what they’re testing. A common blind spot is that a programmer will test what happens when a program goes right, but not test what happens when a server’s down, when it’s fed bad input, etc.

The other is to test too much and be too specific. Is it part of the design that postProfileUpdateToServer must call appState.saveCurrentState twice, and it chooses to do so once by calling logger.logUpdateEvent? I don’t know. The test assumes a certain order about the HTTP parameters, but does the Profile object guarantee it? If the Profile uses a hash table internally, then the order is arbitrary, and may change when upgrading the libraries. Then the test will fail when upgrading to a new version of Java. (This is why Go randomizes hash table iteration order; one of my colleagues built a tool that does the same for Java.) At the extreme, I’ve seen a very experienced programmer write tests like assert(MSG_CODE_UPLOAD_COMPLETE == 12), which, given that these constants are meant to encapsulate knowledge, kinda defeats the point of having them in the first place.

The code should not follow the tests, nor the tests the code. They should both follow the design.
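One way to keep the parameter-order detail out of the test is to assert on the set of parameters rather than their sequence. Here is a minimal sketch; the parameter names and the profileToParams helper are hypothetical, not the post’s API:

```java
import java.util.*;

public class ProfileParamsTest {

  // Hypothetical stand-in for how addProfileToParams might build the
  // HTTP parameters internally (here, with an order-free HashMap).
  static Map<String, String> profileToParams(String email, String address, String lang) {
    Map<String, String> params = new HashMap<>();
    params.put("email", email);
    params.put("address", address);
    params.put("language", lang);
    return params;
  }

  public static void main(String[] args) {
    Map<String, String> actual =
        profileToParams("", "123 Mulberry St", "en");

    // Asserting on the *set* of parameters, not their order, keeps the
    // test valid under any internal representation (HashMap, LinkedHashMap,
    // a future library upgrade...).
    Map<String, String> expected = new HashMap<>();
    expected.put("email", "");
    expected.put("address", "123 Mulberry St");
    expected.put("language", "en");

    if (!actual.equals(expected)) throw new AssertionError("params mismatch");
    System.out.println("params match regardless of iteration order");
  }
}
```

The test now pins down only what the design guarantees: which parameters are sent, not the incidental order they happen to serialize in.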

2) Do these pieces of code know about each other?

One nice thing about understanding the design/specification level of code is that it gives us very clean definitions of formerly-fuzzy concepts like “this code knows about that code.” For now, stick with your intuition as you answer this question:

Here are three modules. Module 1 defines a “student” record. Module 2 defines an “employee” record, and creates them by turning students into student employees. Finally, module 3 uses employee records, namely by printing them.

Oh yeah, and this example is in pseudo-JavaScript, so there’s no notion of “type” other than “the set of stuff this function can return.” Here’s the code:

Module 1 :

function makeStudent(firstName, lastName) {
  return {firstName: firstName, lastName: lastName};
}

Module 2:

function makeStudentEmployee(student) {
  var e = student.copy();
  e["employeeId"] = nextEmployeeId();
  return e;
}

Module 3:

function printEmployeeFirstName(employee) {
  print(employee.firstName);
}

So, Module 3 accesses the firstName field of the employee value, which is defined in Module 1. Now, the question is: does Module 3 know about Module 1? That is, does the description of what Module 3 does need to be written in terms of the description of what Module 1 does?

The answer is: it depends. If the documentation of Module 2 is “Returns a student employee record with firstName, lastName, and employeeId fields,” then the answer is a clear No; it’s just an implementation detail that it obtains a student-employee record by appending to a student record. If the documentation is “Returns a student record with an additional employeeId” field, then it really is the case that you can’t know what’s in the argument to printEmployeeFirstName without reading Module 1, and so modules 1 and 3 are very much coupled.
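To make the decoupled reading concrete, here is a sketch, translated into Java for uniformity with the rest of the post, of Module 2 creating the record with explicit fields, as the “clear No” documentation suggests. All names here are illustrative:

```java
import java.util.*;

public class ExplicitFieldsDemo {

  // Module 2's documented contract made explicit: a student employee has
  // exactly these fields, regardless of how Module 1 represents students.
  static Map<String, Object> makeStudentEmployee(Map<String, Object> student) {
    Map<String, Object> e = new HashMap<>();
    e.put("firstName", student.get("firstName"));
    e.put("lastName", student.get("lastName"));
    e.put("employeeId", 1001); // stand-in for nextEmployeeId()
    return e;
  }

  // Module 3 now depends only on Module 2's stated fields, never on
  // Module 1's representation.
  static String employeeFirstName(Map<String, Object> employee) {
    return (String) employee.get("firstName");
  }

  public static void main(String[] args) {
    Map<String, Object> student = new HashMap<>();
    student.put("firstName", "Ada");
    student.put("lastName", "Lovelace");
    System.out.println(employeeFirstName(makeStudentEmployee(student)));
  }
}
```

With the fields spelled out, changing Module 1’s student record can no longer silently change what Module 3 receives.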

3) What does this code accomplish?

Quick! What does this line of code accomplish?

  return x >= 65;

Two possible answers:

  1. x is an ASCII character which is already known to be alphanumeric; this code checks if it’s a letter. 
  2. x is an age of a person; the code checks if the person is past the retirement age in the US. 

In both possible answers, the code is the same. I can probably find examples where the surrounding context is the same too. If this is just a standalone function lying in a Utils file somewhere, it makes absolutely no difference to the program which I use it for; I could even use it for both. But it makes a massive difference to the evolution of the program.

The former may need to be changed for internationalization, while the latter may need to be changed following changes in US laws. The former should live alongside other functionality related to character sets; the latter alongside code relating to people or company policies. And in the former case, since there’s a bunch of non-alphanumeric ASCII characters between “9” and “A”, some crazy-optimizing programmer could change it to “return x >= 64” and then optimize it to use bit-ops like “return (x >> 6)” with no effect on the program. In the latter, that might lead to a lot of erroneously-sent retirement benefits.
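The two concepts can be kept apart in the code even while they compile to the same comparison. A sketch, with constant names of my own choosing:

```java
public class SameCodeDifferentConcepts {

  static final int RETIREMENT_AGE = 65;
  static final char ASCII_A = 'A'; // also 65, but a different concept

  // x is an age in years; evolves with US retirement law.
  static boolean isPastRetirementAge(int x) {
    return x >= RETIREMENT_AGE;
  }

  // x is an ASCII code already known to be alphanumeric; evolves with
  // character-set concerns like internationalization.
  static boolean isLetter(int x) {
    return x >= ASCII_A;
  }

  public static void main(String[] args) {
    System.out.println(isPastRetirementAge(70)); // an age
    System.out.println(isLetter('9'));           // a digit's ASCII code, 57
  }
}
```

The bodies are character-for-character identical today, but each now states which concept it implements, so the crazy-optimizing programmer knows which one is safe to turn into bit-ops.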

Meanwhile, if you were trying to prove the correctness of a program that used this function, the proof environment would not be convinced that Bob is eligible for retirement benefits just because that function returned “true” until you tell it exactly what this function is meant to do.

If you just think about the code, then the letter ‘A’ is the same as the retirement age. So focus on the design.

Credit to Ira Baxter for this example.

Writing Better Code

I gave you three examples of situations where it’s impossible to recover design information from the code alone. For the tests, there’s no solution other than to have the design written somewhere separately, so you can know whether the order of HTTP parameters is stable, or whether to rely on the number of times saveState is called. For the code examples, it’s actually quite possible to change the code so that it only corresponds to one design. For that unclear case of coupling, if you want to make it clear that modules 1 and 3 are separate, have module 2 create records with explicit fields. For that retirement age check, just write it as “return x >= RETIREMENT_AGE”. Even if RETIREMENT_AGE and ASCII_A (usually spelled just 'A') have the exact same runtime values (for now), they represent different concepts. They are not interchangeable in the design, nor in a formal proof.

This gives rise to my first rule of writing good code:

Jimmy Koppel’s rule of good code #1: Make the design apparent in the code (Embedded Design Principle).

Sounds simple, but it’s quite vague and subtle until you’ve really internalized how code is derived from a design. I’ve given multiple workshops exploring different aspects of this idea, especially around coupling.

Some things are better left unsaid

In statistics, we used to think that, if you had enough data, you could learn anything. Then the causal inference guys came along and showed us that, actually, even with infinite data, there can be many processes that generate the same observations. If you just collect data about what days there was rain in town and what days your lawn was muddy, you can’t tell the difference between a universe in which rain causes mud and one where mud causes rain. You need to know something more.

Software design is the same way: there can be many designs that correspond to the exact same code. Those who speak of “self-documenting code” are missing something big: the purpose of documentation is not just to describe how the system works today, but also how it will work in the future and across many versions. And so it’s equally important what’s not documented.


Thanks to Alan Pierce, Alex Reece, and Tej Chajed for comments on earlier drafts of this post.

1 Note that I am considering text in a source file, namely comments and variable names, as part of the documentation, not part of the code. A long comment explaining the structure and guarantees of your program is absolutely a form of specification.