Wednesday, September 21, 2022

Solving the Dog-Bunny Puzzle with Program Verification Technology

As a high schooler in the 70’s, my father enjoyed playing the Star Trek game written for the Sigma 7 mainframe. You play as the Enterprise surrounded by Klingon ships, and to shoot them down you have to look at a grid and figure out which angle (in degrees) to fire at.


It was fun right until he learned trigonometry.

Three days ago, Conrad Barski, creator of the absolutely delightful book/comic Land of Lisp, took the Internet by storm with a new puzzle game. In this puzzle, three characters move between rooms and press buttons, where each button closes or opens other doors. At least, that’s how it might appear in a traditional video game. Phrased that way, it may remind you of many other games; I’m reminded of the famous Flash game Fireboy and Water Girl and the obscure student project Lineland. But in its actual presentation, the Dog-Bunny Puzzle abstracts this mechanic into a thing someone can understand with no instructions in 60 seconds, while also being surprisingly difficult.

I spent a minute or two playing around and discovered that it would actually take serious thinking to solve it. I was about to put it down to get back to my work, writing design challenges for software engineers, when I realized this puzzle could fall to a sledgehammer a bit less humdrum than trigonometry, something I first learned about in a lecture on how to automatically fix bugs in concurrent programs.

You see, the puzzle could be modeled with something called a Petri net. And then I could run a Petri-net solver to solve the puzzle.

In fact, modeling it is so straightforward that I’d go so far as to claim the puzzle is a Petri net.

Following in my father’s footsteps, I shall now proceed to destroy the fun in this puzzle by replacing it with an intellectual challenge.

What is a Petri Net?

A Petri net is a graph with a bunch of tokens on each node. When certain conditions are satisfied, you are allowed to move tokens from one node onto another node. If this already sounds like the Dog-Bunny Puzzle, then, well, I did say the puzzle is a Petri net, right?

They were supposedly invented by Carl Adam Petri when he was 13, and are widely used in the field of formal verification. Unlike Petri dishes, invented a century earlier, they cannot be used to grow bacteria. But they can be used to model cellular activity in bacteria.

A Petri net has three components: places, transitions, and tokens. Consider this Petri net:

  • The three big circles are places.
  • The small black circles are tokens.
  • The rectangles are transitions.
  • To fire a transition, you remove one token from each of the incoming places, and add a token to each of the outgoing places. So, in this picture, to fire T1, you would remove a token from place cp1 and add a token to place p1.
  • If all of the incoming places for a transition have tokens on them, then it is possible to fire that transition, and we call it enabled. Otherwise, it is disabled. In this picture, T1 is enabled, while T2 and T3 are disabled.

And….that’s basically it. But they can do a lot. Places can represent lines in a program, states in a protocol, or molecules in a chemical reaction. Tokens can represent which line a thread is on, or some shared resource like a lock. And then the same general tools and algorithms can be applied to each of them.

There are several bajillion extensions to the basic Petri net model. For this puzzle, we need colored Petri nets. All that means is that there are now two types of tokens instead of one: dog tokens and bunny tokens.

I went searching for a colored Petri net solver, and found what looks to be the only game in town: a pretty serious package called CPN Tools. This was going to be fun. I eagerly went to the downloads page and found the Mac section.

We recommend you to download the latest stable Windows version of CPN Tools and run it using a virtual machine. See instructions for doing this on a Mac here (the instructions for doing so on Linux should be similar). You can also run CPN Tools using Wine. 

Fun indeed.

Modeling the Dog-Bunny Puzzle in CPN Tools

You ever hear complaints about UIs designed by programmers? You know you’re using one when there are two colors and two hundred buttons.

CPN Tools feels more like it was written by someone whose parents forced them to go to tech school instead of art school. The only thing you can do is drag pictures onto the screen, and the primary effect of doing so is to cover your window in pastel-colored bubbles. While conformist programs let you save by either going to File->Save or clicking an icon of a floppy disk, in CPN tools you get to click an illustration of a Petri net moving onto a floppy disk, and then drag it onto the thing you want to save. A little green comic-book speech bubble pops up to tell you your save is successful, except that there’s no actual speech in the bubble, because artwork that speaks to the head misses the soul, or something. The manual paints a sweeping Impressionistic picture of the software, with the details left to the reader’s imagination.

But enough of that. Time for you to sit back and learn some formal methods.

The first step is to define the places. We’ll have one place in the Petri net for each place in the Dog-Bunny Puzzle. I’ll set them up with roughly the same geometry as in the puzzle.

Now we start on the transitions. There are unconditional unidirectional transitions from the Bone and Flower to the Boat, and from the Carrot to the Tree. That is, a dog or bunny can move from the Bone to the Boat at any time, no matter what, but not vice versa. These are transitions with one incoming and one outgoing arc each, removing a token from one and adding it to the other.

There are unconditional bidirectional edges between the tree and well, and the well and flower. That means, at any time, a dog or bunny can move between these in either direction. Petri nets don’t really have bidirectional transitions, so we have to model these as a unidirectional transition in each direction.

Now we turn to the conditional edges.  There’s an edge from the House to the Bone that can only be crossed if something is at the carrot. This can be modeled: there’s a transition that removes a token from the House and Carrot, and then adds tokens to the Bone and Carrot. So the net effect of firing this transition is to move a token from the House to the Bone, but it’s only enabled if something is on the Carrot. I’ll call the arcs between this transition and the carrot “enablement arcs.”

This is the point where the model starts to get graphically ugly even as it remains conceptually simple. There’s a reason the puzzle just represents these relationships with text. From now on, I’ll be drawing all these enablement arcs in white so you can pretend they’re not there. Let’s finish these: There transitions from House to Boat and vice-versa which require something on the tree, and ones between Tree and House which require a token both on Bone and Flower.

Now, the last transition is a little different: A token can move from Well to Carrot only if there is nothing on Bone. This one sounds painful to model the way we’ve been working. One option is to add a transition from Well to Carrot enabled by House, and another one from Well to Carrot enabled by Boat, and so on for every place except for Bone. Fortunately, CPN Tools supports inhibitor edges: I draw a transition from Well to Carrot with an inhibitor edge from Bone, and it behaves exactly the way we want. Here’s the model, with the inhibitor edge in red, and all enablement edges in white. I’ll include a little bit more of the UI in the screenshot this time.

Now, it’s time to add the tokens. The “colors” of tokens in CPN tools can have a lot of structure: they can be numbers, booleans, or richer data structures. But here, we just need two types of tokens: dogs and rabbits. These tokens are interchangeable except for the victory condition, so they’re really two variants of the same type, at least as far as CPN Tools is concerned. The “Declarations” pane to the left has a standard list of “color sets.” I’ll add a new one:

colset ANIMAL = union DOG | RABBIT;

Now it’s time to add the dog and rabbit tokens to the graph. We’ll need to change the type of every place to ANIMAL, and then we can add the tokens. How do we do it? If you guessed “Press the TAB key twice and then type 1`RABBIT”, being careful not to balance the backquote,  then you are absolutely right.

CPN Tools at this point has decided that my model must be for some kind of performance, and decided it needs an accompanying laser show. 

The reason for the errors on each arc is that, because tokens are no longer interchangeable, when tokens enter and leave a transition, I need to specify which is which. If I have a dog on the House, I need a way to specify that the transition moves the dog to the Bone when there’s a rabbit on the Carrot, not that it moves the rabbit to the bone and the dog to the Carrot.

To do this, we declare three variables under the ANIMAL color set.

   var mover : ANIMAL;
   var constraint1 : ANIMAL;
   var constraint2 : ANIMAL;

Now, we label every single arc with one of these variables. If a transition has incoming arcs labeled mover and constraint1, then the token which travel across those arcs get bound to mover and constraint, respectively. Then there are outgoing arcs also labeled mover and constraint1, indicating which token goes where. So we’ll label most arcs with mover, but enablement arcs with constraint1 and constraint2.

One problem though: I can’t find a way to hide the labels. To save space in the diagram, I’ll rename these variables to p, q, and r.

And…the model is done! We can now plan with it in CPN Tools’s simulator mode, and see that it follows the same rules as the original Dog-Bunny puzzle.

One last thing though: Before we can start to use the solving features of CPN tools, we need to give each transition a unique name. I’ll just name them a, b, c, d, etc.

Solving the Dog-Bunny Puzzle with CPN Tools

It is now time to solve.

Every arrangement of tokens on places forms a state. In the start state, there is a rabbit on the House, a rabbit on the Boat, and a dog on the Tree. In one possible successor state, there are two rabbits on the House. In a different successor state, there are two rabbits on the Boat. From the start state, we seek to reach a state with the rabbits on the Carrot and the dog on the Bone.

You might think we could try dragging the rabbit tokens to the Carrot, the dog tokens to the Bone, and then ask “How do we reach this state?” Or perhaps by typing a query into some kind of query box. But that would not be neither artistic nor technical enough.

We make the procedure technical by writing code in asking for which state has the properties  of the dog and rabbits being in their proper positions. We use Standard ML, as the language endorsed by a certain breed of university professor who thinks they know programming better than you do. The important part of this snippet is the lambda beginning on the third line; the rest is just telling it to actually search the state space and give me a list of results.

SearchNodes (
  fn n => [DOG] = (Mark.Puzzle'Bone 1 n)
          [RABBIT, RABBIT] = (Mark.Puzzle'Carrot 1 n),
 fn n => n,
 op ::)

We then run the code using an out of the box idea, which makes the code one with the image: we type it into an arbitrary textbox and right-click “Evaluate ML.”

And thus, we learn that it is possible to reach a state with the desired property, and it is called state 150. Woohoo! 

Not only that, but by similarly running more code

NodesInPath(1, 150)

We learn the sequence of states needed to get there is:


We learn that the solution to the puzzle requires 26 moves. All that’s left is to learn what these numbers mean.

You might think we could just click a button and it would show you the state that one of these numbers means. That is close, but it has the problem of not feeling like Photoshop, so instead you need separate clicks to first select the “Set Petri net to state 4” tool and then to tell the computer system, that, yes, the reason you selected this tool is to see state 4.

If that’s too slow, well, you also have the option of reading the states like this.

But now, after spending more time trying to read the generated solution, than some people took to solve it from scratch, I have it! And now my animals are well-fed and happy.

  • Time I spent to solve this puzzle: 3 hours
    • 10 minutes modeling, 2 hours 50 minutes figuring out CPNTools’s interface
  • Time spent by the 8 year-old daughter of a random commenter: 2 minutes


I don’t see myself using CPN Tools again. It’s too programmatic to be a good GUI tool, and too GUI-based to be a good programmatic tool. Hopefully in the future if I use Petri nets, I’ll only need normal ones, not colored. That opens up quite a few programming libraries, such as the SNAKES library in Python.  Some of them might actually implement the clever solving algorithms we’ve had for 40 years, as opposed to CPN Tools, which appears to just brute force it.

Funny enough, while I know of a use of Petri nets in practical programming tools (productivity, not verification) recently, I’m personally responsible for killing it. Hoogle+ is a tool that uses Petri nets to come up with small programs of a desired type. My most recent paper introduces a new algorithm (not based on Petri nets) which solves the same problem 8x faster despite an implementation just a tenth the length.

But, regardless, now I’m more experienced with a technique that can solve loads of problems. And so are you.

I have released my CPN Tools file for this post here.

Monday, March 28, 2022

Abstraction: Not What You Think It Is

“Interfaces are abstractions”
Olaf Thielke, the "Code Coach"

“Interfaces are not abstractions”
Mark Seeman, author of Code that Fits in Your Head and Dependency Injection

“Abstraction in programming is the process of identifying common patterns that have systematic variations; an abstraction represents the common pattern and provides a means for specifying which variation to use”
Richard P. Gabriel, author of “The Rise of Worse is Better” and Patterns of Software
“[...] windshield wipers [...] abstract away the weather”
Joel Spolsky, cofounder of Fog Creek Software, Stack Overflow, and Trello

Of all the concepts debated in software engineering, abstraction is at the top. I found two separate debates about it on Twitter from the past week.

As the quoted writers show, people do not even agree what abstraction means. Abstraction seems to stand for a hodgepodge of different concepts involving generality, vagueness, or just plain code reuse. These engineering debates — debates about whether duplications are better than the wrong abstraction or about whether abstraction makes code harder to read — trickle down into heated discussions over code. But this confusion over abstraction's basic meaning makes all such debates doomed.

This situation is particularly sad for me as someone with a background in PL theory. There are a lot of topics in software engineering that are the result of accumulated intuition over decades. But we've had a pretty good definition of abstraction since 1977, originally in the context of program analysis, and — I claim — it actually translates quite well into a precise definition of “abstraction” in engineering.

Abstraction in general is usually said to be something which helps readers understand code without delving into the details. Yet, for many of the concrete code examples programmers actually call “abstraction,” they can be (and are) used in ways which add details and hinder understanding. In opposition, I take the position that software engineers will benefit from studying the mathematics of PL-theoretic abstraction, understanding how it explains things they already do, and letting this coherent definition rule their use of the term "abstraction."

My initial goal in this post is a smaller enabling step: to give you names for the other concepts that are often combined under the name "abstraction" that they may be referenced, used, and critiqued specifically, and to help you move away from the vague and contradictory citrus advice that arises from using the same name for different things. From there, I will proceed to provide the rigorous definition of abstraction as it pertains to software, followed by concrete examples.

But the story does not end after separating true abstractions from their impostors. The endless debates over what is and isn't an abstraction shall be resolved in an unsatisfying way: almost anything can be viewed as an abstraction, and most abstractions are useless. Yet once you learn how to actually write down an abstraction mapping, you gain the ability to look beyond the binary and explain the exact benefit that a given abstraction does or does not provide a reasoner, and in doing so rearrange the discordant intuition around abstraction into harmonic lines of precise, actionable advice.

Not Abstraction

There are at least five other things that go under the name abstraction.


One contender for the oldest programming language is the lambda calculus, where Alonso Church showed us that, by copying symbols on pen-and-paper using a single rule, one could compute anything.

In the lambda calculus, making a new function is called “lambda abstraction,” and often just “abstraction.”

By "making a new function," that doesn't mean that it's the process of looking at two similar terms like sin(x)2+1 and 2*x+1, and deciding to make a function λx.x+1. That's anti-unification, described below. It is quite literally the process of taking x+1 and changing it to λx.x+1.

And “abstraction” also refers to the output of this process, i.e.: any lambda/function whatsoever.1

It's been noted that this use of the word "abstraction" is quite different from other uses. Unfortunately, this has polluted broader discussion. Even though the lambda calculus usage is akin to adding an opening and closing brace to a block of code, this usage leaks out into discussion of "abstracting things into functions" and from there into extolling the benefits of being able to ignore details and other things that closing braces really don't let you do.

So functions are abstractions — just in a very limited meaning of the word with little relation to everything else under that label. Moving on...


"Anti-unification" is a fancy term for the process of taking two things that are mostly similar, and replacing the different parts with variables. If you substitute those variables one way, you get the first thing back; else you get the second. If you see x*x and (a-b)*(a-b) near each other in a codebase and extract out a square function, then you've just done an anti-unification (getting the pattern A*A with the two substitutions [A ↦ x] and [A ↦ a*b]).

(Its opposite is unification, which is comparatively never used in programming. Unless you're writing Prolog, in which case, it's literally on every single line.)

This probably looks familiar: virtually all of what goes under the Don't Repeat Yourself label is an example of anti-unification. Perhaps you would describe the above as "abstracting out the square function" (different from the previous definition, where "abstracting" is just adding curly braces after the variables are already in place).

In fact, Eric Elliott, author of Composing Software and Programming JavaScript Applications goes as far as to say “abstraction is the process of simplifying code by finding similarities between different parts of the code and extracting shared logic into a named component (such as a function, module, etc...)” — i.e.: that abstraction is anti-unification. He then goes on to claim "The secret to being 10x more productive is to gain a mastery of abstraction." That sounds pretty impressive for a process which was first automated in 1970.2


"Boxing" is what happens when you do too much anti-unification: a bunch of places with syntactically-similar code turns into one big function with lots of conditionals and too many parameters. "Boxing" is a term of my own invention, though I can't truly claim credit, as the "stuffing a mess into a box" metaphor predates me. Preventing this is exactly the concern expressed in the line "duplication is better than the wrong abstraction," as clarified by a critic.

There's a surefire sign that boxing has occurred. Sandi Metz describes it nicely:

Programmer B feels honor-bound to retain the existing abstraction, but since [it] isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter.

I've written and spoken against this kind of naive de-duplication before. One of the first exercises in my course is to give two examples of code that have the same implementation but a different spec (and should therefore evolve differently). Having identical code is not a foolproof measure that two blocks do the same thing, and it's helpful to have different terminology for merging similar things that do and do not go together.

But, particularly, if we want abstraction to have something to do with being able to ignore details, we have to stop calling this scenario "abstraction."


Though not precisely defined, indirection typically means "any time you need to jump to another file or function to see what's going on." It's commonly associated with Enterprise Java, thanks to books such as Fowler's Patterns of Enterprise Application Architecture, and is exemplified by the Spring framework and parodied by FizzBuzz: Enterprise Edition. This is where you get people complaining about "layers" of abstraction. It commonly takes the form of long chains of single-line functions calling each other or class definitions split into a hierarchy across 7 files.

Fowler's examples include abstracting


into the Service Layer Abstraction™


If you were to describe the former, you'd probably say "It calculates the recognized revenue for the contract." If you were to describe the latter, you'd probably say "It calculates the recognized revenue for the contract." We've been spared no details.

Interfaces, Typeclasses, and Parametric Polymorphism

All three of these are mechanisms for grouping multiple function implementations so that a single invocation may dispatch to any of them. Most programmers will be familiar with interfaces, which are a language feature in Java and TypeScript and a common pattern in Python. Typeclasses, a.k.a. “traits,” are essentially interfaces not attached to objects. Parametric polymorphism, a.k.a. “generics,” are a little different in that they combine functions which differ in nothing but their type signature.

Parametric polymorphism is essentially just adding an extra parameter to a function, except that this extra parameter is a type. It's not abstraction in the same way that anti-unification isn't.

Interfaces and typeclasses do tend to be associated with abstractions in the sense about to be introduced. But there's a banal reason they do not satisfy the goal of an abstraction definition: there's nothing mandating that the many implementations have anything to do with each other. For example, in my installation of the Julia language, the getindex function is actually an interface with 188 implementations, dispatched based on the runtime type of its arguments. Most of these implementations do lookups into array-like structures, but a few do the exact opposite and create an array.

Sometimes, when calling a function behind a polymorphic typeclass interface, the programmer knows only one implementation is relevant, so the shared name is just to save some typing. Other than that, in order to make use of any of these, one must be able to explain the operation of multiple functions in some common language. It is not the language feature that allows one to program liberated from details, but rather this common language and its correspondence with each of the implementations. Which brings us to...

True Abstraction

In programming language theory and formal methods, there are several definitions of "abstraction" in different contexts, but they are all quite similar: abstractions are mappings between a complex concrete world and a simple idealized one. For concrete data types, an abstraction maps a complicated data structure to the basic data it represents. For systems, an abstraction relates the tiny state changes from every line implementing a TCP stack with the information found in the actual protocol diagram. These abstractions become useful when one can define interesting operations purely on the abstract form, thus achieving the dictum of Dijkstra, that "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."

You've probably used one such abstraction today: the number 42 can be represented in many forms, such as with bits, tally marks, and even (as is actually done in mathematics) as a set. Addition has correspondingly many implementations, yet you need not think of any of them when using a calculator. And they can be composed: there is an abstraction to the mathematical integers from their binary implementation, and another abstraction to binary from the actual voltages. In good abstractions, you'll never think that it's even an abstraction.

So, what do abstractions actually look like in code?

They don't.

Where are the abstractions?

A running joke in my software design course is that, whenever I show some code and ask whether it has some design property, the answer is always "maybe." And so it is whenever you ask "Is Y an abstraction of X?"

First, a quick digression. In PL theory and formal methods, there are many definitions of abstraction in different contexts, though they are all far more similar to each other than to any of the not-abstractions in previous sections. The one I am about to present is based on the theory of abstract interpretation. Abstract interpretation is usually taught as a (very popular) approach to static analysis, where it's used to write tools that can, say, prove a program never has an array-out-of-bounds access. But it can also be applied to more interesting properties, though usually in a less automated fashion. I'll be presenting it from the perspective of understanding programs rather than building tools, and explaining it without math symbols and Greek letters. I'll be in particular focusing on abstracting program state. I'll occasionally gesture at abstracting steps in a program, though the actual definitions are more complicated. (Google “simulation relation” and “bisimulation” to learn the technical machinery.)


Abstractions are mappings. An abstraction is a pattern we impose on the world, not the entities it relates, which are called the abstract domain and concrete domain. Strictly adhering to this definition, the well-formed question would be "Is there an abstraction from X to Y?" followed by “Is that abstraction good?”

Let's return to the example of numbers. There is a great abstraction from voltages in hardware to strings of 0's and 1's, from strings of 0's and 1's to mathematical numbers, and also from tally marks to numbers. These abstractions on numbers induce abstracted versions of each operation on the base representations, such as the very-simple operation of adding 1 to a number. Already, we can see that the abstractions live outside the system; computers function just fine without a device that reads an exact value for the voltage in each transistor and then prints out the represented number. Yet in spite of living outside the system, this mapping is perfectly concrete.

Evaluating abstractions

The three example abstractions above have two properties that make them useful. The first is soundness. The picture above is what's called a “commutative diagram” in that any path through the diagram obtains the same result: given an input set of voltages, it would be equivalent to either (a) run a circuit for adding one and then convert the resulting voltages to a number, or (b) convert the voltages to a number and then add 1 with pen-and-paper. The second is precision: Adding 1 to a number produces exactly one result, even though it corresponds to a diverse set of output voltages.

Precision is what makes finding good abstractions nontrivial. The eagle-eyed reader might notice that any function to the integers yields a sound abstraction. For example, there is an abstraction from your TV screen to integers: the serial number. However, you'd be hard pressed to find any operations on TVs that can be sanely expressed on integers. Turning the TV on or changing the channel leaves it with exactly the same serial number, while replacing the TV with one slightly larger is barely distinguishable from randomly scrambling the serial number. Indeed, one would likely implement the “slightly larger TV” function on the abstract domain of serial numbers by mapping each serial number to the set of all other serial numbers. This is sound — getting a larger TV and then taking its serial number is certainly contained in the result of looking at your current TV's serial number then applying this operation — but maximally imprecise.

A consequence of this: if we translate every question “is X an abstraction of Y” to “is there an abstraction which maps X to Y,” the answer is always “yes.” Instead, we can ask which operations can be tracked precisely with reference only to the abstract domain. The abstraction from voltages to numbers is perfectly precise for all operations on numbers, but not for determining whether a certain transistor in the adder circuit contains 2.1 or 2.2 volts. The abstraction from TVs to serial numbers is perfectly imprecise for every operation except checking whether two TV's are the same (and maybe also getting their manufacturer and model).

To the property of soundness and the measurement of precision, we add a third dimension on which to evaluate abstractions: the size (in bits) of an abstract state. The good abstractions are then the sound abstractions which are small in bits, yet precise enough to track many useful operations.

So consider a website for booking tables at restaurants, where the concrete domain is the actual state of bookings per table {table1BookedFrom: [(5-6 PM, “Alice”), (7-8 PM, “Bob”)], ..., table10BookedFrom: [...]}. The abstract domain shall be a list of timeslots. For each user, it is possible to abstract a concrete state down to the abstract state listing the booked timeslots for that user, i.e.: for Bob, [7-8 PM]. What makes this an abstraction, as opposed to just an operation on the restaurant state, is that we shall then proceed to describe the effect of every other operation on this value. So consider the actual booking function, which might have this signature:

void bookTable(User u, TimeInterval t)

Below I give 4 specifications for this function. For the example of booking a table for Carol from 7 to 8 PM, these specifications give:

  1. The actual behavior of this implementation, say, trying to assign Carol to the lowest-numbered table.
  2. All allowed behaviors on the concrete states, i.e.: finding any table open at the given time and assigning it to Carol, or doing nothing if there is none
  3. The allowed outputs on the abstract domain of Carol's bookings, namely (assuming Carol does not already have a reservation) either [7-8 PM] or [].
  4. The allowed outputs on the abstract domain for Bob or any other user, i.e.: the exact same as the input.

From this one example, we can derive quite a few lessons, including:

  • All of these specifications are useful in that you might use each of them when mentally stepping through the code. Perhaps you'd think “This line shouldn't have affected any of Bob's bookings” (using specification 4, corresponding to the “Bob's bookings” abstraction) or “When I click this button, either table 5, 6, or 7 will be booked” (using specification 2).
  • Abstractions are separate from the code, and even from the abstract domain. It does not make sense to say that the bookTable function or anything else in this file “is” the abstraction, because, as we have just seen, we can use many different abstractions when describing its behavior. More striking, we see that, even for a specific pairing of concrete and abstract domains, there can be many abstractions between them.
  • Instead, code is associated with abstractions. Note the plural. We've seen that bookTable can be associated with several abstractions of the behavior — infinitely many in fact, including many useful ones not previously discussed, like mapping the restaurant state to the list of timeslots with available tables.
  • No code change is needed to reason using an abstraction. We could extend the mapping from relating abstract/concrete states to relating steps between them, and then say the bookTable function “abstracts” the set of intermediate steps the program takes for each line in the function, but we could do this almost as easily if the bookTable implementation was actually a blob in a much larger function.
  • Different abstractions tend not to be more or less precise than each other, just differently precise. Compare the abstractions from the restaurant state to Bob's bookings, Carol's bookings, and the set of open timeslots. All of them can be used to answer different questions.

Continuing, we can also evaluate what makes the “Carol's bookings” abstraction a good one. The corresponding specification, Specification 3, is quite close to deterministic, yielding only two possible output states. The corresponding abstract states contain much less information than the concrete ones. And an entity (human or tool) reading the code tracking only this abstract state will still be able to perfectly predict the result of several other operations, such as checking whether Carol has a table. This is the new semantic layer on which one can be absolutely precise that Dijkstra speaks of!

Of course, it is cumbersome to say “there is a sound and precise abstraction mapping voltages on hardware to mathematical numbers” or “there is a good abstraction from the specific details of when tables are booked to just the available times.” It is quite convenient shorthand to use the more conventional phrasing “numbers abstract the hardware” or “bookTable abstracts all the details of reservations;” you can say “abstraction mapping” if you want to clearly refer to abstractions as defined in this blog post. Yet this shorthand invites multiple interpretations, and can spiral into an argument about whether booking tables should actually be the same abstraction as booking hotel rooms. Feel free to call numbers an abstraction of the hardware, but be prepared to switch to this precise terminology when there's tension on the horizon.

Whence the Confusion

Programmers correctly intuit that it is desirable to have some way to reason about code while ignoring details. In their 1977 paper, Patrick and Radhia Cousot first taught us the precise definition of abstraction that makes this possible. The other attempts fail to see the incorporeal nature of abstractions and instead fixate on something in the code. But there must be some connection.

Yes, functions are not abstractions. But for every function, there is an abstraction, not necessarily a good one, collapsing the many intermediate steps of the function into an atomic operation. There may also be abstractions which admit a simple description of the relation between the inputs and outputs.

Anti-unification is not abstraction. Yet two code snippets that could be fruitfully semantically modeled with similar abstract states will often be amenable to syntactic anti-unification. As disparate operations are combined, ever more information must be added into the abstract state to maintain precision. The result is boxing.

Indirection is hella not abstraction, though similarly-named functions may suggest slightly-different abstract domains associated with them. Many changes that make abstractions more explicit come with indirection, but we've seen it's possible for readers to impose abstractions on code without any changes at all.

Typeclasses and interfaces are not abstraction, but a good interface will be associated with at least one good abstract domain precise enough to make each of the interface's operations nearly deterministic (or at least simple to specify), and each implementation will come with a sound abstraction mapping its concrete states into that abstract domain.

Your car's windshield wipers and roof do not abstract away the rain, as claims Spolsky, but they do mean that, to predict your happiness after running your brain's DriveToStore() function, you can use an abstract state that does not include the weather.

Abstractions offer the dream of using simple thoughts to conjure programs with rich behavior. But fulfilling this promise lies beyond the power of any language feature, be it functions or interfaces. We must think more deeply and identify exactly how the messy world is being transformed into a clean ideal. We must look beyond the binary of whether something is or is not an abstraction, and discover the new semantic level on which we can be absolutely precise.

Appendix: Some Other Views of Abstraction

Abelson and Sussman's Structure and Interpretation of Computer Programs is a sure candidate for the title of “Bible of computer programming” (sometimes mixed with the actual Bible), and it's full of instruction on abstraction, beginning with chapter 1 “Building Abstractions with Procedures.” I expect at least one reader wants beat me over the head with a copy of the book saying I'm getting it wrong. I don't really want that (it's 900 pages), so I walked down 2 flights of stairs from my MIT office to Gerry Sussman's office and asked him. I'll represent his ideas below.

Sussman explained that he believes abstraction is a “suitcase term” which means too many different things, though he only sees two main uses relevant to software. The first definition is: giving names to things produced by the second definition. That second definition, he explained, has to do with the fundamental theorem of homomorphisms and its generalization. And then he pulled an abstract algebra book off the shelf.

I'll try to explain this as best I can with minimal math jargon while still being precise. I'll explain the definition simultaneously with Sussman's two examples, multivariate polynomials, and (physical) resistors in electric circuits.

In this picture, G can be seen as the set of all data in all forms and f is some operation on G. So G can be the set of all polynomials in all representations, or the set of all resistors. f then can be the operation of plotting the polynomial on all values, or computing the current through a resistor across a range of voltages.

Now, there are many different values in G on which f does the same thing. G contains both sparse and dense representations of the same polynomial; these have the same plot. There are different resistors with the same resistance; assuming they perfectly follow Ohm's Law, they have the same current at the same voltage. One can write down the list of lists of which values of G are treated the same by f; that's φ, the kernel of f. So for polynomials, φ would be the list of distinct polynomials, each of which contains the list of all representations of that polynomial.

Now, finally, one can use φ to quotient or “smush” together the like elements of G. All the different representations of the same polynomial get mapped to something representing that polynomial independent of representation. All the different resistors of resistance R get mapped to the idea of a resistor with resistance R. This is G/K in the picture.

The theorem is then that G/K behaves the same (is isomorphic to) H, e.g.: that that the set of different representations of polynomials can be manipulated in the same way as their plot. But I believe Sussman was gesturing less at this theorem and more at G/K itself, i.e.: at the idea of merging together different representations that behave the same under some operations.

I like this idea because it gives a way to unify into a single mapping the relation between many different implementations and their shared abstract domain. For the different representations of polynomials, a typical formulation with abstract interpretation would provide a different mapping from each kind of implementation into the shared abstract domain. On the whole though, the operative idea in the “homomorphism theorem” theory of abstraction seems to be that of merging together concrete values that can be treated similarly by certain operations. This idea is already present in abstract interpretation; indeed, φ can be directly taken to be an abstraction mapping, with G and G/K the concrete and abstract domains. On the whole, while I find the connection to abstract algebra cute, I'm not sure that the “homomorphism theory of abstraction” offers any insight that the theory of abstract interpretation does not.

So that's the main item in Sussman's suitcase of meanings of abstraction in software. It looks superficially different from any of the variations of abstract interpretation, but is actually quite compatible.

Is there anything else in that suitcase? Any other (good) uses of the word “abstraction” not captured by the previous definitions?

Maybe. I can say that there are is something I'd like to be able to do with something called “abstraction,” but that I can't do with abstract interpretation: dealing with inaccuracy.

You see, the orthodox definition of a sound abstraction would rule out a technique that predicts the concrete output perfectly 99.99% of the time and is otherwise slightly off, and instead prefers a function that says “It could be anything” 100% of the time. I know there is work extending abstract interpretation to some kinds of error, namely for numeric approximations of physical quantities, but, overall, I just don't know a good approach to abstraction that allows for reasonable error.

On a related note, sometimes AI researchers also talk about abstraction. I know that the best poker AIs “abstract” the state space of the game, say by rounding all bet sizes to multiple of $5, and that human pros exploit it by using weird bet sizes and letting the rounding error wipe out its edge. But I am not aware of a general theory backing this beyond just “make some approximations and hope the end result is good,” and am not even sure “abstraction” is a good term for this.


Thanks to Nate McNamara, Benoît Fleury, Nils Eriksson, Daniel Jackson, and Gerry Sussman for feedback and discussion on earlier drafts of this blog post.

1 Daniel Jackson credits Turing Laureate Barbara Liskov with promulgating this usage. Her influential CLU language uses “abstraction” to mean any unit of functionality (type, procedure, or iterator).

2 Technically, it’s only first-order anti-unification, equivalent to extracting named constants, that Gordon Plotkin developed in 1970. Work on higher-order anti-unification, which corresponds to extracting (higher-order) functions, began in 1990; Feng and Muggleton’s “Towards Inductive Generalisation in Higher order Logic" (1992) provides an early discussion. While Elliott is quite explicit that any anti-unification is abstraction, it is true that there are a vast number of anti-unifications of a given set of programs, and choosing the best is very difficult. DreamCoder is a recent notable work based on doing so.

Sunday, March 28, 2021

Developer tools can be magic. Instead, they collect dust.

Update 6/14/21: Now available in Chinese.

I started working on advanced developer tools 9 years ago. Back when I started, “programming tools” meant file format viewers, editors, and maybe variants of grep. I’d mention a deep problem such as inferring the underlying intent of a group of changes, and get questions about how it compares to find-and-replace.

Times have changed. It’s no longer shocking when I meet a programmer who has heard of program synthesis or even tried a verification tool. There are now several1 popular products based on advanced tools research, and AI advances in general have changed expectations. One company, Facebook, has even deployed automated program-repair internally.

In spite of this, tools research is still light-years ahead of what’s being deployed. It is not unusual at all to read a 20 year-old paper with a tool empirically shown to make programmers 4x faster at a task, and for the underlying idea to still be locked in academia.

I’d like to give a taste of what to expect from advanced tools — and the ways in which we are sliding back. I will now present 3 of my favorite tools from the last 30 years, all of which I’ve tried to use, none of which currently run.

Reflexion Models

We often think of software in terms of components. For an operating system, it might be: file system, hardware interface, process manager. An experienced engineer on the project asked to make certain files write to disk faster will know exactly where to go in the code; a newcomer will see an amorphous blob of source files.

In 1995, as a young grad student at the University of Washington, Gail C. Murphy came up with a new way of learning a codebase called reflexion models.

First, you come up with a rough hypothesis of what you think the components are and how they interact:

Then, you go through the code and write down how you think each file corresponds to the components.

Now, the tool runs, and computes the actual connectivity of the files (e.g.: class inheritance, call graph). You compare it to your hypothesis.

Armed with new evidence, you refine your hypothesis, and make your mental model more and more detailed, and better and better aligned with reality.

Around this time, a group at Microsoft was doing an experiment to see if they could re-engineer the Excel codebase to extract out some high-level components. They needed a pretty strong understanding of the codebase, but getting it wouldn’t be so easy, because they were a different team in a different building. One of them saw Gail’s talk on reflexion models and liked it.

In one day, he created his first cut of a reflexion model for Excel. He then spent the next four weeks refining it as he got more acquainted with the code. Doing so, he reached a level of understanding that he estimates would have taken him 2 years otherwise.

Today, Gail’s original RMTool is off the Internet. The C++ analysis tool from AT&T it’s based on, Ciao, is even more off the Internet. They later wrote a Java version, jRMTool, but it’s only for an old version of Eclipse with a completely different API. The code is written in Java 1.4, and is no longer even syntactically correct. I quickly gave up trying to get it to run.

Software engineering of 2021: Still catching up to 1995.

The WhyLine

About 10 years later, at the Human-Computer Interaction Institute at Carnegie Mellon, Amy Ko was thinking about another problem. Debugging is like being a detective. Why didn’t the program update the cache after doing a fetch? What was a negative number doing here? Why is it so much work to answer these questions?

Amy had an idea for a tool called the Whyline, where you could ask questions like “Why did ___ happen?” in an interactive debugger? She built a prototype for Alice, CMU’s graphical programming tool that let kids make 3D animations. People were impressed.

Bolstered by their success, Amy spent another couple years working hard, building up the technology to do this for Java.

They ran a study. 20 programmers were asked to fix two bugs in ArgoUML, a 150k line Java program. Half of them were given a copy of the Java WhyLine. The programmers with the WhyLine were 4 times more successful than those without, and worked twice as fast.

A couple years ago, I tried to use the Java Whyline. It crashed when faced with modern Java bytecode.


In 2008, my advisor, Armando Solar-Lezama, was freshly arrived at MIT after single-handedly reviving the field of program synthesis. He had mostly focused on complex problems in small systems, like optimizing physics simulations and bit-twiddling. Now he wanted to solve simple problems in big systems. So much of programming is writing “glue code,” taking a large library of standard components and figuring out how to bolt them together. It can take weeks of digging through documentation to figure out how to do something in a complex framework. Could synthesis technology help? Kuat Yessenov, the Kazakh genius, was tasked with figuring out how.

Glue code is often a game of figuring out what classes and methods to use. Sometimes it’s not so hard to guess: the way you put a widget on the screen in Android, for instance, is with the container’s addView method. Often it’s not so easy. When writing an Eclipse plugin that does syntax highlighting, you need a chain of four classes to connect the TextEditor object with the RuleBasedScanner.

class UserConfiguration extends SourceViewerConfiguration {
  IPresentationReconciler getPresentationReconciler() {
    PresentationReconciler reconciler = new PresentationReconciler();
    RuleBasedScanner userScanner = new UserScanner();
    DefaultDamagerRepairer dr = new 
    reconciler.setRepairer(dr, DEFAULT_CONTENT_TYPE);
    reconciler.setDamager(dr, DEFAULT_CONTENT_TYPE);
    return reconciler;

class UserEditor extends AbstractTextEditor {
  UserEditor() {
    userConfiguration = new UserConfiguration();
class UserScanner extends RuleBasedScanner {...}

If you can figure out the two endpoints of a feature, what class uses it and what class provides it, he reasoned, then you could ask a computer to figure out what’s in-between. There are other programs out there that implement the functionality you’re looking for. By running them and analyzing the traces, you can find the code responsible for “connecting” those two classes (as a chain of pointer references). You then boil the reference program down to exactly the code that does this — voila, a tutorial! The MatchMaker tool was born.

In the study, 8 programmers were asked to build a simple syntax highlighter for Eclipse, highlighting two keywords in a new language. Half of them were given MatchMaker and a short tutorial on its use. Yes, there were multiple tutorials on how to do this, but they contained too much information and weren’t helpful. The control group floundered, and averaged 100 minutes. The MatchMaker users quickly got an idea what they were looking for, and took only 50 minutes. Not too bad, considering that an Eclipse expert with 5 years experience took a full 16 minutes.

I did actually get to use Matchmaker, seeing as I was asked to work on its successor in my first month of grad school. Pretty nice; I’d love to see it fleshed out and made to work for Android. Alas, we’re sliding back. A few years back, my advisor hired a summer intern to work on MatchMaker. He instantly ran into a barrier: it didn’t work on Java 8.


The first lesson is that the tools we use are heavily shaped by the choices of eminent individuals. The reason that Reflexion Models are obscure while Mylyn is among the most popular Eclipse plugins is quite literally because Gail C. Murphy, creator of Reflexion Models, decided to go into academia, while her student Mik Kersten, creator of Mylyn, went into industry.

Programming tools are not a domain where advances are “an idea whose time has come.” That happens when there are many people working on similar ideas; if one person doesn’t get their idea adopted, then someone else will a few years later. In programming tools, this kind of competition is rare. To illustrate: A famous professor went on sabbatical to start a company building a tool for making websites. I asked him why, if his idea was going to beat all the previous such tools, it hadn’t been done before. His answer was something like “because it requires technology that only I can build.”

The second lesson is that there is something wrong with how we build programming tools. Other fields of computer science don’t seem to have such a giant rift between the accomplishments of researchers and practitioners. I’ve argued before that this is because the difficulty of building tools depends more on the complexity of programming languages (which are extremely complicated; just see C++) than on the idea, and that, until this changes, no tool can arise without enough sales to pay the large fixed cost of building it. This is why my Ph. D. has been devoted to making tools easier to build. It is also why I am in part disheartened by the proliferation of free but not-so-advanced tools: it lops off the bottom of the market and makes these fixed-costs harder to pay off.

But the third lesson is that we as developers can demand so much more from our tools. If you’ve ever thought about building a developer tool, you have so much impressive work to draw from. And if you’re craving better tools, this is what you have to look forward to.


1 I’d list some, but I don’t want to play favorites. I’ll just mention CodeQL, which is quite advanced and needs no touting.

Monday, March 15, 2021

Why Programmers Should(n't) Learn Theory

I’m currently taking my 5-person advanced coaching group on a month-long study of objects. It turns out that, even though things called “objects” are ubiquitous in modern programming languages, true objects are quite different from the popular understanding, and it requires quite a bit of theory to understand how to recognize true objects and when they are useful. As our lessons take this theoretical turn, one asks me “what difference will this make?”

I recently hit my five-year anniversary of teaching professional software engineers, and now is a great time to reflect on the role that theoretical topics have played in my work, and whether I’d recommend someone looking to become the arch-engineer of engineers should include in their path steps I’ve taken in developing my own niche of expertise.

I’ve sometimes described my work as being a translator of theory, turning insights from research into actionable advice from engineers. So I’ve clearly benefited from it myself. And yet I spend a lot of time telling engineers not to study theory, or that it will be too much work for the benefit, or that there are no good books available.

Parts of it are useful sources of software-engineering insight, parts are not. Parts give nourishment immediately; parts are rabbit holes. And some appear to have no relevance to practical engineering until someone invents a new technique based on it.

I now finally write up my thoughts: how should someone seeking to improve their software engineering approach learning theory?

What is theory?

“If I were a space engineer looking for a mathematician to help me send a rocket into space, I would choose a problem solver. But if I were looking for a mathematician to give a good education to my child, I would unhesitatingly prefer a theorizer.”

— Gian-Carlo Rota, “Problem Solvers and Theorizers

“Theory” in software development to me principally means the disciplines studied by academic researchers in programming languages and formal methods. It includes type theory, program analysis, program synthesis, formal verification, and some parts of applied category theory.

Some people have different definitions. There’s a group called SEMAT, for instance, that seeks to develop a “theory” of software engineering, which apparently means something like writing a program that generates different variants of Agile processes. I don’t really understand what they’ve been up to, and never hear anyone talking about them.

What about everything else which is important when writing software? UX design has a theory. Distributed systems have a theory. Heck, many books about interpersonal topics can be called “theory.”

Well, I’ve stated what I think of when I hear “software engineering” and “theory” together. Plenty of people disagree. You can call me narrow-minded. More importantly though, I’m only writing about things that I’m an expert in by the standards of industry programmers. If you read a couple of textbooks about any of those other subjects, you’ll probably know more than I do.

One more preliminary: I’ve named 5 different subfields, but the boundary between them gets very blurry. There is a litmus test for whether a topic is part of category theory, namely whether it deals with mathematical objects named “categories,” but, otherwise, most specific topics are studied and used in multiple disciplines, and the boundaries between them are defined as much by history and the social ties of researchers as by actual technical differences.

So, for many things, if you ask “What about topic X,” I could say “X is a part of type theory,” but what I’d really mean is “The theoretical aspects of X are likely to show up in a paper or book that labels itself ‘type theory.’”

In the remainder of this piece, I’ll discuss these 5 disciplines of programming languages and formal methods, and discuss how each of them does or does not provide useful lessons for the informal engineer.

Type Theory

“Type theory” can roughly be described as the study of small/toy programming languages (called “calculi”) designed to explore some kind of programming language feature. Usually, those features come with types that constrain how they are used, and, occasionally, those types have some deep mathematical significance, hence the name “type theory.” It can be quite deep indeed  did you know that “goto” statements are connected to Aristotle’s law of the excluded middle? Outside of things like that though, type theory is not about the types.

Studying type theory is the purest way to understand programming language features, which are often implemented in industrial languages in a much more complicated form. And, for this reason, studying type theory is quite useful.

To the uninitiated, programming is a Wild West of endless possibilities. If your normal bag of tricks doesn’t work, you can try monkey-patching in a dynamic proxy to generate a metaclass wildcard. To the initiated, the cacophony of options offered by the word-salad in the previous sentence boils down to a very limited menu: “this is just a Ruby idiom for doing dynamic scope.”

So many thorny software engineering questions become clear when one simply thinks about how to express the problem in one of these small calculi. I think learning how to translate problems and solutions into type theory should be a core skill of any senior developer. Instead of looking at the industry and seeing a churning sea of novelty, one sees but a polishing of old wisdom. Learning type theory, more than any other subfield of programming languages, fulfills the spirit of the Gian-Carlo Rota quote above, that one should pick a theorizer for a good education.

Perhaps the most profound software-engineering lesson of type theory comes from understanding existential types. They are utterly alien to one only familiar with industrial languages. Yet they formalize “abstraction boundaries,” and the differences between different ways of abstraction such as objects vs. modules are best explained by their different encodings into existential types. In fact, I will claim, however, that it is quite difficult (and perhaps impossible, depending on your standards) to fully grasp an abstraction boundary without understanding how to translate it into type theory. Vague questions like “is it okay to use this knowledge here?” become crisp when one imagines an existential package bounding the scope.

Type theory helps greatly in understanding many principles of software design. I’ve found many times I can teach some idea in a few minutes to an academic that takes me over an hour to teach a layprogrammer. But it is by no means inevitable that picking up a type theory book will lead to massive improvement without also studying how to tie it to everyday programming. It’s easy to read over the equality rule for mutable references, for instance, and regard it as a curiosity of symbolic manipulation. And yet I used it as the kernel of a 2,500-word lesson on data modeling. The ideas are in the books, but the connections are not. And thus, indeed, I’ve encountered plenty of terrible code written by theoretical experts.

Program Analysis

Program analysis means using some tool to automatically infer properties of a program in order to e.g.: find bugs. It can be broadly split into techniques that run the code (dynamic analysis) vs. those that inspect it without running it (static analysis); both of these have many sub-families. For example, a dynamic deadlock detector might run the program and inspect what order it acquires locks, and conclude that a program does not follow sound lock-discipline and is thus in danger of deadlocking. (This is different from testing, which may be unable to discover a deadlock without being exceptionally (un)lucky in getting the right timings.) In contrast, a static analyzer would trace through all possible paths in the source code to discover all possible lock orderings. I’ll focus on static analysis for the rest of this section, where most of the theory lies.

First, something to get out of the way: Static analysis has become a bit of a buzzword in industry, where it’s used to describe a smorgasbord of tools that run superficial checks on a program. To researchers, while the term “static analysis” technically describes everything that can be done to a program without running it, it is typically used to describe a family of techniques that, loosely speaking, involve stepping through a program and tracking how it affects some abstract notion of intermediate state, a bit like how humans trace through a program. Industrial bug-finding tools such as Coverity, CodeQL, FindBugs, and the Clang static analyzer all include this more sophisticated kind of static analysis, though they all also mix in some more superficial-but-valuable checks as well. I refer to this excellent article by Matt Might as a beginner-level intro.

This deeper kind of static analysis is done by a number of techniques which have names such as dataflow analysis, abstract interpretation, and effect systems. The lines between the approaches get blurrier the deeper you go, and I’ve concluded that the distinctions between them often boil down to little things such as “constraints about the values of variables cannot affect the order of statements” (dataflow vs. constraint-based analysis) and “the only way to merge information from multiple branches (e.g.: describing the state of a program after running a conditional) is to consider the set containing both” (model-checking vs. dataflow analysis).

Is studying static analysis useful for understanding software engineering? I’ve changed my mind on this recently.

I think the formal definition of an abstraction as seen in static analysis, specifically the subfield called “abstract interpretation,” is useful for any engineer to know. Sibling definitions of abstraction also appear in other disciplines as well such as verification, but one cannot study static analysis without understanding it.

Beyond that, however, I cannot recall any instance where I’ve used any concept from static analysis in a lesson about software engineering.

Static analysis today is focused on tracking simple properties of programs, such as whether two variables may reference the same underlying object (pointer/alias analysis) or whether some expression is within the bounds of an array (covered by “polyhedral analysis” and its simplified forms). When more complex properties are tracked, it is typically centered around usage of some framework or library (e.g.: tracking whether files are opened/closed, tracking the dimensions of different tensors in a TensorFlow program) or even tailored to a specific program (example). Practical deployments of static analysis are a balancing act in finding problems which are important enough to merit building a tool, difficult enough to need one, and shallow enough to be amenable to automated tracking.

A common view is that, to get a static analyzer to track the deeper properties of a program that humans care about, one must simply take existing techniques and just add more effort. As my research is on making tools easier to build, I recently spent weeks thinking about specific examples of which such deeper properties could be tracked upon magically conjuring more effort, and concluded that, on the contrary, building such a “human-level analyzer” is well beyond present technology.

Static analysis is the science of how to step through a program and track what it is doing. Unfortunately, the science only extends to tracking shallow properties. But there is plenty of work tracking more complex properties: it’s done in mechanized formal verification.

Program Synthesis

Program synthesis is exactly what it says on the tin: programs that write programs. I gave an entire talk on software engineering lessons to be drawn from synthesis. There is much inspiration to take from the ideas of programming by refinement and of constraining the search space of programs.

But, that doesn’t mean you should go off and learn the latest and greatest in synthesis research.

First, while all the ideas in the talk are used in synthesis, many of the big lessons are from topics that do not uniquely belong to synthesis. The lessons about abstraction boundaries, for instance, really come from the intersection of type theory and verification.

Second, the majority of that talk is about derivational synthesis, the most human-like of the approaches. Most of the action these days is in the other schools: constraint-based, enumerative, and neural synthesis. All of these are distinctly un-human-like in their operation  well, maybe not for neural, but no-one understands what those neural nets are doing anyway. There are nonetheless software-engineering insights to be had from studying these schools as well, such as seeing how different design constraints affect the number of allowed programs, but if you spend time reading a paper titled “Automated Synthesis of Verified Firewalls,” to use a random recent example, you’re unlikely to get insight into any aspect of software engineering other than how to configure firewalls. (But if that’s what you’re doing, then go ahead. Domain-specific synthesizers do usually involve deep insights about the domain.)

Formal Verification

Formal verification means using a tool to rigorously prove some property (usually correctness) of a program, protocol, or other system. It can broadly be split into two categories: mechanized and automated. In mechanized verification/theorem-proving, engineers attempt to prove high-level statements about a program’s properties by typing commands into an interactive “proof assistant” such as Coq, Isabelle, or Lean. In automated verification, they instead pose a query to the tool, which returns an answer after much computation. Both require extensive expertise to model a program and its properties.

Mechanized verification can provide deep insights about everyday software engineering. For instance, one criteria for choosing test inputs in unit testing is “one input per distinct case,” and doing mechanized verification teaches one what exactly is a “case.” Its downside: in my personal experience, mechanized verification outclasses even addictive video games in its ability to make hours disappear. As much as programming can suck people into a state of flow and consume evenings, doing proofs in Coq takes this to another level. There’s something incredibly addicting about having a computer tell you every few seconds that your next tiny step of a proof is valid. Coq is unfortunately also full of gotchas that are hard to learn about without expert guidance. I remember a classmate once made a subtle mistake early on in a proof, and then spent 10 hours working on this dead end.

I do frequently draw on concepts from this kind of verification. Most notably, I teach students Hoare logic, the simplest technique for proving facts about imperative programs. But I do it mostly from the perspective of showing that it’s possible to rigorously think about the flow of assumptions and guarantees in a program. I tell students to handwave after the first week in lieu of finding an encoding of “The system has been initialized” in formal logic, and even leave off the topic of loop invariants, which are harder than the rest of the logic. Alas, this means students lose the experience of watching a machine show them exactly how rule-based and mechanical this kind of reasoning is.

Automated tools take many forms. Three categories are solvers such as Z3, model-checkers such as TLA+  and CBMC, and languages/tools that automatically verify program contracts such as Dafny.

have argued before that software design is largely about hidden concepts underlying a program that have a many-many relationship with the actual code, and therefore are not directly derivable from the code. It follows that the push-button tools are limited in the depth of properties they can discuss, and therefore also in their relevance to actual design. Of these, the least push-button is TLA+, which I’ve already written about, concluding there were only a couple things with generalizable software design insights, though just learning to write abstracted models can be useful as a thinking exercise.

In short, learning mechanized verification can be quite deep and insightful, but is also a rabbit hole. Automated verification tools are easier to use, but are less deep in the questions they can ask, and less insightful unless one is actually trying to verify a program.

I’ve thought about trying to create a “pedagogical theorem prover” in which programmers can try to prove statements like “This function is never called unless the system has been initialized” without having to give a complicated logical formula describing how “being initialized” corresponds to an exact setting of bits. Until then, I’m still on the lookout for instruction materials that will provide a nice on-ramp to these insights without spending weeks learning about Coq’s difference between propositions and types.

Category Theory

Category theory is a branch of mathematics which attempts to find commonalities between many other branches of mathematics by turning concepts from disparate fields into common objects called categories, which are essentially graphs with a composition rule. In the last decade, category theory in programming has escaped the ivory tower and become a buzzword, to the point where a talk titled “Categories for the Working Hacker” can receive tens of thousands of views. And at MIT last year, David Spivak taught a 1-month category theory course and had over 100 people show up, primarily undergrads but also several local software engineers.

I studied a lot of category theory 8 years ago, but have only found it somewhat useful, even though my research in generic programming touches a lot of topics heavily steeped in category theory. The chief place it comes up when teaching software engineering is a unit on turning programs into equivalent alternatives, such as why a function with a boolean parameter is equivalent to two functions, and how the laws justifying this look identical to rules taught in high school algebra. The reason for this comes from category theory (functions are “exponentials,” booleans are a “sum”), but it doesn’t take category theory to understand.

Lately, I’ve soured on category theory as a useful topic of study, and I became fully disillusioned last year after studying game semantics. Game semantics are a way of defining the meaning of logical statements. Imagine trying to prove a universally-quantified statement like “Scruffles is the cutest dog in the world.” Traditional model-based semantics would say this statement is true if, for all dogs in the world, that dog is either Scruffles or is less cute than Scruffles. Game semantics casts this as a game between you (the “Verifier”) and another party (the “Falsifier”). It goes like this: They present you another dog. If you can show the dog is less cute than Scruffles (or rather, they are unable to find a dog for which you cannot do so), then the statement is true and you win. Otherwise, the statement is false.

(I admit that having this kind of conversation has been my main application of game semantics thus far.)

I started reading the original game semantics paper. The first few sections explained pretty much what I just told you in semi-rigorous terms, and were enlightening. The next section gave category-theoretic definitions of the key concepts. I found this section extremely hard to follow; it would have been easier had they laid it out directly rather than shoehorning it into a category. And while a chief benefit of category theory is the use of universal constructions that allow insights to be transported across disciplines, the definitions here were far too specialized for that to be plausible.

There’s a style of programming called point-free programming which involves coding without variables. So, instead of writing an absolute value function as if x > 0 then x else -x, you write it as sqrt . square, where the . operator is function composition. Category theory is like doing everything in the point-free style. It can sometimes lead to beautifully short definitions that enable a lot of insight, but it can also serve to obscure needlessly.

I’m still open to the idea that there may be a lot of potential in learning category theory. David Spivak told me “A lot of what people do in databases is really Kan extensions” and said his collaborators were able to create an extremely powerful database engine in a measly 5000 lines of Java. Someday, I’ll read his book  and find out. Until then, I don’t recommend programmers study category theory unless they like learning math for its own sake.


So, here’s the overall tally of fields and their usefulness in terms of lessons for software design:

  • Type theory: Very useful
  • Program Analysis: Not useful, other than the definition of “abstraction.”
  • Program Synthesis: Old-school derivational synthesis is useful; modern approaches less so.
  • Formal Verification: Mechanized verification is very useful; automated not so much.
  • Category theory: Not useful, except a small subset which requires no category theory to explain.

So, we have a win for type theory, a win for the part of verification that intersects type theory (by dealing with types so fancy that they become theorems), and a wash for everything else. So, go type theory, I guess.

In conclusion, you can improve your software engineering skills a lot by studying theoretical topics. But most of the benefit comes from the disciplines that study how programs are constructed, not those that focus on how to build tools.

Appendix: Learning Type Theory

Some of you will read the above and think "Great," and then rush to Amazon to order a type theory book.

I unfortunately cannot endorse that, namely because I can't endorse reading textbooks as a good way to learn type theory.

The two main intro type theory textbooks are Practical Foundations for Programming Languages (PFPL) and Types and Programming Languages (TAPL). I'm more familiar with PFPL, but I regard them as pretty similar. Basically, both present the subject as pretty dry, consisting of a list of rules, although they try to add excitement by discussing their significance. The rules have a way of coming alive when you try to come up with them yourself, or come up with rules for a variant of an idea found in a specific programming language. This is how I tutor my undergrads, but it's hard to communicate in book format.

So, what's the right way to learn type theory on your own? I don't have one. The best I can share right now is the general advice that following a course website that uses a book is often better than using the book itself. The best way for practitioners to learn type theory has yet to be built.