Friday, September 30, 2022

How an Ancient Philosophy Problem Explains Software Dependence

Update Oct 30, 2022: Professor Daniel Jackson wrote a response to some of the criticism this post experienced on Hacker News.

This post is based on a research paper at Onward! 2020 and loosely follows the talk I gave at that conference.

Have you tried to remove a dependency from a project and been unsure if you have successfully done so?

To “invert a dependency?”

To keep the core of a codebase from depending on something specific?

Or to “depend on interfaces, not implementation?”

If you’re a programmer, you probably have. If you’ve been doing this for a while, there’s a good chance you’ve gotten into a disagreement about whether you’ve successfully done so. Dependence is one of those words where everyone thinks they know what it means (Clean Code uses the word over 100 times without defining it). But as easy as it is to speak broadly about limiting dependencies, uncertainty about what it means leads into uncertainty in actual jobs.

By the end of this post, you will learn an objective definition of dependence for software engineering. This definition encompasses all places where engineers talk about dependence, from package management to performance engineering and even usability, and it’s sufficiently rigorous to be mechanically checkable.

By “mechanical,” I do not mean automated, nor even easy. It’s relatively easy to write a function where deciding whether it can interact with another function requires solving a famous unsolved math problem. But with this new understanding in hand, and some amount of thinking, next time you have to ensure that customer privacy does not depend on a third-party library, you should be able to explain exactly what that means and what must be done.

Background


   A dependence question raised in "Oracle v. Google"


We programmers talk about dependence all the time. Like, “my package has Postgres as a dependency but it doesn't depend on it until it's used, except then it still never depends on it because I'm database-agnostic.” If you ponder that sentence, you may realize that every use of the word “depend” has a different meaning. We know that limiting dependence is a good thing, but without really understanding what it means, it's advice doomed to break down — citrus advice.

A few years ago, Daniel Jackson and I set out to figure out what “dependence” really means in software. I daresay we were extremely successful. We have a general definition that specializes to each of the varieties of dependence hinted at in the previous paragraph, and then some. With it, you can definitively (though not always feasibly) answer whether you've managed to remove a dependence. Trying to apply the definitions raises a lot of subtlety, but we found we can address them by stealing algorithms from a certain subfield of (wait for it) computational philosophy. We first published our findings in a paper at the Onward! 2020 conference, where it was greeted enthusiastically by peer reviewers, becoming the only paper that year accepted without revision.


   Image by DALL-E 2, “complex clock mechanism hidden in mist”

This blog post is loosely based on my previous effort to distill our findings, my Onward! 2020 talk. Loosely. The words here may occasionally match what is said in the video, and the numbers next to the slides illustrations below will usually go up. The Onward! conference is a funny place, full of people who fly around the world to learn about the latest in static analysis and type systems, but then ditch that go hear about pie-in-the-sky ideas that don't work. (Though it's also the home of some of my favorite software-engineering essays ever.) I will make no assumption in this blog post that you've ever had reason to care about “static and dynamic semantics.” On the other hand, you might follow the example better than the original audience if you've been exposed to an “analytics framework” by means other than through the window of an ivory tower.

Anyway, let's go learn about dependence. First: why it's hard?

Depending on Analytics

As a running example, I have an iPhone app. It uses an analytics framework, but it doesn't use it directly. I've interposed an analytics abstraction layer, putting some kind of delegator between them. Now my question is, does the app depend on the analytics framework?

I'm trying to argue that the world didn't previously understand dependence, so I'd like to find some contradictory statements about dependence in well-known books. Problem is, these books mention dependence all the time, but rarely say anything concrete enough to find a contradiction. Still, I found this:

Looking at the quotes in the slide, Martin Fowler seems to think that having this interposed delegator removes the dependence. I believe Robert Martin is saying that if something important depends on my choice of analytics framework, then it should be referred to directly in the program text. But my first reading suggests something in tension: one saying that the analytics abstraction layer removes the dependence and hinting that's a good thing, and the other stating that it's bad. This is a pretty interesting contradiction from Robert Martin Fowler, who are quite close as far as software engineering gurus go. Of course, I'm not sure what they're actually advocating, which is part of the problem with existing terminology.

Branching out a bit, I'll let you make your own judgment on the following quote.

In the old Web site with the background specified separately on each page, all of the Web pages were dependent on each other.
—John Ousterhout, A Philosophy of Software Design

(Ousterhout did remove one misuse of “dependence” in the newer versions after I pointed it out. But I found more. Note that I still consider it a good book, even though I've criticized it a lot.)

To really draw out the confusion and mist around the idea of dependence, we prepared nine dependency puzzles.

The 9 Dependency Puzzles

  1. The Framework Puzzle: We usually think that, if A calls B, then A depends on B.

    But consider a round-robin scheduler: it invokes every task to be scheduled. Does it depend on each of the tasks?

  2. The Shared Format Puzzle: If A writes a file and B reads it, they are clearly coupled because of the shared format. But which depends on which?

    You also see this when a client sends messages to a server; if either end changes, the other will break. Do they both depend on each other?

  3. The Frame1 Puzzle: We like to think that, if A does not depend on B, then a change to B cannot affect A.

    But what if you change B in a way that introduces a new dependence, like modifying a global variable used by A?

  4. The Crash Puzzle: As a more extreme version of the previous: A change to any module can affect a change to any other module, namely by crashing the program. So, by the “A does not depend on B if a change to B cannot affect A” rule, everything depends on everything else.

  5. The Fallback Puzzle: A website loads a resource from a CDN but hits a server if unavailable; an app uses a cheap-but-flaky payment processor when it can but switches to a more established one when it's down; an analytics service tries to connect to multiple servers until it finds one that works; a media player that uses hardware video-decoding when available but software when not — these all have the structure “A uses B but falls back to C.”

    With fallback, the failure of B does not imply the failure of A. Does this mean that A does not depend on B?

  6. The Inversion Puzzle: Dependency inversion is touted for removing dependencies: instead of A statically invoking module B, the program passes A a reference to B at runtime. But A will still fail if B fails. So is there no change in the dependence structure?

  7. The Self-Cycle Puzzle: We usually think of dependence as transitive: A cannot function without B, B cannot function without C, therefore A cannot function without C.

    We also complain about dependency cycles: A depending on B and B depending on A is bad, but it can happen.

    Now, putting these together: What does it mean for A to depend on A?

  8. The Higher-Order Puzzle: Libraries should not depend on application code. But, from one perspective, they clearly do: a hash table implementation will give incorrect output if keys have inconsistent implementations of equals() and hashcode() methods (that is, if equal values have different hashcodes). Do hash tables and other library classes depend on their callers?

  9. The Majority Puzzle: There was a fad a few decades ago called “n-version-programming.” The idea of N-version programming: if 5 people write the same program, then there's no way they'll exhibit any of the same bugs, right? So you can have 5 people implement the same (very detailed) spec for a robot controller, and have the programs vote what to do at each timestep; so long as no 3 people have the same bug, it should always function according to spec, right?

    Of course, Nancy Leveson showed empirically that this doesn't work and the fad died, but it presents an interesting hypothetical: Suppose all 5 controllers implement the spec perfectly, and thence all 5 always vote for the same action. Then no change to any single implementation can alter the robot. Does this mean that the robot does not depend on any of the controllers?

We'll be answering all these puzzles at the end. But, coming up next: understanding why the sentence “Does A depend on B?” can actually stand for one of many, many different questions.


Types of Dependence

Let's revisit the analytics example and ask “does the app depend on the analytics framework?” There are several questions here.

Well, not quite. There's a tradition in the Jewish holiday of Passover that the youngest always chants a Hebrew verse titled “The Four Questions.” The joke is that it's actually one question (“Why is this night different from all other nights?”) with four answers. Likewise, “Does it depend” is just one question, but here's five answers, as seen in the image:

  1. No, I can build them separately.
  2. Yes, a crash in the framework will be a crash in the app.
  3. No, because I can just use a different framework that changed the code and I will still have working analytics.
  4. And no because I'll get the same resul-.....actually, no I won't. It's an open secret that no two analytics frameworks give the exact same numbers. Ideally they'll be close enough for all purposes other than choosing what to say when boasting about number of users. But maybe one will give more statistical accuracy, so, realistically, yes.
  5. Oh my god yes, because any security vulnerabilities in the framework will be security vulnerabilities of the app as well.

But yeah, these are definitely answering different questions.

Dimensions of Dependence

To make a question about dependence concrete, you must specify three parameters: the semantics or execution model in question, the property of interest, and the space of permitted changes.




Dimension 1: Semantics

Let's go back to that agonizing sentence from the intro. “My package has Postgres as a dependency but it doesn't depend on it until it's used, except then it still never depends on it because I'm database-agnostic.” Change a couple words, and it fits the analytics example as well. So I said before that this sentence uses “depend” in three different ways1. It's not just that they're answering different questions. These are actually different models of execution.

  1. When talking about importing a dependency, the “execution” engine of interest is the compiler or build system: is the program even well-formed, or is there a build error? This roughly maps onto the more theoretical term “static semantics” or “type system.”

  2. When saying that actually invoking a package is what gives a dependence, the execution engine in question is an interpreter, or the combination of the compiler + a CPU. The focus is now on whether the code actually runs and contributes to the overall program output. “Dynamic semantics” is a general theoretical term for what actually happens when running.

  3. When saying that a program would work even after switching to a different library, the “execution engine” in question is actually the reasoning used to convince yourself or someone else that it's correct / achieves its goal, based only on the guaranteed properties of the analytics abstraction layer. If you buy that that human reasoning is analogous to automated reasoning, then the theoretical term that captures this is the “program logic,” the proof system used to show the program achieves its intention.

This lists just 3 possible execution models, but there are infinitely many more, a few of which are discussed in the paper. For example, the property of whether text is readable could be phrased in Visual Logic.

We can already start to categorize the answers about the analytics framework accordingly. Inability to build separately refers to the well-formedness / static semantics. When a crash in one is a crash in the other, that's the interpreter / dynamic semantics. The ability to swap it out and still have “working analytics,” that's the correctness argument / program logic.

Of course, the last two answers, about accuracy and security, also sound like correctness arguments. So it's not enough.

Dimension 2: Properties

You might notice from the last answers about whether it's secure and whether it has working analytics: those are actually different properties.

You should never ask, does module A depend on the module B? But rather ask, does some property P of A depend on B?

We can now classify the answers better by looking at the property. It does depend in the well-formedness property, does depend on runtime behavior, does not depend on correctness, does depend on statistical performance, and does depend on security.

But there's a tension here. I'm saying that there's no dependence in whether it works, except that I can hack into some versions and not others and make them not work. So maybe correctness does depend? We'll address that soon.

In the meantime, we're on our way to resolving some of the dependency puzzles puzzles. Back to the Framework Puzzle: Does a round-robin scheduler depend on the individual tasks? It clearly does for many runtime properties. But for the correctness of the scheduler, it shouldn't....

...unless it's a non-preemptive scheduler and a task doesn't terminate and hogs all the scheduler time. So yeah, maybe.

We've now seen two examples where we want to say that there's no dependence on a component, except that component can “break out of the box” and mess up the rest of the system. That's where the third dimension comes in.

Dimension 3: Permitted Changes

A property of dependence we'd like to have: if A does not depend on B, then no change to B can affect A. Actually, this sounds like it would make a nice definition. When Dale Carnegie says “happiness doesn't depend upon who you are or what you have; it depends solely upon what you think,” he seems to be saying that merely changing whether one owns a Lamborghini would not alter whether they have the property of being happy; they must also change their belief about whether they own a Lamborghini. But we've seen this apparently violated in the Crash Puzzle described earlier, which can be instantiated to this scenario quite literally: owning a Lamborghini but believing otherwise might get you in a fatal crash, which is detrimental to happiness. More generally: how can we say “no change to B can affect A” when one can “change B” by replacing it with a program that destroys A.

So we're left with this tautological definition: if A does not depend on B, then no change to B can affect A, unless that change introduces a dependence. It's easy to say “we should just exclude these kinds of changes. Like, Signal should be able to say they don't specifically depend on Twilio to work, as opposed to anything else they could use for sending SMS. But they announced a Twilio-based security incident while I was writing this, so sometimes these kinds of changes really matter.

So, here's an idea: how about sometimes we exclude these kinds of changes, and sometimes we don't?

Or, in other words: every conversation about dependence implicitly carries a reference frame about which kinds of changes of reasonable to consider. When you're asking about whether your app's stability depends on an analytics framework, you should get a different answer based on whether you're interested in whether an API change can cause your code to raise an exception versus whether you're wondering whether someone can exploit a vulnerability to brick your customer's phone.

After the execution model and the property of interest, this “space of reasonable changes” is the third parameter determining a dependence question. Sometimes this is a spec: the mantra “depend on interfaces, not implementations” seems to mean that, if your program uses a function, then any change to that function's implementation within the space of “changes that don't alter the spec” should not change your function's correctness. So you could call the “space of reasonable changes” a spec. But that would be misleading, because if you want to ask “Could the library authors mess me up by removing this guarantee,” you are actually asking about possible changes to the library's already-existent spec. In that case, the space of permitted changes is actually a spec on the spec, and so the best name we came up for this is “super-spec.”

Now we can totally resolve the Framework Puzzle: Forbid tasks that don't terminate, and there's clearly no dependence. And the Crash Puzzle: Forbid changes that make the component crash, and there's no dependence.

But there's still the Fallback Puzzle. Based on what we've said so far, we really have no clue. Intuitively, I'd want to say that my payment succeeding depends on the payment processor that was actually used. But if the only thing that a change to that processor can do is bring it down, then no change to it can stop my payment from going through, because the system would just use the other processor. So now it sounds like, if your system has a fallback component for everything, then it has no dependencies!

More abstractly, we're asking questions like “I used it, but I had another one, so did I really need it?” If you're thinking that this sounds like something philosophers ask, then you're right.

Dependence is Causation

So I foreshadowed this one by suddenly talking about happiness and Lamborghinis: everything I've been talking about can also apply to everyday things in the physical world. In fact, all of the dependency puzzles are isomorphic to a physical causality question.

The Framework Puzzle with the round-robin scheduler and how long tasks take? That's a lot like stacking boxes; the contents of the box don't matter unless they're really heavy. The Crash Puzzle, where everything depends on every other component because they can crash? That's a lot like saying that your ability to read this right now depends on what every single one of your neighbors is doing, which is, apparently, not cutting your electricity. And the Fallback Puzzle? That's a lot like asking if the grass being wet right now is caused by it just having rained when the sprinkler was programmed to come on if it didn't.

More specifically, there are actually two categories of causality. There's “type causality" or “general causality,” which deals with general statements about categories such as “A stray spark causes fire." And then there's “actual causality" or “token causality,” which deals with statements about specific events, such as “A stray spark in Thomas Farriner's bakery on Pudding Lane caused the Great Fire of London on September 2nd, 1666." As dependence deals with questions about specific programs and specific scenarios, actual causation is the one relevant here.

There have been several proposed formal definitions of actual cause, and there is no standard for determining whether a definition is correct—only arguments that it is is useful and produces answers that match intuition. The basic one is the “naive but-for” definition: A causes B if, but for A having been true, B would not be true. This is the definition used by lawyers, and is the reason you can be charged with murder if a cop falls to their death while chasing you, because you “caused” their death. But when applied to the rain and sprinkler, it tells you that the rain cannot be a cause of the wet grass, which is counter to intuition.

But maybe that intuition is because rain and sprinklers are actually pretty different. There are isomorphic questions, such as “Is my apartment cool because I turned on the AC at 6:58 when it was scheduled to turn on at 7:00,” for which the intuitive answer is no. Similarly, my intuition says that my credit card succeeding did depend on which payment processor was actually used. But I can't be so moved to care which of many redundant servers my browser is connected to when writing this. So while much of the literature on actual causation examines the shortcomings of the “naive” but-for definition and focuses on examples like the sprinkler, we've decided that the but-for definition is actually pretty good, and the problems mostly come up when theorists play with models of rain and sprinklers that ignore their differences.

This means that I get to spare you the debate about the definitions of actual causation. If you really want it, you can have a look at our paper for a brief intro. For more, well, Joseph Halpern has an entire book on this subject.

Anyway, this leads to the finale, a general definition of dependence:

A property P definable in some execution model of a program depends on component A if and only if A is an actual cause of P, with respect to a space of permitted changes to A.

Using the but-for definition of causation, this becomes:

A property P definable in some execution model of a program depends on component A if and only if there is a permissible change to A which alters whether P holds.

Puzzling No More!

And now, final definition in hand, we can go back and definitively answer all of the dependency puzzles. And by aping the causality literature, we get some new concepts for free. For some questions, we get the answer “it's not a dependence, but is part of the dependence,” stealing the concept of “part of a cause.”

  • The Framework Puzzle: Asking about dependence of a specific property resolves the confusion. It is true that the runtime behavior of a round-robin scheduler, particularly timing, does depend on the tasks being scheduled. But its correctness does not.
  • The Shared Format Puzzle: There are actually two resolutions to this puzzle. For the first, we again look to a specific property.

    The serializer/deserializer should satisfy a round-trip property, that serializing a value and then deserializing it results in the original value. This round-trip property clearly depends on both. Through another lens, we can define correctness of each end separately, in terms of an abstract description of the format. Then the correctness of each depends on the shared format, but not on the other end.

    In both cases, the serializer and deserializer do not depend on each other, though they are coupled. 
  • The Frame Puzzle: Super-specs resolve this puzzle! If it is reasonable for B to modify a variable used by A, then A already depends on B, and a programmer would indeed need to read the source code of A to check that it does not interfere with B.

    But if one imposes a super-spec on B forbidding it from modifying such extraneous state — in mundane terms, if your tech lead would yell at you for changing it to do such mutation — then the dependence, and with it the need to read, disappears.
  • The Crash Puzzle: This one's just an extension of the previous. Super-specs again come to the rescue. If you consider crashing changes to any module, then indeed the runtime behavior of every module depends on every other module. But if crashing changes are out of consideration, then this is not so.
  • The Fallback Puzzle: These fallback scenarios are isomorphic to examples from physical causality, such as “If it doesn't rain, the sprinkler will turn on [ergo, the grass-watering procedure falls back to a sprinkler].” And so, concepts from causality dissolve the confusion. There are two resolutions.

    The first: a dependence doesn't have to be a single module! We can say that the only reasonable change to the CDN (i.e.: its superspec) is to knock it offline. Because of fallback, that can't prevent an image to loading, so the image loading indeed does not depend on the CDN. But simultaneously knocking the CDN and the fallback server offline can! So, the dependence is actually on the set {CDN, fallback server}, and the CDN itself is part of a dependence.

    But another resolution is to ask: is it really the same to use B vs. falling back to C? In the case of loading an image from a CDN, perhaps not, and it's intuitive to say that you loading a website just now did not depend on its CDN. But for receiving a payment through SketchyCheapProcessor vs. StodgyOldProcessor, it would be an expensive mistake to treat them as the same!
  • The Inversion Puzzle: Thinking about the lens of multiple tools resolves this one. To the interpreter, even after doing dependency inversion, there is a dependence (i.e.: runtime behavior does depend). But to a compiler or typechecker, the dependence has been removed (i.e.: well-formedness does not depend). 
  • The Self-Cycle Puzzle:

    Insights from the causal-inference literature resolve this one as well. Surprisingly to many, causality is not transitive, and hence dependence is not either. Simple, if facetious example: Seeing an advertisement for pizza causes me to press buttons on my phone. Me pressing buttons on my phone causes your phone to ring. Me seeing an advertisement for pizza does not cause your phone to ring (as it causes me to press the wrong buttons). Discussing exactly when and why causality is not transitive can get pretty technical, and is sensitive to exactly how one translates a statement into formal language. (Indeed, you may have already prepared a valid reason why my pizza example should not count.) If you're interested, I'll refer you to Halpern's “Sufficient Conditions for Causality to Be Transitive.”

    Of course, many kinds of dependence are transitive, such as build dependence. But this does mean that you don't need to bend your brain around self-dependences; just use the definition in this article.
  • The Higher-Order Puzzle: Actually stating the property of interest resolves this one. Every function's correctness is defined with a precondition: if the inputs are valid, then the outputs are as desired. In the case of a hash table, the input includes the keys' equals() and hashcode() functions, and these are only valid if their implementations are consistent. If you build a hash table on keys whose equality is determined by coin-flipping, then getting bad output is not the hash table's fault. So the hash table's correctness does not depend on any specific implementation of equals() and hashcode(), but its runtime behavior sure does.
  • The Majority Puzzle: This is another puzzle ripped straight out of the causality literature, taken from examples on (wait for it) majority vote! The idea of being “part of a dependence” returns to resolve this one. Indeed, the robot's behavior does not depend on any single one of the 5 controllers, but it does depend on each “winning coalition” of voting programs; that is, each subset of size 3.

And now we can go back and un-confuse the Fowler and Martin quotes from the beginning. Fowler is talking only about static/build dependence. Martin Fowler is talking about how inserting your abstract delegate factory removes the build dependence. Robert Martin seems to be saying that if the correctness property depends on which analytics framework you're using, then you shouldn't be using this abstract delegate factory. You should actually have direct reference textually to make the well-formedness depend. Now we can see this piece of advice are actually not in contradiction, that it can still be good to insert this analytics subtraction framework when the choice of analytics framework does not matter, when the correctness property is not affected by the choice. But when you are relying on a single choice, then: not such a good idea.

Coupling vs. dependence

I mentioned coupling briefly above. Where does coupling fit in? I think this story about the connection between causation and dependence also explains coupling: dependence is to causality as coupling is to correlation.

A common view in causality is that the connection between causality and correlation is given by Reichenbach's principle: that if A and B are correlated (i.e.: knowing one gives some information about the other), then either A causes B, B causes A, or there is some common cause C. As an addendum, correlations can also be observed if one limits observations to situations where some common effect of A and B occurs. For example, if you have a bookshelf, you'll likely find that the more useful books tend to be less interesting — not because of some intrinsic trait, but because you probably don't buy books which are neither interesting nor useful. This is Berkson's paradox.

I propose that there is a software analogue to Reichenbach's principle: that if A and B are coupled (which I take to mean: tend to change together), then either A depends on B, B depends on A, or they have some common dependency. And there is a software analogue of Berkson's paradox too: if one is trying to preserve some property that depends on both A and B, then a change to A will be accompanied by a change to B, as in the case of changing a serializer while trying to preserve the round-trip property.

Conclusion

Dependence is something everyone assumes they know, but that understanding breaks down quickly. It may be pretty cool to connect dependence and causality, but what does learning this mean for actual programming? I say there are three benefits, although they can be said to all be part of the same benefit.

The first benefit: it ends debates and clarifies confusing discussions. See our alteration to the Martin Fowler and Robert Martin quotes above.

The second: it paints a target. When I started the Thiel Fellowship in 2012, I had to write a letter from my future self in 2014 explaining my accomplishments. I stated that I would have a tool that could globally chase down every place in the code that depends on some assumption and remove that assumption. It's 2022 and I still don't have one, but at least the problem is defined now, so I know where to begin.

And, of course: knowing what dependencies are helps you remove them. For example, Parnas teaches us to not let general things depend on specific things. Just as there are ways to trick yourself into thinking you've removed an if-statement without really doing so, there are ways to trick yourself into thinking you've removed a dependence — say you only removed a static dependence, but you actually wanted to remove the dependence of some correctness property.

And there's one dependency I especially want to remove: the dependency of the ability to produce quality code on indescribable experience and pattern-matching.

I'm thinking of a story from a recent student of my Advanced Software Design Course, whom I'll call Karl. Karl is the lead software engineer of a startup in Northern Europe. He noticed some design flaws in some code from one of his junior engineers, and started to refactor it, changing 1800 lines across 10 PRs. Hoping to make it a learning experience, Karl walked the engineer through it until Karl thought the engineer understood, so that he could finish it himself. Karl assigned the rest of it to the engineer, and then came back to find...

...the code had been un-refactored into the original form.

What happened is that Karl built understanding of what was wrong with the original code and good about the refactoring via a mechanism whose effectiveness depends on building an intuition from a lot of experience. Although it served Karl well as an engineer, it failed him as a lead; it was not transmittable. That motivated him to come to me.

When a concept teaches you how to identify the best design decision and explain it, it's obviously useful. But I think there's a subtler benefit that comes from understanding your actions, like going from following recipes to being a chef. I hope this understanding helps your subjective experience of programming move from climbing a mountain to enjoying the view at the top.

1 “Frame” is a term used in program analysis to mean “the set of values a code snippet may modify,” and more generally in AI and philosophy to mean “the set of beliefs that may have to be updated in response to an action.” See Wikipedia or Daniel Dennett’s delightful essay.

2 If you've read The Three Levels of Software: The dynamic semantics here correspond to both the runtime and code levels in that blog post, depending on whether you're considering one execution vs. all possible executions. The level of modular reasoning maps perfectly onto the “program logic” model here. The static semantics here was not discussed at all in that blog post, because there's no need for new terminology to discuss build errors.


Thanks to Nils Eriksson for comments on earlier drafts of this blog posts. Thanks to Daniel Jackson for being a great coauthor helping me create the original content, and to the anonymous reviewers at Onward! 2020 for their feedback helping us to refine it.

Wednesday, September 21, 2022

Solving the Dog-Bunny Puzzle with Program Verification Technology

As a high schooler in the 70’s, my father enjoyed playing the Star Trek game written for the Sigma 7 mainframe. You play as the Enterprise surrounded by Klingon ships, and to shoot them down you have to look at a grid and figure out which angle (in degrees) to fire at.

  

It was fun right until he learned trigonometry.

Three days ago, Conrad Barski, creator of the absolutely delightful book/comic Land of Lisp, took the Internet by storm with a new puzzle game. In this puzzle, three characters move between rooms and press buttons, where each button closes or opens other doors. At least, that’s how it might appear in a traditional video game. Phrased that way, it may remind you of many other games; I’m reminded of the famous Flash game Fireboy and Water Girl and the obscure student project Lineland. But in its actual presentation, the Dog-Bunny Puzzle abstracts this mechanic into a thing someone can understand with no instructions in 60 seconds, while also being surprisingly difficult.

I spent a minute or two playing around and discovered that it would actually take serious thinking to solve it. I was about to put it down to get back to my work, writing design challenges for software engineers, when I realized this puzzle could fall to a sledgehammer a bit less humdrum than trigonometry, something I first learned about in a lecture on how to automatically fix bugs in concurrent programs.

You see, the puzzle could be modeled with something called a Petri net. And then I could run a Petri-net solver to solve the puzzle.

In fact, modeling it is so straightforward that I’d go so far as to claim the puzzle is a Petri net.

Following in my father’s footsteps, I shall now proceed to destroy the fun in this puzzle by replacing it with an intellectual challenge.

What is a Petri Net?

A Petri net is a graph with a bunch of tokens on each node. When certain conditions are satisfied, you are allowed to move tokens from one node onto another node. If this already sounds like the Dog-Bunny Puzzle, then, well, I did say the puzzle is a Petri net, right?

They were supposedly invented by Carl Adam Petri when he was 13, and are widely used in the field of formal verification. Unlike Petri dishes, invented a century earlier, they cannot be used to grow bacteria. But they can be used to model cellular activity in bacteria.

A Petri net has three components: places, transitions, and tokens. Consider this Petri net:

  • The four circles are places.
  • The small black circles are tokens.
  • The rectangles are transitions.
  • To fire a transition, you remove one token from each of the incoming places, and add a token to each of the outgoing places. So, in this picture, to fire T1, you would remove a token from place cp1 and add a token to place p1.
  • If all of the incoming places for a transition have tokens on them, then it is possible to fire that transition, and we call it enabled. Otherwise, it is disabled. In this picture, T1 is enabled, while T2 and T3 are disabled.

And….that’s basically it. But they can do a lot. Places can represent lines in a program, states in a protocol, or molecules in a chemical reaction. Tokens can represent which line a thread is on, or some shared resource like a lock. And then the same general tools and algorithms can be applied to each of them.

There are several bajillion extensions to the basic Petri net model. For this puzzle, we need colored Petri nets. All that means is that there are now two types of tokens instead of one: dog tokens and bunny tokens.

I went searching for a colored Petri net solver, and found what looks to be the only game in town: a pretty serious package called CPN Tools. This was going to be fun. I eagerly went to the downloads page and found the Mac section.

We recommend you to download the latest stable Windows version of CPN Tools and run it using a virtual machine. See instructions for doing this on a Mac here (the instructions for doing so on Linux should be similar). You can also run CPN Tools using Wine. 

Fun indeed.

Modeling the Dog-Bunny Puzzle in CPN Tools

You ever hear complaints about UIs designed by programmers? You know you’re using one when there are two colors and two hundred buttons.

CPN Tools feels more like it was written by someone whose parents forced them to go to tech school instead of art school. The only thing you can do is drag pictures onto the screen, and the primary effect of doing so is to cover your window in pastel-colored bubbles. While conformist programs let you save by either going to File->Save or clicking an icon of a floppy disk, in CPN tools you get to click an illustration of a Petri net moving onto a floppy disk, and then drag it onto the thing you want to save. A little green comic-book speech bubble pops up to tell you your save is successful, except that there’s no actual speech in the bubble, because artwork that speaks to the head misses the soul, or something. The manual paints a sweeping Impressionistic picture of the software, with the details left to the reader’s imagination.

But enough of that. Time for you to sit back and learn some formal methods.

The first step is to define the places. We’ll have one place in the Petri net for each place in the Dog-Bunny Puzzle. I’ll set them up with roughly the same geometry as in the puzzle.

Now we start on the transitions. There are unconditional unidirectional transitions from the Bone and Flower to the Boat, and from the Carrot to the Tree. That is, a dog or bunny can move from the Bone to the Boat at any time, no matter what, but not vice versa. These are transitions with one incoming and one outgoing arc each, removing a token from one and adding it to the other.

There are unconditional bidirectional edges between the tree and well, and the well and flower. That means, at any time, a dog or bunny can move between these in either direction. Petri nets don’t really have bidirectional transitions, so we have to model these as a unidirectional transition in each direction.

Now we turn to the conditional edges.  There’s an edge from the House to the Bone that can only be crossed if something is at the carrot. This can be modeled: there’s a transition that removes a token from the House and Carrot, and then adds tokens to the Bone and Carrot. So the net effect of firing this transition is to move a token from the House to the Bone, but it’s only enabled if something is on the Carrot. I’ll call the arcs between this transition and the carrot “enablement arcs.”

This is the point where the model starts to get graphically ugly even as it remains conceptually simple. There’s a reason the puzzle just represents these relationships with text. From now on, I’ll be drawing all these enablement arcs in white so you can pretend they’re not there. Let’s finish these: There transitions from House to Boat and vice-versa which require something on the tree, and ones between Tree and House which require a token both on Bone and Flower.

Now, the last transition is a little different: A token can move from Well to Carrot only if there is nothing on Bone. This one sounds painful to model the way we’ve been working. One option is to add a transition from Well to Carrot enabled by House, and another one from Well to Carrot enabled by Boat, and so on for every place except for Bone. Fortunately, CPN Tools supports inhibitor edges: I draw a transition from Well to Carrot with an inhibitor edge from Bone, and it behaves exactly the way we want. Here’s the model, with the inhibitor edge in red, and all enablement edges in white. I’ll include a little bit more of the UI in the screenshot this time.

Now, it’s time to add the tokens. The “colors” of tokens in CPN tools can have a lot of structure: they can be numbers, booleans, or richer data structures. But here, we just need two types of tokens: dogs and rabbits. These tokens are interchangeable except for the victory condition, so they’re really two variants of the same type, at least as far as CPN Tools is concerned. The “Declarations” pane to the left has a standard list of “color sets.” I’ll add a new one:

colset ANIMAL = union DOG | RABBIT;

Now it’s time to add the dog and rabbit tokens to the graph. We’ll need to change the type of every place to ANIMAL, and then we can add the tokens. How do we do it? If you guessed “Press the TAB key twice and then type 1`RABBIT”, being careful not to balance the backquote,  then you are absolutely right.

CPN Tools at this point has decided that my model must be for some kind of performance, and decided it needs an accompanying laser show. 

The reason for the errors on each arc is that, because tokens are no longer interchangeable, when tokens enter and leave a transition, I need to specify which is which. If I have a dog on the House, I need a way to specify that the transition moves the dog to the Bone when there’s a rabbit on the Carrot, not that it moves the rabbit to the Bone and the dog to the Carrot.

To do this, we declare three variables under the ANIMAL color set.

   var mover : ANIMAL;
   var constraint1 : ANIMAL;
   var constraint2 : ANIMAL;

Now, we label every single arc with one of these variables. If a transition has incoming arcs labeled mover and constraint1, then the token which travel across those arcs get bound to mover and constraint, respectively. Then there are outgoing arcs also labeled mover and constraint1, indicating which token goes where. So we’ll label most arcs with mover, but enablement arcs with constraint1 and constraint2.

One problem though: I can’t find a way to hide the labels. To save space in the diagram, I’ll rename these variables to p, q, and r.

And…the model is done! We can now plan with it in CPN Tools’s simulator mode, and see that it follows the same rules as the original Dog-Bunny puzzle.

One last thing though: Before we can start to use the solving features of CPN tools, we need to give each transition a unique name. I’ll just name them a, b, c, d, etc.

Solving the Dog-Bunny Puzzle with CPN Tools

It is now time to solve.

Every arrangement of tokens on places forms a state. In the start state, there is a rabbit on the House, a rabbit on the Boat, and a dog on the Tree. In one possible successor state, there are two rabbits on the House. In a different successor state, there are two rabbits on the Boat. From the start state, we seek to reach a state with the rabbits on the Carrot and the dog on the Bone.

You might think we could try dragging the rabbit tokens to the Carrot, the dog tokens to the Bone, and then ask “How do we reach this state?” Or perhaps by typing a query into some kind of query box. But that would be neither artistic nor technical enough.

We make the procedure technical by writing code in asking for which state has the properties  of the dog and rabbits being in their proper positions. We use Standard ML, as the language endorsed by a certain breed of university professor who thinks they know programming better than you do. The important part of this snippet is the lambda beginning on the third line; the rest is just telling it to actually search the state space and give me a list of results.

SearchNodes (
 EntireGraph,
  fn n => [DOG] = (Mark.Puzzle'Bone 1 n)
          andalso
          [RABBIT, RABBIT] = (Mark.Puzzle'Carrot 1 n),
 10,
 fn n => n,
 [],
 op ::)

We then run the code using an out of the box idea, which makes the code one with the image: we type it into an arbitrary textbox and right-click “Evaluate ML.”

And thus, we learn that it is possible to reach a state with the desired property, and it is called state 150. Woohoo! 

Not only that, but by similarly running more code

NodesInPath(1, 150)

We learn the sequence of states needed to get there is:

  [1,4,8,13,17,20,23,26,28,33,42,55,70,
   86,98,105,110,113,116,118,123,130,137,
   142,145,148,150]
.

We learn that the solution to the puzzle requires 26 moves. All that’s left is to learn what these numbers mean.

You might think we could just click a button and it would show you the state that one of these numbers means. That is close, but it has the problem of not feeling like Photoshop, so instead you need separate clicks to first select the “Set Petri net to state 4” tool and then to tell the computer system, that, yes, the reason you selected this tool is to see state 4.

If that’s too slow, well, you also have the option of reading the states like this.

But now, after spending more time trying to read the generated solution, than some people took to solve it from scratch, I have it! And now my animals are well-fed and happy.

  • Time I spent to solve this puzzle: 3 hours
    • 10 minutes modeling, 2 hours 50 minutes figuring out CPNTools’s interface
  • Time spent by the 8 year-old daughter of a random commenter: 2 minutes

Conclusion

I don’t see myself using CPN Tools again. It’s too programmatic to be a good GUI tool, and too GUI-based to be a good programmatic tool. Hopefully in the future if I use Petri nets, I’ll only need normal ones, not colored. That opens up quite a few programming libraries, such as the SNAKES library in Python.  Some of them might actually implement the clever solving algorithms we’ve had for 40 years, as opposed to CPN Tools, which appears to just brute force it.

Funny enough, while I know of a use of Petri nets in practical programming tools (productivity, not verification) recently, I’m personally responsible for killing it. Hoogle+ is a tool that uses Petri nets to come up with small programs of a desired type. My most recent paper introduces a new algorithm (not based on Petri nets) which solves the same problem 8x faster despite an implementation just a tenth the length.

But, regardless, now I’m more experienced with a technique that can solve loads of problems. And so are you.


I have released my CPN Tools file for this post here.

Monday, March 28, 2022

Abstraction: Not What You Think It Is

“Interfaces are abstractions”
Olaf Thielke, the "Code Coach"

“Interfaces are not abstractions”
Mark Seeman, author of Code that Fits in Your Head and Dependency Injection

“Abstraction in programming is the process of identifying common patterns that have systematic variations; an abstraction represents the common pattern and provides a means for specifying which variation to use”
Richard P. Gabriel, author of “The Rise of Worse is Better” and Patterns of Software
“[...] windshield wipers [...] abstract away the weather”
Joel Spolsky, cofounder of Fog Creek Software, Stack Overflow, and Trello

Of all the concepts debated in software engineering, abstraction is at the top. I found two separate debates about it on Twitter from the past week.

As the quoted writers show, people do not even agree what abstraction means. Abstraction seems to stand for a hodgepodge of different concepts involving generality, vagueness, or just plain code reuse. These engineering debates — debates about whether duplications are better than the wrong abstraction or about whether abstraction makes code harder to read — trickle down into heated discussions over code. But this confusion over abstraction's basic meaning makes all such debates doomed.

This situation is particularly sad for me as someone with a background in PL theory. There are a lot of topics in software engineering that are the result of accumulated intuition over decades. But we've had a pretty good definition of abstraction since 1977, originally in the context of program analysis, and — I claim — it actually translates quite well into a precise definition of “abstraction” in engineering.

Abstraction in general is usually said to be something which helps readers understand code without delving into the details. Yet, for many of the concrete code examples programmers actually call “abstraction,” they can be (and are) used in ways which add details and hinder understanding. In opposition, I take the position that software engineers will benefit from studying the mathematics of PL-theoretic abstraction, understanding how it explains things they already do, and letting this coherent definition rule their use of the term "abstraction."

My initial goal in this post is a smaller enabling step: to give you names for the other concepts that are often combined under the name "abstraction" that they may be referenced, used, and critiqued specifically, and to help you move away from the vague and contradictory citrus advice that arises from using the same name for different things. From there, I will proceed to provide the rigorous definition of abstraction as it pertains to software, followed by concrete examples.

But the story does not end after separating true abstractions from their impostors. The endless debates over what is and isn't an abstraction shall be resolved in an unsatisfying way: almost anything can be viewed as an abstraction, and most abstractions are useless. Yet once you learn how to actually write down an abstraction mapping, you gain the ability to look beyond the binary and explain the exact benefit that a given abstraction does or does not provide a reasoner, and in doing so rearrange the discordant intuition around abstraction into harmonic lines of precise, actionable advice.

Not Abstraction

There are at least five other things that go under the name abstraction.

Functions

One contender for the oldest programming language is the lambda calculus, where Alonso Church showed us that, by copying symbols on pen-and-paper using a single rule, one could compute anything.

In the lambda calculus, making a new function is called “lambda abstraction,” and often just “abstraction.”

By "making a new function," that doesn't mean that it's the process of looking at two similar terms like sin(x)2+1 and 2*x+1, and deciding to make a function λx.x+1. That's anti-unification, described below. It is quite literally the process of taking x+1 and changing it to λx.x+1.

And “abstraction” also refers to the output of this process, i.e.: any lambda/function whatsoever.1

It's been noted that this use of the word "abstraction" is quite different from other uses. Unfortunately, this has polluted broader discussion. Even though the lambda calculus usage is akin to adding an opening and closing brace to a block of code, this usage leaks out into discussion of "abstracting things into functions" and from there into extolling the benefits of being able to ignore details and other things that closing braces really don't let you do.

So functions are abstractions — just in a very limited meaning of the word with little relation to everything else under that label. Moving on...

Anti-unification

"Anti-unification" is a fancy term for the process of taking two things that are mostly similar, and replacing the different parts with variables. If you substitute those variables one way, you get the first thing back; else you get the second. If you see x*x and (a-b)*(a-b) near each other in a codebase and extract out a square function, then you've just done an anti-unification (getting the pattern A*A with the two substitutions [A ↦ x] and [A ↦ a*b]).

(Its opposite is unification, which is comparatively never used in programming. Unless you're writing Prolog, in which case, it's literally on every single line.)

This probably looks familiar: virtually all of what goes under the Don't Repeat Yourself label is an example of anti-unification. Perhaps you would describe the above as "abstracting out the square function" (different from the previous definition, where "abstracting" is just adding curly braces after the variables are already in place).

In fact, Eric Elliott, author of Composing Software and Programming JavaScript Applications goes as far as to say “abstraction is the process of simplifying code by finding similarities between different parts of the code and extracting shared logic into a named component (such as a function, module, etc...)” — i.e.: that abstraction is anti-unification. He then goes on to claim "The secret to being 10x more productive is to gain a mastery of abstraction." That sounds pretty impressive for a process which was first automated in 1970.2

Boxing

"Boxing" is what happens when you do too much anti-unification: a bunch of places with syntactically-similar code turns into one big function with lots of conditionals and too many parameters. "Boxing" is a term of my own invention, though I can't truly claim credit, as the "stuffing a mess into a box" metaphor predates me. Preventing this is exactly the concern expressed in the line "duplication is better than the wrong abstraction," as clarified by a critic.

There's a surefire sign that boxing has occurred. Sandi Metz describes it nicely:

Programmer B feels honor-bound to retain the existing abstraction, but since [it] isn't exactly the same for every case, they alter the code to take a parameter, and then add logic to conditionally do the right thing based on the value of that parameter.

I've written and spoken against this kind of naive de-duplication before. One of the first exercises in my course is to give two examples of code that have the same implementation but a different spec (and should therefore evolve differently). Having identical code is not a foolproof measure that two blocks do the same thing, and it's helpful to have different terminology for merging similar things that do and do not go together.

But, particularly, if we want abstraction to have something to do with being able to ignore details, we have to stop calling this scenario "abstraction."

Indirection

Though not precisely defined, indirection typically means "any time you need to jump to another file or function to see what's going on." It's commonly associated with Enterprise Java, thanks to books such as Fowler's Patterns of Enterprise Application Architecture, and is exemplified by the Spring framework and parodied by FizzBuzz: Enterprise Edition. This is where you get people complaining about "layers" of abstraction. It commonly takes the form of long chains of single-line functions calling each other or class definitions split into a hierarchy across 7 files.

Fowler's examples include abstracting

contract.calculateRevenueRecognition()

into the Service Layer Abstraction™

calculateRevenueRecognition(Contract)

If you were to describe the former, you'd probably say "It calculates the recognized revenue for the contract." If you were to describe the latter, you'd probably say "It calculates the recognized revenue for the contract." We've been spared no details.

Interfaces, Typeclasses, and Parametric Polymorphism

All three of these are mechanisms for grouping multiple function implementations so that a single invocation may dispatch to any of them. Most programmers will be familiar with interfaces, which are a language feature in Java and TypeScript and a common pattern in Python. Typeclasses, a.k.a. “traits,” are essentially interfaces not attached to objects. Parametric polymorphism, a.k.a. “generics,” are a little different in that they combine functions which differ in nothing but their type signature.

Parametric polymorphism is essentially just adding an extra parameter to a function, except that this extra parameter is a type. It's not abstraction in the same way that anti-unification isn't.

Interfaces and typeclasses do tend to be associated with abstractions in the sense about to be introduced. But there's a banal reason they do not satisfy the goal of an abstraction definition: there's nothing mandating that the many implementations have anything to do with each other. For example, in my installation of the Julia language, the getindex function is actually an interface with 188 implementations, dispatched based on the runtime type of its arguments. Most of these implementations do lookups into array-like structures, but a few do the exact opposite and create an array.

Sometimes, when calling a function behind a polymorphic typeclass interface, the programmer knows only one implementation is relevant, so the shared name is just to save some typing. Other than that, in order to make use of any of these, one must be able to explain the operation of multiple functions in some common language. It is not the language feature that allows one to program liberated from details, but rather this common language and its correspondence with each of the implementations. Which brings us to...

True Abstraction

In programming language theory and formal methods, there are several definitions of "abstraction" in different contexts, but they are all quite similar: abstractions are mappings between a complex concrete world and a simple idealized one. For concrete data types, an abstraction maps a complicated data structure to the basic data it represents. For systems, an abstraction relates the tiny state changes from every line implementing a TCP stack with the information found in the actual protocol diagram. These abstractions become useful when one can define interesting operations purely on the abstract form, thus achieving the dictum of Dijkstra, that "The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise."

You've probably used one such abstraction today: the number 42 can be represented in many forms, such as with bits, tally marks, and even (as is actually done in mathematics) as a set. Addition has correspondingly many implementations, yet you need not think of any of them when using a calculator. And they can be composed: there is an abstraction to the mathematical integers from their binary implementation, and another abstraction to binary from the actual voltages. In good abstractions, you'll never think that it's even an abstraction.

So, what do abstractions actually look like in code?

They don't.

Where are the abstractions?

A running joke in my software design course is that, whenever I show some code and ask whether it has some design property, the answer is always "maybe." And so it is whenever you ask "Is Y an abstraction of X?"

First, a quick digression. In PL theory and formal methods, there are many definitions of abstraction in different contexts, though they are all far more similar to each other than to any of the not-abstractions in previous sections. The one I am about to present is based on the theory of abstract interpretation. Abstract interpretation is usually taught as a (very popular) approach to static analysis, where it's used to write tools that can, say, prove a program never has an array-out-of-bounds access. But it can also be applied to more interesting properties, though usually in a less automated fashion. I'll be presenting it from the perspective of understanding programs rather than building tools, and explaining it without math symbols and Greek letters. I'll be in particular focusing on abstracting program state. I'll occasionally gesture at abstracting steps in a program, though the actual definitions are more complicated. (Google “simulation relation” and “bisimulation” to learn the technical machinery.)

So:

Abstractions are mappings. An abstraction is a pattern we impose on the world, not the entities it relates, which are called the abstract domain and concrete domain. Strictly adhering to this definition, the well-formed question would be "Is there an abstraction from X to Y?" followed by “Is that abstraction good?”

Let's return to the example of numbers. There is a great abstraction from voltages in hardware to strings of 0's and 1's, from strings of 0's and 1's to mathematical numbers, and also from tally marks to numbers. These abstractions on numbers induce abstracted versions of each operation on the base representations, such as the very-simple operation of adding 1 to a number. Already, we can see that the abstractions live outside the system; computers function just fine without a device that reads an exact value for the voltage in each transistor and then prints out the represented number. Yet in spite of living outside the system, this mapping is perfectly concrete.

Evaluating abstractions

The three example abstractions above have two properties that make them useful. The first is soundness. The picture above is what's called a “commutative diagram” in that any path through the diagram obtains the same result: given an input set of voltages, it would be equivalent to either (a) run a circuit for adding one and then convert the resulting voltages to a number, or (b) convert the voltages to a number and then add 1 with pen-and-paper. The second is precision: Adding 1 to a number produces exactly one result, even though it corresponds to a diverse set of output voltages.

Precision is what makes finding good abstractions nontrivial. The eagle-eyed reader might notice that any function to the integers yields a sound abstraction. For example, there is an abstraction from your TV screen to integers: the serial number. However, you'd be hard pressed to find any operations on TVs that can be sanely expressed on integers. Turning the TV on or changing the channel leaves it with exactly the same serial number, while replacing the TV with one slightly larger is barely distinguishable from randomly scrambling the serial number. Indeed, one would likely implement the “slightly larger TV” function on the abstract domain of serial numbers by mapping each serial number to the set of all other serial numbers. This is sound — getting a larger TV and then taking its serial number is certainly contained in the result of looking at your current TV's serial number then applying this operation — but maximally imprecise.

A consequence of this: if we translate every question “is X an abstraction of Y” to “is there an abstraction which maps X to Y,” the answer is always “yes.” Instead, we can ask which operations can be tracked precisely with reference only to the abstract domain. The abstraction from voltages to numbers is perfectly precise for all operations on numbers, but not for determining whether a certain transistor in the adder circuit contains 2.1 or 2.2 volts. The abstraction from TVs to serial numbers is perfectly imprecise for every operation except checking whether two TV's are the same (and maybe also getting their manufacturer and model).

To the property of soundness and the measurement of precision, we add a third dimension on which to evaluate abstractions: the size (in bits) of an abstract state. The good abstractions are then the sound abstractions which are small in bits, yet precise enough to track many useful operations.

So consider a website for booking tables at restaurants, where the concrete domain is the actual state of bookings per table {table1BookedFrom: [(5-6 PM, “Alice”), (7-8 PM, “Bob”)], ..., table10BookedFrom: [...]}. The abstract domain shall be a list of timeslots. For each user, it is possible to abstract a concrete state down to the abstract state listing the booked timeslots for that user, i.e.: for Bob, [7-8 PM]. What makes this an abstraction, as opposed to just an operation on the restaurant state, is that we shall then proceed to describe the effect of every other operation on this value. So consider the actual booking function, which might have this signature:

void bookTable(User u, TimeInterval t)

Below I give 4 specifications for this function. For the example of booking a table for Carol from 7 to 8 PM, these specifications give:

  1. The actual behavior of this implementation, say, trying to assign Carol to the lowest-numbered table.
  2. All allowed behaviors on the concrete states, i.e.: finding any table open at the given time and assigning it to Carol, or doing nothing if there is none
  3. The allowed outputs on the abstract domain of Carol's bookings, namely (assuming Carol does not already have a reservation) either [7-8 PM] or [].
  4. The allowed outputs on the abstract domain for Bob or any other user, i.e.: the exact same as the input.

From this one example, we can derive quite a few lessons, including:

  • All of these specifications are useful in that you might use each of them when mentally stepping through the code. Perhaps you'd think “This line shouldn't have affected any of Bob's bookings” (using specification 4, corresponding to the “Bob's bookings” abstraction) or “When I click this button, either table 5, 6, or 7 will be booked” (using specification 2).
  • Abstractions are separate from the code, and even from the abstract domain. It does not make sense to say that the bookTable function or anything else in this file “is” the abstraction, because, as we have just seen, we can use many different abstractions when describing its behavior. More striking, we see that, even for a specific pairing of concrete and abstract domains, there can be many abstractions between them.
  • Instead, code is associated with abstractions. Note the plural. We've seen that bookTable can be associated with several abstractions of the behavior — infinitely many in fact, including many useful ones not previously discussed, like mapping the restaurant state to the list of timeslots with available tables.
  • No code change is needed to reason using an abstraction. We could extend the mapping from relating abstract/concrete states to relating steps between them, and then say the bookTable function “abstracts” the set of intermediate steps the program takes for each line in the function, but we could do this almost as easily if the bookTable implementation was actually a blob in a much larger function.
  • Different abstractions tend not to be more or less precise than each other, just differently precise. Compare the abstractions from the restaurant state to Bob's bookings, Carol's bookings, and the set of open timeslots. All of them can be used to answer different questions.

Continuing, we can also evaluate what makes the “Carol's bookings” abstraction a good one. The corresponding specification, Specification 3, is quite close to deterministic, yielding only two possible output states. The corresponding abstract states contain much less information than the concrete ones. And an entity (human or tool) reading the code tracking only this abstract state will still be able to perfectly predict the result of several other operations, such as checking whether Carol has a table. This is the new semantic layer on which one can be absolutely precise that Dijkstra speaks of!

Of course, it is cumbersome to say “there is a sound and precise abstraction mapping voltages on hardware to mathematical numbers” or “there is a good abstraction from the specific details of when tables are booked to just the available times.” It is quite convenient shorthand to use the more conventional phrasing “numbers abstract the hardware” or “bookTable abstracts all the details of reservations;” you can say “abstraction mapping” if you want to clearly refer to abstractions as defined in this blog post. Yet this shorthand invites multiple interpretations, and can spiral into an argument about whether booking tables should actually be the same abstraction as booking hotel rooms. Feel free to call numbers an abstraction of the hardware, but be prepared to switch to this precise terminology when there's tension on the horizon.

Whence the Confusion

Programmers correctly intuit that it is desirable to have some way to reason about code while ignoring details. In their 1977 paper, Patrick and Radhia Cousot first taught us the precise definition of abstraction that makes this possible. The other attempts fail to see the incorporeal nature of abstractions and instead fixate on something in the code. But there must be some connection.

Yes, functions are not abstractions. But for every function, there is an abstraction, not necessarily a good one, collapsing the many intermediate steps of the function into an atomic operation. There may also be abstractions which admit a simple description of the relation between the inputs and outputs.

Anti-unification is not abstraction. Yet two code snippets that could be fruitfully semantically modeled with similar abstract states will often be amenable to syntactic anti-unification. As disparate operations are combined, ever more information must be added into the abstract state to maintain precision. The result is boxing.

Indirection is hella not abstraction, though similarly-named functions may suggest slightly-different abstract domains associated with them. Many changes that make abstractions more explicit come with indirection, but we've seen it's possible for readers to impose abstractions on code without any changes at all.

Typeclasses and interfaces are not abstraction, but a good interface will be associated with at least one good abstract domain precise enough to make each of the interface's operations nearly deterministic (or at least simple to specify), and each implementation will come with a sound abstraction mapping its concrete states into that abstract domain.

Your car's windshield wipers and roof do not abstract away the rain, as claims Spolsky, but they do mean that, to predict your happiness after running your brain's DriveToStore() function, you can use an abstract state that does not include the weather.

Abstractions offer the dream of using simple thoughts to conjure programs with rich behavior. But fulfilling this promise lies beyond the power of any language feature, be it functions or interfaces. We must think more deeply and identify exactly how the messy world is being transformed into a clean ideal. We must look beyond the binary of whether something is or is not an abstraction, and discover the new semantic level on which we can be absolutely precise.

Appendix: Some Other Views of Abstraction

Abelson and Sussman's Structure and Interpretation of Computer Programs is a sure candidate for the title of “Bible of computer programming” (sometimes mixed with the actual Bible), and it's full of instruction on abstraction, beginning with chapter 1 “Building Abstractions with Procedures.” I expect at least one reader wants beat me over the head with a copy of the book saying I'm getting it wrong. I don't really want that (it's 900 pages), so I walked down 2 flights of stairs from my MIT office to Gerry Sussman's office and asked him. I'll represent his ideas below.

Sussman explained that he believes abstraction is a “suitcase term” which means too many different things, though he only sees two main uses relevant to software. The first definition is: giving names to things produced by the second definition. That second definition, he explained, has to do with the fundamental theorem of homomorphisms and its generalization. And then he pulled an abstract algebra book off the shelf.

I'll try to explain this as best I can with minimal math jargon while still being precise. I'll explain the definition simultaneously with Sussman's two examples, multivariate polynomials, and (physical) resistors in electric circuits.

In this picture, G can be seen as the set of all data in all forms and f is some operation on G. So G can be the set of all polynomials in all representations, or the set of all resistors. f then can be the operation of plotting the polynomial on all values, or computing the current through a resistor across a range of voltages.

Now, there are many different values in G on which f does the same thing. G contains both sparse and dense representations of the same polynomial; these have the same plot. There are different resistors with the same resistance; assuming they perfectly follow Ohm's Law, they have the same current at the same voltage. One can write down the list of lists of which values of G are treated the same by f; that's φ, the kernel of f. So for polynomials, φ would be the list of distinct polynomials, each of which contains the list of all representations of that polynomial.

Now, finally, one can use φ to quotient or “smush” together the like elements of G. All the different representations of the same polynomial get mapped to something representing that polynomial independent of representation. All the different resistors of resistance R get mapped to the idea of a resistor with resistance R. This is G/K in the picture.

The theorem is then that G/K behaves the same (is isomorphic to) H, e.g.: that that the set of different representations of polynomials can be manipulated in the same way as their plot. But I believe Sussman was gesturing less at this theorem and more at G/K itself, i.e.: at the idea of merging together different representations that behave the same under some operations.

I like this idea because it gives a way to unify into a single mapping the relation between many different implementations and their shared abstract domain. For the different representations of polynomials, a typical formulation with abstract interpretation would provide a different mapping from each kind of implementation into the shared abstract domain. On the whole though, the operative idea in the “homomorphism theorem” theory of abstraction seems to be that of merging together concrete values that can be treated similarly by certain operations. This idea is already present in abstract interpretation; indeed, φ can be directly taken to be an abstraction mapping, with G and G/K the concrete and abstract domains. On the whole, while I find the connection to abstract algebra cute, I'm not sure that the “homomorphism theory of abstraction” offers any insight that the theory of abstract interpretation does not.

So that's the main item in Sussman's suitcase of meanings of abstraction in software. It looks superficially different from any of the variations of abstract interpretation, but is actually quite compatible.

Is there anything else in that suitcase? Any other (good) uses of the word “abstraction” not captured by the previous definitions?

Maybe. I can say that there are is something I'd like to be able to do with something called “abstraction,” but that I can't do with abstract interpretation: dealing with inaccuracy.

You see, the orthodox definition of a sound abstraction would rule out a technique that predicts the concrete output perfectly 99.99% of the time and is otherwise slightly off, and instead prefers a function that says “It could be anything” 100% of the time. I know there is work extending abstract interpretation to some kinds of error, namely for numeric approximations of physical quantities, but, overall, I just don't know a good approach to abstraction that allows for reasonable error.

On a related note, sometimes AI researchers also talk about abstraction. I know that the best poker AIs “abstract” the state space of the game, say by rounding all bet sizes to multiple of $5, and that human pros exploit it by using weird bet sizes and letting the rounding error wipe out its edge. But I am not aware of a general theory backing this beyond just “make some approximations and hope the end result is good,” and am not even sure “abstraction” is a good term for this.

Acknowledgments

Thanks to Nate McNamara, Benoît Fleury, Nils Eriksson, Daniel Jackson, and Gerry Sussman for feedback and discussion on earlier drafts of this blog post.


1 Daniel Jackson credits Turing Laureate Barbara Liskov with promulgating this usage. Her influential CLU language uses “abstraction” to mean any unit of functionality (type, procedure, or iterator).

2 Technically, it’s only first-order anti-unification, equivalent to extracting named constants, that Gordon Plotkin developed in 1970. Work on higher-order anti-unification, which corresponds to extracting (higher-order) functions, began in 1990; Feng and Muggleton’s “Towards Inductive Generalisation in Higher order Logic" (1992) provides an early discussion. While Elliott is quite explicit that any anti-unification is abstraction, it is true that there are a vast number of anti-unifications of a given set of programs, and choosing the best is very difficult. DreamCoder is a recent notable work based on doing so.