Sunday, January 7, 2018

The Three Levels of Software: Why code that never goes wrong can still be wrong

Update 3/20/2019: Previous versions of this post used the terminology Level 1/Level 2/Level 3 and "Level 3 bug" a lot. I'm de-emphasizing these terms in favor of "Runtime/Concrete Implementation/Logic level" and "Error of modular reasoning," because people started using "Level 3" to mean "Everything about the code that I don't like that can't cause an incorrect output."

Have you ever stopped to consider what it means for a program to be wrong? I mean, really stopped to consider? Like, “it’s wrong if it crashes — but what if the crash conditions are only hypothetically achievable — but wait…”. Let’s get to the bottom of this. Here’s a first try:

Definition #1
A program is wrong if it runs and produces a wrong result.
(Where “wrong result” is broadly defined to include performance bugs, usability bugs, etc.)

So I put my bread in the toaster, and, instead of giving me a piece of toast, it gives me a rhinoceros. Not what I wanted. Definitely a bug (err, mammal).

A program can usually produce the correct result and still be wrong. So I get a new toaster, and I’ve never had a problem with it. Then I find out that, if I put in a thin slice of Bavarian rye when there’s a full moon out, and I’ve used the toaster exactly twice in the last 24 hours, then it will fry its circuits. Sounds like a bug. So we amend our first attempt into a second definition.

Definition #2
A program is wrong if there exists some environment, sequence of events, or other “input” under which it produces a wrong result.

Now let’s eliminate all that. I’ve patched the software on my toaster, and I’ve both proven the code correct, and had a trillion monkeys test every conceivable situation. There’s now nothing you can do short of taking a sledgehammer that will make the toaster malfunction.

It’s still possible for the code to be wrong.

How? Simple. Let’s say somewhere in that toaster, it sums all the elements of an array — but it accidentally goes over by one. Memory error? Well, it’s written in C, so it just adds on whatever is in the next address in memory. That address is a byte which is always 0 for whatever reason, so it gets the right answer every time.

The code is still wrong. It may never fail, but the reasoning for why the code should work has a hole: what should have been a simple argument of reading an array now depends on complex assumptions about the compiler and memory layout, as well as a whole-program check that that byte is always 0. And that hole can cause an actual failure in future versions of the program: someone rearranges the fields in a structure, and code that should never have been affected starts failing.
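To make the hole concrete without invoking C's undefined behavior, here is a rough Java analogue. It is a sketch, not the toaster's code: ReadingBuffer and its methods are hypothetical names invented for illustration. The off-by-one sum reads one slot past the elements in use, and that slot merely happens to hold 0 because Java zero-fills new arrays.

```java
class ReadingBuffer {
    // Hypothetical fixed-capacity buffer. Java zero-fills new arrays, so every
    // unused slot starts out as 0.
    private final int[] slots = new int[8];
    private int size = 0;

    void add(int value) {
        slots[size++] = value;
    }

    // A later, seemingly unrelated change: drop the last element without
    // clearing its slot.
    void removeLast() {
        size--;
    }

    // Off by one: also reads slots[size]. Correct only while that slot is 0,
    // which depends on the rest of the class, not on this loop.
    int sumOffByOne() {
        int total = 0;
        for (int i = 0; i <= size; i++) {
            total += slots[i];
        }
        return total;
    }

    // Correct by a local argument: reads exactly the elements in use.
    int sum() {
        int total = 0;
        for (int i = 0; i < size; i++) {
            total += slots[i];
        }
        return total;
    }

    public static void main(String[] args) {
        ReadingBuffer b = new ReadingBuffer();
        b.add(1); b.add(2); b.add(3);
        System.out.println(b.sumOffByOne() + " vs " + b.sum());
        b.add(4);
        b.removeLast();
        System.out.println(b.sumOffByOne() + " vs " + b.sum());
    }
}
```

Before removeLast existed, both methods agreed on every input, so no test could tell them apart. Only sum can be justified by looking at the loop alone.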

We reach our third definition:

Definition #3
A program is wrong if the reasoning for why it should be correct is flawed.

Progress! We’ve gone from a clear and simple definition to one that’s handwavy and impossible to use. Actually, it’s quite rigorous, but I’ll have to teach you some concepts from formal verification to make it more concrete.

All three definitions are correct to use at different times. This forms one of my core teachings: that, when we talk about programs, we speak at one of three levels. These three levels are:

  • Level 1: Runtime. The runtime level deals with specific values and a specific environment from a single execution of the program. A lot of debugging is done at the runtime level.
  • Level 2: Concrete Implementation/Code. At the level of the concrete implementation, we think about what the current implementation could do when given arbitrary inputs and an arbitrary environment. Behaviors that cannot happen are not considered, even if it requires global reasoning to rule them out. A lot of implementation work is done at the code level.
  • Level 3: Design/Logic. At the level of logic, we consider the abstract specification of each unit of a program. When using other units, we only consider the guarantees made by the spec, and assume they may be replaced at any time with a different implementation. Many programs which are correct when viewed at the concrete implementation level are not correct when viewed at the logical level, because they rely on behavior which is not guaranteed to hold in all future versions. We call this an error of modular reasoning, because functions with such errors lack a desirable property: the ability to argue that the function is correct only from the function’s code and from the contracts of the function's dependencies, without needing to even glance at the dependencies' code. Most software design is done at the level of logic.
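As a small illustration of an error of modular reasoning, consider the following Java sketch. TopScores and both of its methods are hypothetical names. The fragile version leans on the order in which a Set happens to iterate. On the JDKs I know of, a small HashSet of small integers tends to come out in ascending order, but the Set contract promises no order at all, so the argument for correctness has to peek inside the dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TopScores {
    // Fragile: returns the scores in whatever order the set iterates.
    // Nothing in the Set contract says that order is ascending.
    static List<Integer> ascendingFragile(Set<Integer> scores) {
        return new ArrayList<>(scores);
    }

    // Modular: correct for every Set implementation, because it relies only on
    // the documented behavior of Collections.sort.
    static List<Integer> ascending(Set<Integer> scores) {
        List<Integer> out = new ArrayList<>(scores);
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        // LinkedHashSet guarantees insertion order, so the fragile version
        // visibly breaks once the caller swaps implementations.
        Set<Integer> scores = new LinkedHashSet<>(Arrays.asList(3, 1, 2));
        System.out.println(ascendingFragile(scores));
        System.out.println(ascending(scores));
    }
}
```

The fragile method can pass every test written against one Set implementation and still be wrong at the logical level, since its correctness cannot be argued from the contract alone.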

I’ve met programmers who confuse the runtime and concrete implementation/code levels, but not many. The runtime and code levels are concrete enough: you see the runtime level by running under a debugger and inspecting the stack, while you see the concrete implementation level just by reading the code and thinking about what could happen. But most programmers have a much harder time getting that there even is a logical level, and I’ve seen much confusion arise when one programmer is talking about what a component is guaranteed to do, and the other is talking about what it happens to do. Most programmers will be taught the difference between interface and implementation at a high level. But very few get to see the full details of what defines an “interface” beyond just “list of functions.” That can only be seen when doing formal verification, where a program’s properties and assumptions are written just as concretely as its source code. In everyday development, all that structure and reasoning is still there; it’s just scattered across documentation, comments, and programmers’ heads.

Statements at different levels do not mix. So if the proposed client/server protocol says the client should send a request twice and discard the first result (a Level 3/design-level statement), and the designer tells you it’s because there are three different kinds of request handlers in the codebase, and Bob’s sometimes gets it wrong the first time (a Level 2/implementation-level statement), you should get confused. You should be as confused as if someone wanted to call a file or write to a function.

Isn’t our goal to deliver working software to customers, and so correctness at the concrete implementation level is all that’s important? No, our goal is to be able to continue to deliver working software far into the future. The logical level is all about how modular the interactions between the components of your software are, and that’s important if you care at all about having different versions of those components, like, say, if you wanted to rewrite one of them tomorrow. Joel Spolsky relates how the original SimCity had a use-after-free error. At the concrete implementation/code level, this was totally fine, since freed memory in DOS was valid until the next malloc, and so the program worked. At the logical level, this was a defect, because the spec for free says you need to act as if memory gets eaten by a dragon as soon as you free it, and any future free implementation may actually eat it. Sure enough, once Windows 3.1 rolled around with a new memory manager, SimCity started crashing. Microsoft had to add a special case to check if SimCity was running and switch to a legacy memory manager if so.

People sometimes tell me about how software is so easy and you can just have an idea and make it and it’s so cheap because there’s nothing physical to build. Hogwash. Software is a domain where we make decisions that can’t be undone and have to be supported for all eternity. The HTTP “referer” is forever misspelled, and that SimCity special-casing code is still there 30 years later.

This is why it’s important to get programs right at the logical level, making it so you can argue each component is correct independently of the rest of the program, even if the overall program passes all your testing and meets external requirements.

And, lesson for API designers: This is why it’s important to make your APIs conform as strictly to the spec as possible, at least in Debug mode. This all could have been avoided if DOS’s free had deliberately zeroed out the memory.

I now have two years’ experience teaching engineers a better understanding of how to avoid complexity, improve encapsulation, and make code future-proof. But ultimately, that knowledge all stems from this single master insight: the most important parts of our craft deal not with code, but with the logic underneath. The logical layer may be hidden, but it’s not mystic: we have over 40 years of theory behind it; see the postscript below for a taste of the details. When you learn to see the reasoning behind your system as plainly as the code, then you have achieved software enlightenment.

Next time: why this implies keeping design and implementation separate.

Postscript: Technical Details

I could speak a book on harmony and composition, but you’d learn more from listening to a song and seeing the sheet music. Similarly, I could ramble on about reasoning and assumptions, but to truly understand it, you need to see the objects under study. The three levels deal with different views of a program: executions, code, and specifications. Each corresponds to its own kind of reasoning.[1] Let’s bring in the math!

Level 1: Traces and states

The objects of study of Level 1, the runtime level, are traces and states. A trace is a sequence of events that occurred in a program execution. A trace looks something like this:

Enter method postProfileUpdateToServer
Read field profile to local variable p
Enter method saveCurrentState
Leave method saveCurrentState

Traces can be very high level, like which microservices get run, or very low level, involving instruction scheduling on the CPU. They tell you exactly what happened, and give the information to construct the current state.

The state is a collection of cells with their current value. It looks like this:

{ p=Profile(name="Bob", id=42, ...),
  this=NetworkClient(socket=..., baseUrl=...),
  __messages_output=["Initializing MyApp version 5", "Welcome, Bob!", ...] }

Most often, the state just consists of values in memory, though it can be helpful to also include pieces of the environment, or a list of what’s already been output or sent over the wire.

Any statement that can be phrased in terms of specific traces and states is a runtime-level statement. These constructs should be familiar to programmers: printouts let you view fragments of a trace, while a debugger lets you observe the current state.

The corresponding manner of automated reasoning is ground reasoning. Ground reasoning means reasoning only about concrete values with no quantifiers. So, suppose I have this code:

1:       left = x - 10;
2:       if (left < 0)
3:         left = 0;
4:       right = min(x+10, 100);
5:       print(right);

Say I know from the printout that right=50 in a given trace, and I want to know whether it’s possible that the branch on line 3 executed. I could answer this question by asking a solver to find a satisfying assignment for the following formula:

left<0 ∧ (left=x-10) ∧ (right=50)
       ∧ (right=(if x+10<=100 then x+10 else 100))
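A real solver would settle this symbolically. As a stand-in, the sketch below brute-forces x over a bounded range and checks the same conditions for each concrete value. The search range, the class name BranchQuery, and the method findX are assumptions made for illustration.

```java
import java.util.OptionalInt;

class BranchQuery {
    // Searches [lo, hi] for an x that makes the branch condition (left < 0)
    // true while also producing the observed value of right.
    static OptionalInt findX(int observedRight, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            int left = x - 10;                   // line 1 of the snippet
            int right = Math.min(x + 10, 100);   // line 4 of the snippet
            if (left < 0 && right == observedRight) {
                return OptionalInt.of(x);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(findX(50, -1000, 1000).isPresent());
        System.out.println(findX(5, -1000, 1000).isPresent());
    }
}
```

With right=50 the only candidate is x=40, which gives left=30, so nothing satisfies the conditions and the branch on line 3 cannot have executed in that trace. With right=5, by contrast, x=-5 is a witness that the branch ran.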

Level 2: Code

The object of study of Level 2, the level of concrete implementation, is the code. Yes, code, the stuff you work with every day. Any statement that can be phrased in terms of the code, but not a specific trace or state, is a code-level statement. Let’s look at the following example, which computes a damage bonus in a hypothetical game:

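public double computeDamageBonus(int creature1AttackSkill,
                                 int creature1ArmorPiercing,
                                 int creature2DefenseSkill) {

  // Defense remaining after armor piercing is applied.
  int adjDefense = creature2DefenseSkill - creature1ArmorPiercing;

  // No effective defense: return the maximum bonus and never reach the division.
  if (adjDefense <= 0) {
    return 10;
  }

  // The guard above means the divisor is at least 1 here.
  double factor = (double)creature1AttackSkill /
                     (creature2DefenseSkill - creature1ArmorPiercing);

  // Cap the bonus at 10.
  if (factor > 10) {
    return 10;
  }
  return factor;
}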

Picking specific inputs to this function yields a trace and a sequence of states. But this program encodes an exponentially large number of possible traces and states. At the runtime level, we can ask “Did this execution experience a division-by-zero error?” At the concrete implementation level, we can ask “Is there any execution of this function that experiences a division-by-zero error?”

The corresponding manner of reasoning is first-order logic. This means we can write down formulas that say “for all inputs X, does this property hold?” or “does there exist an input Y, such that this function crashes?” What we can’t do is quantify over other functions. Here’s a first-order formula that states that the computeDamageBonus function can never have a division-by-zero error:

∀creature1AttackSkill, creature1ArmorPiercing, creature2DefenseSkill.
  (adjDefense = creature2DefenseSkill - creature1ArmorPiercing
   ∧ ¬(adjDefense ≤ 0))
     ⇒ creature2DefenseSkill - creature1ArmorPiercing ≠ 0
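A universally quantified claim like this needs a proof, but it can at least be spot-checked over a bounded range. The sketch below does that in Java. DivisorCheck, divisorIfReached, and neverDividesByZero are hypothetical names, and the bound is arbitrary. The attack skill is left out because it does not affect the divisor.

```java
import java.util.OptionalInt;

class DivisorCheck {
    // Mirrors the control flow of computeDamageBonus: returns the divisor if
    // the division is reached, or empty if the early return fires first.
    static OptionalInt divisorIfReached(int armorPiercing, int defenseSkill) {
        int adjDefense = defenseSkill - armorPiercing;
        if (adjDefense <= 0) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(defenseSkill - armorPiercing);
    }

    // Bounded check of the claim: for every pair in [-bound, bound]^2,
    // reaching the division implies a nonzero divisor. A check, not a proof.
    static boolean neverDividesByZero(int bound) {
        for (int piercing = -bound; piercing <= bound; piercing++) {
            for (int defense = -bound; defense <= bound; defense++) {
                OptionalInt divisor = divisorIfReached(piercing, defense);
                if (divisor.isPresent() && divisor.getAsInt() == 0) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(neverDividesByZero(50));
    }
}
```

The loop only samples a finite slice of the input space. The formula covers every input, which is exactly what separates a code-level statement from a pile of runtime-level ones.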

Level 3: Specifications

The object of study of Level 3, the layer of logic, is the specification. I also call this the design level, because most software engineering terms, such as modularity and encapsulation, can only be defined in terms of the logic and specification. There are many, many ways of writing specifications, but a popular one is the Hoare triple: preconditions and postconditions. Here’s one for malloc:

Pre: n > 0
Post: retval ≠ NULL ∧ alloc(retval, n)

At the concrete implementation level, we could ask questions about a specific implementation of malloc, like “How much memory overhead does it use?” At the logical level, we can ask questions about all possible implementations of malloc, like “does this program have a memory error?”

The corresponding manner of reasoning is higher-order logic. Higher-order logic is like first-order logic, but we can now quantify over functions. First, let’s translate the spec above[2]:

MallocSpec(m) = ∀n. n > 0 ⇒ m(n) ≠ NULL ∧ alloc(m(n), n)

Now, if we hypothetically had a formal specification for the PlaySimCity function, a spec that it works for any implementation of malloc would look something like

∀m. MallocSpec(m) ⇒ SimCitySpec(m, PlaySimCity)

And that, my friends, is modularity reduced to a formula.

A really big thing about specifications is that they can involve properties which do not appear in the code at all. We defined malloc in terms of this alloc predicate, which has no intrinsic meaning other than “stuff returned by malloc.” But what this does is let us relate malloc to other operations. We can give the act of dereferencing pointer x  a precondition ∃a,n. a≤x<a+n ∧ alloc(a,n), and give the free(x) function a spec that destroys the predicate alloc(x,n). Now, having a proof of alloc(a, n) when reasoning about the program means “this memory was returned by malloc, with no intervening free” — exactly our intuitive notion of memory being allocated!

What we’ve shown here is that “this memory is allocated” is a specification-level notion which is independent of the code. And indeed, there may be nothing within the program to indicate that a piece of memory has been allocated.  It’s possible that being allocated corresponds to some internal data structure of the memory manager, but it’s also possible that your code will be compiled for a machine with infinite memory.

Now I can state exactly in what sense the SimCity code was wrong: A precondition of dereferencing a pointer is that the pointer is to allocated memory. They tried to dereference a pointer without meeting that precondition, that is, without a proof of alloc(a,n). So, their code ran fine for one implementation of malloc, but, as they sadly learned, not for all of them.
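One way to get a feel for the alloc predicate is to make it executable. The Java sketch below models addresses as plain integers and keeps the alloc(a, n) facts in a map. AllocModel and its method names are hypothetical, address 0 stands in for NULL, and none of this is how a real memory manager works. It only tracks the specification-level facts.

```java
import java.util.Map;
import java.util.TreeMap;

class AllocModel {
    // Specification-level state: base address -> length, one entry per fact
    // alloc(a, n) that currently holds.
    private final TreeMap<Integer, Integer> alloc = new TreeMap<>();
    private int next = 1; // address 0 plays the role of NULL

    // Pre: n > 0.  Post: retval != NULL and alloc(retval, n).
    int malloc(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("precondition n > 0 violated");
        }
        int base = next;
        next += n;
        alloc.put(base, n);
        return base;
    }

    // Destroys the fact alloc(x, n).
    void free(int x) {
        if (alloc.remove(x) == null) {
            throw new IllegalStateException("free of an address that is not allocated");
        }
    }

    // Precondition of dereferencing x: exists a, n. a <= x < a+n and alloc(a, n).
    boolean mayDereference(int x) {
        Map.Entry<Integer, Integer> block = alloc.floorEntry(x);
        return block != null && x < block.getKey() + block.getValue();
    }

    public static void main(String[] args) {
        AllocModel heap = new AllocModel();
        int p = heap.malloc(4);
        System.out.println(heap.mayDereference(p + 3));
        heap.free(p);
        System.out.println(heap.mayDereference(p));
    }
}
```

In this model the SimCity bug is a dereference at a point where mayDereference returns false. Whether the underlying bytes are still intact is an implementation-level question that the model deliberately cannot see.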

When designing software, I always recommend trying to think in pure concepts, and then translate that into the programming language, in the same way that database designers write ER diagrams before translating them into tables. So whether you’re discussing coupling or security, always think of the components of your software in terms of the interface, its assumptions and guarantees, and mentally translate them into formulas like the ones above. So much becomes clearer when you do, for logic is the language of software design.


Thanks to Elliott Jin and Jonathan Paulson for comments on drafts of this post.

[1] Don’t get too hung up on the analogies between the levels of software and the modes of reasoning; there are plenty of exceptions. For instance, much of the progress in program analysis/verification/synthesis research comes from finding all sorts of tricks to encode more complicated problems into a form that can be solved by ground reasoning, since we have good solvers. This likely isn’t going to be too relevant to you unless you work in programming tools.
[2] I am lying slightly in this example: the kind of logical formula I gave is really just meant for pure functions. To handle malloc properly, we’d want to use separation logic.


  1. Counterpoints on Lobsters here:

    Since I don't have a blog, I just put in URL field a link summarizing the methods of assuring software that worked in the past. That's just in case you or your readers find them interesting. It's my most popular, concise summary to date of methods going back to 60's. As concise as I can get it anyway...

  2. Get a life bro.. Write something in short that we dont get bugged reading your examples.

  3. "But most programmers have a much harder time getting that there even is a Level 3"

    Level 3 is your beloved project manager who has important requirements.