Monday, January 22, 2018

The Benjamin Franklin Method of Reading Programming Books

Let’s face it, programming books suck. Those general books on distributed systems or data science or whatever can be tomes for a lifetime, but, with few exceptions, there’s something about the books on how to write code in a language/framework/database/cupcake-maker, the ones with the animal covers and the cutesy sample apps: they just tend to be so forgettable, so trite, so… uneducational.

I think I’ve figured out why I don’t like them, and it’s not just that they teach skills rapidly approaching expiration. It’s their pedagogical approach. The teaching algorithm seems to be: write these programs where we’ve told you everything to do, and you’ll come out knowing this language/framework/database/cupcake-maker. Central to these books are the long code listings for the reader to reproduce. Here’s an example, from one of the better books in this category:

class User < ApplicationRecord
  attr_accessor :remember_token
  before_save { self.email = email.downcase }
  validates :name,  presence: true, length: { maximum: 50 }
  VALID_EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-.]+\.[a-z]+\z/i
  validates :email, presence: true, length: { maximum: 255 },
                    format: { with: VALID_EMAIL_REGEX },
                    uniqueness: { case_sensitive: false }
  has_secure_password
  validates :password, presence: true, length: { minimum: 6 }

  # …another 30 lines follow...
end

Traditionally, there are two ways to study a page like this:

  1. Type out every line of code 
  2. Copy+paste the code from their website, maybe play around and make small changes 

Approach #1 is a method that, like a lecture, causes the code to go from the author’s page to the reader’s screen without passing through the heads of either. The second is like trying to learn how to make a car by taking apart a seatbelt and stereo: you’re just toying with small pieces. Neither is a sound way to learn.

If you had an expert tutor, they wouldn’t teach you by handing you a page of code. Still, these books are what we have. How can we read them in a way that follows the principles of learning? Read on.

Mental Representations

According to K. Anders Ericsson in his book Peak, expertise is a process of building mental representations. We can see this because expert minds store knowledge in a compressed fashion. Musicians can memorize a page of music far faster than a page of random notes. Expert chess players told to memorize a board position will do much better than amateurs, but, when they make a mistake, they’ll misplace whole groups of pieces.

This is possible because music and chess positions have structure that makes them look very different from a page of random notes or a random permutation of pieces. Technically speaking, they have lower perplexity than random noise. So, even though there are 26 letters in the English alphabet, Claude Shannon showed that the information content of English is about 1 bit per letter: given a random prefix of a paragraph, people can guess the next letter about half the time.

This is why a programmer skilled in a technology can look at code using it and read through it like fiction, only pausing at the surprising bits, while the novice is laboring line-by-line. This is also why a smart code-completion tool can guess a long sequence of code from the first couple lines. With a better mental representation, understanding code is simply less work.

(How do these mental representations work? My officemate Zenna Tavares argues they are distribution-sensitive data structures.)

This is exactly what’s missing from the “just type out the code” approach: there’s nothing forcing your mind to represent the program as anything better than a sequence of characters. Yet being able to force your mind to do this would mean being able to learn concepts more rapidly. Here’s a 200 year-old idea for doing so.

The Benjamin Franklin Method

I don’t know what’s more impressive: that Benjamin Franklin became a luminary in everything from politics to physics, or that he did this without modern educational techniques such as schools, teachers, or StackOverflow. As part of this, he discovered a powerful method of self-study. I’ll let him speak for himself (or go read someone else’s summary).

About this time I met with an odd volume of the Spectator. It was the third. I had never before seen any of them. I bought it, read it over and over, and was much delighted with it. I thought the writing excellent, and wished, if possible, to imitate it. With this view I took some of the papers, and, making short hints of the sentiment in each sentence, laid them by a few days, and then, without looking at the book, try'd to compleat the papers again, by expressing each hinted sentiment at length, and as fully as it had been expressed before, in any suitable words that should come to hand. Then I compared my Spectator with the original, discovered some of my faults, and corrected them.

—Benjamin Franklin, Autobiography

This process is a little bit like being a human autoencoder. An autoencoder is a neural network that tries to produce output the same as its input, but passing through an intermediate layer which is too small to fully represent the data. In doing so, it’s forced to learn a more compact representation. Here, the neural net in question is that den of dendrons in your head.

K. Anders Ericsson likens it to how artists practice by trying to imitate some famous work. Mathematicians are taught to attempt to prove most theorems themselves when reading a book or paper --- even if they can’t, they’ll have an easier time compressing the proof to its basic insight. I used this process to get a better eye for graphical design; it was like LASIK.

But the basic idea, applied to programming books and streamlined, is particularly simple yet effective.

Here’s how it works:

  1. Read your programming book as normal. When you get to a code sample, read it over.
  2. Then close the book.
  3. Then try to type it up.

Simple, right? But try it and watch as you’re forced to learn some of the structure of the code.

It’s a lot like the way you may have already been doing it, just with more learning.

Monday, January 15, 2018

The Design of Software is A Thing Apart

Big up-front planning phases are out. Rapid iteration is in. With all this movement towards agile, it’s increasingly tempting to throw out the idea of having a separate design doc for software in favor of just getting started and having self-documenting code.

And that is a fatal mistake.

Last week, I explained that we speak about program behavior at one of three levels: executions, code, and specification. Most knowledge about software design is contained in Level 3, the specification. The flipside: the information of a program’s design is largely not present in its code.[1] I don’t just mean that you have to really read the code carefully to understand the design; I mean that there are many designs that correspond to the exact same code, and so recovering all the design information is actually impossible.

In that post, I talked a lot about the example of describing malloc and free, and how it’s quite possible to have a semantic-level notion of “freeing memory” that corresponds to absolutely nothing in the implementation. In this post, I give three more examples. The first discusses the problem of testing too much, and how the decision of what to test is in general unrecoverable from the code. The second is a question of coupling which cannot be answered from the code alone. The third shows how an identical comparison function may have different meanings in different contexts, with different expectations about how the code may evolve. None of these examples will cause your code to break, at least not today. What they will cause are the characteristic problems of bad design: fragile tests, and the risk of extra work and bugs tomorrow. The first two examples are based on real code I’ve encountered (or written!), which caused exactly those problems.

Let’s begin!

1) What should you test?

“Write the tests first,” say advocates of test-driven development, “and the code will follow.” Doing so forces you to think about what the code should do before writing it. What it doesn’t do is get you to separate the code’s goals from its implementation details.

So you update the code, a test fails, and you think “Oh. One of the details changed.” Should you update the test to compensate? If you tried to use TDD as a substitute for thinking about the design, then you probably will. Congratulations, you now have a fragile test. Writing tests in this fashion is mistake #7 in my booklet “7 Mistakes that Cause Fragile Code”.

Here’s some code and its test. Is this a good test?

Code (pseudo-Java):
public void postProfileUpdateToServer(User u, Profile p) {
  appState.saveCurrentState();
  HttpParams params = new HttpParams();
  addProfileToParams(params, p);
  HttpResult res = httpClient.post(getPostUrlForUser(u), params);
  logger.logUpdateEvent(u, p); // calls appState.saveCurrentState()
  if (res.getResponseCode() != Http.HTTP_SUCCESS)
    scheduleRetry(u, this.updateBlock);
}
Test checkPostProfileUpdateToServerAssumeSuccess (pseudocode):
  Call postProfileUpdateToServer with user Bob,
       and his profile with address changed to 123 Mulberry St
  Check that httpClient.post was called once
  Check that its first argument was
       www.website.com/callbacks/updateprofile/bob/
  Check that the first parameter is the user’s e-mail address,
       the second their mailing address,
       and the third their preferred language
  Check that appState.saveCurrentState was called twice

(How do you check that httpClient.post was called? This is quite easy to do using a mocking framework such as Java’s Mockito. I quite recommend it, as a way to unit-test functionality purely in terms of how it’s defined using other operations.)
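For concreteness, here is a minimal sketch of what such a test could look like with Mockito and JUnit 5. None of it comes from the book or the code above: the HttpClient and AppState interfaces and the ProfileService class are hypothetical stand-ins for the pseudo-Java, trimmed so the example is self-contained.

import static org.mockito.Mockito.*;

import org.junit.jupiter.api.Test;

class ProfileServiceTest {
  interface HttpClient { int post(String url, String params); }
  interface AppState { void saveCurrentState(); }

  // Hypothetical class under test, shaped like postProfileUpdateToServer above.
  static class ProfileService {
    private final HttpClient httpClient;
    private final AppState appState;

    ProfileService(HttpClient httpClient, AppState appState) {
      this.httpClient = httpClient;
      this.appState = appState;
    }

    void postProfileUpdate(String user, String params) {
      appState.saveCurrentState();
      httpClient.post("www.website.com/callbacks/updateprofile/" + user + "/", params);
      appState.saveCurrentState(); // stands in for the logger call in the pseudo-Java
    }
  }

  @Test
  void checkPostProfileUpdateToServerAssumeSuccess() {
    HttpClient httpClient = mock(HttpClient.class);
    AppState appState = mock(AppState.class);

    new ProfileService(httpClient, appState)
        .postProfileUpdate("bob", "email=...&address=123 Mulberry St&language=en");

    // "Check that httpClient.post was called once" with the expected URL
    verify(httpClient, times(1))
        .post(eq("www.website.com/callbacks/updateprofile/bob/"), anyString());
    // "Check that appState.saveCurrentState was called twice"
    verify(appState, times(2)).saveCurrentState();
  }
}

Notice that the sketch bakes in exactly the assumptions the rest of this section questions: the exact URL, the parameter ordering, and the number of saveCurrentState calls.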

There are generally two ways to write bad tests. One is to miss cases and not be rigorous enough. This is what happens when developers chase code coverage without actually thinking about what they’re testing. A common blind spot is that a programmer will test what happens when a program goes right, but not test what happens when a server’s down, when it’s fed bad input, etc.

The other is to test too much and be too specific. Is it part of the design that postProfileUpdateToServer must call appState.saveCurrentState twice, and it chooses to do so once by calling logger.logUpdateEvent? I don’t know. The test assumes a certain order about the HTTP parameters, but does the Profile object guarantee it? If the Profile uses a hash table internally, then the order is arbitrary, and may change when upgrading the libraries. Then the test will fail when upgrading to a new version of Java. (This is why Go randomizes hash table iteration order; one of my colleagues built a tool that does the same for Java.) At the extreme, I’ve seen a very experienced programmer write tests like assert(MSG_CODE_UPLOAD_COMPLETE == 12), which, given that these constants are meant to encapsulate knowledge, kinda defeats the point of having them in the first place.
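To see the hash-table point in isolation, here is a tiny, throwaway Java illustration (mine, not from the post); the Profile object above is hypothetical, so a plain HashMap stands in for whatever it uses internally.

import java.util.HashMap;
import java.util.Map;

// Iteration order of a HashMap is unspecified and has changed across JDK
// versions; a test that pins down the order of parameters built this way is
// asserting on an implementation detail, not on the design.
public class ParamOrder {
  public static void main(String[] args) {
    Map<String, String> params = new HashMap<>();
    params.put("email", "bob@example.com");
    params.put("address", "123 Mulberry St");
    params.put("language", "en");
    params.forEach((k, v) -> System.out.println(k + "=" + v));
  }
}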

The code should not follow the tests, nor the tests the code. They should both follow the design.

2) Do these pieces of code know about each other?

One nice thing about understanding the design/specification level of code is that it gives us very clean definitions of formerly-fuzzy concepts like “this code knows about that code.” For now, stick with your intuition as you answer this question:

Here are three modules. Module 1 defines a “student” record. Module 2 defines an “employee” record, and creates them by turning students into student employees. Finally, module 3 uses employee records, namely by printing them.

Oh yeah, and this example is in pseudo-JavaScript, so there’s no notion of “type” other than “the set of stuff this function can return.” Here’s the code:

Module 1 :

function makeStudent(firstName, lastName) {
  return {firstName: firstName, lastName: lastName};
}

Module 2:

function makeStudentEmployee(student) {
  var e =  student.copy();
  e["employeeId"] = nextEmployeeId();
  return e;
}

Module 3:

function printEmployeeFirstName(employee) {
  console.log(employee["firstName"]);
}

So, Module 3 accesses the firstName field of the employee value, which is defined in Module 1. Now, the question is: does Module 3 know about Module 1? That is, does the description of what Module 3 does need to be written in terms of the description of what Module 1 does?

The answer is: it depends. If the documentation of Module 2 is “Returns a student employee record with firstName, lastName, and employeeId fields,” then the answer is a clear No; it’s just an implementation detail that it obtains a student-employee record by appending to a student record. If the documentation is “Returns a student record with an additional employeeId field,” then it really is the case that you can’t know what’s in the argument to printEmployeeFirstName without reading Module 1, and so modules 1 and 3 are very much coupled.

3) What does this code accomplish?

Quick! What does this line of code accomplish?

  return x >= 65;

Two possible answers:

  1. x is an ASCII character which is already known to be alphanumeric; this code checks if it’s a letter. 
  2. x is the age of a person; the code checks if the person is past the retirement age in the US. 

In both possible answers, the code is the same. I can probably find examples where the surrounding context is the same too. If this is just a standalone function lying in a Utils file somewhere, it makes absolutely no difference to the program which I use it for; I could even use it for both. But it makes a massive difference to the evolution of the program.

The former may need to be changed for internationalization, while the latter may need to be changed following changes in US laws. The former should live alongside other functionality related to character sets; the latter alongside code relating to people or company policies. And in the former case, since there’s a bunch of non-alphanumeric ASCII characters between “9” and “A”, some crazy-optimizing programmer could change it to “return x >= 64” and then optimize it to use bit-ops like “return (x >> 6)” with no effect on the program. In the latter, that might lead to a lot of erroneously-sent retirement benefits.
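If you want to convince yourself of that last claim, here is a quick check, my own sketch rather than anything from the original example, that the three variants agree on every alphanumeric ASCII character code:

// Verifies that x >= 65, x >= 64, and (x >> 6) != 0 agree whenever x is an
// alphanumeric ASCII code, which is the assumption behind the
// "crazy-optimizing programmer" rewrite described above.
public class LetterCheck {
  public static void main(String[] args) {
    for (int x = 0; x < 128; x++) {
      if (Character.isLetterOrDigit((char) x)) {
        boolean original = x >= 65;
        boolean loosened = x >= 64;
        boolean bitTrick = (x >> 6) != 0;
        if (original != loosened || original != bitTrick) {
          System.out.println("Mismatch at character code " + x);
        }
      }
    }
    System.out.println("Done; no mismatch printed means the rewrites agree here.");
  }
}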

Meanwhile, if you were trying to prove the correctness of a program that used this function, the proof environment would not be convinced that Bob is eligible for retirement benefits just because that function returned “true” until you tell it exactly what this function is meant to do.

If you just think about the code, then the letter ‘A’ is the same as the retirement age. So focus on the design.

Credit to Ira Baxter for this example.

Writing Better Code

I gave you three examples of situations where it’s impossible to recover design information from the code alone. For the tests, there’s no solution other than to have the design written somewhere separately, so you can know whether the order of HTTP parameters is stable, or whether to rely on the number of times saveCurrentState is called. For the code examples, it’s actually quite possible to change the code so that it only corresponds to one design. For that unclear case of coupling, if you want to make it clear that modules 1 and 3 are separate, have module 2 create records with explicit fields. For that retirement-age check, just write it as “return x >= RETIREMENT_AGE”. Even if RETIREMENT_AGE and ASCII_A (usually spelled just 'A') have the exact same runtime values (for now), they represent different concepts. They are not interchangeable in the design, nor in a formal proof.
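As a rough sketch of what those two fixes could look like, here they are in Java rather than the pseudo-JavaScript above; the names StudentEmployee, RetirementPolicy, and isPastRetirementAge are invented for illustration.

// Fix for the coupling example: Module 2 constructs the record with explicit
// fields, so the documentation of Module 3 never has to mention Module 1.
class StudentEmployee {
  final String firstName;
  final String lastName;
  final int employeeId;

  StudentEmployee(String firstName, String lastName, int employeeId) {
    this.firstName = firstName;
    this.lastName = lastName;
    this.employeeId = employeeId;
  }
}

// Fix for the retirement-age example: same runtime value as 'A', different concept.
class RetirementPolicy {
  static final int RETIREMENT_AGE = 65;

  static boolean isPastRetirementAge(int age) {
    return age >= RETIREMENT_AGE;
  }
}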

This gives rise to my first rule of writing good code:

Jimmy Koppel’s rule of good code #1: Make the design apparent in the code (Embedded Design Principle).

Sounds simple, but it’s quite vague and subtle until you’ve really internalized how code is derived from a design. I’ve given multiple workshops exploring different aspects of this idea, especially around coupling.

Some things are better left unsaid

In statistics, we used to think that, if you had enough data, you could learn anything. Then the causal inference guys came along and showed us that, actually, even with infinite data, there can be many processes that generate the same observations. If you just collect data about what days there was rain in town and what days your lawn was muddy, you can’t tell the difference between a universe in which rain causes mud and one where mud causes rain. You need to know something more.

Software design is the same way: there can be many designs that correspond to the exact same code. Those who speak of “self-documenting code” are missing something big: the purpose of documentation is not just to describe how the system works today, but also how it will work in the future and across many versions. And so it’s equally important what’s not documented.

Acknowledgments

Thanks to Alan Pierce, Alex Reece, and Tej Chajed for comments on earlier drafts of this post.


[1] Note that I am considering text in a source file, namely comments and variable names, as part of the documentation, not part of the code. A long comment explaining the structure and guarantees of your program is absolutely a form of specification.

Sunday, January 7, 2018

The Three Levels of Software: Why code that never goes wrong can still be wrong

Update 3/20/2019: Previous versions of this post used the terminology Level 1/Level 2/Level 3 and "Level 3 bug" a lot. I'm diminishing this in favor of "Runtime/Concrete Implementation/Logic level" and "Error of modular reasoning," because people started using "Level 3" to mean "Everything about the code that I don't like that can't cause an incorrect output."

Have you ever stopped to consider what it means for a program to be wrong? I mean, really stopped to consider? Like, “it’s wrong if it crashes — but what if the crash conditions are only hypothetically achievable — but wait…”. Let’s get to the bottom of this. Here’s a first try:

Definition #1
A program is wrong if it runs and produces a wrong result.
(Where “wrong result” is broadly defined to include performance bugs, usability bugs, etc.)

So I put my bread in the toaster, and, instead of giving me a piece of toast, it gives me a rhinoceros. Not what I wanted. Definitely a bug (err, mammal).

A program can usually produce the correct result and still be wrong. So I get a new toaster, and I’ve never had a problem with it. Then I find out that, if I put in a thin slice of Bavarian rye when there’s a full moon out, and I’ve used the toaster exactly twice in the last 24 hours, then it will fry its circuits. Sounds like a bug. So we amend it into a second definition.

Definition #2
A program is wrong if there exists some environment, sequence of events, or other “input” under which it produces a wrong result.

Now let’s eliminate all that. I’ve patched the software on my toaster, and I’ve both proven the code correct, and had a trillion monkeys test every conceivable situation. There’s now nothing you can do short of taking a sledgehammer that will make the toaster malfunction.

It’s still possible for the code to be wrong.

How? Simple. Let’s say somewhere in that toaster, it sums all the elements of an array — but it accidentally goes over by one. Memory error? Well, it’s written in C, so it just adds on whatever is in the next address in memory. That address is a byte which is always 0 for whatever reason, so it gets the right answer every time.

The code is still wrong. It may never fail, but the reasoning for why the code should work has a hole: what should have been a simple argument of reading an array now depends on complex assumptions about the compiler and memory layout, as well as a whole-program check that that byte is always 0. And that hole can cause an actual failure in future versions of the program: someone rearranges the fields in a structure, and code that should never have been affected starts failing.

We reach our third definition:

Definition #3
A program is wrong if the reasoning for why it should be correct is flawed.

Progress! We’ve gone from a clear and simple definition to one that’s handwavy and impossible to use. Actually, it’s quite rigorous, but I’ll have to teach you some concepts from formal verification to make it more concrete.

All three definitions are correct to use at different times. This forms one of my core teachings: that, when we talk about programs, we speak at one of three levels. These three levels are:

  • Level 1: Runtime. The runtime level deals with specific values and a specific environment from a single execution of the program. A lot of debugging is done at the runtime level.
  • Level 2: Concrete Implementation/Code. At the level of the concrete implementation, we think about what the current implementation could do when given arbitrary inputs and an arbitrary environment. Behaviors that cannot happen are not considered, even if it requires global reasoning to rule them out. A lot of implementation work is done at the code level.
  • Level 3: Design/Logic: At the level of logic, we consider the abstract specification of each unit of a program. When using other units, we only consider the guarantees made by the spec, and assume they may be replaced at any time with a different implementation. Many programs which are correct when viewed at the concrete implementation level are not correct when viewed at the logical level, because they rely on behavior which is not guaranteed to hold in all future versions. We call this an error of modular reasoning, because functions with such errors lack a desirable property: the ability to argue that the function is correct only from the function’s code and from the contracts of the function's dependencies, without need to even glance at the dependencies. Most software design is done at the level of logic.

I’ve met programmers who confuse the runtime and concrete implementation/code levels, but not many. The runtime and code levels are concrete enough: you see the runtime level by running under a debugger and inspecting the stack, while you see the concrete implementation level just by reading the code and thinking about what could happen. But most programmers have a much harder time getting that there even is a logical level, and I’ve seen much confusion arise when one programmer is talking about what a component is guaranteed to do, and the other is talking about what it happens to do. Most programmers are taught the difference between interface and implementation at a high level. But very few get to see the full details of what defines an “interface” beyond just “list of functions.” That can only be seen when doing formal verification, where a program’s properties and assumptions are written just as concretely as its source code. In everyday development, all that structure and reasoning is still there; it’s just scattered across documentation, comments, and programmers’ heads.

Statements at different levels do not mix. So if the proposed client/server protocol says the client should send a request twice and discards the first result (a Level 3/design-level statement), and the designer tells you it’s because there are three different kinds of request handlers in the codebase, and Bob’s sometimes gets it wrong the first time (a Level 2/implementation-level statement), you should get confused. You should be as confused as if someone wanted to call a file or write to a function.

Isn’t our goal to deliver working software to customers, and so correctness at the concrete implementation level is all that’s important? No, our goal is to be able to continue to deliver working software far into the future. The logical level is all about how modular the interactions between the components of your software are, and that’s important if you care at all about having different versions of those components, like, say, if you wanted to rewrite one of them tomorrow. Joel Spolsky relates how the original SimCity had a use-after-free error. At the concrete implementation/code level, this was totally fine, since freed memory in DOS was valid until the next malloc, and so the program worked. At the logical level, this was a defect, because the spec for free says you need to act as if memory gets eaten by a dragon as soon as you free it, and any future free implementation may actually eat it. Sure enough, once Windows 3.1 rolled around with a new memory manager, SimCity would start crashing. Microsoft had to add a special case to check if SimCity was running and switch to a legacy memory manager if so.

People sometimes tell me about how software is so easy and you can just have an idea and make it and it’s so cheap because there’s nothing physical to build. Hogwash. Software is a domain where we make decisions that can’t be undone and have to be supported for all eternity. The HTTP “referer” is forever misspelled, and that SimCity special-casing code is still there 30 years later.

This is why it’s important to get programs right at the logical level, making it so you can argue each component is correct independently of the rest of the program, even if the overall program passes all your testing and meets external requirements.

And, lesson for API designers: This is why it’s important to make your APIs conform as strictly to the spec as possible, at least in Debug mode. This all could have been avoided if DOS’s free were to deliberately zero-out the memory.
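As a loose illustration of that lesson, here is a hypothetical Java analogue, not anything from DOS or SimCity: a handle-based pool whose release deliberately poisons the released buffer and whose accessor fails fast, so callers relying on unspecified after-release behavior break immediately rather than by luck.

import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resource pool that enforces its spec aggressively, in the
// spirit of a "debug mode" free that destroys what it releases.
final class HandlePool {
  private final Map<Integer, byte[]> live = new HashMap<>();
  private int next = 0;

  int acquire(int size) {
    live.put(next, new byte[size]);
    return next++;
  }

  byte[] get(int handle) {
    byte[] buf = live.get(handle);
    if (buf == null) {
      throw new IllegalStateException("use after release: handle " + handle);
    }
    return buf;
  }

  void release(int handle) {
    byte[] buf = live.remove(handle);
    if (buf != null) {
      Arrays.fill(buf, (byte) 0xDD); // poison, so stale references read garbage, not old data
    }
  }
}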

I now have two years’ experience teaching engineers a better understanding of how to avoid complexity, improve encapsulation, and make code future-proof. But ultimately, that knowledge all stems from this single master insight: the most important parts of our craft deal not with code, but with the logic underneath. The logical layer may be hidden, but it’s not mystic: we have over 40 years of theory behind it; see the postscript below for a taste of the details. When you learn to see the reasoning behind your system as plainly as the code, then you have achieved software enlightenment.

Next time: why this implies keeping design and implementation separate.

Postscript: Technical Details

I could write a book on harmony and composition, but you’d learn more from listening to a song and seeing the sheet music. Similarly, I could ramble on about reasoning and assumptions, but to truly understand it, you need to see the objects under study. The three levels deal with different views of a program: executions, code, and specifications. Each corresponds to its own kind of reasoning.[1] Let’s bring in the math!

Level 1: Traces and states

The objects of study of Level 1, the runtime level, are traces and states. A trace is a sequence of events that occurred in a program execution. A trace looks something like this:

Enter method postProfileUpdateToServer
Read field profile to local variable p
Enter method saveCurrentState
Leave method saveCurrentState
...

Traces can be very high level, like which microservices get run, or very low level, involving instruction scheduling on the CPU. They tell you exactly what happened, and give the information to construct the current state.

The state is a collection of cells with their current value. It looks like this:

{ p=Profile(name="Bob", id=42, ...),
  this=NetworkClient(socket=..., baseUrl="http://mywebsite.com/api/"),
  __messages_output=["Initializing MyApp version 5", "Welcome, Bob!", ...],
  ...
}

Most often, the state just consists of values in memory, though it can be helpful to also include pieces of the environment, or a list of what’s already been output or sent over the wire.

Any statement that can be phrased in terms of specific traces and states is a runtime-level statement. These constructs should be familiar to programmers: printouts let you view fragments of a trace, while a debugger lets you observe the current state.

The corresponding manner of automated reasoning is ground reasoning. Ground reasoning means reasoning only about concrete values with no quantifiers. So, suppose I have this code:

1:       left = x - 10;
2:       if (left < 0)
3:         left = 0;
4:       right = min(x+10, 100);
5:       print(right);

Say I know from the printout that right=50 in a given trace, and I want to know whether it’s possible that the branch on line 3 executed. I could answer this question by asking a solver to find a satisfying assignment for the following formula:

left<0 ∧ (left=x-10
          ∧ right=(if x+10<=100 then x+10 else 100)
          ∧ right=50)
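For a toy formula like this one you don’t even need a solver; the following brute-force sketch (mine, not the post’s) searches a range of x values for an execution where right is 50 and the line-3 branch ran, which is the same question the solver answers.

// Brute-force stand-in for the solver query above: is there any x for which
// right ends up 50 (the value seen in the printout) and the branch on line 3
// (left < 0) also executed?
public class GroundCheck {
  public static void main(String[] args) {
    for (int x = -1000; x <= 1000; x++) {
      int left = x - 10;
      boolean tookBranchOnLine3 = left < 0;
      if (tookBranchOnLine3) {
        left = 0;
      }
      int right = Math.min(x + 10, 100);
      if (right == 50 && tookBranchOnLine3) {
        System.out.println("Possible, e.g. x = " + x);
      }
    }
    System.out.println("Search finished.");
  }
}

Here the search comes up empty: right=50 forces x=40, which makes left=30, so the branch on line 3 could not have executed in that trace; the formula is unsatisfiable.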

Level 2: Code

The object of study of Level 2, the level of concrete implementation, is the code. Yes, code, the stuff you work with every day. Any statement that can be phrased in terms of the code, but not a specific trace or state, is a code-level statement. Let’s look at the following example, which computes a damage bonus in a hypothetical game:

public double computeDamageBonus(int creature1AttackSkill,
                                 int creature1ArmorPiercing,
                                 int creature2DefenseSkill) {

  int adjDefense = creature2DefenseSkill - creature1ArmorPiercing;

  if (adjDefense <= 0) {
    return 10;
  }

  double factor = (double)creature1AttackSkill / 
                     (creature2DefenseSkill - creature1ArmorPiercing);

  if (factor > 10)
    return 10;
  else
    return factor;
}

By picking specific inputs to this function, it will yield a trace and a sequence of states. But this program encodes an exponentially large number of possible traces and states. At the runtime level, we can ask “Did this execution experience a division-by-zero error?” At the concrete implementation level, we can ask “Is there any execution of this function that experiences a division-by-zero error?”

The corresponding manner of reasoning is first-order logic. This means we can write down formulas that say “for all inputs X, does this property hold?” or “does there exist an input Y, such that this function crashes?” What we can’t do is quantify over other functions. Here’s a first-order formula that states that the computeDamageBonus function can never have a division-by-zero error:

∀ creature1AttackSkill, creature1ArmorPiercing, creature2DefenseSkill.
  (adjDefense = creature2DefenseSkill - creature1ArmorPiercing
    ∧ ¬(adjDefense ≤ 0))
      ⟹ creature2DefenseSkill - creature1ArmorPiercing ≠ 0

Level 3: Specifications

The object of study of Level 3, the layer of logic, is the specification. I also call this the design level, because most software engineering terms, such as modularity and encapsulation, can only be defined in terms of the logic and specification. There are many, many ways of writing specifications, but a popular one is the Hoare triple: preconditions and postconditions. Here’s one for malloc:

malloc(n)
Pre: n > 0
Post: retval ≠ NULL ∧ alloc(retval, n)

At the concrete implementation level, we could ask questions about a specific implementation of malloc, like “How much memory overhead does it use?” At the logical level, we can ask questions about all possible implementations of malloc, like “does this program have a memory error?”

The corresponding manner of reasoning is higher-order logic. Higher-order logic is like first-order logic, but we can now quantify over functions. First, let’s translate the spec above[2]:

MallocSpec(m) = ∀n. n > 0 ⟹ m(n) ≠ NULL ∧ alloc(m(n), n)

Now, if we hypothetically had a formal specification for the PlaySimCity function, a spec that it works for any implementation of malloc would look something like

∀m. MallocSpec(m) ⟹ SimCitySpec(m, PlaySimCity)

And that, my friends, is modularity reduced to a formula.

A really big thing about specifications is that they can involve properties which do not appear in the code at all. We defined malloc in terms of this alloc predicate, which has no intrinsic meaning other than “stuff returned by malloc.” But what this does is let us relate malloc to other operations. We can give the act of dereferencing pointer x a precondition ∃a,n. a≤x<a+n ∧ alloc(a,n), and give the free(x) function a spec that destroys the predicate alloc(x,n). Now, having a proof of alloc(a, n) when reasoning about the program means “this memory was returned by malloc, with no intervening free” — exactly our intuitive notion of memory being allocated!
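For concreteness, one way to render that free spec in the same Pre/Post style as the malloc spec above might be the following; this is my sketch, not the post’s, and like the malloc spec it glosses over the ownership details that separation logic (see footnote 2) handles properly.

free(x)
Pre:  ∃n. alloc(x, n)
Post: ∀n. ¬alloc(x, n)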

What we’ve shown here is that “this memory is allocated” is a specification-level notion which is independent of the code. And indeed, there may be nothing within the program to indicate that a piece of memory has been allocated.  It’s possible that being allocated corresponds to some internal data structure of the memory manager, but it’s also possible that your code will be compiled for a machine with infinite memory.

Now I can state exactly in what sense the SimCity code was wrong: A precondition of dereferencing a pointer is that the pointer is to allocated memory. They tried to dereference a pointer without meeting that precondition, i.e.: no proof of alloc(a,n). So, their code ran fine for one implementation of malloc, but, as they sadly learned, not for all of them.

When designing software, I always recommend trying to think in pure concepts, and then translate that into the programming language, in the same way that database designers write ER diagrams before translating them into tables. So whether you’re discussing coupling or security, always think of the components of your software in terms of the interface, its assumptions and guarantees, and mentally translate them into formulas like the ones above. So much becomes clearer when you do, for logic is the language of software design.


Acknowledgments

Thanks to Elliott Jin and Jonathan Paulson for comments on drafts of this post.


[1] Don’t get too hung up on the analogies between the levels of software and the modes of reasoning; there are plenty of exceptions. For instance, much of the progress in program analysis/verification/synthesis research comes from finding all sorts of tricks to encode more complicated problems into a form that can be solved by ground reasoning, since we have good solvers. This likely isn’t going to be too relevant to you unless you work in programming tools.
[2] I am lying slightly in this example — the kind of logical formula I gave is really just meant for pure functions. To handle malloc properly, we’d want to use separation logic.