Thursday, May 16, 2019

Book Review: Practical TLA+

I want to make something clear: I am not, and have never been, an advocate for formal verification of production systems. Sometimes people think I am because I talk about ideas from that space, but my perspective has always been to use knowledge about how machines reason about code to better understand how humans reason about code. There’s a small core of people pushing the technology, and a smaller core pushing for using it today, but anyone who works near that space has watched adopters get burned repeatedly. It’s thus become a trope for some engineer to gush enthusiastically about verifying their system while an expert explains all the reasons why it’s a bad idea.

Fortunately for the world, Hillel Wayne is one of those advocates. Hillel is exactly the sort of person whom common wisdom dictates would never need formal methods. When you think of a formal methods guy, if you think of anything at all, you’re probably picturing an ivory tower researcher or someone who works on airplanes. Instead, Hillel was at eSpark Learning, writing software to help schools manage their iPad collections, the kind of stuff where “just listen to your users and write your tests” is supposed to be enough to make your software grow big and strong.

Unfortunately, reality did not cooperate.

Having been failed by mainstream design techniques, they searched for anything that could help them. Salvation came in the form of model checking, a technology for specifying systems at a high level and then exhaustively checking whether some property is true over all states. Specifically, they picked TLA+, the specification language developed by Leslie Lamport. Bolstered by this success, Hillel left his job last year to become a professional TLA+ teacher and evangelist. A few months ago, he released his book “Practical TLA+,” which has the dubious distinction of being the first formal methods book with “Practical” in the title.

As a disclaimer, Hillel and I discovered each other’s writings last year, and have become friends since learning we might be the only two bloggers who are fascinated both by SAT solvers and the history of UML. And so, when his book came out...hmmm, how does it go? “I received a free copy of this product in exchange for my honest review.” To compensate, I’m now incentivized to be extra-harsh.

Practical TLA+ is a programming book, the kind that makes great fodder for the Benjamin Franklin method. After a tour-de-force showing how you’d use TLA+ to catch mistakes in an errant wire-transfer protocol, the remainder of the first half is a lot like what you’d get in an intro book on Python or Scala: teaching you variables, then functions, then concurrency. Of course, all of these things work differently than what you’d find in Python, as in Python you can’t say “let S be the set of all integers” and expect anything useful to happen. And Practical TLA+ is still shielding you from the really crazy stuff, as these chapters really just teach PlusCal, a friendlier language that translates into TLA+, in which you say “x := 1” instead of “let S be a state and S' be another state which is the same as S except that x = 1,” with that stuff being left to longer books such as Lamport’s Specifying Systems.

The second half of the book is dedicated to applying TLA+ to an ever-more-sophisticated set of believable examples: first data structures and state machines, then a library and its loan policy, and then managing a MapReduce cluster. Pretty neat.

But I don’t care about that. I’m here to ask one question: Will reading this book make you a better programmer?

Why is this language different from all other languages?

If you’re coming from Haskell or OCaml or even TypeScript and you hear that TLA+ is about correctness, then you’re in for a culture shock. Leslie Lamport is in the camp of verification people who like state machines and dislike types, and some days I think he created TLA+ specifically to troll aficionados of typed functional programming. These are people ever in pursuit of better static checking and more abstraction. Then they make it all the way to doing full verification of high-level models, and wind up at TLA+, an untyped language which likes to represent everything as sets of integers and strings, and where all the errors come as Java stack traces.

Once you’ve gotten over the idea that a server might just be represented as the string “server1,” there are two things someone coming from a traditional programming language should know about TLA+.

The first is that you’re building models, not implementations. Imagine building a system for managing corporate bank accounts, in which each organization has many users, but only one user at a time can control the account. An implementation describes a system in enough detail that a machine can execute it. The state of an implementation might look like this: “A user has a name, an id, a company, a role, and an avatar. A company has a name, a list of accounts, a logo, and billing information. For each account at a company, at most one user may manage it. There is one server thread per logged-in user. Each thread may be on one of 100,000 lines.” Suppose someone asks a question about the system, like “Is it possible for an account balance to go negative?” To find the possible failure scenarios, you’ll be tempted to trace through the code: “gosh, well, if two people at the same company send a wire at the same time, but only one person per company is authorized to send a wire — but wait, if you change permissions while sending a wire — but wait, doing those at the same time might not be possible because they both access...ahhhh!”

The way to turn an implementation into a model is to remove details until it sounds like a toy: “There is a set of companies C, a set of users U, and a mapping of users to permissions.” Instead of implementing logic for managing sessions and handling web requests, you just say “At each time step, some user performs an arbitrary action.” It looks like this:

with u \in IdleUsers, a \in Accounts do
  either
    call SendWire(u, a);
  or
    call ChangeAccountPermission(u, a);
  end either;
end with;

With the system abstracted to a few pieces of state and high-level actions, we ask again: can an account balance go negative? The TLA+ way to find out is to set the set of companies to {a,b,c}, consider all possible assignments of account balances to numbers 0-10, and let TLA+’s model checker, TLC, brute-force all possible states. In this case, TLC found a 12-step trace that leads to account overdraft: one user goes to send a wire, but, after checking account balance and before completing the wire, another user on the same account changes himself to the admin, and sends another wire. The full example is available here. While this is a contrived example, there are many real cases where TLC’s exhaustive checking has found concurrency bugs that survived multiple levels of conventional testing.
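The brute-force idea itself is small enough to sketch in ordinary code. Here is a toy Python search in the same spirit (my own miniature with made-up numbers, not TLC and not the book’s wire-transfer spec): two users share an account with balance 5, each may send a wire of 4, and a wire takes two steps, checking the balance and then deducting.

```python
from collections import deque

AMOUNT, INITIAL = 4, 5  # wire amount and starting balance (made up)

def successors(state):
    balance, stages = state
    for i, stage in enumerate(stages):
        if stage == "idle" and balance >= AMOUNT:
            # step 1: the user checks the balance and sees enough money
            yield (balance, stages[:i] + ("checked",) + stages[i + 1:])
        elif stage == "checked":
            # step 2: deduct, even if another wire raced in between
            yield (balance - AMOUNT, stages[:i] + ("done",) + stages[i + 1:])

def find_overdraft():
    init = (INITIAL, ("idle", "idle"))
    seen, queue = {init}, deque([[init]])
    while queue:
        trace = queue.popleft()
        if trace[-1][0] < 0:          # invariant violation: overdraft!
            return trace
        for nxt in successors(trace[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(trace + [nxt])
    return None                       # invariant holds in every state

trace = find_overdraft()
assert trace is not None and trace[-1][0] == -3
```

Breadth-first search over the (tiny) state space plays the role of TLC’s exhaustive exploration, and the returned trace is the counterexample: both users check the balance before either deducts.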

The second thing to know about TLA+ is that its syntax makes no sense. As Hillel himself points out, TLA+ is full of inconsistencies and gotchas. One of these gotchas had me completely stuck in the first chapter, until I asked Hillel to bail me out.

For instance, imagine you’re running an event with 100 guests, and you want to track what each of them needs to do before they can enter. So, you put the guests into a set called guests, create an outstanding_tasks array, and initialize it to [guests |-> {"check_in", "pay_fee", "sign_waiver"}]. What you’ve actually just done is create a map with a single key, that key being the string “guests”. So, you try changing the |-> to a ->, which instead gives you a set of 3^100 functions, each of which maps each of the hundred guests to either “check_in”, “pay_fee”, or “sign_waiver”. What you actually need is [x \in guests |-> {"check_in", "pay_fee", "sign_waiver"}]. As Hillel warns, you will mess this up multiple times.
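A loose Python analogy (my own, not from the book, and with three guests instead of a hundred) may make the gotcha concrete:

```python
guests = {"alice", "bob", "carol"}
tasks = {"check_in", "pay_fee", "sign_waiver"}

# [guests |-> tasks] builds a map with ONE key, the string "guests":
one_key_map = {"guests": tasks}

# [x \in guests |-> tasks] maps EVERY guest to the task set:
per_guest = {g: tasks for g in guests}

# (and [guests -> tasks] would be the SET of all functions from guests
#  to tasks -- 3**100 of them in the book's hundred-guest example)

assert set(one_key_map) == {"guests"}
assert set(per_guest) == guests
```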

Using PlusCal instead of raw TLA+ makes things worse: you’ll literally be writing all your code inside a comment block and pressing a button to generate the TLA+. If you’re lucky, you’ll get an error in your PlusCal source instead of in the generated TLA+.

Executable Pseudocode?

Back in the day, model checking had been sold to me as a way of automatically finding bugs in programs. Imagine my disappointment when I learned it actually focused on, well, models, and would leave the actual program unchecked. The examples in this book finally convinced me that building a model of the system separate from the program is quite valuable, enabling one to “play with the design.”

In particular, the library example in the penultimate chapter really drove home for me the value of TLA+ modeling. In that example, you are trying to design the checkout and reservation policies for a library. In its model, there is a set of users who each want a set of books, and, every time step, one of them does an action based on these preferences. With every reservation policy, there are ways for a jerk to abuse the system, such as by reserving the same book over and over again so that no-one else can read it. With TLA+, it’s easy to check the property: if a user wants a book, then she eventually gets it.

This use of models — expressing all possible ways for the system to interact with users — really made the benefits of TLA+ click for me. And then I realized how I could have used it in the past.

My old company Apptimize builds software for A/B testing mobile apps. With Apptimize, if you want to know whether changing a button size in your app would increase upgrades, you could create the change by clicking around in a browser, and 10 minutes later have multiple variants running on a million people’s phones. Statistical validity is very important, and, with the dual state machines of Apptimize users starting and stopping and changing experiments, and app users uninstalling and reinstalling apps, there’s a lot of room for statistical error.

We had been finding bugs where you could enroll 5% of users in an A/B test, and then bump it up to 50%, and then down to 5%, and the chances of someone being unenrolled would be slightly too low. To catch one of these bugs, we’d have to think of a scenario, take the formulas from the code, and then do probability calculations.

After fixing a couple of these, we decided that allocating users to A/B tests this way was far too dangerous, and moved more of the decisions server-side. But, with a bit of finagling to fake the probabilistic aspects, we could have gotten it perfect with TLA+.

I’ve taught many engineers techniques that make it much easier to implement some policy without mistakes. But for showing that a 100%-correct implementation can’t be abused by someone trying to cheat the system — for that you need something like TLA+.

My Model of What You’ll Learn from Learning Models

The above example illustrates the key lesson I think a system developer can take from learning TLA+: how to reduce a system to a model, and what questions you can ask of a model. Reducing the system to a model requires another idea unfamiliar to many programmers: how all the questions about timeouts and schedulers and inputs boil down to “and here a nondeterministic action may occur.”

In abstracting a system to a model, the system stops acting like a program, and new mental tools are needed. It’s here that Practical TLA+ falls short. Practical TLA+, despite the name, is a PlusCal book.¹ Leslie Lamport created PlusCal to make specifying state machines look like imperative programming. In doing so, one loses some of the concepts that are truly new.

For instance, one thing you can do with models is to have multiple models of the same system at different levels of detail, and show that one refines the other. This evolved from an idea Niklaus Wirth introduced in his 1971 article “Program Development by Stepwise Refinement,” where he showed how to rigorously evolve a high-level idea with steps like “pick an arbitrary square on the chessboard” into an actual implementation. Many of my students find this idea revelatory, and TLA+ offers a chance to actually play with it in a tool rather than with pencil and paper. Hillel repeatedly introduces refinement only to explain that he won’t teach it: you need raw TLA+, not PlusCal.

Similarly, PlusCal’s imperative veneer leads to some confusing behavior around guards, expressed as PlusCal’s await keyword, which are half of understanding how to model systems with nondeterminism. You might look at the snippet x := 1; await y = 0 and think “it sets x to 1, and then blocks until something sets y to 0.” In fact, TLA+ actually tries to run both of these steps simultaneously, meaning that, in a state where y is not 0, the x := 1 can never run. This is most easily understood by looking at the actual TLA+ state formulas, which requires a mindset shift from imperative programming (“Give me an action!”) to state machines (“In this state, these transitions are possible”).
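A Python caricature (my own, and a drastic simplification of the real semantics) of the difference: the guard and the assignment together form one atomic transition, which is enabled only when the guard already holds.

```python
# One guarded transition: "x := 1; await y = 0" as a single atomic step.
def step(state):
    x, y = state
    if y == 0:          # guard `await y = 0`: enables the whole step
        return (1, y)   # `x := 1` happens only within that same step
    return None         # guard false: the step is not enabled at all

assert step((0, 1)) is None    # y != 0, so the x := 1 never runs
assert step((0, 0)) == (1, 0)
```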

The final idea in this book useful to general system-building is temporal properties. Most programmers are taught about functional testing: if the program gets this input, then it gives this output. However, systems also care about temporal properties. These are properties about sequences of events, like: “If a user orders a pizza, they eventually receive it, or get their money back.” This property would be written in TLA+ as something like “Order pizza ⇒ ◇(receive pizza OR money back)”, where “◇” is the “eventually” operator. One chapter of Practical TLA+ is spent teaching this temporal logic, and I think having a language for temporal properties would be useful to any system designer. However, the book gives very little practice in expressing temporal properties. There are also many temporal properties that TLA+ cannot express, such as “Until an action is committed, a user may abort the action at any point.”
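To make the shape of such properties concrete, here is a toy checker (my own) for a “leads-to” property over a finite trace; real TLA+ temporal operators are defined over infinite behaviors, so this is only an approximation of the idea.

```python
def eventually(pred, trace):
    # "<>pred": some state in the trace satisfies pred
    return any(pred(state) for state in trace)

def leads_to(p, q, trace):
    # "p ~> q": every state satisfying p is followed, then or later,
    # by a state satisfying q
    return all(eventually(q, trace[i:])
               for i, state in enumerate(trace) if p(state))

trace = ["order pizza", "baking", "receive pizza"]
assert leads_to(lambda s: s == "order pizza",
                lambda s: s in ("receive pizza", "money back"), trace)
```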


Practical TLA+ is the first of its kind: a practical programming book about formal verification. It is still a rough book. Whenever a snippet of code repeats, the changes are supposed to be marked in bold, yet I’d often find the bolding missing. He once introduced a piece of syntax, and then didn’t use it till 100 pages later. I am not a typical reader of the book, seeing as I took my first model-checking course at age 18, and had the author on instant messenger the whole time. For everyone else, there are a lot of places to get tripped up. He did create a forum for TLA+ users, so I can still recommend this book to prospective TLA+ users looking to start in the next year, and tell everyone else to wait for the second edition.

For the programmer looking for mind expansion but not code verification, I don’t think this book has much to offer. The benefit of cross-training comes from seeing ideas that are a subtle part of one discipline become a blatant part of another. Just as it’s easier to understand harmony by playing piano than by singing in a choir, new paradigms are valuable when they take concepts that are implicit in software design and give them syntax and compiler errors. For TLA+, these concepts are: modeling, nondeterminism, and temporal correctness properties. However, Practical TLA+ gives you little practice writing correctness properties, and does its best to hide the most mind-expanding parts of modeling (refinement) and nondeterminism (guards) from the reader. It does what it set out to do: turn formal verification into ordinary programming.

But, nonetheless, I can say that I went through Practical TLA+ in an hour a day for 10 days, I had fun doing so, and — hey — now I know TLA+.

Rating: 3.5/5

UPDATE 5/17/2019: Ron Pressler writes a response here. I had knowingly been loose on terminology in much of this review, and he called me out on a lot of it.


Thanks to Hillel Wayne, Chris Khoo, and Elliott Jin for their feedback on this post.

¹ Hillel was adamant in saying that PlusCal is TLA+. My take: In terms of getting (almost) all the power of TLA+, yes, PlusCal is TLA+. In terms of being able to write code that closely maps to how it actually works, which is important for building a mental model, no, it is not.

Sunday, December 9, 2018

My Strange Loop talk: You are a Program Synthesizer (video + transcript)

In my day job, I work with programs that write, analyze, and transform other programs. You can't do this unless you have some special insight into how programs work. Much of my night job is finding these insights and making them accessible to non-specialist programmers. A few months ago, I spent a week in St. Louis, where I gave my first industry conference talk, at Strange Loop on this topic, as well as my first academic conference talk at ICFP: Capturing the Future by Replaying the Past.

I missed most of both conferences to finish preparing this talk, but it was worth it: this is the talk I've wanted to give for two years.

Video: You are a Program Synthesizer


This transcript was quite something to prepare! Including a few omitted slides, the slide deck for this talk totals over 100 slides, and over 350 transitions. For the most part, I present a slide before the accompanying speech, and show only a few key frames from each animation.

Hello, my name is Jimmy Koppel, and I live a dual life. By day, I work at MIT. I'm a researcher. I work on program synthesis, programs that write programs. By night, I lead workshops, teaching software engineers like yourselves how to write better code.

Today, I'm going to talk about something from the intersection of both worlds.

Software engineering advice today is confusing. For anything you read online, you'll find someone else arguing the opposite. Today I want to ask the question, how can we find a source of truth in software engineering advice?

This whole situation reminds me of another source for truth: how do you prevent scurvy?

Scurvy is an ancient disease caused by Vitamin C deficiency. After thousands of years of deaths, finally, in the 1700s, a British scientist named James Lind proved the cure in an experiment: eat citrus. And so, by the end of the century, the Royal Navy had started requiring all of their sailors to drink lemon juice. Scurvy rates went from crippling to almost none.

But then in the 19th century, they lost the cure. They didn't understand it well enough. They thought, "Lemons? Why not limes?" Big mistake. As a result, there were multiple voyages to the poles as late as 1910 in which everyone got scurvy and people died.


Now imagine that you are an 18th century sailor and you ask the powers that be, "How can I prevent scurvy?" The answer comes, "Eat citrus… but wait! Lemons, not limes. And wait — do not boil it. And wait — do not store the juice in copper. If you store it in copper, it's worthless." This is kind of starting to sound like witch doctory.

From this, I coin the term citrus advice. The advice to eat citrus is good advice. It saved people’s lives. But it was not precise enough to capture the reality, and so it came with this long list of caveats that made it hard to use.

The precise advice is to get vitamin C. And while it took about 200 years to get from citrus to vitamin C, this is the ideal to be aiming for: the simple thing that explains all of the caveats.

Now compare software engineering advice.

Guru: Discover abstractions to make your code shorter. [Paul Graham]
Student: Okay, I abstracted my code.
Guru: No, that is the wrong abstraction. Duplication is better than the wrong abstraction. [Sandi Metz]
Student: Well, then how do I know if an abstraction is good design?
Guru: Well, you see, good design leads to simplicity. [Max Kanat-Alexander]
Student: Okay, I simplified my API.
Guru: No, you broke your API, and if you break your API, that creates complexity for other programmers. [Max Kanat-Alexander]

So, this advice: not so easy. It's citrus advice.

I think that if you were to take these three people — these are all close to but not actual quotes by these people — if you were to show them the same piece of code and ask the same question, you would get similar advice. This goes to show you that they have some deeper intuition about some deeper reality, that these words are not precise enough to capture.

So how do we get to more precise engineering advice?

Many years ago, I got really into program synthesis. Program synthesis is amazing. It's this program where you just give it stuff, you tell it what you want to do: on this input it should do this, maybe give it a little bit of information about what the program should look like, and it thinks for a while with its algorithm, and it gives you a correct program. It's cool, but a really nice thing is: you cannot BS a synthesizer. You cannot BS a synthesizer, and you cannot BS a synthesizer.

If you ask the programmer, "Hey Bob, if I do this thing, will it make it easier for us to program later on," it can be a pretty vague question, hard to answer. But if you ask, "If I do this this way, will it make it easier to synthesize the algorithms on paper," you can just look at it and get an answer. And if you want to know, if I have this data format, of the millions and millions of possible pieces of code that I can write, is there anything that's clean? You can just try it, give it a million examples, get an answer: no.

So we're going to look at the parallels between synthesis and human programming. We're going to dive into the different schools of synthesis. Broadly speaking, there are three kinds. There are deductive synthesizers — theorem proving, logic. There's enumerative, searching very cleverly through large spaces of programs. This is how your database optimizes your SQL queries, by the way. And there's constraint-based, using constraint solvers. In the interest of time, we're mostly going to talk about the deductive today, a little bit about constraint-based, and no enumerative.

But there is a very important fourth school of synthesis. In fact we have a few hundred examples of them in this room. That is: you.

You are a synthesizer. You write programs. And while you might work a little bit differently than these other algorithms — you do not have a constraint solver in your head, although I do think you do work a bit like the deductive synthesizer and do that kind of deductive reasoning — still, there's only one universe of possible programs, only one universe of possible algorithms for writing programs. And so by looking at these other schools of synthesis, we can learn insights that we can take into our own programming. And that is the thesis of today's talk.

And throughout our journey to synthesis land, we will see three common recurring themes of abstractions, constraints and logic.

Enter deductive synthesis. At a high level, deductive synthesis is about taking a high level idea and through a series of mechanical transformations, making it more and more refined and precise until you reach an implementation. So what is the information a deductive synthesizer works with? Let's look at this.

When we talk about software, we can broadly categorize what we're saying at one of three levels. Here's a max function. You can talk about the runtime values: what does it do on specific inputs? Or, we can generalize over all inputs and talk about the code itself. So runtime's level one; code, level two. But this is not capturing the why of the code. It doesn't explain why it works. It does not explain how you came up with it. That information is in the logic, in the derivation or proof. And I call this “Level Three.”

As you might guess from the relative size of these boxes, the most information lies in level three. This is the why. This is the stuff that's only in your head normally. But to a synthesizer, this is just as concrete as the code it outputs at the end. So let's look at it from the synthesizer's perspective.

Some quick background: here's what a deductive synthesizer looks like. It has some goal, like “I need to get an integer which is the maximum of two other integers,” and you write down a logical formula for what that means, specified in some way. And so our goal is: we're given two integers, and after running the program, the condition “it's the maximum of these two inputs” will be true. Its tools: it has a bunch of rules that relate a piece of code to the abstract logical meaning of the program. And now it can do a search.

So we want this maximum function. Okay, let's try running the if rule. We know the outer skeleton’s an if, and we have two subgoals to prove.

We go into the search; maybe we try the if-rule again. Okay, let's try another rule. Another rule. Okay. Maybe using if there was not such a good idea.

It's trying an assignment instead. Ah, that works. There's my assignment. Now I know a little bit more about the program, and it keeps going like this and eventually it gets the actual program.

So that was a 40-year-old synthesis algorithm in a nutshell. But you'll also see this idea in a lot of newer systems. If you were at Nadia's talk an hour ago, you saw her talk about a deductive synthesizer called Synquid. Nadia was a postdoc in my lab; the world of synthesis is very small.
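To feel the “search through a universe of programs” point in runnable form, here is a deliberately naive enumerative toy in Python (my own, not the deductive algorithm just described, and not how any of the systems named here work): brute-force every program in a tiny grammar until one matches the examples.

```python
from itertools import product

# Tiny program space: a, b, or (if cond then leaf else leaf).
leaves = [lambda a, b: a, lambda a, b: b]
conds = [lambda a, b: a < b, lambda a, b: b < a]

def candidates():
    yield from leaves
    for c, t, e in product(conds, leaves, leaves):
        # bind c, t, e via default args so each lambda keeps its own
        yield lambda a, b, c=c, t=t, e=e: t(a, b) if c(a, b) else e(a, b)

# Specification by examples: the output is the max of the two inputs.
examples = [((1, 2), 2), ((5, 3), 5), ((4, 4), 4)]

def synthesize():
    for prog in candidates():
        if all(prog(*xs) == out for xs, out in examples):
            return prog

mx = synthesize()       # finds: if a < b then b else a
assert mx(7, 9) == 9 and mx(9, 7) == 9
```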

Here are a couple of other systems: λ2 and Fiat. Note that all these only do small functional programs because synthesis is very hard. This is why instead of actually using this today, I'll tell you to not use it. Instead learn the insights, and use them to improve the algorithms inside your head. By looking at things through this lens, deductive synthesis, we can come up with some deeper insights into code.

Let's look at specification-level notions. I'm running a web app and I need sanitized strings; no XSS attacks. So, to sanitize a string, I escape all single quotes. And now I have two choices in how to use this. Before I save my data, it must be sanitized, and you can do that either by calling this sanitize function that I just wrote or by escaping the quotes by hand. Take a few seconds to look at these, and decide whether you prefer Option 1 or Option 2.

So, when I've shown this code to people before, a plurality prefer option one, a slightly smaller number have no opinion, and a small minority prefer option two. In a minute, I'll give you a sense in which option one is a correct answer (under some interpretations).

Well, how else might you have come to this conclusion? You might've used the folk engineering advice: this one's more abstract, but the other is more direct; this one's harder to see what's happening; it's more work to understand; it's easier to change. And when you're working at this level, it's basically like two lawyers at the podium arguing their sides, and both of them have case law to support them. It's hard to get a real right answer. Even if you do, it might not be good enough.

So here's an engineer that I used to train, let's call him Steven. And his head is full of citrus advice.

He looks at this and says, "Option one, because it centralizes the code." Good job, Steven. But his understanding of information hiding was not deep enough, because five minutes later, I gave him a similar example:

So this time, I have my Twitter box and I type in 139 characters and it says: too long. What?

Because: it escaped my string, and it double-counted every single quote, because every single quote became two characters. So let's not do that. Let's un-double-count the single quotes. (And I'm very glad that the Python standard library has this count function. It made this slide easier to write.) But I propose instead of doing that, we abstract this “counting pre-sanitized string” into a function. Steven looks at this and he says, "No. Option one is over engineering. This is premature abstraction. EVIL!"
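Since the slides aren't reproduced here, a rough Python reconstruction of the two pieces under discussion (names and details are my own guesses, not the talk's code):

```python
def sanitize(s):
    # escape all single quotes
    return s.replace("'", "\\'")

def sanitized_len(s):
    # length of the original, pre-sanitized string: un-double-count
    # the single quotes, each of which became two characters
    return len(s) - s.count("\\'")

tweet = sanitize("it's fine, isn't it")
assert len(tweet) == 21                          # naive count: too long!
assert sanitized_len(tweet) == len("it's fine, isn't it")
```

Option 1 is abstracting this "counting a pre-sanitized string" into the `sanitized_len` function; Option 2 is doing the subtraction inline at the call site.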

In a moment, we shall see how there is a principle that picks Option 1 in both circumstances. And if you adhere to this principle, then not only is Option 2 dispreferable, it is wrong. And by wrong, I mean, compile-error level wrong.

This comes to the idea of encapsulating ideas. Information hiding is not about private methods and encapsulating classes. It's about encapsulating ideas — and what the F does that mean. So, this is one of those vague things you might've heard before, where once you've learned to think like a synthesizer and actually see the logic it's based on, it becomes very precise. And so, let's do it. And so, through this lens of deduction and deductive synthesis, we're going to see how Option 1 is preferable in: information hiding, how easy it is to think about, and how easy it is to write. And we’ll give precise meanings to all three of these.

Let's begin just by talking about what we mean by sanitized. Well, right now it means that every single quote is escaped. I can write this down formally in one of many notations. I'm going to use Coq, because I'm familiar with it, but you can pick any of the others. Maybe you learned one of these earlier today in another session. It looks like this: it says that, before every single quote, there's a backslash. And now other modules can use this definition when they're reasoning about code. But, there's something missing here.
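In Python rather than Coq, today's definition might be sketched as a predicate like this (my own rendering, and a toy: it ignores the possibility of escaped backslashes):

```python
def is_sanitized(s):
    # today's definition: every single quote is preceded by a backslash
    return all(ch != "'" or (i > 0 and s[i - 1] == "\\")
               for i, ch in enumerate(s))

assert is_sanitized("it\\'s fine")
assert not is_sanitized("it's fine")
```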

That definition I gave of sanitized, that's just today's definition. Other modules can use it, but what if tomorrow I decide I need to escape other things?

I've changed the meaning of being sanitized. Now all this other code, that in their reasoning used the fact that this was what “sanitize” means, those are now broken.

So what I want is to put all current and future meanings of sanitized behind some abstract predicate and say, "I'll give you this string. It is sanitized." What do you mean by sanitized? It's sanitized, and that's final. And now this sanitization module can change its mind about what it means by “sanitized” and no one will ever know. It's a secret. And that is an abstraction barrier. Do Not Cross.

(At this point, I unfurl my “Abstraction barrier” caution tape I’ve been passing out at the conference, and get an applause.)

Now, you might've seen a diagram like this before. “Oh, I centralized my functionality. I wrapped it in a function. You can't expose it.” This is not like that. This is purely an idea. This sanitized predicate does not correspond to anything in your code. It is just how you think about the code.

On this slide, you'll see exactly how. So, in Coq, you write this idea of an opaque predicate, an existential predicate with the opaque keyword. Again, you can do this in about any other formalism. This is the abstraction boundary, but written as code. And now other modules do not get to think about that detail. 

All I know is, I got some sanitized thing — whatever that means; it's a black box. So this is the sense in which Option 2 is wrong. Because in order to look at that code and read it as a human and say, "I'm going to justify why this is doing what I want," something in your head has to use the information that sanitized strings have single quotes escaped and nothing else escaped. And so, what this is actually doing is, it's piercing the abstraction barrier (do not cross), and relying on this old piece of information that may change. Dangerous.

Whereas, in Option 1, all the information about what “sanitized” means is encapsulated into the sanitization module. So maybe the sanitized strlen function, which is right next to sanitized, is allowed to know what it means to be sanitized, but this user code is not. Thus, it is a secret preserved. It may be changed.

This same view, the sanitized predicate, gives us the answer to the first question. So my sanitize function returns a sanitized string, whatever that means. And before I save, I have to feed it a sanitized string, whatever that means. When I put the two functions together, they line up; but when I try to sanitize by hand, BZZZZT.

It's like, “You gave me a string where you escape all single quotes, I need a sanitized string. I don't know that these are the same.” “What do you mean? That is a sanitized string.” Oh no, you can't use that information. It is opaque, hidden. It's behind the abstraction barrier (do not cross).

But that's just an interpretation, because there's another way of looking at the program that gives you the opposite conclusion. Suppose the sanitize function is not allowed to know what “sanitized” means. I don't know what it would return, but you can't show it returning a sanitized string; it doesn't know what that means. This other code, though, does know what “sanitized” means, so it works when you do it by hand, but not when you call this foreign function. Who knows what it does.

There is a third option in which both of them are acceptable. This is the hippie version. There is no abstraction boundary. Everyone gets to know about sanitization — those dirty hippies. And in that case, in both modules, you're allowed to think about the definition of sanitized, see that these line up, and do either.

Now, when you're actually programming, you don't write down these formal logic things. You don't write preconditions and postconditions in gratuitous detail. No, you have an idea in your head about what you're supposed to do, but it's not given to the compiler, and so all three of these interpretations, all three of these worlds of who's allowed to know what “sanitized” means, result in the same code. The difference is only at level three, in the logic and the why, not in the level-two code.

Which one of these worlds you choose as your interpretation determines which option is correct and preferable. So is it just a matter of interpretation? Well, everything's a trade-off, but there are some pretty clear reasons to pick Option 1. In Option 1, I call sanitize and I need a sanitized string for save, so I have sanitized(x). I need to show sanitized(x) in order to save it, and this proof is trivial. Not only is it trivial, but if I'm programming in Coq, I can actually type in “trivial.” and it'll say, "Yep, that's a proof." More generally, it can get more complicated when you have aliasing and conditionals, but it's still going to fall within a fragment of first-order logic called effectively propositional, or “EPR,” logic, which is generally pretty easy to reason about. There are fast algorithms for it. In the other world, I'm doing this replaceAll thing by hand. I need to show it satisfies some complicated condition about what goes where. We can do this, but it's a little bit harder.

So hard, that it was only discovered this year.

So, in the first case, when I first write sanitize, I still need to prove that when I sanitize a string, it satisfies the meaning of being sanitized. There I need to think a little harder about the sanitize function, but when I actually use it, I can use the fast EPR solver. It's easy for me. It's easy for the program to think about. Whereas in the other options, I don't have a clear module boundary, and so when I call save with this complicated by-hand replace-all'ing, now I need to use the very fancy Chen algorithm for reasoning about strings.

In world 1, it's also easier for the synthesizer, because if I say, "Okay, I have a string, now I need a sanitized string and I need it to satisfy some condition about what goes where. Everything's escaped." It's like, "Okay, well how can I get a sanitized string? Is there a way to use replace on those, a way to concatenate things or a way to reverse things?" It has a lot of options to choose from when it’s trying to search.

In the other world, when you ask, “It is sanitized; what does that mean?”, the answer is “That's on a need-to-know basis.” Then there is only one option for how to get a sanitized string, which is to call the sanitize function, which exports an interface saying it gives you a sanitized string. And so you get a precise meaning of “easier to think about” and “easier to code,” synthesis-style.

So we've explored some correspondences between the deductive school of synthesis and you. We've seen the theme of abstraction in the way that we hid sanitize behind the abstraction barrier (do not cross). We saw how it puts constraints on the program and what is possible to write. And of course, all this can only be seen at the level of logic.

Let's look at another example. How is it possible to write a straight-line program, no indentation, that contains a conditional? Let's find out.

I have a program. On the server, it returns 4096. Elsewhere, it returns 1024. And my friend Steven sees this code, and he sees that if-statement, and he gets mad. Because the Internet's told him: “Do not use if-statements. Death to the if-statement. Join the anti-if campaign. If-statements are bad.”

So he goes back to his code and gets an idea. This is in C. That isOnServer variable is either 0 or 1. We can replace the if with an array index. The guru is not pleased: “You have only moved the conditional into the array.” So this anti-if advice is a little less straightforward than it may have seemed. Let's see what that means.
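Concretely, here's a Python rendering of Steven's two programs (the talk's example is in C; the function names are made up):

```python
def block_size_if(is_on_server: int) -> int:
    # The explicit conditional.
    if is_on_server:
        return 4096
    return 1024

def block_size_array(is_on_server: int) -> int:
    # No letters I-F anywhere, but the conditional
    # has only moved into the array index.
    return [1024, 4096][is_on_server]
```

Both functions agree on every input; the second has merely hidden its branch inside the index.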

A quick change of notation: I'm going to denote the array [1024, 4096] by saying, "Start with the empty array. At zero, insert 1024. At one, insert 4096." On top of this notation, we have the array axioms. And I'm not going to go into them in detail, but they tell me exactly what it means to index into an array. And it gives me this.

It says at 1, the index is 4096.

Else, let's go deeper into the array.

At zero, it's 1024. Let's go deeper into the array.

We're out of array and we're out of axioms, and so now it's undefined. But really, the only thing you care about is at the bottom right. This simplifies to the formula: if x equals zero, then the index is 1024, and if x is one, then the index is 4096. That formula is a conditional. If x is this, then it's that; and if x is this, then it's that.

Now using this, we can go back to the program and come up with a logical formula to describe what the program does and we get a conditional: if it's on the server then it's 4096; else, it's 1024. We can use other axioms to do the exact same for the original program, and we get the exact same formula. And so, there's a sense that both these have a conditional, the same conditional. The synthesizer only looks at these formulas, and so, from its perspective, these two pieces of code are not just semantically equivalent, they're actually structurally identical. We haven't changed it at all.

And so, conditionals are a semantic notion, not a syntactic notion. Just because you don't write the letters I-F does not mean there is no if-statement in your code. So, the thing that happened here was: the array index was a conditional in disguise.

Does this mean we should have an anti-array campaign and no arrays — death to the array? No. Not every array access is a conditional, but the difference is subtle, and it cannot be determined purely from the code. Rather, it is a property of level three, of how you think about it. Let's inspect.

Here's another array. It's a list of all the president names. Using that, we can define a “president-after” function. The president after Washington was Adams, the president after Thomas Jefferson was James Madison, and so on. Here's a pretty simple function that does that.

It totally breaks on bad inputs, negatives, the last president. Don't tell me about those. It's a simplified example.
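A Python sketch of that function (with a shortened list; the real slide has all the names):

```python
PRESIDENTS = ["Washington", "Adams", "Jefferson", "Madison", "Monroe"]

def president_after(name: str) -> str:
    # Find the index of the i-th president, return the name
    # of the (i+1)-th. Simplified: breaks on the last president
    # and on unknown names, as noted above.
    i = PRESIDENTS.index(name)
    return PRESIDENTS[i + 1]
```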

Let's talk about what it means to index into this array.

Like before, we get a giant conditional. At 0 it's Washington, at 1 it's Adams, and so on. This does not look promising. It looks like a conditional. But if you already have a notion, an external notion, of the i-th president, then you can relate this formula to that and get this simpler formula: it's the name of the i-th president, which does not involve any branching at all.

Now we can use this abstracted formula to look at this code and think about it in a very straightforward manner. I get this index, I name the next president, and I'm done. That's how you would think about it. And that's also how a verifier or synthesizer would think about it. Just like I've written down here.

There's no case-work here. There's no branching. It just works. And so, that's a sense in which one array index is a conditional and the other is not. The difference is that we're able to abstract one and not the other. Maybe you could abstract the other array access too, find a better way of thinking about it, but I didn't, and because I didn't find such a way, that one is a conditional and this one is not.

So we've again seen this insight from the deductive school of synthesis. We've seen abstraction in how we abstract the array access; this lets us put extra constraints on a program, like “Death to the if-statement.” But this can only be understood at the level of logic.

Let's go to another school of synthesis, the constraint-based school — and this section's going to be a little different from the previous ones. I said at the beginning that a human and a deductive synthesizer are kind of similar, and so deductive synthesis can shed light on software engineering advice. The enumerative and constraint-based schools are not like a human synthesizer, so they're instead going to illuminate the nature of the programs themselves. We're going to be talking about a tool called Sketch, built by my advisor, Armando Solar-Lezama. Sketch is about programming with holes.

Humans are clever, insightful, and slow. Synthesizers are dumb, thorough, and fast. Let's put them together. You write a high-level sketch of the program, what the program roughly looks like, and the synthesizer fills in the details. So here's the “Hello, world” example. That question mark is a hole. And so this thing is asking: find me an integer “question mark” such that x times question-mark is equivalent to x plus x for all inputs x. You might figure out the answer: it's 2. And the synthesizer will figure that out too, after 0.452 seconds.
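Sketch itself compiles the hole into SAT constraints, as described below; as a toy stand-in, here's a Python brute force over candidate values for the hole, checking the spec on a sample of inputs:

```python
def synthesize_hole(lo: int = -16, hi: int = 16):
    """Find an integer c such that x * c == x + x on all sampled inputs x.

    A toy enumerative stand-in for Sketch's hole-filling; the bounds
    and input sample are arbitrary choices for illustration.
    """
    for c in range(lo, hi):  # candidate values for the hole
        if all(x * c == x + x for x in range(-100, 100)):
            return c         # first candidate satisfying the spec
    return None              # no value in the search bounds works
```

The real tool searches the full integer range via a constraint solver rather than by enumeration, but the contract is the same: hand back a value for the hole that makes the spec hold.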

You can do more complicated things with Sketch. Here we have a really inefficient linked-list reversal, and you can get the fast one. You do need a bit more complicated hole than an integer, say some number of statements, where a statement looks like this, but it can do it.

Let me give you the two-minute version of the synthesis algorithm. It's two parts: constraint solving and the CEGIS loop.

So, it looks like this. For simplicity, assume all integers are two bits; you'll thank me for that in a moment. We can just say: look at the bits of the inputs, the hole, and the arithmetic expressions. All of these give us constraints: the digits of the plus expression need to match the corresponding ones in the multiplication expression. So we get a list of constraints on the bits.

And now we can also write down bit-level formulas for every bit of the plus and the times, and it looks like this. These being boolean formulas (thank goodness we're using only two-bit numbers), we'll just solve for what the bits of the hole are.

And if you've taken a theory-of-computing course, you might start getting a little scared, because in that course you might have learned that this is the Boolean satisfiability problem. We need to find an assignment to these bits that makes the whole formula true. That's the SAT problem. Oh no, that's NP-hard!

And that is a lie, because it's actually NP-easy. By that, I mean that modern SAT solvers can take in millions of variables and solve them very fast, most of the time. In this case, we're lucky enough to be in the “most of the time.” And so we figure out the bits of the hole and get the final answer: 2. Success!

It's easier to think about if you just pick a few inputs at a time instead of all inputs. So on top of this, we build something called the counterexample-guided inductive synthesis loop, or CEGIS. It's basically a conversation between two parts, the inductive synthesizer and the verifier. The synthesizer takes in some tests and gives you a program. The verifier takes in the program and says whether it always works; if not, it gives a failing input.

So it goes: “Does this work?” “No; here's a failing input.” “Does this work?” “No; here's another failing input.” “Does this work?” “Yes.”

In other words, CEGIS is test-driven development.
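Here's a toy version of that conversation in Python, with an enumerative stand-in for the inductive synthesizer (in Sketch, the proposer is a constraint solver and the verifier is symbolic rather than an exhaustive loop):

```python
def verify(program, spec, domain):
    """Return a counterexample input, or None if program matches spec."""
    for x in domain:
        if program(x) != spec(x):
            return x
    return None

def cegis(candidates, spec, domain):
    tests = []                                    # counterexamples found so far
    for program in candidates:                    # synthesizer proposes a program...
        if all(program(x) == spec(x) for x in tests):
            cex = verify(program, spec, domain)   # ...the verifier checks it
            if cex is None:
                return program                    # verified on the whole domain
            tests.append(cex)                     # failing input becomes a new test
    return None                                   # search space exhausted
```

For example, with `spec = lambda x: x + x` and candidates `[lambda x: x + 1, lambda x: 3 * x, lambda x: 2 * x]`, the first candidate generates a counterexample, the second fails that new test immediately, and the third verifies.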

And from this, I get the idea: maybe Sketch and CEGIS can somehow tell us something about testing. Let's do it.

What we're going to do is this: I want to see if my test cases are good enough, meaning that if the tests pass, my program works. Is there a way to write a program that passes all my tests but is the wrong program? Let's use synthesis to find out.

A view of testing is that you have some space of correct programs that you want to find. Every test narrows down the passing programs more and more, hopefully to just the correct ones. But there's something missing from this story.

Here's a very simple kind of program: curves on an (x, y) plane. Here are my test points, and I want a curve that goes through all three points. Can you guess what curve I have in mind?

That's right. It's this guy.

No, no, no. Here's another test.

Oh, you mean this one?

But if we add a structural constraint that the curve must be a line, then those two options go away. The only option is this line, and not only that, but I need fewer points, fewer test cases.

Programs are just generalized polynomials. Program synthesis is just generalized curve-fitting.
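In code, the structural constraint is what lets two test points pin down the program. A Python sketch (hypothetical names): constrained to lines, two points determine the fit completely, while without the constraint, any of a family of parabolas also passes the same two tests.

```python
def fit_line(p1, p2):
    """With the structural constraint 'must be a line',
    two test points determine the program completely."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return lambda x: slope * (x - x1) + y1

def sneaky_parabola(p1, p2, c=1.0):
    """Without the constraint: for any nonzero c, this curve also
    passes both tests but disagrees everywhere else."""
    (x1, _), (x2, _) = p1, p2
    line = fit_line(p1, p2)
    return lambda x: line(x) + c * (x - x1) * (x - x2)
```

The extra term vanishes exactly at the two test points, so no number of repeats of those tests can rule the sneaky curve out; only a third point, or the structural constraint, can.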

So what does this look like for testing programs? Well, if you have some kind of structural constraint in the programs you write…

…then maybe you don't need this third test. Let's put that in action.

So I want to synthesize a function to do something very complicated: compute the length of a linked list. Here are my three test cases on the left. For this, I do need some kind of oracle that tells me what the correct answer is. Using this, I'm going to synthesize a length function which passes all three tests that I gave it, but where there is some input, “list question-mark,” on which it differs from the correct answer. So it passes all the test cases but is wrong somewhere. Let's synthesize.

So, first I say it'll have at most three branches, each conditional looks something like this, with some reasonable list operations. Here's what can happen within each branch. Let's see if it can come up with something. And it does.

(Sketch produces correct output, not readable output.)

And so, in order to make sure I have the right program, I need to add another test, like: this list has length three.

And now it says: if it passes all the tests, there's no way for it to be the wrong program.

But if I restrict my program space a little, then I don't need that extra test anymore: with only three tests, I must have the correct program.

There's all this talk about more coverage, more tests, more precision, but we don't often hear about writing simpler code, in that the more of one we have, the less of the other we need. If I write simpler code, then I don't need as many tests. It's a choice, but I know which end I'd prefer. There's a saying that quality cannot be tested in; quality must be built in. This is what it means.

So we've gotten some insights into writing programs and writing tests from the school of constraint-based synthesis. And again, we've seen abstraction in our use of the test oracle, constraints in how we restrict the program space to make it work, and a synthesis algorithm that is itself based on logic.

So we've seen a lot of stuff today. You've learned the high-level versions of two different synthesis algorithms. You've seen a lot of stuff you probably haven't seen before.

But really what we're doing is reaching into that black box in your head of how we write programs, opening it up a bit, and seeing how it works. In doing so, we can stop accepting vague answers for what we should be doing and start to think a little deeper about what's going on. What is it possible for me to write right now? Programs aren't a wild west; not anything goes. There's a structure in what is possible to write, and we can find it.

So my hope is that by coming here today, you're all joining me on a journey: to get away from the citrus advice we've become used to and start to learn how to see the vitamins in our code. Because the first step, the most important step, you've already done. The big step toward more precise engineering advice is simply to believe that it's possible. Because you are a program synthesizer.

So, if you want to see more about this, check out my blog. I love this stuff. I also teach it. I have a web course starting in two weeks. We have one of my former students sitting right there.

[Left: Zohar Manna, Armando Solar-Lezama, Nadia Polikarpova, Adam Chlipala, Cordell Green, Sumit Gulwani, John Darlington, Zora Tung]

[Right (not in order): Alex Miller, Ryan Senior, Mario Acquino, Nick Cowan, Bridget Hillyer, Crystal Martin]

I'd like to thank all the people who made this possible: the people who invented what I talked about as well as the organizers.

But most importantly: St. Louis is my hometown, and so we have a very special guest in the audience today.

I'd like to thank my father, for never telling me that I'm crazy, even when he thought it.

Remember: you are a program synthesizer, do not cross the streams, and do not cross the abstraction boundary. I hope you had a great Strange Loop.