Thursday, May 16, 2019

Book Review: Practical TLA+

I want to make something clear: I am not, and have never been, an advocate for formal verification of production systems. Sometimes people think I am because I talk about ideas from that space, but my perspective has always been to use knowledge about how machines reason about code to better understand how humans reason about code. There’s a small core of people pushing the technology, and a smaller core pushing for using it today, but anyone who works near that space has watched adopters get burned repeatedly. It’s thus become a trope for some engineer to gush enthusiastically about verifying their system while an expert explains all the reasons why it’s a bad idea.

Fortunately for the world, Hillel Wayne is one of those advocates. Hillel is exactly the sort of person whom common wisdom dictates would never need formal methods. When you think of a formal methods guy, if you think of anything at all, you’re probably picturing an ivory tower researcher or someone who works on airplanes. Instead, Hillel was at eSpark Learning, writing software to help schools manage their iPad collections, the kind of stuff where “just listen to your users and write your tests” is supposed to be enough to make your software grow big and strong.

Unfortunately, reality did not cooperate:

Having been failed by mainstream design techniques, they searched for anything that could help them. Salvation came in the form of model checking, a technology for specifying systems at a higher-level and then exhaustively checking whether some property is true over all states. Specifically, they picked TLA+, the model-checker developed by Leslie Lamport. Bolstered by this success, Hillel left his job last year to become a professional TLA+ teacher and evangelist. A few months ago, he released his book “Practical TLA+,” which has the dubious distinction of being the first formal methods book with “Practical” in the title.

As a disclaimer, Hillel and I discovered each other’s writings last year, and have become friends since learning we might be the only two bloggers who are fascinated both by SAT solvers and the history of UML. And so, when his book came out...hmmm, how does it go? “I received a free copy of this product in exchange for my honest review.” To compensate, I’m now incentivized to be extra-harsh.

Practical TLA+ is a programming book, the kind that makes great fodder for the Benjamin Franklin method. After a tour-de-force showing how you’d use TLA+ to catch mistakes in an errant wire-transfer protocol, the remainder of the first half is a lot like what you’d get in an intro book on Python or Scala: teaching you variables, then functions, then concurrency. Of course, all of these things work differently than what you’d find in Python, as in Python you can’t say “let S be the set of all integers” and expect anything useful to happen. And Practical TLA+ is still shielding you from the really crazy stuff, as these chapters really just teach PlusCal, a sub-language of TLA+ in which you say “x := 1” instead of “let S be a state and S' be another state which is the same as S except that x = 1,” with that stuff being left to longer books such as Lamport’s Specifying Systems.

The second half of the book is dedicated to applying TLA+ to an ever-more-sophisticated set of believable examples: first data structures and state machines, then a library and its loan policy, and then managing a MapReduce cluster. Pretty neat.

But I don’t care about that. I’m here to ask one question: Will reading this book make you a better programmer?

Why is this language different from all other languages?

If you’re coming from Haskell or OCaml or even TypeScript and you hear that TLA+ is about correctness, then you’re in for a culture shock. Leslie Lamport is in the camp of verification-people who like state machines and dislike types, and some days I think Leslie Lamport created TLA+ specifically to troll aficionados of typed functional programming. These are people ever in the pursuit of better static-checking and more abstraction. Then they make it all the way to doing full verification of high-level models, and wind up at TLA+, an untyped language which likes to represent everything as sets of integers and strings, and where all the errors come as Java stack traces.

Once you’ve gotten over that a server might just be represented as the string “server1,” then, coming from a traditional programming language, there are two things to know about TLA+.

The first is that you’re building models, not implementations. Imagine building a system for managing corporate bank accounts, in which each organization has many users, but only one user at a time can control the account. An implementation describes all details of a system in a manner that a machine can execute it. The state of an implementation might look like this: “A user has a name, an id, a company, a role, and an avatar. A company has a name, a list of accounts, a logo, and billing information. For each account at a company, at most one user may manage it. There is one server thread per logged-in user. Each thread may be on one of 100,000 lines.” Suppose someone asks a question about the system, like “Is it possible for an account balance to go negative?” To find the possible failure scenarios, you’ll be tempted to trace through the code: “gosh, well, if two people at the same company send a wire at the same time, but only one person per company is authorized to send a wire — but wait, if you change permissions while sending a wire — but wait, doing those at the same time might not be possible because they both access...ahhhh!”

The way to turn an implementation into a model is to remove details until it sounds like a toy: “There is a set of companies C, a set of users U, and a mapping of users to permissions.” Instead of implementing logic for managing sessions and handling web requests, you just say “At each time step, some user performs an arbitrary action.” It looks like this:

with u \in IdleUsers, a \in Accounts do
  either
    call SendWire(u, a);
  or
    call ChangeAccountPermission(u, a);
  end either;
end with;

With the system abstracted to a few pieces of state and high-level actions, we ask again: can an account balance go negative? The TLA+ way to find out is to set the set of companies to {a,b,c}, consider all possible assignments of account balances to numbers 0-10, and let TLA+’s model checker, TLC, brute-force all possible states. In this case, TLC found a 12-step trace that leads to account overdraft: one user goes to send a wire, but, after checking account balance and before completing the wire, another user on the same account changes himself to the admin, and sends another wire. The full example is available here. While this is a contrived example, there are many real cases where TLC’s exhaustive checking has found concurrency bugs that survived multiple levels of conventional testing.

The second thing to know about TLA+ is that its syntax makes no sense. As Hillel himself points out, TLA+ is full of inconsistencies and gotchas. One of these gotchas had me completely stuck in the first chapter, until I asked Hillel to bail me out.

For instance, imagine you’re running an event with 100 guests, and you want to track what each of them needs to do before they can enter. So, you put the guests into a set called guests, create an outstanding_tasks array, and initialize it to [guests |-> {"check_in", "pay_fee", "sign_waiver"}]. What you’ve actually just done is created a map with a single key, that key being the string “guests”. So, you try changing the |-> to a ->, which instead gives you a set of 3100 functions, each of which map each of the hundred guests to either “check_in”, “pay_fee”, or “sign_waiver”. What you actually need is [x \in guests |-> {"check_in", "pay_fee", "sign_waiver"}]. As Hillel warns, you will mess this up multiple times.

Using PlusCal instead of raw TLA+ makes things worse: you’ll literally be writing all your code inside a comment block and pressing a button to generate the TLA+. If you’re lucky, you’ll get an error in your PlusCal source instead of in the generated TLA+.

Executable Pseudocode?

Back in the day, model checking had been sold to me as a way of automatically finding bugs in programs. Imagine my disappointment when I learned it actually focused on, well, models, and would leave the actual program unchecked. The examples in this book finally convinced me that building a model of the system separate from the program is quite valuable, enabling one to “play with the design.”

In particular, the library example in the penultimate chapter really drove home for me the value of TLA+ modeling. In that example, you are trying to design the checkout and reservation policies for a library. In its model, there is a set of users who each want a set of books, and, every time step, one of them does an action based on these preferences. With every reservation policy, there are ways for a jerk to abuse the system, such as by reserving the same book over and over again so that no-one else can read it. With TLA+, it’s easy to check the property: if a user wants a book, then she eventually gets it.

This use of models — expressing all possible ways for the system to interact with users — really made the benefits of TLA+ click for me. And then I realized how I could have used it in the past.

My old company Apptimize builds software for A/B testing mobile apps. With Apptimize, if you want to know whether changing a button size in your app would increase upgrades, you could create the change by clicking around in a browser, and 10 minutes later have multiple variants running on a million people’s phones. Statistical validity is very important, and, with the dual state machines of Apptimize users starting and stopping and changing experiments, and app users uninstalling and reinstalling apps, there’s a lot of room for statistical error.

We had been finding bugs where you could enroll 5% of users in an A/B test, and then bump it up to 50%, and then down to 5%, and the chances of someone being unenrolled would be slightly too low. To catch one of these bugs, we’d have to think of a scenario, take the formulas from the code, and then do probability calculations.

After fixing a couple of these, we decided that allocating users to A/B tests this way was far too dangerous, and moved more of the decisions server-side. But, with a bit of finagling to fake the probabilistic aspects, we could have gotten it perfect with TLA+.

I’ve taught many engineers techniques that make it much easier to implement some policy without mistakes. But for showing that a 100%-correct implementation can’t be abused by someone trying to cheat the system — for that you need something like TLA+.

My Model of What You’ll Learn from Learning Models

The above example exemplifies the key takeaway that I think a system developer can take from learning TLA+: how to reduce a system to a model, and what questions you can ask of a model. Reducing the system to a model requires another idea unfamiliar to many programmers: how all the questions about timeouts and schedulers and inputs boil down to “and here a nondeterministic action may occur.”

In abstracting a system to a model, the system stops acting like a program, and new mental tools are needed. It’s here that Practical TLA+ falls short. Practical TLA+, despite the name, is a PlusCal book1. Leslie Lamport created PlusCal to make specifying state machines look like imperative programming. In doing so, one loses some of the concepts that are truly new.

For instance, one thing you can do with models is to have multiple models of the same system at different levels of detail, and show that one refines the other. This evolved from an idea introduced in 1971 by Niklaus Wirth in this article, where he showed how to rigorously evolve a high-level idea with steps like “pick an arbitrary square on the chessboard” into an actual implementation. Many of my students find this idea revelatory, and TLA+ offers a chance to actually play with it in a compiler rather than on pencil-and-paper. Hillel repeatedly introduces refinement only to explain that he won’t teach it: you need raw TLA+, not PlusCal.

Similarly, PlusCal’s imperative veneer leads to some confusing behavior around guards, expressed as PlusCal’s await keyword, which are half of understanding how to model systems with nondeterminism. You might look at the snippet x := 1; await y = 0 and think “it sets x to 1, and then blocks until something sets y to 0.” In fact, TLA+ actually tries to run both of these steps simultaneously, meaning that, in a state where y is not 0, the x := 1 can never run. This is most easily understood by looking at the actual TLA+ state formulas, which requires a mindset shift from imperative programming (“Give me an action!”) to state machines (“In this state, these transitions are possible”).

The final idea in this book useful to general system-building is temporal properties. Most programmers are taught about functional testing: if the program gets this input, then it gives this output. However, systems also care about temporal properties. These are properties about sequences of events, like: “If a user orders a pizza, they eventually receive it, or get their money back.” This property would be written in TLA+ as something like “Order pizza ⇒ ◇(receive pizza OR money back)”, where “◇” is the “eventually” operator. One chapter of Practical TLA+ is spent teaching this temporal logic, and I think having a language for temporal properties would be useful to any system designer. However, the book gives very little practice in expressing temporal properties. There are also many temporal properties that TLA+ cannot express, such as “Until an action is committed, a user may abort the action at any point.”

Conclusion:

Practical TLA+ is the first of its kind: a practical programming book about formal verification. It is still a rough book. Whenever a snippet of code repeats, the changes are supposed to be marked in bold, yet I’d often find the bolding missing. He once introduced a piece of syntax, and then didn’t use it till 100 pages later. I am not a typical reader of the book, seeing as I took my first model-checking course at age 18, and had the author on instant messenger the whole time. For everyone else, there’s a lot of places to get tripped up. He did create a forum for TLA+ users, so, for prospective TLA+-users, I can still recommend this book to if looking to start in the next year, and to wait for the second edition if not.

For the programmer looking for mind expansion but not code verification, I don’t think this book has much to offer. The benefit of cross-training comes from seeing ideas that are a subtle part of one discipline become a blatant part of another. Just as it’s easier to understand harmony by playing piano than by singing in a choir, new paradigms are valuable when they take concepts that are implicit in software design and give them syntax and compiler errors. For TLA+, these concepts are: modeling, nondeterminism, and temporal correctness properties. However, Practical TLA+ gives you little practice writing correctness properties, and does its best to hide the most mind-expanding parts of modeling (refinement) and nondeterminism (guards) from the reader. It does what it set out to do: turn formal verification into ordinary programming.

But, nonetheless, I can say that I went through Practical TLA+ in an hour a day for 10 days, I had fun doing so, and — hey — now I know TLA+.

Rating: 3.5/5

UPDATE 5/17/2019: Ron Pressler writes a response here. I had knowingly been loose on terminology in much of this review, and he called me out on a lot of it.

Acknowledgments

Thanks to Hillel Wayne, Chris Khoo, and Elliott Jin for their feedback on this post.


1 Hillel was adamant in saying that PlusCal is TLA+. My take: In terms of getting (almost) all the power of TLA+, yes, PlusCal is TLA+. In terms of being able to write code that closely maps to how it actually works, which is important for building a mental model, no it is not.