Sunday, March 28, 2021

Developer tools can be magic. Instead, they collect dust.

Update 6/14/21: Now available in Chinese.

I started working on advanced developer tools 9 years ago. Back when I started, “programming tools” meant file format viewers, editors, and maybe variants of grep. I’d mention a deep problem such as inferring the underlying intent of a group of changes, and get questions about how it compares to find-and-replace.

Times have changed. It’s no longer shocking when I meet a programmer who has heard of program synthesis or even tried a verification tool. There are now several[1] popular products based on advanced tools research, and AI advances in general have changed expectations. One company, Facebook, has even deployed automated program repair internally.

In spite of this, tools research is still light-years ahead of what’s being deployed. It is not unusual at all to read a 20-year-old paper describing a tool empirically shown to make programmers 4x faster at a task, and for the underlying idea to still be locked in academia.

I’d like to give a taste of what to expect from advanced tools — and the ways in which we are sliding back. I will now present 3 of my favorite tools from the last 30 years, all of which I’ve tried to use, none of which currently run.

Reflexion Models

We often think of software in terms of components. For an operating system, it might be: file system, hardware interface, process manager. An experienced engineer on the project asked to make certain files write to disk faster will know exactly where to go in the code; a newcomer will see an amorphous blob of source files.

In 1995, as a young grad student at the University of Washington, Gail C. Murphy came up with a new way of learning a codebase called reflexion models.

First, you come up with a rough hypothesis of what you think the components are and how they interact.

Then, you go through the code and write down how you think each file corresponds to the components.

Now, the tool runs and computes the actual connectivity of the files (e.g., class inheritance, the call graph). You compare it to your hypothesis.

Armed with new evidence, you refine your hypothesis, and make your mental model more and more detailed, and better and better aligned with reality.
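To make that loop concrete, here is a minimal sketch of the comparison step in Java. It is not RMTool's code. The class name, the file names, and the dependency data are all made up for illustration, and the file-level dependencies are handed in as a plain map rather than extracted from real source. The three outcome names (convergence, divergence, absence) are the terms the reflexion model work uses for edges you predicted and found, edges you found but didn't predict, and edges you predicted but didn't find.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch of a reflexion model comparison (all names hypothetical). */
final class ReflexionSketch {
    enum Kind { CONVERGENCE, DIVERGENCE, ABSENCE }

    /**
     * Lifts file-level dependencies to component level using the mapping,
     * then classifies every component edge against the hypothesis.
     * Edges are keyed as "From -> To".
     */
    static Map<String, Kind> compare(Map<String, Set<String>> hypothesis,
                                     Map<String, String> fileToComponent,
                                     Map<String, Set<String>> fileDeps) {
        // Step 1: translate each file dependency into a component edge.
        Set<String> actual = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : fileDeps.entrySet()) {
            String from = fileToComponent.get(entry.getKey());
            for (String target : entry.getValue()) {
                String to = fileToComponent.get(target);
                // Ignore unmapped files and dependencies inside one component.
                if (from != null && to != null && !from.equals(to)) {
                    actual.add(from + " -> " + to);
                }
            }
        }
        // Step 2: hypothesized edges either converge or are absent.
        Map<String, Kind> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : hypothesis.entrySet()) {
            for (String to : entry.getValue()) {
                String edge = entry.getKey() + " -> " + to;
                result.put(edge, actual.contains(edge) ? Kind.CONVERGENCE : Kind.ABSENCE);
            }
        }
        // Step 3: anything found in the code but not hypothesized is a divergence.
        for (String edge : actual) {
            result.putIfAbsent(edge, Kind.DIVERGENCE);
        }
        return result;
    }

    /** A toy operating-system example; the files and dependencies are invented. */
    static Map<String, Kind> demo() {
        Map<String, Set<String>> hypothesis = Map.of(
            "FileSystem", Set.of("HardwareInterface"),
            "ProcessManager", Set.of("FileSystem"));
        Map<String, String> fileToComponent = Map.of(
            "fs/inode.c", "FileSystem",
            "hw/disk.c", "HardwareInterface",
            "proc/sched.c", "ProcessManager");
        Map<String, Set<String>> fileDeps = Map.of(
            "fs/inode.c", Set.of("hw/disk.c"),
            "proc/sched.c", Set.of("hw/disk.c"));
        return compare(hypothesis, fileToComponent, fileDeps);
    }

    public static void main(String[] args) {
        demo().forEach((edge, kind) -> System.out.println(edge + ": " + kind));
    }
}
```

In the toy data, the file system really does talk to the hardware interface (a convergence), the process manager never touches the file system as predicted (an absence), and it reaches into the hardware interface directly (a divergence). The divergences and absences are where you go read code and revise the hypothesis.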

Around this time, a group at Microsoft was doing an experiment to see if they could re-engineer the Excel codebase to extract out some high-level components. They needed a pretty strong understanding of the codebase, but getting it wouldn’t be so easy, because they were a different team in a different building. One of them saw Gail’s talk on reflexion models and liked it.

In one day, he created his first cut of a reflexion model for Excel. He then spent the next four weeks refining it as he got more acquainted with the code. Doing so, he reached a level of understanding that he estimates would have taken him 2 years otherwise.

Today, Gail’s original RMTool is off the Internet. The C++ analysis tool from AT&T it’s based on, Ciao, is even more off the Internet. They later wrote a Java version, jRMTool, but it’s only for an old version of Eclipse with a completely different API. The code is written in Java 1.4, and is no longer even syntactically correct. I quickly gave up trying to get it to run.

Software engineering of 2021: Still catching up to 1995.


The Whyline

About 10 years later, at the Human-Computer Interaction Institute at Carnegie Mellon, Amy Ko was thinking about another problem. Debugging is like being a detective. Why didn’t the program update the cache after doing a fetch? What was a negative number doing here? Why is it so much work to answer these questions?

Amy had an idea for a tool called the Whyline, where you could ask questions like “Why did ___ happen?” in an interactive debugger. She built a prototype for Alice, CMU’s graphical programming tool that let kids make 3D animations. People were impressed.

Bolstered by this success, Amy spent another couple of years working hard, building up the technology to do this for Java.


They ran a study. Twenty programmers were asked to fix two bugs in ArgoUML, a 150k-line Java program. Half of them were given a copy of the Java Whyline. The programmers with the Whyline were 4 times more successful than those without, and worked twice as fast.

A couple years ago, I tried to use the Java Whyline. It crashed when faced with modern Java bytecode.

MatchMaker

In 2008, my advisor, Armando Solar-Lezama, was freshly arrived at MIT after single-handedly reviving the field of program synthesis. He had mostly focused on complex problems in small systems, like optimizing physics simulations and bit-twiddling. Now he wanted to solve simple problems in big systems. So much of programming is writing “glue code,” taking a large library of standard components and figuring out how to bolt them together. It can take weeks of digging through documentation to figure out how to do something in a complex framework. Could synthesis technology help? Kuat Yessenov, the Kazakh genius, was tasked with figuring out how.

Glue code is often a game of figuring out what classes and methods to use. Sometimes it’s not so hard to guess: the way you put a widget on the screen in Android, for instance, is with the container’s addView method. Often it’s not so easy. When writing an Eclipse plugin that does syntax highlighting, you need a chain of four classes to connect the TextEditor object with the RuleBasedScanner.

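import org.eclipse.jface.text.IDocument;
import org.eclipse.jface.text.presentation.IPresentationReconciler;
import org.eclipse.jface.text.presentation.PresentationReconciler;
import org.eclipse.jface.text.rules.DefaultDamagerRepairer;
import org.eclipse.jface.text.rules.RuleBasedScanner;
import org.eclipse.jface.text.source.ISourceViewer;
import org.eclipse.jface.text.source.SourceViewerConfiguration;
import org.eclipse.ui.texteditor.AbstractTextEditor;

class UserConfiguration extends SourceViewerConfiguration {
  @Override
  public IPresentationReconciler getPresentationReconciler(ISourceViewer sourceViewer) {
    PresentationReconciler reconciler = new PresentationReconciler();
    RuleBasedScanner userScanner = new UserScanner();
    // One damager/repairer object serves both roles for the default content type.
    DefaultDamagerRepairer dr = new DefaultDamagerRepairer(userScanner);
    reconciler.setRepairer(dr, IDocument.DEFAULT_CONTENT_TYPE);
    reconciler.setDamager(dr, IDocument.DEFAULT_CONTENT_TYPE);
    return reconciler;
  }
}

class UserEditor extends AbstractTextEditor {
  UserEditor() {
    // Hand the editor the configuration that wires in the scanner.
    SourceViewerConfiguration userConfiguration = new UserConfiguration();
    setSourceViewerConfiguration(userConfiguration);
  }
}

// The scanner holds the highlighting rules.
class UserScanner extends RuleBasedScanner {...}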

If you can figure out the two endpoints of a feature, what class uses it and what class provides it, he reasoned, then you could ask a computer to figure out what’s in-between. There are other programs out there that implement the functionality you’re looking for. By running them and analyzing the traces, you can find the code responsible for “connecting” those two classes (as a chain of pointer references). You then boil the reference program down to exactly the code that does this — voila, a tutorial! The MatchMaker tool was born.
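The "chain of pointer references" idea can be sketched as a shortest-path search over a graph of who-points-to-whom. The Java below is only an illustration of that one step, not MatchMaker's actual algorithm, which works on recorded execution traces. The class name and the reference graph are invented; the graph loosely mirrors the snippet above plus one distractor node, and is not a real Eclipse heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: find the shortest reference chain between two endpoints. */
final class ReferenceChainSketch {

    /**
     * Breadth-first search over a "holds a reference to" graph.
     * Returns the chain from source to target, or an empty list if none exists.
     */
    static List<String> shortestChain(Map<String, List<String>> references,
                                      String source, String target) {
        // parent doubles as the visited set; the source has no parent.
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(target)) {
                // Walk the parent links back to the source to rebuild the chain.
                LinkedList<String> chain = new LinkedList<>();
                for (String node = target; node != null; node = parent.get(node)) {
                    chain.addFirst(node);
                }
                return chain;
            }
            for (String next : references.getOrDefault(current, List.of())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }

    /** Hand-written stand-in for references observed while running an example program. */
    static Map<String, List<String>> exampleReferences() {
        return Map.of(
            "UserEditor", List.of("UserConfiguration", "StatusLine"),
            "UserConfiguration", List.of("PresentationReconciler"),
            "PresentationReconciler", List.of("DefaultDamagerRepairer"),
            "DefaultDamagerRepairer", List.of("UserScanner"));
    }

    public static void main(String[] args) {
        System.out.println(shortestChain(exampleReferences(), "UserEditor", "UserScanner"));
    }
}
```

Given the two endpoints, the search returns the intermediate classes you would otherwise have to dig out of the documentation. The harder parts, collecting the references from real runs and cutting the example program down to the relevant code, are not shown here.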

In the study, eight programmers were asked to build a simple syntax highlighter for Eclipse, highlighting two keywords in a new language. Half of them were given MatchMaker and a short tutorial on its use. Yes, there were multiple tutorials on how to do this, but they contained too much information and weren’t helpful. The control group floundered, and averaged 100 minutes. The MatchMaker users quickly got an idea of what they were looking for, and took only 50 minutes. Not too bad, considering that an Eclipse expert with 5 years of experience took a full 16 minutes.

I did actually get to use MatchMaker, seeing as I was asked to work on its successor in my first month of grad school. Pretty nice; I’d love to see it fleshed out and made to work for Android. Alas, we’re sliding back. A few years back, my advisor hired a summer intern to work on MatchMaker. He instantly ran into a barrier: it didn’t work on Java 8.

Lessons

The first lesson is that the tools we use are heavily shaped by the choices of eminent individuals. The reason that Reflexion Models are obscure while Mylyn is among the most popular Eclipse plugins is quite literally because Gail C. Murphy, creator of Reflexion Models, decided to go into academia, while her student Mik Kersten, creator of Mylyn, went into industry.

Programming tools are not a domain where advances are “an idea whose time has come.” That happens when there are many people working on similar ideas; if one person doesn’t get their idea adopted, then someone else will a few years later. In programming tools, this kind of competition is rare. To illustrate: A famous professor went on sabbatical to start a company building a tool for making websites. I asked him why, if his idea was going to beat all the previous such tools, it hadn’t been done before. His answer was something like “because it requires technology that only I can build.”

The second lesson is that there is something wrong with how we build programming tools. Other fields of computer science don’t seem to have such a giant rift between the accomplishments of researchers and practitioners. I’ve argued before that this is because the difficulty of building tools depends more on the complexity of programming languages (which are extremely complicated; just see C++) than on the idea, and that, until this changes, no tool can arise without enough sales to pay the large fixed cost of building it. This is why my Ph.D. has been devoted to making tools easier to build. It is also why I am in part disheartened by the proliferation of free but not-so-advanced tools: it lops off the bottom of the market and makes these fixed costs harder to pay off.

But the third lesson is that we as developers can demand so much more from our tools. If you’ve ever thought about building a developer tool, you have so much impressive work to draw from. And if you’re craving better tools, this is what you have to look forward to.


Sources


[1] I’d list some, but I don’t want to play favorites. I’ll just mention CodeQL, which is quite advanced and needs no touting.


29 comments:

  1. Same question is fascinating to me too.

    So many awesome ideas that went forgotten, and people still keep coding Java in Emacs.

    Also you have forgotten to put links:
    "I’ve argued before [links to blog posts] that this is because the ... "

    :)

    Replies
    1. I feel bad for those who -actually- code in anything other than emacs. Poor sods.

  2. In the second paragraph of Lessons: it seems like the sentence

    > Programming tools are not a domain where advances are not "an idea whose time has come."

    should only have one "not".

    Replies
    1. Yep; already fixed before I saw this. Also fixed an outdated link to one of the papers (this post sat in my drafts folder for a long time). That should be the end of the copy-editing issues.

    2. Since you wrote "should be the end of the copy-editing issues" I feel compelled to mention that I would've said "lops off" rather than "lobs off" but I will add that I'm only here reading the comments because it's a great post (imo), not just to annoy you with nit-picky criticisms :)

  3. It doesn't seem like such a huge rift when you look at the widespread tools in industry. For example, dependency injection (Guice, Dagger) seems to make MatchMaker irrelevant for Java. Rather than have the tool find out how to connect the pieces, dependency injection automates it entirely.

    Not to mention tools that academics completely missed (like version control) because they don't experience the same problems practitioners experience.

    Replies
    1. Hi J2KUN,

      I think you're making a bunch of leaps behind your first statement. I gave a very short description of MatchMaker, describing its core function as "finding out how two things become connected," not enough to actually understand what it does without reading more about it. I'm not seeing how DI and MatchMaker at all compete, nor how the problem in MatchMaker's study would be trivial had Eclipse been built with Guice. (I wanted to test this by finding a StackOverflow question about the kind of problem MatchMaker solves in a library that uses Guice... but 3 minutes of searching was not enough to let me actually find an open-source project that uses Guice.)

      Agreed that academics often miss many practical problems, but not that version control is an example. (According to Wikipedia, version control goes back to 1962!)

    2. Dependency injection is not the same thing at all. To use dependency injection you have to already know which component it is that you need to inject to do the thing you want to do. MatchMaker, from what I understand, helps you figure that out.

  4. Have you considered trying to get some of these tools included in software distros? That makes it much easier for developers to install them.

  5. I don't think it's tools so much as academic programs written in pursuit of a degree don't run. And more generally programs that aren't maintained don't run.

  6. I agree developer tools, in the realm of rapid application development (RAD), are not widespread or very advanced in the Java world. But seriously, Microsoft's Visual Basic IDE of the 1990s was more advanced than most of what I see today in the Java world. What Visual Studio 2019 can do out of the box, including the free community edition, outpaces anything I have seen in the Java domain.

  7. Those tools are very interesting. How do you find these?

    Replies
    1. A decade of reading papers and doing research in this space. :)

  8. I'm going to use a broad brush here, but none of these issues have anything to do with the tools in question; it's just what happens when software gets old and isn't used by anyone. The academic software world is riddled with millions of examples of this, from course registration software to DNA sequencing scripts.
    One obvious solution: if you want to keep software alive after you're not using it any more, make the source code available for others so that they can use it / fork it / enhance it / maintain it. OPEN SOURCE IT.

  9. It's funny I wrote a rather long unfocused post, then just scrapped it all and am replacing it with:

    Thank you.

  10. One reason tools collect dust is the nature of business. Specifically businesses that “grind along” and are willing to do something wrongly twice or thrice rather than take the time to do it right the first time. It’s more common to see developers roll up their sleeves and start coding rather than take a minute to plan and think. Sure, there are plenty of individual developers and groups that are more deliberate in their application of technology, but isn’t it more often just excited developers going for the gusto? In fact this attitude can be so pervasive that if someone were to even mention in a meeting some new tool (take Reflexion for instance) they could be laughed out of the room! Partly it’s ego, in that programmers don’t want to admit they need anything other than emacs to save the day (I know the emacs thing is a joke), and it’s partly the “not invented here” / “ain’t broke don’t fix it” mentality. People get used to grinding away and frankly don’t want to do anything about it. To boot, after a project goes bad you might find these same people in a “lessons learned” meeting complaining that the process wasn’t deliberate enough! This (as is everything) is the human dilemma. P.S. I would be happy if people would just realize the value of IDE refactoring + git!

  11. Anyone else notice a trend in this article? All 3 examples given -- Java. I bet that old "even more off the internet" C++ version of Reflexion Tools still has a good chance of running, if you can find a copy.

    Replies
    1. Ha. I was thinking that.

      I think it has more to do with Java being dominant in tools research for a couple decades because it was perceived as being friendly for tool development. (Also, because academics had a hard time believing Java stopped being "cool" in the mid-2000's and continued to equate "___ tool....for Java" with "practical".)

      I've literally spent several days this month trying to get a 10-year-old codebase in OCaml to run (not a tool for OCaml; just one in OCaml). Java is very far from the worst offender in builds becoming bitrotten.

  12. Just out of curiosity, would these legacy java tools run if used on a VM with an older sdk/jdk installed? Think "dos box" but for old java apps. Or is there something you're trying to do that requires it to be executing on a modern platform?

  13. Other cool tools, same era as Reflexion:
    https://ieeexplore.ieee.org/abstract/document/493433/, and more visualization debugging stuff that should be around by now
    https://dl.acm.org/doi/abs/10.1145/236337.236380

  14. Do language meta-tools, like AST Explorer for JavaScript, provide a leg up for these kinds of problems, letting the tool builder reach through the language to what they really want to juggle? It seems like they should, and there are more of these floating around now than there once were.

    Replies
    1. Hi Lupestro,

      If I understand correctly, you're saying that AST Explorer provides some kind of magic that makes programming tools easier to build? I'm looking at AST Explorer, and it appears to be a JS parser connected to a JSON viewer. I am not seeing what makes it different from other parsers, like the ones used by all other source-level tools.

  15. And then there's software patents which are still legal in the US...

  16. Great article. Thank you.

  17. The Lisp folks, in particular Lisp machine users, experienced this wholesale.
