The Future of Compilers: Saving Millions of Developer Hours

We were troubleshooting an application a few days ago and, after realizing we’d discovered a design flaw, my co-worker Jeremy quipped “It’s not working like it’s supposed to.”

This got me thinking about the nature of errors in software and brought me all the way back to my early computer science courses at UC Davis.

Compiler errors are the best kind of errors. They happen early, they happen often, and they’re easily fixed. If the compiler chokes on a missing semicolon I can fix it in about 10 seconds (total time from the beginning of compilation to applied remedy). With background compilation in tools like Visual Studio .NET and Eclipse, it’s even less than that.

Runtime errors are a bit nastier; they creep up on you once you’ve started the app and *bam* knock you to the ground with a nasty crash. With a runtime error you must take the time to compile the code, start the application, begin interacting with it, and at some point watch it crash.

If you know what the problem is you can fix it in 30 seconds or less, but the additional time spent finding the bug, time that would not have been spent had the compiler caught it, pushes runtime errors past the one-minute mark on a good day. On a bad day you have to step through the code (if you have that capability), dig through the error log, or start adding “print” statements here and there until you figure out where the problem is. Runtime errors are nasty, but at least you have a clue where your problem is located.

Logic errors are the worst of the worst. These include design errors caused by faulty architecture or flow diagrams, or a miscalculation by the programmer. Nothing crashes. Nothing dies. And that’s the worst part. Logic errors are beasts, and are not typically discovered until much later in the process. In an application with any kind of complexity you often need to trace through multiple files, and may even scour through the database looking for bad data. Yes, logic errors are the “I spent 6 hours only to realize I had the minus sign in the wrong place” kind of errors.
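The “minus sign in the wrong place” failure mode is easy to sketch. The function below is a hypothetical illustration (in Python, for brevity), not from any real codebase: it parses and runs without a single complaint, yet quietly produces the wrong balance.

```python
# Hypothetical balance routine illustrating a logic error: no compiler
# or runtime machinery ever complains -- the numbers are simply wrong.

def apply_transactions(balance, deposits, withdrawals):
    for amount in deposits:
        balance += amount
    for amount in withdrawals:
        balance += amount  # BUG: should be "balance -= amount"
    return balance

# A deposit of 50 and a withdrawal of 30 against 100 should yield 120,
# but the flipped sign silently produces 180.
print(apply_transactions(100, [50], [30]))
```

Nothing short of checking the output against the expected answer reveals the bug, which is exactly why these errors surface so late.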

Point #1: The earlier you find an error the less time it takes to fix.

Strong Typing Is Your Friend
It follows logically that any error caught by the compiler saves you gobs of time. This is why I’m such an adamant fan of strong typing, where variables can only contain one type of value, such as a string or integer. Languages like Java and .NET follow this pattern. In languages like Perl and PHP you can stuff pretty much any value into anything and it won’t crash while being compiled (or interpreted), but will tend to generate a runtime or logic error somewhere down the line. [Thanks to Ben for the correction on static vs. strong typing (see comments for more information)]
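To make the distinction concrete, here is a small sketch in Python (standing in for the Perl/PHP side of the comparison; the function name and values are made up). The type mismatch sails through parsing and only blows up when the bad call actually executes; in a statically checked language like Java, the equivalent call would be rejected before the program ever ran.

```python
def add_tax(price):
    return price * 1.08

print(add_tax(100))  # the happy path works fine

# This call parses without complaint; the type error surfaces only
# when the line runs. A static type checker would flag it at
# compile time instead.
try:
    add_tax("100")
except TypeError as err:
    print("caught at runtime:", err)
```

The failure here is at least loud. The truly dangerous cases are the ones where the mismatched value happens to support the operation and the program keeps running with garbage.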

Although the flexibility of weak typing is appealing for cranking out small projects very quickly (since you’re allowed to be pretty sloppy), the payback on the debugging side tips the scales heavily in favor of strong typing for anything larger than a small application.

Point #2: Use strongly-typed languages. The more the compiler catches, the less time you spend debugging.

If I Only Had A Brain
Given that compiler errors are so dreamy and logic errors so ghastly, it raises the question: why don’t we build compilers that catch more errors? Why don’t they find runtime and (heaven forbid) logic errors while they’re in there mucking around with our code? Wouldn’t it be amazing if a compiler told you “You’re missing a semicolon, and I think your balances aren’t going to add up because you’re missing a minus sign”?

The short answer is because it’s really, really hard. Until data is pulled from a database, a file is opened, or user input is given, the compiler has no idea which paths the execution might follow.

Although compilers have become smarter over the years (VS.NET warns me when I’m doing a number of stupid things), they haven’t reached the point of employing Artificial Intelligence (AI) in an attempt to see the future. Compilers operate with no a priori knowledge of the code beyond the language syntax definitions, and it’s nearly impossible to determine how a person, file or database is going to interact with an application until it actually happens. But then again, it’s nearly impossible to provide accurate search results over billions of documents in a fraction of a second to millions of simultaneous users, and I think there’s a company out there making a few bucks doing that these days.

This problem is hard, but not unsolvable. Computer languages are little more than semantic rules governing communication between developer and machine; semantic rules that are very similar to our written languages such as English, Spanish, German, etc… If Google can build a tool to translate websites through gobs of pattern matching (a 20 billion word corpus), why can’t we build a compiler that uses a hundred million lines of code as a corpus, learning the basic patterns and using them as templates by which to make judgments on code it’s compiling? Since software is based on recurring patterns (open a file, loop through records, parse some things), this is a realistic approach.
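The shape of that approach can be sketched in a few lines. The toy below (entirely hypothetical, with a two-snippet “corpus” standing in for the hundred million lines) counts token bigrams seen in training code, then flags bigrams in new code that the model has never observed. A real system would need a vastly richer model; this only shows the statistical skeleton of the idea.

```python
from collections import Counter

def bigrams(tokens):
    # Adjacent token pairs, e.g. ["open", "file"] -> [("open", "file")]
    return list(zip(tokens, tokens[1:]))

# Tiny stand-in corpus of "known good" code patterns.
corpus = [
    "open file ; read line ; close file ;".split(),
    "open file ; write line ; close file ;".split(),
]

seen = Counter()
for snippet in corpus:
    seen.update(bigrams(snippet))

# New code containing a pattern the corpus has never exhibited.
new_code = "open file ; read line ; close socket ;".split()
suspicious = [pair for pair in bigrams(new_code) if pair not in seen]
print(suspicious)  # bigrams the model has never observed
```

Here the model flags “close socket” as unusual, not because it is provably wrong, but because nothing in the corpus ever paired those tokens, which is the same judgment a statistical translator makes about an unlikely phrase.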

From Joel on Software: “A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. ‘Google uses Bayesian filtering the way Microsoft uses the if statement,’ he said. That’s true.”

Compilers these days use the if-statement approach; isn’t it time we moved to a higher level of abstraction?

Point #3: If we could build a compiler to catch runtime and logic errors, we would save millions of developer hours per year.

How much is that worth?



#1 http:// on 04.27.06 at 5:51 pm

Amen, preach it brother! Over the years, very few of my peers understood the argument made in this blog. My advocacy of strongly-typed static languages like Ada and Eiffel was not welcome in the fever swamps of C/C++ I have been forced to slog through over the years. Ruby is a great language and seems to me to be a step up the abstraction ladder (albeit a small step) from much of what’s out there. What you speak of here (AI to find logic errors) sounds great, like a quantum leap!

#2 http:// on 04.27.06 at 7:56 pm

Point #1: Strong typing and static typing aren’t the same thing, and you shouldn’t confuse the two in your article. Strong / weak typing refers to the presence of a type system at all: weakly typed languages like C have less of a type system than strongly typed languages like java and ruby. Static / dynamic typing refers to when the type system is used. In statically typed languages like Java, the type system is used at compile time and runtime. In dynamically typed languages like Ruby, the type system is used only at runtime. Not the same thing at all! Point #2: If we had a compiler that was “smart” enough to find logic errors, couldn’t it just write the program for us in the first place?

#3 rwalling on 04.28.06 at 3:05 am

Good points, Ben. Point #1: See the correction I made in the article’s text regarding strong vs. static typing. Point #2: Pattern matching and AI do not equate to human intelligence. Would you say that since a grammar checker can recognize grammatical errors in writing, it could write a master’s thesis or a brilliant work of fiction? Many tasks, including programming, which some say is more art than science (see Paul Graham’s book Hackers & Painters for more info), take more than pattern matching and AI, no matter how well executed.

#4 http:// on 04.28.06 at 2:34 pm

Do even a small project in a decent strongly but dynamically typed language like Smalltalk and you’ll re-think part of your ideas. The problem is that not every compiler check is free, and most people who learned C as their first language don’t even realise how much static typing costs. I doubt the benefits of static typechecking cover those costs.

#5 John Wood on 05.05.06 at 2:42 am

I think the key is to reduce the distance between the articulation of the requirements, and the code you type in. We need higher level code, written to the right domain model. It should be written by people who understand the problem well, and it should be peer-reviewed by someone else who understands the problem well. When code is high-level enough, then these logic errors *do* show up as compile time errors.