We were troubleshooting an application a few days ago and, after realizing we’d discovered a design flaw, my co-worker Jeremy quipped “It’s not working like it’s supposed to.”
This got me thinking about the nature of errors in software and brought me all the way back to my early computer science courses at UC Davis.
Compiler errors are the best kind of errors. They happen early, they happen often, and they’re easily fixed. If the compiler chokes on my lack of a semicolon I can fix it in about 10 seconds (total time from beginning of compilation to applied remedy). With background compilation in tools like Visual Studio .NET and Eclipse, it’s less than that.
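To make that concrete, here's a toy sketch of my own (not from any real project) using Python's built-in `compile()`, which rejects malformed code before a single statement ever runs. That's the whole appeal of this error class: the feedback arrives instantly.

```python
# Toy illustration: a syntax error is caught before execution begins,
# which is why it costs seconds rather than minutes to fix.

source = "total = price + tax"        # well-formed: compiles fine
broken = "total = price + "           # missing operand: rejected immediately

compile(source, "<example>", "exec")  # succeeds silently

try:
    compile(broken, "<example>", "exec")
except SyntaxError as err:
    print("caught before execution:", err.msg)
```

The interpreter never even considers running the broken line; the mistake is reported with a file name and line number the moment you ask for compilation.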
Runtime errors are a bit nastier; they creep up on you once you’ve started the app and *bam* knock you to the ground with a nasty crash. With a runtime error you must take the time to compile the code, start the application, begin interacting with it, and at some point watch it crash.
If you know what the problem is you can fix it in 30 seconds or less, but the additional time spent finding the bug, time that would not have been spent had the compiler found the bug, pushes runtime errors past the “over 1 minute” mark on a good day. On a bad day you either have to step through the code (if you have that capability), look in the error log, or start adding “print” statements here and there until you figure out where your bug is hiding. Runtime errors are nasty, but at least you have a clue where your problem is located.
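Here's a toy sketch of that shape (my own invented example, standard library only): the function below parses cleanly, so nothing complains up front, but it blows up the moment it runs against the wrong input. The saving grace is the stack trace, which points straight at the offending line.

```python
# Toy illustration: parses fine, crashes only at runtime.

def average(values):
    return sum(values) / len(values)  # crashes when values is empty

try:
    average([])                       # ZeroDivisionError at runtime
except ZeroDivisionError:
    print("crashed at runtime, with a stack trace pointing at the bug")
```

No compiler on earth objected to this code, yet the first empty list takes the program down.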
Logic errors are the worst of the worst. These include design errors caused by faulty architecture or flow diagrams, or a miscalculation by the programmer. Nothing crashes. Nothing dies. And that’s the worst part. Logic errors are beasts, and are not typically discovered until much later in the process. In an application with any kind of complexity you often need to trace through multiple files, and may even scour through the database looking for bad data. Yes, logic errors are the “I spent 6 hours only to realize I had the minus sign in the wrong place” kind of errors.
Point #1: The earlier you find an error the less time it takes to fix.
Strong Typing Is Your Friend
It follows logically that any error caught by the compiler saves you gobs of time. This is why I’m such an adamant fan of strong typing, where variables can only contain one type of value, such as a string or integer. Languages like Java and .NET follow this pattern. In languages like Perl and PHP you can stuff pretty much any value into anything and it won’t crash while being compiled (or interpreted), but will tend to generate a runtime or logic error somewhere down the line. [Thanks to Ben for the correction on static vs. strong typing (see comments for more information)]
Although the flexibility of weak typing is appealing for cranking out small projects very quickly (since you’re allowed to be pretty sloppy), the payback on the debugging side tips the scales heavily in favor of strong typing for anything larger than a small application.
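As a rough illustration of this trade-off (using dynamically typed Python as a stand-in for the loosely checked side, with the static-vs-strong caveat above in mind), the type bug below parses fine and sleeps in an untraveled branch. A compiler for Java or C# would reject the equivalent code outright; here, it waits for the branch to run.

```python
# Toy illustration: a type error that no up-front check catches,
# because the language only discovers it when the line executes.

def label(count):
    if count > 0:
        return "items: " + count    # bug: str + int, only fails at runtime
    return "empty"

print(label(0))                     # "empty" -- the bug sleeps
try:
    label(3)                        # now it surfaces
except TypeError:
    print("type error surfaced only at runtime")
```

The same mistake in a strongly, statically typed language is a ten-second compiler fix instead of a bug report from whoever first passes a positive count.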
Point #2: Use strongly-typed languages. The more the compiler catches, the less time you spend debugging.
If I Only Had A Brain
Given that compiler errors are so dreamy and logic errors so ghastly, it raises the question: why don’t we build compilers that catch more errors? Why don’t they find runtime and (heaven forbid) logic errors while they’re in there mucking around with the code? Wouldn’t it be amazing if a compiler told you “You’re missing a semicolon, and I think your balances aren’t going to add up because you’re missing a minus sign”?
The short answer is because it’s really, really hard. Until data is pulled from a database, a file is opened, or user input is given, the compiler has no idea which paths the execution might follow.
Although compilers have become smarter over the years (VS.NET warns me when I’m doing a number of stupid things), they haven’t reached the point of employing Artificial Intelligence (AI) in an attempt to see the future. Compilers operate with no a priori knowledge of the code beyond the language syntax definitions, and it’s nearly impossible to determine how a person, file or database is going to interact with an application until it actually happens. But then again, it’s nearly impossible to provide accurate search results over billions of documents in a fraction of a second to millions of simultaneous users, and I think there’s a company out there making a few bucks doing that these days.
This problem is hard, but not unsolvable. Computer languages are little more than semantic rules governing communication between developer and machine; semantic rules that are very similar to our written languages such as English, Spanish, German, etc… If Google can build a tool to translate websites through gobs of pattern matching (a 20 billion word corpus), why can’t we build a compiler that uses a hundred million lines of code as a corpus, learning the basic patterns and using them as templates by which to make judgments on code it’s compiling? Since software is based on recurring patterns (open a file, loop through records, parse some things), this is a realistic approach.
From Joel on Software: “A very senior Microsoft developer who moved to Google told me that Google works and thinks at a higher level of abstraction than Microsoft. ‘Google uses Bayesian filtering the way Microsoft uses the if statement,’ he said. That’s true.”
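As a toy of what “Bayesian filtering over a code corpus” might look like, here is a sketch in which a line of code is scored by how much its tokens resemble known-buggy versus known-clean lines. Everything below, the corpora, the tokenization, the scoring, is invented purely for illustration; a real tool would need an enormously larger corpus and far subtler features.

```python
# Toy naive-Bayes-style scorer: positive score means the line's tokens
# look more like the (tiny, invented) buggy corpus than the clean one.

from collections import Counter
import math

clean = ["total = a + b", "open file", "loop over records"]
buggy = ["total = a - b", "total = a +", "loop over record"]

def token_counts(lines):
    return Counter(tok for line in lines for tok in line.split())

def buggy_score(line, clean_counts, buggy_counts):
    # Sum of log-odds per token, with +1 smoothing to avoid zeros.
    score = 0.0
    for tok in line.split():
        p_bug = (buggy_counts[tok] + 1) / (sum(buggy_counts.values()) + 1)
        p_clean = (clean_counts[tok] + 1) / (sum(clean_counts.values()) + 1)
        score += math.log(p_bug / p_clean)
    return score

cc, bc = token_counts(clean), token_counts(buggy)
print(buggy_score("total = a - b", cc, bc))  # positive: resembles the buggy corpus
```

The point isn't that twenty lines of Python can find your minus sign; it's that "compare this code against patterns learned from a corpus" is a mechanical, statistical operation, exactly the kind of thing the quote above says Google treats as routine.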
Compilers these days use the if-statement approach; isn’t it time we moved to a higher level of abstraction?
Point #3: If we could build a compiler to catch runtime and logic errors, we would save millions of developer hours per year.
How much is that worth?