Tools for Clean Code

By Jack Ganssle

Back when dinosaurs roamed the Earth most of our computer work was on punched card mainframes. Some wag at my school programmed the Fortran compiler to count error messages; if your program generated more than 50 compile-time errors it printed a big picture of Alfred E. Neuman with the caption "This man never worries. But from the look of your program, you should."

This bit of helpful advice embarrassed many till the University's administrators discovered that folks were submitting random card decks just to get the picture. Wasting computer time was a sin not easily forgiven, so the systems people were instructed to remove the compiler's funny but rude output. They, of course, simply buried the picture as a challenge to our cleverness.

How times have changed! Not only do we no longer feed punched cards to our PCs, but if only we got just 50 errors or warnings from a compilation of new code!

I've long held the theory that developers don't ship code with syntax errors only because the compiler aborts without producing an object file. Watch someone compiling. Warning messages fly off the screen at seemingly the speed of light, all too often as disregarded as "no tailgating" admonishments.

It blows my mind. Here's a tool almost shouting that the code may be flawed. That assignment looks suspicious. Do you really want to use a pointer that way?

With deaf ears we turn away, link and start debugging. Sure enough, some of these potential problems create symptoms that we dutifully chase down via debugging, the slowest possible way. Some of the flaws don't surface till the customer starts using the product.

Even more horrifying are the folks who disable warnings, or always run the compiler with the minimum level of error-checking. Sure, that reduces output, but it's rather like tossing those unread nastygrams from the IRS into the trash. Sooner or later you'll have to pay, and paying later always costs more.

Why do I think warnings are critical program insights we can't ignore?

Build a PC product and count on product lifecycles measured in microseconds. Embedded systems, though, seem to last forever. That factory controller might run for years or even decades before being replaced. Surely, someone, sometime, will have to enhance or fix the firmware. In three or ten years, when resurrecting the code for an emergency patch, how will that future programmer respond to three hundred warnings screaming by? He won't know if the system is supposed to compile so unhappily, or if it's something he did wrong when setting up the development system from old media whose documentation was lost.

Maintenance is a fact of life. If we're truly professional software engineers, we must design systems that can be maintained. Clean compiles and links are a crucial part of building applications that can be opened and modified.

Did you know that naval ships have their wiring exposed, hanging in trays from the overhead? Fact is, the electrical system needs routine and non-routine maintenance. If the designers buried the cables in inaccessible locations the ship would work, right out of the shipyard, but would be un-maintainable; junk, a total design failure.

Working is not the sole measure of design success, especially in firmware. Maintainability is just as important, and requires as much attention.

Beyond maintenance, when we don't observe warnings we risk developing the habit of ignoring them. Good habits form the veneer of civilization. Dining alone? You still probably use utensils rather than lapping it up canine-like. These habits mean we don't even have to think about doing the right thing during dinner with that important date. The same goes for most human endeavors.

The old saying "the way to write beautiful code is to write beautiful code for twenty years" reflects the importance of developing and nurturing good habits. Once we get in the so-easy-to-acquire habit of ignoring warning messages we lose a lot of the diagnostic power of the compiler.

Of course spurious warnings are annoying. Deal with it. If we spend 10 minutes going through the list and find just one that's suggestive of a real problem, we'll save hours of debugging.

We can and should develop habits that eliminate all or most spurious warnings. A vast number come from pushing the C standard too hard. Stick with plain vanilla ANSI C with no tricks, no implied castings, and that forces the compiler to make no assumptions. The code might look boring, but it's more portable and generally easier to maintain.

Did you know that the average chunk of code contains between 5 and 20% errors before we start debugging? (reference 1). That's 500 to 2000 bugs in a little 10,000 line program. My informal data, acquired from talking to many, many developers but lacking a scientific base, suggests we typically spend about half of the project time debugging. So anything we can do to reduce bugs before starting debug pays off in huge ways.

We need a tool that creates more warnings, not fewer. A tool that looks over the code and finds the obvious and obscure constructs that might be a problem; that says "hey, better check this a little closer! it looks odd."

Such a tool does exist and has been around practically since the dawn of C. lint (named for the bits of fluff it picks from programs) is like the compiler's syntax-checker on steroids. lint works with a huge base of rules and points out structures that just seem weird. In my opinion, a lint is an essential part of any developer's toolbox, and is the first weapon against bugs. It will find problems much faster than debugging.

How is lint different than your compiler's syntax checker? First, it has much stronger standards for language correctness than the compiler. For instance, most lints track type definitions - as with typedef - and resolve possible type misuse as the ultimate types are resolved and

lint, unlike a compiler's syntax checker, is more aware of a program's structure, so is better able to find possible infinite loops, and unused return values. Will your compiler flag these as problems?

b[i]= i++;
status & 2 == 0

lint will.

But much more powerfully, lints can look at how multiple C files interact. Separate compilation is a wonderful tool for keeping information hidden, to reduce file size, and to keep local things local. But it means that the compiler's error checking is necessarily limited to just a single file. We do use function prototypes, for instance, to help the compiler spot erroneous use of external routines, but lint goes much further. It can flag inconsistent definitions or usage across files, including libraries.

Especially in a large development project with many programmers, lint is a quick way to find cross-file problems.

The downside to lint, though, is that it can be very noisy. If you're used to ignoring a handful of warning messages from the compiler then lint will drive you out of your mind. It's not unusual to get 30,000 messages from linting a 1000 line module.

The trick is to train the product. Every lint offers many different configuration options aimed to tune it to your particular needs. Success with lint - as with any tool - requires a certain amount of your time. Up front, you'll lose productivity. There's a painful hump you'll have to overcome before gaining its benefits.

Arrows or Machine Guns?

I'm sure you've seen the comic. A medieval battle rages in the background. Arrows, catapults and boiling oil are the technological state of the art (oh, for the days of less mechanized and efficient warfare!). A salesman, machine gun in hand, is trying to get the general's attention, but his aide-de-camp brushes him off, telling him that his boss is just too busy fighting a war to deal with the intruder.

When I show this to developers they invariably shake their heads with a mocking smile, wondering who could possibly be so short sighted. Sometimes you just have to stop for a bit to adopt a new technology or idea.

When I was a tool vendor my biggest frustration was that customers used only the simplest features of our products; virtually none took the time to learn the more powerful functions that would ultimately save them lots of time. When I talk to tool vendors today they share the same complaint.

We're all very busy; impossible deadlines and unexpected problems fill the days to overflowing. To stop and learn a new tool seems an impossible demand on our time. Clearly it's insane to halt development every time we hear about the next new thing. But we're in a dysfunctional environment when such pressure never lets up.

I despair at times for our profession. So many developers never get a chance to stop. When a project finishes it's invariably late, so the next one is already behind schedule. We jump from one fire to the next. It took 20 years for C to become common in embedded systems. Why? Maybe because developers are too panicked to learn new things.

I have no solutions, other than to observe that sooner or later your boss will die, be promoted, or move to sales (much like dying, I suppose). Then you'll be in charge. Change will come if you use the painful lessons and give your people a chance to pick up new ideas and learn better ways to do their job.

Find some time to learn lint, and to tune it to your application. When I talk to folks who use it, 9 out of 10 are wild about how it has helped them be more productive.

Resources

Commercial and free lints abound; while all are similar they do differ in the details of what gets checked and how one goes about teaching the product to behave in a reasonable fashion.

Probably the most popular of all PC-hosted commercial lints is the $239 version by Gimpel Software (www.gimpel.com). This product has a huge user base and is very stable. It's a small price for such fantastic diagnostic information, particularly in the embedded world where compilers may cost many thousands of dollars.

LCLint is a freebie package whose C source code is also available (http://lclint.cs.virginia.edu/). Linux, other Unix, and PC distributions are all available.

Another factor in writing maintainable software is to follow a consistent set of rules - a standard. The standard defines the prettiness parameters (brace placement, indentation, etc), but goes far beyond these superficial charms. The standard tells the team how to name variables, format comments, limit function sizes and a host of other rules.

Prior to the metric system - a standardized system of units and measures - scientists had trouble communicating in quantitative terms. Each spoke a different dialect of science. We have the same sort of Babel in the software community today; though C and C++ are standards, each of us employs them in stylistically very different manners. Worse, most of us switch styles at will so even a single module has no consistency.

But even in the best of cases, when we have and use a software standard, human frailty means we'll slip up. Use a tool to check your code against your standard. Parasoft's $995 CodeWizard (www.parasoft.com) compares your source against a canned set of 150 rules, flagging violations a la lint.

If CodeWizard's rules were set in stone I'd chuck the product in a heartbeat. Happily they are extensible and modifiable. It's pretty easy to define the checks to match your company's software standard. Does this take an up-front commitment of time? Of course.

Conclusion

A half dozen times a year I'll watch a panicked developer repeatedly invoke the compiler and linker manually. The reason? Invariably it's because he's "too busy" to set up make files. Astonishing.

Equally astonishing is how many of us refuse to use a lint or lint-like product for the very same reason: it takes time to train the thing to behave reasonably. Most tools require an investment of both money and time before you reap benefits. I know it's hard to steal precious hours from a project to tune the development environment, but the alternative is repeating the same problems forever.

Sometimes it's easiest to learn how to do the right thing by looking at wrong examples. Check out "How To Write Unmaintainable Code" at http://mindprod.com/unmain.html.

Two lessons from the site: If God didn't want us to use global variables, he wouldn't have invented them. Rather than disappoint God, use and set as many global variables as possible.

And finally: If you give someone a program, you will frustrate them for a day; if you teach them how to program, you will frustrate them for a lifetime.

Reference 1: A Discipline for Software Engineering, Watts Humphrey, Addison Wesley, Reading, MA 1995, ISBN 0-201-54610-8. Also see the Software Engineering Institute's data (www.sei.cmu.edu) which suggests that at least 6% of all pre-tested code is buggy.