On Engineering Notebooks

Summary: Debugging is hard. We need to slow down to speed up.

I attended college in the mainframe days. The school had one computer for its 40,000 students, a huge dual-processor Univac 1108. Eight or ten tape drives whirled, a handful of disk drives hummed, and two drum storage units spun their six-foot-long cylinders. After 3:00 AM the machine was given over to the systems staff for maintenance of the various utilities. We'd sometimes take over the entire machine to play Space Wars; the university's sole graphic display showed a circle representing a planet and another representing the spaceship, which the players could maneuver. Part of the game's attraction was that, with $10 million of computational power fully engaged, it could even model gravitational force in real time.

There was some access to the machine via ASR-33 teletypes (properly called "teletypewriters"). But most folks submitted jobs in decks of punched cards. A box held 2000 cards - a 2000-line program. It wasn't unusual to see grad students struggling along with a stack of five or more boxes of cards holding a single program. Data, too, was on the cards. And imagine how one felt after dropping a stack of cards and seeing them scrambled all over the floor!

A user would walk up to the counter in the computer science building with the stack of cards. The High Priest of Computing gravely intoned turn-around times, typically 24 hours. That meant the entire edit, compile, test cycle was a full day and night. Stupid errors cost days, weeks and months.

But stupid errors abounded. If a FORTRAN program had more than 50 compile-time flaws, the compiler printed a picture of Alfred E. Neuman, with the caption "this man never worries, but from the look of your code, you should."

Something was invented to find the most egregious mistakes: playing computer. You'd get a listing and execute the code absolutely literally in your head.

That technique has evolved into the modern process of inspections, which too few of us use. Instead, the typical developer quickly writes some code, goes through the build, and starts debugging. Encountering a bug, he may quickly change that ">=" to a ">", rebuild, and resume testing. The tools are so good that the iterations happen at light-speed but, unfortunately, there's little incentive to really think through the implications of the change. Subtle bugs sneak in.

I recommend that developers use engineering notebooks or their electronic equivalent. When you run into a bug take your hands off the keyboard and record the symptom in the notebook. Write down everything you know about it. Some debugging might be needed - single stepping, traces, and breakpoints. Log the results of each step. When you come up with a solution, don't implement it. Instead, write that down, too, and only then fix the code.
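For those who prefer the electronic equivalent, here is one minimal sketch in Python of what an entry might look like. Everything in it is an assumption made for illustration: the BugEntry class, its field names, and the sample symptom are hypothetical, not a standard tool. The one idea it tries to capture is the ordering above: the entry refuses to be marked fixed until a proposed fix has been written down.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class BugEntry:
    """One notebook entry for one bug (hypothetical structure)."""
    symptom: str
    opened: datetime = field(default_factory=datetime.now)
    observations: List[str] = field(default_factory=list)
    proposed_fix: Optional[str] = None
    fixed: bool = False

    def log(self, observation: str) -> None:
        # Record the result of one debugging step (trace, breakpoint, ...).
        self.observations.append(observation)

    def propose(self, fix: str) -> None:
        # Write the intended fix down before touching the code.
        self.proposed_fix = fix

    def mark_fixed(self) -> None:
        # Enforce the discipline: no recorded fix, no code change.
        if self.proposed_fix is None:
            raise RuntimeError("write the fix down before changing the code")
        self.fixed = True


# Illustrative usage with made-up details.
entry = BugEntry(symptom="Loop processes one sample too many")
entry.log("Breakpoint at loop exit: index is one past the last element")
entry.propose('Change the loop test from ">=" to ">"')
entry.mark_fixed()
```

A paper notebook does the same job; the point is the order of the steps, not the tool.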

The result is that instead of spending two seconds not really thinking things through, you've devoted half a minute or so to really noodling out what is going on. The odds of getting a correct fix go up.

But wait, there's more. Go over your engineering notebook every 6 to 12 months. The patterns it will reveal are surprising. When I started doing this it became embarrassingly clear that I'm always mixing up "==" and "=". Now I'm quite careful when typing those operators, and I never make that mistake.
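If the notebook is electronic, that periodic review can be as simple as tallying root causes. The sketch below assumes each entry has been tagged with a short cause; the tags and counts are invented for illustration and are not real data.

```python
from collections import Counter

# Hypothetical root-cause tags copied from past notebook entries.
root_causes = [
    "= vs ==",
    "off-by-one",
    "= vs ==",
    "uninitialized variable",
    "= vs ==",
    "off-by-one",
]

# Count how often each cause appears and list the worst offenders first.
tally = Counter(root_causes)
for cause, count in tally.most_common():
    print(f"{count:2d}  {cause}")
```

Whatever lands at the top of the list is the habit worth watching.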

This is a feedback loop that improves our abilities.

But there's even more! Have you ever watched (or participated in) a debugging session that lasts days or weeks? There's a bug that's just impossible to find. We run all sorts of tests. Almost inevitably we'll forget what tests we ran and repeat work done a day or a week ago. The engineering notebook is a complete record of those tests. It gives us a more scientific way to guide our debugging efforts and speeds up the process by reminding us of what we've already done.
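As a rough sketch of that record-keeping, assuming an electronic notebook, the fragment below stores each test with its result and checks the record before a test is run again. The function names and the sample test are hypothetical.

```python
from typing import Dict, Optional

# Maps a normalized test description to the result observed.
tests_run: Dict[str, str] = {}


def record_test(description: str, result: str) -> None:
    # Normalize so trivially different wording still matches later.
    tests_run[description.strip().lower()] = result


def already_ran(description: str) -> Optional[str]:
    # Return the earlier result, or None if this test is new.
    return tests_run.get(description.strip().lower())


# Illustrative usage with a made-up test.
record_test("Run with interrupts disabled", "bug still occurs")
previous = already_ran("run with interrupts disabled")
if previous is not None:
    print(f"Already tried; result was: {previous}")
```

Matching on exact wording is crude, but even that catches the common case of repeating last week's experiment.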

The engineering notebook is top secret. No boss should have access to it. This is a tool designed for personal improvement only.

What's your take? Do you use any sort of tool to guide your debugging efforts (outside of normal troubleshooting tools)?

Published October 29, 2014