On Names

The Linux printk function has a number of logging levels, which include KERN_EMERG, KERN_ERR and others.

What, exactly, does EMERG mean? Emergency? Emerging? Emergent? The latter sounds like part of the title of a horror movie.

Or ERR – perhaps that's error, but it could be to err, or erroneous. Maybe even erogenous (nearly, and definitely my preference). Combine that with KERN (definition: "a part of the face of a type projecting beyond the body or shank") and the puerile possibilities are positively provocative.

Why is every index variable named i, j or k? Or, for nested loops, ii? Or, my personal favorite, iii? The habit dates back some 60 years: when Fortran came out, variables whose names started with the letters I through N were integers by default. Few remember this, but most of us still mindlessly follow the convention. I suspect few readers have ever used Fortran.

It's interesting that some believe long names yield self-documenting code (which isn't true) yet so many of us abbreviate to the point of obfuscation. The code has to do two things: work, and express its intent to a future version of yourself or to some poor slob faced with maintaining it all years from now after you're long gone. If it fails to do either it's junk. I've met plenty of developers who say they really don't care what happens after they move on to another job or retire. But if we are professionals we must act professionally and do good work for the sake of doing good work.

Clarity is our goal. Names are a critical part of writing clear code. It's a good idea to type a few additional characters when they are needed to remove any chance of confusion.

Pack the maximum amount of information you can into a name. I had the dreary duty of digging through some code last month where no variable name exceeded three characters. And there were a lot of variables! The mess was essentially unmaintainable, which seemed to be one of the author's primary goals.

We've known how to name things for 250 years. Carl Linnaeus taught scientists to start with the general and work towards the specific. Kingdom, Phylum, Class, Order, Family, Genus, Species. Since in the West we read left-to-right, seeing the Kingdom first gets us in the general arena, and as our eyes scan rightward more specificity ensues. So read_timer is a really lousy name. Better: timer_read. timer_write. timer_initialize. A real system probably has multiple timers, so use timer0_read, timer0_write. Or even better, timer_tick_read, timer_tick_write. One could also make a good argument for tick_timer_read.
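As a rough sketch of that general-to-specific ordering, here is how a small tick-timer module might be laid out in C. The module, its function names and the software counter standing in for hardware are all hypothetical, invented only to show how the names sort and read:

```c
#include <stdint.h>

/* Hypothetical tick-timer module. A plain variable stands in for the
 * hardware counter; a real driver would access a device register. */
static uint32_t timer_tick_count;

/* Every name starts with the general (timer), narrows to the particular
 * timer (tick), and ends with the action. */
void timer_tick_initialize(void)
{
    timer_tick_count = 0u;
}

void timer_tick_write(uint32_t tick_count)
{
    timer_tick_count = tick_count;
}

uint32_t timer_tick_read(void)
{
    return timer_tick_count;
}
```

With this scheme everything related to the tick timer groups together in an alphabetized symbol list or an editor's auto-complete menu, which is one practical payoff of putting the general term first.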

Avoid weak and non-specific verbs like "handle," "process" and "update." I have no idea what "ADC_Handle()" means. "ADC_filter()" conveys much more information.
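To illustrate, a specific verb can go one step further and name the kind of filtering. The sketch below assumes a four-sample moving average; the function name, the depth and the buffer are hypothetical choices, not taken from any real ADC driver:

```c
#include <stdint.h>

#define ADC_FILTER_DEPTH 4u   /* assumed window size, in samples */

static uint16_t adc_filter_samples[ADC_FILTER_DEPTH];  /* starts zeroed */
static uint32_t adc_filter_next_index;

/* Stores the newest raw reading and returns the average of the most
 * recent ADC_FILTER_DEPTH readings. The name says what is done to the
 * data, which "ADC_Handle" never could. */
uint16_t adc_filter_moving_average(uint16_t raw_counts)
{
    uint32_t sum_counts = 0u;

    adc_filter_samples[adc_filter_next_index] = raw_counts;
    adc_filter_next_index = (adc_filter_next_index + 1u) % ADC_FILTER_DEPTH;

    for (uint32_t i = 0u; i < ADC_FILTER_DEPTH; i++) {
        sum_counts += adc_filter_samples[i];
    }
    return (uint16_t)(sum_counts / ADC_FILTER_DEPTH);
}
```

Because the buffer starts at zero, the first few results ramp up toward the input value rather than matching it immediately.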

I do think that short toss-away names are sometimes fine. A tight for loop can benefit from a single-letter index variable. But if the loop is more than a handful of lines of code, use a more expressive name.
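Here is one way that distinction might look in practice. Both functions are hypothetical examples: the first is a tight loop where a single-letter index is easy to follow, the second is a nested loop where named indices carry real information.

```c
#include <stddef.h>
#include <stdint.h>

/* Tight loop: the index lives for three lines, so "i" is fine. */
uint32_t sample_sum(const uint16_t *samples, size_t sample_count)
{
    uint32_t sum = 0u;
    for (size_t i = 0; i < sample_count; i++) {
        sum += samples[i];
    }
    return sum;
}

/* Nested loops over a row-major image: "row" and "column" say which
 * axis is which, where "i" and "ii" would not. */
size_t image_count_above(const uint16_t *pixels, size_t row_count,
                         size_t column_count, uint16_t threshold)
{
    size_t count = 0;
    for (size_t row = 0; row < row_count; row++) {
        for (size_t column = 0; column < column_count; column++) {
            if (pixels[row * column_count + column] > threshold) {
                count++;
            }
        }
    }
    return count;
}
```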

Don't use acronyms and abbreviations. In "Some Studies of Word Abbreviation Behavior" (Journal of Experimental Psychology, 98(2):350-361), authors Hodge and Pennington ran experiments with abbreviations and found that a third of the abbreviations that were obvious to those doing the word-shortening were inscrutable to others. Thus, abbreviating is a form of encryption, which runs counter to our goal of clarity.

There are two exceptions to this rule. Industry-standard acronyms, like "USB", are fine. Also acceptable are any abbreviations or acronyms defined in a library somewhere – perhaps in a header file:

/* Abbreviation Table
 * dsply    == Display (the verb)
 * disp     == Display (our LCD display)
 * tot      == Total
 * calc     == Calculation
 * val      == Value
 * mps      == Meters per second
 * pos      == Position
 */

Does a name have a physical parameter associated with it? If so, append the units. What does the variable "velocity" mean? Is it in feet/second, meters/second or furlongs/fortnight? Much better is "velocity_mps", where mps was defined as meters/second in the header file.
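A minimal sketch of unit suffixes at work follows. The function names are hypothetical, and the suffixes fps (feet per second), m (meters) and s (seconds) are assumed to be defined in the same abbreviation table as mps:

```c
/* Assumed abbreviations: mps == meters per second, fps == feet per
 * second, m == meters, s == seconds. */
#define FEET_PER_METER 3.28084

/* The suffixes on the name and the parameter make the direction of the
 * conversion impossible to misread. */
double velocity_fps_to_mps(double velocity_fps)
{
    return velocity_fps / FEET_PER_METER;
}

/* meters/second multiplied by seconds gives meters; the suffixes let a
 * reviewer check the units by eye. */
double distance_traveled_m(double velocity_mps, double duration_s)
{
    return velocity_mps * duration_s;
}
```

A call such as distance_traveled_m(velocity_fps, duration_s) now looks wrong on its face, which is the whole point of the suffix.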

One oh-so-common mistake is to track time in a variable with a name like "time." What does that mean? Is it in microseconds, milliseconds or clock ticks? Add a suffix to remove any possibility of confusion.
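For instance, a timeout check might carry its units like this. The names, the 250 ms limit and the assumption of a free-running 32-bit millisecond counter are all hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define TIMEOUT_MS 250u   /* assumed limit, in milliseconds */

/* Microseconds to whole milliseconds (fraction truncated). */
uint32_t time_us_to_ms(uint32_t time_us)
{
    return time_us / 1000u;
}

/* Both arguments are readings of a free-running millisecond counter.
 * Unsigned subtraction gives the right elapsed time even if the counter
 * wrapped between the two readings. */
bool timeout_expired(uint32_t start_time_ms, uint32_t now_time_ms)
{
    uint32_t elapsed_time_ms = now_time_ms - start_time_ms;
    return elapsed_time_ms >= TIMEOUT_MS;
}
```

With the suffixes in place, passing a microsecond reading to timeout_expired() stands out in a code review instead of hiding behind a bare "time."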

The Mars Climate Orbiter was lost due to a units mix-up. We can, and must, learn from that $320 million mistake.

Published March, 2016