The Perils of NMI
NMI is a critical resource, yet all too often it's misused.
Published in Embedded Systems Programming, April 1991
By Jack Ganssle
Wise amateurs fear interrupts. Fools go where wise men fear to tread. Normal sequential code is hard enough to understand, code, and debug. Toss in a handful of asynchronous events that randomly change the processor's execution path, perhaps thousands of times per second, and you have a recipe for disaster.
Yet interrupts are an important fact of life for all real time systems. No experienced programmer would dream of replacing a clean interrupt service routine with polled I/O, particularly where fast I/O response is required.
In fact, interrupts are both the best and the worst microprocessor feature. Well thought out interrupt-driven code will be reasonably easy to write, debug and maintain. A poorly conceived interrupt routine is probably the worst possible software to work on. Because interrupts are so important to embedded systems, it is vital to become proficient with their use.
If interrupts are tough to work with, then the non-maskable interrupt (NMI) is the true killer of the business. Be careful before you connect a peripheral to your processor's NMI input - think through the problems carefully.
Almost every processor has some sort of NMI signal, though it may be called something else. On the 68000, a level 7 interrupt cannot be masked, and is equivalent to NMI. Some 8051-family CPUs have no non-maskable interrupt, an idea that is sort of appealing in terms of enforcing interrupt discipline.
I'm a firm believer in restricting NMI to those conditions that are truly unusual and of momentous importance. Quite a few designers use NMI as a general purpose interrupt, a practice that usually spells disaster.
When timing gets tight, the code can easily disable a conventional interrupt. Indeed, the very assertion of an interrupt signal automatically turns all interrupts off until the software explicitly reenables them, giving the code a clean window to process a high priority task. Not so with NMI. An NMI at any time will interrupt the CPU - no ifs, ands or buts. As long as the hardware supplies NMIs to the processor, it will stop whatever it's doing and vector through the NMI handler.
The very fact that NMI can never be disabled makes it ideal for handling a small but vital class of extremely high priority events. Chief among these is a power failure. If a system must die gracefully, then hardware that detects the imminent loss of power can assert NMI to let the software park disk heads, put moving sensors into a "safe" state, copy important variables from RAM to non-volatile storage, and generally prepare for being down.
Modern power supplies have little reserve capacity. Old linear designs had massive filtering capacitors that acted like batteries with several seconds of reserve capacity. Today's off-line switchers use comparatively tiny capacitors; smart electronics does the filtering. When the AC power goes down, the switcher's output quickly follows suit.
During the short time it takes for power to trail away the code may very well be executing with interrupts disabled. Only NMI is guaranteed to be available at all times. Power fail is such an important event that NMI is really the only option for notifying the software of power's impending demise.
Perhaps more should be said about power fail circuits at this point, since so many suffer from serious design flaws. Most embedded systems ignore power fail conditions. Running ROM-based code with no dangerous or critical external hardware, they can restart without harm from the top when power resumes. However, two types of systems require power-fail management hardware and software. The first comprises systems that control moving objects: a disk controller should park the head, a robot should stop all motors, and an X-ray system should shut down the beam.
The other class are systems that preserve transient data through a power-up cycle. A data acquisition system might need to keep logged data even when power goes down, an instrument sometimes has to save painfully collected calibration constants, and a video game should remember high scoring individuals' initials and totals.
Decaying Power
Far too many designs rely on nothing more than battery backed up static RAMs or some true non-volatile device like an EEPROM to store data through multiple on/off cycles. More often than not these schemes work, but all will sooner or later fail. Let's consider what happens when the AC power fails.
Without AC, the power supply stops working. The computer continues to run from the energy stored in the supply's output capacitor. The amount of time left before the computer goes haywire is proportional to the size of the capacitor in microfarads and inversely proportional to the current the system draws.
Until the computer's 5 volts decays to about 4.75 it continues to run properly. At the 4.75 volt level most of the system's chips are no longer operating in their design region. No one can predict what will happen with any certainty.
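To put rough numbers on that proportionality, here is a minimal sketch. It assumes the load draws a constant current, so the capacitor voltage falls linearly and t = C * dV / I; a real supply will behave less tidily. The function name and the example figures are my own illustrative assumptions, not values from any particular supply.

```c
#include <assert.h>

/* Estimated hold-up time in seconds.
 * Assumption: the load current is constant, so the capacitor voltage
 * falls linearly and  t = C * (v_start - v_min) / i_load.            */
double holdup_seconds(double cap_farads, double v_start, double v_min,
                      double i_load_amps)
{
    assert(cap_farads > 0.0 && i_load_amps > 0.0 && v_start >= v_min);
    return cap_farads * (v_start - v_min) / i_load_amps;
}
```

With an assumed 10,000 microfarad output capacitor and a 2 amp load, holdup_seconds(10000e-6, 5.0, 4.75, 2.0) works out to about 1.25 milliseconds between a healthy 5 volts and the 4.75 volt limit.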
At about 4.8 to 4.9 volts the well-designed power fail circuit will inject an NMI into the computer (some detect missing AC cycles, a better but more expensive approach). Probably the system has only milliseconds before Vcc decays to the 4.75 volt region of instability. The NMI routine should quickly shut down external events and save critical variables.
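A power fail NMI routine along these lines might look like the sketch below. Everything hardware-related is a stand-in so the code runs on a desktop compiler: nv_store plays the part of battery-backed RAM or EEPROM, motors_enabled plays the part of an output port, and the structure fields are invented for illustration. The checksum is my addition, there so that start-up code can tell a completed save from an interrupted one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state worth preserving across a power failure. */
typedef struct {
    uint32_t log_count;
    int16_t  cal_offset;
    int16_t  cal_gain;
} critical_state_t;

/* Image kept in non-volatile storage: the state plus a checksum. */
typedef struct {
    critical_state_t state;
    uint16_t         checksum;
} nv_image_t;

/* Stand-ins for real hardware (assumptions, see the text above). */
nv_image_t       nv_store;            /* "non-volatile" memory      */
volatile uint8_t motors_enabled = 1;  /* "output port" for motors   */
critical_state_t live_state;          /* working copy in normal RAM */

/* Additive checksum with a nonzero seed, so that erased (all-zero)
 * storage does not look like a valid image. */
static uint16_t checksum16(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint16_t sum = 0x5A5Au;

    assert(data != NULL);
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + p[i]);
    return sum;
}

/* Power fail NMI handler: short and simple, since time is nearly up. */
void nmi_power_fail(void)
{
    motors_enabled = 0;               /* 1. make external hardware safe */
    nv_store.state = live_state;      /* 2. save the critical variables */
    nv_store.checksum = checksum16(&nv_store.state, sizeof nv_store.state);
}

/* Start-up code: returns 1 and restores the state if the image is intact. */
int restore_after_power_up(void)
{
    if (checksum16(&nv_store.state, sizeof nv_store.state) != nv_store.checksum)
        return 0;
    live_state = nv_store.state;
    return 1;
}
```

The handler does the safety-critical work first and writes the checksum last, so a save that is cut short by collapsing power will be rejected when the system comes back up.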
After processing the power fail condition, the computer and external I/O are all in a safe state. The voltage level continues to decline past 4.75 volts, eventually reaching zero. Unfortunately, the supply's capacitor decays exponentially. It will provide something between zero and 4.75 volts for a comparatively long time (perhaps seconds).
What do the CPU chip, memories, and glue logic do with, say, 4 volts applied? No one knows. No vendor will guarantee any behavior under the 4.75 volt level. Frequently the program just runs wild, executing practically random instructions. Your carefully saved data or meticulously protected I/O could be destroyed by rogue instructions!
No power fail circuit is complete unless it clamps the reset line whenever power is less than the magic 4.75 volt level. A suitable circuit keeps the CPU in a reset state, preventing wild execution from corrupting the efforts of the NMI power save routine. Motorola sells a 3-terminal reset controller for less than a dollar which will hold reset down in low Vcc conditions.
Consider another case: suppose the power grid's sadly overloaded summertime generating capacity experiences a brownout. If the line drops from 110 VAC to, say, 80 volts, what happens to the +5 volt output from your system's power supply? Most likely it will go out of regulation, giving perhaps 3 or 4 volts until the 110 input level is reestablished. Hopefully the power fail circuit will assert an NMI to the processor chip. Using the conventional resistor/capacitor unclamped reset circuit, the reset input will decline only to the 3-4 volt level, not nearly low enough to force a reset when power comes back.
The reset clamping circuit will not only keep the CPU in a safe state; in this brownout case it will also ensure that the system restarts properly when +5 volts is reestablished.
Regardless, NMI is the only reasonable interrupt choice for power fail detection.
NMI Abuse
Unfortunately, NMI is widely abused as a general purpose interrupt. Use NMI only for events that occur infrequently. Never substitute it for poor design.
It's not too unusual to see a divider circuit driving NMI, generating hundreds or thousands of interrupts per second. Usually these designs start life using a reasonable maskable interrupt. As the programmers debug the system they find the CPU occasionally misses an interrupt, so they switch over to NMI. This is a mistake. If the code misses interrupts, there is a fundamental flaw in its design that NMI will not cure.
Your code will miss interrupts only if some bottleneck keeps them disabled for too long. Always design the code to keep interrupts disabled only while servicing the hardware. Reenable them as soon as possible. With good reentrant design, interrupts should never be off for more than a few tens of microseconds.
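One common way to keep the interrupts-off window short is to let the ISR do nothing but move data into a queue, and have the main-line code mask interrupts only for the few instructions needed to pull an item out. The sketch below shows the idea under some loud assumptions: disable_interrupts() and enable_interrupts() are stand-ins for the CPU's real mask instructions, and the ISR takes its byte as a parameter where real code would read a device register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 16u  /* power of two, so indexes wrap with a mask */

static volatile uint8_t  queue[QUEUE_SIZE];
static volatile unsigned head;  /* advanced only by the ISR       */
static volatile unsigned tail;  /* advanced only by the main line */

/* Hypothetical stand-ins for the CPU's interrupt mask instructions. */
volatile int irq_masked;
static void disable_interrupts(void) { irq_masked = 1; }
static void enable_interrupts(void)  { irq_masked = 0; }

/* ISR: service the hardware, queue the data, get out. */
void uart_rx_isr(uint8_t byte_from_hw)
{
    unsigned next = (head + 1u) & (QUEUE_SIZE - 1u);

    if (next != tail) {         /* queue full: the byte is dropped */
        queue[head] = byte_from_hw;
        head = next;
    }
}

/* Main line: returns 1 and stores a byte in *out if one is waiting. */
int queue_get(uint8_t *out)
{
    int got = 0;

    assert(out != NULL);
    disable_interrupts();       /* window is a handful of instructions */
    if (tail != head) {
        *out = queue[tail];
        tail = (tail + 1u) & (QUEUE_SIZE - 1u);
        got = 1;
    }
    enable_interrupts();        /* back on before any slow processing */
    return got;
}
```

All the slow work (parsing, arithmetic, display updates) happens after queue_get() returns, with interrupts enabled.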
On the Edge
Quite a few processors implement NMI as an edge sensitive interrupt. This guarantees that even a breathtakingly short pulse will set the CPU's internal NMI flip flop, so the interrupt simply cannot be missed. It might, however, cause several kinds of nasty problems.
Suppose the input comes from the real world, perhaps after having been transmitted a few feet. Without proper pulse shaping circuitry, the signal could easily have ragged edges or even multiple, closely spaced transitions. Maskable interrupts live quite happily with short bouncing on their lines, since the first transition will make the processor disable the input and start the ISR. Even the fastest code will take a few microseconds to service and reenable the interrupt, by which time the transients will be long gone. NMI cannot be disabled; every bit of bounce will reinitiate the NMI service routine. The result: one real interrupt might masquerade as several independent NMIs, each one pushing onto the stack and re-invoking the ISR.
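A toy simulation makes the difference concrete. This is a deliberately simplified model, not a description of any particular CPU: the input is a list of sampled logic levels, the NMI fires on every rising edge, and the maskable input is simply ignored for mask_samples samples after it fires. The function names and the sample data are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Edge-sensitive NMI: every rising edge (0 -> 1) starts the handler. */
unsigned count_nmi_triggers(const int *samples, size_t n)
{
    unsigned triggers = 0;

    assert(samples != NULL || n == 0);
    for (size_t i = 1; i < n; i++)
        if (samples[i - 1] == 0 && samples[i] == 1)
            triggers++;
    return triggers;
}

/* Maskable input, simplified: the first rising edge masks the input for
 * mask_samples samples while the ISR runs, so bounce that falls inside
 * that window is never seen. */
unsigned count_masked_triggers(const int *samples, size_t n,
                               size_t mask_samples)
{
    unsigned triggers = 0;
    size_t masked_until = 0;    /* index at which the input is reenabled */

    assert(samples != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (i < masked_until)
            continue;
        if (samples[i - 1] == 0 && samples[i] == 1) {
            triggers++;
            masked_until = i + mask_samples;
        }
    }
    return triggers;
}
```

Feed both functions the bouncy sequence 0,1,0,1,0,1,1,1,1,1,0,0 and the NMI model counts three triggers, while the maskable model with an eight-sample service window counts one.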
Edge sensitive inputs respond when the input voltage crosses some threshold. Imperfect digital circuits give a rather broad window to the threshold. If the NMI input signal is perfectly clean but moves slowly from the idle to the asserted state, it stays within the threshold region for far too long, sometimes causing multiple NMI triggers.
Finally, the edge sensitive nature of the NMI signal renders it susceptible to every stray bit of electrical noise. A clean NMI driven by a gate on the other side of a circuit board might pick up unexpected noise from other parts of the circuit.
Edge sensitive NMI inputs must be clean and noise free, and should switch quickly.
Remember that debugging NMI service routines is sometimes tough. How will you single step in an NMI service routine if, while debugging, dozens more NMIs keep coming? Most of us debug code by stopping at a breakpoint and looking at the registers and variables. If, when debugging the NMI handler, another comes along while we're stopped, after resuming execution the service routine will re-invoke itself, probably corrupting a non-reentrant value.
In summary, NMI is a valuable feature. Don't abuse it; restrict its use to those few situations where only an NMI will solve a problem.