A Different Take On Asserts
By Jack Ganssle
Published in ESP, January 2004.
Expert programmers use the assert() macro to seed their code with debugging statements. So should we all.
Readers unfamiliar with assert() should check out Niall Murphy's two articles in the April and May 2001 issues of Embedded Systems Programming. Niall does a great job of describing the what, how and why of this debugging construct. Basically, if assert()'s argument is zero, the macro prints an error message.
The macro must have no effect on the system - it shouldn't change memory, interrupt state, or I/O, and it should make no function calls. The user must feel free to sprinkle these at will throughout the program. Disabling the asserts in released code must never change the way the program operates.
The macro generally looks something like:
#define assert(s) if (s) {} else printf("Error at %s %d\n", __FILE__, __LINE__)
It's sort of an odd way to write code - if the condition is true (in which case assert() takes no action) there's a null pair of braces. Why not test the negation of the statement "s"?
The answer lies in my, uh, assertion above: the macro must have no effect on the system. Code it without the else, and poorly written software, perhaps including an unmatched else from another if (this is a debugging construct after all; its role is to look for dumb mistakes), will cause the assert to change the system's behavior in unpredictable ways.
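To see the hazard, imagine a version without the empty braces, something like #define assert(s) if (!(s)) handle_error(), dropped into code like this (ready, ptr, handle_error and recover are all hypothetical names):

if (ready)
    assert(ptr != NULL);  /* expands to: if (!(ptr != NULL)) handle_error(); */
else
    recover();            /* this else now binds to the macro's if, not to if (ready) */

After expansion, recover() runs when ready is true and the assertion passes, and never runs when ready is false. The program's behavior changed, and the compiler won't say a word. The empty-braces form leaves no if dangling for a stray else to grab.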
Embedded Issues
Few embedded systems make use of the assert() macro since we're usually in an environment with neither display console nor printf statement. Where do the error messages go? Equally problematic, embedding strings ("Error at" and the file name) eats valuable ROM.
Replace the printf with a software interrupt. Plenty of processors have a one byte or single word instruction that vectors the execution stream to an interrupt handler. Write code to capture the exception and take some sort of action. Log the error to a data structure, flash an LED, or put the device in a safe mode.
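Here's a minimal sketch of that arrangement, with SWI() standing in for whatever intrinsic or one-line inline-assembly wrapper invokes your CPU's software interrupt, and with hypothetical logging and LED helpers:

#define assert(s) if (s) {} else SWI()   /* SWI(): your CPU's trap instruction */

volatile unsigned assert_count;          /* debugging breadcrumbs */

void assert_handler(void)                /* the handler the trap vectors to */
{
    assert_count++;                      /* log the error to a data structure */
    error_led_on();                      /* hypothetical: flash an LED */
    enter_safe_mode();                   /* hypothetical: park in a safe state */
}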
Initialize your debugging tools to automatically set a breakpoint on the handler. You'll forget the breakpoint is set. But once in a while, Surprise! The system suddenly stops, the debug window highlighting the first line of the exception routine. You've captured a bug in its early phases, when it's easiest to track.
Systems constrained by small ROM sizes can leave out the __FILE__ and __LINE__ arguments. Pop the stack and log the caller's return address. It's an easy matter to track back to find which assertion threw the exception.
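One way to sketch the idea, assuming a GCC-style compiler and a variant of assert() that calls a handler as an ordinary function, so the return address on the stack points just past the failed assertion:

/* Sketch for a GCC-style toolchain; __builtin_return_address(0) reads the
   caller's return address. The log size is an arbitrary choice. */
#define ASSERT_LOG_SIZE 16
static void *assert_log[ASSERT_LOG_SIZE];
static unsigned assert_log_used;

void assert_failed(void)
{
    if (assert_log_used < ASSERT_LOG_SIZE)
        assert_log[assert_log_used++] = __builtin_return_address(0);
}

#define assert(s) if (s) {} else assert_failed()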
Most developers embed assert() inside a preprocessor conditional. Flip a compile-time switch and they all disappear. This gives us a highly-debuggable version of the code for our own use, and a production release not seeded with the macros.
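The standard library keys the switch off the NDEBUG symbol; a homegrown embedded version can follow the same convention (assert_failed() being the handler sketched above):

#ifdef NDEBUG
#define assert(s) ((void)0)                        /* release build: compiles away */
#else
#define assert(s) if (s) {} else assert_failed()   /* debug build: active */
#endif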
But I'm not keen on shipping code that differs from our test version. NASA's commandment to "test what you fly and fly what you test" makes an awful lot of sense for building spacecraft as well as any embedded system. When ROM and CPU cycles aren't in short supply, leave the assert()s seeded in the code, enabled, not #defined out. If you believe the system can merrily proceed despite an assertion failure, code the exception handler to log the error, leaving debugging breadcrumbs behind, and return.
Is your system connected to the 'net? Why not send an email message logging the error with a snapshot of the stack to crash@yourcompany.com? Be sure to include the firmware's version number.
This is yet another reason to get and use a version control system. If your code isn't static, if new versions proliferate like politicians' promises of tax relief, the stack contents will be meaningless unless you can recreate earlier releases.
Embedded systems should use asserts to check housekeeping problems, like stack growth. On an x86 processor the instruction mov ax,sp grabs the current stack pointer. Embed this, or a similar instruction for your CPU, into an assert that then compares against a nearly-full stack. Seed every task with the macro.
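A sketch of a stack assert in C, assuming a descending stack whose bottom the linker exports as a hypothetical __stack_base symbol; taking a local variable's address approximates the stack pointer without any assembly:

extern char __stack_base[];              /* hypothetical linker symbol */
#define STACK_SIZE   2048                /* application-specific */
#define STACK_MARGIN (STACK_SIZE / 10)   /* complain when under 10% remains */

#define assert_stack()                                       \
    do {                                                     \
        char marker;   /* its address approximates SP */     \
        assert(&marker - __stack_base > STACK_MARGIN);       \
    } while (0)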
Check malloc's return value, too, if you're gutsy enough to use it in firmware. If malloc fails it returns NULL. Ignore that at your peril!
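Something as simple as a hypothetical checked_malloc() wrapper catches the failure at its source rather than three crashes later:

#include <stdlib.h>

void *checked_malloc(size_t len)   /* hypothetical wrapper */
{
    void *p = malloc(len);
    assert(p != NULL);             /* trap allocation failures at the source */
    return p;
}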
Real Time Asserts
Too many developers of real time systems think only in the procedural domain, working diligently to build a sequential series of instructions that implement some algorithm, blithely ignoring the time aspects of their code. Building real time code means managing time as seriously as we build if-elses. Track free CPU microseconds as well as available bytes of ROM.

There's surprisingly little interest in managing time. C and C++ compilers give us no help. They should produce a listing which gives us the execution time, or at least a range of times, for each statement in our program. We compiler users have no idea if a simple ++int takes a microsecond or a week. More complex statements invoke the compiler's runtime library, truly a black hole of unknown execution times.
Years ago a friend worked on a Navy job using the CMS-2 language. The compiler was so bug-ridden that they adopted a unique coding strategy: write a function, compile it, and see if the assembly output made sense. If clearly wrong, change something in the CMS-2 source - even spacing - and hope a recompile produced better results. In effect, they made random changes to their source in hopes of taming the compiler.
We do the same thing when the compiler does not produce timing information. How do we fix a slow snippet? Change something - almost anything - and hope. That's more akin to hacking than to software engineering. Seems to me the vendors do their customers a horrible disservice by not documenting best, worst and average execution times.
A very cool product, despite its high price, was Applied Microsystems' CodeTest. Among other things, it measured execution time of each function in a program, seeding the code with special instructions whose action was captured by a hardware probe. Yet Applied failed, never able to generate much interest in that product.
In lieu of buying time tools, why not build real-time assert()s, macros that help us debug in the time domain?
The oldest timing trick is to drive an output bit high when entering a routine, and then low as we exit. Probe the port with an oscilloscope and measure the time the bit is set.
Rather than dropping a pair of outport() C statements directly into the code, I prefer to define assert-like macros. Never sprinkle low level I/O into anything other than drivers, except in the most ROM-constrained systems.
Call them RT_set_assert() and RT_unset_assert(), the RT meaning real-time to clearly indicate that these are not your father's asserts anymore.
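A minimal sketch, with the port address and bit assignment as hypothetical placeholders for your hardware's GPIO output register:

#define DEBUG_PORT (*(volatile unsigned char *)0x40001000)  /* hypothetical address */
#define DEBUG_BIT  0x01

#define RT_set_assert()   (DEBUG_PORT |= DEBUG_BIT)
#define RT_unset_assert() (DEBUG_PORT &= (unsigned char)~DEBUG_BIT)

Drop RT_set_assert() at the top of the routine under test and RT_unset_assert() at every exit point, then watch the bit on a scope.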
This brain-dead trick has been around forever. But you can learn a lot about the behavior of a system by extending the idea just a bit. Want to measure min and max execution time of a function or ISR? Set a digital scope to store multiple acquisitions on the screen. Trigger on the I/O bit's positive edge. You'll see the bit go high, a clear space on the screen, and then a lot of hash before it goes low. The clear space is the minimum time. The interval from the positive edge to the end of the hash is the maximum execution time.
I often want to know the total amount of time spent servicing interrupts. Heavy interrupt loads are a sure sign of a system which will be a bear to debug and maintain. Drop the RT_assert pair into every ISR. That is, every ISR toggles a common bit. If the scope trace stays mostly low, there's little loading. Mostly high, and kiss your family goodbye for the duration of the project.
We developers are being shortchanged by our tools; the continued disappearance of the in-circuit emulator gives us ever less insight into our projects' time domains. But we're lucky in that most high integration CPUs include plenty of timers. Even simple systems today use RTOSes sequenced by interrupts from a timer. If you're not using an RTOS, do program one of these peripherals to track time for the sake of performance measurements.
Create a simple RT_assert pair to compute elapsed time over a function or region of code. RT_enter_assert() saves the clock; RT_exit_assert() figures time, in ticks or microseconds. In a preemptive system it's best to pass an argument with a unique identifier for each code snippet being measured. This is then an index into a data structure, so that multiple instances of the assert don't clobber each other's data.
In a real system, logging every instantiation of the code chunk would generate a flood of data. Instead, have RT_exit_assert() compute the max execution time. Or, if needed, figure min and max.
Examine the structure with a debugger to get a quick code profile.
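Here's one plausible shape for the pair, assuming a free-running hardware timer read by a read_timer() function (sketched below) and a small table indexed by the snippet identifier:

#define RT_MAX_IDS 16

struct rt_stats {
    unsigned long start;       /* timer value at entry */
    unsigned long max_ticks;   /* worst-case elapsed time seen so far */
};
static struct rt_stats rt_stats[RT_MAX_IDS];

#define RT_enter_assert(id) (rt_stats[id].start = read_timer())

#define RT_exit_assert(id)                                          \
    do {                                                            \
        unsigned long elapsed = read_timer() - rt_stats[id].start;  \
        if (elapsed > rt_stats[id].max_ticks)                       \
            rt_stats[id].max_ticks = elapsed;                       \
    } while (0)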
How much accuracy do you need? If the timer is used for context switching, no doubt it only generates an interrupt every handful of milliseconds. Yet the hardware actually counts CPU clocks, or CPU clocks divided by some very small number. The RT_assert macros can read OS-maintained time (which counts interrupts) and the timer's hardware. Resolution to a few microseconds is possible.
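One way to build the read_timer() assumed above, with hypothetical names throughout (os_ticks for the RTOS tick counter, TIMER_COUNT for the hardware count register, TIMER_PERIOD for the counts per tick interrupt, and a count-up timer assumed):

extern volatile unsigned long os_ticks;   /* hypothetical RTOS tick counter */
#define TIMER_COUNT  (*(volatile unsigned short *)0x40002004)  /* hypothetical */
#define TIMER_PERIOD 10000UL              /* hardware counts per tick interrupt */

unsigned long read_timer(void)
{
    unsigned long ticks, count;
    do {                    /* retry if a tick interrupt splits the two reads */
        ticks = os_ticks;
        count = TIMER_COUNT;
    } while (ticks != os_ticks);
    return ticks * TIMER_PERIOD + count;
}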
To "tare" the readings, compensating for the overhead contributed by the RT_asserts, issue the two macros back to back. Subtract the tare value to increase precision.
Multitasking can skew the results. Preempt the code in the middle of a profiling operation and the measurement includes the execution time of the preempter. Most of the time that's fine: just as we want to include the overhead contributed by the compiler's runtime routines, we usually want to know elapsed time, which includes preemption. Alternatively, disable interrupts during the measurement, though of course that may greatly affect system operation.
Some developers advocate using macros to send unique codes, a start/stop pair per function, to a parallel port. A logic analyzer then acquires the data. This is more trouble than it's worth. The analyzer quickly accumulates thousands or millions of data points; sometimes much more than it can store. It's impossible to make sense of the information without writing a computer program to empty the tool's buffer and convert raw codes and time into meaningful info.
Worse, the analyzer collects data only during the debugging phase, and only then when you're specifically taking performance measurements. I believe in proactive debugging - seed the code and the execution environment with tools and tricks to log data and flag errors all the time.
And that's where the idea of "real time asserts" shines. Expand RT_exit_assert() just a bit. Pass a parameter that indicates the maximum allowed time. The macro computes the actual period and flags an error if it exceeds what we expected. Maybe you leave this in only during debugging to grab your attention when the system snarls. Perhaps it's deployed with the production release to take some action when insane bottlenecks occur.
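Extending the earlier sketch, the deadline check might look like this, with rt_deadline_error() a hypothetical handler that logs the overrun or drops into a safe mode:

#define RT_exit_assert(id, max_allowed)                             \
    do {                                                            \
        unsigned long elapsed = read_timer() - rt_stats[id].start;  \
        if (elapsed > rt_stats[id].max_ticks)                       \
            rt_stats[id].max_ticks = elapsed;                       \
        if (elapsed > (max_allowed))                                \
            rt_deadline_error(id, elapsed);   /* hypothetical */    \
    } while (0)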
After all, missing a deadline is as much of a bug in many real time systems as is a complete system crash. If we use asserts to capture errors in the procedural domain, we should do no less in the time domain.