Built-in Debuggers
More and more processors have built-in debugging resources. Here's a look at what features they offer.
Published in Embedded Systems Programming, August, 1993.
By Jack Ganssle
Take pity on the poor embedded programmer. Too many of us "save money" by relying on only the crudest of tools, but even those with the largest budgets and the best intentions are often forced into the same trap when using a leading-edge part that beats the tools to market.
Finally our wails of anguish are being heard by the chip makers. We're starting to see on-board debugging resources on quite a few new CPUs. It's hard to dedicate processor pins to debugging, as they contribute nothing to the end-product. However, development is such a huge part of the cost of most embedded products that there is often little choice but to add recurring costs to mitigate NRE.
Internal Registers
Intel addressed debugging problems early on with the 386 microprocessor. Evidently they recognized that the speed of the processor was such that traditional debuggers would be prohibitively expensive. It's hard to push electrons through a cable at 33 MHz (or at 66 MHz for the 486... soon to be 99 MHz if IBM's rumored clock-tripler part comes out).
The 386/486 has a very complex addressing mechanism. Logical addresses get transformed to linear addresses via a wondrously sophisticated segmentation system. A paging unit then translates linear addresses to physical ones. As a result, it's all but impossible to know what the program is doing simply by looking at the processor's pins with, say, a logic analyzer. Physical address 10145A0 (on the pins) could correspond to any of thousands of addresses generated by the program, depending on the settings of the CS selector, the corresponding descriptor, and the paging setup.
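To make the point concrete, here's a toy model of the two-stage translation in C. The descriptor base and page mapping are invented numbers, not anything a real setup would use; the point is simply that the address on the pins is two table lookups removed from the address in the program.

    /* Toy model of 386 address translation. All values are invented. */
    typedef unsigned long u32;

    static u32 cs_base = 0x00400000UL;   /* base field of the CS descriptor */

    u32 to_linear(u32 offset)            /* segmentation: logical -> linear */
    {
        return cs_base + offset;
    }

    u32 to_physical(u32 lin)             /* paging: linear -> physical */
    {
        /* pretend the page tables map this linear page to physical
           page 01014; only the low 12 bits pass through untouched */
        return 0x01014000UL | (lin & 0x00000FFFUL);
    }

Run offset 125A0 through this setup and to_physical(to_linear(0x125A0)) puts 10145A0 on the pins; change cs_base or the page mapping and a completely different logical address generates the same bus cycle.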
I have to give Intel credit. They dedicated a substantial number of transistors on the part to debugging back when transistors were still relatively expensive. This foresight has paid off for a generation of developers (hey - I figure a generation lasts about 5 years in this business), as nearly all debuggers, from Turbo-Debugger to many of the hardware tools, make use of these debugging resources to set breakpoints.
The 386/486 implements four hardware breakpoints using six internal registers. Four of the registers simply hold the break address, which is a linear address - the post-segmentation, but pre-paging, address generated by the program.
One register controls the mode of each of the four breakpoints. Intel went to extremes to make these useful debugging resources: each can be an instruction breakpoint or a data breakpoint. For example, you could set one to break on a data write to a specific address, another to work on instruction fetches only, and a third to break on any data read or write.
The sixth register contains status information so the debug exception handler can determine the source of the breakpoint.
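Here's a rough sketch of how a debugger might arm these registers to catch a one-byte data write. The DR7 bit encodings are from Intel's programmer's reference; the write_dr0(), write_dr7(), and read_dr6() routines are hypothetical stand-ins for the privileged MOV-to/from-debug-register instructions a real debugger would code in ring-0 assembly.

    /* Sketch: arm breakpoint 0 as a one-byte data-write break.
       The wrapper routines are assumed, not part of any real library. */
    typedef unsigned long u32;

    #define DR7_L0        0x00000001UL  /* local enable for breakpoint 0   */
    #define DR7_RW0_WRITE 0x00010000UL  /* bits 17:16 = 01: break on write */
    #define DR7_LEN0_1    0x00000000UL  /* bits 19:18 = 00: 1-byte span    */

    extern void write_dr0(u32 linear_addr);  /* MOV DR0, reg (ring 0 only) */
    extern void write_dr7(u32 control);      /* MOV DR7, reg               */
    extern u32  read_dr6(void);              /* MOV reg, DR6               */

    void arm_write_break(u32 linear_addr)
    {
        write_dr0(linear_addr);  /* comparator watches this linear address */
        write_dr7(DR7_L0 | DR7_RW0_WRITE | DR7_LEN0_1);
    }

    int which_break(void)        /* called from the INT 1 handler */
    {
        return (int)(read_dr6() & 0x0FUL);  /* bits 3:0 flag breakpoints 0-3 */
    }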
Since the breakpoints are handled as hardware comparators, they will work in code that resides in ROM or in RAM, an important benefit for debugging embedded systems.
I have yet to see if the Pentium includes any sort of enhanced debugging capability beyond the 386-type debug registers. Presumably its superscalar architecture will present yet another range of complexity in tracking down bugs.
Background Mode
Motorola has been very innovative in their approach to both processor technologies and on-board debugging tools. The 683xx family is a series of processors mostly based on the 68020 core. Each part offers a tuned I/O mix. Ideally, the family will be so large that you'll be able to buy exactly the processor you need. I suspect the family will become as pervasive as the 8051 and its 50 or so variants.
Motorola implements the family as a core CPU and numerous standard I/O modules - timers, DMA, and the like. Each module is on their CAD system. It's easy to design a new microprocessor by using the Betty Crocker method of extracting standard stuff from the library, shaking it up, and letting the CAD system generate photomasks. They tell me that one part took but a single day to design... and was correct in its first silicon release.
Having a huge family of slightly different parts is both a blessing and a curse. Again, look at the 8051 family for comparison. Sure, any part you'll need is probably there, but each time you change CPUs you'll have to buy, at the very least, a new pod for your ICE. It's hard to use leading edge components when the tools may lag by months.
The solution - an on-board debugger that is standard across the entire family (it even carries over into the 68HC16 family). Each processor dedicates three wires to a serial interface for debugging purposes. The entire port is called the CPU's Background Debug Mode (BDM).
Given some simple hardware to connect the serial lines to a PC, you can establish a communications path to the processor that bypasses all of its normal operation. The CPU will process a wide range of commands sent over this port, all without altering the processor's status - the registers, PC, and the like stay intact unless you explicitly issue a command to modify them.
The command set resembles that of a ROM monitor. You can read and write memory and registers, start a program executing, and issue resets.
Normally, the BDM is disabled. You'd hate to have your embedded system toggle to a debugging state in the field, when no debugger is connected! A special reset sequence enables background mode, essentially turning on the serial port and altering the function of the Background (BGND) instruction.
BGND is otherwise an illegal instruction. If BDM is enabled, BGND stops execution of the program and throws the CPU into background mode, where it services the serial commands. What could be better than this for a breakpoint? The BDM expects you to substitute a BGND instruction for the instruction you'd like to break on. This does imply that you cannot break on data accesses or on instructions in ROM unless substantial extra hardware is added.
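A sketch of how a host-side debugger might plant such a breakpoint appears below. The $4AFA opcode for BGND is from Motorola's CPU32 documentation; bdm_read_word() and bdm_write_word() are hypothetical stand-ins for whatever routine drives the three-wire serial protocol.

    /* Sketch: a BDM software breakpoint via BGND substitution.
       The BDM transport routines are assumed, not a real API. */
    typedef unsigned short u16;
    typedef unsigned long  u32;

    #define BGND_OPCODE 0x4AFAu   /* BGND on the CPU32 core */

    extern u16  bdm_read_word(u32 addr);          /* BDM memory-read command  */
    extern void bdm_write_word(u32 addr, u16 w);  /* BDM memory-write command */

    static u16 saved_opcode;

    void plant_breakpoint(u32 addr)          /* addr must be in RAM */
    {
        saved_opcode = bdm_read_word(addr);  /* save the real instruction  */
        bdm_write_word(addr, BGND_OPCODE);   /* CPU halts here, enters BDM */
    }

    void remove_breakpoint(u32 addr)
    {
        bdm_write_word(addr, saved_opcode);  /* restore before resuming */
    }

Since the scheme rewrites program memory, it only works when the code lives in RAM - exactly the limitation just mentioned.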
The CPUs also have a breakpoint input, which drives the processor into background mode when BDM is enabled. This is essential for stopping a runaway program or adding more sophisticated external breakpoint hardware.
Numerous suppliers make debuggers that connect the CPU's BDM pins to a PC's parallel or serial ports. A single BDM debugger will work with any of the Motorola processors with this resource. If you use these processors, be sure to include the Motorola-standard Berg connector in your hardware, making the BDM port available to a commercial BDM debugger no matter what your debugging plans are. Unfortunately, Motorola defined two different "standard" connections, one using a 10-pin connector and the other an 8-pin version, with quite different pinouts. The 10-pin connector offers a bit more control of the target hardware, so it is probably the preferred connection.
Since most vendors provide a source debugger with their BDM tools, C and assembly are both viable prospects for BDM debugging. However, BDM debuggers make the most sense when total code size is relatively small and real-time constraints are minimal. I'd be cautious about relying on a simple BDM in any interrupt-intensive application, since more powerful full-scale emulators (or, at the very least, logic analyzers) are essential for tracking these asynchronous events.
Embedded Systems Technology is the exception to this rule; they make an optional trace board which, while not cheap, cleverly communicates with the BDM through an unused register in the processor. As always, compare prices and features to get the tool that suits your needs and budget.
SMT
It seems the embedded world is stampeding to surface-mount technology (SMT). In the good old days each IC, resistor, and capacitor had long leads that fit through holes in the circuit board, providing a solid mechanical connection prior to soldering. SMT parts solder directly to the face of the board, tenaciously holding on by virtue of the solder alone. The benefit of this technology is reduced size: SMT components are tiny... so small they're hard for these caffeine-shaky hands to manage. In addition, since no holes are needed to mount the parts, clever designers can smear both sides of a board with them, further reducing the size of the system.
Surface mounted CPUs create all sorts of new challenges for debugging. Most have leads on all four sides of their small, squarish packages. Sometimes the "pitch" of these leads (their spacing) is a paltry .020 inch.
Traditional emulation techniques just don't work well in this environment. You cannot simply unplug the CPU and cram an emulator's pod in - the processor is soldered directly to the board. One option is to dedicate one prototype system to development, and install a special conversion device in place of the CPU. Emulation Technology, EDI, and others make these adapters which solder to the processor's footprint and provide a socket for an appropriate emulator. Be aware, though, that adapters cost $500 to $1000, and are wispy, delicate parts that require a magician's hand to solder in place. Don't try this at home, kids!
If the processor is soldered in place, why not design an emulator whose pod clips over the entire chip? That is, use a sort of inverted female socket on the pod and snap it onto the CPU, providing an electrical connection to each of the processor's pins.
Emulators work by taking massive control of all processor functions. They must be able to run short segments of emulator code on the target microprocessor, which means the CPU must be isolated from the target system by a buffer, so the emulator's code doesn't spuriously affect target I/O and memory. Since there is no physical way to place a buffer between the surface-mounted CPU and its target resources, the emulator must somehow disable the target CPU, replacing all of its functionality with a processor inside the emulator itself.
Most of the microprocessor vendors recognized this, and provide some method of tri-stating the target chip. The part is driven to an inactive state, where all of its pins are non-functional. In effect, the processor on the target system becomes a dead hunk of plastic that is completely replaced by the emulator's own CPU.
Zilog's Z182, for example, is a 100-pin quad flat pack (QFP) device based on the Z180 core. Two pins are dedicated to selecting a debug mode. Usually your system leaves these pins open and the processor enters normal operation on power-up. If an emulator is connected, it drives the pins into one of two debugging modes.
Mode 1 forces almost all of the Z182's pins to a tri-state condition. A Z180 emulator, with a special adapter, clips over the Z182 in the target and provides all of the address, data, and other signals to the target. Only a few lines stay active - those related to peripherals inside the Z182 that the Z180 does not have. So the Z182 stays semi-active: its core processor is disabled, and the internal I/O that is identical to that on a Z180 is disabled. Just the new Z182 superset I/O is alive, intercepting I/O commands sent to the processor's pins via the clip-on plug.
This is a nice approach, since dozens of vendors sell Z180 tools. Creating a new emulator for the Z182 would be prohibitively expensive, as illustrated by the chip's Mode 2, which tri-states everything on the part, including all of the I/O. No vendor supports debugging in this mode today.
Intel uses a similar approach on their 80186EC microprocessor, a surface-mounted variant of the popular 186 family. It too is a 100-pin QFP device. Instead of dedicating pins to debugging, Intel elected to share an address line (A19) with the emulation-mode selection. Grounding A19 during reset drives the part into a tri-state condition Intel calls ONCE mode (apparently pronounced "AHNCE").
Though the 186EC is a lot like other members of the 186 family, it is sufficiently different that you cannot make an adapter to convert, say, a 186 pod to the 186EC. A new pod is needed (at the very least). Thus, unlike the Z182, ONCE mode tri-states everything - the part is just an expensive piece of plastic during debugging, with the emulator's CPU assuming all processor and I/O functions.
The Future
The driving force behind electronics is an implicit guarantee that the cost of silicon always follows a downward spiral. Transistors are cheap; so cheap, it seems chip vendors have a hard time deciding what to do with them. It's clear that a percentage of the transistor budget on many new microprocessors will be dedicated to on-chip debugging resources, to make the parts truly usable by developers.
One technology that has been lurking for a number of years is boundary scan. Boundary scan is an IEEE standard (IEEE 1149.1) that defines a way to design chips for in-circuit testability. Its thrust is towards the production test and repair end of the business.
A chip designed to the IEEE standard will include four pins that implement a serial link for communications to an outside test device. Typically, a number of chips, all implementing boundary scan, will be daisy-chained together so the tester can send commands to any part on a circuit board.
ICs with boundary scan capabilities can sense the signals on each pin, so the tester can completely probe the board purely by sending serial commands between chips.
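At the electrical level the link is simple enough to bit-bang. The sketch below shifts a stream of bits through the chain; the four pin routines are hypothetical wrappers around whatever port the tester uses, and the code assumes the TAP state machine has already been walked into its shift state.

    /* Sketch: shifting bits through an IEEE 1149.1 chain. The pin
       routines are assumed; TMS is held low to stay in the shift state. */
    extern void set_tms(int level);
    extern void set_tdi(int level);
    extern int  get_tdo(void);
    extern void pulse_tck(void);   /* one full clock on TCK */

    unsigned long shift_bits(unsigned long out, int nbits)
    {
        unsigned long in = 0;
        int i;

        set_tms(0);                              /* remain in the shift state */
        for (i = 0; i < nbits; i++) {
            set_tdi((int)((out >> i) & 1));      /* present the next bit      */
            in |= (unsigned long)get_tdo() << i; /* capture the chain's reply */
            pulse_tck();                         /* parts sample TDI on TCK's
                                                    rising edge               */
        }
        return in;
    }

Every device in the daisy chain sits in the same shift path, so one stream of bits can carry a command past the other parts to the chip under test.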
I've heard rumors that some vendors are exploring expanding the technology to include debugging assets, somewhat like the breakpoint registers on the 386. After all, serial pins are already dedicated to test functions; it makes sense to add debug logic, perhaps implemented somewhat like Motorola's Background Mode. Then, the production test logic can do double duty as a software development platform.
Corelis Inc. (Cerritos, CA, (310) 926-6727) just announced a boundary scan-based development tool for the AM29200 and 29030. It's cheap; it lets you view target resources like memory and I/O, and it supports software breakpoints. I see this as an interesting alternative to the extremely high-priced development tools used for fast 32-bit CPUs.
Boundary scan offers promise for the future, but it will never offer a complete solution to the debugging process. Programmer time is expensive. Tools that improve productivity are therefore, by definition, cheap. Some resources, like real time trace and performance analysis, offer lots of benefits to the developer, but are far too complex to ever put in the silicon itself. However, built-in debugging hardware does bring at least a minimal development system to a huge audience, and simplifies high-powered tools.