|
||||||
You may redistribute this newsletter for non-commercial purposes. For commercial use contact jack@ganssle.com. To subscribe or unsubscribe go here or drop Jack an email. |
||||||
Contents | ||||||
Editor's Notes | ||||||
Tip for sending me email: My email filters are super aggressive and I no longer look at the spam mailbox. If you include the phrase "embedded muse" in the subject line your email will wend its weighty way to me. Jack's latest blog: Non Compos Mentis |
||||||
Quotes and Thoughts | ||||||
Excessive or irrational schedules are probably the single most destructive influence in all of software. Capers Jones |
||||||
Tools and Tips | ||||||
Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past. In response to my review of SourceMonitor Sergio Caprile wrote:
|
||||||
Freebies and Discounts | ||||||
This month's giveaway is a 0-30 volt 5A lab power supply. Enter via this link. |
||||||
Testing for Unexpected Errors | ||||||
A reader posed an interesting question: in a safety-critical system, how do you test for unexpected errors?
Some errors are expected. A sensor going out of range, for instance. In the case of the 737 MAX crashes the angle-of-attack sensor went to an insane value in a second or less, which caused the code to do bad things. That is a testable condition. Of the hundreds of "Failures of the Week" that have run in the Muse, many are due to a crazy input for which the code provided no mitigation. That's sloppy engineering.
But other errors are impossible to anticipate. Cosmic rays can cause random bit flips. If the program counter is corrupted there's no predicting what will happen. A power supply brown-out will cause, well, pretty much any kind of unpredictable behavior. Dereference a null pointer and all sorts of craziness can happen.
(Note: Many software errors can be handled. Design by Contract stems from the Eiffel programming language, which pretty much nobody uses, but it is also available in Ada and SPARK. It essentially performs runtime checks to ensure that values going into, and returned from, functions meet certain rules. It's one of the most valuable ways to ensure safe code. Alas, too few of us use DbC. One wonders if neglecting such a powerful tool is engineering malpractice.)
The reader's question is important, but starting with testing gets things a little backwards. First, design a robust system.
We're in a golden age of microcontrollers where huge amounts of capability are available for little cost. Virtually all MCUs now have watchdog timers, the first line of defense against unexpected problems. Some offer window watchdogs, which are a bit more of a pain to use but are better than your average WDT.
I had an epiphany when Intel introduced the 386 decades ago. Everyone hated that the 8086 used segment registers to extend a 16-bit address space to 20 bits. When the 386 appeared developers went mad. Instead of four segments, now there could be thousands.
But that seemed brilliant to me. With the 386 one could build a system using an OS where each task lived in its own hardware-protected address space. If a task did something zany the memory management unit (MMU) would throw an exception.
A third of a century later we embedded people are still mostly stuck using 1980s hardware designs. Few MCUs have an MMU. Transistors are so cheap that it makes no sense not to add such a powerful crash-mitigation asset to a processor.
In recent years Arm's memory protection unit (MPU) has made it into some (not enough) Cortex MCUs. The MPU is a poor person's MMU. It offers a handful of independent protected memory regions. Despite its limited capabilities I feel it's a great asset that every system needing robustness should use. Some RTOSes are MPU-aware, which greatly simplifies its use.
A few vendors offer lockstep MCUs, where two identical processors execute the same code simultaneously. Any difference in behavior throws an exception. TI is one such vendor, as is Freescale (or Motorola or NXP or whatever their name is now).
While good architecture is critical, testing does remain important. Back in the mainframe days we tested Fortran compilers by feeding a random deck of punched cards into the tool. It's amazing how often crashes occurred, but this did lead to incremental improvements in the compilers. (The University of Maryland's Ralph compiler would abort after 50 compile-time errors and print out a picture of Alfred E. Neuman, with the caption "This man never worries, but from the look of your code, you should.")
Clearly, we want to test for every possible input. But in truth such testing is liable to be superficial at best. With three 12-bit ADCs there are 2^36, or nearly 69 billion, possible input combinations. A dozen GPIOs means 4,096 tests if one wants to check every possible condition. 100% testing is somewhere between intractable and impossible.
One common issue is a problem in booting. There used to be a tool called the Poc-It which cycled power to a system.
The target would assert a signal meaning "good boot" when it came up; the Poc-It monitored that and logged errors. Some users coupled it to a Variac to ramp the mains power through a range of values, simulating different international mains voltages and brown-outs. Alas, the Poc-It is no longer available, but it would not be hard to cobble up such a tool.
Exception handlers are exceptionally difficult to test. Sure, one can simulate a divide by zero and see that an interrupt occurs. But how do you simulate that error at every possible divide in the code? And how do you ensure that the system responds safely?
Anticipating and handling errors is one of the most difficult problems we face. What is your approach? |
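The Design by Contract note above describes runtime checks on the values going into, and coming out of, a function. C has no built-in contracts, so what follows is only a rough sketch of the idea. The macro names REQUIRE and ENSURE, the helper contract_check(), the function adc_to_millivolts(), and the 12-bit converter with a 3.3 V reference are all assumptions made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical contract macros. On a violation they report and stop.
   A real product would more likely log the fault and enter a safe state. */
#define REQUIRE(cond) contract_check((cond), "precondition", #cond, __LINE__)
#define ENSURE(cond)  contract_check((cond), "postcondition", #cond, __LINE__)

static void contract_check(int ok, const char *kind, const char *expr, int line)
{
    if (!ok) {
        fprintf(stderr, "%s failed at line %d: %s\n", kind, line, expr);
        abort();
    }
}

#define ADC_MAX_COUNTS 4095u   /* assumed 12-bit converter */
#define VREF_MV        3300u   /* assumed 3.3 V reference  */

/* Convert raw ADC counts to millivolts, checking the contract on both sides. */
uint32_t adc_to_millivolts(uint32_t counts)
{
    /* Precondition: the caller must pass a value the hardware can produce. */
    REQUIRE(counts <= ADC_MAX_COUNTS);

    uint32_t mv = (counts * VREF_MV) / ADC_MAX_COUNTS;

    /* Postcondition: the result can never exceed the reference voltage. */
    ENSURE(mv <= VREF_MV);
    return mv;
}
```

Calling adc_to_millivolts(5000) would trip the precondition instead of quietly returning a voltage above the reference. Whether to halt, reset, or degrade gracefully at that point is a system-level decision this sketch does not try to make.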
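The expected errors mentioned above, a sensor out of range or one that jumps to an insane value within a second, are the kind a simple plausibility check can catch. The sketch below is one possible shape for such a check, not a description of any real avionics code. The type sensor_filter_t, both function names, and the limits a caller would supply are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plausibility filter for a single sensor channel. */
typedef struct {
    int32_t min;        /* lowest physically plausible reading  */
    int32_t max;        /* highest physically plausible reading */
    int32_t max_step;   /* largest plausible change per sample  */
    int32_t last_good;  /* most recent accepted reading         */
    bool    have_last;  /* false until one reading is accepted  */
} sensor_filter_t;

void sensor_filter_init(sensor_filter_t *f, int32_t min, int32_t max,
                        int32_t max_step)
{
    f->min = min;
    f->max = max;
    f->max_step = max_step;
    f->last_good = 0;
    f->have_last = false;
}

/* Returns true and remembers the reading if it is plausible.
   Returns false and leaves the state unchanged otherwise. */
bool sensor_filter_accept(sensor_filter_t *f, int32_t reading)
{
    /* Range check: reject values the sensor cannot physically report. */
    if (reading < f->min || reading > f->max) {
        return false;
    }

    /* Rate check: reject jumps larger than the physics allows. */
    if (f->have_last) {
        int64_t step = (int64_t)reading - (int64_t)f->last_good;
        if (step < 0) {
            step = -step;
        }
        if (step > (int64_t)f->max_step) {
            return false;
        }
    }

    f->last_good = reading;
    f->have_last = true;
    return true;
}
```

A rejected reading is only half the story. The caller still has to decide what a run of rejections means, for example declaring the sensor failed and falling back to a redundant one, since this filter will keep rejecting a value that has genuinely moved.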
||||||
Design For Debugging Redux | ||||||
Responding to last issue's thoughts about designing for debugging, Stephen Morris-Jones wrote:
Jerry Mulchin also had some thoughts:
|
||||||
Rewriting Code | ||||||
And Steve Wheeler had some stories about rewriting others' code:
|
||||||
Failure of the Week | ||||||
In honor of the New England blizzard last week, Marinna Martini sent the following. I expect this is not a failure per se; no doubt bulbs are burned out or there was operator error. Still, it's priceless: Josh Weeks sent this: Have you submitted a Failure of the Week? I'm getting a ton of these and yours was added to the queue. |
||||||
Jobs! | ||||||
Let me know if you’re hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad. |
||||||
Joke For The Week | ||||||
These jokes are archived here. From Rick Ilowite: Q: Would you like some bouillabaisse 2? |
||||||
About The Embedded Muse | ||||||
The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com. The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster. can take now to improve firmware quality and decrease development time. |