The Embedded Muse
Issue Number 500, October 21, 2024
Copyright 2024 The Ganssle Group

Editor: Jack Ganssle, jack@ganssle.com

Contents

  • Editor's Notes
  • Quotes and Thoughts
  • What We Must Learn From the 737 Max
  • Some Advice
  • A Look Back
  • Final Failure of the Week
  • The Last Joke For The Week

Editor's Notes


This is the last Embedded Muse

After 500 issues and 27 years it's time to wrap up the Embedded Muse. It's said there's some wisdom in knowing when it is time to move on, and for me that time is now. I mostly retired a year or two ago, and will now change that "mostly" to "completely". To misquote Douglas MacArthur, who gave his farewell address to Congress at the same age I am now, like an old soldier, it's time for me to pack up my oscilloscope and fade away.

Thanks to all of you loyal readers for your comments, insight, corrections and feedback over the years. The best part of running the Muse has been the dialog with you. I've learned much from our interactions, and greatly enjoyed the give and take.

And thanks to the advertisers, particularly SEGGER and Percepio, who made this newsletter possible.

Back issues of the Muse remain available here.

This last Muse has three articles:

  • The first covers the failure of the 737 Max, because there is a critically important lesson there for all firmware developers. As our systems keep getting more complex, naively relying on sensor data is dangerous. Though a lot of articles have discussed the Max failures, none that I know of address what I consider a fundamental conceptual problem that is too common in this industry.
  • Then I pass on some advice; tidbits garnered from a half century working on embedded systems.
  • Finally, there's a short piece on how much this field has changed. The takeaway is: what fun we've had!
Quotes and Thoughts

“Program testing can at best show the presence of errors, but never their absence.” Edsger Dijkstra

What We Must Learn From the 737 Max

Air disasters are tragic, but are also, if we're wise, a source of important lessons. The twin crashes of the 737 Max in 2018 and 2019 offer stark warnings for all embedded developers. In my opinion the code - like so much other embedded software - mishandled the sensors. Do think about this and feed the lessons into your work.

The short version: The code read the angle of attack sensor, which had failed, and produced a result that was bogus. The computer assumed the data was correct and initiated actions that resulted in failure.

Don't believe inputs. Hold them to scrutiny against other data and the known science.

There's plenty to pick apart in these crashes, but my focus is on how the code handled input data. The angle-of-attack (AoA) sensor measures the angle between the wing and the oncoming air - roughly, how far the nose is pitched away from level flight. The 737 Max has two of these sensors; why only one was used is a good question, but one beyond this discussion.

The data discussed below comes from a figure adapted from the preliminary report of the Ethiopian Airlines accident, which plots a handful of the flight data recorder's channels. Interestingly, that recorder captured 1790 channels of data.

First fail: The left AoA sensor went from about zero degrees (level flight) to 75 degrees (apparently the maximum that the sensor can report) in one sampling interval (one second). That is, the sensor reported the aircraft pitched up to nearly vertical flight in a second or less.

The software should have looked at the AoA sensor rate of change and said "75 degrees in a second or less in a hundred ton aircraft? Unlikely." The very physics involved should have signaled a warning that the data was no good.
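
In code, such a check can be a few lines. Here's a minimal sketch, assuming the 1 Hz sample rate from the flight data; the 5-degree-per-sample limit and function names are mine for illustration, not real flight-dynamics numbers:

    #include <math.h>
    #include <stdbool.h>

    #define AOA_MAX_DELTA_DEG 5.0f    /* illustrative limit, not from the 737 */

    static float prev_aoa_deg;        /* seed with the first trusted reading */

    /* Reject any sample that implies an impossible rate of change. */
    bool aoa_rate_plausible(float aoa_deg)
    {
        bool ok = fabsf(aoa_deg - prev_aoa_deg) <= AOA_MAX_DELTA_DEG;
        if (ok)
            prev_aoa_deg = aoa_deg;   /* track only readings we believe */
        return ok;
    }

A zero-to-75-degree jump fails this test instantly, flagging the sensor before its data drives any control decision.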

Second fail: Two channels show the altitude virtually unchanged. Aim an aircraft up, and it will either rapidly ascend, or stall and fall out of the sky. The altitude data shows neither of these conditions.

The code should have compared the strange AoA indication to the change in altitude. Again, the physics was clear, and other sensors gave the lie to the AoA.

Third fail: Three other channels show the airspeed remaining constant or increasing slightly - no real airspeed change at all. Rotate the aircraft to an almost vertical inclination and the airspeed will plummet. Unless you're flying an X-15.

Why didn't the system compare the almost certainly bogus AoA data (given its improbably rapid change) to any of the three airspeed data streams?

The software had lots of inputs that suggested the AoA sensor wasn't working properly. Alas, it ignored that data and believed the failed sensor.

I believe we all need to think about this. When you're reading data, ensure it makes sense - perhaps by comparing a reading against what the physics, or simple common sense, permits. A thermometer outside a bank should never read 300 degrees. Or -100. An AoA that reads 75 degrees on a commercial airplane is most likely in error. At the very least the code should say "that's strange; let's see if it makes sense."

Or, compare to previous readings and reject outliers. If the input stream is 24, 23, 25, 26, 22, 24, 25, 90 I'd be mighty suspicious of that last number. The 737 Max's AoA sensor was basically giving 0, 0, 0, 0, 0, 0, 0, 75.

Sometimes we average an input stream. That's generally a bad idea, as a single out-of-range reading can skew the average. Reject readings outside some limit before folding them into the average. That can be a simple comparison against a mean, or a comparison to a standard deviation, or, as shown in Muse 495, something much more robust.
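
Here's a sketch of the idea, folding outlier rejection into a simple exponential moving average; the rejection band and smoothing factor are illustrative:

    #include <math.h>
    #include <stdbool.h>

    #define REJECT_BAND 10.0f         /* illustrative threshold */

    static float avg;
    static bool  avg_valid;

    /* Returns false (and ignores the sample) if it's an outlier. */
    bool accumulate(float reading)
    {
        if (!avg_valid) {             /* first sample seeds the average */
            avg = reading;
            avg_valid = true;
            return true;
        }
        if (fabsf(reading - avg) > REJECT_BAND)
            return false;             /* the 90 after 24, 23, 25... is tossed */
        avg += 0.1f * (reading - avg);  /* exponential moving average */
        return true;
    }

Feed it 24, 23, 25, 26, 22, 24, 25, 90 and the 90 never touches the average.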

Make use of other information that might be present; in the case of the Max many other channels were saying "we're flying straight and level."
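
Such a cross-check might look like the following sketch; the structure, thresholds, and units are invented for illustration - real limits would come from aerodynamic analysis:

    #include <math.h>
    #include <stdbool.h>

    typedef struct {
        float aoa_deg;            /* reported angle of attack */
        float climb_ft_per_s;     /* rate of climb from the altimeter */
        float airspeed_delta_kt;  /* airspeed change this interval */
    } flight_sample_t;

    /* An extreme AoA must show up in other channels: climbing hard,
     * or bleeding airspeed in a stall.  If neither moved, distrust it. */
    bool aoa_consistent(const flight_sample_t *s)
    {
        if (fabsf(s->aoa_deg) < 20.0f)
            return true;          /* nothing extreme claimed */
        return fabsf(s->climb_ft_per_s) > 50.0f ||
               fabsf(s->airspeed_delta_kt) > 20.0f;
    }

With a 75-degree AoA and flat altitude and airspeed traces, this returns false - exactly the contradiction the Max's channels showed.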

CPU cycles are plentiful. Use them to build robust and reliable systems. Never assume an input will be correct.

Some Advice

A few thoughts from an old embedded guy:

  • Design first. Check the design against requirements (such as they are). Only then start writing code. If you think good design is expensive, well, consider the cost of bad design.
  • Don't conflate R&D. Research is figuring out things you don't know. Development is creating the product.
    • Along the same lines, if the science behind a project is unknown or iffy, codify it before starting development. Do the R (research) to nail it down. Not doing so is one of the leading causes of project failure.
  • Don't trust vendor-supplied code. If you must use it, subject it to withering critique. If it smells like crap, it probably is.
  • Don't tailgate. It's pointless, dangerous, and incredibly rude.
  • Be introspective and deeply question your actions, rather than rationalizing them. Be open to change.
  • The goal of test is to prove correctness, not to find bugs. If you study the quality movement you'll see this precept has revolutionized manufacturing. Alas, it hasn't seeped into software engineering.
  • Do study the quality movement. Apply its ideas to your engineering.
  • Don't be afraid of quantum computing even though many of us will not understand it. If it works, if it is useful for embedded problems, if it can scale outside of cryogenics, there will be an API.
  • Design as much margin into your hardware and software as you can. You'll never regret having buffers against the slings and arrows of outrageous fortune.
  • Asynchronous digital design has its place, but be careful about timing problems, runt pulses, and the like.
  • Save money. With a decent bank balance you have options.
  • Be wary of conventional wisdom. For example: don't pick a pullup value because everyone else does; have an engineering reason. If you debounce for 10ms - why?
  • Perhaps the greatest contribution the agile movement has made is to highlight the difficulty of eliciting requirements. For any reasonably-sized project it's extremely hard to discern all aspects of a system's features. But that's no excuse for abdicating the process: if we don't know what the system is supposed to do, we won't deliver something the customer wants. The job being hard is no excuse for skipping it.
  • The assert() macro is your friend (see the sketch following this list).
  • Be wary of AI. It is going to change pretty much everything in ways we can't imagine. It will bring great capabilities along with difficult challenges, some of which may be beyond practical solutions.
  • Great composers expose themselves to lots of great music. Great writers read a lot of great books. Great programmers should read a lot of great code.
  • Study electronics. There is no such thing as digital anymore; ones and zeroes are impossibly close, electromagnetic issues plague many systems, and even naive oscilloscope use deceives.
  • Be kind.
  • Don't write optimistic code. Bad stuff happens. Inputs will go to crazy/impossible values. Your calculation might overflow. Maybe a cosmic ray will flip a GPIO. Assume the worst and code accordingly.
  • Slow down. We rush to coding too quickly, and thence to debugging with even more alacrity. Check your code carefully before running it.
  • Take time to learn about your tools. Most offer more capability than we understand.
  • Think! That used to be IBM's motto, which was emblazoned on every wall. Why would one want to deprecate thinking?
  • Get away from the office from time to time to be exposed to different ideas. One great idea has inestimable value.
  • Don't release your code till you're proud of it. Workmanship is important. Don't tolerate even small errors - even misspellings in comments.
  • We spend most of our lives working. There are two things we need from those efforts: making money, and having fun. Either alone is not enough. Too many of us trade off fun for security or a big salary, but the years rocket by. You'll want to be proud of your career and your choices. An old retired person with a head full of regrets is sad indeed.
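
On the assert() point in the list above: a minimal sketch of the kind of project-defined assert many embedded systems use when there's no console for the standard library's handler to print to. The board_halt hook is hypothetical, stubbed here as an infinite loop:

    #ifdef NDEBUG
    #define ASSERT(cond) ((void)0)
    #else
    #define ASSERT(cond) \
        do { if (!(cond)) board_halt(__FILE__, __LINE__); } while (0)
    #endif

    /* Hypothetical hook: halt, blink an LED, log to flash - whatever
     * the hardware allows.  Hanging here means a debugger shows
     * exactly where things went wrong. */
    static void board_halt(const char *file, int line)
    {
        (void)file;
        (void)line;
        for (;;) { }
    }

    /* Usage: state your assumptions where you rely on them. */
    int scale(int reading)
    {
        ASSERT(reading >= 0 && reading <= 1023);  /* 10-bit ADC expected */
        return reading * 4;
    }
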
A Look Back

We live in the best of times, especially for technology people. It's impossible to imagine not owning computers, or not being connected to a super-high-speed Internet. It wasn't always that way, and here's a little retrospective from over a half-century working in the embedded world.

When my dad went to MIT to study mechanical engineering, he figured with a slide rule. Twenty-five years later we engineering students at the U of MD used slide rules too, as calculators were quite unaffordable. We had one computer on campus, a $10 million (around $100m in today's dollars) Univac 1108.

My kids were required to own laptops when they went to college. In a single generation we went from $10m computers to using them for doorstops.

Around 1971 I desperately wanted a computer, so I designed a 12-bit machine using hundreds of TTL chips and a salvaged Model-15 teletype. It worked - even better, being fully static, I could clock it at 1 Hz and debug it with a VOM.

But then Intel announced the 4004, quickly followed by the first commercial 8-bit microprocessor, the 8008. It was hardly a computer on a chip as plenty of external components were needed. And at $120 ($900 in today's bucks) just for the chip, this was a pricey part. But suddenly a huge new market was born: the embedded system. (Actually, people had been embedding minicomputers into instruments before this, but the costs were staggering).

I had been working as an electronics technician while in college, but the company found itself in a quandary: clearly its analog instruments would be rendered obsolete by digital technology, yet the engineers didn't know anything about computers. Somehow I got promoted to engineer.

There were no IDEs or GUIs then. Our development system was an Intellec 8, an 8008-based computer from Intel. We removed the CPU board from our instrument and ran a 100-pin flat ribbon cable from the Intellec to our bus, the Intellec then taking the place of our CPU. Can you imagine running a bus over a couple of feet of cable today? At 0.8 MHz, that was not much of a problem.

An ASR-33 teletype with a 10 character-per-second paper tape reader and punch interfaced to the Intellec. Development was straightforward: Punch instructions into the Intellec's front panel to jump to the ROM loader (44 00 01 - I still remember these!). Load the big loader paper tape, suck that in. Then load the editor tape. Create a module in assembly language and punch the tape. Load the assembler tape. Load the module tape - three times, as it was a three-pass assembler. If no errors, punch a binary. Iterate for each module. Load the linker tape. Load each binary tape. Punch a final binary.

Re-assembling took three days.

A primitive debugger supported several software breakpoints and let us examine/modify memory and registers. Re-assembly was so painful we'd plug in machine code patches, documenting these changes on the TTY's listings.

The code had to do multiple linear regressions, so we found a marvelous floating-point package by one Cal Ohne which fit into 1kB. In the years since I've tried to track him down, with no success.

The binary was just 4KB, but EPROMs were small. A 1702A could hold 256 bytes, so a board holding 16 of them was required. Those parts were $100 a pop; in today's dollars that board of EPROMs would run some $15k.

Eventually we got a 200 CPS tape reader, then moved to an MDS-800 "Blue Box" with twin 80KB floppies. We paid $20K for that system; in today's dollars one could get a house in some areas for that sum.

How things have changed!

A hundred bucks buys more disk storage than possibly existed in the world 50 years ago. The 8008 might have squeaked out 100 FLOP/s; my Core i9 PC is rated at 109 GFLOP/s, yet the entire computer, with scads of RAM, disk, etc., was just a couple of $K. In the early 90s I started one of Maryland's first ISPs - we paid $5000/month for a T1 (1.5 Mb/s) link to the net. Today Xfinity gets me for $100/mo for 500 Mb/s.

Remember the spinning tape drives on mainframes? My astronomy camera today generates a 50 MB file for each image. That would fill one of those tapes. In a single night I'll take hundreds of those photos.

Our tools give us stunning capabilities. Many people complain about $5000 for a compiler, or $2000 for some other tool. Our MDS-800 cost more than a year's salary for an engineer and was worth every penny. A handful of thousands for today's tools is pretty insignificant in comparison.

I do worry that the tools are sometimes too good. We can iterate a change and be testing again in seconds. Quick iterations, absent careful thought about the problem, can kill a schedule and doom a project. That quick fix might work... but is it really correct for all possible paths through the code, for every input we can expect?

But working on that primitive 8008 was a ton of fun. As were uncountable other projects over the decades. How many careers can offer a decent salary and so much fun?

Final Failure of the Week

This is a medley of just a few computer failures readers have submitted over the years. They share one characteristic: the displayed results are bogus. I have hundreds of other examples. Does a $1 million drop in the price of bananas make sense? An ocean temperature of 9999 degrees? Air temp of over 100,000?

Like the 737 Max problem detailed above, these failures result from shoddy programming: assuming the results will be just fine, all the time, despite so many contrary examples.

There are two takeaways I hope we can learn:

1) Check your results! A half century ago, when CPU cycles were expensive, we were taught to check the goesintas and goesoutas. It is naive to assume everything will be perfect.

2) Critically evaluate error handlers. Exception code is exceptionally difficult to test, but is as important as the mainline stuff. Some will be impossible to test, which shows the vital importance of code inspections.
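
Here's a sketch of what checking the goesintas and goesoutas can look like, with failures routed through return codes a unit test can exercise; the transfer function and limits are invented for illustration:

    typedef enum { OK, ERR_INPUT, ERR_RESULT } status_t;

    /* Convert a raw ADC code to ocean temperature, refusing to
     * report nonsense in either direction. */
    status_t ocean_temp_c(int raw_adc, float *out_c)
    {
        if (raw_adc < 0 || raw_adc > 4095)
            return ERR_INPUT;               /* impossible ADC code */
        float t = raw_adc * 0.02f - 10.0f;  /* made-up transfer function */
        if (t < -5.0f || t > 40.0f)
            return ERR_RESULT;              /* 9999 degrees never ships */
        *out_c = t;
        return OK;
    }

The display code then shows "---" or an error message on anything but OK, rather than 9999 degrees.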

The Last Joke For The Week

These jokes are archived here.

Scientists created an AI that seemed to be able to answer all questions. It cured cancer and even told them how to travel faster than light.

One day, one of the scientists asked it if there was a god. It was given all books ever written, all historical data and even nuclear clearance codes. The AI, after ingesting this information, simply said:

There is now.

I'd say "click here to unsubscribe from the Embedded Muse", but after this issue it is dead and buried. You are unsubscribed!