You may redistribute this newsletter for non-commercial purposes. For commercial use contact jack@ganssle.com. To subscribe or unsubscribe go here or drop Jack an email.
Contents
Editor's Notes
Tip for sending me email: My email filters are super aggressive and I no longer look at the spam mailbox. If you include the phrase "embedded" in the subject line your email will wend its weighty way to me.

Coming soon: 1-trillion-transistor GPUs.
Quotes and Thoughts
In times of change learners inherit the earth while the learned find themselves beautifully equipped to work in a world that no longer exists. - Eric Hoffer
Tools and Tips
Please submit clever ideas or thoughts about tools, techniques and resources you love or hate. Here are the tool reviews submitted in the past. Daniel McBrearty has a nice way to improve the use of assert():
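As a general illustration of the idea (a generic sketch, not Daniel's submission): a common embedded pattern routes failed assertions to a single handler that records the file and line, so the checks can be redirected or compiled out without touching the call sites. The names MY_ASSERT and assert_failed below are placeholders.

    /* Generic illustration only. Route assertion failures to a single
       handler; compile the checks out entirely when NDEBUG is defined. */
    void assert_failed(const char *file, int line);

    #ifdef NDEBUG
      #define MY_ASSERT(expr) ((void)0)
    #else
      #define MY_ASSERT(expr) \
          ((expr) ? (void)0 : assert_failed(__FILE__, __LINE__))
    #endif

    /* Example handler: park the CPU so a debugger shows where and why. */
    void assert_failed(const char *file, int line)
    {
        (void)file;
        (void)line;
        for (;;) { /* breakpoint here; file and line are in the arguments */ }
    }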
Ingo Marks sent this update:
Rejecting Outliers
We often average input data to reject noise and get a more accurate model of the data we're sampling. In some cases naive averaging can lead to unacceptable errors. A single data point wildly out of range can skew the mean, giving us quite incorrect results. One solution is to reject outliers before computing the average. There are many ways to do this. A simple approach is sigma clipping: compute the standard deviation of the data and reject samples that lie more than some number of standard deviations from the mean. (That threshold is very application dependent.) This is simple and fast.
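To make that concrete, here's a minimal two-pass sketch in C, assuming a small buffer of float samples and hardware (or library) floating point; the function name and the threshold parameter k are mine, just for illustration:

    #include <math.h>
    #include <stddef.h>

    /* Average the samples, ignoring any that lie more than k standard
       deviations from the mean. Pass 1 computes the statistics; pass 2
       averages only the samples that survive the clip. */
    float clipped_mean(const float *x, size_t n, float k)
    {
        float sum = 0.0f, sumsq = 0.0f;
        for (size_t i = 0; i < n; i++) {
            sum   += x[i];
            sumsq += x[i] * x[i];
        }
        float mean  = sum / (float)n;
        float sigma = sqrtf(sumsq / (float)n - mean * mean);

        float kept_sum = 0.0f;
        size_t kept = 0;
        for (size_t i = 0; i < n; i++) {
            if (fabsf(x[i] - mean) <= k * sigma) {
                kept_sum += x[i];
                kept++;
            }
        }
        return (kept > 0) ? kept_sum / (float)kept : mean;
    }

A k of 2 to 3 is a common starting point, but, as noted, the right value depends on the application.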
In the absence of hardware floating point one could dispense with the computationally-expensive square root and work with the variance instead. However, I generally prefer standard deviation over variance, as the former is in the same units as the data. If you're measuring length in meters, standard deviation is also in meters.

Don't want to toss data? An alternative is Winsorized sigma clipping (named for statistician Charles Winsor). Instead of rejecting outliers, replace them with an adjacent value. For instance, if the data looks like:

    24, 25, 22, 20, 28, 102, 23, 25, 31, 22, 23

Replace the outlier with an adjacent data point - perhaps the one to the right:

    24, 25, 22, 20, 28, 23, 23, 25, 31, 22, 23

... again, using the standard deviation to identify outliers. Sure, this is a bit of a lie as we're making up data. But it's a white lie that won't skew the result.

For the last year or two I've been using a newish-to-me method of rejecting outliers. It goes by the unwieldy name "Generalized Extreme Studentized Deviate" (often abbreviated ESD), which sounds like something an academic would make a career of, but is pretty simple in concept. Compute the mean and standard deviation of the set of samples. Then find the sample for which:

    |sample - mean| / standard deviation
is a maximum. Remove that sample from the dataset. Recompute the mean and standard deviation and repeat. Continue removing samples until you've removed however many outliers you expect. A better stopping criterion is to monitor the change in the value of that statistic, and stop when it falls below some threshold. Again, I'd use the standard deviation here. I am no statistician, but suspect one could remove the divisor, as it is the same for every sample in a given pass. You still want the standard deviation to compute a stopping criterion, but could compute the subtraction at the same time as figuring sigma. A more complex alternative is detailed here.

I've found that ESD is remarkably effective at removing very strong signals that should not be part of the data set. It's computationally expensive, but may be an option when there are sufficient CPU cycles.
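Here's a rough sketch of that iterative trimming - not the full Generalized ESD test, which compares each statistic against tabulated critical values. The loop recomputes the mean and sigma, drops the worst sample, and stops when the statistic falls below a threshold or a cap on removals is reached. The function name, threshold, and cap are assumptions you'd tune for your data.

    #include <math.h>
    #include <stddef.h>

    /* Iteratively remove the sample that maximizes |x[i] - mean| / sigma.
       Stops when that statistic drops below 'threshold' or after removing
       'max_remove' samples. Returns the number of samples remaining;
       survivors are left in the first n slots of x[] (order not preserved). */
    size_t trim_outliers(float *x, size_t n, size_t max_remove, float threshold)
    {
        while (n > 2 && max_remove > 0) {
            /* Mean and standard deviation of the current set. */
            float sum = 0.0f, sumsq = 0.0f;
            for (size_t i = 0; i < n; i++) {
                sum   += x[i];
                sumsq += x[i] * x[i];
            }
            float mean = sum / (float)n;
            float var  = sumsq / (float)n - mean * mean;
            if (var <= 0.0f)
                break;                      /* all samples (nearly) identical */
            float sigma = sqrtf(var);

            /* Find the sample farthest from the mean, in units of sigma. */
            size_t worst = 0;
            float  r_max = 0.0f;
            for (size_t i = 0; i < n; i++) {
                float r = fabsf(x[i] - mean) / sigma;
                if (r > r_max) { r_max = r; worst = i; }
            }
            if (r_max < threshold)
                break;                      /* nothing looks like an outlier */

            x[worst] = x[n - 1];            /* drop it by swapping in the last sample */
            n--;
            max_remove--;
        }
        return n;
    }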
Mean or Median?
When running statistics should we use the mean or median? Generally we default to the mean, which more often than not makes sense. Sometimes we'll average just out of habit. But when the mean and median are very different it may make sense to profile using the latter.

Here in the little unincorporated town of Finksburg, MD we have a population of about 10,000. I imagine the average salary is around $100k, with a similar median. In other words, most folks' incomes here probably follow a more or less Gaussian distribution around $100k; some a bit more, others somewhat less. If Elon Musk moved into one of our modest houses, though, Finksburg's mean income would jump to millions of dollars - even though (I imagine) no one earns anything like that. The median wouldn't change, and is much more representative of the town's income.

One could argue that the mean is still a good model if we reject Musk as an outlier (see above). But outliers are sometimes where the interesting data lies. Consider a picture of a remote galaxy. The sky is nearly black; virtually all of the data is the blackness at the left side of a histogram. A tiny percentage of the pixels are bright - the stars we're focused on. That's the interesting stuff, yet those pixels might represent a fraction of a percent of all of the data. They look like outliers but are not.

As always, know your data before deciding how to process it.
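A toy example makes the point. The salary figures below are invented, but a single extreme value drags the mean by orders of magnitude while the median barely moves:

    #include <stdio.h>
    #include <stdlib.h>

    /* Compare two doubles for qsort(). */
    static int cmp(const void *a, const void *b)
    {
        double d = *(const double *)a - *(const double *)b;
        return (d > 0) - (d < 0);
    }

    int main(void)
    {
        /* Five ordinary salaries plus one very wealthy new neighbor. */
        double salary[] = { 82e3, 95e3, 101e3, 99e3, 110e3, 10e9 };
        size_t n = sizeof salary / sizeof salary[0];

        double sum = 0.0;
        for (size_t i = 0; i < n; i++) sum += salary[i];
        double mean = sum / (double)n;

        qsort(salary, n, sizeof salary[0], cmp);
        double median = (n % 2) ? salary[n / 2]
                                : 0.5 * (salary[n / 2 - 1] + salary[n / 2]);

        printf("mean   = %.0f\n", mean);    /* dragged up to about 1.7 billion */
        printf("median = %.0f\n", median);  /* still 100,000 */
        return 0;
    }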
More on Engineering Ethics
Responding to last issue's take on engineering ethics, Greg Hansen notes the Ritual of the Calling of an Engineer, which has been an institution in Canada for a long time. From the Wikipedia article:
Dave Telling wrote:
Tom Lock
Failure of the Week
Dainius Stankevicius wrote: The discount percentage somehow managed to become a NaN. Comparing the two posters, the one on the left uses commas as decimal separators while the one on the right uses points, which was most likely the reason. But still, someone had to print it and physically put it up there...
And this is from Chris Hammond:
Have you submitted a Failure of the Week? I'm getting a ton of these and yours was added to the queue.
Jobs!
Let me know if you're hiring embedded engineers. No recruiters please, and I reserve the right to edit ads to fit the format and intent of this newsletter. Please keep it to 100 words. There is no charge for a job ad.
Joke For The Week
These jokes are archived here.

If C++ is so good, why has it never improved to A+?
About The Embedded Muse
The Embedded Muse is Jack Ganssle's newsletter. Send complaints, comments, and contributions to me at jack@ganssle.com. The Embedded Muse is supported by The Ganssle Group, whose mission is to help embedded folks get better products to market faster.
Click here to unsubscribe from the Embedded Muse, or drop Jack an email.