Tester To Developer Ratio
Summary: How many testers per developer do you need?
Reader Dave Kellogg asked an intriguing question recently: what is the proper tester/developer ratio? Or, for projects where the developers do their own testing, what is the ratio of their time in the two activities?
Surely there's no single answer, since it depends on the strategy used to develop the code and the reliability the product needs. If completely undisciplined software engineering techniques were used to design commercial aircraft avionics (e.g., giving a million monkeys a text editor), then you'd hope there would be an army of testers for each developer.
Or consider security. Test is probably the worst way to ensure a system is bullet-proof. Important? Sure. But no system will be secure unless that attribute is designed in.
Microsoft is said to have a 1:1 ratio. Now, considering the number of patches that stream out of that company, one might think either better developers or processes are called for, or a bunch more testers.
The quality revolution taught us that you cannot test quality into a product. In fact, tests usually exercise only half the code! (Of course, with coverage analysis that number can be greatly improved.) Capers Jones and Olivier Bonsignour, in The Economics of Software Quality (2012, Pearson Education), show that many kinds of testing are needed for reliable products. On large projects, up to 16 different forms of testing are sometimes used.
Jones and Bonsignour don't address Dave's question directly, but do provide some useful data. It is all given in function points, as Jones especially is known for espousing them over lines of code (LOC). But function points are a metric few practitioners use or understand. We do know that in C, one function point represents very roughly 120 LOC, so despite the imprecision of that metric, I've translated their function point results to LOC.
They have found that, on average, companies create 55 test cases per function point. That is, companies typically create almost one test per two lines of C. The average test case takes 35 minutes to write and 15 to run. Another 84 minutes are consumed fixing bugs and re-running the tests. Most tests won't find problems; that 84 minutes is the average including those tests that run successfully.
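Here's a quick back-of-the-envelope sketch of where that "one test per two lines" comes from; it uses only the rough averages quoted above, so treat it as a sanity check rather than a measurement:

    # Rough sketch: tests per line of C, from the averages above
    tests_per_fp = 55         # test cases per function point (Jones & Bonsignour average)
    loc_per_fp   = 120        # very rough LOC per function point for C
    tests_per_loc = tests_per_fp / loc_per_fp
    print(tests_per_loc)      # ~0.46 -- a bit under one test per two lines of C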
The authors emphasize that the data has a high standard deviation so we should be cautious in doing much math, but a little is instructive.
Each test, then, consumes 35+15+84 = 134 minutes, and with one test for every two lines of code, let's call it an hour's work per line of code. That's a hard-to-believe number but, according to the authors, it represents companies doing extensive, multi-layered testing.
Most data shows the average developer writes 200 to 300 LOC of production code per month. No one believes this, since we're all superprogrammers. But you may be surprised! I see a ton of people working on very complex products that no one completely understands, and they may squeak out just a little code each month. Or they're consumed with bug fixes, which effectively generate no new code at all. Others crank out massive amounts of code in a short time but then all but stop, spending months on maintenance, support, requirements analysis for new products, design, or any number of other non-coding activities.
One hour of test per LOC means two testers (160 hours/month each) are needed for a developer creating about 300 LOC/month.
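The arithmetic, as a sketch; the 300 LOC/month and the 160-hour work month are assumptions, not measurements:

    # Rough sketch: from minutes per test to testers per developer
    minutes_per_test = 35 + 15 + 84     # write + run + fix/re-run (book averages)
    tests_per_loc    = 0.5              # one test per two lines of code, as above
    minutes_per_loc  = minutes_per_test * tests_per_loc
    print(minutes_per_loc)              # 67 minutes -- call it an hour per LOC

    loc_per_month = 300                 # assumed monthly output of one developer
    test_hours    = loc_per_month * minutes_per_loc / 60
    print(test_hours)                   # ~335 hours, or about two 160-hour/month testers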
One test per two LOC is another number that seems unlikely, but a little math shows it isn't an outrageous figure. One version of the Linux kernel I have averages 17.6 statements per function, with an average cyclomatic complexity of 4.7. Since cyclomatic complexity is the minimum number of tests a function needs for branch coverage, at least one test is needed per four lines of code. Maybe a lot more; complexity doesn't give an upper bound. So one per two lines of code could be right, and is certainly not off by very much.
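A sketch of that calculation; the 17.6 and 4.7 are just the averages from that one kernel version:

    # Rough sketch: minimum tests implied by cyclomatic complexity
    statements_per_function = 17.6   # average, one Linux kernel version
    complexity_per_function = 4.7    # average cyclomatic complexity
    loc_per_required_test   = statements_per_function / complexity_per_function
    print(loc_per_required_test)     # ~3.7 -- at least one test per four lines of code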
Jones and Bonsignour's data is skewed towards large companies working on large projects. Smaller efforts may see different results. They do note that judicious use of static analysis and code inspections greatly changes the results, since these two techniques, used together and used effectively, can eliminate 97% of all defects pre-test. But they admit that few of their clients exercise much discipline with the methods. If 97% of the defects were simply not there, that 84 minutes of rework drops to about 2.5 minutes, and well over half the testing effort goes away. Yet the code undergoes exactly the same set of tests.
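A sketch of that claim, taking the authors' 97% figure at face value:

    # Rough sketch: test effort if 97% of defects are removed before test
    rework_minutes    = 84               # average fix/re-run time per test, from the book
    defects_remaining = 1 - 0.97         # if inspections + static analysis really work
    rework_after      = rework_minutes * defects_remaining
    print(rework_after)                  # ~2.5 minutes of rework per test

    before = 35 + 15 + 84
    after  = 35 + 15 + rework_after
    print(1 - after / before)            # ~0.61 -- well over half the test effort gone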
(Here's another way to play with the numbers. The average embedded project removes 95% of all defects before shipping. Use static analysis and inspections effectively, and one could completely skip testing and still have higher quality than the average organization! I don't advocate this, of course, since we should aspire to extremely high quality levels. But it does make you think. And, though the authors say that static analysis and inspections can eliminate 97% of defects, that's a far higher number than I have seen.)
The authors don't address alternative strategies. Tools exist that will create tests automatically. I have a copy of LDRA Unit here, which is extraordinary at creating unit tests, and I plan to report on it in more detail in a future article.
Test is no panacea. But it's a critical part of generating good code. It's best to view quality as a series of filters: each activity removes some percentage of the defects. Inspections, compiler warnings, static analysis, lint, test, and all of the other steps we use each filter out some of the bugs.
Jones and Bonsignour's results are fascinating, but, like so much empirical software data, one has to be wary of assigning too much weight to any single result. It's best to think of it like an impressionistic painting that gives a suggestion of the underlying reality, rather than as hard science. Still, their metrics give us some data to work from, and data is sorely needed in this industry.
What about you? How many tests do you create per LOC or function point?
Published February 4, 2015