By Jack Ganssle
A Trillion Lines of Code?
Published 9/20/2008
An article (http://www.ddj.com/architect/210602491?pgno=1) on Dr. Dobb's site claims that in 1997 the Gartner Group estimated there were about 240 billion lines of Cobol running active business applications worldwide.
If those quarter-trillion lines were written at a constant rate over the 40-year history (as of the study's 1997 date) of the language, which is hard to believe, that means developers cranked out 6 billion lines of Cobol a year. Add to that all of the other Cobol apps that no longer exist; maybe the world has been producing 10 billion Cobol lines a year.
I have no idea how many Cobol programmers exist; certainly they're an increasingly rare breed today. But to pick a number that seems wildly high, suppose that for those 40 years a million Cobol developers were employed every year. That's 10k LOC/person/year, or a bit over 800/month, an unusually high productivity figure. Cobol is wordy, but has roughly the same density as C; that is, programs with the same functionality will be about the same length in both languages. (See Backfiring: Converting Lines of Code to Function Points, by Capers Jones, IEEE Computer, November 1995.)
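For anyone who wants to poke at the guesses, here is a minimal Python sketch of the Cobol arithmetic. It uses only the figures quoted above; the variable names are mine, and the 10 billion lines/year and one million developers are the same rough assumptions made in the text, not measured data.

```python
# Back-of-envelope Cobol arithmetic, using only figures quoted in the text.
COBOL_LINES_1997 = 240e9       # Gartner estimate cited by the Dr. Dobb's article
LANGUAGE_AGE_YEARS = 40        # Cobol's history as of the 1997 study
ASSUMED_YEARLY_OUTPUT = 10e9   # assumption: surviving code plus retired apps
ASSUMED_DEVELOPERS = 1e6       # assumption: deliberately high headcount

# Lines still running in 1997, spread evenly over the language's history.
surviving_lines_per_year = COBOL_LINES_1997 / LANGUAGE_AGE_YEARS

# Implied productivity if a million developers produced the assumed output.
loc_per_dev_per_year = ASSUMED_YEARLY_OUTPUT / ASSUMED_DEVELOPERS
loc_per_dev_per_month = loc_per_dev_per_year / 12

print(f"Surviving lines written per year: {surviving_lines_per_year:,.0f}")
print(f"LOC per developer per year:       {loc_per_dev_per_year:,.0f}")
print(f"LOC per developer per month:      {loc_per_dev_per_month:,.0f}")
```

Change either assumption and the productivity figure moves in proportion, which is a reminder of how soft these numbers are.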
So I started to wonder how much software is extant in all languages. Surely in the decade since the Gartner study the base of Cobol applications must have grown. And though Cobol might be the most popular business language, it's merely one of some 700 computer languages, some of which have huge constituencies, like C and C++.
How much PC code exists? Or non-Cobol business, government and military code? I have no idea.
What about embedded? About 118k embedded projects start each year in the US. Maybe double that for the world-wide figure. Multiply that by nearly 40 years of embedded history, cut it in half to account for a lower figure in the early years, and it appears some 5 million embedded projects have been built.
It's unreasonable to talk about "average" sizes of embedded programs, as they run the gamut from a few hundred LOC for many tiny apps to the increasingly common multi-million line products. But, for the sake of playing with the math, if we assume there are a trillion LOC of firmware, then each of those 5 million embedded apps has 200,000 lines. That's certainly high, but it's probably within an order of magnitude, or better.
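The embedded estimate can be sketched the same way. This is only the arithmetic from the two paragraphs above; the names are mine, the doubling for the rest of the world and the halving for the early years are the text's guesses, and the trillion lines of firmware is purely a what-if.

```python
# Rough embedded-project count and implied size, from the figures in the text.
US_PROJECTS_PER_YEAR = 118_000   # embedded project starts per year in the US
WORLD_MULTIPLIER = 2             # assumption: world is about double the US
EMBEDDED_HISTORY_YEARS = 40      # "nearly 40 years" of embedded history
EARLY_YEARS_DISCOUNT = 0.5       # assumption: halve for slower early years
ASSUMED_FIRMWARE_LOC = 1e12      # what-if: a trillion lines of firmware

total_projects = (US_PROJECTS_PER_YEAR * WORLD_MULTIPLIER
                  * EMBEDDED_HISTORY_YEARS * EARLY_YEARS_DISCOUNT)

# The text rounds the project count up to 5 million before dividing.
ROUNDED_PROJECTS = 5e6
lines_per_project = ASSUMED_FIRMWARE_LOC / ROUNDED_PROJECTS

print(f"Estimated embedded projects: {total_projects:,.0f}")
print(f"Implied lines per project:   {lines_per_project:,.0f}")
```

The unrounded count comes to about 4.7 million projects, so rounding to 5 million changes the per-project figure only slightly.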
Combine Cobol, PC code, web Java, firmware and all the rest, and it's not unreasonable to assume there are a trillion lines of code mediating the electronic hum that powers the world. That's truly a staggering number. A single one-million line app is baffling in its complexity, but a trillion is a million millions, something beyond any of us to comprehend.
The news this week is full of the cost of nationalizing the financial system... ah, I mean, stabilizing the economy. People far smarter than me at economics put it at around a trillion dollars, more or less.
A trillion here, a trillion there, pretty soon you're talking about real money, and a really, really big software base.