By Jack Ganssle
Microkernel vs Monolithic
Published 5/24/2006
Slashdot reopened the endless Linus Torvalds vs. Andy Tanenbaum debate about microkernels vs. monolithic operating systems (http://developers.slashdot.org/article.pl?sid=06/05/15/1637206). This is a silly discussion akin to debating bicycles versus cars: both are forms of transportation that meet differing needs. Both can and should coexist in the people-moving ecology.
Minix is a microkernel, defined more or less as a very small operating system that provides system calls to manage basic services like handling threads, address spaces, and inter-process communications. A microkernel relegates all other activities to "servers" that exist in user space. A big monolithic OS (like Linux and Windows), on the other hand, provides far more services in the protected kernel space. Linux's kernel is over 2 million lines of code; Windows' is far bigger. Monolithic kernels have been tremendously successful and do a yeoman-like job of running the world's desktops and many embedded systems.
Where most operating systems have complex and often conflicting design goals, plus the agony of support for legacy code, microkernels tout reliability as their primary feature. A bug in a device driver, for instance, only crashes that driver and not the entire system. Judicious use of a memory management unit ensures that non-kernel servers live in their own address spaces, independent of and protected from each other. If a server crashes, the kernel can restart that component rather than having the entire system die or fall into a seriously degraded mode. Monolithic kernel advocates note that the microkernel is far from a panacea and that malicious attacks can still cripple the system (for more on this see http://en.wikipedia.org/wiki/Microkernel).
Regardless of the debate, the philosophy behind microkernels fascinates me. http://www.minix3.org/reliability.html shows how Minix, for instance, is designed for reliable operation (which is, in my opinion, far more important than the desire to pile features into already-bloated code).
Minix stresses a small kernel size. It's a whole lot easier to ensure 4k lines of code are correct than 2 million. Because device drivers and other typically-buggy features are installed in user space, most of the OS simply cannot execute privileged instructions or access memory or I/O belonging to another process or resource. Infinite loops stop being fatal, since the scheduler lowers the sick server's priority till it becomes the idle task.
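To make the priority-lowering idea concrete, here is a minimal sketch in C. It is not Minix's actual scheduler; the names (task_t, on_quantum_end, pick_next, LOWEST_PRIORITY) and the eight priority levels are my own assumptions, chosen only to illustrate the technique. A task that burns its entire time slice is demoted one level; one that gives up the CPU voluntarily gets its original priority back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priority scheme: 0 is most urgent, 7 is the idle level. */
#define LOWEST_PRIORITY 7

typedef struct {
    const char *name;
    int base_priority;  /* priority assigned when the task was started */
    int priority;       /* current priority; larger number = less urgent */
} task_t;

/* Called when a task's time slice ends. used_full_quantum is nonzero if
   the task ran until it was preempted instead of blocking on its own. */
void on_quantum_end(task_t *t, int used_full_quantum)
{
    assert(t != NULL);
    if (used_full_quantum) {
        if (t->priority < LOWEST_PRIORITY)
            t->priority++;               /* demote the CPU hog one level */
    } else {
        t->priority = t->base_priority;  /* well-behaved: restore priority */
    }
}

/* Pick the task with the most urgent (numerically smallest) priority.
   For brevity, every task in the array is assumed to be runnable. */
task_t *pick_next(task_t *tasks, size_t count)
{
    task_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (best == NULL || tasks[i].priority < best->priority)
            best = &tasks[i];
    }
    return best;
}
```

A driver stuck in a loop uses every quantum it is given, so after a handful of time slices it sinks to the bottom level and only runs when nothing else wants the CPU. The rest of the system keeps working.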
A reincarnation server regularly pings each server; those that don't respond are killed off and restarted. That's pretty cool.
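The same ping-and-restart pattern is easy to borrow for ordinary firmware. What follows is a rough sketch in C, assuming each monitored component supplies a ping function and a restart function. Every name here (server_t, monitor_pass, MAX_MISSED, and the simulated servers) is hypothetical and is not taken from Minix's source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limit: consecutive unanswered pings tolerated before restart. */
#define MAX_MISSED 2

typedef struct {
    const char *name;
    bool      (*ping)(void);     /* returns true if the server answered */
    void      (*restart)(void);  /* kills and reloads the server */
    unsigned    missed;          /* consecutive unanswered pings so far */
    unsigned    restarts;        /* number of times this server was restarted */
} server_t;

/* One monitoring pass: ping every server and restart those that have
   stayed silent for MAX_MISSED passes in a row.
   Returns the number of servers restarted during this pass. */
unsigned monitor_pass(server_t *servers, size_t count)
{
    unsigned restarted = 0;
    for (size_t i = 0; i < count; i++) {
        server_t *s = &servers[i];
        assert(s->ping != NULL && s->restart != NULL);
        if (s->ping()) {
            s->missed = 0;       /* alive: forget earlier misses */
            continue;
        }
        if (++s->missed >= MAX_MISSED) {
            s->restart();
            s->missed = 0;
            s->restarts++;
            restarted++;
        }
    }
    return restarted;
}

/* Simulated servers, for demonstration only. */
unsigned restart_calls = 0;
bool healthy_ping(void)  { return true; }
bool hung_ping(void)     { return false; }
void count_restart(void) { restart_calls++; }
```

Call monitor_pass() from a periodic timer or a low-priority task. With MAX_MISSED set to 2, a server that misses one ping gets a second chance; one that misses two in a row is restarted. Of course, this only buys real protection when the monitor itself cannot be corrupted by the code it watches.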
Every firmware engineer should read http://www.minix3.org/reliability.html and think deeply about Minix's philosophy of reliability. The idea that bugs in big systems are inevitable, but that we can build fault-tolerant code that survives attacks and defects, is important. It's worth thinking about whether you use a microkernel, a monolithic kernel, or even just a while(1) loop.
I believe the next great innovation in embedded processor design will be a very old idea: the memory management unit. An MMU in every CPU, coupled with code that isolates each OS component and task into distinct, hardware-protected memory areas, can and will lead to much more reliable firmware.
What do you think?