386 Protected Mode
Part 2 of a two-part series on protected mode. Part 1 is here. Published in Embedded Systems Programming, August 1991
By Jack Ganssle
Last month I introduced the architecture of the 386, and described how it uses "segment" registers to access a 4 Gb address space. Though many believe that segmentation isn't used in protected mode, in fact it is every bit as crucial as with an 8088. Every address reference is made via segmentation whether in real or protected mode. However, protected mode segments can be any size, from a single byte all the way up to 4 Gb (32 bits).
To summarize last month's description of 386 addressing, every protected mode memory reference uses a selector, an offset, and a descriptor to form a linear address. CS, DS, ES, SS, FS, and GS (segment registers in real mode) are called "selectors", and are pointers to data structures that define characteristics of a segment. These 8 byte data structures are known as "descriptors", and are grouped into tables. The Global Descriptor Table (GDT) is available to every task in a 386 program, and contains up to 8192 descriptors. Local Descriptor Tables (LDTs) can be private to individual tasks, and also contain up to 8192 descriptors.
Every descriptor contains the starting address of the segment (a 32 bit absolute number), the segment's limit (its size, used to catch out-of-bounds references), and miscellaneous access rights bits.
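To make that concrete, here is the 8 byte descriptor format sketched as a C structure. The field names are mine, not Intel's; the base and limit are scattered across the bytes this way for compatibility with the 286 descriptor format.

#include <stdint.h>

/* One 8-byte 386 segment descriptor. */
struct descriptor {
    uint16_t limit_low;    /* segment limit, bits 0-15                */
    uint16_t base_low;     /* base address, bits 0-15                 */
    uint8_t  base_mid;     /* base address, bits 16-23                */
    uint8_t  access;       /* present bit, privilege level, type      */
    uint8_t  limit_flags;  /* limit bits 16-19 plus granularity/size  */
    uint8_t  base_high;    /* base address, bits 24-31                */
};

/* Build a descriptor from a 32 bit base, a 20 bit limit, an access
   byte, and the 4 flag bits. */
static struct descriptor make_descriptor(uint32_t base, uint32_t limit,
                                         uint8_t access, uint8_t flags)
{
    struct descriptor d;
    d.limit_low   = (uint16_t)(limit & 0xFFFF);
    d.base_low    = (uint16_t)(base & 0xFFFF);
    d.base_mid    = (uint8_t)((base >> 16) & 0xFF);
    d.access      = access;
    d.limit_flags = (uint8_t)(((limit >> 16) & 0x0F) | (uint8_t)(flags << 4));
    d.base_high   = (uint8_t)(base >> 24);
    return d;
}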
Just like in real mode, a "segment register" is associated with every type of memory access. In protected mode, these selectors contain a 13 bit index into the GDT or LDT. The instruction:
MOV AX,[data]
uses selector DS (by default) to index into the GDT or LDT, where the processor finds the base address of the segment containing item "data". The CPU adds this base address to the offset (i.e., the address of "data" as stored in the instruction bytes) to create a linear address to send to memory.
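In software terms, the translation the hardware performs on every reference looks roughly like this sketch. The names are mine, the table indicator bit that selects an LDT instead of the GDT is omitted, and I've ignored the granularity bit that lets the limit be counted in 4 Kb units rather than bytes.

#include <stdint.h>

extern uint8_t gdt[];                  /* descriptor table, 8 bytes per entry */
extern void protection_fault(void);    /* stands in for the CPU's exception   */

/* Turn a selector:offset pair into the linear address sent to memory. */
uint32_t linear_address(uint16_t selector, uint32_t offset)
{
    const uint8_t *d = &gdt[(selector >> 3) * 8];  /* 13 bit index picks a descriptor */

    uint32_t base  = d[2] | ((uint32_t)d[3] << 8)  /* base: bytes 2, 3, 4 and 7 */
                          | ((uint32_t)d[4] << 16)
                          | ((uint32_t)d[7] << 24);
    uint32_t limit = d[0] | ((uint32_t)d[1] << 8)  /* limit: bytes 0, 1 and the */
                          | ((uint32_t)(d[6] & 0x0F) << 16);  /* low nibble of byte 6 */

    if (offset > limit)                 /* out-of-bounds reference */
        protection_fault();

    return base + offset;               /* the linear address */
}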
Thus, the descriptor tables define the bases and sizes of every segment in the program, and define the areas of memory that are addressable. It's easy to set up the descriptor tables using special 386-aware linkers available from a number of vendors.
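However the tables get built, at run time something has to tell the processor where they live. The sketch below shows the general shape of that step: a 6 byte pseudo-descriptor holding the table's size and base, loaded with the LGDT instruction. The names are mine, the packed attribute is GCC-specific, and the code assumes a 32 bit compiler and must execute with full privilege.

#include <stdint.h>

/* The pseudo-descriptor LGDT expects: the table size minus one,
   followed by the table's 32 bit base address. */
struct gdtr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

extern uint8_t  gdt[];       /* the descriptor table your tools built */
extern uint32_t gdt_size;    /* its size in bytes                     */

void load_gdt(void)
{
    struct gdtr r;
    r.limit = (uint16_t)(gdt_size - 1);
    r.base  = (uint32_t)gdt;
    __asm__ volatile ("lgdt %0" : : "m"(r));   /* hand the table to the CPU */
}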
Protection Systems
So far I've glossed over the details of the format of selectors and descriptors. In fact, each contains information used to keep ill-behaved programs in check. The whole issue of capturing address violation errors is perhaps a bit new to the embedded world, but with the proliferation of ever more complex systems will certainly become important in the next few years. As one who has suffered through watching programs crash and write over themselves, I find it breathtaking to watch buggy 386 code recover from practically any insult I toss at it; the protection mechanisms ensure that the code never gets overwritten, and that the operating system, if any, remains intact and functional.
The 386 supports 4 privilege levels, numbered 0 to 3. The highest, most privileged level is 0 - a program running at this level can gain access to any 386 resource. Programs running with lower privilege levels are restricted in their ability to use memory, I/O, and some instructions.
Privilege levels are intimately tied to descriptors. As I mentioned, the descriptor contains the base address of a segment, the segment size, and access rights bits. Two of these bits specify the Descriptor's Privilege Level (DPL). Privileges are thus associated with segments, a somewhat novel concept when you consider that most CPUs simply have a global privilege setting that affects all of memory equally.
Before describing how a segment's DPL affects memory access rights, it makes sense to answer the obvious question: what defines the processor's privilege level? Cleverly enough, this is handled entirely within the context of segment privileges. The CPU runs at the privilege level defined within the DPL of the current code segment - the Current Privilege Level (CPL). Privileges are somewhat removed from the code, then. A transfer to a segment with a DPL of 0 (say, the operating system) will always run with the greatest access rights. Vector off to a code segment with DPL=3 and you'll be very limited in your ability to run amok.
Every time any section of code accesses another segment, the 386 hardware compares the CPL to the referenced segment's DPL (i.e., it compares the privilege level the CPU is running at to the privilege defined for the segment). If the CPL is the same or higher (smaller number) than the DPL, then it can proceed with the access. An attempt to access a segment more privileged than the processor's CPL results in an exception, letting us know something is wrong.
Thus, code running in a segment with a DPL of 0 pumps the CPU up to a CPL of 0, and gives the CPU access to every other segment.
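Stated as code, the check amounts to a single comparison, with the wrinkle that smaller numbers mean more privilege:

/* The privilege check made on each segment reference: the current
   privilege level must be numerically less than or equal to the
   target segment's DPL, or the CPU raises an exception. */
int access_permitted(int cpl, int dpl)
{
    return cpl <= dpl;
}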
Novice 8086 assembly programmers always moan about the complexity of segments and segment groups. Sometimes the ASSUMEs, GROUPs, and other pseudo-ops seem to be an awful lot of trouble. When you switch to the 386 suddenly these constructs make perfect sense: group like segments together, simultaneously grouping privilege levels. Perhaps the operating system will be grouped into one segment with a DPL of 0 so it can access any resource. Maybe device drivers can fit into a less important group, giving them just as much power as needed but no more, preventing them from trashing code. Finally, run the application program at a very low privilege (i.e., high number, like 3), so it cannot affect system data structures or I/O.
We're now talking about two independent levels of protection. The first is defined by segment sizes: no task can access outside of whatever segment it is attempting to use, since an address that exceeds the segment-size field in the descriptor will generate an exception. Obviously, array subscripting errors just cannot cause major crashes if the segments are defined cleverly. The second level of protection is DPL checking, which prevents accesses to higher privileged segments.
In addition, the processor provides hardware protection of certain dangerous instructions. Obviously, the HLT instruction is one to be limited only to very highly privileged tasks. Likewise, those instructions that load the 386's internal control registers (including the debug registers), and those that load the descriptor table base pointers, should be restricted to only some tasks. These and a few other instructions will cause an exception if they are executed by any code running with a CPL greater than 0.
I/O instructions are protected as well. An I/O protection level (IOPL) is defined in the processor's EFLAGS register. Instructions to enable and disable interrupts will cause an exception if executed from a section of code less privileged than the I/O protection level. Any I/O instruction will cause a similar error unless the port is enabled in the I/O permission bitmap, an array of up to 64k bits that indicates the protection status of each and every port.
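In C-like terms the test applied to each IN or OUT looks something like the sketch below. The names are mine; the real bitmap lives in the task's state segment, one bit per port, with a set bit denying the port to unprivileged code.

#include <stdint.h>

int io_permitted(int cpl, int iopl, const uint8_t *io_bitmap, uint16_t port)
{
    if (cpl <= iopl)                    /* privileged enough: no bitmap check */
        return 1;
    return !(io_bitmap[port >> 3] & (1 << (port & 7)));   /* clear bit = allowed */
}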
Call Gates
Given that a low privileged task cannot access code or data with a higher privilege (lower number), then how can any task invoke the operating system? The operating system, probably running at CPL 0, can reach down into less privileged segments; a mechanism is needed to permit application programs access to OS resources.
The 386 uses "call gates" to access higher privileged routines. A call gate is a special type of descriptor, stored in the GDT or LDT, that contains a pointer to an entry point. To invoke a higher privilege routine the linker will replace your CALL instructions with a CALL that works indirectly through this new form of descriptor.
Where a normal descriptor contains just the segment's base address, length, and access rights bits, a call gate (which is also 8 bytes long) has only the destination routine's selector, offset, and DPL. The call gate is an indirect pointer to the destination segment's descriptor.
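Sketched as a C structure (field names mine), a call gate looks like this:

#include <stdint.h>

/* An 8 byte 386 call gate. Instead of a base and limit it holds the
   destination's selector and offset, plus a count of parameters to
   copy to the new stack when the privilege level changes. */
struct call_gate {
    uint16_t offset_low;    /* destination offset, bits 0-15            */
    uint16_t selector;      /* selector of the destination code segment */
    uint8_t  param_count;   /* parameters to copy (low 5 bits)          */
    uint8_t  access;        /* present bit, DPL, and gate type          */
    uint16_t offset_high;   /* destination offset, bits 16-31           */
};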
Though this is a bit tricky, essentially all a call gate does is remove the selector and offset from the call instruction (where these things would normally go), and place them inside of the descriptor table. That is, the call gate contains the complete destination address selection parameters. The CALL instruction itself has a selector (that selects the call gate, just as any selector picks a descriptor), and an ignored offset (since the offset to the routine is in the call gate).
If you use a call gate to access routine invoke_os, the linker will replace your CALL with a CALL to the gate - it will load the selector with the gate's index in the descriptor table and probably store garbage in the offset part of the instruction. At runtime, the 386 sees the call, uses the selector to read the gate's 8 bytes, saves the offset part from the descriptor, and uses the descriptor's selector to load in the destination address's code segment descriptor. This yields a base address (and length and access rights), which is added to the offset from the call gate, generating the linear address of the routine.
The 386 uses the DPL in the call gate to ensure the invoker is allowed to use the gate: the caller must be at least as privileged as the gate. It then switches to the privilege level indicated in the descriptor pointed to by the gate. Thus, a low level application routine calls for operating system service with a call gate. The transfer through the gate will raise the privilege level to that of the OS.
Call gates add yet another level of complexity to a program's structure, but most of the details can be left to the linker. One of the nice advantages of the gate is that every call to it uses the same selector. If the gate is defined at some sacred location that never changes from version to version, then the gate is sort of like a jump table. I've always been a big fan of using jump tables in embedded systems, so you can figure out where routines are, even in the field with limited tools, even after 50 versions of the ROM.
Call gates are designed mostly for use when privilege level transitions are needed. Since they are stored in a descriptor table, you are limited in the number of gates the system will support. Remember that the GDT and each LDT are limited to 8k entries, which is far from infinity. Generally, gates are used to funnel requests for operating system service through a single OS dispatcher.
Other Goodies
The 386 is just chock full of features for managing complex operating systems and code. This list is far too extensive to cover here in any detail. However, I'll briefly mention several other features that can help in developing any kind of system, embedded or otherwise.
The processor does support virtual memory. One of the attribute bits in every segment descriptor indicates if the segment is present. A reference to a not-present segment creates an exception, allowing system software to load the required segment from disk. Frankly, I'm not sure what this would be useful for in an embedded system, but it does seem like a neat feature. I'd welcome ideas...
The processor's memory management has yet another level beyond the segmentation I've described. Optionally, you can divide the 4 Gb address space into smaller chunks and then remap the physical address of each chunk through page tables. You define the page tables to translate practically any address into any other. Thus, two tasks could be compiled at identical addresses, yet run at different physical addresses by using different paging. Again, is this useful for an embedded system? Does someone out there have some devilishly clever technique you'd care to share with us?
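For reference, with 4 Kb pages the 386 splits a 32 bit linear address into a 10 bit directory index, a 10 bit table index, and a 12 bit offset. The walk the hardware performs looks roughly like this sketch, which assumes the tables themselves are addressable at their physical addresses and uses names of my own:

#include <stdint.h>

#define PAGE_PRESENT 0x001            /* bit 0 of a directory or table entry */

extern void page_fault(void);         /* stands in for exception 14; does not return */

uint32_t translate(const uint32_t *page_directory, uint32_t linear)
{
    uint32_t dir_index   = linear >> 22;            /* bits 31-22 */
    uint32_t table_index = (linear >> 12) & 0x3FF;  /* bits 21-12 */
    uint32_t offset      = linear & 0xFFF;          /* bits 11-0  */

    uint32_t dir_entry = page_directory[dir_index];
    if (!(dir_entry & PAGE_PRESENT))
        page_fault();

    const uint32_t *page_table = (const uint32_t *)(dir_entry & 0xFFFFF000);
    uint32_t table_entry = page_table[table_index];
    if (!(table_entry & PAGE_PRESENT))
        page_fault();

    return (table_entry & 0xFFFFF000) | offset;     /* physical address */
}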
The 386 does include a number of debug registers that let you set hardware breakpoints on up to 4 addresses simultaneously. These breakpoints work rather like those produced by an emulator: they are non-intrusive, and work in ROM or RAM. You can set them on code or data accesses. If you'd care to write a monitor to embed in the product (always a good idea for long term product maintenance), then by all means use these resources.
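Arming one amounts to putting the address in DR0 through DR3 and setting a few fields in DR7. Here is a sketch of the DR7 bit layout for breakpoint n; loading the debug registers themselves requires privilege level 0, and the names are mine.

#include <stdint.h>

/* Compose a DR7 value enabling hardware breakpoint n (0-3).
   rw:  0 = break on instruction execution, 1 = data write,
        3 = data read or write.
   len: 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes (must be 0 for execution). */
uint32_t dr7_enable(int n, int rw, int len)
{
    uint32_t dr7 = 0;
    dr7 |= 1u << (n * 2);                        /* local enable bit Ln */
    dr7 |= (uint32_t)(rw  & 3) << (16 + n * 4);  /* R/Wn field          */
    dr7 |= (uint32_t)(len & 3) << (18 + n * 4);  /* LENn field          */
    return dr7;
}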
Conclusion
Why use protected mode in embedded applications? The biggest attraction is the large, 32 bit address space that becomes immediately available. Of course, most any other 32 bit CPU will give easier access to lots of memory.
Certainly the DOS based tools that so many non-embedded people use are a compelling incentive to stick with the 80x86 architecture. How many millions use all of the great DOS Cs and assemblers? You can use any of these on the 386, and as they become more 32 bit aware they'll take even greater advantage of the 386's features. Quick development cycles demand proven tools, and it's awfully hard to argue against those from the DOS world. You can even do a lot of the development on a DOS machine, and port to the harder embedded world after removing most of the bugs.
A lot of embedded folks are now putting DOS into ROM - a subject I know will see a lot of discussion at the upcoming Embedded Systems Conference. With the 386 you can run DOS as a task in its own segment, and run other applications concurrently.
Finally, protected mode really does protect your code. With the right segmentation, you'll never, and I mean never, see a rogue program overwrite the code. This could be important in medical and other life-critical applications.
For those wishing to explore the mysteries of this processor in more detail, be sure to get the complete set of Intel reference manuals.
Intel's "Microprocessors" manual (mine is dated 1990) contains a pretty complete hardware and software description of the part, but is definitely not for the faint hearted. It is complete but succinct.
Their "386 DX Microprocessor Programmer's Reference Manual" is far more readable, but neglects all hardware issues. It gives a pretty readable account of the operation of all of the processor's major modes. This is a must read for serious 386 users.
Intel's "80386 System Software Writer's Guide", though thin, does include lots of sample code, including routines to enter and exit protected mode. It is a good adjunct to the Programmer's Reference Manual.
Finally, the "80386 Microprocessor Hardware Reference Manual" helps explain how to design hardware that will really work with the 386. This is not a trivial problem, as the CPU can get out of sync with it's bus cycles - you have to build a sort of state machine to determine what it is doing when. Even adding wait states is a bit challenging.