Software-based Virtualization
The following discussion focuses only on virtualization of protected mode of the x86 architecture.
In protected mode the Kernel runs at a higher privilege such as ring 0, and applications at a lower privilege such as ring 3. Similarly, a host OS must control the processor while the guest OSs are prevented from direct access to the hardware. One approach used in x86 software-based virtualization is called ring deprivileging, which involves running the guest OS at a ring higher than 0.
Three techniques made virtualization of protected mode possible:
- Binary translation is used to rewrite in terms of ring 3 instructions certain ring 0 instructions, such as POPF, that would otherwise fail silently or behave differently when executed above ring 0, making the classic trap-and-emulate virtualization impossible. To improve performance, the translated basic blocks need to be cached in a coherent way that detects code patching (used in VxDs for instance), the reuse of pages by the guest OS, or even self-modifying code.
- A number of key data structures used by a processor need to be shadowed. Because most operating systems use paged virtual memory, and granting the guest OS direct access to the MMU would mean loss of control by the virtualization manager, some of the work of the x86 MMU needs to be duplicated in software for the guest OS using a technique known as shadow page tables. This involves denying the guest OS any access to the actual page table entries by trapping access attempts and emulating them instead in software. The x86 architecture uses hidden state to store segment descriptors in the processor, so once the segment descriptors have been loaded into the processor, the memory from which they have been loaded may be overwritten and there is no way to get the descriptors back from the processor. Shadow descriptor tables must therefore be used to track changes made to the descriptor tables by the guest OS.
- I/O device emulation: Unsupported devices on the guest OS must be emulated by a device emulator that runs in the host OS.
These techniques incur some performance overhead due to lack of MMU virtualization support, as compared to a VM running on a natively virtualizable architecture such as the IBM System/370.
On traditional mainframes, the classic type 1 hypervisor was self-standing and did not depend on any operating system or run any user applications itself. In contrast, the first x86 virtualization products were aimed at workstation computers, and ran a guest OS inside a host OS by embedding the hypervisor in a kernel module that ran under the host OS (type 2 hypervisor).
There has been some controversy whether the x86 architecture with no hardware assistance is virtualizable as described by Popek and Goldberg. VMware researchers pointed out in a 2006 ASPLOS paper that the above techniques made the x86 platform virtualizable in the sense of meeting the three criteria of Popek and Goldberg, albeit not by the classic trap-and-emulate technique. However, as of 2009 some academics claimed that it is not.
A different route was taken by other systems like Denali, L4, and Xen, known as paravirtualization, which involves porting operating systems to run on the resulting virtual machine, which does not implement the parts of the actual x86 instruction set that are hard to virtualize. The paravirtualized I/O has significant performance benefits as demonstrated in the original SOSP'03 Xen paper.
Read more about this topic: X86 Virtualization