1
Disco: Running Commodity
Operating Systems on Scalable
Multiprocessors
Edouard Bugnion, Scott Devine, Mendel Rosenblum,
Stanford University, 1997
Presented by Divya Parekh
2
Outline
Virtualization
Disco description
Disco performance
Discussion
3
Virtualization
“a technique for hiding the physical characteristics of computing resources from the way in which other systems, applications, or end users interact with those resources. This includes making a single physical resource appear to function as multiple logical resources; or it can include making multiple physical resources appear as a single logical resource”
4
Old idea from the 1960s
IBM VM/370 – A VMM for IBM mainframe Multiple OS environments on expensive hardware
Desirable when few machine around
Popular research idea in 1960s and 1970s Entire conferences on virtual machine monitors
Hardware/VMM/OS designed together
Interest died out in the 1980s and 1990s Hardware got more cheaper
Operating systems got more powerful (e.g. multi-user)
5
A Return to Virtual Machines
Disco: Stanford research project (SOSP ’97)
Run commodity OSes on scalable multiprocessors
Focus on high-end: NUMA, MIPS, IRIX
Commercial virtual machines for x86 architecture
VMware Workstation (now EMC) (1999-)
Connectix VirtualPC (now Microsoft)
Research virtual machines for x86 architecture
Xen (SOSP ’03)
plex86
OS-level virtualization
FreeBSD Jails, User-mode-linux, UMLinux
6
Overview
Virtual Machine
A fully protected and isolated copy of the underlying
physical machine’s hardware. (definition by IBM)”
Virtual Machine Monitor
A thin layer of software that's between the hardware
and the Operating system, virtualizing and managing
all hardware resources.
Also known as “Hypervisor”
7
Classification of Virtual Machines
8
Classification of Virtual Machines
Type I VMM is implemented directly on the physical hardware.
VMM performs the scheduling and allocation of the system’s resources.
IBM VM/370, Disco, VMware’s ESX Server, Xen
Type II VMMs are built completely on top of a host OS.
The host OS provides resource allocation and standard execution environment to each “guest OS.”
User-mode Linux (UML), UMLinux
9
Non-Virtualizable Architectures
According to Popek and Goldberg,” an architecture is virtualizable if the set of
sensitive instructions is a subset of the set of privileged instructions.”
x86 Several instructions can read system state in
register CPL 3 without trapping
MIPS KSEG0 bypasses TLB, reads physical memory
directly
10
Type I contd..
Hardware Support for Virtualization
Figure: The hardware support approach to x86 VirtualizationE.g. Intel Vanderpool/VT and AMD-V/SVM
11
Type I contd..
Full Virtualization
Figure : The binary translation approach to x86 Virtualization
E.g. VMware ESX server
12
Type I contd..
Paravirtualization
Figure: The Paravirtualization approach to x86 Virtualization E.g. Xen
13
Type II
Hosted VM Architecture
E.g. VMware Workstation, Connectix VirtualPC
14
Disco : VMM Prototype
Goals
Extend modern OS to run efficiently on shared memory multiprocessors without large changes to the OS.
A VMM built to run multiple copies of Silicon Graphics IRIX operating system on a Stanford Flash shared memory multiprocessor.
15
Problem Description
Multiprocessor in the market (1990s)
Innovative Hardware
Hardware faster than System Software
Customized OS are late, incompatible, and possibly bug
Commodity OS not suited for multiprocessors
Do not scale cause of lock contention, memory architecture
Do not isolate/contain faults
More Processors More failures
16
Solution to the problems
Resource-intensive Modification of OS (hard and time consuming, increase in size, etc)
Make a Virtual Machine Monitor (software) between OS and Hardware to resolve the problem
17
Two opposite Way for System Software
Address these challenges in the operating system: OS-Intensive Hive , Hurricane, Cellular-IRIX, etc innovative, single system image But large effort.
Hard-partition machine into independent failure units: OS-light Sun Enterprise10000 machine Partial single system image Cannot dynamically adapt the partitioning
18
Return to Virtual Machine Monitors
One Compromise Way between OS-intensive & OS-light – VMM
Virtual machine monitors, in combination with commodity and specialized operating systems, form a flexible system software solution for these machines
Disco was introduced to allow trading off between the costs of performance and development cost.
19
Architecture of Disco
20
Advantages of this approach
Scalability
Flexibility
Hide NUMA effect
Fault Containment
Compatibility with legacy applications
21
Challenges Facing Virtual Machines
Overheads
Trap and emulate privileged instructions of guest OS
Access to I/O devices
Replication of memory in each VM
Resource Management
Lack of information to make good policy decisions
Communication and Sharing
Stand alone VM’s cannot communicate
22
Disco’s Interface
Processors MIPS R10000 processor Emulates all instructions, the MMU, trap architecture Extension to support common processor operations
Enabling/disabling interrupts, accessing privileged registers
Physical memory Contiguous, starting at address 0
I/O devices Virtualize devices like I/O, disks, n/w interface exclusive to VM Physical devices multiplexed by Disco Special abstractions for SCSI disks and network interfaces
Virtual disks for VMs Virtual subnet across all virtual machines
23
Disco Implementation
Multi threaded shared memory program
Attention to NUMA memory placement, cache aware data structures and IPC patterns
Code segment of DISCO copied to each flash processor – data locality
Communicate using shared memory
24
Virtual CPUs
Direct Execution execution of virtual CPU on real CPU Sets the real machine’s registers to the virtual CPU’s
Jumps to the current PC of the virtual CPU, Direct execution on the real CPU
Challenges Detection and fast emulation of operations that cannot be
safely exported to the virtual machine privileged instructions such as TLB modification and Direct access to physical memory and I/O devices.
Maintains data structure for each virtual CPU for trap emulation
Scheduler multiplexes virtual CPU on real processor
25
Virtual Physical Memory
Address translation & maintains a physical-to-machine address (40 bit) mapping.
Virtual machines use physical addresses
Software reloaded translation-lookaside buffer (TLB) of the MIPS processor
Maintains pmap data structure for each VM –contains one entry for each physical to virtual mapping
pmap also has a back pointer to its virtual address to help invalidate mappings in the TLB
26
Contd..
Kernel mode references on MIPS processors access memory and I/O directly - need to re-link OS code and data to a mapped address space
MIPS tags each TLB entry with Address space identifiers (ASID)
ASIDs are not virtualized - TLB need to be flushed on VM context switches
Increased TLB misses in workloads Additional Operating system references VM context switches
TLB misses expensive - create 2nd level software -TLB . Idea similar to cache?
27
NUMA Memory management
Cache misses should be satisfied from local memory (fast) rather than remote memory (slow)
Dynamic Page Migration and Replication
Pages frequently accessed by one node are migrated
Read-shared pages are replicated among all nodes
Write-shared are not moved, since maintaining consistency requires remote access anyway
Migration and replacement policy is driven by cache-miss-counting facility provided by the FLASH hardware
28
Transparent Page Replication
1. Two different virtual processors of the same virtual machine logically read-share the same physical page, but each virtual processor accessesa local copy.
2. memmap tracks which virtual page references each physical page. Used during TLB shootdown
29
Disco Memory Management
30
Virtual I/O Devices
Disco intercepts all device accesses from the virtual machine and forwards them to the physical devices
Special device drivers are added to the guest OS Disco device provide monitor call interface to pass all
the arguments in single trap Single VM accessing a device does not require
virtualizing the I/O – only needs to assure exclusivity
31
Copy-on-write Disks
Intercept DMA requests to translate the physical addresses into machine addresses.
Maps machine page as read only to destination address page of DMA Sharing machine memory
Attempts to modify a shared page will result in a copy-on-write fault handled internally by the monitor. Logs are maintained for each VM Modification
Modification made in main memory
Non-persistent disks are copy on write shared E.g. Kernel text and buffer cache
E.g. File systems root disks
32
Transparent Sharing of Pages
Creates a global buffer cache shared across VM's and reduces memory foot print of the system
33
Virtual Network Interface
Virtual subnet and network interface use copy on write mapping to share the read only pages
Persistent disks can be accessed using standard system protocol NFS
Provides a global buffer cache that is transparently shared by independent VMs
34
Transparent sharing of pages over NFS
1. The monitor’s networking device remaps the data page from the source’s machine address space to the destination’s.
2. The monitor remaps the data page from the driver’s mbuf to the clients
buffer cache.
35
Modifications to the IRIX 5.3 OS
Minor changes to kernel code and data segment – specific to MIPS
Relocate the unmapped segment of the virtual machine into the mapped supervisor segment of the processor– Kernel relocation
Disco drivers are same as original device drivers of IRIX
Patched HAL to use memory loads/stores instead of privileged instructions
36
Modifications to the IRIX 5.3 OS
Added code to HAL to pass hints to monitor for resource management
New Monitor calls to MMU to request zeroed page, unused memory reclamation
Changed mbuf management to be page-aligned
Changed bcopy to use remap (with copy-on-write)
37
SPLASHOS: A specialized OS
Thin specialized library OS, supported directly by Disco
No need for virtual memory subsystem since they share address space
Used for the parallel scientific applications that can span the entire machine
38
Disco: Performance
Experimental Setup
Disco targets the FLASH machine not available that time
Used SimOS, a machine simulator that models the hardware of MIPS-based multiprocessors for the Disco monitor.
Simulator was too slow to allow long work loads to be studied
39
Disco: Performance
Workloads
40
Disco: Performance
Execution Overhead
Pmake overhead due to I/O virtualization, others due to TLB mappingReduction of kernel timeOn average virtualization overhead of 3% to 16%
41
Disco: Performance
Memory Overheads
V: Pmake memory used if there is no sharingM: Pmake memory used if there is sharing
42
Disco: Performance
Scalability
Partitioning of problem into different VM’s increases scalability.
Kernel synchronization time becomes smaller.
43
Disco: Performance
Dynamic Page Migration and replication
44
Conclusion
Disco VMM hides NUMA-ness from non-NUMA aware OS
Disco VMM is low(er) effort
Moderate overhead due to virtualization
45
Discussion
Was Disco- VMM done rightly? Virtual Physical Memory on architectures other
than MIPS MIPS TLB is software managed
Not sure of how well other OS perform on Disco since IRIX was designed for MIPS
Not sure how HIVE, Hurricane performs comparatively
Performance of long workloads on the system Performance of heterogeneous VMs e.g. Pmake
case
46
Discussion
Are VMM Microkernels done right?
Top Related