Xen and The Art of Virtualization
Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt & Andrew Warfield
Presented by Jianmin Chen, 10/19/2007
Motivation
Run up to 100 virtual machine instances simultaneously on a modern server.
Concerns
Variety: allow full-blown operating systems of different types on commodity hardware and O/Ss, though some source-code modifications are required
Isolation: guest O/Ss should not affect each other
Efficiency: reduce overhead
Enhanced OS?
It is possible to re-design O/Ss to enforce performance isolation, e.g. resource containers [OSDI'99]
It is difficult to account all resource usage to a process. For instance, the buffer cache is shared across processes; one process may hold a larger share of it than others
Page replacement algorithms generate inter-process interference ("crosstalk")
Classic VM: Full Virtualization
Functionally identical to the underlying machine; allows unmodified operating systems to be hosted
Difficult to handle instructions that are sensitive but not privileged
Not efficient to trap everything (system calls, …)
Sometimes the OS wants both real and virtual resource information: e.g. the timer
Xen
Design principles:
Unmodified applications: essential
Full-blown multi-task O/Ss: essential
Paravirtualization: necessary for performance and isolation
Xen VM interface: Memory
Memory management:
Guests cannot install highest-privilege segment descriptors; the top end of the linear address space is not accessible to them
Guests have direct (untrapped) read access to hardware page tables; writes are trapped and handled by the VMM
Physical memory presented to a guest is not necessarily contiguous
Details: Memory 1
The TLB is challenging:
A software TLB can be virtualized without flushing TLB entries between VM switches
Hardware TLBs tagged with address-space identifiers can also be leveraged to avoid flushing between switches
The x86 TLB is hardware-managed and has no tags…
Decisions:
Guest O/Ss allocate and manage their own hardware page tables, with minimal involvement of Xen, for better safety and isolation
Xen exists in a 64MB section at the top of every VM's address space that is not accessible from the guest
Details: Memory 2
Guest O/Ss have direct read access to hardware page tables, but updates are validated by the VMM through "hypercalls" into Xen (the same applies to segment descriptor tables)
The VMM must ensure that no access to the 64MB Xen section is allowed
A guest O/S may "batch" update requests to amortize the cost of entering the hypervisor
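The validate-and-batch pattern above can be sketched as a toy model. This is not Xen's real API (the names `Hypervisor`, `mmu_update_batch`, and the address constant are invented for illustration); it only shows why batching and validation compose: one guest-to-VMM transition covers many updates, and each update is checked against the reserved Xen region before being applied.

```python
# Toy model of batched, validated page-table updates (illustrative names,
# not Xen's real hypercall interface).
XEN_RESERVED_START = 0xFC000000  # top 64MB of a 4GB address space, per the slides

class Hypervisor:
    def __init__(self):
        self.page_table = {}   # virtual page -> machine frame
        self.hypercalls = 0    # each batch costs one guest->VMM transition

    def mmu_update_batch(self, updates):
        """One hypercall applies many updates; each is validated first."""
        self.hypercalls += 1
        applied = 0
        for vaddr, frame in updates:
            if vaddr >= XEN_RESERVED_START:
                continue       # reject mappings into the Xen-reserved region
            self.page_table[vaddr] = frame
            applied += 1
        return applied

xen = Hypervisor()
# Guest batches three updates; one illegally targets the reserved region.
n = xen.mmu_update_batch([(0x1000, 7), (0x2000, 8), (0xFD000000, 9)])
```

With batching, three updates cost a single hypervisor entry, and the illegal mapping is silently rejected rather than installed.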
Xen VM interface: CPU
CPU:
Guests run at lower privilege than the VMM
Exception handlers must be registered with the VMM
A fast system-call handler can be serviced without trapping to the VMM
Hardware interrupts are replaced by a lightweight event-notification system
Timer interface: both real and virtual time
Details: CPU 1
Protection leverages the availability of multiple privilege "rings"
Intermediate rings have not been used in practice since OS/2; they are x86-specific
An O/S written to use only rings 0 and 3 can be ported; the kernel must be modified to run in ring 1
Details: CPU 2
Frequent exceptions: software interrupts for system calls; page faults
A guest may register a 'fast' exception handler for system calls that the CPU invokes directly in ring 1, without switching to ring 0/Xen
Each handler is validated before being installed in the hardware exception table, to make sure nothing executes with ring-0 privilege
This doesn't work for page faults, since only ring-0 code can read the faulting address from register CR2
Xen VM interface: I/O
I/O:
Virtual devices are exposed as asynchronous I/O rings to guests
Event notification replaces interrupts
Details: I/O 1
Xen does not emulate hardware devices; it exposes device abstractions for simplicity and performance
I/O data is transferred to/from guests via Xen using shared-memory buffers
Virtualized interrupts: a lightweight event-delivery mechanism from Xen to guests
Xen updates a bitmap in shared memory; optional call-back handlers are registered by the O/S
Details: I/O 2
I/O descriptor ring (figure)
OS Porting Cost
Number of lines of code modified or added, compared with the original x86 code base (excluding device drivers):
Linux: 2995 (1.36%)
Windows XP: 4620 (0.04%)
Changes include re-writing privileged routines and removing low-level system initialization code
Control Transfer
Guests synchronously call into the VMM: explicit control transfers from the guest O/S to the monitor, called "hypercalls"
The VMM delivers notifications to guest O/Ss, e.g. when data from an I/O device is ready
Asynchronous event mechanism: a guest O/S does not see hardware interrupts, only Xen notifications
Event notification
Pending events are stored in a per-domain bitmask, e.g. "incoming network packet received"
Updated by Xen before invoking the guest O/S handler
A Xen-readable flag may be set by a domain to defer handling, based on time or the number of pending requests
Analogous to disabling interrupts
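The mechanism above can be sketched as a toy model. The class and function names are invented; the point is the two shared pieces of state: a pending-event bitmask that Xen sets, and a guest-controlled mask flag that defers the upcall, just as disabling interrupts would.

```python
# Toy model of a per-domain pending-event bitmask with a guest-controlled
# mask flag (illustrative, not Xen's real event-channel interface).
class Domain:
    def __init__(self):
        self.pending = 0       # bitmask of pending event channels
        self.masked = False    # set by the guest to defer callbacks
        self.delivered = []

    def handle_events(self):
        # Drain pending events, lowest channel first.
        while self.pending:
            chan = (self.pending & -self.pending).bit_length() - 1
            self.pending &= ~(1 << chan)
            self.delivered.append(chan)

def xen_send_event(dom, chan):
    dom.pending |= 1 << chan   # Xen sets the bit in shared memory
    if not dom.masked:
        dom.handle_events()    # upcall into the guest's handler

d = Domain()
d.masked = True                # guest defers handling for a while
xen_send_event(d, 3)
xen_send_event(d, 1)
deferred = list(d.delivered)   # nothing delivered while masked
d.masked = False
d.handle_events()              # guest now processes the deferred events
```

While masked, events accumulate in the bitmask instead of interrupting the guest; unmasking and draining delivers them in one pass.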
Data Transfer: Descriptor Ring
Descriptors are allocated by a domain (guest) and accessible from Xen
Descriptors do not contain I/O data; instead, they point to data buffers also allocated by the domain (guest)
This facilitates zero-copy transfers of I/O data into a domain
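A descriptor ring of this kind can be sketched as a toy model. The structure here is simplified (real Xen rings carry separate request and response producer/consumer indices, and the buffer references would be machine addresses): a fixed-size ring holds references to guest-allocated buffers, the guest advances a producer index to post requests, and Xen advances a consumer index as it services them.

```python
# Toy model of an I/O descriptor ring: descriptors hold references to
# guest-allocated buffers, not the data itself (simplified from Xen's
# real split request/response rings).
RING_SIZE = 4

class DescRing:
    def __init__(self):
        self.ring = [None] * RING_SIZE
        self.req_prod = 0    # guest advances after posting a request
        self.req_cons = 0    # Xen advances after servicing a request

    def post_request(self, buf_ref):
        assert self.req_prod - self.req_cons < RING_SIZE, "ring full"
        self.ring[self.req_prod % RING_SIZE] = buf_ref
        self.req_prod += 1

    def consume(self):
        # Xen services the oldest outstanding descriptor.
        assert self.req_cons < self.req_prod, "ring empty"
        buf_ref = self.ring[self.req_cons % RING_SIZE]
        self.req_cons += 1
        return buf_ref

r = DescRing()
r.post_request("buffer-A")
r.post_request("buffer-B")
first = r.consume()
second = r.consume()
```

Because the descriptors carry only buffer references, the I/O data itself never moves through the ring, which is what enables the zero-copy transfers mentioned above.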
Network Virtualization
Each domain has one or more virtual network interfaces (VIFs)
Each VIF has two I/O rings (send, receive)
Each direction also has rules of the form (<pattern>, <action>) that are inserted by domain 0 (the management domain)
Xen models a virtual firewall+router (VFR) to which all domain VIFs connect
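The (pattern, action) rule list can be sketched as a toy model. The patterns, actions, and packet fields below are invented for illustration; the point is the shape of the mechanism: domain 0 installs an ordered rule list, and the VFR applies the first matching rule to each packet, falling through to a default.

```python
# Toy model of VFR (pattern, action) rules installed by domain 0
# (illustrative patterns and actions, not Xen's real rule format).
rules = [
    # NAT-style rewrite for packets from a private source range:
    (lambda pkt: pkt["src_ip"].startswith("10."), "rewrite-src"),
    # Block inbound ssh in this toy policy:
    (lambda pkt: pkt["dst_port"] == 22, "drop"),
]

def apply_rules(pkt):
    """Return the action of the first matching rule, else forward."""
    for pattern, action in rules:
        if pattern(pkt):
            return action
    return "forward"

a1 = apply_rules({"src_ip": "10.0.0.5", "dst_port": 80})
a2 = apply_rules({"src_ip": "192.168.1.2", "dst_port": 22})
a3 = apply_rules({"src_ip": "8.8.8.8", "dst_port": 80})
```

First-match semantics matter here: a packet from `10.0.0.5` to port 22 would be rewritten, not dropped, because the NAT rule precedes the drop rule.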
Network Virtualization
Packet transmission:
The guest adds a request to the I/O ring
Xen copies the packet header and applies matching filter rules, e.g. changing the header's source IP address for NAT
No change to the payload; pages holding the payload must be pinned in physical memory until DMA to the physical NIC completes
A round-robin packet scheduler is used
Network Virtualization
Packet reception:
Xen applies pattern-matching rules to determine the destination VIF
The guest O/S is required to exchange an unused page frame for each packet received; Xen exchanges the packet buffer for a page frame in the VIF's receive ring
If no receive frame is available, the packet is dropped
This avoids Xen-to-guest copies, but requires page-aligned receive buffers to be queued at the VIF's receive ring
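The page-exchange step above can be sketched as a toy model. The class and method names are invented; what it shows is the bargain the guest makes: it must keep empty page frames queued, Xen trades one frame for each received packet's page, and a packet arriving with no frame queued is simply dropped rather than copied.

```python
# Toy model of receive-side page flipping on a VIF (illustrative names,
# not Xen's real receive-ring interface).
class VifReceive:
    def __init__(self):
        self.free_frames = []   # unused page frames queued by the guest
        self.received = []      # packet pages handed to the guest
        self.dropped = 0

    def guest_post_frame(self, frame):
        self.free_frames.append(frame)

    def xen_deliver(self, packet_page):
        if not self.free_frames:
            self.dropped += 1   # no frame to exchange: drop the packet
            return
        self.free_frames.pop()  # frame handed over to Xen in exchange
        self.received.append(packet_page)  # packet page flipped to guest

vif = VifReceive()
vif.guest_post_frame("frame-1")
vif.xen_deliver("pkt-A")        # exchanged against frame-1
vif.xen_deliver("pkt-B")        # dropped: no frame available
```

The exchange keeps the total number of pages owned by the domain constant, which is why a copy is never needed.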
Disk virtualization
Domain 0 has access to physical disks (currently SCSI and IDE)
All other domains see virtual block devices (VBDs), created and configured by management software in domain 0
Accessed via the I/O ring mechanism
Xen may reorder requests based on its knowledge of the disk layout
Disk virtualization
Xen maintains translation tables for each VBD, used to map a VBD (ID, offset) request to the corresponding physical device and sector address
Zero-copy data transfers take place using DMA between the disk and memory pages pinned by the requesting domain
Scheduling: batches of requests are served in round-robin fashion across domains
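The (ID, offset) to (device, sector) mapping can be sketched as a toy model. The extent layout, device names, and extent size below are entirely invented; the point is only the translation step: Xen splits the VBD offset into an extent index and an offset within that extent, then adds the extent's base sector on its backing device.

```python
# Toy model of a VBD translation table (invented layout, not Xen's
# real metadata format).
SECTORS_PER_EXTENT = 1000

# Each VBD maps to an ordered list of (physical device, base sector)
# extents, each SECTORS_PER_EXTENT sectors long in this toy layout.
vbd_extents = {
    "vbd0": [("sda", 5000), ("sdb", 0)],
}

def translate(vbd, offset):
    """Map a (VBD, offset) request to a (device, sector) address."""
    extent, within = divmod(offset, SECTORS_PER_EXTENT)
    dev, base = vbd_extents[vbd][extent]
    return dev, base + within

r1 = translate("vbd0", 42)      # falls in the first extent, on sda
r2 = translate("vbd0", 1005)    # falls in the second extent, on sdb
```

A single VBD can thus span several physical devices while presenting the guest with one flat, contiguous block address space.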
Evaluation
Microbenchmarks
stat, open, close, fork, exec, etc.
Xen shows overheads of up to 2x with respect to native Linux (context switch across 16 processes; mmap latency)
VMware shows up to 20x overheads (context switch; mmap latency)
UML shows up to 200x overheads (fork, exec, mmap), though it beats VMware on context switches
Thank you