Xen and the Art of Virtualization
Yoon Jun Kee (윤 준 기), Embedded System Lab.

Basic idea


Xen

What is Xen?

A high-performance, resource-managed virtual machine monitor (VMM).

Xen enables applications such as server consolidation, co-located hosting facilities, distributed web services, secure computing platforms, and application mobility.

Xen 4.5


INTRODUCTION

VMM: a high-performance, resource-managed virtual machine monitor.

Successful partitioning of a machine poses several challenges:

1. VMs must be isolated from one another.

2. A variety of different operating systems must be supported.

3. The performance overhead introduced by virtualization should be small.

Xen makes it possible to dynamically instantiate an operating system.

Performance isolation: the scheduling, memory demand, network traffic, and disk accesses of one VM should not impact the performance of others.


XEN: APPROACH & OVERVIEW

Paravirtualization vs. full virtualization


XEN: APPROACH & OVERVIEW

Design principles

1. Support for unmodified application binaries is essential, or users will not transition to Xen.

2. Supporting full multi-application operating systems is important, as this allows complex server configurations to be virtualized within a single guest OS instance.

3. Paravirtualization is necessary to obtain high performance and strong resource isolation on uncooperative machine architectures such as x86.

4. Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance.


XEN: APPROACH & OVERVIEW

Xen design

Xen is intended to scale to approximately 100 virtual machines running industry-standard applications and services.

Xen differs from Denali, which targets a similar scale, in several respects:

1) Denali does not target existing ABIs, and so can elide certain architectural features from their VM interface.

2) The Denali implementation does not address the problem of supporting application multiplexing, nor multiple address spaces, within a single guest OS.

3) In the Denali architecture the VMM performs all paging to and from disk.

4) Denali virtualizes the ‘namespaces’ of all machine resources, taking the view that no VM can access the resource allocations of another VM if it cannot name them.


The Virtual Machine Interface

Although certain parts of the implementation, such as memory management, are specific to the x86, many aspects (such as the virtual CPU and I/O devices) can be readily applied to other machine architectures.


Typical System call


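Xen: CPU

x86 supports 4 privilege levels (rings): 0 for the OS and 3 for applications.

Xen downgrades the privilege of guest OSes: they run in ring 1 instead of ring 0.

Guest OSes register their system-call and page-fault handlers with Xen.

With "fast handlers", frequent exceptions such as system calls are delivered directly to the guest and Xen isn't involved. Page faults must still pass through Xen.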
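
The handler registration described on this slide can be sketched as a toy Python model. This is not Xen code: the class `ToyHypervisor` and its methods are hypothetical names chosen for illustration. It assumes, following the Xen paper, that system calls get a fast handler while page faults are bounced through the hypervisor.

```python
# Toy model of exception dispatch under paravirtualization (illustrative only).
SYSCALL, PAGE_FAULT = 0x80, 14  # x86 vectors: Linux "int 0x80" and the page fault

class ToyHypervisor:
    def __init__(self):
        self.handlers = {}         # vector -> guest handler registered with the hypervisor
        self.fast_vectors = set()  # vectors delivered to the guest without the hypervisor
        self.bounced = 0           # exceptions that had to pass through the hypervisor

    def register_handler(self, vector, handler, fast=False):
        self.handlers[vector] = handler
        if fast:
            self.fast_vectors.add(vector)

    def raise_exception(self, vector, info):
        handler = self.handlers[vector]
        if vector not in self.fast_vectors:
            self.bounced += 1      # slow path: the hypervisor forwards the exception
        return handler(info)

hv = ToyHypervisor()
hv.register_handler(SYSCALL, lambda n: f"syscall {n}", fast=True)
hv.register_handler(PAGE_FAULT, lambda addr: f"fault at {addr:#x}")
results = [hv.raise_exception(SYSCALL, 4), hv.raise_exception(PAGE_FAULT, 0x1000)]
```

Only the page fault increments `hv.bounced`, which is the point of the fast path: the common system-call case never enters the hypervisor.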


Control transfer


Virtualizing Device I/O

Xen exposes device abstractions.

I/O data is transferred to and from each domain via Xen, using shared-memory, asynchronous buffer-descriptor rings.


Data Transfer: I/O Rings

Two main factors have shaped the design of the I/O-transfer mechanism:

1. Resource management

2. Event notification

I/O buffers are protected during data transfer by pinning the underlying page frames within Xen.


Data Transfer: I/O Rings

The rings decouple the production of requests or responses from the notification of the other party.

1. Requests: a domain may enqueue multiple entries before invoking a hypercall to alert Xen.

2. Responses: a domain can defer delivery of a notification event by specifying a threshold number of responses.

This allows each domain to trade off latency and throughput requirements, similarly to the flow-aware interrupt dispatch in the ArseNIC Gigabit Ethernet interface.
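
A minimal sketch of such a ring, assuming a single guest and a single backend. The class `IORing`, its method names, and the descriptor format are hypothetical, not Xen's actual ring interface. It shows one notification covering a batch of requests, and response delivery deferred until a threshold is reached.

```python
class IORing:
    """Toy descriptor ring: requests flow guest -> backend, responses flow back."""

    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the guest when it enqueues a request
        self.req_cons = 0   # advanced by the backend when it takes a request
        self.rsp_prod = 0   # advanced by the backend when it posts a response
        self.rsp_cons = 0   # advanced by the guest when it reads a response
        self.notifications = 0

    def enqueue_request(self, descriptor):
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = descriptor
        self.req_prod += 1  # no notification is sent here

    def notify_backend(self):
        """Stands in for the hypercall that alerts Xen; one call serves a whole batch."""
        self.notifications += 1
        while self.req_cons < self.req_prod:
            idx = self.req_cons % self.size
            self.slots[idx] = ("done", self.slots[idx])  # the response reuses the slot
            self.req_cons += 1
            self.rsp_prod += 1

    def deliver_responses(self, threshold=1):
        """Hand responses to the guest only once `threshold` of them are pending."""
        if self.rsp_prod - self.rsp_cons < threshold:
            return []
        out = []
        while self.rsp_cons < self.rsp_prod:
            out.append(self.slots[self.rsp_cons % self.size])
            self.rsp_cons += 1
        return out

ring = IORing(size=4)
for block in (10, 11, 12):
    ring.enqueue_request(("read", block))
ring.notify_backend()                            # one notification for three requests
early = ring.deliver_responses(threshold=4)      # below the threshold: nothing yet
responses = ring.deliver_responses(threshold=3)  # threshold met: all three delivered
```

A higher threshold means fewer events per response (better throughput) at the cost of waiting longer for the first one (worse latency).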


Memory management

Virtualizing memory is easiest when the hardware provides a software-managed TLB. Unfortunately, x86 does not have one.

The x86 TLB is also not tagged, so address-space switches typically require a complete TLB flush.

Given these limitations, Xen makes two design decisions:

1. Guest OSes are responsible for allocating and managing the hardware page tables.

2. Xen exists in a 64MB section at the top of every address space, thus avoiding a TLB flush when entering and leaving the hypervisor.

The OS must relinquish direct write privileges to the page-table memory: all subsequent updates must be validated by Xen.


Xen Control and Management


Subsystem Virtualization

CPU scheduling

Xen currently schedules domains according to the Borrowed Virtual Time (BVT) scheduling algorithm.

Fast dispatch is particularly important to minimize the effect of virtualization on OS subsystems that are designed to run in a timely fashion.
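
The core idea of BVT can be sketched in a few lines. This is a simplified illustration, not Xen's scheduler: the `Domain` class, `run_bvt`, and the fixed `quantum` are hypothetical, and real BVT details such as warp time limits and the context-switch allowance are omitted. Each domain's virtual time advances inversely to its weight, the domain with the smallest effective virtual time runs next, and a latency-sensitive domain can "warp" (borrow virtual time) to be dispatched sooner.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    weight: int        # larger weight -> larger CPU share
    warp: float = 0.0  # virtual time borrowed by a latency-sensitive domain
    avt: float = 0.0   # actual virtual time consumed so far

    @property
    def evt(self):
        """Effective virtual time: what the scheduler compares."""
        return self.avt - self.warp

def run_bvt(domains, slices, quantum=10.0):
    """Run `slices` scheduling decisions and return the names dispatched, in order."""
    trace = []
    for _ in range(slices):
        current = min(domains, key=lambda d: d.evt)  # smallest EVT runs next
        current.avt += quantum / current.weight      # heavier domains age more slowly
        trace.append(current.name)
    return trace

# Weighted sharing: A has twice B's weight, so it gets twice the slices.
trace = run_bvt([Domain("A", weight=2), Domain("B", weight=1)], slices=6)

# Fast dispatch: "io" has used more CPU but warps ahead of "bulk".
first = run_bvt([Domain("bulk", weight=1), Domain("io", weight=1, warp=50.0, avt=30.0)],
                slices=1)[0]
```

The second example is the fast-dispatch case from the slide: a domain that wakes up for I/O runs immediately, even though it is "ahead" in consumed CPU time.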


Subsystem Virtualization

Time and timers

Each guest OS can program a pair of alarm timers, one for real time and the other for virtual time.

1. Real time is expressed in nanoseconds passed since machine boot. It is maintained to the accuracy of the processor's cycle counter and can be frequency-locked to an external time source.

2. Virtual time: a domain's virtual time only advances while the domain is executing.

Guest OSes are expected to maintain internal timer queues and use the Xen-provided alarm timers to trigger the earliest timeout.
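
A guest-side timer queue of this kind might look like the following sketch. The class `GuestTimers` and the `set_alarm` callback are hypothetical stand-ins for the guest's timer code and the Xen alarm interface. The queue is a min-heap, and the single alarm is always programmed for the earliest pending deadline.

```python
import heapq

class GuestTimers:
    def __init__(self, set_alarm):
        self._queue = []             # min-heap of (deadline_ns, seq, callback)
        self._seq = 0                # tie-breaker so callbacks are never compared
        self._set_alarm = set_alarm  # hypothetical stand-in for programming the alarm

    def add(self, deadline_ns, callback):
        heapq.heappush(self._queue, (deadline_ns, self._seq, callback))
        self._seq += 1
        self._set_alarm(self._queue[0][0])  # always program the earliest timeout

    def on_alarm(self, now_ns):
        """Run every timer that has expired, then re-arm for the next one."""
        fired = []
        while self._queue and self._queue[0][0] <= now_ns:
            _, _, callback = heapq.heappop(self._queue)
            fired.append(callback())
        if self._queue:
            self._set_alarm(self._queue[0][0])
        return fired

programmed = []                      # records every deadline passed to the alarm
timers = GuestTimers(programmed.append)
timers.add(5_000, lambda: "b")
timers.add(2_000, lambda: "a")
timers.add(9_000, lambda: "c")
fired = timers.on_alarm(now_ns=6_000)
```

After the alarm at 6,000 ns, timers "a" and "b" have run and the alarm is re-armed for 9,000 ns, so the guest needs only one hardware-backed alarm no matter how many timers it tracks.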


Subsystem Virtualization

Virtual address translation

No shadow page tables (unlike VMware).

Xen provides constrained but direct MMU updates.

All guest OSes have read-only access to page tables.

Updates are batched into a single hypercall.

Updates must be validated by Xen.

Guest OSes are responsible for allocating and managing pages within their own domain.

Xen exists in a generally unused section at the top of every address space, to avoid a TLB flush when entering and leaving the hypervisor.
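
The batch-and-validate pattern can be illustrated with a toy model. The class `ToyMMU` and its `apply_batch` method are hypothetical and do not mirror Xen's real hypercall signature. The only validation rule modelled here is that a guest may map only machine frames it owns, and rejected updates are simply skipped, which is a simplification.

```python
class ToyMMU:
    def __init__(self, owned_frames):
        self.owned = set(owned_frames)  # machine frames belonging to this domain
        self.page_table = {}            # virtual page number -> machine frame
        self.hypercalls = 0

    def apply_batch(self, batch):
        """One simulated hypercall that validates and applies many (vpn, frame) updates."""
        self.hypercalls += 1
        applied = 0
        for vpn, frame in batch:
            if frame not in self.owned:
                continue                # validation failed: frame is not the guest's
            self.page_table[vpn] = frame
            applied += 1
        return applied

mmu = ToyMMU(owned_frames=range(100, 104))
applied = mmu.apply_batch([(0, 100), (1, 101), (2, 999), (3, 103)])
```

Four updates cost a single hypercall, and the attempt to map frame 999, which the domain does not own, never reaches the page table.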


Subsystem Virtualization

Physical memory

Reserved at domain creation time.

Memory is statically partitioned among domains.

Does not guarantee contiguous regions of memory.

Supports hardware-to-physical mapping by providing a shared translation array readable by all domains.
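
The two directions of translation can be sketched as follows. The names `machine_to_phys` and `build_domain` are hypothetical. The sketch assumes, as in the Xen paper, that each guest keeps its own physical-to-hardware table while the hardware-to-physical array is shared.

```python
machine_to_phys = {}  # shared array: machine frame -> pseudo-physical frame

def build_domain(machine_frames):
    """Give a domain a contiguous pseudo-physical view of scattered machine frames."""
    phys_to_machine = list(machine_frames)  # list index = pseudo-physical frame
    for pfn, mfn in enumerate(phys_to_machine):
        machine_to_phys[mfn] = pfn          # record the reverse mapping
    return phys_to_machine

p2m_a = build_domain([7, 3, 12])  # domain A owns non-contiguous frames 7, 3, 12
p2m_b = build_domain([5, 9])      # domain B owns frames 5 and 9
```

Domain A sees frames 0, 1, 2 as contiguous memory even though the machine frames behind them are scattered, which is how a guest copes with the lack of a contiguity guarantee.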


Subsystem Virtualization

Network

A virtual firewall-router is attached to all domains.

Round-robin packet scheduler.

To send a packet, a domain enqueues a buffer descriptor onto the transmit ring.

Uses scatter-gather DMA (no packet copying). On the receive path:

1. A domain must exchange a page frame for each packet it receives, to avoid copying.

2. Receive buffers must be page-aligned.
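
A round-robin packet scheduler of the kind mentioned on this slide can be sketched as below. The function `round_robin` and the per-domain queue layout are hypothetical, and the slide gives no detail on Xen's actual implementation.

```python
from collections import deque

def round_robin(queues):
    """Return (domain, packet) pairs, taking one packet per non-empty queue in turn."""
    pending = deque((name, deque(pkts)) for name, pkts in queues.items() if pkts)
    order = []
    while pending:
        name, queue = pending.popleft()
        order.append((name, queue.popleft()))
        if queue:                      # domains with packets left rejoin at the back
            pending.append((name, queue))
    return order

order = round_robin({"A": [1, 2, 3], "B": [4], "C": []})
```

Domain B's single packet is sent second rather than waiting behind all of A's traffic, so a busy domain cannot starve a quiet one.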


Subsystem Virtualization

Disk

Only Domain0 has direct access to disks.

Other domains need to use virtual block devices:

1. They use the I/O ring.

2. The guest reorders requests prior to enqueuing them on the ring.

3. If permitted, Xen will also reorder requests to improve performance.

Uses DMA (zero copy).
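
As an illustration of guest-side reordering, the sketch below uses a one-directional elevator (C-SCAN) ordering. This is a swapped-in textbook technique chosen for the example: the slide does not say which algorithm the guest or Xen uses, and `elevator_order` is a hypothetical helper.

```python
def elevator_order(sectors, head):
    """Serve requests at or beyond the head position first, then wrap around."""
    ahead = sorted(s for s in sectors if s >= head)
    behind = sorted(s for s in sectors if s < head)
    return ahead + behind

ordered = elevator_order([98, 183, 37, 122, 14], head=53)
```

The guest would enqueue the requests on the I/O ring in the order of `ordered`, so the disk head sweeps in one direction instead of seeking back and forth.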


The Cost of Porting an OS to Xen

More changes were required in Windows XP, mainly due to the presence of legacy 16-bit emulation code and the need for a somewhat different boot-loading mechanism.


Evaluation

Based on Linux 2.4.21 (neither the XP nor the NetBSD port was fully functional).

Thoroughly compared to 2 other systems:

1. VMware Workstation (binary translation)

2. UML (runs Linux as a Linux process)

Performs better than solutions with restrictive licenses (ESX Server)


Relative Performance


Operating System Benchmarks

As expected, fork, exec, and sh require a large number of page-table updates, which slows them down.

On the upside, these updates can be batched (up to 8MB of address space constructed per hypercall).
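
To put the 8MB figure in perspective, and assuming the standard 4KB x86 page size (an assumption, since the slide does not state it), a single batched hypercall can cover this many page-table entries:

```latex
\frac{8\ \text{MB}}{4\ \text{KB}}
  = \frac{2^{23}\ \text{bytes}}{2^{12}\ \text{bytes}}
  = 2^{11}
  = 2048\ \text{page-table entries}
```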


Operating System Benchmarks

Context switching in a guest OS incurs the overhead of a hypercall (needed to change the page-table base).

The larger the working set, the smaller the relative overhead.


Operating System Benchmarks

A page fault takes 2 transitions into Xen:

1. One for the page-fault handler.

2. One to actually get the page.


Operating System Benchmarks

Page flipping really pays off: no unnecessary data copying.

More overhead for smaller packets: every header still has to be processed.


Concurrent Virtual Machines

Unexpectedly low SMP performance for 1 instance of Apache.

As expected, adding another domain leads to a sharp jump in performance under Xen.

More domains mean more overhead.


Concurrent Virtual Machines

Performance differentiation works as expected with IR.

But it fails with OLTP, probably due to inefficiencies in the disk scheduling algorithm.


Isolation

Run uncooperative user applications and see if they bring down the system.

2 "bad" domains vs. 2 "good" ones.

Xen delivers good performance even in this case.


Scalability

Very low footprint per domain (4-6MB memory, 20KB state).

The benchmark is compute-bound and Linux assigns long time slices, so Xen needs some tuning to match it.

Even without tuning, Xen does quite well (but no absolute values are given).


Thank you! Any questions?