Symbiotic Virtualization John R. Lange Ph.D. Final Defense Department of Electrical Engineering and Computer Science Northwestern University

Page 1

Symbiotic Virtualization

John R. Lange
Ph.D. Final Defense

Department of Electrical Engineering and Computer Science
Northwestern University

August 9, 2010

Page 2

Thesis

• Current VMM architectures use low-level interfaces
  – Do not expose high-level semantic information

• VMMs have little knowledge of internal guest behavior
  – Large semantic gap

• Symbiotic Virtualization
  – New approach for designing virtual architectures
  – Exposes high-level semantic information
    • Bidirectional synchronous and asynchronous communication

Page 3

Outline

• Palacios
  – Developed first VMM for scalable High Performance Computing (HPC)

• Largest-scale study of virtualization
  – Proved HPC virtualization is effective at scale

• Symbiotic Virtualization (SymSpy)
  – High-level interfaces to enable guest/VMM cooperation

• SymCall and SwapBypass
  – Leverage symbiotic interfaces to improve swap performance

• SymMod
  – Extend guest OS with arbitrary interfaces/functionality

Page 4

Past and Current Research

• Palacios: Symbiotic virtualization at scale
  – In progress: Scalable virtualization for HPC
  – In progress: Symbiotic Virtualization
  – IPDPS 2010: Palacios VMM and scaling
  – In progress: Adaptive virtual paging
  – WIOV 2008: Virtual passthrough I/O
  – OSR 2009: Virtual passthrough I/O

• Empathic Systems: Bridging systems and HCI
  – INFOCOM 2010: Empathic networks
  – SIGMETRICS 2009: Empathic networks
  – USENIX 2008: Speculative remote display

• Virtuoso: Adaptive virtual infrastructure as a service (IaaS) cloud
  – HPDC 2007: Transparent network services
  – ICAC 2006: Formalization of cloud adaptation
  – MAMA 2005: Formalization of cloud adaptation
  – HPDC 2005: Automatic optical network reservations
  – Patent # 20080155537

• Vortex: Cooperative traffic aggregation for intrusion detection systems
  – RAID 2007: Cooperative selective wormholes

Page 5

What are VMMs currently used for?

• Enterprise and data centers
  – Server consolidation
  – Fault tolerance
  – Legacy application support
  – Debugging
  – Isolation
  – Virtual appliances
  – Failover and disaster recovery

• None designed specifically for other areas
  – HPC, education, architecture research

$16.70 Billion $7.58 Billion

Page 6

Virtualization in HPC

• Fault tolerance
  – RedStorm MTBI target: 50 hours
  – RedStorm minimum TTR: 30 minutes to 1 hour

• Broader usage
  – Allow applications to select best OS

• Only if it doesn't degrade performance…
  – Tightly coupled parallel applications
  – Very large scale

A.B. Nagarajan, F. Mueller, C. Engelmann, and S.L. Scott, "Proactive Fault Tolerance for HPC with Xen Virtualization," ICS 2007

Page 7

Scalability

• Scale causes many problems
  – Small intra-node performance losses become very large at scale

• Per-node overhead has a butterfly effect
  – OS timers, deferred work, kernel threads
  – "OS noise"

• Linux reduces performance by X%
• Linux delivers performance of ~0%

• Performance at scale is not always correlated with local performance
  – A 5% loss on 1 node does not equal a 5% loss at scale
  – Example: Paragon
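Why a small per-node loss need not stay small at scale can be made concrete with a simple bulk-synchronous model. The model is an illustrative assumption, not a description of RedStorm or of the measurements in this talk: every iteration ends in a global barrier, so it runs as slowly as its slowest node, and each node independently suffers one noise event of length `delay` with probability `p` per iteration. The probability that at least one of `n` nodes is delayed is then 1 - (1 - p)^n.

```python
def p_iteration_delayed(p: float, n: int) -> float:
    """Probability that at least one of n nodes suffers a noise event in an iteration."""
    return 1.0 - (1.0 - p) ** n


def expected_slowdown(p: float, n: int, work: float, delay: float) -> float:
    """Expected iteration time divided by the noise-free iteration time.

    Assumes a barrier after every iteration (one delayed node delays all nodes)
    and at most one noise event of length `delay` per node per iteration.
    """
    expected_time = work + p_iteration_delayed(p, n) * delay
    return expected_time / work


if __name__ == "__main__":
    # Hypothetical parameters: 1 ms of work per iteration and a 50 us noise
    # event that hits any given node in 1% of iterations.
    for nodes in (1, 64, 4096):
        print(nodes, round(expected_slowdown(0.01, nodes, 1e-3, 50e-6), 4))
```

With these made-up numbers a single node loses about 0.05%, while 4096 nodes lose close to the full delay/work ratio of 5%, because nearly every iteration waits on some node's noise.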

Page 8

Kitten

• Open-source lightweight kernel (LWK)
  – Exports Linux-compatible ABI
    • Subset of features

• LWK for a wide range of HPC applications
  – Open-source LWK in the Catamount lineage

• Contributing developer
  – http://software.sandia.gov/trac/kitten
  – http://code.google.com/p/kitten/

Page 9

Palacios VMM

• OS-independent embeddable virtual machine monitor

• Developed at Northwestern and University of New Mexico
  – Lead developer and graduate student

• Open source and freely available
  – Downloaded over 1000 times as of July

• Users:
  – Kitten: Lightweight supercomputing OS from Sandia National Labs
  – MINIX 3
  – Modified Linux versions

• Successfully used on supercomputers, clusters (InfiniBand and Ethernet), and servers

http://www.v3vee.org/palacios
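One common way to make a VMM embeddable in different host OSes is for the host to hand the VMM a small table of callbacks for the services it needs (memory allocation, logging, and so on), so the VMM core never calls a specific kernel's internal API. The sketch below illustrates that idea under that assumption; `HostOSHooks`, its fields, `EmbeddedVMM`, and the toy host are hypothetical names, not taken from Palacios.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PAGE_SIZE = 4096


@dataclass
class HostOSHooks:
    """Services the host OS supplies to the VMM (hypothetical interface)."""
    allocate_pages: Callable[[int], int]    # page count -> address of a contiguous run
    free_pages: Callable[[int, int], None]  # (address, page count) -> None
    log: Callable[[str], None]


class EmbeddedVMM:
    """VMM core that reaches the host only through the hooks it was given."""

    def __init__(self, hooks: HostOSHooks) -> None:
        self.hooks = hooks

    def create_guest(self, mem_pages: int) -> int:
        base = self.hooks.allocate_pages(mem_pages)
        self.hooks.log(f"guest memory: {mem_pages} pages at {base:#x}")
        return base

    def destroy_guest(self, base: int, mem_pages: int) -> None:
        self.hooks.free_pages(base, mem_pages)
        self.hooks.log(f"freed {mem_pages} pages at {base:#x}")


def make_toy_host(start: int = 0x100000) -> Tuple[HostOSHooks, List[str]]:
    """A toy host with a bump allocator, standing in for a real host OS."""
    messages: List[str] = []
    next_free = [start]

    def allocate_pages(count: int) -> int:
        addr = next_free[0]
        next_free[0] += count * PAGE_SIZE
        return addr

    hooks = HostOSHooks(
        allocate_pages=allocate_pages,
        free_pages=lambda addr, count: None,  # the toy host never reuses memory
        log=messages.append,
    )
    return hooks, messages
```

Porting such a VMM to another host would then mean writing a new set of hooks rather than changing the VMM core.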

Page 10

People

• Myself: Lead architect and developer

• Peter Dinda and Patrick Bridges
  – Project PIs

• Lei Xia, Chang Bae, Zheng Cui, Phil Soltero, and Yuan Tang
  – Graduate students

• Andy Gocke, Steven Jaconette, Rob Deloatch, Rumou Duan, Jason Lee, Madhav Suresh, Brad Weinberger, Matt Wojcik, and Peter Kamm
  – Undergraduate students

• Kevin Pedretti and Trammell Hudson
  – Collaborators at Sandia National Labs

• Many others

Page 11

Palacios as an HPC VMM

• Minimalist interface
  – Suitable for an LWK

• Compile-time and runtime configurability
  – Create a VMM tailored to specific environments

• Low noise

• Contiguous memory pre-allocation

• Passthrough resources and resource partitioning
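Contiguous pre-allocation keeps the guest-physical to host-physical mapping simple. The sketch below assumes, for illustration, that a guest's entire physical address space is backed by one contiguous host region; under that assumption translation is a bounds check plus an addition, with no per-page lookup structure to walk. `ContiguousGuestMemory` and `gpa_to_hpa` are hypothetical names.

```python
PAGE_SIZE = 4096


class ContiguousGuestMemory:
    """Guest physical memory backed by one pre-allocated, contiguous host region."""

    def __init__(self, host_base: int, size: int) -> None:
        if host_base % PAGE_SIZE or size % PAGE_SIZE:
            raise ValueError("base and size must be page aligned")
        self.host_base = host_base
        self.size = size

    def gpa_to_hpa(self, gpa: int) -> int:
        """Translate a guest physical address (GPA) to a host physical address (HPA)."""
        if not 0 <= gpa < self.size:
            raise ValueError(f"GPA {gpa:#x} outside guest memory")
        # Contiguity means the whole translation is a single offset.
        return self.host_base + gpa
```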

Page 12

HPC Performance Evaluation

• Virtualization is very useful for HPC, but… only if it doesn't hurt performance

• Virtualized RedStorm with Palacios
  – Evaluated with Sandia's system evaluation benchmarks

• RedStorm: 17th fastest supercomputer
  – Cray XT3, 38208 cores, ~3500 sq ft
  – 2.5 megawatts, $90 million

Page 13

Scalability at Small Scales (Catamount)

[Figure: HPCCG (conjugate gradient solver) scaling; annotations: within 5%, scalable]

Page 14

Comparison of Operating Systems

[Figure: HPCCG (conjugate gradient solver) performance with Catamount and Compute Node Linux; annotation: shadow paging]

Page 15

Large Scale Study

• Evaluation on full RedStorm system
  – 12 hours of dedicated system time on full machine
  – Largest virtualization performance scaling study to date

• Measured performance at exponentially increasing scales
  – Up to 4096 nodes

• Publicity
  – New York Times
  – Slashdot
  – HPCWire
  – Communications of the ACM
  – PC World

Page 16

Scalability at Large Scale (Catamount)

[Figure: CTH (multi-material, large deformation, strong shockwave simulation) scaling; annotations: within 3%, scalable]

Page 17

Summary

• Virtualization can scale
  – Near-native performance for optimized VMM/guest (within 5%)

• VMM needs to know about guest internals
  – Should modify behavior for each guest environment
  – Example: Paging method to use depends on guest

• Black-box inference is not desirable in HPC environments
  – Unacceptable performance overhead
  – Convergence time
  – Mistakes have large consequences

• Need guest cooperation
  – Guest and VMM relationship should be symbiotic (Thesis)

Page 18

Semantic Gap

• VMM architectures are designed as black boxes
  – Explicit low-level OS interface (hardware or paravirtual)
  – Internal OS state is not exposed to the VMM

• Many uses for internal state
  – Performance, security, etc.
  – VMM must recreate that state

• "Bridging the Semantic Gap"
  – [Chen: HotOS 2001]

• Two existing approaches: Black Box and Gray Box
  – Black Box: Monitor external guest interactions
  – Gray Box: Reverse engineer internal guest state
  – Examples
    • Virtuoso Project (early graduate work)
    • Lycosid, Antfarm, Geiger, IBMon, many others

Page 19

Example: Swapping

• Disk storage for expanding physical memory

[Figure: the application memory working set spans guest physical memory and swapped memory on the swap disk; annotation: without internal guest state, the VMM has only basic knowledge]

Page 20

Symbiotic Virtualization

• Bridging the semantic gap is hard
  – Can we design a virtual machine interface with no gap?

• Symbiotic Virtualization
  – Design both guest OS and VMM to minimize semantic gap
  – Bidirectional synchronous and asynchronous communication channels
  – Interfaces are optional
    • Non-symbiotic OS can run on symbiotic VMM
    • Symbiotic OS can run on real hardware

Page 21

Symbiotic Interfaces

• SymSpy Passive Interface
  – Internal state already exists but is hidden
  – Asynchronous bidirectional communication
    • Via shared memory
  – Structured state information that is easily parsed
    • Semantically rich

• SymCall Functional Interface
  – Synchronous upcalls into guest during exit handling
  – API
    • Function call in VMM
    • System call in guest
  – Brand new interface construct

Page 22

Discovery and Configuration

• A symbiotic OS must run on real hardware
  – Interface must be based on hardware features

• CPUID
  – Detection of symbiotic VMM

• MSRs
  – Configuration of symbiotic interfaces
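The discovery and configuration steps can be sketched from the guest's side. Because a symbiotic OS must also boot on real hardware, it first checks a CPUID signature and only then writes a configuration MSR; if the signature is absent it simply runs as a normal OS. The leaf number, signature string, and MSR number below are hypothetical (the slides do not give them), and `cpuid`/`wrmsr` are passed in as functions so the logic can run outside a kernel.

```python
import struct
from typing import Callable, Tuple

# All three constants are hypothetical placeholders.
SYMBIOTIC_CPUID_LEAF = 0x40000000      # within the leaf range hypervisors conventionally use
SYMBIOTIC_SIGNATURE = b"SymbioticVMM"  # 12 bytes, returned in EBX, ECX, EDX
SYMSPY_PAGE_MSR = 0x535                # guest writes the SymSpy page address here

Cpuid = Callable[[int], Tuple[int, int, int, int]]  # leaf -> (EAX, EBX, ECX, EDX)
Wrmsr = Callable[[int, int], None]                  # (MSR number, value) -> None


def detect_symbiotic_vmm(cpuid: Cpuid) -> bool:
    """True if EBX:ECX:EDX of the signature leaf spell out the symbiotic signature."""
    _eax, ebx, ecx, edx = cpuid(SYMBIOTIC_CPUID_LEAF)
    return struct.pack("<III", ebx, ecx, edx) == SYMBIOTIC_SIGNATURE


def configure_symspy(cpuid: Cpuid, wrmsr: Wrmsr, page_gpa: int) -> bool:
    """Enable SymSpy if a symbiotic VMM is present; otherwise leave the OS unchanged."""
    if not detect_symbiotic_vmm(cpuid):
        return False  # real hardware or a non-symbiotic VMM: keep running normally
    wrmsr(SYMSPY_PAGE_MSR, page_gpa)  # the VMM is assumed to intercept this MSR write
    return True


def symbiotic_vmm_cpuid(leaf: int) -> Tuple[int, int, int, int]:
    """Simulated CPUID as a symbiotic VMM would report it."""
    if leaf == SYMBIOTIC_CPUID_LEAF:
        ebx, ecx, edx = struct.unpack("<III", SYMBIOTIC_SIGNATURE)
        return (0, ebx, ecx, edx)
    return (0, 0, 0, 0)
```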

Page 23

SymSpy

• Shared memory page between OS and VMM
  – Global and per-core interfaces

• Standardized data structures
  – Shared state information

• Read and write without exits
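In the sketch below a single `bytearray` stands in for the page that both the guest and the VMM map. A write by one side is visible to the other on its next read with no call in between, which is the property that lets SymSpy avoid exits. The layout (a global header plus one record per core, with the guest owning a PID field and the VMM owning a hints field) and every name are hypothetical.

```python
import struct

PAGE_SIZE = 4096
# Hypothetical layout: a global header followed by one record per virtual core.
GLOBAL_FMT = "<IHH"   # magic, layout version, number of cores
CORE_FMT = "<IQ"      # guest-written current PID, VMM-written hint flags
GLOBAL_SIZE = struct.calcsize(GLOBAL_FMT)   # 8 bytes
CORE_SIZE = struct.calcsize(CORE_FMT)       # 12 bytes
SYMSPY_MAGIC = 0x53594D53                   # hypothetical magic value
MAX_CORES = (PAGE_SIZE - GLOBAL_SIZE) // CORE_SIZE


def init_page(page: bytearray, num_cores: int) -> None:
    if num_cores > MAX_CORES:
        raise ValueError("too many cores for one page")
    struct.pack_into(GLOBAL_FMT, page, 0, SYMSPY_MAGIC, 1, num_cores)


def _core_offset(page: bytearray, core: int) -> int:
    _magic, _version, num_cores = struct.unpack_from(GLOBAL_FMT, page, 0)
    if not 0 <= core < num_cores:
        raise IndexError(core)
    return GLOBAL_SIZE + core * CORE_SIZE


def guest_set_pid(page: bytearray, core: int, pid: int) -> None:
    # The guest writes only its own field, so it never overwrites VMM data.
    struct.pack_into("<I", page, _core_offset(page, core), pid)


def vmm_set_hints(page: bytearray, core: int, hints: int) -> None:
    # The VMM writes only the hints field, 4 bytes into the core record.
    struct.pack_into("<Q", page, _core_offset(page, core) + 4, hints)


def read_core(page: bytearray, core: int) -> tuple:
    return struct.unpack_from(CORE_FMT, page, _core_offset(page, core))
```

Giving each side exclusive ownership of its fields avoids read-modify-write races on shared words; a real implementation would also have to consider memory ordering between cores.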

Page 24

PCI passthrough (with SymSpy)

[Figure: 2-node InfiniBand ping-pong bandwidth measurement (Linux guest on InfiniBand cluster)]

Page 25

SymCall (Symbiotic Upcalls)

• Conceptually similar to system calls
  – System calls: Application requests OS services
  – Symbiotic upcalls: VMM requests OS services

• Designed to be architecturally similar
  – Virtual hardware interface
    • Superset of system call MSRs
  – Internal OS implementation
    • Share same system call data structures and basic operations

• Guest OS configures a special execution context
  – VMM instantiates that context to execute synchronous upcall
  – Symcalls exit via a dedicated hypercall
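The system call analogy can be sketched as follows. The guest registers upcall handlers in a dispatch table, much like a system call table, and writes its upcall entry point and stack into MSRs; the VMM later instantiates that context, the guest dispatches on the call number, and the upcall ends with a hypercall carrying the return value. The MSR numbers, class names, and the use of `-ENOSYS` for an unknown call number are assumptions for illustration, and the VM entry is modeled as an ordinary method call.

```python
import errno
from typing import Callable, Dict, Optional

MSR_SYMCALL_RIP = 0x536  # hypothetical: upcall entry point
MSR_SYMCALL_RSP = 0x537  # hypothetical: upcall stack


class SymbioticGuest:
    def __init__(self) -> None:
        self.symcall_table: Dict[int, Callable[..., int]] = {}

    def configure(self, vmm: "SymbioticVMM", entry: int, stack: int) -> None:
        vmm.wrmsr(MSR_SYMCALL_RIP, entry)
        vmm.wrmsr(MSR_SYMCALL_RSP, stack)

    def symcall_entry(self, vmm: "SymbioticVMM", num: int, *args: int) -> None:
        """Upcall entry point: dispatch like a system call, then exit via hypercall."""
        handler = self.symcall_table.get(num)
        ret = handler(*args) if handler else -errno.ENOSYS
        vmm.hypercall_symcall_return(ret)


class SymbioticVMM:
    def __init__(self) -> None:
        self.msrs: Dict[int, int] = {}
        self._ret: Optional[int] = None

    def wrmsr(self, msr: int, value: int) -> None:
        self.msrs[msr] = value

    def hypercall_symcall_return(self, ret: int) -> None:
        self._ret = ret

    def symcall(self, guest: SymbioticGuest, num: int, *args: int) -> int:
        """Synchronous upcall issued while the VMM is handling an exit."""
        if MSR_SYMCALL_RIP not in self.msrs or MSR_SYMCALL_RSP not in self.msrs:
            raise RuntimeError("guest has not configured a symcall context")
        self._ret = None
        # Stands in for a VM entry at msrs[MSR_SYMCALL_RIP] on msrs[MSR_SYMCALL_RSP].
        guest.symcall_entry(self, num, *args)
        if self._ret is None:
            raise RuntimeError("upcall did not finish with the return hypercall")
        return self._ret
```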

Page 26

SymCall Control Flow

[Figure: the VMM handles an exit, runs the upcall in the guest (which may cause nested exits), and the guest returns to the VMM]

Page 27

Implementation

• Symbiotic Linux guest OS
  – Exports SymSpy and SymCall interfaces

• Palacios
  – Fairly significant modifications to enable nested VM entries
    • Re-entrant exit handlers
    • Serialize subset of guest state out of global hardware structures
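The need for nested VM entries can be illustrated with a toy exit handler. While the VMM is handling one exit it runs an upcall; the guest can cause further exits inside that upcall, and those must be handled re-entrantly before the original exit finishes. The guest state that the upcall overwrites is saved first and restored afterwards. The state fields, exit reasons, and the rule that only a top-level page fault triggers an upcall are hypothetical simplifications.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass
class GuestState:
    """Subset of guest state that an upcall overwrites (hypothetical fields)."""
    rip: int
    rsp: int
    rax: int


class ReentrantExitHandler:
    def __init__(self) -> None:
        self.state = GuestState(rip=0x1000, rsp=0x8000, rax=0)
        self.depth = 0  # how many exits are currently being handled
        self.trace: List[str] = []

    def handle_exit(self, reason: str) -> None:
        self.depth += 1
        self.trace.append("  " * (self.depth - 1) + "exit: " + reason)
        try:
            # Only a top-level page fault triggers an upcall; exits that occur
            # inside the upcall are handled without starting another one.
            if reason == "page_fault" and self.depth == 1:
                self.run_symcall()
        finally:
            self.depth -= 1

    def run_symcall(self) -> None:
        saved = replace(self.state)  # copy the state the upcall will clobber
        self.state = GuestState(rip=0xFFFF0000, rsp=0xFFFF8000, rax=1)  # upcall context
        # Stand-ins for what the guest does while the upcall runs: it touches an
        # unmapped page (a nested exit), then returns via the symcall hypercall.
        self.handle_exit("page_fault")
        self.handle_exit("symcall_return")
        self.state = saved  # resume the original exit with guest state intact
```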

Page 28

SwapBypass

• Purpose: improve performance when swapping
  – Temporarily expand guest memory
  – Completely bypass the Linux swap subsystem

• Enabled by SymCall
  – Not feasible without symbiotic interfaces

• VMM detects guest thrashing
  – Shadow page tables used to prevent it

Page 29

Symbiotic Shadow Page Tables

[Figure: guest page tables (page directory, page table, physical memory) beside shadow page tables (page directory, page table, physical memory, swap disk cache); a swapped-out page in the guest tables maps to a SwapBypass page in the swap disk cache]
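The key idea of the shadow page table figure, mapping a swapped-out guest page directly to a page in the VMM's swap disk cache, can be sketched as a shadow page fault handler. Linux's real swap-entry encoding in a non-present PTE is architecture- and version-specific; here a simplified, hypothetical encoding is assumed (present bit 0, swap offset in the bits above the page offset), and the permission check that would use guest information is omitted. All names are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

PRESENT = 0x1
PAGE_SHIFT = 12
PAGE_MASK = ~((1 << PAGE_SHIFT) - 1)


def swap_offset(guest_pte: int) -> Optional[int]:
    """Hypothetical encoding: a non-zero, non-present PTE stores a swap offset above bit 12."""
    if guest_pte == 0 or guest_pte & PRESENT:
        return None
    return guest_pte >> PAGE_SHIFT


def handle_shadow_fault(
    guest_pte: int,
    swap_cache: Dict[int, int],        # swap offset -> host address of the cached page
    gpa_to_hpa: Callable[[int], int],
) -> Tuple[str, Optional[int]]:
    """Decide how to fill the shadow PTE for a guest page fault."""
    if guest_pte & PRESENT:
        # Ordinary shadow paging: translate the guest's frame to a host frame.
        return ("shadow_map", gpa_to_hpa(guest_pte & PAGE_MASK) | PRESENT)
    offset = swap_offset(guest_pte)
    if offset is not None and offset in swap_cache:
        # SwapBypass: point the shadow PTE at the VMM's cached copy, so the guest
        # never runs its swap-in path for this page.
        return ("swap_bypass", swap_cache[offset] | PRESENT)
    # Not present anywhere the VMM can see: reflect the fault into the guest.
    return ("inject_page_fault", None)
```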

Page 30

SwapBypass Concept

[Figure: Guests 1, 2, and 3, each with guest physical memory, swapped memory, and a swap disk holding part of the application working set; VMM physical memory holds a global swap disk cache shared by the guests]

Page 31

Necessary SymCall: query_vaddr()

1. Get current process ID
   – get_current() (internal Linux API)

2. Determine presence in Linux swap cache
   – find_get_page() (internal Linux API)

3. Determine page permissions for virtual address
   – find_vma() (internal Linux API)

• Information extremely hard to get otherwise
• Must be collected while exit is being handled
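The three steps of `query_vaddr()` can be modeled with plain data structures standing in for the kernel's. `get_current()`, `find_get_page()`, and `find_vma()` are the Linux APIs named on the slide, but the Python classes here only imitate what they provide; every name in the sketch is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class VMA:
    start: int
    end: int        # exclusive, like vm_end
    writable: bool


@dataclass
class Process:
    pid: int
    vmas: List[VMA]


@dataclass(frozen=True)
class VaddrInfo:
    pid: int
    in_swap_cache: bool
    writable: Optional[bool]  # None when no VMA covers the address


class GuestKernelModel:
    def __init__(self, current: Process, swap_cache: Set[int]) -> None:
        self.current = current        # the process running when the exit occurred
        self.swap_cache = swap_cache  # swap offsets whose pages are in the swap cache

    def query_vaddr(self, vaddr: int, swap_offset: int) -> VaddrInfo:
        proc = self.current                        # 1. like get_current()
        cached = swap_offset in self.swap_cache    # 2. like find_get_page() on the swap cache
        # 3. like find_vma(): Linux returns the first VMA with vm_end > addr and the
        #    caller must still check vm_start; the containment test does both.
        vma = next((v for v in proc.vmas if v.start <= vaddr < v.end), None)
        return VaddrInfo(proc.pid, cached, vma.writable if vma else None)
```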

Page 32

Evaluation

• Memory system micro-benchmarks
  – Stream, GUPS, ECT Memperf
  – Configured to overcommit anonymous memory
    • Cause thrashing in the guest OS

• Overhead isolated to swap subsystem
  – Ideal swap device implemented as RAM disk
    • I/O occurs at main memory speeds
    • Provides lower bound for performance gains

Page 33

Bypassing Swap Overhead

[Figure: Stream (simple vector kernel) performance vs. working set size; annotations: ideal I/O improvement, performance improves]

Page 34

SymMod (Symbiotic Modules)

• SymSpy/SymCall: explicit interfaces
  – Guest OS must support each specific one

• With migration, VMs not tied to hardware
  – Virtual environment (hardware and devices) can change
  – Currently, OSes must include all possible device drivers
    • Guest OS might lack functionality

• SymMod: Extend guest OS with arbitrary interfaces/functionality
  – VMM loads code directly into guest
  – Implementation in Palacios and Linux
    • OS drivers
    • Standard Symbiotic Modules
    • Secure Symbiotic Modules

Page 35

OS Drivers/Modules

• Access standard OS driver API
  – Dependent on internal OS implementation

Page 36

Standard Symbiotic Modules

• Guest exposes standard symbiotic API via SymSpy
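The difference from an OS driver lies in what the module links against: instead of internal kernel symbols, a standard symbiotic module imports only names from the API the guest advertises through SymSpy. The loader below is a hypothetical sketch of that resolution step; the module format, symbol names, and error type are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Api = Dict[str, Callable[..., object]]


@dataclass
class SymbioticModule:
    name: str
    imports: List[str]             # symbols the module needs from the guest
    init: Callable[[Api], bool]    # entry point, given only the resolved symbols


def load_module(module: SymbioticModule, exported_api: Api) -> bool:
    """Resolve a module's imports against the guest's exported symbiotic API only."""
    missing = [sym for sym in module.imports if sym not in exported_api]
    if missing:
        raise LookupError(f"{module.name}: unresolved symbols {missing}")
    resolved = {sym: exported_api[sym] for sym in module.imports}
    return module.init(resolved)
```

Because resolution fails cleanly when a symbol is absent, a VMM could refuse to load a module into a guest whose advertised API does not cover it, rather than depending on that guest's internal kernel layout.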

Page 37

Secure Symbiotic Modules

• Symbiotic API, protected from guest
  – Secure initialization
  – Virtual memory overlay

Page 38

Summary and Contributions

• Designed and implemented the Palacios VMM
  – OS-independent embeddable VMM

• Virtualization can scale
  – Near-native performance for optimized VMM/guest (within 5%)

• VMM needs to know about guest internals
  – Bridge the semantic gap
  – Black-box inference is not desirable in HPC environments

• Need guest cooperation
  – Guest and VMM relationship should be symbiotic

• Symbiotic Virtualization
  – SymSpy and SymCall interfaces
  – SymMod extensions