Symbiotic Virtualization
John R. Lange
Thesis Proposal
Department of Electrical Engineering and Computer Science
Northwestern University
June 2009
Introduction
• VMs are traditionally black boxes
  – Separated from the VMM by a semantic gap
  – Does provide a clean interface
• Does that make sense in today’s environment?
  – Cloud computing, live migration, differing architectures
  – Guests should know they are in a VM
• Many reasons to bridge the gap
  – Performance, security, monitoring, etc.
• Existing approaches don’t allow this
• Symbiotic Virtualization is an alternative to black box design
Symbiotic Virtualization
• Novel approach to designing VMMs and operating systems
• OS compatible with native hardware interface
• BUT also optionally exposes a software interface that can be used by a VMM
• Essentially, the VMM can easily inspect and modify the guest OS
  – Optional and incremental
Outline
• The Semantic Gap
• Thesis Statement
• Palacios and Kitten
• Symbiotic Virtualization
• Schedule
• Contributions
Semantic Gap
• VMM architectures are designed as black boxes
  – Explicit OS interface (hardware or paravirtual)
  – Internal OS state is not exposed to the VMM
• Many uses for internal state
  – Performance, security, etc.
  – VMM must recreate that state
• “Bridging the Semantic Gap”
• Many examples
  – Virtuoso Project
  – Lycosid, Antfarm, Geiger, IBMon, many others
Virtuoso Project
• Bridged the semantic gap for virtual networking
  – Examine physical network traffic to model application behavior
• Provide virtual services to unmodified OSes and applications
• Virtuoso Project components
  – VNET
    • Sundararaj, A., and Dinda, P. Towards virtual networks for virtual machine grid computing. In Proceedings of the 3rd USENIX Virtual Machine Research and Technology Symposium (VM 2004)
  – VTTIF
    • Gupta, A., and Dinda, P. Inferring the topology and traffic load of parallel programs running in a virtual machine environment. In Proceedings of the 10th Workshop on Job Scheduling Strategies for Parallel Processing
  – VADAPT
    • Sundararaj, A., Gupta, A., and Dinda, P. Increasing application performance in virtual environments through run-time inference and adaptation. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing
  – VRESERVE
    • Lange, J., Sundararaj, A., and Dinda, P. Automatic dynamic run-time optical network reservations. In Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing
  – VTL
    • Lange, J., and Dinda, P. Transparent network services via a virtual traffic layer for virtual machines. In Proceedings of the 16th International Symposium on High Performance Distributed Computing
VNET
• Overlay network for virtual machines
  – Remotely distributed VMs appear connected to a LAN
  – Layer 2 overlay, operates on Ethernet frames
• Supports arbitrary overlay topologies, routing, and link types
• Provides mechanisms to maximize network performance
VTTIF and VADAPT
• Virtual Topology and Traffic Inference Framework (VTTIF)
  – Infers communication topology and traffic load matrix for a VM
• VADAPT
  – Uses information from VTTIF
  – Adaptively optimizes VNET overlay topology
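To make the idea of a traffic load matrix concrete, here is a minimal sketch that sums observed bytes per (source, destination) pair and derives a topology by thresholding. The frame tuple format, the function names, and the `min_bytes` threshold are assumptions made for illustration; they are not taken from VTTIF itself.

```python
from collections import defaultdict

def traffic_matrix(frames):
    """Sum bytes per (src, dst) pair from observed (src, dst, nbytes) frames."""
    matrix = defaultdict(int)
    for src, dst, nbytes in frames:
        matrix[(src, dst)] += nbytes
    return dict(matrix)

def infer_topology(matrix, min_bytes):
    """Keep only the directed links whose observed load reaches min_bytes."""
    return {pair for pair, load in matrix.items() if load >= min_bytes}

# Hypothetical observations: two full-size frames and one small one.
frames = [("vm1", "vm2", 1500), ("vm1", "vm2", 1500), ("vm2", "vm3", 64)]
matrix = traffic_matrix(frames)
topology = infer_topology(matrix, min_bytes=1000)
```

An adaptation layer in the spirit of VADAPT could then consume `topology` to decide which overlay links to add or remove.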
VRESERVE
• Automatic and dynamic network reservations
  – Allows unmodified applications to use circuit switched optical networks
• Added optical network reservation interface to VNET
  – Automatically reserves network link when VTTIF detects traffic between two connected hosts
VTL: Transparent Network Services
• Manipulate data and signaling of connections to add services to existing unmodified applications and OSes
  – High level transformations of low level traffic
  – Transparency: manipulations invisible to guest environment (black box approach)
• VTL (Virtual Traffic Layer)
  – A framework for creating transparent network services
• Can transform TCP connections into different protocols
Bridging the Semantic Gap
• Enables many useful features and optimizations
• However…
  – Current approaches are labor intensive
    • Reverse engineering an OS
  – Highly specific to OS implementation
  – Collected information not always accurate
Symbiotic Virtualization
• Bridging the semantic gap is hard
  – Can we design a virtual environment with no gap?
• Symbiotic Virtualization
  – Design both guest OS and VMM to minimize semantic gap
  – 2 components
    • Guest OS provides internal state to VMM
    • Guest OS services requests from VMM
  – Interfaces are optional
    • Not required for correct operation
Thesis Statement
• I propose symbiotic virtualization, an approach to OS design that preserves the benefits of full system virtualization, while enabling performance and functionality benefits.
• In symbiotic virtualization, an OS targets the native hardware interface and can run unmodified on raw hardware. However, it also exposes a software interface that can be leveraged by a symbiotic virtualization-aware VMM.
• Both the interface and its use by the VMM are optional, but if it exists, and the VMM uses it, the VMM and the OS can mutually benefit.
• Symbiotic virtualization is markedly different from the current virtualization approaches, and is best considered as being on a continuum between full system virtualization and paravirtualization.
Thesis Goals
• Define and formalize Symbiotic Virtualization
• Develop formal symbiotic interfaces
• Implement symbiotic interfaces inside an OS
• Implement set of symbiotic extensions
• Use examples to evaluate the symbiotic approach
Palacios
• OS independent embeddable VMM
  – Written from scratch at NU and UNM
• Designed to be modularly linked into existing kernels
  – Minimal host OS interface
  – Compiles into static library
  – Currently embedded: Kitten and GeekOS
• Open Source (BSD License)
  – Downloaded ~1000 times
• Lead developer
Palacios Details
• Supports 32 and 64 bit environments
  – Host and guest
• Full hardware virtualization
  – Currently only supports AMD extensions
  – Intel VMX support in progress
• Supports Linux and HPC guest OSes
• Relatively small: ~28K lines
Architecture
[Figure: Palacios architecture diagram]
Kitten
• Lightweight HPC OS from Sandia National Labs
  – Designed for large scale HPC systems (Cray XT)
  – Successor to Catamount and earlier lightweight kernels
• Based on Linux
  – Only the necessary components
  – Limited Linux ABI compatibility
• Uses Palacios for virtualization
  – Embedded as a library
  – VMs launch as part of job submission
• Contributing developer
Palacios as an HPC VMM
• Minimalist interface:
  – Does not require extensive host OS features
  – Easily embedded into even small kernels
• Full system virtualization:
  – Does not require guest OS changes
  – Runs existing kernels without any porting
    • Kitten, Catamount, Cray CNL, and IBM’s CNK
• Contiguous memory preallocation:
  – Preallocates guest memory as a physically contiguous region
  – Vastly simplifies the virtualized memory implementation
  – Deterministic performance for most memory operations
• Passthrough resources and resource partitioning:
  – Host resources are easily mapped directly into a guest environment
  – Provides access to high performance devices, with existing device drivers, with no virtualization overhead
• Low noise:
  – Minimizes the amount of OS noise injected by the VMM layer
  – No internal timers and no accumulated deferred work
Symbiotic Virtualization in HPC
• HPC environments are well suited to symbiotic techniques
• Full trust of the software stack
  – Fewer security concerns
• Specific hardware configurations
  – Limited number of devices
• Constrained problem space
  – Small number of applications
• Implementations can be very specific
• Environments are much smaller
  – Internal OS state is simpler than a general purpose OS
• At large scale performance impact is dramatic
  – Large impetus to optimize VMM and OS
HPC Performance Example
• Guest OS behavior can differ widely
  – Must optimize for specific OSes and applications
• Example:
  – Catamount and Compute Node Linux (CNL)
    • 2 HPC OSes
  – Process switching implementation
    • CNL swaps page tables
    • Catamount does not
  – Nested and shadow page tables have very different performance characteristics
  – Evaluated with 2 HPC benchmarks
    • HPCCG and CTH
    • 3 configurations (native, shadow paging, nested paging)
    • Running on Red Storm development cages (Cray XT)
HPCCG Benchmark
[Figure: HPCCG benchmark results, with panels for Catamount and Compute Node Linux]
CTH Benchmark
[Figure: CTH benchmark results, with panels for Catamount and Compute Node Linux]
Takeaway
• At large scale minor performance problems become large
  – Very important to minimize any performance overhead introduced
• VMM needs to know about guest internals
  – Should modify behavior for each guest environment
  – Which paging method to use depends on guest
• Inference is not desirable in HPC environment
  – Unacceptable performance overhead
  – Convergence time
  – Mistakes have large consequences
• Symbiotic approach is very appealing
Symbiotic Virtualization
• Definition based on formalization
  – Formalized interfaces
• Two types of interfaces
  – Passive information interface
    • VMM can read guest OS state
  – Functional interface
    • VMM can send requests to guest OS
• Neither required for OS to function correctly
  – Symbiotic OS can run on hardware
  – Non-symbiotic OS can run on symbiotic VMM
  – Can be implemented incrementally
Passive Interface
• Formalize the interface for bridging the semantic gap
  – Ideally removes the gap
• Internal state already exists but it is hidden
  – Existing tools try to recreate this data in the VMM
• Symbiotic Interface:
  – Structure internal OS state in a way that is easily parsed
    • Semantically rich
  – Expose OS state to the VMM
    • Easily accessible
Example interface
• Linux process list
  – Organized in a series of lists
  – Scattered throughout kernel address space
  – Lots of information included inside
    • Priority, memory map, open file descriptors, etc.
• Symbiotic Interface
  – Collect task information in standard location
  – Organize information to be easily parsable
    • Reserved memory page that holds pointers to high priority processes
    • List of CR3 values that should be cached
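A minimal sketch of what a reserved page carrying a CR3 list might look like, with the guest writing it and the VMM parsing it. The page layout (a magic word, an entry count, then 64-bit CR3 values), the `MAGIC` constant, and both function names are assumptions chosen for illustration; the slides do not define a concrete format.

```python
import struct

PAGE_SIZE = 4096
MAGIC = 0x53594D42              # hypothetical marker identifying a symbiotic page
HEADER = struct.Struct("<II")   # assumed layout: magic, number of CR3 entries

def guest_publish(cr3_values):
    """Guest side: write the CR3 list into a page-sized buffer."""
    body = struct.pack("<%dQ" % len(cr3_values), *cr3_values)
    page = HEADER.pack(MAGIC, len(cr3_values)) + body
    if len(page) > PAGE_SIZE:
        raise ValueError("too many entries for one page")
    return page.ljust(PAGE_SIZE, b"\x00")

def vmm_read(page):
    """VMM side: parse the page; return None if it is not a symbiotic page."""
    magic, count = HEADER.unpack_from(page, 0)
    if magic != MAGIC:
        return None
    if HEADER.size + 8 * count > len(page):
        return None             # count is guest-controlled, so validate it
    return list(struct.unpack_from("<%dQ" % count, page, HEADER.size))
```

Returning `None` for an unrecognized page reflects the optional nature of the interface: a VMM that finds no symbiotic page simply falls back to treating the guest as a black box.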
Functional Interface
• Mechanism for OS to expose functionality to VMM
  – Guest OS services VMM requests
• Possible interfaces
  – Guest OS notifications
  – VMM can force explicit upcalls
  – Iterator based system
Initial Functional Interface
• Partial initial test implementation
  – Prosnitz and Xia
• Implemented inside GeekOS and Palacios
• Iterator based
  – Modelled on RPCs
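The slides give no detail on the iterator-based design, so the following is only one possible reading: the VMM posts named requests, and the guest iterates over them and produces results, much like servicing RPCs. The class name, method names, and request format are all hypothetical and do not describe the GeekOS/Palacios implementation.

```python
from collections import deque

class GuestEndpoint:
    """Hypothetical guest-side endpoint for VMM requests."""

    def __init__(self, handlers):
        self.handlers = handlers    # request name -> callable
        self.pending = deque()      # requests posted by the VMM

    def post(self, name, *args):
        """VMM side: queue a request for the guest."""
        self.pending.append((name, args))

    def service(self):
        """Guest side: iterate over pending requests, yielding (name, result)."""
        while self.pending:
            name, args = self.pending.popleft()
            handler = self.handlers.get(name)
            if handler is None:
                yield name, None    # unsupported request: nothing is done
            else:
                yield name, handler(*args)

# Usage: the guest supports "add" but not "checkpoint".
endpoint = GuestEndpoint({"add": lambda a, b: a + b})
endpoint.post("add", 2, 3)
endpoint.post("checkpoint")
results = list(endpoint.service())
```

Unsupported requests yield `None` rather than failing, which matches the requirement that neither side depends on the interface for correct operation.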
Issues
• New VMM/OS interaction model
• Traditional virtualization assumptions no longer true
  – No longer a black box
• Some new issues to be addressed
  – Trust
  – Design complexity
Symbiotic Trust Model
• Current architectures: unidirectional trust
  – Guest OS fully trusts VMM
  – VMM should not trust guest
  – Restricts VMM from interacting with guest
• Symbiotic VMM must trust guest interfaces
• BUT it doesn’t have to use them
  – Selectively enable interfaces depending on trust level
• I will examine the implications symbiotic virtualization has on the trust model
Symbiotic Complexity
• Symbiotic interfaces can increase complexity of VMMs
  – Implications for Trusted Computing platforms
• Complexity is already there
  – See examples of bridging the gap
• Correct functionality does not require VMM support
Evaluation
• Performance impact of symbiotic interfaces
• Comparison against existing interfaces
  – Lines of code
  – Complexity of other approaches
  – Explanation of how the symbiotic functionality is not otherwise possible
• Evaluate functionality with several example cases
• Examine how issues are addressed by design
• Also evaluating virtualization and HPC at scale
Implementation
• Implementation of formalized design
• Environment
  – VMM: Palacios
  – Host OS: Kitten
  – Guest OS: Kitten and Linux
• Reasoning:
  – Relatively small code size
  – Familiarity with both
Code size
Symbiotic Examples
• Demonstrating symbiotic virtualization
  – Symbiotic Swap
  – Symbiotic Device Drivers
  – Symbiotic Assists
• Made possible by a symbiotic design
Virtualized Memory
• VM memory model same as physical memory
  – Shadow/nested paging designed to mimic it
• OS has memory set at boot time
  – Exceptions
    • Rare support for hot pluggable memory
    • Paravirtualized memory
      – Usually a large change to guest OS
• Swap storage allows over allocation
  – Can be exhausted
  – Can lead to thrashing
Current Swap Architectures
• In Linux, swap storage is an array of pages
  – Easily accessible
• When a page is swapped out it is given an index value
  – Points to array location
  – Page faults occur on page access
  – OS retrieves page and moves it to physical memory
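The swap path described above can be sketched as a toy model: swap storage is an array of slots, a swapped-out page is recorded by its slot index, and an access to such a page takes the fault path that moves it back into memory. This is a simplification for illustration, not Linux's actual data structures; `SwapArea`, `access`, and the page table encoding are hypothetical.

```python
class SwapArea:
    """Toy swap storage: a fixed array of page-sized slots."""

    def __init__(self, nslots):
        self.slots = [None] * nslots

    def swap_out(self, data):
        """Store a page in the first free slot and return its index."""
        index = self.slots.index(None)  # raises ValueError when swap is full
        self.slots[index] = data
        return index

    def swap_in(self, index):
        """Remove and return the page stored at the given index."""
        data = self.slots[index]
        self.slots[index] = None
        return data

def access(page_table, memory, swap, vpn):
    """Return the page contents, faulting it in from swap if needed."""
    state, where = page_table[vpn]
    if state == "swapped":              # page fault path
        memory[vpn] = swap.swap_in(where)
        page_table[vpn] = ("present", vpn)
    return memory[vpn]
```

The `ValueError` raised when no slot is free corresponds to the "can be exhausted" case: swap only postpones running out of memory.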
Symbiotic Swap
• Purpose: prevent thrashing situations
  – Temporarily expand memory
• A symbiotic OS would expose swapped page map
  – VMM could find swapped page with minimal effort
• Guest OS begins to thrash
  – Detected by VMM
  – Guest page is swapped out, but VMM copies it to free page
  – Shadow memory is altered to point to swapped page
    • Accesses no longer cause faults
• Thrashing ends
  – VMM synchronizes swapped out page
  – Next access will fault the page back in to guest memory
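The sequence above can be sketched as a small simulation. It assumes the guest exposes a map from virtual page number to swap index and that the VMM can read the swap device directly; the class, its methods, and the dictionary-based "shadow" mapping are hypothetical simplifications, not the Palacios design.

```python
class SymbioticSwapVMM:
    """Toy VMM that caches swapped-out guest pages in spare host memory."""

    def __init__(self, swap_device, free_host_pages):
        self.swap_device = swap_device          # swap index -> page contents
        self.free_host_pages = free_host_pages  # number of spare host pages
        self.shadow = {}                        # guest vpn -> cached contents

    def on_thrash(self, exposed_swap_map):
        """Copy swapped-out pages into spare host pages and map them."""
        for vpn, index in exposed_swap_map.items():
            if len(self.shadow) >= self.free_host_pages:
                break
            self.shadow[vpn] = self.swap_device[index]

    def guest_access(self, vpn):
        """Return (contents, faulted); cached pages are served without a fault."""
        if vpn in self.shadow:
            return self.shadow[vpn], False
        return None, True   # the fault is delivered to the guest as usual

    def on_thrash_end(self, exposed_swap_map):
        """Write cached pages back to swap and drop the shadow mappings."""
        for vpn, contents in self.shadow.items():
            self.swap_device[exposed_swap_map[vpn]] = contents
        self.shadow.clear()
```

The write-back in `on_thrash_end` stands in for the synchronization step: once the mappings are dropped, the guest's own swap-in path sees up-to-date data on the next fault.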
Symbiotic Swap Architecture
Device Drivers
• Guests often need direct device access
  – High performance networks
  – Driver included inside guest OS
• Self-virtualization
  – Devices still require their own drivers
  – Not all devices are capable
• Does not map well to virtual environments
  – Migration changes underlying hardware
  – Difficult to share between multiple VMs
  – VMM must fully trust guest driver
VPIO: Virtual Passthrough IO
• Modeling-based approach to high performance I/O virtualization for commodity devices
  – Devices with no virtualization support
• VMM runs Device Model Monitor (DMM)
  – Intercepts a subset of IO commands
  – Maintains model of internal device state
  – Transitions model state based on IO operations
    • Prevents security violations
    • Determines when device can be context switched
L. Xia, J. Lange, and P. Dinda, Towards Virtual Passthrough I/O on Commodity Devices, Proceedings of the First Workshop on I/O Virtualization at OSDI
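A device model of this kind can be pictured as a small state machine driven by intercepted I/O commands. The sketch below uses made-up states and command names purely for illustration; a real model, such as one for an NE2000 card, would be derived from the device's actual register interface.

```python
class DeviceModel:
    """Toy device model: tracks state from intercepted I/O commands."""

    # Hypothetical states and commands, not taken from any real device.
    TRANSITIONS = {
        ("idle", "start_dma"): "busy",
        ("busy", "dma_done"): "idle",
        ("idle", "reset"): "idle",
    }

    def __init__(self):
        self.state = "idle"

    def on_io(self, command):
        """Apply an intercepted command; return False if the model forbids it."""
        next_state = self.TRANSITIONS.get((self.state, command))
        if next_state is None:
            return False        # unexpected command: reject instead of forwarding
        self.state = next_state
        return True

    def can_context_switch(self):
        """The device may be handed to another guest only when idle."""
        return self.state == "idle"
```

Rejecting commands that have no transition corresponds to preventing security violations, and `can_context_switch` corresponds to deciding when the device can be context switched.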
NE2k Device Model
Symbiotic Device Drivers
• VMM provides passthrough driver to guest
  – Passthrough driver can include VPIO model
• Design OS to allow driver injection
• Guest OS no longer needs to include full set of drivers for all possible hardware
• VMM can optimize driver behavior to the environment
• Drivers can be dynamically swapped as conditions change
  – Passthrough network driver
  – Overlay network driver
  – Paravirtual driver
VMM Extended Services
• VMMs perform operations on OS
  – Migration, suspend/resume, checkpointing
• One sided approaches are often overly complex
  – VMM must account for OS behavior
  – Many have not been successfully implemented inside an OS
• Ideally services are supported by both VMM and guest OS
  – VMM and guest OS share responsibility
  – Each one does what is suitable to its environment
Symbiotic Assists
• Possible uses
  – Notifications
    • Guest OS is aware of VMM events
  – Optimizations
    • VMM can request guest optimize itself for an operation
• Example: Migration
  – Allow guest OS to optimize itself for migration
    • Flush memory, freeze processes, disable devices
  – Pre/post migration notifications
    • Possibly interrupt based
  – A non-symbiotic OS will still work
    • But won’t be optimized
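The migration example can be sketched as optional pre/post notifications around the state copy. The event names (`pre_migrate`, `post_migrate`), the `Guest` class, and the `migrate` function are assumptions for illustration; the point is only that a guest with no handlers still migrates, just without the optimization.

```python
class Guest:
    """Toy guest that may or may not register handlers for VMM events."""

    def __init__(self, handlers=None):
        self.handlers = handlers or {}

    def notify(self, event):
        """Deliver a VMM event; return True if the guest handled it."""
        handler = self.handlers.get(event)
        if handler is None:
            return False
        handler()
        return True

def migrate(guest, copy_state):
    """Run a migration, letting a symbiotic guest prepare and recover."""
    optimized = guest.notify("pre_migrate")   # e.g. flush memory, freeze processes
    copy_state()                              # performed by the VMM either way
    guest.notify("post_migrate")              # e.g. re-enable devices
    return optimized

# Usage: one symbiotic guest and one unmodified guest.
log = []
symbiotic = Guest({
    "pre_migrate": lambda: log.append("prepare"),
    "post_migrate": lambda: log.append("resume"),
})
plain = Guest()
symbiotic_result = migrate(symbiotic, lambda: log.append("copy"))
plain_result = migrate(plain, lambda: log.append("copy"))
```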
Schedule
Contributions
• Bridging the Semantic Gap
  – Automatic network reservations (VRESERVE)
  – Virtual network services (VTL)
• Palacios
  – A new VMM architecture for HPC
• Kitten
  – Lightweight HPC OS
• Evaluation of virtualization in HPC at scale
Expected Contributions
• Formal definition of Symbiotic Virtualization
  – Design of a set of symbiotic interfaces
• Implementation of Symbiotic Virtualization
  – Based on formal design
  – Implemented in Palacios
  – Linux and Kitten guests
• Evaluation of Symbiotic Virtualization
  – Raw performance
  – Complexity comparison
Expected Contributions
• Example extensions
  – Symbiotic Swap
    • Guest OS thrashing detection
  – Symbiotic Device Drivers
    • Dynamic insertion of device drivers
  – Symbiotic Assists
    • Optimize VM operations inside guest OS
Related Work
• Pre-Virtualization
  – Dynamically transform an OS to implement paravirtualization
• FoxyTechnique
  – Modify virtual hardware to modify guest behavior
  – Does nothing to bridge the semantic gap
• Bridging the semantic gap
  – Lycosid
    • Security introspection
  – Antfarm
    • Process behavior inference
  – Geiger
    • Buffer cache inference
  – IBMon
    • InfiniBand communication monitoring
Thank you