Virtualization tutorial at ACM Bangalore Compute 2009


Description

ACM Compute 2009 tutorial presentation on virtualization by Mohan Parthasarathy, HP, Bangalore, India.

Transcript

Page 1: Virtualization tutorial at ACM Bangalore Compute 2009

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

The Hardware Revolution in Server Virtualization

Mohan Parthasarathy (Hewlett-Packard)

ACM Compute 2009 Tutorial

9th Jan 2009

Page 2

2 16 January 2009

Agenda

• Server Virtualization technologies ~15 min

− Overview and history

− VMM architectures

− Criteria for a processor to be virtualizable

• X86 Virtualization ~30 min

− The x86 processor architecture overview

− Virtualization challenges in x86 processors

• Break 1 – Q&A

• Software techniques for virtualization ~45 min

− CPU virtualization (Binary Translation/Para-virtualization)

− Memory virtualization (shadow tables/Xen writeable page tables)

− I/O virtualization (device emulation)

• Break 2 – Q&A

• Hardware techniques for virtualization ~45 min

− CPU virtualization (VT-x/AMD-V)

− Memory virtualization (Intel EPT/AMD NPT)

− I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)

• Future Trends ~5 min

− Manageability

− Security

Did you ever wonder if the person in the puddle is real, and you're just a reflection of him? ~Calvin and Hobbes

Page 3

Server Virtualization Technologies

Spectrum, from isolation to flexibility: Hardware Partitioning, Software/Firmware Partitioning, Software/Firmware Virtualization, Resource Virtualization

• Hardware Partitioning: HP nPar, Sun DSD

• Software/Firmware Partitioning: HP vPar, IBM DLPAR, Sun Logical Domains

• Software/Firmware Virtualization: HP Integrity VM, IBM SLPARs (micro-partitions), Hitachi Virtage, VMware ESX/GSX, Microsoft Hyper-V, Xen, KVM, xVM…

• Resource Virtualization: HP-UX SRP, Solaris Containers (Zones), PVC (earlier SWSoft), OpenVZ, IBM WPAR

[Figure: for each category, a stack of CPU and memory resources, a hypervisor layer (software/firmware) where applicable, guest OSes (OS1, OS2) and applications (APP1, APP2).]

Page 4

A brief history lesson

• VMM on IBM Mainframe

• Many apps on $$$ HW

Stanford Research

• DISCO project

• VMM on cheap x86 HW

• VMware in 1999

[Figure: left (1960s), an IBM mainframe running IBM VM/370, hosting CMS and MVS guests with their apps; right (1996), an Intel/AMD x86 server running VMware, hosting W2K3, W2K, WNT4 and Linux guests with their apps.]

Commodity hardware becomes powerful enough to support a virtual machine monitor (VMM), so it’s back to the future with a proven technology!

Page 5

VMM Architectures

Type-1 VMM (hypervisor), bottom to top: Hardware / VMM (hypervisor) / Guest 1, Guest 2

Examples: VMware ESX, Xen, MS Hyper-V

Type-2 VMM (hypervisor), bottom to top: Hardware / Host OS / VMM / Guest 1, Guest 2

Examples: UML

Hybrid VMM (hypervisor), bottom to top: Hardware / Host OS and VMM side by side / Guest 1, Guest 2

Examples: HPVM, VMware GSX, Microsoft Virtual Server

Page 6

Hosted VM Architecture

HP Integrity VM, Microsoft Virtual Server, VMware GSX

Page 7

Virtualization Requirements – Popek and Goldberg

• A Model of Third Generation Machines

− Two modes of execution

− Protection mechanism for the supervisor mode

− A method to automatically signal the supervisor when the VM executes a sensitive instruction

• Properties for a Virtual Machine Monitor

− Equivalence

− Resource control

− Efficiency

Page 8

VMM Requirements (Sensitive Instructions)

Ref : Analyzing the Intel Pentium’s ability to support a secure VMM – John Scott Robin (1999)
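The requirements above reduce to Popek and Goldberg's condition: a trap-and-emulate VMM can be built when every sensitive instruction is also privileged, i.e. it traps when a deprivileged guest executes it. A minimal C sketch that checks this condition over a small, hand-picked table; the table contents and the name `count_pg_violations` are illustrative assumptions, not a complete classification of x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One row per instruction: is it sensitive (reads or changes privileged
 * machine state), and is it privileged (traps when run outside ring 0)? */
struct insn_class {
    const char *mnemonic;
    bool sensitive;
    bool privileged;
};

/* Illustrative subset of pre-VT x86; not an exhaustive classification. */
static const struct insn_class x86_sample[] = {
    { "LGDT",        true,  true  },
    { "MOV CR3,reg", true,  true  },
    { "HLT",         true,  true  },
    { "SGDT",        true,  false },  /* reads GDTR without trapping */
    { "POPF",        true,  false },  /* IF change silently ignored when CPL > IOPL */
    { "ADD",         false, false },
};

/* Popek-Goldberg condition: sensitive instructions must be a subset of the
 * privileged ones. Prints each violation and returns how many were found. */
static size_t count_pg_violations(const struct insn_class *tbl, size_t n)
{
    size_t violations = 0;
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].sensitive && !tbl[i].privileged) {
            printf("breaks trap-and-emulate: %s\n", tbl[i].mnemonic);
            violations++;
        }
    }
    return violations;
}
```

On this sample, SGDT and POPF are reported: they are sensitive but do not trap, which is exactly why classic x86 needed binary translation, para-virtualization or hardware support.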

Page 9

Agenda

• Server Virtualization technologies

− Overview and history

− VMM architectures

− Criteria for a processor to be virtualizable

• X86 Virtualization

− The x86 processor architecture overview

− Virtualization challenges in x86 processors

• Software techniques for virtualization

− CPU virtualization (Binary Translation/Para-virtualization)

− Memory virtualization (shadow tables/Xen writeable page tables)

− I/O virtualization (device emulation)

• Hardware techniques for virtualization

− CPU virtualization (VT-x/AMD-V)

− Memory virtualization (Intel EPT/AMD NPT)

− I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)

• Future Trends

− Manageability

− Security

Page 10

X86 architecture – Privilege Levels

Data structures contain privilege levels:

• DPL: Descriptor Privilege Level, stored in a segment or gate descriptor

• CPL: Current Privilege Level

− DPL of the access rights byte in the CS segment descriptor cache register

− Privilege level of the code and data segment for the current task

• RPL: Requested Privilege Level

− The privilege level carried in the low two bits of the new selector loaded into a segment register
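For a data-segment load, protected mode allows the access only when the segment's DPL is numerically greater than or equal to both CPL and RPL (a larger number means less privilege). A minimal sketch of just that rule; the function name is hypothetical, and it ignores conforming code segments, stack-segment rules and the other descriptor checks the processor performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3 (least privileged). */
static bool data_segment_access_ok(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    /* The effective privilege is the weaker (numerically larger) of CPL and RPL. */
    unsigned effective = cpl > rpl ? cpl : rpl;
    return dpl >= effective;
}
```

RPL lets ring-0 code weaken its own privilege when it loads a selector handed to it by less-privileged code: with CPL 0 but RPL 3, a DPL-0 segment is refused.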

Page 11

X86 memory management

Segment registers: CS, DS, SS, FS, ES, GS

Page 12

X86 memory management - segmentation

• Upper 13 bits of the segment selector are used to index the descriptor table

• TI = Table Indicator: selects the descriptor table

− 0 = Global Descriptor Table

− 1 = Local Descriptor Table

• GDTR, LDTR locate the GDT and LDT

• Segment register: the visible selector plus a hidden part (segment base, segment limit, access rights)
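A selector packs three fields into 16 bits: a 13-bit index, the TI bit and a 2-bit RPL. A minimal decoding sketch; the struct and function names are hypothetical, and the table base passed to `descriptor_address` stands in for the value the processor would take from GDTR or the LDT's cached base.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector. */
struct selector_fields {
    uint16_t index; /* bits 15..3: index into the descriptor table */
    uint8_t  ti;    /* bit 2: 0 = GDT, 1 = LDT */
    uint8_t  rpl;   /* bits 1..0: requested privilege level */
};

static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1;
    f.rpl   = sel & 3;
    return f;
}

/* Descriptors are 8 bytes each, so the entry sits at table base + index * 8. */
static uint32_t descriptor_address(uint32_t table_base, uint16_t sel)
{
    return table_base + (uint32_t)decode_selector(sel).index * 8u;
}
```

For example, selector 0x23 decodes to index 4 in the GDT with RPL 3, so with a made-up table base of 0x1000 its descriptor is at 0x1020.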

Page 13

X86 Paging – 32 bit mode

Page Table Entry

Page Table

Page 14

X86 paging - registers

Page 15

X86 paging – 64 bit mode with 4KB pages

Linear address (bits 63–0), 4-KB page translation:

• Bits 63–48: sign extend

• Bits 47–39: PML4E offset (9 bits, page-map level 4)

• Bits 38–30: PDPE offset (9 bits, page-directory pointer)

• Bits 29–21: PDE offset (9 bits, page-directory entry)

• Bits 20–12: PTE offset (9 bits, page-table entry)

• Bits 11–0: page offset (12 bits) into the 4-KB page in physical memory

512 PML4Es × 512 PDPEs × 512 PDEs × 512 PTEs = $2^{36}$ 4-KB pages
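The bit fields above can be extracted with shifts and 9-bit masks. A minimal sketch (the names `split_4k` and `is_canonical` are hypothetical) that also checks the sign-extension rule, under which bits 63–48 must all equal bit 47.

```c
#include <assert.h>
#include <stdint.h>

/* Indices used by a 4-level, 4-KB-page walk of a 48-bit linear address. */
struct walk4k {
    unsigned pml4, pdp, pd, pt; /* 9-bit table indices (0..511) */
    unsigned offset;            /* 12-bit byte offset within the page */
};

static struct walk4k split_4k(uint64_t la)
{
    struct walk4k w;
    w.pml4   = (unsigned)(la >> 39) & 0x1FF;
    w.pdp    = (unsigned)(la >> 30) & 0x1FF;
    w.pd     = (unsigned)(la >> 21) & 0x1FF;
    w.pt     = (unsigned)(la >> 12) & 0x1FF;
    w.offset = (unsigned)la & 0xFFF;
    return w;
}

/* Canonical form: bits 63..47 are either all zeros or all ones. */
static int is_canonical(uint64_t la)
{
    uint64_t top = la >> 47;    /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```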

Page 16

X86 paging – 64 bit mode with 2MB pages

Linear address (bits 63–0), 2-MB page translation:

• Bits 63–48: sign extend

• Bits 47–39: PML4E offset (9 bits, page-map level 4)

• Bits 38–30: PDPE offset (9 bits, page-directory pointer)

• Bits 29–21: PDE offset (9 bits, page-directory entry)

• Bits 20–0: page offset (21 bits) into the 2-MB page in physical memory

512 PML4Es × 512 PDPEs × 512 PDEs = $2^{27}$ 2-MB pages
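A quick check of the two slide totals: both page sizes cover the same 48-bit linear address space. The 2-MB format simply stops the walk one level earlier, so the low 21 bits become the page offset.

```latex
\begin{aligned}
512^4 \times 4\,\text{KiB} &= 2^{36} \times 2^{12}\ \text{bytes} = 2^{48}\ \text{bytes} \\
512^3 \times 2\,\text{MiB} &= 2^{27} \times 2^{21}\ \text{bytes} = 2^{48}\ \text{bytes} = 256\ \text{TiB}
\end{aligned}
```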

Page 17

X86 virtualization challenges

[Figure: guest apps run in ring 3 and the guest OS is deprivileged to ring 1, above the VMM in ring 0, which runs on the hardware.]

• SYSENTER, CLI/STI: excessive faulting

• Segment reversibility issue on context switch

• SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH: non-faulting read of privileged registers (3B1)

• LAR/LSL/VERR/VERW/CALL/INT/JMP/RET: incorrect execution when run in ring level > 0 (3C1)

• Ring aliasing/compression

• POPF: non-faulting write to privileged state (EFLAGS.IF) (3B1)

• CPUID

• Address space compression

• STR/POP/PUSH: leakage of privilege level (3C1)
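The POPF entry above is the classic example of a non-faulting write. A simplified model of the protected-mode POPF rules for IF and IOPL only (other flags are left unchanged here for brevity; virtual-8086 and VME behaviour is not modelled; the function name is hypothetical). It shows why a guest kernel deprivileged to ring 1 loses its attempt to re-enable interrupts without the VMM ever seeing a trap.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF   (1u << 9)   /* interrupt enable flag */
#define EFLAGS_IOPL (3u << 12)  /* I/O privilege level field, bits 13..12 */

/* Returns the new EFLAGS after POPF pops `popped` at privilege level `cpl`.
 * Only the IF and IOPL bits are modelled. */
static uint32_t popf(unsigned cpl, uint32_t eflags, uint32_t popped)
{
    unsigned iopl = (eflags & EFLAGS_IOPL) >> 12;
    uint32_t result = eflags;

    if (cpl <= iopl)            /* IF may change only if CPL <= IOPL */
        result = (result & ~EFLAGS_IF) | (popped & EFLAGS_IF);
    if (cpl == 0)               /* IOPL may change only at CPL 0 */
        result = (result & ~EFLAGS_IOPL) | (popped & EFLAGS_IOPL);
    /* No exception in either case: a denied update is silently dropped. */
    return result;
}
```

Natively at ring 0 the same instruction sets IF, so the guest kernel's behaviour silently differs from native execution, violating the equivalence property.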

Page 18

Agenda

• Server Virtualization technologies

− Overview and history

− VMM architectures

− Criteria for a processor to be virtualizable

• X86 Virtualization

− The x86 processor architecture overview

− Virtualization challenges in x86 processors

• Software techniques for virtualization

− CPU virtualization (Binary Translation/Para-virtualization)

− Memory virtualization (shadow tables/Xen writeable page tables)

− I/O virtualization (device emulation)

• Hardware techniques for virtualization

− CPU virtualization (VT-x/AMD-V)

− Memory virtualization (Intel EPT/AMD NPT)

− I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)

• Future Trends

− Manageability

− Security

Page 19

Dynamic Binary Translation

[Figure: components of a dynamic binary translator: the x86 binary on disk and in data RAM; runtime execution from a code cache indexed by code cache tags; a translator pipeline of x86 parser and high-level translator, high-level optimization, low-level code generation, and low-level optimization and scheduling.]

Ref : Virtual Machines and Dynamic Translation: Implementing ISAs in Software – Joel Emer, Massachusetts Institute of Technology

Page 20

Binary Translation - C Code Example

int isPrime(int a) {
    for (int i = 2; i < a; i++) {
        if (a % i == 0) return 0;
    }
    return 1;
}

Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006

Page 21

Basic Block Translation

• Most instructions copied identically.

• Privileged instructions must be emulated.

• Jumps must be translated since translation can alter code layout.

• Each translated BB must end with jump to next translated BB.
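The four rules above can be shown on a toy instruction set. This is a minimal sketch, not the translator from the referenced paper: the opcodes (`G_ALU`, `G_CLI`, `G_JMP`) and the host-side operations are hypothetical, while a real translator works on raw x86 instruction bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy guest ISA: ordinary ALU op, a privileged CLI, and a jump. */
enum guest_op { G_ALU, G_CLI, G_JMP };
/* Operations emitted into the translated code. */
enum host_op { H_ALU, H_CALL_EMULATE_CLI, H_JMP_TRANSLATED };

struct guest_insn { enum guest_op op; int target; };      /* target: guest PC (G_JMP only) */
struct host_insn  { enum host_op op;  int guest_target; };

/* Translates the basic block starting at guest PC `pc`; returns the number
 * of host instructions written to out[]. */
static size_t translate_bb(const struct guest_insn *code, size_t len, size_t pc,
                           struct host_insn *out, size_t cap)
{
    size_t n = 0, i;
    assert(pc <= len && cap > len - pc);  /* room for every insn plus one final jump */
    for (i = pc; i < len; i++) {
        if (code[i].op == G_JMP) {
            /* Jump: the target stays a guest PC and is resolved to the
             * translated copy of that block when the jump is taken. */
            out[n++] = (struct host_insn){ H_JMP_TRANSLATED, code[i].target };
            return n;                      /* a jump ends the basic block */
        }
        /* Privileged instructions become calls into the VMM's emulation;
         * everything else is copied identically. */
        out[n++] = (struct host_insn){
            code[i].op == G_CLI ? H_CALL_EMULATE_CLI : H_ALU, -1 };
    }
    /* No jump before the end of the code: end with an explicit jump to the
     * next translated BB (the fall-through address). */
    out[n++] = (struct host_insn){ H_JMP_TRANSLATED, (int)i };
    return n;
}
```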

Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006

Page 22

Translation of isPrime(49)

Note that the prime: basic block is never translated, since 49 is not prime.

Ref : Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006
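Why the prime: block is never translated: translation is lazy, so a basic block is translated only the first time execution reaches it. A toy simulation of isPrime(49) split into basic blocks in the spirit of the paper's assembly; the block names and the boolean "code cache" are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Basic blocks of isPrime(): entry, modulo test, i++ and loop test, two returns. */
enum bb { BB_ISPRIME, BB_NEXTI, BB_INC, BB_PRIME, BB_NOTPRIME, BB_COUNT, BB_EXIT = -1 };

static bool translated[BB_COUNT];   /* toy code cache: has this block been translated? */
static int translations;

/* Code-cache lookup: translate the block only on its first use. */
static void enter_block(enum bb b)
{
    assert(b >= 0 && b < BB_COUNT);
    if (!translated[b]) {
        translated[b] = true;
        translations++;
    }
}

/* Runs isPrime(a) block by block, translating lazily; returns its result. */
static int run_isPrime(int a)
{
    int i = 0, ret = 0;
    enum bb b = BB_ISPRIME;
    while (b != BB_EXIT) {
        enter_block(b);
        switch (b) {
        case BB_ISPRIME:  i = 2;    b = (i >= a) ? BB_PRIME : BB_NEXTI;   break;
        case BB_NEXTI:              b = (a % i == 0) ? BB_NOTPRIME : BB_INC; break;
        case BB_INC:      i++;      b = (i < a) ? BB_NEXTI : BB_PRIME;    break;
        case BB_PRIME:    ret = 1;  b = BB_EXIT;                          break;
        case BB_NOTPRIME: ret = 0;  b = BB_EXIT;                          break;
        default:                    b = BB_EXIT;                          break;
        }
    }
    return ret;
}
```

After run_isPrime(49), four blocks have been translated (entry, modulo test, increment, notPrime); the loop blocks run many times but are translated once each.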

Page 23

Para-virtualization – Xen architecture

Page 24

Memory Virtualization – Shadow Page Tables

Page 25

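Memory Virtualization – Shadow PT vs Writeable page tables (Xen)

[Figure: left, shadow page tables: the guest keeps its own PD/PT, the VMM keeps a shadow PD/PT sync’ed with them, and CR3 points at the shadow tables. Right, Xen writeable page tables: CR3 points at the guest’s own PD/PT; a guest page-table update causes a page fault into the VMM, which verifies that the page table update is okay. Mappings shown: VA->PA and VA->MA.]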
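A toy contrast of the two schemes with single-level tables indexed by virtual page number. All names are hypothetical, and real implementations handle multi-level tables, accessed/dirty bits and invalidation. With shadow paging the VMM composes VA->PA (guest table) with PA->MA (its own map) into the VA->MA table the MMU uses; with Xen-style writeable page tables the guest's entries already hold machine frames, so the VMM only validates the update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NPAGES  8                  /* toy guest virtual pages and guest frames */
#define NFRAMES 8                  /* toy machine frames */
#define INVALID UINT32_MAX

static uint32_t guest_pt[NPAGES];  /* guest's own table: VPN -> guest frame (VA->PA) */
static uint32_t p2m[NPAGES];       /* VMM's map: guest frame -> machine frame (PA->MA) */
static uint32_t shadow_pt[NPAGES]; /* table the MMU really uses: VPN -> machine frame (VA->MA) */

/* Shadow paging: the guest's page-table write is intercepted (for example
 * because the page is write-protected) and the shadow entry is kept in sync. */
static void shadow_guest_pte_write(unsigned vpn, uint32_t pfn)
{
    assert(vpn < NPAGES);
    guest_pt[vpn] = pfn;
    shadow_pt[vpn] = (pfn < NPAGES) ? p2m[pfn] : INVALID;
}

/* Xen-style writeable page tables: on the trapped update the VMM only
 * checks that the machine frame belongs to this guest. */
static bool frame_owned_by_guest[NFRAMES];

static bool xen_pte_update_ok(uint32_t mfn)
{
    return mfn < NFRAMES && frame_owned_by_guest[mfn];
}
```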

Page 26

Dynamic memory resizing - Ballooning

• Inflating a balloon

− When the server wants to reclaim memory

− Driver allocates pinned physical pages within the VM

− This increases memory pressure in the guest OS, which reclaims space to satisfy the driver’s allocation request

− Driver communicates the physical page number for each allocated page to the VMM

• Deflating

− Frees up memory for general use within the guest OS
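The inflate/deflate steps above can be sketched as a driver that moves guest pages between the guest's free pool and the balloon, reporting each page frame number to the VMM. The `vmm_notify_*` hooks and all other names are hypothetical, and unlike a real guest OS this toy only takes pages that are already free instead of reclaiming or paging out memory under pressure.

```c
#include <assert.h>
#include <stddef.h>

#define GUEST_PAGES 16

static int guest_free[GUEST_PAGES];  /* 1 = free in the guest OS, 0 = in use */
static size_t balloon[GUEST_PAGES];  /* page frame numbers held by the balloon */
static size_t balloon_size;

/* Hypothetical VMM interface: count of guest pages the VMM can reuse. */
static size_t vmm_reclaimable;
static void vmm_notify_inflated(size_t pfn) { (void)pfn; vmm_reclaimable++; }
static void vmm_notify_deflated(size_t pfn) { (void)pfn; vmm_reclaimable--; }

/* Inflate: pin up to n free guest pages and report their PFNs to the VMM.
 * Returns how many pages were added to the balloon. */
static size_t balloon_inflate(size_t n)
{
    size_t added = 0;
    for (size_t pfn = 0; pfn < GUEST_PAGES && added < n; pfn++) {
        if (guest_free[pfn]) {
            guest_free[pfn] = 0;            /* the guest now sees it as allocated */
            balloon[balloon_size++] = pfn;
            vmm_notify_inflated(pfn);       /* the VMM may reuse the backing memory */
            added++;
        }
    }
    return added;
}

/* Deflate: return up to n balloon pages to the guest OS for general use. */
static size_t balloon_deflate(size_t n)
{
    size_t removed = 0;
    while (removed < n && balloon_size > 0) {
        size_t pfn = balloon[--balloon_size];
        vmm_notify_deflated(pfn);           /* the VMM must back the page again */
        guest_free[pfn] = 1;
        removed++;
    }
    return removed;
}
```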

Page 27

I/O system architecture overview (PCI/PCI-e)

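[Figure: a PCIe device at bus/device/function 3, 0, 0 exposes functions 0, 1, 2 and a configuration space; it connects through the root complex to four CPUs, and an OS driver drives it through TX and RX rings in memory. With several OS drivers on top of a VMM all trying to drive the same device: CHAOS!!]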

Page 28

I/O Virtualization Architecture

Monolithic model (e.g. VMware ESX): device drivers and I/O services for shared devices live in the hypervisor; VM0 … VMn run guest OS and apps.

• Pro: Higher performance

• Pro: I/O device sharing

• Pro: VM migration

• Con: Larger hypervisor

Pass-through model: devices are assigned to VMs; each VM (VM0 … VMn) runs its own device drivers with its guest OS and apps.

• Pro: Highest performance

• Pro: Smaller hypervisor

• Pro: Device assisted sharing

• Con: Migration challenges

Service VM model (e.g. Xen): device drivers and I/O services for shared devices run in service VMs; guest VMs (VM0 … VMn) run guest OS and apps.

• Pro: High security

• Pro: I/O device sharing

• Pro: VM migration

• Con: Lower performance

Page 29

Network Virtualization (VMware GSX example)

Page 30

Xen I/O Architecture - Safe Hardware Interface

Page 31

Networking in Xen

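[Figure: a driver domain runs the NIC driver, an Ethernet bridge and back-end drivers; guest domains 1, 2, … run front-end drivers. Hardware interrupts from the NIC reach the hypervisor’s interrupt dispatch, which delivers virtual interrupts to the domains. Packet data moves between back-end and front-end drivers through hypervisor page flipping, alongside driver control and control + data paths. The NIC and CPU / memory / disk / other devices sit below the hypervisor.]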
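Page flipping hands the page that received a packet to the guest instead of copying it. A minimal ownership-table sketch, assuming (as a simplification) that each flip is balanced by the guest's front-end having posted an empty page that goes back to the driver domain; the real mechanism also updates page tables, the physical-to-machine map and TLBs, none of which is modelled here. All names are hypothetical.

```c
#include <assert.h>

#define NFRAMES 8
enum owner { DOM_DRIVER, DOM_GUEST };

static enum owner frame_owner[NFRAMES];  /* toy ownership of machine frames */

/* The back-end received a packet into rx_frame (owned by the driver domain);
 * the front-end posted empty_frame (owned by the guest). Swap ownership
 * instead of copying the packet. Returns 0 on success, -1 if refused. */
static int flip_pages(int rx_frame, int empty_frame)
{
    assert(rx_frame >= 0 && rx_frame < NFRAMES);
    assert(empty_frame >= 0 && empty_frame < NFRAMES);
    if (frame_owner[rx_frame] != DOM_DRIVER || frame_owner[empty_frame] != DOM_GUEST)
        return -1;                          /* not a valid exchange */
    frame_owner[rx_frame] = DOM_GUEST;      /* packet page now belongs to the guest */
    frame_owner[empty_frame] = DOM_DRIVER;  /* replacement buffer for the driver domain */
    return 0;
}
```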

Page 32

Agenda

• Server Virtualization technologies (15 min)

− Overview and history

− VMM architectures

− Criteria for a processor to be virtualizable

• X86 Virtualization (30 min)

− The x86 processor architecture overview

− Virtualization challenges in x86 processors

• Software techniques for virtualization (30 min)

− CPU virtualization (Binary Translation/Para-virtualization)

− Memory virtualization (shadow tables/Xen writeable page tables)

− I/O virtualization (device emulation)

• Hardware techniques for virtualization

− CPU virtualization (VT-x/AMD-V)

− Memory virtualization (Intel EPT/AMD NPT)

− I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)

• Future Trends (5 min)

− Manageability

− Security

Page 33

CPU Virtualization with Intel VT-x

• Two new VT-x operating modes

− Less-privileged mode (VMX non-root) for guest OSes

− More-privileged mode (VMX root) for VMM

• Two new transitions

− VM entry to non-root operation

− VM exit to root operation

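[Figure: virtual machines (VMs), each with apps in ring 3 and an OS in ring 0, run in VMX non-root operation above the VM monitor (VMM) in VMX root operation; VM exits pass control to the VMM and VM entries return it to a VM.]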

• Execution controls determine when exits occur

− Access to privilege state, occurrence of exceptions, etc.

− Flexibility provided to minimize unwanted exits

• VM Control Structure (VMCS) controls VT-x operation

− Also holds guest and host state
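With execution controls deciding which events exit, the VMM's core is a loop: enter the guest, wait for a VM exit, handle it, re-enter. A minimal sketch in which the hardware entry (VMLAUNCH/VMRESUME) is replaced by a hypothetical `vm_enter` that replays a scripted sequence of exits; the exit-reason names are also hypothetical (real VT-x reports a numeric exit reason in the VMCS).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical exit reasons for this sketch. */
enum exit_reason { EXIT_CPUID, EXIT_IO, EXIT_HLT };

/* Stand-in for a virtual CPU: exits are replayed from a script. */
struct fake_vcpu {
    const enum exit_reason *script;
    size_t len, next;
    unsigned cpuid_exits, io_exits;
};

/* "VM entry": the guest runs until its next VM exit, whose reason is returned. */
static enum exit_reason vm_enter(struct fake_vcpu *v)
{
    assert(v->next < v->len);
    return v->script[v->next++];
}

/* Root-mode loop: enter the guest, dispatch on the exit reason, re-enter.
 * Returns when the guest executes HLT. */
static void vmm_run(struct fake_vcpu *v)
{
    for (;;) {
        enum exit_reason r = vm_enter(v);
        switch (r) {
        case EXIT_CPUID: v->cpuid_exits++; break; /* a real VMM would emulate CPUID here */
        case EXIT_IO:    v->io_exits++;    break; /* ...or forward the access to a device model */
        case EXIT_HLT:   return;                  /* guest is idle: end this sketch */
        }
    }
}
```

Every pass through the loop is a full exit/entry round trip, which is why the slide stresses execution controls that minimize unwanted exits.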

Page 34

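[Figure: IA-32 operation (ring 0 and ring 3) versus VT-x operations: the VMM runs in VMX root operation; VM 1, VM 2, … VM n each have ring 0 and ring 3 in VMX non-root operation and each is described by its own VMCS (VMCS1, VMCS2, … VMCSn). VMXON enters VMX operation, VMLAUNCH/VMRESUME enter a VM, and a VM exit returns to the VMM.]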

Page 35

VT-x new instructions

• VMXON and VMXOFF

− To enter and exit VMX root mode.

• VMLAUNCH: used on the initial transition from VMM to guest

− Enters VMX non-root operation mode

• VMRESUME: used on subsequent entries

− Enters VMX non-root operation mode

− Loads guest state and exit criteria from the VMCS

• VM exit (an event, not an instruction)

− Transition from guest to VMM

− Enters VMX root operation mode

− Saves guest state in the VMCS

− Loads VMM state from the VMCS

• VMPTRST and VMPTRLD

− To read and write the current-VMCS pointer

• VMREAD, VMWRITE, VMCLEAR

− Read from, write to and clear a VMCS

• VMCALL

− Hypervisor entry point for hypercalls from the guest

Page 36

VT-x Data Structures (VMCS)

• The VMCS is a memory region (up to 4 KB) which specifies the VM environment

• Physical addressing only; accessed through the VMREAD/VMWRITE interface

• Loads and stores of the current VMCS pointer go through VMPTRLD and VMPTRST

• VMRESUME is used if the same VMCS is being resumed on a processor. Otherwise, VMCLEAR followed by VMLAUNCH.

VMCS areas:

• Guest save state (EIP, ESP, EFLAGS, IDTR, segment registers etc.): processor state saved on VM exits and loaded on VM entries

• Host save state (CR3, EIP set to monitor entry, EFLAGS etc.): processor state loaded on VM exits

• VM execution controls (external interrupt exiting, interrupt window exiting, CR3 load/store exiting, VPID enable, VPID value, EPT enable, EPTP…): control processor behaviour in non-root mode

• VM exit controls (MSR save etc.): control VM exits

• VM entry controls (interrupts on entry, MSR loads etc.): control VM entries
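The VMRESUME-or-VMCLEAR+VMLAUNCH rule above is a small state machine over a VMCS's launch state. A minimal sketch with hypothetical names that tracks that state in software; it assumes the VMCS was VMCLEARed when it was first set up and that each VMLAUNCH succeeds.

```c
#include <assert.h>

/* Launch state of a VMCS, tracked in software for illustration. */
enum vmcs_state { VMCS_CLEAR, VMCS_LAUNCHED };

struct vmcs_tracker {
    enum vmcs_state state;
    int cpu;               /* processor the VMCS was last run on, -1 if none */
};

enum entry_insn { USE_VMLAUNCH, USE_VMRESUME };

/* Chooses how to enter the guest on `cpu`: VMRESUME only when the same VMCS
 * is resumed on the same processor; otherwise it must first be VMCLEARed
 * (on the CPU where it was active) and then started with VMLAUNCH. */
static enum entry_insn choose_entry(struct vmcs_tracker *t, int cpu, int *needs_vmclear)
{
    *needs_vmclear = 0;
    if (t->state == VMCS_LAUNCHED && t->cpu == cpu)
        return USE_VMRESUME;
    if (t->state == VMCS_LAUNCHED)
        *needs_vmclear = 1;       /* VMCLEAR resets the launch state to clear */
    t->state = VMCS_LAUNCHED;     /* state after a successful VMLAUNCH */
    t->cpu = cpu;
    return USE_VMLAUNCH;
}
```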

Page 37

VT-x solution to x86 virtualization challenges

[Figure: guest apps in ring 3 and the guest OS in ring 0 run above the VMM in “ring -1” (VMX root operation), which runs on the hardware.]

• SYSENTER, CLI/STI: SYSENTER calls into the guest OS; CLI/STI optimized to deliver virtual interrupts to the VM

• Segment reversibility: clean context switch on VM entry/exit

• SGDT/SIDT/SLDT/STR/PUSHF/SMSW/POP/PUSH: all reads return privilege level 0; GDT/LDT owned by the guest OS

• CPUID: always causes a VM exit, so the VMM can emulate it

• LAR/LSL/VERR/VERW/CALL/INT/JMP/RET: guest OS in full control of segment/task descriptors

• No ring compression: all rings available

• POPF: guest EFLAGS.IF no longer masks physical interrupts (with external-interrupt exiting, they cause VM exits)

• No need for the VMM to share an address space with the guest: no address space compression

Page 38

Intel EPT/AMD NPT

[Figure: a guest linear address is translated by the x86 guest page tables (rooted at gCR3) into a guest physical address, which the host GPT page tables (rooted at the GPT base pointer, hCR3) translate into a host physical address; the result is cached in the TLB and caches.]

• GPT directly translates guest virtual addresses into host physical addresses on the fly

− Uses the guest page table and the host-based page table

• Significant reduction in “exit frequency”

• Guest page table modifications are as fast as native

• Page faults require no exits

• Context switches require no exits

− No shadow page table memory overhead

• However, TLB misses become more expensive (the “memsweep effect”); this is mitigated by large guest pages

• AMD ASID/Intel VPID tag TLB entries per address space, reducing TLB purge overheads
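The cost of a TLB miss under nested paging can be estimated with a common back-of-the-envelope count, assuming no caching of intermediate translations (real processors do cache them). With an n-level guest table and an m-level host table, each of the n guest table references is itself a guest-physical address that needs an m-step host walk plus the read, and the final guest-physical address needs one more host walk.

```latex
\begin{aligned}
R(n, m) &= n\,(m + 1) + m = (n + 1)(m + 1) - 1 \\
R(4, 4) &= 24 \quad \text{memory references, vs. } 4 \text{ natively} \\
R(3, 4) &= 19 \quad \text{when the guest uses 2-MB pages}
\end{aligned}
```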

Page 39

VT-x extension: Extended Page Table (EPT)

• All guest-physical addresses go through extended page tables

• Includes address in CR3, address in PDE, address in PTE, etc.
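To make "all guest-physical addresses go through EPT" concrete, a toy walk in which the guest uses 32-bit two-level paging and the EPT is flattened to a single lookup table (a simplification: real EPT is itself a multi-level structure). CR3, the PDE's pointer and the PTE's frame are all guest-physical, so each one is translated before use. All names and frame numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NFRAMES  8
#define ENTRIES  1024              /* 32-bit two-level guest paging, 4-KB pages */
#define PRESENT  1u
#define FRAME(x) ((x) & ~0xFFFu)

static uint32_t host_mem[NFRAMES][ENTRIES]; /* host physical memory, one row per 4-KB frame */
static uint32_t ept[NFRAMES];               /* toy flat EPT: guest frame -> host frame */
static unsigned ept_lookups;

/* Every guest-physical address is translated through the EPT. */
static uint32_t ept_translate(uint32_t gpa)
{
    uint32_t gfn = gpa >> 12;
    assert(gfn < NFRAMES);
    ept_lookups++;
    return (ept[gfn] << 12) | (gpa & 0xFFF);
}

/* Guest linear address -> host physical address. */
static uint32_t guest_walk(uint32_t gcr3, uint32_t gla)
{
    uint32_t pd  = ept_translate(FRAME(gcr3));             /* CR3 holds a gPA */
    uint32_t pde = host_mem[pd >> 12][gla >> 22];
    assert(pde & PRESENT);
    uint32_t pt  = ept_translate(FRAME(pde));              /* PDE holds a gPA */
    uint32_t pte = host_mem[pt >> 12][(gla >> 12) & 0x3FF];
    assert(pte & PRESENT);
    return ept_translate(FRAME(pte) | (gla & 0xFFF));      /* PTE holds a gPA */
}
```

In this flattened model a two-level guest walk needs three EPT lookups; with a real multi-level EPT each lookup is itself a walk, which is where the extra TLB-miss cost comes from.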

Page 40

VT-x extension: Virtual Processor IDs (VPID)

Tagged TLB example:

Tag     Virtual address   Physical address
Host    0x1000            0x10001000
Host    0x2000            0x10002000
Host    0x3000            0x10003000
Host    0x4000            0x10004000
Guest   0x1000            0xFFF01000
Guest   0x2000            0xFFF02000
Guest   0x3000            0xFFF03000
Guest   0x4000            0xFFF04000

• The idea of a tagged TLB is that each TLB entry is “tagged” with an identifier

• Having such a tag allows the TLB entries not to be “flushed” when switching between the host and a guest

• VPID is activated if the new “enable VPID” control bit is set in the VMCS
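A toy fully associative TLB keyed by (tag, virtual page) shows why the table above needs no flush: the host's and the guest's translations of virtual page 0x1 coexist and are told apart by the tag. Names, sizes and round-robin replacement are hypothetical; VPID 0 is used for the host, as in VT-x, where VPID 0 is reserved for VMX root operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SIZE 16

struct tlb_entry { bool valid; uint16_t vpid; uint64_t vpn; uint64_t pfn; };
static struct tlb_entry tlb[TLB_SIZE];
static unsigned next_victim;              /* round-robin replacement pointer */

/* A hit requires both the virtual page number and the tag to match. */
static bool tlb_lookup(uint16_t vpid, uint64_t vpn, uint64_t *pfn)
{
    for (unsigned i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpid == vpid && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;
        }
    }
    return false;
}

static void tlb_fill(uint16_t vpid, uint64_t vpn, uint64_t pfn)
{
    tlb[next_victim] = (struct tlb_entry){ true, vpid, vpn, pfn };
    next_victim = (next_victim + 1) % TLB_SIZE;
}

/* Without tags, every host<->guest switch would need this full flush. */
static void tlb_flush_all(void)
{
    for (unsigned i = 0; i < TLB_SIZE; i++)
        tlb[i].valid = false;
}
```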

Page 41

VT-x extension: CPUID spoofing (Flex Migration)

• Allows software to “spoof” the CPUID feature bits (i.e. make the CPUID feature bits appear different than they really are).

• This is the same as the CPUID spoofing feature that the current VT processors have.

[Figure: live VM migration between older/existing servers and newer servers: 32-bit single core (pre-2004), 64-bit single core (2004+), 64-bit dual/quad-core Intel® Core™ (2006+).]
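The idea behind spoofing for migration is set arithmetic: expose to the VM only the features that every server in the migration pool supports, so a VM started on a newer server never depends on a feature missing on an older one. A minimal sketch over one made-up 32-bit feature word; which CPUID leaves and registers can actually be masked depends on the processor and VMM, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Features a VM may see so it can live-migrate anywhere in the pool:
 * the intersection of every host's feature bits. */
static uint32_t common_features(const uint32_t *host_features, size_t nhosts)
{
    assert(nhosts > 0);
    uint32_t mask = host_features[0];
    for (size_t i = 1; i < nhosts; i++)
        mask &= host_features[i];
    return mask;
}

/* The value reported to the guest when it executes CPUID ("spoofed"). */
static uint32_t spoofed_cpuid(uint32_t real_features, uint32_t pool_mask)
{
    return real_features & pool_mask;
}
```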

Page 42

Intel VT-d Architecture Detail

[Figure: DMA requests (device ID, virtual address, length) from devices on bus 0 … bus N … bus 255 (e.g. Dev 0, Func 0; Dev P, Func 1; Dev P, Func 2; Dev 31, Func 7) enter the DMA remapping engine, which contains a translation cache, a context cache and fault generation. The engine consults memory-resident partitioning and translation structures: device assignment structures that select per-device address translation structures (devices D1, D2), built from 4KB page tables that point to page frames. The result is a memory access with a system physical address.]

Page 43

VT-d: Remapping Structures

• VT-d Device Assignment Entry (128 bits): P (present) bit, controls and address space root pointer in bits 0–63; address width, domain ID and extended controls in bits 64–127; remaining bits reserved

• VT-d Page Table Entry (64 bits, bits 0–63): R, W and SP bits, page-frame / page-table address, extended controls; remaining bits available or reserved

• VT-d supports hierarchical page tables for address translation

− Page directories and page tables are 4 KB in size

− 4KB base page size with support for larger page sizes

− Support for DMA snoop control through page table entries

• VT-d hardware selects the page table based on the source of the DMA request

− Requestor ID (bus / device / function) in the request identifies the DMA source

Page 44

VT-d: Hardware Page Walk

[Figure: the requestor ID (bus, bits 15–8; device, bits 7–3; function, bits 2–0) indexes the device assignment tables from their base. The example device assignment table entry specifies a 4-level page table. The DMA virtual address is split into a level-4 table offset (bits 47–39), level-3 table offset (bits 38–30), level-2 table offset (bits 29–21), level-1 table offset (bits 20–12) and page offset (bits 11–0), with the upper bits (63–48) zero. Hardware walks the level-4, level-3, level-2 and level-1 page tables to reach the page.]
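The two inputs to the walk above can be decoded with shifts: the requestor ID selects the device's assignment entry, and the DMA virtual address yields one 9-bit offset per level plus a 12-bit page offset. A minimal sketch with hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* PCI requestor ID carried in each DMA request. */
struct bdf { unsigned bus, dev, func; };

static struct bdf decode_requester_id(uint16_t rid)
{
    struct bdf b = { rid >> 8, (rid >> 3) & 0x1F, rid & 0x7 }; /* bits 15-8, 7-3, 2-0 */
    return b;
}

/* 9-bit table offset for `level` (4 = top, 1 = last) of a 4-level walk
 * of a DMA virtual address with 4-KB pages. */
static unsigned vtd_level_offset(uint64_t dva, int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned)(dva >> (12 + 9 * (level - 1))) & 0x1FF;
}
```

For example, requestor ID 0x0300 is bus 3, device 0, function 0 (the "3, 0, 0" device from the PCI overview), and 0x00FF is bus 0, device 31, function 7.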

Page 45

PCI SIG IOV Overview

• PCI SIG is standardizing mechanisms that enable PCIe devices to be directly shared

− Single-Root IOV: direct sharing between SIs on a single system

− Multi-Root IOV: direct sharing between SIs on multiple systems

• PCI-SIG IOV specification covers the “north-side” of the device

[Figure: PCIe Single-Root IOV, several SIs and a VI on one system sharing a device; PCIe Multi-Root IOV, SIs and VIs on multiple systems sharing devices.]

Page 46

PCI SIG IOV Terminologies

• System Image (SI)

− SW, e.g., a guest OS, to which virtual and physical devices can be assigned

• Virtual Intermediary (VI)

− Performs resource allocation, isolation, management and event handling

• PCIM – PCI Manager

− Controls configuration, management and error handling of PFs (physical functions) and VFs (virtual functions)

− May be in SW and/or firmware

− May be integrated into a VI

• Translation Agent (TA)

− Uses the ATPT to translate PCI bus addresses into platform addresses

• Address Translation and Protection Table (ATPT)

− Validates access rights of incoming PCI memory transactions

− Translates PCI addresses into platform physical addresses

[Figure: a VI, an SR-PCIM and SIs above a PCIe switch connecting device functions (F).]

Page 47

VT-c: Virtual Machine Device Queues (VMDq)

• On the receive path, VMDq provides a hardware ‘sorter’ or classifier that does the pre-work for the VMM of deciding which VM each packet should go to. The NIC or LAN silicon performs this as a hardware assist for the VMM layer.
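The "sorter" above amounts to a lookup from a frame's destination MAC address to a per-VM receive queue. A minimal sketch with a hypothetical filter table: queue 0 is used here as a default queue that the VMM would still sort in software, and broadcast/multicast handling is not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NQUEUES 4
#define DEFAULT_QUEUE 0

/* Hypothetical filter table: the destination MAC programmed for each VM's queue. */
static uint8_t queue_mac[NQUEUES][6];
static int queue_in_use[NQUEUES];

/* Classify a received Ethernet frame by its destination address (bytes 0..5). */
static int classify_rx(const uint8_t *frame)
{
    for (int q = 1; q < NQUEUES; q++)
        if (queue_in_use[q] && memcmp(frame, queue_mac[q], 6) == 0)
            return q;
    return DEFAULT_QUEUE;   /* unmatched traffic: left for the VMM to sort */
}
```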

Page 48

Intel / AMD Comparison

Timeline columns: 2005, 2006, 2007, 2008, unknown

Intel:

• VT-x (2005): VMLAUNCH, VMRESUME, VMREAD, VMWRITE; VMCS (VM control structure)

• LT (2006): SENTER, AC

• VT-d (2007): IOMMU

• VT-d2: IOMMU

• VT-x2: Extended Page Tables (EPT), Virtual Processor IDs (VPID)

AMD:

• SVM: VMRUN; VMCB (VM control block); ASID tagged TLB (performance); paged real mode; SKINIT (security); DMA exclusion vector (security)

• SVM-2: nested page tables; improved #VMEXIT; decode assist

• IOMMU: PCI-SIG ATS

• SVM-3? (unknown)

Page 49

Deja-Vu – Back to the future

• What VT calls "non-root mode", and Pacifica (AMD-V) calls "guest mode", was called "interpretive execution" on the IBM VM/370 and VM/ESA mainframes.

• VT's "vmlaunch" instruction and Pacifica's "vmrun" were called "SIE" (Start Interpretive Execution).

• Intel's "VMCS" and AMD's "VMCB" were called the "state description" on the IBM mainframes.

• IBM also defined the concept of shadow translation tables and a dual page-table walk in hardware.

• IBM also defined an interpreted SIE for nested hypervisor support (not yet in Intel/AMD).

Page 50

Agenda

• Server Virtualization technologies

− Overview and history

− VMM architectures

− Criteria for a processor to be virtualizable

• X86 Virtualization

− The x86 processor architecture overview

− Virtualization challenges in x86 processors

• Software techniques for virtualization

− CPU virtualization (Binary Translation/Para-virtualization)

− Memory virtualization (shadow tables/Xen writeable page tables)

− I/O virtualization (device emulation)

• Hardware techniques for virtualization

− CPU virtualization (VT-x/AMD-V)

− Memory virtualization (Intel EPT/AMD NPT)

− I/O virtualization (VT-d/VT-d2/PCI SIG SR-IOV/MR-IOV)

• Future Trends

− Manageability

− Security

Page 51

Future Trends

• Secure hypervisors – the hypervisor itself, like an OS, can have security holes.

• BluePill attacks – subverting the hypervisor

• Trusted virtualization – virtualizing TPMs for use by guest virtual machines

• Trusted virtualization – how do we trust the VMM? Intel’s LT (LaGrande) and AMD’s Presidio introduce architectural extensions for security

• Firewalls to protect guests; XenMotion security hole

• Storage QoS – FC NPIV, Storage vMotion

• Datacenter/lifecycle management (Virtualization 2.0)

− Opsware PAS (now HP Operations Orchestrator)

− Novell ZENworks Orchestrator

− VMware Lifecycle Manager

Page 52

References

• D. L. Osisek, K. M. Jackson, and P. H. Gum. ESA/390 interpretive-execution architecture, foundation for VM/ESA. IBM Systems Journal, 30(1):34–51, 1991.

• John Scott Robin and Cynthia E. Irvine. Analysis of the Intel Pentium’s ability to support a secure virtual machine monitor. In Proceedings of the Ninth USENIX Security Symposium, Denver, Colorado, August 14–17, 2000, page 275.

• Keith Adams and Ole Agesen. A comparison of software and hardware techniques for x86 virtualization. Operating Systems Review, 40(5):2–13, December 2006.

• PCI IOV talks at WinHEC and HP by Michael Krause

• VMWorld 2007 talk by Ole Agesen

• Intel IDF 2007/2008 presentations