Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement...

Post on 22-Apr-2020

2 views 0 download

Transcript of Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement...

Virtualization

CSCE678

Announcement

• Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t watched)

• All optional readings are posted If you don’t like your material, let me know

2

VirtualizationBasics

3

Virtualization in Cloud

• Multi-tenancy:splitting hardware resources for multiple tenants

• Security Isolation:tenants are isolated from each other

• Legacy software:tenants run their OSes and applications without modification

• Flexibility of deployment:tenants can be easily deployed, migrated, or removed from a physical host

4

Physical Machine

Physical vs Virtual Machines

5

Virtual Machine

(Guest)OS

Application

Virtual Machine

(Guest)OS

Application

Virtual Machine MonitorPhysical Machine

OS

Application

Physical Machine

OS

Application

Without virtualization With virtualization

Definition of Virtualization

Popek & Goldberg (1974):

• Identical environment:Programs in VMs should have identical effects as running on real machines

• Efficiency: should not add too much overheads

• Resource control:Virtual machine monitor (VMM) has complete control of hardware resources

6

Identicality

7

Physical Machine

OS

Application

Physical Machine

Virtual Machine

(Guest)OS

Application

VMM

Startprogram

ProcessData

Outputresults

Startprogram

ProcessData

Outputresults

Should behave “exactly” the same!(except VM might be slower)

Challenges for Virtualization

• Virtualizing user programs is not hard• A time-sharing OS (i.e., UNIX) will work

• Virtualizing a whole OS is the main challenge• According to Popek & Goldberg (1974),

most architectures such as x86 are not “virtualizable”• VMWare workstation is designed to solve the x86

virtualization problem

8

OS Background

9

What Does An OS Consist Of?

• Where most applications run

• Many spaces in parallel

• Cannot access states of other applications unless allowed

10

User Space

Kernel• Control hardware and resources

• Sharing states (e.g., pipes)

Differentiating User Space & Kernel

• An architecture typically have two protection rings• Ring 0: kernel• Ring 3: user spaces

• Only ring 0 can run privileged instructions• Setting up page tables• Setting up interrupt handlers• Changing user contexts• Assigning segment registers (%cs, %ds, %fs, %gs, etc)• … etc

11

Ring 1 & 2 are mostly not usedor even not supported

Context Switches

12

User Space

(run in ring 3)

Kernel(run in ring 0)

• System calls(syscall instruction)

• Exceptions(e.g., page faults)

• Interrupts (e.g., timers)

iretinstruction

Virtual Memory Management

13

Kernel Application 1 Application 2

Physical Memory

Page Page Page Page Page

Page Page Page Page Page

Demand paging:Allocating virtual memoryas needed.

Virtual Address Physical Address

14

001000111 110000111 111001000 001110011 011011100000PageTable(cr3) …

……

……

……

Page

OffsetPhysical Address (PA)= Physical page address + page offset

Virtual address (48 bits in binary)

011011100000

001000111

110000111111001000

001110011

Page Fault Handling

15

001000111 110000111 111001000 001110011 011011100000

Kernel

Trap into kernel

NewPage

Alloc

Upd

ate

Virtual address (48 bits in binary)

PageTable(cr3) …

……

……

……

001000111

110000111111001000

001110011

User Space

Invalid

Virtualization:The VMWare Approach

16

Kernel vs VMM

• An OS relies on hardware protections to ensure all resources are controlled by kernel

• VMM has the exact same requirement

How to subvert the control of OS kernel whenVMM is in charge?

17

Trap-and-Emulate

• Any privileged instruction needs to be trapped into VMM before the OS kernel

18

User Space(ring 3)

Kernel(ring 1 or ring 3)

iretinstruction

System calls,Exceptions,or Interrupts

VMM(ring 0)

Trap

Emulateand return

Example: Timer Interrupts

19

GuestKernel

A

GuestKernel

B

Set alarm in 1 second

Set alarm in 3 second

VMM

Trap Trap

1. Set alarm in 1 second on core 1

2. Set alarm in 3 seconds on core 2

Core 1 Core 2

Example: Page Table

20

GuestKernel

Set page table (cr3)

VMM

Trap

Set page table (cr3)for the guest kernel

But what aboutpage table entries?

GuestPageTable …

……

……

001000111

110000111111001000

001110011 New Page

Shadow Page Table

21

GuestKernel

ShadowPageTable(cr3)

VMM Update

……

……

……

001000111

110000111111001000

001110011 New Page

Read-only

Pagefault

Read-only Read-only Read-only

Update

Memory Management

• Requires VMM to trace the changes in the guest page table (i.e., page tracing)

• Pages for memory-mapped IO Read or write to the pages trigger interaction

with I/O devices Also trap-and-emulated by VMM

22

World Switch

23

GuestUser

Space

GuestKernel

VMM

GuestUser

Space

GuestKernel

VMM

GuestUser

Space

GuestKernel

VMM

GuestUser Mode

GuestKernel Mode

VMMMode

Trap Trap

Switch

Switch

VMM injects memory into guest spacefor efficient trapping.

Protectedfrom guest

x86 Was Not Virtualizable

• Popek & Goldberg: all privileged instructions need to be trapped in non-kernel mode (ring > 0)

• Many x86 instructions are not trappable• Example 1: PUSH %cs pushes current protection level on

the stack, so guest kernel can see ring != 0• Example 2: POPF can enable/disable interrupt in ring 0,

but silently ignored in ring > 0• VMM never gets the chance to emulate!

24

Dynamic Binary Translation

25

GuestData

GuestCode

On-diskimage

GuestData

Native Executable

Code

In-memory

VMMmodifies thebinary code

Untrapped privileged instructionscan be either:

(1) Replaced with emulation codewhich originally runs in VMM

(2) Injected with other trappableinstructions (e.g., syscall)

Virtualizing I/O Devices (1/2)• Assigning physical devices to guest kernels pose engineering

and security challenges because guest drivers can’t access devices directly

26

GuestKernel

HDD ETH VGA HDD ETH VGA

VMM

Drivers

GuestKernelDrivers

Virtualizing I/O Devices (1/2)• VMM creates virtual devices to multiplex access to

I/O resources

27

vHDD vETH vVGA vHDD vETH vVGA

VMM

HDD ETH VGA

GuestKernelDrivers

GuestKernelDrivers

Use of A Host Operating System

28

GuestKernel

GuestKernel

Virtual HW Virtual HWHost

OSVMM

Standard emulated devices:SCSI/IDE HDDs, e1000 eth cards,COM1/2 serial ports, VGA cards,keyboards, mouses, etc

Drivers

HDD ETH VGA

A host OS makes emulationmuch easier by relying onexisting OS implementation.

Virtualization:The Hardware Approach

29

Modern Virtualization Solutions

• Virtualization nowadays are assisted by hardware• x86 is now virtualizable• Hardware-assisted paging replaced shadow paging• Virtualization-friendly I/O devices

30

Fixing x86 Virtualization

• Intel VT or AMD-V fulfilled the Popek & Goldberg requirements

31

GuestKernel

(ring 0 non-root)

Trap by privilegedinstructions

VMM(ring 0 root)

VM Exit

VM Enter

VMCSSpecify which instructionscan be trapped(including ones that are previouslyuntrappable, e.g., POPF)

(Virtual MachineControl Structure)

VM Exits

32

GuestKernel

(ring 0 non-root)VMM(ring 0 root) MOV RAX->CR3

VMCS

Switch to VMCS.host_cr3

Guest-specificVMCS

Find out the new CR3 from RAX

Assign to VMCS.guest_cr3

VM Exit

Switch to new CR3

VMRESUMEinstruction

Memory Virtualization (1/3)• Intel VT-x (extended page table) & AMD SVM (nested page table)

33

……

……

001000111 110000111111001000

001110011

GuestPage Table(CR3)

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111001 011011100000gPA:

……

……

011001101 111001110101000110

000111001

ExtendedPage Table

(in VMCS)

hPA

Memory Virtualization (2/3)• Guest can update CR3 or page tables w/o trapping into VMM

34

……

……

001000111 110000111111001000

001110011

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111010 011011100000gPA:

……

……

011001101 111001110101000110

000111010

New Page

GuestPage Table(CR3)

ExtendedPage Table

(in VMCS)

Memory Virtualization (3/3)• VMM only manages the guest physical pages

35

……

……

001000111 110000111111001000

001110011

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111010 011011100000gPA:

……

……

011001101 111001110101000110

000111010

New Page

Invalid

Not your physical page!!!

GuestPage Table(CR3)

ExtendedPage Table

(in VMCS)

I/O Virtualization (1/2)• Intel VT-d and AMD-Vi allow direct I/O to assigned devices Virtual IOMMU (vIOMMU)

36

Guest Kernel

HDD ETH VGA HDD ETH VGA

VMM

Driver

DMA

Driver

DMA

Driver

DMA

Guest Kernel

Driver

DMA

Driver

DMA

Driver

DMA

Setup

Still can’t share devices!

I/O Virtualization (2/2)

• Single-Root I/O Virtualization (SR-IOV):A specification for sharing PCI-e devices with multiple guests

37

Guest Kernel

Driver

Guest Kernel

Driver

SR-IOV supportedethernet card

Presented as multiple devicescontrolled by PCI-evirtual functions (VFs)

VF VF