Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement...

37
Virtualization CSCE678

Transcript of Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement...

Page 1: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualization

CSCE678

Page 2: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Announcement

• Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t watched)

• All optional readings are posted If you don’t like your material, let me know

2

Page 3: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

VirtualizationBasics

3

Page 4: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualization in Cloud

• Multi-tenancy:splitting hardware resources for multiple tenants

• Security Isolation:tenants are isolated from each other

• Legacy software:tenants run their OSes and applications without modification

• Flexibility of deployment:tenants can be easily deployed, migrated, or removed from a physical host

4

Page 5: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Physical Machine

Physical vs Virtual Machines

5

Virtual Machine

(Guest)OS

Application

Virtual Machine

(Guest)OS

Application

Virtual Machine MonitorPhysical Machine

OS

Application

Physical Machine

OS

Application

Without virtualization With virtualization

Page 6: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Definition of Virtualization

Popek & Goldberg (1974):

• Identical environment:Programs in VMs should have identical effects as running on real machines

• Efficiency: should not add too much overheads

• Resource control:Virtual machine monitor (VMM) has complete control of hardware resources

6

Page 7: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Identicality

7

Physical Machine

OS

Application

Physical Machine

Virtual Machine

(Guest)OS

Application

VMM

Startprogram

ProcessData

Outputresults

Startprogram

ProcessData

Outputresults

Should behave “exactly” the same!(except VM might be slower)

Page 8: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Challenges for Virtualization

• Virtualizing user programs is not hard• A time-sharing OS (i.e., UNIX) will work

• Virtualizing a whole OS is the main challenge• According to Popek & Goldberg (1974),

most architectures such as x86 are not “virtualizable”• VMWare workstation is designed to solve the x86

virtualization problem

8

Page 9: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

OS Background

9

Page 10: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

What Does An OS Consist Of?

• Where most applications run

• Many spaces in parallel

• Cannot access states of other applications unless allowed

10

User Space

Kernel• Control hardware and resources

• Sharing states (e.g., pipes)

Page 11: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Differentiating User Space & Kernel

• An architecture typically have two protection rings• Ring 0: kernel• Ring 3: user spaces

• Only ring 0 can run privileged instructions• Setting up page tables• Setting up interrupt handlers• Changing user contexts• Assigning segment registers (%cs, %ds, %fs, %gs, etc)• … etc

11

Ring 1 & 2 are mostly not usedor even not supported

Page 12: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Context Switches

12

User Space

(run in ring 3)

Kernel(run in ring 0)

• System calls(syscall instruction)

• Exceptions(e.g., page faults)

• Interrupts (e.g., timers)

iretinstruction

Page 13: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtual Memory Management

13

Kernel Application 1 Application 2

Physical Memory

Page Page Page Page Page

Page Page Page Page Page

Demand paging:Allocating virtual memoryas needed.

Page 14: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtual Address Physical Address

14

001000111 110000111 111001000 001110011 011011100000PageTable(cr3) …

……

……

……

Page

OffsetPhysical Address (PA)= Physical page address + page offset

Virtual address (48 bits in binary)

011011100000

001000111

110000111111001000

001110011

Page 15: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Page Fault Handling

15

001000111 110000111 111001000 001110011 011011100000

Kernel

Trap into kernel

NewPage

Alloc

Upd

ate

Virtual address (48 bits in binary)

PageTable(cr3) …

……

……

……

001000111

110000111111001000

001110011

User Space

Invalid

Page 16: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualization:The VMWare Approach

16

Page 17: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Kernel vs VMM

• An OS relies on hardware protections to ensure all resources are controlled by kernel

• VMM has the exact same requirement

How to subvert the control of OS kernel whenVMM is in charge?

17

Page 18: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Trap-and-Emulate

• Any privileged instruction needs to be trapped into VMM before the OS kernel

18

User Space(ring 3)

Kernel(ring 1 or ring 3)

iretinstruction

System calls,Exceptions,or Interrupts

VMM(ring 0)

Trap

Emulateand return

Page 19: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Example: Timer Interrupts

19

GuestKernel

A

GuestKernel

B

Set alarm in 1 second

Set alarm in 3 second

VMM

Trap Trap

1. Set alarm in 1 second on core 1

2. Set alarm in 3 seconds on core 2

Core 1 Core 2

Page 20: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Example: Page Table

20

GuestKernel

Set page table (cr3)

VMM

Trap

Set page table (cr3)for the guest kernel

But what aboutpage table entries?

Page 21: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

GuestPageTable …

……

……

001000111

110000111111001000

001110011 New Page

Shadow Page Table

21

GuestKernel

ShadowPageTable(cr3)

VMM Update

……

……

……

001000111

110000111111001000

001110011 New Page

Read-only

Pagefault

Read-only Read-only Read-only

Update

Page 22: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Memory Management

• Requires VMM to trace the changes in the guest page table (i.e., page tracing)

• Pages for memory-mapped IO Read or write to the pages trigger interaction

with I/O devices Also trap-and-emulated by VMM

22

Page 23: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

World Switch

23

GuestUser

Space

GuestKernel

VMM

GuestUser

Space

GuestKernel

VMM

GuestUser

Space

GuestKernel

VMM

GuestUser Mode

GuestKernel Mode

VMMMode

Trap Trap

Switch

Switch

VMM injects memory into guest spacefor efficient trapping.

Protectedfrom guest

Page 24: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

x86 Was Not Virtualizable

• Popek & Goldberg: all privileged instructions need to be trapped in non-kernel mode (ring > 0)

• Many x86 instructions are not trappable• Example 1: PUSH %cs pushes current protection level on

the stack, so guest kernel can see ring != 0• Example 2: POPF can enable/disable interrupt in ring 0,

but silently ignored in ring > 0• VMM never gets the chance to emulate!

24

Page 25: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Dynamic Binary Translation

25

GuestData

GuestCode

On-diskimage

GuestData

Native Executable

Code

In-memory

VMMmodifies thebinary code

Untrapped privileged instructionscan be either:

(1) Replaced with emulation codewhich originally runs in VMM

(2) Injected with other trappableinstructions (e.g., syscall)

Page 26: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualizing I/O Devices (1/2)• Assigning physical devices to guest kernels pose engineering

and security challenges because guest drivers can’t access devices directly

26

GuestKernel

HDD ETH VGA HDD ETH VGA

VMM

Drivers

GuestKernelDrivers

Page 27: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualizing I/O Devices (1/2)• VMM creates virtual devices to multiplex access to

I/O resources

27

vHDD vETH vVGA vHDD vETH vVGA

VMM

HDD ETH VGA

GuestKernelDrivers

GuestKernelDrivers

Page 28: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Use of A Host Operating System

28

GuestKernel

GuestKernel

Virtual HW Virtual HWHost

OSVMM

Standard emulated devices:SCSI/IDE HDDs, e1000 eth cards,COM1/2 serial ports, VGA cards,keyboards, mouses, etc

Drivers

HDD ETH VGA

A host OS makes emulationmuch easier by relying onexisting OS implementation.

Page 29: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Virtualization:The Hardware Approach

29

Page 30: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Modern Virtualization Solutions

• Virtualization nowadays are assisted by hardware• x86 is now virtualizable• Hardware-assisted paging replaced shadow paging• Virtualization-friendly I/O devices

30

Page 31: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Fixing x86 Virtualization

• Intel VT or AMD-V fulfilled the Popek & Goldberg requirements

31

GuestKernel

(ring 0 non-root)

Trap by privilegedinstructions

VMM(ring 0 root)

VM Exit

VM Enter

VMCSSpecify which instructionscan be trapped(including ones that are previouslyuntrappable, e.g., POPF)

(Virtual MachineControl Structure)

Page 32: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

VM Exits

32

GuestKernel

(ring 0 non-root)VMM(ring 0 root) MOV RAX->CR3

VMCS

Switch to VMCS.host_cr3

Guest-specificVMCS

Find out the new CR3 from RAX

Assign to VMCS.guest_cr3

VM Exit

Switch to new CR3

VMRESUMEinstruction

Page 33: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Memory Virtualization (1/3)• Intel VT-x (extended page table) & AMD SVM (nested page table)

33

……

……

001000111 110000111111001000

001110011

GuestPage Table(CR3)

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111001 011011100000gPA:

……

……

011001101 111001110101000110

000111001

ExtendedPage Table

(in VMCS)

hPA

Page 34: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Memory Virtualization (2/3)• Guest can update CR3 or page tables w/o trapping into VMM

34

……

……

001000111 110000111111001000

001110011

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111010 011011100000gPA:

……

……

011001101 111001110101000110

000111010

New Page

GuestPage Table(CR3)

ExtendedPage Table

(in VMCS)

Page 35: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

Memory Virtualization (3/3)• VMM only manages the guest physical pages

35

……

……

001000111 110000111111001000

001110011

001000111 110000111 111001000 001110011 011011100000gVA:

011001101 111001110 101000110 000111010 011011100000gPA:

……

……

011001101 111001110101000110

000111010

New Page

Invalid

Not your physical page!!!

GuestPage Table(CR3)

ExtendedPage Table

(in VMCS)

Page 36: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

I/O Virtualization (1/2)• Intel VT-d and AMD-Vi allow direct I/O to assigned devices Virtual IOMMU (vIOMMU)

36

Guest Kernel

HDD ETH VGA HDD ETH VGA

VMM

Driver

DMA

Driver

DMA

Driver

DMA

Guest Kernel

Driver

DMA

Driver

DMA

Driver

DMA

Setup

Still can’t share devices!

Page 37: Virtualization - Texas A&M Universitycourses.cse.tamu.edu/chiache/csce678/s19/slides/...Announcement • Lecture video for Jan 18 is available on Piazza and blackboard (if you haven’t

I/O Virtualization (2/2)

• Single-Root I/O Virtualization (SR-IOV):A specification for sharing PCI-e devices with multiple guests

37

Guest Kernel

Driver

Guest Kernel

Driver

SR-IOV supportedethernet card

Presented as multiple devicescontrolled by PCI-evirtual functions (VFs)

VF VF