Bare-Metal Performance for x86 I/O Virtualization
Muli Ben-Yehuda
Technion & IBM Research
HiPEAC Autumn Computing Systems Week in Barcelona
Muli Ben-Yehuda (Technion & IBM Research) Bare-Metal Perf. for I/O Virtualization HiPEAC CSW Nov, 2011 1 / 23
Background: x86 machine virtualization
- Running multiple different unmodified operating systems
- Each in an isolated virtual machine
- Simultaneously
- On the x86 architecture
- Many uses: live migration, record & replay, testing, security, ...
- Foundation of IaaS cloud computing
- Used nearly everywhere
The problem is performance
- Machine virtualization can reduce performance by orders of magnitude [Adams06, Santos08, Ram09, Ben-Yehuda10, Amit11, ...]
- Overhead limits use of virtualization in many scenarios
- We would like to make it possible to use virtualization everywhere
- Where does the overhead come from?
The origin of overhead
- Popek and Goldberg's virtualization model [Popek74]: trap and emulate
- Privileged instructions trap to the hypervisor
- Hypervisor emulates their behavior
- Traps cause an exit
- I/O intensive workloads cause many exits
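The trap-and-emulate loop can be sketched as a toy model. This is only an illustration of why I/O-heavy instruction streams produce many exits; the instruction names, the `PRIVILEGED` set, and the `ToyHypervisor` class are hypothetical and do not describe any real hypervisor.

```python
# Toy model of trap-and-emulate: privileged instructions trap to the
# hypervisor (an "exit"); everything else runs directly on the CPU.
# All names are hypothetical, chosen for illustration only.

PRIVILEGED = {"out", "in", "wrmsr"}  # assumed subset of trapping instructions


class ToyHypervisor:
    def __init__(self):
        self.exits = 0

    def emulate(self, instr):
        # A real hypervisor would decode the instruction and update
        # virtual CPU or device state; here we only count the exit.
        self.exits += 1


def run_guest(instructions, hypervisor):
    """Run a guest instruction stream; return how many ran without an exit."""
    direct = 0
    for instr in instructions:
        if instr in PRIVILEGED:
            hypervisor.emulate(instr)  # trap: guest -> hypervisor -> guest
        else:
            direct += 1
    return direct


hv = ToyHypervisor()
compute_bound = ["add", "mul", "add", "load"]
io_bound = ["out", "load", "out", "in", "out"]

run_guest(compute_bound, hv)
exits_compute = hv.exits
run_guest(io_bound, hv)
exits_io = hv.exits - exits_compute
print(exits_compute, exits_io)
```

In this sketch the compute-bound stream never exits, while four of the five instructions in the I/O-bound stream do.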
I/O virtualization via device emulation
[Figure: I/O path with device emulation. Labels: GUEST (device driver), HOST (device emulation, device driver); steps 1 to 4.]
- Emulation is usually the default [Sugerman01]
- Works for unmodified guests out of the box
- Very low performance, due to many exits on the I/O path
I/O virtualization via paravirtualized devices
[Figure: I/O path with paravirtualized devices. Labels: GUEST (front-end virtual driver), HOST (back-end virtual driver, device driver); steps 1 to 3.]
- Hypervisor-aware drivers and "devices" [Barham03, Russell08]
- Requires new guest drivers
- Requires hypervisor involvement on the I/O path
I/O virtualization via device assignment
[Figure: I/O path with device assignment. Labels: GUEST (device driver), HOST.]
- Bypass the hypervisor on the I/O path [Levasseur04, Ben-Yehuda06]
- SR-IOV devices provide sharing in hardware
- Better performance than paravirtual, but far from native
Comparing I/O virtualization methods
| IOV method | Throughput (Mb/s) | CPU utilization |
|---|---|---|
| bare-metal | 950 | 20% |
| device assignment | 950 | 25% |
| paravirtual | 950 | 50% |
| emulation | 250 | 100% |
- netperf TCP_STREAM sender on 1Gb/s Ethernet (16K msgs)
- Device assignment is the best performing option
- Device assignment still uses 25% more CPU than bare metal (25% vs. 20% utilization). Why?
"The Turtles Project: Design and Implementation of Nested Virtualization", Ben-Yehuda, Day, Dubitzky, Factor, Har'El, Gordon, Liguori, Wasserman and Yassour, OSDI '10
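One way to read the comparison table is as CPU spent per unit of throughput, relative to bare metal. The sketch below does that arithmetic on the figures quoted in the table; treating "worse" as "CPU cost per Mb/s" is an interpretation made here, and the `relative_cpu_cost` helper is a hypothetical name.

```python
# Figures copied from the comparison table:
# method -> (throughput in Mb/s, CPU utilization as a fraction).
results = {
    "bare-metal": (950, 0.20),
    "device assignment": (950, 0.25),
    "paravirtual": (950, 0.50),
    "emulation": (250, 1.00),
}

base_tput, base_cpu = results["bare-metal"]
base_cost = base_cpu / base_tput  # CPU fraction spent per Mb/s on bare metal


def relative_cpu_cost(method):
    """CPU spent per Mb/s, relative to bare metal (1.0 means equal)."""
    tput, cpu = results[method]
    return (cpu / tput) / base_cost


for method in results:
    print(f"{method}: {relative_cpu_cost(method):.2f}x")
```

Under this reading, device assignment costs 1.25x the CPU of bare metal per Mb/s, paravirtual 2.5x, and emulation 19x.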
What does it mean, to do I/O?
- Programmed I/O (in/out instructions)
- Memory-mapped I/O (loads and stores)
- Direct memory access (DMA)
- Interrupts
Direct memory access (DMA)
- All modern devices access memory directly
- On bare-metal:
  - A trusted driver gives its device an address
  - Device reads or writes that address
- Protection problem: guest drivers are not trusted
- Translation problem: guest memory ≠ host memory
- Direct access: the guest bypasses the host
- What to do?
IOMMU
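As a rough illustration of how an IOMMU addresses both the translation and the protection problem, the sketch below models a per-device table from I/O virtual pages to host physical pages. The `ToyIOMMU` class, the device name, and the 4 KiB page size are assumptions made for this example; real IOMMUs use multi-level page tables and an IOTLB.

```python
# Toy IOMMU: a per-device table mapping I/O virtual page numbers to
# host physical page numbers. Names and page size are assumptions.

PAGE_SIZE = 4096


class IOMMUFault(Exception):
    """Raised when a device touches an address that is not mapped for it."""


class ToyIOMMU:
    def __init__(self):
        self.tables = {}  # device id -> {iova page -> host physical page}

    def map(self, device, iova, host_addr):
        table = self.tables.setdefault(device, {})
        table[iova // PAGE_SIZE] = host_addr // PAGE_SIZE

    def unmap(self, device, iova):
        self.tables.get(device, {}).pop(iova // PAGE_SIZE, None)

    def translate(self, device, iova):
        page = self.tables.get(device, {}).get(iova // PAGE_SIZE)
        if page is None:
            # Protection: DMA outside the mapped set is rejected.
            raise IOMMUFault(f"device {device!r}: DMA to unmapped IOVA {iova:#x}")
        # Translation: same offset within the page, different page frame.
        return page * PAGE_SIZE + iova % PAGE_SIZE


iommu = ToyIOMMU()
iommu.map("nic0", iova=0x1000, host_addr=0x7F000)
print(hex(iommu.translate("nic0", 0x1010)))
```

A DMA to any address the host has not mapped for that device raises `IOMMUFault` instead of reaching memory.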
The IOMMU mapping memory/performance tradeoff
- When does the host map and unmap translation entries?
- Direct mapping up-front on virtual machine creation: all memory is pinned, no intra-guest protection
- During run-time: high cost in performance
- We want: direct mapping performance, intra-guest protection, minimal pinning
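The tradeoff between the two mapping strategies can be made concrete with a small counting model. Everything here is a made-up input (guest size, buffer count, number of DMAs), and the function names are hypothetical; the point is only to show which quantity each strategy trades away.

```python
# Toy comparison of the two IOMMU mapping strategies.
# Page counts and the workload are made-up inputs, not measurements.


def direct_mapping(guest_pages, dma_pages):
    """Map all guest memory once, when the virtual machine is created."""
    return {
        "runtime_ops": 0,                 # no map/unmap on the I/O path
        "peak_pinned": guest_pages,       # every guest page stays pinned
        "intra_guest_protection": False,  # device can reach all guest memory
    }


def runtime_mapping(guest_pages, dma_pages):
    """Map each buffer right before its DMA and unmap it right after."""
    mapped, ops, peak = set(), 0, 0
    for page in dma_pages:
        mapped.add(page)       # map: involves the host on the I/O path
        ops += 1
        peak = max(peak, len(mapped))
        mapped.remove(page)    # unmap once the DMA completes
        ops += 1
    return {
        "runtime_ops": ops,
        "peak_pinned": peak,
        "intra_guest_protection": True,
    }


guest_pages = 262_144                         # 1 GiB of 4 KiB pages (assumed)
dma_pages = [i % 256 for i in range(10_000)]  # 10,000 DMAs over 256 buffers

direct = direct_mapping(guest_pages, dma_pages)
dynamic = runtime_mapping(guest_pages, dma_pages)
print(direct)
print(dynamic)
```

Direct mapping pins every guest page but does no work at run time; run-time mapping pins one page at a time in this model but pays two mapping operations per DMA.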
vIOMMU: efficient IOMMU emulation
- Emulate an IOMMU so that we know when to map and unmap
- Use a sidecore [Kumar07] for efficient emulation: avoid costly exits by running emulation on another core in parallel
- Optimistic teardown: relax protection to increase performance by caching translation entries
- vIOMMU provides high performance with intra-guest protection and minimal pinning
[Figure: vIOMMU architecture. Domains: guest domain, emulation domain (sidecore), system domain. Components: I/O device driver, IOMMU mapping layer, I/O buffer, emulated PTE, emulated IOMMU registers, IOMMU emulation, physical PTE, IOMMU, I/O device, memory. Steps: (1) map/unmap, (2) update mappings, (3) IOTLB invalidation, (4) poll, (5) read, (6) update mappings, (7) IOTLB invalidations, (8) transaction to IOVA, (9) IOVA access, (10) translate, (11) physical access.]
“vIOMMU: Efficient IOMMU Emulation”, Amit, Ben-Yehuda, Schuster, Tsafrir,USENIX ATC ’11
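The idea behind optimistic teardown, caching translation entries instead of removing them immediately, can be sketched as follows. This is a simplified model written for this transcript, not the vIOMMU implementation: the `OptimisticTeardown` class, the `max_stale` threshold, and the flush policy are all assumptions.

```python
# Sketch of deferred unmapping: an unmapped page keeps its translation
# for a while, so a quick re-map of the same page costs nothing.
# Class name, threshold, and flush policy are assumptions.


class OptimisticTeardown:
    def __init__(self, max_stale=4):
        self.live = set()          # pages the guest currently has mapped
        self.stale = []            # unmapped by the guest, teardown deferred
        self.max_stale = max_stale
        self.physical_updates = 0  # costly updates to the real mappings

    def map(self, page):
        if page in self.stale:
            self.stale.remove(page)     # reuse the cached translation
        elif page not in self.live:
            self.physical_updates += 1  # install a new translation
        self.live.add(page)

    def unmap(self, page):
        if page not in self.live:
            return
        self.live.remove(page)
        self.stale.append(page)  # protection is relaxed until the flush
        if len(self.stale) > self.max_stale:
            self.flush()

    def flush(self):
        self.physical_updates += len(self.stale)  # tear down for real
        self.stale.clear()


def run(workload_rounds, max_stale):
    model = OptimisticTeardown(max_stale=max_stale)
    for _ in range(workload_rounds):
        model.map(7)    # the driver keeps reusing the same buffer page
        model.unmap(7)
    return model


optimistic = run(100, max_stale=4)
strict = run(100, max_stale=0)   # max_stale=0 tears down on every unmap
print(optimistic.physical_updates, strict.physical_updates)
```

With a buffer that is reused every round, the deferred variant touches the real mappings once, while strict teardown touches them on every map and every unmap. The cost is a window in which the device can still reach a page the guest has already unmapped.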
Problem solved?
- netperf TCP_STREAM sender on 10Gb/s Ethernet with 256 byte messages
- Using device assignment with direct mapping in the IOMMU
- Only achieves 60% of bare-metal performance
- Same results for memcached and Apache
Where does the rest go?
Recap: doing I/O
- Programmed I/O (in/out instructions)
- Memory-mapped I/O (loads and stores)
- Direct memory access (DMA)
- Interrupts: approximately 49,000 interrupts per second with Linux
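To get a feel for what that interrupt rate means for a virtual machine, the sketch below turns it into exits per second and a rough CPU fraction. Only the 49,000 figure comes from the slide. The number of exits per interrupt, the cycles per exit, and the clock rate are hypothetical inputs chosen for illustration, not measurements from the talk.

```python
INTERRUPTS_PER_SEC = 49_000  # figure quoted on the slide


def exits_per_second(exits_per_interrupt):
    """Exits caused by interrupt handling alone."""
    return INTERRUPTS_PER_SEC * exits_per_interrupt


def cpu_fraction_lost(exits_per_interrupt, cycles_per_exit, cpu_hz):
    """Fraction of one core spent on those exits."""
    return exits_per_second(exits_per_interrupt) * cycles_per_exit / cpu_hz


# Hypothetical inputs, NOT measurements: two exits per interrupt
# (one for delivery, one for completion), 5,000 cycles per exit
# including indirect costs, and a 3 GHz core.
print(exits_per_second(2))
print(round(cpu_fraction_lost(2, cycles_per_exit=5_000, cpu_hz=3_000_000_000), 3))
```

Changing the assumed cost per exit scales the result linearly, so the estimate is only as good as that input.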
ELI: ExitLess Interrupts
[Figure: interrupt handling timelines (time on the horizontal axis) in four panels, (a) to (d): bare-metal, baseline, ELI delivery, and ELI delivery & completion. Each virtualized panel shows guest and hypervisor lanes with the physical interrupt, interrupt injection, and interrupt completion events.]
ELI: direct interrupts for unmodified, untrusted guests
"ELI: Bare-Metal Performance for I/O Virtualization", Gordon, Amit, Har'El, Ben-Yehuda, Landau, Schuster, Tsafrir, ASPLOS '12
ELI: delivery
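[Figure: ELI delivery. A physical interrupt is looked up in the shadow IDT while the VM runs. Assigned interrupts hit a present entry (P=1) and reach the guest's interrupt handler; non-assigned interrupts hit a non-present entry (P=0) or fall beyond the IDTR limit and cause a #NP or #GP exit to the hypervisor. The guest IDT is shown alongside the shadow IDT.]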
- All interrupts are delivered directly to the guest
- Host and other guests' interrupts are bounced back to the host
- ... without the guest being aware of it
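The delivery rule can be modeled in a few lines. This is a deliberately simplified sketch: the vector numbers and handler names are invented, exceptions are ignored, and `idtr_limit` is expressed as a vector number even though the real IDTR limit is a byte offset.

```python
# Toy model of ELI delivery through a shadow IDT.
# Vector numbers, handler names, and function names are hypothetical.


def build_shadow_idt(guest_idt, assigned_vectors):
    """Copy the guest IDT, marking only assigned vectors as present."""
    return {
        vector: {"present": vector in assigned_vectors, "handler": handler}
        for vector, handler in guest_idt.items()
    }


def deliver(shadow_idt, idtr_limit, vector):
    """Return where a physical interrupt ends up in this model."""
    if vector > idtr_limit:
        return "#GP exit"            # beyond the IDT: forced exit to the host
    entry = shadow_idt.get(vector)
    if entry is None or not entry["present"]:
        return "#NP exit"            # not assigned: bounced to the host
    return "guest:" + entry["handler"]  # assigned: no exit at all


guest_idt = {0x20: "timer", 0x41: "nic_rx", 0xEF: "ipi"}
shadow = build_shadow_idt(guest_idt, assigned_vectors={0x41})

for vector in (0x41, 0x20, 0xEF):
    print(hex(vector), deliver(shadow, idtr_limit=0xE0, vector=vector))
```

Only the assigned vector reaches a guest handler; the other two cause exits, which is how the host gets its own interrupts back.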
ELI: signaling completion
- Guests signal interrupt completions by writing to the Local Advanced Programmable Interrupt Controller (LAPIC) End-of-Interrupt (EOI) register
- Old LAPIC: hypervisor traps loads/stores to the LAPIC page
- x2APIC: hypervisor can trap specific registers
  - Signaling completion without trapping requires x2APIC
  - ELI gives the guest direct access only to the EOI register
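Per-register trapping can be pictured as a set of intercepted x2APIC MSRs with the EOI register left out. The sketch models the intercept state as a Python set rather than a hardware MSR bitmap, and the function names are hypothetical. The MSR numbers follow the x2APIC convention of 0x800 plus the legacy register offset shifted right by four.

```python
# Sketch of per-register x2APIC interception with EOI passed through.
# The set-based model and function names are assumptions.

X2APIC_MSR_FIRST = 0x800
X2APIC_MSR_LAST = 0x8FF
X2APIC_EOI = 0x80B  # legacy offset 0xB0  -> 0x800 + (0xB0 >> 4)
X2APIC_ICR = 0x830  # legacy offset 0x300 -> 0x800 + (0x300 >> 4)


def build_write_intercepts(passthrough):
    """Return the x2APIC MSRs whose writes still trap to the hypervisor."""
    return {
        msr
        for msr in range(X2APIC_MSR_FIRST, X2APIC_MSR_LAST + 1)
        if msr not in passthrough
    }


def guest_wrmsr(msr, intercepts):
    """Model a guest MSR write: either it exits or it reaches hardware."""
    return "exit" if msr in intercepts else "direct"


intercepts = build_write_intercepts(passthrough={X2APIC_EOI})
print(guest_wrmsr(X2APIC_EOI, intercepts))  # EOI write goes straight through
print(guest_wrmsr(X2APIC_ICR, intercepts))  # sending an IPI still traps
```

With the older memory-mapped LAPIC the unit of interception is the whole page, so there is no equivalent of passing through a single register.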
ELI: threat model
Threats: malicious guests might try to:
- keep interrupts disabled
- signal invalid completions
- consume other guests' or the host's interrupts
ELI: protection
- VMX preemption timer to force exits instead of timer interrupts
- Ignore spurious EOIs
- Protect critical interrupts by:
  - Delivering them to a non-ELI core if available
  - Redirecting them as NMIs → unconditional exit
  - Using the IDTR limit to force #GP exits on critical interrupts
Bare-metal Performance for I/O Virtualization
- Throughput is scaled so 100% means bare-metal throughput
- All workloads reach 97–100% of bare metal with ELI!
- CPU is saturated; host uses huge pages to back guest memory
- Full experimental details and analysis in ASPLOS paper
Conclusion
- IOMMUs take the host out of the DMA path
- ELI takes the host out of the interrupt path
- Achievement unlocked: bare-metal performance for x86 VMs
Thank you! Questions?