Latency in Cloud Computing and Recent Research on IDC in Xen
Sisu Xi, Network Seminar, 03/04/2013
Latency Matters to Services
Amazon: revenue decreased by 1% of sales for every 100 ms of latency
http://highscalability.com/blog/2009/7/25/latency-is-everywhere-and-it-costs-you-sales-how-to-crush-it.html
Google: slowing the search results page down by 100 ms to 400 ms has a measurable impact on the number of searches per user, from -0.2% to -0.6%
http://googleresearch.blogspot.com/2009/06/speed-matters.html
Firefox: a 2.2-second faster page response increased Firefox install package downloads by 15.4% (equal to 10.28 million additional downloads per year)
http://blog.mozilla.org/metrics/2010/04/05/firefox-page-load-speed---part-ii/
Into the Virtualized World
Infrastructure as a Service (IaaS)
Amazon EC2: Amazon Elastic Compute Cloud: http://aws.amazon.com/media-sharing/
Microsoft Azure: use your OS, language, database, and tool: http://www.windowsazure.com/en-us/
Google Compute Engine: run your large-scale computing workload: https://cloud.google.com/products/compute-engine
Question: can these services guarantee network latency to the end user?
Outline
Current services in cloud computing: Microsoft Azure, Google Compute Engine, Amazon EC2
Networking in Xen: para-virtualization network architecture
IDC in Xen: why is it important? Three shared-memory approaches; our approach: RTCA
Summary
Microsoft Azure
Windows Azure Pricing: http://www.windowsazure.com/en-us/pricing/calculator/?scenario=virtual-machines
Google Compute Engine
Google Compute Engine Pricing: https://cloud.google.com/pricing/compute-engine
Amazon EC2
Amazon EC2 Instance Types: http://aws.amazon.com/ec2/instance-types/
Current Services in Cloud Computing
CPU and memory resources can be dedicated, which provides the highest level of isolation.
Network resources are usually shared: there is no mechanism for rate control, let alone priority. Amazon offers only a coarse-grained indicator (low/medium/large); you can pay more to get dedicated network resources.
Recall: can these services guarantee network latency to the end user?
Amazon EC2 in Action
The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, Guohui Wang and T. S. Eugene Ng, INFOCOM 2010
Enabling Technologies: Xen
Scheduling I/O in Virtual Machine Monitors, Diego Ongaro, Alan L. Cox, and Scott Rixner, VEE 2008
Outline
Current services in cloud computing: Microsoft Azure, Google Compute Engine, Amazon EC2
Networking in Xen: para-virtualization network architecture
IDC in Xen: why is it important? Three shared-memory approaches; our approach: RTCA
Summary
Xen Overview
[Figure: Xen para-virtualized network architecture. Domain 1 and Domain 2 each run a netfront driver on their own VCPU; Domain 0 runs the NIC driver, softnet_data, and netback on its VCPU; the VMM scheduler multiplexes all VCPUs, and packets A and B flow through Domain 0 to the NIC.]
Outline
Current services in cloud computing: Microsoft Azure, Google Compute Engine, Amazon EC2
Networking in Xen: para-virtualization network architecture
IDC in Xen: why is IDC important? Three shared-memory approaches; our approach: RTCA
Summary
IDC in Xen
IDC: Inter-Domain Communication
[Figure: the IDC path in Xen. Packets A and B travel from Domain 1's netfront through Domain 0's netback and softnet_data to Domain 2's netfront; no NIC is involved, but everything is still mediated by the VMM scheduler.]
Why is IDC Important
Hardware
Now: Intel Xeon E7-8870, 10 cores, 20 threads
http://ark.intel.com/products/53580/Intel-Xeon-Processor-E7-8870-30M-Cache-2_40-GHz-6_40-GTs-Intel-QPI
Future: Intel's 80-core CPU running at 5.7 GHz
http://news.softpedia.com/news/Intel-039-s-80-Core-CPU-Running-at-5-7-GHz-46881.shtml
System administrator: "By optimizing the placement of VMs on host machines, traffic patterns among VMs can be better aligned with the communication distance between them, e.g. VMs with large mutual bandwidth usage are assigned to host machines in close proximity."
Improving the Data Center Networks with Traffic-aware Virtual Machine Placement, Xiaoqiao Meng et al., INFOCOM 2010
Why is IDC Important
Embedded systems: Integrated Modular Avionics (IMA)
ARINC 653 standard
Honeywell claims that an IMA design can save 350 pounds of weight on a narrow-body jet, the equivalent of two adults
http://www.artist-embedded.org/docs/Events/2007/IMA/Slides/ARTIST2_IMA_WindRiver_Wilson.pdf
Can IDC provide guaranteed network latency to the end user?
Full Virtualization Based ARINC 653 Partition, Sanghyun Han, Digital Avionics Systems Conference (DASC), 2011
ARINC 653 Hypervisor, S. H. VanderLeest, Digital Avionics Systems Conference (DASC), 2010
Xen Virtual Network
[Figure: an application in Domain-U uses the standard socket API: socket(AF_INET, SOCK_DGRAM, 0) or socket(AF_INET, SOCK_STREAM, 0), then sendto(...) and recvfrom(...). Packets traverse the guest kernel's INET/UDP/TCP/IP stack and the netfront driver, cross the VMM, and are handled by the netback driver and the full network stack in Domain-0.]
Pros: transparent, isolation, general, migration
Cons: performance, data integrity, multicast
XenSocket
[Figure: the application in Domain-U opens a socket in a new AF_Xen address family, which bypasses the INET/TCP/UDP/IP stack and netfront entirely and communicates over shared memory.]
Pros: performance
Cons: not transparent, one-way communication, must patch the guest OS
XenSocket: A High-Throughput Interdomain Transport for Virtual Machines, Xiaolan Zhang et al. (IBM), Middleware 2007
XWAY
[Figure: an XWAY switch sits below the INET layer in Domain-U; traffic to co-resident domains is redirected through the XWAY protocol and XWAY driver over shared memory, while other traffic still flows through TCP/IP and netfront.]
Pros: performance, dynamic create/destroy, live migration
Cons: must patch the guest OS, migration issues, no UDP support, complicated
XWAY: Inter-domain Socket Communications Supporting High Performance and Full Binary Compatibility on Xen, Kangho Kim et al., VEE 2008
XenLoop
[Figure: a XenLoop module sits between the IP layer and netfront in Domain-U; unmodified applications keep using the standard socket API (socket(AF_INET, SOCK_DGRAM, 0), socket(AF_INET, SOCK_STREAM, 0), sendto(...), recvfrom(...)), and XenLoop transparently redirects co-resident traffic through shared memory.]
Pros: transparent, performance, migration
Cons: kernel module in the guest OS, Domain 0 cooperation, migration overhead
XenLoop: A Transparent High Performance Inter-VM Network Loopback, Jian Wang et al., HPDC 2008
Summary for Shared Memory in IDC
All require modification to the guest OS:
XenSocket needs to re-compile the guest kernel and modify the application
XWAY needs to re-compile the guest kernel
XenLoop needs to load a kernel module and cooperate with Domain 0
Issues with migration:
XenSocket does not support it
XWAY and XenLoop require dynamically tearing down the channel between two domains, which incurs extra overhead
Recall: Xen Network Architecture
[Figure: packets A and B from the netfronts of Domain 1 and Domain 2 are funneled through netback and softnet_data in Domain 0, with all VCPUs multiplexed by the VMM scheduler.]
VMM Scheduler: Evaluation
[Figure: setup. Dom 0 runs on core C0 under Linux 3.4.2 at 100% CPU; the guest domains run on cores C1 to C5; one packet is sent every 10 ms, for 5,000 data points. VMM scheduler: RT-Xen vs. Credit.]
When Domain 0 is not busy, the VMM scheduler dominates the IDC performance of higher-priority domains.
VMM Scheduler: Enough?
[Figure: the same setup, with Dom 0 on C0 at 100% CPU and additional rows of communicating domains on cores C1 to C5.]
Domain 0: Background
[Figure: inside Domain 0, netback[0] runs rx_action() and tx_action() over the per-domain netif TX/RX rings of Domains 1, 2, ..., m, n, then hands packets to softnet_data.]
Packets are fetched in a round-robin order, and all domains share one queue in softnet_data.
Domain 0: RTCA
[Figure: the same Domain 0 structure, but softnet_data now holds queues separated by priority, and netback[0] fetches packets by priority.]
Packets are fetched by priority, up to the batch size; queues in softnet_data are separated by priority.
RTCA: Evaluation Setup
[Figure: Dom 0 on core C0 at 100% CPU, Original vs. RTCA; guest domains on cores C1 to C5; interference levels Base, Light, Medium, and Heavy; one packet sent every 10 ms, for 5,000 data points.]
RTCA: Latency
When there is no interference, IDC performance is comparable.
The original Domain 0 performs poorly in all cases, due to priority inversion within Domain 0.
RTCA with batch size 1 performs best: it eliminates most of the priority inversions.
RTCA with larger batch sizes performs worse under IDC interference.
By reducing priority inversion in Domain 0, RTCA can effectively mitigate the impact of low-priority traffic on the latency of high-priority IDC.
(Table: IDC latency between Domain 1 and Domain 2 in the presence of low-priority IDC, in microseconds.)
RTCA: Throughput
[Figure: iPerf throughput (Gbits/s, 0 to 12) between Dom 1 and Dom 2 under Base, Light, Medium, and Heavy interference, comparing RTCA with batch sizes 1, 64, and 238 against the original Domain 0.]
A small batch size leads to a significant reduction in high-priority IDC latency and improved IDC throughput under interfering traffic.
Summary
Current services in cloud computing: Microsoft Azure, Google Compute Engine, Amazon EC2
Networking in Xen: para-virtualization network architecture
IDC in Xen: why is it important? Three shared-memory approaches; our approach: RTCA
Summary
Backup Slides
Motivation
Multiple computing elements mean cost, weight, and power; they communicate via dedicated or real-time networks.
Goal: use fewer computing platforms to integrate independently developed systems.
Physically isolated hosts -> common computing platforms
Network communication -> local inter-domain communication
Can real-time properties be preserved under virtualization?
System Model
We focus on:
Xen as the underlying virtualization software
A single core for each virtual machine on a multi-core platform
Local Inter-Domain Communication (IDC)
No modification to the guest domain besides the Xen patch
Future work:
Multiple cores for each virtual machine (domain)
Integration with the Network Interface Card (NIC)
Background: Xen Overview
[Figure: the Xen architecture diagram: Domain 1 and Domain 2 netfronts on their VCPUs, Domain 0 with the NIC driver, softnet_data, and netback, the VMM scheduler, and the NIC, with packets A and B in flight.]
Part I: VMM Scheduler Limitations
Default Credit scheduler: schedules VCPUs in round-robin order
RT-Xen scheduler: schedules VCPUs by priority
However: if the execution time is under 0.5 ms, the VCPU budget is not consumed
Solution: dual quanta: milliseconds for scheduling, microseconds for time accounting
"Realizing Compositional Scheduling through Virtualization", Real-Time and Embedded Technology and Applications Symposium (RTAS), 2012
"RT-Xen: Towards Real-Time Hypervisor Scheduling in Xen", ACM International Conference on Embedded Software (EMSOFT), 2011
Conclusion
[Figure: the full stack: hardware, VMM scheduler, and Domain 0 (netback, softnet_data) beside Domain 1 and Domain 2 (netfront).]
The VMM scheduler alone cannot guarantee real-time IDC.
RTCA: Real-Time Communication Architecture.
RTCA plus a real-time VMM scheduler reduces high-priority IDC latency from milliseconds to microseconds in the presence of low-priority IDC.
https://sites.google.com/site/realtimexen/
End-to-End Task Performance
[Figure: setup. VMM scheduler: Credit vs. RT-Xen; Dom 0 on core C0 at 100% CPU, Original vs. RTCA; Dom 1 and Dom 2 at 60% CPU each, running tasks T1(10, 2), T2(20, 2), T3(20, 2), and T4(30, 2); Dom 3 to Dom 10 at 10% CPU each, forming 4 pairs bouncing packets as Base, Light, Medium, and Heavy interference.]
End-to-End Task Performance
By combining the RT-Xen VMM scheduler and the RTCA Domain 0 kernel, we can deliver end-to-end real-time performance to tasks involving both computation and communication.
Backup: Baseline
[Figure: latency in microseconds (0 to 70) vs. packet size (0 to 1500 bytes) for IDC and a local thread. Architecture diagram: Domain 0 runs multiple netback kthreads, each with its own TX/RX rings over per-domain netifs, scheduled as priority kthreads with the highest priority serving the high-priority domains; softnet_data and the NIC driver sit below.]