vSnoop: Improving TCP Throughput in Virtualized Environments via Acknowledgement Offload
Ardalan Kangarlou, Sahan Gamage, Ramana Kompella, Dongyan Xu
Department of Computer Science, Purdue University
Cloud Computing and HPC
Background and Motivation
- Virtualization: a key enabler of cloud computing (Amazon EC2, Eucalyptus)
- Increasingly adopted in other real systems:
  - High performance computing: NERSC's Magellan system
  - Grid/cyberinfrastructure computing: In-VIGO, Nimbus, Virtuoso
VM Consolidation: A Common Practice
- Multiple VMs hosted by one physical host; multiple VMs sharing the same core
- Flexibility, scalability, and economy

[Figure: VM 1 through VM 4 running on a shared virtualization layer and hardware.]

Key observation: VM consolidation negatively impacts network performance!
Investigating the Problem

[Figure: a client/sender communicates with a server hosting VM 1, VM 2, and VM 3 on shared hardware behind the virtualization layer.]
Q1: How does CPU sharing affect RTT?

[Figure: RTT (ms, 40 to 180) vs. number of VMs (2 to 5) for US East – West, US East – Europe, and US West – Australia paths.]

RTT increases in proportion to the VM scheduling slice (30ms).
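The proportionality above can be checked with back-of-envelope arithmetic. The model below is our own simplification, not from the talk: N VMs share one core round-robin, each running for a fixed 30ms slice, so a packet that arrives while its destination VM is descheduled waits until that VM runs again.

```python
# Simplified model (our assumption, not vSnoop's code): N VMs share one core
# round-robin with a fixed 30 ms scheduling slice.
SLICE_MS = 30.0

def worst_case_wait_ms(num_vms: int, slice_ms: float = SLICE_MS) -> float:
    """Worst case: the destination VM was just descheduled, so the packet
    waits through every other VM's slice before delivery."""
    return (num_vms - 1) * slice_ms

def mean_wait_ms(num_vms: int, slice_ms: float = SLICE_MS) -> float:
    """Packets arriving uniformly in time wait, on average, half the
    worst case, weighted by the fraction of time the VM is descheduled."""
    p_descheduled = (num_vms - 1) / num_vms
    return p_descheduled * worst_case_wait_ms(num_vms, slice_ms) / 2

for n in (2, 3, 4, 5):
    print(n, worst_case_wait_ms(n), round(mean_wait_ms(n), 1))
```

With 5 VMs per core the worst-case added delay is 120ms, which is consistent with the tens-of-milliseconds RTT growth per added VM shown on the slide.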
Q2: What is the cause of the RTT increase?

[Figure: the sender's packets pass through the driver domain (dom0) into per-VM buffers, where they wait out 30ms scheduling slices before VM1, VM2, or VM3 runs. A CDF decomposes the RTT increase into dom0 processing time plus wait time in the buffer.]

VM scheduling latency dominates virtualization overhead!
Q3: What is the impact on TCP throughput?

[Figure: CDF of TCP throughput for connections terminated in dom0 vs. in the VM.]

A connection to the VM is much slower than one to dom0!
Our Solution: vSnoop
- Alleviates the negative effect of VM scheduling on TCP throughput
- Implemented within the driver domain to accelerate TCP connections
- Does not require any modifications to the VM
- Does not violate end-to-end TCP semantics
- Applicable across a wide range of VMMs (Xen, VMware, KVM, etc.)
TCP Connection to a VM

The sender establishes a TCP connection to VM1.

[Figure: timeline of the handshake. The SYN sits in the VM1 buffer in the driver domain until VM1's turn in the VM1/VM2/VM3 schedule, so the SYN,ACK, and thus every RTT, is inflated by the VM scheduling latency.]
Key Idea: Acknowledgement Offload

[Figure: the same timeline with vSnoop. The driver domain returns the SYN,ACK on VM1's behalf as soon as the SYN arrives, holding packets in a shared buffer, so the sender's RTT no longer includes the VM scheduling latency.]

Faster progress during TCP slow start.
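The mechanism above can be sketched in a few lines. This is a hypothetical illustration of the offload idea (class and method names are ours, not vSnoop's actual code): the driver domain acknowledges an in-order packet on the VM's behalf the moment it is safely buffered, instead of waiting for the VM to be scheduled.

```python
from collections import deque

class FlowBuffer:
    """Per-connection buffer in the driver domain (sketch, our naming)."""

    def __init__(self, capacity: int, next_seq: int = 0):
        self.capacity = capacity
        self.next_seq = next_seq  # next in-order sequence number expected
        self.packets = deque()    # packets waiting for the VM to run

    def on_packet(self, seq: int, payload: bytes):
        """Return an early ACK number, or None if the VM must handle it."""
        in_order = (seq == self.next_seq)
        has_room = len(self.packets) < self.capacity
        if in_order and has_room:
            self.packets.append((seq, payload))
            self.next_seq = seq + len(payload)
            return self.next_seq  # safe to ACK now: packet is buffered
        return None               # out of order or no room: stay silent

buf = FlowBuffer(capacity=2)
print(buf.on_packet(0, b"abcd"))  # in order, room -> early ACK 4
print(buf.on_packet(10, b"xy"))   # out of order -> None, VM handles it
```

Because an early ACK is sent only for a packet that is already buffered, the sender's view (data acknowledged means data will be delivered) is preserved, which is why end-to-end TCP semantics are not violated.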
vSnoop's Impact on TCP Flows
- TCP slow start: early acknowledgements help connections progress faster. The benefit is most significant for short transfers, which are prevalent in data centers [Kandula IMC'09], [Benson WREN'09].
- TCP congestion avoidance and fast retransmit: large flows in the steady state also benefit from vSnoop, though not as much as during slow start.
Challenges
- Challenge 1: Out-of-order and special packets (SYN, FIN). Solution: let the VM handle these packets.
- Challenge 2: Packet loss after vSnoop. Solution: let vSnoop acknowledge only if there is room in the buffer.
- Challenge 3: ACKs generated by the VM. Solution: suppress/rewrite ACKs already generated by vSnoop.
- Challenge 4: Throttling the receive window to keep vSnoop online. Solution: adjust it according to the buffer size.
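Challenges 2 and 4 combine into one invariant: never promise more than the buffer can hold. A hedged sketch of that logic (our own simplification, not vSnoop's implementation): only early-acknowledge a packet that fits in the buffer, and advertise a receive window no larger than the remaining buffer space, so the sender can never overrun the buffer behind an early ACK.

```python
def advertised_window(vm_window: int, buffer_free: int) -> int:
    """Clamp the VM's advertised receive window to the free buffer space,
    so the sender never sends more than vSnoop can safely hold."""
    return min(vm_window, buffer_free)

def may_early_ack(payload_len: int, buffer_free: int, in_order: bool) -> bool:
    """Early-ACK only in-order packets that fit entirely in the buffer."""
    return in_order and payload_len <= buffer_free

print(advertised_window(vm_window=65535, buffer_free=8192))   # 8192
print(may_early_ack(1460, buffer_free=1000, in_order=True))   # False
```

When the buffer fills, the clamped window throttles the sender and vSnoop stops acknowledging until the VM runs and drains the buffer, which is exactly the "offline" condition in the state machine on the next slide.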
State Machine Maintained Per-Flow

[Figure: per-flow state machine with states Start, Active (online), No buffer (offline), and Unexpected Sequence. On each packet received: an in-order packet with buffer space available enters or keeps Active; an in-order packet with no buffer space enters No buffer; an out-of-order packet enters Unexpected Sequence.]

- Active (online): early acknowledgements for in-order packets
- No buffer (offline): don't acknowledge
- Unexpected Sequence: pass out-of-order packets to the VM
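The state machine above can be sketched directly. State names follow the slide; the code structure is our own minimal sketch, not vSnoop's implementation:

```python
ACTIVE, NO_BUFFER, UNEXPECTED = "active", "no_buffer", "unexpected_seq"

class FlowState:
    """Per-flow state machine (sketch of the slide's three main states)."""

    def __init__(self):
        self.state = ACTIVE  # a new flow starts out handled online

    def on_packet(self, in_order: bool, buffer_space: bool) -> str:
        """Update the state for this packet and return the action taken."""
        if not in_order:
            self.state = UNEXPECTED
            return "pass_to_vm"   # don't ACK; the VM resolves reordering
        if not buffer_space:
            self.state = NO_BUFFER
            return "no_ack"       # offline: stay silent until space frees up
        self.state = ACTIVE       # in-order packet with room: (re)go online
        return "early_ack"        # acknowledge on the VM's behalf

f = FlowState()
print(f.on_packet(in_order=True, buffer_space=True))   # early_ack
print(f.on_packet(in_order=False, buffer_space=True))  # pass_to_vm
print(f.on_packet(in_order=True, buffer_space=True))   # early_ack again
```

Note the design choice the slide implies: every offline condition is fail-safe. In any state other than Active, vSnoop simply does nothing special and the packet reaches the VM's own TCP stack, so correctness never depends on vSnoop being online.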
vSnoop Implementation in Xen

[Figure: in the driver domain (dom0), vSnoop sits on the path between the bridge and each VM's netback, in front of the per-VM buffer; each guest VM runs an unmodified netfront. The netfront is additionally tuned.]
Evaluation
- Overheads of vSnoop
- TCP throughput speedup
- Application speedup: multi-tier web service (RUBiS), MPI benchmarks (Intel MPI, High-Performance Linpack)
Evaluation Setup
- VM hosts: 3.06GHz Intel Xeon CPUs, 4GB RAM; only one core/CPU enabled; Xen 3.3 with Linux 2.6.18 for the driver domain (dom0) and the guest VMs
- Client machine: 2.4GHz Intel Core 2 Quad CPU, 2GB RAM; Linux 2.6.19
- Gigabit Ethernet switch
vSnoop Overhead

Per-packet vSnoop overhead in dom0, profiled using Xenoprof [Menon VEE'05]:

| vSnoop Routine | Cycles (single stream) | CPU % (single stream) | Cycles (multiple streams) | CPU % (multiple streams) |
|---|---|---|---|---|
| vSnoop_ingress() | 509 | 3.03 | 516 | 3.05 |
| vSnoop_lookup_hash() | 74 | 0.44 | 91 | 0.51 |
| vSnoop_build_ack() | 52 | 0.32 | 52 | 0.32 |
| vSnoop_egress() | 104 | 0.61 | 104 | 0.61 |

The aggregate CPU overhead is minimal.
TCP Throughput Improvement
- 3 VMs consolidated; 1000 transfers of a 100KB file
- Configurations: vanilla Xen, Xen+tuning, Xen+tuning+vSnoop

[Figure: CDF of per-transfer throughput. Median throughput is 0.192MB/s for vanilla Xen, 0.778MB/s for Xen+tuning, and 6.003MB/s for Xen+tuning+vSnoop: a 30x improvement.]
TCP Throughput: 1 VM/Core

[Figure: normalized throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop.]
TCP Throughput: 2 VMs/Core

[Figure: normalized throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop.]
TCP Throughput: 3 VMs/Core

[Figure: normalized throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop.]
TCP Throughput: 5 VMs/Core

[Figure: normalized throughput vs. transfer size (100MB down to 50KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop.]

vSnoop's benefit rises with higher VM consolidation.
TCP Throughput: Other Setup Parameters
- Varied: CPU load for VMs, number of TCP connections to a VM, driver domain on a separate core, sender being a VM
- vSnoop consistently achieves significant TCP throughput improvement
Application-Level Performance: RUBiS

[Figure: client threads run RUBiS clients against two consolidated servers, each with vSnoop in dom0 and two guest domains (dom1, dom2); Apache runs on Server1 and MySQL on Server2.]
RUBiS Results

| Operation | Count w/o vSnoop | Count w/ vSnoop | % Gain |
|---|---|---|---|
| Browse | 421 | 505 | 19.9% |
| BrowseCategories | 288 | 357 | 23.9% |
| SearchItemsInCategory | 3498 | 4747 | 35.7% |
| BrowseRegions | 128 | 141 | 10.1% |
| ViewItem | 2892 | 3776 | 30.5% |
| ViewUserInfo | 732 | 846 | 15.6% |
| ViewBidHistory | 339 | 398 | 17.4% |
| Others | 3939 | 4815 | 22.2% |
| Total | 12237 | 15585 | 27.4% |
| Average throughput | 29 req/s | 37 req/s | 27.5% |
Application-Level Performance: MPI Benchmarks
- Intel MPI Benchmark: network intensive
- High-Performance Linpack (HPL): CPU intensive

[Figure: four MPI nodes, one guest (dom1) per server on Server1 through Server4; each server runs vSnoop in dom0 alongside a second guest (dom2).]
Intel MPI Benchmark Results: Broadcast

[Figure: normalized execution time vs. message size (8MB down to 64KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop; up to 40% improvement.]
Intel MPI Benchmark Results: All-to-All

[Figure: normalized execution time vs. message size (8MB down to 64KB) for Xen, Xen+tuning, and Xen+tuning+vSnoop.]
HPL Benchmark Results

[Figure: Gflops vs. problem size and block size (N,NB), from (8K,16) down to (4K,2), for Xen and Xen+tuning+vSnoop; a "40%" callout appears on the chart.]
Related Work
- Optimizing the virtualized I/O path: Menon et al. [USENIX ATC'06, '08; ASPLOS'09]
- Improving intra-host VM communication: XenSocket [Middleware'07], XenLoop [HPDC'08], Fido [USENIX ATC'09], XWAY [VEE'08], IVC [SC'07]
- I/O-aware VM scheduling: Govindan et al. [VEE'07], DVT [SoCC'10]
Conclusions
- Problem: VM consolidation degrades TCP throughput
- Solution: vSnoop
  - Leverages acknowledgement offloading
  - Does not violate end-to-end TCP semantics
  - Is transparent to applications and the OS in VMs
  - Is generically applicable to many VMMs
- Results: 30x improvement in median TCP throughput; about 30% improvement in the RUBiS benchmark; 40-50% reduction in execution time for the Intel MPI benchmark
Thank you.

For more information: http://friends.cs.purdue.edu/dokuwiki/doku.php?id=vsnoop or Google "vSnoop Purdue"
TCP Benchmarks (cont.)

Testing different scenarios: (a) 10 concurrent connections, (b) sender also subject to VM scheduling, (c) driver domain on a separate core.

[Figure: throughput results for scenarios (a), (b), and (c).]
TCP Benchmarks (cont.)

Varying CPU load for the 3 consolidated VMs: 40%, 60%, and 80% CPU load.

[Figure: throughput results at each load level.]