VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at...
Transcript of VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at...
VNF Chain Allocation and Management at Datacenter Scale
Nodir KodirovSam Bayless, Fabian Ruffy, Swati Goswami
Ivan Beschastnikh, Holger Hoos, Alan Hu, Margo Seltzer
Internet…
TenantsCloud Provider
• Security• Firewall, DDoS protection, DPI
• Monitoring• QoE monitor, Network Stats
• Services• Ad insertion, Transcoder
• Network optimization• NAT, Load-balancer, WAN accelerator
# middleboxes ≈ # L2/L3 devices [Sherry et al. SIGCOMM’12]
Network Functions (NF) are useful and widespread
2Sherry et al. Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM'12
DDoS protection
carrier-grade NAT
ad insertion
transcoder
BRAS
session border controller
WAN accelerator IDS
load balancer
DPI
QoE monitor
firewall
• Elasticity• Quick scale up and down NFs
• Fast upgrades• No need to wait for new hardware
• Quick configuration, recovery• Failover to the backup NF instance
•Outsourcing
Benefits of Virtualized Network Functions (VNF)
3
Sherry et al. Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM’12Rajagopalan et al., Split/Merge: System Support for Elastic Execution in Virtual Middleboxes, NSDI’13Martins et al., ClickOS and the Art of Network Function Virtualization, NSDI'14
DDoS protection
carrier-grade NAT
ad insertion
transcoder
BRAS
session border controller
WAN accelerator IDS
load balancer
DPI
QoE monitor
firewall
Outsourcing VNFs to the Cloud
4
Internet
…
Cloud Provider Tenants
Outsourcing VNFs to the Cloud
5
Internet
…
Cloud Provider Tenants
Outsourcing VNF Chains to the Cloud
6
chain
Internet
…
Cloud Provider Tenants
Challenges of outsourcing VNF Chains
7
chain
How can cloud providers achieve high datacenter utilization?
…
How can tenants allocate and manage their VNF chains?
Our contributions: API and algorithm
8
How can cloud providers achieve high datacenter utilization?
How can tenants allocate and manage their VNF chains?
• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization
• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale
• Ongoing work: chain abstraction
Internet…
TenantsCloud Provider
N. Kodirov et al, VNF Chain Allocation and Management at Data Center Scale, ANCS 2018
NAT FW IDS VPN2 1 22 1
1
IDS’1 1
VNF Chain: six API with use-cases
9
NAT FW IDS VPN2 1 22 1
1
NAT FW IDS VPN3 2 33 2
1
Chain scale-out Element upgrade
cid ⟵ allocate-chain(C, bw)add-link-bandwidth(a, b, bw, cid)add-node(f, cid)
remove-link-bandwidth(a, b, bw, cid)remove-node(f, cid)remove-e2e-bandwidth(cid, bw)
…
Initial chain
VNF Chain: six API with use-cases
10
NAT FW IDS VPN2 1 22 1
1
Chain scale-out Element upgrade Chain expand …
Initial chain
A graph can be transformed arbitrarily by manipulating individual nodes and edges.
N. Kodirov et al, VNF Chain Allocation and Management at Data Center Scale, ANCS 2018
Scale-out beyond single physical resource capacity
11
NAT FW IDS VPN2 1 22 1
1
Chain scale-out
cid ⟵ allocate-chain(C, bw)add-link-bandwidth(a, b, bw, cid)
(f, cid)
(a, b, bw, cid)(f, cid)
(cid, bw)
Initial chain
NAT FW IDS VPN50 50 40 40
10
50 ToR2
40
40 40
ToR1
40
Gateway
100
• Abstract VNF chain• what tenant requires to allocate
and operates on
• Concrete VNF chain• cloud provider’s implementation
of the abstract chain
• Chains abstraction advantages• facilitates high DC utilization• improves SLA guarantees
Chain Abstraction: Abstract-Concrete VNF Chains
Concrete chains (for Cloud provider)
NAT FW IDS VPN5 4 55 41
Abstract chain (for Tenants)
NAT FW IDS VPN5 4 55 41
NAT FW IDS VPN50 50 40 40 50
10
…
12
NAT FW IDS VPN50 50 40 40 50
10
10×
Our contributions: API and algorithm
13
How can tenants allocate and manage their VNF chains?
How can cloud providers achieve high datacenter utilization?
• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization
• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale
• Ongoing work: chain abstraction
Internet…
TenantsCloud Provider
Algorithm inputs: DC topology and chain
14
40
ToR2
AggSw2AggSw1
40 40 4040
10 10
Gateway100 100
32 core128 GB[ ] 32 core
128 GB[ ]
[ 2048 TCAM ] [ 2048 TCAM ] ToR1
NAT FW IDS VPN2 1 22 1
1
1/8 core1/2 GB
3/8 core1/2 GB
1/2 core2 GB
1/4 core1/2 GB
Expected resource consumption per Gbps of traffic(see the paper for VNF profile generation)
Palkar et al., E2: A Framework for NFV Applications, SOSP’15Naik et al., NFVPerf: Online performance monitoring and bottleneck detection for NFV, IEEE NFV-SDN 2016. Nam et al., Probius: Automated Approach for VNF and Service Chain Analysis in Software-Defined NFV, SOSR'18
Algorithms for Chain Allocation and Management
15
1/8 core1/2 GB
3/8 core1/2 GB
1/2 core2 GB
1/4 core1/2 GB
Expected resource consumption per Gbps of traffic(see the paper for VNF profile generation)
40
ToR2
AggSw2AggSw1
40 40 4040
10 10
Gateway100 100
32 core128 GB[ ] 32 core
128 GB[ ]
[ 2048 TCAM ] [ 2048 TCAM ] ToR1
NAT FW IDS VPN2 1 22 1
1
• Random (baseline)• Consider NFs and servers/switches in random order• Attempt the above step n times (e.g., n=100)• Choose the shortest path between chain NFs
Algorithms for Chain Allocation and Management
16
NAT IDS2 1 22 1
1
40
ToR2
AggSw2AggSw1
40 40 4040
10 10
Gateway100 100
32 core128 GB[ ] 32 core
128 GB[ ]
[ 2048 TCAM ] [ 2048 TCAM ] ToR1
VPNFW
• Random (baseline)• Consider NFs and servers/switches in random order• Attempt the above step n times (e.g., n=100)• Choose the shortest path between chain NFs
• NetPack: Random + 3 simple heuristics• Consider the chain NFs in a topological order• Re-use the same server when allocating consecutive NFs• Gradually increase the network scope: rack, cluster, etc.
• VNFSolver: how optimal is NetPack?• Constraint-solver based chain allocation algorithm• Slow, but complete: finds a solution when one exists
Algorithms for Chain Allocation and Management
17
10-node
E2 Commercial Facebook
# of
allo
cate
d ch
ains ? ? ? ? ? ?
Palkar et al., E2: A Framework for NFV Applications, SOSP’15Bayless et al., SAT Modulo Monotonic Theories, AAAI'15
R R Rand
om
R
Rand
om
R
Net
Pack
Net
Pack
Net
Pack
Net
Pack
Net
Pack
N
Our contributions: API and algorithm
18
How can tenants allocate and manage their VNF chains?
How can cloud providers achieve high datacenter utilization?
• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization
• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale
• Ongoing work: chain abstraction
Internet…
TenantsCloud Provider
• How good is the datacenter utilization?• Evaluate Random, NetPack, and VNFSolver• Consider three different datacenter topologies • Use five different VNF chains with varying length (2-10)
• How fast is chain allocation?• Measure time it takes to saturate the datacenter
• Does API reliably implement the use-cases?• Prototype scale-out and chain upgrade in Daisy• Use two different racks, two sources of packet traces
Evaluation: Objectives
19
Datacenter utilization evaluation
20Palkar et al., E2: A Framework for NFV Applications, SOSP'15
NAT IDS2 1 22 1
1
VPNFW
Datacenter utilization evaluation
21
NetPack achieves at least 96% of VNFSolver allocations.Chain allocation time: Random ≲ NetPack ≪ VNFSolver.
Palkar et al., E2: A Framework for NFV Applications, SOSP'15
NetPack Utilization and Speed
22
NetPack achieves at least 96% of VNFSolver allocations while being 82x faster than VNFSolver (optimal) and only up to
54% slower (per chain) than Random (baseline) on average.
Qualitatively similar results with Facebook and Commercial DC topologies with chains of up to 10 nodes.
(see the paper for details)
• Daisy builds on Sonata framework• Mininet to build DC topology• OVS for switches, and Dockers for NFs
• Runs on a single Azure VM• 64 cores, 432 GB RAM
• Emulates use-cases and chain arrivals• scale-out and upgrade use-cases• continuous arrival of tenant chains in rack-scale
Feasibility check: does API work?
23
mininet
DAISY
Peuster et al., Sonata NFV SDK, github.com/sonata-nfv/son-emu, 2017
VNF Chain scale-out
24
Chain scale-out
NAT IDS3 2 33 2
1
VPNFW
Daisy implements scale-out with no packet drops.
Thro
ughp
ut (M
bps)
NAT IDS2 1 22 1
1
VPNFW
Initial chain
Daisy: emulate continuous chain arrival
25
NAT IDS2 1 22 1
1
VPNFW
VNFSolver allocated 75 chains (687 Mbps)NetPack allocated 67 chains (633 Mbps)Random allocated 61 chains (561 Mbps)
(throughput with iperf generated packets is precise)
Agg
rega
te c
hain
th
roug
hput
(Mbp
s)
Time (s)
26
Daisy implements scale-out with no packet drops and element upgrade with 1s packet drop at most.
We also emulated continuous chain arrival case where different tenants make chain allocation requests one-by-one.
Daisy Contributions
• API with six primitives• Implements wide-range of chain operations• Chain abstraction facilitates full DC utilization
• NetPack algorithm• Handles DC-scale allocation with 1000+ servers• Achieves at least 96% allocations of VNFSolver
(optimal) while being 82x faster on average
• Daisy prototype• Demonstrates feasibility of API and algorithms
•Ongoing work: chain abstraction
Snapshot of complete work
27
How can cloud providers achieve high data center utilization?
How can tenants allocate and manage their VNF chains?
Internet
TenantsCloud Provider
…
• Abstract VNF chain• what tenant requires to allocate
and operates on
• Concrete VNF chain• cloud provider’s implementation
of the abstract chain
• Chains abstraction advantages• facilitates high DC utilization• improves SLA guarantees
• Challenges• low-latency, packet loss,
state synchronization, efficiency loss, hotspots
Chain abstraction challenges
Concrete chains (for Cloud provider)
NAT FW IDS VPN5 4 55 41
Abstract chain (for Tenants)
NAT FW IDS VPN5 4 55 41
…
28
NAT FW IDS VPN50 50 40 40 50
10
10×
Kodirov et al., VNF Chain Abstraction for Cloud Service Providers, ANCS’18 poster
Problem: Hotspots
Concrete chains (for Cloud provider)
NAT FW IDS VPN5 4 55 41
Abstract chain (for Tenants)
NAT FW IDS VPN5 4 55 41
…
29
NAT FW IDS VPN50 50 40 40 50
10
10×
40
ToR2
AggSw2AggSw1
40 40 4040
10 10
Gateway100 100
32 core128 GB[ ] 32 core
128 GB[ ]
[ 2048 TCAM ] [ 2048 TCAM ] ToR1
Problem: Hotspots
Abstract chain (for Tenants)
30
NAT FW IDS VPN50 50 40 40 50
10
NF(s)bw bw
NIC
Rx co
re-n
core
1co
re0 N
IC T
xNF(s)bw bw
NF(s)bw bw
40
ToR2
AggSw2AggSw1
40 40 4040
10 10
Gateway100 100
32 core128 GB[ ] 32 core
128 GB[ ]
[ 2048 TCAM ] [ 2048 TCAM ] ToR1
• Hotspots in different layers:
CPU cores, NIC ports, ToR switch ports, etc.
Problem: Hotspots
Abstract chain (for Tenants)
31
NAT FW IDS VPN50 50 40 40 50
10
Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets'18
NF(s)bw bw
NIC
Rx co
re-n
core
1co
re0 N
IC T
xNF(s)bw bw
NF(s)bw bw
Microbenchmarks with up to 3 chains
Concrete chains
32OpenNetVM, github.com/sdnfv/openNetVM, 2016
• tNF: tunable NF• computes N prime numbers, per packet
(<50K in our experiments)• Throughput: 0.83 Mpps
(9900 Mbps with 1500 byte packet)
• NF runs on OpenNetVM• Varied the number of concrete chains
from one to three• Each concrete chain processes a
separate flow (5-tuple)
tNF9.9 9.9
Abstract chain
tNFbw bw
1 co
re, b
w=9
.9
tNFbw bw
2 co
res,
2bw
=4.4
5 x2
tNFbw bw 3 co
res,
3bw
=3.3
x3
Microbenchmarks with up to 3 chains
Concrete chains
33
tNFbw bw
tNFbw bw
tNFbw bwFlow
-Dire
ctor
core
0
core
3co
re2
core
1
Server
9.9Mbps
tNF9.9 9.9
Abstract chain• Tail latencies (99 percentile) in microseconds
• 1 concrete chain (uses 1 core for 9.9Gbps): 336 us• 2 concrete chains (uses 2 cores for 9.9Gbps): 182 us• 3 concrete chains (uses 3 cores for 9.9Gbps): 126 us
1.0
0.8
0.6
0.4
0.2
0
CDF
100
1 c.chain 2 c.chains 3 c.chains
Packet processing latency in log scale (microseconds)
Observations and Ongoing Work
34
Concrete chains
tNFbw bw
tNFbw bw
tNFbw bw
• Latency grows proportional to the core load
• Need CPU load-aware chain-splitting mechanism• N flows to 1 core (N-to-1) for mice flows• 1-to-N for elephant flows
• Splitting should happen in multiple levels• CPU cores, NIC ports, ToR ports
• Need to support wide-range of NFs• Bounded tail latencies are particularly challenging
for stateful NFs, such as DPI
tNF9.9 9.9
Abstract chain
Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets’18Khalid and Akella, Correctness and Performance for Stateful Chained Network Functions, NSDI’19
• API with six primitives• Implements wide-range of chain operations• Chain abstraction facilitates full DC utilization
• NetPack algorithm• Handles DC-scale allocation with 1000+ servers• Achieves at least 96% allocations of VNFSolver
(optimal) while being 82x faster on average• Ongoing work: chain abstraction
• Need load-aware chain-splitting mechanism
Conclusion
35
How can cloud providers achieve high data center utilization?
How can tenants allocate and manage their VNF chains?
Internet
TenantsCloud Provider
…
Thank you!