VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at...

35
VNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh, Holger Hoos, Alan Hu, Margo Seltzer Internet Tenants Cloud Provider

Transcript of VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at...

Page 1: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

VNF Chain Allocation and Management at Datacenter Scale

Nodir KodirovSam Bayless, Fabian Ruffy, Swati Goswami

Ivan Beschastnikh, Holger Hoos, Alan Hu, Margo Seltzer

Internet…

TenantsCloud Provider

Page 2: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Security• Firewall, DDoS protection, DPI

• Monitoring• QoE monitor, Network Stats

• Services• Ad insertion, Transcoder

• Network optimization• NAT, Load-balancer, WAN accelerator

# middleboxes ≈ # L2/L3 devices [Sherry et al. SIGCOMM’12]

Network Functions (NF) are useful and widespread

2Sherry et al. Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM'12

DDoS protection

carrier-grade NAT

ad insertion

transcoder

BRAS

session border controller

WAN accelerator IDS

load balancer

DPI

QoE monitor

firewall

Page 3: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Elasticity• Quick scale up and down NFs

• Fast upgrades• No need to wait for new hardware

• Quick configuration, recovery• Failover to the backup NF instance

•Outsourcing

Benefits of Virtualized Network Functions (VNF)

3

Sherry et al. Making Middleboxes Someone Else's Problem: Network Processing as a Cloud Service, SIGCOMM’12Rajagopalan et al., Split/Merge: System Support for Elastic Execution in Virtual Middleboxes, NSDI’13Martins et al., ClickOS and the Art of Network Function Virtualization, NSDI'14

DDoS protection

carrier-grade NAT

ad insertion

transcoder

BRAS

session border controller

WAN accelerator IDS

load balancer

DPI

QoE monitor

firewall

Page 4: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Outsourcing VNFs to the Cloud

4

Page 5: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Internet

Cloud Provider Tenants

Outsourcing VNFs to the Cloud

5

Page 6: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Internet

Cloud Provider Tenants

Outsourcing VNF Chains to the Cloud

6

chain

Page 7: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Internet

Cloud Provider Tenants

Challenges of outsourcing VNF Chains

7

chain

How can cloud providers achieve high datacenter utilization?

How can tenants allocate and manage their VNF chains?

Page 8: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Our contributions: API and algorithm

8

How can cloud providers achieve high datacenter utilization?

How can tenants allocate and manage their VNF chains?

• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization

• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale

• Ongoing work: chain abstraction

Internet…

TenantsCloud Provider

N. Kodirov et al, VNF Chain Allocation and Management at Data Center Scale, ANCS 2018

Page 9: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

NAT FW IDS VPN2 1 22 1

1

IDS’1 1

VNF Chain: six API with use-cases

9

NAT FW IDS VPN2 1 22 1

1

NAT FW IDS VPN3 2 33 2

1

Chain scale-out Element upgrade

cid ⟵ allocate-chain(C, bw)add-link-bandwidth(a, b, bw, cid)add-node(f, cid)

remove-link-bandwidth(a, b, bw, cid)remove-node(f, cid)remove-e2e-bandwidth(cid, bw)

Initial chain

Page 10: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

VNF Chain: six API with use-cases

10

NAT FW IDS VPN2 1 22 1

1

Chain scale-out Element upgrade Chain expand …

Initial chain

A graph can be transformed arbitrarily by manipulating individual nodes and edges.

N. Kodirov et al, VNF Chain Allocation and Management at Data Center Scale, ANCS 2018

Page 11: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Scale-out beyond single physical resource capacity

11

NAT FW IDS VPN2 1 22 1

1

Chain scale-out

cid ⟵ allocate-chain(C, bw)add-link-bandwidth(a, b, bw, cid)

(f, cid)

(a, b, bw, cid)(f, cid)

(cid, bw)

Initial chain

NAT FW IDS VPN50 50 40 40

10

50 ToR2

40

40 40

ToR1

40

Gateway

100

Page 12: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Abstract VNF chain• what tenant requires to allocate

and operates on

• Concrete VNF chain• cloud provider’s implementation

of the abstract chain

• Chains abstraction advantages• facilitates high DC utilization• improves SLA guarantees

Chain Abstraction: Abstract-Concrete VNF Chains

Concrete chains (for Cloud provider)

NAT FW IDS VPN5 4 55 41

Abstract chain (for Tenants)

NAT FW IDS VPN5 4 55 41

NAT FW IDS VPN50 50 40 40 50

10

12

NAT FW IDS VPN50 50 40 40 50

10

10×

Page 13: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Our contributions: API and algorithm

13

How can tenants allocate and manage their VNF chains?

How can cloud providers achieve high datacenter utilization?

• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization

• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale

• Ongoing work: chain abstraction

Internet…

TenantsCloud Provider

Page 14: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Algorithm inputs: DC topology and chain

14

40

ToR2

AggSw2AggSw1

40 40 4040

10 10

Gateway100 100

32 core128 GB[ ] 32 core

128 GB[ ]

[ 2048 TCAM ] [ 2048 TCAM ] ToR1

NAT FW IDS VPN2 1 22 1

1

1/8 core1/2 GB

3/8 core1/2 GB

1/2 core2 GB

1/4 core1/2 GB

Expected resource consumption per Gbps of traffic(see the paper for VNF profile generation)

Palkar et al., E2: A Framework for NFV Applications, SOSP’15Naik et al., NFVPerf: Online performance monitoring and bottleneck detection for NFV, IEEE NFV-SDN 2016. Nam et al., Probius: Automated Approach for VNF and Service Chain Analysis in Software-Defined NFV, SOSR'18

Page 15: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Algorithms for Chain Allocation and Management

15

1/8 core1/2 GB

3/8 core1/2 GB

1/2 core2 GB

1/4 core1/2 GB

Expected resource consumption per Gbps of traffic(see the paper for VNF profile generation)

40

ToR2

AggSw2AggSw1

40 40 4040

10 10

Gateway100 100

32 core128 GB[ ] 32 core

128 GB[ ]

[ 2048 TCAM ] [ 2048 TCAM ] ToR1

NAT FW IDS VPN2 1 22 1

1

Page 16: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Random (baseline)• Consider NFs and servers/switches in random order• Attempt the above step n times (e.g., n=100)• Choose the shortest path between chain NFs

Algorithms for Chain Allocation and Management

16

NAT IDS2 1 22 1

1

40

ToR2

AggSw2AggSw1

40 40 4040

10 10

Gateway100 100

32 core128 GB[ ] 32 core

128 GB[ ]

[ 2048 TCAM ] [ 2048 TCAM ] ToR1

VPNFW

Page 17: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Random (baseline)• Consider NFs and servers/switches in random order• Attempt the above step n times (e.g., n=100)• Choose the shortest path between chain NFs

• NetPack: Random + 3 simple heuristics• Consider the chain NFs in a topological order• Re-use the same server when allocating consecutive NFs• Gradually increase the network scope: rack, cluster, etc.

• VNFSolver: how optimal is NetPack?• Constraint-solver based chain allocation algorithm• Slow, but complete: finds a solution when one exists

Algorithms for Chain Allocation and Management

17

10-node

E2 Commercial Facebook

# of

allo

cate

d ch

ains ? ? ? ? ? ?

Palkar et al., E2: A Framework for NFV Applications, SOSP’15Bayless et al., SAT Modulo Monotonic Theories, AAAI'15

R R Rand

om

R

Rand

om

R

Net

Pack

Net

Pack

Net

Pack

Net

Pack

Net

Pack

N

Page 18: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Our contributions: API and algorithm

18

How can tenants allocate and manage their VNF chains?

How can cloud providers achieve high datacenter utilization?

• API to allocate and manage VNF chains• Three algorithms• implement the API, and• achieve high datacenter utilization

• Evaluation• simulate: in datacenter scale with 1000+ servers• Daisy: emulate chain management at rack-scale

• Ongoing work: chain abstraction

Internet…

TenantsCloud Provider

Page 19: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• How good is the datacenter utilization?• Evaluate Random, NetPack, and VNFSolver• Consider three different datacenter topologies • Use five different VNF chains with varying length (2-10)

• How fast is chain allocation?• Measure time it takes to saturate the datacenter

• Does API reliably implement the use-cases?• Prototype scale-out and chain upgrade in Daisy• Use two different racks, two sources of packet traces

Evaluation: Objectives

19

Page 20: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Datacenter utilization evaluation

20Palkar et al., E2: A Framework for NFV Applications, SOSP'15

NAT IDS2 1 22 1

1

VPNFW

Page 21: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Datacenter utilization evaluation

21

NetPack achieves at least 96% of VNFSolver allocations.Chain allocation time: Random ≲ NetPack ≪ VNFSolver.

Palkar et al., E2: A Framework for NFV Applications, SOSP'15

Page 22: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

NetPack Utilization and Speed

22

NetPack achieves at least 96% of VNFSolver allocations while being 82x faster than VNFSolver (optimal) and only up to

54% slower (per chain) than Random (baseline) on average.

Qualitatively similar results with Facebook and Commercial DC topologies with chains of up to 10 nodes.

(see the paper for details)

Page 23: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Daisy builds on Sonata framework• Mininet to build DC topology• OVS for switches, and Dockers for NFs

• Runs on a single Azure VM• 64 cores, 432 GB RAM

• Emulates use-cases and chain arrivals• scale-out and upgrade use-cases• continuous arrival of tenant chains in rack-scale

Feasibility check: does API work?

23

mininet

DAISY

Peuster et al., Sonata NFV SDK, github.com/sonata-nfv/son-emu, 2017

Page 24: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

VNF Chain scale-out

24

Chain scale-out

NAT IDS3 2 33 2

1

VPNFW

Daisy implements scale-out with no packet drops.

Thro

ughp

ut (M

bps)

NAT IDS2 1 22 1

1

VPNFW

Initial chain

Page 25: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Daisy: emulate continuous chain arrival

25

NAT IDS2 1 22 1

1

VPNFW

VNFSolver allocated 75 chains (687 Mbps)NetPack allocated 67 chains (633 Mbps)Random allocated 61 chains (561 Mbps)

(throughput with iperf generated packets is precise)

Agg

rega

te c

hain

th

roug

hput

(Mbp

s)

Time (s)

Page 26: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

26

Daisy implements scale-out with no packet drops and element upgrade with 1s packet drop at most.

We also emulated continuous chain arrival case where different tenants make chain allocation requests one-by-one.

Daisy Contributions

Page 27: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• API with six primitives• Implements wide-range of chain operations• Chain abstraction facilitates full DC utilization

• NetPack algorithm• Handles DC-scale allocation with 1000+ servers• Achieves at least 96% allocations of VNFSolver

(optimal) while being 82x faster on average

• Daisy prototype• Demonstrates feasibility of API and algorithms

•Ongoing work: chain abstraction

Snapshot of complete work

27

How can cloud providers achieve high data center utilization?

How can tenants allocate and manage their VNF chains?

Internet

TenantsCloud Provider

Page 28: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• Abstract VNF chain• what tenant requires to allocate

and operates on

• Concrete VNF chain• cloud provider’s implementation

of the abstract chain

• Chains abstraction advantages• facilitates high DC utilization• improves SLA guarantees

• Challenges• low-latency, packet loss,

state synchronization, efficiency loss, hotspots

Chain abstraction challenges

Concrete chains (for Cloud provider)

NAT FW IDS VPN5 4 55 41

Abstract chain (for Tenants)

NAT FW IDS VPN5 4 55 41

28

NAT FW IDS VPN50 50 40 40 50

10

10×

Kodirov et al., VNF Chain Abstraction for Cloud Service Providers, ANCS’18 poster

Page 29: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Problem: Hotspots

Concrete chains (for Cloud provider)

NAT FW IDS VPN5 4 55 41

Abstract chain (for Tenants)

NAT FW IDS VPN5 4 55 41

29

NAT FW IDS VPN50 50 40 40 50

10

10×

Page 30: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

40

ToR2

AggSw2AggSw1

40 40 4040

10 10

Gateway100 100

32 core128 GB[ ] 32 core

128 GB[ ]

[ 2048 TCAM ] [ 2048 TCAM ] ToR1

Problem: Hotspots

Abstract chain (for Tenants)

30

NAT FW IDS VPN50 50 40 40 50

10

NF(s)bw bw

NIC

Rx co

re-n

core

1co

re0 N

IC T

xNF(s)bw bw

NF(s)bw bw

Page 31: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

40

ToR2

AggSw2AggSw1

40 40 4040

10 10

Gateway100 100

32 core128 GB[ ] 32 core

128 GB[ ]

[ 2048 TCAM ] [ 2048 TCAM ] ToR1

• Hotspots in different layers:

CPU cores, NIC ports, ToR switch ports, etc.

Problem: Hotspots

Abstract chain (for Tenants)

31

NAT FW IDS VPN50 50 40 40 50

10

Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets'18

NF(s)bw bw

NIC

Rx co

re-n

core

1co

re0 N

IC T

xNF(s)bw bw

NF(s)bw bw

Page 32: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Microbenchmarks with up to 3 chains

Concrete chains

32OpenNetVM, github.com/sdnfv/openNetVM, 2016

• tNF: tunable NF• computes N prime numbers, per packet

(<50K in our experiments)• Throughput: 0.83 Mpps

(9900 Mbps with 1500 byte packet)

• NF runs on OpenNetVM• Varied the number of concrete chains

from one to three• Each concrete chain processes a

separate flow (5-tuple)

tNF9.9 9.9

Abstract chain

tNFbw bw

1 co

re, b

w=9

.9

tNFbw bw

2 co

res,

2bw

=4.4

5 x2

tNFbw bw 3 co

res,

3bw

=3.3

x3

Page 33: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Microbenchmarks with up to 3 chains

Concrete chains

33

tNFbw bw

tNFbw bw

tNFbw bwFlow

-Dire

ctor

core

0

core

3co

re2

core

1

Server

9.9Mbps

tNF9.9 9.9

Abstract chain• Tail latencies (99 percentile) in microseconds

• 1 concrete chain (uses 1 core for 9.9Gbps): 336 us• 2 concrete chains (uses 2 cores for 9.9Gbps): 182 us• 3 concrete chains (uses 3 cores for 9.9Gbps): 126 us

1.0

0.8

0.6

0.4

0.2

0

CDF

100

1 c.chain 2 c.chains 3 c.chains

Packet processing latency in log scale (microseconds)

Page 34: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

Observations and Ongoing Work

34

Concrete chains

tNFbw bw

tNFbw bw

tNFbw bw

• Latency grows proportional to the core load

• Need CPU load-aware chain-splitting mechanism• N flows to 1 core (N-to-1) for mice flows• 1-to-N for elephant flows

• Splitting should happen in multiple levels• CPU cores, NIC ports, ToR ports

• Need to support wide-range of NFs• Bounded tail latencies are particularly challenging

for stateful NFs, such as DPI

tNF9.9 9.9

Abstract chain

Sadok et al., A Case for Spraying Packets in Software Middleboxes, HotNets’18Khalid and Akella, Correctness and Performance for Stateful Chained Network Functions, NSDI’19

Page 35: VNF Chain Allocation and Management at Datacenter Scale fileVNF Chain Allocation and Management at Datacenter Scale Nodir Kodirov Sam Bayless, Fabian Ruffy, Swati Goswami Ivan Beschastnikh,

• API with six primitives• Implements wide-range of chain operations• Chain abstraction facilitates full DC utilization

• NetPack algorithm• Handles DC-scale allocation with 1000+ servers• Achieves at least 96% allocations of VNFSolver

(optimal) while being 82x faster on average• Ongoing work: chain abstraction

• Need load-aware chain-splitting mechanism

Conclusion

35

How can cloud providers achieve high data center utilization?

How can tenants allocate and manage their VNF chains?

Internet

TenantsCloud Provider

Thank you!