Taming Latency in Software Defined...

Post on 21-Jul-2019

225 views 0 download

Transcript of Taming Latency in Software Defined...

Understanding Latency in Software Defined Networks

Keqiang He, Sourav Das, Aditya Akella

1

Junaid Khalid

Li Erran Li, Marina Thottan

Latency in SDN

Centralized Controller

Mobility

100 flows

Openflow msgs

2

Some XXX msecs?? Can be as large as 10 secs!!!

Time taken to install 100 rules ?

Do applications care about latency?

3

Intra-DC Traffic Engineering

• MicroTE routes predictable traffic on time scales of 1-2s

• Route updates can take as long as 0.5s

Latency can undermine MicroTE’s effectiveness

Fast Failover

• Reroute the affected flows quickly in face of failures

• Longer update time increases congestion and drops

Latency can inflate failover time by nearly 20s!

Latency is critical to many applications

Limits the applicability of SDN applications

Applications assume latency is low and constant

Factors contributing to Latency?

4

Speed of Control Programs and Network Latency

Latency in network switches

Robust Control Software Design and Distributed Controllers

Not received much attention

Our Work

5

Latency in network switches

Two contributions:

• Systematic experiments to explore latencies in production switches

• Design a framework to overcome the impact of latencies

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

6

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

7

Inbound Latency

Elements of Latency

PHY PHY

Switch

Controller

I1: Send to ASIC SDK I2: Send to OF Agent I3: Send to Controller

Packet

8

CPU Memory DMA

ASIC SDK

OF Agent

I2 CPU board

Forwarding Engine Lookup Hardware

Tables

No Match

I1

PC

I

Switch Fabric

I3

1. Inbound Latency

Outbound Latency

Elements of Latency

PHY PHY

Switch

Controller

O1: Parse OF Msg O2: Software schedules the rule O3: Reordering of rules in table O4: Rule is updated in table

9

Memory CPU

ASIC SDK

OF Agent

O2 DMA

O1

Forwarding Engine Lookup Hardware

Tables

PC

I O4

O3

Switch Fabric

CPU board

1. Inbound Latency 2. Outbound Latency

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

10

Latency Measurements - Setup

eth0

eth1

eth2

Control Channel

Flows IN

Flows OUT

Openflow Switch

POX Controller

Pktgen

Libpcap

11

Switch CPU RAM Flow table size

Data Plane

Vendor A 1 Ghz 1 GB 896 14*10Gbps + 4*40Gbps

Vendor B 2 Ghz 2 GB 4096 40*10Gbps +4*40Gbps

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

12

Inbound Latency

• Increases with flow arrival rate

• CPU Usage is higher for higher flow arrival rates

13

Flow Arrival Rate (packets/sec)

Mean Delay per packet_in (msec)

CPU Usage ( % )

0 - 7.1

100 3.32 15.7

200 8.33 26.5

vendor A switch

Low Power CPU

Inbound Latency

vendor A switch 200 flows/sec

• Increases with interference from outbound msgs

14

0

50

100

150

200

250

300

0 200 400 600 800 1000

inb

ou

nd

del

ay (

ms)

flow #

(a) with flow_mod/pkt_out

0

50

100

150

200

250

300

0 200 400 600 800 1000

inb

ou

nd

del

ay (

ms)

flow #

(b) w/o flow_mod/pkt_out

Low Power CPU

Software Inefficiency

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

15

Outbound Latency

• Latency for three different flow_mod operations Insertion

Modification

Deletion

• Impact of key factors on these latencies Table occupancy

Rule priority structure

16

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

17

Insertion Latency – Priority Effects

Vendor B switch

18

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(a) Burst size 100, same priority

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(b) Burst size 100, increasing priority

Insertion Latency – Priority Effects

Vendor B switch

19

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(a) Burst size 100, same priority

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(b) Burst size 100, increasing priority

100

100

0x0000

TCAM

Insertion Latency – Priority Effects

Vendor B switch

20

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(a) Burst size 100, same priority

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(b) Burst size 100, increasing priority

100

100

99

0x0000

TCAM

Insertion Latency – Priority Effects

Vendor B switch

21

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(a) Burst size 100, same priority

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(b) Burst size 100, increasing priority

100

100 99

101

0x0000

TCAM

Insertion Latency – Priority Effects

Vendor B switch

22

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(a) Burst size 100, same priority

0

5

10

15

20

25

30

0 20 40 60 80 100

inse

rtio

n d

elay

(m

s)

rule #

(b) Burst size 100, increasing priority

100

100 99

101

0x0000

TCAM

Insertion Latency – Priority Effects

Vendor A switch

23

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

Insertion Latency – Priority Effects

Vendor A switch

24

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

0x0000

TCAM

Insertion Latency – Priority Effects

Vendor A switch

25

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

0x0000

TCAM

100

101

102

103

104

Insertion Latency – Priority Effects

Vendor A switch

26

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

0x0000

TCAM

100

101

102

103

104

99

Insertion Latency – Priority Effects

Vendor A switch

27

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

0x0000

TCAM

100

101

102

103

104

99

Insertion Latency – Priority Effects

Vendor A switch

28

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (a) Burst size 800, same priority

0

2

4

6

8

10

0 100 200 300 400 500 600 700 800

inse

rtio

n d

elay

(m

s)

rule # (b) Burst size 800, decreasing priority

0x0000

TCAM

100

101

102

103

104

99

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

0 100 200 300 400 500 600

avg

com

ple

tio

n t

ime

(ms)

burst size

(b) high priority rules into a table with low priority existing rules

table 100 table 400

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

20000

0 100 200 300 400 500 600

avg

com

ple

tio

n t

ime

(ms)

burst size

(a) low priority rules into a table with high priority existing rules

table 100 table 400

Insertion Latency – Table occupancy Effects

• Affected by the types of rules in the table

Vendor B switch 29

Rule Priority Structure and Table Occupancy

TCAM Organization and # Of Slices

Switch Software Overhead

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

30

Modification Latency

• Higher than Insertion latency for vendor B

• Not affected by rule priority but affected by table occupancy

Vendor B switch 31

0

20

40

60

80

100

120

140

160

0 20 40 60 80 100

mo

dif

icat

ion

del

y (m

s)

rule #

(a) 100 rules in the table

0

20

40

60

80

100

120

140

160

0 50 100 150 200

mo

dif

icat

ion

del

ay (

ms)

rule #

(b) 200 rules in table

Poorly optimized software in vendor B

Outline

• Motivation

• Elements of Latency

• Measurement Methodology

• Inbound Latency

• Outbound Latency • Insertion

• Modification

• Deletion

32

Deletion Latency

• Higher than Insertion latency for both vendor A and B

• Not affected by rule priority but affected by table occupancy

Vendor B switch

33

0

20

40

60

80

100

120

0 50 100 150 200

del

etio

n d

elay

(m

s)

rule #

(b) 200 rules in table

0

20

40

60

80

100

120

0 20 40 60 80 100

del

etio

n d

elay

(m

s)

rule #

(a) 100 rules in table

Deletion Latency

• Higher than Insertion latency for both vendor A and B

• Not affected by rule priority but affected by table occupancy

34

Vendor A switch

0

2

4

6

8

10

12

14

0 20 40 60 80 100

del

etio

n d

elay

(m

s)

rule #

(a) 100 rules in table

0

2

4

6

8

10

12

14

0 50 100 150 200

del

etio

n d

elay

(m

s)

rule #

(b) 200 rules in table

Deletion is causing TCAM Reorganization

Summary

• Latency in SDN is critical to many applications

• Assumption: Latency is small or constant

• Latency is high and variable

• Varies with Platforms, Type of operations, Rule priorities, Table occupancy, Concurrent operations

• Key Factors: TCAM Organization, Switch CPU and inefficient Software Implementation

35

Summary

• Latency in SDN is critical to many applications

• Assumption: Latency is small or constant

• Latency is high and variable

• Varies with Platforms, Type of operations, Rule priorities, Table occupancy, Concurrent operations

• Key Factors: TCAM Organization, Switch CPU and inefficient Software Implementation

36

Backup Slides

37

Impact of Concurrent CPU jobs

38

• Impacts insertion delay. E.g. Polling Statistics

• Polling stats impacts more when table occupancy is higher

0

200

400

600

800

1000

1200

0 1 2 3 4 5

flow stats polling count

table occupancy 0 table occupancy 500

Low Power CPU delays concurrent jobs

Vendor B