Post on 23-Jan-2016
description
Dynamic Scheduling ofNetwork Updates
Xin JinHongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan,
Ming Zhang, Jennifer Rexford, Roger Wattenhofer
2
SDN: Paradigm Shift in Networking
• Many benefits– Traffic engineering [B4, SWAN]
– Flow scheduling [Hedera, DevoFlow]
– Access control [Ethane, vCRIB]
– Device power management [ElasticTree]
Controller• Direct, centralized updates of forwarding rules in switches
3
Network Update is Challenging
• Requirement 1: fast– The agility of control loop
• Requirement 2: consistent– No congestion, no blackhole, no loop, etc.
Network
Controller
4
What is Consistent Network Update
Current State
A
D
CB
EF1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
5
What is Consistent Network Update
• Asynchronous updates can cause congestion• Need to carefully order update operations
Current State
A
D
CB
EF1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
A
D
CB
EF4: 5F1: 5
F3: 10F2: 5
Transient State
6
Existing Solutions are Slow
• Existing solutions are static [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-compute an order for update operations
F3
F4
F2
F1
StaticPlan A
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
7
Existing Solutions are Slow
• Existing solutions are static [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-compute an order for update operations
• Downside: Do not adapt to runtime conditions– Slow in face of highly variable operation completion time
F3
F4
F2
F1
StaticPlan A
8
Operation Completion Times are Highly Variable
>10x
• Measurement on commodity switches
9
Operation Completion Times are Highly Variable
• Measurement on commodity switches
• Contributing factors– Control-plane load– Number of rules– Priority of rules– Type of operations (insert vs. modify)
10
Static Schedules can be Slow
F3
F4
F2 F1StaticPlan B
F3
F4
F2
F1
StaticPlan A
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
F1F2F3F4
Time1 2 3 4 5
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
No static schedule is a clear winner under all conditions!
11
Dynamic Schedules are Adaptive and Fast
F3
F4
F2 F1StaticPlan B
F3
F4
F2
F1
StaticPlan A
Target StateCurrent State
A
D
CB
EF4: 5
F2: 5
F1: 5
F3: 10A
D
CB
E
F1: 5
F4: 5
F2: 5 F3: 10
No static schedule is a clear winner under all conditions!
F3
F4
F2F1
DynamicPlan
Adapts to actual conditions!
12
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
13
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5F1: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5F2: 5
F2
Deadlock
14
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
EF3: 5
F2: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1
F1: 5
15
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
E
F2: 5
F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1 F3
F1: 5
F3: 5
16
Challenges of Dynamic Update Scheduling
• Exponential number of orderings• Cannot completely avoid planning
Current State
A
D
CB
E F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F1 F3 F2
Consistent update plan
F2: 5
F1: 5
F3: 5
17
Dionysus Pipeline
Dependency Graph Generator
Update Scheduler
Network
CurrentState
TargetState
ConsistencyProperty
Encode valid orderings
Determine a fast order
18
Dependency Graph Generation
Dependency Graph Elements
Operation node
Resource node (link capacity, switch table size)Node
Edge Dependencies between nodes
Path node
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
19
Dependency Graph Generation
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
MoveF3
MoveF1
MoveF2
20
Dependency Graph Generation
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
A-E: 5
MoveF3
MoveF1
MoveF2
21
Dependency Graph Generation
Current State
A
D
CB
EF1: 5
F4: 5
F5: 10
A-E: 5
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
F3: 5
F2: 5
22
Dependency Graph Generation
A-E: 55
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
23
Dependency Graph Generation
A-E: 55
55
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
24
Dependency Graph Generation
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
Target State
A
D
CB
E
F1: 5
F3: 5F4: 5
F5: 10
F2: 5
Current State
A
D
CB
EF3: 5
F2: 5
F1: 5F4: 5
F5: 10
25
Dependency Graph Generation
• Supported scenarios– Tunnel-based forwarding: WANs– WCMP forwarding: data center networks
• Supported consistency properties– Loop freedom– Blackhole freedom– Packet coherence– Congestion freedom
• Check paper for details
26
Dionysus Pipeline
Dependency Graph Generator
Update Scheduler
Network
CurrentState
TargetState
ConsistencyProperty
Encode valid orderings
Determine a fast order
27
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
28
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 0
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
Deadlock!
29
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
D-E: 05 5
5
MoveF3
MoveF1
MoveF2
30
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
D-E: 55
5
MoveF3
MoveF2
31
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 05
5
MoveF3
MoveF2
32
Dionysus Scheduling
• Scheduling as a resource allocation problem
A-E: 55 Move
F2
33
Dionysus Scheduling
• Scheduling as a resource allocation problem
MoveF2
Done!
34
Dionysus Scheduling
• Scheduling as a resource allocation problem
• NP-complete problems under link capacity and switch table size constraints
• Approach– DAG: always feasible, critical-path scheduling– General case: covert to a virtual DAG– Rate limit flows to resolve deadlocks
35
Critical-Path Scheduling
• Calculate critical-path length (CPL) for each node
– Extension: assign larger weight to operation nodes if we know in advance the switch is slow
• Resource allocated to operation nodes with larger CPLs
A-B:5
CPL=3
C-D:0
5
CPL=1
5
5 5
5
CPL=1
CPL=2
CPL=2
CPL=1
MoveF4
MoveF3
MoveF2
MoveF1
36
Critical-Path Scheduling
• Calculate critical-path length (CPL) for each node
– Extension: assign larger weight to operation nodes if we know in advance the switch is slow
• Resource allocated to operation nodes with larger CPLs
A-B:0
CPL=3
C-D:0
5
CPL=1
5
5
5
CPL=1
CPL=2
CPL=2
CPL=1
MoveF4
MoveF3
MoveF2
MoveF1
37
Handling Cycles
• Convert to virtual DAG– Consider each strongly connected
component (SCC) as a virtual node
CPL=1CPL=3
weight = 1weight = 2
• Critical-path scheduling on virtual DAG– Weight wi of SCC: number of operation nodes
A-E: 55
D-E: 0
5
5 5
5
MoveF3
MoveF1
MoveF2
MoveF2
38
Handling Cycles
• Convert to virtual DAG– Consider each strongly connected
component (SCC) as a virtual node
CPL=1CPL=3
weight = 1weight = 2
• Critical-path scheduling on virtual DAG– Weight wi of SCC: number of operation nodes
A-E: 05
D-E: 05 5
5
MoveF3
MoveF1
MoveF2
MoveF2
39
Evaluation: Traffic Engineering
WAN TE Data Center TE0
0.5
1
1.5
2
2.5
3
3.5
50th Percentile Update Time
OneShotDionysusSWAN
Upd
ate
Tim
e (s
econ
d)
Not congestion-free
Congestion-free
Congestion-free
40
Evaluation: Traffic Engineering
Improve 50th percentile update speed by 80%compared to static scheduling (SWAN), close to OneShot
WAN TE Data Center TE0
0.5
1
1.5
2
2.5
3
3.5
50th Percentile Update Time
OneShotDionysusSWAN
Upd
ate
Tim
e (s
econ
d)
Not congestion-free
Congestion-free
Congestion-free
41
Evaluation: Failure Recovery
OneShot Dionysus SWAN0
1
2
3
4
5
99th PercentileLink Oversubscription
Link
Ove
rsub
scrip
tion
(GB)
OneShot Dionysus SWAN0
0.5
1
1.5
2
2.5
3
99th PercentileUpdate Time
Upd
ate
Tim
e (s
econ
d)
42
Evaluation: Failure Recovery
Reduce 99th percentile link oversubscription by 40% compared to static scheduling (SWAN)
Improve 99th percentile update speed by 80% compared to static scheduling (SWAN)
OneShot Dionysus SWAN0
1
2
3
4
5
99th PercentileLink Oversubscription
Link
Ove
rsub
scrip
tion
(GB)
OneShot Dionysus SWAN0
0.5
1
1.5
2
2.5
3
99th PercentileUpdate Time
Upd
ate
Tim
e (s
econ
d)
43
Conclusion
• Dionysus provides fast, consistent network updates through dynamic scheduling– Dependency graph: compactly encode orderings– Scheduling: dynamically schedule operations
Dionysus enables more agile SDN control loops
44
Thanks!