SIGCOMM14 Dionysus slides
Transcript of SIGCOMM14 Dionysus slides
Dynamic Scheduling of Network Updates
Xin Jin Hongqiang Harry Liu, Rohan Gandhi, Srikanth Kandula, Ratul Mahajan,
Ming Zhang, Jennifer Rexford, Roger WaHenhofer
SDN: Paradigm ShiK in Networking
• Many benefits – Traffic engineering [B4, SWAN]
– Flow scheduling [Hedera, DevoFlow] – Access control [Ethane, vCRIB] – Device power management [ElasZcTree]
1
Controller • Direct, centralized updates of forwarding rules in switches
Network Update is Challenging
• Requirement 1: fast – The agility of control loop
• Requirement 2: consistent – No congesZon, no blackhole, no loop, etc.
2 Network
Controller
What is Consistent Network Update
3
Current State
A
D
C B
E F1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
C B
E
F1: 5
F4: 5
F2: 5 F3: 10
What is Consistent Network Update
• Asynchronous updates can cause congesZon • Need to carefully order update operaZons
4
Current State
A
D
C B
E F1: 5 F4: 5
F2: 5 F3: 10
Target State
A
D
C B
E
F1: 5
F4: 5
F2: 5 F3: 10
A
D
C B
E F4: 5 F1: 5
F3: 10 F2: 5
Transient State
ExisZng SoluZons are Slow
• ExisZng soluZons are staZc [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-‐compute an order for update operaZons
5
F3
F4
F2
F1
StaZc Plan A
Target State Current State
A
D
C B
E F4: 5
F2: 5
F1: 5
F3: 10 A
D
C B
E
F1: 5
F4: 5
F2: 5 F3: 10
ExisZng SoluZons are Slow
• ExisZng soluZons are staZc [ConsistentUpdate’12, SWAN’13, zUpdate’13]
– Pre-‐compute an order for update operaZons
• Downside: Do not adapt to runZme condiZons – Slow in face of highly variable operaZon compleZon Zme
6
F3
F4
F2
F1
StaZc Plan A
OperaZon CompleZon Times are Highly Variable
7
>10x
• Measurement on commodity switches
OperaZon CompleZon Times are Highly Variable
8
• Measurement on commodity switches • ContribuZng factors
– Control-‐plane load – Number of rules – Priority of rules – Type of operaZons (insert vs. modify)
StaZc Schedules can be Slow
9
F3
F4
F2 F1 StaZc Plan B
F3
F4
F2
F1
StaZc Plan A
F1 F2 F3 F4
Time 1 2 3 4 5
F1 F2 F3 F4
Time 1 2 3 4 5
F1 F2 F3 F4
Time 1 2 3 4 5
F1 F2 F3 F4
Time 1 2 3 4 5
Target State Current State
A
D
C B
E F4: 5
F2: 5
F1: 5
F3: 10 A
D
C B
E
F1: 5
F4: 5
F2: 5 F3: 10
No staZc schedule is a clear winner under all condiZons!
Dynamic Schedules are AdapZve and Fast
10
F3
F4
F2 F1 StaZc Plan B
F3
F4
F2
F1
StaZc Plan A
Target State Current State
A
D
C B
E F4: 5
F2: 5
F1: 5
F3: 10 A
D
C B
E
F1: 5
F4: 5
F2: 5 F3: 10
No staZc schedule is a clear winner under all condiZons!
F3
F4
F2 F1
Dynamic Plan
Adapts to actual condiZons!
Challenges of Dynamic Update Scheduling
• ExponenZal number of orderings • Cannot completely avoid planning
11
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Challenges of Dynamic Update Scheduling
• ExponenZal number of orderings • Cannot completely avoid planning
12
Current State
A
D
C B
E F3: 5 F1: 5 F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5 F2: 5
F2
Deadlock
Challenges of Dynamic Update Scheduling
• ExponenZal number of orderings • Cannot completely avoid planning
13
Current State
A
D
C B
E F3: 5
F2: 5
F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
F1
F1: 5
Challenges of Dynamic Update Scheduling
• ExponenZal number of orderings • Cannot completely avoid planning
14
Current State
A
D
C B
E
F2: 5
F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
F1 F3
F1: 5
F3: 5
Challenges of Dynamic Update Scheduling
• ExponenZal number of orderings • Cannot completely avoid planning
15
Current State
A
D
C B
E F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
F1 F3 F2
Consistent update plan
F2: 5
F1: 5
F3: 5
Dionysus Pipeline
16
Dependency Graph Generator
Update Scheduler
Network
Current State
Target State
Consistency Property
Encode valid orderings
Determine a fast order
Dependency Graph GeneraZon
17
Dependency Graph Elements
OperaZon node
Resource node (link capacity, switch table size) Node
Edge Dependencies between nodes
Path node
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Dependency Graph GeneraZon
18
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Move F3
Move F1
Move F2
Dependency Graph GeneraZon
19
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
A-‐E: 5
Move F3
Move F1
Move F2
Dependency Graph GeneraZon
20
Current State
A
D
C B
E F1: 5 F4: 5
F5: 10
A-‐E: 5
5
Move F3
Move F1
Move F2
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
F3: 5
F2: 5
Dependency Graph GeneraZon
21
A-‐E: 5 5
5
Move F3
Move F1
Move F2
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Dependency Graph GeneraZon
22
A-‐E: 5 5
5 5
Move F3
Move F1
Move F2
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Dependency Graph GeneraZon
23
A-‐E: 5 5
D-‐E: 0
5
5 5
5
Move F3
Move F1
Move F2
Target State
A
D
C B
E
F1: 5
F3: 5 F4: 5
F5: 10
F2: 5
Current State
A
D
C B
E F3: 5
F2: 5
F1: 5 F4: 5
F5: 10
Dependency Graph GeneraZon • Supported scenarios
– Tunnel-‐based forwarding: WANs – WCMP forwarding: data center networks
• Supported consistency properZes – Loop freedom – Blackhole freedom – Packet coherence – CongesZon freedom
• Check paper for details
24
Dionysus Pipeline
25
Dependency Graph Generator
Update Scheduler
Network
Current State
Target State
Consistency Property
Encode valid orderings
Determine a fast order
Dionysus Scheduling • Scheduling as a resource allocaZon problem
26
A-‐E: 5 5
D-‐E: 0
5
5 5
5
Move F3
Move F1
Move F2
Dionysus Scheduling • Scheduling as a resource allocaZon problem
27
A-‐E: 0
D-‐E: 0
5
5 5
5
Move F3
Move F1
Move F2
Deadlock!
Dionysus Scheduling • Scheduling as a resource allocaZon problem
28
A-‐E: 0 5
D-‐E: 0 5 5
5
Move F3
Move F1
Move F2
Dionysus Scheduling • Scheduling as a resource allocaZon problem
29
A-‐E: 0 5
D-‐E: 5 5
5
Move F3
Move F2
Dionysus Scheduling • Scheduling as a resource allocaZon problem
30
A-‐E: 0 5
5
Move F3
Move F2
Dionysus Scheduling • Scheduling as a resource allocaZon problem
31
A-‐E: 5 5 Move
F2
Dionysus Scheduling • Scheduling as a resource allocaZon problem
32
Move F2
Done!
Dionysus Scheduling • Scheduling as a resource allocaZon problem
• NP-‐complete problems under link capacity and switch table size constraints
• Approach – DAG: always feasible, criZcal-‐path scheduling – General case: covert to a virtual DAG – Rate limit flows to resolve deadlocks
33
CriZcal-‐Path Scheduling
• Calculate criZcal-‐path length (CPL) for each node
– Extension: assign larger weight to operaZon nodes if we know in advance the switch is slow
• Resource allocated to operaZon nodes with larger CPLs
34
A-‐B:5
CPL=3
C-‐D:0
5
CPL=1
5
5 5
5
CPL=1
CPL=2
CPL=2
CPL=1
Move F4
Move F3
Move F2
Move F1
CriZcal-‐Path Scheduling
• Calculate criZcal-‐path length (CPL) for each node
– Extension: assign larger weight to operaZon nodes if we know in advance the switch is slow
• Resource allocated to operaZon nodes with larger CPLs
35
A-‐B:0
CPL=3
C-‐D:0
5
CPL=1
5
5
5
CPL=1
CPL=2
CPL=2
CPL=1
Move F4
Move F3
Move F2
Move F1
Handling Cycles
• Convert to virtual DAG – Consider each strongly connected
component (SCC) as a virtual node
36
CPL=1 CPL=3
weight = 1 weight = 2
• CriZcal-‐path scheduling on virtual DAG – Weight wi of SCC: number of operaZon nodes
A-‐E: 5 5
D-‐E: 0
5
5 5
5
Move F3
Move F1
Move F2
Move F2
Handling Cycles
• Convert to virtual DAG – Consider each strongly connected
component (SCC) as a virtual node
37
CPL=1 CPL=3
weight = 1 weight = 2
• CriZcal-‐path scheduling on virtual DAG – Weight wi of SCC: number of operaZon nodes
A-‐E: 0 5
D-‐E: 0 5 5
5
Move F3
Move F1
Move F2
Move F2
EvaluaZon: Traffic Engineering
38
0
0.5
1
1.5
2
2.5
3
3.5
WAN TE Data Center TE
Upd
ate Time (secon
d)
50th PercenZle Update Time
OneShot
Dionysus
SWAN
Not congesZon-‐free
CongesZon-‐free
CongesZon-‐free
EvaluaZon: Traffic Engineering
39
Improve 50th percenZle update speed by 80% compared to staZc scheduling (SWAN), close to OneShot
0
0.5
1
1.5
2
2.5
3
3.5
WAN TE Data Center TE
Upd
ate Time (secon
d)
50th PercenZle Update Time
OneShot
Dionysus
SWAN
Not congesZon-‐free
CongesZon-‐free
CongesZon-‐free
EvaluaZon: Failure Recovery
40
0
1
2
3
4
5
OneShot Dionysus SWAN
Link Oversub
scrip
Zon (GB)
99th PercenZle Link OversubscripZon
0
0.5
1
1.5
2
2.5
3
OneShot Dionysus SWAN
Upd
ate Time (secon
d)
99th PercenZle Update Time
EvaluaZon: Failure Recovery
41
Reduce 99th percenZle link oversubscripZon by 40% compared to staZc scheduling (SWAN)
Improve 99th percenZle update speed by 80% compared to staZc scheduling (SWAN)
0
1
2
3
4
5
OneShot Dionysus SWAN
Link Oversub
scrip
Zon (GB)
99th PercenZle Link OversubscripZon
0
0.5
1
1.5
2
2.5
3
OneShot Dionysus SWAN
Upd
ate Time (secon
d)
99th PercenZle Update Time
Conclusion
• Dionysus provides fast, consistent network updates through dynamic scheduling – Dependency graph: compactly encode orderings – Scheduling: dynamically schedule operaZons
42
Dionysus enables more agile SDN control loops
Thanks!
43