Routing Convergence and the Impact of Scale
Dan Massey, Colorado State University
26 October 05, massey@cs.colostate.edu
Internet Routing and BGP
Internet divided into Autonomous Systems
  Large scale implies that maintaining the entire topology at a router is not feasible.
BGP is the inter-AS routing protocol.
  Router stores the AS path to a destination.
  Path allows router to apply policies
How quickly does BGP converge after a change?
Can BGP continue to scale with more growth?
Do we need BGP changes or a new protocol?
BGP Path Exploration
[Figure: topology with nodes A, B, C, D, E, F, G, H, I, and Z, with a destination label (dest.). Path labels shown in the figure: (B A), (C B A), (E D B A), (H G F E A)]
Z's candidate paths:
  () (C B A) (E D B A) (I H G F A)
  () () (E D B A) (I H G F A)
  () () () (I H G F A)
Obsolete paths: (C B A), (E D B A)
If Z had known that [B A] failed, it could have avoided the obsolete paths.
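The path exploration in this slide can be replayed with a minimal Python sketch. It assumes that Z learns each candidate path from the neighbor named by the path's first hop, that withdrawals reach Z in the order B, C, E, and that Z always picks the shortest remaining path. All three are simplifying assumptions for illustration, and the function name `explore` is hypothetical.

```python
def explore(candidates, withdrawals):
    """Return the path Z selects after each withdrawal arrives.

    candidates:  dict mapping neighbor -> AS path (tuple) learned from it.
    withdrawals: neighbors, in the order their withdrawals reach Z.
    """
    table = dict(candidates)
    history = []
    for neighbor in withdrawals:
        # The withdrawal only removes the path learned from this neighbor.
        table.pop(neighbor, None)
        # Assumption: Z prefers the shortest remaining path.
        best = min(table.values(), key=len) if table else None
        history.append(best)
    return history


# Candidate paths from the slide, keyed by their (assumed) first-hop neighbor.
candidates = {
    "B": ("B", "A"),
    "C": ("C", "B", "A"),
    "E": ("E", "D", "B", "A"),
    "I": ("I", "H", "G", "F", "A"),
}

history = explore(candidates, ["B", "C", "E"])
for step, best in enumerate(history, start=1):
    print(step, best)
```

Under these assumptions Z steps through (C B A) and (E D B A), both of which still depend on the failed [B A] link, before it settles on (I H G F A).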
Path Exploration and Policy
Internet does not select the shortest path
Policies limit the number of potential paths.
  Especially at high-level tiers.
Example: Due to routing policy, AS-X (lower tier) sees more alternate paths than AS-Z (tier-1).
  Via multiple providers
  Via peers
[Figure: example topology with nodes Z, X, Y, W, P1, and P2]
Impact of Topology Growth
Denser connectivity => more alternate paths
Impact depends on policies and tier
  Lower-tier nodes see more slow convergence
Figure labels: MRAI off, MRAI on
Beacon prefix 198.32.7.0/24

RV peer (AS#)    Jan 2, 2004           Dec 2, 2004
                 #updates   #paths     #updates   #paths
1239 (tier1)     44         4          37         4
1221             62         8          87         11
2914 (tier1)     106        6          279        7
3557             102        19         198        39
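To make the change between the two measurement dates easier to read, the following Python sketch computes the Dec/Jan ratio of updates and of paths for each peer. It assumes the first two numbers in each row belong to Jan 2, 2004 and the last two to Dec 2, 2004, as the column order suggests. The names `rows` and `growth` are illustrative only.

```python
# (updates_jan, paths_jan, updates_dec, paths_dec) per RouteViews peer,
# copied from the table; the column-to-date mapping is an assumption.
rows = {
    "1239 (tier1)": (44, 4, 37, 4),
    "1221": (62, 8, 87, 11),
    "2914 (tier1)": (106, 6, 279, 7),
    "3557": (102, 19, 198, 39),
}


def growth(table):
    """Return {peer: (update_ratio, path_ratio)} as Dec value / Jan value."""
    return {
        peer: (u_dec / u_jan, p_dec / p_jan)
        for peer, (u_jan, p_jan, u_dec, p_dec) in table.items()
    }


ratios = growth(rows)
for peer, (update_ratio, path_ratio) in ratios.items():
    print(f"{peer:14s} updates x{update_ratio:.2f}  paths x{path_ratio:.2f}")
```

The ratios are only a reading aid for the four rows shown; they do not say anything about peers outside the table.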
Convergence Improvements
MRAI Timer (Deployed Now)
  Require minimum time between updates
  Typically 30 seconds
Assertion Checking (Proposed in INFOCOM 02)
  Signal policy or topological failure in some cases
  Discard routes that include failed subpath
Ghost Flushing (Proposed in INFOCOM 03)
  When the MRAI timer delays an update, send a withdrawal
Attach Failure Notification (INFOCOM05, CompNet05)
  Explicitly list the cause of the failure
MRAI Rate-Limiting Timer
Minimum Route Advertisement Interval (MRAI) timer:
Within M=30 seconds, at most one announcement from A to B
[Figure: timeline with marks at time=0, time=30, and time=60]
A's path changes: P1 P2 P3 P4 P5
Msgs from A to B: P1 P4 P5
Impact:
  a. suppress transient changes
  b. delay convergence
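The rate-limiting behavior can be illustrated with a small Python sketch. It is a simplified model, not BGP's actual timer logic: one timer for a single neighbor and prefix, no jitter, and the times at which A's path changes (0, 5, 12, 20, 40) are hypothetical values chosen for the example. The function name `mrai_schedule` is also an assumption.

```python
def mrai_schedule(changes, mrai=30):
    """Return the announcements A sends to B as (send_time, path) pairs.

    changes: list of (time, path) sorted by time, one entry per path change.
    mrai:    minimum spacing in seconds between two announcements.
    """
    sent = []
    next_allowed = 0   # earliest time the next announcement may go out
    pending = None     # latest path held back by the timer
    for time, path in changes:
        # The timer expired before this change: flush the held-back path.
        if pending is not None and next_allowed <= time:
            sent.append((next_allowed, pending))
            next_allowed += mrai
            pending = None
        if time >= next_allowed:
            # Timer not running: announce right away and start the timer.
            sent.append((time, path))
            next_allowed = time + mrai
        else:
            # Timer running: remember only the most recent path.
            pending = path
    if pending is not None:
        sent.append((next_allowed, pending))
    return sent


# Hypothetical change times for the five paths named on the slide.
changes = [(0, "P1"), (5, "P2"), (12, "P3"), (20, "P4"), (40, "P5")]
schedule = mrai_schedule(changes)
print(schedule)
```

In this run P2 and P3 are never announced (transient changes are suppressed), while P4 and P5 each wait for the timer to expire (convergence is delayed).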
MRAI and Ghost Flushing
MRAI prevents removal of stale information
  Suppose P1 to P5 are increasingly worse
  Neighbor believes P1 is still available until time 30
[Figure: timeline with marks at time=0, time=30, and time=60]
A's path changes: P1 P2 P3 P4 P5
Msgs from A to B: P1 P4 P5, plus withdrawals (w)
Ghost Flushing: if the path changes to a longer one while the MRAI timer is holding back the update, send a withdrawal
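A rough Python sketch of the ghost-flushing rule follows. It models a single MRAI timer and adds one rule: when a change to a longer path arrives while the timer is running, a withdrawal ("w") goes out immediately, since withdrawals are not held back by the timer in this model. The path lengths and change times are hypothetical, and `ghost_flushing_schedule` is an illustrative name, not taken from the proposal.

```python
def ghost_flushing_schedule(changes, mrai=30):
    """Return the messages A sends to B as (time, message) pairs.

    changes: list of (time, path_name, path_length) sorted by time.
    A message is either a path name (announcement) or "w" (withdrawal).
    """
    msgs = []
    next_allowed = 0        # earliest time the next announcement may go out
    pending = None          # (name, length) held back by the MRAI timer
    advertised_len = None   # length of the path B currently believes in
    for time, name, length in changes:
        # The timer expired before this change: flush the held-back path.
        if pending is not None and next_allowed <= time:
            msgs.append((next_allowed, pending[0]))
            advertised_len = pending[1]
            next_allowed += mrai
            pending = None
        if time >= next_allowed:
            msgs.append((time, name))
            advertised_len = length
            next_allowed = time + mrai
        else:
            pending = (name, length)
            # Ghost flushing: B still holds a shorter, now stale path,
            # so tell it to drop that path instead of waiting for the timer.
            if advertised_len is not None and length > advertised_len:
                msgs.append((time, "w"))
                advertised_len = None
    if pending is not None:
        msgs.append((next_allowed, pending[0]))
    return msgs


# Hypothetical times and lengths; P1 to P5 get increasingly longer.
changes = [(0, "P1", 2), (5, "P2", 3), (12, "P3", 4), (20, "P4", 5), (40, "P5", 6)]
flushed = ghost_flushing_schedule(changes)
print(flushed)
```

With these inputs B stops believing in P1 at time 5 rather than at time 30, at the cost of extra withdrawal messages.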
Root Cause Notification
The node that detects the failure attaches the root cause to the message
Other nodes copy the root cause to outgoing messages
[Figure: topology with nodes A, B, C, D, E, F, G, H, I, and Z. Path labels shown in the figure: (B A), (C B A), (E D B A), (H G F E A)]
Z's candidate paths:
  () (C B A) (E D B A) (I H G F A)
Messages received by Z:
  ( ), [B A] failure
  ( ), [B A] failure
  ( ), [B A] failure
The first message is enough for Z to remove all the obsolete paths
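How a single root cause lets Z prune its table can be sketched in Python as below. This is a minimal illustration that treats the root cause as an ordered pair of adjacent ASes in the path and omits any ordering or staleness handling that a real design would need. The function name `remove_obsolete` is hypothetical.

```python
def remove_obsolete(candidates, failed_link):
    """Drop every candidate path that traverses the failed link.

    candidates:  list of AS paths, each a tuple of AS names.
    failed_link: pair of adjacent ASes, e.g. ("B", "A").
    """
    def uses(path, link):
        # Compare each adjacent pair of hops in the path with the failed link.
        return any(path[i:i + 2] == link for i in range(len(path) - 1))

    return [path for path in candidates if not uses(path, failed_link)]


# Z's candidate paths from the slide, before the first message arrives.
candidates = [
    ("B", "A"),
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]

remaining = remove_obsolete(candidates, ("B", "A"))
print(remaining)
```

Only (I H G F A) survives, so Z can skip (C B A) and (E D B A) without waiting for their withdrawals.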
Fail-down Simulation Results
Fail-down: destination becomes unreachable
[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Fail-over Simulation Results
Fail-over: nodes switch to worse paths
[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Implication: more redundancy means faster Tlong convergence
Conclusions? (Not Yet!)
Root Cause Approach is Clear Winner
  But several non-trivial deployment problems
  Not immediately clear we could standardize it.
Ghost Flushing Does Well in Fail-down
  Easily incrementally deployed
  But may not work well in Fail-over
MRAI Timer Only Leaves us with current convergence problems
  And the network is getting larger...
  And other complications in large systems...
Damping Analysis
[Figure: curves labeled simulation, calculation, and no damping]
Convergence Updates Trigger Damping Policies!
(could fix if we damped the RCN rather than just updates)
But What About Packets?
Improving packet delivery is the ultimate goal
[Figure: comparison of BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Conclusions
Root Cause Approach Adds Many Benefits
  Convergence, damping, packet delivery, diagnosis, ...
New Routing Designs Should Include RCN
  Should be a required part of new routing protocols
Can RCN Be Added to BGP?
  Not clear given existing complications
  To be continued in IRTF Routing Research Group
  Encourage interested researchers to join