Routing Convergence and the Impact of Scale
Dan Massey, Colorado State University
26 October 05 [email protected]
Internet Routing and BGP
Internet divided into Autonomous Systems
Large scale implies maintaining the entire topology at a router is not feasible.
BGP is the inter-AS routing protocol.
Router stores the AS path to a destination.
Path allows router to apply policies
How quickly does BGP converge after a change?
Can BGP continue to scale with more growth?
Do we need BGP changes or a new protocol?
BGP Path Exploration
[Figure: topology with nodes A (dest.), B, C, D, E, F, G, H, I, and Z; path labels (B A), (C B A), (E D B A), (H G F E A)]
Z's candidate paths:
() (C B A) (E D B A) (I H G F A)
() () (E D B A) (I H G F A)
() () () (I H G F A)
Obsolete paths: (C B A), (E D B A)
If Z knew [B A] failed, it could have avoided the obsolete paths
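The exploration sequence on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first candidate is taken to be (B A) from the figure labels, Z is assumed to prefer the shortest remaining path (real BGP applies policy first), withdrawals are assumed to arrive one at a time in the order shown, and the names `explore` and `uses_link` are hypothetical helpers, not part of any BGP implementation.

```python
# Z's candidate paths to destination A (first entry assumed from the figure labels).
CANDIDATES = [
    ("B", "A"),
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]


def uses_link(path, link):
    """True if the two ASes in `link` appear consecutively in `path`."""
    return any(path[i:i + 2] == tuple(link) for i in range(len(path) - 1))


def explore(candidates, withdrawal_order):
    """Return the path Z selects after each withdrawal arrives.

    Assumption: Z always picks the shortest remaining candidate.
    """
    remaining = list(candidates)
    selected = []
    for withdrawn in withdrawal_order:
        remaining.remove(withdrawn)
        selected.append(min(remaining, key=len) if remaining else None)
    return selected


# Assumed arrival order of withdrawals after link [B A] fails.
WITHDRAWALS = [("B", "A"), ("C", "B", "A"), ("E", "D", "B", "A")]

selected = explore(CANDIDATES, WITHDRAWALS)
obsolete_tried = [p for p in selected if p is not None and uses_link(p, ("B", "A"))]

print(selected)        # paths Z switches to, in order
print(obsolete_tried)  # selected paths that still cross the failed link
```

Under these assumptions Z moves through (C B A) and (E D B A), both of which cross the failed link, before settling on (I H G F A).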
Path Exploration and Policy
Internet does not select the shortest path
Policies limit the number of potential paths, especially at high-level tiers.
Example: Due to routing policy, AS-X (lower tier) sees more alternate paths than AS-Z (tier-1):
Via multiple providers
Via peers
[Figure: example topology with ASes Z, X, Y, W, P1, and P2]
Impact of Topology Growth
Denser connectivity => more alternate paths
Impact depends on policies and tier
Lower-tier nodes see more slow convergence
Beacon prefix 198.32.7.0/24
Legend: MRAI off, MRAI on

RV peer (AS#)   Jan 2, 2004          Dec 2, 2004
                #updates  #paths     #updates  #paths
1239 (tier1)    44        4          37        4
1221            62        8          87        11
2914 (tier1)    106       6          279       7
3557            102       19         198       39
Convergence Improvements
MRAI Timer (Deployed Now)
Require minimum time between updates
Typically 30 seconds
Assertion Checking (Proposed in INFOCOM 02)
Signal policy or topological failure in some cases
Discard routes that include failed subpath
Ghost Flushing (Proposed in INFOCOM 03)
When the MRAI timer delays an update, send a withdrawal
Attach Failure Notification (INFOCOM05, CompNet05)
Explicitly list the cause of the failure
MRAI Rate-Limiting Timer
Minimum Route Advertisement Interval (MRAI) timer:
Within M=30 seconds, at most one announcement from A to B
A's path changes: P1, P2, P3, P4, P5
Msgs from A to B: P1 (time=0), P4 (time=30), P5 (time=60)
Impact:
a. suppress transient changes
b. delay convergence
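The timer behaviour on this slide can be reproduced with a small simulation. It is a sketch rather than a router implementation: the change times (0, 8, 16, 24, 40) are invented for illustration, since the slide only shows the send times, and `mrai_messages` is a hypothetical helper name.

```python
def mrai_messages(changes, mrai=30):
    """Simulate A's announcements to B under an MRAI timer.

    changes: list of (time, path) tuples sorted by time, A's best-path changes.
    Returns the list of (send_time, path) announcements actually sent.
    """
    sent = []
    next_allowed = 0   # earliest time the next announcement may go out
    pending = None     # most recent path held back by the timer
    for t, path in changes:
        # The timer expired before this change: flush the held-back path first.
        if pending is not None and next_allowed <= t:
            sent.append((next_allowed, pending))
            next_allowed += mrai
            pending = None
        if t >= next_allowed:
            sent.append((t, path))      # timer idle: announce immediately
            next_allowed = t + mrai
        else:
            pending = path              # timer running: keep only the latest path
    if pending is not None:
        sent.append((next_allowed, pending))
    return sent


# Change times are assumptions chosen so P2..P4 fall inside the first interval.
changes = [(0, "P1"), (8, "P2"), (16, "P3"), (24, "P4"), (40, "P5")]
messages = mrai_messages(changes)
print(messages)
```

With these assumed change times, P2 and P3 are never announced (transient changes suppressed), while B learns about P4 and P5 only at times 30 and 60 (convergence delayed).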
MRAI and Ghost Flushing
MRAI prevents removal of stale information
Suppose P1 to P5 are increasingly worse
Neighbor believes P1 still available until time 30
A's path changes: P1, P2, P3, P4, P5
Msgs from A to B: P1 (time=0), P4 (time=30), P5 (time=60), plus withdrawals (w)
Ghost Flushing: if change to longer path and MRAI applies, send a withdraw
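A minimal sketch of the Ghost Flushing rule, extending an MRAI simulation with an immediate withdrawal. Several things here are assumptions rather than content from the slides: the change times and path lengths are invented, withdrawals are assumed not to be rate-limited by the timer, a withdrawal is only sent if B still holds a shorter advertised path, and `ghost_flushing_messages` is a hypothetical name.

```python
def ghost_flushing_messages(changes, mrai=30):
    """Simulate A's messages to B with MRAI plus Ghost Flushing.

    changes: list of (time, path_name, path_length) sorted by time.
    Returns a list of (time, kind, path_name); kind is "announce" or "withdraw".
    """
    sent = []
    next_allowed = 0        # earliest time the next announcement may go out
    pending = None          # (name, length) held back by the timer
    advertised_len = None   # length of the path B currently holds, None if withdrawn
    for t, name, length in changes:
        # The timer expired before this change: flush the held-back path first.
        if pending is not None and next_allowed <= t:
            sent.append((next_allowed, "announce", pending[0]))
            advertised_len = pending[1]
            next_allowed += mrai
            pending = None
        if t >= next_allowed:
            sent.append((t, "announce", name))
            advertised_len = length
            next_allowed = t + mrai
        else:
            pending = (name, length)
            # Ghost Flushing: the new path is longer and the timer blocks the
            # announcement, so flush B's stale path with a withdrawal.
            if advertised_len is not None and length > advertised_len:
                sent.append((t, "withdraw", None))
                advertised_len = None
    if pending is not None:
        sent.append((next_allowed, "announce", pending[0]))
    return sent


# Assumed change times and lengths: P1 to P5 are increasingly worse (longer).
changes = [(0, "P1", 2), (8, "P2", 3), (16, "P3", 4), (24, "P4", 5), (40, "P5", 6)]
messages = ghost_flushing_messages(changes)
for message in messages:
    print(message)
```

In this run B stops believing in P1 at time 8 instead of time 30, and stops believing in P4 at time 40 instead of time 60; the announcements themselves are still spaced by the MRAI timer.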
Root Cause Notification
The node that detects the failure attaches the root cause to its message
Other nodes copy the root cause to outgoing messages
[Figure: topology with nodes A, B, C, D, E, F, G, H, I, and Z; path labels (B A), (C B A), (E D B A), (H G F E A)]
Z's candidate paths: () (C B A) (E D B A) (I H G F A)
Each message Z receives carries: ( ), [B A] failure
The first message is enough for Z to remove all the obsolete paths
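The effect of a root cause notification on Z's candidate set can be sketched as a single filtering step. The function names `uses_link` and `prune_by_root_cause` are hypothetical, and the sketch ignores how the root cause is encoded or propagated; it only shows why one message carrying "[B A] failure" suffices.

```python
def uses_link(path, link):
    """True if the two ASes in `link` appear consecutively in `path`."""
    return any(path[i:i + 2] == tuple(link) for i in range(len(path) - 1))


def prune_by_root_cause(candidates, failed_link):
    """Drop every candidate path that crosses the failed link."""
    return [p for p in candidates if not uses_link(p, failed_link)]


# Z's candidates when the first withdrawal arrives (from the slide).
rcn_candidates = [
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]

# The first message carries the root cause: link [B A] failed.
surviving = prune_by_root_cause(rcn_candidates, ("B", "A"))
print(surviving)
```

Only (I H G F A) survives, so Z never selects (C B A) or (E D B A).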
Fail-down Simulation Results
Fail-down: destination becomes unreachable
[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Fail-over Simulation Results
Fail-over: nodes switch to worse paths
[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Implication: more redundancy means faster Tlong convergence
Conclusions? (Not Yet!)
Root Cause Approach is Clear Winner
But several non-trivial deployment problems
Not immediately clear we could standardize it.
Ghost-Flushing Does Well in Fail-down
Easily incrementally deployed
But may not work well in Fail-over
MRAI Timer Only Leaves us with current convergence problems
And the network is getting larger...
And other complications in large systems...
Damping Analysis
[Figure: damping analysis with curves for simulation, calculation, and no damping]
Convergence Updates Trigger Damping Policies!
(could fix if we damped the RCN rather than just updates)
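To illustrate why convergence updates can trigger damping, here is a rough sketch of a flap-damping penalty with exponential decay. None of the numbers come from the slides: the penalty of 1000 per update, the 900-second half-life, and the suppress threshold of 2000 are illustrative assumptions, and `first_suppression_time` is a hypothetical helper, not any router's API.

```python
def first_suppression_time(update_times, penalty_per_update=1000.0,
                           half_life=900.0, suppress_threshold=2000.0):
    """Return the time at which the route would first be suppressed, or None.

    The penalty decays exponentially between updates and grows by a fixed
    amount on every update (all parameter values are illustrative).
    """
    penalty = 0.0
    last_time = None
    for t in update_times:
        if last_time is not None:
            # Halve the penalty for every half_life seconds elapsed.
            penalty *= 0.5 ** ((t - last_time) / half_life)
        penalty += penalty_per_update
        last_time = t
        if penalty > suppress_threshold:
            return t
    return None


# One real event followed by path-exploration updates spaced by a 30 s MRAI.
exploration_updates = [0, 30, 60, 90]
suppressed_at = first_suppression_time(exploration_updates)
print(suppressed_at)

# A single update with no exploration never reaches the threshold.
print(first_suppression_time([0]))
```

Under these assumed parameters, a single topology change is penalized as if the route had flapped repeatedly, because every path-exploration update adds to the penalty.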
But What About Packets?
Improving packet delivery is the ultimate goal
[Figure: packet delivery comparison of BGP, Assertion, Ghost Flushing, and Root Cause Notification]
Conclusions
Root Cause Approach Adds Many Benefits
Convergence, dampening, packet delivery, diagnosis, ...
New Routing Designs Should Include RCN
Should be a required part of new routing protocols
Can RCN Be Added to BGP?
Not clear given existing complications
To be continued in IRTF Routing Research Group
Encourage interested researchers to join