Routing Convergence and the Impact of Scale Dan Massey Colorado State University.

Routing Convergence and the Impact of Scale

Dan Massey, Colorado State University

26 October 05 [email protected]

Internet Routing and BGP

Internet divided into Autonomous Systems
- Large scale implies that maintaining the entire topology at a router is not feasible.

BGP is the inter-AS routing protocol.
- Router stores the AS path to a destination.
- Path allows router to apply policies.

How quickly does BGP converge after a change?

Can BGP continue to scale with more growth?
- Do we need BGP changes or a new protocol?

BGP Path Exploration

[Figure: topology with nodes A, B, C, D, E, F, G, H, I, and Z and a "dest." label; path labels (B A) (C B A) (E D B A) (H G F E A); withdrawals "( )" sent between nodes.]

Z’s Candidate paths:

() (C B A) (E D B A) (I H G F A)

() () (E D B A) (I H G F A)

() () () (I H G F A)

Obsolete paths: (C B A), (E D B A)

If Z knew [B A] failed, it could have avoided the obsolete paths.
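The walk through Z's candidate paths on this slide can be replayed in a few lines. This is a minimal Python sketch, not BGP itself: it assumes that Z always prefers the shortest remaining path and that the withdrawals from B, C, and E arrive one at a time in that order. The names `candidates` and `explore` are hypothetical.

```python
# Candidate paths held by Z, keyed by the neighbor that announced them
# (paths taken from the slide).
candidates = {
    "B": ("B", "A"),
    "C": ("C", "B", "A"),
    "E": ("E", "D", "B", "A"),
    "I": ("I", "H", "G", "F", "A"),
}

def explore(candidates, withdrawal_order):
    """Return the path Z selects after each withdrawal.

    Assumption: Z prefers the shortest remaining path and learns of each
    withdrawal separately, with no hint about which link failed.
    """
    remaining = dict(candidates)
    history = []
    for neighbor in withdrawal_order:
        remaining.pop(neighbor, None)
        best = min(remaining.values(), key=len) if remaining else None
        history.append(best)
    return history

# Withdrawals arrive from B, then C, then E (assumed order).
history = explore(candidates, ["B", "C", "E"])
for step, best in enumerate(history, start=1):
    print(step, best)
```

Under these assumptions Z selects (C B A) and then (E D B A), both of which already depend on the failed link, before settling on (I H G F A).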

Path Exploration and Policy

Internet does not select the shortest path

Policies limit the number of potential paths, especially at high-level tiers.

Example: due to routing policy, AS-X (lower tier) sees more alternate paths than AS-Z (tier-1).
- Via multiple providers
- Via peers

[Figure: example topology with nodes Z, X, Y, W, P1, and P2.]

Impact of Topology Growth

Denser connectivity => more alternate paths

Impact depends on policies and tier
- Lower-tier nodes see more slow convergence

MRAI off, MRAI on

Beacon prefix 198.32.7.0/24

RV peer (AS#)    Jan 2, 2004            Dec 2, 2004
                 #updates   #paths      #updates   #paths
1239 (tier1)     44         4           37         4
1221             62         8           87         11
2914 (tier1)     106        6           279        7
3557             102        19          198        39

Convergence Improvements

MRAI Timer (Deployed Now)
- Require minimum time between updates
- Typically 30 seconds

Assertion Checking (Proposed in INFOCOM 02)
- Signal policy or topological failure in some cases
- Discard routes that include failed subpath

Ghost Flushing (Proposed in INFOCOM 03)
- When the MRAI timer delays an update, send a withdrawal

Attach Failure Notification (INFOCOM05, CompNet05)
- Explicitly list the cause of the failure

MRAI Rate-Limiting Timer

Minimum Route Advertisement Interval (MRAI) timer:

Within M=30 seconds, at most one announcement from A to B

A’s path changes: P1, P2, P3, P4, P5

Msgs from A to B: P1 (time=0), P4 (time=30), P5 (time=60)

Impact:

a. suppress transient changes

b. delay convergence
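The timeline on this slide can be reproduced with a small simulation. The following is a sketch under stated assumptions, not a router implementation: the function name `mrai_messages` is hypothetical, and the times at which P2 through P5 occur (5, 12, 20, and 45 seconds) are made up for illustration, since the slide only gives the send times.

```python
def mrai_messages(path_changes, mrai=30):
    """Return the (send_time, path) announcements sent to one neighbor.

    path_changes is a list of (time, path) pairs sorted by time.
    Model (an assumption): the first change is announced at once and
    starts the timer; while the timer runs, only the latest path is
    remembered and it is announced when the timer expires.
    """
    sent = []
    timer_expiry = None  # None means the timer is not running
    pending = None       # latest path not yet announced
    for time, path in path_changes:
        # Fire every timer expiry that happens at or before this change.
        while timer_expiry is not None and timer_expiry <= time:
            if pending is not None:
                sent.append((timer_expiry, pending))
                pending = None
                timer_expiry += mrai
            else:
                timer_expiry = None
        if timer_expiry is None:
            sent.append((time, path))
            timer_expiry = time + mrai
        else:
            pending = path  # overwrites any earlier suppressed path
    if pending is not None:
        sent.append((timer_expiry, pending))
    return sent

# Change times after t=0 are illustrative assumptions.
changes = [(0, "P1"), (5, "P2"), (12, "P3"), (20, "P4"), (45, "P5")]
messages = mrai_messages(changes)
print(messages)
```

In this run P2 and P3 are never announced (transient changes are suppressed), while P4 and P5 each wait for the timer (convergence is delayed).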

MRAI and Ghost Flushing

MRAI prevents removal of stale information
- Suppose P1 to P5 are increasingly worse
- Neighbor believes P1 is still available until time 30

A’s path changes: P1, P2, P3, P4, P5

Msgs from A to B: P1, P4, P5, plus withdrawals (w); time marks at 0, 30, and 60

Ghost Flushing: if the path changes to a longer path and the MRAI timer applies, send a withdrawal
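The ghost flushing rule can be sketched by adding one branch to an MRAI model. This is an illustrative Python sketch, not the published algorithm in full: it assumes paths are tuples whose length measures how bad they are, that a withdrawal may be sent immediately while the timer is running, and that one withdrawal is enough until the next announcement. The names `ghost_flushing_messages` and `WITHDRAW`, the example paths, and the change times are all hypothetical.

```python
WITHDRAW = "withdraw"  # marker for a withdrawal message

def ghost_flushing_messages(path_changes, mrai=30):
    """Return (send_time, message) pairs sent to one neighbor.

    path_changes is a list of (time, as_path) pairs sorted by time,
    where as_path is a tuple of node names.
    """
    sent = []
    timer_expiry = None  # None means the MRAI timer is not running
    pending = None       # latest path not yet announced
    advertised = None    # path the neighbor currently believes in
    for time, path in path_changes:
        # Fire every timer expiry that happens at or before this change.
        while timer_expiry is not None and timer_expiry <= time:
            if pending is not None:
                sent.append((timer_expiry, pending))
                advertised, pending = pending, None
                timer_expiry += mrai
            else:
                timer_expiry = None
        if timer_expiry is None:
            sent.append((time, path))
            advertised = path
            timer_expiry = time + mrai
        else:
            pending = path
            # Ghost flushing: the announcement is delayed and the new
            # path is longer, so flush the stale path with a withdrawal.
            if advertised is not None and len(path) > len(advertised):
                sent.append((time, WITHDRAW))
                advertised = None
    if pending is not None:
        sent.append((timer_expiry, pending))
    return sent

# Five increasingly long paths; node names and times are made up.
p1, p2, p3 = ("A", "D"), ("A", "X", "D"), ("A", "X", "Y", "D")
p4, p5 = ("A", "X", "Y", "W", "D"), ("A", "X", "Y", "W", "V", "D")
flushed = ghost_flushing_messages([(0, p1), (5, p2), (12, p3), (20, p4), (45, p5)])
print(flushed)
```

With these inputs the neighbor stops believing in P1 at time 5 instead of time 30, at the cost of extra withdrawal messages.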

Root Cause Notification

The node that detects the failure attaches the root cause to its message.

Other nodes copy the root cause to their outgoing messages.

[Figure: topology with nodes A, B, C, D, E, F, G, H, I, and Z; path labels (B A) (C B A) (E D B A) (H G F E A); each withdrawal is labeled "( ), [B A] failure".]

Z’s Candidate paths:

() (C B A) (E D B A) (I H G F A)

The first message is enough for Z to remove all the obsolete paths.
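The pruning step that root cause notification enables can be written directly. This is a minimal Python sketch assuming the root cause is a single failed link given as an ordered pair of adjacent nodes; the helper names `contains_link` and `prune_with_root_cause` are hypothetical.

```python
def contains_link(path, link):
    """True if the two nodes of `link` appear next to each other in `path`."""
    return any(path[i:i + 2] == link for i in range(len(path) - 1))

def prune_with_root_cause(candidates, failed_link):
    """Drop every candidate path that traverses the failed link."""
    return [p for p in candidates if not contains_link(p, failed_link)]

# Z's candidates after the first withdrawal, as listed on the slide.
candidates = [
    ("C", "B", "A"),
    ("E", "D", "B", "A"),
    ("I", "H", "G", "F", "A"),
]
remaining = prune_with_root_cause(candidates, ("B", "A"))
print(remaining)  # [('I', 'H', 'G', 'F', 'A')]
```

Both obsolete paths are discarded as soon as the first message carrying "[B A] failure" arrives, so Z moves straight to (I H G F A).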

Fail-down Simulation Results

Fail-down: destination becomes unreachable

[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification.]

Fail-over Simulation Results

Fail-over: nodes switch to worse paths

[Figure: simulation results comparing BGP, Assertion, Ghost Flushing, and Root Cause Notification.]

Implication: more redundancy means faster Tlong convergence

Conclusions? (Not Yet!)

Root Cause Approach is Clear Winner
- But several non-trivial deployment problems
- Not immediately clear we could standardize it.

Ghost-Flushing Does Well in Fail-down
- Easily incrementally deployed
- But may not work well in Fail-over

MRAI Timer Only Leaves Us with Current Convergence Problems
- And the network is getting larger...
- And other complications in large systems...

Damping Analysis

[Figure: plot with curves labeled simulation, calculation, and no damping.]

Convergence Updates Trigger Damping Policies!

(could fix if we damped the RCN rather than just updates)
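A small calculation illustrates how the updates from a single convergence event can trip route flap damping. This is a sketch with illustrative parameters that are not taken from the slides: a penalty of 1000 per update, a half-life of 900 seconds, and a suppress threshold of 2000 are assumptions, as is the choice to charge every update the same penalty. The name `damping_penalty` is hypothetical.

```python
def damping_penalty(update_times, penalty_per_update=1000.0, half_life=900.0):
    """Return (time, penalty) just after each update.

    The penalty decays exponentially between updates, halving every
    `half_life` seconds, and grows by a fixed amount per update.
    All parameter values are illustrative assumptions.
    """
    penalty = 0.0
    last = None
    trace = []
    for t in update_times:
        if last is not None:
            penalty *= 0.5 ** ((t - last) / half_life)
        penalty += penalty_per_update
        last = t
        trace.append((t, penalty))
    return trace

SUPPRESS_THRESHOLD = 2000.0  # assumed value

# Three updates from one failure, spaced by a 30-second MRAI timer.
trace = damping_penalty([0, 30, 60])
suppressed = any(p > SUPPRESS_THRESHOLD for _, p in trace)
for t, p in trace:
    print(t, round(p, 1))
print("suppressed:", suppressed)
```

Under these assumptions the third update pushes the penalty past the threshold, so a route that failed only once is treated as if it were flapping.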

But What About Packets?

Improving packet delivery is the ultimate goal

[Figure: comparison of BGP, Assertion, Ghost Flushing, and Root Cause Notification.]

Conclusions

Root Cause Approach Adds Many Benefits
- Convergence, dampening, packet delivery, diagnosis, ...

New Routing Designs Should Include RCN
- Should be a required part of new routing protocols

Can RCN Be Added to BGP?
- Not clear given existing complications
- To be continued in IRTF Routing Research Group
- Encourage interested researchers to join