Network Sensitivity to Hot-Potato Disruptions. Renata Teixeira (UC San Diego), http://www-cse.ucsd.edu/~teixeira, with Aman Shaikh (AT&T), Tim Griffin (Intel), and Geoff Voelker (UCSD). SIGCOMM’04 – Portland, OR

Network Sensitivity to Hot-Potato Disruptions

Renata Teixeira

(UC San Diego) http://www-cse.ucsd.edu/~teixeira

with

Aman Shaikh (AT&T), Tim Griffin (Intel), and Geoff Voelker (UCSD)

SIGCOMM’04 – Portland, OR


Internet Routing Architecture

[Figure: ASes UCSD, Sprint, AT&T, Verio, and AOL connecting a User and a Web Server; interdomain routing (BGP) between ASes, intradomain routing (OSPF, IS-IS) within each AS]

End-to-end performance depends on all ASes along the path

Changes in one AS may impact traffic and routing in other ASes


Hot-Potato Routing

Hot-potato routing = route to the closest egress point when there is more than one route to the destination

- All traffic from customers to peers
- All traffic to customer prefixes with multiple connections

[Figure: ISP network with routers in San Francisco, Dallas, and New York; distances 9 and 10 toward the two egress points for dst; multiple connections to the same peer]


Hot-Potato Disruption

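[Figure: the same ISP network (San Francisco, Dallas, New York, dst); after an internal change the distance of 9 becomes 11, while the other distance stays at 10]

Causes of the internal change:

- failure
- planned maintenance
- traffic engineering

Routes to thousands of destinations switch exit point!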
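The egress flip on this slide can be sketched in a few lines. This is a minimal illustration, assuming the labels 9, 10, and 11 in the figure are internal distances from one ingress router to two egress points. The assignment of those distances to San Francisco and New York and the function name `closest_egress` are hypothetical.

```python
def closest_egress(igp_distance):
    """Return the egress with the smallest internal distance (hot-potato choice).

    Ties are broken by egress name; that tie-break is an assumption of this sketch.
    """
    return min(sorted(igp_distance), key=lambda egress: igp_distance[egress])


# Hypothetical assignment of the figure's distances to egress points.
before = {"San Francisco": 9, "New York": 10}
after = {"San Francisco": 11, "New York": 10}  # internal change: 9 becomes 11

print(closest_egress(before))
print(closest_egress(after))
```

Every destination prefix reachable through both egress points follows the same comparison, which is why a single internal change can move many routes at once.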


Consequences of Hot-Potato Disruptions

Transient forwarding instability
- Up to three minutes convergence delay
- Normal internal changes take a couple of seconds

Traffic shift
- Responsible for largest traffic matrix variations

Interdomain routing changes
- Around 2–5% of a router’s external BGP updates


What to do about it?

Engineer network to minimize disruptions
- Network operator: operational practices to avoid changes
- Network designer: designs that minimize sensitivity

Need a vocabulary and metrics to evaluate impact of internal changes
- Compare possible network designs
- Identify critical events
- Take special care during maintenance or traffic engineering


Modeling Hot-Potato Routing

Model of egress selection in backbone networks
- Internal topology and link weights
- Set of egress routers for each destination prefix

Apply topology changes
- Link or router failures
- Link weight changes

Evaluate impact of topology changes
- For a router, what fraction of prefixes shifts
- Most critical link failure
- …


Modeling Egress Selection

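[Figure: example graph with border nodes A and B, internal nodes C, D, E, F, and G, weighted links, and destination dst; the nodes are partitioned into the region of A and the region of B]

Egress set for a destination prefix (dst) = set of border nodes that learn routes to dst ({A,B})

Region of egress node A = nodes that are closer to A than B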
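The region definition can be made concrete with a shortest-path computation. The following is a minimal sketch, not the authors' implementation. The link weights are hypothetical (the weights in the slide's figure did not survive extraction), and breaking ties by egress name is an assumption, since the slide only defines the strictly-closer case.

```python
import heapq


def undirected(edges):
    """Build {node: {neighbor: weight}} from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_distances(graph, source):
    """Dijkstra: distance from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def regions(graph, egress_set):
    """Map each egress node to the set of nodes closest to it."""
    dist = {e: shortest_distances(graph, e) for e in egress_set}
    result = {e: set() for e in egress_set}
    for node in graph:
        reachable = [e for e in sorted(egress_set) if node in dist[e]]
        if reachable:
            best = min(reachable, key=lambda e: dist[e][node])
            result[best].add(node)
    return result


# Hypothetical weights on a graph with the slide's node names.
example = undirected([
    ("A", "C", 4), ("A", "D", 5), ("C", "D", 3), ("C", "E", 9),
    ("B", "E", 4), ("B", "F", 6), ("E", "F", 8), ("D", "G", 10), ("G", "F", 3),
])
print(regions(example, {"A", "B"}))
```

With these assumed weights, C and D fall in the region of A, and E, F, and G fall in the region of B.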


Modeling Topology Changes

Topology change = edge or node deletion, link weight change

[Figure: the example graph (A, B, C, D, E, F, G, dst) before and after a topology change, with the region of A and the region of B marked in each; C shifts from the region of A to the region of B]


Generalizing to All Prefixes

Routing-shift function (H_RM): fraction of prefixes at a router that change egresses after a single topology change

[Figure: the example graph with border nodes A and B and prefix sets X (10,000 prefixes), Y (1,000 prefixes), and Z (4,000 prefixes)]

Routing shift at C when CF is deleted = 10,000/15,000 (i.e., 2/3)
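The 2/3 figure can be reproduced with a small sketch of the routing-shift computation. The topology, link weights, and the egress set assigned to each of X, Y, and Z below are hypothetical, chosen only so that deleting link CF moves the 10,000 prefixes of X. Only the prefix counts come from the slide.

```python
import heapq
from fractions import Fraction


def distances_from(graph, source):
    """Dijkstra over {node: {neighbor: weight}}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def closest_egress(graph, node, egress_set):
    """Hot-potato choice at node; ties broken by name (an assumption)."""
    dist = distances_from(graph, node)
    reachable = sorted(e for e in egress_set if e in dist)
    return min(reachable, key=lambda e: dist[e]) if reachable else None


def without_link(graph, u, v):
    """Copy of graph with the undirected link u-v deleted."""
    return {n: {m: w for m, w in nbrs.items() if {n, m} != {u, v}}
            for n, nbrs in graph.items()}


def routing_shift(before, after, node, prefix_groups):
    """Fraction of prefixes at node whose egress differs between the two graphs."""
    total = sum(count for _, count in prefix_groups)
    shifted = sum(count for egresses, count in prefix_groups
                  if closest_egress(before, node, egresses)
                  != closest_egress(after, node, egresses))
    return Fraction(shifted, total)


# Hypothetical topology and egress sets; prefix counts are from the slide.
graph = {
    "A": {"C": 9, "D": 5}, "B": {"F": 4, "D": 6}, "C": {"A": 9, "F": 3},
    "D": {"A": 5, "B": 6}, "F": {"C": 3, "B": 4},
}
groups = [({"A", "B"}, 10_000),  # X, assumed learned at both A and B
          ({"A"}, 1_000),        # Y, assumed learned only at A
          ({"B"}, 4_000)]        # Z, assumed learned only at B

print(routing_shift(graph, without_link(graph, "C", "F"), "C", groups))  # prints 2/3
```

Prefixes with a single egress cannot shift, so under these assumptions only X contributes to the numerator.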


All Prefixes, Routers, and Topology Changes

[Figure: the routing-shift function as a matrix with routers on one axis and topology changes on the other; the entry for router C and the failure of CF is the fraction of prefixes at C that change egress after the failure of link CF: 2/3]


Node Routing Sensitivity Metrics (RM)

[Figure: routers × topology changes matrix, with router C marked]

Node routing sensitivity
- Expected fraction of route shifts experienced by a node

Worst case
- Maximum route shift experienced by a node
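Both node metrics reduce to an expectation and a maximum over the topology changes for a fixed router. A minimal sketch follows, assuming the routing-shift values are stored as `shift[change][node]`. All values except the 2/3 entry for router C and the failure of CF are hypothetical, as are the function names and the uniform probabilities.

```python
from fractions import Fraction


def node_routing_sensitivity(shift, probability, node):
    """Expected fraction of route shifts at node over all topology changes."""
    return sum(probability[change] * shift[change][node] for change in shift)


def worst_case_node_sensitivity(shift, node):
    """Largest route shift at node caused by any single topology change."""
    return max(shift[change][node] for change in shift)


# Hypothetical routing-shift values; only C under "fail CF" is from the slides.
shift = {
    "fail CF": {"C": Fraction(2, 3), "D": Fraction(0)},
    "fail AD": {"C": Fraction(0), "D": Fraction(1, 2)},
    "fail DG": {"C": Fraction(0), "D": Fraction(0)},
}
uniform = {change: Fraction(1, len(shift)) for change in shift}

print(node_routing_sensitivity(shift, uniform, "C"))  # prints 2/9
print(worst_case_node_sensitivity(shift, "C"))        # prints 2/3
```

The gap between the two numbers is the point of reporting both: a router can see few shifts on average and still be badly hit by one particular change.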


Routing Impact of a Graph Transformation (RM)

Impact of graph transformations
- Average fraction of route shifts across all nodes

Worst case
- Maximum route shift caused by each graph transformation

[Figure: routers × topology changes matrix, with the failure of CF marked]
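The impact metrics are the same reduction taken along the other axis: fix one graph transformation and aggregate over routers. A minimal sketch, assuming routing-shift values stored as `shift[change][node]`. The values for D and E and the function names are hypothetical.

```python
from fractions import Fraction


def routing_impact(shift, change):
    """Average fraction of route shifts across all nodes for one transformation."""
    per_node = shift[change]
    return sum(per_node.values()) / len(per_node)


def worst_case_impact(shift, change):
    """Largest route shift that one transformation causes at any node."""
    return max(shift[change].values())


# Hypothetical values; only C under "fail CF" comes from the slides.
shift = {
    "fail CF": {"C": Fraction(2, 3), "D": Fraction(0), "E": Fraction(1, 3)},
    "fail AD": {"C": Fraction(0), "D": Fraction(0), "E": Fraction(0)},
}

print(routing_impact(shift, "fail CF"))     # prints 1/3
print(worst_case_impact(shift, "fail CF"))  # prints 2/3
```

Sorting the transformations by `routing_impact` gives the kind of ordering used to find the most disruptive failures.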


Case Study: A Large ISP Backbone Network

Obtaining input for the model
- Topology – intradomain routing messages
- Egress sets – collection of BGP tables
- Set of graph transformations
  • Single link failures
  • Single router failures
- Probability distribution for graph transformations
  • Uniform
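The set of graph transformations and its uniform distribution can be enumerated mechanically from the topology. This is a minimal sketch on a hypothetical three-node graph. The function names are illustrative, and applying the uniform distribution to link failures and router failures as separate sets is an assumption.

```python
from fractions import Fraction


def single_link_failures(graph):
    """Yield (label, graph without that link) for every undirected link."""
    links = sorted({tuple(sorted((u, v))) for u in graph for v in graph[u]})
    for u, v in links:
        pruned = {n: {m: w for m, w in nbrs.items() if {n, m} != {u, v}}
                  for n, nbrs in graph.items()}
        yield f"link {u}-{v}", pruned


def single_router_failures(graph):
    """Yield (label, graph without that router and its links) for every router."""
    for router in sorted(graph):
        pruned = {n: {m: w for m, w in nbrs.items() if m != router}
                  for n, nbrs in graph.items() if n != router}
        yield f"router {router}", pruned


def uniform_distribution(labels):
    """Assign equal probability to every transformation label."""
    labels = list(labels)
    return {label: Fraction(1, len(labels)) for label in labels}


# Hypothetical topology: {node: {neighbor: weight}}.
graph = {"A": {"B": 1, "C": 3}, "B": {"A": 1, "C": 2}, "C": {"A": 3, "B": 2}}

link_failures = dict(single_link_failures(graph))
router_failures = dict(single_router_failures(graph))
print(uniform_distribution(link_failures))
```

Each pruned graph can then be fed to an egress-selection model to fill in one column of the routing-shift matrix.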


Which failures are most disruptive?

[Figure: routers × single router failures matrix; order failures according to average impact]

[Plot: routing impact of failures (y-axis) vs. fraction of failures (x-axis), with curves for router failures and link failures]

- Most failures cause no hot-potato disruptions
- Operators can focus on most disruptive failures


Which routers are most sensitive?

[Figure: routers × single router failures matrix; order routers according to average sensitivity]

[Plot: node routing sensitivity (y-axis) vs. fraction of routers (x-axis), with curves for router failures and link failures]

- Very few hot-potato changes on average, but there are many failures that cause no shift
- High variance among routers


What is the largest routing shift for each router?

[Figure: routers × (single router failures or single link failures) matrix; order routers according to worst case sensitivity]

[Plot: worst case node routing sensitivity (y-axis) vs. fraction of routers (x-axis)]

- Very disruptive failures for some routers


Conclusion

Contributions
- Model of hot-potato disruptions
- Basis for a sensitivity analysis tool

Robustness should be a first-order metric
- As important as traditional performance metrics
- Network should have small reactions to small changes

Two approaches
- Engineer the system: our model
- Redesign routing interaction: on-going work


Single Link vs. Single Router Failures

[Figure: example graph with nodes A, B, C, D, and E, destination dst, and link weights 10, 10, 10, 20, 20, and 1000]



Minimizing Disruptions

[Figure: example network with weighted links]

- Reconfiguration of routing protocols
- Link and node redundancy
- Selection of peering locations