Understanding and Limiting BGP Instabilities

Zhi-Li Zhang (zhzhang@cs.umn.edu)

Jaideep Chandrashekar (jaideepc@cs.umn.edu), Kuai Xu (kxu@cs.umn.edu)

BGP: Internet Glue

“Path-vector” routing protocol.

Allows networks to tell other networks about destinations that they are “responsible” for and how to reach them, using “route advertisements”, also called “NLRI” or “network-layer reachability information”.

BGP: Internet Glue (cont’d)

Policy-based: allows ISPs to richly express their routing policy, both in selecting outbound paths and in announcing internal routes.

Relatively “simple” protocol, but configuration is complex, and the entire world can see, and be impacted by, misconfigurations.

ASes & AS Numbers (ASNs)

• An autonomous system (AS) is an independent routing domain that has been assigned an Autonomous System Number (ASN).

• Currently over 15,000 in use.
• 64512 through 65535 are “private”.
• Examples:
  • AS57 U of Minnesota GigaPoP
  • AS217 U of Minnesota
  • AS701 UUNET
  • AS1239 Sprint

• ASNs represent atoms of BGP routing policy.

Internet Connectivity of University of Minnesota

[Figure: AS-level connectivity of the University of Minnesota. ASes shown: AS 1 (Genuity), AS 57 (UMN GigaPoP), AS 7911 (Wiltel), AS 11537 (Internet2), AS 217 (UMN), AS 1998 (State of Minnesota). Prefix shown: 128.101.0.0/16.]

Architecture of Internet Routing

EGP = Exterior Gateway Protocol

IGP = Interior Gateway Protocol

Metric based: OSPF, IS-IS, RIP

Policy based: BGP

[Figure: AS 1 and AS 2 connected by BGP, with IS-IS and OSPF labeled inside the ASes.]

Simplified BGP Operations

Establish session on TCP port 179

Exchange all active routes

Exchange incremental updates

While connection is ALIVE, exchange route UPDATE messages

[Figure: BGP session between AS1 and AS2.]

Types of BGP Messages

Open: Establish a peering session.

Keep Alive: Handshake at regular intervals.

Notification: Shuts down a peering session.

Update: Announce new routes or withdraw previously announced routes.

Announcement: prefix + attribute values
Withdrawal: prefix only
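The difference between announcements and withdrawals can be sketched with a toy routing table. This is a simplified illustration, not a real BGP implementation: the function name `apply_update` and the dictionary layout are assumptions made for the example.

```python
# Hypothetical, simplified model of UPDATE handling: a table keyed by prefix.
def apply_update(rib, announcements=(), withdrawals=()):
    """Apply one UPDATE to `rib` (dict: prefix -> attribute dict) in place."""
    for prefix in withdrawals:
        # A withdrawal names the prefix only.
        rib.pop(prefix, None)
    for prefix, attrs in announcements:
        # An announcement carries the prefix plus its attribute values.
        rib[prefix] = dict(attrs)
    return rib


rib = {}
apply_update(rib, announcements=[("135.207.0.0/16", {"AS_PATH": (7018, 6341)})])
apply_update(rib, withdrawals=["135.207.0.0/16"])
```

After the second call the table is empty again, since the withdrawal removes the previously announced prefix.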

BGP Attributes

Value  Code                      Reference
-----  ------------------------  ---------
1      ORIGIN                    [RFC1771]
2      AS_PATH                   [RFC1771]
3      NEXT_HOP                  [RFC1771]
4      MULTI_EXIT_DISC           [RFC1771]
5      LOCAL_PREF                [RFC1771]
6      ATOMIC_AGGREGATE          [RFC1771]
7      AGGREGATOR                [RFC1771]
8      COMMUNITY                 [RFC1997]
9      ORIGINATOR_ID             [RFC2796]
10     CLUSTER_LIST              [RFC2796]
11     DPA                       [Chen]
12     ADVERTISER                [RFC1863]
13     RCID_PATH / CLUSTER_ID    [RFC1863]
14     MP_REACH_NLRI             [RFC2283]
15     MP_UNREACH_NLRI           [RFC2283]
16     EXTENDED COMMUNITIES      [Rosen]
...
255    reserved for development

Not all attributes need to be present in every announcement.

Two Types of BGP Neighbor Relationships

• External Neighbor (eBGP): in a different Autonomous System

• Internal Neighbor (iBGP): in the same Autonomous System

iBGP is routed (using IGP!)

[Figure: AS1 and AS2, with eBGP sessions between ASes and iBGP sessions inside an AS.]

iBGP Peers Must be Fully Meshed

iBGP neighbors do not announce routes received via iBGP to other iBGP neighbors.

[Figure: an eBGP update arrives at one router, which sends iBGP updates to the other routers in the AS.]

• iBGP is needed to avoid routing loops within an AS

• Injecting external routes into IGP does not scale and causes BGP policy information to be lost

• BGP does not provide “shortest path” routing

AS PATH Attribute

[Figure: propagation of prefix 135.207.0.0/16, originated by AS 6341 (AT&T Research). ASes shown: AS 7018 (AT&T), AS 1239 (Sprint), AS 1755 (Ebone), AS 3549 (Global Crossing), AS 1129 (Global Access), AS 12654 (RIPE NCC RIS project). AS paths shown as the prefix propagates:]

135.207.0.0/16  AS Path = 6341
135.207.0.0/16  AS Path = 7018 6341
135.207.0.0/16  AS Path = 3549 7018 6341
135.207.0.0/16  AS Path = 1239 7018 6341
135.207.0.0/16  AS Path = 1755 1239 7018 6341
135.207.0.0/16  AS Path = 1129 1755 1239 7018 6341
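The way the AS path grows in the figure can be sketched in a few lines: each AS prepends its own number before passing the route on. The helper name `advertise` is an assumption for illustration.

```python
def advertise(as_path, own_asn):
    """Return the AS path as seen by the next AS: own ASN prepended."""
    return (own_asn,) + tuple(as_path)


# Follow 135.207.0.0/16 from its origin (AS 6341) along one branch of the figure.
path = ()
for asn in (6341, 7018, 1239, 1755, 1129):
    path = advertise(path, asn)

print(" ".join(str(a) for a in path))  # 1129 1755 1239 7018 6341
```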

Inter-domain Loop Prevention

BGP at AS YYY will never accept a route with ASPATH containing YYY.

[Figure: AS 7018 receives 12.22.0.0/16 with ASPATH = 1 333 7018 877 from AS 1. Don’t Accept!]
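The loop-prevention rule amounts to a membership test on the received AS path. A minimal sketch, with `accepts` as a hypothetical helper name:

```python
def accepts(own_asn, as_path):
    """A BGP speaker rejects any route whose AS path already contains its own ASN."""
    return own_asn not in as_path


# The route from the figure, as received by AS 7018.
looped = (1, 333, 7018, 877)
print(accepts(7018, looped))  # False
print(accepts(1239, looped))  # True
```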

BGP Best Path Selection

Ignore if exit point unreachable
Highest local preference
Lowest AS path length
Lowest origin type
Lowest MED (with same next hop AS)
Lowest IGP cost to next hop
Lowest router ID of BGP speaker
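The selection steps can be sketched as a chain of filters, each keeping only the routes that tie on the current criterion. This is a simplified illustration under stated assumptions: the `Route` fields and helper names are hypothetical, every AS path is assumed non-empty, origin is encoded as 0 = IGP, 1 = EGP, 2 = INCOMPLETE, and real implementations add vendor-specific steps not listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    as_path: tuple       # leftmost entry is the neighbor (next hop) AS
    local_pref: int
    origin: int          # 0 = IGP, 1 = EGP, 2 = INCOMPLETE
    med: int
    igp_cost: int
    router_id: int
    reachable: bool = True


def keep_best(routes, key):
    """Keep the routes that minimize `key`."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def best_path(routes):
    cands = [r for r in routes if r.reachable]            # exit point reachable
    if not cands:
        return None
    cands = keep_best(cands, lambda r: -r.local_pref)     # highest local preference
    cands = keep_best(cands, lambda r: len(r.as_path))    # lowest AS path length
    cands = keep_best(cands, lambda r: r.origin)          # lowest origin type
    # MED is compared only among routes learned from the same neighbor AS.
    cands = [r for r in cands
             if not any(o.as_path[0] == r.as_path[0] and o.med < r.med
                        for o in cands)]
    cands = keep_best(cands, lambda r: r.igp_cost)        # lowest IGP cost to next hop
    cands = keep_best(cands, lambda r: r.router_id)       # lowest router ID
    return cands[0]
```

Because local preference is checked first, a route with a longer AS path can still win if local policy assigns it a higher preference.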

In a nutshell

BGP = Path Vector Protocol + Policies.

The path vector protocol is very simple:
Distribute reachability. Prevent loops.

All the complexity is introduced by locally administered policies:
Determine which paths are selected, and which neighbors they are exported to.

Path Exploration and Slow Convergence

What is Path Exploration?

When a link fails (or is repaired), routers “go through” a sequence of paths before selecting a “converged” path.

Results from dependencies in advertised “path vectors”.
A router’s best path is an extension of a neighbor’s best path,
which extends a best path from one of its own neighbors.
And so on…

What is Path Exploration (cont’d)

When a link fails, a set of dependent paths becomes invalid (or obsolete).

These paths are removed one by one from the system:
The router selects a path and propagates it.
Receives a withdrawal.
Selects the next best path (possibly invalid).
Receives a withdrawal, and repeats till there are no more invalid paths.
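The one-by-one removal can be sketched with a toy model of a single router. Here AS 7 holds three paths toward AS 0 that all cross the failed edge between 1 and 0, and each incoming withdrawal removes only the path learned from the neighbor that sent it. The function name, the preference order, and the order in which withdrawals arrive are assumptions for illustration.

```python
def explore(candidates, withdrawal_order):
    """Return the best path selected after each withdrawal is processed.

    candidates: dict neighbor -> AS path, in preference order.
    withdrawal_order: neighbors, in the order their withdrawals arrive.
    """
    table = dict(candidates)
    selected = []
    for neighbor in withdrawal_order:
        table.pop(neighbor, None)                   # only this neighbor's path goes
        best = next(iter(table.values()), None)     # next best, possibly invalid
        selected.append(best)
    return selected


paths_at_7 = {4: (4, 2, 1, 0), 5: (5, 2, 1, 0), 6: (6, 3, 1, 0)}
for step in explore(paths_at_7, [4, 5, 6]):
    print(step)
```

In this run the router transiently selects two paths that are already invalid before it finally has no route left.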

Path Exploration example

[Figure: example topology with nodes 0 to 9. Network in a steady state. Paths toward node 0 shown in the figure: 10, 210, 310, 4210, 6310; at node 7: 74210, 75210, 76310.]

Path Exploration Example (cont’d)

[Figure: the same topology (nodes 0 to 9) with the same paths 10, 210, 310, 4210, 6310 and 74210, 75210, 76310; node 7 additionally shows the sequence 75210, 76310, W (withdrawal).]

Path Exploration Example (cont’d)

Paths 75210 and 76310 both contain the “problem edge” 10.

2 additional messages to force 7 to flush “bad paths”.

Number of “spurious messages” increases with the “richness” of connectivity.

[Figure: a more richly connected topology (nodes 0 to 9); paths shown at node 7 include 7210, 74210, 7510, 75210, 72510, …]

Impact of Path Exploration

In general, convergence time is O(LΔ).
‘L’ is the length of the longest simple path in the network.
‘Δ’ is the time between successive announcements.

From measurements: up to 15 minutes to converge (after link failure).

Impact of Path Exploration (cont’d)

Delays a router from picking valid, alternate paths.
Have to first go through all the invalid paths.

Large scale packet losses in a short duration.
Core routers process millions of packets a second.

In the absence of path exploration, convergence time is Ω(Dh).
‘D’ is the “diameter” of the network (D << L).
‘h’ is the message processing time at a node.

Causes for Path Exploration

Invalid paths are selected, propagated, then withdrawn.
Routers waste time processing “stale information”.
This delays convergence to valid, perhaps less preferred, alternate paths.

Key Issue: How to distinguish invalid paths from “valid” paths.
Difficult in BGP: AS Paths are high level, abstract.

AS PATHS: High Level Connectivity

[Figure: ASes shown: AS 81, AS 217, AS 1239, AS 3, AS 11536.]

AS 217 and AS 3 receive the same AS PATH [11536 1239 81].

Underlying physical paths are disjoint.

Naïve Solutions Fail

TAG withdrawals:

When a router generates a withdrawal, tag it with cause/location.

[Figure: example topology (nodes 0 to 7). An edge has failed and the withdrawal is tagged WDRAW: (2,1). Paths shown at node 7: 74210, 75210, 76310.]

Naïve Solutions Fail (cont’d)

[Figure: example topology (nodes 0 to 7).]

AS Paths do not describe (or reflect) internal AS topology.

When an internal edge fails, which AS Path is affected? [10] or [210]?

Naïve Solutions Fail (cont’d)

Link between 3.2 and 6.1 fails.
6.1 generates a withdrawal and tags it with <3,6>.
Should 6.3 remove all paths containing <3,6>?

[Figure: topology with nodes 0, 1, 2, 4, 5, 7 and two multi-router ASes: AS 3 (routers 3.2, 3.3) and AS 6 (routers 6.1, 6.2, 6.3).]

EPIC --- A Simple Solution

Exploit Path dependencies to Invalidate Paths.

To avoid Path Exploration:

When a link fails, a set of dependent paths becomes invalid.

All the dependent paths must be removed from the system.

Dependent paths cannot be described using only AS Paths.

AS Paths are annotated with additional information (forward edge sequence numbers).
Can capture path dependencies.
Can distinguish valid and invalid paths.

Forward Edge Sequence Numbers

When an AS Path is advertised to an external AS neighbor, include the fesn of the “forward” external edge.

fesn = edge identifier + sequence number

[Figure: AS X and AS Y joined by edge <X,Y>.]

Forward Edge Sequence Numbers (cont’d)

Defined per destination, for every AS-AS edge.
When AS X sends a route to AS Y, the fesn (X:Y, n) is attached.
If the route already has a previously attached fesn, the new fesn is prepended to it, forming the fesnList.

[Figure: AS X, AS Y, AS Z, and AS W, with edge labels (X:Y, n) and (X:Z, m).]
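The attach-and-prepend rule can be sketched as follows. This is an illustrative model, not the authors' implementation: the `Fesn` tuple layout and the function name `export_route` are assumptions, and the sequence numbers are taken from the 4210 entry that appears in the routing table at AS 7 later in the deck.

```python
from typing import NamedTuple


class Fesn(NamedTuple):
    src: int   # AS sending the route (X)
    dst: int   # AS receiving the route (Y)
    seq: int   # sequence number of edge <X,Y> for this destination


def export_route(as_path, fesn_list, sender, receiver, seq):
    """Return (AS path, fesnList) as seen by `receiver`."""
    new_path = (sender,) + tuple(as_path)
    new_list = (Fesn(sender, receiver, seq),) + tuple(fesn_list)
    return new_path, new_list


# Route to AS 0 travelling 0 -> 1 -> 2 -> 4 -> 7.
path, fesns = (), ()
for sender, receiver, seq in [(0, 1, 7), (1, 2, 7), (2, 4, 14), (4, 7, 11)]:
    path, fesns = export_route(path, fesns, sender, receiver, seq)

print(path)   # (4, 2, 1, 0)
print([tuple(f) for f in fesns])
```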

fesn Management

When a link fails, its fesn does not change.
Same value carried in withdrawals.

When <X,Y> is repaired:
AS X increments the sequence number.
Subsequent route announcements carry the “updated” fesn.

So a larger fesn always corresponds to “newer” information.
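A minimal sketch of this bookkeeping, assuming one counter per edge and destination kept by the sending AS. The class and function names are hypothetical.

```python
class EdgeSequence:
    """Sequence number kept by AS X for one edge <X,Y> and one destination."""

    def __init__(self, seq=0):
        self.seq = seq

    def on_failure(self):
        # The fesn is unchanged on failure; this value is carried in withdrawals.
        return self.seq

    def on_repair(self):
        # The sequence number is incremented only when the edge comes back.
        self.seq += 1
        return self.seq


def is_stale(received_seq, latest_seq):
    """A smaller sequence number for the same edge is older information."""
    return received_seq < latest_seq


edge = EdgeSequence(7)
before = edge.on_failure()     # 7, carried in the withdrawal
after = edge.on_repair()       # 8, carried in later announcements
print(is_stale(before, after)) # True
```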

fesnList Propagation

[Figure: example topology (nodes 0 to 7) with a sequence number on each edge. Route announcements shown:]

[0] {(0:1, 7)}
[10] {(0:1, 7)(1:2, 7)}
[10] {(0:1, 7)(1:3, 3)}
[4210] {(0:1, 7)(1:2, 7)(2:4, 14)(4:7, 11)}

Same AS Path, distinct fesnLists

fesnList Propagation

[Figure: example topology (nodes 0 to 7).]

Routing Table at AS 7, after the routes are processed at all nodes:

4210  (4:7, 11) (2:4, 14) (1:2, 7) (0:1, 7)
5210  (5:7, 10) (2:5, 14) (1:2, 7) (0:1, 7)
6310  (6:7, 3) (3:6, 7) (1:3, 7) (0:1, 7)

Invalidating Paths upon Failure

When a router generates a withdrawal:

The fesnList of the withdrawn route (“path stem”) is attached to the withdrawal.

When a router receives a withdrawal:
1. Invalidates all routes containing the fesnList.
2. Selects a new best path.
3. If the best path has changed, it sends the new best route to its neighbors, and the withdrawal is piggybacked.
4. If there is no valid path, only the withdrawal is forwarded.
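The receive-side steps can be sketched on a small routing table. This is an illustration under assumptions: “containing the fesnList” is read here as the stem occurring as a contiguous run inside a route's fesnList, the table is a dict in preference order, and the function names are hypothetical. The table and the withdrawal use the values shown for AS 7 in the deck.

```python
def contains_stem(fesn_list, stem):
    """True if `stem` occurs as a contiguous run inside `fesn_list`."""
    n, m = len(fesn_list), len(stem)
    return any(tuple(fesn_list[i:i + m]) == tuple(stem)
               for i in range(n - m + 1))


def process_withdrawal(table, stem):
    """Invalidate every route whose fesnList contains the stem.

    table: dict AS path -> fesnList, in preference order.
    Returns (surviving routes, new best AS path or None).
    """
    valid = {p: f for p, f in table.items() if not contains_stem(f, stem)}
    best = next(iter(valid), None)
    return valid, best


table_at_7 = {
    (4, 2, 1, 0): ((4, 7, 11), (2, 4, 14), (1, 2, 7), (0, 1, 7)),
    (5, 2, 1, 0): ((5, 7, 10), (2, 5, 14), (1, 2, 7), (0, 1, 7)),
    (6, 3, 1, 0): ((6, 7, 3), (3, 6, 7), (1, 3, 7), (0, 1, 7)),
}
withdrawal = ((1, 2, 7), (0, 1, 7))

valid, best = process_withdrawal(table_at_7, withdrawal)
print(best)  # (6, 3, 1, 0)
```

A single withdrawal removes both paths through edge 1:2 at once, so the router moves directly to the path via AS 6 instead of trying the second invalid path first.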

Invalidating Paths: Example

[Figure: example topology (nodes 0 to 7). The withdrawal W: {(1:2, 7), (0:1, 7)} is shown on two edges. Path shown at node 7: 76310.]

4210  (4:7, 11) (2:4, 14) (1:2, 7) (0:1, 7)
5210  (5:7, 10) (2:5, 14) (1:2, 7) (0:1, 7)
6310  (6:7, 3) (3:6, 7) (1:3, 7) (0:1, 7)

Handling Link Repairs

When <X,Y> is repaired:

1. AS X increments the fesn for the edge

2. Generates a new route announcement to send to AS Y (reflects updated fesn)

3. At AS Y, the route is installed into routing table and a subsequent route update may be generated.

4. After all updates have been processed, every fesnList containing (X:Y, n) will reflect the updated value.

What about Multiple Edges?

Each edge is associated with a minor fesn.
Contrast with the major fesn for the “logical” AS-AS edge.

All edges between two ASes share the same major fesn, but have distinct minor fesn’s.

The minor fesn is incremented with the corresponding edge.
The major fesn is incremented only if all edges are affected.

Minor fesn’s

Minor fesn’s are only used between adjacent ASes.
All routers in AS 6 include the minor fesn in route updates.
When the updates are exported externally (to AS 7), the minor fesn is removed.

[Figure: topology with nodes 0, 1, 2, 4, 5, 7, AS 3 (routers 3.2, 3.3) and AS 6 (routers 6.1, 6.2, 6.3). The two edges between AS 3 and AS 6 are labeled 7 (11) and 7 (13): common major fesn, distinct minor fesn’s.]
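One way to picture the major/minor split is a small record per logical AS-AS edge. This is a sketch under assumptions, not the authors' data structure: the class, the method names, and the link identifiers `link_a` and `link_b` are hypothetical, and which physical link carries minor value 11 or 13 is not stated in the deck.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalEdge:
    src_as: int
    dst_as: int
    major: int = 0
    minors: dict = field(default_factory=dict)   # physical link id -> minor fesn

    def repair_link(self, link):
        """One physical link is affected: only its minor fesn is incremented."""
        self.minors[link] = self.minors.get(link, 0) + 1

    def repair_all(self):
        """All links between the two ASes are affected: increment the major fesn."""
        self.major += 1

    def internal_tag(self, link):
        """Tag used between the adjacent ASes: includes the minor fesn."""
        return (self.src_as, self.dst_as, self.major, self.minors.get(link, 0))


def strip_minor(tag):
    """Tag as exported to other ASes: the minor fesn is removed."""
    return tag[:3]


edge_3_6 = LogicalEdge(3, 6, major=7, minors={"link_a": 11, "link_b": 13})
print(edge_3_6.internal_tag("link_a"))               # (3, 6, 7, 11)
print(strip_minor(edge_3_6.internal_tag("link_b")))  # (3, 6, 7)
```

Once the minor fesn is stripped, routes learned over either physical link carry the same tag outside the two adjacent ASes.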

fesn – Key Properties

Sequence number is monotonic --- new events will have higher values.

Imposes a partial ordering on the fesnLists.
Old information can be easily detected, and discarded.

Allows compact, correct description of invalid paths, i.e., the fesnList in a withdrawal captures all obsolete paths.

EPIC Properties

No router will select an invalid path after receiving any update triggered by a single failure event.

No router will select an invalid path after receiving at least one update triggered by each of a set of multiple failure events.

Achieves optimal bounds for a path vector protocol.
Routers may still explore paths.
But these paths are all valid.

EPIC Performance (vs BGP)

                      BGP                  EPIC
Fail Down  Time       (L-2)Δ               (D-1)h
           Messages   (L-2)(|E|-1)         |E|-1
Fail Over  Time       (L’+D’-1)Δ           D’(h+Δ)
           Messages   (L’+D’-1)(|E’|-1)    (|E’|-1)D’(h+Δ)/Δ
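To get a feel for the gap between the bounds, the expressions can be evaluated for sample values. This is a sketch under assumptions: the first Time/Messages pair is read as the Fail Down case and the second as Fail Over, with the BGP expression first in each row; `E` stands for |E|; and the input values used below are made up for illustration, not measurements.

```python
def fail_down(L, D, E, delta, h):
    """Bounds when the destination becomes unreachable."""
    return {
        "bgp_time": (L - 2) * delta,
        "epic_time": (D - 1) * h,
        "bgp_messages": (L - 2) * (E - 1),
        "epic_messages": E - 1,
    }


def fail_over(L, D, E, delta, h):
    """Bounds when an alternate path exists; arguments are the primed quantities."""
    return {
        "bgp_time": (L + D - 1) * delta,
        "epic_time": D * (h + delta),
        "bgp_messages": (L + D - 1) * (E - 1),
        "epic_messages": (E - 1) * D * (h + delta) / delta,
    }


# Hypothetical inputs: longest path 10, diameter 4, 20 edges,
# 30 s between announcements, 1 s processing time per message.
down = fail_down(L=10, D=4, E=20, delta=30, h=1)
over = fail_over(L=10, D=4, E=20, delta=30, h=1)
print(down["bgp_time"], down["epic_time"])   # 240 3
print(over["bgp_time"], over["epic_time"])   # 390 124
```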

Root Cause Analysis of BGP Events

BGP Routing Dynamics

BGP routing instabilities
BGP routing suffers from many problems, e.g., misconfigurations, link failures, policy changes, slow convergence, etc.

BGP update streams are visible from all BGP-monitoring vantage points.

Open research problems
What are the common characteristics of BGP dynamics?
What are the primary causes of BGP routing dynamics?
How to visualize BGP dynamics?

BGP Routing Update (per second)
View: UMN
Time: 2003/12/07 – 2003/12/14

[Figure: time vs. number of BGP updates at prefix level, annotated with “BGP Update Burst” and “BGP Update Noise”.]

BGP Routing Update (per second) (cont.)
View: UMN
Time: 2003/12/07 – 2003/12/14

[Figure: time vs. number of BGP updates at AS level, annotated with “BGP Update Burst” and “BGP Update Noise”.]

Modeling BGP Routing Dynamics

Modeling BGP dynamics on all prefixes/ASes is challenging.
~120,000 prefixes, ~16,000 ASes

High-dimensional time-series
BGP updates are temporally and spatially correlated