Slides Selected from SIGCOMM 2001 BGP Tutorial by Tim Griffin
1 Interdomain Routing and BGP The slides here were created by Timothy G. Griffin of AT&T and...
-
Upload
bernice-russell -
Category
Documents
-
view
217 -
download
1
Transcript of 1 Interdomain Routing and BGP The slides here were created by Timothy G. Griffin of AT&T and...
1
Interdomain Routing and BGP
The slides here were created by Timothy G. Griffin of AT&T and presented at a SIGCOMM 2001 Tutorial Session.
The slides have been edited by me (D. Szajda). Any errors introduced as a result of these minor changes are my
responsibility.
2
An Introduction to Interdomain Routing
and BGP
Timothy G. Griffin
http://www.research.att.com/~griffin/interdomain.html
SIGCOMM 2001 Tutorial Session
August 28, 2001
3
Caveat
• These slides have been edited by me. Any errors introduced as a result of these minor changes are my responsibility.
4
Acknowledgements
Thanks to Jay Borkenhagen, Randy Bush, Anja Feldmann, Matt Grossglauser, Madan Musuvathi, Jennifer Rexford, Shubho Sen, and Jia Wang for many helpful comments
My opinions should not be taken torepresent AT&T policy
Errors are my own
5
BGP Generalities
• BGP is a path-vector protocol– Generally, vectoring refers to creating routes without
having global knowledge of the tolopology
• Based on neighbors reported paths, you choose preferred path and report that to neighbors
• BGP implements policy-based routing– talk of “minimum paths” is ill defined.– configure routers with route preferences
6
Policy Based Routing
• Ways to configure routers that effect which routes get computed– Route preferences: e.g. don’t use any path
going through AS X– Which destinations should not be reported
to which neighbors?– How a path should be edited when passed
to a particular neighbor• E.G. maybe add several AS numbers into path to
discourage, but not prevent, others from using path
7
Policy Based Routing
• Is there a better way? • R. Perlman supports link-state system
plus source specified routes.– Link-state gives complete information– Sources can then determine routes that
meet their specifications– Instead of influencing neighbors with
advertisements, simply drop traffic you don’t want to forward
8
Common View of the Telco Network
Brick
9
Common View of the IP Network (Layer 3)
10
What This Tutorial Is About
routing
11
GoalUnderstand how layer 3 connectivity is maintained in the global Internet
This tutorial will not say much about the applications that exploit this connectivity.
It will be restricted to IPv4 unicast routing.
• Part I : The basics of interdomain routing and BGP
• Part II : BGP in practice: Issues of Scale
12
Outline Part I
• Forwarding vs. Routing• IP addressing• Autonomous Systems (basic units of
interdomain routing)• The Border Gateway Protocol (BGP)
– BGP fundamentals – BGP route attributes– Implementing policy with BGP – A wee bit of theory
13
Outline Part II
• Scaling internal BGP• BGP table growth
– Address aggregation vs. Multihoming
• Growth in number of autonomous systems
• Dynamics of BGP– Route flapping – BGP convergence– Rates of BGP updates
14
Best Effort Connectivity
This is the fundamental service provided by Internet Service Providers (ISPs)
All other IP services depend on connectivity: DNS, email, VPNs, Web Hosting, …
IP traffic
135.207.49.8 192.0.2.153
15
Routing vs. Forwarding
R
R
RA
B
C
D
R1
R2
R3
R4 R5
ENet Nxt Hop
R4R3R3R4DirectR4
Net Nxt Hop
A B C D Edefault
R2R2DirectR5R5R2
Net Nxt Hop
A B C D Edefault
R1DirectR3R1R3R1
Default toupstreamrouter
A B C D Edefault
Forwarding: determine next hop
Routing: establish end-to-end paths
Forwarding always works
Routing can be badly broken
16
How Are Forwarding Tables Populated to implement
Routing?Statically Dynamically
Routers exchange network reachability information using ROUTING PROTOCOLS. Routers use this to compute best routes
Administrator manually configuresforwarding table entries
In practice : a mix of these.Static routing mostly at the “edge”
+ More control+ Not restricted to destination-based forwarding - Doesn’t scale- Slow to adapt to network failures
+ Can rapidly adapt to changes in network topology+ Can be made to scale well- Complex distributed algorithms- Consume CPU, Bandwidth, Memory- Debugging can be difficult- Current protocols are destination-based
17
Routers Talking to Routers
Routing info
Routing info
• Routing computation is distributed among routers within a routing domain
• Computation of best next hop based on routing information is the most CPU/memory intensive task on a router
• Routing messages are usually not routed, but exchanged via layer 2 between physically adjacent routers (internal BGP and multi-hop external BGP are exceptions)
18
Before We Go Any Further …
IP ROUTING PROTOCOLS DO NOT DYNAMICALLY ROUTE AROUND NETWORK CONGESTION
• IP traffic can be very bursty • Dynamic adjustments in routing typically
operate more slowly than fluctuations in traffic load
• Dynamically adapting routing to account for traffic load can lead to wild, unstable oscillations of routing system
19
Autonomous Routing Domains
A collection of physical networks glued togetherusing IP, that have a unified administrativerouting policy.
• Campus networks• Corporate networks• ISP Internal networks• …
20
Autonomous Systems (ASes)
An autonomous system is an autonomous routing domainthat has been assigned an Autonomous System Number (ASN).
RFC 1930: Guidelines for creation, selection, and registration of an Autonomous System
… the administration of an AS appears to other ASes to have a single coherent interior routing plan and presents a consistent picture of what networks are reachable through it.
21
AS Numbers (ASNs)ASNs are 16 bit values.
64512 through 65535 are “private”
• Genuity: 1 • MIT: 3• Harvard: 11• UC San Diego: 7377• AT&T: 7018, 6341, 5074, … • UUNET: 701, 702, 284, 12199, …• Sprint: 1239, 1240, 6211, 6242, …• …
ASNs represent units of routing policy
Currently over 11,000 in use.
22
Architecture of Dynamic Routing
AS 1
AS 2
BGP
EGP = Exterior Gateway Protocol
IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP, EIGRP (cisco)
Policy based: BGP
The Routing Domain of BGP is the entire Internet
OSPF
EIGRP
23
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next-hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP
Link State Vectoring
Technology of Distributed Routing
24
The Gang of Four
Link State Vectoring
EGP
IGP
BGP
RIPIS-IS
OSPF
25
Many Routing Processes Can Run on a Single Router
Forwarding Table
OSPFDomain
RIPDomain
BGP
OS kernel
OSPF Process
OSPF Routing tables
RIP Process
RIP Routing tables
BGP Process
BGP Routing tables
Forwarding Table Manager
26
IPv4 Addresses are 32 Bit Values
11111111 00010001 10000111 00000000
255 013417
255.17.134.0Dotted quad notation
IPv6 addresses have 128 bits
27
Classful Addresses
0nnnnnnn
10nnnnnn nnnnnnnn
nnnnnnnn nnnnnnnn110nnnnn
hhhhhhhh hhhhhhhh hhhhhhhh
hhhhhhhh hhhhhhhh
hhhhhhhh
n = network address bit h = host identifier bit
Class A
Class C
Class B
Leads to a rigid, flat, inefficient use of address space …
28
RFC 1519: Classless Inter-Domain Routing (CIDR)
IP Address : 12.4.0.0 IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
11111111 11111110 00000000 00000000
Address
Mask
for hosts Network Prefix
Use two 32 bit numbers to represent a network. Network number = IP address + Mask
Usually written as 12.4.0.0/15
29
Which IP Addresses are Covered by a Prefix?
00001100 00000100 00000000 00000000
11111111 11111110 00000000 0000000012.4.0.0/15
00001100 00000101 00001001 00010000
00001100 00000111 00001001 00010000
12.5.9.16
12.7.9.16
12.5.9.16 is covered by prefix 12.4.0.0/15
12.7.9.16 is not covered by prefix 12.4.0.0/15
30
CIDR = Hierarchy in Addressing
12.0.0.0/8
12.0.0.0/16
12.254.0.0/16
12.1.0.0/16
12.2.0.0/16
12.3.0.0/16
:::
12.253.0.0/16
12.3.0.0/2412.3.1.0/24
::
12.3.254.0/24
12.253.0.0/1912.253.32.0/1912.253.64.0/19
12.253.96.0/1912.253.128.0/1912.253.160.0/1912.253.192.0/19
:::
31
Classless Forwarding
Destination =12.5.9.16------------------------------- payload
Prefix Interface Next Hop
12.0.0.0/8 10.14.22.19 ATM 5/0/8
12.4.0.0/15
12.5.8.0/23 attached
Ethernet 0/1/3
Serial 1/0/7
10.1.3.77
IP Forwarding Table
0.0.0.0/0 10.14.11.33 ATM 5/0/9
even better
OK
better
best!
32
IP Address Allocation and Assignment: Internet Registries
IANAwww.iana.org
RFC 2050 - Internet Registry IP Allocation Guidelines RFC 1918 - Address Allocation for Private Internets RFC 1518 - An Architecture for IP Address Allocation with CIDR
ARINwww.arin.org
APNICwww.apnic.org
RIPEwww.ripe.org
Allocate to National and local registries and ISPs Addresses assigned to customers by ISPs
33
Nontransit vs. Transit ASes
ISP 1ISP 2
Nontransit ASmight be a corporateor campus network.Could be a “content provider”
NET ATraffic NEVER flows from ISP 1through NET A to ISP 2(At least not intentionally!)
IP traffic
Internet Serviceproviders (often)have transit networks
34
Selective Transit
NET BNET C
NET A provides transitbetween NET B and NET Cand between NET D and NET C
NET A
NET D
NET A DOES NOTprovide transitBetween NET D and NET B
Most transit networks transit in a selective manner…
IP traffic
35
Customers and Providers
Customer pays provider for access to the Internet
provider
customer
IP trafficprovider customer
36
Customers Don’t Always Need BGP
provider
customer
Nail up default routes 0.0.0.0/0pointing to provider.
Nail up routes 192.0.2.0/24pointing to customer
192.0.2.0/24
Static routing is the most common way of connecting anautonomous routing domain to the Internet. This helps explain why BGP is a mystery to many …
37
Customer-Provider Hierarchy
IP trafficprovider customer
38
The Peering Relationship
peer peer
customerprovider
Peers provide transit between their respective customers
Peers do not provide transit between peers
Peers (often) do not exchange $$$trafficallowed
traffic NOTallowed
39
Peering Provides Shortcuts
Peering also allows connectivity betweenthe customers of “Tier 1” providers.
peer peer
customerprovider
40
Peering Wars
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world!
Peering agreements are often confidential.
Peer Don’t Peer
41
BGP-4• BGP = Border Gateway Protocol
• Is a Policy-Based routing protocol
• Is the de facto EGP of today’s global Internet
• Relatively simple protocol, but configuration is complex and the
entire world can see, and be impacted by, your mistakes.
• 1989 : BGP-1 [RFC 1105]– Replacement for EGP (1984, RFC 904)
• 1990 : BGP-2 [RFC 1163]
• 1991 : BGP-3 [RFC 1267]
• 1995 : BGP-4 [RFC 1771] – Support for Classless Interdomain Routing (CIDR)
42
BGP Operations (Simplified)
Establish session on TCP port 179
Exchange all active routes
Exchange incremental updates
AS1
AS2
While connection is ALIVE exchangeroute UPDATE messages
BGP session
43
Four Types of BGP Messages
• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing previously announced routes.
announcement = prefix + attributes values
44
BGP Attributes
Value Code Reference----- --------------------------------- --------- 1 ORIGIN [RFC1771] 2 AS_PATH [RFC1771] 3 NEXT_HOP [RFC1771] 4 MULTI_EXIT_DISC [RFC1771] 5 LOCAL_PREF [RFC1771] 6 ATOMIC_AGGREGATE [RFC1771] 7 AGGREGATOR [RFC1771] 8 COMMUNITY [RFC1997] 9 ORIGINATOR_ID [RFC2796] 10 CLUSTER_LIST [RFC2796] 11 DPA [Chen] 12 ADVERTISER [RFC1863] 13 RCID_PATH / CLUSTER_ID [RFC1863] 14 MP_REACH_NLRI [RFC2283] 15 MP_UNREACH_NLRI [RFC2283] 16 EXTENDED COMMUNITIES [Rosen] ... 255 reserved for development
From IANA: http://www.iana.org/assignments/bgp-parameters
This tutorial will cover these attributes
Not all attributesneed to be present inevery announcement
45
Attributes are Used to Select Best Routes
192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
192.0.2.0/24pick me!
Given multipleroutes to the sameprefix, a BGP speakermust pick at mostone best route
(Note: it could reject them all!)
46
Two Types of BGP Neighbor Relationships
• External Neighbor (eBGP) in a different Autonomous System
• Internal Neighbor (iBGP) in the same Autonomous System AS1
AS2
eBGP
iBGP
iBGP is routed (using IGP!)
47
iBGP Peers Must be Fully Meshed
iBGP neighbors do not announce routes received via iBGP to other iBGPneighbors.
eBGP update
iBGP updates
• iBGP is needed to avoid routing loops within an AS
• Injecting external routes into IGP does not scale and causes BGP policy information to be lost
• BGP does not provide “shortest path” routing
• Is iBGP an IGP? NO!
48
BGP Next Hop Attribute
Every time a route announcement crosses an AS boundary, the Next Hop attribute is changed to the IP address of the border router that announced the route.
AS 6431AT&T Research
135.207.0.0/16Next Hop = 12.125.133.90
AS 7018AT&T
AS 12654RIPE NCCRIS project
12.125.133.90
135.207.0.0/16Next Hop = 12.127.0.121
12.127.0.121
49
Forwarding Table
Forwarding Table
Join EGP with IGP For Connectivity
AS 1 AS 2192.0.2.1
135.207.0.0/16
10.10.10.10
EGP
192.0.2.1135.207.0.0/16
destination next hop
10.10.10.10192.0.2.0/30
destination next hop
135.207.0.0/16Next Hop = 192.0.2.1
192.0.2.0/30
135.207.0.0/16
destination next hop
10.10.10.10
+
192.0.2.0/30 10.10.10.10
Router IP addressNetwork address
(for AS 2)
Arrows show directionof route advertisement
50
Forwarding Table
Forwarding Table
Next Hop Often Rewritten to Loopback
AS 1 AS 2192.0.2.1
135.207.0.0/16
10.10.10.10
EGP
127.22.33.44135.207.0.0/16
destination next hop
10.10.10.10127.22.33.44
destination next hop
135.207.0.0/16
destination next hop
10.10.10.10
+
127.22.33.44
10.10.10.10
127.22.33.44
135.207.0.0/16Next Hop = 192.0.2.1
135.207.0.0/16Next Hop = 127.22.33.44
Loopback refers todirecting traffic backto advertising router
51
Implementing Customer/Provider and Peer/Peer relationships
• Enforce transit relationships – Outbound route filtering
• Enforce order of route preference– provider < peer < customer
Two parts:
52
Import Routes
Frompeer
Frompeer
Fromprovider
Fromprovider
From customer
From customer
provider route customer routepeer route ISP route
53
Export Routes
Topeer
Topeer
Tocustomer
Tocustomer
Toprovider
From provider
provider route customer routepeer route ISP route
filtersblock
54
How Can Routes be Colored?BGP Communities!
A community value is 32 bits
By convention, first 16 bits is ASN indicating who is giving itan interpretation
communitynumber
Very powerful BECAUSE it has no (predefined) meaning
Community Attribute = a list of community values.(So one route can belong to multiple communities)
RFC 1997 (August 1996)
Used for signalingwithin and betweenASes
Two reserved communities
no_advertise 0xFFFFFF02: don’t pass to BGP neighbors
no_export = 0xFFFFFF01: don’t export out of AS
55
Communities Example
• 1:100– Customer routes
• 1:200– Peer routes
• 1:300– Provider Routes
• To Customers– 1:100, 1:200, 1:300
• To Peers– 1:100
• To Providers– 1:100
AS 1
Import Export
56
192.0.2.0/24
192.0.2.0/24Accidental or maliciousannouncement of your prefixcan blackhole your destinations in large part of the Internet
peer peer
customerprovider
Need Filter Here!
legitimate
not legitimate
Blackholes
57
Mars Attacks!
• 0.0.0.0/0: default• 10.0.0.0/8: private• 172.16.0.0/12: private• 192.168.0.0/16: private• 127.0.0.0/8: loopbacks• 128.0.0.0/16: IANA reserved • 192.0.2.0/24: test networks• 224.0.0.0/3: classes D and E• …..
Martian list often includes
Martian list gives addresses that shouldbe considered suspectin an advertisement(See RFC 1149)
58
Import Routes (Revisited)
Frompeer
Frompeer
Fromprovider
Fromprovider
From customer
From customer
provider route customer routepeer route ISP route
filtersmartians
xxxxxx
xxxxxx
xxxxxx
xxxxxxxxxxxx
xxxxxxxxxxxx
Customeraddress filters
cccccccccccccccccc
potentialblackhole
Martian
59
So Many Choices
Which route shouldFrank pick to 13.13.0.0./16?
AS 1
AS 2
AS 4
AS 3
13.13.0.0/16
Frank’s Internet Barn
peer peer
customerprovider
60
BGP Route Processing
Best Route Selection
Apply Import Policies
Best Route Table
Apply Export Policies
Install forwardingEntries for bestRoutes.
ReceiveBGPUpdates
BestRoutes
TransmitBGP Updates
Apply Policy =filter routes & tweak attributes
Based onAttributeValues
IP Forwarding Table
Apply Policy =filter routes & tweak attributes
Open ended programming.Constrained only by vendor configuration language
61
Tweak Tweak Tweak
• For inbound traffic– Filter outbound routes– Tweak attributes on
outbound routes in the hope of influencing your neighbor’s best route selection
• For outbound traffic– Filter inbound routes– Tweak attributes on
inbound routes to influence best route selection
outboundroutes
inboundroutes
inboundtraffic
outboundtraffic
In general, an AS has morecontrol over outbound traffic
62
Route Selection Summary
Highest Local Preference
Shortest ASPATH
Lowest MED (Multi-Exit Discriminator attribute)
i-BGP < e-BGP
Lowest IGP cost to BGP egress
Lowest router ID
traffic engineering
Enforce relationships
Throw up hands andbreak ties
63
Back to Frank …
AS 1AS 2
AS 4
AS 3
13.13.0.0/16
peer peer
customerprovider
local pref = 80
local pref = 100
local pref = 90
Higher Localpreference valuesare more preferred
Local preference only used in iBGP
64
Implementing Backup Links with Local Preference (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
AS 1
primary link backup link
Set Local Pref = 100for all routes receivedfrom AS 1 AS 65000
Set Local Pref = 50for all routes receivedfrom AS 1
We’ll talk about inbound traffic soon …
65
Multihomed Backups (Outbound Traffic)
Forces outbound traffic to take primary link, unless link is down.
AS 1
primary link backup link
Set Local Pref = 100for all routes receivedfrom AS 1
AS 2
Set Local Pref = 50for all routes receivedfrom AS 3
AS 3provider provider
66
ASPATH Attribute
AS7018135.207.0.0/16AS Path = 6341
AS 1239Sprint
AS 1755Ebone
AT&T
AS 3549Global Crossing
135.207.0.0/16AS Path = 7018 6341
135.207.0.0/16AS Path = 3549 7018 6341
AS 6341
135.207.0.0/16
AT&T Research
Prefix Originated
AS 12654RIPE NCCRIS project
AS 1129Global Access
135.207.0.0/16AS Path = 7018 6341
135.207.0.0/16AS Path = 1239 7018 6341
135.207.0.0/16AS Path = 1755 1239 7018 6341
135.207.0.0/16AS Path = 1129 1755 1239 7018 6341
67
Interdomain Loop Prevention
BGP at AS YYY will never accept a route with ASPATH containing YYY.
AS 7018
12.22.0.0/16ASPATH = 1 333 7018 877
Don’t Accept!
AS 1
68
Traffic Often Follows ASPATH
AS 4AS 3AS 2AS 1135.207.0.0/16
135.207.0.0/16ASPATH = 3 2 1
IP Packet Dest =135.207.44.66
69
… But It Might Not
AS 4AS 3AS 2AS 1135.207.0.0/16
135.207.0.0/16ASPATH = 3 2 1
IP Packet Dest =135.207.44.66
AS 5
135.207.44.0/25ASPATH = 5
135.207.44.0/25
AS 2 filters allsubnets with maskslonger than /24
135.207.0.0/16ASPATH = 1
From AS 4, it may look like thispacket will take path 3 2 1, but it actually takespath 3 2 5
70
In fairness: could you do this “right” and still scale?
Exporting internalstate would dramatically increase global instability and amount of routingstate
Shorter Doesn’t Always Mean Shorter
AS 4
AS 3
AS 2
AS 1
Mr. BGP says that path 4 1 is better than path 3 2 1
Duh!
71
Shedding Inbound Traffic with ASPATH Padding Hack
Padding will (usually) force inbound traffic from AS 1to take primary link
AS 1
192.0.2.0/24ASPATH = 2 2 2
customerAS 2
provider
192.0.2.0/24
backupprimary
192.0.2.0/24ASPATH = 2
72
Padding May Not Shut Off All Traffic
AS 1
192.0.2.0/24ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2
customerAS 2
provider
192.0.2.0/24
192.0.2.0/24ASPATH = 2
AS 3provider
AS 3 will sendtraffic on “backup”link because it prefers customer routes and localpreference is considered before ASPATH length!
Padding in this way is oftenused as a form of loadbalancing
backupprimary
73
COMMUNITY Attribute to the Rescue!
AS 1
customerAS 2
provider
192.0.2.0/24
192.0.2.0/24ASPATH = 2
AS 3provider
backupprimary
192.0.2.0/24ASPATH = 2 COMMUNITY = 3:70
Customer import policy at AS 3:If 3:90 in COMMUNITY then set local preference to 90If 3:80 in COMMUNITY then set local preference to 80If 3:70 in COMMUNITY then set local preference to 70
AS 3: normal customer local pref is 100,peer local pref is 90
74
Hot Potato Routing: Go for the Closest Egress Point
192.44.78.0/24
15 56 IGP distances
egress 1 egress 2
This Router has two BGP routes to 192.44.78.0/24.
Hot potato: get traffic off of your network as Soon as possible. Go for egress 1!
75
Getting Burned by the Hot Potato
15 56
172865High bandwidth
Provider backbone
Low bandwidthcustomer backbone
Heavy Content Web Farm
Many customers want their provider to carry the bits!
tiny http request
huge http reply
SFF NYC
San Diego
76
Cold Potato Routing with MEDs(Multi-Exit Discriminator Attribute)
15 56
172865
Heavy Content Web Farm
192.44.78.0/24
192.44.78.0/24MED = 15
192.44.78.0/24MED = 56
This means that MEDs must be considered BEFOREIGP distance!
Prefer lower MED values
Note1 : some providers will not listen to MEDs
Note2 : MEDs need not be tied to IGP distance
77
Route Selection Summary
Highest Local Preference
Shortest ASPATH
Lowest MED
i-BGP < e-BGP
Lowest IGP cost to BGP egress
Lowest router ID
traffic engineering
Enforce relationships
Throw up hands andbreak ties
This is somewhat simplified. Hey, what happened to ORIGIN??
78
Policies Can Interact Strangely(“Route Pinning” Example)
backup
Disaster strikes primary linkand the backup takes over
Primary link is restored but sometraffic remains pinned to backup
1 2
3 4
Install backup link using community
customer
79
News At 11:00
• BGP is not guaranteed to converge on a stable routing. Policy interactions could lead to “livelock” protocol oscillations. See “Persistent Route Oscillations in Inter-domain Routing” by K. Varadhan,
R. Govindan, and D. Estrin. ISI report, 1996 • Corollary: BGP is not guaranteed to recover
from network failures.
80
What Problem is BGP solving?
Underlying problem
Shortest Paths
Distributed means of computing a solution.
X?
RIP, OSPF, IS-IS
BGP
• aid in the design of policy analysis algorithms and heuristics• aid in the analysis and design of BGP and extensions• help explain some BGP routing anomalies • provide a fun way of thinking about the protocol
X could
A Wee Bit of Theory
81
Separate dynamic and static semantics
SPVP = Simple Path Vector Protocol = a distributed algorithm for solving SPP
BGP
SPVP
Booo Hooo, Many, many complications...
BGP Policies
Stable Paths Problem (SPP)
staticsemantics
dynamicsemantics
See [Griffin, Shepherd, Wilfong]
82
1
An instance of the Stable Paths Problem (SPP)
2 5 5 2 1 0
0
2 1 02 0
1 3 01 0
3 0
4 2 04 3 0
3
4
2
1
• A graph of nodes and edges, • Node 0, called the origin, • For each non-zero node, a set
of permitted paths to the origin. This set always contains the “null path”.
• A ranking of permitted paths at each node. Null path is always least preferred. (Not shown in diagram)
When modeling BGP : nodes represent BGP speaking routers, and 0 represents a node originating some address block
most preferred…least preferred (not null)
Yes, the translation gets messy!
83
5 5 2 1 0
1
A Solution to a Stable Paths Problem
2
0
2 1 02 0
1 3 01 0
3 0
4 2 04 3 0
3
4
2
1
• node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
• each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors.
– There may be many paths consistent with those assigned to neighbors. This condition limits which may be chosen
A Solution need not represent a shortest path tree, or a spanning tree.
A solution is an assignment of permitted paths to each node such that
red path isthe assignedpath
84
An SPP may have multiple solutions
First solution
1
0
2
1 2 01 0
1
0
2
1
0
2
2 1 02 0
1 2 01 0
2 1 02 0
1 2 01 0
2 1 02 0
Second solutionDISAGREE
85
BAD GADGET : No Solution
2
0
31
2 1 02 0
1 3 01 0
3 2 03 0
4
3
• Why bad? Well…– if 1 uses (1,3,0) then 2 must use path (2,0) (it can’t use (2,1,0)) and 3
must use (3,0), in which case 3 ends up assigned a consistent path that is NOT the highest ranked among consistent paths.
– if 1 uses (1,0), then 2 can use either (2,1,0) or (2,0). If 2 uses (2,1,0) then 3 must use (3,0), which means that the route (1,0) of 1 is consistent but not highest ranked. If 2 instead uses (2,0), then 3 will use (3,2,0), so the route (2,1,0) of 2 will be consistent with assignments to neighbors, but is not assigned to 2, creating a violation of the solution properties.
86
SURPRISE : Beware of Backup Policies
2
0
31
2 1 02 0
1 3 01 0
3 4 2 03 0
4
4 04 2 04 3 0
Becomes a BAD GADGET if link (4, 0) goes down.
BGP is not robust : it is not guaranteed to recover from network failures.
87
PRECARIOUS
1
0
2
1 2 01 0
2 1 02 0
3
4
5 6
5 3 1 05 6 3 1 2 05 3 1 2 0
6 3 1 06 4 3 1 2 06 3 1 2 0
4 3 1 04 5 3 1 2 04 3 1 2 0
3 1 03 1 2 0
As with DISAGREE, this part has two distinct solutions
This part has a solution only when node 1 is assigned the direct path (1 0).
Has a solution, but can get “trapped”
88
Part II
Issues of scale for BGP in the real world
89
Big and Getting Bigger
• Scaling the iBGP mesh– Confederations– Route Reflectors
• BGP Table Growth– Address aggregation (CIDR) – Address allocation
• AS number allocation and use• Dynamics of BGP
– Inherent vs. accidental oscillation – Rate limiting and route flap dampening– Lots and lots of noise – Slow convergence time
ScaleScaleScaleScaleScaleScaleScaleScaleScaleScaleScaleScaleScale
90
iBGP Mesh Does Not Scale
eBGP update
iBGP updates
• N border routers means N(N-1)/2
peering sessions
• Each router must have N-1 iBGP
sessions configured
• The addition of a single iBGP speaker
requires configuration changes to all
other iBGP speakers
• Size of iBGP routing table can be order
N larger than number of best routes
(remember alternate routes!)
• Each router has to listen to update
noise from each neighbor
Currently four solutions: (0) Buy bigger routers!(1) Break AS into smaller ASes(2) BGP Route reflectors(3) BGP confederations
91
• Route reflectors can pass on
iBGP updates to clients
• Each RR passes along ONLY
best routes • ORIGINATOR_ID and
CLUSTER_LIST attributes are needed to avoid loops
RR RR
RR
Route Reflectors
92
BGP Confederations
AS 65501
AS 65502
AS 65503 AS 65504AS 65500
AS 1
From the outside, this looks like AS 1
Confederation eBGP (between member ASes) preserves LOCAL_PREF, MED, and BGP NEXTHOP.
93
BGP Table Growth
Thanks to Geoff Huston. http://www.telstra.net/ops/bgptable.html on August 8, 2001
94
Large BGP Tables Considered Harmful
• Routing tables must store best routes and alternate routes
• Burden can be large for routers with many alternate routes (route reflectors for example)
• Routers have been known to die• Increases CPU load, especially during
session reset
Moore’s Law may save us in theory. But in practice it means spending money to upgradeequipment …
95
Deaggregation Due to Multihoming May be a Leading
Cause
AS 1
customerAS 2
provider
12.0.0.0/8
AS 3provider
12.2.0.0/16
12.2.0.0/16
12.2.0.0/16
If AS 1 doesnot announce themore specific prefix,then most traffic to AS 2 will go through AS 3 because it is a longer match
AS 2 is “punching a hole” inThe CIDR block of AS 1
96
How Many ASNs are there?
Thanks to Geoff Huston. http://www.telstra.net/ops on June 23, 2001
97
64,511
2005?2007?
When will we run out of ASNs?
98
What is to be done?
• Make ASNs larger than 16 bits– How about 32 bits? – See Internet Draft: “BGP support for four-octet AS
number space” (draft-ietf-idr-as4bytes-03.txt)– Requires protocol change and wide deployment
• Change the way ASNs are used– Allow multihomed, non-transit networks to use private
ASNs– Use ASE (AS number Substitution on Egress )– See Internet Draft: “Autonomous System Number
Substitution on Egress” (draft-jhaas-ase-00.txt)– Works at edge, requires protocol change (for loop
prevention)– Makes some kinds of debugging harder!
99
Multihomed and “Private”! (draft-jhaas-ase-00.txt)
In fairness: could you “do thisright” and still scale?
AS 2AS 1
AS 65535
63.63.63.0/24
AS 3
63.63.63.0/24
ASPATH = 65535
63.63.63.0/24
ASPATH = 65535
Replaceprivate ASN
Choice of “private” ASN requires a bit of additional coordination between providers
A non-transit network
63.63.63.0/24
ASPATH = 2
ASE-ORIGINATOR = 65535
63.63.63.0/24
ASPATH = 1
ASE-ORIGINATOR = 65535
ASE-ORIGINATOR is a new attribute needed for “sender side loop detection” at AS 1 and 2
100
BGP Routing Tables
• Use “whois” queries to associate an ASN with “owner” (for example, http://www.arin.net/whois/arinwhois.html)
• 7018 = AT&T Worldnet, 701 =Uunet, 3561 = Cable & Wireless, …
• Hey, we can use these paths to draw cool graphs!
show ip bgpBGP table version is 111849680, local router ID is 203.62.248.4Status codes: s suppressed, d damped, h history, * valid, > best, i - internalOrigin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
. . .*>i192.35.25.0 134.159.0.1 50 0 16779 1 701 703 i*>i192.35.29.0 166.49.251.25 50 0 5727 7018 14541 i*>i192.35.35.0 134.159.0.1 50 0 16779 1 701 1744 i*>i192.35.37.0 134.159.0.1 50 0 16779 1 3561 i*>i192.35.39.0 134.159.0.3 50 0 16779 1 701 80 i*>i192.35.44.0 166.49.251.25 50 0 5727 7018 1785 i*>i192.35.48.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i*>i192.35.49.0 203.62.248.34 55 0 16779 209 7843 225 225 225 225 225 i*>i192.35.50.0 203.62.248.34 55 0 16779 3549 714 714 714 i*>i192.35.51.0/25 203.62.248.34 55 0 16779 3549 14744 14744 14744 14744 14744 14744 14744 14744 i. . .
Thanks to Geoff Huston. http://www.telstra.net/ops on July 6, 2001
101
AS Graphs Can Be Fun
The subgraph showing all ASes that have more than 100 neighbors in fullgraph of 11,158 nodes. July 6, 2001. Point of view: AT&T route-server
102
AS Graphs Depend on Point of View
This explains why there is no UUNET (701) Sprint (1239) link on previous slide!
peer peer
customerprovider
54
2
1 3
6
54
2
6
1 3
54 6
1 3
54
2
6
1 32
103
AS Graphs Do Not Show Topology!
The AS graphmay look like this. Reality may be closer to this…
BGP was designed to throw away information!
104
BGP Dynamics
• How many updates are flying around the Internet?
• How long Does it take Routes to Change?
The goals of (1) fast convergence (2) minimal updates (3) path redundancy are at odds
105
Daily Update Count
106
What is the Sound of One Route Flapping?
107
A Few Bad Apples …
Thanks to Madanlal Musuvathi for this plot. Data source: RIPE NCC
Typically, 80% ofthe updates are for less than 5% Of the prefixes.
Most prefixes are stable most of the time. On this day, about 83% of the prefixes were not updated.
Percent of BGP table prefixes
108
Two BGP Mechanisms for Squashing Updates
• Rate limiting on sending updates– Send batch of updates every
MinRouteAdvertisementInterval seconds (+/- random fuzz)
– Default value is 30 seconds– A router can change its mind about best
routes many times within this interval without telling neighbors
• Route Flap Dampening– Punish routes for misbehaving
Effective in dampening oscillations inherent in the vectoring approach
Must be turned onwith configuration
109
30 Second Bursts
110
How Long Does BGP Take to Adapt to Changes?
0
10
20
30
40
50
60
70
80
90
100
0 20 40 60 80 100 120 140 160
Seconds Until Convergence
Cu
mu
lati
ve P
erce
nta
ge
of
Eve
nts
Tup
Tshort
Tlong
Tdow n
Thanks to Abha Ahuja and Craig Labovitz for this plot.
111
Two Main Factors in Delayed Convergence
• Rate limiting timer slows everything down
• BGP can explore many alternate paths before giving up or arriving at a new path– No global knowledge in vectoring protocols
112
Why is Rate Limiting Needed?
Updatesto convergence
MinRouteAdvertisementInterval
0
Timeto convergence
MinRouteAdvertisementInterval
0
SSFNet (www.ssfnet.org) simulations, T. Griffin and B.J. Premore. To appear in ICNP 2001.
Rate limiting dampens some of the oscillation inherent in a vectoring protocol.
Current interval (30 seconds) was picked “out of the blue sky” (yet “impact on BGP convergence time is tremendous”)
113
Route Flap Dampening (RFC 2439)
Routes are given a penalty for changing. If penalty exceeds suppress limit, the route is dampened. When the route is not changing, its penalty decays exponentially. If the penalty goesbelow reuse limit, then it is announced again.
• Can dramatically reduce the number of BGP updates
• Requires additional router resources• Applied on eBGP inbound only
114
Route Flap Dampening Example
route dampenedfor nearly 1 hour
penalty for each flap = 1000
115
Q: Why All the Updates?
• Networks come, networks go• There’s always a router rebooting somewhere • Hardware failure, flaky interface cards, backhoes
digging, floods in Houston, …
This is “normal” --- exactly what dynamic routing is designed for…
116
Q: Why All the Updates?
• Misconfiguration • Route flap dampening not widely used• BGP exploring many alternate paths• Software bugs in implementation of routing protocols• BGP session resets due to congestion or lack of interoperability: BGP
sessions are brittle. One malformed update is enough to reset session and flap 100K routes. (Consequence of incremental approach)
• IGP instability exported by use of MEDs or IGP tie breaker• Sub-optimal vendor implementation choices• Secret sauce routing algorithms attempting fancy-dancy tricks• Weird policy interactions (MED oscillation, BAD GADGETS??)• Gnomes, sprites, and fairies • ….
A: NO ONE REALLY KNOWS …
117
IGP Tie Breaking Can Export Internal Instability to the Whole Wide World
15 56
192.44.78.0/24
AS 4
AS 3AS 2
AS 1
10
FLAP
FLAP
FLAP FLAP192.44.78.0/24ASPATH = 4 2 1
192.44.78.0/24ASPATH = 4 3 1
118
MEDs Can Export Internal Instability
15
172865
Heavy Content Web Farm
192.44.78.0/24
192.44.78.0/24MED = 15
192.44.78.0/24MED = 56 OR 10
56
10
FLAP
FLAP
FLAP
FLAP
FLAPFLAP
119
Implementation Does Matter!
Thanks to Abha Ahuja and Craig Labovitz for this plot.
stateless withdrawswidely deployed
stateful withdrawswidely deployed
120
How Long Will Interdomain Routing Continue to Scale?
... the existing interdomain routinginfrastructure is rapidly nearing the end of its useful lifetime. Itappears unlikely that mere tweaks of BGP will stave off fundamentalscaling issues, brought on by growth, multihoming and other causes.
A quote from some recent email:
Is this true or false? How can we tell?
Research required…
121
Summary
• BGP is a fairly simple protocol …• … but it is not easy to configure• BGP is running on more than 100K
routers (my estimate), making it one of world’s largest and most visible distributed systems
• Global dynamics and scaling principles are still not well understood
122
Addressing and ASN RFCs
• RFC 1380 IESG Deliberations on Routing and Addressing (1992) • RFC 1517Applicability Statement for the Implementation of Classless Inter-
Domain Routing (CIDR) (1993) • RFC 1518 An Architecture for IP Address Allocation with CIDR (1993) • RFC 1519 Classless Inter-Domain Routing (CIDR) (1993) • RFC 1467 Status of CIDR Deployment in the Intrenet (1983) • RFC 1520 Exchanging Routing Information Across Provider Boundaries in the
CIDR Environment (1993) • RFC 1817 CIDR and Classful routing (1995) • RFC 1918 Address Allocation for Private Internets (1996) • RFC 2008 Implications of Various Address Allocation Policies for Internet Routing
(1996) • RFC 2050 Internet Registry IP Allocation Guidelines (1996) • RFC 2260 Scalable Support for Multi-homed Multi-provider Connectivity (1998) • RFC 2519 A Framework for Inter-Domain Route Aggregation (1999)
• RFC 1930 Guidelines for creation, selection, and registration of an Autonomous System (AS)
• RFC 2270 Using a Dedicated AS for Sites Homed to a Single Provider
123
Selected BGP RFCs
• IDR : http://www.ietf.org/html.charters/idr-charter.html
• RFC 1771 A Border Gateway Protocol 4 (BGP-4)
• Latest draft rewrite: draft-ietf-idr-bgp4-12.txt
• RFC 1772 Application of the Border Gateway Protocol in the Internet
• RFC 1773 Experience with the BGP-4 protocol
• RFC 1774 BGP-4 Protocol Analysis
• RFC 2796 BGP Route Reflection An alternative to full mesh IBGP
• RFC 3065 Autonomous System Confederations for BGP
• RFC 1997 BGP Communities Attribute
• RFC 1998 An Application of the BGP Community Attribute in Multi-home Routing
• RFC 2439 Route Flap Dampening
Internet Engineering Task Force (IETF)
http://www.ietf.org
124
Titles of Some Recent Internet Drafts
• Dynamic Capability for BGP-4• Application of Multiprotocol BGP-4 to IPv4 Multicast Routing • Graceful Restart mechanism for BGP• Cooperative Route Filtering Capability for BGP-4• Address Prefix Based Outbound Route Filter for BGP-4• Aspath Based Outbound Route Filter for BGP-4• Architectural Requirements for Inter-Domain Routing in the Internet• BGP support for four-octet AS number space• Autonomous System Number Substitution on Egress• BGP Extended Communities Attribute• Controlling the redistribution of BGP routes• BGP Persistent Route Oscillation Condition• Benchmarking Methodology for Basic BGP Convergence• Terminology for Benchmarking External Routing Convergence
Measurements
BGP is a moving target …
125
Selected Bibliography on Routing
• Internet Routing Architectures. Bassam Halabi. Second edition Cisco Press, 2000
• BGP4: Inter-domain Routing in the Internet. John W. Stewart, III. Addison-Wesley, 1999
• Routing in the Internet. Christian Huitema. 2000
• ISP Survival Guide: Strategies for Running a Competitive ISP. Geoff Huston. Wiley, 1999.
• Interconnection, Peering and Settlements. Geoff Huston. The Internet Protocol Journal. March and June 1999.
126
BGP Stability and Convergence
• The Impact of Internet Policy and Topology on Delayed Routing Convergence. Craig Labovitz, Abha Ahuja, Roger Wattenhofer, Srinivasan Venkatachary. INFOCOM 2001
• An Experimental Study of BGP Convergence. Craig Labovitz, Abha Ahuja, Abhijit Abose, Farnam Jahanian. SIGCOMM 2000
• Origins of Internet Routing Instability. C. Labovitz, R. Malan, F. Jahanian. INFOCOM 1999
• Internet Routing Instability. Craig Labovitz, G. Robert Malan and Farnam Jahanian. SIGCOMM 1997
127
Analysis of Interdomain Routing
• Cooperative Association for Internet Data Analysis (CAIDA)– http://www.caida.org/– Tools and analyses promoting the engineering and maintenance of a
robust, scalable global Internet infrastructure
• Internet Performance Measurement and Analysis (IPMA)– http://www.merit.edu/ipma/– Studies the performance of networks and networking protocols in local
and wide-area networks
• National Laboratory for Applied Network Research (NLANR)– http://www.nlanr.net/– Analysis, tools, visualization.
• IRTF Routing Research Group (IRTF-RR)– http://puck.nether.net/irtf-rr/
128
Internet Route Registries
• Internet Route Registry– http://www.irr.net/
• Routing Policy Specification Language (RPSL)– RFC 2622 Routing Policy Specification Language
(RPSL) – RFC 2650 Using RPSL in Practice
• Internet Route Registry Daemon (IRRd)– http://www.irrd.net/
• RAToolSet – http://www.isi.edu/ra/RAToolSet/
129
Some BGP Theory
• Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000. (Also USC Tech Report, Feb. 1996)
– Shows that BGP is not guaranteed to converge• An Architecture for Stable, Analyzable Internet Routing. Ramesh Govindan, Cengiz Alaettinoglu,
George Eddy, David Kessens, Satish Kumar, and WeeSan Lee. IEEE Network Magazine, Jan-Feb 1999.
– Use RPSL to specify policies. Store them in registries. Use registry for conguration generation and analysis.
• An Analysis of BGP Convergence Properties. Timothy G. Griffin, Gordon Wilfong. SIGCOMM 1999
– Model BGP, shows static analysis of divergence in policies is NP complete• Policy Disputes in Path Vector Protocols. Timothy G. Griffin, F. Bruce Shepherd, Gordon
Wilfong. ICNP 1999– Define Stable Paths Problem and develop sufficient condition for “sanity”
• A Safe Path Vector Protocol. Timothy G. Griffin, Gordon Wilfong. INFOCOM 2001– Dynamic solution for SPVP based on histories
• Stable Internet Routing without Global Coordination. Lixin Gao, Jennifer Rexford. SIGMETRICS 2000
– Show that if certain guidelines are followed, then all is well. • Inherently safe backup routing with BGP. Lixin Gao, Timothy G. Griffin, Jennifer Rexford.
INFOCOM 2001– Use SPP to study complex backup policies
130
Thank You!
• Companion links:
• http://www.research.att.com/~griffin/interdomain.html