Internet Routing Instability
Craig Labovitz, G. Robert Malan, Farnam Jahanian
Presenters: Supranamaya Ranjan, Mohammed Ahamed
Appeared: SIGCOMM '97
The Core of the Internet
[Figure: core networks Verio, UUNet, and Sprint, and the edge network rice.edu]
• Routing at the core is done using BGP
• Intra-domain routing could be RIP, OSPF, etc.
BGP Overview
[Figure: Verio, UUNet, and Sprint exchanging routes for prefixes 128.42.x.x, 196.29.x.x, 100.100.x.x, and 92.92.x.x]
BGP Overview (contd.)
• Similar to Distance Vector routing
• Path Vector protocol
• Loop detection done using AS_PATH field
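A minimal Python sketch of path-vector loop detection, assuming a route's AS_PATH is modelled as a list of AS numbers: each AS prepends its own number before re-advertising, and a router discards any route whose AS_PATH already contains its own AS. The function names and the private AS numbers (65001 and up) are hypothetical.

```python
# Path-vector loop detection sketch. AS numbers are hypothetical private ASNs.


def prepend(local_as: int, as_path: list[int]) -> list[int]:
    """AS_PATH a router sends after adding its own AS number at the front."""
    return [local_as] + as_path


def accept(local_as: int, as_path: list[int]) -> bool:
    """Loop detection: reject the route if our AS already appears in AS_PATH."""
    return local_as not in as_path


if __name__ == "__main__":
    path = prepend(65001, [])      # originated by AS 65001
    path = prepend(65002, path)    # re-advertised by AS 65002
    print(path, accept(65003, path), accept(65001, path))
```

AS 65003 accepts the route, while AS 65001 rejects it because the advertisement would loop back to its origin.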
[Figure: routers R1 and R2 connected by a BGP peering session over TCP]
• Exchange full routing table at start
• Updates sent incrementally
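The full-table-then-incremental behaviour can be sketched as follows, treating a routing table as a hypothetical dict from prefix to AS_PATH. This is not BGP's wire format; each tuple only stands in for an UPDATE that announces or withdraws one prefix, and the prefixes and AS numbers are illustrative.

```python
# What one speaker sends a peer: the whole table at session start,
# then only the differences between successive table versions.

Table = dict[str, tuple[int, ...]]   # prefix -> AS_PATH (illustrative model)


def initial_updates(table: Table) -> list[tuple]:
    """Session start: announce every route in the table."""
    return [("announce", prefix, path) for prefix, path in sorted(table.items())]


def incremental_updates(old: Table, new: Table) -> list[tuple]:
    """Afterwards: withdraw vanished prefixes, announce new or changed routes."""
    updates = [("withdraw", prefix, None) for prefix in sorted(old.keys() - new.keys())]
    for prefix, path in sorted(new.items()):
        if old.get(prefix) != path:
            updates.append(("announce", prefix, path))
    return updates
```

If nothing changes between two table versions, `incremental_updates` returns an empty list, so in this model every message after the initial dump should reflect a real change.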
Key Point
The volume of BGP messages exchanged is abnormally high
• Most messages are redundant or unnecessary and do not correspond to any topology or policy changes
Consequence: Instability
• Normal data packets handled by dedicated hardware
• BGP packet processing consumes CPU time
• Severe CPU processing overhead takes the router offline
Route Flap Storm:
[Figure: routers A, B, and C]
• Router A temporarily fails
• When A comes back up, B and C send it their full routing tables
• B and C then fail in turn: a cascading effect
How do we avoid or lessen the impact of these problems?
Route Dampening
• Router does not accept frequent route updates to a destination
• Frequent updates might signal that the network has erratic connectivity
• Increment a counter for the destination whenever its route changes
• When the counter exceeds a threshold, stop accepting updates
• Decrement the counter with time
Problem:
• Future legitimate announcements are accepted only after a delay, until the counter decays below the threshold
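The dampening idea can be sketched per prefix in Python. Assumptions: instead of a plain decrement, the penalty decays exponentially with a half-life, the scheme described for BGP route flap damping in RFC 2439; the class name and the penalty, threshold, and half-life values are illustrative, not taken from the paper.

```python
class RouteDampener:
    """Per-prefix flap penalty with exponential decay (illustrative sketch).

    All default values are assumptions chosen for illustration.
    """

    def __init__(self, penalty_per_flap: float = 1000.0,
                 suppress_limit: float = 2000.0,
                 reuse_limit: float = 750.0,
                 half_life_s: float = 900.0) -> None:
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life_s = half_life_s
        self.state: dict[str, tuple[float, float]] = {}  # prefix -> (penalty, last time)
        self.suppressed: set[str] = set()

    def _decay_to(self, prefix: str, now: float) -> float:
        """Decay the stored penalty to time `now`; it halves every half_life_s."""
        penalty, last = self.state.get(prefix, (0.0, now))
        penalty *= 0.5 ** ((now - last) / self.half_life_s)
        self.state[prefix] = (penalty, now)
        return penalty

    def record_flap(self, prefix: str, now: float) -> None:
        """Route to `prefix` changed: add a penalty, suppress if over the limit."""
        penalty = self._decay_to(prefix, now) + self.penalty_per_flap
        self.state[prefix] = (penalty, now)
        if penalty > self.suppress_limit:
            self.suppressed.add(prefix)

    def is_usable(self, prefix: str, now: float) -> bool:
        """A suppressed route is reused only once its penalty falls below reuse_limit."""
        penalty = self._decay_to(prefix, now)
        if prefix in self.suppressed and penalty < self.reuse_limit:
            self.suppressed.discard(prefix)
        return prefix not in self.suppressed
```

With these numbers, two flaps 10 s apart stay just under the suppress limit, a third flap suppresses the prefix, and the route becomes usable again only about 30 minutes later, which is the delay the slide lists as the problem.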
Prefix Aggregation / Supernetting
• Core router advertises a less specific network prefix
• Reduces size of routing tables exchanged
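A sketch of aggregation using Python's standard `ipaddress` module (`ip_network`, `collapse_addresses`). The prefixes are hypothetical and `core_announcements` is an illustrative helper: contiguous customer blocks collapse into one less-specific prefix, scattered blocks do not, and a multi-homed prefix still has to be announced on its own.

```python
import ipaddress


def core_announcements(customer_prefixes, multihomed=()):
    """Prefixes an ISP would announce to the core (hypothetical helper).

    Contiguous customer blocks are merged into supernets; multi-homed
    prefixes are listed explicitly because they are also reachable through
    another ISP, so the more-specific route must stay visible.
    """
    nets = [ipaddress.ip_network(p) for p in customer_prefixes]
    aggregated = list(ipaddress.collapse_addresses(nets))
    explicit = [ipaddress.ip_network(p) for p in multihomed]
    return [str(n) for n in aggregated + explicit]


if __name__ == "__main__":
    blocks = [f"128.42.{i}.0/24" for i in range(4)]
    print(core_announcements(blocks))                           # one /22
    print(core_announcements(["128.42.0.0/24", "196.29.5.0/24"]))  # no gain
    print(core_announcements(blocks, multihomed=["128.42.2.0/24"]))
```

The second call shows why non-hierarchical assignment limits aggregation: unrelated blocks cannot be merged, so the table does not shrink.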
Problems: prefix aggregation is not effective because:
- Internet addresses are largely assigned non-hierarchically
- 25% of prefixes are multi-homed, and multi-homed prefixes must be exposed at the core
- Domains are not renumbered when changing ISPs
Route Servers
• Without a route server: O(N) peering sessions per router
• With a route server: 1 peering session per router
[Figure: routers peering through a central route server]
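Counting sessions makes the difference concrete. This sketch assumes one border router per ISP and a full mesh of direct peerings when no route server is used; with 60 ISPs (roughly the number at the peering point in the evaluation), that is 1,770 sessions in total versus 60. The function names are hypothetical.

```python
def sessions_per_router(n_routers: int, use_route_server: bool) -> int:
    """Full mesh: peer with every other router (N - 1, i.e. O(N)).
    Route server: a single session, to the server."""
    return 1 if use_route_server else n_routers - 1


def total_sessions(n_routers: int, use_route_server: bool) -> int:
    """Total BGP sessions across the exchange point."""
    if use_route_server:
        return n_routers                      # one session per router
    return n_routers * (n_routers - 1) // 2   # every pair peers once


if __name__ == "__main__":
    for rs in (False, True):
        print(rs, sessions_per_router(60, rs), total_sessions(60, rs))
```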
In spite of all these measures, the BGP message overhead is unexpectedly high
Evaluation Methodology
• Data from the route server at the MAE-East (Washington, D.C.) peering point
• Peering point for more than 60 major ISPs
• Time series analysis of message exchange events
• Nine-month log
Observation: Lots of redundant updates
• Duplicate route withdrawals
ISP   Withdrawals   Unique   Ratio
A     23276         4344     5
F     86417         12435    7
I     2479023       14112    175
One Reason:
- Stateless BGP
- No record of previous withdrawals is maintained
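A hedged sketch of the difference between stateless and stateful withdrawal handling: a stateful speaker remembers which prefixes it has advertised to a peer (roughly BGP's Adj-RIB-Out) and only sends a withdrawal for a prefix the peer currently has. The class and function names are hypothetical, and announcements are always sent in this simplified model.

```python
class AdjRibOut:
    """Per-peer record of prefixes currently advertised (hypothetical name)."""

    def __init__(self, stateful: bool = True) -> None:
        self.stateful = stateful
        self.advertised: set[str] = set()

    def announce(self, prefix: str) -> bool:
        self.advertised.add(prefix)
        return True  # announcements are always sent in this sketch

    def withdraw(self, prefix: str) -> bool:
        """Return True if a WITHDRAW message would be sent to the peer."""
        if not self.stateful:
            return True  # stateless: forward every withdrawal
        if prefix in self.advertised:
            self.advertised.remove(prefix)
            return True
        return False  # peer never had it or it was already withdrawn


def messages_sent(events: list[tuple[str, str]], stateful: bool) -> int:
    """Count messages for events like ("A", prefix) or ("W", prefix)."""
    peer = AdjRibOut(stateful)
    sent = 0
    for kind, prefix in events:
        ok = peer.announce(prefix) if kind == "A" else peer.withdraw(prefix)
        sent += int(ok)
    return sent
```

For one announcement followed by four withdrawals (three of them for a prefix already withdrawn, one for a prefix never announced), the stateless version sends 5 messages and the stateful version sends 2.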
Observation: Instability Proportional to Activity
After removing duplicate messages:
[Figure: instability density vs. time of day (6:00 to 24:00); annotations mark periods with fewer messages and an ISP infrastructure upgrade at 10:00 AM]
Evidence from Fine-Grained Structure
Conjecture: BGP packets are competing with data packets during high-bandwidth activity.
[Figure: power spectral density of the number of instability events vs. frequency (1/hour); periods of 7 days and 24 hours marked]
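The periodicity argument rests on a power spectrum of instability counts. A minimal pure-Python periodogram (a direct discrete Fourier transform) on synthetic hourly counts shows how daily and weekly cycles appear as the strongest periods. The data are made up, not the paper's measurements, and the function names are hypothetical.

```python
import cmath
import math


def periodogram(counts: list[float]) -> dict[float, float]:
    """Power at each Fourier frequency, keyed by period in samples (hours here)."""
    n = len(counts)
    mean = sum(counts) / n
    x = [c - mean for c in counts]  # drop the average (zero-frequency) term
    power = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power[n / k] = abs(coeff) ** 2
    return power


def strongest_periods(counts: list[float], how_many: int = 2) -> list[float]:
    power = periodogram(counts)
    return sorted(power, key=power.get, reverse=True)[:how_many]


if __name__ == "__main__":
    # Synthetic hourly counts over two weeks: a daily cycle plus a weaker weekly one.
    counts = [100 + 50 * math.sin(2 * math.pi * h / 24)
              + 20 * math.sin(2 * math.pi * h / 168) for h in range(14 * 24)]
    print(strongest_periods(counts))
```

Running it prints `[24.0, 168.0]`: the 24-hour cycle first, then the 7-day (168-hour) cycle.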
Observation: Instability & size uncorrelated
• ISPs serving more network prefixes do not necessarily contribute more to instability
[Figure: for the AADiff, WADup, and WADiff categories, each ISP's proportion of announcements plotted against its proportion of the routing table]
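The update categories can be made concrete with a small classifier over the update stream for one prefix from one peer. The definitions are paraphrased here as assumptions: AADiff replaces the current route with a different path, AADup repeats the current path, WADiff and WADup are announcements after an explicit withdrawal with a different or the same path, and WWDup is a withdrawal for a prefix that is already unreachable. The function name and AS numbers are hypothetical.

```python
def classify(updates: list[tuple]) -> list[str]:
    """Label BGP updates for ONE prefix from ONE peer, in arrival order.

    Each update is ("A", as_path) for an announcement or ("W", None) for a
    withdrawal. Category definitions are paraphrased assumptions.
    """
    labels = []
    last_path = None   # AS_PATH most recently announced for this prefix
    reachable = False  # is the prefix currently announced?
    for kind, path in updates:
        if kind == "W":
            # A withdrawal for an already-unreachable prefix is a duplicate.
            labels.append("W" if reachable else "WWDup")
            reachable = False
            continue
        if reachable:
            # Implicit withdrawal: a new announcement replaces the current route.
            labels.append("AADup" if path == last_path else "AADiff")
        elif last_path is not None:
            # Re-announcement after an explicit withdrawal.
            labels.append("WADup" if path == last_path else "WADiff")
        else:
            labels.append("A")  # first announcement seen for this prefix
        last_path, reachable = path, True
    return labels
```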
Observation: Instability distributed over routes
• 20% to 90% of routes change 10 times or less
• No single route contributes significantly to instability
[Figure: cumulative proportion of routes vs. # of announcements per prefix+AS; annotations at 10 announcements and "75% median"]
Observation: Synchronized updates
• Inter-arrival times of updates shows periodicity
• 30 s and 1 minute patterns
• Some routers collect updates and send them once every 30 s
• Routers get synchronized
Possible reasons:
• Possibly a misconfigured interaction between border routers and internal routers
[Figure: inter-arrival time distribution for AADiffs (proportion vs. inter-arrival time), with marks at 30 s and 1 min]
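The 30 s pattern can be reproduced with a timer sketch: if a router queues updates and flushes them only at fixed ticks (here every 30 s, in the spirit of BGP's MinRouteAdvertisementInterval, whose suggested value for external peers in RFC 4271 is 30 s), the gaps a receiver observes are multiples of 30 s regardless of when the updates were generated. The function names, tick phase, and update times are hypothetical.

```python
import math


def flush_times(update_times: list[float], interval: float = 30.0) -> list[float]:
    """Updates queued by a router leave at its next timer tick.

    Ticks are assumed to fall on multiples of `interval` seconds; updates
    queued within one interval leave together in a single burst.
    """
    return sorted({math.ceil(t / interval) * interval for t in update_times})


def inter_arrival_times(times: list[float]) -> list[float]:
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    generated = [3, 14, 29, 41, 95, 100]  # hypothetical generation times (s)
    print(inter_arrival_times(flush_times(generated)))
```

Although the updates are generated at irregular times, the receiver sees bursts 30 s and 60 s apart, matching the 30 s and 1 minute peaks.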
End-to-end Perspective
Chinoy: “Dynamics of Internet routing information” (SIGCOMM 93)
Measurements on NSFNET showed:
- Processing and forwarding latency of a BGP update is 3 orders of magnitude higher than the latency incurred in forwarding data packets
- This leads to packet drops during the intervening period
Paxson: “End-to-End routing behavior in the internet” (SIGCOMM 96)
- Routing loops introduce loops into other routers' routing tables
- An end-to-end route changes every 1.5 hours on average
End-to-End perspective (Paxson)
Pathology type               Probability in 1995   Probability in 1996
Long-lived routing loops     0.065%                ~same
Short-lived routing loops    0.14%                 ~same
Outage > 30 s                0.96%                 2.2%
Total                        1.5%                  3.4%
Summary and Conclusions
• Redundant routing information flows in core
• Instability distributed across autonomous systems
Possible reasons for instability:
- Stateless BGP updates
- Misconfigured routers
- Synchronization
- Clocks driving the links not synchronized (link “flaps”)
Follow-up work & impact
• Migration from stateless to stateful BGP decreased duplicate withdrawals by an order of magnitude
“Origins of Internet Routing Instability” (1999)
• But Duplicate Announcements (AADup) doubled
• Reason: Non-transitive attribute filtering not implemented
- BGP specification: “never propagate non-transitive attributes”
- AS_PATH is a transitive attribute
- MED (Multi-Exit Discriminator) is NOT transitive
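A sketch of why missing filtering produces AADups: a router receives updates for the same route that differ only in MED, strips MED before re-advertising to an external peer, and, unless it compares the result with what it last sent, forwards identical announcements. Attribute handling is reduced to a dict, AS_PATH prepending is omitted, and the names are hypothetical.

```python
# Attributes a router must not pass on to a neighbouring AS. Only MED is
# modelled here; a real implementation would track each attribute's flags.
NON_TRANSITIVE = {"MED"}


def strip_non_transitive(route: dict) -> dict:
    return {name: value for name, value in route.items() if name not in NON_TRANSITIVE}


def updates_to_ebgp_peer(received: list[dict], suppress_duplicates: bool) -> list[dict]:
    """Routes re-advertised to an external peer for a stream of received updates.

    Without suppression, an update that differed from the previous one only in
    MED is still forwarded, and after MED is stripped the peer receives an
    exact duplicate (an AADup).
    """
    sent: list[dict] = []
    for route in received:
        outgoing = strip_non_transitive(route)
        if suppress_duplicates and sent and outgoing == sent[-1]:
            continue  # identical to what the peer already has: drop it
        sent.append(outgoing)
    return sent
```

Three updates that differ only in MED yield three identical announcements without the check, and a single announcement with it.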