On Measuring, Inferring, and ModelingInternet Connectivity:
A Guided Tour across the TCP/IP Protocol Stack
Walter WillingerAT&T Labs-Research
2
AcknowledgmentsMain Collaborators• John Doyle (Caltech)• David Alderson (Caltech)• Lun Li (Caltech)
Contributions• Reiko Tanaka (RIKEN, Japan)• Matt Roughan (Univ. of Adelaide, Australia)• Steven Low (Caltech)• Ramesh Govindan (USC)• Hyunseok Chang, Sugih Jamin, Morely Mao (Univ. of Michigan)• Stanislav Shalunov (Abilene)• Heather Sherman (CENIC)• Daniel Stutzbach, Reza Rejaie (Univ. of Oregon)• S. Sen, Nick Duffield (AT&T Labs-Research)
3
Outline• Internet architecture and Internet topology
• On measuring Internet connectivity
• On inferring Internet connectivity structures
• On modeling Internet connectivity
• On validating models of Internet connectivity structures
4
The Internet: The User Perspective
mycomputer
router router
webserver
5
The Internet: The Engineering Perspective
HTTP
TCPIP
LINK
mycomputer
router router
webserver
6
The Internet is a LAYERED Network
HTTP
TCPIP
LINK
mycomputer
router router
webserver
packetpacketpacketpacketpacketpacket
The perception of the Internet as a simple, user-friendly, and robust
system is enabled by FEEDBACK and other CONTROLS that operate both
WITHIN LAYERS and ACROSS LAYERS.
These ARCHITECTURAL DETAILS (protocols, interfaces, etc.) are MOST ESSENTIAL to
the nature of the Internet.
7
Internet Architecture: Vertical Decomposition
HTTP
TCPIP
LINK
mycomputer
router router
webserver
Ver
tical
dec
ompo
sitio
nP
roto
col S
tack Benefits:
• Each layer can evolve independently
• Substitutes, complementsRequirements:1. Each layer follows the rules2. Every other layer does “good
enough” with its implementation
8
Internet Architecture: Horizontal Decomposition
HTTP
TCPIP
LINK
mycomputer
router router
webserver
Horizontal decompositionEach level is decentralized and asynchronous
Benefit: Individual components can fail (provided that they “fail off”) without disrupting the network.
9
Internet Connectivity: The “hourglass” Architecture
Applications
TCP
IP
Transmission
WWW, Email, Napster, FTP, …
Ethernet, ATM, POS, WDM, …
• Consider a (vertical) layer of the Internet hourglass• Expand it horizontally• Give layer-specific meaning to “nodes” and “links”
10
“links”
“nodes”
11
Router-Level Internet
12
From Router-Level to Autonomous System (AS)-Level Internet
13
AS Graphs = Business Relationships
AS 1 AS 3
AS 4AS 2
Nodes = ASesLinks = peeringrelationships
14
AS Graphs obscure Topology!
The AS graphmay look like this. Reality may be closer to this…
Courtesy Tim Griffin
15
(Part of the) Web Graph
Nodes = documents, connections = hyperlinks
16
The many Facets of Internet Topology
IP
TRANSMISSION
TCP
virtual
physical static
dynamicAPPLICATION
Router-level connectivity
IP-level connectivity
Autonomous System (AS) graph
Web graphEmail graphP2P graphand many others …
17
MESSAGE #1: Specify WHICH aspect of Internet topology
• There is no “generic” Internet topology• The many facets of Internet topology
– Router-level (physical)– IP-level (logical)– AS-level (logical)– Application-level (logical)– …
• Details of each make a big difference• Lack of specificity can cause confusion
– Albert, Jeong, and Barabasi (2000) study robustness properties of the Internet by equating AS-level topology with router-level topology
– Knocking out nodes in the AS graph??
18
On Measuring Internet Connectivity• No central agency/repository • Economic incentive for ISPs to obscure network
structure• Direct inspection is typically not possible• Based on measurement experiments, hacks• Mismatch between what we want to measure and
can measure
19
On Measuring the Internet’s Router-level Topology• traceroute tool
– Discovers compliant (i.e., IP) routers along path between selected network host computers
• Large-scale traceroute experiments– Pansiot and Grad (router-level map from around
1995)– Cheswick and Burch (mapping project 1997--)– Mercator (router-level maps from around 1999 by
R. Govindan and H. Tangmunarunkit)– Skitter (ongoing mapping project by CAIDA folks)– Rocketfuel (state-of-the-art router-level maps of
individual ISPs by UW folks)– Dimes (EU project)
20
http://research.lumeta.com/ches/map/
21
http://www.isi.edu/scan/mercator/mercator.html
22
http://www.caida.org/tools/measurement/skitter/
23http://www.cs.washington.edu/research/networking/rocketfuel/bb
24http://www.cs.washington.edu/research/networking/rocketfuel/
25
HOWEVER: Problems with existing measurements• traceroute-based measurements are ambiguous
– traceroute is strictly about IP-level connectivity– traceroute cannot distinguish between high
connectivity nodes that are for real and that are fake and due to underlying Layer 2 (e.g., Ethernet, ATM) or Layer 2.5 technologies (e.g., MPLS)
26
http://www.caida.org/tools/measurement/skitter/
www.savvis.netmanaged IP and hosting companyfounded 1995offering “private IP with ATM at core”
This “node” is an entire
network! (not just a router)
27http://www.cs.washington.edu/research/networking/rocketfuel/
Illusion of a fully-meshed Network due to use of MP
28
HOWEVER: Problems with existing measurements• traceroute-based measurements are ambiguous
– traceroute is strictly about IP-level connectivity– traceroute cannot distinguish between high
connectivity nodes that are for real and that are fake and due to underlying Layer 2 (e.g., Ethernet, ATM) or Layer 2.5 technologies (e.g., MPLS)
• traceroute-based measurements are inaccurate– Requires some guesswork in deciding which IP
addresses/interface cards refer to the same router (“alias resolution” problem)
• traceroute-based measurements are incomplete/biased– IP-level connectivity is more easily/accurately inferred
the closer the routers are to the traceroute source(s)– Node degree distribution is inferred to be of the
power-law type even when the actual distribution is not
29
On Measuring the Internet’s AS-level Topology• BGP routing tables/updates
– RouteViews (Univ. of Oregon)– RIPE (Europe)– E.g., 129.223.224.0/19 7018 701 4637 1221
• Traceroute measurements– Skitter (CAIDA)– Dimes
• Other available sources– Public databases (WHOIS)– Looking glass sites, additional routing tables
30
HOWEVER: Problems with existing measurements• BGP-based measurements are incomplete
– Contains most nodes (ASes)– Might miss up to 40-50% of existing links
• BGP-based measurements are ambiguous– Dynamics of AS-level Internet– Requires some guesswork in deciding whether a
“new” node or link is genuine• BGP-based measurements are inaccurate
– Use of heuristics for inferring peering relationships
31
On Measuring the Internet’s Overlay Topologies
• P2P networks– Structured (e.g., Kad DHT): Central control– Unstructured (e.g., Gnutella): Crawler– Sampling
• World-Wide-Web (WWW)– AltaVista crawls (Broder et al,) in 1999– Duration is a couple of weeks
32
HOWEVER: Problems with existing measurements• High degree of dynamics of overlay networks
– Connectivity structure changes underneath the crawler
– Fast vs. slow crawls• Enormous size of overlay networks
– Complete crawls take too long– Alternative approach: Sampling
• Issues of sampling bias– Due to temporal dynamics of nodes (peers)– Due to spatial features of overlay topology
33
MESSAGE #2: Internet connectivity measurements should never be taken at face
value• Each technique is typically specific to the network of
interest (e.g., traceroute for IP-level, BGP tables for AS-level)
• It is important to understand the process by which the measurements were obtained and collected
• Even best-of-breed measurement data is ambiguous, inaccurate, and incomplete
• Taking (someone else’s) data at face value may provide a false basis for results
34
On Inferring Internet Connectivity• Quality of available data
– See earlier• Quality of data analysis
– Doing specious analysis with specious data• Sensitivity of inferred properties to known
imperfections of the underlying data– See later
35
100 101 102 103 104100
101
102
103
α+1 = 1.5
100 101 102 103 104100
101
102
103
α = 0.5
α = 1.0
Size-Frequency vs. Size-Rank Plotsor
Non-cumulative vs. Cumulative
36
102 103 104100
101
102
103
500 1000 1500100
101
102
103
102
103100
101
102
0 400 800 1200100
101
102
Ye
Exponential
ScalingYs
Ye appears (incorrectly) to be scaling
Ys appears (incorrectly) to be exponential
Rank k
Freq.
Ye
Exponential
ScalingYs
ykyk
yk yk
Rank k
Freq.
0
Ye
Ye
Ys
Ys
37
100 101 102 103 104100
101
102
103
104
105
106
100 101 102 10310
0
101
102
103
104
105
100 101 102 103100
101
102
103
104
105
106
0 50 100 150 200 250 300 350100
101
102
103
104
105
106
Noncumulative Size-Frequency Binned Size-Frequency
Size-Rank (log-log scale) Size-Rank (log-linear scale)
raw MERCATOR data a common reporting technique
without 2 largest nodes
without 2 largest nodes
exponentialin tail…
plausibledata?
38
MESSAGE #3: Be aware of specious analysis of specious measurements
• Know your data• Avoid (non-cumulative) size-frequency plots• Rely on (cumulative) size-rank plots
39
On Modeling Internet Connectivity and Model Validation
• All models are wrong …• There are in general many different
explanations/models for one and the same phenomenon
• Role of randomness vs. design• To argue in favor of any particular model typically
requires additional information– In the form of domain knowledge– In the form of new or complementary data
• Reproducing a given graph statistics is a data-fitting exercise and does not validate a chosen model
40
Node Degree (d)
Source: Faloutsos et al. (1999)
Ran
k: R
(d) =
P (D
>d) x
#no
des
100 101 10210 0
10 1
10 2
10 3
10 4
Node Degree (d)
Nod
e R
ank
Internet Topology and Power Laws
Node Degree: d = # connections
Router-Level Graph Autonomous System (AS) Graph
• A random variable X is said to follow a power law with index α > 0 if
• Has led to active research in degree-based network models
41
Degree-Based (Random) Graph Models• Basic Idea: traditional random graphs [Erdös & Renyí, 59]
do not produce power laws, so develop new models that explicitly attempt to match the observed (power law) distribution in node degree
• Preferential Attachment– Incremental growth + new nodes attach to high-degree
nodes – “Rich get richer”—power laws in asymptotic limit– Scale-free networks [Barabási & Albert, 99]– Generators: Inet, GPL, AB, BA, BRITE, CMU power-law
generator
• Expected Degree Sequence– Based on random graph models that skew probability
distribution to produce power laws in expectation– Power law random graph (PLRG) [Aiello et al., 00]– Generalized random graph (GRG) [Chung & Lu, 03]
42
Preferential Attachment
43
“Error tolerance”= Loss of random
node has little effect
“Attack vulnerability”= Targeted loss of
hub fragments network
“Scale-free” networks and the
“Achilles’ heel” of the InternetReference: R. Albert, H. Jeong,
and A.-L. Barabási. Attack and error tolerance of complex networks. Nature 406, 378-382, 2000.
node degree101
101
102
100
node
rank
44
Broad implications for the Internet and other networks
Power laws in network connectivity…⇔ Are necessary and sufficient for “scale-free structure”⇔ Imply critically connected “hubs”⇒ Create an Achilles’ heel vulnerability⇒ Yield a zero epidemic threshold for contagion
⇒Are evidence of fundamental self-organization in networks
⇒This self-organization is believed by some to be a universal feature of technological, biological, social and business networks
⇒Efforts to protect complex networks should focus on the most highly-connected components
45
• High degree central “hubs”• From random construction • Poor performance and
robustness
MESSAGE #3: Can construct networks that have the same node degree distribution but are OPPOSITES
otherwise
• Low degree core• Result of design• High performance
and robustness
46
SOX
U. Florida
U. So. Florida
Miss StateGigaPoP
WiscREN
SURFNet
MANLAN
NorthernCrossroads
Mid-AtlanticCrossroads
Drexel U.
NCNI/MCNC
MAGPI
UMD NGIX
Seattle
Sunnyvale
Los Angeles
Houston
DenverKansas
City Indian-apolis
Atlanta
Wash D.C.
Chicago
New York
OARNET
Northern Lights Indiana GigaPoP
MeritU. LouisvilleNYSERNet
U. Memphis
Great Plains
OneNet
U. Arizona
Qwest Labs
CHECS-NETOregon
GigaPoP
Front RangeGigaPoP
Texas Tech
Tulane U.
TexasGigaPoP
LaNetUT Austin
CENIC
UniNet
NISN
PacificNorthwestGigaPoP
U. Hawaii
PacificWave
TransPAC/APAN
Iowa St.
Florida A&MUT-SWMed Ctr.
SINetWPI
Star-Light
IntermountainGigaPoP
Abilene BackbonePhysical Connectivity
(as of August 2004)
0.1-0.5 Gbps0.5-1.0 Gbps1.0-5.0 Gbps5.0-10.0 Gbps
DREN
Jackson St.
NRENUSGS
U. So. Miss.
PSC
DARPABossNet
SFGP/AMPATH
Arizona St.
ESnet
GEANT
North TexasGigaPoP
47
Cisco 750XCisco 12008Cisco 12410
dc1
dc2
dc3
hpr
dc1dc3
hpr
dc2
dc1
dc1 dc2
hpr
hpr
SACOAK
SVL
LAX
SDG
SLOdc1
FRGdc1
FREdc1
BAKdc1
TUSdc1
SOLdc1
CORdc1
hprdc1
dc2
dc3
hpr
OC-3 (155 Mb/s)OC-12 (622 Mb/s)GE (1 Gb/s)OC-48 (2.5 Gb/s)10GE (10 Gb/s)
CENIC Backbone (as of January 2004)
AbileneLos Angeles
AbileneSunnyvale
The Corporation for Education Network Initiatives in California (CENIC) acts as ISP for the state's colleges and universitieshttp://www.cenic.org
Like Abilene, its backbone is a sparsely-connected mesh, with relatively low connectivity and minimal redundancy.• no high-degree hubs?• no Achilles’ heel?
48
Cisco 12000 Series Routers
80 Gbps41/812404
120 Gbps61/412406
200 Gbps101/212410
320 Gbps16Full12416
Switching CapacitySlotsRack sizeChassis
• Modular in design, creating flexibility in configuration.• Router capacity is constrained by the number and speed of line
cards inserted in each slot.
Source: www.cisco.com
4910 0 10 1 10 2Degree
10 -1
100
101
102
103
Ban
dwid
th (G
bps)
15 x 1-port 10 GE
15 x 3-port 1 GE
15 x 4-port OC12
15 x 8-port FE
Technology constraint
Total Bandwidth
Router Technology ConstraintCisco 12416 GSR, circa 2002
high BW low degree
high degree low BW
50
Router Deployment: Abilene and CENIC
1.E+01
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
1.E+00 1.E+01 1.E+02 1.E+03Router Degree
Tota
l BW
(Mbp
s)
Abilene T640sJuniper T640 Feasible RegionCENIC 12410sCisco 12410 Feasible RegionCENIC 12008sCisco 12008 Feasible RegionCENIC 750XsCisco 750X Feasible Region
51node degree
101
101
102
100
node
rank
Preferential Attachment
52
SOX
SFGP/AMPATH
U. Florida
U. So. Florida
Miss StateGigaPoP
WiscREN
SURFNet
Rutgers U.
MANLAN
NorthernCrossroads
Mid-AtlanticCrossroads
Drexel U.
U. Delaware
PSC
NCNI/MCNC
MAGPI
UMD NGIX
DARPABossNet
GEANT
Seattle
Sunnyvale
Los Angeles
Houston
DenverKansas
City Indian-apolis
Atlanta
Wash D.C.
Chicago
New York
OARNET
Northern LightsIndiana GigaPoP
MeritU. Louisville
NYSERNet
U. Memphis
Great Plains
OneNetArizona St.
U. Arizona
Qwest Labs
UNM
OregonGigaPoP
Front RangeGigaPoP
Texas Tech
Tulane U.
North TexasGigaPoP
TexasGigaPo
P
LaNet
UT Austin
CENIC
UniNet
WIDE
AMES NGIX
PacificNorthwestGigaPoP
U. Hawaii
PacificWave
ESnet
TransPAC/APAN
Iowa St.
Florida A&MUT-SWMed Ctr.
NCSA
MREN
SINet
WPI
StarLight
IntermountainGigaPoP
Abilene BackbonePhysical Connectivity
0.1-0.5 Gbps0.5-1.0 Gbps1.0-5.0 Gbps5.0-10.0 Gbps
Abilene-inspired
53
MESSAGE #4: Importance of model validation
• Descriptive modeling that replicates statistical features is no more than an exercise in “data fitting”
• Matching given graph statistics should be a by-product and not a main focus of modeling
• Emphasis on “closing the loop” (using complementary measurements and domain expertise)
54
• #1: Specify which aspect of Internet connectivity you are interested in
• #2: Internet connectivity measurements should never be taken at face value
• #3: Be aware of specious analysis of specious measurements
• #4: It is often easy to construct networks that agree with respect to certain graph statistics (e.g., same node degree distribution) but are otherwise completely different
• #5: Importance of model validation
Take-Home Messages
55
Some open Questions• ISPs design/evolve their networks for a purpose (see
talk on Friday)– What is the purpose?– What are the constraints?– Where does randomness enter?
• What does the AS-level Internet as a whole try to achieve?– Objective, constraints, uncertainty?
• What is the purpose of a P2P network like Gnutella?– Objective, constraints, uncertainty?
• What does the Web graph as a whole try to achieve?– Hopeless, rely on randomness as main driver
• What about social networks?– An engineering perspective of social networks?
56
http://hot.caltech.edu/topology.html• L. Li, D. Alderson, J.C. Doyle, W. Willinger. Toward a Theory of Scale-
Free Networks: Definition, Properties, and Implications. Internet Mathematics 2(4), 2006.
• D. Alderson, L. Li, W. Willinger, J.C. Doyle. Understanding Internet Topology: Principles, Models, and Validation. ACM/IEEE Trans. on Networking 13(6), 2005.
• J.C. Doyle, D. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W. Willinger. The "robust yet fragile" nature of the Internet. PNAS 102(41), 2005.
• D. Alderson and W. Willinger. A contrasting look at self-organization in the Internet and next-generation communication networks. IEEE Comm. Magazine. July 2005.
• W. Willinger, D Alderson, J.C. Doyle, and L. Li, More “normal” than Normal: scaling distributions in complex systems. Proc. Winter Simulation Conf. 2004.
• W. Willinger, D Alderson, and L. Li, A pragmatic approach to dealing with high-variability in network measurements, Proc. ACM SIGCOMM IMC 2004.
• L. Li, D. Alderson, W. Willinger, and J. Doyle, A first-principles approach to understanding the Internet’s router-level topology, Proc. ACM SIGCOMM 2004.
Top Related