1
Algorithms for Large Data Sets
Ziv Bar-Yossef
Lecture 7
May 14, 2006
http://www.ee.technion.ac.il/courses/049011
3
Outline
Power laws
The preferential attachment model
Small-world networks
The Watts-Strogatz model
4
Observed Phenomena
Few multi-billionaires, but many with modest income [Pareto, 1896]
Few frequent words, but many infrequent words [Zipf, 1932]
Few “mega-cities” but many small towns [Zipf, 1949]
Few web pages with high degree, but many with low degree [Kumar et al, 99] [Barabási & Albert, 99]
All the above obey power laws.
5
Power Law (Pareto) Distribution
Pr[X ≥ x] = (k/x)^α for x ≥ k
α > 0: shape parameter (“slope”)
k > 0: location parameter
Ex: (k = $1000, α = 2)
1/100 earn ≥ $10,000
1/10,000 earn ≥ $100,000
1/1,000,000 earn ≥ $1,000,000
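The income example can be checked numerically. The following is a minimal sketch, assuming the standard Pareto tail Pr[X ≥ x] = (k/x)^α for x ≥ k; the function name `pareto_tail` is chosen here for illustration only.

```python
def pareto_tail(x: float, k: float, alpha: float) -> float:
    """Pr[X >= x] for a Pareto distribution (assumed form: (k/x)**alpha)."""
    if x <= k:
        return 1.0  # every sample is at least k
    return (k / x) ** alpha


# Reproduce the example with k = $1000 and alpha = 2.
for income in (10_000, 100_000, 1_000_000):
    fraction = pareto_tail(income, k=1000, alpha=2)
    print(f"earn >= ${income:,}: about 1 in {round(1 / fraction):,}")
```

Each tenfold increase in income divides the fraction by 10^α = 100, which is the pattern in the example.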
8
Scale-Free Distributions
Power laws are invariant to scale
Ex: (k = arbitrary, α = 2)
1/100 earn ≥ 10k
1/10,000 earn ≥ 100k
1/1,000,000 earn ≥ 1000k
9
Heavy Tailed Distributions
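In many “classical” distributions, the tail probability Pr[X ≥ x] decays at least exponentially fast in x (“light tail”)
Ex: normal, exponential
In power law distributions, Pr[X ≥ x] decays only polynomially in x (“heavy tail”)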
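The difference between the two kinds of tail shows up quickly in numbers. The sketch below compares an exponential distribution with a power law; the rate 1, the location k = 1 and the shape α = 2 are arbitrary values picked for illustration.

```python
import math


def exponential_tail(x: float, rate: float = 1.0) -> float:
    """Pr[X >= x] for an exponential distribution: a light tail."""
    return math.exp(-rate * x)


def pareto_tail(x: float, k: float = 1.0, alpha: float = 2.0) -> float:
    """Pr[X >= x] for a power law (assumed form (k/x)**alpha): a heavy tail."""
    return 1.0 if x <= k else (k / x) ** alpha


for x in (10, 100, 1000):
    print(f"x={x:>5}  exponential={exponential_tail(x):.3e}  "
          f"power law={pareto_tail(x):.3e}")
```

At x = 100 the power-law tail is still 1 in 10,000, while the exponential tail is already far below any frequency that could be observed in practice.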
10
Zipf’s Law
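Size of the r-th largest city is proportional to 1/r^θ, for a constant θ > 0
Equivalent to a power law:
X = size of a city
Change variables: a city has rank ≤ r exactly when its size is ≥ s = c/r^θ, so Pr[X ≥ s] ∝ r ∝ s^(−1/θ), a power law with α = 1/θ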
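The change of variables can be illustrated on synthetic data. This is a sketch under stated assumptions: city sizes follow the rank rule size(r) = c / r^θ exactly, and the symbols `c` and `theta`, the thresholds and the function names are all chosen here for illustration. The empirical tail exponent should then come out close to 1/θ.

```python
import math


def zipf_sizes(n: int, c: float = 1_000_000.0, theta: float = 1.0) -> list:
    """Sizes of n cities where the r-th largest has size c / r**theta."""
    return [c / r ** theta for r in range(1, n + 1)]


def tail_fraction(sizes: list, s: float) -> float:
    """Empirical Pr[X >= s] for a uniformly random city X."""
    return sum(1 for x in sizes if x >= s) / len(sizes)


def tail_exponent(sizes: list, s_low: float, s_high: float) -> float:
    """Negated log-log slope of the empirical tail between two sizes."""
    p_low = tail_fraction(sizes, s_low)
    p_high = tail_fraction(sizes, s_high)
    return -(math.log(p_high) - math.log(p_low)) / (math.log(s_high) - math.log(s_low))


sizes = zipf_sizes(100_000, theta=0.5)
# With theta = 0.5 the estimate should be near 1 / theta = 2.
print(tail_exponent(sizes, s_low=15_000.0, s_high=150_000.0))
```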
11
Power Laws and the Internet
Web graph
  In- and out-degrees (in slope: ~2.1, out slope: ~2.7) [Kumar et al. 99, Barabási & Albert 99, Broder et al 00]
  Sizes of connected components [Broder et al 00]
  Website sizes [Huberman & Adamic 99]
Internet graph
  Degrees [Faloutsos, Faloutsos & Faloutsos 99]
  Eigenvalues [Mihail & Papadimitriou 02]
Traffic
  Number of visits to websites
12
Power Laws and Graphs
If X is a random web page, then the degree of X follows a power law (in-degree slope ~2.1, out-degree slope ~2.7)
What random graph model explains this phenomenon?
13
Erdős-Rényi Random Graphs
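G(n,p)
n: size of the graph (fixed)
p: edge existence probability (fixed)
Every pair u,v is connected by an edge with probability p, independently of all other pairs.
Theorem [Erdős & Rényi, 60]
For any node x in G(n,p), the degree of x is binomially distributed: Pr[deg(x) = k] = C(n−1, k) p^k (1−p)^(n−1−k). The probability of a high degree therefore decays exponentially, not as a power law.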
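To see why this model does not produce power-law degrees, one can sample a graph and compare its degree histogram with the binomial distribution Binomial(n−1, p). The sketch below uses only the standard library; the values n = 1000 and p = 0.01 and the function names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def gnp_degrees(n: int, p: float, rng: random.Random) -> list:
    """Sample G(n,p) and return the degree of every node."""
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each pair is an edge independently
                deg[u] += 1
                deg[v] += 1
    return deg


def binomial_pmf(k: int, m: int, p: float) -> float:
    """Pr[Binomial(m, p) = k]."""
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)


n, p = 1000, 0.01
deg = gnp_degrees(n, p, random.Random(0))
hist = Counter(deg)
for k in (5, 10, 15, 20, 30):
    print(f"k={k:>2}  observed={hist[k] / n:.4f}  binomial={binomial_pmf(k, n - 1, p):.4f}")
print("max degree:", max(deg))
```

The maximum degree stays within a small multiple of the mean (n−1)p, unlike in the web graph, where a few pages have very large degree.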
14
Preferential Attachment [Barabási & Albert 99]
A novel random graph model
Initialization: graph starts with a single node with two self loops.
Growth: At every step a new node v is added to the graph. v has a self loop and connects to one neighbor.
Preferential attachment: v connects to u with probability proportional to the degree of u: Pr[v → u] = deg(u) / Σ_w deg(w)
The rich get richer / The winner takes it all
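The three rules above can be simulated in a few lines. This is a sketch under explicit assumptions that the slide does not settle: edges are treated as directed, a node's weight is its in-degree with each self loop counted once, and the new node picks its neighbor among the nodes that existed before it. The function name is illustrative.

```python
import random
from collections import Counter


def preferential_attachment(steps: int, rng: random.Random) -> list:
    """Return the in-degree of every node after `steps` growth steps.

    Assumption: attachment probability is in-degree / total in-degree.
    """
    indeg = [2]       # node 0 starts with two self loops
    targets = [0, 0]  # node u appears indeg[u] times in this list
    for v in range(1, steps + 1):
        u = rng.choice(targets)  # Pr[u] is proportional to indeg[u]
        indeg[u] += 1            # edge v -> u
        targets.append(u)
        indeg.append(1)          # v's own self loop
        targets.append(v)
    return indeg


indeg = preferential_attachment(20_000, random.Random(0))
counts = Counter(indeg)
n = len(indeg)
for k in (1, 2, 4, 8, 16):
    print(f"fraction with in-degree {k}: {counts[k] / n:.4f}")
print("mean in-degree:", sum(indeg) / n, " max in-degree:", max(indeg))
```

Sampling from the `targets` list is a simple way to draw a node with probability proportional to its degree without recomputing the normalizing sum. The mean in-degree stays near 2 while the maximum is far larger, which is the "rich get richer" effect.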
16
Why Does it Work? (2)
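Fact: After sufficiently many steps, the fraction of nodes with degree k reaches a “steady state”.
c_k = value of this fraction at the steady state.
Since the fraction no longer changes at steady state, the expected gain and loss of degree-k nodes per step must balance.
Hence, c_k satisfies a recurrence in terms of c_(k−1).
Therefore: c_k decays polynomially in k, i.e., the degrees obey a power law.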
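A steady-state argument of this kind can be made concrete under one specific reading of the model. The recurrence below is an assumption introduced here, not taken from the slide: if a new node attaches to u with probability indeg(u) / 2t after t steps (self loops counted once), balancing the expected gain and loss of nodes with in-degree k gives c_1 = 2/3 and c_k = c_(k−1) · (k−1)/(k+2). The sketch solves it exactly and checks the closed form 4 / (k(k+1)(k+2)).

```python
from fractions import Fraction


def steady_state(k_max: int) -> dict:
    """Solve the assumed recurrence c_k = c_(k-1) * (k-1)/(k+2), c_1 = 2/3."""
    c = {1: Fraction(2, 3)}
    for k in range(2, k_max + 1):
        c[k] = c[k - 1] * Fraction(k - 1, k + 2)
    return c


c = steady_state(1000)

# Closed form of the recurrence: c_k = 4 / (k (k+1) (k+2)).
assert all(c[k] == Fraction(4, k * (k + 1) * (k + 2)) for k in c)

# c_k * k**3 approaches the constant 4, so c_k decays like k**-3.
for k in (10, 100, 1000):
    print(k, float(c[k] * k ** 3))
```

Under this reading the degree distribution is a power law with exponent 3.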
18
Six Degrees of Separation [Stanley Milgram, 67]
“Random starters” in Nebraska, Kansas, etc.
Destinations: in Boston
Intermediaries send postcards to Milgram
Findings: average of 6 postcards
“Conclusion”: every two people in the US are connected by a path of length ~6
19
Small-World Networks
Average diameter: length of shortest path from u to v, averaged over all pairs u,v
Clustering coefficient: fraction of pairs of neighbors of v that are neighbors of each other, averaged over all v
Small-world network: a sparse graph with average diameter O(log n) and a constant clustering coefficient
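Both quantities are easy to compute on a small graph. The sketch below stores an undirected graph as a dictionary of neighbor sets and makes two assumptions that the definitions leave open: a vertex with fewer than two neighbors contributes 0 to the clustering coefficient, and distances are averaged over reachable ordered pairs only.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj: dict) -> float:
    """Average over v of the fraction of neighbor pairs of v that are adjacent."""
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:  # vertices with < 2 neighbors contribute 0 (assumption)
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)


def average_distance(adj: dict) -> float:
    """Mean shortest-path length over reachable ordered pairs, via BFS."""
    total = count = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(g))  # (1 + 1 + 1/3 + 0) / 4
print(average_distance(g))        # 16 / 12
```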
20
The Web as a Small World Network
Low diameter
  Study of a synthetic web graph model [Albert, Jeong, Barabási 99]
    Average diameter of the Web is ~19
    Grows logarithmically with size of the Web
  Study of a large crawl [Broder et al 00]
    Average diameter of the SCC is ~16
    Maximum diameter of the SCC is ≥ 28
  Diameter of host graph [Adamic 99]
    Average diameter of SCC: ~4
High clustering coefficient
  Clustering coefficient of host graph [Adamic 99]
    Clustering coefficient: ~0.08 (compared to 0.001 in a comparable random graph)
21
Model for Small-World Networks [Watts & Strogatz 98]
One extreme: random networks
  Low diameter
  Low clustering coefficient
Other extreme: “regular” networks (e.g., a lattice)
  High clustering coefficient
  High diameter
Small-world: interpolation between the two
  Low diameter
  High clustering coefficient
Regularity: social networking
Randomness: individual interests
22
Random Network
The model:
  n vertices
  Every pair u,v is connected by an edge with probability p = d/n
Properties:
  Expected number of edges: ~dn/2
  Graph is connected w.h.p. (when d ≥ c ln n for a constant c > 1)
  Diameter: O(log n) w.h.p.
  Clustering coefficient: ~p = d/n = o(1)
23
Ring Lattice
The model:
  n vertices on a circle
  Every vertex has d neighbors: the d/2 vertices to its right and the d/2 vertices to its left
Properties:
  Number of edges: dn/2
  Graph is connected
  Diameter: O(n/d)
  Clustering coefficient: 3(d−2)/(4(d−1)), which tends to 3/4 as d grows
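The ring lattice properties can be verified directly on a small instance. The sketch below assumes d is even and n is much larger than d, and compares the measured clustering coefficient with the known value 3(d−2)/(4(d−1)) for this lattice. The function names and the choice n = 200, d = 6 are illustrative.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n: int, d: int) -> dict:
    """Each vertex is joined to its d/2 nearest vertices on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            u = (v + i) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def clustering_coefficient(adj: dict) -> float:
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)


def diameter(adj: dict) -> int:
    """Largest shortest-path length, found by BFS from every vertex."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


n, d = 200, 6
g = ring_lattice(n, d)
print("edges:", sum(len(s) for s in g.values()) // 2, "expected:", d * n // 2)
print("clustering:", clustering_coefficient(g), "formula:", 3 * (d - 2) / (4 * (d - 1)))
print("diameter:", diameter(g))
```

The diameter grows linearly in n/d because each hop advances at most d/2 positions around the circle.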
24
Random Rewiring
Start from a ring lattice
for i = 1 to d/2 do
  for v = 1 to n do
    Pick i-th clockwise nearest neighbor of v
    With probability p, replace this neighbor by a random vertex
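The rewiring loop translates almost line by line into code. The sketch below makes two choices that the pseudocode leaves open and that are assumptions here: the replacement vertex is drawn uniformly among vertices that are neither v itself nor already adjacent to v, so no self loops or duplicate edges appear, and vertices are numbered from 0. The function names are illustrative.

```python
import random


def ring_lattice(n: int, d: int) -> dict:
    """Each vertex is joined to its d/2 nearest vertices on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            u = (v + i) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def rewire(adj: dict, d: int, p: float, rng: random.Random) -> dict:
    """Rewire, in place, each lattice edge (v, v+i) with probability p."""
    n = len(adj)
    for i in range(1, d // 2 + 1):
        for v in range(n):
            if rng.random() >= p:
                continue  # keep the lattice edge
            u = (v + i) % n  # i-th clockwise nearest neighbor of v
            # Assumption: avoid self loops and duplicate edges.
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if not candidates:
                continue
            w = rng.choice(candidates)
            adj[v].remove(u)
            adj[u].remove(v)
            adj[v].add(w)
            adj[w].add(v)
    return adj


g = rewire(ring_lattice(100, 6), d=6, p=0.3, rng=random.Random(1))
print("edges after rewiring:", sum(len(s) for s in g.values()) // 2)
```

Every rewiring removes one edge and adds one, so the graph stays sparse with exactly dn/2 edges for any p.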
25
Analysis
If p = 0, ring lattice
  High clustering coefficient
  High diameter
If p = 1, random network
  Logarithmic diameter
  Low clustering coefficient
However,
  Diameter goes down rapidly as p grows
  Clustering coefficient goes down slowly as p grows
Therefore, for small p, we get a small-world network.
  Logarithmic diameter
  High clustering coefficient
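The two trends can be observed by sweeping p on a small instance. The sketch below is self-contained and rests on assumptions made here: rewiring avoids self loops and duplicate edges, average distance is taken over reachable pairs, and the size n = 200, d = 6 and the seed are arbitrary. It prints average distance and clustering relative to their values at p = 0.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, d):
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for i in range(1, d // 2 + 1):
            adj[v].add((v + i) % n)
            adj[(v + i) % n].add(v)
    return adj


def rewire(adj, d, p, rng):
    n = len(adj)
    for i in range(1, d // 2 + 1):
        for v in range(n):
            if rng.random() >= p:
                continue
            u = (v + i) % n
            candidates = [w for w in range(n) if w != v and w not in adj[v]]
            if candidates:
                w = rng.choice(candidates)
                adj[v].remove(u)
                adj[u].remove(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


def clustering_coefficient(adj):
    total = 0.0
    for nbrs in adj.values():
        pairs = list(combinations(nbrs, 2))
        if pairs:
            total += sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    return total / len(adj)


def average_distance(adj):
    total = count = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


def measure(n, d, p, seed):
    g = rewire(ring_lattice(n, d), d, p, random.Random(seed))
    return average_distance(g), clustering_coefficient(g)


n, d = 200, 6
base_len, base_cc = measure(n, d, 0.0, seed=0)
for p in (0.0, 0.01, 0.05, 0.2, 1.0):
    length, cc = measure(n, d, p, seed=0)
    print(f"p={p:<5} L/L0={length / base_len:.2f}  C/C0={cc / base_cc:.2f}")
```

A handful of random shortcuts is enough to shorten most paths, while each vertex's neighborhood remains mostly intact, so the distance ratio falls well before the clustering ratio does.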