Stanford University ://snap.stanford.edu/class/cs224w-2010/slides/11-power_laws_… · In social...

41
CS224W: Social and Information Network Analysis, Jure Leskovec, Stanford University, http://cs224w.stanford.edu

Transcript of Stanford University ://snap.stanford.edu/class/cs224w-2010/slides/11-power_laws_… · In social...

Page 1:

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University

http://cs224w.stanford.edu

Page 2:

10/25/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Page 3:

[Faloutsos, Faloutsos and Faloutsos, 1999]

Internet domain topology

Page 4:

[Barabasi-Albert, 1999]

[Figure panels: Power grid; Web graph; Actor collaborations]

Page 5:

[Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]

Page 6:

[Leskovec et al., KDD '08]

Take a real network and plot a histogram of $p_k$ vs. $k$

Flickr social network: n = 584,207, m = 3,555,115

Page 7:

[Leskovec et al., KDD '08]

Plot the same data on log-log axes:

Flickr social network: n = 584,207, m = 3,555,115
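The histogram of $p_k$ vs. $k$ described on the two slides above can be sketched as follows. This is a minimal illustration, not the code used for the Flickr plots; the function name `degree_distribution` and the toy edge list are assumptions introduced here.

```python
from collections import Counter

def degree_distribution(edges):
    """Return {k: p_k}: the fraction of nodes that have degree k.

    `edges` is an undirected edge list; only nodes that appear in at
    least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    counts = Counter(degree.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical toy graph: a star with hub 0 plus one extra edge.
toy_edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
pk = degree_distribution(toy_edges)
```

Plotting `pk` with both axes logarithmic (for example by taking the logarithm of each `k` and each `p_k`) gives the log-log view; a power law then appears as a roughly straight line.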

Page 8:

Degrees are heavily skewed:
Distribution P(X>x) is heavy tailed if:
$\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$ for every $\lambda > 0$

Page 9:

[Clauset-Shalizi-Newman 2007]

Power-law vs. exponential on log-log scales

Page 10:

[Clauset-Shalizi-Newman 2007]

Various names, kinds and forms: Long tail, Heavy tail, Zipf's law, Pareto's law

P(x) is proportional to:

Page 11:

In social systems, lots of power-laws:
  Pareto, 1897: Wealth distribution
  Lotka, 1926: Scientific output
  Yule, 1920s: Biological taxa and subtaxa
  Zipf, 1940s: Word frequency
  Simon, 1950s: City populations

Page 12:

[Clauset-Shalizi-Newman 2007]

Many other quantities follow heavy-tailed distributions

Page 13:

[Chris Anderson, Wired, 2004]

Page 14:

CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009

Page 15:

Power-law degree exponent is typically $2 < \alpha < 3$
  Web graph: $\alpha_{in} = 2.1$, $\alpha_{out} = 2.4$ [Broder et al. 00]
  Autonomous systems: $\alpha = 2.4$ [Faloutsos^3, 99]
  Actor-collaborations: $\alpha = 2.3$ [Barabasi-Albert 00]
  Citations to papers: $\alpha \approx 3$ [Redner 98]
  Online social networks: $\alpha \approx 2$ [Leskovec et al. 07]

Page 16:

[Clauset-Shalizi-Newman 2007]

What is the normalizing constant?
$P(x) = c\,x^{-\alpha}$, $c = ?$
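One way to work out the constant, sketched under the assumption of a continuous power law that holds for $x \ge x_{min}$ (the lower cutoff $x_{min}$ is introduced here for the derivation and does not appear in the extracted slide text), and assuming $\alpha > 1$ so that the integral converges:

```latex
\begin{aligned}
1 &= \int_{x_{min}}^{\infty} c\,x^{-\alpha}\,dx
   = c\left[\frac{x^{1-\alpha}}{1-\alpha}\right]_{x_{min}}^{\infty}
   = \frac{c\,x_{min}^{\,1-\alpha}}{\alpha-1} \\
\Rightarrow\quad c &= (\alpha-1)\,x_{min}^{\,\alpha-1},
\qquad
P(x) = \frac{\alpha-1}{x_{min}}\left(\frac{x}{x_{min}}\right)^{-\alpha}
\end{aligned}
```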

Page 17:

[Clauset-Shalizi-Newman 2007]

What's the expectation of a power-law random variable?
E[x] =
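A sketch of the calculation, assuming the continuous density $P(x) = c\,x^{-\alpha}$ on $x \ge x_{min}$ with $c = (\alpha-1)\,x_{min}^{\alpha-1}$ (the cutoff $x_{min}$ is an assumption introduced for this derivation):

```latex
\begin{aligned}
E[x] &= \int_{x_{min}}^{\infty} x \cdot c\,x^{-\alpha}\,dx
      = c\left[\frac{x^{2-\alpha}}{2-\alpha}\right]_{x_{min}}^{\infty} \\
     &= \frac{\alpha-1}{\alpha-2}\,x_{min}
      \quad\text{if } \alpha > 2,
      \qquad
      E[x] = \infty \quad\text{if } \alpha \le 2
\end{aligned}
```

The bracket only vanishes at infinity when $2 - \alpha < 0$, which is where the condition $\alpha > 2$ for a finite mean comes from.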

Page 18:

Power-laws: Infinite moments!
  If $\alpha \le 2$: $E[x] = \infty$
  If $\alpha \le 3$: $Var[x] = \infty$

Sample average of n samples from a power-law with exponent $\alpha$:
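To get a feel for how the sample average behaves, one can draw synthetic samples. The sketch below assumes a continuous power law on $x \ge x_{min}$ and uses inverse-transform sampling; the function names and the default `x_min = 1.0` are illustrative choices, not taken from the slides.

```python
import random

def sample_power_law(alpha, x_min=1.0, rng=random):
    """Draw one sample from p(x) ~ x^(-alpha), x >= x_min (requires alpha > 1).

    Inverts the complementary CDF P(X > x) = (x / x_min)^(-(alpha - 1)).
    """
    u = rng.random()  # uniform in [0, 1), so 1 - u lies in (0, 1]
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def sample_mean(alpha, n, seed=0):
    """Average of n independent samples, with a fixed seed for repeatability."""
    rng = random.Random(seed)
    return sum(sample_power_law(alpha, rng=rng) for _ in range(n)) / n
```

For $\alpha > 3$ the average settles near $(\alpha-1)/(\alpha-2)$ as n grows. For $\alpha \le 2$ it keeps drifting upward, because a few enormous samples dominate the sum.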

Page 19:

[Clauset-Shalizi-Newman 2007]

Estimating $\alpha$ from data:
1. Fit a line on log-log axis using least squares: BAD!

Page 20:

[Clauset-Shalizi-Newman 2007]

Estimating $\alpha$ from data:
2. Plot the Complementary CDF P(X>x): OK
  Then $\alpha = 1 + \alpha'$, where $\alpha'$ is the (absolute) slope of P(X>x) on log-log axes.
  I.e., if $P(X=x) \propto x^{-\alpha}$ then $P(X>x) \propto x^{-(\alpha-1)}$
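A minimal sketch of how the empirical complementary CDF could be computed from a list of observed degrees before plotting it on log-log axes. The helper name `empirical_ccdf` is an assumption made for illustration.

```python
from collections import Counter

def empirical_ccdf(values):
    """Return [(x, P(X > x))] for each distinct observed value x, in increasing order."""
    n = len(values)
    counts = Counter(values)
    remaining = n  # number of observations not yet passed
    ccdf = []
    for x in sorted(counts):
        remaining -= counts[x]  # observations strictly greater than x
        ccdf.append((x, remaining / n))
    return ccdf

# Hypothetical toy degree sequence.
points = empirical_ccdf([1, 1, 2, 3])
```

The largest observed value always gets P(X > x) = 0, which cannot be drawn on a logarithmic axis, so that last point is usually dropped before plotting.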

Page 21:

[Clauset-Shalizi-Newman 2007]

Estimating power-law exponent $\alpha$ from data:
3. Use MLE: Best
  $\hat{\alpha} = 1 + n\left[\sum_{i=1}^{n} \ln\frac{x_i}{x_{min}}\right]^{-1}$, where $x_i$ is the degree of node i
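The maximum-likelihood estimate can be sketched in a few lines. This uses the continuous-data estimator $\hat{\alpha} = 1 + n\,[\sum_i \ln(x_i/x_{min})]^{-1}$ from Clauset-Shalizi-Newman, which is only an approximation for integer degrees. The function name, the default `x_min`, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def mle_exponent(xs, x_min=1.0):
    """Continuous power-law MLE for alpha, using only the samples with x >= x_min.

    Raises ZeroDivisionError if every retained sample equals x_min.
    """
    tail = [x for x in xs if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum

# Synthetic check: draw from a known power law (inverse-transform sampling)
# and see whether the estimator recovers the exponent.
rng = random.Random(0)
true_alpha = 2.5
synthetic = [(1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0)) for _ in range(50000)]
alpha_hat = mle_exponent(synthetic)
```

On real degree data the choice of `x_min` matters a great deal, since the power law often holds only in the tail.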

Page 22:

[Figure panels: Linear scale; Log scale, α=1.75; CCDF, Log scale, α=1.75; CCDF, Log scale, α=1.75, exp. cutoff]

Page 23:

Not well characterized by the mean:
  Avg. U.S. city size: 165k, StdDev = 410k
  If human heights in the US were power-law: expect to have 60k people as tall as 2.72m (world record), 10k people as tall as a giraffe, 1 person as tall as the Empire State Building

Cannot arise from sums of independent events
  Recall: in $G_{np}$ each pair of nodes is connected independently with prob. p
  X … degree of node v, $X_w$ … event that w links to v
  $X = \sum_w X_w$, $E[X] = \sum_w E[X_w] = (n-1)p$
  Now what is Pr[X=k]?
  Central limit theorem: $x_1, \ldots, x_n$: rnd. vars with mean $\mu$, var $\sigma^2$
  $S_n = \sum_{i=1}^{n} X_i$: $E[S_n] = n\mu$, $var[S_n] = n\sigma^2$, $std.dev[S_n] = \sigma\sqrt{n}$
  $P[S_n = E[S_n] + x \cdot std.dev(S_n)] \sim \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
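As a small illustration of the $G_{np}$ argument: the degree of a node is a sum of n-1 independent coin flips, so Pr[X=k] is Binomial(n-1, p). The sketch below evaluates that probability directly; the function name and the parameter values are assumptions chosen for the example.

```python
import math

def gnp_degree_pmf(k, n, p):
    """Pr[degree = k] for a node in G(n, p): Binomial(n - 1, p)."""
    return math.comb(n - 1, k) * p**k * (1 - p) ** (n - 1 - k)

n, p = 50, 0.1
pmf = [gnp_degree_pmf(k, n, p) for k in range(n)]
mean_degree = sum(k * pk for k, pk in enumerate(pmf))
```

Here `mean_degree` matches $(n-1)p$, and the probabilities fall off faster than exponentially for degrees well above the mean, which is why $G_{np}$ does not produce the hubs seen in heavy-tailed degree distributions.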

Page 24:

Random network (Erdos-Renyi random graph): degree distribution is Binomial
Scale-free (power-law) network: degree distribution is Power-law

Function is scale free if: $f(ax) = c\,f(x)$

Page 25:

What is a good model that gives rise to power-law degree distributions?

What is the analog of the central limit theorem for power-laws?

Page 26:

Preferential attachment [Price 1965, Albert-Barabasi 1999]:
  Nodes arrive in order
  A new node j creates m out-links
  Prob. of linking to a previous node i is proportional to its degree $d_i$:
  $P(j \to i) = \frac{d_i}{\sum_k d_k}$

Page 27:

New nodes are more likely to link to nodes that already have high degree

Herbert Simon's result: Power-laws arise from "Rich get richer" (cumulative advantage)

Examples [Price 65]:
  Citations: new citations of a paper are proportional to the number it already has

Page 28:

[Mitzenmacher, '03]

Pages are created in order 1,2,3,…,n
When node j is created it makes a single link to an earlier node i chosen:
1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
2) With prob. 1-p, node j chooses node i uniformly at random and links to the node i points to.

Note this is the same as saying:
2) With prob. 1-p, node j links to node u with prob. proportional to $d_u$ (the degree of u)
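The two-step process above can be simulated directly. The sketch below is one possible reading of the model, with two assumptions that the slide does not specify: node 2 links to node 1 to get the process started, and when the copy step picks node 1 (which has no out-link) the new node links to node 1 itself. The function name is hypothetical.

```python
import random
from collections import Counter

def simulate_copy_model(n, p, seed=0):
    """Grow a graph in which each new node makes a single out-link.

    Returns (target, in_degree): target[j] is the node that j links to,
    and in_degree[i] is the number of links pointing at i.
    """
    rng = random.Random(seed)
    target = {2: 1}              # assumption: bootstrap edge 2 -> 1
    in_degree = Counter({1: 1})
    for j in range(3, n + 1):
        i = rng.randint(1, j - 1)               # uniform over earlier nodes
        if rng.random() >= p and i in target:   # with prob. 1-p, copy i's link
            i = target[i]
        target[j] = i
        in_degree[i] += 1
    return target, in_degree

target, in_degree = simulate_copy_model(n=10000, p=0.1)
```

Feeding `in_degree.values()` into a degree histogram or CCDF plot on log-log axes is a quick way to see the heavy tail that the model produces.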

Page 29:

Claim: The described model generates networks where the fraction of nodes with degree k scales as:

$P(d_i = k) \propto k^{-\left(1 + \frac{1}{q}\right)}$

where q = 1-p

Page 30:

Degree $d_i(t)$ of node i (i=1,2,…,n) is a continuous quantity and it grows deterministically as a function of time t

Analyze $d_i(t)$: continuous degree of node i at time $t \ge i$

Page 31:

Initial condition: $d_i(t) = 0$ when t = i (i just arrived)

Expected change of $d_i(t)$ over time:
  Node i gains an in-link at step t+1 only if a link from a newly created node t+1 points to it.
  What's the prob. of this event?
  With prob. p node t+1 links to a random node: links to i with prob. 1/t
  With prob. 1-p node t+1 links preferentially: links to i with prob. $d_i(t)/t$

So: prob. node t+1 links to i is:

$p\,\frac{1}{t} + (1-p)\,\frac{d_i(t)}{t}$

Page 32:

With q = 1-p:

$\frac{d\,d_i(t)}{dt} = p\,\frac{1}{t} + q\,\frac{d_i(t)}{t}$

$\frac{1}{p + q\,d_i}\,d d_i = \frac{1}{t}\,dt$

$\frac{1}{q}\ln(p + q\,d_i) = \ln t + c$

$d_i(t) = \frac{1}{q}\left(A\,t^{q} - p\right)$

Page 33:

We know: $d_i(i) = 0$

$d_i(i) = \frac{1}{q}\left(A\,i^{q} - p\right) = 0 \;\Rightarrow\; A = \frac{p}{i^{q}}$

$d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^{q} - 1\right)$
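As a sanity check on the closed form $d_i(t) = \frac{p}{q}\left((t/i)^q - 1\right)$ with q = 1-p, one can integrate the rate equation $\frac{d\,d_i}{dt} = \frac{p + q\,d_i}{t}$ numerically from the initial condition $d_i(i) = 0$ and compare. The sketch uses a plain forward-Euler step; the function names and the parameter values are assumptions made for illustration.

```python
def degree_closed_form(t, i, p):
    """d_i(t) = (p/q) * ((t/i)^q - 1), with q = 1 - p."""
    q = 1.0 - p
    return (p / q) * ((t / i) ** q - 1.0)

def degree_euler(t_end, i, p, steps=100000):
    """Forward-Euler integration of d'(t) = (p + q*d) / t starting from d(i) = 0."""
    q = 1.0 - p
    d, t = 0.0, float(i)
    h = (t_end - i) / steps
    for _ in range(steps):
        d += h * (p + q * d) / t
        t += h
    return d

exact = degree_closed_form(100, 10, 0.3)
approx = degree_euler(100, 10, 0.3)
```

The two values agree closely, which supports the integration constant $A = p / i^q$.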

Page 34:

What is F(d), the fraction of nodes that have degree at least d at time t?

$d_i(t) = \frac{p}{q}\left(\left(\frac{t}{i}\right)^{q} - 1\right) \ge d \;\iff\; i \le t\left(\frac{q}{p}\,d + 1\right)^{-\frac{1}{q}}$

There are t nodes total at time t, so F(d):

$F(d) = \left(\frac{q}{p}\,d + 1\right)^{-\frac{1}{q}}$

Page 35:

What is the fraction of nodes with degree exactly d?
Take the derivative of F(d). Because F(d) is the fraction of nodes with degree at least d, the fraction with degree exactly d is $-F'(d)$:

$-F'(d) = \frac{1}{q}\cdot\frac{q}{p}\left(\frac{q}{p}\,d + 1\right)^{-1-\frac{1}{q}} = \frac{1}{p}\left(\frac{q}{p}\,d + 1\right)^{-\left(1+\frac{1}{q}\right)}$

Page 36:

Two changes from the $G_{np}$ model:
  The network grows
  Preferential attachment

Do we need both? Yes!
If we just add growth to $G_{np}$ (p=1):
  $x_j$ = degree of node j at the end
  $x_j(u)$ = 1 if u links to j, else 0
  $x_j = x_j(j+1) + x_j(j+2) + \ldots + x_j(n)$
  $E[x_j(u)] = P[u \text{ links to } j] = 1/(u-1)$
  $E[x_j] = \sum_{u=j+1}^{n} \frac{1}{u-1} = \frac{1}{j} + \frac{1}{j+1} + \ldots + \frac{1}{n-1} = H_{n-1} - H_{j-1}$
  $E[x_j] \approx \log(n-1) - \log(j) = \log((n-1)/j)$, which is logarithmic, NOT a power of (n/j)
  $H_n$ … n-th harmonic number

7/2/2009 Jure Leskovec, Stanford CS322: Network Analysis 36
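The growth-only calculation is easy to check numerically: sum the link probabilities 1/(u-1) over all later nodes u and compare with the logarithmic approximation. The function name and the values of j and n below are assumptions made for illustration.

```python
import math

def expected_degree_growth_only(j, n):
    """E[x_j] = sum over u = j+1..n of 1/(u-1), i.e. 1/j + 1/(j+1) + ... + 1/(n-1)."""
    return sum(1.0 / (u - 1) for u in range(j + 1, n + 1))

j, n = 100, 10000
exact = expected_degree_growth_only(j, n)
approx = math.log((n - 1) / j)
```

Even the oldest nodes only collect about log(n) links under growth with uniform attachment, so growth alone does not produce a power-law tail.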

Page 37:

Preferential attachment gives power-law degrees

Intuitively reasonable process
Can tune p to get the observed exponent
  On the web, P[node has degree d] ~ $d^{-2.1}$
  2.1 = 1 + 1/(1-p) ⇒ p ≈ 0.1
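The tuning step is a one-line rearrangement of $\alpha = 1 + \frac{1}{1-p}$, which gives $p = 1 - \frac{1}{\alpha - 1}$. A small sketch (the function name is hypothetical):

```python
def p_for_exponent(alpha):
    """Solve alpha = 1 + 1/(1 - p) for p; gives a valid probability only for alpha >= 2."""
    return 1.0 - 1.0 / (alpha - 1.0)

p_web = p_for_exponent(2.1)
```

For the web exponent 2.1 this gives p of about 0.09, in line with the p ≈ 0.1 quoted on the slide.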

Page 38:

Preferential attachment is not so good at predicting network structure
  Age-degree correlation
  Links among high degree nodes
  On the web, nodes sometimes avoid linking to each other

Further questions:
  What is a reasonable probabilistic model for how people sample through web-pages and link to them?
  Short + Random walks
  Effect of search engines: reaching pages based on number of links to them

Page 39:

Preferential attachment is a key ingredient
Extensions:
  Early nodes have advantage: node fitness
  Geometric preferential attachment

Copying model [Kleinberg et al.]:
  Picking a node proportional to the degree is the same as picking an edge at random (pick a node and then its neighbor)

6/14/2009 Jure Leskovec, ICML '09 39

Page 40:

We observe how the connectivity (length of the paths) of the network changes as the vertices get removed [Albert et al. 00; Palmer et al. 01]

Vertices can be removed:
  Uniformly at random
  In order of decreasing degree

It is important for epidemiology
  Removal of vertices corresponds to vaccination
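A minimal sketch of such a removal experiment. Note the substitution: instead of the mean path length used in the cited studies, it tracks the size of the largest connected component, which is a simpler proxy for connectivity. The helper names and the toy star graph are assumptions made for illustration.

```python
from collections import deque

def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes.

    `adj` maps each node to the set of its neighbors (undirected graph).
    """
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over the surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def top_degree_nodes(adj, k):
    """The k nodes of highest degree (targeted removal)."""
    return set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])

# Hypothetical toy graph: a star with hub 0 and four leaves.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
after_targeted = largest_component_size(star, top_degree_nodes(star, 1))
after_leaf_removal = largest_component_size(star, {1})
```

On the star, removing the single hub shatters the graph while removing a leaf barely changes it, an extreme version of the contrast between targeted and random removal.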

Page 41:

Real-world networks are resilient to random attacks
  One has to remove all web-pages of degree > 5 to disconnect the web
  But this is a very small percentage of web pages

Random network has better resilience to targeted attacks

[Figure panels: Random network; Internet (Autonomous systems). Axes: Mean path length vs. Fraction of removed nodes. Curves: Preferential removal; Random removal]