Stanford University ://snap.stanford.edu/class/cs224w-2010/slides/11-power_laws_… · In social...

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University

http://cs224w.stanford.edu

10/25/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

[Faloutsos, Faloutsos and Faloutsos, 1999]

Internet domain topology

[Barabasi-Albert, 1999]

Power-grid, Web graph, Actor collaborations

[Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata, Tomkins, Wiener, 2000]

[Leskovec et al. KDD ‘08]

Take a real network and plot a histogram of p_k vs. k, where p_k is the fraction of nodes with degree k

Flickr social network

n= 584,207, m=3,555,115

[Leskovec et al. KDD ‘08]

Plot the same data on log-log axes:

Flickr social network

n= 584,207, m=3,555,115
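A minimal sketch of how the two plots could be prepared from an edge list, assuming an undirected graph given as (u, v) pairs. The function names and the toy edge list are hypothetical and are not the Flickr data.

```python
from collections import Counter
import math

def degree_distribution(edges):
    """Return {k: p_k}, the fraction of nodes that have degree k."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    n = len(degree)                      # nodes that appear in at least one edge
    counts = Counter(degree.values())    # how many nodes have each degree
    return {k: c / n for k, c in sorted(counts.items())}

def loglog_points(pk):
    """Map each (k, p_k) to (log10 k, log10 p_k), i.e. log-log axis coordinates."""
    return [(math.log10(k), math.log10(p)) for k, p in pk.items() if k > 0 and p > 0]

# Toy graph (hypothetical): a small star with one extra edge
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
pk = degree_distribution(edges)
print(pk)
print(loglog_points(pk))
```

On log-log axes a power-law p_k shows up as an approximately straight line, which is why the second plot is easier to read than the linear histogram.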

Degrees are heavily skewed:
Distribution P(X>x) is heavy tailed if: $\lim_{x \to \infty} \frac{P(X > x)}{e^{-\lambda x}} = \infty$ for every λ > 0

[Clauset‐Shalizi‐Newman 2007]

Power-law vs. exponential on log-log scales

Various names, kinds and forms: Long tail, Heavy tail, Zipf’s law, Pareto’s law

P(x) is proportional to:

In social systems – lots of power-laws:
  Pareto, 1897 – Wealth distribution
  Lotka, 1926 – Scientific output
  Yule, 1920s – Biological taxa and subtaxa
  Zipf, 1940s – Word frequency
  Simon, 1950s – City populations

Many other quantities follow heavy‐tailed distributions

[Chris Anderson, Wired, 2004]

CMU grad-students at the G20 meeting in Pittsburgh in Sept 2009

Power-law degree exponent is typically 2 < α < 3
  Web graph: α_in = 2.1, α_out = 2.4 [Broder et al. 00]
  Autonomous systems: α = 2.4 [Faloutsos^3, 99]
  Actor-collaborations: α = 2.3 [Barabasi-Albert 00]
  Citations to papers: α ≈ 3 [Redner 98]
  Online social networks: α ≈ 2 [Leskovec et al. 07]

What is the normalizing constant?
P(x) = c x^{-α}, c = ?
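One way to work out the constant, as a sketch under stated assumptions: the distribution is treated as continuous, supported on x ≥ x_min > 0, with α > 1. The lower cutoff x_min is an assumption introduced here so that the integral converges.

```latex
\begin{aligned}
1 &= \int_{x_{\min}}^{\infty} c\, x^{-\alpha}\, dx
   = c \left[ \frac{x^{1-\alpha}}{1-\alpha} \right]_{x_{\min}}^{\infty}
   = \frac{c\, x_{\min}^{1-\alpha}}{\alpha - 1} \\
\Rightarrow\quad c &= (\alpha - 1)\, x_{\min}^{\alpha - 1},
\qquad
P(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}
\end{aligned}
```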

What is the expectation E[x] of a power-law random variable x?

Power-laws: Infinite moments!
  If α ≤ 2: E[x] = ∞
  If α ≤ 3: Var[x] = ∞
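The two thresholds can be checked directly, assuming a continuous power law with density $P(x) = (\alpha-1)\, x_{\min}^{\alpha-1}\, x^{-\alpha}$ on x ≥ x_min (the cutoff x_min is an assumption of this sketch):

```latex
\begin{aligned}
E[x] &= (\alpha-1)\, x_{\min}^{\alpha-1} \int_{x_{\min}}^{\infty} x^{1-\alpha}\, dx
      = \frac{\alpha-1}{\alpha-2}\, x_{\min}
      && \text{if } \alpha > 2, \text{ and } \infty \text{ if } \alpha \le 2 \\
E[x^2] &= (\alpha-1)\, x_{\min}^{\alpha-1} \int_{x_{\min}}^{\infty} x^{2-\alpha}\, dx
      = \frac{\alpha-1}{\alpha-3}\, x_{\min}^{2}
      && \text{if } \alpha > 3, \text{ and } \infty \text{ if } \alpha \le 3
\end{aligned}
```

Since Var[x] = E[x^2] − E[x]^2, the variance is infinite whenever α ≤ 3, which covers the typical range 2 < α < 3 for network degrees.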

Sample average of n samples from a power-law with exponent α:
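A small simulation sketch of this experiment, assuming a continuous power law with lower cutoff x_min = 1 and inverse-transform sampling. The function names and the chosen exponents are hypothetical illustrations.

```python
import random

def sample_power_law(alpha, x_min=1.0, rng=random):
    """Draw one sample with P(X > x) = (x / x_min)^-(alpha - 1), for x >= x_min."""
    u = rng.random()                       # uniform in [0, 1)
    return x_min * (1.0 - u) ** (-1.0 / (alpha - 1.0))

def running_means(alpha, n, seed=0):
    """Return the sample average after 1, 2, ..., n draws."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += sample_power_law(alpha, rng=rng)
        means.append(total / i)
    return means

# With alpha > 2 the running mean settles; with alpha <= 2 it keeps drifting upward.
for alpha in (3.5, 1.8):
    m = running_means(alpha, 20000)
    print(alpha, m[99], m[1999], m[-1])
```

For α = 3.5 the average approaches (α−1)/(α−2) ≈ 1.67, while for α = 1.8 occasional huge samples keep pushing the average up, consistent with E[x] = ∞.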

Estimating from data:
1. Fit a line on log-log axis using least squares. BAD!

Estimating from data:
2. Plot Complementary CDF P(X>x)
Then α = 1 + α', where α' is the magnitude of the slope of P(X>x) on log-log axes.
I.e., if P(X=x) ∝ x^{-α} then P(X>x) ∝ x^{-(α-1)}. OK


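Estimating power-law exponent from data:
3. Use MLE: $\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$, where x_i is the degree of node i and x_min is the smallest degree included in the fit. Best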
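A minimal sketch of the maximum-likelihood estimate α = 1 + n / Σ ln(x_i / x_min), in its continuous form. The helper names, the synthetic data and the choice x_min = 1 are assumptions for illustration; real degrees are integers, for which this formula is only an approximation.

```python
import math
import random

def mle_alpha(xs, x_min):
    """Continuous power-law MLE using only the samples with x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

def sample_power_law(alpha, x_min, rng):
    """Inverse-transform sample from a continuous power law on [x_min, inf)."""
    return x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))

# Synthetic check: generate data with a known exponent and recover it.
rng = random.Random(0)
xs = [sample_power_law(2.5, 1.0, rng) for _ in range(20000)]
print(mle_alpha(xs, 1.0))   # expected to land close to 2.5
```

Unlike a least-squares line through the log-log histogram, this estimate uses every sample with equal weight and does not depend on how the histogram is binned.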

[Figure panels: Linear scale; Log scale, α=1.75; CCDF, Log scale, α=1.75; CCDF, Log scale, α=1.75, exp. cutoff]

Not well characterized by the mean:
  Avg. U.S. city size: 165k, StdDev = 410k
  If human heights in the US were power-law distributed, we would expect:
    60k people as high as 2.72m (world record),
    10k people as high as a giraffe,
    1 person as high as the Empire State Building

Cannot arise from sums of independent events
Recall: in $G_{np}$ each pair of nodes is connected independently with prob. $p$
$X$ … degree of node $v$, $X_w$ … event that $w$ links to $v$
$X = \sum_w X_w$, $E[X] = \sum_w E[X_w] = (n-1)p$
Now what is $\Pr[X = k]$?
Central limit theorem: $X_1, \dots, X_n$: rnd. vars with mean $\mu$, variance $\sigma^2$
$S_n = \sum_i X_i$: $E[S_n] = n\mu$, $\mathrm{var}[S_n] = n\sigma^2$, $\mathrm{std.dev}[S_n] = \sigma\sqrt{n}$
$P[S_n = E[S_n] + x \cdot \mathrm{std.dev}(S_n)] \sim \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
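A quick simulation sketch of the G_np argument: each node's degree is a sum of n−1 independent coin flips, so degrees concentrate around (n−1)p and no node gets a degree far above the mean. The function name and the parameter values are hypothetical.

```python
import random

def gnp_degrees(n, p, seed=0):
    """Sample G_np and return the list of node degrees."""
    rng = random.Random(seed)
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:       # each pair is connected independently
                deg[u] += 1
                deg[v] += 1
    return deg

n, p = 400, 0.05
deg = gnp_degrees(n, p)
mean = sum(deg) / n
print("expected mean degree:", (n - 1) * p)
print("observed mean degree:", mean)
print("max degree:", max(deg))
```

The maximum degree stays within a small multiple of the mean, in contrast to the heavy-tailed degree distributions observed in real networks.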

Random network (Erdos-Renyi random graph): Degree distribution is Binomial
Scale-free (power-law) network: Degree distribution is Power-law
Function is scale free if: f(ax) = c f(x)
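A short check of the scale-free condition, written out as a sketch. The exponential with rate λ is used here only as a contrasting example.

```latex
\begin{aligned}
f(x) = x^{-\alpha}: \quad & f(ax) = (ax)^{-\alpha} = a^{-\alpha} f(x),
   \quad \text{so } c = a^{-\alpha} \text{ does not depend on } x \\
f(x) = e^{-\lambda x}: \quad & \frac{f(ax)}{f(x)} = e^{-\lambda (a-1) x},
   \quad \text{which depends on } x \text{, so no constant } c \text{ exists}
\end{aligned}
```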


What is a good model that gives rise to power-law degree distributions?

What is the analog of the central limit theorem for power-laws?

Preferential attachment [Price 1965, Albert-Barabasi 1999]:
  Nodes arrive in order
  A new node j creates m out-links
  Prob. of linking to a previous node i is proportional to its degree d_i:
  $P(j \to i) = \frac{d_i}{\sum_k d_k}$

New nodes are more likely to link to nodes that already have high degree

Herbert Simon’s result: Power-laws arise from “Rich get richer” (cumulative advantage)

Examples [Price 65]:
  Citations: new citations of a paper are proportional to the number it already has

[Mitzenmacher, ‘03]

Pages are created in order 1, 2, 3, …, n
When node j is created it makes a single link to an earlier node i chosen:
1) With prob. p, j links to i chosen uniformly at random (from among all earlier nodes)
2) With prob. 1-p, node j chooses node i uniformly at random and links to the node i points to.

Note this is the same as saying:
2) With prob. 1-p, node j links to node u with prob. proportional to d_u (the degree of u)
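The process above can be sketched as a short simulation. Two details are assumptions of this sketch and not part of the slide: nodes are numbered from 0, and node 0 is given a self-loop (not counted in its in-degree) so that rule 2 always has a link to follow.

```python
import random

def simulate_model(n, p, seed=0):
    """Grow the network node by node and return the list of in-degrees."""
    rng = random.Random(seed)
    target = [0]            # target[i] = node that i links to; node 0 points to itself
    indeg = [0] * n
    for j in range(1, n):
        i = rng.randrange(j)            # earlier node chosen uniformly at random
        if rng.random() < p:
            dest = i                    # rule 1: link to i itself
        else:
            dest = target[i]            # rule 2: copy i's link
        target.append(dest)
        indeg[dest] += 1
    return indeg

indeg = simulate_model(20000, 0.1)
print("max in-degree with p = 0.1:", max(indeg))
print("max in-degree with p = 1.0:", max(simulate_model(20000, 1.0)))
```

With small p most links are copied, a few early nodes collect a very large in-degree, and the distribution becomes heavy-tailed. With p = 1 every link is uniform and the largest in-degree stays small.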

Claim: The described model generates networks where the fraction of nodes with degree k scales as:
$P(d_i = k) \propto k^{-(1 + \frac{1}{q})}$
where q = 1-p

Degree d_i(t) of node i (i=1,2,…,n) is a continuous quantity and it grows deterministically as a function of time t

Analyze d_i(t) – continuous degree of node i at time t ≥ i

Initial condition: d_i(t) = 0, when t = i (i just arrived)

Expected change of d_i(t) over time:
Node i gains an in-link at step t+1 only if a link from a newly created node t+1 points to it.
What’s the prob. of this event?
  With prob. p node t+1 links to a random node: links to i with prob. 1/t
  With prob. 1-p node t+1 links preferentially: links to i with prob. d_i(t)/t

So: prob. node t+1 links to i is: $p \frac{1}{t} + (1-p) \frac{d_i(t)}{t}$

$\frac{d\, d_i(t)}{dt} = p \frac{1}{t} + (1-p) \frac{d_i(t)}{t} = \frac{p + q\, d_i(t)}{t}$, where q = 1-p

We know: d_i(i) = 0

$d_i(t) = \frac{p}{q} \left[ \left( \frac{t}{i} \right)^{q} - 1 \right]$
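The step from the rate equation to the closed form is a separation of variables. A sketch, using q = 1 − p and the initial condition d_i(i) = 0 (A and C denote integration constants introduced here):

```latex
\begin{aligned}
\frac{d\, d_i}{dt} = \frac{p + q\, d_i}{t}
&\;\Rightarrow\; \int \frac{d\, d_i}{p + q\, d_i} = \int \frac{dt}{t}
 \;\Rightarrow\; \frac{1}{q} \ln (p + q\, d_i) = \ln t + C \\
&\;\Rightarrow\; p + q\, d_i(t) = A\, t^{q},
 \qquad d_i(i) = 0 \;\Rightarrow\; A = p\, i^{-q} \\
&\;\Rightarrow\; d_i(t) = \frac{p}{q} \left[ \left( \frac{t}{i} \right)^{q} - 1 \right]
\end{aligned}
```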

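What is F(d), the fraction of nodes that have degree less than d at time t?

$d_i(t) = \frac{p}{q} \left[ \left( \frac{t}{i} \right)^{q} - 1 \right] < d \iff i > t \left[ \frac{q}{p} d + 1 \right]^{-\frac{1}{q}}$

There are t nodes total at time t so F(d):

$F(d) = 1 - \left[ \frac{q}{p} d + 1 \right]^{-\frac{1}{q}}$

What is the fraction of nodes with degree exactly d? Take derivative of F(d):

$F'(d) = \frac{1}{q} \cdot \frac{q}{p} \left[ \frac{q}{p} d + 1 \right]^{-\frac{1}{q} - 1} = \frac{1}{p} \left[ \frac{q}{p} d + 1 \right]^{-\left(1 + \frac{1}{q}\right)}$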

Two changes from the G_np model:
  The network grows
  Preferential attachment

Do we need both? Yes! If we just add growth to G_np (p=1):

H_n … n-th harmonic number: $H_n = \sum_{k=1}^{n} \frac{1}{k}$

x_j = degree of node j at the end
x_j(u) = 1 if u links to j, else 0
x_j = x_j(j+1) + x_j(j+2) + … + x_j(n)
E[x_j(u)] = P[u links to j] = 1/(u-1)
E[x_j] = Σ_{u=j+1..n} 1/(u-1) = 1/j + 1/(j+1) + … + 1/(n-1) = H_{n-1} – H_{j-1}
E[x_j] ≈ log(n-1) – log(j) = log((n-1)/j), which is logarithmic, NOT a power of (n/j)
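A small numerical check of the growth-only case, assuming nodes are numbered 1..n and each new node u links to one node chosen uniformly from 1..u-1. The function names and parameter values are hypothetical. The exact expectation is the sum 1/j + … + 1/(n-1).

```python
import random

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m (H_0 = 0)."""
    return sum(1.0 / k for k in range(1, m + 1))

def expected_degree(j, n):
    """Exact E[x_j] = 1/j + 1/(j+1) + ... + 1/(n-1)."""
    return harmonic(n - 1) - harmonic(j - 1)

def simulated_degree(j, n, trials=200, seed=0):
    """Average in-degree of node j over independent runs of uniform attachment."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for u in range(j + 1, n + 1):
            if rng.randrange(1, u) == j:   # u picks uniformly among 1..u-1
                total += 1
    return total / trials

print(expected_degree(1, 500), simulated_degree(1, 500))
print(expected_degree(50, 500), simulated_degree(50, 500))
```

Even the oldest node only reaches a degree of order log n, so growth without preferential attachment does not produce hubs.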

7/2/2009 Jure Leskovec, Stanford CS322: Network Analysis 36

Preferential attachment gives power-law degrees

Intuitively reasonable process
Can tune p to get the observed exponent
  On the web, P[node has degree d] ~ d^{-2.1}
  2.1 = 1 + 1/(1-p) ⟹ p ≈ 0.1
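The tuning step is just the inversion of α = 1 + 1/(1-p). A minimal sketch with hypothetical helper names:

```python
def exponent_from_p(p):
    """Degree exponent predicted by the model: alpha = 1 + 1/(1 - p), for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1.0 + 1.0 / (1.0 - p)

def p_from_exponent(alpha):
    """Invert the relation: p = 1 - 1/(alpha - 1). Requires alpha >= 2."""
    if alpha < 2.0:
        raise ValueError("the model only produces exponents alpha >= 2")
    return 1.0 - 1.0 / (alpha - 1.0)

p = p_from_exponent(2.1)
print(p)                      # roughly 0.09, i.e. p is about 0.1
print(exponent_from_p(p))     # recovers the exponent 2.1 up to rounding
```

Because 0 ≤ p < 1, the model can only produce exponents α ≥ 2, with α = 2 corresponding to purely preferential linking.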

Preferential attachment is not so good at predicting network structure:
  Age-degree correlation
  Links among high degree nodes: on the web, nodes sometimes avoid linking to each other

Further questions:
  What is a reasonable probabilistic model for how people sample through web-pages and link to them?
    Short+Random walks
  Effect of search engines – reaching pages based on number of links to them

Preferential attachment is a key ingredient
Extensions:
  Early nodes have advantage: node fitness
  Geometric preferential attachment

Copying model [Kleinberg et al.]:
  Picking a node proportional to the degree is the same as picking an edge at random (pick a node and then its neighbor)
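A tiny empirical check of this equivalence, assuming every node has exactly one out-link stored in a list. The list `target` and its values are hypothetical toy data.

```python
import random
from collections import Counter

def follow_random_link(target, rng):
    """Pick a node uniformly at random and return the node it points to."""
    i = rng.randrange(len(target))
    return target[i]

# Hypothetical toy graph: node i points to target[i]
target = [0, 0, 0, 1, 1, 2]
indeg = Counter(target)                  # in-degree of each destination node

rng = random.Random(0)
draws = 60000
hits = Counter(follow_random_link(target, rng) for _ in range(draws))

for node in sorted(indeg):
    # empirical frequency vs. in-degree share
    print(node, hits[node] / draws, indeg[node] / len(target))
```

Each link is followed with equal probability, so a node is reached with probability equal to its share of all links, which is exactly selection proportional to in-degree.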

6/14/2009 Jure Leskovec, ICML '09 39

We observe how the connectivity (length of the paths) of the network changes as the vertices get removed [Albert et al. 00; Palmer et al. 01]

Vertices can be removed:
  Uniformly at random
  In order of decreasing degree

It is important for epidemiology:
  Removal of vertices corresponds to vaccination
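The removal experiment can be sketched as follows. This is a simplified stand-in, not the cited studies' method: it measures the size of the largest connected component instead of path lengths, it ranks nodes by their degree in the intact graph, and it uses a synthetic preferential-attachment graph. All names and parameters are hypothetical.

```python
import random
from collections import deque

def pa_graph(n, m=2, seed=0):
    """Undirected preferential-attachment graph: each new node adds m links."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    endpoints = []                       # each node listed once per incident edge
    for u in range(m + 1):               # small seed clique
        for v in range(u + 1, m + 1):
            adj[u].add(v); adj[v].add(u)
            endpoints += [u, v]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:           # degree-proportional choice of m distinct nodes
            chosen.add(rng.choice(endpoints))
        for u in chosen:
            adj[v].add(u); adj[u].add(v)
            endpoints += [u, v]
    return adj

def largest_component_fraction(adj, removed):
    """Size of the largest connected component among surviving nodes, over n."""
    alive = set(adj) - removed
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:                     # breadth-first search
            x = queue.popleft()
            size += 1
            for y in adj[x]:
                if y in alive and y not in seen:
                    seen.add(y)
                    queue.append(y)
        best = max(best, size)
    return best / len(adj)

def attack(adj, fraction, targeted, seed=0):
    """Remove a fraction of nodes, by decreasing degree or uniformly at random."""
    k = int(fraction * len(adj))
    if targeted:
        order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    else:
        order = list(adj)
        random.Random(seed).shuffle(order)
    return largest_component_fraction(adj, set(order[:k]))

adj = pa_graph(2000)
for f in (0.05, 0.1, 0.2):
    print(f, "random:", attack(adj, f, False), "targeted:", attack(adj, f, True))
```

In this sketch random removal leaves most surviving nodes connected, while removing the highest-degree nodes first fragments the graph much faster.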


Real-world networks are resilient to random attacks
  One has to remove all web-pages of degree > 5 to disconnect the web
  But this is a very small percentage of web pages

Random network has better resilience to targeted attacks

[Figure: two panels, Random network and Internet (Autonomous systems); curves for preferential removal and random removal; x-axis: Fraction of removed nodes]
