The Logarithmic Dimension Hypothesis
Log Dimension Hypothesis 1
The Logarithmic Dimension Hypothesis
Anthony Bonato, Ryerson University
MITACS International Problem Solving Workshop
July 2012
Workshop team
• David Gleich (Purdue)
• Dieter Mitsche (Ryerson)
• Stephen Young (UCSD)
• Myunghwan Kim (Stanford)
• Amanda Tian (York)
Friendship networks
• networks of friends (some real, some virtual) form a large web of interconnected links
6 degrees of separation
• (Stanley Milgram, 67): famous chain letter experiment
6 Degrees in Facebook?
• 900 million users, > 70 billion friendship links
• (Backstrom et al., 2012)
– 4 degrees of separation in Facebook
– when considering another person in the world, a friend of your friend knows a friend of their friend, on average
• similar results for Twitter and other OSNs
Complex Networks
• web graph, social networks, biological networks, internet networks, …
The web graph
• nodes: web pages
• edges: links
• over 1 trillion nodes, with billions of nodes added each day
On-line Social Networks (OSNs)
• Facebook, Twitter, LinkedIn, Google+ …
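Key parameters
• degree distribution: $N_{i,n} = |\{u \in V(G) : \deg(u) = i\}|$
• average distance: $L(G) = \binom{n}{2}^{-1} \sum_{\{u,v\} \subseteq V(G)} d(u,v)$
• clustering coefficient: $C(G) = \frac{1}{n} \sum_{x \in V(G)} c(x)$, where $c(x) = |E(N(x))| \binom{\deg(x)}{2}^{-1}$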
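To make the three parameters concrete, here is a minimal Python sketch that computes the degree counts $N_{i,n}$, the mean distance $L(G)$ over unordered pairs of distinct nodes, and the mean local clustering $C(G)$ for a graph stored as a dictionary of neighbour sets. The helper names and the toy graph `G` (a triangle with one pendant node) are hypothetical; `average_distance` assumes the graph is connected, and nodes of degree below 2 are given $c(x) = 0$, which is one common convention rather than something the talk specifies.

```python
from collections import Counter, deque
from itertools import combinations


def degree_distribution(adj):
    """N_{i,n}: map each degree i to the number of nodes with that degree."""
    return dict(Counter(len(nbrs) for nbrs in adj.values()))


def bfs_distances(adj, source):
    """Shortest-path distances from source by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_distance(adj):
    """L(G): mean d(u, v) over pairs of distinct nodes; assumes G is connected."""
    n = len(adj)
    total = 0
    for u in adj:
        dist = bfs_distances(adj, u)
        if len(dist) != n:
            raise ValueError("graph is not connected")
        total += sum(dist.values())
    # total counts every unordered pair twice, so divide by 2 * C(n, 2) = n(n - 1)
    return total / (n * (n - 1))


def local_clustering(adj, x):
    """c(x) = |E(N(x))| / C(deg x, 2); taken as 0 when deg x < 2 (assumed convention)."""
    k = len(adj[x])
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(adj[x], 2) if v in adj[u])
    return links / (k * (k - 1) / 2)


def clustering_coefficient(adj):
    """C(G): average of c(x) over all nodes."""
    return sum(local_clustering(adj, x) for x in adj) / len(adj)


# Hypothetical toy graph: triangle 0-1-2 with pendant node 3 attached to node 2.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(degree_distribution(G), average_distance(G), clustering_coefficient(G))
```

For this toy graph the sketch gives $N_{i,n}$ = {1: 1, 2: 2, 3: 1}, $L(G) = 4/3$ and $C(G) = 7/12$. Breadth-first search from every node costs $O(n(n+e))$, which is fine for illustration but not for graphs at OSN scale.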
Properties of Complex Networks
• power law degree distribution (Broder et al, 01): $N_{i,n} \sim n\, i^{-b}$ for some $b > 2$
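One standard way to estimate $b$ from degree data, swapped in here rather than taken from the talk, is the discrete-data approximation to the maximum-likelihood estimator described by Clauset, Shalizi and Newman: $\hat b \approx 1 + k\left[\sum_{j=1}^{k} \ln\frac{x_j}{x_{\min} - 1/2}\right]^{-1}$, summed over the $k$ degrees $x_j \ge x_{\min}$. The sketch below checks it on synthetic degrees; the cut-off $x_{\min} = 10$, the seed and the sample size are arbitrary assumptions, and the helper names are hypothetical.

```python
import math
import random


def sample_discrete_power_law(b, x_min, size, rng):
    """Approximate discrete power-law sample: round a continuous power law
    with lower cut-off x_min - 1/2 to the nearest integer."""
    sample = []
    for _ in range(size):
        u = rng.random()  # u in [0, 1), so 1 - u is in (0, 1]
        y = (x_min - 0.5) * (1.0 - u) ** (-1.0 / (b - 1.0))
        sample.append(int(y + 0.5))  # y + 0.5 >= x_min > 0, so int() acts as floor
    return sample


def estimate_exponent(degrees, x_min):
    """Approximate MLE for the exponent b, using only degrees >= x_min."""
    tail = [x for x in degrees if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


rng = random.Random(7)  # fixed seed for reproducibility
degrees = sample_discrete_power_law(b=2.5, x_min=10, size=20000, rng=rng)
b_hat = estimate_exponent(degrees, x_min=10)
print(round(b_hat, 2))
```

A least-squares line through a log-log histogram is a common alternative, but it tends to be noisier in the sparse tail, which is why the likelihood-based estimate is used here.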
Power laws in OSNs (Mislove et al,07):
Small World Property
• small world networks (Watts & Strogatz, 98)
– low distances
  • diam(G) = O(log n)
  • L(G) = O(log log n)
– higher clustering coefficient than a random graph with the same expected degree
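The clustering contrast can be checked numerically. This sketch builds a simplified Watts–Strogatz graph (a ring lattice in which each edge is rewired with probability 0.1, skipping a rewire whose random target would create a loop or a duplicate edge) and a $G(n,p)$ graph with the same expected degree, then compares their average clustering. The rewiring details, the parameters, the seed and all helper names are assumptions; distances are not measured here.

```python
import random
from itertools import combinations


def ring_lattice(n, k):
    """Cycle on n nodes, each joined to its k nearest neighbours (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def watts_strogatz(n, k, p, rng):
    """Simplified rewiring: each lattice edge (v, v+j) is moved to a random
    endpoint w with probability p, unless w would give a loop or duplicate."""
    adj = ring_lattice(n, k)
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            if rng.random() < p:
                w = rng.randrange(n)
                if w != v and w not in adj[v]:
                    adj[v].discard(u)
                    adj[u].discard(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj


def gnp(n, p, rng):
    """Erdos-Renyi G(n, p): each pair is an edge independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Mean local clustering; nodes of degree < 2 contribute 0 (assumed convention)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


rng = random.Random(1)
n, k = 500, 10
small_world = watts_strogatz(n, k, 0.1, rng)
random_graph = gnp(n, k / (n - 1), rng)  # same expected degree k
print(clustering_coefficient(small_world), clustering_coefficient(random_graph))
```

The rewired graph keeps most of the lattice's triangles, while in $G(n,p)$ the clustering is roughly $p \approx k/n$, so it shrinks as the graph grows at fixed average degree.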
Sample data: Flickr, YouTube, LiveJournal, Orkut
• (Mislove et al,07): short average distances and high clustering coefficients
Community structure
• (Zachary, 72)
• (Mason et al, 09)
• (Fortunato, 10)
• (Li, Peng, 11): small community property
(Leskovec, Kleinberg, Faloutsos, 05):
• densification power law: average degree is increasing with time
• decreasing distances
• (Kumar et al, 06): observed in Flickr, Yahoo! 360
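Leskovec, Kleinberg and Faloutsos state densification as $e(t) \propto n(t)^{a}$ with $1 < a < 2$, so the average degree $2e(t)/n(t) \propto n(t)^{a-1}$ grows over time. Below is a minimal sketch of estimating $a$ by least squares on log-log snapshot data. The snapshot numbers are hypothetical, generated from $e = 3n^{1.2}$, so the fit should recover $a = 1.2$; the helper name is also hypothetical.

```python
import math


def fit_loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical snapshots (n(t), e(t)) that follow e = 3 * n**1.2 exactly.
nodes = [1000, 2000, 4000, 8000]
edges = [3 * n ** 1.2 for n in nodes]
a = fit_loglog_slope(nodes, edges)
avg_degree = [2 * e / n for n, e in zip(nodes, edges)]
print(round(a, 6), [round(d, 1) for d in avg_degree])
```

With real snapshots the points will scatter around the line, and the fitted slope is only an estimate of $a$.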
Geometry of OSNs?
• OSNs live in social space: proximity of nodes depends on common attributes (such as geography, gender, age, etc.)
• IDEA: embed OSN in 2-, 3- or higher dimensional space
Dimension of an OSN
• dimension of an OSN: minimum number of attributes needed to classify nodes
• like the game of “20 Questions”: each question narrows the range of possibilities
• what is a credible mathematical formula for the dimension of an OSN?
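The “20 Questions” analogy can be made quantitative under a simplifying assumption: if every attribute is a yes/no question that halves the remaining candidates, singling out one of $n$ users takes $\lceil \log_2 n \rceil$ questions, a count that grows only logarithmically in $n$. This illustrates the analogy and is not the talk's definition of dimension; the helper name is hypothetical.

```python
def questions_needed(n):
    """Yes/no questions needed to single out one of n users: ceil(log2 n).

    Assumes each question splits the remaining candidates as evenly as possible.
    (n - 1).bit_length() equals ceil(log2 n) for n >= 1 and avoids float rounding.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return (n - 1).bit_length()


print(questions_needed(900_000_000))  # → 30
```

For the 900 million Facebook users cited earlier this gives 30 questions, since $2^{29} < 9 \times 10^8 \le 2^{30}$.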
Geometric model for OSNs
• we consider a geometric model of OSNs, where
– nodes are in m-dimensional Euclidean space
– the threshold value is variable: a function of the ranking of nodes
Geometric Protean (GEO-P) Model (Bonato, Janssen, Prałat, 12)
• parameters: α, β in (0,1), α+β < 1; positive integer m
• nodes live in the m-dimensional hypercube (torus metric)
• each node is ranked 1, 2, …, n by some function r
– 1 is best, n is worst
– we use random initial ranking
• at each time-step, one new node v is born, one randomly chosen node dies (and the ranking is updated)
• each existing node u has a region of influence with volume $r(u)^{-\alpha} n^{-\beta}$
• add edge uv if v is in the region of influence of u
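The following Python sketch shows how geometry and ranking combine in the link rule. It is a hypothetical one-shot simplification, not the full GEO-P process: it skips the birth and death steps, assigns a uniformly random ranking, and joins u and v whenever v lies in u's region of influence, taken here as a cube (an $L_\infty$ ball on the torus $[0,1)^m$) of volume $r(u)^{-\alpha} n^{-\beta}$. The choice of metric, the function names and the parameter values are all assumptions.

```python
import random


def region_volume(rank, n, alpha, beta):
    """Volume of a node's region of influence: rank**(-alpha) * n**(-beta)."""
    return rank ** (-alpha) * n ** (-beta)


def torus_linf_distance(x, y):
    """L-infinity distance on the unit torus [0, 1)^m (assumed metric)."""
    return max(min(abs(a - b), 1.0 - abs(a - b)) for a, b in zip(x, y))


def geo_p_snapshot(n, m, alpha, beta, seed=0):
    """Hypothetical one-shot simplification of GEO-P (no births or deaths).

    Returns (adj, ranks), where ranks[u] is u's rank (1 is best)."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(m)) for _ in range(n)]
    ranks = list(range(1, n + 1))
    rng.shuffle(ranks)  # uniformly random ranking
    # A cube of volume V has side V**(1/m), i.e. L-infinity radius V**(1/m) / 2.
    radius = [region_volume(ranks[u], n, alpha, beta) ** (1.0 / m) / 2 for u in range(n)]
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(n):
            if u != v and torus_linf_distance(points[u], points[v]) <= radius[u]:
                adj[u].add(v)  # v lies in u's region of influence
                adj[v].add(u)
    return adj, ranks


adj, ranks = geo_p_snapshot(n=400, m=2, alpha=0.6, beta=0.2, seed=3)
top = [u for u in adj if ranks[u] <= 40]
bottom = [u for u in adj if ranks[u] > 360]
print(sum(len(adj[u]) for u in top) / len(top),
      sum(len(adj[u]) for u in bottom) / len(bottom))
```

With these settings the 40 best-ranked nodes should have a noticeably higher mean degree than the 40 worst-ranked ones, since a rank-1 region has volume $n^{-\beta}$ while a rank-n region has volume $n^{-\alpha-\beta}$. The double loop is $O(n^2)$, so it is only suitable for small illustrative graphs.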
Notes on GEO-P model
• the model uses both geometry and ranking
• number of nodes is static: fixed at n
– the order of an OSN is at most the number of people (roughly…)
• top-ranked nodes have larger regions of influence
Simulation with 5000 nodes
[Figure: random geometric graph vs. GEO-P graph]
Properties of the GEO-P model (Bonato, Janssen, Prałat, 2012)
• a.a.s. the GEO-P model generates graphs with the following properties:
– power law degree distribution with exponent $b = 1 + 1/\alpha$
– average degree $d = (1+o(1))\, n^{1-\alpha-\beta}/2^{1-\alpha}$
  • densification
– diameter $D = O\left(n^{\beta/((1-\alpha)m)} \log^{2\alpha/((1-\alpha)m)} n\right)$
  • small world: constant order if $m = C \log n$
– clustering coefficient larger than in a comparable random graph
Spectral properties
• the spectral gap λ of G is defined as $\lambda = \max\{|\lambda_1 - 1|, |\lambda_{n-1} - 1|\}$, where $0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1} \le 2$ are the eigenvalues of the normalized Laplacian of G
• for G(n,p) random graphs, λ tends to 0 as the order grows
• in the GEO-P model, λ is close to 1
• (Estrada, 06): bad spectral expansion in real OSN data
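As a sketch of how λ can be computed, assuming the normalized-Laplacian definition $\lambda = \max\{|\lambda_1 - 1|, |\lambda_{n-1} - 1|\}$ from spectral graph theory, the block below builds $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$ and finds its eigenvalues with cyclic Jacobi rotations so that it needs no third-party packages. For real networks, `numpy.linalg.eigvalsh` or a sparse eigensolver would be the practical choice. Helper names are hypothetical, and isolated nodes are given a zero row, following Chung's convention.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; isolated nodes get a zero row."""
    nodes = sorted(adj)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    lap = [[0.0] * n for _ in range(n)]
    for v in nodes:
        i = idx[v]
        if adj[v]:
            lap[i][i] = 1.0
        for u in adj[v]:
            lap[i][idx[u]] = -1.0 / math.sqrt(len(adj[v]) * len(adj[u]))
    return lap


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, ascending."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def spectral_gap(adj):
    """max(|lambda_1 - 1|, |lambda_{n-1} - 1|) of the normalized Laplacian."""
    ev = jacobi_eigenvalues(normalized_laplacian(adj))
    return max(abs(ev[1] - 1.0), abs(ev[-1] - 1.0))


complete5 = {i: {j for j in range(5) if j != i} for i in range(5)}
two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(spectral_gap(complete5), spectral_gap(two_triangles))
```

The complete graph $K_5$ gives $\lambda = 1/4$ (good expansion), while two disjoint triangles give $\lambda = 1$, the extreme of bad expansion, because a second zero eigenvalue appears.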
Dimension of OSNs
• given the order of the network n, power law exponent b, average degree d, and diameter D, we can calculate m
• this gives a formula for the dimension of an OSN:
$m = \frac{\log n}{\log D}\left(1 - \frac{b-1}{b-2}\cdot\frac{\log d}{\log n}\right)$
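Solving the leading-order GEO-P relations $b = 1 + 1/\alpha$, $d \approx n^{1-\alpha-\beta}$ and $D \approx n^{\beta/((1-\alpha)m)}$ for $m$ gives $m = \frac{\log n}{\log D}\left(1 - \frac{b-1}{b-2}\cdot\frac{\log d}{\log n}\right)$; dropping the constant factor in $d$ and the polylogarithmic factor in $D$ is an assumption of this sketch. The block evaluates the formula and checks it with a round trip: parameters go in, predicted statistics come out, and the formula recovers $m$. Function names and example parameter values are hypothetical, and the talk's own input data are not reproduced.

```python
import math


def dimension_estimate(n, b, d, D):
    """m = (log n / log D) * (1 - (b - 1) / (b - 2) * log d / log n)."""
    if b <= 2 or D <= 1:
        raise ValueError("need b > 2 and D > 1")
    return (math.log(n) / math.log(D)) * (1 - (b - 1) / (b - 2) * math.log(d) / math.log(n))


def leading_order_statistics(n, alpha, beta, m):
    """Leading-order GEO-P predictions (b, d, D); constants and log factors dropped."""
    b = 1 + 1 / alpha
    d = n ** (1 - alpha - beta)
    D = n ** (beta / ((1 - alpha) * m))
    return b, d, D


n = 10 ** 8
b, d, D = leading_order_statistics(n, alpha=0.6, beta=0.2, m=5)
print(round(dimension_estimate(n, b, d, D), 6))
```

The round trip recovers $m = 5$ because $\frac{b-1}{b-2} = \frac{1}{1-\alpha}$, so the bracket equals $\frac{\beta}{1-\alpha}$ and cancels the factor $\frac{(1-\alpha)m}{\beta}$ coming from $\frac{\log n}{\log D}$. On real data the dropped factors make the result an approximation.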
6 Dimensions of Separation
OSN        Dimension
Facebook   7
YouTube    6
Twitter    4
Flickr     4
Cyworld    7
Uncovering the hidden reality
• reverse engineering approach
– given network data (n, b, d, D), the dimension of an OSN gives the smallest number of attributes needed to identify users
• that is, given the graph structure, we can (theoretically) recover the social space
Logarithmic Dimension Hypothesis
• Logarithmic Dimension Hypothesis (LDH): the dimension of an OSN is best fit by about log n, where n is the number of users of the OSN
– theoretical evidence: GEO-P and MAG (Leskovec, Kim, 12) models
– empirical evidence?
  – (Sweeney, 2001)
Experimental design
• supervised machine learning
– Alternating Decision Trees (ADT)
– approach of (Janssen et al, 12+) based on earlier work on protein interaction networks (PIN) by (Middendorf et al, 05)
• classify OSN data vs. simulated graphs from the GEO-P model in various dimensions
• develop a feature vector (graphlets, degree distribution percentiles, average distance, etc.) to classify the correct dimension
• ADT will classify which dimension best fits the data
– cross-validation and robustness testing
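A feature vector of the kind described can be sketched as follows. This is a hypothetical, minimal choice (nearest-rank degree percentiles plus counts of the two connected 3-node graphlets, triangles and open wedges), not the feature set of Janssen et al.; the ADT classifier itself is not shown, and all names are assumptions.

```python
import math
from itertools import combinations


def degree_percentiles(adj, percentiles=(25, 50, 75, 90)):
    """Nearest-rank percentiles of the degree sequence."""
    degs = sorted(len(nbrs) for nbrs in adj.values())
    n = len(degs)
    return [degs[max(1, math.ceil(p * n / 100)) - 1] for p in percentiles]


def three_node_graphlets(adj):
    """Counts of the two connected 3-node graphlets: (triangles, open wedges)."""
    closed = 0  # each triangle is seen once from each of its 3 corners
    wedges = 0  # all paths u - x - v centred at x, closed or not
    for nbrs in adj.values():
        k = len(nbrs)
        wedges += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    triangles = closed // 3
    return triangles, wedges - 3 * triangles


def feature_vector(adj):
    """Hypothetical feature vector: degree percentiles followed by graphlet counts."""
    return degree_percentiles(adj) + list(three_node_graphlets(adj))


G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}  # triangle with a pendant node
print(feature_vector(G))
```

For this triangle with one pendant node the sketch returns [1, 2, 2, 3, 1, 2]. In the experimental design, vectors like this would be computed for the OSN data and for GEO-P graphs simulated in each candidate dimension, and the classifier would be trained on the simulated ones.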
Example
• preprints, reprints, contact: search “Anthony Bonato”