Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.
![Page 1: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/1.jpg)
Network Characterization via Random Walks
B. Ribeiro, D. Towsley, UMass-Amherst
![Page 2: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/2.jpg)
Problem
Given a large, possibly dynamic network, how does one efficiently sample/crawl it to accurately characterize it?
• degree distribution
• centrality
• clustering
• …
![Page 3: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/3.jpg)
Motivation
understanding technological networks, social networks
• Internet, wireless networks
• on-line social networks such as Facebook, MySpace, Orkut, YouTube, …
when network dataset not available
• size, lack of global view, dynamics
![Page 4: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/4.jpg)
Outline
review of sampling
random walks (RWs)
multiple coupled RWs
results
![Page 5: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/5.jpg)
Sampling methods
random sampling
◦ uniform vertex sampling
  • θi - fraction of vertices with degree i
  • degree i vertex sampled with probability θi
◦ uniform edge sampling
  • πi - probability degree i vertex sampled
  • πi = θi x i / <average degree>
crawling
◦ snowball sampling – commonly used, highly biased
◦ random walk
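The relation between θi and πi above can be checked on a small example. The sketch below is illustrative only: the toy star graph and the function names `degree_fractions` and `edge_sampling_probs` are assumptions, not part of the slides.

```python
from collections import Counter

def degree_fractions(adj):
    """theta[i]: fraction of vertices with degree i (uniform vertex sampling)."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {d: c / n for d, c in counts.items()}

def edge_sampling_probs(theta):
    """pi[i] = theta[i] * i / <average degree> (uniform edge sampling)."""
    avg_deg = sum(d * t for d, t in theta.items())
    return {d: t * d / avg_deg for d, t in theta.items()}

# Hypothetical toy graph: a star with centre 0 and leaves 1..4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
theta = degree_fractions(adj)      # degree 4: 1/5 of vertices, degree 1: 4/5
pi = edge_sampling_probs(theta)    # edge sampling shifts mass to degree 4
```

In this toy graph the single high-degree vertex makes up one fifth of the vertices but is hit by half of the edge-endpoint samples, which is the bias toward high degrees that the formula describes.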
![Page 6: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/6.jpg)
Random sampling: accuracy of estimates
Estimate θi - fraction of vertices with degree i
• budget: B samples
• accuracy: Normalized root Mean Squared Error (NMSE)
uniform vertex
• head: GOOD
• tail: BAD
uniform edge
• head: BAD
• tail: GOOD
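The slide names the accuracy metric but does not spell it out. Assuming the usual definition of the normalized root mean squared error of an estimate of θi (an assumption, not stated in the slides), it would read:

```latex
\mathrm{NMSE}(\hat{\theta}_i) \;=\; \frac{\sqrt{\mathbb{E}\!\left[(\hat{\theta}_i - \theta_i)^2\right]}}{\theta_i}
```

Under this definition the error is measured relative to θi itself, so the small θi values in the tail of the degree distribution are penalized as heavily as the large ones in the head.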
![Page 7: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/7.jpg)
Uniform vertex vs. edge sampling
Flickr graph (1.7M vertices, 22M edges), budget: B = |V|/100
[Figure: NMSE vs. in-degree for uniform vertex and uniform edge sampling. Vertex: head GOOD, tail BAD. Edge: head BAD, tail GOOD.]
![Page 8: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/8.jpg)
Pros & Cons
uniform vertex
Pros:
◦ independent sampling
◦ OSN needs numeric user IDs, e.g., Livejournal, Flickr, MySpace, Facebook, ...
Cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices
uniform edge
Pros:
◦ independent sampling
◦ easy to sample high degree vertices
Cons:
◦ no public OSN interface to sample edges
![Page 9: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/9.jpg)
Random walk (RW)
• start at node v
• randomly select a neighbor of v
• repeat till collected B samples
sampling with replacement
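A minimal sketch of the random walk just described, assuming the graph is given as an adjacency-list dictionary in which every vertex has at least one neighbor. The function name `random_walk` and the toy graph are hypothetical.

```python
import random

def random_walk(adj, start, budget, rng=random):
    """Return `budget` vertices visited by a simple random walk from `start`.

    Vertices may repeat: this is sampling with replacement.
    """
    v = start
    samples = []
    for _ in range(budget):
        v = rng.choice(adj[v])   # move to a uniformly chosen neighbor
        samples.append(v)
    return samples

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant vertex 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = random_walk(adj, start=0, budget=10, rng=random.Random(0))
```

Each step touches only the neighbor list of the current vertex, which is why a crawler with no global view of the network can run it.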
![Page 10: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/10.jpg)
RW sampling
Random walk sampling produces biased estimate θi^RW of θi
• θi^RW = θi x i / <average degree>
easily corrected
• θ̂i = Norm x θ̂i^RW / i (Norm: constant that makes the θ̂i sum to 1)
[Figure: CCDF]
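The correction above amounts to weighting each sampled vertex by the inverse of its degree and then normalizing. A minimal sketch, assuming the input is the list of degrees of the vertices visited by the walk; the function name `estimate_theta_from_rw` is hypothetical.

```python
from collections import defaultdict

def estimate_theta_from_rw(sampled_degrees):
    """Estimate theta[i] from the degrees of RW-sampled vertices.

    Each sample of degree d gets weight 1/d, undoing the bias toward
    high-degree vertices; the weights are then normalized to sum to 1.
    """
    weights = defaultdict(float)
    for d in sampled_degrees:
        weights[d] += 1.0 / d
    norm = sum(weights.values())
    return {d: w / norm for d, w in weights.items()}

# Degrees seen by a walk on a star with 4 leaves: it alternates
# between the centre (degree 4) and a leaf (degree 1).
est = estimate_theta_from_rw([4, 1, 4, 1, 4, 1, 4, 1])
```

In this example the raw samples are half degree 4 and half degree 1, while the corrected estimate gives one fifth and four fifths, the true fractions for a star with four leaves.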
![Page 11: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/11.jpg)
Pros & Cons
uniform vertex
Pros:
◦ independent sampling
◦ OSN needs numeric user IDs, e.g., Livejournal, Flickr, MySpace, Facebook, ...
Cons:
◦ resource intensive (sparse user ID space)
◦ difficult to sample large degree vertices
random walk
Pros:
◦ asymptotically unbiased
◦ easy to sample high degree vertices
◦ low cost resource-wise
Cons:
◦ graph must be connected
◦ large estimation errors when graph loosely connected
◦ length of transient?
![Page 12: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/12.jpg)
Hybrid sampling
• uniform vertex samples A and C subgraphs but is expensive
• RW samples A or C but is cheap
[Figure: subgraphs A and C]
Combine advantages of uniform vertex & RWs?
![Page 13: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/13.jpg)
Multiple random walks
• m independent uniformly placed RWs
• split budget B among them
Pros
• cover all components whp as m increases
Cons
• bias due to transient
• difficult to combine estimates
Couple the RWs?
![Page 14: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/14.jpg)
Frontier Sampling (FS)
m coupled walkers
B – sampling budget
S = {v1, … , vm} initial set of m vertices; E’ = ∅
(1) select vr ∈ S w.p. ∝ deg(vr)
(2) walk one step from vr
(3) add walked edge to E’ and update vr
(4) return to (1) (until m + |E’| = B)
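Steps (1) to (4) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the initial walkers are placed uniformly at random, every vertex has degree at least 1, and the function name `frontier_sampling` and the toy graph are hypothetical.

```python
import random

def frontier_sampling(adj, m, budget, seed=None):
    """Sketch of Frontier Sampling with m coupled walkers.

    Returns the list E' of walked edges; stops once m + |E'| = budget.
    Assumes every vertex in `adj` has at least one neighbor.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    # Assumption: the initial set S is drawn uniformly at random.
    walkers = [rng.choice(vertices) for _ in range(m)]
    edges = []
    while m + len(edges) < budget:
        # (1) pick a walker with probability proportional to its degree
        weights = [len(adj[v]) for v in walkers]
        r = rng.choices(range(m), weights=weights, k=1)[0]
        # (2) walk one step from the chosen walker's vertex
        u = walkers[r]
        v = rng.choice(adj[u])
        # (3) record the walked edge and update the walker's position
        edges.append((u, v))
        walkers[r] = v
    return edges

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = frontier_sampling(adj, m=2, budget=12, seed=0)
```

Only one walker moves per iteration, so the walkers share a single budget instead of each running to completion on its own share.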
![Page 15: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/15.jpg)
FS properties
• random walk on G^m
• at steady state, samples edges uniformly
• as m → ∞, walkers uniformly distributed in graph
• m coupled RWs start approximately in steady state: short transient
![Page 16: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/16.jpg)
Sample paths for θ1 estimate (Flickr graph)
Plot evolution of θ̂1(n), n - number of steps
![Page 17: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/17.jpg)
Sampling errors
• large connected component of Flickr graph
• accuracy metric: NMSE of CCDF
[Figure: NMSE vs. in-degree]
![Page 18: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/18.jpg)
Sampling errors: GAB graph
• two Albert-Barabasi graphs with average degrees 2 and 10, connected by one edge
[Figure: NMSE vs. in-degree]
![Page 19: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/19.jpg)
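Distributed FS
• m independent walkers
• walker i takes its next step after an exponentially distributed time with rate equal to the current node's degree (mean 1/degree), so the next walker to move is chosen w.p. ∝ its degree, as in FS
• walkers run for time T, report to central site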
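The distributed variant can be simulated without any coordination between walkers. The sketch below assumes the holding time at a vertex is exponential with rate equal to its degree (mean 1/degree), the choice under which the first walker to move is selected with probability proportional to degree, as in step (1) of FS. Uniform initial placement, the function names and the toy graph are also assumptions.

```python
import random

def distributed_fs_walker(adj, start, T, rng):
    """One walker: exponential holding times, runs until time T."""
    t, v, edges = 0.0, start, []
    while True:
        # Holding time with rate deg(v), i.e. mean 1/deg(v).
        t += rng.expovariate(len(adj[v]))
        if t > T:
            return edges
        u = rng.choice(adj[v])   # step to a uniformly chosen neighbor
        edges.append((v, u))
        v = u

def distributed_fs(adj, m, T, seed=None):
    """Run m independent walkers for time T and merge their reports."""
    rng = random.Random(seed)
    vertices = list(adj)
    reports = [distributed_fs_walker(adj, rng.choice(vertices), T, rng)
               for _ in range(m)]
    # The "central site" simply concatenates the walked edges.
    return [e for report in reports for e in report]

# Hypothetical toy graph: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
edges = distributed_fs(adj, m=3, T=5.0, seed=0)
```

Here the number of sampled edges is random; the time horizon T plays the role that the budget B plays in the centralized version.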
![Page 20: Network Characterization via Random Walks B. Ribeiro, D. Towsley UMass-Amherst.](https://reader036.fdocuments.net/reader036/viewer/2022062518/56649e9f5503460f94ba0f1a/html5/thumbnails/20.jpg)
Future work
• analyzing, speeding up convergence
  ◦ other forms of coupling
• other graph statistics
• study how graph structure affects sampling efficiency
  ◦ power law vs exponential tail
  ◦ spatial correlation, independence vs. SRD vs. LRD
• application to different networks
  ◦ wireless, social, wireless/social