Estimating and Sampling Graphs with Multidimensional Random...

Estimating and Sampling Graphs with Multidimensional Random Walks

Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu

Motivation ´  Complex Network

Social Network

Reference: www.forbe.com imdevsoftware.wordpress.com

Biological Network

Existing Approaches

´ Random vertex sampling

´ Random edge sampling

´ Random walks 3

Frontier Sampling

´ a new m-dimensional random walk that uses m

dependent random walkers.

Contribution

´  Mitigates the large estimation errors caused by disconnected or

loosely connected components.

´  Shows that the tail of the degree distribution is better estimated

using random edge sampling than random vertex sampling.

´  Presents asymptotically unbiased estimators 5

Definitions Notation Meaning

Gd (V, Ed) A labeled directed graph representing the (original) network graph, where V is a set of vertices and Ed is a set of edges

(u, v) A connection from u to v (a.k.a. edges)

Lv and Le Finite set of vertex and edge labels,

Le(u,v)= ∅ Edge (u, v) is unlabeled

Lv(v)= ∅ Vertex v is unlabeled 6

Vertex V.S. Edge Sampling

Section 4

1.  Mathematical theories and conductions on Random Walk Sampling 2.  Strong Law of Large Numbers 3.  Four estimators will be applied in Section 5 4.  Deficiency of RW 5.  Multiple Independent Random Walkers

Strong Law of Large Numbers

Original statistical Thm:

Weak law:

B Number of RW steps

B*(B) Number of edges in E*

E* Total RW Sampled Edges

First, label edges of interest:

So, the probability of the labelled edges

The estimator based on SLLN

Estimator 1: Edge Label Density

Estimator 2: Assortative Mixing Coefficient (AMC)

Considering directed G, an asymptotically unbiased estimator of AMC:

Covariance

Which two are highly correlated?

Estimator 3:Vertex label Density

Construct an asymptotically unbiased estimator:

Since:

So, this estimator converges to:

Estimator 4: Global Clustering Coefficient

The unbiased estimator by SLLN

Since:

Deficiency of RW from one point

1. “Trapped” inside a subgraph (MSE)

2. Start from non-stationary (non-steady) state (MSE, Bias)

Burn-in period: Discard the non-stationary samples

1. Just decrease error with non-stationary one

2. Discarding in a small sample is not ideal

Multiple Independent Random Walkers come!

Single RW and Multiple RW

Single RW is depend on sample sizes from the estimators provided. Mutiple RW will split the sample sizes into each path. So, error in the total CNMSE

Log-log plot!

Still very high

Why not M-independent RW ?

MIRW is hard to sample m independent vertex with p proportional to their degrees.

Section 5 Motivation

´ We want an m-dimensional random walk that, in steady state, samples edges uniformly at random but, unlike MultipleRW, can benefit from starting its walkers at uniformly sampled vertices.

Frontier Sampling

p = deg(𝑢)/∑𝐿↑▒deg(𝑣)  *1/deg(𝑢) =

Frontier Sampling: A m-dimensional Random Walk

´ The frontier sampling process is equivalent to the sampling process of a single random walker over 𝐺↑𝑚 . (Lemma 5.1) ´ P(selecting a vertex and its outgoing edge in FS) = P(randomly

sampling an edge from e(𝐿↓𝑛 ) in single random walker over 𝐺↑𝑚 ).

FS Steady State v.s Uniform Distribution

´  𝐾↓𝑓𝑠 (𝑚) be a random variable that denotes the number of random walkers in 𝑉↓𝐴  in steady state.

´  Let 𝐾↓𝑢𝑛 (𝑚) be a random variable that denotes the number of sampled vertices, out of m uniformly sampled vertices from V, that belong to 𝑉↓𝐴 .

´  Proving this to be true indicates that the FS algorithm starting with m random walkers at m uniformly sampled vertices approaches the steady state distribution. This means FS benefits from starting its walkers at uniformly sampled vertices by reducing transient of RW.

FS Steady State V.S. Uniform Distribution

´  By definition

´  Theorem 5.2

´  Lemma 5.3

´  Theorem 5.4

MultipleRW Steady State V.S. Uniform Distribution

´  𝐾↓𝑚𝑤 (𝑚) be a random variable that denotes the steady state number of MultipleRW random walkers in 𝑉↓𝐴 .

Note: 𝑑↓𝐴  (average degree of vertices in 𝑉↓𝐴 ) Conclusion: If we initialize m random walkers with uniformly sampled vertices, FS starts closer to steady state than MultipleRW.

Distributed Frontier Sampling

´  Frontier Sampling can also be parallelized.

´  A MultipleRW sampling process where the cost of sampling a vertex v is an exponentially distributed random variable with parameter deg(v) is equivalent to a FS process. (Theorem 5)

Experiment and Result

´ Data: ´ “Flickr”, “Livejournal”, and “YouTube” ´ Barabási-Albert [5] graph

´ Goal: ´ Compare FS to MultipleRW, SingleRM ´ Compare FS on random vertex and edge sampling

´ Result: FS is constantly more accurate

Assortative Mixing Coefficient

In-degree Distribution Estimates

Out-degree Distribution Estimates

In-degree Distribution loosely connected components

´  Barabási-Albert Graph

FS V.S. Stationary MultipleRW & SingleRW

´  MultipleRW and SingleRW start with steady state

FS V.S. Random Independent Sampling

Density of Special Interest Group

Global Clustering Coefficient Estimates

´  Global Clustering Coefficient

a measure of the degree to which nodes in a graph tend to cluster together.

Conclusion

´ In almost all of the tests, FS is better.

Future Work • estimating characteristics of dynamic networks • design of new MCMC-based approximation algorithms

Thank you!

Estimating and Sampling Graphs with Multidimensional Random...

Documents

Transcript of Estimating and Sampling Graphs with Multidimensional Random...

Estimating Multidimensional Poverty and Identifying the Poor in Pakistan… · 2016-08-02 · 1 Research Consortium on Educational Outcomes and Poverty WP10/28 RECOUP Working Paper

Multidimensional digital signal processingnicolls/lectures/eee401f/01_mdsp_slides.pdf · Multidimensional digital signal processing ... The multidimensional discrete Fourier transform

Beyond Triangles: A Distributed Framework for Estimating 3 ...dimakis/3profile_ext.pdfBeyond Triangles: A Distributed Framework for Estimating 3-proﬁles of Large Graphs Ethan R.

Multidimensional Credibility applied to estimating the frequency … · 2018-05-22 · Multidimensional Credibility applied to estimating the frequency of big claims Bühlmann Hans,

Estimating Growth when Content Specifications Change: A Multidimensional IRT Approach Mark D. Reckase Tianli Li Michigan State University.

The multidimensional Itô Integral and the multidimensional ...

Method for estimating cycle lengths from multidimensional ... · Method for estimating cycle lengths from multidimensional time series: Test cases and application to a massive “in

Estimating High-Dimensional Directed Acyclic Graphs with the PC

Annotation Graphs as a Framework for Multidimensional ...

Estimating Multidimensional Poverty and Identifying the ...ceid.educ.cam.ac.uk/publications/WP28-AN-final.pdf · Pakistan forecasts to reduce poverty by 20 per cent in next three

Introducing Multidimensional Translation 2005 – Challenges of Multidimensional Translation: Conference Proceedings Heidrun Geryzmisch-Arbogast 3 multidimensional (i.e. multilingual,

Slide 1-4-1 Section 1-4 Calculating, Estimating, and Reading Graphs.

Reliability for Multidimensional Scale 1 Estimating Reliability ...Reliability for Multidimensional Scale 5 1965), and Maximal Reliability (Li, Rosenthal, & Rubin, 1996). This paper

Brief introduction to multidimensional stochastic dominance · Brief introduction to multidimensional stochastic dominance Introduction Introduction: Multidimensional versus unidimensional

Why Multidimensional Poverty? - Ministerio de Desarrollo ... · Why Multidimensional Poverty? ... UNDP’s Multidimensional Poverty Index ... Natural generalization of FGT to multidimensional

Multidimensional Chromatography in Pesticides Analysiscdn.intechopen.com/pdfs/12953/InTech-Multidimensional... · 2018-09-25 · Multidimensional Chromatography in Pesticides Analysis

Bar Graphs Line Graphs & Picto-Graphs

Graphs. Types of graphs Line graphs Sector graphs Picture graphs Bar graphs Step graphs Column graphs DIFFERENT TYPES OF GRAPHS.

Linear Graphs Linear Graphs Linear Graphs Linear Graphs.

Graph-based analysis of kinetics on multidimensional ...€¦ · multidimensional systems, the graphs are too complex as they are. Coarse-graining procedures have been used to ex-tract