The Theory of Zeta Graphs with an Application to Random Networks Christopher Ré Stanford.

Post on 11-Dec-2015


The Theory of Zeta Graphs with an Application to Random Networks

Christopher Ré, Stanford

Social Network Data

Social network data is ubiquitous and high value.

Since 2000, many studies have examined the dynamics of these graphs: Watts-Strogatz, preferential attachment, etc.

Design new random graph models to capture some new aspect of an observed network.

The above is not the goal of this work…

What’s the matter with Erdős-Rényi?

G(N,p) does not match real-world graphs (degree distribution, diameter)

But we have a beautiful theory of G(N,p) (zero-one laws, the “movie”, threshold phenomena, …)

Much of this work is enabled by the simple, declarative definition of G(N,p).

Find an ER-like model to replace generative models for DB theory-style theorems?

May lead to rigorous hypothesis testing for these models (key question in motifs).

Which model should we study?

“At each time step, a new vertex is added. Then, with probability δ, two vertices are chosen uniformly at random and joined by an undirected edge.” – CHKNS

Many models. For this study: simple & popular.

Callway, Hopcroft, Kleinberg, Newman, Strogatz (CHKNS)
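The quoted growth rule can be sketched as a short simulation. This is a minimal sketch, not the authors' code: the function name `simulate_chkns`, the choice to pick two distinct endpoints, and the decision to keep repeated edges are all assumptions made for illustration.

```python
import random

def simulate_chkns(steps, delta, seed=None):
    """Grow a graph for `steps` time steps following the quoted CHKNS rule."""
    rng = random.Random(seed)
    num_vertices = 0
    edges = []  # undirected edges as (u, v) with u < v; repeats are kept
    for _ in range(steps):
        num_vertices += 1  # a new vertex arrives at every time step
        # With probability delta, join two uniformly chosen vertices.
        # Assumption: the endpoints are distinct, so step 1 adds no edge.
        if num_vertices >= 2 and rng.random() < delta:
            u, v = rng.sample(range(1, num_vertices + 1), 2)
            edges.append((min(u, v), max(u, v)))
    return num_vertices, edges

n, edges = simulate_chkns(steps=1000, delta=0.5, seed=0)
print(n, len(edges))
```

Vertices are numbered by arrival time, so an early vertex is eligible for an edge in many more steps than a late one. That is the arrival-order effect the talk focuses on.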

CHKNS captures one salient aspect of many models: the arrival order of a node affects its properties.

NB: Does not capture all phenomena of interest.

Zeta Graphs

Simple model to capture “arrival order”

NB: We’ll use a directed variant; all queries are binary, since it’s easier to describe.

Zeta graphs

Bare bones model to break symmetry: 1 connects to many nodes (~ log N). N connects to 1 node (in expectation).

ER-like: Edges are present independently.

Zeta graphs are a family of sets of graphs indexed by N

Fixed node set: [N] = {1,…,N} (Index ≈ arrival order)

Stochastic edge set (independent edges)
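The transcript does not preserve the edge-probability formula. The sketch below assumes that the ordered pair (i, j) is an edge with probability 1/max(i, j). That assumption is chosen because it reproduces the two facts stated above (node 1 has about log N neighbours, node N has about 1 in expectation). The function names are hypothetical.

```python
import random

def edge_prob(i, j):
    """ASSUMED edge probability 1/max(i, j); not given in the transcript."""
    return 1.0 / max(i, j)

def expected_out_degree(i, n):
    """Expected number of edges leaving node i in a graph on nodes 1..n."""
    return sum(edge_prob(i, j) for j in range(1, n + 1) if j != i)

def sample_zeta_graph(n, seed=None):
    """Sample directed edges (i, j), i != j, each present independently."""
    rng = random.Random(seed)
    return {(i, j)
            for i in range(1, n + 1)
            for j in range(1, n + 1)
            if i != j and rng.random() < edge_prob(i, j)}

n = 1000
print(expected_out_degree(1, n))  # harmonic sum H_n - 1, which grows like log n
print(expected_out_degree(n, n))  # (n - 1) / n, just under 1
print(len(sample_zeta_graph(50, seed=0)))
```

Because edges are independent, the model stays ER-like. Only the per-pair probability depends on the node indices.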

Informal Main Result

Conjunctive Graph Queries cannot distinguish between Zeta graphs and CHKNS as N → ∞.

1. Determine the Theory of Zeta Graphs

2. Show the Theory of CHKNS is sandwiched between two “slices” of Zeta Graphs.

Here, the “Theory” is the set of CQs that hold with probability 1 in the limit.

1st Technical Challenge: Graph Patterns

Our goal for this section

Given (1) a language of Boolean queries L, and (2) a family of probability models M(1), M(2), …, M(N), check whether lim_{N→∞} Pr_{M(N)}[q] = 1 for q in L.

For the talk: (1) L will be “graph patterns”: positive conjunctive queries over binary relations. (2) The family of probability models M(N) =

“Theory” Th(L,M) = { q in L : lim_{N→∞} Pr_{M(N)}[q] = 1 }

Boolean Query Answering on ER Graphs

(1) Form the “full query” Q corresponding to q.

(2) Compute the expected number of tuples, E[Q].

(3) Use Janson’s Inequality to relate E[Q] to Pr[q].
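As a worked instance of steps (1) and (2), take the triangle pattern on a directed G(N,p). The setup is an assumption for illustration: each ordered pair of distinct nodes is an edge independently with probability p, and there are no self-loops, so only tuples of three distinct nodes can answer the full query.

```latex
\[
q \;=\; \exists x\,\exists y\,\exists z\;\; E(x,y) \wedge E(y,z) \wedge E(z,x),
\qquad
Q(x,y,z) \;\text{:-}\; E(x,y),\, E(y,z),\, E(z,x)
\]
\[
\mathbb{E}\bigl[\,|Q|\,\bigr] \;=\; N(N-1)(N-2)\,p^{3},
\qquad\text{so}\qquad
\mathbb{E}\bigl[\,|Q|\,\bigr] \to \infty \iff pN \to \infty .
\]
```

Step (3) then asks whether a growing expectation actually forces Pr[q] → 1, which is where Janson’s Inequality enters.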

Recall: Classical Janson’s Inequality

A classical sufficient condition for Pr[q] → 1.

Q(c) and Q(d) properly overlap if they are not identical but share at least one identical subgoal.

see Alon & Spencer, Random Graphs

A corollary of Janson’s inequality is:
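One standard way to state this corollary follows the textbook form of Janson’s inequality. The notation (the events B_c, μ, Δ) is introduced here for illustration and may differ from the slide’s. Write B_c for the event that all subgoals of Q(c) are present, and c ∼ d when Q(c) and Q(d) properly overlap.

```latex
\[
\mu \;=\; \sum_{c} \Pr[B_c] \;=\; \mathbb{E}[Q],
\qquad
\Delta \;=\; \sum_{c \sim d} \Pr[B_c \wedge B_d]
\quad\text{(sum over ordered pairs)}
\]
\[
\Pr[\neg q] \;=\; \Pr\Bigl[\,\bigwedge_{c} \neg B_c\Bigr]
\;\le\; e^{-\mu + \Delta/2},
\qquad
\Pr[\neg q] \;\le\; e^{-\mu^{2}/(2\Delta)} \ \text{ when } \Delta \ge \mu .
\]
\[
\text{Hence: } \mu \to \infty \ \text{ and } \ \Delta = o(\mu^{2})
\;\Longrightarrow\; \Pr[q] \to 1 .
\]
```

The inequality only needs the underlying edges to be independent. It does not need them to be equally likely.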

Boolean Query Answering on ER Graphs

(1) Form the “full query” Q corresponding to q.

(2) Compute the expected number of tuples, E[Q].

(3) Use Janson’s Inequality to relate E[Q] to Pr[q].

What changes for Zeta graphs?

Computing Expectation

Multiple Valued Zeta (MVZ) Functions

Only integer s_i are used in this talk.
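The slide’s definition did not survive in the transcript. As a hedged sketch, the code below evaluates a truncated multiple zeta sum under one common convention: the sum of n_1^(-s_1) ··· n_k^(-s_k) over 1 ≤ n_1 < … < n_k ≤ N. The ordering convention and the function names are assumptions, and other references order the indices the opposite way.

```python
from itertools import combinations

def truncated_mvz(exponents, n):
    """Sum of prod(n_i ** -s_i) over 1 <= n_1 < ... < n_k <= n (ASSUMED convention)."""
    # prefix[m] is the partial sum using only indices <= m for the exponents
    # processed so far; the empty product contributes 1.
    prefix = [1.0] * (n + 1)
    for s in exponents:
        new_prefix = [0.0] * (n + 1)
        for m in range(1, n + 1):
            # Either the largest index is below m, or it is exactly m.
            new_prefix[m] = new_prefix[m - 1] + prefix[m - 1] * m ** (-s)
        prefix = new_prefix
    return prefix[n]

def brute_force_mvz(exponents, n):
    """Direct enumeration of the same sum, for cross-checking small cases."""
    total = 0.0
    for idx in combinations(range(1, n + 1), len(exponents)):
        term = 1.0
        for value, s in zip(idx, exponents):
            term *= value ** (-s)
        total += term
    return total

print(truncated_mvz((1,), 10))        # the harmonic number H_10
print(truncated_mvz((0, 1, 1), 8), brute_force_mvz((0, 1, 1), 8))
```

The dynamic program runs in time proportional to k·N, whereas the direct enumeration visits every increasing k-tuple.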

MVZs show up in some strange places…

Order Matters: Paths of Length 2

If x < y < z

If x < z < y

So in our “atoms” variables will be totally ordered.
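To see why the order matters, assume (as an illustration only, since the transcript omits the formula) that edge (i, j) is present independently with probability 1/max(i, j). The 2-path pattern E(x,y) ∧ E(y,z) then has a different probability under each ordering of its variables:

```latex
\[
x < y < z:\qquad
\Pr\bigl[E(x,y) \wedge E(y,z)\bigr]
\;=\; \frac{1}{y}\cdot\frac{1}{z}
\;=\; x^{0}\, y^{-1}\, z^{-1}
\]
\[
x < z < y:\qquad
\Pr\bigl[E(x,y) \wedge E(y,z)\bigr]
\;=\; \frac{1}{y}\cdot\frac{1}{y}
\;=\; x^{0}\, z^{0}\, y^{-2}
\]
```

Summing either expression over all index triples in the stated order gives a truncated sum of MVZ type. The exponent vector is read off in increasing order of the variables.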

(Figure labels for the two orderings: 0 1 1 and 0 0 2.)

Why Multiple-Valued Zeta (MVZ)?

Well-studied special function. We get for free:

1. Asymptotics [Costermans et al. 2005]

2. Algebraic Identities [Zudilin & Zudilin 2003]

3. Fancy sounding function (not helpful)

Asymptotic Estimates for MVZs

This is a small variation of the Costermans et al. result.

(expected # of edges)

(expected # of triangles)

(expected # of K4)
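The formulas for these three expectations are not preserved in the transcript. A rough numerical sketch can show the expected qualitative behaviour, under two assumptions: each pair {i, j} is joined with probability 1/max(i, j), and each clique is counted once, by its increasing labeling. Under those assumptions a k-clique on nodes n_1 < … < n_k has probability n_1^0 · n_2^(-1) · … · n_k^(-(k-1)).

```python
def truncated_mvz(exponents, n):
    """Sum of prod(n_i ** -s_i) over 1 <= n_1 < ... < n_k <= n (ASSUMED convention)."""
    prefix = [1.0] * (n + 1)  # empty product contributes 1
    for s in exponents:
        new_prefix = [0.0] * (n + 1)
        for m in range(1, n + 1):
            new_prefix[m] = new_prefix[m - 1] + prefix[m - 1] * m ** (-s)
        prefix = new_prefix
    return prefix[n]

def expected_cliques(k, n):
    """Expected number of k-cliques, ASSUMING Pr[edge {i, j}] = 1/max(i, j)."""
    # The j-th smallest node (0-based) is the larger endpoint of j clique edges.
    return truncated_mvz(tuple(range(k)), n)

for n in (10**2, 10**3, 10**4):
    print(n,
          expected_cliques(2, n),   # edges: grows linearly in n
          expected_cliques(3, n),   # triangles: grows slowly, like log n
          expected_cliques(4, n))   # K4: stays bounded as n grows
```

In this sketch the edge and triangle counts diverge while the K4 count stays bounded. That matches the later claim that patterns with at most one cycle appear with probability 1, while patterns with two cycles do not.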

Indicates shared identical goal

Pr[2 Paths]

Consider pairs of properly overlapping 2-paths.

The other terms are o(E[Q]^2), and since E[Q] = ω(1), Pr[Q] = 1 − o(1).

Two cycles you’re out!

(Figure: the bicycle B(r,s), drawn with an r-cycle and an s-cycle.)

1st result:

(1) For all r, s ≥ 2, Pr_{M(N)}[B(r,s)] < 1 − ε for some fixed ε > 0 as N → ∞, i.e., no bicycles.

(2) Any connected graph q with at most one cycle appears with probability 1.

Two parts: (A) for any individual pattern, check the expectation E, and (B) different “orderings” are non-negatively correlated.

Back to CHKNS

Central Message

How different is CHKNS from the family of Zeta graphs?

Up to CQs, the answer is not at all.

Key Technical Issues

1. CHKNS edge probabilities have a painful form.
   - But they can be sandwiched by “Zeta slices”.

2. CHKNS edges are correlated!
   - Develop bounds on the correlations.

3. Show that CHKNS can be essentially embedded in a part of Zeta graphs.

Goal: Establish that Th(“Graph Patterns”, CHKNS ) = Th(“Graph Patterns”, Zeta Graphs)

Other Related Work

Graph Models. Huge amounts. Volumes!

[Lynch 05]: Conditions on a skewed degree distribution, but symmetrizes labels.

• Proves a 0-1 law for all of FO!

• Zeta graphs and CHKNS have no 0-1 law.

• Inspired by this paper!

Future Work & Conclusion

“Conjunctive” theory of simple random graph models with order.

• Does a simpler model capture CHKNS?

• Could one capture Albert & Barabási’s preferential attachment model?

• Richer Languages?

Expectations for Ordered Graphs

Since the model is sensitive to order, consider graph patterns with an order among the variables.

Then the expectation has a semi-closed form.

This function has an MVZ

Computing Expectations of General CQs

If variables in Q are totally ordered, then we can compute E[Q] using MVZs.

Obvious algorithm: given a query, add equalities and inequalities among its variables in all possible ways.

This takes time exponential in the size of Q (the problem is #P-hard).
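The enumeration step of the obvious algorithm can be sketched as follows. Each refinement of a query groups its variables into blocks of equal variables and puts the blocks in a strict order. The generator name `weak_orderings` and the list-of-blocks representation are assumptions for illustration, not taken from the talk.

```python
def weak_orderings(variables):
    """Yield every way to arrange `variables` into ordered blocks of equal values."""
    if not variables:
        yield []
        return
    first, rest = variables[0], variables[1:]
    for ordering in weak_orderings(rest):
        # Option 1: make `first` equal to the variables of an existing block.
        for i, block in enumerate(ordering):
            yield ordering[:i] + [block + [first]] + ordering[i + 1:]
        # Option 2: give `first` its own block at any position in the order.
        for i in range(len(ordering) + 1):
            yield ordering[:i] + [[first]] + ordering[i:]

def render(ordering):
    """Format an ordering such as [['x'], ['y', 'z']] as 'x < y = z'."""
    return " < ".join(" = ".join(block) for block in ordering)

for ordering in weak_orderings(["x", "y", "z"]):
    print(render(ordering))
```

Two variables give 3 refinements, three give 13, and four give 75. The count grows faster than exponentially in the number of variables, so this enumeration is only practical for small patterns.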