Distributed Estimation of Graph 4-Profiles
Ethan R. Elenberg, Karthikeyan Shanmugam,Michael Borokhovich, Alexandros G. Dimakis
University of Texas, Austin, USA
April 14, 2016
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 1/23
Introduction
• Perform analytics on large graphs
- How does information spread across the Web?- Is this social network user a spam bot?- Classify a protein as helpful or harmful
• Generalize existing subgraph analysis
- Triangle counts, clustering coefficient, graphlet frequencies
• Scalable, distributed algorithms
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 2/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
? ? ? ?
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
0 ? ? ?
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
0 5 ? ?
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
0 5 5 ?
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 3-profile
• Count the induced subgraphs formed by selecting all triples ofvertices
H3H2H1H0
0 5 5 0
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 3/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]
F0
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?]
F0
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, ?, ?, ?, ?, ?, ?, ?]
F4
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ?]
F4
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, 2, ?, ?, ?, ?, ?, ?]
F4
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, 2, 0, 0, ?, ?, ?, ?]
F7
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, 2, 0, 0, 1, ?, ?, ?]
F7
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
• Similarly, count the induced subgraphs formed by selecting allsets of 4 vertices
• n(G) = [0, 0, 0, 0, 2, 0, 0, 1, 2, 0, 0]
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 4/23
Graph 4-profile
Definition
Let Ni be the number of Fi’s in a graph G. The vectorn(G) = [N0, N1, . . . , N10] is called the global 4-profile of G.
- Always sums to(|V |
4
), the total number of 4-subgraphs
Definition
For each v ∈ V , the local 4-profile counts how many times vparticipates in each Fi with 3 other vertices.
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 5/23
Graph 4-profile
Definition
Let Ni be the number of Fi’s in a graph G. The vectorn(G) = [N0, N1, . . . , N10] is called the global 4-profile of G.
- Always sums to(|V |
4
), the total number of 4-subgraphs
Definition
For each v ∈ V , the local 4-profile counts how many times vparticipates in each Fi with 3 other vertices.
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 5/23
Motivation
• Local 4-profiles embed each vertex into an 11 dimensionalfeature space
- Spam detection- Generative models
• Global 4-profile concisely describes local connectivity
- Molecule classification
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 6/23
Introduction
• Problem: Compute (or approximate) 4-profile quantities for alarge graph
• Previous approaches have 2 drawbacks
- Require global communication- Many vertices redundantly repeat the same calculations
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 7/23
Introduction
• Problem: Compute (or approximate) 4-profile quantities for alarge graph
• Previous approaches have 2 drawbacks
- Require global communication- Many vertices redundantly repeat the same calculations
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 7/23
Contributions
1 Design novel, distributed algorithm to calculate local 4-profiles
2 Derive improved concentration bounds for 4-profile sparsifiers
3 Evaluate performance on real-world datasets
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 8/23
Related Work
Well studied across several communities (graphlets, motifs,subgraph frequencies):
• Graph sub-sampling
[Kim, Vu ’00] [Tsourakakis, et al. ’08 -’11] [Jha, et al. ’15]
• Large-scale triangle counting
[Shank ’07] [Satish, et al. ’14] [Eden, et al. ’15]
• Subgraph/graphlet counting equations
[Kloks, et al. ’00] [Kowaluk, et al. ’13]Orca [Hocevar, Demsar ’14] [E. ’15] [Ahmed, et al. ’15]
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 9/23
Outline
1 Introduction
2 4-Prof-Dist Algorithm
3 4-profile SparsifierEdge Sub-sampling ProcessConcentration Bound
4 Experiments
5 Conclusions
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 9/23
4-Prof-Dist
• Message passing algorithm in the Gather-Apply-Scatterframework
- GraphLab, Pregel, Spark GraphX, etc.
- Communication only allowed between adjacent vertices
- Intermediate results stored as edge data
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 10/23
4-Prof-Dist
1 Each vertex computes its local 3-profile and triangle list
2 Each vertex solves a 17× 17 system of equations relating itslocal 4-profile to its neighbors’ local 3-profiles
- Edge Pivots [Kloks, et al. ’00] [Kowaluk, et al. ’13] [E. ’15]
- 4-clique Counting
- 2-hop Histogram
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 11/23
4-Prof-Dist
1 Each vertex computes its local 3-profile and triangle list
2 Each vertex solves a 17× 17 system of equations relating itslocal 4-profile to its neighbors’ local 3-profiles
- Edge Pivots [Kloks, et al. ’00] [Kowaluk, et al. ’13] [E. ’15]
- 4-clique Counting
- 2-hop Histogram
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 11/23
Outline
1 Introduction
2 4-Prof-Dist Algorithm
3 4-profile SparsifierEdge Sub-sampling ProcessConcentration Bound
4 Experiments
5 Conclusions
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 11/23
Edge Sub-sampling Process
• Sub-sample each edge in the graph independently withprobability p
• Relate the original and sub-sampled graphs via a 1-stepMarkov chain
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 12/23
Edge Sub-sampling Process: 4-clique
Original
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
......
p6
Sub-sampled
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Estimator
=1
p6
Sub-sampled
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 13/23
Edge Sub-sampling Process: 4-clique
Original
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
......
p6
Sub-sampled
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Estimator
=1
p6
Sub-sampled
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 13/23
Edge Sub-sampling Process: 4-clique
Original
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
......
p6
Sub-sampled
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Estimator
=1
p6
Sub-sampled
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 13/23
Edge Sub-sampling Process: 4-clique
Original
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
......
p6
p4
p5
6p 5(1− p)
3p 4(1−
p) 2p 4
(1− p)
Sub-sampled
−2.5 −2 −1.5 −1 −0.5 0 0.5 1 1.5 2 2.5−2.5
−2
−1.5
−1
−0.5
0
0.5
1
1.5
2
2.5
Estimator
=1
p6
Sub-sampled
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 13/23
Edge Sub-sampling Process
In general, construct an invertible transition matrix H
Hij = P(sub-sampled = Fi | original = Fj)
UnbiasedEstimators
= H−1
Sub-sampled4-profile
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 14/23
Main Concentration Result
• N10 - # 4-cliques
• k10 - Maximum # 4-cliques sharing a common edge
Theorem (4-clique sparsifier)
If the sampling probability
p ≥(
log(2/δ)k10
2ε2N10
)1/12
,
then the relative error is bounded by ε with probability at least1− δ.
Proof Sketch:
- 4-clique estimator is associated with a read-k10 function family
f(G, p) = e1e2e4e5e7e8 + e4e5e6e8e9e10 + . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 15/23
Main Concentration Result
• N10 - # 4-cliques
• k10 - Maximum # 4-cliques sharing a common edge
Theorem (4-clique sparsifier)
If the sampling probability
p ≥(
log(2/δ)k10
2ε2N10
)1/12
,
then the relative error is bounded by ε with probability at least1− δ.
Proof Sketch:
- 4-clique estimator is associated with a read-k10 function family
f(G, p) = e1e2e4e5e7e8 + e4e5e6e8e9e10 + . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 15/23
Outline
1 Introduction
2 4-Prof-Dist Algorithm
3 4-profile SparsifierEdge Sub-sampling ProcessConcentration Bound
4 Experiments
5 Conclusions
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 15/23
Implementation
• GraphLab PowerGraph v2.2
• Multicore server• 256 GB RAM, 72 logical cores
• EC2 cluster (Amazon Web Services)• 20 c3.8xlarge, 60 GB RAM, 32 logical cores each
Datasets
Name Vertices Edges (undirected)
WEB-NOTRE 325, 729 1, 090, 108LiveJournal 4, 846, 609 42, 851, 237
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 16/23
Implementation
• GraphLab PowerGraph v2.2
• Multicore server• 256 GB RAM, 72 logical cores
• EC2 cluster (Amazon Web Services)• 20 c3.8xlarge, 60 GB RAM, 32 logical cores each
Datasets
Name Vertices Edges (undirected)
WEB-NOTRE 325, 729 1, 090, 108LiveJournal 4, 846, 609 42, 851, 237
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 16/23
Results: AWS Full Neighborhood vs. Histogram, 10 runs
12 nodes 14 nodes 16 nodes 18 nodes 20 nodes0.0
0.5
1.0
1.5
2.0
Net
wor
kse
nt[b
ytes
]×1010 WEB-NOTRE, AWS
2-hop 2-hop histogram
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 17/23
Results: AWS Full Neighborhood vs. Histogram, 10 runs
12 nodes 14 nodes 16 nodes 18 nodes 20 nodes0
10
20
30
40
50R
unni
ngtim
e[s
ec]
WEB-NOTRE, AWS2-hop 2-hop histogram
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 18/23
Results: Concentration Bounds
0.0 0.2 0.4 0.6 0.8 1.0Edge sampling probability
100
103
106
109
1012
1015
1018
1021
1024
1027
1030
Acc
urac
y:|e
xact
-app
rox|
LiveJournal, Sparsifier Accuracy
Kim-VuRead-k4-cliquesMeasured
Improves on previous bounds if p = Ω(1/ log |E|)
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 19/23
Results: Concentration Bounds
0.0 0.2 0.4 0.6 0.8 1.0Edge sampling probability
100
103
106
109
1012
1015
1018
1021
1024
1027
1030
Acc
urac
y:|e
xact
-app
rox|
LiveJournal, Sparsifier Accuracy
Kim-VuRead-k4-cliquesMeasured
Improves on previous bounds if p = Ω(1/ log |E|)E. R. Elenberg Distributed Estimation of Graph 4-Profiles 19/23
Results: Multicore Running Time Comparison, 10 runs
10 cores 20 cores 30 cores 40 cores 50 cores 60 cores0
500
1000
1500
2000
2500R
unni
ngtim
e[s
ec]
Orca: 1288 sec
LiveJournal, Asterix
4-Prof-Dist, p = 1 vs. Orca [Hocevar, Demsar ’14]
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 20/23
Results: Multicore Running Time Comparison, 10 runs
p=1 p=0.9 p=0.8 p=0.7 p=0.6 p=0.5 p=0.4 p=0.3 p=0.2 p=0.10
100
200
300
400
500
600R
unni
ngtim
e[s
ec]
LiveJournal, Asterix
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 21/23
Results: AWS Running Time Comparison, 10 runs
8 nodes 12 nodes 16 nodes 20 nodes0
50
100
150
200R
unni
ngtim
e[s
ec]LiveJournal, AWS
p=1p=0.7
p=0.4p=0.1
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 22/23
Outline
1 Introduction
2 4-Prof-Dist Algorithm
3 4-profile SparsifierEdge Sub-sampling ProcessConcentration Bound
4 Experiments
5 Conclusions
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 22/23
Summary
1 2-hop histogram reduces network communication in adistributed setting
2 Edge sub-sampling produces fast, accurate 4-profile estimates
- Bounds for other subgraphs in the full paper
3 Distributed/parallel implementation improves performance atscale
github.com/eelenberg/4-profiles
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Results: AWS Network Communication, 10 runs
8 nodes 12 nodes 16 nodes 20 nodes0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
Net
wor
kse
nt[b
ytes
]×1011 LiveJournal, AWS
p=1p=0.7
p=0.4p=0.1
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) 2-hop Histogram
• At each vertex a, store a (p, ca[p]) pair for each neighbor p
• For any a ∈ Γ(v) and p /∈ Γ(v),
ca[p] = 1 ⇔ vap forms a 2-path
• Gather across neighbors to count the number of distinct2-paths from v to p
p
v∑p/∈Γ(v)
(⊕a∈Γ(v) ca[p]2
)F7(v) F9(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) 2-hop Histogram
• At each vertex a, store a (p, ca[p]) pair for each neighbor p
• For any a ∈ Γ(v) and p /∈ Γ(v),
ca[p] = 1 ⇔ vap forms a 2-path
• Gather across neighbors to count the number of distinct2-paths from v to p
p
v∑p/∈Γ(v)
(⊕a∈Γ(v) ca[p]2
)F7(v) F9(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) 2-hop Histogram
• At each vertex a, store a (p, ca[p]) pair for each neighbor p
• For any a ∈ Γ(v) and p /∈ Γ(v),
ca[p] = 1 ⇔ vap forms a 2-path
• Gather across neighbors to count the number of distinct2-paths from v to p
p
v∑p/∈Γ(v)
(⊕a∈Γ(v) ca[p]2
)F7(v) F9(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) 2-hop Histogram
• At each vertex a, store a (p, ca[p]) pair for each neighbor p
• For any a ∈ Γ(v) and p /∈ Γ(v),
ca[p] = 1 ⇔ vap forms a 2-path
• Gather across neighbors to count the number of distinct2-paths from v to p
p
v∑p/∈Γ(v)
(⊕a∈Γ(v) ca[p]2
)F7(v) F9(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) 2-hop Histogram
• At each vertex a, store a (p, ca[p]) pair for each neighbor p
• For any a ∈ Γ(v) and p /∈ Γ(v),
ca[p] = 1 ⇔ vap forms a 2-path
• Gather across neighbors to count the number of distinct2-paths from v to p
p
v∑p/∈Γ(v)
(⊕a∈Γ(v) ca[p]2
)F7(v) F9(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Local 3-profile
Vertex program in the Gather-Apply-Scatter framework [E. ’15]
1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)
2 For each edge va: Scatter
v an3,va = |Γ(v) ∩ Γ(a)|,
v anc2,va = |Γ(v)| − |Γ(v) ∩ Γ(a)| − 1, . . .
3 For each vertex v: Gather and Apply
v an3,v = 1
2
∑a∈Γ(v) n3,va
v anc2,v = 1
2
∑a∈Γ(v) n
c2,va, . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Local 3-profile
Vertex program in the Gather-Apply-Scatter framework [E. ’15]
1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)
2 For each edge va: Scatter
v an3,va = |Γ(v) ∩ Γ(a)|,
v anc2,va = |Γ(v)| − |Γ(v) ∩ Γ(a)| − 1, . . .
3 For each vertex v: Gather and Apply
v an3,v = 1
2
∑a∈Γ(v) n3,va
v anc2,v = 1
2
∑a∈Γ(v) n
c2,va, . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Local 3-profile
Vertex program in the Gather-Apply-Scatter framework [E. ’15]
1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)
2 For each edge va: Scatter
v an3,va = |Γ(v) ∩ Γ(a)|,
v anc2,va = |Γ(v)| − |Γ(v) ∩ Γ(a)| − 1, . . .
3 For each vertex v: Gather and Apply
v an3,v = 1
2
∑a∈Γ(v) n3,va
v anc2,v = 1
2
∑a∈Γ(v) n
c2,va, . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Local 3-profile
Vertex program in the Gather-Apply-Scatter framework [E. ’15]
1 For each vertex v: Gather and Apply vertex IDs to store Γ(v)
2 For each edge va: Scatter
v an3,va = |Γ(v) ∩ Γ(a)|,
v anc2,va = |Γ(v)| − |Γ(v) ∩ Γ(a)| − 1, . . .
3 For each vertex v: Gather and Apply
v an3,v = 1
2
∑a∈Γ(v) n3,va
v anc2,v = 1
2
∑a∈Γ(v) n
c2,va, . . .
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
H0(v) He1(v) Hd
1 (v) Hc2(v) He
2(v) H3(v)
∑a∈Γ(v)
(ne
1,va
2
)= F1(v) + F2(v)
∑a∈Γ(v)
ne1,van3,va = 2F5(v) + F
′′8 (v)
∑a∈Γ(v)
(nc
2,va
2
)= 3F
′6(v) + F
′8(v)
∑a∈Γ(v)
nc2,van
e2,va = F
′4(v) + 2F7(v)
∑a∈Γ(v)
(n3,va
2
)= F
′9(v) + 3F10(v)
∑a∈Γ(v)
nc2,van3,va = 2F
′8(v) + 2F
′9(v)
∑a∈Γ(v)
ne1,van
c2,va = 2F
′3(v) + F
′4(v) n
d1,v|Γ(v)| = F2(v) + F4(v) + F8(v)
F3(v)
F
F4(v) F6(v) F8(v) F9(v)
F′
F′′
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Edge Pivot Equations
• Relate local 4-profile at v to neighboring local 3-profiles
• Accumulate pairs of subgraph counts involving edge va,summed over all neighbors
v a
∑a∈Γ(v) n
c2,van
e2,va F
′4(v) 2F7(v)
= +
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Clique Counting
• Directly count the number of 4-cliques, F10
1 Initially, each vertex stores its triangle list ∆(v)
2 For each edge v:Gather triangles of a that contain 3 neighbors of v
b c
v a
∑a∈Γ(v) |(b, c) ∈ ∆(a) : b ∈ Γ(v), c ∈ Γ(v)|
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Clique Counting
• Directly count the number of 4-cliques, F10
1 Initially, each vertex stores its triangle list ∆(v)
2 For each edge v:Gather triangles of a that contain 3 neighbors of v
b c
v a
∑a∈Γ(v) |(b, c) ∈ ∆(a) : b ∈ Γ(v), c ∈ Γ(v)|
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Clique Counting
• Directly count the number of 4-cliques, F10
1 Initially, each vertex stores its triangle list ∆(v)
2 For each edge v:Gather triangles of a that contain 3 neighbors of v
b c
v a
∑a∈Γ(v) |(b, c) ∈ ∆(a) : b ∈ Γ(v), c ∈ Γ(v)|
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Clique Counting
• Directly count the number of 4-cliques, F10
1 Initially, each vertex stores its triangle list ∆(v)
2 For each edge v:Gather triangles of a that contain 3 neighbors of v
b c
v a
∑a∈Γ(v) |(b, c) ∈ ∆(a) : b ∈ Γ(v), c ∈ Γ(v)|
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Clique Counting
• Directly count the number of 4-cliques, F10
1 Initially, each vertex stores its triangle list ∆(v)
2 For each edge v:Gather triangles of a that contain 3 neighbors of v
b c
v a
∑a∈Γ(v) |(b, c) ∈ ∆(a) : b ∈ Γ(v), c ∈ Γ(v)|
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Concentration Result
• Because there are no large constants, our concentrationbounds improve on earlier work in practical settings
Corollary (Comparison with [Kim,Vu])
Let G be a graph with m edges. If p = Ω(1/ logm) andδ = Ω(1/m), then read-k provides better triangle sparsifier
accuracy than [Kim,Vu]. If additionally k10 ≤ N5/610 , then read-k
provides better 4-clique sparsifier accuracy than [Kim,Vu].
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23
(Backup) Results: 4-profile Sparsifier Accuracy, 10 runs
p=0.9 p=0.7 p=0.4 p=0.3 p=0.10.90
0.95
1.00
1.05
1.10A
ccur
acy:
exac
t/app
rox
LiveJournal, Accuracy, 4-profilesF10F9
F8F7
E. R. Elenberg Distributed Estimation of Graph 4-Profiles 23/23