Community Detection in Networks: SDP relaxation and Computational Gaps
Yihong Wu
Department of ECE, University of Illinois at Urbana-Champaign
Joint work with Bruce Hajek (Illinois) and Jiaming Xu (Wharton)
May 20, 2015
Community detection in networks
• Networks with community structures arise in many applications
Santa Fe Institute Collaboration network [Girvan-Newman ’02]
• Task: Discover underlying communities based on the network topology
• Applications: Friend or movie recommendation in online social networks
Yihong Wu (Illinois) Community Detection 2
Statistical and computational challenges
• The observed network is sparse
• Large solution space
Question
• Is there a computationally efficient and statistically optimal community detection algorithm?
Stochastic block model [Holland et al. ’83]
Planted partition model [Condon-Karp ’01]
n = 40, K = 10, r = 3
p = 0.9
q = 0.1
Exact recovery
• True cluster: $C^*$
• Estimated cluster: $\hat{C}$
• Goal: exact recovery (strong consistency)
$$\mathbb{P}\{\hat{C} = C^*\} \xrightarrow{n\to\infty} 1$$
• Alternatives
  - almost exact recovery (weak consistency): [Mossel-Neeman-Sly ’13, Abbe-Sandon ’15, Montanari ’15]...
  - correlated recovery: [Decelle-Krzakala-Moore-Zdeborova ’11, Mossel-Neeman-Sly ’12, Massoulie ’13]...
Objectives of this talk
• Statistical limit: When is exact recovery possible (impossible)?
• Computational limit: When is exact recovery computationally easy (hard)?
Remainder of the talk
1. Linear community size: Sharp recovery via semidefinite programming
2. Sublinear community size: Computational lower bounds
Two equal-sized communities
Binary symmetric SBM
Model:
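• $n$ nodes partitioned into two communities of size $n/2$ ($\sigma_i = \pm 1$).
• $i \sim j$ independently w.p.
$$\begin{cases} p = \dfrac{a \log n}{n} & \sigma_i = \sigma_j \\[4pt] q = \dfrac{b \log n}{n} & \sigma_i \neq \sigma_j \end{cases}$$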
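To make the model concrete, here is a minimal sketch of a sampler for the binary symmetric SBM described above. It is not taken from the talk: the function name `sample_binary_sbm`, the seed handling, and the convention that the first n/2 nodes carry label +1 are assumptions made purely for illustration.

```python
import math
import random

def sample_binary_sbm(n, a, b, seed=0):
    """Sample labels and an adjacency matrix from the binary symmetric SBM.

    Nodes 0..n/2-1 get label +1 and the rest get -1 (illustrative convention).
    Edge probabilities: p = a*log(n)/n within a community, q = b*log(n)/n across.
    """
    if n % 2 != 0:
        raise ValueError("n must be even for two equal-sized communities")
    p = a * math.log(n) / n
    q = b * math.log(n) / n
    if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
        raise ValueError("a*log(n)/n and b*log(n)/n must lie in [0, 1]")
    rng = random.Random(seed)
    sigma = [1] * (n // 2) + [-1] * (n // 2)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if sigma[i] == sigma[j] else q
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1  # undirected edge, no self-loops
    return sigma, A

sigma, A = sample_binary_sbm(n=100, a=9.0, b=1.0)
```

The sampler draws each pair once and mirrors it, so the matrix is symmetric with a zero diagonal, which is what the later inner products with A assume.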
MLE ⇔ MIN BISECTION
Assuming p > q
• Maximum likelihood estimator (MLE)
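$$\max_{\sigma}\ \langle A, \sigma\sigma^\top \rangle \quad \text{s.t. } \sigma_i \in \{\pm 1\},\ i \in [n], \quad \sigma^\top \mathbf{1} = 0$$
$\Longleftrightarrow$ (lift)
$$\max_{Y}\ \langle A, Y \rangle \quad \text{s.t. } \operatorname{rank}(Y) = 1, \quad Y_{ii} = 1,\ i \in [n], \quad \langle J, Y \rangle = 0$$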
where J = all-one matrix
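The equivalence with MIN BISECTION in the slide title can be spelled out with a short calculation. This is a worked sketch, not a step shown on the slide; $\mathrm{cut}(\sigma)$ is notation introduced here for the number of edges crossing the partition, and $A$ is assumed symmetric with zero diagonal.

```latex
\begin{align*}
\mathrm{cut}(\sigma)
  &= \sum_{i<j} A_{ij}\,\mathbf{1}\{\sigma_i \neq \sigma_j\}
   = \sum_{i<j} A_{ij}\,\frac{1-\sigma_i\sigma_j}{2}
   = \frac{1}{4}\left(\mathbf{1}^\top A \mathbf{1} - \sigma^\top A \sigma\right), \\
\langle A, \sigma\sigma^\top \rangle
  &= \sigma^\top A \sigma
   = \mathbf{1}^\top A \mathbf{1} - 4\,\mathrm{cut}(\sigma).
\end{align*}
```

Since $\mathbf{1}^\top A \mathbf{1}$ does not depend on $\sigma$, maximizing $\langle A, \sigma\sigma^\top\rangle$ over balanced $\sigma$ is the same as minimizing the cut over bisections. For the lifted variable $Y = \sigma\sigma^\top$, the constraints follow from $Y_{ii} = \sigma_i^2 = 1$ and $\langle J, Y \rangle = (\mathbf{1}^\top \sigma)^2 = 0$.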
SDP relaxation
• Semidefinite programming (SDP) relaxation of MLE
$$Y_{\mathrm{SDP}} = \arg\max_{Y}\ \langle A, Y \rangle \quad \text{s.t. } Y \succeq 0, \quad Y_{ii} = 1,\ i \in [n], \quad \langle J, Y \rangle = 0$$
• similar SDP as in [Frieze-Jerrum ’95] for MAX BISECTION
• average-case analysis on generative model (SBM)
• focus on arg max rather than approximating max
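• goal: $\mathbb{P}\left\{ Y_{\mathrm{SDP}} = \begin{bmatrix} +1 & -1 \\ -1 & +1 \end{bmatrix} \right\} \to 1$ (constant blocks, i.e. $Y_{\mathrm{SDP}} = \sigma^*(\sigma^*)^\top$)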
Optimal recovery via SDP
Theorem (Abbe-Bandeira-Hall ’14, Mossel-Neeman-Sly ’14)
• If $(\sqrt{a}-\sqrt{b})^2 > 2$, recovery is achievable in polynomial time.
• If $(\sqrt{a}-\sqrt{b})^2 < 2$, recovery is impossible.
Theorem (Hajek-W.-Xu ’14)
SDP achieves the optimal recovery threshold $(\sqrt{a}-\sqrt{b})^2 > 2$.
Remarks
• originally conjectured in [Abbe-Bandeira-Hall ’14]
• independently proved by [Bandeira ’15]
• $\mathbb{P}\left\{ Y_{\mathrm{SDP}} = \begin{bmatrix} +1 & -1 \\ -1 & +1 \end{bmatrix} \right\} = 1 - n^{-\Omega(1)}$
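As a quick illustration of how the threshold in the theorems is evaluated, the sketch below classifies a pair (a, b). The function name `exact_recovery_regime` and the returned labels are hypothetical; the boundary case $(\sqrt{a}-\sqrt{b})^2 = 2$ is reported separately because the statements above do not cover it.

```python
import math

def exact_recovery_regime(a, b):
    """Classify (a, b) for the binary symmetric SBM with p = a log n / n, q = b log n / n."""
    gap = (math.sqrt(a) - math.sqrt(b)) ** 2
    if gap > 2:
        return "achievable"   # exact recovery possible (and attained by the SDP)
    if gap < 2:
        return "impossible"   # no estimator recovers the partition exactly
    return "boundary"         # not covered by the statements above

print(exact_recovery_regime(9.0, 1.0))  # (3 - 1)^2 = 4 > 2  -> achievable
print(exact_recovery_regime(4.0, 1.0))  # (2 - 1)^2 = 1 < 2  -> impossible
```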
Dual certificate
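Primal problem and dual variables:
$$\max_{Y}\ \langle A, Y \rangle \quad \text{s.t.} \quad
\begin{array}{ll}
Y \succeq 0 & \quad (\text{dual variable } S \succeq 0) \\
Y_{ii} = 1 & \quad (D = \operatorname{diag}\{d_i\}) \\
\langle J, Y \rangle = 0 & \quad (\lambda \in \mathbb{R})
\end{array}$$
Lemma
$Y^* = \sigma^*(\sigma^*)^\top$ is the unique solution if $\exists\, D, \lambda$ s.t. $S = \lambda J + D - A$ satisfies $S\sigma^* = 0$ and $\lambda_2(S) > 0$.
$$\Rightarrow\ d_i = (\text{\# of nbrs in own cluster}) - (\text{\# of nbrs in other cluster}) = \begin{cases} e(i, C_1) - e(i, C_2) & i \in C_1 \\ e(i, C_2) - e(i, C_1) & i \in C_2 \end{cases}$$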
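The certificate in the Lemma can be checked numerically on a small graph. The sketch below is an illustration under stated assumptions, not the authors' code: the helper names (`dual_certificate`, `is_positive_definite`, `certifies`) are hypothetical, and the eigenvalue condition is tested indirectly. When $S\sigma^* = 0$, the matrix $S + \sigma^*(\sigma^*)^\top$ is positive definite exactly when $S \succeq 0$ with $\lambda_2(S) > 0$, and positive definiteness is tested with a plain Cholesky factorization.

```python
import math

def dual_certificate(A, sigma, lam):
    """Build S = lam*J + D - A with d_i = sigma_i * sum_j A_ij * sigma_j."""
    n = len(A)
    # d_i = (# neighbors in own cluster) - (# neighbors in the other cluster)
    d = [sigma[i] * sum(A[i][j] * sigma[j] for j in range(n)) for i in range(n)]
    S = [[lam - A[i][j] + (d[i] if i == j else 0) for j in range(n)] for i in range(n)]
    return d, S

def is_positive_definite(M, tol=1e-9):
    """Cholesky test: all pivots exceed tol iff M is (numerically) positive definite."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def certifies(A, sigma, lam):
    """Check S sigma = 0 and lambda_2(S) > 0 (with S PSD) for candidate labels sigma."""
    n = len(A)
    _, S = dual_certificate(A, sigma, lam)
    s_sigma = [sum(S[i][j] * sigma[j] for j in range(n)) for i in range(n)]
    kernel_ok = all(abs(v) < 1e-9 for v in s_sigma)
    # Shift the zero eigenvalue along sigma away from zero, then test definiteness.
    shifted = [[S[i][j] + sigma[i] * sigma[j] for j in range(n)] for i in range(n)]
    return kernel_ok and is_positive_definite(shifted)

# Usage on a noiseless instance: two disjoint cliques of size 4.
sigma = [1, 1, 1, 1, -1, -1, -1, -1]
A = [[1 if i != j and sigma[i] == sigma[j] else 0 for j in range(8)] for i in range(8)]
print(certifies(A, sigma, lam=0.5))  # True: d_i = 3 for every node
```

On this instance $S + \sigma\sigma^\top = 4I + \tfrac{1}{2}\sigma\sigma^\top$, so the check succeeds. On a complete bipartite graph between the two groups every $d_i$ is negative and the check fails, in line with the necessity remark later in the talk.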
Verify PSD
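• Mean adjacency matrix: $\mathbb{E}[A] = \frac{p+q}{2} J + \frac{p-q}{2}\, \sigma^*(\sigma^*)^\top - pI$
• $S = \lambda J - A + D = \left(\lambda - \frac{p+q}{2}\right) J - \frac{p-q}{2}\, \sigma^*(\sigma^*)^\top + pI + D - (A - \mathbb{E}[A])$
• $\lambda_2(S) = \inf_{x \perp \sigma^*,\ \|x\|_2 = 1} x^\top S x > 0$ if $\min_i d_i \geq \|A - \mathbb{E}[A]\|$ and $\lambda \geq (p+q)/2$
• To finish the proof:
  1. $\min_i d_i = \Omega_P(\log n)$ if $\sqrt{a} - \sqrt{b} > \sqrt{2}$
  2. $\|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$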
Remarks
1. Necessity: $\sqrt{a} - \sqrt{b} < \sqrt{2}$
   ⇒ $\min_i d_i < 0$ w.h.p.
   ⇒ $\exists\, i$ : # of nbrs in own cluster < # of nbrs in other cluster
   ⇒ MLE fails
2. Proof of $\|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$
   - 2nd-order stochastic dominance argument [Tomozei-Massoulie ’14] + result for iid matrix [Seginer ’00]
   - [Feige-Ofek ’05]: $G(n, \frac{C \log n}{n})$ for sufficiently large $C$
   - [Bandeira-van Handel ’14]: comparison argument
Multiple equal-sized communities
r equal-sized clusters
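• $\{0,1\}$-cluster matrix (all-ones diagonal blocks, zeros elsewhere):
$$Y^* = \sum_{k=1}^{r} \xi_k \xi_k^\top = \begin{bmatrix} \mathbf{1} & & 0 \\ & \ddots & \\ 0 & & \mathbf{1} \end{bmatrix}$$
where $\xi_k$ = indicator of the $k$th cluster of size $K = n/r$
• SDP relaxation of MLE:
$$\max_{Y}\ \langle A, Y \rangle \quad \text{s.t. } Y \succeq 0, \quad Y_{ii} = 1, \quad Y_{ij} \geq 0, \quad \sum_{j} Y_{ij} = K$$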
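A small sketch can confirm that the cluster matrix $Y^*$ is feasible for this relaxation. The helper `cluster_matrix` and the choice of contiguous clusters are assumptions for illustration only. The entrywise constraints are checked directly; positive semidefiniteness holds because $Y^*$ is a sum of outer products $\xi_k \xi_k^\top$.

```python
def cluster_matrix(n, r):
    """Return (Y*, K) where Y* = sum_k xi_k xi_k^T for r clusters of size K = n / r."""
    if n % r != 0:
        raise ValueError("r must divide n")
    K = n // r
    label = [i // K for i in range(n)]  # nodes 0..K-1 form cluster 0, and so on
    Y = [[1 if label[i] == label[j] else 0 for j in range(n)] for i in range(n)]
    return Y, K

Y, K = cluster_matrix(n=12, r=3)
assert all(Y[i][i] == 1 for i in range(12))    # Y_ii = 1
assert all(v >= 0 for row in Y for v in row)   # Y_ij >= 0
assert all(sum(row) == K for row in Y)         # sum_j Y_ij = K
```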
Optimality of SDP
Theorem ([Hajek-W.-Xu ’15])
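SDP achieves optimal threshold $(\sqrt{a}-\sqrt{b})^2 > r$.
Proof of correctness:
$$\max_{Y}\ \langle A, Y \rangle \quad \text{s.t.} \quad
\begin{array}{ll}
Y \succeq 0 & \quad (\text{dual variable } S \succeq 0) \\
Y_{ii} = 1 & \quad (d_i) \\
Y_{ij} \geq 0 & \quad (B \geq 0) \\
\sum_{j} Y_{ij} = K & \quad (\lambda_i)
\end{array}$$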
Construction of the dual witness
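• For node $i \in C_k$,
$$\lambda_i = \frac{1}{K}\left( \max_{\ell \neq k} e(i, C_\ell) - \frac{Kq}{2} + \frac{\sqrt{\log n}}{2} \right)$$
$$d_i = e(i, C_k) - \max_{\ell \neq k} e(i, C_\ell) - \frac{1}{K} \sum_{j \in C_k} \max_{\ell \neq k} e(j, C_\ell) + Kq - \sqrt{\log n}$$
• $B$ has zero diagonal blocks, and each off-diagonal block is rank-2, specified for $i \in C_k$, $j \in C_{k'}$ by
$$B_{C_k \times C_{k'}}(i, j) = \frac{1}{K}\left( \max_{\ell \neq k} e(i, C_\ell) - e(i, C_{k'}) + \max_{\ell \neq k'} e(j, C_\ell) - e(j, C_k) + \frac{e(C_k, C_{k'})}{K} - Kq + \sqrt{\log n} \right)$$
• $S = D - A - B + \lambda \mathbf{1}^\top + \mathbf{1} \lambda^\top$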
• $S\xi_k = 0$ for $k = 1, \dots, r$.
• $\lambda_{r+1}(S) > 0$ if $\min_i d_i \geq \|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$
• $d_i$ = (# of nbrs in own cluster) − maximal (# of nbrs in other clusters) + $O_P(\sqrt{\log n})$.
• Sharp threshold
  - $\sqrt{a} - \sqrt{b} > \sqrt{r}$ ⇒ $\min_i d_i = \Omega(\log n)$ ⇒ SDP succeeds
  - $\sqrt{a} - \sqrt{b} < \sqrt{r}$ ⇒ $\min_i d_i = -\Omega(\log n)$ ⇒ MLE fails
Unequal-sized clusters
Two unequal-sized clusters: known size
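[Figure: block connectivity matrix with edge probability p inside each of the two clusters and q between them]
Two clusters of size $K$ and $n-K$ ($K = \rho n$):
$$Y_{\mathrm{SDP}} = \arg\max_{Y}\ \langle A, Y \rangle \quad \text{s.t. } Y \succeq 0, \quad Y_{ii} = 1,\ i \in [n], \quad \langle J, Y \rangle = (2K-n)^2$$
achieves optimal threshold $\eta(\rho, a, b) > 1$.
Note: $\rho \mapsto \eta(\rho, a, b)$ is minimized at $\eta(1/2, a, b) = \frac{1}{2}(\sqrt{a}-\sqrt{b})^2$ ⇒ “suggests” equal-sized case is the hardest for two communities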
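The right-hand side $(2K-n)^2$ of the size constraint can be read off from the true labeling. The short derivation below is a sketch added for clarity, with $\sigma^*$ taken to have $K$ entries equal to $+1$ and $n-K$ entries equal to $-1$.

```latex
\begin{align*}
\langle J, \sigma^*(\sigma^*)^\top \rangle
  = \mathbf{1}^\top \sigma^* (\sigma^*)^\top \mathbf{1}
  = \left(\mathbf{1}^\top \sigma^*\right)^2
  = \bigl(K - (n-K)\bigr)^2
  = (2K-n)^2 .
\end{align*}
```

For $K = n/2$ this reduces to the constraint $\langle J, Y \rangle = 0$ used in the equal-sized case.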
Two unequal-sized clusters: unknown size
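[Figure: block connectivity matrix with edge probability p inside each of the two clusters and q between them]
Two clusters of size $K$ and $n-K$ ($K = 0, 1, \dots, n$):
$$Y_{\mathrm{SDP}} = \arg\max_{Y}\ \langle A, Y \rangle - \lambda \langle J, Y \rangle \quad \text{s.t. } Y \succeq 0, \quad Y_{ii} = 1,\ i \in [n]$$
with $\lambda = \frac{a-b}{\log a - \log b} \cdot \frac{\log n}{n}$ achieves optimal threshold $(\sqrt{a}-\sqrt{b})^2 > 2$.
Note: If $K = \Omega(n)$, there exists a data-driven choice of $\lambda$.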
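The penalty weight above is straightforward to evaluate. The sketch below uses a hypothetical helper `penalty_lambda` and example values of n, a, b chosen only for illustration. The factor $(a-b)/(\log a - \log b)$ is the logarithmic mean of a and b, which lies strictly between them, so the resulting $\lambda$ sits between q and p.

```python
import math

def penalty_lambda(a, b, n):
    """lambda = (a - b) / (log a - log b) * log(n) / n, for a > b > 0."""
    if not (a > b > 0):
        raise ValueError("expected a > b > 0")
    return (a - b) / (math.log(a) - math.log(b)) * math.log(n) / n

n, a, b = 1000, 9.0, 1.0
p = a * math.log(n) / n
q = b * math.log(n) / n
lam = penalty_lambda(a, b, n)
print(q < lam < p)  # True
```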
More generally...
• Binary censored block model: $G(n, \frac{a \log n}{n})$, observe edge label flipped w.p. $\epsilon$
  - SDP achieves sharp threshold $a(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 > 1$
  - Closes the gap in [Abbe-Bandeira-Bracher-Singer ’14]
• General SBM:
  - Optimality of SDP relaxation remains open (but within a factor of 4)
  - Sharp threshold is found in [Abbe-Sandon ’15] via a two-stage procedure.
Detecting a single cluster
Finding a single community
[Figure: block connectivity matrix with edge probability p inside the cluster and q everywhere else]
• One cluster of size K plus n−K outliers
• Connectivity p within cluster and q otherwise
• Also known as Planted Dense Subgraph model
• Linear community size: $K = \rho n$ and SDP achieves sharp threshold
• Next focus on $K = \Theta(n^{\beta})$.
Conjecture on computational limit
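[Figure: phase diagram in the $(\alpha, \beta)$ plane for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$; axis ticks at 1/2 and 1; regions labeled “impossible” and “easy”, with a boundary labeled “spectral barrier”]
Conjecture [Chen-Xu ’14]: no polynomial-time algorithm succeeds beyond the spectral barrier [Nadakuditi-Newman ’12]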
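$A = \mathbb{E}[A] + (A - \mathbb{E}[A])$, where $\mathbb{E}[A]$ has entries $p$ on the $K \times K$ cluster block and $q$ elsewhere
[Figure: histogram of the eigenvalue distribution of $(A - q\mathbf{1}\mathbf{1}^\top)/\sigma$ for $\sigma = \sqrt{q(1-q)n}$; the bulk is labeled “semi-circle law” and a separate value is marked at $K(p-q)/\sigma$]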
Planted clique hardness hypothesis
$H_0$ : Bern($\gamma$) vs $H_1$ : Bern(1)
[Figure: under $H_1$, a $K \times K$ block of Bern(1) entries is planted in a Bern($\gamma$) matrix]
Intermediate regime: $\log n \ll K \ll \sqrt{n}$, $\gamma = \Theta(1)$
• detection is possible but believed to have high computational complexity: [Alon et al. ’11] [Feldman et al. ’13] [Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]
• various hardness results assuming Planted Clique hardness
  - detecting sparse principal component [Berthet-Rigollet ’13]: $\gamma = \frac{1}{2}$
  - detecting sparse submatrix [Ma-W. ’13, Cai-Liang-Rakhlin ’15]: $\gamma = \frac{1}{2}$
  - cryptography [Applebaum-Barak-Wigderson ’10]: $\gamma = 2^{-\log^{0.99} n}$
Hard regime for recovering a single cluster
Assuming Planted Clique hardness for any constant γ > 0
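[Figure: phase diagram in the $(\alpha, \beta)$ plane for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$; axis ticks at 1/2, 2/3 and 1; regions labeled “impossible”, “easy” and “hard” (the hard region is shown in red)]
Recovering a single cluster in the red regime is at least as hard as detecting a clique of size $K = o(\sqrt{n})$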
Proof step 1: Recovery is harder than detection
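Recovery versus Detection [Arias-Castro-Verzelen ’14]:
$H_0$ : Bern($q$) vs $H_1$ : Bern($p$)
[Figure: under $H_1$, entries in the $S \times S$ block are Bern($p$) and all other entries are Bern($q$)]
Each node is included in $S$ with probability $K/n$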
Proof step 2: Hardness for detecting a single cluster
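[Figure: phase diagram in the $(\alpha, \beta)$ plane for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$; axis ticks at 1/2, 2/3 and 1; regions labeled “impossible”, “easy” and “hard” (the hard region is shown in red)]
• Detecting a single cluster in the red regime is at least as hard as detecting a clique of size $K = o(\sqrt{n})$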
• Reduced from Planted Clique detection in polynomial time
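[Figure: the reduction $h$ maps a Planted Clique instance $A_{n \times n}$ ($H_0$: Bern($\gamma$) vs $H_1$: a $k \times k$ clique planted in Bern($\gamma$)) to a Planted Dense Subgraph instance $\tilde{A}_{N \times N}$ ($H_0$: Bern($q$) vs $H_1$: a $K \times K$ block of Bern($p$) planted in Bern($q$))]
$h : A \mapsto \tilde{A}$ is agnostic to the clique and can be computed in polynomial time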
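Given an integer $\ell$ and two probability distributions $P, Q$ on $\{0, 1, \dots, \ell^2\}$:
• Split each node into $\ell$ new nodes: $N = n\ell$, $K = k\ell$
• Assign edges with distributions $P, Q$: each pair of original nodes becomes an $\ell \times \ell$ block of new node pairs, with $0 \mapsto Q$ (original non-edge) and $1 \mapsto P$ (original edge)
• $H_0$ : Bern($\gamma$) $\mapsto$ $(1-\gamma)Q + \gamma P$
• $H_1$ : Bern(1) (in-clique) $\mapsto$ $P$ (in-cluster)
How to choose $P, Q$?
• Matching $H_0$: $(1-\gamma)Q + \gamma P = \mathrm{Binom}(\ell^2, q)$
• Matching $H_1$ approximately: $P \approx \mathrm{Binom}(\ell^2, p)$ in total variation
• Main effort: the law of the resulting graph is close to SBM in total variation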
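The two matching conditions can be explored numerically. The slides do not spell out how $P$ and $Q$ are built, so the construction below is an illustrative guess rather than the authors' definition: $P$ copies $\mathrm{Binom}(\ell^2, p)$ for counts up to $m_0 = \lfloor \log_2(1/\gamma) \rfloor$ (the cutoff that appears in the lemma later in the talk), carries all of the target mass above $m_0$, and puts the remaining mass at 0. $Q$ is then forced by the $H_0$ constraint. The names `binom_pmf` and `build_P_Q`, and the assumption $p = 2q$, are hypothetical choices for this sketch.

```python
import math

def binom_pmf(m, N, prob):
    """Probability that a Binom(N, prob) variable equals m."""
    return math.comb(N, m) * prob**m * (1 - prob) ** (N - m)

def build_P_Q(ell, q, gamma):
    """One candidate (P, Q) on {0..ell^2} with (1-gamma) Q + gamma P = Binom(ell^2, q).

    Assumes p = 2q and 0 < gamma <= 1/2. Nonnegativity of P and Q is NOT
    guaranteed for arbitrary inputs and should be checked by the caller.
    """
    N = ell * ell
    p = 2 * q
    m0 = math.floor(math.log2(1 / gamma))
    target = [binom_pmf(m, N, q) for m in range(N + 1)]  # Binom(ell^2, q)
    ideal = [binom_pmf(m, N, p) for m in range(N + 1)]   # Binom(ell^2, p)
    P = [0.0] * (N + 1)
    Q = [0.0] * (N + 1)
    for m in range(1, N + 1):
        if m <= m0:
            # Small counts: P copies Binom(ell^2, p); Q takes what is left.
            P[m] = ideal[m]
            Q[m] = (target[m] - gamma * P[m]) / (1 - gamma)
        else:
            # Large counts: P alone carries the target mass, so Q[m] stays 0.
            P[m] = target[m] / gamma
    # Put the remaining mass at 0 so that P sums to one; Q[0] follows from H0.
    P[0] = 1.0 - sum(P[1:])
    Q[0] = (target[0] - gamma * P[0]) / (1 - gamma)
    return P, Q, target

P, Q, target = build_P_Q(ell=2, q=1 / 64, gamma=0.5)
tv = 0.5 * sum(abs(Pm - binom_pmf(m, 4, 2 / 64)) for m, Pm in enumerate(P))
print(min(P) >= 0, min(Q) >= 0, tv)
```

The printed total variation distance measures how far this $P$ is from $\mathrm{Binom}(\ell^2, p)$, which is the quantity the "matching $H_1$ approximately" bullet asks to keep small.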
Concluding remarks
• Versatility of SDP as a simple, general purpose, computationally feasible methodology for community detection
• Construction of dual witness lacks a general recipe
[Figure: phase diagram in the $(\alpha, \beta)$ plane for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$; axis ticks at 1/2, 2/3 and 1; regions labeled “impossible”, “easy” and “hard”, plus a region marked “?”]
References
• B. Hajek, Y. W. & J. Xu (2014). Computational lower bounds for community detection on random graphs. arXiv:1406.6625 (COLT ’15)
• B. Hajek, Y. W. & J. Xu (2014). Achieving exact cluster recovery threshold via semidefinite programming. arXiv:1412.6156
• B. Hajek, Y. W. & J. Xu (2015). Achieving exact cluster recovery threshold via semidefinite programming: Extensions. arXiv:1502.07738
Formal statement of hardness of detecting a cluster
γ: edge probability in Planted Clique
Theorem
Assume the Planted Clique Hypothesis holds for all $0 < \gamma \leq 1/2$. Let $\alpha > 0$ and $0 < \beta < 1$ be such that
$$\alpha < \beta < \frac{1}{2} + \frac{\alpha}{4}.$$
Then there exists a sequence $\{(N_\ell, K_\ell, q_\ell)\}_{\ell \in \mathbb{N}}$ satisfying $\lim_{\ell\to\infty} \frac{-\log q_\ell}{\log N_\ell} = \alpha$ and $\lim_{\ell\to\infty} \frac{\log K_\ell}{\log N_\ell} = \beta$ such that for any sequence of randomized polynomial-time tests $\{\phi_\ell\}$ for the PDS$(N_\ell, K_\ell, 2q_\ell, q_\ell)$ problem, the Type-I+II error probability is lower bounded by 1.
Proof ideas: Reduce from Planted Clique in polynomial time.
Map approximately:
• $G(n, \gamma) \mapsto G(N, q)$
• $G(n, k, 1, \gamma) \mapsto G(N, K, p, q)$
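The parameter condition in the theorem is easy to test for a given pair of exponents. The function name `in_hard_regime` below is hypothetical, and the sketch only encodes the inequality as stated. Note that $\alpha < \frac{1}{2} + \frac{\alpha}{4}$ requires $\alpha < 2/3$, which appears to match the tick at 2/3 in the phase diagram.

```python
def in_hard_regime(alpha, beta):
    """True if alpha > 0, 0 < beta < 1 and alpha < beta < 1/2 + alpha/4."""
    return alpha > 0 and 0 < beta < 1 and alpha < beta < 0.5 + alpha / 4

print(in_hard_regime(0.4, 0.55))  # True:  0.4 < 0.55 < 0.6
print(in_hard_regime(0.4, 0.65))  # False: 0.65 exceeds 1/2 + alpha/4 = 0.6
print(in_hard_regime(0.7, 0.75))  # False: no beta qualifies once alpha >= 2/3
```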
Bound the total variation distance
Lemma
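Let $\ell, n \in \mathbb{N}$, $k \in [n]$ and $\gamma \in (0, \frac{1}{2}]$. Let $N = \ell n$, $K = k\ell$, $p = 2q$ and $m_0 = \lfloor \log_2(1/\gamma) \rfloor$. Assume that $16 q \ell^2 \leq 1$ and $k \geq 6e\ell$. If $G \sim G(n, \gamma)$, then $\tilde{G} \sim G(N, q)$. If $G \sim G(n, k, 1, \gamma)$, then
$$d_{\mathrm{TV}}\left(P_{\tilde{G}},\, G(N, K, p, q)\right) \lesssim e^{-K} + k e^{-\ell} + k^2 (q\ell^2)^{m_0+1} + \sqrt{e^{q\ell^2} - 1}$$
Proof ideas: $d_{\mathrm{TV}}(P, Q) \leq \frac{1}{2}\sqrt{\chi^2(P, Q)}$ and use negative association [Dubhashi-Ranjan ’98] to get rid of dependency in calculating the $\chi^2$ distance.
Apply the Lemma by choosing $q = \ell^{-(2+\delta)}$ so that $q\ell^2 \to 0$: $N = \ell^{\frac{2+\delta}{\alpha}}$, $K = \ell^{\frac{(2+\delta)\beta}{\alpha}}$, $n = \ell^{\frac{2+\delta}{\alpha}-1}$, $k = \ell^{\frac{(2+\delta)\beta}{\alpha}-1}$. Easy to check that
$$\alpha < \beta < \frac{1}{2} - \delta + \frac{\alpha(1+2\delta)}{4+2\delta} \ \Rightarrow\ \frac{\log k}{\log n} \leq \frac{1}{2} - \delta$$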
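The final implication can be verified in a few lines. This is a worked sketch of the "easy to check" step, using only the parameter choices stated above and assuming $2 + \delta - \alpha > 0$, which holds since $\alpha < \beta < 1$.

```latex
\begin{align*}
\frac{\log k}{\log n}
  &= \frac{\frac{(2+\delta)\beta}{\alpha} - 1}{\frac{2+\delta}{\alpha} - 1}
   = \frac{(2+\delta)\beta - \alpha}{(2+\delta) - \alpha}, \\
(2+\delta)\beta - \alpha
  &< (2+\delta)\left(\tfrac{1}{2} - \delta\right) + \frac{\alpha(1+2\delta)}{2} - \alpha
   = \left(\tfrac{1}{2} - \delta\right)\bigl((2+\delta) - \alpha\bigr).
\end{align*}
```

Dividing the second line by $(2+\delta) - \alpha > 0$ gives $\frac{\log k}{\log n} < \frac{1}{2} - \delta$, so the planted clique has size $k = o(\sqrt{n})$.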
Spectral concentration
Theorem
Let $A$ denote a symmetric and zero-diagonal random matrix, where the entries $\{A_{ij} : i < j\}$ are independent and $[0, 1]$-valued. Assume that $\mathbb{E}[A_{ij}] \leq p$, where $c_0 \log n / n \leq p \leq 1 - c_1$ for arbitrary constants $c_0 > 0$ and $c_1 > 0$. Then for any $c > 0$, there exists $c' > 0$ such that for any $n \geq 1$,
$$\mathbb{P}\left\{ \|A - \mathbb{E}[A]\|_2 \leq c' \sqrt{np} \right\} \geq 1 - n^{-c}.$$
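The scaling in the theorem can be observed in a small simulation. The sketch below is an illustration only: it uses an Erdős-Rényi graph with a constant p (one special case of the assumptions), and it estimates the spectral norm by plain power iteration, which yields a lower bound that tightens with more iterations. The helper names and the values n = 150, p = 0.2 are assumptions.

```python
import math
import random

def centered_er_matrix(n, p, rng):
    """Return A - E[A] for a zero-diagonal symmetric Bernoulli(p) adjacency matrix A."""
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            M[i][j] = M[j][i] = (1.0 if rng.random() < p else 0.0) - p
    return M

def spectral_norm_estimate(M, rng, iters=60):
    """Estimate ||M||_2 for symmetric M by power iteration (a lower bound)."""
    n = len(M)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        norm_x = math.sqrt(sum(v * v for v in x))
        x = [v / norm_x for v in x]
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        est = math.sqrt(sum(v * v for v in y))  # ||M x|| with ||x|| = 1
        x = y
    return est

rng = random.Random(0)
n, p = 150, 0.2
ratio = spectral_norm_estimate(centered_er_matrix(n, p, rng), rng) / math.sqrt(n * p)
print(f"||A - E[A]|| / sqrt(np) is roughly {ratio:.2f}")
```

Repeating the experiment for larger n with the same p should show the ratio staying bounded, which is the content of the theorem for this special case.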