Community Detection in Networks: SDP relaxation and Computational Gaps

Yihong Wu

Department of ECE, University of Illinois at Urbana-Champaign

Joint work with Bruce Hajek (Illinois) and Jiaming Xu (Wharton)

May 20, 2015

Community detection in networks

• Networks with community structures arise in many applications

Santa Fe Institute Collaboration network [Girvan-Newman ’02]

• Task: Discover underlying communities based on the network topology

• Applications: Friend or movie recommendation in online social networks

Statistical and computational challenges

• The observed network is sparse

• Large solution space

Question

• Is there a computationally efficient and statistically optimal community detection algorithm?

Stochastic block model [Holland et al. ’83]

Planted partition model [Condon-Karp ’01]

[Figure: adjacency matrix of a sample graph with n = 40, K = 10, r = 3; edge probability p = 0.9 within clusters and q = 0.1 elsewhere]
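A minimal sketch of how a graph like the one in the figure could be sampled, using the figure's parameters. The function name `sample_planted_partition`, the labeling of nodes outside every cluster by -1, and the choice to connect those nodes with probability q are assumptions made for illustration, not details taken from the slides.

```python
import random

def sample_planted_partition(n, K, r, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix with r planted clusters of size K.

    Assumption: nodes that belong to no cluster (label -1) connect to
    everything with probability q.
    """
    assert r * K <= n
    rng = random.Random(seed)
    # labels[i] is the cluster index in 0..r-1, or -1 for nodes outside every cluster
    labels = [i // K if i < r * K else -1 for i in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = labels[i] >= 0 and labels[i] == labels[j]
            if rng.random() < (p if same_cluster else q):
                A[i][j] = A[j][i] = 1  # undirected edge, no self-loops
    return A, labels

# Parameters of the figure: n = 40, K = 10, r = 3, p = 0.9, q = 0.1
A, labels = sample_planted_partition(n=40, K=10, r=3, p=0.9, q=0.1)
```

With p close to 1 and q close to 0 the block structure is visible by eye; the regimes studied in the rest of the talk are much sparser.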

Exact recovery

• True cluster: $C^*$

• Estimated cluster: $\hat{C}$

• Goal: exact recovery (strong consistency)

$\mathbb{P}\{\hat{C} = C^*\} \to 1$ as $n \to \infty$

• Alternatives

  ◦ almost exact recovery (weak consistency): [Mossel-Neeman-Sly ’13, Abbe-Sandon ’15, Montanari ’15] ...

  ◦ correlated recovery: [Decelle-Krzakala-Moore-Zdeborova ’11, Mossel-Neeman-Sly ’12, Massoulie ’13] ...

Objectives of this talk

• Statistical limit: When is exact recovery possible (impossible)?

• Computational limit: When is exact recovery computationally easy (hard)?

Remainder of the talk

1 Linear community size: Sharp recovery via semidefinite programming

2 Sublinear community size: Computational lower bounds

Two equal-sized communities

Binary symmetric SBM

Model:

• n nodes partitioned into two communities of size $n/2$ ($\sigma_i = \pm 1$).

• $i \sim j$ independently w.p.

$p = \frac{a \log n}{n}$ if $\sigma_i = \sigma_j$

$q = \frac{b \log n}{n}$ if $\sigma_i \neq \sigma_j$

MLE ⇔ MIN BISECTION

Assuming p > q

• Maximum likelihood estimator (MLE)

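$\max_{\sigma} \langle A, \sigma\sigma^\top \rangle$

s.t. $\sigma_i \in \{\pm 1\}$, $i \in [n]$

$\sigma^\top \mathbf{1} = 0$

⇐ lift ⇒

$\max_{Y} \langle A, Y \rangle$

s.t. $\mathrm{rank}(Y) = 1$

$Y_{ii} = 1$, $i \in [n]$

$\langle J, Y \rangle = 0$

where J = all-one matrix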
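The equivalence with MIN BISECTION can be checked on a toy graph: for a balanced $\sigma$, $\langle A, \sigma\sigma^\top \rangle = 2(|E| - 2\,\mathrm{cut}(\sigma))$, so maximizing the inner product minimizes the cut. The sketch below enumerates all bisections, which is only feasible for very small n; the helper names and the example graph (two triangles joined by one edge) are illustrative assumptions.

```python
from itertools import combinations

def inner_product(A, sigma):
    """<A, sigma sigma^T> = sum over all ordered pairs (i, j) of A[i][j] * sigma[i] * sigma[j]."""
    n = len(A)
    return sum(A[i][j] * sigma[i] * sigma[j] for i in range(n) for j in range(n))

def cut_size(A, sigma):
    """Number of edges whose endpoints carry different labels."""
    n = len(A)
    return sum(A[i][j] for i in range(n) for j in range(i + 1, n) if sigma[i] != sigma[j])

def brute_force_mle(A):
    """Exhaustive search over balanced labelings (exponential time, toy sizes only)."""
    n = len(A)
    best = None
    for side in combinations(range(n), n // 2):
        if 0 not in side:
            continue  # fix node 0 on the +1 side to remove the global sign symmetry
        sigma = [1 if i in side else -1 for i in range(n)]
        score = inner_product(A, sigma)
        if best is None or score > best[0]:
            best = (score, sigma)
    return best

# Toy graph: triangles {0,1,2} and {3,4,5} joined by the single edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A_example = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A_example[u][v] = A_example[v][u] = 1

score, sigma_hat = brute_force_mle(A_example)
```

The search space has size of order $\binom{n}{n/2}$, which is the "large solution space" obstacle that motivates a convex relaxation.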

SDP relaxation

• Semidefinite programming (SDP) relaxation of MLE

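$\hat{Y}_{\mathrm{SDP}} = \arg\max_{Y} \langle A, Y \rangle$

s.t. $Y \succeq 0$

$Y_{ii} = 1$, $i \in [n]$

$\langle J, Y \rangle = 0$

• similar SDP as in [Frieze-Jerrum ’95] for MAX BISECTION

• average-case analysis on generative model (SBM)

• focus on arg max rather than approximating max

• goal: $\mathbb{P}\{\hat{Y}_{\mathrm{SDP}} = \sigma^*(\sigma^*)^\top\} \to 1$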

Optimal recovery via SDP

Theorem (Abbe-Bandeira-Hall ’14, Mossel-Neeman-Sly ’14)

• If $(\sqrt{a}-\sqrt{b})^2 > 2$, recovery is achievable in polynomial time.

• If $(\sqrt{a}-\sqrt{b})^2 < 2$, recovery is impossible.

Theorem (Hajek-W.-Xu ’14)

SDP achieves the optimal recovery threshold: if $(\sqrt{a}-\sqrt{b})^2 > 2$, then

$\mathbb{P}\{\hat{Y}_{\mathrm{SDP}} = \sigma^*(\sigma^*)^\top\} = 1 - n^{-\Omega(1)}$

Remarks

• originally conjectured in [Abbe-Bandeira-Hall ’14]

• independently proved by [Bandeira ’15]
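As a small worked example of the threshold, the sketch below converts (a, b) into edge probabilities and evaluates $(\sqrt{a}-\sqrt{b})^2$ against 2. The function names are hypothetical, and the boundary case of exact equality is reported separately because the statements above do not cover it.

```python
import math

def sbm_edge_probabilities(n, a, b):
    """Return (p, q) with p = a log n / n and q = b log n / n."""
    return a * math.log(n) / n, b * math.log(n) / n

def exact_recovery_regime(a, b):
    """Classify (a, b) by the threshold (sqrt(a) - sqrt(b))^2 versus 2."""
    gap = (math.sqrt(a) - math.sqrt(b)) ** 2
    if gap > 2:
        return "achievable"
    if gap < 2:
        return "impossible"
    return "boundary"  # equality is not covered by the theorems quoted above

p, q = sbm_edge_probabilities(n=10_000, a=9, b=1)
regime_easy = exact_recovery_regime(9, 1)   # (3 - 1)^2 = 4 > 2
regime_hard = exact_recovery_regime(4, 1)   # (2 - 1)^2 = 1 < 2
```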

Dual certificate

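$\max_Y \langle A, Y \rangle$

s.t. $Y \succeq 0$ (dual variable: $S \succeq 0$)

$Y_{ii} = 1$ (dual variable: $D = \mathrm{diag}\{d_i\}$)

$\langle J, Y \rangle = 0$ (dual variable: $\lambda \in \mathbb{R}$)

$Y^* = \sigma^*(\sigma^*)^\top$ is the unique solution if there exist $D, \lambda$ such that $S = \lambda J + D - A$ satisfies

$S\sigma^* = 0$ and $\lambda_2(S) > 0$.

⇒ $d_i$ = (# of nbrs in own cluster) − (# of nbrs in other cluster):

$d_i = e(i, C_1) - e(i, C_2)$ for $i \in C_1$

$d_i = e(i, C_2) - e(i, C_1)$ for $i \in C_2$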
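The certificate can be checked numerically on a given graph. The sketch below assumes NumPy is available, builds $S = \lambda J + D - A$ with $d_i = \sigma^*_i (A\sigma^*)_i$ (own-cluster neighbors minus other-cluster neighbors), and tests $S\sigma^* = 0$ and $\lambda_2(S) > 0$ up to a tolerance. The function names, the tolerance, and the two toy graphs are illustrative assumptions.

```python
import numpy as np

def dual_certificate(A, sigma, lam):
    """Return (S, d) with S = lam * J + diag(d) - A and d_i = sigma_i * (A sigma)_i."""
    A = np.asarray(A, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = len(sigma)
    d = sigma * (A @ sigma)  # neighbours in own cluster minus neighbours in other cluster
    S = lam * np.ones((n, n)) + np.diag(d) - A
    return S, d

def certifies(S, sigma, tol=1e-9):
    """Check S sigma = 0, S positive semidefinite, and second-smallest eigenvalue > 0."""
    sigma = np.asarray(sigma, dtype=float)
    eig = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
    return bool(np.allclose(S @ sigma, 0.0, atol=tol) and eig[0] > -tol and eig[1] > tol)

m = 4
sigma_star = np.array([1] * m + [-1] * m)
same = np.equal.outer(sigma_star, sigma_star)

# Two disjoint cliques: every node has m - 1 own-cluster neighbours and none across.
A_cliques = (same & ~np.eye(2 * m, dtype=bool)).astype(float)
S_good, d_good = dual_certificate(A_cliques, sigma_star, lam=0.5)

# Complete bipartite graph between the two sides: all d_i are negative.
A_bipartite = (~same).astype(float)
S_bad, d_bad = dual_certificate(A_bipartite, sigma_star, lam=0.5)
```

In the first graph the certificate holds; in the second every $d_i$ is negative and $S$ fails to be positive semidefinite, which mirrors the role of $\min_i d_i$ in the analysis.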

Verify PSD

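• Mean adjacency matrix: $\mathbb{E}[A] = \frac{p+q}{2} J + \frac{p-q}{2}\sigma^*(\sigma^*)^\top - pI$

• $S = \lambda J - A + D = \left(\lambda - \frac{p+q}{2}\right) J - \frac{p-q}{2}\sigma^*(\sigma^*)^\top + pI + D - (A - \mathbb{E}[A])$

• $\lambda_2(S) = \inf_{x \perp \sigma^*, \|x\|_2 = 1} x^\top S x > 0$ if $\min_i d_i \ge \|A - \mathbb{E}[A]\|$ and $\lambda \ge (p+q)/2$

• To finish the proof:

1 $\min_i d_i = \Omega_P(\log n)$ if $\sqrt{a} - \sqrt{b} > \sqrt{2}$

2 $\|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$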
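The sufficient condition for $\lambda_2(S) > 0$ can be read off by evaluating the quadratic form term by term. A short derivation, assuming $p > 0$ and using only the decomposition of $S$ stated on this slide, for any unit vector $x \perp \sigma^*$:

```latex
\begin{align*}
x^\top S x
 &= \Bigl(\lambda - \tfrac{p+q}{2}\Bigr)\,(\mathbf{1}^\top x)^2
    - \tfrac{p-q}{2}\,\bigl((\sigma^*)^\top x\bigr)^2
    + p\,\|x\|_2^2 + x^\top D x - x^\top (A - \mathbb{E}[A])\,x \\
 &\ge 0 - 0 + p + \min_i d_i - \|A - \mathbb{E}[A]\| \;>\; 0 .
\end{align*}
```

The first term is nonnegative because $\lambda \ge (p+q)/2$, the second vanishes because $x \perp \sigma^*$, and the last two are controlled by $\min_i d_i \ge \|A - \mathbb{E}[A]\|$.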

Remarks

1 Necessity: $\sqrt{a} - \sqrt{b} < \sqrt{2}$

⇒ $\min_i d_i < 0$ w.h.p.

⇒ ∃i : # of nbrs in own cluster < # of nbrs in other cluster

⇒ MLE fails

2 Proof of $\|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$

  ◦ 2nd-order stochastic dominance argument [Tomozei-Massoulie ’14] + result for iid matrix [Seginer ’00]

  ◦ [Feige-Ofek ’05]: $G(n, \frac{C \log n}{n})$ for sufficiently large C

  ◦ [Bandeira-van Handel ’14]: comparison argument

Multiple equal-sized communities

r equal-sized clusters

• {0, 1}-cluster matrix:

$Y^* = \sum_{k=1}^{r} \xi_k \xi_k^\top$

where $\xi_k$ = indicator of the kth cluster of size $K = n/r$

• SDP relaxation of MLE:

$\max_Y \langle A, Y \rangle$

s.t. $Y \succeq 0$

$Y_{ii} = 1$

$Y_{ij} \ge 0$

$\sum_j Y_{ij} = K$
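A quick sanity check that the cluster matrix $Y^*$ is feasible for this SDP: the sketch below builds $Y^*$ for r contiguous clusters of size K = n/r and verifies the linear constraints. Positive semidefiniteness holds because $Y^*$ is a sum of rank-one terms $\xi_k \xi_k^\top$ and is not re-checked here. The function name and the contiguous labeling are assumptions for illustration.

```python
def cluster_matrix(n, r):
    """Return (Y, K) where Y[i][j] = 1 if i and j share a cluster, else 0."""
    assert n % r == 0, "clusters must be equal-sized"
    K = n // r
    labels = [i // K for i in range(n)]  # nodes 0..K-1 form cluster 0, and so on
    Y = [[1 if labels[i] == labels[j] else 0 for j in range(n)] for i in range(n)]
    return Y, K

Y_star, K = cluster_matrix(n=6, r=3)

diag_ok = all(Y_star[i][i] == 1 for i in range(6))              # Y_ii = 1
nonneg_ok = all(entry >= 0 for row in Y_star for entry in row)  # Y_ij >= 0
rowsum_ok = all(sum(row) == K for row in Y_star)                # sum_j Y_ij = K
```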

Optimality of SDP

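Theorem ([Hajek-W.-Xu ’15])

SDP achieves optimal threshold $(\sqrt{a}-\sqrt{b})^2 > r$.

Proof of correctness:

$\max_Y \langle A, Y \rangle$

s.t. $Y \succeq 0$ (dual variable: $S \succeq 0$)

$Y_{ii} = 1$ (dual variable: $d_i$)

$Y_{ij} \ge 0$ (dual variable: $B \ge 0$)

$\sum_j Y_{ij} = K$ (dual variable: $\lambda_i$)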

Construction of the dual witness

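• For node $i \in C_k$,

$\lambda_i = \frac{1}{K}\Bigl(\max_{\ell \neq k} e(i, C_\ell) - \frac{Kq}{2} + \frac{\sqrt{\log n}}{2}\Bigr)$

$d_i = e(i, C_k) - \max_{\ell \neq k} e(i, C_\ell) - \frac{1}{K}\sum_{j \in C_k} \max_{\ell \neq k} e(j, C_\ell) + Kq - \sqrt{\log n}$

• $B = (B_{C_k \times C_{k'}})$ is a block matrix in which each off-diagonal block ($k \neq k'$) is rank-2, specified by

$B_{C_k \times C_{k'}}(i, j) = \frac{1}{K}\Bigl(\max_{\ell \neq k} e(i, C_\ell) - e(i, C_{k'}) + \max_{\ell \neq k'} e(j, C_\ell) - e(j, C_k) + \frac{e(C_k, C_{k'})}{K} - Kq + \sqrt{\log n}\Bigr)$

• $S = D - A - B + \lambda \mathbf{1}^\top + \mathbf{1}\lambda^\top$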

• $S\xi_k = 0$ for $k = 1, \dots, r$.

• $\lambda_{r+1}(S) > 0$ if $\min_i d_i \ge \|A - \mathbb{E}[A]\| = O_P(\sqrt{\log n})$

• $d_i$ = (# of nbrs in own cluster) − maximal (# of nbrs in other clusters) + $O_P(\sqrt{\log n})$.

• Sharp threshold

  ◦ $\sqrt{a} - \sqrt{b} > \sqrt{r}$ ⇒ $\min_i d_i = \Omega(\log n)$ ⇒ SDP succeeds

  ◦ $\sqrt{a} - \sqrt{b} < \sqrt{r}$ ⇒ $\min_i d_i = -\Omega(\log n)$ ⇒ MLE fails
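The identity $S\xi_k = 0$ is algebraic and holds for any symmetric adjacency matrix, so it can be checked numerically. The sketch below assumes NumPy, reads the witness as $\lambda_i = \frac{1}{K}(\max_{\ell \neq k} e(i, C_\ell) - Kq/2 + \sqrt{\log n}/2)$ with matching $d_i$ and $B$, and takes the diagonal blocks of $B$ to be zero. That last point, the function name `dual_witness`, and the toy parameters are assumptions made for illustration; the check covers only $S\xi_k = 0$, not $B \ge 0$ or the eigenvalue condition.

```python
import numpy as np

def dual_witness(A, labels, r, q):
    """Build S = D - A - B + lam 1^T + 1 lam^T for r equal-sized clusters."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    n = A.shape[0]
    K = n // r
    root = np.sqrt(np.log(n))
    Xi = np.eye(r)[labels]                  # n x r matrix, column k is the indicator xi_k
    E = A @ Xi                              # E[i, l] = e(i, C_l)
    own = E[np.arange(n), labels]           # e(i, C_k) for i in C_k
    others = E.copy()
    others[np.arange(n), labels] = -np.inf  # exclude the node's own cluster from the max
    M = others.max(axis=1)                  # max over l != k of e(i, C_l)
    lam = (M - K * q / 2 + root / 2) / K
    mean_M = (Xi.T @ M) / K                 # average of M over each cluster
    d = own - M - mean_M[labels] + K * q - root
    cross = Xi.T @ A @ Xi                   # cross[k, k'] = e(C_k, C_k')
    E_col = E[:, labels]                    # entry (i, j) is e(i, C_{k'}) with k' the cluster of j
    B = (M[:, None] - E_col + M[None, :] - E_col.T
         + cross[np.ix_(labels, labels)] / K - K * q + root) / K
    B[labels[:, None] == labels[None, :]] = 0.0  # assumption: B vanishes within clusters
    ones = np.ones(n)
    S = np.diag(d) - A - B + np.outer(lam, ones) + np.outer(ones, lam)
    return S, Xi

# Toy instance (dense parameters chosen only for illustration).
rng = np.random.default_rng(0)
n, r, p, q = 12, 3, 0.8, 0.2
labels = np.repeat(np.arange(r), n // r)
same = labels[:, None] == labels[None, :]
upper = np.triu(rng.random((n, n)) < np.where(same, p, q), k=1)
A = (upper + upper.T).astype(float)

S, Xi = dual_witness(A, labels, r, q)
residual = np.abs(S @ Xi).max()  # should vanish up to floating-point error
```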

Unequal-sized clusters

Two unequal-sized clusters: known size

Two clusters of size K and n − K ($K = \rho n$):

$\hat{Y}_{\mathrm{SDP}} = \arg\max_Y \langle A, Y \rangle$

s.t. $Y \succeq 0$

$Y_{ii} = 1$, $i \in [n]$

$\langle J, Y \rangle = (2K - n)^2$

achieves optimal threshold $\eta(\rho, a, b) > 1$.

Note: $\rho \mapsto \eta(\rho, a, b)$ is minimized at $\eta(1/2, a, b) = \frac{1}{2}(\sqrt{a} - \sqrt{b})^2$ ⇒ “suggests” equal-sized case is the hardest for two communities

Two unequal-sized clusters: unknown size

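Two clusters of size K and n − K ($K \in \{0, 1, \dots, n\}$):

$\hat{Y}_{\mathrm{SDP}} = \arg\max_Y \langle A, Y \rangle - \lambda \langle J, Y \rangle$

s.t. $Y \succeq 0$

$Y_{ii} = 1$, $i \in [n]$

with $\lambda = \frac{a - b}{\log a - \log b} \cdot \frac{\log n}{n}$ achieves optimal threshold $(\sqrt{a} - \sqrt{b})^2 > 2$.

Note: If K = Ω(n), there exists a data-driven choice of λ.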
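The penalty $\lambda$ is the logarithmic mean of a and b scaled by $\log n / n$, so for $a > b > 0$ it always lies strictly between q and p. A minimal sketch of the computation; the function name `sdp_penalty` and the sample values are hypothetical.

```python
import math

def sdp_penalty(n, a, b):
    """lambda = (a - b) / (log a - log b) * log n / n, for a > b > 0."""
    assert a > b > 0
    return (a - b) / (math.log(a) - math.log(b)) * math.log(n) / n

n, a, b = 1000, 9.0, 1.0
lam = sdp_penalty(n, a, b)
p = a * math.log(n) / n  # within-cluster edge probability
q = b * math.log(n) / n  # cross-cluster edge probability
```

Since $q < \lambda < p$, subtracting $\lambda J$ makes the mean of $A - \lambda J$ positive within clusters and negative across them, which is one way to read why no size constraint on $Y$ is needed.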

More generally...

• Binary censored block model: $G(n, \frac{a \log n}{n})$, observe edge label flipped w.p. ε

  ◦ SDP achieves sharp threshold $a(\sqrt{1-\epsilon} - \sqrt{\epsilon})^2 > 1$

  ◦ Closes the gap in [Abbe-Bandeira-Bracher-Singer ’14]

• General SBM:

  ◦ Optimality of SDP relaxation remains open (but within a factor of 4)

  ◦ Sharp threshold is found in [Abbe-Sandon ’15] via a two-stage procedure.

Detecting a single cluster

Finding a single community

• One cluster of size K plus n−K outliers

• Connectivity p within cluster and q otherwise

• Also known as Planted Dense Subgraph model

• Linear community size: $K = \rho n$ and SDP achieves sharp threshold

• Next focus on $K = \Theta(n^{\beta})$.

Conjecture on computational limit

[Figure: phase diagram for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$, with regions labeled “impossible” and “spectral barrier”]

Conjecture [Chen-Xu ’14]: no polynomial-time algorithm succeeds beyond the spectral barrier [Nadakuditi-Newman ’12]

[Figure: eigenvalue distribution of $(A - q\mathbf{1}\mathbf{1}^\top)/\sigma$ for $\sigma = \sqrt{q(1-q)n}$: semicircle law for the $A - \mathbb{E}[A]$ part, with the value $K(p-q)/\sigma$ marked on the horizontal axis]

Planted clique hardness hypothesis

$H_0$: Bern(γ) vs $H_1$: Bern(1) within the clique, Bern(γ) elsewhere

Intermediate regime: $\log n \ll K \ll \sqrt{n}$, γ = Θ(1)

• detection is possible but believed to have high computational complexity: [Alon et al. ’11] [Feldman et al. ’13] [Deshpande-Montanari ’15] [Meka-Potechin-Wigderson ’15]

• various hardness results assuming Planted Clique hardness

  ◦ detecting sparse principal component [Berthet-Rigollet ’13]: γ = 1/2

  ◦ detecting sparse submatrix [Ma-W. ’13, Cai-Liang-Rakhlin ’15]: γ = 1/2

  ◦ cryptography [Applebaum-Barak-Wigderson ’10]: $\gamma = 2^{-\log^{0.99} n}$

Hard regime for recovering a single cluster

Assuming Planted Clique hardness for any constant γ > 0

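[Figure: phase diagram for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$, with a region labeled “impossible” and the hard regime shown in red]

Recovering a single cluster in the red regime is at least as hard as detecting a clique of size $K = o(\sqrt{n})$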

Proof step 1: Recovery is harder than detection

Recovery versus Detection [Arias-Castro-Verzelen ’14]:

$H_0$: Bern(q) vs $H_1$: Bern(p) within the cluster S, Bern(q) elsewhere

Each node is included in S with probability $K/n$

Proof step 2: Hardness for detecting a single cluster

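[Figure: phase diagram for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$, with a region labeled “impossible” and the hard regime shown in red]

• Detecting a single cluster in the red regime is at least as hard as detecting a clique of size $K = o(\sqrt{n})$

• Reduced from Planted Clique detection in polynomial time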

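[Figure: the map h takes the adjacency matrix $A_{n \times n}$ (Bern(γ) with a planted clique) to $\tilde{A}_{N \times N}$ (Bern(q) with a Bern(p) cluster)]

$h : A \mapsto \tilde{A}$ is agnostic to the clique and can be computed in polynomial time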

Given an integer ℓ, two probability distributions P, Q on $\{0, 1, \dots, \ell^2\}$

• Split each node into ℓ new nodes: N = nℓ, K = kℓ

• Assign edges with distributions P, Q: 0 ↦ Q, 1 ↦ P

• $H_0$: Bern(γ) ↦ $(1-\gamma)Q + \gamma P$

• $H_1$: Bern(1) (in-clique) ↦ P (in-cluster)

How to choose P, Q?

• Matching $H_0$: $(1-\gamma)Q + \gamma P = \mathrm{Binom}(\ell^2, q)$

• Matching $H_1$ approximately: $P \approx \mathrm{Binom}(\ell^2, p)$ in total variation

• Main effort: the law of the resulting graph is close to SBM in total variation
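To see why $H_1$ can only be matched approximately, one can try the naive choice $P = \mathrm{Binom}(\ell^2, p)$ with p = 2q and solve the $H_0$ constraint for Q. The sketch below does this and shows that the resulting Q has negative entries once m exceeds $\lfloor \log_2(1/\gamma) \rfloor$, so P has to be modified on large counts. This is only an illustration of the obstruction under assumed toy parameters; it is not the construction used in the actual proof.

```python
from math import comb, floor, log2

def binom_pmf(m, N, prob):
    """Probability that Binom(N, prob) equals m."""
    return comb(N, m) * prob**m * (1 - prob) ** (N - m)

def naive_Q(ell, q, gamma):
    """Solve (1 - gamma) Q + gamma P = Binom(ell^2, q) for Q with P = Binom(ell^2, 2q)."""
    N = ell * ell
    P = [binom_pmf(m, N, 2 * q) for m in range(N + 1)]
    target = [binom_pmf(m, N, q) for m in range(N + 1)]
    Q = [(t - gamma * pm) / (1 - gamma) for t, pm in zip(target, P)]
    return P, Q

ell, q, gamma = 2, 0.01, 0.5         # toy values with 16 * q * ell^2 <= 1
P, Q = naive_Q(ell, q, gamma)
m0 = floor(log2(1 / gamma))          # here m0 = 1
negative_support = [m for m, mass in enumerate(Q) if mass < 0]
```

For $m \le m_0$ the ratio $P(m)/\mathrm{Binom}(\ell^2, q)(m) \le 2^m \le 1/\gamma$, so $Q(m) \ge 0$ there; the negative entries can only appear above $m_0$.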

Concluding remarks

• Versatility of SDP as a simple, general purpose, computationally feasible methodology for community detection

• Construction of dual witness lacks a general recipe

[Figure: phase diagram for $p = cq = \Theta(n^{-\alpha})$ and $K = \Theta(n^{\beta})$, with regions labeled “impossible” and “hard”]

References

• B. Hajek, Y. W. & J. Xu (2014). Computational lower bounds for community detection on random graphs. arXiv:1406.6625 (COLT ’15)

• B. Hajek, Y. W. & J. Xu (2014). Achieving exact cluster recovery threshold via semidefinite programming. arXiv:1412.6156

• B. Hajek, Y. W. & J. Xu (2015). Achieving exact cluster recovery threshold via semidefinite programming: Extensions. arXiv:1502.07738

Formal statement of hardness of detecting a cluster

γ: edge probability in Planted Clique

Theorem

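Assume Planted Clique Hypothesis holds for all 0 < γ ≤ 1/2. Let α > 0 and 0 < β < 1 be such that

$\alpha < \beta < \frac{1}{2} + \frac{\alpha}{4}$

Then there exists a sequence $(N_\ell, K_\ell, q_\ell)_{\ell \in \mathbb{N}}$ satisfying $\lim_{\ell\to\infty} \frac{-\log q_\ell}{\log N_\ell} = \alpha$ and $\lim_{\ell\to\infty} \frac{\log K_\ell}{\log N_\ell} = \beta$ such that for any sequence of randomized polynomial-time tests $\phi_\ell$ for the PDS$(N_\ell, K_\ell, 2q_\ell, q_\ell)$ problem, the Type-I+II error probability is asymptotically lower bounded by 1.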

Proof ideas: Reduce from Planted Clique in polynomial time

Map approximately:

• $G(n, \gamma) \mapsto G(N, q)$

• $G(n, k, 1, \gamma) \mapsto G(N, K, p, q)$

Bound the total variation distance

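Lemma

Let $\ell, n \in \mathbb{N}$, $k \in [n]$ and $\gamma \in (0, \frac{1}{2}]$. Let $N = \ell n$, $K = k\ell$, $p = 2q$ and $m_0 = \lfloor \log_2(1/\gamma) \rfloor$. Assume that $16 q \ell^2 \le 1$ and $k \ge 6e\ell$. If $G \sim G(n, \gamma)$, then $\tilde{G} \sim G(N, q)$. If $G \sim G(n, k, 1, \gamma)$, then

$d_{\mathrm{TV}}\bigl(P_{\tilde{G}}, G(N, K, p, q)\bigr) \lesssim e^{-K} + k e^{-\ell} + k^2 (q\ell^2)^{m_0+1} + \sqrt{e^{q\ell^2} - 1}$

Proof ideas: $d_{\mathrm{TV}}(P, Q) \le \frac{1}{2}\sqrt{\chi^2(P, Q)}$ and use negative association [Dubhashi-Ranjan ’98] to get rid of dependency in calculating the $\chi^2$ distance.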
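The inequality $d_{\mathrm{TV}}(P, Q) \le \frac{1}{2}\sqrt{\chi^2(P, Q)}$ follows from Cauchy-Schwarz and is easy to check on a small example. The sketch below uses two hypothetical distributions on two points; the helper names are assumptions for illustration.

```python
import math

def total_variation(P, Q):
    """d_TV(P, Q) = (1/2) * sum_x |P(x) - Q(x)|."""
    return 0.5 * sum(abs(p - q) for p, q in zip(P, Q))

def chi_square(P, Q):
    """chi^2(P || Q) = sum_x (P(x) - Q(x))^2 / Q(x); infinite if P puts mass where Q has none."""
    total = 0.0
    for p, q in zip(P, Q):
        if q == 0:
            if p > 0:
                return math.inf
            continue
        total += (p - q) ** 2 / q
    return total

P_example = [0.5, 0.5]
Q_example = [0.25, 0.75]
tv = total_variation(P_example, Q_example)
bound = 0.5 * math.sqrt(chi_square(P_example, Q_example))
```

Here the total variation is 0.25 and the bound is about 0.289, so the inequality holds with some slack.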

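Apply the Lemma by choosing $q = \ell^{-(2+\delta)}$ so that $q\ell^2 \to 0$: $N = \ell^{\frac{2+\delta}{\alpha}}$, $K = \ell^{\frac{(2+\delta)\beta}{\alpha}}$, $n = \ell^{\frac{2+\delta}{\alpha} - 1}$, $k = \ell^{\frac{(2+\delta)\beta}{\alpha} - 1}$. Easy to check that

$\alpha < \beta < \frac{1}{2} - \delta + \frac{\alpha(1 + 2\delta)}{4 + 2\delta} \;\Rightarrow\; \frac{\log k}{\log n} \le \frac{1}{2} - \delta$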

Spectral concentration

Theorem

Let A denote a symmetric and zero-diagonal random matrix, where the entries $\{A_{ij} : i < j\}$ are independent and [0, 1]-valued. Assume that $\mathbb{E}[A_{ij}] \le p$, where $c_0 \log n / n \le p \le 1 - c_1$ for arbitrary constants $c_0 > 0$ and $c_1 > 0$. Then for any c > 0, there exists c′ > 0 such that for any n ≥ 1,

$\mathbb{P}\{\|A - \mathbb{E}[A]\|_2 \le c' \sqrt{np}\} \ge 1 - n^{-c}.$
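A quick numerical illustration of the scaling in this theorem, assuming NumPy: sample an Erdős-Rényi adjacency matrix with edge probability p of order $\log n / n$ and compare $\|A - \mathbb{E}[A]\|_2$ with $\sqrt{np}$. The function name, the seed, and the choice $p = 5 \log n / n$ are illustrative assumptions; a single sample illustrates the statement but does not verify it.

```python
import numpy as np

def centered_spectral_norm(n, p, seed=0):
    """Spectral norm of A - E[A] for a G(n, p) adjacency matrix A (zero diagonal)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1).astype(float)  # independent entries for i < j
    A = upper + upper.T                                         # symmetrize
    mean = p * (np.ones((n, n)) - np.eye(n))                    # E[A] has zero diagonal
    return np.linalg.norm(A - mean, ord=2)                      # largest singular value

n = 400
p = 5 * np.log(n) / n
norm = centered_spectral_norm(n, p)
ratio = norm / np.sqrt(n * p)  # the theorem says this stays bounded with high probability
```

Repeating the experiment for growing n with the same constant in p gives a rough empirical sense of the constant c′.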
