“From my sixth year, I had a perfect mania for drawing every object that I saw. When I had reached my fiftieth year, I had published a vast quantity of drawing; but I am unsatisfied with all that I have produced before my seventieth year”. Hokusai

Page 2

COMMUNITY DETECTION & RAMANUJAN GRAPHS: A PROOF OF THE "SPECTRAL REDEMPTION CONJECTURE"

Charles Bordenave, Marc Lelarge, Laurent Massoulié

Page 3

Community Detection

Identification of groups of similar objects within an overall population, based on their observed graph of interactions

Closely related objectives: clustering and embedding

[Figure label: profile space]

Page 4

The Stochastic Block Model [Holland-Laskey-Leinhardt’83]

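$n$ “nodes” partitioned into $r$ categories

Category $k$: $\alpha_k n$ nodes

Edge between nodes $u, v$ present with probability $b_{\sigma(u)\sigma(v)}\, s/n$, where $\sigma(u)$ is the category of node $u$

$s$: “signal strength”

Observation: adjacency matrix $A$

$A$ = block matrix of edge probabilities + noise matrix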
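A minimal sampling sketch for the model on this slide, assuming the edge probability has the form `s * rates[k, l] / n` for nodes in categories `k` and `l`. The names `sample_sbm`, `proportions` and `rates` are illustrative and not taken from the talk.

```python
import numpy as np

def sample_sbm(n, proportions, rates, s, seed=0):
    """Sample category labels and a symmetric 0/1 adjacency matrix.

    Assumed form: nodes u, v are linked with probability
    s * rates[k, l] / n, where k and l are their categories.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    labels = rng.choice(len(proportions), size=n, p=proportions)
    probs = s * rates[labels][:, labels] / n            # n x n edge probabilities
    upper = np.triu(rng.random((n, n)) < probs, k=1)    # keep pairs u < v only
    adj = (upper | upper.T).astype(int)                 # symmetrize, no self-loops
    return adj, labels

adj, labels = sample_sbm(n=300, proportions=[0.5, 0.5],
                         rates=[[3.0, 1.0], [1.0, 3.0]], s=10.0)
```

Here `adj` plays the role of the observed matrix A, and `labels` is the hidden partition that detection tries to recover.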

Page 5

Outline

Basic spectral methods for “rich signal” case

Ramanujan-like spectrum separation

The “weak signal” case (sparse observations)

Phase transition on detectability

Non-backtracking matrices and “spectral redemption”

Page 6

Basic spectral clustering

From matrix A, extract the R normalized eigenvectors corresponding to the R largest eigenvalues

Form R-dimensional node representatives

Group nodes u according to proximity of spectral representatives
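The three steps above can be sketched as follows. The grouping step is not specified on the slide; this sketch assumes a small k-means with farthest-point initialisation, and the toy graph parameters (two blocks of 100 nodes, edge probabilities 0.5 and 0.05) are made up for illustration.

```python
import numpy as np

def spectral_clusters(adj, R, n_iter=20):
    n = adj.shape[0]
    # eigh returns eigenvalues in ascending order, so the last R columns
    # are the eigenvectors of the R largest eigenvalues.
    _, vecs = np.linalg.eigh(adj.astype(float))
    reps = vecs[:, -R:] * np.sqrt(n)          # row u = representative of node u
    # Farthest-point initialisation of R centers, then a few k-means steps.
    centers = [reps[0]]
    for _ in range(R - 1):
        d = np.min([((reps - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(reps[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for k in range(R):
            if np.any(assign == k):
                centers[k] = reps[assign == k].mean(axis=0)
    return assign

# Toy "rich signal" example: dense inside blocks, sparse across blocks.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 100)
probs = np.where(truth[:, None] == truth[None, :], 0.5, 0.05)
upper = np.triu(rng.random((200, 200)) < probs, k=1)
adj = (upper | upper.T).astype(int)
found = spectral_clusters(adj, R=2)
# Cluster names are arbitrary, so compare up to swapping the two labels.
agreement = max(np.mean(found == truth), np.mean(found != truth))
```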

Page 7

Result for “logarithmic” signal strength s

Assume $s = \Omega(\log n)$ and clusters are distinguishable, i.e.

Then spectrum of $A$ consists of $R$ eigenvalues $\lambda_i$ of order $\Theta(s)$ ($R \le r$) and $n - R$ eigenvalues $\lambda_i$ of order $O(\sqrt{s})$

Node representatives based on top R eigenvectors :

Cluster according to underlying “blocks” except for negligible fraction of nodes

Page 8

Proof arguments

Control spectral radius of noise matrix + perturbation of matrix eigen-elements (for symmetric matrices: Weyl’s inequalities, Courant-Fischer variational characterization, …)

$A$ = block matrix + random “noise” matrix

Block matrix non-zero eigenvalues: $\Theta(s)$

Page 9

spectral separation properties “à la Ramanujan”

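$s$-regular graph Ramanujan if $\lambda_2 \le 2\sqrt{s-1}$, with $\lambda_2$ the second largest eigenvalue of its adjacency matrix [Lubotzky-Phillips-Sarnak’88]

The best possible spectral gap: $\lambda_2 \ge 2\sqrt{s-1} - o(1)$ [Alon-Boppana’91]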
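A small numerical check of the Ramanujan property, assuming the common convention that an s-regular graph is Ramanujan when its second largest adjacency eigenvalue is at most 2√(s−1). The two test graphs (the Petersen graph and a prism over a 20-cycle) are chosen for illustration only and do not come from the talk.

```python
import numpy as np

def second_eigenvalue(adj):
    """Second largest eigenvalue of a symmetric adjacency matrix."""
    return np.linalg.eigvalsh(adj.astype(float))[-2]

def is_ramanujan(adj):
    s = int(adj.sum(axis=1)[0])               # common degree (graph assumed s-regular)
    return bool(second_eigenvalue(adj) <= 2 * np.sqrt(s - 1) + 1e-9)

def petersen():
    adj = np.zeros((10, 10), dtype=int)
    for i in range(5):
        # outer 5-cycle, spokes, inner pentagram
        for u, v in [(i, (i + 1) % 5), (i, i + 5), (i + 5, (i + 2) % 5 + 5)]:
            adj[u, v] = adj[v, u] = 1
    return adj

def prism(m):
    """Two m-cycles joined by a perfect matching (3-regular, 2m nodes)."""
    adj = np.zeros((2 * m, 2 * m), dtype=int)
    for i in range(m):
        j = (i + 1) % m
        for u, v in [(i, j), (i + m, j + m), (i, i + m)]:
            adj[u, v] = adj[v, u] = 1
    return adj

petersen_ok = is_ramanujan(petersen())        # eigenvalues 3, 1, -2
prism_ok = is_ramanujan(prism(20))            # eigenvalues 2cos(2*pi*j/20) +/- 1
```

The Petersen graph passes, while the long prism fails because its second eigenvalue 2cos(π/10) + 1 exceeds 2√2.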

Page 10

spectral separation properties “à la Ramanujan”

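[Friedman’08]: random $s$-regular graph verifies whp $\lambda_2 \le 2\sqrt{s-1} + o(1)$

[Feige-Ofek’05]: for Erdős-Rényi graph $G(n, s/n)$ and $s = \Omega(\log n)$, then whp $\max(\lambda_2, |\lambda_n|) = O(\sqrt{s})$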

Also: $\rho(A - \mathbb{E}[A]) = O(\sqrt{s})$

Page 11

spectral separation properties “à la Ramanujan”

Corollary: in SBM with $s = \Omega(\log n)$, whp $A$’s leading eigen-elements close to those of the block matrix

For , spectral separation is lost

Page 12

Outline

Basic spectral methods for “rich signal” case

Ramanujan-like spectrum separation

The “weak signal” case (sparse observations)

Phase transition on detectability

Non-backtracking matrices and “spectral redemption”

Page 13

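Weak signal strength: $s = \Theta(1)$

Correct classification of all but negligible fraction of nodes impossible (isolated nodes…)

Assess performance of clustering by overlap metric:

$$\mathrm{ov}(\hat\sigma, \sigma) = \max_{\pi} \frac{1}{n} \sum_{u=1}^{n} \mathbf{1}_{\hat\sigma(u) = \pi(\sigma(u))} - \frac{1}{k}$$

where the maximum is over permutations $\pi$ of the $k$ community labels

[Figure: two plots of overlap against signal strength $s$, with marks $s_c$ and $s_0$ on the $s$-axis; annotation: threshold for giant component]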
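A short sketch of the overlap computation, assuming the usual definition: the best agreement between estimated and true labels over all relabelings, minus the 1/k baseline obtained by guessing. The function name `overlap` is illustrative.

```python
from itertools import permutations
import numpy as np

def overlap(true_labels, est_labels, k):
    """Best fraction of agreeing nodes over label permutations, minus 1/k."""
    true_labels = np.asarray(true_labels)
    est_labels = np.asarray(est_labels)
    best = max(
        np.mean(est_labels == np.asarray(perm)[true_labels])
        for perm in permutations(range(k))
    )
    return best - 1.0 / k

perfect = overlap([0, 0, 1, 1], [1, 1, 0, 0], k=2)   # same partition, labels swapped
useless = overlap([0, 0, 1, 1], [0, 1, 0, 1], k=2)   # no better than guessing
```

Enumerating permutations costs k! evaluations, which is fine for the small k considered here.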

Page 14

Weak signal strength: $s = 1$

Symmetric two-communities scenario: edge probability $a/n$ within a community, $b/n$ across communities

Conjecture [Decelle-Krzakala-Moore-Zdeborova 2011]: for $(a-b)^2 < 2(a+b)$, overlap tends to zero for any estimator

Proven by [Mossel-Neeman-Sly 2012]

For $(a-b)^2 > 2(a+b)$, positive overlap can be achieved (by Belief Propagation [DKMZ 2011]; by “spectral redemption” [Krzakala-Moore-Mossel-Neeman-Sly-Zdeborova-Zhang 2013])

First proof that positive overlap is achievable when $(a-b)^2 > 2(a+b)$: spectral method applied to matrix counting self-avoiding walks with logarithmic length [LM 2013]

Page 15

“Spectral redemption” and the non-backtracking matrix

Edge-to-edge non-backtracking matrix $B$, indexed by oriented edges: $B_{ef} = \mathbf{1}_{e_2 = f_1} \mathbf{1}_{e_1 \ne f_2}$ for $e = (e_1, e_2)$, $f = (f_1, f_2)$

A non-symmetric matrix, such that $(B^m)_{ef}$ = number of non-backtracking paths on $G$ of length $m+1$ starting at $e$ and ending at $f$

Spectral redemption conjecture [KMMNSZZ’13]:

When positive overlap is achievable, it can be obtained by extraction of leading eigenvectors of $B$ (plus thresholding & postprocessing)
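The edge-to-edge matrix can be built directly from an edge list. This sketch assumes the standard definition (entry 1 when the second oriented edge starts where the first ends and is not its reversal); the function name and the triangle example are illustrative.

```python
import numpy as np

def non_backtracking_matrix(edges):
    """edges: list of undirected edges (u, v). Returns (B, oriented_edges)."""
    oriented = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    src = np.array([e[0] for e in oriented])
    dst = np.array([e[1] for e in oriented])
    # B[e, f] = 1 when f starts where e ends and f is not the reversal of e.
    B = (dst[:, None] == src[None, :]) & (src[:, None] != dst[None, :])
    return B.astype(int), oriented

# Triangle: every oriented edge has exactly one non-backtracking successor,
# and three steps around the triangle return to the starting edge.
B, oriented = non_backtracking_matrix([(0, 1), (1, 2), (2, 0)])
B_cubed = np.linalg.matrix_power(B, 3)
```

On the triangle, `B` is a permutation matrix, so it is non-symmetric and its eigenvalues are cube roots of unity.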


Page 16

“Spectral redemption” and the non-backtracking matrix

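Assume constant per-community average degree

Let $\mu_1 \ge |\mu_2| \ge \dots \ge |\mu_r|$ be the leading eigenvalues of the mean progeny matrix $M$ and $\phi_1, \dots, \phi_r$ corresponding eigenvectors in $\mathbb{R}^r$

Let $r_0$ be the number of eigenvalues with $\mu_i^2 > \mu_1$

In the symmetric two-community case, eigenvalues of $M = \begin{pmatrix} a/2 & b/2 \\ b/2 & a/2 \end{pmatrix}$: $\mu_1 = (a+b)/2$, $\mu_2 = (a-b)/2$, and feasibility condition: $\mu_2^2 > \mu_1$, i.e. $(a-b)^2 > 2(a+b)$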
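For the symmetric two-community case, the feasibility condition can be checked numerically. This sketch assumes the 2×2 matrix with entries a/2 on the diagonal and b/2 off the diagonal, and the condition that the squared second eigenvalue exceeds the first, which is the same as (a−b)² > 2(a+b). Function names are illustrative.

```python
import numpy as np

def two_community_eigenvalues(a, b):
    M = np.array([[a / 2, b / 2], [b / 2, a / 2]])
    mu2, mu1 = np.linalg.eigvalsh(M)          # eigvalsh sorts in ascending order
    return mu1, mu2

def detection_feasible(a, b):
    mu1, mu2 = two_community_eigenvalues(a, b)
    return bool(mu2 ** 2 > mu1)

mu1, mu2 = two_community_eigenvalues(5.0, 1.0)   # (a+b)/2 = 3, (a-b)/2 = 2
above = detection_feasible(5.0, 1.0)             # 4 > 3
below = detection_feasible(3.0, 2.0)             # 0.25 < 2.5
```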

Page 17

Main result

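With high probability as $n \to \infty$, eigenvalues $\lambda_i$ of $B$ satisfy $\lambda_i = \mu_i + o(1)$ for $i \le r_0$ and $|\lambda_i| \le \sqrt{\mu_1} + o(1)$ for $i > r_0$

For $i \le r_0$, if $\mu_i$ has multiplicity 1, corresponding eigenvector: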

asymptotically parallel to for

Page 18

Corollary 1

Assume , equal community sizes and has multiplicity 1 for some

Then positive overlap obtained by following procedure

1) Extract eigenvector of $B$

2) Form signs

3) To each node assign for edge picked uniformly among graph edges with

4) Assign community at random according to distribution

[Figure: node u with edges e, e’, e’’]
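A rough sketch in the spirit of the procedure above, not a faithful implementation of it. Assumptions, since the slide formulas are lost: the eigenvector used is the one of the second largest eigenvalue (by real part) of the non-backtracking matrix, each node aggregates the signs of all oriented edges pointing into it (instead of sampling one edge), and the final randomized assignment is replaced by a plain majority vote. Graph parameters are made up for illustration.

```python
import numpy as np

def nb_spectral_labels(adj):
    n = adj.shape[0]
    src, dst = np.nonzero(adj)                # both orientations of every edge
    B = ((dst[:, None] == src[None, :]) &
         (src[:, None] != dst[None, :])).astype(float)
    vals, vecs = np.linalg.eig(B)
    order = np.argsort(-vals.real)
    xi = vecs[:, order[1]].real               # eigenvector of the 2nd eigenvalue
    signs = np.sign(xi)
    score = np.zeros(n)
    np.add.at(score, dst, signs)              # sum signs over edges pointing into each node
    return (score > 0).astype(int)            # majority vote (ties and isolated nodes go to 0)

# Sparse symmetric two-community graph: probabilities a/n inside, b/n across.
rng = np.random.default_rng(1)
n, a, b = 150, 10.0, 1.0
truth = np.repeat([0, 1], n // 2)
probs = np.where(truth[:, None] == truth[None, :], a / n, b / n)
upper = np.triu(rng.random((n, n)) < probs, k=1)
adj = (upper | upper.T).astype(int)
labels = nb_spectral_labels(adj)
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
```

The dense eigendecomposition is only practical for small graphs; it is used here to keep the sketch short.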

Page 19

Illustration for 2-community symmetric Stochastic block model


Page 20

Corollary 2

For Erdős-Rényi graph $G(n, \alpha/n)$ with $\alpha > 1$, spectrum of associated non-backtracking matrix $B$ satisfies with high probability as $n \to \infty$: $\lambda_1 = \alpha + o(1)$ and $|\lambda_2| \le \sqrt{\alpha} + o(1)$

An approximate version of $|\lambda_2| \le \sqrt{\lambda_1}$

Property generalizes notion of Ramanujan graph to non-regular case [Ihara’66, Stark-Terras’96]

Hence: Corollary 2 is a « non-regular » version of Friedman’s result for regular graphs: most random graphs are approximately Ramanujan according to the extended definition
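The statement can be explored numerically on a small sparse random graph. This sketch assumes edge probability `alpha / n` and compares the two largest eigenvalue moduli of the non-backtracking matrix with `alpha` and its square root; the values `n = 200` and `alpha = 4` are arbitrary, and at this size the match is only approximate.

```python
import numpy as np

def nb_eigenvalue_moduli(adj):
    """Moduli of the non-backtracking eigenvalues, in decreasing order."""
    src, dst = np.nonzero(adj)                # both orientations of every edge
    B = ((dst[:, None] == src[None, :]) &
         (src[:, None] != dst[None, :])).astype(float)
    return np.sort(np.abs(np.linalg.eigvals(B)))[::-1]

rng = np.random.default_rng(0)
n, alpha = 200, 4.0
upper = np.triu(rng.random((n, n)) < alpha / n, k=1)
adj = (upper | upper.T).astype(int)

moduli = nb_eigenvalue_moduli(adj)
lam1, lam2 = moduli[0], moduli[1]
print(lam1, lam2, np.sqrt(lam1))              # compare lam2 with sqrt(lam1)
```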

Page 21

Illustration for Erdős-Rényi graph with

Page 22

Proof elements

Key step: for prove matrix expansion

where (near-eigenvalue), (near-eigenvector),

and with high probability (near-orthonormal system) (low condition number) (negligible perturbation)

Yields the result, after leveraging the Bauer-Fike theorem (as $B$ is not symmetric, can’t rely on Courant-Fischer characterization of eigen-elements as solutions of optimization problems)

Low-rank

Page 23

Proof elements (local analysis) (near-orthonormal system) (low condition number)

Established by « local analysis »:

Couple node neighborhood to Galton-Watson branching tree

Control growth of processes of interest on tree (martingale analyses, elaborating on results of [Kesten-Stigum’66])

[Figure: neighborhood of node i, with nodes carrying + and - spins]

Page 24

Proof elements (local analysis)

Nb of distance t neighbors:

Sum of spins of distance t neighbors:

then whp:

[Figure: neighborhood of node i, with nodes carrying + and - spins]

Page 25

Proof elements (ctd)

To prove negligible perturbation , establish Intermediate result:

Introduce , centered version of :

where sum over non-backtracking « tangle-free » paths starting at e, ending at f

Yields after some manipulations

Page 26

Proof elements (ctd)

to be bounded over

, (bound expectation of moments: path counting combinatorics à la [Füredi-Komlós’81])

for

« Local analysis » entails that close to , to which is orthogonal

Page 27

Remaining mysteries about SBM’s (1)

Conjectured “phase diagram” for more than 2 blocks (assuming fixed inter-community parameter b)

[Figure: phase diagram with axes “Number of communities r” (marks at r=4, r=5) and “Intra-community parameter a”; regions: “Detection easy (spectral methods or BP)”, “Detection hard but feasible (how? In polynomial time?)”, “Detection infeasible”]

Page 28

Remaining mysteries about SBM’s (2)

Clique detection problem: add a size-K clique to random graph with edge-probability ½

i.e. a 2-block SBM with unbalanced block sizes:

for $K = \Omega(\sqrt{n \log n})$, clique easily detectable (e.g. inspection of node degrees)

Are there polynomial-time algorithms for smaller yet large K? (e.g. )

A notoriously hard problem (“planted clique detection” recently proposed as a new benchmark of algorithmic hardness)

[Figure: block structure of the adjacency matrix with block sizes K and n-K and edge probabilities ½]
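Degree inspection for a large planted clique can be sketched as below. The sizes `n = 400` and `K = 150` are arbitrary illustrative values, chosen so that clique nodes have clearly higher degrees than the rest; the function names are hypothetical.

```python
import numpy as np

def plant_clique(n, K, seed=0):
    """Random graph with edge probability 1/2 plus a clique on K random nodes."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < 0.5, k=1)
    adj = (upper | upper.T).astype(int)
    clique = rng.choice(n, size=K, replace=False)
    adj[np.ix_(clique, clique)] = 1           # connect every pair of clique nodes
    adj[clique, clique] = 0                   # but keep the diagonal empty
    return adj, set(clique.tolist())

def top_degree_nodes(adj, K):
    """Guess the clique as the K nodes of highest degree."""
    degrees = adj.sum(axis=1)
    return set(np.argsort(-degrees)[:K].tolist())

adj, clique = plant_clique(n=400, K=150)
guess = top_degree_nodes(adj, K=150)
recall = len(guess & clique) / len(clique)    # fraction of clique nodes recovered
```

For much smaller K the degree gap is hidden by the natural degree fluctuations of order √n, which is where the hard regime mentioned on this slide begins.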

Page 29

Conclusions and Outlook

Variations of basic spectral methods still to be invented: interesting mathematics and practical relevance

Detection in SBM = rich playground for analysis of computational complexity with methods of statistical physics

Computationally efficient methods for “hard” cases (planted clique, intermediate phase for multiple communities)?

Non-regular Ramanujan graphs: theory still in its infancy (strong analogue of Alon-Boppana’s theorem still missing, but…)

Page 30

Thanks!