Spectral Algorithms for Graph Partitioning
Luca Trevisan, U.C. Berkeley
Based on work with Tsz Chiu Kwok, Lap Chi Lau, James Lee,Yin Tat Lee, and Shayan Oveis Gharan
spectral graph theory
Use linear algebra to
• understand/characterize graph properties
• design efficient graph algorithms
linear algebra and graphs
Take an undirected graph G = (V, E), and consider the matrix

A[i, j] := weight of edge (i, j)

eigenvalues −→ combinatorial properties
eigenvectors −→ algorithms for combinatorial problems
fundamental thm of spectral graph theory
G = (V, E) undirected, A adjacency matrix, d_v the degree of v, D the diagonal matrix of degrees

L := I − D^{−1/2} A D^{−1/2}
L is symmetric, all its eigenvalues are real and
• 0 = λ1 ≤ λ2 ≤ · · · ≤ λn ≤ 2
• λk = 0 iff G has ≥ k connected components
• λn = 2 iff G has a bipartite connected component
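The theorem can be checked numerically. The sketch below is not from the talk; it assumes numpy and uses two made-up example graphs (two disjoint triangles, and a single edge). It builds L = I − D^{−1/2} A D^{−1/2} and inspects its spectrum.

```python
import numpy as np

def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a weighted adjacency matrix A."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)  # assumes no isolated vertices
    return np.eye(len(A)) - (A * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]

# Two disjoint triangles: 2 connected components, each an odd cycle.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
A = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
lam = np.linalg.eigvalsh(normalized_laplacian(A))  # ascending order
# lambda_1 = lambda_2 = 0 (two components); lambda_n < 2 (no bipartite component)
print(np.round(lam, 6))

# A single edge is connected and bipartite, so lambda_n = 2.
edge = np.array([[0, 1], [1, 0]], float)
lam_edge = np.linalg.eigvalsh(normalized_laplacian(edge))
print(np.round(lam_edge, 6))
```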
[Figure: example graphs with the spectra of their normalized Laplacians]
Spectrum: 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3
Spectrum: 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8
Spectrum: 0, 2/3, 2/3, 2/3, 4/3, 4/3, 4/3, 2
eigenvalues as solutions to optimization problems

If M ∈ R^{n×n} is symmetric, and λ1 ≤ λ2 ≤ ··· ≤ λn are its eigenvalues counted with multiplicities, then

λ_k = min_{A: k-dim subspace of R^n} max_{x∈A} (x^T M x) / (x^T x)

and the eigenvectors of λ1, ..., λ_k are a basis of the optimal A
why eigenvalues relate to connected components

If G = (V, E) is undirected and L is its normalized Laplacian, then, writing y = D^{−1/2} x,

(x^T L x) / (x^T x) = ( Σ_{(u,v)∈E} |y_u − y_v|² ) / ( Σ_v d_v · y_v² )

If λ1 ≤ λ2 ≤ ··· ≤ λn are the eigenvalues of L with multiplicities,

λ_k = min_{A: k-dim subspace of R^V} max_{x∈A} R(x)

where

R(x) := ( Σ_{(u,v)∈E} |x_u − x_v|² ) / ( Σ_v d_v · x_v² )

Note: Σ_{(u,v)∈E} |x_u − x_v|² = 0 iff x is constant on connected components
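A quick numerical check of this characterization (a sketch, assuming numpy; the 4-vertex path is a made-up example): with y = D^{−1/2} x for x an eigenvector of λ2, the quotient R(y) evaluates exactly to λ2.

```python
import numpy as np

def rayleigh(A, y):
    """R(y) = sum over edges |y_u - y_v|^2  /  sum_v d_v y_v^2."""
    d = A.sum(axis=1)
    n = len(A)
    num = 0.0
    for u in range(n):
        for v in range(u + 1, n):  # each undirected edge counted once
            num += A[u, v] * (y[u] - y[v]) ** 2
    return num / (d * y ** 2).sum()

# Path graph on 4 vertices (a small hypothetical example).
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
d = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
lam, X = np.linalg.eigh(L)
y = X[:, 1] / np.sqrt(d)   # the change of variable y = D^{-1/2} x
print(lam[1], rayleigh(A, y))  # the two values agree
```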
if G has ≥ k connected components

λ_k = min_{A: k-dim subspace of R^V} max_{x∈A} ( Σ_{(u,v)∈E} |x_u − x_v|² ) / ( Σ_v d_v · x_v² )

Consider the space of all vectors that are constant on each connected component; it has dimension at least k, and so it witnesses λ_k = 0.
if λ_k = 0

λ_k = min_{A: k-dim subspace of R^V} max_{x∈A} ( Σ_{(u,v)∈E} |x_u − x_v|² ) / ( Σ_v d_v · x_v² )

If λ_k = 0, there is a k-dimensional space A of vectors, each constant on connected components.

The graph must have ≥ k connected components.
conductance

vol(S) := Σ_{v∈S} d_v

φ(S) := E(S, V − S) / vol(S)

φ(G) := min_{S⊆V: 0 < vol(S) ≤ vol(V)/2} φ(S)

In regular graphs, this is the same as edge expansion and, up to a factor of 2, the same as non-uniform sparsest cut.
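These definitions can be made concrete with a brute-force sketch (assuming numpy; the "barbell" of two triangles joined by one edge is a made-up example, and the enumeration is exponential, so this is only for tiny graphs):

```python
import numpy as np
from itertools import combinations

def conductance(A, S):
    """phi(S) = E(S, V - S) / vol(S) for adjacency matrix A."""
    S = list(S)
    T = [v for v in range(len(A)) if v not in S]
    return A[np.ix_(S, T)].sum() / A[S].sum()

def phi(A):
    """phi(G): brute force over all S with 0 < vol(S) <= vol(V)/2."""
    n = len(A)
    total = A.sum()  # vol(V)
    best = float('inf')
    for r in range(1, n):
        for S in combinations(range(n), r):
            if 0 < A[list(S)].sum() <= total / 2:
                best = min(best, conductance(A, S))
    return best

# Barbell: two triangles joined by one edge; the bottleneck is that edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(phi(A))  # 1/7: one cut edge, vol({0,1,2}) = 2 + 2 + 3 = 7
```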
conductance

[Figure: a disconnected graph with spectrum 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8; φ(G) = φ({0, 1, 2}) = 0]

[Figure: a connected graph with spectrum 0, .055, .055, .211, ...; φ(G) = 2/18]

[Figure: a graph with spectrum 0, 2/3, 2/3, 2/3, 2/3, 2/3, 5/3, 5/3, 5/3, 5/3; φ(G) = 5/15]
finding S of small conductance

• G is a social network; S is a set of users more likely to be friends with each other than with anyone else
• G is a similarity graph on the items of a data set; S is a set of items more similar to each other than to other items; [Shi, Malik]: image segmentation
• G is an input of an NP-hard optimization problem; recurse on S and V − S, and use dynamic programming to combine in time 2^{O(|E(S, V−S)|)}
conductance versus λ2

Recall: λ2 = 0 ⇔ G disconnected ⇔ φ(G) = 0

Cheeger inequality (Alon, Milman, Dodziuk, mid-1980s):

λ2 / 2 ≤ φ(G) ≤ √(2 λ2)
Cheeger inequality

λ2 / 2 ≤ φ(G) ≤ √(2 λ2)

Constructive proof of φ(G) ≤ √(2 λ2):

• given an eigenvector x of λ2
• sort the vertices v according to x_v
• some set of vertices S occurring as an initial segment or final segment of the sorted order has

φ(S) ≤ √(2 λ2) ≤ 2 √(φ(G))

Fiedler’s sweep algorithm can be implemented in O(|V| + |E|) time
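The sweep algorithm above can be written out directly (a sketch, assuming numpy; this dense-matrix version runs in polynomial but not O(|V| + |E|) time, and the barbell graph is a made-up example):

```python
import numpy as np

def sweep(A):
    """Fiedler's sweep: sort vertices by the lambda_2 eigenvector (in the
    y = D^{-1/2} x coordinates) and return the best prefix cut by conductance."""
    n = len(A)
    d = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
    lam, X = np.linalg.eigh(L)
    y = X[:, 1] / np.sqrt(d)
    order = np.argsort(y)
    total = d.sum()
    best_phi, best_S = float('inf'), None
    cut, vol = 0.0, 0.0
    inside = np.zeros(n, bool)
    for v in order[:-1]:  # every proper prefix S of the sorted order
        # adding v: gains edges to the outside, loses edges into S
        cut += d[v] - 2 * A[v, inside].sum()
        vol += d[v]
        inside[v] = True
        phi_S = cut / min(vol, total - vol)  # smaller side, as in the proof
        if phi_S < best_phi:
            best_phi, best_S = phi_S, set(np.flatnonzero(inside))
    return best_phi, best_S

# Barbell: two triangles joined by one edge.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
phi_S, S = sweep(A)
print(phi_S, S)  # 1/7 and one of the two triangles
```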
applications

λ2 / 2 ≤ φ(G) ≤ √(2 λ2)

• rigorous analysis of the sweep algorithm (practical performance is usually better than the worst-case analysis)
• to certify φ ≥ Ω(1), it is enough to certify λ2 ≥ Ω(1)
• the mixing time of the random walk is O((1/λ2) log |V|), and so also O((1/φ²(G)) log |V|)
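The mixing-time bound can be illustrated numerically (a sketch on a made-up example, the 16-vertex cycle, assuming numpy): after O((1/λ2) log |V|) steps of the lazy walk, the distribution is close to stationary.

```python
import numpy as np

def lazy_walk_matrix(A):
    """W = (I + D^{-1} A) / 2: the lazy random walk on G."""
    d = A.sum(axis=1)
    return 0.5 * (np.eye(len(A)) + A / d[:, None])

# Cycle on n vertices (regular, so the stationary distribution is uniform).
n = 16
A = np.zeros((n, n))
for v in range(n):
    A[v, (v + 1) % n] = A[v, (v - 1) % n] = 1.0
W = lazy_walk_matrix(A)

d = A.sum(axis=1)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))
lam2 = np.linalg.eigvalsh(L)[1]

t = int(np.ceil((1 / lam2) * 4 * np.log(n)))  # O((1/lambda_2) log |V|) steps
p = np.zeros(n)
p[0] = 1.0                                    # start at a single vertex
p_t = p @ np.linalg.matrix_power(W, t)
print(np.abs(p_t - 1 / n).sum())              # small L1 distance to uniform
```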
goal

A similar theory for the other eigenvectors, with algorithmic applications

One intended application: just as Cheeger’s inequality is a worst-case analysis of Fiedler’s algorithm, develop a worst-case analysis of spectral clustering
spectral clustering

The following algorithm works well in practice. (But to solve what problem?)

1 Compute eigenvectors x^(1), ..., x^(k) of the k smallest eigenvalues of L
2 Define the mapping F: V → R^k, F(v) := (x^(1)_v, x^(2)_v, ..., x^(k)_v)
3 Apply k-means to the points that the vertices are mapped to
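The three steps can be sketched as follows. This is an illustrative implementation, not the talk's: it assumes numpy and substitutes a deterministic toy Lloyd's k-means for a library implementation.

```python
import numpy as np

def spectral_clustering(A, k, iters=100):
    """Embed with the k bottom eigenvectors of the normalized Laplacian,
    then cluster with a small deterministic Lloyd's k-means."""
    n = len(A)
    d = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(d, d))
    _, X = np.linalg.eigh(L)      # columns sorted by ascending eigenvalue
    F = X[:, :k]                  # F(v) = (x^(1)_v, ..., x^(k)_v)
    # farthest-point initialization (deterministic, for reproducibility)
    idx = [0]
    for _ in range(1, k):
        dists = ((F[:, None, :] - F[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(dists)))
    centers = F[idx].copy()
    for _ in range(iters):        # Lloyd iterations
        labels = np.argmin(((F[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = F[labels == c].mean(axis=0)
    return labels

# Two triangles joined by one edge: two natural clusters.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
labels = spectral_clustering(A, 2)
print(labels)  # one label on {0,1,2}, the other on {3,4,5}
```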
[Figure: spectral embedding of a random graph, k = 3]
When λk is small
Lee, Oveis-Gharan, T, 2012
From fundamental theorem of spectral graph theory:
• λk = 0⇔ G has ≥ k connected components
We prove:
• λk small ⇔ G has ≥ k disjoint sets of small conductance
order-k conductance

Define the order-k conductance

φ_k(G) := min_{S_1, ..., S_k disjoint} max_{i=1,...,k} φ(S_i)

Note:

• φ(G) = φ_2(G)
• φ_k(G) = 0 ⇔ G has ≥ k connected components
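φ_k can be computed by brute force on tiny examples (a sketch, assuming numpy; the barbell graph below is a made-up example, and the enumeration is exponential in |V|):

```python
import numpy as np
from itertools import product

def phi_set(A, S):
    """phi(S) = E(S, V - S) / vol(S)."""
    T = [v for v in range(len(A)) if v not in S]
    return A[np.ix_(S, T)].sum() / A[S].sum()

def phi_k(A, k):
    """Order-k conductance: minimize max_i phi(S_i) over k disjoint
    nonempty sets, by exhaustive labeling. Tiny graphs only."""
    n = len(A)
    best = float('inf')
    for labels in product(range(k + 1), repeat=n):  # label k = "in no set"
        sets = [[v for v in range(n) if labels[v] == i] for i in range(k)]
        if all(sets):
            best = min(best, max(phi_set(A, S) for S in sets))
    return best

# Two triangles joined by one edge: phi_2 is attained by the two triangles.
A = np.zeros((6, 6))
for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1.0
print(phi_k(A, 2))  # = phi(G) = 1/7
print(phi_k(A, 3))  # strictly larger: a third disjoint set must cut more
```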
[Figure: a disconnected graph with spectrum 0, 0, .69, .69, 1.5, 1.5, 1.8, 1.8; φ3(G) = max{0, 2/6, 2/4} = 1/2]

[Figure: a connected graph with spectrum 0, .055, .055, .211, ...; φ3(G) = max{2/12, 2/12, 2/14} = 1/6]
λk

λ_k = 0 ⇔ φ_k = 0

λ_k / 2 ≤ φ_k(G) ≤ O(k²) · √λ_k   (Lee, Oveis Gharan, T, 2012)

The upper bound is constructive: in nearly-linear time we can find k disjoint sets, each of expansion at most O(k²) √λ_k.
λk

λ_k / 2 ≤ φ_k(G) ≤ O(k²) · √λ_k

φ_k(G) ≤ O(√((log k) · λ_{1.1 k}))   (Lee, Oveis Gharan, T, 2012)

φ_k(G) ≤ O(√((log k) · λ_{O(k)}))   (Louis, Raghavendra, Tetali, Vempala, 2012)

The factor O(√(log k)) is tight
λk

φ_k(G) ≤ O(√((log k) · λ_{O(k)}))

(Louis, Raghavendra, Tetali, Vempala, 2012) (Lee, Oveis Gharan, T, 2012)

Miclo 2013:

• analog for infinite-dimensional Markov processes
• application: every hyper-bounded operator has a spectral gap
• solves a 40+ year old open problem
spectral clustering

Recall the spectral clustering algorithm:

1 Compute eigenvectors x^(1), ..., x^(k) of the k smallest eigenvalues of L
2 Define the mapping F: V → R^k, F(v) := (x^(1)_v, x^(2)_v, ..., x^(k)_v)
3 Apply k-means to the points that the vertices are mapped to

The LOT and LRTV algorithms find k low-conductance sets if they exist.

Their first step is the spectral embedding; then geometric partitioning, but not k-means.
When λk is large
Cheeger inequality

λ2 / 2 ≤ φ(G) ≤ √(2 λ2)

φ(G) ≤ O(k) · λ2 / √λ_k   (Kwok, Lau, Lee, Oveis Gharan, T, 2013)

and the upper bound holds for the sweep algorithm.

The nearly-linear time sweep algorithm finds S such that, for every k,

φ(S) ≤ φ(G) · O(k / √λ_k)
Liu 2014: analog for Riemannian manifolds, with applications to bounding eigenvalue gaps
proof structure

1 We find a 2k-valued vector y such that

||x − y||² ≤ O(λ2 / λ_k)

where x is the eigenvector of λ2

2 For every k-valued vector y, applying the sweep algorithm to x finds a set S such that

φ(S) ≤ O(k) · (λ2 + √λ2 · ||x − y||)

So the sweep algorithm finds S with

φ(S) ≤ O(k) · λ2 · (1 / √λ_k)
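Spelling out the arithmetic that combines the two steps (using only λ_k ≤ 2):

```latex
% Step 1 gives \|x - y\| \le O(\sqrt{\lambda_2 / \lambda_k}), so in Step 2:
\phi(S) \;\le\; O(k)\Big(\lambda_2 + \sqrt{\lambda_2}\,\|x - y\|\Big)
        \;\le\; O(k)\Big(\lambda_2 + O\big(\tfrac{\lambda_2}{\sqrt{\lambda_k}}\big)\Big)
        \;=\;   O(k)\cdot\frac{\lambda_2}{\sqrt{\lambda_k}},
% where the last step uses \lambda_k \le 2, so that
% \lambda_2 \le \sqrt{2}\cdot\lambda_2 / \sqrt{\lambda_k}.
```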
bounded threshold rank

[Arora, Barak, Steurer], [Barak, Raghavendra, Steurer], [Guruswami, Sinop], ...

If λ_k is large, then a cut of approximately minimal conductance can be found in time exponential in k, using spectral methods [ABS] or convex relaxations [BRS, GS].

[Kwok, Lau, Lee, Oveis-Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation if λ_k is large.
bounded threshold rank

If λ_k is large:

[Kwok, Lau, Lee, Oveis-Gharan, T]: the nearly-linear time sweep algorithm finds a good approximation

[Oveis-Gharan, T]: the Goemans-Linial relaxation (solvable in polynomial time independent of k) can be rounded better than in the Arora-Rao-Vazirani analysis

[Oveis-Gharan, T]: the graph satisfies a weak regularity lemma in the sense of Frieze and Kannan
bounded threshold rank

If λ_k is large, the graph is an easy instance for several algorithms. Two types of results:

• There is a near-optimal combinatorial solution with a simple structure; find it by brute force
• The optimum of a relaxation has a special structure; an improved rounding algorithm uses that special structure
When λk+1 is large and λk is small
eigenvalue gap

• if λ_k = 0 and λ_{k+1} > 0, then G has exactly k connected components
• [Tanaka 2012] If λ_{k+1} > 5k √λ_k, then the vertices of G can be partitioned into k sets of small conductance, each inducing a subgraph of large conductance. [Non-algorithmic proof]
• [Oveis-Gharan, T 2013] It is sufficient that φ_{k+1}(G) > (1 + ε) φ_k(G); if λ_{k+1} > O(k²) √λ_k, then the partition can be found algorithmically
In Summary

λ_k is small: there are k disjoint non-expanding sets; they can be found efficiently

λ_k is large: easy instance for various algorithms

λ_k is small and λ_{k+1} is large: the vertices can be partitioned into k sets, each with high internal expansion and small external expansion
lucatrevisan.wordpress.com/tag/expanders/