Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition...

30
References Convergence of Graph Laplacians Daniel Ting 1 joint work with Ling Huang 2 and Michael Jordan 3 1 Dept. of Statistics, UC Berkeley 2 Intel Research, Berkeley 3 EECS and Dept. of Statistics, UC Berkeley July 29, 2011 1 / 29

Transcript of Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition...

Page 1: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Convergence of Graph Laplacians

Daniel Ting 1

joint work withLing Huang 2 and Michael Jordan 3

1Dept. of Statistics, UC Berkeley

2Intel Research, Berkeley

3EECS and Dept. of Statistics, UC Berkeley

July 29, 2011

1 / 29

Page 2: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Overview

What the talk is aboutMy general goal: Understand methods that rely on the manifoldassumption

Graph LaplaciansUsed in:

I ClusteringI Semi-supervised learningI Non-linear Dimensionality Reduction

Goals:I Better understanding of Laplacian or “Laplacian like” methodsI Analysis of kNN graphsI Better choices when constructing graphs

2 / 29

Page 3: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Graph Laplacian

Definition (Graph Laplacian)

W = edge weight matrixD = diagonal degree matrix

There are three commonly used versions of Graph LaplaciansLu = D −W (unnormalized)

Ln = I −D−1/2WD−1/2 (normalized)

Lrw = I −D−1W (random walk).

3 / 29

Page 4: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

A visual example: kNN non-linear embedding

−2

0

2

−2

0

2

0.05

0.1

0.15

(A) Gaussian Manifold

−0.1 −0.05 0 0.05

−0.1

−0.05

0

0.05

Kernel Laplacian Embedding

−0.05 0 0.05

−0.05

0

0.05

(C) Raw kNN Laplacian Embedding

Two graph constructions:1) Why do they behave differently?2) Is it fixable?

4 / 29

Page 5: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Constructing graphs on a manifold

Convert Euclidean distance to edge weights: Existing theory1 Choose a smooth kernel K (e.g. Gaussian)2 Choose a bandwidth h3 Choose a weighting/normalization exponent 1/d(x)α

wij =1

d(xi)αd(xj)αK

(‖xi − xj‖

h

)Overly restrictive conditions!

Fails to cover kNN graphsImportant since kNN graphs

I are sparse and have good computational propertiesI show better empirical performance in SSL applications

5 / 29

Page 6: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Constructing graphs on a Manifold

Generalizing edge weight choices1 Allow for a non-smooth kernel K2 Use a location dependent bandwidth hR3 Use arbitrary weight functions ω

wij = ω(xi)ω(xj) K(‖xi − xj‖hR(xi, xj)

)Covers most geometric graph constructions of interest. This covers kNNgraphs by using an indicator kernel.

6 / 29

Page 7: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Overview: The details

AnalysisIntroduce kernel-free framework for analysis using diffusion theoryIdentify the limiting operator and corresponding smoothness functional

I Effect of the densityI Effect of the graph construction method

Discuss how choose a good graph construction

ApplicationkNN graphs“self-tuning” graphs of Zelnik-Manor & Perona (2004)Locally Linear Embedding (LLE)

7 / 29

Page 8: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Setting and Notation

Setting

Compact, smooth m-dimensional manifold without boundaryM embedded in Rk

Density p

Graph construction method (e.g. kNN) where neighborhood sizes are shrinking

Examine L(n)u f →? (i.e. pointwise convergence in the strong operator topology)

NotationData points x, y ∈ RkSmooth C3 function onM f :M→ RNormal coordinates for y

in neighborhood of x s ∈ Rm

Canonical measure onM ηBase kernel function K(·) : R+ → R+

degree function dn(·)

8 / 29

Page 9: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Motivation for a Kernel-free framework

Previous questionHere’s a graph construction method. Does a limit exist and what is it?

Broader questionsWhat characterizes the limit?What are sufficient conditions for a limit to exist?

9 / 29

Page 10: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Answer: Diffusion processes

CharacterizationsLaplacian is a (negative) infinitesimal generator for a random walkDrift µ and diffusion σσT terms characterize a diffusion process

dXt = µ(Xt)dt+ σ(Xt)dWt

Infinitesimal generator of a diffusion process is

Gf =12

∑i,j

(σσT

)ij

∂2f

∂si∂sj+∑i

µi∂f

∂si.

10 / 29

Page 11: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Interpreting the drift and diffusion

A special elliptic operator: Laplace-BeltramiIf σσT = g(·)I and µ/g is a conservative vector field with µ/g = 2∇ log q, then

G =12g∆ + 〈µ,∇〉 =

12g∆q

Continuous analog to graph LaplacianSmoothness functional∫

〈f,−∆qf〉dη =∫〈∇f,∇f〉qdη = ‖∇f‖2L2(q)

11 / 29

Page 12: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Main result

Key AssumptionsKernel K of bounded variation

Bandwidth scaling hn → 0 and hm+2n n/ logn→∞

Weight ωn(x) and bandwidth functions Rn(x, y) with Taylor-like expansions

Rn(x, x+ ε) = hn`r(x) + εdr(x, sign(u(x)Tε)) + o(hn)

´ResultExpressed in normal coordinates, the drift and diffusion defined by Lrw are

µs(x) = r(x)2„∇p(x)p(x)

+∇ω(x)

ω(x)+m+ 2

2

dr(x)

r(x)

«,

σs(x)σs(x)T = r(x)2I

where dr(x) = 12(dr(x, 1) + dr(x,−1))us(x).

12 / 29

Page 13: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Remarks

Alternative form (for self-adjoint Laplacians)If Rn(x, y)/hn =

√r(x)r(y) + o(hn) e.v. then we have

The asymptotic limit of the graph Laplacian is

−cnLrw → r2∆q

−c′nLu →q

p∆q

where q = p2ω2rm+2.Gives a change of measure from p to q

13 / 29

Page 14: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Proof sketch

Based on one cute trick. Write a kernel with bounded variation as aweighted sum of indicator kernels

K(x) =∫

I(x < z)dν+(z)−∫

I(x < z)dν−(z)

All the calculations involving non-random quantities reduce to findingmoments for a sphere

Location dependent bandwidth =⇒ in-dicator kernel behaves like it is shifted bym+2

2 h2dr. Shift introduces drift. Effect onthe diffusion term is of smaller order than theleading term.

������������������������������������������

������������������������������������������

{

(m+2)r’(x)h /22

r(x)h

Use Bernstein inequality and Borel-Cantelli to get almost sureconvergence of random quantities

14 / 29

Page 15: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application to kernel graphs

Weights

wij =K (‖xi − xj‖ /hn)dn(xi)αdn(xj)α

ωn(x) = 1/dn(x)α → 1/p(x)α

Asymptotics (known)Rn(x, y) = 1q = p2w2rd+2 = p2−2α

Asymptotic random walk and unnormalized Laplacians arecnLrw → ∆p2−2α

c′nLu → p1−2α∆p2−2α

If α < 1, drift towards high density regions.

15 / 29

Page 16: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application to kNN

Weights

wij = I(‖xi − xj‖ < Rn(xi, xj)

)kn/n→ 0 and k1+2/m

n /(n2/m log n)→∞

Undirected kNN graph (OR rule)Rn(x, y)/hn ≈ max{p(x)−1/m, p(y)−1/m}dr(x) = 1

2∇p(x)−1/m

Asymptotic random walk and unnormalized Laplacians are

cnLrw = c′nLu →1

p2/m∆p1−2/m

Drift away from high density regions if m = 1!

16 / 29

Page 17: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

kNN non-linear embedding example

−2

0

2

−2

0

2

0.05

0.1

0.15

(A) Gaussian Manifold

−0.1 −0.05 0 0.05

−0.1

−0.05

0

0.05

Kernel Laplacian Embedding

−0.05 0 0.05

−0.05

0

0.05

(C) Raw kNN Laplacian Embedding

Fix:kNN limit: 1

p∆p0

kernel limit: ∆p

Take ω = p1/2 ≈ r−1

17 / 29

Page 18: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

kNN non-linear embedding example

−2

0

2

−2

0

2

0.05

0.1

0.15

(A) Gaussian Manifold

−0.1 −0.05 0 0.05

−0.1

−0.05

0

0.05

Kernel Laplacian Embedding

−0.05 0 0.05

−0.05

0

0.05

(C) Raw kNN Laplacian Embedding

−0.05 0 0.05

−0.08

−0.06

−0.04

−0.02

0

0.02

0.04

0.06

0.08(D) Rescaled kNN Laplacian Embedding

18 / 29

Page 19: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application to “self-tuning” graphs

“self-tuning” kernel (Zelnik-Manor & Perona (2004))

K(x, y) = exp

(−‖x− y‖

2

σxσy

)where σx is the distance between x and the kth neighbor

Asymptotics

r(x, y) =√p(x)−1/mp(y)−1/m

dr(x) = 12∇p(x)−1/m (same as undirected kNN)

Same asymptotic Laplace-Beltrami operator as for kNN

cnLrw = c′nLu →1

p1+2/m∆p1−2/m

19 / 29

Page 20: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application of kernel-free approach to LLE

Reminder: LLEGoal: Find weights and coordinates that minimize reconstruction error.

Find weights Find coordinates(local regression) (eigen-decomposition)

minW

Y T (I −W )T (I −W )Y minzzT (I −W )T (I −W )z

BehaviorBelkin & Niyogi (2003) give a heuristic derivation that LLE should behave likethe Laplace-Beltrami operator, but... This is not completely true!

20 / 29

Page 21: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application of kernel-free approach to LLE

Normal coordinatesConverting normal coordinates s to extrinsic coordinates y

y = x+HTxs+ LT⊥x (ssT ) +O(‖s‖3)

Rough analysisExamine the “drift” and “diffusion” terms for the LLE matrix I −W .

µRk(x) = HTx Es︸︷︷︸µ(x)

+LT⊥x E(ssT )︸ ︷︷ ︸σσT (x)

The quantities in normal coordinates satisfyµ = 0 LT⊥x

(σσT

)= 0

21 / 29

Page 22: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Application of kernel-free approach to LLE

Implications for behaviorµ = 0

I No effect of density on drift

LT⊥x(σσT

)= 0

I Curvature of manifold affects diffusion termI No well-defined limit when L is not full rankI Does not behave like any elliptic operator if L is full rank

Behavior in practiceWeights are regularized=⇒ favors constant weights=⇒ like kNN-Laplacian if regularization is large

22 / 29

Page 23: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Explaining weird behavior of LLE

−20

2

−2

0

2

−1

−0.5

0

0.5

1

(A) Toroidal helix

−0.05 0 0.05−0.05

0

0.05

(B) Laplacian

−2 −1.5 −1 −0.5 0 0.5 1

−1.5

−1

−0.5

0

0.5

1

1.5(C) LLE w/ regularization 1e−3

−1 0 1 2−2

−1.5

−1

−0.5

0

0.5

1

1.5

2

(D) LLE w/ regularization 1e−9

23 / 29

Page 24: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Explaining weird behavior of LLE

−20

2

−2

0

2

−1

−0.5

0

0.5

1

(A) Toroidal helix

−0.05 0 0.05−0.05

0

0.05

(B) Laplacian

−2 −1.5 −1 −0.5 0 0.5 1

−1.5

−1

−0.5

0

0.5

1

1.5(C) LLE w/ regularization 1e−3

−2 −1.5 −1 −0.5 0 0.5 1 1.5−2

−1.5

−1

−0.5

0

0.5

1

1.5LLE w/ projection step

24 / 29

Page 25: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Effect of regularization and boundary in LLE

−10 −5 0 5 102040

−10

−5

0

5

10

Swiss hole

−2 −1 0 1−2−101

−1

−0.5

0

0.5

1

1.5

LLE w/ regularization 1e−0

−20

2

−2−1012

−1

0

1

LLE w/ regularization 1e−2

−2024 −2−1012

−2

−1

0

1

LLE w/ regularization 1e−3

25 / 29

Page 26: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Effect of regularization and boundary in LLE

−10 −5 0 5 102040

−10

−5

0

5

10

Swiss hole

−2 −1 0 1

−1

0

1

012

LLE w/ regularization 1e−9

−2 0 2−202

−1

0

1

2

LLE w/projection

−2024 −2−1012

−2

−1

0

1

LLE w/ regularization 1e−3

26 / 29

Page 27: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Other good properties: Convergence of Eigenvectors

ImportanceConstruction of valid basis functionsPossibly of relevance for spectral clustering

Graph Laplacians and spectrum (von Luxburg et al. (2008))

Luf(·) = d(·)f(·)−∫K(·, y)f(y)dPy

No convergence for eigenvalues in rng(d).Good spectral properties: Choose d so rng(d) = {1}.

27 / 29

Page 28: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Graph construction and spectrum

Asymptotic degree operator and weighting density

d(x) = p(x)ω(x)2r(x)m set= 1

q(x) = p(x)2ω(x)2r(x)m+2

= p(x)r(x)2 Any density q is possible

Possible “good” symmetric graph constructionsw/o loc dep bw w/ loc dep bw

Unnormalized∆p

q

p∆q

Normalized p12−α∆p2−2αp−

12+α

g

(q

p∆q

)g−1

28 / 29

Page 29: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Conclusion

What we introduceKernel-free framework using diffusion processesAnalysis for a general class of graph constructionsLocation dependent bandwidth + arbitrary weights + non-smooth kernels

Practical implicationsBetter understanding of kNN graphsLLE

I no “drift” componentI affected by curvature of the manifold

Construction of arbitrary first order smoothness functionals ‖∇f‖2L2(q),

not just q = pα.Pilot density estimates lead to “better” graph constructions

29 / 29

Page 30: Convergence of Graph Laplacianssaito/confs/ICIAM11/ting.pdfReferences Graph Laplacian Definition (Graph Laplacian) W= edge weight matrix D= diagonal degree matrix There are three

References

Belkin, M. and Niyogi, P. Laplacian eigenmaps for dimensionality reductionand data representation. Neural Computation, 15(6):1373–1396, 2003.

Donoho, D.L. and Grimes, C. Hessian eigenmaps: Locally linear embeddingtechniques for high-dimensional data. Proceedings of the NationalAcademy of Sciences, 100(10):5591, 2003.

von Luxburg, U., Belkin, M., and Bousquet, O. Consistency of spectralclustering. Annals of Statistics, 36(2):555–586, 2008.

Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. In NIPS 17,2004.

Zhang, Z. and Zha, H. Principal Manifolds and Nonlinear DimensionalityReduction via Tangent Space Alignment. SIAM Journal on ScientificComputing, 26:313, 2004.

29 / 29