Statistical perturbation theory for spectral clustering Harrachov, 2007 A. Spence and Z. Stoyanov.


Page 1:

Statistical perturbation theory for spectral clustering

Harrachov, 2007

A. Spence and Z. Stoyanov

Page 2:

Plan of the Talk

A. Clustering (Brief overview).

B. Deterministic Perturbation Theory.

C. Statistical Perturbation Theory.

Page 3:

Graph Clustering

[Figure: example graph with nodes labelled 1 to 7]

Page 4:

Graph Clustering

[Figure: graph with nodes labelled 1 to 7]

Page 5:

Graph Clustering + Perturbation

[Figure: graph with nodes labelled 1 to 7; node 5 is marked with a question mark]

Page 6:

Gene Expression Data Clustering

An Application

• There are over 10 000 genes expressed in any one tissue;

• DNA arrays typically produce very noisy data.

1. Genes in same cluster behave similarly?

2. Genes in different clusters behave differently?

Issues:

Page 7:

Bi-partite Graphs

[Figure: bipartite graph with one vertex set labelled 1 to 4 and the other labelled 1 to 3]

Page 8:

Matrix Form

Page 9:

A Real Data Matrix (Leukemia)

Page 10:

Spectral Clustering: General Idea

Discrete Optimisation Problem (NP-hard)

Real Optimisation Problem (Tractable)

Approximation

Exact - Impractical

Heuristic - Practical

Page 11:

Discrete Optimisation SVD

[Figure: diagram with regions labelled Active, Inactive, Inactive, Active]

Solution: Singular Value Decomposition of W_scaled
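The slide gives only the headline "SVD of W_scaled", so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes the common degree scaling W_scaled = D_rows^{-1/2} W D_cols^{-1/2} and a two-way split by the sign of the second singular vectors; the function name `spectral_bipartition` and the small matrix `W` are hypothetical.

```python
import numpy as np

def spectral_bipartition(W):
    """Split rows and columns of a nonnegative matrix W into two groups each.

    Assumption (not stated on the slide): W is scaled by the square roots
    of its row and column sums before the SVD.
    """
    W = np.asarray(W, dtype=float)
    d_rows = W.sum(axis=1)
    d_cols = W.sum(axis=0)
    # W_scaled[i, j] = W[i, j] / sqrt(d_rows[i] * d_cols[j])
    W_scaled = W / np.sqrt(np.outer(d_rows, d_cols))
    U, s, Vt = np.linalg.svd(W_scaled, full_matrices=False)
    # The leading singular pair is trivial (singular value 1), so the
    # second pair carries the two-way partition information.
    u2 = U[:, 1] / np.sqrt(d_rows)
    v2 = Vt[1, :] / np.sqrt(d_cols)
    return u2 > 0, v2 > 0, s

# Hypothetical 4 x 3 data matrix with two visible blocks:
# rows 0-1 go with columns 0-1, rows 2-3 go with column 2.
W = np.array([
    [5.0, 4.0, 0.1],
    [4.0, 5.0, 0.2],
    [0.1, 0.2, 6.0],
    [0.2, 0.1, 5.0],
])
row_labels, col_labels, s = spectral_bipartition(W)
print("row clusters:   ", row_labels)
print("column clusters:", col_labels)
print("singular values:", s)
```

Which of the two groups is reported as `True` depends on the sign convention of the SVD routine; only the grouping itself is meaningful.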

Page 12:

Clustering Algorithm: Summary

[Figure: diagram with regions labelled ACTIVE and INACTIVE]

Page 13:

Literature

Page 14:

Types of Graph Matrices

Page 15:

How we Cluster

Page 16:

Leukemia Data

Page 17:

Clustered Leukemia Data

Page 18:

Inaccuracies in the Data (Perturbation Theory)

Page 19:

Perturbation Theory (Deterministic Noise)

Page 20:

Deterministic Perturbation (Symmetric Matrix)

Page 21:

Linear Solve

Page 22:

Taylor Expansions
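The formulas on the deterministic perturbation slides did not survive in the transcript. For orientation, here is the standard first-order result for a simple eigenvalue of a symmetric matrix; the notation (A, E, ε, λ, v) is an assumption and may differ from the slides.

```latex
% Assumed setting: A symmetric, E symmetric, A(\varepsilon) = A + \varepsilon E,
% \lambda a simple eigenvalue of A with unit eigenvector v.
\begin{align*}
  \lambda(\varepsilon) &= \lambda + \varepsilon\,\lambda'(0) + O(\varepsilon^{2}), &
  v(\varepsilon) &= v + \varepsilon\,v'(0) + O(\varepsilon^{2}).
\end{align*}
% Differentiating A(\varepsilon) v(\varepsilon) = \lambda(\varepsilon) v(\varepsilon)
% and v(\varepsilon)^{T} v(\varepsilon) = 1 at \varepsilon = 0 gives
\begin{align*}
  (A - \lambda I)\,v'(0) &= -\bigl(E - \lambda'(0)\,I\bigr)\,v, &
  v^{T} v'(0) &= 0 .
\end{align*}
% Premultiplying the first equation by v^{T} yields \lambda'(0) = v^{T} E v.
% Both derivatives follow from one linear solve with the bordered matrix,
% which is nonsingular because \lambda is simple:
\[
  \begin{pmatrix} A - \lambda I & -v \\ -v^{T} & 0 \end{pmatrix}
  \begin{pmatrix} v'(0) \\ \lambda'(0) \end{pmatrix}
  =
  \begin{pmatrix} -E v \\ 0 \end{pmatrix}.
\]
```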

Page 23:

Rectangular Case Symmetric

Page 24:

Random Perturbations (plan)

• The Model

• Issues with the Theory

• A Possible Solution via Simulations?

• Experiments

Page 25:

The Model

[Figure: graph with nodes labelled 1 to 7]

Page 26:

Difficulties with Random Matrix Theory (RMT)

Page 27:

Deterministic Perturbation Stochastic Perturbation

(simple eigenvector)

Page 28:

Deterministic Perturbation Stochastic Perturbation

(simple eigenvalues)

Page 29:

PP Plot - Test for Normality (Largest eigenvalue of a Symmetric Matrix)

Page 30:

Simulated Random Perturbation (Largest eigenvalue of a Symmetric Matrix)
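The plots for these slides are not in the transcript, so the following is a minimal simulation sketch of the idea and not the authors' experiment. It assumes a noise model in which E is symmetric with independent N(0, 1) entries on and above the diagonal. The first-order shift v^T E v is then exactly normal with variance 2 - sum(v_i^4), which the simulation checks. The matrix `A`, the noise level `eps` and the helper `random_symmetric` are hypothetical.

```python
import numpy as np

def random_symmetric(n, rng):
    """Assumed noise model: i.i.d. N(0, 1) entries on and above the diagonal."""
    G = rng.standard_normal((n, n))
    return np.triu(G) + np.triu(G, 1).T

rng = np.random.default_rng(0)

# Hypothetical symmetric test matrix with a well separated largest eigenvalue.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
eps = 0.01        # noise level (assumption)
n_samples = 2000  # number of simulated perturbations

eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
lam_max, v = eigvals[-1], eigvecs[:, -1]

# First-order theory: lambda(eps) - lambda is approximately eps * v^T E v,
# and under the assumed model Var(v^T E v) = 2 - sum(v_i^4) for unit v.
predicted_std = np.sqrt(2.0 - np.sum(v**4))

# Direct simulation: scaled shift of the largest eigenvalue of A + eps * E.
shifts = np.array([
    (np.linalg.eigvalsh(A + eps * random_symmetric(len(A), rng))[-1] - lam_max) / eps
    for _ in range(n_samples)
])

print("sample mean of scaled shift:", shifts.mean())
print("sample std of scaled shift: ", shifts.std())
print("first-order predicted std:  ", predicted_std)
```

A PP plot or histogram of `shifts` against a normal distribution with mean 0 and standard deviation `predicted_std` would be the natural visual check.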

Page 31:

Deterministic Perturbation Stochastic Perturbation

(simple eigenvectors)

Page 32:

Results for Laplacian Matrices

Page 33:

Functional of the Eigenvector

Page 34:

Results for h^T v_2

Page 35:

PP Plot of h^T v'(0) - Test for Normality (h = e_j)

Page 36:

Histogram of h^T v'(0) - Simulations (h = e_j)

Page 37:

PP Plot of Simulated v[j]() (Distribution close to Normal)

Page 38:

Histogram of Simulated v[j]() (Distribution close to Normal)

Page 39:

Extension to the Rectangular Case

Page 40:

Probability of “Wrong Clustering”
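The slide's own formula is not in the transcript. One way to make the idea concrete, under two assumptions not confirmed by the source, is the following: node j is assigned to a cluster by the sign of the eigenvector entry v_j, and the perturbed entry is approximately normal, as the preceding plots suggest.

```latex
% Assumptions: cluster of node j = sign of v_j;
% v_j(\varepsilon) \approx \mathcal{N}\bigl(v_j,\ \varepsilon^{2}\sigma_j^{2}\bigr),
% with \sigma_j^{2} = \operatorname{Var}\bigl(e_j^{T} v'(0)\bigr).
\[
  \Pr\bigl(\text{node } j \text{ changes cluster}\bigr)
  = \Pr\bigl(\operatorname{sign} v_j(\varepsilon) \neq \operatorname{sign} v_j\bigr)
  \approx \Phi\!\left(-\frac{|v_j|}{\varepsilon\,\sigma_j}\right),
\]
% where \Phi is the standard normal distribution function.
```

Under this approximation, entries close to zero relative to ε σ_j are the ones most likely to be misclassified.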

Page 41:

Issues with Numerics

Page 42:

Efficient Simulations

Page 43:

Solution via Simulations?

Page 44:

Solution via Simulations? (Algorithm)

Page 45:

Comparing: Direct Calculation Vs. Repeated Linear Solve
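The comparison itself is not in the transcript. One plausible reading, offered as a sketch only, is that "direct calculation" means a fresh eigendecomposition of A + εE for every noise sample, while "repeated linear solve" means solving the first-order bordered system once per sample with a fixed matrix. The matrix `A`, the noise model and all names below are hypothetical.

```python
import numpy as np

def random_symmetric(n, rng):
    """Assumed noise model: i.i.d. N(0, 1) entries on and above the diagonal."""
    G = rng.standard_normal((n, n))
    return np.triu(G) + np.triu(G, 1).T

rng = np.random.default_rng(1)

# Hypothetical symmetric test matrix; its largest eigenvalue is simple.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])
n = A.shape[0]
eps = 1e-4
Es = [random_symmetric(n, rng) for _ in range(50)]

eigvals, eigvecs = np.linalg.eigh(A)
lam, v = eigvals[-1], eigvecs[:, -1]

# Repeated linear solve: the bordered matrix is fixed, only the right-hand
# side changes with E, so all samples can be solved in one call.
K = np.block([
    [A - lam * np.eye(n), -v[:, None]],
    [-v[None, :], np.zeros((1, 1))],
])
rhs = np.column_stack([np.append(-E @ v, 0.0) for E in Es])
sol = np.linalg.solve(K, rhs)
dv, dlam = sol[:n, :], sol[n, :]  # first-order derivatives v'(0), lambda'(0)

# Direct calculation: one eigendecomposition per sample, for comparison.
max_vec_err = 0.0
max_val_err = 0.0
for k, E in enumerate(Es):
    w, U = np.linalg.eigh(A + eps * E)
    u = U[:, -1]
    if u @ v < 0:  # eigenvectors are defined up to sign; align with v
        u = -u
    max_vec_err = max(max_vec_err, np.linalg.norm(u - (v + eps * dv[:, k])))
    max_val_err = max(max_val_err, abs(w[-1] - (lam + eps * dlam[k])))

print("largest eigenvector discrepancy:", max_vec_err)
print("largest eigenvalue discrepancy: ", max_val_err)
```

Both discrepancies should be of order ε², since the linear solve captures only the first-order term. For large matrices the fixed bordered matrix could be factorised once and reused, which is where a saving over repeated eigendecompositions would be expected.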