
Image segmentation using Eigenvectors

Speaker : Sameer Agarwal

Course : Learning and Vision Seminar

Date : 09/10/2001

“Theoretically I might say there are 327 brightnesses and nuances of color. Do I have “327”? No. I have sky, house, and trees. It is impossible to achieve “327” as such. And yet even though such droll calculations are possible, and implied, say, for the house 120, the trees 90 and the sky 117, I should at least have this arrangement and division of the total, and not, say, 127 and 100 and 100; or 150 and 177.”

Laws of Organization in Perceptual Forms

Max Wertheimer (1923)

What is Image Segmentation?

Partitioning of an image into related regions.

Why do Image Segmentation?

Image Compression - Identify distinct components within an image and use the most suitable compression algorithm for each component to get a higher compression ratio.

Medical Diagnosis - Automatic segmentation of MRI images for identification of cancerous regions.

Mapping and Measurement - Automatic analysis of remote sensing data from satellites to identify and measure regions of interest. e.g. Petroleum reserves.

How many groups?

Out of the various possible partitions, which is the correct one?

The Bayesian view

Given prior knowledge about the structure of the data, choose the partition which is most probable.

Problem:

How do you specify a prior for knowledge that is composed of knowledge on multiple scales? For example:

- Coherence
- Symmetry

A simple implementation

Assume that the image was generated by a mixture of multiple models.

Segmentation is done in two steps:

1. Estimate the parameters of the mixture model.

2. For each point, calculate the posterior probability of it belonging to each cluster, and assign the point to the cluster with the maximum posterior. (A minimal sketch of these two steps follows.)
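Here is a minimal sketch of this two-step procedure, assuming a Gaussian mixture fitted with EM via scikit-learn; the choice of raw per-pixel color as the feature is an illustrative assumption, not something prescribed by the talk.

```python
# Sketch: mixture-model segmentation via EM (scikit-learn assumed).
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_segment(image, n_components=3):
    """Segment an (H, W, C) image by fitting a Gaussian mixture to its pixels."""
    h, w, c = image.shape
    features = image.reshape(-1, c).astype(float)

    # Step 1: estimate the mixture parameters with EM.
    gmm = GaussianMixture(n_components=n_components).fit(features)

    # Step 2: assign each pixel to the cluster with the maximum posterior.
    labels = gmm.predict(features)  # predict() takes the argmax posterior
    return labels.reshape(h, w)
```

Note that n_components must be fixed in advance, which is exactly the model selection problem raised next.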

Why doesn’t it work?

The model selection problem:
- Number of components?
- The structure of the components?

The estimation problem transforms into a hard optimization problem, with no guarantee of convergence to the global optimum.

Prior Work

1. k-means

2. Mixture Models (Expectation Maximization)

3. k-Medoid

4. k-Harmonic

5. Self Organizing Maps

6. Neural Gas

7. Linkage-based graph methods.

Outline of the talk

1. The Gestalt approach to perceptual grouping

2. Graph theoretic formulation of the segmentation problem

3. The normalized cut

4. Experimental results

5. Relation to other methods

6. Conclusions

The Gestalt approach

Gestalt : a structure, configuration, or pattern of physical, biological, or psychological phenomena so integrated as to constitute a functional unit with properties not derivable by summation of its parts

“The whole is different from the sum of the parts”

The Gestalt Movement

1. Formed by Max Wertheimer, Wolfgang Köhler and Kurt Koffka.

2. Rejected structuralism and its assumptions of atomicity and empiricism.

3. Adopted a “Holistic” approach to perception.

An Example

Emergent properties of a configuration. The arrangement of several dots in a line gives rise to emergent properties, such as length, orientation and curvature, that are different from the properties of the dots that compose it.

Gestalt Cues

And the moral of the story is ..

Image segmentation based on low level cues cannot and should not aim to produce a complete final “correct” segmentation.

Instead, use low-level attributes like color and brightness to sequentially come up with hierarchical partitions.

Mid and high-level knowledge can be used to either confirm or select some partition for further attention.

A graph theoretic approach

A weighted undirected graph G = (V, E):

- Nodes are points in the feature space.
- The graph is fully connected.
- Edge weight w(i,j) is a function of the similarity between nodes i and j.

Task: Partition the set V into disjoint sets V1,...,Vn, such that the similarity among nodes in Vi is high and the similarity across Vi and Vj is low.
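A minimal sketch of this graph setup: the Gaussian similarity below is one common choice of w(i,j), and the sigma parameter is an illustrative assumption.

```python
# Sketch: build a fully connected affinity matrix W from 1-D node features
# (e.g. pixel brightness), using a Gaussian similarity.
import numpy as np

def affinity_matrix(values, sigma=0.1):
    """W[i, j] = exp(-(values[i] - values[j])**2 / sigma**2)."""
    diff = values[:, None] - values[None, :]
    return np.exp(-(diff ** 2) / sigma ** 2)
```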

Issues

What is a good partition?

How can you compute such a partition efficiently?

Graph Cut

G = (V, E). Sets A and B are a disjoint partition of V.

Cut(A,B) is a measure of the similarity between the two groups:

$$\mathrm{Cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$$

The temptation

Cut is a measure of association

Minimizing it will give a partition with the maximum disassociation.

Efficient poly-time algorithms exist to solve the MinCut problem.

So why not use it?

The problem with MinCut

The Normalized Cut

Given a partition (A,B) of the vertex set V, Ncut(A,B) measures the similarity between the two groups, normalized by the “volume” they occupy in the whole graph:

$$\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$$

$$\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$$
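These quantities can be computed directly from the affinity matrix. A small sketch; representing A as a boolean membership vector in_a is an assumed convention:

```python
import numpy as np

def ncut_value(W, in_a):
    """Ncut(A, B) for a symmetric affinity matrix W and a boolean mask in_a."""
    in_b = ~in_a
    cut = W[np.ix_(in_a, in_b)].sum()   # cut(A, B)
    assoc_a = W[in_a, :].sum()          # assoc(A, V)
    assoc_b = W[in_b, :].sum()          # assoc(B, V)
    return cut / assoc_a + cut / assoc_b
```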

Matrix formulation

Definitions:

D is an n x n diagonal matrix with entries $D(i, i) = \sum_j w(i, j)$.

W is an n x n symmetric matrix with $W(i, j) = w(i, j)$.

After some linear algebra we get:

$$\mathrm{MinNcut}(G) = \min_y \frac{y^T (D - W)\, y}{y^T D\, y}$$

Subject to the constraints:

1. $y(i) \in \{1, -b\}$
2. $y^T D \mathbf{1} = 0$

Under these constraints the problem is NP-Complete.

Real numbers to the rescue

Relax the constraints on y, and allow it to take real values.

Claim: The real-valued MinNcut(G) can then be solved by solving the generalized eigenvalue problem

$$(D - W)\, y = \lambda D\, y$$

for the second smallest generalized eigenvector.

Proof

Rewrite the equation as

$$D^{-1/2} (D - W)\, D^{-1/2} z = \lambda z$$

Here $z = D^{1/2} y$.

Lemma 1: $z_0 = D^{1/2} \mathbf{1}$ is an eigenvector of the above eigensystem with eigenvalue 0.

Proof(contd.)

Lemma 2: $D^{-1/2} (D - W)\, D^{-1/2}$ is a positive semi-definite matrix, since (D - W) is known to be positive semi-definite.

Lemma 3: z0 is the smallest eigenvector of the eigensystem.

Lemma 4: z1 is perpendicular to z0.

Proof (contd.)

Lemma 5: Let A be a real symmetric matrix. Under the constraint that x is orthogonal to the j-1 smallest eigenvectors x1,...,x(j-1), the quotient

$$\frac{x^T A x}{x^T x}$$

is minimized by the next smallest eigenvector.

Finally..

1. By Lemma 1, y0 = 1 is an eigenvector of the eigensystem with eigenvalue 0.

2. It is the “smallest” eigenvector.

3. Hence, by Lemma 5, the second smallest eigenvector (y1) will minimize the Ncut expression.

4. By Lemmas 3 and 4, $z_1^T z_0 = y_1^T D \mathbf{1} = 0$, so the second constraint is satisfied.

What about the first constraint?

The second smallest eigenvector is only an approximation to the optimal normalized cut.

y1 minimizes

$$\inf_{y^T D \mathbf{1} = 0} \frac{\sum_{i,j} \big(y(i) - y(j)\big)^2\, w_{ij}}{\sum_i y(i)^2\, D(i, i)}$$

y will take similar values for nodes with a high similarity value.

The grouping algorithm

1. Given an image, set up the weighted graph G=(V,E). Set the weight on the edges connecting two nodes as a measure of the similarity between the nodes.

2. Solve (D-W)x=λDx for eigenvectors with the smallest eigenvalues.

3. Use the second smallest eigenvector to bipartition the graph.
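Putting the three steps together, a minimal dense-matrix sketch; SciPy's generalized symmetric eigensolver is assumed, and thresholding the eigenvector at zero is the simplest of the splitting rules discussed next.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Bipartition a graph with affinity matrix W via the normalized cut.
    Returns a boolean membership vector for one side of the cut."""
    d = W.sum(axis=1)
    D = np.diag(d)
    # Solve (D - W) x = lambda D x; eigh returns eigenvalues in ascending order.
    vals, vecs = eigh(D - W, D)
    y1 = vecs[:, 1]        # second smallest generalized eigenvector
    return y1 > 0          # simplest splitting rule: threshold at zero
```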

Details..

The eigenvector takes continuous values; how do you use it to segment the image?

1. Choose 0 as the splitting point.
2. Find the median of the eigenvector and use that as the splitting point.
3. Search amongst l evenly spaced points for the one which gives the best exact Ncut value (see the sketch after this list).
4. Impose a stability criterion on the eigenvector.
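Option 3 can be sketched as a search over evenly spaced candidate thresholds, scoring each with the exact Ncut value. This reuses the ncut_value helper sketched earlier; l is the number of candidates:

```python
import numpy as np

def best_threshold(W, y, l=20):
    """Search l evenly spaced splitting points strictly inside the range of
    the eigenvector y for the one with the smallest exact Ncut value."""
    candidates = np.linspace(y.min(), y.max(), l + 2)[1:-1]  # interior points
    return min(candidates, key=lambda t: ncut_value(W, y > t))
```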

Stability?

Since we allow the eigenvectors to take real values, some eigenvectors might take a smooth, continuous form.

We want vectors that have sharp discontinuities, indicating separation between regions.

Measure the smoothness of the vector, and stop partitioning when the smoothness value falls below a threshold.

Details.. (contd.)

How do you partition images with multiple segments?

1. The higher-order eigenvectors contain information about sub-partitions. Keep splitting until Ncut exceeds some pre-specified value.

Problem: Numerical error.

2. Recursively run the algorithm on successive subgraphs.

Problem: Computationally expensive, and the stability criterion might prevent correct partitioning.

Simultaneous p-way cut

1. Use the first n eigenvectors as n-dimensional indicator vectors for each point. This is equivalent to embedding each point in an n-dimensional space.

2. Perform k-means clustering in this new space to create p’>p clusters.

3. Use the original 2-way Ncut or a greedy strategy to merge these p’ partitions into p partitions.
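A sketch of this strategy under simplifying assumptions: embed each node using the first n generalized eigenvectors and run k-means in that space (scikit-learn's KMeans assumed). The greedy merge from p' > p clusters down to p is omitted for brevity.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def pway_cut(W, p, n_vectors=None):
    """Cluster nodes of the graph with affinity matrix W into p groups via
    k-means on the first n generalized eigenvectors (one row per node)."""
    n_vectors = n_vectors or p
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)
    embedding = vecs[:, :n_vectors]   # n-dimensional indicator per node
    return KMeans(n_clusters=p).fit_predict(embedding)
```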

How good is the approximation?

The normalized Cheeger constant h is defined as:

$$h = \inf_A \frac{\mathrm{Cut}(A, B)}{\min\big(\mathrm{assoc}(A, V),\, \mathrm{assoc}(B, V)\big)}$$

We know that the second eigenvalue is bounded by:

$$\frac{h^2}{2} \le \lambda_2 \le 2h$$

This is only a qualitative indication of the quality of the approximation; it does not say anything about how close the eigenvector is to the optimal Ncut vector.

Example I

(Figures: the distance matrix; the second generalized eigenvector and the resulting first partition; the third generalized eigenvector and the second partition; the fourth generalized eigenvector and the third partition.)

Example II

(Figures: the structure of the affinity matrix; the generalized eigenvalues; the first through sixth partitions.)

Complexity Issues

Finding eigenvectors of an n x n matrix is an O(n³) operation.

This is extremely expensive. One solution is to make the affinity matrix sparse by only considering nearby points; efficient methods exist for finding eigenvectors of sparse matrices.

Even with the best methods, it is not possible to perform this task in real time.
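A sketch of the sparse route, assuming SciPy's sparse eigensolver; W_sparse is assumed to already contain only near-neighbor edges. Shift-invert near zero is the usual speedup, but the plain "smallest magnitude" mode shown here is the simplest correct call.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def sparse_ncut_vectors(W_sparse, k=2):
    """Smallest k generalized eigenpairs of (D - W) x = lambda D x for a
    sparse affinity matrix, without forming any dense n x n matrix."""
    d = np.asarray(W_sparse.sum(axis=1)).ravel()
    D = sp.diags(d)
    L = D - W_sparse
    return eigsh(L, k=k, M=D, which="SM")  # eigenvalues, eigenvectors
```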

The Nystrom method

Belongie et al. made the observation that the affinity matrix has very low rank, i.e. the matrix has very few unique rows.

Hence it is possible to approximate the eigenvectors of the whole affinity matrix by linearly interpolating the eigenvectors of a small, randomly sampled sub-matrix.

This method is fast enough to give real-time performance.

This is also referred to as the Nystrom method in operator theory.
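A rough sketch of the basic Nyström extension under simplifying assumptions: uniform random sampling, no orthogonalization step, and an assumed affinity_fn(X, Y) interface returning affinities between two sets of points.

```python
import numpy as np

def nystrom_eigenvectors(affinity_fn, points, m, k):
    """Approximate the top-k eigenvectors of the full affinity matrix from an
    m-point sample, interpolating the sample eigenvectors to all points.
    points: array of feature vectors, one row per point (assumed)."""
    n = len(points)
    idx = np.random.permutation(n)
    sample, rest = idx[:m], idx[m:]

    A = affinity_fn(points[sample], points[sample])  # m x m sampled block
    B = affinity_fn(points[rest], points[sample])    # (n - m) x m block

    vals, U = np.linalg.eigh(A)
    vals, U = vals[::-1][:k], U[:, ::-1][:, :k]      # top-k of the sample

    approx = np.empty((n, k))
    approx[sample] = U
    approx[rest] = B @ U / vals                      # Nystrom interpolation
    return vals, approx
```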

Cuts Galore

The standard Cheeger constant

$$\min_A \frac{\mathrm{cut}(A, V - A)}{\min(|A|, |V - A|)}$$

defines the ratio cut (Hagen & Kahng). The Fiedler value is the solution to the problem

$$\min_A \frac{\mathrm{Cut}(A, V - A)}{|A|} + \frac{\mathrm{Cut}(V - A, A)}{|V - A|}$$

which is known as the average cut.

Association or Disassociation ?

Normalized Cut can be formulated as a minimization of the association between clusters OR as a maximization of the association within clusters:

$$\frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)} = 2 - \left( \frac{\mathrm{assoc}(A, A)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{assoc}(B, B)}{\mathrm{assoc}(B, V)} \right)$$
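The identity holds because assoc(A,V) = assoc(A,A) + cut(A,B). A quick numerical check on a random symmetric affinity matrix (an illustrative sanity check, not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((10, 10)); W = (W + W.T) / 2   # random symmetric affinities
in_a = np.arange(10) < 4                      # an arbitrary partition (A, B)
in_b = ~in_a

cut = W[np.ix_(in_a, in_b)].sum()
assoc = lambda s: W[s, :].sum()               # assoc(S, V)
within = lambda s: W[np.ix_(s, s)].sum()      # assoc(S, S)

lhs = cut / assoc(in_a) + cut / assoc(in_b)
rhs = 2 - (within(in_a) / assoc(in_a) + within(in_b) / assoc(in_b))
assert np.isclose(lhs, rhs)
```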

Average Cut is NOT symmetric

The average cut does not share the same relationship with its corresponding notion of normalized association.

The RHS gives rise to another kind of cut, which we refer to as the average association:

$$\min \left( \frac{\mathrm{cut}(A, B)}{|A|} + \frac{\mathrm{cut}(A, B)}{|B|} \right) \quad \text{vs.} \quad \max \left( \frac{\mathrm{assoc}(A, A)}{|A|} + \frac{\mathrm{assoc}(B, B)}{|B|} \right)$$

Relationship between Average, Ratio, and Normalized Cuts

Average Association (finding clumps):
- Discrete formulation: assoc(A,A)/|A| + assoc(B,B)/|B|
- Continuous formulation: Wx = λx

Normalized Cut:
- Discrete formulation: cut(A,B)/assoc(A,V) + cut(A,B)/assoc(B,V) = 2 - (assoc(A,A)/assoc(A,V) + assoc(B,B)/assoc(B,V))
- Continuous formulation: (D-W)x = λDx

Average Cut (finding splits):
- Discrete formulation: cut(A,B)/|A| + cut(A,B)/|B|
- Continuous formulation: (D-W)x = λx

The three criteria span a spectrum from finding clumps (average association) to finding splits (average cut), with the normalized cut in between.

Perona and Freeman

1. Construct the affinity matrix W for the graph G = (V, E).

2. Find the eigenvector of W with the largest eigenvalue, i.e. solve $W y = \lambda y$.

3. Threshold it to get a partition of the nodes of G.
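A compact sketch of the Perona-Freeman procedure; thresholding at the median is an illustrative assumption, since the slides only say to threshold the leading eigenvector.

```python
import numpy as np

def perona_freeman(W):
    """Partition nodes using the largest eigenvector of the affinity matrix W."""
    vals, vecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    v1 = vecs[:, -1]                 # eigenvector of the largest eigenvalue
    return v1 > np.median(v1)        # illustrative threshold choice
```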

Shi & Malik

Construct the matrices W and D.

Find the second smallest generalized eigenvector of (D - W), i.e. solve

$$(D - W)\, y = \lambda D\, y$$

Threshold y1 to get a partitioning of the graph.

A closer look

Define a new matrix N as

$$N = D^{-1/2}\, W\, D^{-1/2}$$

Lemma: If v is an eigenvector of N with eigenvalue λ, then $D^{-1/2} v$ is a generalized eigenvector of (D - W) with eigenvalue 1 - λ. Also 0 < λ < 1.

Hence Perona and Freeman use the largest eigenvector of the un-normalized affinity matrix, while Shi & Malik use the ratio of the first two eigenvectors of the normalized affinity matrix.

Scott and Longuet-Higgins

Construct the matrix V whose columns are the k largest eigenvectors of W.

Normalize the rows of V.

Construct the matrix Q = V V^T.

Segment points using Q: if i and j belong to the same cluster, Q(i,j) = 1; if they belong to different groups, Q(i,j) = 0.
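A compact sketch of the Scott and Longuet-Higgins construction; it returns Q, and how one reads clusters off Q in the non-ideal case is left out.

```python
import numpy as np

def scott_longuet_higgins(W, k):
    """Q = V V^T, where V holds the k largest eigenvectors of W with its rows
    normalized to unit length. Ideally Q[i, j] is 1 for points in the same
    cluster and 0 otherwise."""
    vals, vecs = np.linalg.eigh(W)
    V = vecs[:, -k:]                                   # k largest eigenvectors
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # normalize the rows
    return V @ V.T
```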

In an ideal world..

A & B would be constant and C would be 0. Then W can be decomposed as

BC

CAW T

TOSOW

bc

caS

11...00

00...11TO

And that tells us..

If V is the n x 2 matrix whose columns are the first two eigenvectors of W, then V = ODR, where D is a 2x2 diagonal matrix and R is a 2x2 rotation matrix. Now if W(i,j) only depends on the memberships of i and j:

1. If v1 is the indicator vector (the first eigenvector of W) of the PF algorithm, and i and j belong to the same cluster, then v1(i) = v1(j).

2. If v1 is the indicator vector (the second generalized eigenvector) of the SM algorithm, and i and j belong to the same cluster, then v1(i) = v1(j).

3. If Q is the indicator matrix in the SLH method, then Q(i,j) = 1 if i and j belong to the same cluster, and 0 otherwise.

Non-constant Matrices

Let A,B be arbitrary positive matrices and C=0.

Let v be the PF indicator vector. If λ1(A) > λ1(B), then v(i) > 0 for all points belonging to the first cluster and v(j) = 0 for points belonging to the second cluster.

Let v be the SM indicator vector; then v(i) = v(j) if points i and j belong to the same cluster.

If λ1(B) > |λ2(A)| and λ1(A) > |λ2(B)|, then Q(i,j) = 1 if i, j belong to the same cluster, and 0 otherwise.

Conclusions

Normalized cut presents a new optimality criterion for partitioning a graph into clusters.

Ncut is a normalized measure of disassociation, and minimizing it is equivalent to maximizing the association.

The discrete problem corresponding to Min Ncut is NP-Complete.

We solve an approximate version of the MinNcut problem by converting it into a generalized eigenvector problem.

Conclusions (contd.)

There are a number of approaches which use the eigenvectors of matrices related to the affinity matrix of a graph.

Three of these methods can be shown to be based on the top eigenvectors of the affinity matrix. They differ in two ways:

1. Which eigenvectors to look at.
2. Whether to normalize the matrix or not.

References

1. Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000.

2. Yair Weiss, "Segmentation Using Eigenvectors: A Unifying View," ICCV 1999.

Acknowledgements

Serge Belongie for sharing hours of excitement and details of Linear Algebra and associated wonders.

Ben Leong for sharing his figures.

And the music of Tool for keeping me company.