
Page 1: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Clustering

Introduction
Preprocessing: dimensional reduction with SVD
Clustering methods: K-means, FCM
Hierarchical methods
Model based methods (at the end)
Competitive NN (SOM) (not shown here)
SVC, QC
Applications
COMPACT

(an ill-defined problem)

Page 2: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

What Is Clustering?

Why? To help understand the natural grouping or structure in a data set

When? Used either as a stand-alone tool to get insight into the data distribution or as a preprocessing step for other algorithms, e.g., to discover classes

Not Classification!

Clustering is the partitioning of data into meaningful (?) groups called clusters. A cluster is a collection of objects that are "similar" to one another … what is similar? Unsupervised learning: no predefined classes.

Page 3: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Clustering Applications

Operations Research – Facility Location Problem: locate fire stations so as to minimize the maximum/average distance a fire truck must travel.

Signal Processing – Vector Quantization: transmit large files (e.g., video, speech) by computing quantizers.

Astronomy – SkyCat: clustered 2×10^9 sky objects into stars, galaxies, quasars, etc., based on radiation emitted in different spectral bands.

Page 4: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Clustering Applications

Marketing: segmentation of customers for target marketing; segmentation of customers based on online clickstream data.

Web: to discover categories of content; search results.

Bioinformatics: gene expression – finding groups of individuals (sick vs. healthy), finding groups of genes; motif search; …

In practice, clustering is one of the most widely used data mining techniques: association rule algorithms produce too many rules, and other machine learning algorithms require labeled data.

Page 5: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Points/Metric Space

Points could be in R^d, {0,1}^d, …

Metric space: dist(x,y) is a distance metric if
- Reflexive: dist(x,y) = 0 iff x = y
- Symmetric: dist(x,y) = dist(y,x)
- Triangle inequality: dist(x,y) ≤ dist(x,z) + dist(z,y)

Page 6: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example of Distance Metrics

The distance between x = <x1,…,xn> and y = <y1,…,yn> is:

L2 norm (Euclidean): dist(x,y) = sqrt( (x1−y1)^2 + … + (xn−yn)^2 )

Manhattan distance (L1 norm): dist(x,y) = |x1−y1| + … + |xn−yn|

Documents: cosine measure (a similarity), i.e., more similar → close to 1, less similar → close to 0. Not a metric space, but 1−cos is.
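For concreteness, a minimal Python/NumPy sketch of the three measures above; the vectors x and y are made-up examples:

    import numpy as np

    def l2(x, y):
        # Euclidean (L2) distance
        return np.sqrt(np.sum((x - y) ** 2))

    def l1(x, y):
        # Manhattan (L1) distance
        return np.sum(np.abs(x - y))

    def cosine_dissimilarity(x, y):
        # 1 - cosine similarity; close to 0 for vectors pointing in similar directions
        return 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 0.0, 3.0])
    print(l2(x, y), l1(x, y), cosine_dissimilarity(x, y))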

Page 7: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Correlation

We might care more about the overall shape of expression profiles rather than the actual magnitudes

That is, we might want to consider genes similar when they are “up” and “down” together

When might we want this kind of measure? What experimental issues might make this appropriate?

Page 8: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Pearson Linear Correlation

We’re shifting the expression profiles down (subtracting the means) and scaling by the standard deviations (i.e., making the data have mean = 0 and std = 1)

ρ(x, y) = [ Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) ] / sqrt( Σ_{i=1}^{n} (x_i − x̄)^2 · Σ_{i=1}^{n} (y_i − ȳ)^2 ),

where x̄ = (1/n) Σ_{i=1}^{n} x_i and ȳ = (1/n) Σ_{i=1}^{n} y_i.

Page 9: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Pearson Linear Correlation

Pearson linear correlation (PLC) is a measure that is invariant to scaling and shifting (vertically) of the expression values.

Always between −1 and +1 (perfectly anti-correlated and perfectly correlated).

This is a similarity measure, but we can easily make it into a dissimilarity measure:

d_p(x, y) = (1 − ρ(x, y)) / 2
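A minimal NumPy sketch of ρ and the derived dissimilarity d_p; the two profiles are illustrative only:

    import numpy as np

    def pearson(x, y):
        # Pearson linear correlation between two expression profiles
        xc, yc = x - x.mean(), y - y.mean()
        return np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2))

    def plc_dissimilarity(x, y):
        # Maps a correlation in [-1, +1] to a dissimilarity in [0, 1]
        return (1.0 - pearson(x, y)) / 2.0

    x = np.array([0.1, 0.5, 0.9, 0.4])
    y = np.array([0.2, 0.7, 1.1, 0.5])
    print(pearson(x, y), plc_dissimilarity(x, y))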

Page 10: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

PLC (cont.)

PLC only measures the degree of a linear relationship between two expression profiles!

If you want to measure other relationships, there are many other possible measures (see Jagota book and project #3 for more examples)

ρ = 0.0249, so d_p = 0.4876

The green curve is the square of the blue curve – this relationship is not captured with PLC

Page 11: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

More correlation examples

What do you think the correlation is here? Is this what we want?

How about here? Is this what we want?

Page 12: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Missing Values

A common problem with microarray data. One approach with Euclidean distance or PLC is just to ignore missing values (i.e., pretend the data has fewer dimensions).

There are more sophisticated approaches that use information such as continuity of a time series or related genes to estimate missing values – better to use these if possible
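As an illustration of the simple ignore-missing-values approach for Euclidean distance, a sketch that marks missing entries with NaN (the profiles are hypothetical):

    import numpy as np

    def masked_euclidean(x, y):
        # Use only the dimensions where both profiles are observed,
        # i.e., pretend the data has fewer dimensions.
        mask = ~np.isnan(x) & ~np.isnan(y)
        return np.sqrt(np.sum((x[mask] - y[mask]) ** 2))

    x = np.array([1.0, np.nan, 3.0, 2.0])
    y = np.array([0.5, 1.0, np.nan, 2.5])
    print(masked_euclidean(x, y))  # uses dimensions 0 and 3 only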

Page 13: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Preprocessing

For methods that are not applicable in very high dimensions you may want to apply:

- Dimensional reduction, e.g. consider the first few SVD components (truncate S at r dimensions) and use the remaining values of the U or V matrices (see the sketch after this list).

- Dimensional reduction + normalization: after applying dimensional reduction, normalize all resulting vectors to unit length (i.e. consider angles as proximity measures).

- Feature selection, e.g. consider only features that have large variance. More on feature selection in the future.
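A minimal NumPy sketch of the first two options; the data matrix X and the truncation rank r are placeholders:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.random((100, 50))            # placeholder gene/sample matrix
    r = 4                                # number of retained SVD components

    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_reduced = U[:, :r] * s[:r]         # rows of X in the truncated r-dimensional space

    # Optional normalization: project points onto the unit sphere so that
    # angles act as the proximity measure.
    norms = np.linalg.norm(X_reduced, axis=1, keepdims=True)
    X_normalized = X_reduced / norms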

Page 14: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Clustering Types

Exclusive vs. Overlapping Clustering Hierarchical vs. Global Clustering Formal vs. Heuristic Clustering

First two examples:

K-Means: exclusive, global, heuristic

FCM (fuzzy c-means): overlapping, global, heuristic

Page 15: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Two classes of data described by (o) and (*). The objective is to reproduce the two classes by K=2 clustering.

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 16: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

1. Place two cluster centres (x) at random. 2. Assign each data point (* and o) to the nearest cluster centre (x).

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 17: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

1. Compute the new centre of each class. 2. Move the crosses (x).

Page 18: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 2

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 19: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 3

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 20: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 4 (then stop, because there is no visible change). Each data point belongs to the cluster defined by the nearest centre.

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 21: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The membership matrix M: (1) the last five data points (rows) belong to the first cluster (column); (2) the first five data points (rows) belong to the second cluster (column).

M =

0.0000 1.0000

0.0000 1.0000

0.0000 1.0000

0.0000 1.0000

0.0000 1.0000

1.0000 0.0000

1.0000 0.0000

1.0000 0.0000

1.0000 0.0000

1.0000 0.0000

Page 22: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Membership matrix M

m_ik = 1  if  ||u_k − c_i||^2 ≤ ||u_k − c_j||^2  for every other cluster centre c_j,
m_ik = 0  otherwise,

where u_k is data point k, c_i is cluster centre i, and ||u_k − c_i||^2 is the squared distance between them.

Results of K-means depend on the starting point of the algorithm. Repeat it several times to get a better feeling whether the results are meaningful.
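To make the iteration shown on the preceding slides concrete, a minimal K-means sketch in Python/NumPy; the tiles data is not reproduced in this transcript, so two synthetic blobs stand in for it:

    import numpy as np

    def kmeans(X, k, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(len(X), size=k, replace=False)]   # 1. place k centres at random
        for _ in range(n_iter):
            # 2. assign each data point to the nearest centre
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # 3. move each centre to the mean of the points assigned to it
            centres = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else centres[i]
                                for i in range(k)])
        return labels, centres

    # Stand-in for the two-class tiles data: two well-separated 2-D blobs.
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    labels, centres = kmeans(X, k=2)

Because the result depends on the starting point, rerunning with different seeds (as the slide advises) is a cheap sanity check.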

Page 23: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

c-partition

∪_{i=1}^{c} C_i = U            (all clusters C_i together fill the whole universe U)

C_i ∩ C_j = Ø for all i ≠ j    (clusters do not overlap)

Ø ⊂ C_i ⊂ U for all i          (a cluster C_i is never empty and is smaller than the whole universe U)

2 ≤ c ≤ K                      (there must be at least 2 clusters in a c-partition and at most as many as the number of data points K)

Page 24: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Objective function

J = Σ_{i=1}^{c} J_i = Σ_{i=1}^{c} Σ_{k: u_k ∈ C_i} ||u_k − c_i||^2

Minimise the total sum of all distances

Page 25: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Algorithm: fuzzy c-means (FCM)
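The algorithm itself appeared as a figure on the original slide; the following is a minimal sketch of the standard FCM formulation (fuzzifier m = 2 assumed), not necessarily the exact variant shown:

    import numpy as np

    def fcm(X, c, m=2.0, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)               # each point's memberships sum to 1
        for _ in range(n_iter):
            Um = U ** m
            centres = (Um.T @ X) / Um.sum(axis=0)[:, None]          # fuzzy-weighted centres
            dists = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-12
            # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
            U = 1.0 / ((dists[:, :, None] / dists[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        return U, centres

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
    U, centres = fcm(X, c=2)     # U plays the role of the fuzzy membership matrix M below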

Page 26: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Each data point belongs to two clusters to different degrees

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 27: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

1. Place two cluster centres

2. Assign a fuzzy membership to each data point depending on distance

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 28: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

1. Compute the new centre of each class. 2. Move the crosses (x).

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 29: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 2

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 30: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 5

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 31: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 10

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 32: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Iteration 13 (then stop, because there is no visible change). Each data point belongs to the two clusters to a degree.

[Figure: tiles data, log(intensity) 557 Hz vs. log(intensity) 475 Hz; o = whole tiles, * = cracked tiles, x = centres]

Page 33: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The membership matrix M: (1) the last five data points (rows) belong mostly to the first cluster (column); (2) the first five data points (rows) belong mostly to the second cluster (column).

M =

0.0025 0.9975

0.0091 0.9909

0.0129 0.9871

0.0001 0.9999

0.0107 0.9893

0.9393 0.0607

0.9638 0.0362

0.9574 0.0426

0.9906 0.0094

0.9807 0.0193

Page 34: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hard Classifier (HCM)

[Figure: each cell is coloured by exactly one class: Ok / light / moderate / severe / Ok]

A cell is either one or the other class defined by a colour.

Page 35: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Fuzzy Classifier (FCM)

[Figure: cells coloured by class membership: Ok / light / moderate / severe / Ok]

A cell can belong to several classes to a degree, i.e., one column may have several colours.

Page 36: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Clustering
• Greedy
• Agglomerative vs. Divisive

Dendrograms allow us to visualize the result – but the visualization is not unique! Tends to be sensitive to small changes in the data. Provides clusters of every size: where to "cut" is user-determined. Large storage demand. Running time: O(n^2 * |levels|) = O(n^3). Depends on: distance measure, linkage method.

Page 37: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Agglomerative Clustering

We start with every data point in a separate cluster

We keep merging the most similar pairs of data points/clusters until we have one big cluster left

This is called a bottom-up or agglomerative method

Page 38: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Clustering (cont.)

This produces a binary tree or dendrogram. The final cluster is the root and each data item is a leaf. The height of the bars indicates how close the items are.
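A minimal sketch of agglomerative clustering and its dendrogram with SciPy; the data, the linkage method ('average'), and the cut at 3 clusters are illustrative choices, not prescribed by the slides:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    rng = np.random.default_rng(0)
    X = rng.random((20, 5))                                   # stand-in data matrix

    Z = linkage(X, method='average', metric='euclidean')      # bottom-up merging of closest pairs
    dendrogram(Z)                                             # root = final cluster, leaves = data items
    plt.show()

    labels = fcluster(Z, t=3, criterion='maxclust')           # "cut" the tree into 3 clusters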

Page 39: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Clustering Demo

Page 40: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Clustering Issues Distinct clusters are not produced –

sometimes this can be good, if the data has a hierarchical structure w/o clear boundaries

There are methods for producing distinct clusters, but these usually involve specifying somewhat arbitrary cutoff values

What if data doesn’t have a hierarchical structure? Is HC appropriate?

Page 41: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Support Vector Clustering

Given points x in data space, define images in Hilbert space.

Require all images to be enclosed by a minimal sphere in Hilbert space.

Reflection of this sphere in data space defines cluster boundaries.

Two parameters: width of Gaussian kernel and fraction of outliers

Ben-Hur, Horn, Siegelmann & Vapnik, "Support Vector Clustering", JMLR 2 (2001) 125-137

Page 42: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.
Page 43: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.
Page 44: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.
Page 45: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Variation of q allows for clustering solutions on various scales: q = 1, 20, 24, 48.

Page 46: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.
Page 47: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example that allows for SV clustering only in the presence of outliers. Procedure: limit β < C = 1/(pN), where p = fraction of assumed outliers in the data.

q = 3.5, p = 0;  q = 1, p = 0.3

Page 48: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Similarity to the scale-space approach for high values of q and p. Probability distribution obtained from R(x).

q = 4.8, p = 0.7

Page 49: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

From Scale-space to Quantum Clustering

Parzen window approach: estimate the probability density by kernel functions (Gaussians) located at data points.

P(x) = c · f(x) = c Σ_{i=1}^{N} e^{−(x − x_i)^2 / (2σ^2)},   with σ = 1/√(2q)
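A minimal sketch of this Parzen estimate for one-dimensional data (sample and σ are placeholders):

    import numpy as np

    def parzen_density(x_grid, data, sigma):
        # Sum of Gaussian kernels located at the data points (up to the normalization constant c)
        diffs = x_grid[:, None] - data[None, :]
        return np.exp(-diffs ** 2 / (2 * sigma ** 2)).sum(axis=1)

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2, 1, 30), rng.normal(2, 1, 30)])
    x_grid = np.linspace(-6, 6, 200)
    P = parzen_density(x_grid, data, sigma=0.5)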

Page 50: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Quantum Clustering

View P (i.e., ψ) as the solution of the Schrödinger equation

H ψ ≡ ( −(σ^2/2) ∇^2 + V(x) ) ψ = E ψ

with the potential V(x) responsible for attraction to cluster centers and the Laplacian (kinetic) term causing the spread.

Find V(x):

V(x) = E + (σ^2/2) ∇^2ψ / ψ = E − d/2 + (1 / (2σ^2 ψ)) Σ_i (x − x_i)^2 e^{−(x − x_i)^2 / (2σ^2)}

Horn and Gottlieb, Phys. Rev. Lett. 88 (2002) 018702
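A minimal sketch that evaluates the QC potential above on a grid for one-dimensional data; shifting V so that its minimum is zero (absorbing the constant E) is an assumed convention, not stated on the slide:

    import numpy as np

    def qc_potential(x_grid, data, sigma):
        # psi: Parzen estimator; V follows the reconstructed formula
        # V(x) = E - d/2 + (1 / (2 sigma^2 psi)) * sum_i (x - x_i)^2 exp(-(x - x_i)^2 / (2 sigma^2))
        d = 1                                          # data dimension (1-D here)
        diffs = x_grid[:, None] - data[None, :]
        gauss = np.exp(-diffs ** 2 / (2 * sigma ** 2))
        psi = gauss.sum(axis=1)
        V = -d / 2 + (diffs ** 2 * gauss).sum(axis=1) / (2 * sigma ** 2 * psi)
        return V - V.min()                             # assumed convention: shift so min V = 0

    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-2, 1, 30), rng.normal(2, 1, 30)])
    x_grid = np.linspace(-6, 6, 400)
    V = qc_potential(x_grid, data, sigma=0.5)
    # Local minima of V are identified with cluster centers.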

Page 51: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Crabs Example (from Ripley's textbook): 4 classes, 50 samples each, d=5

A topographic map of the probability distribution for the crab data set with σ=1/2 using principal components 2 and 3. There exists only one maximum.

Page 52: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Crabs Example: the QC potential exhibits four minima identified with cluster centers.

A topographic map of the potential for the crab data set with σ=1/2 using principal components 2 and 3. The four minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.

Page 53: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Crabs Example - Contd.

A three dimensional plot of the potential for the crab data set with σ=1/3 using principal components 2 and 3.

Page 54: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Crabs Example - Contd.

A three dimensional plot of the potential for the crab data set with σ=1/2 using principal components 2 and 3.

Page 55: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Identifying Clusters

Local minima of the potential are identified with cluster centers.

Data points are assigned to clusters according to:
- minimal distance from centers, or
- sliding points down the slopes of the potential with gradient descent until they reach the centers.

Page 56: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Iris Example: 3 classes, each containing 50 samples, d=4

A topographic map of the potential for the iris data set with σ=0.25 using principal components 1 and 2. The three minima are denoted by crossed circles. The contours are set at values V=cE for c=0.2,…,1.

Page 57: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Iris Example - Gradient Descent Dynamics

Page 58: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

The Iris Example - Using Raw Data in 4D

There are only 5 misclassifications. σ=0.21.

Page 59: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example – Yeast cell cycle

Yeast cell cycle data were studied by several groups who have applied SVD (Spellman et al., Molecular Biology of the Cell, 9, Dec. 2000). We use it to test clustering of genes, whose classification into groups was investigated by Spellman et al.

The gene/sample matrix that we start from has dimensions of 798x72, using the same selection as made by Shamir, R. and Sharan, R. (2002).

We truncate it to r=4 and obtain, once again, our best results for σ=0.5, where four clusters follow from the QC algorithm.

Page 60: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example – Yeast cell cycle

The five gene families as represented in two coordinates of our r=4 dimensional space.

Page 61: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example – Yeast cell cycle

Cluster assignments of genes for QC with σ=0.46, as compared to the classification by Spellman into five classes, shown as alternating gray and white areas.

Page 62: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Yeast cell cycle in normalized 2 dimensions

Page 63: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Hierarchical Quantum Clustering (HQC)

Start with raw data matrix containing gene expression profiles of the samples.

Apply SVD and truncate to r-space by selecting the first r significant eigenvectors

Apply QC in r dimensions starting at a small scale σ, obtaining many clusters. Move data points to cluster centers and reiterate the process at higher σ. This produces a hierarchical clustering that can be represented by a dendrogram.

Page 64: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example – Clustering of human cancer cells

The NCI60 set is a gene expression profile of ~8000 genes in 60 human cancer cells.

NCI60 includes cell lines derived from cancers of colorectal, renal, ovarian, breast, prostate, lung and central nervous system origin, as well as leukemias and melanomas.

After application of selective filters, the number of gene spots is reduced to a 1,376 gene subset (Scherf et al., Nature Genetics 24, 2000).

We applied HQC with r=5 dimensions.

Page 65: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example – Clustering of human cancer cells

Dendrogram of 60 cancer cell samples. The clustering was done in 5 truncated dimensions. The first 2 letters in each sample represent the tissue/cancer type.

Page 66: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Example - Projection onto the unit sphere

Representation of data of four classes of cancer cells on two dimensions of the truncated space. The circles denote the locations of the data points before this normalization was applied

Page 67: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

COMPACT – a comparative package for clustering assessment

Compact is a GUI Matlab tool that enables an easy and intuitive way to compare some clustering methods.

Compact is a five-step wizard that contains basic Matlab clustering methods as well as the quantum clustering algorithm. Compact provides a flexible and customizable interface for clustering data with high dimensionality.

Compact allows both textual and graphical display of the clustering results

Page 68: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

How to Install?

COMPACT is a self-extracting package. In order to install and run the GUI tool, follow these three easy steps:

- Download the COMPACT.zip package to your local drive.
- Add the COMPACT destination directory to your Matlab path.
- Within Matlab, type ‘compact’ at the command prompt.

Page 69: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 1

Input parameters

Page 70: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 1

Selecting variables

Page 71: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 2

Determining the matrix shape and vectors to cluster

Page 72: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 3

Preprocessing procedures; components' variance graphs; preprocessing parameters

Page 73: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 4

Points distribution preview and clustering method selection

Page 74: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 5

Parameters for clustering algorithms K-means

Page 75: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 5

Parameters for clustering algorithms FCM

Page 76: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 5

Parameters for clustering algorithms NN

Page 77: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 5

Parameters for clustering algorithms QC

Page 78: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 6

COMPACT results

Page 79: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Steps – 6

Results

Page 80: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Clustering Methods: Model-Based

Data are generated from a mixture of underlying probability distributions.

Page 81: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Some Examples

Two univariate normal components, equal proportions, common variance σ^2=1, component means μ = 1, 2, 3, 4.

Page 82: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Two univariate normal components, proportions 0.75 and 0.25, common variance σ^2=1, component means μ = 1, 2, 3, 4.

Page 83: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

and some more

Page 84: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Probability Models

Classification Likelihood:

L_C(θ_1,…,θ_G; γ_1,…,γ_n | x) = ∏_{i=1}^{n} f_{γ_i}(x_i | θ_{γ_i})

θ_k = set of parameters of cluster k; γ_i = k if x_i belongs to cluster k.

Mixture Likelihood:

L_M(θ_1,…,θ_G; τ_1,…,τ_G | x) = ∏_{i=1}^{n} Σ_{k=1}^{G} τ_k f_k(x_i | θ_k)

τ_k is the probability that an observation belongs to cluster k (τ_k ≥ 0, Σ_{k=1}^{G} τ_k = 1).

Page 85: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Probability Models (Cont.)

Most used: the multivariate normal distribution. θ_k consists of a mean vector μ_k and a covariance matrix Σ_k:

f_k(x_i | μ_k, Σ_k) = exp( −(1/2)(x_i − μ_k)^T Σ_k^{-1} (x_i − μ_k) ) / ( (2π)^{p/2} |Σ_k|^{1/2} )

How is the covariance matrix Σ_k calculated?

Page 86: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Calculating the covariance matrix Σ_k

The idea: parameterize the covariance matrix as Σ_k = λ_k D_k A_k D_k^T

- D_k – orthogonal matrix of eigenvectors; determines the orientation of the principal components of Σ_k.
- A_k – diagonal matrix whose elements are proportional to the eigenvalues of Σ_k; determines the shape of the density contours.
- λ_k – scalar; determines the volume of the corresponding ellipsoid.

Page 87: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Σ_k Definition Determines the Model

Σ_k = λI: spherical, equal (SOS criterion). Σ_k = λ D A D^T: all ellipsoids are equal.

Page 88: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

How is θ_k computed? The EM algorithm

The complete-data log-likelihood (*):

l(θ_k, τ_k, z_ik | x) = Σ_{i=1}^{n} Σ_{k=1}^{G} z_ik log[ τ_k f_k(x_i | θ_k) ]

z_ik = 1 if x_i belongs to group k, 0 otherwise.

The density of an observation x_i given z_i is ∏_{k=1}^{G} f_k(x_i | θ_k)^{z_ik}.

ẑ_ik = E[z_ik | x_i, θ_1,…,θ_G] is the conditional expectation of z_ik given x_i and θ_1,…,θ_G.

Page 89: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

• E step: calculate ẑ_ik:

ẑ_ik = τ̂_k f_k(x_i | θ̂_k) / Σ_{j=1}^{G} τ̂_j f_j(x_i | θ̂_j)

• M step: given ẑ_ik, maximize (*):

μ̂_k = Σ_{i=1}^{n} ẑ_ik x_i / n̂_k,   τ̂_k = n̂_k / n,   n̂_k = Σ_{i=1}^{n} ẑ_ik

Σ̂_k depends on the model.
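A minimal sketch of these E and M steps for a mixture of G univariate Gaussians in NumPy; the initialization and the unconstrained variances are simplifying assumptions:

    import numpy as np

    def em_gmm(x, G, n_iter=100, seed=0):
        rng = np.random.default_rng(seed)
        n = len(x)
        mu = rng.choice(x, G, replace=False)          # initial means drawn from the data
        var = np.full(G, x.var())                     # initial variances
        tau = np.full(G, 1.0 / G)                     # initial mixing proportions
        for _ in range(n_iter):
            # E step: z_hat[i, k] = tau_k f_k(x_i) / sum_j tau_j f_j(x_i)
            f = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
            z_hat = tau * f
            z_hat /= z_hat.sum(axis=1, keepdims=True)
            # M step: update n_k, tau_k, mu_k, var_k
            n_k = z_hat.sum(axis=0)
            tau = n_k / n
            mu = (z_hat * x[:, None]).sum(axis=0) / n_k
            var = (z_hat * (x[:, None] - mu) ** 2).sum(axis=0) / n_k
        return tau, mu, var

    rng = np.random.default_rng(1)
    x = np.concatenate([rng.normal(1.0, 1.0, 200), rng.normal(4.0, 1.0, 100)])
    tau, mu, var = em_gmm(x, G=2)
    # Singular variances and bad starting points are not handled here; see the limitations below.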

Page 90: Clustering Introduction Preprocessing: dimensional reduction with SVD Clustering methods: K-means, FCM Hierarchical methods Model based methods (at the.

Limitations of the EM Algorithm

- Low rate of convergence: you should start with good starting points and hope for separable clusters…
- Not practical for a large number of clusters (== probabilities).
- "Crashes" when the covariance matrix becomes singular.
- Problems when there are few observations in a cluster.
- EM must not get more clusters than exist in nature…