Clustering
Machine Learning Workshop
UC Berkeley
Junming Yin
August 24, 2007
Outline
• Introduction
• Unsupervised learning
• What is clustering? Applications
• Dissimilarity (similarity) of samples
• Clustering algorithms
• K-means
• Gaussian mixture model (GMM), EM
• Hierarchical clustering
• Spectral clustering
Unsupervised Learning
• Recall that in the setting of classification and regression, the training data are represented as $\{(x_i, y_i)\}_{i=1}^{n}$, and the goal is to predict $y$ given a new point $x$; these settings are called supervised learning
• In the unsupervised setting, we only have unlabelled data $\{x_i\}_{i=1}^{n}$
• Why bother with unsupervised learning? Can we learn anything of value from unlabelled samples?
Unsupervised Learning
• Most of the time, collecting raw data is cheap, but labeling it can be very costly
• The samples might lie in a high-dimensional space; we might find some low-dimensional features that are sufficient to describe the samples
• In the early stages of an investigation it may be valuable to perform exploratory data analysis and thus gain some insight into the nature or structure of the data
• Clustering is one such unsupervised learning method
What is Clustering?
• Roughly speaking, cluster analysis aims to discover distinct clusters or groups of samples such that samples within the same group are more similar to each other than they are to the members of other groups
• This requires:
• a dissimilarity (similarity) function between samples
• a criterion to evaluate a grouping of samples into clusters
• an algorithm that optimizes this criterion function
Applications of Clustering
• Image segmentation: decompose the image into regions with
coherent color and texture inside them
• Search result clustering: group the search result set and
provide a better user interface
• Computational biology: group homologous protein sequences
into families; gene expression data analysis
• Signal processing: compress the signal using a codebook derived from vector quantization (VQ)
Outline
• Introduction
• Unsupervised learning
• What is clustering? Applications
• Dissimilarity (similarity) of samples
• Clustering algorithms
• K-means
• Gaussian mixture model (GMM), EM
• Hierarchical clustering
• Spectral clustering
Dissimilarity of Samples
• The natural question now is: how should we measure the dissimilarity between samples?
• the results depend on the choice of dissimilarity
• usually based on subject-matter considerations
• not necessarily a metric (i.e., the triangle inequality need not hold)
• possible to learn the dissimilarity from data for a particular application (later)
Dissimilarity Based on Features
• Most of the time, data have measurements on features: $x_i \in \mathbb{R}^p$
• A common choice of dissimilarity function between samples is the Euclidean distance: $d(x_i, x_j) = \|x_i - x_j\| = \sqrt{\sum_{k=1}^{p} (x_{ik} - x_{jk})^2}$
• Clusters defined by the Euclidean distance are invariant to translations and rotations in feature space, but not invariant to scaling of the features
• You might consider standardizing the data: translate and scale the features so that all of the features have zero mean and unit variance (a small sketch follows)
• BE CAREFUL! This is not always desirable
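A minimal sketch of this effect, assuming NumPy is available; the toy data and names are illustrative, not from the slides:

```python
import numpy as np

# Toy data: the second feature has a much larger scale than the first.
X = np.array([[1.0, 200.0],
              [2.0, 180.0],
              [1.5, 900.0]])

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Raw distances are dominated by the large-scale feature.
print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))

# Standardize: translate and scale so each feature has zero mean and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(euclidean(Z[0], Z[1]), euclidean(Z[0], Z[2]))
```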
Case Studies
Simulated data, 2-means without standardization
Simulated data, 2-means with standardization
Learning Dissimilarity
• Suppose a user indicates that certain pairs of objects are considered similar: $S = \{(x_i, x_j) : x_i \text{ and } x_j \text{ are similar}\}$
• Consider learning a dissimilarity that respects this subjective information: $d_A(x, y) = \|x - y\|_A = \sqrt{(x - y)^{\top} A \,(x - y)}$
• If $A$ is the identity matrix, this reduces to the Euclidean distance
• In general, $A$ parameterizes a family of Mahalanobis distances
• Learning such a dissimilarity is equivalent to finding a rescaling of the data, replacing $x$ with $A^{1/2} x$, and then applying the standard Euclidean distance (a sketch follows)
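A hedged sketch of this dissimilarity and of the equivalent rescaling view, assuming NumPy; the matrix A below is hand-picked for illustration rather than learned:

```python
import numpy as np

def d_A(x, y, A):
    """Mahalanobis-style dissimilarity sqrt((x - y)^T A (x - y))."""
    diff = x - y
    return np.sqrt(diff @ A @ diff)

# A hand-picked positive semidefinite matrix standing in for a learned A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# Equivalent view: rescale the data by A^{1/2} and use plain Euclidean distance.
w, V = np.linalg.eigh(A)
A_sqrt = V @ np.diag(np.sqrt(w)) @ V.T
print(d_A(x, y, A), np.linalg.norm(A_sqrt @ x - A_sqrt @ y))  # the two values agree
```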
Learning Dissimilarity
• A simple way to define a criterion for the desired dissimilarity (with $S$ the set of similar pairs and $D$ a set of dissimilar pairs):
$$\min_{A} \; \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 \quad \text{s.t.} \quad \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \ge 1, \quad A \succeq 0$$
• A convex optimization problem, which can be solved by gradient descent and iterative projection
• For details, see [Xing, Ng, Jordan, Russell '03]
Learning Dissimilarity
Outline
• Introduction
• Unsupervised learning
• What is clustering? Applications
• Dissimilarity (similarity) of samples
• Clustering algorithms
• K-means
• Gaussian mixture model (GMM), EM
• Hierarchical clustering
• Spectral clustering
K-means: Idea
• Represent the data set in terms of K clusters, each of which is summarized by a prototype $\mu_k$
• Each data point is assigned to one of the K clusters
• The assignment is represented by responsibilities $r_{ik} \in \{0, 1\}$ such that $\sum_{k} r_{ik} = 1$ for all data indices $i$
• Example: 4 data points and 3 clusters (see the illustrative matrix below)
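One possible hard assignment, shown only for illustration (the original example matrix is not reproduced here): each row of $r$ corresponds to a data point, each column to a cluster, and each row contains exactly one 1.

$$r = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$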
K-means: Idea
• Criterion function: the sum-of-squared distances from each data point to its assigned prototype,
$$J = \sum_{i=1}^{n} \sum_{k=1}^{K} r_{ik} \, \|x_i - \mu_k\|^2$$
where the $\mu_k$ are the prototypes, the $r_{ik}$ are the responsibilities, and the $x_i$ are the data
Minimizing the Cost Function
• Chicken-and-egg problem; we have to resort to an iterative method
• E-step: fix the prototypes $\mu_k$ and minimize $J$ w.r.t. the responsibilities $r_{ik}$
• assigns each data point to its nearest prototype
• M-step: fix the responsibilities $r_{ik}$ and minimize $J$ w.r.t. the prototypes $\mu_k$; this gives
$$\mu_k = \frac{\sum_i r_{ik} \, x_i}{\sum_i r_{ik}}$$
• each prototype is set to the mean of the points in that cluster
• Convergence is guaranteed since there are only a finite number of possible settings for the responsibilities
• It can only find a local minimum, so we should start the algorithm with many different initial settings (a minimal sketch follows)
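A minimal K-means sketch that follows the two steps above, assuming NumPy; the function and variable names are illustrative:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the prototypes with K randomly chosen data points.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # E-step: assign each point to its nearest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)   # (n, K) squared distances
        assign = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points.
        new_mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    J = ((X - mu[assign]) ** 2).sum()   # sum-of-squared-distances criterion
    return mu, assign, J
```

In practice one would run this from several random initializations and keep the solution with the smallest J, as the slide suggests.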
How to Choose K?
• In some cases it is known a priori from the problem domain
• Generally, it has to be estimated from the data and is usually selected by some heuristic in practice
• The cost function J generally decreases with increasing K
• Idea: assume that K* is the right number
• We assume that for K < K* each estimated cluster contains a subset of the true underlying groups
• For K > K* some natural groups must be split
• Thus we expect that for K < K* the cost function falls substantially, and afterwards not much more (a sketch of this elbow heuristic follows)
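A small sketch of this elbow heuristic, assuming scikit-learn is available; the three-group synthetic data set is made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Three well-separated synthetic groups, so the "right" K is 3.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 0), (0, 3)]])

for K in range(1, 8):
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(X)
    print(K, round(km.inertia_, 2))   # the cost J falls sharply up to K = 3, then flattens
```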
Limitations of K-means
• Hard assignments of data points to clusters
• Small shift of a data point can flip it to a different cluster
• Solution: replace hard clustering of K-means with soft
probabilistic assignments (GMM)
• Hard to choose the value of K
• As K is increased, the cluster memberships can change in an arbitrary way; the resulting clusters are not necessarily nested
• Solution: hierarchical clustering
The Gaussian Distribution
• Multivariate Gaussian, with mean $\mu$ and covariance $\Sigma$:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu)^{\top} \Sigma^{-1} (x - \mu)\right)$$
• Maximum likelihood estimation:
$$\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \Sigma_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{\top}$$
Gaussian Mixture
• Linear combination of Gaussians:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \text{where} \quad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1$$
• Parameters to be estimated: $\pi_k$, $\mu_k$, $\Sigma_k$ for $k = 1, \dots, K$
[Figure (a): samples drawn from a Gaussian mixture in 2D]
Gaussian Mixture
[Figure (a): samples drawn from a Gaussian mixture in 2D]
• To generate a data point:
• first pick one of the components with probability $\pi_k$
• then draw a sample from that component distribution
• Each data point $x_n$ is generated by one of the K components; a latent variable $z_n$ is associated with each $x_n$ (a small sampling sketch follows)
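A small sketch of this generative process, assuming NumPy; the mixture parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
pi   = np.array([0.5, 0.3, 0.2])                        # mixing coefficients (sum to 1)
mus  = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])   # component means
covs = np.array([np.eye(2) * 0.2 for _ in range(3)])    # component covariances

def sample_gmm(n):
    # Latent variables: pick a component for each point with probability pi_k ...
    z = rng.choice(len(pi), size=n, p=pi)
    # ... then draw each point from its component's Gaussian.
    X = np.array([rng.multivariate_normal(mus[k], covs[k]) for k in z])
    return X, z

X, z = sample_gmm(500)
```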
Synthetic Data Set Without Colours
[Figure (b): the same synthetic data set, shown without the component colours]
Fitting the Gaussian Mixture
• Given the complete data set $\{(x_n, z_n)\}_{n=1}^{N}$, maximize the complete log likelihood
$$\log p(X, Z \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \sum_{k=1}^{K} z_{nk} \left[ \log \pi_k + \log \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right]$$
• trivial closed-form solution: fit each component to the corresponding set of data points
• Without knowing the values of the latent variables, we have to maximize the incomplete log likelihood:
$$\log p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$
• The sum over components appears inside the logarithm, so there is no closed-form solution
EM Algorithm
• E-step: for given parameter values we can compute the expected values of the latent variables (the responsibilities of the data points) via Bayes' rule:
$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$
• Note that $\gamma(z_{nk}) \in [0, 1]$ instead of $\{0, 1\}$, but we still have $\sum_{k} \gamma(z_{nk}) = 1$
EM Algorithm
• M-step: maximize the expected complete log likelihood w.r.t. the parameters
• update parameters (with $N_k = \sum_{n} \gamma(z_{nk})$):
$$\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^{\top}, \qquad \pi_k = \frac{N_k}{N}$$
EM Algorithm
• Iterate the E-step and M-step until the log likelihood of the data no longer increases (a condensed sketch follows)
• converges to a local optimum
• need to restart the algorithm with different initial guesses of the parameters (as in K-means)
• Relation to K-means
• Consider a GMM with common covariance $\Sigma_k = \sigma^2 I$
• As $\sigma^2 \to 0$, the two methods coincide
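A condensed EM sketch mirroring the E-step and M-step above, assuming NumPy; this is a teaching sketch (only a small covariance jitter, no log-likelihood monitoring), not a robust implementation:

```python
import numpy as np

def gaussian_pdf(X, mu, cov):
    d = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2 * np.pi) ** d) * np.linalg.det(cov))
    quad = np.einsum('nd,de,ne->n', diff, inv, diff)   # (x - mu)^T cov^{-1} (x - mu)
    return np.exp(-0.5 * quad) / norm

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)
    mu = X[rng.choice(n, size=K, replace=False)]
    cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities gamma_{nk} via Bayes' rule.
        dens = np.stack([pi[k] * gaussian_pdf(X, mu[k], cov[k]) for k in range(K)], axis=1)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate pi, mu, cov from the responsibility-weighted data.
        Nk = gamma.sum(axis=0)
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            cov[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, cov, gamma
```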
Hierarchical Clustering
• Organize the clusters in a hierarchical way
• Produces a rooted (binary) tree (dendrogram)
[Figure: dendrogram over points a, b, c, d, e — read left to right (steps 0–4) for agglomerative clustering, right to left for divisive]
Hierarchical Clustering
• Two kinds of strategy
• Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity (defined later on)
• Top-down (divisive): in each step, split the least coherent cluster (e.g., the one with the largest diameter); splitting a cluster is itself a clustering problem (usually done in a greedy way); less popular than the bottom-up approach
Hierarchical Clustering
• User can choose a cut through the hierarchy to
represent the most natural division into clusters
• e.g., choose the cut where the intergroup dissimilarity exceeds some threshold
[Figure: the same dendrogram; cutting it at different heights yields, e.g., 3 or 2 clusters]
Hierarchical Clustering
• We have to measure the dissimilarity $d(G, H)$ between two disjoint groups G and H; it is computed from the pairwise dissimilarities $d(i, j)$ with $i \in G$, $j \in H$
• Single linkage: $d_{SL}(G, H) = \min_{i \in G,\, j \in H} d(i, j)$; tends to yield extended clusters
• Complete linkage: $d_{CL}(G, H) = \max_{i \in G,\, j \in H} d(i, j)$; tends to yield compact, round clusters
• Group average: $d_{GA}(G, H) = \frac{1}{|G|\,|H|} \sum_{i \in G} \sum_{j \in H} d(i, j)$; a tradeoff between the two (a SciPy sketch follows)
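A short sketch of agglomerative clustering with these three linkages, assuming SciPy and NumPy are available; the two-blob data set is synthetic:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
               rng.normal(3.0, 0.3, size=(30, 2))])     # two synthetic groups

for method in ['single', 'complete', 'average']:        # single / complete / group-average linkage
    Z = linkage(X, method=method)                       # bottom-up merge tree (dendrogram)
    labels = fcluster(Z, t=2, criterion='maxclust')     # cut the tree into 2 clusters
    print(method, np.bincount(labels)[1:])               # cluster sizes
```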
Example: Human Tumor Microarray Data
• A 6830 × 64 matrix of real numbers
• Rows correspond to genes, columns to tissue samples
• Clustering rows (genes): we can deduce the functions of unknown genes from known genes with similar expression profiles
• Clustering columns (samples): we can identify disease profiles, since tissues with similar diseases should yield similar expression profiles
[Figure: gene expression matrix]
Example: Human Tumor Microarray Data
• A 6830 × 64 matrix of real numbers
• Group-average (GA) clustering of the microarray data
• applied separately to rows and columns
• subtrees with tighter clusters placed on the left
• produces a more informative picture of genes and samples than the randomly ordered rows and columns
Example: Two circles
Spectral Clustering [Ng, Jordan, and Weiss '01]
• Given a set of points $x_1, \dots, x_n$, we'd like to cluster them into k clusters
• Form an affinity matrix A where $A_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ for $i \neq j$ and $A_{ii} = 0$
• Define $D$ to be the diagonal matrix with $D_{ii} = \sum_j A_{ij}$, and $L = D^{-1/2} A D^{-1/2}$
• Find the k largest eigenvectors of L and stack them columnwise to obtain $X \in \mathbb{R}^{n \times k}$
• Form the matrix Y by normalizing each row of X to have unit length
• Think of the n rows of Y as a new representation of the original n data points; cluster them into k clusters using K-means (a sketch follows)
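A sketch of this procedure, assuming NumPy and scikit-learn (for the final K-means step); the value of sigma is an illustrative choice and would normally be tuned:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=0.1):
    # Affinity matrix: A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with A_ii = 0.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Normalized affinity L = D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Stack the k largest eigenvectors of L columnwise, normalize each row to unit length.
    _, vecs = np.linalg.eigh(L)          # eigenvalues are returned in ascending order
    X_hat = vecs[:, -k:]
    Y = X_hat / np.linalg.norm(X_hat, axis=1, keepdims=True)
    # Cluster the n rows of Y with K-means.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)
```

On a two-circles data set like the example above, this representation lets K-means separate the two rings, which it typically cannot do on the raw coordinates.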
Example: Two circles
Had Fun?
Hope you enjoyed…