RJMCMC in clustering

Clustering by mixture model, Pham The Thong, 知能システム研究室 (Intelligent Systems Laboratory), April 22, 2011 (44 slides)

description

A 30-minute presentation about RJMCMC in clustering

Transcript of RJMCMC in clustering

Page 1: RJMCMC in clustering

Clustering by mixture model

Pham The Thong

知能システム研究室 (Intelligent Systems Laboratory)

April 22, 2011

Page 2: RJMCMC in clustering

Outline

1. RJMCMC in clustering
   - Clustering overview
   - Reversible Jump MCMC

2. Richardson & Green (1997): On Bayesian Analysis of Mixtures with an Unknown Number of Components
   - Overview
   - Split/Merge and Birth/Death Mechanism
   - Algorithm
   - Result

3. Tadesse et al. (2005): Bayesian Variable Selection in Clustering High-Dimensional Data
   - Overview
   - Variable Selection
   - RJMCMC Mechanism
   - Result
   - Weakness of the model

Page 4: RJMCMC in clustering

RJMCMC in clustering Clustering overview

Clustering overview

Divide the observations into groups.

Predict the group of a new observation.

Model-based clustering: select a probabilistic model that underlies the observations and make statistical inferences based on that model. One popular model is the mixture model.

Page 5: RJMCMC in clustering

Clustering via mixture model

Let $X = (x_1, \dots, x_n)$ be independent p-dimensional observations from G populations.

$$f(x_i \mid w, \theta) = \sum_{k=1}^{G} w_k \, f(x_i \mid \theta_k)$$

$f(x_i \mid \theta_k)$ is the density of an observation $x_i$ from the kth component.

$w = (w_1, \dots, w_G)^T$ are the component weights.

$\theta = (\theta_1, \dots, \theta_G)^T$ are the component parameters.

Clustering is done via the allocation vector $y = (y_1, \dots, y_n)^T$: $y_i = k$ if the ith observation $x_i$ comes from component k.
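As a rough illustration of the mixture density and the allocation vector, the sketch below evaluates f(x | w, θ) and simulates data together with their allocations. It assumes 1-dimensional normal components with θ_k = (μ_k, σ_k); the function names and the example weights and parameters are made up for illustration and do not come from the slides.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density(x, weights, params):
    """f(x | w, theta) = sum_k w_k f(x | theta_k), with theta_k = (mu_k, sigma_k)."""
    return sum(w * normal_pdf(x, mu, sigma)
               for w, (mu, sigma) in zip(weights, params))

def sample_mixture(n, weights, params, rng):
    """Draw n observations x_i and their allocations y_i (0-based component index)."""
    xs, ys = [], []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        mu, sigma = params[k]
        xs.append(rng.gauss(mu, sigma))
        ys.append(k)
    return xs, ys

# Hypothetical two-component example.
weights = [0.3, 0.7]
params = [(-2.0, 1.0), (3.0, 0.5)]
xs, ys = sample_mixture(5, weights, params, random.Random(0))
density_at_zero = mixture_density(0.0, weights, params)
```

In clustering the allocations `ys` are unknown and must be inferred from `xs`; here they are known only because the data were simulated.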

Page 6: RJMCMC in clustering

Some approaches

Model selection: compare a model selection criterion across fixed-G models for various values of G and choose the best G. Inference on a fixed-G model is often done via the EM algorithm or a Gibbs sampler.

Nonparametric method: use a Dirichlet process.

Trans-dimensional Markov chain Monte Carlo (MCMC): allow G to change during the inference process by combining a Gibbs sampler with MCMC moves that can change the dimension of the model. Reversible jump MCMC (RJMCMC) is one possible scheme.

Page 8: RJMCMC in clustering

RJMCMC in clustering Reversible Jump MCMC

Overview

First developed in Green (1995).

Its applications range well beyond mixture model analysis.

Its power for mixture model analysis was first demonstrated in Richardson & Green (1997), who considered only the 1-dimensional case.

Applied to the multidimensional setting in Tadesse et al. (2005).

Page 9: RJMCMC in clustering

Some advantages of clustering by RJMCMC

Avoids the task of model selection.

Provides a coherent Bayesian framework. The cluster number G is not treated as a special parameter.

Can provide useful summaries of the data that are difficult to obtain by other methods.

Page 10: RJMCMC in clustering

General ideas of RJMCMC I

Simulate a Markov chain that converges to the full posterior distribution $p(G, y, w, \theta \mid X)$.

The hybrid sampler consists of a Gibbs sampler (the base) and jump moves (the extension).

The Gibbs sampler samples $(y, w, \theta)$. The jump moves sample the cluster number G.

The jump moves come in pairs: Split/Merge and Birth/Death.

Page 11: RJMCMC in clustering

General ideas of RJMCMC II

Split move: split one component into two components.

Merge move: combine two components into one component.

Birth move: create an empty component.

Death move: delete an empty component.

At each iteration, propose a Split (Birth) move with some fixed probability $b_k$, and a Merge (Death) move with probability $1 - b_k$.

For each proposal, calculate all the changes to the model as if the move had been made.
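The choice between the two moves of a pair can be sketched as follows. The slides only say that b_k is fixed; the specific values used here (0.5 in the interior, 1 when there is a single component, 0 at a hypothetical upper bound `k_max`) are an assumption made for illustration.

```python
import random

def split_probability(k, k_max):
    """b_k: probability of proposing Split (or Birth) with k components.

    Assumption: b_k = 0.5 in the interior, 1 at k = 1 (nothing to merge)
    and 0 at k = k_max (no room for another component).
    """
    if k <= 1:
        return 1.0
    if k >= k_max:
        return 0.0
    return 0.5

def choose_move(k, k_max, rng):
    """Return 'split' with probability b_k, otherwise 'merge'."""
    return "split" if rng.random() < split_probability(k, k_max) else "merge"

move = choose_move(3, 10, random.Random(0))
```

The same function can serve the Birth/Death pair by reading 'split' as Birth and 'merge' as Death.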

Page 12: RJMCMC in clustering

General ideas of RJMCMC III

Calculate the acceptance ratio A, which is the product of three terms:

- the ratio of the posterior of the new model to that of the old model
- the ratio of the probability of proposing the way back from the new model to the old model to the probability of proposing the way from the old model to the new model
- the Jacobian arising from the change of dimension

To ensure convergence to the desired distribution, the move is actually carried out only with probability min(1, A).
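The accept/reject step can be sketched in log space, which avoids overflow when the three terms are very large or very small. The argument names below are hypothetical; how each term is computed depends on the model and on the move.

```python
import math
import random

def log_acceptance_ratio(log_post_new, log_post_old,
                         log_q_reverse, log_q_forward, log_abs_jacobian):
    """log A = log posterior ratio + log proposal ratio + log |Jacobian|."""
    return ((log_post_new - log_post_old)
            + (log_q_reverse - log_q_forward)
            + log_abs_jacobian)

def accept(log_a, rng):
    """Return True with probability min(1, A), where A = exp(log_a)."""
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is always defined
    return math.log(u) <= min(0.0, log_a)

log_a = log_acceptance_ratio(-10.0, -12.0, -1.0, -0.5, 0.3)
accepted = accept(log_a, random.Random(0))
```

If the move is rejected, the chain keeps the old model for this iteration.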

Page 14: RJMCMC in clustering

Richardson&Green(1997) Overview

Overview

1-dimensional data.

Goal:
- Clustering data.
- Estimating component parameters.
- Estimating the distribution of data.
- Predicting the group of new data.

Demonstrated on three real datasets: Enzyme, Acid, and Galaxy.

Page 16: RJMCMC in clustering

Richardson&Green(1997) Split/Merge and Birth/Death Mechanism

Split/Merge Mechanism

In the Split move, select one component $(w_{j*}, \mu_{j*}, \sigma_{j*})$ and split it into two components $(w_{j1}, \mu_{j1}, \sigma_{j1})$ and $(w_{j2}, \mu_{j2}, \sigma_{j2})$.

In the Merge move, select two components $(w_{j1}, \mu_{j1}, \sigma_{j1})$ and $(w_{j2}, \mu_{j2}, \sigma_{j2})$ and merge them into one new component $(w_{j*}, \mu_{j*}, \sigma_{j*})$.

The parameters are chosen by matching the zeroth, first, and second moments of the new component to those of the combination of the two old components.
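A minimal sketch of moment matching for 1-dimensional normal components follows. `merge` matches the weight (zeroth moment), the weighted mean (first moment), and the weighted second moment. `split` inverts it using three auxiliary numbers u1, u2, u3 in (0, 1). This parametrization of the split, and the Beta distributions used to draw u1, u2, u3, are stated here as assumptions rather than taken from the slides.

```python
import math
import random

def merge(c1, c2):
    """Merge two components (w, mu, sigma) by matching moments 0, 1 and 2."""
    (w1, m1, s1), (w2, m2, s2) = c1, c2
    w = w1 + w2                                    # zeroth moment
    mu = (w1 * m1 + w2 * m2) / w                   # first moment
    second = (w1 * (m1 * m1 + s1 * s1)
              + w2 * (m2 * m2 + s2 * s2)) / w      # second moment
    return w, mu, math.sqrt(second - mu * mu)

def split(c, u1, u2, u3):
    """Split (w, mu, sigma) into two components that merge back to it."""
    w, mu, s = c
    w1, w2 = w * u1, w * (1.0 - u1)
    m1 = mu - u2 * s * math.sqrt(w2 / w1)
    m2 = mu + u2 * s * math.sqrt(w1 / w2)
    v1 = u3 * (1.0 - u2 * u2) * s * s * w / w1
    v2 = (1.0 - u3) * (1.0 - u2 * u2) * s * s * w / w2
    return (w1, m1, math.sqrt(v1)), (w2, m2, math.sqrt(v2))

rng = random.Random(0)
parent = (0.4, 1.0, 2.0)  # hypothetical component (w, mu, sigma)
# Assumed proposal distributions for the auxiliary variables.
u = (rng.betavariate(2, 2), rng.betavariate(2, 2), rng.betavariate(1, 1))
child1, child2 = split(parent, *u)
restored = merge(child1, child2)  # equals parent up to rounding error
```

Because splitting and then merging returns the original component, the two moves form a reversible pair, which is what the acceptance ratio relies on.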

Page 17: RJMCMC in clustering

Birth/Death Mechanism

Birth move
- Generate $w_{j*}$, $\mu_{j*}$, $\sigma_{j*}$ from some distributions.
- Rescale the weights.

Death move
- Delete a randomly chosen empty component.
- Rescale the weights.
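The weight bookkeeping of the two moves can be sketched as below. Only the weights are handled; drawing μ and σ for the new component is omitted. The Beta(1, k) proposal for the new weight is an assumption, since the slides only say "some distributions", and `counts` is a hypothetical list holding the number of observations allocated to each component.

```python
import random

def birth(weights, rng):
    """Add an empty component and rescale the old weights so the total stays 1."""
    k = len(weights)
    w_new = rng.betavariate(1, k)  # assumption: Beta(1, k) proposal
    return [w * (1.0 - w_new) for w in weights] + [w_new]

def death(weights, counts, rng):
    """Delete a randomly chosen empty component and rescale the rest to sum to 1."""
    empty = [j for j, n in enumerate(counts) if n == 0]
    if not empty:
        return weights, counts  # no empty component, so nothing to delete
    j = rng.choice(empty)
    rest = 1.0 - weights[j]
    new_weights = [w / rest for i, w in enumerate(weights) if i != j]
    new_counts = [n for i, n in enumerate(counts) if i != j]
    return new_weights, new_counts

rng = random.Random(0)
after_birth = birth([0.5, 0.5], rng)
after_death, counts_after = death(after_birth, [3, 4, 0], rng)
```

In the full algorithm each of these is only a proposal and is carried out with probability min(1, A).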

Page 19: RJMCMC in clustering

Richardson&Green(1997) Algorithm

One iteration contains:

Gibbs sampler:
- Updating the weights w
- Updating the parameters µ, σ
- Updating the allocation y

Split/Merge move

Birth/Death move
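Two of the Gibbs updates can be sketched for 1-dimensional normal components as follows. The allocation update draws y_i with probability proportional to w_k N(x_i; μ_k, σ_k²), and the weight update draws from a Dirichlet distribution through normalized Gamma variates. The symmetric Dirichlet prior parameter `delta` is an assumption, and the update of μ and σ is left out because it depends on priors not given in the slides.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_allocations(xs, weights, mus, sigmas, rng):
    """Draw each y_i with probability proportional to w_k * N(x_i; mu_k, sigma_k^2)."""
    ys = []
    for x in xs:
        probs = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, mus, sigmas)]
        ys.append(rng.choices(range(len(weights)), weights=probs)[0])
    return ys

def update_weights(ys, n_components, delta, rng):
    """Draw w from Dirichlet(delta + n_1, ..., delta + n_G) via Gamma variates."""
    counts = [0] * n_components
    for y in ys:
        counts[y] += 1
    draws = [rng.gammavariate(delta + n, 1.0) for n in counts]
    total = sum(draws)
    return [d / total for d in draws]

# Hypothetical data with two well separated groups.
rng = random.Random(0)
xs = [-2.1, -1.9, 3.0, 3.2]
ys = update_allocations(xs, [0.5, 0.5], [-2.0, 3.0], [0.5, 0.5], rng)
weights = update_weights(ys, 2, 1.0, rng)
```

A robust implementation would work with log densities, since `rng.choices` raises an error if every weighted density underflows to zero.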

Page 21: RJMCMC in clustering

Richardson&Green(1997) Result

Post simulation

By processing the raw output of the simulation, one can:

cluster the data by selecting the allocation vector y that has the highest frequency.

estimate the component parameters by their posterior means.

estimate the distribution of the data.

predict the group of new data.
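The first two summaries can be sketched as simple post-processing of stored draws. The data layout is hypothetical: `allocation_draws` holds one allocation vector per sweep and `mu_draws` holds sampled values of a single mean parameter.

```python
from collections import Counter

def most_frequent_allocation(allocation_draws):
    """Allocation vector y that was visited most often."""
    counts = Counter(tuple(y) for y in allocation_draws)
    return list(counts.most_common(1)[0][0])

def posterior_mean(values):
    """Monte Carlo estimate of a posterior mean from sampled values."""
    return sum(values) / len(values)

# Hypothetical sampler output for three observations.
allocation_draws = [[0, 0, 1], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
mu_draws = [1.0, 1.5, 0.5, 1.0]
best_y = most_frequent_allocation(allocation_draws)
mu_hat = posterior_mean(mu_draws)
```

In practice the draws from an initial burn-in period are usually discarded before computing such summaries.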

Page 22: RJMCMC in clustering

The three datasets

Enzyme data: enzymatic activity of one enzyme in the blood of 245 unrelated people. The interest is in identifying subgroups of slow or fast activity as a marker of genetic polymorphism in the general population (i.e., to some extent, people in the same subgroup may have similar genetic structure although they are unrelated).

Acid data: acidity level of 155 lakes in Wisconsin.

Galaxy data: velocities of 82 galaxies diverging from our galaxy.

Page 26: RJMCMC in clustering

Tadesse et.al.(2005) Overview

Overview

High-dimensional data.

Goal:
- Selecting variables.
- Clustering data.
- Predicting the group of new data.

Applied to microarray data.

Page 28: RJMCMC in clustering

Tadesse et.al.(2005) Variable Selection

Concept

Perhaps not all variables are useful for clustering.

By throwing away non-discriminating (irrelevant) variables and clustering only on discriminating (relevant) variables, we may improve clustering accuracy.

We can think of variable selection as one way to generalize the basic approach of "clustering by the full set of variables" to "clustering by a subset of variables".

Page 29: RJMCMC in clustering

The model of Tadesse et al. I

Introduce $\gamma = (\gamma_1, \dots, \gamma_p)$: $\gamma_j = 1$ if the jth variable is a discriminating variable and 0 if it is not.

Use $(\gamma)$ and $(\gamma^c)$ to index the discriminating variables and the non-discriminating variables.

Three assumptions:

The set of discriminating variables and the set of non-discriminating variables are independent.

Restricted to $(\gamma^c)$, the data $X_{(\gamma^c)}$ follow a single normal distribution (hence are unsuitable for clustering).

Restricted to $(\gamma)$, the data $X_{(\gamma)}$ follow a mixture of G normal components (hence are suitable for clustering).

Page 30: RJMCMC in clustering

The model of Tadesse et al. II

$(\eta_{(\gamma^c)}, \Omega_{(\gamma^c)})$: mean and covariance for the non-discriminating variables.

$(\mu_{k(\gamma)}, \Sigma_{k(\gamma)})$: mean and covariance for the kth component $C_k$.

The three assumptions can be written as

$$p(X \mid G, \gamma, w, y, \mu, \Sigma, \eta, \Omega) = \prod_{i=1}^{n} N\left(x_{i(\gamma^c)}, \eta_{(\gamma^c)}, \Omega_{(\gamma^c)}\right) \prod_{k=1}^{G} \prod_{x_i \in C_k} N\left(x_{i(\gamma)}, \mu_{k(\gamma)}, \Sigma_{k(\gamma)}\right)$$

Page 31: RJMCMC in clustering

Searching for γ

The problem of variable selection is recast as a problem of searching for the most probable binary vector γ.

Use a Metropolis search (of which simulated annealing is one type).

At each step, randomly choose one of two transition moves, flipping one bit of γ or swapping two bits of γ, and accept the move with probability

$$\min\left(1, \frac{p(\gamma^{new} \mid X, y, w, G)}{p(\gamma^{old} \mid X, y, w, G)}\right).$$
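The flip/swap search can be sketched as below. The real target p(γ | X, y, w, G) comes from the marginalized model and is not reproduced here; `toy_log_post` is a hypothetical stand-in that simply rewards agreement with a made-up `target` vector. Choosing between flip and swap with probability 0.5 each is also an assumption.

```python
import math
import random

def propose(gamma, rng):
    """Flip one bit, or swap a 1 with a 0 (each with probability 0.5)."""
    new = list(gamma)
    if rng.random() < 0.5:
        j = rng.randrange(len(new))
        new[j] = 1 - new[j]
    else:
        ones = [j for j, g in enumerate(new) if g == 1]
        zeros = [j for j, g in enumerate(new) if g == 0]
        if ones and zeros:  # otherwise no swap exists and gamma is unchanged
            i, j = rng.choice(ones), rng.choice(zeros)
            new[i], new[j] = 0, 1
    return new

def metropolis_step(gamma, log_post, rng):
    """Accept the proposal with probability min(1, p(new) / p(old))."""
    candidate = propose(gamma, rng)
    log_ratio = log_post(candidate) - log_post(gamma)
    if math.log(1.0 - rng.random()) <= min(0.0, log_ratio):
        return candidate
    return gamma

target = [1, 0, 1, 0, 0, 1]  # hypothetical set of discriminating variables

def toy_log_post(gamma):
    """Toy log posterior: each mismatch with target costs 5 log units."""
    return -5.0 * sum(a != b for a, b in zip(gamma, target))

rng = random.Random(0)
gamma = [0] * len(target)
best = gamma
for _ in range(2000):
    gamma = metropolis_step(gamma, toy_log_post, rng)
    if toy_log_post(gamma) > toy_log_post(best):
        best = gamma
```

Both moves are symmetric, so no proposal ratio appears in the acceptance probability, which matches the formula on the slide.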

Page 33: RJMCMC in clustering

Tadesse et.al.(2005) RJMCMC Mechanism

Difficulties in high dimension

Unlike the 1-dimensional case, there is no obvious way to split a covariance matrix into two covariance matrices. Even if this could be done [4], the Jacobian may not have a closed form.

The number of model parameters grows on the order of $p^2$, so the chain may converge very slowly.
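To make the order-p² growth concrete, the sketch below counts the free parameters of a G-component normal mixture in p dimensions, assuming each component has its own unrestricted mean vector and full covariance matrix. The function name is hypothetical.

```python
def free_parameters(G, p):
    """Free parameters of a G-component p-dimensional normal mixture.

    Assumes unrestricted means (p each), full symmetric covariance
    matrices (p * (p + 1) / 2 each) and G - 1 free weights.
    """
    mean_params = p
    cov_params = p * (p + 1) // 2
    return G * (mean_params + cov_params) + (G - 1)

univariate = free_parameters(3, 1)      # three components, 1-dimensional data
microarray = free_parameters(2, 762)    # two components, 762 variables
```

With 762 variables a single covariance matrix already has 762 · 763 / 2 = 290,703 free entries, which is why sampling the component parameters directly becomes impractical.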

Page 34: RJMCMC in clustering

Approach of Tadesse et al.

Integrate out the mean vectors and covariance matrices to obtain a marginalized posterior that involves only G, w, γ, and y.

Although quite tedious, the math follows a standard framework: define conjugate priors for the mean and covariance matrix, then carry out the integration.

The Split/Merge move then only needs to split or merge the component weights. The Birth/Death moves are the same as in the 1-dimensional case.

Page 35: RJMCMC in clustering

Algorithm

One iteration contains

Metropolis search for γ

Gibbs sampler:
- Updating the weights w
- Updating the allocation y

Split/Merge move

Birth/Death move

Page 37: RJMCMC in clustering

Tadesse et.al.(2005) Result

Post simulation

Since the mean and covariance are integrated out, there is no estimate of the component parameters.

Variable selection:
- Method 1: select the vector γ that has the highest frequency.
- Method 2: select all variables j whose marginal posterior probability is at least some threshold a: $p(\gamma_j = 1 \mid X, G) \ge a$.

Clustering and group prediction can be done in the same way as in the univariate case.
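Both selection methods amount to counting over the stored γ draws, as in the sketch below. The list `gamma_draws` is hypothetical sampler output, and the marginal probability of each variable is estimated by the fraction of draws in which its bit equals 1.

```python
from collections import Counter

def most_frequent_gamma(gamma_draws):
    """Method 1: the vector gamma that was visited most often."""
    counts = Counter(tuple(g) for g in gamma_draws)
    return list(counts.most_common(1)[0][0])

def select_by_threshold(gamma_draws, a):
    """Method 2: indices j whose estimated p(gamma_j = 1 | X, G) is at least a."""
    n = len(gamma_draws)
    p = len(gamma_draws[0])
    freq = [sum(g[j] for g in gamma_draws) / n for j in range(p)]
    return [j for j, f in enumerate(freq) if f >= a]

# Hypothetical draws for three variables.
gamma_draws = [[1, 0, 1], [1, 0, 0], [1, 0, 1], [1, 1, 1]]
method1 = most_frequent_gamma(gamma_draws)
method2 = select_by_threshold(gamma_draws, 0.5)
```

Method 2 is often the more stable of the two when p is large, because a single exact vector γ may be visited only a handful of times.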

Page 38: RJMCMC in clustering

Microarray data

14 samples (the samples come from tissues).

Variables are genes. There are 762 variables.

By clustering the samples into subgroups, one may find out which genes are relevant to each subgroup.

Page 42: RJMCMC in clustering

Tadesse et.al.(2005) Weakness of the model

Weakness of the model [5]

The independence assumption often causes an irrelevant variable to be wrongly identified as a discriminating one because it is related to some discriminating variables.

It is not known whether one can relax this assumption while still being able to perform an RJMCMC-based full Bayesian analysis.

Page 43: RJMCMC in clustering

References

[1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4), 711-732.

[2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of Mixtures with an Unknown Number of Components, J. R. Statist. Soc. B 59(4), 731-792.

[3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable Selection in Clustering High-Dimensional Data, Journal of the American Statistical Association 100(470), 602-617.

[4] Petros Dellaportas and Ioulia Papageorgiou (2006), Multivariate mixtures of normals with unknown number of components, Statistics and Computing 16(1), 57-68.

[5] Maugis et al. (2009), Variable Selection for Clustering with Gaussian Mixture Models, Biometrics 65, 701-709.

Page 44: RJMCMC in clustering

Thank you for your attention
