RJMCMC in clustering
Clustering by mixture model
Pham The Thong
Intelligent Systems Laboratory (知能システム研究室)
April 22, 2011
Pham The Thong (知能システム研究室) Clustering by mixture model April 22, 2011 1 / 44
Outline
1. RJMCMC in clustering
    Clustering overview
    Reversible Jump MCMC
2. Richardson & Green (1997): On Bayesian Analysis of Mixtures with an Unknown Number of Components
    Overview
    Split/Merge and Birth/Death Mechanism
    Algorithm
    Result
3. Tadesse et al. (2005): Bayesian Variable Selection in Clustering High-Dimensional Data
    Overview
    Variable Selection
    RJMCMC Mechanism
    Result
    Weakness of the model
RJMCMC in clustering: Clustering overview
Clustering overview
Divide the observations into groups.
Predict group of a new observation.
Model-based clustering: select a probabilistic model that underlies the observations and make statistical inferences based on that model. One popular model is the mixture model.
Clustering via mixture model
Let X = (x_1, ..., x_n) be independent p-dimensional observations from G populations:

f(x_i | w, θ) = Σ_{k=1}^{G} w_k f(x_i | θ_k)

f(x_i | θ_k) is the density of an observation x_i from the kth component.
w = (w_1, ..., w_G)^T are the component weights.
θ = (θ_1, ..., θ_G)^T are the component parameters.
Clustering is done via the allocation vector y = (y_1, ..., y_n)^T: y_i = k if the ith observation x_i comes from component k.
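As an illustration of the mixture density above (not from the slides; Gaussian components are assumed here only to make the sketch concrete):

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density N(x; mean, cov)."""
    p = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** p * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_density(x, weights, means, covs):
    """f(x | w, theta) = sum_k w_k f(x | theta_k)."""
    return sum(w * gaussian_pdf(x, m, c)
               for w, m, c in zip(weights, means, covs))

# Two equally weighted 2-dimensional components
weights = [0.5, 0.5]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
print(mixture_density(np.zeros(2), weights, means, covs))
```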
Some approaches
Model selection: compare model selection criteria of fixed-G models for various values of G to choose the best G. Inference in a fixed-G model is often done via the EM algorithm or a Gibbs sampler.
Nonparametric method: use a Dirichlet Process.
Trans-dimensional Markov Chain Monte Carlo (MCMC): allow G to change during the inference process by combining a Gibbs sampler with MCMC moves that can change the dimension of the model. Reversible jump MCMC (RJMCMC) is one possible scheme.
RJMCMC in clustering: Reversible Jump MCMC
Overview
First developed in Green (1995).
Has applications ranging well beyond mixture model analysis.
Its power for mixture model analysis was first demonstrated in Richardson & Green (1997), which considered only the 1-dimensional case.
Applied to the multidimensional setting in Tadesse et al. (2005).
Some advantages of clustering by RJMCMC
Avoids the task of model selection.
Provides a coherent Bayesian framework: the cluster number G is not treated as a special parameter.
Can provide useful summaries of the data that are difficult to obtain by other methods.
General ideas of RJMCMC I
Simulate a Markov chain that converges to the full posterior distribution p(G, y, w, θ | X).
The hybrid sampler consists of a Gibbs sampler (the base) and jump moves (the extension).
The Gibbs sampler samples (y, w, θ); the jump moves sample the cluster number G.
The jump moves come in pairs: Split/Merge and Birth/Death.
General ideas of RJMCMC II
Split move: split one component into two components.
Merge move: combine two components into one component.
Birth move: create an empty component.
Death move: delete an empty component.
At each iteration, propose a Split (Birth) move with some fixed probability b_k, and with probability 1 − b_k propose a Merge (Death) move.
In one proposal, calculate all the changes to the model as if the move were made.
General ideas of RJMCMC III
Calculate the acceptance probability A, which is theproduct of three terms:
the ratio of the posterior of the new model to that of the old model
the ratio of the probability of the reverse move (from the new model back to the old model) to that of the forward move (from the old model to the new model)
the Jacobian arising from the change of dimension
To ensure convergence to the desired distribution, actually carry out the move only with probability min(1, A).
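A minimal sketch of this accept/reject step (the helper name and the log-scale bookkeeping are my own; the three log terms correspond to the three factors of A above):

```python
import numpy as np

rng = np.random.default_rng(0)

def accept_jump(log_post_new, log_post_old, log_proposal_ratio, log_jacobian):
    """Generic RJMCMC accept/reject step: combine the three factors of A
    on the log scale, then accept with probability min(1, A)."""
    log_A = (log_post_new - log_post_old) + log_proposal_ratio + log_jacobian
    return np.log(rng.uniform()) < min(0.0, log_A)
```

A move whose acceptance ratio A is at least 1 is always carried out; a move with a tiny A is almost always rejected.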
Richardson & Green (1997): Overview
Overview
1-dimensional data.
Goal:
Clustering data.
Estimating component parameters.
Estimating the distribution of the data.
Predicting the group of new data.
Demonstrated on three real datasets: Enzyme, Acid, and Galaxy.
Richardson & Green (1997): Split/Merge and Birth/Death Mechanism
Split/Merge Mechanism
In the Split move, select one component (w_j∗, µ_j∗, σ_j∗) to split into two components (w_j1, µ_j1, σ_j1) and (w_j2, µ_j2, σ_j2).
In the Merge move, select two components (w_j1, µ_j1, σ_j1) and (w_j2, µ_j2, σ_j2) to merge into one new component (w_j∗, µ_j∗, σ_j∗).
Equalize the zeroth, first, and second moments of the new component to those of the combination of the two old components.
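The moment-matching merge can be written down directly for 1-dimensional Gaussian components (a sketch; the function name is hypothetical):

```python
def merge_components(w1, mu1, sigma1, w2, mu2, sigma2):
    """Merge two 1-d Gaussian components by matching the zeroth, first
    and second moments of the merged component to those of the pair."""
    w = w1 + w2                                   # zeroth moment
    mu = (w1 * mu1 + w2 * mu2) / w                # first moment
    second = (w1 * (mu1 ** 2 + sigma1 ** 2)
              + w2 * (mu2 ** 2 + sigma2 ** 2)) / w  # second moment
    var = second - mu ** 2
    return w, mu, var ** 0.5
```

Merging two identical components recovers the same mean and standard deviation, as one would expect from the moment equations.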
Birth/Death Mechanism
Birth move:
Generate w_j∗, µ_j∗, σ_j∗ from some distributions.
Rescale the weights.
Death move:
Delete a randomly chosen empty component.
Rescale the weights.
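The weight rescaling in both moves is simple bookkeeping; a sketch with hypothetical helper names:

```python
def birth(weights, w_new):
    """Birth move: add an empty component with weight w_new (drawn
    elsewhere from some distribution) and rescale the existing weights
    so that everything still sums to 1."""
    return [w * (1.0 - w_new) for w in weights] + [w_new]

def death(weights, j):
    """Death move: delete the (assumed empty) component j and rescale
    the remaining weights back to sum 1."""
    w_j = weights[j]
    return [w / (1.0 - w_j) for i, w in enumerate(weights) if i != j]
```

A death move applied to the component just created by a birth move restores the original weights, which is the reversibility the pair of moves relies on.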
Richardson & Green (1997): Algorithm
One iteration contains:
Gibbs sampler:
Updating the weights w
Updating the parameters µ, σ
Updating the allocation y
Split/Merge move
Birth/Death move
Richardson & Green (1997): Result
Post simulation
By processing the raw output of the simulation, one can
cluster the data by selecting the allocation vector y that has the highest frequency,
estimate component parameters by their posterior means,
estimate the distribution of the data,
predict the group of new data.
The three datasets
Enzyme data: enzymatic activity of one enzyme in the blood of 245 unrelated people. The interest is in identifying subgroups of slow or fast activity as a marker of genetic polymorphism in the general population (i.e., to some extent, people in the same subgroup may have similar genetic structure although they are unrelated).
Acid data: acidity level of 155 lakes in Wisconsin.
Galaxy data: velocities of 82 galaxies diverging from our own galaxy.
[Two result-figure slides for the three datasets are not recoverable from the transcript]
Tadesse et al. (2005): Overview
Overview
High-dimensional data.
Goal:
Variable selection.
Clustering data.
Predicting the group of new data.
Applied to microarray data.
Tadesse et al. (2005): Variable Selection
Concept
Perhaps not all variables are useful for clustering.
By throwing away non-discriminating variables (irrelevant variables) and clustering only on discriminating variables (relevant variables), we may improve clustering accuracy.
We can think of variable selection as one way to generalize the basic approach of "clustering by the full set of variables" to "clustering by a subset of variables".
The model of Tadesse et.al. I
Introduce γ = (γ_1, ..., γ_p): γ_j = 1 if the jth variable is a discriminating variable and 0 if it is not.
Use (γ) and (γ^c) to index the discriminating and non-discriminating variables.
Three assumptions:
The set of discriminating variables and the set of non-discriminating variables are independent.
If we look only at (γ^c), the data X_(γ^c) have a single normal distribution (hence unsuitable for clustering).
If we look only at (γ), the data X_(γ) have a mixture distribution of G normal components (hence suitable for clustering).
The model of Tadesse et.al. II
(η_(γ^c), Ω_(γ^c)): mean and covariance of the non-discriminating variables.
(µ_k(γ), Σ_k(γ)): mean and covariance of the kth component C_k.
The three assumptions can be written as

p(X | G, γ, w, y, µ, Σ, η, Ω) = ∏_{i=1}^{n} N(x_i(γ^c); η_(γ^c), Ω_(γ^c)) · ∏_{k=1}^{G} ∏_{x_i ∈ C_k} N(x_i(γ); µ_k(γ), Σ_k(γ))
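A minimal numerical sketch of this factorized likelihood (diagonal covariances and all the argument names are assumptions made only for brevity; the model itself allows full covariance matrices):

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    """Log density of independent normals (diagonal covariance)."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                        - 0.5 * (x - mean) ** 2 / var))

def log_likelihood(X, gamma, y, eta, omega, mus, sig2s):
    """log p(X | ...) factorized as in the formula above: coordinates
    with gamma == 0 share one normal (eta, omega); coordinates with
    gamma == 1 follow the normal of the allocated component y[i]."""
    gamma = np.asarray(gamma, dtype=bool)
    total = 0.0
    for i, x in enumerate(X):
        total += log_norm_pdf(x[~gamma], eta, omega)             # X_(gamma^c) part
        total += log_norm_pdf(x[gamma], mus[y[i]], sig2s[y[i]])  # X_(gamma) part
    return total
```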
Searching for γ
The problem of variable selection is recast as the problem of searching for the most probable binary vector γ.
Use a Metropolis search (of which Simulated Annealing is one type).
At each step, randomly choose one of the following two transition moves: flip one bit of γ or swap two bits of γ, and accept the move with probability

min(1, p(γ_new | X, y, w, G) / p(γ_old | X, y, w, G)).
Tadesse et al. (2005): RJMCMC Mechanism
Difficulties in high dimension
Unlike the 1-dimensional case, there is no obvious way to split a covariance matrix into two covariance matrices. Even if this could be done [4], the Jacobian may not have a closed form.
The number of model parameters increases rapidly, with order p². The chain may converge very slowly.
Approach of Tadesse et.al.
Integrate out the mean vectors and covariance matrices to obtain a marginalized posterior in which only G, w, γ, and y are involved.
Despite being quite tedious, the math follows a standard framework: define conjugate priors for the means and covariance matrices and then take the integrals.
Only the component weights need to be split or merged in the Split/Merge move. The Birth/Death moves are the same as in the 1-dimensional case.
Algorithm
One iteration contains
Metropolis search for γ
Gibbs sampler:
Updating the weights w
Updating the allocation y
Split/Merge move
Birth/Death move
Tadesse et al. (2005): Result
Post simulation
Since the means and covariances are integrated out, there is no estimation of component parameters.
Variable selection:
Method 1: select the vector γ that has the highest frequency.
Method 2: select all variables j for which p(γ_j | X, G) exceeds some threshold: p(γ_j | X, G) ≥ a.
Clustering and group prediction can be done in thesame way as in the univariate case.
Microarray data
14 samples (the samples come from tissues).
Variables are genes. There are 762 variables.
By clustering the samples into subgroups, one mayfind out which genes are relevant to each subgroup.
[Two microarray result-figure slides are not recoverable from the transcript]
Tadesse et al. (2005): Weakness of the model
Weakness of the model [5]
The independence assumption would often lead to the wrong outcome in which an irrelevant variable is identified as discriminating because it is correlated with some discriminating variables.
It is not known whether one can relax thisassumption while still being able to performRJMCMC-based full Bayesian analysis.
References
[1] P. J. Green (1995), Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82(4), 711-732.
[2] S. Richardson and P. J. Green (1997), On Bayesian Analysis of Mixtures with an Unknown Number of Components, J. R. Statist. Soc. B 59(4), 731-792.
[3] M. G. Tadesse, N. Sha, and M. Vannucci (2005), Bayesian Variable Selection in Clustering High-Dimensional Data, Journal of the American Statistical Association 100(470), 602-617.
[4] P. Dellaportas and I. Papageorgiou (2006), Multivariate mixtures of normals with unknown number of components, Statistics and Computing 16(1), 57-68.
[5] Maugis et al. (2009), Variable Selection for Clustering with Gaussian Mixture Models, Biometrics 65, 701-709.
Thank you for your attention