Clustering of Time Course Gene-Expression Data via Mixture Regression Models
Geoff McLachlan (joint with Angus Ng and Sam Wang)
Department of Mathematics & Institute for Molecular Bioscience, University of Queensland
ARC Centre of Excellence in Bioinformatics
http://www.maths.uq.edu.au/~gjm
Institute for Molecular Bioscience, University of Queensland
Time-Course Data
Time-course microarray experiments are increasingly being used to characterize dynamic biological processes. (Microarray technology makes it possible to measure the expression levels of thousands of genes at once.)
In these experiments, gene-expression levels are measured at different time points, possibly under different biological conditions (e.g. treatment versus control). The focus here is on the analysis of gene-expression profiles consisting of short time series of log expression ratios for each of the genes represented on the microarrays.
CLUSTERING OF GENE PROFILES
Clustering of gene profiles can provide new insight into the biological process of interest (coexpressed genes can contribute to our understanding of the regulatory network of gene expression). It can also assist in assigning functions to genes that have not yet been functionally annotated.
A secondary concern is the need to impute missing data.
The biological rationale underlying the clustering of microarray data is the fact that many coexpressed genes are coregulated. Clustering thus becomes a way of identifying sets of genes that are putatively coregulated, thereby generating testable hypotheses; see Boutros and Okey (2005).
It assists with:
• the functional annotation of uncharacterised genes
• the identification of transcription factor binding sites
• the elucidation of complete biological pathways
Outline of Talk
1. Mixture model-based approach to analysis of gene-expressions
2. Normal Mixtures
3. Modifications for high-dimensional and/or structured data
4. Mixtures of linear mixed models
5. Clustering of gene profiles
Finite Mixture Models
• Provide an arbitrarily accurate estimate of the underlying density when g is sufficiently large
• Provide a probabilistic clustering of the data into g clusters; an outright clustering is obtained by assigning each data point to the component to which it has the greatest posterior probability of belonging
Definition
We let $Y_1, \ldots, Y_n$ denote a random sample of size n, where $Y_j$ is a p-dimensional random vector with probability density function
$$f(y_j) = \sum_{i=1}^{g} \pi_i f_i(y_j),$$
where the $f_i(y_j)$ are densities and the $\pi_i$ are nonnegative quantities that sum to one.
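As a minimal sketch of the definition, assuming univariate (p = 1) normal component densities purely for illustration, the mixture density is just the weighted sum of the component densities. The function names `normal_pdf` and `mixture_pdf` and the parameter values are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Univariate normal density N(mean, sd^2) evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, weights, means, sds):
    """f(y) = sum_i pi_i f_i(y), here with normal components f_i."""
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must be nonnegative and sum to one")
    return sum(w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds))

# Hypothetical two-component mixture: pi = (0.3, 0.7), means 0 and 3, unit variances.
density_at_zero = mixture_pdf(0.0, [0.3, 0.7], [0.0, 3.0], [1.0, 1.0])
```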
By Bayes' theorem,
$$\tau_i(y_j; \Psi^{(k)}) = \frac{\pi_i^{(k)} f_i(y_j; \theta_i^{(k)})}{f(y_j; \Psi^{(k)})}$$
for i = 1, …, g; j = 1, …, n.
The quantity $\tau_i(y_j; \Psi^{(k)})$ is the posterior probability that the jth member of the sample, with observed value $y_j$, belongs to the ith component of the mixture.
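The Bayes' theorem step can be sketched numerically as follows, again assuming univariate normal components for illustration (the names are hypothetical). Working on the log scale with the log-sum-exp trick avoids underflow when the component densities are tiny.

```python
import numpy as np

def normal_logpdf(y, mean, sd):
    """Log of the univariate normal density, vectorised over y."""
    return -0.5 * ((y - mean) / sd) ** 2 - np.log(sd) - 0.5 * np.log(2.0 * np.pi)

def posterior_probs(y, weights, means, sds):
    """tau[j, i] = pi_i f_i(y_j) / f(y_j), with rows indexing observations."""
    y = np.asarray(y, dtype=float)[:, None]                       # shape (n, 1)
    log_num = np.log(weights) + normal_logpdf(y, np.asarray(means), np.asarray(sds))
    row_max = log_num.max(axis=1, keepdims=True)                  # log-sum-exp shift
    log_f = row_max + np.log(np.exp(log_num - row_max).sum(axis=1, keepdims=True))
    return np.exp(log_num - log_f)                                # shape (n, g)

tau = posterior_probs([0.0, 1.5, 3.0], [0.3, 0.7], [0.0, 3.0], [1.0, 1.0])
```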
A soft (probabilistic) clustering is given in terms of the estimated posterior probabilities of component membership $\hat\tau_{ij}$.
A hard (outright) clustering is given by assigning each $y_j$ to the component to which it has the highest posterior probability of belonging; that is, by the $\hat z_{ij}$, where
$$\hat z_{ij} = \begin{cases} 1, & \text{if } i = \arg\max_h \hat\tau_{hj}, \\ 0, & \text{otherwise.} \end{cases}$$
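The hard assignment rule amounts to a row-wise argmax followed by one-hot encoding. In this minimal sketch (hypothetical function name) rows index observations j and columns index components i, and ties go to the first maximising component.

```python
import numpy as np

def hard_assign(tau):
    """z_hat[j, i] = 1 if i = argmax_h tau[j, h], else 0."""
    tau = np.asarray(tau, dtype=float)
    z = np.zeros(tau.shape, dtype=int)
    z[np.arange(tau.shape[0]), tau.argmax(axis=1)] = 1
    return z

# Hypothetical posterior probabilities for three observations and g = 2.
tau = np.array([[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]])
z_hat = hard_assign(tau)
```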
Multivariate Mixture Models
• Day (Biometrika, 1969)
• Wolfe (NORMIX, 1965, 1967, 1970)
It was the publication of the seminal paper of Dempster, Laird, and Rubin (1977) on the EM algorithm that greatly stimulated interest in the use of finite mixture distributions to model heterogeneous data.
Ganesalingam and McLachlan (Biometrika,1978)
• Everitt and Hand (2001)
• Titterington, Smith, and Makov (1985)
• McLachlan and Basford (1988)
• Lindsay (1996)
• McLachlan and Peel (2000)
• Bohning (2000)
• Fruhwirth-Schnatter (2006)
Normal Mixtures
Suppose that the density of the random vector $Y_j$ has a g-component normal mixture form
$$f(y_j; \Psi) = \sum_{i=1}^{g} \pi_i \, \phi(y_j; \mu_i, \Sigma_i),$$
where $\Psi$ is the vector containing the unknown parameters.
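Since the EM algorithm is what made fitting such mixtures routine, here is a minimal EM sketch for a g-component normal mixture with unrestricted component-covariance matrices. The farthest-point initialisation and the small ridge added to each covariance for numerical stability are assumptions of this example, not necessarily what EMMIX does; all names are hypothetical.

```python
import numpy as np

def log_mvn(Y, mean, cov):
    """Log of the N(mean, cov) density at each row of Y, via a Cholesky factor."""
    p = Y.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (Y - mean).T)                 # L^{-1}(y_j - mu), shape (p, n)
    log_det = 2.0 * np.log(np.diag(L)).sum()
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + (z ** 2).sum(axis=0))

def em_normal_mixture(Y, g, n_iter=500, tol=1e-8, seed=0, ridge=1e-6):
    """Fit a g-component normal mixture with unrestricted covariances by EM."""
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation of the component means.
    idx = [int(rng.integers(n))]
    for _ in range(1, g):
        d2 = ((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    means = Y[idx].copy()
    pooled = np.atleast_2d(np.cov(Y, rowvar=False)) + ridge * np.eye(p)
    covs = np.repeat(pooled[None, :, :], g, axis=0)
    weights = np.full(g, 1.0 / g)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: tau[j, i] = pi_i phi(y_j; mu_i, Sigma_i) / f(y_j; Psi).
        log_num = np.column_stack(
            [np.log(weights[i]) + log_mvn(Y, means[i], covs[i]) for i in range(g)])
        row_max = log_num.max(axis=1, keepdims=True)
        log_f = row_max + np.log(np.exp(log_num - row_max).sum(axis=1, keepdims=True))
        tau = np.exp(log_num - log_f)
        loglik = float(log_f.sum())
        # M-step: closed-form updates of pi_i, mu_i and Sigma_i.
        n_i = tau.sum(axis=0)
        weights = n_i / n
        means = (tau.T @ Y) / n_i[:, None]
        for i in range(g):
            d = Y - means[i]
            covs[i] = (tau[:, i, None] * d).T @ d / n_i[i] + ridge * np.eye(p)
        if loglik - prev < tol:
            break
        prev = loglik
    return weights, means, covs, tau, loglik

# Hypothetical usage on simulated, well-separated 2-D data.
rng = np.random.default_rng(1)
Y = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
weights, means, covs, tau, loglik = em_normal_mixture(Y, g=2)
```

Each EM iteration does not decrease the log likelihood, which is why the loop can stop once the increase falls below `tol`.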
One attractive feature of adopting mixture models with elliptically symmetric components, such as the normal or t densities, is that the implied clustering is invariant under affine transformations of the data, i.e., under operations relating to changes in location, scale, and rotation of the data.
Thus the clustering process does not depend on irrelevant factors such as the units of measurement or the orientation of the clusters in space.
Microarray data represented as an N x M matrix:
• rows: Gene 1, Gene 2, …, Gene N (each row is a gene's expression profile)
• columns: Sample 1, Sample 2, …, Sample M (each column is a sample's expression signature)
N rows (genes) ~ 10^4
M columns (samples) ~ 10^2
Clustering of Microarray Data
Clustering of tissues on the basis of genes: this is a nonstandard problem in cluster analysis (n = M << p = N).
Clustering of genes on the basis of tissues: the genes (observations) are not independent, and there is structure on the tissues (variables) (n = N >> p = M).
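The component-covariance matrix $\Sigma_i$ is highly parameterized, with p(p+1)/2 parameters. More parsimonious forms include:
• $\Sigma_i = \sigma^2 I_p$ (equal spherical)
• $\Sigma_i = \sigma_i^2 I_p$ (unequal spherical)
• $\Sigma_i = D$ (equal diagonal)
• $\Sigma_i = D_i$ (unequal diagonal)
• $\Sigma_i = \Sigma$ (equal)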
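To make the savings concrete, this small sketch (hypothetical function name) counts the free covariance parameters across all g components for each restriction listed above, plus the unrestricted case as a baseline, e.g. for p = 17 time points and g = 4 components.

```python
def n_cov_params(structure, p, g):
    """Free parameters in the g component-covariance matrices Sigma_1, ..., Sigma_g."""
    full = p * (p + 1) // 2            # one unrestricted p x p covariance matrix
    counts = {
        "equal spherical": 1,          # Sigma_i = sigma^2 I_p
        "unequal spherical": g,        # Sigma_i = sigma_i^2 I_p
        "equal diagonal": p,           # Sigma_i = D
        "unequal diagonal": g * p,     # Sigma_i = D_i
        "equal": full,                 # Sigma_i = Sigma
        "unrestricted": g * full,      # each Sigma_i arbitrary
    }
    return counts[structure]

for name in ["equal spherical", "unequal spherical", "equal diagonal",
             "unequal diagonal", "equal", "unrestricted"]:
    print(f"{name:>18}: {n_cov_params(name, p=17, g=4)}")
```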
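Banfield and Raftery (1993) introduced a parameterization of the component-covariance matrix $\Sigma_i$ based on a variant of the standard spectral decomposition of $\Sigma_i$,
$$\Sigma_i = \lambda_i D_i A_i D_i^T \quad (i = 1, \ldots, g),$$
where the scalar $\lambda_i$ governs the volume, the orthogonal matrix $D_i$ of eigenvectors the orientation, and the diagonal matrix $A_i$ the shape of the ith component.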
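A minimal sketch of this decomposition via an eigendecomposition follows. The normalization $\det(A_i) = 1$, so that $\lambda_i = |\Sigma_i|^{1/p}$, is an assumption of this example (one common convention); the function name is hypothetical.

```python
import numpy as np

def volume_shape_orientation(sigma):
    """Decompose Sigma = lam * D @ A @ D.T, normalising A to have determinant 1."""
    eigvals, D = np.linalg.eigh(sigma)          # eigenvalues in ascending order
    eigvals, D = eigvals[::-1], D[:, ::-1]      # reorder: largest first
    lam = np.exp(np.mean(np.log(eigvals)))      # |Sigma|^(1/p): volume
    A = np.diag(eigvals / lam)                  # shape, with det(A) = 1
    return lam, D, A                            # D: orientation (orthogonal)

sigma = np.array([[4.0, 1.0], [1.0, 2.0]])
lam, D, A = volume_shape_orientation(sigma)
```

Constraining $\lambda_i$, $D_i$ or $A_i$ to be equal across components (or $A_i$ to be the identity, or $D_i$ to be a permutation matrix) yields the family of parsimonious models.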
However, if p is large relative to the sample size n, it may not be possible to use this decomposition to infer an appropriate model for the component-covariance matrices.
Even if it is possible, the results may not be reliable due to potential problems with near-singular estimates of the component-covariance matrices when p is large relative to n.
Hence, in fitting normal mixture models to high-dimensional data, we should first consider
• some form of dimension reduction and/or
• some form of regularization
Mixture Software: EMMIX
McLachlan, Peel, Adams, and Basford
http://www.maths.uq.edu.au/~gjm/emmix/emmix.html
EMMIX for UNIX
PROVIDES A MODEL-BASED APPROACH TO CLUSTERING
McLachlan, Bean, and Peel (2002). A mixture model-based approach to the clustering of microarray expression data. Bioinformatics 18, 413-422.
http://www.bioinformatics.oupjournals.org/cgi/screenpdf/18/3/413.pdf
In applying the normal mixture model to cluster multivariate (continuous) data, it is assumed, as in most typical cluster analyses using any other method, that
(a) there are no replications on any particular entity specifically identified as such;
(b) all the observations on the entities are independent of one another.
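For example,
$$\mu_i = (\mu_{1i}^T, \mu_{2i}^T)^T = X\beta_i, \quad \text{where} \quad X = \begin{pmatrix} 1_p & O \\ O & 1_p \end{pmatrix},$$
so that $\mu_{hi} = \beta_{hi} 1_p$ (h, i = 1, 2); and
$$\Sigma_i = \begin{pmatrix} \Sigma_{1i} & O \\ O & \Sigma_{2i} \end{pmatrix} \quad (i = 1, 2), \quad \text{where} \quad \Sigma_{hi} = \sigma_{hi}^2 I_p \quad (h, i = 1, 2).$$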
Clustering of gene expression profiles
• Longitudinal data (with or without replication, for example time-course)
• Cross-sectional data
Ng, McLachlan, Wang, Ben-Tovim Jones, and Ng (2006, Bioinformatics)
Supplementary information:
http://www.maths.uq.edu.au/~gjm/bioinf0602_supp.pdf
EMMIX-WIRE: EM-based MIXture analysis With Random Effects
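In the ith component of the mixture, the profile vector $y_j$ for the jth gene follows the model
$$y_j = X\beta_i + U b_{ij} + V c_i + \varepsilon_{ij} \quad (j = 1, \ldots, n),$$
where $y_j$ and $\varepsilon_{ij}$ are p x 1, $\beta_i$ is m x 1, $b_{ij}$ is $q_b$ x 1, and $c_i$ is $q_c$ x 1, with
$$b_{ij} \sim N(0, H_i), \qquad c_i \sim N(0, \theta_{ci} I_{q_c}), \qquad \varepsilon_{ij} \sim N(0, A_i),$$
$$A_i = \operatorname{diag}(W\phi_i), \qquad \phi_i = (\sigma_{i1}^2, \ldots, \sigma_{iq_e}^2)^T.$$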
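$$\tau_{ij}(y_j; c) = \operatorname{pr}\{Z_{ij} = 1 \mid y_j, c\} = \frac{\pi_i f(y_j \mid z_{ij} = 1, c_i; \Psi)}{\sum_{h=1}^{g} \pi_h f(y_j \mid z_{hj} = 1, c_h; \Psi)},$$
with
$$f(y_j \mid z_{ij} = 1, c_i; \Psi) = N(X\beta_i + V c_i, B_i), \qquad B_i = A_i + U H_i U^T.$$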
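These conditional posterior probabilities can be sketched directly: each component contributes a normal density with mean $X\beta_i + Vc_i$ and covariance $B_i = A_i + UH_iU^T$ (the covariance of $y_j$ given $c_i$ once $b_{ij}$ is integrated out). This is a minimal illustration assuming all parameters and the $c_i$ are known; it is not the EMMIX-WIRE implementation, and the names and toy values are hypothetical.

```python
import numpy as np

def log_mvn(y, mean, cov):
    """Log of the N(mean, cov) density at a single p-vector y."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, y - mean)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + 2.0 * np.log(np.diag(L)).sum() + z @ z)

def mlmm_posteriors(y, weights, X, U, V, betas, cs, Hs, As):
    """tau_i(y; c) for a mixture of linear mixed models, given each c_i."""
    log_num = []
    for pi_i, beta_i, c_i, H_i, A_i in zip(weights, betas, cs, Hs, As):
        mean = X @ beta_i + V @ c_i           # X beta_i + V c_i
        B_i = A_i + U @ H_i @ U.T             # covariance of y given c_i, b_ij integrated out
        log_num.append(np.log(pi_i) + log_mvn(y, mean, B_i))
    log_num = np.array(log_num)
    w = np.exp(log_num - log_num.max())       # stabilised before normalising
    return w / w.sum()

# Hypothetical toy set-up: p = 4, one fixed effect (an intercept), random gene intercept.
p = 4
X = np.ones((p, 1)); U = np.ones((p, 1)); V = np.eye(p)
tau = mlmm_posteriors(
    y=np.full(p, 5.0), weights=[0.5, 0.5], X=X, U=U, V=V,
    betas=[np.array([0.0]), np.array([5.0])], cs=[np.zeros(p), np.zeros(p)],
    Hs=[np.array([[0.5]])] * 2, As=[np.eye(p)] * 2)
```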
• Celeux et al. (2005). Mixtures of linear mixed models for clustering gene expression profiles from repeated microarray measurements. Statistical Modelling 5, 243-267.
• Qin and Self (2006). The clustering of regression models method with applications in gene expression data. Biometrics 62, 526-533.
• Booth et al. (2008). Clustering using objective functions and stochastic search. J R Statist Soc B 70, 119-139.
Yeast cell cycle data of Cho et al. (1998)
n = 237 genes at p = 17 time points, categorized into 4 MIPS (Munich Information Centre for Protein Sequences) functional groups.
The yeast system is useful because of our ability to control and monitor the progression of cells through the cell cycle (temperature-based synchronization with temperature-sensitive genes whose product is essential for cell-cycle progression).
High-density oligonucleotide arrays were used to quantitate mRNA transcript levels in synchronized yeast cells at regular intervals (10 min) during the cell cycle (genes with cell-cycle-dependent periodicity).
Samples of yeast cultures were taken at 17 time points after their cell cycle phase had been synchronized.
The data were reduced to a short time series of log expression ratios for each of the yeast genes represented on the microarrays (expression ratios were calculated by dividing each intensity measurement by the average for that gene).
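The normalization described in parentheses can be sketched as below, assuming a genes-by-time-points matrix of positive intensities and base-2 logarithms (the base is not stated in the source, so it is left as a parameter). The function name is hypothetical.

```python
import numpy as np

def log_expression_ratios(intensity, base=2.0):
    """Divide each gene's intensities by that gene's average, then take logarithms.

    intensity: array of shape (n_genes, n_time_points) with positive values.
    """
    intensity = np.asarray(intensity, dtype=float)
    ratios = intensity / intensity.mean(axis=1, keepdims=True)
    return np.log(ratios) / np.log(base)

profiles = log_expression_ratios([[2.0, 4.0, 6.0], [5.0, 5.0, 5.0]])
```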
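Example. Clustering of yeast cell cycle time-course data
n = 237 genes, p = 17 time points
$$y_j = X\beta_i + U b_{ij} + V c_i + \varepsilon_{ij},$$
where
$$X\beta_i = \begin{pmatrix} \cos(2\pi t_1/T) & \sin(2\pi t_1/T) \\ \vdots & \vdots \\ \cos(2\pi t_{17}/T) & \sin(2\pi t_{17}/T) \end{pmatrix} \begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix}, \qquad U b_{ij} = 1_{17} b_{ij}, \qquad V c_i = I_{17} \begin{pmatrix} c_{i1} \\ \vdots \\ c_{i17} \end{pmatrix},$$
$$\operatorname{cov}(\varepsilon_{ij}) = \operatorname{diag}(W\phi_i) = \operatorname{diag}(1_{17}\sigma_i^2).$$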
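To see how the pieces of this model fit together, the sketch below builds the 17 x 2 harmonic design matrix for $t_k = 0, 10, \ldots, 160$ and T = 73 min, then simulates a few profiles from one cluster. The parameter values and function names are hypothetical, chosen only for illustration.

```python
import numpy as np

def harmonic_design(times, period):
    """p x 2 matrix with rows (cos(2 pi t_k/T), sin(2 pi t_k/T))."""
    w = 2.0 * np.pi * np.asarray(times, dtype=float) / period
    return np.column_stack([np.cos(w), np.sin(w)])

def simulate_cluster(X, beta_i, theta_ci, sd_b, sigma_i, n_genes, rng):
    """Draw n_genes profiles y_j = X beta_i + 1 b_ij + I c_i + eps_ij from cluster i."""
    p = X.shape[0]
    c_i = rng.normal(0.0, np.sqrt(theta_ci), size=p)        # shared by all genes in cluster i
    b = rng.normal(0.0, sd_b, size=(n_genes, 1))             # one random intercept per gene
    eps = rng.normal(0.0, sigma_i, size=(n_genes, p))        # cov(eps_ij) = sigma_i^2 I_p
    return X @ beta_i + b * np.ones(p) + c_i + eps

times = np.arange(0, 161, 10)                  # t_k = 0, 10, ..., 160 min (17 points)
X = harmonic_design(times, period=73.0)        # T = 73 min as estimated in the text
rng = np.random.default_rng(0)
# Hypothetical parameter values, for illustration only.
Y = simulate_cluster(X, beta_i=np.array([1.0, -0.5]), theta_ci=0.2,
                     sd_b=0.3, sigma_i=0.4, n_genes=5, rng=rng)
```

Because $c_i$ is drawn once per cluster, every gene in the cluster shares the same time-point-specific deviations, which is what induces correlation between genes within a cluster.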
In the ith cluster,
$$y_{jk} = \beta_{1i}\cos(2\pi t_k/T) + \beta_{2i}\sin(2\pi t_k/T) + b_{ij} + c_{ik} + e_{ijk}.$$
$$y_{jk} = \beta_0 + \beta_1\cos(2\pi t_k/T) + \beta_2\sin(2\pi t_k/T) + e_{jk}, \qquad e_{jk} \sim N(0, \sigma^2),$$
where $t_k$ = 0, 10, 20, …, 160 and T is the period, estimated to be 73 min following Booth et al. (2004).
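For a fixed period T, this single-gene model is linear in $(\beta_0, \beta_1, \beta_2)$, so it can be fitted by ordinary least squares. A minimal sketch with a hypothetical noise-free profile (the function name is also hypothetical):

```python
import numpy as np

def fit_harmonic(y, times, period):
    """OLS fit of y_k = b0 + b1 cos(2 pi t_k/T) + b2 sin(2 pi t_k/T) + e_k for one gene."""
    y = np.asarray(y, dtype=float)
    w = 2.0 * np.pi * np.asarray(times, dtype=float) / period
    Z = np.column_stack([np.ones_like(w), np.cos(w), np.sin(w)])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = float(resid @ resid) / (len(y) - Z.shape[1])   # unbiased estimate of sigma^2
    return coef, sigma2

times = np.arange(0, 161, 10)                  # 0, 10, ..., 160 min
w = 2.0 * np.pi * times / 73.0
y = 0.5 + 1.0 * np.cos(w) - 2.0 * np.sin(w)    # hypothetical noise-free profile
coef, sigma2 = fit_harmonic(y, times, period=73.0)
```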
The Number of Components
Table 1: Values of BIC for various numbers of components g

| g   | 2     | 3     | 4     | 5     | 6     | 7     |
|-----|-------|-------|-------|-------|-------|-------|
| BIC | 10883 | 10848 | 10837 | 10865 | 10890 | 10918 |
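The source does not state the BIC formula. Assuming the convention $\mathrm{BIC} = -2\log L + d\log n$, under which smaller values are preferred (consistent with the minimum at g = 4, the value used for Table 2), the choice of g from Table 1 can be sketched as:

```python
import math

def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + d log n; under this convention smaller values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# BIC values from Table 1, keyed by the number of components g.
table_1 = {2: 10883, 3: 10848, 4: 10837, 5: 10865, 6: 10890, 7: 10918}
best_g = min(table_1, key=table_1.get)
```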
Cluster-specific random effects term
$$c_i \sim N(0, \theta_{ci} I_{17}), \qquad \hat\theta_c = (0.23, 0.28, 0.04, 0.14)^T.$$
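Table 2: Summary of clustering results for g = 4 clusters

| Model | Rand Index | Adjusted Rand Index | Error Rate |
|-------|------------|---------------------|------------|
| 1     | 0.7808     | 0.5455              | 0.2910     |
| 2     | 0.7152     | 0.4442              | 0.3160     |
| 3     | 0.7133     | 0.3792              | 0.4093     |
| Wong  | 0.7087     | 0.3697              | NA         |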
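The agreement measures in Table 2 compare a clustering with the MIPS functional groups. Below is a minimal sketch of the Rand index and the Hubert-Arabie adjusted Rand index, computed from pair counts. The error-rate definition is not given in the source; it is assumed here to be the smallest misallocation proportion over one-to-one matchings of clusters to groups. Function names and the toy labels are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import comb

def rand_indices(labels_a, labels_b):
    """Rand index and Hubert-Arabie adjusted Rand index between two partitions."""
    total = comb(len(labels_a), 2)
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    rand = (total + 2 * both - same_a - same_b) / total   # agreeing pairs / all pairs
    expected = same_a * same_b / total
    max_index = 0.5 * (same_a + same_b)
    ari = (both - expected) / (max_index - expected)
    return rand, ari

def error_rate(true_labels, cluster_labels):
    """Smallest misallocation rate over one-to-one matchings of clusters to classes.

    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = len(true_labels)
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        errors = sum(mapping[c] != t for c, t in zip(cluster_labels, true_labels))
        best = min(best, errors)
    return best / len(true_labels)

truth = [0, 0, 0, 1, 1, 1]   # hypothetical functional groups
found = [0, 0, 1, 1, 1, 1]   # hypothetical cluster labels
rand, ari = rand_indices(truth, found)
err = error_rate(truth, found)
```

With g = 4 there are only 24 matchings, so exhaustive search over permutations is cheap.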
• The use of the cluster-specific random effects terms ci leads to a clustering that corresponds more closely to the underlying functional groups than without their use.
Figure 1: Clusters of gene-profiles obtained by mixture of linear mixed models with cluster-specific random effects
[Plot panels: cluster 1 (26 genes), cluster 2 (137 genes), cluster 3 (50 genes), cluster 4 (24 genes); horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 2: Clusters of gene-profiles obtained by mixture of linear mixed models without cluster-specific random effects
[Plot panels: cluster 1 (15 genes), cluster 2 (175 genes), cluster 3 (24 genes), cluster 4 (23 genes); horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 3: Clusters of gene-profiles obtained by mixtures of linear mixed models with and without cluster-specific random effects
[Plot panels: cluster 1 (15 genes), cluster 2 (175 genes), cluster 3 (24 genes), cluster 4 (23 genes), as in Figure 2; cluster 1 (26 genes), cluster 2 (137 genes), cluster 3 (50 genes), cluster 4 (24 genes), as in Figure 1; horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 4: Plots of gene profiles grouped according to their functional grouping
[Plot panels: cluster 1 (49 genes), cluster 2 (139 genes), cluster 3 (31 genes), cluster 4 (18 genes); horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 5: Plots of clustered gene profiles versus functional grouping
[Plot panels: cluster 1 (49 genes), cluster 2 (139 genes), cluster 3 (31 genes), cluster 4 (18 genes), as in Figure 4; cluster 1 (26 genes), cluster 2 (137 genes), cluster 3 (50 genes), cluster 4 (24 genes), as in Figure 1; horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 6: Clusters of gene-profiles obtained by k-means
[Plot panels: cluster 1 (37 genes), cluster 2 (46 genes), cluster 3 (116 genes), cluster 4 (38 genes); horizontal axis 0 to 150, vertical axis -3 to 3.]
Figure 7: Plots of clusters of gene-profiles: model-based clustering versus k-means
[Plot panels: cluster 1 (37 genes), cluster 2 (46 genes), cluster 3 (116 genes), cluster 4 (38 genes), as in Figure 6; cluster 1 (26 genes), cluster 2 (137 genes), cluster 3 (50 genes), cluster 4 (24 genes), as in Figure 1; horizontal axis 0 to 150, vertical axis -3 to 3.]
Another Yeast Cell Cycle Dataset
Spellman (1998) used α-factor (pheromone) synchronization, with the yeast cells sampled at 7-minute intervals for 119 minutes; the period of the cell cycle was estimated by least squares to be T = 53 min.
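The source does not detail the least-squares procedure. One straightforward version, assumed here, is a grid search over T that minimizes the residual sum of squares of the harmonic regression summed over all genes. The sketch checks it on simulated data sampled as in Spellman (1998); the function names, grid, and simulated values are hypothetical.

```python
import numpy as np

def total_rss(Y, times, period):
    """RSS of the harmonic regression at period T, summed over genes (rows of Y)."""
    w = 2.0 * np.pi * np.asarray(times, dtype=float) / period
    Z = np.column_stack([np.ones_like(w), np.cos(w), np.sin(w)])
    coef, *_ = np.linalg.lstsq(Z, Y.T, rcond=None)        # one coefficient column per gene
    resid = Y.T - Z @ coef
    return float((resid ** 2).sum())

def estimate_period(Y, times, grid):
    """Least-squares period estimate: the grid value of T minimising the total RSS."""
    rss = [total_rss(Y, times, T) for T in grid]
    return float(grid[int(np.argmin(rss))])

# Simulated check: sampling every 7 min for 119 min gives 18 time points.
times = np.arange(0, 120, 7)
rng = np.random.default_rng(0)
phases = rng.uniform(0.0, 2.0 * np.pi, size=(50, 1))
Y = np.cos(2.0 * np.pi * times / 53.0 + phases) + rng.normal(0.0, 0.1, (50, times.size))
T_hat = estimate_period(Y, times, grid=np.arange(30.0, 90.5, 0.5))
```

The grid is kept well above the 14-minute Nyquist limit of 7-minute sampling so that aliased periods cannot masquerade as the cell-cycle period.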
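Example. Clustering of time-course data
n = 612 genes, p = 18 time points
$$y_j = X\beta_i + U b_{ij} + V c_i + \varepsilon_{ij},$$
where
$$X\beta_i = \begin{pmatrix} \cos(2\pi t_1/T) & \sin(2\pi t_1/T) \\ \vdots & \vdots \\ \cos(2\pi t_{18}/T) & \sin(2\pi t_{18}/T) \end{pmatrix} \begin{pmatrix} \beta_{1i} \\ \beta_{2i} \end{pmatrix}, \qquad U b_{ij} = 1_{18} b_{ij}, \qquad V c_i = I_{18} \begin{pmatrix} c_{i1} \\ \vdots \\ c_{i18} \end{pmatrix},$$
$$\operatorname{cov}(\varepsilon_{ij}) = \operatorname{diag}(W\phi_i) = \operatorname{diag}(1_{18}\sigma_i^2).$$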
Clustering Results for Spellman Yeast Cell Cycle Data
Mixtures of linear mixed models are useful in modelling biological processes that exhibit periodicity at different temporal scales (not restricted to cell cycle data; e.g. changes in core body temperature, heart rate, and blood pressure).
In summary, they provide a flexible tool for clustering high-dimensional data (which may be correlated and structured) for a wide range of experimental designs, e.g.
- longitudinal data (with or without replication)
- cross-sectional data (multiple samples at one time point).
They provide an integrated framework for the analysis of microarray data by incorporating experimental design information and (biological or clinical) covariates.