Analysis of Globally-Coherent Data Sets
a two-day course for GlaxoSmithKline
Yinyin Yuan and Mauro Castro ∗
http://www.markowetzlab.org/GCDcourse/
Stevenage, 28 - 29th Oct 2010
Abstract
This tutorial refers to the practical session on day one of the course Analysis of Globally-Coherent Data Sets. The topic is Molecular Data Integration with R.
Part I
1 Introduction
Globally-coherent datasets (GCDs) contain at least three levels of information: (i) genome-wide DNA variation, (ii) an intermediate trait, and (iii) a (clinical) phenotype. Intermediate traits are typically gene expression, but may also include proteomic, metabolomic, and other molecular data. These data sets make it possible to dissect how a genomic perturbation (e.g. a somatic copy-number alteration) leads to changes in cellular networks and pathways, which then shape the phenotype (e.g. how aggressive a type of cancer is). Examples of GCDs are The Cancer Genome Atlas, the International Cancer Genome Consortium, the METABRIC project at the CRI in Cambridge, and the data collected by SAGE Bionetworks.
The challenge of GCDs is to gain a global understanding of how the different layers of information are connected. While effective statistical methods can provide a system-level view of the genomic landscape, network visualization methods are key to visualizing complex data sets. Together these tools can help to ‘boil down’ the complex multi-layered GCDs into testable hypotheses for in-depth follow-up studies.
Here we exemplify this type of data analysis with a breast cancer dataset comprising CNA, mRNA and patient information [1].
2 Main setup
Load the packages ‘Dance’ and ‘lol’ and the datasets that will be used in Part I of this tutorial. To do this, first set the working directory to the folder called ‘chin07’ (e.g. setwd('./chin07/')). Please check that this folder contains two other folders called ‘dataDir’ and ‘resDir’: the first contains all the data you will need and the second is just for results. Also, the root of this directory must have the folder called ‘Package’, which contains the main tools to run this tutorial. The complete version information about R, including loaded packages and attachments, is given on the last page. Then just follow the commented workflow below.
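The folder layout described above can be checked before starting. The following is a minimal sketch, assuming the working directory is the ‘chin07’ folder and that ‘Package’ sits one level above it (which is what the '../Package/' paths in the install.packages() calls suggest). The helper name check_course_layout is hypothetical and not part of the course material.

```r
# Hypothetical helper (not part of the course packages): report which of the
# expected folders exist, relative to the 'chin07' working directory.
check_course_layout <- function(root = ".") {
  expected <- c(dataDir = file.path(root, "dataDir"),
                resDir  = file.path(root, "resDir"),
                Package = file.path(root, "..", "Package"))  # assumed location
  found <- vapply(expected, dir.exists, logical(1))
  if (!all(found)) {
    warning("Missing folder(s): ", paste(names(found)[!found], collapse = ", "))
  }
  found
}

# Example on a temporary copy of the layout
root <- file.path(tempdir(), "course", "chin07")
dir.create(file.path(root, "dataDir"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "resDir"), showWarnings = FALSE)
dir.create(file.path(dirname(root), "Package"), showWarnings = FALSE)
print(check_course_layout(root))
```

In the course setting one would simply call check_course_layout() after setwd('./chin07/').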
Data input
Install the main packages and set the data folders.
> install.packages('../Package/lol_0.5.tar.gz', repos=NULL, type='source')
> install.packages('../Package/HTSanalyzeR_1.2.5.tar.gz', repos=NULL, type='source')
> install.packages('../Package/IGIR_0.1.zip', repos=NULL, type='source')
> install.packages('../Package/Dance_0.9.tar.gz', repos=NULL, type='source')
∗Cambridge Research Institute - CRUK, Li Ka Shing Centre, Robinson Way Cambridge, CB2 0RE, UK.
> library(Dance)
> library(lol)
> dataDir resDir ER er names(er) ge.data ge.h idx ge.data ge.h cn.data cn.h idx cn.data cn.h rownames(cn.data) commonCol commonCol cn.data ge.data er ge.data ge.h cn.data cn.h
Now run the EM algorithm to call discrete copy-number states:
# minls threshold nSig
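To make the idea of discrete copy-number calls concrete, here is a deliberately simplified stand-in. It is not the EM-based procedure run in this tutorial: it just applies fixed thresholds to log2 ratios, and the cut-offs (-0.2 and 0.2), the function name call_copy_number and the toy matrix are all illustrative assumptions.

```r
# Simplified stand-in for copy-number calling: fixed thresholds on log2 ratios.
# This is NOT the EM-based procedure used in the tutorial; the cut-offs of
# -0.2 and 0.2 are illustrative assumptions.
call_copy_number <- function(log2ratio, loss = -0.2, gain = 0.2) {
  calls <- matrix(0L, nrow = nrow(log2ratio), ncol = ncol(log2ratio),
                  dimnames = dimnames(log2ratio))
  calls[log2ratio <= loss] <- -1L   # loss
  calls[log2ratio >= gain] <- 1L    # gain
  calls                             # 0 = no change
}

# Toy segmented data: regions in rows, samples in columns
seg <- matrix(c(-0.8, 0.05, 0.6,
                 0.1, -0.3, 0.0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("regionA", "regionB"), c("S1", "S2", "S3")))
calls <- call_copy_number(seg)
print(calls)
```

A model-based caller replaces the fixed cut-offs with class probabilities estimated from the data, which is why its parameters need tuning to the array design.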
First set the color definition (default colors are not pretty!).
> load('./dataDir/pam50colors.rdata')
> load('./dataDir/iclustercolors.rdata')
> pam50clusters pam50Subtype.color color.code color.code data fileName breaks pdf(paste('./resDir/', fileName,'.pdf', sep=''))
> heatmap(as.matrix(data),
+ distfun=function(x) dist(x,method='manhattan'),
+ hclustfun=function(x) hclust(x,method='ward'),
+ col=color.code,breaks=breaks,
+ labRow=GE.cyto,scale='none',ColSideColors=pam50Subtype.color)
> dev.off()
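The heatmap() call above combines a Manhattan distance with Ward linkage. A self-contained sketch of the same combination on simulated data is given here; the toy matrix, group sizes and colours are assumptions for illustration only. Note that current R versions name this linkage 'ward.D' (the method was called 'ward' before R 3.1.0).

```r
# Toy data: 20 features x 12 samples, two sample groups with shifted means
set.seed(1)
toy <- matrix(rnorm(20 * 12), nrow = 20,
              dimnames = list(paste0("f", 1:20), paste0("S", 1:12)))
toy[, 7:12] <- toy[, 7:12] + 3

# Same choices as in the heatmap() call: Manhattan distance, Ward linkage.
# heatmap() clusters columns by applying distfun to t(x), so we do the same.
d  <- dist(t(toy), method = "manhattan")
hc <- hclust(d, method = "ward.D")   # named 'ward' in R versions before 3.1.0
groups <- cutree(hc, k = 2)
print(table(groups, truth = rep(c("A", "B"), each = 6)))

# The corresponding heat map, written to a temporary PDF
pdf(file.path(tempdir(), "toy_heatmap.pdf"))
heatmap(toy,
        distfun = function(x) dist(x, method = "manhattan"),
        hclustfun = function(x) hclust(x, method = "ward.D"),
        scale = "none",
        ColSideColors = c("grey30", "orange")[groups])
dev.off()
```

Cutting the column dendrogram with cutree() gives explicit cluster labels, which is convenient when the clustering is later compared with other methods.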
Figure 1: Result of the hierarchical cluster analysis (GE data).
Another feature set (CN data).
> data fileName breaks pdf(paste('./resDir/', fileName,'.pdf', sep=''))
> heatmap(as.matrix(data),
+ distfun=function(x) dist(x,method='manhattan'),
+ hclustfun=function(x) hclust(x,method='ward'),
+ col=color.code,breaks=breaks,
+ labRow=CN.cyto,scale='none',ColSideColors=pam50Subtype.color)
> dev.off()
Figure 2: Result of the hierarchical cluster analysis (CN data).
3.2 Dirichlet process mixture model clustering
We first load the package BHC. This library performs Bayesian hierarchical clustering on discretised CN and GE data. It performs bottom-up hierarchical clustering, using a Dirichlet process to model uncertainty in the data and Bayesian model selection to decide at each step which clusters to merge.
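The merge decision can be illustrated with a much reduced model. The sketch below is not the BHC package's implementation: it scores a single discrete feature with a multinomial likelihood and a symmetric Dirichlet prior, and it leaves out the Dirichlet process prior over partitions that the full method uses. The function names and the prior parameter alpha = 1 are assumptions.

```r
# Log marginal likelihood of one discrete feature under a multinomial model
# with a symmetric Dirichlet(alpha) prior. 'x' holds levels coded 1..n_levels.
log_marginal <- function(x, n_levels, alpha = 1) {
  counts <- tabulate(x, nbins = n_levels)
  lgamma(n_levels * alpha) - lgamma(length(x) + n_levels * alpha) +
    sum(lgamma(counts + alpha)) - n_levels * lgamma(alpha)
}

# Log Bayes factor for "merge" (one shared distribution) against
# "keep apart" (two independent distributions). Positive values favour merging.
log_bf_merge <- function(x1, x2, n_levels, alpha = 1) {
  log_marginal(c(x1, x2), n_levels, alpha) -
    (log_marginal(x1, n_levels, alpha) + log_marginal(x2, n_levels, alpha))
}

# Two clusters with the same level everywhere, and two with opposite levels
similar   <- log_bf_merge(rep(1, 10), rep(1, 10), n_levels = 3)
different <- log_bf_merge(rep(1, 10), rep(3, 10), n_levels = 3)
print(c(similar = similar, different = different))
```

The first pair yields a positive log Bayes factor (merge), the second a negative one (keep apart). A bottom-up procedure repeats this comparison for candidate pairs and merges the best-supported pair at each step.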
Load BHC package.
> library(BHC)
> samples bhc.cn bhc.ge
Plot results.
> pdf(file='./resDir/BHCplotCN.pdf', width=30, height=8)
> plot(bhc.cn)
> dev.off()
> WriteOutClusterLabels(bhc.cn, "./resDir/BHClabelsCN.txt", verbose=TRUE)
> pdf(file='./resDir/BHCplotGE.pdf', width=30, height=8)
> plot(bhc.ge)
> dev.off()
> WriteOutClusterLabels(bhc.ge, "./resDir/BHClabelsGE.txt", verbose=TRUE)
Figure 3: Result of bhc() function. Panels: (a) bhc.cn, (b) bhc.ge.
3.3 Integrative clustering
We first load the package iCluster and then organize the data for the analysis [2]. The iCluster library implements a sparse clustering method which selects features from the CN and GE data for joint clustering of the samples. The idea is to generate integrated cluster assignments based on joint inference across multiple data types.
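As a point of comparison, a naive form of joint clustering can be written in a few lines of base R. This sketch is not iCluster: it simply scales the features, concatenates the two data types and runs k-means, with no latent variable model and no feature selection. The toy matrices are simulated, and storing each data type with samples in rows is our assumption about a convenient input layout, so check the iCluster documentation for its exact requirements.

```r
# Toy stand-ins for the two data types: features in rows, samples in columns
set.seed(2)
samples <- paste0("S", 1:10)
cn <- matrix(rnorm(5 * 10), nrow = 5,
             dimnames = list(paste0("cn", 1:5), samples))
ge <- matrix(rnorm(8 * 10), nrow = 8,
             dimnames = list(paste0("ge", 1:8), rev(samples)))
shifted <- paste0("S", 6:10)
cn[, shifted] <- cn[, shifted] + 4
ge[, shifted] <- ge[, shifted] + 4

# Align samples by name, then store each data type with samples in rows
common <- intersect(colnames(cn), colnames(ge))
datasets <- list(cn = t(cn[, common]), ge = t(ge[, common]))

# Naive baseline (NOT iCluster): scale features, concatenate, run k-means
joint <- do.call(cbind, lapply(datasets, scale))
fit <- kmeans(joint, centers = 2, nstart = 10)
print(split(names(fit$cluster), fit$cluster))
```

Matching samples by name before combining the matrices matters here because the two toy matrices list the samples in different orders.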
Load iCluster library and create an input object for iCluster.
> library(iCluster)
> datasets datasets[[1]] datasets[[2]] fit iclusters
> selected.cn.feature selected.ge.feature selected.feature selected.feature names(selected.feature) write.table(selected.feature, file='./resDir/selectedFeatureByIntClust.txt',
+ sep='\t', quote=FALSE, row.names=FALSE, col.names=FALSE)
Plot the integrative clustering result using the plotCE() function from the Dance package.
> dataCol dataColors CN.data GE.data rownames(CN.data) rownames(GE.data) plotCE(CN.data[selected.cn.feature, ], GE.data[selected.ge.feature, ], dataCol=dataCol,
+ grouping=iclusters, dataColors=dataColors, fileName='./resDir/IntClustOutcome.pdf')
Figure 4: Result of plotCE() function.
How do the outcomes from the different clustering algorithms differ?
> source("./dataDir/myfunctions.r")
> bhc.cn.clusters bhc.ge.clusters clustering1 clustering2 clustering3
> data rownames(data) colnames(data) pdf(file='./resDir/coOccuranceClustering.pdf')
> heatmap(as.matrix(data),
+ distfun=function(x) dist(x, method='manhattan'),
+ hclustfun=function(x) hclust(x, method='ward'),
+ col=c('white','darkblue','darkred','black'),breaks=c(-1, 0.5, 1.5, 2.5, 3.5),
+ scale='none', ColSideColors=pam50Subtype.color)
> dev.off()
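The matrix plotted above counts, for each pair of samples, how many of the three clusterings place them in the same cluster, which is why the colour breaks separate the values 0, 1, 2 and 3. A minimal sketch of such a count follows. The helper co_occurrence and the three toy clusterings are hypothetical and are not taken from myfunctions.r.

```r
# Count, for every pair of samples, in how many clusterings they share a cluster.
# 'clusterings' is a list of label vectors, all in the same sample order.
co_occurrence <- function(clusterings, sample_names = names(clusterings[[1]])) {
  n <- length(clusterings[[1]])
  stopifnot(all(vapply(clusterings, length, integer(1)) == n))
  same <- lapply(clusterings, function(cl) outer(cl, cl, "==") * 1L)
  m <- Reduce(`+`, same)
  dimnames(m) <- list(sample_names, sample_names)
  m
}

# Three hypothetical clusterings of five samples
c1 <- c(S1 = 1, S2 = 1, S3 = 2, S4 = 2, S5 = 2)
c2 <- c(S1 = 1, S2 = 1, S3 = 1, S4 = 2, S5 = 2)
c3 <- c(S1 = 2, S2 = 1, S3 = 1, S4 = 3, S5 = 3)
co <- co_occurrence(list(c1, c2, c3))
print(co)
```

Cluster labels only need to be consistent within one clustering; the count never compares labels across clusterings, so no label matching between methods is required.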
Figure 5: Result of co-occurrence clustering. Blue marks sample pairs that co-occur once, red twice and black three times.
Discussions
Only CN data from chromosomes 1 and 8 are chosen in the iCluster output. Why? And what happens when the penalty (lambda) is set lower?
Setting the parameters for making calls from segmented CN data can be tricky in CGHcall. The minimum segment to be fitted has to be adjusted according to the probe design of the array.
The co-occurrence matrix enables comparisons across many clustering outcomes, but how many clusters should there be in this dataset, and does the POD score tell the same story?
Homework: Is there an optimal number of clusters?
Compute scores of iCluster using different numbers of clusters k.
# score.pod
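The general pattern of this exercise (fit the model for several values of k, compute one score per fit, compare) can be sketched with base R alone. The score used here is a stand-in and not the iCluster POD score: it is the fraction of total variance explained by a k-means partition. The function name score_k and the simulated data are assumptions.

```r
# Stand-in score (NOT the iCluster POD score): fraction of total variance
# explained by a k-means partition, computed for each candidate k.
score_k <- function(x, ks, nstart = 20) {
  vapply(ks, function(k) {
    fit <- kmeans(x, centers = k, nstart = nstart)
    fit$betweenss / fit$totss
  }, numeric(1))
}

# Toy data with three well-separated groups of 15 samples each
set.seed(3)
x <- rbind(matrix(rnorm(15 * 4, mean = 0),  ncol = 4),
           matrix(rnorm(15 * 4, mean = 10), ncol = 4),
           matrix(rnorm(15 * 4, mean = 20), ncol = 4))
ks <- 2:6
scores <- score_k(x, ks)
names(scores) <- paste0("k=", ks)
print(round(scores, 3))
```

This particular score keeps increasing with k, so one looks for the point where the gain levels off rather than for a maximum. Whether a score should be minimised, maximised or read at an elbow depends on its definition, so check this for the POD score before interpreting the homework result.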
Figure 6: Result of plotiCluster() function. Panels: (a) k clusters, for K = 4, 5, 6 and 7; (b) iCluster scores (score.pod for k = 4:7).
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
attached base packages:
[1] splines grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] iCluster_1.2.0 corpcor_1.5.7 penalized_0.9-32
[4] survival_2.35-8 DLBCL_1.3.1 snow_0.3-3
[7] Dance_0.9 IGIR_0.1 igraph_0.5.4-2
[10] KEGG.db_2.4.5 RSQLite_0.9-2 DBI_0.2-5
[13] Matrix_0.999375-44 lol_0.5 CGHregions_1.8.0
[16] CGHbase_1.8.0 marray_1.27.0 limma_3.5.21
[19] qvalue_1.24.0 samr_1.28 impute_1.22.0
[22] biomaRt_2.5.1 HTSanalyzeR_1.2.5 RankProd_2.22.0
[25] cellHTS2_2.14.0 locfit_1.5-6 lattice_0.19-13
[28] akima_0.5-4 hwriter_1.2 vsn_3.17.2
[31] splots_1.16.0 genefilter_1.31.2 RColorBrewer_1.0-2
[34] BioNet_1.8.0 RBGL_1.25.1 GSEABase_1.12.0
[37] graph_1.28.0 annotate_1.27.3 AnnotationDbi_1.11.10
[40] Biobase_2.9.2 R.utils_1.5.3 R.oo_1.7.4
[43] R.methodsS3_1.2.1
loaded via a namespace (and not attached):
[1] affy_1.28.0 affyio_1.18.0 Category_2.16.0
[4] MASS_7.3-7 prada_1.26.0 preprocessCore_1.12.0
[7] RCurl_1.4-4.1 rrcov_1.1-00 stats4_2.12.0
[10] tcltk_2.12.0 tools_2.12.0 XML_3.2-0.1
[13] xtable_1.5-6
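To compare your own installation with the versions listed above, a short check can help. This is a sketch only: the helper report_versions is hypothetical, and the package names passed to it are just examples taken from the listing.

```r
# Hypothetical helper: installed version of each package, or NA if absent
report_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

print(R.version.string)
print(report_versions(c("stats", "survival", "Matrix", "iCluster")))
```

Calling sessionInfo() after loading the packages gives the full listing in the same format as shown above.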