![Page 1: Theoretical and experimental comparisons of gene expression indexes for oligonucleotide microarrays Department of Biostatistics, University pf North Carolina,](https://reader037.fdocuments.net/reader037/viewer/2022103022/56649d5d5503460f94a3b71f/html5/thumbnails/1.jpg)
Theoretical and experimental comparisons of gene expression indexes for oligonucleotide microarrays
Department of Biostatistics, University of North Carolina, Chapel Hill
Division of Human Cancer Genetics, Ohio State University
William J. Lemon, Jeffrey J.T. Palatini, Ralf Krahe, Fred A. Wright
Measuring gene expression with the Affymetrix GeneChip
[Figure: perfect match (PM) and mismatch (MM) probe pairs tiled along the coding portion of gene X, followed by the polyA tail]
•PM: 25 bases complementary to a region of the gene
•MM: identical to the PM probe except that the middle base is different
•cRNA made from the sample mRNA is hybridized to the chip
•the intensity of binding reflects gene expression
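A small sketch of the PM/MM pairing: the mismatch probe is built from a 25-base perfect-match probe by changing its middle (13th) base. Substituting the complementary base (A↔T, C↔G) is an assumption of this sketch, and the probe sequence is hypothetical.

```python
# Base substitution used at the middle position (assumption: the MM base is
# the Watson-Crick complement of the PM base there).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm: str) -> str:
    """Return the MM probe for a 25-base PM probe by changing its middle base."""
    if len(pm) != 25:
        raise ValueError("expected a 25-base probe")
    mid = len(pm) // 2  # index 12, i.e. the 13th base
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "ATGCGTACGTTAGCCATGGTACCAG"  # hypothetical 25-mer
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Applying the same substitution twice restores the original PM sequence.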
Reproducibility of Probe Sensitivities
Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.
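The Li-Wong Model
Li, C and Wong, WH, Proc. Natl. Acad. Sci. USA, 98:31-36, 2001.
Li-Wong Full (LWF):
$$PM_{ij} = \nu_j + \theta_i\alpha_j + \theta_i\phi_j + \varepsilon, \qquad MM_{ij} = \nu_j + \theta_i\alpha_j + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$$
Li-Wong Reduced (LWR):
$$y_{ij} = PM_{ij} - MM_{ij} = \theta_i\phi_j + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, 2\sigma^2)$$
Identifiability constraint: $\sum_j \phi_j^2 = J$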
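The reduced model is a rank-one fit, $y_{ij} \approx \theta_i\phi_j$. As a minimal sketch (an illustrative stand-in, not Li and Wong's own fitting procedure), alternating least squares fits it under the constraint $\sum_j \phi_j^2 = J$; with independent Gaussian errors, least squares coincides with maximum likelihood. The toy data are noise-free and the parameter values hypothetical, so the fit should recover them exactly.

```python
import math


def fit_lwr(y, n_iter=50):
    """Fit y[i][j] ~ theta[i] * phi[j] by alternating least squares, keeping
    sum_j phi[j]**2 == J (the identifiability constraint)."""
    n, J = len(y), len(y[0])
    phi = [1.0] * J  # equal starting sensitivities; sum(phi^2) == J already
    for _ in range(n_iter):
        # Given phi: least-squares theta_i = sum_j y_ij phi_j / sum_j phi_j^2 (= J)
        theta = [sum(y[i][j] * phi[j] for j in range(J)) / J for i in range(n)]
        # Given theta: least-squares phi_j = sum_i y_ij theta_i / sum_i theta_i^2
        ss = sum(t * t for t in theta)
        phi = [sum(y[i][j] * theta[i] for i in range(n)) / ss for j in range(J)]
        # Rescale so sum(phi^2) == J; theta absorbs the scale on the next update
        c = math.sqrt(J / sum(p * p for p in phi))
        phi = [p * c for p in phi]
    theta = [sum(y[i][j] * phi[j] for j in range(J)) / J for i in range(n)]
    return theta, phi


# Noise-free toy data from hypothetical parameters: 4 arrays, 4 probe pairs
theta_true = [1.0, 2.0, 3.0, 4.0]
raw = [0.5, 1.0, 1.5, 1.0]
c0 = math.sqrt(len(raw) / sum(p * p for p in raw))
phi_true = [p * c0 for p in raw]
y = [[t * p for p in phi_true] for t in theta_true]

theta_hat, phi_hat = fit_lwr(y)
print([round(t, 6) for t in theta_hat])
print([round(p, 6) for p in phi_hat])
```

With real data the fit is only approximate, and outlying probes or arrays would need separate handling.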
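where $i$ indexes the array, $j$ indexes the probe pair, $J$ is the total number of probe pairs, $\theta_i$ is the expression level and the $\phi_j$ are the probe sensitivities.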
How to compare gene expression indexes?
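•We obtain maximum likelihood estimates of $\theta_i$ using either the full data (LWF) or the reduced data (LWR)
•The Affymetrix software computes:
Average Difference (AD): $\hat\theta = \frac{1}{J}\sum_j y_j = \frac{1}{J}\sum_j (PM_j - MM_j)$
Log-Average (LA): $\frac{10}{J}\sum_j \log(PM_j/MM_j)$
•The log-average might perform particularly poorly. Note that if the $\nu_j$ terms are small and the error variance is small,
$$PM_j/MM_j \approx (\theta\alpha_j + \theta\phi_j)/(\theta\alpha_j) = (\alpha_j + \phi_j)/\alpha_j,$$
which does not depend on $\theta$, so LA carries little information about expression.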
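A quick numerical illustration of this point under the noise-free full model with $\nu_j = 0$: AD scales with $\theta$, while LA stays fixed. The probe parameters are made up, and the base-10 log with the factor 10 follows one reading of the LA formula; neither choice affects the conclusion.

```python
import math

# Hypothetical parameters for one probe set (J = 4), with nu_j = 0 and no noise
alpha = [0.4, 0.8, 0.3, 0.6]
phi = [1.2, 0.9, 1.1, 0.7]


def probe_intensities(theta):
    """Noise-free full-model PM and MM intensities with nu_j = 0."""
    pm = [theta * a + theta * f for a, f in zip(alpha, phi)]
    mm = [theta * a for a in alpha]
    return pm, mm


def average_difference(theta):
    """AD = (1/J) * sum_j (PM_j - MM_j)."""
    pm, mm = probe_intensities(theta)
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def log_average(theta):
    """LA = (10/J) * sum_j log10(PM_j / MM_j); base and factor are assumptions."""
    pm, mm = probe_intensities(theta)
    return 10 * sum(math.log10(p / m) for p, m in zip(pm, mm)) / len(pm)


for theta in (1.0, 10.0, 100.0):
    print(theta, round(average_difference(theta), 4), round(log_average(theta), 4))
```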
•We gain insight by assuming the Li-Wong model is true. What are the consequences?
•For large sample sizes (many arrays), the probe-specific parameters will be well estimated, so we can treat them as known
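Compare LW estimators directly:
$$RE(\text{full}, \text{reduced}) = \frac{\mathrm{var}(\hat\theta_{\text{reduced}})}{\mathrm{var}(\hat\theta_{\text{full}})} = \frac{2}{J}\sum_j\left[(\alpha_j + \phi_j)^2 + \alpha_j^2\right] \ge 2.0$$
(the bound uses $\sum_j \phi_j^2 = J$ and holds for nonnegative $\alpha_j$, $\phi_j$)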
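Comparing to AD is tricky, but with a correction factor AD is also an unbiased estimate of $\theta$:
$$\hat\theta_{AD} = \frac{J \cdot \mathrm{AD}}{\sum_j \phi_j}, \qquad RE(\text{reduced}, AD) = \frac{\mathrm{var}(\hat\theta_{AD})}{\mathrm{var}(\hat\theta_{\text{reduced}})} = \frac{1}{1 - \mathrm{var}(\phi)} \ge 1.0$$
where $\mathrm{var}(\phi) = \frac{1}{J}\sum_j (\phi_j - \bar\phi)^2$ is the variance of the probe sensitivities.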
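These closed-form results can be checked numerically. This sketch assumes the probe parameters $(\nu_j, \alpha_j, \phi_j)$ and $\sigma$ are known, as in the large-sample argument, uses made-up parameter values, and compares the formulas with Monte Carlo variance ratios. The full-model estimator here is the least-squares (Gaussian ML) fit to both $PM - \nu$ and $MM - \nu$.

```python
import random

random.seed(1)

# Hypothetical probe parameters, treated as known (J = 5)
alpha = [0.3, 0.5, 0.2, 0.8, 0.4]
raw_phi = [1.0, 0.6, 1.4, 0.9, 1.1]
J = len(raw_phi)
c = (J / sum(p * p for p in raw_phi)) ** 0.5
phi = [p * c for p in raw_phi]  # enforce sum(phi^2) = J
nu = [2.0] * J  # background terms
sigma, theta = 0.5, 3.0

# Closed-form relative efficiencies
re_full_reduced = 2 * sum((a + f) ** 2 + a ** 2 for a, f in zip(alpha, phi)) / J
phi_bar = sum(phi) / J
var_phi = sum((f - phi_bar) ** 2 for f in phi) / J
re_reduced_ad = 1 / (1 - var_phi)


def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def one_array():
    """Simulate one array; return (full, reduced, corrected-AD) estimates of theta."""
    pm = [n + theta * (a + f) + random.gauss(0, sigma) for n, a, f in zip(nu, alpha, phi)]
    mm = [n + theta * a + random.gauss(0, sigma) for n, a in zip(nu, alpha)]
    # Full model: least squares on (PM - nu, MM - nu) with known probe parameters
    num = sum((a + f) * (p - n) + a * (m - n)
              for p, m, n, a, f in zip(pm, mm, nu, alpha, phi))
    den = sum((a + f) ** 2 + a ** 2 for a, f in zip(alpha, phi))
    y = [p - m for p, m in zip(pm, mm)]
    reduced = sum(f * v for f, v in zip(phi, y)) / J  # uses sum(phi^2) = J
    ad_corrected = sum(y) / sum(phi)  # J * AD / sum(phi)
    return num / den, reduced, ad_corrected


draws = [one_array() for _ in range(20000)]
v_full, v_red, v_ad = (sample_var(col) for col in zip(*draws))
print("full vs reduced:", round(re_full_reduced, 3), round(v_red / v_full, 3))
print("reduced vs AD:", round(re_reduced_ad, 3), round(v_ad / v_red, 3))
```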
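•This also gives insight into “perfect match only” analyses:
$$RE(\text{full}, \text{PM-only}) = \frac{\mathrm{var}(\hat\theta_{PM})}{\mathrm{var}(\hat\theta_{\text{full}})} = 1 + \frac{\sum_j \alpha_j^2}{\sum_j (\alpha_j + \phi_j)^2}, \qquad \text{and } 1 \le RE \le 2$$
Furthermore, PM-only is always at least twice as efficient as LWR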
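A similar sketch for the PM-only comparison, again treating probe parameters as known and taking the PM-only estimator to be the least-squares fit of $PM_{ij} - \nu_j = \theta_i(\alpha_j + \phi_j) + \varepsilon$ (an assumption about what “PM-only” means here). Over many random nonnegative parameter sets it checks $1 \le RE(\text{full}, \text{PM-only}) \le 2$ and $RE(\text{PM-only}, \text{reduced}) \ge 2$; the two ratios multiply to $RE(\text{full}, \text{reduced})$.

```python
import random


def relative_efficiencies(alpha, phi):
    """Variance ratios with probe parameters known; assumes sum(phi^2) == J."""
    J = len(phi)
    info_full = sum((a + f) ** 2 + a ** 2 for a, f in zip(alpha, phi))  # sigma^2 / var(full)
    info_pm = sum((a + f) ** 2 for a, f in zip(alpha, phi))  # sigma^2 / var(PM-only)
    info_red = J / 2  # var(reduced) = 2 sigma^2 / J
    return {
        "full_vs_pm": info_full / info_pm,  # var(PM-only) / var(full)
        "pm_vs_reduced": info_pm / info_red,  # var(reduced) / var(PM-only)
        "full_vs_reduced": info_full / info_red,  # var(reduced) / var(full)
    }


random.seed(0)
results = []
for _ in range(1000):
    J = 16
    alpha = [random.uniform(0, 2) for _ in range(J)]
    raw = [random.uniform(0.1, 2) for _ in range(J)]
    c = (J / sum(p * p for p in raw)) ** 0.5
    results.append(relative_efficiencies(alpha, [p * c for p in raw]))

print(min(r["full_vs_pm"] for r in results), max(r["full_vs_pm"] for r in results))
print(min(r["pm_vs_reduced"] for r in results))
```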
Empirical Comparisons
•We propose that an expression index is “good” if it has a high correlation with the underlying true expression (which is usually unknown).
•this correlation can be estimated using a specially designed mixing experiment
•if $r$ is the correlation coefficient between the measured index and true expression, the “relative efficiency” of two indexes $\hat\theta_1$ and $\hat\theta_2$ can be estimated as
$$\frac{r_1^2/(1 - r_1^2)}{r_2^2/(1 - r_2^2)}$$
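Suppose the true underlying gene expression for a given gene is $\theta$. Consider two indices of gene expression
$$\hat\theta_1 = \beta_0 + \beta_1\theta + e_1,\quad e_1 \sim N(0, \sigma_1^2), \qquad \hat\theta_2 = \gamma_0 + \gamma_1\theta + e_2,\quad e_2 \sim N(0, \sigma_2^2).$$
Then $(\hat\theta_1 - \beta_0)/\beta_1$ is an unbiased estimate of $\theta$ with variance $\sigma_1^2/\beta_1^2$, and similarly $(\hat\theta_2 - \gamma_0)/\gamma_1$ has variance $\sigma_2^2/\gamma_1^2$. And we have
$$RE(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{var}\big((\hat\theta_2 - \gamma_0)/\gamma_1\big)}{\mathrm{var}\big((\hat\theta_1 - \beta_0)/\beta_1\big)} = \frac{\sigma_2^2/\gamma_1^2}{\sigma_1^2/\beta_1^2}$$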
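Can we estimate this relative efficiency?
•Suppose we could do a regression of $\hat\theta_1$ on $\theta$.
•the ratio of explained to residual variance in the model can be shown to be
$$\frac{\beta_1^2\,\mathrm{var}(\theta)}{\sigma_1^2} = \frac{r_1^2}{1 - r_1^2}$$
and similarly for $\hat\theta_2$, so
$$\frac{r_1^2/(1 - r_1^2)}{r_2^2/(1 - r_2^2)} = RE(\hat\theta_1, \hat\theta_2)$$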
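To see that the correlation-based ratio recovers the relative efficiency, this sketch simulates index 1 as $\beta_0 + \beta_1\theta + e_1$ (error SD $\sigma_1$) and index 2 as $\gamma_0 + \gamma_1\theta + e_2$ (error SD $\sigma_2$) with made-up values, then compares the true ratio $(\sigma_2^2/\gamma_1^2)/(\sigma_1^2/\beta_1^2)$ with the estimate built from sample correlations. Here the true expressions are known, which is exactly what a real experiment lacks.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def efficiency(r):
    """Explained-to-residual variance ratio, r^2 / (1 - r^2)."""
    return r * r / (1 - r * r)


random.seed(2)
b0, b1, s1 = 1.0, 2.0, 1.0  # index 1: beta_0, beta_1, sigma_1 (hypothetical)
g0, g1, s2 = -0.5, 0.5, 0.5  # index 2: gamma_0, gamma_1, sigma_2 (hypothetical)
theta = [random.gauss(0, 1) for _ in range(50000)]
idx1 = [b0 + b1 * t + random.gauss(0, s1) for t in theta]
idx2 = [g0 + g1 * t + random.gauss(0, s2) for t in theta]

re_true = (s2 ** 2 / g1 ** 2) / (s1 ** 2 / b1 ** 2)
re_est = efficiency(pearson(idx1, theta)) / efficiency(pearson(idx2, theta))
print(round(re_true, 3), round(re_est, 3))
```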
Can we estimate r without ever knowing the true expressions $\theta$?
•Yes, with a specially designed mixing experiment
•we seek two contrasting conditions in which many genes will be differentially expressed
Experimental Design
[Figure: flowchart of the mixing experiment]
•Cell culture: human fibroblasts (GM 08330), 5 passages in 20% FBS
•Serum starvation (0.1% FBS) and serum stimulation (0.1% to 20% FBS); flowchart time points: 48 h, 24 h
•Harvest total RNA
•Add bacterial control genes: Lys, Phe (Stimulated); Dap, Thr (Starved)
•Produce the 50:50 group by mixing Stimulated and Starved RNA (containing Dap, Thr, Lys, Phe)
•Produce duplicates each day for 3 days (6 replicates for each condition)
•Synthesize cDNA and cRNA; fragment
•Add hybridization control genes: BioB, BioC, BioD, Cre
•Hybridize to HuGeneFL arrays
•Data reduction: gene expression indexes
BIN1 expression
[Figure: BIN1 expression (full model) for the Stim, 50:50 and Starved arrays; the true expression of the 50:50 group is the average of Stim and Starved]
BIN1 expression
[Figure: the same BIN1 full-model estimates, with the Stim, 50:50 and Starved groups coded 1, 2, 3]
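Note that
$$r_{\hat\theta,\theta} = r_{\hat\theta,X} \quad \text{or} \quad r_{\hat\theta,\theta} = -r_{\hat\theta,X},$$
where X = 1, 2, 3 (say) for Stim, 50:50 and Starved, respectively. Because the true expression of the 50:50 group is the average of Stim and Starved, $\theta$ is a linear function of X, so the correlation with X equals the correlation with $\theta$ up to sign.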
Mean probe intensity per array
[Figure: mean probe intensity for each array in the Stim, 50:50 and Starved groups]
Overall intensity is higher in the Stimulated arrays
Coefficients of variation for assay (individual probes) and gene expression indexes
[Figure: histograms of CV (0 to 2) in the Stim group, counted over probes for the Assay panel and over genes for the LWF and Affymetrix AD panels; values shown: Assay 0.121, LWF 0.149, Affymetrix AD 0.293]
Correlation matrix of the 18 arrays, shown as a colorized image for each expression index
[Figure: panels LWF, AD, LWR and LA; arrays ordered Stim, 50:50, Starved along both axes]
Comparing Models: Cluster Analysis
[Figure: cluster analysis of the 18 arrays (Stim 1 to 6, 50:50 1 to 6, Strv 1 to 6) for the Full Model, Reduced Model, Affymetrix Ave Diff and Affymetrix Log Ave indexes]
Relative Efficiency
[Figure: Median(r²/(1 - r²)) for LWF, LWR, AD and LA, in unscaled and scaled panels]
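The plotted summary can be computed as sketched below, assuming one correlation r per gene (for example between the index and the group code X) and taking the median of r²/(1 - r²) over genes for each index. The per-gene correlations in the example are made up; one natural way to compare two indexes is then the ratio of their medians.

```python
import statistics


def efficiency(r):
    """Explained-to-residual variance ratio r^2 / (1 - r^2); the sign of r is irrelevant."""
    return r * r / (1 - r * r)


def median_efficiency(per_gene_r):
    """Median of r^2 / (1 - r^2) over genes for one expression index."""
    return statistics.median([efficiency(r) for r in per_gene_r])


# Hypothetical per-gene correlations for two indexes
per_gene_r = {"index A": [0.6, 0.8, 0.9], "index B": [0.3, 0.5, 0.7]}
meds = {name: median_efficiency(rs) for name, rs in per_gene_r.items()}
print(meds, meds["index A"] / meds["index B"])
```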
Correlation of duplicate measurements of 149 genes
LWF median r=.74
LWR median r=.43
AD median r=.08
LA median r=.17
Number of unexpressed genes
•Only 0.2% of the LW estimates are negative
•the 50:50 group has the fewest negative estimates
•could this indicate very few unexpressed genes?
[Figure: groups Stim, 50:50, Starved]
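A conservative approach to estimating the number of unexpressed genes
•Let U denote the number of unexpressed genes
•genes are ranked according to expression index (rank 1 = lowest)
$$\hat U = 2 \times \mathrm{median}(\text{rank of the unexpressed genes among all genes})$$
•If the unexpressed genes occupy the bottom U ranks, their median rank is about U/2; any expressed genes ranked below them only push the estimate up, hence the approach is conservative
•This is useful if we can get a random sample of unexpressed genes
[Figure: distribution of the gene expression index, with the unexpressed population at the low end]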
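A minimal sketch of this estimator on a hypothetical toy array of 1000 genes whose 100 lowest index values are unexpressed. The 12 control values mirror the 12 spiked-out probe sets but are made up, and they are spread evenly over the unexpressed range (rather than drawn at random) so the result is deterministic.

```python
import bisect
import statistics


def estimate_unexpressed(all_values, control_values):
    """U_hat = 2 * median rank (1 = lowest index) of a sample of unexpressed
    genes, ranked among all genes on the array."""
    ranked = sorted(all_values)
    ranks = [bisect.bisect_left(ranked, v) + 1 for v in control_values]
    return 2 * statistics.median(ranks)


# Toy array: index values 1..1000; the lowest 100 genes are unexpressed
all_values = [float(v) for v in range(1, 1001)]
controls = [4.0 + 8 * k for k in range(12)]  # hypothetical spiked-out probe sets
u_hat = estimate_unexpressed(all_values, controls)
print(u_hat, 1 - u_hat / len(all_values))  # estimated U and fraction expressed
```

The fraction expressed follows as 1 - Û/N, where N is the number of genes on the array.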
•We use the spiked-out bacterial control genes as a sample of “unexpressed” genes
•the 4 genes are represented 3 times each (different portions of the mRNA), for a total of 12 probe sets
•Based on this reasoning, we estimate that more than 88% of the genes are expressed, even in the Starved samples
Rank of expression index variance across the 6 Stimulated arrays versus rank of index mean
[Figure: Rank(var) versus Rank(mean) for AD and LWF, with the Dap, Thr, Phe and Lys control probe sets marked, including those truly absent in the Stim group]
Very low estimated expression for truly absent genes when using LWF
Present/absent calls
•We use the statistic
$$z = \frac{\hat\theta}{SE(\hat\theta)}$$
to declare genes present/absent (absolute call)
•we find that the vast majority of genes on the array appear to be present
•for the spiked in/out genes, we find vastly improved present/absent calling using LW estimates
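A minimal sketch of the z statistic for one probe set on one array, assuming the sensitivities $\phi_j$ are known, so that $\hat\theta = \sum_j \phi_j y_j / \sum_j \phi_j^2$ and $SE(\hat\theta)$ comes from the residual variance of the reduced model. The cutoff of 3.0 and the probe values are hypothetical, not taken from the source.

```python
import math


def lw_z(y, phi):
    """z = theta_hat / SE(theta_hat), treating phi as known; y holds PM - MM differences."""
    J = len(y)
    sp2 = sum(f * f for f in phi)
    theta_hat = sum(f * v for f, v in zip(phi, y)) / sp2
    resid = [v - theta_hat * f for f, v in zip(phi, y)]
    sigma2_hat = sum(r * r for r in resid) / (J - 1)  # one parameter fitted
    return theta_hat / math.sqrt(sigma2_hat / sp2)


def absolute_call(z, z_crit=3.0):
    """Declare 'present' when z exceeds a cutoff (3.0 is a hypothetical choice)."""
    return "present" if z > z_crit else "absent"


phi = [1.0, 1.0, 1.0, 1.0]  # hypothetical sensitivities, sum(phi^2) = J
y_on = [5.1, 4.9, 5.2, 4.8]  # clearly expressed probe set
y_off = [0.3, -0.2, 0.1, -0.4]  # background-level probe set
for y in (y_on, y_off):
    z = lw_z(y, phi)
    print(round(z, 2), absolute_call(z))
```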
ROC curve - spiked in/out genes
[Figure: sensitivity (1 - false negative rate) versus false positive rate (1 - specificity) for LWF-Z, LWR-Z, Untrimmed AD, Untrimmed LA, LA, AD and the Absolute Call]
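An ROC curve of this kind can be traced by sweeping a cutoff over the scores (for example the z statistics) of the spiked-in genes, which are truly present, and the spiked-out genes, which are truly absent. The scores below are made up, and the trapezoidal AUC is a common one-number summary rather than something reported on the slide.

```python
def roc_points(scores_present, scores_absent):
    """(FPR, TPR) pairs from sweeping a cutoff over all observed scores;
    a gene is called present when its score is >= the cutoff."""
    cutoffs = sorted(set(scores_present) | set(scores_absent), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        tpr = sum(s >= c for s in scores_present) / len(scores_present)
        fpr = sum(s >= c for s in scores_absent) / len(scores_absent)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


present = [5.0, 4.0, 2.5]  # hypothetical scores of spiked-in genes
absent = [3.0, 1.0, 0.5]  # hypothetical scores of spiked-out genes
pts = roc_points(present, absent)
print(pts, round(auc(pts), 4))
```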
Variability in estimates
[Figure: log(variance) versus log(mean) for the Full Model and Reduced Model, by group: Stim, 50:50, Starved]
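A simple way to summarize a variance-mean relationship like this is the least-squares slope of log(variance) on log(mean); this summary is an illustration, not the authors' analysis. With made-up values whose variance is proportional to the squared mean (a constant CV), the slope is 2.

```python
import math


def loglog_slope(means, variances):
    """Least-squares slope of log(variance) regressed on log(mean)."""
    lx = [math.log(m) for m in means]
    ly = [math.log(v) for v in variances]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


means = [10.0, 50.0, 100.0, 500.0, 1000.0]
variances = [0.04 * m ** 2 for m in means]  # constant CV of 0.2 (hypothetical)
slope = loglog_slope(means, variances)
print(round(slope, 6))
```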
Conclusions
• Model-based estimators are superior to simple averaging
• The full model is superior to the reduced model
• this does not necessarily mean that the mismatch probes are a good idea, but if they are present we should use them
• we have demonstrated this using both analytic considerations and experimental data
• a carefully designed experiment can be used to address many issues
• Many more genes may be expressed than previously thought
Other issues/ future work
•Spiking genes might be used to calibrate and normalize arrays
•relationship between variance and mean of expression indexes may be useful in planning experiments
•our data may be useful for future work, especially in producing indexes that are resistant to probe saturation
•all primary data, this PowerPoint presentation and a preprint are available at http://thinker.med.ohio-state.edu