Detecting Differentially Expressed Genes Pengyu Hong 09/13/2005.
Background (Microarray)
Cells → Extract RNA
10^4+ genes

Background
Biological sample → RNA extraction (total RNA or mRNA) → Amplification (in vitro transcription) → Label samples → Hybridization → Washing and staining → Scanning
• Microarrays are highly noisy
• Use replicated experiments to make inferences about differential expression for the population from which the biological samples originate
• Sources of variation: biological variability and technical variability

Background
Normalization
Calculate Gene Expression Index
An Example
| probe set | gene | Normal m412a | Normal m414a | Normal m416a | Normal m426a | Normal m430a | MM m282 | MM m331a | MM m332a | MM m333a | MM m334a | MM m353a | MM m408a | MM m423a |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31307_at | pre-T/NK cell associated protein | 28.53 | 32.61 | 29.56 | 36.55 | 33.19 | 25.1 | 32.79 | 34.3 | 35.44 | 28.48 | 29.55 | 22.28 | 28.77 |
| 31308_at | pre-T/NK cell associated protein | 69.14 | 53.69 | 52.78 | 62.07 | 58.74 | 67.88 | 85.82 | 83.54 | 85.91 | 60.93 | 62.82 | 47.17 | 77.07 |
| 31309_r_at | Human breast cancer suppressor element Ishmael Upper CP1 mRNA, partial cds | 16.9 | 67.7 | 27.61 | 46.16 | 51.46 | 45.62 | 35.57 | 32.62 | 35.14 | 96.18 | 45.94 | 63.2 | 38.27 |
| 31310_at | glycine receptor, alpha 1 (startle disease/hyperekplexia, stiff man syndrome) | 67.42 | 49.55 | 55.51 | 59.57 | 68.42 | 91.06 | 91.23 | 83.66 | 76.37 | 71.23 | 74.95 | 74.04 | 100.77 |
| 31311_at | Homo sapiens cDNA FLJ40594 fis, clone THYMU2010671, highly similar to Homo sapiens T-cell receptor | 78.73 | 62.91 | 60.84 | 72.98 | 72.9 | 79.39 | 85.52 | 82.57 | 69.69 | 63.72 | 64.29 | 62.85 | 67.58 |
| 31312_at | potassium voltage-gated channel, Shab-related subfamily, member 2 | 66.65 | 59.46 | 55.47 | 61.75 | 69.92 | 75.28 | 85.53 | 97.91 | 69.92 | 74.77 | 71.83 | 58.17 | 72.15 |
| 31313_at | mannosyl (alpha-1,6-)-glycoprotein beta-1,6-N-acetyl-glucosaminyltransferase | 115.33 | 95.51 | 84.48 | 94.99 | 109.04 | 105.05 | 118.68 | 106.76 | 142.88 | 103.72 | 106.19 | 98.58 | 104.13 |
| 31314_at | bone morphogenetic protein 3 (osteogenic) | 71.89 | 36.24 | 41.86 | 46.99 | 45.94 | 46.67 | 67.56 | 66.14 | 53.95 | 40.97 | 47.96 | 43.63 | 53.54 |
| 31315_at | immunoglobulin lambda locus | 103.99 | 88.27 | 83.81 | 81.81 | 254.63 | 87.12 | 99.11 | 109.56 | 86.37 | 75.03 | 74.97 | 69.02 | 97.22 |
| 31316_at | Human vacuolar ATPase (isoform HO68) mRNA, complete cds | 16.79 | 10.08 | 9.53 | 16.48 | 11.98 | 12.8 | 16.7 | 18.76 | 11.25 | 12.09 | 18.89 | 10.81 | 19.49 |
| 31317_r_at | Human unproductively rearranged Ig mu-chain mRNA V-region (VD), 5' end, clone mu-3A1A | 316.75 | 269.61 | 254.92 | 352.61 | 342.4 | 327.12 | 366.39 | 346 | 308.43 | 279.81 | 312.4 | 318.06 | 334.27 |
| 31318_at | Stem cell factor {alternatively spliced} [human, preimplantation embryos, blastocysts, mRNA Partial, 180 nt] | 32.68 | 19.79 | 27.45 | 29.56 | 28.34 | 26.55 | 38.04 | 41.05 | 31.91 | 22.76 | 23.58 | 28.29 | 22.61 |
| 31319_at | Cluster Incl. M20707:Human kappa-immunoglobulin germline pseudogene (Chr22.4) variable region (subgroup V kappa II) /cds=(0,320) /gb=M20707 /gi=185954 /ug=Hs.123030 /len=363 | 252.78 | 441.07 | 143.32 | 400.01 | 373.4 | 105.06 | 105.72 | 87.02 | 110.75 | 161.69 | 84.88 | 240.91 | 210.54 |
| 31320_at | Cluster Incl. U18548:Human GPR12 G protein coupled-receptor gene, complete cds /cds=(15,1019) /gb=U18548 /gi=604499 /ug=Hs.123034 /len=1101 | 101.42 | 89.07 | 79.51 | 100.69 | 120.06 | 116.74 | 121.41 | 134.74 | 131.36 | 137.4 | 114.15 | 119.89 | 126.74 |
| 31321_at | Cluster Incl. U41737:Human pancreatic beta cell growth factor (INGAP) mRNA, complete cds /cds=(5,520) /gb=U41737 /gi=1514681 /ug=Hs.123060 /len=586 | 112.27 | 62.17 | 62.44 | 80.17 | 110.97 | 53.89 | 55.04 | 55.16 | 63.37 | 54.35 | 57.79 | 48.07 | 47.89 |
| 31322_at | Cluster Incl. X61079:Human mRNA for T cell receptor, clone IGRA24 /cds=(0,142) /gb=X61079 /gi=33521 /ug=Hs.123062 /len=235 | 44.15 | 52.5 | 44.8 | 46.25 | 55.96 | 50.01 | 53.2 | 52.24 | 62.16 | 49.94 | 47.24 | 40.64 | 50.1 |
| 31323_r_at | Glutamate transporter II variant B/HBGT IIB {5' region} [human, brain and spinal cord, mRNA Partial Mutant, 129 nt] | 141.44 | 177.7 | 138.58 | 142.61 | 167.28 | 169.49 | 199.64 | 185.22 | 218.79 | 196.56 | 150.14 | 185.24 | 226.37 |
| 31324_at | Cluster Incl. U82303:Homo sapiens unknown protein mRNA, partial cds /cds=(0,257) /gb=U82303 /gi=1938329 /ug=Hs.123080 /len=344 | 70.87 | 57.8 | 61.61 | 65.93 | 84.05 | 106.41 | 106.73 | 87.01 | 112.12 | 78.47 | 111 | 89.08 | 100.53 |
| 31325_at | Cluster Incl. U82306:Homo sapiens unknown protein mRNA, partial cds /cds=(0,221) /gb=U82306 /gi=1938333 /ug=Hs.123081 /len=253 | 68.63 | 167.66 | 69.04 | 112.84 | 120.46 | 126.72 | 107.04 | 100 | 116.83 | 207.5 | 125.65 | 155.19 | 102.55 |
| 31326_at | Cluster Incl. AF005081:Homo sapiens skin-specific protein (xp32) mRNA, partial cds /cds=(0,340) /gb=AF005081 /gi=2589189 /ug=Hs.123091 /len=416 | 157.67 | 127.49 | 123.37 | 146.18 | 150.95 | 159.46 | 184.08 | 206.02 | 182.95 | 139.01 | 154.57 | 143.09 | 175.27 |
| 31327_at | Cluster Incl. AF015124:Homo sapiens IgG heavy chain variable region (Vh26) mRNA, partial cds /cds=(0,305) /gb=AF015124 /gi=2599349 /ug=Hs.123093 /len=340 | 35.57 | 28.17 | 33.64 | 32.36 | 38.76 | 43.13 | 40.16 | 46.8 | 34.47 | 33.71 | 25.74 | 29.45 | 37.37 |
| 31328_at | solute carrier family 34 (sodium phosphate), member 1 | 61.61 | 48.23 | 50.76 | 58 | 57.58 | 57.91 | 69.91 | 72.34 | 70.29 | 54.98 | 59.74 | 45.55 | 63.02 |
| 31329_at | Human putative opioid receptor mRNA, complete cds | 12.23 | 18.91 | 15.36 | 19.99 | 21.15 | 15.9 | 20.76 | 22.26 | 16.15 | 28.86 | 13.59 | 16.06 | 23.85 |
| 31330_at | ribosomal protein S19 | 108.87 | 133.3 | 89.84 | 113.02 | 147.61 | 169.87 | 156.81 | 136.47 | 153.07 | 220.54 | 220.96 | 332.11 | 183 |
| 31331_at | surfactant protein A binding protein | 28.21 | 17.99 | 23.56 | 26.37 | 30.35 | 28.84 | 31.54 | 35.06 | 22.53 | 24.45 | 23 | 21.37 | 30.86 |
| 31332_at | RIG-like 14-1 | 20.77 | 18.58 | 19.03 | 18.29 | 20.86 | 23.56 | 25.11 | 24.43 | 19.3 | 28.72 | 17.27 | 23.18 | 25.35 |
| 31333_at | tolloid-like 1 | 22.97 | 52.9 | 26.95 | 41.22 | 48.38 | 48.85 | 42.09 | 40.13 | 40.73 | 89.86 | 46.96 | 59.74 | 40.87 |
| 31334_at | G protein-coupled receptor 45 | 98.57 | 100.16 | 78.76 | 119.09 | 118.58 | 97.42 | 110.67 | 104.95 | 143.47 | 111.28 | 102.88 | 115.9 | 133.17 |
| 31335_at | clone 1900 unknown protein | 65.79 | 54.3 | 54.79 | 57.23 | 60.75 | 66.98 | 72.89 | 86.97 | 76.34 | 57.65 | 59.83 | 49.21 | 70.83 |
| 31336_at | Cluster Incl. AC004076:Homo sapiens chromosome 19, cosmid R30217 /cds=(0,2075) /gb=AC004076 /gi=2822142 /ug=Hs.129709 /len=2076 | 40.97 | 26.15 | 32.55 | 26.26 | 33.73 | 36.15 | 36.03 | 34.93 | 24.12 | 26.26 | 22.55 | 23.64 | 28.11 |

5 normal samples and 9 myeloma (MM) samples; 12558 genes (rows)

Genes of Interest
• Statistical significance: that the observed differential expression is unlikely to be due to chance.
• Scientific significance: that the observed level of differential expression is of sufficient magnitude to be of biological relevance.
Statistical significance in the two group problem
Parametric Test: t-test
Group 1 (N samples): $X_1, X_2, \ldots, X_N$
Group 2 (M samples): $Y_1, Y_2, \ldots, Y_M$
Assume
$X_i \sim \mathrm{Normal}(\mu_1, \sigma^2)$
$Y_j \sim \mathrm{Normal}(\mu_2, \sigma^2)$
Null hypothesis: Group 1 is the "same" as Group 2 (i.e., $\mu_1 = \mu_2$)

Statistical significance in the two group problem
Parametric Test: t-test
$X_i \sim \mathrm{Normal}(\mu_1, \sigma^2)$, $Y_j \sim \mathrm{Normal}(\mu_2, \sigma^2)$
Null hypothesis: $\mu_1 = \mu_2$
Test the null hypothesis with the test statistic:
$$t^* = \frac{\bar{X} - \bar{Y}}{\sqrt{s^2/N + s^2/M}}, \qquad s^2 = \frac{(N-1)\,s_1^2 + (M-1)\,s_2^2}{N + M - 2}$$
where
$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{Y} = \frac{1}{M}\sum_{j=1}^{M} Y_j$$
$$s_1^2 = \frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})^2, \qquad s_2^2 = \frac{1}{M-1}\sum_{j=1}^{M} (Y_j - \bar{Y})^2$$
Under the null hypothesis:
$$t^* \sim t(N + M - 2)$$

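The pooled-variance statistic can be sketched in a few lines of Python. This is only an illustration: the function name `pooled_t` and the expression values are hypothetical and do not come from the MM study, and the sketch stops at the statistic and its degrees of freedom because the standard library has no t-distribution cdf.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Equal-variance two-sample t statistic t* and its degrees of freedom."""
    n, m = len(x), len(y)
    # Pooled variance: s^2 = ((N-1) s1^2 + (M-1) s2^2) / (N + M - 2).
    # statistics.variance is the sample variance (divides by n - 1).
    s2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t_star = (mean(x) - mean(y)) / sqrt(s2 / n + s2 / m)
    return t_star, n + m - 2


# Hypothetical expression values for one gene (illustration only)
group1 = [10.0, 12.0, 11.0, 13.0, 9.0]
group2 = [14.0, 15.0, 13.0, 16.0, 17.0, 15.0]
t_star, df = pooled_t(group1, group2)
print(f"t* = {t_star:.4f}, df = {df}")
```

A negative t* here simply means the first group has the lower mean; swapping the arguments flips the sign.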
If variances are unequal
$X_i \sim \mathrm{Normal}(\mu_1, \sigma_1^2)$, $Y_j \sim \mathrm{Normal}(\mu_2, \sigma_2^2)$, $\sigma_1 \neq \sigma_2$
$$t' = \frac{\bar{X} - \bar{Y}}{\sqrt{s_1^2/N + s_2^2/M}}$$
(1) When N+M > 30, this is approximately normal
(2) When $\sigma_1 \gg \sigma_2$, this is approximately t(df = N–1)
(3) In general, Welch approximation: $t' \sim t(df')$, where
$$df' = \frac{\left(s_1^2/N + s_2^2/M\right)^2}{\dfrac{(s_1^2/N)^2}{N-1} + \dfrac{(s_2^2/M)^2}{M-1}}$$

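A minimal sketch of the unequal-variance statistic and the Welch degrees of freedom follows. The function name `welch_t` and the data are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Unequal-variance statistic t' and Welch's approximate degrees of freedom."""
    n, m = len(x), len(y)
    vx = variance(x) / n  # s1^2 / N
    vy = variance(y) / m  # s2^2 / M
    t_prime = (mean(x) - mean(y)) / sqrt(vx + vy)
    df_prime = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t_prime, df_prime


# Hypothetical expression values for one gene (illustration only)
group1 = [10.0, 12.0, 11.0, 13.0, 9.0]
group2 = [14.0, 15.0, 13.0, 16.0, 17.0, 15.0]
t_prime, df_prime = welch_t(group1, group2)
print(f"t' = {t_prime:.4f}, df' = {df_prime:.4f}")
```

The Welch degrees of freedom are generally not an integer and fall between min(N, M) − 1 and N + M − 2.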
Wilcoxon rank sum test
Consider row 7 of MM study
Values: 16 253 633 1008 708 | 36 72 28 14 33 19 49 58 23
Ranks (1 = largest): 13 4 3 1 2 | 8 5 10 14 9 12 7 6 11
Rank sum of the first group (first 5 values) = 13 + 4 + 3 + 1 + 2 = 23
This test is more appropriate than the t-tests when the underlying distribution is far from normal. (But it requires large group sizes.)

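The ranking above can be reproduced with a short sketch. It assumes, as the rank sum of 23 suggests, that the first five values form one group, and it ranks in descending order as on the slide. The exact two-sided p-value is obtained by enumerating every way of assigning 5 of the 14 ranks to that group; this enumeration is an illustrative choice, not necessarily how the author computed significance.

```python
from itertools import combinations


def descending_ranks(values):
    """Rank 1 = largest value, matching the slide (assumes no ties)."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


values = [16, 253, 633, 1008, 708, 36, 72, 28, 14, 33, 19, 49, 58, 23]
n_group1 = 5  # assumption: the first five values belong to the first group

ranks = descending_ranks(values)
observed = sum(ranks[:n_group1])

# Null distribution of the rank sum: every 5-subset of ranks 1..14 is equally likely.
center = n_group1 * (len(values) + 1) / 2  # expected rank sum under the null
all_sums = [sum(c) for c in combinations(range(1, len(values) + 1), n_group1)]
p_two_sided = sum(
    abs(s - center) >= abs(observed - center) for s in all_sums
) / len(all_sums)

print(ranks, observed, p_two_sided)
```

With only 5 and 9 samples there are just 2002 possible rank assignments, so the attainable p-values are coarse, which is one way to read the remark about group sizes.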
P-value
• p-value = P(|T| > |t|) is calculated based on the distribution of T under the null hypothesis.
• The p-value is a function of the test statistic and can be viewed as a random variable.
– e.g. p-value = 2(1 − F(|t*|)), where F is the cdf of t(N+M–2).
• A small p-value represents evidence against the null hypothesis, i.e., evidence that the gene is differentially expressed in our case.

Permutation test
• A non-parametric way of computing a p-value for any test statistic.
– In the MM study, each gene has (14 choose 5) = 2002 different test values obtainable from permuting the group labels.
• Under the null hypothesis that the distributions for the two groups are identical, all these test values are equally probable. What is the probability of getting a test value at least as extreme as the observed one? This is the permutation p-value.

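As a rough sketch of the idea, the code below enumerates all (14 choose 5) label assignments for one gene and counts how many give a statistic at least as extreme as the observed one. The function names, the choice of the difference in means as the test statistic, and the reuse of the 14 values from the rank-sum slide (first five taken as one group) are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def mean_difference(x, y):
    """Illustrative test statistic; any statistic of two groups would do."""
    return mean(x) - mean(y)


def permutation_p_value(values, n_group1, statistic=mean_difference):
    """Exact two-sided permutation p-value over all label assignments."""
    observed = abs(statistic(values[:n_group1], values[n_group1:]))
    n_extreme = n_total = 0
    for group1 in combinations(range(len(values)), n_group1):
        members = set(group1)
        x = [values[i] for i in group1]
        y = [values[i] for i in range(len(values)) if i not in members]
        n_total += 1
        # Small tolerance so the observed labelling always counts as extreme.
        if abs(statistic(x, y)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total, n_total


values = [16, 253, 633, 1008, 708, 36, 72, 28, 14, 33, 19, 49, 58, 23]
p_value, n_assignments = permutation_p_value(values, 5)
print(n_assignments, p_value)
```

Because the observed labelling is itself one of the assignments, the smallest p-value this procedure can return is 1/2002.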
Permutation technique
Original labels: Condition 0 = Patient 1, Patient 2, Patient 3; Condition 1 = Patient 4, Patient 5, Patient 6 → Compute TS0
Permuted labels: Condition 0 = Patient 4, Patient 2, Patient 3; Condition 1 = Patient 1, Patient 5, Patient 6 → Compute TS1
Permuted labels: Condition 0 = Patient 1, Patient 2, Patient 5; Condition 1 = Patient 4, Patient 3, Patient 6 → Compute TS2
Permuted labels: Condition 0 = Patient 1, Patient 6, Patient 3; Condition 1 = Patient 4, Patient 5, Patient 2 → Compute TS3
The set of TSi forms the empirical distribution of the test statistic TS

Scientific Significance
• Fold change FC = $\bar{X} / \bar{Y}$
• May not be high when statistical significance is high.
• Not an appropriate measure if the dispersion is not taken into consideration.

Conservative fold change
Conservative fold change (CFC) = max(25th percentile of sample 1 / 75th percentile of sample 2, 25th percentile of sample 2 / 75th percentile of sample 1)

[Figure: density curves of the two samples; x-axis (xaxis) from 98 to 106, y-axis (den1) from 0.0 to 0.4]
Sample 1: Normal (100, 1)
Sample 2: Normal (103, 1)
CFC = 1.0164

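The CFC value quoted for the two normal samples can be checked from theoretical quartiles. The sketch below uses Python's `statistics.NormalDist` for that check and, as a separate helper, `statistics.quantiles` for finite samples. The function names are hypothetical, and the sample-quartile method (the library default) is an assumption, since the slides do not say how percentiles are estimated.

```python
from statistics import NormalDist, quantiles


def conservative_fold_change(q25_1, q75_1, q25_2, q75_2):
    """CFC = max(Q25 of sample 1 / Q75 of sample 2, Q25 of sample 2 / Q75 of sample 1)."""
    return max(q25_1 / q75_2, q25_2 / q75_1)


def cfc_from_samples(sample1, sample2):
    """CFC from observed values; quantiles(n=4) returns the three quartiles."""
    q1 = quantiles(sample1, n=4)
    q2 = quantiles(sample2, n=4)
    return conservative_fold_change(q1[0], q1[2], q2[0], q2[2])


# Theoretical quartiles of Normal(100, 1) and Normal(103, 1)
d1, d2 = NormalDist(100, 1), NormalDist(103, 1)
cfc_theoretical = conservative_fold_change(
    d1.inv_cdf(0.25), d1.inv_cdf(0.75), d2.inv_cdf(0.25), d2.inv_cdf(0.75)
)
print(round(cfc_theoretical, 4))
```

Taking the 25th percentile of one group against the 75th percentile of the other is what makes the measure conservative: the two groups are clearly separated in terms of t-statistics here, yet the CFC stays close to 1 because the ratio of values is small.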
[Figure: example plots annotated with CFC=3.53, CFC=1.07, CFC=2.89, and CFC=1.45]

P-values and FC contain different information
[Figure: scatter plot of fold.quantile[idx] (y-axis, 2 to 14) against -log10(p.value.unequal.log)[idx] (x-axis, 3 to 7)]

Gene Selection and Ranking
• Set a high threshold of statistical significance: select genes with p-values smaller than a threshold.
• The selected genes are ordered according to their scientific significance (i.e., ranked by fold change).

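The two-step procedure (filter by p-value, then order by fold change) might look like the sketch below. The gene names, p-values, fold changes, and the 0.01 threshold are all made up for illustration, and fold changes are assumed to be expressed as values of at least 1 so that larger always means a bigger change.

```python
# Hypothetical per-gene results: (gene, p-value, fold change)
results = [
    ("gene_a", 0.0004, 1.8),
    ("gene_b", 0.2000, 5.1),
    ("gene_c", 0.0030, 3.2),
    ("gene_d", 0.0090, 1.1),
]


def select_and_rank(results, p_threshold=0.01):
    """Keep genes with p-value below the threshold, then sort by fold change."""
    selected = [r for r in results if r[1] < p_threshold]
    return sorted(selected, key=lambda r: r[2], reverse=True)


ranked = select_and_rank(results)
for gene, p, fc in ranked:
    print(gene, p, fc)
```

Note that `gene_b` has the largest fold change but is dropped because it fails the significance filter.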
The False Positive Rate (FPR)
• If we select genes with p-value < 0.01, then the probability of making a positive call when the gene is in fact not differential is less than 0.01. Thus selection by p-value controls the FPR.
• However, if we have 12,000 genes in a microarray, then an FPR of 0.01 still allows up to about 120 false positives on average (12,000 × 0.01). To make a sensible decision, we must take multiple comparisons into consideration.

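The arithmetic behind the 120 figure, together with a small simulation, is sketched below. The simulation assumes the worst case in which no gene is truly differential, so every p-value is uniform on [0, 1]; the seed and variable names are arbitrary.

```python
import random

n_genes = 12_000
p_threshold = 0.01

# Expected number of false positives if no gene is truly differential
expected_false_positives = n_genes * p_threshold

# Simulate one such experiment: null p-values are uniform on [0, 1]
rng = random.Random(0)
null_p_values = [rng.random() for _ in range(n_genes)]
observed_false_positives = sum(p < p_threshold for p in null_p_values)

print(expected_false_positives, observed_false_positives)
```

The simulated count varies from run to run around the expected value, which is why a per-gene threshold alone gives no guarantee on the total number of false calls.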
Dealing with Multiple Comparison
• Bonferroni inequality: to control the family-wise error rate at level α when testing m hypotheses, we control the FPR of each individual test at α/m.
• Then P(at least one false rejection) ≤ α, or equivalently P(no false rejection) ≥ 1 − α.
• This is appropriate for some applications (e.g. testing a new drug versus several existing ones), but is too conservative for our task of gene selection.
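A minimal sketch of Bonferroni selection follows. The function name and the p-values are hypothetical; the only rule applied is the per-test threshold α/m described above.

```python
def bonferroni_select(p_values, alpha=0.05):
    """Return indices of tests rejected at the per-test threshold alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    rejected = [i for i, p in enumerate(p_values) if p < threshold]
    return rejected, threshold


# Hypothetical p-values for four tests (illustration only)
p_values = [0.001, 0.02, 0.04, 0.0001]
rejected, threshold = bonferroni_select(p_values, alpha=0.05)
print(rejected, threshold)

# Per-gene threshold for a 12,000-gene array at alpha = 0.05
array_threshold = 0.05 / 12_000
print(array_threshold)
```

With 12,000 genes the per-gene threshold is a few millionths, which illustrates why the correction is considered too conservative for gene selection.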