Lecture 8 Microarray experiments MA plots Normalization of microarray data
Lecture 8
Microarray experiments
MA plots
Normalization of microarray data
Tests for differential expression of genes
Multiple testing and FDR
•Though most cells in an organism contain the same genes, not all of the genes are used in each cell.
•Some genes are turned on, or "expressed," when needed in particular types of cells.
•Microarray technology allows us to look at many genes at once and determine which are expressed in a particular cell type.
DNA Microarray
Typical microarray chip
•DNA molecules representing many genes, called probes, are placed in discrete spots on a microscope slide.
•Messenger RNA (the working copy of a gene within the cell) is purified from cells of a particular type.
•The RNA molecules are then "labeled" by attaching a fluorescent dye that allows us to see them under a microscope, and are added to the DNA spots on the microarray.
•Due to a phenomenon termed base-pairing, RNA will stick to the probe corresponding to the gene it came from.
Source: PhD thesis by Benjamin Milo Bolstad, 2004, University of California, Berkeley
Usually a gene is interrogated by 11 to 20 probes, and each probe is usually a 25-mer sequence.
The probes are typically spaced widely along the sequence; sometimes probes are chosen closer to the 3′ end of the sequence.
A probe that is exactly complementary to the sequence is called a perfect match (PM) probe.
A mismatch (MM) probe is complementary to the sequence except at the central position.
In theory, MM probes can be used to quantify and remove non-specific hybridization.
Sample preparation and hybridization
Source: PhD thesis by Benjamin Milo Bolstad, 2004, University of California, Berkeley
During the hybridization process, cRNA binds to the array.
Earlier chips had all the probes of a probe set located contiguously on the array, which made them prone to spatial defects.
Newer chips spread the probes of a probe set across the array.
A PM and MM probe pair are always adjacent on the array.
Growth curve of bacteria
•Samples can be taken at different stages of the growth curve.
•One of them is treated as the control and the others as targets.
•Samples can be taken before and after the application of drugs.
•Samples can be taken under different experimental conditions, e.g., starvation of a particular metabolite.
•Which types of samples to use depends on the goal of the experiment at hand.
•After the unbound RNA is washed away, the microarray can be observed under a microscope to determine which RNA remains bound to the DNA spots.
•Microarray technology can be used to learn which genes are expressed differently in a target sample compared with a control sample (e.g., diseased versus healthy tissue).
However, background correction and normalization are necessary before drawing useful conclusions.
MA plots
MA plots are typically used to compare two color channels, two arrays, or two groups of arrays.
The vertical axis (M) is the difference between the logarithms of the signals (the log ratio), and the horizontal axis (A) is the average of the logarithms of the signals.
M stands for "minus" and A for "add"; MA is also a mnemonic for microarray.
$$M_i = \log X_{ij} - \log X_{ik} = \log\frac{X_{ij}}{X_{ik}} \quad \text{(log ratio)}$$
$$A_i = \frac{1}{2}\left[\log X_{ij} + \log X_{ik}\right] \quad \text{(average log intensity)}$$
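A minimal Python sketch of the two formulas, assuming base-2 logarithms (a common choice; the slides do not fix the base) and strictly positive intensities. The function name `ma_values` is hypothetical.

```python
import math


def ma_values(x, y, base=2.0):
    """Return (M, A) for paired positive intensities x and y.

    x and y hold the signals of the same genes on two arrays (or two channels).
    """
    m_vals, a_vals = [], []
    for xi, yi in zip(x, y):
        lx, ly = math.log(xi, base), math.log(yi, base)
        m_vals.append(lx - ly)        # M: difference of logs = log ratio
        a_vals.append((lx + ly) / 2)  # A: average log intensity
    return m_vals, a_vals


# Three genes measured on two arrays
M, A = ma_values([100, 400, 1000], [100, 100, 2000])
```

A gene with equal signals on both arrays gets M = 0, a fourfold higher signal on the first array gives M = 2 on the log2 scale, and a twofold lower one gives M = −1; plotting M against A gives the MA plot.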
A typical MA plot
From the first plot we can see differences between the two arrays, but the nonlinear trend is not apparent. This is because there are many more points at low intensities than at high intensities. The MA plot allows us to assess the behavior across all intensities.
Normalization of microarray data
Normalization is the process of removing unwanted non-biological variation that may exist between chips in microarray experiments.
Removing this non-biological variation makes the biological variation more apparent.
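|        | Array 1 | Array 2 | … | Array j | … | Array m |
|--------|---------|---------|---|---------|---|---------|
| Gene 1 | X11 | X12 | … | X1j | … | X1m |
| Gene 2 | X21 | X22 | … | X2j | … | X2m |
| …      | …   | …   | … | …   | … | …   |
| Gene i | Xi1 | Xi2 | … | Xij | … | Xim |
| …      | …   | …   | … | …   | … | …   |
| Gene n | Xn1 | Xn2 | … | Xnj | … | Xnm |
| Mean   | X̄1  | X̄2  | … | X̄j  | … | X̄m  |
| SD     | σ1  | σ2  | … | σj  | … | σm  |

Typical microarray data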
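Normalization within individual arrays
Centering (subtract the array mean): $C_{ij} = X_{ij} - \bar{X}_j$
Scaling (subtract the mean and divide by the standard deviation): $S_{ij} = (X_{ij} - \bar{X}_j)/\sigma_j$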
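Both per-array transformations can be sketched in Python with the standard `statistics` module, assuming the data matrix is stored as a list of rows (genes) with one column per array, and that σj is the sample standard deviation (the slides do not say whether n or n − 1 is used). The helper names are hypothetical.

```python
import statistics


def center(column):
    """Subtract the array mean from every value."""
    mu = statistics.mean(column)
    return [v - mu for v in column]


def standardize(column):
    """Subtract the array mean and divide by the array standard deviation."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1 denominator), an assumption
    return [(v - mu) / sd for v in column]


def normalize_within_arrays(matrix, transform):
    """Apply `transform` to each array (column) of a genes-by-arrays matrix."""
    columns = [transform(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


# Rows are genes, columns are arrays
X = [[1, 10],
     [2, 20],
     [3, 30]]
centered = normalize_within_arrays(X, center)
standardized = normalize_within_arrays(X, standardize)
```

After the second transformation every array has mean 0 and standard deviation 1, so arrays with different overall brightness and spread become directly comparable.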
Figure: effect of scaling and centering normalization (panels: original data; scaling; centering).
Normalization between a pair of arrays: loess (lowess) normalization
Lowess normalization is applied separately to each two-dye experiment.
This method can be used to normalize Cy5 and Cy3 channel intensities (usually one of them is the control and the other the target) using MA plots.
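Two-channel data (C = control channel, T = target channel):

| Gene | C | T |
|------|---|---|
| Gene i−1 | $C_{i-1}$ | $T_{i-1}$ |
| Gene i | $C_i$ | $T_i$ |
| Gene i+1 | $C_{i+1}$ | $T_{i+1}$ |

$$M_i = \log\frac{T_i}{C_i} \quad \text{(log ratio)}$$
$$A_i = \frac{1}{2}\left[\log T_i + \log C_i\right] \quad \text{(average log intensity)}$$
In the MA plot, each point corresponds to a single gene.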
The MA plot shows some bias: M deviates from 0 in a way that depends on the intensity A.
Figure: a typical regression line fitted to the MA plot.
Usually, several regression lines or polynomials are fitted to different sections of the intensity range.
The final result is a smooth curve that models the bias in the data. This model is then used to remove the bias: the fitted value of the curve at $A_i$ is subtracted from each $M_i$.
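The local-regression idea can be sketched in plain Python. This is a simplified loess, assuming degree-1 (linear) local fits with tricube weights over the nearest `frac` fraction of points and no robustness iterations (Cleveland's original lowess adds those). `frac` and the function names are hypothetical choices, and the quadratic-time loop is only meant for small examples; for real data, established implementations such as `lowess` in statsmodels or `normalizeWithinArrays` in the R package limma are the usual route.

```python
def loess_fit(a, m, frac=0.5):
    """Fit a smooth curve M ~ f(A) by locally weighted linear regression."""
    n = len(a)
    k = min(n, max(3, round(frac * n)))  # number of neighbours per local fit
    fitted = []
    for a0 in a:
        # Bandwidth: distance to the k-th nearest A value (tiny epsilon if zero)
        h = sorted(abs(ai - a0) for ai in a)[k - 1] or 1e-12
        # Tricube weights: 1 at a0, falling to 0 at distance h
        w = [(1 - (abs(ai - a0) / h) ** 3) ** 3 if abs(ai - a0) < h else 0.0
             for ai in a]
        sw = sum(w)  # > 0, because a0 itself always has weight 1
        a_bar = sum(wi * ai for wi, ai in zip(w, a)) / sw
        m_bar = sum(wi * mi for wi, mi in zip(w, m)) / sw
        sxx = sum(wi * (ai - a_bar) ** 2 for wi, ai in zip(w, a))
        sxy = sum(wi * (ai - a_bar) * (mi - m_bar) for wi, ai, mi in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(m_bar + slope * (a0 - a_bar))  # local line evaluated at a0
    return fitted


def loess_normalize(a, m, frac=0.5):
    """Remove the intensity-dependent bias: M_norm = M - f(A)."""
    curve = loess_fit(a, m, frac)
    return [mi - ci for mi, ci in zip(m, curve)]


# Made-up data with a purely linear bias of M along A
A = [float(i) for i in range(20)]
M = [0.5 + 0.1 * ai for ai in A]
M_norm = loess_normalize(A, M)
```

The fitted curve plays the role of the smooth bias model described above; subtracting it centres M around 0 at every intensity. With a linear bias, as here, each local line recovers the bias exactly, so the normalized M values are zero up to rounding.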
Bias reduction by lowess normalization
Figure panels: unnormalized fold changes; fold changes after loess normalization.
Normalization across arrays
Here we discuss two normalization procedures that apply to a set of arrays:
1. Quantile normalization
2. Baseline scaling normalization
Quantile normalization
The goal of quantile normalization is to give the intensities of each array the same empirical distribution.
If two data sets have the same distribution, their quantile-quantile plot is a straight diagonal line with slope 1 and intercept 0.
Conversely, projecting the points of the quantile-quantile plot onto the 45-degree line gives the transformation that makes the two distributions identical.
The quantile-quantile plot thus motivates the quantile normalization algorithm.
Quantile normalization algorithm
Source: PhD thesis by Benjamin Milo Bolstad, 2004, University of California, Berkeley
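Quantile normalization: normalization across arrays (worked example)
1. Sort each column of X (by value) to get X_sort.
2. Take the mean across each row of X_sort.
3. Assign this mean to each element in the row to get X'_sort.
4. Get X_normalized by rearranging each column of X'_sort to have the same ordering as the original X.

Original data:

| No. | Exp. 1 | Exp. 2 |
|-----|--------|--------|
| 1 | 1.6 | 1.2 |
| 2 | 0.6 | 2.8 |
| 3 | 1.8 | 1.8 |
| 4 | 0.8 | 3.8 |
| 5 | 0.4 | 0.8 |

Each column sorted, with row means (steps 1 and 2):

| No. | Exp. 1 | No. | Exp. 2 | Mean |
|-----|--------|-----|--------|------|
| 5 | 0.4 | 5 | 0.8 | 0.6 = (0.4 + 0.8)/2 |
| 2 | 0.6 | 1 | 1.2 | 0.9 |
| 4 | 0.8 | 3 | 1.8 | 1.3 |
| 1 | 1.6 | 2 | 2.8 | 2.2 |
| 3 | 1.8 | 4 | 3.8 | 2.8 |

Row means assigned to each element, X'_sort (step 3):

| No. | Exp. 1 | No. | Exp. 2 |
|-----|--------|-----|--------|
| 5 | 0.6 | 5 | 0.6 |
| 2 | 0.9 | 1 | 0.9 |
| 4 | 1.3 | 3 | 1.3 |
| 1 | 2.2 | 2 | 2.2 |
| 3 | 2.8 | 4 | 2.8 |

Normalized data in the original order (step 4):

| No. | Exp. 1 | Exp. 2 |
|-----|--------|--------|
| 1 | 2.2 | 0.9 |
| 2 | 0.9 | 2.2 |
| 3 | 2.8 | 1.3 |
| 4 | 1.3 | 2.8 |
| 5 | 0.6 | 0.6 |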
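The four steps translate directly into Python. This sketch assumes every array measures the same genes in the same row order and breaks ties by original position (the slides do not discuss ties); the function name is hypothetical. Running it on the two experiments of the worked example reproduces the final table.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    `arrays` is a list of columns; arrays[j][i] is gene i on array j.
    """
    n_genes = len(arrays[0])
    # Step 1: sort each column, remembering which gene sits at each rank
    orders = [sorted(range(n_genes), key=lambda i: col[i]) for col in arrays]
    # Step 2: mean across arrays of the values at each rank (row means of X_sort)
    rank_means = [
        sum(col[order[r]] for col, order in zip(arrays, orders)) / len(arrays)
        for r in range(n_genes)
    ]
    # Steps 3-4: give each gene the mean of its rank, back in the original order
    normalized = []
    for order in orders:
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


exp1 = [1.6, 0.6, 1.8, 0.8, 0.4]
exp2 = [1.2, 2.8, 1.8, 3.8, 0.8]
norm1, norm2 = quantile_normalize([exp1, exp2])
print([round(v, 2) for v in norm1])  # [2.2, 0.9, 2.8, 1.3, 0.6]
print([round(v, 2) for v in norm2])  # [0.9, 2.2, 1.3, 2.8, 0.6]
```

Because every array receives the same set of rank means, the normalized columns are permutations of one another, which is exactly the "same empirical distribution" goal.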
Figure panels: raw data; after quantile normalization.
Baseline scaling method
In this method, a baseline array is chosen and all arrays are scaled to have the same mean intensity as this chosen array.
This is equivalent to selecting a baseline array and then fitting a linear regression line without an intercept between the chosen array and every other array.
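A sketch of mean-based baseline scaling in Python, assuming the baseline array is simply chosen by the caller (the slides do not say how it is picked) and that all arrays have positive means. The function and parameter names are hypothetical.

```python
import statistics


def baseline_scale(arrays, baseline_index=0):
    """Scale each array so that its mean intensity equals the baseline array's mean."""
    target = statistics.mean(arrays[baseline_index])
    scaled = []
    for arr in arrays:
        factor = target / statistics.mean(arr)  # one multiplicative factor per array
        scaled.append([v * factor for v in arr])
    return scaled


arrays = [[1.0, 2.0, 3.0],   # baseline
          [2.0, 4.0, 6.0]]   # twice as bright overall
scaled = baseline_scale(arrays, baseline_index=0)
```

Because each array is multiplied by a single factor, this corrects overall brightness differences but, unlike quantile or loess normalization, not intensity-dependent ones.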
Figure panels: after baseline scaling normalization; raw data.
Tests for differential expression of genes
Let x1, …, xn and y1, …, ym be independent measurements of the same probe/gene under two conditions.
Whether the gene is differentially expressed between the two conditions can be determined using statistical tests.
Important issues for a test procedure are:
(a) whether the distributional assumptions are valid,
(b) whether the replicates are independent of each other,
(c) whether the number of replicates is sufficient, and
(d) whether outliers have been removed from the sample.
Replicates from different experiments should not be mixed, since they have different characteristics and cannot be treated as independent replicates.
The most commonly used statistical tests are:
(a) Student’s t-test
(b) Welch’s test
(c) Wilcoxon’s rank sum test
(d) Permutation tests
The first two tests assume that the samples are drawn from Gaussian distributions, and their p-values are calculated from a probability distribution function. The latter two are nonparametric, and their p-values are calculated using combinatorial arguments.
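To illustrate the combinatorial idea behind permutation tests, here is an exact two-sample permutation test in Python. It assumes the difference in means as the test statistic and a two-sided alternative (the slides name neither), and it enumerates every split of the pooled data, which is feasible only for small samples. The function name is hypothetical.

```python
import itertools
import statistics


def permutation_test(x, y):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n = len(x)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    extreme = total = 0
    # Every way of choosing which n pooled values are labelled "x"
    for idx in itertools.combinations(range(len(pooled)), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(statistics.mean(gx) - statistics.mean(gy))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total


p = permutation_test([1, 2, 3], [10, 11, 12])
print(p)  # 0.1  (2 of the 20 possible splits are at least this extreme)
```

The p-value is simply the fraction of relabelings that look at least as extreme as the observed one, so no distributional assumption is needed; the price is that the number of splits grows combinatorially with sample size.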
Student’s t-test
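Assumption: both samples are drawn from Gaussian distributions with equal variances.
$$t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}, \qquad s_p^2 = \frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}$$
where $s_x^2$ and $s_y^2$ are the sample variances.
Degrees of freedom: m + n − 2
Welch’s test is a variant of the t-test in which t is calculated as follows:
$$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_x^2}{n} + \frac{s_y^2}{m}}}$$
Welch’s test does not assume equal population variances.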
Student’s t-test
The value of t is assumed to follow a t-distribution. After calculating t, we can determine the p-value from the t-distribution with the corresponding degrees of freedom.
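A stdlib-only Python sketch of both statistics and their degrees of freedom, assuming sample variances with n − 1 denominators. The Welch–Satterthwaite degrees of freedom used here for Welch's test are standard but are not given on the slides, and the function names are hypothetical. Turning t into a p-value still needs the t-distribution; SciPy's `scipy.stats.ttest_ind` (with `equal_var=True` or `equal_var=False`) does the whole computation. The data reuse the X and Y values of the Wilcoxon example later in this lecture.

```python
import math
import statistics


def student_t(x, y):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n, m = len(x), len(y)
    sp2 = ((n - 1) * statistics.variance(x)
           + (m - 1) * statistics.variance(y)) / (n + m - 2)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2


def welch_t(x, y):
    """Welch t statistic (unequal variances); returns (t, Welch-Satterthwaite df)."""
    n, m = len(x), len(y)
    vx, vy = statistics.variance(x) / n, statistics.variance(y) / m
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df


x = [7, 8, 5, 9, 7]
y = [5, 6, 8, 4]
t_s, df_s = student_t(x, y)   # t about 1.365 with 7 degrees of freedom
t_w, df_w = welch_t(x, y)     # t about 1.341 with about 6.06 degrees of freedom
```

The two statistics differ only in how the variances are combined: Student's test pools them into one estimate, while Welch's test keeps them separate and pays for that with fewer (non-integer) degrees of freedom.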
Wilcoxon’s rank sum test
Let x1, …, xn and y1, …, ym be independent measurements of the same probe/gene under two conditions. Consider the combined set x1, …, xn, y1, …, ym. The test statistic of the Wilcoxon test is
$$T = \sum_{i=1}^{n} R(x_i)$$
where $R(x_i)$ is the rank of $x_i$ in the combined series.
The minimum possible value of T is $\frac{n(n+1)}{2}$.
The maximum possible value of T is $\frac{n(n+1)}{2} + nm = \frac{n(n+2m+1)}{2}$.
The minimum and maximum values of T occur if all X data are greater or smaller than all Y data, respectively (with rank 1 given to the largest value, as in the example), i.e., if the two samples come from quite different distributions.
Under the null hypothesis, the expected value and variance of T are:
$$E_{H_0}(T) = \frac{n(m+n+1)}{2}, \qquad \mathrm{Var}_{H_0}(T) = \frac{mn(m+n+1)}{12}$$
Unusually low or high values of T compared with the expected value indicate that the null hypothesis should be rejected, i.e., the samples are not from the same population.
For larger samples (m + n > 25), the following statistic is approximately standard normal:
$$Z = \frac{T - E_{H_0}(T)}{\sqrt{\mathrm{Var}_{H_0}(T)}} \approx N(0, 1)$$
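Wilcoxon’s rank sum test (example)

X data: x1 = 7, x2 = 8, x3 = 5, x4 = 9, x5 = 7
Y data: y1 = 5, y2 = 6, y3 = 8, y4 = 4

Combined X and Y data, ranked from largest to smallest:

| Observation | Value | Rank |
|-------------|-------|------|
| x4 | 9 | 1 |
| x2 | 8 | 2 |
| y3 | 8 | 3 |
| x5 | 7 | 4 |
| x1 | 7 | 5 |
| y2 | 6 | 6 |
| y1 | 5 | 7 |
| x3 | 5 | 8 |
| y4 | 4 | 9 |

n = 5, m = 4
$$T = R(x_1) + R(x_2) + R(x_3) + R(x_4) + R(x_5) = 5 + 2 + 8 + 1 + 4 = 20$$
$$E_{H_0}(T) = \frac{n(m+n+1)}{2} = \frac{5(4+5+1)}{2} = 25$$
$$\mathrm{Var}_{H_0}(T) = \frac{mn(m+n+1)}{12} = \frac{5 \cdot 4\,(4+5+1)}{12} = \frac{50}{3} \approx 16.67$$
P-value = 0.1112 (from chart)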
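A stdlib-only Python sketch that recomputes T, its null expectation and variance, and the normal-approximation z-score for this example. It ranks from largest to smallest, as the example does, and assumes tied values share the average of their ranks (mid-ranks); the example instead breaks ties in a fixed order, but here both choices give T = 20. The variance is the formula above without a tie correction, the z-score is shown only for illustration (m + n = 9 is below the threshold of 25), and the function names are hypothetical. SciPy's `scipy.stats.mannwhitneyu` is a full implementation with p-values.

```python
import math


def descending_midranks(values):
    """Rank so the largest value gets rank 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return T, E_H0(T), Var_H0(T) and the normal-approximation z-score."""
    n, m = len(x), len(y)
    ranks = descending_midranks(list(x) + list(y))
    t = sum(ranks[:n])                      # ranks of the x values
    expected = n * (m + n + 1) / 2
    variance = m * n * (m + n + 1) / 12     # no tie correction, as on the slide
    z = (t - expected) / math.sqrt(variance)
    return t, expected, variance, z


T, E, V, z = wilcoxon_rank_sum([7, 8, 5, 9, 7], [5, 6, 8, 4])
print(T, E, round(V, 2), round(z, 3))  # 20.0 25.0 16.67 -1.225
```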
Multiple testing and FDR
Single-gene analysis using statistical tests has a drawback: when analyzing microarray data, we conduct thousands of tests in parallel.
Suppose we test 10,000 genes, each at significance level α = 0.05, i.e., a false-positive rate of 5%.
If no gene were truly differentially expressed, we would still expect about 500 tests to give false positives, which is unacceptable.
Therefore, corrections for multiple testing are applied when analyzing microarray data.
Let αg be the global significance level and αs the significance level for a single gene.
For a single gene, the probability of making a correct decision is $1 - \alpha_s$.
Therefore, assuming independent tests, the probability of making the correct decision for all n genes (i.e., at the global level) is $(1 - \alpha_s)^n$.
The probability of drawing the wrong conclusion in at least one of the n tests is
$$\alpha_g = 1 - (1 - \alpha_s)^n$$
For example, with 100 genes and αs = 0.05, the probability of making at least one error is $1 - 0.95^{100} \approx 0.994$. This is very high; this quantity is called the family-wise error rate (FWER).
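The FWER formula is a one-liner in Python; this sketch assumes independent tests, as the derivation does, and the function name is hypothetical.

```python
def family_wise_error_rate(alpha_single, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1 - (1 - alpha_single) ** n_tests


fwer_100 = family_wise_error_rate(0.05, 100)
print(round(fwer_100, 3))                              # 0.994
print(round(family_wise_error_rate(0.05, 10000), 6))   # 1.0
```

With 10,000 genes tested at 0.05, at least one false positive is practically certain.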
Using the binomial expansion, for small αs we can write
$$(1 - \alpha_s)^n \approx 1 - n\alpha_s$$
Thus
$$\alpha_g \approx n\alpha_s \quad\Rightarrow\quad \alpha_s = \frac{\alpha_g}{n}$$
Therefore, the Bonferroni correction sets the single-gene significance level to the global level divided by the number of tests.
For example, for an FWER of 0.01 with n = 10,000 genes, the single-gene p-value threshold should be $10^{-6}$.
Usually very few genes can meet this requirement.
Therefore we need other ways to adjust the single-gene p-value threshold.
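A Python sketch of the Bonferroni rule: compute the per-gene threshold αg/n and flag genes whose p-value falls at or below it. The function names and the example p-values are made up for illustration.

```python
def bonferroni_threshold(alpha_global, n_tests):
    """Single-gene significance level alpha_s = alpha_g / n."""
    return alpha_global / n_tests


def bonferroni_significant(p_values, alpha_global):
    """Flag the tests that remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha_global, len(p_values))
    return [p <= threshold for p in p_values]


threshold = bonferroni_threshold(0.01, 10000)   # about 1e-6, as in the text
flags = bonferroni_significant([0.001, 0.02, 0.0001], alpha_global=0.01)
```

With three tests and αg = 0.01 the per-test threshold is about 0.0033, so only the first and third p-values survive.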
A resampling-based method for adjusting p-values is given in:
Westfall P. H. and Young S. S. (1993) Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley, New York.
An alternative to controlling the FWER is to compute the false discovery rate (FDR).
The following papers discuss the FDR:
Storey J. D. and Tibshirani R. (2003) Statistical significance for genomewide studies. PNAS 100, 9440–9445.
Benjamini Y. and Hochberg Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Statist. Soc. B 57, 289–300.
Still, the best practical use of multiple testing corrections is not entirely clear.
However, it is clear that we need to adjust the single-gene p-values when testing many genes together.
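As a sketch of the FDR approach, here is the Benjamini–Hochberg step-up procedure from the 1995 paper cited above, written as BH-adjusted p-values in Python: a gene is called significant at FDR level q when its adjusted p-value is at most q. It is one FDR method among several (Storey and Tibshirani's q-values are another), and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(p)              # about [0.02, 0.04, 0.04, 0.02]
significant = [a <= 0.05 for a in adj]   # FDR controlled at q = 0.05
```

For comparison, Bonferroni at 0.05 would keep only the two smallest p-values here (threshold 0.0125), while BH at q = 0.05 keeps all four; this extra power is why FDR control is popular for microarray data.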