(2) Ratio statistics of gene expression levels and applications to microarray data analysis

Post on 21-Jan-2016


Bioinformatics, Vol. 18, no. 9, 2002

Yidong Chen, Vishnu Kamat, Edward R. Dougherty, Michael L. Bittner, Paul S. Meltzer, and Jeffrey M. Trent

Outline

Introduction

Ratio Statistics

Quality Metric for Ratio Statistics

Conclusion

Introduction

Motivation

Expression-based analysis for large families of genes has recently become possible owing to the development of cDNA microarrays, which allow simultaneous measurement of transcript levels for thousands of genes. For each spot on a microarray, signals in two channels must be extracted from their backgrounds. This requires algorithms to extract signals arising from tagged mRNA hybridized to arrayed cDNA locations and algorithms to determine the significance of signal ratios.

Introduction

Results

1. Estimation of signal ratios from the two channels, and the significance of those ratios.

2. A refined hypothesis test is considered in which the measured intensities forming the ratio are assumed to be combinations of signal and background. The new method involves a signal-to-noise ratio, and for a high signal-to-noise ratio the new test reduces (with close approximation) to the original test. The effect of low signal-to-noise ratio on the ratio statistics constitutes the main theme of the paper.

3. A quality metric is formulated for spots.

Ratio Statistics

Ratio Statistics assuming a constant coefficient of variation

Consider a microarray having n genes, with red and green fluorescent expression values labeled by $R_1, R_2, \ldots, R_n$ and $G_1, G_2, \ldots, G_n$, respectively.

Hypothesis test:

$$H_0: \mu_{R_k} = \mu_{G_k}, \qquad H_1: \mu_{R_k} \neq \mu_{G_k}$$

Assumption:

$$\sigma_{R_k} = c\,\mu_{R_k}, \qquad \sigma_{G_k} = c\,\mu_{G_k}, \qquad \mu_{R_k} = \mu_{G_k} \ \text{under } H_0$$

Ratio Statistics assuming a constant coefficient of variation (cont.)

Ratio test statistic:

$$T_k = R_k / G_k$$

Assuming $R_k$ and $G_k$ to be normally and identically distributed, $T_k$ has the density function

$$f_{T_k}(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c\,(1+t^2)^2\sqrt{2\pi}} \exp\left[-\frac{(t-1)^2}{2c^2(1+t^2)}\right],$$

and the parameter $c$ is estimated from $n$ observed ratios $t_1, \ldots, t_n$ by

$$\hat{c} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\frac{(t_i-1)^2}{t_i^2+1}}$$
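As a minimal sketch, the density $f_{T_k}(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c(1+t^2)^2\sqrt{2\pi}}\exp[-\frac{(t-1)^2}{2c^2(1+t^2)}]$ and the estimator $\hat{c} = \sqrt{\frac{1}{n}\sum_i (t_i-1)^2/(t_i^2+1)}$ could be coded as follows. The function names `ratio_density` and `estimate_c` are illustrative and do not come from the paper.

```python
import math


def ratio_density(t: float, c: float) -> float:
    """Density of T = R/G under H0, assuming a constant coefficient of variation c."""
    q = 1.0 + t * t
    prefactor = (1.0 + t) * math.sqrt(q) / (c * q * q * math.sqrt(2.0 * math.pi))
    return prefactor * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * q))


def estimate_c(ratios):
    """Estimate c from observed ratios t_i, e.g. ratios of housekeeping genes."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    total = sum((t - 1.0) ** 2 / (t * t + 1.0) for t in ratios)
    return math.sqrt(total / len(ratios))


# Hypothetical ratios, for illustration only.
example_ratios = [0.9, 1.1, 1.3, 0.8, 1.0]
c_hat = estimate_c(example_ratios)
print(c_hat, ratio_density(1.0, c_hat))
```

Ratios close to 1 give a small $\hat{c}$, and a smaller $c$ concentrates the density more tightly around $t = 1$.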

Ratio Statistics assuming a constant coefficient of variation (cont.)

Self-self experiment

Duplicate

$$T = t_k / t_k', \qquad \log T = \log t_k - \log t_k' = (\log R_k - \log R_k') - (\log G_k - \log G_k'),$$

where, to first order under a constant cv, each log intensity has standard deviation $\sigma_{\log R_k} \approx \sigma_{R_k}/\mu_{R_k} = c$.

Ratio Statistics assuming a constant coefficient of variation (cont.)

Confidence interval

1. The confidence interval is obtained by integrating the ratio density function.

2. The C.I. is determined by the parameter c; one can use either the parameter derived from pre-selected housekeeping genes or the parameter derived from a set of duplicate genes.
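The integration step can be sketched numerically. The example below assumes the constant-cv density $f(t; c) = \frac{(1+t)\sqrt{1+t^2}}{c(1+t^2)^2\sqrt{2\pi}}\exp[-\frac{(t-1)^2}{2c^2(1+t^2)}]$ on $t \ge 0$, integrates it with the trapezoidal rule, and bisects for the two tail quantiles. The helper names, the step count, and the search bound `t_max` are illustrative choices rather than the authors' procedure.

```python
import math


def ratio_density(t, c):
    """Constant-cv ratio density f(t; c)."""
    q = 1.0 + t * t
    return ((1.0 + t) * math.sqrt(q) / (c * q * q * math.sqrt(2.0 * math.pi))
            * math.exp(-((t - 1.0) ** 2) / (2.0 * c * c * q)))


def ratio_cdf(t, c, steps=2000):
    """Trapezoidal approximation of the integral of the density over [0, t]."""
    if t <= 0.0:
        return 0.0
    h = t / steps
    total = 0.5 * (ratio_density(0.0, c) + ratio_density(t, c))
    total += sum(ratio_density(i * h, c) for i in range(1, steps))
    return total * h


def confidence_interval(c, level=0.99, t_max=100.0):
    """Bisect the numerical CDF for the lower and upper tail quantiles."""
    tail = (1.0 - level) / 2.0

    def quantile(prob):
        lo, hi = 0.0, t_max
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            if ratio_cdf(mid, c) < prob:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return quantile(tail), quantile(1.0 - tail)


lower, upper = confidence_interval(0.2)
print(lower, upper)
```

Because $T$ and $1/T$ have the same distribution under this density, the two limits should be close to reciprocals of each other, which is a useful sanity check on the numerical integration.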

Therefore,

$$\sigma^2_{\log T} = \sigma^2_{\log R} + \sigma^2_{\log R'} + \sigma^2_{\log G} + \sigma^2_{\log G'} \approx 4c^2$$
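If the variance of $\log T$ over duplicate measurements is about $4c^2$, then $c$ can be approximated by half the standard deviation of $\log T$. The sketch below assumes natural logarithms, strictly positive intensities, and a population standard deviation. The function name and the input layout (four parallel lists for $R$, $G$, $R'$, $G'$) are illustrative assumptions.

```python
import math
import statistics


def estimate_c_from_duplicates(red, green, red_dup, green_dup):
    """Approximate c as std(log T) / 2, with log T = (log R - log R') - (log G - log G')."""
    log_t = [
        (math.log(r) - math.log(r2)) - (math.log(g) - math.log(g2))
        for r, g, r2, g2 in zip(red, green, red_dup, green_dup)
    ]
    # Population standard deviation of log T; sigma_logT ~ 2c under the constant-cv model.
    return statistics.pstdev(log_t) / 2.0


# Hypothetical duplicate intensities, for illustration only.
red = [1200.0, 800.0, 1500.0, 950.0]
green = [1000.0, 900.0, 1400.0, 1000.0]
red_dup = [1000.0, 850.0, 1300.0, 1100.0]
green_dup = [1100.0, 820.0, 1450.0, 900.0]
print(estimate_c_from_duplicates(red, green, red_dup, green_dup))
```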

Ratio Statistics for low signal-to-noise ratio

The actual expression intensity measurement is of the form

$$R_k = (SR_k + BR_k) - \mu_{BR_k},$$

where $SR_k$ is the expression intensity measurement of gene $k$, $BR_k$ is the fluorescent background level, and $\mu_{BR_k}$ is the mean background level.

Ratio Statistics for low signal-to-noise ratio (cont.)

Null hypothesis of interest:

$$H_0: \mu_{SR_k} = \mu_{SG_k}$$

Because

$$E[R_k] = E[(SR_k + BR_k) - \mu_{BR_k}] = \mu_{SR_k},$$

this is equivalent to $H_0: \mu_{R_k} = \mu_{G_k}$.

Test statistic:

$$T_k = R_k / G_k$$

Ratio Statistics for low signal-to-noise ratio (cont.)

Major difference:

1. The assumption of a constant cv applies to $SR_k$ and $SG_k$, not to $R_k$ and $G_k$.

2. The constant-cv density of $T_k$ is not applicable.

SNR (signal-to-noise ratio)

Assuming that $SR_k$ and $BR_k$ are independent,

$$\sigma^2_{R_k} = \sigma^2_{SR_k} + \sigma^2_{BR_k} = c^2\mu^2_{SR_k} + \sigma^2_{BR_k}$$

$$SNR_{R_k} = \frac{E[SR_k]}{\sqrt{E[(BR_k - \mu_{BR_k})^2]}} = \frac{\mu_{SR_k}}{\sigma_{BR_k}}$$

$$c_{R_k} = \frac{\sigma_{R_k}}{\mu_{R_k}} = \frac{\sqrt{c^2\mu^2_{SR_k} + \sigma^2_{BR_k}}}{\mu_{SR_k}} = \sqrt{c^2 + \frac{1}{SNR^2_{R_k}}}$$
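A short numerical sketch shows how the coefficient of variation of the measured intensity, $c_{R_k} = \sqrt{c^2 + 1/SNR_{R_k}^2}$, approaches the signal cv $c$ as the signal-to-noise ratio grows. The function names and the example values are illustrative.

```python
import math


def snr(signal_mean: float, background_std: float) -> float:
    """Signal-to-noise ratio: mean signal divided by background standard deviation."""
    return signal_mean / background_std


def measured_cv(c: float, snr_value: float) -> float:
    """cv of the background-subtracted intensity: sqrt(c^2 + 1 / SNR^2)."""
    return math.sqrt(c * c + 1.0 / (snr_value * snr_value))


# Hypothetical signal means with a background standard deviation of 100.
for signal_mean in (200.0, 500.0, 2000.0, 20000.0):
    value = snr(signal_mean, 100.0)
    print(signal_mean, value, measured_cv(0.2, value))
```

At low SNR the background term dominates, which is why the constant-cv density no longer describes the ratio.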

The expression intensity scatter plot

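Confidence interval for the test statistics

Assumption: $SR_k$, $SG_k$, $BR_k$, and $BG_k$ are normally distributed and independent.

$$T_k = \frac{R_k}{G_k} = \frac{(SR_k + BR_k) - \mu_{BR_k}}{(SG_k + BG_k) - \mu_{BG_k}}$$

Under $H_0$ ($\mu_{R_k} = \mu_{G_k}$, i.e. $\mu_{SR_k} = \mu_{SG_k} = p$),

$$T \sim \frac{N(p, \sigma_p) + N(\mu_{BR}, \sigma_{BR}) - \mu_{BR}}{N(p, \sigma_p) + N(\mu_{BG}, \sigma_{BG}) - \mu_{BG}},$$

where $N(\mu, \sigma)$ denotes a normal random variable with mean $\mu$ and standard deviation $\sigma$.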

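Confidence interval for the test statistics (cont.)

Under the assumption of constant cv for the signal (without the background),

$$\sigma_p = c\,p$$

With

$$\sigma_B = \max\{\sigma_{BR}, \sigma_{BG}\} \quad \text{(variance parameter)},$$

$$s = p / \sigma_B \quad \text{(signal-to-noise ratio)},$$

$$\sigma_{BR} / \sigma_{BG} \quad \text{(background std ratio)},$$

the test statistic becomes

$$T \sim \frac{N(s, cs) + N(0, \sigma_{BR}/\sigma_B)}{N(s, cs) + N(0, \sigma_{BG}/\sigma_B)}$$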
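The interval for this normalized statistic has no simple closed form, so one way to explore it is Monte Carlo simulation. The sketch below draws $T = [N(s, cs) + N(0, \sigma_{BR}/\sigma_B)] / [N(s, cs) + N(0, \sigma_{BG}/\sigma_B)]$ and reads off empirical quantiles. The sample size, the seed, and the function name are illustrative assumptions, and the paper may compute the interval differently.

```python
import random


def simulate_ratio_interval(s, c=0.2, std_br=1.0, std_bg=1.0,
                            level=0.99, n=100_000, seed=0):
    """Empirical confidence interval of T for signal-to-noise ratio s."""
    rng = random.Random(seed)
    sigma_b = max(std_br, std_bg)
    ratios = []
    for _ in range(n):
        # Second argument of gauss() is a standard deviation.
        num = rng.gauss(s, c * s) + rng.gauss(0.0, std_br / sigma_b)
        den = rng.gauss(s, c * s) + rng.gauss(0.0, std_bg / sigma_b)
        if den != 0.0:  # guard against an exact zero denominator
            ratios.append(num / den)
    ratios.sort()
    tail = (1.0 - level) / 2.0
    lower = ratios[int(tail * len(ratios))]
    upper = ratios[int((1.0 - tail) * len(ratios)) - 1]
    return lower, upper


for s in (2.0, 5.0, 20.0, 1000.0):
    print(s, simulate_ratio_interval(s))
```

For large $s$ the interval should approach the constant-cv interval, while for small $s$ it widens sharply because the background noise dominates both channels.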

The 99% confidence interval for the ratio statistic, $c = 0.2$: (a) $\sigma_{BR}/\sigma_{BG} = 100$ (or $\gg 1$); (b) $\sigma_{BR}/\sigma_{BG} = 1$

Correction of background estimation

Owing to interaction between the fluorescent signal and background, local-background estimation is often biased.

To estimate the bias difference, we find the relationship between the red and green intensities under the null hypothesis by assuming a linear relation, G = aR+b.

Correction of background estimation (cont.)

Simulation

1. Generate 10,000 data points from an exponential distribution with mean 2,000 to simulate 10,000 gene expression levels.

2. Simulate the intensity measurement for each channel with a normal distribution whose mean is the intensity drawn from the exponential distribution and whose cv is a constant 0.2.

3. Simulate the background level with a normal distribution:

(1) no bias: background level ~ N(0, 100)

(2) some bias: background level ~ N(b, 100)

$$T \sim \frac{N(p, \sigma_p) + N(0, \sigma_{BR})}{N(p, \sigma_p) + N(0, \sigma_{BG})}$$
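The simulation steps above can be sketched as follows. Several details are assumptions made for illustration: the value 100 is treated as the background standard deviation, both channels share the same true expression level (the null hypothesis), and the bias $b$ is added to the green channel only. The function name and seed are hypothetical.

```python
import random


def simulate_channels(n=10_000, mean_expr=2000.0, cv=0.2,
                      bg_std=100.0, bias=0.0, seed=0):
    """Simulate red/green intensities under H0 with an optional background bias."""
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n):
        # Step 1: true expression level from an exponential distribution.
        level = rng.expovariate(1.0 / mean_expr)
        # Step 2: per-channel signal with constant cv.
        # Step 3: additive residual background, biased in the green channel.
        r = rng.gauss(level, cv * level) + rng.gauss(0.0, bg_std)
        g = rng.gauss(level, cv * level) + rng.gauss(bias, bg_std)
        red.append(r)
        green.append(g)
    return red, green


red_nb, green_nb = simulate_channels(bias=0.0)
red_b, green_b = simulate_channels(bias=500.0)
mean_diff_nb = sum(g - r for r, g in zip(red_nb, green_nb)) / len(red_nb)
mean_diff_b = sum(g - r for r, g in zip(red_b, green_b)) / len(red_b)
print(mean_diff_nb, mean_diff_b)
```

Plotting `green` against `red` on logarithmic axes for the biased case is one way to look for the bend at low intensities.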

Scatter plot of simulated expression data: (a) 10,000 data points with no bias from background estimation; (b) 10,000 data points with a background estimation bias of 500. Annotated feature: dog-leg effect.

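Correction of background estimation (cont.)

To fit $G = aR + b$, we employ a chi-square fitting method that minimizes

$$\chi^2 = \sum_{k=1}^{N} \frac{(G_k - (aR_k + b))^2}{\sigma^2_{R_k} + \sigma^2_{G_k}}$$

The resulting estimate of the bias is

$$\hat{b} = \frac{\sum_{k=1}^{N} (G_k - R_k)\left(c^2(R_k^2 + G_k^2) + 2\hat{\sigma}^2_{BR} + 2\hat{\sigma}^2_{BG}\right)^{-1}}{\sum_{k=1}^{N} \left(c^2(R_k^2 + G_k^2) + 2\hat{\sigma}^2_{BR} + 2\hat{\sigma}^2_{BG}\right)^{-1}}$$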
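A minimal sketch of the bias estimate, read here as a weighted mean of $G_k - R_k$ with weights $1/(c^2(R_k^2 + G_k^2) + 2\hat{\sigma}^2_{BR} + 2\hat{\sigma}^2_{BG})$. This corresponds to fixing the slope at $a = 1$, which is an assumption of the sketch. The function and parameter names are illustrative.

```python
def estimate_background_bias(red, green, c, var_br, var_bg):
    """Weighted mean of (G_k - R_k); var_br and var_bg are background variance estimates."""
    numerator = 0.0
    denominator = 0.0
    for r, g in zip(red, green):
        # Inverse-variance weight: signal term plus background terms.
        weight = 1.0 / (c * c * (r * r + g * g) + 2.0 * var_br + 2.0 * var_bg)
        numerator += weight * (g - r)
        denominator += weight
    return numerator / denominator


# Hypothetical intensities with a constant offset in the green channel.
red = [100.0, 1000.0, 5000.0]
green = [r + 500.0 for r in red]
print(estimate_background_bias(red, green, 0.2, 100.0 ** 2, 100.0 ** 2))
```

Low-intensity spots get the largest weights, which matches the intuition that a background bias is most visible where the signal is weak.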

Quality Metric for Ratio Statistics

For a given cDNA target, the following factors affect ratio measurement quality:

(1) Weak fluorescent intensities

(2) A smaller than normal detected target area

(3) A very high local background level

(4) A high standard deviation of target intensity

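(1) Fluorescent intensity measurement quality

Under the null hypothesis, the signal means are equal, so that

$$\min\{SNR_R, SNR_G\} = \frac{\mu_R}{\max\{\sigma_{BR}, \sigma_{BG}\}} = \frac{\mu_R}{\sigma_B}$$

We replace $\mu_R$ and $\sigma_B$ by their null-hypothesis estimators, $(R+G)/2$ and $\hat{\sigma}_B$, to obtain

$$w_I = \begin{cases} 0, & \dfrac{R+G}{2\hat{\sigma}_B} < 3 \\[2ex] \dfrac{R+G}{6\hat{\sigma}_B} - 1, & 3 \le \dfrac{R+G}{2\hat{\sigma}_B} < 6 \\[2ex] 1, & \text{otherwise} \end{cases}$$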
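A sketch of the intensity quality weight, assuming the piecewise form with thresholds 3 and 6 on the estimated signal-to-noise ratio $(R+G)/(2\hat{\sigma}_B)$ and a linear ramp between them. The function name and the example intensities are illustrative.

```python
def intensity_quality(r: float, g: float, sigma_b_hat: float) -> float:
    """Quality weight w_I: 0 below SNR 3, linear ramp on [3, 6), 1 from SNR 6 upward."""
    snr_hat = (r + g) / (2.0 * sigma_b_hat)
    if snr_hat < 3.0:
        return 0.0
    if snr_hat < 6.0:
        return snr_hat / 3.0 - 1.0  # equals (R + G) / (6 * sigma_b_hat) - 1
    return 1.0


# Hypothetical spots with an estimated background standard deviation of 100.
for r, g in ((200.0, 200.0), (450.0, 450.0), (1000.0, 1000.0)):
    print(r, g, intensity_quality(r, g, 100.0))
```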

(2) Target area measurement quality

Let $A_M$ be the area of the mask of the cDNA target for a particular print-tip, and let $A_{T_k}$ be the area of the two largest connected components of the target. The proportional area of each target is

$$a_k = A_{T_k} / A_M$$

We define the area measurement quality by

$$w_a = \begin{cases} 0, & a_k < s_{\min} = \max\{10/A_M,\ 0.05\} \\[1ex] \dfrac{a_k - s_{\min}}{s_b - s_{\min}}, & s_{\min} \le a_k < s_b = 0.20 \\[2ex] 1, & \text{otherwise} \end{cases}$$

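(3) Background flatness quality

Define background flatness

$$w_b = \min\{w_{BR}, w_{BG}\},$$

where

$$w_{BR} = \begin{cases} 1, & BR_k - \mu_{BR} < 4\sigma_{BR} \\[1ex] \dfrac{6\sigma_{BR} - (BR_k - \mu_{BR})}{3\sigma_{BR}}, & 4\sigma_{BR} \le BR_k - \mu_{BR} < 6\sigma_{BR} \\[2ex] 0, & BR_k - \mu_{BR} \ge 6\sigma_{BR} \end{cases}$$

and $w_{BG}$ is defined similarly.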

(4) Signal intensity consistency quality

Typical target shapes (figure), with intensity cv values: cv=0.48, cv=0.45, cv=0.31; cv=0.81, cv=0.98, cv=0.59

(4) Signal intensity consistency quality (cont.)

Letting $cv_{\min,k}$ denote the minimum of the intensity coefficients of variation for the red and green channels,

$$w_s = \begin{cases} 1, & cv_{\min,k} < 0.9 \\[1ex] 1 - \dfrac{cv_{\min,k} - 0.9}{0.2}, & 0.9 \le cv_{\min,k} < 1.1 \\[2ex] 0, & cv_{\min,k} \ge 1.1 \end{cases}$$