Bias, Variance, and Fit for Three Measures of Expression: AvDiff, Li & Wong’s, and AvLog(PM-BG)

Post on 06-Feb-2016


Rafael A. Irizarry

Department of Biostatistics, JHU

(joint work with Bridget Hobbs and Terry Speed, Walter & Eliza Hall Institute of Medical Research)

Summary

• Summarize the expression level of a probe set by Average Log2(PM-BG)

• PMs need to be normalized

• Background makes no use of probe-specific MM

• Evaluate and compare to AvDiff and the Li & Wong algorithm through bias, variance and model fit

• Use Gene Logic spike-in and dilution study

• All three expression measures performed well

• AvLog(PM-BG) is arguably the best of the three

SD vs. Avg of Defective Probes

Normalization at Probe Level

Expression after Normalization

Background Distribution

Average Log2(PM-BG)

• Normalize probe level data

• Compute BG = background mean by estimating the mode of the MM distribution

• Subtract BG from each PM

• If PM-BG < 0 use minimum of positives divided by 2

• Take log2 of each result and average over the probe set
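
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the slides do not say how the mode of the MM distribution is estimated, so the histogram-based `estimate_mode` is a stand-in; the function names are hypothetical; the probe-level data are assumed to be normalized already; and the "minimum of positives" is taken within the probe set passed in, which the slides do not specify. Values with PM-BG exactly 0 are treated like negative ones because the log is undefined there.

```python
import numpy as np

def estimate_mode(values, bins=100):
    """Crude mode estimate: midpoint of the fullest histogram bin (assumed method)."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])

def avlog_pm_bg(pm, bg):
    """Average log2(PM - BG) for one probe set, given a global background BG."""
    diff = np.asarray(pm, dtype=float) - bg
    positive = diff[diff > 0]
    if positive.size == 0:
        raise ValueError("no PM value exceeds the background estimate")
    # Non-positive differences are replaced by half the smallest positive one.
    floor = positive.min() / 2.0
    adjusted = np.where(diff > 0, diff, floor)
    return float(np.mean(np.log2(adjusted)))
```

For example, with a background of 100 and PM values 108, 132, 90 and 356, the differences are 8, 32, -10 and 256; the negative value is replaced by 8/2 = 4, and the average of log2 over 8, 32, 4 and 256 is 4.5.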

Spike-In Experiments

• Add concentrations (0.5 pM – 100 pM) of 11 foreign species cRNAs to hybridization mixture

• Set A: 11 control cRNAs were spiked in, all at the same concentration, which varied across chips.

• Set B: 11 control cRNAs were spiked in, all at different concentrations, which varied across chips. The concentrations were arranged in 12x12 cyclic Latin square (with 3 replicates)
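
As a reminder of what a cyclic Latin square looks like, the sketch below builds one of size n, where each row is the previous row shifted by one position. The mapping of rows, columns and entries to chips, transcripts and concentration levels is not given on the slide, so the function only produces level indices 0 to n-1; the name `cyclic_latin_square` is hypothetical.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square of level indices 0..n-1.

    Entry (row, col) is (row + col) mod n, so every index appears exactly
    once in each row and exactly once in each column.
    """
    return [[(row + col) % n for col in range(n)] for row in range(n)]

square = cyclic_latin_square(12)
```

The property that matters for a design of this kind is that every level index occurs once per row and once per column, so levels are balanced across both factors.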

Why Remove Background?

Probe Level Data (12 chips)

What Did We Learn?

• Don’t subtract or divide by MM

• Probe effect is additive on log scale

• Take logs

Expression Level

Spike-In B

Gene      Conc 1 (pM)   Conc 2 (pM)   Rank
BioB-5    100           0.5           1
BioB-3    0.5           25.0          2
BioC-5    2.0           75.0          3
BioB-M    1.0           35.7          4
BioDn-3   1.5           50.0          5
DapX-3    35.7          3.0           6
CreX-3    50.0          5.0           7
CreX-5    12.5          2.0           8
BioC-3    25.0          100           9
DapX-5    5.0           1.5           10
DapX-M    3.0           1.0           11

Later we consider 24 different combinations of concentrations
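
The slide does not explain the Rank column. It appears consistent with ordering the 11 spiked-in transcripts by the magnitude of the true fold change between the two concentrations, regardless of direction. The sketch below checks that reading on the tabulated values; the variable and function names are hypothetical.

```python
import math

# (gene, concentration 1, concentration 2) as listed in the Spike-In B table
spike_in_b = [
    ("BioB-5", 100.0, 0.5), ("BioB-3", 0.5, 25.0), ("BioC-5", 2.0, 75.0),
    ("BioB-M", 1.0, 35.7), ("BioDn-3", 1.5, 50.0), ("DapX-3", 35.7, 3.0),
    ("CreX-3", 50.0, 5.0), ("CreX-5", 12.5, 2.0), ("BioC-3", 25.0, 100.0),
    ("DapX-5", 5.0, 1.5), ("DapX-M", 3.0, 1.0),
]

def abs_log2_ratio(conc1, conc2):
    """Magnitude of the true log2 fold change, ignoring direction."""
    return abs(math.log2(conc2 / conc1))

# Sort from largest to smallest true fold change.
ranked = sorted(spike_in_b, key=lambda r: abs_log2_ratio(r[1], r[2]), reverse=True)
ranked_genes = [gene for gene, _, _ in ranked]
```

Under this reading, `ranked_genes` reproduces the table order from BioB-5 (a 200-fold change) down to DapX-M (a 3-fold change).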

Differential Expression

Observed vs True Ratio

Dilution Experiment

• cRNA hybridized to human chip (HGU_95) in range of proportions and dilutions

• Dilution series begins at 1.25 μg cRNA per GeneChip array, and rises through 2.5, 5.0, 7.5, 10.0, to 20.0 μg per array. 5 replicate chips were used at each dilution

• Normalize just within each set of 5 replicates

• For each probe set compute expression, average and SD over replicates, and fit a line to log expression vs. log concentration

• Regression line should have slope 1 and high R²
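
The regression step for a single probe set could be sketched as follows. This is a minimal illustration using an ordinary least-squares fit from NumPy; the slides do not state the fitting procedure, and the function name and example values are hypothetical. The input expression values are assumed to be on the log2 scale already, as AvLog(PM-BG) is.

```python
import numpy as np

def slope_and_r2(amounts, log2_expression):
    """Fit log2 expression against log2 amount; return (slope, R^2)."""
    x = np.log2(np.asarray(amounts, dtype=float))
    y = np.asarray(log2_expression, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return float(slope), float(1.0 - ss_res / ss_tot)

# Amounts of cRNA per array in the dilution series
amounts = [1.25, 2.5, 5.0, 7.5, 10.0, 20.0]
# Made-up values for an ideal measure: expression exactly proportional to amount
ideal = np.log2(amounts) + 6.0
slope, r2 = slope_and_r2(amounts, ideal)
```

For the idealized input, the slope and R² both come out at 1 up to rounding. A measure that compresses fold changes would give a slope below 1.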

Dilution Experiment Data

Expression and SD

Slope Estimates and R²

Model check

• Compute observed SD of 5 replicate expression estimates

• Compute RMS of 5 nominal SDs

• Compare by taking the log ratio

• Closeness of observed and nominal SD taken as a measure of goodness of fit of the model
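
The comparison above can be sketched for one probe set as follows. The assumptions are that "nominal SD" means the standard error reported by the model for each replicate chip, that the observed SD uses the usual n-1 denominator, and that the log is base 2; none of these is stated on the slide. The function name is hypothetical.

```python
import numpy as np

def sd_log_ratio(estimates, nominal_sds):
    """log2 of (observed SD across replicates) / (RMS of the nominal SDs)."""
    estimates = np.asarray(estimates, dtype=float)
    nominal_sds = np.asarray(nominal_sds, dtype=float)
    observed_sd = np.std(estimates, ddof=1)           # sample SD of the replicates
    rms_nominal = np.sqrt(np.mean(nominal_sds ** 2))  # root mean square of model SDs
    return float(np.log2(observed_sd / rms_nominal))
```

A value near 0 means the model's nominal SD matches the replicate-to-replicate variability; a positive value means the model understates the variability, and a negative value means it overstates it.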

Observed vs. Model SE

Observed vs. Model SE

Conclusion

• Take logs

• PMs need to be normalized

• Using global background improves on use of probe-specific MM

• Gene Logic spike-in and dilution study show all three expression measures performed very well

• AvLog(PM-BG) is arguably the best in terms of bias, variance and model fit

• Future: better BG; robust/resistant summaries

Acknowledgements

• Gene Brown’s group at Wyeth/Genetics Institute, and Uwe Scherf’s Genomics Research & Development Group at Gene Logic, for generating the spike-in and dilution data

• Gene Logic for permission to use these data

• Francois Collin (Gene Logic)

• Ben Bolstad (UC Berkeley)

• Magnus Åstrand (Astra Zeneca Mölndal)