RNASeq DE methods review Applied Bioinformatics Journal Club

Applied Bioinformatics Journal Club, Wednesday, March 5

Background

• Comparison of commonly used DE software packages
  – Cuffdiff
  – edgeR
  – DESeq
  – PoissonSeq
  – baySeq
  – limma

• Two benchmark datasets
  – Sequencing Quality Control (SEQC) dataset
    • Includes qRT-PCR for 1,000 genes
  – Biological replicates from 3 cell lines as part of ENCODE project

Focus of paper: Comparison of relevant measures for DE detection

• Normalization of count data

• Sensitivity and specificity of DE detection

• Genes expressed in one condition but no expression in the other condition

• Sequencing depth and number of replicates

Theoretical background

• Count matrix: number of reads assigned to gene i in sequencing experiment j

• Length bias when measuring gene expression by RNA-seq
  – Reduces the ability to detect differential expression among shorter genes

• Differential gene expression analysis consists of 3 components:
  – Normalization of counts
  – Parameter estimation of the statistical model
  – Tests for differential expression

Normalization

• Commonly used
  – RPKM
  – FPKM
  – Biases: proportional representation of each gene is dependent on expression levels of other genes

• DESeq: scaling-factor-based normalization
  – Per-sample median, over genes, of the ratio of each gene's read count to its geometric mean across all samples

• Cuffdiff: extension of DESeq normalization
  – Intra-condition library scaling
  – Second scaling between conditions
  – Also accounts for changes in isoform levels
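
The DESeq-style scaling factor described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the package's own code: the function name `median_of_ratios_size_factors` and the toy count matrix are assumptions made for the example, and genes with a zero count in any sample are simply skipped so that the geometric mean is defined.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample scaling factors from a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Assumption: keep only genes with nonzero counts in every sample,
    # so that the logarithm (and the geometric mean) is finite.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across all samples.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Median over genes of the log ratios, mapped back to the count scale.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
counts = np.array([[10, 20], [100, 200], [30, 60], [0, 5]])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors
```

Dividing each column by its factor puts the samples on a common scale. In the toy data the two factors differ by a factor of two, matching the difference in depth.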

Normalization

• edgeR
  – Trimmed mean of M values (TMM)
  – Weighted average of a subset of genes (excluding genes with high average read counts and genes with large differences in expression)

• baySeq
  – Sums gene counts up to the upper 25% quantile to normalize library size

• PoissonSeq
  – Goodness-of-fit estimate defines a gene set that is least differentiated between 2 conditions; this set is then used to compute library normalization factors
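
As a rough illustration of the quantile-based library size described for baySeq, the sketch below sums, for each sample, only the counts at or below that sample's 75th percentile. The function name `quantile_library_sizes`, the exclusion of zero counts, and the toy matrix are assumptions made for this example, not the package's implementation. It also assumes every sample has at least one nonzero count.

```python
import numpy as np

def quantile_library_sizes(counts, q=0.75):
    """Library size per sample from counts at or below the q-th quantile."""
    counts = np.asarray(counts, dtype=float)
    sizes = []
    for sample in counts.T:  # iterate over samples (columns)
        nonzero = sample[sample > 0]  # assumption: ignore unexpressed genes
        cutoff = np.quantile(nonzero, q)
        # Leave out the most highly expressed genes above the cutoff.
        sizes.append(nonzero[nonzero <= cutoff].sum())
    return np.array(sizes)

# Toy data (hypothetical): one very highly expressed gene in sample 1.
counts = np.array([[1, 2], [2, 4], [3, 6], [4, 8], [1000, 10]])
lib_sizes = quantile_library_sizes(counts)
```

The single gene with 1000 reads does not affect the library size of sample 1, which is the point of trimming the top of the distribution.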

Normalization

• limma (2 normalization procedures)
  – Quantile normalization: sorts counts from each sample and sets the values to be equal to the quantile mean from all samples
  – Voom: LOWESS regression to estimate the mean-variance relation; transforms read counts to log form for linear modeling
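
The quantile normalization step above can be sketched as follows. This is a minimal NumPy version under simplifying assumptions: the function name `quantile_normalize` and the toy matrix are made up for the example, and ties within a sample are broken by position rather than averaged.

```python
import numpy as np

def quantile_normalize(values):
    """Quantile-normalize a genes x samples matrix."""
    values = np.asarray(values, dtype=float)
    # Rank of each gene within its own sample (0 = smallest).
    order = np.argsort(values, axis=0)
    ranks = np.argsort(order, axis=0)
    # Mean of the sorted values at each rank across all samples.
    quantile_means = np.sort(values, axis=0).mean(axis=1)
    # Replace every value by the mean that corresponds to its rank.
    return quantile_means[ranks]

# Toy data (hypothetical): 4 genes, 3 samples, no ties.
values = np.array([[5, 4, 3], [2, 1, 4], [3, 6, 6], [4, 2, 8]])
normalized = quantile_normalize(values)
```

After normalization every sample contains the same set of values, only in a different gene order.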

Statistical modeling of gene expression

• edgeR and DESeq
  – Negative binomial distribution (estimation of dispersion factor)

• edgeR
  – Estimation of dispersion factor as weighted combination of 2 components
    • Gene-specific dispersion effect and common dispersion effect calculated for all genes

Statistical modeling of gene expression

• DESeq
  – Breaks the variance estimate into a combination of a Poisson estimate and a second term that models biological variability

• Cuffdiff
  – Separate variance models for single-isoform and multiple-isoform genes
    • Single isoform: similar to DESeq
    • Multiple isoform: mixed model of negative binomial and beta distributions
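
The negative binomial variance used by edgeR and DESeq is commonly written as a Poisson term plus a dispersion term, variance = mean + dispersion × mean². The sketch below shows that decomposition, together with a simple method-of-moments dispersion estimate. The moment estimator is swapped in here for brevity; the packages themselves use more elaborate estimators that share information across genes. The function names are illustrative, and the estimator assumes a positive mean.

```python
import numpy as np

def nb_variance(mean, dispersion):
    """Poisson variance (mean) plus a biological term (dispersion * mean^2)."""
    return mean + dispersion * mean ** 2

def moment_dispersion(counts):
    """Method-of-moments dispersion for one gene's normalized replicate counts."""
    counts = np.asarray(counts, dtype=float)
    mean = counts.mean()  # assumed to be > 0
    var = counts.var(ddof=1)  # sample variance across replicates
    # Variance in excess of the Poisson expectation, floored at zero.
    return max((var - mean) / mean ** 2, 0.0)

example_variance = nb_variance(100.0, 0.1)
example_dispersion = moment_dispersion([80, 100, 120])
```

A dispersion of zero reduces the model to a Poisson distribution, which is why replicates with little biological variability give estimates near zero.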

Statistical modeling of gene expression

• baySeq
  – Full Bayesian model of negative binomial distributions
  – Prior probability parameters are estimated by numerical sampling of the data

• PoissonSeq
  – Models gene counts as a Poisson variable
  – Mean of distribution represented by log-linear relationship of library size, expression of gene, and correlation of gene with condition
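
The log-linear relationship described for PoissonSeq can be written out as below. This is a sketch using the indexing from the count matrix slide (gene i, sequencing experiment j). The symbols N, d, β, γ and y are notation assumed for this illustration and do not appear on the slides.

```latex
N_{ij} \sim \mathrm{Poisson}(\mu_{ij}), \qquad
\log \mu_{ij} = \log d_j + \log \beta_i + \gamma_i \, y_j
```

Here d_j stands for the library size of experiment j, β_i for the expression level of gene i, y_j for the condition label of experiment j, and γ_i for the correlation of gene i with the condition. Under this form, γ_i = 0 corresponds to a gene that is not differentially expressed.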

Test for differential expression

• edgeR and DESeq
  – Variation of Fisher exact test modified for negative binomial distribution
  – Returns exact P value from derived probabilities

• Cuffdiff
  – Ratio of normalized counts between 2 conditions (log of the ratio approximately follows a normal distribution)
  – t-test to calculate P value

Test for differential expression

• limma
  – Moderated t-statistic with modified standard error and degrees of freedom

• baySeq
  – Estimates 2 models for every gene
    • No differential expression
    • Differential expression
  – Posterior likelihood of DE given the data is used to identify differentially expressed genes (no P value)
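
The two-model comparison described for baySeq comes down to Bayes' rule. The sketch below is only an illustration of that rule. The function name `posterior_de` and its inputs are hypothetical: it takes the marginal likelihoods of one gene's data under each model as given, whereas estimating them from the data is the hard part that baySeq itself handles.

```python
def posterior_de(lik_de, lik_null, prior_de=0.5):
    """Posterior probability of the DE model for one gene via Bayes' rule.

    lik_de / lik_null: marginal likelihood of the gene's data under the
    differential-expression and no-differential-expression models.
    prior_de: prior probability that the gene is differentially expressed.
    """
    evidence = prior_de * lik_de + (1.0 - prior_de) * lik_null
    return prior_de * lik_de / evidence

# Hypothetical likelihood values for a single gene.
equal_prior = posterior_de(0.03, 0.01)
skeptical_prior = posterior_de(0.03, 0.01, prior_de=0.1)
```

Genes are then ranked or thresholded on this posterior value rather than on a P value.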

Test for differential expression

• PoissonSeq
  – Test for significance of correlation term
  – Evaluated by score statistics which follow a Chi-squared distribution (used to derive P values)

• Multiple hypothesis corrections
  – Benjamini-Hochberg
  – PoissonSeq: permutation-based FDR
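
The Benjamini-Hochberg correction listed above can be sketched in NumPy as follows. The function name `benjamini_hochberg` and the example P values are assumptions made for the illustration. The permutation-based FDR used by PoissonSeq is a different procedure and is not shown.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)  # indices that sort P values ascending
    # Scale the k-th smallest P value by n / k.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest P value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Hypothetical P values for four genes.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes whose adjusted value falls below the chosen FDR threshold (for example 0.05) are reported as differentially expressed.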

Results

• Normalization and log expression correlation

• Differential expression analysis

• Evaluation of type I errors

• Evaluation of genes expressed in one condition

• Impact of sequencing depth and replication on DE detection
