scRNA-seq: Differential expression analysis methods
Olga Dethlefsen
NBIS National Bioinformatics Infrastructure Sweden
October 2017
Olga (NBIS) scRNA-seq de October 2017 1 34
Outline
- Introduction: what is so special about DE with scRNA-seq
- Common methods: what is out there
- Performance: how to choose the best method
- Summary
- DE tutorial
Introduction
Figure: Simplified scRNA-seq workflow [adapted from http://hemberg-lab.github.io]
Differential expression is an old problem, so why is DE with scRNA-seq different from DE with bulk RNA-seq?
- scRNA-seq data are affected by higher noise (technical and biological factors)
- the low amount of available mRNA results in amplification biases and dropout events (technical)
- 3' bias, partial coverage and uneven depth (technical)
- stochastic nature of transcription (biological)
- multimodality in gene expression: presence of multiple possible cell states within a cell population (biological)
Common methods
- non-parametric tests, e.g. Kruskal-Wallis (generic)
- edgeR, limma (bulk RNA-seq)
- MAST, SCDE, Monocle (scRNA-seq)
- D3E, Pagoda (scRNA-seq)
Table: Information on the gene differential expression analysis methods used [Miao and Zhang, Quantitative Biology 2016, 4]
MAST
- uses a generalized linear hurdle model
- designed to account for stochastic dropouts and bimodal expression distributions, in which expression is either strongly non-zero or non-detectable
- the rate of expression Z and the level of expression Y are modeled for each gene g, with $Z_{ig}$ indicating whether gene g is expressed in cell i (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
- a logistic regression model is used for the discrete variable Z and a Gaussian linear model for the continuous variable (Y | Z = 1):
$$\mathrm{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$$
$$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$$
where $X_i$ is a design matrix
- model parameters are fitted using an empirical Bayesian framework
- allows for a joint estimate of nuisance and treatment effects
- DE is determined using the likelihood ratio test
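The two-part structure of the hurdle model can be sketched for one gene and a simple two-group design. This is a minimal illustration under stated assumptions, not MAST itself: with a single binary covariate the logistic part reduces to group-wise detection rates and the Gaussian part to group-wise means of the non-zero values, and the empirical Bayes step is left out. The function names and the toy values are hypothetical.

```python
import math

def hurdle_fit(values):
    """Fit both hurdle components for one gene in one group of cells.

    values: log-scale expression values, where 0 means 'not detected'.
    Returns (detection rate, mean of non-zero values, variance of non-zero values).
    """
    positive = [v for v in values if v > 0]
    rate = len(positive) / len(values)            # estimate of Pr(Z = 1)
    if not positive:
        return rate, float("nan"), float("nan")
    mean = sum(positive) / len(positive)          # estimate of E(Y | Z = 1)
    var = sum((v - mean) ** 2 for v in positive) / len(positive)
    return rate, mean, var

def logit(p):
    return math.log(p / (1 - p))

# Hypothetical toy data: one gene measured in two groups of cells
group_a = [0.0, 0.0, 2.1, 2.4, 0.0, 1.9, 2.2, 0.0]
group_b = [3.0, 2.8, 0.0, 3.3, 3.1, 2.9, 0.0, 3.2]

rate_a, mean_a, _ = hurdle_fit(group_a)
rate_b, mean_b, _ = hurdle_fit(group_b)

# Discrete part: difference in the log-odds of detection between groups
discrete_effect = logit(rate_b) - logit(rate_a)
# Continuous part: difference in mean expression among expressing cells
continuous_effect = mean_b - mean_a
print(rate_a, rate_b, round(discrete_effect, 3), round(continuous_effect, 3))
```

The point of the sketch is that a gene can differ between groups in how often it is detected, in how strongly it is expressed when detected, or in both, and the hurdle model tests the two parts jointly.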
SCDE
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
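To make the mixture idea concrete, the sketch below computes, for one observed count, the posterior probability that it came from the Poisson (dropout or background) component rather than the NB (amplified) component. This corresponds to one expectation step of an EM fit. It is a simplified illustration and not the SCDE implementation: the mixing weight is held fixed, and all parameter values are hypothetical.

```python
import math

def poisson_pmf(k, rate):
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def nb_pmf(k, mean, size):
    """Negative binomial pmf parameterised by mean and size (dispersion = 1 / size)."""
    p = size / (size + mean)
    log_coef = math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
    return math.exp(log_coef + size * math.log(p) + k * math.log(1 - p))

def dropout_posterior(count, dropout_weight, background_rate, mean, size):
    """Posterior probability that `count` was generated by the dropout component."""
    dropout = dropout_weight * poisson_pmf(count, background_rate)
    amplified = (1 - dropout_weight) * nb_pmf(count, mean, size)
    return dropout / (dropout + amplified)

# Hypothetical parameters: 30% dropout weight, background rate 0.1,
# amplified component with mean 50 and size 2
for count in (0, 1, 5, 50):
    print(count, dropout_posterior(count, 0.3, 0.1, 50.0, 2.0))
```

A zero count is attributed almost entirely to the dropout component here, while a count of 50 is attributed to the amplified component.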
Monocle
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$
where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level Y is then modeled using the GAM
$$E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$$
where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test
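The likelihood ratio test at the end of this recipe can be sketched with two nested Gaussian models for one gene: a null model with a constant mean and an alternative in which expression depends on pseudo-time. Note the simplification: a straight line stands in for the cubic smoothing function, so this is not a GAM and not Monocle's code. The data and function names are hypothetical.

```python
import math

def rss_intercept(y):
    """Residual sum of squares of the null model (constant mean)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def rss_linear(x, y):
    """Residual sum of squares of a least-squares line y = a + b * x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def lrt_pvalue(x, y):
    """Likelihood ratio statistic and approximate chi-square p-value (1 df)."""
    n = len(y)
    stat = max(n * math.log(rss_intercept(y) / rss_linear(x, y)), 0.0)
    # Survival function of a chi-square with 1 degree of freedom
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Hypothetical gene whose expression rises along pseudo-time
pseudotime = [i / 9 for i in range(10)]
expression = [1.0, 1.3, 1.1, 1.6, 1.8, 1.7, 2.2, 2.1, 2.6, 2.5]
stat, p = lrt_pvalue(pseudotime, expression)
print(stat, p)
```

With a spline in place of the line, the alternative model would have more parameters and the degrees of freedom of the test would change accordingly.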
Let's stop for a minute
Differential expression
Differential expression analysis means taking the normalized read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups
- e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
- or simply checking for differences in distributions
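One way to see what "greater than expected from random variation" means is a permutation test on a single gene: group labels are shuffled many times and the observed difference in means is compared with the shuffled differences. This is a generic sketch for illustration, not one of the methods listed above. The counts, the seed and the number of permutations are hypothetical choices.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_pvalue(group_a, group_b, n_permutations=2000, seed=1):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any link between counts and group labels
        shuffled = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled >= observed:
            hits += 1
    # Add one so that the p-value is never exactly zero
    return (hits + 1) / (n_permutations + 1)

# Hypothetical normalized counts for one gene in two groups of cells
group_a = [5, 7, 6, 8, 5, 7, 6, 9]
group_b = [12, 15, 11, 14, 13, 16, 12, 15]
print(permutation_pvalue(group_a, group_b))
```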
The key
$$\mathrm{Outcome}_i = (\mathrm{Model}_i) + \mathrm{error}_i$$
- we collect data on a sample from a much larger population
- statistics lets us make inferences about the population from which the sample was derived
- we try to predict the outcome given a model fitted to the data
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Figure: histogram of height [cm] (x-axis) against frequency (y-axis)
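The t statistic can be computed directly from two samples. The sketch below assumes that $s_p$ denotes the usual pooled standard deviation, which the slide does not spell out. The height values are hypothetical.

```python
import math

def pooled_t(sample_1, sample_2):
    """Two-sample t statistic with a pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = sum(sample_1) / n1
    mean_2 = sum(sample_2) / n2
    ss_1 = sum((v - mean_1) ** 2 for v in sample_1)
    ss_2 = sum((v - mean_2) ** 2 for v in sample_2)
    # Pooled standard deviation: both samples share one variance estimate
    s_p = math.sqrt((ss_1 + ss_2) / (n1 + n2 - 2))
    return (mean_1 - mean_2) / (s_p * math.sqrt(1 / n1 + 1 / n2))

# Hypothetical heights in cm for two groups
heights_1 = [178, 182, 175, 180, 185]
heights_2 = [165, 170, 168, 172, 160]
t = pooled_t(heights_1, heights_2)
print(t)
```

The statistic follows the pattern of the slide before it: the difference in means is the model effect and the denominator measures the error.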
Simple recipe
- model, e.g. gene expression with random error
- fit model to the data and/or data to the model; estimate model parameters
- use model for prediction and/or inference
The key: MAST, SCDE and Monocle (again)
Each of the three methods follows the same recipe with a different model: MAST fits a generalized linear hurdle model, SCDE a mixture of negative binomial and Poisson distributions, and Monocle a generalized additive model.
The key implication
Implication: the better the model fits the data, the better the statistics
Figure: example read count distributions in three panels (Negative Binomial, Zero-inflated NB, Poisson-Beta), each showing frequency (y-axis) against read counts (x-axis)
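The three distributions named in the figure can be simulated to see how they differ, in particular in the share of zeros. This is a sketch with hypothetical parameters: the NB is drawn as a gamma-Poisson mixture, the zero-inflated NB adds extra zeros on top, and the Poisson-Beta draws its Poisson rate from a scaled Beta distribution.

```python
import math
import random

def sample_poisson(rng, rate):
    """Draw one Poisson count with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_nb(rng, mean, size):
    """Negative binomial as a gamma-Poisson mixture (variance = mean + mean^2 / size)."""
    return sample_poisson(rng, rng.gammavariate(size, mean / size))

def sample_zinb(rng, mean, size, zero_prob):
    """Zero-inflated NB: an extra zero with probability zero_prob, else an NB draw."""
    return 0 if rng.random() < zero_prob else sample_nb(rng, mean, size)

def sample_poisson_beta(rng, alpha, beta, scale):
    """Poisson-Beta: the Poisson rate is scale times a Beta(alpha, beta) draw."""
    return sample_poisson(rng, scale * rng.betavariate(alpha, beta))

def zero_fraction(counts):
    return sum(1 for c in counts if c == 0) / len(counts)

rng = random.Random(2017)
n_cells = 2000
nb = [sample_nb(rng, 5.0, 2.0) for _ in range(n_cells)]
zinb = [sample_zinb(rng, 5.0, 2.0, 0.4) for _ in range(n_cells)]
pb = [sample_poisson_beta(rng, 0.5, 0.5, 100.0) for _ in range(n_cells)]

for name, counts in (("NB", nb), ("ZINB", zinb), ("Poisson-Beta", pb)):
    print(name, "zero fraction:", zero_fraction(counts), "max:", max(counts))
```

A model that ignores the extra zeros would fit the zero-inflated counts poorly, which is the implication stated on the previous slide.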
Performance
No gold standard
There is no gold standard, no single best solution.
So what do we do? We gather as much evidence as possible.
Get to know your data & wisely choose DE methods
Example data: 46078 genes x 96 cells; 22229 genes with no expression at all
Figure: two histograms for the example data, frequency against read counts and frequency against 0 counts
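Checks of this kind take only a few lines. The sketch below uses a tiny hypothetical count table (genes in rows, cells in columns) and reports which genes are never expressed and how many zero counts each gene has. The gene names and values are made up for illustration.

```python
# Hypothetical count table: gene name -> read counts across six cells
counts = {
    "gene_a": [0, 0, 0, 0, 0, 0],
    "gene_b": [12, 0, 7, 0, 0, 30],
    "gene_c": [150, 98, 0, 210, 175, 120],
    "gene_d": [0, 0, 0, 1, 0, 0],
}

# Genes with no expression in any cell
never_expressed = [gene for gene, row in counts.items() if all(c == 0 for c in row)]

# Number of cells with a zero count, per gene
zeros_per_gene = {gene: sum(1 for c in row if c == 0) for gene, row in counts.items()}

print("genes with no expression at all:", never_expressed)
print("zero counts per gene:", zeros_per_gene)
```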
Learn from methodological papers and/or past studies
e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017, Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods
- 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
- 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
- 2000 genes were simulated as differentially expressed, according to four types of differential expression
- real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
- real dataset: 80 single cells as negative control
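Simulated data are useful because the truth is known, so the calls of a DE method can be scored against it. The sketch below shows one common way of doing this with true and false positive rates. It is an illustration only and not necessarily the evaluation used in the paper. The gene identifiers and calls are hypothetical.

```python
def ratio(numerator, denominator):
    """Safe division that returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")

def score_calls(truly_de, called_de, all_genes):
    """Compare the genes called DE by a method with the simulated truth."""
    tp = len(truly_de & called_de)        # DE genes that were called
    fp = len(called_de - truly_de)        # non-DE genes that were called
    fn = len(truly_de - called_de)        # DE genes that were missed
    tn = len(all_genes) - tp - fp - fn    # non-DE genes that were not called
    return {
        "TPR": ratio(tp, tp + fn),
        "FPR": ratio(fp, fp + tn),
        "precision": ratio(tp, tp + fp),
    }

# Hypothetical miniature of a simulation: 10 genes, 2 of them truly DE
all_genes = {f"g{i}" for i in range(10)}
truly_de = {"g0", "g1"}
called_de = {"g0", "g5"}

scores = score_calls(truly_de, called_de, all_genes)
print(scores)
```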
Compare methods
e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
Stay critical
Summary
- scRNA-seq is a rapidly growing field
- DE is a common task, so many newer and better methods will be developed
- think like a statistician: get to know your data, think about the distributions and models best for your data; avoid applying methods blindly
- comparing methods is good, as long as you are aware of what you are comparing and why
- stay critical
DE tutorial
The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196, Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells
- check for differentially expressed genes between 8-cell and 16-cell stage embryos
- with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
- and compare the results, trying to decide on the best DE method for the dataset
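When comparing the results of several methods, a simple first step is to look at how much their lists of DE genes overlap. The sketch below computes pairwise Jaccard indices and the genes called by every method. The method labels and gene lists are placeholders, not results from the tutorial dataset.

```python
from itertools import combinations

def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

# Placeholder DE gene lists, one per method
de_calls = {
    "method_a": {"g1", "g2", "g3", "g4"},
    "method_b": {"g2", "g3", "g4", "g5"},
    "method_c": {"g3", "g4", "g6"},
}

overlaps = {
    (m1, m2): jaccard(de_calls[m1], de_calls[m2])
    for m1, m2 in combinations(sorted(de_calls), 2)
}
consensus = set.intersection(*de_calls.values())

for pair, value in overlaps.items():
    print(pair, value)
print("called by all methods:", sorted(consensus))
```

Overlap alone does not say which method is right, so it is best read together with what is known about the data and the model each method assumes.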
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
Introduction
Figure Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio
Olga (NBIS) scRNA-seq de October 2017 3 34
Introduction
Differential expression is an old problemso
why is DE scRNA-seq different to RNA-seq
Olga (NBIS) scRNA-seq de October 2017 4 34
Introduction
Differential expression is an old problemso
why is DE scRNA-seq different to RNA-seqscRNA-seq are affected by higher noise (technical and biologicalfactors)low amount of available mRNAs results in amplification biases anddropout events (technical)3rsquo bias partial coverage and uneven depth (technical)stochastic nature of transcription (biological)multimodality in gene expression presence of multiple possiblecell states within a cell population (biological)
Olga (NBIS) scRNA-seq de October 2017 5 34
Common methods
Common methods
Olga (NBIS) scRNA-seq de October 2017 6 34
Common methods
Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio
Olga (NBIS) scRNA-seq de October 2017 7 34
Common methods
Common methodsnon-parametric test eg Kruskal-Wallis (generic)edgeR limma (bulk RNA-seq)MAST SCDE Monocle (scRNA-seq)D3E Pagoda (scRNA-seq)
Olga (NBIS) scRNA-seq de October 2017 8 34
Common methods
Table Information of gene differential expression analysis methods used [Miao and Zhang 2017Quantitative Biology 2016 4]
Olga (NBIS) scRNA-seq de October 2017 9 34
Common methods
MAST
uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)
logit(Pr (Zig = 1)) = XiβDg
Pr (Yig = Y |Zig = 1) = N(XiβCg σ
2g) where Xi is a design matrix
Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 10 34
Common methods
SCDE
models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach
Olga (NBIS) scRNA-seq de October 2017 11 34
Common methods
Monocole
Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as
g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically
log function and fi are non-parametric functions (eg cubic splines)
The observable expression level Y is then modeled using GAM
E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero
The DE test is performed using an approx χ2 likelihood ratio testOlga (NBIS) scRNA-seq de October 2017 12 34
Common methods
Letrsquos stop for a minute
Olga (NBIS) scRNA-seq de October 2017 13 34
Common methods
Differential expression
Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions
Olga (NBIS) scRNA-seq de October 2017 14 34
Common methods
The key
Outcomei = (Modeli) + errori
we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data
Olga (NBIS) scRNA-seq de October 2017 15 34
Common methods
The key
t = x1minusx2
sp
radic1
n1+ 1
n2
height [cm]
Fre
quen
cy
165 170 175 180
010
3050
Olga (NBIS) scRNA-seq de October 2017 16 34
Common methods
The key
Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference
Olga (NBIS) scRNA-seq de October 2017 17 34
Common methods
The key MAST (again)
uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)
logit(Pr (Zig = 1)) = XiβDg
Pr (Yig = Y |Zig = 1) = N(XiβCg σ
2g) where Xi is a design matrix
Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 18 34
Common methods
The key SCDE (again)
models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach
Olga (NBIS) scRNA-seq de October 2017 19 34
Common methods
The key Monocole (again)
Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as
g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically
log function and fi are non-parametric functions (eg cubic splines)
The observable expression level Y is then modeled using GAM
E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero
The DE test is performed using an approx χ2 likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 20 34
Common methods
They key implication
Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference
Implicationthe better model fits to the data the better statistics
Olga (NBIS) scRNA-seq de October 2017 21 34
Common methods
Negative Binomial
Read Counts
Fre
quen
cy
0 5 10 15 20
050
100
150
200
Zerominusinflated NB
Read Counts
Fre
quen
cy
0 5 10 15 20
010
020
030
040
050
0
PoissonminusBeta
Read Counts
Fre
quen
cy
0 20 60 100
010
020
030
040
0
Olga (NBIS) scRNA-seq de October 2017 22 34
Performance
Performance
Olga (NBIS) scRNA-seq de October 2017 23 34
Performance
No golden standard
There is no golden standard no single best solution
so what do we do
we gather as much evidence as possible
Olga (NBIS) scRNA-seq de October 2017 24 34
Performance
No golden standard
There is no golden standard no single best solution
so what do we do
we gather as much evidence as possible
Olga (NBIS) scRNA-seq de October 2017 24 34
Performance
Get to know your data amp wisely choose DE methods
Example data 46078 genes x 96 cells22229 genes with no expression at all
Read Counts
Fre
quen
cy
0 500 1000 1500
050
0015
000
0 counts
Fre
quen
cy
0 20 40 60 80
020
0040
0060
00
Olga (NBIS) scRNA-seq de October 2017 25 34
Performance
Learn from methodological papers andor past studies
eg Dal Molin Barruzo and Di Camilillo frontiers in Genetics 2017Single-Cell RNA-Sequencing Assessment of Differential ExpressionAnalysis Methods
10000 genes simulated for 2 conditions with sample size of 100cells each8000 genes were simulated as not differentially expressed usingthe same distribution (unimodal NB and bimodal two-componentNB mixture)2000 genes were simulated as differentially expressed accordingto four types of differential expressionsreal dataset 44 mouse Embryonic Stem Cells and 44 EmbryonicFibroblsts for positive controlreal dataset 80 single cells as negative control
Olga (NBIS) scRNA-seq de October 2017 26 34
Performance
Learn from methodological papers andor past studies
Olga (NBIS) scRNA-seq de October 2017 27 34
Compare methods
e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
Stay critical
Summary
Summary
Olga (NBIS) scRNA-seq de October 2017 30 34
Summary
SummaryscRNA-seq is a rapidly growing fieldDE is a common task so many newer and better methods will bedevelopedthink like a statistician get to know your data think aboutdistributions and models best for your data Avoid applyingmethods blindlycomparing methods is good as long as you are aware what youare comparing and whystay critical
Olga (NBIS) scRNA-seq de October 2017 31 34
DE tutorial
DE tutorial
Olga (NBIS) scRNA-seq de October 2017 32 34
DE tutorial
DE tutorialBased on the dataset used is single-cell RNA-seq data (SmartSeq)from mouse embryonic development from Deng et al Science 2014Vol 343 no 6167 pp 193-196 Single-Cell RNA-Seq RevealsDynamic Random Monoallelic Gene Expression in Mammalian Cells
check for differentially expressed genes between 8-cell and 16-cellstage embryoswith many methods incl SCDE MAST SC3 package PagodaSeuratand compare the results trying to decide on the best DE methodfor the dataset
Olga (NBIS) scRNA-seq de October 2017 33 34
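One simple way to compare the results of several DE methods is to look at how much their lists of DE genes overlap. The sketch below uses the Jaccard index for each pair of methods. The method names and gene identifiers are placeholders, not results from the tutorial dataset.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical DE calls from three methods (placeholder names and genes).
de_calls = {
    "method_A": {"gene1", "gene2", "gene3", "gene4"},
    "method_B": {"gene2", "gene3", "gene5"},
    "method_C": {"gene3", "gene4", "gene5", "gene6"},
}

# Pairwise overlap between methods.
pairwise = {
    (m1, m2): jaccard(de_calls[m1], de_calls[m2])
    for m1, m2 in combinations(sorted(de_calls), 2)
}

# Genes called DE by every method.
consensus = set.intersection(*de_calls.values())

for pair, value in pairwise.items():
    print(pair, round(value, 2))
print("consensus:", sorted(consensus))
```

A low overlap does not by itself say which method is right; it is a prompt to look at the genes the methods disagree on and at the assumptions each method makes.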
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
Common methods
Common methods
Olga (NBIS) scRNA-seq de October 2017 6 34
Common methods
Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio
Olga (NBIS) scRNA-seq de October 2017 7 34
Common methods
Common methodsnon-parametric test eg Kruskal-Wallis (generic)edgeR limma (bulk RNA-seq)MAST SCDE Monocle (scRNA-seq)D3E Pagoda (scRNA-seq)
Olga (NBIS) scRNA-seq de October 2017 8 34
Common methods
Table Information of gene differential expression analysis methods used [Miao and Zhang 2017Quantitative Biology 2016 4]
Olga (NBIS) scRNA-seq de October 2017 9 34
Common methods
MAST
uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)
logit(Pr (Zig = 1)) = XiβDg
Pr (Yig = Y |Zig = 1) = N(XiβCg σ
2g) where Xi is a design matrix
Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 10 34
Common methods
SCDE
models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach
Olga (NBIS) scRNA-seq de October 2017 11 34
Common methods
Monocole
Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as
g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically
log function and fi are non-parametric functions (eg cubic splines)
The observable expression level Y is then modeled using GAM
E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero
The DE test is performed using an approx χ2 likelihood ratio testOlga (NBIS) scRNA-seq de October 2017 12 34
Common methods
Letrsquos stop for a minute
Olga (NBIS) scRNA-seq de October 2017 13 34
Common methods
Differential expression
Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions
Olga (NBIS) scRNA-seq de October 2017 14 34
Common methods
The key
Outcomei = (Modeli) + errori
we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data
Olga (NBIS) scRNA-seq de October 2017 15 34
Common methods
The key
t = x1minusx2
sp
radic1
n1+ 1
n2
height [cm]
Fre
quen
cy
165 170 175 180
010
3050
Olga (NBIS) scRNA-seq de October 2017 16 34
Common methods
The key
Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference
Olga (NBIS) scRNA-seq de October 2017 17 34
Common methods
The key MAST (again)
uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)
logit(Pr (Zig = 1)) = XiβDg
Pr (Yig = Y |Zig = 1) = N(XiβCg σ
2g) where Xi is a design matrix
Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 18 34
Common methods
The key SCDE (again)
models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach
Olga (NBIS) scRNA-seq de October 2017 19 34
Common methods
The key Monocole (again)
Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as
g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically
log function and fi are non-parametric functions (eg cubic splines)
The observable expression level Y is then modeled using GAM
E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero
The DE test is performed using an approx χ2 likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 20 34
Common methods
They key implication
Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference
Implicationthe better model fits to the data the better statistics
Olga (NBIS) scRNA-seq de October 2017 21 34
Common methods
Negative Binomial
Read Counts
Fre
quen
cy
0 5 10 15 20
050
100
150
200
Zerominusinflated NB
Read Counts
Fre
quen
cy
0 5 10 15 20
010
020
030
040
050
0
PoissonminusBeta
Read Counts
Fre
quen
cy
0 20 60 100
010
020
030
040
0
Olga (NBIS) scRNA-seq de October 2017 22 34
Performance
Performance
Olga (NBIS) scRNA-seq de October 2017 23 34
Performance
No golden standard
There is no golden standard no single best solution
so what do we do
we gather as much evidence as possible
Olga (NBIS) scRNA-seq de October 2017 24 34
Performance
No golden standard
There is no golden standard no single best solution
so what do we do
we gather as much evidence as possible
Olga (NBIS) scRNA-seq de October 2017 24 34
Performance
Get to know your data amp wisely choose DE methods
Example data 46078 genes x 96 cells22229 genes with no expression at all
Read Counts
Fre
quen
cy
0 500 1000 1500
050
0015
000
0 counts
Fre
quen
cy
0 20 40 60 80
020
0040
0060
00
Olga (NBIS) scRNA-seq de October 2017 25 34
Performance
Learn from methodological papers andor past studies
eg Dal Molin Barruzo and Di Camilillo frontiers in Genetics 2017Single-Cell RNA-Sequencing Assessment of Differential ExpressionAnalysis Methods
10000 genes simulated for 2 conditions with sample size of 100cells each8000 genes were simulated as not differentially expressed usingthe same distribution (unimodal NB and bimodal two-componentNB mixture)2000 genes were simulated as differentially expressed accordingto four types of differential expressionsreal dataset 44 mouse Embryonic Stem Cells and 44 EmbryonicFibroblsts for positive controlreal dataset 80 single cells as negative control
Olga (NBIS) scRNA-seq de October 2017 26 34
Performance
Learn from methodological papers andor past studies
Olga (NBIS) scRNA-seq de October 2017 27 34
Performance
Compare methods
eg Miao and Zhang Quantitative Biology 20164 Differentialexpression analyses for single-cell RNA-Seq old questions on newdata
Olga (NBIS) scRNA-seq de October 2017 28 34
Performance
Stay critical
Olga (NBIS) scRNA-seq de October 2017 29 34
Summary
Summary
Olga (NBIS) scRNA-seq de October 2017 30 34
Summary
SummaryscRNA-seq is a rapidly growing fieldDE is a common task so many newer and better methods will bedevelopedthink like a statistician get to know your data think aboutdistributions and models best for your data Avoid applyingmethods blindlycomparing methods is good as long as you are aware what youare comparing and whystay critical
Olga (NBIS) scRNA-seq de October 2017 31 34
DE tutorial
- the dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells
- check for differentially expressed genes between 8-cell and 16-cell stage embryos
- with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
- and compare the results, trying to decide on the best DE method for the dataset
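One simple way to compare results across methods is to look at how much the lists of DE genes overlap. The sketch below assumes each method has already produced a set of DE gene names; the method labels, gene names and the choice of the Jaccard index are illustrative assumptions, not part of the tutorial itself.

```python
from itertools import combinations

# Hypothetical DE gene sets from three methods (placeholder names).
de_genes = {
    "method_A": {"g1", "g2", "g3", "g4"},
    "method_B": {"g2", "g3", "g5"},
    "method_C": {"g3", "g4", "g6"},
}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Pairwise overlap between every pair of methods.
pairwise = {
    (m1, m2): jaccard(de_genes[m1], de_genes[m2])
    for m1, m2 in combinations(sorted(de_genes), 2)
}
for pair, score in pairwise.items():
    print(pair, round(score, 2))

# Genes called DE by every method.
consensus = set.intersection(*de_genes.values())
print(consensus)  # {'g3'}
```

A low overlap does not say which method is right; it only signals that the methods answer slightly different questions, which is the point of staying critical.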
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
MAST
- uses a generalized linear hurdle model
- designed to account for stochastic dropouts and bimodal expression distributions, in which expression is either strongly non-zero or non-detectable
- the rate of expression Z and the level of expression Y are modeled for each gene g, with Z indicating whether gene g is expressed in cell i (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
- a logistic regression model is used for the discrete variable Z and a Gaussian linear model for the continuous variable (Y | Z = 1):
  $\mathrm{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$
  $\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$, where $X_i$ is a design matrix
- model parameters are fitted using an empirical Bayesian framework
- allows for a joint estimate of nuisance and treatment effects
- DE is determined using the likelihood ratio test
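The two-part idea can be sketched for one gene and two groups of cells. This is a deliberately simplified illustration and not the MAST package: it assumes no covariates other than group, uses plain maximum likelihood instead of empirical Bayes, and the function names are hypothetical. The discrete part compares detection rates, the continuous part compares mean levels among detected cells, and the two likelihood ratio statistics are added and referred to a chi-square distribution with 2 degrees of freedom.

```python
import math


def binom_loglik(k, n):
    """Maximised Bernoulli log-likelihood for k detections in n cells."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1 - p)


def rss(values, centre):
    """Residual sum of squares around a given centre."""
    return sum((v - centre) ** 2 for v in values)


def hurdle_lrt(group1, group2):
    """Two-part likelihood ratio test for one gene in two groups of cells.

    Inputs are expression values (e.g. log-transformed); 0 means not detected.
    Returns (statistic, p_value), using 2 degrees of freedom.
    """
    pos1 = [y for y in group1 if y > 0]
    pos2 = [y for y in group2 if y > 0]

    # Discrete part: does the detection rate differ between the groups?
    ll_alt = binom_loglik(len(pos1), len(group1)) + binom_loglik(len(pos2), len(group2))
    ll_null = binom_loglik(len(pos1) + len(pos2), len(group1) + len(group2))
    lr_discrete = max(0.0, 2 * (ll_alt - ll_null))

    # Continuous part: among detected cells, does the mean level differ?
    lr_continuous = 0.0
    if pos1 and pos2:
        both = pos1 + pos2
        rss_null = rss(both, sum(both) / len(both))
        rss_alt = rss(pos1, sum(pos1) / len(pos1)) + rss(pos2, sum(pos2) / len(pos2))
        if rss_alt > 0:  # degenerate zero-variance case is skipped in this sketch
            lr_continuous = max(0.0, len(both) * math.log(rss_null / rss_alt))

    stat = lr_discrete + lr_continuous
    p_value = math.exp(-stat / 2)  # chi-square survival function for 2 df
    return stat, p_value


# Made-up values: similar groups, then clearly different groups.
same = hurdle_lrt([0, 0, 2.1, 1.9, 2.0, 0], [0, 2.0, 0, 2.2, 1.8, 0])
diff = hurdle_lrt([0, 0, 0, 0, 1.0, 0.8], [3.1, 2.9, 3.0, 0, 3.2, 2.8])
print(same)
print(diff)
```

Keeping the two parts separate means a gene can be flagged because it is detected more often, because it is expressed at a higher level when detected, or both.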
SCDE
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
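To make the mixture idea concrete, the sketch below computes, for a single observed count, the posterior probability that it came from the Poisson (dropout or background) component rather than the NB (amplified) component. This is the kind of quantity an EM algorithm evaluates in its E-step. It is not the SCDE implementation: the parameter values, the background rate of 0.1 and the function names are all assumptions made for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k, mu, size):
    """Negative binomial probability of count k, with mean mu and dispersion size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def dropout_posterior(count, mix_dropout, mu, size, background=0.1):
    """Posterior probability that a count comes from the Poisson component.

    mix_dropout is the prior weight of the Poisson (dropout) component.
    """
    drop = mix_dropout * poisson_pmf(count, background)
    amplified = (1 - mix_dropout) * nb_pmf(count, mu, size)
    return drop / (drop + amplified)


# Hypothetical gene with NB mean 50, dispersion 2 and 30% prior dropout weight.
posteriors = {c: dropout_posterior(c, 0.3, mu=50, size=2) for c in (0, 1, 5, 50)}
for count, prob in posteriors.items():
    print(count, prob)
```

With these made-up parameters a zero count is attributed almost entirely to dropout, while larger counts are attributed to the amplified component.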
Monocle
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
  $g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$
  where Y is a specific gene expression level, $x_i$ are predictor variables, g is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level Y is then modeled using a GAM:
  $E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$
  where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and s is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test
Let's stop for a minute
Differential expression
Differential expression analysis means taking the normalized read count data & performing statistical analysis to discover quantitative changes in expression levels between experimental groups
- e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
- or simply checking for differences in distributions
The key
$\mathrm{Outcome}_i = (\mathrm{Model}_i) + \mathrm{error}_i$
- we collect data on a sample from a much larger population
- statistics lets us make inferences about the population from which it was derived
- we try to predict the outcome given a model fitted to the data
The key
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
Figure: histogram of height [cm] (y-axis: Frequency)
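The t statistic above can be computed in a few lines. The sketch assumes that $s_p$ is the usual pooled standard deviation, $s_p = \sqrt{((n_1 - 1) s_1^2 + (n_2 - 1) s_2^2) / (n_1 + n_2 - 2)}$, which the slide does not spell out; the height values are made up.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Two-sample t statistic with a pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation from the two sample variances.
    sp = math.sqrt(
        ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    )
    return (mean(x1) - mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Made-up heights [cm] for two groups.
group_a = [170, 172, 174]
group_b = [176, 178, 180]
t = pooled_t(group_a, group_b)
print(round(t, 3))  # -3.674
```

Here the model is "each group has its own mean, with a shared variance", and the statistic measures how large the difference in means is relative to the noise.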
The key
Simple recipe
- model e.g. gene expression with random error
- fit the model to the data and/or the data to the model, estimate model parameters
- use the model for prediction and/or inference
The key: MAST (again)
- model: a generalized linear hurdle model, with a logistic regression for whether gene g is expressed in cell i (Z) and a Gaussian linear model for its expression level when expressed (Y | Z = 1)
- fit: model parameters are fitted using an empirical Bayesian framework, with a joint estimate of nuisance and treatment effects
- inference: DE is determined using the likelihood ratio test
The key: SCDE (again)
- model: read counts for each gene as a mixture of an NB distribution (transcripts that are amplified and detected) and a Poisson distribution (unobserved or background-level signal, e.g. dropout events)
- fit: a subset of robust genes is used to fit the mixture parameters via the EM algorithm
- inference: the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
The key: Monocle (again)
- model: the mean expression level of each gene as a GAM of the assigned pseudo-time, $E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$, where s is a cubic smoothing function with three degrees of freedom and the error term $\epsilon$ is normally distributed with a mean of zero
- inference: the DE test is performed using an approximate $\chi^2$ likelihood ratio test
The key implication
Simple recipe
- model e.g. gene expression with random error
- fit the model to the data and/or the data to the model, estimate model parameters
- use the model for prediction and/or inference
Implication
- the better the model fits the data, the better the statistics
Figure: histograms of Read Counts (y-axis: Frequency) for three count distributions: Negative Binomial, Zero-inflated NB, Poisson-Beta
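The three distributions in the figure can be simulated to get a feel for how they differ. The sketch below uses only the Python standard library; the parameter values, the sample size and the function names are assumptions chosen for illustration, and the NB is drawn as a gamma-Poisson mixture.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson variate (Knuth's method, adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def neg_binomial(rng, mu, size):
    """NB count with mean mu and dispersion size, as a gamma-Poisson mixture."""
    return poisson(rng, rng.gammavariate(size, mu / size))


def zero_inflated_nb(rng, mu, size, zero_prob):
    """NB count that is replaced by an extra zero with probability zero_prob."""
    return 0 if rng.random() < zero_prob else neg_binomial(rng, mu, size)


def poisson_beta(rng, alpha, beta, scale):
    """Poisson count whose rate is scale times a Beta(alpha, beta) draw."""
    return poisson(rng, scale * rng.betavariate(alpha, beta))


rng = random.Random(1)  # fixed seed so that the run is reproducible
n = 2000
samples = {
    "NB": [neg_binomial(rng, 5, 2) for _ in range(n)],
    "Zero-inflated NB": [zero_inflated_nb(rng, 5, 2, 0.4) for _ in range(n)],
    "Poisson-Beta": [poisson_beta(rng, 0.5, 0.5, 100) for _ in range(n)],
}
summary = {
    name: (sum(xs) / n, sum(x == 0 for x in xs) / n) for name, xs in samples.items()
}
for name, (avg, zero_frac) in summary.items():
    print(name, "mean:", round(avg, 2), "zero fraction:", round(zero_frac, 2))
```

Comparing the mean and the fraction of zeros across the three samples shows why a model that fits one shape well can fit another poorly.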
Common methods
MAST
uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)
logit(Pr (Zig = 1)) = XiβDg
Pr (Yig = Y |Zig = 1) = N(XiβCg σ
2g) where Xi is a design matrix
Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test
Olga (NBIS) scRNA-seq de October 2017 10 34
Common methods
SCDE
models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach
Olga (NBIS) scRNA-seq de October 2017 11 34
Monocle
- Originally designed for ordering cells by progress through differentiation stages (pseudo-time).
- The mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$
  where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines).
- The observable expression level $Y$ is then modeled using a GAM:
$$E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$$
  where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero.
- The DE test is performed using an approximate $\chi^2$ likelihood ratio test.
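The likelihood ratio test over pseudo-time can be sketched in plain Python. Note the substitution: instead of a cubic smoothing spline, this sketch fits an ordinary cubic polynomial (also three extra degrees of freedom over a constant) by least squares and compares it with an intercept-only model under Gaussian errors. It is an illustration of the testing idea, not Monocle's code, and the names (`pseudotime_lrt`, `t`, `y`) are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss_polynomial(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of y on t."""
    basis = [[ti ** p for p in range(degree + 1)] for ti in t]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in basis]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def chi2_sf_3df(x):
    """Survival function of a chi-square distribution with 3 degrees of freedom."""
    z = x / 2
    return math.erfc(math.sqrt(z)) + 2 / math.sqrt(math.pi) * math.sqrt(z) * math.exp(-z)


def pseudotime_lrt(t, y):
    """LRT of a cubic trend in pseudo-time t against a constant mean."""
    stat = len(y) * math.log(rss_polynomial(t, y, 0) / rss_polynomial(t, y, 3))
    stat = max(0.0, stat)  # guard against tiny negative rounding errors
    return stat, chi2_sf_3df(stat)
```

The sketch assumes noisy data: a gene whose values lie exactly on a cubic curve, or are exactly constant, gives a zero residual sum of squares and would need special handling.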
Let's stop for a minute
Differential expression
Differential expression analysis means
- taking the normalized read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups
- e.g. deciding whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
- or simply checking for differences in distributions
The key
$$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$$
- we collect data on a sample from a much larger population
- statistics lets us make inferences about the population from which the sample was derived
- we try to predict the outcome given a model fitted to the data
The key
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
[Figure: histogram of height [cm] (x-axis) against frequency (y-axis)]
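As a worked example of the two-sample t statistic with a pooled standard deviation $s_p$, here is a short Python sketch. The function name and the height values are made up for illustration.

```python
import math


def pooled_t_statistic(x1, x2):
    """Two-sample t statistic using the pooled standard deviation s_p."""
    n1, n2 = len(x1), len(x2)
    mean1 = sum(x1) / n1
    mean2 = sum(x2) / n2
    # Sums of squared deviations from each group mean.
    ss1 = sum((v - mean1) ** 2 for v in x1)
    ss2 = sum((v - mean2) ** 2 for v in x2)
    # Pooled standard deviation with n1 + n2 - 2 degrees of freedom.
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (mean1 - mean2) / (sp * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical heights [cm] in two groups.
group_1 = [168.0, 172.5, 170.0, 175.0, 169.5]
group_2 = [174.0, 178.5, 176.0, 180.0, 173.5]
print(pooled_t_statistic(group_1, group_2))
```

The model here is "each group has its own mean, with a shared variance"; the statistic measures how large the difference in means is relative to the error left over after fitting that model.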
The key
Simple recipe:
- model, e.g. gene expression with random error
- fit the model to the data and/or the data to the model, estimate model parameters
- use the model for prediction and/or inference
The key: MAST (again)
- model: generalized linear hurdle model (logistic regression for the discrete variable $Z$, Gaussian linear model for the continuous variable $(Y \mid Z = 1)$)
- fit: empirical Bayesian framework, with a joint estimate of nuisance and treatment effects
- inference: likelihood ratio test
The key: SCDE (again)
- model: mixture of a negative binomial (amplified and detected transcripts) and a Poisson distribution (unobserved or background-level signal, e.g. dropout events)
- fit: EM algorithm on a subset of robust genes
- inference: posterior probability of a fold expression difference between two conditions, computed using a Bayesian approach
The key: Monocle (again)
- model: generalized additive model of expression over pseudo-time, using a cubic smoothing function with three degrees of freedom and a normally distributed error term
- inference: approximate $\chi^2$ likelihood ratio test
The key: implication
Simple recipe:
- model, e.g. gene expression with random error
- fit the model to the data and/or the data to the model, estimate model parameters
- use the model for prediction and/or inference
Implication: the better the model fits the data, the better the statistics
[Figure: three histograms of read counts (x-axis) against frequency (y-axis); panels: Negative Binomial, Zero-inflated NB, Poisson-Beta]
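To get a feel for how these three count distributions differ, one can simulate them. The Python sketch below uses only the standard library; every parameter value (means, sizes, zero-inflation probability, Beta shape parameters, scale) is an arbitrary choice for illustration and does not come from the slides.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson count with Knuth's multiplication method (modest rates)."""
    limit = math.exp(-rate)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def sample_nb(mean, size, rng):
    """Negative binomial count drawn as a gamma-Poisson mixture."""
    rate = rng.gammavariate(size, mean / size)  # shape=size, scale=mean/size
    return sample_poisson(rate, rng)


def sample_zinb(mean, size, zero_prob, rng):
    """Zero-inflated NB: an extra zero with probability zero_prob, else NB."""
    if rng.random() < zero_prob:
        return 0
    return sample_nb(mean, size, rng)


def sample_poisson_beta(alpha, beta, scale, rng):
    """Poisson-Beta: Poisson rate is scale times a Beta(alpha, beta) draw."""
    return sample_poisson(scale * rng.betavariate(alpha, beta), rng)


def zero_fraction(counts):
    """Fraction of observations that are exactly zero."""
    return sum(c == 0 for c in counts) / len(counts)


rng = random.Random(1)  # fixed seed so the simulation is reproducible
nb = [sample_nb(5, 2, rng) for _ in range(2000)]
zinb = [sample_zinb(5, 2, 0.4, rng) for _ in range(2000)]
pb = [sample_poisson_beta(0.5, 0.5, 60, rng) for _ in range(2000)]
for name, counts in (("NB", nb), ("ZINB", zinb), ("Poisson-Beta", pb)):
    print(name, "mean:", sum(counts) / len(counts), "zeros:", zero_fraction(counts))
```

Comparing the share of zeros and the spread across the three samples shows why a model that fits one kind of gene well can fit another poorly.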
Performance
No gold standard
There is no gold standard, no single best solution.
So what do we do?
We gather as much evidence as possible.
Get to know your data & wisely choose DE methods
Example data: 46078 genes x 96 cells; 22229 genes with no expression at all
[Figure: two histograms of the example data with frequency on the y-axis; x-axes: Read Counts and 0 counts]
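A first look at a count matrix can be as simple as counting silent genes and zeros. The Python sketch below assumes the matrix is a list of rows with one row per gene and one column per cell; the function name and the toy matrix are hypothetical.

```python
def summarize_counts(counts):
    """Basic summary of a genes x cells count matrix given as a list of rows."""
    n_genes = len(counts)
    n_cells = len(counts[0]) if counts else 0
    # Genes with no expression at all: every cell has a zero count.
    silent_genes = sum(1 for row in counts if all(c == 0 for c in row))
    # Number of zero counts per gene (the quantity one would histogram).
    zeros_per_gene = [sum(1 for c in row if c == 0) for row in counts]
    total = n_genes * n_cells
    return {
        "genes": n_genes,
        "cells": n_cells,
        "silent_genes": silent_genes,
        "zeros_per_gene": zeros_per_gene,
        "zero_fraction": sum(zeros_per_gene) / total if total else 0.0,
    }


# Toy matrix: 4 genes x 3 cells.
toy = [
    [0, 0, 0],
    [5, 0, 12],
    [1, 2, 3],
    [0, 0, 7],
]
print(summarize_counts(toy))
```

On real data the same summary shows how many genes can be dropped before testing and how zero-heavy the remaining genes are, which informs the choice of DE method.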
Learn from methodological papers and/or past studies
e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods
- 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
- 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
- 2000 genes were simulated as differentially expressed, according to four types of differential expression
- real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
- real dataset: 80 single cells as negative control
Compare methods
e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
Stay critical
Summary
- scRNA-seq is a rapidly growing field
- DE is a common task, so many newer and better methods will be developed
- think like a statistician: get to know your data, think about the distributions and models best for your data, and avoid applying methods blindly
- comparing methods is good as long as you are aware of what you are comparing and why
- stay critical
DE tutorial
- The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells
- check for differentially expressed genes between 8-cell and 16-cell stage embryos with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
- compare the results, trying to decide on the best DE method for the dataset
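One simple way to compare results across methods is to look at how much their lists of DE genes overlap. The Python sketch below computes the pairwise Jaccard index between gene sets; it is one possible comparison, not the one prescribed by the tutorial, and the gene names and per-method results are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(results):
    """Jaccard index for every pair of methods in a {method: gene set} dict."""
    return {
        (m1, m2): jaccard(results[m1], results[m2])
        for m1, m2 in combinations(sorted(results), 2)
    }


# Hypothetical DE calls from three methods.
de_calls = {
    "MAST": {"geneA", "geneB", "geneC"},
    "SCDE": {"geneB", "geneC", "geneD"},
    "Seurat": {"geneA", "geneB", "geneC", "geneD"},
}
for pair, score in pairwise_overlap(de_calls).items():
    print(pair, round(score, 2))
```

A low overlap is not evidence that one method is wrong; it is a prompt to look at which genes differ and why, in line with the advice to stay critical.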
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
Common methods
Differential expression
Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions
The key
$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$
- we collect data on a sample from a much larger population
- statistics lets us make inferences about the population from which the sample was derived
- we try to predict the outcome given a model fitted to the data
The key
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
[Figure: histogram of height [cm] (x-axis) against frequency (y-axis)]
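
The pooled two-sample t statistic can be computed in a few lines. This is a sketch on invented height values (the data behind the histogram are not available); `pooled_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Two-sample t statistic with a pooled standard deviation s_p."""
    n1, n2 = len(sample_1), len(sample_2)
    # s_p^2 is the average of the two sample variances, weighted by degrees of freedom.
    sp = sqrt(((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2))
    return (mean(sample_1) - mean(sample_2)) / (sp * sqrt(1 / n1 + 1 / n2))


# Invented heights in cm for two small groups.
heights_a = [170.0, 172.0, 174.0, 176.0]
heights_b = [166.0, 168.0, 170.0, 172.0]
t_stat = pooled_t(heights_a, heights_b)
print(f"t = {t_stat:.3f} with {len(heights_a) + len(heights_b) - 2} degrees of freedom")
```

The statistic is then compared with a t distribution with $n_1 + n_2 - 2$ degrees of freedom.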
The key
Simple recipe
- model, e.g. gene expression with random error
- fit model to the data and/or data to the model, estimate model parameters
- use model for prediction and/or inference
The key MAST (again)
- uses a generalized linear hurdle model
- designed to account for stochastic dropouts and a bimodal expression distribution in which expression is either strongly non-zero or non-detectable
- the rate of expression $Z$ and the level of expression $Y$ are modeled for each gene $g$, with $Z_{ig}$ indicating whether gene $g$ is expressed in cell $i$ (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
- a logistic regression model is used for the discrete variable $Z$ and a Gaussian linear model for the continuous variable $(Y \mid Z = 1)$:
$$\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$$
$$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$$
where $X_i$ is a design matrix
- model parameters are fitted using an empirical Bayesian framework
- allows for a joint estimate of nuisance and treatment effects
- DE is determined using the likelihood ratio test
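
The two-part structure of the hurdle model can be illustrated with a stripped-down likelihood ratio test for a single gene and a single two-level group covariate. This is only a sketch: it is not the MAST package, it has no empirical Bayes regularization and no further covariates, and the data and the name `hurdle_lrt` are assumptions made for the example.

```python
from math import exp, log
from statistics import mean


def binom_loglik(k, n):
    """Binomial log-likelihood of k detections in n cells at the MLE rate k/n."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * log(p) + (n - k) * log(1 - p)


def rss(values):
    """Residual sum of squares around the sample mean."""
    center = mean(values)
    return sum((v - center) ** 2 for v in values)


def hurdle_lrt(group_a, group_b):
    """Likelihood ratio test for one gene between two groups of cells.

    Assumes each group has at least two detected (non-zero) values that
    are not all identical, so the Gaussian part is well defined.
    """
    pos_a = [y for y in group_a if y > 0]
    pos_b = [y for y in group_b if y > 0]

    # Discrete part (Z): one shared detection rate vs one rate per group.
    lr_discrete = 2 * (
        binom_loglik(len(pos_a), len(group_a))
        + binom_loglik(len(pos_b), len(group_b))
        - binom_loglik(len(pos_a) + len(pos_b), len(group_a) + len(group_b))
    )

    # Continuous part (Y | Z = 1): one shared mean vs one mean per group,
    # with a common variance estimated by maximum likelihood.
    n_pos = len(pos_a) + len(pos_b)
    lr_continuous = n_pos * log(rss(pos_a + pos_b) / (rss(pos_a) + rss(pos_b)))

    statistic = max(lr_discrete + lr_continuous, 0.0)
    # One extra parameter per part gives 2 degrees of freedom; the chi-square
    # survival function with 2 degrees of freedom is exactly exp(-x / 2).
    return statistic, exp(-statistic / 2)


# Hypothetical log-scale expression of one gene; zeros are non-detections.
cells_a = [0.0, 0.0, 0.0, 1.1, 0.9, 0.0, 1.3, 0.0]
cells_b = [2.8, 3.1, 0.0, 2.9, 3.3, 3.0, 0.0, 2.7]
statistic, p_value = hurdle_lrt(cells_a, cells_b)
print(f"LRT statistic = {statistic:.2f}, p = {p_value:.2e}")
```

Summing the two components reflects that the discrete and continuous parts are fitted separately, so a gene can be called DE because of a change in detection rate, a change in expression level among detected cells, or both.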
The key SCDE (again)
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
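
The mixture idea can be sketched by asking, for a single observed count, how likely it is to come from the Poisson (dropout) component rather than the NB (amplified) component. The mixture weight, NB mean and dispersion, and background rate below are illustrative assumptions; in SCDE they are estimated by EM, and this is not the SCDE package code.

```python
from math import exp, lgamma, log


def poisson_logpmf(k, lam):
    """Log probability of count k under a Poisson with rate lam."""
    return k * log(lam) - lam - lgamma(k + 1)


def nb_logpmf(k, mu, size):
    """Log probability of count k under an NB with mean mu and variance mu + mu^2 / size."""
    return (
        lgamma(k + size) - lgamma(size) - lgamma(k + 1)
        + size * log(size / (size + mu))
        + k * log(mu / (size + mu))
    )


def dropout_posterior(k, dropout_weight, mu, size, background=0.1):
    """Posterior probability that count k came from the Poisson (dropout) component."""
    dropped = dropout_weight * exp(poisson_logpmf(k, background))
    amplified = (1 - dropout_weight) * exp(nb_logpmf(k, mu, size))
    return dropped / (dropped + amplified)


# Assumed parameters: 30% dropout weight, NB mean 10, dispersion parameter 2.
for count in (0, 1, 5, 20):
    prob = dropout_posterior(count, 0.3, 10.0, 2.0)
    print(f"count = {count:2d}: P(dropout | count) = {prob:.3g}")
```

Zero and near-zero counts are attributed mostly to the dropout component, while larger counts are attributed almost entirely to the NB component.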
The key Monocle (again)
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$
where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level $Y$ is then modeled using a GAM,
$$E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$$
where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test
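
The logic of the likelihood ratio test along pseudo-time can be sketched with a deliberately simplified model: a straight line stands in for the cubic smoothing function, so the full model has one extra parameter compared with the intercept-only model. The data and the name `trend_lrt` are assumptions for illustration, and this is not Monocle's implementation.

```python
from math import erfc, log, sqrt
from statistics import mean


def trend_lrt(pseudotime, expression):
    """LRT of a linear pseudo-time trend against a constant mean (Gaussian errors)."""
    n = len(expression)
    t_bar, y_bar = mean(pseudotime), mean(expression)
    sxx = sum((t - t_bar) ** 2 for t in pseudotime)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(pseudotime, expression))
    slope = sxy / sxx

    # Residual sums of squares of the reduced (constant) and full (trend) models.
    rss_reduced = sum((y - y_bar) ** 2 for y in expression)
    rss_full = sum(
        (y - y_bar - slope * (t - t_bar)) ** 2 for t, y in zip(pseudotime, expression)
    )

    statistic = max(n * log(rss_reduced / rss_full), 0.0)
    # Chi-square survival function with 1 degree of freedom.
    return statistic, erfc(sqrt(statistic / 2))


# Hypothetical pseudo-time values and expression of one gene.
pseudotime = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
expression = [0.2, 0.9, 2.1, 2.8, 4.2, 4.9, 6.1, 6.8]
trend_stat, trend_p = trend_lrt(pseudotime, expression)
print(f"LRT statistic = {trend_stat:.2f}, p = {trend_p:.2e}")
```

With a smoothing spline in place of the line, the same comparison is made but with more degrees of freedom in the reference $\chi^2$ distribution.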
The key implication
Simple recipe
- model, e.g. gene expression with random error
- fit model to the data and/or data to the model, estimate model parameters
- use model for prediction and/or inference
Implication
- the better the model fits the data, the better the statistics
[Figure: three histograms of read counts (x-axis) against frequency (y-axis), with panel titles Negative Binomial, Zero-inflated NB and Poisson-Beta]
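
Counts resembling the three panels can be simulated to see how the distributions differ, in particular in their share of zeros. This is a sketch under assumed parameter values (the settings behind the original figure are not given), and the sampler names are hypothetical.

```python
import random
from math import exp
from statistics import mean


def poisson(rng, lam):
    """Draw one Poisson count with Knuth's multiplication method."""
    limit, k, prod = exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def negative_binomial(rng, mu, size):
    """NB as a gamma-Poisson mixture with mean mu and variance mu + mu^2 / size."""
    return poisson(rng, rng.gammavariate(size, mu / size))


def zero_inflated_nb(rng, mu, size, pi_zero):
    """With probability pi_zero emit a structural zero, otherwise an NB count."""
    return 0 if rng.random() < pi_zero else negative_binomial(rng, mu, size)


def poisson_beta(rng, alpha, beta, scale):
    """Poisson count whose rate is scale times a Beta(alpha, beta) draw."""
    return poisson(rng, scale * rng.betavariate(alpha, beta))


rng = random.Random(1)
n_cells = 2000
samples = {
    "Negative Binomial": [negative_binomial(rng, 5.0, 2.0) for _ in range(n_cells)],
    "Zero-inflated NB": [zero_inflated_nb(rng, 5.0, 2.0, 0.4) for _ in range(n_cells)],
    "Poisson-Beta": [poisson_beta(rng, 0.5, 0.5, 60.0) for _ in range(n_cells)],
}
for name, counts in samples.items():
    zero_fraction = counts.count(0) / n_cells
    print(f"{name}: mean = {mean(counts):.2f}, zero fraction = {zero_fraction:.2f}")
```

Comparing such simulated shapes with the histogram of a real gene is one informal way to judge which model family is plausible for the data.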
Performance
No gold standard
- there is no gold standard, no single best solution
- so what do we do?
- we gather as much evidence as possible
Get to know your data & wisely choose DE methods
- example data: 46078 genes x 96 cells
- 22229 genes with no expression at all
[Figure: two histograms with frequency on the y-axis, one of read counts and one of 0 counts]
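
A first look at the data can be as simple as counting zeros per gene and setting aside genes that are never detected. The tiny count matrix and gene names below are invented for illustration.

```python
# Hypothetical toy count matrix: one entry per gene, one column per cell.
counts = {
    "geneA": [0, 0, 0, 0],
    "geneB": [5, 0, 12, 3],
    "geneC": [0, 0, 1, 0],
    "geneD": [0, 0, 0, 0],
}

# Number of cells in which each gene has a zero count.
zeros_per_gene = {gene: row.count(0) for gene, row in counts.items()}

# Genes with no expression at all carry no information for DE testing.
unexpressed = [gene for gene, row in counts.items() if not any(row)]
expressed = {gene: row for gene, row in counts.items() if any(row)}

print(f"{len(unexpressed)} of {len(counts)} genes have no expression at all")
print("zeros per gene:", zeros_per_gene)
```

On the example data from the slide the same count gives 22229 of 46078 genes, which is why filtering and zero-aware models matter before any DE test is run.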
Learn from methodological papers and/or past studies
e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017, Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods
- 10000 genes simulated for 2 conditions with a sample size of 100 cells each
- 8000 genes were simulated as not differentially expressed using the same distribution (unimodal NB and bimodal two-component NB mixture)
- 2000 genes were simulated as differentially expressed according to four types of differential expression
- real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
- real dataset: 80 single cells as negative control
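
When the truth is known, as in such a simulation, a method's calls can be scored directly. The sketch below uses a handful of made-up p-values and truth labels, not results from the cited study; `confusion_rates` is a hypothetical helper, and the p-values are left unadjusted for multiple testing to keep the example short.

```python
def confusion_rates(p_values, is_de, alpha=0.05):
    """True positive rate, false positive rate and precision at threshold alpha."""
    called = [p < alpha for p in p_values]
    tp = sum(c and t for c, t in zip(called, is_de))
    fp = sum(c and not t for c, t in zip(called, is_de))
    fn = sum(not c and t for c, t in zip(called, is_de))
    tn = sum(not c and not t for c, t in zip(called, is_de))
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


# Made-up p-values for eight simulated genes and their true DE status.
p_values = [0.001, 0.2, 0.03, 0.6, 0.04, 0.8, 0.01, 0.5]
is_de = [True, True, True, False, False, False, True, False]
rates = confusion_rates(p_values, is_de)
print(rates)
```

Repeating this for each DE method on the same simulated genes gives comparable numbers across methods.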
Compare methods
e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
Stay critical
Summary
- scRNA-seq is a rapidly growing field
- DE is a common task, so many newer and better methods will be developed
- think like a statistician: get to know your data, think about distributions and models best for your data. Avoid applying methods blindly
- comparing methods is good as long as you are aware what you are comparing and why
- stay critical
DE tutorial
- the dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196, Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells
- check for differentially expressed genes between 8-cell and 16-cell stage embryos
- with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
- and compare the results, trying to decide on the best DE method for the dataset
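
One simple way to compare the results of several methods is to measure how much their lists of DE genes overlap. The gene identifiers below are placeholders rather than tutorial results, and the Jaccard index is just one possible summary.

```python
from itertools import combinations

# Placeholder DE gene lists, one per method (not real results).
de_calls = {
    "SCDE": {"g1", "g2", "g3", "g4"},
    "MAST": {"g2", "g3", "g4", "g5"},
    "Seurat": {"g3", "g4", "g6"},
}


def jaccard(set_a, set_b):
    """Size of the intersection divided by the size of the union."""
    return len(set_a & set_b) / len(set_a | set_b)


# Pairwise overlap between methods.
overlaps = {
    (m1, m2): jaccard(de_calls[m1], de_calls[m2])
    for m1, m2 in combinations(sorted(de_calls), 2)
}
# Genes called by every method.
consensus = set.intersection(*de_calls.values())

for pair, score in overlaps.items():
    print(f"{pair[0]} vs {pair[1]}: Jaccard = {score:.2f}")
print("called by all methods:", sorted(consensus))
```

Low overlap does not say which method is right; it only shows where the methods disagree and where a closer look at the data is needed.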
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
Common methods
The key: SCDE (again)
- models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
- the NB distribution models the transcripts that are amplified and detected
- the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
- a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture model
- for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
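The two-component idea on the SCDE slide can be sketched in a few lines. This is only an illustration of a NB/Poisson mixture, not the SCDE implementation: the function names, the mixing weight `dropout_weight`, the background rate of 0.1 and the example parameter values are all assumptions made for this sketch.

```python
import math

def nb_logpmf(k, mean, size):
    """Log-probability of count k under a negative binomial (mean/size parameterisation)."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log(1.0 - p))

def poisson_logpmf(k, rate):
    """Log-probability of count k under a Poisson distribution with the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def dropout_posterior(k, dropout_weight, nb_mean, nb_size, background_rate=0.1):
    """Posterior probability that count k came from the Poisson (dropout) component.

    All parameter values are hypothetical inputs; in an EM fit this quantity
    would be the E-step responsibility of the dropout component.
    """
    log_drop = math.log(dropout_weight) + poisson_logpmf(k, background_rate)
    log_amp = math.log(1.0 - dropout_weight) + nb_logpmf(k, nb_mean, nb_size)
    top = max(log_drop, log_amp)  # subtract the maximum for numerical stability
    w_drop = math.exp(log_drop - top)
    w_amp = math.exp(log_amp - top)
    return w_drop / (w_drop + w_amp)
```

Calling `dropout_posterior(0, 0.3, 50.0, 2.0)` gives a value close to 1 (a zero count in a highly expressed gene looks like a dropout), while `dropout_posterior(40, 0.3, 50.0, 2.0)` is essentially 0.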
The key: Monocle (again)
- originally designed for ordering cells by progress through differentiation stages (pseudo-time)
- the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as
  $g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$
  where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)
- the observable expression level $Y$ is then modeled using a GAM:
  $E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$
  where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero
- the DE test is performed using an approximate $\chi^2$ likelihood ratio test
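The likelihood ratio test on the Monocle slide can be illustrated with a deliberately simplified stand-in. The sketch below is not Monocle's code: it replaces the cubic smoothing spline with an ordinary cubic polynomial in pseudo-time (three extra parameters, so three degrees of freedom), assumes Gaussian errors, and compares that fit with an intercept-only model. All function names are hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a x = b for a small square system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss_polynomial(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of y on t."""
    basis = [[ti ** d for d in range(degree + 1)] for ti in t]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(basis, y))

def chi2_sf_df3(x):
    """Upper tail probability of a chi-square distribution with 3 degrees of freedom."""
    tail = math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    return min(1.0, tail)

def pseudotime_lrt(t, y):
    """Gaussian likelihood ratio test: cubic trend in pseudo-time vs. constant expression."""
    n = len(y)
    rss_null = rss_polynomial(t, y, 0)   # reduced model: intercept only
    rss_full = rss_polynomial(t, y, 3)   # full model: cubic polynomial (assumed stand-in)
    stat = max(0.0, n * math.log(rss_null / rss_full))
    return stat, chi2_sf_df3(stat)
```

Here `t` would hold pseudo-time values scaled to [0, 1] and `y` the (log) expression of one gene. A small p-value suggests that expression changes along pseudo-time. A real analysis would use a proper spline basis and the count-based error model of the chosen tool.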
The key implication
Simple recipe
- model, e.g. gene expression with random error
- fit the model to the data and/or the data to the model, estimate model parameters
- use the model for prediction and/or inference
Implication
- the better the model fits the data, the better the statistics
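The recipe above (model, fit, use) can be walked through for a single gene. The following is a minimal sketch under stated assumptions: it fits a Poisson and a negative binomial model by the method of moments (a simple stand-in for maximum likelihood) and compares their log-likelihoods. The function names and the example counts are invented for illustration.

```python
import math

def poisson_loglik(counts, rate):
    """Log-likelihood of the counts under a Poisson model (rate must be > 0)."""
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)

def nb_loglik(counts, mean, size):
    """Log-likelihood of the counts under a negative binomial (mean/size) model."""
    p = size / (size + mean)
    return sum(math.lgamma(c + size) - math.lgamma(size) - math.lgamma(c + 1)
               + size * math.log(p) + c * math.log(1.0 - p) for c in counts)

def fit_and_compare(counts):
    """Fit both models by the method of moments and report their log-likelihoods."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    result = {"mean": mean, "variance": var,
              "poisson_loglik": poisson_loglik(counts, mean)}
    if var > mean:
        # NB variance is mean + mean^2 / size, so solve for size
        size = mean ** 2 / (var - mean)
        result["nb_size"] = size
        result["nb_loglik"] = nb_loglik(counts, mean, size)
    return result
```

For overdispersed counts with many zeros, such as `[0, 0, 0, 1, 0, 12, 0, 3, 0, 25, 0, 0, 7, 0, 40, 2]`, the NB log-likelihood comes out well above the Poisson one, which is the "better fit, better statistics" point in miniature.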
[Figure: three histograms of read counts (x-axis: Read Counts, y-axis: Frequency) with panel titles Negative Binomial, Zero-inflated NB and Poisson-Beta]
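Counts resembling the three panels can be simulated to get a feel for how the distributions differ. This is a rough sketch using only the Python standard library. The sampler names and all parameter values are assumptions chosen for illustration, and the Poisson sampler uses Knuth's simple multiplication method, which is only suitable for moderate rates.

```python
import math
import random

def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method."""
    limit = math.exp(-rate)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def sample_nb(mean, size, rng):
    """Negative binomial count drawn as a gamma-Poisson mixture."""
    return sample_poisson(rng.gammavariate(size, mean / size), rng)

def sample_zinb(mean, size, zero_prob, rng):
    """Zero-inflated NB: an extra point mass at zero with probability zero_prob."""
    if rng.random() < zero_prob:
        return 0
    return sample_nb(mean, size, rng)

def sample_poisson_beta(a, b, scale, rng):
    """Poisson-Beta: Poisson count whose rate is scale times a Beta(a, b) draw."""
    return sample_poisson(scale * rng.betavariate(a, b), rng)
```

For example, with `rng = random.Random(1)`, drawing a few thousand values from `sample_nb(5.0, 2.0, rng)` and from `sample_zinb(5.0, 2.0, 0.4, rng)` and comparing the fraction of zeros shows the excess-zero spike that the middle panel illustrates.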
Performance
No gold standard
There is no gold standard, no single best solution.
So what do we do?
We gather as much evidence as possible.
Get to know your data & wisely choose DE methods
Example data: 46078 genes x 96 cells; 22229 genes with no expression at all
[Figure: two histograms; left panel x-axis: Read Counts, y-axis: Frequency; right panel x-axis: 0 counts, y-axis: Frequency]
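A first look at the zeros in a count matrix takes only a few lines. The sketch below assumes the matrix is held as a plain list of rows (one row per gene, one column per cell). The function name and the toy matrix in the usage note are made up for illustration.

```python
def zero_summary(counts):
    """Summarise zeros in a genes x cells count matrix given as a list of rows.

    Returns the number of genes with no expression in any cell, the number of
    zero-count cells per gene, and the fraction of zero-count genes per cell.
    """
    n_genes = len(counts)
    n_cells = len(counts[0])
    silent_genes = sum(1 for row in counts if all(v == 0 for v in row))
    zero_cells_per_gene = [sum(1 for v in row if v == 0) for row in counts]
    zero_fraction_per_cell = [
        sum(1 for row in counts if row[j] == 0) / n_genes for j in range(n_cells)
    ]
    return silent_genes, zero_cells_per_gene, zero_fraction_per_cell
```

With the toy matrix `[[0, 0, 0], [5, 0, 2], [0, 0, 0], [1, 3, 0]]`, two of the four genes have no expression at all. On real data, genes like these are usually filtered out before any DE test is run.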
Learn from methodological papers and/or past studies
e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017, Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods
- 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
- 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
- 2000 genes were simulated as differentially expressed according to four types of differential expression
- real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
- real dataset: 80 single cells as negative control
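When the truth is known, as in a simulation like the one described above, a method's calls can be scored directly. The following sketch shows one common way to do that. The function name and gene identifiers are hypothetical, and it does not reproduce the evaluation used in the cited paper.

```python
def confusion_metrics(called, truly_de, all_genes):
    """Score a set of DE calls against simulated ground truth."""
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)              # DE genes that were called
    fp = len(called - truly_de)              # non-DE genes that were called
    fn = len(truly_de - called)              # DE genes that were missed
    tn = len(all_genes - called - truly_de)  # non-DE genes left uncalled
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }
```

For ten genes of which two are truly DE, a method that calls one true and one false gene gets a sensitivity of 0.5, a specificity of 0.875 and a precision of 0.5.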
Compare methods
e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
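One simple way to compare methods on the same dataset is to measure how much their lists of DE genes overlap. The sketch below uses the Jaccard index for every pair of methods. The method labels and gene names in the usage note are placeholders, not results from any real run.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two gene collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def pairwise_overlap(calls):
    """Jaccard index for every pair of methods in a {method: gene list} mapping."""
    return {(m1, m2): jaccard(calls[m1], calls[m2])
            for m1, m2 in combinations(sorted(calls), 2)}
```

For `{"methodA": ["g1", "g2", "g3"], "methodB": ["g2", "g3", "g4"]}` the overlap is 0.5. A low overlap does not say which method is right, only that the choice of method matters for this dataset.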
Stay critical
Summary
- scRNA-seq is a rapidly growing field
- DE is a common task, so many newer and better methods will be developed
- think like a statistician: get to know your data, think about the distributions and models best for your data; avoid applying methods blindly
- comparing methods is good, as long as you are aware of what you are comparing and why
- stay critical
DE tutorial
- the dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells
- check for differentially expressed genes between 8-cell and 16-cell stage embryos with many methods, incl. SCDE, MAST, the SC3 package, Pagoda and Seurat
- compare the results, trying to decide on the best DE method for the dataset
Finally
Thank you for your attention
Questions?
Enjoy the rest of the course
olga.dethlefsen@nbis.se
- Introduction
- Common methods
- Performance
- Summary
- DE tutorial
- Finally
-