scRNA-seq - Differential expression analysis methods

Olga Dethlefsen

NBIS, National Bioinformatics Infrastructure Sweden

October 2017

Olga (NBIS) scRNA-seq de October 2017 1 34

Outline

  • Introduction: what is so special about DE with scRNA-seq
  • Common methods: what is out there
  • Performance: how to choose the best method
  • Summary
  • DE tutorial

Introduction

Figure: Simplified scRNA-seq workflow [adopted from http://hemberg-lab.github.io]


Differential expression is an old problem, so why is DE for scRNA-seq different from DE for RNA-seq?

  • scRNA-seq data are affected by higher noise (technical and biological factors)
  • the low amount of available mRNA results in amplification biases and dropout events (technical)
  • 3' bias, partial coverage and uneven depth (technical)
  • stochastic nature of transcription (biological)
  • multimodality in gene expression: presence of multiple possible cell states within a cell population (biological)

Common methods


  • non-parametric tests, e.g. Kruskal-Wallis (generic)
  • edgeR, limma (bulk RNA-seq)
  • MAST, SCDE, Monocle (scRNA-seq)
  • D3E, Pagoda (scRNA-seq)

Table: Information on the gene differential expression analysis methods used [Miao and Zhang, Quantitative Biology 2016, 4]

MAST

  • uses a generalized linear hurdle model
  • designed to account for stochastic dropouts and bimodal expression distributions, in which expression is either strongly non-zero or non-detectable
  • the rate of expression $Z$ and the level of expression $Y$ are modeled for each gene $g$, with $Z_{ig}$ indicating whether gene $g$ is expressed in cell $i$ (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
  • a logistic regression model is used for the discrete variable $Z$ and a Gaussian linear model for the continuous variable $(Y \mid Z = 1)$:

$$\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$$

$$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$$

where $X_i$ is a design matrix

  • model parameters are fitted using an empirical Bayesian framework
  • allows for a joint estimate of nuisance and treatment effects
  • DE is determined using the likelihood ratio test
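
To make the two-part hurdle idea concrete, here is a minimal sketch in plain Python for a single gene and two groups of cells. It is not the MAST implementation: it assumes the design matrix holds only a group indicator (so the logistic and Gaussian fits reduce to group proportions and group means), it leaves out the empirical Bayes step, and the function name `hurdle_lrt` is hypothetical.

```python
import math


def _xlogy(x, y):
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _binom_loglik(k, n):
    """Maximised Bernoulli log-likelihood for k expressing cells out of n."""
    if n == 0:
        return 0.0
    p = k / n
    return _xlogy(k, p) + _xlogy(n - k, 1 - p)


def _rss(values):
    """Residual sum of squares around the mean of the values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def hurdle_lrt(group_a, group_b):
    """Two-group hurdle likelihood ratio test for one gene (illustrative only).

    Inputs are per-cell expression values, with 0 meaning "not detected".
    Returns (statistic, p_value); the statistic is referred to a chi-square
    distribution with 2 degrees of freedom (one per model part).
    """
    pos_a = [y for y in group_a if y > 0]
    pos_b = [y for y in group_b if y > 0]

    # Discrete part (Z): with only a group indicator, the logistic regression
    # MLEs are the per-group detection rates.
    ll_alt = _binom_loglik(len(pos_a), len(group_a)) + _binom_loglik(len(pos_b), len(group_b))
    ll_null = _binom_loglik(len(pos_a) + len(pos_b), len(group_a) + len(group_b))
    stat_discrete = 2 * (ll_alt - ll_null)

    # Continuous part (Y | Z = 1): Gaussian linear model on expressed cells.
    # With ML variance estimates the LRT statistic is n * log(RSS0 / RSS1).
    stat_continuous = 0.0
    if pos_a and pos_b:
        rss_alt = _rss(pos_a) + _rss(pos_b)
        if rss_alt > 0:
            pooled = pos_a + pos_b
            stat_continuous = len(pooled) * math.log(_rss(pooled) / rss_alt)

    stat = max(0.0, stat_discrete + stat_continuous)
    # Chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return stat, math.exp(-stat / 2)
```

A gene detected in most cells of one group and almost none of the other gives a large statistic mainly through the discrete part; a gene detected everywhere but at different levels gives it through the continuous part.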

SCDE

  • models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
  • the NB distribution models the transcripts that are amplified and detected
  • the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
  • a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
  • for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
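
The mixture idea can be illustrated with the E-step of an EM fit: given current parameter values, compute the posterior probability that an observed count came from the Poisson (dropout) component rather than the NB (amplified) component. This is only a sketch under simplifying assumptions, not the SCDE code: the mixing weight is a fixed number here, and all parameter names and values are hypothetical.

```python
import math


def poisson_logpmf(count, rate):
    """Log-probability of a count under a Poisson distribution."""
    return count * math.log(rate) - rate - math.lgamma(count + 1)


def nb_logpmf(count, mean, size):
    """Log-probability under a negative binomial with given mean and size."""
    return (
        math.lgamma(count + size) - math.lgamma(size) - math.lgamma(count + 1)
        + size * math.log(size / (size + mean))
        + count * math.log(mean / (size + mean))
    )


def dropout_posterior(count, dropout_weight, background_rate, nb_mean, nb_size):
    """Posterior probability that a count belongs to the dropout component."""
    log_drop = math.log(dropout_weight) + poisson_logpmf(count, background_rate)
    log_amp = math.log(1 - dropout_weight) + nb_logpmf(count, nb_mean, nb_size)
    # Subtract the larger term before exponentiating to avoid underflow.
    top = max(log_drop, log_amp)
    drop = math.exp(log_drop - top)
    amp = math.exp(log_amp - top)
    return drop / (drop + amp)


# Hypothetical parameter values, chosen only for illustration.
params = dict(dropout_weight=0.3, background_rate=0.1, nb_mean=50, nb_size=2)
for observed in (0, 1, 40):
    print(observed, dropout_posterior(observed, **params))
```

In a full EM fit these posteriors would be used as weights when re-estimating the component parameters, and the two steps would alternate until convergence.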

Monocle

  • originally designed for ordering cells by progress through differentiation stages (pseudo-time)
  • the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$

where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function), and $f_i$ are non-parametric functions (e.g. cubic splines)

  • the observable expression level $Y$ is then modeled using a GAM:

$$E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$$

where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero

  • the DE test is performed using an approximate $\chi^2$ likelihood ratio test
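
The likelihood ratio test along pseudo-time can be sketched as follows. This is not Monocle's implementation: an ordinary cubic polynomial fitted by least squares stands in for the cubic smoothing spline, errors are assumed Gaussian, and the function names are hypothetical. The full model (cubic in pseudo-time) is compared with a null model (constant mean), giving 3 degrees of freedom.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pseudotime_lrt(pseudotime, expression):
    """LRT of a cubic trend in pseudo-time against a constant mean.

    Returns (statistic, p_value) using a chi-square with 3 degrees of freedom.
    """
    n = len(expression)
    design = [[t ** p for p in range(4)] for t in pseudotime]  # 1, t, t^2, t^3
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * y for r, y in zip(design, expression)) for i in range(4)]
    beta = solve(xtx, xty)  # least-squares coefficients via normal equations

    fitted = [sum(b * x for b, x in zip(beta, r)) for r in design]
    rss_alt = sum((y - f) ** 2 for y, f in zip(expression, fitted))
    mean = sum(expression) / n
    rss_null = sum((y - mean) ** 2 for y in expression)
    if rss_alt <= 0 or rss_null <= 0:
        raise ValueError("degenerate fit: residual sum of squares is zero")

    stat = max(0.0, n * math.log(rss_null / rss_alt))
    # Closed-form chi-square survival function for 3 degrees of freedom.
    p_value = math.erfc(math.sqrt(stat / 2)) + math.sqrt(2 * stat / math.pi) * math.exp(-stat / 2)
    return stat, p_value
```

A gene whose expression rises steadily with pseudo-time yields a small p-value, whereas a gene that only fluctuates around a constant level does not.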

Let's stop for a minute

Differential expression

Differential expression analysis means taking the normalized read count data & performing statistical analysis to discover quantitative changes in expression levels between experimental groups

  • e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
  • or simply checking for differences in distributions
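
One generic way to ask whether a difference is "greater than what would be expected just due to natural random variation" is a permutation test: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The sketch below is an illustration of that idea only, not one of the methods listed in this deck; the function name and defaults are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a

    def mean_difference(values):
        return abs(sum(values[:n_a]) / n_a - sum(values[n_a:]) / n_b)

    observed = mean_difference(pooled)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign cells to groups at random
        if mean_difference(pooled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (hits + 1) / (n_permutations + 1)
```

The smallest p-value this can return is 1 / (n_permutations + 1), so the number of permutations bounds the achievable resolution.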

The key

$$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$$

  • we collect data on a sample from a much larger population
  • statistics lets us make inferences about the population from which it was derived
  • we try to predict the outcome given a model fitted to the data

The key

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Figure: histogram of height [cm] (x-axis) against frequency (y-axis).
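
As a worked example of the statistic on this slide, the sketch below computes it for two small samples. The slide does not define $s_p$; it is assumed here to be the pooled standard deviation of the two samples, and the height values in the test are made up for illustration.

```python
import math
import statistics


def pooled_t(sample_1, sample_2):
    """Two-sample t statistic with a pooled standard deviation."""
    n1, n2 = len(sample_1), len(sample_2)
    mean_1 = statistics.fmean(sample_1)
    mean_2 = statistics.fmean(sample_2)
    var_1 = statistics.variance(sample_1)  # sample variance (n - 1 denominator)
    var_2 = statistics.variance(sample_2)
    # Assumption: s_p is the pooled standard deviation across both samples.
    s_p = math.sqrt(((n1 - 1) * var_1 + (n2 - 1) * var_2) / (n1 + n2 - 2))
    return (mean_1 - mean_2) / (s_p * math.sqrt(1 / n1 + 1 / n2))
```

The same structure, a difference scaled by its estimated variability, is what the model-based DE methods generalise.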

The key

Simple recipe

  • model, e.g. gene expression with random error
  • fit the model to the data and/or the data to the model; estimate model parameters
  • use the model for prediction and/or inference


The key: implication

Implication: the better the model fits the data, the better the statistics

Figure: three histograms of read counts (x-axis: read counts, y-axis: frequency), with panels Negative Binomial, Zero-inflated NB and Poisson-Beta.
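
To get a feel for how these three count distributions differ, one can draw samples from each and compare, for example, the share of zeros. The sketch below uses only the Python standard library; the parameterisations, parameter values and function names are assumptions made for illustration and are not taken from the slides.

```python
import math
import random


def sample_poisson(rng, rate):
    """Draw one Poisson count (Knuth's method; adequate for modest rates)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_nb(rng, mean, size):
    """Negative binomial count drawn as a gamma-Poisson mixture."""
    return sample_poisson(rng, rng.gammavariate(size, mean / size))


def sample_zinb(rng, mean, size, zero_prob):
    """Zero-inflated NB: an extra point mass at zero on top of the NB."""
    if rng.random() < zero_prob:
        return 0
    return sample_nb(rng, mean, size)


def sample_poisson_beta(rng, max_rate, alpha, beta):
    """Poisson count whose rate is scaled by a Beta-distributed factor."""
    return sample_poisson(rng, max_rate * rng.betavariate(alpha, beta))


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


rng = random.Random(1)  # hypothetical seed and parameters, illustration only
nb = [sample_nb(rng, mean=5, size=2) for _ in range(2000)]
zinb = [sample_zinb(rng, mean=5, size=2, zero_prob=0.5) for _ in range(2000)]
pb = [sample_poisson_beta(rng, max_rate=100, alpha=0.5, beta=0.5) for _ in range(2000)]
print(zero_fraction(nb), zero_fraction(zinb), max(pb))
```

Plotting histograms of `nb`, `zinb` and `pb` should give shapes comparable in spirit to the three panels, although the exact parameters behind the slide's figure are not known.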

Performance

No gold standard

There is no gold standard, no single best solution.

So what do we do?

We gather as much evidence as possible.

Get to know your data & wisely choose DE methods

Example data: 46078 genes x 96 cells; 22229 genes with no expression at all.

Figure: two histograms (y-axis: frequency), one of read counts and one of 0 counts.
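
Summaries like "genes with no expression at all" are quick to compute before choosing a method. The sketch below assumes the counts are held as a list of per-gene lists (one value per cell); that layout, the toy matrix and the function name are hypothetical.

```python
def zero_summary(counts):
    """Summarise zeros in a genes x cells count matrix.

    Returns the number of genes with no expression in any cell, and the
    number of zero-count cells for every gene.
    """
    all_zero_genes = sum(1 for gene in counts if not any(gene))
    zero_cells_per_gene = [sum(1 for c in gene if c == 0) for gene in counts]
    return all_zero_genes, zero_cells_per_gene


# Toy matrix: 4 genes x 5 cells (made-up numbers).
toy_counts = [
    [0, 0, 0, 0, 0],
    [3, 0, 12, 0, 7],
    [0, 0, 1, 0, 0],
    [25, 31, 18, 40, 22],
]
silent, zeros = zero_summary(toy_counts)
print(silent, zeros)
```

A histogram of the per-gene zero counts is one way to see how strongly dropouts dominate a dataset.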

Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

  • 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
  • real dataset: 80 single cells as negative control
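
When the truth is known, as in a simulation of this kind, the calls of any DE method can be scored against it. The sketch below shows one simple way to do that; it is not the evaluation code of the cited paper, and the gene identifiers and function name are hypothetical.

```python
def evaluate_calls(all_genes, true_de, called_de):
    """Score DE calls against known truth from a simulation."""
    true_de, called_de = set(true_de), set(called_de)
    not_de = set(all_genes) - true_de
    true_pos = len(called_de & true_de)
    false_pos = len(called_de & not_de)
    called = true_pos + false_pos
    return {
        # Share of truly DE genes that were called (sensitivity).
        "tpr": true_pos / len(true_de) if true_de else 0.0,
        # Share of non-DE genes that were wrongly called.
        "fpr": false_pos / len(not_de) if not_de else 0.0,
        # Share of calls that are truly DE.
        "precision": true_pos / called if called else 0.0,
    }


# Toy example: 10 genes, 2 simulated as DE, a method calling 2 genes.
genes = [f"g{i}" for i in range(10)]
scores = evaluate_calls(genes, true_de={"g0", "g1"}, called_de={"g0", "g5"})
print(scores)
```

On a negative-control dataset, where no gene should be DE, the same function reduces to reporting the false positive rate.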


Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data

Stay critical


Summary


  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, and think about the distributions and models best suited to your data. Avoid applying methods blindly
  • comparing methods is good, as long as you are aware of what you are comparing and why
  • stay critical

DE tutorial


The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset
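
One simple way to compare the results of several methods is to look at how much their lists of DE genes overlap. The sketch below computes pairwise Jaccard indices; the method names come from the list above, but the gene sets are made up and the function names are hypothetical.

```python
from itertools import combinations


def jaccard(genes_a, genes_b):
    """Jaccard index: size of the intersection over size of the union."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls_by_method):
    """Jaccard index for every pair of methods, keyed by sorted name pair."""
    return {
        (m1, m2): jaccard(calls_by_method[m1], calls_by_method[m2])
        for m1, m2 in combinations(sorted(calls_by_method), 2)
    }


# Made-up DE calls, for illustration only.
calls = {
    "SCDE": {"geneA", "geneB"},
    "MAST": {"geneB", "geneC"},
    "Seurat": {"geneA", "geneB", "geneC"},
}
for pair, score in pairwise_overlap(calls).items():
    print(pair, round(score, 2))
```

High overlap does not by itself show that the shared genes are correct, so this kind of comparison is best read alongside what each method assumes about the data.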

Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olga.dethlefsen@nbis.se

Page 3: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Introduction

Figure Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio

Olga (NBIS) scRNA-seq de October 2017 3 34

Introduction

Differential expression is an old problemso

why is DE scRNA-seq different to RNA-seq

Olga (NBIS) scRNA-seq de October 2017 4 34

Introduction

Differential expression is an old problemso

why is DE scRNA-seq different to RNA-seqscRNA-seq are affected by higher noise (technical and biologicalfactors)low amount of available mRNAs results in amplification biases anddropout events (technical)3rsquo bias partial coverage and uneven depth (technical)stochastic nature of transcription (biological)multimodality in gene expression presence of multiple possiblecell states within a cell population (biological)

Olga (NBIS) scRNA-seq de October 2017 5 34

Common methods

Common methods

Olga (NBIS) scRNA-seq de October 2017 6 34

Common methods

Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio

Olga (NBIS) scRNA-seq de October 2017 7 34

Common methods

Common methodsnon-parametric test eg Kruskal-Wallis (generic)edgeR limma (bulk RNA-seq)MAST SCDE Monocle (scRNA-seq)D3E Pagoda (scRNA-seq)

Olga (NBIS) scRNA-seq de October 2017 8 34

Common methods

Table Information of gene differential expression analysis methods used [Miao and Zhang 2017Quantitative Biology 2016 4]

Olga (NBIS) scRNA-seq de October 2017 9 34

Common methods

MAST

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 10 34

Common methods

SCDE

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 11 34

Common methods

Monocole

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio testOlga (NBIS) scRNA-seq de October 2017 12 34

Common methods

Letrsquos stop for a minute

Olga (NBIS) scRNA-seq de October 2017 13 34

Common methods

Differential expression

Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions

Olga (NBIS) scRNA-seq de October 2017 14 34

Common methods

The key

Outcomei = (Modeli) + errori

we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data

Olga (NBIS) scRNA-seq de October 2017 15 34

Common methods

The key

t = x1minusx2

sp

radic1

n1+ 1

n2

height [cm]

Fre

quen

cy

165 170 175 180

010

3050

Olga (NBIS) scRNA-seq de October 2017 16 34

Common methods

The key

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Olga (NBIS) scRNA-seq de October 2017 17 34

Common methods

The key MAST (again)

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 18 34

Common methods

The key SCDE (again)

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 19 34

Common methods

The key Monocole (again)

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 20 34

Common methods

They key implication

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Implicationthe better model fits to the data the better statistics

Olga (NBIS) scRNA-seq de October 2017 21 34

Common methods

Negative Binomial

Read Counts

Fre

quen

cy

0 5 10 15 20

050

100

150

200

Zerominusinflated NB

Read Counts

Fre

quen

cy

0 5 10 15 20

010

020

030

040

050

0

PoissonminusBeta

Read Counts

Fre

quen

cy

0 20 60 100

010

020

030

040

0

Olga (NBIS) scRNA-seq de October 2017 22 34

Performance

Performance

Olga (NBIS) scRNA-seq de October 2017 23 34

Performance

No golden standard

There is no golden standard no single best solution

so what do we do

we gather as much evidence as possible


Get to know your data & wisely choose DE methods

Example data: 46078 genes x 96 cells; 22229 genes with no expression at all.

[Figure: two histograms, one of read counts (x-axis: Read Counts, y-axis: Frequency) and one of the number of zero counts (x-axis: 0 counts, y-axis: Frequency)]
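The kind of summary behind these numbers can be sketched in a few lines of Python. The count matrix below is a hypothetical toy example (genes as rows, cells as columns), not the 46078 x 96 dataset from the slide.

```python
# Hypothetical toy count matrix: rows are genes, columns are cells
counts = {
    "geneA": [0, 0, 0, 0],
    "geneB": [5, 0, 12, 3],
    "geneC": [0, 0, 1, 0],
    "geneD": [0, 0, 0, 0],
}

# Genes with no expression in any cell
unexpressed = [gene for gene, row in counts.items() if sum(row) == 0]

# Number of cells with a zero count, per gene
zeros_per_gene = {gene: row.count(0) for gene, row in counts.items()}

# Total read count per cell (column sums)
reads_per_cell = [sum(column) for column in zip(*counts.values())]
```

Looking at the share of unexpressed genes and at the distribution of zeros per gene is a quick way to judge whether a method that models dropouts explicitly is worth considering.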


Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017, "Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods"

  • 10000 genes simulated for 2 conditions with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
  • real dataset: 80 single cells as negative control


Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4, "Differential expression analyses for single-cell RNA-Seq: old questions on new data"


Stay critical

Summary

  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, think about the distributions and models best for your data; avoid applying methods blindly
  • comparing methods is good as long as you are aware of what you are comparing and why
  • stay critical

DE tutorial

Based on single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196, "Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells".

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset


Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olgadethlefsennbisse


Common methods

Table: Information on the gene differential expression analysis methods used [Miao and Zhang, Quantitative Biology 2016, 4]


MAST

  • Uses a generalized linear hurdle model.
  • Designed to account for stochastic dropouts and bimodal expression distributions in which expression is either strongly non-zero or non-detectable.
  • The rate of expression Z and the level of expression Y are modeled for each gene g, with Z indicating whether gene g is expressed in cell i (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$).
  • A logistic regression model is used for the discrete variable Z and a Gaussian linear model for the continuous variable (Y | Z = 1):

$\mathrm{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$

$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$, where $X_i$ is a design matrix.

  • Model parameters are fitted using an empirical Bayesian framework.
  • Allows for a joint estimate of nuisance and treatment effects.
  • DE is determined using the likelihood ratio test.
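The two parts of the hurdle model can be illustrated with a small Python sketch. It does not fit MAST's regression models or its empirical Bayes step; it only splits one gene's values into the discrete part Z (detected or not) and the continuous part (Y | Z = 1), and summarises each part per group. The expression values and the function name `hurdle_summary` are hypothetical.

```python
import math
import statistics

# Hypothetical log-scale expression values of one gene in two groups of cells
group_a = [0.0, 0.0, 2.1, 3.4, 0.0, 2.8]
group_b = [4.0, 3.6, 0.0, 4.4, 3.9, 4.1]


def hurdle_summary(values):
    """Summarise the discrete part Z and the continuous part (Y | Z = 1) of one gene.

    Assumes the gene is detected in some but not all cells, so that the
    logit of the detection rate is finite.
    """
    z = [1 if v > 0 else 0 for v in values]
    expressed = [v for v in values if v > 0]
    detection_rate = sum(z) / len(z)
    return {
        "detection_rate": detection_rate,
        # Scale on which the logistic component of the hurdle model works
        "logit_rate": math.log(detection_rate / (1.0 - detection_rate)),
        # Scale on which the Gaussian component works
        "mean_expressed": statistics.mean(expressed),
    }


summary_a = hurdle_summary(group_a)
summary_b = hurdle_summary(group_b)
```

In this toy example the groups differ in both parts: the gene is detected in more cells of group B, and its level among expressing cells is also higher there. A hurdle model tests both kinds of difference jointly.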


Let's stop for a minute


Differential expression

Differential expression analysis
  • means taking the normalized read count data & performing statistical analysis to discover quantitative changes in expression levels between experimental groups
  • e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
  • or simply checking for differences in distributions


The key

$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$

  • we collect data on a sample from a much larger population
  • statistics lets us make inferences about the population from which the sample was derived
  • we try to predict the outcome given a model fitted to the data


$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

[Figure: histogram of height (x-axis: height [cm], y-axis: Frequency)]
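The pooled two-sample t statistic above can be computed directly, as in this short Python sketch. The height values are hypothetical and the function name `pooled_t_statistic` is chosen for illustration; the calculation assumes equal variances in the two groups, as the pooled standard deviation $s_p$ implies.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Two-sample t statistic with pooled standard deviation (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    # Pooled standard deviation s_p
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (mean1 - mean2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))


# Hypothetical heights in cm for two groups
heights_1 = [170.0, 172.0, 174.0, 176.0, 178.0]
heights_2 = [166.0, 168.0, 170.0, 172.0, 174.0]

t_value = pooled_t_statistic(heights_1, heights_2)
```

Here the model is simply "each group has its own mean", and the statistic compares the difference in means with the variation that the error term would produce on its own.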

Page 6: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Common methods

Common methods

Olga (NBIS) scRNA-seq de October 2017 6 34

Common methods

Simplified scRNA-seq workflow [adopted from httphemberg-labgithubio

Olga (NBIS) scRNA-seq de October 2017 7 34

Common methods

Common methodsnon-parametric test eg Kruskal-Wallis (generic)edgeR limma (bulk RNA-seq)MAST SCDE Monocle (scRNA-seq)D3E Pagoda (scRNA-seq)

Olga (NBIS) scRNA-seq de October 2017 8 34

Common methods

Table Information of gene differential expression analysis methods used [Miao and Zhang 2017Quantitative Biology 2016 4]

Olga (NBIS) scRNA-seq de October 2017 9 34

Common methods

MAST

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 10 34

Common methods

SCDE

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 11 34

Common methods

Monocole

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio testOlga (NBIS) scRNA-seq de October 2017 12 34

Common methods

Letrsquos stop for a minute

Olga (NBIS) scRNA-seq de October 2017 13 34

Common methods

Differential expression

Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions

Olga (NBIS) scRNA-seq de October 2017 14 34

Common methods

The key

Outcomei = (Modeli) + errori

we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data

Olga (NBIS) scRNA-seq de October 2017 15 34

Common methods

The key

t = x1minusx2

sp

radic1

n1+ 1

n2

height [cm]

Fre

quen

cy

165 170 175 180

010

3050

Olga (NBIS) scRNA-seq de October 2017 16 34

Common methods

The key

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Olga (NBIS) scRNA-seq de October 2017 17 34

Common methods

The key MAST (again)

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 18 34

Common methods

The key SCDE (again)

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 19 34

Common methods

The key Monocole (again)

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 20 34

Common methods

They key implication

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Implicationthe better model fits to the data the better statistics

Olga (NBIS) scRNA-seq de October 2017 21 34

Common methods

Figure: three histograms of read counts (x-axis: read counts, y-axis: frequency), one per distribution: Negative Binomial, Zero-inflated NB and Poisson-Beta.
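To get a feel for how these three distributions differ, one can simulate counts from each. The sketch below uses only the Python standard library; the parameter values are arbitrary choices for illustration, not the ones behind the slide, and the sampler names are hypothetical. The NB is drawn as a gamma-Poisson mixture, the zero-inflated NB adds extra zeros with a fixed probability, and the Poisson-Beta draws the Poisson rate from a scaled Beta distribution.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def rnbinom(mu, size, rng):
    """NB draw as a gamma-Poisson mixture with mean mu and dispersion size."""
    return rpois(rng.gammavariate(size, mu / size), rng)


def rzinb(mu, size, pi_zero, rng):
    """Zero-inflated NB: an extra zero with probability pi_zero."""
    return 0 if rng.random() < pi_zero else rnbinom(mu, size, rng)


def rpoisbeta(alpha, beta, scale, rng):
    """Poisson-Beta: the Poisson rate is scale times a Beta(alpha, beta) draw."""
    return rpois(scale * rng.betavariate(alpha, beta), rng)


def zero_fraction(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for c in counts if c == 0) / len(counts)


rng = random.Random(1)
nb = [rnbinom(5, 2, rng) for _ in range(2000)]
zinb = [rzinb(5, 2, 0.5, rng) for _ in range(2000)]
pb = [rpoisbeta(0.5, 0.5, 100, rng) for _ in range(2000)]
print(zero_fraction(nb), zero_fraction(zinb), max(pb))
```

Comparing the zero fractions and the tails of the three samples shows why a plain NB can fit zero-heavy or bimodal scRNA-seq counts poorly.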


Performance


No gold standard

There is no gold standard, no single best solution.

So what do we do?

We gather as much evidence as possible.


Get to know your data & wisely choose DE methods

Example data:
  • 46078 genes x 96 cells
  • 22229 genes with no expression at all

Figure: two histograms for the example data (y-axis: frequency), one of read counts and one of 0 counts.
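A first look at the data along these lines can be as simple as counting zeros. The sketch below assumes a count matrix stored as a list of per-gene lists (one value per cell); the toy matrix and the function name are made up for illustration. It returns the number of zero counts for each gene and how many genes are not expressed in any cell.

```python
def summarize_zeros(counts):
    """counts: list of per-gene lists, one read count per cell."""
    n_cells = len(counts[0])
    # Number of cells with a zero count, for each gene.
    zeros_per_gene = [sum(1 for c in gene if c == 0) for gene in counts]
    # Genes with no expression at all are zero in every cell.
    n_silent = sum(1 for z in zeros_per_gene if z == n_cells)
    return zeros_per_gene, n_silent


# Toy matrix: 4 genes x 5 cells (hypothetical values).
toy_counts = [
    [0, 0, 0, 0, 0],
    [3, 0, 12, 0, 7],
    [150, 98, 0, 210, 175],
    [0, 0, 0, 1, 0],
]
zeros_per_gene, n_silent = summarize_zeros(toy_counts)
print(zeros_per_gene, n_silent)
```

On real data the same two summaries give the kind of numbers quoted above (genes x cells, genes with no expression at all) and the input for a histogram of zero counts.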


Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017, "Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods"

  • 10000 genes simulated for 2 conditions with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
  • real dataset: 80 single cells as negative control
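With simulated data the truth is known, so the calls of a DE method can be scored against it. The sketch below is a generic illustration and not the evaluation code of the cited paper: gene identifiers are plain integers, the method's calls are invented, and the function name is hypothetical. It computes the true positive rate, false positive rate and precision for a design with 2000 truly DE genes out of 10000.

```python
def confusion_rates(called, truth, n_genes):
    """Score a set of DE calls against the simulated truth."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)      # DE genes that were called
    fp = len(called - truth)      # non-DE genes that were called
    fn = len(truth - called)      # DE genes that were missed
    tn = n_genes - tp - fp - fn   # non-DE genes that were not called
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp) if called else float("nan"),
    }


# Hypothetical example: genes 0..1999 are truly DE out of 10000 genes;
# a method calls 1500 of them plus 100 non-DE genes.
truth = range(2000)
called = list(range(1500)) + list(range(2000, 2100))
rates = confusion_rates(called, truth, n_genes=10000)
print(rates)
```

The function assumes that both DE and non-DE genes exist in the truth set, as they do in the simulation design above.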


Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4, "Differential expression analyses for single-cell RNA-Seq: old questions on new data"


Stay critical


Summary


  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, think about the distributions and models best for your data. Avoid applying methods blindly
  • comparing methods is good as long as you are aware of what you are comparing and why
  • stay critical


DE tutorial


The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196, "Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells"

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset
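Comparing the results of several methods often starts with the overlap of their DE gene lists. The sketch below is one possible way to do that, not the tutorial's code: the method names are taken from the list above, but the gene identifiers and the calls are invented, and the helper names are hypothetical. It computes pairwise Jaccard similarities and the genes called by at least a given number of methods.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two gene sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls):
    """calls: dict mapping method name -> iterable of DE gene IDs."""
    return {
        (m1, m2): jaccard(calls[m1], calls[m2])
        for m1, m2 in combinations(sorted(calls), 2)
    }


def consensus(calls, min_methods):
    """Genes called DE by at least min_methods methods."""
    votes = {}
    for genes in calls.values():
        for gene in set(genes):
            votes[gene] = votes.get(gene, 0) + 1
    return {gene for gene, n in votes.items() if n >= min_methods}


# Invented calls for illustration only.
calls = {
    "SCDE": {"gene1", "gene2", "gene3"},
    "MAST": {"gene2", "gene3", "gene4"},
    "Seurat": {"gene3", "gene5"},
}
print(pairwise_overlap(calls))
print(consensus(calls, min_methods=2))
```

A low overlap between two methods is not evidence that either is wrong; it is a prompt to check what each method models and tests.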


Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olga.dethlefsen@nbis.se



Page 9: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Common methods

Table Information of gene differential expression analysis methods used [Miao and Zhang 2017Quantitative Biology 2016 4]

Olga (NBIS) scRNA-seq de October 2017 9 34

Common methods

MAST

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effects DE isdetermined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 10 34

Common methods

SCDE

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 11 34

Common methods

Monocole

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio testOlga (NBIS) scRNA-seq de October 2017 12 34

Common methods

Letrsquos stop for a minute

Olga (NBIS) scRNA-seq de October 2017 13 34

Common methods

Differential expression

Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions

Olga (NBIS) scRNA-seq de October 2017 14 34

Common methods

The key

Outcomei = (Modeli) + errori

we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data

Olga (NBIS) scRNA-seq de October 2017 15 34

Common methods

The key

t = x1minusx2

sp

radic1

n1+ 1

n2

height [cm]

Fre

quen

cy

165 170 175 180

010

3050

Olga (NBIS) scRNA-seq de October 2017 16 34

Common methods

The key

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Olga (NBIS) scRNA-seq de October 2017 17 34

Common methods

The key MAST (again)

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 18 34

Common methods

The key SCDE (again)

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

Olga (NBIS) scRNA-seq de October 2017 19 34

Common methods

The key Monocole (again)

Originally designed for ordering cells by progress throughdifferentiation stages (pseudo-time)The mean expression level of each gene is modeled with a GAMgeneralized additive model which relates one or more predictorvariables to a response variable as

g(E(Y )) = β0 + f1(x1) + f2(x2) + + fm(xm) where Y is a specific geneexpression level xi are predictor variables g is a link function typically

log function and fi are non-parametric functions (eg cubic splines)

The observable expression level Y is then modeled using GAM

E(Y ) = s(ϕt(bx si)) + ε where ϕt(bx si) is the assigned pseudo-timeof a cell and s is a cubic smoothing function with three degrees offreedom The error term ε is normally distributed with a mean of zero

The DE test is performed using an approx χ2 likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 20 34

Common methods

They key implication

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Implicationthe better model fits to the data the better statistics

Olga (NBIS) scRNA-seq de October 2017 21 34

Common methods

Negative Binomial

Read Counts

Fre

quen

cy

0 5 10 15 20

050

100

150

200

Zerominusinflated NB

Read Counts

Fre

quen

cy

0 5 10 15 20

010

020

030

040

050

0

PoissonminusBeta

Read Counts

Fre

quen

cy

0 20 60 100

010

020

030

040

0

Olga (NBIS) scRNA-seq de October 2017 22 34

Performance

Performance

Olga (NBIS) scRNA-seq de October 2017 23 34

Performance

No golden standard

There is no golden standard no single best solution

so what do we do

we gather as much evidence as possible

Olga (NBIS) scRNA-seq de October 2017 24 34

Performance

No golden standard

There is no golden standard no single best solution

so what do we do

we gather as much evidence as possible

Olga (NBIS) scRNA-seq de October 2017 24 34

Performance

Get to know your data amp wisely choose DE methods

Example data 46078 genes x 96 cells22229 genes with no expression at all

Read Counts

Fre

quen

cy

0 500 1000 1500

050

0015

000

0 counts

Fre

quen

cy

0 20 40 60 80

020

0040

0060

00

Olga (NBIS) scRNA-seq de October 2017 25 34

Performance

Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

  • 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed, according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
  • real dataset: 80 single cells as negative control
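When the truth is known, as in such a simulation, a DE method can be scored by comparing its calls with the simulated labels. The sketch below shows one common way of doing this (sensitivity, specificity, precision at a p-value cut-off); it is an assumption about how such a benchmark could be scored, not the procedure used in the cited paper, and the labels and p-values are invented.

```python
def score_calls(is_de_truth, p_values, alpha=0.05):
    """Compare DE calls (p < alpha) with known truth labels and return basic metrics."""
    tp = fp = tn = fn = 0
    for truth, p in zip(is_de_truth, p_values):
        called = p < alpha
        if truth and called:
            tp += 1
        elif truth:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "precision": tp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical example: 2 truly DE genes, 3 non-DE genes
truth = [True, True, False, False, False]
p_vals = [0.001, 0.2, 0.03, 0.5, 0.9]
scores = score_calls(truth, p_vals)

if __name__ == "__main__":
    print(scores)
```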


Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data


Stay critical


Summary
  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, think about the distributions and models best for your data. Avoid applying methods blindly
  • comparing methods is good, as long as you are aware of what you are comparing and why
  • stay critical


DE tutorial

Based on single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset
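One simple way to compare the results of several methods is to look at how much their lists of DE genes overlap. The following sketch computes pairwise Jaccard indices and the genes called by every method; the gene names and the calls assigned to each method are invented placeholders, not tutorial results.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_calls(calls):
    """calls: dict mapping method name -> set of genes called DE by that method."""
    pairwise = {
        (m1, m2): jaccard(calls[m1], calls[m2])
        for m1, m2 in combinations(sorted(calls), 2)
    }
    consensus = set.intersection(*calls.values())
    return pairwise, consensus


# Hypothetical DE calls per method
calls = {
    "SCDE": {"GeneA", "GeneB", "GeneC"},
    "MAST": {"GeneB", "GeneC", "GeneD"},
    "Seurat": {"GeneC", "GeneD"},
}
pairwise, consensus = compare_calls(calls)

if __name__ == "__main__":
    for pair, value in pairwise.items():
        print(pair, round(value, 2))
    print("called by all methods:", sorted(consensus))
```

A high overlap does not prove the calls are correct, but genes found by only one method deserve a closer look.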


Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olgadethlefsennbisse


Common methods

MAST

  • uses a generalized linear hurdle model
  • designed to account for stochastic dropouts and bimodal expression distributions, in which expression is either strongly non-zero or non-detectable
  • the rate of expression Z and the level of expression Y are modeled for each gene g, with Z indicating whether gene g is expressed in cell i (i.e. $Z_{ig} = 0$ if $y_{ig} = 0$ and $Z_{ig} = 1$ if $y_{ig} > 0$)
  • a logistic regression model is used for the discrete variable Z and a Gaussian linear model for the continuous variable (Y | Z = 1):

$$\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$$

$$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$$

where $X_i$ is a design matrix

  • model parameters are fitted using an empirical Bayesian framework
  • allows for a joint estimate of nuisance and treatment effects
  • DE is determined using the likelihood ratio test
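The two-part structure can be made concrete with a deliberately simplified sketch for a single gene and two groups of cells. It assumes log-scale expression values where 0 means "not detected", and a design matrix containing only the group indicator, so both parts have closed-form maximum-likelihood fits. It omits the empirical Bayes step entirely and is not the MAST package's implementation; the function names are hypothetical.

```python
import math


def bernoulli_loglik(k, n):
    """Maximised Bernoulli log-likelihood for k detections in n cells."""
    if k == 0 or k == n:
        return 0.0
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def rss(values):
    """Residual sum of squares around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)


def hurdle_lrt(group_a, group_b):
    """Two-group hurdle likelihood ratio test for one gene; returns (statistic, df, p_value)."""
    pos_a = [y for y in group_a if y > 0]
    pos_b = [y for y in group_b if y > 0]

    # Discrete part (Z): does the detection rate differ between the groups?
    ll_alt = bernoulli_loglik(len(pos_a), len(group_a)) + bernoulli_loglik(len(pos_b), len(group_b))
    ll_null = bernoulli_loglik(len(pos_a) + len(pos_b), len(group_a) + len(group_b))
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = 1

    # Continuous part (Y | Z = 1): Gaussian model on the expressing cells only.
    rss_alt = rss(pos_a) + rss(pos_b)
    if pos_a and pos_b and rss_alt > 0.0:
        m = len(pos_a) + len(pos_b)
        stat += max(0.0, m * math.log(rss(pos_a + pos_b) / rss_alt))
        df = 2

    # Chi-square survival function, closed form for 1 and 2 degrees of freedom.
    if df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, df, p_value


if __name__ == "__main__":
    a = [0, 0, 0, 0, 1.0, 1.2]      # hypothetical log-expression values, group A
    b = [3.0, 3.5, 2.8, 3.2, 3.1, 0]  # hypothetical log-expression values, group B
    print(hurdle_lrt(a, b))
```

The statistics of the two parts are summed, so a gene can be called DE because its detection rate changes, because its expression level in expressing cells changes, or both.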


SCDE

  • models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
  • the NB distribution models the transcripts that are amplified and detected
  • the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
  • a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
  • for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
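The mixture idea can be illustrated by computing, for an observed count, the posterior probability that it came from the Poisson (dropout/background) component rather than the NB (amplified) component. This is the kind of quantity an EM algorithm evaluates in its E-step. The sketch below is not the SCDE package's code, and all parameter values (`pi_dropout`, `lam_bg`, `mu`, `size`) are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of count k with rate lam (computed on the log scale)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k, mu, size):
    """Negative binomial probability of count k: mean mu, variance mu + mu**2 / size."""
    log_p = (
        math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
        + size * math.log(size / (size + mu))
        + k * math.log(mu / (size + mu))
    )
    return math.exp(log_p)


def dropout_posterior(k, pi_dropout, lam_bg, mu, size):
    """Posterior probability that count k belongs to the Poisson (dropout) component."""
    drop = pi_dropout * poisson_pmf(k, lam_bg)
    amplified = (1.0 - pi_dropout) * nb_pmf(k, mu, size)
    return drop / (drop + amplified)


if __name__ == "__main__":
    # Hypothetical parameters: 30% dropout, background rate 0.1, NB mean 50, size 2
    for count in (0, 1, 5, 50):
        print(count, dropout_posterior(count, 0.3, 0.1, 50.0, 2.0))
```

With these assumed parameters a zero count is almost certainly attributed to dropout, while a count of 50 is attributed to the amplified component.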


Monocle

  • originally designed for ordering cells by progress through differentiation stages (pseudo-time)
  • the mean expression level of each gene is modeled with a GAM (generalized additive model), which relates one or more predictor variables to a response variable as

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$

where Y is a specific gene expression level, $x_i$ are predictor variables, g is a link function (typically a log function), and $f_i$ are non-parametric functions (e.g. cubic splines)

  • the observable expression level Y is then modeled using a GAM:

$$Y = s(\varphi_t(b_x, s_i)) + \varepsilon$$

where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and s is a cubic smoothing function with three degrees of freedom. The error term $\varepsilon$ is normally distributed with a mean of zero

  • the DE test is performed using an approximate $\chi^2$ likelihood ratio test
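The logic of testing a gene for pseudo-time dependence can be sketched as a likelihood ratio test between a smooth curve and a flat line. To stay self-contained, the sketch below swaps the cubic smoothing spline for a plain cubic polynomial (three extra parameters, hence three degrees of freedom), fitted by ordinary least squares with Gaussian errors. It assumes pseudo-time is scaled to [0, 1] and is not Monocle's implementation; all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def rss_poly(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of the given degree."""
    basis = [[ti ** p for p in range(degree + 1)] for ti in t]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(basis, y))


def pseudotime_lrt(pseudotime, expression):
    """LRT of a cubic curve in pseudo-time against a constant mean; returns (statistic, p_value)."""
    n = len(expression)
    rss_null = rss_poly(pseudotime, expression, 0)  # intercept only
    rss_alt = rss_poly(pseudotime, expression, 3)   # cubic polynomial (stand-in for the spline)
    stat = max(0.0, n * math.log(rss_null / rss_alt))
    # Chi-square survival function with 3 degrees of freedom (closed form).
    p_value = math.erfc(math.sqrt(stat / 2.0)) + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    t = [i / 29 for i in range(30)]
    # Hypothetical gene whose expression rises along pseudo-time, plus small deterministic wiggle
    y = [2 + 3 * ti + 0.1 * math.sin(7 * i) for i, ti in enumerate(t)]
    print(pseudotime_lrt(t, y))
```

The chi-square reference distribution is only approximate here, which mirrors the "approx" wording on the slide.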

Let's stop for a minute


Differential expression

Differential expression analysis:
  • means taking the normalized read count data & performing statistical analysis to discover quantitative changes in expression levels between experimental groups
  • e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
  • or simply checking for differences in distributions
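The phrase "greater than what would be expected just due to natural random variation" can be made concrete with a permutation test: shuffle the group labels many times and see how often a difference at least as large as the observed one arises by chance. This is a generic illustration using the standard library, not one of the DE methods discussed in this talk; the function name and the expression values are made up.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of cells to groups
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical normalized expression of one gene in two groups of cells
    print(permutation_p_value([1, 2, 3, 4, 5, 6], [11, 12, 13, 14, 15, 16]))
```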


The key

$$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$$

  • we collect data on a sample from a much larger population
  • statistics lets us make inferences about the population from which the sample was derived
  • we try to predict the outcome given a model fitted to the data


$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

[Figure: histogram of height [cm] (y-axis: Frequency)]
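As a worked instance of the two-sample t statistic with pooled standard deviation $s_p$, the following sketch computes t for two small groups. The height values are invented for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(x1, x2):
    """Two-sample t statistic with pooled standard deviation; returns (t, degrees_of_freedom)."""
    n1, n2 = len(x1), len(x2)
    # Pooled standard deviation from the two sample variances
    sp = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    t = (mean(x1) - mean(x2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


if __name__ == "__main__":
    # Hypothetical heights [cm] in two groups
    group_1 = [168.0, 171.5, 173.0, 169.5, 175.0]
    group_2 = [172.0, 176.5, 178.0, 174.0, 179.5]
    print(pooled_t(group_1, group_2))
```

The statistic is then compared with a t distribution with $n_1 + n_2 - 2$ degrees of freedom.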


Simple recipe:
  • model, e.g. gene expression with random error
  • fit the model to the data and/or the data to the model; estimate the model parameters
  • use the model for prediction and/or inference


Page 13: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Common methods

Letrsquos stop for a minute

Olga (NBIS) scRNA-seq de October 2017 13 34

Common methods

Differential expression

Differential expression analysismeans taking the normalized read count data ampperforming statistical analysis to discover quantitative changes inexpression levels between experimental groupseg to decide whether for a given gene an observed difference inread counts is significant that is whether it is greater than whatwould be expected just due to natural random variationor simply checking for differences in distributions

Olga (NBIS) scRNA-seq de October 2017 14 34

Common methods

The key

Outcomei = (Modeli) + errori

we collect data on a sample from a much larger populationStatistics lets us to make inferences about the population fromwhich it was derivedwe try to predict the outcome given a model fitted to the data

Olga (NBIS) scRNA-seq de October 2017 15 34

Common methods

The key

t = x1minusx2

sp

radic1

n1+ 1

n2

height [cm]

Fre

quen

cy

165 170 175 180

010

3050

Olga (NBIS) scRNA-seq de October 2017 16 34

Common methods

The key

Simple recipemodel eg gene expression with random errorfit model to the data andor data to the model estimate modelparametersuse model for prediction andor inference

Olga (NBIS) scRNA-seq de October 2017 17 34

Common methods

The key MAST (again)

uses generalized linear hurdle modeldesigned to account for stochastic dropouts and bimodalexpression distribution in which expression is either stronglynon-zero or non-detectableThe rate of expression Z and the level of expression Y aremodeled for each gene g indicating whether gene g is expressedin cell i (ie Zig = 0 if yig = 0 and zig = 1 if yig gt 0)A logistic regression model for the discrete variable Z and aGaussian linear model for the continuous variable (Y|Z=1)

logit(Pr (Zig = 1)) = XiβDg

Pr (Yig = Y |Zig = 1) = N(XiβCg σ

2g) where Xi is a design matrix

Model parameters are fitted using an empirical BayesianframeworkAllows for a joint estimate of nuisance and treatment effectsDE is determined using the likelihood ratio test

Olga (NBIS) scRNA-seq de October 2017 18 34

Common methods

The key SCDE (again)

models the read counts for each gene using a mixture of a NBnegative binomial and a Poisson distributionNB distribution models the transcripts that are amplified anddetectedPoisson distribution models the unobserved or background-levelsignal of transcripts that are not amplified (eg dropout events)subset of robust genes is used to fit via EM algorithm theparameters to the mixture of modelsFor DE the posterior probability that the gene shows a foldexpression difference between two conditions is computed using aBayesian approach

The key: Monocle (again)

  • originally designed for ordering cells by progress through differentiation stages (pseudo-time)
  • the mean expression level of each gene is modeled with a generalized additive model (GAM), which relates one or more predictor variables to a response variable as

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$

    where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)
  • the observable expression level Y is then modeled using a GAM:

$$Y = s(\varphi_t(b_x, s_i)) + \varepsilon$$

    where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\varepsilon$ is normally distributed with a mean of zero
  • the DE test is performed using an approximate $\chi^2$ likelihood ratio test
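The likelihood ratio test along pseudo-time can be sketched as follows. Note the swap: an ordinary cubic polynomial stands in for the cubic smoothing spline, so this is an illustration of the test and not Monocle's implementation. The data and all function names are made up, and the closed-form p-value holds only for three degrees of freedom.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss_polynomial(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit."""
    basis = [[ti ** d for d in range(degree + 1)] for ti in t]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(basis, y))

def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def pseudotime_lrt(t, y):
    """Compare a constant model with a cubic trend over pseudo-time (Gaussian errors)."""
    stat = len(y) * math.log(rss_polynomial(t, y, 0) / rss_polynomial(t, y, 3))
    return stat, chi2_sf_3df(stat)

# Made-up gene whose expression rises along pseudo-time.
pseudotime = [i / 19 for i in range(20)]
expression = [2 + 3 * t + 0.1 * math.sin(7 * i) for i, t in enumerate(pseudotime)]
stat, p_value = pseudotime_lrt(pseudotime, expression)
```

A small p-value means the gene's expression changes along pseudo-time more than a flat model with noise would explain.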

The key: implication

Simple recipe
  • model, e.g. gene expression with random error
  • fit model to the data and/or data to the model, estimate model parameters
  • use model for prediction and/or inference

Implication: the better the model fits the data, the better the statistics

Figure: example histograms of read counts (x-axis) against frequency (y-axis) under three distributions: Negative Binomial, Zero-inflated NB and Poisson-Beta
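The three count distributions named above can be simulated with the standard library alone, which helps build intuition for how they differ. This is a sketch under assumptions: the NB is drawn as a gamma-Poisson mixture, the Poisson-Beta as a Poisson whose rate is a maximum rate scaled by a Beta variable, and all parameter values are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Knuth's Poisson sampler; adequate for the moderate rates used here."""
    if lam <= 0:
        return 0
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rnbinom(mu, size, rng):
    """Negative binomial as a gamma-Poisson mixture (mean mu, dispersion size)."""
    return rpois(rng.gammavariate(size, mu / size), rng)

def rzinb(mu, size, pi_zero, rng):
    """Zero-inflated NB: an extra point mass at zero with probability pi_zero."""
    return 0 if rng.random() < pi_zero else rnbinom(mu, size, rng)

def rpoisbeta(k_max, alpha, beta, rng):
    """Poisson-Beta: Poisson rate is k_max scaled by a Beta(alpha, beta) variable."""
    return rpois(k_max * rng.betavariate(alpha, beta), rng)

def zero_fraction(counts):
    return sum(1 for c in counts if c == 0) / len(counts)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
nb = [rnbinom(5.0, 2.0, rng) for _ in range(2000)]
zinb = [rzinb(5.0, 2.0, 0.5, rng) for _ in range(2000)]
pb = [rpoisbeta(100.0, 0.5, 2.0, rng) for _ in range(2000)]
nb_zero, zinb_zero = zero_fraction(nb), zero_fraction(zinb)
```

Comparing `nb_zero` and `zinb_zero` shows the excess of zeros that a plain NB cannot reproduce.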

Performance

No gold standard

There is no gold standard, no single best solution

so what do we do?

we gather as much evidence as possible

Get to know your data & choose DE methods wisely

Example data: 46078 genes x 96 cells; 22229 genes with no expression at all

Figure: two histograms of the example data, one of read counts (x-axis) against frequency (y-axis) and one of 0 counts (x-axis) against frequency (y-axis)
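A first look at the zeros takes only a few lines. The sketch below uses a toy matrix with made-up counts; in the example data above, the same summary would report 22229 of 46078 genes (roughly 48%) with no expression at all.

```python
def zero_summary(counts):
    """counts: one list per gene, holding that gene's read count in each cell."""
    # Genes that are zero in every cell carry no information for DE.
    n_silent = sum(1 for gene in counts if all(c == 0 for c in gene))
    # Number of cells with a zero count, gene by gene.
    zeros_per_gene = [sum(1 for c in gene if c == 0) for gene in counts]
    return n_silent, zeros_per_gene

# Toy matrix: 4 genes x 5 cells (made-up counts).
toy = [
    [0, 0, 0, 0, 0],
    [3, 0, 7, 0, 1],
    [0, 0, 0, 2, 0],
    [12, 5, 9, 30, 4],
]
n_silent, zeros_per_gene = zero_summary(toy)
```

Genes that are zero everywhere are usually filtered out before testing, and the per-gene zero counts indicate how much zero inflation a chosen model has to handle.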

Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

  • 10000 genes simulated for 2 conditions with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed according to four types of differential expression
  • real dataset: 44 mouse Embryonic Stem Cells and 44 Embryonic Fibroblasts as positive control
  • real dataset: 80 single cells as negative control
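Simulated data are useful because the truth is known for every gene, so a method's calls can be scored directly. The sketch below shows one way to do that scoring; the 8000 / 2000 split mirrors the design above, but the calls themselves are hypothetical and do not come from the paper.

```python
def evaluate_calls(is_de_truth, is_de_called):
    """Compare DE calls against simulated ground truth, gene by gene."""
    pairs = list(zip(is_de_truth, is_de_called))
    tp = sum(1 for truth, called in pairs if truth and called)
    fp = sum(1 for truth, called in pairs if not truth and called)
    fn = sum(1 for truth, called in pairs if truth and not called)
    tn = sum(1 for truth, called in pairs if not truth and not called)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "false_positive_rate": fp / (fp + tn) if fp + tn else float("nan"),
        "false_discovery_rate": fp / (tp + fp) if tp + fp else float("nan"),
    }

# Ground truth: 8000 non-DE genes followed by 2000 DE genes.
truth = [False] * 8000 + [True] * 2000
# Hypothetical calls: 400 false positives and 1500 true positives.
called = [i < 400 for i in range(8000)] + [i < 1500 for i in range(2000)]
metrics = evaluate_calls(truth, called)
```

Reporting sensitivity together with the false discovery rate matters here, because a method can look sensitive simply by calling many genes.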

Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data

Stay critical

Summary

  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, think about distributions and models best for your data. Avoid applying methods blindly
  • comparing methods is good as long as you are aware what you are comparing and why
  • stay critical

DE tutorial

The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset
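One simple way to compare the results of several methods is to look at how much their DE gene lists overlap. The sketch below uses the Jaccard index for this; the gene names and lists are made up and are not results from the tutorial dataset.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical DE gene lists per method (names and contents are made up).
de_calls = {
    "SCDE": {"geneA", "geneB", "geneC"},
    "MAST": {"geneB", "geneC", "geneD"},
    "Seurat": {"geneC", "geneD", "geneE"},
}

# Pairwise overlap between methods, keyed by the sorted pair of method names.
overlap = {
    (m1, m2): jaccard(de_calls[m1], de_calls[m2])
    for m1, m2 in combinations(sorted(de_calls), 2)
}
# Genes called by every method.
consensus = set.intersection(*de_calls.values())
```

Low pairwise overlap is a signal to look at which model assumptions differ, rather than a reason to pick the method with the longest list.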

Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olgadethlefsennbisse

Differential expression

Differential expression analysis means taking the normalized read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups
  • e.g. to decide whether, for a given gene, an observed difference in read counts is significant, that is, whether it is greater than what would be expected just due to natural random variation
  • or simply checking for differences in distributions
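The question "is this difference greater than what random variation would produce?" can be answered very directly with a permutation test. This is a generic sketch, not one of the methods covered in the talk: group labels are shuffled many times to see how often a difference at least as large as the observed one arises by chance. The expression values are made up.

```python
import random
from statistics import mean

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Made-up normalized expression values for one gene in two groups of cells.
group_1 = [10, 12, 11, 13, 12, 14, 11, 12]
group_2 = [1, 0, 2, 1, 0, 3, 1, 2]
p_value = permutation_pvalue(group_1, group_2)
```

Because it makes no distributional assumption, this kind of test is a useful baseline against which model-based methods can be compared.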

The key

$$\text{Outcome}_i = (\text{Model}_i) + \text{error}_i$$

Page 18: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Common methods

The key: MAST (again)

  • uses a generalized linear hurdle model
  • designed to account for stochastic dropouts and a bimodal expression distribution, in which expression is either strongly non-zero or non-detectable
  • the rate of expression Z and the level of expression Y are modeled for each gene g, where Z indicates whether gene g is expressed in cell i (i.e. $z_{ig} = 0$ if $y_{ig} = 0$ and $z_{ig} = 1$ if $y_{ig} > 0$)
  • a logistic regression model is used for the discrete variable Z and a Gaussian linear model for the continuous variable (Y | Z = 1):

$\operatorname{logit}(\Pr(Z_{ig} = 1)) = X_i \beta_g^D$

$\Pr(Y_{ig} = y \mid Z_{ig} = 1) = N(X_i \beta_g^C, \sigma_g^2)$, where $X_i$ is a design matrix

  • model parameters are fitted using an empirical Bayesian framework
  • allows for a joint estimate of nuisance and treatment effects
  • DE is determined using the likelihood ratio test
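To make the two-part idea concrete, here is a minimal sketch of a hurdle-style likelihood ratio test for a single gene and two groups of cells. It is not the MAST implementation: there is no empirical Bayes shrinkage and no covariates beyond the group label, and the function name `hurdle_lrt` and its helpers are hypothetical. It assumes log-scale expression values where 0 means "not detected".

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def bernoulli_loglik(k, n):
    """Maximised log-likelihood of k detections in n cells (rate = k / n)."""
    p = k / n
    return _xlogy(k, p) + _xlogy(n - k, 1 - p)


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def hurdle_lrt(group_a, group_b):
    """Two-part likelihood ratio test for one gene (hypothetical helper).

    Inputs are log-scale expression values per cell, with 0 meaning
    "not detected". Assumes each group has at least two detected cells
    and that the detected values are not all identical.
    """
    pos_a = [y for y in group_a if y > 0]
    pos_b = [y for y in group_b if y > 0]
    if len(pos_a) < 2 or len(pos_b) < 2:
        raise ValueError("need at least two detected cells per group")

    # Discrete part (Z): group-specific detection rates vs one pooled rate.
    ll_full = (bernoulli_loglik(len(pos_a), len(group_a))
               + bernoulli_loglik(len(pos_b), len(group_b)))
    ll_null = bernoulli_loglik(len(pos_a) + len(pos_b),
                               len(group_a) + len(group_b))
    lr_discrete = 2 * (ll_full - ll_null)

    # Continuous part (Y | Z = 1): group-specific means vs one pooled mean,
    # Gaussian errors with a common variance estimated by maximum likelihood.
    n_pos = len(pos_a) + len(pos_b)
    rss_full = sum_sq(pos_a) + sum_sq(pos_b)
    rss_null = sum_sq(pos_a + pos_b)
    lr_continuous = n_pos * math.log(rss_null / rss_full)

    stat = max(lr_discrete + lr_continuous, 0.0)
    # The chi-square survival function with 2 degrees of freedom is exp(-x / 2).
    return stat, math.exp(-stat / 2)
```

The two statistics are added because the discrete and continuous parts are modeled separately, so the combined test has one degree of freedom from each part.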

The key: SCDE (again)

  • models the read counts for each gene using a mixture of a negative binomial (NB) and a Poisson distribution
  • the NB distribution models the transcripts that are amplified and detected
  • the Poisson distribution models the unobserved or background-level signal of transcripts that are not amplified (e.g. dropout events)
  • a subset of robust genes is used to fit, via the EM algorithm, the parameters of the mixture of models
  • for DE, the posterior probability that the gene shows a fold expression difference between two conditions is computed using a Bayesian approach
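As an illustration of the mixture idea, the sketch below computes the posterior probability that an observed count came from the Poisson (dropout) component rather than the NB (amplified) component. This is the kind of quantity an EM step works with, but it is not the SCDE code: the function names and all parameter values are assumptions chosen for illustration, and the mixture parameters are taken as given rather than fitted.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing count k with rate lam (> 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def nb_pmf(k, mean, size):
    """Negative binomial probability of count k (mean/size parameterisation)."""
    p = size / (size + mean)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1 - p))
    return math.exp(log_pmf)


def dropout_posterior(k, weight, lam, mean, size):
    """Posterior probability that count k came from the Poisson component.

    weight is the prior mixing proportion of the dropout component;
    lam is the background Poisson rate; mean and size describe the NB.
    """
    dropout = weight * poisson_pmf(k, lam)
    amplified = (1 - weight) * nb_pmf(k, mean, size)
    return dropout / (dropout + amplified)


# Illustrative (made-up) parameters: 30% dropout, background rate 0.1,
# NB with mean 20 and size 2.
for count in (0, 1, 5, 20):
    print(count, dropout_posterior(count, 0.3, 0.1, 20.0, 2.0))
```

With these illustrative parameters a zero count is attributed mostly to dropout, while a count of 20 is attributed almost entirely to the NB component.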

The key: Monocle (again)

  • Originally designed for ordering cells by progress through differentiation stages (pseudo-time)
  • The mean expression level of each gene is modeled with a GAM (generalized additive model), which relates one or more predictor variables to a response variable as

$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$

where $Y$ is a specific gene expression level, $x_i$ are predictor variables, $g$ is a link function (typically the log function) and $f_i$ are non-parametric functions (e.g. cubic splines)

  • The observable expression level $Y$ is then modeled using a GAM:

$E(Y) = s(\varphi_t(b_x, s_i)) + \epsilon$

where $\varphi_t(b_x, s_i)$ is the assigned pseudo-time of a cell and $s$ is a cubic smoothing function with three degrees of freedom. The error term $\epsilon$ is normally distributed with a mean of zero

  • The DE test is performed using an approximate $\chi^2$ likelihood ratio test
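The likelihood ratio test along pseudo-time can be sketched as follows. Note the substitution: instead of a smoothing spline GAM, this sketch fits a plain cubic polynomial in pseudo-time by least squares and compares it with a constant-mean model, so the full model has three extra coefficients. All function names are hypothetical and this is not Monocle's code.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss_polynomial(t, y, degree):
    """Residual sum of squares of a least-squares polynomial fit of y on t."""
    basis = [[ti ** d for d in range(degree + 1)] for ti in t]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(basis, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in basis]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def chi2_sf_3df(x):
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


def pseudotime_lrt(t, y):
    """Gaussian likelihood ratio test: cubic trend in pseudo-time vs constant mean."""
    n = len(y)
    rss_null = rss_polynomial(t, y, 0)  # constant mean only
    rss_full = rss_polynomial(t, y, 3)  # cubic polynomial in pseudo-time
    stat = max(n * math.log(rss_null / rss_full), 0.0)
    return stat, chi2_sf_3df(stat)
```

A gene whose expression changes smoothly along pseudo-time gives a large statistic and a small approximate p-value; the chi-square reference distribution is only an approximation for small numbers of cells.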

The key implication

Simple recipe
  • model, e.g. gene expression with random error
  • fit model to the data and/or data to the model, estimate model parameters
  • use model for prediction and/or inference

Implication
  • the better the model fits the data, the better the statistics
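The recipe can be tried on a toy example: fit two candidate models to the same counts and compare how well each one fits. The sketch below fits a Poisson model (rate set to the sample mean) and a negative binomial model (method of moments, which is a simplification and not a maximum likelihood fit) and reports their log-likelihoods. The counts are made up for illustration.

```python
import math


def poisson_loglik(counts):
    """Log-likelihood of a Poisson model with the rate set to the sample mean."""
    lam = sum(counts) / len(counts)
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


def nb_loglik(counts):
    """Log-likelihood of a negative binomial fitted by the method of moments.

    Assumes the sample variance exceeds the sample mean (overdispersion).
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((k - mean) ** 2 for k in counts) / (n - 1)
    if var <= mean:
        raise ValueError("no overdispersion: a Poisson model is sufficient")
    size = mean ** 2 / (var - mean)
    p = size / (size + mean)
    return sum(math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1 - p) for k in counts)


# Made-up counts for one gene across 20 cells: many zeros, a few large values.
counts = [0, 0, 0, 0, 1, 0, 2, 0, 15, 0, 30, 3, 0, 0, 7, 0, 1, 0, 22, 0]
print("Poisson log-likelihood:", round(poisson_loglik(counts), 1))
print("NB log-likelihood:     ", round(nb_loglik(counts), 1))
```

For overdispersed counts like these, the NB model has a clearly higher log-likelihood than the Poisson model, which is the sense in which it fits the data better.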

[Figure: three histograms of read counts (x-axis: Read Counts, y-axis: Frequency) for the Negative Binomial, Zero-inflated NB and Poisson-Beta distributions]

Performance


No gold standard

There is no gold standard, no single best solution.

So what do we do?

We gather as much evidence as possible.

Get to know your data & wisely choose DE methods

Example data: 46078 genes x 96 cells
22229 genes with no expression at all

[Figure: two histograms of the example data, with x-axes "Read Counts" and "0 counts" and y-axis "Frequency"]
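A first look at the data along these lines can be as simple as counting zeros. The sketch below assumes a genes x cells count matrix stored as a list of rows; the function name `zero_summary` and the toy matrix are hypothetical, not the example data from the slide.

```python
def zero_summary(counts):
    """Summarise zeros in a genes x cells count matrix (list of rows).

    Returns the number of zero counts per gene and the number of genes
    with no expression in any cell.
    """
    n_cells = len(counts[0])
    zeros_per_gene = [sum(1 for c in row if c == 0) for row in counts]
    silent_genes = sum(1 for z in zeros_per_gene if z == n_cells)
    return zeros_per_gene, silent_genes


# Toy matrix: 4 genes x 4 cells.
toy = [
    [0, 0, 0, 0],
    [5, 0, 12, 0],
    [3, 7, 1, 9],
    [0, 0, 0, 0],
]
zeros_per_gene, silent_genes = zero_summary(toy)
print("zeros per gene:", zeros_per_gene)
print("genes with no expression at all:", silent_genes)
```

Genes with no expression in any cell carry no information for DE and are usually filtered out before testing; the distribution of zeros per gene helps decide whether a zero-inflated or mixture model is worth considering.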

Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

  • 10000 genes simulated for 2 conditions, with a sample size of 100 cells each
  • 8000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2000 genes were simulated as differentially expressed, according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts, as positive control
  • real dataset: 80 single cells, as negative control
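When the truth is known, as in a simulation with a fixed set of DE and non-DE genes, a method's calls can be scored directly. The following is a minimal sketch of such scoring; the function name `evaluate_calls` and the gene labels are hypothetical, and it assumes there is at least one true DE gene, one non-DE gene and one called gene.

```python
def evaluate_calls(called, truly_de, all_genes):
    """Score a set of DE calls against a known truth (hypothetical helper).

    Assumes at least one true DE gene, one non-DE gene and one called gene,
    so that none of the denominators is zero.
    """
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    tp = len(called & truly_de)              # DE genes that were called
    fp = len(called - truly_de)              # non-DE genes that were called
    fn = len(truly_de - called)              # DE genes that were missed
    tn = len(all_genes - called - truly_de)  # non-DE genes left uncalled
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "precision": tp / (tp + fp),
    }


# Toy example: 10 genes, 2 of them truly DE, a method calls 2 genes.
genes = [f"g{i}" for i in range(10)]
scores = evaluate_calls(called=["g0", "g5"], truly_de=["g0", "g1"],
                        all_genes=genes)
print(scores)
```

On real data without known truth, such scores are only available through positive and negative control datasets like the ones listed above.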


Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data
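One simple way to compare methods on the same dataset is to look at how much their lists of DE genes overlap. The sketch below computes the pairwise Jaccard index between gene lists; the method names and gene lists are placeholders, not results from any real method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene lists: size of overlap / size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(calls):
    """Jaccard index for every pair of methods in a {method: genes} dict."""
    return {(m1, m2): jaccard(calls[m1], calls[m2])
            for m1, m2 in combinations(sorted(calls), 2)}


# Placeholder gene lists for three hypothetical methods.
calls = {
    "method_a": ["g1", "g2", "g3"],
    "method_b": ["g2", "g3", "g4"],
    "method_c": [],
}
for pair, score in pairwise_overlap(calls).items():
    print(pair, score)
```

A high overlap does not by itself mean the calls are correct, only that the methods agree, so it is worth being clear about what is being compared and why.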

Stay critical

Summary

  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, think about distributions and models best for your data. Avoid applying methods blindly
  • comparing methods is good as long as you are aware of what you are comparing and why
  • stay critical

DE tutorial

The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • with many methods, incl. SCDE, MAST, SC3 package, Pagoda, Seurat
  • and compare the results, trying to decide on the best DE method for the dataset

Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olgadethlefsennbisse

Page 25: scRNA-seq - Differential expression analysis methods · Differential expression analysis methods Olga Dethlefsen ... DE tutorial DE tutorial Olga ... scRNA-seq - Differential expression

Performance

No golden standard

There is no golden standard no single best solution

so what do we do

we gather as much evidence as possible

Olga (NBIS) scRNA-seq de October 2017 24 34

Performance

Get to know your data amp wisely choose DE methods

Example data 46078 genes x 96 cells22229 genes with no expression at all

Read Counts

Fre

quen

cy

0 500 1000 1500

050

0015

000

0 counts

Fre

quen

cy

0 20 40 60 80

020

0040

0060

00

Olga (NBIS) scRNA-seq de October 2017 25 34

Performance

Learn from methodological papers and/or past studies

e.g. Dal Molin, Baruzzo and Di Camillo, Frontiers in Genetics 2017: Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods

  • 10,000 genes simulated for 2 conditions, with a sample size of 100 cells each
  • 8,000 genes were simulated as not differentially expressed, using the same distribution (unimodal NB and bimodal two-component NB mixture)
  • 2,000 genes were simulated as differentially expressed, according to four types of differential expression
  • real dataset: 44 mouse embryonic stem cells and 44 embryonic fibroblasts as positive control
  • real dataset: 80 single cells as negative control
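The simulation design above can be illustrated with a small sketch. It is not the procedure used in the paper: it assumes a unimodal negative binomial (NB) only, drawn as a gamma-Poisson mixture, and it models a single kind of differential expression (a shift in the mean). The function names and the values of `base_mean`, `fold_change` and `size` are hypothetical choices made for illustration.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(mean, size, rng):
    """Draw one NB value with the given mean and dispersion parameter `size`."""
    # Gamma-Poisson mixture: the rate is gamma distributed with mean `mean`.
    lam = rng.gammavariate(size, mean / size)
    return sample_poisson(lam, rng)


def simulate(n_genes=10000, n_de=2000, n_cells=100,
             base_mean=5.0, fold_change=4.0, size=2.0, seed=1):
    """Simulate two conditions; the first `n_de` genes get a mean shift in B."""
    rng = random.Random(seed)
    cond_a, cond_b, is_de = [], [], []
    for g in range(n_genes):
        de = g < n_de
        # Non-DE genes use the same distribution in both conditions.
        mean_b = base_mean * fold_change if de else base_mean
        cond_a.append([sample_nb(base_mean, size, rng) for _ in range(n_cells)])
        cond_b.append([sample_nb(mean_b, size, rng) for _ in range(n_cells)])
        is_de.append(de)
    return cond_a, cond_b, is_de
```

Calling `simulate(n_genes=50, n_de=10, n_cells=40)` gives a quick small-scale run; the defaults mirror the 10,000 / 2,000 / 100 layout described above but take noticeably longer in pure Python. Because the true DE status is known in `is_de`, the output of any DE method can be scored against it.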

Compare methods

e.g. Miao and Zhang, Quantitative Biology 2016, 4: Differential expression analyses for single-cell RNA-Seq: old questions on new data

Stay critical

Summary

  • scRNA-seq is a rapidly growing field
  • DE is a common task, so many newer and better methods will be developed
  • think like a statistician: get to know your data, and think about the distributions and models that suit it best; avoid applying methods blindly
  • comparing methods is good, as long as you are aware of what you are comparing and why
  • stay critical

DE tutorial

The dataset used is single-cell RNA-seq data (SmartSeq) from mouse embryonic development, from Deng et al., Science 2014, Vol. 343, no. 6167, pp. 193-196: Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells

  • check for differentially expressed genes between 8-cell and 16-cell stage embryos
  • use many methods, incl. SCDE, MAST, the SC3 package, Pagoda and Seurat
  • compare the results, trying to decide on the best DE method for the dataset
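One simple way to compare the results of several methods is to look at how much their lists of significant genes overlap. The sketch below assumes each method has already been run and reduced to a set of gene identifiers; the function names and the gene names in the example are made up, and the Jaccard index is just one possible overlap measure, not the one prescribed by the tutorial.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_methods(calls):
    """Compare DE calls, given as a dict of method name -> iterable of genes."""
    # Pairwise overlap between every two methods, keyed by sorted name pairs.
    pairwise = {
        (m1, m2): jaccard(calls[m1], calls[m2])
        for m1, m2 in combinations(sorted(calls), 2)
    }
    # Genes reported by every method.
    consensus = set.intersection(*(set(g) for g in calls.values())) if calls else set()
    return pairwise, consensus


# Hypothetical DE calls for three methods.
calls = {
    "SCDE": {"g1", "g2", "g3"},
    "MAST": {"g2", "g3", "g4"},
    "Seurat": {"g3", "g5"},
}
pairwise, consensus = compare_methods(calls)
print(pairwise[("MAST", "SCDE")])  # 0.5
print(consensus)                   # {'g3'}
```

A high overlap does not prove that the shared genes are truly differentially expressed, and a low overlap does not say which method is wrong, so the numbers are best read alongside the properties of the data and of each method.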

Finally

Thank you for your attention

Questions?

Enjoy the rest of the course

olga.dethlefsen@nbis.se
