False Discovery Rate (FDR) = proportion of false positive results out of all positive results (positive result =
statistically significant result)
Ladislav Pecen
Outline
Introduction
Familywise Error Rate
False Discovery Rate
Positive False Discovery Rate
Comparison of approaches
Example – microarray study
Estimate of FDR and pFDR
Bayes approach to FDR and pFDR
Background
The aim is to find an approach to the problem of multiple testing
Problems that occur with multiple testing:
each single test of one null hypothesis has probability of type I error equal to α
when several tests are computed, each with probability of type I error equal to α, the probability of an overall type I error increases
this leads to a high number of false positive results
in the worst case the probabilities of type I error approximately add up; for independent tests the probability of at least one type I error is 1 − (1 − α)^k ≈ k·α for small α
assume 100 independent tests, each with probability of type I error equal to α = 0.05: in about 5% of the tests (= 5 tests) you will make a type I error, i.e. you will reject the null hypothesis although it is true
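The inflation described above is easy to check numerically; a minimal sketch, assuming independent tests at a common level α:

```python
# Probability of at least one type I error among k independent tests,
# each performed at significance level alpha (all nulls assumed true).
def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 10, 100):
    print(k, round(prob_at_least_one_false_positive(k), 4))
# k = 100 gives roughly 0.994: almost surely at least one false positive
```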
Typical areas where you can meet multiple testing: microarray studies, genetics, and lab testing in general
Theory of hypothesis testing
We test the null hypothesis H0 against the alternative hypothesis H1
One of the following four possibilities will happen
Usually we place restrictions on the two errors:
the probability of type I error, i.e. the false positive rate, should be limited by α:
P(rejection of H0 | H0 is true) = P(type I error) ≤ α
the power of the test should not be lower than 1 − β:
P(rejection of H0 | H0 is not true) = 1 − P(acceptance of H0 | H0 is not true)
= 1 − P(type II error) ≥ 1 − β
Testing result

True hypothesis        | reject null hypothesis | accept null hypothesis
null hypothesis        | type I error           | correct decision
alternative hypothesis | correct decision       | type II error
Multiple hypotheses testing
Testing of multiple hypotheses
Also here we place restrictions on the false results; these can be similar to the single-test case:
control the false positive rate: (# of incorrectly rejected H0)/(# of true H0) = S/m0
this is the classical approach, also called the "familywise error rate" (FWER) approach
or, from a different point of view, control the false discovery rate: (# of incorrectly rejected H0)/(# of rejected H0) = S/R0
Connected characteristics:
sensitivity – proportion of correctly identified DE genes: U/m1
specificity – proportion of correctly identified non-DE genes: T/m0
Testing result

True                      | # of rejected H0 | # of accepted H0 | Total
# of true null hypotheses | S                | T                | m0
# of true alternatives    | U                | W                | m1 = m − m0
Total                     | R0               | R1 = m − R0      | m
Example
Example from genetics, microarray studies: 10 000 genes examined, searching for differentially expressed (DE) genes
type I error rate (false positive rate) = 475 / 9 500 = 5%
type II error rate (false negative rate) = 100 / 500 = 20%
sensitivity of the test (power) = 400 / 500 = 80%
false discovery rate = 475 / 875 ≈ 54% – more than half of the discovered DE genes are false discoveries
false non-discovery rate = 100 / 9 125 ≈ 1%
Results of testing

Reality            | Determined as DE genes | Determined as non-DE genes | Total
Truly DE genes     | 400                    | 100                        | 500
Truly non-DE genes | 475                    | 9 025                      | 9 500
Total              | 875                    | 9 125                      | 10 000
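The rates quoted above follow directly from this table; a small sketch recomputing them, with the cells named after the notation table (S, T, U, W):

```python
# Counts from the 10 000-gene example: rows = reality, columns = decision.
U, W = 400, 100      # truly DE genes: detected / missed
S, T = 475, 9025     # truly non-DE genes: falsely detected / correctly cleared
m0, m1 = S + T, U + W    # 9 500 true nulls, 500 true alternatives
R0, R1 = U + S, W + T    # 875 positives, 9 125 negatives

type_I_rate  = S / m0    # 475 / 9 500 = 5%
type_II_rate = W / m1    # 100 / 500  = 20%
sensitivity  = U / m1    # 400 / 500  = 80%
fdr          = S / R0    # 475 / 875  ~ 54%
fnr          = W / R1    # 100 / 9 125 ~ 1%
print(round(fdr, 3), round(fnr, 3))
```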
Common approach
Control of the familywise error rate (FWER): the probability of making one or more false discoveries (type I errors) among all tests done
assume we perform k independent tests, each at significance level α = 0.05, i.e. in each test we have a 5% probability of a false positive decision
using Bonferroni's inequality, we can bound the overall probability of type I error, i.e. the probability of making at least one false positive decision, by k · α = k · 0.05
already for 10 tests the upper bound on the probability of at least one false positive result is 50%
to keep the overall significance level controlled (e.g. equal to 5%), one has to decrease the significance level of each particular test
to be sure that the overall significance level is α = 0.05, each test has to be performed at significance level α = 0.05 / k
for 10 tests, each has to have significance level 0.005; for a thousand tests, 0.00005
Such an approach leads to highly conservative results: for thousands of tests it is very difficult to prove a truly positive result
the more tests, the lower the power
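The Bonferroni correction translates into a one-line rejection rule; a minimal sketch (the p-values are made up for illustration):

```python
# Bonferroni: reject H0_i only if p_i <= alpha / k, which bounds the
# probability of at least one false positive (the FWER) by alpha.
def bonferroni_reject(pvalues, alpha=0.05):
    k = len(pvalues)
    return [p <= alpha / k for p in pvalues]

pvals = [0.001, 0.004, 0.012, 0.03, 0.20]
print(bonferroni_reject(pvals))  # per-test threshold 0.05/5 = 0.01
```

With five tests the per-test threshold drops to 0.01, so only the first two hypotheses are rejected even though three p-values are below the nominal 0.05.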
False Discovery Rate
Definition
False Discovery Rate (FDR) = proportion of false positive results out of
all positive (= statistically significant) results
Advantages:
if a null hypothesis is rejected, we know the probability that it is correctly rejected
FDR = 5% means that out of 100 positive tests circa 5 are false positives and the remaining 95 are truly positive
usually it is more powerful than the traditional FWER approach, convenient especially when testing a large number of null hypotheses
Disadvantages:
it does not keep the probability of at least one wrongly rejected null hypothesis below α
one has to take care of the situation when the number of rejected H0 is zero
Factors determining the False Discovery Rate (FDR):
proportion of truly DE genes: m1/m
distribution of the true differences
variability
sample size
False Discovery Rate
The FDR as defined above works nicely when at least one null hypothesis is rejected, i.e. when R0 > 0
In the case when P(R0 = 0) > 0, three possibilities are available:
FDR1 = E(S / R0 | R0 > 0) · P(R0 > 0)
FDR2 = E(S / R0 | R0 > 0)
FDR3 = E(S) / E(R0)
The second and third alternatives are equal to 1 if m0 = m
FDR2 and FDR3 cannot be controlled (limited by α) whenever m0 = m; hence Benjamini and Hochberg decided to work with FDR1, which in the following is called the False Discovery Rate (FDR)
when controlling FDR1 at α, we control FDR2 = E(S/R0 | R0 > 0) only at α / P(R0 > 0); hence Storey decided to work with FDR2
the fact that FDR2 = 1 when m0 = m is not a problem, since this result is natural
in the following, FDR2 is called the positive False Discovery Rate (pFDR)
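The relations between the three variants can be seen in a quick Monte Carlo sketch; it assumes the extreme case m0 = m (all null hypotheses true) with uncorrected testing at α = 0.05, so every rejection is false:

```python
import random

random.seed(1)
m, alpha, reps = 20, 0.05, 20000
cond_ratios, S_sum, R_sum, hits = [], 0, 0, 0

for _ in range(reps):
    pvals = [random.random() for _ in range(m)]   # all m nulls are true
    R0 = sum(p <= alpha for p in pvals)           # number of rejections
    S = R0                                        # every rejection is false
    S_sum += S
    R_sum += R0
    if R0 > 0:
        hits += 1
        cond_ratios.append(S / R0)

fdr2 = sum(cond_ratios) / len(cond_ratios)  # E(S/R0 | R0 > 0): exactly 1 here
fdr1 = fdr2 * hits / reps                   # FDR1 = FDR2 * P(R0 > 0)
fdr3 = S_sum / R_sum                        # E(S)/E(R0): exactly 1 here
print(round(fdr1, 2), fdr2, fdr3)
```

With m0 = m both FDR2 and FDR3 come out as exactly 1, while FDR1 equals P(R0 > 0) ≈ 1 − 0.95^20 ≈ 0.64, so FDR1 can still be pushed below a prescribed α.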
Benjamini-Hochberg - FDR
Procedure controlling the false discovery rate:
consider testing of m null hypotheses H1, H2, H3, ..., Hm
order the respective p-values such that p(1) ≤ p(2) ≤ p(3) ≤ ... ≤ p(m) and denote the null hypothesis corresponding to p(i) as H(i)
choose k* as the largest k for which p(k) ≤ α · k / m, i.e.
k* = argmax{k: p(k) ≤ α · k / m; 1 ≤ k ≤ m}
then reject all hypotheses H(i) for which p(i) ≤ p(k*)
Properties: for independent test statistics and any configuration of false null hypotheses, the procedure controls the FDR at level α:
E(S/R0 | R0 > 0) · P(R0 > 0) = E(falsely rejected / number of rejected) ≤ α · m0 / m ≤ α
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300.
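The step-up procedure translates directly into code; a sketch, with made-up p-values for illustration:

```python
# Benjamini-Hochberg step-up procedure: find the largest k with
# p_(k) <= alpha * k / m and reject all hypotheses with p <= p_(k).
def benjamini_hochberg(pvalues, alpha=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_star = rank          # keep the largest qualifying rank
    return sorted(order[:k_star])  # indices of rejected hypotheses

pvals = [0.001, 0.012, 0.014, 0.019, 0.03, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))   # [0, 1, 2, 3]
```

Note the step-up character: p(2) = 0.012 exceeds its own threshold 0.010, yet H(2) is still rejected because a later ordered p-value qualifies (p(4) = 0.019 ≤ 0.020).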
Positive False Discovery Rate
Definition: the false discovery rate given that at least one test has a positive result,
i.e. the proportion of false positive results among all positive results, given that at least one positive result occurs:
pFDR = E(S / R0 | R0 > 0)
Additional characteristics can be defined:
q-value: a natural pFDR counterpart to the common p-value
the p-value is the probability of a test statistic greater than or equal to the observed value given the null hypothesis: P(T ≥ t | H0)
the q-value is a Bayesian analogue of the p-value: it is the posterior probability of the null hypothesis given that the test statistic is greater than or equal to the observed value: P(H0 | T ≥ t)
for more hypotheses: q-value(t) = inf {pFDR(Γ_α); t ∈ Γ_α}
Positive False Discovery Rate
for more hypotheses: q-value(t) = inf {pFDR(Γ_α); t ∈ Γ_α}
the minimum pFDR that can occur when rejecting a statistic with value t
the minimum posterior probability of the null hypothesis over all significance regions containing the statistic
the q-value minimizes the ratio of the type I error to the power over all significance regions that contain the statistic
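For p-value based tests the q-values can be estimated as the BH-adjusted p-values; a sketch, assuming the conservative choice π0 = 1 (Storey's estimator would additionally multiply by an estimate of π0):

```python
# q-value of the k-th ordered p-value: min over j >= k of p_(j) * m / j.
# Stepping down from the largest p-value carries the running minimum.
def q_values(pvalues):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

pvals = [0.001, 0.012, 0.014, 0.019, 0.03]
print([round(q, 4) for q in q_values(pvals)])
```

Rejecting all hypotheses with q-value ≤ α then reproduces the Benjamini-Hochberg rejection set at level α.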
pFNR (positive False Non-discovery Rate): a natural counterpart to pFDR – the rate of false negatives among all negative results
first we define the False Non-discovery Rate (FNR):
FNR = E(W / (m − R0) | (m − R0) > 0) · P((m − R0) > 0)
the positive False Non-discovery Rate (pFNR) is defined as
pFNR = E(W / (m − R0) | (m − R0) > 0)
Comparison of FWER, FDR and pFDR
FWER – control of the multiple error rate: we fix the error rate and estimate the rejection area
FDR – control of false positives among all positives: we fix the rejection area and estimate the error rate
Interpretation of FDR and pFDR:
FDR – the rate at which false discoveries occur
pFDR – the rate at which discoveries are false
when all the null hypotheses are true (i.e. when all genes are non-DE and m0 = m), the pFDR is always equal to 1 (and hence cannot be controlled at some prespecified level α)
when controlling FDR at level α, and positive findings have occurred, then FDR has really only been controlled at level α / P(R0 > 0)
Comparison of FWER, FDR and pFDR
Two approaches to the false discovery rate:
fix the acceptable rate α and estimate the corresponding significance threshold – available only when using FDR, since pFDR cannot be controlled
fix the significance threshold and estimate the corresponding rate α – both FDR and pFDR can be used; the latter leads to "stronger" results
Example
Back to genetics and the microarray study: the problems come from the high percentage of truly non-DE genes; lowering the significance level also decreases the FDR
Assume the following situation:
evaluate circa 10 000 genes to decide whether they are DE or non-DE
the genes are independent or only slightly dependent
for each gene, compare two independent groups with equal variance, n arrays per group
use the standard t-statistic with pooled variance
Results of testing

Reality            | Determined as DE genes | Determined as non-DE genes | Total
Truly DE genes     | 400                    | 100                        | 500
Truly non-DE genes | 475                    | 9 025                      | 9 500
Total              | 875                    | 9 125                      | 10 000
Example
denote by α the significance level of each single test, not the overall significance level of all multiple tests together
Any formal statistical testing procedure:
compute the relevant test statistic for each gene
sort the statistics (or p-values)
determine the cut-off point dividing the genes into DE and non-DE
In such a situation it makes sense to care about:
FDR – how many of the rejected null hypotheses are rejected wrongly
FNR – the proportion of true alternatives missed by the test
Example
Genetics, microarray studies, particular situations: the FDR varies depending on the sample size per group (n), the significance level (α) and the proportion of true null hypotheses (π0 = m0 / m)
n = 5 microarrays per group:
at significance level α = 5% we get sensitivity of the test about 35%; FDR at π0 = 0.9 is greater than 60% and at π0 = 0.995 it is around 95%
at sensitivity equal to 80% we get a significance level around 0.45; FDR at π0 = 0.9 is around 82% and at π0 = 0.99 it is almost 99%
hence, n = 5 leads to an underpowered study
Example
n = 20 microarrays per group: at significance level α = 5% we get sensitivity of the test around 90%; FDR at π0 = 0.9 is around 35% and at π0 = 0.99 it is more than 80%
n = 30 microarrays per group: at significance level α = 5% the results are still poor; at significance level α = 0.4% we get sensitivity of the test around 80%; FDR at π0 = 0.9 is slightly above 20% and at π0 = 0.99 it is around 72%
for any n, the best results achievable at significance level α = 5% are: the minimal FDR at π0 = 0.9 is 18% and at π0 = 0.99 it is 71%
Example
Genetics, microarray study: the FDR enables another approach to sample size estimation
the sample size depends on the number and distribution of truly DE genes and on the tolerated value of the FDR
if n = 5, we have to have about 80% of truly DE genes (i.e. π0 = 0.2) to obtain a reasonable FDR
another possibility is to hope for large differences; in both cases we use a really small significance level
if π0 = 0.9 and we desire FDR less than 10%, then we should have at least n = 30 observations per group
if π0 = 0.99 and we classify the top 1% of genes as DE, we need a sample size of n = 45 to obtain FDR around 10% (sample size n = 35 is necessary for FDR less than 20%)
if we can estimate π0, then it makes sense to reject the top (1 − π0) · 100% of hypotheses, since then FDR = FNR = 1 − sensitivity
we control both these statistics together
Typical examples. Situations where controlling the FWER (i.e. the probability of one false rejection of H0) is not needed and controlling the FDR is meaningful:
multiple endpoints: the typical goal is to recommend a new test or treatment over the standard one; the aim is to find as many endpoints as possible in which the new treatment can exceed the standard one; the limit on false positive results is not so strict, but too many false discoveries are also bad
multiple separate decisions without an overall decision: multiple-subgroup problems, where two treatments are compared in various subgroups of patients; we want to find as many subgroups with potentially different reactions to the two treatments as possible, but we want to control the rate of false discoveries
screening of multiple potential effects: multiple potential effects are screened to weed out the null effects (screening of various chemicals, screening in potential drug development); again, we want as many discoveries as possible, while controlling the FDR
Thank you
Sometimes I would like to exchange all my knowledge for a bottle of Whisky