False Discovery Rate (FDR) = proportion of false positive results out of all positive results (positive result =
statistically significant result)
Ladislav Pecen
Outline
Introduction
Familywise Error Rate
False Discovery Rate
Positive False Discovery Rate
Comparison of approaches
Example – microarray study
Estimate of FDR and pFDR
Bayes approach to FDR and pFDR
Background
The aim is to find an approach to the problem of multiple testing
Problems that occur with multiple testing:
each single test of one null hypothesis has probability of type I error equal to α
when several tests are computed, each with probability of type I error equal to α, the probability of an overall type I error increases
this leads to a high number of false positive results
in the worst case the probabilities of type I error approximately add up; for independent tests the probability of at least one type I error is 1 − (1 − α)^k ≈ k·α for small α
assume 100 independent tests, each with probability of type I error equal to α = 0.05: in about 5% of the tests (= 5 tests) you will make a type I error, i.e. you will reject the null hypothesis although it is true
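The inflation described above is easy to check numerically; a minimal sketch, assuming independent tests at a common level α:

```python
# Probability of at least one type I error among k independent tests,
# each performed at significance level alpha (all nulls assumed true).
def prob_at_least_one_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

for k in (1, 10, 100):
    print(k, round(prob_at_least_one_false_positive(k), 4))
# k = 100 gives roughly 0.994: almost surely at least one false positive
```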
Typical areas where you can meet multiple testing: microarray studies, genetics, and lab testing in general
Theory of hypothesis testing
We test the null hypothesis H0 against the alternative hypothesis H1
One of the following four possibilities will happen
Usually we place restrictions on the two errors:
the probability of type I error, i.e. the false positive rate, should be limited by α:
P(rejection of H0 | H0 is true) = P(type I error) ≤ α
the power of the test should not be lower than 1 − β:
P(rejection of H0 | H0 is not true) = 1 − P(acceptance of H0 | H0 is not true)
= 1 − P(type II error) ≥ 1 − β
Testing result

True hypothesis        | reject null hypothesis | accept null hypothesis
null hypothesis        | type I error           | correct decision
alternative hypothesis | correct decision       | type II error
Multiple hypotheses testing
Testing of multiple hypotheses
Also here we place restrictions on the false results; these can be similar to the single-test case:
control the false positive rate: (# of incorrectly rejected H0)/(# of true H0) = S/m0
this is the classical approach, also called the "familywise error rate" (FWER) approach
or, from a different point of view, control the false discovery rate: (# of incorrectly rejected H0)/(# of rejected H0) = S/R0
Connected characteristics:
sensitivity – proportion of correctly identified DE genes: U/m1
specificity – proportion of correctly identified non-DE genes: T/m0
Testing result

True                      | # of rejected H0 | # of accepted H0 | Total
# of true null hypotheses | S                | T                | m0
# of true alternatives    | U                | W                | m1 = m − m0
Total                     | R0               | R1 = m − R0      | m
Example
Example from genetics, microarray studies: 10 000 genes examined, searching for differentially expressed (DE) genes
type I error rate (false positive rate) = 475 / 9 500 = 5%
type II error rate (false negative rate) = 100 / 500 = 20%
sensitivity of the test (power) = 400 / 500 = 80%
false discovery rate = 475 / 875 ≈ 54% – more than half of the discovered DE genes are false discoveries
false non-discovery rate = 100 / 9 125 ≈ 1%
Results of testing

Reality            | Determined as DE genes | Determined as non-DE genes | Total
Truly DE genes     | 400                    | 100                        | 500
Truly non-DE genes | 475                    | 9 025                      | 9 500
Total              | 875                    | 9 125                      | 10 000
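The rates quoted above follow directly from this table; a small sketch recomputing them, with the cells named after the notation table (S, T, U, W):

```python
# Counts from the 10 000-gene example: rows = reality, columns = decision.
U, W = 400, 100      # truly DE genes: detected / missed
S, T = 475, 9025     # truly non-DE genes: falsely detected / correctly cleared
m0, m1 = S + T, U + W    # 9 500 true nulls, 500 true alternatives
R0, R1 = U + S, W + T    # 875 positives, 9 125 negatives

type_I_rate  = S / m0    # 475 / 9 500 = 5%
type_II_rate = W / m1    # 100 / 500  = 20%
sensitivity  = U / m1    # 400 / 500  = 80%
fdr          = S / R0    # 475 / 875  ~ 54%
fnr          = W / R1    # 100 / 9 125 ~ 1%
print(round(fdr, 3), round(fnr, 3))
```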
Common approach
Control of the familywise error rate (FWER): the probability of making one or more false discoveries (type I errors) among all tests done
assume we perform k independent tests, each at significance level α = 0.05, i.e. in each test we have a 5% probability of a false positive decision
using Bonferroni's inequality, we can bound the overall probability of type I error, i.e. the probability of making at least one false positive decision, by k · α = k · 0.05
already for 10 tests the upper bound on the probability of at least one false positive result is 50%
to keep the overall significance level controlled (e.g. equal to 5%), one has to decrease the significance level of each particular test
to be sure that the overall significance level is α = 0.05, each test has to be performed at significance level α = 0.05 / k
for 10 tests, each has to have significance level 0.005; for a thousand tests, 0.00005
Such an approach leads to highly conservative results: for thousands of tests it is very difficult to prove a truly positive result
the more tests, the lower the power
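The Bonferroni correction translates into a one-line rejection rule; a minimal sketch (the p-values are made up for illustration):

```python
# Bonferroni: reject H0_i only if p_i <= alpha / k, which bounds the
# probability of at least one false positive (the FWER) by alpha.
def bonferroni_reject(pvalues, alpha=0.05):
    k = len(pvalues)
    return [p <= alpha / k for p in pvalues]

pvals = [0.001, 0.004, 0.012, 0.03, 0.20]
print(bonferroni_reject(pvals))  # per-test threshold 0.05/5 = 0.01
```

With five tests the per-test threshold drops to 0.01, so only the first two hypotheses are rejected even though three p-values are below the nominal 0.05.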
False Discovery Rate
Definition
False Discovery Rate (FDR) = proportion of false positive results out of
all positive (= statistically significant) results
Advantages:
if a null hypothesis is rejected, we know the probability that it is correctly rejected
FDR = 5% means that out of 100 positive tests circa 5 are false positives and the remaining 95 are truly positive
usually it is more powerful than the traditional FWER approach, convenient especially when testing a large number of null hypotheses
Disadvantages:
it does not keep the probability of at least one wrongly rejected null hypothesis below α
one has to take care of the situation when the number of rejected H0 is zero
Factors determining the False Discovery Rate (FDR):
proportion of truly DE genes: m1/m
distribution of the true differences
variability
sample size
False Discovery Rate
The FDR as defined above works nicely when at least one null hypothesis is rejected, i.e. when R0 > 0
In the case when P(R0 = 0) > 0, three possibilities are available:
FDR1 = E(S / R0 | R0 > 0) · P(R0 > 0)
FDR2 = E(S / R0 | R0 > 0)
FDR3 = E(S) / E(R0)
The second and third alternatives are equal to 1 if m0 = m
FDR2 and FDR3 cannot be controlled (limited by α) whenever m0 = m; hence Benjamini and Hochberg decided to work with FDR1, which in the following is called the False Discovery Rate (FDR)
when controlling FDR1 at α, we control FDR2 = E(S/R0 | R0 > 0) only at α / P(R0 > 0); hence Storey decided to work with FDR2
the fact that FDR2 = 1 when m0 = m is not a problem, since this result is natural
in the following, FDR2 is called the positive False Discovery Rate (pFDR)
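The relations between the three variants can be seen in a quick Monte Carlo sketch; it assumes the extreme case m0 = m (all null hypotheses true) with uncorrected testing at α = 0.05, so every rejection is false:

```python
import random

random.seed(1)
m, alpha, reps = 20, 0.05, 20000
cond_ratios, S_sum, R_sum, hits = [], 0, 0, 0

for _ in range(reps):
    pvals = [random.random() for _ in range(m)]   # all m nulls are true
    R0 = sum(p <= alpha for p in pvals)           # number of rejections
    S = R0                                        # every rejection is false
    S_sum += S
    R_sum += R0
    if R0 > 0:
        hits += 1
        cond_ratios.append(S / R0)

fdr2 = sum(cond_ratios) / len(cond_ratios)  # E(S/R0 | R0 > 0): exactly 1 here
fdr1 = fdr2 * hits / reps                   # FDR1 = FDR2 * P(R0 > 0)
fdr3 = S_sum / R_sum                        # E(S)/E(R0): exactly 1 here
print(round(fdr1, 2), fdr2, fdr3)
```

With m0 = m both FDR2 and FDR3 come out as exactly 1, while FDR1 equals P(R0 > 0) ≈ 1 − 0.95^20 ≈ 0.64, so FDR1 can still be pushed below a prescribed α.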
Benjamini-Hochberg - FDR
Procedure controlling the false discovery rate:
consider testing of m null hypotheses H1, H2, H3, ..., Hm
order the respective p-values such that p(1) ≤ p(2) ≤ p(3) ≤ ... ≤ p(m) and denote the null hypothesis corresponding to p(i) as H(i)
choose k* as the largest k for which p(k) ≤ α · k / m, i.e.
k* = argmax{k: p(k) ≤ α · k / m; 1 ≤ k ≤ m}
then reject all hypotheses H(i) for which p(i) ≤ p(k*)
Properties: for independent test statistics and any configuration of false null hypotheses, the procedure controls the FDR at level α:
E(S/R0 | R0 > 0) · P(R0 > 0) = E(falsely rejected / number of rejected) ≤ α · m0 / m ≤ α
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300.
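The step-up procedure translates directly into code; a sketch, with made-up p-values for illustration:

```python
# Benjamini-Hochberg step-up procedure: find the largest k with
# p_(k) <= alpha * k / m and reject all hypotheses with p <= p_(k).
def benjamini_hochberg(pvalues, alpha=0.05):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_star = rank          # keep the largest qualifying rank
    return sorted(order[:k_star])  # indices of rejected hypotheses

pvals = [0.001, 0.012, 0.014, 0.019, 0.03, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))   # [0, 1, 2, 3]
```

Note the step-up character: p(2) = 0.012 exceeds its own threshold 0.010, yet H(2) is still rejected because a later ordered p-value qualifies (p(4) = 0.019 ≤ 0.020).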
Positive False Discovery Rate
Definition: the false discovery rate given that at least one test has a positive result,
i.e. the proportion of false positive results among all positive results, given that at least one positive result occurs:
pFDR = E(S / R0 | R0 > 0)
Additional characteristics can be defined:
q-value: a natural pFDR counterpart to the common p-value
the p-value is the probability of a test statistic greater than or equal to the observed value given the null hypothesis: P(T ≥ t | H0)
the q-value is a Bayesian analogue of the p-value: it is the posterior probability of the null hypothesis given that the test statistic is greater than or equal to the observed value: P(H0 | T ≥ t)
for more hypotheses: q-value(t) = inf {pFDR(Γ_α); t ∈ Γ_α}
Positive False Discovery Rate
for more hypotheses: q-value(t) = inf {pFDR(Γ_α); t ∈ Γ_α}
the minimum pFDR that can occur when rejecting a statistic with value t
the minimum posterior probability of the null hypothesis over all significance regions containing the statistic
the q-value minimizes the ratio of the type I error to the power over all significance regions that contain the statistic
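For p-value based tests the q-values can be estimated as the BH-adjusted p-values; a sketch, assuming the conservative choice π0 = 1 (Storey's estimator would additionally multiply by an estimate of π0):

```python
# q-value of the k-th ordered p-value: min over j >= k of p_(j) * m / j.
# Stepping down from the largest p-value carries the running minimum.
def q_values(pvalues):
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

pvals = [0.001, 0.012, 0.014, 0.019, 0.03]
print([round(q, 4) for q in q_values(pvals)])
```

Rejecting all hypotheses with q-value ≤ α then reproduces the Benjamini-Hochberg rejection set at level α.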
pFNR (positive False Non-discovery Rate): a natural counterpart to pFDR – the rate of false negatives among all negative results
first we define the False Non-discovery Rate (FNR):
FNR = E(W / (m − R0) | (m − R0) > 0) · P((m − R0) > 0)
the positive False Non-discovery Rate (pFNR) is defined as
pFNR = E(W / (m − R0) | (m − R0) > 0)
Comparison of FWER, FDR and pFDR
FWER – control of the multiple error rate: we fix the error rate and estimate the rejection area
FDR – control of false positives among all positives: we fix the rejection area and estimate the error rate
Interpretation of FDR and pFDR:
FDR – the rate at which false discoveries occur
pFDR – the rate at which discoveries are false
when all the null hypotheses are true (i.e. when all genes are non-DE and m0 = m), the pFDR is always equal to 1 (and hence cannot be controlled at some prespecified level α)
when controlling FDR at level α, and positive findings have occurred, then FDR has really only been controlled at level α / P(R0 > 0)
Comparison of FWER, FDR and pFDR
Two approaches to the false discovery rate:
fix the acceptable rate α and estimate the corresponding significance threshold – available only when using FDR, since pFDR cannot be controlled
fix the significance threshold and estimate the corresponding rate α – both FDR and pFDR can be used; the latter leads to "stronger" results
Example
Back to genetics and the microarray study: the problems come from the high percentage of truly non-DE genes; lowering the significance level also decreases the FDR
Assume the following situation:
evaluate circa 10 000 genes to decide whether they are DE or non-DE
the genes are independent or only slightly dependent
for each gene, compare two independent groups with equal variance, n arrays per group
use the standard t-statistic with pooled variance
Results of testing

Reality            | Determined as DE genes | Determined as non-DE genes | Total
Truly DE genes     | 400                    | 100                        | 500
Truly non-DE genes | 475                    | 9 025                      | 9 500
Total              | 875                    | 9 125                      | 10 000
Example
denote by α the significance level of each single test, not the overall significance level of all multiple tests together
Any formal statistical testing procedure:
compute the relevant test statistic for each gene
sort the statistics (or p-values)
determine the cut-off point dividing the genes into DE and non-DE
In such a situation it makes sense to care about:
FDR – how many of the rejected null hypotheses are rejected wrongly
FNR – the proportion of true alternatives missed by the test
Example
Genetics, microarray studies, particular situations: the FDR varies depending on the sample size per group (n), the significance level (α) and the proportion of true null hypotheses (π0 = m0 / m)
n = 5 microarrays per group:
at significance level α = 5% we get sensitivity of the test about 35%; FDR at π0 = 0.9 is greater than 60% and at π0 = 0.995 it is around 95%
at sensitivity equal to 80% we get a significance level around 0.45; FDR at π0 = 0.9 is around 82% and at π0 = 0.99 it is almost 99%
hence, n = 5 leads to an underpowered study
Example
n = 20 microarrays per group: at significance level α = 5% we get sensitivity of the test around 90%; FDR at π0 = 0.9 is around 35% and at π0 = 0.99 it is more than 80%
n = 30 microarrays per group: at significance level α = 5% the results are still poor; at significance level α = 0.4% we get sensitivity of the test around 80%; FDR at π0 = 0.9 is slightly above 20% and at π0 = 0.99 it is around 72%
for any n, the best results achievable at significance level α = 5% are: the minimal FDR at π0 = 0.9 is 18% and at π0 = 0.99 it is 71%
Example
Genetics, microarray study: the FDR enables another approach to sample size estimation
the sample size depends on the number and distribution of truly DE genes and on the tolerated value of the FDR
if n = 5, we have to have about 80% of truly DE genes (i.e. π0 = 0.2) to obtain a reasonable FDR
another possibility is to hope for large differences; in both cases we use a really small significance level
if π0 = 0.9 and we desire FDR less than 10%, then we should have at least n = 30 observations per group
if π0 = 0.99 and we classify the top 1% of genes as DE, we need a sample size of n = 45 to obtain FDR around 10% (sample size n = 35 is necessary for FDR less than 20%)
if we can estimate π0, then it makes sense to reject the top (1 − π0) · 100% of hypotheses, since then FDR = FNR = 1 − sensitivity
we control both these statistics together
Typical examples. Situations where controlling the FWER (i.e. the probability of one false rejection of H0) is not needed and controlling the FDR is meaningful:
multiple endpoints: the typical goal is to recommend a new test or treatment over the standard one; the aim is to find as many endpoints as possible in which the new treatment can exceed the standard one; the limit on false positive results is not so strict, but too many false discoveries are also bad
multiple separate decisions without an overall decision: multiple-subgroup problems, where two treatments are compared in various subgroups of patients; we want to find as many subgroups with potentially different reactions to the two treatments as possible, but we want to control the rate of false discoveries
screening of multiple potential effects: multiple potential effects are screened to weed out the null effects (screening of various chemicals, screening in potential drug development); again, we want as many discoveries as possible, while controlling the FDR
Thank you
Sometimes I would like to exchange all my knowledge for a bottle of Whisky