Inferential Statistics Confidence Intervals and Hypothesis Testing.
Statistics and Quantitative Biologyqobweb.igc.gulbenkian.pt/courses/sqb2010/docs/SQB2010_4.pdf1...
Transcript of Statistics and Quantitative Biologyqobweb.igc.gulbenkian.pt/courses/sqb2010/docs/SQB2010_4.pdf1...
1
Thursday, September 30, 2010
Statistics and Quantitative Biology
Hypothesis testing
Statistical hypothesis testing in biological research
The scientific method
1. Observation and description of a phenomenon orgroup of phenomena.
2. Formulation of an hypothesis to explain thephenomena. Often the hypothesis takes theform of a causal relationship or mechanism.
3. Use of the hypothesis to predict the existenceof other phenomena, or to predict quantitativelythe results of new observations.
4. Performance of experimental tests of thepredictions by several independentexperimenters and properly performedexperiments.
2
Statistical hypothesis testing in research
Occam’s razor
Numquam ponenda est pluritas sine necessitate.
"Shave off" (omit) unnecessary entities in explanations
Choose the most parsimonious interpretation
Hypothesis Testing
Differences may and will occur simply because of theoperation of chance.
How can we determine in any given case whether theobserved differences between two samples are duemerely to chance or are caused by other factors?
Statistical inference procedures.
Hypothesis Testing
“Standard” hypothesis testing procedure
1. In face of a biological problem formulate a hypothesis anddesign an experiment with an expected yes-or-no result
2. Collect the data.3. Do a panel of statistical tests4. Find the one with the smallest p-value5. Choose a level of significance that best suites your favorite
interpretation of the data
6. In case of panic, call someone that you believeunderstands statistics
3
Examples of things that do not need statistics
Example of things that do need statistics
Hypothesis Testing
Inference procedure
1. State the null hypothesis (H0) and its alternative (H1).Decide what data to collect and under what conditions.
2. Choose a test, the model of which most closelyapproximates the conditions of the research in terms of theassumption on which the test is based.
3. Find the sampling distribution of the statistical test underthe assumption that H0 is true.
4. Specify a significance level (α) and a sample size (n).5. On the basis of 2, 3, and 4 above, define the region of
rejection for the statistical test.6. Collect the data. Using the data compute the value of the
test statistic. If the value is in the region of rejection, thedecision is to reject H0; otherwise, the decision is that H0cannot be rejected at the chosen level of significance.
4
Hypothesis Testing
The null hypothesis H0
It is formulated with the purpose of being rejected
It is the negation of the point one is trying to make
It is an hypothesis of “no effect”
Hypothesis Testing
The test statistic
How to choose a test statistic?
Hypothesis Testing
The level of significance
In advance of the data collection, we specify all possiblesamples that could occur when H0 is true.
From these we specify a subset of possible samples whichare so inconsistent with H0 (so extreme) that theprobability is very small, when H0 is true, that thesample we actually observe will be among them.
Then if we actually observe a sample which is included inthat subset, we reject H0 in favor of H1.
We reject H0 in favor of H1 if a statistical test yields avalue whose associated probability of occurrenceunder H0 is equal to or less than some smallprobability, usually denoted α, called the level ofsignificance.
5
Hypothesis Testing
Find the sampling distribution of the test statisticunder the null hypothesis H0
Find all possible samples under the null hypothesis H0
Compute the test statistic for each sample
Find the probability “associated with” each value ofthe test statistic
Hypothesis Testing
The “infamous” p-value
p is the probability “associated with” a specifiedvalue of the test statistics
It is the probability of drawing a sample with a teststatistic value at least as extreme as the specifiedvalue
Hypothesis Testing Example
Does a recessive mutation pem lead to early developmental problems ?
Does the mutation pem in homozygotic individuals increase the mortality inthe embryo?
A suitable null hypothesis: pem shows Mendelian segregation
H0: πpem =1/4 versus H1: πpem<1/4
6
CC cc
Gametes C c
Crosspolination
Cc
C cGametes
C c
C
c
1/2 1/2
1/2
1/2
CC Cc
Cc cc1/4 1/4
1/4 1/4
c
C1/1
1/1Cc1/1Cc1/1
CC Cc
Cc cc1/4 1/4
1/4 1/4
Crosspolination
CC Cc Cc cc
Offspring
wt/pem wt/pem
X
€
34⋅34⋅34
€
14⋅34⋅34
€
34⋅14⋅14
€
14⋅14⋅14
€
P(1) = 3 ⋅ 14⋅34⋅34
€
P(0) =34⋅34⋅34
€
P(2) = 3 ⋅ 34⋅14⋅14
€
P(3) =1⋅ 14⋅14⋅14
7
P(n)
n
What about litter size 5 ?
€
P(n) =5n
⋅14
n
⋅34
5−n
€
P(0) =1⋅ 14
0
⋅34
5
€
P(1) = 5 ⋅ 14
1
⋅34
4
€
P(2) =10 ⋅ 14
2
⋅34
3
€
P(3) =10 ⋅ 14
3
⋅34
2
€
P(4) = 5 ⋅ 14
4
⋅34
1
€
P(5) =1⋅ 14
5
⋅34
0
P(n)
n
8
What about litter size 9 ?
€
P(n) = Pπ (n |N) =Nn
⋅ π
n ⋅ (1−π )N−n
=N!
n!(N − n)!⋅ π n ⋅ (1−π )N−n
Binomial Distribution
0 1 2 3 4 5 6 7 8 9 1011 1213 141516 1718 1920
0.05
0.1
0.15
0.2
0.250 1 2 3 4 5 6 7 8 9 1011 1213 141516 1718 1920
P(n)
n Ν=20
α
9
0 1 2 3 4 5 6 7 8 9 1011 1213 141516 1718 1920
0.05
0.1
0.15
0.2
0.250 1 2 3 4 5 6 7 8 9 1011 1213 141516 1718 1920
P(n)
nRegion of rejection
Collect data
If n (the number of pem homozygous) is equalor smaller to 1 then reject the null hypothesis
that pem has Mendelian segregation
Otherwise, there is no evidence for non-Mendelian segretion
Hypothesis Testing
Types of errors
Type I error involves rejecting the hypothesis H0 when itis, in fact, true.
Type II error involves failing to the reject the nullhypothesis when, in fact, it is false.
Note that a gives the probability of mistakenly or falselyrejecting H0.
The probability of errors of type II, denoted b, is moredifficult to specify and one generally specifies N thesample size and a, from which b can be derived.
10
Hypothesis Testing
1-ββH1 True
α1-αH0 True
Reject H0Accept H0
Error Probabilities
Hypothesis Testing
One-tailed and two-tailed
Importance of Hypothesis H1
Hypothesis Testing
Inference procedure
1. State the null hypothesis (H0) and its alternative (H1).Decide what data to collect and under what conditions.
2. Choose a test, the model of which most closelyapproximates the conditions of the research in terms of theassumption on which the test is based.
3. Find the sampling distribution of the statistical test underthe assumption that H0 is true.
4. Specify a significance level (a) and a sample size (N).5. On the basis of 2, 3, and 4 above, define the region of
rejection for the statistical test.6. Collect the data. Using the data compute the value of the
test statistic. If the value is in the region of rejection, thedecision is to reject H0; otherwise, the decision is that H0cannot be rejected at the chosen level of significance.
11
Statistical test with their assumptions, samplingdistributions, and properties have been established for manycommon experimental conditions which are now available foryou to uses
Student’s t-tests
Fisher’s exact test
ANOVA
Chi2 test
Pearson’s correlation coefficient
Wilcoxon-Mann-Whitney test
Spearman correlation coefficient