Statistics and Quantitative Biology qobweb.igc.gulbenkian.pt/courses/sqb2010/docs/SQB2010_4.pdf1...


Thursday, September 30, 2010

Statistics and Quantitative Biology

Hypothesis testing

Statistical hypothesis testing in biological research

The scientific method

1. Observation and description of a phenomenon or group of phenomena.

2. Formulation of a hypothesis to explain the phenomena. Often the hypothesis takes the form of a causal relationship or mechanism.

3. Use of the hypothesis to predict the existence of other phenomena, or to predict quantitatively the results of new observations.

4. Performance of experimental tests of the predictions by several independent experimenters and properly performed experiments.


Statistical hypothesis testing in research

Occam’s razor

Numquam ponenda est pluritas sine necessitate.

"Shave off" (omit) unnecessary entities in explanations

Choose the most parsimonious interpretation

Hypothesis Testing

Differences may and will occur simply because of the operation of chance.

How can we determine in any given case whether the observed differences between two samples are due merely to chance or are caused by other factors?

Statistical inference procedures.

Hypothesis Testing

“Standard” hypothesis testing procedure

1. Faced with a biological problem, formulate a hypothesis and design an experiment with an expected yes-or-no result.

2. Collect the data.

3. Do a panel of statistical tests.

4. Find the one with the smallest p-value.

5. Choose a level of significance that best suits your favorite interpretation of the data.

6. In case of panic, call someone who you believe understands statistics.


Examples of things that do not need statistics

Example of things that do need statistics

Hypothesis Testing

Inference procedure

1. State the null hypothesis (H0) and its alternative (H1). Decide what data to collect and under what conditions.

2. Choose a test whose model most closely approximates the conditions of the research, in terms of the assumptions on which the test is based.

3. Find the sampling distribution of the test statistic under the assumption that H0 is true.

4. Specify a significance level (α) and a sample size (N).

5. On the basis of 2, 3, and 4 above, define the region of rejection for the statistical test.

6. Collect the data. Using the data, compute the value of the test statistic. If the value is in the region of rejection, the decision is to reject H0; otherwise, the decision is that H0 cannot be rejected at the chosen level of significance.


Hypothesis Testing

The null hypothesis H0

It is formulated with the purpose of being rejected

It is the negation of the point one is trying to make

It is an hypothesis of “no effect”

Hypothesis Testing

The test statistic

How to choose a test statistic?

Hypothesis Testing

The level of significance

In advance of the data collection, we specify all possible samples that could occur when H0 is true.

From these we specify a subset of possible samples which are so inconsistent with H0 (so extreme) that the probability is very small, when H0 is true, that the sample we actually observe will be among them.

Then if we actually observe a sample which is included in that subset, we reject H0 in favor of H1.

We reject H0 in favor of H1 if a statistical test yields a value whose associated probability of occurrence under H0 is equal to or less than some small probability, usually denoted α, called the level of significance.


Hypothesis Testing

Find the sampling distribution of the test statistic under the null hypothesis H0

Find all possible samples under the null hypothesis H0

Compute the test statistic for each sample

Find the probability “associated with” each value of the test statistic
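
The three steps above can be sketched by brute force for a small case. The example assumes, purely for illustration, a sample of N = 3 yes/no observations, each a "success" with probability 1/4 under H0, and takes the number of successes as the test statistic. The function name and these values are illustrative choices, not something the slides specify.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def sampling_distribution(n_obs, p_success):
    """Enumerate every possible sample under H0 and tabulate the statistic."""
    dist = defaultdict(Fraction)  # missing keys start at Fraction(0)
    # Step 1: all possible samples (1 = success, 0 = failure).
    for sample in product((0, 1), repeat=n_obs):
        # Step 2: the test statistic, here the number of successes.
        statistic = sum(sample)
        # Probability of this particular sample when H0 is true.
        prob = Fraction(1)
        for outcome in sample:
            prob *= p_success if outcome else 1 - p_success
        # Step 3: accumulate the probability associated with this value.
        dist[statistic] += prob
    return dict(dist)


# Assumed illustration: N = 3, success probability 1/4 under H0.
dist = sampling_distribution(3, Fraction(1, 4))
for value in sorted(dist):
    print(value, dist[value])
```

Exact fractions are used so that the probabilities can be compared with hand calculations. Enumeration grows as 2^N, so in practice a closed-form distribution is used instead.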

Hypothesis Testing

The “infamous” p-value

p is the probability “associated with” a specified value of the test statistic

It is the probability, when H0 is true, of drawing a sample with a test statistic value at least as extreme as the specified value
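
As a minimal sketch of this definition, assume the test statistic is a count that follows a binomial distribution under H0, and that "extreme" means "small" (a one-sided, lower-tail test). The function names and the numbers in the usage line are illustrative assumptions.

```python
from math import comb


def binomial_pmf(n, N, pi):
    """Probability of exactly n successes in N trials with success probability pi."""
    return comb(N, n) * pi**n * (1 - pi) ** (N - n)


def lower_tail_p_value(n_observed, N, pi0):
    """P(statistic <= n_observed) when H0 (success probability pi0) is true."""
    return sum(binomial_pmf(k, N, pi0) for k in range(n_observed + 1))


# Assumed illustration: 1 success observed in 20 trials, pi0 = 1/4.
p_value = lower_tail_p_value(1, 20, 0.25)
print(round(p_value, 4))
```

For an upper-tail or two-tailed alternative, the set of "at least as extreme" values changes, and the sum has to be taken over that set instead.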

Hypothesis Testing Example

Does a recessive mutation pem lead to early developmental problems?

Does the mutation pem in homozygous individuals increase mortality in the embryo?

A suitable null hypothesis: pem shows Mendelian segregation

$H_0: \pi_{pem} = 1/4$ versus $H_1: \pi_{pem} < 1/4$


[Figure: Mendelian segregation. A cross of CC × cc (gametes C and c) gives Cc offspring with probability 1. Cross-pollination of Cc × Cc, where each parent produces gametes C and c with probability 1/2 each, gives offspring CC, Cc, Cc and cc, each with probability 1/4.]

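wt/pem × wt/pem

For a litter of three, the probability P(n) of n pem homozygotes is:

$$P(0) = \tfrac{3}{4}\cdot\tfrac{3}{4}\cdot\tfrac{3}{4}$$

$$P(1) = 3\cdot\tfrac{1}{4}\cdot\tfrac{3}{4}\cdot\tfrac{3}{4}$$

$$P(2) = 3\cdot\tfrac{3}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}$$

$$P(3) = 1\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}\cdot\tfrac{1}{4}$$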


[Figure: plot of P(n) against n.]

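What about litter size 5?

$$P(n) = \binom{5}{n}\left(\tfrac{1}{4}\right)^{n}\left(\tfrac{3}{4}\right)^{5-n}$$

$$P(0) = 1\cdot\left(\tfrac{1}{4}\right)^{0}\left(\tfrac{3}{4}\right)^{5}$$

$$P(1) = 5\cdot\left(\tfrac{1}{4}\right)^{1}\left(\tfrac{3}{4}\right)^{4}$$

$$P(2) = 10\cdot\left(\tfrac{1}{4}\right)^{2}\left(\tfrac{3}{4}\right)^{3}$$

$$P(3) = 10\cdot\left(\tfrac{1}{4}\right)^{3}\left(\tfrac{3}{4}\right)^{2}$$

$$P(4) = 5\cdot\left(\tfrac{1}{4}\right)^{4}\left(\tfrac{3}{4}\right)^{1}$$

$$P(5) = 1\cdot\left(\tfrac{1}{4}\right)^{5}\left(\tfrac{3}{4}\right)^{0}$$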
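
The six probabilities for litter size 5 can be evaluated exactly with a few lines of Python. This is only a numerical check of the formula above; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N = 5                # litter size
pi = Fraction(1, 4)  # probability of a pem homozygote under H0

# P(n) = C(N, n) * pi^n * (1 - pi)^(N - n) for n = 0..N, as exact fractions.
probs = {n: comb(N, n) * pi**n * (1 - pi) ** (N - n) for n in range(N + 1)}

for n, p in probs.items():
    print(f"P({n}) = {p} = {float(p):.4f}")

# The probabilities of all possible outcomes must add up to one.
total = sum(probs.values())
print("total =", total)
```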

[Figure: plot of P(n) against n.]


What about litter size 9?

$$P(n) = P_{\pi}(n \mid N) = \binom{N}{n}\,\pi^{n}\,(1-\pi)^{N-n} = \frac{N!}{n!\,(N-n)!}\,\pi^{n}\,(1-\pi)^{N-n}$$

Binomial Distribution

[Figure: binomial distribution, P(n) against n (n = 0 to 20) for N = 20, with the significance level α marked.]


[Figure: P(n) against n (n = 0 to 20), with the region of rejection marked.]

Collect data

If n (the number of pem homozygotes) is less than or equal to 1, reject the null hypothesis that pem shows Mendelian segregation.

Otherwise, there is no evidence for non-Mendelian segregation.
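
The cutoff in this decision rule can be reproduced numerically. The sketch below assumes N = 20 offspring and π = 1/4 under H0, as in the example, and a significance level of α = 0.05. The slides mark α on the plot without stating its value, so 0.05 is an assumption here, as are the function names.

```python
from math import comb


def binomial_pmf(n, N, pi):
    """Probability of exactly n successes in N trials with success probability pi."""
    return comb(N, n) * pi**n * (1 - pi) ** (N - n)


def lower_rejection_region(N, pi0, alpha):
    """Largest set {0, ..., k} whose total probability under H0 is <= alpha."""
    region, cumulative = [], 0.0
    for n in range(N + 1):
        cumulative += binomial_pmf(n, N, pi0)
        if cumulative > alpha:
            break
        region.append(n)
    return region


# alpha = 0.05 is an assumed value, not taken from the slides.
region = lower_rejection_region(N=20, pi0=0.25, alpha=0.05)
print(region)  # [0, 1]
```

Because the statistic is discrete, the probability of the region (about 0.024 here) is generally smaller than the nominal α. Including n = 2 would raise it to about 0.091, above 0.05.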

Hypothesis Testing

Types of errors

Type I error involves rejecting the hypothesis H0 when it is, in fact, true.

Type II error involves failing to reject the null hypothesis when, in fact, it is false.

Note that α gives the probability of mistakenly or falsely rejecting H0.

The probability of errors of type II, denoted β, is more difficult to specify, and one generally specifies N, the sample size, and α, from which β can be derived.
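
Deriving β also requires a specific alternative, that is, a value of π under H1. As a hedged sketch, the code below reuses the example's N = 20 and rejection rule n ≤ 1 and assumes a hypothetical true proportion of 0.05. That value is chosen only for illustration and does not come from the slides.

```python
from math import comb


def binomial_cdf(k, N, pi):
    """P(n <= k) for a binomial count with N trials and success probability pi."""
    return sum(comb(N, n) * pi**n * (1 - pi) ** (N - n) for n in range(k + 1))


N = 20        # sample size, as in the example
k_reject = 1  # reject H0 when n <= 1
pi0 = 0.25    # Mendelian segregation (H0)
pi1 = 0.05    # hypothetical true proportion under H1 (an assumption)

alpha = binomial_cdf(k_reject, N, pi0)  # P(reject H0 | H0 true)
power = binomial_cdf(k_reject, N, pi1)  # P(reject H0 | H1 true) = 1 - beta
beta = 1 - power                        # P(fail to reject H0 | H1 true)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {power:.4f}")
```

A different assumed alternative gives a different β: the closer π under H1 is to 1/4, the larger β becomes for the same N and α.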


Hypothesis Testing

Error Probabilities

             Accept H0    Reject H0
H0 true      1-α          α
H1 true      β            1-β
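
The error probabilities above can also be read as long-run frequencies. The simulation sketch below assumes the binomial example with N = 20 and the rule "reject H0 when n ≤ 1". The alternative proportion 0.05, the number of trials and the seed are arbitrary illustrative choices.

```python
import random


def simulate_rejection_rate(pi_true, N=20, k_reject=1, trials=20_000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected (n <= k_reject)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # One experiment: count successes among N independent offspring.
        n = sum(rng.random() < pi_true for _ in range(N))
        if n <= k_reject:
            rejections += 1
    return rejections / trials


rate_h0 = simulate_rejection_rate(0.25)  # H0 true: estimates alpha
rate_h1 = simulate_rejection_rate(0.05)  # hypothetical H1: estimates 1 - beta
print(rate_h0, rate_h1)
```

The first rate estimates the "Reject H0" cell of the H0 row, and the second estimates the "Reject H0" cell of the H1 row; the "Accept H0" cells are their complements.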

Hypothesis Testing

One-tailed and two-tailed

Importance of Hypothesis H1

Hypothesis Testing

Inference procedure

1. State the null hypothesis (H0) and its alternative (H1). Decide what data to collect and under what conditions.

2. Choose a test whose model most closely approximates the conditions of the research, in terms of the assumptions on which the test is based.

3. Find the sampling distribution of the test statistic under the assumption that H0 is true.

4. Specify a significance level (α) and a sample size (N).

5. On the basis of 2, 3, and 4 above, define the region of rejection for the statistical test.

6. Collect the data. Using the data, compute the value of the test statistic. If the value is in the region of rejection, the decision is to reject H0; otherwise, the decision is that H0 cannot be rejected at the chosen level of significance.


Statistical tests, with their assumptions, sampling distributions, and properties, have been established for many common experimental conditions and are now available for you to use:

Student’s t-tests

Fisher’s exact test

ANOVA

Chi-squared (χ²) test

Pearson’s correlation coefficient

Wilcoxon-Mann-Whitney test

Spearman correlation coefficient
