SUMMARY Hypothesis testing. Self-engagement assessment.

Page 1:

SUMMARY: Hypothesis testing

Page 2:

Self-engagement assessment

Page 3:

Null hypothesis

no song

song

Null hypothesis: I assume that the populations without and with the song are the same.

At the beginning of our calculations, we assume the null hypothesis is true.

Page 4:

Hypothesis testing song
• population: mean engagement 7.8
• sample: mean engagement 8.2

The corresponding probability is 0.0022.

Because of such a low probability, we interpret 8.2 as a significant increase over 7.8, caused by the undeniable pedagogical qualities of the 'Hypothesis testing song'.

Page 5:

Four steps of hypothesis testing

1. Formulate the null and the alternative hypothesis (this includes choosing a one- or two-directional test).

2. Select the significance level α, the criterion by which we decide whether to reject the null hypothesis.

--- COLLECT DATA ---

3. Compute the p-value. The p-value is the probability that the data would be at least as extreme as those observed, if the null hypothesis were true.

4. Compare the p-value to the α level. If p ≤ α, the observed effect is statistically significant, the null is rejected, and the alternative hypothesis is accepted.
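The decision rule in step 4 can be sketched in R. This is only an illustration: the p-value 0.0022 is the one quoted for the song example, while α = 0.05 and the helper name `decide` are assumptions made here, not part of the slides.

```r
# Step 2: significance level, chosen before collecting data (assumed value).
alpha <- 0.05

# Step 3: p-value; 0.0022 is the probability quoted for the song example.
p_value <- 0.0022

# Step 4: compare the p-value with alpha (hypothetical helper function).
decide <- function(p, alpha) {
  if (p <= alpha) "reject H0" else "fail to reject H0"
}

decision <- decide(p_value, alpha)
print(decision)
```

Fixing α before looking at the data is what keeps the comparison in step 4 honest.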

Page 6:

One-tailed and two-tailed

one-tailed (directional) test

two-tailed (non-directional) test

Z-critical value, what is it?
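As a rough illustration of the question above, the Z-critical value is the quantile of the standard normal distribution that cuts off the rejection region. The sketch below assumes α = 0.05, which is not stated on this slide.

```r
alpha <- 0.05  # assumed significance level

# One-tailed (directional) test: all of alpha sits in one tail.
z_crit_one <- qnorm(1 - alpha)

# Two-tailed (non-directional) test: alpha is split between both tails.
z_crit_two <- qnorm(1 - alpha / 2)

round(c(one_tailed = z_crit_one, two_tailed = z_crit_two), 3)
```

With this α, the one-tailed critical value is about 1.645 and the two-tailed critical values are about ±1.960.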

Page 7:

NEW STUFF

Page 8:

Decision errors
• Hypothesis testing is prone to misinterpretation.
• It's possible that the students selected for the musical lesson were already more engaged.
• And we wrongly attributed the high engagement score to the song.
• Of course, it's unlikely to simply select a sample with a mean engagement of 8.2. The probability of doing so is 0.0022, which is pretty low. Thus we concluded that it is unlikely.
• But it's still possible to have randomly obtained a sample with such a mean.

Page 9:

Four possible things can happen

                     Decision
State of the world   Reject H0   Retain H0
H0 true              1           3
H0 false             2           4

In which cases did we make a wrong decision?

Page 10:

Four possible things can happen

                     Decision
State of the world   Reject H0   Retain H0
H0 true              1
H0 false                         4

In which cases did we make a wrong decision?

Page 11:

Four possible things can happen

                     Decision
State of the world   Reject H0      Retain H0
H0 true              Type I error
H0 false                            Type II error

Page 12:

Type I error
• When there really is no difference between the populations, random sampling can lead to a difference large enough to be statistically significant.
• You reject the null, but you shouldn't.
• False positive – the person doesn't have the disease, but the test says they do.
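A small simulation can make the Type I error tangible. The sketch below is an illustration under assumed settings (α = 0.05, 2000 simulated studies of 30 observations each, an arbitrary seed); none of these numbers come from the slides. Every sample is drawn from a population in which the null is true, so each rejection is a false positive.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
alpha  <- 0.05   # assumed significance level
n_sims <- 2000   # number of simulated studies (assumed)
n      <- 30     # sample size per study (assumed)

# Every sample comes from a population where H0 (mu = 0) is true.
p_values <- replicate(n_sims, t.test(rnorm(n, mean = 0, sd = 1), mu = 0)$p.value)

# Share of studies that reject H0 although it is true (Type I errors).
type1_rate <- mean(p_values <= alpha)
type1_rate
```

The observed share should land close to α, which is exactly what the significance level controls.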

Page 13:

Type II error
• When there really is a difference between the populations, random sampling can lead to a difference small enough not to be statistically significant.
• You do not reject the null, but you should.
• False negative – the person has the disease, but the test doesn't pick it up.
• Type I and II errors are theoretical concepts. When you analyze your data, you don't know if the populations are identical. You only know the data in your particular samples. You will never know whether you made one of these errors.

Page 14:

The trade-off
• If you set the α level to a very low value, you will make few Type I errors.
• But by reducing the α level you also increase the chance of a Type II error.
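The trade-off can be sketched with `power.t.test()` from base R's stats package. The study settings below (true shift 0.3, s.d. 1, n = 30, two-sided one-sample t-test) and the helper name `type2_risk` are hypothetical and serve only to show the direction of the effect.

```r
# Hypothetical study: true shift 0.3, s.d. 1, n = 30, two-sided one-sample t-test.
type2_risk <- function(alpha) {
  power <- power.t.test(n = 30, delta = 0.3, sd = 1, sig.level = alpha,
                        type = "one.sample", alternative = "two.sided")$power
  1 - power  # beta = probability of a Type II error
}

alphas <- c(0.10, 0.05, 0.01)        # decreasing alpha levels
betas  <- sapply(alphas, type2_risk)  # corresponding Type II error risks
round(data.frame(alpha = alphas, beta = betas), 3)
```

Reading the table from top to bottom, the Type II error risk grows as α shrinks.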

Page 15:

Clinical trial for a novel drug
• Drug that should treat a disease for which there exists no therapy.
• If the result is statistically significant, the drug will be marketed.
• If the result is not statistically significant, work on the drug will cease.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug for a condition that is currently not treatable.
• Which error is worse?
• I would say Type II error. To reduce its risk, it makes sense to set α = 0.10 or even higher.

Harvey Motulsky, Intuitive Biostatistics

Page 16:

Clinical trial for a me-too drug
• Drug that should treat a disease for which there already exists another therapy.
• Again, if the result is statistically significant, the drug will be marketed.
• Again, if the result is not statistically significant, work on the drug will cease.
• Type I error: treat future patients with an ineffective drug.
• Type II error: cancel the development of a functional drug for a condition that can be treated adequately with existing drugs.
• Thinking scientifically (not commercially), I would minimize the risk of Type I error (set α to a very low value).

Harvey Motulsky, Intuitive Biostatistics

Page 17:

Engagement example, n = 30

H0:

HA:

two-tailed test

[Figure labels: n = 30, Z = 0.79, Z = 1.87, μ = 0]

www.udacity.com – Statistics

Page 18:

Engagement example, n = 30

                     Decision
State of the world   Reject H0   Retain H0
H0 true
H0 false

Which of these four quadrants represents the result of our hypothesis test?

Page 19:

Engagement example, n = 30

Decision

Reject H0 Retain H0

State of the world

H0 true X

H0 false

Which of these four quadrants represents the result of our hypothesis test?

Page 20:

Engagement example, n = 50

H0:

HA:

two-tailed test

[Figure labels: n = 50, Z = 1.02, Z = 2.42, μ = 0]

Page 21:

Engagement example, n = 50

                     Decision
State of the world   Reject H0   Retain H0
H0 true
H0 false

Which of these four quadrants represents the result of our hypothesis test?

Page 22:

Engagement example, n = 50

Decision

Reject H0 Retain H0

State of the world

H0 true X

H0 false

Which of these four quadrants represents the result of our hypothesis test?

Page 23:

population of students that did not attend the musical lesson: parameters $\mu_0$, $\sigma_0$ are known

population of students that did attend the musical lesson: parameters are unknown

sample: statistic $\bar{x}$ is known

Page 24:

Test statistic

$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$ (test statistic)

Z-test

We use the Z-test if we know the population mean and the population s.d.
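A minimal sketch of the Z statistic as an R function follows. The means 8.2 and 7.8 are the ones from the song example; the population s.d. of 0.7, the sample size of 25 and the function name `z_statistic` are assumed values chosen for illustration, so the resulting p-value need not match the probability quoted earlier.

```r
# Z statistic for a sample mean when the population mean and s.d. are known.
z_statistic <- function(x_bar, mu0, sigma0, n) {
  (x_bar - mu0) / (sigma0 / sqrt(n))
}

# 8.2 and 7.8 are the means from the song example;
# sigma0 = 0.7 and n = 25 are assumed values, not taken from the slides.
z <- z_statistic(x_bar = 8.2, mu0 = 7.8, sigma0 = 0.7, n = 25)

# Two-tailed p-value from the standard normal distribution.
p_two_tailed <- 2 * pnorm(-abs(z))
c(Z = z, p = p_two_tailed)
```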

Page 25:

New situation
• The average engagement score in the population of 100 students is 7.5.
• A sample of 50 students was exposed to the musical lesson. Their mean engagement score became 7.72 with an s.d. of 0.6.
• DECISION: Does a musical performance lead to a change in the students' engagement? Answer YES/NO.
• Set up a hypothesis test, please.

Page 26:

Hypothesis test

• H0: $\mu = 7.5$
• H1: $\mu \neq 7.5$
• In this case, doing a two-sided test is the only way to test the null. You compare the sample mean of 7.72 with the population mean of 7.5. It seems that the sample mean is larger than the population mean (7.72 > 7.5), but the sample s.d. is 0.6. You can't set up a one-tailed test because you can't guess the correct direction of the relationship in advance. Actually, you could very easily miss the correct direction.

Page 27:

Formulate the test statistic

• Instead of $\sigma_0$ we only know the sample s.d. $s$.
• We can use it as the point estimate of the population s.d.
• However, this will estimate the s.d. for the population exposed to the musical lesson, whereas $\sigma_0$ in the formula is for the "unperturbed" population.
• In this case, it is common to make the assumption that both populations have the same standard deviation.

$Z = \frac{\bar{x} - \mu_0}{\sigma_0 / \sqrt{n}}$, but $\sigma_0$ is unknown!

population of students that did attend the musical lesson: parameters are unknown

sample: $\bar{x}$ and $s$ are known

population of students that did not attend the musical lesson: $\mu_0$ is known, $\sigma_0$ is unknown

Page 28:

t-statistic

Choose the correct alternative in the following statements:

1. The larger/smaller the value of $t$, the stronger the evidence that $\mu > \mu_0$.

2. The larger/smaller the value of $t$, the stronger the evidence that $\mu < \mu_0$.

3. The further the value of $t$ from 0 in either direction, the stronger/weaker the evidence that $\mu \neq \mu_0$.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

one-sample t-test (Czech: jednovýběrový t-test)

Page 29:

t-distribution

Page 30:

One-sample t-test

α level

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
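Applying the one-sample t-test to the numbers from the "New situation" slide (population mean 7.5, sample of 50 students with mean 7.72 and s.d. 0.6) might look like the sketch below. The significance level α = 0.05 is an assumption, since the slide does not state one.

```r
mu0   <- 7.5    # population mean (from the slide)
x_bar <- 7.72   # sample mean (from the slide)
s     <- 0.6    # sample s.d. (from the slide)
n     <- 50     # sample size (from the slide)
alpha <- 0.05   # assumed significance level

# t statistic: distance of the sample mean from mu0 in standard errors.
t_stat <- (x_bar - mu0) / (s / sqrt(n))

df      <- n - 1                       # degrees of freedom
t_crit  <- qt(1 - alpha / 2, df)       # two-tailed critical value
p_value <- 2 * pt(-abs(t_stat), df)    # two-tailed p-value

reject <- abs(t_stat) > t_crit
round(c(t = t_stat, t_critical = t_crit, p = p_value), 4)
reject
```

Under these assumptions t is about 2.59, which exceeds the critical value of about 2.01, so the null would be rejected and the answer to the DECISION question would be YES.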

Page 31:

Quiz
• What will increase the t-statistic? Check all that apply.

1. A larger difference between $\bar{x}$ and $\mu_0$.

2. Larger .

3. Larger .

4. Larger standard error.

$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

Page 32:

Z-test vs. t-test
• Use the Z-test if
  • you know the standard deviation of the population, or
  • you know the sample standard deviation AND you have a large sample size (traditionally over 30). In addition, you assume that the population standard deviation is the same as the sample standard deviation.
• Use the t-test if
  • you don't know the population standard deviation (you know only the sample standard deviation) and have a relatively small sample size.
• Tip: If you know only the sample standard deviation, always use the t-test.
• For a two-sided test and , what are the critical values of the Z- and t-distributions?
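The critical values of the two distributions can be compared directly in R. The sketch below assumes α = 0.05 and a few illustrative sample sizes; neither is taken from the slide.

```r
alpha <- 0.05           # assumed significance level
n     <- c(10, 30, 100) # assumed sample sizes

# Two-tailed critical values: standard normal vs. t with n - 1 d.f.
z_crit <- qnorm(1 - alpha / 2)
t_crit <- qt(1 - alpha / 2, df = n - 1)

data.frame(n = n,
           t_critical = round(t_crit, 3),
           z_critical = round(z_crit, 3))
```

The t-critical value is always larger than the Z-critical value and approaches it as the sample grows, which is why the Z-test is only a reasonable shortcut for large samples.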

Page 33:

Typical example of one-sample t-test

• You have to prepare 20 tubes with a 30% solution of NaCl. When you're finished, you measure the strength of the 20 solutions. The mean strength is 31.5%, with an s.d. of 1.15%.
• Decide whether you have a 30% solution or not.
• $H_0: \mu = 30$, $H_1: \mu \neq 30$
• You use the t-test in such a situation.
• You could use the Z-test if you had a large sample (e.g., you prepared 100 tubes), but generally it is always correct to use the t-test.
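The NaCl example can be worked through from its summary statistics. The sketch below assumes a two-sided test at α = 0.05, which the slide does not state explicitly.

```r
mu0   <- 30     # target strength in % (from the slide)
x_bar <- 31.5   # mean measured strength in % (from the slide)
s     <- 1.15   # sample s.d. in % (from the slide)
n     <- 20     # number of tubes (from the slide)
alpha <- 0.05   # assumed significance level

t_stat  <- (x_bar - mu0) / (s / sqrt(n))  # one-sample t statistic
df      <- n - 1
t_crit  <- qt(1 - alpha / 2, df)          # two-tailed critical value
p_value <- 2 * pt(-abs(t_stat), df)       # two-tailed p-value

reject <- abs(t_stat) > t_crit
round(c(t = t_stat, t_critical = t_crit), 3)
reject
```

Under these assumptions t is roughly 5.83 against a critical value of roughly 2.09, so the data do not support the claim that the solution is 30%.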

Page 34:

Dependent t-test for paired samples
• Two samples are dependent when the same subject takes the test twice.
  • paired t-test (Czech: párový t-test)
• This is a two-sample test, as we work with two samples.
• Examples of such situations:
  • Each subject is assigned to two different conditions (e.g., use a QWERTZ keyboard and an AZERTY keyboard and compare the error rates).
  • Pre-test … post-test.
  • Growth over time.

Page 35:

Example
• 25 students attended a normal lesson. Their mean engagement is .
• The same 25 students then heard the "Hypothesis testing song". Their mean engagement score is .

            no song   song
student 1
student 2
student n

$x_i - y_i$

Page 36:

Do the hypothesis test
• Now we follow the same procedure as for the one-sample t-test, except that we use values of differences .
• What will be the null?
• But this is equivalent to stating
• And the alternative?
• What is our point estimate for ?

Page 37:

Do the hypothesis test
• What else do we need to calculate a t-statistic?
• We need the standard deviation of the differences.
• We have a paired samples table, so we know each value, and we can easily calculate (do not forget, you're dividing by $n - 1$!).
• Let's say it is .
• The t-statistic
• Do we reject the null or do we fail to reject the null at the ?
• Critical values for for two-tailed are .
• We reject the null.
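The paired procedure can be sketched in R as a one-sample t-test on the differences. The eight score pairs below are hypothetical, invented only for illustration; they are not the values from the slides.

```r
# Hypothetical engagement scores for the same 8 students (not from the slides).
no_song <- c(7.1, 6.8, 7.9, 8.0, 6.5, 7.4, 7.7, 6.9)
song    <- c(7.6, 7.0, 8.4, 8.1, 7.2, 7.9, 8.0, 7.5)

# Differences x_i - y_i, one per student.
d <- no_song - song

# Manual t statistic; sd() divides by n - 1.
t_manual <- mean(d) / (sd(d) / sqrt(length(d)))

# The same test done by t.test(), once as a paired test
# and once as a one-sample test on the differences.
paired     <- t.test(no_song, song, paired = TRUE)
one_sample <- t.test(d, mu = 0)

c(manual     = t_manual,
  paired     = unname(paired$statistic),
  one_sample = unname(one_sample$statistic))
```

All three numbers agree, which is the point of the slide: a paired t-test is a one-sample t-test applied to the differences, with n − 1 degrees of freedom.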

Page 38:

Dependent samples
• e.g., give one person two different conditions to see how he/she reacts. Maybe one control and one treatment, or two types of treatment.
• Advantages
  • we can use fewer subjects
  • cost-effective
  • less time-consuming
• Disadvantages
  • carry-over effects
  • order may influence results

Page 39:

Independent samples
• Disadvantages of dependent samples become advantages of independent samples and vice versa.
  • We need more subjects; it's generally more time-consuming and more expensive.
  • No carry-over effects (each subject only gets one treatment).
• Everything else is the same
  • ,
• Reject if , fail to reject if .

Page 40:

Independent samples
• However, the standard error changes because it is based on two sample sizes and two standard deviations.
• If we subtract normally distributed data from other normally distributed data, we get a new data set
• Similarly, for the sample:
• standard error

This is true only if the two samples are independent!

Page 41:

Independent samples

$\mathrm{d.f.} = n_1 + n_2 - 2$
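A sketch of the independent-samples calculation follows. It assumes the commonly used standard error $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ together with the d.f. rule above; the summary statistics and α = 0.05 are hypothetical values chosen for illustration.

```r
# Hypothetical summary statistics for two independent samples (assumed values).
x1 <- 8.1; s1 <- 0.9; n1 <- 25   # e.g. students who heard the song
x2 <- 7.6; s2 <- 1.1; n2 <- 25   # e.g. students who did not
alpha <- 0.05                    # assumed significance level

# Standard error of the difference between two independent sample means
# (assumed form: variances of the two means add up).
se <- sqrt(s1^2 / n1 + s2^2 / n2)

df     <- n1 + n2 - 2              # degrees of freedom as on the slide
t_stat <- (x1 - x2) / se           # t statistic for H0: mu1 - mu2 = 0
t_crit <- qt(1 - alpha / 2, df)    # two-tailed critical value

reject <- abs(t_stat) > t_crit
round(c(se = se, t = t_stat, t_critical = t_crit), 3)
reject
```

The variances add only because the samples are independent; for paired data the differences must be analysed instead.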

Page 42:

An example
• Again, the musical lesson.
• Let's teach some students without the musical performance, and expose different students to the song.
• What will be the null and the alternative?
  • ,
• Which direction will we use?
  • two-tailed

Page 43:

An example

• ,
• ,
• Standard error
• Calculate t-statistic
• How will you proceed further?
  • calculate d.f., define $\alpha$, find the critical t-value, compare the t-statistic with the t-critical, decide about the null

Page 44:

An example

• t-critical value for is
• Reject or fail to reject the null?
  • Reject the null.

Page 45:

Summary of t-tests
• one-sample test (Czech: jednovýběrový test)
  • you test H0 :
• two-sample test (Czech: dvouvýběrový test)
  • you test H0 :
  • dependent samples
    • paired t-test (Czech: párový test)
  • independent samples
    • equal variances
    • unequal variances

Page 46:

F-test of equality of variances
• How to know if our variances are equal or not?
• var.test() in R
• The test statistic is a ratio of two variances. It has an F-distribution. The numerator and the denominator each have a certain number of d.f.

source: Wikipedia
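A short sketch of `var.test()` and the two flavours of the independent two-sample `t.test()` follows. The two samples are simulated with assumed parameters and an arbitrary seed; they are not data from the course.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Simulated samples with assumed means and equal s.d. (hypothetical data).
a <- rnorm(20, mean = 7.5, sd = 0.6)
b <- rnorm(25, mean = 7.7, sd = 0.6)

# F-test: the statistic is the ratio of the two sample variances,
# with n1 - 1 numerator d.f. and n2 - 1 denominator d.f.
f_test <- var.test(a, b)
ratio  <- var(a) / var(b)

# Two-sample t-tests for independent samples.
pooled <- t.test(a, b, var.equal = TRUE)  # equal variances assumed
welch  <- t.test(a, b)                    # default: unequal variances (Welch)

f_test$p.value
c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter))
```

Note that `t.test()` does not assume equal variances unless `var.equal = TRUE` is set; the pooled version uses n1 + n2 − 2 degrees of freedom.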

Page 47:

t-test in R
• t.test()
• Let's have a look into the R manual: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/t.test.html
• See my website for a link to a pdf explaining various t-tests in R (with examples).

Page 48:

Assumptions

1. Unpaired t-tests are highly sensitive to violation of the independence assumption.

2. The populations the samples come from should be approximately normal.
• This is less important for large sample sizes.

• What to do if these assumptions are not fulfilled?

1. Use the paired t-test.

2. Let's see further.

Page 49:

Check for normality – histogram

Page 50:

Check for normality – QQ-plot

qqnorm(rivers)
qqline(rivers)

Page 51:

Check for normality – tests
• The graphical methods for checking data normality still leave much to your own interpretation. If you show any of these plots to ten different statisticians, you can get ten different answers.
• H0: Data follow a normal distribution.
• Shapiro-Wilk test
  • shapiro.test(rivers)

Shapiro-Wilk normality test

data: rivers
W = 0.6666, p-value < 2.2e-16
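The test result can also be read programmatically. The sketch below uses the same built-in `rivers` data set; the α = 0.05 threshold is an assumption.

```r
alpha <- 0.05  # assumed significance level

# rivers is a data set shipped with R (lengths of major North American rivers).
sw <- shapiro.test(rivers)

sw$statistic   # the W statistic
sw$p.value     # p-value for H0: data follow a normal distribution

# H0 is retained only if the p-value exceeds alpha.
normality_plausible <- sw$p.value > alpha
normality_plausible
```

Here the p-value is far below α, so normality is rejected for `rivers`, in line with the output shown above.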

Page 52:

Nonparametric statistics
• Small samples from considerably non-normal distributions.
  • non-parametric tests
• No assumption about the shape of the distribution.
• No assumption about the parameters of the distribution (thus they are called non-parametric).
• Simple to do; however, their theory is extremely complicated. Of course, we won't cover it at all.
• However, they are less powerful than their parametric counterparts.
• So if your data fulfill the normality assumptions, use parametric tests (t-test, F-test).

Page 53:

Nonparametric tests
• If the normality assumption of the t-test is violated, and the sample sizes are too small, then its nonparametric alternative should be used.
• The nonparametric alternative of the t-test is the Wilcoxon test.
  • wilcox.test()
  • http://stat.ethz.ch/R-manual/R-patched/library/stats/html/wilcox.test.html
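A minimal sketch of `wilcox.test()` for two independent samples follows. The two small, skewed samples are hypothetical and were invented for illustration; α = 0.05 is likewise an assumption.

```r
alpha <- 0.05  # assumed significance level

# Hypothetical small, skewed samples (not from the slides).
group_a <- c(1.2, 0.8, 3.5, 0.9, 1.1, 7.4, 1.0)
group_b <- c(2.9, 3.8, 9.1, 4.2, 3.3, 12.5, 5.0)

# Wilcoxon rank-sum test: the nonparametric counterpart of the
# independent two-sample t-test. It works on ranks, not on raw values.
res <- wilcox.test(group_a, group_b)

res$statistic            # W: number of pairs where a value of group_a exceeds one of group_b
res$p.value
res$p.value <= alpha     # TRUE means H0 is rejected
```

For paired data the same function can be called with `paired = TRUE`, and for a single sample with the `mu` argument, mirroring the variants of `t.test()`.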