Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test...

Hypothesis test flow chart

frequencydata

Measurement scale

number of variables

1

basic χ2 test(19.5)

Table I

χ2 test for independence(19.9) Table I

2correlation (r)number of correlations

1Test H0: r=0(17.2)

Table G

2

Test H0: r1= r2(17.4)

Tables H and A

number of means

Mea

ns

Do you know s?

1Yesz -test(13.1) Table A

t -test(13.14)Table D

2

independentsamples?

Yes

Test H0: m1= m2

(15.6) Table D

No

Test H0: D=0(16.4)

Table D

More than 2 number of factors

1

1-way ANOVACh 20

Table E

2 2-way ANOVACh 21

Table E

No

START HERE

Chapter 14: Testing Hypotheses about the Difference between Two Independent Groups

Example: The mothers of 10 left handed students in this class have a mean height of 64.1 inches with SSX = 60.9, and the mothers of the 86 right handed students in this class have a mean height of 63.94 inches with SSY = 928.7. Are the left hander’s mothers significantly different than the right-hander’s mothers, using a = .05?

This chapter is all about conducting a t-test on problems like this, where we’re testing for differences in the means from two independent samples.

Setting up the hypotheses: let X be heights of left-handed student’s mothers and Y be heights of the right handed student’s mothers.

Null Hypothesis, H0: mX=mY

Alternative Hypothesis HA: mX≠mY

7.928,9.60,86,10,94.63,1.64 YXYX SSSSnnYX

YXYX

YX

YXpYX nnnn

SSSS

nnss

11

)1()1(

11

To calculate the standard error of the estimate, we need to find the ‘average’ standard deviation from the two samples. This is done by calculating the ‘pooled’ standard deviation:

The standard error of the estimate is calculated from the pooled variance by:

And then calculate our value for the t-distribution with nX+nY-2 degrees of freedom.

)1()1(

YX

YXp nn

SSSSs

YX

hypYX

s

YXt

This thing is zero when we don’t expect a difference of means under H0

H0: mX = mY

HA: mX≠mY

08.186

1

10

1

)186()110(

7.9289.6011

)1()1(

YXYX

YXYX nnnn

SSSSs

Back to our example:

1481.0

08.1

94.631.64

YX

hypYX

s

YXt

Looking up Table D for df = (10-1)+(86-1) = 94 (we’ll use 95), two-tailed, a = .05, tcrit = 1.984

Decision: Our observed value of t does not fall in the critical region, so we fail to reject H0

7.928,9.60,86,10,94.63,1.64 YXYX SSSSnnYX

To apply a t-test on the difference between two independent means, we must assume:

1) Each sample is drawn at random from its respective population.

This is the basis rule of random sampling, but for two populations.

2) The two random samples must be independently selected.

i.e. there should be no expected correlation between X and Y.

3) Samples are drawn with replacement

This is almost never done, but if the population is large it doesn’t matter much.

4) The sampling distribution of the difference between means is normal.

This where the central limit theorem comes to the rescue. The larger the sample size, the more normally distributed the difference of means will become.

Other names for this test are ‘Unpaired two-sample t-test’ and ‘Independent two-sample” t-test.

YXYX

YYXXYX nnnn

snsns

11

)1()1(

11 22

If you are given the standard deviations sX and sY instead of SSX and SSY, then you can use the fact that (note the lower case s for estimating population standard deviation):

Or

To calculate:

and

12

X

xX n

SSs

2)1( XXx snSS

)1()1(

11 22

YX

YYXXp nn

snsns

Example: The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches. Is there a significant difference between the heights of these two groups of students using a = .05?

Answer: We will use a two-tailed independent samples t-test with HO: mX=mY, and HA: mX≠mY

Since you’ve been given just means, standard deviations and sample sizes, you should use this estimate for the standard error of the difference between means:

YXYX

YYXXYX nnnn

snsns

11

)1()1(

11 22

79.0

51

1

45

1

)151()145(

62.315114.4145 22

YXs

Calculating t with nX+nY-2 = 45+51-2= 94 degrees of freedom:

Look up tcrit in table D with 0.025 in one tail:

tcrit = 1.986. So our rejection region will be t < -1.986 or t > 1.986

Since our observed t = 2.4051, we reject H0 .

Using APA format:

There is a significant difference between the heights of students with tall fathers (M = 66.8, SD = 4.14) and students with less tall fathers (M = 64.9, SD = 3.62) , t(94) = 2.4, p<.05.

For fun at home, would we still reject H0 using a = .01?

4.279.0

9.648.66

YXs

YXt

Plotting bar graphs with ‘error bars’

It is useful to compare means by plotting them as bar graphs with error bars representing plus and minus one standard error of the mean.

61.045

14.4

X

XX n

ss

Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches.

51.051

62.3

Y

YY n

ss

61.045

14.4

X

XX n

ss

Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches.

51.051

62.3

Y

YY n

ss

Rule of thumb: If the error bars representing standard errors of the mean overlap, a one-tailed t-test will probably fail to reject HO with a = .05.

In general, the bars need a gap to reach statistical significance.

Fathers over 70 in. Fathers below 70 in.64

64.5

65

65.5

66

66.5

67

67.5

68

He

igh

t (in

)

Students in our class

fluffy noisy0

20

40

60

80

100

120a

ng

er

athletes

In the pursuit of science, you measure the anger of 10 fluffy and 10 noisy athletes and obtain for fluffy athletes a mean anger of 102.5 and a standard deviation of 19.21, and for noisy athletes a mean of 92.8 and a standard deviation of 20.96.

Using an alpha value of 0.05, is the mean anger of fluffy athletes significantly greater than for the noisy athletes?

We fail to reject H0. (t(18) = 1.08, tcrit = 1.7341) The anger of fluffy athletes is not significantly greater than the anger of noisy athletes. p = 0.1474

Your Psych 315 professor asks you to measure the money of 10 laughable and 10 flimsy candy bars and obtain for laughable candy bars a mean money of 106 and a standard deviation of 12.25, and for flimsy candy bars a mean of 107.2 and a standard deviation of 17.4.

Using an alpha value of 0.05, is the mean money of laughable candy bars significantly greater than for the flimsy candy bars?

We fail to reject H0. (t(18) = -0.18, tcrit = 1.7341) The money of laughable candy bars is not significantly greater than the money of flimsy candy bars. p = 0.5698

laughable flimsy0

20

40

60

80

100

120m

oney

candy bars

Tomorrow you measure the clothing of 10 kind and 10 red geeks and obtain for kind geeks a mean clothing of 110.2 and a standard deviation of 14.91, and for red geeks a mean of 87.4 and a standard deviation of 13.22. Using an alpha value of 0.05, is the mean clothing of kind geeks significantly greater than for the red geeks?

We reject H0. (t(18) = 3.62, tcrit = 1.7341) The clothing of kind geeks is significantly greater than the clothing of red geeks. p = 0.0010

kind red0

20

40

60

80

100

120cl

othi

ng

geeks

Just for fun, you measure the conductivity of 14 straight and 12 strange teams and obtain for straight teams a mean conductivity of 19.4 and a standard deviation of 11.74, and for strange teams a mean of 13.2 and a standard deviation of 11.77. Using an alpha value of 0.05, is the mean conductivity of straight teams significantly greater than for the strange teams?

We fail to reject H0. (t(24) = 1.34, tcrit = 1.7109) The conductivity of straight teams is not significantly greater than the conductivity of strange teams. p = 0.0963

straight strange0

5

10

15

20

25co

nduc

tivity

teams

On a dare, you measure the age of 20 capricious and 19 enthusiastic Asian food and obtain for capricious Asian food a mean age of 22.8 and a standard deviation of 12.05, and for enthusiastic Asian food a mean of 12.5 and a standard deviation of 14.82. Using an alpha value of 0.05, is the mean age of capricious Asian food significantly greater than for the enthusiastic Asian food?

We reject H0. (t(37) = 2.39, tcrit = 1.6871) The age of capricious Asian food is significantly greater than the age of enthusiastic Asian food. p = 0.0111

capricious enthusiastic0

5

10

15

20

25

30ag

e

Asian food

I measure the conduct of 17 aware and 19 easy beers and obtain for aware beers a mean conduct of 26.5 and a standard deviation of 14.9, and for easy beers a mean of 17.6 and a standard deviation of 15.04. Using an alpha value of 0.05, is the mean conduct of aware beers significantly greater than for the easy beers?

We reject H0. (t(34) = 1.78, tcrit = 1.6909) The conduct of aware beers is significantly greater than the conduct of easy beers. p = 0.0420

aware easy0

5

10

15

20

25

30

35co

nduc

t

beers

We decide to measure the visual acuity of 12 husky and 16 wrathful Seattleites and obtain for husky Seattleites a mean visual acuity of 50.1 and a standard deviation of 11.45, and for wrathful Seattleites a mean of 42.4 and a standard deviation of 15.21. Using an alpha value of 0.05, is the mean visual acuity of husky Seattleites significantly greater than for the wrathful Seattleites?

We fail to reject H0. (t(26) = 1.47, tcrit = 1.7056) The visual acuity of husky Seattleites is not significantly greater than the visual acuity of wrathful Seattleites. p = 0.0772

husky wrathful0

10

20

30

40

50

60vi

sual

acu

ity

Seattleites

super tan0

10

20

30

40

50w

ate

r

brains

I go and measure the water of 13 super and 20 tan brains and obtain for super brains a mean water of 41.5 and a standard deviation of 14.32, and for tan brains a mean of 31.9 and a standard deviation of 15.8. Using an alpha value of 0.05, is the mean water of super brains significantly greater than for the tan brains?

We reject H0. (t(31) = 1.77, tcrit = 1.6955) The water of super brains is significantly greater than the water of tan brains. p = 0.0435

Effect size and power for the independent samples t-test

)1()1(

YX

YXp nn

SSSSs

Effect size is the difference between means divided by the estimate of the standard deviation. For tests of the difference between means our estimate of the standard deviation is the pooled standard deviation:

p

YX

s

YXg

)(

)1()1(

11 22

YX

YYXXp nn

snsns

Where (as before)

or

Again, by convention, effect sizes of 0.2 are small, 0.5 are medium and 0.8 are large.

Example (again): The mothers of 10 left handed students in this class have a mean height of 64.1 inches with SSX = 60.9, and the mothers of the 86 right handed students in this class have a mean height of 63.94 inches with SSY = 928.7 What is the effect size?

Answer: these are our parameters:

05.024.3

94.631.64)(

p

YX

s

YXg

24.3859

7.9289.60

)1()1(

YX

YXp nn

SSSSs

This is a very small effect size.

7.928,94.63,86,10,94.63,1.64 YXYX SSSSnnYX

Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches. What is the effect size?

Answer: This time, we have been given standard deviations, so we’ll use:

87.3

)151()145(

62.315114.4145

)1()1(

11 2222

YX

YYXXp nn

snsns

49.087.3

9.648.66)(

p

YX

s

YXg

This is a medium effect size. Remember it was a ‘significant’ t-test.

Power curves for independent samples t-test:

Remember, the power of a test is the probability of correctly rejecting H0 when it is false.

Power depends on a, nX, nY, and the effect size.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

10

12

15

20

25

30

40

50

75

100

150

250

500

1000

Effect size (d)

Po

wer

a = 0.01, 1-tail, 2 means

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

1012

15

20

2530

40

50

75

100

150

250

500

1000

Effect size (d)

Po

wer

a = 0.05, 1-tail, 2 means

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

10

12

15

20

25

30

40

50

75

100

150

250

500

1000

Effect size (d)

Po

wer

a = 0.01, 2-tails, 2 means

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

10

12

15

20

25

30

40

50

75

100

150

250

500

1000

Effect size (d)

Po

wer

a = 0.05, 2-tails, 2 means

Example: Suppose you randomly chose two groups of 25 subjects from the general population and gave a new drug designed to enhance IQ to one group and a placebo to the other.

Suppose you later give all subjects an IQ test and find that the group that got the new drug had a mean IQ of 110 and a standard deviation of 12, and the placebo group had a mean of 102 and a standard deviation of 16.

1) Conduct an independent sample t-test with a = .05 to determine if the drug had a significant effect on IQ

Answer: We’ll test the null hypothesis that mX-mY = 0, where X is the drug group and Y is the placebo group. Since we’re given standard deviations, we’ll use:

14.14

)125()125(

1612512125

)1()1(

11 2222

YX

YYXXp nn

snsns

Calculating t with nX+nY-2 = 24+24 = 48 degrees of freedom:

Look up tcrit in table D with 0.025 in one tail (using df = 50)

tcrit = 2.009. So our rejection region will be t < -2.009 or t > 2.009

Since our observed t = 2.0, we fail to reject H0, so we cannot conclude that there is a significant difference between the heights of these two groups.

We just missed the a=.05 cutoff for rejecting H0

0.408.014.1411

yx

YX nnsps

0.20.4

102110

YXs

YXt

14.14

)125()125(

1612512125

)1()1(

11 2222

YX

YYXXp nn

snsns

Example contd.

2) What was the effect size in this example?

Answer: We use the pooled standard deviation:

and then effect size:

57.014.14

102110)(

p

YX

s

YXg

Example contd.

3) Estimate the power value (to the nearest tenth) from this test given this effect size?

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

10

12

15

20

25

30

40

50

75

100

150

250

500

1000

Effect size (d)

Po

we

r

= 0.05, 2-tails, 2 means

We have a power of about 0.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

n=8

10

12

15

20

25

30

40

50

75

100

150

250

500

1000

Effect size (d)

Po

we

r

= 0.05, 2-tails, 2 means

Example contd.

4) How large of a sample size would we need to obtain a power level of 0.8?

We’d need a sample size of about 50 people in each group.

Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test...

Documents

Transcript of Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test...