Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test...
-
Upload
lynette-dawson -
Category
Documents
-
view
240 -
download
4
Transcript of Hypothesis test flow chart frequency data Measurement scale number of variables 1 basic χ 2 test...
Hypothesis test flow chart
frequencydata
Measurement scale
number of variables
1
basic χ2 test(19.5)
Table I
χ2 test for independence(19.9) Table I
2correlation (r)number of correlations
1Test H0: r=0(17.2)
Table G
2
Test H0: r1= r2(17.4)
Tables H and A
number of means
Mea
ns
Do you know s?
1Yesz -test(13.1) Table A
t -test(13.14)Table D
2
independentsamples?
Yes
Test H0: m1= m2
(15.6) Table D
No
Test H0: D=0(16.4)
Table D
More than 2 number of factors
1
1-way ANOVACh 20
Table E
2 2-way ANOVACh 21
Table E
No
START HERE
Chapter 14: Testing Hypotheses about the Difference between Two Independent Groups
Example: The mothers of 10 left handed students in this class have a mean height of 64.1 inches with SSX = 60.9, and the mothers of the 86 right handed students in this class have a mean height of 63.94 inches with SSY = 928.7. Are the left hander’s mothers significantly different than the right-hander’s mothers, using a = .05?
This chapter is all about conducting a t-test on problems like this, where we’re testing for differences in the means from two independent samples.
Setting up the hypotheses: let X be heights of left-handed student’s mothers and Y be heights of the right handed student’s mothers.
Null Hypothesis, H0: mX=mY
Alternative Hypothesis HA: mX≠mY
7.928,9.60,86,10,94.63,1.64 YXYX SSSSnnYX
YXYX
YX
YXpYX nnnn
SSSS
nnss
11
)1()1(
11
To calculate the standard error of the estimate, we need to find the ‘average’ standard deviation from the two samples. This is done by calculating the ‘pooled’ standard deviation:
The standard error of the estimate is calculated from the pooled variance by:
And then calculate our value for the t-distribution with nX+nY-2 degrees of freedom.
)1()1(
YX
YXp nn
SSSSs
YX
hypYX
s
YXt
This thing is zero when we don’t expect a difference of means under H0
H0: mX = mY
HA: mX≠mY
08.186
1
10
1
)186()110(
7.9289.6011
)1()1(
YXYX
YXYX nnnn
SSSSs
Back to our example:
1481.0
08.1
94.631.64
YX
hypYX
s
YXt
Looking up Table D for df = (10-1)+(86-1) = 94 (we’ll use 95), two-tailed, a = .05, tcrit = 1.984
Decision: Our observed value of t does not fall in the critical region, so we fail to reject H0
7.928,9.60,86,10,94.63,1.64 YXYX SSSSnnYX
To apply a t-test on the difference between two independent means, we must assume:
1) Each sample is drawn at random from its respective population.
This is the basis rule of random sampling, but for two populations.
2) The two random samples must be independently selected.
i.e. there should be no expected correlation between X and Y.
3) Samples are drawn with replacement
This is almost never done, but if the population is large it doesn’t matter much.
4) The sampling distribution of the difference between means is normal.
This where the central limit theorem comes to the rescue. The larger the sample size, the more normally distributed the difference of means will become.
Other names for this test are ‘Unpaired two-sample t-test’ and ‘Independent two-sample” t-test.
YXYX
YYXXYX nnnn
snsns
11
)1()1(
11 22
If you are given the standard deviations sX and sY instead of SSX and SSY, then you can use the fact that (note the lower case s for estimating population standard deviation):
Or
To calculate:
and
12
X
xX n
SSs
2)1( XXx snSS
)1()1(
11 22
YX
YYXXp nn
snsns
Example: The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches. Is there a significant difference between the heights of these two groups of students using a = .05?
Answer: We will use a two-tailed independent samples t-test with HO: mX=mY, and HA: mX≠mY
Since you’ve been given just means, standard deviations and sample sizes, you should use this estimate for the standard error of the difference between means:
YXYX
YYXXYX nnnn
snsns
11
)1()1(
11 22
79.0
51
1
45
1
)151()145(
62.315114.4145 22
YXs
Calculating t with nX+nY-2 = 45+51-2= 94 degrees of freedom:
Look up tcrit in table D with 0.025 in one tail:
tcrit = 1.986. So our rejection region will be t < -1.986 or t > 1.986
Since our observed t = 2.4051, we reject H0 .
Using APA format:
There is a significant difference between the heights of students with tall fathers (M = 66.8, SD = 4.14) and students with less tall fathers (M = 64.9, SD = 3.62) , t(94) = 2.4, p<.05.
For fun at home, would we still reject H0 using a = .01?
4.279.0
9.648.66
YXs
YXt
Plotting bar graphs with ‘error bars’
It is useful to compare means by plotting them as bar graphs with error bars representing plus and minus one standard error of the mean.
61.045
14.4
X
XX n
ss
Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches.
51.051
62.3
Y
YY n
ss
61.045
14.4
X
XX n
ss
Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches.
51.051
62.3
Y
YY n
ss
Rule of thumb: If the error bars representing standard errors of the mean overlap, a one-tailed t-test will probably fail to reject HO with a = .05.
In general, the bars need a gap to reach statistical significance.
Fathers over 70 in. Fathers below 70 in.64
64.5
65
65.5
66
66.5
67
67.5
68
He
igh
t (in
)
Students in our class
fluffy noisy0
20
40
60
80
100
120a
ng
er
athletes
In the pursuit of science, you measure the anger of 10 fluffy and 10 noisy athletes and obtain for fluffy athletes a mean anger of 102.5 and a standard deviation of 19.21, and for noisy athletes a mean of 92.8 and a standard deviation of 20.96.
Using an alpha value of 0.05, is the mean anger of fluffy athletes significantly greater than for the noisy athletes?
We fail to reject H0. (t(18) = 1.08, tcrit = 1.7341) The anger of fluffy athletes is not significantly greater than the anger of noisy athletes. p = 0.1474
Your Psych 315 professor asks you to measure the money of 10 laughable and 10 flimsy candy bars and obtain for laughable candy bars a mean money of 106 and a standard deviation of 12.25, and for flimsy candy bars a mean of 107.2 and a standard deviation of 17.4.
Using an alpha value of 0.05, is the mean money of laughable candy bars significantly greater than for the flimsy candy bars?
We fail to reject H0. (t(18) = -0.18, tcrit = 1.7341) The money of laughable candy bars is not significantly greater than the money of flimsy candy bars. p = 0.5698
laughable flimsy0
20
40
60
80
100
120m
oney
candy bars
Tomorrow you measure the clothing of 10 kind and 10 red geeks and obtain for kind geeks a mean clothing of 110.2 and a standard deviation of 14.91, and for red geeks a mean of 87.4 and a standard deviation of 13.22. Using an alpha value of 0.05, is the mean clothing of kind geeks significantly greater than for the red geeks?
We reject H0. (t(18) = 3.62, tcrit = 1.7341) The clothing of kind geeks is significantly greater than the clothing of red geeks. p = 0.0010
kind red0
20
40
60
80
100
120cl
othi
ng
geeks
Just for fun, you measure the conductivity of 14 straight and 12 strange teams and obtain for straight teams a mean conductivity of 19.4 and a standard deviation of 11.74, and for strange teams a mean of 13.2 and a standard deviation of 11.77. Using an alpha value of 0.05, is the mean conductivity of straight teams significantly greater than for the strange teams?
We fail to reject H0. (t(24) = 1.34, tcrit = 1.7109) The conductivity of straight teams is not significantly greater than the conductivity of strange teams. p = 0.0963
straight strange0
5
10
15
20
25co
nduc
tivity
teams
On a dare, you measure the age of 20 capricious and 19 enthusiastic Asian food and obtain for capricious Asian food a mean age of 22.8 and a standard deviation of 12.05, and for enthusiastic Asian food a mean of 12.5 and a standard deviation of 14.82. Using an alpha value of 0.05, is the mean age of capricious Asian food significantly greater than for the enthusiastic Asian food?
We reject H0. (t(37) = 2.39, tcrit = 1.6871) The age of capricious Asian food is significantly greater than the age of enthusiastic Asian food. p = 0.0111
capricious enthusiastic0
5
10
15
20
25
30ag
e
Asian food
I measure the conduct of 17 aware and 19 easy beers and obtain for aware beers a mean conduct of 26.5 and a standard deviation of 14.9, and for easy beers a mean of 17.6 and a standard deviation of 15.04. Using an alpha value of 0.05, is the mean conduct of aware beers significantly greater than for the easy beers?
We reject H0. (t(34) = 1.78, tcrit = 1.6909) The conduct of aware beers is significantly greater than the conduct of easy beers. p = 0.0420
aware easy0
5
10
15
20
25
30
35co
nduc
t
beers
We decide to measure the visual acuity of 12 husky and 16 wrathful Seattleites and obtain for husky Seattleites a mean visual acuity of 50.1 and a standard deviation of 11.45, and for wrathful Seattleites a mean of 42.4 and a standard deviation of 15.21. Using an alpha value of 0.05, is the mean visual acuity of husky Seattleites significantly greater than for the wrathful Seattleites?
We fail to reject H0. (t(26) = 1.47, tcrit = 1.7056) The visual acuity of husky Seattleites is not significantly greater than the visual acuity of wrathful Seattleites. p = 0.0772
husky wrathful0
10
20
30
40
50
60vi
sual
acu
ity
Seattleites
super tan0
10
20
30
40
50w
ate
r
brains
I go and measure the water of 13 super and 20 tan brains and obtain for super brains a mean water of 41.5 and a standard deviation of 14.32, and for tan brains a mean of 31.9 and a standard deviation of 15.8. Using an alpha value of 0.05, is the mean water of super brains significantly greater than for the tan brains?
We reject H0. (t(31) = 1.77, tcrit = 1.6955) The water of super brains is significantly greater than the water of tan brains. p = 0.0435
Effect size and power for the independent samples t-test
)1()1(
YX
YXp nn
SSSSs
Effect size is the difference between means divided by the estimate of the standard deviation. For tests of the difference between means our estimate of the standard deviation is the pooled standard deviation:
p
YX
s
YXg
)(
)1()1(
11 22
YX
YYXXp nn
snsns
Where (as before)
or
Again, by convention, effect sizes of 0.2 are small, 0.5 are medium and 0.8 are large.
Example (again): The mothers of 10 left handed students in this class have a mean height of 64.1 inches with SSX = 60.9, and the mothers of the 86 right handed students in this class have a mean height of 63.94 inches with SSY = 928.7 What is the effect size?
Answer: these are our parameters:
05.024.3
94.631.64)(
p
YX
s
YXg
24.3859
7.9289.60
)1()1(
YX
YXp nn
SSSSs
This is a very small effect size.
7.928,94.63,86,10,94.63,1.64 YXYX SSSSnnYX
Example (again): The heights of the 45 students in our class with fathers above 70 inches has a mean of 66.8 inches and a standard deviation of 4.14 inches. The heights of the remaining 51 students has a mean of 64.9 and a standard deviation of 3.62 inches. What is the effect size?
Answer: This time, we have been given standard deviations, so we’ll use:
87.3
)151()145(
62.315114.4145
)1()1(
11 2222
YX
YYXXp nn
snsns
49.087.3
9.648.66)(
p
YX
s
YXg
This is a medium effect size. Remember it was a ‘significant’ t-test.
Power curves for independent samples t-test:
Remember, the power of a test is the probability of correctly rejecting H0 when it is false.
Power depends on a, nX, nY, and the effect size.
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
10
12
15
20
25
30
40
50
75
100
150
250
500
1000
Effect size (d)
Po
wer
a = 0.01, 1-tail, 2 means
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
1012
15
20
2530
40
50
75
100
150
250
500
1000
Effect size (d)
Po
wer
a = 0.05, 1-tail, 2 means
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
10
12
15
20
25
30
40
50
75
100
150
250
500
1000
Effect size (d)
Po
wer
a = 0.01, 2-tails, 2 means
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
10
12
15
20
25
30
40
50
75
100
150
250
500
1000
Effect size (d)
Po
wer
a = 0.05, 2-tails, 2 means
Example: Suppose you randomly chose two groups of 25 subjects from the general population and gave a new drug designed to enhance IQ to one group and a placebo to the other.
Suppose you later give all subjects an IQ test and find that the group that got the new drug had a mean IQ of 110 and a standard deviation of 12, and the placebo group had a mean of 102 and a standard deviation of 16.
1) Conduct an independent sample t-test with a = .05 to determine if the drug had a significant effect on IQ
Answer: We’ll test the null hypothesis that mX-mY = 0, where X is the drug group and Y is the placebo group. Since we’re given standard deviations, we’ll use:
14.14
)125()125(
1612512125
)1()1(
11 2222
YX
YYXXp nn
snsns
Calculating t with nX+nY-2 = 24+24 = 48 degrees of freedom:
Look up tcrit in table D with 0.025 in one tail (using df = 50)
tcrit = 2.009. So our rejection region will be t < -2.009 or t > 2.009
Since our observed t = 2.0, we fail to reject H0, so we cannot conclude that there is a significant difference between the heights of these two groups.
We just missed the a=.05 cutoff for rejecting H0
0.408.014.1411
yx
YX nnsps
0.20.4
102110
YXs
YXt
14.14
)125()125(
1612512125
)1()1(
11 2222
YX
YYXXp nn
snsns
Example contd.
2) What was the effect size in this example?
Answer: We use the pooled standard deviation:
and then effect size:
57.014.14
102110)(
p
YX
s
YXg
Example contd.
3) Estimate the power value (to the nearest tenth) from this test given this effect size?
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
10
12
15
20
25
30
40
50
75
100
150
250
500
1000
Effect size (d)
Po
we
r
= 0.05, 2-tails, 2 means
We have a power of about 0.5
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.40
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
n=8
10
12
15
20
25
30
40
50
75
100
150
250
500
1000
Effect size (d)
Po
we
r
= 0.05, 2-tails, 2 means
Example contd.
4) How large of a sample size would we need to obtain a power level of 0.8?
We’d need a sample size of about 50 people in each group.