Two-sample problems for population means BPS chapter 19 © 2006 W.H. Freeman and Company.
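12.5 Differences between Means (σ's known)
Two populations: $(\mu_1, \sigma_1)$ and $(\mu_2, \sigma_2)$
Two samples: one from each population
Two sample means $\bar{x}_1$, $\bar{x}_2$ and sample sizes $n_1$, $n_2$
Compare two population means: $H_0: \mu_1 - \mu_2 = \delta$ ($\delta = 0$ in most cases)
Alternatives: $\mu_1 - \mu_2 > \delta$; $\mu_1 - \mu_2 < \delta$; $\mu_1 - \mu_2 \neq \delta$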
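Let’s go through a two-sided alternative
$H_0: \mu_1 - \mu_2 = 0$ vs $H_A: \mu_1 - \mu_2 \neq 0$
Reject $H_0$ if $\bar{x}_1 - \bar{x}_2$ is too far from zero in either direction.
How far from zero might $\bar{x}_1 - \bar{x}_2$ be if $\mu_1 - \mu_2 = 0$?
The sampling distribution of $\bar{x}_1 - \bar{x}_2$ is asymptotically normal with mean 0 and standard deviation $\sigma_{\bar{x}_1 - \bar{x}_2}$.
We need to know $\sigma_{\bar{x}_1 - \bar{x}_2}$.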
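Fact: If the sample means are from independent samples, then
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = SE^2_{\bar{x}_1} + SE^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$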
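The additivity of the two variances can be checked by simulation. The following is a minimal sketch, assuming normal populations; the population parameters, the function name `simulate_diff_variance`, the number of repetitions, and the seed are all illustrative choices, not values from the slides.

```python
import random
import statistics


def simulate_diff_variance(mu1, sigma1, n1, mu2, sigma2, n2, reps=20000, seed=0):
    """Empirical variance of (xbar1 - xbar2) over many independent sample pairs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Draw one sample from each population, independently of the other.
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.variance(diffs)


# Hypothetical populations (assumed for illustration only).
mu1, sigma1, n1 = 10.0, 2.0, 20
mu2, sigma2, n2 = 10.0, 3.0, 25

theoretical = sigma1**2 / n1 + sigma2**2 / n2  # sigma1^2/n1 + sigma2^2/n2
empirical = simulate_diff_variance(mu1, sigma1, n1, mu2, sigma2, n2)

print(f"theoretical variance of the difference: {theoretical:.4f}")
print(f"simulated variance of the difference:   {empirical:.4f}")
```

The two printed values should agree closely; they would not if the two samples were dependent, which is why the independence condition matters.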
Thus under certain assumptions:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Correspondingly, a confidence interval for $\mu_1 - \mu_2$ is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
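Assumptions
$\sigma_1$ and $\sigma_2$ are known
Normal populations or large sample sizes
Under the null hypothesis $H_0: \mu_1 - \mu_2 = \delta$,
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is (asymptotically) standard normal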
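Rejection Regions:
Alternative hypothesis          Rejection region
$\mu_1 - \mu_2 > \delta$        $z > z_\alpha$
$\mu_1 - \mu_2 < \delta$        $z < -z_\alpha$
$\mu_1 - \mu_2 \neq \delta$     $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$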
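The three rejection regions can be written as one small decision rule. This is a sketch only: the function name `z_test_rejects`, the labels `"greater"`, `"less"`, `"two-sided"`, and the value z = 1.8 are hypothetical, chosen for illustration. The critical values come from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_test_rejects(z, alpha=0.05, alternative="two-sided"):
    """Return True if z falls in the rejection region for the given alternative."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if alternative == "greater":    # H_A: mu1 - mu2 > delta
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":       # H_A: mu1 - mu2 < delta
        return z < -std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":  # H_A: mu1 - mu2 != delta
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical test statistic (assumed value, for illustration).
z = 1.8
print(z_test_rejects(z, 0.05, "greater"))    # compares with z_0.05, about 1.645
print(z_test_rejects(z, 0.05, "two-sided"))  # compares with z_0.025, about 1.960
```

The same z can be significant against a one-sided alternative and not against the two-sided one, because the two-sided test splits α between both tails.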
Example 12.4
Two labs measure the specific gravity of metal. On average do the two labs give the same answer?
$\mu_1$ -- population mean by lab 1
$\mu_2$ -- population mean by lab 2
$H_0: \mu_1 = \mu_2$ vs $H_A: \mu_1 \neq \mu_2$
$\sigma_1 = 0.02$, $n_1 = 20$, $\bar{x}_1 = 2.032$
$\sigma_2 = 0.03$, $n_2 = 25$, $\bar{x}_2 = 2.020$
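95% Confidence Interval
$$(\bar{x}_1 - \bar{x}_2) \pm z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (2.032 - 2.020) \pm 1.96\sqrt{\frac{0.02^2}{20} + \frac{0.03^2}{25}} = 0.012 \pm 1.96\,(0.0075)$$
from -0.003 to 0.027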
Two-tailed Hypothesis Test
Two sample test
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{0.012}{0.0075} = 1.6$$
Rejection region: $|z| > z_{0.025} = 1.96$
Conclusion: Don’t reject $H_0$.
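The arithmetic of Example 12.4 can be reproduced directly from the summary statistics. The sketch below uses only the numbers given in the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Summary statistics from Example 12.4 (known population standard deviations).
sigma1, n1, xbar1 = 0.02, 20, 2.032
sigma2, n2, xbar2 = 0.03, 25, 2.020
alpha = 0.05

# Standard deviation of the difference of the two sample means.
se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Test statistic for H0: mu1 - mu2 = 0.
z = (xbar1 - xbar2) / se

# Two-sided critical value z_{alpha/2} and the matching confidence interval.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
lower = (xbar1 - xbar2) - z_crit * se
upper = (xbar1 - xbar2) + z_crit * se
reject = abs(z) > z_crit

print(f"SE = {se:.4f}, z = {z:.2f}, critical value = {z_crit:.2f}")
print(f"95% CI for mu1 - mu2: {lower:.4f} to {upper:.4f}")
print("reject H0" if reject else "do not reject H0")
```

At full precision the standard error is about 0.0075 and z is about 1.60, so the interval is 0.012 ± 1.96(0.0075), roughly -0.003 to 0.027. It contains 0, which agrees with not rejecting $H_0$.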
Rejection Regions
Alternative hypothesis       Rejection region
$H_A: \mu_1 > \mu_2$         $z > z_\alpha$
$H_A: \mu_1 < \mu_2$         $z < -z_\alpha$
$H_A: \mu_1 \neq \mu_2$      $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Exercise
An investigation of two kinds of photocopying equipment showed that a random sample of 60 failures of one kind of equipment took on the average 84.2 minutes to repair, while a random sample of 60 failures of another kind of equipment took on the average 91.6 minutes to repair. If, on the basis of collateral information, it can be assumed that $\sigma_1 = \sigma_2 = 19.0$ minutes for such data, test at the 0.02 level of significance whether the difference between these two sample means is significant.
12.6 Differences Between Means (unknown equal variances)
Large samples: $n_1 \geq 30$; $n_2 \geq 30$
Small samples:
1. $\sigma_1 = \sigma_2$
2. $\sigma_1 \neq \sigma_2$
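Large Samples
$n_1 \geq 30$; $n_2 \geq 30$
Estimate $\sigma_1$ and $\sigma_2$ by $s_1$ and $s_2$
Set
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$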
Rejection Regions
Alternative hypothesis       Rejection region
$H_A: \mu_1 > \mu_2$         $z > z_\alpha$
$H_A: \mu_1 < \mu_2$         $z < -z_\alpha$
$H_A: \mu_1 \neq \mu_2$      $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
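For large samples the only change from the known-σ case is that the sample standard deviations stand in for the population ones. A minimal sketch follows; the function name `large_sample_z` and the summary statistics are hypothetical, made up for illustration and not taken from the slides.

```python
import math


def large_sample_z(xbar1, s1, n1, xbar2, s2, n2, delta=0.0):
    """z statistic for H0: mu1 - mu2 = delta, with s1, s2 replacing sigma1, sigma2."""
    if n1 < 30 or n2 < 30:
        # The normal approximation is only recommended for n1 >= 30 and n2 >= 30.
        raise ValueError("large-sample z test needs n1 >= 30 and n2 >= 30")
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - delta) / se


# Hypothetical summary statistics (assumed for illustration only).
z = large_sample_z(xbar1=52.1, s1=6.0, n1=40, xbar2=49.8, s2=7.5, n2=50)
print(f"z = {z:.3f}")
```

The resulting z is then compared with the rejection regions for the chosen alternative.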
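Small Samples
$\sigma_1 = \sigma_2 = \sigma$ unknown
Two populations are normal
Standard error:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Estimate the common variance $\sigma^2$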
Pooled standard deviation
Using both $s_1^2$ and $s_2^2$ to estimate $\sigma^2$, we combine these estimates, weighting each by its d.f. The combined estimate of $\sigma^2$ is $s_p^2$, the pooled estimate:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
Estimate $\sigma$ by $s_p$
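The weighting by degrees of freedom is easy to see in code. This is a sketch with a hypothetical helper name `pooled_variance` and made-up inputs; it shows that with equal sample sizes the pooled variance is the plain average of the two sample variances, while with unequal sizes the larger sample counts for more.

```python
def pooled_variance(s1, n1, s2, n2):
    """Pooled estimate of a common variance, weighting each s^2 by its d.f."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * s1**2 + df2 * s2**2) / (df1 + df2)


# Hypothetical sample standard deviations (assumed for illustration only).
s1, s2 = 4.0, 6.0

equal_n = pooled_variance(s1, 10, s2, 10)    # equal weights: average of 16 and 36
unequal_n = pooled_variance(s1, 8, s2, 15)   # sample 2 has twice the d.f. of sample 1

print(f"equal sample sizes:   sp^2 = {equal_n:.3f}")
print(f"unequal sample sizes: sp^2 = {unequal_n:.3f}")
```

In the unequal case the result is pulled toward $s_2^2$, because the second sample contributes 14 degrees of freedom against 7 for the first.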
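Two-Sample T-test
T-test (t distribution with df $= n_1 + n_2 - 2$)
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
where $\delta$ is the hypothesized $\mu_1 - \mu_2$
$100(1-\alpha)\%$ CI
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$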
Example 12.5
Compare blood pressures
Two populations: common variance
$\alpha = 0.05$
$n_1 = 10$, $\bar{x}_1 = 125$, $s_1 = 16.2$
$n_2 = 12$, $\bar{x}_2 = 137$, $s_2 = 14.3$
$$s_p^2 = \frac{(10-1)(16.2)^2 + (12-1)(14.3)^2}{10 + 12 - 2} = 230.6$$
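CI & test
$s_p = 15.2$
df $= 10 + 12 - 2 = 20$
Critical value $t_{0.025} = 2.086$
t statistic: reject $H_0$ if $|t| > 2.086$
$$t = \frac{125 - 137}{15.2\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} = \frac{-12}{6.51} = -1.84$$
Conclusion? Don’t reject.
CI for $\mu_1 - \mu_2$: $-12 \pm 2.086\,(6.51) = -12 \pm 13.6$, that is, from -25.6 to 1.6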
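The pooled t-test of Example 12.5 can be recomputed from its summary statistics. The sketch below uses the example's own numbers, including the tabulated critical value 2.086 (the Python standard library has no t quantile function, so the value is typed in rather than computed); the variable names are illustrative.

```python
import math

# Summary statistics from Example 12.5.
n1, xbar1, s1 = 10, 125.0, 16.2
n2, xbar2, s2 = 12, 137.0, 14.3
t_crit = 2.086  # t_{0.025} at 20 d.f., as quoted in the example

# Pooled variance and standard deviation.
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
sp = math.sqrt(sp2)

# Standard error of the difference and the t statistic for H0: mu1 - mu2 = 0.
se = sp * math.sqrt(1 / n1 + 1 / n2)
t = (xbar1 - xbar2) / se

# 95% confidence interval for mu1 - mu2.
lower = (xbar1 - xbar2) - t_crit * se
upper = (xbar1 - xbar2) + t_crit * se

print(f"sp^2 = {sp2:.1f}, sp = {sp:.1f}, df = {df}")
print(f"t = {t:.2f}")
print(f"95% CI for mu1 - mu2: {lower:.1f} to {upper:.1f}")
```

Here |t| is about 1.85 at full precision (1.84 with the rounded $s_p = 15.2$), which is below 2.086, so $H_0$ is not rejected. The interval for $\mu_1 - \mu_2$ is about -25.6 to 1.6 and contains 0, as it must when the two-sided test does not reject.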
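What happens when variances are not equal?
Testing: $H_0: \mu_1 - \mu_2 = \delta$
Normal populations
$\sigma_1$ and $\sigma_2$ are not necessarily equal
$\sigma_1$ and $\sigma_2$ unknown
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \quad \text{estimated by} \quad \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$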
Two sample t-test with unequal variances
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
d.f. $= \min(n_1 - 1, n_2 - 1)$
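A short sketch of the unequal-variance statistic with the conservative degrees of freedom given above. The function name `unequal_variance_t` is hypothetical, and the inputs reuse the summary statistics of Example 12.5 purely as an illustration; that example itself assumes a common variance.

```python
import math


def unequal_variance_t(xbar1, s1, n1, xbar2, s2, n2, delta=0.0):
    """t statistic without pooling, with d.f. = min(n1 - 1, n2 - 1)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # each sample keeps its own variance
    t = ((xbar1 - xbar2) - delta) / se
    df = min(n1 - 1, n2 - 1)                 # conservative degrees of freedom
    return t, df


# Summary statistics borrowed from Example 12.5 for illustration.
t, df = unequal_variance_t(125.0, 16.2, 10, 137.0, 14.3, 12)
print(f"t = {t:.2f} with {df} d.f.")
```

With these numbers the statistic is about -1.82 on 9 d.f., compared with 20 d.f. for the pooled test. Fewer degrees of freedom mean a larger critical value, so this version is the more cautious of the two.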
Exercise
In a department store’s study designed to test whether or not the mean balance outstanding on 30-day charge accounts is the same in its two suburban branch stores, random samples yielded the following results:
Branch   Sample size n   Sample mean   Sample s.d.
1        80              $64.20        $16.00
2        100             $71.41        $22.13
Use the 0.05 level of significance to test the null hypothesis that the difference between the two population means is 0.
12.7 Paired Data
1982 study of trace metals in South Indian River. 6 random locations
T = top water zinc concentration (mg/L)
B = bottom water zinc concentration (mg/L)
Location   1      2      3      4      5      6
Top        0.415  0.238  0.390  0.410  0.605  0.609
Bottom     0.430  0.266  0.567  0.531  0.707  0.716
One of the first things to do when analyzing data is to PLOT the data
This is not a useful way to plot the data. There is not a clear distinction between bottom water and top water zinc—even though Bottom>Top at all 6 locations.
[Figure: zinc concentrations plotted as two separate groups, Top and Bottom; vertical axis: Zinc, 0 to 0.8]
A better way
[Figure: zinc concentrations for Top and Bottom; vertical axis: Zinc, 0.2 to 0.7]
Connect points in the same pair.
A better way
[Figure: scatter plot of the paired Top and Bottom values, both axes 0 to 0.8, with the reference line Bottom = Top]
The plot suggests that Bottom>Top. Is it true?
That is equivalent to asking: is it true that the difference is greater than 0?
Location    1      2      3      4      5      6
Top         0.415  0.238  0.390  0.410  0.605  0.609
Bottom      0.430  0.266  0.567  0.531  0.707  0.716
D = B - T   0.015  0.028  0.177  0.121  0.102  0.107
$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$
First check the assumption that the population is normal
[Figure: normal plot of the ordered differences against expected Z]
Doing a one-sided test
$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$
$$t = \frac{\bar{D} - 0}{S_D/\sqrt{n}} = \frac{0.092}{0.061/\sqrt{6}} = \frac{0.092}{0.025} = 3.68$$
$t_{0.05}$ at 5 d.f. is 2.015, so any t greater than 2.015 is evidence against $H_0$. We reject $H_0: \mu_B - \mu_T = 0$ in favor of $H_A: \mu_B - \mu_T > 0$.
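The paired test for the zinc data can be recomputed from the six Top and Bottom measurements. This sketch uses the data and the tabulated critical value 2.015 from the slides; the variable names are illustrative.

```python
import math
import statistics

# Zinc concentrations (mg/L) at the 6 locations, from the table in this section.
top = [0.415, 0.238, 0.390, 0.410, 0.605, 0.609]
bottom = [0.430, 0.266, 0.567, 0.531, 0.707, 0.716]
t_crit = 2.015  # t_{0.05} at 5 d.f., as quoted in the slides

# Work with the within-pair differences D = B - T.
diffs = [b - t for b, t in zip(bottom, top)]
n = len(diffs)
d_bar = statistics.fmean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)

# One-sample t statistic on the differences for H0: mu_D = 0.
t_stat = d_bar / (s_d / math.sqrt(n))

print(f"mean difference = {d_bar:.3f}, s_D = {s_d:.3f}, df = {n - 1}")
print(f"t = {t_stat:.2f}")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

Carried at full precision the statistic is about 3.70; the 3.68 shown on the slide comes from rounding the mean, the standard deviation, and the standard error before dividing. Either way it exceeds 2.015.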
Another example
The average weekly losses of man-hours due to accidents in 10 industrial plants before and after installation of an elaborate safety program:
Plant        1    2    3    4    5    6    7    8    9    10
Before       45   73   46   124  33   57   83   34   26   17
After        36   60   44   119  35   51   77   29   24   11
Diff (B-A)   9    13   2    5    -2   6    6    5    2    6
Is the safety program effective? (level=0.05)
Two Populations: Before and After
Normal? Independent?
No, No
Normal Probability Plots
Small sample sizes; both appear somewhat skewed to the right
[Figure: normal probability plots of the before and after values against quantiles of the standard normal]
Normal Probability Plot for Difference
Looks better
[Figure: normal probability plot of the differences against quantiles of the standard normal]
Consider the Differences
Paired observations: the before and after values come from the same plants, so the two samples are dependent
Data from different plants may be independent
Diff: 9 13 2 5 -2 6 6 5 2 6
Set up a Test: Paired T-Test
"Effective" means the program reduces the accidents, i.e., before > after ($\mu_D > 0$)
$\mu_D$ = difference of average accidents
$H_0: \mu_D = 0$ vs $H_A: \mu_D > 0$
The procedure is the same as the one-sample t-test
$$t = \frac{\bar{x}_D - 0}{s_D/\sqrt{n}}, \quad \text{df} = n - 1$$
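Rejection Regions for Paired T-test
Here $\delta$ is the hypothesized value of $\mu_D$ (0 in the examples)
Alternative hypothesis     Rejection region
$\mu_D > \delta$           $t > t_\alpha$
$\mu_D < \delta$           $t < -t_\alpha$
$\mu_D \neq \delta$        $t > t_{\alpha/2}$ or $t < -t_{\alpha/2}$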
Paired t-test
One-tailed test
Critical value: df $= 9$, $t_{0.05} = 1.833$
Sample mean & standard deviation: $\bar{x}_D = 5.2$; $s_D = 4.08$
t-statistic:
$$t = \frac{\bar{x}_D - 0}{s_D/\sqrt{n}} = \frac{5.2 - 0}{4.08/\sqrt{10}} = 4.03$$
Conclusion: reject $H_0$ since $t = 4.03 > 1.833$
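The safety-program test can be reproduced with a small reusable function. This is a sketch: the function name `paired_t` is hypothetical, the data are the ten before/after values from the table above, and the critical value 1.833 is the tabulated one quoted in the slides.

```python
import math
import statistics


def paired_t(first, second, mu_d0=0.0):
    """Paired t statistic for H0: mean(first - second) = mu_d0; returns (t, df)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.fmean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation of the differences
    return (d_bar - mu_d0) / (s_d / math.sqrt(n)), n - 1


# Weekly losses of man-hours at the 10 plants, before and after the program.
before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]
t_crit = 1.833  # t_{0.05} at 9 d.f., as quoted in the slides

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f} with {df} d.f.")
print("reject H0" if t_stat > t_crit else "do not reject H0")
```

The function never uses the before and after columns separately, only their differences. That is why the lack of normality in the individual columns matters less than the shape of the differences.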