Comparisons Between Two Populations
-
8/13/2019 Comparisons Between Two Populations
1/38
Comparisons Between Two Populations
Statistics and Probability
Semester 1, 2013
Ira M. Anjasmara
Jurusan Teknik Geomatika
Introduction
Previously, we have covered applications to samples drawn from one population:
testing means through the z-test ($n \geq 30$) or the t-test ($n < 30$)
Comparing Means - Large Samples
The large-sample case occurs when both samples have $n \geq 30$.
Suppose we have two normally distributed populations with different means and variances: $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$. Samples are taken from each population, with $\bar{x}_1, s_1^2$ and $\bar{x}_2, s_2^2$, and both $n_1$ and $n_2 \geq 30$.
The sampling distribution of interest is that of the difference between the sample means, $\bar{x}_1 - \bar{x}_2$, which is also normally distributed.
The mean of the difference distribution is:
$$E[\bar{x}_1 - \bar{x}_2] = \mu_1 - \mu_2$$
The standard error of the mean of the difference distribution is then:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Hypothesis testing
The procedure for hypothesis testing of two population means follows the same 8-step procedure as for a normal distribution, except we use the following test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
Most often, $H_0$ will assume that $\mu_1 = \mu_2$, while $H_a$ will test for differences. Hence, the above test statistic reduces to:
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$
Example
Step 1
Formulate alternative hypothesis: $H_a: \mu_1 \neq \mu_2$
i.e., test whether the two theodolites are different.
Formulate null hypothesis: $H_0: \mu_1 = \mu_2$
i.e., assume that they give identical readings.
Step 2 - Determine number of tails
This is a 2-tailed test, because the null hypothesis has an equality.
Step 3 - Determine level of significance.
We're told that the significance level is $\alpha = 0.05$.
Step 4 - Determine the critical value of z
We have a 2-tailed test, so we need to find $z_{\alpha/2} = z_{0.025}$.
From the standard normal distribution table, we have: $z_{0.025} = z(0.5 - 0.025) = z(0.475) = 1.96$
Step 5 - Determine the rejection region
The null hypothesis will be rejected if $\mu_1 \neq \mu_2$. Since we are testing $\mu_1 \neq \mu_2$, we are at both sides of the normal curve, therefore the rejection regions are $z < -1.96$ and $z > 1.96$.
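Step 6 - Determine the test statistic (z-score) from the sample data:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{16.10 - 15.99}{\sqrt{\frac{0.15^2}{40} + \frac{0.2^2}{40}}} = 2.78$$
Step 7 - Compare the test statistic against its critical value: $2.78 > 1.96$, so the test statistic lies in the rejection region.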
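The arithmetic in Steps 4 to 7 can be checked numerically. The following is a minimal Python sketch, assuming the sample values that appear in Step 6 (means 16.10 and 15.99, standard deviations 0.15 and 0.2, 40 observations each). The function name `two_sample_z` and its parameter names are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2, delta0=0.0):
    """Return the z statistic for H0: mu1 - mu2 = delta0 (large samples)."""
    # Standard error of the difference between the two sample means.
    std_err = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((mean1 - mean2) - delta0) / std_err


alpha = 0.05
# Two-tailed critical value z_{alpha/2} from the standard normal distribution.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

z = two_sample_z(16.10, 15.99, 0.15, 0.2, 40, 40)
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

Using `NormalDist().inv_cdf` replaces the table lookup of Step 4; the sample standard deviations are used in place of the population values, which is the usual large-sample approximation.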
Confidence intervals
For the distribution of the difference between two populations, the $100(1 - \alpha)\%$ confidence interval is given by:
$$CI = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$$
Remember, this shows us that we are $100(1 - \alpha)\%$ confident that the difference between the means lies in the range specified by the CI.
Notice that with this approach, we don't need to know the values of $\mu$, and we can approximate $\sigma$ by $s$ if necessary. For the given data in the above example:
$$CI = (16.10 - 15.99) \pm 1.96\left(\frac{0.15^2}{40} + \frac{0.2^2}{40}\right)^{1/2} = 0.110 \pm 0.077$$
Therefore, we reject $H_0$ at this level, because $H_0$ says that $\mu_1 - \mu_2 = 0$, whereas we have found that 0 does not lie in the CI range.
Note that you can only use confidence interval estimation as a replacement for hypothesis testing when you have a 2-tailed test.
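As a cross-check of the interval above, here is a short Python sketch under the same assumptions as the worked example (sample standard deviations standing in for the population values, $\alpha = 0.05$). The helper name `diff_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(mean1, mean2, sd1, sd2, n1, n2, alpha=0.05):
    """Return (lower, upper) bounds of the CI for mu1 - mu2 (large samples)."""
    z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    std_err = sqrt(sd1**2 / n1 + sd2**2 / n2)      # s used in place of sigma
    half_width = z_half * std_err
    centre = mean1 - mean2
    return centre - half_width, centre + half_width


lower, upper = diff_confidence_interval(16.10, 15.99, 0.15, 0.2, 40, 40)
# H0: mu1 - mu2 = 0 is rejected when 0 lies outside the interval.
zero_inside = lower <= 0.0 <= upper
print(f"CI = [{lower:.3f}, {upper:.3f}], contains 0: {zero_inside}")
```

The interval is returned as a pair of bounds rather than as "centre plus or minus half-width" so that the test for whether 0 lies inside it is a single comparison.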
Comparing Means - Small Samples
If n1
Unequal population variances
Sometimes the small samples will be drawn from two populations that have different (but unknown) variances, for example:
comparing instruments from two different manufacturers;
different operators using the same instruments (though this depends on competency).
In this case we are not allowed to form a pooled variance as we do when the population variances are equal. So, we have to compute the standard error of the mean of the difference distribution through:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
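However, we now must use the following formula to calculate the total number of degrees of freedom:
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{\nu_1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{\nu_2}\left(\frac{s_2^2}{n_2}\right)^2}$$
instead of $\nu = \nu_1 + \nu_2$, when determining the critical value of t.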
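The degrees-of-freedom formula is easy to mistype, so a small function helps. The sketch below assumes $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$; the function name `effective_df` and the input values are hypothetical, chosen only to exercise the formula.

```python
def effective_df(s1, n1, s2, n2):
    """Degrees of freedom for two small samples with unequal variances."""
    v1 = s1**2 / n1            # s1^2 / n1
    v2 = s2**2 / n2            # s2^2 / n2
    nu1, nu2 = n1 - 1, n2 - 1  # assumed: nu_i = n_i - 1
    return (v1 + v2) ** 2 / (v1**2 / nu1 + v2**2 / nu2)


# Hypothetical inputs, not taken from the slides.
nu = effective_df(s1=0.04, n1=10, s2=0.09, n2=32)
print(f"effective degrees of freedom = {nu:.1f}")
```

The result is generally not an integer; when a printed t table is used, a common convention is to round it down to the nearest tabulated value, which is the conservative choice.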
Hypothesis testing
When doing hypothesis testing on small samples drawn from two populations, use the following test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$
where $s_{\bar{x}_1 - \bar{x}_2}$ is determined through the previous methods, depending on whether the two populations have equal or unequal variances.
Example of Equal Variances
The same distance was measured by two EDMs (from the same manufacturer): EDM 1 recorded a mean distance of 100.20 m with $s_1 = 0.04$ m from 10 measurements; EDM 2 recorded a mean distance of 99.94 m with $s_2 = 0.09$ m from 32 measurements. You suspect that EDM 1 has a systematic error of at least 20 cm (i.e., is reading longer by 20 cm). Test this hypothesis at 0.01 significance.
Step 1
Formulate alternative hypothesis: $H_a: \mu_1 - \mu_2 > 0.2$
i.e., test whether EDM 1 has a systematic error of +20 cm.
Formulate null hypothesis: $H_0: \mu_1 - \mu_2 \leq 0.2$
i.e., assume that EDM 1 does not read longer than EDM 2 by more than 20 cm.
Step 2 - Determine number of tails
This is a 1-tailed test, because the null hypothesis has an inequality.
Step 3 - Determine level of significance and degree of freedom.
We're told that the significance level is $\alpha = 0.01$.
Because we have equal population variances, we can use $\nu = \nu_1 + \nu_2 = 9 + 31 = 40$.
Step 4 - Determine the critical value of t
We have a 1-tailed test, so we need to find $t_{\nu,\alpha} = t_{40,0.01}$.
From the t distribution table, we have: $t_{40,0.01} = 2.423$
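Step 6 - Determine the test statistic (t-score) from the sample data:
First, determine the pooled variance:
$$s_p^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2} = \frac{9 \times 0.04^2 + 31 \times 0.09^2}{9 + 31} = 0.00664$$
Then determine the standard error of the mean:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.00664\left(\frac{1}{10} + \frac{1}{32}\right)} = 0.0295$$
Finally, determine the test statistic:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(100.20 - 99.94) - 0.2}{0.0295} = 2.033$$
[Note that $\mu_1 - \mu_2 = 0.2$ here.]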
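The three calculations in Step 6 chain together naturally in code. The following Python sketch reproduces them for the EDM data, assuming equal population variances as in the example; the function name `pooled_t` is illustrative, and the critical value is simply the table value quoted in Step 4 rather than something the code computes.

```python
from math import sqrt


def pooled_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """Return (t, degrees of freedom) for the equal-variance two-sample test."""
    nu1, nu2 = n1 - 1, n2 - 1
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp2 = (nu1 * s1**2 + nu2 * s2**2) / (nu1 + nu2)
    std_err = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - delta0) / std_err
    return t, nu1 + nu2


t, nu = pooled_t(100.20, 0.04, 10, 99.94, 0.09, 32, delta0=0.2)
t_crit = 2.423  # t_{40, 0.01}, table value quoted in Step 4
exceeds_critical = t > t_crit
print(f"t = {t:.3f} with {nu} degrees of freedom; exceeds critical value: {exceeds_critical}")
```

Passing the hypothesised difference as `delta0` keeps the same function usable for the more common case $H_0: \mu_1 = \mu_2$, where it defaults to zero.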
Comparing Variances - F Distribution
Sometimes we may need to compare the precision resulting from two experiments:
precision is measured by the standard deviation;
in fact, as with the $\chi^2$ test, we compare variances.
If random samples of size $n_1$ and $n_2$ are selected from two normally distributed populations with equal variance, then the ratio:
$$F = \frac{s_1^2}{s_2^2}$$
has an F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.
Each specific F distribution depends upon which sample is selected for the numerator of the F-ratio, and which for the denominator; i.e., there is a unique F distribution for every possible combination of values of $\nu_1$ and $\nu_2$.
The probability density function for the F distribution is:
$$f(x; \nu_1, \nu_2) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1 x}{\nu_1 x + \nu_2}\right)^{\nu_1/2} \left(1 - \frac{\nu_1 x}{\nu_1 x + \nu_2}\right)^{\nu_2/2} \frac{1}{x}, \qquad x > 0$$
where $\Gamma$ is the gamma function (see standard maths texts).
Different tables are given for different values of $\alpha$. Each table gives a value of F corresponding to the area in the upper tail ($\alpha$), for the degrees of freedom $\nu_N$ in the numerator, and $\nu_D$ in the denominator. The tables for the F distribution look something like the following:
Table of F distribution
The numbers in the first column give the degrees of freedom in the denominator; the numbers in the first row give the degrees of freedom in the numerator.
The numbers in the main body of the table give the F-score corresponding to those particular values of $\alpha$, $\nu_N$ and $\nu_D$, i.e., $F_{\nu_N,\nu_D,\alpha}$.
F Distribution
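The tables only give the area in the upper tail. If we want to find the F-score corresponding to a small area in the lower tail, we use the important relationship:
$$F_{\nu_1,\nu_2,1-\alpha} = \frac{1}{F_{\nu_2,\nu_1,\alpha}}$$
Notice that the number of degrees of freedom in the numerator and denominator are interchanged. So:
$$F_{0.95} = \frac{1}{F_{0.05}}, \qquad F_{0.975} = \frac{1}{F_{0.025}}, \qquad F_{0.99} = \frac{1}{F_{0.01}}$$
for any $\nu_1$, $\nu_2$ (with the degrees of freedom interchanged on the right-hand side).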
Example
Calculate $F_{4,20,0.975}$.
From the previous equation, we see that:
$$F_{4,20,0.975} = \frac{1}{F_{20,4,0.025}} = \frac{1}{8.56} = 0.117$$
Hypothesis testing
The procedure for the hypothesis testing of variances follows the same 8-step procedure as for means testing with a normal distribution, except we use the test statistic:
$$F = \frac{s_1^2}{s_2^2}$$
For a 1-tailed test, we always phrase the alternative hypothesis like:
$$H_a: \sigma^2_{\text{larger}} > \sigma^2_{\text{smaller}}$$
Furthermore, the observation with the largest variance goes into the numerator, so that $F > 1$.
This puts the rejection region in the upper tail, so we only ever need to use the upper tail F values.
For a 2-tailed test, it doesn't matter which way the alternative hypothesis is phrased:
$$H_a: \sigma^2_{\text{larger}} \neq \sigma^2_{\text{smaller}} \quad \text{or} \quad H_a: \sigma^2_{\text{smaller}} \neq \sigma^2_{\text{larger}}$$
as long as the observation with the largest variance goes into the numerator.
As there are two tails, we need to find $F_{\nu_1,\nu_2,\alpha/2}$ and $F_{\nu_1,\nu_2,1-\alpha/2} = 1/F_{\nu_2,\nu_1,\alpha/2}$.
Example 1
Which of these two sets of measurements, A or B, is the more precise, at the 0.05 level of significance: $s_A = 5.83$ from 31 measurements, or $s_B = 4.12$ from 21 measurements?
Step 1
Formulate alternative hypothesis: $H_a: \sigma_A^2 > \sigma_B^2$
i.e., put the larger variance as population 1;
or, set A has the larger variability, so is less precise.
Formulate null hypothesis: $H_0: \sigma_A^2 \leq \sigma_B^2$
i.e., the opposite.
Step 2 - Determine number of tails
This is a 1-tailed test, because the null hypothesis has an inequality.
Step 3 - Determine level of significance and degree of freedom.
We're told that the significance level is $\alpha = 0.05$.
$n_A = 31$, $\nu_A = 31 - 1 = 30$ (numerator, because A has the larger variance)
$n_B = 21$, $\nu_B = 21 - 1 = 20$ (denominator).
Step 4 - Determine the critical value of F
We have a 1-tailed test, so we need to find $F_{\nu_A,\nu_B,\alpha} = F_{30,20,0.05} = 2.04$
Step 5 - Determine the rejection region
The null hypothesis will be rejected if $\sigma_A^2 > \sigma_B^2$. Since we are testing $\sigma_A^2 > \sigma_B^2$, we are in the upper tail of the F curve, therefore the rejection region is $F > 2.04$.
Step 6 - Determine the test statistic (F-score) from the sample data:
$$F = \frac{s_A^2}{s_B^2} = \frac{5.83^2}{4.12^2} = 2.002$$
Step 7 - Compare the test statistic against its critical value:
$2.002 < 2.04$, so the test statistic does not lie in the rejection region.
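The rule that the larger variance goes into the numerator can be built into a helper so that the degrees of freedom are never swapped by accident. This is a minimal Python sketch using the Example 1 data; `variance_ratio` is a hypothetical name, and the critical value is the table value quoted in Step 4.

```python
def variance_ratio(s_a, n_a, s_b, n_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    if s_a >= s_b:
        return s_a**2 / s_b**2, n_a - 1, n_b - 1
    return s_b**2 / s_a**2, n_b - 1, n_a - 1


f_stat, df_num, df_den = variance_ratio(5.83, 31, 4.12, 21)
f_crit = 2.04  # F_{30,20,0.05}, table value quoted in Step 4
in_rejection_region = f_stat > f_crit
print(f"F = {f_stat:.3f} (df {df_num}, {df_den}); in rejection region: {in_rejection_region}")
```

Returning the degrees of freedom alongside F makes it clear which row and column of the F table to read.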
Example 2
Step 3 - Determine level of significance and degree of freedom.
We're told that the significance level is $\alpha = 0.05$.
$n_A = 10$, $\nu_A = 10 - 1 = 9$ (denominator)
$n_B = 6$, $\nu_B = 6 - 1 = 5$ (numerator, because B has the larger variance).
Step 4 - Determine the critical value of F
We have a 2-tailed test, so we need to find $F_{\nu_B,\nu_A,\alpha/2} = F_{5,9,0.025} = 4.48$
$$F_{\nu_B,\nu_A,1-\alpha/2} = F_{5,9,0.975} = \frac{1}{F_{9,5,0.025}} = \frac{1}{6.68} = 0.150$$
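Step 6 - Determine the test statistic (F-score) from the sample data:
$$F = \frac{s_B^2}{s_A^2} = \frac{0.5^2}{0.42^2} = 1.42$$
Step 7 - Compare the test statistic against its critical values: $0.150 < 1.42 < 4.48$, so the test statistic does not lie in either rejection region.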
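For the 2-tailed case, the lower critical value comes from the reciprocal relationship with the degrees of freedom interchanged. The sketch below, in Python, applies it to the values quoted in Steps 4 and 6; the two upper-tail constants are table values copied from the text, and the variable names are illustrative.

```python
# Upper-tail table values for area alpha/2 = 0.025, as quoted in Step 4.
f_5_9_upper = 4.48   # F_{5,9,0.025}
f_9_5_upper = 6.68   # F_{9,5,0.025}, degrees of freedom interchanged

# Lower critical value: F_{nu1,nu2,1-alpha} = 1 / F_{nu2,nu1,alpha}.
f_lower = 1 / f_9_5_upper
f_upper = f_5_9_upper

s_a, s_b = 0.42, 0.5
f_stat = s_b**2 / s_a**2  # larger variance (B) in the numerator

reject_h0 = f_stat < f_lower or f_stat > f_upper
print(f"lower = {f_lower:.3f}, F = {f_stat:.2f}, upper = {f_upper}, reject H0: {reject_h0}")
```

Because the larger variance is always placed in the numerator, F is at least 1 and only the upper bound can actually be exceeded; the lower bound is kept here to mirror the two-tailed procedure in the text.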
As for the t and $\chi^2$ distributions, determining P-values for the F distribution requires the use of a computer program.
On the internet, such a program can be found at: davidmlane.com/hyperstat/F_table.html
Microsoft Excel has the function FDIST to work out P-values for the F distribution, where:
$$p(F > F_0) = \text{FDIST}(F_0, \nu_N, \nu_D)$$
for some numerical value $F_0$.
Example
Calculate the P-value for the following data: $s_A = 0.42$ from 10 measurements, and $s_B = 0.5$ from 6 measurements. That is, what is the probability that the precisions are different?
$\nu_N = \nu_B = 5$
$\nu_D = \nu_A = 9$
$$F = \frac{s_B^2}{s_A^2} = \frac{0.5^2}{0.42^2} = 1.42$$
Using Excel (or the website shown above), we find:
$$p(F > 1.42) = \text{FDIST}(1.42, 5, 9) = 0.305$$
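Where neither Excel nor the website is at hand, the upper-tail area can be approximated directly from the F density. The following Python sketch is one possible approach, not the method used by FDIST: it integrates a standard form of the F probability density function with Simpson's rule, using only the standard library. The function names, the number of intervals, and the restriction to more than 2 numerator degrees of freedom are assumptions of this sketch.

```python
from math import exp, lgamma, log


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0:
        return 0.0
    log_norm = lgamma((d1 + d2) / 2) - lgamma(d1 / 2) - lgamma(d2 / 2)
    u = d1 * x / (d1 * x + d2)
    return exp(log_norm + (d1 / 2) * log(u) + (d2 / 2) * log(1 - u)) / x


def f_upper_tail(f0, d1, d2, intervals=2000):
    """Approximate p(F > f0) as 1 minus a Simpson's-rule integral over [0, f0].

    Assumes d1 > 2 (so the density is zero at x = 0) and an even `intervals`.
    """
    h = f0 / intervals
    total = f_pdf(0.0, d1, d2) + f_pdf(f0, d1, d2)
    for i in range(1, intervals):
        weight = 4 if i % 2 else 2
        total += weight * f_pdf(i * h, d1, d2)
    return 1 - total * h / 3


p_value = f_upper_tail(1.42, 5, 9)
print(f"p(F > 1.42) = {p_value:.3f}")
```

The result should agree closely with the FDIST value quoted above. Working with `lgamma` rather than `gamma` avoids overflow when the degrees of freedom are large.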