Comparisons Between Two Populations



Statistics and Probability

Semester 1, 2013

Ira M. Anjasmara

Department of Geomatics Engineering



Introduction

Previously, we have covered applications to samples drawn from one population:

testing means through the z-test ($n \ge 30$) or the t-test ($n < 30$)



Comparing Means - Large Samples

The large-sample case occurs when both samples have $n \ge 30$.

Suppose we have two normally-distributed populations with different means and variances: $\mu_1, \sigma_1^2$ and $\mu_2, \sigma_2^2$. The difference between the sample means, $\bar{x}_1 - \bar{x}_2$, is then also normally distributed, and it estimates the difference in the population means, $\mu_1 - \mu_2$.

The sampling distribution of interest is that of:

$$\bar{x}_1 - \bar{x}_2$$

Samples are taken from each population, with $\bar{x}_1, s_1^2$ and $\bar{x}_2, s_2^2$, and both $n_1$ and $n_2 \ge 30$.

The mean of the difference distribution is:

$$E[\bar{x}_1 - \bar{x}_2] = \mu_1 - \mu_2$$

The standard error of the mean of the difference distribution is then:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
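As a minimal sketch of the standard-error formula for the difference distribution, the Python function below evaluates it for two independent samples. The function name `se_diff_means` and the input values are illustrative assumptions, and sample standard deviations are used in place of $\sigma_1$ and $\sigma_2$, which is only reasonable for large samples.

```python
import math

def se_diff_means(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

# Illustrative inputs (assumed): standard deviations 0.15 and 0.2, 40 readings each.
se = se_diff_means(0.15, 40, 0.2, 40)
print(f"standard error = {se:.4f}")
```

Increasing either sample size shrinks its term under the square root, so the standard error is dominated by the noisier or smaller sample.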



Hypothesis testing

The procedure for hypothesis testing two population means follows the same 8-step procedure as for a normal distribution, except we use the following test statistic:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

Most often, $H_0$ will assume that $\mu_1 = \mu_2$, while $H_a$ will test for differences. Hence, the above test statistic reduces to:

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$




Example

Step 1

Formulate alternative hypothesis: $H_a: \mu_1 \ne \mu_2$

i.e., test whether the two theodolites are different.

Formulate null hypothesis: $H_0: \mu_1 = \mu_2$

i.e., assume that they give identical readings.

Step 2 - Determine number of tails

This is a 2-tailed test, because the null hypothesis has an equality.

Step 3 - Determine level of significance.

We're told that the significance level is $\alpha = 0.05$.




Step 4 - Determine the critical value of z

We have a 2-tailed test, so we need to find $z_{\alpha/2} = z_{0.025}$. From the standard normal distribution table, we have:

$$z_{0.025} = z(0.5 - 0.025) = z(0.475) = 1.96$$

Step 5 - Determine the rejection region

The null hypothesis will be rejected if the data indicate that $\mu_1 \ne \mu_2$. Since we are testing $\mu_1 \ne \mu_2$, we are at both sides of the normal curve, therefore the rejection regions are $z < -1.96$ and $z > 1.96$.
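The table lookup in Step 4 and the two-sided rejection region in Step 5 can be sketched in Python with the standard library's `statistics.NormalDist`. The helper names `two_tailed_critical_z` and `in_rejection_region` are assumptions made for illustration only.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Return z_{alpha/2}, the upper critical value for a 2-tailed z-test."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def in_rejection_region(z, alpha):
    """True if z lies in either tail, beyond -z_{alpha/2} or +z_{alpha/2}."""
    return abs(z) > two_tailed_critical_z(alpha)

z_crit = two_tailed_critical_z(0.05)
print(f"z_crit = {z_crit:.2f}")
```

Because the standard normal curve is symmetric, a single critical value is enough: the lower limit is simply its negative.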



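Step 6 - Determine the test statistic (z-score) from the sample data:

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{16.10 - 15.99}{\sqrt{\frac{0.15^2}{40} + \frac{0.2^2}{40}}} = 2.78$$

Step 7 - Compare the test statistic against its critical value: $2.78 > 1.96$, so the test statistic lies in the upper rejection region.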
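Steps 6 and 7 can be reproduced with a short Python sketch. The function name `two_sample_z` and its `delta0` parameter (the difference assumed under $H_0$) are illustrative choices; the numbers are those of the worked example, with the sample standard deviations standing in for $\sigma_1$ and $\sigma_2$.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, sd1, n1, xbar2, sd2, n2, delta0=0.0):
    """z-score for H0: mu1 - mu2 = delta0, using large-sample standard deviations."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return ((xbar1 - xbar2) - delta0) / se

# Values from the worked example (Step 6).
z = two_sample_z(16.10, 0.15, 40, 15.99, 0.2, 40)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)   # about 1.96 for alpha = 0.05
reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, reject H0: {reject_h0}")
```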



Confidence intervals

For the distribution of the difference between two populations, the $100(1 - \alpha)\%$ confidence interval is given by:

$$CI = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2}$$

Remember, this shows us that we are $100(1 - \alpha)\%$ confident that the difference between the means lies in the range specified by the CI.

Notice that with this approach, we don't need to know the values of $\mu$, and we can approximate $\sigma$ by $s$ if necessary. For the given data in the above example:

$$CI = (16.10 - 15.99) \pm 1.96 \left(\frac{0.15^2}{40} + \frac{0.2^2}{40}\right)^{1/2} = 0.110 \pm 0.077$$

Therefore, we reject $H_0$ at this level, because $H_0$ says that $\mu_1 - \mu_2 = 0$, whereas we have found that 0 does not lie in the CI range.

Note that you can only use confidence interval estimation as a replacement for hypothesis testing when you have a 2-tailed test.
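The interval calculation can be checked with a small Python sketch. The function name `diff_means_ci` is an assumption for illustration; the inputs are the values from the example, and the critical value comes from `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

def diff_means_ci(xbar1, sd1, n1, xbar2, sd2, n2, alpha=0.05):
    """Two-sided 100(1 - alpha)% confidence interval for mu1 - mu2 (large samples)."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * se
    centre = xbar1 - xbar2
    return centre - half_width, centre + half_width

lower, upper = diff_means_ci(16.10, 0.15, 40, 15.99, 0.2, 40)
contains_zero = lower <= 0.0 <= upper   # H0 (difference of 0) is rejected if False
print(f"CI = [{lower:.3f}, {upper:.3f}], contains 0: {contains_zero}")
```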



Comparing Means - Small Samples

If $n_1$



Unequal population variances

Sometimes the small samples will be drawn from two populations that have different (but unknown) variances, for example:

comparing instruments from two different manufacturers;

different operators using the same instruments (though this depends on competency).

In this case we are not allowed to form a pooled variance like we do when the population variances are equal. So, we have to compute the standard error of the mean of the difference distribution through:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$




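However, we now must use the following formula to calculate the total number of degrees of freedom:

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{\nu_1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{\nu_2}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

instead of $\nu = \nu_1 + \nu_2$, when determining the critical value of $t$.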
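The unequal-variance standard error and the degrees-of-freedom formula can be sketched together in Python. The function name `welch_se_and_df` and the input values are hypothetical, chosen only to exercise the formula, and the sketch assumes $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$.

```python
import math

def welch_se_and_df(s1, n1, s2, n2):
    """Standard error and approximate degrees of freedom for unequal variances."""
    v1 = s1**2 / n1          # s1^2 / n1
    v2 = s2**2 / n2          # s2^2 / n2
    se = math.sqrt(v1 + v2)
    # Degrees of freedom, assuming nu_i = n_i - 1 for each sample.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df

# Hypothetical inputs, not taken from a worked example in the slides.
se, df = welch_se_and_df(0.04, 10, 0.09, 32)
print(f"se = {se:.4f}, df = {df:.1f}")
```

The result is generally not an integer and never exceeds $\nu_1 + \nu_2$; it is commonly rounded down before consulting a t table, which errs on the conservative side.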




Hypothesis testing

When doing hypothesis testing on small samples drawn from two populations, use the following test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$$

where $s_{\bar{x}_1 - \bar{x}_2}$ is determined through the previous methods, depending on whether the two populations have equal or unequal variances.




Example of Equal Variances

The same distance was measured by two EDMs (from the same manufacturer): EDM 1 recorded a mean distance of 100.20 m with $s_1 = 0.04$ m from 10 measurements; EDM 2 recorded a mean distance of 99.94 m with $s_2 = 0.09$ m from 32 measurements. You suspect that EDM 1 has a systematic error of at least 20 cm (i.e., is reading longer by 20 cm). Test this hypothesis at 0.01 significance.

Step 1

Formulate alternative hypothesis: $H_a: \mu_1 - \mu_2 > 0.2$

i.e., test whether EDM 1 has a systematic error of more than +20 cm.

Formulate null hypothesis: $H_0: \mu_1 - \mu_2 \le 0.2$

i.e., assume that EDM 1 reads at most 20 cm longer than EDM 2.





Step 2 - Determine number of tails

This is a 1-tailed test, because the null hypothesis has an inequality.

Step 3 - Determine level of significance and degrees of freedom.

We're told that the significance level is $\alpha = 0.01$. Because we have equal population variances, we can use $\nu = \nu_1 + \nu_2 = 9 + 31 = 40$.

Step 4 - Determine the critical value of t

We have a 1-tailed test, so we need to find $t_{\nu,\alpha} = t_{40,0.01}$. From the t distribution table, we have: $t_{40,0.01} = 2.423$





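Step 6 - Determine the test statistic (t-score) from the sample data:

First, determine the pooled variance:

$$s_p^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2} = \frac{9 \times 0.04^2 + 31 \times 0.09^2}{9 + 31} = 0.00664$$

Then determine the standard error of the mean:

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.00664 \left(\frac{1}{10} + \frac{1}{32}\right)} = 0.0295$$

Finally, determine the test statistic:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(100.20 - 99.94) - 0.2}{0.0295} = 2.033$$

[Note that $\mu_1 - \mu_2 = 0.2$ here.]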
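The three calculations in Step 6 can be chained in a short Python sketch. The function name `pooled_t` and the `delta0` parameter (the difference assumed under $H_0$) are illustrative assumptions; the critical value 2.423 is the table value quoted in Step 4, not something the code derives.

```python
import math

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, delta0=0.0):
    """Pooled-variance t-score for H0: mu1 - mu2 = delta0 (equal variances assumed)."""
    df1, df2 = n1 - 1, n2 - 1
    sp2 = (df1 * s1**2 + df2 * s2**2) / (df1 + df2)   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # standard error of the difference
    t = ((xbar1 - xbar2) - delta0) / se
    return sp2, se, t

# EDM example: hypothesised difference 0.2 m, table value t_{40,0.01} = 2.423.
sp2, se, t = pooled_t(100.20, 0.04, 10, 99.94, 0.09, 32, delta0=0.2)
t_crit = 2.423
print(f"sp2 = {sp2:.5f}, se = {se:.4f}, t = {t:.3f}, reject H0: {t > t_crit}")
```

With these numbers the t-score comes out below the tabulated critical value, so it does not fall in the upper-tail rejection region at the 0.01 level.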




Comparing Variances - F Distribution

Sometimes we may need to compare the precision resulting from two experiments:

precision is measured by the standard deviation;

in fact, as with the $\chi^2$ test, we compare variances.

If random samples of size $n_1$ and $n_2$ are selected from two normally-distributed populations with equal variance then the ratio:

$$F = \frac{s_1^2}{s_2^2}$$

has an F distribution with $\nu_1$ degrees of freedom in the numerator and $\nu_2$ degrees of freedom in the denominator.



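Each specific F distribution depends upon which sample is selected for the numerator of the F-ratio, and which for the denominator; i.e., there is a unique F distribution for every possible combination of values of $\nu_1$ and $\nu_2$.

The probability density function for the F distribution is:

$$f(x; \nu_1, \nu_2) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1 x}{\nu_1 x + \nu_2}\right)^{\nu_1/2} \left(\frac{\nu_2}{\nu_1 x + \nu_2}\right)^{\nu_2/2} \frac{1}{x}, \qquad x > 0$$

where $\Gamma$ is the gamma function (see standard maths texts).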
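As a rough illustration of this density, the Python sketch below evaluates it with `math.lgamma` and checks numerically that the area under the curve is close to 1. The function name `f_pdf`, the choice of 5 and 9 degrees of freedom, and the integration limits are all assumptions made for the example.

```python
import math

def f_pdf(x, df_num, df_den):
    """Density of the F distribution at x > 0, computed via log-gamma for stability."""
    if x <= 0:
        raise ValueError("the F density is evaluated here for x > 0 only")
    a, b = df_num / 2, df_den / 2
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    u = df_num * x / (df_num * x + df_den)          # lies strictly between 0 and 1
    return math.exp(log_norm + a * math.log(u) + b * math.log(1 - u)) / x

# Crude midpoint-rule check that the density integrates to about 1.
# The interval (0, 200] and the number of steps are arbitrary choices.
steps, upper = 200_000, 200.0
h = upper / steps
area = sum(f_pdf((i + 0.5) * h, 5, 9) for i in range(steps)) * h
print(f"area under f(x; 5, 9) on (0, 200] = {area:.4f}")
```

Working with logarithms of the gamma function avoids overflow when the degrees of freedom are large.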

Different tables are given for different values of $\alpha$. Each table gives a value of F corresponding to the area in the upper tail ($\alpha$), for the degrees of freedom $\nu_N$ in the numerator, and $\nu_D$ in the denominator.



Table of F distribution

The numbers in the first column give the degrees of freedom in the denominator; the numbers in the first row give the degrees of freedom in the numerator.

The numbers in the main body of the table give the F-score corresponding to those particular values of $\alpha$, $\nu_N$ and $\nu_D$, i.e., $F_{\nu_N,\nu_D,\alpha}$.




F Distribution

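The tables only give the area in the upper tail. If we want to find the F-score corresponding to a small area in the lower tail, we use the important relationship:

$$F_{\nu_1,\nu_2,1-\alpha} = \frac{1}{F_{\nu_2,\nu_1,\alpha}}$$

Notice that the number of degrees of freedom in the numerator and denominator are interchanged. So:

$$F_{0.95} = \frac{1}{F_{0.05}}, \qquad F_{0.975} = \frac{1}{F_{0.025}}, \qquad F_{0.99} = \frac{1}{F_{0.01}}$$

for any $\nu_1$, $\nu_2$, with the degrees of freedom interchanged on the right-hand side.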



Example

Calculate $F_{4,20,0.975}$.

From the previous equation, we see that:

$$F_{4,20,0.975} = \frac{1}{F_{20,4,0.025}} = \frac{1}{8.56} = 0.117$$



Hypothesis testing

The procedure for the hypothesis testing of variances follows the same 8-step procedure as for means testing with a normal distribution, except we use the test statistic:

$$F = \frac{s_1^2}{s_2^2}$$

For a 1-tailed test, we always phrase the alternative hypothesis like:

$$H_a: \sigma_{larger}^2 > \sigma_{smaller}^2$$

Furthermore, the observation with the largest variance goes into the numerator, so that

$$F > 1$$

This puts the rejection region in the upper tail, so we only ever need to use the upper tail F values.


For a 2-tailed test, it doesn't matter which way the alternative hypothesis is phrased:

$$H_a: \sigma_{larger}^2 \ne \sigma_{smaller}^2 \quad \text{or} \quad H_a: \sigma_{smaller}^2 \ne \sigma_{larger}^2$$

as long as the observation with the largest variance goes into the numerator.

As there are two tails, we need to find $F_{\nu_1,\nu_2,\alpha/2}$ and $F_{\nu_1,\nu_2,1-\alpha/2} = 1/F_{\nu_2,\nu_1,\alpha/2}$.



Example 1

Which of these two sets of measurements, A or B, is the more precise, at the 0.05 level of significance: $s_A = 5.83$ from 31 measurements, or $s_B = 4.12$ from 21 measurements?

Step 1

Formulate alternative hypothesis: $H_a: \sigma_A^2 > \sigma_B^2$

i.e., put the larger variance as population 1; or, set A has the larger variability, so is less precise.

Formulate null hypothesis: $H_0: \sigma_A^2 \le \sigma_B^2$

i.e., the opposite.

Step 2 - Determine number of tails

This is a 1-tailed test, because the null hypothesis has an inequality.




Step 3 - Determine level of significance and degrees of freedom.

We're told that the significance level is $\alpha = 0.05$.

$n_A = 31$, $\nu_A = 31 - 1 = 30$ (numerator, because A has the largest variance)

$n_B = 21$, $\nu_B = 21 - 1 = 20$ (denominator).

Step 4 - Determine the critical value of F

We have a 1-tailed test, so we need to find $F_{\nu_A,\nu_B,\alpha} = F_{30,20,0.05} = 2.04$



Step 5 - Determine the rejection region

The null hypothesis will be rejected if the data indicate that $\sigma_A^2 > \sigma_B^2$. Since we are testing $\sigma_A^2 > \sigma_B^2$, we are in the upper tail of the F curve, therefore the rejection region is $F > 2.04$.



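Step 6 - Determine the test statistic (F-score) from the sample data:

$$F = \frac{s_A^2}{s_B^2} = \frac{5.83^2}{4.12^2} = 2.002$$

Step 7 - Compare the test statistic against its critical value: $2.002 < 2.04$, so the test statistic does not lie in the rejection region and $H_0$ cannot be rejected at the 0.05 level.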
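The comparison in Steps 6 and 7 can be sketched in Python as follows. The function name `f_ratio` is an assumption, and the critical value 2.04 is simply the tabulated $F_{30,20,0.05}$ quoted in Step 4, not something the code computes.

```python
def f_ratio(s_num, s_den):
    """Ratio of sample variances; the larger standard deviation should be s_num."""
    return s_num**2 / s_den**2

# Example 1: set A (s = 5.83, n = 31) against set B (s = 4.12, n = 21).
F = f_ratio(5.83, 4.12)
F_crit = 2.04            # tabulated F_{30,20,0.05}, taken from Step 4
reject_h0 = F > F_crit
print(f"F = {F:.3f}, reject H0: {reject_h0}")
```

The statistic sits just below the critical value, so the evidence that set A is less precise than set B is not significant at the 0.05 level.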



Example 2




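Step 3 - Determine level of significance and degrees of freedom.

We're told that the significance level is $\alpha = 0.05$.

$n_A = 10$, $\nu_A = 10 - 1 = 9$ (denominator)

$n_B = 6$, $\nu_B = 6 - 1 = 5$ (numerator, because B has the largest variance).

Step 4 - Determine the critical value of F

We have a 2-tailed test, so we need to find:

$$F_{\nu_B,\nu_A,\alpha/2} = F_{5,9,0.025} = 4.48$$

$$F_{\nu_B,\nu_A,1-\alpha/2} = F_{5,9,0.975} = \frac{1}{F_{9,5,0.025}} = \frac{1}{6.68} = 0.150$$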




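Step 6 - Determine the test statistic (F-score) from the sample data:

$$F = \frac{s_B^2}{s_A^2} = \frac{0.5^2}{0.42^2} = 1.42$$

Step 7 - Compare the test statistic against its critical values: $0.150 < 1.42 < 4.48$, so the test statistic lies between the two critical values, outside both rejection regions, and $H_0$ cannot be rejected at the 0.05 level.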
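A two-tailed variance-ratio test that uses only upper-tail table values can be sketched in Python as below. The function name `two_tailed_f_test` and its parameter names are assumptions for illustration; the table values 4.48 and 6.68 are the ones quoted in Step 4.

```python
def two_tailed_f_test(s_a, s_b, upper_crit, upper_crit_swapped):
    """Two-tailed variance-ratio test using upper-tail table values only.

    upper_crit         : F_{num,den,alpha/2}
    upper_crit_swapped : F_{den,num,alpha/2}; its reciprocal is the lower limit
    """
    s_large, s_small = max(s_a, s_b), min(s_a, s_b)
    F = s_large**2 / s_small**2            # larger variance in the numerator
    lower_crit = 1 / upper_crit_swapped
    reject = F < lower_crit or F > upper_crit
    return F, lower_crit, reject

# Example 2: s_A = 0.42 (n = 10), s_B = 0.5 (n = 6); table values 4.48 and 6.68.
F, lower_crit, reject_h0 = two_tailed_f_test(0.42, 0.5, 4.48, 6.68)
print(f"F = {F:.2f}, lower limit = {lower_crit:.3f}, reject H0: {reject_h0}")
```

Because the larger variance always goes in the numerator, F is at least 1 and the lower limit can never be the one that triggers rejection; it is kept here only to mirror the two critical values in Step 4.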



As for the t and $\chi^2$ distributions, determining P-values for the F distribution requires the use of a computer program.

On the internet, such a program can be found at:

davidmlane.com/hyperstat/F_table.html

Microsoft Excel has the function FDIST to work out P-values for the F distribution, where:

$$p(F > F_0) = \mathrm{FDIST}(F_0, \nu_N, \nu_D)$$

for some numerical value $F_0$


Example

Calculate the P-value for the following data: $s_A = 0.42$ from 10 measurements, and $s_B = 0.5$ from 6 measurements. That is, if the two precisions were in fact equal, what is the probability of obtaining an F-ratio at least this large?

$\nu_N = \nu_B = 5$

$\nu_D = \nu_A = 9$

$$F = \frac{s_B^2}{s_A^2} = \frac{0.5^2}{0.42^2} = 1.42$$

Using Excel (or the website shown above), we find:

$$p(F > 1.42) = \mathrm{FDIST}(1.42, 5, 9) = 0.305$$
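Without Excel, the same upper-tail probability can be approximated by integrating the F density numerically. The Python sketch below does this with a simple midpoint rule; the function names `f_pdf` and `f_upper_tail` and the step count are assumptions, and the approach is only meant for numerator degrees of freedom of 2 or more, where the density stays finite near zero.

```python
import math

def f_pdf(x, df_num, df_den):
    """Density of the F distribution for x > 0."""
    a, b = df_num / 2, df_den / 2
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    u = df_num * x / (df_num * x + df_den)
    return math.exp(log_norm + a * math.log(u) + b * math.log(1 - u)) / x

def f_upper_tail(f0, df_num, df_den, steps=20_000):
    """Approximate p(F > f0) as 1 minus a midpoint-rule integral over (0, f0)."""
    h = f0 / steps
    cdf = sum(f_pdf((i + 0.5) * h, df_num, df_den) for i in range(steps)) * h
    return 1 - cdf

p = f_upper_tail(1.42, 5, 9)
print(f"p(F > 1.42) = {p:.3f}")
```

The midpoint rule never evaluates the density at exactly zero, which keeps the logarithms well defined.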

