Test of Goodness of Fit

8/2/2019 Test of Goodness of Fit


Tests of Goodness of Fit

A goodness-of-fit test is an inferential procedure used to

determine whether a frequency distribution follows a

claimed distribution.

Goodness of fit refers to how close the observed data are to those predicted from a hypothesis.

Note:

The chi-square test does not prove that a hypothesis is correct. It evaluates the extent to which the data and the hypothesis agree.



PROCEDURE FOR CHI-SQUARE GOODNESS OF FIT TEST

1. Set up the hypotheses for the Chi-Square goodness of fit test:

Null hypothesis: In the Chi-Square goodness of fit test, the null hypothesis assumes that there is no significant difference between the observed and the expected values. In other words, the data follow a specified distribution.

Alternative hypothesis: In the Chi-Square goodness of fit test, the alternative hypothesis assumes that there is a significant difference between the observed and the expected values. In other words, the data do not follow a specified distribution.





3. Degrees of freedom: In the Chi-Square goodness of fit test, the degrees of freedom depend on the distribution being fitted. The following table shows each distribution, its constant c, and the associated degrees of freedom, where k is the number of categories:

Type of distribution                         c    Degrees of freedom
Binomial distribution (if p is estimated)    2    k - 2
Poisson distribution                         2    k - 2
Normal distribution                          3    k - 3





4. Hypothesis testing: Hypothesis testing in the Chi-Square goodness of fit test follows the same logic as in other tests, such as the Z-test and the t-test.

The calculated value of the Chi-Square goodness of fit statistic is compared with the table value corresponding to (k - c) degrees of freedom at the $\alpha$ level of significance.

If the calculated value is greater than or equal to the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies.

If the calculated value is less than the table value, we do not reject the null hypothesis and conclude that there is no significant difference between the observed and the expected frequencies.



p-value approach: Reject $H_0$ if p-value $\le \alpha$

Critical value approach: Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$

where $\alpha$ is the significance level and there are k - c degrees of freedom.



Example

In 200 flips of a coin, one would expect 100 heads and 100 tails. But what if 92 heads and 108 tails are observed? Would we reject the hypothesis that the coin is fair? Or would we attribute the difference between observed and expected frequencies to random fluctuation?

Null hypothesis: The frequency of heads is equal to the frequency of tails.

Alternative hypothesis: The frequency of heads is not equal to the frequency of tails.



The calculation of the $\chi^2$ statistic:

Face     O      E      O - E    (O - E)^2    (O - E)^2 / E
Heads    92     100     -8       64           0.64
Tails    108    100      8       64           0.64
Total    200    200      0                    $\chi^2$ = 1.28

Conclusion: The critical values of $\chi^2$ for 1 degree of freedom, with $\alpha$ = 0.05 and $\alpha$ = 0.01, are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is less than the table value at both the $\alpha$ = 0.05 and $\alpha$ = 0.01 levels of significance, we do not reject the null hypothesis. The data are consistent with a fair coin, that is, with the frequency of heads being equal to the frequency of tails.
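The arithmetic in the coin example can be checked with a few lines of Python. This is only a minimal sketch: the function name `chi_square_statistic` is a hypothetical helper, not something from the slides, and the critical values are simply the table values quoted above.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example from the slides: 92 heads and 108 tails in 200 flips.
observed = [92, 108]
expected = [100, 100]
statistic = chi_square_statistic(observed, expected)

# Critical values for 1 degree of freedom, as quoted in the slides.
critical_values = {0.05: 3.841, 0.01: 6.635}
decisions = {alpha: statistic >= cv for alpha, cv in critical_values.items()}

print(f"chi-square = {statistic:.2f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} H0")
```

The same helper applies to any goodness of fit table once the observed and expected frequencies are listed in the same category order.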



Example

The president of a major university hypothesizes that at least 90 percent of the teaching and research faculty will favor a new university policy on consulting with private and public agencies within the state. Thus, for a random sample of 200 faculty members, the president would expect 0.90 x 200 = 180 to favor the new policy and 0.10 x 200 = 20 to oppose it. Suppose, however, for this sample, 168 faculty members favor the new policy and 32 oppose it. Is the difference between observed and expected frequencies sufficient to reject the president's hypothesis that 90 percent would favor the policy? Or would the differences be attributed to chance fluctuation?

Null hypothesis: The proportion of faculty favouring the new policy is 90 percent.

Alternative hypothesis: The proportion of faculty favouring the new policy is not 90 percent.



The calculation of the $\chi^2$ statistic:

Disposition    O      E      O - E    (O - E)^2    (O - E)^2 / E
Favour         168    180    -12      144          0.80
Oppose         32     20      12      144          7.20
Total          200    200      0                   $\chi^2$ = 8.00

Conclusion: The critical values of $\chi^2$ for 1 degree of freedom, with $\alpha$ = 0.05 and $\alpha$ = 0.01, are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is greater than the table value at both the $\alpha$ = 0.05 and $\alpha$ = 0.01 levels of significance, we reject the null hypothesis. The proportion of faculty favouring the new policy is not 90 percent.
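The slides compare the statistic with table values. As a complementary sketch, the p-value for the special case of 1 degree of freedom can be computed with the standard library alone, because a chi-square variable with 1 d.f. is the square of a standard normal variable. The function name `chi_square_p_value_df1` is hypothetical, and the shortcut does not apply to other degrees of freedom.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail area of the chi-square distribution with 1 degree of freedom.

    For 1 d.f., P(X >= s) = P(|Z| >= sqrt(s)) = erfc(sqrt(s / 2)),
    where Z is a standard normal variable.
    """
    if statistic < 0:
        raise ValueError("the statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


faculty_p = chi_square_p_value_df1(8.00)  # faculty example
coin_p = chi_square_p_value_df1(1.28)     # coin example

print(f"faculty example: p-value = {faculty_p:.4f}")
print(f"coin example:    p-value = {coin_p:.4f}")
```

A p-value below 0.01 for the faculty example and above 0.05 for the coin example agrees with the critical value conclusions in the slides.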



Poisson Distribution

1. Set up the null and alternative hypotheses.
   $H_0$: Population has a Poisson probability distribution
   $H_a$: Population does not have a Poisson distribution

2. Select a random sample and
   a. Record the observed frequency $f_i$ for each value of the Poisson random variable.
   b. Compute the mean number of occurrences.

3. Compute the expected frequency of occurrences $e_i$ for each value of the Poisson random variable.




4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where:

$f_i$ = observed frequency for category i
$e_i$ = expected frequency for category i
k = number of categories



5. Rejection rule:

p-value approach: Reject $H_0$ if p-value $\le \alpha$

Critical value approach: Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$

where $\alpha$ is the significance level and there are k - 2 degrees of freedom.




Example: Troy Parking Garage

In studying the need for an additional entrance to a city parking garage, a consultant has recommended an analysis approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution.




A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.

# Arrivals    0    1    2    3    4    5    6    7    8    9   10   11   12
Frequency     0    1    4   10   14   20   12   12    9    8    6    3    1




Hypotheses

$H_0$: Number of cars entering the garage during a one-minute interval is Poisson distributed

$H_1$: Number of cars entering the garage during a one-minute interval is not Poisson distributed




Estimate of Poisson Probability Function

Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600

Total Time Periods = 100

Estimate of the mean = 600/100 = 6

Hence,

$$f(x) = \frac{6^x e^{-6}}{x!}$$




Expected Frequencies

x             f(x)      nf(x)
0             .0025      0.25
1             .0149      1.49
2             .0446      4.46
3             .0892      8.92
4             .1339     13.39
5             .1606     16.06
6             .1606     16.06
7             .1377     13.77
8             .1033     10.33
9             .0688      6.88
10            .0413      4.13
11            .0225      2.25
12 or more    .0201      2.01
Total        1.0000    100.00

The last row collects the probability of 12 or more arrivals, which is why the f(x) column sums to 1.
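The expected frequencies can be reproduced from the sample itself. The sketch below assumes the last row of the table stands for 12 or more arrivals, which is the reading that makes the probabilities sum to 1. The names `arrivals` and `poisson_pmf` are illustrative, not from the slides.

```python
import math

# Observed counts from the slides: arrivals[x] intervals had x cars entering.
arrivals = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]
n = sum(arrivals)                                        # 100 intervals
mean = sum(x * f for x, f in enumerate(arrivals)) / n    # 600 / 100


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * exp(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Probabilities for x = 0..11; the remaining upper-tail mass is assigned to
# the last row (12 or more) so that the column sums to 1.
probabilities = [poisson_pmf(x, mean) for x in range(12)]
probabilities.append(1.0 - sum(probabilities))
expected = [n * p for p in probabilities]

for x, (p, e) in enumerate(zip(probabilities, expected)):
    label = "12+" if x == 12 else str(x)
    print(f"{label:>3}  {p:.4f}  {e:6.2f}")
```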




Observed and Expected Frequencies

i              f_i     e_i      f_i - e_i
0 or 1 or 2     5       6.20     -1.20
3              10       8.92      1.08
4              14      13.39      0.61
5              20      16.06      3.94
6              12      16.06     -4.06
7              12      13.77     -1.77
8               9      10.33     -1.33
9               8       6.88      1.12
10 or more     10       8.39      1.61




Test Statistic

$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.268$$

Rejection Rule

With $\alpha$ = .05 and k - p - 1 = 9 - 1 - 1 = 7 d.f. (where k = number of categories and p = number of population parameters estimated), $\chi^2_{.05}$ = 14.067.

Reject $H_0$ if p-value $\le$ .05 or $\chi^2 \ge$ 14.067.
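The grouping into nine categories and the resulting statistic can be sketched as follows. The code uses unrounded Poisson probabilities, so the expected frequencies differ slightly from the two-decimal table entries. The critical value 14.067 is the table value quoted in the slides, and all variable names are illustrative.

```python
import math

arrivals = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]  # counts for x = 0..12
n = sum(arrivals)
mu = sum(x * f for x, f in enumerate(arrivals)) / n      # estimated mean, 6.0


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * exp(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Nine categories as in the slides: {0, 1, 2}, 3, 4, ..., 9, {10 or more}.
observed = [sum(arrivals[:3])] + arrivals[3:10] + [sum(arrivals[10:])]
low = sum(poisson_pmf(x, mu) for x in range(3))
middle = [poisson_pmf(x, mu) for x in range(3, 10)]
high = 1.0 - low - sum(middle)
expected = [n * p for p in [low] + middle + [high]]

statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1 - 1   # k - p - 1 with p = 1 (the mean)
critical_value = 14.067                      # table value for alpha = .05, 7 d.f.
reject_h0 = statistic >= critical_value

print(f"chi-square = {statistic:.3f} with {degrees_of_freedom} d.f.")
print("reject H0" if reject_h0 else "do not reject H0")
```

Merging the sparse tail categories keeps every expected frequency at 5 or more, which is the same requirement the slides state for the normal case.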




Conclusion Using the p-Value Approach

Area in Upper Tail           0.90     0.10      0.05      0.025     0.01
$\chi^2$ Value (df = 7)      2.833    12.017    14.067    16.013    18.475

Because $\chi^2$ = 3.268 is between 2.833 and 12.017 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between 0.90 and 0.10.

The p-value > $\alpha$. We cannot reject the null hypothesis. There is no reason to doubt the assumption of a Poisson distribution.




Normal Distribution

1. Set up the null and alternative hypotheses.

2. Select a random sample and
   a. Compute the mean and standard deviation.
   b. Define intervals of values so that the expected frequency is at least 5 for each interval.
   c. For each interval, record the observed frequencies.

3. Compute the expected frequency, $e_i$, for each interval.



4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject $H_0$ if $\chi^2 \ge \chi^2_\alpha$ (where $\alpha$ is the significance level and there are k - 3 degrees of freedom).





Example: IQ Computers

IQ Computers manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine, at a .05 significance level, if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.



A simple random sample of 30 of the salespeople was taken and their numbers of units sold are below.

33   43   44   45   52   52   56   58   63   64
64   65   66   68   70   72   73   73   74   75
83   84   85   86   91   92   94   98  102  105

(mean = 71, standard deviation = 18.54)
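The quoted mean and standard deviation can be checked directly from the 30 values. This short sketch assumes the sample standard deviation (divisor n - 1), which is what reproduces 18.54.

```python
import statistics

# Units sold by the 30 sampled salespeople, as listed in the slides.
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]

sample_mean = statistics.mean(units_sold)
sample_sd = statistics.stdev(units_sold)  # sample standard deviation (n - 1)

print(f"mean = {sample_mean}, standard deviation = {sample_sd:.2f}")
```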




Hypotheses

$H_0$: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.

$H_1$: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.




Interval Definition

The normal curve with mean 71 and standard deviation 18.54 is divided into six intervals of equal area:

Areas = 1.00/6 = 0.1667

Interval boundaries: 53.02, 63.03, 71, 78.97, 88.98

For example, 63.03 = 71 - 0.43(18.54) and 88.98 = 71 + 0.97(18.54).
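The boundaries of the six equal-area intervals can also be obtained from the inverse normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library. Because the slides round the z values to two decimals (0.43 and 0.97), the computed boundaries differ from the slide values by a few hundredths.

```python
from statistics import NormalDist

# Normal distribution fitted in the slides: mean 71, standard deviation 18.54.
fitted = NormalDist(mu=71, sigma=18.54)
number_of_intervals = 6

# Cut points at cumulative areas 1/6, 2/6, ..., 5/6.
boundaries = [
    fitted.inv_cdf(i / number_of_intervals)
    for i in range(1, number_of_intervals)
]

print([round(b, 2) for b in boundaries])
```

With 30 observations and six intervals of area 1/6, each interval has an expected frequency of 30/6 = 5.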




Observed and Expected Frequencies

i                    f_i    e_i    f_i - e_i
Less than 53.02       6      5       1
53.02 to 63.03        3      5      -2
63.03 to 71.00        6      5       1
71.00 to 78.97        5      5       0
78.97 to 88.98        4      5      -1
More than 88.98       6      5       1
Total                30     30



Test Statistic

$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$$

Rejection Rule

With $\alpha$ = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f. (where k = number of categories and p = number of population parameters estimated), $\chi^2_{.05}$ = 7.815.

Reject $H_0$ if p-value $\le$ .05 or $\chi^2 \ge$ 7.815.
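As a check on the table and the statistic, the sample can be binned with the slide's interval limits and the test carried out in a few lines. This is a sketch under the slide's own numbers: the limits and the critical value 7.815 are taken from the slides, and the variable names are illustrative.

```python
from bisect import bisect_right

units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]  # interval limits from the slides

# Count how many observations fall into each of the six intervals.
observed = [0] * (len(boundaries) + 1)
for value in units_sold:
    observed[bisect_right(boundaries, value)] += 1

# Equal-area intervals give the same expected frequency everywhere: 30 / 6 = 5.
expected = [len(units_sold) / len(observed)] * len(observed)

statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 2 - 1   # k - p - 1, p = 2 (mean and sd)
critical_value = 7.815                       # table value for alpha = .05, 3 d.f.
reject_h0 = statistic >= critical_value

print(observed, f"chi-square = {statistic:.3f}", f"d.f. = {degrees_of_freedom}")
print("reject H0" if reject_h0 else "do not reject H0")
```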




Conclusion Using the p-Value Approach

Area in Upper Tail           .90     .10      .05      .025     .01
$\chi^2$ Value (df = 3)      .584    6.251    7.815    9.348    11.345

Because $\chi^2$ = 1.600 is between 0.584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between 0.90 and 0.10.

The p-value > $\alpha$. We cannot reject the null hypothesis. There is little evidence to support rejecting the assumption that the population is normally distributed with $\mu$ = 71 and $\sigma$ = 18.54.




CHI-SQUARE TEST FOR INDEPENDENCE OF ATTRIBUTES

CONTINGENCY TABLES

A frequency table in which a sample is classified according to the distinct classes of two different attributes is called a contingency table.

It is often of interest to test the hypothesis that, in the population from which the sample was drawn, the two attributes are independent.

An m x n contingency table has m rows and n columns.



A typical m x n contingency table

                      Columns (Attribute 1)
Rows (Attribute 2)    1      2      ...    j      ...    n      Total
1                     O11    O12    ...    O1j    ...    O1n    R1
2                     O21    O22    ...    O2j    ...    O2n    R2
...                   ...    ...           ...           ...    ...
i                     Oi1    Oi2    ...    Oij    ...    Oin    Ri
...                   ...    ...           ...           ...    ...
m                     Om1    Om2    ...    Omj    ...    Omn    Rm



The test statistic to test the above hypothesis is:

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

or simply

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

for ease of understanding.

This statistic has a chi-square distribution with (m-1)(n-1) degrees of freedom.

The decision is to reject the null hypothesis $H_0$ if the calculated value of $\chi^2$ is greater than the table value of $\chi^2$ at the $\alpha$ level of significance corresponding to (m-1)(n-1) degrees of freedom.



EXAMPLE

The following data were collected in a study on the effectiveness of inoculation for a particular disease. The two attributes in this case are:

Attribute A: whether or not the person was inoculated; and

Attribute B: whether or not they contracted the disease.

The 2x2 contingency table is

                   Attribute B
Attribute A        Disease    No disease
Inoculated         10         50
Not Inoculated     30         40



In this case the null hypothesis and alternative hypothesis are stated as,

$H_0$: Contracting the disease is independent of inoculation

$H_1$: Contracting the disease is not independent of inoculation

Expected Frequencies

Expected frequencies    Disease    No disease    Total
Inoculated              18.5       41.5          60
Not Inoculated          21.5       48.5          70
Total                   40         90            130



The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10 - 18.5)^2}{18.5} + \frac{(50 - 41.5)^2}{41.5} + \frac{(30 - 21.5)^2}{21.5} + \frac{(40 - 48.5)^2}{48.5} = 10.5$$

The critical value for a 1% significance level with 1 d.f. is 6.63. Because 10.5 > 6.63, the null hypothesis is rejected at this level, and it can be concluded that inoculation does have an effect on the probability of contracting the disease. From the contingency table it can be seen that inoculation reduces the risk.
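The whole calculation for a contingency table can be sketched in Python. The sketch assumes the usual rule that each expected frequency is (row total x column total) / grand total, which reproduces the expected table above after rounding to one decimal. The function name `independence_test_statistic` is hypothetical. Because the code keeps the expected frequencies unrounded, it gives a statistic of about 10.40 rather than the 10.5 obtained from the rounded values, and the conclusion is unchanged.

```python
def independence_test_statistic(table):
    """Return (expected, statistic, degrees of freedom) for an m x n table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Assumed rule: E_ij = (row total i) * (column total j) / grand total.
    expected = [
        [r * c / grand_total for c in column_totals]
        for r in row_totals
    ]
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return expected, statistic, degrees_of_freedom


# Inoculation example: rows = inoculated / not inoculated,
# columns = disease / no disease.
observed = [[10, 50], [30, 40]]
expected, statistic, dof = independence_test_statistic(observed)

critical_value = 6.63  # 1% level, 1 d.f., as quoted in the slides
print([[round(e, 1) for e in row] for row in expected])
print(f"chi-square = {statistic:.2f} with {dof} d.f.")
print("reject H0" if statistic > critical_value else "do not reject H0")
```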
