Test of Goodness of Fit
Uploaded by madanish-kanna
8/2/2019 Test of Goodness of Fit
1/38
Tests of Goodness of Fit
A goodness-of-fit test is an inferential procedure used to
determine whether a frequency distribution follows a
claimed distribution.
Goodness of fit refers to how close the observed data are
to those predicted from a hypothesis.
Note:
The chi-square test does not prove that a hypothesis is
correct. It evaluates to what extent the data and the
hypothesis have a good fit.

PROCEDURE FOR CHI-SQUARE GOODNESS OF FIT TEST
1. Set up the hypotheses for the chi-square goodness of fit test:
Null hypothesis: In the chi-square goodness of fit test, the null hypothesis assumes that there is no significant difference between the observed and the expected values. In other words, the data follow a specified distribution.
Alternative hypothesis: In the chi-square goodness of fit test, the alternative hypothesis assumes that there is a significant difference between the observed and the expected values. In other words, the data do not follow a specified distribution.
3. Degrees of freedom: In the chi-square goodness of fit test, the degrees of freedom depend on the distribution being fitted. The following table shows each distribution and its associated degrees of freedom, where n is the number of categories and c is the number of constraints (the number of estimated parameters plus one):

Type of distribution                         c    Degrees of freedom
Binomial distribution (if p is estimated)    2    n - 2
Poisson distribution                         2    n - 2
Normal distribution                          3    n - 3
4. Hypothesis testing: Hypothesis testing in the chi-square goodness of fit test follows the same logic as in other tests, such as the z-test and the t-test.
The calculated value of the chi-square statistic is compared with the table value corresponding to (k - c) degrees of freedom at the chosen level of significance $\alpha$.
If the calculated value is greater than or equal to the table value, we reject the null hypothesis and conclude that there is a significant difference between the observed and the expected frequencies.
If the calculated value is less than the table value, we do not reject the null hypothesis and conclude that there is no significant difference between the observed and the expected frequencies.

Rejection rule:
p-value approach: Reject H0 if p-value $\le \alpha$
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
where $\alpha$ is the significance level and there are k - c degrees of freedom.

Example
In 200 flips of a coin, one would expect 100 heads and 100 tails. But what if 92 heads and 108 tails are observed? Would we reject the hypothesis that the coin is fair? Or would we attribute the difference between observed and expected frequencies to random fluctuation?
Null hypothesis: The frequency of heads is equal to the frequency of tails.
Alternative hypothesis: The frequency of heads is not equal to the frequency of tails.

The calculation of the $\chi^2$ statistic:

Face     O     E     O-E    (O-E)^2    (O-E)^2/E
Heads    92    100   -8     64         0.64
Tails    108   100    8     64         0.64
Total    200   200    0                $\chi^2$ = 1.28

Conclusion: The critical values of $\chi^2$ for 1 degree of freedom, with $\alpha$ = 0.05 and $\alpha$ = 0.01, are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is less than the table value at both the $\alpha$ = 0.05 and $\alpha$ = 0.01 levels of significance, we do not reject the null hypothesis. The data are consistent with a fair coin, that is, with the frequency of heads being equal to the frequency of tails.
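The coin calculation can be reproduced with a few lines of Python. This is only a sketch: the function name `chi_square_statistic` and the variable names are illustrative choices, not part of the original slides, and the critical values are simply the table values quoted in the example.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Coin example: 92 heads and 108 tails against 100 expected of each.
coin_chi2 = chi_square_statistic([92, 108], [100, 100])

# Table values for 1 degree of freedom, as quoted in the example.
CRITICAL_05 = 3.841
CRITICAL_01 = 6.635

# Critical value approach: reject H0 when the statistic reaches the table value.
reject_at_05 = coin_chi2 >= CRITICAL_05
reject_at_01 = coin_chi2 >= CRITICAL_01
print(round(coin_chi2, 2), reject_at_05, reject_at_01)
```

The same function applies to any goodness of fit table once the expected frequencies are known.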

Example
The president of a major university hypothesizes that at least 90 percent of the teaching and research faculty will favor a new university policy on consulting with private and public agencies within the state. Thus, for a random sample of 200 faculty members, the president would expect 0.90 x 200 = 180 to favor the new policy and 0.10 x 200 = 20 to oppose it. Suppose, however, for this sample, 168 faculty members favor the new policy and 32 oppose it. Is the difference between observed and expected frequencies sufficient to reject the president's hypothesis that 90 percent would favor the policy? Or would the differences be attributed to chance fluctuation?
Null hypothesis: The faculty favouring the new policy is 90 percent.
Alternative hypothesis: The faculty favouring the new policy is not 90 percent.

The calculation of the $\chi^2$ statistic:

Disposition   O     E     O-E    (O-E)^2    (O-E)^2/E
Favour        168   180   -12    144        0.80
Oppose        32    20     12    144        7.20
Total         200   200     0               $\chi^2$ = 8.00

Conclusion: The critical values of $\chi^2$ for 1 degree of freedom, with $\alpha$ = 0.05 and $\alpha$ = 0.01, are 3.841 and 6.635, respectively. As the calculated value of $\chi^2$ is greater than the table value at both the $\alpha$ = 0.05 and $\alpha$ = 0.01 levels of significance, we reject the null hypothesis. The faculty favouring the new policy is not 90 percent.
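The same example can also be checked with the p-value approach. The sketch below relies on a standard fact that the slides do not state: with 1 degree of freedom, the chi-square variable is the square of a standard normal variable, so the upper-tail area equals erfc(sqrt(chi2 / 2)). The helper name `p_value_one_df` is hypothetical and the shortcut is valid only for 1 degree of freedom.

```python
import math


def p_value_one_df(chi2):
    """Upper-tail area of the chi-square distribution with 1 degree of freedom.

    For 1 d.f. the chi-square variable is a squared standard normal,
    so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# Faculty example: 168 favour and 32 oppose, against 180 and 20 expected.
faculty_chi2 = (168 - 180) ** 2 / 180 + (32 - 20) ** 2 / 20   # 0.80 + 7.20
faculty_p = p_value_one_df(faculty_chi2)

# p-value approach: reject H0 when the p-value does not exceed alpha.
reject_at_01 = faculty_p <= 0.01
print(round(faculty_chi2, 2), round(faculty_p, 4), reject_at_01)
```

A p-value below 0.01 agrees with the critical value comparison in the conclusion.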

Poisson Distribution
1. Set up the null and alternative hypotheses.
   H0: Population has a Poisson probability distribution
   Ha: Population does not have a Poisson distribution
2. Select a random sample and
   a. Record the observed frequency $f_i$ for each value of the Poisson random variable.
   b. Compute the mean number of occurrences.
3. Compute the expected frequency of occurrences $e_i$ for each value of the Poisson random variable.
4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where:
$f_i$ = observed frequency for category $i$
$e_i$ = expected frequency for category $i$
$k$ = number of categories
5. Rejection rule:
p-value approach: Reject H0 if p-value $\le \alpha$
Critical value approach: Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
where $\alpha$ is the significance level and there are $k - 2$ degrees of freedom.

Example: Troy Parking Garage
In studying the need for an additional entrance to a city parking garage, a consultant has recommended an analysis approach that is applicable only in situations where the number of cars entering during a specified time period follows a Poisson distribution.
A random sample of 100 one-minute time intervals resulted in the customer arrivals listed below. A statistical test must be conducted to see if the assumption of a Poisson distribution is reasonable.

# Arrivals   0   1   2   3   4   5   6   7   8   9  10  11  12
Frequency    0   1   4  10  14  20  12  12   9   8   6   3   1

Hypotheses
H0: Number of cars entering the garage during a one-minute interval is Poisson distributed
H1: Number of cars entering the garage during a one-minute interval is not Poisson distributed

Estimate of Poisson Probability Function
Total Arrivals = 0(0) + 1(1) + 2(4) + . . . + 12(1) = 600
Total Time Periods = 100
Estimate of the mean = 600/100 = 6
Hence,

$$f(x) = \frac{6^x e^{-6}}{x!}$$

Expected Frequencies

x             f(x)      nf(x)
0             .0025      0.25
1             .0149      1.49
2             .0446      4.46
3             .0892      8.92
4             .1339     13.39
5             .1606     16.06
6             .1606     16.06
7             .1377     13.77
8             .1033     10.33
9             .0688      6.88
10            .0413      4.13
11            .0225      2.25
12 or more    .0201      2.01
Total        1.0000    100.00
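The estimate of the mean and the expected frequencies can be reproduced as follows. This is a sketch, not the method used to prepare the slides: the variable names are illustrative, and the last row is treated as "12 or more" (an assumption made here because the tabulated probabilities sum to 1.0000).

```python
import math

arrivals = list(range(13))                               # 0, 1, ..., 12
frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]

n = sum(frequency)                                       # one-minute intervals
mean = sum(x * f for x, f in zip(arrivals, frequency)) / n


def poisson_pmf(x, mu):
    """Poisson probability f(x) = mu^x * e^(-mu) / x!."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# Probabilities for x = 0..11; the last row collects x >= 12 so that
# the probabilities sum to 1.
probs = [poisson_pmf(x, mean) for x in range(12)]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
for x, p, e in zip(arrivals, probs, expected):
    label = str(x) if x < 12 else "12 or more"
    print(f"{label:>10}  {p:.4f}  {e:6.2f}")
```

Several of the resulting expected frequencies are below 5, which is why neighbouring categories are combined before the test statistic is computed.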

Observed and Expected Frequencies

i              f_i     e_i     f_i - e_i
0 or 1 or 2      5     6.20     -1.20
3               10     8.92      1.08
4               14    13.39      0.61
5               20    16.06      3.94
6               12    16.06     -4.06
7               12    13.77     -1.77
8                9    10.33     -1.33
9                8     6.88      1.12
10 or more      10     8.39      1.61

Test Statistic

$$\chi^2 = \frac{(-1.20)^2}{6.20} + \frac{(1.08)^2}{8.92} + \cdots + \frac{(1.61)^2}{8.39} = 3.268$$

Rejection Rule
With $\alpha$ = .05 and $k - p - 1$ = 9 - 1 - 1 = 7 d.f. (where $k$ = number of categories and $p$ = number of population parameters estimated), $\chi^2_{.05}$ = 14.067.
Reject H0 if p-value $\le$ .05 or $\chi^2 \ge$ 14.067.
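The whole test for the parking garage data can be sketched in Python. The grouping into nine categories follows the observed and expected frequencies table. The variable names are illustrative, and the expected frequencies are left unrounded, so the statistic may differ from a hand calculation in the third decimal place.

```python
import math

frequency = [0, 1, 4, 10, 14, 20, 12, 12, 9, 8, 6, 3, 1]   # arrivals 0..12
n = sum(frequency)
mu = sum(x * f for x, f in enumerate(frequency)) / n        # estimated mean

# Categories as in the table: "0 or 1 or 2", 3, 4, ..., 9, "10 or more".
observed = [sum(frequency[:3])] + frequency[3:10] + [sum(frequency[10:])]


def pmf(x):
    """Poisson probability with the estimated mean."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


p_low = sum(pmf(x) for x in range(3))                       # P(X <= 2)
p_mid = [pmf(x) for x in range(3, 10)]                      # P(X = 3), ..., P(X = 9)
p_high = 1.0 - p_low - sum(p_mid)                           # P(X >= 10)
expected = [n * p for p in [p_low] + p_mid + [p_high]]

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1      # k - p - 1 with p = 1 estimated parameter
reject = chi2 >= 14.067         # table value quoted for alpha = .05, 7 d.f.
print(round(chi2, 3), df, reject)
```

Combining the tail categories keeps every expected frequency at 5 or more, which is the usual condition for the chi-square approximation.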

Conclusion Using the p-Value Approach

Area in Upper Tail          0.90     0.10     0.05     0.025    0.01
$\chi^2$ Value (df = 7)     2.833   12.017   14.067   16.013   18.475

Because $\chi^2$ = 3.268 is between 2.833 and 12.017 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between 0.90 and 0.10.
The p-value > $\alpha$. We cannot reject the null hypothesis. There is no reason to doubt the assumption of a Poisson distribution.

Normal Distribution
1. Set up the null and alternative hypotheses.
2. Select a random sample and
   a. Compute the mean and standard deviation.
   b. Define intervals of values so that the expected frequency is at least 5 for each interval.
   c. For each interval record the observed frequencies.
3. Compute the expected frequency, $e_i$, for each interval.
4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject H0 if $\chi^2 \ge \chi^2_{\alpha}$ (where $\alpha$ is the significance level and there are $k - 3$ degrees of freedom).

Example: IQ Computers
IQ Computers manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine, at a .05 significance level, if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.
A simple random sample of 30 of the salespeople was taken and their numbers of units sold are below.

33  43  44  45  52  52  56  58   63   64
64  65  66  68  70  72  73  73   74   75
83  84  85  86  91  92  94  98  102  105

(mean = 71, standard deviation = 18.54)

Hypotheses
H0: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.
H1: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.

Interval Definition
Areas = 1.00/6 = 0.1667
Interval boundaries:
53.02 = 71 - 0.97(18.54)
63.03 = 71 - 0.43(18.54)
71
78.97 = 71 + 0.43(18.54)
88.98 = 71 + 0.97(18.54)
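The interval boundaries are the points that cut the fitted normal curve into six areas of 1/6 each. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) is shown below. It uses unrounded z values, so the boundaries differ slightly from the ones above, which come from z values rounded to 0.43 and 0.97.

```python
from statistics import NormalDist

mean, sd = 71, 18.54
n_intervals = 6          # 30 observations / 6 intervals = 5 expected in each

fitted = NormalDist(mu=mean, sigma=sd)

# Boundaries are the 1/6, 2/6, ..., 5/6 quantiles of the fitted normal.
boundaries = [fitted.inv_cdf(i / n_intervals) for i in range(1, n_intervals)]
print([round(b, 2) for b in boundaries])
```

With equal-probability intervals, every expected frequency is the same (here 30/6 = 5), which meets the minimum of 5 per interval.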

Observed and Expected Frequencies

i                   f_i    e_i    f_i - e_i
Less than 53.02       6      5       1
53.02 to 63.03        3      5      -2
63.03 to 71.00        6      5       1
71.00 to 78.97        5      5       0
78.97 to 88.98        4      5      -1
More than 88.98       6      5       1
Total                30     30

Test Statistic

$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$$

Rejection Rule
With $\alpha$ = .05 and $k - p - 1$ = 6 - 2 - 1 = 3 d.f. (where $k$ = number of categories and $p$ = number of population parameters estimated), $\chi^2_{.05}$ = 7.815.
Reject H0 if p-value $\le$ .05 or $\chi^2 \ge$ 7.815.
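The observed frequencies and the test statistic for the IQ Computers data can be checked with a short script. This is a sketch with illustrative variable names. The boundaries and the critical value are the ones quoted in the example, and `bisect_right` is used here only as a convenient way to find the interval each observation falls in.

```python
from bisect import bisect_right

units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]   # from the interval definition

# Count how many observations fall in each of the six intervals.
observed = [0] * (len(boundaries) + 1)
for value in units_sold:
    observed[bisect_right(boundaries, value)] += 1

# Equal-probability intervals give the same expected frequency everywhere.
expected = [len(units_sold) / len(observed)] * len(observed)

chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 2 - 1      # k - p - 1, with mean and sd estimated
reject = chi2 >= 7.815          # table value quoted for alpha = .05, 3 d.f.
print(observed, round(chi2, 3), df, reject)
```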

Conclusion Using the p-Value Approach

Area in Upper Tail          .90     .10     .05     .025     .01
$\chi^2$ Value (df = 3)     .584   6.251   7.815   9.348   11.345

Because $\chi^2$ = 1.600 is between .584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10.
The p-value > $\alpha$. We cannot reject the null hypothesis. There is little evidence to support rejecting the assumption that the population is normally distributed with $\mu$ = 71 and $\sigma$ = 18.54.
8/2/2019 Test of Goodness of Fit
32/38
CONTINGENCY TABLES
A frequency table in which a sample is classifiedaccording to the distinct classes of two different
attributes is called a contingency table.
It is often of interest to test the hypothesis that, inthe population from which the sample was drawn,the two attributes are independent.
An mxn contingency table has m rows andn columns.
CHI-SQUARE TEST FORCHI-SQUARE TEST FOR
INDEPENDENCE OF ATTRIBUTESINDEPENDENCE OF ATTRIBUTES

A typical m x n contingency table

Rows             Columns (Attribute 1)
(Attribute 2)    1      2      ...    j      ...    n      Total
1                O11    O12    ...    O1j    ...    O1n    R1
2                O21    O22    ...    O2j    ...    O2n    R2
.                .      .             .             .      .
i                Oi1    Oi2    ...    Oij    ...    Oin    Ri
.                .      .             .             .      .
m                Om1    Om2    ...    Omj    ...    Omn    Rm
The test statistic to test the above hypothesis is:

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

or simply, for ease of understanding,

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

This statistic has a chi-square distribution with (m-1)(n-1) degrees of freedom.
The decision is to reject the null hypothesis H0 if the calculated value of $\chi^2$ is greater than the table value of $\chi^2$ at the $\alpha$ level of significance corresponding to (m-1)(n-1) degrees of freedom.

EXAMPLE
The following data were collected in a study on the effectiveness of inoculation for a particular disease. The two attributes in this case are:
Attribute A: whether or not the person was inoculated; and
Attribute B: whether or not they contracted the disease.
The 2x2 contingency table is

Attribute A         Attribute B
                    Disease    No disease
Inoculated          10         50
Not Inoculated      30         40

In this case the null hypothesis and alternative hypothesis are stated as:
H0: Contracting the disease is independent of inoculation
H1: Contracting the disease is not independent of inoculation

Expected Frequencies

Expected frequencies    Disease    No disease    Total
Inoculated              18.5       41.5          60
Not Inoculated          21.5       48.5          70
Total                   40         90            130

The test statistic is

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10 - 18.5)^2}{18.5} + \frac{(50 - 41.5)^2}{41.5} + \frac{(30 - 21.5)^2}{21.5} + \frac{(40 - 48.5)^2}{48.5} = 10.5$$

The critical value for a 1% significance level with 1 d.f. is 6.63. The null hypothesis is therefore rejected at this level and it can be concluded that inoculation does have an effect on the probability of contracting the disease. From the contingency table it can be seen that inoculation reduces the risk.
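The inoculation example can be reproduced in Python. The sketch assumes the usual rule for expected frequencies under independence, (row total x column total) / grand total, which is consistent with the expected frequencies table but is not spelled out in the transcript. Because the expected frequencies are left unrounded here, the statistic comes out near 10.4 rather than the 10.5 obtained from the rounded values; the conclusion is the same.

```python
observed = [
    [10, 50],   # inoculated: disease, no disease
    [30, 40],   # not inoculated: disease, no disease
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: (row total * column total) / N.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (m-1)(n-1)
reject_at_01 = chi2 > 6.63      # table value quoted for 1 d.f. at the 1% level
print([[round(e, 1) for e in row] for row in expected], round(chi2, 2), df)
```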