INDE 2333
ENGINEERING STATISTICS I
GOODNESS OF FIT
University of Houston
Dept. of Industrial Engineering
Houston, TX 77204-4812
(713) 743-4195
GOODNESS OF FIT TESTS
Used to determine whether a sample could have come from a distribution with the specified parameters
Commonly used to determine whether data are normally distributed
  Many tests, such as the ones that we have been using, require normally distributed data
  If data are not normally distributed, non-parametric tests must be used (next subject in the course)
Also used for input distributions in system modeling
  Are the times between customer or job arrivals exponentially distributed?
  Service times follow what distribution?
  Failures occur according to what distribution?
GOODNESS OF FIT TESTS
Based on a comparison between
  Observed data
  Theoretical data
The comparison uses a set of intervals, or cells
  Each cell has a lower and an upper boundary value
  The boundaries are a function of
    Theoretical distribution
    Number of observations in the sample
Two different approaches…
TWO DIFFERENT APPROACHES
Approach 1
  Used in the book
  Equal interval approach
  No cell or group of combined cells can have fewer than 5 expected observations
Approach 2
  Used in other books
  Equiprobable approach
  Number of cells = Int(obs / 5), not to exceed 100, so that the expected number of observations in each cell is at least 5
  Expected number of obs in each cell = obs / cells
  More statistically robust
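The cell-count rule for the equiprobable approach can be sketched in a few lines. This is a minimal illustration, not part of the original slides; the function name and its parameter names are hypothetical.

```python
def equiprobable_cells(n_obs, min_expected=5, max_cells=100):
    """Return (number of cells, expected observations per cell).

    Hypothetical helper: cells = Int(obs / 5), capped at 100,
    and every cell gets the same expected count obs / cells.
    """
    cells = min(max_cells, n_obs // min_expected)
    if cells < 1:
        raise ValueError("too few observations for even one cell")
    return cells, n_obs / cells


# A sample of 43 observations gives 8 cells of 5.375 expected observations each.
cells, expected = equiprobable_cells(43)
print(cells, expected)
```

Integer division implements Int(obs / 5), so the expected count per cell can never drop below 5.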
HYPOTHESIS TEST PROCEDURE
Identify Ho and Ha
Determine the level of significance (generally 0.05 or 0.01)
Determine the “critical value” criterion from the level of significance
Calculate the “test statistic”
Make a decision
  Fail to reject Ho
  Reject Ho
HYPOTHESES
Ho
  The sample could have come from a distribution with the specified parameters
Ha
  The sample could not have come from a distribution with the specified parameters
CRITICAL VALUE
Chi-square distribution chart
One sided test
Alpha typically 0.05
Degrees of freedom
  # of cells - # of parameters estimated from the sample - 1
  The -1 is always used due to the known sample size n
  Note: if the parameters are specified rather than estimated from the sample, they do not reduce the number of degrees of freedom in the above equation
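The chart lookup can also be done numerically. The sketch below uses only the Python standard library: it evaluates the chi-square CDF with the series for the regularized incomplete gamma function and finds the critical value by bisection. All function names are hypothetical, and this is one possible approach rather than the method used in the course; a statistics library routine such as scipy.stats.chi2.ppf gives the same lookup.

```python
import math


def degrees_of_freedom(cells, estimated_params):
    """# of cells - # of parameters estimated from the sample - 1."""
    return cells - estimated_params - 1


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_critical(alpha, df):
    """Value with right-tail probability alpha, found by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:  # widen until the target is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# 8 cells with 2 parameters estimated from the sample, alpha = 0.05
df = degrees_of_freedom(8, 2)
print(df, round(chi2_critical(0.05, df), 3))  # critical value is approximately 11.070
```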
CHI-SQUARE for a particular number of degrees of freedom
[Figure: chi-square density f(X^2) plotted against X^2, starting at 0. The critical value marks the start of the right tail, whose probability is alpha, typically 0.05.]
DECISION
Cannot reject
  Test statistic is less than the critical value
  Sample could have come from a distribution with the specified parameters
Reject
  Test statistic is greater than the critical value
  Sample could not have come from a distribution with the specified parameters
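Written compactly, the test statistic is the sum of the ((O-E)^2)/E terms tabulated in the examples, and the decision compares it with the critical value. The symbols k (number of cells), m (number of parameters estimated from the sample), O_i, E_i and \nu are notation introduced here for illustration.

```latex
\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
\nu = k - m - 1
\]
\[
\text{Reject } H_0 \text{ if } \chi^2 > \chi^2_{\alpha,\nu};
\quad \text{otherwise fail to reject } H_0 .
\]
```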
EXAMPLE 1
EQUAL INTERVAL APPROACH
400 five-minute intervals were observed for air traffic control messages
At alpha = 0.01, can the number of messages per interval be considered to follow a Poisson distribution with a mean of 4.6?
Approach
  Lambda parameter of 4.6 is given
  Use the Poisson probability table for 4.6
  Multiply each probability by 400 to obtain the expected observations
  Compare the actual observations to the expected observations
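In place of the printed table, the Poisson probabilities can be computed directly from the formula P(k) = e^(-lambda) * lambda^k / k!. The sketch below is an illustration with hypothetical function names. Its results agree with the tabulated probabilities to within about 0.001, so the expected counts can differ from the table by a few tenths because the table values are rounded.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_counts(lam, n_intervals, max_k):
    """Expected number of intervals with 0, 1, ..., max_k messages."""
    return [n_intervals * poisson_pmf(k, lam) for k in range(max_k + 1)]


# Mean of 4.6 messages per interval, 400 intervals observed
expected = expected_counts(4.6, 400, 13)
for k, e in enumerate(expected):
    print(f"{k:2d}  {poisson_pmf(k, 4.6):.3f}  {e:6.2f}")
```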
HYPOTHESES
Ho: Poisson distribution with a mean of 4.6
Ha: Not a Poisson distribution with a mean of 4.6
Messages  Observed  Probability  Expected
 0            3       0.010         4.0
 1           15       0.046        18.4
 2           47       0.107        42.8
 3           76       0.163        65.2
 4           68       0.187        74.8
 5           74       0.173        69.2
 6           46       0.132        52.8
 7           39       0.087        34.8
 8           15       0.050        20.0
 9            9       0.025        10.0
10            5       0.012         4.8
11            2       0.005         2.0
12            0       0.002         0.8
13            1       0.001         0.4
Combine cells 0 and 1: observed 18, expected 22.4
Combine the 4 cells 10 through 13: observed 8, expected 8.0
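The combining of cells and the resulting test statistic can be checked with a short script. It uses the observed and expected values from the table above; the helper names are hypothetical. The result is about 6.749 over 10 cells.

```python
def combine(values, start, stop):
    """Merge values[start:stop] into a single cell."""
    return values[:start] + [sum(values[start:stop])] + values[stop:]


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [3, 15, 47, 76, 68, 74, 46, 39, 15, 9, 5, 2, 0, 1]
expected = [4.0, 18.4, 42.8, 65.2, 74.8, 69.2, 52.8, 34.8,
            20.0, 10.0, 4.8, 2.0, 0.8, 0.4]

# Combine the tail (10 to 13) first so the indices of cells 0 and 1 stay valid.
obs = combine(combine(observed, 10, 14), 0, 2)
exp = combine(combine(expected, 10, 14), 0, 2)

statistic = chi_square_statistic(obs, exp)
print(len(obs), round(statistic, 3))
```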
CHI-SQUARE for 10 - 1 = 9 degrees of freedom
[Figure: chi-square density f(X^2) plotted against X^2, starting at 0. The critical value of 21.666 marks the start of the right tail, whose probability is alpha = 0.01.]
DECISION
Test statistic of 6.749 is less than the critical value of 21.666
Cannot reject Ho of the distribution being Poisson with a mean of 4.6
At an alpha level of 0.01, the data are consistent with a Poisson distribution with a mean of 4.6
EXAMPLE 2
EQUIPROBABLE APPROACH
Were the scores from an INDE 2333 exam normally distributed?
Sample statistics
  Mean = 71.95
  Std = 11.93
  N = 43
HYPOTHESES
Ho
  The sample could have come from a normally distributed population with a mean of 71.95 and a std of 11.93
Ha
  The sample could not have come from a normally distributed population with a mean of 71.95 and a std of 11.93
CRITICAL VALUE
Chi-square distribution chart
One sided test
Alpha = 0.05
Degrees of freedom
  The sample size is 43
  Want the maximum number of cells, not to exceed 100, with a minimum expected number of observations of 5
  43 / 5 = 8.6, so use 8 cells
  With 8 cells, the expected number of observations in each cell is 43 / 8 = 5.375
  Degrees of freedom = number of cells - number of parameters estimated from the sample - 1
  Degrees of freedom = 8 - 2 - 1 = 5
Critical value for alpha = 0.05 and 5 degrees of freedom is 11.070
CELL BOUNDARIES
To count the observed values in each cell, we must determine the actual x boundaries of the 8 equiprobable cells
For normal distributions
  Look up the z value corresponding to each cumulative probability
  Boundary = mean + std * z
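The z lookup and the boundary formula can be sketched with the standard library's statistics.NormalDist, whose inv_cdf method returns the z value for a cumulative probability. The function name below is hypothetical. With the rounded sample statistics (71.95 and 11.93) the boundaries land within about 0.01 of the values tabulated in the slides.

```python
from statistics import NormalDist


def equiprobable_boundaries(mean, std, cells):
    """Interior x boundaries that split a normal distribution into equiprobable cells."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [mean + std * standard_normal.inv_cdf(i / cells)
            for i in range(1, cells)]


# 8 cells need 7 interior boundaries, at cumulative probabilities 0.125, 0.250, ..., 0.875
for p, x in zip((i / 8 for i in range(1, 8)),
                equiprobable_boundaries(71.95, 11.93, 8)):
    print(f"{p:.3f}  {x:.3f}")
```

The first cell extends below the lowest boundary and the last cell above the highest, so only the interior boundaries need to be computed.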
CALCULATING OBSERVATIONS
Cell  Lower%  Upper%  Lower Z  Upper Z  Lower x  Upper x  Obs  Exp
1     0       0.125   -inf     -1.150   0        58.227   5    5.375
2     0.125   0.250   -1.150   -0.674   58.227   63.905   6    5.375
3     0.250   0.375   -0.674   -0.319   63.905   68.151   5    5.375
4     0.375   0.500   -0.319   0        68.151   71.953   3    5.375
5     0.500   0.625   0        0.319    71.953   75.756   5    5.375
6     0.625   0.750   0.319    0.674    75.756   80.002   5    5.375
7     0.750   0.875   0.674    1.150    80.002   85.680   5    5.375
8     0.875   1.000   1.150    +inf     85.680   100.00   8    5.375
CALCULATING TEST STATISTIC
Cell   Obs  Exp    ((O-E)^2)/E
1      5    5.375  0.026
2      6    5.375  0.072
3      5    5.375  0.026
4      3    5.375  1.049
5      5    5.375  0.026
6      5    5.375  0.026
7      5    5.375  0.026
8      8    5.375  1.282
Total              2.534
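The per-cell contributions and their total can be reproduced from the observed counts listed in the table. This is a sketch with hypothetical names; computed without intermediate rounding, the total comes to about 2.53, matching the table to within rounding, and it is then compared with the critical value of 11.070.

```python
def cell_contributions(observed, expected_per_cell):
    """(O - E)^2 / E for each cell when every cell has the same expected count."""
    return [(o - expected_per_cell) ** 2 / expected_per_cell for o in observed]


observed = [5, 6, 5, 3, 5, 5, 5, 8]   # counts as listed in the table
expected_per_cell = 43 / 8            # 5.375
critical_value = 11.070               # alpha = 0.05, 5 degrees of freedom

terms = cell_contributions(observed, expected_per_cell)
statistic = sum(terms)

for cell, term in enumerate(terms, start=1):
    print(f"{cell}  {term:.3f}")
print(f"Total {statistic:.3f}")
print("Reject Ho" if statistic > critical_value else "Cannot reject Ho")
```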
DECISION
2.534 < 11.070
Cannot reject Ho
The test scores are consistent with a normal distribution with a mean of 71.95 and a std of 11.93