CHAPTER 5 INTRODUCTORY CHI-SQUARE TEST


• This chapter introduces a new probability distribution called the chi-square distribution.

• The chi-square distribution will be used in carrying out hypothesis tests to analyze whether:
i. A sample could have come from a given type of POPULATION DISTRIBUTION.
ii. Two nominal (categorical) variables could be INDEPENDENT of each other, or several populations could be HOMOGENEOUS with respect to a categorical variable.

• The chi-square tests that will be discussed are:

i. Goodness-of-fit Test

ii. The Chi-square Test For Homogeneity

iii. The Chi-square Test For Independence


1) Goodness-of-fit Test

• In the Goodness-of-fit test, chi-square analysis is applied to examine whether sample data could have been drawn from a population having a specific probability distribution.

• In the Goodness-of-fit test, the test procedure is appropriate when the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variable under study is categorical (qualitative variable).
iv. The expected value ($e_i$) for each level of the variable is at least 5.

• The frequency distribution table layout:

Category    1       2       …    k
Frequency   $o_1$   $o_2$   …    $o_k$

where
$k$ = number of levels of the categorical variable
$o_i$ = observed frequency for the $i$th level, for $i = 1, 2, \ldots, k$

• Test procedure to run the Goodness-of-fit test:
1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.
2. Determine:
i. The level of significance, $\alpha$
ii. The degrees of freedom, $df = k - 1$, where $k$ = number of levels of the categorical variable
Find the value of $\chi^2_{\alpha, df}$ from the table of the chi-square distribution.

3. Calculate the value of $\chi^2_{\text{calculated}}$ using the formula below:

$$\chi^2_{\text{calculated}} = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where
$o_i$ = observed frequency of the $i$th level
$e_i$ = expected frequency of the $i$th level, with $e_i = nP(X_i)$ for $i = 1, 2, \ldots, k$
$n = o_1 + o_2 + \cdots + o_k$ = total observations

4. Determine the rejection region to reject $H_0$:
i. If $\chi^2_{\text{calculated}} > \chi^2_{\alpha, df}$, or
ii. If $p$-value $< \alpha$.

5. Make decision/conclusion.
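The five steps above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides: the function name `goodness_of_fit_statistic` and the die-rolling counts are hypothetical, and the critical value is still read from the chi-square table.

```python
def goodness_of_fit_statistic(observed, probabilities):
    """Return (chi-square statistic, expected frequencies, df).

    observed      -- observed frequencies o_1, ..., o_k
    probabilities -- hypothesised P(X_1), ..., P(X_k) under H0
    """
    if len(observed) != len(probabilities):
        raise ValueError("need one probability per category")
    n = sum(observed)                              # n = o_1 + ... + o_k
    expected = [n * p for p in probabilities]      # e_i = n * P(X_i)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                         # df = k - 1
    return statistic, expected, df


# Hypothetical data (not from the slides): 60 rolls of a die, H0: the die is fair.
rolls = [8, 12, 9, 11, 10, 10]
statistic, expected, df = goodness_of_fit_statistic(rolls, [1 / 6] * 6)

critical_value = 11.070        # chi-square table value for alpha = 0.05, df = 5
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up counts every expected frequency is 10, so condition iv holds, and the statistic falls well below the table value, so $H_0$ is not rejected.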

Example: The authority claims that the proportions of road accidents occurring in this country according to the categories User Attitude (A), Mechanical Fault (M), Insufficient Sign Board (I) and Fate (F) are 60%, 20%, 15% and 5% respectively. A study by an independent body shows the following data.

Category    A     M    I    F   Total
Frequency   130   35   30   5   200

Can we accept the claim at significance level $\alpha = 0.05$?

Solution:
1. $n = 200$, $k = 4$
$H_0: P(A) = 0.6,\ P(M) = 0.2,\ P(I) = 0.15,\ P(F) = 0.05$ (claim)
$H_1$: At least one $P(i)$ differs from the claimed value, for $i = A, M, I$ and $F$

2. $\alpha = 0.05$, $df = k - 1 = 4 - 1 = 3$, $\chi^2_{0.05, 3} = 7.815$

3.
$o_i$          $e_i = nP(X_i)$             $(o_i - e_i)^2 / e_i$
$o_A = 130$    $e_A = 200(0.6) = 120$      $(130 - 120)^2 / 120 = 0.833$
$o_M = 35$     $e_M = 200(0.2) = 40$       $(35 - 40)^2 / 40 = 0.625$
$o_I = 30$     $e_I = 200(0.15) = 30$      $(30 - 30)^2 / 30 = 0.000$
$o_F = 5$      $e_F = 200(0.05) = 10$      $(5 - 10)^2 / 10 = 2.5$
                                           $\chi^2_c = 3.958$

4. Since $\chi^2_c = 3.958 < \chi^2_{0.05, 3} = 7.815$, we fail to reject (accept) $H_0$.
5. We conclude that we have no evidence to reject the claim.
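The road accident example can be checked numerically. The sketch below recomputes $\chi^2_c$ from the slide's data and also tries the $p$-value approach from step 4.ii. The helper `chi2_upper_tail_df3` is an illustrative name, not something from the slides, and its closed form holds only for $df = 3$.

```python
import math

observed = {"A": 130, "M": 35, "I": 30, "F": 5}
claimed = {"A": 0.60, "M": 0.20, "I": 0.15, "F": 0.05}   # proportions under H0

n = sum(observed.values())
expected = {cat: n * claimed[cat] for cat in observed}   # e_i = n * P(X_i)
chi2_calculated = sum(
    (observed[cat] - expected[cat]) ** 2 / expected[cat] for cat in observed
)


def chi2_upper_tail_df3(x):
    """P(chi-square with 3 df > x); this closed form is valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


alpha = 0.05
critical_value = 7.815                  # chi-square table, alpha = 0.05, df = 3
p_value = chi2_upper_tail_df3(chi2_calculated)

reject_by_critical_value = chi2_calculated > critical_value
reject_by_p_value = p_value < alpha
print(f"chi-square = {chi2_calculated:.3f}, p-value = {p_value:.3f}")
print(f"reject H0: {reject_by_critical_value} (critical value), {reject_by_p_value} (p-value)")
```

Both approaches lead to the same decision here: the statistic is below 7.815 and the $p$-value is well above 0.05, so $H_0$ is not rejected.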

Example: The number of students playing truant in a school over 200 school days is shown below.

No. of truants   0    1    2    3    4    ≥ 5
No. of days      12   32   45   50   35   26

If $X$ is a random variable representing the number of students playing truant per day, test the hypothesis that $X$ follows the Poisson distribution with mean 3 per day at $\alpha = 0.01$.

Solution:
$n = 200$, $k = 6$, mean $= 3$
1) $H_0$: $X$ follows the Poisson distribution with mean 3 per day (claim)
$H_1$: $X$ does not follow the Poisson distribution with mean 3 per day
2) $\alpha = 0.01$, $df = k - 1 = 5$, $\chi^2_{0.01, 5} = 15.086$

3)
No. of truants   No. of days, $O_i$   $P(X_i)$   $e_i = nP(X_i)$         $(O_i - e_i)^2 / e_i$
0                12                   0.0498     200(0.0498) = 9.96      0.4178
1                32                   0.1493     200(0.1493) = 29.86     0.1534
2                45                   0.2241     200(0.2241) = 44.82     0.0007
3                50                   0.2240     200(0.2240) = 44.80     0.6036
4                35                   0.1681     200(0.1681) = 33.62     0.0566
≥ 5              26                   0.1847     200(0.1847) = 36.94     3.2399
Total            200                                                     $\chi^2 = 4.472$

4) Since $4.472 < 15.086$, we accept $H_0$.
5) We conclude that there is not enough evidence to reject the claim.
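The Poisson probabilities in the table can be generated rather than looked up. The sketch below is an illustration with variable names of my own choosing: it computes $P(X = x) = e^{-3} 3^x / x!$ for $x = 0, \ldots, 4$ and takes the last category as the remainder $P(X \ge 5)$. Because it uses unrounded probabilities, the statistic comes out at about 4.47, slightly different in the third decimal from the slide's 4.472, which is based on four-decimal probabilities.

```python
import math

# Observed number of days with 0, 1, 2, 3, 4 and ">= 5" truants.
observed_days = [12, 32, 45, 50, 35, 26]
mean_per_day = 3
n = sum(observed_days)                           # 200 school days

# Poisson probabilities P(X = x) for x = 0..4, then P(X >= 5) as the remainder.
probabilities = [
    math.exp(-mean_per_day) * mean_per_day ** x / math.factorial(x)
    for x in range(5)
]
probabilities.append(1 - sum(probabilities))

expected_days = [n * p for p in probabilities]   # e_i = n * P(X_i)
chi2_calculated = sum(
    (o - e) ** 2 / e for o, e in zip(observed_days, expected_days)
)

critical_value = 15.086                          # chi-square table, alpha = 0.01, df = 5
reject_h0 = chi2_calculated > critical_value
print(f"chi-square = {chi2_calculated:.3f}, reject H0: {reject_h0}")
```

Grouping the tail into a single "5 or more" category makes the probabilities sum to 1 and keeps every expected frequency above 5.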


2) The Chi-Square Test for Homogeneity

• The homogeneity test is used to determine whether several populations are similar or equal or homogeneous in some characteristics.

• This test is applied to a single categorical variable measured in two or more different populations.

• The test procedure is appropriate when the following conditions are satisfied:
i. For each population, the sampling method is simple random sampling.
ii. Each population is at least 10 times as large as its sample.
iii. The variable under study is categorical.
iv. If the sample data are displayed in a contingency table (populations × category levels), the expected value ($e_{ij}$) for each cell of the table is at least 5.

Two-dimensional contingency table layout:

                                Column Variable
Row Variable    Category B1     Category B2     …    Category Bc     Total
Category A1     $o_{11}$        $o_{12}$        …    $o_{1c}$        $n_{1\cdot}$
Category A2     $o_{21}$        $o_{22}$        …    $o_{2c}$        $n_{2\cdot}$
…               …               …               …    …               …
Category Ar     $o_{r1}$        $o_{r2}$        …    $o_{rc}$        $n_{r\cdot}$
Total           $n_{\cdot 1}$   $n_{\cdot 2}$   …    $n_{\cdot c}$   $n$

• The above is an $r \times c$ contingency table, where $r$ denotes the number of categories of the row variable and $c$ denotes the number of categories of the column variable.

• $o_{ij}$ is the observed frequency in cell $(i, j)$
• $n_{i\cdot}$ is the total frequency for row category $i$
• $n_{\cdot j}$ is the total frequency for column category $j$
• $n$ is the grand total frequency over all cells $(i, j)$

Test procedure to run the Chi-square test for homogeneity:
1. State the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

Eg:
$H_0$: The proportions of the ROW variable are the SAME across the categories of the COLUMN variable
$H_1$: The proportions of the ROW variable are NOT the SAME across the categories of the COLUMN variable

2. Determine:
i. The level of significance, $\alpha$
ii. The degrees of freedom, $df = (r - 1)(c - 1)$, where $r$ = number of rows and $c$ = number of columns
Find the value of $\chi^2_{\alpha, df}$ from the table of the chi-square distribution.

3. Calculate the value of $\chi^2_{\text{calculated}}$ using the formula below:

$$\chi^2_{\text{calculated}} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where
$o_{ij}$ = observed frequency of the $i$th row and $j$th column
$e_{ij}$ = expected frequency of the $i$th row and $j$th column

$$e_{ij} = \frac{(i\text{th row total})(j\text{th column total})}{\text{grand total}} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

4. Determine the rejection region TO REJECT $H_0$:
i. Critical value approach: reject $H_0$ if $\chi^2_{\text{calculated}} > \chi^2_{\alpha, df}$
ii. $p$-value approach: reject $H_0$ if $p$-value $< \alpha$

5. Make decision.
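The contingency-table procedure lends itself to a small helper. The following is a minimal sketch under stated assumptions: the function name `contingency_chi_square` and the 2 × 2 counts are hypothetical and do not come from the slides. It builds the expected frequencies from the row and column totals and then sums the cell contributions.

```python
def contingency_chi_square(table):
    """Return (chi-square statistic, expected frequencies, df) for an r x c table.

    table -- list of rows, each row a list of observed frequencies o_ij
    """
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    # e_ij = (i-th row total) * (j-th column total) / grand total
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(table, expected)
        for o, e in zip(observed_row, expected_row)
    )
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, expected, df


# Hypothetical 2 x 2 table (not from the slides).
counts = [[30, 20],
          [20, 30]]
statistic, expected, df = contingency_chi_square(counts)

critical_value = 3.841         # chi-square table value for alpha = 0.05, df = 1
reject_h0 = statistic > critical_value
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

The same function serves the homogeneity and independence tests, because the two differ only in how the hypotheses are worded.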

Example: Four machines manufacture cylindrical steel pins. The pins are subjected to a diameter specification. A pin may meet the specification or it may be too thin or too thick. Pins are sampled from each machine and the number of pins in each category is counted. The table below presents the results. Test at $\alpha = 0.01$.

             Too thin   OK    Too thick
Machine 1    10         102   8
Machine 2    34         161   5
Machine 3    12         79    9
Machine 4    10         60    10

Solution:
1. $H_0$: The proportions of pins that are too thin, OK, or too thick are the same for all machines
$H_1$: The proportions of pins that are too thin, OK, or too thick are not the same for all machines

2. $\alpha = 0.01$
$r = 4$, $c = 3$. So $df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$
From the table of chi-square: $\chi^2_{0.01, 6} = 16.812$

3. Construct a contingency table:

             Too thin   OK    Too thick   Total
Machine 1    10         102   8           120
Machine 2    34         161   5           200
Machine 3    12         79    9           100
Machine 4    10         60    10          80
Total        66         402   32          500

Calculation of the expected frequency:

$$e_{ij} = \frac{(i\text{th row total})(j\text{th column total})}{\text{grand total}} = \frac{n_{i\cdot}\, n_{\cdot j}}{n}$$

Using the observed and expected frequencies in the contingency table, we calculate $\chi^2_c$ using the formula given:

$o_{ij}$            $e_{ij} = \frac{(i\text{th row total})(j\text{th column total})}{\text{grand total}}$    $\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
$o_{11} = 10$       $e_{11} = \frac{120 \times 66}{500} = 15.84$       $\frac{(10 - 15.84)^2}{15.84} = 2.1531$
$o_{12} = 102$      $e_{12} = \frac{120 \times 402}{500} = 96.48$      $\frac{(102 - 96.48)^2}{96.48} = 0.3158$
$o_{13} = 8$        $e_{13} = \frac{120 \times 32}{500} = 7.68$        $\frac{(8 - 7.68)^2}{7.68} = 0.0133$
$o_{21} = 34$       $e_{21} = \frac{200 \times 66}{500} = 26.4$        $\frac{(34 - 26.40)^2}{26.40} = 2.1879$
$o_{22} = 161$      $e_{22} = \frac{200 \times 402}{500} = 160.8$      $\frac{(161 - 160.80)^2}{160.80} = 0.0002$
$o_{23} = 5$        $e_{23} = \frac{200 \times 32}{500} = 12.8$        $\frac{(5 - 12.80)^2}{12.80} = 4.7531$
$o_{31} = 12$       $e_{31} = \frac{100 \times 66}{500} = 13.2$        $\frac{(12 - 13.20)^2}{13.20} = 0.1091$
$o_{32} = 79$       $e_{32} = \frac{100 \times 402}{500} = 80.40$      $\frac{(79 - 80.40)^2}{80.40} = 0.0244$
$o_{33} = 9$        $e_{33} = \frac{100 \times 32}{500} = 6.40$        $\frac{(9 - 6.40)^2}{6.40} = 1.0563$
$o_{41} = 10$       $e_{41} = \frac{80 \times 66}{500} = 10.56$        $\frac{(10 - 10.56)^2}{10.56} = 0.0297$
$o_{42} = 60$       $e_{42} = \frac{80 \times 402}{500} = 64.32$       $\frac{(60 - 64.32)^2}{64.32} = 0.2901$
$o_{43} = 10$       $e_{43} = \frac{80 \times 32}{500} = 5.12$         $\frac{(10 - 5.12)^2}{5.12} = 4.6513$
                                                                       $\chi^2_c = 15.5844$

4. Since $\chi^2_c = 15.5844 < \chi^2_{0.01, 6} = 16.812$, we fail to reject $H_0$.

5. We conclude that there is not enough evidence that the proportions of pins that are too thin, OK, or too thick differ among the machines.
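The pin example can be reproduced from the raw counts, which also lets us try the $p$-value approach. In the sketch below, `chi2_upper_tail_even_df` is an illustrative helper, not something from the slides. It uses the closed form $P(\chi^2_{df} > x) = e^{-x/2} \sum_{k=0}^{df/2 - 1} (x/2)^k / k!$, which holds only for even degrees of freedom (here $df = 6$).

```python
import math

# Rows: Machine 1..4; columns: too thin, OK, too thick.
pins = [
    [10, 102, 8],
    [34, 161, 5],
    [12, 79, 9],
    [10, 60, 10],
]

row_totals = [sum(row) for row in pins]
column_totals = [sum(column) for column in zip(*pins)]
grand_total = sum(row_totals)

# e_ij = (row total) * (column total) / grand total
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
chi2_calculated = sum(
    (o - e) ** 2 / e
    for observed_row, expected_row in zip(pins, expected)
    for o, e in zip(observed_row, expected_row)
)
df = (len(row_totals) - 1) * (len(column_totals) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


alpha = 0.01
p_value = chi2_upper_tail_even_df(chi2_calculated, df)
smallest_expected = min(min(row) for row in expected)
print(f"chi-square = {chi2_calculated:.4f}, df = {df}, p-value = {p_value:.4f}")
print(f"smallest expected frequency = {smallest_expected:.2f}, reject H0: {p_value < alpha}")
```

The $p$-value is roughly 0.016, only slightly above $\alpha = 0.01$, which matches a statistic sitting just under the critical value of 16.812. The smallest expected frequency is 5.12, so the "at least 5" condition is narrowly met.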

Exercise: 200 female owners and 200 male owners of Proton cars are selected at random and the colours of their cars are noted. The following data show the results:

           Car Colour
Gender     Black   Dull   Bright
Male       40      110    50
Female     20      80     100

Use a 1% significance level to test whether the proportions of colour preference are the same for females and males.

3) Chi-Square Test for Independence

• This test is applied to a single population which has 2 categorical variables.

• It is used to determine whether there is a significant association between the 2 categorical variables.

• Eg: In an election survey, voters might be classified by gender (female and male) and voting preference (Democrat, Republican or Independent). This test is used to determine whether gender is related to voting preference.

• The test is appropriate if the following conditions are met:
i. The sampling method is simple random sampling.
ii. The population is at least 10 times as large as the sample.
iii. The variables under study are categorical.
iv. If the sample data are displayed in a contingency table, the expected value ($e_{ij}$) for each cell of the table is at least 5.

• Note: The procedure for the Chi-square test for independence is the same as for the Chi-square test for homogeneity.

The only difference between these two tests is in the statement of the null and alternative hypotheses. The rest of the procedure is the same for both tests.

This test is useful in testing the following hypotheses:
$H_0$: ROW and COLUMN variables are INDEPENDENT
$H_1$: ROW and COLUMN variables are NOT INDEPENDENT

Example: Insomnia is a disease where a person finds it hard to sleep at night. A study is conducted to determine whether the two attributes, smoking habit and insomnia, are dependent. The following data set was obtained.

                 Insomnia
Habit            Yes   No
Non-smokers      10    70
Ex-smokers       8     32
Smokers          22    38

Use a 5% significance level to conduct the study.

Solution: The contingency table

                 Insomnia
Habit            Yes   No    Total
Non-smokers      10    70    80
Ex-smokers       8     32    40
Smokers          22    38    60
Total            40    140   180

1. $H_0$: Smoking habits and insomnia are independent
$H_1$: Smoking habits and insomnia are not independent

2. $\alpha = 0.05$
$r = 3$, $c = 2$. So $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$
$\chi^2_{0.05, 2} = 5.991$

3. Using the observed and expected frequencies in the contingency table, we calculate $\chi^2_c$ using the formula given:

$o_{ij}$           $e_{ij}$                                           $\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$
$o_{11} = 10$      $e_{11} = \frac{80 \times 40}{180} = 17.78$        $\frac{(10 - 17.78)^2}{17.78} = 3.40$
$o_{12} = 70$      $e_{12} = \frac{80 \times 140}{180} = 62.22$       $\frac{(70 - 62.22)^2}{62.22} = 0.97$
$o_{21} = 8$       $e_{21} = \frac{40 \times 40}{180} = 8.89$         $\frac{(8 - 8.89)^2}{8.89} = 0.09$
$o_{22} = 32$      $e_{22} = \frac{40 \times 140}{180} = 31.11$       $\frac{(32 - 31.11)^2}{31.11} = 0.03$
$o_{31} = 22$      $e_{31} = \frac{60 \times 40}{180} = 13.33$        $\frac{(22 - 13.33)^2}{13.33} = 5.64$
$o_{32} = 38$      $e_{32} = \frac{60 \times 140}{180} = 46.67$       $\frac{(38 - 46.67)^2}{46.67} = 1.61$
                                                                      $\chi^2_c = 11.74$

4. Since $\chi^2_c = 11.74 > \chi^2_{0.05, 2} = 5.991$, we reject $H_0$.

5. We conclude that smoking habit and insomnia are not independent.
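As a numerical check of the insomnia example, the sketch below recomputes the statistic from the raw counts. The variable names are my own. With unrounded expected frequencies the statistic is about 11.73; summing the six contributions after rounding each to two decimals gives 11.74. For $df = 2$ the upper-tail probability has the simple closed form $e^{-x/2}$, which the sketch uses for the $p$-value; that formula applies to $df = 2$ only.

```python
import math

# Rows: non-smokers, ex-smokers, smokers; columns: insomnia yes, insomnia no.
counts = [
    [10, 70],
    [8, 32],
    [22, 38],
]

row_totals = [sum(row) for row in counts]
column_totals = [sum(column) for column in zip(*counts)]
grand_total = sum(row_totals)

# e_ij = (row total) * (column total) / grand total
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
contributions = [
    [(o - e) ** 2 / e for o, e in zip(observed_row, expected_row)]
    for observed_row, expected_row in zip(counts, expected)
]
chi2_calculated = sum(sum(row) for row in contributions)
df = (len(row_totals) - 1) * (len(column_totals) - 1)

alpha = 0.05
critical_value = 5.991                      # chi-square table, alpha = 0.05, df = 2
p_value = math.exp(-chi2_calculated / 2)    # upper-tail probability, valid for df = 2 only

reject_h0 = chi2_calculated > critical_value
print(f"chi-square = {chi2_calculated:.2f}, df = {df}, p-value = {p_value:.4f}")
print(f"reject H0: {reject_h0}")
```

The largest single contribution comes from the smokers-with-insomnia cell (observed 22 against about 13.33 expected), so that cell accounts for most of the departure from independence.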

Exercise: A study is conducted to determine whether students' academic performance is independent of their participation in co-curricular activities. The following data set was obtained:

                             Academic Performance
Co-curricular Activities     Low   Fair   Good
Inactive                     40    80     60
Active                       30    90     60

Use a 5% significance level to conduct the study.