Goodness of Fit Tests: Independence
Mathematics 47: Lecture 34
Dan Sloughter
Furman University
May 9, 2006
Dan Sloughter (Furman University) Goodness of Fit Tests: Independence May 9, 2006 1 / 10
Testing for independence
- Suppose $X$ is a discrete random variable with $r$ possible outcomes and $Y$ is a random variable with $c$ possible outcomes.
- For $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$, let
$$p_{ij} = P(X = i, Y = j),$$
$$p_{i+} = p_{i1} + p_{i2} + \cdots + p_{ic} = P(X = i),$$
and
$$p_{+j} = p_{1j} + p_{2j} + \cdots + p_{rj} = P(Y = j).$$
- We want to test the hypothesis that $X$ and $Y$ are independent.
- That is, we wish to test
$$H_0 : p_{ij} = p_{i+} p_{+j} \text{ for all } i \text{ and } j$$
$$H_A : p_{ij} \neq p_{i+} p_{+j} \text{ for some } i \text{ and } j.$$
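To make the null hypothesis concrete, the following is a minimal Python sketch (not part of the original slides) that computes the marginals $p_{i+}$ and $p_{+j}$ of a joint table and checks whether every cell equals the product of its marginals. The function names and the two small joint distributions are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F


def marginals(p):
    """Return the row marginals p_{i+} and column marginals p_{+j} of a joint table."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return row, col


def satisfies_h0(p):
    """True when p_ij == p_{i+} * p_{+j} for every cell, i.e. H0 holds exactly."""
    row, col = marginals(p)
    return all(p[i][j] == row[i] * col[j]
               for i in range(len(row)) for j in range(len(col)))


# Hypothetical joint distributions; exact fractions avoid floating-point noise.
independent = [[F(1, 8), F(3, 8)],
               [F(1, 8), F(3, 8)]]
dependent = [[F(1, 2), F(0)],
             [F(0), F(1, 2)]]

print(satisfies_h0(independent))  # True
print(satisfies_h0(dependent))    # False
```

In practice the $p_{ij}$ are unknown, which is why the test is based on sample counts rather than on an exact check like this one.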
Testing for independence (cont’d)
- To test the hypotheses, suppose we have a random sample of size $n$ from the bivariate distribution of $(X, Y)$.
- For $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$, let
$$n_{ij} = \text{number of observations } (X, Y) \text{ for which } X = i \text{ and } Y = j,$$
$$n_{i+} = n_{i1} + n_{i2} + \cdots + n_{ic} = \text{number of observations } (X, Y) \text{ for which } X = i,$$
and
$$n_{+j} = n_{1j} + n_{2j} + \cdots + n_{rj} = \text{number of observations } (X, Y) \text{ for which } Y = j.$$
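The counting step above can be sketched in Python as follows. This is an illustrative sketch only: the function name `contingency_counts` and the six sample pairs are hypothetical, and outcomes are assumed to be coded as integers $1, \ldots, r$ and $1, \ldots, c$.

```python
def contingency_counts(pairs, r, c):
    """Tally n_ij, the row totals n_{i+}, and the column totals n_{+j}."""
    n = [[0] * c for _ in range(r)]
    for x, y in pairs:
        n[x - 1][y - 1] += 1          # outcomes are 1-based, lists are 0-based
    row_totals = [sum(row) for row in n]
    col_totals = [sum(col) for col in zip(*n)]
    return n, row_totals, col_totals


# Hypothetical sample of six (X, Y) observations with r = c = 2.
sample = [(1, 1), (1, 2), (2, 1), (1, 1), (2, 2), (1, 1)]
counts, row_totals, col_totals = contingency_counts(sample, r=2, c=2)

print(counts)      # [[3, 1], [1, 1]]
print(row_totals)  # [4, 2]
print(col_totals)  # [4, 2]
```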
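Testing for independence (cont’d)
- We call the table of the values $n_{ij}$ a contingency table:

| | 1 | 2 | $\cdots$ | $c$ | Total |
|---|---|---|---|---|---|
| 1 | $n_{11}$ | $n_{12}$ | $\cdots$ | $n_{1c}$ | $n_{1+}$ |
| 2 | $n_{21}$ | $n_{22}$ | $\cdots$ | $n_{2c}$ | $n_{2+}$ |
| $\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$ |
| $r$ | $n_{r1}$ | $n_{r2}$ | $\cdots$ | $n_{rc}$ | $n_{r+}$ |
| Total | $n_{+1}$ | $n_{+2}$ | $\cdots$ | $n_{+c}$ | $n$ |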
Testing for independence (cont’d)
- Now the maximum likelihood estimators are
$$\hat{p}_{i+} = \frac{n_{i+}}{n},$$
for $i = 1, 2, \ldots, r$, and
$$\hat{p}_{+j} = \frac{n_{+j}}{n},$$
for $j = 1, 2, \ldots, c$.
- Hence, under $H_0$, the expected frequencies are
$$e_{ij} = n \cdot \frac{n_{i+}}{n} \cdot \frac{n_{+j}}{n} = \frac{n_{i+} n_{+j}}{n},$$
$i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, c$.
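As a brief justification of these estimators (a sketch added here, not taken from the slides): under $H_0$ the counts are multinomial with cell probabilities $p_{i+} p_{+j}$, and maximizing the log-likelihood $\ell$ subject to $\sum_i p_{i+} = 1$ gives the sample proportions. The Lagrange multiplier $\mu$ is notation introduced only for this derivation.

```latex
\begin{align*}
\ell &= \text{const} + \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \log\left(p_{i+} p_{+j}\right)
      = \text{const} + \sum_{i=1}^{r} n_{i+} \log p_{i+} + \sum_{j=1}^{c} n_{+j} \log p_{+j}, \\
0 &= \frac{\partial}{\partial p_{i+}} \left[ \ell - \mu \left( \sum_{k=1}^{r} p_{k+} - 1 \right) \right]
   = \frac{n_{i+}}{p_{i+}} - \mu
   \quad\Longrightarrow\quad p_{i+} = \frac{n_{i+}}{\mu}, \\
1 &= \sum_{i=1}^{r} p_{i+} = \frac{n}{\mu}
   \quad\Longrightarrow\quad \mu = n
   \quad\Longrightarrow\quad \hat{p}_{i+} = \frac{n_{i+}}{n}.
\end{align*}
```

The same argument applied to the column probabilities, with the constraint $\sum_j p_{+j} = 1$, gives $\hat{p}_{+j} = n_{+j}/n$.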
Testing for independence (cont’d)
- We may evaluate either
$$-2 \log(\Lambda) = 2 \sum_{i=1}^{r} \sum_{j=1}^{c} n_{ij} \log\left(\frac{n_{ij}}{e_{ij}}\right)$$
or
$$Q = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}}.$$
- Under $H_0$, both $-2 \log(\Lambda)$ and $Q$ are, for large $n$, approximately $\chi^2((r - 1)(c - 1))$, where the degrees of freedom follow from subtracting the number of estimated parameters, that is, $(r - 1) + (c - 1) = r + c - 2$, from one less than the number of cells:
$$(rc - 1) - (r + c - 2) = rc - r - c + 1 = (r - 1)(c - 1).$$
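A short sketch of why the two statistics are close for large $n$ (added here; the symbol $d_{ij} = n_{ij} - e_{ij}$ is notation not used in the slides). It relies on the expansion $\log(1 + x) = x - x^2/2 + O(x^3)$ and on the fact that $\sum_{i,j} d_{ij} = 0$, since both the $n_{ij}$ and the $e_{ij}$ sum to $n$.

```latex
\begin{align*}
n_{ij} \log\left(\frac{n_{ij}}{e_{ij}}\right)
  &= (e_{ij} + d_{ij}) \log\left(1 + \frac{d_{ij}}{e_{ij}}\right) \\
  &= (e_{ij} + d_{ij}) \left( \frac{d_{ij}}{e_{ij}} - \frac{d_{ij}^2}{2 e_{ij}^2} + O\left(\frac{d_{ij}^3}{e_{ij}^3}\right) \right) \\
  &= d_{ij} + \frac{d_{ij}^2}{2 e_{ij}} + O\left(\frac{d_{ij}^3}{e_{ij}^2}\right), \\
-2 \log(\Lambda)
  &\approx 2 \sum_{i=1}^{r} \sum_{j=1}^{c} \left( d_{ij} + \frac{d_{ij}^2}{2 e_{ij}} \right)
   = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - e_{ij})^2}{e_{ij}} = Q.
\end{align*}
```

The approximation is good when the deviations $d_{ij}$ are small relative to the expected frequencies $e_{ij}$, which is the typical situation under $H_0$ with large $n$.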
Example
- After the Doll and Hill study of 1948 and 1949 on the connection between lung cancer and smoking, R. A. Fisher raised questions about possible links between genetics and smoking habits.
- In one study of 71 pairs of twins, he classified each pair in two ways: (1) whether they were identical or fraternal twins and (2) whether they had similar or dissimilar smoking habits.
- The following contingency table summarizes his results:

| | Like Habits | Unlike Habits | Total |
|---|---|---|---|
| Identical Twins | 44 | 9 | 53 |
| Fraternal Twins | 9 | 9 | 18 |
| Total | 53 | 18 | 71 |
Example (cont’d)
- The expected frequencies are
$$e_{11} = \frac{53 \cdot 53}{71} = 39.56$$
$$e_{21} = \frac{53 \cdot 18}{71} = 13.44$$
$$e_{12} = \frac{18 \cdot 53}{71} = 13.44$$
$$e_{22} = \frac{18 \cdot 18}{71} = 4.56.$$
- Table of expected frequencies:

| | Like Habits | Unlike Habits | Total |
|---|---|---|---|
| Identical Twins | 39.56 | 13.44 | 53.00 |
| Fraternal Twins | 13.44 | 4.56 | 18.00 |
| Total | 53.00 | 18.00 | 71.00 |
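The table of expected frequencies can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, not code from the lecture; the function name `expected_frequencies` is an assumption, and it simply applies $e_{ij} = n_{i+} n_{+j} / n$ to the observed counts.

```python
def expected_frequencies(table):
    """Return e_ij = n_{i+} * n_{+j} / n for an r-by-c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[ri * cj / n for cj in col_totals] for ri in row_totals]


twins = [[44, 9],   # identical twins: like habits, unlike habits
         [9, 9]]    # fraternal twins: like habits, unlike habits

expected = expected_frequencies(twins)
rounded = [[round(e, 2) for e in row] for row in expected]
print(rounded)  # [[39.56, 13.44], [13.44, 4.56]]
```

The expected frequencies have the same row and column totals as the observed table, which is a quick sanity check on the computation.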
Example (cont’d)
- To test the hypothesis $H_0$ that smoking habits and type of twin are independent, we compute the observed value of $Q$, namely, $q = 7.76$ (this uses the expected frequencies rounded to two decimal places; the unrounded frequencies give $q \approx 7.74$).
- If $U$ is $\chi^2(1)$, we have
$$\text{p-value} \approx P(U \geq 7.76) = 0.005342.$$
- Hence there is strong evidence to reject $H_0$, and so to conclude that there is a genetic component to smoking habits.
- Note: we could have computed the observed value of $-2 \log(\Lambda)$: $-2 \log(\lambda) = 7.16$, giving a p-value of 0.007455.
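Both statistics and their p-values can be checked with the Python standard library. This is a sketch under stated assumptions, not the author's computation: the helper name `chi2_1df_upper_tail` is hypothetical, and it uses the fact that a $\chi^2(1)$ variable is the square of a standard normal, so $P(U \geq x) = \operatorname{erfc}(\sqrt{x/2})$. Because it uses unrounded expected frequencies, the results differ slightly in the second decimal place from the values quoted on the slide.

```python
import math

observed = [[44, 9], [9, 9]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

# Pair each observed count with its expected frequency.
cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

q = sum((o - e) ** 2 / e for o, e in cells)          # Pearson statistic Q
g = 2 * sum(o * math.log(o / e) for o, e in cells)   # -2 log(lambda); all counts > 0 here


def chi2_1df_upper_tail(x):
    """P(U >= x) for U ~ chi-square(1), via U = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2))


p_q = chi2_1df_upper_tail(q)
p_g = chi2_1df_upper_tail(g)
print(f"Q = {q:.2f}, p-value = {p_q:.4f}")               # Q = 7.74, p-value = 0.0054
print(f"-2 log(lambda) = {g:.2f}, p-value = {p_g:.4f}")  # -2 log(lambda) = 7.15, p-value = 0.0075
```

Either way the p-value is well below 0.01, so the conclusion of the example is unchanged.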
Example (cont’d)
- Note: To perform the above test in R, if the first column of data is in the vector x and the second column of data is in the vector y, we first create a matrix z with the command `z <- cbind(x, y)`
- Then the command `chisq.test(z, correct=FALSE)` will perform the above analysis.
- The command `chisq.test(z)` will perform the test as well, but with a correction for continuity.
- Note: If the data are in an array in a file called twins.dat, then the command `z <- read.table("twins.dat")` would read the data directly into z for analysis.
- Note: In R Commander, use the Contingency tables option under the Statistics menu.
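The slides do not spell out the correction for continuity. As a rough illustration in Python (a sketch based on the documented behaviour of R's `chisq.test` for 2 by 2 tables, where one half is subtracted from every $|n_{ij} - e_{ij}|$, but never more than the difference itself), the corrected statistic for the twin data can be computed as follows. The variable names are hypothetical.

```python
import math

observed = [[44, 9], [9, 9]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]

cells = [(o, e) for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

# Continuity correction: subtract 0.5 from each |o - e|, capped so that
# the correction never exceeds the smallest absolute difference.
yates = min(0.5, min(abs(o - e) for o, e in cells))
q_corrected = sum((abs(o - e) - yates) ** 2 / e for o, e in cells)

# Upper tail of chi-square(1): P(U >= x) = erfc(sqrt(x / 2)).
p_corrected = math.erfc(math.sqrt(q_corrected / 2))
print(f"corrected Q = {q_corrected:.2f}, p-value = {p_corrected:.3f}")
# corrected Q = 6.09, p-value = 0.014
```

The correction makes the test more conservative: the statistic drops from about 7.74 to about 6.09, although the evidence against $H_0$ remains fairly strong.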