Categorical Data Analysis for Survey Data (faculty.nps.edu/rdfricke/Survey_Short_Course_Docs/Lecture...)
Categorical Data Analysis for Survey Data
Professor Ron Fricker
Naval Postgraduate School
Monterey, California
3/26/13 1
Reading: Lohr chapter 10
Goals for this Lecture
• Understand and be able to conduct tests for discrete contingency table data
– One-way chi-square goodness-of-fit tests: homogeneity; other distributions
– Two-way chi-square tests: independence; homogeneity
• First assuming simple random sampling (SRS) and no finite population correction (fpc)
– Then, an introduction to how to handle complex sampling designs
One-Way Classifications
• Each item is classified into one (and only one) of k categories (cells)
– Denote the counts as $x_1, x_2, \ldots, x_k$, with $x_1 + x_2 + \cdots + x_k = n$
[Figure: a random sample of size n is drawn from the population, and each item is classified into Category 1 (cell frequency $x_1$), Category 2 (cell frequency $x_2$), …, Category k (cell frequency $x_k$)]
Two-Way Contingency Tables
• A two-way contingency table (or cross-tabulation) gives counts for all pairwise combinations of the variables' levels

| Variable 2 / Variable 1 | “A” | “B” | Total |
|---|---|---|---|
| “X” | # or % | # or % | # or % |
| “Y” | # or % | # or % | # or % |
| Total | # or % | # or % | |

The (“X”, “B”) cell is the number or percent of observations that are both “X” and “B”; the “Y” row total is the number or percent of observations that are “Y”.
One-Way Goodness-of-Fit Test
• Have counts for k categories, $x_1, x_2, \ldots, x_k$, with $x_1 + x_2 + \cdots + x_k = n$
• The (unknown) population cell probabilities are denoted $p_1, p_2, \ldots, p_k$, with $p_1 + p_2 + \cdots + p_k = 1$
• Estimate each cell probability from the observed counts: $\hat{p}_i = x_i / n$, $i = 1, 2, \ldots, k$
• The hypotheses to be tested are
$$H_0: p_1 = p_1^*,\ p_2 = p_2^*,\ \ldots,\ p_k = p_k^*$$
$$H_a: p_i \neq p_i^* \text{ for at least one } i$$
Goodness-of-Fit Test for Homogeneity
• The null hypothesis is that each category is equally likely: $p_i^* = 1/k$, $i = 1, 2, \ldots, k$
– I.e., the distribution of category characteristics is homogeneous in the population
• If the null is true, then in each cell (in a perfect world) we would expect to observe counts $e_i = n p_i^*$
• So, how do we construct a statistical test that assesses how “far away” the expected counts $e_i$ are from the observed counts $x_i$?
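A minimal Python sketch of the two quantities just defined, using only the standard library (the lecture itself works in Excel and R). The counts are hypothetical and the function names are illustrative: `estimated_probs` returns $\hat{p}_i = x_i/n$ and `expected_homogeneous` returns $e_i = n p_i^*$ with $p_i^* = 1/k$.

```python
def estimated_probs(counts):
    """Estimated cell probabilities p-hat_i = x_i / n."""
    n = sum(counts)
    return [x / n for x in counts]


def expected_homogeneous(counts):
    """Expected counts e_i = n * p_i^* under homogeneity, where p_i^* = 1/k."""
    n, k = sum(counts), len(counts)
    return [n / k] * k


counts = [18, 30, 27, 25]              # hypothetical counts, k = 4, n = 100
print(estimated_probs(counts))
print(expected_homogeneous(counts))    # [25.0, 25.0, 25.0, 25.0]
```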
Answer: Chi-square Test
• Idea: Look at how far off the table counts are from what is expected under the null
$$X^2 = \sum_{i=1}^{k} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_{i=1}^{k} \frac{(x_i - e_i)^2}{e_i}$$
• Reject if the chi-square statistic is too large
– Assess “too large” using the chi-squared distribution
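The chi-square statistic is a one-line sum. A minimal Python sketch, assuming the observed and expected counts are supplied as equal-length lists; the counts are hypothetical (n = 100 and k = 4, so $e_i = 25$ under homogeneity).

```python
def pearson_x2(observed, expected):
    """Pearson chi-square statistic: sum of (x_i - e_i)^2 / e_i over the k cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((x - e) ** 2 / e for x, e in zip(observed, expected))


observed = [18, 30, 27, 25]   # hypothetical counts, n = 100
expected = [25.0] * 4         # homogeneity null: e_i = n / k = 25
x2 = pearson_x2(observed, expected)
print(round(x2, 2))           # 3.12
```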
Conducting the Test
• First calculate the $X^2$ statistic
• Then calculate the p-value: $p\text{-value} = \Pr(\chi^2_{k-1} \geq X^2)$
• $\chi^2_{k-1}$ is the chi-square distribution with $k-1$ degrees of freedom
• Reject the null if p-value $< \alpha$, for some pre-determined significance level $\alpha$
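The p-value step needs the upper tail of the chi-square distribution. In practice one would call R's `pchisq()` or a statistics library; to keep this sketch self-contained with only the Python standard library, `chi2_sf` below uses the closed-form recurrence for the regularized upper incomplete gamma function, which is exact for integer degrees of freedom. The counts are hypothetical and the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Pr(chi2_df >= x) for a positive integer df.

    Uses Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1), where Q is the
    regularized upper incomplete gamma function, y = x / 2, and s steps up to df / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0             # Q(1, y) = exp(-y)
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y) = erfc(sqrt(y))
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def gof_homogeneity(counts, alpha=0.05):
    """One-way chi-square goodness-of-fit test of H0: p_i = 1/k for all i."""
    n, k = sum(counts), len(counts)
    expected = [n / k] * k
    x2 = sum((x - e) ** 2 / e for x, e in zip(counts, expected))
    p_value = chi2_sf(x2, k - 1)         # k - 1 degrees of freedom
    return x2, p_value, p_value < alpha  # reject H0 if p-value < alpha


x2, p_value, reject = gof_homogeneity([18, 30, 27, 25])  # hypothetical counts
print(f"X^2 = {x2:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```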
Example
• In Excel:
• In R, use the chisq.test() function
– By default, given a single vector of counts, it performs the GoF test for homogeneity
* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
Goodness-of-Fit Test for Other Distributions
• Homogeneity is just a special case
• Can test whether the $p_i^*$ are anything, as long as $\sum_{i=1}^{k} p_i^* = 1$
• For example, might have some theory that says what the distribution should be
• Remember, don’t look at the data first and then specify the probabilities…
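A sketch of the same test with arbitrary pre-specified $p_i^*$, analogous to passing a probability vector to R's `chisq.test()`. It checks that the $p_i^*$ sum to 1 before computing $e_i = n p_i^*$. The counts and probabilities are hypothetical, and `chi2_sf` is a standard-library stand-in (exact for integer degrees of freedom) for a library chi-square tail function.

```python
import math


def chi2_sf(x, df):
    """Pr(chi2_df >= x) for integer df via the incomplete-gamma recurrence."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0             # Q(1, y)
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y)
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def gof_specified(counts, p_star, alpha=0.05):
    """Chi-square goodness-of-fit test of H0: p_i = p_i^* for pre-specified p_i^*."""
    if len(counts) != len(p_star):
        raise ValueError("need one hypothesized probability per category")
    if abs(sum(p_star) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(counts)
    expected = [n * p for p in p_star]  # e_i = n * p_i^*
    x2 = sum((x - e) ** 2 / e for x, e in zip(counts, expected))
    p_value = chi2_sf(x2, len(counts) - 1)
    return x2, p_value, p_value < alpha


# Hypothetical: probabilities fixed in advance (e.g., from theory), then data collected
x2, p_value, reject = gof_specified([40, 35, 25], [0.5, 0.3, 0.2])
print(f"X^2 = {x2:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```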
Example
• In Excel:
• In R, again use the chisq.test() function
– Now, add a vector of the hypothesized probabilities (the p argument)
* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
A Note
• The Pearson chi-square test depends on all cells having sufficiently large expected counts: $e_i = n p_i^* \geq 5$
– If not, collapse across some categories
– E.g., count and probability for “Strongly Disagree” and “Disagree” aggregated
* Data from 2008 survey of NPS new students; remember, here we are assuming SRS and no fpc, which is actually not true for this data
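Collapsing categories can be scripted as below. This is a sketch under stated assumptions: the Likert labels, counts, and pre-specified probabilities are hypothetical, the threshold 5 is the rule of thumb above, and a merged cell gets the sum of its counts and the sum of its hypothesized probabilities, so the $p_i^*$ still sum to 1.

```python
def small_expected_cells(counts, probs, minimum=5.0):
    """Indices of cells whose expected count n * p_i^* is below `minimum`."""
    n = sum(counts)
    return [i for i, p in enumerate(probs) if n * p < minimum]


def collapse(labels, counts, probs, to_merge, new_label):
    """Merge the categories named in `to_merge` into a single category.

    Counts and hypothesized probabilities are both summed; the merged cell
    takes the position of the first merged category.
    """
    merged_count = sum(x for lab, x in zip(labels, counts) if lab in to_merge)
    merged_prob = sum(p for lab, p in zip(labels, probs) if lab in to_merge)
    out_labels, out_counts, out_probs = [], [], []
    placed = False
    for lab, x, p in zip(labels, counts, probs):
        if lab not in to_merge:
            out_labels.append(lab)
            out_counts.append(x)
            out_probs.append(p)
        elif not placed:
            out_labels.append(new_label)
            out_counts.append(merged_count)
            out_probs.append(merged_prob)
            placed = True
    return out_labels, out_counts, out_probs


# Hypothetical 5-point Likert item with pre-specified probabilities
labels = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]
counts = [2, 6, 20, 40, 32]
probs = [0.03, 0.07, 0.20, 0.40, 0.30]
print(small_expected_cells(counts, probs))   # [0]: expected count 3 < 5
labels, counts, probs = collapse(labels, counts, probs,
                                 {"Strongly Disagree", "Disagree"},
                                 "Strongly Disagree/Disagree")
print(labels, counts)
```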
Some Notation for Two-Way Contingency Tables
• Table has r rows and c columns
• Observed cell counts are $x_{ij}$, with $\sum_{i=1}^{r} \sum_{j=1}^{c} x_{ij} = n$
• Denote row sums: $x_{i\bullet} = \sum_{j=1}^{c} x_{ij}$, $i = 1, \ldots, r$
• Denote column sums: $x_{\bullet j} = \sum_{i=1}^{r} x_{ij}$, $j = 1, \ldots, c$
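In code, the margins are just row and column sums. A minimal sketch, assuming the table is stored as a list of rows; the counts are the original computer/cable TV table used in this lecture (rows: cable yes/no; columns: computer yes/no).

```python
def margins(table):
    """Row sums x_{i.}, column sums x_{.j}, and grand total n of an r x c table."""
    row_sums = [sum(row) for row in table]         # x_{i.}, i = 1..r
    col_sums = [sum(col) for col in zip(*table)]   # x_{.j}, j = 1..c
    return row_sums, col_sums, sum(row_sums)       # n = sum of all x_ij


table = [[119, 188],
         [88, 105]]
print(margins(table))   # ([307, 193], [207, 293], 500)
```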
Chi-square Test for Independence
• Independence means the probability of being in any cell is the product of the corresponding row and column probabilities

| Variable 2 / Variable 1 | “A” | “B” | Total |
|---|---|---|---|
| “X” | Pr(X) × Pr(A) | Pr(X) × Pr(B) | Pr(X) |
| “Y” | Pr(Y) × Pr(A) | Pr(Y) × Pr(B) | Pr(Y) |
| Total | Pr(A) | Pr(B) | |

Pr(Y) is the probability that a random observation is a “Y”; Pr(X) × Pr(B) is the probability that an observation is both “X” and “B”.
The Hypotheses
• Independence means $p_{ij} = p_{i\bullet} p_{\bullet j}$ for all cells in the table, where
– $p_{i\bullet}$ is the probability of having the row i characteristic
– $p_{\bullet j}$ is the probability of having the column j characteristic
• The hypotheses to be tested are
$$H_0: p_{ij} = p_{i\bullet} p_{\bullet j}, \quad i = 1, 2, \ldots, r;\ j = 1, 2, \ldots, c$$
$$H_a: p_{ij} \neq p_{i\bullet} p_{\bullet j} \text{ for some } i \text{ and } j$$
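Chi-square Test Statistic
• Test statistic:
$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$
• Under the null, the expected count is calculated as
$$e_{ij} = n \hat{p}_{i\bullet} \hat{p}_{\bullet j} = n \times \frac{x_{i\bullet}}{n} \times \frac{x_{\bullet j}}{n} = \frac{x_{i\bullet} \, x_{\bullet j}}{n}$$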
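The expected counts and the statistic can be computed directly from the margins. A minimal standard-library sketch (function names are illustrative), checked against the original computer/cable TV table from this lecture, whose $X^2$ is reported as 2.281.

```python
def expected_counts(table):
    """e_ij = x_{i.} * x_{.j} / n for every cell of an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    return [[ri * cj / n for cj in col_sums] for ri in row_sums]


def pearson_x2_table(table):
    """X^2 = sum over all r * c cells of (x_ij - e_ij)^2 / e_ij."""
    expected = expected_counts(table)
    return sum((x - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for x, e in zip(obs_row, exp_row))


table = [[119, 188], [88, 105]]   # computer/cable TV example, original data
print(expected_counts(table))
print(round(pearson_x2_table(table), 3))   # 2.281
```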
Conducting the Test
• Now, proceed as with the goodness-of-fit test
– Except the degrees of freedom are $\nu = (r-1)(c-1)$
• Large values of the chi-square statistic are evidence that the null is false
• We’ll let R do the p-value calculation
– Reject the null if p-value $< \alpha$, for some pre-determined significance level $\alpha$
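Putting the pieces together, here is a self-contained sketch of the full test of independence with $\nu = (r-1)(c-1)$ degrees of freedom. The 2 × 3 table is hypothetical, and `chi2_sf` is a standard-library stand-in (exact for integer degrees of freedom) for what R's `pchisq()` would compute.

```python
import math


def chi2_sf(x, df):
    """Pr(chi2_df >= x) for integer df via the incomplete-gamma recurrence."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-y), 1.0             # Q(1, y)
    else:
        q, s = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y)
    while s < df / 2:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1))
        s += 1
    return q


def independence_test(table, alpha=0.05):
    """Pearson chi-square test of independence for an r x c table of counts."""
    r, c = len(table), len(table[0])
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    x2 = 0.0
    for i in range(r):
        for j in range(c):
            e = row_sums[i] * col_sums[j] / n   # e_ij = x_{i.} x_{.j} / n
            x2 += (table[i][j] - e) ** 2 / e
    df = (r - 1) * (c - 1)
    p_value = chi2_sf(x2, df)
    return x2, df, p_value, p_value < alpha


table = [[10, 20, 30], [20, 20, 20]]   # hypothetical 2 x 3 table
x2, df, p_value, reject = independence_test(table)
print(f"X^2 = {x2:.3f}, df = {df}, p-value = {p_value:.4f}, reject H0: {reject}")
```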
Example: Mobile Learning Survey
• In the mobile learning devices survey, is there an association between owning a smartphone and owning a PDA?
– “Do you own a smartphone (such as iPhone, Android, and Blackberry)?” (yes/no)
– “Do you own a PDA (such as iPad, Zune HD, iPod Touch, Palm, excluding previously mentioned devices)?” (yes/no)
• Conclusion: The two sets of responses are not independent, so yes, there is an association
* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc)
What’s the Connection?
• Those who do not own a smartphone are also slightly more likely not to own a PDA
• Similarly, those who own a smartphone are slightly more likely to own a PDA
– Perhaps not a big surprise…
* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc, and data cleaned up for convenience)
Chi-square Test for Homogeneity
• The question: Is the distribution of a variable (say, on a Likert scale) the same for two or more row categories?
• Idea: Each row is a separate population, and under the null the proportion that falls in each column category is the same for every row
• Good news: The calculation is exactly the same as for the test for independence!
Example: Mobile Learning Survey
• In the mobile learning devices survey, is the age distribution different for resident and DL students?
• It sure looks different, so let’s test it formally:
* Data from 2010 mobile learning devices survey of NPS students (again, assuming SRS and no fpc, and data cleaned up for convenience)
What If I Don’t Have SRS?
• The chi-square distribution of the test statistic results from the SRS assumption
• The problem: In complex surveys, the table counts are unlikely to reflect the relative frequencies of the categories in the population
– Can’t just plug the counts into the standard $X^2$ calculations
– Doing so results in incorrect p-values
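Computer/Cable TV Example
• What if we interviewed two individuals in each house and got the same answers?

Original data:

| | Computer? Yes | Computer? No | Total |
|---|---|---|---|
| Cable? Yes | 119 | 188 | 307 |
| Cable? No | 88 | 105 | 193 |
| Total | 207 | 293 | 500 |

$X^2 = 2.281$, p-value $= 0.13$

New data:

| | Computer? Yes | Computer? No | Total |
|---|---|---|---|
| Cable? Yes | 238 | 376 | 614 |
| Cable? No | 176 | 210 | 386 |
| Total | 414 | 586 | 1000 |

$X^2 = 4.562$, p-value $= 0.03$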
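The effect of duplicating every household can be reproduced in a few lines. Doubling every count leaves every proportion unchanged but doubles each $(x_{ij} - e_{ij})^2 / e_{ij}$ term, so $X^2$ doubles and the p-value falls below 0.05 even though no new information was collected. The sketch uses the uncorrected Pearson statistic, which is what the reported values correspond to; note that R's `chisq.test()` applies Yates' continuity correction to 2 × 2 tables by default (`correct = TRUE`).

```python
import math


def x2_independence(table):
    """Uncorrected Pearson X^2 for independence in an r x c table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    total = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            e = row_sums[i] * col_sums[j] / n
            total += (x - e) ** 2 / e
    return total


def p_value_1df(x2):
    """Pr(chi2_1 >= x2) = erfc(sqrt(x2 / 2)); a 2 x 2 table has 1 df."""
    return math.erfc(math.sqrt(x2 / 2))


original = [[119, 188], [88, 105]]                    # one respondent per household
doubled = [[2 * x for x in row] for row in original]  # two identical respondents

for name, table in [("original", original), ("doubled", doubled)]:
    x2 = x2_independence(table)
    print(f"{name}: X^2 = {x2:.3f}, p-value = {p_value_1df(x2):.2f}")
# original: X^2 = 2.281, p-value = 0.13
# doubled: X^2 = 4.562, p-value = 0.03
```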
After Specifying the Sampling Design, the R “survey” Package Gets It Right
[R output not recovered. Panels: original data; new data; new data correctly analyzed with the cluster design accounted for. Annotations: “cluster” identifies the households; the p-value is right.]
Effects of Stratified Sampling Design on Hypothesis Tests and CIs (1)
• If the rows in the contingency table correspond to strata, the usual chi-square test of homogeneity is fine
– But we may want to test the association between other (non-strata) factors
• In general, stratification increases the precision of estimates
– E.g., a stratified sample of size n gives the same precision for estimating $p_{ij}$ as an SRS of size $n / d_{ij}$, where $d_{ij}$ is the design effect
Effects of Stratified Sampling Design on Hypothesis Tests and CIs (2)
• Thus p-values from the usual chi-square tests, which ignore the stratification, are conservative
– I.e., the actual p-value will be smaller than the calculated p-value
– Means if the null is rejected, that conclusion is appropriate
– However, we could also fail to reject and miss a significant result (“Type II error”)
• If we don’t reject but are close, how do we tell whether the null should be rejected?
Effects of Clustered Sampling Design on Hypothesis Tests and CIs
• Opposite effect from stratification
• As we illustrated, p-values are artificially low
– Means if we fail to reject the null, that conclusion is appropriate
• However, if we do reject the null, how do we tell whether the null really should be rejected?
• Clustering that is not accounted for can result in spurious “significant” results
– I.e., we are more likely to commit a “Type I” error
Corrections to Chi-square Tests
• Wald tests
• Bonferroni tests
• Matching moments
• Model-based methods
Think of Problem in Terms of Cell Probabilities (1)
• Use sampling weights to estimate the population quantity
$$\hat{p}_{ij} = \frac{\sum_{k \in S} w_k y_{kij}}{\sum_{k \in S} w_k}$$
where
$$y_{kij} = \begin{cases} 1 & \text{if observation unit } k \text{ is in cell } (i, j) \\ 0 & \text{otherwise} \end{cases}$$
• Thus
$$\hat{p}_{ij} = \frac{\text{sum of weights for observation units in cell } (i, j)}{\text{sum of weights for all observation units in sample}}$$
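The weighted estimator is a ratio of weight sums. A minimal sketch, assuming each sampled unit is given as a (row level, column level, weight) tuple; the five units and their weights are hypothetical.

```python
from collections import defaultdict


def weighted_cell_props(records):
    """Estimate p_ij as (sum of weights in cell (i, j)) / (sum of all weights).

    `records` is an iterable of (row_level, col_level, weight) tuples, one per
    sampled unit; the weight plays the role of the sampling weight w_k.
    """
    cell_weight = defaultdict(float)
    total_weight = 0.0
    for row_level, col_level, w in records:
        cell_weight[(row_level, col_level)] += w   # accumulates sum of w_k * y_kij
        total_weight += w                          # accumulates sum of w_k
    return {cell: s / total_weight for cell, s in cell_weight.items()}


# Hypothetical sample of five units: (cable?, computer?, sampling weight)
sample = [("yes", "yes", 2.0), ("yes", "no", 1.0), ("no", "no", 1.0),
          ("no", "yes", 4.0), ("yes", "yes", 2.0)]
print(weighted_cell_props(sample))
```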
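Think of Problem in Terms of Cell Probabilities (2)
• So, using the $\hat{p}_{ij}$, construct the table
• Can express the test statistics as
$$X^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_{\text{all cells}} \frac{(n\hat{p}_{ij} - np_{ij})^2}{np_{ij}} = n \sum_{\text{all cells}} \frac{(\hat{p}_{ij} - p_{ij})^2}{p_{ij}}$$
$$G^2 = 2 \sum_{\text{all cells}} \text{observed} \times \ln\left(\frac{\text{observed}}{\text{expected}}\right) = 2n \sum_{\text{all cells}} \hat{p}_{ij} \ln\left(\frac{\hat{p}_{ij}}{p_{ij}}\right)$$
where $p_{ij}$ is the cell probability under the null hypothesis (for independence, estimated by $\hat{p}_{i+}\hat{p}_{+j}$)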
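Both statistics can be written as functions of the cell proportions. This sketch assumes the null cell probabilities for independence are estimated by $\hat{p}_{i+}\hat{p}_{+j}$. With the unweighted proportions from the original computer/cable TV table, it reproduces the count-based $X^2$ (about 2.281), and $G^2$ comes out close to it. Plugging in survey-weighted $\hat{p}_{ij}$ works the same way, but referring $X^2$ or $G^2$ to a chi-square distribution still assumes SRS, which is the problem the corrections in this lecture address.

```python
import math


def x2_g2_from_props(p_hat, n):
    """X^2 and G^2 for independence, written in terms of cell proportions.

    p_hat: r x c nested list of estimated cell proportions (summing to 1).
    n:     the sample size used to scale the statistics.
    The null cell probability p_ij is estimated by p_hat_{i+} * p_hat_{+j}.
    """
    row = [sum(r) for r in p_hat]
    col = [sum(c) for c in zip(*p_hat)]
    x2_sum = g2_sum = 0.0
    for i, r in enumerate(p_hat):
        for j, p_obs in enumerate(r):
            p_null = row[i] * col[j]
            x2_sum += (p_obs - p_null) ** 2 / p_null
            if p_obs > 0:                     # treat 0 * ln(0) as 0
                g2_sum += p_obs * math.log(p_obs / p_null)
    return n * x2_sum, 2 * n * g2_sum


# Proportions from the original computer/cable TV table (n = 500)
counts = [[119, 188], [88, 105]]
p_hat = [[x / 500 for x in row] for row in counts]
x2, g2 = x2_g2_from_props(p_hat, 500)
print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}")
```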
Wald Tests (1)
• For a 2x2 table, the null hypothesis of independence is
$$p_{ij} = p_{i+} p_{+j}, \quad 1 \leq i, j \leq 2$$
• This is equivalent to testing
$$H_0: p_{11}p_{22} - p_{12}p_{21} = 0 \qquad \text{vs.} \qquad H_a: p_{11}p_{22} - p_{12}p_{21} \neq 0$$
• Let
$$\hat{\theta} = \hat{p}_{11}\hat{p}_{22} - \hat{p}_{12}\hat{p}_{21}$$
Wald Tests (2)
• Then for large samples, under the null,
$$\frac{\hat{\theta}}{\sqrt{\hat{V}(\hat{\theta})}}$$
follows an approximately standard normal distribution
• Equivalently, $\hat{\theta}^2 / \hat{V}(\hat{\theta})$ follows a chi-square distribution with 1 degree of freedom
• Must estimate the variance $V(\hat{\theta})$ appropriately
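A minimal sketch of the 2 × 2 Wald test, assuming the caller already has a design-based estimate of $V(\hat{\theta})$ (for example, from a resampling method); the function does not estimate the variance itself. The proportions and variance in the usage line are hypothetical.

```python
import math


def wald_2x2(p_hat, var_theta):
    """Wald test of independence for a 2 x 2 table.

    p_hat:     2 x 2 nested list of estimated (e.g., weighted) cell proportions.
    var_theta: an estimate of V(theta-hat) that reflects the sampling design,
               supplied by the caller.
    Returns theta-hat, the z statistic, z^2 (chi-square, 1 df) and the
    two-sided p-value from the standard normal approximation.
    """
    theta = p_hat[0][0] * p_hat[1][1] - p_hat[0][1] * p_hat[1][0]
    z = theta / math.sqrt(var_theta)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return theta, z, z * z, p_value


# Hypothetical proportions and design-based variance estimate
theta, z, x2, p_value = wald_2x2([[0.30, 0.20], [0.20, 0.30]], var_theta=0.0004)
print(f"theta-hat = {theta:.3f}, z = {z:.2f}, z^2 = {x2:.2f}, p-value = {p_value:.4f}")
```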
Example: Survey of Youth in Custody
• 1987 survey of incarcerated youth
– Sample of n = 2,588 juveniles and young adults in long-term, state-operated juvenile institutions
– Interviewed about family background, previous criminal history, and drug and alcohol use
– Selected variables are contained in the syc data frame
Example: Survey of Youth in Custody (2)
• Is there an association between:
– “Was anyone in your family ever incarcerated?”
– “Have you ever been put on probation or sent to a correctional institution for a violent offense?”
• Table with sum of weights:
Example: Survey of Youth in Custody (3)
Incorrect analyses
• Raw counts: Do not appropriately reflect the population distribution
• Weighted counts: Without adjustment, the sample size is overinflated
Example: Survey of Youth in Custody (4)
• Results in the following estimated proportions:
• Test statistic: $\hat{\theta} = \hat{p}_{11}\hat{p}_{22} - \hat{p}_{12}\hat{p}_{21} = 0.0053$
• How to estimate the variance?
Example: Survey of Youth in Custody (5)
• Use resampling method:
• Thus, the standard error of $\hat{\theta}$ is $\sqrt{\hat{V}(\hat{\theta})} = 0.0158/\sqrt{7} = 0.0060$
• So the test statistic is
$$t = \frac{\hat{\theta}}{\sqrt{\hat{V}(\hat{\theta})}} = \frac{0.0053}{0.0060} = 0.89$$
• p-value: $\Pr(|T_{\nu=6}| > t) = 2 \times \Pr(T_6 > 0.89) = 0.41$
• Result: No evidence of association
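A sketch that rechecks the reported numbers in standard-library Python. It assumes the resampling method is the random group method with R = 7 groups, which is one reading of the slide's figures (0.0158, 7, and ν = 6 = # groups − 1); the standard error is then the standard deviation of the group estimates divided by √R. The two-sided t p-value uses the classical closed-form series for Student's t with integer degrees of freedom rather than a library function.

```python
import math
import statistics


def random_group_se(group_estimates):
    """Random-group standard error: SD of the R group estimates / sqrt(R)."""
    return statistics.stdev(group_estimates) / math.sqrt(len(group_estimates))


def t_two_sided_p(t, nu):
    """Pr(|T_nu| > |t|) for Student's t with integer nu >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(nu))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if nu == 1:
        a = 2 * theta / math.pi                       # Cauchy case
    elif nu % 2 == 0:
        term = total = 1.0
        for k in range(1, nu // 2):                   # terms up to cos^(nu - 2)
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        a = s * total
    else:
        term = total = 1.0
        for k in range(1, (nu - 1) // 2):            # terms up to cos^(nu - 3)
            term *= c2 * (2 * k) / (2 * k + 1)
            total += term
        a = 2 / math.pi * (theta + s * math.cos(theta) * total)
    return 1.0 - a                                    # a = Pr(|T| < |t|)


# Assumed reading of the slide: SD across 7 random groups = 0.0158, theta-hat = 0.0053
se = 0.0158 / math.sqrt(7)
t = 0.0053 / se
p = t_two_sided_p(t, nu=6)
print(f"SE = {se:.4f}, t = {t:.2f}, p-value = {p:.2f}")   # SE = 0.0060, t = 0.89, p-value = 0.41
```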
Example: Survey of Youth in Custody (6)
• Doing the calculations in R:
– Results are consistent with the book, but I’m not sure how R is doing the calculations…
Better p-value?
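Wald Tests for Larger Tables
• Let $\boldsymbol{\theta} = [\theta_{11}, \theta_{12}, \ldots, \theta_{(r-1)(c-1)}]^T$, where $\theta_{ij} = p_{ij} - p_{i+}p_{+j}$
• Hypotheses are
$$H_0: \boldsymbol{\theta} = \mathbf{0} \qquad H_a: \boldsymbol{\theta} \neq \mathbf{0} \ (\theta_{ij} \neq 0 \text{ for one or more cells})$$
• Wald test statistic is $X_W^2 = \hat{\boldsymbol{\theta}}^T \, \hat{V}(\hat{\boldsymbol{\theta}})^{-1} \, \hat{\boldsymbol{\theta}}$, where $\hat{V}(\hat{\boldsymbol{\theta}})$ is the estimated covariance matrix
• Problem is, need a large number of PSUs to estimate the covariance matrix
– E.g., a 4x4 table results in a 9x9 covariance matrix that requires estimation of 45 variances/covariances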
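A standard-library sketch of the Wald statistic for larger tables. It computes $\hat{\boldsymbol{\theta}}^T \hat{V}^{-1} \hat{\boldsymbol{\theta}}$ with a linear solve rather than an explicit matrix inverse, plus a helper that counts the distinct variances and covariances to be estimated, $d(d+1)/2$ for $d = (r-1)(c-1)$. The $\hat{\boldsymbol{\theta}}$ and $\hat{V}$ values are hypothetical; in practice $\hat{V}$ would come from a design-based method.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-15:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wald_x2(theta_hat, v_hat):
    """X_W^2 = theta' V^{-1} theta, computed as theta' (V^{-1} theta)."""
    return sum(t * s for t, s in zip(theta_hat, solve(v_hat, theta_hat)))


def n_cov_params(r, c):
    """Length d of theta and number of distinct entries in its covariance matrix."""
    d = (r - 1) * (c - 1)
    return d, d * (d + 1) // 2


theta_hat = [0.02, -0.01]                        # hypothetical estimates
v_hat = [[1.0e-4, 2.0e-5], [2.0e-5, 5.0e-5]]     # hypothetical covariance matrix
print(wald_x2(theta_hat, v_hat))
print(n_cov_params(4, 4))   # (9, 45)
```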
Bonferroni Tests (1)
• Alternative to the Wald test
• Idea is to separately (and conservatively) test each $\theta_{ij}$, $1 \leq i \leq r-1$, $1 \leq j \leq c-1$
• Test each of the $m = (r-1)(c-1)$ hypotheses separately at the $\alpha/m$ significance level
• Reject the null that the variables are independent if any of the m separate tests reject
Bonferroni Tests (2)
• Specifically, reject $H_0: \boldsymbol{\theta} = \mathbf{0}$ if
$$\frac{|\hat{\theta}_{ij}|}{\sqrt{\hat{V}(\hat{\theta}_{ij})}} > t_{\alpha/(2m),\,\kappa}$$
for any i and j, where $\kappa$ is the appropriate degrees of freedom
– Resampling: # resample groups − 1
– Another method: # PSUs − # strata
• Lohr says the method works well in practice
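A sketch of the Bonferroni procedure in standard-library Python. Assumptions: the estimates $\hat{\theta}_{ij}$ and their design-based standard errors are supplied by the caller, $\kappa$ is chosen as described above, and the t quantile is found by bisection on a closed-form Student t tail probability (valid for integer degrees of freedom) instead of a library call such as R's `qt()`. The usage line computes $t_{0.05/(2 \times 2),\,6}$, which the Bonferroni example in this lecture reports as 2.97.

```python
import math


def t_two_sided_p(t, nu):
    """Pr(|T_nu| > |t|) for Student's t with integer nu >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(nu))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    if nu == 1:
        a = 2 * theta / math.pi
    elif nu % 2 == 0:
        term = total = 1.0
        for k in range(1, nu // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        a = s * total
    else:
        term = total = 1.0
        for k in range(1, (nu - 1) // 2):
            term *= c2 * (2 * k) / (2 * k + 1)
            total += term
        a = 2 / math.pi * (theta + s * math.cos(theta) * total)
    return 1.0 - a


def t_critical(upper_tail, nu):
    """Quantile t with Pr(T_nu > t) = upper_tail, found by bisection."""
    target = 2 * upper_tail                 # two-sided p-value at the quantile
    lo, hi = 0.0, 1.0
    while t_two_sided_p(hi, nu) > target:   # expand until the root is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, nu) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def bonferroni_test(theta_hat, se, kappa, alpha=0.05):
    """Reject H0: theta = 0 if any |theta_ij| / se_ij exceeds t_{alpha/(2m), kappa}."""
    m = len(theta_hat)
    crit = t_critical(alpha / (2 * m), kappa)
    ratios = [abs(t) / s for t, s in zip(theta_hat, se)]
    return crit, ratios, any(r > crit for r in ratios)


print(round(t_critical(0.05 / (2 * 2), 6), 2))   # 2.97 (m = 2, kappa = 6)
```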
Example: Survey of Youth in Custody (1)
• Is there a relationship between age and whether a youth was sent to an institution for a violent offense?
Example: Survey of Youth in Custody (2)
• Hypotheses are
$$H_0: \theta_{11} = p_{11} - p_{1+}p_{+1} = 0 \quad \text{and} \quad \theta_{12} = p_{12} - p_{1+}p_{+2} = 0$$
• What happens if clustering is ignored?
– With n = 2,621, we have
$$X^2 = n \sum_{i=1}^{2} \sum_{j=1}^{3} \frac{(\hat{p}_{ij} - \hat{p}_{i+}\hat{p}_{+j})^2}{\hat{p}_{i+}\hat{p}_{+j}} = 34$$
which gives an (incorrect) p-value of ≈ 0
• Compare to a Bonferroni test…
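Example: Survey of Youth in Custody (3)
• For these data, $\hat{\theta}_{11} = 0.013$ and $\hat{\theta}_{12} = 0.0119$
• Using resampling, we get the table:
• And from this, $\text{s.e.}(\hat{\theta}_{11}) = 0.0074$ and $\text{s.e.}(\hat{\theta}_{12}) = 0.0035$
• Thus
$$\frac{|\hat{\theta}_{11}|}{\text{s.e.}(\hat{\theta}_{11})} = 1.8, \qquad \frac{|\hat{\theta}_{12}|}{\text{s.e.}(\hat{\theta}_{12})} = 3.4$$
and $t_{0.05/(2 \times 2),\,\nu = 6} = 2.97$
• Since 3.4 > 2.97, reject the null (more appropriately)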
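The decision on this slide can be checked with a few lines of arithmetic. The sketch uses the reported estimates, standard errors, and critical value as given; nothing is re-estimated here.

```python
# Values reported for the age-by-violent-offense example
theta_hat = {"theta_11": 0.013, "theta_12": 0.0119}
se = {"theta_11": 0.0074, "theta_12": 0.0035}
t_crit = 2.97        # t_{0.05/(2*2), 6} as given in the lecture (m = 2, nu = 6)

ratios = {name: abs(theta_hat[name]) / se[name] for name in theta_hat}
reject = any(r > t_crit for r in ratios.values())
for name, r in ratios.items():
    print(f"{name}: |estimate| / s.e. = {r:.1f}")
print("Reject H0 (Bonferroni):", reject)
```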
What We Have Just Learned
• Discussed tests for contingency tables
– One-way chi-square goodness-of-fit tests: homogeneity; other distributions
– Two-way chi-square tests: independence; homogeneity
• Gained some insight into
– What to do about categorical data analysis for complex designs
– How complex designs affect chi-square hypothesis tests