A. Analysis of count data
Introduction to log-linear models
2
Log-linear analysis
• Contingency-table analysis
• Categorical data analysis
• Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
• Analysis of cross-classified data
• Multivariate analysis of qualitative data (Goodman, 1978)
• Count data analysis
3
Contrast Coding
Log-linear models for two-way tables
Saturated log-linear model:
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
• $\mu$: overall effect (level)
• $\mu_i^A$, $\mu_j^B$: main effects (marginal freq.)
• $\mu_{ij}^{AB}$: interaction effect
In case of a 2 x 2 table:
• 4 observations
• 9 parameters
• Normalisation constraints
4
Survey: leaving parental home in the Netherlands
Number leaving parental home, by age and sex, 1961 birth cohort

Age        Female   Male   Total
<20           135     74     209
>=20          143    178     321
Total         278    252     530
Censored       13     40      53
Total         291    292     583

The survey (Sept. 1987 - Febr. 1988):
• Sample of 583 young adults born in 1961
• 530 left home before survey
• 53 censored cases
5
Descriptive statistics
• Counts
• Percentages
• Odds of leaving home early rather than late

Percentages
          Column % (within sex)      Row % (within age)
Age       Female   Male   Total      Female   Male   Total
<20         48.6   29.4    39.4        64.6   35.4   100.0
>=20        51.4   70.6    60.6        44.5   55.5   100.0
Total      100.0  100.0   100.0        52.5   47.5   100.0

Odds of leaving home early (<20) rather than late (>=20)
          Female    Male     Total
Odds      0.9441    0.4157   0.6511
Odds ratio (ref. cat.: males): 2.271
Leaving home
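As a small illustration, the odds and the odds ratio on this slide can be recomputed from the four observed counts. This is only a sketch; the dictionary layout and the function name `odds_early` are chosen for the example and are not part of the original slides.

```python
# Observed counts of home-leavers (1961 birth cohort), by sex and age at leaving.
counts = {
    "female": {"early": 135, "late": 143},   # early: <20, late: >=20
    "male": {"early": 74, "late": 178},
}

def odds_early(cell):
    """Odds of leaving home early (<20) rather than late (>=20)."""
    return cell["early"] / cell["late"]

odds_female = odds_early(counts["female"])   # 135 / 143
odds_male = odds_early(counts["male"])       # 74 / 178
odds_ratio = odds_female / odds_male         # reference category: males

print(round(odds_female, 4), round(odds_male, 4), round(odds_ratio, 3))
```

The printed values can be compared with the Odds row and the odds ratio quoted above.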
6
Log-linear models for two-way tables
4 models
Model 1: Null model or overall effect model
All categories are equiprobable (an observation is equally likely to fall into any cell)
$\ln \lambda_{ij} = \mu$ for all i and j
$\mu$ = 4.887, s.e. 0.0434
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column)
exp(4.887) = 132.5 = 530/4
7
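$$\mathrm{Var}[\ln \lambda_{ij}] = 1/\lambda_{ij}$$
where $\lambda_{ij}$ is a cell frequency generated by a Poisson process, and $\mathrm{Var}[aX] = a^2\,\mathrm{Var}[X]$ where a is a constant (e.g. Fingleton, 1984, p. 29)
$$\mu = \frac{1}{4} \sum_{ij} \ln \lambda_{ij}$$
$$\mathrm{Var}[\mu] = \mathrm{Var}\left[\frac{1}{4} \sum_{ij} \ln \lambda_{ij}\right] = \left(\frac{1}{4}\right)^2 \sum_{ij} \mathrm{Var}[\ln \lambda_{ij}] = \left(\frac{1}{4}\right)^2 \sum_{ij} \frac{1}{\lambda_{ij}}$$
$$\mathrm{Var}[\mu] = \left(\frac{1}{4}\right)^2 \left[\frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50} + \frac{1}{132.50}\right]$$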
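The variance formula can be checked numerically for the null model. The sketch below assumes only the four observed counts of the leaving-home table; the variable names are illustrative.

```python
import math

# Observed counts: females <20, males <20, females >=20, males >=20.
observed = [135, 74, 143, 178]
n_cells = len(observed)

# Under the null model every cell has the same expected count.
expected = sum(observed) / n_cells          # 530 / 4 = 132.5
mu = math.log(expected)                     # overall effect

# Var[mu] = (1/4)^2 * sum over cells of 1/lambda_ij
var_mu = (1 / n_cells) ** 2 * sum(1 / expected for _ in observed)
se_mu = math.sqrt(var_mu)

print(round(mu, 3), round(se_mu, 4))
```

The result can be compared with the estimate 4.887 and s.e. 0.0434 reported for Model 1.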
8
Log-linear models for two-way tables
Model 2: B null model
Categories of variable B (sex) are equiprobable within levels of variable A (age)
$\ln \lambda_{ij} = \mu + \mu_i^A$ for all j

GLIM
Parameter                    Estimate   s.e.      Exp(parameter)
Overall effect ($\mu$)         4.649    0.06914   104.5
TIME(1) ($\mu_1^A$)            0.0000
TIME(2) ($\mu_2^A$)            0.4291   0.08886   1.536
9
Log-linear models for two-way tables
Model 3: A null model
Categories of variable A (age) are equiprobable within levels of variable B (sex)
$\ln \lambda_{ij} = \mu + \mu_j^B$ for all i

SPSS
Parameter                    Estimate   s.e.     Exp(parameter)
Overall effect ($\mu$)         5.773    0.0558   321.5
TIME(1) ($\mu_1^A$)           -0.4283   0.0888   0.6516
TIME(2) ($\mu_2^A$)            0.0000
10
Log-linear models for two-way tables
Model 4: independence model (unsaturated model)
Categories of variable B (sex) are not equiprobable but the probability is independent of levels of variable A (time)
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$
$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$$

GLIM
Parameter                    Estimate   s.e.     Exp(parameter)
Overall effect ($\mu$)         4.697    0.0806   109.62
TIME(2) ($\mu_2^A$)            0.429    0.0889   1.536
SEX(2) ($\mu_2^B$)            -0.098    0.0870   0.906
11
LOG-LINEAR MODEL: predictions
Females leaving home early: 109.62
Females leaving home late: 109.62 * 1.536 = 168.37
Males leaving home early: 109.62 * 0.906 = 99.37
Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
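The same predictions follow directly from the table margins, since under independence the expected count is (row total x column total) / grand total. A minimal sketch, with illustrative variable names:

```python
# Observed counts; rows: age <20, >=20; columns: female, male.
observed = [[135, 74],
            [143, 178]]

row_totals = [sum(row) for row in observed]          # [209, 321]
col_totals = [sum(col) for col in zip(*observed)]    # [278, 252]
n = sum(row_totals)                                  # 530

# Expected count under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Multiplicative parameters with the first categories as reference.
base = expected[0][0]                                # females leaving early
time2 = row_totals[1] / row_totals[0]                # late relative to early
sex2 = col_totals[1] / col_totals[0]                 # males relative to females

for row in expected:
    print([round(x, 2) for x in row])
print(round(base, 2), round(time2, 3), round(sex2, 3))
```

Multiplying `base` by `time2` and/or `sex2` reproduces each expected cell, which is the calculation laid out above.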
12
SPSS
Parameter                          Estimate   SE
1  Overall effect ($\mu$)            5.0280   .0721
2  Time(1) ($\mu_1^A$)               -.4291   .0889
3  Time(2) ($\mu_2^A$)                .0000   .
4  Sex(1) ($\mu_1^B$)                 .0982   .0870
5  Sex(2) ($\mu_2^B$)                 .0000   .
13
Log-linear models for two-way tables
Model 5: saturated model
The values of categories of variable B (sex) depend on levels of variable A (time)
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$

GLIM
Parameter                             Estimate   s.e.
Overall effect ($\mu$)                  4.905    0.08607
TIME(2) ($\mu_2^A$)                     0.05757  0.1200
SEX(2) ($\mu_2^B$)                     -0.6012   0.1446
TIME(2).SEX(2) ($\mu_{22}^{AB}$)        0.8201   0.1831
14
SPSS
Parameter                                 Estimate   SE
1  Overall effect ($\mu$)                   5.1846   .0748
2  Time(1) ($\mu_1^A$)                      -.8738   .1379
3  Time(2) ($\mu_2^A$)                       .0000   .
4  Sex(1) ($\mu_1^B$)                       -.2183   .1121
5  Sex(2) ($\mu_2^B$)                        .0000   .
6  Time(1) * Sex(1) ($\mu_{11}^{AB}$)        .8164   .1827
7  Time(1) * Sex(2) ($\mu_{12}^{AB}$)        .0000   .
8  Time(2) * Sex(1) ($\mu_{21}^{AB}$)        .0000   .
9  Time(2) * Sex(2) ($\mu_{22}^{AB}$)        .0000   .
15
LOG-LINEAR MODEL: predictions
Expected frequencies

Cell            Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Fem_<20   F11      135      132.50    104.50    139.00    109.63    135.00
Mal_<20   F12       74      132.50    104.50    126.00     99.37     74.00
Fem_>20   F21      143      132.50    160.50    139.00    168.37    143.00
Mal_>20   F22      178      132.50    160.50    126.00    152.63    178.00
16
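Relation log-linear model and Poisson regression model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
$$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2j} + \beta_3 x_{3ij}$$
$x_{1i}$, $x_{2j}$ and $x_{3ij}$ are dummy variables (0 if i or j is equal to 1 and 1 if i or j is equal to 2), and the interaction variable is $x_{3ij} = x_{1i} \cdot x_{2j}$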
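To make the link concrete, the sketch below writes the saturated model for the leaving-home table as a regression on 0/1 dummy variables and solves the four equations by substitution. The coefficient names `b0` to `b3` and the helper functions are chosen for this example only; a real analysis would fit the model by maximum likelihood in a statistics package, which for the saturated model gives the same values.

```python
import math

# Observed counts indexed by (age category i, sex category j);
# i = 1: <20, i = 2: >=20; j = 1: female, j = 2: male.
counts = {(1, 1): 135, (1, 2): 74, (2, 1): 143, (2, 2): 178}

def dummy(level):
    """Dummy variable: 0 for category 1, 1 for category 2."""
    return 0 if level == 1 else 1

log_counts = {cell: math.log(n) for cell, n in counts.items()}

# Saturated model: as many coefficients as cells, so the fit is exact.
b0 = log_counts[(1, 1)]                       # overall effect (reference cell)
b1 = log_counts[(2, 1)] - b0                  # age >=20
b2 = log_counts[(1, 2)] - b0                  # male
b3 = log_counts[(2, 2)] - b0 - b1 - b2        # interaction

def predict(i, j):
    """Expected count from the dummy-coded linear predictor."""
    x1, x2 = dummy(i), dummy(j)
    return math.exp(b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2)

print([round(b, 4) for b in (b0, b1, b2, b3)])
```

The four coefficients can be compared with the GLIM estimates reported for the saturated model (overall effect, TIME(2), SEX(2) and TIME(2).SEX(2)).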
17
Observed

Age      F      M      Total
< 20     135    74     209
> 20     143    178    321
Total    278    252    530
18
Model 1: Null Model
$\ln \lambda_{ij} = \mu$

Age      F        M        Total
< 20     132.5    132.5    265
> 20     132.5    132.5    265
Total    265      265      530
19
Model 2: B Null Model (sex equiprobable)
$\ln \lambda_{ij} = \mu + \mu_i^A$

Age      F        M        Total
< 20     104.5    104.5    209
> 20     160.5    160.5    321
Total    265      265      530
20
Model 3: A Null Model (age equiprobable)
$\ln \lambda_{ij} = \mu + \mu_j^B$

Age      F      M      Total
< 20     139    126    265
> 20     139    126    265
Total    278    252    530
21
Model 4: Independence Model (no interaction)
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$

Age      F         M         Total
< 20     109.63    99.37     209
> 20     168.37    152.63    321
Total    278       252       530
22
Model 5: Saturated Model
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

Age      F      M      Total
< 20     135    74     209
> 20     143    178    321
Total    278    252    530
23
Log-linear model: fit a model to a table of frequencies
Data: survey of political attitudes of British electors

OBSERVED FREQUENCIES FOR VOTE by gender
Party           Male   Female   Total
Conservative     279      352     631
Labour           335      291     626
Total            614      643    1257

Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106]. (From Butler and Stokes, 'Political change in Britain', Macmillan, 2nd edition, 1974.)
24
The classical approach
Geometric means (Birch, 1963)
Effect coding (mean is ref. cat.)
Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25:220-233
25
The basic model

Logarithm of frequencies
Party           Male      Female    Total
Conservative    5.6312    5.8636    11.4948
Labour          5.8141    5.6733    11.4875
Total          11.4453   11.5370    22.9823

Overall effect: 22.9823/4 = 5.7456
Effect of party:
  Conservative: 11.4948/2 - 5.7456 = 0.0018
  Labour: 11.4875/2 - 5.7456 = -0.0018
Effect of gender:
  Male: 11.4453/2 - 5.7456 = -0.0229
  Female: 11.5370/2 - 5.7456 = 0.0229
Interaction effects (Gender-Party interaction effect):
  Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
  Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
  Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
  Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
Political attitudes
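The classical (geometric mean) calculation amounts to taking row, column and grand means of the log frequencies. The following sketch reproduces it for the vote-by-gender table; the variable names are illustrative and not taken from the slides.

```python
import math

# Observed counts; rows: Conservative, Labour; columns: male, female.
counts = [[279, 352],
          [335, 291]]

logs = [[math.log(n) for n in row] for row in counts]

# Overall effect: mean of the four log frequencies.
mu = sum(sum(row) for row in logs) / 4

# Main effects: row (party) and column (gender) means minus the overall mean.
party = [sum(row) / 2 - mu for row in logs]
gender = [sum(col) / 2 - mu for col in zip(*logs)]

# Interaction: what remains of each log frequency after the other effects.
interaction = [[logs[i][j] - mu - party[i] - gender[j] for j in range(2)]
               for i in range(2)]

print(round(mu, 4), [round(p, 4) for p in party], [round(g, 4) for g in gender])
print([[round(x, 4) for x in row] for row in interaction])
```

By construction the effects within each set sum to zero, which is the normalisation that effect coding imposes.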
26
The basic model (Effect Coding: Mean)
Birch, M.W. (1963) 'Maximum likelihood in three-way contingency tables', J. Royal Stat. Soc. (B), 25:220-233

Main effect ($\mu$)                                    5.7456
Party effect ($\mu_i^A$)
  Conservative                                         0.0018
  Labour                                              -0.0018
Gender effect ($\mu_j^B$)
  Male                                                -0.0229
  Female                                               0.0229
Gender-Party interaction ($\mu_{ij}^{AB}$)
  Male conservative                                   -0.0933
  Female conservative                                  0.0933
  Male labour                                          0.0933
  Female labour                                       -0.0933

Coding: effect coding
Parameters are subject to constraints (normalisation constraints):
$$\sum_i \mu_i^A = 0, \qquad \sum_j \mu_j^B = 0, \qquad \sum_i \mu_{ij}^{AB} = \sum_j \mu_{ij}^{AB} = 0$$
Only first-order contrasts can be estimated: $\mu_1^A - \mu_2^A$
27
The basic model (GLIM)

                                                Estimate   S.E.
Main effect ($\mu$)                               5.6312   0.0599
Party effect ($\mu_i^A$)
  Conservative                                    0.0000   .
  Labour                                          0.1829   0.0811
Gender effect ($\mu_j^B$)
  Male                                            0.0000   .
  Female                                          0.2324   0.0802
Gender-Party interaction ($\mu_{ij}^{AB}$)
  Male conservative                               0.0000   .
  Female conservative                             0.0000   .
  Male labour                                     0.0000   .
  Female labour                                  -0.3732   0.1133
28
The basic model (SPSS)

                                                     Asymptotic 95% CI
                              Estimate   SE          Lower    Upper
Main effect                     5.6750   0.0586       5.56     5.79
Party effect
  Conservative                  0.1900   0.0792       0.03     0.35
  Labour                        0.0000   .            .        .
Gender effect
  Male                          0.1406   0.0801      -0.02     0.30
  Female                        0.0000   .            .        .
Gender-Party interaction
  Male conservative            -0.3726   0.1133      -0.59    -0.15
  Female conservative           0.0000   .            .        .
  Male labour                   0.0000   .            .        .
  Female labour                 0.0000   .            .        .
29
The basic model (1)
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
$\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
$\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
$\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
$\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

Logarithm of frequencies
Party           Male      Female    Total
Conservative    5.6312    5.8636    11.4948
Labour          5.8141    5.6733    11.4875
Total          11.4453   11.5370    22.9823

$$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$$
30
The design-matrix approach
31
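I. Design matrix: Effect Coding
Unsaturated log-linear model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$
$$\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_2^A \\ u_1^B \\ u_2^B \end{pmatrix}$$
• Number of parameters exceeds number of equations: need for additional equations
• $X'X$ is singular, so $(X'X)^{-1}$ does not exist: identify linear dependencies
$$Y = X\mu, \qquad \mu = (X'X)^{-1} X'Y$$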
32
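I. Design matrix
Unsaturated log-linear model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$$
Additional equations:
$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B$$
$$\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$
Coding!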
33
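$$\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix}$$
3 unknowns, 3 equations
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & -1 \\ 1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \end{pmatrix}$$
$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$, where $\lambda_{ij}$ is the frequency predicted by the model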
34
$$\lambda_{11} = \frac{F_{1+}}{F_{++}} F_{+1} = \frac{631}{1257} \cdot 614 = 308.22 \qquad \lambda_{12} = \frac{F_{1+}}{F_{++}} F_{+2} = \frac{631}{1257} \cdot 643 = 322.78$$
$$\lambda_{21} = \frac{F_{2+}}{F_{++}} F_{+1} = \frac{626}{1257} \cdot 614 = 305.78 \qquad \lambda_{22} = \frac{F_{2+}}{F_{++}} F_{+2} = \frac{626}{1257} \cdot 643 = 320.22$$

OBSERVED FREQUENCIES FOR VOTE by gender
Party           Male      Female    Total
Conservative    279       352       631
Labour          335       291       626
Total           614       643       1257

PREDICTED FREQUENCIES FOR VOTE by gender
Party           Male      Female    Total
Conservative    308.22    322.78    631.00
Labour          305.78    320.22    626.00
Total           614.00    643.00    1257.00
35
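$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \end{pmatrix} = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} \ln 308.22 \\ \ln 322.78 \\ \ln 305.78 \end{pmatrix}$$
$$= \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & -0.5 \\ 0.5 & -0.5 & 0 \end{pmatrix} \begin{pmatrix} 5.7308 \\ 5.7770 \\ 5.7229 \end{pmatrix} = \begin{pmatrix} 5.74995 \\ 0.00395 \\ -0.02310 \end{pmatrix}$$
$$\begin{pmatrix} \tau \\ \tau_1^A \\ \tau_1^B \end{pmatrix} = \begin{pmatrix} \exp[u] \\ \exp[u_1^A] \\ \exp[u_1^B] \end{pmatrix} = \begin{pmatrix} 314.17 \\ 1.0040 \\ 0.9772 \end{pmatrix}$$
$\lambda_{11} = \tau\,\tau_1^A\,\tau_1^B$ = 314.17 * 1.0040 * 0.9772 = 308.23
$\lambda_{21} = \tau\,[1/\tau_1^A]\,\tau_1^B$ = 314.17 * [1/1.0040] * 0.9772 = 305.78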
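The calculation for the unsaturated model can be traced in a few lines: compute the predicted frequencies from the margins, apply the inverse design matrix row by row, and exponentiate. This is a sketch with illustrative names; small differences from the slide values come from rounding of the intermediate logarithms.

```python
import math

# Margins of the vote-by-gender table.
row_totals = [631, 626]        # Conservative, Labour
col_totals = [614, 643]        # male, female
n = 1257

# Predicted frequencies under the independence (unsaturated) model.
lam = [[r * c / n for c in col_totals] for r in row_totals]
y11, y12, y21 = (math.log(lam[0][0]), math.log(lam[0][1]), math.log(lam[1][0]))

# Rows of the inverse design matrix applied to (ln l11, ln l12, ln l21).
u = 0.5 * (y12 + y21)          # overall effect
u_a1 = 0.5 * (y11 - y21)       # party effect, Conservative
u_b1 = 0.5 * (y11 - y12)       # gender effect, male

# Multiplicative parameters.
tau, tau_a1, tau_b1 = math.exp(u), math.exp(u_a1), math.exp(u_b1)

lam11 = tau * tau_a1 * tau_b1          # effect coding: category 1 gets +u
lam21 = tau * (1 / tau_a1) * tau_b1    # category 2 gets -u, i.e. 1/tau

print(round(u, 4), round(u_a1, 4), round(u_b1, 4))
print(round(lam11, 2), round(lam21, 2))
```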
36
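Design matrix
Saturated log-linear model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
$$\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$
$$\mu_2^A = -\mu_1^A, \qquad \mu_2^B = -\mu_1^B$$
$$\mu_{12}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{21}^{AB} = -\mu_{11}^{AB}, \qquad \mu_{22}^{AB} = \mu_{11}^{AB}$$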
37
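$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}^{-1} \begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix}$$
$$= \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & -0.25 & -0.25 \\ 0.25 & -0.25 & 0.25 & -0.25 \\ 0.25 & -0.25 & -0.25 & 0.25 \end{pmatrix} \begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix} = \begin{pmatrix} 5.74555 \\ 0.00185 \\ -0.02290 \\ -0.09330 \end{pmatrix}$$
$\lambda_{11} = \exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]$ = exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
$\lambda_{21} = \exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]$ = exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335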
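For the saturated model with effect coding, the design matrix X is symmetric and satisfies X X = 4 I, so its inverse is simply X / 4. The sketch below uses that property to recover the parameters from the log frequencies without a linear-algebra library; the names are illustrative.

```python
import math

# Effect-coded design matrix; rows: cells 11, 12, 21, 22;
# columns: overall, party(1), gender(1), interaction(11).
X = [[1, 1, 1, 1],
     [1, 1, -1, -1],
     [1, -1, 1, -1],
     [1, -1, -1, 1]]

# Log observed frequencies in the same cell order.
y = [math.log(n) for n in (279, 352, 335, 291)]

# X is symmetric and X @ X = 4 * I, hence inverse(X) = X / 4.
params = [sum(X[k][r] * y[r] for r in range(4)) / 4 for k in range(4)]

# Back-transform: the saturated model reproduces the observed counts.
fitted = [math.exp(sum(X[r][k] * params[k] for k in range(4))) for r in range(4)]

print([round(p, 4) for p in params])
print([round(f, 2) for f in fitted])
```

The four parameters match the overall, party, gender and interaction effects obtained with the classical (Birch) approach.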
38
LOG-LINEAR MODEL: expected frequencies

Type of model                Overall   Party     Gender    Unsatur.  Saturated
                  Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Mal_Cons    F11     279      314.25    315.50    307.00    308.22    279.00
Fem_Cons    F12     352      314.25    315.50    321.50    322.78    352.00
Mal_Labour  F21     335      314.25    313.00    307.00    305.78    335.00
Fem_Labour  F22     291      314.25    313.00    321.50    320.22    291.00
------------------------------------------------------------------------------
Chi-square                   11.58     11.54     10.9      10.89     0
Degrees of freedom           3         2         2         1         0

LOG-LINEAR MODEL: Parameters (EFFECT CODING: first category = 0)

A. Additive model
Type of model                      Overall    Party      Gender     Unsatur.   Satur.
Main effect                        5.7502     5.7542     5.7269     5.7308     5.6312
Gender effect                      0.0000     0.0000     0.0461     0.0462     0.2324
Party effect                       0.0000    -0.0080     0.0000    -0.0080     0.1829
Gender-Party interaction effect    0.0000     0.0000     0.0000     0.0000    -0.3732

B. Multiplicative model [exp(u)]
Type of model                      Overall    Party      Gender     Unsatur.   Satur.
Main effect                        314.2500   315.5001   307.0007   308.2157   278.9967
Gender effect                      0.0000     1.0000     1.0472     1.0472     1.2616
Party effect                       0.0000     0.9920     1.0000     0.9920     1.2007
Gender-Party interaction effect    0.0000     1.0000     1.0000     1.0000     0.6885
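The slide does not say which chi-square statistic is tabulated. As an assumption, the sketch below computes the likelihood-ratio statistic G² = 2 Σ O ln(O/E), which appears to reproduce the tabulated values for the overall model (Model 1) and the unsaturated model (Model 4). The function name `g2` is chosen for this example.

```python
import math

# Observed counts: male Conservative, female Conservative,
# male Labour, female Labour.
observed = [279, 352, 335, 291]
n = sum(observed)

def g2(obs, exp):
    """Likelihood-ratio statistic G^2 = 2 * sum(O * ln(O / E))."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(obs, exp))

# Model 1 (overall effect only): all four cells equally likely.
fit_overall = [n / 4] * 4

# Model 4 (unsaturated): row total * column total / n.
row_totals = [observed[0] + observed[1], observed[2] + observed[3]]
col_totals = [observed[0] + observed[2], observed[1] + observed[3]]
fit_unsaturated = [r * c / n for r in row_totals for c in col_totals]

print(round(g2(observed, fit_overall), 2))
print(round(g2(observed, fit_unsaturated), 2))
```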
39
Other Ways of Restricting
II. Design Matrix: Contrast Coding
40
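III. Design matrix: other restrictions on parameters
Saturated log-linear model
$$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$$
Restrictions (SPSS):
$$\mu_2^A = 0, \qquad \mu_2^B = 0, \qquad \mu_{12}^{AB} = \mu_{21}^{AB} = \mu_{22}^{AB} = 0$$
$$\begin{pmatrix} \ln\lambda_{11} \\ \ln\lambda_{12} \\ \ln\lambda_{21} \\ \ln\lambda_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix}$$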
42
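$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} \ln 279 \\ \ln 352 \\ \ln 335 \\ \ln 291 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 5.6312 \\ 5.8636 \\ 5.8141 \\ 5.6733 \end{pmatrix}$$
SPSS estimates (SPSS adds 0.5 to observed values):
$$\begin{pmatrix} u \\ u_1^A \\ u_1^B \\ u_{11}^{AB} \end{pmatrix} = \begin{pmatrix} 5.6750 \\ 0.1900 \\ 0.1406 \\ -0.3726 \end{pmatrix}$$

                              Coding 2    Coding 1
                              (SPSS)      (Birch)
Main effect                    5.6750      5.7456
Party effect
  Conservative                 0.1900      0.0019
  Labour                       0.0000     -0.0019
Gender effect
  Male                         0.1406     -0.0229
  Female                       0.0000      0.0229
Gender-Party interaction
  Male conservative           -0.3726     -0.0933
  Female conservative          0.0000      0.0933
  Male labour                  0.0000      0.0933
  Female labour                0.0000     -0.0933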
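The SPSS column can be reproduced by hand once two things are taken into account: the last category of each variable is the reference, and, as stated elsewhere in these slides, SPSS adds 0.5 to the observed values. A minimal sketch under those assumptions, with illustrative names:

```python
import math

# Observed counts plus the 0.5 that SPSS adds to each cell.
cells = {"male_cons": 279, "female_cons": 352,
         "male_lab": 335, "female_lab": 291}
y = {name: math.log(n + 0.5) for name, n in cells.items()}

# Reference cell: female Labour (last category of both variables).
main = y["female_lab"]
party_cons = y["female_cons"] - y["female_lab"]
gender_male = y["male_lab"] - y["female_lab"]
interaction = (y["male_cons"] - y["female_cons"]
               - y["male_lab"] + y["female_lab"])

print(round(main, 4), round(party_cons, 4),
      round(gender_male, 4), round(interaction, 4))
```

These four expressions are the rows of the inverse design matrix applied to the log frequencies.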
43
OBSERVED FREQUENCIES FOR VOTE BY SEX
Party           Male   Female   Total
Conservative     279      352     631
Labour           335      291     626
Total            614      643    1257

Parameter estimates
                              Contrast coding      Contrast coding      Effect coding
                              (SPSS)               (GLIM)               (Birch)
                              mu       exp(mu)     mu       exp(mu)     mu       exp(mu)
Main effect                   5.6750   291.49      5.6312   279.00      5.7456   312.80
Party effect
  Conservative                0.1900   1.2092      0.0000   1.0000      0.0019   1.0019
  Labour                      0.0000   1.0000      0.1829   1.2007     -0.0019   0.9982
Gender effect
  Male                        0.1406   1.1510      0.0000   1.0000     -0.0229   0.9774
  Female                      0.0000   1.0000      0.2324   1.2616      0.0229   1.0232
Gender-Party interaction
  Male conservative          -0.3726   0.6889      0.0000   1.0000     -0.0933   0.9109
  Female conservative         0.0000   1.0000      0.0000   1.0000      0.0933   1.0978
  Male labour                 0.0000   1.0000      0.0000   1.0000      0.0933   1.0978
  Female labour               0.0000   1.0000     -0.3732   0.6885     -0.0933   0.9109
44
Parameter estimates and standard errors
                              Contrast coding      Contrast coding
                              (SPSS)               (GLIM)
                              Param     s.e.       Param     s.e.
Main effect                   5.6750    0.0586     5.6312    0.0599
Party effect
  Conservative                0.1900    0.0792     0.0000    .
  Labour                      0.0000    .          0.1829    0.0811
Gender effect
  Male                        0.1406    0.0801     0.0000    .
  Female                      0.0000    .          0.2324    0.0802
Gender-Party interaction
  Male conservative          -0.3726    0.1133     0.0000    .
  Female conservative         0.0000    .          0.0000    .
  Male labour                 0.0000    .          0.0000    .
  Female labour               0.0000    .         -0.3732    0.1133
45
Prediction of counts or frequencies:

A. Effect coding
279 = 312.80 * 0.97736 * 1.00185 * 0.91092
352 = 312.80 * 1.02316 * 1.00185 * 1.09779
335 = 312.80 * 0.97736 * 0.99815 * 1.09779
291 = 312.80 * 1.02316 * 0.99815 * 0.91092

B. Contrast coding: GLIM
279 = 279 * 1 * 1 * 1 (males voting conservative = ref. cat.)
352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
335 = 279 * 1 * 1.2007 * 1 (males voting labour)
291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)

C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894
352.5 = 291.5 * 1 * 1.20925 * 1
335.5 = 291.5 * 1.15096 * 1 * 1
291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref. cat.)
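The multiplicative form of the GLIM-style coding can be illustrated as follows: with males voting Conservative as the reference cell, each multiplicative parameter is a ratio of observed counts, and their product reproduces every cell of the saturated model. The names below are illustrative only.

```python
# Observed counts for the vote-by-gender table.
male_cons, female_cons, male_lab, female_lab = 279, 352, 335, 291

# Multiplicative parameters, reference cell = males voting Conservative.
main = male_cons                                   # exp(main effect)
gender_female = female_cons / male_cons            # exp(gender effect)
party_labour = male_lab / male_cons                # exp(party effect)
# Interaction: ratio of observed to the product of main effects alone.
interaction = female_lab / (main * gender_female * party_labour)

def predict(female, labour):
    """Predicted count as a product of the relevant multiplicative terms."""
    value = main
    if female:
        value *= gender_female
    if labour:
        value *= party_labour
    if female and labour:
        value *= interaction
    return value

print(round(gender_female, 4), round(party_labour, 4), round(interaction, 4))
print(round(predict(True, True), 2))
```

The printed parameters can be compared with the exp(mu) column for the GLIM coding.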