Basics of mixed effects models in R
July 5, 2018 Summer workshop: the Korean Society of Speech Sciences
Jongho Jun ([email protected]), Seoul National University
Hyesun Cho ([email protected]), Dankook University
Topics
• Mixed effects linear regression
• Mixed effects logistic regression
• Fixed effect
• Random effect
  - Random intercept
  - Random slope
• Model comparison
Roadmap
I. Mixed effects linear regression
  ○ Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Data for in-class discussion
• vbarg.txt
• BresDative.txt: download 7. Syntax (.zip) from the DOWNLOADS tab at https://www.wiley.com/en-us/Quantitative+Methods+In+Linguistics-p-9781405144247.
• vcdur.txt: download from https://goo.gl/N2oDaS
References
• Johnson, Keith (2008). Quantitative Methods in Linguistics. Blackwell. Ch. 7.
• Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press. Ch. 7.
• Müller, Samuel, J. L. Scealy & A. H. Welsh (2013). Model Selection in Linear Mixed Models. Statistical Science 28(2), 135-167.
I. Mixed effects linear regression
○ Wall Street Journal corpus data
○ Hypothetical VC duration data
○ Interaction terms and model selection
Wall Street Journal (WSJ) corpus
• Discussed in Johnson (2008, section 7.3)
• Coding
  (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
  - A0 = agent
  - N0 = earlier material in the sentence
  - A1 = argument
  - …
Research question (WSJ)
Is there any relationship between the number of words in N0 and the number of words in A0?
• (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
• More specifically, prediction: does N0 size affect A0 size?
WSJ corpus data: vbarg.txt
• a0, n0: log-transformed size

        verb      A0size  N0size  a0    n0    …
  1     take      2       17      1.09  2.89  …
  2     say       4       0       1.60  0     …
  3     expect    1       5       0.69  1.79  …
  4     sell      4       0       1.60  0     …
  5     say       9       0       2.30  0     …
  6     increase  3       0       1.38  0     …
  …     …         …       …       …     …     …
  30590 say       0       11      0     2.48
Linear regression model
• Equation: y = b + a*x
  - b: intercept
  - a: slope
• For the WSJ data: dependent variable a0, predictor n0.
Linear regression model (WSJ)
• Equation: a0 = b + a*n0
• R function: lm()

> lm(a0~n0,data=vbarg)
Linear regression model (WSJ)
• Equation: a0 = b + a*n0

> summary(lm(a0~n0,data=vbarg))
Call:
lm(formula = a0 ~ n0, data = vbarg)
…
Coefficients:
             Est    Std. Error  t value  Pr(>|t|)
(Intercept)  1.14   0.005       196.68   <2e-16 ***
n0           -0.19  0.003       -53.46   <2e-16 ***
Linear regression model (WSJ)
• Linear regression model found:
  a0 = 1.14 - 0.19*n0
  “As n0 size increases, a0 size decreases.”
Linear regression model (WSJ)
• Linear regression model found: a0 = 1.14 - 0.19*n0
  “As n0 size increases, a0 size decreases.”
• Incorrect assumption: all verbs have the same average a0 size.
Linear regression model (WSJ)
• Incorrect assumption: all verbs have the same average a0 size.
• Different verbs might behave differently, i.e., have different average a0 sizes. → Random effect
Random effect
• Similar examples in an experiment:
  - individual participants (subjects)
  - individual test words (items)
• Items and subjects are randomly sampled from their populations.
• If we repeated the same experiment, different items and subjects would be employed.
Fixed effect
• We are usually not interested in the random effects themselves, but rather in the fixed effects.
• Fixed factor: the set of possible levels of the factor is fixed, and each of these levels can be repeated.
• Examples
  - N0
  - Treatment factors with two levels: treatment vs. control group
Mixed effects model
• A model containing both fixed and random factors.
• R package: lmerTest
• R function: lmer()
(Note: Johnson (2008) uses a different package and function.)
Mixed effects model
• Random effects:
  - random intercept
  - random slope
• Equation: y = b + a*x
Mixed effects model with random intercept (WSJ)
• Assumption: average A0 size may be verb-specific.
• The model includes a separate intercept for each verb.
• Formula
  lmer(a0 ~ n0 + (1|verb), data=vbarg)
Mixed effects model with random intercept (WSJ)

> library(lmerTest)
> verb.lmer01 <- lmer(a0~n0+(1|verb),data=vbarg)
> summary(verb.lmer01)
Mixed effects model with random intercept (WSJ)

> summary(verb.lmer01)
…
Random effects:
 Groups   Name         Variance  Std.Dev.
 verb     (Intercept)  0.1659    0.4074
 Residual              0.3926    0.6266
Number of obs: 30590, groups: verb, 32

Fixed effects:
             Est     Std.Error  df     t value  Pr(>|t|)
(Intercept)  0.850   0.07       31     11.77    5.04e-13 ***
n0           -0.102  0.003      30570  -31.46   < 2e-16 ***
Mixed effects model with random intercept (WSJ)
• Model found:
  a0 = 0.850 - 0.102*n0
  There is a strong effect of n0 on a0 even after controlling for the different average size of a0 for different verbs.
  Cf. previous linear regression model: a0 = 1.14 - 0.19*n0
Mixed effects model with random intercept and random slope (WSJ)
• Assumption: both the average A0 size and the A0-N0 size relationship are verb-specific.
• The model includes a verb-specific intercept and slope.
• Formula
  lmer(a0 ~ n0 + (1+n0|verb), data=vbarg)
Mixed effects model with random intercept and random slope (WSJ)

> verb.lmer02 <- lmer(a0~n0+(1+n0|verb),data=vbarg)
> summary(verb.lmer02)
Mixed effects model with random intercept and random slope (WSJ)

> summary(verb.lmer02)
…
Random effects:
 Groups   Name         Variance  Std.Dev.  Corr
 verb     (Intercept)  0.220     0.469
          n0           0.004     0.065     -0.84
 Residual              0.387     0.622
Number of obs: 30590, groups: verb, 32

Fixed effects:
             Est     Std.Error  df     t value  Pr(>|t|)
(Intercept)  0.852   0.083      31.05  10.23    1.81e-11 ***
n0           -0.101  0.012      30.75  -8.31    2.27e-09 ***
Mixed effects model with random intercept and random slope (WSJ)
• A model with random intercept and slope:
  a0 = 0.852 - 0.101*n0
  The estimate of n0 is very similar to the previous model with only a random intercept, and it is still significant.
  Cf. the model with only a random intercept: a0 = 0.850 - 0.102*n0
Model comparison
• Question: does a model with random slope and intercept fit the data better than the one with only a random intercept?
• Likelihood ratio test
  - Compares a smaller (simpler) model against a larger (more complicated) one.
  - Carried out by the anova() function.
  anova(verb.lmer01, verb.lmer02)
Model comparison (WSJ)

> anova(verb.lmer01, verb.lmer02)
Data: vbarg
Models:
verb.lmer01: a0 ~ n0 + (1 | verb)
verb.lmer02: a0 ~ n0 + (1 + n0 | verb)
             Df  AIC    BIC  …  Chisq   Chi Df  Pr(>Chisq)
verb.lmer01  4   58397  …
verb.lmer02  6   58083  …      318.39  2       < 2.2e-16 ***
---
Model comparison
• The likelihood ratio test compares models with and without a factor.
• It measures how much the likelihood of the model improves.
• If the ratio is close to 1.0, the improved fit offered by the added factor is insubstantial, and the factor is considered a non-significant predictor of the criterion variable.
Model comparison (WSJ)
• The chi-square statistic is 318.39, and the associated probability is very small.
• This indicates that adding a random slope significantly improved the fit of the model.

> anova(verb.lmer01, verb.lmer02)
…
             …  logLik  …  Chisq   Chi Df  Pr(>Chisq)
verb.lmer01  …  -29195  …
verb.lmer02  …  -29036  …  318.39  2       < 2.2e-16 ***
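The Chisq value in this table is just twice the difference in the two models' log-likelihoods, which can be checked by hand from the (rounded) logLik values above:

```r
ll_intercept_only <- -29195   # logLik of verb.lmer01 (random intercept only)
ll_with_slope     <- -29036   # logLik of verb.lmer02 (random intercept + slope)

chisq <- 2 * (ll_with_slope - ll_intercept_only)    # likelihood-ratio statistic
p     <- pchisq(chisq, df = 2, lower.tail = FALSE)  # df = difference in parameters (6 - 4)

chisq  # 318, matching the 318.39 in the anova() table up to rounding
```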
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Hypothetical VC duration data
• Duration measurements of vowel-consonant sequences, e.g. (c)ab, (c)ap: the V and C durations are measured.
• Suppose there is an inverse relationship between consonant duration (Cdur) and vowel duration (Vdur).
VC duration
• Construct two types of measurement data. Data with:
  (01) subject-specific intercept only
  (02) subject-specific intercept and slope
Data 01: subject-specific intercept
• Equation: y = a + b*x + e (where y = Vdur, x = Cdur, e = error)
Data 01: subject-specific intercept
• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  - sd of subject-specific intercept = 20 ms
• b (slope) = -1
• x (Cdur) = 100 ms
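A minimal R sketch of how data like this could be simulated (the variable names are mine, not from the handout):

```r
set.seed(1)
subjN <- 5; wordN <- 7
a <- 300; b <- -1                              # population intercept (ms) and slope
subj_int <- rnorm(subjN, mean = 0, sd = 20)    # subject-specific intercept deviations

d <- expand.grid(subject = 1:subjN, word = 1:wordN)
d$Cdur <- rnorm(nrow(d), mean = 100, sd = 10)  # consonant durations around 100 ms
e <- rnorm(nrow(d), sd = 5)                    # residual error (set to 0 for the error-free data)
d$Vdur <- a + subj_int[d$subject] + b * d$Cdur + e
```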
Data 01 with no error term
• Equation: y = a + b*x + e (where e = 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 01 w/o error

  subject    word  Cdur        Vdur
  1          1     94          232
  1          2     92          235
  1          3     91          235
  1          4     103         224
  1          5     92          234
  1          6     103         223
  1          7     123         203
  2          1     100         189
  …          …     …           …
  5          6     99          206
  5          7     111         194
  mean (sd)        98.4 (9.8)  198.5 (19.44)
Data 01 w/o error
• Average Vdur by subject

  subject  Vdur
  1        227
  2        192
  3        180
  4        185
  5        207
Data 01 w/o error: scatter plot
Data 01 with error term
• Equation: y = a + b*x + e (where e ≠ 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 01 with error

  subject    word  Cdur        Vdur
  1          1     94          233
  1          2     92          236
  1          3     91          234
  1          4     103         221
  1          5     92          241
  1          6     103         220
  1          7     123         197
  2          1     100         193
  …          …     …           …
  5          6     99          212
  5          7     111         195
  mean (sd)        98.4 (9.8)  199.8 (19.08)
Data 01 w/ error: scatter plot
Data 01 w/ error
• Mixed effects model with a separate intercept for each subject: average Vdur may be subject-specific.

> lmer(Vdur ~ Cdur + (1|subject),data=d)
Data 01 w/ error
• Random effects
   Groups   Name         Variance  Std.Dev.
   subject  (Intercept)  303.93    17.434
   Residual              24.05     4.904
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  302.964      25.885   <0.001
   Cdur         -1.047       -11.865  <0.001
Data 02: subject-specific intercept and slope
• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  - sd of subject-specific intercept = 20 ms
• b (slope) = -1
  - sd of subject-specific slope = 0.3
• x (Cdur) = 100 ms
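Extending the earlier simulation sketch, subject-specific slopes can be added by drawing a per-subject slope deviation as well (again, the variable names are my own):

```r
set.seed(2)
subjN <- 5; wordN <- 7
subj_int   <- rnorm(subjN, mean = 0, sd = 20)   # subject-specific intercept deviations
subj_slope <- rnorm(subjN, mean = 0, sd = 0.3)  # subject-specific slope deviations

d <- expand.grid(subject = 1:subjN, word = 1:wordN)
d$Cdur <- rnorm(nrow(d), mean = 100, sd = 10)
d$Vdur <- 300 + subj_int[d$subject] +
          (-1 + subj_slope[d$subject]) * d$Cdur +   # each subject gets its own slope
          rnorm(nrow(d), sd = 5)
```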
Data 02: subject-specific intercept and slope
• Equation: y = a + b*x + e
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 02 w/ subject-specific intercept & slope

  subject    word  Cdur        Vdur
  1          1     94          224
  1          2     92          227
  1          3     91          225
  1          4     103         211
  1          5     92          232
  1          6     103         210
  1          7     123         185
  2          1     100         198
  …          …     …           …
  5          6     99          196
  5          7     111         177
  mean (sd)        98.4 (9.8)  195.2 (16.9)
Data 02: scatter plot
Data 02
• Mixed effects model with subject-specific intercept & slope:

> lmer(Vdur ~ Cdur + (1+Cdur|subject),data=d)
Data 02
• Random effects
   Groups   Name         Variance  Std.Dev.  Corr
   subject  (Intercept)  708.599   26.62
            Cdur         0.015     0.122     -1.00
   Residual              20.989    4.581
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  305.987      21.186   <0.001
   Cdur         -1.124       -11.331  <0.001

The estimated sd of the subject-specific slope (0.122) is not even close to the true 0.3.
Data 02: larger data
• Equation: y = a + b*x + e
• Number of subjects (subjN) = 30
• Number of words (wordN) = 50
Data 02: larger data
• Random effects
   Groups   Name         Variance  Std.Dev.  Corr
   subject  (Intercept)  474.678   21.787
            Cdur         0.089     0.298     -0.19
   Residual              25.064    5.006
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  296.2        70.702   <0.001
   Cdur         -0.908       -16.218  <0.001

With more subjects and items, the estimated slope sd (0.298) recovers the true 0.3.
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Interaction terms and model selection
• When there is more than one fixed effect, we can also consider interaction terms.
• As more terms are added to the model, evaluating which model best fits the data becomes necessary.
  - vdur ~ 1
  - vdur ~ cdur
  - vdur ~ cdur + voicing
  - vdur ~ cdur + voicing + cdur:voicing
Interaction terms and model selection
• A key aspect of the mixed-effects analysis is often model selection, the choice of a particular model within a class of candidate models (Müller et al. 2013).
Model selection
• Aim: a parsimonious model with other desirable properties, i.e., as good a fit as possible with a minimum of predictor variables.
• Starting from the saturated model (the full model with all factors and their interactions), eliminate non-significant predictors, applying the likelihood ratio test (LRT).
• Backward elimination for linear mixed models.
Functions for model selection
• update(): refits a model with a factor removed
• anova(): likelihood ratio test
• step(): performs model selection, giving the results (the best model) all at once
Data: vcdur.txt
• English speakers’ production of monosyllabic English words differing in coda voicing (bit, bid, beat, bead, …).
• Speaker regions: NZ (3), UK (2), US (2)
• Dependent variable: vdur
Specifying interaction terms
a*b = a + b + a:b
vdur ~ cdur*voicing = vdur ~ cdur + voicing + cdur:voicing
vdur ~ cdur*voicing*height = vdur ~ cdur + voicing + height + cdur:voicing + voicing:height + cdur:height + cdur:voicing:height
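This expansion can be checked directly in base R by inspecting the term labels of a formula (formulas are symbolic, so no data is needed):

```r
# terms() expands a*b into the main effects plus the interaction
labs <- attr(terms(vdur ~ cdur * voicing), "term.labels")
labs  # "cdur" "voicing" "cdur:voicing"
```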
Saturated model
• Model with all factors and their interactions.
• Starting with this saturated model, eliminate factors and interactions that do not significantly contribute to the goodness of the model fit.
• The contribution is tested by LRT (p < 0.05).
• Fit with REML=FALSE (i.e., ML) for model selection; use REML=TRUE for parameter estimation.
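The handout does not show the call that creates mm1; assuming the vcdur.txt data have been read into a data frame d (the file name and read call are assumptions), the saturated model would be fit along these lines, with the formula taken from the anova() output later in the handout:

```r
library(lmerTest)
d <- read.delim("vcdur.txt")  # assumed path and format

# ML fit (REML = FALSE) so that likelihood-ratio tests between nested models are valid
mm1 <- lmer(vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item),
            data = d, REML = FALSE)
summary(mm1)
```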
Saturated model: model with all interaction terms
summary(mm1)

Fixed effects:
                                     Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)                          216.04612  20.24309    41.6   10.673   1.76e-13 ***
cdur                                 -0.15703   0.07356     480.9  -2.135   0.03328 *
voicingVOICELESS                     -24.87089  25.79788    38.7   -0.964   0.34100
heightNONHIGH                        12.46191   24.78135    33.0   0.503    0.61839
cdur:voicingVOICELESS                -0.23325   0.08912     474.4  -2.617   0.00915 **
cdur:heightNONHIGH                   0.18897    0.09312     474.9  2.029    0.04299 *
voicingVOICELESS:heightNONHIGH       8.80019    36.61632    39.2   0.240    0.81132
cdur:voicingVOICELESS:heightNONHIGH  -0.17443   0.13220     473.6  -1.319   0.18765
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We will eliminate the non-significant three-way interaction.
Eliminating a factor: use update()
• Syntax: update(x, ~.-factor)
• Delete the non-significant three-way interaction term with update():
  mm2 <- update(mm1, ~.-cdur:voicing:height)
• Output model specification:

Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
    cdur:voicing + cdur:height + voicing:height
Likelihood Ratio Test
• The saturated model (mm1) is not better than the reduced model (mm2), so we can discard mm1.

> anova(mm1,mm2)
Data: d
Models:
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
     Df  AIC     BIC     logLik   deviance  Chisq   Chi Df  Pr(>Chisq)
mm2  10  4996.9  5039.1  -2488.5  4976.9
mm1  11  4997.2  5043.6  -2487.6  4975.2    1.7379  1       0.1874
Reporting the result
• Report the chi-square statistic as χ2(df) = Chisq value, p = p value.
“The model with the three-way interaction was not significantly better than the model without the interaction (χ2(1) = 1.74, p = .19).”
Current model: summary(mm2)

Fixed effects:
                                Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)                     212.42246  20.06869    40.1   10.585   3.55e-13 ***
cdur                            -0.12437   0.06938     478.1  -1.793   0.0737 .
voicingVOICELESS                -14.43787  24.57580    31.9   -0.587   0.5610
heightNONHIGH                   21.47215   23.83963    28.3   0.901    0.3754
cdur:voicingVOICELESS           -0.31050   0.06731     474.7  -4.613   5.11e-06 ***
cdur:heightNONHIGH              0.10303    0.06668     474.2  1.545    0.1230
voicingVOICELESS:heightNONHIGH  -13.76816  32.39704    24.1   -0.425   0.6746
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Among the interactions, eliminate the one with the highest p-value.
Elimination (2nd)
• A model with more parameters (mm2) is not better than mm3, so we can discard mm2 and adopt the simpler model mm3.
• Reporting: the interaction of voicing and vowel height did not significantly improve the goodness of fit (χ2(1) = 0.18, p = .67).

> mm3<-update(mm2,~.-voicing:height)
> anova(mm2,mm3)
Data: d
Models:
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
     Df  AIC     BIC     logLik   deviance  Chisq  Chi Df  Pr(>Chisq)
mm3  9   4995.1  5033.1  -2488.6  4977.1
mm2  10  4996.9  5039.1  -2488.5  4976.9    0.18   1       0.6714
Summary of the current model (mm3)

Fixed effects:
                       Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)            215.77739  18.48194    41.7   11.675   1.02e-14 ***
cdur                   -0.12390   0.06938     478.1  -1.786   0.0748 .
voicingVOICELESS       -21.31848  18.52951    40.4   -1.151   0.2567
heightNONHIGH          14.94762   18.29788    38.3   0.817    0.4190
cdur:voicingVOICELESS  -0.31019   0.06730     474.9  -4.609   5.21e-06 ***
cdur:heightNONHIGH     0.10022    0.06634     482.6  1.511    0.1315
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now remove cdur:height.
• Repeat the same procedure until deleting a term yields a significant change in the goodness of fit (p < 0.05); then:
> anova(mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8)
Data: d
Models:
mm8: vdur ~ cdur + (1 | speaker) + (1 | item)
mm6: vdur ~ cdur + (1 | speaker) + (1 | item) + cdur:voicing
mm7: vdur ~ (1 | speaker) + (1 | item) + cdur:voicing
mm5: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
mm4: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm4:     cdur:voicing
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
     Df  AIC     BIC     logLik   deviance  Chisq    Chi Df  Pr(>Chisq)
mm8  5   5022.7  5043.8  -2506.3  5012.7
mm6  6   4995.4  5020.7  -2491.7  4983.4    29.2976  1       6.207e-08 ***
mm7  6   4995.4  5020.7  -2491.7  4983.4    0.0000   0       1.00000
mm5  7   4996.2  5025.7  -2491.1  4982.2    1.1459   1       0.28442
mm4  8   4995.4  5029.1  -2489.7  4979.4    2.8169   1       0.09328 .
mm3  9   4995.1  5033.1  -2488.6  4977.1    2.2750   1       0.13148
mm2  10  4996.9  5039.1  -2488.5  4976.9    0.1800   1       0.67140
mm1  11  4997.2  5043.6  -2487.6  4975.2    1.7379   1       0.18741
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

mm6: best model
mm5: best model that keeps all main effects involved in the interactions
Eliminating random effects
• To the best model so far (mm5), try removing the random effects:
  mm9 <- update(mm5, ~.-(1|speaker), REML=TRUE)
  mm10 <- update(mm5, ~.-(1|item), REML=TRUE)
  anova(mm5,mm9)
  anova(mm5,mm10)
• The current model is significantly better than the model without the by-speaker random intercept (χ2(1) = 201.22, p < 0.0001) and the one without the by-item random intercept (χ2(1) = 401.54, p < 0.0001).
Refitting the best model with REML
• After the best model is decided, refit it with REML (the lmer() default, so simply remove REML=FALSE) to obtain the best estimates of the parameter values.
  mm5 <- lmer(vdur~cdur+voicing+cdur:voicing+(1|speaker)+(1|item), data=d)
The best model: summary & interpretation
• Interpretation: for a voiceless coda, the Vdur~Cdur slope is 0.31 ms/ms more negative than for a voiced coda (cdur:voicingVOICELESS, t(473) = -4.54, p < .0001); vowel duration decreases more steeply with consonant duration when the consonant is voiceless.

Fixed effects:
                       Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)            224.61615  17.24766    33.4   13.023   1.25e-14 ***
cdur                   -0.08896   0.06580     468.5  -1.352   0.177
voicingVOICELESS       -20.87662  19.86242    34.9   -1.051   0.300
cdur:voicingVOICELESS  -0.30664   0.06758     472.7  -4.537   7.23e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
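The overall Cdur slope for voiceless codas is the sum of the cdur main effect and the interaction term, a small piece of arithmetic worth making explicit:

```r
# coefficients from the summary above
b_cdur <- -0.08896   # Cdur slope for the reference (voiced) level
b_int  <- -0.30664   # cdur:voicingVOICELESS (additional slope for voiceless)

slope_voiceless <- b_cdur + b_int
slope_voiceless  # about -0.40 ms of Vdur per ms of Cdur for voiceless codas
```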
Backward elimination using step()
• The function step() performs backward elimination of non-significant effects of a linear mixed effects model.
• The step() function of package lmerTest masks the one for lm objects (which does stepwise selection based on AIC).
• Applying it to our data:
> step(mm1)
Backward reduced random-effect table:

               Eliminated  npar  logLik   AIC     LRT     Df  Pr(>Chisq)
<none>                     11    -2487.6  4997.2
(1 | speaker)  0           10    -2588.7  5197.3  202.09  1   < 2.2e-16 ***
(1 | item)     0           10    -2673.5  5367.0  371.78  1   < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite

                     Eliminated  Sum Sq   Mean Sq  NumDF  DenDF   F value  Pr(>F)
cdur:voicing:height  1           1654.5   1654.5   1      473.56  1.7410   0.18765
voicing:height       2           172.3    172.3    1      24.10   0.1806   0.67462
cdur:height          3           2176.4   2176.4   1      482.55  2.2819   0.13155
height               4           2866.7   2866.7   1      23.61   2.9903   0.09682 .
cdur:voicing         0           19776.1  19776.1  1      474.75  20.6284  7.079e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Selected model:
vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
II. Mixed effects logistic regression
○ English dative alternation
English dative alternation
• Discussed in Johnson (2008, section 7.4), citing Bresnan et al. (2007).
• Two alternative ways:
  dative PP: I pushed the box to John.
  dative NP: I pushed John the box.
English dative alternation
• Preference (A > B: A is preferred to B.)
  dative PP                                            dative NP
  I pushed the box to John.                        >   I pushed John the box.
  That movie gave the creeps to me.                <   That movie gave me the creeps.
  This will give the creeps to just about anyone.  >   This will give just about anyone the creeps.
English dative alternation
• Factors for preference: recipient = pronoun or not; short or long; given or new …?
Data: BresDative.txt
Two corpora:
• the Switchboard corpus of conversational speech
• the Wall Street Journal corpus of text
Data: BresDative.txt
Coded variables:
• the realization of the dative (PP, NP);
• the discourse accessibility, definiteness, animacy, and pronominality of the recipient and theme;
• the semantic class of the verb
  - abstract, transfer, future transfer, prevention of possession, and communication;
• a measure of the difference between the (log) length of the recipient and the (log) length of the theme.
Data: BresDative.txt

        real  verb   vsense   class  animrec  …
  1     NP    feed   feed.t   t      animate  …
  2     NP    give   give.a   a      animate  …
  3     NP    give   give.a   a      animate  …
  4     NP    give   give.a   a      animate  …
  5     NP    offer  offer.c  c      animate  …
  6     NP    give   give.a   a      animate  …
  …     …     …      …        …      …        …
  3265  NP    give   give.a   a      animate  …
Data: BresDative.txt
• Question: can we predict the realization of the dative from these variables?
  class, accessibility, definiteness, … → realization of dative
Data: BresDative.txt
• Logistic regression
• R function: glm()
Logistic regression
• Regression for count data (binary outcomes, proportions).
• We cannot apply standard linear regression to the proportions:
  - Proportions are bounded between 0 and 1, but lm() does not know this.
  - Other problems.
→ Logit transformation
Logistic regression
• Logit transformation
  i.   p              proportion            0 to 1
  ii.  p/(1-p)        odds                  0 to ∞
  iii. log(p/(1-p))   log odds, i.e. logit  -∞ to +∞
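In R the logit and its inverse are built in as qlogis() and plogis(), which is handy for interpreting logistic regression coefficients:

```r
p <- 0.8
odds  <- p / (1 - p)   # 4: the odds
logit <- log(odds)     # about 1.39: the log odds

# qlogis() is the logit; plogis() is its inverse (log odds back to proportion)
all.equal(logit, qlogis(p))   # TRUE
plogis(logit)                 # 0.8
```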
Data: BresDative.txt
• Logistic regression model of the dative alternation:
  glm(real~class+accrec+…, family=binomial, data=SwitchDat)
  “PP realization as a function of class, …”
Data: BresDative.txt

> summary(glm(real ~ class+accrec+accth+prorec+proth+defrec+defth+animrec+ldiff, family=binomial, data=SwitchDat))
...
Coefficients:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     0.3498    0.3554      0.984    0.32503
classc          -1.3516   0.3141      -4.303   1.68e-05 ***
classf          0.5138    0.4922      1.044    0.29651
classp          -3.4277   1.2504      -2.741   0.00612 **
classt          1.1571    0.2055      5.631    1.80e-08 ***
accrecnotgiven  1.1282    0.2681      4.208    2.57e-05 ***
...
Data: BresDative.txt
• accrecnotgiven: 1.1282 (p = 2.57e-05) → when the recipient is not given earlier in the discourse, PP realization is more likely.
Data: BresDative.txt
• Fixed effects logistic regression model:
  real = 0.34 – 1.35*classc + 0.51*classf – 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …
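Since these coefficients are on the log-odds scale, a fitted value is turned back into a probability with the inverse logit. For example, assuming (as the handout's reading of the model does) that the model predicts PP realization, the predicted probability for a transfer-class verb with all other predictors at their baseline levels would be:

```r
# (Intercept) + classt, coefficients from the glm summary above
logodds <- 0.3498 + 1.1571

plogis(logodds)  # about 0.82: predicted probability of PP realization
```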
Data: BresDative.txt
• A fixed effects logistic regression model:
  real = 0.34 – 1.35*classc + 0.51*classf – 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …
• Incorrect assumption: all verbs have the same preference in the dative alternation.
Data: BresDative.txt
• Possibility: verbs may differ in their preference in the dative alternation.
Data: BresDative.txt
• Possibility: verbs may differ in their preference in the dative alternation.
• Verbs may have more than one sense, e.g. pay:
  - transfer: to pay him some money
  - more abstract: to pay attention to the clock
Data: BresDative.txt
• Possibility (revised): verb senses may differ in their preference in the dative alternation.
• Add a random factor for verb sense.
Mixed effects logistic regression model
• A logistic regression model containing both fixed and random factors.
• R package: lme4 (loaded by lmerTest)
• R function: glmer()
A model with random intercept
• Assumption: the average preference for PP realization may be specific to each verb sense.
• The model includes a separate intercept for each verb sense.
A model with random intercept
• The model includes a separate intercept for each verb sense.
• Formula for “PP realization as a function of class, etc.”:
  glmer(real ~ class + accrec + accth + prorec + proth + defrec + defth + animrec + ldiff + (1|vsense), family=binomial, data=SwitchDat)
Mixed effects logistic model w/ random intercept

> summary(glmer(real ~ class...(1|vsense),...)
...
Random effects:
 Groups  Name         Variance  Std.Dev.
 vsense  (Intercept)  4.976     2.23

Fixed effects:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     1.3943    0.8069      1.728    0.083984 .
classc          -1.3882   1.1481      -1.209   0.226614
classf          -0.2491   1.3619      -0.183   0.854862
classp          -4.9325   2.1697      -2.273   0.023007 *
classt          0.9035    0.9647      0.937    0.348957
accrecnotgiven  1.6421    0.3417      4.805    1.55e-06 ***
...
Mixed effects logistic model w/ random intercept
• Compare the outputs of the mixed effects model vs. the model with no random factor.

                > summary(glmer(...(1|vsense)...)    > summary(glm(...)
                Estimate  Pr(>|z|)                   Estimate  Pr(>|z|)
(Intercept)     1.3943    0.083984 .                 0.3498    0.32503
classc          -1.3882   0.226614                   -1.3516   1.68e-05 ***
classf          -0.2491   0.854862                   0.5138    0.29651
classp          -4.9325   0.023007 *                 -3.4277   0.00612 **
classt          0.9035    0.348957                   1.1571    1.80e-08 ***
accrecnotgiven  1.6421    1.55e-06 ***               1.1282    2.57e-05 ***
...
Random intercepts and slopes
• Many maximal random effects models (e.g. with both random intercepts and random slopes) fail to converge (i.e., the fitting algorithm cannot find a solution).
• In that case, you will have to use a simpler model (e.g. intercepts-only).
What we’ve covered today
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ● English dative alternation
Thank you.