Basics of mixed effects models in R
July 5, 2018 Summer workshop: the Korean Society of Speech Sciences
Jongho Jun ([email protected]), Seoul National University
Hyesun Cho ([email protected]), Dankook University
Topics
• Mixed effects linear regression
• Mixed effects logistic regression
• Fixed effect
• Random effect
  - Random intercept
  - Random slope
• Model comparison
Roadmap
I. Mixed effects linear regression
  ○ Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Data for in-class discussion
• vbarg.txt
• BresDative.txt: download 7. Syntax (.zip) from the DOWNLOADS tab at https://www.wiley.com/en-us/Quantitative+Methods+In+Linguistics-p-9781405144247.
• vcdur.txt: download from https://goo.gl/N2oDaS
References
• Johnson, Keith (2008). Quantitative Methods in Linguistics. Blackwell. Ch. 7.
• Baayen, R. Harald (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press. Ch. 7.
• Müller, Samuel, J. L. Scealy & A. H. Welsh (2013). Model Selection in Linear Mixed Models. Statistical Science 28(2), 135-167.
I. Mixed effects linear regression
○ Wall Street Journal corpus data
○ Hypothetical VC duration data
○ Interaction terms and model selection
Wall Street Journal (WSJ) corpus
• Discussed in Johnson (2008, section 7.3)
• Coding
  (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
  - A0 = agent
  - N0 = earlier material in the sentence
  - A1 = argument
  - …
Research question (WSJ)
Is there any relationship between the number of words in N0 and the number of words in A0?
• (N0 According to the media)(A0 President Menem)(V took)(A1 office)(AM July 8).
• More specifically, prediction: does N0 size affect A0 size?
WSJ corpus data: vbarg.txt
• a0, n0: log-transformed size

        verb      A0size  N0size  a0    n0    …
  1     take      2       17      1.09  2.89  …
  2     say       4       0       1.60  0     …
  3     expect    1       5       0.69  1.79  …
  4     sell      4       0       1.60  0     …
  5     say       9       0       2.30  0     …
  6     increase  3       0       1.38  0     …
  …     …         …       …       …     …     …
  30590 say       0       11      0     2.48
Linear regression model
• Equation: y = b + a*x
  - b: intercept
  - a: slope
• For the WSJ data: dependent variable a0, predictor n0.
Linear regression model (WSJ)
• Equation: a0 = b + a*n0
• R function: lm()

> lm(a0~n0,data=vbarg)
Linear regression model (WSJ)
• Equation: a0 = b + a*n0

> summary(lm(a0~n0,data=vbarg))
Call:
lm(formula = a0 ~ n0, data = vbarg)
…
Coefficients:
             Est    Std. Error  t value  Pr(>|t|)
(Intercept)  1.14   0.005       196.68   <2e-16 ***
n0           -0.19  0.003       -53.46   <2e-16 ***
Linear regression model (WSJ)
• Linear regression model found:
  a0 = 1.14 - 0.19*n0
  “As n0 size increases, a0 size decreases.”
Linear regression model (WSJ)
• Linear regression model found: a0 = 1.14 - 0.19*n0
  “As n0 size increases, a0 size decreases.”
• Incorrect assumption: all verbs have the same average a0 size.
Linear regression model (WSJ)
• Incorrect assumption: all verbs have the same average a0 size.
• Different verbs might behave differently, i.e., have different average a0 sizes. → Random effect
Random effect
• Similar examples in an experiment:
  - individual participants (subjects)
  - individual test words (items)
• Items and subjects are randomly sampled from their populations.
• If we repeated the same experiment, different items and subjects would be employed.
Fixed effect
• We are usually not interested in the random effects themselves, but rather in the fixed effects.
• Fixed factor: the set of possible levels of the factor is fixed, and each of these levels can be repeated.
• Examples
  - N0
  - Treatment factors with two levels: treatment vs. control group
Mixed effects model
• A model containing both fixed and random factors.
• R package: lmerTest
• R function: lmer()
(Note: Johnson (2008) uses a different package and function.)
Mixed effects model
• Random effects:
  - random intercept
  - random slope
• Equation: y = b + a*x
Mixed effects model with random intercept (WSJ)
• Assumption: average A0 size may be verb-specific.
• The model includes a separate intercept for each verb.
• Formula
  lmer(a0 ~ n0 + (1|verb), data=vbarg)
Mixed effects model with random intercept (WSJ)

> library(lmerTest)
> verb.lmer01 <- lmer(a0~n0+(1|verb),data=vbarg)
> summary(verb.lmer01)
Mixed effects model with random intercept (WSJ)

> summary(verb.lmer01)
…
Random effects:
 Groups   Name         Variance  Std.Dev.
 verb     (Intercept)  0.1659    0.4074
 Residual              0.3926    0.6266
Number of obs: 30590, groups: verb, 32

Fixed effects:
             Est     Std.Error  df     t value  Pr(>|t|)
(Intercept)  0.850   0.07       31     11.77    5.04e-13 ***
n0           -0.102  0.003      30570  -31.46   < 2e-16 ***
Mixed effects model with random intercept (WSJ)
• Model found:
  a0 = 0.850 - 0.102*n0
  There is a strong effect of n0 on a0 even after controlling for the different average size of a0 for different verbs.
  Cf. previous linear regression model: a0 = 1.14 - 0.19*n0
Mixed effects model with random intercept and random slope (WSJ)
• Assumption: both the average A0 size and the A0-N0 size relationship are verb-specific.
• The model includes a verb-specific intercept and slope.
• Formula
  lmer(a0 ~ n0 + (1+n0|verb), data=vbarg)
Mixed effects model with random intercept and random slope (WSJ)

> verb.lmer02 <- lmer(a0~n0+(1+n0|verb),data=vbarg)
> summary(verb.lmer02)
Mixed effects model with random intercept and random slope (WSJ)

> summary(verb.lmer02)
…
Random effects:
 Groups   Name         Variance  Std.Dev.  Corr
 verb     (Intercept)  0.220     0.469
          n0           0.004     0.065     -0.84
 Residual              0.387     0.622
Number of obs: 30590, groups: verb, 32

Fixed effects:
             Est     Std.Error  df     t value  Pr(>|t|)
(Intercept)  0.852   0.083      31.05  10.23    1.81e-11 ***
n0           -0.101  0.012      30.75  -8.31    2.27e-09 ***
Mixed effects model with random intercept and random slope (WSJ)
• A model with random intercept and slope:
  a0 = 0.852 - 0.101*n0
  The estimate of n0 is very similar to the previous model with only a random intercept, and it is still significant.
  Cf. the model with only a random intercept: a0 = 0.850 - 0.102*n0
Model comparison
• Question: does a model with random slope and intercept fit the data better than the one with only a random intercept?
• Likelihood ratio test
  - Compares a smaller (simpler) model against a larger (more complicated) one.
  - Carried out by the anova() function.
  anova(verb.lmer01, verb.lmer02)
Model comparison (WSJ)

> anova(verb.lmer01, verb.lmer02)
Data: vbarg
Models:
verb.lmer01: a0 ~ n0 + (1 | verb)
verb.lmer02: a0 ~ n0 + (1 + n0 | verb)
             Df  AIC    BIC  …  Chisq   Chi Df  Pr(>Chisq)
verb.lmer01  4   58397  …
verb.lmer02  6   58083  …      318.39  2       < 2.2e-16 ***
---
Model comparison
• The likelihood ratio test compares models with and without a factor.
• It measures how much the likelihood of the model improves.
• If the ratio is close to 1.0, the improved fit offered by the added factor is insubstantial, and the factor is considered a non-significant predictor of the criterion variable.
Model comparison (WSJ)
• The chi-square statistic is 318.39, and the associated probability is very small.
• This indicates that adding a random slope significantly improved the fit of the model.

> anova(verb.lmer01, verb.lmer02)
…
             …  logLik  …  Chisq   Chi Df  Pr(>Chisq)
verb.lmer01  …  -29195  …
verb.lmer02  …  -29036  …  318.39  2       < 2.2e-16 ***
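The Chisq value in this table is just twice the difference in the two models' log-likelihoods, which can be checked by hand from the (rounded) logLik values above:

```r
ll_intercept_only <- -29195   # logLik of verb.lmer01 (random intercept only)
ll_with_slope     <- -29036   # logLik of verb.lmer02 (random intercept + slope)

chisq <- 2 * (ll_with_slope - ll_intercept_only)    # likelihood-ratio statistic
p     <- pchisq(chisq, df = 2, lower.tail = FALSE)  # df = difference in parameters (6 - 4)

chisq  # 318, matching the 318.39 in the anova() table up to rounding
```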
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ○ Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Hypothetical VC duration data
• Duration measurements of vowel-consonant sequences, e.g. (c)ab, (c)ap: the V and C durations are measured.
• Suppose there is an inverse relationship between consonant duration (Cdur) and vowel duration (Vdur).
VC duration
• Construct two types of measurement data. Data with:
  (01) subject-specific intercept only
  (02) subject-specific intercept and slope
Data 01: subject-specific intercept
• Equation: y = a + b*x + e (where y = Vdur, x = Cdur, e = error)
Data 01: subject-specific intercept
• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  - sd of subject-specific intercept = 20 ms
• b (slope) = -1
• x (Cdur) = 100 ms
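A minimal R sketch of how data like this could be simulated (the variable names are mine, not from the handout):

```r
set.seed(1)
subjN <- 5; wordN <- 7
a <- 300; b <- -1                              # population intercept (ms) and slope
subj_int <- rnorm(subjN, mean = 0, sd = 20)    # subject-specific intercept deviations

d <- expand.grid(subject = 1:subjN, word = 1:wordN)
d$Cdur <- rnorm(nrow(d), mean = 100, sd = 10)  # consonant durations around 100 ms
e <- rnorm(nrow(d), sd = 5)                    # residual error (set to 0 for the error-free data)
d$Vdur <- a + subj_int[d$subject] + b * d$Cdur + e
```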
Data 01 with no error term
• Equation: y = a + b*x + e (where e = 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 01 w/o error

  subject    word  Cdur        Vdur
  1          1     94          232
  1          2     92          235
  1          3     91          235
  1          4     103         224
  1          5     92          234
  1          6     103         223
  1          7     123         203
  2          1     100         189
  …          …     …           …
  5          6     99          206
  5          7     111         194
  mean (sd)        98.4 (9.8)  198.5 (19.44)
Data 01 w/o error
• Average Vdur by subject

  subject  Vdur
  1        227
  2        192
  3        180
  4        185
  5        207
Data 01 w/o error: scatter plot
Data 01 with error term
• Equation: y = a + b*x + e (where e ≠ 0)
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 01 with error

  subject    word  Cdur        Vdur
  1          1     94          233
  1          2     92          236
  1          3     91          234
  1          4     103         221
  1          5     92          241
  1          6     103         220
  1          7     123         197
  2          1     100         193
  …          …     …           …
  5          6     99          212
  5          7     111         195
  mean (sd)        98.4 (9.8)  199.8 (19.08)
Data 01 w/ error: scatter plot
Data 01 w/ error
• Mixed effects model with a separate intercept for each subject: average Vdur may be subject-specific.

> lmer(Vdur ~ Cdur + (1|subject),data=d)
Data 01 w/ error
• Random effects
   Groups   Name         Variance  Std.Dev.
   subject  (Intercept)  303.93    17.434
   Residual              24.05     4.904
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  302.964      25.885   <0.001
   Cdur         -1.047       -11.865  <0.001
Data 02: subject-specific intercept and slope
• Equation: Vdur = a + b*Cdur + e
• a (intercept) = 300 ms
  - sd of subject-specific intercept = 20 ms
• b (slope) = -1
  - sd of subject-specific slope = 0.3
• x (Cdur) = 100 ms
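Extending the earlier simulation sketch, subject-specific slopes can be added by drawing a per-subject slope deviation as well (again, the variable names are my own):

```r
set.seed(2)
subjN <- 5; wordN <- 7
subj_int   <- rnorm(subjN, mean = 0, sd = 20)   # subject-specific intercept deviations
subj_slope <- rnorm(subjN, mean = 0, sd = 0.3)  # subject-specific slope deviations

d <- expand.grid(subject = 1:subjN, word = 1:wordN)
d$Cdur <- rnorm(nrow(d), mean = 100, sd = 10)
d$Vdur <- 300 + subj_int[d$subject] +
          (-1 + subj_slope[d$subject]) * d$Cdur +   # each subject gets its own slope
          rnorm(nrow(d), sd = 5)
```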
Data 02: subject-specific intercept and slope
• Equation: y = a + b*x + e
• Number of subjects (subjN) = 5
• Number of words (wordN) = 7
Data 02 w/ subject-specific intercept & slope

  subject    word  Cdur        Vdur
  1          1     94          224
  1          2     92          227
  1          3     91          225
  1          4     103         211
  1          5     92          232
  1          6     103         210
  1          7     123         185
  2          1     100         198
  …          …     …           …
  5          6     99          196
  5          7     111         177
  mean (sd)        98.4 (9.8)  195.2 (16.9)
Data 02: scatter plot
Data 02
• Mixed effects model with subject-specific intercept & slope:

> lmer(Vdur ~ Cdur + (1+Cdur|subject),data=d)
Data 02
• Random effects
   Groups   Name         Variance  Std.Dev.  Corr
   subject  (Intercept)  708.599   26.62
            Cdur         0.015     0.122     -1.00
   Residual              20.989    4.581
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  305.987      21.186   <0.001
   Cdur         -1.124       -11.331  <0.001

The estimated sd of the subject-specific slope (0.122) is not even close to the true 0.3.
Data 02: larger data
• Equation: y = a + b*x + e
• Number of subjects (subjN) = 30
• Number of words (wordN) = 50
Data 02: larger data
• Random effects
   Groups   Name         Variance  Std.Dev.  Corr
   subject  (Intercept)  474.678   21.787
            Cdur         0.089     0.298     -0.19
   Residual              25.064    5.006
• Fixed effects
                Estimate  …  t value  Pr(>|t|)
   (Intercept)  296.2        70.702   <0.001
   Cdur         -0.908       -16.218  <0.001

With more subjects and items, the estimated slope sd (0.298) recovers the true 0.3.
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ○ Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
Interaction terms and model selection
• When there is more than one fixed effect, we can also consider interaction terms.
• As more terms are added to the model, evaluating which model best fits the data becomes necessary.
  - vdur ~ 1
  - vdur ~ cdur
  - vdur ~ cdur + voicing
  - vdur ~ cdur + voicing + cdur:voicing
Interaction terms and model selection
• A key aspect of the mixed-effects analysis is often model selection, the choice of a particular model within a class of candidate models (Müller et al. 2013).
Model selection
• Aim: a parsimonious model with other desirable properties, i.e., as good a fit as possible with a minimum of predictor variables.
• Starting from the saturated model (the full model with all factors and their interactions), eliminate non-significant predictors, applying the likelihood ratio test (LRT).
• Backward elimination for linear mixed models.
Functions for model selection
• update(): refits a model with a factor removed
• anova(): likelihood ratio test
• step(): performs model selection, giving the results (the best model) all at once
Data: vcdur.txt
• English speakers’ production of monosyllabic English words differing in coda voicing (bit, bid, beat, bead, …).
• Speaker regions: NZ (3), UK (2), US (2)
• Dependent variable: vdur
Specifying interaction terms
a*b = a + b + a:b
vdur ~ cdur*voicing = vdur ~ cdur + voicing + cdur:voicing
vdur ~ cdur*voicing*height = vdur ~ cdur + voicing + height + cdur:voicing + voicing:height + cdur:height + cdur:voicing:height
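This expansion can be checked directly in base R by inspecting the term labels of a formula (formulas are symbolic, so no data is needed):

```r
# terms() expands a*b into the main effects plus the interaction
labs <- attr(terms(vdur ~ cdur * voicing), "term.labels")
labs  # "cdur" "voicing" "cdur:voicing"
```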
Saturated model
• Model with all factors and their interactions.
• Starting with this saturated model, eliminate factors and interactions that do not significantly contribute to the goodness of the model fit.
• The contribution is tested by LRT (p < 0.05).
• Fit with REML=FALSE (i.e., ML) for model selection; use REML=TRUE for parameter estimation.
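The handout does not show the call that creates mm1; assuming the vcdur.txt data have been read into a data frame d (the file name and read call are assumptions), the saturated model would be fit along these lines, with the formula taken from the anova() output later in the handout:

```r
library(lmerTest)
d <- read.delim("vcdur.txt")  # assumed path and format

# ML fit (REML = FALSE) so that likelihood-ratio tests between nested models are valid
mm1 <- lmer(vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item),
            data = d, REML = FALSE)
summary(mm1)
```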
Saturated model: model with all interaction terms
summary(mm1)

Fixed effects:
                                     Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)                          216.04612  20.24309    41.6   10.673   1.76e-13 ***
cdur                                 -0.15703   0.07356     480.9  -2.135   0.03328 *
voicingVOICELESS                     -24.87089  25.79788    38.7   -0.964   0.34100
heightNONHIGH                        12.46191   24.78135    33.0   0.503    0.61839
cdur:voicingVOICELESS                -0.23325   0.08912     474.4  -2.617   0.00915 **
cdur:heightNONHIGH                   0.18897    0.09312     474.9  2.029    0.04299 *
voicingVOICELESS:heightNONHIGH       8.80019    36.61632    39.2   0.240    0.81132
cdur:voicingVOICELESS:heightNONHIGH  -0.17443   0.13220     473.6  -1.319   0.18765
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We will eliminate the non-significant three-way interaction.
Eliminating a factor: use update()
• Syntax: update(x, ~.-factor)
• Delete the non-significant three-way interaction term with update():
  mm2 <- update(mm1, ~.-cdur:voicing:height)
• Output model specification:

Linear mixed model fit by maximum likelihood. t-tests use Satterthwaite's method [lmerModLmerTest]
Formula: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
    cdur:voicing + cdur:height + voicing:height
Likelihood Ratio Test
• The saturated model (mm1) is not better than the reduced model (mm2), so we can discard mm1.

> anova(mm1,mm2)
Data: d
Models:
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
     Df  AIC     BIC     logLik   deviance  Chisq   Chi Df  Pr(>Chisq)
mm2  10  4996.9  5039.1  -2488.5  4976.9
mm1  11  4997.2  5043.6  -2487.6  4975.2    1.7379  1       0.1874
Reporting the result
• Report the chi-square statistic as χ2(df) = Chisq value, p = p value.
“The model with the three-way interaction was not significantly better than the model without the interaction (χ2(1) = 1.74, p = .19).”
Current model: summary(mm2)

Fixed effects:
                                Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)                     212.42246  20.06869    40.1   10.585   3.55e-13 ***
cdur                            -0.12437   0.06938     478.1  -1.793   0.0737 .
voicingVOICELESS                -14.43787  24.57580    31.9   -0.587   0.5610
heightNONHIGH                   21.47215   23.83963    28.3   0.901    0.3754
cdur:voicingVOICELESS           -0.31050   0.06731     474.7  -4.613   5.11e-06 ***
cdur:heightNONHIGH              0.10303    0.06668     474.2  1.545    0.1230
voicingVOICELESS:heightNONHIGH  -13.76816  32.39704    24.1   -0.425   0.6746
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Among the interactions, eliminate the one with the highest p-value.
Elimination (2nd)
• A model with more parameters (mm2) is not better than mm3, so we can discard mm2 and adopt the simpler model mm3.
• Reporting: the interaction of voicing and vowel height did not significantly improve the goodness of fit (χ2(1) = 0.18, p = .67).

> mm3<-update(mm2,~.-voicing:height)
> anova(mm2,mm3)
Data: d
Models:
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
     Df  AIC     BIC     logLik   deviance  Chisq  Chi Df  Pr(>Chisq)
mm3  9   4995.1  5033.1  -2488.6  4977.1
mm2  10  4996.9  5039.1  -2488.5  4976.9    0.18   1       0.6714
Summary of the current model (mm3)

Fixed effects:
                       Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)            215.77739  18.48194    41.7   11.675   1.02e-14 ***
cdur                   -0.12390   0.06938     478.1  -1.786   0.0748 .
voicingVOICELESS       -21.31848  18.52951    40.4   -1.151   0.2567
heightNONHIGH          14.94762   18.29788    38.3   0.817    0.4190
cdur:voicingVOICELESS  -0.31019   0.06730     474.9  -4.609   5.21e-06 ***
cdur:heightNONHIGH     0.10022    0.06634     482.6  1.511    0.1315
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Now remove cdur:height.
• Repeat the same procedure until deleting a term yields a significant change in the goodness of fit (p < 0.05); then:
> anova(mm1,mm2,mm3,mm4,mm5,mm6,mm7,mm8)
Data: d
Models:
mm8: vdur ~ cdur + (1 | speaker) + (1 | item)
mm6: vdur ~ cdur + (1 | speaker) + (1 | item) + cdur:voicing
mm7: vdur ~ (1 | speaker) + (1 | item) + cdur:voicing
mm5: vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
mm4: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm4:     cdur:voicing
mm3: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm3:     cdur:voicing + cdur:height
mm2: vdur ~ cdur + voicing + height + (1 | speaker) + (1 | item) +
mm2:     cdur:voicing + cdur:height + voicing:height
mm1: vdur ~ cdur * voicing * height + (1 | speaker) + (1 | item)
     Df  AIC     BIC     logLik   deviance  Chisq    Chi Df  Pr(>Chisq)
mm8  5   5022.7  5043.8  -2506.3  5012.7
mm6  6   4995.4  5020.7  -2491.7  4983.4    29.2976  1       6.207e-08 ***
mm7  6   4995.4  5020.7  -2491.7  4983.4    0.0000   0       1.00000
mm5  7   4996.2  5025.7  -2491.1  4982.2    1.1459   1       0.28442
mm4  8   4995.4  5029.1  -2489.7  4979.4    2.8169   1       0.09328 .
mm3  9   4995.1  5033.1  -2488.6  4977.1    2.2750   1       0.13148
mm2  10  4996.9  5039.1  -2488.5  4976.9    0.1800   1       0.67140
mm1  11  4997.2  5043.6  -2487.6  4975.2    1.7379   1       0.18741
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

mm6: best model
mm5: best model that keeps all main effects involved in the interactions
Eliminating random effects
• To the best model so far (mm5), try removing the random effects:
  mm9 <- update(mm5, ~.-(1|speaker), REML=TRUE)
  mm10 <- update(mm5, ~.-(1|item), REML=TRUE)
  anova(mm5,mm9)
  anova(mm5,mm10)
• The current model is significantly better than the model without the by-speaker random intercept (χ2(1) = 201.22, p < 0.0001) and the one without the by-item random intercept (χ2(1) = 401.54, p < 0.0001).
Refitting the best model with REML
• After the best model is decided, refit it with REML (the lmer() default, so simply remove REML=FALSE) to obtain the best estimates of the parameter values.
  mm5 <- lmer(vdur~cdur+voicing+cdur:voicing+(1|speaker)+(1|item), data=d)
The best model: summary & interpretation
• Interpretation: for a voiceless coda, the Vdur~Cdur slope is 0.31 ms/ms more negative than for a voiced coda (cdur:voicingVOICELESS, t(473) = -4.54, p < .0001); vowel duration decreases more steeply with consonant duration when the consonant is voiceless.

Fixed effects:
                       Estimate   Std. Error  df     t value  Pr(>|t|)
(Intercept)            224.61615  17.24766    33.4   13.023   1.25e-14 ***
cdur                   -0.08896   0.06580     468.5  -1.352   0.177
voicingVOICELESS       -20.87662  19.86242    34.9   -1.051   0.300
cdur:voicingVOICELESS  -0.30664   0.06758     472.7  -4.537   7.23e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
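The overall Cdur slope for voiceless codas is the sum of the cdur main effect and the interaction term, a small piece of arithmetic worth making explicit:

```r
# coefficients from the summary above
b_cdur <- -0.08896   # Cdur slope for the reference (voiced) level
b_int  <- -0.30664   # cdur:voicingVOICELESS (additional slope for voiceless)

slope_voiceless <- b_cdur + b_int
slope_voiceless  # about -0.40 ms of Vdur per ms of Cdur for voiceless codas
```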
Backward elimination using step()
• The function step() performs backward elimination of non-significant effects of a linear mixed effects model.
• The step() function of package lmerTest masks the one for lm objects (which does stepwise selection based on AIC).
• Applying it to our data:
> step(mm1)
Backward reduced random-effect table:

               Eliminated  npar  logLik   AIC     LRT     Df  Pr(>Chisq)
<none>                     11    -2487.6  4997.2
(1 | speaker)  0           10    -2588.7  5197.3  202.09  1   < 2.2e-16 ***
(1 | item)     0           10    -2673.5  5367.0  371.78  1   < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Backward reduced fixed-effect table:
Degrees of freedom method: Satterthwaite

                     Eliminated  Sum Sq   Mean Sq  NumDF  DenDF   F value  Pr(>F)
cdur:voicing:height  1           1654.5   1654.5   1      473.56  1.7410   0.18765
voicing:height       2           172.3    172.3    1      24.10   0.1806   0.67462
cdur:height          3           2176.4   2176.4   1      482.55  2.2819   0.13155
height               4           2866.7   2866.7   1      23.61   2.9903   0.09682 .
cdur:voicing         0           19776.1  19776.1  1      474.75  20.6284  7.079e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Selected model:
vdur ~ cdur + voicing + (1 | speaker) + (1 | item) + cdur:voicing
Roadmap
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ○ English dative alternation
II. Mixed effects logistic regression
○ English dative alternation
English dative alternation
• Discussed in Johnson (2008, section 7.4), citing Bresnan et al. (2007).
• Two alternative ways:
  dative PP: I pushed the box to John.
  dative NP: I pushed John the box.
English dative alternation
• Preference (A > B: A is preferred to B.)
  dative PP                                            dative NP
  I pushed the box to John.                        >   I pushed John the box.
  That movie gave the creeps to me.                <   That movie gave me the creeps.
  This will give the creeps to just about anyone.  >   This will give just about anyone the creeps.
English dative alternation
• Factors for preference: recipient = pronoun or not; short or long; given or new …?
Data: BresDative.txt
Two corpora:
• the Switchboard corpus of conversational speech
• the Wall Street Journal corpus of text
Data: BresDative.txt
Coded variables:
• the realization of the dative (PP, NP);
• the discourse accessibility, definiteness, animacy, and pronominality of the recipient and theme;
• the semantic class of the verb
  - abstract, transfer, future transfer, prevention of possession, and communication;
• a measure of the difference between the (log) length of the recipient and the (log) length of the theme.
Data: BresDative.txt

        real  verb   vsense   class  animrec  …
  1     NP    feed   feed.t   t      animate  …
  2     NP    give   give.a   a      animate  …
  3     NP    give   give.a   a      animate  …
  4     NP    give   give.a   a      animate  …
  5     NP    offer  offer.c  c      animate  …
  6     NP    give   give.a   a      animate  …
  …     …     …      …        …      …        …
  3265  NP    give   give.a   a      animate  …
Data: BresDative.txt
• Question: can we predict the realization of the dative from these variables?
  class, accessibility, definiteness, … → realization of dative
Data: BresDative.txt
• Logistic regression
• R function: glm()
Logistic regression
• Regression for count data (binary outcomes, proportions).
• We cannot apply standard linear regression to the proportions:
  - Proportions are bounded between 0 and 1, but lm() does not know this.
  - Other problems.
→ Logit transformation
Logistic regression
• Logit transformation
  i.   p              proportion            0 to 1
  ii.  p/(1-p)        odds                  0 to ∞
  iii. log(p/(1-p))   log odds, i.e. logit  -∞ to +∞
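In R the logit and its inverse are built in as qlogis() and plogis(), which is handy for interpreting logistic regression coefficients:

```r
p <- 0.8
odds  <- p / (1 - p)   # 4: the odds
logit <- log(odds)     # about 1.39: the log odds

# qlogis() is the logit; plogis() is its inverse (log odds back to proportion)
all.equal(logit, qlogis(p))   # TRUE
plogis(logit)                 # 0.8
```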
Data: BresDative.txt
• Logistic regression model of the dative alternation:
  glm(real~class+accrec+…, family=binomial, data=SwitchDat)
  “PP realization as a function of class, …”
Data: BresDative.txt

> summary(glm(real ~ class+accrec+accth+prorec+proth+defrec+defth+animrec+ldiff, family=binomial, data=SwitchDat))
...
Coefficients:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     0.3498    0.3554      0.984    0.32503
classc          -1.3516   0.3141      -4.303   1.68e-05 ***
classf          0.5138    0.4922      1.044    0.29651
classp          -3.4277   1.2504      -2.741   0.00612 **
classt          1.1571    0.2055      5.631    1.80e-08 ***
accrecnotgiven  1.1282    0.2681      4.208    2.57e-05 ***
...
Data: BresDative.txt
• accrecnotgiven: 1.1282 (p = 2.57e-05) → when the recipient is not given earlier in the discourse, PP realization is more likely.
Data: BresDative.txt
• Fixed effects logistic regression model:
  real = 0.34 – 1.35*classc + 0.51*classf – 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …
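Since these coefficients are on the log-odds scale, a fitted value is turned back into a probability with the inverse logit. For example, assuming (as the handout's reading of the model does) that the model predicts PP realization, the predicted probability for a transfer-class verb with all other predictors at their baseline levels would be:

```r
# (Intercept) + classt, coefficients from the glm summary above
logodds <- 0.3498 + 1.1571

plogis(logodds)  # about 0.82: predicted probability of PP realization
```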
Data: BresDative.txt
• A fixed effects logistic regression model:
  real = 0.34 – 1.35*classc + 0.51*classf – 3.42*classp + 1.15*classt + 1.12*accrecnotgiven + …
• Incorrect assumption: all verbs have the same preference in the dative alternation.
Data: BresDative.txt
• Possibility: verbs may differ in their preference in the dative alternation.
Data: BresDative.txt
• Possibility: verbs may differ in their preference in the dative alternation.
• Verbs may have more than one sense, e.g. pay:
  - transfer: to pay him some money
  - more abstract: to pay attention to the clock
Data: BresDative.txt
• Possibility (revised): verb senses may differ in their preference in the dative alternation.
• Add a random factor for verb sense.
Mixed effects logistic regression model
• A logistic regression model containing both fixed and random factors.
• R package: lme4 (loaded by lmerTest)
• R function: glmer()
A model with random intercept
• Assumption: the average preference for PP realization may be specific to each verb sense.
• The model includes a separate intercept for each verb sense.
A model with random intercept
• The model includes a separate intercept for each verb sense.
• Formula for “PP realization as a function of class, etc.”:
  glmer(real ~ class + accrec + accth + prorec + proth + defrec + defth + animrec + ldiff + (1|vsense), family=binomial, data=SwitchDat)
Mixed effects logistic model w/ random intercept

> summary(glmer(real ~ class...(1|vsense),...)
...
Random effects:
 Groups  Name         Variance  Std.Dev.
 vsense  (Intercept)  4.976     2.23

Fixed effects:
                Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)     1.3943    0.8069      1.728    0.083984 .
classc          -1.3882   1.1481      -1.209   0.226614
classf          -0.2491   1.3619      -0.183   0.854862
classp          -4.9325   2.1697      -2.273   0.023007 *
classt          0.9035    0.9647      0.937    0.348957
accrecnotgiven  1.6421    0.3417      4.805    1.55e-06 ***
...
Mixed effects logistic model w/ random intercept
• Compare the outputs of the mixed effects model vs. the model with no random factor.

                > summary(glmer(...(1|vsense)...)    > summary(glm(...)
                Estimate  Pr(>|z|)                   Estimate  Pr(>|z|)
(Intercept)     1.3943    0.083984 .                 0.3498    0.32503
classc          -1.3882   0.226614                   -1.3516   1.68e-05 ***
classf          -0.2491   0.854862                   0.5138    0.29651
classp          -4.9325   0.023007 *                 -3.4277   0.00612 **
classt          0.9035    0.348957                   1.1571    1.80e-08 ***
accrecnotgiven  1.6421    1.55e-06 ***               1.1282    2.57e-05 ***
...
Random intercepts and slopes
• Many maximal random effects models (e.g. with both random intercepts and random slopes) fail to converge (i.e., the fitting algorithm cannot find a solution).
• In that case, you will have to use a simpler model (e.g. intercepts-only).
What we’ve covered today
I. Mixed effects linear regression
  ● Wall Street Journal corpus data
  ● Hypothetical VC duration data
  ● Interaction terms and model selection
II. Mixed effects logistic regression
  ● English dative alternation
Thank you.