Mosteller & Tukey (1977). Data Analysis and Regression.




Gerry Altmann

“Encouraging linguists to use linear mixed-effects models is like giving shotguns to toddlers.”

(see Barr et al., 2013)


“A world of subjectivity”

Sarah Depaoli

“IF YOU BEAT THE DATA, AT SOME TIME IT WILL SPEAK”


“A world of subjectivity”

Sarah Depaoli

“… and then you publish and get tenure.”


LMM: response ~ intercept + slope * fixed effect + error

Distinguish between test and control variables
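Written as an equation, the verbal formula above is a regression of the response on one predictor. The sketch below assumes normally distributed errors, which is the assumption the residual plots are later used to check. The second line shows one possible LMM form. It adds hypothetical by-subject deviations u for subject j, which are the random intercepts and slopes discussed later.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

y_{ij} = (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, x_{ij} + \varepsilon_{ij}, \qquad
(u_{0j}, u_{1j}) \sim \mathcal{N}(\mathbf{0}, \Sigma)
```

The symbols are:
- \beta_0: the intercept
- \beta_1: the slope of the fixed effect x
- \varepsilon: the error
- u_{0j} and u_{1j}: the random intercept and random slope for subject j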


Test vs. Control Variable Example

Null Model: pitch ~ gender

pitch ~ politen * gender

test variable: politen; control variable: gender
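The test-vs-control logic amounts to comparing the full model against a null model that contains only the control variable. Below is a minimal sketch of that likelihood ratio test on simulated data. It makes several assumptions:
- It uses ordinary Gaussian regression without random effects.
- The effect sizes are hypothetical.
- Both predictors are binary, so the least-squares fits reduce to group means: one per gender for the null model, and one per politen × gender cell for the full model.

For real mixed models, the same comparison is made between models fit with maximum likelihood rather than REML.

```python
import math
import random
from collections import defaultdict


def group_means_fit(y, keys):
    """Least-squares fitted values for a model with one mean per group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, key in zip(y, keys):
        sums[key] += yi
        counts[key] += 1
    return [sums[key] / counts[key] for key in keys]


def gaussian_loglik(y, fitted):
    """Maximised Gaussian log-likelihood, using the ML variance estimate RSS / n."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


random.seed(1)
n = 400
gender = [random.randint(0, 1) for _ in range(n)]    # control variable (0/1)
politen = [random.randint(0, 1) for _ in range(n)]   # test variable (0/1)
# Hypothetical data-generating values: intercept 200, gender -80, politeness -20, SD 20
pitch = [200 - 80 * g - 20 * p + random.gauss(0, 20) for g, p in zip(gender, politen)]

# Null model, pitch ~ gender: one mean per gender (2 coefficients)
ll_null = gaussian_loglik(pitch, group_means_fit(pitch, gender))
# Full model, pitch ~ politen * gender: one mean per cell (4 coefficients)
ll_full = gaussian_loglik(pitch, group_means_fit(pitch, list(zip(politen, gender))))

chisq = 2 * (ll_full - ll_null)
df = 4 - 2                      # extra coefficients: politen and politen:gender
p_value = math.exp(-chisq / 2)  # chi-square survival function; exact only for df = 2
print(f"Chisq({df}) = {chisq:.2f}, p = {p_value:.3g}")
```

The test variable is the only thing that differs between the two models. Any improvement in fit is therefore attributed to politen, with gender held in both as a control.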


Test vs. Control Variable Example

Null Model: vowdur ~ Repetition

vowdur ~ VowelType * Repetition

test variable: VowelType; control variable: Repetition


Test vs. Control Variable Example

Response ~ Critical Effect + Control 1 + Control 2 + Random Effects

BLACKBOX


Test vs. Control Variable Example

Response ~ Critical Effect + Control 2 + Control 3 + Random Effects

BLACKBOX


Test vs. Control Variable Example

Response ~ Critical Effect + Control 2 + Control 3 + Random Effects


Trade-off #1

Model Simplicity

Model Fit


Trade-off #2

Data-driven (“Exploratory End”)

Theory-driven (“Confirmatory End”)

Harald Baayen


Trade-off #2

Data-driven (“Exploratory End”)

Theory-driven (“Confirmatory End”)

Roger Mundry (and many others)


Trade-off #2: Big Question

How much do you allow the data to suggest new hypotheses? How much do you depend on a priori theory?


Approach 1: more data-driven

Approach 2: more theory-driven

• e.g., test whether random slopes are needed (maybe not advisable)

• e.g., test whether an interaction is necessary or not (“o.k.” if the interaction is a control term rather than the test effect)

• e.g., test whether something requires a non-linear or a linear effect (maybe o.k.)


Approach 1: more data-driven

THINGS TO WORRY ABOUT:

• Taken to the extreme, this approach is very likely to turn up some significant result

• The model selection process is less transparent to outsiders (or you have to write a LONG, LONG stats section)
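To put a rough number on the first worry: every extra model comparison is another chance for a false positive. A minimal sketch, assuming each comparison is an independent test at α = 0.05 and every null hypothesis is true. Real model-selection tests are correlated, so treat this as an approximation only.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Chance of at least one p < alpha among k independent tests of true nulls."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20):
    print(f"{k:>2} tests: {familywise_error(0.05, k):.3f}")
```

With 20 such decisions, the chance of at least one spurious "significant" result is already about 0.64.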


Approach 1: more data-driven

ADVANTAGES:

• You don’t miss important things in your data

• Your model might thus be more accurate and “more true to the data”


Approach 2: more theory-driven

• You formulate your model before you look at the data

• The components of your model are guided by theory + published results, general world knowledge, and research experience

• Taken to the extreme, you can’t even make a plot before you formulate your model


Approach 2: more theory-driven

ADVANTAGES:

• It forces you to think a lot

• It’s fun!

• It gives you a lot of responsibility, as a scientist

• Your estimates are going to be more conservative


Approach 1: more data-driven

Approach 2: more theory-driven

• (1) Think about the model (before you conduct your experiment)

• (2) Build the model, evaluate the model’s assumptions

• (3) Build a model that better fits the assumptions

• (4) Test whether control variables interact with the test variable, or whether they are needed


Dialogue with your model

“You need to know that there are multiple responses per subject and item!”

“People might speed up or slow down throughout an experiment.”

“You need to know that each item was repeated two times!”

Token, Researcher ;-)


Keep in mind:

• You have to resolve non-independencies

• Your random effects structure should be maximal with respect to your experimental design
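One way to keep the maximal random-effects structure explicit is to generate the model formula from a statement of which predictors vary within each grouping factor. A minimal sketch in lme4's `(1 + x | group)` formula syntax:
- The helper `maximal_formula` and the design mapping are hypothetical.
- Deciding which predictors vary within subjects or items is still your job; the code only writes the string.
- The example mirrors the by-subject and by-item Politeness slopes of the write-up example later in this deck.

```python
def maximal_formula(response: str, fixed: str, within: dict[str, list[str]]) -> str:
    """Build an lme4-style formula string with a 'maximal' random-effects part.

    `within` maps each grouping factor (e.g. subjects, items) to the predictors
    that vary within that factor. Each of those gets a random slope, and every
    grouping factor gets a random intercept ("1").
    """
    random_terms = []
    for group, predictors in within.items():
        rhs = " + ".join(["1", *predictors])
        random_terms.append(f"({rhs} | {group})")
    return " + ".join([f"{response} ~ {fixed}", *random_terms])


# Hypothetical mapping: politeness varies within subjects and within items;
# gender is treated as between-subject, so it gets no by-subject slope.
formula = maximal_formula(
    "pitch",
    "politeness * gender",
    {"subject": ["politeness"], "item": ["politeness"]},
)
print(formula)
```

If gender (and thus the politeness × gender interaction) also varied within items in your design, a maximal structure would list those under "item" as well.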


Protecting your research from yourself:

Whatever you do, your model decision should not be based on the significance of your effect


(JEPS Bulletin)


Important principle

CONFIRM FIRST, EXPLORE SECOND

John McArdle

McArdle, J. J. (2011). Some ethical issues in factor analysis. In A. T. Panter & S. K. Sterba (Eds.), Handbook of Ethics in Quantitative Methodology (pp. 313–339). New York, NY: Routledge.

McArdle (2011: 335)


The write-up


Important principle

BE HONEST, NOT PURE

John McArdle


Cool guidelines

United Nations Economic Commission for Europe (2009a). Making Data Meaningful Part 1: A guide to writing stories about numbers. New York and Geneva: United Nations.

United Nations Economic Commission for Europe (2009b). Making Data Meaningful Part 2: A guide to presenting statistics. New York and Geneva: United Nations.


“We tested a linear mixed effects model with subjects and items as random effects.”


The write-up should reflect (as adequately as possible) the details of your model… and your model selection procedure

= Reproducible Research


Rule of thumb:

“One needs to provide sufficient information for the reader to be able to recreate the analyses.” (Barr et al., 2013)

Ask yourself: With the information that I provided, could I, myself, replicate the analysis?


How to write up

• (1) "Phenomenon-oriented write-up"

• (2) Appendix / Supplementary Materials


Example #1

“We used generalized linear mixed models to test the effect of Gender and Politeness on pitch. Subjects and items were random effects (random intercepts) (Baayen, Davidson & Bates, 2008), with random slopes for subjects and items for the effect of Politeness (Barr, Levy, Scheepers & Tily, 2013). We also included a Gender * Politeness interaction in the model and, if this interaction was not significant, only included the main effects. /// Q-Q plots and plots of residuals against fitted values revealed no obvious deviations from normality and homoskedasticity. We report p-values based on Likelihood Ratio Tests of the model with the main fixed effect in question (Politeness) against the model without the main fixed effect (null model, including Gender).”


Example #2: "Phenomenon-oriented"

“We used generalized linear mixed models to test the association between voice onset time and pitch. The fixed effects quantify the effect of VOT on pitch, as well as the effect of place of articulation, vowel type and gender on pitch. The random effects quantify the by-subject and by-item variability in pitch (random intercepts), as well as the variation of the effect of VOT on pitch across subjects and items (random slopes).”


Mentioning assumptions

“Visual inspection of residual plots revealed no obvious deviation from normality and homoskedasticity of errors.”

“We checked plots of residuals against fitted values and found no indication that the normality and homoskedasticity assumptions were violated.”

“… indicated a problem with … We therefore log-transformed the data.”


Results

• Provide the results of the likelihood ratio test (χ² statistic, degrees of freedom, p-value)

• Provide estimates and standard errors in the metric of the model

• For Poisson and logistic regression, additionally provide some exemplary back-transformed values (don’t back-transform the standard errors)
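A minimal sketch of producing "exemplary back-transformed values" from model-scale estimates. The estimates are made up for illustration. The logistic inverse link is the inverse logit, and the Poisson inverse link is exp. The standard errors stay on the model scale, as the slide recommends.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a log-odds value back to a probability (inverse of the logit link)."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical logistic-regression estimates, on the log-odds scale
intercept, slope = -0.5, 1.2
p_reference = inv_logit(intercept)        # predicted probability at the reference level
p_other = inv_logit(intercept + slope)    # predicted probability at the other level

# Hypothetical Poisson-regression estimates, on the log scale
pois_intercept, pois_slope = 1.0, 0.3
count_reference = math.exp(pois_intercept)            # expected count, reference level
count_other = math.exp(pois_intercept + pois_slope)   # expected count, other level
rate_ratio = math.exp(pois_slope)                     # multiplicative effect of the predictor

# Standard errors are reported on the model (log-odds / log) scale, untransformed.
print(f"logistic: {p_reference:.3f} vs. {p_other:.3f}")
print(f"Poisson:  {count_reference:.2f} vs. {count_other:.2f} (ratio {rate_ratio:.2f})")
```

The slope is added to the intercept before back-transforming. `inv_logit(slope)` on its own is not a meaningful probability.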


Likelihood Model Output

Data: mag
Models:
magmodel.maineffect: linelength ~ condition + city_status + german_side + gender +
magmodel.maineffect:     trial_order + (1 + condition * city_status | subjects) +
magmodel.maineffect:     (1 + condition * city_status | items)
magmodel: linelength ~ condition * city_status + german_side + gender +
magmodel:     trial_order + (1 + condition * city_status | subjects) +
magmodel:     (1 + condition * city_status | items)
                    Df    AIC    BIC  logLik  Chisq Chi Df Pr(>Chisq)
magmodel.maineffect 27 7984.5 8121.9 -3965.3
magmodel            28 7893.7 8036.2 -3918.8 92.821      1  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
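To see where the reported numbers come from, here is a minimal sketch that recomputes the likelihood ratio test from the logLik and Df columns of the model-comparison output above. The statistic is twice the difference in log-likelihoods. It is referred to a chi-square distribution whose degrees of freedom equal the difference in Df, here one interaction parameter. The closed-form p-value used below is valid only for 1 degree of freedom.

```python
import math

# Values copied from the model-comparison output (logLik is rounded to 0.1 there)
ll_main, df_main = -3965.3, 27     # magmodel.maineffect
ll_inter, df_inter = -3918.8, 28   # magmodel (adds the condition:city_status interaction)

chisq = 2 * (ll_inter - ll_main)
chi_df = df_inter - df_main
# For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)); for other df use a
# chi-square survival function such as scipy.stats.chi2.sf
p_value = math.erfc(math.sqrt(chisq / 2))
print(f"Chisq = {chisq:.1f}, Chi Df = {chi_df}, p = {p_value:.2g}")
```

With the rounded logLik values, the statistic comes out as 93.0 rather than the printed 92.821. The software works from unrounded log-likelihoods. Likewise, AIC = -2 × logLik + 2 × Df reproduces the printed AIC values to within rounding (7984.6 vs. 7984.5, and 7893.6 vs. 7893.7).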



Make your scripts orderly and reproducible


Reproducibility

• Make your scripts available online

• Avoid modifying your data manually; write a script that records every processing step
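A minimal sketch of what "a script that records your process" can look like: every exclusion and transformation lives in code. The example transformation is the log transform mentioned under assumptions. The following are hypothetical:
- the column names
- the 30 ms exclusion threshold
- the helper `preprocess`

In practice you would read your own file with `open()` instead of the inline sample.

```python
import csv
import io
import math

MIN_VOWDUR_MS = 30.0  # hypothetical exclusion threshold; report it in the write-up


def preprocess(rows):
    """Exclude implausible tokens and add a log-transformed response.

    Every change is made here, in code, so it can be rerun and audited
    instead of editing the spreadsheet by hand.
    """
    kept, dropped = [], 0
    for row in rows:
        dur = float(row["vowdur"])
        if dur < MIN_VOWDUR_MS:
            dropped += 1
            continue
        kept.append({**row, "log_vowdur": math.log(dur)})
    return kept, dropped


raw = io.StringIO(
    "subject,item,vowdur\n"
    "s1,i1,120.5\n"
    "s1,i2,12.0\n"
    "s2,i1,98.3\n"
)
rows, n_dropped = preprocess(csv.DictReader(raw))
print(f"kept {len(rows)} rows, dropped {n_dropped}")
```

Printing how many rows were dropped gives you the number to report in the write-up. Rerunning the script on the raw file always reproduces the same analysis data.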


That’s it