Assumptions

Post on 22-Feb-2016


“Essentially, all models are wrong, but some are useful”

George E.P. Box

Your model has to be wrong… … but that’s o.k. if it’s illuminating!

Linear Model Assumptions

Absence of Collinearity

Normality of Errors

Homoskedasticity of Errors

No influential data points

Independence


Absence of Collinearity

Baayen (2008: 182)

Where does collinearity come from?

…most often, correlated predictor variables

Demo
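The demo itself is not preserved in this transcript. As a stand-in, here is a minimal sketch of how correlated predictors produce collinearity. The simulated variables `x1`, `x2`, `y` and the hand-computed variance inflation factor are assumptions for illustration, not taken from the slides.

```r
set.seed(42)

# Hypothetical simulated data: x2 is built to correlate strongly with x1
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # nearly a copy of x1
y  <- 2 * x1 + rnorm(n)         # only x1 truly affects y

cor(x1, x2)                     # correlation between the two predictors

# Standard error of the x1 slope, alone vs. alongside the collinear x2
se_alone <- summary(lm(y ~ x1))$coefficients["x1", "Std. Error"]
se_both  <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Std. Error"]

# Variance inflation factor for x1, computed by hand as 1 / (1 - R^2)
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

c(se_alone = se_alone, se_both = se_both, vif_x1 = vif_x1)
```

With both predictors in the model, the slope for `x1` is estimated far less precisely, even though the data are the same.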

What to do?

No influential data points

Baayen (2008: 189-190)

Leverage

DFbeta (…and much more)

Leave-one-out Influence Diagnostics

Winter & Matlock (2013)
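A minimal sketch of these diagnostics in base R, assuming hypothetical simulated data with one deliberately extreme point. The data and the hand-written leave-one-out loop are illustrations only, not the analysis from the cited work.

```r
set.seed(7)

# Hypothetical data with one deliberately extreme point added at the end
x <- c(rnorm(30), 6)
y <- c(0.5 * x[1:30] + rnorm(30), -5)
xmdl <- lm(y ~ x)

lev <- hatvalues(xmdl)        # leverage of each observation
dfb <- dfbeta(xmdl)[, "x"]    # change in the slope when each point is dropped

which.max(lev)                # observation with the highest leverage
which.max(abs(dfb))           # observation that moves the slope the most

# Leave-one-out by hand: refit without each point and record the slope
loo_slopes <- sapply(seq_along(x), function(i) coef(lm(y[-i] ~ x[-i]))[2])
range(loo_slopes)
```

If the range of leave-one-out slopes is wide, or changes sign, a single data point is driving the result.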


Normality of Error

The error (not the data!) is assumed to be normally distributed

So, the residuals should be normally distributed

xmdl = lm(y ~ x)
hist(residuals(xmdl))
qqnorm(residuals(xmdl))
qqline(residuals(xmdl))
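To illustrate "the error, not the data", here is a self-contained sketch with hypothetical simulated data: two groups with very different means, so the response `y` is bimodal while the residuals are still normal. The group coding and effect size are assumptions chosen for illustration.

```r
set.seed(1)

# Hypothetical data: two groups with very different means, normal errors
x <- rep(c(0, 1), each = 100)
y <- 10 * x + rnorm(200)

xmdl <- lm(y ~ x)

hist(y)                    # the data are bimodal, not normal
hist(residuals(xmdl))      # the residuals are what the assumption is about
qqnorm(residuals(xmdl))
qqline(residuals(xmdl))
```

A non-normal response is therefore not in itself a violation; what matters is what remains after the predictors are accounted for.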


Homoskedasticity of Error

The error (not the data!) is assumed to have equal variance across the predicted values

So, the residuals should have equal variance across the predicted values
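A residual plot is the usual way to look at this. The sketch below assumes hypothetical simulated data whose error spread grows with `x`; the half-split comparison of standard deviations is only a rough illustrative check, not a formal test.

```r
set.seed(2)

# Hypothetical data whose error spread grows with x (heteroskedastic)
x <- runif(300, min = 1, max = 10)
y <- 2 * x + rnorm(300, sd = 0.5 * x)

xmdl <- lm(y ~ x)

# Residual plot: a fan shape indicates unequal variance
plot(fitted(xmdl), residuals(xmdl))
abline(h = 0, lty = 2)

# Rough numeric check: residual spread in the lower vs. upper half of fitted values
upper    <- fitted(xmdl) > median(fitted(xmdl))
sd_lower <- sd(residuals(xmdl)[!upper])
sd_upper <- sd(residuals(xmdl)[upper])
c(sd_lower = sd_lower, sd_upper = sd_upper)
```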

What to do if normality/homoskedasticity is violated?

Either: nothing + report the violation

Or: report the violation + transformations

Two types of transformations

Linear Transformations: leave the shape of the distribution intact (centering, scaling)

Nonlinear Transformations: do change the shape of the distribution
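The difference can be checked numerically. This is a minimal sketch assuming a hypothetical right-skewed variable `rt`, with a hand-written `skewness()` helper (not a base R function) as the measure of shape; the log is used as one example of a nonlinear transformation.

```r
set.seed(3)

# Hypothetical right-skewed variable
rt <- rlnorm(1000, meanlog = 6, sdlog = 0.5)

# Sample skewness, written out by hand to avoid extra packages
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

rt_centered <- rt - mean(rt)           # linear: centering
rt_scaled   <- as.numeric(scale(rt))   # linear: centering + scaling
rt_log      <- log(rt)                 # nonlinear

c(raw = skewness(rt), centered = skewness(rt_centered),
  scaled = skewness(rt_scaled), log = skewness(rt_log))
```

Centering and scaling leave the skewness unchanged, while the log pulls in the long right tail.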

Before transformation

After transformation

Still bad… … but better!!


Normality of Errors: (Histogram of Residuals), Q-Q plot of Residuals

Homoskedasticity of Errors: Residual Plot


What is independence?

[Diagram: Common experimental data. One Subject, several Items (Item #1, Item ...), each Item with Rep 1, Rep 2, Rep 3]

Pseudoreplication = Disregarding Dependencies

Subject1  Item1
Subject1  Item2
Subject1  Item3
…  …
Subject2  Item1
Subject2  Item2
Subject2  Item3
…  …

Machlis et al. (1985): “pooling fallacy”

Hurlbert (1984): “pseudoreplication”

Hierarchical data is everywhere

• Typological data (e.g., Bell 1978, Dryer 1989, Perkins 1989; Jaeger et al., 2011)

• Organizational data

• Classroom data

[Figure: languages shown: German, French, English, Spanish, Italian, Swedish, Norwegian, Finnish, Hungarian, Turkish, Romanian]

[Figure: Class 1, Class 2]

Intraclass Correlation (ICC)
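One common way to estimate an ICC is from the mean squares of a one-way ANOVA. The sketch below assumes hypothetical classroom data (10 classes of 20 students) and the one-way ICC(1) estimator; the slides do not say which estimator was used.

```r
set.seed(4)

# Hypothetical classroom data: 10 classes of 20 students each
n_class <- 10
n_stud  <- 20
class   <- factor(rep(seq_len(n_class), each = n_stud))
class_effect <- rnorm(n_class, sd = 1)   # shared by everyone in a class
score   <- class_effect[as.integer(class)] + rnorm(n_class * n_stud, sd = 1)

# One-way ANOVA mean squares
ms  <- anova(lm(score ~ class))[["Mean Sq"]]
msb <- ms[1]   # between classes
msw <- ms[2]   # within classes

# ICC(1): share of the total variance that lies between classes
icc <- (msb - msw) / (msb + (n_stud - 1) * msw)
icc
```

An ICC well above zero means that students in the same class resemble each other, so their scores are not independent data points.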

Simulation for 16 subjects

[Plot labels: pseudoreplication, items analysis; axis: Type I error rate]
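The original simulation is not preserved in the transcript. The sketch below shows the general idea under stated assumptions: only the 16 subjects come from the slide, while the number of responses per subject, the number of simulated experiments, and the comparison analysis (one mean per subject, rather than the items analysis named on the slide) are hypothetical choices.

```r
set.seed(1)

# Assumed, illustrative settings (only the 16 subjects come from the slide)
n_subj <- 16    # subjects, half in each condition
n_obs  <- 20    # hypothetical number of responses per subject
n_sims <- 500   # hypothetical number of simulated experiments

simulate_once <- function() {
  subj     <- rep(seq_len(n_subj), each = n_obs)
  cond     <- ifelse(subj <= n_subj / 2, "A", "B")   # no true effect of cond
  subj_int <- rnorm(n_subj, sd = 1)                  # subject-specific offsets
  y        <- subj_int[subj] + rnorm(n_subj * n_obs, sd = 1)

  # Pseudoreplication: treat all responses as independent data points
  p_pseudo <- summary(lm(y ~ cond))$coefficients["condB", "Pr(>|t|)"]

  # One mean per subject: independence holds at this level
  subj_means <- as.vector(tapply(y, subj, mean))
  subj_cond  <- cond[!duplicated(subj)]
  p_means    <- t.test(subj_means ~ subj_cond)$p.value

  c(pseudo = p_pseudo, means = p_means)
}

p_vals <- replicate(n_sims, simulate_once())
type1  <- rowMeans(p_vals < 0.05)   # proportion of false positives
type1
```

Since there is no true effect, any analysis that respects the dependencies should reject in roughly 5% of runs; the pseudoreplicated analysis rejects far more often.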

Interpretational Problem: What’s the population for inference?

Violating the independence assumption makes the p-value…

…meaningless


That’s it (for now)