Post on 14-Dec-2015
Experimental design and analysis
Multiple linear regression
Gerry Quinn & Mick Keough, 1998
Do not copy or distribute without permission of authors.
Multiple regression
• One response (dependent) variable:
– Y
• More than one predictor (independent) variable:
– X1, X2, X3 etc.
– number of predictors = p
• Number of observations = n
Example
• A sample of 51 mammal species (n = 51)
• Response variable:
– total sleep time in hrs/day (y)
• Predictors:
– body weight in kg (x1)
– brain weight in g (x2)
– maximum life span in years (x3)
– gestation time in days (x4)
Regression models
Population model (equation):
• yi = β0 + β1x1 + β2x2 + ... + εi
Sample equation:
• yi = b0 + b1x1 + b2x2 + ....
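A minimal sketch of how the sample coefficients b0, b1, b2, ... might be obtained by least squares. The function name fit_multiple_regression and the small data set are assumptions made for illustration: the data are generated from a known equation with no error term, not taken from the sleep example. The normal equations are solved directly here to keep the sketch self-contained; a statistics package would normally be used instead.

```python
def fit_multiple_regression(X, y):
    """Least-squares estimates [b0, b1, ..., bp] via the normal equations.

    X is a list of rows, one row of p predictor values per observation.
    """
    rows = [[1.0] + [float(v) for v in row] for row in X]  # add intercept column
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data generated from y = 1 + 2*x1 + 3*x2 (illustration only).
X = [[0, 1], [1, 0], [2, 2], [3, 1], [4, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
b = fit_multiple_regression(X, y)
print([round(v, 6) for v in b])  # estimates of b0, b1, b2
```

Because the made-up response contains no error term, the estimates should recover the generating coefficients up to floating-point error.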
Example
• Regression model:
sleep = intercept + β1*bodywt + β2*brainwt + β3*lifespan + β4*gestime
Multiple regression equation
[Figure: axes are total sleep, log lifespan and log body weight]
Partial regression coefficients
• H0: β1 = 0
• Partial population regression coefficient (slope) for y on x1, holding all other x’s constant, equals zero
• Example:
– slope of regression of sleep against body weight, holding brain weight, max. life span and gestation time constant, is 0.
Partial regression coefficients
• H0: β2 = 0
• Partial population regression coefficient (slope) for y on x2, holding all other x’s constant, equals zero
• Example:
– slope of regression of sleep against brain weight, holding body weight, max. life span and gestation time constant, is 0.
Testing H0: βi = 0
• Use partial t-tests:
• t = bi / SEbi
• Compare with t-distribution with n-p-1 df (the residual df of the fitted model)
• Separate t-test for each partial regression coefficient in model
• Usual logic of t-tests:
– reject H0 if P < 0.05
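As a rough illustration of t = bi / SEbi, the sketch below recomputes the partial t statistics from the estimates and standard errors reported in these notes for the sleep model that omits brain weight. The dictionary and function names are assumptions made for illustration. The inputs are rounded to two decimals, so the recomputed values can differ slightly from the tabled ones.

```python
# (estimate, standard error) pairs copied from the sleep model that omits
# brain weight; both are rounded, so recomputed t values are approximate.
estimates = {
    "intercept": (19.06, 3.07),
    "bodywt": (-1.25, 0.59),
    "lifespan": (2.19, 1.78),
    "gestime": (-5.39, 1.67),
}


def partial_t(estimate, std_error):
    """Partial t statistic for H0: the population coefficient equals zero."""
    return estimate / std_error


t_values = {name: partial_t(b, se) for name, (b, se) in estimates.items()}
for name, t in t_values.items():
    print(f"{name:<10} t = {t:6.2f}")
```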
Model comparison
• To test H0: β1 = 0
• Fit full model:
– y = β0 + β1x1 + β2x2 + β3x3 + …
• Fit reduced model:
– y = β0 + β2x2 + β3x3 + …
• Calculate SSextra:
– SSRegression(full) - SSRegression(reduced)
• F = MSextra / MSResidual(full)
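The model-comparison steps above can be sketched as a small function. The name extra_ss_f, its arguments, and the sums of squares in the usage lines are all hypothetical and chosen only to show the arithmetic; they are not results from the sleep data. The sketch assumes both models include an intercept, so the residual df of the full model is n - p - 1.

```python
def extra_ss_f(ss_reg_full, ss_reg_reduced, ss_resid_full, n, p_full, p_reduced):
    """F statistic comparing a full model with a nested reduced model.

    Returns (F, df_extra, df_residual).
    """
    df_extra = p_full - p_reduced          # number of predictors dropped
    df_resid = n - p_full - 1              # residual df of the full model
    ms_extra = (ss_reg_full - ss_reg_reduced) / df_extra
    ms_resid = ss_resid_full / df_resid
    return ms_extra / ms_resid, df_extra, df_resid


# Hypothetical sums of squares (illustration only): 4 predictors in the
# full model, 1 dropped in the reduced model, n = 51 observations.
f_stat, df1, df2 = extra_ss_f(
    ss_reg_full=120.0, ss_reg_reduced=100.0, ss_resid_full=94.0,
    n=51, p_full=4, p_reduced=3,
)
print(f"F = {f_stat:.2f} on {df1} and {df2} df")
```

When a single predictor is dropped, this F equals the square of that predictor's partial t statistic, so the two tests give the same P value.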
Overall regression model
• H0: β1 = β2 = ... = 0 (all population slopes equal zero).
• Test of whether overall regression equation is significant.
• Use ANOVA F-test:
– Variation explained by regression
– Unexplained (residual) variation
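The overall F-ratio can also be written in terms of R2, since F = (R2 / p) / ((1 - R2) / (n - p - 1)) for a model with an intercept. The sketch below applies this to the two-predictor example (n = 20) reported in the collinearity section of these notes. The function name overall_f is an assumption. R2 is rounded to three decimals in the notes, so the recomputed F only approximately matches the reported values.

```python
def overall_f(r2, n, p):
    """Overall regression F-ratio from R2, n observations and p predictors."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# R2 values reported for the two-predictor example with n = 20.
f_uncorrelated = overall_f(0.787, n=20, p=2)   # notes report F = 31.38
f_collinear = overall_f(0.780, n=20, p=2)      # notes report F = 30.05
print(round(f_uncorrelated, 1), round(f_collinear, 1))
```

Both values are close to each other, which illustrates that collinearity affects the individual coefficient tests much more than the overall test.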
Regression diagnostics
• Residual is still observed y - predicted y
– Studentised residuals still work
• Other diagnostics still apply:
– residual plots
– Cook’s D statistics
Assumptions
• Normality and homogeneity of variance for response variable
• Independence of observations
• Linearity
• No collinearity
Collinearity
• Collinearity:
– predictors correlated
• Assumption of no collinearity:
– predictor variables are uncorrelated with (i.e. independent of) each other
• Collinearity makes estimates of βi’s and their significance tests unreliable:
– low power for individual tests on βi’s
Collinearity
Response (y) and 2 predictors (x1 and x2); n = 20
1. x1 and x2 uncorrelated (r = -0.24)
            coeff    se      tol     t       P
intercept   -0.17    1.03            -0.16   0.873
x1           1.13    0.14    0.95     7.86   <0.001
x2           0.12    0.14    0.95     0.86   0.404
R2 = 0.787, F = 31.38, P < 0.001
Collinearity
2. rearrange x2 so x1 and x2 highly correlated (r = 0.99)
            coeff    se      tol     t       P
intercept    0.49    0.72             0.69   0.503
x1           1.55    1.21    0.01     1.28   0.219
x2          -0.45    1.21    0.01    -0.37   0.714
R2 = 0.780, F = 30.05, P < 0.001
Checks for collinearity
• Correlation matrix between predictors
• Tolerance for each predictor:
– 1-R2 for regression of that predictor on all others
– if tolerance is low (<0.1) then collinearity is a problem
• Variance inflation factor (VIF) for each predictor:
– 1/tolerance
– if VIF>10 then collinearity is a problem
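The tolerance and VIF rules of thumb above can be sketched as follows, using the tolerances reported in these notes for the four-predictor sleep model. The names vif and flag_collinear, and the default threshold argument, are assumptions made for illustration.

```python
# Tolerances copied from the four-predictor sleep model in these notes.
tolerances = {"bodywt": 0.08, "brainwt": 0.05, "lifespan": 0.33, "gestime": 0.36}


def vif(tolerance):
    """Variance inflation factor: the reciprocal of tolerance."""
    return 1.0 / tolerance


def flag_collinear(tolerances, threshold=0.1):
    """Names of predictors whose tolerance falls below the threshold."""
    return [name for name, tol in tolerances.items() if tol < threshold]


for name, tol in tolerances.items():
    print(f"{name:<9} tolerance = {tol:.2f}  VIF = {vif(tol):.1f}")
print("possible collinearity:", flag_collinear(tolerances))
```

A tolerance below 0.1 and a VIF above 10 are the same condition, so either check flags the same predictors.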
Explained variance
• R2 = SSRegression / SSTotal
• proportion of variation in y explained by linear relationship with x1, x2 etc.
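A minimal sketch of the R2 calculation from observed and fitted values. The function name r_squared and the five data points are made up for illustration; the fitted values are the least-squares predictions from a straight-line fit of these y values on x = 1, ..., 5, worked out by hand.

```python
def r_squared(observed, fitted):
    """R2 = SSRegression / SSTotal for fitted values from a least-squares model."""
    mean_y = sum(observed) / len(observed)
    ss_regression = sum((f - mean_y) ** 2 for f in fitted)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return ss_regression / ss_total


# Made-up example (illustration only): fitted = 2.2 + 0.6 * x for x = 1..5.
observed = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(observed, fitted)
print(round(r2, 3))
```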
Example
Species            Sleep   Bodywt     Brainwt   Lifespan   Gestime
African elephant   3.3     6654.000   5712.0    38.6       645
Arctic fox         12.5    3.385      44.5      14.0       60
etc.
Boxplots of variables
Predictors log transformed
Collinearity problem for body weight and brain weight
• low tolerance
• highly correlated
Parameter   Estimate   SE      Tol     t       P
Intercept   18.94      3.11            6.09    <0.001
Bodywt      -0.76      1.31    0.08    -0.58   0.565
Brainwt     -0.84      2.03    0.05    -0.42   0.680
Lifespan     2.60      2.05    0.33     1.27   0.211
Gestime     -5.11      1.81    0.36    -2.82   0.007
R2 = 0.486
Omit brain weight because body weight and brain weight are so highly correlated.
No collinearity between any predictors:
• all tolerances OK
• reduced SE and larger slope for body weight
Parameter   Estimate   SE      Tol     t       P
Intercept   19.06      3.07            6.21    <0.001
Bodywt      -1.25      0.59    0.36    -2.09   0.042
Lifespan     2.19      1.78    0.43     1.23   0.225
Gestime     -5.39      1.67    0.42    -3.23   0.002
R2 = 0.484
Examples from literature
Lampert (1993)
• Ecology 74:1455-1466
• Response variable:
– Daphnia (water flea) clutch size
• Predictors:
– body size (mm)
– particulate organic carbon (mg/L)
– temperature (°C)
Lampert (1993)
Parameter   Coeff.   SE      t       P
Intercept   -42.34   27.52   -1.54   0.168
Body size    14.76    7.10    2.08   0.076
POC           0.27    0.43    0.61   0.559
Temp          0.73    0.68    1.07   0.321
ANOVA P = 0.052, R2 = 0.684, n = 11
Williams et al. (1993)
• Ecology 74:904-918
• Response variable:
– Zostera (seagrass) growth
• Predictors:
– epiphyte biomass
– porewater ammonium
Williams et al. (1993)
Parameter            Coeff.   P
Epiphyte biomass     0.340    >0.05
Porewater ammonium   0.919    <0.05
R2 = 0.71
Tolerance = 0.839 (so no collinearity)