Posted on 14-Dec-2015
Tutorial 4
MBP 1010
Kevin Brown
Correlation Review
• Pearson’s correlation coefficient
– Varies between −1 (perfect negative linear correlation) and 1 (perfect positive linear correlation); 0 indicates no linear association.
– Location and scale independent (unchanged by shifting or positively rescaling either variable)
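A minimal sketch of what location and scale independence means in practice. The data are simulated for illustration (they are not the tutorial's dataset) and all variable names are hypothetical.

```r
# Simulated data (hypothetical; not the tutorial's dataset)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

r_original <- cor(x, y)            # Pearson is the default method of cor()
r_shifted  <- cor(x + 100, y - 7)  # change of location: r is unchanged
r_scaled   <- cor(3 * x, 0.5 * y)  # change of scale (positive factors): r is unchanged
r_flipped  <- cor(-x, y)           # a negative factor only flips the sign of r

print(c(original = r_original, shifted = r_shifted,
        scaled = r_scaled, flipped = r_flipped))
```

This is why the coefficient is the same whether a variable is recorded in centimetres or metres, for example.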
Linear Regression
Requires you to define?
• Y – dependent variable (response)
• X – independent variable(s) (predictors)
Allows you to answer what questions?
• Is there an association? (Same question as the Pearson correlation coefficient.)
• What is the association? Measured as the slope.
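The two questions above can be illustrated with a short sketch on simulated data (hypothetical values and names, not the tutorial's dataset). With a single predictor, the test of the slope and the Pearson correlation test give the same p-value; the slope adds the size of the effect in the units of Y per unit of X.

```r
# Simulated data (hypothetical; not the tutorial's dataset)
set.seed(2)
x <- rnorm(40, mean = 170, sd = 10)
y <- 0.5 * x + rnorm(40, sd = 8)

fit <- lm(y ~ x)

# "What is the association?" -> the estimated slope
slope <- unname(coef(fit)["x"])

# "Is there an association?" -> p-value of the slope vs. Pearson test
p_slope <- coef(summary(fit))["x", "Pr(>|t|)"]
p_cor   <- cor.test(x, y)$p.value

# The slope is the correlation rescaled by the two standard deviations
slope_from_r <- cor(x, y) * sd(y) / sd(x)

print(c(slope = slope, slope_from_r = slope_from_r))
print(c(p_slope = p_slope, p_cor = p_cor))
```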
Assumes
• Linearity
• Constant residual variance (homoscedasticity)
• Normally distributed residuals
• Errors are independent (i.e. not clustered)
Homogeneity of variance
Outputs “estimates”
• intercept
• slope
• standard errors
• t values
• p-values
• residual standard error (computed from the SSE – what is this?)
• R²
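As a sketch of how SSE relates to the outputs listed above: SSE is the sum of squared errors (residuals), the residual standard error is the square root of SSE divided by the residual degrees of freedom, and R² is one minus SSE over the total sum of squares. The data below are simulated and the names are hypothetical.

```r
# Simulated data (hypothetical; not the tutorial's dataset)
set.seed(3)
x <- runif(30, 0, 10)
y <- 1 + 2 * x + rnorm(30, sd = 1.5)

fit <- lm(y ~ x)
s   <- summary(fit)

sse <- sum(residuals(fit)^2)    # sum of squared errors (residuals)
df  <- df.residual(fit)         # n - 2: one intercept and one slope estimated
rse <- sqrt(sse / df)           # residual standard error
sst <- sum((y - mean(y))^2)     # total sum of squares around the mean of y
r2  <- 1 - sse / sst            # proportion of variance explained

# Compare the hand-computed values with those reported by summary()
print(c(rse = rse, reported = s$sigma))
print(c(r2 = r2, reported = s$r.squared))
```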
Linear regression example: height vs. weight
Extract information:
> summary(lm(HW[,2] ~ HW[,1]))

Call:
lm(formula = HW[, 2] ~ HW[, 1])

Residuals:
    Min      1Q  Median      3Q     Max
-36.490 -10.297   3.426   9.156  37.385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.860     18.304  -0.156    0.876
HW[, 1]       42.090      9.449   4.454 5.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.12 on 48 degrees of freedom
Multiple R-squared:  0.2925,  Adjusted R-squared:  0.2777
F-statistic: 19.84 on 1 and 48 DF,  p-value: 5.022e-05
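One possible way to extract these quantities programmatically rather than reading them off the printout. The `HW` data are not available here, so this sketch uses a simulated stand-in called `HW_sim` (a hypothetical name with made-up values). It also shows two internal consistency checks for a one-predictor model: the F-statistic equals the squared t value of the slope, and its denominator degrees of freedom equal the residual degrees of freedom.

```r
# Simulated stand-in for the HW data (hypothetical values)
set.seed(4)
HW_sim <- data.frame(x = rnorm(50, mean = 1.7, sd = 0.1))
HW_sim$y <- 40 * HW_sim$x + rnorm(50, sd = 15)

s <- summary(lm(y ~ x, data = HW_sim))

coefs    <- coef(s)                        # Estimate, Std. Error, t value, Pr(>|t|)
slope_t  <- coefs["x", "t value"]
f_stat   <- unname(s$fstatistic["value"])
f_dendf  <- unname(s$fstatistic["dendf"])
df_resid <- s$df[2]                        # residual degrees of freedom (n - 2)

# Adjusted R-squared recomputed from R-squared, n and the residual df
adj_r2 <- 1 - (1 - s$r.squared) * (nrow(HW_sim) - 1) / df_resid

print(coefs)
print(c(t_squared = slope_t^2, F = f_stat))
print(c(adj_r2 = adj_r2, reported = s$adj.r.squared))
```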
Example
• Televisions, Physicians and Life Expectancy (World Almanac Factbook 1993) example
– Residuals & outliers
– High leverage points & influential observations
– Dummy variable coding
– Transformations
• Take home messages
– Regression is a very flexible tool
– Correlation ≠ causation
Dummy coding
• Creates an alternate variable that’s used for analysis
• For 2 categories you set values of …
– reference level to 0
– level of interest to 1
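A minimal sketch of 0/1 dummy coding for a two-category variable. The group labels and response values are made up for illustration. It builds the dummy by hand, shows that R's default treatment coding in `model.matrix()` produces the same column, and fits a regression in which the intercept is the reference-group mean and the slope is the difference between group means.

```r
# Hypothetical two-category variable; the first factor level is the reference
group <- factor(c("control", "treated", "control", "treated", "treated"),
                levels = c("control", "treated"))
y <- c(10, 14, 11, 15, 16)   # made-up response values

# Hand-made dummy: reference level -> 0, level of interest -> 1
dummy <- as.integer(group == "treated")

# R's default (treatment) coding creates the same 0/1 column
X <- model.matrix(~ group)
print(cbind(dummy, X))

# Intercept = mean of the reference group; slope = difference in group means
fit <- lm(y ~ dummy)
print(coef(fit))
```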
Residuals and Outliers
High Leverage Points and Influential Observations