Tutorial 4 MBP 1010 Kevin Brown. Correlation Review Pearson’s correlation coefficient – Varies...

Tutorial 4

MBP 1010 Kevin Brown

Correlation Review

• Pearson’s correlation coefficient

– Varies between -1 (perfect negative linear correlation) and 1 (perfect positive linear correlation). 0 indicates no linear association.

– Location and scale independent
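The two properties above can be checked numerically. The following is a minimal sketch on simulated data (the variables `x` and `y` are made up for illustration and are not from the tutorial): shifting or positively rescaling either variable should leave the coefficient unchanged, while multiplying one variable by a negative constant should only flip its sign.

```r
# Simulated data (hypothetical; not from the tutorial).
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

r <- cor(x, y)  # Pearson is the default method of cor()

# Location and scale independence: shift and positively rescale both variables.
r_shifted <- cor(10 + 3 * x, -5 + 0.5 * y)

# A negative scale factor on one variable flips the sign only.
r_flipped <- cor(-x, y)

print(c(r = r, r_shifted = r_shifted, r_flipped = r_flipped))
```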

Linear Regression

Requires you to define?

• Y – dependent variable (response)

• X – independent variable(s) (predictors)

Allows you to answer what questions?

• Is there an association? (Same question as the Pearson correlation coefficient.)

• What is the association? Measured as the slope.
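Both questions can be illustrated with a short sketch. The data below are simulated and the names `x` and `y` are assumptions for illustration only. In simple linear regression (one predictor), the slope equals r · sd(y) / sd(x), and the t test on the slope gives the same p-value as the test of the Pearson correlation.

```r
# Simulated data (hypothetical; not the tutorial's HW data set).
set.seed(2)
x <- rnorm(40, mean = 1.7, sd = 0.1)  # predictor (X)
y <- 40 * x + rnorm(40, sd = 10)      # response (Y)

fit <- lm(y ~ x)                      # formula is response ~ predictor
slope <- unname(coef(fit)["x"])

# "What is the association?": the slope, which equals r * sd(y) / sd(x).
slope_from_r <- cor(x, y) * sd(y) / sd(x)

# "Is there an association?": the slope's t test and the correlation test
# agree when there is a single predictor.
p_slope <- coef(summary(fit))["x", "Pr(>|t|)"]
p_cor   <- cor.test(x, y)$p.value

print(c(slope = slope, slope_from_r = slope_from_r))
print(c(p_slope = p_slope, p_cor = p_cor))
```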

Assumes

• Linearity

• Constant residual variance (homoscedasticity) / residuals normal

• Errors are independent (i.e. not clustered)
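A common way to inspect the first two assumptions is to plot the residuals. The sketch below uses simulated data that satisfy the assumptions by construction (all names are hypothetical). A residuals-vs-fitted plot with no curve and no funnel shape is consistent with linearity and constant variance, and a roughly straight normal Q-Q plot is consistent with normal residuals. Independence usually has to be judged from how the data were collected rather than from a plot.

```r
# Simulated data with linear trend and constant-variance normal errors.
set.seed(3)
x <- runif(60, 0, 10)
y <- 1 + 2 * x + rnorm(60, sd = 1.5)
fit <- lm(y ~ x)

res <- resid(fit)   # residuals: observed minus fitted
fv  <- fitted(fit)  # fitted values

# Draw the diagnostic plots only in an interactive session.
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  qqnorm(res)
  qqline(res)
  par(op)
}
```

Calling `plot(fit)` on the fitted model produces a similar set of diagnostic plots.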

Homogeneity of variance

Outputs “estimates”

• intercept

• slope

• standard errors

• t values

• p-values

• residual standard error (SSE – what is this?)

• R²
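As a hedged pointer on the SSE question: SSE is usually taken to mean the sum of squared residuals, and the residual standard error reported by R is the square root of SSE divided by the residual degrees of freedom. The sketch below, on simulated data with hypothetical names, pulls each listed output from a fitted model and recomputes the residual standard error and R² from SSE.

```r
# Simulated data (hypothetical; for illustration only).
set.seed(4)
x <- rnorm(30)
y <- 3 + 1.5 * x + rnorm(30)
fit <- lm(y ~ x)
s <- summary(fit)

# Intercept, slope, standard errors, t values and p-values in one table.
coef_table <- coef(s)

# Sum of squared errors (residuals) and the quantities derived from it.
sse <- sum(resid(fit)^2)
rse <- sqrt(sse / df.residual(fit))    # residual standard error
r2  <- 1 - sse / sum((y - mean(y))^2)  # R-squared

print(coef_table)
print(c(rse = rse, sigma = s$sigma, r2 = r2, r.squared = s$r.squared))
```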

Linear regression example: height vs. weight

Extract information:

> summary(lm(HW[,2] ~ HW[,1]))

Call:
lm(formula = HW[, 2] ~ HW[, 1])

Residuals:
    Min      1Q  Median      3Q     Max
-36.490 -10.297   3.426   9.156  37.385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.860     18.304  -0.156    0.876
HW[, 1]       42.090      9.449   4.454 5.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.12 on 48 degrees of freedom
Multiple R-squared:  0.2925, Adjusted R-squared:  0.2777
F-statistic: 19.84 on 1 and 48 DF,  p-value: 5.022e-05
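Several numbers in this output can be recomputed from the others, which is a useful way to read it. The sketch below copies the estimates, standard errors, R² and the 48 residual degrees of freedom from the output above; the variable names are chosen here for illustration. Because the inputs are rounded, the results agree with the printed values only up to rounding.

```r
# Values copied from the summary output above.
est <- c(intercept = -2.860, slope = 42.090)
se  <- c(intercept = 18.304, slope = 9.449)
df  <- 48      # residual degrees of freedom
r2  <- 0.2925  # multiple R-squared

t_val <- est / se                  # t value = estimate / standard error
p_val <- 2 * pt(-abs(t_val), df)   # two-sided p-values from the t distribution
f_val <- unname(t_val["slope"])^2  # with one predictor, F = t^2

n <- df + 2                        # two coefficients were estimated
adj_r2 <- 1 - (1 - r2) * (n - 1) / df

print(round(t_val, 3))
print(signif(p_val, 3))
print(round(c(F = f_val, adj_r2 = adj_r2), 4))
```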

Example

• Televisions, Physicians and Life Expectancy (World Almanac Factbook 1993) example

– Residuals & Outliers

– High leverage points & influential observations

– Dummy variable coding

– Transformations

• Take home messages

– Regression is a very flexible tool

– Correlation ≠ causation

Dummy coding

• Creates an alternate variable that’s used for analysis

• For 2 categories you set values of …

– reference level to 0

– level of interest to 1
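The 0/1 coding can be sketched as follows. The group labels and values are made up for illustration. With this coding the intercept is the mean of the reference group and the slope is the difference in means between the level of interest and the reference. R applies the same coding automatically when a factor is used in a formula, taking the first factor level as the reference.

```r
# Hypothetical two-group data.
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"),
                levels = c("control", "treated"))  # first level = reference
y <- c(10, 12, 11, 15, 17, 16)

# Manual dummy coding: reference level -> 0, level of interest -> 1.
dummy <- as.integer(group == "treated")

fit <- lm(y ~ dummy)
b <- unname(coef(fit))  # intercept = control mean, slope = treated - control

# Using the factor directly yields the same model.
fit_factor <- lm(y ~ group)

print(b)
print(coef(fit_factor))
```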

Residuals and Outliers

High Leverage Points and Influential Observations
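The slide text does not say how these quantities were computed in the tutorial, so the following is only a sketch of one common approach in base R, on simulated data with a single unusual point added by hand. Leverage measures how far an observation's x value is from the others, Cook's distance measures how much the fit changes when the observation is dropped, and standardized residuals help flag outliers in y.

```r
# Simulated data plus one hand-made unusual observation (index 21).
set.seed(5)
x <- c(rnorm(20), 8)                # last x value is far from the rest
y <- c(2 * x[1:20] + rnorm(20), 0)  # and its y value is off the trend
fit <- lm(y ~ x)

lev  <- hatvalues(fit)       # leverage of each observation
cook <- cooks.distance(fit)  # influence of each observation
rstd <- rstandard(fit)       # standardized residuals

print(round(cbind(leverage = lev, cooks_d = cook, std_resid = rstd), 3))
print(c(max_leverage = unname(which.max(lev)),
        max_cooks_d  = unname(which.max(cook))))
```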