Post on 19-Jan-2018
[Diagram: regression model (linear regression). Outcome / dependent variable (Y-axis): Child's Height. Predictor variables (X-axis): Parents' Height, Gender. Variable types shown: Continuous, Continuous, Categorical. Plot types shown: Histogram, Scatter, Boxplot.]
Describing a Straight Line
• b_i
  • Regression coefficient for the predictor
  • Gradient (slope) of the regression line
  • Direction/strength of relationship
• b_0
  • Intercept (value of Y when X = 0)
  • Point at which the regression line crosses the Y-axis (ordinate)

$$Y_i = b_0 + b_i X_i + \varepsilon_i$$
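As a minimal sketch of how the intercept and slope describe the line, the R snippet below evaluates the deterministic part of the equation (the error term is left out). The values of `intercept` and `slope` and the helper `predict_y()` are hypothetical and chosen only for illustration.

```r
# Hypothetical coefficients, for illustration only
intercept <- 40    # b_0: predicted Y when X = 0
slope     <- 0.4   # regression coefficient: change in predicted Y per unit of X

# Predicted value on the regression line (error term omitted)
predict_y <- function(x) intercept + slope * x

predict_y(0)                   # equals the intercept: where the line crosses the Y-axis
predict_y(70) - predict_y(69)  # equals the slope: the gradient of the line
```

A positive slope means Y rises with X, a negative slope means it falls, and a slope of zero means a flat line.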
Sum of Squares
• SST
  • Total variability (variability between the scores and the mean).
• SSR
  • Residual/error variability (variability between the regression model and the actual data).
• SSM
  • Model variability (difference in variability between the model and the mean).
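The three sums of squares can be computed directly from a fitted model. The sketch below uses simulated data (not the data from these slides), and the variable names `sst`, `ssr`, and `ssm` are illustrative.

```r
set.seed(1)  # simulated data, for illustration only
x <- rnorm(50, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(50, sd = 3)

fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)            # scores vs. the mean
ssr <- sum((y - fitted(fit))^2)        # scores vs. the model (residuals)
ssm <- sum((fitted(fit) - mean(y))^2)  # model vs. the mean

all.equal(sst, ssm + ssr)  # the total splits into model and residual parts
```

For a least-squares fit with an intercept, SST equals SSM plus SSR, so any two of the three determine the third.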
Testing the Model: ANOVA
• If the model results in better prediction than using the mean, then we expect SSM to be much greater than SSR.
[Diagram: SST (total variance in the data) is divided into SSM (improvement due to the model) and SSR (error in the model).]
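One common way to make "much greater" concrete is the F-ratio, which divides each sum of squares by its degrees of freedom before comparing them. The slides do not name the F-ratio explicitly, so treat this as a sketch under that assumption. The data are simulated and the names `f_ratio` and `f_anova` are illustrative.

```r
set.seed(2)  # simulated data, for illustration only
x <- rnorm(50, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(50, sd = 3)

fit <- lm(y ~ x)

ssm <- sum((fitted(fit) - mean(y))^2)  # improvement due to the model
ssr <- sum(resid(fit)^2)               # error in the model

df_m <- 1                 # one predictor
df_r <- df.residual(fit)  # n - 2 for a single-predictor model

# Mean square for the model divided by mean square for the residuals
f_ratio <- (ssm / df_m) / (ssr / df_r)

# The same value as reported in R's ANOVA table for the model
f_anova <- anova(fit)[["F value"]][1]
all.equal(f_ratio, f_anova)
```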
Linear Model - Regression
• lm() function: "lm" stands for "linear model".
model <- lm(outcome ~ predictor(s), data = dataFrame, na.action = an action)
model.1 <- lm(childHeight ~ father, data = heights)
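A runnable sketch of the call above follows. The slides do not show how `heights` is loaded, so the data frame here is a simulated stand-in with the same column names, and `na.omit` is just one possible choice for `na.action`.

```r
set.seed(3)
# Hypothetical stand-in for the `heights` data frame used in the slides
heights <- data.frame(father = rnorm(100, mean = 69, sd = 2.5))
heights$childHeight <- 40 + 0.4 * heights$father + rnorm(100, sd = 3.5)
heights$childHeight[5] <- NA  # one missing outcome, to show na.action at work

# na.omit drops rows with missing values before fitting
model.1 <- lm(childHeight ~ father, data = heights, na.action = na.omit)

coef(model.1)     # named vector: (Intercept) and father
nobs(model.1)     # 99: the row with the missing value was dropped
summary(model.1)  # coefficients, R-squared, and the F test
```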
Testing the Model: R^2
• R^2
  • The proportion of variance accounted for by the regression model.
  • The Pearson correlation coefficient squared.

$$R^2 = \frac{SS_M}{SS_T}$$
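Both readings of R^2 can be checked numerically for a single-predictor model. The sketch below uses simulated data, and the name `r2` is illustrative.

```r
set.seed(4)  # simulated data, for illustration only
x <- rnorm(50, mean = 69, sd = 2.5)
y <- 40 + 0.4 * x + rnorm(50, sd = 3)

fit <- lm(y ~ x)

sst <- sum((y - mean(y))^2)            # total variability
ssm <- sum((fitted(fit) - mean(y))^2)  # model variability

r2 <- ssm / sst  # proportion of variance accounted for by the model

all.equal(r2, summary(fit)$r.squared)  # matches the value lm() reports
all.equal(r2, cor(x, y)^2)             # matches the squared Pearson correlation
```

The match with the squared Pearson correlation holds for a model with one predictor. With several predictors, R^2 is the squared correlation between the observed and fitted values instead.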
Compare Models
Model             1       2       12      3       4
Intercept         40.1    46.6    22.6    22.63   22.64
Father            0.385   -       0.36    -       0.01
Mom               -       0.314   0.29    -       NA
midparentHeight   -       -       -       0.637   0.538
R-squared         0.070   0.0395  0.105   0.102   0.1033
r                 0.27    0.2     -       0.32    -
R^2 (= r^2)       0.073   0.04    -       0.102   -
Linear Regression Comparison
Model             1       2       12      3       4       5       6       7
Intercept         40.1    46.6    22.6    22.6    22.6    64.1    16.5    16.5
Father            0.385   -       0.36    -       x       -       -       0.39
Mom               -       0.314   0.29    -       x       -       -       0.31
midparentHeight   -       -       -       0.637   0.538   -       0.687   -
Gender            -       -       -       -       -       5.13    5.21    5.21
R-squared         0.070   0.0395  0.105   0.102   0.1033  0.5137  0.632   0.634
r                 0.27    0.2     -       0.32    -       0.717   -       -
R^2 (= r^2)       0.073   0.04    -       0.102   -       0.5137  -       -
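A comparison of this kind can be assembled by fitting each model and collecting its R-squared. The sketch below uses simulated data, so its numbers will not match the table. The data frame `d`, the model names, and the formula used for `midparentHeight` are assumptions made for illustration.

```r
set.seed(5)
n <- 200
# Simulated stand-in data, for illustration only
d <- data.frame(
  father = rnorm(n, mean = 69, sd = 2.5),
  mother = rnorm(n, mean = 64, sd = 2.3),
  gender = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Assumed definition: an exact linear combination of the two parents
d$midparentHeight <- (d$father + 1.08 * d$mother) / 2
d$childHeight <- 16.5 + 0.39 * d$father + 0.31 * d$mother +
  5.2 * (d$gender == "male") + rnorm(n, sd = 2.2)

models <- list(
  father         = lm(childHeight ~ father, data = d),
  midparent      = lm(childHeight ~ midparentHeight, data = d),
  both_parents   = lm(childHeight ~ father + mother, data = d),
  all_three      = lm(childHeight ~ father + midparentHeight + mother, data = d),
  parents_gender = lm(childHeight ~ father + mother + gender, data = d)
)

# One R-squared per model, in the order listed above
r2 <- sapply(models, function(m) summary(m)$r.squared)
round(r2, 3)

# mother is NA here: given father and midparentHeight it adds no new information
coef(models$all_three)
```

Because `midparentHeight` is built from `father` and `mother`, a model containing all three cannot estimate a separate coefficient for the last one entered, which is consistent with the NA shown for Mom in Model 4.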