STA291 Statistical Methods Lecture 11
giles-davis -
LINEar Association
o r measures “closeness” of data to the “best” line. What line is that? And best in terms of what?
o In terms of least squared error:
“Best” line: least-squares, or regression line
Observed point: $(x_i, y_i)$
Predicted value for given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute)
“Best” line minimizes $\sum (y_i - \hat{y}_i)^2$, the sum of the squared errors.
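To make "minimizes the sum of the squared errors" concrete, here is a minimal sketch that borrows the study-time data and the rounded prediction equation that appear later in this lecture. The two competing lines (a steeper slope, a higher intercept) are hypothetical, chosen only for comparison, and the function name is an assumption.

```python
# Study-time data from the lecture's example: (hours studied, exam score).
data = [(1, 45), (5, 80), (12, 100)]

def sum_squared_errors(b0, b1, points):
    """Sum of squared errors for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)

# Rounded least-squares line quoted in the lecture: y-hat = 46.45 + 4.758x.
sse_lecture = sum_squared_errors(46.45, 4.758, data)

# Hypothetical competitors, not from the lecture.
sse_steeper = sum_squared_errors(46.45, 4.758 + 0.5, data)
sse_higher = sum_squared_errors(46.45 + 2.0, 4.758, data)

print(f"lecture line:     {sse_lecture:.2f}")
print(f"steeper slope:    {sse_steeper:.2f}")
print(f"higher intercept: {sse_higher:.2f}")
```

Both competitors give a larger sum of squared errors than the lecture's line, which is what "best" means here.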
Interpretation of b0 and b1
$\hat{y}_i = b_0 + b_1 x_i$
o b0, Intercept: predicted value of y when x = 0.
o b1, Slope: predicted change in y when x increases by 1.
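Both interpretations follow directly from the prediction equation. A short worked check, writing $\hat{y}(x)$ for the predicted value at $x$ (this function notation is an assumption, not the lecture's):

```latex
\begin{align*}
\hat{y}(0) &= b_0 + b_1 \cdot 0 = b_0 \\
\hat{y}(x + 1) - \hat{y}(x) &= \bigl[b_0 + b_1 (x + 1)\bigr] - \bigl[b_0 + b_1 x\bigr] = b_1
\end{align*}
```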
Least Squares, or Regression Line, Example
STA291 study time example: (Hours studied, Score on First Exam)
o Data: (1, 45), (5, 80), (12, 100)
o In summary: $\bar{x} = 6$, $\bar{y} = 75$, $s_x = 5.5677$, $s_y = 27.8388$, and $r = 0.952$
o $b_1 = r \, s_y / s_x \approx 4.758$
o $b_0 = \bar{y} - b_1 \bar{x} \approx 46.45$
Interpretation?
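The summary statistics and coefficients for this example can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are assumptions, and the slope formula used is the standard identity b1 = r * s_y / s_x.

```python
from statistics import mean, stdev

hours = [1, 5, 12]      # x: hours studied
scores = [45, 80, 100]  # y: score on first exam

x_bar, y_bar = mean(hours), mean(scores)
s_x, s_y = stdev(hours), stdev(scores)  # sample standard deviations (n - 1)

# Sample correlation coefficient r.
n = len(hours)
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / ((n - 1) * s_x * s_y)

# Least-squares slope and intercept from the summary statistics.
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"x_bar={x_bar}, y_bar={y_bar}, s_x={s_x:.4f}, s_y={s_y:.4f}, r={r:.3f}")
print(f"b1={b1:.3f}, b0={b0:.2f}")
```

As for the interpretation: each extra hour of study is associated with a predicted increase of about 4.76 points, and the predicted score with zero hours of study is about 46.45.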
Properties of the Least Squares Line
o b1, slope, always has the same sign as r, the correlation coefficient, but they measure different things!
o The sum of the errors (or residuals), $\sum (y_i - \hat{y}_i)$, is always 0 (zero).
o The line always passes through the point $(\bar{x}, \bar{y})$.
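These three properties can be checked numerically on the study-time data. The sketch below fits the line with the standard closed form b1 = S_xy / S_xx (equivalent to the least-squares solution); the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit; returns (b0, b1, r)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    r = s_xy / (s_xx * s_yy) ** 0.5
    return b0, b1, r

hours = [1, 5, 12]
scores = [45, 80, 100]
b0, b1, r = fit_line(hours, scores)

# Property 1: slope and correlation share a sign.
same_sign = (b1 > 0) == (r > 0)

# Property 2: residuals sum to zero (up to floating-point error).
residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(hours, scores))

# Property 3: the line passes through (x_bar, y_bar).
x_bar = sum(hours) / len(hours)
y_bar = sum(scores) / len(scores)
y_hat_at_x_bar = b0 + b1 * x_bar

print(same_sign, abs(residual_sum) < 1e-9, abs(y_hat_at_x_bar - y_bar) < 1e-9)
```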
About those residuals
o When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the difference between the observed value and the predicted value
o For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was: $\hat{y} = 46.45 + 4.758x$
o When we plug in x = 5, we get a predicted y of 70.24. Our residual, then, is $e = y - \hat{y} = 80 - 70.24 = 9.76$
Residuals
o Earlier, we pointed out that the sum of the residuals is always 0 (zero)
o Residuals are positive when the observed y is above the regression line; negative when it is below
o The smaller (in absolute value) the individual residual, the closer the predicted y was to the actual y.
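The residual calculation for (5, 80) extends to all three observations. This sketch uses the lecture's rounded prediction equation; the function and variable names are assumptions.

```python
def predict(x):
    """Lecture's rounded prediction equation: y-hat = 46.45 + 4.758x."""
    return 46.45 + 4.758 * x

data = [(1, 45), (5, 80), (12, 100)]

residuals = {}
for x, y in data:
    e = y - predict(x)  # residual = observed minus predicted
    residuals[(x, y)] = e
    position = "above" if e > 0 else "below"
    print(f"({x}, {y}): predicted {predict(x):.2f}, residual {e:+.2f}, {position} the line")

# The observation whose prediction was closest to the actual y.
closest = min(residuals, key=lambda point: abs(residuals[point]))
print("closest prediction:", closest)
```

Because the coefficients are rounded, these three residuals sum to a value very close to zero rather than exactly zero.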
R-squared???
o Gives the proportion of the variation of the y’s accounted for by the linear relationship with the x’s
o So, this means?
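One way to answer that question is to compute R-squared for the study-time data. The sketch below gets it two ways: by squaring r, and as 1 minus the ratio of the squared errors around the line to the total squared variation of y around its mean. The second form is a standard identity for a least-squares line with an intercept; the variable names are assumptions.

```python
hours = [1, 5, 12]
scores = [45, 80, 100]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
s_yy = sum((y - y_bar) ** 2 for y in scores)  # total variation of y

# Route 1: square the correlation coefficient.
r = s_xy / (s_xx * s_yy) ** 0.5
r_squared_from_r = r ** 2

# Route 2: 1 - (unexplained variation) / (total variation).
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hours, scores))
r_squared_from_errors = 1 - sse / s_yy

print(f"{r_squared_from_r:.3f} {r_squared_from_errors:.3f}")
```

Here that works out to roughly 0.906, so about 91% of the variation in exam scores is accounted for by the linear relationship with hours studied.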
Why “regression”?
o Sir Francis Galton (1880s): correlation between x=father’s height and y=son’s height is about 0.5
o Interpretation: If a father has height one standard deviation below average, then the predicted height of the son is 0.5 standard deviations below average
o More Interpretation: If a father has height two standard deviations above average, then the predicted height of the son is 0.5 x 2 = 1 standard deviation above average
o Tall parents tend to have tall children, but not so tall
o This is called “regression toward the mean”, which is the origin of the statistical term “regression”
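The arithmetic in Galton's example amounts to multiplying the father's standardized height by r. A minimal sketch, with a hypothetical function name and r = 0.5 taken from the slide:

```python
def predicted_son_z(father_z, r=0.5):
    """Predicted height of the son, in standard deviations from average."""
    return r * father_z

for father_z in (-1, 2):
    print(f"father at {father_z:+} SD -> son predicted at {predicted_son_z(father_z):+} SD")
```

Whenever r is strictly between -1 and 1, the prediction is closer to average than the father's own height, which is the "regression" in the name.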