STA291 Statistical Methods

Lecture 11

LINEar Association

o r measures the “closeness” of the data to the “best” line. What line is that? And best in terms of what?

o In terms of least squared error:

“Best” line: least-squares, or regression line

Observed point: $(x_i, y_i)$

Predicted value for given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$ (interpretation in a minute)

“Best” line minimizes $\sum_i (y_i - \hat{y}_i)^2$, the sum of the squared errors.
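To see what "minimizes the sum of the squared errors" means in practice, here is a minimal Python sketch. It uses the three study-time points from this lecture's example; the function names `sse` and `least_squares` and the particular nudges are illustrative choices, not part of the lecture. The slope is computed as the ratio of the summed cross-deviations to the summed squared x-deviations, which is algebraically the same as $r \, s_y / s_x$.

```python
def sse(b0, b1, points):
    """Sum of squared errors for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


def least_squares(points):
    """Return (b0, b1) for the least-squares line through the points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# (hours studied, exam score) pairs from the lecture's example
points = [(1, 45), (5, 80), (12, 100)]
b0, b1 = least_squares(points)
best_sse = sse(b0, b1, points)

# Arbitrary nudges to the intercept and slope (illustrative values only)
nudges = [(1, 0), (-1, 0), (0, 0.5), (0, -0.5), (2, -0.3)]
nudged_sse = [sse(b0 + d0, b1 + d1, points) for d0, d1 in nudges]

print(f"least-squares SSE: {best_sse:.3f}")
for (d0, d1), value in zip(nudges, nudged_sse):
    print(f"b0 {d0:+}, b1 {d1:+}: SSE = {value:.3f}")
```

Every nudged line should report a larger sum of squared errors than the least-squares line, which is what "best" means here.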

Interpretation of b0, b1

$\hat{y}_i = b_0 + b_1 x_i$

b0 (intercept): predicted value of y when x = 0.

b1 (slope): predicted change in y when x increases by 1.

Calculation of b0, b1

$\hat{y}_i = b_0 + b_1 x_i$

where $b_1 = r \, \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$

Least Squares, or Regression Line, Example

STA291 study time example: (Hours studied, Score on First Exam)

o Data: (1, 45), (5, 80), (12, 100)

o In summary: $\bar{x} = 6$, $\bar{y} = 75$, $s_x = 5.5677$, $s_y = 27.8388$, and $r = 0.952$

o b1 =

o b0 =

Interpretation?
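One way to check the summary numbers and fill in the two coefficients is to compute everything from the raw data. The sketch below assumes sample standard deviations (divisor n - 1), which is what `statistics.stdev` uses; the variable names are illustrative.

```python
from statistics import mean, stdev

hours = [1, 5, 12]      # x: hours studied
scores = [45, 80, 100]  # y: score on first exam

x_bar, y_bar = mean(hours), mean(scores)
s_x, s_y = stdev(hours), stdev(scores)  # sample SDs (n - 1 divisor)

# Sample correlation coefficient r
n = len(hours)
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores)) / (
    (n - 1) * s_x * s_y
)

b1 = r * s_y / s_x        # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"x_bar={x_bar}, y_bar={y_bar}, s_x={s_x:.4f}, s_y={s_y:.4f}, r={r:.3f}")
print(f"b1={b1:.3f}, b0={b0:.2f}")
```

Plugging the rounded r = 0.952 into the formula by hand gives a slope of about 4.76; carrying full precision gives 4.758, which matches the prediction equation used later in the lecture.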

Properties of the Least Squares Line

o b1, slope, always has the same sign as r, the correlation coefficient, but they measure different things!

o The sum of the errors (or residuals), $\sum_i (y_i - \hat{y}_i)$, is always 0 (zero).

o The line always passes through the point $(\bar{x}, \bar{y})$.
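These three properties can be checked numerically. The sketch below fits the line to the lecture's study-time data and to a made-up reversed version of it (the reversed scores are hypothetical, included only to produce a negative slope). The helper name `fit` is an illustrative choice.

```python
from statistics import mean, stdev


def fit(xs, ys):
    """Return (b0, b1, r) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (
        (n - 1) * s_x * s_y
    )
    b1 = r * s_y / s_x
    return y_bar - b1 * x_bar, b1, r


xs = [1, 5, 12]
datasets = {
    "lecture data": [45, 80, 100],
    "reversed (hypothetical)": [100, 80, 45],
}

checks = {}
for name, ys in datasets.items():
    b0, b1, r = fit(xs, ys)
    residual_sum = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))
    at_x_bar = b0 + b1 * mean(xs)  # height of the line at x = x_bar
    checks[name] = (b1, r, residual_sum, at_x_bar, mean(ys))
    print(f"{name}: b1={b1:.3f}, r={r:.3f}, "
          f"sum of residuals={residual_sum:.1e}, "
          f"line at x_bar={at_x_bar:.2f}, y_bar={mean(ys)}")
```

The sum of residuals may print as a tiny nonzero number because of floating-point rounding, not because the property fails.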

About those residuals

o When we use our prediction equation to “check” values we actually observed in our data set, we can find their residuals: the observed value minus the predicted value.

o For our STA291 study data earlier, one observation was (5, 80). Our prediction equation was: $\hat{y} = 46.45 + 4.758x$

o When we plug in x = 5, we get a predicted y of 70.24. Our residual, then, is $e = y - \hat{y} = 80 - 70.24 = 9.76$

Residuals

o Earlier, we pointed out that the sum of the residuals is always 0 (zero)

o Residuals are positive when the observed y is above the regression line; negative when it is below

o The smaller (in absolute value) the individual residual, the closer the predicted y was to the actual y.
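As a small illustration, the residuals for all three study-time observations can be tabulated with the rounded prediction equation from the lecture, 46.45 + 4.758x. The function name `predict` is an illustrative choice.

```python
def predict(x):
    """Rounded prediction equation from the lecture: y-hat = 46.45 + 4.758 x."""
    return 46.45 + 4.758 * x


data = [(1, 45), (5, 80), (12, 100)]

residuals = []
positions = []
for x, y in data:
    e = y - predict(x)  # residual = observed - predicted
    residuals.append(e)
    positions.append("above" if e > 0 else "below")
    print(f"x={x:>2}, y={y:>3}, y-hat={predict(x):7.3f}, "
          f"residual={e:+.3f} ({positions[-1]} the line)")

print(f"sum of residuals: {sum(residuals):+.3f}")
```

Because the coefficients are rounded to a few digits, the sum comes out very close to zero rather than exactly zero; with unrounded coefficients it is zero up to floating-point error.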

R-squared???

o Gives the proportion of the variation in the y’s accounted for by the linear relationship with the x’s

o So, this means?
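For the study-time example, this proportion can be computed two ways that agree for a least-squares line with one predictor: squaring r, or taking 1 minus the ratio of the residual variation to the total variation in y. The sketch below does both; the variable names are illustrative.

```python
xs = [1, 5, 12]
ys = [45, 80, 100]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Variation left over after fitting the line (sum of squared residuals)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r_squared_from_r = r ** 2
r_squared_from_variation = 1 - sse / syy

print(f"r^2 = {r_squared_from_r:.4f}")
print(f"1 - SSE/SST = {r_squared_from_variation:.4f}")
```

Both come out to about 0.906, so roughly 91% of the variation in exam scores in this tiny data set is accounted for by the linear relationship with hours studied.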

Why “regression”?

o Sir Francis Galton (1880s): correlation between x = father’s height and y = son’s height is about 0.5

o Interpretation: If a father has height one standard deviation below average, then the predicted height of the son is 0.5 standard deviations below average

o More Interpretation: If a father has height two standard deviations above average, then the predicted height of the son is 0.5 x 2 = 1 standard deviation above average

o Tall parents tend to have tall children, but not quite as tall as themselves

o This is called “regression toward the mean”, which is where the statistical term “regression” comes from
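The arithmetic in the two interpretations follows from the slope and intercept formulas: substituting $b_1 = r \, s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$ into the prediction equation gives $(\hat{y} - \bar{y}) / s_y = r \, (x - \bar{x}) / s_x$, so the predicted z-score of y is r times the z-score of x. A minimal sketch, with the hypothetical function name `predicted_son_z` and r = 0.5 taken from the slide:

```python
def predicted_son_z(father_z, r=0.5):
    """Predicted son's height in SDs from the mean, given the father's z-score."""
    return r * father_z


for father_z in (-1, 2):
    print(f"father at {father_z:+} SD -> son predicted at "
          f"{predicted_son_z(father_z):+.1f} SD")
```

Whenever r is between -1 and 1, the predicted z-score is closer to 0 than the original one, which is the "regression toward the mean" effect.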

Looking back

o Best-fit, or least-squares, or regression line

o Interpretation of the slope, intercept

o Residuals

o R-squared

o “Regression toward the mean”