Transcript of Chapters 8, 9, 10 Least Squares Regression Line Fitting a Line to Bivariate Data.
- Slide 1
- Slide 2
- Chapters 8, 9, 10: Least Squares Regression Line (Fitting a Line to Bivariate Data)
- Slide 3
- Suppose there is a relationship between two numerical variables. Data: (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ). Let x be the amount spent on advertising and y be the amount of sales for the product during a given period. You might want to predict product sales for a month (y) when the amount spent on advertising is $10,000 (x). The letter y is used to denote the variable you want to predict, called the response variable. The other variable, denoted by x, is the explanatory variable.
- Slide 4
- Simplest Relationship. The simplest equation describing the dependence of variable y on variable x is the linear equation y = b₀ + b₁x. The slope b₁ is the amount by which y changes when x increases by 1 unit. The y-intercept b₀ is where the line crosses the y-axis; that is, the value of y when x = 0.
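These two interpretations can be checked directly. A minimal Python sketch, using the example line y = 4 + 2.5x that appears later in this transcript:

```python
def line(x, b0=4.0, b1=2.5):
    """Evaluate the linear equation y = b0 + b1 * x."""
    return b0 + b1 * x

# Increasing x by one unit changes y by exactly the slope b1:
print(line(11) - line(10))   # 2.5
# At x = 0 the line returns the intercept b0:
print(line(0))               # 4.0
```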
- Slide 5
- Graph is a line: y = b₀ + b₁x, crossing the y-axis at b₀, with slope b₁ = rise/run.
- Slide 6
- How do you find an appropriate line for describing a bivariate data set? Two candidate lines: y = 10 + 2x and y = 4 + 2.5x. Let's look at only the blue line. To assess the fit of a line, we look at how the points deviate vertically from the line. What is the meaning of a negative deviation? The point (15, 44) has a deviation of +4 from the line y = 10 + 2x. To assess the fit of a line, we need a way to combine the n deviations into a single measure of fit.
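A vertical deviation is simply observed y minus the line's prediction at that x. A small sketch, using the line y = 10 + 2x from the slide (the second point is a hypothetical one added to show a negative deviation):

```python
def residual(x, y, b0, b1):
    """Vertical deviation of the point (x, y) from the line y = b0 + b1*x."""
    return y - (b0 + b1 * x)

# The point (15, 44) relative to the line y = 10 + 2x:
print(residual(15, 44, b0=10, b1=2))   # +4: the point lies above the line

# A negative deviation means the point lies below the line:
print(residual(15, 35, b0=10, b1=2))   # -5
```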
- Slide 7
- The deviations are referred to as residuals and denoted eᵢ.
- Slide 8
- Residuals: graphically
- Slide 9
- The Least Squares (Regression) Line. A good line is one that minimizes the sum of squared differences between the points and the line.
- Slide 10
- The Least Squares (Regression) Line. Let us compare two lines through the four points (1, 2), (2, 4), (3, 1.5), (4, 3.2). For the first line, the sum of squared differences = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89. The second line is the horizontal line y = 2.5, with sum of squared differences = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99. The smaller the sum of squared differences, the better the fit of the line to the data.
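A quick recomputation of the two sums of squared differences (the deviations shown imply the first line predicts 1, 2, 3, 4 at x = 1, 2, 3, 4; note the first sum works out to 7.89):

```python
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_squared_diff(points, predict):
    """Sum of squared vertical differences between points and a candidate line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

# First line, predicting y = x:
print(round(sum_squared_diff(points, lambda x: x), 2))      # 7.89
# Horizontal line y = 2.5:
print(round(sum_squared_diff(points, lambda x: 2.5), 2))    # 3.99
```

The horizontal line has the smaller sum, so it fits these four points better.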
- Slide 11
- Criterion for choosing what line to draw: the method of least squares. The method of least squares chooses the line that makes the sum of squares of the residuals as small as possible. This line has slope b₁ and intercept b₀ that minimize Σ(yᵢ − (b₀ + b₁xᵢ))².
- Slide 12
- Least Squares Line y = b₀ + b₁x: slope b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and intercept b₀ = ȳ − b₁x̄.
- Slide 13
- Scatterplot with least squares prediction line. (xᵢ, yᵢ): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)
- Slide 14
- Observed y, Predicted y. The predicted y when x = 2.7 is ŷ = b₀ + b₁x = b₀ + b₁ × 2.7.
- Slide 15
- Car Weight, Fuel Consumption Example, cont. (xᵢ, yᵢ): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)
- Slide 16
- Computation table (x̄ = 2.9, ȳ = 4.39):
  Wt (x)  Fuel (y)  x − x̄   (x − x̄)²   y − ȳ    (y − ȳ)²   (x − x̄)(y − ȳ)
  3.4     5.5        0.5     0.25        1.11     1.2321     0.555
  3.8     5.9        0.9     0.81        1.51     2.2801     1.359
  4.1     6.5        1.2     1.44        2.11     4.4521     2.532
  2.2     3.3       -0.7     0.49       -1.09     1.1881     0.763
  2.6     3.6       -0.3     0.09       -0.79     0.6241     0.237
  2.9     4.6        0.0     0.00        0.21     0.0441     0.000
  2.0     2.9       -0.9     0.81       -1.49     2.2201     1.341
  2.7     3.6       -0.2     0.04       -0.79     0.6241     0.158
  1.9     3.1       -1.0     1.00       -1.29     1.6641     1.290
  3.4     4.9        0.5     0.25        0.51     0.2601     0.255
  col. sum: 29, 43.9, 0, 5.18, 0, 14.589, 8.49
- Slide 17
- Calculations: from the column sums, b₁ = 8.49 / 5.18 ≈ 1.639 and b₀ = 4.39 − 1.639 × 2.9 ≈ −0.363, so the least squares line is ŷ ≈ −0.363 + 1.639x.
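The slope and intercept follow from the column sums on the previous slide (Σ(x − x̄)² = 5.18, Σ(x − x̄)(y − ȳ) = 8.49, with x̄ = 2.9 and ȳ = 4.39). A minimal sketch of the arithmetic:

```python
# Column sums from the computation table:
sxx, sxy = 5.18, 8.49          # sum((x - xbar)^2), sum((x - xbar)(y - ybar))
xbar, ybar = 2.9, 4.39         # sample means of weight and fuel consumption

b1 = sxy / sxx                 # least squares slope
b0 = ybar - b1 * xbar          # least squares intercept
print(round(b1, 3), round(b0, 3))   # 1.639 -0.363
```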
- Slide 18
- Scatterplot with least squares prediction line
- Slide 19
- The least squares line always goes through (x̄, ȳ) = (2.9, 4.39).
- Slide 20
- Using the least squares line for prediction: fuel consumption of a 3,000 lb car? (x = 3)
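Plugging x = 3 into the fitted line gives the prediction. A sketch, using the coefficients implied by the table's column sums (b₁ = 8.49/5.18, b₀ = 4.39 − b₁ × 2.9):

```python
# Coefficients from the computation table's column sums:
b1 = 8.49 / 5.18
b0 = 4.39 - b1 * 2.9

x = 3.0                 # a 3,000 lb car (weight in thousands of lb)
y_hat = b0 + b1 * x     # predicted fuel consumption
print(round(y_hat, 2))  # 4.55
```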
- Slide 21
- Be Careful! Fuel consumption of a 500 lb car? (x = 0.5) Here x = 0.5 is outside the range of the x-data used to determine the least squares line, so predicting there is extrapolation and may be unreliable.
- Slide 22
- Avoid GIGO! Evaluating the least squares line:
  1. Create a scatterplot. Is it approximately linear?
  2. Calculate r², the square of the correlation coefficient.
  3. Examine the residual plot.
- Slide 23
- r²: The Variation Accounted For. The square of the correlation coefficient r gives important information about the usefulness of the least squares line.
- Slide 24
- r²: important information for evaluating the usefulness of the least squares line. The square of the correlation coefficient, r², is the fraction of the variation in y that is explained by the least squares regression of y on x, that is, by differences in x. Since −1 ≤ r ≤ 1, it follows that 0 ≤ r² ≤ 1.
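For the car weight and fuel consumption data, r² can be computed from the sums of squares already tabulated, assuming the standard identity r = Sxy / √(Sxx · Syy):

```python
# Sums of squares from the computation table:
sxy, sxx, syy = 8.49, 5.18, 14.589

# r^2 = (Sxy)^2 / (Sxx * Syy), the fraction of variation in y explained by x
r2 = sxy ** 2 / (sxx * syy)
print(round(r2, 3))   # 0.954
```

So roughly 95% of the variation in fuel consumption is explained by differences in car weight.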
- Slide 25
- March Madness: S(k) = Sagarin rating of the kth seeded team; Yᵢⱼ = Vegas point spread between seed i and seed j, i