MVS 250: V. Katch
-
Upload
dillon-hooper -
Category
Documents
-
view
27 -
download
0
description
Transcript of MVS 250: V. Katch
1
MVS 250: V. KatchMVS 250: V. Katch
STATISTICSChapter 5 Regression
2
RegressionDefinition Regression Equation
Given a collection of paired data, the regression equation
Regression Line (line of best fit or least-squares line)
the graph of the regression equation
algebraically describes the relationship between the two variables
(y = mx + b)^
3
Regression Line Plotted on Scatter Plot
4
Regression Line
5
Two different lines, one to predict X and one to predict Y.
6
The Regression Equationx is the independent variable
(predictor variable)
y is the dependent variable (response variable)
^
y = mx +b b = slope
7
Assumptions1. We are investigating only linear relationships.
2. For each x value, y is a random variable having a normal (bell-shaped) distribution. All of these y distributions have the same variance. Also, for a given value of x, the distribution of y-values has a mean that lies on the regression line. (Results are not seriously affected if departures from normal distributions and equal variances are not too extreme.)
8
Formula for y-intercept and slope
Formula 2
b =
(y/n) (x2/n) - (x/n) (xy/n)
(x2/n) - (x/n)2
(slope)
Formula 1
SD2x
(xy/n) - (x/n) (y/n)
(x2/n) - (x/n)2m =
SD2x
(y-intercept)
9
If you find r, then
Formula 3
where y is the mean of the y-values and x is the mean of the x values
b = y - mxIntercept = Formula 4
slope = m = r sy/sx
where y is the mean of the y-values, x is the mean of the x-values and m is the slope
10
Rounding the y-intercept and the slope
Round to three significant digits
If you use the formulas 1 and 2, and 3 try not to round intermediate values.
11
The regression line fits the sample
points best.
12
DefinitionsResidual
for a sample of paired (x,y) data, the difference (y - y)
between an observed sample y-value and the value of y, which is the value of y that is predicted by using the regression equation.
Least-Squares PropertyA straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
^
Residuals and the Least-Squares Property
^
13
Residuals and the Least-Squares Property
x 1 2 4 5y 4 24 8 32
y = 5 + 4x
02468
101214161820222426283032
1 2 3 4 5
•
•
•
x
yResidual = 7
Residual = -13Residual = -5
Residual = 11
^
•
14
In predicting a value of y based on some given value of x ...
1. If there is not a significant linear correlation, the best predicted y-value is y.
2. If there is a significant linear correlation, the best predicted y-value is found by substituting the x-value into the regression equation.
Predictions
15
Predicting the Value of a Variable
Use the regressionequation to makepredictions. Substitutethe given value in theregression equation.
Calculate the value of rand test the hypothesis
that = 0
Isthere a
significant linearcorrelation
?
Given any value of onevariable, the best predictedvalue of the other variableis its sample mean.
Yes
No
Start
16
1. If there is no significant linear correlation, don’t use the regression equation to make predictions.
2. When using the regression equation for predictions, stay within the scope of the available sample data.
3. A regression equation based on old data is not necessarily valid now.
4. Don’t make predictions about a population that is different from the population from which the sample data was drawn.
Guidelines for Using TheRegression Equation
17
Example1
2
3
4
5
10
15
18
20
30
34
36
37
39
41
50
59
64
68
86
X Y
Compute r, slope, intercept, regression
What is this equation used for?
18
0.27
2
1.41
3
2.19
3
2.83
6
2.19
4
1.81
2
0.85
1
3.05
5
Data from the Garbage Projectx Plastic (lb)
y Household
What is the best predicted size of a household that discard 0.50 lb of plastic?
19
0.27
2
1.41
3
2.19
3
2.83
6
2.19
4
1.81
2
0.85
1
3.05
5
Data from the Garbage Projectx Plastic (lb)
y Household
What is the best predicted size of a household that discard 0.50 lb of plastic?
b = 0.549
m = 1.48
Using a calculator:
y = 0.549 + 1.48 (0.50)y = 1.3
A household that discards 0.50 lb of plastic has approximately one person.
20
Definitions Marginal Change
the amount a variable changes when the other variable changes by exactly one unit
Outlier a point lying far away from the other data
points
Influential Points points which strongly affect the graph of the
regression line
21
Example 5.4 Height and Foot Length (cont)
Regression equation uncorrected data: 15.4 + 0.13 heightcorrected data: -3.2 + 0.42 height
Correlationuncorrected data: r = 0.28corrected data: r = 0.69
Three outliers were data entry errors.
22
Example 5.10 Earthquakes in US
Correlationall data: r = 0.73w/o SF: r = –0.96
San Francisco earthquake of 1906.
23
Example: Predict the quiz score of a student who spends 30 hours a week watching television.
One more step…….
24
Compute the Standard Error of the Estimate
The predicted score is 56.56 points 7.978 points+
SY*X = SDY√1-r2
SY*X = 13.83√1-(-8.17)2
SY*X = ±7.978
25
Multiple RegressionDefinition
Multiple Regression Equation
A linear relationship between a dependent
variable y and two or more independent
variables (x1, x2, x3 . . . , xk)
y = m0 + m1x1 + m2x2 + . . . + mkxk ^
26
Generic Models
Linear: y = a + bx
Quadratic: y = ax2 + bx + c
Logarithmic: y = a + b lnx
Exponential: y = abx
Power: y = axb
Logistic: y = c1 + ae -bx
27
28
29
30
31
32
33
Development of a Good Mathematics Model
Look for a Pattern in the Graph: Examine the graph of the plotted points and compare the basic pattern to the known generic graphs.
Find and Compare Values of R2: Select functions that result in larger values of R2, because such larger values correspond to functions that better fit the observed points.
Think: Use common sense. Don’t use a model that lead to predicted values known to be totally unrealistic.