Mathacle
PSet ----- Stats, Regression Analysis
Level ---- 1
Number --- 1
Name: ___________________ Date: _____________
Regression – Everything is naturally associated.
IV. REGRESSION ANALYSIS
Regression analysis studies the relationships among two or more variables.
The basic assumptions for two-variable (bivariate) regression analysis are:
The sample is representative of the population about which inferences and predictions are made.
The error is a random variable with a mean of zero conditional on the explanatory
variable, and the errors are uncorrelated with one another.
The variance of the error is constant across observations.
4.1. LSRL – Least Squares Regression Line
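The sample standard deviation of X: $s_x = \sqrt{\frac{1}{n-1}\sum_i (x_i - \bar{x})^2}$
The sample standard deviation of Y: $s_y = \sqrt{\frac{1}{n-1}\sum_i (y_i - \bar{y})^2}$
The sample covariance of X and Y: $s_{xy}^2 = \frac{1}{n-1}\sum_i (x_i - \bar{x})(y_i - \bar{y})$
The sample correlation coefficient:
$$r = \frac{s_{xy}^2}{s_x s_y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{1}{n-1}\sum_i \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} = \frac{1}{n-1}\sum_i z_{x_i} z_{y_i}$$
The slope of the best fitting line: $b_1 = r\,\frac{s_y}{s_x}$
The intercept of the best fitting line: $b_0 = \bar{y} - b_1\bar{x}$, or $\bar{y} = b_0 + b_1\bar{x}$
The predicted values for Y: $\hat{y} = b_0 + b_1 x$
[MATH] The predicted z-score for Y:
$$\hat{z}_y = \frac{\hat{y} - \bar{y}}{s_y} = \frac{(b_0 + b_1 x) - (b_0 + b_1\bar{x})}{s_y} = b_1\,\frac{x - \bar{x}}{s_y} = r\,\frac{s_y}{s_x}\cdot\frac{x - \bar{x}}{s_y} = r\,\frac{x - \bar{x}}{s_x} = r\,z_x$$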
That is, in the 2-dimensional z space, the “best-fit” line
$\hat{z}_y = r\,z_x$
passes through the origin (the means) with $r$ as the slope.
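The z-space property can be checked numerically. The sketch below is only an illustration: the helper names and the five data points are hypothetical, not taken from this problem set. It standardizes both variables, fits the least squares line to the z-scores, and compares the slope with $r$.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation with the n - 1 divisor.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    # r = s_xy^2 / (s_x * s_y), using the sample covariance.
    mx, my = mean(x), mean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return s_xy / (sample_sd(x) * sample_sd(y))

def lsrl(x, y):
    # b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar.
    b1 = correlation(x, y) * sample_sd(y) / sample_sd(x)
    b0 = mean(y) - b1 * mean(x)
    return b0, b1

def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

# Hypothetical illustration data (not from the worksheet).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
b0_z, b1_z = lsrl(z_scores(x), z_scores(y))
print(f"r = {r:.4f}, z-space intercept = {b0_z:.4f}, z-space slope = {b1_z:.4f}")
```

In z-space the intercept should come out as zero (up to rounding) and the slope should equal $r$.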
Example 4.1.1. A set of sample data about heights and weights is given below:
Height $x_i$   Weight $y_i$
61 105
62 120
63 120
65 160
65 120
68 145
69 175
70 160
72 185
75 210
To analyze the data, first visualize it. Sketch the scatterplot of weight against height.
Solution:
Does the plot seem “linearly correlated?”
4.2. Find the Regression Line and Residuals
Example 4.2.1. The data is given in the table below:
Height $x_i$   Weight $y_i$
61 105
62 120
63 120
65 160
65 120
68 145
69 175
70 160
72 185
75 210
Find the following variables by hand:
a.) $n = $ _________, $\bar{x} = $ _________, $\bar{y} = $ _________
b.) $s_x = \sqrt{\frac{1}{n-1}\sum (x_i - \bar{x})^2} = $ _________
$x_i$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$
61
62
63
65
65
68
69
70
72
75
$\sum (x_i - \bar{x})^2 = $
c.) $s_y = \sqrt{\frac{1}{n-1}\sum (y_i - \bar{y})^2} = $ _________
$y_i$   $y_i - \bar{y}$   $(y_i - \bar{y})^2$
105
120
120
160
120
145
175
160
185
210
$\sum (y_i - \bar{y})^2 = $
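d.) $s_{xy}^2 = \frac{1}{n-1}\sum (x_i - \bar{x})(y_i - \bar{y}) = $ _________
$x_i$   $x_i - \bar{x}$   $y_i$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
61   105
62   120
63   120
65   160
65   120
68   145
69   175
70   160
72   185
75   210
$\sum (x_i - \bar{x})(y_i - \bar{y}) = $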
e.) $r = \frac{s_{xy}^2}{s_x s_y} = $ ____________
f.) $r^2 = $ ____________
g.) $b_1 = r\,\frac{s_y}{s_x} = $ _____________
h.) $b_0 = \bar{y} - b_1\bar{x} = $ ___________
i.) $\hat{y} = b_0 + b_1 x = $ __________________
Solution:
a.) $n = 10$, $\bar{x} = 67$, $\bar{y} = 150$
b.) $s_x = 4.57$
$x_i$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$
61   -6   36
62   -5   25
63   -4   16
65   -2   4
65   -2   4
68   1   1
69   2   4
70   3   9
72   5   25
75   8   64
$\sum (x_i - \bar{x})^2 = 188$
c.) $s_y = 33.99$
$y_i$   $y_i - \bar{y}$   $(y_i - \bar{y})^2$
105   -45   2025
120   -30   900
120   -30   900
160   10   100
120   -30   900
145   -5   25
175   25   625
160   10   100
185   35   1225
210   60   3600
$\sum (y_i - \bar{y})^2 = 10400$
d.) $s_{xy}^2 = 145.56$
$x_i$   $x_i - \bar{x}$   $y_i$   $y_i - \bar{y}$   $(x_i - \bar{x})(y_i - \bar{y})$
61   -6   105   -45   270
62   -5   120   -30   150
63   -4   120   -30   120
65   -2   160   10   -20
65   -2   120   -30   60
68   1   145   -5   -5
69   2   175   25   50
70   3   160   10   30
72   5   185   35   175
75   8   210   60   480
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 1310$
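e.) $r = 0.94$
f.) $r^2 = 0.88$
g.) $b_1 = r\,\frac{s_y}{s_x} = \frac{1310}{188} = 6.97$ (carry unrounded values; rounding $r$ to 0.94 first gives 6.99)
h.) $b_0 = \bar{y} - b_1\bar{x} = 150 - 6.968(67) = -316.86$
i.) $\hat{y} = -316.86 + 6.97x$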
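The hand computation for Example 4.2.1 can be cross-checked with a short script. This is a minimal sketch in plain Python; the variable names are our own. Carrying full precision gives a slope of about 6.97 and an intercept of about -316.86, whereas rounding $r$ to 0.94 before computing the slope gives 6.99.

```python
import math

# Data from Example 4.2.1.
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

# Sums of squared deviations and cross products (the table columns).
sxx = sum((x - x_bar) ** 2 for x in heights)
syy = sum((y - y_bar) ** 2 for y in weights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
s_xy = sxy / (n - 1)            # the worksheet's s_xy^2
r = s_xy / (s_x * s_y)
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

print(f"n = {n}, x_bar = {x_bar}, y_bar = {y_bar}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}, s_xy^2 = {s_xy:.2f}")
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```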
[TI-84] Find the regression line:
STAT -> Calc -> 8. LinReg(a+bx) L1, L2, Y1
To get Y1:
VARS -> Y-VARS -> 1. Function -> 1. Y1
Use the calculator to find the following:
$\bar{x} = $ _________________
$s_x = $ _________________
$\bar{y} = $ _________________
$s_y = $ _________________
$r = $ _________________
$b_1 = $ _________________
$b_0 = $ _________________
$\hat{y} = $ _________________
$r^2 = $ _________________
Residual Plot
The error or residual is given by
$e = y - \hat{y}$
A residual plot shows the residuals on the y-axis and the explanatory
variable on the x-axis. If the residual plot has a distinct pattern rather than a random
scattering of points, the linear model may not be suitable to “best fit” the data.
[TI-84] Find the residual plot:
1.) Turn off StatPlot #1 and turn on StatPlot #2.
2.) In StatPlot #2, change L2 to RESID.
3.) To find RESID: 2nd -> LIST -> #7: RESID
4.) Graph the residuals: Zoom -> ZoomStat
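Outside the calculator, the same residuals can be listed with a few lines of code. The following is a sketch under our own naming, using the height and weight data of Example 4.2.1; it computes $e_i = y_i - \hat{y}_i$ for every point and confirms that the residuals sum to zero.

```python
# Data from Example 4.2.1.
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(weights) / n

sxx = sum((x - x_bar) ** 2 for x in heights)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))

# Slope written as sxy / sxx, which is algebraically the same as r * s_y / s_x.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual = observed - predicted.
residuals = [y - (b0 + b1 * x) for x, y in zip(heights, weights)]

for x, e in zip(heights, residuals):
    print(f"x = {x:2d}  residual = {e:7.2f}")
print(f"sum of residuals = {sum(residuals):.10f}")
```

Plotting `residuals` against `heights` gives the residual plot; a patternless scatter around zero supports the linear model.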
Example 4.2.2. Sketch the residual plot for the regression line in Example 4.2.1.
Solution:
4.3. Regression with Nonlinear Regressors
Example 4.3.1. A set of sample data is given below:
$x_i$   $y_i$
1 2
2 1
3 6
4 14
5 15
6 30
7 40
8 74
9 75
1.) Sketch the original graph
2.) Sketch the residual graph
The data look roughly quadratic in $x$, so apply the square-root transformation $y_i \to \sqrt{y_i}$:
$x_i$ (L1)   $y_i$ (L2)   $\sqrt{y_i}$ (L3)
1 2 1.4142
2 1 1
3 6 2.4495
4 14 3.7417
5 15 3.873
6 30 5.4772
7 40 6.3246
8 74 8.6023
9 75 8.6603
3.) Sketch the “transformed” graph
4.) Sketch the “transformed” residual graph
Solution:
1.) The original graph:
2.) The residual graph:
The residual plot shows a parabola-like pattern, so a straight line is not a good fit.
3.) The “transformed” graph:
4.) The “transformed” residual graph:
The residual plot looks more like random noise.
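The effect of the transformation can also be seen in the correlation. The sketch below (helper and variable names are ours) computes $r$ for the original pairs $(x_i, y_i)$ and for the transformed pairs $(x_i, \sqrt{y_i})$ from Example 4.3.1.

```python
import math

def correlation(u, v):
    # Pearson correlation from sums of deviations.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Data from Example 4.3.1.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

sqrt_y = [math.sqrt(v) for v in y]   # list L3 in the worksheet

r_raw = correlation(x, y)
r_sqrt = correlation(x, sqrt_y)
print(f"r for (x, y)       = {r_raw:.4f}")
print(f"r for (x, sqrt(y)) = {r_sqrt:.4f}")
```

A larger $r$ after the transformation agrees with the residual plots, but the residual plot remains the primary check of linearity.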
4.4. Properties of LSRL
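a.) $\hat{y}$, or $\hat{z}_y$, is always pulled toward the mean: $|\hat{z}_y| \le |z_x|$.
[MATH]
From the Cauchy–Schwarz inequality, $\left(\sum (x_i - \bar{x})(y_i - \bar{y})\right)^2 \le \sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2$, or
$|r| \le 1$. In z-score space, the change in the estimate $\hat{y}$ is never larger than the change in
$x$, i.e., the angle of the slope is within $\pm 45^\circ$.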
b.) The Sum of Residuals is zero
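The residual is defined as
$e_i = y_i - \hat{y}_i$
The sum of the residuals is
$$\sum e_i = \sum (y_i - \hat{y}_i) = \sum (y_i - b_0 - b_1 x_i) = \sum (y_i - \bar{y} + b_1\bar{x} - b_1 x_i) = \sum (y_i - \bar{y}) - b_1 \sum (x_i - \bar{x}) = 0$$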
c.) Coefficient of Determination $r^2$
The normalized sum of squared residuals:
$$SSR = \frac{1}{n-1}\sum (z_{y_i} - \hat{z}_{y_i})^2 = \frac{1}{n-1}\sum (z_{y_i} - r\,z_{x_i})^2 = 1 - r^2$$
The proportion of variability that cannot be explained is $1 - r^2$. So, the proportion of
variability that can be explained is $r^2$.
$r^2$ is called the coefficient of determination.
Do not place too much importance on small differences between $r^2$ values. Keep in mind
that $r^2$ and $r$ values are only relative measures for comparing candidate regression
models.
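The identity $SSR = 1 - r^2$ can be verified numerically. The sketch below uses the height and weight data of Example 4.2.1 with helper names of our own choosing: it standardizes both variables, forms the z-space residuals $z_{y_i} - r\,z_{x_i}$, and compares their normalized sum of squares with $1 - r^2$.

```python
import math

def z_scores(values):
    # Standardize with the sample standard deviation (n - 1 divisor).
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / s for v in values]

# Data from Example 4.2.1.
heights = [61, 62, 63, 65, 65, 68, 69, 70, 72, 75]
weights = [105, 120, 120, 160, 120, 145, 175, 160, 185, 210]

n = len(heights)
zx = z_scores(heights)
zy = z_scores(weights)

# r is the average product of z-scores (with n - 1 in the denominator).
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)

# Normalized sum of squared residuals in z-space.
ssr = sum((b - r * a) ** 2 for a, b in zip(zx, zy)) / (n - 1)

print(f"r^2         = {r * r:.4f}")
print(f"1 - r^2     = {1 - r * r:.4f}")
print(f"SSR (z-space) = {ssr:.4f}")
```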
d.) Rule of Thumb for Correlation Strength
$0 \le |r| < 0.3$: weak correlation
$0.3 \le |r| < 0.7$: moderate correlation
$|r| \ge 0.7$: strong correlation
Points are influential if removing them changes the slope of the line and the correlation
coefficient greatly. Influential points are typically outliers, but an outlier
may not be influential. When the data size is large, a single outlier may not be influential.
e.) Residual Plot
The residual plot depicts the signed vertical distances between the actual data
values and the values predicted by the model. A good linear model has residuals that are
near zero and randomly scattered.
**4.5. Models with Nonlinear Regressors and Linear in Coefficients
A logarithmic model $y = a + b\log_c(x)$ may be appropriate if $(\ln(x), y)$ or
$(\log(x), y)$ appear to be linear. The logarithmic model can be expressed as
$\hat{y} = b_0 + b_1\ln(x)$ or $\hat{y} = b_0 + b_1\log(x)$
An exponential model $y = a\,b^x$ may be appropriate if $(x, \ln(y))$ or $(x, \log(y))$ appear to be
linear. The exponential model can be expressed as
$\ln\hat{y} = b_0 + b_1 x$ or $\log\hat{y} = b_0 + b_1 x$
Or in the exponential form: $\hat{y} = e^{b_0 + b_1 x}$ or $\hat{y} = 10^{b_0 + b_1 x}$
A power model $y = a\,x^b$ may be appropriate if $(\ln(x), \ln(y))$ or $(\log(x), \log(y))$ appear
to be linear. The power model can be expressed as
$\ln\hat{y} = b_0 + b_1\ln(x)$ or $\log\hat{y} = b_0 + b_1\log(x)$
Or in the exponential form: $\hat{y} = e^{b_0} x^{b_1}$ or $\hat{y} = 10^{b_0} x^{b_1}$
Start with an analysis of the original $(x, y)$ data. Check $r$ and the residuals for all three models,
if necessary, to see which model is more “linear.”
The exponential model may be better for a “faster growing” response variable, while the
logarithmic model may be better for a “slower growing” response variable.
When applying $x^b$ to straighten the curve, use $b > 1$ to “bend” concave-down curves, and use
$0 < b < 1$ to “bend” concave-up curves.
Be sure to keep the lists of variables straight. Label sketches/graphs with the appropriate
variables.
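The comparison of candidate models can be sketched in code by computing $r$ for each linearized pair of lists. This is only an illustration with names of our own choosing, applied to the data of Example 4.3.1 (all values are positive, so the logarithms are defined). The correlation alone does not settle the choice; the residual plots should be checked as well.

```python
import math

def correlation(u, v):
    # Pearson correlation from sums of deviations.
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Data from Example 4.3.1.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 6, 14, 15, 30, 40, 74, 75]

ln_x = [math.log(v) for v in x]
ln_y = [math.log(v) for v in y]

# Each model is linear in the transformed variables listed here.
candidates = {
    "linear (x, y)": (x, y),
    "logarithmic (ln x, y)": (ln_x, y),
    "exponential (x, ln y)": (x, ln_y),
    "power (ln x, ln y)": (ln_x, ln_y),
}

r_values = {name: correlation(u, v) for name, (u, v) in candidates.items()}
for name, r in r_values.items():
    print(f"{name:24s} r = {r:.4f}  r^2 = {r * r:.4f}")
```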
Example 4.5.1. A Xerox machine dealer has data on the number $x$ of Xerox machines at
each of 89 customer locations and the number $y$ of service calls in a month at each
location. Summary calculations are:
$\bar{x} = 8.4$, $\bar{y} = 14.2$, $r = 0.86$, $s_x = 2.1$, $s_y = 3.8$
What is the y-intercept of the LSRL?
Solution:
$b_1 = r\,\frac{s_y}{s_x} = 0.86\cdot\frac{3.8}{2.1} = 1.556$, $b_0 = \bar{y} - b_1\bar{x} = 14.2 - 1.556\cdot 8.4 = 1.128$
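When only summary statistics are available, the LSRL follows directly from the two formulas for $b_1$ and $b_0$. A minimal sketch, with a hypothetical helper name `lsrl_from_summary`, applied to the numbers of Example 4.5.1:

```python
def lsrl_from_summary(x_bar, y_bar, r, s_x, s_y):
    # Slope and intercept of the LSRL from summary statistics only.
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Summary values from Example 4.5.1.
b0, b1 = lsrl_from_summary(x_bar=8.4, y_bar=14.2, r=0.86, s_x=2.1, s_y=3.8)
print(f"slope b1 = {b1:.3f}, intercept b0 = {b0:.3f}")
```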
Example 4.5.2. Which of the following would not be a correct conclusion based on a
correlation of 0.27?
(A) There is a weak linear relationship between the 2 variables.
(B) Approximately 7.3% of variation in y can be explained by linear relationship with x.
(C) In general, as one variable increases, the other variable tends to increase as well.
(D) There is a positive association between the two variables.
(E) The relationship must not follow a linear pattern since the correlation value is so low.
Solution:
The answer is E. Even though the correlation is low, i.e., only 7.3% of the variation can be
explained, we cannot claim that the relationship between the two variables is not linear.
Example 4.5.3. A study was conducted to determine the relationship between the number
of mishandled bags per 1000 customers $(x)$ and the percentage of on-time arrivals
for various airlines. The LSRL was found to be $\hat{y} = 97 - 5.08x$ with $r^2 = 0.5384$. What is
the correlation?
Solution:
Since $b_1 < 0$, the correlation coefficient must be negative: $r = -\sqrt{0.5384} = -0.7338$.
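The rule used in this solution, that $r$ takes the sign of the slope and the magnitude $\sqrt{r^2}$, can be written as a small helper. The function name below is hypothetical and the snippet is only a sketch of that rule.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    # r has the same sign as the slope b1 and magnitude sqrt(r^2).
    return math.copysign(math.sqrt(r_squared), slope)

# Values from Example 4.5.3: slope -5.08 and r^2 = 0.5384.
r = correlation_from_r_squared(0.5384, -5.08)
print(f"r = {r:.4f}")
```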
Example 4.5.4. Suppose that the scatterplot of $\log X$ and $\log Y$ shows a strong positive
correlation close to one. Which of the following is true?
I. The variables X and Y also have a correlation close to one.
II. A scatterplot of the variables X and Y shows a strong nonlinear pattern.
III. The residual plot of the variables X and Y shows a random pattern.
Solution:
A strong positive correlation between $\log X$ and $\log Y$ implies that the power model should
be used:
$\log Y = b_0 + b_1\log X$ or $Y = 10^{b_0} X^{b_1}$. So, only II is true.
Example 4.5.5. [2014APStatsFRQs, #6] Graph I is a scatterplot showing the lengths of
66 cars plotted with the fuel consumption rate (FCR). One point on the graph is labeled A.
A computer output from a linear regression is shown below.
a.) The point on the graph labeled A represents one car of length 175 inches and an FCR
of 5.88. Calculate and interpret the residual for the car relative to the least squares
regression line.
Graph II is a scatterplot showing the engine size of the 66 cars plotted with the
corresponding residuals from the regression of FCR on length. Graph III is a scatterplot
showing the wheel base of the 66 cars plotted with the corresponding residuals from the
regression of FCR on length.
b.) In graph II, the point labeled A corresponds to the same car whose point was labeled
A in graph I. The measurements for the car represented by point A are given below.
(i) Circle the point on graph III that corresponds to the car represented by point A on
graphs I and II.
(ii) There is a point on graph III labeled B. It is very close to the horizontal line at 0.
What does that indicate about the FCR of the car represented by point B?
c.) Write a few sentences to compare the association between the variables in graph II
with the association between the variables in graph III.
d.) Jamal wants to predict FCR using length and one of the other variables, engine size
or wheel base. Based on your response to part (c), which variable, engine size or wheel
base, should Jamal use in addition to length if he wants to improve the prediction?
Explain why you chose that variable.
Solution:
a.) $e_A = y_A - \hat{y}_A = 5.88 - (-1.595789 + 0.0372614 \times 175) = 0.955$. The prediction
underestimates the FCR of the car of length 175 in. by 0.955.
b.) Point B represents a car with a wheel base of about 120 whose residual is close to zero, so the
regression of FCR on length predicts the FCR of this car accurately.
c.) There is a moderate positive association in Graph II and a very weak association in
Graph III. The association between residual and engine size is stronger than that between
residual and wheel base.
d.) Since engine size shows a stronger positive association with the residuals, it
provides more useful extra information for predicting FCR in addition to length.
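The residual in part a.) is a one-line computation once the fitted line is written as a function. The sketch below takes the intercept and slope from the solution above; the function name is our own.

```python
def predicted_fcr(length_in_inches):
    # Fitted line from the solution: FCR_hat = -1.595789 + 0.0372614 * length.
    return -1.595789 + 0.0372614 * length_in_inches

observed_fcr = 5.88                      # car A
residual_a = observed_fcr - predicted_fcr(175)
print(f"predicted = {predicted_fcr(175):.3f}, residual = {residual_a:.3f}")
```

A positive residual means the observed FCR lies above the line, i.e., the line underestimates this car.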
Practice Questions -- LSRL
1. The residual at the point $(\bar{x}, \bar{y})$ in a linear regression is
(A) Negative
(B) Zero
(C) Positive
(D) Dependent on the value of r
(E) None of these
2. If $(12, 60)$ is an influential point for the regression line $\hat{y} = 7.908 + 4.098x$, then which
of the following must be true?
(A) Removal of $(12, 60)$ will improve $r$
(B) Removal of $(12, 60)$ will not affect $r$
(C) Removal of $(12, 60)$ will change the value of the slope of the regression line
(D) $(12, 60)$ has a large residual
(E) None of these
3. A statistics student calculated a LSRL to describe the relationship between two
variables and then realized that he had mistakenly interchanged the explanatory and
response variable. When the LSRL is recalculated, what can be said about the
correlation?
(A) The correlation does not change when the variables are switched.
(B) The correlation will have the same absolute value, but the sign will change.
(C) The correlation will increase once the explanatory and response variables are
correctly identified and used appropriately in the calculation of the LSRL
(D) The correlation will decrease because of the student’s initial mistake.
(E) We can not be sure how the correlation of the new LSRL will compare to the
correlation that was found originally.
4. Suppose a data set is transformed using $(x, y) \to (x, \log y)$ and a least squares linear regression
procedure is performed on the transformed data. If the residual plot of this regression
shows a curved pattern, which of the following is an appropriate conclusion?
(A) A quadratic model should be used with the original data.
(B) A square root transformation should be applied to the transformed data.
(C) The correlation coefficient of the set of transformed data is zero.
(D) The exponential transformation is not appropriate.
(E) None of these is appropriate.
5. After data are collected from an agricultural experiment, suppose a transformation is
performed on the bivariate set (inches of water, total plant growth). If the linear
regression of the transformed data has the equation $\log(\text{growth}) = 0.7 + 1.93\log(\text{water})$,
the regression model of the original data is
(A) $\text{growth} = 0.7 + 1.93(\text{water})$
(B) $\text{growth} = 5.01 + 1.93(\text{water})$
(C) $\text{growth} = 5.01(1.93)^{\text{water}}$
(D) $\text{growth} = 5.01(\text{water})^{1.93}$
(E) none of these
6. Residuals are
(A) possible models not explored by the researcher.
(B) variation in the response variable that is explained by the model.
(C) the difference between the observed response and the values predicted by the model.
(D) data collected from individuals that is not consistent with the rest of the group.
(E) a measure of the strength of the linear relationship between x and y .
7. Data were collected on two variables $x$ and $y$, and a least squares regression line was
fitted to the data. The resulting equation is $\hat{y} = -2.29 + 1.7x$. What is the residual for
the point $(5, 6)$?
(A) 2.91
(B) $-0.21$
(C) 0.21
(D) 6.21
(E) 7.91
8. Given a set of ordered pairs $(x, y)$ with $s_x = 2.5$, $s_y = 1.9$, $r = 0.63$, what is the slope
of the regression line of $y$ on $x$?
(A) 0.48
(B) 0.65
(C) 1.32
(D) 1.90
(E) 2.63
9. All but one of these statements is false. Which one could be true?
(A) The correlation between a football player’s weight and the position he plays is 0.54
(B) The correlation between a car’s length and its fuel efficiency is 0.71 miles per gallon.
(C) There is a high correlation (1.09) between height of a corn stalk and its age in weeks.
(D) Correlation between amount of fertilizer used and quantity of beans harvested is
0.42
(E) There is a correlation of 0.63 between gender and political party.
10. Which is true?
I. Random scatter in the residuals indicates a linear model.
II. If two variables are very strongly associated, then the correlation between them will be
near $-1.0$ or $1.0$.
III. Changing the units of measurement for x or y changes the correlation coefficient.
(A) I only
(B) II only
(C) I and II only
(D) II and III only
(E) I, II and III
11. If the coefficient of determination $r^2$ is calculated as 0.49, then the correlation
coefficient
(A) cannot be determined without the data
(B) is $-0.70$
(C) is 0.2401
(D) is 0.70
(E) is 0.7599
12. Which of the following is a correct conclusion based on the residual plot displayed?
(A) The line overestimates the data.
(B) The line underestimates the data.
(C) It is not appropriate to fit a line to these data since there is clearly no correlation.
(D) The data are not related.
(E) There is a nonlinear relationship between the variables.
13. What conclusion can be reached based on the residual plot shown below?
(A) Since the plot shows a pattern, a linear model is not appropriate for the data.
(B) Since the plot shows a pattern, the data is not roughly normally distributed.
(C) Since both parts of the plot show a roughly linear pattern, a linear model is
appropriate for the data.
(D) Since both parts of the plot follow a roughly linear pattern, the data are approximately
normally distributed.
(E) The data appear to follow a quadratic or absolute value model.
14. Which of the following would not be correct based on a correlation of $-0.92$?
(A) There is a negative association between the two variables.
(B) Since correlation is a resistant measure, there must not be any outliers.
(C) In general, as the value of one variable increases, the value of the other variable tends
to decrease.
(D) There is a strong linear relationship between the two variables
(E) Approximately 84.6% of the variation in $y$ can be explained by the linear
relationship with $x$.
15. Data that follow an exponential model in $(x, y)$ can be re-expressed as a linear model
if you plot:
(A) $(\log x, y)$
(B) ,x y
(C) $(\log x, \log y)$
(D) 2,x y
(E) $(x, \log y)$
16. An LSRL is found to be $\log\hat{y} = 2.35 - 0.62x$. The equation can be rewritten as:
(A) ˆ 0.24 (223.87)xy
(B) ˆ 223.87 0.24y x
(C) ˆ 223.87 (0.24)xy
(D) $\hat{y} = 223.87(0.24)^x$
(E) ˆ 0.24 (223.87)xy
17. An LSRL was calculated and the residual for a certain data point is found to be
0.8074. This tells us that the predicted cost is
(A) wrong
(B) higher than our observed cost
(C) lower than our observed cost
(D) the result of extrapolation
(E) the result of interpolation
18. An LSRL was computed for $(\log x, \log y)$. The resulting equation was
$\log\hat{y} = 3.1 + 2\log(x)$. Find the predicted value of $y$ when $x = 1$.
(A) 5.1
(B) 510.00
(C) 1258.9
(D) 5100.0
(E) 126,000.0
19. Which of the following statements are true?
I. When the data set includes an influential point, the data set is nonlinear.
II. Influential points always reduce the coefficient of determination.
III. All outliers are influential data points.
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
20. In the scatterplot of y versus x shown below, the least squares regression line is
superimposed on the plot. Which of the following points has the largest residual?
(A) A
(B) B
(C) C
(D) D
(E) E
21. Consider $n$ pairs of numbers $(x_1, y_1)$, $(x_2, y_2)$, …, $(x_n, y_n)$. The mean and standard
deviation of the x-values are $\bar{x} = 5$ and $s_x = 4$, respectively. The mean and standard
deviation of the y-values are $\bar{y} = 10$ and $s_y = 10$, respectively. Of the following, which
could be the least squares regression line?
(A) $\hat{y} = 5.0 + 3.0x$
(B) $\hat{y} = 3.0x$
(C) $\hat{y} = 5.0 + 2.5x$
(D) $\hat{y} = 8.5 + 0.3x$
(E) $\hat{y} = 10.0 + 0.4x$
22. Researchers studying growth patterns of children collect data on the heights of fathers
and sons. The correlation between the fathers’ heights and the heights of their 16 year-old
sons is most likely to be
(A) -1.0
(B) near zero
(C) near 0.7
(D) exactly +1.0
(E) somewhat greater than 1.0
23. The auto insurance industry crashed some test vehicles into a cement barrier at speeds
of 5 to 25 mph to investigate the amount of damage to the cars. They found a correlation
of $r = 0.60$ between speed (in mph) and damage (in dollars). If the speed at which a car hits the
barrier is 1.5 standard deviations above the mean speed, where do we expect the damage to be
relative to the mean damage?
(A) equal to
(B) 0.36 SD above
(C) 0.60 SD above
(D) 0.90 SD above
(E) 1.5 SD above
24. The correlation between X and Y is $r = 0.35$. If we double each X value and decrease
each Y by 0.20, and exchange the variables (put X on the Y-axis and vice versa), the new
correlation
(A) is 0.35
(B) is 0.50
(C) is 0.70
(D) is 0.90
(E) can not be determined
25. The correlation between a family’s weekly income and the amount they spend on
restaurant meals is found to be $r = 0.30$. Which must be true?
I. Families tend to spend about 30% of their incomes in restaurants
II. In general, the higher the income, the more the family spends in restaurants.
III. The line of best fit passes through 30% of (income, restaurant$) data points
(A) I only
(B) II only
(C) III only
(D) II and III only
(E) I, II and III
26. Education research consistently shows that students from wealthier families tend to
have higher SAT scores. The slope of the line that predicts SAT score from family
income is 6.25 points per $1000, and the correlation between the variables is 0.48. Then
the slope of the line that predicts family income from SAT scores (in $1000 per point) is
(A) 0.037
(B) 0.16
(C) 3.00
(D) 6.25
(E) 13.02
27. A regression analysis of company profits and the amount of money the company spent on
advertising found $r^2 = 0.72$. Which of these is true?
I. The model can correctly predict the profit for 72% of companies
II. On average, about 72% of the company’s profits results from advertising.
III. On average, companies spend about 72% of their profits on advertising
(A) none
(B) I only
(C) II only
(D) III only
(E) I and III
28. A least squares line of regression has been fitted to a scatterplot; the model’s residuals
plot is shown below. Which of the following statements is true?
(A)The linear model is appropriate.
(B) The linear model is poor because some residuals are large.
(C) The linear model is poor because the correlation is near zero.
(D) A curved model would be better.
(E) None of the above.
29. The correlation between two scores X and Y equals 0.8. If both the X scores and
the Y scores are converted to z-scores, then the correlation between the z-scores for X
and the z-scores for Y would be
(A) -0.8
(B) -0.2
(C) 0.0
(D) 0.2
(E) 0.8
30. A least squares regression line was fitted to the weights (in pounds) versus age (in
months) of a group of many young children. The equation of the line is $\hat{y} = 16.6 + 0.65t$,
where $\hat{y}$ is the predicted weight and $t$ is the age of the child. A 20-month-old child in this
group has an actual weight of 25 pounds. Which of the following is the residual weight,
in pounds, for this child?
(A) -7.85
(B) -4.60
(C) 4.60
(D) 5.0
(E) 7.85
31. A college’s job placement office collected data about students’ GPAs and the salaries
they earned in their first jobs after graduation. The mean GPA was 2.9 with a standard
deviation of 0.4. Starting salaries had a mean of $47200 with an SD of $8500. The
correlation between the two variables was r = 0.72. The association appeared to be linear
in the scatterplot.
a.) Write an equation of the model that can predict salary based on GPA.
b.) Do you think these predictions will be reliable? Explain.
c.) Your brother just graduated from that college with a GPA of 3.30. He tells you that
based on this model the residual for his pay is -$1880. What salary is he earning?
32. [2011APStatsFRQs, #5] Windmills generate electricity by transferring energy from
wind to a turbine. A study was conducted to examine the relationship between wind
velocity in miles per hour (mph) and electricity production in amperes for one particular
windmill. For the windmill, measurements were taken on twenty-five randomly selected
days, and the computer output for the regression analysis for predicting electricity
production based on wind velocity is given below. The regression model assumptions
were checked and determined to be reasonable over the interval of wind speeds
represented in the data, which were from 10 miles per hour to 40 miles per hour.
Predictor        Coef     SE Coef   T       P
Constant         0.137    0.126     1.09    0.289
Wind velocity    0.240    0.019     12.63   0.000
S = 0.237   R-Sq = 0.873   R-Sq(adj) = 0.868
(a) Use the computer output above to determine the equation of the least squares
regression line. Identify all variables used in the equation.
(b) How much more electricity would the windmill be expected to produce on a day
when the wind velocity is 25 mph than on a day when the wind velocity is 15 mph? Show
how you arrived at your answer.
(c) What proportion of the variation in electricity production is explained by its linear
relationship with wind velocity?
33. [CollegeBoardAPStatsPracticeProblem] Exercise physiologists are investigating the
relationship between lean body mass (in kilograms) and the resting metabolic rate (in
calories per day) in sedentary males.
Based on the computer output above, which of the following is the best interpretation of
the value of the slope of the regression line?
(A) For each additional kilogram of lean body mass, the resting metabolic rate increases
on average by 22.563 calories per day.
(B) For each additional kilogram of lean body mass, the resting metabolic rate increases
on average by 264.0 calories per day.
(C) For each additional kilogram of lean body mass, the resting metabolic rate increases
on average by 144.9 calories per day.
(D) For each additional calorie per day for the resting metabolic rate, the lean body mass
increases on average by 22.563 kilograms.
(E) For each additional calorie per day for the resting metabolic rate, the lean body mass
increases on average by 264.0 kilograms.
Answers:
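1. B. $(\bar{x}, \bar{y})$ is on the line: $e = \bar{y} - \hat{y} = \bar{y} - (b_0 + b_1\bar{x}) = 0$.
2. C. D may not be true.
3. A. $r = \frac{1}{n-1}\sum \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ is symmetric in $x$ and $y$.
4. D. $\log y = a + bx$, $y = 10^{a + bx}$.
5. D. $\text{growth} = 10^{0.7 + 1.93\log(\text{water})} = 10^{0.7}\cdot 10^{1.93\log(\text{water})} = 5.01(\text{water})^{1.93}$
6. C.
7. B. $6 - (-2.29 + 1.7(5)) = -0.21$
8. A. $b_1 = r\,\frac{s_y}{s_x} = 0.63\cdot\frac{1.9}{2.5} = 0.4788$
9. D. (A) Position is not a quantitative variable; (B) correlation has no units; (C)
correlation cannot be greater than one; (E) gender and political party are not quantitative
variables.
10. C.
11. A. The sign of the correlation cannot be determined.
12. B.
13. A.
14. B.
15. E.
16. D. $\hat{y} = 10^{2.35}(10^{-0.62})^x = 223.87(0.24)^x$.
17. C. $e = y - \hat{y} = 0.8074$, so $\hat{y} = y - 0.8074$.
18. C. $\log\hat{y} = 3.1$, $\hat{y} = 10^{3.1} = 1258.9$.
19. E.
20. A.
21. D. $(\bar{x}, \bar{y}) = (5, 10)$ is on the line.
22. C.
23. D. $\hat{z}_d = r\,z_x = 0.6(1.5) = 0.9$
24. A.
25. B. $\text{restaurant}(\$) = b_0 + b_1(\text{income})$. I and III do not make sense.
26. A. $b_{sat} = r\,\frac{s_{sat}}{s_{income}} = 6.25$ (points per \$1000), so
$b_{income} = r\,\frac{s_{income}}{s_{sat}} = \frac{r^2}{b_{sat}} = \frac{(0.48)^2}{6.25} = 0.0369$ (\$1000 per point).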
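27. A. $\hat{z}_p = r\,z_a = 0.85\,z_a$
28. A.
29. E. $r_{xy} = 0.8$
30. B. $y_i = 25$, $\hat{y}_i = 16.6 + 0.65(20) = 29.6$, $e_i = y_i - \hat{y}_i = 25 - 29.6 = -4.6$.
31. $b_1 = r\,\frac{s_y}{s_x} = 0.72\cdot\frac{8500}{0.4} = 15300$, $b_0 = \bar{y} - b_1\bar{x} = 47200 - 15300(2.9) = 2830$.
a.) $\widehat{\text{Salary}} = 2830 + 15300(\text{GPA})$
b.) $R^2 = 52\%$, so about 52% of the variation can be explained; the predictions are somewhat reliable.
c.) $\widehat{\text{Salary}} = 2830 + 15300(3.30) = 53320$;
$e_b = y_b - \hat{y}_b = -1880$, so $y_b = 53320 - 1880 = 51440$ dollars.
32.
(a) Production = 0.137 + 0.240(Velocity), with production in amperes and wind velocity in mph
(b) Diff(Production) = 0.24(25 - 15) = 2.4 amperes
(c) 87.3%
33. A. calories = 264 + 22.563(mass).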
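Several of the numerical answers can be re-derived with a few lines of arithmetic. The script below is only a checking sketch; the variable names are ours, and each line mirrors the formula quoted in the corresponding answer.

```python
# Question 8: slope from r, s_y, s_x.
slope_q8 = 0.63 * 1.9 / 2.5

# Question 16: log(y_hat) = 2.35 - 0.62 x  ->  y_hat = a * b**x.
a_q16 = 10 ** 2.35
b_q16 = 10 ** -0.62

# Question 26: slope for predicting income from SAT is r^2 / (slope of SAT on income).
slope_q26 = 0.48 ** 2 / 6.25

# Question 30: residual = observed - predicted.
residual_q30 = 25 - (16.6 + 0.65 * 20)

# Question 31: LSRL from summary statistics, then salary = prediction + residual.
b1_q31 = 0.72 * 8500 / 0.4
b0_q31 = 47200 - b1_q31 * 2.9
salary_q31 = (b0_q31 + b1_q31 * 3.30) + (-1880)

print(f"Q8  slope     = {slope_q8:.4f}")
print(f"Q16 y_hat     = {a_q16:.2f} * ({b_q16:.2f})^x")
print(f"Q26 slope     = {slope_q26:.4f}")
print(f"Q30 residual  = {residual_q30:.2f}")
print(f"Q31 line      = {b0_q31:.0f} + {b1_q31:.0f} * GPA, salary = {salary_q31:.0f}")
```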
**6.6. Testing different models
X   Y   X   Y
a.) Line Model: $y = b_0 + b_1 x$ or $y = a + bx$
For TI-84,
Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2
Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph
Step C: Display the data: Zoom -> ZoomStat
Step D: Regression Line: STAT -> Calc -> 8. LinReg(a+bx) L1, L2, Y1
To get Y1: VARS -> Y-VARS -> 1. Function -> 1. Y1
Step E: Equation: _______________________________
Step F: $R$ and $R^2$: $R = $ __________ and $R^2 = $ __________
Step G: Sketch the data and equation:
Step H: Residual Plot:
1.) Turn off StatPlot #1 and turn on StatPlot #2.
2.) In StatPlot #2, change L2 to RESID.
To find RESID in TI-84: 2nd -> LIST -> #7: RESID
3.) Zoom -> ZoomStat
Sketch the residual plot
b.) Log Model: $y = b_0 + b_1\log(x)$ or $y = b_0 + b_1\ln(x)$ or $y = a + b\ln(x)$
For TI-84,
Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2
Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph
Step C: Display the data: Zoom -> ZoomStat
Step D: Log Model: STAT -> Calc -> 9. LnReg L1, L2, Y1
To get Y1: VARS -> Y-VARS -> 1. Function -> 1. Y1
Step E: Equation: _______________________________
Step F: $R$ and $R^2$: $R = $ __________ and $R^2 = $ __________
Step G: Sketch the data and equation:
Step H: Residual Plot:
1.) Turn off StatPlot #1 and turn on StatPlot #2.
2.) In StatPlot #2, change L2 to RESID.
To find RESID in TI-84: 2nd -> LIST -> #7: RESID
3.) Zoom -> ZoomStat
Sketch the residual plot
c.) Exponential Model: $\ln(y) = b_0 + b_1 x$, or $y = ab^x$
For TI-84,
Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2
Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph
Step C: Display the data: Zoom -> ZoomStat
Step D: Exponential Model: STAT -> Calc -> 0. ExpReg L1, L2, Y1
To get Y1: VARS -> Y-VARS -> 1. Function -> 1. Y1
Step E: Equation: _______________________________
Step F: $R$ and $R^2$: $R = $ __________ and $R^2 = $ __________
Step G: Sketch the data and equation:
Step H: Residual Plot:
1.) Turn off StatPlot #1 and turn on StatPlot #2.
2.) In StatPlot #2, change L2 to RESID.
To find RESID in TI-84: 2nd -> LIST -> #7: RESID
3.) Zoom -> ZoomStat
Sketch the residual plot
d.) Power Model: $\ln(y) = b_0 + b_1\ln(x)$, or $y = ax^b$
For TI-84,
Step A: Input the data into L1 and L2: STAT -> EDIT -> L1, L2
Step B: Turn on StatPlot #1: 2nd -> StatPlot -> #1 -> On -> 1st graph
Step C: Display the data: Zoom -> ZoomStat
Step D: Power Model: STAT -> Calc -> A. PwrReg L1, L2, Y1
To get Y1: VARS -> Y-VARS -> 1. Function -> 1. Y1
Step E: Equation: _______________________________
Step F: $R$ and $R^2$: $R = $ __________ and $R^2 = $ __________
Step G: Sketch the data and equation:
Step H: Residual Plot:
1.) Turn off StatPlot #1 and turn on StatPlot #2.
2.) In StatPlot #2, change L2 to RESID.
To find RESID in TI-84: 2nd -> LIST -> #7: RESID
3.) Zoom -> ZoomStat
Sketch the residual plot
Conclusions: ___________________________________________________
e.) Regression models in R script: