Stats Chap10
-
Upload
waseem-tariq -
Category
Documents
-
view
240 -
download
0
Transcript of Stats Chap10
-
8/3/2019 Stats Chap10
1/63
Chapter 10
Correlation and Regression
McGraw-Hill, Bluman, 5th ed., Chapter 10 1
-
8/3/2019 Stats Chap10
2/63
Chapter 10 OverviewIntroduction
10-1 Correlation
10-2 Regression 10-3 Coefficient of Determination and
Standard Error of the Estimate
Bluman, Chapter 10 2
-
8/3/2019 Stats Chap10
3/63
Chapter 10 Objectives1. Draw a scatter plot for a set of ordered pairs.
2. Compute the correlation coefficient.
3. Test the hypothesis H0: = 0.
4. Compute the equation of the regression line.
5. Compute the coefficient of determination.
6. Compute the standard error of the estimate.
7. Find a prediction interval.
Bluman, Chapter 10 3
-
8/3/2019 Stats Chap10
4/63
Introduction
In addition to hypothesis testing andconfidence intervals, inferential statisticsinvolves determining whether a
relationship between two or morenumerical or quantitative variables exists.
Bluman, Chapter 10 4
-
8/3/2019 Stats Chap10
5/63
Introduction
CorrelationCorrelation is a statistical method usedto determine whether a linear relationshipbetween variables exists.
RegressionRegression is a statistical method usedto describe the nature of the relationship
between variablesthat is, positive ornegative, linear or nonlinear.
Bluman, Chapter 10 5
-
8/3/2019 Stats Chap10
6/63
Introduction
The purpose of this chapter is to answerthese questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of therelationship?
3. What type of relationship exists?4. What kind of predictions can be
made from the relationship?
Bluman, Chapter 10 6
-
8/3/2019 Stats Chap10
7/63
Introduction
1. Are two or more variables related?
2. If so, what is the strength of the
relationship?
To answer these two questions, statisticians usethe correlation coefficientcorrelation coefficient, a numericalmeasure to determine whether two or more
variables are related and to determine thestrength of the relationship between or amongthe variables.
Bluman, Chapter 10 7
-
8/3/2019 Stats Chap10
8/63
Introduction
3. What type of relationship exists?
There are two types of relationships: simple andmultiple.
In a simple relationship, there are two variables: anindependent variableindependent variable (predictor variable) and adependent variabledependent variable (response variable).
In a multiple relationship, there are two or moreindependent variables that are used to predict onedependent variable.
Bluman, Chapter 10 8
-
8/3/2019 Stats Chap10
9/63
Introduction
4. What kind of predictions can be made
from the relationship?
Predictions are made in all areas and daily. Examplesinclude weather forecasting, stock market analyses,sales predictions, crop predictions, gasoline pricepredictions, and sports predictions. Some predictionsare more accurate than others, due to the strength of
the relationship. That is, the stronger the relationship isbetween variables, the more accurate the prediction is.
Bluman, Chapter 10 9
-
8/3/2019 Stats Chap10
10/63
10.1 Scatter Plots and Correlation
A scatter plotscatter plot is a graph of the orderedpairs (x, y) of numbers consisting of theindependent variablexand the
dependent variable y.
Bluman, Chapter 10 10
-
8/3/2019 Stats Chap10
11/63
Chapter 10Correlation and Regression
Section 10-1Example 10-1
Page #536
Bluman, Chapter 10 11
-
8/3/2019 Stats Chap10
12/63
Example 10-1: Car Rental Companies
Construct a scatter plot for the data shown for car rentalcompanies in the United States for a recent year.
Step 1: Draw and label thexand yaxes.
Step 2: Plot each point on the graph.
Bluman, Chapter 10 12
-
8/3/2019 Stats Chap10
13/63
Example 10-1: Car Rental Companies
Bluman, Chapter 10 13
Positive Relationship
-
8/3/2019 Stats Chap10
14/63
Correlation
The correlation coefficientcorrelation coefficient computed fromthe sample data measures the strength anddirection of a linear relationship between two
variables. There are several types of correlation
coefficients. The one explained in this sectionis called the Pearson product momentPearson product moment
correlation coefficient (PPMC)correlation coefficient (PPMC). The symbol for the sample correlation
coefficient is r. The symbol for the populationcorrelation coefficient is V.
Bluman, Chapter 10 14
-
8/3/2019 Stats Chap10
15/63
Correlation
The range of the correlation coefficient is from1 to 1.
If there is a strong positive linearstrong positive linear
relationshiprelationship between the variables, the valueofrwill be close to 1.
If there is a strong negative linearstrong negative linearrelationshiprelationship between the variables, the value
ofr will be close to 1.
Bluman, Chapter 10 15
-
8/3/2019 Stats Chap10
16/63
Correlation
Bluman, Chapter 10 16
-
8/3/2019 Stats Chap10
17/63
Correlation Coefficient
The formula for the correlation coefficient is
where n is the number of data pairs.
Rounding Rule: Round to three decimal places.
Bluman, Chapter 10 17
2 2
2 2
!
- -
n xy x yr
n x x n y y
-
8/3/2019 Stats Chap10
18/63
Chapter 10Correlation and Regression
Section 10-1Example 10-2
Page #540
Bluman, Chapter 10 18
-
8/3/2019 Stats Chap10
19/63
Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data inExample 101.
Bluman, Chapter 10 19
CompanyCarsx
(in 10,000s)Income y
(in billions) xy x 2 y2
A
BCDEF
63.0
29.020.819.113.48.5
7.0
3.92.12.81.41.5
441.00
113.1043.6853.4818.762.75
3969.00
841.00432.64364.81179.5672.25
49.00
15.214.417.841.962.25
x =
153.8
y =
18.7
xy =
682.77
x2 =
5859.26
y2 =
80.67
-
8/3/2019 Stats Chap10
20/63
Example 10-4: Car Rental Companies
Compute the correlation coefficient for the data inExample 101.
Bluman, Chapter 10 20
x = 153.8, y = 18.7, xy = 682.77, x2 = 5859.26,
y2 = 80.67, n = 6
2 22 2
! - -
n xy x yr
n x x n y y
2 2
6 682.77 153.8 18.7
6 5859.26 153.8 6 80.67 18.7
!
- -
r
0.982 (strong positive relationship)!r
-
8/3/2019 Stats Chap10
21/63
Hypothesis Testing
In hypothesis testing, one of the following istrue:
H0: V ! 0 This null hypothesis means that
there is no correlation between thexand yvariables in the population.
H1: V { 0 This alternative hypothesis meansthat there is a significant correlationbetween the variables in thepopulation.
Bluman, Chapter 10 21
-
8/3/2019 Stats Chap10
22/63
tTest for the Correlation Coefficient
Bluman, Chapter 10 22
2
2
1
with degrees of freedom equal to 2.
!
nt r
r
n
-
8/3/2019 Stats Chap10
23/63
Chapter 10Correlation and Regression
Section 10-1
Example 10-3
Page #544
Bluman, Chapter 10 23
-
8/3/2019 Stats Chap10
24/63
Example 10-3: Car Rental Companies
Test the significance of the correlation coefficient found inExample 102. Use = 0.05 and r= 0.982.
Step 1: State the hypotheses.
H0: = 0 and H1: { 0
Step 2: Find the critical value.
Since = 0.05 and there are 6 2 = 4 degrees of
freedom, the critical values obtained from Table Fare 2.776.
Bluman, Chapter 10 24
-
8/3/2019 Stats Chap10
25/63
Step 3: Compute the test value.
Step 4:
Make the de
cision
.
Reject the null hypothesis.
Step 5: Summarize the results.
There is a significant relationship between thenumber of cars a rental agency owns and itsannual income.
Example 10-3: Car Rental Companies
Bluman, Chapter 10 25
2
2
1
!
nt r
r 2
6 20.982 10.4
1 0.982
! !
-
8/3/2019 Stats Chap10
26/63
Chapter 10Correlation and Regression
Section 10-1
Example 10-4
Page #545
Bluman, Chapter 10 26
-
8/3/2019 Stats Chap10
27/63
Example 10-4: Car Rental Companies
Using Table I, test the significance of the correlationcoefficient r = 0.067, from Example 103, at = 0.01.
Step 1: State the hypotheses.
H0: = 0 and H1: { 0
There are 9 2 = 7 degrees of freedom. The value inTable I when = 0.01 is 0.798.
For a significant relationship, rmust be greater than 0.798or less than -0.798. Since r= 0.067, do not reject the null.
Hence, there is not enough evidence to say that there is asignificant linear relationship between the variables.
Bluman, Chapter 10 27
-
8/3/2019 Stats Chap10
28/63
Possible Relationships Between
VariablesWhen the null hypothesis has been rejected for a specifica value, any of the following five possibilities can exist.
1. There is a direct cause-and-effectrelationship
between the variables. That is,xcauses y.2. There is a reverse cause-and-effectrelationship
between the variables. That is, ycausesx.
3. The relationship between the variables may be
caused by a third variable.4. There may be a complexity of interrelationships
among many variables.
5. The relationship may be coincidental.
Bluman, Chapter 10 28
-
8/3/2019 Stats Chap10
29/63
Possible Relationships Between
Variables1. There is a reverse cause-and-effectrelationship
between the variables. That is, ycausesx.
For example,
water causes plants to grow
poison causes death
heat causes ice to melt
Bluman, Chapter 10 29
-
8/3/2019 Stats Chap10
30/63
Possible Relationships Between
Variables2. There is a reverse cause-and-effectrelationship
between the variables. That is, ycausesx.
For example,
Suppose a researcher believes excessive coffeeconsumption causes nervousness, but theresearcher fails to consider that the reverse
situation may occur. That is, it may be that anextremely nervous person craves coffee to calmhis or her nerves.
Bluman, Chapter 10 30
-
8/3/2019 Stats Chap10
31/63
Possible Relationships Between
Variables3. The relationship between the variables may be
caused by a third variable.
For example,
If a statistician correlated the number of deathsdue to drowning and the number of cans of softdrink consumed daily during the summer, he or
she would probably find a significant relationship.However, the soft drink is not necessarilyresponsible for the deaths, since both variablesmay be related to heat and humidity.
Bluman, Chapter 10 31
-
8/3/2019 Stats Chap10
32/63
Possible Relationships Between
Variables4. There may be a complexity of interrelationships
among many variables.
For example,
A researcher may find a significant relationshipbetween students high school grades and collegegrades. But there probably are many other
variables involved, such as IQ, hours of study,influence of parents, motivation, age, andinstructors.
Bluman, Chapter 10 32
-
8/3/2019 Stats Chap10
33/63
Possible Relationships Between
Variables5. The relationship may be coincidental.
For example,
A researcher may be able to find a significantrelationship between the increase in the number ofpeople who are exercising and the increase in thenumber of people who are committing crimes. But
common sense dictates that any relationshipbetween these two values must be due tocoincidence.
Bluman, Chapter 10 33
-
8/3/2019 Stats Chap10
34/63
10.2 Regression
If the value of the correlation coefficient issignificant, the next step is to determinethe equation of the regression lineregression line
which is the datas line of best fit.
Bluman, Chapter 10 34
-
8/3/2019 Stats Chap10
35/63
Regression
Bluman, Chapter 10 35
Best fitBest fit means that the sum of thesquares of the vertical distance fromeach point to the line is at a minimum.
-
8/3/2019 Stats Chap10
36/63
Regression Line
Bluman, Chapter 10 36
d! y a bx
2
22
22
where
= intercept
= the slope of the line.
!
!
d
y x x xya
n x x
n xy x yb
n x x
a y
b
-
8/3/2019 Stats Chap10
37/63
Chapter 10Correlation and Regression
Section 10-2
Example 10-5
Page #553
Bluman, Chapter 10 37
-
8/3/2019 Stats Chap10
38/63
Example 10-5: Car Rental Companies
Find the equation of the regression line for the data inExample 102, and graph the line on the scatter plot.
Bluman, Chapter 10 38
x = 153.8, y = 18.7, xy = 682.77, x2 = 5859.26,
y2 = 80.67, n = 6
2
22
!
y x x xya
n x x
22
!
n xy x yb
n x x
2
18.7 5859.26 153.8 682.77
6 5859.26 153.8
!
0.396!
2
6 682.77 153.8 18.7
6 5859.26 153.8
!
0.106!
0.396 0.106d d! p ! y a bx y x
-
8/3/2019 Stats Chap10
39/63
Find two points to sketch the graph of the regression line.
Use anyxvalues between 10 and 60. For example, letxequal 15 and 40. Substitute in the equation and find thecorresponding yvalue.
Plot (15,1.986) and (40,4.636), and sketch the resultingline.
Example 10-5: Car Rental Companies
Bluman, Chapter 10 39
0.396 0.106
0.396 0.106 15
1.986
d!
!
!
y x
0.396 0.106
0.396 0.106 40
4.636
d!
!
!
y x
-
8/3/2019 Stats Chap10
40/63
Example 10-5: Car Rental Companies
Find the equation of the regression line for the data inExample 102, and graph the line on the scatter plot.
Bluman, Chapter 10 40
0.396 0.106d! y x
15, 1.986
40, 4.636
-
8/3/2019 Stats Chap10
41/63
Chapter 10Correlation and Regression
Section 10-2
Example 10-6
Page #555
Bluman, Chapter 10 41
-
8/3/2019 Stats Chap10
42/63
Use the equation of the regression line to predict theincome of a car rental agency that has 200,000automobiles.
x = 20 corresponds to 200,000 automobiles.
Hence, when a rental agency has 200,000 automobiles, itsrevenue will be approximately $2.516 billion.
Example 10-11: Car Rental Companies
Bluman, Chapter 10 42
0.396 0.106
0.396 0.106 20
2.516
d!
!
!
y x
-
8/3/2019 Stats Chap10
43/63
Regression
The magnitude of the change in one variablewhen the other variable changes exactly 1 unitis called a marginal changemarginal change. The value ofslope b of the regression line equationrepresents the marginal change.
For valid predictions, the value of thecorrelation coefficient must be significant.
When ris not significantly different from 0, thebest predictor ofyis the mean of the datavalues ofy.
Bluman, Chapter 10 43
-
8/3/2019 Stats Chap10
44/63
Assumptions for Valid Predictions
1. For any specific value of the independentvariablex, the value of the dependent variableymust be normally distributed about theregression line. See Figure 1016(a).
2. The standard deviation of each of thedependent variables must be the same foreach value of the independent variable. See
Figure 1016(b).
Bluman, Chapter 10 44
-
8/3/2019 Stats Chap10
45/63
Extrapolations (Future Predictions)
ExtrapolationExtrapolation, or making predictions beyondthe bounds of the data, must be interpretedcautiously.
Remember that when predictions are made,they are based on present conditions or on thepremise that present trends will continue. Thisassumption may or may not prove true in the
future.
Bluman, Chapter 10 45
-
8/3/2019 Stats Chap10
46/63
Procedure Table
Step 1: Make a table with subject,x, y,xy,x2,and y2 columns.
Step 2: Find the values ofxy,x2, and y2. Place
them in the appropriate columns and sumeach column.
Step 3: Substitute in the formula to find the valueofr.
Step 4: When ris significant, substitute in theformulas to find the values ofa and b forthe regression line equation yd = a +bx.
Bluman, Chapter 10 46
-
8/3/2019 Stats Chap10
47/63
10.3 Coefficient of Determination
and Standard Error of the Estimate The total variationtotal variation is the
sum of the squares of the vertical
distances each point is from the mean. The total variation can be divided into two
parts: that which is attributed to the
relationship ofxand y, and that which isdue to chance.
Bluman, Chapter 10 47
2
y y
-
8/3/2019 Stats Chap10
48/63
Variation
The variation obtained from therelationship (i.e., from the predicted y'values) is and is called the
ex
plained variationex
plained variation. Variation due to chance, found by
, is called the unexplainedunexplainedvariationvariation. This variation cannot beattributed to the relationships.
Bluman, Chapter 10 48
2
d y y
2
d y y
-
8/3/2019 Stats Chap10
49/63
Variation
Bluman, Chapter 10 49
-
8/3/2019 Stats Chap10
50/63
Coefficient of Determiation
The coefficient of determinationcoefficient of determination is theratio of the explained variation to the totalvariation.
The symbol for the coefficient ofdetermination is r2.
Another way to arrive at the value forr2
is to square the correlation coefficient.
Bluman, Chapter 10 50
2 explained variation
total variation
!r
-
8/3/2019 Stats Chap10
51/63
Coefficient of Nondetermiation
The coefficient ofcoefficient of nondeterminationnondetermination isa measure of the unexplained variation.
The formula for the coefficient of
determination is 1.00 r2
.
Bluman, Chapter 10 51
-
8/3/2019 Stats Chap10
52/63
Standard Error of the Estimate
The standard error of estimatestandard error of estimate,denoted by sest is the standard deviationof the observed yvalues about the
predicted y'values. The formula for thestandard error of estimate is:
Bluman, Chapter 10 52
2
2
d
!
est
y ys
n
-
8/3/2019 Stats Chap10
53/63
Chapter 10Correlation and Regression
Section 10-3
Example 10-7
Page #569
Bluman, Chapter 10 53
-
8/3/2019 Stats Chap10
54/63
A researcher collects the following data and determinesthat there is a significant relationship between the age of acopy machine and its monthly maintenance cost. Theregression equation is yd = 55.57 + 8.13x. Find thestandard error of the estimate.
Example 10-7: Copy Machine Costs
Bluman, Chapter 10 54
-
8/3/2019 Stats Chap10
55/63
Example 10-7: Copy Machine Costs
Bluman, Chapter 10 55
MachineAgex(years)
Monthlycost, y yd y yd (y yd)2
ABCD
EF
1234
46
62787090
93103
63.7071.8379.9688.09
88.09104.35
-1.706.17
-9.961.91
4.91-1.35
2.8938.068999.20163.6481
24.10811.8225
169.7392
55.57 8.13
55.57 8.13 1 63.70
55.57 8.13 2 71.8355.57 8.13 3 79.96
55.57 8.13 4 88.09
55.57 8.13 6 104.35
d!
d! !
d
! !d! !
d! !
d! !
y x
y
yy
y
y
2
2
d!
est
y ys
n
169.73926.51
4! !
ests
-
8/3/2019 Stats Chap10
56/63
Chapter 10Correlation and Regression
Section 10-3
Example 10-8
Page #570
Bluman, Chapter 10 56
-
8/3/2019 Stats Chap10
57/63
Example 10-8: Copy Machine Costs
Bluman, Chapter 10 57
2
2
!
est
y a y b xys
n
-
8/3/2019 Stats Chap10
58/63
Example 10-8: Copy Machine Costs
Bluman, Chapter 10 58
496 1778
MachineAgex(years)
Monthlycost, y xy y 2
ABCD
EF
1234
46
62787090
93103
62156210360
372618
3,8446,0844,9008,100
8,64910,609
2
2
!
est
y a y b xys
n
42,186
42,186 55.57 496 8.13 17786.48
4
! !
ests
-
8/3/2019 Stats Chap10
59/63
Formula for the Prediction Interval
about a Value yd
Bluman, Chapter 10 59
2
2 22
2
2 22
11
11
with d.f. = - 2
d
d
est
est
n x Xy t s y
n n x x
n x Xy t s
n n x x
n
E
E
-
8/3/2019 Stats Chap10
60/63
Chapter 10Correlation and Regression
Section 10-3
Example 10-9Page #571
Bluman, Chapter 10 60
-
8/3/2019 Stats Chap10
61/63
For the data in Example 107, find the 95% predictioninterval for the monthly maintenance cost of a machinethat is 3 years old.
Step 1: Find
Step 2: Find yd forx = 3.
Step 3: Find sest.
(as shown in Example 10-13)
Example 10-9: Copy Machine Costs
Bluman, Chapter 10 61
2, , and . x x X 2 2020 82 3.3
6! ! ! ! x x X
55.57 8.13 3 79.96d! !y
6.48!est
s
-
8/3/2019 Stats Chap10
62/63
Step 4: Substitute in the formula and solve.
Example 10-9: Copy Machine Costs
Bluman, Chapter 10 62
2
2 22
2
2 22
11
11
d
d
est
est
n x Xy t s y
n n x x
n x Xy t s
n n x x
E
E
2
2
2
2
6 3 3.3179.96 2.776 6.48 1
66 82 20
6 3 3.3179.96 2.776 6.48 1
6 6 82 20
y
-
8/3/2019 Stats Chap10
63/63
Step 4: Substitute in the formula and solve.
Example 10-9: Copy Machine Costs
2
2
2
2
6 3 3.3179.96 2.776 6.48 1
6 6 82 20
6 3 3.31
79.96 2.776 6.48 1 6 6 82 20
y
79.96 19.43 79.96 19.43
60.53 99.39
y
y
Hence, you can be 95% confident that the interval60.53 < y < 99.39 contains the actual value ofy.