Chapter 5 : Introductory Linear Regression - UniMAP...
Chapter 5: Introductory Linear Regression
INTRODUCTION TO LINEAR REGRESSION
• Regression is a statistical procedure for establishing the relationship between two or more variables.
• This is done by fitting a linear equation to the observed data.
• The researcher uses the regression line to see the trend and to predict values of the data.
• There are two types of relationship:
  – Simple (2 variables)
  – Multiple (more than 2 variables)
Many problems in science and engineering involve exploring the relationship between two or more variables. Two statistical techniques are used:
(1) Regression analysis
(2) Computing the correlation coefficient (r)
Linear regression studies the linear relationship between two or more variables. This is done by formulating a linear equation for the observed data; the linear equation is then used to predict values of the data.
In simple linear regression, only two variables are involved:
i. X is the independent variable.
ii. Y is the dependent variable.
The correlation coefficient (r) tells us how strongly two variables are linearly related.
Example 5.1:
1) A nutritionist studying weight-loss programs might want to find out whether reducing carbohydrate intake can help a person lose weight.
a) X is the carbohydrate intake (independent variable).
b) Y is the weight (dependent variable).
2) An entrepreneur might want to know whether increasing the cost of packaging his new product will affect its sales volume.
a) X is the packaging cost (independent variable).
b) Y is the sales volume (dependent variable).
SCATTER DIAGRAM
• A scatter plot is a graph of ordered pairs (x, y).
• The purpose of a scatter plot is to show visually the nature of the relationship between the independent variable X and the dependent variable Y.
• The independent variable x is plotted on the horizontal axis and the dependent variable y on the vertical axis.
[Figure: three scatter diagrams of E(y) against x, each with a regression line of intercept $\beta_0$: Positive Linear Relationship (slope $\beta_1$ is positive); Negative Linear Relationship (slope $\beta_1$ is negative); No Relationship (slope $\beta_1$ is 0).]
GRAPHICAL METHOD FOR DETERMINING REGRESSION
• A linear regression line can be developed from a freehand plot of the data.
Example 10.2:
The table below contains values of two variables, X and Y. Plot the data and draw a freehand estimate of the regression line.
X   -3   -2   -1    0    1    2    3
Y    1    2    3    5    8   11   12
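A freehand line is a judgment call, so two people may draw different lines. One way to compare candidate lines, sketched below as an illustration, is to add up the squared vertical distances from the points to each line: the smaller the total, the better the fit, and the least squares method introduced later in this chapter picks the line that makes this total as small as possible. The helper `sse` and the two candidate lines are hypothetical, not part of the example.

```python
# Example 10.2 data from the table above
X = [-3, -2, -1, 0, 1, 2, 3]
Y = [1, 2, 3, 5, 8, 11, 12]


def sse(b0, b1, xs, ys):
    """Sum of squared vertical deviations of the points from the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Two hypothetical freehand lines, given as (intercept, slope) read off a sketch
candidates = {"line A": (5.0, 2.0), "line B": (6.0, 2.0)}
for name, (b0, b1) in candidates.items():
    print(name, sse(b0, b1, X, Y))
# line A 11.0
# line B 4.0
```

For these data, line B ($\hat{Y} = 6 + 2X$) happens to be the least squares line, so no other straight line gives a smaller total.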
5.1 SIMPLE LINEAR REGRESSION MODEL
A linear regression model expresses the linear relationship between two variables.
The simple linear regression model is written as:
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
where
$\hat{\beta}_0$ = intercept of the line with the Y-axis
$\hat{\beta}_1$ = slope of the line
5.2 INFERENCES ABOUT ESTIMATED PARAMETERS
The least squares method is the method most commonly used for estimating the regression coefficients $\beta_0$ and $\beta_1$.
The straight line fitted to the data set is the line
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
where $\hat{Y}$ is the estimated value of Y for a given value of X.
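LEAST SQUARES METHOD
i) y-intercept for the estimated regression equation:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
where $\bar{x}$ and $\bar{y}$ are the means of x and y, respectively.
ii) Slope for the estimated regression equation:
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$
where
$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$
$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$
$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$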
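The formulas above translate directly into code. The sketch below is a minimal plain-Python illustration (the function name `least_squares` is our own); it uses the same shortcut forms of $S_{xy}$ and $S_{xx}$ as the text.

```python
def least_squares(xs, ys):
    """Estimate (b0_hat, b1_hat) using the shortcut formulas for Sxy and Sxx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    s_xy = sum_xy - sum_x * sum_y / n
    s_xx = sum_x2 - sum_x ** 2 / n
    b1_hat = s_xy / s_xx
    b0_hat = sum_y / n - b1_hat * sum_x / n  # y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Usage on the Example 10.2 data
b0_hat, b1_hat = least_squares([-3, -2, -1, 0, 1, 2, 3], [1, 2, 3, 5, 8, 11, 12])
print(round(b0_hat, 4), round(b1_hat, 4))  # 6.0 2.0
```

Applied to the Example 10.2 data, it returns an intercept of 6 and a slope of 2, i.e. $\hat{Y} = 6 + 2X$.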
EXAMPLE 5.2: STUDENTS' SCORES IN MATHEMATICS
The data below represent the scores obtained by ten students in Mathematics 1 and Mathematics 2.
Math 1, x   65   63   76   46   68   72   68   57   36   96
Math 2, y   68   66   86   48   65   66   71   57   42   87
a) Develop an estimated linear regression model with “Math 1” as the independent variable and “Math 2” as the dependent variable.
b) Predict the score a student would obtain in “Math 2” if he scored 60 marks in “Math 1”.
        x     y     x²      y²      xy
        65    68    4225    4624    4420
        63    66    3969    4356    4158
        76    86    5776    7396    6536
        46    48    2116    2304    2208
        68    65    4624    4225    4420
        72    66    5184    4356    4752
        68    71    4624    5041    4828
        57    57    3249    3249    3249
        36    42    1296    1764    1512
        96    87    9216    7569    8352
TOTAL   647   656   44279   44884   44435
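The column totals in the table can be checked with a few lines of Python. This sketch simply recomputes each column from the raw scores.

```python
math1 = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]  # x
math2 = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]  # y

# Column totals of the worked table: sum x, sum y, sum x^2, sum y^2, sum xy
totals = (
    sum(math1),
    sum(math2),
    sum(x * x for x in math1),
    sum(y * y for y in math2),
    sum(x * y for x, y in zip(math1, math2)),
)
print(totals)  # (647, 656, 44279, 44884, 44435)
```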
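Solution
$n = 10$
$\sum x = 647$, $\bar{x} = 64.7$, $\sum x^2 = 44279$
$\sum y = 656$, $\bar{y} = 65.6$, $\sum y^2 = 44884$
$\sum xy = 44435$
a) The estimated model is $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$, where $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$.
$S_{xy} = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} = 44435 - \frac{(647)(656)}{10} = 1991.8$
$S_{xx} = \sum x^2 - \frac{\left(\sum x\right)^2}{n} = 44279 - \frac{647^2}{10} = 2418.1$
$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{1991.8}{2418.1} = 0.8237$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 65.6 - (0.8237)(64.7) = 12.3063$ (carrying the unrounded slope $1991.8/2418.1$ through the calculation)
$\hat{Y} = 12.3063 + 0.8237X$
b) When $X = 60$:
$\hat{Y} = 12.3063 + 0.8237(60) = 61.7283 \approx 61.73$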
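The hand calculation can be reproduced from the column totals. In this sketch the variable names and the `predict` helper are illustrative; the intermediate values are kept unrounded, which is how the intercept 12.3063 is obtained (using the rounded slope 0.8237 instead would give 12.3066).

```python
n = 10
sum_x, sum_y = 647, 656
sum_x2, sum_xy = 44279, 44435

s_xy = sum_xy - sum_x * sum_y / n         # 1991.8
s_xx = sum_x2 - sum_x ** 2 / n            # 2418.1
b1_hat = s_xy / s_xx                      # about 0.8237
b0_hat = sum_y / n - b1_hat * sum_x / n   # y_bar - b1_hat * x_bar, about 12.3063


def predict(x):
    """Estimated Math 2 score for a Math 1 score x."""
    return b0_hat + b1_hat * x


print(f"Y-hat = {b0_hat:.4f} + {b1_hat:.4f} X")  # Y-hat = 12.3063 + 0.8237 X
print(f"Prediction at X = 60: {predict(60):.2f}")  # Prediction at X = 60: 61.73
```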
5.3 ADEQUACY OF THE MODEL: COEFFICIENT OF DETERMINATION (R²)
• The coefficient of determination measures the proportion of the variation in the dependent variable (Y) that is explained by the regression line and the independent variable (X).
• The symbol for the coefficient of determination is r² or R².
Example:
If r = 0.90, then r² = 0.81. This means that 81% of the variation in the dependent variable (Y) is accounted for by the variation in the independent variable (X).
• The rest of the variation, 0.19 or 19%, is unexplained and is called the coefficient of nondetermination.
• The formula for the coefficient of nondetermination is 1 − r².
The coefficient of determination is:
$r^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}}$
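As a quick arithmetic check, the sketch below (the function name is illustrative) reproduces the r = 0.90 illustration and applies the formula to the Example 5.2 quantities $S_{xy} = 1991.8$, $S_{xx} = 2418.1$ and $S_{yy} = 1850.4$ (the last is computed in Example 5.4).

```python
def coefficient_of_determination(s_xy, s_xx, s_yy):
    """r^2 = Sxy^2 / (Sxx * Syy): proportion of the variation in Y explained by the line."""
    return s_xy ** 2 / (s_xx * s_yy)


# The illustration in the text: r = 0.90
r = 0.90
r2 = r ** 2
print(f"determination {r2:.2f}, non-determination {1 - r2:.2f}")  # 0.81, 0.19

# Example 5.2 scores
r2_scores = coefficient_of_determination(1991.8, 2418.1, 1850.4)
print(f"{r2_scores:.4f}")  # 0.8866
```

So for the Math 1 and Math 2 scores, about 88.7% of the variation in Math 2 scores is accounted for by the Math 1 scores.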
5.4 PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT (r)
Correlation measures the strength of a linear relationship between two variables. It is also known as Pearson's product moment coefficient of correlation.
The symbol for the sample coefficient of correlation is r.
Formula:
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Properties of r:
• $-1 \le r \le 1$
• Values of r close to 1 imply a strong positive linear relationship between x and y.
• Values of r close to −1 imply a strong negative linear relationship between x and y.
• Values of r close to 0 imply little or no linear relationship between x and y.
EXAMPLE 5.4: REFER TO PREVIOUS EXAMPLE 5.2.
Calculate the value of r and interpret its meaning.
SOLUTION:
$S_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n} = 44884 - \frac{656^2}{10} = 1850.4$
$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{1991.8}{\sqrt{(2418.1)(1850.4)}} = 0.9416$
Thus, there is a strong positive linear relationship between the scores obtained in Math 1 (x) and Math 2 (y).
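The Example 5.4 arithmetic can be checked the same way. This sketch takes $S_{xy}$ and $S_{xx}$ from Example 5.2 and computes $S_{yy}$ from the column totals.

```python
import math

n = 10
sum_y, sum_y2 = 656, 44884
s_xy, s_xx = 1991.8, 2418.1  # from Example 5.2

s_yy = sum_y2 - sum_y ** 2 / n  # 1850.4
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"Syy = {s_yy:.1f}, r = {r:.4f}")  # Syy = 1850.4, r = 0.9416
```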
5.5 TEST FOR LINEARITY OF REGRESSION
• To test whether a linear relationship exists between two variables x and y, we carry out a hypothesis test.
• Tests commonly used: the t-test and the F-test.
t-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. Determine the critical value at level of significance α: $t_{\alpha/2,\,n-2}$.
3. Compute the test statistic.
$t_{test} = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}}$, with $\beta_1 = 0$ under $H_0$, where $\mathrm{Var}(\hat{\beta}_1) = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2} \cdot \frac{1}{S_{xx}}$
4. Determine the rejection rule.
Reject $H_0$ if $t_{test} > t_{\alpha/2,\,n-2}$ or $t_{test} < -t_{\alpha/2,\,n-2}$.
5. Conclusion.
If $H_0$ is rejected, there is a significant relationship between variables X and Y.
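EXAMPLE 5.5: REFER TO PREVIOUS EXAMPLE 5.2.
Test whether the students' scores in Math 1 and Math 2 are linearly related. Use α = 0.05.
SOLUTION:
1) $H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2) $t_{\alpha/2,\,n-2} = t_{0.025,\,8} = 2.306$
3) $\mathrm{Var}(\hat{\beta}_1) = \frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2} \cdot \frac{1}{S_{xx}} = \frac{1850.4 - (0.8237)(1991.8)}{8} \cdot \frac{1}{2418.1} = 0.0108$
$t_{test} = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat{\beta}_1)}} = \frac{0.8237 - 0}{\sqrt{0.0108}} = 7.926$ (using the rounded variance 0.0108)
4) Rejection rule: $t_{test} = 7.926 > t_{0.025,\,8} = 2.306$
5) Conclusion: Thus, we reject $H_0$. The score in Math 1 (x) has a linear relationship with the score in Math 2 (y).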
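The t-test steps can be sketched in code as follows, using the Example 5.2 and 5.4 quantities and the table value $t_{0.025,8} = 2.306$ quoted above (the sketch does not look up critical values itself). Because it keeps $\mathrm{Var}(\hat{\beta}_1)$ unrounded (about 0.01084), the statistic comes out at about 7.91 rather than 7.926, which uses the rounded 0.0108; the decision to reject $H_0$ is the same.

```python
import math

n = 10
s_xy, s_xx, s_yy = 1991.8, 2418.1, 1850.4  # from Examples 5.2 and 5.4
t_critical = 2.306                          # t_{0.025, 8} read from a t table

b1_hat = s_xy / s_xx
# Var(b1_hat) = (Syy - b1_hat * Sxy) / (n - 2) * 1 / Sxx
var_b1 = (s_yy - b1_hat * s_xy) / ((n - 2) * s_xx)
t_test = (b1_hat - 0) / math.sqrt(var_b1)  # beta_1 = 0 under H0

# Two-sided rejection rule
reject_h0 = t_test > t_critical or t_test < -t_critical
print(f"Var(b1_hat) = {var_b1:.4f}, t = {t_test:.2f}, reject H0: {reject_h0}")
# Var(b1_hat) = 0.0108, t = 7.91, reject H0: True
```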
F-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no relationship)
$H_1: \beta_1 \neq 0$ (there is a relationship)
2. Determine the critical value at level of significance α: $F_{\alpha,\,1,\,n-2}$.
3. Compute the test statistic.
$F_{test} = \frac{MSR}{MSE}$
4. Determine the rejection region.
Reject $H_0$ if $F_{test} > F_{\alpha,\,1,\,n-2}$, or if p-value < α.
5. Conclusion.
If $H_0$ is rejected, there is a significant relationship between variables X and Y.
ANOVA Test
1) State the hypotheses.
2) Select the distribution to use: F-distribution.
3) Calculate the value of the test statistic: F.
4) Determine the rejection and non-rejection regions.
5) Make a decision: reject $H_0$ or fail to reject $H_0$.
General form of the ANOVA table:
Source of    Degrees of      Sum of    Mean                 Value of the
Variation    Freedom (df)    Squares   Squares              Test Statistic
Regression   1               SSR       MSR = SSR/1          F = MSR/MSE
Error        n − 2           SSE       MSE = SSE/(n − 2)
Total        n − 1           SST
where $SSR = \hat{\beta}_1 S_{xy}$, $SST = S_{yy}$ and $SSE = SST - SSR$.
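To show how the ANOVA table is filled in, the sketch below builds it for the Example 5.2 data from $S_{xy}$, $S_{xx}$ and $S_{yy}$ (the layout and variable names are our own). For simple linear regression the resulting F equals the square of the t statistic from the t-test, so the two tests lead to the same decision.

```python
n = 10
s_xy, s_xx, s_yy = 1991.8, 2418.1, 1850.4  # from Examples 5.2 and 5.4

b1_hat = s_xy / s_xx
ssr = b1_hat * s_xy  # regression sum of squares
sst = s_yy           # total sum of squares
sse = sst - ssr      # error sum of squares
msr = ssr / 1
mse = sse / (n - 2)
f_test = msr / mse

rows = [
    ("Regression", 1, ssr, msr),
    ("Error", n - 2, sse, mse),
    ("Total", n - 1, sst, None),
]
print(f"{'Source':<12}{'df':>4}{'SS':>12}{'MS':>12}")
for source, df, ss, ms in rows:
    ms_text = f"{ms:12.2f}" if ms is not None else ""
    print(f"{source:<12}{df:>4}{ss:12.2f}{ms_text}")
print(f"F = MSR/MSE = {f_test:.2f}")  # F = MSR/MSE = 62.58
```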
Example: The manufacturer of Cardio Glide exercise equipment wants to study the relationship between the number of months since the glide was purchased and the length of time the equipment was used last week.
1) Determine the regression equation.
2) At α = 0.01, test whether there is a linear relationship between the variables.
Solution (1):
Regression equation: $\hat{Y} = 9.939 - 0.637X$
Solution (2):
1) Hypotheses: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
2) F-distribution table: $F_{0.01,\,1,\,8} = 11.2586$
3) Test statistic: $F = MSR/MSE = 17.303$, or, using the p-value approach, significance value (p-value) = 0.003.
4) Rejection region: since the F statistic exceeds the table value (17.303 > 11.2586), we reject $H_0$; equivalently, since the p-value is below α (0.003 < 0.01), we reject $H_0$.
5) Thus, there is a linear relationship between the number of months since purchase and the length of time the equipment was used.
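Both rejection rules used in the solution can be written as one small function. This is a sketch (the name `f_test_decision` is hypothetical) that takes the critical value and the p-value as inputs rather than computing them.

```python
def f_test_decision(f_stat, f_critical, p_value, alpha):
    """Apply both F-test rejection rules: F > F_critical, and p-value < alpha."""
    by_critical_value = f_stat > f_critical
    by_p_value = p_value < alpha
    return by_critical_value, by_p_value


# Cardio Glide values reported in the solution
print(f_test_decision(17.303, 11.2586, 0.003, 0.01))  # (True, True)
```

If SciPy is installed, `scipy.stats.f.ppf(1 - alpha, 1, n - 2)` returns the critical value and `scipy.stats.f.sf(F, 1, n - 2)` the p-value, instead of reading them from a table.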
EXERCISE 5.1:
The owner of a small factory that produces work gloves is concerned about the high cost of air conditioning in the summer, but keeping the factory at a higher temperature may lower productivity. During the summer, he conducted an experiment with temperature settings from 68 to 81 degrees Fahrenheit and measured each day's productivity, which produced the following table:
Temperature (°F)                          72   71   78   75   81   77   68   76
Number of pairs of gloves (in hundreds)   37   37   32   36   33   35   39   34
(a) Find the regression model.
(b) Predict the number of pairs of gloves produced if x = 74.
(c) Compute the Pearson correlation coefficient. What can you say about the relationship between the two variables?
(d) Can you conclude that the temperature is linearly related to the number of pairs of gloves produced? Use α = 0.05.
EXERCISE 5.2:
An agricultural scientist planted alfalfa on several plots of land, identical except for the soil pH. The following are the dry matter yields (in pounds per acre) for each plot.
pH    Yield (lb/acre)
4.6 1056
4.8 1833
5.2 1629
5.4 1852
5.6 1783
5.8 2647
6.0 2131
a) Compute the estimated regression line for predicting yield
from pH.
b) If the pH is increased by 0.1, by how much would you
predict the yield to increase or decrease?
c) For what pH would you predict a yield of 1500 pounds per
acre?
d) Calculate the correlation coefficient and interpret the result.