Unit 9 Regression SLM
Course: Statistics
Unit 9
Regression Analysis
Table of Contents
9.1. Learning Objectives
9.2. Introduction
9.3. Regression Analysis
9.4. Regression Lines
9.5. Regression Coefficient
9.6. Differences between Correlation Coefficient and Regression Coefficient
9.7. Examples
9.8. Standard Error of Estimate
9.9. Application in Finance
9.9.1. Correlation between Two Variables
9.9.2. Beta (β) of a Stock/Share
9.10. Non-Linear Regression
9.11. Logistic Regression
9.12. Summary
9.1. Learning Objectives
By the end of this unit, you should be able to:
Recognise the need for regression analysis
Apply the regression equations to calculate the correlation coefficient
Calculate the regression equations for a correlation study
Calculate the standard error of the estimate
9.2. Introduction
The word "regress" means to go back; in statistics, regression refers to the tendency of data to return towards the normal (average) value.
Correlation analysis attempts to study the relationship between the two variables x and y. Regression analysis attempts to predict the average value of one variable (say y) for a given value of the other (x). In regression, we attempt to quantify the dependence of one variable on the other.
A Case
Mr. Ajit is the G.M. of a tyre manufacturing company. He is very happy that the sales of tyres are increasing. He was of the opinion that the increase in sales is due to the sales force. However, his secretary, Ms. Anitha, pointed out that the performance record sent by the Marketing Manager does not show any change. Mr. Ajit was very curious. While talking to his friend's son, Mr. Suresh, who holds a position in the Motor Vehicle Registration office, he learnt that vehicle registrations are increasing. Mr. Ajit immediately thought of his statistician, Mr. Satish, and consulted him. Mr. Satish promised to come back with a solution to the problem.
(Cont. in topic ‘Differences between Correlation Coefficient and Regression Coefficient’)
In regression, one of the variables is dependent and the other(s) independent. Suppose there are two variables x and y, and y depends on x. The dependence is expressed in the form of the following equation:
$Y = a + bx$
Regression is defined as, “the measure of the average relationship between two or more
variables in terms of the original units of the data.”
9.3. Regression Analysis
Regression Analysis is used to:
Estimate the values of the dependent variables from the values of the independent variables
Get a measure of the error involved while using the regression line as a basis for estimation
9.4. Regression Lines
For a set of paired observations there exist two regression lines: the regression line of y on x and the regression line of x on y.
The smaller the angle between these lines, the higher the correlation between the variables. If we fit a straight line to scatter diagram data, some of the points will lie above the line and some below it. The deviation of each point from the line is called the error.
Regression equations obtained by minimising the sum of the squares of these errors are said to be fitted by the method of least squares. $b_{yx}$ and $b_{xy}$ are called the regression coefficients.
The regression model captures the systematic behaviour of the data. The non-systematic behaviour cannot be captured and is known as error. The errors are due to random components that cannot be predicted. Assuming that the random errors are normally distributed, we can construct confidence intervals for them at a chosen confidence level.
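The two regression lines always intersect at the point $(\bar{x}, \bar{y})$. The regression lines have the following equations.
The regression equation of y on x (simple linear regression model) is given by
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
The regression equation of x on y is given by
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
where, with $dx = X - A_x$ and $dy = Y - A_y$ the deviations from assumed means $A_x$ and $A_y$, and N the number of pairs,
$b_{yx} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dx^2 - \left(\sum dx\right)^2}$ and $b_{xy} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dy^2 - \left(\sum dy\right)^2}$
The line drawn such that the sum of the vertical deviations is zero and the sum of their squares is a minimum is called the regression line of y on x. It is used to estimate y-values for given x-values.
The line drawn such that the sum of the horizontal deviations is zero and the sum of their squares is a minimum is called the regression line of x on y. It is used to estimate x-values for given y-values.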
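The shortcut formulas above can be checked with a short Python sketch. It assumes the paired data are plain lists; the function name regression_coefficients is illustrative only. Using the ages of Example 9.1 with assumed means 22 and 19, it returns 430/825 and 430/240, and it also shows that a different choice of assumed means gives the same coefficients.

```python
def regression_coefficients(xs, ys, assumed_x, assumed_y):
    """Return (b_yx, b_xy) from deviations about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(u * v for u, v in zip(dx, dy))
    sum_dx2 = sum(u * u for u in dx)
    sum_dy2 = sum(v * v for v in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy   # shared by both coefficients
    b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)
    b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)
    return b_yx, b_xy


# Example 9.1: ages of husbands (x) and wives (y)
husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]

b_yx, b_xy = regression_coefficients(husband, wife, 22, 19)
print(round(b_yx, 3), round(b_xy, 3))   # 0.521 1.792

# Any assumed means give the same coefficients (here A_x = A_y = 0)
b_yx0, b_xy0 = regression_coefficients(husband, wife, 0, 0)
print(b_yx0 == b_yx, b_xy0 == b_xy)     # True True
```

Assumed means only simplify hand arithmetic; they cancel out of both formulas.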
9.5. Regression Coefficient
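The regression coefficients can be used to calculate the correlation coefficient: the product of the two regression coefficients equals the square of the correlation coefficient between the two variables. Regression provides a mathematical relationship between two or more variables and is based on a cause and effect relationship.
$r^2 = b_{yx}\,b_{xy}$, so $r = \pm\sqrt{b_{yx}\,b_{xy}}$; since $r^2 \le 1$, $b_{yx}\,b_{xy} \le 1$.
Both regression coefficients have the same sign, and r takes that sign: if $b_{yx}$ is negative, then $b_{xy}$ is also negative and r is negative.
They can also be expressed as
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\dfrac{\sigma_x}{\sigma_y}$
A regression coefficient is an absolute measure: it is expressed in the units of the data.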
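A quick numerical check of these properties, sketched in Python on the data of Example 9.4 (where both coefficients are negative). Population standard deviations (divisor n) are assumed; the n cancels in r σy/σx, so the choice does not affect that identity.

```python
import math

# Example 9.4 data
xs = [12, 4, 20, 8, 16]
ys = [18, 22, 10, 16, 14]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # -104
sxx = sum((x - mx) ** 2 for x in xs)                      # 160
syy = sum((y - my) ** 2 for y in ys)                      # 80

b_yx = sxy / sxx                    # -0.65
b_xy = sxy / syy                    # -1.3
r = sxy / math.sqrt(sxx * syy)      # Pearson's r

# r^2 = b_yx * b_xy, and r carries the common sign of the two coefficients
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# b_yx = r * sigma_y / sigma_x with population standard deviations
sigma_x = math.sqrt(sxx / n)
sigma_y = math.sqrt(syy / n)

print(round(r, 3), round(r_from_b, 3))             # -0.919 -0.919
print(math.isclose(b_yx, r * sigma_y / sigma_x))   # True
print(b_yx * b_xy <= 1)                            # True
```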
9.6. Differences between Correlation Coefficient and Regression Coefficient
Table 9.1
Correlation Coefficient | Regression Coefficient
$r_{xy} = r_{yx}$ (symmetric) | $b_{yx} \ne b_{xy}$ in general
$-1 \le r \le 1$ | $b_{yx}$ can be greater than one, but then $b_{xy}$ must be less than one, such that $b_{yx}\,b_{xy} \le 1$
It has no units attached to it | It has units attached to it
Nonsense (spurious) correlation can exist | There is no such nonsense regression
It is not based on a cause and effect relationship | It is based on a cause and effect relationship
It only indirectly helps in estimation | It is meant for estimation
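The first and third rows of Table 9.1 can be illustrated numerically. This Python sketch reuses the advertising and sales data of Section 9.7 and, as an assumed change of units, re-expresses sales in thousands of rupees (1 lakh = 100 thousand). The helper name fit is hypothetical. r stays the same, while b_yx is multiplied by 100.

```python
import math


def fit(xs, ys):
    """Return (b_yx, b_xy, r) for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy / syy, sxy / math.sqrt(sxx * syy)


cost_lakh = [8, 12, 11, 5, 14, 3, 6, 8, 4, 9]            # X
sales_lakh = [25, 35, 29, 24, 38, 12, 18, 27, 17, 30]    # Y
sales_thousand = [s * 100 for s in sales_lakh]           # same sales, new unit

b1, c1, r1 = fit(cost_lakh, sales_lakh)
b2, c2, r2 = fit(cost_lakh, sales_thousand)

print(b1 != c1)                     # True: b_yx differs from b_xy
print(math.isclose(r1, r2))         # True: r is unit-free
print(math.isclose(b2, 100 * b1))   # True: b_yx rescales with the unit of Y
```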
(Cont. from topic ‘A Case’)
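Mr. Satish collects data on the number of vehicles registered and the number of tyres sold as follows:
Table 9.2
Number of vehicles registered per week (X) | 23 | 29 | 29 | 35 | 42 | 46 | 50 | 54 | 64 | 66 | 76 | 78
Number of tyres sold per week (Y) | 69 | 96 | 102 | 118 | 125 | 126 | 138 | 178 | 156 | 184 | 176 | 225
He worked out the regression equation of sales on the number of vehicles registered as follows:
Table 9.3
X | Y | X² | XY | Ŷ | (Y - Ŷ)²
23 | 69 | 529 | 1587 | 82.432 | 180.4305
29 | 96 | 841 | 2784 | 95.7959 | 0.0416
29 | 102 | 841 | 2958 | 95.7959 | 38.4904
35 | 118 | 1225 | 4130 | 109.1594 | 78.1557
42 | 125 | 1764 | 5250 | 124.7502 | 0.0624
46 | 126 | 2116 | 5796 | 133.6592 | 58.6629
50 | 138 | 2500 | 6900 | 142.5682 | 20.8681
54 | 178 | 2916 | 9612 | 151.4772 | 703.4609
64 | 156 | 4096 | 9984 | 173.7497 | 315.0502
66 | 184 | 4356 | 12144 | 178.2042 | 33.5918
76 | 176 | 5776 | 13376 | 200.4766 | 599.1060
78 | 225 | 6084 | 17550 | 204.9311 | 402.7592
Totals: ΣX = 592, ΣY = 1693, ΣX² = 33044, ΣXY = 92071, Σ(Y - Ŷ)² = 2430.68
$\bar{X} = \dfrac{592}{12} = 49.33$, $\bar{Y} = \dfrac{1693}{12} = 141.083$
$b_{yx} = \dfrac{\sum XY/n - \bar{X}\bar{Y}}{\sum X^2/n - \bar{X}^2} = \dfrac{712.472}{319.889} = 2.2272$
The regression equation is
$Y - 141.083 = 2.2272\,(X - 49.33)$
$Y = 31.21 + 2.2272X$
The Ŷ column of Table 9.3 gives the sales estimated from this line, and the last column the squared errors $(Y - \hat{Y})^2$.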
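Mr. Satish's calculation can be reproduced with a minimal Python sketch (plain lists, no libraries). It uses the covariance-over-variance form of b_yx shown above and also recomputes the estimated sales and the sum of squared errors from the last two columns of Table 9.3.

```python
# Table 9.2: vehicles registered per week (X) and tyres sold per week (Y)
vehicles = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]
tyres = [69, 96, 102, 118, 125, 126, 138, 178, 156, 184, 176, 225]

n = len(vehicles)
sum_x, sum_y = sum(vehicles), sum(tyres)
sum_x2 = sum(x * x for x in vehicles)
sum_xy = sum(x * y for x, y in zip(vehicles, tyres))
mean_x, mean_y = sum_x / n, sum_y / n

# b_yx = (sum XY / n - mean_x * mean_y) / (sum X^2 / n - mean_x^2), as in the text
b_yx = (sum_xy / n - mean_x * mean_y) / (sum_x2 / n - mean_x ** 2)
a = mean_y - b_yx * mean_x

# Estimated sales and squared errors (the last two columns of Table 9.3)
predicted = [a + b_yx * x for x in vehicles]
sse = sum((y - p) ** 2 for y, p in zip(tyres, predicted))

print(f"Y = {a:.2f} + {b_yx:.4f}X, sum of squared errors = {sse:.2f}")
```

The slope comes out at about 2.2272 and the intercept at about 31.21; the sum of squared errors is about 2430.68, matching the table total.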
He concludes that there is a good relationship between the variables: the increase in the number of registrations has increased the sales. He further supports this by calculating the correlation coefficient; the calculation through MS-Excel is shown later. This information will help Mr. Ajit to plan his future production.
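The standard error of estimate for the regression of wife's age (Y) on husband's age (X) in Example 9.1, using $Y = 0.521X + 7.2775$, is worked out as follows:
Table 9.4
X | Y | Ŷ | (Y - Ŷ)²
18 | 17 | 16.6555 | 0.1187
19 | 17 | 17.1765 | 0.0312
20 | 18 | 17.6975 | 0.0915
21 | 18 | 18.2185 | 0.0477
22 | 19 | 18.7395 | 0.0679
23 | 19 | 19.2605 | 0.0679
24 | 19 | 19.7815 | 0.6107
25 | 20 | 20.3025 | 0.0915
26 | 21 | 20.8235 | 0.0312
27 | 22 | 21.3445 | 0.4297
Total | | | 1.5880
$S_{YX} = \sqrt{\dfrac{\sum (Y - \hat{Y})^2}{N}} = \sqrt{\dfrac{1.5880}{10}} = 0.3985$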
9.7. Examples
Example 9.1:
Find the regression equations from the following data:
Table 9.5
Age of Husband | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27
Age of Wife | 17 | 17 | 18 | 18 | 19 | 19 | 19 | 20 | 21 | 22
Hence, calculate the correlation coefficient.
Solution:
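Table 9.6
Age of husband (x) | dx = x - 22 | dx² | Age of wife (y) | dy = y - 19 | dy² | dx dy
18 | -4 | 16 | 17 | -2 | 4 | 8
19 | -3 | 9 | 17 | -2 | 4 | 6
20 | -2 | 4 | 18 | -1 | 1 | 2
21 | -1 | 1 | 18 | -1 | 1 | 1
22 | 0 | 0 | 19 | 0 | 0 | 0
23 | 1 | 1 | 19 | 0 | 0 | 0
24 | 2 | 4 | 19 | 0 | 0 | 0
25 | 3 | 9 | 20 | 1 | 1 | 3
26 | 4 | 16 | 21 | 2 | 4 | 8
27 | 5 | 25 | 22 | 3 | 9 | 15
Totals: Σx = 225, Σdx = 5, Σdx² = 85, Σy = 190, Σdy = 0, Σdy² = 24, Σdx dy = 43
$\bar{X} = \dfrac{225}{10} = 22.5$, $\bar{Y} = \dfrac{190}{10} = 19$
Regression equation of Y on X is:
$b_{yx} = \dfrac{N\sum dx\,dy - \sum dx\sum dy}{N\sum dx^2 - (\sum dx)^2} = \dfrac{10(43) - (5)(0)}{10(85) - 5^2} = \dfrac{430}{825} = 0.521$
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
$Y - 19 = 0.521(X - 22.5)$
$Y = 0.521X + 7.2775$
Regression equation of X on Y is:
$b_{xy} = \dfrac{N\sum dx\,dy - \sum dx\sum dy}{N\sum dy^2 - (\sum dy)^2} = \dfrac{10(43) - (5)(0)}{10(24) - 0^2} = \dfrac{430}{240} = 1.792$
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
$X - 22.5 = 1.792(Y - 19)$
$X = 1.792Y - 11.548$
Correlation coefficient:
$r = \sqrt{b_{yx}\,b_{xy}} = \sqrt{0.521 \times 1.792} = 0.966$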
Using MS Excel - Procedure
Regression Analysis
Regression Statistics
Multiple R | 0.966353136
R Square | 0.933838384
Adjusted R Square | 0.925568182
Standard Error | 0.445516384
Observations | 10
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 22.41212121 | 22.41212121 | 112.9160305 | 5.38409E-06
Residual | 8 | 1.587878788 | 0.198484848
Total | 9 | 24
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 7.272727273 | 1.112575252 | 6.536840775 | 0.000180955 | 4.707124143 | 9.838330403
Age of Husband | 0.521212121 | 0.04904974 | 10.62619549 | 5.38409E-06 | 0.408103219 | 0.634321023
Residual Output
Observation | Predicted Age of Wife | Residuals
1 | 16.65454545 | 0.345454545
2 | 17.17575758 | -0.175757576
3 | 17.6969697 | 0.303030303
4 | 18.21818182 | -0.218181818
5 | 18.73939394 | 0.260606061
6 | 19.26060606 | -0.260606061
7 | 19.78181818 | -0.781818182
8 | 20.3030303 | -0.303030303
9 | 20.82424242 | 0.175757576
10 | 21.34545455 | 0.654545455
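The main figures in this Excel output follow from a handful of sums. The Python sketch below recomputes them for Example 9.1. It assumes the standard ANOVA decomposition for simple linear regression (regression SS = b·Sxy); note that Excel's "Standard Error" divides the residual sum of squares by n - 2 rather than n.

```python
import math

# Example 9.1: husband's age (X) and wife's age (Y)
husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]
n = len(husband)

mx, my = sum(husband) / n, sum(wife) / n
sxx = sum((x - mx) ** 2 for x in husband)
syy = sum((y - my) ** 2 for y in wife)
sxy = sum((x - mx) * (y - my) for x, y in zip(husband, wife))

slope = sxy / sxx                 # "Age of Husband" coefficient
intercept = my - slope * mx       # "Intercept"
ss_total = syy                    # ANOVA: Total SS
ss_reg = slope * sxy              # ANOVA: Regression SS
ss_resid = ss_total - ss_reg      # ANOVA: Residual SS
r_square = ss_reg / ss_total      # "R Square"
ms_resid = ss_resid / (n - 2)     # residual mean square
std_error = math.sqrt(ms_resid)   # "Standard Error" (divisor n - 2)
f_stat = ss_reg / ms_resid        # "F" (regression df = 1)
residuals = [y - (intercept + slope * x) for x, y in zip(husband, wife)]

print(round(intercept, 6), round(slope, 6))       # 7.272727 0.521212
print(round(r_square, 6), round(std_error, 6))    # 0.933838 0.445516
print(round(f_stat, 4), round(residuals[0], 6))   # 112.916 0.345455
```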
Example
A study of wheat prices at Mumbai and Kanpur yields the following data:
Statistic | Mumbai | Kanpur
Mean | 7.50 | 8.10
Standard Deviation | 0.326 | 0.207
Example 9.2:
In a correlation study we have the following data.
Table 9.7
Statistic | Series X | Series Y
Mean | 65 | 67
S.D. | 2.5 | 3.5
Correlation coefficient: 0.8
Find the two regression equations.
Solution:
Regression equation of Y on X is:
$Y - \bar{Y} = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$
$Y - 67 = 0.8 \times \dfrac{3.5}{2.5}(X - 65)$
$Y - 67 = 1.12(X - 65)$
$Y = 1.12X - 5.8$
Regression equation of X on Y is:
$X - \bar{X} = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$
$X - 65 = 0.8 \times \dfrac{2.5}{3.5}(Y - 67)$
$X - 65 = 0.5714(Y - 67)$
$X = 0.5714Y + 26.71$
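When only the means, standard deviations and r are known, both lines follow directly from b_yx = r σy/σx and b_xy = r σx/σy. A small Python sketch applied to Table 9.7; the helper lines_from_summary is hypothetical.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a, b), (c, d)) for Y = a + bX and X = c + dY."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y
    y_on_x = (mean_y - b_yx * mean_x, b_yx)
    x_on_y = (mean_x - b_xy * mean_y, b_xy)
    return y_on_x, x_on_y


# Table 9.7: means 65 and 67, S.D.s 2.5 and 3.5, r = 0.8
(a, b), (c, d) = lines_from_summary(65, 67, 2.5, 3.5, 0.8)
print(f"Y = {a:.2f} + {b:.2f}X")   # Y = -5.80 + 1.12X
print(f"X = {c:.2f} + {d:.4f}Y")   # X = 26.71 + 0.5714Y
```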
(Cont. from the wheat price example) The correlation coefficient between the prices at Mumbai and Kanpur is 0.774. Estimate the price at Kanpur if the price at Mumbai is Rs. 8.
Solution:
Given
$\bar{X} = 7.5$, $\bar{Y} = 8.10$, $\sigma_x = 0.326$, $\sigma_y = 0.207$, $r = 0.774$
The regression equation we need is that of Y on X (where X is the price at Mumbai and Y the price at Kanpur):
$Y - \bar{Y} = b_{yx}(X - \bar{X})$ …… eq. (1)
where $b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$
Substituting the values in eq. (1) we get
$Y - 8.10 = 0.774 \times \dfrac{0.207}{0.326}(X - 7.50)$
$Y - 8.10 = 0.4914(X - 7.5)$
$Y = 0.4914X + 4.4145$
Estimation of the price at Kanpur when the price at Mumbai is Rs. 8:
$Y = 0.4914 \times 8 + 4.4145 = 8.3457$
The price at Kanpur is Rs. 8.35 when the price at Mumbai is Rs. 8.
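A sketch of the same estimate in Python, using only the summary statistics given above (the function name predict_kanpur is illustrative). The unrounded slope is about 0.4915, and the estimate for a Mumbai price of Rs. 8 comes to about Rs. 8.35.

```python
# Summary statistics of the wheat price example (X = Mumbai, Y = Kanpur)
mean_mumbai, mean_kanpur = 7.50, 8.10
sd_mumbai, sd_kanpur = 0.326, 0.207
r = 0.774

b_yx = r * sd_kanpur / sd_mumbai   # slope of the Kanpur price on the Mumbai price


def predict_kanpur(mumbai_price):
    """Y - mean_Y = b_yx * (X - mean_X), solved for Y."""
    return mean_kanpur + b_yx * (mumbai_price - mean_mumbai)


print(f"b_yx = {b_yx:.4f}; Kanpur price: Rs. {predict_kanpur(8):.2f}")
```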
Example
The following table shows the amount spent on advertising and the corresponding sales of the product from 10 companies:
Company | Sales (Rs. in lakh) | Advertising cost (Rs. in lakh)
A | 25 | 8
B | 35 | 12
C | 29 | 11
D | 24 | 5
E | 38 | 14
F | 12 | 3
G | 18 | 6
H | 27 | 8
I | 17 | 4
J | 30 | 9
a. Plot a scattergram showing the relationship between advertising cost and sales of the product.
b. Estimate the equation of the regression line of sales on advertising costs.
c. Use the regression line to forecast sales if advertising costs were Rs. 10 lakh.
Solution:
a. A scattergram showing the relationship between advertising cost and sales of the product:
[Figure: scattergram of Sales (Rs. in lakh, axis 0 to 40) against Advertising cost (Rs. in lakh, axis 0 to 15)]
b. The equation of the regression line of sales on advertising costs:
Y | X | X² | XY
25 | 8 | 64 | 200
35 | 12 | 144 | 420
29 | 11 | 121 | 319
24 | 5 | 25 | 120
38 | 14 | 196 | 532
12 | 3 | 9 | 36
18 | 6 | 36 | 108
27 | 8 | 64 | 216
17 | 4 | 16 | 68
30 | 9 | 81 | 270
Totals: ΣY = 255, ΣX = 80, ΣX² = 756, ΣXY = 2289
$b = \dfrac{n\sum XY - \sum X\sum Y}{n\sum X^2 - \left(\sum X\right)^2}$, $a = \dfrac{\sum Y}{n} - b\dfrac{\sum X}{n}$
$b = \dfrac{10(2289) - 80(255)}{10(756) - 80^2} = \dfrac{2490}{1160} = 2.14655$
$a = \dfrac{255}{10} - 2.14655 \times \dfrac{80}{10} = 25.5 - 17.1724 = 8.3276$
$Y = 8.33 + 2.15X$
c. To forecast sales if advertising costs were Rs. 10 lakh, we put X = 10 in the equation:
$Y = 8.33 + 2.15 \times 10 = 29.83$
As the original data were given to the nearest integer (whole number), the forecast of sales is 30 (i.e., Rs. 30 lakh).
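Parts (b) and (c) can be checked with a short Python sketch using the column totals and the formulas above. With unrounded coefficients the forecast is about 29.79, which also rounds to Rs. 30 lakh.

```python
sales = [25, 35, 29, 24, 38, 12, 18, 27, 17, 30]       # Y (Rs. lakh)
adv_cost = [8, 12, 11, 5, 14, 3, 6, 8, 4, 9]           # X (Rs. lakh)
n = len(sales)

sum_x, sum_y = sum(adv_cost), sum(sales)               # 80, 255
sum_x2 = sum(x * x for x in adv_cost)                  # 756
sum_xy = sum(x * y for x, y in zip(adv_cost, sales))   # 2289

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
forecast = a + b * 10                                  # advertising cost Rs. 10 lakh

print(round(a, 4), round(b, 4), round(forecast, 2))    # 8.3276 2.1466 29.79
```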
9.8. Standard Error of Estimate
The standard error of estimate measures the accuracy of the estimated figures in regression analysis. The smaller the standard error of estimate, the closer the estimates given by the regression equation are to the actual values. If the standard error of estimate is zero, there is no variation about the line and the correlation is perfect.
The standard error of estimate of the X values from the estimated values $X_c$ (regression of X on Y) is:
$S_{XY} = \sqrt{\dfrac{\sum (X - X_c)^2}{N}} = \sigma_x\sqrt{1 - r^2}$
and that of the Y values from the estimated values $Y_c$ (regression of Y on X) is:
$S_{YX} = \sqrt{\dfrac{\sum (Y - Y_c)^2}{N}} = \sigma_y\sqrt{1 - r^2} = \sqrt{\dfrac{\sum Y^2 - a\sum Y - b\sum XY}{N}}$
where a and b are the intercept and slope of the regression line of Y on X.
"The standard error of estimate is used to ascertain how good and representative the regression line is as a description of the average relationship between two series."
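The three expressions for S_YX can be compared numerically. This Python sketch uses the ages of Example 9.1 and population standard deviations (divisor N), as the formulas above do; all three agree at about 0.3985. Excel's "Standard Error" (0.4455 in the Example 9.1 output) is larger because it divides by N - 2 instead of N.

```python
import math

x = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]   # husband's age (Example 9.1)
y = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]   # wife's age
n = len(x)

mx, my = sum(x) / n, sum(y) / n
sxx = sum((u - mx) ** 2 for u in x)
syy = sum((v - my) ** 2 for v in y)
sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
b = sxy / sxx                   # slope of Y on X
a = my - b * mx                 # intercept
r = sxy / math.sqrt(sxx * syy)

# (1) From the deviations of Y about the estimated values Yc
se_resid = math.sqrt(sum((v - (a + b * u)) ** 2 for u, v in zip(x, y)) / n)

# (2) From sigma_y (divisor N) and r
se_sigma = math.sqrt(syy / n) * math.sqrt(1 - r ** 2)

# (3) From the raw sums: sum Y^2 - a sum Y - b sum XY
sum_y2 = sum(v * v for v in y)
sum_xy = sum(u * v for u, v in zip(x, y))
se_sums = math.sqrt((sum_y2 - a * sum(y) - b * sum_xy) / n)

print(round(se_resid, 4), round(se_sigma, 4), round(se_sums, 4))   # 0.3985 0.3985 0.3985
```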
Example 9.3:
The following results were worked out from scores in Statistics and Mathematics in a
certain examination.
Table 9.8
Scores in Statistics (X) Scores in Mathematics (Y)
Mean 40 48
Standard Deviation 10 15
Karl Pearson's correlation coefficient between x and y is +0.42. Find the regression lines of x on y and y on x. Use the regression lines to find the value of y when x = 50 and the value of x when y = 30.
Solution:
Given the following data:
$\bar{X} = 40$; $\bar{Y} = 48$; $\sigma_x = 10$; $\sigma_y = 15$; $r = 0.42$
The regression line of x on y is:
$X - \bar{X} = r\dfrac{\sigma_x}{\sigma_y}(Y - \bar{Y})$ ................... (1)
The regression line of y on x is:
$Y - \bar{Y} = r\dfrac{\sigma_y}{\sigma_x}(X - \bar{X})$ ................... (2)
Substituting the values, we get the respective equations:
$X = 0.28Y + 26.56$ ................ (3) and
$Y = 0.63X + 22.80$ ................ (4)
Therefore:
When y = 30, x = 0.28 × 30 + 26.56 = 34.96, using equation (3)
When x = 50, y = 0.63 × 50 + 22.80 = 54.3, using equation (4)
Example 9.4:
From the following data obtain the two regression equations
Table 9.9
X 12 4 20 8 16
Y 18 22 10 16 14
Estimate Y for X = 15 and estimate X for Y = 20
Solution:
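$\bar{X}$ = (12 + 4 + 20 + 8 + 16) / 5 = 12 = mean of X
$\bar{Y}$ = (18 + 22 + 10 + 16 + 14) / 5 = 16 = mean of Y
Table 9.10
X | Y | X - 12 | Y - 16 | (X - 12)² | (Y - 16)² | (X - 12)(Y - 16)
12 | 18 | 0 | 2 | 0 | 4 | 0
4 | 22 | -8 | 6 | 64 | 36 | -48
20 | 10 | 8 | -6 | 64 | 36 | -48
8 | 16 | -4 | 0 | 16 | 0 | 0
16 | 14 | 4 | -2 | 16 | 4 | -8
Totals: Σ(X - 12)² = 160, Σ(Y - 16)² = 80, Σ(X - 12)(Y - 16) = -104
$b_{yx} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \dfrac{-104}{160} = -0.65$ and
$b_{xy} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^2} = \dfrac{-104}{80} = -1.3$
Regression equation of X on Y is given by:
$X - \bar{X} = b_{xy}(Y - \bar{Y})$
$X - 12 = -1.3(Y - 16)$
Therefore, $X = 32.8 - 1.3Y$
When Y = 20, X = 32.8 - 1.3 × 20 = 6.8
Regression equation of Y on X is given by:
$Y - \bar{Y} = b_{yx}(X - \bar{X})$
$Y - 16 = -0.65(X - 12)$
Therefore, $Y = 23.8 - 0.65X$
When X = 15, Y = 23.8 - 0.65 × 15 = 14.05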
9.9. Application in Finance
9.9.1. Correlation between Two Variables
The correlation between two variables can be studied for
Time series data
Cross-sectional data, that is, data about sales revenue and advertisement expenses during a year for a
number of companies
The results and conclusions for time series data are valid for one company only, whereas those for cross-sectional data are valid for a group of companies at the industry level.
We may take a particular company and study the correlation between the prices of its stock on the BSE and the NSE.
9.9.2. Beta (β) of a Stock/Share
A stock with a beta greater than one, say 1.10, tends to rise 10% more than the market index when the index rises and to fall 10% more when it falls: a 1% move in the index is accompanied, on average, by a 1.1% move in the stock.
The volatility of a stock is measured by its beta value. Beta represents the risk associated with the stock.
An aggressive investor would opt for a stock with beta value more than one.
A conservative investor would opt for the stock with beta value less than one.
Beta is measured through regression analysis. The percentage daily/weekly/monthly change in the stock price is taken as the dependent variable, and the corresponding change in a market index such as the BSE or NSE is taken as the independent variable. A regression equation of the form Y = α + βX is then fitted.
Thus a stock's β measures the relationship between the stock's rate of return (Y) and the average rate of return for the market as a whole (X).
The coefficient of determination r² obtained in the study measures the proportion of the variation in the stock's returns that is explained by the market.
One can also determine the regression equation between advertisement expenses and sales revenue for different sectors of industry, say manufacturing, IT, chemicals, pharmaceuticals etc.
Beta reflects the sensitivity of a stock to movements in a stock market index, such as the NSE-Nifty or BSE-Sensex, as a whole. The beta of the market itself is always taken as one.
Example 9.5:
The following data relate to the closing BSE Sensex and the stock price of RIL for 10 trading days during a period. Find β and interpret it.
Table 9.11
Day | BSE Sensex | Stock price of RIL
1 | 12342 | 1150
2 | 12378 | 1163
3 | 12360 | 1148
4 | 12461 | 1150
5 | 12479 | 1147
6 | 12538 | 1169
7 | 12730 | 1192
8 | 12928 | 1213
9 | 12848 | 1216
10 | 12885 | 1208
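Solution:
First we calculate the percentage changes in both the BSE Sensex (X) and the RIL price (Y); for example, for day 2:
$\text{Percentage change} = \dfrac{\text{Index for 2nd day} - \text{Index for 1st day}}{\text{Index for 1st day}} \times 100$
Ten days of prices give nine percentage changes.
Table 9.12
Day | X (% change in BSE) | Y (% change in RIL)
2 | 0.2917 | 1.1304
3 | -0.1454 | -1.2898
4 | 0.8172 | 0.1742
5 | 0.1445 | -0.2609
6 | 0.4728 | 1.9180
7 | 1.5313 | 1.9675
8 | 1.5554 | 1.7617
9 | -0.6188 | 0.2473
10 | 0.2880 | -0.6579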
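A minimal sketch of the beta calculation in plain Python, assuming simple day-on-day percentage changes as in the formula above; the names are illustrative. Because it works from unrounded percentage changes, its α, β and R² can differ from the Excel output (β ≈ 1.09, R² ≈ 0.43) in the last decimal places.

```python
# Table 9.11: closing BSE Sensex and RIL price for 10 trading days
bse = [12342, 12378, 12360, 12461, 12479, 12538, 12730, 12928, 12848, 12885]
ril = [1150, 1163, 1148, 1150, 1147, 1169, 1192, 1213, 1216, 1208]


def pct_changes(prices):
    """Day-on-day percentage changes: (P_t - P_(t-1)) / P_(t-1) * 100."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(prices, prices[1:])]


x = pct_changes(bse)   # market returns (independent variable)
y = pct_changes(ril)   # stock returns (dependent variable)
n = len(x)             # 9 changes from 10 prices

mx, my = sum(x) / n, sum(y) / n
sxx = sum((u - mx) ** 2 for u in x)
syy = sum((v - my) ** 2 for v in y)
sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))

beta = sxy / sxx                    # slope of stock returns on market returns
alpha = my - beta * mx              # intercept
r_square = sxy ** 2 / (sxx * syy)   # share of variation explained by the market

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, R^2 = {r_square:.4f}")
```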
Using MS Excel - Procedure
Regression Analysis
Regression Statistics
Multiple R | 0.657986268
R Square | 0.432945929
Adjusted R Square | 0.351938204
Standard Error | 0.961822395
Observations | 9
ANOVA
Source | df | SS | MS | F | Significance F
Regression | 1 | 4.9442110 | 4.9442110 | 5.34450178 | 0.05404187
Residual | 7 | 6.4757162 | 0.9251023
Total | 8 | 11.419927
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0.0291111 | 0.392985 | 0.0740769 | 0.9430215 | -0.9001508 | 0.958373159
X | 1.0903451 | 0.4716397 | 2.3118178 | 0.0540418 | -0.0249055 | 2.205595895
Thus β ≈ 1.09: on average, a 1% change in the BSE Sensex is accompanied by a change of about 1.09% in the RIL price, so RIL moved slightly more than the market over this period. R² ≈ 0.43, i.e., about 43% of the variation in RIL's daily returns is explained by the market. With only nine observations, the slope is not significant at the 5% level (P-value 0.054).
9.10. Non-Linear Regression
A test of hypothesis on the regression coefficient tells us whether or not a linear relationship exists. If the relationship is not linear, it can often be converted to a linear one by taking logarithms.
Consider the relation $y = ab^x$. Taking logarithms, it can be written as
$\log y = \log a + x\log b$, i.e., $Y = A + Bx$
where $Y = \log y$, $A = \log a$ and $B = \log b$.
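Example 9.6:
Consider the following incentive scheme and the turnover expected.
Table 9.13
Incentive increase in % of base year (x) | Turnover in Rs. crores (y)
1 | 110
2 | 120
3 | 132
5 | 160
8 | 215
10 | 260
Fit a curve of the type $y = ax^b$.
Solution:
$\log y = \log a + b\log x$, i.e., $Y = A + BX$, where $Y = \log y$, $X = \log x$, $A = \log a$ and $B = b$.
Table 9.14
X = log x | Y = log y | X² | XY
0 | 2.04 | 0 | 0
0.30 | 2.08 | 0.09 | 0.63
0.48 | 2.12 | 0.23 | 1.01
0.70 | 2.20 | 0.49 | 1.54
0.90 | 2.33 | 0.81 | 2.11
1.00 | 2.41 | 1.00 | 2.41
Totals: ΣX = 3.38, ΣY = 13.19, ΣX² = 2.62, ΣXY = 7.70
The normal equations are
$\sum Y = nA + B\sum X$, i.e., $13.19 = 6A + 3.38B$
$\sum XY = A\sum X + B\sum X^2$, i.e., $7.70 = 3.38A + 2.62B$
Solving,
$B = \dfrac{6(7.70) - 3.38(13.19)}{6(2.62) - 3.38^2} = \dfrac{46.2 - 44.5822}{15.72 - 11.4244} = 0.3766$
$A = \dfrac{13.19 - 3.38(0.3766)}{6} = 1.99$
Taking antilogs, $a = 10^{1.99} = 97.72$, so the fitted curve is
$y = 97.72\,x^{0.3766}$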
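The same least-squares fit on log x and log y can be sketched in Python. With unrounded logarithms it gives b ≈ 0.372 and a ≈ 97.5, slightly different from the hand calculation because Table 9.14 rounds the logs to two decimals.

```python
import math

x = [1, 2, 3, 5, 8, 10]              # incentive increase (% of base year)
y = [110, 120, 132, 160, 215, 260]   # turnover (Rs. crores)

X = [math.log10(v) for v in x]       # X = log x
Y = [math.log10(v) for v in y]       # Y = log y
n = len(X)

sum_X, sum_Y = sum(X), sum(Y)
sum_XX = sum(u * u for u in X)
sum_XY = sum(u * v for u, v in zip(X, Y))

# Solve the normal equations  sum_Y = n*A + B*sum_X  and  sum_XY = A*sum_X + B*sum_XX
B = (n * sum_XY - sum_X * sum_Y) / (n * sum_XX - sum_X ** 2)
A = (sum_Y - B * sum_X) / n
a = 10 ** A                          # antilog of A; the exponent b equals B

print(f"y = {a:.2f} * x^{B:.4f}")
```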
Example
Find the second degree regression polynomial $y = a + bx + cx^2$ by the least squares method for the data given below.
X | 0 | 1 | 2 | 3 | 4
Y | 1 | 0 | 3 | 10 | 21
Solution:
We need to fit a second degree regression polynomial of the form $y = a + bx + cx^2$. To obtain the values of the constants a, b and c, the normal equations are:
$\sum y = Na + b\sum x + c\sum x^2$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4$
Calculation
X | Y | X² | XY | X²Y | X³ | X⁴
0 | 1 | 0 | 0 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0 | 1 | 1
2 | 3 | 4 | 6 | 12 | 8 | 16
3 | 10 | 9 | 30 | 90 | 27 | 81
4 | 21 | 16 | 84 | 336 | 64 | 256
Totals: ΣX = 10, ΣY = 35, ΣX² = 30, ΣXY = 120, ΣX²Y = 438, ΣX³ = 100, ΣX⁴ = 354
Substituting the values in the above equations:
35 = 5a + 10b + 30c
120 = 10a + 30b + 100c
438 = 30a + 100b + 354c
and solving the simultaneous equations, we get:
a = 1
b = -3
c = 2
Therefore, the second degree parabola is $Y = 1 - 3x + 2x^2$.
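Solving the three normal equations can be sketched in Python with exact rational arithmetic (fractions.Fraction) and Gauss-Jordan elimination; the helper s is hypothetical and simply forms the sums Σx^p y^q that appear in the equations.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
ys = [1, 0, 3, 10, 21]


def s(p, q=0):
    """Exact sum of x**p * y**q over the data (hypothetical helper)."""
    return sum(Fraction(x) ** p * Fraction(y) ** q for x, y in zip(xs, ys))


# Augmented matrix of the normal equations in the unknowns a, b, c
rows = [
    [s(0), s(1), s(2), s(0, 1)],   # sum y    = N a + b sum x + c sum x^2
    [s(1), s(2), s(3), s(1, 1)],   # sum xy   = a sum x + b sum x^2 + c sum x^3
    [s(2), s(3), s(4), s(2, 1)],   # sum x^2y = a sum x^2 + b sum x^3 + c sum x^4
]

# Gauss-Jordan elimination in exact rational arithmetic
for i in range(3):
    pivot_row = next(k for k in range(i, 3) if rows[k][i] != 0)
    rows[i], rows[pivot_row] = rows[pivot_row], rows[i]
    pivot = rows[i][i]
    rows[i] = [v / pivot for v in rows[i]]
    for k in range(3):
        if k != i:
            factor = rows[k][i]
            rows[k] = [vk - factor * vi for vk, vi in zip(rows[k], rows[i])]

a, b, c = (row[3] for row in rows)
print(a, b, c)   # 1 -3 2
```

Exact fractions avoid rounding in the elimination; for larger problems a numerical solver would be the usual choice.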
9.11. Logistic Regression
In the linear regression model, the dependent variable is assumed to take continuous values. However, there are situations in which the dependent variable takes only two values (success or failure) and follows a binomial distribution. In such cases, logistic regression is used.
The relationship between the dependent and independent variables is of the form
$P = \dfrac{1}{1 + e^{-y}}$
where P is the probability of success. Then
$1 + e^{-y} = \dfrac{1}{P}$, or $e^{-y} = \dfrac{1}{P} - 1 = \dfrac{1 - P}{P}$
so that
$e^{y} = \dfrac{P}{1 - P}$, or
$Y = \log_e\left(\dfrac{P}{1 - P}\right) = \log_e P - \log_e(1 - P) = A + BX$
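The logistic function and its inverse (the logit) can be sketched in Python as follows; the function names are illustrative. The round trip confirms that the two forms of the relationship above are equivalent.

```python
import math


def logistic(y):
    """P = 1 / (1 + e^(-y)): maps any real y to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-y))


def logit(p):
    """Inverse of logistic: y = ln(P / (1 - P)), for 0 < P < 1."""
    return math.log(p / (1 - p))


for p in (0.1, 0.25, 0.5, 0.9):
    # Round trip P -> y -> P recovers the original probability
    assert math.isclose(logistic(logit(p)), p)

print(logistic(0))   # 0.5
print(logit(0.75))   # ln 3, about 1.0986
```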
Example 9.7:
Suppose each event is either a success or a failure, coded by the values Y = 1 or Y = 0 of the dependent variable. The corresponding revenue (X) is given for twenty events as follows:
Y X
0 3.45
1 3.36
0 3.12
0 3.15
0 3.14
1 3.48
1 3.42
1 3.32
0 3.31
1 3.29
1 3.46
1 3.34
0 3.25
1 3.41
1 3.48
1 3.21
1 3.25
1 3.16
1 3.28
0 3.22
The linear regression equation is Y = 1.881X - 5.566.
Note:
It is left as an exercise for the reader to find this regression equation.
This regression equation does not restrict Y to the values 0 and 1. For example, when we put X = 2,
Y = 3.762 - 5.566 = -1.804 < 0
Therefore we require a different technique to predict the Y-value.
Let us construct class intervals:
Class | Mid X | Probability of success P | Y = log10 (P / (1 - P))
3.1-3.2 | 3.15 | 1/4 = 0.25 | -0.477
3.2-3.3 | 3.25 | 4/6 = 0.67 | 0.308
3.3-3.4 | 3.35 | 3/4 = 0.75 | 0.477
3.4-3.5 | 3.45 | 5/6 = 0.83 | 0.689
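The grouping step and the straight-line fit on the logits can be sketched in Python. It assumes half-open classes [3.1, 3.2), [3.2, 3.3) and so on (no reading falls on a boundary) and base-10 logarithms, as in the table. Because it uses exact proportions (4/6, 5/6) rather than two-decimal values, the slope comes out near 3.70 rather than 3.667. The back-transformation uses the same base: P = 10^Y / (1 + 10^Y).

```python
import math

# (Y, X) pairs of Example 9.7: Y = 1 for success, 0 for failure
data = [(0, 3.45), (1, 3.36), (0, 3.12), (0, 3.15), (0, 3.14),
        (1, 3.48), (1, 3.42), (1, 3.32), (0, 3.31), (1, 3.29),
        (1, 3.46), (1, 3.34), (0, 3.25), (1, 3.41), (1, 3.48),
        (1, 3.21), (1, 3.25), (1, 3.16), (1, 3.28), (0, 3.22)]
classes = [(3.1, 3.2), (3.2, 3.3), (3.3, 3.4), (3.4, 3.5)]   # assumed half-open [lo, hi)

mids, logits = [], []
for lo, hi in classes:
    group = [yv for yv, xv in data if lo <= xv < hi]
    p = sum(group) / len(group)               # proportion of successes
    mids.append((lo + hi) / 2)
    logits.append(math.log10(p / (1 - p)))    # Y = log10(P / (1 - P))

# Least-squares line Y = A + B * X through the (mid X, logit) points
n = len(mids)
mx, my = sum(mids) / n, sum(logits) / n
B = (sum((u - mx) * (v - my) for u, v in zip(mids, logits))
     / sum((u - mx) ** 2 for u in mids))
A = my - B * mx


def prob(x):
    """Back-transform with the same base: P = 10^Y / (1 + 10^Y)."""
    y = A + B * x
    return 10 ** y / (1 + 10 ** y)


print(f"Y = {A:.3f} + {B:.3f}X; P(X = 2.7) = {prob(2.7):.4f}")
```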
Note:
There are 4 readings in the interval 3.1-3.2, and only one of them corresponds to Y = 1, so P = 1/4.
The regression equation of Y on X is Y = 3.667X - 11.852, i.e.,
$\log_{10}\left(\dfrac{P}{1 - P}\right) = 3.667X - 11.852$
Since Y was computed with base-10 logarithms, the P values are given by
$P = \dfrac{10^{3.667X - 11.852}}{1 + 10^{3.667X - 11.852}}$
(The base of the logarithm only rescales the coefficients; P must be recovered with the same base that was used to compute Y.)
For example, when X = 2.7, Y = -1.9511 and P ≈ 0.011, i.e., about 1.1%.
9.12. Summary
In this unit we learnt what regression is, how to measure it and how to interpret regression output from MS-Excel. Further, the application of regression in the financial field was explained with an example. We also learnt how to calculate the standard error of the estimate.