Regression analysis by Muthama JM - JKUAT


Page 1: Regression analysis by Muthama JM - JKUAT

REGRESSION ANALYSIS

ECONOMETRICS

By Muthama

JKUAT

Page 2: Regression analysis by Muthama JM - JKUAT

REGRESSION ANALYSIS

PRESENTED BY

JAPHETH MUTINDA MUTHAMA

PRESENTED TO

PROFESSOR NAMUSONGE
JKUAT - KENYA

Page 3: Regression analysis by Muthama JM - JKUAT

MEANING OF REGRESSION: The dictionary meaning of the word regression is 'stepping back' or 'going back'. Regression measures the average relationship between two or more variables in terms of the original units of the data. It attempts to establish the functional relationship between the variables and thereby provides a mechanism for prediction or forecasting.

It describes the relationship between two (or more) variables. Regression analysis uses data to identify relationships among variables by applying regression models. The relationships between the variables, e.g. X and Y, can then be used to make predictions.

The 'independent' variable X is usually called the regressor (there may be one or more of these); the 'dependent' variable Y is the response variable.

Page 4: Regression analysis by Muthama JM - JKUAT

Regression

Regression is thus an explanation of causation.

If the independent variable(s) sufficiently explain the variation in the dependent variable, the model can be used for prediction.

[Figure: scatter plot of the dependent variable (Y) against the independent variable (X)]

Page 5: Regression analysis by Muthama JM - JKUAT

APPLICATION OF REGRESSION ANALYSIS IN RESEARCH

i. It helps in the formulation and determination of a functional relationship between two or more variables.

ii. It helps in establishing a cause and effect relationship between two variables in economics and business research.

iii. It helps in predicting and estimating the value of the dependent variable, such as price, production, or sales.

iv. It helps to measure the variability or spread of values of the dependent variable with respect to the regression line.

Page 6: Regression analysis by Muthama JM - JKUAT

USE OF REGRESSION IN ORGANIZATIONS

In the field of business, regression is widely used by businessmen in:
• predicting future production;
• investment analysis;
• forecasting sales, etc.
It is also used in sociological studies and economic planning to find projections of population, birth rates, and death rates.

So the success of a businessman depends on the correctness of the various estimates that he is required to make.

Page 7: Regression analysis by Muthama JM - JKUAT
Page 8: Regression analysis by Muthama JM - JKUAT

METHODS OF STUDYING REGRESSION:

Regression can be studied by the graphic method (plotting the paired observations and fitting a line by inspection) or by the algebraic method (computing the line from formulae, as below).

Page 9: Regression analysis by Muthama JM - JKUAT

Algebraic method

1. Least Squares Method: The regression equation of X on Y is:

X = a + bY

where X is the dependent variable and Y is the independent variable. The regression equation of Y on X is:

Y = a + bX

where Y is the dependent variable and X is the independent variable.

Page 10: Regression analysis by Muthama JM - JKUAT

Simple Linear Regression

[Figure: straight line fitted through a scatter of points; dependent variable (y) on the vertical axis, independent variable (x) on the horizontal axis]

The output of a regression is a function that predicts the dependent variable based upon values of the independent variables.

Simple regression fits a straight line to the data.

y = a + bX ± є

where a is the y-intercept, b is the slope (b = ∆y/∆x), and є is the error term.

Page 11: Regression analysis by Muthama JM - JKUAT

The outputs of a simple regression are the coefficient β and the constant A. The equation is then:

y = A + β * x + ε

where ε is the residual error.

β is the per unit change in the dependent variable for each unit change in the independent variable. Mathematically:

β = ∆y / ∆x
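As a quick illustration, here is a minimal Python sketch (numpy is an assumption; the deck itself only mentions SPSS) that estimates A and β by least squares, using the small data set from the worked example later in the deck:

```python
import numpy as np

# Data from the worked example later in the deck (any paired sample works)
x = np.array([3, 2, 7, 4, 8], dtype=float)
y = np.array([6, 1, 8, 5, 9], dtype=float)

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
beta, A = np.polyfit(x, y, 1)
eps = y - (A + beta * x)          # residual errors the line leaves unexplained

print(f"y = {A:.2f} + {beta:.2f}x")   # about y = 0.64 + 1.07x here (the deck's
                                      # 0.66 comes from rounding b before a)
```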

Page 12: Regression analysis by Muthama JM - JKUAT

Multiple Linear Regression

More than one independent variable can be used to explain variance in the dependent variable, as long as they are not linearly related.

A multiple regression takes the form:

y = A + β1X1 + β2X2 + … + βkXk + ε

where k is the number of independent variables (the βs are the parameters).
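A minimal sketch of the same idea with two regressors, again assuming Python with numpy; the data are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: y explained by two regressors x1 and x2
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.9, 7.2, 7.8, 11.0, 11.9])

# Design matrix: a column of ones for the constant A, then the regressors
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
A, b1, b2 = coef
print(f"y = {A:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```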

Page 13: Regression analysis by Muthama JM - JKUAT

Multicollinearity

Multicollinearity is a condition in which at least two independent variables are highly linearly correlated. It makes the least-squares estimates unstable and can make the calculation fail outright, because the matrix that must be inverted becomes nearly singular.

Example table of Correlations

       Y      X1     X2
Y    1.000
X1   0.802  1.000
X2   0.848  0.578  1.000

A correlations table can suggest which independent variables may be significant. Generally, an independent variable that has more than a 0.3 correlation with the dependent variable and less than a 0.7 correlation with any other independent variable can be included as a possible predictor.
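The screening rule can be applied mechanically. A sketch assuming numpy, with hypothetical data:

```python
import numpy as np

# Hypothetical sample: dependent variable y and candidate predictors x1, x2
y  = np.array([10.0, 12.0, 15.0, 18.0, 23.0, 27.0])
x1 = np.array([ 1.0,  2.0,  4.0,  5.0,  7.0,  8.0])
x2 = np.array([ 3.0,  2.0,  9.0,  4.0,  8.0,  5.0])

R = np.corrcoef([y, x1, x2])     # 3x3 matrix of pairwise correlations
r_y = {"x1": R[0, 1], "x2": R[0, 2]}
r_between = abs(R[1, 2])         # correlation between the two predictors

# Rule of thumb from the slide: > .3 with y, < .7 with every other predictor
for name, r in r_y.items():
    ok = abs(r) > 0.3 and r_between < 0.7
    print(f"{name}: r with y = {r:.2f} -> {'candidate' if ok else 'screened out'}")
```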

Page 14: Regression analysis by Muthama JM - JKUAT

Nonlinear Regression

Nonlinear functions can also be fit as regressions. Common choices include Power, Logarithmic, Exponential, and Logistic, but any continuous function can be used.
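For instance, an exponential relationship y = a·exp(bx) can be fitted with ordinary least squares after a log transform; a sketch with made-up data, assuming numpy:

```python
import numpy as np

# Hypothetical data that grow roughly like y = a * exp(b * x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 8.2, 16.5, 32.8])

# Taking logs gives ln(y) = ln(a) + b*x, an ordinary straight-line regression
b, ln_a = np.polyfit(x, np.log(y), 1)
print(f"y = {np.exp(ln_a):.2f} * exp({b:.2f} * x)")
```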

Page 15: Regression analysis by Muthama JM - JKUAT

Example 1: From the following data, obtain the regression equations using the method of least squares.

X 3 2 7 4 8

Y 6 1 8 5 9

Solution:

X     Y     XY     X²     Y²
3     6     18      9     36
2     1      2      4      1
7     8     56     49     64
4     5     20     16     25
8     9     72     64     81
ΣX=24 ΣY=29 ΣXY=168 ΣX²=142 ΣY²=207

Page 16: Regression analysis by Muthama JM - JKUAT

The two normal equations are:

ΣY = na + bΣX
ΣXY = aΣX + bΣX²

Substituting the values from the table we get:

29 = 5a + 24b …………………(i)
168 = 24a + 142b, i.e. 84 = 12a + 71b ………………..(ii)

Multiplying equation (i) by 12 and (ii) by 5:

348 = 60a + 288b ………………(iii)
420 = 60a + 355b ………………(iv)

By solving equations (iii) and (iv) we get a = 0.66 and b = 1.07.

Page 17: Regression analysis by Muthama JM - JKUAT

By putting the value of a and b in the Regression equation Y on X we get

Y=0.66+1.07X

Now, to find the regression equation of X on Y, the two normal equations are:

ΣX = na + bΣY
ΣXY = aΣY + bΣY²

Substituting the values in the equations we get:

24 = 5a + 29b ………………………(i)
168 = 29a + 207b …………………..(ii)

Multiplying equation (i) by 29 and (ii) by 5 and solving, we get a = 0.49 and b = 0.74.

Page 18: Regression analysis by Muthama JM - JKUAT

Substituting the values of a and b in the regression equation of X on Y:

X = 0.49 + 0.74Y
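Both solutions are easy to verify numerically. A sketch that solves the two pairs of normal equations directly, assuming Python with numpy:

```python
import numpy as np

X = np.array([3, 2, 7, 4, 8], dtype=float)
Y = np.array([6, 1, 8, 5, 9], dtype=float)
n = len(X)

# Y on X:  ΣY = na + bΣX   and   ΣXY = aΣX + bΣX²
A1 = np.array([[n, X.sum()], [X.sum(), (X * X).sum()]])
a, b = np.linalg.solve(A1, np.array([Y.sum(), (X * Y).sum()]))
print(f"Y = {a:.2f} + {b:.2f}X")    # about 0.64 + 1.07X (0.66 after the deck's rounding)

# X on Y:  ΣX = na + bΣY   and   ΣXY = aΣY + bΣY²
A2 = np.array([[n, Y.sum()], [Y.sum(), (Y * Y).sum()]])
a2, b2 = np.linalg.solve(A2, np.array([X.sum(), (X * Y).sum()]))
print(f"X = {a2:.2f} + {b2:.2f}Y")  # about 0.49 + 0.74Y
```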

2. Deviation from the arithmetic mean method: The calculations by the least squares method are quite cumbersome when the values of X and Y are large, so the work can be simplified by using this method. The formulae for calculating the regression equations by this method are:

Regression equation of X on Y: X − X̄ = b_xy (Y − Ȳ)

Regression equation of Y on X: Y − Ȳ = b_yx (X − X̄)

where b_xy and b_yx are the regression coefficients:

b_xy = Σxy / Σy²   and   b_yx = Σxy / Σx²

with x = X − X̄ and y = Y − Ȳ.

Page 19: Regression analysis by Muthama JM - JKUAT

Example 2: From the previous data, obtain the regression equations by taking deviations from the actual means of the X and Y series.

X 3 2 7 4 8

Y 6 1 8 5 9

Solution:

X     Y     x = X − X̄   y = Y − Ȳ   x²      y²      xy
3     6     -1.8        0.2         3.24    0.04    -0.36
2     1     -2.8        -4.8        7.84    23.04   13.44
7     8     2.2         2.2         4.84    4.84    4.84
4     5     -0.8        -0.8        0.64    0.64    0.64
8     9     3.2         3.2         10.24   10.24   10.24
ΣX=24 ΣY=29 Σx=0        Σy=0        Σx²=26.8 Σy²=38.8 Σxy=28.8

Page 20: Regression analysis by Muthama JM - JKUAT

Regression equation of X on Y: X − X̄ = b_xy (Y − Ȳ) ………….(I)

b_xy = Σxy / Σy² = 28.8 / 38.8 = 0.74

X − 4.8 = 0.74 (Y − 5.8)
X = 0.74Y + 0.49

Regression equation of Y on X: Y − Ȳ = b_yx (X − X̄) ………….(II)

b_yx = Σxy / Σx² = 28.8 / 26.8 = 1.07

Y − 5.8 = 1.07 (X − 4.8)
Y = 1.07X + 0.66

Page 21: Regression analysis by Muthama JM - JKUAT

It would be observed that these regression equations are the same as those obtained by the direct (least squares) method.

3. Deviation from assumed mean method: When the actual means of the X and Y variables are fractional, the calculations can be simplified by taking deviations from assumed means.

The regression equation of X on Y: X − X̄ = b_xy (Y − Ȳ)

The regression equation of Y on X: Y − Ȳ = b_yx (X − X̄)

But here the values of b_xy and b_yx are calculated by the following formulae:

b_xy = (N Σd_x d_y − Σd_x Σd_y) / (N Σd_y² − (Σd_y)²)

b_yx = (N Σd_x d_y − Σd_x Σd_y) / (N Σd_x² − (Σd_x)²)

where d_x and d_y are the deviations from the assumed means of X and Y.

Page 22: Regression analysis by Muthama JM - JKUAT

Example 3: From the data given in the previous example, calculate the regression equations by assuming 7 as the mean of the X series and 6 as the mean of the Y series.

Solution:

X     Y     d_x = X − 7   d_x²    d_y = Y − 6   d_y²    d_x·d_y
3     6     -4            16      0             0       0
2     1     -5            25      -5            25      +25
7     8     0             0       2             4       0
4     5     -3            9       -1            1       +3
8     9     1             1       3             9       +3
ΣX=24 ΣY=29 Σd_x=-11      Σd_x²=51 Σd_y=-1      Σd_y²=39 Σd_x·d_y=31

Page 23: Regression analysis by Muthama JM - JKUAT

The regression coefficient of X on Y:

b_xy = (N Σd_x d_y − Σd_x Σd_y) / (N Σd_y² − (Σd_y)²)
b_xy = (5(31) − (−11)(−1)) / (5(39) − (−1)²)
b_xy = (155 − 11) / (195 − 1)
b_xy = 144 / 194 = 0.74

X̄ = ΣX / N = 24 / 5 = 4.8
Ȳ = ΣY / N = 29 / 5 = 5.8

The regression equation of X on Y: X − X̄ = b_xy (Y − Ȳ)

X − 4.8 = 0.74 (Y − 5.8)
X = 0.74Y + 0.49

Page 24: Regression analysis by Muthama JM - JKUAT

The regression coefficient of Y on X:

b_yx = (N Σd_x d_y − Σd_x Σd_y) / (N Σd_x² − (Σd_x)²)
b_yx = (5(31) − (−11)(−1)) / (5(51) − (−11)²)
b_yx = (155 − 11) / (255 − 121)
b_yx = 144 / 134 = 1.07

The regression equation of Y on X: Y − Ȳ = b_yx (X − X̄)

Y − 5.8 = 1.07 (X − 4.8)
Y = 1.07X + 0.66

It would be observed that these regression equations are the same as those obtained by the least squares method and the deviation from arithmetic mean method.
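The assumed-mean shortcut can be verified in a few lines of plain Python (no libraries needed):

```python
# Assumed means from the example: 7 for X, 6 for Y
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
N = len(X)
dx = [x - 7 for x in X]
dy = [y - 6 for y in Y]

S_dx, S_dy   = sum(dx), sum(dy)                            # -11, -1
S_dx2, S_dy2 = sum(d*d for d in dx), sum(d*d for d in dy)  # 51, 39
S_dxdy       = sum(a*b for a, b in zip(dx, dy))            # 31

b_xy = (N*S_dxdy - S_dx*S_dy) / (N*S_dy2 - S_dy**2)  # 144/194, about 0.74
b_yx = (N*S_dxdy - S_dx*S_dy) / (N*S_dx2 - S_dx**2)  # 144/134, about 1.07
print(round(b_xy, 2), round(b_yx, 2))
```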

Page 25: Regression analysis by Muthama JM - JKUAT

SIMPLE REGRESSION

This assumes the model y = β0 + β1x + ε.

Example:

Assume variables Y and X are explained by the following model:

Y = β0 + β1X + ε

where Y is called the dependent (or response) variable and X the independent (or predictor, or explanatory) variable.

The two variables are related through the model E(Y | X = x) = β0 + β1x (the "population line").

Page 26: Regression analysis by Muthama JM - JKUAT

Cont…..

The interpretation is as follows:

where β0 is the (unknown) intercept and β1 is the (unknown) slope, the incremental change in Y per unit change in X.

β0 and β1 are not known exactly, but are estimated from sample data; their estimates can be denoted b0 and b1.

Note that the actual value of σ, the standard deviation of the error term ε, is usually not known.

The two regression coefficients are called the slope and intercept.

Their actual values are also unknown and should always be estimated using the empirical data at hand.
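A sketch of the standard sample estimates, b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄, in Python with numpy, reusing the deck's worked example data:

```python
import numpy as np

# Same small sample as the worked examples
x = np.array([3, 2, 7, 4, 8], dtype=float)
y = np.array([6, 1, 8, 5, 9], dtype=float)

# b1 = sum of cross-deviations over sum of squared x-deviations; b0 from the means
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")   # b1 = 28.8/26.8, about 1.07
```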

Page 27: Regression analysis by Muthama JM - JKUAT

MULTIVARIATE (LINEAR) REGRESSION

This is a regression model with multiple independent variables. Here there are independent (regressor) variables x1, x2, …, xn with only one dependent (response) variable y. The model therefore assumes the following format:

yi = β0 + β1x1 + β2x2 + … + βnxn + ε

where the subscripts 1, 2, …, n label the variables and i labels the observation.

NB: The exact values of β and ε are, and will always remain, unknown.

Page 28: Regression analysis by Muthama JM - JKUAT

Polynomial Regression

This is a special case of multivariate regression, with only one independent variable x, but an x-y relationship which is clearly nonlinear (and, at the same time, no 'physical' model to rely on):

y = β0 + β1x + β2x² + β3x³ + … + βnxⁿ + ε

Effectively, this is the same as having a multivariate model with x1 ≡ x, x2 ≡ x², x3 ≡ x³, and so on.
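numpy's polyfit fits exactly this kind of model, with the degree chosen by the analyst; a sketch with hypothetical data:

```python
import numpy as np

# Hypothetical, clearly nonlinear x-y data
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 0.9, 2.1, 6.8, 15.5, 29.0])

# A cubic fit: identical to a multivariate regression on x, x², x³
coeffs = np.polyfit(x, y, 3)    # coefficients, highest power first
print(np.poly1d(coeffs))        # the fitted polynomial in readable form
```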

Page 29: Regression analysis by Muthama JM - JKUAT

NONLINEAR REGRESSION

This is a model with one independent variable (the results can easily be extended to several) and n unknown parameters, which we will call b1, b2, …, bn:

y = f(x, b) + ε

where f(x, b) is a specific (given) function of the independent variable and the n parameters.
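One common way to estimate the parameters is iterative least squares, for example scipy's curve_fit. A sketch that assumes a two-parameter exponential for f (both the function and the data are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

# An assumed f(x, b) with n = 2 parameters: a simple exponential
def f(x, b1, b2):
    return b1 * np.exp(b2 * x)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 8.2, 16.5, 32.8])

# curve_fit estimates the parameters iteratively; p0 gives starting guesses
params, _ = curve_fit(f, x, y, p0=[1.0, 1.0])
print(params)    # estimated b1, b2
```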

Page 30: Regression analysis by Muthama JM - JKUAT

Types of Lines

Page 31: Regression analysis by Muthama JM - JKUAT

Scatter plot

[Scatter plot: Personal Income Per Capita, current dollars, 1999 ($20,000-$40,000, vertical axis) against Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates (15.0-35.0, horizontal axis). Title: Percent of Population with Bachelor's Degree by Personal Income Per Capita]

• This is a linear relationship.
• It is a positive relationship.
• As the population with BA's increases, so does personal income per capita.

Page 32: Regression analysis by Muthama JM - JKUAT

Regression Line

[Scatter plot with fitted regression line: same variables as the previous slide. R Sq Linear = 0.542]

• The regression line is the best straight-line description of the plotted points, and you can use it to describe the association between the variables.
• If all the points fall exactly on the line, the error is 0 and you have a perfect relationship.

Page 33: Regression analysis by Muthama JM - JKUAT

Things to note:

Regression focuses on association, not causation. Association is a necessary prerequisite for inferring causation, but in addition:

1. The independent variable must precede the dependent variable in time.
2. The two variables must be in line with a given theory.
3. Competing independent variables must be eliminated.

Page 34: Regression analysis by Muthama JM - JKUAT

Regression Table

• The regression coefficient is not a good indicator of the strength of the relationship.
• Two scatter plots with very different dispersions could produce the same regression line.

[Scatter plot 1: Personal Income Per Capita, current dollars, 1999 against Percent of Population 25 Years and Over with Bachelor's Degree or More, March 2000 estimates. R Sq Linear = 0.542]

[Scatter plot 2: Personal Income Per Capita, current dollars, 1999 against Population Per Square Mile (0.00-1200.00). R Sq Linear = 0.463]

Page 35: Regression analysis by Muthama JM - JKUAT

Regression coefficient

The regression coefficient is the slope of the regression line and will tell:
• what the nature of the relationship between the variables is;
• how much change in the dependent variable is associated with a change in the independent variable.
The larger the regression coefficient, the more the change.

Page 36: Regression analysis by Muthama JM - JKUAT

Pearson’s r

• To determine strength you look at how closely the dots are clustered around the line. The more tightly the cases are clustered, the stronger the relationship, while the more distant, the weaker.

• Pearson’s r is given a range of -1 to + 1 with 0 being no linear relationship at all.
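Using the small data set from the earlier worked examples, r can be computed directly; note the cross-check that the product of the two regression coefficients equals r²:

```python
import numpy as np

x = np.array([3, 2, 7, 4, 8], dtype=float)
y = np.array([6, 1, 8, 5, 9], dtype=float)

r = np.corrcoef(x, y)[0, 1]     # Pearson's r
print(round(r, 3))              # about 0.893: a strong positive relationship

# Cross-check: b_xy * b_yx = r²
print(round(0.74 * 1.07, 2), round(r**2, 2))   # about 0.79 and 0.80
```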

Page 37: Regression analysis by Muthama JM - JKUAT

Reading the tables

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

• When you run a regression analysis in SPSS you get three tables. Each tells you something about the relationship.
• The first is the model summary.
• The R is the Pearson product-moment correlation coefficient.
• In this case R is .736.
• R is the square root of R-Square and is the correlation between the observed and predicted values of the dependent variable.

Page 38: Regression analysis by Muthama JM - JKUAT

R-Square

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

• R-Square is the proportion of variance in the dependent variable (income per capita) which can be predicted from the independent variable (level of education).
• This value indicates that 54.2% of the variance in income can be predicted from the variable education. Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable.
• R-Square is also called the coefficient of determination.

Page 39: Regression analysis by Muthama JM - JKUAT

Adjusted R-square

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .736a   .542       .532                2760.003

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates

• As predictors are added to the model, each predictor will explain some of the variance in the dependent variable simply due to chance.
• One could continue to add predictors to the model, which would continue to improve the ability of the predictors to explain the dependent variable, although some of this increase in R-square would be simply due to chance variation in that particular sample.
• The adjusted R-square attempts to yield a more honest value to estimate the R-square for the population. The value of R-square was .542, while the value of adjusted R-square was .532. There isn't much difference because we are dealing with only one variable.
• When the number of observations is small and the number of predictors is large, there will be a much greater difference between R-square and adjusted R-square.
• By contrast, when the number of observations is very large compared to the number of predictors, the values of R-square and adjusted R-square will be much closer.
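The adjustment uses the standard formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). Plugging in the values from these tables (n = 50, from the ANOVA table's total df of 49 on the next slide, and k = 1 predictor) reproduces the reported .532:

```python
# Adjusted R² = 1 - (1 - R²)(n - 1) / (n - k - 1)
r2, n, k = 0.542, 50, 1      # n = 50 observations, k = 1 predictor
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))      # 0.532, matching the model summary
```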

Page 40: Regression analysis by Muthama JM - JKUAT

ANOVA

ANOVA(b)

Model 1       Sum of Squares   df   Mean Square    F        Sig.
Regression    4.32E+08          1   432493775.8    56.775   .000a
Residual      3.66E+08         48   7617618.586
Total         7.98E+08         49

a. Predictors: (Constant), Percent of Population 25 years and Over with Bachelor's Degree or More, March 2000 estimates
b. Dependent Variable: Personal Income Per Capita, current dollars, 1999

• The p-value associated with this F value is very small (0.0000).
• These values are used to answer the question "Do the independent variables reliably predict the dependent variable?"
• The p-value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable."
• If the p-value were greater than 0.05, you would say that the group of independent variables does not show a statistically significant relationship with the dependent variable, or that the group of independent variables does not reliably predict the dependent variable.
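The F value is simply the ratio of the two mean squares in the table, which can be verified directly:

```python
# F = Mean Square (Regression) / Mean Square (Residual), from the ANOVA table
ms_reg = 432493775.8         # 4.32E+08 over 1 df
ms_res = 7617618.586         # 3.66E+08 over 48 df
print(round(ms_reg / ms_res, 3))   # about 56.775, as reported
```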

Page 41: Regression analysis by Muthama JM - JKUAT

Part of the Regression Equation

• b represents the slope of the line.
• It is calculated by dividing the change in the dependent variable by the change in the independent variable.
• The difference between the actual value of Y and the calculated amount is called the residual.
• The residual represents how much error there is in the prediction of the regression equation for the y value of any individual case as a function of X.
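A tiny sketch computing the residual for each case in the deck's worked example, using the fitted line Y = 0.66 + 1.07X:

```python
# Residuals for the worked example: actual Y minus the value the line predicts
X = [3, 2, 7, 4, 8]
Y = [6, 1, 8, 5, 9]
a, b = 0.66, 1.07
residuals = [y - (a + b * x) for x, y in zip(X, Y)]
print([round(e, 2) for e in residuals])   # one error term per case
```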