MULTIPLE REGRESSION ANALYSIS: ESTIMATION
Huseyin Tastan
Yıldız Technical University, Department of Economics
These presentation notes are based on Introductory Econometrics: A Modern Approach (2nd ed.) by J. Wooldridge.
17 October 2012
Econometrics I: Multiple Regression: Estimation - H. Tastan 1
Multiple Regression Analysis
In simple regression analysis, with only one explanatory variable, the assumption SLR.3 is generally very restrictive and unrealistic.
Recall that the key assumption was SLR.3: all other factors affecting y are uncorrelated with x (ceteris paribus).
Multiple regression analysis allows us to explicitly control for many other factors that simultaneously affect y.
By adding new explanatory variables we can explain more of the variation in y. In other words, we can build more successful models.
An additional advantage: multiple regression can incorporate general functional-form relationships.
Multiple Regression Analysis Examples
The Model with Two Explanatory Variables
y = β0 + β1x1 + β2x2 + u
Wage Equation
wage = β0 + β1educ + β2exper + u
wage: hourly wage (in US dollars); educ: education (in years); exper: experience (in years)
β1: measures the impact of education on wage, holding all other factors fixed.
β2: measures the ceteris paribus effect of experience on wage.
This wage equation allows us to measure the impact of education on wage holding experience fixed. This was not possible in simple regression analysis.
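The two-regressor wage equation can be estimated by OLS from any sample with these three columns. A minimal sketch using simulated (hypothetical) data and illustrative coefficient values, not a real wage dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: educ and exper are correlated, as in real samples
educ = rng.normal(13, 2, n)
exper = rng.normal(10, 4, n) - 0.3 * (educ - 13)
u = rng.normal(0, 2, n)
wage = 1.0 + 0.8 * educ + 0.2 * exper + u  # "true" coefficients chosen for illustration

# OLS: regress wage on a constant, educ, and exper
X = np.column_stack([np.ones(n), educ, exper])
b0, b1, b2 = np.linalg.lstsq(X, wage, rcond=None)[0]

# b1 estimates the effect of education holding experience fixed
print(b0, b1, b2)
```

Because exper enters the regression, b1 is the education effect with experience held fixed, which a simple regression of wage on educ alone cannot deliver.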
Multiple Regression Analysis Examples
Student Success and Family Income
avgscore = β0 + β1expend + β2avginc + u
avgscore: average standardized test score; expend: education spending per student; avginc: average family income
If avginc is not included in the model directly, its effect will be absorbed into the error term, u.
Because average family income is generally correlated with education expenditures, the key assumption SLR.3 will be invalid: x (expend) will be correlated with u, leading to biased OLS estimators.
Multiple Regression Analysis
Multiple regression analysis allows us to use more general functional forms.
Consider the following quadratic model of consumption:
cons = β0 + β1inc + β2inc² + u
where x1 = inc and x2 = inc².
β1 no longer measures the ceteris paribus effect of inc on cons: we cannot hold inc² fixed while changing inc.
The marginal propensity to consume (MPC) is approximated by:
∆cons/∆inc ≈ β1 + 2β2inc
The MPC depends on the level of income.
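The income-dependent MPC can be evaluated directly from the approximation above. A sketch with hypothetical coefficient values (0.8 and −0.002 are illustrative, not estimates):

```python
# Hypothetical estimated coefficients for cons = b0 + b1*inc + b2*inc^2
b1, b2 = 0.8, -0.002

def mpc(inc):
    """Approximate marginal propensity to consume at income level inc."""
    return b1 + 2 * b2 * inc

# With b2 < 0, the MPC falls as income rises
print(mpc(10), mpc(100))  # 0.76 at inc = 10, 0.4 at inc = 100
```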
Multiple Regression Analysis
The Model with Two Explanatory Variables
y = β0 + β1x1 + β2x2 + u
“Zero conditional mean” assumption:
E(u|x1, x2) = 0
In other words, for all combinations of x1 and x2, the expected value of the unobservables, u, is zero.
E.g., in the wage equation:
E(u|educ, exper) = 0
This means that unobservable factors affecting wage are, on average, unrelated to educ and exper.
If ability is part of u, then average ability levels must be the same across all combinations of education and experience in the population.
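The zero conditional mean assumption can be illustrated by simulation: when u is drawn independently of the regressors, its sample mean and its sample correlation with each regressor are near zero. This is a hypothetical setup showing what the assumption requires, not a test of it on real data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

educ = rng.normal(13, 2, n)
exper = rng.normal(10, 4, n)
u = rng.normal(0, 1, n)  # drawn independently of educ and exper

# Sample analogues of E(u), Corr(u, educ), Corr(u, exper) are all near zero
print(u.mean())
print(np.corrcoef(u, educ)[0, 1])
print(np.corrcoef(u, exper)[0, 1])
```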
Multiple Regression Analysis
The model with two independent variables
E(u|x1, x2) = 0
Test scores and average family income:
E(u|expend, avginc) = 0
All other factors affecting average test scores (such as school quality, student characteristics, etc.) are, on average, unrelated to education expenditures and family income.
In the quadratic consumption model:
E(u|inc, inc²) = E(u|inc) = 0
Since inc² is automatically known once inc is known, we do not need to write it in the conditional expectation.
The Model with k Independent Variables
Model
y = β0 + β1x1 + β2x2 + … + βkxk + u
In this model we have k explanatory variables and k + 1 unknown β parameters.
The definition of the error term is the same: it represents all other factors affecting y that are not included in the model.
The Model with k Explanatory Variables
βj: measures the change in y in response to a unit change in xj, holding all other x's and the unobservable factors u fixed.
For example:
log(salary) = β0 + β1 log(sales) + β2ceoten + β3ceoten² + u
ceoten: tenure with the current company (years)
Interpretation of β1: the elasticity of salary with respect to sales. Ceteris paribus, salary will change by β1% in response to a 1% change in sales.
β2: does not by itself measure the impact of ceoten; we need to take the quadratic term into account.
The Model with k Explanatory Variables
Zero Conditional Mean Assumption
E(u|x1, x2, . . . , xk) = 0
The error term is unrelated to the explanatory variables.
If u is correlated with any of the x's, then the OLS estimators will in general be biased. Estimation results will be unreliable.
If important variables affecting y are omitted, this assumption may not hold. This can lead to "omitted variable bias".
This assumption also means that the functional form is correctly specified.
The Model with k Explanatory Variables: OLS Estimation
Sample Regression Function - SRF
y = β0 + β1x1 + β2x2 + … + βkxk
Ordinary Least Squares (OLS) estimators minimize the Sum of Squared Residuals (SSR):
OLS Objective Function
∑ᵢ₌₁ⁿ ui² = ∑ᵢ₌₁ⁿ (yi − β0 − β1xi1 − β2xi2 − … − βkxik)²
The OLS estimators, βj, j = 0, 1, 2, …, k, can be found by solving the First Order Conditions (FOC): k + 1 equations in k + 1 unknowns.
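That the OLS solution minimizes the SSR can be checked numerically: perturbing any coefficient away from the OLS estimate increases the sum of squared residuals. A sketch with simulated data (the coefficients 1, 2, −1 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def ssr(b):
    """Sum of squared residuals at coefficient vector b."""
    e = y - X @ b
    return e @ e

b_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Any perturbation of the OLS solution raises the SSR
b_pert = b_hat + np.array([0.1, 0.0, 0.0])
print(ssr(b_hat) < ssr(b_pert))  # True
```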
The Model with k Explanatory Variables: OLS Estimation
OLS First Order Conditions
∑ᵢ₌₁ⁿ (yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
∑ᵢ₌₁ⁿ xi1(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
∑ᵢ₌₁ⁿ xi2(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
⋮
∑ᵢ₌₁ⁿ xik(yi − β0 − β1xi1 − β2xi2 − … − βkxik) = 0
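In matrix form the k + 1 first-order conditions are the normal equations X'Xβ = X'y, so the estimates can be computed with one linear solve. A numpy sketch on simulated data (coefficient values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + k regressors
beta_true = np.array([1.0, 0.5, -0.3, 2.0])                 # chosen for illustration
y = X @ beta_true + rng.normal(size=n)

# Solve the k + 1 normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# The FOCs say each column of X is orthogonal to the residuals
resid = y - X @ beta_hat
print(np.max(np.abs(X.T @ resid)))  # ~0 up to floating-point error
```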
The Model with k Explanatory Variables: OLS Estimation
OLS and Method of Moments
The OLS first-order conditions can also be obtained by the method of moments: they are the sample counterparts of the following population moment conditions:
E(u) = 0
E(x1u) = 0
E(x2u) = 0
⋮
E(xku) = 0
Interpretation of SRF
The model with 2 explanatory variables
y = β0 + β1x1 + β2x2
The slope parameter estimates, βj, give us the predicted change in y given changes in the x variables, ceteris paribus.
β1: holding x2 fixed, i.e. ∆x2 = 0
∆y = β1∆x1
Similarly, holding x1 fixed
∆y = β2∆x2
Example: Determinants of College Success, gpa1.gdt
colGPA Estimation Results
colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT
n = 141 students; colGPA: university grade point average (GPA, points out of 4); hsGPA: high school grade point average; ACT: achievement test score
The intercept is estimated as β0 = 1.29. This gives the average colGPA when hsGPA = 0 and ACT = 0. Because there is no observation with ACT = 0 and hsGPA = 0, this interpretation is not meaningful.
The slope estimate on hsGPA is 0.453. Holding ACT fixed, a one-point increase in high school GPA is predicted to raise college GPA by 0.453 points, almost half a point.
Example: Determinants of College Success, gpa1.gdt
colGPA Estimation Results
colGPA = 1.29 + 0.453 hsGPA+ 0.0094 ACT
n = 141 students, colGPA: university grade point average (GPA,points out of 4), hsGPA: high school grade point average, ACT :achievement test score
Alternative interpretation of 0.453
If we choose two students, A and B, with the same ACT score, but student A's hsGPA is one point higher than student B's, then we predict that student A's college GPA is 0.453 points higher than student B's.
ACT has a positive sign, but its effect is very small.
Example: Determinants of College Success, gpa1.gdt
Regression with only ACT variable:
colGPA simple regression results
colGPA = 2.4 + 0.0271 ACT
The coefficient estimate on ACT is almost three times as large as the previous estimate.
But this regression does not allow us to compare two students with the same high school GPA.
The impact of ACT decreases considerably after controlling for high school GPA.
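The pattern in this example (the ACT coefficient shrinking once hsGPA is controlled for) can be reproduced in simulation: when two regressors are positively correlated and both matter, the simple-regression slope absorbs part of the omitted variable's effect. A hypothetical sketch, not the gpa1.gdt data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

hs = rng.normal(3.0, 0.4, n)
act = 24 + 5 * (hs - 3.0) + rng.normal(0, 2, n)          # ACT correlated with hsGPA
col = 1.3 + 0.45 * hs + 0.01 * act + rng.normal(0, 0.3, n)

# Simple regression of col on act alone
Xs = np.column_stack([np.ones(n), act])
slope_simple = np.linalg.lstsq(Xs, col, rcond=None)[0][1]

# Multiple regression controlling for hs
Xm = np.column_stack([np.ones(n), act, hs])
slope_multi = np.linalg.lstsq(Xm, col, rcond=None)[0][1]

# The simple slope is noticeably larger: it picks up part of hs's effect
print(slope_simple, slope_multi)
```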
Interpretation of Estimated Regression Function
SRF - Sample Regression Function
ŷ = β̂0 + β̂1x1 + β̂2x2 + . . . + β̂kxk
In terms of changes:
∆ŷ = β̂1∆x1 + β̂2∆x2 + . . . + β̂k∆xk
Interpretation of the coefficient on x1, β̂1: holding all other variables fixed, i.e. ∆x2 = 0, ∆x3 = 0, . . ., ∆xk = 0,
∆ŷ = β̂1∆x1
Holding all other independent variables fixed (i.e. controlling for all other x variables), the change in ŷ given a one-unit change in x1 is β̂1 (in y's units of measurement).
Example: Logarithmic Wage Equation
Estimated equation
log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
n = 526 workers.
Functional form: log-level; 100·β̂j gives the approximate percentage change in wage.
For example, holding experience and tenure fixed, another year of education is predicted to increase wage by about 9.2%.
In other words, given two workers with the same levels of experience and tenure, if one worker has one more year of education than the other, then the predicted difference between their wages is 9.2%.
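The 9.2% figure relies on the usual log-level approximation 100·β̂. A quick arithmetic check of how close that approximation is to the exact percentage change implied by the model (only the estimated coefficient from the slide is used):

```python
import numpy as np

# Log-level model: the slide reads 100*beta_j as the % change in wage.
# The exact proportionate change for a one-unit increase is exp(beta_j) - 1.
beta_educ = 0.092
approx_pct = 100 * beta_educ                 # approximation: 9.2%
exact_pct = 100 * (np.exp(beta_educ) - 1)    # exact: about 9.64%
print(round(approx_pct, 2), round(exact_pct, 2))
```

For small coefficients the two agree closely; the gap grows with the magnitude of the coefficient.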
Changing More than One Explanatory Variable Simultaneously
Sometimes we want to change more than one x variable at the same time to find their combined impact on y.
Also, in some cases, when one of the x variables changes, another changes automatically.
For example, in the wage equation, when tenure increases by 1 year, experience also increases by 1 year.
In this case the total effect on wage is 2.61%:
∆log(wage) = 0.0041 ∆exper + 0.022 ∆tenure
= 0.0041 + 0.022 = 0.0261
OLS Fitted Values and Residuals
Fitted (predicted) value for observation i:
ŷi = β̂0 + β̂1xi1 + β̂2xi2 + . . . + β̂kxik
Residual for observation i:
ûi = yi − ŷi
ûi > 0 implies yi > ŷi: underprediction.
ûi < 0 implies yi < ŷi: overprediction.
Algebraic Properties of Residuals
The sum (and also the average) of the OLS residuals is zero:
∑ᵢ₌₁ⁿ ûi = 0, so the sample average of the residuals is zero.
This follows directly from the first moment condition.
The sample covariance between the OLS residuals and each xj is zero:
∑ᵢ₌₁ⁿ xij ûi = 0, j = 1, 2, . . . , k
This also follows directly from the population moment conditions; it is imposed by the OLS first-order conditions.
The point of sample means (x̄1, x̄2, . . . , x̄k, ȳ) is always on the OLS regression line:
ȳ = β̂0 + β̂1x̄1 + β̂2x̄2 + . . . + β̂kx̄k
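These algebraic properties are easy to verify numerically. A minimal numpy sketch on simulated data (the data-generating process, seed, and sample size below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

# OLS of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ beta

# FOC consequences: residuals sum to zero and are orthogonal to each x_j
print(abs(uhat.sum()) < 1e-8)                       # True
print(abs((x1 * uhat).sum()) < 1e-8)                # True
print(abs((x2 * uhat).sum()) < 1e-8)                # True

# The point of sample means lies on the regression line
print(abs(y.mean() - X.mean(axis=0) @ beta) < 1e-10)  # True
```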
An Alternative Derivation of Coefficient Estimates
The model with two explanatory variables:
ŷ = β̂0 + β̂1x1 + β̂2x2
The slope estimator on x1 is:
β̂1 = (∑ᵢ₌₁ⁿ r̂i1 yi) / (∑ᵢ₌₁ⁿ r̂i1²)
where r̂i1 is the residual obtained from the regression of x1 on x2:
xi1 = α̂0 + α̂1xi2 + r̂i1
This means that β̂1 is the slope estimate obtained from regressing y on the residuals r̂i1:
yi = β̂0 + β̂1r̂i1 + residual
Ceteris paribus (partialling out, netting out): β̂1 measures the sample relationship between y and x1 after x2 has been partialled out.
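The partialling-out result can be checked numerically. A sketch with simulated data (the variable names and data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)      # x1 correlated with x2
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Multiple regression of y on (1, x1, x2): coefficient on x1
b_full = ols(np.column_stack([np.ones(n), x1, x2]), y)

# Step 1: residuals r1 from regressing x1 on (1, x2)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ ols(Z, x1)

# Step 2: slope of y on r1 reproduces the multiple-regression coefficient
b1_partial = (r1 * y).sum() / (r1 ** 2).sum()
print(abs(b_full[1] - b1_partial) < 1e-8)   # True: the two estimates coincide
```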
Comparison of Simple and Multiple Regression Estimates
Two models:
ỹ = β̃0 + β̃1x1 vs. ŷ = β̂0 + β̂1x1 + β̂2x2
These regressions will generally give different estimates of the slope on x1.
There are two special cases in which they give the same estimate:
The partial effect of x2 on y is zero, i.e. β̂2 = 0.
x1 and x2 are uncorrelated in the sample.
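A simulated illustration of the two regressions (the data-generating process and seed are arbitrary): when the regressors are correlated and x2 matters, the simple and multiple slope estimates differ; forcing the sample correlation between x1 and x2 to be exactly zero makes them coincide.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500

def slope_on_x1(y, *xs):
    # OLS slope on the first regressor, with an intercept
    X = np.column_stack([np.ones(len(y))] + list(xs))
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Case 1: x1 correlated with x2 and beta2 != 0 -> estimates differ
x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(size=n)
print(slope_on_x1(y, x1), slope_on_x1(y, x1, x2))

# Case 2: make x1 exactly uncorrelated with x2 in the sample -> estimates agree
Z = np.column_stack([np.ones(n), x2])
x1_perp = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
y2 = 1 + 2 * x1_perp + 3 * x2 + rng.normal(size=n)
print(abs(slope_on_x1(y2, x1_perp) - slope_on_x1(y2, x1_perp, x2)) < 1e-8)  # True
```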
Goodness-of-Fit and Sums of Squares
SST: the total variation in y:
SST = ∑ᵢ₌₁ⁿ (yi − ȳ)²
Note that Var(y) = SST/(n − 1).
SSE is the explained variation:
SSE = ∑ᵢ₌₁ⁿ (ŷi − ȳ)²
SSR is the unexplained variation:
SSR = ∑ᵢ₌₁ⁿ ûi²
Thus the total variation in y can be written as:
SST = SSE + SSR
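The decomposition can be verified numerically; a minimal sketch on simulated data (DGP and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 - 0.7 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - yhat

SST = ((y - y.mean()) ** 2).sum()
SSE = ((yhat - y.mean()) ** 2).sum()   # with an intercept, mean(yhat) = mean(y)
SSR = (uhat ** 2).sum()
print(abs(SST - (SSE + SSR)) < 1e-8)   # True: SST = SSE + SSR
```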
Goodness-of-fit
Dividing both sides of this expression by SST:
1 = SSE/SST + SSR/SST
Coefficient of determination: the ratio of the explained variation to the total variation:
R² = SSE/SST = 1 − SSR/SST
Since SSE can never be larger than SST, we have 0 ≤ R² ≤ 1.
R² measures the fraction of the sample variation in y that can be explained by the variation in the x variables. It can be used as a goodness-of-fit measure.
R² can also be calculated as the squared correlation between the actual and fitted values: R² = Corr(y, ŷ)².
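The two formulas for R² can be checked side by side on simulated data (DGP and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 120
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

SST = ((y - y.mean()) ** 2).sum()
SSR = ((y - yhat) ** 2).sum()
r2_ssr = 1 - SSR / SST                       # R^2 = 1 - SSR/SST
r2_corr = np.corrcoef(y, yhat)[0, 1] ** 2    # R^2 = Corr(y, yhat)^2
print(abs(r2_ssr - r2_corr) < 1e-10)         # True: the two formulas agree
```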
Goodness-of-fit
Coefficient of determination:
R² = SSE/SST = 1 − SSR/SST
When a new x variable is added to the regression, R² always increases (or stays the same); it never decreases.
The reason is that adding a new variable can never increase SSR.
For this reason R² may not be a good indicator for model selection.
Instead of R² we will use the adjusted R² for model selection.
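The monotonicity of R² is easy to demonstrate: add a regressor that is pure noise and R² still does not fall. A sketch on simulated data (DGP and seed are illustrative); the adjusted R², shown for comparison, penalizes the extra parameter.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
y = 1 + x1 + rng.normal(size=n)
junk = rng.normal(size=n)            # irrelevant regressor, unrelated to y

def r2(X, y):
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - (u ** 2).sum() / ((y - y.mean()) ** 2).sum()

X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x1, junk])
r2_small, r2_big = r2(X1, y), r2(X2, y)
print(r2_big >= r2_small)            # True: R^2 never decreases

# Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1) penalizes extra regressors
adj = lambda R2, k: 1 - (1 - R2) * (n - 1) / (n - k - 1)
print(round(adj(r2_small, 1), 3), round(adj(r2_big, 2), 3))
```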
Goodness-of-fit: Example
College GPA
colGPA = 1.29 + 0.453 hsGPA + 0.0094 ACT
n = 141, R² = 0.176
The coefficient of determination is 0.176.
17.6% of the total variation in college GPA can be explained by the variation in hsGPA and ACT.
This is not very large, because there are many relevant factors not included in the model.
Regression through the Origin
Predicted y is 0 when all the xj are 0:
ỹ = β̃1x1 + β̃2x2 + . . . + β̃kxk
Note that the regression does not have a constant term.
R² can be negative in this regression. In that case R² is either treated as 0, or the regression is re-estimated after adding an intercept term.
R² < 0 means that the sample average of y, ȳ, is more successful in explaining the total variation than the model including the x variables.
If the constant term in the PRF is different from 0, then the OLS estimators in the regression through the origin will be biased.
On the other hand, adding a constant term when the true intercept is 0 increases the variances of the OLS estimators.
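A tiny hand-made example of a negative R² in a through-origin regression (the numbers are invented for illustration): y is nearly constant around 10 and far from the origin, so ȳ fits far better than any line forced through (0, 0).

```python
import numpy as np

# y is essentially flat and far from the origin
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 10.2, 9.8, 10.1])

b = (x * y).sum() / (x ** 2).sum()   # through-origin OLS slope
SSR = ((y - b * x) ** 2).sum()
SST = ((y - y.mean()) ** 2).sum()
r2 = 1 - SSR / SST
print(r2 < 0)                        # True: the model fits worse than ybar alone
```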
Assumptions for Unbiasedness of OLS Estimators
MLR.1 Linear in Parameters
y = β0 + β1x1 + β2x2 + . . .+ βkxk + u
The population model is linear in the parameters.
MLR.2 Random Sampling
We have a random sample of n observations drawn from the population model defined in MLR.1:
{(xi1, xi2, . . . , xik, yi) : i = 1, 2, . . . , n}
Assumptions for Unbiasedness of OLS Estimators
MLR.3 Zero Conditional Mean
E(u|x1, x2, . . . , xk) = 0
This assumption states that the explanatory variables are strictly exogenous: the error term has zero mean given any values of the explanatory variables, and is therefore uncorrelated with each of them.
There are several ways this assumption can fail:
1. Functional form misspecification: if the functional relationship between the explained and explanatory variables is misspecified, this assumption can fail.
2. Omitting an important variable that is correlated with any of the explanatory variables.
3. Measurement error in the explanatory variables.
If this assumption fails, we have endogenous explanatory variables.
Assumptions for Unbiasedness of OLS Estimators
MLR.4 No Perfect Collinearity
There is no perfect linear relationship among the independent x variables: none of the x variables can be written as an exact linear combination of the others. If this assumption fails, we have perfect collinearity.
If the x variables are perfectly collinear, it is not possible to mathematically determine the OLS estimators.
This assumption allows the independent variables to be correlated; they just cannot be perfectly correlated.
If we did not allow for any correlation among the independent variables, then multiple regression would be of very limited use for econometric analysis.
For example, in the regression of student GPA on education expenditures and family income, we suspect that expenditure and income may be correlated, and so we would like to hold income fixed to find the impact of expenditure on GPA.
Econometrics I: Multiple Regression: Estimation - H. Tastan 32
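The failure of MLR.4 can be seen mechanically: with perfectly collinear regressors the design matrix loses rank, so the OLS normal equations have no unique solution. A minimal numpy sketch (the data-generating process is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 3 * x1 + 2  # exact linear function of x1: perfect collinearity

# design matrix with intercept, x1, and the collinear x2
X = np.column_stack([np.ones(n), x1, x2])

rank = np.linalg.matrix_rank(X)
print(rank)  # 2 < 3: X'X is singular, so the OLS estimators are not uniquely determined
```

With rank 2 instead of 3, X'X cannot be inverted and no software can separate the effect of x1 from that of x2.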
Finite Sample Properties of OLS Estimators
THEOREM: Unbiasedness of OLS
Under Assumptions MLR.1 through MLR.4, the OLS estimators are unbiased:
E(β̂j) = βj , j = 0, 1, 2, . . . , k
The centers of the sampling distributions (i.e., the expectations) of the OLS estimators are equal to the unknown population parameters. Remember that an estimate cannot be unbiased: it is just a number based on a particular sample, and in general we cannot say anything about its proximity to the true value.
Econometrics I: Multiple Regression: Estimation - H. Tastan 33
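The theorem can be illustrated by simulation: draw many random samples from a model satisfying MLR.1 through MLR.4, run OLS on each, and average the estimates. A numpy sketch with made-up true parameters (the DGP is illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
beta = np.array([1.0, 0.5, -2.0])  # illustrative true (beta0, beta1, beta2)
n, reps = 200, 2000

estimates = np.empty((reps, 3))
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    u = rng.normal(size=n)  # E(u | x1, x2) = 0 holds by construction (MLR.3)
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

# the average of the OLS estimates across samples is close to the true betas
print(estimates.mean(axis=0))
```

Any single run of the loop gives an estimate that misses the truth; it is the center of the sampling distribution that equals the true parameters.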
Including Irrelevant Variables in a Regression Model
What happens if we add an irrelevant variable to the model (overspecifying the model)?
Irrelevance of the variable means that its coefficient in the population model is zero.
E.g., suppose that in the regression below the partial effect of x3 is zero, β3 = 0:
y = β0 + β1x1 + β2x2 + β3x3 + u
Taking the conditional expectation we have:
E(y|x1, x2, x3) = E(y|x1, x2) = β0 + β1x1 + β2x2
Even though the true model is the one given above, x3 is added to the estimated model by mistake.
Econometrics I: Multiple Regression: Estimation - H. Tastan 34
Including Irrelevant Variables in a Regression Model
In this case the SRF is given by
ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3
The OLS estimators are still unbiased:
E(β̂0) = β0, E(β̂1) = β1, E(β̂2) = β2, E(β̂3) = β3 = 0
The true parameter value for the irrelevant variable is 0. Since this variable has no explanatory power in the population, the expected value of its OLS estimator is also zero; i.e., the estimator is unbiased.
However, even though the estimators remain unbiased, their sampling variances will generally be larger if the model is overspecified.
Econometrics I: Multiple Regression: Estimation - H. Tastan 35
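Both claims on this slide can be checked by simulation: the estimator of β1 stays centered on the truth whether or not the irrelevant x3 is included, but its spread grows when x3 is correlated with x1. A numpy sketch under an illustrative DGP (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 3000
b0, b1 = 1.0, 2.0  # true model: y = b0 + b1*x1 + u, so x3 is irrelevant

b1_short, b1_long = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    # irrelevant regressor, deliberately correlated with x1
    x3 = 0.9 * x1 + rng.normal(scale=0.5, size=n)
    y = b0 + b1 * x1 + rng.normal(size=n)
    Xs = np.column_stack([np.ones(n), x1])      # correct specification
    Xl = np.column_stack([np.ones(n), x1, x3])  # overspecified model
    b1_short.append(np.linalg.lstsq(Xs, y, rcond=None)[0][1])
    b1_long.append(np.linalg.lstsq(Xl, y, rcond=None)[0][1])

b1_short, b1_long = np.array(b1_short), np.array(b1_long)
print(b1_short.mean(), b1_long.mean())  # both close to 2.0: still unbiased
print(b1_short.var() < b1_long.var())   # True: overspecification costs precision
```

The variance inflation appears only because x3 is correlated with x1; with an uncorrelated irrelevant regressor the two variances would be essentially the same.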
Omitting a Relevant Variable
What happens if we exclude an important variable?
If a relevant variable is omitted, its parameter is not 0 in the PRF. This is called underspecification of the model.
In this case the OLS estimators will generally be biased.
E.g. suppose that the PRF includes 2 independent variables:
y = β0 + β1x1 + β2x2 + u
Suppose that we omit x2 because, say, it is unobservable. Now the SRF is
ŷ = β̂0 + β̂1x1
Is the OLS estimator of the coefficient on x1, β̂1, still unbiased?
Econometrics I: Multiple Regression: Estimation - H. Tastan 36
Omitting a Relevant Variable
The impact of the omitted variable will be absorbed into the error term:
y = β0 + β1x1 + ν
The true PRF includes x2; thus the error term ν can be written as:
ν = β2x2 + u
The OLS estimator of β1 in the model above is:
β̂1 = Σ(xi1 − x̄1)yi / Σ(xi1 − x̄1)², with sums running over i = 1, . . . , n
To determine the magnitude and sign of the bias, we substitute the expression for y into this formula, rearrange, and take expectations.
Econometrics I: Multiple Regression: Estimation - H. Tastan 37
Omitting a Relevant Variable
β̂1 = Σ(xi1 − x̄1)(β0 + β1xi1 + β2xi2 + ui) / Σ(xi1 − x̄1)²
= β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²] + Σ(xi1 − x̄1)ui / Σ(xi1 − x̄1)²
Taking the (conditional) expectation, the last term vanishes because E(ui) = 0 by MLR.3, and we obtain
E(β̂1) = β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²]
Econometrics I: Multiple Regression: Estimation - H. Tastan 38
Omitting a Relevant Variable
E(β̂1) = β1 + β2 [Σ(xi1 − x̄1)xi2 / Σ(xi1 − x̄1)²]
The expression in brackets multiplying β2 is just the OLS slope from the simple regression of x2 on x1:
x̂2 = δ̂0 + δ̂1x1
Thus
E(β̂1) = β1 + β2δ̂1
bias = E(β̂1) − β1 = β2δ̂1
This is called omitted variable bias.
Econometrics I: Multiple Regression: Estimation - H. Tastan 39
Omitted Variable Bias
bias = E(β̂1) − β1 = β2δ̂1
The bias is zero if and only if δ̂1 = 0 or β2 = 0.
The sign of the bias depends on both the sign of β2 and the correlation between the omitted variable (x2) and the included variable (x1).
This correlation cannot be computed if the omitted variable is unobserved.
The following table summarizes possible cases:
Direction of Bias
              Corr(x1, x2) > 0    Corr(x1, x2) < 0
β2 > 0        positive bias       negative bias
β2 < 0        negative bias       positive bias
Econometrics I: Multiple Regression: Estimation - H. Tastan 40
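The bias formula and the sign table can be checked numerically: regress y on x1 alone when the true model also contains x2, and compare the average estimate with β1 + β2δ̂1. A numpy sketch with illustrative parameters (β2 > 0 and Corr(x1, x2) > 0, so the table predicts positive bias):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 3000
b0, b1, b2 = 0.5, 1.0, 3.0  # illustrative true parameters

est, delta1 = np.empty(reps), np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.6 * x1 + rng.normal(size=n)  # Corr(x1, x2) > 0 by construction
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]      # short regression: x2 omitted
    delta1[r] = np.linalg.lstsq(X, x2, rcond=None)[0][1]  # slope of x2 on x1

bias = est.mean() - b1
print(bias, b2 * delta1.mean())  # both close to 3.0 * 0.6 = 1.8: positive, as the table predicts
```

Flipping the sign of either the 0.6 (the correlation) or b2 flips the sign of the bias, reproducing the other cells of the table.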
Omitted Variable Bias
bias = E(β̂1) − β1 = β2δ̂1
The size of the bias also matters; it depends on both δ̂1 and β2.
A bias that is small relative to β1 may not be a problem in practice, but in most cases we cannot calculate the size of the bias.
In some cases we may have an idea about the direction of the bias. For example, suppose that in the wage equation the true PRF contains both education and ability.
Suppose also that ability is omitted because it cannot be observed, leading to omitted variable bias.
In this case we can say that the sign of the bias is positive, because it is reasonable to think that people with more ability tend to have higher levels of education, and ability is positively related to wages.
Econometrics I: Multiple Regression: Estimation - H. Tastan 41
Omitted Variable Bias
The effect of the omitted variable ends up in the error term, so MLR.3 fails.
wage = β0 + β1educ+ β2ability + u
Instead of this model we estimate
wage = β0 + β1educ+ ν
ν = β2ability + u
Education will be correlated with the error term ν:
E(ν|educ) ≠ 0
Education is endogenous. If we omit ability, the effect of education on wages will be overestimated: part of the measured effect of education on wages actually comes from ability.
Econometrics I: Multiple Regression: Estimation - H. Tastan 42
Omitted Variable Bias
In models with more than two independent variables, omitting a relevant variable generally causes all of the OLS estimators to be biased.
Suppose the true PRF is given by:
y = β0 + β1x1 + β2x2 + β3x3 + u
x3 is omitted and the following model is estimated:
ŷ = β̂0 + β̂1x1 + β̂2x2
Also suppose that x3 is correlated with x1 but uncorrelated with x2.
In this case both β̂1 and β̂2 will generally be biased; β̂2 is unbiased only if x1 and x2 are also uncorrelated.
Econometrics I: Multiple Regression: Estimation - H. Tastan 43
The Variance of the OLS Estimators
Assumption MLR.5: Homoscedasticity
This assumption states that, conditional on the x variables, the error term has constant variance:
Var(u|x1, x2, . . . , xk) = σ²
If this assumption fails, the model exhibits heteroscedasticity.
This assumption is essential for deriving the variances and standard errors of the OLS estimators and for showing that the OLS estimators are efficient.
We do not need this assumption for unbiasedness.
E.g., in the wage equation this assumption implies that the variance of the unobserved factors does not change with the factors included in the model (education, experience, tenure, etc.).
Econometrics I: Multiple Regression: Estimation - H. Tastan 44
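The point that unbiasedness does not require MLR.5 can also be illustrated by simulation: generate errors whose variance depends on x (heteroscedasticity) and check that the OLS slope remains centered on the truth. A numpy sketch with an illustrative DGP:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 3000
b0, b1 = 1.0, 0.5  # illustrative true parameters

est = np.empty(reps)
for r in range(reps):
    x = rng.uniform(1.0, 5.0, size=n)
    u = rng.normal(scale=x)  # Var(u | x) = x^2: MLR.5 fails, but E(u | x) = 0 still holds
    y = b0 + b1 * x + u
    X = np.column_stack([np.ones(n), x])
    est[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# still close to 0.5: unbiasedness needs only MLR.1 through MLR.4
print(est.mean())
```

What heteroscedasticity breaks is not the center of the sampling distribution but the usual formulas for its spread, which assume a constant σ².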
The Variance of the OLS Estimators
Gauss-Markov Assumptions
Assumptions MLR.1 through MLR.5 are called the Gauss-Markov assumptions:
MLR.1: Linear in parameters,
MLR.2: Random sampling,
MLR.3: Zero conditional mean,
MLR.4: No perfect collinearity,
MLR.5: Homoscedasticity.
These assumptions are stated for cross-sectional data. They need to be modified for time series data.
Assumptions MLR.3 and MLR.5 can be restated in terms of the dependent variable:
E(y|x1, x2, . . . , xk) = β0 + β1x1 + β2x2 + . . .+ βkxk
Var(y|x1, x2, . . . , xk) = Var(u|x1, x2, . . . , xk) = σ2
The Variance of the OLS Estimators
Theorem: Variances of β
Under Gauss-Markov assumptions (MLR.1-MLR.5):
Var(β̂j) = σ² / [SSTj(1 − R²j)],   j = 1, 2, . . . , k

where

SSTj = Σⁿᵢ₌₁ (xij − x̄j)²

is the total sample variation in xj, and R²j is the R-squared from regressing xj on all the other independent variables (including an intercept term).
Var(β̂j) moves in the same direction as σ² and in the opposite direction of SSTj.
To increase SSTj we need to collect more data (increase n).
To reduce σ² we need to find good explanatory variables.
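To make the formula concrete, a small numerical check (NumPy assumed; the data and names are hypothetical): for a model with an intercept and two correlated regressors, σ²/[SST1(1 − R²1)] reproduces the corresponding diagonal element of σ²(X′X)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 120, 2.0
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # correlated regressors
X = np.column_stack([np.ones(n), x1, x2])

# Matrix route: Var(b1) from sigma^2 * (X'X)^{-1}
var_matrix = sigma2 * np.linalg.inv(X.T @ X)[1, 1]

# Slide's formula for j = 1 (the x1 coefficient):
sst1 = np.sum((x1 - x1.mean()) ** 2)
# R^2_1: regress x1 on the other regressors (intercept and x2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fit) ** 2) / sst1
var_formula = sigma2 / (sst1 * (1.0 - r2_1))
```

The two routes agree exactly (up to floating-point error); the formula is just the (j, j) element of σ²(X′X)⁻¹ rewritten in scalar terms.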
The Variance of the OLS Estimators
Theorem: Variances of β
Under Gauss-Markov assumptions (MLR.1-MLR.5):
Var(β̂j) = σ² / [SSTj(1 − R²j)],   j = 1, 2, . . . , k
Var(β̂j) also depends on R²j, which measures the degree of correlation among the x variables.
We did not have this term in simple regression analysis because there was only one explanatory variable.
As the degree of correlation among the x variables increases, the variances of the OLS estimators get larger.
When there is a high level of collinearity among the x variables, the variances of the OLS estimators will be large. This is called the multicollinearity problem.
In the limit R²j = 1, in which case the variance is infinite (note that in this case β̂j is not defined). But MLR.4 rules this case out.
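The blow-up of the variance as R²j approaches 1 can be sketched with the variance inflation factor 1/(1 − R²j) (NumPy assumed; the ρ values and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
e = rng.normal(size=n)
x_base = rng.normal(size=n)

def vif_x1(rho):
    """VIF of x1 when x2 has (approximate) correlation rho with x1."""
    x1 = x_base
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * e
    # R^2 from regressing x1 on an intercept and x2
    Z = np.column_stack([np.ones(n), x2])
    fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    sst = np.sum((x1 - x1.mean()) ** 2)
    r2 = 1.0 - np.sum((x1 - fit) ** 2) / sst
    return 1.0 / (1.0 - r2)

vif_low, vif_high = vif_x1(0.1), vif_x1(0.99)   # near 1 vs. far above 1
```

With near-uncorrelated regressors the factor is close to 1; at ρ = 0.99 the variance of β̂1 is inflated many times over.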
Relationship between Var(β̂j) and R²j (figure)
The Variance of the OLS Estimators
Estimating Variances
An unbiased estimator of the error variance is:
σ̂² = (1/(n − k − 1)) Σⁿᵢ₌₁ ûᵢ² = SSR/(n − k − 1)
Degrees of freedom:

dof = n − (k + 1) = number of observations − number of parameters.

The dof comes from the OLS first-order conditions. Recall that there were k + 1 such conditions; in other words, k + 1 restrictions are imposed on the residuals.
This means that given n − (k + 1) of the residuals, the remaining k + 1 residuals are known: there are only n − k − 1 degrees of freedom in the residuals.
Notice that the error term u has n degrees of freedom.
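Both claims — the k + 1 restrictions on the residuals and the unbiasedness of SSR/(n − k − 1) — can be checked numerically. A sketch assuming NumPy, with a simulated model (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2 = 60, 2, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])

# (i) the k+1 first-order conditions: X' u_hat = 0
y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
b = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b
restrictions = X.T @ uhat                  # vector of k+1 (near-)zeros

# (ii) Monte Carlo check that SSR/(n-k-1) is unbiased for sigma^2
reps = 3000
est = np.empty(reps)
for r in range(reps):
    yr = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    ur = yr - X @ np.linalg.lstsq(X, yr, rcond=None)[0]
    est[r] = (ur @ ur) / (n - k - 1)
mean_sigma2_hat = est.mean()               # close to sigma2 = 4.0
```

Dividing by n instead of n − k − 1 would bias the estimator downward, precisely because the residuals are constrained by those k + 1 conditions.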
The Variance of the OLS Estimators
Standard deviations of βjs
sd(β̂j) = σ / √(SSTj(1 − R²j)),   j = 1, 2, . . . , k
Standard errors of βjs
se(β̂j) = σ̂ / √(SSTj(1 − R²j)),   j = 1, 2, . . . , k
σ̂ = √σ̂² is the standard error of the regression (SER).
The SER is an estimator of the standard deviation of the error term. The SER may increase or decrease when a new variable is added to the model.
se(β̂j) is used in constructing confidence intervals and in testing hypotheses.
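As a concrete check (NumPy assumed; data simulated for illustration), the standard-error formula above, with σ̂ the SER, matches the usual matrix expression σ̂ · √[(X′X)⁻¹]jj:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 80
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.5 * x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.7 * x1 - 0.3 * x2 + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
uhat = y - X @ b
k = X.shape[1] - 1
ser = np.sqrt(uhat @ uhat / (n - k - 1))   # sigma_hat, the SER

# Formula route for se(b1): ser / sqrt(SST_1 * (1 - R^2_1))
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fit) ** 2) / sst1
se_formula = ser / np.sqrt(sst1 * (1.0 - r2_1))

# Matrix route: ser * sqrt of the (1,1) diagonal element of (X'X)^{-1}
se_matrix = ser * np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
```

The only difference from the sd(β̂j) expression is that the unknown σ has been replaced by its estimate σ̂, which is why se(β̂j) is itself a random quantity.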
Efficiency of OLS
Gauss-Markov Theorem
Under assumptions MLR.1 through MLR.5, the OLS estimators are the best linear unbiased estimators (BLUE) of the unknown population parameters. In other words, under MLR.1-MLR.5, the OLS estimators β̂0, β̂1, β̂2, . . . , β̂k are the BLUEs of β0, β1, β2, . . . , βk.
The Gauss-Markov theorem provides a justification for OLS estimation.
When this theorem holds, we do not need to look for estimation methods other than OLS: the method of OLS gives us the most efficient (best, i.e., minimum-variance) linear unbiased estimators.
If any of the five assumptions fails, the Gauss-Markov theorem does not hold. When MLR.3 fails, the unbiasedness property is lost; when MLR.5 fails, efficiency is lost.
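The "best" part of BLUE can be illustrated by comparing OLS with another linear unbiased estimator. A Monte Carlo sketch (NumPy assumed; the grouping estimator and all parameter values are illustrative): both estimators are centered on the true slope, but OLS has the smaller sampling variance.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta1 = 100, 4000, 2.0
x = rng.uniform(0.0, 10.0, n)              # fixed regressor
hi = x > np.median(x)                      # split sample at the median of x
X = np.column_stack([np.ones(n), x])

ols = np.empty(reps)
grp = np.empty(reps)
for r in range(reps):
    y = 3.0 + beta1 * x + rng.normal(0.0, 1.0, n)
    ols[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    # grouping estimator: also linear in y and unbiased given x
    grp[r] = (y[hi].mean() - y[~hi].mean()) / (x[hi].mean() - x[~hi].mean())

bias_ols, bias_grp = ols.mean() - beta1, grp.mean() - beta1
var_ols, var_grp = ols.var(), grp.var()    # var_ols comes out smaller
```

Unbiasedness alone does not single out OLS; it is the minimum-variance property within the linear unbiased class that does.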
The Meaning of Linearity of Estimators in BLUE
Linearity of Estimators
An estimator β̃j of βj is linear if and only if it can be expressed as a linear function of the data on the dependent variable:

β̃j = Σⁿᵢ₌₁ wij yi

where each wij can be a function of the sample values of all the independent variables.
The OLS estimators can be written in this form:

β̂j = (Σⁿᵢ₌₁ r̂ij yi) / (Σⁿᵢ₌₁ r̂²ij) = Σⁿᵢ₌₁ wij yi,   where wij = r̂ij / Σⁿᵢ₌₁ r̂²ij

and r̂ij is the residual obtained from the regression of xj on all the other explanatory variables.
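The partialled-out representation can be verified directly (NumPy assumed; data simulated for illustration): the coefficient on x1 from the full regression equals Σ r̂i1 yi / Σ r̂²i1, i.e., a linear function of the yi:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n) + 0.4 * x1
X = np.column_stack([np.ones(n), x1, x2])
y = 2.0 - 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

b_full = np.linalg.lstsq(X, y, rcond=None)[0][1]       # coefficient on x1

# r_i1: residuals from regressing x1 on the other regressors (with intercept)
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
b_partial = (r1 @ y) / (r1 @ r1)

weights = r1 / (r1 @ r1)                               # the w_ij of the slide
b_weights = weights @ y                                # explicitly linear in y
```

The weights depend only on the regressors, which is exactly what the definition of a linear estimator requires.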