01 Regression Analysis
-
Basiskurs Finance
1. Regression Analysis
-
Please bring your laptop to the tutorials to follow along during the Excel exercises.
People:
Lecture: Dr. Nikolas Breitkopf ([email protected])
Tutorial: Janis Bauer ([email protected])
Grading:
60-minute exam. Exam date: 02.06.2014, 18:30-19:30 (please register for the exam via the LSF).
Course Overview
Date Topic
Tue. 08.04., 12-16   Regression Analysis Lecture
Tue. 15.04., 12-16   Regression Analysis Tutorial
Tue. 22.04.          No class (Easter holiday)
Tue. 29.04., 12-16   Event Studies Lecture
Tue. 06.05., 12-16   Event Studies Tutorial
Fri. 09.05., 14-18   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Lecture
Tue. 13.05., 12-16   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Tutorial
-
Core questions
What is an estimator?
Properties of estimators
Which problems result from violation of OLS assumptions?
Agenda
Motivation
Ordinary Least Squares (OLS)
Effects of violations of assumptions:
- Heteroscedasticity
- Correlation of the regressors with the error term (Endogeneity)
Fixed Effects Panel Estimation
REGRESSION ANALYSIS
Content
-
Basic literature:
- Barreto, H. and Howland, F. M.: Introductory Econometrics, Cambridge University Press, latest edition.
Additional literature:
- Johnston, J. and DiNardo, J.: Econometric Methods, McGraw-Hill, latest edition.
- Greene, W.: Econometric Analysis, Prentice Hall, latest edition.
Literature
-
What are regressions used for in the field of finance?
- Estimation of the beta of a stock
- Asset pricing tests
- Determinants of capital structure
- Event studies
- Determination of trends
- Forecasting

Definition: A regression estimates the linear relationship between independent variables (x) and the dependent variable (y).
The real relationship of the population is deduced from a sample.
Motivation
-
Dependent variable: Bid/Ask-Spread

                                  Stocks with        Stocks without
                                  Listed Options     Listed Options
Constant                           0.60*** (196.48)    4.23*** (1,015.57)
Naked Short Sale Ban (Dummy)       0.33*** (5.94)      1.40*** (12.24)
Covered Short Sale Ban (Dummy)     0.67*** (9.66)      2.14*** (25.95)
Disclosure Requirement (Dummy)    -0.20*** (-3.42)    -0.72*** (-6.54)
Stock-level Fixed Effects          Yes                 Yes
#Obs                               427,164             4,716,000
#Stocks                            1,306               15,185

(t-statistics in parentheses)
Example: Prohibition of naked short sales of stocks
Source: Beber/Pagano 2010, WP
-
The population is the true, data-generating process that determines the relationship between the variables of interest. Usually, one cannot observe the full population.
Statistical inference is the process of learning about the population from a random sample.
Population variables are assumed to be random variables, i.e. there is no deterministic relationship between variables.
Then, any statistic calculated from the sample is a random variable as well.
Inference from a Random Sample
-
Assume the population consists of a normally distributed random variable X ~ N(μ = 200, σ = 20).
Experiment: draw 8 random samples from the population, each with 10 observations.
- Calculate the mean of each sample.
- Calculate the mean of the experiments' means and its standard deviation.
- What can you learn about the true random variable?
The sample mean is the best estimator of the population mean. Since the population variable X is a random variable, so is the sample mean.
To learn about the population, you have to know the distribution of the estimator (here the mean) in repeated samples.
The standard error describes the uncertainty (standard deviation) of the estimator.
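The slides run this sampling experiment in Excel during the tutorial; as a sketch, here is the same experiment in Python/NumPy (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, not from the slides

mu, sigma = 200, 20              # population parameters from the slide
n_samples, n_obs = 8, 10         # 8 samples, 10 observations each

samples = rng.normal(mu, sigma, size=(n_samples, n_obs))
sample_means = samples.mean(axis=1)        # one mean per sample

mean_of_means = sample_means.mean()        # estimate of the population mean
se_empirical = sample_means.std(ddof=1)    # spread of the estimator across samples
se_theoretical = sigma / np.sqrt(n_obs)    # sigma / sqrt(n), about 6.32
```

The empirical spread of the eight sample means approximates the theoretical standard error of the mean.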
Illustrative Example: Inference on the Average of a Random Variable
-
Observation       E1      E2      E3      E4      E5      E6      E7      E8
1               176.27  185.11  223.63  207.93  181.65  254.48  197.73  233.67
2               185.75  182.41  222.41  214.81  202.57  189.30  195.00  164.47
3               200.09  215.10  212.60  187.94  236.40  217.17  234.46  212.61
4               201.17  203.69  193.54  182.40  188.33  202.42  199.96  218.67
5               232.02  197.46  214.99  208.36  196.82  216.74  199.91  166.11
6               202.67  225.78  219.02  165.95  189.38  168.10  210.96  221.77
7               210.55  236.46  246.32  191.26  219.76  202.98  198.10  227.80
8               203.00  202.90  218.72  218.34  228.23  203.15  210.51  194.88
9               166.32  179.36  219.29  197.47  204.82  194.76  184.78  173.09
10              221.19  162.50  209.35  194.19  201.30  199.27  151.77  199.35
Average         199.90  199.08  217.99  196.86  204.93  204.84  198.32  201.24
Std. Deviation   19.73   22.62   13.20   16.06   17.94   22.34   21.03   25.89

Mean (averages): 202.89
Standard error (across experiments): 6.75
Standard error (theoretical): 6.32
Illustrative Example (continued)

Standard error of the mean: $\sigma / \sqrt{n}$
-
Estimation
An estimator is a method to determine unknown parameters of a population with the help of a random sample from this population.
Estimator Properties (I)
[Figure: population distribution and the elements of a sample (here n = 10), showing the real population mean and the estimated population mean]
-
Desirable properties
Unbiasedness
- The expected value of the estimator is the true parameter.
Efficiency
- The sampling variance of the unbiased estimator is the smallest among all unbiased estimators.
- Example: OLS is the best linear unbiased estimator (BLUE).
Consistency
- An estimator (even a biased one) is consistent if it converges asymptotically to the true parameter.
Estimator Properties (II)

Unbiasedness example: $E(b) = \beta$.

Biased (but consistent) variance estimator:
$$s_1^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(s_1^2) = \frac{n-1}{n}\sigma^2 \neq \sigma^2, \qquad \lim_{n \to \infty} E(s_1^2) = \sigma^2$$

Unbiased variance estimator:
$$s_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(s_2^2) = \sigma^2$$
-
OLS
Estimation of the best straight line describing the relationship between x and y:
$$y = a + bx + e$$
Approach: minimization of the squared errors by choice of a and b:
$$RSS = \sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - a - bx_i)^2$$
Properties:
- Minimization of the residual sum of squares (RSS).
- The fitted line passes through the point $(\bar{x}, \bar{y})$.

OLS in the case of two-dimensionality
[Figure: scatter plot of y on x with the fitted line, intercept a, slope b, and two residuals $e_1$, $e_2$]
-
OLS in matrix notation
Population model:
$$y = X\beta + u, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,2} & \cdots & x_{1,k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$$
u describes the error term in the (unobserved) population, e the error term of the sample.
The vector of the sample residuals is:
$$e = y - Xb$$
The optimization problem is then as follows:
$$\min_b e'e = \min_b \left(y'y - 2b'X'y + b'X'Xb\right)$$
First order conditions (FOC):
$$\frac{\partial RSS}{\partial b} = -2X'y + 2X'Xb = 0 \quad \Rightarrow \quad \underbrace{b}_{k \times 1} = \underbrace{(X'X)^{-1}}_{k \times k}\underbrace{X'y}_{k \times 1}$$

OLS in the case of multidimensionality
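The FOC solution $b = (X'X)^{-1}X'y$ can be checked numerically; the data-generating values below (intercept 2, slope 3) are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # illustrative true coefficients

X = np.column_stack([np.ones(n), x])     # column of ones for the intercept
b = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X'X) b = X'y

e = y - X @ b                            # residual vector e = y - Xb
# The FOC imply X'e = 0 (up to floating-point error)
```

Using `np.linalg.solve` instead of explicitly inverting X'X is the numerically preferred way to evaluate the formula.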
-
Linear Algebra Basics

Transpose of a matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A' = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

Product of two matrices:
$$AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$

Identity matrix I:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad IA = AI = A$$

Properties of the inverse of a (square) matrix:
$$A^{-1}A = AA^{-1} = I$$
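A quick numerical check of these rules (the entries of A and B are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

assert np.allclose(A.T[0, 1], A[1, 0])        # transpose swaps indices
assert np.allclose((A @ B)[0, 0], 1*5 + 2*7)  # row-times-column rule

I2 = np.eye(2)                                # identity matrix
A_inv = np.linalg.inv(A)                      # exists since det(A) = -2 != 0
assert np.allclose(A_inv @ A, I2) and np.allclose(A @ A_inv, I2)
assert np.allclose(I2 @ A, A) and np.allclose(A @ I2, A)
```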
-
Matrix Multiplication: Typical Cases

$$(n \times k)\,(k \times m) = (n \times m)$$

Example    Shape          Interpretation
$e'e$      $1 \times 1$   Sum of squared elements of e (inner product)
$X\beta$   $n \times 1$   Linear combination of the columns of X
$X'X$      $k \times k$   Co-variation of the columns of X (second-moment matrix)
$uu'$      $n \times n$   Product of all combinations of the elements of u (outer product)
-
OLS estimates a linear function that passes through the means of X and y. To integrate the intercept into the estimation equation, a vector of ones has to be added to the matrix X.
What is the intercept?

Example (intercept-only regression with $y = (1, 2, 3)'$ and $X = (1, 1, 1)'$):
$$X'X = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 3, \qquad X'y = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 6$$
$$b_{OLS} = (X'X)^{-1}X'y = \tfrac{1}{3} \cdot 6 = 2 = \bar{y}$$
-
X has full rank k
A solution of the OLS problem is only possible if the matrix X'X is invertible, i.e. it has to be positive definite (all eigenvalues > 0).
X has to consist of linearly independent columns (full rank).
A violation of this assumption is also called perfect collinearity and usually results from a wrong specification of the problem.
- Wrong specification of dummy variables: if a variable consisting of c attributes is separated into c dummy variables (alongside the constant), then X no longer possesses full column rank.
- Example: gender is separated into female (1 if female, otherwise 0) and male (1 if male, otherwise 0).
Central OLS assumptions (I)

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \qquad \text{(constant, female, male: the last two columns sum to the first)}$$
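The dummy-variable trap can be demonstrated directly: with a constant plus one dummy per category, X'X becomes singular. A small sketch (the five observations are made up):

```python
import numpy as np

female = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # hypothetical sample
male = 1.0 - female                            # the two dummies sum to one

X = np.column_stack([np.ones(5), female, male])

# constant = female + male, so X lacks full column rank and X'X is singular
rank = np.linalg.matrix_rank(X)                # 2 instead of 3
det = np.linalg.det(X.T @ X)                   # (numerically) zero
```

Dropping one dummy, or dropping the constant, restores full rank.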
-
E(X'u) = 0: the regressors are not correlated with the error terms.
Homoscedasticity: the error terms are iid $(0, \sigma^2)$:
- $E(u_i) = 0$: the expected value of the error term is zero.
- $Var(u_i) = \sigma^2$: the variance of the error terms is constant.
- $E(u_i u_j) = 0$ for $i \neq j$: individual errors are independent (no autocorrelation).
Central OLS assumptions (II)

Note:
$$\operatorname{Cov}(a, b) = E\big[(a - E[a])(b - E[b])\big] = E[ab] - E[a]\,E[b]$$
-
Unbiasedness: $E(b) = \beta$.
The regressors are uncorrelated with the residuals: $\operatorname{Cov}(e, X) = 0$.
Note: $E(X'e) = 0$ must not be confused with the assumption $E(X'u) = 0$. OLS is calculated such that $X'e = 0$ holds by construction; nevertheless, $E(ee')$ can be a biased and inconsistent estimate of $E(uu')$.
Properties of OLS

Unbiasedness:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
$$E(b) = \beta + (X'X)^{-1}X'E(u) = \beta, \qquad \text{since } E(u) = 0$$

Orthogonality of regressors and residuals:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(Xb + e) = b + (X'X)^{-1}X'e \ \Rightarrow \ X'e = 0 \ \Rightarrow \ \operatorname{Cov}(e, X) = 0$$
-
Variance-covariance matrix of the error terms.
Inference: estimation of $\sigma^2$ from the residuals ($s^2 = e'e/(n-k)$, the so-called standard error of the regression) and of the variance of the OLS coefficients.
OLS in the multivariate case (II)

$$E(uu') = E\begin{pmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & \cdots & u_2u_n \\ \vdots & & \ddots & \vdots \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

$$\operatorname{var}(b) = E\big[(b - \beta)(b - \beta)'\big] = E\big[(X'X)^{-1}X'uu'X(X'X)^{-1}\big] = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$

$$\widehat{\operatorname{var}}(b) = s^2(X'X)^{-1}, \qquad s^2 = \frac{e'e}{n-k}$$
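A sketch of these inference formulas on simulated data; the true coefficients (1 and 0.5) and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)               # arbitrary seed
n, k = 200, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)       # illustrative true model

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - k)                       # s^2 = e'e / (n - k)
var_b = s2 * np.linalg.inv(X.T @ X)          # var(b) = s^2 (X'X)^{-1}
se = np.sqrt(np.diag(var_b))                 # standard errors of a and b
t_stats = b / se                             # t = b / se(b)
```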
-
The standard errors of the coefficients are the square roots of the diagonal of var(b). They can be used to calculate the t-statistic of an estimate: $t = b / se(b)$.
A joint hypothesis test can be conducted via $H_0: R\beta = r$, where R describes a $(q \times k)$ matrix and r a vector of length q.
The test statistic follows an $F(q, n-k)$ distribution.
Example: $H_0: \beta_i = 0,\ i = 1, \dots, 4$.
Hypothesis testing

$$H_0: R\beta = r, \qquad \text{with } (Rb - r) \sim N\big(0,\ \sigma^2 R(X'X)^{-1}R'\big)$$

$$F = \frac{(Rb - r)'\big[R(X'X)^{-1}R'\big]^{-1}(Rb - r)\,/\,q}{e'e\,/\,(n - k)} \sim F(q,\ n - k)$$

Example $H_0: \beta_i = 0,\ i = 1, \dots, 4$:
$$R = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad q = 4$$
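The F-statistic can be computed directly from this formula; the design below (k = 5 with an intercept, slopes truly zero under H0, seed arbitrary) is an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5                                # intercept + 4 slopes
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # H0 holds in this simulation
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# H0: beta_i = 0 for i = 1..4  ->  R = [0 | I4], r = 0, q = 4
q = 4
R = np.column_stack([np.zeros((q, 1)), np.eye(q)])
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)
middle = np.linalg.inv(R @ XtX_inv @ R.T)
F = ((R @ b - r) @ middle @ (R @ b - r) / q) / ((e @ e) / (n - k))
# compare F against the critical value of F(q, n-k)
```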
-
True model: $y = \alpha + \beta x + u$ with $\alpha = \beta = 1$. 100 simulations of random samples of (x, y):
- x ~ N(0,1), sampled once (fixed regressors).
- Error terms u are sampled randomly from N(0,1) in each replication.
- For each sample, conduct the OLS estimation.
Example: Estimates as random variables

$$y = \alpha + \beta x + u, \qquad \alpha = 1, \quad \beta = 1$$

$$E(a) = 0.9921, \quad StdDev(a) = 0.1083, \qquad E(b) = 0.9993, \quad StdDev(b) = 0.1015$$
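A sketch of this simulation (fixed regressors drawn once, fresh errors per replication); the seed and sample size details are my own choices, so the numbers will only approximate the slide's:

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_sim = 100, 100
x = rng.normal(size=n)                 # fixed regressors: drawn once
X = np.column_stack([np.ones(n), x])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # precomputed (X'X)^{-1} X'

coefs = np.empty((n_sim, 2))
for s in range(n_sim):
    u = rng.normal(size=n)             # fresh error terms each replication
    y = 1.0 + 1.0 * x + u              # true alpha = beta = 1
    coefs[s] = XtX_inv_Xt @ y

mean_ab = coefs.mean(axis=0)           # close to (1, 1): unbiasedness
std_ab = coefs.std(axis=0, ddof=1)     # sampling variability of the estimates
```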
-
Properties
Variance-covariance matrix of the coefficients, where $\Sigma$ is the covariance matrix of the error terms.
OLS makes the assumption $\Sigma = \sigma^2 I$.
If the OLS assumptions are violated, the standard significance statements are wrong.
Properties of the Variance-Covariance Matrix under OLS assumptions

$$\operatorname{var}(b) = E\big[(b - \beta)(b - \beta)'\big] = E\big[(X'X)^{-1}X'uu'X(X'X)^{-1}\big] = (X'X)^{-1}X'\,\Sigma\,X(X'X)^{-1}$$

Under the OLS assumption $\Sigma = \sigma^2 I$:
$$\operatorname{var}(b) = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}(X'X)(X'X)^{-1} = \sigma^2(X'X)^{-1} \qquad (\text{since } A^{-1}A = I)$$

$$\Sigma = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
-
OLS assumption: the error term $u_i$ possesses a constant variance for all observations i (homoscedasticity).
Heteroscedasticity: the variance of the error term differs across observations.
Heteroscedasticity

Example: $\sigma_i^2 = f(x_i) = x_i^2$

$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
-
Monte-Carlo simulation
Assume the true model is $y_i = \alpha + \beta x_i + u_i$ with $u_i \sim N(0, x_i^2)$ and $\alpha = \beta = 1$.
Create 1,000 samples, each with a sample size of N = 100 observations.
For each of the 1,000 data sets, a regression of y on x is run, and the resulting intercept and slope are recorded.
The standard errors of OLS are biased here: the estimated standard error of the intercept is too large; the estimated standard error of the slope is too small.
Note: the coefficient estimates of OLS are still unbiased even in the presence of heteroscedasticity. However, unbiased standard errors are essential for inference.
Results of a simulation study with heteroscedastic error terms

$$y_i = \alpha + \beta x_i + u_i \quad \text{with } u_i \sim N(0,\ x_i^2)$$

N = 100                                     α = 1      β = 1
Coefficients (OLS)                          1.0091     0.99751
OLS s.e. (avg. / incorrect)                 0.60031    0.19875
OLS s.e. (sim. distribution / correct)      0.38269    0.21949
White s.e.                                  0.37341    0.21262
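A sketch of such a Monte-Carlo study; the seed and design details are my own choices, so the numbers will not match the table exactly, but the qualitative pattern (unbiased coefficients, understated OLS slope s.e.) should hold:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_sim = 100, 1000
x = rng.normal(size=n)                    # fixed regressors, drawn once
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, ols_se = np.empty(n_sim), np.empty(n_sim)
for s in range(n_sim):
    u = rng.normal(scale=np.abs(x))       # Var(u_i) = x_i^2: heteroscedastic
    y = 1.0 + 1.0 * x + u
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 2)
    slopes[s] = b[1]
    ols_se[s] = np.sqrt(s2 * XtX_inv[1, 1])

bias = slopes.mean() - 1.0                # coefficients remain unbiased ...
true_spread = slopes.std(ddof=1)          # ... but the average OLS s.e.
avg_ols_se = ols_se.mean()                # understates the true sampling spread
```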
-
The White correction determines an adapted covariance matrix from the sample residuals to correct for heteroscedasticity.
The covariance matrix under heteroscedasticity involves $\Sigma$; since $\Sigma$ has n parameters, it cannot be estimated from n observations.
The coefficient estimators of OLS are unbiased, so the residuals e are unbiased estimates of u.
The White matrix is asymptotically unbiased under any type of heteroscedasticity, and only the $k \times k$ matrix $S_0$ has to be estimated.
White Correction of the Variance-Covariance Matrix

$$\operatorname{var}(b) = \underbrace{(X'X)^{-1}}_{k \times k}\underbrace{X'}_{k \times n}\underbrace{\Sigma}_{n \times n}\underbrace{X}_{n \times k}(X'X)^{-1}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

White:
$$S_0 = \sum_{i=1}^{n} e_i^2\, x_i x_i', \qquad \widehat{\operatorname{var}}(b) = (X'X)^{-1} S_0 (X'X)^{-1}$$
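The White (HC0) sandwich estimator above can be written as a small helper; the function name `white_se` and the usage data are my own illustrative choices:

```python
import numpy as np

def white_se(X, e):
    """HC0 (White) standard errors from regressors X and OLS residuals e."""
    XtX_inv = np.linalg.inv(X.T @ X)
    S0 = (X * (e ** 2)[:, None]).T @ X    # S0 = sum_i e_i^2 x_i x_i'
    cov = XtX_inv @ S0 @ XtX_inv          # sandwich: (X'X)^{-1} S0 (X'X)^{-1}
    return np.sqrt(np.diag(cov))

# Hypothetical usage with heteroscedastic data:
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 1.0 * x + rng.normal(scale=np.abs(x))   # Var(u_i) = x_i^2

b = np.linalg.solve(X.T @ X, X.T @ y)
se = white_se(X, y - X @ b)
```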
-
OLS assumes that the regressors X and the error term u are uncorrelated: E(X'u) = 0.
A violation of this assumption results in particular if
- an X-variable is measured with error,
- an X-variable is endogenous with y,
- an X-variable that is relevant in the population is omitted from the regression.
Simulation experiment: OLS is extremely biased.
Solution: instrumental variables regression.
Correlation of the regressors with the error term (Endogeneity)

$$y = \alpha + \beta x + u \ \text{ with } \ \alpha = \beta = 0, \quad x = \xi + \gamma u, \quad \xi,\, u \sim N(0, 1)$$

S = 1000            γ = 1       γ = 0.5     γ = 0.1
Parameter (β = 0)   0.5075      0.40766     0.10106
Avg. s.e.           0.050628    0.081588    0.10124
Std. Dev.           0.035988    0.067452    0.10102
1%tile              0.42086     0.24095     -0.14092
99%tile             0.59003     0.55394     0.32374
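Assuming the simulation design $x = \xi + \gamma u$ (which reproduces the table's values via plim $b = \gamma/(1+\gamma^2)$), a sketch with my own sample size and seed:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000                      # large sample to approximate the probability limit
gamma = 0.5

u = rng.normal(size=n)
xi = rng.normal(size=n)
x = xi + gamma * u               # regressor correlated with the error term
y = 0.0 + 0.0 * x + u            # true slope beta = 0

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

plim = gamma / (1 + gamma ** 2)  # = 0.4: OLS converges to this, not to 0
```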
-
Simulation results with $E(X'u) \neq 0$
[Figure: sampling distribution of the OLS estimate for the case $\gamma = 0.5$]
-
Panel data consists of cross-section and time-series data: N individuals, repeatedly observed at T points in time.
Simple OLS would pool all N·T observations, assuming independence. Obviously, with economic individuals (like firms, stocks, countries, etc.) in the cross-section,
- repeated observations of the same individual will be more similar than observations between individuals;
- OLS will then be inconsistent and biased.
Solutions:
- Estimate a system of equations, one for each individual.
- Estimate a system of equations with restrictions requiring some homogeneity (e.g. same slope, different intercepts).
Panel Estimation

[Figure: y against x for three individuals I1, I2, I3, each with its own intercept]

$$\text{Pooled OLS:} \quad y = Xb + e$$
-
Assumption: individual differences are captured by differences in the constant term (intercept).
This amounts to including one dummy variable per individual.
- $y_1$: vector of observations of the dependent variable of individual 1
- $X_1$: explanatory variables of individual 1
- $i$ is a vector of ones with length corresponding to $y_1$
This is just a classical regression! Basically, the individual time series are demeaned and then estimated by OLS.
Properties:
- Computationally intensive for large N.
- Significance of the fixed effects: F-test for the joint significance of the dummies.
- The model is robust against misspecification: every time-invariant explanatory variable is captured by the dummies (e.g. legal form of firms, industry affiliation, etc.).
- The individual effects can be correlated with the other regressors.
- Specific time-invariant variables can only be estimated / included as interaction terms with other regressors.
- The fixed effects themselves are biased estimates.
Fixed Effects Model (FE)

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta + \begin{pmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} + \varepsilon$$

i.e. $y = X\beta + D\alpha + \varepsilon$, where each column of D is the dummy of one individual.
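A sketch of the within (demeaning) estimator described above; the panel dimensions, seed, and data-generating process are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 200, 10                                # 200 individuals, 10 periods each
ids = np.repeat(np.arange(N), T)              # individual index per observation

alpha = rng.normal(size=N)                    # individual fixed effects
x = rng.normal(size=N * T) + alpha[ids]       # regressor correlated with the effects
y = alpha[ids] + 2.0 * x + rng.normal(size=N * T)   # illustrative true slope 2

def demean_by_individual(v, ids):
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

x_w = demean_by_individual(x, ids)            # within transformation
y_w = demean_by_individual(y, ids)
b_fe = (x_w @ y_w) / (x_w @ x_w)              # FE slope via the within regression

# Pooled OLS ignores the effects and is biased upward in this design
b_pooled = np.cov(x, y, bias=True)[0, 1] / np.var(x)
```

Demeaning each individual's series wipes out the dummies, so the FE slope recovers the true coefficient while pooled OLS does not.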