01 Regression Analysis
-
Basiskurs Finance
1. Regression Analysis
-
Please bring your laptop to the tutorials to follow along during the Excel exercises.
People:
Lecture: Dr. Nikolas Breitkopf ([email protected])
Tutorial: Janis Bauer ([email protected])
Grading:
60-minute exam. Exam date: 02.06.2014, 18:30-19:30 (please register for the exam via the LSF).
Course Overview
Date Topic
Tue. 08.04., 12-16   Regression Analysis Lecture
Tue. 15.04., 12-16   Regression Analysis Tutorial
Tue. 22.04.          No class (Easter holiday)
Tue. 29.04., 12-16   Event Studies Lecture
Tue. 06.05., 12-16   Event Studies Tutorial
Fri. 09.05., 14-18   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Lecture
Tue. 13.05., 12-16   Monte-Carlo Simulation / Game Theory (Bayesian Equilibrium Models) Tutorial
-
Core questions
What is an estimator?
Properties of estimators
Which problems result from violation of OLS assumptions?
Agenda
Motivation
Ordinary Least Squares (OLS)
Effects of violations of assumptions:
- Heteroscedasticity
- Correlation of the regressors with the error term (Endogeneity)
Fixed Effects Panel Estimation
REGRESSION ANALYSIS
Content
-
Basic literature:
- Barreto, H. and Howland, F. M.: Introductory Econometrics, Cambridge University Press, latest edition.
Additional literature:
- Johnston, J. and DiNardo, J.: Econometric Methods, McGraw-Hill, latest edition.
- Greene, W.: Econometric Analysis, Prentice Hall, latest edition.
Literature
-
What are regressions used for in the field of finance?
- Estimation of the beta of a stock
- Asset pricing tests
- Determinants of capital structure
- Event studies
- Determination of trends
- Forecasting

Definition: A regression estimates the linear relationship between independent variables (x) and the dependent variable (y).
The real relationship of the population is deduced from a sample.
Motivation
-
Dependent variable: Bid/Ask-Spread

                                  Stocks with        Stocks without
                                  Listed Options     Listed Options
Constant                           0.60*** (196.48)    4.23*** (1,015.57)
Naked Short Sale Ban (Dummy)       0.33*** (5.94)      1.40*** (12.24)
Covered Short Sale Ban (Dummy)     0.67*** (9.66)      2.14*** (25.95)
Disclosure Requirement (Dummy)    -0.20*** (-3.42)    -0.72*** (-6.54)
Stock-level Fixed Effects          Yes                 Yes
#Obs                               427,164             4,716,000
#Stocks                            1,306               15,185

(t-statistics in parentheses)
Example: Prohibition of naked short sales of stocks
Source: Beber/Pagano 2010, WP
-
The population is the true, data-generating process that determines the relationship between the variables of interest. Usually, one cannot observe the full population.
Statistical inference is the process of learning about the population from a random sample.
Population variables are assumed to be random variables, i.e. there is no deterministic relationship between variables.
Then, any statistic calculated from the sample is a random variable as well.
Inference from a Random Sample
-
Assume the population consists of a normally distributed random variable X ~ N(μ = 200, σ = 20).
Experiment: draw 8 random samples from the population, each with 10 observations.
- Calculate the mean of each sample.
- Calculate the mean of the experiments' means and its standard deviation.
- What can you learn about the true random variable?
The sample mean is the best estimator of the population mean. Since the population variable X is a random variable, so is the sample mean.
To learn about the population, you have to know the distribution of the estimator (here the mean) in repeated samples.
The standard error describes the uncertainty (standard deviation) of the estimator.
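The slides run this sampling experiment in Excel during the tutorial; as a sketch, here is the same experiment in Python/NumPy (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, not from the slides

mu, sigma = 200, 20              # population parameters from the slide
n_samples, n_obs = 8, 10         # 8 samples, 10 observations each

samples = rng.normal(mu, sigma, size=(n_samples, n_obs))
sample_means = samples.mean(axis=1)        # one mean per sample

mean_of_means = sample_means.mean()        # estimate of the population mean
se_empirical = sample_means.std(ddof=1)    # spread of the estimator across samples
se_theoretical = sigma / np.sqrt(n_obs)    # sigma / sqrt(n), about 6.32
```

The empirical spread of the eight sample means approximates the theoretical standard error of the mean.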
Illustrative Example: Inference on the Average of a Random Variable
-
Observation       E1      E2      E3      E4      E5      E6      E7      E8
1               176.27  185.11  223.63  207.93  181.65  254.48  197.73  233.67
2               185.75  182.41  222.41  214.81  202.57  189.30  195.00  164.47
3               200.09  215.10  212.60  187.94  236.40  217.17  234.46  212.61
4               201.17  203.69  193.54  182.40  188.33  202.42  199.96  218.67
5               232.02  197.46  214.99  208.36  196.82  216.74  199.91  166.11
6               202.67  225.78  219.02  165.95  189.38  168.10  210.96  221.77
7               210.55  236.46  246.32  191.26  219.76  202.98  198.10  227.80
8               203.00  202.90  218.72  218.34  228.23  203.15  210.51  194.88
9               166.32  179.36  219.29  197.47  204.82  194.76  184.78  173.09
10              221.19  162.50  209.35  194.19  201.30  199.27  151.77  199.35
Average         199.90  199.08  217.99  196.86  204.93  204.84  198.32  201.24
Std. Deviation   19.73   22.62   13.20   16.06   17.94   22.34   21.03   25.89

Mean (averages): 202.89
Standard error (across experiments): 6.75
Standard error (theoretical): 6.32
Illustrative Example (continued)

Standard error of the mean: $\sigma / \sqrt{n}$
-
Estimation
An estimator is a method to determine unknown parameters of a population with the help of a random sample from this population.
Estimator Properties (I)
[Figure: population distribution and the elements of a sample (here n = 10), showing the real population mean and the estimated population mean]
-
Desirable properties
Unbiasedness
- The expected value of the estimator is the true parameter.
Efficiency
- The sampling variance of the unbiased estimator is the smallest among all unbiased estimators.
- Example: OLS is the best linear unbiased estimator (BLUE).
Consistency
- An estimator (even a biased one) is consistent if it converges asymptotically to the true parameter.
Estimator Properties (II)

Unbiasedness example: $E(b) = \beta$.

Biased (but consistent) variance estimator:
$$s_1^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(s_1^2) = \frac{n-1}{n}\sigma^2 \neq \sigma^2, \qquad \lim_{n \to \infty} E(s_1^2) = \sigma^2$$

Unbiased variance estimator:
$$s_2^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad E(s_2^2) = \sigma^2$$
-
OLS
Estimation of the best straight line describing the relationship between x and y:
$$y = a + bx + e$$
Approach: minimization of the squared errors by choice of a and b:
$$RSS = \sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2 = \sum_i (y_i - a - bx_i)^2$$
Properties:
- Minimization of the residual sum of squares (RSS).
- The fitted line passes through the point $(\bar{x}, \bar{y})$.

OLS in the case of two-dimensionality
[Figure: scatter plot of y on x with the fitted line, intercept a, slope b, and two residuals $e_1$, $e_2$]
-
OLS in matrix notation
Population model:
$$y = X\beta + u, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{1,2} & \cdots & x_{1,k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n,2} & \cdots & x_{n,k} \end{pmatrix}, \quad u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}$$
u describes the error term in the (unobserved) population, e the error term of the sample.
The vector of the sample residuals is:
$$e = y - Xb$$
The optimization problem is then as follows:
$$\min_b e'e = \min_b \left(y'y - 2b'X'y + b'X'Xb\right)$$
First order conditions (FOC):
$$\frac{\partial RSS}{\partial b} = -2X'y + 2X'Xb = 0 \quad \Rightarrow \quad \underbrace{b}_{k \times 1} = \underbrace{(X'X)^{-1}}_{k \times k}\underbrace{X'y}_{k \times 1}$$

OLS in the case of multidimensionality
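The FOC solution $b = (X'X)^{-1}X'y$ can be checked numerically; the data-generating values below (intercept 2, slope 3) are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed
n = 50
x = rng.normal(size=n)
y = 2.0 + 3.0 * x + rng.normal(size=n)   # illustrative true coefficients

X = np.column_stack([np.ones(n), x])     # column of ones for the intercept
b = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X'X) b = X'y

e = y - X @ b                            # residual vector e = y - Xb
# The FOC imply X'e = 0 (up to floating-point error)
```

Using `np.linalg.solve` instead of explicitly inverting X'X is the numerically preferred way to evaluate the formula.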
-
Linear Algebra Basics

Transpose of a matrix:
$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \qquad A' = \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

Product of two matrices:
$$AB = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$$

Identity matrix I:
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}, \qquad IA = AI = A$$

Properties of the inverse of a (square) matrix:
$$A^{-1}A = AA^{-1} = I$$
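A quick numerical check of these rules (the entries of A and B are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

assert np.allclose(A.T[0, 1], A[1, 0])        # transpose swaps indices
assert np.allclose((A @ B)[0, 0], 1*5 + 2*7)  # row-times-column rule

I2 = np.eye(2)                                # identity matrix
A_inv = np.linalg.inv(A)                      # exists since det(A) = -2 != 0
assert np.allclose(A_inv @ A, I2) and np.allclose(A @ A_inv, I2)
assert np.allclose(I2 @ A, A) and np.allclose(A @ I2, A)
```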
-
Matrix Multiplication: Typical Cases

$$(n \times k)\,(k \times m) = (n \times m)$$

Example    Shape          Interpretation
$e'e$      $1 \times 1$   Sum of squared elements of e (inner product)
$X\beta$   $n \times 1$   Linear combination of the columns of X
$X'X$      $k \times k$   Co-variation of the columns of X (second-moment matrix)
$uu'$      $n \times n$   Product of all combinations of the elements of u (outer product)
-
OLS estimates a linear function that passes through the means of X and y. To integrate the intercept into the estimation equation, a vector of ones has to be added to the matrix X.
What is the intercept?

Example (intercept-only regression with $y = (1, 2, 3)'$ and $X = (1, 1, 1)'$):
$$X'X = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 3, \qquad X'y = \begin{pmatrix} 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 6$$
$$b_{OLS} = (X'X)^{-1}X'y = \tfrac{1}{3} \cdot 6 = 2 = \bar{y}$$
-
X has full rank k
A solution of the OLS problem is only possible if the matrix X'X is invertible, i.e. it has to be positive definite (all eigenvalues > 0).
X has to consist of linearly independent columns (full rank).
A violation of this assumption is also called perfect collinearity and usually results from a wrong specification of the problem.
- Wrong specification of dummy variables: if a variable consisting of c attributes is separated into c dummy variables (alongside the constant), then X no longer possesses full column rank.
- Example: gender is separated into female (1 if female, otherwise 0) and male (1 if male, otherwise 0).
Central OLS assumptions (I)

$$X = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \qquad \text{(constant, female, male: the last two columns sum to the first)}$$
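The dummy-variable trap can be demonstrated directly: with a constant plus one dummy per category, X'X becomes singular. A small sketch (the five observations are made up):

```python
import numpy as np

female = np.array([1.0, 0.0, 1.0, 1.0, 0.0])   # hypothetical sample
male = 1.0 - female                            # the two dummies sum to one

X = np.column_stack([np.ones(5), female, male])

# constant = female + male, so X lacks full column rank and X'X is singular
rank = np.linalg.matrix_rank(X)                # 2 instead of 3
det = np.linalg.det(X.T @ X)                   # (numerically) zero
```

Dropping one dummy, or dropping the constant, restores full rank.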
-
E(X'u) = 0: the regressors are not correlated with the error terms.
Homoscedasticity: the error terms are iid $(0, \sigma^2)$:
- $E(u_i) = 0$: the expected value of the error term is zero.
- $Var(u_i) = \sigma^2$: the variance of the error terms is constant.
- $E(u_i u_j) = 0$ for $i \neq j$: individual errors are independent (no autocorrelation).
Central OLS assumptions (II)

Note:
$$\operatorname{Cov}(a, b) = E\big[(a - E[a])(b - E[b])\big] = E[ab] - E[a]\,E[b]$$
-
Unbiasedness: $E(b) = \beta$.
The regressors are uncorrelated with the residuals: $\operatorname{Cov}(e, X) = 0$.
Note: $E(X'e) = 0$ must not be confused with the assumption $E(X'u) = 0$. OLS is calculated such that $X'e = 0$ holds by construction; nevertheless, $E(ee')$ can be a biased and inconsistent estimate of $E(uu')$.
Properties of OLS

Unbiasedness:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + u) = \beta + (X'X)^{-1}X'u$$
$$E(b) = \beta + (X'X)^{-1}X'E(u) = \beta, \qquad \text{since } E(u) = 0$$

Orthogonality of regressors and residuals:
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(Xb + e) = b + (X'X)^{-1}X'e \ \Rightarrow \ X'e = 0 \ \Rightarrow \ \operatorname{Cov}(e, X) = 0$$
-
Variance-covariance matrix of the error terms.
Inference: estimation of $\sigma^2$ from the residuals ($s^2 = e'e/(n-k)$, the so-called standard error of the regression) and of the variance of the OLS coefficients.
OLS in the multivariate case (II)

$$E(uu') = E\begin{pmatrix} u_1^2 & u_1u_2 & \cdots & u_1u_n \\ u_2u_1 & u_2^2 & \cdots & u_2u_n \\ \vdots & & \ddots & \vdots \\ u_nu_1 & u_nu_2 & \cdots & u_n^2 \end{pmatrix} = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix} = \sigma^2 I$$

$$\operatorname{var}(b) = E\big[(b - \beta)(b - \beta)'\big] = E\big[(X'X)^{-1}X'uu'X(X'X)^{-1}\big] = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}$$

$$\widehat{\operatorname{var}}(b) = s^2(X'X)^{-1}, \qquad s^2 = \frac{e'e}{n-k}$$
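A sketch of these inference formulas on simulated data; the true coefficients (1 and 0.5) and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)               # arbitrary seed
n, k = 200, 2
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n)       # illustrative true model

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = (e @ e) / (n - k)                       # s^2 = e'e / (n - k)
var_b = s2 * np.linalg.inv(X.T @ X)          # var(b) = s^2 (X'X)^{-1}
se = np.sqrt(np.diag(var_b))                 # standard errors of a and b
t_stats = b / se                             # t = b / se(b)
```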
-
The standard errors of the coefficients are the square roots of the diagonal of var(b). They can be used to calculate the t-statistic of an estimate: $t = b / se(b)$.
A joint hypothesis test can be conducted via $H_0: R\beta = r$, where R describes a $(q \times k)$ matrix and r a vector of length q.
The test statistic follows an $F(q, n-k)$ distribution.
Example: $H_0: \beta_i = 0,\ i = 1, \dots, 4$.
Hypothesis testing

$$H_0: R\beta = r, \qquad \text{with } (Rb - r) \sim N\big(0,\ \sigma^2 R(X'X)^{-1}R'\big)$$

$$F = \frac{(Rb - r)'\big[R(X'X)^{-1}R'\big]^{-1}(Rb - r)\,/\,q}{e'e\,/\,(n - k)} \sim F(q,\ n - k)$$

Example $H_0: \beta_i = 0,\ i = 1, \dots, 4$:
$$R = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix}, \qquad r = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad q = 4$$
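The F-statistic can be computed directly from this formula; the design below (k = 5 with an intercept, slopes truly zero under H0, seed arbitrary) is an illustrative sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 100, 5                                # intercept + 4 slopes
X = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
beta = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # H0 holds in this simulation
y = X @ beta + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# H0: beta_i = 0 for i = 1..4  ->  R = [0 | I4], r = 0, q = 4
q = 4
R = np.column_stack([np.zeros((q, 1)), np.eye(q)])
r = np.zeros(q)

XtX_inv = np.linalg.inv(X.T @ X)
middle = np.linalg.inv(R @ XtX_inv @ R.T)
F = ((R @ b - r) @ middle @ (R @ b - r) / q) / ((e @ e) / (n - k))
# compare F against the critical value of F(q, n-k)
```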
-
True model: $y = \alpha + \beta x + u$ with $\alpha = \beta = 1$. 100 simulations of random samples of (x, y):
- x ~ N(0,1), sampled once (fixed regressors).
- Error terms u are sampled randomly from N(0,1) in each replication.
- For each sample, conduct the OLS estimation.
Example: Estimates as random variables

$$y = \alpha + \beta x + u, \qquad \alpha = 1, \quad \beta = 1$$

$$E(a) = 0.9921, \quad StdDev(a) = 0.1083, \qquad E(b) = 0.9993, \quad StdDev(b) = 0.1015$$
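A sketch of this simulation (fixed regressors drawn once, fresh errors per replication); the seed and sample size details are my own choices, so the numbers will only approximate the slide's:

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_sim = 100, 100
x = rng.normal(size=n)                 # fixed regressors: drawn once
X = np.column_stack([np.ones(n), x])
XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)   # precomputed (X'X)^{-1} X'

coefs = np.empty((n_sim, 2))
for s in range(n_sim):
    u = rng.normal(size=n)             # fresh error terms each replication
    y = 1.0 + 1.0 * x + u              # true alpha = beta = 1
    coefs[s] = XtX_inv_Xt @ y

mean_ab = coefs.mean(axis=0)           # close to (1, 1): unbiasedness
std_ab = coefs.std(axis=0, ddof=1)     # sampling variability of the estimates
```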
-
Properties
Variance-covariance matrix of the coefficients, where $\Sigma$ is the covariance matrix of the error terms.
OLS makes the assumption $\Sigma = \sigma^2 I$.
If the OLS assumptions are violated, the standard significance statements are wrong.
Properties of the Variance-Covariance Matrix under OLS assumptions

$$\operatorname{var}(b) = E\big[(b - \beta)(b - \beta)'\big] = E\big[(X'X)^{-1}X'uu'X(X'X)^{-1}\big] = (X'X)^{-1}X'\,\Sigma\,X(X'X)^{-1}$$

Under the OLS assumption $\Sigma = \sigma^2 I$:
$$\operatorname{var}(b) = (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}(X'X)(X'X)^{-1} = \sigma^2(X'X)^{-1} \qquad (\text{since } A^{-1}A = I)$$

$$\Sigma = \sigma^2 I = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}$$
-
OLS assumption: the error term $u_i$ possesses a constant variance for all observations i (homoscedasticity).
Heteroscedasticity: the variance of the error term differs across observations.
Heteroscedasticity

Example: $\sigma_i^2 = f(x_i) = x_i^2$

$$E(uu') = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$
-
Monte-Carlo simulation
Assume the true model is $y_i = \alpha + \beta x_i + u_i$ with $u_i \sim N(0, x_i^2)$ and $\alpha = \beta = 1$.
Create 1,000 samples, each with a sample size of N = 100 observations.
For each of the 1,000 data sets, a regression of y on x is run, and the resulting intercept and slope are recorded.
The standard errors of OLS are biased here: the estimated standard error of the intercept is too large; the estimated standard error of the slope is too small.
Note: the coefficient estimates of OLS are still unbiased even in the presence of heteroscedasticity. However, unbiased standard errors are essential for inference.
Results of a simulation study with heteroscedastic error terms

$$y_i = \alpha + \beta x_i + u_i \quad \text{with } u_i \sim N(0,\ x_i^2)$$

N = 100                                     α = 1      β = 1
Coefficients (OLS)                          1.0091     0.99751
OLS s.e. (avg. / incorrect)                 0.60031    0.19875
OLS s.e. (sim. distribution / correct)      0.38269    0.21949
White s.e.                                  0.37341    0.21262
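A sketch of such a Monte-Carlo study; the seed and design details are my own choices, so the numbers will not match the table exactly, but the qualitative pattern (unbiased coefficients, understated OLS slope s.e.) should hold:

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_sim = 100, 1000
x = rng.normal(size=n)                    # fixed regressors, drawn once
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

slopes, ols_se = np.empty(n_sim), np.empty(n_sim)
for s in range(n_sim):
    u = rng.normal(scale=np.abs(x))       # Var(u_i) = x_i^2: heteroscedastic
    y = 1.0 + 1.0 * x + u
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 2)
    slopes[s] = b[1]
    ols_se[s] = np.sqrt(s2 * XtX_inv[1, 1])

bias = slopes.mean() - 1.0                # coefficients remain unbiased ...
true_spread = slopes.std(ddof=1)          # ... but the average OLS s.e.
avg_ols_se = ols_se.mean()                # understates the true sampling spread
```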
-
The White correction determines an adapted covariance matrix from the sample residuals to correct for heteroscedasticity.
The covariance matrix under heteroscedasticity involves $\Sigma$; since $\Sigma$ has n parameters, it cannot be estimated from n observations.
The coefficient estimators of OLS are unbiased, so the residuals e are unbiased estimates of u.
The White matrix is asymptotically unbiased under any type of heteroscedasticity, and only the $k \times k$ matrix $S_0$ has to be estimated.
White Correction of the Variance-Covariance Matrix

$$\operatorname{var}(b) = \underbrace{(X'X)^{-1}}_{k \times k}\underbrace{X'}_{k \times n}\underbrace{\Sigma}_{n \times n}\underbrace{X}_{n \times k}(X'X)^{-1}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}$$

White:
$$S_0 = \sum_{i=1}^{n} e_i^2\, x_i x_i', \qquad \widehat{\operatorname{var}}(b) = (X'X)^{-1} S_0 (X'X)^{-1}$$
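The White (HC0) sandwich estimator above can be written as a small helper; the function name `white_se` and the usage data are my own illustrative choices:

```python
import numpy as np

def white_se(X, e):
    """HC0 (White) standard errors from regressors X and OLS residuals e."""
    XtX_inv = np.linalg.inv(X.T @ X)
    S0 = (X * (e ** 2)[:, None]).T @ X    # S0 = sum_i e_i^2 x_i x_i'
    cov = XtX_inv @ S0 @ XtX_inv          # sandwich: (X'X)^{-1} S0 (X'X)^{-1}
    return np.sqrt(np.diag(cov))

# Hypothetical usage with heteroscedastic data:
rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 1.0 * x + rng.normal(scale=np.abs(x))   # Var(u_i) = x_i^2

b = np.linalg.solve(X.T @ X, X.T @ y)
se = white_se(X, y - X @ b)
```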
-
OLS assumes that the regressors X and the error term u are uncorrelated: E(X'u) = 0.
A violation of this assumption results in particular if
- an X-variable is measured with error,
- an X-variable is endogenous with y,
- an X-variable that is relevant in the population is omitted from the regression.
Simulation experiment: OLS is extremely biased.
Solution: instrumental variables regression.
Correlation of the regressors with the error term (Endogeneity)

$$y = \alpha + \beta x + u \ \text{ with } \ \alpha = \beta = 0, \quad x = \xi + \gamma u, \quad \xi,\, u \sim N(0, 1)$$

S = 1000            γ = 1       γ = 0.5     γ = 0.1
Parameter (β = 0)   0.5075      0.40766     0.10106
Avg. s.e.           0.050628    0.081588    0.10124
Std. Dev.           0.035988    0.067452    0.10102
1%tile              0.42086     0.24095     -0.14092
99%tile             0.59003     0.55394     0.32374
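Assuming the simulation design $x = \xi + \gamma u$ (which reproduces the table's values via plim $b = \gamma/(1+\gamma^2)$), a sketch with my own sample size and seed:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000                      # large sample to approximate the probability limit
gamma = 0.5

u = rng.normal(size=n)
xi = rng.normal(size=n)
x = xi + gamma * u               # regressor correlated with the error term
y = 0.0 + 0.0 * x + u            # true slope beta = 0

X = np.column_stack([np.ones(n), x])
b = np.linalg.solve(X.T @ X, X.T @ y)

plim = gamma / (1 + gamma ** 2)  # = 0.4: OLS converges to this, not to 0
```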
-
Simulation results with $E(X'u) \neq 0$
[Figure: sampling distribution of the OLS estimate for the case $\gamma = 0.5$]
-
Panel data consists of cross-section and time-series data: N individuals, repeatedly observed at T points in time.
Simple OLS would pool all N·T observations, assuming independence. Obviously, with economic individuals (like firms, stocks, countries, etc.) in the cross-section,
- repeated observations of the same individual will be more similar than observations between individuals;
- OLS will then be inconsistent and biased.
Solutions:
- Estimate a system of equations, one for each individual.
- Estimate a system of equations with restrictions requiring some homogeneity (e.g. same slope, different intercepts).
Panel Estimation

[Figure: y against x for three individuals I1, I2, I3, each with its own intercept]

$$\text{Pooled OLS:} \quad y = Xb + e$$
-
Assumption: individual differences are captured by differences in the constant term (intercept).
This amounts to including one dummy variable per individual.
- $y_1$: vector of observations of the dependent variable of individual 1
- $X_1$: explanatory variables of individual 1
- $i$ is a vector of ones with length corresponding to $y_1$
This is just a classical regression! Basically, the individual time series are demeaned and then estimated by OLS.
Properties:
- Computationally intensive for large N.
- Significance of the fixed effects: F-test for the joint significance of the dummies.
- The model is robust against misspecification: every time-invariant explanatory variable is captured by the dummies (e.g. legal form of firms, industry affiliation, etc.).
- The individual effects can be correlated with the other regressors.
- Specific time-invariant variables can only be estimated / included as interaction terms with other regressors.
- The fixed effects themselves are biased estimates.
Fixed Effects Model (FE)

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}\beta + \begin{pmatrix} i & 0 & \cdots & 0 \\ 0 & i & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & i \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix} + \varepsilon$$

i.e. $y = X\beta + D\alpha + \varepsilon$, where each column of D is the dummy of one individual.
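A sketch of the within (demeaning) estimator described above; the panel dimensions, seed, and data-generating process are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(8)
N, T = 200, 10                                # 200 individuals, 10 periods each
ids = np.repeat(np.arange(N), T)              # individual index per observation

alpha = rng.normal(size=N)                    # individual fixed effects
x = rng.normal(size=N * T) + alpha[ids]       # regressor correlated with the effects
y = alpha[ids] + 2.0 * x + rng.normal(size=N * T)   # illustrative true slope 2

def demean_by_individual(v, ids):
    means = np.bincount(ids, weights=v) / np.bincount(ids)
    return v - means[ids]

x_w = demean_by_individual(x, ids)            # within transformation
y_w = demean_by_individual(y, ids)
b_fe = (x_w @ y_w) / (x_w @ x_w)              # FE slope via the within regression

# Pooled OLS ignores the effects and is biased upward in this design
b_pooled = np.cov(x, y, bias=True)[0, 1] / np.var(x)
```

Demeaning each individual's series wipes out the dummies, so the FE slope recovers the true coefficient while pooled OLS does not.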