Heteroskedasticity

Post on 14-Jan-2016


ECON 4550 Econometrics Memorial University of Newfoundland

Adapted from Vera Tabakova’s notes

8.1 The Nature of Heteroskedasticity

8.2 Using the Least Squares Estimator

8.3 The Generalized Least Squares Estimator

8.4 Detecting Heteroskedasticity

$$E(y) = \beta_1 + \beta_2 x$$

$$e_i = y_i - E(y_i) = y_i - \beta_1 - \beta_2 x_i$$

$$y_i = \beta_1 + \beta_2 x_i + e_i$$

Figure 8.1 Heteroskedastic Errors

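$$E(e_i) = 0 \qquad \operatorname{var}(e_i) = \sigma^2 \qquad \operatorname{cov}(e_i, e_j) = 0$$

$$\operatorname{var}(y_i) = \operatorname{var}(e_i) = h(x_i)$$

Food expenditure example:

$$\hat{y}_i = 83.42 + 10.21\,x_i$$

$$\hat{e}_i = y_i - 83.42 - 10.21\,x_i$$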

Figure 8.2 Least Squares Estimated Expenditure Function and Observed Data Points

The existence of heteroskedasticity implies:

1. The least squares estimator is still a linear and unbiased estimator, but it is no longer best. There is another estimator with a smaller variance.

2. The standard errors usually computed for the least squares estimator are incorrect. Confidence intervals and hypothesis tests that use these standard errors may be misleading.

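$$y_i = \beta_1 + \beta_2 x_i + e_i \qquad \operatorname{var}(e_i) = \sigma^2$$

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$

$$y_i = \beta_1 + \beta_2 x_i + e_i \qquad \operatorname{var}(e_i) = \sigma_i^2$$

$$\operatorname{var}(b_2) = \sum_{i=1}^{N} w_i^2 \sigma_i^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^2}$$

$$\widehat{\operatorname{var}}(b_2) = \sum_{i=1}^{N} w_i^2 \hat{e}_i^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2 \hat{e}_i^2}{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^2}$$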
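The two variance formulas for b2 can be compared numerically. The sketch below is only an illustration in plain Python: the data are made up (they are not the food expenditure data), the function names are hypothetical, and the White formula is the uncorrected one, so its output need not match what Stata or SHAZAM report.

```python
def ols_simple(x, y):
    """Least squares intercept b1 and slope b2 for y = b1 + b2*x + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return b1, b2


def var_b2_conventional(x, resid):
    """sigma2_hat / sum (x - xbar)^2, with sigma2_hat = SSE / (N - 2)."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sigma2_hat = sum(e ** 2 for e in resid) / (n - 2)
    return sigma2_hat / sxx


def var_b2_white(x, resid):
    """sum (x - xbar)^2 ehat^2 / [sum (x - xbar)^2]^2, no small-sample correction."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    num = sum(((xi - xbar) ** 2) * (e ** 2) for xi, e in zip(x, resid))
    return num / sxx ** 2


# Made-up data whose spread grows with x (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.3, 4.6, 6.9, 5.8, 9.5, 6.9]

b1, b2 = ols_simple(x, y)
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
se_conventional = var_b2_conventional(x, resid) ** 0.5
se_white = var_b2_white(x, resid) ** 0.5
print(b2, se_conventional, se_white)
```

When every squared residual equals the same constant c, the White formula collapses to c divided by the sum of squared deviations of x, which is the homoskedastic formula with c in place of the variance.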

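$$\begin{array}{ll} \hat{y}_i = 83.42 + 10.21\,x_i & \\ \phantom{\hat{y}_i = {}} (27.46) \quad (1.81) & \text{(White se)} \\ \phantom{\hat{y}_i = {}} (43.41) \quad (2.09) & \text{(incorrect se)} \end{array}$$

White: $b_2 \pm t_c\,\operatorname{se}(b_2) = 10.21 \pm 2.024 \times 1.81 = [6.55,\ 13.87]$

Incorrect: $b_2 \pm t_c\,\operatorname{se}(b_2) = 10.21 \pm 2.024 \times 2.09 = [5.97,\ 14.45]$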
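The interval arithmetic can be checked with a few lines of Python. This is a sketch that uses only the rounded numbers shown on the slide; `interval` is a hypothetical helper name. Because the inputs are rounded, the last digit of the "incorrect" interval can differ slightly from the slide, which was presumably computed from unrounded values.

```python
def interval(estimate, se, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


b2 = 10.21    # slope estimate from the slide
t_c = 2.024   # critical t value used on the slide

white_lo, white_hi = interval(b2, 1.81, t_c)   # White standard error
ols_lo, ols_hi = interval(b2, 2.09, t_c)       # conventional (incorrect) standard error

print(f"White:     [{white_lo:.2f}, {white_hi:.2f}]")
print(f"Incorrect: [{ols_lo:.2f}, {ols_hi:.2f}]")
```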

We can use a robust estimator (Stata):

regress food_exp income, robust

In SHAZAM, the HETCOV option on the OLS command reports White's heteroskedasticity-consistent standard errors:

OLS FOOD INCOME / HETCOV

Confidence interval estimate using the White standard errors:

CONFID INCOME / TCRIT=2.024

The White standard errors reported by SHAZAM differ numerically from our textbook results.

There is a correction we could apply, but we will not worry about it for now.

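$$y_i = \beta_1 + \beta_2 x_i + e_i$$

$$E(e_i) = 0 \qquad \operatorname{var}(e_i) = \sigma_i^2 \qquad \operatorname{cov}(e_i, e_j) = 0$$

$$\operatorname{var}(e_i) = \sigma_i^2 = \sigma^2 x_i \tag{8.11}$$

$$\frac{y_i}{\sqrt{x_i}} = \beta_1 \frac{1}{\sqrt{x_i}} + \beta_2 \frac{x_i}{\sqrt{x_i}} + \frac{e_i}{\sqrt{x_i}}$$

$$y_i^* = \frac{y_i}{\sqrt{x_i}} \qquad x_{i1}^* = \frac{1}{\sqrt{x_i}} \qquad x_{i2}^* = \frac{x_i}{\sqrt{x_i}} = \sqrt{x_i} \qquad e_i^* = \frac{e_i}{\sqrt{x_i}} \tag{8.13}$$

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \tag{8.14}$$

$$\operatorname{var}(e_i^*) = \operatorname{var}\!\left(\frac{e_i}{\sqrt{x_i}}\right) = \frac{1}{x_i}\operatorname{var}(e_i) = \frac{1}{x_i}\,\sigma^2 x_i = \sigma^2$$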

To obtain the best linear unbiased estimator for a model with heteroskedasticity of the type specified in equation (8.11):

1. Calculate the transformed variables given in (8.13).

2. Use least squares to estimate the transformed model given in (8.14).

The generalized least squares estimator can be viewed as a weighted least squares estimator. It minimizes the sum of squared transformed errors, which is given by:

$$\sum_{i=1}^{N} e_i^{*2} = \sum_{i=1}^{N} \frac{e_i^2}{x_i} = \sum_{i=1}^{N} \left(x_i^{-1/2} e_i\right)^2$$

When $x_i$ is small, the data contain more information about the regression function and the observations are weighted heavily.

When $x_i$ is large, the data contain less information and the observations are weighted lightly.
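The equivalence between least squares on the transformed variables and weighted least squares with weights 1/x_i can be checked numerically. The following is a minimal sketch in plain Python with made-up data; the function names are hypothetical and nothing here reproduces the food expenditure results.

```python
def gls_by_transformation(x, y):
    """Least squares on y* = b1*x1* + b2*x2* + e*, all variables divided by sqrt(x)."""
    ystar = [yi / xi ** 0.5 for xi, yi in zip(x, y)]
    x1star = [1.0 / xi ** 0.5 for xi in x]
    x2star = [xi ** 0.5 for xi in x]          # x / sqrt(x)
    # Normal equations for a two-regressor model without a separate intercept.
    s11 = sum(a * a for a in x1star)
    s12 = sum(a * b for a, b in zip(x1star, x2star))
    s22 = sum(b * b for b in x2star)
    s1y = sum(a * c for a, c in zip(x1star, ystar))
    s2y = sum(b * c for b, c in zip(x2star, ystar))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


def wls_by_weights(x, y):
    """Minimise sum w_i (y_i - b1 - b2 x_i)^2 with w_i = 1 / x_i."""
    w = [1.0 / xi for xi in x]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw    # weighted mean of x
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw    # weighted mean of y
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b2 = sxy / sxx
    return yw - b2 * xw, b2


# Made-up data (illustration only); x must be strictly positive.
x = [1.0, 2.0, 4.0, 5.0, 8.0, 10.0]
y = [5.2, 7.1, 15.3, 16.0, 29.5, 27.0]

b_transformed = gls_by_transformation(x, y)
b_weighted = wls_by_weights(x, y)
print(b_transformed, b_weighted)
```

Both routes minimise the same criterion, so the two pairs of estimates agree up to floating-point error.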

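Food example again, where was the problem coming from?

$$\begin{array}{l} \hat{y}_i = 78.68 + 10.45\,x_i \\ (\text{se}) \quad (23.79) \quad (1.39) \end{array}$$

$$\hat{\beta}_2 \pm t_c\,\operatorname{se}(\hat{\beta}_2) = 10.451 \pm 2.024 \times 1.386 = [7.65,\ 13.26]$$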

regress food_exp income [aweight = 1/income]


Specify a weight variable (SHAZAM works with the inverse):

GENR W=1/INCOME
OLS FOOD INCOME / WEIGHT=W

95% confidence interval, p. 205:

CONFID INCOME / TCRIT=2.024

Note that:

The residual statistics reported in a WLS regression (SIGMA**2, STANDARD ERROR OF THE ESTIMATE-SIGMA, and SUM OF SQUARED ERRORS-SSE) are all based on the transformed (weighted) residuals.

When making model comparisons, remember that the high-variance observations are systematically underweighted by this procedure.

This may be a good thing, if you want to avoid having these observations dominate the model selection comparisons.

But if you want model selection to be based on how well the alternative models fit the original (untransformed) data, you must base the model selection tests on the untransformed residuals.

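$$\operatorname{var}(e_i) = \sigma_i^2 = \sigma^2 x_i^{\gamma}$$

$$\ln(\sigma_i^2) = \ln(\sigma^2) + \gamma \ln(x_i)$$

$$\sigma_i^2 = \exp\!\left[\ln(\sigma^2) + \gamma \ln(x_i)\right] = \exp(\alpha_1 + \alpha_2 z_i)$$

$$\sigma_i^2 = \exp(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS})$$

$$\ln(\sigma_i^2) = \alpha_1 + \alpha_2 z_i$$

$$y_i = E(y_i) + e_i = \beta_1 + \beta_2 x_i + e_i$$

$$\ln(\hat{e}_i^2) = \ln(\sigma_i^2) + v_i = \alpha_1 + \alpha_2 z_i + v_i$$

$$\ln(\hat{\sigma}_i^2) = .9378 + 2.329\,z_i$$

$$\hat{\sigma}_i^2 = \exp(\hat{\alpha}_1 + \hat{\alpha}_2 z_i)$$

$$\frac{y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{x_i}{\sigma_i} + \frac{e_i}{\sigma_i}$$

$$\operatorname{var}\!\left(\frac{e_i}{\sigma_i}\right) = \frac{1}{\sigma_i^2}\operatorname{var}(e_i) = \frac{1}{\sigma_i^2}\,\sigma_i^2 = 1$$

$$y_i^* = \frac{y_i}{\hat{\sigma}_i} \qquad x_{i1}^* = \frac{1}{\hat{\sigma}_i} \qquad x_{i2}^* = \frac{x_i}{\hat{\sigma}_i} \tag{8.23}$$

$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + e_i^* \tag{8.24}$$

$$y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i \tag{8.25}$$

$$\operatorname{var}(e_i) = \sigma_i^2 = \exp(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}) \tag{8.26}$$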

The steps for obtaining a feasible generalized least squares estimator for $\beta_1, \beta_2, \ldots, \beta_K$ are:

1. Estimate (8.25) by least squares and compute the squares of the least squares residuals $\hat{e}_i^2$.

2. Estimate $\alpha_1, \alpha_2, \ldots, \alpha_S$ by applying least squares to the equation $\ln \hat{e}_i^2 = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS} + v_i$.

3. Compute variance estimates $\hat{\sigma}_i^2 = \exp(\hat{\alpha}_1 + \hat{\alpha}_2 z_{i2} + \cdots + \hat{\alpha}_S z_{iS})$.

4. Compute the transformed observations defined by (8.23), including $x_{i3}^*, \ldots, x_{iK}^*$ if $K > 2$.

5. Apply least squares to (8.24), or to an extended version of (8.24) if $K > 2$.

For our food expenditure example:

$$\begin{array}{l} \hat{y}_i = 76.05 + 10.63\,x_i \\ (\text{se}) \quad (9.71) \quad (.97) \end{array}$$

gen z = log(income)
regress food_exp income
predict ehat, residual
gen lnehat2 = log(ehat*ehat)
regress lnehat2 z

* --------------------------------------------
* Feasible GLS
* --------------------------------------------
predict sig2, xb
gen wt = exp(sig2)
regress food_exp income [aweight = 1/wt]
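The same feasible GLS steps can be sketched in plain Python for the one-regressor case with z = ln(x). This is an illustration under stated assumptions: the data are made up (they are not the food expenditure data), and `wls_simple` and `fgls_simple` are hypothetical helper names. It will not reproduce the estimates shown above.

```python
import math


def wls_simple(x, y, w):
    """Weighted least squares for y = b1 + b2*x + e (all weights 1 gives OLS)."""
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    b2 = sxy / sxx
    return yw - b2 * xw, b2


def fgls_simple(x, y):
    ones = [1.0] * len(x)
    # Step 1: least squares residuals.
    b1, b2 = wls_simple(x, y, ones)
    ehat = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    # Step 2: regress ln(ehat^2) on z = ln(x) to estimate alpha1, alpha2.
    z = [math.log(xi) for xi in x]
    ln_e2 = [math.log(e * e) for e in ehat]
    a1, a2 = wls_simple(z, ln_e2, ones)
    # Step 3: variance estimates sigma2_hat_i = exp(a1 + a2 * z_i).
    sigma2_hat = [math.exp(a1 + a2 * zi) for zi in z]
    # Steps 4-5: dividing every variable by sigma_hat_i and applying least
    # squares is the same as weighted least squares with weights 1/sigma2_hat_i.
    weights = [1.0 / s for s in sigma2_hat]
    g1, g2 = wls_simple(x, y, weights)
    return (g1, g2), (a1, a2), sigma2_hat


# Made-up income and food expenditure figures (illustration only).
income = [4.0, 6.0, 9.0, 12.0, 15.0, 19.0, 22.0, 26.0, 30.0, 33.0]
food_exp = [110.0, 150.0, 160.0, 230.0, 190.0, 330.0, 240.0, 420.0, 310.0, 480.0]

beta, alpha, sigma2_hat = fgls_simple(income, food_exp)
print("FGLS estimates:", beta)
print("variance function estimates:", alpha)
```

The weights 1/sigma2_hat play the role of `[aweight = 1/wt]` in the Stata commands.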

The HET command can be used for Maximum Likelihood Estimation of the model given in Equations (8.25) and (8.26), p. 207.

This method is an alternative estimation method to the GLS method discussed in the text (so the results will also be different):

HET FOOD INCOME (INCOME) / MODEL=MULT

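$$\begin{array}{l} \widehat{WAGE} = -9.914 + 1.234\,EDUC + .133\,EXPER + 1.524\,METRO \\ (\text{se}) \qquad\quad (1.08) \qquad (.070) \qquad\qquad (.015) \qquad\qquad (.431) \end{array}$$

$$WAGE_{Mi} = \beta_{M1} + \beta_2 EDUC_{Mi} + \beta_3 EXPER_{Mi} + e_{Mi} \qquad i = 1, 2, \ldots, N_M$$

$$WAGE_{Ri} = \beta_{R1} + \beta_2 EDUC_{Ri} + \beta_3 EXPER_{Ri} + e_{Ri} \qquad i = 1, 2, \ldots, N_R$$

$$b_{M1} = -9.914 + 1.524 = -8.39$$

Using our wage data (cps2.dta):

$$\operatorname{var}(e_{Mi}) = \sigma_M^2 \qquad \operatorname{var}(e_{Ri}) = \sigma_R^2$$

$$\hat{\sigma}_M^2 = 31.824 \qquad \hat{\sigma}_R^2 = 15.243$$

$$b_{M1} = -9.052 \qquad b_{M2} = 1.282 \qquad b_{M3} = .1346$$

$$b_{R1} = -6.166 \qquad b_{R2} = .956 \qquad b_{R3} = .1260$$

$$\frac{WAGE_{Mi}}{\sigma_M} = \beta_{M1} \frac{1}{\sigma_M} + \beta_2 \frac{EDUC_{Mi}}{\sigma_M} + \beta_3 \frac{EXPER_{Mi}}{\sigma_M} + \frac{e_{Mi}}{\sigma_M} \qquad i = 1, 2, \ldots, N_M$$

$$\frac{WAGE_{Ri}}{\sigma_R} = \beta_{R1} \frac{1}{\sigma_R} + \beta_2 \frac{EDUC_{Ri}}{\sigma_R} + \beta_3 \frac{EXPER_{Ri}}{\sigma_R} + \frac{e_{Ri}}{\sigma_R} \qquad i = 1, 2, \ldots, N_R$$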

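Feasible generalized least squares:

1. Obtain estimates $\hat{\sigma}_M$ and $\hat{\sigma}_R$ by applying least squares separately to the metropolitan and rural observations.

2. Let
$$\hat{\sigma}_i = \begin{cases} \hat{\sigma}_M & \text{when } METRO_i = 1 \\ \hat{\sigma}_R & \text{when } METRO_i = 0 \end{cases}$$

3. Apply least squares to the transformed model
$$\frac{WAGE_i}{\hat{\sigma}_i} = \beta_{R1} \frac{1}{\hat{\sigma}_i} + \beta_2 \frac{EDUC_i}{\hat{\sigma}_i} + \beta_3 \frac{EXPER_i}{\hat{\sigma}_i} + \delta \frac{METRO_i}{\hat{\sigma}_i} + \frac{e_i}{\hat{\sigma}_i}$$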

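$$\begin{array}{l} \widehat{WAGE} = -9.398 + 1.196\,EDUC + .132\,EXPER + 1.539\,METRO \\ (\text{se}) \qquad\quad (1.02) \qquad (.069) \qquad\qquad (.015) \qquad\qquad (.346) \end{array}$$

. regress wage educ exper metro [aweight = 1/wt]
(sum of wgt is   3.7986e+01)

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  3,   996) =  123.75
       Model |   9797.0667     3   3265.6889           Prob > F      =  0.0000
    Residual |  26284.1488   996  26.3897076           R-squared     =  0.2715
-------------+------------------------------           Adj R-squared =  0.2693
       Total |  36081.2155   999  36.1173328           Root MSE      =  5.1371

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   1.195721    .068508    17.45   0.000     1.061284    1.330157
       exper |   .1322088   .0145485     9.09   0.000     .1036595     .160758
       metro |   1.538803   .3462856     4.44   0.000     .8592702    2.218336
       _cons |  -9.398362   1.019673    -9.22   0.000    -11.39931   -7.397408
------------------------------------------------------------------------------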

STATA Commands:

* --------------------------------------------
* Rural subsample regression
* --------------------------------------------
regress wage educ exper if metro == 0
scalar rmse_r = e(rmse)
scalar df_r = e(df_r)

* --------------------------------------------
* Urban subsample regression
* --------------------------------------------
regress wage educ exper if metro == 1
scalar rmse_m = e(rmse)
scalar df_m = e(df_r)

* --------------------------------------------
* Groupwise heteroskedastic regression using FGLS
* --------------------------------------------
gen rural = 1 - metro
gen wt=(rmse_r^2*rural) + (rmse_m^2*metro)
regress wage educ exper metro [aweight = 1/wt]
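The groupwise weighting can be sketched in Python as follows. The two variance estimates are the ones quoted in these notes; the function names and the example worker are hypothetical and serve only to show how each observation is scaled by its own group's standard deviation.

```python
SIGMA2_M = 31.824   # estimated error variance, metropolitan subsample (from the notes)
SIGMA2_R = 15.243   # estimated error variance, rural subsample (from the notes)


def sigma_hat(metro):
    """Standard deviation estimate assigned to an observation by its METRO value."""
    return (SIGMA2_M if metro == 1 else SIGMA2_R) ** 0.5


def transform_row(wage, educ, exper, metro):
    """Divide the dependent variable, the constant and every regressor by sigma_hat_i."""
    s = sigma_hat(metro)
    return (wage / s, 1.0 / s, educ / s, exper / s, metro / s)


def analytic_weight(metro):
    """Weight 1 / sigma_hat_i^2, the analogue of [aweight = 1/wt] in the Stata code."""
    return 1.0 / sigma_hat(metro) ** 2


# A hypothetical rural worker: wage 10, 12 years of education, 5 years of experience.
print(transform_row(10.0, 12.0, 5.0, 0))
print(analytic_weight(1), analytic_weight(0))
```

Rural observations have the smaller estimated variance, so they receive the larger weight.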

Remark: To implement the generalized least squares estimators described in this Section for three alternative heteroskedastic specifications, an assumption about the form of the heteroskedasticity is required. Using least squares with White standard errors avoids the need to make an assumption about the form of heteroskedasticity, but does not realize the potential efficiency gains from generalized least squares.

8.4.1 Residual Plots

Estimate the model using least squares and plot the least squares residuals.

With more than one explanatory variable, plot the least squares residuals against each explanatory variable, or against $\hat{y}_i$, to see if those residuals vary in a systematic way relative to the specified variable.

8.4.2 The Goldfeld-Quandt Test

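$$F = \frac{\hat{\sigma}_M^2 / \sigma_M^2}{\hat{\sigma}_R^2 / \sigma_R^2} \sim F_{(N_M - K_M,\ N_R - K_R)}$$

$$H_0: \sigma_M^2 = \sigma_R^2 \quad \text{against} \quad H_1: \sigma_M^2 \neq \sigma_R^2$$

$$F = \frac{\hat{\sigma}_M^2}{\hat{\sigma}_R^2} = \frac{31.824}{15.243} = 2.09$$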


* --------------------------------------------
* Goldfeld Quandt test
* --------------------------------------------
scalar GQ = rmse_m^2/rmse_r^2
scalar crit = invFtail(df_m,df_r,.05)
scalar pvalue = Ftail(df_m,df_r,GQ)
scalar list GQ pvalue crit

For the food expenditure data:

$$\hat{\sigma}_1^2 = 3574.8 \qquad \hat{\sigma}_2^2 = 12{,}921.9$$

$$F = \frac{\hat{\sigma}_2^2}{\hat{\sigma}_1^2} = \frac{12{,}921.9}{3574.8} = 3.61$$

You should now be able to obtain this test statistic and check whether it exceeds the critical value.
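The mechanics of the test statistic can be sketched in plain Python. The helper names are hypothetical, the split into two equal halves after sorting is one common choice (a middle observation is dropped when N is odd), and the critical value still has to come from an F table or from software such as `invFtail` in Stata.

```python
def sse_simple_ols(x, y):
    """Sum of squared residuals from least squares of y on a constant and x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, k=2):
    """Sort by x, split into two halves, and compare the estimated error variances."""
    pairs = sorted(zip(x, y))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[-half:]
    s2_low = sse_simple_ols([p[0] for p in low], [p[1] for p in low]) / (len(low) - k)
    s2_high = sse_simple_ols([p[0] for p in high], [p[1] for p in high]) / (len(high) - k)
    return s2_low, s2_high, s2_high / s2_low


# Made-up data: deviations from the line y = x are three times larger in the upper half.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.0, 1.0, 2.0, 5.0, 8.0, 3.0, 4.0, 11.0]
s2_low, s2_high, gq = goldfeld_quandt(x, y)
print(s2_low, s2_high, gq)

# The ratios quoted in the notes, from the reported variance estimates.
print(round(12921.9 / 3574.8, 2))   # food expenditure data
print(round(31.824 / 15.243, 2))    # wage data, metropolitan vs rural
```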


Sort the data by income:

SORT INCOME FOOD / DESC

OLS FOOD INCOME

On the DIAGNOS command the CHOWONE= option reports the Goldfeld-Quandt test for heteroskedasticity (bottom of page 212) with a p-value for a one-sided test. The HET option reports the tests for heteroskedasticity reported on page 215.

DIAGNOS / CHOWONE=20 HET


Of course, this option also computes the Chow test statistic for structural change (that is, tests the null of parameter stability in the two subsamples).

SEQUENTIAL CHOW AND GOLDFELD-QUANDT TESTS
   N1   N2    SSE1        SSE2       CHOW    PVALUE    G-Q       DF1  DF2 PVALUE
   20   20 0.23259E+06  64346.     0.45855     0.636  3.615       18   18 0.005
CHOW TEST - F DISTRIBUTION WITH DF1=   2 AND DF2=  36


SHAZAM takes the alternative hypothesis to be a smaller error variance in the second subset than in the first subset.

Some authors present the alternative as larger variance in the second subset.

Goldfeld and Quandt recommend ordering the observations by the values of one of the explanatory variables.

This can be done with the SORT command in SHAZAM. The DESC option on the SORT command should be used if it is assumed that the variance is positively related to the value of the sort variable.

8.4.3 Testing the Variance Function

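$$y_i = E(y_i) + e_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + e_i$$

$$\operatorname{var}(y_i) = \sigma_i^2 = E(e_i^2) = h(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS})$$

$$h(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}) = \exp(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS})$$

$$h(\alpha_1 + \alpha_2 z_i) = \exp\!\left[\ln(\sigma^2) + \gamma \ln(x_i)\right]$$

$$h(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}) = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}$$

$$h(\alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}) = h(\alpha_1)$$

$$H_0: \alpha_2 = \alpha_3 = \cdots = \alpha_S = 0$$

$$H_1: \text{not all the } \alpha_s \text{ in } H_0 \text{ are zero}$$

$$\operatorname{var}(y_i) = \sigma_i^2 = E(e_i^2) = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS}$$

$$e_i^2 = E(e_i^2) + v_i = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS} + v_i$$

$$\hat{e}_i^2 = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS} + v_i$$

$$\chi^2 = N \times R^2 \sim \chi^2_{(S-1)}$$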

8.4.3a The White Test

$$E(y_i) = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3}$$

$$z_2 = x_2 \qquad z_3 = x_3 \qquad z_4 = x_2^2 \qquad z_5 = x_3^2$$

8.4.3b Testing the Food Expenditure Example

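$$SST = 4{,}610{,}749{,}441 \qquad SSE = 3{,}759{,}556{,}169$$

$$R^2 = 1 - \frac{SSE}{SST} = .1846$$

$$\chi^2 = N \times R^2 = 40 \times .1846 = 7.38$$

$$\chi^2 = N \times R^2 = 40 \times .18888 = 7.555 \qquad p\text{-value} = .023$$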
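Both N × R² statistics and their p-values can be checked with Python's standard library. This sketch relies on the closed forms of the chi-square upper-tail probability for 1 and 2 degrees of freedom only; the function names are hypothetical, and other degrees of freedom would need a statistics package.

```python
import math


def nr2_statistic(n, sst, sse):
    """N times R-squared of the auxiliary regression, with R^2 = 1 - SSE/SST."""
    return n * (1.0 - sse / sst)


def chi2_upper_tail(stat, df):
    """P(chi-square with df degrees of freedom > stat), for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Squared residuals regressed on income: S - 1 = 1 degree of freedom.
lm = nr2_statistic(40, 4_610_749_441, 3_759_556_169)
p_lm = chi2_upper_tail(lm, 1)

# White test (income and income squared): 2 degrees of freedom, R^2 = .18888.
white = 40 * 0.18888
p_white = chi2_upper_tail(white, 2)

print(round(lm, 3), round(p_lm, 5))
print(round(white, 3), round(p_white, 5))
```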

whitetst

Or

estat imtest, white

Further testing in SHAZAM

DIAGNOS / HET

will yield a battery of heteroskedasticity tests using different specifications.

SHAZAM for food example

DIAGNOS\HET
REQUIRED MEMORY IS PAR=       7 CURRENT PAR=   22480
DEPENDENT VARIABLE = FOOD            40 OBSERVATIONS
REGRESSION COEFFICIENTS
   10.2096426868       83.4160065402

HETEROSKEDASTICITY TESTS
                             CHI-SQUARE     D.F.   P-VALUE
                           TEST STATISTIC
E**2 ON YHAT:                      7.384     1    0.00658
E**2 ON YHAT**2:                   7.549     1    0.00600
E**2 ON LOG(YHAT**2):              6.516     1    0.01069
E**2 ON LAG(E**2) ARCH TEST:       0.089     1    0.76544
LOG(E**2) ON X (HARVEY) TEST:     10.654     1    0.00110
ABS(E) ON X (GLEJSER) TEST:       11.466     1    0.00071
E**2 ON X                 TEST:
          KOENKER(R2):             7.384     1    0.00658
          B-P-G (SSR) :            7.344     1    0.00673
E**2 ON X X**2    (WHITE) TEST:
          KOENKER(R2):             7.555     2    0.02288
          B-P-G (SSR) :            7.514     2    0.02336

Same in SLR

Principles of Econometrics, 3rd Edition

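$$y_i = \beta_1 + \beta_2 x_i + e_i$$

$$E(e_i) = 0 \qquad \operatorname{var}(e_i) = \sigma_i^2 \qquad \operatorname{cov}(e_i, e_j) = 0 \quad (i \neq j)$$

$$b_2 = \beta_2 + \sum w_i e_i \tag{8A.1}$$

$$w_i = \frac{x_i - \bar{x}}{\sum (x_i - \bar{x})^2}$$

$$E(b_2) = E(\beta_2) + E\!\left(\sum w_i e_i\right) = \beta_2 + \sum w_i E(e_i) = \beta_2$$

$$\begin{aligned} \operatorname{var}(b_2) &= \operatorname{var}\!\left(\sum w_i e_i\right) \\ &= \sum w_i^2 \operatorname{var}(e_i) + \sum_{i \neq j}\sum w_i w_j \operatorname{cov}(e_i, e_j) \\ &= \sum w_i^2 \sigma_i^2 \\ &= \frac{\sum (x_i - \bar{x})^2 \sigma_i^2}{\left[\sum (x_i - \bar{x})^2\right]^2} \end{aligned} \tag{8A.2}$$

$$\operatorname{var}(b_2) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2} \tag{8A.3}$$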

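$$\hat{e}_i^2 = \alpha_1 + \alpha_2 z_{i2} + \cdots + \alpha_S z_{iS} + v_i \tag{8B.1}$$

$$F = \frac{(SST - SSE)/(S - 1)}{SSE/(N - S)} \tag{8B.2}$$

$$SST = \sum_{i=1}^{N} \left(\hat{e}_i^2 - \overline{\hat{e}^2}\right)^2 \quad \text{and} \quad SSE = \sum_{i=1}^{N} \hat{v}_i^2$$

$$\chi^2 = (S - 1) \times F = \frac{SST - SSE}{SSE/(N - S)} \sim \chi^2_{(S-1)} \tag{8B.3}$$

$$\widehat{\operatorname{var}}(e_i^2) = \widehat{\operatorname{var}}(v_i) = \frac{SSE}{N - S} \tag{8B.4}$$

$$\chi^2 = \frac{SST - SSE}{\widehat{\operatorname{var}}(e_i^2)} \tag{8B.5}$$

$$\chi^2 = \frac{SST - SSE}{2\hat{\sigma}_e^4} \tag{8B.6}$$

$$\operatorname{var}\!\left(\frac{e_i^2}{\sigma_e^2}\right) = 2 \quad\Rightarrow\quad \frac{1}{\sigma_e^4}\operatorname{var}(e_i^2) = 2 \quad\Rightarrow\quad \operatorname{var}(e_i^2) = 2\sigma_e^4$$

$$\widehat{\operatorname{var}}(e_i^2) = \frac{1}{N}\sum_{i=1}^{N} \left(\hat{e}_i^2 - \overline{\hat{e}^2}\right)^2 = \frac{SST}{N} \tag{8B.7}$$

$$\chi^2 = \frac{SST - SSE}{SST/N} = N \times \left(1 - \frac{SSE}{SST}\right) = N \times R^2 \tag{8B.8}$$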