Econometrics HW V

UNIVERSITY OF ILLINOIS AT CHICAGO

MASTER PROGRAM IN ECONOMICS

ECONOMETRICS II

PROBLEM SET # 5

STUDENT: HARLAN JEANCARLO LOPEZ OLIVAS

03/29/2011

1. Carefully discuss the advantages and disadvantages of 2SLS, 3SLS, LIML, OLS and I3SLS.

It can be shown that, at least in some cases, OLS has a smaller variance about its mean than 2SLS has about its mean, leading to the possibility that OLS might be more precise in a mean squared error sense. But this result must be tempered by the finding that the OLS standard errors are, in all likelihood, not useful for inference purposes. Nonetheless, OLS is a frequently used estimator. Obviously, this discussion is relevant only to finite samples. Asymptotically, 2SLS must dominate OLS, and in a correctly specified model, any full information estimator must dominate any limited information one. The finite sample properties are of crucial importance: most of what we know is asymptotic, but most applications are based on rather small or moderately sized samples (Greene, 2008, p. 386).

OLS is often used as a benchmark because, among the class of linear unbiased estimators, it has minimum variance. The loss in predictive power of LIML and 2SLS has to be weighed against the fact that OLS produces biased estimates. If reduced-form coefficients are desired, the identities in the system must be entered. The number of identities plus the number of estimated equations must equal the number of endogenous variables in the model.

Moreover, two-stage least squares estimation of an equation with endogenous variables on the right, in contrast with OLS estimation, in theory produces consistent coefficient estimates at the cost of some loss of efficiency. If a large system is estimated, it is often impossible to use all exogenous variables in the system because of the loss of degrees of freedom. The usual practice is to select a subset of the exogenous variables. The greater the number of exogenous variables relative to the degrees of freedom, the closer the predicted Y variables on the right are to the raw Y variables on the right. In this situation, the 2SLS sum of squared residuals will approach the OLS sum of squared residuals, and such an estimator will lose the consistency property of the 2SLS estimator. Usual econometric practice is to use both OLS and 2SLS and compare the results to see how sensitive the OLS results are to simultaneity problems.
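The OLS-versus-2SLS comparison described above can be seen in a small simulation. The following Python sketch is my own illustration (simulated data and variable names are not from the Klein model): OLS is biased when the regressor is correlated with the error, while the two-stage procedure recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# One endogenous regressor x, one instrument z.
z = rng.normal(size=n)                       # instrument: shifts x, not y directly
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous: cov(x, u) > 0
y = 1.0 + 2.0 * x + u                        # true slope is 2.0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# OLS: inconsistent because x is correlated with u.
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: stage 1 projects X on Z; stage 2 regresses y on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print("OLS slope:", b_ols[1])    # biased upward, since cov(x, u) > 0
print("2SLS slope:", b_2sls[1])  # should be close to the true value 2.0
```

In repeated samples the 2SLS slope is centered near 2.0 but has a larger variance than the OLS slope, which is the finite-sample trade-off the passage describes.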

While 2SLS results are sensitive to the variable that is used to normalize the system, limited information maximum likelihood (LIML) estimation, which can be used in place of 2SLS, is not so sensitive. The LIML estimator, which is hard to explain in simple terms, involves selecting values for b and δ for each equation in (1.1) such that L is minimized, where L = SSE1 / SSE. We define SSE1 as the residual variance from regressing a weighted average of the y variables in the equation on all exogenous variables in the equation, while SSE is the residual variance from regressing the same weighted average of the y variables on all the exogenous variables in the system. Since SSE ≤ SSE1, L is bounded below by 1. The difficulty in LIML estimation is selecting the weights for combining the y variables in the equation.

Assume equation (1.1):


(1.1)

Then

(1.2)

Ignoring time subscripts, we can define

(1.3)

If we define y1* as the weighted average of the y variables in the equation, with weight vector B1*, then knowing B1* we would know y1*, and we could regress y1* on all x variables on the right in that equation and call the residual variance SSE1, and next regress y1* on all x variables in the system and call the residual variance SSE. If we define X1 as a matrix consisting of the columns of the x variables on the right, X1 = [x1i,...,x1K], and we knew B1*, then we could estimate Γ1

as

(1.4)

However, we do not know B1*. If we define

(1.5)

(1.6)

where X is the matrix of all x variables in the system, then L can be written as

(1.7)

Minimizing L implies that

det (1.8)

The LIML estimator uses eigenvalue analysis to select the vector B1* such that L is minimized. This calculation involves solving the system


(1.9)

for the smallest root. This root can be substituted back into equation (1.8) to get B1* and into equation (1.4) to get Γ1. Jennings shows that equation (1.9) can be rewritten as

. (1.10)

Further factorizations lead to improvements in accuracy and speed over the traditional methods of solution outlined in Johnston (1984), Kmenta (1971), and other books. Jennings (1973, 1980) briefly discusses tests made for computational accuracy, given the number of significant digits in the data, and various tests for non-unique solutions. Since the LIML standard errors are known only asymptotically and are, in fact, equal to the 2SLS estimated standard errors, the same standard errors are used for both the 2SLS and LIML estimators. LIML has several disadvantages: it is computationally expensive, and, as mentioned before, it has the same asymptotic distribution as 2SLS yet does not perform in small samples as well as 2SLS does. Its advantage is that it does not vary when the equations are renormalized.
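The eigenvalue calculation described above can be sketched numerically. The following Python illustration is my own (function names and simulated data are assumptions, not from the text): it computes the smallest root of det(W1 − kW) = 0, where W1 uses only the included exogenous variables and W uses all exogenous variables in the system; that root is the LIML "k" and is bounded below by 1.

```python
import numpy as np

def resid_cross(Y, A):
    """Y' M_A Y, where M_A = I - A (A'A)^{-1} A' annihilates the columns of A."""
    R = Y - A @ np.linalg.solve(A.T @ A, A.T @ Y)
    return R.T @ R

def liml_k(y, Y1, X1, X):
    """Smallest root of det(W1 - k W) = 0: W1 projects on the included
    exogenous variables X1 only, W on all exogenous variables X."""
    Yall = np.column_stack([y, Y1])
    W1 = resid_cross(Yall, X1)
    W = resid_cross(Yall, X)
    roots = np.linalg.eigvals(np.linalg.solve(W, W1))
    return float(np.min(roots.real))

# Simulated example: one endogenous regressor, a constant as the included
# exogenous variable, two excluded instruments (all numbers illustrative).
rng = np.random.default_rng(1)
n = 300
X1 = np.ones((n, 1))                          # included exogenous: constant
Zex = rng.normal(size=(n, 2))                 # excluded exogenous variables
X = np.column_stack([X1, Zex])
u = rng.normal(size=n)
y1 = Zex @ np.array([1.0, 0.5]) + 0.6 * u + rng.normal(size=n)  # endogenous
y = 1.0 + 2.0 * y1 + u

k = liml_k(y, y1.reshape(-1, 1), X1, X)
print(k)  # bounded below by 1; close to 1 when the instruments are valid
```

Because the span of X1 is contained in the span of X, W1 − W is positive semidefinite, so the smallest root is at least 1, matching the statement that L is bounded below by 1.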

Three-stage least squares utilizes the covariance of the residuals across equations from the estimated 2SLS model to improve the estimated coefficients B and Γ. If the model has only exogenous variables on the right-hand side (B = 0), the OLS estimates can be used to calculate the covariance of the residuals across equations. The resulting estimator is the seemingly unrelated regression model (SUR).

In a model with G equations, if the equation of interest is the jth equation, then assuming the exogenous variables in the system are selected correctly and the jth equation is specified correctly, the 2SLS estimates of the jth equation are invariant to the specification of the other equations. The 3SLS estimates of the jth equation, in contrast, are sensitive to the specification of the other equations in the system, since changes in other equation specifications will alter the estimate of V and thus the 3SLS estimator of δ from equation

(1.11)

Because of this fact, it is imperative that users first inspect the 2SLS estimates closely. The constrained reduced-form estimates, π, should be calculated from the OLS and 2SLS models and compared. The differences show the effects of correcting for simultaneity. Next, 3SLS should be performed. A study of the resulting changes in δ and π will show the gain from moving to a system-wide estimation procedure. Since changes in the functional form of one equation i can impact the estimates of another equation j, sensitivity analysis should be attempted at this step of model building.

In a multiequation system, the movement from 2SLS to 3SLS often produces changes in the estimate of δi for one equation but not for another. In a model in which all equations are overidentified, the 3SLS estimators will in general differ from the 2SLS estimators. If all equations are exactly identified, then V (in the first stage of 2SLS, π is the unconstrained reduced form, Y = πX + V) is a diagonal matrix (Theil 1971, 511) and there is no gain for any equation from using 3SLS. When one equation is overidentified and one equation is exactly identified, only the latter will be changed by 3SLS. This is because the exactly identified equation gains from information in the overidentified equation, but the reverse is not true: the overidentified equation does not gain from information in the exactly identified equation.
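The 2SLS-then-3SLS sequence can be sketched with the textbook formulas. The following Python illustration is my own (the two-equation simulated system, function names, and data are assumptions, not the Klein model): equation-by-equation 2SLS gives residuals for estimating Σ, and one stacked GLS step then yields the 3SLS estimates.

```python
import numpy as np

def three_sls(ys, Zs, X):
    """3SLS for a system y_i = Z_i d_i + u_i with a common instrument
    matrix X: 2SLS per equation, estimate the residual covariance Sigma,
    then one stacked GLS step (no small-sample corrections)."""
    n, G = len(ys[0]), len(ys)
    Px = X @ np.linalg.solve(X.T @ X, X.T)     # projection onto instruments
    U = np.empty((n, G))
    for i, (y, Z) in enumerate(zip(ys, Zs)):   # stages 1-2: 2SLS residuals
        d = np.linalg.solve(Z.T @ Px @ Z, Z.T @ Px @ y)
        U[:, i] = y - Z @ d
    Sinv = np.linalg.inv(U.T @ U / n)          # inverse of estimated Sigma
    k = [Z.shape[1] for Z in Zs]
    off = np.concatenate([[0], np.cumsum(k)])
    A = np.zeros((off[-1], off[-1]))
    b = np.zeros(off[-1])
    for i in range(G):                         # stage 3: stacked GLS blocks
        for j in range(G):
            A[off[i]:off[i+1], off[j]:off[j+1]] = Sinv[i, j] * (Zs[i].T @ Px @ Zs[j])
            b[off[i]:off[i+1]] += Sinv[i, j] * (Zs[i].T @ Px @ ys[j])
    d = np.linalg.solve(A, b)
    return [d[off[i]:off[i+1]] for i in range(G)]

# Two-equation simulated system with correlated errors (illustrative only):
rng = np.random.default_rng(2)
n = 400
x1, x2 = rng.normal(size=n), rng.normal(size=n)
e = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y1 = 1.0 + 1.0 * x1 + e[:, 0]
y2 = 2.0 + 0.7 * y1 + 0.3 * x2 + e[:, 1]      # y1 is endogenous here
X = np.column_stack([np.ones(n), x1, x2])
d1, d2 = three_sls([y1, y2],
                   [np.column_stack([np.ones(n), x1]),
                    np.column_stack([np.ones(n), y1, x2])], X)
print(d2)  # should be near the true values [2.0, 0.7, 0.3]
```

The cross-equation weighting by Σ̂⁻¹ is what lets one equation "borrow" information from another, which is the source of the identification results discussed above.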

Finally, iterative 3SLS is an alternative final step in which the estimate of V is updated with the information from the 3SLS estimates. The question then becomes where to stop iterating on the estimate of V. The simeq command


on B34S uses the information on the number of significant digits (see the ipr parameter) in the raw data and equation (1.12),

(1.12)

to terminate the I3SLS iterations if the relative change is within what would be expected, given the number of significant digits in the raw data. If ipr is not set, the simeq command assumes ten digits (see Stokes, ch. 4, pp. 1-16).

The following tables show the KLEIN MODEL I results.


2. Discuss and Contrast GMM and 2SLS.

The Generalized Method of Moments was introduced by L. Hansen in his celebrated 1982 paper. The application of the instrumental variables (IV) estimator in the context of the classical linear regression model is, from a textbook standpoint, quite straightforward: if the error distribution cannot be considered independent of the regressors' distribution, IV is called for, using an appropriate set of instruments. But applied researchers often must confront several hard choices in this context.

An omnipresent problem in empirical work is heteroskedasticity. Although the consistency of the IV coefficient estimates is not affected by the presence of heteroskedasticity, the standard IV estimates of the standard errors are inconsistent, preventing valid inference. The usual forms of the diagnostic tests for endogeneity and overidentifying restrictions will also be invalid if heteroskedasticity is present. These problems can be partially addressed through the use of heteroskedasticity-consistent or robust standard errors and statistics. The conventional IV estimator (though consistent) is, however, inefficient in the presence of heteroskedasticity. The usual approach today when facing heteroskedasticity of unknown form is to use the Generalized Method of Moments (GMM), introduced by L. Hansen (1982). GMM makes use of the orthogonality conditions to allow for efficient estimation in the presence of heteroskedasticity of unknown form (Baum et al., 2002).

The Generalized Method of Moments estimation technique is a generalization of 2SLS that allows for various assumptions on the error distribution. Assume there are l instruments in Z. The basic idea of GMM is to select coefficients such that

(2.1)

where

(2.2)

It can be shown that the efficient GMM estimator is

(2.3)

where

(2.4)

Using the 2SLS residuals, a Heteroskedasticity-consistent estimator of S can be obtained as


(2.5)

which has been characterized as a standard sandwich approach to robust covariance estimation.
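Equations (2.3)-(2.5) can be sketched directly. The following Python illustration is my own (function names and the simulated heteroskedastic data are assumptions): step one is 2SLS, whose residuals feed the White-type estimate of S; step two reweights the moment conditions with S⁻¹.

```python
import numpy as np

def gmm_iv(y, X, Z):
    """Two-step efficient GMM for y = X b + u with instruments Z.
    Step 1 is 2SLS; step 2 uses S_hat = (1/n) sum u_i^2 z_i z_i'."""
    n = len(y)
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    b1 = np.linalg.lstsq(X_hat, y, rcond=None)[0]        # 2SLS estimate
    u = y - X @ b1
    S = (Z * (u ** 2)[:, None]).T @ Z / n                # robust weighting matrix
    Sinv = np.linalg.inv(S)
    XtZ = X.T @ Z
    b2 = np.linalg.solve(XtZ @ Sinv @ XtZ.T, XtZ @ Sinv @ (Z.T @ y))
    return b1, b2

# Heteroskedastic simulated example (overidentified: two instruments):
rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=(n, 2))
u = rng.normal(size=n) * (1 + 0.5 * np.abs(z[:, 0]))     # heteroskedastic error
x = z @ np.array([1.0, 0.5]) + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

b_2sls, b_gmm = gmm_iv(y, X, Z)
print(b_2sls[1], b_gmm[1])  # both should be near the true slope 2.0
```

Both estimators are consistent here; the gain from the second step is efficiency, since the weighting matrix accounts for the heteroskedasticity of unknown form.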

Hall, Rudebusch, and Wilcox (1996) proposed a likelihood ratio test of the relevance of the instrumental variables Z that is based on the canonical correlations between X and Z. The ordered canonical correlation vector can be calculated as the square roots of the eigenvalues of

(2.6)

with associated eigenvectors, or as the square roots of the eigenvalues of

(2.7)

with associated eigenvectors. The two sets of eigenvectors maximize the correlation between the corresponding linear combinations of X and Z; as Hall, Rudebusch, and Wilcox (1996, 287) note, they are the vectors that yield the highest correlation subject to the constraint that the remaining combinations are orthogonal. The proposed Anderson statistic

(2.8)

is distributed as chi-squared with (l − k + 1) degrees of freedom, where l is the rank of Z and k is the rank of X, and can be applied to both 2SLS and GMM models. A significant statistic is consistent with appropriate instruments. A disadvantage of the Anderson test is that it assumes that the regressors are distributed multivariate normal. Further information on the Anderson test is in Baum (2006, 208). The Anderson statistic can also be displayed in LM form, or in the Cragg-Donald (1993) form. If these statistics are not significant, the instruments selected are weak.

For GMM estimation, the Hansen (1982) J statistic, which tests the overidentifying restrictions, is usually used. The Hansen test, which is also called the Sargan (1958) test, is the value of the efficient GMM objective function

(2.9)

and is distributed as chi-squared with l − k degrees of freedom. A significant value indicates the selected instruments are not suitable.
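As a sketch of equation (2.9), the J statistic can be computed from the efficient GMM residuals; with valid instruments it should look like a small chi-squared draw with l − k degrees of freedom. The following Python illustration is my own (all data simulated, names mine):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
z = rng.normal(size=(n, 2))                    # two instruments
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + 0.5 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])           # k = 2 regressors
Z = np.column_stack([np.ones(n), z])           # l = 3 instruments

# First step: 2SLS residuals for the weighting matrix S.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
u1 = y - X @ np.linalg.lstsq(X_hat, y, rcond=None)[0]
S = (Z * (u1 ** 2)[:, None]).T @ Z / n
Sinv = np.linalg.inv(S)

# Second step: efficient GMM estimate, then J = n * gbar' S^{-1} gbar.
XtZ = X.T @ Z
b = np.linalg.solve(XtZ @ Sinv @ XtZ.T, XtZ @ Sinv @ (Z.T @ y))
gbar = Z.T @ (y - X @ b) / n                   # average moment conditions
J = n * gbar @ Sinv @ gbar
print(J)  # roughly chi-squared with l - k = 1 d.o.f. when instruments are valid
```

Since the instruments here are valid by construction, J is expected to be small; a large, significant J would flag unsuitable instruments, as the text states.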

The Basmann (1960) overidentification test is


(2.10)

where the numerator uses the residual from the 2SLS equation and the denominator uses the residual from a model that predicts it as a function of Z. The Basmann test is distributed as chi-squared with l − k degrees of freedom. If the instruments Z have no predictive power, or in other words are orthogonal to the 2SLS residuals, the chi-squared value will not be significant. A significant chi-squared value, however, indicates that the instruments are not suitable since they are not exogenous.

The advantages of GMM over IV are clear: if heteroskedasticity is present, the GMM estimator is more efficient than the simple IV estimator, whereas if heteroskedasticity is not present, the GMM estimator is no worse asymptotically than the IV estimator. Nevertheless, the use of GMM does come with a price. The problem, as Hayashi (2000) points out, is that the optimal weighting matrix S at the core of efficient GMM is a function of fourth moments, and obtaining reasonable estimates of fourth moments requires very large samples; as a result, efficient GMM estimators can have poor small-sample properties. In particular, Wald tests tend to over-reject the null. If in fact the error is homoskedastic, IV would be preferable to efficient GMM. For this reason, a test for the presence of heteroskedasticity when one or more regressors is endogenous may be useful in deciding whether IV or GMM is called for (see Baum et al., 2002, and Pagan and Hall, 1983).

Table IV presents the results from KLEIN MODEL I.

Note: in order to check the accuracy of the results using RATS, the following code was used:

b34sexec options ginclude('b34sdata.mac') member(klein1); b34srun;

b34sexec data set maxlag=1;
   build plag xlag;
   gen plag = lag1(profit);
   gen xlag = lag1(x);
b34srun;

b34sexec simeq printsys reduced ols liml ls2 ls3 ils3 kcov=diag maxit=100;
   Exogenous constant wg gov tax a klag plag xlag;
   Endogenous con invest wp x profit k twage;
   model lvar=con rvar=(constant profit plag twage) name=('c');
   model lvar=invest rvar=(constant profit plag klag) name=('I');
   model lvar=wp rvar=(constant x xlag a) name=('wp');
b34seend;

B34SEXEC OPTIONS HEADER$ B34SRUN$
b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$
b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$
b34sexec options clean(28)$ b34srun$
b34sexec options clean(29)$ b34srun$
b34sexec pgmcall$
rats passasts


pcomments('* ','* Data passed from B34S(r) system to RATS','* ',
   "display @1 %dateandtime() @33 ' Rats Version ' %ratsversion()",'* ') $
PGMCARDS$
*
* heading=('HW5 Question 1 - Klein model ') $
* exogenous constant wg gov tax a klag plag xlag $
* endogenous con invest wp x profit k twage $
* model lvar=con rvar=(constant profit plag twage) name=('c');
* model lvar=invest rvar=(constant profit plag klag) name=('I');
* model lvar=wp rvar=(constant x xlag a) name=('wp');

instruments plag klag xlag wg gov a tax constant

* OLS
linreg con
# constant profit plag twage
* 2SLS
linreg(inst) con
# constant profit plag twage
* GMM
linreg(inst,optimalweights) con
# constant profit plag twage

* OLS
linreg invest
# constant profit plag klag
* 2SLS
linreg(inst) invest
# constant profit plag klag
* GMM
linreg(inst,optimalweights) invest
# constant profit plag klag

* OLS
linreg wp
# constant x xlag a
* 2SLS
linreg(inst) wp
# constant x xlag a
* GMM
linreg(inst,optimalweights) wp
# constant x xlag a

b34sreturn$
b34srun $
b34sexec options close(28)$ b34srun$
b34sexec options close(29)$ b34srun$
b34sexec options/$
   dodos(' rats386 rats.in rats.out ')
   dodos('start /w /r rats32s rats.in /run')
   dounix('rats rats.in rats.out')$
B34SRUN$
b34sexec options npageout
   WRITEOUT('Output from RATS',' ',' ')
   COPYFOUT('rats.out')
   dodos('ERASE rats.in','ERASE rats.out','ERASE rats.dat')
   dounix('rm rats.in','rm rats.out','rm rats.dat')$
B34SRUN$


3. MATLAB SOLUTION

y1 = γ1 y2 + β11 x1 + β21 x2
y2 = γ2 y1 + β32 x3 + β42 x4    (3.1)

The above simultaneous equations can be written as

(3.2)

From equation (3.2), the constrained reduced form can be calculated as

(3.3)

If π is estimated directly with OLS, it is called the unconstrained reduced form. The B34S simeq command estimates B using OLS, 2SLS, LIML, 3SLS, I3SLS, or FIML. For each estimated vector B, the associated reduced-form coefficient vector π can be optionally calculated. If B is estimated by OLS, the coefficients will be biased, since the key OLS assumption that the right-hand-side variables are orthogonal to the error term is violated. Model (3.2) can be normalized such that the coefficient of the left-hand-side endogenous variable in each equation is one. The necessary condition for identification of each equation is that the number of endogenous variables minus 1 be less than or equal to the number of excluded exogenous variables. The reason for this restriction is that otherwise it would not be possible to solve for the elements of B and Γ uniquely in terms of the other parameters of the model. Finally, in this section's notation, B is the matrix of coefficients corresponding to the endogenous variables, Γ corresponds to the coefficients of the exogenous x's, and U is the vector of residuals.

\[
\pi = -B^{-1}\Gamma = \frac{1}{1-\gamma_1\gamma_2}
\begin{bmatrix}
\beta_{11} & \beta_{21} & \gamma_1\beta_{32} & \gamma_1\beta_{42} \\
\gamma_2\beta_{11} & \gamma_2\beta_{21} & \beta_{32} & \beta_{42}
\end{bmatrix}
\tag{3.4}
\]

Using the MATLAB code, we calculate the reduced form:

% y1= g1*y2 + b11*x1 + b21*x2
% y2= g2*y1 + b32*x3 + b42*x4

% We know BY+GX=E

syms g1 g2 b11 b21 b32 b42

B =[ 1, -g1; -g2, 1]

G =[-b11,-b21,0,0; 0,0,-b32,-b42]


a= -1*inv(B)*G

p11=a(1,1)
p12=a(1,2)
p13=a(1,3)
p14=a(1,4)
p21=a(2,1)
p22=a(2,2)
p23=a(2,3)
p24=a(2,4)

< M A T L A B > Copyright 1984-2006 The MathWorks, Inc. Version 7.3.0.298 (R2006b) August 03, 2006

To get started, type one of these: helpwin, helpdesk, or demo. For product information, visit www.mathworks.com.

>> % y1= g1*y2 + b11*x1 + b21*x2
>> % y2= g2*y1 + b32*x3 + b42*x4
>> % We know BY+GX=E
>> syms g1 g2 b11 b21 b32 b42
>> B =[ 1, -g1; -g2, 1]

B =
[   1, -g1]
[ -g2,   1]

>> G =[-b11,-b21,0,0; 0,0,-b32,-b42]

G =
[ -b11, -b21,    0,    0]
[    0,    0, -b32, -b42]

>> a= -1*inv(B)*G

a =
[  -1/(-1+g1*g2)*b11,  -1/(-1+g1*g2)*b21, -g1/(-1+g1*g2)*b32, -g1/(-1+g1*g2)*b42]
[ -g2/(-1+g1*g2)*b11, -g2/(-1+g1*g2)*b21,  -1/(-1+g1*g2)*b32,  -1/(-1+g1*g2)*b42]

>> p11=a(1,1)

p11 = -1/(-1+g1*g2)*b11


>> p12=a(1,2)
p12 = -1/(-1+g1*g2)*b21
>> p13=a(1,3)
p13 = -g1/(-1+g1*g2)*b32
>> p14=a(1,4)
p14 = -g1/(-1+g1*g2)*b42
>> p21=a(2,1)
p21 = -g2/(-1+g1*g2)*b11
>> p22=a(2,2)
p22 = -g2/(-1+g1*g2)*b21
>> p23=a(2,3)
p23 = -1/(-1+g1*g2)*b32
>> p24=a(2,4)
p24 = -1/(-1+g1*g2)*b42

4. CHARACTERISTIC ROOTS


4.1 Example 13.8 page 393 in Greene (2008)

test=[.172, -.051, -.008; 1.511, .848, .743; -.287, -.161, .818];
e=eig(test)
A=sqrt(e(2)*e(3))
B=acos(real(e(2))/A)
period=(2*pi()/B)

test2=[-0.1899, -0.9471, -0.8991; 0, 0.9287, 0; -0.0656, -0.0791, 0.0952]
e2=eig(test2)
A2=sqrt(e2(2)*e2(3))
B2=acos(real(e2(2))/A2)
period2=(2*pi()/B2)


>> test=[.172, -.051, -.008; 1.511, .848, .743; -.287, -.161, .818];
>> test

test =

 0.1720   -0.0510   -0.0080
 1.5110    0.8480    0.7430
-0.2870   -0.1610    0.8180

>> e=eig(test)

e =

0.2995
0.7692 + 0.3494i
0.7692 - 0.3494i

>> A=sqrt(e(2)*e(3))

A =

0.8449

>> B=acos(real(e(2))/A)

B =

0.4263


>> period=(2*pi()/B)

period =

14.7376

The characteristic roots of this matrix are 0.2995 and 0.7692 ± 0.3494i = 0.8449[cos(0.4263) ± i sin(0.4263)]. The modulus of the complex roots is 0.8449, so we conclude that the model is stable. The period of oscillation is 2π/0.4263 = 14.7376 periods (years). Using MATLAB, our roots agree with Greene (2008), while the complex roots and the period differ slightly from the results presented by Greene (2008).

4.2 Greene (2008) Question 7 page 397


>> test2=[-0.1899, -0.9471, -0.8991; 0, 0.9287, 0; -0.0656, -0.0791, 0.0952]

test2 =

-0.1899   -0.9471   -0.8991
      0    0.9287         0
-0.0656   -0.0791    0.0952


>> e2=eig(test2)

e2 =

-0.3290
 0.2343
 0.9287

>> A2=sqrt(e2(2)*e2(3))

A2 =

0.4664


>> B2=acos(real(e2(2))/A2)

B2 =

1.0446

>> period2=(2*pi()/B2)

period2 =

6.0148

Stability requires that all three characteristic roots have modulus less than one. From the MATLAB output, the roots are −0.3290, 0.2343, and 0.9287. Since each root has modulus less than one, there is no characteristic root larger than 1 in absolute value, and the system is stable.