CALIFORNIA INSTITUTE OF TECHNOLOGY · CMLE's under correct or incorrect specification of the...

33
DIVISION OF THE HUMANITIES AND SOCIAL SCIENCES CALIFORNIA INSTITUTE OF TECHNOLOGY PASADENA. CALIFORNIA 91125 MISSPECIFICATION CONDITIONAL MIMUM LIKELIHOOD ESTIMATION Quang H. Vuong ,�1UTE OF 0 < v . . , �v 5HALL SOCIAL SCIENCE WORKING PAPER 503 April 1983 Revised November 1983

Transcript of CALIFORNIA INSTITUTE OF TECHNOLOGY · CMLE's under correct or incorrect specification of the...

  • DIVISION OF THE HUMANITIES AND SOCIAL SCIENCES

    CALIFORNIA INSTITUTE OF TECHNOLOGY PASADENA. CALIFORNIA 91125

    MISSPECIFICATION AND CONDITIONAL MAXIMUM LIKELIHOOD ESTIMATION

    Quang H. Vuong

    ,�r,1\lUTE OF �� )"�

  • ABSTRACT

    Recently Whi te (1982) s tudi ed the prope rti e s of Maximum

    Like li hood e s timati on of po s s i b ly m i s spe c i fed mode ls. The p re s e nt

    pape r e xtends Andersen (197 0) re sults ou Con d i ti onal Maximum

    L ike li hood e s timators (CMLE) to suc h a s i tuati on. In parti cular, the

    asympto tic p rope rti e s o f CMLE's are de rived unde r corre c t and

    inco rre c t spe c i f ication of the cond i ti onal mode l. Robus tne s s o f

    condi tional inf e rence s and e s timati on with re spe c t to m i ss pe c i f i c ation

    of the mode l for the cond i ti oning vari ab le s i s emphas i zed. Condi ti ons

    for asympto ti c e f fi c i ency of CMLE's are obtaine d , and spe c i f ic ation

    te sts i l! Hausman (1978) and Whi te (1982) are de ri ve d . Examples are

    also g i ven to i llus trate the use of CMLE's p rope rti e s . The se e xamp le s

    inc lude the s imple line ar mode l, the multinom i al logi t mode l, the

    s imple Tob i t mode l, and the multi vari ate log i t mode l.

  • 1 . Introduc tion

    MISSPECIFICATION AND CONDITIONAL

    MAXIMUM L I KELIHOOD ESTIMATION•

    Quang H, Vuong

    In most applied work, e s timation and inference are conducted

    conditional upon the ob se rved value s of some e xp lanato ry variab le s

    e ven though m o s t d ata in the soc ial s c ience s are no t outcome s o f

    we ll-defined e xpe riments . This resu lts in part b e c ause soc ial

    s c ientists are o ften intere s ted in e s tim ating s o -c alled s truc tural o r

    b e havio ral relations hips b e tween e ndogenous variab le s o n the one hand

    and e xogenous variab le s on the o the r hand.

    The pre s e nt pape r firs t p ro vide s a jus tific ation for

    cond itional e s timation and infe rence by s tudy ing the p rope rtie s o f

    Cond itional Maximum-Like lihood E s timato rs (CMLE). Spec ific ally , we

    gene ralize Ande rsen's (1970) re sults by de riving the asymptotic

    p rope rtie s of C!.ILE's unde r co rre c t and incorrect spe c ific ation of the

    cond itional model. This is done by fo llowing the line s of W hite 's

    (1982) impo rtant paper. I t is then obs e rved that the prope rtie s o f

    CMLE's and the infe rence s b ased on O.ILE's are robus t with re spe c t to

    m is spec ific ation o f the mode l for the cond itioning variab le s .

    Cond itional maximum -like lihood e s tim ation m ay , howe ver,

    e ntail a los s of efficiency e spe c ially when the mode l for the

    cond itioning variab le s contains inform ation on the e s timated

    p arame ters . We then c harac te rize cond itions unde r whic h OIL

    e s timato rs are asymptotically as e ffic ient as FIML e s timators . These

    cond itions are ac tually we ake r than the cond ition that the

    cond itioning variab le s be we akly e xogenous in the sense of Engle ,

    Hend ry , and R ic hard (1983) ,

    2

    Finally , s ince it is o ften of great inte re st to know whe the r

    the cond itional mode l is co rrec tly spec ified, we also discuss

    spec ific ation te s ts within the pre se nt framework. I t is argue d that

    CMLE's c an re adily be used to construc t spe c ific ation te sts A.!.!.

    Hausman (1978) and W hite (1982) . The e s sential re ason come s from the

    fact that one c an o ften c hoose o r cons truct variable s to cond iton upon

    so that the re sulting cond itional like lihood contains the paramete rs

    of inte re st.

    The p ape r is organized as fo llows . Sec tion 2 presents our

    assumptions on the s truc ture gene rating the d ata and on the c hose n

    conditional mode l. Sec tion 3 s tudies the asymptotic p rope rtie s o f

    CMLE's und e r co rrect o r incorre c t spe c ific ation of the cond itional

    mode l. Sec tion 4 de rive s ne ce s s ary and suffic ient cond itions for

    asymptotic e ffic iency of CMLE's . Sec tion 5 use s CMLE's to cons truct

    Hausman -W hite type tests for m isspec ific ation. Particular c are is

    g iven to the formulation of the null and alternative hypothe se s for

    e ac h te st. Section 6 p re se nts some applic ations o f the prope rtie s o f

    CMLE's . T he firs t three e xamples are the s imple line ar mode l, the

    multinom ial log it mode l, and the s imple Tobit model, The fourth

    e xample con s ide rs the c ase in whic h the re e xists a sufficient

    statistic for some paramete rs . This is then illus trate d by the

  • 3

    mul tivariate l ogit mode l . Section 7 summarizes our resul t s, and an

    appendix collects the proof s .

    2 . Notations and Assumptions1

    Let Xt be an m X 1 observed real random vector defined on a

    Eucl idean measurabl e space ( X, ax' �x ) . For instance X, ax, and �x can re spectively be :Rm, the Borel a-al gebra, and the usual Lebe sgue

    measure . The proce s s generating the observations Xt, t=l,2, • • •

    satisfies the following a s sumption.

    ASSUMPTION Al: The random vectors Xt, t=l,2, • • • are independent and

    identical ly distributed with common ( true ) cumulative distribution

    function H0 on ( X, a , � ) . x x

    The vector Xt is partitioned into Xt = (Yt,Zt) ' where Yt and

    Zt are re spe ctively p and q dimensional vector s with m = p + q. Let

    (Y, a , � ) and ( Z, a , � ) be the Eucl idean measurable space s y y z z a ssociated with Yt and Zt .

    For any t, let F� l z < . l . l be the true but unknown conditional

    distribution of yt given zt . 2 We are interested in e stimating

    F� l z . To do so, we choose a (parametric ) family of conditional

    distribution functions FY l zC . l . ; a ) , where a belongs to a subset A of :Rk. Such a family may or may not contain the true conditional

    distribution F� l z< . I .) • It is, however, chosen so a s to satisfy the

    following regul arity conditions .

    ASSUMPTION A2: ( a ) For every a in A, a compact subset of ll k, and for

    4

    ( H0-almost ) a l l z, the conditional distribution FC . I z; a) has a

    ay-measurable density f( . I z; a ) = dFy l z< . I z; a l /d�y· ( b ) For

    ( H0-almost ) a l l ( y,z) , f (y l z; . ) is continuous and s trictly positive on

    A.

    Assumption A2 ensures that we can define ( almost surely) the

    conditional l og-likelihood function :

    Lc (Y I Z; a) n n [: log f (Yt l zt ; a) . t=l ( 2 .1 )

    A Conditional Maximum-Likel ihood Estimator (CMLE) is a an-mea surabl e x A function an of C X1, . • • ,Xn) such that :

    LCCY I Z;� ) n n sup Lc (Y I Z;a ) . 3

    aSA n ( 2 .2 )

    As stated below, Assumptions Al-A2 ensure the existence of a A CMLE, an' for every n. To e stablish the strong consistency of a

    sequence of CMLE' s, the next a s sumption is made .

    ASSUMPTION A3: ( a) For ( H0-almost ) al 1 ( y, z) ,

    l l og f ( y l z ; al l i M1( y,z) for all a in A where lli(.,.) is H0-integrabl e . ( b ) The function zf( aj = J log f ( y l z; a) dH0 ( y,z) has a

    unique maximum on A at a• ( say) .

    Part ( a ) of as sumption A3 ensures that zf( a ) is well-defined

    for any a in A, whil e part (b ) requires global asymptotic

    identifiability of a• ( see e . g . Bowden ( 1973) , Rothenberg ( 197 1 ) , and

    White ( 1982 ) ) .

  • Final ly, to derive the asymptotic distribution of a CMLE,

    additional assumptions are made on the conditional density f(y(z; a ) .

    ASSUMPI'ION A4: ( a ) For ( H0-almost ) a l l ( y,z) , log f ( y(z; .) is twice

    continuously differentiabl e on A. ( b ) For ( H0-almost ) a l l ( y,z) ,

    l a2 log f ( y(z; a ) /aaaa• I i M2 ( y,z) for a l l a in A where�( .,.) is

    n°-integrable. ( c ) For ( H0-almost ) a l l ( y,z) ,

    s

    (alog f ( y(z; a ) /aa . a log f ( y(z; a ) /aa• I i M3( y,z) for al l a in A where

    M3( .,.) is H0-inte grable.

    Part s ( a ) and (b) imply that z f ( .) is twice continuously

    differentiabl e on A and that we can reverse the order of

    differentiation and integration when computing the first and second

    partial derivative s of z f( .) . Given Parts ( b ) and ( c ) , Jennrich' s

    uniform strong Law of Large Numbers ( 1969, Theorem 2, p. 636) applie s

    to:

    and

    f 1 n A ( a ) = - L n n t=l

    f 1 n B ( a ) = - L n n t=l

    2 il log f (Yt(Zt ;a ) ilaila' ( 2.3)

    il log f (Yt(Zt;a) il log f (Yt (Zt;a ) ila . a a• ( 2.4)

    This ensures that Ar(: ) and Bf (: ) are strongly consistent estimator s n n n n of

    Af ( a*) 0 [a2

    Eo. log f (Yt(Zt; a•) ]

    ilaila' ' ( 2.5 )

    6

    and

    Bf( a*) 0 E0(a log f (Yt l zt : a•)

    aa a log f (Yt(Zt ;a*)]

    • aa• • ( 2.6)

    where E0[.] is the expectation with respect to the true c.d.f. H0 ( .) .

    ASSUMPTION AS: ( a ) a• is an interior point of A. ( b ) a• is a

    regul ar point of Af( a) . 0

    As is wel l known, Part ( a ) ensures that a zf/aa is nul l at a•.

    As in White ' s Theorem 3.1 ( 1982, p. 6 ) , Part (b ) toge ther with

    Assumption A3-b implies that Af ( a• ) is nonsingul ar. 0

    3. Asymptotic Propertie s of Conditional Maximum-Likel ihood Estimators

    Given the previous assumptions, the asymptotic propertie s of a

    sequence of CMLE' s can readily be derived by standard technique s based

    on l emmas given by LeCam ( 1953) and Jennrich ( 1969) . Al ternatively,

    these propertie s can be obtained from White's ( 1982 ) re sul ts by noting

    that a CMLE can be thought of as a Qua si Maximum-Likelihood e stimator

    (QMLE) .

    Let G be a po stul ated distribution for Zt, and let HG be the

    family of joint distributions H6( .,.;a) for Xt = (Yi,Zf) ' defined as:

    G G G I H = [H ( .,.;a) : H ( .,.;a) = FY(Z( . .;a) G( .), aS A}.

    Suppose that G has a oz-measurable density g ( , ) = dG/d�z which is

    ( H0-almo st surely) strictly po sitive. Then given A2, for any a in A,

  • iPc.,.;a ) has a a -measurable density, h6( .,.;a ) , which is ( H0-almost x surely) strictly po sitive. A QMLE, :

  • 9

    in A. The next result follows from Jensen's inequality ( Rao (1973, p.

    58)) applied to the conditional densities f ( ylz;a) and f (ylz;a0 ) ,

    LEMMA 2 : Given Assumptions Al-A3 , if F�lz< . I . > = Fylz< .l .;a0 ) , then

    a• = a0•

    Equality between Af ( a0 ) and -Bf ( a0 ) is obtained under the next 0 0 weak additional assumption which is similar to that used by e . g . ,

    Silvey ( 1959 , Assumption 13, p. 394) .

    ASSUMPTION A6: For ( H0-almost ) all z , Ja2fC ylz;a• > /aaaa• d�Y = o .

    It is then straightforward to prove the next result which is

    similar to the usual information matrix equivalence (see e . g . , White

    (1982, Theorem 3 .3)).

    LEMMA 3 : Given Assumptions Al-A4 and A6 , if F�lz< . I . >

    FYlz< .l . ;a0 ) , then for H0-almost all z :

    Eo [a 2 log f (Ytlz;a0 > ] = _ 0 [a log f (Ytlz;a0 ) Y I Z aaaa. EY 1z aa

    il log f ( Ytlz;a0 ) ] • aa•

    where �lz[,) is the expectation with respect to the true conditional

    distribution of Yt given Z=z .

    By taking the total expe ctation of both side s o f the previous

    equation with respect to the true distribution of Z, it follows that

    under the assumptions of Lemma 3:

    10

    Af( a• ) = -Bf ( a• ) , 0 0 ( 3 .3)

    i. e. , the usual information matrix equivalence,

    The asymptotic properties of a sequence of CMLE's, when the

    conditional model for Yt given Zt is correctly specified, simply

    follow from Theorem 1 and Lemmas 2 and 3,

    111EOREM 2 (Asymptotic Propertie s of CMLE's under correct specification A of the conditional model) : Let {an} be a sequence of CMLE's . If

    F�lz< . I . > = FYlz< .l . ,a0 ) , then :

    (a ) A Given Assumptions Al-A3 , an a . s .

    � ao

    ( b ) Given Assumptions Al-A4 ,

    ( c )

    f A a . s . f f A a . s . f An(a ) � A ( a0 ) , B ( a ) � B ( a0 ) , n o n n o A D Given Assumptions Al-A6 , Jn(a - a0) � N(O, C(a0) ) where n

    C (a0 > = - (A!( a0 > ]-l (B!( ao ) ]-1.

    Theorem 2 is basically Andersen ' s (1970) re sult in a different

    framework ( see Example 4 below, and footnote 14) . Theorem 2

    emphasiz es , however, the robustne ss of a CMLE with respect to po ssible

    misspecification of the distribution of the conditioning variable s Zt,

    Specifically , suppose that one specifies a joint parametric model for

    (Yt, Zt) ' i. e . , choose s some parametric family of joint distributions

    {H( . , . ;&) ; &E9} where 9 is some parameter space . Then, the associated

    family of conditional distributions for Yt given Zt is nece s sarily

  • 11

    parameterized by some parameter a in some parameter space A so that it

    can be written as {FYlzC .l.,a ) ; aEAJ. For instance, a may be 9

    itself, a subvector of 9, or more generally a function of 9. If the

    conditional model for Yt given Zt is correctly specified and if

    Assumptions Al-A6 are satisfied, then the aforementioned propertie s of

    a CMLE and the clas sical inference s based on Wald, Lagrange

    J\lul ti plier, and Log-Likelihood Ratio statistics are val id even though

    the induced marginal model for Zt may be incorrectly spe cified., i.e.,

    even though the true marginal distribution G0( .) of Zt may not belong

    to the family {G( .;9) ; 9S&} where G( .;9) is the marginal distribution

    of Zt derived from H( .,.;9) .4 In particular, strong consistency of a

    CMLE to a0 is robust with re spect to misspe cification of the marginal

    model for zt.

    4. Asymptotic Efficiency of Conditional Maximum-Likelihood Estimator s

    Up to now, nothing has been said about asymptotic e fficiency

    of CMLE's. This is so because Assumptions A2-A6 do not require that a

    probability model for the conditioning variable s Zt be specified. If,

    however, one specifies a joint probability model for (Yt,Zt)

    parameteriz ed by 9 in&, as above, then under suitable regularity

    conditions, one can define the information matrix ( for one

    observation ) as usual by :

    I (9) 0[B log h( Yt'Zt;9) ] Var Be ( 4.1)

    where "var o,, means that the variance-covariance matrix is computed

    with respect to the true distribution H0 of (Yt,Zt) . Given the

    12

    previous as sumptions and some usual regularity conditions on the joint

    density h( .,.;.) , it follows from the asymptotic efficiency of FIML

    when H0 (Y,Z) = H(Y,Z;9°) for some 9° ( see e.g., Rao (1963) ) that

    CMLE's are not in general asymptotically ef ficient estimators of a0 in

    the sense that:

    C Ca0 l l � I [ic0°> ]-l 9.sL j . B9 9o B9 9o ( 4.2 ) This is so because the marginal probability model for the conditioning

    variable s Zt may contain unused information on a0 It follows that

    CMLE's are in general asymptotically inef ficient estimators of a0 even

    when the conditional model for Yt given Zt is correctly specified. In

    some sense, we have traded off efficiency for some robustne s s by using

    CML estimation instead of FIML estimation. In this section, we shall

    recover the aforementioned result by embedding the issue of efficiency

    of CMLE's in a more general framework. In addition we shall

    characterize the conditions under which CMLE's are efficient.

    Let { CY1t) ,(Y2tl} be a partition of the set of variables (Yt) .

    I.et p . b e the number of variable s in Y .t where p . 2 1 for i=l,2.5 1 1 1

    Thus pl + p2 = p. We shall consider the conditional model for Ylt given CY2t,Zt) induced by the conditional model for Yt given Zt.

    Given Assumption A2, the conditional density of Ylt given (Y2t,Zt) is k

    parameterized by some parameters a1 in Jl 1

    that are functions of a,

    i.e., a1 = a1C a ) . Let J (a ) be the Jacobian at a, if it exists, of the

    transformation a1( .), i.e. J ( a) = [Ba1/Ba'].

  • 13

    ASSUMPI'ION A7: (a ) The function a1( . ) is continuously differentiable .

    (b ) For any a in A, the Jacobian J ( a) has full row rank.

    Let A1 = a1(A ) . Given Assumption A7-(a) , � is a compact kl subset of 1l Assumption A7-(b ) implie s in particular that k1 i k .

    Let Assumptions A2'-A6' be the assumptions on the conditional

    model for Y1t given (Y2t,Zt) that correspond to the previous

    Assumptions Al-A6 . For instance, Assumption A3'- ( b ) states that the f

    function z 1 C a1 ) defined as flog f1 ( y1 l y2,z;a1 ) dH0 ( y,z) , where f1 C . l . ,. ; a1 ) denote s the conditional density of Y1t given (Y2t,Zt) for

    a1 in�· has a unique maximum ai on A1•

    It is important to note that in general ai is not equal to

    a1C a• ) where a• maximizes the function z f ( a ) ( see Assumption A3) .

    This remark will be used in the following section. On the other hand,

    if the conditional model for Yt given Zt is correctly specified, i. e .

    F� l z < . I . ) = F� tz< . l . ;a0) for some a0 in A, then the conditional model for Y1t given (Y2t,Zt) is nece s sarily correctly specified, i. e . ,

    o I o I o o . o FY I Y z< · ., . ) = FY IY z< • . , . ;a1 ) for some a1 1n AJ:· Moreover, we 1 2 1 2 must have :

    0 0 a1 = a1 ( a ) .

    A A Let a1n be the estimator defined by a1n A A

    ( 4 .3)

    a1 Ca0 ) where an is

    the CMLE obtained by estimating the conditional model for (Ylt' YZt)

    given Zt. Then, let a1n be the CMLE obtained by estimating the

    conditional model for Ylt given CY2t,Zt) . We shall first study the

    14

    properties of these two e stimators under general conditions . fl fl fl fl Let A0 ( . ) , B0

    ( . ) , An ( . ) , and Bn ( . ) be analogous to the

    corresponding matrice s for f ( y l z; a ) defined in Section 2 . Let

    f fl • • _

    0(alog f (Yt l zt;a•) a log f1 CY1t lY2t,Zt; ai) ] B ( a , al) - E a • a , , o a a1

    and

    f f1 l n alog f (Yt l zt;a) alog f1 CYlt lY2t'zt' al) B ( a' al) = - L a • a ' n n t=l a a1 The existence of

    ffl B0

    ( a•,at> and the strong convergence of

    ( 4.4)

    ( 4 . s )

    f fl A -Bn ( an,aln ) to ffl B0 ( a*,ai> are ensured by the following as sumption.

    ASSUMPTION AS: For ( H0-almost) all ( y,z) ,

    l alog f ( y l z; a) /aa . alog f1 C y1 ly2,z;a1> /aai l i M4( y,z) for all ( a,a1)

    in AX A1 where M4( . ,. ) is H0 integrable .

    The following theorem give s the joint asymptotic distribution A -of ( aln' aln) even when the conditional model for (Ylt' Yzt> given Zt is

    mis specified. Let

    where

    [c11 ( a• ) C ( a*,ai ) = C2l( a*,ai)

    c12( a*,ai) l c22 < at>

    ell ( a•) J (a• > [A!< a• > ]-1B! < a• > [A!< a• >]-11• ( a•) ,

    ( 4 .6 )

  • C12( a•,at > f 1 f fl fl

    [ l-1

    c21 < a•,at > = J{a*) [A0( a*) ]- B0 ( a*,at > A0 ( at> ,

    c22 < at> [ f l-l f [ f l-l Aol < at> Bol < at> Aol < at>

    A - A -

    15

    Let Cn (an,al n) be the sampl e ana l og of C (a*,at > evalua ted a t (an,al n) .

    111EOREM 3 (Joint Asymptotic Distribution of CMLE's) : Given Assumptions

    Al-AS , A2 '-AS', A7, and AS: A -( a ) For any n, the estimators (al n' al n) a lmost surely exist,

    ( b )

    ( c )

    ( d )

    A. _ a.s. ( al n' al n) � ( al ( a*) ,at > ' al n - al ( a*) D

    [A l Jn - � N(O,C (a*,at» , a - a* ln 1 A -Cn (an,aln)

    a.s. � C ( a*,at> ·

    A We now study the propertie s of the estimators a1n and a1n under correct specification of the conditional model for (Yl t' YZt )

    given Zt. As noted earlier, when the conditional model for (Yl t' YZt)

    given Zt is correctly spe cified, Equa tion ( 4.3) holds. It fol lows A - o that a1n and °in are both consistent estimator s of a1• The next

    A A A theorem states that the estimator a1n defined by a1C an) where an is

    the CMLE obtained by estimating the conditiona l mode l for (Yl t' YZt) -

    given Zt is at least a s efficient a s the CMLE a1n obtained by

    estimating the conditional mode l for Yl t given CY2t,Zt) . This is A A expe cted since the CMLE an and hence the estimator a1n are

    16

    asymptotica l ly efficient e stimator s of a0 and a� respectively when the

    conditional model for CY1t,Y2t > given Zt is correctly specified. The

    import of the theorem is that it a l so characterizes the case s for A which the CMLE a1n is as e fficient as a1n and therefore asymptotical ly

    efficient.

    We need some additional definitions and a lemma. Let, when it

    exist s,

    f B 2 ( ao ) 0

    [a log Eo.

    f2 (Y2t 1 Zt ; ao )

    aa a log f2 CY2t lzt ;a

    0) ] • aa• ( 4.7)

    where f2 ( . l .;a) is the conditional density of Y2t given Zt derived

    f rom the conditional density f ( .,. l .; a) of CY1t,Yzt l given Zt.

    LEMMA 4 : Given Assumptions Al-AS, A2'-A5', and A7, if

    F�rz< . I .> = FY lzC . l .; a0> . then :

    Bf (ao ) 0 f f

    J'( ao ) B l ( aol )J( ao ) + B 2 ( ao ) . 0 0

    Where each of the above matrice s is finite.

    On the other hand, from the rank factorization of the k1 X k

    matrix J ( a0) ( see Rao ( 1973, p. 19) ) we have:

    J{ao) MLN ( 4.8 )

    where M is a k1 X k1 non-singul ar matrix, N is a k X k orthogonal

    matrix, and L is a k1 X kl matrix of which al l the element s are nul l

  • 17

    except the f irst r di agonal element s which are al l equal to one, where

    r = rank J ( a0) . From Assumption A7, i t fol lows that r = k1 � k.

    Therefore :

    f Then, when k1 < k, we partition the k X k matrix NB0

    2 C a0) N' as

    L = [Ikl o]

    fol low s :

    f

    ( 4.9)

    NB 2ca0) N' 0

    [z11 z21

    Z12] z22

    ( 4.10)

    where zl l is a kl x kl matrix.

    lllEOREM 4 (Asymptotic Eff iciency of CMLE' s ) : G iven Assumptions Al-A6,

    A2'-A6 ', and A7, if F� l z< . I .> = Fy l z< . l .;a0) , then :

    0 0 c11 < 0 ) � c22 < a1 >

    where the equal ity holds if and only if : f

    ( i ) NB 2 C a0) N' = 0 when kl = k, 0 ( ii ) z11 - Z12 z;-� z21 = o when kl ( k.

    It i s easy to see that the inequal ity ( 4.2 ) discus sed a t the

    out set of thi s section i s a special case of the above general result.

    Indeed, it suffices to let in Theorem 4 , Z be the empty set, and the

    variables CY1,Y2) and the parameters ( a1,a ) be respectively the

    variables (Y, Z) and the parameters ( a,G) in the inequal ity ( 4.2 ) .

    18

    As another special case of Theorem 4, let us consider the case

    in which the variables CY2t,Zt) are weakly exogenous for a1 in the

    sense of Engle, Hendry, and Richard ( 1983) . Let a = C a1,a2 ) , and

    suppo se that A = � X Ai• and

    f2 C y2 l z; a) = f2 C y2 l z; a2> . ( 4.11 )

    i.e., that the (condi tional ) density of Y2t depends only on a2 Since,

    by def init ion of a1, the conditional density of Y1t given CY2t,Zt)

    depends only on a1, then ( a1,a2) operates a sequent ial cut. Let k2 be

    the number of parameters in a2, where k2 2 1. f

    It readily fol lows that the k X k matrix B 2 is of the form : 0

    f B 2 ( ao ) 0

    0

    0

    0

    _f2 0 Bo ( a2 )

    _f2 where B0 ( a�) is the k2 X k2 matrix def ined a s :

    Since

    _f2 0 Bo ( a2 ) r la log

    Eo . f2 CY2tlzt ; a�)

    aa2

    0 l a log f2 CY2tlzt;a2 > ! • • aa:i

    J (ao ) = [Ikl 0 J

    ( 4.12 )

    ( 4.13)

    it fol lows that M = Ik 1 and N Ik.

    f2 Therefore NB N' 0 f2 B It i s 0

  • 19

    -1 now e asy to see that Z11-z12 Z22 Z2l O. Thus from Theorem 4, the

    CMLE a1n is asymptotical ly efficient. As expected, when (Y2t,Zt ) are

    strictly exogenous for a1, no los s in efficiency is achieved by

    maximiz ing the condi tional l ikel ihood funct ion for Y1 g iven (Y2,Z) .

    As indicated by Theorem 4, the efficiency of CMLE's can arise,

    however, in other situations.

    5. Specification Te sts

    G iven the previous properties of CMLE's, it i s of interest to

    know whether the chosen ( or induced) conditional mode l for Yt given Zt i s correctly specified, i.e., whether F� l z < . I .> = Fy l z( .,.;a

    0) for

    some a0 in A. From Lemma 1 , it fol lows that to test such a

    hypothe s i s, we can apply the Information Matrix test propo sed by White

    ( 1982 , Theorems 4.1 ) . Thi s test i s based on the nul l ity of

    Af ( a• ) + Bf( a• ) which holds when the conditional mode l is correctly 0 0 spec i f ied ( see Equation (3.3) and Lemma 2 ) . Then the appropri ate

    assumptions and the appropri ate statistic are obtained by repl acing

    the joint density h( ,,,;&) by the conditional densi ty f( . l .;a) in

    Whi te's Assumpt ions A.8-A.10 and in White's statistic ( 4.1 ) .

    In this section, we shal l argue that CML estimation i s a

    convenient tool for carrying out the te sts for parameter est imator

    consi stency propo sed by Hausman ( 1978) and White ( 1982 , Sect ion 5) .

    The e s sential reason comes from the fact that we can choose or

    construct some appropriate variabl es to condition upon so that the

    parameters of interest appe ar in the re sul ting conditional l ike l ihood.

    20

    We shal l use the general framework introduced in Sect ion 4.

    The f irst specif ication test i s based on the equation :

    at = a1( a• ) ( S .l )

    which holds when the conditional model for (Ylt'Y2t ) given Zt i s A correctly specified ( see Equation (4.3)). From Sect ion 4 , a1n as

    A - o def ined by a1( an) ' and «in are both consi stent estimators of at = a1 A under correct specif ication. Moreover a1n is asymptotically

    A efficient. Thus, fol lowing Hausman ( 1978) , the difference a1n - a1n can be used t o construct a test of equation ( S.1 ) .6

    Let

    v c22 < at > + c11< a• ) - c12< a•,at> - Cz1 < a•,at>

    where the matrices on the right-hand s ide are def ined in Equation A A

    ( 4.6 ) . Let Vn be the sampl e analog of V evaluated a t (an,aln) .

    ( s .2)

    Contrary to White's Assumpt ion A.12, the k1 X k1 matr ices V and Vn may

    turn out to be singul ar. Let r and rn be their respective rank.

    When 1 � r < k1 and 1 � rn < k1, we use general ized inverse s

    of maximum rank, i.e., of rank k1 of V and Vn. Specif ical ly, let R be

    a k1 X (k1 - r) matrix such that the k1 X ( 2k1 - r) parti tioned matrix

    [V, R] is of rank k1• The k1 X (k1 - rn) matrix Rn i s simil arly

    def ined with respe ct to Vn. Then, from Rao and Mitra ( 1971, Sect ion 2.7) it fol lows that the matr ice s V +RR' and V +R R' are non-n n n singular and that [V + RR']-l and [V + R R']-l are general ized n n n inverse s of rank k1 of V and Vn respectively. The matrices R and Rn

  • are not, however, unique. From now on, it i s assumed that R and Rn are unique ly def ined by the same continuous selection rule. Thi s

    impl ies in parti cul ar that Vn + RnR� i s a continuous function of Vn.

    Thi s ensures that Vn + RnR� is a strongly cons i stent estimator of

    V + RR' whenever Vn i s a strongly cons i stent e st imator of V.

    Let

    21

    H n A -1 - A n(aln - aln) '[Vn+ RnR�] ( al n - aln) • (5 .3)

    Let H0 be the hypothe s i s that at a1C a• ) , and l e t H1 be i t s

    compl ement.

    lHEOREM 5 ( Hausman Te st ) : Suppose that Assumptions Al-AS, A2'-AS', A7

    and AS hold. Suppo se that V I O. D

    ( a ) under H , H � x2 , o n r a. s.

    ( b ) under H1 , Hn � ...

    Then :

    Thus, if H exceeds the critica l value for the "!?- di stribution n r at a given s ignif icance l evel, one must reject the hypothes i s that

    at:a1C a• ) and hence that the conditional mode l for (Yl t' Yzt l g iven Zt i s correctly specif ied. Part (b ) of Theorem 5 states that th is t e st

    is actually consi stent.

    It is al so important to note that the test is val id only when - - A

    V I O. Second, the asymptotic coveriance matrix of ./n ( al n - a1n l is

    not simply the difference be tween the asymptotic covariance matrices A

    c22 • and

    for H0-almost a l l ( y2, z)

    E; IY z[a log fl (Ylt,Y2•z;at > /aa1l 1 2

    then under H : 0

    O,

    22

    v [a:1 < at> l-l - J (a• > [a! < a• >]-1J•( a• ) . (5 .4) Condition ( i) is just the information matrix equival ence at a•

    and at for the conditional densities f and f1 • Condition ( ii ) i s f

    stronger than the requi rement that at maximize z 1 ( .) which i s

    E0[log f1 C Y1t lY2t,Zt; . )]. I t i s, however, worth noting tha t condition

    ( ii) is automatical ly sati sf ied when the conditional mode l for Yl t given CY2t , Zt ) is correct ly specif ied. (This fol lows from Lemma A2 in

    the Appendix.) Moreover, as emphasized by Lemma 5, Equation (5.4)

    holds only under H0• In other words, Hausman's wel l-known formula

    (5 .4) e sentially holds under H0 and under correct specifica tion of the

    conditional model for Y1t given CY2t,Zt) .

    Final ly, let us note from Theorem 4 that, under correct

    spedi f ication of the condi tional mode l for (Ylt'Y2t ) given Zt the

    condition that V be non-zero is simply equival ent to the condition

  • tha t a1n be an ine fficient e st imator of at·

    Our second specification test i s based on the equation:

    E0[a log f1 (Y1t l Y2t,Zt ; a1 < a• ) ) /aa11 = o .

    23

    ( s .s )

    Indeed thi s must hold when the conditional mode l for (Ylt' Y2t > given

    Zt is correctly specified s ince in such a case af = a1( a• ) ( see f

    Equat ion (4 .3)), and s ince by def inition at maximizes z 1( . ) , Thus

    fol lowing White (1982 ) , the appropri ate test statistic i s ba sed on c A I A A ( l/n) aLn(Y1 1Y2,Z;a1n> aa1 where a1n = a1( an) . Let :

    0 fl fl fl A0 ( a1( a*) ) C11< a• )A0 ( a2 ( a• ) ) + B0 ( a1( a• ) )

    fl f -1 ffl - A0 ( a1( a• ) ) J ( a• ) [A0( a• ) ] B0 ( a•,a1( a• ) )

    fl f f -1 fl - B0 ( a•,a1( a*) ) [A0( a• ) ] J'( a• )A0 ( a1( a• ) )

    fl a z ( al ( a• ) ) fl a z ( al ( a• ) )

    aa1 aai

    It turns out that 0 i s the a symptotic covariance matrix of - c A I ( l/,/nl aLn (Y1 1Y2 ,Z; a1nl aa1• Note that 0 depends only on a• .

    A.

    ( s .6)

    Let On be the sampl e analog of 0 evalua ted at an' i.e. , where ( say) the l ast

    term in (S.6) is replaced by :

    c A. .!. ilLn(Y1 1Y2 , Z; a1Pl n aal

    c A l aLn(Y1 1 Y2 , Z; alp) ;- aa' 1

    24

    Let s and sn be the respective rank of 0 and On. We consider

    again general ized inverse s of maximum rank. Moreover, as before, we

    use a continuous se l ect ion rule for choosing the k1 X (k1 - sn) matrix

    Qn where Qn is such that the k1 X ( 2k1 - sn) matrix [On,Qn] is of rank

    kl.

    G n

    Let

    c A C A 1 aL (Y1 1 Y2,Z;a1 ) -l aL (Y11Y2, Z; a1 ) _ n n [O + Q Q ' ] n p n aai n n n aa1 (S.7)

    Let H� be the hypothes i s that Equation (S .S) holds, and let Hi be its compl ement,

    1HEOREM 6 ( Gradient Te st ) : Suppo se that Assumptions Al-AS, A2'-AS',

    A7 and AS hold. Suppo se that 0 F O. Then:

    ( a )

    ( b )

    D under H� , Gn � x:.

    a. s. under Hi ' Gn � .. .

    Thus if G exceeds the critical value for the � distribution, n s one must reject the nul l hypothe s i s H� and hence that the conditional model for (Ylt' Y2t> given Zt is correctly specif ied. Moreover the

    gradient test i s consi stent.

    Though the statistic (S .7) is simil ar in spirit to White ' s

    gradi ent test , it differs from it in the choice of the covariance

    matrix e st imator. To simpl ify the di scus sion, let us restrict

    ourselves to the case studied by White in which the matrices 0 and On

  • fl fl ,.. are non-singular. Suppose in addition that A0 ( a1( a• ) ) and An ( a1n)

    25

    are non-singular. Then the covariance matrix e stimator used by White

    (1982 , Equation ( 5 .2 ) ) is :

    0 n

    where

    ,.. ,.. V (a a1 ) n n,. n

    fl A. A. A fl A. An ( aln) Vn ( an, aln) An ( al n)

    ,.. ,.. c22n + c11 n aai

    [ -11 c I "" -1 - 1 aL (Yl Y2,Z.al ) 0 - 0 - n n ( 5 .10) n n n aa1 n n n

    Since 0-l - 0-l converge s almost surely to a positive semi-definite n n matrix, it fol l ows that our test based on Gn is asymptotical ly as

    powerful as White's test based on Gn for any al ternatives, and

    strictly more powerful for some alternative s .8

    Returning to the general case , it is worth noting that the

    number s of degrees of freedom of the asymptotic distributions of Hn and Gn are not equal . This actually resul t s from the fact that the

    two statistics Hn and Gn are not de signed to test the same hypothe sis.

    However, since H0 implies H�. one may use the gradient statistic to

    test H o"

    LEMMA 6 : Given the as sumptions of Theorem 5 , we have under H0 :

    (a)

    ( b )

    ( c )

    n fl fl A0 ( at> V A0 ( at> •

    r = s,.

    H - G n n a. s.

    � o.

    Lemma 6 generalizes White ' s (1982 ) Theorem 5.2 to the singular

    case r = s < k1 : Under H0, the Hausman test and the gradient test

    have the same number of degrees of freedom and are asymptotical ly

    equival ent. It is noteworthy that our resul t holds irrespe ctive of

    the choice of the generalized inverse s v- and 0-. However , whil e the n n Hausman t e st is consistent for any alternative H1 : aI - a1( a• ) = a I 0

  • 27

    ( se e part b of Theorem S ) , the gradient test may not have any power

    against some al terna tives a I O.

    6. Example s

    This sect ion present s some appl ications of CMLE's and their

    properties. The f irst example de al s with the simpl e l inear mode l.

    The other three exampl e s , which are the mul tinomial l ogit model , the

    Tobi t model , and the bivariate logit mode l , il lustrate different

    partitions of the parameter vector a. Specifical ly, these partitions

    are respectively :

    ( i) f(Yl t'y2t l zt; a) = fl (Yl t lY2t'Zt; a) • f2 C Y2t l Zt ; a ) ,

    ( ii ) f(Yl t , Y2t 1Zt;a1 ,a2)

    ( ii i) f(Ylt' Y2t 1 Zt; a1 , a2>

    f1 (Yl t IY2t' Zt; a1 ,a2) • f2 C Y2t l zt ;a2) ,

    f1 C Yl t lY2t' zt ; a1 ) • f2 C Y2t 1 Zt ; a1 , a2 ) .

    EXAMPLE 1 : Suppo se that one specifies the following s impl e l inear

    model for (Yt, Zt ) ' t = 1 , • •• , n :

    yt = P1 + P2 zt + ut 2 where E(ut ) = 0 and var(ut ) = o for every t, We shal l study the

    asymptotic properties of the OLS estimators p1 , p2 , and � where � is def ined as the sum of squared residua l s divided by n.

    A A A Az The OLS est imator s , a = Cp1 , p2 , o ) , can be interpreted as

    CMLE's, Indeed they clearly maximiz e the conditional log-like l ihood

    function L�(Y1 , • . • , Yn I z1 , • • • , Zn;a ) where a= Cp1 , p2 , o2> and

    2 21 2 log f (Yt I Zt ; a ) = -.S log 2n -.S log o -.S (Yt - p1 - p2 Zt ) o •

    That i s , the OLS estimators are identical to the CMLE's assoc iated

    with the family of conditional normal distributions for Yt given Zt:

    {N C P1 + p2 Zt ; o2) ; Cp1 , p2) 8JR

    2 , o2 > O}. Bence, the asymptotic

    properties of OLS fol low from Section 3 .

    Specif ical ly, l e t (µ0, µ0, o0 , o0 , o0 ) be the true means , y z zy zz yz variance s and covariance s of (Yt, Zt) . Let ur=Yt-Pt-PzZt where

    28

    a•= is def ined in Assumption A3-(b ) . Then i t can b e shown

    that a• solve s :

    Hence :

    p• 1

    E0(u*) = 0 t E0 = o

    o o o I o • o µy - µz 0yz 0zz ; P2 = 0yz A

    0 / 0zz

    Eo(u•2 ) t

    •2 o 0 o yy

    •2 o

    - oo2 / yz 0

    o • z z

    Then, the OLS estimator a almost surely converge s t o a• , whether or

    not the true conditional di stribution of Yt given Zt i s normal with a

    mean l inear in Zt and a variance independent of Zt. Moreover, its

    a symptotic di stribution i s given by Theorem 1 so that one can conduct

    inferences on a• through appropriately modif ied Wald or Lagrange

    Mul tipl ier statisti cs, On the other hand, if the true conditional

    distribut ion of Yt given Zt is normal with a mean l ine ar in Zt and a

    variance independent of Zt , . o o o2 ) 1.e., N(p1 + p2 Zt; o for some

    a0 = (p�, p�, 0°2) , then the OLS est imator : cons i stently estimates a0, as expe cted.

    As ment ioned in Sect ion 3, these properties depend ne ither on

  • 29

    the nature of the true distribution of Zt nor on the choice of the

    marginal probability mode l for Zt . Moreover, the se properties are

    obtained whether or not Zt is strictly or weakly exogenous for a ( see

    Engl e, Hendry, and Richard ( 1983) for definitions ) . 9•10

    Final ly, one can test whether the true conditional

    distribution of Yt given Zt is normal with a mean l inear in Zt and a

    variance independent of Zt . This is carried out by using White ( 1982 )

    Information Matrix test a s indicated above . It can be shown that

    Af( a• ) + Bf ( a• ) = [d .. ] where : 0 0 1J

    di1 = 0 . • o •2 2 I •4 d22 = cov (ut ,Zt) a . • o •3 I •6 d23 = E (ut .Zt) 2a .

    0 •2 •4 df2 = cov (ut ,Zt ) /a ,

    • o •3 I •6 d13 = E (ut ) 2a dJ3 = [E

    o (u:4) -3( Eo (u:2) ) 2l /4a•8 .

    Thus the Information Matrix test is equivalent to testing di2= d22=

    di3= A. dijn

    d23= dJ3= O. To carry out the test, consistent sampl e analogs

    are used . For instance:

    A.

    A 1 nA-2 1 nA.2 1 n 4'14 d12n = c; ) utzt - < ;;- [ ut) . < ; [ Zt ) ] I a /;;;1 t=l t=l where the ut are the OLS residual s . It is worth noting that te sting

    di2= d22= 0 is equival ent to te sting that the squared OLS residual s

    are asymptotica l ly uncorrelated with the cro s s products of the

    expl anatory variabl es ( see a l so White ( 1980 ) ) . On the other hand,

    te sting di3= d�3= 0 is equivalent to te sting that the cubed OLS

    re sidual s are uncorrel ated with the expl ana tory variabl es . Final ly,

    while di3= 0 means that the distribution of u� is unskewed, the

    restriction dJ3= 0 corresponds to the condition that the kurtosis of

    u� be equal to 3 as required by the normal distribution ( see White

    ( 1982 ) ) .

    EXAMPLE 2 : Let us consider the Mul tinomial Logit (MNL) model ( se e

    30

    e . g . , McFadden ( 1974) , Nerlove and Pre ss ( 1973) ) for the random sampl e

    (Yt,Zt ) ' t=l,, • • ,n. Let Yt be discrete with I ca tegories . Then:

    log Pr(Yt=i) llt + vi_ta

    I where µt = - log�1exp ( vjta) , and where vit combine s characteristics of the al terna tive i and the individual characteristics zt •

    Let B be a proper subset of the initial choice se t . Define

    the statistic St= lB(Yt ) where lB( . ) is the indica tor function of B.

    Let:

    yit 1 if yt = i Bt B if st 1,

    0 otherwise Bc if st 0,

    where Bc is the compl ement of B.

    It is easy to show that the conditional log-likelihood of

    CY1, . . . ,Yn) given cs1 , • • • ,Sn,z1, . . . ,Zn) is :

    Lc (Y I S,Z; a) n n [ ) Y.t [v�ta - log ) (exp v�ta )] t=l 1m 1 1 ·m J t J t

    whil e the ( conditional ) log-likelihood of CS1 , • • • ,Sn) given

    cz1, . . • ,Zn) is :

  • where

    and bt

    n Lc ( S I Z; a ) n ): St log pt + ( 1-St ) log ( 1-pt ) Pl

    pt

    log ( } exp v'.ta Jn J

    c bt bt bt e / ( e + e )

    be t log( [ exp vjta ) . jl!Bc

    31

    From Section 3 , it fol lows that the maximiz ation of Lc (Y IS ; a) n with respect to the identified parameters a1 of a gives consistent

    estimates «in of these parameters .1 1 It is noteworthy that this

    conditional log-likel ihood function can be obtained by acting as if

    the individual s who chose an al terna tive in the restricted choice se t

    B ( or Be) had only B ( or Be respect ively) a s their initial choice se t .

    Thus a1n can readily be obtained from MNL package ( e . g . , QUAIL) that

    a l l ow s different choice sets for the individual s .

    Moreover, as indica ted in Section S we can construct

    spe cif ication tests for the MNL mode l based on the indicator s A c A A A A C a1 - a1 ) or aL CY I S , Z; a1 > /aa1 , where a1 = a1 ( a ) and a i s the ML n n n n n n n

    est imator on the complete choi ce se t . Though our test based on - A C al n - a1n> i s simil ar in spirit t o the specif ica tion test for the )JNL

    model propo sed by Hausman and McFadden ( 1981 ) , it di ffers from it for

    the reason that the stati stic used by these author s i s (a�n - �l n)

    where a�n is obtained by maximizing :

    L (a) n } } Yit[vj_ta - log } ( exp v'.ta )] . {t ; �=l } tea J� J t

    Hence Ln (a) negl ect s the information on the individua l s who have

    chosen an al ternative outside the restricted choice set B. 12 Since

    32

    our test s use a l l the information, they are l ikely to be more powerful

    than the Hausman-McFadden specif ication test .

    EXAMPLE 3 : Suppose that one considers the simple Tobit mode l (Tobin

    ( 1958) , Amemiya ( 1973) ) for the random sampl e (Yt' Zt ) , t=l , • • • , n,

    i.e . :

    yt Zill + ut 0

    if Zill + ut > 0

    otherwise

    where the ut ' s are N ( O , a2) and independent given the Zt ' s .

    Let St = 1 if Yt > 0 , and 0 otherwise . The ( condi tional )

    likel ihood function of ( Y1 , s1 , ... . Yn' Sn) given ] t[�] t

    n st X fi [d (Yt/a - Zty ) /�(Zty) ] • t=l

    where y = ll/a, and d(.) and')>( . ) are re spectively the density and

    c . d . f. of the standard Normal .

    Since the model for St given Zt is a dichotomous Probit mode l ,

    i t fol lows that the first product i n Lc is the conditional l ikel ihood n

  • function a s sociated with CS1 , , , , , Sn) given (Z1 , ••• , Zn

    ) , Henc e , the

    se cond product in Lc is simply the conditional like l ihood function n

    33

    a s sociated with CY1 , ••. , Yn) given (S1 , ••. , Sn

    , z1 , •• . , Zn) . It i s worth

    noting that this l atter l ikelihood function is the one a s sociated with

    a random sampl e drawn from a truncated normal distribution ( for the

    distinction between censored and trunca ted, see Maddal a (1983)) .13

    Hence maximizing this second product with re spe ct to a and y giv e s

    e stimate s a and y that have the prope rties o f CMLE' s . Thus f rom Section S specification t e st s for the Tobit mod e l

    - A - A can be constructed from e . g . , the indicator (y-y, a-a) where (a,y)

    A A and (a,y) are r e spe ctively the ML e stimators a s sociated with the truncated Tobit mode l and the simpl e Tobit mode l .

    Incide ntl y , l e t us note that the CMLE's (a,y) must satisfy the normal equation a s sociated with the partial de rivative with re spe ct to

    a. i. e . :

    � a N1 + a YiZ1y - YiYl = 0

    where N1 is the numbe r of ob servations such that Yt > 0 , Y1 is the N1 X 1 vector of such obs e rv ations on Y, and z1 is the corre sponding

    matrix of observations on the expl anatory variab l e s Z . Solving f o r a, the positive root of this normal equation, giv e s:

    a = I ]1/2 YiY1 1 YiZly 2 -- + - (--) N1 4 N1 .!. YiZly 2 N1

    34

    From Section 2 , a and y are strongly consistent e stimators of a0 and y0 (unde r correct specification of the conditional model for Y

    t

    given (St,Zt)) . If one has avail abl e another strongly consistent

    estimator of y0 such as the ML e stimator yp obtained by estimating the Probit mode l for St

    given Zt (the first stage in Heckman's (1976)

    procedure), then it fol lows that the e stimator ap obtained f rom the

    previous equation where y is repl aced by yp is a strongly consistent e stimator of a0• Then, one readily obtains strongly consistent e stimates o f p0 by using apyp. This procedure has the fol l owing advantage s: (i) it ensur e s that the e stimate ap is a lways strictly positiv e , (ii) it is e a sy to carry out since it requires only one

    e stimation, and (iii) it doe s not require the expensive computation of

    d(Ztyp) and •cztyp) a s in Heckman's se cond stage .

    F.XAMPLE 4: As in Ande rse n (1970), conditional ML estimation is

    particul arly use ful when there exist suf ficient statistics f or some

    parameters.14

    Specifical ly , suppo se that Y1 , ••• ,Yn are indepe ndent

    with a common distribution that is a s sum e d to be l ong to the family

    {Fy( . ;a); aSAJ . Suppo se that there exist (i) a partition Ca1,a2) of

    the parameter v e ctor a such that A= � X Ai• and (ii) a suf ficient statistic S

    t= S(Yt)

    for a2, for any a1 in � . 15• Let h( , , . ;a) be the joint density of (Y

    t , St). Then we have:

    h(Yt,

    St;a) f(Yt l st;al) . g(St;al,a2)

    where f( . l . ;a1) and g(. ;a1,a2) are re spe ctively the conditiona l

  • density of Yt given St and the marginal dens ity of St. It fol l ows

    that by maximiz ing the conditional l og-l ike l ihood function for A ( Y1, • • • ,Yn ) g iven ( S1, • .. ,Sn ) ' one obtains an e stimator «i that has

    the propertie s mentioned in Section 3 . In particular, the se

    35

    properti e s are robust with re spe ct to misspecification of the marginal

    mode l for st.

    An example of such a situation i s the estimation of a

    multivariate l ogit model ( see Nerl ove and Press (1973) , Amemiya

    (1978) ) .16 Let Ylt and Y2t be qual itative variables with I and J

    categorie s respe ctively. For any t = l, • • • ,n, i = 1, • • • ,1, and

    j = l ,,,.,J, one assumes that:

    where µt =

    log Pr(Ylt i, y2t = j) µt + vj_jta

    - log [t � exp ( vj_ .ta) ] and v i"t i s a vector of 1=1 J=l J J expl anatory vari ables. Let us partition the vector v .. t into � explanatory variables z ijt that vary across i, and explanatory

    variab l e s z jt that do not. Let a be partitioned accordingly into a1

    and a2, Then, the conditional probabil ity that Ylt = i given that

    Y2t = j sati sf ies:

    where µt =

    l og Pr (Ylt i lY2t

    - log [t exp ( zj_.tal ) ] 1=1 J j) µjt + zljtal

    Hence Y2t i s a sufficient

    stati stic for a2, and maximiz ing the conditional log-l ike l ihood A

    function for (Y11 • . • • ,Y1n) given (Y21, ••. ,Y2n) gives a OILE a1n. This

    estimator i s not in general efficient since the marginal probabil ity

    model for Y2t depends in general on a1 •17

    As in the previous examples, one can construct spe ci f ication A tests for the bivari ate logit mode l based on (a1n- a1n) or

    A aL�(Y1 1 Y2,Z;a1n> /aa1 where «in i s the CMLE for the conditional model A for Y1 given Y2, and a1n i s the ML e stimator for the bivariate logit

    mode l .

    Fina l ly let us note that one case has not been covered: the

    36

    case in which the variable Y2t i s, in addition to be ing suffic ient for

    a2, ancil lary for a1, i . e . such that the marginal dens ity f2 (.;a1,a2>

    is independent of a1 ( see Fi sher (1956) ) . From Part ( c ) of Theorem 2

    and the fact that in thi s case the information matrix becomes bl ock f

    diagonal with f irst block equal to B01C a1) , it fol lows that CML

    estimation of a1 is efficient, given of course correct spe cification

    of the conditional model for Ylt given Y2t. This is not surpri sing

    s ince Y2t i s then weakly exogenous for a1 ( see Engle, Hendry, and

    Richard (1983) ) .

    7 . Conclus ion

    In this paper we derived the asymptotic properties of CMLE's

    under correct or incorrect specif ication of the conditional mode l.

    CMLE's were found to be robust with respect to misspe cif ication of the

    mode l for the conditioning variables. Eff iciency of CMLE's a s wel l as

    tests for mis specification of the conditional mode l were al so

  • discus sed. It was argued that CML e stimation provide s a convenient

    37

    way to test for parameter estimator inconsistency. Some example s were

    given to il lustrate the range of application of the CMLE technique.

    Our resul ts should prove to be useful to social scientists who conduct

    estimation and inferences conditional upon the observed value s of some

    expl anatory variable s.

    PROOF OF LEJIMA 1 : Obvious •

    APPENDIX

    38

    PROOF OF THEOREM 1: The theorem fol lows from Lemma 1 and White (1982)

    Theorems 2 .1, 2 . 2. and 3 . 2 . Indeed it suffices to choose an arbitrary

    distribution for Zt with a strictly po sitive density. It is then easy

    to check that Assumptions Al-AS imply White ' s Assumptions Al-A6 on the

    resul ting family of joint distribution HG.

    PROOF OF LEMMA 2: Let

    w( z ; a) = fl og f( y l z; a )dF� l zC y l z)

    Q.E.D.

    where the right-hand side exist s by Assumptions Al -A3 for H0-almost

    a l l z. Since F� l z< . I .> = FY l zC .l .;a0 ) by assumption, it fol lows from Jensen's inequal ity ( see, e.g • • Rao (1973, p. 58)) that for H0-almost

    a l l z:

    w( z; a0 ) 2. w( z;a) for al l a in A.

    Since zf( a) = Jw( z;a)dG0( z) where G0 ( .) is the true distribution ot

    zt. it fol l ows by integration that :

    zf( a0 ) 2. zf( a) for all a in A.

    From the uniqueness of a• (As sumption A3-b ) , it fol lows that a•

    Q.E.D.

    0 a •

  • PROOF OF LEMMA 3 : W e have :

    a2 log f(ylz;a) ilaaa'

    __ a _ [af o f a log f(y z;a f ( y l z; a ) dy aaaa' - - f alog f(ylz;a0> aloa fCylz;a0) 0 aa' f ( y(z; a ) dy - aa

    where both side s exi st for H0-almost a l l z because of Assumpt ion A.4.

    Q.E.D.

    PROOF OF THEOREM 2 : Straightforward from Theorem 1 , Lemma 1, and

    Equation (3 .3) .

    To prove Theorem 3 , we use the following l emma.

    LEMMA Al : Given Assumptions Al-AS, A2'-AS' , and AS:

    ...1- aL:

    .Jn aa1

    [ f ffl D B0

    ( a• ) B0 ( a*,a!)

    � N{O, f f 1 fl B0 { a*,afl B0 C afl ) .

    Proof : The result follows from the multivariate version of the

    standard Central Limit Theorem appl ied to :

    aLcCY I Z;a•) 1 n a log f (Yt!Zt ;a*) n - ;; bi 1 I aa aa

    Jn aL:(Y1 1Y2 ,z;af> = ../n

    1 n a 1og fi ] = B0 < at> < "'·

    Moreover, from llquation ( 4.4) and Assumption AS, we have:

    0 cov (alog f (Yt(Zt; a• ) alog f1 CY1t lY2t,Zt; a!) ] f f1

    a • a I = B ( a•. a*1> < "'• a � o Q.E.D.

    PROOF OF THEOREM 3: Part {a ) s imply follows from Theorem 1-a. Part

    40

    (b ) follows f rom Theorem 1-b and Assumption A7-a. To prove Part (d ) ,

    we note that f rom Assumptions A4, A4', AS, and Jennerich's uniform

  • strong Law of Large Numbers ( 1969, Theorem 2 , p. 636) we have

    uniformly on A X A1 :

    Af(a ) n

    Bf ( a ) n

    a. s. Af( a) � 0

    a. s. Bf ( a) � 0

    ;

    ;

    f1 a.s. f1 An ( al ) � Ao ( al ) • f1 a.s. f1

    Bn ( al ) � Bo ( al ) • u1 a. s. f f1 Bn ( a ,a1) � B0 ( a ,a1) .

    41

    Given Equation ( 4 . 6 ) , Assumptions AS-b , AS ' -b , and A7 , it fol lows f rom

    the uniform convergence that :

    a. s. Cn ( an, aln ) � C (a• , at>

    for any estimator ( an, aln) that converge s almost surely to ( a* , at > ·

    Part (d ) fol lows.

    Moreover, by taking a Taylor expansion around a• and at " -re spectively, and us ing the def initions of an and a1n it fol lows that : Lc (Y ( Z; a• )

    Af ( • ) rc"'a - a• ) + o ( 1) • 1 n + a vn n P 0 = - ila o Fri

    0 1 ilL:(Yl ( Y2 , Z; at> + /1 < at>Jn + op ( l ) . - ilal o ../n

    Thus, given Assumpt ion AS and AS ' :

    �[�n - a* ] aln - at

    [[A !( a• ) ] -: 0 l 0 [Aol < at> ) -1 ...!.. aL:(Y ( Z;a• )

    Jn ila

    ...!.. aL;

    _..;-;. ila1

    + 0 ( 1 ) p

    [

    Using Lemma 1 , we obtain:

    a - a• .;;.

    _n �( o , [ > [" l

    where

    a - a• ln l

    42

    0 0 0 [[Af ( a• ) ] -lBf ( a•) [Af ( a• ) ] -l f f [Af ( a• ) J -1B 1 C a• , a•1) [Af ( a•1) J -l 0 0 0 f ff

    [Aol < at > J -lBo

    l ( a• , at > CA!< a• ) ) -1

    " -On the other hand, we have ( aln' aln)

    f f f [A 2 c a• ) J -1B

    1 C a• ) [A 1 C a• ) ] -l 0 1 0 1 0 1 " - " -( a1 C an) ,a1n) = h( an,aln ) so

    that f rom Assumption A7 , the Jacobi an of the transformation h ( . , .) at

    ( a• , at> i s :

    J ( a• ) = 0 [J { a• )

    < l From a wel l-known property of convergence in distribution, it fol lows

    that :

    [" l al - al ( a* ) D Jn _ n � N( O, J(a• ) [ J ' ( a• ) ) . aln - at Part ( c ) straightforwardly fol lows.

    Q . E . D .

    To prove Lemma 4, we use the following lemma.

  • LEMMA A2 : G iven Assumptions Al , A2 '-A5 ' , if F� I z< . I • • • ) 1 Y2 FY I Y z< . l • • • ; a�) then for H

    0-almost al l ( y2 , z> : 1 2

    Eo [a log f1 (Ylt ly2 , z ; a�)l

    Y1 1Y2z aa1 0

    43

    where By I Y Z is the expectation with re spe ct to the true conditional 1 2 distribut ion of Y1t given Y2 = y2 and Z = z .

    Proof : From Jensen ' s Inequal ity, we have for a l l a1 in Ai=

    J l og f1 C y1 1y2 , z ;a�) f1 C y1 1y2 , z; a�) dy1

    2 J log f1 C y1 1y2 , z; a1 > f1 C y1 1y2 , z ; a�) dy1

    Since a� = ai ( Lemma 2 ) , and s ince at belongs to the interior of Ai (Assumpt ion AS ) , it follows that , at a1 = a� :

    a!1 f l og f1 C y1 1y2 , z; a1 > f1 C y1 1y2 , z ; a�) dy1 = o

    when the l eft-hand s ide exists .

    ( • )

    On the other hand, o log f1 C y1 1 y2 , z ; a1 ) /aa1 is, from Assumption

    A5 '-c, domina ted by a funct ion M(y1 , y2 , z) which is u0-integrabl e . But

    JMC y1 , y2 , z) dH° Cy1 ,y2 , z) = f[fMCy1 ,y2 , z) dF�1 1 y2z< Y1 1y2 , z) ldH° C y2 , z) Hence , for H0-almost all ( y2 , z) , M (y1 , y2 , z) is integrable with respect

    to F� IY ZC . I . . . > = FY IY zC . 1 . . . ; a�> . 1 2 1 2 It follows that we can reverse

    the order of the derivation sign and the integration sign in ( • ) . The

    result now fol lows from the def inition of � I z• 1 y2 Q. E .D.

    It i s worth noting, by taking the total expectation of the

    44

    above equation with respe ct to the true distribution of CY2t , Zt ) that

    thi s equa tion impl ies (but is not impl ied by) :

    [a log f1 (Ylt IY2t , Zt;a�) l E0 = 0 aal which i s a standard property of a�.

    PROOF OF LEMMA 4 : Since

    a log f ( y1 ,y2 l z; a) il log f1 C y1 1y2 , z; a1 l a log f2 C y2 l z; a ) a = J ' ( a) a + a a a1 a

    f it fol lows from the def initions of Bf ( a) and B 2 C a1) that : 0 0

    Bf ( a) 0 fl f2 J ' ( a) B0 C a1 ) J( a) + B0

    ( a) , 0 [a log f1 CY1t lY2t , Zt; a1 ) a log f2 CY2t 1 Zt ; a ) ] + J C alE a • a , a1 a 0 [o log f2 CY2t l zt ; a ) o log f1 CY1t lY2t,Zt; a1 )] + E a • a ' J ( a) . a a1

    Since a� = a1( a0 ) when F� l z< . I . ) = Fy l zC . l . ;a0 ) , it suf fice s to show

    that the l ast two matr ice s on the right hand side of the prev ious

    equation are ident ical ly nul l when evaluated at (a0, a�) . This

    property fol lows by noting that :

  • [a log Eo .

    f1 ( Y1t lY2t , Zt ; a� ) a log f2 (Y2t lzt ;a0 ll

    aal • aa .

    45

    _ 0 [ 0 [a log f1 (Yl t ly2 , z; a�)l - EY2z EY1 I Y2 , Z aal

    a log f2 CY2t 1 Zt ;a0 )]

    • aa •

    where � Z denote s the expectation w ith respect to the true 2 di stribut ion of (Y2 t , Zt ) . The de si red property then follows from

    Lemma A2. f

    Fina l ly , since Bf ( a0 ) and B 1 < a01) are f inite ( from Assumptions 0 0

    f AS and AS ' ) , then B 2 ( a) is a l so f inite . 0

    A

    Q.E. D .

    PROOF OF TIIEOREM 4 : To show that a1n i s at l east as efficient as aln' we can use some general properties on FIML estimates ( see e . g . , Rao

    A ( 1963) ) . Indeed a1n and �n are in f act j ointly consi stent and

    uniformly asymptotica l ly normal ( JCUAN) est imates of a� . Moreover, by

    picking an arbitrary but f ixed di stribut ion for Zt , it is easy to see A A

    that an i s then the FIML est imator so that a1n i s an asymptotically

    ef ficient e st imator of a� .

    Parts ( i) and ( i i) of Theorem 4 requires , however, a di rect

    proof. When F� l z< . l . l = Fy l z< . I . ; a0 ) , it follows from Theorem 3 ,

    Assumpt ions A6 , A6 ' , and Lemmas 2 and 3 that :

    0 A Asym. Var (a1n) Cll ( ao l J ( ao > [s!< ao l]-1 J ' ( ao ) ,

    0 -Asym. Var (aln)

    We want to show that :

    0 C22 ( al )

    [ f l-1 Bol ( a�)

    [a:11-1 2 J [s!] -1 J '

    46

    where we have dropped a0 and a� to simpl ify the notations . From Lemma

    4 , this i s equivalent to showing that :

    [ f l-l [ f f ]-l Bol = J J 'Bol J + Bo2 J ' which is , from Equation ( 4 .8 ) , equival ent to :

    [ f ]-l [ f f 1-1 M'B01M 2 L L'M' B01ML + NB02N• L' ( • ) after having used the non- singularity o f M and the orthogonal i ty o f N .

    Suppo se f irst that k1 = k. Then, from Equation (4 .9), L is

    the ident i ty matrix. Hence ( * ) is equival ent to :

    f f f M ' B 1.i i M' B 1M + NB 2N• 0 0 0

    f f which is true since B 2 and hence NB 2N• are po si tive semi-def inite . 0 0

    f Moreover the equal ity holds if and only if NB 2N• = O . 0

    Suppose now that k1 < k. From Equations (4 .9) and ( 4 .10) it

    fol lows that ( • ) is equival ent to :

  • [M'B:l)rl 2 [

    fl

    ['•, •] M'B0 :: z11 z12 -1 [':,l Zi2

    fl Note that the square matrice s M 'B0 M + z11 and z22 must be non-

    47

    singul ar since Bf is non- singul ar ( from Assumptions AS , Im , and Lemma 0 3) . Using the formul a for the inverse of a parti tioned matr ix, we

    obtain :

    fl fl -1 [ ]-1 [ ]-1 M'B0 M 2 M' B0 M + z11 - z12 z22 z21 or equival ent ly, after s impl if ication:

    -1 zll - Z12Zi2Z21 2 O which i s true since the left-hand side i s equal to

    f 1

    [ lk l -1 NB 2N' 1 [1kl;-Z12z22] o -z;:2�1 f

    where B 2 i s po sitive semi-def inite . 0

    PROOF OF TIIEOREM 5 : From Theorem 3 , it follows that :

    Q. E. D.

    - - A. • • D Jn [aln - aln - ( al - al ( a ) ) ] � N( O ,V)

    where V is def ined by (5 .2). Since by as sumpt ion V I O , it fol lows

    from Rao and Mitra ( 1971 , Theorem 9 .2 .2 ) that :

    A. • • - - A. • • D 2 n [al n - aln - (al - al ( a ) ) ] 'V [ al n - aln - ( al - al ( a ) ) ] � Xr

    48

    where v- i s any general ized inverse of V, and r rank V. Thus under

    the nul l hypothesis H0 :

    A. - - A. D x2 n[aln - aln] ' V [ al n - aln1 � r ·

    Moreover , from part ( c ) of Theorem 3 , Vn converge s almost surely to V. ' -1 Thus , by construction, [Vn + RnRn] converge s almost surely to

    [V + RR' ] -l which i s a general ized inverse of V. Therefor e , under H , 0 Hn converge s in di stribution to a chi- square with r de grees of

    freedom.

    Under H1 , it follows from part (b ) of Theorem 3 that :

    ,.. 0'1n - aln

    a . s . • • � a1 - a1 ( a ) a I o.

    Since [V + R R1 ] -l converge s almost surely to [V + RR' ] -l , we have : n n n

    ,.. ' -1 - ,.. ( aln - aln) ' [Vn + RnRn] ( al n - aln) a . s .

    � a ' [V + RR' J -1a .

    Since , by construct ion, V + RR' is non-singul ar, i t fol lows that

    V + RR' and hence [V + RR' ] -l are po si tive def inite . Thus -1 a ' [V + RR' ] a I 0 for any a I O . Hence under H1 , Bn converges almost

    surely to "'-

    Q. E.D.

  • PROOF OF LEMMA 5 : G iven condition ( a ) , i t follows from Equations

    ( 5 .2 ) and (4 . 6 ) that :

    fl • -1 • f • • V = [ B ( a1 ) ] + J ( a ) B (a ) J ' ( a ) 0 0 • f • -1 f fl • • fl 0 -1 - J( a ) [B0( a ) ] B0 ( a , a1 ) [B0 C a1 ) ]

    fl • -1 fl f • f • -1 • - [B0 C a1 > l B0 ( a• , a1 ) [B0( a ) ] J ' ( a ) .

    On the other hand, we have :

    49

    • a log f ( y1 ,y2 l z;a ) • •

    • a log f1 C y1 ly2 . z;a1 C a ) ) a log f2 C y2 l z , a ) J ' ( a > a + a aa a1 a

    f Thus from Equation (4. 4) and the def inition of B 1 C . ) , wo have under 0 H : 0

    f fl • • B0 ( a , a1 ) • fl •

    = J ' ( a ) B 0 ( a1 )

    [a log f2 C Y2t l zt ; a• ) + Eo ---=-�""---"-aa • a log f1 (Yl t l

    1Y2t , Zt ;a� )l

    aal

    But the second term is nul l since i t i s equal to :

    Eo [u log f2 CY2t 1 Zt ; a )

    Y2z aa 0 [ EY1 I Y2Z

    a 1og • f1 (Yl t ly2 , z; a1 )

    aa ' ] ]

    1

    where tho conditional expectation i s nul l be cause of condition ( b ) .

    Therefore :

    V = [ Bf ( a• ) ] -l 0 1 J( a•) [Bf( a•) ] -lJ ' ( a•) . 0

    O . E . D.

    To prove Theorem 5, we use tho following Lemma which i s

    s imil ar to Lemma A2.

    LEMMA A3 : Given Assumptions Al-AS . A2 ' -A4 ' , and AS :

    where

    w =

    • ..!... aL° > Jn aal

    D � N(

    0

    fl • a z ( al ( a ) ) aal

    Bf ( a• ) 0

    fl f • • B0 ( a , a1 ( a ) )

    f fl • • B0 ( a , a1C a ) )

    fl • f 1 • a z ( al ( a ) ) B C a1 ( a ) ) - a o a1

    l ,W)

    � . a z ( al ( a ) ) '

    aal

    s o

    Proof : The proof is simil ar to tho proof of Lemma A2, and is based on

    the mul tivariate version of the standard Central Limit Theorem. The

    only difference i s that :

    Eo[a log f (Ylt lY2t ' Zt;

    aal

    . . , , l

    which may not be zero so that :

    � . a z ( al ( a ) ) aal

    •[a log f (Y11 1Y21, z1 , a1C a• l l l

  • 51

    Q. E. D.

    PROOF OF THEOREM 6 : S ince ..;;..c:ln -• Taylor expansion around a1 ( a ) :

    • a1( a ) ) = Op( l ) , we obtain from a

    c ,.

    1 aLn (Yl I Y2 ,Z•a1n> 1 aL;(Yl I Yz , Z; al ( a• ) )

    ,;;.. aa1 =

    ,;;.. aa1

    1 a2L:CY1 1 Y2 , Z; a1C a* ) ) _ ,. + - , JnC a1 n n aalaal

    • a1 C a ) ) + op( l ) .

    ,. . On the other hand, since Jn(a - a ) n 0 ( 1) , we have from a Taylor p

    • expansion of a1 = a1 ( a) around a :

    ,,. - . . _ ,. . vnal = Jnal ( a ) + J ( a >Jn > /aa1aa1 = A0 C a1 C a ) ) + op( l ) .

    Coll ect ing these results , the f irst equation become s :

    c ,. c • 1 oLnCY1 1Y2 , Z; a1n) 1 aLnCY1 1 Y2 , Z; a1 C a ) ) Jn aa1 = ,;;.. aa1 c I • f1 * • f • -l l aLn

    (Y Z; a ) - A C a1 ( a ) ) J (a ) [A ( a ) ] - a + o ( 1) o o Jn a p

    From Lemma A3 , it fol lows tha t :

    c ,. l aLn(Y1 1Y2 , Z; aln)

    ,;;.. aa1

    fl • D a z ( a1 ( a ) ) -7 N( a , 0 ) al

    where 0 i s given by Eqnation ( 5 .6 ) us ing the def inition ( 4 .6 ) of

    el l ( a· ) .

    ' fl • Under H0 , a z ( a1 ( a ) ) /aa1 = O . Thus :

    c ,. ..!... aLnCY1 1Y2 , Z; a1n>

    ..;;.. aa�

    A D aL c (Yl IY2 , Z;aln) -7 .;. ll- n , s aal

    52

    for any choice of the general ized inverse 0- of 0 when 0 F 0 ( see Rao

    and Mitra ( 1971 , Theorem 9 .2 .2 ) ) . Since by construction [O + o Q 1 ] -l n n n converge s almost surely to [O + QQ' ] -l which i s a generalized inverse

    I of ll, it fol lows that under H0 , Gn converge s in di stribution to a

    chi-square with s degrees of freedom. ' � . Under H1 , a z C a1 ( a l l /aa1 = a I o . Since

    ( 1/nl aL:CY1 1 Y2 ,Z;�1n> /aa1 converge s almost surely to a I O , and s ince

    [O + Q Q 1 ] -l converge s almost surely to a po sitive def inite matrix, n n n I

    it fol lows that under H1 , Gn converge s almost surely to m.

    Q, E. D.

    PROOF OF LEMMA 6 : We use the f irst equation of the proof of Theorem • • 6 . Since under 110 , a1 = a1 ( a ) , we ge t :

    ( l/in> aL:CY1 1Y2 ,z;:1n> /aa1 = ( l/in> aL:CY1 1Y2 ,Z; a; l /aa1 fl • r A •

    + A0 C a1 l�n(al n - a1 l + op( l ) .

  • On the other hand f rom the proof of Theorem 3 , we have :

    - * > Jn( aln - al

    Hence, under 110 :

    f1 • -1 ,- c • I [A0 C a1 ) J ( l/vn) aLn(Y1 1 Y2 , Z;a1 > aa1 + op( l ) .

    c A fl * A -( l/Jnl aLn(Y1 1 Y2 , Z; a1n> /aa1 = A0 C a1 >v'n< a1n - a1n> + op( l ) .

    53

    ( • )

    Since 0 i s the asymptotic covariance matrix o f the left-hand side , and

    T' A -since V is the asymptoti c covariance matrix of yn( al n - a1n) ' it

    fol lows that :

    0 fl • fl • A0 C a1) V A0 C a1 ) .

    Since by Assumption AS ' , the matrix A:1 C a� ) is non-singular, it fol lows that r = s,

    II n

    To prove ( c ) , we use the fact that :

    A - -1 A n(a1 - a1 ) ' [ V + RR' ] C a1 - a1 ) ' + o ( 1) , n n n n p

    < •• )

    c A I , -1 c A I Gn = C l/n) aLnCY1 1Y2 ,Z;a1n) aa1 [O + QQ• ] aLn(Y1 1 Y2 ,Z; a1n) aa1 + op( l ) .

    These equa tions fol low from the fact that V + R R and D + Q Q1 are n n n n n n consi stent est imators of V + RR' and D + QQ ' respect ively. Hence ,

    us ing ( * ) we ge t :

    II - G n n A - -1 -1 A n(a1 - a1 ) ' [ ( V + RR' ) - A(O + QQ ' ) A] (a1 - a1 ) + o ( 1) n n n n p

    where A = A:1 C a� ) . Since A is non- singul ar by Assumpt ion AS '-b , it i s

    5 4

    clear from ( ** ) that, in the non-singular case ( r = s = k1) , the f irst

    term on the right-hand side is identi cal ly nul l so that Hn - Gn � 0 in probabil ity .

    In the s ingul ar case the derived resul t is , however , more

    diff icul t to e st ablish. To see that, let v- = (V + RR' ) -l and

    0- = (0 + QQ ' ) -l , Though for any general ized inverse (of maximum

    rank) v- of V the matr ix A-l v- �l is a general ized inverse ( o f

    maximum rank) o f O, nothing ensures that V- = 0- since thi s depends on

    the choice of Rn and Qn' i. e . , on the choice o f general ized inverse s

    (of maximum rank) of Vn and On. We shal l nevertheless show that the

    f irst term on the right-hand side converge s in di stribution and hence

    in probabil ity to 0 ,

    From the proof of Theorem 5 we have :

    Moreover

    A - D Jn C aln - al n) � N(O, V) •

    V (V- - AO-A)V = V - VAii-AV

    V - (A-lOA-1) AO-A(A-lOA-l ) 0

    where we have used that v- and n- are general ized inverse s of V and n ,

    and that V = A-lOA-l which follows from part ( a ) . Hence :

    [V( V- - AO-A)J 3 = [V(V- - Ail-A) J 2 •

    Therefore from Theorem 9 .2 . l in Rao and Mitra ( 1971) it fol lows that :

  • where

    A - - - A - D 2 n(aln - a1n> • CV - AO Al Caln - a1n) � Xm

    m = trace [V- - AO-A]V

    trace (V-V) - trace (0-AVA)

    trace (V-V) - trace (0-0 )

    rank V - rank 0

    0 '

    where the f ourth equal ity fol lows f rom a property of a general ized

    inverse ( see Rao and Mitra ( 1971 , Def inition 3 , p, 21 ) . Thus

    Rn - Gn � 0 in probabil ity.

    55

    Q, E.D.

    56

    FOO'INOTES

    • I am much indebted to J . Dubin, R. Engle, D. Grether, J . Link, D. Rivers , and H. White f or helpful comment s and critici sm .

    Remaining errors are of course mine .

    1 . The fol lowing assumptions , with the exception of Assumption 6 in

    Section 3, are simil ar to those of White ( 1982 ) . The basic

    difference i s that our assumptions be ar on the conditional

    density instead of on the j oint density.

    2 . The existence of a conditional distribution i s ensured by

    Jirina ' s Theorem ( see e . g . , Loeve ( 1955) , Monfort ( 1980 ) ) .

    3 . Nothing i s said about uniqueness of a CMLE, Our definition

    corresponds to Wald ' s ( 1949) approach to ML e st imation. Ou the

    other hand, Andersen ( 1970) take s Cramer ' s ( 1946 ) approach so

    that his a s sumptions are somewhat different from ours .

    4 . Note that this holds even if G0 ( , ) doe s not have a density .

    5 . As a matter of fact Y2t may simply b e a funct ion o f Ylt ( see

    Exampl es 2 and 3 below) .

    6 . From Theorem 3 , Equation ( 5 .1) can also be written as

    pl im aln A

    plim aln' A 7 . Note , however, that in our case the effic ient e st imator a1n is

    used in evalua ting the gradient of L� CY1 1Y2 , z; . ) and On' This

  • contrast s with White ' s statistic ( 5 .2 ) where the ineff icient

    e st imator is used.

    8 . I owe this point to a di scuss ion with D . Rivers .

    9 . Consider the simul taneous system Yt = p1 + p2zt + ut and

    57

    Zt = y + vt where ut and vt may be correl ated . [The f irst

    equation i s not identif ied and, Zt i s ne ither weakly nor str ictly

    exogenous for a = C p1 , p2 ,auu) . ] OLS on the f irst equation

    consistently estimates the conditional distribut ion of Yt given

    Zt when the true conditional distribut ion of ut given vt i s

    normal , a condition that i s satisf ied when ut and vt are j ointly

    normally distributed.

    10 . Since no assumption is made on Zt , one can also consider the

    " reverse" regression of Zt on Yt . The resul ting parameter

    estimates and the direct OLS est imates must then sati sfy some

    compatibil ity conditions in order to def ine a proper e st imated

    j oint di stribut ion for (Yt , Zt ) ( see Gour ieroux and Monfort

    ( 1979) ) . See also footnote 17 .

    11 . Whether or not all the parameters in a are ident i f ied c l early

    depends on the choice of B.

    12 . If B is the compl ete choi ce set minus one al terna tive , then

    cl early a1n= a�n· Our test become s identical to the one propo sed by Hausman and McFadden ( 1981 ) .

    5 8

    13 . Though the l og-l ikel ihood function associated with the ( censored)

    Tobit model is global ly concave in y and l/a ( see Olsen ( 197 8) ) ,

    the l og-l ikelihood function associated with the trunca ted Tobit

    model is only partially concave in y and 1/a in the sense that

    given y it i s concave in 1/a, and given 1/a it is concave in y .

    14 . Andersen ( 1970) sugge sts CML estimation instead o f ML estimati on

    when there are incidental parameters . The appropri ate

    conditi oning variable to use i s a sufficient stati st ic f or the

    incidental parameters. Assumption Al rules out such a s i tuation

    s ince the Zt must be identically di stributed.

    15 . In fact St is a marginal sufficient statistic since it depends

    only on Yt ( see Rao ( 197 3 , p. 132 ) ) . Sudakov ( 1971) , however,

    shows that when the Yt ' s are i . i . d . , then a marginal suf f icient

    st atistic i s also suf f icient in the usual sense .

    16 . For more compl ex exampl es , see Vuong ( 1982b ) .

    1 7 . See Amemiya (1978) and Vuong ( 1982c ) . In this latter paper , CML

    estimation and ML est imation of the marginal model for Y2t are

    used i teratively in order to produce efficient est imator s of all

    the parameters. Note that, instead of cons idering the e stimation

    of the marginal model for Y2t , one can consider the CML

    est imation of the conditional model for Y2t given Y1t . This

    second approach was sugge sted by Nerlove and Press ( 1973 ) and

    studied by Guilkey and Schmidt ( 197 9) and Vuong ( 1982 a ) .

  • 59

    REFERENCES

    Amemiya, T. : "Regr e s s i on Ana ly s i s When the Dependent Variab l e i s

    Truncated Normal , " Econome trica, 41 ( 197 3 ) , 9 97-1016 .

    ����� : "On a Two-Step Est imation of a Mul t ivariate Log i t Mode l , "

    Journal o f Economet r ic s , 8 ( 197 8) , 13-21 .

    Ande rsen, E. B . : "Asympto t i c Prope r t i e s of Condi t i onal Maximum

    Likel ihood Estimator s , " Journal of the Royal Stat i s t ical Soc iety,

    Ser ies B , 32 ( 1970) , 2 83-301 .

    Bowden, R. : "The Theory of Parametric Ident i f ic a ti on, " Econometrica,

    41 ( 1973 ) , 1 06 9-174 .

    Cramer, H. : Mathema t ic a l Methods o f Sta t i s t i c s . Pr ince ton : Prince ton

    Univer s i ty Pre s s , 1 946 .

    Engl e , R, F. , D. Hendry, and J . F. Ri char d : "Exo gene i ty , "

    Economet r ica, 51 ( 1983 ) , 277-304 .

    F i sher, R. A. : Stat i s t ic a l Methods and Scient i f ic Inferenc e . London :

    Ol iver and Boyd, 1 956 .

    Gour ieroux, C, , and A. Monfor t : "On the Charact e r i z a t ion of a Joint

    Probabil ity D i s t r ibut i on by Cond i t i onal Di stribut i ons , Journal o f

    Ec onome t r i c s , 9 ( 197 9) , 115-1 1 8 .

    Gui l key, D . , and P . Schm idt : " Some Sma l l Sample Prope r t i e s of

    Est imator s and Te st Sta ti stics in the Mul tivariate Log i t Mode l , "

    Journal o f Econome trics, 10 ( 1979) , 33-42 .

    Hausman, J , A. : "Spe c i f ica tion Te s t s in Econometric s , " Econometr ica,

    46 ( 197 8) ' 1 251-1272 .

    Hausman, J , , and D. McFadden : "A Spe c if ication Te st f or the

    Mul tinomial Logi t Mode l , " MIT Working Pape r , No 292 , 1 981 .

    Heckman, J . : "The Common S tructure of Sta t i st i cal Mode l s of

    Trunca t i on, Sampl e Selection, and L imited Dependent Variabl e s ,

    and a S impl e Est imator for Such Mode l s , " Annal s o f Economic and

    Soc ial Measurement, 5 ( 1976) , 47 5-492 .

    Jennr i ch , R. I . : "Asympto t i c Prope r t i e s of Non-Linear Least Squares

    Est imator s , " � o f Mathema t ical Sta t i s t i c s , 40 ( 1969) , 633-6G .

    60

    LeCam, L. : "On Some Asymptotic Prope r t i e s of Maximum Likel ihood

    Estimat e s and Rel a ted Baye s ' Est imate s , " University of Cali fornia

    Pub l icat ions in Stat i s t ic s , 1 ( 1953 ) , 277-330 .

    Loeve, M. : Probab ility The ory. New York: D . Van Nostrand Company ,

    1 955 .

    McFadden, D . : "Conditional Log it Analys i s of Qua l itative Choice

    Behav ior , " Front iers in Econome t r ic s , ed. b y P, Zarembka . New

    York: Academic Pre s s , 1 974 .

    Madda l a , G. S. : Limited-Dependent and Qua l i tat ive Var iables in

  • Economet r i c s . New Yor k : Cambridge Univer s i ty Pre ss . 1983 .

    Monfor t . A . : Cours .!!£ Probabilites. Pari s : Econom ica. 19 80 .

    Ner l ov e , M. and S. J . Pre s a : Uniyariate .!!li! Multiyariate 12&-L.ia2.l.i. .!!li! Logistic �. San t a Mon i ca : Rand Corpor a t ion, 19 7 3 .

    Ol sen, R . J . : "Note o n t h e Uniquene s s o f t h e Maximum Like l ihood

    Est imator for the Tobi t Mode l . " Econometrica, 46 (197 8), 1 211-

    1215 .

    Rao, C . R. : "Cr i te r i a of Est i ma t i on i n Large Sampl e s , " Sankkya. 25

    (1963). 1 89-206 .

    ����- = � Stat ist i c a l Inference .!!li! .!1J. Applications . New York : J ohn W i l ey and Son s . 1973 .

    Rao. C . R • • and S . K . Mitra : Gener a l ized Inverse .21 Matrices .!!li! .i.t.1 Appl ic a t ions . John W i l ey and Son s . 197 1 .

    Rothenberg. T. : "Ident i f ica t i on i n Parametr i c Mode l s , " Econometrica, 39 (197 1). 577-591 .

    S ilvey, S. D. : "The Lagrangi an Mul t ipl ier Te s t , " Annals o f

    Mathema t i c a l Stat ist ics, 30 (1959), 389-407 .

    Sudakov, V. N. : "A Remark on Mar gi na l Suf f i ci ency , " Doklady. 198

    (1971). 796-7 9 8 .

    Tobin. J . : "Est i ma t i on of Re l a ti onships f o r Lim i ted Dependent

    61

    Var i ab l e s , " Econome t r i c a , 26 (1958), 24-36 .

    Vuong , Q. H. : "Conditiona l Log-Linear Probab i l i ty Mode l s : A

    Th e or e t i cal Devel opment w i th an Empi rical Appl i c a t i on. • Ph . D

    The s i s . Nor thwestern Univers i ty. 1982 .

    62

    ����� : "Probab i l ity Fe edback in a Recur s ive Sys t em of Probab i l ity

    Mode l s , " Soc i a l Sci ence Working Paper No. 44j , Cal i forni a

    Ins t i tute of Technol ogy . 1982 .

    ����� = "Probab i l i ty Fee dback in a Recur sive Sys t em of Loa i t

    Mode l s : Estima t i on, " Soc i a! Sc i ence Working Pape r N o . 444.

    Ca l ifornia Ins t i tute of Technol oay. 1982 .

    Wal d , A . : "Note on the Consi s t ency of the Maximum Like l ihood

    E s t imat e , " A!!.!!!.!!. of Mathemat ica l S t a t i s t i c s . 6 0 (1949), 595-603 .

    Wh i t e , H . : "A Heterosteda s t i c i ty-Con s i st ent Covari ance Ma t r ix

    Est imator and a Direct Te st for H e t e rosteda s t i c i ty, •

    Econome t r ica, 4 8 (19 8U ) , 817-838.

    ����� = "Maximum Like l ihood Est imation of Mi s spe c i fied Mode l s , "

    Econome tri ca, 50 (1982), 1-25 .

    "Corrigendum, " Ec onome t rica, 51 (198J ) , 513 .