Partially linear varying coe¢ cient models with missing at ... · Partially linear varying coe¢...

Partially linear varying coe¢ cient models with missing

at random responses

Francesco Bravo�

University of York

August 2012

Abstract

This paper considers partially linear varying coe¢ cient models when the response vari-

able is missing at random. The paper uses imputation techniques to develop an omnibus

speci�cation test. The test is based on a simple modi�cation of a Cramer von Mises

functional that overcomes the curse of dimensionality often associated with the standard

Cramer von Mises functional. The paper also considers estimation of the mean functional

under the missing at random assumption. The proposed estimator lies in between the full

nonparametric estimator of Cheng (1994) and the full parametric estimator of Scharfstein,

Rotnizky & Robins (1999), and can be used, for example, to obtain a novel estimator for

the average treatment e¤ect parameter. Monte Carlo simulations show that the proposed

estimator and test statistic have good �nite sample properties. An empirical application

illustrates the applicability of the results of the paper.

Keywords: Bootstrap, Imputation, Inverse probability weighting, Missing at random

�I am grateful to an Associate Editor and two referees for useful comments and suggestions that im-

proved the original draft. I am also grateful to Juan Carlos Escanciano and Ursula Müller for some

useful conversations. All remaining errors are my own. Address correspondence to: Department of

Economics, University of York, York YO10 5DD, UK. E-mail: [email protected]. Web Page:

https://sites.google.com/a/york.ac.uk/francescobravo/

1

mailto:[email protected]

https://sites.google.com/a/york.ac.uk/francescobravo/

1 Introduction

Partially linear varying coe¢ cient models are useful extensions of the popular partially linear

model considered for example by Engle, Granger, Rice & Weiss (1986), Robinson (1988) and

Speckman (1988). These models o¤er additional �exibility compared to partially linear models

because they allow interactions between a vector of covariates and a vector of unknown functions

depending on another covariate, while avoiding the curse of dimensionality typically associated

with partial linear models. Partially linear varying coe¢ cient models are important examples of

varying coe¢ cient models (see e.g. Cleveland, Grosse & Shyu (1991) and Hastie & Tibshirani

(1993)) which arise in many situations of practical relevance in economics, �nance and statistics,

and have been used in the context of generalized linear models and quasi-likelihood estimation

(Cai, Fan & Li 2000), time series (Cai, Fan & Yao 2000), longitudinal data (Fan & Wu 2008),

survival analysis (Cai, Fan, Jiang & Zhou 2008) to name just a few applications- see (Fan &

Zhang 2008) for a recent review containing further applications and a number of examples.

Compared to varying coe¢ cient models, partially linear varying coe¢ cient models contain the

additional information that some of the coe¢ cient are in fact constant and this information

should be incorporated in the estimation to obtain more e¢ cient estimates. Ahmad, Leela-

hanon & Li (2005) and Fan & Huang (2005) suggest two general estimation techniques based,

respectively, on nonparametric series and pro�le least square estimation. Both procedures result

in e¢ cient estimators under the additional assumption of conditional homoskedasticity of the

unobservable errors.

In this paper we consider varying coe¢ cient partially linear models when the response vari-

able is not directly observed, but it is assumed to be missing at random (MAR henceforth).

MAR is commonly assumed in many statistical models with missing data - see Little & Rubin

(2002) for a comprehensive review - and it speci�es that the probability of missing depends on

variables that are always observed. When the responses are MAR a natural approach to follow

is to impute a value for the missing responses and then estimate the parameters of interest as

if the imputed responses were the true parameters. Paik (1997) showed that imputation by

regression methods can improve the e¢ ciency of estimators for generalized estimating equations

models; Chen & Cui (2006) and Wang, Linton & Härdle (2004) considered, respectively, im-

putation methods with local quasi-likelihood estimators and in the context of partially linear

regression. Wang & Rao (2002) and Wang & Chen (2009) combined nonparametric imputation

and empirical likelihood to obtain inferences in models with MAR data.

In this paper we consider two problems: �rst testing for the correct speci�cation of partially

linear varying coe¢ cient models; second estimating the mean of a response variable assuming

a partially linear varying coe¢ cient speci�cation. The speci�cation test we consider is based

on a Cramer von Mises type of functional of a marked empirical process. This type of statistic

has been used for checking the correct speci�cation of a number of statistical models including:

2

parametric regressions (Stute (1997) and Escanciano (2006)), generalized linear models (Stute &

Zhu (2002)), quantile regressions (He & Zhu (2003) and Escanciano & Velasco (2010)), partially

linear regressions Zhu & Ng (2003), single index (Stute & Zhu (2005) and Xia, Li, Tong & Zhang

(2004)), conditional moments (Whang (2001)); Bravo (2012) proposes some general results for

semiparametric conditional moments models, which incorporate as special cases all of above

mentioned models. One important characteristic of the test statistic we propose is that it is

based on the same dimension reduction approach as that proposed by Escanciano (2006). This

approach yields test statistics that do not su¤er from the main problem associated with the

standard marked empirical process approach, namely the curse of dimensionality related to the

potentially high dimension of the conditioning set of covariates. The imputation estimator for

the mean functional is similar in spirit to that proposed by Wang et al. (2004) and complement

the nonparametric imputation estimator of Cheng (1994), and the fully parametric imputation

estimator of Scharfstein et al. (1999). The resulting estimator is rather general and can be used,

for example, to construct a novel estimator of the average treatment e¤ect parameter that is

central to many causal inference models, see for example Little & Rubin (2002).

In this paper we make the following contributions: �rst we consider partially linear varying

coe¢ cient models with two imputation methods: a standard one based on regression and an

alternative one based on inverse probability weighting. Both approaches are based on a complete

version of the pro�le kernel estimator originally suggested by Fan & Huang (2005) when the

responses are always observable. It is important however to note that other nonparametric

estimators could be used, such as the nonparametric series estimator suggested by Ahmad et al.

(2005). We derive the asymptotic distribution of the complete version pro�le kernel estimator

and of the speci�cation tests under the null and both global and local alternative hypotheses.

These results extend and/or complement results obtained by Wang et al. (2004), Fan & Huang

(2005), Escanciano (2006), Sun, Wang & Dai (2009) and Koul, Muller & Schick (2012) among

others.

Second we consider two bootstrap procedures that can be used to consistently approximate

the critical values of the unknown distributions of the test statistic. The �rst bootstrap procedure

is the wild bootstrap (see e.g. Wu (1986) and Hardle & Mammen (1993)), and has been used

in the context of the type of speci�cation tests considered in this paper by Stute, Gonzalez-

Mantiega & Quindimil (1998), Whang (2000), Escanciano (2006) and others. The second one

is motivated by the so-called random symmetrization technique (see e.g. Pollard (1984)) and

by the multiplier central limit theorems (see e.g. Van der Vaart & Wellner (1996)), and it has

been used by Su & Wei (1991), Delgado, Dominguez & Lavergne (2003), Zhu & Ng (2003) and

others.

Third we derive the asymptotic distributions of imputation estimators for a MAR mean

functional assuming a partially linear varying coe¢ cient speci�cation, and derive a new semi-

3

parametric e¢ ciency bound for this model under the assumption of normality. We also propose

a new estimator of the average treatment e¤ect parameter under strong ignorability (Rosenbaum

& Rubin 1983). These results complement those of Cheng (1994), Hahn (1998), Hirano, Imbens

& Ridder (2003), Wang et al. (2004) and Muller (2009) among others.

Fourth we use simulations to assess the �nite sample properties of the proposed estimators

and test statistics. As a by-product of these simulations we are able to compare the relative

performances of the two bootstrap procedures. This comparison is, as far as we are aware of,

new in the context of speci�cation testing.

Finally we illustrate the practical usefulness of the methods proposed in this paper by in-

vestigating the important policy-related question of whether membership to the World Trade

Organization (WTO) can have negative e¤ects on the environment.

The remaining part of the paper is structured as follows: next section introduces the model

and the estimator. Sections 3 introduces the Cramer von Mises statistic to test the correct

speci�cation of the model and shows the consistency of both bootstrap procedures. Section

4 contains the results on the estimation of the mean functional and the novel semiparametric

estimator of the average treatment parameter. Section 5 presents and discusses the results of

the Monte Carlo simulations. Section 6 contains the empirical application. Section 7 contains

some concluding remarks. An Appendix contains all the proofs.

The following notation is used throughout the paper: �a:a:�, �a:s:�stand for �almost all�

and �almost surely�), d! denote weak convergence in l1 (�) - the space of all real-valuedfunctions that are uniformly bounded in � (see Pollard (1984) for a de�nition), and convergencein distribution, respectively. Finally k�k is the Euclidean norm, "0" denotes transpose and forany vector v v2 = vv0.

2 The model and the estimator

The model we consider is

Y = X 0�0 (U) +W0�0 + "; (2.1)

where �0 (�) is a p-dimensional vector of unknown functions, U is an observable random variable1,�0 is a k-dimensional vector of unknown parameters and the unobservable error " is such that

E ("jU;X;W ) = 0 a:s: and E ("2jU;X;W ) = �2 (U;X;W ) a:s:.We assume that some of the Y values in a sample of size n may be MAR whereas the X, U

and W values are completely observed, that is the probability of missing, also called propensity

score in the causal inference literature, see e.g. Rosenbaum & Rubin (1983), is given by

Pr (�jY; U;X;W ) = Pr (�jU;X;W ) := � (U;X;W ) > 0 a:s:; (2.2)

1The results of the paper are valid also if U is a random vector.

4

where � = f0; 1g is a binary indicator of missingness. Let (Yi; Ui; Xi;Wi)ni=1 denote an i.i.d.

incomplete sample from (Y; U;X;W ), and �i = 0 indicates that Yi is missing. To deal with the

MAR responses we follow the same approach as that suggested by Wang et al. (2004) and Sun

et al. (2009) and propose the following complete case estimators for �0 (�) and �0

b� = " nXi=1

�i�Wi � b�XW (Ui)0Xi

�2#�1 nXi=1


�(Yi �X 0

ib�XY (Ui)) (2.3)

and b� (u) = b�XY (u)� b�XW (u) b�; (2.4)

where

b�XW (u) =

"nXi=1

�iXiX0iKh (Ui � u)

#�1 nXi=1

�iXiW0iKh (Ui � u) ;

b�XY (u) =

"nXi=1

�iXiX0iKh (Ui � u)

#�1 nXi=1

�iXiY0iKh (Ui � u)

and Kh (�) := K (�=h) =h is a kernel function and h =: h (n) is the bandwidth.Note that (2:3) and (2:4) are like the pro�le kernel estimators proposed by Fan & Huang

(2005) except that they are based on the complete case and conditional heteroskedasticity is

allowed. Alternatively we can use nonparametric series estimation and obtain the complete case

series based estimator b�Sb�S =

"nXi=1

�i (Wi �WiP (Xi; Ui))2

#� nXi=1

�i (Wi �WiP (Xi; Ui)) (Yi � P (Xi; Ui)Yi)

and b� (u)S = P (Xi; u)�Yi �Wi

b�S� ;where

P (Xi; u) = p (Xi; u)0

nXi=1

p (Xi; u) p (Xi; u)0

!�p (Xi; u) ;

p (Xi; u) =�X1ip

q11 (u)

0 ; :::; Xpipqpp (u)

0�0 ;�� denotes generalized inverse and pqjj (u) =

�pj1 (x) ; :::; pjqj (u)

�0is a qj � 1 vector of base

functions. Under the same regularity conditions of Ahmad et al. (2005) it is possible to show

that b�S has the same distribution as that of the pro�le kernel estimator b� given in the followingtheorem:

5

Theorem 1 Under A1(i), A2-A6(i), A7(i) and A8 listed in the Appendix

n1=2�b� � �0� d! N

�0;��10 0�

�10

�where

�0 = Eh� (U;X;W )

�W � �0XW (U)0X

�2i;

0 = Eh� (U;X;W )�2 (U;X;W )

�W � �0XW (U)0X

�2i:

Furthermore, under the additional assumption that " � N (0; �2),

n1=2�b� � �0� d! N

�0; �2��10

�; (2.5)

and �2��10 is the semiparametric e¢ ciency bound of Bickel, Klaassen, Ritov & Wellner (1993):

The e¢ ciency result (2:5) of Theorem 1 is, as far as we know, new. It is consistent with

that of Koul et al. (2012), who derive the e¢ ciency bound for a partially linear model with

MAR responses assuming a location density for the error with �nite Fisher information. Under

the assumption of normality Koul et al.�s (2012) e¢ ciency bound coincides with the one of

this paper, and vice-versa under the assumption of a location density (2:5) can be modi�ed to

encompass theirs. It is also interesting to note that (2:5) is consistent with the results of Kim

& Ma (2012) on the semiparametric e¢ ciency of the extended nonlinear least squares estimator

of Wang & LeBlanc (2008), under the assumption of homoskedasticity and that the errors have

a symmetric density.

Let

bY1i = �iYi + (1� �i)�X 0ib� (Ui) +Wi

b�� ; (2.6)

bY2i =�ib� (Ui; Xi;Wi)

Yi +

�1� �ib� (Ui; Xi;Wi)

��X 0ib� (Ui) +Wi

b��denote, respectively, the regression and inverse probability weighted imputed values, where b� (�)is a nonparametric estimator of the propensity score (2:2) given by

b� (u; x; w) = Pni=1 �iLb (Ui � u;Xi � x;Wi � w)Pni=1 Lb (Ui � u;Xi � x;Wi � w)

; (2.7)

where Lb (�) := L (�=b) =b is a kernel function and b =: b (n) is another bandwidth. Note thatbY2i is motivated by the so-called double robustness property, that in the context of estimationwith missing data means consistency of the estimator when either the model for the missingness

mechanism or the model for the distribution of the complete data is correctly speci�ed (see

e.g. Robins, Rotnitzky & Zhao (1994), Scharfstein et al. (1999) and Bang & Robins (2005)). As

6

noted by Sun et al. (2009), this property is desirable for estimation but not for testing (especially

for the correct speci�cation) because it reduces the sensitivity of the test to departures from

the null hypothesis. A second potential problem with bY2i is that it might su¤er from the curse

of dimensionality associated with a possibly high dimensional set of covariates X and W . Of

course a natural remedy for this problem would be to consider a fully parametric speci�cation

of � (Ui; Xi;Wi), such as that of a probit or logit. However the resulting estimator would be

characterized by a di¤erent asymptotic behaviour and could still be doubly-robust. For these

reasons we consider an alternative inverse probability weighting in which only the marginal (in

U) propensity score Pr (� = 1jU) is nonparametrically estimated. Let

b� (u) = Pni=1 �iLb (Ui � u)Pni=1 Lb (Ui � u)

(2.8)

denote the nonparametric estimator, and

bY3i = �ib� (Ui)Yi +�1� �ib� (Ui)

��X 0ib� (Ui) +Wi

b�� (2.9)

denote the resulting inverse probability weighting imputed values. Note that the same type of

inverse probability weighting as that based on (2:9) has been considered by Wang et al. (2004)

and Sun et al. (2009).

3 Speci�cation analysis

The null hypothesis that (2:1) is correctly speci�ed is

H0 : E ("jU;X;W ) = E (Y �X 0�0 (U)�W 0�0jU;X;W ) = 0 a:s:; (3.1)

or equivalently

H0 : E ["(U;X;W ;u; x; w)] = 0; (3.2)

provided the linear span of the function (�) is dense in the space of bounded measurablefunctions on the support of U;X;W (Bierens 1982). In this paper we specify (�) to be theindicator function

I ([U;X 0;W 0] � [u; x0; w0]) = I (U � u)pYj=1

I�X(j) � x(j)

� kYj=1

I�W (j) � w(j)

�; (3.3)

but we follow the same projection approach proposed by Escanciano (2006), which avoids the

potential practical problem of having a high dimensional set of covariates. To be speci�c we

consider

E�"I�U � u; �0 [X 0;W 0]

0 � s��= 0 a:a: u; s; � 2 �; (3.4)

where � = [�1;1]2 � Sk+p and Sk+p is the unit sphere2 in Rk+p.2This follows because by Lemma 1 in Escanciano (2006) the null hypothesis (3:1) holds i¤

E�"j�0 [U;X 0;W 0]

0�= 0 a:s: for any vector � 2 Rk+P+1 and k�k = 1.

7

The advantage of specifying (U;X;W ;u; x; w) as in (3:4) over the standard indicator func-

tion (3:3) is apparent in its dimension reduction character, which implies that potentially high

dimensional covariates X;W are not a problem as they would be with (3:3), i.e. too many zeroes

negatively a¤ecting both size and power properties of any test statistic based on it. A more

detailed discussion of the merits of (3:4), including also a comparison with the related approach

of Stute & Zhu (2002) in the context of generalized linear models, can be found in Escanciano

(2006).

Given (3:4) a test statistic for the null hypothesis (3:1) can be constructed by considering a

functional of the so-called projected marked empirical process

n1=2bvj (u; s; �) = 1

n1=2

nXi=1

b"jiI �Ui � u; �0 [X 0i;W

0i ]0 � s

�j = 1 or 3

where b"ji = bYji �X 0ib� (Ui) +W 0

ib� j = 1 or 3

and b� (�) and b� are the complete case estimators as de�ned in 2.4 and 2.3, respectively.3.1 Asymptotic null distributions

Theorem 2 Under A1(i), A2�A6(i), A7(i) and A8 listed in the Appendix

n1=2bvj (u; s; �) =) vj (u; s; �) in l1 (�) ; j = 1 or 3;

where the vj (u; s; �)s are two centred Gaussian processes with covariance function

E [�j (u1; s1; �1)�j (u2; s2; �2)] ; (3.5)

�1 (u; s; �) = �"I�U � u; �0 [X 0;W 0]

0 � s��D (u; s; �) ��10 �

�W � �0WX (U)

0X�"�

G (s; �; u) [E (�XX 0jU)]�1 �X"I (U � u) ;

�3 (u; s; �) =�"

� (U)I�U � u; �0 [X 0;W 0]

0 � s��D (u; s; �; �) ��10 �

�W � �WX (U)

0X�"�

G (s; �; u) [E (�XX 0jU)]�1 �

� (U)X"I (U � u) ;

and

D (u; s; �) = E�� (W 0 �X 0�0XW ) I

�U � u; �0 [X 0;W 0]

0 � s��;

D (u; s; �; �) = E

��

� (U)(W 0 �X 0�0XW ) I

�U � u; �0 [X 0;W 0]

0 � s��;

G (s; �; u) = E��X 0I

��0 [X 0;W 0]

0 � s�jU = u

�:

8

Let F� (u; s) and bF� (u; s) denote, respectively, the distribution and empirical distributionof U and �0 [X 0;W 0]0, and let d� denote the uniform distribution on the sphere Sk+p. Giventhe result of Theorem 2 we can use a Cramer von Mises type of functional to construct a test

statistic for the null hypothesis H0 that is

CMj = n

Z�

b�j (u; s; �)2 d bF� (u; s) d� j = 1 or 3: (3.6)

A straightforward application of the continuous mapping theorem gives the following:

Corollary 2.1 Under A1(i), A2�A6(i), A7(i) and A8-A9 in the Appendix and under the nullhypothesis (3:1)

CMjd!Z�

vj (u; s; �)2 dF� (u; s) d� j = 1 or 3: (3.7)

As it is typically the case with this type of approach to speci�cation testing the asymptotic

null distributions of the proposed test statistics are not asymptotically distribution free (ADF)

since they depend in a complicated way on the in�uence functions �j (j = 1; 3) given in (3:5). It

is also important to note that the CMj (j = 1; 3) statistics need not to be scaled by a normalizing

constant - typically given by a consistent estimator of (3:5). This is important for two reasons:

�rst because the presence of such normalizing constant can have a negative e¤ect on the power

properties of the statistics themselves depending on the type of estimator used. Secondly because

the estimation of the normalizing constant is further complicated by the presence of the unknown

conditional variance �2 (U;X;W ). In this case one possible estimator that can be used is the

same k-nearest neighbourhood-based estimator, say b�2nn (Ui; Xi;Wi), as that used by Robinson

(1987). Note also that we could take the conditional heteroskedasticy directly into account in

the estimation of b� (Ui) and b� by scaling the unobservable error "i by b�nn (Ui; Xi;Wi), and/or

consider a weighted version of the projected marked empirical processes n1=2bvj (u; s; �). Howeverin both cases the resulting test would still not be ADF, giving therefore no practical advantage to

the scaled test statistic over the original one. In the next subsection we present two resampling

techniques that can be used to consistently estimate the null distributions of the CMj statistics.

3.2 Bootstrap approximation

As mentioned in the Introduction we consider two bootstrap approaches to approximate the

distributions of the CMj statistics. The �rst one is based on the wild bootstrap (WB henceforth)

and the second one is based on the multiplier bootstrap (MB henceforth). Each methods has its

own advantage compared to the other: WB is easier to compute as the required multidimensional

integration over Sk+p admits a simple closed form expression -see (5:3) below; at the same time

it is rather computationally intensive because of the actual re-estimation involved, especially for

the nonparametric component. MB is more complicated to compute because it is directly based

9

on the empirical analog of (3:5) -see (3:8) below- but it is less computationally intensive because

the model is not actually resampled.

In the context of speci�cation testing with marked empirical processes WB is considered by

Stute (1997) and by Escanciano (2006) for linear regressions and parametric regression models,

respectively. When the responses are MAR Gonzalez-Manteiga & Perez-Gonzalez (2006) show

how to modify the WB for testing the correct speci�cation of regression models using kernel

smoothing. In this paper we follow their approach to obtain appropriate bootstrap samples that

preserve the MAR assumption. To be speci�c let b"i denote the ith residual when �i = 1 and letf�ig

ni=1 denote a random sample from the distribution of the bounded random variable � with

zero mean and unit variance that is independent from U , X and W: Let "�i = b"i�i and notethat E� ("�i ) = 0 and V

� ("�i ) = b"2i , where E� (�) and V � (�) denote the expectation and varianceoperator with respect to the original sample. Let

Y �i = X0ib� (Ui) +W 0

ib� + "�i if �i = 1;

Y �i = 0 if �i = 0

denote the bootstrap response so that (Y �i ; Ui; Xi;Wi)ni=1 represent a bootstrap sample. Let

b�� = " nXi=1

�i�Wi � b�WX (Ui)

0Xi

�2#�1 nXi=1


�(Y �i �X 0

ib�XY � (Ui)) ;and b� (u) = b�XY � (u)� b�XW (u) b��denote the bootstrap estimators, and let

n1=2bv�j (u; s; �) = 1

n1=2

nXi=1

b"�jiI �Ui � u; �0 [X 0i;W

0i ]0 � s

�j = 1 or 3

denote the bootstrap marked empirical process, where

b"�ji = bY �ji �X 0ib�� (Ui) +W 0

ib��;

and

bY �1i = �iY�i + (1� �i)

�X 0ib�� (Ui) +Wi

b��bY �3i =

�ib� (Ui)Y �i +�1� �ib� (Ui)

��X 0ib�� (Ui) +Wi

b�� :Let CM�(WB)

j denote the WB version of (3:6) and let Pr � denote the bootstrap probability; the

following theorem shows that WB consistently estimates the distribution of (3:6) :

10

Theorem 3 Under A1(i), A2�A6(i), A7(i) and A8-A9 listed in the Appendix

supc2R+

��Pr � �CM�(WB)j � c

�� Pr (CMj � c)

�� p! 0 j = 1 or 3;

where CMj has the asymptotic distribution as given in (3:7) :

MB is similar to WB in that an auxiliary sequence of zero mean and unit variance random

variables independent from the original sample is used in the simulation. However as opposed

to WB the original model is not resampled, nor re-estimated. MB is considered for example by

Su & Wei (1991) in the context of speci�cation tests for generalized linear models and by Zhu

& Ng (2003) for partial linear models.

Let

n1=2b��j (u; s; �) = 1

n1=2

nXi=1

b�ji (u; s; �) �i j = 1 or 3 (3.8)

denote the empirical randomized in�uence function, where

b�1i (u; s; �) = �ib"ib!iI �Ui � u; �0 [X 0i;W

0i ]0 � s

�� bD (u; s; �) b��1�i �W 0

i � b�WX (Ui)0Xi

�b"i �bG (s; �; u)" 1

n

nXj=1

�jXjX0jKh (Uj � Ui)

#�1�iXib"iI (Ui � u) ;

bD (u; s; �) =1

n

nXi=1

��i (W

0i �X 0

ib�XW (Ui)) I �Ui � u; �0 [X 0;W 0]0 � s

��;

bG (s; �; u) =1

n

nXi=1

��iX

0iI��0 [X 0

i;W0i ]0 � s

�Kh (Ui � u)

�;

where b�XW (Ui) and b� (Ui) are de�ned in Section 2, b�3i (u; s; �) is de�ned similarly to b�1i (u; s; �)and f�ig

ni=1 is a random sample from the random variable � with the same characteristics as

those used in WB. Let

CM�(MB)j =

Z�

b��j (u; s; �)2 d bF� (u; s) d�denote the MB version of (3:6).

The following theorem shows that MB consistently estimates the distribution of (3:6) :

Theorem 4 Under A1(i), A2�A6(i), A7(i) and A8-A9 listed in the Appendix

supc2R+

��Pr � �CM�(MB)j � c

�� Pr (CMj � c)

�� p! 0 j = 1 or 3;

where CMj has the asymptotic distribution as given in (3:7) :

11

3.3 Power properties

We now investigate the power properties of the test statistics CMj. We �rst consider global

alternatives of the form

H1g : Pr (E (Y �X 0� (U)�W 0�jU;X;W ) 6= 0) > 0 8�; � (U) : (3.9)

Theorem 5 Under A1(i), A2�A6(i), A7(i), A8-A9 listed in the Appendix and under the alter-native global hypothesis (3:9)

CM1

n

d!Z�

E��E (Y jU;X;W )�X 0�y (U)�W 0�y

��

I�U � u; �0 [X 0;W 0]

0 � s�2

dF� (u; s) d� > 0;

CM3

n

d!Z�

E

��

� (U)

��E (Y jU;X;W )�X 0�y (U)�W 0�y

��

I�U � u; �0 [X 0;W 0]

0 � s�2

dF� (u; s) d� > 0;

where b� � �0 = �y + op (1), kb� (U)� �0 (U)k = �y (U) + op (1), and �y 6= 0, �y (U) 6= 0 a:s:.

Next we consider local alternatives of the form

H1l : E ("jU;X;W ) = (U;X;W )

n1=2a:s: (3.10)

for some known bounded real-valued function (�).

Theorem 6 Under A1(i), A2�A6(i), A7(i), A8-A10 listed in the Appendix and the alternativelocal hypothesis (3:10)

CMjd!Z�

[vj (u; s; �) + sj (u; v; �)]2 dF� (u; s) d� j = 1 or 3;

where

s1 (u; v; �) = E�� (U;X;W ) I

�U � u; �0 [X 0;W 0]

0 � s��

E�� (W 0 �X 0�XW (U)) I

�U � u; �0 [X 0;W 0]

0 � s��

��10 E f� (U;X;W ) (W �X�XW (U))�h (U;X;W )�X 0 �E �X2jU

��1E (�X (U;X;W ) jU)

io;

and s3 (u; v; �) is as s1 (u; v; �) with � replaced by �=� (U).

Theorems 5 and 6 show that the tests (3:6) are consistent and can detect local alternatives

converging to the null hypothesis at the fastest possible rate.

12

4 Estimation of the mean functional

We now consider the problem of estimating the population mean �0 (or any other linear square

integrable functional) of the response variable Y using the same i.i.d. incomplete sample

(Yi; Ui; Xi;Wi)ni=1 as that de�ned in Section 2 and the same three di¤erent imputed responsesbYji (j = 1; 2; 3) de�ned in (2:6) and (2:9). Let

b�j = 1

n

nXi=1

bYji j = 1; 2; 3

denote the resulting imputation estimators. Recall that b�2 - the inverse probability weight-ing estimator with estimated full propensity score (2:7)- enjoys the double robustness property,

whereas b�3 - the inverse probability weighting estimator with estimated marginal propensityscore (2:8)- does not (unless � (U;X;W ) = � (U) a:s:). It is also important to note that an

estimator similar to b�3 is also considered by Wang et al. (2004), who showed that under the as-sumption of a partially linear speci�cation, b�3 is in general more e¢ cient than the correspondingfully nonparametric one of Cheng (1994).

Theorem 7 Under A1(i), A2�A6(i), A7(i) and A8 listed in the Appendix

n1=2�b�j � �0� d! N

�0; �2j + var (X

0�0 (U) +W0�0)

�j = 1 or 3;

where

�2j = Eh�2 (U;X;W )� (U;X;W )

�0j (U) + �0j�

�10

�W � �0XW (U)0X

��2i;

and

01 (U) = [E (XjU)� E (�XjU)]0�E��X2jU

��1X + 1; (4.1)

03 (U) =

�E (XjU)� E (�XjU)

� (U)

�0 �E��X2jU

��1X +

1

� (U);

�01 = E [(1� �) (W 0 �X 0�0XW (U))] ; �03 = E

��1� �

� (U)

�(W 0 �X 0�0XW (U))

�:

Under A1(ii), A2-A3, A4(ii), A5, A6(ii), A7(ii) and A8 listed in the Appendix

n1=2 (b�2 � �0) d! N

�0;�2 (U;X;W )

� (U;X;W )+ var (X 0�0 (U) +W

0�0)

�: (4.2)

The results of Theorem 7 complement those obtained by Cheng (1994) andWang et al. (2004)

and show that the inverse probability weighting estimator with full nonparametric propensity

score b�2 (4:2) is asymptotically equivalent to the estimator proposed by Cheng (1994), which isknown to be semiparametric e¢ cient (Hahn 1998), whereas neither of the two other imputation

13

estimators achieve this bound. It is also interesting to investigate the e¢ ciency of the proposed

estimators under the assumption that a partially linear varying coe¢ cient speci�cation holds,

that is

E (Y jU;X;W ) = X 0�0 (U) +W0�0 a:s: (4.3)

The following theorem derives the semiparametric e¢ ciency bound under (4:3) and the assump-

tion of normality of the errors.

Theorem 8 Under A1(i), A2�A6(i), A7(i) and A8 listed in the Appendix, (4:3) and " �N (0; �2), the semiparametric e¢ ciency bound of Bickel et al. (1993) for the mean functional is

�2eff = �2EhE (XjU)0E

��X2jU

��1E (XjU)

i+ E

h� (U;X;W )

�W � �0XW (U)0X

�0��10 �

E�W � �0XW (U)0X

��2+ var (X 0�0 (U) +W

0�0) : (4.4)

The e¢ ciency bound (4:4) is an important generalization of that derived by Wang et al.

(2004). This result complements that of Muller (2009), who considered e¢ cient estimation

of the mean functional under MAR assuming a nonlinear regression structure. The following

proposition shows that none of the proposed estimators b�j (j = 1; 2; 3) achieve this bound, andthus they are not semiparametric e¢ cient under (4:3) and normality. Let �2j(ho) denote the

variances �2j of Theorem 7 under conditional homoskedasticity.

Proposition 4.1 Under the assumptions of Theorem 8

�2j(ho) � �2eff = V1j � V2j � 0 j = 1; 2; 3;

where the Vkjs (k = 1; 2) are given in the Appendix.

Proposition 4.1 shows that the di¤erence between the variances of the proposed estimators

with that of the e¢ cient one consists of two terms: V1j and V2j. The former represents the vari-

ance reduction that results from imposing the additional information (4:3), whereas the latter

represents the e¤ect of estimating the additional information. In practice the more precise the

estimation is (i.e. the smaller V2j), the larger the e¢ ciency loss of using any of the imputation

estimators is. An immediate consequence of Proposition 4.1 is that the e¢ ciency result of Wang

et al. (2004, Theorem 3.4) does not generalize to more complex semiparametric models with a

partial linear structure. It is also important to note that Proposition 4.1 is consistent with the

result of Muller (2009), who showed that imputation estimators do not achieve the semipara-

metric e¢ ciency bound under the additional assumption of a nonlinear regression structure for

a MAR mean functional: Indeed to achieve the bound Muller (2009) suggested an alternative

weighted estimator with weights based on (pro�le) empirical likelihood estimation of the ad-

ditional information. A similar approach could be used for the (4:3) speci�cation, but its full

asymptotic analysis is beyond the scope of this paper.

14

4.1 Average treatment e¤ect estimation

The results of Theorem 7 can be used in the context of causal inference models to obtain

an estimator of the average treatment e¤ect parameter under the so-called strong ignorability

condition (Rosenbaum & Rubin 1983). To be speci�c let � denote a binary indicator of treatment

and let Y (�) denote the potential outcome. Given a sample (Yi; Ui; Xi;Wi)ni=1 where

Yi = �iY(1)i + (1� �i)Y (0)i ,

is the realised outcome, the average treatment e¤ect parameter is

� 0 = E�Y (1) � Y (0)

�: (4.5)

The central problem of the treatment literature is that either Y (1)i or Y (0)i is observed but

never both. Thus without further restrictions, the average treatment e¤ect � 0 is not identi�ed

and hence cannot be consistently estimated. To solve the identi�cation problem, we assume the

following strong ignorability condition

Pr��jY (�); U;X;W

�= Pr (�jU;X;W ) := � (U;X;W ) > 0 a:s:; (4.6)

which is e¤ectively the MAR assumption in (2:7). Hahn (1998) and Hirano et al. (2003) consider

nonparametric estimation of (4:5); fully parametric estimation is considered for example by Little

& Rubin (2002).

We consider two estimators, one based on imputation and the other based on inverse prob-

ability weighting, and assume a varying coe¢ cient partially linear speci�cation, that is

b� 1 =1

n

nXi=1

h�iY

(1)i + (1� �i)

�X 0ib�(1) (Ui) +Wi

b�(1)�� (1� �i)Y (0)i � �i�X 0ib�(0) (Ui) +Wi

b�(0)�i ;b� 2 =

�ib� (Ui; Xi;Wi)

�X 0ib�(1) (Ui) +Wi

b�(1)�+ �1� �ib� (Ui; Xi;Wi)

��X 0ib�(0) (Ui) +Wi

b�(0)� : (4.7)The following theorem establishes the asymptotic distribution of the two average treatment

parameter estimators b� j j = 1; 2. Let�(1)21 = E

��(1)2 (U;X;W )� (U;X;W )

�(1)01 (U) + �

(1)0 �

(1)�10

�W � �(1)0XW (U)

0X��2�

;

�(0)21 = E

��(0)2 (U;X;W ) (1� � (U;X;W ))

�(0)01 (U) + �

(0)0 �

(1)�10

�W � �(0)0XW (U)

0X��2�

;

where (1)01 (U), �(1)0 are as in (4:1) and (0)01 (U), �

(0)0 are

(0)01 (u) = E (�XjU)0

�E�(1� �)X2jU

��1X + 1; �

(0)0 = E [� (W 0 �X 0�0XW (U))] ;

�(0)0 = E

�(1� � (U;X;W ))

�W � �(0)0XW (U)

0X�2�

:

15

Also let

�(1)22 = E

��(1)2 (U;X;W )� (U;X;W )

�(1)02 (U) + �

(1)02 �

(1)�10

�W � �(1)0XW (U)

0X��2�

;

�(0)22 = E

��(0)2 (U;X;W ) (1� � (U;X;W ))

�(0)02 (U) + �

(0)02 �

(1)�10

�W � �(0)0XW (U)

0X��2�

;

where

(1)02 (u) =

E (XjU)� (U;X;W )

0 �E��X2jU

��1X;

(0)02 (u) =

E (XjU)� (U;X;W )

0 �E�(1� �)X2jU

��1X:

Theorem 9 Under A1(ii), A2, A3, A4(ii), A5, A6(ii), A7(ii) and A8 listed in the Appendix

n1=2 (b� 1 � � 0) d! N�0; �

(1)21 + �

(0)21 + var

�X 0h�(1)0 (U)� �(0)0 (U)

i+W 0

h�(1)0 � �(0)0

i�� 0

�;

n1=2 (b� 2 � � 0) d! N�0; �

(1)22 + �

(0)22 + var

�X 0h�(1)0 (U)� �(0)0 (U)

i+W 0

h�(1)0 � �(0)0

i�� 0

�:

5 Monte Carlo evidence

In this section we use simulations to assess the �nite sample properties of the proposed estimators

and test statistics. We consider two models: a varying coe¢ cient and a partially linear varying

coe¢ cient that are given, respectively, by

Yi = Ui +X1i cos (2�Ui) +X2iU2i + X

22i + "i; (5.1)

and

Yi = X1i sin (2�Ui) +X2i sin (6�Ui) + :5W1i + 2W2i +W3i + (X1i +W2i)2 + "i; (5.2)

where U is uniformly distributed on [0; 1], Xji are independent standard normal, Wji (j = 1; 2)

are jointly normally distributed with mean 0, variance 1 and correlation 2/3, W3i is a Bernoulli

random variable taking the value 1 with probability 0.4, "i is a normal random variable with

mean 0 and variance 0.2 and = 0 under the null hypothesis of correct speci�cation. The two

missing probability mechanisms are:

�(1) (U = u;X = x;W = w) = 0:7 + 0:25 (jx1 � 1j+ jw2 � 1j+ ju� 1j)if jx1 � 1j+ jw2 � 1j+ ju� 1j � 1=5 or 0:9 otherwise�(2) (U = u;X = x;W = w) = 0:6 for all u; x; w;

which imply that the mean probability of missing is approximately 0.1 and 0.4, respectively.

16

We �rst compare the �nite sample performance of the two test statistics CMj (3:6) using

bootstrapped critical values; for WB the critical values are based on 500 replications, whereas

for MB they are based on 1000 replications. Note that the actual computation of the integral

over Sk+p in CM�(WB)j is based on the following expression

CMj =1

n2

nXi=1

nXk=1

nXl=1

b"ijb"kjI (Ui � Ul) I (Uk � Ul)� (5.3)ZS

I��0 [X 0

i;W0i ]0 � �0 [X 0

l ;W0l ]0�I��0 [X 0

k;W0k]0 � �0 [X 0

l ;W0l ]0�d�

=1

n2

nXi=1

nXk=1

nXl=1

b"ijb"kjI (Ui � Ul) I (Uk � Ul)Sikl;where the integral Sikl is proportional to the volume of a spherical wedge and can be computed

as

Sikl =

�� arccos �(Xi �Xl)

0 ; (Wi �Wl)0� �(Xk �Xl)

0 ; (Wk �Wl)0�0

kXi �Xlk kXk �Xlk kWi �Wlk kWk �Wlk

!�� k+p2�1

��k+p2+1�

and � (�) is the gamma function; see Escanciano (2006) for further details.One important practical aspect of the results of this paper concerns the choice of the band-

width, since as pointed out for example by Zhu & Ng (2003), no optimal bandwidth selection

theory is available in the context of testing. To investigate this problem we consider bandwidths

chosen by standard cross-validation (bcv) and then two bandwidths chosen using a �xed grid

based on 2bcv and 4bcv (that is twice and fourth times the bandwidth chosen by cross-validation).

For the three sample sizes considered the average bcv for model (5:1) is [0:084; 0:065; 0:059], and

for (5:2) is [0:058; 0:047; 0:042].

Tables 1a-1b report the �nite sample size of CM1 and CM3 corresponding to a nominal

level of 10 and 5 per cent for models (5:1) and (5:2) using 1000 replications and a second-order

Epanechnikov kernel for both the varying coe¢ cient parameter � (U) and the missing probability

mechanisms �(j) (U;X;W ) (j = 1; 2).

Tables 1a-1b

Both tables indicate that the tests are slightly undersized especially for n = 50 but they

approach the correct nominal size as n increases. Not surprisingly the degree of the size dis-

tortion is related to the percentage of missingness, which is higher for the second speci�cation

�(2) (U;X;W ). In both tables it appears that WB typically provides a more accurate approxi-

mation to the distribution of the test statistics. Finally we note that the bandwidth choice has

little e¤ect on the �nite sample size.

17

Figures 1 and 2 illustrate the �nite sample power properties of CM1 and CM3 for the nominal

size 0.05. The power is computed at 12 values of in the range = [0:7; 3:5] for (5:1), and in

the range = [0:4; 2:8] for (5:2) using 1000 replications for a sample size of n = 100 with the

�(2) (U;X;W ) speci�cation and the same cross-validated bandwidths bcv as those used in Tables

1a and 1b. Results for n = 50 and 400 are qualitatively similar to the ones presented here and

hence are omitted.

Figures1 and 2 approx. here

In each �gure the left and centre panel show the power of, respectively, the CM1 and CM3

statistics with WB (solid line) and MB (dashed line) approximation. In each cases it is evident

that the WB approximation yields a test statistic with higher power. The two right panels

compare the relative powers of CM1 and CM3 and show that for both models the power of CM3

is higher than that of CM1.

Next we consider the three imputed estimators b�j for the mean �0 and compare them with

the same estimator estimator proposed by Horvitz & Thompson (1952) (see also Hirano et al.

(2003)) that is given by

b�HIR = 1

n

nXi=1

Yi�i:b� (Ui; Xi;Wi);

which is asymptotically equivalent to b�2. We also consider the complete case estimator b�C =Pni=1 �iYi=

Pni=1 �i:

Tables 2a-b report the biases and standard deviations for the four estimators based on 5000

replications for sample sizes of n = 50, n = 100 and n = 400 for (5:1) and (5:2) respectively,

with bandwidth chosen using least squares cross-validation.

Tables 2a, 2b approx. here

Table 2a indicates that b�3 is characterized by the smallest �nite sample bias across both designsand both missing probability speci�cations; the bias of b�HIR is always in between that of b�1and those of b�2 and b�3. Note also that for n = 400 and �(1) (U;X;W ) the biases of b�2 and b�3are almost identical. Table 2a also indicates that all of the proposed estimators represent an

improvement in terms of bias over the complete case estimator b�C . Table 2b reports the standarddeviations of the �ve estimators. The theoretical values of the standard deviation of the response

variable without MAR (i.e. the true unobservable one) are obtained by simulations and are 1.215

and 2.741 for models (5:1) and (5:2), respectively. All of the four imputation estimators have a

standard deviation that is closer to that of the unobservable response compared to that of the

complete case estimator. Thus, bearing in mind that according to Proposition 4:1 none of the

proposed estimators is fully e¢ cient under (5:1) and (5:2), we see that the proposed imputation

methods provide some improvements on the precision of the resulting estimators. We also note

that among the four imputation estimators the one based on inverse probability weighting with

18

marginal propensity score (2:8) has an edge over the others (except for the case of n = 50 in

model (5:1) where the Hirano et al.�s (2003) estimator is closer to the standard deviation of the

unobservable response).

Taken together the results of Tables 1a-2b and Figures 1 and 2 indicate that the proposed

imputation estimators and test statistics are characterized by good �nite sample properties

which compare favourably with those based on competing estimators and test statistics. The

results seem to suggest that WB delivers test statistics that have slightly better �nite sample

size and power than those based on MB, and that the inverse probability weighted estimators

and test statistics based on the marginal propensity score (2:8) are characterized by better �nite

sample properties than those based on the complete case analogue. These results combined with

those of Section 4 suggest that, from a practical point of view, marginal propensity score inverse

probability weighting can be a useful estimation technique with MAR responses, in particular

when the dimension of the covariates is high so that the estimation of the full propensity score

is subjected to the curse of dimensionality.

6 Empirical application

In this section we illustrate the methods of this paper by considering the question of whether

the WTO can have negative e¤ects on the environment. This question has been at the centre of

a long standing debate between environmentalists and the trade agreement supporters; see for

example Copeland & Taylor (2004) for a review of various theories and some empirical evidence

between trade, growth and the environment.

Millimet & Tchernis (2009) and Bravo & Jacho-Chavez (2011) use country-level data from

Frankel & Rose (2005)3 to investigate the possible negative causal relationship between trade

and environment, by specifying the treatment variable as the GATT/WTO membership and

considering �ver di¤erent measures of environmental quality: Per capita dioxide (CO2) emis-

sions, the average annual deforestation rate from 1990-1996, energy depletion, rural access to

clean water and urban access to clean water. Both Millimet & Tchernis (2009) and Bravo &

Jacho-Chavez (2011) use inverse probability weighed estimators for the average treatment e¤ect

with propensity scores estimated, respectively, with parametric and non parametric methods.

We use the same data and the same variables as those used by Millimet & Tchernis (2009) and

Bravo & Jacho-Chavez (2011), but as opposed to these authors we use a partially linear varying

coe¢ cient speci�cation with the same three covariates (log-real per capita GDP (Log�rgdppc),a measure of the democratic structure of the government (Demo� Str) and land area (Land).We consider only four of the �ve environmental variables, namely the dioxide emissions (Co2pc),

the annual deforestation (Defor), rural access (Rural) and urban access (Urban). This choice

3Available at http://faculty.haas.berkeley.edu/arose

19

http://faculty.haas.berkeley.edu/arose

is suggested by some preliminary graphical analysis suggesting that for each of these four vari-

ables there is evidence of some nonlinear relationship with one of the covariates, as illustrated

in Figure 3.

Figure 3 approx. here

We use the results of this paper in two di¤erent ways: �rst to test whether a given partially

linear varying coe¢ cient speci�cation is supported by the data; second to estimate the average

treatment e¤ect parameter using the corresponding speci�cation. In the estimation we use the

same second-order Epanechnikov kernel with cross-validated bandwidth as that used in the

previous section. For the speci�cation tests we consider the CM1 statistic and use bootstrap

p-values (p�) obtained from 500 replications of CM�(WB)1 .

The four speci�cations are

Co2pc(1)i = 0:0302Landi

p=0:145

(1) +Demo� Str(1)i b�(1) (Log � rgdppci) ; (6.1)

R2 = 0:82; p� = 0:562;

Co2pc(0)i = �0:0232Landi

p=0:138

(0) +Demo� Str(0)i b�(0) (Log � rgdppci) ;R2 = 0:11; p� = 0:881;

Defor(1)i = �0:003Land(1)i

p=0:021

+ 0:056Demo� Str(0)ip=0:001

+ b�(1) (Log � rgdppci) ;R2 = 0:19; p� = 0:384;

Defor(0)i = �0:0004Land(1)i

p=0:321

+ 0:003Demo� Str(0)ip=0:451

+ b�(0) (Log � rgdppci) ;R2 = 0:069; p� = 0:185;

Rural(1)i = 0:0036Log � rgdppc(1)i

p=0:000

+ Land(1)i b�(1) (Demo� Stri) ;

R2 = 0:42; p� = 0:673;

Rural(0)i = �0:0006Log � rgdppc(0)i

p=0:122

+ Land(0)i b�(0) (Demo� Stri) ;

R2 = 0:32; p� = 0:432;

Urban(1)i = 0:0026Log � rgdppc(1)i

p=0:000

+Demo� Stri(1)b�(1) (Landi) ;R2 = 0:323; p� = 0:303;

Urban(0)i = �0:0007Log � rgdppc(1)i

p=0:165

+Demo� Stri(1)b�(1) (Landi) ;R2 = 0:123; p� = 0:405;

that is we have three partially linear varying coe¢ cient models and one partially linear model

(for the deforestation variable). The latter is chosen over a partially linear varying coe¢ cient

because of a higher R2 and lower residual standard error.

20

Table 3 reports the point estimates of the treatment variable GATT/WTO membership,

together with the 0.95 con�dence interval (C.I.) and the p-value (p) of the associated t-statistic

for the null hypothesis of no treatment e¤ect H0 : � = 0, using two estimators: b� 1 as de�nedin (4:7) with the speci�cations of (6:1), and the inverse probability weighting method of Hirano

et al. (2003). Note that as in Bravo & Jacho-Chavez (2011) we exclude observations in the

averages with an estimated propensity score outside the interval [0.05,0.95] in both sets of

estimators.

Table 3 here

The results of Table 3 seem to suggest that the WTO might have negative e¤ect on the

environment once we take into consideration the three covariates. Interestingly the inverse

probability weighted estimator of Hirano et al. (2003) suggests that the environmental e¤ects

are statistically insigni�cant, and the explanation for this seems to stem from the large variability

associated with this estimator, as the 0.95 con�dence intervals clearly indicate. To assess the

sensitivity of the results of Table 3 we have considered a di¤erent choice of bandwidth, namely

that based on Silverman�s rule of thumb, and we have also estimated the average treatment

e¤ect using the doubly-robust estimator b� 2. In both cases the point estimates of the treatmentparameters varied, but crucially all of them remained statistically signi�cant.

7 Conclusions

In this paper we consider partially linear varying coe¢ cient models with responses missing at

random. We consider imputation and inverse probability weighting and propose an omnibus

test for the correct speci�cation based on a Cramer von Mises type of statistic. We also consider

the problem of estimating the mean of a response variable assumed to be related to a set of

covariates via a partially linear varying coe¢ cient speci�cation. As a by-product of this estimator

we propose a novel estimator for the average treatment e¤ect parameter that is key for causal

inference models. We investigate the �nite sample properties of the proposed estimators and test

statistics with simulations. The results of the simulations are encouraging and suggest that all

of the proposed estimators and test statistics and especially those based on inverse probability

weighting with a marginal propensity score are characterized by good �nite sample properties

that compare favourably with those of existing alternatives. We also apply the results of this

paper to investigate whether the WTO can have negative e¤ects on the environment, and suggest

that this might be the case.

21

Appendix

Throughout this appendix we use the following abbreviations: "CLT", "CMT" and "LNN"

denote, respectively, central limit theorem, continuous mapping theorem and (possibly uniform

in the sets U orZ de�ned below) law of large numbers. Let Z = [U;X 0;W 0]0.

Assumptions

A1 (i) The random variable U has bounded support U and its density fU (�) is Lipschitz con-tinuous and bounded away from 0 in U or (ii) the random vector Z has a compact set

support Z � Rl+k+1 and its density fZ (�) is Lipschitz continuous and bounded away from0 in Z,

A2 The p � p matrix E (�XX 0jU) is nonsingular for each U , and E (�WX 0jU), E (�XX 0jU)have Lipschitz continuous second derivatives in U 2 U , and E (�XX 0jU)�1 is Lipschitzcontinuous,

A3 E�kXk4

�<1, E

�kWk4

�<1, E ("4) <1,

A4 Eh� (Z)

�W � �WX (U)

0X�2i

is positive de�nite,

A5 The functions �j (U) have Lipschitz continuous second derivatives in U 2 U ;A6 (i) infu2U � (U) > 0, the function � (U) has Lipschitz continuous second derivatives in U 2 U

or (ii) infz2Z � (Z) > 0, the function � (Z) has Lipschitz continuous second derivatives in

Z 2 Z,A7 As n!1 (i) n1=2h4 ! 0 and nh3= ln (n)!1 or (ii) n1=2b2l ! 0 and nhk+p+3= ln (n)!1

for l > k + p+ 1,

A8 The kernel functions K (�) and L (�) are symmetric densities with compact support,A9 The product measure Fn;� (u; s) d� is absolutely continuous with respect to the Lebesgue

measure in �;

A10 The function E [ (Z) jU ] is Lipschitz continuous and E [ (Z)] = 0:

Proofs of the theorems

Proof of Theorem 1. Let �0XW (u) = [E (�X2jU = u)]�1E [�XW 0jU = u] and �0XY (u) =[E (�X2jU = u)]�1E [�XY jU = u]; by results of Fan & Huang (2005) and CMT

kb�XY (u)� �0XY (u)k = op (1) ;

kb�XW (u)� �0XW (u)k = op (1)

uniformly in U , so that by LLN and CMT (

nXi=1

��i�Wi � �0XW (Ui)0Xi

��2=n

)�1� ��10

= op (1) ;22

hence

n1=2�b� � �0� = ��10 1

n1=2

nXi=1

�i�Wi � �0XW (Ui)0Xi

�"i + op (1) ; (7.1)

and the result follows by CLT and CMT. Let Q denote the semiparametric model and let Q�denote the parametric submodel for the log-density of (Y; �; U;X;W ) that is

ln f� (Y; �; U;X;W ) = C � �2ln�2 � �

2�2(Y �X 0�� (U)�W 0�)

2+ � ln � (Z; �1) +

(1� �) ln [1� � (Z; �1)] + lnh (Z; �2) ;

where h (�) is the density of the covariates Z and � = [�0�; �0; �2; �01; �

02]0 is the �nite di-

mensional parameter. The score functions are S� = �W"��2, S� = �@��=@�X"��2, S� =

� ("2 � �2)��3, S�1 = (� � ��1) @��1=@�1��1�1 (1� ��1)�1 where ��1 := ��1 (Z; �1), and S�2 =

h (Z; �2)�1 @h (Z; �2) =@�2. The tangent space (Bickel et al. 1993, p. 93) is

T� =��W"; �@��=@�X"; �

�"2 � �2

��3; (� � ��1) a (Z) ; b (Z)

�(7.2)

where a (�) is such that E ka (Z)k2 <1 and b (Z) is such that E [b (Z)] = 0. Then the e¢ cient

score function S�� for �0 is the projection -Proj(j)- of S� into the orthogonal complement of thedirect sum S� + S� + S�1 + S�2 and since S�, S�, S�1, S�2 are orthogonal to each other and

S� and S� are orthogonal to (� � ��1) a (Z), b (Z) and E [S�S�] = 0 by normality, it follows

that S�� = S��Proj(S�jS�). Simple algebra shows that such projection amounts to �nding avector of the form �@��=@�X such that E kS� � S�k2 is minimised, and that such vector isE (�WX 0jU)E (�X2jU)�1 �X"=�2.Proof of Theorem 2. Note that

n1=2bv1 (u; s; �) = 1

n1=2

nXi=1

�i"iI�Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n1=2

nXi=1

�i [X0i (b�XY (Ui)� �0XY (Ui)) +X 0

i (b�XW (Ui)� �0XW (Ui)) �0]�I�Ui � u; �0 [X 0

i;W0i ]0 � s

��

1

n1=2

nXi=1

�i (W0i �X 0

i�XW )�b� � �0� I �Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n1=2

nXi=1

�iX0i (b�XW (Ui)� �0XW (Ui))�b� � �0��

I�Ui � u; �0 [X 0

i;W0i ]0 � s

�=:

4Xj=1

T1j:

23

By standard kernel calculations

T12 =1

n1=2

nXi=1

�iX0i

8<:"

nXl=1

�iX2i Kh (Ul � Ui)

#�1 nXj=1

�jXj

�Yj �X 0

j�0 (Uj)�

W 0i�0]Kh (Uj � Ui)g

=1

n1=2

nXi=1

[E (XjUi)� E (�XjUi)]0�E��X2jUi

��1�iXi"i + op (1) ;

furthermore by LLN

T13 =1

n1=2

nXi=1

�iX0iI�Ui � u; �0 [X 0

i;W0i ]0 � s

� �E��X2jUi

�f (Ui)

��1 �1

n

nXj=1

�jXj"jKh (Uj � Ui) + op (1)

=1

n1=2

nXi=1

G (s; �; Ui)�E��X2jUi

��1�iXi"i + op (1) ;

where G (s; �; u) = E��X 0I

��0 [X 0;W 0]0 � s

�jU = u

�. By LLN and as in the proof of Theorem

1

jT14j � supu2U

kb�XW (u)� �0XW (u)k 1n

nXi=1

XiI�Ui � u; �0 [X 0

i;W0i ]0 � s

� �n1=2

b� � �0 = op (1) ;and T32 �D (u; s; �)n1=2 �b� � �0� = op (1) ;where

D (u; s; �) = E�� (W 0 �X 0�0XY )!I

�U � u; �0 [X 0;W 0]

0 � s��

since the class of function f(r; q; t)! (r � d (q) t)! (r; q) I (r � u; �0t � s) ; u; s; � 2 �g is Vapnik-Chervonenkis. Thus

n1=2bv1 (u; s; �) =1

n1=2

nXi=1

�i"iI�Ui � u; �0 [X 0

i;W0i ]0 � s

�� (7.3)

D (u; s; �) ��101

n1=2

nXi=1

�i�Wi � �0WX (Ui)

0Xi

�"i �

1

n1=2

nXi=1

G (s; �; Ui)�E��X2jUi

��1�iXi"iI (Ui � u) + op (1) :

24

The �nite dimensional convergence of (7:3) follows by CLT, whereas the asymptotic equiconti-

nuity follows by a direct application of Theorem 2.5.2 of Van der Vaart & Wellner (1996), which

implies the weak convergence of (7:3) to v1 (u; s; �) :

For n1=2bv3 (u; s; �), (7:8), CMT and the same arguments as those used to obtain (7:3) can beused to show

n1=2bv3 (u; s; �) =1

n1=2

nXi=1

�i"i� (Ui)

I�Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n1=2

nXi=1

�i� (Ui)

(W 0i �X 0

i�XY )�b� � �0� I �Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n1=2

nXi=1

G (s; �; Ui)

� (Ui)

�E��X2jUi

��1�iXi"iI (Ui � u) + op (1) ;

and the rest of the proof follows by the same arguments as those used to prove the weak

convergence of n1=2bv1 (u; s; �).Proof of Theorem 3. Note that as in the proof of Theorem 1 by the bootstrap LLN (see e.g.

Bickel & Freedman (1981))

n1=2�b�� b�� = ��10 1

n1=2

nXi=1

�i�Wi � �XW (Ui)0Xi

�"�i + op� (1) ;

and

n1=2bv�1 (u; s; �) =1

n1=2

nXi=1

�i"�i I�Ui � u; �0 [X 0

i;W0i ]0 � s

�� (7.4)

bD (u; s; �) ��10 1

n1=2

nXi=1

�i�Wi � �WX (Ui)

0Xi

�"�i �

1

n1=2

nXi=1

bG (s; �; Ui)" 1n

nXj=1

�jX2j Kh (Uj � Ui)

#�1�

�iXi"�i I (Ui � u) + op� (1) ;

except if the original incomplete sample (Yi; Ui; Xi;Wi)ni=1 is on a set with probability converging

25

to 0 as n!1. As in Stute et al. (1998)

Cov��n1=2bv�1 (u1; s1; �) ; n1=2bv�1 (u2; s2; �)� = 1

n

nXi=1

�2ib"2i I �Ui � u1 ^ u2; �0 [X 0i;W

0i ]0 � s1 ^ s2

��

bD (u1; s1; �) b��1 1n

nXi=1

�Wi � �0XW (Ui)0Xi

�2�2ib"2i I �Ui � u2; �0 [X 0

i;W0i ]0 � s2

��

bD (u2; s2; �) b��1 1n

nXi=1


�2�2ib"2i I �Ui � u1; �0 [X 0

i;W0i ]0 � s1

��

1

n

nXi=1

bG (s1; �; Ui)" 1n

nXj=1


#�1Xi�

2ib"2i I �Ui � u2; �0 [X 0

i;W0i ]0 � s2

��

1

n

nXi=1

bG (s2; �; Ui)" 1n

nXj=1


#�1Xi�

2ib"2i I �Ui � u1; �0 [X 0

i;W0i ]0 � s1

�+

bD (u1; s1; �) ��10 1

n

nXi=1


�2�2ib"2i��10 bD (u2; s2; �)0 �

bD (u1; s1; �) ��10 1

n

nXi=1


��2ib"2iX 0

i

"1

n

nXj=1


#�1�

bG (s; �; Ui)0 I (Ui � u2)� bD (u2; s2; �) ��10 1

n

nXi=1


��2ib"2iX 0

i �"1

n

nXj=1


#�1 bG (s; �; Ui)0 I (Ui � u1) +1

n

nXi=1

bG (s1; �; Ui)" 1n

nXj=1


#�1XiX

0i

"1

n

nXj=1


#�1�

bG (s1; �; Ui)0 �2ib"2i I (Ui � u1 ^ u2)and by LLN��Cov� �n1=2bv�1 (u1; s1; �) ; n1=2bv�1 (u2; s2; �)�� Cov �n1=2bv1 (u1; s1; �) ; n1=2bv1 (u2; s2; �)�� = op (1) :Similarly for n1=2bv�3 (u; s; �)��Cov� �n1=2bv�3 (u1; s1; �) ; n1=2bv�3 (u2; s2; �)�� Cov �n1=2bv3 (u1; s1; �) ; n1=2bv3 (u2; s2; �)�� = op (1) :By LLN

n1=2bv�j (u; s; �) = nXi=1

�i�i"ji=n1=2 + op (1) j = 1 or 3;

26

where

�i = I�Ui � u; �0 [X 0

i;W0i ]0 � s

��D (u; s; �) ��10 �i

�Wi � �WX (Ui)

0Xi

��

G (s; �; Ui)E��X2jUi

�XiI (Ui � u) ;

hence, given the conditional independence of "�i from �i, the �nite dimensional convergence of

n1=2bv�j (u; s; �) follows by Lindeberg�s CLT since by A3 and LLN for each d > 0lim sup

n

1

n

nXi=1

Zj�i"�jij�dn1=2

�i��i"

�ji

�2dPr � � lim sup

n

1

n

nXi=1

�i (�i"ji)2 I (j�i"ji � Cj)! 0

as C ! 1. Finally the asymptotic equicontinuity follows by the same argument as that usedin the proof of Theorem 2 hence under H0

n1=2bv�j (u; s; �) =) vj (u; s; �) in l1 (�) j = 1 or 3

in probability. The conclusion follows by CMT.

Proof of Theorem 4. Note that by the de�nition of n1=2b��j (u; s; �) it follows that��Cov� �n1=2b��j (u1; s1; �) ; n1=2b��j (u2; s2; �)�� Cov �n1=2b�j (u1; s1; �) ; n1=2b�j (u2; s2; �)�� = op (1) ;where b�ji (u; s; �) are the sample analogues of �ji (u; s; �) given in (3:5), and the same argumentsas those used in the proof of Theorem 4 show the �nite dimensional convergence and asymptotic

equicontinuity of n1=2b��j (u; s; �). Thus the conclusion follows by CMT.Proof of Theorem 5. Let �y := p lim

�b�� and �y (U) := p lim (b� (U)). As in the proof ofTheorem 2 some calculations show that

bv1 (u; s; �) =1

n

nXi=1

�i"iI�Ui � u; �0 [X 0

i;W0i ]0 � s

�+1

n

nXi=1

�iW0i�yI�Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n

nXi=1

�iX0i�y (Ui) I

�Ui � u; �0 [X 0

i;W0i ]0 � s

�+

1

n

nXi=1

�iE (YijUi; Xi;Wi) I�Ui � u; �0 [X 0

i;W0i ]0 � s

�+ op (1) ;

and the result follows by LLN and CMT. The result for n1=2bv3 (u; s; �) follows similarly hence isomitted.

Proof of Theorem 6. Note that under (3:10)

Yi �X 0i�0XY (Ui)� (W 0

i �X 0i�0XW (Ui)) �0 = "i +

1

n1=2f (Zi)�

X 0i

�E��iX

2i jUi

��1E (�iXi (Zi) jUi)

o:

27

Similarly to (7:1) by LLN and iterated expectations

n1=2�b� � �0� = ��10

1

n1=2

nXi=1


0Xi

�"i +

��10 E�� (Z)

�W 0 � �0XW (U)0X

�[ (Z)�

X 0 �E ��X2jU��1

E (�X (Z) jU)i+ op (1)

= ��101

n1=2

nXi=1


0Xi

�"i + S (Z) + op (1) :

By the same arguments as those used in the proof of Theorem 2 expanding n1=2bv1 (u; s; �) underthe local alternative hypothesis (3:10), say n1=2bv(l)1 (u; s; �), we haven1=2bv(l)1 (u; s; �) = n1=2bv1 (u; s; �) + 1

n

nXi=1

�i (Zi) I�U � u; �0 [X 0;W 0]

0 � s��

1

n

nXi=1

�i (W0i �X 0

i�0XY )S (U;X;W ) I�Ui � u; �0 [X 0

i;W0i ]0 � s

�+ op (1) ;

and by LLN

n1=2bv(l)1 (u; s; �) = n1=2bv1n (u; s; �) + s1 (u; v; �) + op (1) ;where

s1 (u; v; �) = E�� (Z) I

�U � u; �0 [X 0;W 0]

0 � s��

E�� (W 0 �X 0�XW (U)) I

�U � u; �0 [X 0;W 0]

0 � s��

��10 E f� (Z) (W �X�XW (U))�h (Z)�X 0 �E �X2jU

��1E (�X (Z) jU)

io:

The result follows using the same arguments as those used in the proof of Theorem 2 and CMT.

The same arguments can be applied to n1=2bv(l)3 (u; s; �) to obtain the second conclusion.Proof of Theorem 7. Note that

n1=2 (b�1 � �0) =1

n1=2

nXi=1

(X 0i�0 (Ui) +W

0i�0 � �0) +

1

n1=2

nXi=1

�i"i + (7.5)

1

n1=2

nXi=1

(1� �i)hX 0i (b� (Ui)� �0 (Ui)) +W 0

i

�b� � �0�i ;and that

X 0i (b� (Ui)� �0 (Ui)) = X 0

ib�XX (Ui)�1"

nXj=1

Xj�j�Yj �X 0

j (�0XY (Ui)� �0XW (Ui) �0)�

W 0j�0��Kh (Uj � Ui) +X 0

i�0XW

�b� � �0� ; (7.6)

28

where b�XX (u) =Pni=1 �iX

2i Kh (Ui � u) =n, hence by (7:6), LLN and standard kernel calcula-

tions

1

n1=2

nXi=1

(1� �i)hX 0i (b� (Ui)� �0 (Ui)) +W 0

i

�b� � �0�i = (7.7)

1

n1=2

nXi=1

[E (XjUi)� E (�XjUi)]0�E��X2jUi

��1�iXi"i +

E [(1� �) (W 0 �X 0�0XW (U))]n1=2�b� � �0�+ op (1) :

Therefore by (7:1)

n1=2 (b�1 � �0) =1

n1=2

nXi=1

(X 0i�0 (Ui) +Wi�0 � �0) +

1

n1=2

nXi=1

01 (Ui) �i"i +

�01��10

1

n1=2

nXi=1

�i�Wi � �0XW (Ui)0Xi

�"i + op (1) ;

and the result follows by CLT and CMT. For the second result note that for v either u or z and

V either U or Zsupv2V

�� (v)b� (v) � 1�� = op (1) (7.8)

by Masry (1996), and

n1=2 (b�3 � �0) =1

n1=2

nXi=1

(X 0i�0 (Ui) +Wi�0 � �0) +

1

n1=2

nXi=1

�i"i� (Ui)

+ (7.9)

1

n1=2

nXi=1

�1� �i

� (Ui)

�hX 0i (b� (Ui)� �0 (Ui)) +W 0

i

�b� � �0�i+1

n1=2

nXi=1

�b� (Ui)� � (Ui)� (Ui)

��i"i + op (1) ;

and1

n1=2

nXj=1

(�j � � (Uj))1

n

nXi=1

�j"i� (Ui) fU (Ui)

Kh (Uj � Ui) + op (1) = op (1) ;

hence the result follows by CLT and CMT. The last result follows in a similar manner using

(7:8) and (7:9) with � (Ui) replaced by � (Zi), noting that E [E (XjU)� E (�XjU) =� (Z)] = 0:

Proof of Theorem 8. To obtain the bound we calculate the e¢ cient in�uence function for

an estimator of �0. To do so we follow, as in the proof of Theorem 1, the approach of Bickel

et al. (1993) and show that the parameter of interest is pathwise di¤erentiable. The e¢ cient

in�uence function is then the projection of this derivative on the tangent space. Let

� =

ZY f" (Y �X 0�� (U)�W 0�jZ) f (Z; �) dY dZ

29

denote parametric (marginal) submodel for the parameter of interest and note that by the

implicit function theorem and the normality assumption

@�=@� =

��E (W )0 ;�E (@�� (U) =@�)0 ; 0; E

�(X 0�� (U) +W

0�)@f (Z; �)

@v

1

f (Z; �)

�0�:

To show the pathwise di¤erentiability of � we require an F� (Y; U;X;W ) such that @�=@�j�=�0 =E [F� (Y; U;X;W ) s�] j�=�0 , where s� is the score function generating the tangent space T� de-�ned in (7:2). It can be veri�ed that the function F� (Y; U;X;W ) = �"=� (Z) � X 0�� (U) �W 0� � � satis�es this condition, and thus � is pathwise di¤erentiable. Next we show that thefunction

S�� = E (XjU)0�E��X2jU

��1�X"+ E

�W � �0WX (U)

0X�0��10

�W � �0XW (U)0X

��"+

(W + �0 (U)0X � �) (7.10)

lies in T�, and is the required projection. Note that by the e¢ ciency result of Theorem 1 T� is

equivalent to T 0� =�C��W � �0XW (U)0X

�"; �g (U)X"; :::

�where g (�) is an arbitrary function

of U and the other terms are as those given in (7:2). Then with C = E�W � �0XW (U)0X

�0��10 ,

g (U) = E (XjU)0 [E (�X2jU)]�1 � C �0XW (U), a zero for S� and Sv1 , and the last summandin (7:10) as b (Z), it follows that S�� lies T

0�. To verify that S

�� is the required projection we show

that

E��F� (Y; U;X;W )� S��

�t0��= 0 8t0� 2 T 0�: (7.11)

Note that

F� (Y; U;X;W )� S�� =�"

� (Z)�hE (XjU)0

�E��X2jU

��1X + E

�W � �0XW (U)0X

�0��10 ��

W � �0XW (U)0X��";

and that

E

"�C"2

�W � �0XW (U)0X

�� (Z)

� E (XjU)0�E��X2jU

��1X�W � �0XW (U)0X

�0C 0�"2�

E�W � �0XW (U)0X

�0��10

�W � �0XW (U)0X

�2C 0�"2 +

�g (U)X"2

� (Z)�

E (XjU)0�E��X2jU

��1XX 0g (U)0 �"2 �

E�W � �0XW (U)0X

�0��10

�W � �0XW (U)0X

�X 0g (U)0 �"2

i= 0

30

since

�2E

"�C�W � �0XW (U)0X

�� (Z)

� E�W � �0XW (U)0X

�0��10

�W � �0XW (U)0X

�2C 0�

#= 0;

�2EhE (XjU)0

�E��X2jU

��1X�W � �0XW (U)0X

�0C 0�i= 0;

�2E

��g (U)X"2

� (Z)� E (XjU)0

�E��X2jU

��1Xg (U)X�

�=

�2EhE (XjU)0

n�E��X2jU

��1E (XjU)� �0XW (U)0C 0

o�

E (XjU)0n�E��X2jU

��1E (XjU)� �0XW (U)0C 0

oi= 0;

�2EhE�W � �0XW (U)0X

�0��10

�W � �0XW (U)0X

�X 0g (U)0 �

i= 0

by the MAR assumption and iterated expectations. It is easy to show the orthogonality of

F� (Y; U;X;W ) � S�� with respect to the other components of T 0�, which implies that (7:11) isveri�ed so that S�� is the e¢ cient score and the conclusion follows immediately.

Proof of Proposition 4.1. We �rst show the result for �22(ho). Note that

�22(ho) � �2eff = �2E

��

� (Z)� E (XjU)0E

��X2jU

��1E (XjU)� (7.12)

��W � �0XW (U)0X

�0��10 E

�W � �0XW (U)0X

��;

and

Cov

��"

� (Z)� E (XjU)0E

��X2jU

��1X�";

�W � �0XW (U)0X

��"

�= �2E

�W � �0XW (U)0X

�;

so that (7:12) is equivalent to

V ar

��

� (Z)� E (XjU)0E

��X2jU

��1E (XjU)

�� 2E

��W � �0XW (U)0X

��0��10 �

E�W � �0XW (U)0X

�:= V12 � V22 � 0:

Similarly for �21(ho) we have

�21(ho) � �2eff = �2E�� E (XjU)0E

��X2jU

��1E (XjU)� (7.13)

��W � �0XW (U)0X

�0��10 E

�W � �0XW (U)0X

��;

and

Cov��"� E (XjU)0E

��X2jU

��1X�";

�W � �0XW (U)0X

��"�= �2E

��W � �0XW (U)0X

��

31

so that (7:13) is equivalent to

V ar�� E (XjU)0E

��X2jU

��1E (XjU)

�� 2E

��W � �0XW (U)0X

��0 ��1�0��

E��W � �0XW (U)0X

��:= V11 � V21 � 0;

where ��0 = Eh��W � ��0XW (U)0X

�2i. Finally for �23(ho)

�23(ho) � �2eff = �2E

��

� (U)� E (XjU)

� (U)

0E��X2jU

��1E (XjU)�

�

� (U)

�W � �0XW (U)0X

�0��10 E

�W � �0XW (U)0X

��;

and by the same argument as the one used for (7:12) and (7:13) we have that

�23(ho) � �2eff = V ar

��

� (U)� E (XjU)

� (U)

0E��X2jU

��1E (XjU)

��

�2E

��

� (U)2�W � �0XW (U)0X

��0 ��1�20

�E

��

� (U)2�W � �0XW (U)0X

��: = V13 � V23 � 0;

where ��20 = Eh��W � �0XW (U)0X

�2=�2 (U)

i.

Proof of Theorem 9. Similar to the proof of Theorem 7

n1=2 (b� 1 � � 0) =1

n1=2

nXi=1

�X 0i

h�(1)0 (Ui)� �(0)0 (Ui)

i+W 0

i

h�(1)0 � �(0)0

i� � 0

�+

1

n1=2

nXi=1

n�i"

(1)i + (1� �i)

hX 0i

�b�(1) (Ui)� �(1)0 (Ui)�+W 0

i

�b�(1) � �(1)0 �io�1

n1=2

nXi=1

n(1� �i) "(0)i � �i

hX 0i

�b�(0) (Ui)� �(0)0 (Ui)�+W 0

i

�b�(0) � �(0)0 �io: =

3Xj=1

T2j;

and as in (7:7)

T22 =1

n1=2

nXi=1

h(1)01 (Ui) + �

(1)01 �

(1)�10

�Wi � �(1)0XW (Ui)

0Xi

�i�i"

(1)i + op (1) ;

T23 =1

n1=2

nXi=1

h(0)01 (Ui) + �

(0)01 �

(1)�10

�Wi � �(0)0XW (Ui)

0Xi

�i(1� �i) "(0)i + op (1) ;

and the �rst conclusion follows by CLT noting that Cov (T21; T2j) = 0 for j = 2; 3 and

32

Cov (T22; T23) = 0. For the second result note that

n1=2 (b� 2 � � 0) =1

n1=2

nXi=1

hX 0i

��(1)0 (Ui)� �(0)0 (Ui)

�+W 0

i

��(1)0 � �(0)0

�� 0

i+

1

n1=2

nXi=1

8><>:�i

�X 0ib�(1) (Ui) +W 0

ib�(1)�b� (Zi) �

�X 0i�(1)0 (Ui) +W

0i�(1)0

�9>=>;+1

n1=2

nXi=1

8><>:(1� �i)

�X 0ib�(0) (Ui) +W 0

ib�(0)�

1� b� (Zi) ��X 0i�(0)0 (Ui) +W

0i�(0)0

�9>=>;: =

3Xj=1

T3j:

The second term can be written as

T32 =1

n1=2

nXi=1

��i

� (Zi)� 1��

X 0i�(0)0 (Ui) +W

0i�(0)0

�+

1

n1=2

nXi=1

��ib� (Zi) � �i

� (Zi)

��h

X 0i

�b�(1) (Ui)� �(1)0 (Ui)�+W 0

i

�b�(1) � �(1)0 �i+1

n1=2

nXi=1

�i

hX 0i

�b�(1) (Ui)� �(1)0 (Ui)�+W 0

i

�b�(1) � �(1)0 �i� (Zi)

+

1

n1=2

nXi=1

��ib� (Zi) � �i

� (Zi)

��X 0i�(1)0 (Ui) +W

0i�(1)0

�:=

4Xj=1

T32j;

and note that by (7:8), (7:6) and standard kernel calculations

T322 =1

n3=2

nXi=1

�iPn

j=1 [� (Zi)� �j]Kh (Zj � Zi)� (Zi)

2 f (Zi)�h

E (XjUi)0�E��X2jUi

��1�iXi"

(1)i + (W 0

i +X0i�0XW (Ui))

�b�(1) � �(1)0 �i+ op (1) :By iterated expectation E (T322) = 0 and

E jT322j2 � 2

n3h2(k+p+1)

0B@ nXi=1

nXj=1

E

8<:�2i

hE (XjUi)0 [E (�X2jUi)]�1 �iXi"

(1)i

i� (Zi)

2 f (Zi)

9=;2

+

1

n

nXi=1

nXj=1

E

8><>:(W 0

i +X0i�0XW (Ui))

n1=2 �b�(1) � �(1)0 � � (Zi)

2 f (Zi)

9>=>;1CA2

K2 (Zj � Zi)

=C

nhk+p+1+ o (1) ;

33

and hence jT322j = op (1) : Similarly for T324

T324 =1

n3=2

nXi=1

�i

�X 0i�(1)0 (Ui) +W

0i�(1)0

�Pnj=1 [� (Zi)� �j]Kh (Zj � Zi)

� (Zi)2 f (Zi)

+ o()p (1)

=1

n

nXi=1

�i

�X 0i�(1)0 (Ui) +W

0i�(1)0

�Kh (Zj � Zi)

� (Zi) f (Zi)

1

n1=2

nXj=1

� (Zi)� �i� (Zi)

= �T321 + op (1) ;

hence

T32 =1

n1=2

nXi=1

�i

hX 0i

�b�(1) (Ui)� �(1)0 (Ui)�+W 0

i

�b�(1) � �(1)0 �i� (Zi)

+ op (1) :

The same arguments show that

T33n =1

n1=2

nXi=1

(1� �i)hX 0i

�b�(0) (Ui)� �(0)0 (Ui)�+W 0

i

�b�(0) � �(0)0 �i1� � (Zi)

+ op (1) ;

hence the second conclusion follows using (7:9) and CLT.

References

Ahmad, I., Leelahanon, S. & Li, Q. (2005), �E¢ cient estimation of a semiparametric partially

linear varying coe¢ cient model�, Annals of Statistics 33, 258�283.

Bang, H. & Robins, J. (2005), �Doubly robust esimation in missing data and casual inference

models�, Biometrics 61, 962�972.

Bickel, P. & Freedman, D. (1981), �Some asymptotic theory for the bootstrap�, Annals of Sta-

tistics 9, 1196�1297.

Bickel, P., Klaassen, C., Ritov, Y. & Wellner, J. (1993), E¢ cient and Adaptive Estimation for

Semiparametric Models, John Hopkins University.

Bierens, H. (1982), �Consistent model speci�cation tests�, Journal of Econometrics 20, 105�134.

Bravo, F. (2012), �Generalized empirical likelihood testing in semiparametric conditional mo-

ment restriction models�, Econometrics Journal 15, 1�31.

Bravo, F. & Jacho-Chavez, D. (2011), �Empirical likelihood for e¢ cient semiparametric average

treatment e¤ects�, Econometric Reviews 30, 1�24.

34

Cai, J., Fan, J., Jiang, J. & Zhou, H. (2008), �Partially linear hazard regression with vary-

ing coe¢ cients for multivariate survival data�, Journal of the Royal Statistical Society B

pp. 141�158.

Cai, Z., Fan, J. & Li, R. (2000), �E¢ cient estimation and inference for varying-coe¢ cient models�,

Journal of the American Statistical Association 95, 888�902.

Cai, Z., Fan, J. & Yao, Q. (2000), �Functional-coe¢ cient regression models for nonlinear time

series�, Journal of the American Statistical Association 95, 941�956.

Chen, S. & Cui, H. (2006), �On Bartlett correction of empirical likelihood in presence of nuisance

parameters�, Biometrika 93, 215�220.

Cheng, P. (1994), �Nonparametric estimation of mean functionals with data missing at random�,


Cleveland, W., Grosse, E. & Shyu, W. (1991), Local Regression Models, Wadsworth/Brooks-

Cole, pp. 309�376.

Copeland, B. & Taylor, M. (2004), �Trade, growth and enviroment�, Journal of Economic Lit-

erature 42, 7�71.

Delgado, M., Dominguez, M. & Lavergne, P. (2003), �Consistent tests of conditional moment

restrictions�, Annales de Economie et de Statistique 81, 33�67.

Engle, R., Granger, C., Rice, J. & Weiss, A. (1986), �Nonparametric estimation of the rela-

tion between weather and electricity sales�, Journal of the American Statistical Association

81, 310�320.

Escanciano, J. (2006), �A consistent diagnostic test for regression model using projections�,

Econometric Theory 22, 1030�1051.

Escanciano, J. & Velasco, C. (2010), �Speci�cation tests of parametric dynamic conditional

quantiles�, Journal of Econometrics 159, 209�221.

Fan, J. & Huang, T. (2005), �Pro�le likelihood inferences on semiparametric varying-coe¢ cient

partially linear models�, Bernoulli 11, 1031�1057.

Fan, J. & Wu, Y. (2008), �Semiparametric estimation of covariance matrices for longitudinal

data�, Journal of the American Statistical Association 103, 1520�1533.

Fan, J. & Zhang, W. (2008), �Statistical methods with varying coe¢ cients models�, Statistics

and its interface 1, 179�195.

35

Frankel, J. & Rose, A. (2005), �Is trade good or bad for the eniviroment? sorting out the

causality�, Review of Economics and Statistics 87, 85�91.

Gonzalez-Manteiga, W. & Perez-Gonzalez, A. (2006), �Goodness-of-�t tests for linear regression

models with missing response at random�, Canadian Journal of Statistics 34, 149�170.

Hahn, J. (1998), �On the role of the propensity score in e¢ cient semiparametric estimation of

average treatment e¤ects�, Econometrica 66(2), 315�331.

Hardle, W. & Mammen, E. (1993), �Comparing nonparametric versus parametric �ts�, Annals

of Statistics 21, 1926�1947.

Hastie, T. & Tibshirani, R. (1993), �Varying-coe¢ cient models�, Journal of the Royal Statistical

Society B 55, 757�796.

He, X. & Zhu, L. (2003), �A lack-of-�t test for quantile regression�, Journal of the American

Statistical Association 98, 1013�1022.

Hirano, K., Imbens, G. & Ridder, G. (2003), �E¢ cient estimation of average treatment e¤ects

using the estimated propensity score�, Econometrica 71, 1161�1189.

Horvitz, D. & Thompson, D. (1952), �A generalization of sampling without replacement from a

�nite universe�, Journal of the American Statistical Association 47, 663�685.

Kim, M. & Ma, Y. (2012), �The e¢ ciency of the second order nonlinear least squares estimator

and its extension�, Annals of the Institute of Statistical Mathematics 64, 751�764.

Koul, H., Muller, U. & Schick, A. (2012), The transfer principle: A tool for complete case

analysis. Unpublished manuscript.

Little, R. & Rubin, D. (2002), Statistical Analysis with Missing Data, J. Wiley and Sons, New

Jersey, USA.

Masry, E. (1996), �Multivariate regression estimation: Local polynomial �tting for time series�,

Stochastic Processess and their Applications 65, 81�101.

Millimet, D. & Tchernis, R. (2009), �On the speci�cation of propensity scores, with application

to the analysis of trade policies�, Journal of Economic and Business Statistics 27, 397�415.

Muller, U. (2009), �Estimating linear functionals in nonlinear regression with responses missing

at random�, Annals of Statistics 37, 2245�2277.

Paik, M. (1997), �The generalized estimating equation approach when data are not completely

missing at random�, Journal of the American Statistical Association 92, 1320�1329.

36

Pollard, D. (1984), Convergence of Stochastic Processes, New York: Springer & Verlag.

Robins, J., Rotnitzky, A. & Zhao, L. (1994), �Estimation of regression coe¢ cients when ssome

regressors are not always observed�, Journal of the American Statistical Association 89, 846�866.

Robinson, P. (1987), �Asymptotically e¢ cient estimation in the presence of heteroskedasticity

of unknown form�, Econometrica 55, 875�891.

Robinson, P. (1988), �Root-n consistent semiparametric regression�, Econometrica 56, 931�954.

Rosenbaum, P. & Rubin, D. (1983), �The central role of the propensity score in observational

studies for causal e¤ects�, Biometrika 70, 41�55.

Scharfstein, D., Rotnizky, A. & Robins, J. (1999), �Adjusting for nonignorable drop-out using

semiparameteric nonresponse models�, Journnal of the American Statistical Association

94, 1096�1120.

Speckman, H. (1988), �Kernel smoothing in partial linear models�, Journal of the Royal Statistical

Society B 50, 413�436.

Stute, W. (1997), �Nonparametric model checks for regressions�, Annals of Statistics 25, 613�641.

Stute, W., Gonzalez-Mantiega, W. & Quindimil, M. P. (1998), �Bootstrap approximations in

model checks for regression�, Journal of the American Statistical Association 93, 141�149.

Stute, W. & Zhu, L. (2002), �Model checks for generalized linear models�, Scandinavian Journal

of Statistics 29, 535�545.

Stute, W. & Zhu, L. (2005), �Nonparametric checks for single index models�, Annals of Statistics

33, 1048�1083.

Su, J. & Wei, L. (1991), �A lack of �t test for the mean function in a generalized linear model�,


Sun, Z., Wang, Q. & Dai, P. (2009), �Model checking for partially linear models with missing

responses at random�, Journal of Multivariate Analysis 100, 636�651.

Van der Vaart, A. & Wellner, J. (1996), Weak Convergence and Empirical Processes, Springer.

Wang, D. & Chen, S. (2009), �Empirical likelihood for estimating equation with missing values�,

Annals of Statistics 37, 490�517.

37

Wang, L. & LeBlanc, A. (2008), �Second-order nonlinear least squares estimation�, Annals of

the Institute of Statistical Mathematics 60, 883�900.

Wang, Q., Linton, O. & Härdle, W. (2004), �Semiparametric regression analysis with missing

response at random�, Journal of the American Statistical Association 99, 334�345.

Wang, Q. & Rao, J. (2002), �Empirical likelihood-based inference under imputation for missing

response data�, Annals of Statistcs 30, 896�924.

Whang, Y. (2000), �Consistent bootstrap tests of parametric regression functions�, Journal of

Econometrics 98, 27�46.

Whang, Y. (2001), �Consistent speci�cation testing for conditional moment restrictions�, Eco-

nomics Letters 71, 299�306.

Wu, C. (1986), �Jackknife, bootstrap and other resampling methods�, Annals of Statistics

14, 1261�1295.

Xia, Y., Li, W., Tong, H. & Zhang, D. (2004), �A goodness-of-�t test for single-index models�,

Statistica Sinica 14, 1�28.

Zhu, L. & Ng, K. (2003), �Checking the adequacy of a partial linear model�, Statistica Sinica

13, 763�781.

38

Tables and �gures

Table 1a Finite sample sizes of omnibus tests CM1 and CM3 for model 5:1:

� n bcv 2bcv 4bcv

WB

�(1)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0905 .0425

.0968 .0433

.0924 .0454

.0976 .0461

.0953 .0475

.0990 .0489

.0899 .0420

.0965 .0435

.0922 .0453

.0973 .0462

.0949 .0470

.0990 .0486

.0897 .0420

.0967 .0436

.0921 .0456

.0975 .0460

.0951 .0479

.0991 .0485

�(2)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0893 .0398

.0950 .0425

.0925 .0440

.0965 .0456

.0940 .0471

.0980 .0476

.0895 .0394

.0952 .0425

.0921 .0442

.0969 .0456

.0931 .0471

.0982 .0477

.0895 .0397

.0957 .0428

.0921 .0441

.0963 .0450

.0935 .0470

.0981 .0478

MB

�(1)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0888 .0403

.0943 .0426

.0910 .0427

.0949 .0437

.0939 .0446

.0978 .0482

.0890 .0400

.0940 .0425

.0906 .0426

.0948 .0438

.0938 .04 55

.0973 .0480

.0887 .0401

.0938 .0427

.0906 .0430

.0945 .0433

.0957 .0447

.0975 .0479

�(2)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0883 .0386

.0925 .0418

.0900 .0420

.0925 .0427

.0950 .0441

.0978 .0469

.0883 .0389

.0921 .0420

.0902 .0421

.0926 .0427

.0952 .0435

.0979 .0467

.0885 .0389

.0924 .0419

.0892 .0419

.0928 .0422

.0954 .0435

.0978 .0469

39

Table 1b Finite sample sizes of omnibus tests CM1 and CM3 for model 5:2

� n bcv 2bcv 4bcv+0:9

WB

�(1)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0923 .0445

.0956 .0459

.0942 .0465

.0980 .0470

.0983 .0480

.0991 .0491

.0920 .0448

.0958 .0454

.0941 .0468

.0977 .0471

.0979 .0479

.0989 .0491

.0921 .0449

.0954 .0457

.0943 .0465

.0980 .0468

.0980 .0479

.0988 .0492

�(2)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0917 .0434

.0929 .0448

.0938 .0455

.0950 .0462

.0967 .0475

.0976 .0481

.0918 .0436

.0928 .0449

.0935 .0455

.0949 .0460

.0966 .0474

.0975 .0480

.0915 .0435

.0930 .0445

.0935 .0456

.0951 .0460

.0967 .0475

.0974 .0480

MB

�(1)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0915 .0435

.0947 .0448

.0935 .0457

.0954 .0465

.0966 .0468

.0981 .0483

.0916 .0435

.0946 .0447

.0939 .0455

.0950 .0463

.0964 .0465

.0978 .0481

.0914 .0433

.0947 .0445

.0937 .0458

.0951 .0462

.0967 .0465

.0983 .0482

�(2)

50

100

400

CM1

CM3

CM1

CM3

CM1

CM3

.0915 .0425

.0938 .0441

.0932 .0436

.0945 .0459

.0960 .0463

.0971 .0475

.0913 .0429

.0939 .0440

.0938 .0434

.0943 .0458

.0960 .0462

.0972 .0474

.0913 .0424

.0940 .0442

.0931 .0438

.0945 .0456

.0961 .0462

.0970 .0477

40

Table 2a. Bias for imputation estimators for the mean functional

� n b�1 b�2 b�3 b�HIR b�CModel 5.1

�(1)

50

100

400

-.00104

-.00093

-.00046

-.00081

-.00074

-.00014

-.00858

-.00680

-.00147

-.00090

-.00081

-.00029

-.00125

-.00105

-.00056

�(2)

50

100

400

-.00283

-.00143

-.00078

-.00159

-.00113

-.00059

-.001508

-.00105

-.00058

-.00187

-.00126

-.00068

-.00299

-.00164

-.00088

Model 5.2

�(1)

50

100

400

.00642

.00542

.00322

.00328

.00244

.00133

.00327

.00223

.00144

.00513

.00371

.00222

.00751

.00599

.00381

�(2)

50

100

400

.00758

.00572

.00399

.00514

.00384

.00201

.00452

.00352

.00192

.00557

.00466

.00249

.00833

.00633

.00413

Table 2b. Standard error for imputation estimators for the mean functional

� n b�1 b�2 b�3 b�HIR b�CModel 5.1(a)

�(1)

50

100

400

1.1163

1.1203

1.1415

1.1299

1.1428

1.1541

1.1324

1.1483

1.1755

1.1401

1.1479

1.1512

1.0913

1.0971

1.1164

�(2)

50

100

400

.99562

1.0329

1.0645

1.0041

1.0412

1.0787

1.0091

1.0954

1.0883

1.0240

1.0866

1.0893

.94937

.96524

.98776

Model 5.2(b)

�(1)

50

100

400

2.5870

2.6099

2.6122

2.6083

2.6240

2.6304

2.6110

2.6280

2.6475

2.5935

2.6059

2.6290

2.4421

2.5559

2.6004

�(2)

50

100

400

2.1748

2.1959

2.2379

2.2219

2.2704

2.3073

2.2344

2.2877

2.3322

2.2027

2.2657

2.3002

2.0993

2.1089

2.1413

(a) The simulated standard deviation of unobservable response is 1:215:(b) The simulated standard deviation of unobservable response is 2:741.

41

0.2

0.4

0.6

0.8

1.0

CM1

γ

0 1.4 2.8

WBMB

0.2

0.4

0.6

0.8

1.0

CM3

γ

0 1.4 2.8

WBMB

0.2 0.6 1.0

0.2

0.4

0.6

0.8

1.0

CM3 vs CM1

CM1CM

3

Figure 1: Finite sample power of omnibus tests CM1 and CM3 for model (5:1) using both WB

and MB for n = 100.

Table 3 Average treatment e¤ect

� 1 �HIRb� 1 C:I: p b�HIR C:I: p

co2pc 0.79 0.34,1.24 0.00 0.33 -0.61,1.28 0.484

defor 0.72 0.47,0.97 0.00 0.38 -0.25,1.01 0.233

rural 21.44 15.08,27.39 0.00 8.57 -10.21,27.37 0.367

urban 28.06 18.89,37.24 0.00 11.78 -17.26,40.82 0.422

42

0.2

0.4

0.6

0.8

1.0

CM1

γ

0 1 2

WBMB

0.2

0.4

0.6

0.8

1.0

CM3

γ

0 1 2

WBMB

0.2 0.6 1.0

0.2

0.4

0.6

0.8

1.0

CM3 vs CM1

CM1

CM

3

Figure 2: Finite sample power of omnibus tests CM1 and CM3 for model (5:2) using both WB

and MB for n = 100.

43

7 8 9 10

05

1015

Logrgdppc

Co2

pc

7 8 9 102

02

46

Logrgdppc

Def

or

5 0 5 10

2040

6080

DemoStr

Rur

al

0 100 200 300 400 500

2040

6080

100

Land

Urb

an

Figure 3: Scatter plots for the four enviromental variables with local linear regression line

44

Partially linear varying coe¢ cient models with missing at ... · Partially linear varying coe¢...

Documents

Transcript of Partially linear varying coe¢ cient models with missing at ... · Partially linear varying coe¢...