Convergence rates for partially splined models


Statistics & Probability Letters 4 (1986) 203-208 June 1986 North-Holland

CONVERGENCE RATES FOR PARTIALLY SPLINED MODELS

John RICE

Department of Mathematics, University of California, San Diego, La Jolla, CA 92093, USA

Received January 1986

Abstract: A partial spline model is a semi-parametric regression model. In this paper we analyze the convergence rates of estimates of the parametric and nonparametric components of the model under a particular assumption on the design. We show that the estimate of the parametric component of the model is generally biased and that this bias can be larger than the standard error. To force the bias to be negligible with respect to the standard error, it is necessary to undersmooth the nonparametric component.

1. Introduction

Partially splined models are a means of combining both parametric and nonparametric components in a multiple regression model. An example of such a model is

$$ y_i = \beta^{\mathrm{T}} x_i + f(t_i) + \varepsilon_i , \quad i = 1, \ldots, n , \qquad (1) $$

where $x_i$ is a $p$-vector, $\beta$ is a vector of the usual regression coefficients, and $f$ is an unknown smooth function. The coefficients $\beta$ and the function $f$ are estimated as the minimizers of

$$ \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \beta^{\mathrm{T}} x_i - f(t_i) \bigr)^2 + \lambda \int \bigl( f^{(m)}(t) \bigr)^2 \, dt , \qquad (2) $$

where the parameter $\lambda$ balances smoothness of $f$ with fidelity to the data. Such models offer a convenient means of including nonlinearities of an unspecified form in a regression model and extending the use of smoothing splines, which have been quite successful in one-dimensional problems, to higher dimensions. Note that $\hat f$ is a natural spline, since for any value of $\beta$ the minimizer of (2) is a natural spline. Since natural splines can be expressed as linear combinations of B-splines, the minimizers of (2) can be calculated explicitly by solving a linear system.

Regardless of whether one 'really believes' the model (1), the structure and the estimate can be viewed as an approximation that may be preferable for small to moderate sample sizes to a full-blown multivariate nonparametric analysis, since the bias may be more than offset by variance reduction. Partially splined models were proposed and applied in Engle et al. (1983) and were also proposed by Wahba (1984). Other references are given in the bibliography.
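Since the minimizers of (2) solve a linear system, the computation is easy to sketch. The following fragment is our illustration, not code from the paper: the exact natural-spline penalty $\int (f^{(m)})^2$ is replaced by a discrete $m$-th difference penalty on the fitted values $f(t_i)$ (a common simplification), the function name `fit_partial_spline` and all numerical choices are ours, and the joint minimizer over $(\beta, f)$ is obtained from one stacked least-squares solve.

```python
import numpy as np

def fit_partial_spline(y, X, lam, m=2):
    """Sketch of a partial spline fit: jointly minimize over (beta, f)
        (1/n) * ||y - X beta - f||^2 + lam * ||D f||^2,
    where f = (f(t_1), ..., f(t_n)) with the t_i sorted and roughly equally
    spaced, and D is the m-th difference matrix, standing in for the
    natural-spline roughness penalty in (2)."""
    n = len(y)
    p = X.shape[1]
    D = np.diff(np.eye(n), m, axis=0)          # (n-m) x n difference matrix
    # Stacked least squares: || [y; 0] - [X, I; 0, sqrt(n*lam) D] [beta; f] ||^2
    A = np.block([
        [X, np.eye(n)],
        [np.zeros((n - m, p)), np.sqrt(n * lam) * D],
    ])
    rhs = np.concatenate([y, np.zeros(n - m)])
    sol, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return sol[:p], sol[p:]                    # beta_hat, f_hat at the t_i

# Toy use: one covariate plus a smooth trend (all numbers illustrative).
rng = np.random.default_rng(0)
n = 50
t = np.linspace(0.0, 1.0, n)
x = rng.normal(size=(n, 1))                    # covariate independent of t
y = 2.0 * x[:, 0] + np.sin(2 * np.pi * t)      # noiseless, for illustration
beta_hat, f_hat = fit_partial_spline(y, x, lam=1e-6)
```

Stacking the penalty rows under the data rows is the standard device for solving penalized least squares with a single least-squares call; with a genuine natural-spline penalty matrix in place of `D`, the same structure computes the minimizers of (2) exactly.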

Little is known about the statistical properties of partial splines, although many questions come immediately to mind. In particular, one might ask how the estimation of the parametric components is affected by the simultaneous estimation of the nonparametric components. Ideally, one might hope that parametric convergence rates could be attained for the parametric components and typical nonparametric rates for the nonparametric components.

In this note we obtain a representation for the estimators of a partially splined model and use it to explore the question raised in the paragraph above. It will be shown that the optimistic hope about convergence rates is not in general fulfilled, although there are special situations in which it is.

1.1. Representation of the estimate

We consider the following model:

$$ y_i = \alpha u_i + g(t_i) + \varepsilon_i , \quad i = 1, \ldots, n , \qquad (3) $$

0167-7152/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)

where $\alpha$ is an unknown regression coefficient, $g$ is a smooth function to be estimated, and the $\varepsilon_i$ are independent with mean 0 and variance $\sigma^2$. Assumptions on the $u_i$ and $t_i$ will be introduced in the next section, where we derive some asymptotic properties of the estimates. In this section we derive a representation of the estimates of $\alpha$ and $f$, which are the minimizers of

$$ \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - \alpha u_i - f(t_i) \bigr)^2 + \lambda \int \bigl[ f^{(m)}(t) \bigr]^2 \, dt . \qquad (4) $$

A convenient basis for theoretical study of $S_n^m$, the space of all natural splines of order $m$ on $[0, 1]$ with knots at the points $t_i$, $i = 1, \ldots, n$, was introduced by Demmler and Reinsch (1975). This linear space is of dimension $n$ and is spanned by functions $\phi_{jn}(t)$, $j = 1, \ldots, n$, with the following biorthogonality properties:

$$ \frac{1}{n} \sum_{i=1}^{n} \phi_{jn}(t_i) \, \phi_{kn}(t_i) = \delta_{jk} , \qquad (5) $$

$$ \int_0^1 \phi_{jn}^{(m)}(t) \, \phi_{kn}^{(m)}(t) \, dt = \lambda_{kn} \delta_{jk} . \qquad (6) $$

Here the $\lambda_{kn}$ form a nondecreasing sequence of nonnegative eigenvalues. Asymptotic properties of these eigenfunctions and eigenvalues have been studied by Utreras (1980, 1983) and will be used in the following section.
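For experimentation it is useful to have a computable analogue of this basis. The construction below is our sketch, not the paper's: on a uniform regular design it replaces the roughness penalty in (6) by a scaled $m$-th difference quadratic form and reads the basis off a symmetric eigendecomposition. The resulting columns satisfy the discrete orthogonality (5) exactly, and for $m = 1$ the eigenvalues behave like $(\pi k)^2$ (up to the indexing of the null space), consistent with the asymptotics quoted in Section 2.

```python
import numpy as np

def demmler_reinsch_like(n, m=1):
    """Discrete analogue of the Demmler-Reinsch basis (our sketch).

    Returns design points t, a matrix Phi with Phi[i, k] = phi_k(t_i)
    satisfying (1/n) Phi.T @ Phi = I (cf. (5)), and nondecreasing
    eigenvalues lam with Phi[:, k] @ Omega @ Phi[:, k] = lam[k]
    (cf. (6)), where Omega is an m-th difference penalty standing in
    for the natural-spline roughness penalty."""
    t = (2 * np.arange(1, n + 1) - 1) / (2 * n)   # regular uniform design
    D = np.diff(np.eye(n), m, axis=0) * n**m      # scaled m-th differences
    Omega = D.T @ D / n                           # discrete int (f^(m))^2
    w, V = np.linalg.eigh(Omega)                  # V orthonormal, w ascending
    Phi = np.sqrt(n) * V                          # (1/n) Phi'Phi = I
    lam = n * w                                   # Phi_k' Omega Phi_k = lam_k
    return t, Phi, lam

t, Phi, lam = demmler_reinsch_like(200, m=1)
```

For $m = 1$ the first eigenvalue is numerically zero, reflecting the constants in the null space of the penalty, and `lam[k]` is close to $(\pi k)^2$ for small $k$ (counting from 0).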

We introduce some notation: let

$$ z_i = y_i - \hat\alpha u_i , \qquad (7) $$

$$ u_{kn} = n^{-1/2} \sum_{i=1}^{n} u_i \, \phi_{kn}(t_i) , \qquad (8) $$

$$ \tilde c_{kn} = n^{-1/2} \sum_{i=1}^{n} z_i \, \phi_{kn}(t_i) , \qquad (9) $$

$$ y_{kn} = n^{-1/2} \sum_{i=1}^{n} y_i \, \phi_{kn}(t_i) , \qquad (10) $$

$$ g_{kn} = n^{-1/2} \sum_{i=1}^{n} g(t_i) \, \phi_{kn}(t_i) . \qquad (11) $$

Note that $z_i$ and $\tilde c_{kn}$ depend on $\hat\alpha$, although this dependence is suppressed. We have

$$ f(t) = n^{-1/2} \sum_{k=1}^{n} c_{kn} \, \phi_{kn}(t) \qquad (12) $$

for some coefficients $c_{kn}$ to be determined. Now, using the orthogonality properties of the $\phi_{kn}$, we can diagonalize (4), re-expressing the quantity to be minimized as

$$ \frac{1}{n} \sum_{k=1}^{n} ( \tilde c_{kn} - c_{kn} )^2 + \frac{\lambda}{n} \sum_{k=1}^{n} \lambda_{kn} c_{kn}^2 , \qquad (13) $$

which is minimized if

$$ \hat c_{kn} = \frac{\tilde c_{kn}}{1 + \lambda \lambda_{kn}} . \qquad (14) $$

Now, $\tilde c_{kn}$ still depends on $\hat\alpha$; substituting this expression for $c_{kn}$ into (13), we see that the quantity to be minimized becomes

$$ \frac{1}{n} \sum_{k=1}^{n} ( y_{kn} - \hat\alpha u_{kn} )^2 \, \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} . \qquad (15) $$

Minimizing this last expression with respect to $\hat\alpha$, we obtain

$$ \hat\alpha = \left( \sum_{k=1}^{n} u_{kn}^2 \, \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} \right)^{\!-1} \sum_{k=1}^{n} u_{kn} y_{kn} \, \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} . \qquad (16) $$

We have thus shown

Lemma. The minimizers of (4) are given by $\hat\alpha$ as in (16) and $\hat f$ as in (12) with coefficients $\hat c_{kn}$ given by (14).

The case of several parametric components, in which $\alpha$ is a vector, can be analyzed similarly.
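Numerically, the Lemma is a few array operations once a basis with the properties (5)–(6) is available. The sketch below is ours, not the paper's: it uses a cosine basis on the design $t_i = (2i-1)/(2n)$, which is exactly orthonormal in the sense of (5), together with the surrogate eigenvalues $(\pi(k-1))^{2m}$, and then applies (16), (14) and (12) verbatim.

```python
import numpy as np

def cosine_basis(n, m=1):
    """Cosine basis on t_i = (2i-1)/(2n): exactly orthonormal in the sense
    of (5), with surrogate eigenvalues (pi*(k-1))^(2m) (our stand-in)."""
    t = (2 * np.arange(1, n + 1) - 1) / (2 * n)
    k = np.arange(n)
    Phi = np.sqrt(2) * np.cos(np.pi * np.outer(t, k))
    Phi[:, 0] = 1.0                                  # constant function
    return t, Phi, (np.pi * k) ** (2 * m)

def partial_spline_estimates(y, u, Phi, lam_k, lam):
    """Closed-form minimizers of (4) from the Lemma: alpha-hat via (16),
    c-hat via (14), f-hat via (12), evaluated at the design points."""
    n = len(y)
    yk = Phi.T @ y / np.sqrt(n)                      # y_{kn}, cf. (10)
    uk = Phi.T @ u / np.sqrt(n)                      # u_{kn}, cf. (8)
    w = lam * lam_k / (1 + lam * lam_k)              # weights in (15)-(16)
    alpha = np.sum(uk * yk * w) / np.sum(uk**2 * w)  # (16)
    ck = (yk - alpha * uk) / (1 + lam * lam_k)       # (14)
    return alpha, Phi @ ck / np.sqrt(n)              # f-hat at the t_i

# Toy check with u independent of t (the case h = 0) and no noise.
rng = np.random.default_rng(1)
n = 200
t, Phi, lam_k = cosine_basis(n)
u = rng.normal(size=n)
g = np.sin(2 * np.pi * t)
y = 1.5 * u + g
alpha_hat, f_hat = partial_spline_estimates(y, u, Phi, lam_k, lam=1e-3)
```

In these coordinates $\hat\alpha$ is simply a weighted least-squares coefficient with weights $\lambda\lambda_{kn}/(1+\lambda\lambda_{kn})$, which is what makes the bias and variance calculations of the next section tractable.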

2. Convergence rates

In this section we analyze convergence rates under particular assumptions. These assumptions are probably not the most natural, and are certainly not the most general, but they do allow us to see that various rates of convergence are indeed possible, and the results suggest what we conjecture is true more generally.

We will assume that the points $t_{in}$ are regular in the sense that

$$ \frac{2i - 1}{2n} = \int_0^{t_{in}} p(t) \, dt \qquad (17) $$
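Condition (17) says that $t_{in}$ is the $(2i-1)/(2n)$ quantile of the design density $p$. A small illustration (the density $p(t) = 2t$ and the helper name are our choices, not from the paper):

```python
import numpy as np

def regular_design(n, quantile):
    """Design points regular in the sense of (17): t_i = P^{-1}((2i-1)/(2n)),
    where `quantile` is the inverse CDF of the design density p."""
    return quantile((2 * np.arange(1, n + 1) - 1) / (2 * n))

# Example: p(t) = 2t on [0, 1] has CDF P(t) = t^2, so P^{-1}(u) = sqrt(u).
t = regular_design(4, np.sqrt)
```

For the uniform density the same recipe returns the midpoints $(2i-1)/(2n)$ themselves.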



for some density function $p(t)$ on $[0, 1]$. This assumption was used by Speckman (1982). Under this assumption we have the following estimate of the eigenvalues $\lambda_{kn}$ (Speckman, 1982):

$$ \lambda_{kn} = \nu_k (1 + o(1)) , \qquad (18) $$

where the $o(1)$ is independent of $k$ for $k = o(n^{2/(2m+1)})$ and where

$$ \nu_k = (\pi k)^{2m} \left[ \int_0^1 p(t)^{1/2m} \, dt \right]^{-2m} (1 + o(1)) . \qquad (19) $$

Also, for $k \leq m - 1$, $\lambda_{kn} = \nu_k = 0$. We will assume that the points $u_{in}$ are spread out in some fashion and are correlated with the points $t_{in}$. More specifically, we will assume that $u_{in} = x_{in} + h(t_{in})$, where $h$ is a continuous function. The assumptions to follow are essentially that the sequence $x_{in}$ behaves like a sequence of independent identically distributed random variables and that $h$ is the regression of $u$ on $t$. Heckman (1985) has considered a similar model with $h = 0$. Let

$$ \xi_{kn} = n^{-1/2} \sum_{i=1}^{n} \phi_{kn}(t_i) \, x_{in} , \qquad (20) $$

$$ h_{kn} = n^{-1/2} \sum_{i=1}^{n} \phi_{kn}(t_i) \, h(t_{in}) . \qquad (21) $$

We assume that the $x_{in}$ satisfy the following conditions:

A1. $n^{-1} \sum_{k=1}^{n} \xi_{kn} \to 0$ as $n \to \infty$;

A2. $n^{-1} \sum_{k=1}^{n} \xi_{kn}^2 \to p > 0$ as $n \to \infty$;

A3. $\sup_{1 \leq k \leq n} |\xi_{kn}| = O(\log n)$

(the $\xi_{kn}$ are obtained by an orthogonal transformation of the $x_{in}$ and are uncorrelated; assumption A3 can be shown to follow from Hoeffding's inequality if the $x_{in}$ are iid).

We now consider the variance and bias of $\hat\alpha$.

Proposition A. Assume that $\lambda n^{2m/(2m+1)-\varepsilon} \to \infty$ for some $\varepsilon > 0$, that $h_{kn}^2 = O(n k^{-a})$ for some $a > 1$, and that A1–A3 above hold. Then $\operatorname{Var}(\hat\alpha) = O(n^{-1})$.

Proof. From (16),

$$ \operatorname{Var}(\hat\alpha) = \sigma^2 \left( \sum_{k=1}^{n} u_{kn}^2 \, \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} \right)^{\!-2} \sum_{k=1}^{n} u_{kn}^2 \left( \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} \right)^{\!2} . $$

We wish to show that

$$ \sum_{k=1}^{n} u_{kn}^2 - \sum_{k=1}^{n} u_{kn}^2 \, \frac{\lambda \lambda_{kn}}{1 + \lambda \lambda_{kn}} \qquad (22) $$

$$ {} = \sum_{k=1}^{n} u_{kn}^2 \, \frac{1}{1 + \lambda \lambda_{kn}} = o(n) , \qquad (23) $$

and a similar bound for the numerator. We expand $u_{kn}^2$ into components involving $\xi_{kn}$ and $h_{kn}$, and consider first the components involving $\xi_{kn}$. Let $K_n = [1, n^{2/(2m+1)-\delta}]$ for some $\delta > 0$ and consider the sums over $K_n$ and its complement $\bar K_n$. Now

$$ \sum_{K_n} \frac{\xi_{kn}^2}{1 + \lambda \lambda_{kn}} \leq O(\log^2 n) \sum_{K_n} \frac{1}{1 + \lambda \lambda_{kn}} . \qquad (24) $$

The sum can be estimated by an integral to be $O(\lambda^{-1/2m})$, which when multiplied by $\log^2 n$ is $o(n)$. Now

$$ \sum_{\bar K_n} \frac{\xi_{kn}^2}{1 + \lambda \lambda_{kn}} = O(\log^2 n) \, \frac{n}{\lambda n^{4m/(2m+1) - 2m\delta}} , \qquad (25) $$

since the $\lambda_{kn}$ are increasing, and this expression is $o(n)$ for some $\delta > 0$.

We now consider the terms involving $h_{kn}$. Now $\sum_{k=1}^{n} h_{kn}^2 \, \lambda \lambda_{kn} / (1 + \lambda \lambda_{kn})$ is of the same order of magnitude as

$$ \lambda \sum_{k \leq \lambda^{-1/2m}} h_{kn}^2 \lambda_{kn} + \sum_{k > \lambda^{-1/2m}} h_{kn}^2 . \qquad (26) $$

The second sum is $o(n)$, and using the assumption on the rate of decrease of the $h_{kn}$ and estimating the $\lambda_{kn}$ by (18), we see that the first sum is $o(n)$ as well. Similar calculations apply to the numerator. □
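Proposition A lends itself to a quick Monte Carlo check. The experiment below is entirely our construction (the model ingredients $\alpha = 1$, $g(t) = \sin 2\pi t$, $h(t) = t$, $\sigma = 0.5$, the surrogate cosine basis, and the schedule $\lambda = n^{-2/3}$ are illustrative choices): it simulates (3) with $u_i = x_i + h(t_i)$, computes $\hat\alpha$ from (16), and compares the sampling variance of $\hat\alpha$ at two sample sizes, which should shrink roughly like $1/n$.

```python
import numpy as np

def alpha_hat(y, u, Phi, lam_k, lam):
    """alpha-hat from (16), given a basis orthonormal in the sense of (5)."""
    n = len(y)
    yk, uk = Phi.T @ y / np.sqrt(n), Phi.T @ u / np.sqrt(n)
    w = lam * lam_k / (1 + lam * lam_k)
    return np.sum(uk * yk * w) / np.sum(uk**2 * w)

def var_of_alpha_hat(n, reps, rng):
    """Monte Carlo variance of alpha-hat at sample size n (our toy model:
    alpha = 1, g(t) = sin(2*pi*t), h(t) = t, sigma = 0.5, lam = n**(-2/3))."""
    t = (2 * np.arange(1, n + 1) - 1) / (2 * n)
    k = np.arange(n)
    Phi = np.sqrt(2) * np.cos(np.pi * np.outer(t, k))
    Phi[:, 0] = 1.0                               # exactly (5)-orthonormal
    lam_k = (np.pi * k) ** 2                      # surrogate eigenvalues, m = 1
    lam = n ** (-2.0 / 3.0)
    g = np.sin(2 * np.pi * t)
    out = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        u = x + t                                 # h(t) = t, continuous
        y = 1.0 * u + g + 0.5 * rng.normal(size=n)
        out[r] = alpha_hat(y, u, Phi, lam_k, lam)
    return out.var()

rng = np.random.default_rng(2)
v100 = var_of_alpha_hat(100, 300, rng)
v400 = var_of_alpha_hat(400, 300, rng)
```

Setting $h = 0$ in the same experiment reproduces the uncorrelated-design case considered by Heckman (1985), one of the special cases discussed at the end of this section.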

The proposition shows that the variance of $\hat\alpha$ is asymptotically the same as if there were no nonparametric component. The assumption on the $h_{kn}$ can be interpreted as follows: the $h_{kn}$ are the Fourier coefficients of $h$ with respect to the functions $\phi_{kn}$, and we know that $n^{-1} \sum_{k=1}^{n} h_{kn}^2 < \infty$ for every $n$, and that as $n \to \infty$ this sum tends to the integral $\int_0^1 h^2(t) \, p(t) \, dt$.



Proposition E. If $h$ is $m$ times continuously differentiable, then $B_2 = O(\lambda)$.

Proof. $B_2$ is of the same order of magnitude as

$$ \frac{\lambda}{n} \sum_{k \leq \lambda^{-1/2m}} h_{kn} c_{kn} \lambda_{kn} + \frac{1}{n} \sum_{k > \lambda^{-1/2m}} h_{kn} c_{kn} . $$

Now

$$ \frac{1}{n} \sum_{k=1}^{n} c_{kn} h_{kn} \lambda_{kn} = \int h^{(m)}(t) \, g^{(m)}(t) \, dt , $$

so that the contribution to $B_2$ from the first sum is $O(\lambda)$. Now, for the second sum,

$$ \frac{1}{n} \sum_{k \geq \lambda^{-1/2m}} h_{kn} c_{kn} \leq (\lambda^{1/2m})^{2m} \, \frac{1}{n} \sum_{k \geq \lambda^{-1/2m}} h_{kn} c_{kn} \lambda_{kn} , $$

and

$$ \frac{1}{n} \sum_{k \geq \lambda^{-1/2m}} h_{kn} c_{kn} \lambda_{kn} = o(1) , $$

which implies that $(1/n) \sum_{k \geq \lambda^{-1/2m}} h_{kn} c_{kn} = o(\lambda)$. □

The analysis of the convergence rates of the nonparametric estimate of $g$ is now relatively straightforward. Suppose that the parametric part were known; we would then let

$$ \tilde z_i = y_i - \alpha u_i , \qquad (30) $$

$$ \tilde z_{kn} = n^{-1/2} \sum_{i=1}^{n} \tilde z_i \, \phi_{kn}(t_i) , \qquad (31) $$

$$ \tilde f(t) = n^{-1/2} \sum_{k=1}^{n} \frac{\tilde z_{kn}}{1 + \lambda \lambda_{kn}} \, \phi_{kn}(t) . \qquad (32) $$

The estimate $\hat f$ of the mixed parametric-nonparametric model can be expressed as

$$ \hat f(t) = \tilde f(t) + (\alpha - \hat\alpha) \, n^{-1/2} \sum_{k=1}^{n} \frac{u_{kn}}{1 + \lambda \lambda_{kn}} \, \phi_{kn}(t) = \tilde f(t) + \delta_n(t) . \qquad (33) $$

Consider the contribution of $\delta_n(t)$ to the integrated squared bias of $\hat f$:

$$ \frac{1}{n} \sum_{i=1}^{n} \bigl[ \mathrm{E} \, \delta_n(t_i) \bigr]^2 = \bigl[ \mathrm{E}(\alpha - \hat\alpha) \bigr]^2 \cdot \frac{1}{n} \sum_{k=1}^{n} \frac{u_{kn}^2}{(1 + \lambda \lambda_{kn})^2} . \qquad (34) $$

Using arguments similar to those used earlier, it can be seen that the sum is $O(1)$. The convergence rate of the integrated squared bias is thus no worse than in the purely nonparametric situation. A similar analysis applies to the variance.

The rate of decrease of $B_{np}$ referred to in Proposition C depends upon the smoothness of $g$ and on boundary conditions on $g$ and its derivatives (Speckman, 1983, and Rice and Rosenblatt, 1983). If $g^{(m)}$ is in $L_2$, $B_{np} \sim \lambda^{1/2}$, whereas if $g^{(2m)}$ is in $L_2$ and certain higher derivatives of $g$ vanish at the boundary, $B_{np} \sim \lambda$. Thus we would expect the bias of $\hat\alpha$ to be of order between $\lambda^{1/2}$ and $\lambda$. It can be seen from the proof of Proposition E that intermediate rates are attainable by making assumptions on the rates of decrease of $h_{kn}$ and $c_{kn}$.

In summary, we have seen that the variance of $\hat\alpha$ decreases at a parametric rate, whereas the bias decreases at a nonparametric rate except under special conditions. These special conditions include the case $h = 0$, which has been analyzed by Heckman (1985), and the case in which $h$ is in the null space of the penalty functional, a design that has been used by Wahba (personal communication). However, the standard deviation of $\hat\alpha$ can tend to zero more quickly than the bias, which makes the use of traditional confidence intervals misleading. Finally, note that the rate $B_2 = o(n^{-1/2})$ can be achieved, but only at the expense of undersmoothing the nonparametric component: if the bias behaves like $\lambda^{1/2}$, forcing it below the $n^{-1/2}$ standard deviation requires $\lambda = o(n^{-1})$, which is smaller than the $\lambda \asymp n^{-2m/(2m+1)}$ that is optimal for estimating $g$. In light of these findings, the utility of ordinary cross-validation is questionable; there is a need for further study of automatic smoothing procedures in the context of semi-parametric modeling.

Acknowledgements

I would like to acknowledge some very helpful discussions with Charles Stone during a quarter of sabbatical leave he spent at UCSD in the Fall of 1984, and some comments on exposition made by Peter Bickel.

References

Engle, R., C. Granger, J. Rice and A. Weiss (1983), Semi-parametric estimates of the relation between weather and electricity sales, J. Amer. Statist. Assoc., to appear.



Demmler, A. and C. Reinsch (1975), Oscillation matrices with spline smoothing, Numer. Math. 24, 375-382.

Denby, L. (1984), Smooth regression functions, Ph.D. Thesis, Department of Statistics, University of Michigan.

Green, P., C. Jennison and A. Seheult (1984), Analysis of field experiments by least squares smoothing, manuscript.

Heckman, N. (1985), Spline smoothing in a partly linear model, manuscript.

Rice, J. and M. Rosenblatt (1983), Smoothing splines: Regression, derivatives and deconvolution, Ann. Statist. 11, 141-156.

Speckman, P. (1982), The asymptotic integrated mean square error for smoothing noisy data by splines, manuscript.

Speckman, P. (1983), Spline smoothing and optimal rates of convergence in nonparametric regression models, manuscript.

Utreras, F. (1980), Sur le choix du paramètre d'ajustement dans le lissage par fonctions spline, Numer. Math. 34, 15-28.

Utreras, F. (1983), Natural spline functions: their associated eigenvalue problem, Numer. Math. 42, 107-117.

Wahba, G. (1984), Partial spline models for the semi-parametric estimation of functions of several variables, manuscript.
