
Computational Statistics & Data Analysis 16 (1993) 423-440 North-Holland


Generalized principal component analysis with respect to instrumental variables via univariate spline transformations

Jean-François Durand

ENSAM-INRA-UM II, Unité de Biométrie, Montpellier, France

Received January 1992; revised July 1992

Abstract: A method is proposed for a nonlinear structural analysis of multivariate data, that is termed a generalized principal component analysis with respect to instrumental variables via spline transformations (or spline-PCAIV). This method combines features of multiresponse additive spline regression analysis and principal component analysis. The solution of the corresponding linear problem belongs to the set of the feasible solutions and constitutes the first step of the associated iterative algorithm. Introducing adapted metrics in principal component analysis leads to an interpretation of the method as an optimal canonical analysis. Examples related to distorted pattern recognition, multivariate regression analysis and nonlinear discriminant analysis show how spline-PCAIV works.

Keywords: Regression splines; Linear smoothers; Additive models; Multivariate analysis.

1. Introduction and preliminaries

In the literature on multivariate analysis relating two sets of variables, principal component analysis with respect to instrumental variables (Rao, 1964) is variously called redundancy analysis. An extensive bibliography can be found in van der Burg and de Leeuw (1990). The geometrical approach developed in the present paper is quite different. Introducing metrics to compute distances between objects and variables respectively provides a unifying tool for the linear structural analysis of two sets of data, see Escoufier (1987). At the same time, the transformation of the predictors by regression spline functions supplies an additive model (Hastie and Tibshirani, 1990) for the considered multiresponse

Correspondence to: J.-F. Durand, Unité de Biométrie, ENSAM-INRA-Montpellier II, 9 Place Pierre Viala, 34060 Montpellier, France.

0167-9473/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved


regression. The resulting iterative method, termed spline-PCAIV, makes it possible to replace the linear approach by an additive one. This generalization differs from that of Zaamoun (1989), which is not iterative and consists of carrying out the linear regression between one set of variables and a spline coding array of the second set.

Let a statistical study of q variables measured on n objects be defined by a triple (Y, Q, D), where Y is an n × q data matrix, Q is a q × q symmetric positive semi-definite matrix used to compute distances between objects, and D is an n × n diagonal matrix with positive diagonal elements summing to 1. The matrix D, called the matrix of weights associated with the objects, is positive definite and provides a metric in the space of the variables. If we denote the transposed matrix of Y as Y', the matrix YQY'D, whose eigenvectors are the principal components of the triple (Y, Q, D), is called the characteristic operator of the representation of the objects. It constitutes a straightforward generalization of n⁻¹YY', the usual operator of the scalar products between objects. However, choosing more general Q and D than I_q and n⁻¹I_n can be a procedure worth considering (Escoufier and Holmes, 1990). For example, the Cholesky decomposition Q = LL' allows the user to take into account linear transformations of the variables: the operators of (Y, LL', D) and (YL, I_q, D) are identical. To introduce some preliminary notation before examining symmetry properties of the characteristic operator, let S(D) be the set of the D-symmetric matrices (i.e., the set of matrices A such that DA = A'D). We know (Robert and Escoufier, 1976) that the symmetric bilinear form ⟨·,·⟩ given by ⟨A, B⟩ = tr(AB) defines a scalar product on S(D). The induced norm is denoted ‖·‖, or more precisely, ‖A‖ = (tr(A²))^{1/2}. The usual Euclidean norm, denoted ‖·‖₂ and defined by ‖A‖₂ = (tr(A'A))^{1/2}, is also used. Obviously, YQY'D belongs to S(D); when D = n⁻¹I_n, this operator is symmetric and the two norms yield the same result.
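As a small numerical illustration of these definitions (with made-up data and our own variable names, not the paper's), the characteristic operator and its D-symmetry can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 100, 3
Y = rng.normal(size=(n, q))          # n x q data matrix of the reference study
Q = np.eye(q)                        # metric used to compute distances between objects
D = np.eye(n) / n                    # diagonal matrix of weights, trace 1
W = Y @ Q @ Y.T @ D                  # characteristic operator of the triple (Y, Q, D)

# W is D-symmetric (D W = W' D), i.e. it belongs to S(D)
assert np.allclose(D @ W, W.T @ D)
# with D = n^{-1} I_n the operator is symmetric, so tr(W^2) = tr(W'W):
# the norm on S(D) and the Euclidean norm coincide
assert np.isclose(np.trace(W @ W), np.trace(W.T @ W))
```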

What is generalized PCAIV? Consider two studies on the same n objects provided with the same weight metric D. The first, based on q response variables, is called the reference study. It leads to the completely determined PCA, that of the triple (Y, Q, D). The second one is called the object information study of p

instrumental or explanatory variables, the observations on which are gathered in the n × p matrix X. The purpose of the generalized principal component analysis of (Y, Q, D) with respect to X is to choose a metric R and a transformation τ that gives T = τ(X), such that the PCA of (T, R, D) fits the PCA of (Y, Q, D) as closely as possible. The distance between the two PCAs is quantified by ‖YQY'D − TRT'D‖².

In the next section, the transformation τ applied to each column of X consists of a linear combination of normalized B-splines. The existence of a particular combination leading to T = X is explicitly detailed, thus showing how linear PCAIV as defined in Escoufier (1987) can be seen as a particular spline-PCAIV. The link between the method and multiresponse regression using additive splines is developed in Section 3. Considered as a particular Q, the Mahalanobis metric leads us to view spline-PCAIV as an optimal canonical


analysis. Section 4 describes the iterative algorithm as a relaxation technique based on two variables: the metric R and the vector of the spline parameters.

Section 5 presents three examples. The first is strongly related to distorted pattern recognition and tries to answer the following question: is it possible to distort an instrumental pattern in such a way that the user can recognize another one considered as a reference? When plane shapes are processed, this problem is called the reconstruction of an anamorphosis, anamorphosis being an old word which is here to be taken as meaning a plane shape deduced from another such shape by means of continuous nonlinear transformations. Identifying signatures is a real-life example of such a problem. An artificial example showing how spline-PCAIV works in this context is discussed. It should be noted that this technique uses as information not only the two patterns but also the ordering of the points in the two patterns. The second application deals with spline-PCAIV considered as an additive regression technique on the Los Angeles air pollution data (Breiman and Friedman, 1985). The results are very similar to those obtained by other methods based on additive models. The third example uses spline-PCAIV as a nonlinear discriminant analysis applied to the iris data published by Fisher (1936). This nonlinear method performs as well as or better than the linear one and is to be compared with methods derived from other iterative smoothing techniques, see Breiman and Ihaka (1989). The Appendix gathers the proofs of the propositions presented in the earlier sections.

2. Using regression splines to transform the predictors

Denote by x_j the jth instrumental variable and {B_{ij}(·)}_{i=1,...,r} a basis of normalized B-splines of order m which allows the transformation of x_j. Although it is more realistic to take a number of splines r_j adapted to each x_j, for the sake of simplicity the number of splines r is the same for all the variables (i.e., r = m + K, where K is the number of interior knots). The knot sequence {t^j_1, ..., t^j_{2m+K}} used for transforming the jth variable may be written

$$\min_{k=1,\dots,n} x_{kj} = t^j_1 = \dots = t^j_m < t^j_{m+1} < \dots < t^j_{m+K} < t^j_{m+K+1} = \dots = t^j_{2m+K} = \max_{k=1,\dots,n} x_{kj},$$

and the associated transformation τ_j is

$$\tau_j(x_j) = \sum_{i=1,\dots,r} a_{ij} B_{ij}(x_j).$$

The parameters a_{ij} become part of the set of the unknown variables in the optimization problem presented in Section 3. The array X is replaced by T, with the same dimensions n × p, given by

$$T_{kj} = \tau_j(x_{kj}), \qquad j = 1,\dots,p, \; k = 1,\dots,n.$$
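To make the construction concrete, here is a minimal numpy sketch of a normalized B-spline design matrix (Cox-de Boor recursion). The function name, the example data and the knot placement are illustrative choices of ours, not the author's S code.

```python
import numpy as np

def bspline_design_matrix(x, knots, order):
    """Normalized B-splines of the given order evaluated at the points x.

    `knots` is the full sequence t_1 <= ... <= t_{r+order}, with the boundary
    knots repeated `order` times; r = order + K columns are returned for K
    interior knots (one column per basis function B_1, ..., B_r).
    """
    x, t = np.asarray(x, float), np.asarray(knots, float)
    # order 1: indicator functions of the knot intervals [t_i, t_{i+1})
    B = np.array([(x >= t[i]) & (x < t[i + 1]) for i in range(len(t) - 1)],
                 dtype=float).T
    last = np.flatnonzero(np.diff(t) > 0)[-1]     # close the last interval at max(x)
    B[x == t[-1], last] = 1.0
    # Cox-de Boor recursion up to the requested order
    for m in range(2, order + 1):
        B_new = np.zeros((len(x), len(t) - m))
        for i in range(len(t) - m):
            left = ((x - t[i]) / (t[i + m - 1] - t[i]) * B[:, i]
                    if t[i + m - 1] > t[i] else 0.0)
            right = ((t[i + m] - x) / (t[i + m] - t[i + 1]) * B[:, i + 1]
                     if t[i + m] > t[i + 1] else 0.0)
            B_new[:, i] = left + right
        B = B_new
    return B

# one predictor, order m = 3 (degree 2) and K = 3 interior knots: r = m + K = 6
x = np.sort(np.random.default_rng(0).uniform(0.0, 1.0, 100))
interior = np.quantile(x, [0.25, 0.50, 0.75])
knots = np.concatenate([[x.min()] * 3, interior, [x.max()] * 3])
B_j = bspline_design_matrix(x, knots, order=3)    # the n x r block for this predictor
assert np.allclose(B_j.sum(axis=1), 1.0)          # normalized B-splines sum to one
```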


Let B=(B’) **. 1 BP) be the matrix yt X r-p whose jth block of r columns Bj is obtained by the transformation of the jth column of X. For example, if we assume that the instrumental variables are categorical data, the B-matrix is a fuzzy coding of the structural data when the order of the splines is greater than 1 (van Rijckvorsel, 1988). We have

$$T_{kj} = \sum_{i=1,\dots,r} B^j_{ki}\, a_{ij} = \sum_{i=1,\dots,r} B_{k,(j-1)r+i}\, a_{ij},$$

or in matrix notation,

$$T = BA. \qquad (1)$$

The array A of dimensions rp × p, containing the parameters a_{ij}, takes the block-diagonal form

$$A = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
a_{r1} & 0 & \cdots & 0 \\
0 & a_{12} & & 0 \\
\vdots & \vdots & & \vdots \\
0 & a_{r2} & & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & a_{rp}
\end{pmatrix}.$$

If {J_{ij}} is the canonical basis of the set of rp × p matrices, then

$$A = \sum_{j=1,\dots,p} \sum_{i=1,\dots,r} a_{ij}\, J_{(j-1)r+i,\,j}. \qquad (2)$$

The parameters can also be written as an element of ℝ^{rp},

$$a = \sum_{j=1,\dots,p} \sum_{i=1,\dots,r} a_{ij}\, e_{(j-1)r+i}, \qquad (3)$$

where {e_i} is the canonical basis of ℝ^{rp}. We may also write a as

$$a' = (a_1' \mid \cdots \mid a_p'),$$

where a_j belongs to ℝ^r. It must be noted that there exist particular values a_{ij} = ξ_{ij}, called the nodal values of the parameters (Schumaker, 1981), such that

$$\sum_{i=1,\dots,r} \xi_{ij} B_{ij}(x_j) = x_j,$$

which give T = X. These nodal values are computed from the knots when m, the order of the splines, is greater than 1:

$$\xi_{ij} = \frac{1}{m-1} \sum_{k=1,\dots,m-1} t^j_{i+k}, \qquad i = 1,\dots,r, \; j = 1,\dots,p.$$
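Continuing the illustrative sketch above (and reusing the hypothetical `bspline_design_matrix`, `x`, `knots` and `B_j` defined there), the nodal values can be checked numerically: used as parameters, they reproduce the identity transformation, so the corresponding column of T equals the column of X.

```python
m = 3                                               # spline order used above
# nodal values: xi_i = (t_{i+1} + ... + t_{i+m-1}) / (m - 1), i = 1, ..., r
xi = np.array([knots[i + 1:i + m].mean() for i in range(len(knots) - m)])
# with these parameters the transformed predictor equals the raw predictor
np.testing.assert_allclose(B_j @ xi, x, atol=1e-10)
```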


The unknown a_j can be constrained if the transformation τ_j is to preserve some properties of the variable x_j. Let us denote by 𝒟 the set of the admissible values for a (in the unconstrained case 𝒟 = ℝ^{rp}). If the jth column of T is to have nonnegative elements, the feasible region K_j is

$$K_j = \{ a_j \in \mathbb{R}^r : B^j a_j \ge 0 \},$$

where ≥ is here the partial ordering of ℝ^n. The admissible region can be written 𝒟 = ∩_{j=1,...,p} K_j. As an aside we note that the set K_j is a cone containing the positive cone ℝ^r_+, but we cannot restrict the vector a_j to be only an element of ℝ^r_+ without dropping possibilities of negative optimal parameters.

3. PCAIV via spline transformations

3.1. Spline-PCAIV and multiresponse additive regression

The aim of the method is to minimize the discrepancy between the operators YQY’D and TRT’D, with T = BA. The optimization variables are the p X p metric R and the array A, or more precisely, the vector a of the parameters. The definition of the norm yields

11 YQY’D - TRT’D I( 2 = II YQY’D II 2 + (I TRT’D II2 - 2 tr(YQY’DTRT’D),

and the objective function f may be defined by

$$f(R, a) = \tfrac14 \|TRT'D\|^2 - \tfrac12\,\mathrm{tr}(YQY'D\,TRT'D). \qquad (4)$$

The problem is to find a metric R̄ and a vector ā (i.e., an array T̄ = BĀ) such that

$$f(\bar R, \bar a) = \min_{R,\; a \in \mathcal{D}} f(R, a). \qquad (5)$$

When a is fixed, it can be shown that an optimal metric is

R(a) = (T’DT)-T’DYQY’DT(T’DT)-, (6)

where (T’DT)- is a generalized inverse of T’DT. In particular, the choice of the nodal values a = .$ is not necessarily optimal but gives T = X. The metric R(t) is the optimal metric for the linear PCAIV problem and the corresponding triple (X, R(t), D> constitutes the first iteration of the algorithm of Section 4. Moreover, a solution is obviously not unique since

Once ā, and thus T̄ = BĀ, are determined, R̄ is given by (6). Denote

$$\hat Y = P_{\bar T}\, Y,$$


where P_T̄ is the D-orthogonal projector on the linear subspace of ℝ^n spanned by the columns of T̄:

$$P_{\bar T} = \bar T (\bar T' D \bar T)^{-} \bar T' D.$$

The spline-PCAIV of (Y, Q, D) with respect to X is the PCA of (T̄, R̄, D), which is equivalent to the PCA of (Ŷ, Q, D) since the operators are identical:

$$\bar T \bar R \bar T' D = (P_{\bar T} Y)\, Q\, (P_{\bar T} Y)' D.$$

As an aside we note that T̄R̄T̄'D is unique for any choice of the generalized inverse of T̄'DT̄, since P_T̄ Y is unique. Spline-PCAIV is thus an extension of regression techniques using additive splines, and P_T̄ may be considered as a multivariate linear smoother (Hastie and Tibshirani, 1990; Buja, Hastie and Tibshirani, 1989) associated with the additive model defined by

$$\hat Y = \bar T \bar M. \qquad (7)$$

The matrix M̄ is p × q and

$$\bar M = (\bar T' D \bar T)^{-} \bar T' D Y. \qquad (8)$$

If we denote by ŷ_j the jth variable of the model, for j = 1,...,q, (7) may be interpreted as

$$\hat y_j = \sum_{i=1,\dots,p} f_i^j(x_i),$$

with

$$f_i^j(x_i) = \bar M_{ij}\, \bar\tau_i(x_i).$$

This transformation is called the optimal additive contribution of x_i to ŷ_j. The contributions of one predictor x_i to two different modeled responses ŷ_j and ŷ_{j'} can be deduced from one another by using an orthogonal affinity,

$$f_i^{j'}(x_i) = k\, f_i^j(x_i).$$

The associated ratio k is given by k = M̄_{ij'}/M̄_{ij}, if M̄_{ij} ≠ 0. Although this model is not capable of fitting perfectly any kind of nonlinear multivariate relationship, it provides an important generalization of the multilinear model. The examples provided in Section 5 highlight the gain in fit obtained by spline-PCAIV compared with linear PCAIV, which constitutes the first step of the iterative algorithm.
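The quantities above are simple to compute once T = BA is available. The following sketch (our own helper, with pseudo-inverses standing in for the generalized inverses) returns the optimal metric of (6), the projector P_T, the fitted responses and the coefficient matrix of (8):

```python
import numpy as np

def fit_given_T(Y, T, Q, D):
    """Optimal metric R of (6), D-orthogonal projector P_T, fitted responses
    Y_hat = P_T Y and coefficient matrix M of (8), for a given array T = BA."""
    G = np.linalg.pinv(T.T @ D @ T)                  # a generalized inverse (T'DT)^-
    R = G @ T.T @ D @ Y @ Q @ Y.T @ D @ T @ G        # equation (6)
    P_T = T @ G @ T.T @ D                            # projector on span(T), D metric
    Y_hat = P_T @ Y                                  # fitted (modeled) responses
    M = G @ T.T @ D @ Y                              # equation (8), so Y_hat = T M
    return R, P_T, Y_hat, M
```

With these outputs, the operator identity of the previous display, T R T'D = Ŷ Q Ŷ'D, can be verified with np.allclose, and the quality of the fit is judged by the norm of YQY'D − TRT'D.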

By the use of regression splines, spline-PCAIV combines the nonlinear approach with the linear one of the PCA. The latter method introduces useful properties for the interpretation of the results and applications to multivariate data analysis.


3.2. Spline-PCAIV and canonical analysis

Without loss of generality it can be assumed that T is centered with respect to D for any value of A (one can take B centered with respect to D). The PCA of (P_T Y, (Y'DY)^-, D) (i.e., the linear PCAIV of (Y, (Y'DY)^-, D) and T) is effectively equivalent to the canonical analysis of Y and T (Sabatier, 1987; Escoufier, 1987). The moments of inertia are the squares of the canonical correlation coefficients, and the canonical variables are collinear to the principal components of the PCAIV.

Proposition 1. The spline-PCAIV of (Y, (Y'DY)^-, D) and X is the best canonical analysis between Y and any T defined by (1), in the sense that

$$\mathrm{tr}\big((P_Y P_{\bar T})^2\big) \ge \mathrm{tr}\big((P_Y P_T)^2\big).$$

This result may be compared with monotone spline canonical correlation (Ramsay, 1988). Here, we do not require the response variables to be transformed by splines, and the spline transformations of the instrumental variables are not necessarily monotonic. When Y is the indicator matrix of q groups, the spline-PCAIV between (Y, (Y'DY)^{-1}, D) and X can thus be considered as the best discriminant analysis based on Y and any T given by (1). As a consequence of this result, it is easy to see how discriminant analysis uses the qualitative variable associated with Y. The distances between objects are computed from (Y'DY)^{-1} and thus depend on the number of items which are found in the groups. Different choices for D give the opportunity of selecting distances not deduced from the observed objects but coming from the meaning of the classes.
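A direct way to evaluate the criterion of Proposition 1 for any candidate T is sketched below (our own function name; Y and T are assumed centered with respect to D, as above):

```python
import numpy as np

def canonical_criterion(Y, T, D):
    """tr((P_Y P_T)^2): the quantity that Proposition 1 says is maximized by the
    spline-PCAIV solution when Q = (Y'DY)^-; it grows with the canonical
    correlations between Y and T."""
    P_Y = Y @ np.linalg.pinv(Y.T @ D @ Y) @ Y.T @ D
    P_T = T @ np.linalg.pinv(T.T @ D @ T) @ T.T @ D
    M = P_Y @ P_T
    return float(np.trace(M @ M))
```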

4. Algorithmic considerations

Before presenting the algorithm used when searching for an approximate solution to the optimization problem defined by (4) and (5), Proposition 2 gives the explicit expression of the gradient of the function f with respect to a.

Proposition 2. Denoting the gradient of f in a as g, for i = 1,...,r and j = 1,...,p,

$$g_{ij} = \frac{\partial f}{\partial a_{ij}} = \big[ B'D(BARA'B' - YQY')DBAR \big]_{(j-1)r+i,\,j}. \qquad (9)$$
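As a numerical sanity check of formula (9), the entry-wise gradient can be compared with central finite differences of the objective (4) for fixed R. The dimensions, data and helper names below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q, r = 30, 2, 2, 4
Y, B = rng.normal(size=(n, q)), rng.normal(size=(n, r * p))
Q, D, R = np.eye(q), np.eye(n) / n, np.eye(p)
a = rng.normal(size=r * p)

def to_A(vec):                       # block structure (2): rp x p array from the rp-vector
    A = np.zeros((r * p, p))
    for j in range(p):
        A[j * r:(j + 1) * r, j] = vec[j * r:(j + 1) * r]
    return A

def f(vec):                          # objective (4) for this fixed R
    S = B @ to_A(vec) @ R @ to_A(vec).T @ B.T @ D
    return 0.25 * np.trace(S @ S) - 0.5 * np.trace(Y @ Q @ Y.T @ D @ S)

A = to_A(a)
G = B.T @ D @ (B @ A @ R @ A.T @ B.T - Y @ Q @ Y.T) @ D @ B @ A @ R
g = np.concatenate([G[j * r:(j + 1) * r, j] for j in range(p)])   # formula (9)

g_num = np.array([(f(a + 1e-6 * e) - f(a - 1e-6 * e)) / 2e-6 for e in np.eye(r * p)])
assert np.allclose(g, g_num, rtol=1e-4, atol=1e-6)
```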

Explicit knowledge of the gradients of f with respect to R, which leads to (6), and with respect to a allows the implementation of a numerical relaxation method. An initial value of a is chosen. The method consists in alternating a step of computing an optimal R by (6) for fixed a, and a step of a descent algorithm with respect to a, R being fixed. If the initial choice is a = ξ, the first step is nothing more than taking the optimal metric of the linear PCAIV. In the absence of constraints, the chosen direction of descent could be that of a


quasi-Newton method. In the case where the vector a of parameters is constrained, the direction of descent will be given by the projection of the gradient on the linear subspace spanned by the fulfilled constraints (projected gradient method). Unless the objective function defined by (4) is convex with respect to a, which is generally not the case, this algorithm leads to a local optimum. However, Proposition 3 will clarify the nature of f as a function of a and will provide an explicit choice for the one-dimensional minimization. By analogy with the notation in (2) and (3), the vector w of ℝ^{rp} defining the direction of descent starting from the point a is written as

$$w = \sum_{j=1,\dots,p} \sum_{i=1,\dots,r} w_{ij}\, e_{(j-1)r+i},$$

and the corresponding rp × p matrix as

$$W = \sum_{j=1,\dots,p} \sum_{i=1,\dots,r} w_{ij}\, J_{(j-1)r+i,\,j}.$$

Let φ be the numerical line-search function starting from the point a and defined by

$$\varphi(t) = f(R, a + tw) - f(R, a).$$

Proposition 3.

$$\frac{d\varphi(t)}{dt} = \sum_{i=0,\dots,3} \alpha_i\, t^i,$$

with

$$\alpha_0 = \mathrm{tr}\big(RT'D(TRT' - YQY')DBW\big) = g'w,$$

$$\alpha_1 = \mathrm{tr}(RA'CARW'CW) + \mathrm{tr}\big((RA'CW)^2\big) + \mathrm{tr}(RW'CARA'CW) - \mathrm{tr}(RW'B'DYQY'DBW),$$

$$\alpha_2 = 3\,\mathrm{tr}(RA'CWRW'CW),$$

$$\alpha_3 = \mathrm{tr}\big((RW'CW)^2\big),$$

where C = B'DB.

Consequently the function f may not be convex in a, since φ is a polynomial of degree 4. If we take t ≥ 0, a matrix H such that

$$w = -Hg, \qquad \alpha_0 = \left.\frac{d\varphi(t)}{dt}\right|_{t=0} = -g'Hg < 0,$$

defines w as a direction of descent. A quasi-Newton method, when there are no constraints, uses a positive-definite matrix H. The choice H = I_{rp} gives α₀ = −‖g‖₂² and leads to a gradient descent method. When there are constraints, the projected gradient method takes H to be the projection on the fulfilled constraints and gives α₀ = −‖Hg‖₂². The positive coefficient α₃ = ‖BWRW'B'D‖² ensures the existence of a positive root of dφ(t)/dt = 0.


Table 1
The spline-PCAIV algorithm

Set a = ξ; ε > 0 (convergence criterion);
current step:
  1) T = BA; compute R by (6);
  2) compute g by (9);
     if ‖g‖₂ < ε then Ŷ = P_T Y; PCA(Ŷ, Q, D); (*) stop.
     else w = −Hg;
          t̄ = arg min_t φ(t);
          a ← a + t̄w; goto 1);

Note: the line (*) may be replaced by PCA(T, R, D).

Since this equation is of degree three, solving it does not in theory call for an iterative method, and a value of t giving the absolute minimum of φ is explicitly known. The diagram in Table 1 summarizes the unconstrained algorithm processed in the applications of the following section.
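Putting the pieces together, a compact sketch of the unconstrained algorithm of Table 1 could look as follows. The function and variable names are ours, and the line search recovers the quartic φ of Proposition 3 by exact polynomial interpolation rather than through the closed-form coefficients; it is an illustration, not the author's S implementation.

```python
import numpy as np

def spline_pcaiv(Y, B, Q, D, r, p, a0, n_iter=100, eps=1e-8):
    """Unconstrained spline-PCAIV iterations (Table 1), gradient descent version.

    Y: n x q responses; B: n x rp matrix of B-spline blocks; Q, D: the metrics;
    a0: starting rp-vector of spline parameters (e.g. the nodal values, T = X).
    Returns the final parameter vector a, the metric R and the array T = BA.
    """
    Z = Y @ Q @ Y.T @ D                              # characteristic operator of (Y, Q, D)

    def to_A(a):                                     # block structure (2)
        A = np.zeros((r * p, p))
        for j in range(p):
            A[j * r:(j + 1) * r, j] = a[j * r:(j + 1) * r]
        return A

    def metric(T):                                   # optimal metric R(a), equation (6)
        G = np.linalg.pinv(T.T @ D @ T)
        return G @ T.T @ D @ Y @ Q @ Y.T @ D @ T @ G

    def f(a, R):                                     # objective (4)
        S = B @ to_A(a) @ R @ to_A(a).T @ B.T @ D
        return 0.25 * np.trace(S @ S) - 0.5 * np.trace(Z @ S)

    def grad(a, R):                                  # gradient (9), stacked block by block
        A = to_A(a)
        G = B.T @ D @ (B @ A @ R @ A.T @ B.T - Y @ Q @ Y.T) @ D @ B @ A @ R
        return np.concatenate([G[j * r:(j + 1) * r, j] for j in range(p)])

    a = np.asarray(a0, float).copy()
    for _ in range(n_iter):
        T = B @ to_A(a)
        R = metric(T)                                # step 1: optimal R for fixed a
        g = grad(a, R)                               # step 2: gradient at (R, a)
        if g @ g < eps:
            break
        w = -g                                       # descent direction (H = identity)
        # phi(t) = f(R, a + t w) is a quartic in t (Proposition 3): recover it
        # exactly from five evaluations, then minimize over t > 0
        ts = np.linspace(0.0, 1.0, 5)
        poly = np.polyfit(ts, [f(a + t * w, R) for t in ts], 4)
        roots = np.roots(np.polyder(poly))
        cand = [z.real for z in roots if abs(z.imag) < 1e-8 and z.real > 0]
        t_bar = min(cand, key=lambda t: np.polyval(poly, t)) if cand else 1e-4
        a = a + t_bar * w                            # step 3: update a and go to 1)
    T = B @ to_A(a)
    return a, metric(T), T
```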

5. Applications

The unconstrained method implemented in the following examples uses a gradient descent algorithm, H being the identity. A more sophisticated matrix H can be used in order to reduce computational time (quasi-Newton method). A spline-PCAIV program has been developed on a Sun4 system using the S language (Becker, Chambers and Wilks, 1988). When the number of X or Y variables is one or two, there is no need to simplify the representation of the objects by using PCA: the first and second examples will only test the capabilities of P_T as a linear smoother. On the other hand, once the iterations are stopped, the third example processes the eigenanalysis corresponding to the PCA of (T, R, D).

5.1. The reconstruction of an anamorphosis

We now assume that a well-defined spatial structure in the objects is known. Although such clear-cut structures are relatively scarce, the following example will show how spline-PCAIV works and will test the exactness of the fit according to the existence of additive relations between the variables of the two studies.

The number of variables q being 2, the representation of the objects provided by the triple (Y, Q, D) constitutes a plane form called the reference form. This form, Y, consists of n = 100 points in the plane. The metrics used here are



Fig. 1. The reference form Y is plotted in (a) while X, the additive anamorphosis of Y, is displayed in (b). The location of the interior knots is indicated by the horizontal and vertical lines.

Q = I₂ and D = 100⁻¹I₁₀₀. The object has the dog-like shape shown in Figure 1(a). The representation of the same 100 objects provided by the triple (X, I₂, D) with p = 2 is the anamorphosis of Y, or the form to be distorted, taking Y for a model. This anamorphosis, termed X, is constructed using the transformations given

by

$$x_1 = \tfrac{1}{10}\arccos\!\big(\tfrac13 y_2 - \tfrac16 y_1\big), \qquad x_2 = \big(\tfrac13 y_2 - \tfrac23 y_1\big)^{1/3}.$$

This leads to the form shown in Figure 1(b). These transformations have been chosen so that the link between the Y and X variables takes the additive form

$$y_1 = 2\cos(10 x_1) - 2 x_2^3, \qquad y_2 = 4\cos(10 x_1) - x_2^3. \qquad (10)$$

The splines used here are B-splines of order 3 (degree 2) with 3 interior knots represented by the vertical and horizontal lines in Figure 1(b). Figure 2 summarizes the results of the spline-PCAIV algorithm and illustrates the exactness of the reconstruction. Let us first examine Figures 2(g) and 2(h), which indicate how the algorithm behaves up to the 60th step. The values of the fitting criterion ‖YQY'D − TRT'D‖ are plotted in Figure 2(g). They constitute a decreasing sequence that converges towards 0 in the present case, thus indicating that the reconstitution is exact. The moment of stopping the algorithm is given by the values of the gradient norm plotted in Figure 2(h). They form a sequence converging to zero, not necessarily decreasing. The modeled dog, given by Ŷ = P_T̄ Y and obtained at the 60th step, is plotted in Figure 2(i). The efficiency of the reconstitution may be examined by looking at Figures 2(e) and 2(f), which display the response residuals plotted against the response variables. The additive contributions of the explanatory variables, given by (7) and (8), are displayed in Figures 2(a)-(d). They are very close to the exact


Fig. 2. Results of the spline-PCAIV algorithm at the 60th step. The reconstituted dog Ŷ = P_T̄ Y is displayed in (i). Panels (a), (b), (c) and (d) give the additive contributions of the explanatory variables compared with the exact transformations plotted using + signs. Panels (e) and (f) display the response residuals plotted against the respective responses.

additive contributions given by (10) and plotted using + marks: Figures 2(a) and 2(c) display the graph of the function cos(10x₁) with scale ratios 2 and 4 respectively; Figures 2(b) and 2(d) plot the function −x₂³ with respective ratios 2 and 1.

5.2. Spline-PCAIV processed as an additive regression on the ozone data

The data on the atmospheric concentration of ozone collected in the Los Angeles area for 330 days in 1976 have been analysed in Breiman and Friedman (1985) using the ACE algorithm. These data are processed in Buja, Hastie and Tibshirani (1989) using cubic smoothing splines, and a comparison between the different multipredictor additive regression techniques implemented on these data can be found in Hastie and Tibshirani (1990). The ozone concentration is here analysed depending on the visibility, the Sandburg base temperature, the inversion base height, the Daggett pressure gradient and the day of the year,


Fig. 3. Ozone concentration is considered as the response of five atmospheric predictors whose transformations are plotted in (a), (b), (c), (d) and (e). The gain in fit of spline-PCAIV compared with linear PCAIV, or linear regression, is 90 − 74 = 16, see (f). The effectiveness of the approximation may be examined in (f) and (h).

considered as the predictor variables, see Figure 3. The B-splines and the metrics used in this example are similar to those of the preceding application. Let us first examine Figures 3(f), 3(g) and 3(h), which give indications about the quality of the approximation when the algorithm runs to the 160th step. When the gradient norm is about 0.5, the fitting criterion, or objective function, is near 74, and the gain in fit with respect to the linear regression associated with the first step is 90 − 74 = 16. The efficiency of the approximation may also be gauged by looking at Figure 3(h), which displays the ozone concentration residuals plotted against the ozone concentration. Examine now Figures 3(a)-(e), which plot the additive contributions of the instrumental variables. They are very close to the estimated transformations obtained in the comparison of the different additive methods displayed on page 297 of Hastie and Tibshirani (1990). We confine ourselves to noting that, considering the vertical range covered by each function, the Sandburg base temperature is the most important predictor whereas the inversion base height is the least.



Fig. 4. Results of the iterations processed on the Fisher iris data. The additive contributions of the predictors to the modeled responses can be deduced by orthogonal affinity transformations.

5.3. Spline-PCAIV applied as a discriminant analysis on the Fisher iris data

The sepal length, sepal width, petal length, and petal width were measured on fifty iris specimens from each of the three species, Iris setosa (s), I. versicolor (c), and I. virginica (v), so that X is 150 × 4 and Y, the indicator matrix of the groups, is 150 × 3. The B-splines employed here are of order three with three equidistant knots, and the metrics invoked are D = 150⁻¹I₁₅₀ and Q = (Y'DY)⁻¹. The results of the iterations are displayed in Figure 4 and give the contributions of the four predictors to the three dependent variables (the groups). The result is at first sight strange, even counter-intuitive: why are the transformations so nearly identical (except for reflection) for the three dependent variables? The answer is that the model given by (7) is linear with respect to the spline transformations of the predictors. For example, Petal L. gives contributions with corresponding scale factors that are of the same sign for the first and third groups (Iris (s) and (v)), and of the opposite sign for the second (Iris (c)). In order to appreciate the influence of one predictor on a reconstituted group, we may examine the vertical range of the transformation. It appears that the greatest ranges are obtained by the Petal variables. Considering these two predictors



Fig. 5. Plots of the eigenanalysis corresponding to the PCA of (T̄, R̄, D). The eigenvalues close to 1 yield well-separated groups, and the plot of the predictors corroborates the analysis of their respective influence on the groups given by Figure 4.

only, the vertical ranges for the iris (s) are smaller than the ranges for the iris (c) and (v), which are of the same order. Although their vertical ranges are equivalent, these two groups differ in that the scale factors are all of opposite sign. This analysis is reinforced when examining the plots issued from the PCA of (T̄, R̄, D), Figure 5. The discriminant variables can be deduced from the principal components by using the scale factor λ_j^{-1/2}, where λ_j is the corresponding eigenvalue, see Escoufier (1987). Denote G = (Y'DY)⁻¹Y'DT̄, the matrix whose rows G_i are the centroids of the classes defined by T̄ and Y. The classification rule is: assign t (considered as a row vector) to class g if

$$(t - G_g)(\bar T' D \bar T)^{-} (t - G_g)' = \min_{i=1,\dots,q} (t - G_i)(\bar T' D \bar T)^{-} (t - G_i)'. \qquad (11)$$

There are three misclassified items, marked by circles, which gives, in this example, the same number as the usual quadratic discrimination. It must be noted that there is no need here for the hypothesis of multinormality of the distribution within each group. The disadvantage of using an iterative method is reduced by the fact that the iterations concern only the training sample. Once the optimal spline parameters are computed, the object x which is to be classified is transformed into t by using the optimal B-spline functions, and the classification rule (11) is then applied. In order that the spline transformations make sense, the user has to verify that each observation in the test sample lies within the range of the corresponding variable in the training sample. The method has been tested on 'pathological' examples of classes uniformly distributed in concentric rings (Durand, 1992). In such examples, linear and quadratic discrimination fail, while the more attractive spline-PCAIV results are


similar to those obtained in Breiman and Ihaka (1989) by using the ACE algorithm.
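For completeness, a small sketch of the classification rule (11) (our own helper names; `Y_ind` is the indicator matrix of the groups and `T` the optimally transformed training array):

```python
import numpy as np

def classify(t, T, Y_ind, D):
    """Assign a transformed observation t (a row vector of length p) to the class
    whose centroid G_i is nearest in the (T'DT)^- metric, following rule (11)."""
    G = np.linalg.pinv(Y_ind.T @ D @ Y_ind) @ Y_ind.T @ D @ T   # class centroids
    W = np.linalg.pinv(T.T @ D @ T)                             # metric of rule (11)
    d2 = [(t - Gi) @ W @ (t - Gi) for Gi in G]
    return int(np.argmin(d2))
```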

6. Remarks and prospects

Some important problems concerning regression by additive models were not taken into account here: selection of explanatory variables, choice of the optimal order of the splines, and of the number and position of the knots. These problems are the subject of recent studies (Friedman and Silverman, 1989). Procedures will have to be adapted to the spline-PCAIV algorithm in order to reduce the computational cost while preserving the quality of the approximation. Be that as it may, the iterative approach of the method pays off in the simplicity of the interpretation of the additive model, as in linear analysis. It has already been noted that additive models cannot fit every kind of multivariate relationship whatsoever. In order to address this difficult problem, one should try to introduce into the spline-PCAIV method adaptive techniques using multivariate splines (Friedman, 1991). Generalized PCAIV as defined in the introduction seems to have useful applications. One can, for instance, try to replace spline functions by parametric transformations and thus combine parametric regression with PCA. Other developments in the field of statistical applications of the method can be envisaged when the considered data arrays are contingency tables. Choosing Q and D metrics computed from the marginals will give the opportunity of studying the additive relationship between two contingency tables.

Appendix

Proof of Proposition 1

Since T̄ is a solution of (5), for any T defined by (1),

$$\|YQY'D - \bar T \bar R \bar T' D\|^2 \le \|YQY'D - TRT'D\|^2.$$

For a linear PCAIV between (Y, Q, D) and T (i.e., if R is given by (6)), we have (Bonifas et al., 1984)

$$\|YQY'D - TRT'D\|^2 = \|YQY'D\|^2 \big(1 - \mathrm{RV}^2(YQY'D,\, TRT'D)\big).$$

The RV coefficient, defined on S(D) × S(D) (Robert and Escoufier, 1976), is given by

$$\mathrm{RV}(A, B) = \frac{\mathrm{tr}(AB)}{\big(\mathrm{tr}(A^2)\,\mathrm{tr}(B^2)\big)^{1/2}},$$

and

$$0 \le \mathrm{RV}(A, B) \le 1.$$


The initial inequality becomes

$$\mathrm{RV}(YQY'D,\, \bar T \bar R \bar T'D) \ge \mathrm{RV}(YQY'D,\, TRT'D),$$

for any T defined by (1) and R defined by (6). If, in addition, Q = (Y'DY)⁻,

$$\big(\mathrm{tr}\big((P_Y P_{\bar T})^2\big)\big)^{1/2} \ge \big(\mathrm{tr}\big((P_Y P_T)^2\big)\big)^{1/2}. \qquad \square$$
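The RV coefficient used in this proof is straightforward to compute; a one-function sketch (our own naming):

```python
import numpy as np

def rv(A, B):
    """RV coefficient of two operators of S(D): tr(AB) / (tr(A^2) tr(B^2))^(1/2)."""
    return float(np.trace(A @ B) / np.sqrt(np.trace(A @ A) * np.trace(B @ B)))
```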

Proof of Proposition 2

For notation and results pertaining to tensor calculus and matrix derivatives, see Rogers (1980). The lemma first gives the expression for the derivative of f with respect to T; the formal derivation of f with respect to A then leads to the result desired.

Lemma.

$$\frac{\partial f(R, a)}{\partial T} = D(TRT' - YQY')DTR.$$

Proof. If K_{(n,p)} is the commutation matrix, it is easy to show that

$$\frac{\partial (TRT'D)}{\partial T} = \mathrm{vec}\, I_n\, (\mathrm{vec}\, RT'D)' + (TR \otimes I_n)\, K_{(n,p)}\, (D \otimes I_p).$$

Let us denote Z = YQY'D, which is constant and belongs to S(D). Using the star-product ∗, we have

$$\mathrm{tr}(ZTRT'D) = Z' * (TRT'D),$$

and then

$$\frac{\partial\, \mathrm{tr}(ZTRT'D)}{\partial T} = Z' * \frac{\partial (TRT'D)}{\partial T} = Z'DTR + DZTR = 2\, DZTR = 2\, DYQY'DTR.$$

It should be noted that this result is a straightforward application of a formula in Table 4, page 178, of Magnus and Neudecker (1988). The other part of f(R, a) is derived as

allTRT’D112 a tr((TRT’D)2)

aT = aT

Zn * a( TRT’D)

= aT

(TRT’D BZ,) + (TRT’D c9zJ a( TRT’D)

aT I*


This yields

$$\frac{\partial \|TRT'D\|^2}{\partial T} = 4\, DTRT'DTR,$$

from which the result can be deduced. □

Finally,

$$\frac{\partial f(R, a)}{\partial A} = \frac{\partial f(R, a)}{\partial T} * \frac{\partial T}{\partial A}, \qquad \frac{\partial T}{\partial A} = \mathrm{vec}\, B\, (\mathrm{vec}\, I_p)',$$

which gives

$$\frac{\partial f(R, a)}{\partial A} = B'D(TRT' - YQY')DTR.$$

The last formula gives the derivative with respect to A. Taking into account the expression of A given by (2), we have

$$\frac{\partial f(R, a)}{\partial a_{ij}} = \frac{\partial f(R, a)}{\partial A} * J_{(j-1)r+i,\,j} = \mathrm{tr}\!\left[ J'_{(j-1)r+i,\,j}\, \frac{\partial f(R, a)}{\partial A} \right] = \big[ B'D(BARA'B' - YQY')DBAR \big]_{(j-1)r+i,\,j}. \qquad \square$$

Proof of Proposition 3

The result may be obtained by replacing T with T + tBW in (4) and then differentiating with respect to t. The following relationship is relevant:

$$\frac{d f(R, a + tw)}{dt} = \left[\frac{\partial f}{\partial T} * BW\right]_{T + tBW} = \mathrm{tr}\big[ RT'D(TRT' - YQY')DBW \big]_{T = T + tBW}.$$

The development of this formula leads to the result by using properties of the trace and the fact that the matrices R and W'B'DBW are symmetric. □

Acknowledgements

I would like to express my thanks to Yves Escoufier for his valuable advice. I thank Alain Berlinet, Robert Sabatier, and Susan Holmes for their careful reading of the manuscript and helpful remarks.


References

Becker, R.A., J.M. Chambers and A.R. Wilks, The New S Language (Wadsworth and Brooks, Pacific Grove, CA, 1988).

Bonifas, L., Y. Escoufier, P.L. Gonzales and R. Sabatier, Choix de variables en analyse en composantes principales, Revue de Statistique Appliquée, XXXII (1984) 5-15.

Breiman, L. and J.H. Friedman, Estimating optimal transformations for multiple regression and correlation (with discussion), Journal of the American Statistical Association, 80 (1985) 580-618.

Breiman, L. and R. Ihaka, Nonlinear discriminant analysis via scaling and ACE, Technical report (Statistics Department, University of California, Berkeley, 1989).

Buja, A., T. Hastie and R. Tibshirani, Linear smoothers and additive models, The Annals of Statistics, 17 (1989) 453-555.

Burg, E. van der and J. de Leeuw, Nonlinear redundancy analysis, British Journal of Mathematical and Statistical Psychology, 43 (1990) 217-230.

Durand, J.F., Additive spline discriminant analysis, in: Y. Dodge and J.C. Whittaker (Eds.), Computational Statistics, I (Physica-Verlag, Heidelberg, 1992) pp. 145-150.

Escoufier, Y., Principal components analysis with respect to instrumental variables, European Courses in Advanced Statistics (University of Napoli, 1987) pp. 285-299.

Escoufier, Y. and S. Holmes, Data analysis in France, Biometric Bulletin, 7 (3) (1990) 27-28.

Fisher, R.A., The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7 (1936) 179-188.

Friedman, J.H., Multivariate adaptive regression splines (with discussion), The Annals of Statistics, 19 (1991) 1-123.

Friedman, J.H. and B.W. Silverman, Flexible parsimonious smoothing and additive modeling, Technometrics, 31 (1989) 3-39.

Hastie, T. and R. Tibshirani, Generalized Additive Models (Chapman and Hall, London, 1990).

Magnus, J. and H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics (Wiley, New York, 1988).

Ramsay, J.O., Monotone regression splines in action, Statistical Science, 3 (1988) 425-461.

Rao, C.R., The use and the interpretation of principal component analysis in applied research, Sankhya A, 26 (1964) 329-358.

Rijckevorsel, J. van, Fuzzy coding and B-splines, in: J. van Rijckevorsel and J. de Leeuw (Eds.), Component and Correspondence Analysis (Wiley, New York, 1988) pp. 33-55.

Robert, P. and Y. Escoufier, A unifying tool for linear multivariate methods: The RV-coefficient, Applied Statistics, 25 (1976) 257-265.

Rogers, G.S., Matrix Derivatives, Lecture Notes in Statistics, Vol. 2 (M. Dekker, New York, 1980).

Sabatier, R., Méthodes factorielles en analyse des données: Approximations et prise en compte des variables concomitantes, Thèse d'état (USTL, Montpellier, 1987).

Schumaker, L.L., Spline Functions: Basic Theory (Wiley, New York, 1981).

Zaamoun, S., Fonctions splines en analyse de données, Thèse de doctorat (CNAM, Paris, 1989).