Robustness and Reproducing Kernel Hilbert Spaces Grace...

Robustness and Reproducing Kernel Hilbert Spaces

Grace Wahba

Part 1. Regularized Kernel Estimation (RKE). (Robustly)

Part 2. Smoothing Spline ANOVA (SS-ANOVA) and RKE.

Part 3. Partly missing covariates: SS-ANOVA and QPLE. (Robustly)

ICORS 2010, Prague, Czech Republic, June 30, 2010

These slides at http://www.stat.wisc.edu/~wahba/ -> TALKS

Link to the references and all papers/preprints since 1993 at http://www.stat.wisc.edu/~wahba/ -> TRLIST

May 30, 2010


References

1. F. Lu, S. Keles, S. Wright, and G. Wahba. A framework for kernel regularization with application to protein clustering. Proceedings of the National Academy of Sciences, 102:12332–12337, 2005. RKE.

2. H. Corrada Bravo, G. Wahba, K. E. Lee, B. E. K. Klein, R. Klein, and S. K. Iyengar. Examining the relative influence of familial, genetic and environmental covariate information in flexible risk models. Proceedings of the National Academy of Sciences, 106:8128–8133, 2009. RKHS, SS-ANOVA, RKE.

3. X. Ma, B. Dai, R. Klein, B. Klein, K. Lee, and G. Wahba. Penalized likelihood regression in reproducing kernel Hilbert spaces with randomized covariate data. TR 1958, Statistics, University of Wisconsin-Madison, 2010. Missing covariates, Quadrature Penalized Likelihood Estimates (QPLE).


Part 1. Regularized Kernel Estimation RKE

Given scattered, noisy, non-metric pairwise dissimilarity information $d_{ij}$ between pairs $ij$ of $n$ objects, embed these objects in a Euclidean space that attempts to preserve the dissimilarity information as much as possible. Find an $n \times n$ distance-encoding matrix $R_{dist}$ by solving the convex cone optimization problem:

$$\min_{R \succeq 0} \sum_{(i,j) \in \Omega} |d_{ij} - \hat{d}_{ij}(R)| + \lambda_{RKE}\,\mathrm{trace}(R) \qquad (1)$$

where $R \succeq 0$ means $R$ is in the convex cone of all real non-negative definite matrices of dimension $n$, $\Omega$ is all or a (sufficiently rich) subset of the $\binom{n}{2}$ pairs of indices, and $\hat{d}_{ij}(R) \equiv R(i,i) + R(j,j) - 2R(i,j)$ is the natural squared distance induced by $R$. Robust against dissimilarity data not satisfying the triangle inequality!
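To make the pieces of objective (1) concrete, here is a minimal Python sketch that only evaluates the objective for a given candidate matrix R. It is not a solver (the minimization over the cone needs a semidefinite programming routine, which is out of scope here), it does not check that R is non-negative definite, and the function names and toy numbers are hypothetical.

```python
def induced_sq_dist(R, i, j):
    """Squared distance induced by R: R(i,i) + R(j,j) - 2 R(i,j)."""
    return R[i][i] + R[j][j] - 2.0 * R[i][j]


def rke_objective(R, dissim, lam):
    """Evaluate objective (1) for a candidate matrix R.

    dissim maps the index pairs (i, j) in Omega to observed dissimilarities d_ij.
    """
    fit = sum(abs(d - induced_sq_dist(R, i, j)) for (i, j), d in dissim.items())
    trace = sum(R[k][k] for k in range(len(R)))
    return fit + lam * trace


# Toy data (hypothetical): three objects, with the pair (1, 2) unobserved.
R = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 2.0]]
dissim = {(0, 1): 1.2, (0, 2): 3.0}
value = rke_objective(R, dissim, lam=0.1)
print(value)
```

Because the fit term is a sum of absolute deviations rather than squares, a single wildly inconsistent dissimilarity contributes linearly, not quadratically, to the objective.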


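Small eigenvalues in the fitted $R_{dist}$ are deleted, leaving $r$ non-zero eigenvalues. $R_{dist}$ then gives an embedding $z(i)$ (unique up to rotation) of the $i$th object in Euclidean $r$-dimensional space via

$$R_{dist} = \Gamma_{n \times r} \Lambda_r \Gamma'_{r \times n}, \qquad Z_{n \times r} = \Gamma \Lambda^{1/2}.$$

The coordinates $z(i)$ of the $i$th object are given by the $i$th row of $Z$, so that $(z(i), z(j)) = R_{dist}(i,j)$ and $\|z(i) - z(j)\|^2 = \hat{d}_{ij}(R_{dist})$, the squared distance induced by $R_{dist}$.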
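The truncation-and-embedding step can be sketched in plain Python as follows. This is an illustration under stated assumptions: the eigen-decomposition uses a textbook cyclic Jacobi iteration (chosen only to keep the example dependency-free, and suitable only for small matrices), and the names `jacobi_eigh` and `rke_embedding` and the toy points are hypothetical, not from the talk.

```python
import math


def jacobi_eigh(A, sweeps=100, tol=1e-20):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, V); the columns of V are the eigenvectors.
    """
    n = len(A)
    a = [row[:] for row in A]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q of a and of v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    return [a[i][i] for i in range(n)], v


def rke_embedding(R, r):
    """Keep the r largest eigenvalues of R and return Z = Gamma * Lambda^(1/2)."""
    lam, V = jacobi_eigh(R)
    top = sorted(range(len(R)), key=lambda k: lam[k], reverse=True)[:r]
    return [[V[i][k] * math.sqrt(max(lam[k], 0.0)) for k in top] for i in range(len(R))]


def sq_dist(u, w):
    return sum((a - b) ** 2 for a, b in zip(u, w))


# Toy check (hypothetical data): R is the Gram matrix of four points in the plane.
pts = [(1.0, 0.0), (1.0, 1.0), (0.0, 2.0), (-1.0, 1.0)]
R = [[sum(a * b for a, b in zip(u, w)) for w in pts] for u in pts]
Z = rke_embedding(R, r=2)
max_err = max(abs(sq_dist(Z[i], Z[j]) - sq_dist(pts[i], pts[j]))
              for i in range(4) for j in range(4))
print(max_err)
```

The recovered rows of Z generally differ from the original points by a rotation or reflection, which is why the check compares pairwise squared distances rather than coordinates.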


RKE example: proteins with BLAST scores.

Figure 1: 3D representation of the sequence space for 280 proteins from the globin family. Red: α-globin subfamily, blue: β-globins, purple: myoglobin subfamily, and green: a heterogeneous group encompassing proteins from other small subfamilies within the globin family. Note that in this example three, or even two, dimensions are enough to separate the subfamilies.


Eigenvalues of Rdist from BLAST scores example as λ varies.

[Figure: six eigenvalue plots. The first five show log10(eigenvalue) against rank (1 to 280) for λ = 1e−08, 0.1, 1, 10 and 250; the sixth shows the leading eigenvalues for λ = 1 on a linear scale.]

Figure 2: The effect of varying λ on the eigenvalues of R_dist. The left five images show log-scale eigensequence plots for five values of λ. As λ increases, smaller eigenvalues begin to shrink. The rightmost image shows the first 10 eigenvalues of the λ = 1 case displayed on a larger scale. In this example the plots are insensitive to λ over several orders of magnitude.


In this BLAST example the results may be similar to what one might get using multidimensional scaling with a specified dimension of two or three, but perhaps more fault-tolerant. In the next section we will use the RKE in an entirely different context: we have a population from a demographic study of eye diseases where one-third of the population has at least one relative in the study. There is an outcome (pigmentary abnormalities, PA: yes or no) and attributes: genetic markers, environmental/clinical (E/C) data, and pedigrees. The RKE will be used to embed the subjects in a Euclidean space using the pedigree information (siblings close, niece-aunt not so close, etc.), and the resulting coordinates will be added to the genetic and E/C attributes in a penalized logistic regression Smoothing Spline ANOVA model estimating the risk of PA. From reference 2. Switch gears now to model building with this data.


Part 2. Smoothing Spline ANOVA SS-ANOVA Models.

The Log Likelihood for Bernoulli responses:

• Given: $y_i, x(i)$, $i = 1, 2, \cdots, n$, with $y \in \{0, 1\}$ and $x = (x_1, x_2, \cdots, x_d)$. Estimate: $p(x) = \mathrm{Prob}(y = 1 \mid x)$.

• The log odds ratio (logit): $f(x) = \log \frac{p(x)}{1 - p(x)}$.

The negative log likelihood:

$$L(y, f) = \sum_{i=1}^{n} \left[ -y_i f(x(i)) + \log\left(1 + e^{f(x(i))}\right) \right]$$

• Recover $p(x) = e^{f(x)} / (1 + e^{f(x)})$.
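The formulas above translate directly into code. The following Python sketch evaluates the negative log likelihood and recovers p from the logit; the overflow-safe rearrangements and the toy values of y and f are choices made here for illustration, not part of the slides.

```python
import math


def softplus(f):
    """log(1 + exp(f)), computed without overflow for large |f|."""
    return max(f, 0.0) + math.log1p(math.exp(-abs(f)))


def neg_log_lik(y, f):
    """Bernoulli negative log likelihood: sum_i [-y_i f_i + log(1 + e^{f_i})]."""
    return sum(-yi * fi + softplus(fi) for yi, fi in zip(y, f))


def prob_from_logit(f):
    """Recover p = e^f / (1 + e^f) from the logit f."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    e = math.exp(f)
    return e / (1.0 + e)


# Hypothetical responses and logit values at three covariate points.
y = [1, 0, 1]
f = [2.0, -1.0, 0.0]
nll = neg_log_lik(y, f)
p = [prob_from_logit(fi) for fi in f]
print(nll, p)
```

The value of `nll` agrees with the familiar form, minus the sum of y log p + (1 − y) log(1 − p), which is a quick way to check the algebra.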


Penalized Log Likelihood Estimate

The penalized log likelihood estimate of $f$ is obtained by finding $f$ in some prescribed function space to minimize

$$I(f) = L(y, f) + \lambda J(f)$$

where $J(f)$ is a penalty functional on $f$ and $\lambda$ is a tuning parameter which balances fit to the data and complexity/wiggliness of $f$. We will fit $f$ in a function space which admits a useful ANOVA decomposition, a Reproducing Kernel Hilbert Space (RKHS), using an SS-ANOVA model.
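As a rough illustration of minimizing I(f), the Python sketch below assumes that f is a kernel expansion with coefficients c, so that the fitted values are Kc for a Gram matrix K, and it takes J(f) = c'Kc. The Gaussian kernel, the fixed value of lam, the plain gradient descent, and all names and data are assumptions made for this example; the talk itself tunes λ with GACV and does not prescribe this solver.

```python
import math


def sigmoid(f):
    return 1.0 / (1.0 + math.exp(-f)) if f >= 0 else math.exp(f) / (1.0 + math.exp(f))


def fitted(K, c):
    """Values f(x(i)) = sum_j K(x(i), x(j)) c_j."""
    return [sum(kij * cj for kij, cj in zip(row, c)) for row in K]


def objective(c, K, y, lam):
    """I(f) = L(y, f) + lam * J(f), with J(f) = c' K c."""
    f = fitted(K, c)
    nll = sum(-yi * fi + max(fi, 0.0) + math.log1p(math.exp(-abs(fi)))
              for yi, fi in zip(y, f))
    penalty = sum(ci * fi for ci, fi in zip(c, f))
    return nll + lam * penalty


def fit(K, y, lam, step=0.05, iters=500):
    """Plain gradient descent on c; the gradient is K (p - y + 2 lam c)."""
    c = [0.0] * len(y)
    for _ in range(iters):
        f = fitted(K, c)
        resid = [sigmoid(fi) - yi + 2.0 * lam * ci for fi, yi, ci in zip(f, y, c)]
        grad = fitted(K, resid)
        c = [ci - step * gi for ci, gi in zip(c, grad)]
    return c


# Hypothetical one-dimensional data with a Gaussian kernel.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1, 1]
K = [[math.exp(-0.5 * (s - t) ** 2) for t in x] for s in x]
lam = 0.1
c_hat = fit(K, y, lam)
start = objective([0.0] * len(y), K, y, lam)
end = objective(c_hat, K, y, lam)
print(start, end)
```

Larger values of lam pull the coefficients toward zero and the fitted probabilities toward 1/2, which is the fit-versus-wiggliness trade-off described above.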


Reproducing Kernel Hilbert Spaces RKHS

• f will be in an RKHS. What is an RKHS?

• Let $K(s, t)$ be a positive definite function on $T \otimes T$. This means that for any $t_1, \cdots, t_k \in T$ and any real $a_1, \cdots, a_k$, $\sum_{r,s=1}^{k} a_r a_s K(t_r, t_s) \geq 0$.

• Moore-Aronszajn Theorem: To every positive definite function $K(\cdot, \cdot)$ there corresponds a unique RKHS $H_K$ and vice versa.

• $K(\cdot, t_*) \in H_K$ for all $t_* \in T$, and $\langle K(\cdot, s), K(\cdot, t) \rangle = K(s, t)$.

• All linear combinations of the $K(\cdot, t)$, $t \in T$, and their limits in the norm induced by the inner product constitute $H_K$.

• $\langle f(\cdot), K(\cdot, t_*) \rangle = f(t_*)$ for all $f \in H_K$. Important!
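A short worked step ties the bullets together. It covers only finite linear combinations (limits follow by continuity), uses the symmetry of K, and the coefficients a_r are notation introduced here for illustration:

```latex
f(\cdot) = \sum_{r=1}^{k} a_r K(\cdot, t_r)
\;\Longrightarrow\;
\langle f, K(\cdot, t_*) \rangle
  = \sum_{r=1}^{k} a_r \langle K(\cdot, t_r), K(\cdot, t_*) \rangle
  = \sum_{r=1}^{k} a_r K(t_r, t_*)
  = f(t_*),
\qquad
\| f \|^2_{H_K} = \langle f, f \rangle
  = \sum_{r,s=1}^{k} a_r a_s K(t_r, t_s) \ge 0 .
```

So the reproducing property follows from bilinearity of the inner product, and positive definiteness of K is exactly what makes the squared norm non-negative.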


To understand Smoothing Spline ANOVA Models:

ANOVA Decomposition of Functions of Several Variables

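$$x \equiv (x_1, \cdots, x_d) \in \mathcal{X} \equiv \mathcal{X}^{(1)} \otimes \cdots \otimes \mathcal{X}^{(d)}, \qquad f(x) = f(x_1, \cdots, x_d).$$

Let $d\mu_\alpha$ be a probability measure on $\mathcal{X}^{(\alpha)}$ and define the averaging operator $E_\alpha$ on $\mathcal{X}$ by

$$(E_\alpha f)(x) = \int_{\mathcal{X}^{(\alpha)}} f(x_1, \cdots, x_d)\, d\mu_\alpha(x_\alpha).$$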


ANOVA Decomposition of Functions of Several Variables (continued)

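The averaging operators $E_\alpha$ give a (unique) ANOVA decomposition of $f$:

$$f(x_1, \cdots, x_d) = \mu + \sum_{\alpha} f_\alpha(x_\alpha) + \sum_{\alpha<\beta} f_{\alpha\beta}(x_\alpha, x_\beta) + \cdots$$

where

$$\begin{aligned}
\mu &= \prod_{\alpha} E_\alpha f = \int \cdots \int f(x_1, \cdots, x_d)\, d\mu_1(x_1) \cdots d\mu_d(x_d) \\
f_\alpha &= (I - E_\alpha) \prod_{\beta \neq \alpha} E_\beta f \\
f_{\alpha\beta} &= (I - E_\alpha)(I - E_\beta) \prod_{\gamma \neq \alpha, \beta} E_\gamma f \\
&\;\;\vdots
\end{aligned}$$

with $E_\alpha f_\alpha = 0$, $E_\alpha E_\beta f_{\alpha\beta} = 0$, etc.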
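For d = 2 the decomposition can be computed by hand on a grid. The Python sketch below assumes uniform probability measures on two finite grids (so each averaging operator is a plain mean); the function name `anova_2d` and the example function are hypothetical.

```python
def anova_2d(F):
    """Functional ANOVA of a table F[a][b] = f(x1_a, x2_b) under uniform measures.

    Returns (mu, f1, f2, f12) with f = mu + f1[a] + f2[b] + f12[a][b].
    """
    na, nb = len(F), len(F[0])
    E2 = [sum(row) / nb for row in F]                                # (E_2 f)(x1)
    E1 = [sum(F[a][b] for a in range(na)) / na for b in range(nb)]   # (E_1 f)(x2)
    mu = sum(E2) / na                                                # E_1 E_2 f
    f1 = [e - mu for e in E2]                                        # (I - E_1) E_2 f
    f2 = [e - mu for e in E1]                                        # (I - E_2) E_1 f
    f12 = [[F[a][b] - E2[a] - E1[b] + mu for b in range(nb)] for a in range(na)]
    return mu, f1, f2, f12


# Hypothetical example: f(x1, x2) = x1 + 2*x2 + x1*x2 on a 3 x 4 grid.
x1 = [0.0, 1.0, 2.0]
x2 = [0.0, 1.0, 2.0, 3.0]
F = [[a + 2 * b + a * b for b in x2] for a in x1]
mu, f1, f2, f12 = anova_2d(F)
print(mu, f1, f2)
```

The main effects average to zero over their own variable and the interaction averages to zero over either variable, which are the side conditions that make the terms identifiable.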


ANOVA Decomposition of Functions of Several Variables (continued)

$$f(x) = \mu + \sum_{\alpha=1}^{d} f_\alpha(x_\alpha) + \sum_{\alpha<\beta} f_{\alpha\beta}(x_\alpha, x_\beta) + \cdots$$

• The series is truncated at some point.

• Terms satisfy ANOVA-like side conditions (identifiable).

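• SS-ANOVA representation with weights on kernels:

$$f(\cdot) = \sum_{j=1}^{m} d_j \phi_j(\cdot) + \sum_{j=1}^{n} c_j K_\theta(\cdot, x(j)),$$

where the $\phi_j$ are unpenalized components (parametric part) and

$$K_\theta(\cdot, \cdot) = \sum_{\alpha=1}^{d} \theta_\alpha K_\alpha(\cdot, \cdot) + \sum_{\alpha<\beta} \theta_{\alpha\beta} K_{\alpha\beta}(\cdot, \cdot) + \cdots$$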

• Kernels depend only on components of x in the subscripts.


The SS-ANOVA penalty functional has the form

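$$J(f) = \sum_{i,j=1}^{n} c_i c_j \left[ \sum_{\alpha=1}^{d} \theta_\alpha^{-1} K_\alpha(x(i), x(j)) + \sum_{\alpha<\beta} \theta_{\alpha\beta}^{-1} K_{\alpha\beta}(x(i), x(j)) + \cdots \right]$$

since $\|f\|^2_{H_{\theta K}} = \theta^{-1} \|f\|^2_{H_K}$. The $\theta$s are tuning parameters, subject to an identifiability constraint, and are chosen along with $\lambda$. We tune this Bernoulli model with RKHS squared norm penalties using the GACV.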
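The scaling identity for the norm can be checked numerically on a finite kernel expansion. In the Python sketch below the Gaussian kernel, the points and the coefficients are arbitrary choices made only for illustration; the key observation is that the same function written in terms of the kernel θK has coefficients a/θ.

```python
import math


def gaussian_kernel(s, t):
    """A Gaussian kernel, used here only as a convenient positive definite K."""
    return math.exp(-0.5 * (s - t) ** 2)


def sq_norm(coef, pts, kernel):
    """Squared RKHS norm of f = sum_i coef_i * kernel(., pts_i), i.e. coef' G coef."""
    return sum(ci * cj * kernel(ti, tj)
               for ci, ti in zip(coef, pts) for cj, tj in zip(coef, pts))


theta = 4.0
pts = [0.0, 0.5, 2.0]
a = [1.0, -2.0, 0.5]


def scaled_kernel(s, t):
    return theta * gaussian_kernel(s, t)


# f = sum_i a_i K(., t_i), measured in H_K.
norm_K = sq_norm(a, pts, gaussian_kernel)

# The same function in terms of theta*K has coefficients a_i / theta.
norm_thetaK = sq_norm([ai / theta for ai in a], pts, scaled_kernel)

print(norm_K, norm_thetaK)
```

A large θ therefore makes the corresponding component cheap in the penalty, while a small θ shrinks it toward zero.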


SS-ANOVA Model in the Beaver Dam Eye Study

• The Beaver Dam Eye Study (BDES) is an ongoing population-based study of age-related ocular disorders, begun in 1988.

• An SS-ANOVA model for association of a number of environmental/clinical (E/C) variables, based on 2585 women with complete E/C data, appears in Lin, Wahba et al., Ann. Statist. 28 (2000).

• 684 women have at least one relative also in the study.


• The predictor variables of present interest are:

code    units    description
horm    yes/no   current usage of hormone replacement therapy
hist    yes/no   history of heavy drinking
bmi     kg/m^2   body mass index
age     years    age at baseline
sysbp   mmHg     systolic blood pressure
chol    mg/dL    serum cholesterol
smoke   yes/no   history of smoking

Table 1: E/C covariates for BDES pigmentary abnormalities SS-ANOVA model


• The fitted E/C model that we are using in the present study is

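$$\begin{aligned}
f(t) = \mu &+ f_1(\mathrm{sys}) + f_2(\mathrm{chol}) + f_{12}(\mathrm{sys}, \mathrm{chol}) \\
&+ d_{age} \cdot \mathrm{age} + d_{bmi} \cdot \mathrm{bmi} \\
&+ d_{horm} \cdot I_1(\mathrm{horm}) + d_{drin} \cdot I_2(\mathrm{drin}) + d_{smoke} \cdot I_3(\mathrm{smoke})
\end{aligned}$$

• This is the same model that was fitted in Ann. Statist. 2000, with the exception that smoke was not included there.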

• f1, f2 and f12 are splines.


[Figure: nine panels (bmi = 24.3, 27.5, 31.6 by age = 52, 62, 71), each plotting probability (0.0 to 0.4) against cholesterol (mg/dL, 100 to 400), with one curve for each of sys = 108, 123, 137, 157.]

Fig. 9. Estimated probability of pigmentary abnormality as a function of cholesterol by three levels of bmi and age and four levels of sys, horm=no, drin=no.

Estimated probability from an SS-ANOVA logistic regression model. Each x-axis is cholesterol, each set of four lines is four values of systolic blood pressure, and each plot fixes body mass index and age to the shown values. hist=0, horm=0. From Ann. Statist. 2000.


Modeling E/C, genetic and pedigree data in an extended SS-ANOVA model

$$\begin{aligned}
f(t) = \mu &+ d_{SNP1,1} \cdot I(X_1 = 12) + d_{SNP1,2} \cdot I(X_1 = 22) \\
&+ d_{SNP2,1} \cdot I(X_2 = 12) + d_{SNP2,2} \cdot I(X_2 = 22) \\
&+ f_1(\mathrm{sysbp}) + f_2(\mathrm{chol}) + f_{12}(\mathrm{sysbp}, \mathrm{chol}) \\
&+ d_{age} \cdot \mathrm{age} + d_{bmi} \cdot \mathrm{bmi} \\
&+ d_{horm} \cdot I_1(\mathrm{horm}) + d_{drin} \cdot I_2(\mathrm{drin}) + d_{smoke} \cdot I_3(\mathrm{smoke}) \\
&+ f_{ped}(z(t)).
\end{aligned}$$

• First two lines: Genetic (SNP) data. Two SNPs, each with three levels: (1,1), (1,2), (2,2). (SNP IDs in TR1148)

• Next three lines: E/C variables.

• Last line: Pedigree/relationship data goes here. Will explain.


A Pedigree from BDES

[Figure: pedigree diagram with numbered individuals.]

Example pedigree from the Beaver Dam Eye Study. Red nodes: with pigmentary abnormalities; blue nodes: without pigmentary abnormalities. Circles are females, rectangles are males.


A Relationship (Sub)Graph From the Pedigree

[Figure: relationship graph on persons 8, 10, 26, 35 and 40, with labeled edges.]

Relationship graph for subjects in the pedigree. Edge labels are distances defined by the kinship coefficient. Persons 26 and 35 are siblings [1], persons 8 and 10 are aunt and niece [2], and persons 26 and 40 are cousins [3]. Unrelated pairs have dashed lines.


Relationship Data Encoded with RKE

• To include relationship/pedigree data in an SS-ANOVA model, we encode it with the Regularized Kernel Estimation algorithm (RKE). (Lu et al., PNAS 2005)

• Given $n$ objects and pairwise dissimilarity measures $d_{ij}$ between a sufficient number of the $\binom{n}{2}$ pairs, the RKE encodes this information in an $n \times n$ positive definite matrix $R_{dist}(i, j)$ defined on the $n$ objects. The $d_{ij}$ are obtained from the relationship coefficients (1, 2, 3, 4, 5, L), where L is "no relation", by a biologically motivated transformation: $d_{ij} = -2\log_2(2\varphi_{ij})$, where $\varphi$ is Malecot's kinship coefficient.
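The transformation is easy to tabulate. The Python sketch below uses the standard kinship coefficients for an outbred pedigree (full siblings 1/4, aunt-niece 1/8, first cousins 1/16); those values are an assumption brought in here, not taken from the slides, and the function name is hypothetical.

```python
import math


def kinship_distance(phi):
    """Dissimilarity d = -2 * log2(2 * phi) for a kinship coefficient phi > 0."""
    return -2.0 * math.log2(2.0 * phi)


# Assumed standard kinship coefficients (not stated in the slides).
examples = {"siblings": 0.25, "aunt-niece": 0.125, "first cousins": 0.0625}
distances = {name: kinship_distance(phi) for name, phi in examples.items()}
print(distances)
```

Under these assumed coefficients the distances come out as 2, 4 and 6, that is, twice the relationship coefficients 1, 2 and 3. Unrelated pairs have a kinship coefficient of zero, for which the formula is undefined, so they need the separate "no relation" treatment.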


Embedding of Pedigree by RKE

[Figure: 3D scatter plot of the embedded points, labeled 26, 40, 10, 35 and 8. The x-axis runs from about −1000 to 1500; the other two axes run from about −0.20 to 0.20 and from −1.5 to 1.5.]

z(i) for the five persons in the relationship graph. The x-axis of this plot is orders of magnitude larger than the other two axes. The unrelated edges in the relationship graph occur along this dimension, while the other two dimensions encode the relationship distance.


Relationship Data Encoded With RKE (continued)

The RKE embedding is unique up to rotation, but only the distances $d_{ij}$ are relevant. These distances can be used with any reproducing kernel (RK) that depends only on $\|z(i) - z(j)\|$, that is, a radial basis function (RBF), $K_{ped}(z(i), z(j)) = K_{ped}(\|z(i) - z(j)\|)$. We use a Matern RBF in the present work. Recall that without the pedigree data,

$$f(\cdot) = \sum_{j=1}^{m} d_j \phi_j(\cdot) + \sum_{j=1}^{n} c_j K_\theta(\cdot, x(j)). \qquad (2)$$

The pedigree data enters the model by

$$K_\theta(\cdot, \cdot) \to K_\theta(\cdot, \cdot) + \theta_{ped} K_{ped}(\cdot, \cdot). \qquad (3)$$

The Matern family is a two-parameter family, and the parameters are to be chosen along with $\lambda$ and the $\theta$s.
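Update (3) amounts to adding a weighted pedigree Gram matrix to the existing one. The Python sketch below fixes the Matern smoothness at 3/2, which has a simple closed form, purely for illustration; the talk treats both Matern parameters as tunable, and the function names, coordinates and Gram matrix here are hypothetical.

```python
import math


def matern32(dist, length_scale=1.0):
    """Matern kernel with smoothness 3/2, as a function of distance."""
    u = math.sqrt(3.0) * dist / length_scale
    return (1.0 + u) * math.exp(-u)


def euclid(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def augmented_gram(K_theta, Z, theta_ped, length_scale=1.0):
    """Gram matrix of K_theta + theta_ped * K_ped, following update (3)."""
    n = len(Z)
    return [[K_theta[i][j] + theta_ped * matern32(euclid(Z[i], Z[j]), length_scale)
             for j in range(n)] for i in range(n)]


# Hypothetical inputs: a 3 x 3 Gram matrix for the E/C kernel and 2-D pedigree coordinates.
K_theta = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]]
Z = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
K_new = augmented_gram(K_theta, Z, theta_ped=0.5)
print(K_new)
```

Because the kernel depends on the coordinates only through pairwise distances, rotating the RKE embedding leaves `K_new` unchanged.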


Qualitative Results

An important goal of the study is to explore the relative contribution of each source of data. Since there are three sources of information (S = SNPs, P = Pedigrees, C = Environmental/Clinical), there are seven models we can consider:

• S = SNPS (genetic data) only

• C = Environmental/Clinical (E/C) data only

• S + C

• P = Pedigrees only

• S + P

• C + P

• S + C + P

Compare models by evaluating the AUC (Area Under the Curve).


Comparing Models by Their Area Under the (ROC) Curve (AUC)

[Figure: ROC curves for models with two or all three data sources. x-axis: false positive rate (0.0 to 1.0); y-axis: average true positive rate (0.0 to 1.0). Legend with AUC values: S+C (0.7115), S+P (0.7015), C+P (0.7123), S+C+P (0.7455).]

ROC curves for the models with two data sources. The plot is constructed by classifying each person in a test set by thresholding their value of p(x). As the threshold goes from 0 to 1, plot "True positive rate" against "False positive rate". Dashed line: random classification.
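The construction just described can be written in a few lines of Python. This sketch sweeps the threshold over the distinct predicted probabilities and integrates with the trapezoidal rule; the function names and the six test-set values are hypothetical, and the averaging over repeated test sets used for the plotted curves is not reproduced.

```python
def roc_points(p, y):
    """ROC curve from predicted probabilities p and labels y (1 = positive)."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold through the distinct predicted values, largest first.
    for thr in sorted(set(p), reverse=True):
        tp = sum(1 for pi, yi in zip(p, y) if pi >= thr and yi == 1)
        fp = sum(1 for pi, yi in zip(p, y) if pi >= thr and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test-set predictions and outcomes.
p = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y = [1, 1, 0, 1, 0, 0]
curve = roc_points(p, y)
area = auc(curve)
print(curve, area)
```

An AUC of 0.5 corresponds to the dashed random-classification line, and 1.0 to a classifier that ranks every positive above every negative.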


Results

[Figure: bar chart of mean AUC (vertical axis, 0.5 to 0.9) for the models S-only, C-only, S+C, P-only, S+P, C+P and S+C+P.]

Figure 9: AUC comparison of models. S-only is a model with only genetic markers, C-only is a model with only environmental covariates and S+C is a model containing both data sources. P-only is a model with only pedigree data, P+S is a model with both pedigree data and genetic marker data, P+C is a model with both pedigree data and environmental covariates, P+S+C is a model with all three data sources. Error bars are one standard deviation from the mean. Yellow bars indicate models containing pedigree data. For models containing pedigrees, the best AUC score for each model is plotted. All AUC scores are given in Table 2.

[Figure: ROC curves for models with two data sources. x-axis: false positive rate; y-axis: average true positive rate. Legend with AUC values: S+C (0.7115), S+P (0.7015), C+P (0.7123).]

Figure 10: ROC curves for models with two data sources. Although all three models have comparable AUC (shown in parentheses in the legend), the relationship between the curves varies across ROC space. The S+C model dominates the low false positive rate portion of space, while models including pedigree data dominate in the high true positive rate portion.

Figures 9 and 10 are from H. Corrada Bravo et al.

The mean AUC for each of the seven models is given in the bar chart above, in order: Red: S-only, C-only and S+C. Pedigrees are added in yellow: P-only, S+P, C+P and S+C+P.


Part 3. Partly Missing Covariates.

In the third reference, methods for handling partly missing covariates in parametric models have been extended to Smoothing Spline ANOVA models (robustly!).

[Figure: six panels (age = 52, 62, 71 by horm = no, yes), each plotting probability (0.0 to 0.4) against cholesterol (mg/dL, 100 to 400), with one curve for each of sys = 108, 123, 137, 157.]

Fig 7. Probability curves estimated from the full data analysis. This figure is adapted from Figures 9 and 10 from Lin, Wahba, Xiang, Gao, Klein and Klein (2000) [28]. Each panel plots the estimated probability of pigmentary abnormalities as a function of cholesterol, for four different values of sys. The six panels correspond to different values of age and horm, when drin=no and bmi=27.5 are fixed.

Figure 3: Repeat of the bmi=27.5 row of the earlier figure from Ann. Statist. 2000. This model includes horm, hist, bmi, age, sysbp, chol.


To test the proposed method, 517 subjects out of the n = 2585 original subjects with chol between 250 and 350 have one or more of their covariates sys, bmi, horm deleted: 30 subjects are missing sys, bmi and horm; 109 are missing sys and bmi; 118 are missing sys and horm; and 260 are missing one attribute.

[Figure: six panels (age = 52, 62, 71 by horm = no, yes), each plotting probability (0.0 to 0.4) against cholesterol (mg/dL, 100 to 400), with one curve for each of sys = 108, 123, 137, 157.]

Fig 9. Probability curves obtained from the naive method. Each panel plots the estimated probability of pigmentary abnormalities as a function of cholesterol, for four different values of sys. The six panels correspond to different values of age and horm, when drin=no and bmi=27.5 are fixed.

Figure 4: Model using only the 2068 subjects with no missing data in the method test data set. Note the missing 'bump'.


Quadrature Penalized Likelihood Estimation QPLE

The QPLE method for missing data in SS-ANOVA models generalizes results on missing data for parametric models (many references). It begins by assuming conditional probability distributions for each cluster of missing observations, conditional on available observations, with parameters to be estimated. The probability distributions for continuous data appear in integrals for the penalized log likelihood. The integrals are replaced by quadrature formulae. Thus, the continuous prior probability distributions are replaced by distributions (weights) on mass points, and (updated) parameters of the distributions are used to update the weights via the EM algorithm. The process is carried out for each trial set of tuning parameters, and at convergence a tuning score (GACV) can be obtained in a manner similar to that for complete data.
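The idea of replacing an integral by weights on mass points can be illustrated with a one-dimensional Python sketch. Everything specific here is an assumption made for the example: the three-point Gauss-Hermite rule, the single normally distributed missing covariate, the linear logit, and the function names. The talk does not specify the quadrature rule, and this is not the QPLE algorithm itself.

```python
import math

# Three-point Gauss-Hermite rule rewritten for a normal density:
# E[g(X)] for X ~ N(mu, sigma^2) is approximated by sum_k w_k g(mu + sigma * t_k).
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def normal_expectation(g, mu, sigma):
    """Replace the integral of g against N(mu, sigma^2) by a sum over mass points."""
    return sum(w * g(mu + sigma * t) for w, t in zip(WEIGHTS, NODES))


def bernoulli_lik(y, f):
    """Likelihood of one Bernoulli response y in {0, 1} given the logit f."""
    p = 1.0 / (1.0 + math.exp(-f))
    return p if y == 1 else 1.0 - p


def posterior_weights(y, logit, mu, sigma):
    """E-step style update: prior weights times likelihood, renormalized."""
    raw = [w * bernoulli_lik(y, logit(mu + sigma * t)) for w, t in zip(WEIGHTS, NODES)]
    total = sum(raw)
    return [r / total for r in raw]


def logit(x):
    """Hypothetical logit, linear in the missing covariate."""
    return -3.0 + 0.02 * x


# Hypothetical subject with y = 1 and a missing covariate x ~ N(130, 15^2).
marginal = normal_expectation(lambda x: bernoulli_lik(1, logit(x)), 130.0, 15.0)
weights = posterior_weights(1, logit, 130.0, 15.0)
print(marginal, weights)
```

Here the observed response shifts weight toward the mass point where that response is most likely, which is the sense in which the weights are updated from the data.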


In this example the conditional distribution for sys, bmi given age, chol was chosen as bivariate normal with means linear in age, chol, with unknown coefficients and unknown bivariate covariance matrix, and horm was modeled conditional on age, chol, sys, bmi as the conditional logit linear in these variables with unknown coefficients.


Here is the resulting model

[Figure: six panels (age = 52, 62, 71 by horm = no, yes), each plotting probability (0.0 to 0.4) against cholesterol (mg/dL, 100 to 400), with one curve for each of sys = 108, 123, 137, 157.]

Fig 8. Probability curves obtained from QPLE. Each panel plots the estimated probability of pigmentary abnormalities as a function of cholesterol, for four different values of sys. The six panels correspond to different values of age and horm, when drin=no and bmi=27.5 are fixed.

which can be compared with the model with complete data:


[Figure: the full-data analysis plot, shown again for comparison. Six panels (age = 52, 62, 71 by horm = no, yes), each plotting probability against cholesterol (mg/dL) for sys = 108, 123, 137, 157.]

Fig 7. Probability curves estimated from the full data analysis.

The result is rather amazing, but depends on the attributes being well correlated, which is very common in medical data.


Summary and Conclusions

In Part 1 we have described the RKE estimate, which embeds noisy, non-metric, incomplete dissimilarity information into a Euclidean space. In Part 2 we first reviewed penalized log likelihood estimates for Bernoulli responses. Then we showed how pedigree data could be embedded into a Euclidean space via an RKE estimate, and then combined with genetic and environmental/clinical variables to build a penalized logistic regression Smoothing Spline ANOVA model. Part 3 notes recent work dealing with partially missing covariate data in Smoothing Spline ANOVA models using a Quadrature Penalized Likelihood Estimate (QPLE) in a robust manner.

Robustness is everywhere and we thank the organizers for prompting us to think along these lines!
