Inference in High-Dimensional Varying Coefficient Models (slides)
Inference in High-Dimensional Varying Coefficient Models
Mladen Kolar
The University of Chicago, Booth School of Business
Dec 15, 2014
Acknowledgments
D. Kozbur
Varying-coefficient Model
$$y_i = x_i^\top \beta(u_i) + \sigma(u_i, x_i)\,\varepsilon_i, \qquad \varepsilon_i \sim F_i, \quad E[\varepsilon_i \mid x_i, u_i] = 0, \quad E[\varepsilon_i^2] = 1, \qquad i = 1, \dots, n$$
High-dimensional setting: $x_i \in \mathbb{R}^p$, $p \gg n$
Approximate sparsity
$$E[y_i \mid x_i = x, u_i = u] \approx x_{S(u)}^\top \beta_{S(u)}(u),$$
where $S(u) \subset [p]$, $|S(u)| \le s \ll n$
Our goal: constructing confidence bands for βj(u)
Varying-coefficient Model
Widely used
economics, finance, medical science, ecology
Flexible modeling
less restrictive assumptions
domain scientists have prior knowledge that can be used
Interpretable
for each value of the index parameter, one has a parametric model
Confidence Intervals/Bands
Local Linear Lasso
For a fixed point u, we estimate β(u) as
$$\left(\hat\beta(u),\, \hat\delta(u)\right) = \arg\min_{\beta,\,\delta \in \mathbb{R}^p}\ \frac{1}{2n}\sum_{i\in[n]} K_h(u_i - u)\left(y_i - x_i^\top\beta - x_i^\top\delta\,(u_i - u)\right)^2 + \lambda\sum_{j\in[p]}\left(\hat\sigma_{1j}\,|\beta_j| + \hat\sigma_{2j}\,|\delta_j|\right)$$
where
$$\hat\sigma_{1j}^2 = n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}^2\left(y_i - x_i^\top\beta(u_i)\right)^2$$
estimates the variance of the score vector
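To make the objective concrete, here is a minimal sketch in Python of how one might compute this estimator at a fixed u. It is illustrative only: the helper names (epanechnikov, local_linear_lasso), the choice of kernel, and the use of scikit-learn's Lasso are assumptions of this sketch, not part of the slides; kernel weights are absorbed by rescaling rows and penalty loadings by rescaling columns.

```python
import numpy as np
from sklearn.linear_model import Lasso

def epanechnikov(t):
    """Epanechnikov kernel; any symmetric kernel could be used here."""
    return 0.75 * np.maximum(1.0 - t**2, 0.0)

def local_linear_lasso(y, X, u, u0, h, lam, loadings=None):
    """Sketch of the local linear lasso at a fixed point u0.

    Solves a kernel-weighted lasso on the augmented design [X, X*(u - u0)]:
    the first p coefficients estimate beta(u0), the last p its derivative delta(u0).
    """
    n, p = X.shape
    w = epanechnikov((u - u0) / h) / h            # K_h(u_i - u0)
    Z = np.hstack([X, X * (u - u0)[:, None]])     # augmented local-linear design
    if loadings is None:
        loadings = np.ones(2 * p)                 # (sigma_1j, sigma_2j) penalty loadings
    sw = np.sqrt(w)
    # rows scaled by sqrt(w_i) reproduce the kernel-weighted squared loss;
    # columns scaled by 1/loading reproduce the loading-weighted l1 penalty
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    fit.fit((Z / loadings) * sw[:, None], y * sw)
    coef = fit.coef_ / loadings
    return coef[:p], coef[p:]                     # beta_hat(u0), delta_hat(u0)
```

In practice the loadings passed in would be built from the residual-based variance estimate on this slide.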
Naive Confidence Bands
An idea
Use the local linear lasso to select the model
Use the selected components and refit the model
Construct confidence bands using the results of Fan and Zhang (2000)
Issues
not uniformly valid
hinges on correct model selection
requires stringent design conditions
Example
$$\begin{aligned}
Y_i ={}& 4\,X_i\,U_i(1 - U_i)\\
&+ Z_{i1}\sqrt{U_i}/2 + Z_{i2}\sqrt{U_i}/4 + Z_{i3}\sqrt{U_i}/8 + Z_{i4}\sqrt{U_i}/16\\
&+ Z_{i5}(1 - U_i)/2 + Z_{i6}(1 - U_i)/4 + Z_{i7}(1 - U_i)/8 + Z_{i8}(1 - U_i)/16\\
&+ \varepsilon_i
\end{aligned}$$
$$\begin{aligned}
X_i ={}& Z_{i1}\sqrt{U_i}/2 + Z_{i2}\sqrt{U_i}/4 + Z_{i3}\sqrt{U_i}/8 + Z_{i4}\sqrt{U_i}/16\\
&+ Z_{i5}(1 - U_i)/2 + Z_{i6}(1 - U_i)/2 + Z_{i7}(1 - U_i)/8 + Z_{i8}(1 - U_i)/16\\
&+ \sigma_x\,\xi_i
\end{aligned}$$
$\varepsilon_i, \xi_i \sim N(0, 1)$, $Z_i \sim N_p(0, I)$, $p = 50$, $n = 200$
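For concreteness, a small sketch of how this simulation design could be generated (function name and seed handling are illustrative). The varying coefficient of interest on X is α(u) = 4u(1 − u); as transcribed, Z_{i6} enters Y with weight (1 − U_i)/4 but X with weight (1 − U_i)/2.

```python
import numpy as np

def simulate_example(n=200, p=50, sigma_x=0.5, seed=0):
    """Sketch of the simulation design above (names are illustrative)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(size=n)
    Z = rng.normal(size=(n, p))
    eps, xi = rng.normal(size=n), rng.normal(size=n)
    su, ou = np.sqrt(U), 1.0 - U
    # common part shared by Y and X (all terms except Z_6)
    g = (Z[:, 0] * su / 2 + Z[:, 1] * su / 4 + Z[:, 2] * su / 8 + Z[:, 3] * su / 16
         + Z[:, 4] * ou / 2 + Z[:, 6] * ou / 8 + Z[:, 7] * ou / 16)
    X = g + Z[:, 5] * ou / 2 + sigma_x * xi           # Z_6 enters X with (1 - U)/2
    Y = 4 * X * U * ou + g + Z[:, 5] * ou / 4 + eps   # target coefficient: 4u(1 - u)
    return Y, X, Z, U
```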
Example (con’t)
[Figure: estimated α(u) over u ∈ [0, 1] for the Post-Double-Selection, Post-Single-Selection, and Oracle estimators; top row σx = 0.5, bottom row σx = 1.]
Example (con’t)
[Figure: distribution of the estimates of α(0.5) for the Post-Double-Selection, Post-Single-Selection, and Oracle estimators; top row σx = 0.5, bottom row σx = 1.]
Example (con’t)
Confidence Interval at u = 0.5

                          σx = 0.5            σx = 1
  n                       100   200   300     100   200   300
  Post-Double-Selection   861   907   927     876   872   898
  Post-Single-Selection   752   653   574     861   845   866
  Oracle                  934   949   944     933   945   944

Confidence Band

                          σx = 0.5            σx = 1
  n                       100   200   300     100   200   300
  Post-Double-Selection   770   875   915     834   816   825
  Post-Single-Selection   585   425   395     785   750   716
  Oracle                  780   940   980     855   940   964
This Talk
Question: How to construct valid confidence intervals in high-dimensional varying-coefficient models?
Requirements:
robust against model selection mistakes
valid for a wide range of data generating processes
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Recent Developments
Inference in high-dimensional linear and generalized linear models
least squares regression: Zhang and Zhang (2013), Belloni et al. (2013a), Javanmard and Montanari (2014)
generalized linear models: van de Geer et al. (2014), Belloni et al. (2013d)
LAD and QR: Belloni et al. (2013c), Belloni et al. (2013b)
Gaussian graphical models
Ren et al. (2013), Chen et al. (2013), Jankova and van de Geer (2014)
Recent Developments
Selective inference
along the path: Lockhart et al. (2014)
fixed λ: Lee et al. (2013), Taylor et al. (2014)
Other
sample splitting: Wasserman and Roeder (2009), Meinshausen et al. (2009)
stability selection: Meinshausen and Buhlmann (2010), Shah and Samworth (2013)
FDR control: Foygel Barber and Candes (2014)
PoSI: Berk et al. (2013)
...
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Post-Double-Selection Estimator
Step 1: Regress Y onto X using the local linear lasso; obtain the set of relevant predictors $\hat S_1(u)$
Step 2: Regress $X_j$ onto $X_{-j}$ using the local linear lasso; obtain the set of relevant predictors $\hat S_2(u)$
Step 3: Set $\hat S_j(u) = \{j\} \cup \hat S_1(u) \cup \hat S_2(u)$ and refit by kernel-weighted least squares:
$$\left(\tilde\beta(u),\, \tilde\delta(u)\right) = \arg\min_{\beta,\,\delta}\ \frac{1}{2n}\sum_{i\in[n]} K_h(u_i - u)\left(y_i - x_{i,\hat S_j(u)}^\top\beta - x_{i,\hat S_j(u)}^\top\delta\,(u_i - u)\right)^2$$
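A minimal sketch of these three steps at a fixed u0, reusing the hypothetical local_linear_lasso and epanechnikov helpers from the earlier slide and reading the selected sets off the nonzero lasso coefficients (again an illustration, not the authors' code):

```python
import numpy as np

def post_double_selection(y, X, u, u0, j, h, lam):
    """Sketch of the post-double-selection estimator for beta_j(u0)."""
    n, p = X.shape
    others = [k for k in range(p) if k != j]

    # Step 1: kernel lasso of y on X  ->  S1(u0)
    beta1, _ = local_linear_lasso(y, X, u, u0, h, lam)
    S1 = set(np.flatnonzero(beta1))

    # Step 2: kernel lasso of X_j on X_{-j}  ->  S2(u0)
    gamma, _ = local_linear_lasso(X[:, j], X[:, others], u, u0, h, lam)
    S2 = {others[k] for k in np.flatnonzero(gamma)}

    # Step 3: kernel-weighted least squares refit on {j} | S1 | S2
    S = sorted({j} | S1 | S2)
    w = epanechnikov((u - u0) / h) / h
    Z = np.hstack([X[:, S], X[:, S] * (u - u0)[:, None]])
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return dict(zip(S, theta[:len(S)]))           # refitted beta_tilde(u0) on S
```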
Confidence Intervals and Bands
Confidence interval at a point u
$$\hat\sigma^{-1}\big(\hat\beta_j(u)\big)\Big(\hat\beta_j(u) - \beta_j(u) - \mathrm{bias}\big(\hat\beta_j(u)\big)\Big) \to_D N(0, 1)$$
where
$$\hat\sigma^2\big(\hat\beta_j(u)\big) = \Big(nh\, f(u)\, E\big[X_S X_S^\top \mid U = u\big]\Big)^{-1}_{jj}\left(\int K^2(v)\, dv\right)\sigma^2(u)$$
Confidence band
$$P\left((-2\log h)^{1/2}\left(\sup_{u\in[0,1]} \frac{\big|\hat\beta_j(u) - \beta_j(u) - \mathrm{bias}\big(\hat\beta_j(u)\big)\big|}{\hat\sigma\big(\hat\beta_j(u)\big)} - d_{v,n}\right) < x\right) \to \exp\big(-2\exp(-x)\big)$$
where $d_{v,n} = (-2\log h)^{1/2} + \dfrac{C(K)}{(-2\log h)^{1/2}}$
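As an illustration of how these limits translate into intervals and bands, here is a hedged sketch; the bias term is treated as negligible (e.g., through undersmoothing), the kernel constant C(K) is passed in as c_kernel, and se stands for a plug-in estimate of the standard deviation above. All names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(beta_hat, se, level=0.95):
    """Pointwise interval from the normal limit, ignoring the bias term."""
    z = norm.ppf(0.5 + level / 2)
    return beta_hat - z * se, beta_hat + z * se

def uniform_band(beta_hat, se, h, c_kernel, level=0.95):
    """Uniform band from the Gumbel-type limit on this slide.

    The critical value solves exp(-2 exp(-x)) = level, then is rescaled
    by (-2 log h)^{1/2} and shifted by d_{v,n}.
    """
    a = np.sqrt(-2.0 * np.log(h))
    d = a + c_kernel / a                          # d_{v,n}
    x = -np.log(-np.log(level) / 2.0)             # Gumbel quantile
    crit = d + x / a
    return beta_hat - crit * se, beta_hat + crit * se
```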
What do we need from Lasso?
Prediction bound
$$\sup_{u\in[0,1]}\ \big\| X^\top\big(\hat\beta(u) - \beta(u)\big)\big\|_2 \lesssim_P \sqrt{\frac{s\,(\log p + \log h^{-1})}{nh}}$$
Estimation bound
$$\sup_{u\in[0,1]}\ \big\|\hat\beta(u) - \beta(u)\big\|_1 \lesssim_P s\,\sqrt{\frac{\log p + \log h^{-1}}{nh}}$$
Size of the estimated support
$$\sup_{u\in[0,1]}\ \big|\hat S(u)\big| \le_P c\, s$$
Conditions for Kernel-Lasso
Key Ingredients for Lasso Bounds:
Strong convexity ⟹ local sparse eigenvalue condition. For
$$A(u) = \frac{1}{nh\, f(u)} \sum_{i\in[n]} K_h(u_i - u)\, x_i x_i^\top,$$
$$\kappa_1(C) \le \phi_{\min}(C\, s)\big(A(u)\big) \le \phi_{\max}(C\, s)\big(A(u)\big) \le \kappa_2(C)$$
with probability 1 − o(1).
Score domination: for every 1 ≤ j ≤ p, uniformly in u ∈ [0, 1],
$$\lambda(u) \ge c\, \hat\sigma_{1j}^{-1}\left| n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}\left(y_i - x_i^\top\beta(u)\right) \right|$$
with probability 1 − o(1).
Penalty loading quality: for $\ell \to_P 1$ and $u' = O_P(1)$, the estimated loadings satisfy
$$\ell\, \sigma_{1j}^2 \le \hat\sigma_{1j}^2 \le u'\, \sigma_{1j}^2 \quad \text{for every } 1 \le j \le p$$
Conditions for Kernel-Lasso
Conditions for Lasso: With probability 1− o(1)
$E\big[K_h(u - u_i)\, x_{ij}^2 \big(y_i - x_i^\top\beta(u)\big)^2\big]$ is bounded from above and away from zero uniformly in n and u ∈ [0, 1].
$\max_{1\le j\le p}\sigma_{1j}(u) \,/\, \min_{1\le j\le p}\sigma_{1j}(u) = O(1)$ uniformly in u ∈ [0, 1].
$\max_{1\le j\le p} |\hat\sigma_{1j}(u) - \sigma_{1j}(u)| \,/\, \sigma_{1j}(u) = o(1)$ uniformly in u ∈ [0, 1].
$\max_{1\le j\le p}\big(n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}^3\big(y_i - x_i^\top\beta(u)\big)^3\big)^{1/3} \,/\, \sigma_{1j}(u) = O(1)$ uniformly in u ∈ [0, 1].
$\log^3 p = o(nh)$ and $s\,\log(\max\{p, n\}) = o(nh)$.
Sparse eigenvalue and penalty loading quality conditions are satisfied.
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Application to Inference in Gaussian graphical models
Model: $X \mid U = u \sim N\big(\mu(u), \Sigma(u)\big)$
Let $\Omega(u) = \Sigma^{-1}(u) = \big(\omega_{ab}(u)\big)_{(a,b)\in[p]\times[p]}$. For $I = \{a, b\}$ and $J = [p]\setminus I$,
$$\Omega_{II}(u) = \Big(\Sigma_{II}(u) - \Sigma_{IJ}(u)\,\Sigma_{JJ}^{-1}(u)\,\Sigma_{JI}(u)\Big)^{-1} =: \begin{pmatrix}\theta_{aa}(u) & \theta_{ab}(u)\\ \theta_{ba}(u) & \theta_{bb}(u)\end{pmatrix}^{-1}$$
$$\theta_{aa}(u) = \sigma_{aa}(u) - \gamma_a^\top(u)\,\Sigma_{JJ}(u)\,\gamma_a(u),$$
$$\theta_{bb}(u) = \sigma_{bb}(u) - \gamma_b^\top(u)\,\Sigma_{JJ}(u)\,\gamma_b(u),$$
$$\theta_{ab}(u) = \sigma_{ab}(u) - \gamma_a^\top(u)\,\Sigma_{JJ}(u)\,\gamma_b(u),$$
where $\gamma_a(u) = \Sigma_{JJ}^{-1}(u)\,\Sigma_{Ja}(u)$ are the coefficients in the linear regression of $X_a$ onto $X_J$ given $U = u$.
Application to Inference in Gaussian graphical models
Estimate the Markov blanket of $X_a, X_b$:
$$\hat J(u) = \operatorname{supp}\big(\hat\gamma_a(u)\big) \cup \operatorname{supp}\big(\hat\gamma_b(u)\big)$$
Define
$$\hat\theta_{ab}(u) = \hat\sigma_{ab}(u) - \hat\Sigma_{a\hat J(u)}(u)\Big(\hat\Sigma_{\hat J(u)\hat J(u)}(u)\Big)^{-1}\hat\Sigma_{\hat J(u) b}(u)$$
and similarly $\hat\theta_{aa}(u)$ and $\hat\theta_{bb}(u)$.
The estimator of $\Omega_{II}(u)$ is
$$\hat\Omega_{II}(u) = \begin{pmatrix}\hat\theta_{aa}(u) & \hat\theta_{ab}(u)\\ \hat\theta_{ba}(u) & \hat\theta_{bb}(u)\end{pmatrix}^{-1}.$$
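A small sketch of this plug-in step, assuming the Markov blanket $\hat J(u)$ has already been selected (for instance by two kernel lassos) and reusing the hypothetical epanechnikov helper from before; the kernel-weighted mean and covariance used here are one natural choice, not a detail stated on the slides.

```python
import numpy as np

def omega_II_hat(X, u, u0, a, b, J_hat, h):
    """Sketch of the plug-in estimator of Omega_II(u0) for the pair (a, b)."""
    w = epanechnikov((u - u0) / h) / h
    w = w / w.sum()
    mu = w @ X                                    # kernel-weighted mean
    Xc = X - mu
    Sigma = (Xc * w[:, None]).T @ Xc              # kernel-weighted covariance
    I, J = [a, b], sorted(J_hat)
    theta = Sigma[np.ix_(I, I)] - Sigma[np.ix_(I, J)] @ np.linalg.solve(
        Sigma[np.ix_(J, J)], Sigma[np.ix_(J, I)])
    return np.linalg.inv(theta)                   # estimate of the 2x2 block Omega_II(u0)
```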
Example – chain graph (p = 200)
[Figure: estimated edge value over u ∈ [0, 1] for the Post-Double-Selection and Oracle estimators in the chain graph; top row n = 500, bottom row n = 1000.]
Improvements
Multiplier bootstrap instead of asymptotic theory
Hypothesis testing for more than one component of the unknown parameter vector
Thank you!
References I
A. Belloni, V. Chernozhukov, and C. B. Hansen. Inference on treatment effects after selection amongst high-dimensional controls. Rev. Econ. Stud., 81(2):608–650, Nov 2013a.
A. Belloni, V. Chernozhukov, and K. Kato. Robust inference in high-dimensional approximately sparse quantile regression models. arXiv preprint arXiv:1312.7186, December 2013b.
A. Belloni, V. Chernozhukov, and K. Kato. Uniform post selection inference for LAD regression models. arXiv preprint arXiv:1304.0282, 2013c.
A. Belloni, V. Chernozhukov, and Y. Wei. Honest confidence regions for logistic regression with a large number of controls. arXiv preprint arXiv:1304.3969, 2013d.
R. Berk, L. D. Brown, A. Buja, K. Zhang, and L. Zhao. Valid post-selection inference. Ann. Stat., 41(2):802–837, 2013.
References II
M. Chen, Z. Ren, H. Zhao, and H. H. Zhou. Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. arXiv preprint arXiv:1309.5923, 2013.
J. Fan and W. Zhang. Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat., 27(4):715–731, Dec 2000.
R. Foygel Barber and E. J. Candes. Controlling the false discovery rate via knockoffs. arXiv preprint arXiv:1404.5609, April 2014.
J. Jankova and S. A. van de Geer. Confidence intervals for high-dimensional inverse covariance estimation. arXiv preprint arXiv:1403.6752, March 2014.
A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res., 15(Oct):2869–2909, 2014.
References III
J. D. Lee, D. L. Sun, Y. Sun, and J. E. Taylor. Exact post-selection inference with the lasso. arXiv preprint arXiv:1311.6238, November 2013.
R. Lockhart, J. E. Taylor, R. J. Tibshirani, and R. J. Tibshirani. A significance test for the lasso. Ann. Stat., 42(2):413–468, 2014.
N. Meinshausen and P. Buhlmann. Stability selection. J. R. Stat. Soc. B, 72(4):417–473, 2010.
N. Meinshausen, L. Meier, and P. Buhlmann. P-values for high-dimensional regression. J. Am. Stat. Assoc., 104(488), 2009.
Z. Ren, T. Sun, C.-H. Zhang, and H. H. Zhou. Asymptotic normality and optimalities in estimation of large Gaussian graphical model. arXiv preprint arXiv:1309.6024, 2013.
R. D. Shah and R. J. Samworth. Variable selection with error control: another look at stability selection. J. R. Stat. Soc. B, 75(1):55–80, 2013.
References IV
J. E. Taylor, R. Lockhart, R. J. Tibshirani, and R. J. Tibshirani. Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint arXiv:1401.3889, January 2014.
S. A. van de Geer, P. Buhlmann, Y. Ritov, and R. Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat., 42(3):1166–1202, Jun 2014.
L. A. Wasserman and K. Roeder. High-dimensional variable selection. Ann. Stat., 37(5A):2178–2201, 2009.
C.-H. Zhang and S. S. Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. B, 76(1):217–242, Jul 2013.