Inference in High-Dimensional Varying Coefficient Models (slides)
Inference in High-Dimensional Varying Coefficient Models
Mladen Kolar
The University of Chicago, Booth School of Business
Dec 15, 2014
Acknowledgments
D. Kozbur
Varying-coefficient Model
$$y_i = x_i^\top \beta(u_i) + \sigma(u_i, x_i)\,\varepsilon_i, \qquad \varepsilon_i \sim F_i, \quad E[\varepsilon_i \mid x_i, u_i] = 0, \quad E[\varepsilon_i^2] = 1, \qquad i = 1, \dots, n$$
High-dimensional setting: $x_i \in \mathbb{R}^p$, $p \gg n$
Approximate sparsity
$$E[y_i \mid x_i = x, u_i = u] \approx x_{S(u)}^\top \beta_{S(u)}(u),$$
where $S(u) \subset [p]$, $|S(u)| \le s \ll n$
Our goal: constructing confidence bands for βj(u)
Varying-coefficient Model
Widely used
economics, finance, medical science, ecology
Flexible modeling
less restrictive assumptions
domain scientists have prior knowledge that can be used
Interpretable
for each value of the index parameter, one has a parametric model
Confidence Intervals/Bands
Local Linear Lasso
For a fixed point u, we estimate β(u) as
$$\left(\hat\beta(u),\, \hat\delta(u)\right) = \arg\min_{\beta,\,\delta \in \mathbb{R}^p}\ \frac{1}{2n}\sum_{i\in[n]} K_h(u_i - u)\left(y_i - x_i^\top\beta - x_i^\top\delta\,(u_i - u)\right)^2 + \lambda\sum_{j\in[p]}\left(\hat\sigma_{1j}\,|\beta_j| + \hat\sigma_{2j}\,|\delta_j|\right)$$
where
$$\hat\sigma_{1j}^2 = n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}^2\left(y_i - x_i^\top\beta(u_i)\right)^2$$
estimates the variance of the score vector
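To make the objective concrete, here is a minimal sketch in Python of how one might compute this estimator at a fixed u. It is illustrative only: the helper names (epanechnikov, local_linear_lasso), the choice of kernel, and the use of scikit-learn's Lasso are assumptions of this sketch, not part of the slides; kernel weights are absorbed by rescaling rows and penalty loadings by rescaling columns.

```python
import numpy as np
from sklearn.linear_model import Lasso

def epanechnikov(t):
    """Epanechnikov kernel; any symmetric kernel could be used here."""
    return 0.75 * np.maximum(1.0 - t**2, 0.0)

def local_linear_lasso(y, X, u, u0, h, lam, loadings=None):
    """Sketch of the local linear lasso at a fixed point u0.

    Solves a kernel-weighted lasso on the augmented design [X, X*(u - u0)]:
    the first p coefficients estimate beta(u0), the last p its derivative delta(u0).
    """
    n, p = X.shape
    w = epanechnikov((u - u0) / h) / h            # K_h(u_i - u0)
    Z = np.hstack([X, X * (u - u0)[:, None]])     # augmented local-linear design
    if loadings is None:
        loadings = np.ones(2 * p)                 # (sigma_1j, sigma_2j) penalty loadings
    sw = np.sqrt(w)
    # rows scaled by sqrt(w_i) reproduce the kernel-weighted squared loss;
    # columns scaled by 1/loading reproduce the loading-weighted l1 penalty
    fit = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    fit.fit((Z / loadings) * sw[:, None], y * sw)
    coef = fit.coef_ / loadings
    return coef[:p], coef[p:]                     # beta_hat(u0), delta_hat(u0)
```

In practice the loadings passed in would be built from the residual-based variance estimate on this slide.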
Naive Confidence Bands
An idea
Use the local linear lasso to select the model
Use the selected components and refit the model
Construct confidence bands using the results of Fan and Zhang (2000)
Issues
not uniformly valid
hinges on correct model selection
requires stringent design conditions
Example
$$\begin{aligned}
Y_i ={}& 4\,X_i\,U_i(1 - U_i)\\
&+ Z_{i1}\sqrt{U_i}/2 + Z_{i2}\sqrt{U_i}/4 + Z_{i3}\sqrt{U_i}/8 + Z_{i4}\sqrt{U_i}/16\\
&+ Z_{i5}(1 - U_i)/2 + Z_{i6}(1 - U_i)/4 + Z_{i7}(1 - U_i)/8 + Z_{i8}(1 - U_i)/16\\
&+ \varepsilon_i
\end{aligned}$$
$$\begin{aligned}
X_i ={}& Z_{i1}\sqrt{U_i}/2 + Z_{i2}\sqrt{U_i}/4 + Z_{i3}\sqrt{U_i}/8 + Z_{i4}\sqrt{U_i}/16\\
&+ Z_{i5}(1 - U_i)/2 + Z_{i6}(1 - U_i)/2 + Z_{i7}(1 - U_i)/8 + Z_{i8}(1 - U_i)/16\\
&+ \sigma_x\,\xi_i
\end{aligned}$$
$\varepsilon_i, \xi_i \sim N(0, 1)$, $Z_i \sim N_p(0, I)$, $p = 50$, $n = 200$
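For concreteness, a small sketch of how this simulation design could be generated (function name and seed handling are illustrative). The varying coefficient of interest on X is α(u) = 4u(1 − u); as transcribed, Z_{i6} enters Y with weight (1 − U_i)/4 but X with weight (1 − U_i)/2.

```python
import numpy as np

def simulate_example(n=200, p=50, sigma_x=0.5, seed=0):
    """Sketch of the simulation design above (names are illustrative)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(size=n)
    Z = rng.normal(size=(n, p))
    eps, xi = rng.normal(size=n), rng.normal(size=n)
    su, ou = np.sqrt(U), 1.0 - U
    # common part shared by Y and X (all terms except Z_6)
    g = (Z[:, 0] * su / 2 + Z[:, 1] * su / 4 + Z[:, 2] * su / 8 + Z[:, 3] * su / 16
         + Z[:, 4] * ou / 2 + Z[:, 6] * ou / 8 + Z[:, 7] * ou / 16)
    X = g + Z[:, 5] * ou / 2 + sigma_x * xi           # Z_6 enters X with (1 - U)/2
    Y = 4 * X * U * ou + g + Z[:, 5] * ou / 4 + eps   # target coefficient: 4u(1 - u)
    return Y, X, Z, U
```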
Example (con’t)
[Figure: estimated α(u) over u ∈ [0, 1] for the Post-Double-Selection, Post-Single-Selection, and Oracle estimators; top row σx = 0.5, bottom row σx = 1.]
Example (con’t)
[Figure: distribution of the estimates of α(0.5) for the Post-Double-Selection, Post-Single-Selection, and Oracle estimators; top row σx = 0.5, bottom row σx = 1.]
Example (con’t)
Confidence Interval at u = 0.5

                          σx = 0.5            σx = 1
  n                       100   200   300     100   200   300
  Post-Double-Selection   861   907   927     876   872   898
  Post-Single-Selection   752   653   574     861   845   866
  Oracle                  934   949   944     933   945   944

Confidence Band

                          σx = 0.5            σx = 1
  n                       100   200   300     100   200   300
  Post-Double-Selection   770   875   915     834   816   825
  Post-Single-Selection   585   425   395     785   750   716
  Oracle                  780   940   980     855   940   964
This Talk
Question: How to construct valid confidence intervals in high-dimensional varying-coefficient models?
Requirements:
robust against model selection mistakes
valid for a wide range of data generating processes
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Recent Developments
Inference in high-dimensional linear and generalized linear models
least squares regression: Zhang and Zhang (2013), Belloni et al. (2013a), Javanmard and Montanari (2014)
generalized linear models: van de Geer et al. (2014), Belloni et al. (2013d)
LAD and QR: Belloni et al. (2013c), Belloni et al. (2013b)
Gaussian graphical models
Ren et al. (2013), Chen et al. (2013), Jankova and van de Geer (2014)
Recent Developments
Selective inference
along the path: Lockhart et al. (2014)
fixed λ: Lee et al. (2013), Taylor et al. (2014)
Other
sample splitting: Wasserman and Roeder (2009), Meinshausen et al. (2009)
stability selection: Meinshausen and Buhlmann (2010), Shah and Samworth (2013)
FDR control: Foygel Barber and Candes (2014)
PoSI: Berk et al. (2013)
...
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Post-Double-Selection Estimator
Step 1: Regress Y onto X using the local linear lasso; obtain the set of relevant predictors $\hat S_1(u)$
Step 2: Regress $X_j$ onto $X_{-j}$ using the local linear lasso; obtain the set of relevant predictors $\hat S_2(u)$
Step 3: Set $\hat S_j(u) = \{j\} \cup \hat S_1(u) \cup \hat S_2(u)$ and refit by kernel-weighted least squares:
$$\left(\tilde\beta(u),\, \tilde\delta(u)\right) = \arg\min_{\beta,\,\delta}\ \frac{1}{2n}\sum_{i\in[n]} K_h(u_i - u)\left(y_i - x_{i,\hat S_j(u)}^\top\beta - x_{i,\hat S_j(u)}^\top\delta\,(u_i - u)\right)^2$$
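A minimal sketch of these three steps at a fixed u0, reusing the hypothetical local_linear_lasso and epanechnikov helpers from the earlier slide and reading the selected sets off the nonzero lasso coefficients (again an illustration, not the authors' code):

```python
import numpy as np

def post_double_selection(y, X, u, u0, j, h, lam):
    """Sketch of the post-double-selection estimator for beta_j(u0)."""
    n, p = X.shape
    others = [k for k in range(p) if k != j]

    # Step 1: kernel lasso of y on X  ->  S1(u0)
    beta1, _ = local_linear_lasso(y, X, u, u0, h, lam)
    S1 = set(np.flatnonzero(beta1))

    # Step 2: kernel lasso of X_j on X_{-j}  ->  S2(u0)
    gamma, _ = local_linear_lasso(X[:, j], X[:, others], u, u0, h, lam)
    S2 = {others[k] for k in np.flatnonzero(gamma)}

    # Step 3: kernel-weighted least squares refit on {j} | S1 | S2
    S = sorted({j} | S1 | S2)
    w = epanechnikov((u - u0) / h) / h
    Z = np.hstack([X[:, S], X[:, S] * (u - u0)[:, None]])
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return dict(zip(S, theta[:len(S)]))           # refitted beta_tilde(u0) on S
```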
Confidence Intervals and Bands
Confidence interval at a point u
$$\hat\sigma^{-1}\big(\hat\beta_j(u)\big)\Big(\hat\beta_j(u) - \beta_j(u) - \mathrm{bias}\big(\hat\beta_j(u)\big)\Big) \to_D N(0, 1)$$
where
$$\hat\sigma^2\big(\hat\beta_j(u)\big) = \Big(nh\, f(u)\, E\big[X_S X_S^\top \mid U = u\big]\Big)^{-1}_{jj}\left(\int K^2(v)\, dv\right)\sigma^2(u)$$
Confidence band
$$P\left((-2\log h)^{1/2}\left(\sup_{u\in[0,1]} \frac{\big|\hat\beta_j(u) - \beta_j(u) - \mathrm{bias}\big(\hat\beta_j(u)\big)\big|}{\hat\sigma\big(\hat\beta_j(u)\big)} - d_{v,n}\right) < x\right) \to \exp\big(-2\exp(-x)\big)$$
where $d_{v,n} = (-2\log h)^{1/2} + \dfrac{C(K)}{(-2\log h)^{1/2}}$
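As an illustration of how these limits translate into intervals and bands, here is a hedged sketch; the bias term is treated as negligible (e.g., through undersmoothing), the kernel constant C(K) is passed in as c_kernel, and se stands for a plug-in estimate of the standard deviation above. All names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(beta_hat, se, level=0.95):
    """Pointwise interval from the normal limit, ignoring the bias term."""
    z = norm.ppf(0.5 + level / 2)
    return beta_hat - z * se, beta_hat + z * se

def uniform_band(beta_hat, se, h, c_kernel, level=0.95):
    """Uniform band from the Gumbel-type limit on this slide.

    The critical value solves exp(-2 exp(-x)) = level, then is rescaled
    by (-2 log h)^{1/2} and shifted by d_{v,n}.
    """
    a = np.sqrt(-2.0 * np.log(h))
    d = a + c_kernel / a                          # d_{v,n}
    x = -np.log(-np.log(level) / 2.0)             # Gumbel quantile
    crit = d + x / a
    return beta_hat - crit * se, beta_hat + crit * se
```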
What do we need from Lasso?
Prediction bound
$$\sup_{u\in[0,1]}\ \big\| X^\top\big(\hat\beta(u) - \beta(u)\big)\big\|_2 \lesssim_P \sqrt{\frac{s\,(\log p + \log h^{-1})}{nh}}$$
Estimation bound
$$\sup_{u\in[0,1]}\ \big\|\hat\beta(u) - \beta(u)\big\|_1 \lesssim_P s\,\sqrt{\frac{\log p + \log h^{-1}}{nh}}$$
Size of the estimated support
$$\sup_{u\in[0,1]}\ \big|\hat S(u)\big| \le_P c\, s$$
Conditions for Kernel-Lasso
Key Ingredients for Lasso Bounds:
Strong convexity ⟹ local sparse eigenvalue condition. For
$$A(u) = \frac{1}{nh\, f(u)} \sum_{i\in[n]} K_h(u_i - u)\, x_i x_i^\top,$$
$$\kappa_1(C) \le \phi_{\min}(C\, s)\big(A(u)\big) \le \phi_{\max}(C\, s)\big(A(u)\big) \le \kappa_2(C)$$
with probability 1 − o(1).
Score domination: for every 1 ≤ j ≤ p, uniformly in u ∈ [0, 1],
$$\lambda(u) \ge c\, \hat\sigma_{1j}^{-1}\left| n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}\left(y_i - x_i^\top\beta(u)\right) \right|$$
with probability 1 − o(1).
Penalty loading quality: for $\ell \to_P 1$ and $u' = O_P(1)$, the estimated loadings satisfy
$$\ell\, \sigma_{1j}^2 \le \hat\sigma_{1j}^2 \le u'\, \sigma_{1j}^2 \quad \text{for every } 1 \le j \le p$$
Conditions for Kernel-Lasso
Conditions for Lasso: With probability 1− o(1)
$E\big[K_h(u - u_i)\, x_{ij}^2 \big(y_i - x_i^\top\beta(u)\big)^2\big]$ is bounded from above and away from zero uniformly in n and u ∈ [0, 1].
$\max_{1\le j\le p}\sigma_{1j}(u) \,/\, \min_{1\le j\le p}\sigma_{1j}(u) = O(1)$ uniformly in u ∈ [0, 1].
$\max_{1\le j\le p} |\hat\sigma_{1j}(u) - \sigma_{1j}(u)| \,/\, \sigma_{1j}(u) = o(1)$ uniformly in u ∈ [0, 1].
$\max_{1\le j\le p}\big(n^{-1}\sum_{i\in[n]} K_h(u_i - u)\, x_{ij}^3\big(y_i - x_i^\top\beta(u)\big)^3\big)^{1/3} \,/\, \sigma_{1j}(u) = O(1)$ uniformly in u ∈ [0, 1].
$\log^3 p = o(nh)$ and $s\,\log(\max\{p, n\}) = o(nh)$.
Sparse eigenvalue and penalty loading quality conditions are satisfied.
Outline
1 Recent developments
2 Post-double selection estimator
3 An application to inference in graphical models
Application to Inference in Gaussian graphical models
Model: $X \mid U = u \sim N\big(\mu(u), \Sigma(u)\big)$
Let $\Omega(u) = \Sigma^{-1}(u) = \big(\omega_{ab}(u)\big)_{(a,b)\in[p]\times[p]}$. For $I = \{a, b\}$ and $J = [p]\setminus I$,
$$\Omega_{II}(u) = \Big(\Sigma_{II}(u) - \Sigma_{IJ}(u)\,\Sigma_{JJ}^{-1}(u)\,\Sigma_{JI}(u)\Big)^{-1} =: \begin{pmatrix}\theta_{aa}(u) & \theta_{ab}(u)\\ \theta_{ba}(u) & \theta_{bb}(u)\end{pmatrix}^{-1}$$
$$\theta_{aa}(u) = \sigma_{aa}(u) - \gamma_a^\top(u)\,\Sigma_{JJ}(u)\,\gamma_a(u),$$
$$\theta_{bb}(u) = \sigma_{bb}(u) - \gamma_b^\top(u)\,\Sigma_{JJ}(u)\,\gamma_b(u),$$
$$\theta_{ab}(u) = \sigma_{ab}(u) - \gamma_a^\top(u)\,\Sigma_{JJ}(u)\,\gamma_b(u),$$
where $\gamma_a(u) = \Sigma_{JJ}^{-1}(u)\,\Sigma_{Ja}(u)$ are the coefficients in the linear regression of $X_a$ onto $X_J$ given $U = u$.
Application to Inference in Gaussian graphical models
Estimate the Markov blanket of $X_a, X_b$:
$$\hat J(u) = \operatorname{supp}\big(\hat\gamma_a(u)\big) \cup \operatorname{supp}\big(\hat\gamma_b(u)\big)$$
Define
$$\hat\theta_{ab}(u) = \hat\sigma_{ab}(u) - \hat\Sigma_{a\hat J(u)}(u)\Big(\hat\Sigma_{\hat J(u)\hat J(u)}(u)\Big)^{-1}\hat\Sigma_{\hat J(u) b}(u)$$
and similarly $\hat\theta_{aa}(u)$ and $\hat\theta_{bb}(u)$.
The estimator of $\Omega_{II}(u)$ is
$$\hat\Omega_{II}(u) = \begin{pmatrix}\hat\theta_{aa}(u) & \hat\theta_{ab}(u)\\ \hat\theta_{ba}(u) & \hat\theta_{bb}(u)\end{pmatrix}^{-1}.$$
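A small sketch of this plug-in step, assuming the Markov blanket $\hat J(u)$ has already been selected (for instance by two kernel lassos) and reusing the hypothetical epanechnikov helper from before; the kernel-weighted mean and covariance used here are one natural choice, not a detail stated on the slides.

```python
import numpy as np

def omega_II_hat(X, u, u0, a, b, J_hat, h):
    """Sketch of the plug-in estimator of Omega_II(u0) for the pair (a, b)."""
    w = epanechnikov((u - u0) / h) / h
    w = w / w.sum()
    mu = w @ X                                    # kernel-weighted mean
    Xc = X - mu
    Sigma = (Xc * w[:, None]).T @ Xc              # kernel-weighted covariance
    I, J = [a, b], sorted(J_hat)
    theta = Sigma[np.ix_(I, I)] - Sigma[np.ix_(I, J)] @ np.linalg.solve(
        Sigma[np.ix_(J, J)], Sigma[np.ix_(J, I)])
    return np.linalg.inv(theta)                   # estimate of the 2x2 block Omega_II(u0)
```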
Example – chain graph (p = 200)
[Figure: estimated edge value over u ∈ [0, 1] for the Post-Double-Selection and Oracle estimators in the chain graph; top row n = 500, bottom row n = 1000.]
Improvements
Multiplier bootstrap instead of asymptotic theory
Hypothesis testing for more than one component of the unknown parameter vector
Thank you!
References I
A. Belloni, V. Chernozhukov, and C. B. Hansen. Inference on treatment effects after selection amongst high-dimensional controls. Rev. Econ. Stud., 81(2):608–650, Nov 2013a.
A. Belloni, V. Chernozhukov, and K. Kato. Robust inference in high-dimensional approximately sparse quantile regression models. arXiv preprint arXiv:1312.7186, December 2013b.
A. Belloni, V. Chernozhukov, and K. Kato. Uniform post selection inference for LAD regression models. arXiv preprint arXiv:1304.0282, 2013c.
A. Belloni, V. Chernozhukov, and Y. Wei. Honest confidence regions for logistic regression with a large number of controls. arXiv preprint arXiv:1304.3969, 2013d.
R. Berk, L. D. Brown, A. Buja, K. Zhang, and L. Zhao. Valid post-selection inference. Ann. Stat., 41(2):802–837, 2013.
References II
M. Chen, Z. Ren, H. Zhao, and H. H. Zhou. Asymptotically normal and efficient estimation of covariate-adjusted Gaussian graphical model. arXiv preprint arXiv:1309.5923, 2013.
J. Fan and W. Zhang. Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand. J. Stat., 27(4):715–731, Dec 2000.
R. Foygel Barber and E. J. Candes. Controlling the false discovery rate via knockoffs. arXiv preprint arXiv:1404.5609, April 2014.
J. Jankova and S. A. van de Geer. Confidence intervals for high-dimensional inverse covariance estimation. arXiv preprint arXiv:1403.6752, March 2014.
A. Javanmard and A. Montanari. Confidence intervals and hypothesis testing for high-dimensional regression. J. Mach. Learn. Res., 15(Oct):2869–2909, 2014.
References III
J. D. Lee, D. L. Sun, Y. Sun, and J. E. Taylor. Exact post-selection inference with the lasso. arXiv preprint arXiv:1311.6238, November 2013.
R. Lockhart, J. E. Taylor, R. J. Tibshirani, and R. J. Tibshirani. A significance test for the lasso. Ann. Stat., 42(2):413–468, 2014.
N. Meinshausen and P. Buhlmann. Stability selection. J. R. Stat. Soc. B, 72(4):417–473, 2010.
N. Meinshausen, L. Meier, and P. Buhlmann. P-values for high-dimensional regression. J. Am. Stat. Assoc., 104(488), 2009.
Z. Ren, T. Sun, C.-H. Zhang, and H. H. Zhou. Asymptotic normality and optimalities in estimation of large Gaussian graphical model. arXiv preprint arXiv:1309.6024, 2013.
R. D. Shah and R. J. Samworth. Variable selection with error control: another look at stability selection. J. R. Stat. Soc. B, 75(1):55–80, 2013.
References IV
J. E. Taylor, R. Lockhart, R. J. Tibshirani, and R. J. Tibshirani. Post-selection adaptive inference for least angle regression and the lasso. arXiv preprint arXiv:1401.3889, January 2014.
S. A. van de Geer, P. Buhlmann, Y. Ritov, and R. Dezeure. On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat., 42(3):1166–1202, Jun 2014.
L. A. Wasserman and K. Roeder. High-dimensional variable selection. Ann. Stat., 37(5A):2178–2201, 2009.
C.-H. Zhang and S. S. Zhang. Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. B, 76(1):217–242, Jul 2013.