
Model Free Predictive Inference

Larry Wasserman

Carnegie Mellon University

With: Yotam Hechtlinger, Nic Dalmasso, Jing Lei, Max G’Sell, Alessandro Rinaldo, Jamie Robins, Mauricio Sadinle, Ryan Tibshirani, JaeHyeok Shin, Robin Dunn


OUTLINE

“All models are wrong ...” George Box

“Use models but don’t believe them” Tukey?

“Investigators who use [regression] are not paying adequate attention to the connection - if any - between the models and the phenomena they are studying. ... By the time the models are deployed, the scientific position is nearly hopeless. Reliance on models in such cases is Panglossian ...” David Freedman

By focusing on predictive quantities, we can do many things with (almost) no assumptions. Not assuming: linearity, constant variance, incoherence, sparsity, etc.


Outline

• Part I: High-dimensional regression and classification.

• Part II: Clustering.

• Part III: Random effects.


High Dimensional Regression

Let (X1, Y1), . . . , (Xn, Yn) ∼ P, with Xi ∈ R^d and Yi ∈ R.

No assumptions (except iid).

Linear regression, the projection interpretation: β* is the best linear predictor,

\[
\beta^* = \operatorname{argmin}_{\beta} E\big[(Y - \beta^T X)^2\big] = \Lambda^{-1}\alpha,
\]

where Λ = E[XX^T] and α = E[YX]. So β* = g(vec(Λ), α).

Inference? Use the CLT or the bootstrap to approximate the distribution of √n(β̂ − β*)?

Unfortunately, these approximations are quite poor.
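Not from the talk, but to make the projection parameter concrete: a minimal sketch of the plug-in estimate β̂ = Λ̂⁻¹α̂ and a naive pairs bootstrap for √n(β̂ − β*). The data-generating process at the end is purely illustrative; the slide's own simulation design is not specified.

```python
import numpy as np

def projection_beta(X, Y):
    """Plug-in estimate of the projection parameter beta* = Lambda^{-1} alpha."""
    Lambda_hat = X.T @ X / len(Y)      # estimate of E[X X^T]
    alpha_hat = X.T @ Y / len(Y)       # estimate of E[Y X]
    return np.linalg.solve(Lambda_hat, alpha_hat)

def bootstrap_beta(X, Y, B=1000, rng=None):
    """Naive pairs bootstrap for the distribution of sqrt(n) (beta_hat - beta*)."""
    rng = rng or np.random.default_rng(0)
    n = len(Y)
    beta_hat = projection_beta(X, Y)
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        draws[b] = np.sqrt(n) * (projection_beta(X[idx], Y[idx]) - beta_hat)
    return beta_hat, draws

# Illustrative run with n = 100, d = 50 (as in the simulation on the next slide).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 50))
Y = X[:, 0] + rng.standard_t(df=3, size=100)   # hypothetical data-generating process
beta_hat, draws = bootstrap_beta(X, Y, B=500, rng=rng)
```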


Example: n = 100, d = 50

[Figure: four histograms on the range −400 to 400: the true sampling distribution of a coefficient (“True”) and three bootstrap approximations (“Boot”).]


d Large

Rinaldo, Wasserman, G’Sell (2018). Berry-Esseen bound:

\[
\Psi_n = \sup_{P \in \mathcal{P}_n} \sup_t \big| P\big(\sqrt{n}(\hat\beta - \beta^*) \le t\big) - \mathrm{Normal} \big| \le \Delta_1 + \Delta_2 + \Delta_3
\]

\[
\Delta_{1,n} \approx \Big(\frac{d^2}{n}\Big)^{1/6} \ \text{(linear terms)}, \qquad
\Delta_{2,n} \approx \Big(\frac{d^4}{n}\Big)^{1/2} \ \text{(non-linear terms)}, \qquad
\Delta_{3,n} \approx \Big(\frac{d^5}{n}\Big)^{1/6} \ \text{(covariance estimation, sandwich)}
\]


Bad News

The worst term is

\[
\Delta_{3,n} \approx \Big(\frac{d^5}{n}\Big)^{1/6},
\]

which comes from estimating the covariance of β̂ (the sandwich estimator).

Need: d = o(n^{1/5}), which is essentially constant!

A similar bound holds for the bootstrap.

It's bad, and easily confirmed by simulation.

We think the bound is tight (but have not proved it). Lower bounds for Berry-Esseen theorems are rare.

A similar conclusion appears in El Karoui and Purdom (2016).


Solutions

1. Change the parameter (LOCO).

2. Focus on prediction (conformalization).


LOCO (Leave Out COvariates) Inference

Split the data into two parts, D1 and D2.

Fit m̂ (any regression estimator) on D1.

Drop variable j, re-fit on D1, and get m̂(−j).

Use D2 to get an exact confidence interval for the conditional quantity

\[
\theta_j = \phi\Big( \, |Y - \hat m^{(-j)}(X)| - |Y - \hat m(X)| \;\Big|\; D_1 \Big),
\]

where φ is the median or the mean.

m̂ can be: lasso, random forest, neural net, ...

Exact for the median; for the mean, a Berry-Esseen bound of order 1/√n.

Similar to: Breiman’s permutation importance, Mentch and Hooker (2016), Abbasi-Asl and Yu (2017), Davies (2018).
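A minimal sketch of the split-sample LOCO procedure for the median version of θj, assuming a lasso fit; the distribution-free interval for the conditional median comes from binomial order statistics on D2. Function and variable names are illustrative, not from the talk or the paper.

```python
import numpy as np
from scipy.stats import binom
from sklearn.linear_model import LassoCV

def loco_median_interval(X, Y, j, alpha=0.05, rng=None):
    """LOCO interval for the conditional median of
    |Y - m_{-j}(X)| - |Y - m(X)|, using a single sample split."""
    rng = rng or np.random.default_rng(0)
    n = len(Y)
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2:]                 # D1 (fit), D2 (inference)

    m_full = LassoCV(cv=5).fit(X[i1], Y[i1])             # fit on D1, all covariates
    keep = np.delete(np.arange(X.shape[1]), j)
    m_drop = LassoCV(cv=5).fit(X[i1][:, keep], Y[i1])    # refit without covariate j

    # Excess absolute prediction error on D2 from dropping covariate j.
    W = np.abs(Y[i2] - m_drop.predict(X[i2][:, keep])) - np.abs(Y[i2] - m_full.predict(X[i2]))
    W = np.sort(W)
    n2 = len(W)

    # Distribution-free CI for the median via order statistics: pick rank c with
    # P(Bin(n2, 1/2) <= c - 1) < alpha/2, so [W_(c), W_(n2+1-c)] covers the median
    # with probability at least 1 - alpha for any distribution.
    c = max(int(binom.ppf(alpha / 2, n2, 0.5)), 1)
    return W[c - 1], W[n2 - c]
```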


Bias

θj is a conditional quantity. (The randomness is in (X, Y), not in the estimator, which is held fixed.)

The corresponding marginal (population) quantity, namely,

\[
\psi = E\big[\,|Y - m^{(-j)}(X)| - |Y - m(X)|\,\big]
\]

or

\[
\psi = E\big[(Y - m^{(-j)}(X))^2 - (Y - m(X))^2\big],
\]

is not estimable in the model-free framework. (Similarly for conditional independence.)


Bias

As an estimate of ψ, the plug-in estimator is biased.

LOCO based on the 1-nn estimate (Devroye, Gyorfi, Lugosi, Walk 2018) has low bias but very low power.

Subtracting the first-order influence function (Williamson, Gilbert, Simon, Carone 2017) is reasonable but still requires many assumptions.

Also, it is degenerate under the null. (But we (JR and LW) are looking into higher-order statistics.)

Ultimately, we have to regard LOCO as exact inference for a conditional quantity.


Examples of LOCO: n=200, d=500 (RT)

Example: n = 200, d = 500, with data model Y = β^T X + ε, where X ∼ N(0, I_d), ε ∼ N(0, 1), and

\[
\beta_j \sim N(0, 2) \ \text{for}\ j = 1, \dots, 5, \qquad \beta_j = 0 \ \text{otherwise.}
\]

• The algorithm is the lasso, with 5-fold CV and the 1se rule to select λ.

• Compute an interval for

\[
\theta_j(D_1) = \mathrm{med}\Big( \, |Y - \hat m^{(-j)}_{n_1}(X)| - |Y - \hat m_{n_1}(X)| \;\Big|\; D_1 \Big)
\]

for each j in the lasso active set.

• Use a Bonferroni correction: if s variables are selected, compute each LOCO interval at level 1 − α/s (a sketch of this setup appears after the note below).

Note: slides marked RT are courtesy of Ryan Tibshirani.
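A sketch of the simulation setup just described, under some assumptions of my own: plain cross-validation stands in for the 1se rule (which sklearn's LassoCV does not expose), and the LOCO intervals themselves would be computed on D2 as in the earlier loco_median_interval sketch, at the Bonferroni-corrected level.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, d, s = 200, 500, 5

# Data model from the slide: beta_j ~ N(0, 2) for j = 1..5, zero otherwise.
beta = np.zeros(d)
beta[:s] = rng.normal(0.0, np.sqrt(2.0), size=s)
X = rng.normal(size=(n, d))
Y = X @ beta + rng.normal(size=n)

# Split once and select the active set with the lasso on D1.
idx = rng.permutation(n)
D1 = idx[: n // 2]
active = np.flatnonzero(LassoCV(cv=5).fit(X[D1], Y[D1]).coef_)

# Bonferroni: with len(active) selected variables, each LOCO interval
# (computed as in the earlier sketch) is run at level 1 - alpha / len(active).
alpha = 0.05
level_per_variable = alpha / max(len(active), 1)
```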


Examples of LOCO

[Figure: LOCO intervals using lasso + CV; test error difference for each variable in the active set (variables 1-5 and a number of null variables).]


Examples of LOCO

[Figure: LOCO intervals using forward stepwise + CV; test error difference for each selected variable.]


Examples of LOCO

[Figure: LOCO intervals using SPAM + CV; test error difference for each selected variable.]


Examples of LOCO

[Figure: LOCO intervals using a random forest; test error difference for each selected variable.]


Examples of LOCO

[Figure: histogram of test errors (absolute error, density scale) for the lasso, SPAM, and random forest.]


The good (RT)

• Algorithmically flexible: any algorithm can be used to measure variable importance.

• Computationally cheap(-ish): one refitting of the algorithm at hand per variable considered.

• No distributional assumptions: intervals for θj(D1) have exact finite-sample coverage, for any distribution P of (X, Y).

• Selective validity: the intervals cover the selected variables.

• Accuracy: the intervals (with Bonferroni correction, for s variables) have length O(√(log(sn)/n)).

• Simplicity: very simple and portable; easy to implement.


Combining many estimators (RT)

[Figure: LOCO using lasso + SPAM + RF + CV; proportion of MAD explained for each selected variable.]


The Bias (RT)

[Figure: LOCO intervals with population centers, for lasso, SPAM, and random forest; proportion of MAD explained per variable, with conditional θ and marginal θ centers marked.]


The Purely Predictive Approach: Conformal Prediction

This is an approach that makes NO model assumptions.

It can be used for many tasks (not just regression).


Recent History

Conformal prediction (Vovk, Gammerman, Shafer 2005). Only assumes exchangeability.

Connected to minimax density estimation, regression, and classification: Lei, Robins, Wasserman (2013), Lei, Wasserman (2013), and Lei (2014).

High-dimensional regression, lasso: Lei, G’Sell, Rinaldo, Tibshirani, Wasserman (2018).

Multiclass problems: Sadinle, Lei and Wasserman (2018).

Random effects: Dunn and Wasserman (2018).

Deep learning: Hechtlinger, Poczos, Wasserman (2018), Hechtlinger, Dalmasso, Rinaldo and Wasserman (2018).

Clustering: Lei, Rinaldo and Wasserman (2016), Shin, Rinaldo and Wasserman (in progress).

Robustness: Balakrishnan, Patil, Shrotriya and Wasserman (in progress).


Conformal Prediction

Augment → Fit → Test → Invert

Augment the data: A = {(X1, Y1), . . . , (Xn, Yn), (x, y)}.

Fit and get a ‘residual’ or ‘conformal score’ (this is the art):

\[
R_i = \phi\big((X_i, Y_i), A\big).
\]

Test H0: (Xn+1, Yn+1) = (x, y). Get the p-value

\[
\pi(x, y) = \frac{1}{n+1} \sum_{i=1}^{n+1} I(R_i \ge R_{n+1}).
\]

Invert: Cn(x) = {y : π(x, y) ≥ α}.
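A minimal sketch of the augment/fit/test/invert loop for regression, assuming the absolute-residual score |Yi − m̂aug(Xi)|, ridge regression as the fitter, and a finite grid of trial y values whose accepted portion is reported as an interval. All of these choices are illustrative, not prescribed by the talk.

```python
import numpy as np
from sklearn.linear_model import Ridge

def full_conformal_interval(X, Y, x_new, alpha=0.1, grid_size=200):
    """Full conformal prediction set at x_new, reported as the range of
    accepted grid values."""
    y_grid = np.linspace(Y.min() - 3 * Y.std(), Y.max() + 3 * Y.std(), grid_size)
    accepted = []
    for y in y_grid:
        # Augment: add the trial point (x_new, y) to the data.
        X_aug = np.vstack([X, x_new])
        Y_aug = np.append(Y, y)
        # Fit on the augmented data and compute conformal scores.
        # (Ridge's own `alpha` is its regularization strength, unrelated to coverage.)
        m_aug = Ridge(alpha=1.0).fit(X_aug, Y_aug)
        R = np.abs(Y_aug - m_aug.predict(X_aug))
        # Test: p-value for H0 that (x_new, y) is exchangeable with the data.
        pval = np.mean(R >= R[-1])
        # Invert: keep y if the p-value is at least alpha.
        if pval >= alpha:
            accepted.append(y)
    return (min(accepted), max(accepted)) if accepted else None
```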


Conformal Prediction

We have

\[
1 - \alpha \;\le\; \inf_P P\big(Y_{n+1} \in C_n(X_{n+1})\big) \;\le\; 1 - \alpha + \frac{1}{n+1}.
\]

No assumptions!

Does not require the model m to be correct.


Split Conformal

Split the data into two groups D1 and D2.

Compute residuals of the form

\[
R_i = \phi(Y_i, Q),
\]

where Q is computed from D1 and Yi ∈ D2.

Let q be the 1 − α quantile of the Ri’s and let C = {y : φ(y, Q) ≤ q}.

Then

\[
1 - \alpha \;\le\; \inf_P P\big(Y_{n+1} \in C_n(X_{n+1})\big) \;\le\; 1 - \alpha + \frac{2}{n+2}.
\]

This is much faster (no augmentation step) but leads to wider intervals.
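A minimal sketch of split conformal for regression with the absolute-residual score. The finite-sample quantile rank ceil((n2+1)(1−α)) and the random-forest fitter are my own illustrative choices; the slide does not pin down these details.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def split_conformal(X, Y, X_test, alpha=0.1, rng=None):
    """Split-conformal prediction intervals with absolute-residual scores."""
    rng = rng or np.random.default_rng(0)
    n = len(Y)
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2:]

    m = RandomForestRegressor(n_estimators=200).fit(X[i1], Y[i1])   # fit on D1
    R = np.abs(Y[i2] - m.predict(X[i2]))                            # scores on D2

    # Finite-sample-valid quantile of the D2 scores.
    n2 = len(i2)
    k = int(np.ceil((n2 + 1) * (1 - alpha)))
    q = np.sort(R)[min(k, n2) - 1]

    pred = m.predict(X_test)
    return pred - q, pred + q        # lower and upper interval endpoints
```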


Split Conformal

We could split N times, at level 1 − α/N each, and set

\[
C_N = \bigcup_j C_j,
\]

but, under weak conditions, with probability tending to 1,

\[
\mu(C_N) > \mu(C_{\mathrm{split}}).
\]


The Choice of Conformal Score

Examples:

\[
|Y_i - \hat m_{\mathrm{aug}}(X_i)|, \qquad
\frac{1}{\hat p_{\mathrm{aug}}(X_i, Y_i)}, \qquad
\frac{1}{\hat p_{\mathrm{aug}}(Y_i \mid X_i)}, \qquad
\frac{1}{\hat p_{\mathrm{aug}}(X_i \mid Y_i)}, \qquad
\frac{1}{\hat p_{\mathrm{aug}}(X_i, Y_i; \hat\theta)}
\]


Efficiency

The choice of score affects efficiency (the size of the set), not validity.

It can be tuned to get minimax prediction sets (see Lei, Robins and Wasserman 2013; Lei and Wasserman 2014).

Example: unsupervised prediction. Set

\[
R_i(y) = \frac{1}{\hat p_h\big(Y_i; \{Y_1, \dots, Y_n, y\}\big)},
\]

where p̂h is a density estimate with bandwidth h computed from the augmented sample.
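A minimal sketch of unsupervised conformal prediction with a density-based score. For speed it uses split conformal rather than the full augmented fit in the formula above, and a Gaussian kernel with a plug-in bandwidth; both are assumptions of mine.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_conformal_set(Y, alpha=0.1, grid_size=500, rng=None):
    """Split-conformal prediction set {y : p_hat(y) >= threshold} for a new Y."""
    rng = rng or np.random.default_rng(0)
    n = len(Y)
    idx = rng.permutation(n)
    Y1, Y2 = Y[idx[: n // 2]], Y[idx[n // 2:]]

    p_hat = gaussian_kde(Y1)                     # density estimate from D1
    scores = 1.0 / p_hat(Y2)                     # conformal scores on D2

    n2 = len(Y2)
    k = int(np.ceil((n2 + 1) * (1 - alpha)))
    q = np.sort(scores)[min(k, n2) - 1]          # score threshold

    # The prediction set is a density level set; report it on a grid.
    grid = np.linspace(Y.min() - 3 * Y.std(), Y.max() + 3 * Y.std(), grid_size)
    return grid[1.0 / p_hat(grid) <= q]
```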


Unsupervised Prediction: Preservation of Minimaxity

The resulting set Cn satisfies inf_P P(Yn+1 ∈ Cn) ≥ 1 − α, and:

(A1) Suppose p is β-Hölder smooth.

(A2) Suppose

\[
c_1 |\epsilon|^{\gamma} \;\le\; \big| P(\{y : p(y) \le t_\alpha + \epsilon\}) - \alpha \big| \;\le\; c_2 |\epsilon|^{\gamma}.
\]

Then, for all λ > 0,

\[
P\left( \mu(C_n \,\Delta\, C_\alpha) \gtrsim \Big(\frac{\log n}{n}\Big)^{\beta\gamma/(2\beta + d)} + \Big(\frac{\log n}{n}\Big)^{1/2} \right) \;\le\; \Big(\frac{1}{n}\Big)^{\lambda},
\]

where Cα is the oracle (smallest possible) prediction set.

But if (A1) and (A2) fail, we still have coverage (even if P does not have a density).

The size of Cn can also be used to choose the bandwidth.


Properties (Regression Case)

Oracle: C(x) = [µ̂(x) − qn, µ̂(x) + qn], where qn is the upper quantile of Law(|Y − µ̂|).

Super-oracle: C(x) = [µ(x) − q, µ(x) + q], where q is the upper quantile of Law(|Y − µ|).


Properties (Regression Case)

Then

\[
\mathrm{Length(Conformal\ Interval)} - \mathrm{Length(Oracle)} = O_P(\eta_n + \rho_n + n^{-1/2}),
\]

where

\[
P\Big( \sup_y \|\hat\mu - \hat\mu_{X,y}\|_\infty > \eta_n \Big) \le \rho_n,
\]

with µ̂X,y the fit on the data augmented by (X, y).

With some additional assumptions:

\[
\mu(\mathrm{Oracle}\,\Delta\,\mathrm{Conformal}) = o_P(1)
\]

and

\[
P\big(Y \in C(X) \mid X = x\big) \to 1 - \alpha.
\]

Under the ‘usual’ assumptions (linearity, incoherence, etc.),

\[
\eta_n = \frac{\kappa^2 s \sqrt{\log d}}{\sqrt{n}},
\]

where κ comes from the restricted isometry condition, s is the sparsity, and ρn = (1/d)^c.


High Dimensional Examples

Amazing accuracy in very high dimensions


Example: n = 200, d = 2,000; linear and Normal

[Figure: coverage (left) and interval length (right) versus relative optimism, Setting A; methods: lasso, elastic net, stepwise, SPAM, random forest.]


Example: n = 200, d = 2,000; nonlinear and heavy-tailed

[Figure: coverage and interval length versus relative optimism, Setting B. Methods: Lasso, Elastic net, Stepwise, SpAM, Random forest.]


Example: n = 200, d = 2,000; linear, correlated, heteroskedastic, heavy-tailed

[Figure: coverage and interval length versus relative optimism, Setting C. Methods: Lasso, Elastic net, Stepwise, SpAM, Random forest.]
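To make the setup concrete, here is a minimal sketch of split conformal prediction with a lasso base estimator in a Setting-A-like problem (n = 200, d = 2,000, linear and Normal). The simulation design, the lasso penalty, and the scikit-learn calls are illustrative assumptions, not the exact code behind the figures above.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d, alpha = 200, 2000, 0.1

# Illustrative linear/Normal design with 5 active coefficients out of d.
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:5] = 2.0
y = X @ beta + rng.normal(size=n)

# Split conformal: fit on one half, calibrate the residual quantile on the other.
idx = rng.permutation(n)
train, cal = idx[:n // 2], idx[n // 2:]
model = Lasso(alpha=0.1).fit(X[train], y[train])

res = np.abs(y[cal] - model.predict(X[cal]))
m = len(res)
q = np.sort(res)[int(np.ceil((m + 1) * (1 - alpha))) - 1]   # finite-sample-valid rank

# Prediction interval for a new x0: [muhat(x0) - q, muhat(x0) + q].
x0 = rng.normal(size=(1, d))
mu0 = model.predict(x0)[0]
print("interval:", (mu0 - q, mu0 + q), "length:", 2 * q)

The guarantee P(Y ∈ C(X)) ≥ 1 − α holds regardless of whether the lasso fit is any good; the fit only affects the interval length.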


Sliced Regression

Careful choice of conformal score matters.¹

Use residuals based on slicing, i.e. on the (estimated) conditional density p(x | y).

Similar to sliced inverse regression, but we do NOT assume the usual index model Y = f(βᵀX) + ε.

This choice encourages outputting C_n = ∅ when x is unusual.

¹Joint work with Yotam Hechtlinger, Nic Dalmasso, Alessandro Rinaldo
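A hedged sketch of the idea: bin y into slices, estimate the conditional density of x within each slice, and use that density as the conformal score, so that an unusual x can yield an empty prediction set. The binning rule, the kernel density estimator, and all parameter values below are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(1)
n, alpha = 2000, 0.1
X = rng.normal(size=(n, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.normal(size=n)

# Split: fit slice densities on one half, calibrate the threshold on the other.
idx = rng.permutation(n)
train, cal = idx[:n // 2], idx[n // 2:]
edges = np.quantile(y[train], np.linspace(0, 1, 11))        # 10 slices of y

def slice_of(yy):
    return np.clip(np.searchsorted(edges, yy) - 1, 0, 9)

kdes = [KernelDensity(bandwidth=0.3).fit(X[train][slice_of(y[train]) == s])
        for s in range(10)]

def score(x, yy):
    """Log of the estimated density of x within the slice containing yy."""
    return kdes[int(slice_of(yy))].score_samples(np.atleast_2d(x))[0]

cal_scores = np.array([score(X[i], y[i]) for i in cal])
t = np.quantile(cal_scores, alpha)                          # keep high-density pairs

def predict_set(x, grid=np.linspace(-2, 2, 81)):
    return [yy for yy in grid if score(x, yy) >= t]

# A typical x gives a nonempty set; a far-out x tends to give the empty set.
print(len(predict_set(np.array([0.0]))), len(predict_set(np.array([8.0]))))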


The D31 Data


Choice of Features: Efficiency not Validity

Part of choosing the score is the choice of covariates.

The Merck molecular activity dataset.

n = 14,875 observations, d = 5,464 features.

Sliced conformal regression.

Use features as given, or extract features from deep learning?


Optimal Multiclass Prediction: (Sadinle, Lei and W. 2018)

Suppose that Y ∈ {1, . . . , k}; k can be large.

Want to minimize E|C(X)| subject to P(Y ∈ C(X) | Y = y) ≥ 1 − α_y. Call E|C(X)| the ambiguity.

Solution (Neyman-Pearson lemma) is

C*(x) = {y : p(y | x) ≥ t_y}.
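A minimal sketch of the plug-in version of this rule, assuming a probabilistic classifier for p̂(y | x) and per-class thresholds t_y calibrated on held-out data; the dataset, the logistic-regression classifier, and the calibration rule below are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, k, alpha_y = 3000, 3, 0.05          # same alpha_y for every class here

# Illustrative 3-class Gaussian data.
means = np.array([[0, 0], [3, 0], [0, 3]])
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, 2))

idx = rng.permutation(n)
train, cal = idx[:n // 2], idx[n // 2:]
clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])

# Per-class threshold t_y: the alpha_y-quantile of p_hat(y | x) over calibration
# points whose true label is y, so that class y is covered about 1 - alpha_y of the time.
probs = clf.predict_proba(X[cal])
t = np.array([np.quantile(probs[y[cal] == c, c], alpha_y) for c in range(k)])

def prediction_set(x):
    p = clf.predict_proba(np.atleast_2d(x))[0]
    return [c for c in range(k) if p[c] >= t[c]]

print(prediction_set([0, 0]), prediction_set([1.5, 1.5]), prediction_set([10, 10]))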


Estimating the Optimal Classifier

Plug-in estimator Ĉ(x) = {y : p̂(y | x) ≥ t̂_y}:

With probability at least 1 − kδ_n − (1/n),

P(Ĉ ∆ C*) ≲ ε_n^γ + √(log n / n),

where δ_n and ε_n are model specific.

Example: kNN, p(y | x) Lipschitz: δ_n = 1/n and ε_n ≍ (log n / n)^{1/(2d)}.

Example: sparse logistic regression, with incoherence assumption:

ε_n ≍ (log d / n)^{1/4}, δ_n ≍ 1/d + ||β||_0 √(log d / n),

but validity does not depend on these assumptions.


Price of Optimality

Sometimes, C(x) = ∅.


Null Predictions

Could try to minimize E|C(X)| subject to P(Y ∈ C(X) | Y = y) ≥ 1 − α_y and subject to C(x) ≠ ∅. Difficult in general.

Accretive completion: gradually decrease each t_y (greedily) while minimizing the ambiguity E|C(X)|.


... but maybe we should encourage empty set predictions ....


Cautious Deep Learning

If k is large and X is ambiguous, C(X) will be large. This is a feature.

If we use p(x | y) as a score, and x is unusual (i.e. p(x | y) is small for all y), we will encourage C(x) = ∅. This is a feature, not a bug!
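A hedged sketch of this class-conditional score: estimate p̂(x | y) for each class, calibrate a per-class threshold, and report the set of classes whose density at x clears its threshold. Here a kernel density estimate on two-dimensional features stands in for density estimation on deep-network embeddings; all modeling choices are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
n, k, alpha = 3000, 3, 0.05
means = np.array([[0, 0], [4, 0], [0, 4]])
y = rng.integers(0, k, size=n)
X = means[y] + rng.normal(size=(n, 2))       # stand-in for deep features

idx = rng.permutation(n)
train, cal = idx[:n // 2], idx[n // 2:]

# One density estimate per class, fit on the training half.
kdes = [KernelDensity(bandwidth=0.5).fit(X[train][y[train] == c]) for c in range(k)]

# Per-class threshold: alpha-quantile of log p_hat(x | y) over calibration points
# of that class, so typical class members are retained.
t = np.array([np.quantile(kdes[c].score_samples(X[cal][y[cal] == c]), alpha)
              for c in range(k)])

def prediction_set(x):
    x = np.atleast_2d(x)
    return [c for c in range(k) if kdes[c].score_samples(x)[0] >= t[c]]

print(prediction_set([0, 0]))     # near a class center
print(prediction_set([2, 2]))     # between classes
print(prediction_set([30, 30]))   # far from every class: typically the empty set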


Deep Learning

TL: Sea Snake. Prediction: Null Set
TL: Alp. Prediction: Ski
TL: Shetland Sheepdog. Prediction: Shetland Sheepdog, Collie, Toilet Paper
TL: Soup Bowl. Prediction: Face Powder, Soup Bowl, Tray
TL: Cradle. Prediction: Sleeping Bag
TL: Garter Snake. Prediction: Null Set
TL: Porcupine. Prediction: Porcupine, Quill
TL: Bakery. Prediction: Null Set
TL: Mousetrap. Prediction: Mousetrap
TL: Angora. Prediction: Null Set
TL: Brain Coral. Prediction: Brain Coral, Water Bottle, Coral Reef
TL: Cougar. Prediction: Cougar
TL: Guenon. Prediction: Guenon, Patas
TL: Recreational Vehicle. Prediction: Recreational Vehicle
TL: Harvester. Prediction: Null Set
TL: Grey Whale. Prediction: Null Set
TL: Sea Anemone. Prediction: Null Set
TL: Vulture. Prediction: Null Set
TL: Carton. Prediction: Null Set
TL: Crane. Prediction: Crane, Hook

(TL = true label; each entry shows the prediction set for one test image.)


Deep Learning

Our Method (α = .5):
• Null Set

Inception-v4 Model:
• Coil (0.910)
• Hay (0.008)
• Maze (0.005)


Deep Learning

Our Method (α = .55):
• Null Set

Inception-v4 Model:
• Volleyball (0.388)
• Tennis Ball (0.160)
• Racket (0.157)


Deep Learning

Our Method (α = .5):
• Null Set

Inception-v4 Model:
• Kite (0.137)
• Bee Eater (0.033)
• Missile (0.031)


Back To Inference: Combining LOCO With Conformal Prediction

Use the conformal set C to assess the effects of dropping covariates.

LOCO: drop X(j), refit to get m̂^(−j), and let

∆_j(x, y) = |y − m̂^(−j)(x)| − |y − m̂(x)|,

W_j(x) = {∆_j(x, y) : y ∈ C(x)}.

Then

inf_P P(∆_j(X, Y) ∈ W_j(X) for all j) ≥ 1 − α.
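A rough sketch of this construction for one covariate: build a split-conformal interval C(x), then map it through ∆_j to get W_j(x). The base learner (random forests), the data, and the grid approximation of the image of C(x) are illustrative assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n, d, alpha, j = 1000, 5, 0.1, 0          # assess the effect of covariate j = 0
X = rng.normal(size=(n, d))
y = 2 * X[:, 0] + np.sin(X[:, 1]) + 0.5 * rng.normal(size=n)

idx = rng.permutation(n)
train, cal = idx[:n // 2], idx[n // 2:]
m_full = RandomForestRegressor(random_state=0).fit(X[train], y[train])
m_drop = RandomForestRegressor(random_state=0).fit(np.delete(X[train], j, axis=1), y[train])

# Split-conformal radius for the full model.
res = np.abs(y[cal] - m_full.predict(X[cal]))
m = len(res)
q = np.sort(res)[int(np.ceil((m + 1) * (1 - alpha))) - 1]

def W_j(x):
    """Image of the conformal interval C(x) under y -> Delta_j(x, y)."""
    x = np.atleast_2d(x)
    mu = m_full.predict(x)[0]
    mu_j = m_drop.predict(np.delete(x, j, axis=1))[0]
    ys = np.linspace(mu - q, mu + q, 201)             # grid over C(x)
    deltas = np.abs(ys - mu_j) - np.abs(ys - mu)
    return deltas.min(), deltas.max()

print(W_j(rng.normal(size=d)))   # interval of plausible LOCO effects at this x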


Nonparametric Additive Model: f4 = f5 = f6 = 0

[Figure: LOCO conformal intervals for Components 1 through 6, plotted against location.]


Part II: Clustering

By taking a predictive view of clustering we solve several problems:

• choose tuning parameters

• automatically merge some clusters

• replace Voronoi diagram with spheres and ellipsoids (more natural)

• get a prediction coverage guarantee


Predictive Clustering

Basic idea:

1. Perform clustering.

2. Define a cluster-based conformal score (residual).

3. Get the conformal set.

4. Choose tuning parameters to minimize the size of the conformal prediction set.

This implicitly creates a bias-variance tradeoff that is missing in clustering.


From k-means to k-spheres

1. Split the data into two halves Y_1 and Y_2.

2. Run k-means on Y_1 to get centers c_1, . . . , c_k.

3. For the data in Y_2 compute the (non-augmented) residuals R_i = min_j ||Y_i − c_j|| = ||Y_i − c_{j(i)}||, where c_{j(i)} is the closest center to Y_i.

4. Let t_α be the 1 − α quantile of the residuals.

5. Let C_k = ⋃_j B(c_j, t_α).

6. Choose k to minimize the Lebesgue measure of C_k. (It has a minimum!)

7. Return C = C_k = ⋃_j B(c_j, t_α) for the chosen k.

Then

inf_P P(Y ∈ C) ≥ 1 − α.

Clusters C_1, . . . , C_r are the connected components of C.
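A compact sketch of the k-spheres procedure above, using scikit-learn k-means; the Lebesgue measure of the union of balls is approximated by Monte Carlo, which is an illustrative shortcut rather than part of the method.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
alpha = 0.1
Y = np.vstack([rng.normal([0, 0], 0.5, (300, 2)),
               rng.normal([5, 5], 0.5, (300, 2))])
idx = rng.permutation(len(Y))
Y1, Y2 = Y[idx[:300]], Y[idx[300:]]

def k_spheres(k):
    centers = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y1).cluster_centers_
    res = np.min(np.linalg.norm(Y2[:, None, :] - centers[None], axis=2), axis=1)
    m = len(res)
    t = np.sort(res)[int(np.ceil((m + 1) * (1 - alpha))) - 1]
    return centers, t

def mc_volume(centers, t, n_mc=20000):
    """Monte Carlo estimate of the Lebesgue measure of the union of balls."""
    lo, hi = Y.min(0) - 1, Y.max(0) + 1
    U = rng.uniform(lo, hi, size=(n_mc, 2))
    inside = (np.linalg.norm(U[:, None, :] - centers[None], axis=2) <= t).any(axis=1)
    return inside.mean() * np.prod(hi - lo)

vols = {k: mc_volume(*k_spheres(k)) for k in range(1, 8)}
k_hat = min(vols, key=vols.get)
centers, t = k_spheres(k_hat)
print("chosen k:", k_hat, "radius:", round(t, 2))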


Example

[Figure. Top left: data. Top right: k = 6-means. Bottom left: repaired by our method.]


Improved Residuals

We can choose any residuals for conformal prediction. For example,

R_i = min_j [ ||Y_i − c_j||² / σ_j² + 2d log σ_j − 2 log π_j ].

Then

C = ⋃_j B(c_j, r_j)

where

r_j = σ_j √( [t_α + 2 log π_j − 2d log σ_j]_+ ).
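A small sketch of how the per-cluster radii follow from this score, given cluster centers, scales σ_j, weights π_j, and the calibrated threshold t_α; the numerical values below are made up for illustration.

import numpy as np

d = 2                                        # dimension
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
sigma = np.array([0.5, 1.0])                 # per-cluster scale sigma_j
pi = np.array([0.7, 0.3])                    # per-cluster weight pi_j

def score(y):
    """Improved residual: min_j ||y - c_j||^2 / sigma_j^2 + 2 d log sigma_j - 2 log pi_j."""
    dist2 = np.sum((y - centers) ** 2, axis=1)
    return np.min(dist2 / sigma**2 + 2 * d * np.log(sigma) - 2 * np.log(pi))

# Suppose t_alpha was obtained as the (1 - alpha) quantile of the scores on a
# held-out half of the data; here it is just a placeholder value.
t_alpha = 6.0

# Per-cluster radii of the prediction set C = union_j B(c_j, r_j).
r = sigma * np.sqrt(np.clip(t_alpha + 2 * np.log(pi) - 2 * d * np.log(sigma), 0, None))
print("radii:", np.round(r, 3))

# Membership is equivalent either way: y in C  <=>  score(y) <= t_alpha.
y0 = np.array([0.3, -0.2])
print(score(y0) <= t_alpha, any(np.linalg.norm(y0 - centers, axis=1) <= r))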


Improved k-means Prediction

Smaller prediction sets.

Better approximation to density level sets.

Robust to outliers: Huber contamination model

P = (1− ε)P0 + εQ

Valid coverage and smaller prediction set.


Example: Outliers, standard k-means

[Figure: outlier-contaminated data, the standard k-means prediction set, and volume versus k.]


Example: Outliers, improved k-means

[Figure: outlier-contaminated data, the improved k-means prediction set, and volume versus k.]


Example

[Figure: data, volume versus k, and the prediction set for k = 4.]


Example

[Figure: prediction sets for k = 2, 3, 4, and 5.]


Example

[Figure: data, volume versus k, and the prediction set for k = 4.]


Conformal Density Clustering

Let C_1, . . . , C_r be the connected components of

L = {y : p̂_h(y) > t}.

Problems:
1. How to choose h?
2. How to choose t?
3. How to find C_1, . . . , C_r?

All solved by conformal clustering.


Density Level Set Conformal Clustering

1. Split the data into two halves Y_1 and Y_2.

2. Estimate a density p̂ from Y_1.

3. For a given threshold L > 0, let Y_1^(L) := {Y_i ∈ Y_1 : p̂(Y_i) ≥ L}.

4. Compute the residuals R_i := R(Y_i) for Y_i ∈ Y_2, where R(y) = min_{Y_j ∈ Y_1^(L)} ||y − Y_j||.

5. Let t_α be the 1 − α quantile of the residuals.

6. Let C = ⋃_{Y_j ∈ Y_1^(L)} B(Y_j, t_α).

Choose h and L to minimize volume. Conformalization makes this very robust to h and L because t_α adapts.
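A minimal sketch of this procedure with a Gaussian kernel density estimate; the bandwidth grid, the threshold grid, and the Monte Carlo volume estimate are illustrative assumptions.

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(6)
alpha = 0.1
Y = np.vstack([rng.normal([0, 0], 0.4, (400, 2)),
               rng.normal([4, 4], 0.4, (400, 2))])
idx = rng.permutation(len(Y))
Y1, Y2 = Y[idx[:400]], Y[idx[400:]]

def conformal_set(h, L):
    kde = KernelDensity(bandwidth=h).fit(Y1)
    keep = Y1[np.exp(kde.score_samples(Y1)) >= L]           # Y_1^(L)
    if len(keep) == 0:
        return keep, 0.0
    res = np.min(np.linalg.norm(Y2[:, None] - keep[None], axis=2), axis=1)
    m = len(res)
    t = np.sort(res)[int(np.ceil((m + 1) * (1 - alpha))) - 1]
    return keep, t

def mc_volume(keep, t, n_mc=20000):
    if len(keep) == 0:
        return np.inf                                       # degenerate choice of (h, L)
    lo, hi = Y.min(0) - 1, Y.max(0) + 1
    U = rng.uniform(lo, hi, size=(n_mc, 2))
    inside = (np.linalg.norm(U[:, None] - keep[None], axis=2) <= t).any(axis=1)
    return inside.mean() * np.prod(hi - lo)

# Pick (h, L) by minimizing the volume of the union of balls.
grid = [(h, L) for h in (0.2, 0.4, 0.8) for L in (0.01, 0.05, 0.1)]
h_best, L_best = min(grid, key=lambda hl: mc_volume(*conformal_set(*hl)))
print("chosen bandwidth and level:", h_best, L_best)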


Example: Small L, 4 bandwidths


Example: Large L, 4 bandwidths


Part III: Random Effects (with Robin Dunn)

Random distributions (subjects)

P1, . . . ,Pk ∼ Π

and observe data Dj drawn from Pj .

Problem 1: Predict Y ∼ P_{k+1} (new subject).
Problem 2: Predict Y ∼ P_j for some 1 ≤ j ≤ k; a new observation on an existing subject. (Shrinkage.)

I’ll focus on Problem 1.


Random Effects: Method 1

Select one observation randomly from each group. (These are iid).

Apply any conformal procedure at level α/N.

Repeat N times.

Set C = ⋂_{j=1}^{N} C_j.

This is always valid. Optimal N seems to be N = 1.
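A small sketch of Method 1 with N = 1: draw one observation per subject (these are iid draws from the mixture ∫ P dΠ), then run ordinary split conformal on them. The one-dimensional Gaussian random-effects data and the absolute-deviation score below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(7)
alpha, k, n_per = 0.1, 100, 20

# Random effects: each subject j has its own mean; several observations per subject.
subject_means = rng.normal(0, 2, size=k)
data = [mu + rng.normal(0, 1, size=n_per) for mu in subject_means]

# Method 1 (N = 1): one randomly chosen observation per subject gives an iid sample.
Y = np.array([rng.choice(d_j) for d_j in data])

# Split conformal on the iid sample: absolute deviation from the mean of one half.
idx = rng.permutation(k)
train, cal = idx[:k // 2], idx[k // 2:]
center = Y[train].mean()
res = np.abs(Y[cal] - center)
m = len(res)
q = np.sort(res)[int(np.ceil((m + 1) * (1 - alpha))) - 1]
print("prediction interval for a new subject:", (center - q, center + q))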


Random Effects: Method 2. Double Conformal Inference

Use a parametric working model. (It does not need to be correct.)
Apply conformal prediction to get Cj for group j at level 1 − δ.
In fact, these are iid random sets such that

∫_{Cj} dPj = 1 − δ + oP(1).

We have m = k iid random sets C1, . . . , Cm (one per group).

Apply conformal inference to the sets at level β to get C such that

P(Cm+1 ∈ C) ≥ 1− β.

If δ + β ≤ α, then P(Y ∈ C) ≥ 1 − α.

Seems elegant, but in practice subsampling (Method 1) works better.
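The sketch below is a heavily simplified, purely illustrative instance of the double conformal idea for scalar responses with interval-valued Cj: a Gaussian working model builds each group's interval, and order statistics of the interval endpoints across groups act as the second conformal layer. All of these choices (the working model, the endpoint scores, the function names) are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch of double conformal inference for scalar responses.
    # Layer 1: a (possibly wrong) Gaussian working model gives an interval Cj per group.
    # Layer 2: order statistics of the endpoints across groups are chosen so that a
    # new group's interval is contained in [L, U] with probability at least 1 - beta.
    # Then P(Y in [L, U]) is at least roughly 1 - delta - beta, since the per-group
    # coverage 1 - delta itself holds only approximately.
    import numpy as np
    from scipy.stats import norm

    def group_interval(d, delta):
        """Working-model (Gaussian) prediction interval for one group at level 1 - delta."""
        z = norm.ppf(1 - delta / 2)
        return d.mean() - z * d.std(ddof=1), d.mean() + z * d.std(ddof=1)

    def double_conformal(D, alpha=0.1):
        delta, beta = alpha / 2, alpha / 2          # split the budget so delta + beta <= alpha
        k = len(D)
        lowers, uppers = map(np.array, zip(*(group_interval(d, delta) for d in D)))

        # Second layer: L near the (beta/2) lower quantile of the lower endpoints,
        # U near the (1 - beta/2) upper quantile of the upper endpoints.
        r = int(np.floor((beta / 2) * (k + 1)))
        s = int(np.ceil((1 - beta / 2) * (k + 1)))
        L = -np.inf if r < 1 else np.sort(lowers)[r - 1]
        U = np.inf if s > k else np.sort(uppers)[s - 1]
        return L, U

    # e.g. L, U = double_conformal(D, alpha=0.1), with D from the simulation sketch above.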

Summary

In the last few years, we have pursued the goal of model free inference.

This philosophy can be applied to many problems: high-dimensional regression, deep learning, random effects, ...

Conformalizing any optimal procedure preserves its properties without sacrificing general validity.

Especially important in high-dimensional problems where assumptions are hard to check.

Can we get rid of the iid assumption?

Code: https://github.com/ryantibs/conformal

THE END
