Lecture on Bootstrap

Yu-Chin Hsu (許育進)

Academia Sinica

December 4, 2014


This lecture is based on Xiaoxia Shi’s lecture note and my understanding of the bootstrap.

This is an introductory lecture on the bootstrap method, so I won’t provide any proofs.

For further reading, please see Horowitz, J.L. (2001), “The Bootstrap”, in J.J. Heckman and E. Leamer, eds., Handbook of Econometrics, vol. 5, Elsevier Science B.V., pp. 3159-3228, and the references within.

Xiaoxia Shi’s lecture note is available at http://www.ssc.wisc.edu/~xshi/econ715/Lecture_10_bootstrap.pdf


Introduction

Let $W = (W_1, \ldots, W_n)$ denote an i.i.d. sample with distribution function $F$.

Let $\theta_0$ be the parameter of interest and $\hat\theta_n = \hat\theta_n(W)$ denote the estimator based on $W$.

For example, $\theta_0$ can be the mean $E[W_i]$, with $\hat\theta_n(W) = n^{-1}\sum_{i=1}^{n} W_i$.

To make inference or to construct a confidence interval (CI) for $\theta_0$, we generally need to know the exact (or limiting) distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$.

Most of the time, $\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, \sigma^2)$.

Then, given a consistent estimator $\hat\sigma_n$ for $\sigma$, we can make inference and construct a CI.


Introduction (cont’d)

However, sometimes (1) the exact form of $\sigma$ is hard to obtain, or (2) it is very complicated to construct a consistent estimator $\hat\sigma_n$ for $\sigma$.

(1) can happen when the derivatives of the objective function of a maximum likelihood model are complicated.

(2) can happen when $\sigma$ involves nonparametric components, e.g., $\theta_0$ is the median of $F$ and $\hat\theta_n(W)$ is the sample median. Then $\sigma^2 = 1/(4 f(\theta_0)^2)$, where $f(\cdot)$ is the pdf of $F$. To estimate $\sigma$, one needs $\hat f(\hat\theta_n)$, where $\hat f$ is a nonparametric density estimator.

Therefore, it is hard to make inference and to construct a CI for $\theta_0$.

The bootstrap can serve as an alternative method for this purpose.


What is bootstrap?

Shi: “Bootstrap is an alternative to asymptotic approximation for carrying out inference. The idea is to mimic the variation from drawing different samples from a population by the variation from redrawing samples from a sample.”

Horowitz: “The bootstrap is a method for estimating the distribution of an estimator or test statistic by resampling one’s data or a model estimated from the data.”

Shi: “The name comes from the common English phrase “bootstrap” which alludes to “pulling oneself over the fence by pulling on ones own bootstrap”, and means solving a problem without external help.”


How does bootstrap work?

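Let $W^* = (W^*_1, \ldots, W^*_n)$ denote an i.i.d. sample with distribution function $F^*$.

Let $\theta^*_0$ be the parameter under $F^*$ and $\hat\theta^*_n = \hat\theta_n(W^*)$ denote the estimator based on $W^*$.

Basic idea:

When $F^*$ is close to $F$, the distribution of $\sqrt{n}(\hat\theta^*_n - \theta^*_0)$ should be close to that of $\sqrt{n}(\hat\theta_n - \theta_0)$.

Therefore, if we can find an $F^*$ (known to us) that is close to $F$ (unknown), then we can approximate the distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ (unknown) by that of $\sqrt{n}(\hat\theta^*_n - \theta^*_0)$ (known).

A natural choice of $F^*$ is the empirical cdf $F_n$, since $F_n$ is consistent for $F$ as the sample size grows. This leads to the “nonparametric bootstrap”. Under $F^* = F_n$, the parameter $\theta^*_0$ is the sample estimate $\hat\theta_n$ (for example, the mean of $F_n$ is the sample mean), so the bootstrap statistic is $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$.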


Confidence Interval

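What is the concept of a confidence interval (from a frequentist point of view)?

Suppose $\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N(0, \sigma^2)$ and $\hat\sigma_n^2 \xrightarrow{p} \sigma^2$.

Then the two-sided 95% confidence interval for $\theta_0$ is
$$\left(\hat\theta_n - \frac{1.96\,\hat\sigma_n}{\sqrt{n}},\ \hat\theta_n + \frac{1.96\,\hat\sigma_n}{\sqrt{n}}\right).$$

Why?

$$\begin{aligned}
& P\left(-1.96 < \sqrt{n}(\hat\theta_n - \theta_0)/\hat\sigma_n < 1.96\right) \approx 95\% \\
\Rightarrow\ & P\left(-1.96\,\hat\sigma_n < \sqrt{n}(\hat\theta_n - \theta_0) < 1.96\,\hat\sigma_n\right) \approx 95\% \\
\Rightarrow\ & P\left(-1.96\,\hat\sigma_n/\sqrt{n} < \hat\theta_n - \theta_0 < 1.96\,\hat\sigma_n/\sqrt{n}\right) \approx 95\% \\
\Rightarrow\ & P\left(-1.96\,\hat\sigma_n/\sqrt{n} < \theta_0 - \hat\theta_n < 1.96\,\hat\sigma_n/\sqrt{n}\right) \approx 95\% \\
\Rightarrow\ & P\left(\hat\theta_n - 1.96\,\hat\sigma_n/\sqrt{n} < \theta_0 < \hat\theta_n + 1.96\,\hat\sigma_n/\sqrt{n}\right) \approx 95\%
\end{aligned}$$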
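The interval above can be sketched in a few lines of Python for the sample-mean example. This is only an illustration: the function name `normal_ci_for_mean`, the simulated data and the use of the sample standard deviation as the consistent estimator of $\sigma$ are assumptions made here, not part of the lecture.

```python
import math
import random
import statistics


def normal_ci_for_mean(sample, z=1.96):
    """Asymptotic two-sided CI for the mean: theta_hat -/+ z * sigma_hat / sqrt(n)."""
    n = len(sample)
    theta_hat = statistics.fmean(sample)   # estimator: the sample mean
    sigma_hat = statistics.stdev(sample)   # assumed consistent estimator of sigma
    half_width = z * sigma_hat / math.sqrt(n)
    return theta_hat - half_width, theta_hat + half_width


random.seed(0)                             # illustrative simulated data only
data = [random.gauss(1.0, 2.0) for _ in range(400)]
lower, upper = normal_ci_for_mean(data)
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```

The interval is centred at the estimate by construction, which is exactly what the last line of the derivation says.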


Bootstrap Confidence Interval

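Suppose that the distribution of $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$ is close to that of $\sqrt{n}(\hat\theta_n - \theta_0)$.

Let’s pretend for now that we know the 2.5% and 97.5% quantiles of $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$, and denote them by $q^*_{2.5}$ and $q^*_{97.5}$.

Then the 95% CI is $\left(\hat\theta_n - q^*_{97.5}/\sqrt{n},\ \hat\theta_n - q^*_{2.5}/\sqrt{n}\right)$.

Note that

$$\begin{aligned}
& P\left(q^*_{2.5} < \sqrt{n}(\hat\theta^*_n - \hat\theta_n) < q^*_{97.5}\right) = 95\% \\
\Rightarrow\ & P\left(q^*_{2.5} < \sqrt{n}(\hat\theta_n - \theta_0) < q^*_{97.5}\right) \approx 95\% \\
\Rightarrow\ & P\left(q^*_{2.5}/\sqrt{n} < \hat\theta_n - \theta_0 < q^*_{97.5}/\sqrt{n}\right) \approx 95\% \\
\Rightarrow\ & P\left(-q^*_{97.5}/\sqrt{n} < \theta_0 - \hat\theta_n < -q^*_{2.5}/\sqrt{n}\right) \approx 95\% \\
\Rightarrow\ & P\left(\hat\theta_n - q^*_{97.5}/\sqrt{n} < \theta_0 < \hat\theta_n - q^*_{2.5}/\sqrt{n}\right) \approx 95\%
\end{aligned}$$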


Remarks:

In this example, we know that the limiting distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ is symmetric. Let $\alpha^*_{95}$ be such that
$$P\left(\sqrt{n}\,|\hat\theta^*_n - \hat\theta_n| < \alpha^*_{95}\right) = 95\%.$$

Then
$$P\left(\hat\theta_n - \alpha^*_{95}/\sqrt{n} < \theta_0 < \hat\theta_n + \alpha^*_{95}/\sqrt{n}\right) \approx 95\%.$$

Both CIs are asymptotically valid.

In general, the second (symmetric) one can have a higher-order improvement, in that its coverage probability converges to 95% at a faster rate than that of the first (equal-tailed) one. (Why?)

In general, if the finite-sample distribution of $\sqrt{n}(\hat\theta_n - \theta_0)$ is known to be skewed, then the first one might be the better one to use.


How to obtain those quantiles?

So far, we have pretended that $q^*_{2.5}$, $q^*_{97.5}$ and $\alpha^*_{95}$ are known.

The distribution of $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$ is known in principle, as we pointed out, because we know that the $W^*_i$ are drawn from $F_n$, the empirical CDF.

Of course, the closed form of the CDF of $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$ is still hard to get!

This is where the well-known bootstrap simulations come into play.


How to obtain those quantiles? (Cont’d)

We know that the $W^*_i$ are drawn from $F_n$, which is equivalent to drawing each $W^*_i$ at random from $\{W_1, \ldots, W_n\}$, each value having probability $1/n$.

Therefore, a bootstrap sample $\{W^*_1, \ldots, W^*_n\}$ is formed by $n$ random draws with replacement.

This step can be done by computer. Generate $U[0, 1]$ random variables. Let $u$ be a realization; the selected index is $k$ if $(k - 1)/n < u \le k/n$.
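The index rule above can be sketched as follows. The helper names `draw_index` and `bootstrap_sample` and the five-point data set are hypothetical, chosen only for illustration; in practice a library routine for sampling with replacement does the same job.

```python
import math
import random


def draw_index(u, n):
    """Return the k in {1, ..., n} with (k - 1)/n < u <= k/n, for u in (0, 1]."""
    return min(n, max(1, math.ceil(u * n)))


def bootstrap_sample(sample, rng):
    """Draw n observations with replacement, each with probability 1/n."""
    n = len(sample)
    # 1 - rng.random() lies in (0, 1], matching the half-open intervals above
    return [sample[draw_index(1.0 - rng.random(), n) - 1] for _ in range(n)]


rng = random.Random(0)
original = [2.3, 0.7, 1.9, 3.1, 1.2]       # illustrative data only
resample = bootstrap_sample(original, rng)
print(resample)
```

Some observations typically appear more than once in `resample` and others not at all, which is what sampling with replacement means.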


Bootstrap simulations:

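1. We can use a computer to draw $\{W^*_{1,b}, \ldots, W^*_{n,b}\}$ for $b = 1, \ldots, B$ and obtain $\hat\theta^*_b$.

2. The distribution of $\sqrt{n}(\hat\theta^*_n - \hat\theta_n)$ can then be approximated by the empirical distribution of $\sqrt{n}(\hat\theta^*_b - \hat\theta_n)$, $b = 1, \ldots, B$.

3. Rank the $\sqrt{n}(\hat\theta^*_b - \hat\theta_n)$ in ascending order, so that $\sqrt{n}(\hat\theta^*_{(1)} - \hat\theta_n) \le \sqrt{n}(\hat\theta^*_{(2)} - \hat\theta_n) \le \ldots \le \sqrt{n}(\hat\theta^*_{(B)} - \hat\theta_n)$.

4. $q^*_{2.5}$ and $q^*_{97.5}$ can be approximated by $\hat q^*_{2.5} = \sqrt{n}(\hat\theta^*_{(\lfloor 0.025 B \rfloor)} - \hat\theta_n)$ and $\hat q^*_{97.5} = \sqrt{n}(\hat\theta^*_{(\lfloor 0.975 B \rfloor)} - \hat\theta_n)$, respectively, where $\lfloor c \rfloor$ denotes the largest integer $a$ such that $a \le c$.

5. That is, if $B = 1000$, then $\hat q^*_{2.5} = \sqrt{n}(\hat\theta^*_{(25)} - \hat\theta_n)$ and $\hat q^*_{97.5} = \sqrt{n}(\hat\theta^*_{(975)} - \hat\theta_n)$, respectively.

6. $\alpha^*_{95}$ is obtained similarly, except that the ranking is based on $\sqrt{n}\,|\hat\theta^*_b - \hat\theta_n|$.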
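The six steps can be put together in a short Python sketch. The function names, the default $B = 1000$, the seed and the use of the sample median as the estimator are illustrative assumptions; the order statistics are picked with integer arithmetic to avoid rounding surprises in $\lfloor 0.025B \rfloor$, and $B$ is assumed large enough (at least 40) for those indices to exist.

```python
import math
import random
import statistics


def sorted_roots(sample, estimator, B, rng):
    """Return theta_hat and the sorted values sqrt(n) * (theta*_b - theta_hat)."""
    n = len(sample)
    theta_hat = estimator(sample)
    roots = [
        math.sqrt(n) * (estimator(rng.choices(sample, k=n)) - theta_hat)
        for _ in range(B)                   # each resample: n draws with replacement
    ]
    return theta_hat, sorted(roots)


def bootstrap_cis(sample, estimator, B=1000, seed=0):
    """Equal-tailed and symmetric 95% bootstrap CIs (assumes B >= 40)."""
    rng = random.Random(seed)
    root_n = math.sqrt(len(sample))
    theta_hat, roots = sorted_roots(sample, estimator, B, rng)
    # Order statistics floor(0.025 B) and floor(0.975 B), 1-based in the text
    q_lo = roots[(25 * B) // 1000 - 1]
    q_hi = roots[(975 * B) // 1000 - 1]
    equal_tailed = (theta_hat - q_hi / root_n, theta_hat - q_lo / root_n)
    # alpha_95: the same ranking applied to the absolute values
    alpha = sorted(abs(r) for r in roots)[(95 * B) // 100 - 1]
    symmetric = (theta_hat - alpha / root_n, theta_hat + alpha / root_n)
    return equal_tailed, symmetric


data_rng = random.Random(5)                 # illustrative simulated data only
data = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
equal_tailed, symmetric = bootstrap_cis(data, statistics.median)
print(equal_tailed, symmetric)
```

The median is used here because it is the example where a plug-in estimate of $\sigma$ needs a density estimate, whereas the bootstrap needs nothing beyond the estimator itself.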


How to obtain those quantiles? (Cont’d)

Note that this approximation can be made as accurate as you please by setting $B$ large enough.

When $B$ is too large, it might take too much time to compute. Therefore, there is a trade-off between accuracy and time.

In general, with $B$ between 700 and 1000, the approximation can be good.

Note that with $B = 1000$, $q^*_{97.5} = \sqrt{n}(\hat\theta^*_{(975)} - \hat\theta_n)$. Therefore, the lower bound of the CI can be rewritten as
$$\hat\theta_n - \frac{q^*_{97.5}}{\sqrt{n}} = \hat\theta_n - \frac{\sqrt{n}(\hat\theta^*_{(975)} - \hat\theta_n)}{\sqrt{n}} = \hat\theta_n - (\hat\theta^*_{(975)} - \hat\theta_n).$$

Similarly, the upper bound can be rewritten as $\hat\theta_n - (\hat\theta^*_{(25)} - \hat\theta_n)$.


Hypothesis testing

Let $W_i \sim N(\mu, 1)$. We want to test $H_0: \mu = 1$ vs. $H_1: \mu \neq 1$ at the 5% significance level.

Test statistic: $\sqrt{n}(\hat\mu_n - 1)$, where $\hat\mu_n$ is the sample average.

We would reject $H_0$ when $|\sqrt{n}(\hat\mu_n - 1)| > 1.96$.

Under $H_0: \mu = 1$, we will falsely reject the null hypothesis 5% of the time.

Under $H_1: \mu \neq 1$, we will reject the null hypothesis with probability approaching 1 as $n \to \infty$.


A wrong bootstrap procedure!

The following procedure is WRONG!

1. Generate bootstrap samples $\{W^*_{1,b}, \ldots, W^*_{n,b}\}$ for $b = 1, \ldots, B$, say $B = 1000$.

2. Calculate $\sqrt{n}(\hat\mu^*_b - 1)$ and obtain $q^*_{2.5} = \sqrt{n}(\hat\mu^*_{(25)} - 1)$ and $q^*_{97.5} = \sqrt{n}(\hat\mu^*_{(975)} - 1)$.

3. Reject $H_0$ when $\sqrt{n}(\hat\mu_n - 1) < q^*_{2.5}$ or $\sqrt{n}(\hat\mu_n - 1) > q^*_{97.5}$.

Why is it wrong?


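Note that
$$\begin{aligned}
P\left(\sqrt{n}(\hat\mu_n - 1) < q^*_{2.5}\right) &= P\left(\sqrt{n}(\hat\mu_n - 1) < \sqrt{n}(\hat\mu^*_{(25)} - 1)\right) \\
&= P\left(0 < \sqrt{n}(\hat\mu^*_{(25)} - \hat\mu_n)\right) \to 0.
\end{aligned}$$

Similarly,
$$\begin{aligned}
P\left(\sqrt{n}(\hat\mu_n - 1) > q^*_{97.5}\right) &= P\left(\sqrt{n}(\hat\mu_n - 1) > \sqrt{n}(\hat\mu^*_{(975)} - 1)\right) \\
&= P\left(0 > \sqrt{n}(\hat\mu^*_{(975)} - \hat\mu_n)\right) \to 0.
\end{aligned}$$

Both limits follow because the 2.5% and 97.5% quantiles of $\sqrt{n}(\hat\mu^*_b - \hat\mu_n)$ converge to $-1.96$ and $1.96$, respectively (here $\sigma = 1$).

Note that the previous two results hold no matter what the true parameter is.

Therefore, under the null as well as under the alternative, the rejection probability tends to zero: both the size and the power of such a test are zero asymptotically.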
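A small simulation makes the failure concrete. The sketch below implements the wrong procedure exactly as listed (bootstrap statistics centred at the hypothesized value 1) and applies it to one sample drawn under the null and one drawn far from it; the function name, sample sizes and seeds are assumptions for illustration.

```python
import math
import random
import statistics


def wrong_bootstrap_test(sample, mu0=1.0, B=1000, seed=0):
    """The WRONG procedure: bootstrap statistics are centred at mu0."""
    rng = random.Random(seed)
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    stats = sorted(
        math.sqrt(n) * (statistics.fmean(rng.choices(sample, k=n)) - mu0)
        for _ in range(B)
    )
    q_lo = stats[(25 * B) // 1000 - 1]     # 2.5% bootstrap quantile
    q_hi = stats[(975 * B) // 1000 - 1]    # 97.5% bootstrap quantile
    t = math.sqrt(n) * (mu_hat - mu0)
    return t < q_lo or t > q_hi            # True means "reject H0"


data_rng = random.Random(1)                # illustrative simulated data only
null_data = [data_rng.gauss(1.0, 1.0) for _ in range(100)]   # H0 true
alt_data = [data_rng.gauss(3.0, 1.0) for _ in range(100)]    # H0 badly false
print(wrong_bootstrap_test(null_data), wrong_bootstrap_test(alt_data))
```

Rejection would require the sample mean to fall outside the central 95% of its own bootstrap distribution, which essentially never happens, whatever the true $\mu$ is.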


A right way to do it!

The following procedure is correct!

1. Generate bootstrap samples $\{W^*_{1,b}, \ldots, W^*_{n,b}\}$ for $b = 1, \ldots, B$, say $B = 1000$.

2. Calculate $\sqrt{n}(\hat\mu^*_b - \hat\mu_n)$ and obtain $q^*_{2.5} = \sqrt{n}(\hat\mu^*_{(25)} - \hat\mu_n)$ and $q^*_{97.5} = \sqrt{n}(\hat\mu^*_{(975)} - \hat\mu_n)$.

3. Reject $H_0$ when $\sqrt{n}(\hat\mu_n - 1) < q^*_{2.5}$ or $\sqrt{n}(\hat\mu_n - 1) > q^*_{97.5}$.

Why is this a valid procedure?


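Under the null hypothesis $H_0: \mu = 1$,
$$\begin{aligned}
& P\left(\sqrt{n}(\hat\mu_n - 1) < q^*_{2.5} \ \text{or}\ \sqrt{n}(\hat\mu_n - 1) > q^*_{97.5}\right) \\
&= 1 - P\left(q^*_{2.5} < \sqrt{n}(\hat\mu_n - 1) < q^*_{97.5}\right) \\
&= 1 - P\left(q^*_{2.5} < \sqrt{n}(\hat\mu_n - \mu) < q^*_{97.5}\right) \approx 0.05.
\end{aligned}$$

Under the alternative $H_1: \mu \neq 1$, we have $\sqrt{n}(\hat\mu_n - 1) \to \pm\infty$ in probability, with the sign depending on whether $\mu > 1$ or $\mu < 1$. Also, $q^*_{2.5}$ and $q^*_{97.5}$ are bounded in probability. Therefore,
$$P\left(\sqrt{n}(\hat\mu_n - 1) < q^*_{2.5} \ \text{or}\ \sqrt{n}(\hat\mu_n - 1) > q^*_{97.5}\right) \to 1.$$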
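The correct procedure differs from the wrong one only in the centring of the bootstrap statistics. A minimal Python sketch, with a hypothetical function name and illustrative simulated data:

```python
import math
import random
import statistics


def bootstrap_test(sample, mu0=1.0, B=1000, seed=0):
    """Bootstrap test of H0: mu = mu0 with statistics centred at the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    stats = sorted(
        math.sqrt(n) * (statistics.fmean(rng.choices(sample, k=n)) - mu_hat)
        for _ in range(B)
    )
    q_lo = stats[(25 * B) // 1000 - 1]     # approximates the 2.5% null quantile
    q_hi = stats[(975 * B) // 1000 - 1]    # approximates the 97.5% null quantile
    t = math.sqrt(n) * (mu_hat - mu0)      # mu0 enters only through the test statistic
    return t < q_lo or t > q_hi            # True means "reject H0"


data_rng = random.Random(1)                # illustrative simulated data only
alt_data = [data_rng.gauss(3.0, 1.0) for _ in range(100)]    # true mean is 3, not 1
print(bootstrap_test(alt_data))
```

With a true mean of 3 and $n = 100$, the test statistic is around 20 while the bootstrap quantiles stay near $\pm 2$, so the null is rejected.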


Remarks

The key is to approximate the “null distribution” whether we are under the null or under the alternative.

In this case, the null distribution is that of $\sqrt{n}(\hat\mu_n - \mu)$, whatever the value of the true parameter is.

Therefore, we cannot just plug the value that we want to test into the bootstrap repetitions.


Other uses of Bootstrap

Standard Error

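We can use the bootstrap method to approximate the asymptotic standard error of an estimator, $\sigma$.

As we mentioned, when constructing CIs or conducting hypothesis tests, we need a consistent estimator for $\sigma$.

We can use the bootstrap to obtain a consistent estimator:
$$\hat\sigma^*_n = \sqrt{\frac{n}{B}\sum_{b=1}^{B}\left(\hat\theta^*_b - \bar\theta^*\right)^2},$$
where $\bar\theta^*$ is the sample average of the $\hat\theta^*_b$. The factor $n$ places the estimate on the scale of $\sqrt{n}(\hat\theta_n - \theta_0)$; the standard error of $\hat\theta_n$ itself is $\hat\sigma^*_n/\sqrt{n}$.

Then we can replace $\hat\sigma_n$ with $\hat\sigma^*_n$ in the previous cases.

Shi’s remark: To use the bootstrap for the standard error, the estimator under consideration must be asymptotically normal. Otherwise, the use of a standard error itself is misguided.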
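A hedged Python sketch of the bootstrap standard-error calculation follows. It takes the square root of $n$ times the bootstrap variance of the $\hat\theta^*_b$, so that the result estimates the $\sigma$ appearing in the limit of $\sqrt{n}(\hat\theta_n - \theta_0)$; this scaling, the function name `bootstrap_sigma` and the simulated data are assumptions made for the illustration.

```python
import math
import random
import statistics


def bootstrap_sigma(sample, estimator, B=1000, seed=0):
    """Bootstrap estimate of sigma, the asymptotic s.d. of sqrt(n)*(theta_hat - theta_0)."""
    rng = random.Random(seed)
    n = len(sample)
    thetas = [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    theta_bar = statistics.fmean(thetas)
    var_star = sum((t - theta_bar) ** 2 for t in thetas) / B   # variance of theta*_b
    return math.sqrt(n * var_star)


data_rng = random.Random(2)                # illustrative simulated data only
data = [data_rng.gauss(0.0, 2.0) for _ in range(400)]
sigma_star = bootstrap_sigma(data, statistics.fmean)
print(sigma_star, statistics.pstdev(data))
```

For the sample mean the two printed numbers should be close, since there $\sigma$ is simply the standard deviation of $W_i$; the bootstrap version is useful when no such simple formula is available.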


Other uses of Bootstrap

Bias Correction

We can use the bootstrap to correct the bias of an estimator.

The exact bias is $\mathrm{Bias}(\hat\theta_n, \theta_0) = E[\hat\theta_n] - \theta_0$ and is unknown.

The bootstrap estimator of the bias is
$$\mathrm{Bias}^*(\hat\theta_n, \theta_0) = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^*_b - \hat\theta_n.$$

Then the bootstrap bias-corrected estimator for $\theta_0$ is
$$\hat\theta_{BC,n} = \hat\theta_n - \mathrm{Bias}^*(\hat\theta_n, \theta_0) = 2\hat\theta_n - \frac{1}{B}\sum_{b=1}^{B}\hat\theta^*_b.$$

Shi’s remark: Bias correction usually increases the variance because the bias is estimated. (This causes a trade-off between bias and variance.) Therefore it should not be used indiscriminately.
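As an illustration of the bias formulas, the sketch below applies them to the variance estimator that divides by $n$, which is known to be biased downward. The choice of estimator, the function name, $B = 2000$ and the simulated data are assumptions for this example only.

```python
import random
import statistics


def bootstrap_bias_correct(sample, estimator, B=2000, seed=0):
    """Return (bias-corrected estimate, bootstrap bias estimate)."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = estimator(sample)
    boot_mean = statistics.fmean(
        [estimator(rng.choices(sample, k=n)) for _ in range(B)]
    )
    bias_star = boot_mean - theta_hat          # Bias* = mean of theta*_b minus theta_hat
    return theta_hat - bias_star, bias_star    # equals 2 * theta_hat - boot_mean


data_rng = random.Random(3)                    # illustrative simulated data only
data = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
corrected, bias_star = bootstrap_bias_correct(data, statistics.pvariance)
print(statistics.pvariance(data), bias_star, corrected)
```

The estimated bias comes out negative, so the correction moves the estimate upward, in the same direction as the usual $n/(n-1)$ adjustment.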


Higher-order improvements of the Bootstrap

This part is rather theoretical, so we will skip it. Please see Shi’s note for more discussion.


Bootstrap for Regression Models

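The regression model we consider is
$$Y_i = X_i'\beta + U_i, \quad i = 1, \ldots, n,$$
where $W_i = (Y_i, X_i')'$ is i.i.d. with distribution $F$.

Let $\hat\beta_n$ denote the OLS estimator for $\beta$:
$$\hat\beta_n = \left(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} X_i Y_i.$$

Under regularity conditions, $\sqrt{n}(\hat\beta_n - \beta) \xrightarrow{d} N(0, V)$, where
$$V = E[XX']^{-1}\, E[U^2 XX']\, E[XX']^{-1}.$$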


Bootstrap for Regression Models (Cont’d)

The nonparametric bootstrap works here.

Bootstrap samples are formed from the pairs $W_i = (Y_i, X_i')'$.

That is, a bootstrap sample $\{(Y^*_i, X^*_i)\}_{i=1}^{n}$ is a random sample with replacement from $\{(Y_i, X_i)\}_{i=1}^{n}$.

$\hat\beta^*_n$ is calculated in the same way as $\hat\beta_n$:
$$\hat\beta^*_n = \left(\frac{1}{n}\sum_{i=1}^{n} X^*_i X^{*\prime}_i\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} X^*_i Y^*_i.$$

Then results similar to those discussed before hold in this case under suitable conditions.
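A minimal sketch of the pairs bootstrap, under simplifying assumptions made here for brevity: a single scalar regressor and no intercept, so that the OLS estimator reduces to $\sum X_i Y_i / \sum X_i^2$. The function names and the heteroskedastic simulated data are hypothetical.

```python
import math
import random


def ols_slope(pairs):
    """OLS without intercept for a scalar regressor: sum(x*y) / sum(x*x)."""
    sxx = sum(x * x for _, x in pairs)
    sxy = sum(x * y for y, x in pairs)
    return sxy / sxx


def pairs_bootstrap_ci(pairs, B=1000, seed=0):
    """Resample (Y_i, X_i) pairs jointly and form the equal-tailed 95% CI."""
    rng = random.Random(seed)
    n = len(pairs)
    beta_hat = ols_slope(pairs)
    roots = sorted(
        math.sqrt(n) * (ols_slope(rng.choices(pairs, k=n)) - beta_hat)
        for _ in range(B)
    )
    q_lo = roots[(25 * B) // 1000 - 1]
    q_hi = roots[(975 * B) // 1000 - 1]
    return beta_hat, (beta_hat - q_hi / math.sqrt(n), beta_hat - q_lo / math.sqrt(n))


data_rng = random.Random(4)                # illustrative simulated data only
xs = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
pairs = [(2.0 * x + abs(x) * data_rng.gauss(0.0, 1.0), x) for x in xs]  # (Y_i, X_i)
beta_hat, (lo, hi) = pairs_bootstrap_ci(pairs)
print(beta_hat, lo, hi)
```

Because each $(Y_i, X_i)$ pair is kept together, the relation between the error variance and the regressor is preserved in every bootstrap sample.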


Wild Bootstrap for Regression Models

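In OLS, we have $Y_i = X_i'\hat\beta_n + e_i$, where the $e_i$ are the residuals.

Let the $U^b_i$ denote i.i.d. pseudo-random variables with mean 0 and variance 1.

Let the $b$-th bootstrap sample be
$$Y^*_{b,i} = X_i'\hat\beta_n + e_i \cdot U^b_i,$$
with the regressors kept at the original $X_i$.

Then $\hat\beta^*_b$ is
$$\hat\beta^*_b = \left(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} X_i Y^*_{b,i} = \hat\beta_n + \left(\frac{1}{n}\sum_{i=1}^{n} X_i X_i'\right)^{-1} \frac{1}{n}\sum_{i=1}^{n} X_i e_i \cdot U^b_i.$$

Then we can show that $\sqrt{n}(\hat\beta^*_b - \hat\beta_n)$ approximates $\sqrt{n}(\hat\beta_n - \beta)$ well.

This is a residual-based bootstrap, and it works only for OLS.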
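The wild bootstrap can be sketched in the same simplified setting (scalar regressor, no intercept). The multipliers are taken to be $\pm 1$ with equal probability (Rademacher draws), which is one common choice satisfying mean 0 and variance 1; that choice, the function name and the simulated data are assumptions made here.

```python
import random
import statistics


def wild_bootstrap_slopes(ys, xs, B=500, seed=0):
    """Return beta_hat and B wild-bootstrap slopes, keeping the regressors fixed."""
    rng = random.Random(seed)
    sxx = sum(x * x for x in xs)
    beta_hat = sum(x * y for x, y in zip(xs, ys)) / sxx
    resid = [y - x * beta_hat for x, y in zip(xs, ys)]     # OLS residuals e_i
    slopes = []
    for _ in range(B):
        u = [rng.choice((-1.0, 1.0)) for _ in xs]          # mean 0, variance 1
        y_star = [x * beta_hat + e * ui for x, e, ui in zip(xs, resid, u)]
        slopes.append(sum(x * y for x, y in zip(xs, y_star)) / sxx)
    return beta_hat, slopes


data_rng = random.Random(6)                # illustrative simulated data only
xs = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + abs(x) * data_rng.gauss(0.0, 1.0) for x in xs]
beta_hat, slopes = wild_bootstrap_slopes(ys, xs)
print(beta_hat, statistics.fmean(slopes), statistics.pstdev(slopes))
```

Each residual stays attached to its own $X_i$ and only its sign is randomized, so heteroskedasticity is carried over into the bootstrap samples.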


Bootstrap method for weakly dependent data

The bootstrap methods discussed above work only in the i.i.d. framework.

For weakly dependent data, the dependence among observations plays an important role in the asymptotics.

The nonparametric bootstrap above will not work, because resampling single observations breaks the dependence.

We need a method that can mimic the dependence structure.


Blockwise Bootstrap

Instead of resampling one observation at a time, we resample a block of consecutive observations together.

To be specific, let the block size be $k$ and the sample size be $T$. Then we have $T - k + 1$ blocks:
$$\begin{gathered}
(W_1, W_2, \ldots, W_k) \\
(W_2, W_3, \ldots, W_{k+1}) \\
\vdots \\
(W_{T-k+1}, \ldots, W_T).
\end{gathered}$$

To form a bootstrap sample:

1. Randomly select $m$ blocks (with replacement) such that $m \cdot k \ge T$ and $(m - 1) \cdot k < T$.

2. Lay them end-to-end in the order sampled.

3. Drop the last $m \cdot k - T$ observations from the last sampled block so that the size of the bootstrap sample equals $T$.
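The three steps translate almost directly into code. In this sketch the function name `block_bootstrap_sample` and the toy series are hypothetical, and the block size is assumed to satisfy $1 \le k \le T$.

```python
import random


def block_bootstrap_sample(series, k, rng):
    """Resample overlapping blocks of length k and trim the result to length T."""
    T = len(series)
    blocks = [series[s:s + k] for s in range(T - k + 1)]   # the T - k + 1 blocks
    m = -(-T // k)            # smallest m with m * k >= T, so (m - 1) * k < T
    out = []
    for _ in range(m):
        out.extend(rng.choice(blocks))                     # blocks drawn with replacement
    return out[:T]            # drop the last m * k - T observations


rng = random.Random(0)
series = list(range(10))      # toy series: consecutive values reveal the block structure
resampled = block_bootstrap_sample(series, k=3, rng=rng)
print(resampled)
```

With the toy series, each run of three consecutive integers in the output corresponds to one sampled block, and the single trailing value is what remains of the fourth block after trimming.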


Blockwise Bootstrap (Cont’d)

For this method to work asymptotically, we require the block size $k \to \infty$, but $k/T \to 0$ at a suitable rate.

Why would this method work?

Why $k \to \infty$, but $k/T \to 0$?


Conclusion!
