Lecture 13: Subsampling vs Bootstrap
Dimitris N. Politis, Joseph P. Romano, Michael Wolf
2011
Bootstrap
• R_n(X_n, θ(P)) = τ_n(θ̂_n − θ(P)).
• Examples:
  • θ̂_n = X̄_n, τ_n = √n, θ(P) = EX = µ(P).
  • θ̂_n = min_i X_i, τ_n = n, θ(P) = sup{x : F(x) ≤ 0}.
• Define J_n(P), the distribution of τ_n(θ̂_n − θ(P)) under P. For real-valued θ̂_n,

  J_n(x, P) ≡ Prob_P(τ_n(θ̂_n − θ(P)) ≤ x).

• Since P is unknown, θ(P) is unknown, and J_n(x, P) is also unknown.
• The bootstrap estimates J_n(x, P) by J_n(x, P̂_n), where P̂_n is a consistent estimate of P in some sense.
• For example, take P̂_n(x) = (1/n) Σ_{i=1}^n 1(X_i ≤ x); then

  sup_x |P̂_n(x) − P(x)| → 0 a.s.

• Similarly, estimate the (1 − α)th quantile of J_n(·, P): i.e., estimate J_n^{−1}(1 − α, P) by J_n^{−1}(1 − α, P̂_n).
• Usually J_n(x, P̂_n) cannot be calculated explicitly; use Monte Carlo:

  J_n(x, P̂_n) ≈ (1/B) Σ_{i=1}^B 1(τ_n(θ̂*_{n,i} − θ̂_n) ≤ x)

  for θ̂*_{n,i} = θ̂(X*_{1,i}, …, X*_{n,i}).
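This Monte Carlo step can be sketched for the simplest case, the sample mean (a minimal illustration, not from the slides; the data, seed, and B = 2000 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # observed sample (made up)
n = len(x)
theta_hat = x.mean()                           # theta_hat_n for the sample mean
tau_n = np.sqrt(n)

B = 2000
roots = np.empty(B)
for i in range(B):
    xs = rng.choice(x, size=n, replace=True)   # X*_{1,i}, ..., X*_{n,i} drawn from P_hat_n
    roots[i] = tau_n * (xs.mean() - theta_hat)

def J_hat(t):
    """Monte Carlo approximation of J_n(t, P_hat_n)."""
    return float(np.mean(roots <= t))
```

Here `roots[i]` plays the role of τ_n(θ̂*_{n,i} − θ̂_n), and `J_hat` approximates J_n(x, P̂_n).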
• When the bootstrap works, for each x,

  J_n(x, P̂_n) − J_n(x, P) →p 0 ⟹ J_n^{−1}(1 − α, P̂_n) − J_n^{−1}(1 − α, P) →p 0.

• When should the bootstrap "work"? Need local uniformity in weak convergence:
  • Usually J_n(x, P) → J(x, P).
  • Usually P̂_n → P a.s. in some sense, say sup_x |P̂_n(x) − P(x)| → 0 a.s.
  • Suppose that for each sequence P_n with P_n → P, say sup_x |P_n(x) − P(x)| → 0, it is also true that J_n(x, P_n) → J(x, P); then it must be true that, a.s., J_n(x, P̂_n) → J(x, P).
• So it ends up having to be shown that P_n → P implies J_n(x, P_n) → J(x, P); use a triangular-array formulation.
Case When Bootstrap Works
• Sample mean with finite variance.
• sup_x |F̂_n(x) − F(x)| → 0 a.s.
• θ(F̂_n) = (1/n) Σ_{i=1}^n X_i → θ(F) = E(X) a.s.
• σ²(F̂_n) = (1/n) Σ_{i=1}^n (X_i − X̄_n)² → σ²(F) = Var(X) a.s.
• Apply the Lindeberg–Feller CLT to the triangular array generated by any deterministic sequence P_n such that
  1) sup_x |P_n(x) − P(x)| → 0; 2) θ(P_n) → θ(P); 3) σ²(P_n) → σ²(P).
  It can be shown that √n(X̄_n − θ(P_n)) →d N(0, σ²) under P_n.
• Since P̂_n satisfies 1), 2), 3) a.s., it follows that J_n(x, P̂_n) → J(x, P) a.s.
• So "local uniformity" of weak convergence is satisfied here.
Cases When Bootstrap Fails
• Order statistics:
  F ∼ U(0, θ), and X_(1), …, X_(n) are the order statistics of the sample, so X_(n) is the maximum:

  P(n(θ − X_(n))/θ > x) = P(X_(n) < θ − θx/n)
                        = P(X_i < θ − θx/n)^n
                        = ((1/θ)(θ − θx/n))^n
                        = (1 − x/n)^n → e^{−x} as n → ∞.

• The bootstrap version:

  P(n(X_(n) − X*_(n))/X_(n) = 0) = 1 − (1 − 1/n)^n → 1 − e^{−1} ≈ 0.63 as n → ∞,

  so the bootstrap distribution keeps an atom of mass ≈ 0.63 at zero, while the true limit distribution is continuous: the bootstrap fails.
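The atom at zero can be seen by direct simulation (a sketch with arbitrary n, B, and seed): the bootstrap maximum X*_(n) equals X_(n) whenever the sample maximum is redrawn at least once, which happens with probability 1 − (1 − 1/n)^n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
theta = 1.0
x = rng.uniform(0.0, theta, size=n)
x_max = x.max()

B = 4000
hits = 0
for _ in range(B):
    xs = rng.choice(x, size=n, replace=True)   # resample from P_hat_n
    if xs.max() == x_max:                      # bootstrap root is exactly zero
        hits += 1

p_hat = hits / B    # should be near 1 - (1 - 1/n)**n, i.e. roughly 1 - e^{-1}
```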
• Degenerate U-statistics:
  Take w(x, y) = xy, θ(F) = ∫∫ w(x, y) dF(x) dF(y) = µ(F)².

  θ̂_n = θ(F̂_n) = (1/(n(n−1))) ΣΣ_{i≠j} X_i X_j,   S(x) = ∫ x y dF(y) = x µ(F).

• If µ(F) ≠ 0, it is known that

  √n(θ̂_n − θ) →d N(0, 4 Var(S(X))) = N(0, 4(µ² EX² − µ⁴)).

  The bootstrap works.
• But if µ(F) = 0, then θ(F) = 0:

  θ(F̂_n) = (1/(n(n−1))) ΣΣ_{i≠j} X_i X_j = X̄_n² − (1/n)(1/(n−1)) Σ_i (X_i − X̄_n)² = X̄_n² − S_n²/n,

  so

  n(θ(F̂_n) − θ(F)) = n X̄_n² − S_n² →d N(0, σ²)² − σ²,

  i.e. σ²(χ²_1 − 1).
• However, the bootstrap version of n[θ(F̂*_n) − θ(F̂_n)] is

  n([X̄*_n² − S*_n²/n] − [X̄_n² − S_n²/n]) = n X̄*_n² − S*_n² − n X̄_n² + S_n²
    ≈ n(X̄*_n² − X̄_n²)
    = [√n(X̄*_n − X̄_n)]² + 2 √n(X̄*_n − X̄_n) √n X̄_n
    →d N(0, σ²)² + 2 N(0, σ²) · √n X̄_n.

  The term √n X̄_n = O_p(1) does not settle down: the bootstrap limit is random, so the bootstrap fails.
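The identity θ(F̂_n) = X̄_n² − S_n²/n used above can be checked numerically (arbitrary data and seed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)

# Direct double sum over i != j: (sum_i X_i)^2 - sum_i X_i^2
s = x.sum()
theta_direct = (s * s - np.dot(x, x)) / (n * (n - 1))

# Closed form from the slide: Xbar^2 - S_n^2 / n
xbar = x.mean()
S2 = x.var(ddof=1)                  # S_n^2 = (1/(n-1)) sum (X_i - Xbar)^2
theta_closed = xbar ** 2 - S2 / n
```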
Subsampling
• iid case: let Y_i be the ith subset of size b from (X_1, …, X_n), i = 1, …, q, with q = (n choose b). Let θ̂_{n,b,i} = θ̂(Y_i) be the estimate calculated from the ith block of data.
• Use the empirical distribution of τ_b(θ̂_{n,b,i} − θ̂_n) over the q pseudo-estimates to approximate the distribution of τ_n(θ̂_n − θ):
  approximate J_n(x, P) = P(τ_n(θ̂_n − θ) ≤ x) by

  L_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(τ_b(θ̂_{n,b,i} − θ̂_n) ≤ x).

• Claim: if b → ∞, b/n → 0, and τ_b/τ_n → 0, then as long as τ_n(θ̂_n − θ) converges in distribution to something,

  J_n(x, P) − L_{n,b}(x) →p 0.
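A minimal sketch of the subsampling estimator L_{n,b} for the sample mean (illustrative data; since evaluating all (n choose b) subsamples is infeasible, a random subset of M subsamples is used, a standard stochastic approximation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 400, 40
x = rng.exponential(scale=2.0, size=n)   # illustrative data; tau_n = sqrt(n)
theta_hat = x.mean()

# Random subset of M of the (n choose b) size-b subsamples,
# each drawn WITHOUT replacement from the observed data.
M = 1000
roots = np.array([np.sqrt(b) * (x[rng.choice(n, size=b, replace=False)].mean()
                                - theta_hat)
                  for _ in range(M)])

def L_nb(t):
    """Subsampling estimate L_{n,b}(t) of J(t, P)."""
    return float(np.mean(roots <= t))
```

Note the contrast with the bootstrap sketch earlier: blocks are drawn without replacement and have size b, not n.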
Different Motivation for Subsampling vs. Bootstrap
Subsampling:
• Each subset of size b comes from the TRUE model. Since τ_n(θ̂_n − θ) →d J(x, P), as long as b → ∞,

  τ_b(θ̂_b − θ) →d J(x, P),

  so the distributions of τ_n(θ̂_n − θ) and τ_b(θ̂_b − θ) should be close.
• But τ_b(θ̂_b − θ) = τ_b(θ̂_b − θ̂_n) + τ_b(θ̂_n − θ), and since

  τ_b(θ̂_n − θ) = O_p(τ_b/τ_n) = o_p(1),

  the distributions of τ_b(θ̂_b − θ) and τ_b(θ̂_b − θ̂_n) should be close.
• The distribution of τ_b(θ̂_b − θ̂_n) is estimated by the empirical distribution over the q = (n choose b) pseudo-estimates.
Bootstrap:
• Recalculate the statistic from the ESTIMATED model P̂_n.
• Given that P̂_n is close to P, hopefully J_n(x, P̂_n) is close to J_n(x, P) (or to J(x, P), the limit distribution).
• But when the bootstrap fails,

  P̂_n → P does NOT imply J_n(x, P̂_n) → J(x, P).
Formal Proof of Consistency of Subsampling
• Assumptions: τ_n(θ̂_n − θ) →d J(x, P), b → ∞, b/n → 0, τ_b/τ_n → 0.
  Need to show: L_{n,b}(x) − J(x, P) →p 0.
• Since τ_b(θ̂_n − θ) →p 0, it is enough to show

  U_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(τ_b(θ̂_{n,b,i} − θ) ≤ x) →p J(x, P).

  U_{n,b}(x) is a bth-order U-statistic with a kernel bounded in (−1, 1).
• Since U_{n,b}(x) − J(x, P) = [U_{n,b}(x) − E U_{n,b}(x)] + [E U_{n,b}(x) − J(x, P)], it is enough to show

  U_{n,b}(x) − E U_{n,b}(x) →p 0 and E U_{n,b}(x) − J(x, P) → 0.
• But E U_{n,b}(x) = J_b(x, P), so

  E U_{n,b}(x) − J(x, P) = J_b(x, P) − J(x, P) → 0.

• Use Hoeffding's exponential-type inequality for U-statistics (Serfling (1980), Thm. A, p. 201):

  P(U_{n,b}(x) − J_b(x, P) ≥ ε) ≤ exp(−2(n/b)ε²/[1 − (−1)]) = exp(−(n/b)ε²) → 0 as n/b → ∞.

• So

  L_{n,b}(x) − J(x, P) = [L_{n,b}(x) − U_{n,b}(x)] + [U_{n,b}(x) − J_b(x, P)] + [J_b(x, P) − J(x, P)] →p 0.

  Q.E.D.
Time Series
• Respect the ordering of the data to preserve the correlation structure. Use the ordered blocks

  θ̂_{n,b,t} = θ̂_b(X_t, …, X_{t+b−1}), t = 1, …, q, q = n − b + 1,

  L_{n,b}(x) = (1/q) Σ_{t=1}^q 1(τ_b(θ̂_{n,b,t} − θ̂_n) ≤ x).

• Assumptions: τ_n(θ̂_n − θ) →d J(x, P), b → ∞, b/n → 0, τ_b/τ_n → 0, and the mixing coefficients α(m) → 0.
• Result: L_{n,b}(x) − J(x, P) →p 0.
• Most difficult part: to show τ_n(θ̂_n − θ) →d J(x, P) in the first place.
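The ordered-block version can be sketched as follows (an illustrative AR(1) series; the model, seed, and block size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, b = 500, 50
# Illustrative AR(1) series (a mixing process, so alpha(m) -> 0 holds)
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for s in range(1, n):
    x[s] = 0.5 * x[s - 1] + eps[s]

theta_hat = x.mean()
q = n - b + 1                        # all ordered, overlapping blocks
roots = np.array([np.sqrt(b) * (x[t:t + b].mean() - theta_hat)
                  for t in range(q)])

def L_nb(y):
    """Subsampling estimate of the distribution of sqrt(n)(theta_hat_n - theta)."""
    return float(np.mean(roots <= y))
```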
• One can treat iid data as a time series, or even use k = ⌊n/b⌋ non-overlapping blocks, but using all (n choose b) subsamples is more efficient.
• For example, if

  U_n(x) = k^{−1} Σ_{j=1}^k 1(τ_b[θ̂_{n,b,j} − θ(P)] ≤ x),

  then

  U_{n,b}(x) = E[U_n(x) | X̂_n] = E[1(τ_b[θ̂_{n,b,j} − θ(P)] ≤ x) | X̂_n]

  for X̂_n = (X_(1), …, X_(n)), the vector of order statistics.
• U_{n,b}(x) is better than U_n(x) (Rao–Blackwell), since X̂_n is a sufficient statistic for iid data.
• Hypothesis testing: T_n = τ_n t_n(X_1, …, X_n),

  G_n(x, P) = Prob_P(T_n ≤ x) → G(x, P) for P ∈ P_0.

  G_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(T_{n,b,i} ≤ x) = q^{−1} Σ_{i=1}^q 1(τ_b t_{n,b,i} ≤ x).

• As long as b → ∞ and b/n → 0, then under P ∈ P_0:

  G_{n,b}(x) → G(x, P).

  If under P ∈ P_1 we have T_n → ∞, then for all x, G_{n,b}(x) → 0.
• Key difference from the confidence-interval case: we do not need τ_b/τ_n → 0, because θ_0 need not be estimated; it is assumed known under the null hypothesis.
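A sketch of a subsampling test of H0: µ = 0 based on T_n = √n |X̄_n| (illustrative data and tuning constants; note the subsample statistics are computed at the hypothesized θ_0 = 0, not at θ̂_n):

```python
import numpy as np

rng = np.random.default_rng(6)
n, b, alpha = 2000, 100, 0.05
x = rng.normal(loc=0.0, scale=1.0, size=n)   # generated under H0: mu = 0

# Full-sample statistic T_n = tau_n * t_n with t_n = |mean|, tau_n = sqrt(n)
T_n = np.sqrt(n) * abs(x.mean())

# Subsample statistics: same statistic on M random size-b subsamples;
# theta_0 = 0 is known under the null, so no recentering at theta_hat.
M = 2000
T_b = np.array([np.sqrt(b) * abs(x[rng.choice(n, size=b, replace=False)].mean())
                for _ in range(M)])

crit = np.quantile(T_b, 1 - alpha)   # subsampling critical value
reject = bool(T_n > crit)
```

With standard normal data, `crit` should land near the usual 1.96 cutoff.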
Estimating the unknown rate of convergence
• Assume τ_n = n^β for some unknown β > 0. Estimate β using the subsampling distribution at different block sizes.
• Key idea: compare the shapes of the empirical distributions of θ̂_{n,b,a} − θ̂_n for different values of b to infer the value of β.
• Let q = (n choose b) for iid data, or q = n − b + 1 for time series data:

  L_{n,b}(x | τ_b) ≡ q^{−1} Σ_{a=1}^q 1(τ_b(θ̂_{n,b,a} − θ̂_n) ≤ x),
  L_{n,b}(x | 1) ≡ q^{−1} Σ_{a=1}^q 1(θ̂_{n,b,a} − θ̂_n ≤ x).

• This implies, writing t for the common value,

  t ≡ L_{n,b}(x | τ_b) = L_{n,b}(τ_b^{−1} x | 1).
• Inverting at t: x = L_{n,b}^{−1}(t | τ_b) = τ_b(τ_b^{−1} x) = τ_b L_{n,b}^{−1}(t | 1).
• Since L_{n,b}(x | τ_b) →p J(x, P), if J(x, P) is continuous and increasing, it can be inferred that

  L_{n,b}^{−1}(t | τ_b) = J^{−1}(t, P) + o_p(1).

• Equivalently,

  τ_b L_{n,b}^{−1}(t | 1) = J^{−1}(t, P) + o_p(1).

• So

  b^β L_{n,b}^{−1}(t | 1) = J^{−1}(t, P) + o_p(1).

• Assuming J^{−1}(t, P) > 0, i.e. t > J(0, P), take logs.
• For two block sizes b_1 and b_2, this becomes

  β log b_1 + log(L_{n,b_1}^{−1}(t | 1)) = log J^{−1}(t, P) + o_p(1),
  β log b_2 + log(L_{n,b_2}^{−1}(t | 1)) = log J^{−1}(t, P) + o_p(1).

• Difference out the "fixed effect" log J^{−1}(t, P):

  β(log b_1 − log b_2) = log(L_{n,b_2}^{−1}(t | 1)) − log(L_{n,b_1}^{−1}(t | 1)) + o_p(1).

• So estimate β by

  β̂ = (log b_1 − log b_2)^{−1}[log(L_{n,b_2}^{−1}(t | 1)) − log(L_{n,b_1}^{−1}(t | 1))]
     = β + (log b_1 − log b_2)^{−1} × o_p(1).

• Take b_1 = n^{γ_1}, b_2 = n^{γ_2} (1 ≥ γ_1 > γ_2 > 0); then

  β̂ − β = ((γ_1 − γ_2) log n)^{−1} o_p(1) = o_p((log n)^{−1}).
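The rate estimator β̂ can be sketched for the sample mean, where the true rate is β = 1/2 (illustrative data; quantiles are approximated over random subsamples, and the choices of n, t, γ_1, γ_2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)              # sample mean: true rate tau_n = n^(1/2)
theta_hat = x.mean()

def q_unscaled(b, t, M=1000):
    """t-quantile of the UNSCALED subsampling distribution L_{n,b}(.|1),
    approximated over M random size-b subsamples."""
    roots = np.array([x[rng.choice(n, size=b, replace=False)].mean() - theta_hat
                      for _ in range(M)])
    return np.quantile(roots, t)

t = 0.9                              # need t > J(0, P) so the quantile is positive
b1, b2 = int(n ** 0.7), int(n ** 0.4)   # b1 = n^gamma_1, b2 = n^gamma_2
beta_hat = (np.log(q_unscaled(b2, t)) - np.log(q_unscaled(b1, t))) \
           / (np.log(b1) - np.log(b2))
# beta_hat should come out near the true beta = 0.5
```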
• How to know that t > J(0, P)? Since

  L_{n,b}(0 | τ_b) = L_{n,b}(0 | 1) = J(0, P) + o_p(1),

  estimating J(0, P) is not a problem.
• Alternatively, take t_2 ∈ (0.5, 1) and t_1 ∈ (0, 0.5):

  b^β(L_{n,b}^{−1}(t_2 | 1) − L_{n,b}^{−1}(t_1 | 1)) = J^{−1}(t_2, P) − J^{−1}(t_1, P) + o_p(1),
  β log b + log(L_{n,b}^{−1}(t_2 | 1) − L_{n,b}^{−1}(t_1 | 1)) = log(J^{−1}(t_2, P) − J^{−1}(t_1, P)) + o_p(1).

• β̂ = (log b_1 − log b_2)^{−1}[log(L_{n,b_2}^{−1}(t_2 | 1) − L_{n,b_2}^{−1}(t_1 | 1)) − log(L_{n,b_1}^{−1}(t_2 | 1) − L_{n,b_1}^{−1}(t_1 | 1))].
• Take b_1 = n^{γ_1}, b_2 = n^{γ_2} (1 > γ_1 > γ_2 > 0); then β̂ − β = o_p((log n)^{−1}).
Two Step Subsampling
• With τ_n = n^β and the estimated rate τ̂_b = b^{β̂} plugged in,

  L_{n,b}(x | τ̂_b) = q^{−1} Σ_{a=1}^q 1(τ̂_b(θ̂_{n,b,a} − θ̂_n) ≤ x),

  it can be shown that

  sup_x |L_{n,b}(x | τ̂_b) − J(x, P)| →p 0.

• Problem: imprecise in small samples.
• In variance estimation, the best choice of b gives an O(n^{−1/3}) error rate.
• Parametric estimates, if the model is true, give an O(n^{−1/2}) error rate.
• Bootstrapping a pivotal statistic, when applicable, gives an error rate even better than O(n^{−1/2}).