Lecture 13: Subsampling vs Bootstrap
Dimitris N. Politis, Joseph P. Romano, Michael Wolf
2011
Bootstrap
• R_n(X_n, θ(P)) = τ_n(θ̂_n − θ(P)).
• Examples:
  • θ̂_n = X̄_n, τ_n = √n, θ(P) = EX = µ(P).
  • θ̂_n = min_i X_i, τ_n = n, θ(P) = sup{x : F(x) ≤ 0}.
• Define J_n(P), the distribution of τ_n(θ̂_n − θ(P)) under P. For real-valued θ̂_n,

  J_n(x, P) ≡ Prob_P(τ_n(θ̂_n − θ(P)) ≤ x).

• Since P is unknown, θ(P) is unknown, and J_n(x, P) is also unknown.
• The bootstrap estimates J_n(x, P) by J_n(x, P̂_n), where P̂_n is a consistent estimate of P in some sense.
• For example, take P̂_n(x) = (1/n) Σ_{i=1}^n 1(X_i ≤ x); then

  sup_x |P̂_n(x) − P(x)| → 0 a.s.

• Similarly, estimate the (1 − α)th quantile of J_n(·, P): i.e., estimate J_n^{−1}(1 − α, P) by J_n^{−1}(1 − α, P̂_n).
• Usually J_n(x, P̂_n) cannot be calculated explicitly; use Monte Carlo:

  J_n(x, P̂_n) ≈ (1/B) Σ_{i=1}^B 1(τ_n(θ̂*_{n,i} − θ̂_n) ≤ x)

  for θ̂*_{n,i} = θ̂(X*_{1,i}, …, X*_{n,i}).
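This Monte Carlo step can be sketched for the simplest case, the sample mean (a minimal illustration, not from the slides; the data, seed, and B = 2000 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)   # observed sample (made up)
n = len(x)
theta_hat = x.mean()                           # theta_hat_n for the sample mean
tau_n = np.sqrt(n)

B = 2000
roots = np.empty(B)
for i in range(B):
    xs = rng.choice(x, size=n, replace=True)   # X*_{1,i}, ..., X*_{n,i} drawn from P_hat_n
    roots[i] = tau_n * (xs.mean() - theta_hat)

def J_hat(t):
    """Monte Carlo approximation of J_n(t, P_hat_n)."""
    return float(np.mean(roots <= t))
```

Here `roots[i]` plays the role of τ_n(θ̂*_{n,i} − θ̂_n), and `J_hat` approximates J_n(x, P̂_n).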
• When the bootstrap works, for each x,

  J_n(x, P̂_n) − J_n(x, P) →p 0 ⟹ J_n^{−1}(1 − α, P̂_n) − J_n^{−1}(1 − α, P) →p 0.

• When should the bootstrap "work"? Need local uniformity in weak convergence:
  • Usually J_n(x, P) → J(x, P).
  • Usually P̂_n → P a.s. in some sense, say sup_x |P̂_n(x) − P(x)| → 0 a.s.
  • Suppose that for each sequence P_n with P_n → P, say sup_x |P_n(x) − P(x)| → 0, it is also true that J_n(x, P_n) → J(x, P); then it must be true that, a.s., J_n(x, P̂_n) → J(x, P).
• So it ends up having to be shown that P_n → P implies J_n(x, P_n) → J(x, P); use a triangular-array formulation.
Case When Bootstrap Works
• Sample mean with finite variance.
• sup_x |F̂_n(x) − F(x)| → 0 a.s.
• θ(F̂_n) = (1/n) Σ_{i=1}^n X_i → θ(F) = E(X) a.s.
• σ²(F̂_n) = (1/n) Σ_{i=1}^n (X_i − X̄_n)² → σ²(F) = Var(X) a.s.
• Apply the Lindeberg–Feller CLT to the triangular array generated by any deterministic sequence P_n such that
  1) sup_x |P_n(x) − P(x)| → 0; 2) θ(P_n) → θ(P); 3) σ²(P_n) → σ²(P).
  It can be shown that √n(X̄_n − θ(P_n)) →d N(0, σ²) under P_n.
• Since P̂_n satisfies 1), 2), 3) a.s., it follows that J_n(x, P̂_n) → J(x, P) a.s.
• So "local uniformity" of weak convergence is satisfied here.
Cases When Bootstrap Fails
• Order statistics:
  F ∼ U(0, θ), and X_(1), …, X_(n) are the order statistics of the sample, so X_(n) is the maximum:

  P(n(θ − X_(n))/θ > x) = P(X_(n) < θ − θx/n)
                        = P(X_i < θ − θx/n)^n
                        = ((1/θ)(θ − θx/n))^n
                        = (1 − x/n)^n → e^{−x} as n → ∞.

• The bootstrap version:

  P(n(X_(n) − X*_(n))/X_(n) = 0) = 1 − (1 − 1/n)^n → 1 − e^{−1} ≈ 0.63 as n → ∞,

  so the bootstrap distribution keeps an atom of mass ≈ 0.63 at zero, while the true limit distribution is continuous: the bootstrap fails.
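The atom at zero can be seen by direct simulation (a sketch with arbitrary n, B, and seed): the bootstrap maximum X*_(n) equals X_(n) whenever the sample maximum is redrawn at least once, which happens with probability 1 − (1 − 1/n)^n.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
theta = 1.0
x = rng.uniform(0.0, theta, size=n)
x_max = x.max()

B = 4000
hits = 0
for _ in range(B):
    xs = rng.choice(x, size=n, replace=True)   # resample from P_hat_n
    if xs.max() == x_max:                      # bootstrap root is exactly zero
        hits += 1

p_hat = hits / B    # should be near 1 - (1 - 1/n)**n, i.e. roughly 1 - e^{-1}
```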
• Degenerate U-statistics:
  Take w(x, y) = xy, θ(F) = ∫∫ w(x, y) dF(x) dF(y) = µ(F)².

  θ̂_n = θ(F̂_n) = (1/(n(n−1))) ΣΣ_{i≠j} X_i X_j,   S(x) = ∫ x y dF(y) = x µ(F).

• If µ(F) ≠ 0, it is known that

  √n(θ̂_n − θ) →d N(0, 4 Var(S(X))) = N(0, 4(µ² EX² − µ⁴)).

  The bootstrap works.
• But if µ(F) = 0, then θ(F) = 0:

  θ(F̂_n) = (1/(n(n−1))) ΣΣ_{i≠j} X_i X_j = X̄_n² − (1/n)(1/(n−1)) Σ_i (X_i − X̄_n)² = X̄_n² − S_n²/n,

  so

  n(θ(F̂_n) − θ(F)) = n X̄_n² − S_n² →d N(0, σ²)² − σ²,

  i.e. σ²(χ²_1 − 1).
• However, the bootstrap version of n[θ(F̂*_n) − θ(F̂_n)] is

  n([X̄*_n² − S*_n²/n] − [X̄_n² − S_n²/n]) = n X̄*_n² − S*_n² − n X̄_n² + S_n²
    ≈ n(X̄*_n² − X̄_n²)
    = [√n(X̄*_n − X̄_n)]² + 2 √n(X̄*_n − X̄_n) √n X̄_n
    →d N(0, σ²)² + 2 N(0, σ²) · √n X̄_n.

  The term √n X̄_n = O_p(1) does not settle down: the bootstrap limit is random, so the bootstrap fails.
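The identity θ(F̂_n) = X̄_n² − S_n²/n used above can be checked numerically (arbitrary data and seed):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)

# Direct double sum over i != j: (sum_i X_i)^2 - sum_i X_i^2
s = x.sum()
theta_direct = (s * s - np.dot(x, x)) / (n * (n - 1))

# Closed form from the slide: Xbar^2 - S_n^2 / n
xbar = x.mean()
S2 = x.var(ddof=1)                  # S_n^2 = (1/(n-1)) sum (X_i - Xbar)^2
theta_closed = xbar ** 2 - S2 / n
```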
Subsampling
• iid case: let Y_i be the ith subset of size b from (X_1, …, X_n), i = 1, …, q, with q = (n choose b). Let θ̂_{n,b,i} = θ̂(Y_i) be the estimate calculated from the ith block of data.
• Use the empirical distribution of τ_b(θ̂_{n,b,i} − θ̂_n) over the q pseudo-estimates to approximate the distribution of τ_n(θ̂_n − θ):
  approximate J_n(x, P) = P(τ_n(θ̂_n − θ) ≤ x) by

  L_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(τ_b(θ̂_{n,b,i} − θ̂_n) ≤ x).

• Claim: if b → ∞, b/n → 0, and τ_b/τ_n → 0, then as long as τ_n(θ̂_n − θ) converges in distribution to something,

  J_n(x, P) − L_{n,b}(x) →p 0.
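A minimal sketch of the subsampling estimator L_{n,b} for the sample mean (illustrative data; since evaluating all (n choose b) subsamples is infeasible, a random subset of M subsamples is used, a standard stochastic approximation):

```python
import numpy as np

rng = np.random.default_rng(2)
n, b = 400, 40
x = rng.exponential(scale=2.0, size=n)   # illustrative data; tau_n = sqrt(n)
theta_hat = x.mean()

# Random subset of M of the (n choose b) size-b subsamples,
# each drawn WITHOUT replacement from the observed data.
M = 1000
roots = np.array([np.sqrt(b) * (x[rng.choice(n, size=b, replace=False)].mean()
                                - theta_hat)
                  for _ in range(M)])

def L_nb(t):
    """Subsampling estimate L_{n,b}(t) of J(t, P)."""
    return float(np.mean(roots <= t))
```

Note the contrast with the bootstrap sketch earlier: blocks are drawn without replacement and have size b, not n.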
Different Motivation for Subsampling vs. Bootstrap
Subsampling:
• Each subset of size b comes from the TRUE model. Since τ_n(θ̂_n − θ) →d J(x, P), as long as b → ∞,

  τ_b(θ̂_b − θ) →d J(x, P),

  so the distributions of τ_n(θ̂_n − θ) and τ_b(θ̂_b − θ) should be close.
• But τ_b(θ̂_b − θ) = τ_b(θ̂_b − θ̂_n) + τ_b(θ̂_n − θ), and since

  τ_b(θ̂_n − θ) = O_p(τ_b/τ_n) = o_p(1),

  the distributions of τ_b(θ̂_b − θ) and τ_b(θ̂_b − θ̂_n) should be close.
• The distribution of τ_b(θ̂_b − θ̂_n) is estimated by the empirical distribution over the q = (n choose b) pseudo-estimates.
Bootstrap:
• Recalculate the statistic from the ESTIMATED model P̂_n.
• Given that P̂_n is close to P, hopefully J_n(x, P̂_n) is close to J_n(x, P) (or to J(x, P), the limit distribution).
• But when the bootstrap fails,

  P̂_n → P does NOT imply J_n(x, P̂_n) → J(x, P).
Formal Proof of Consistency of Subsampling
• Assumptions: τ_n(θ̂_n − θ) →d J(x, P), b → ∞, b/n → 0, τ_b/τ_n → 0.
  Need to show: L_{n,b}(x) − J(x, P) →p 0.
• Since τ_b(θ̂_n − θ) →p 0, it is enough to show

  U_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(τ_b(θ̂_{n,b,i} − θ) ≤ x) →p J(x, P).

  U_{n,b}(x) is a bth-order U-statistic with a kernel bounded in (−1, 1).
• Since U_{n,b}(x) − J(x, P) = [U_{n,b}(x) − E U_{n,b}(x)] + [E U_{n,b}(x) − J(x, P)], it is enough to show

  U_{n,b}(x) − E U_{n,b}(x) →p 0 and E U_{n,b}(x) − J(x, P) → 0.
• But E U_{n,b}(x) = J_b(x, P), so

  E U_{n,b}(x) − J(x, P) = J_b(x, P) − J(x, P) → 0.

• Use Hoeffding's exponential-type inequality for U-statistics (Serfling (1980), Thm. A, p. 201):

  P(U_{n,b}(x) − J_b(x, P) ≥ ε) ≤ exp(−2(n/b)ε²/[1 − (−1)]) = exp(−(n/b)ε²) → 0 as n/b → ∞.

• So

  L_{n,b}(x) − J(x, P) = [L_{n,b}(x) − U_{n,b}(x)] + [U_{n,b}(x) − J_b(x, P)] + [J_b(x, P) − J(x, P)] →p 0.

  Q.E.D.
Time Series
• Respect the ordering of the data to preserve the correlation structure. Use the ordered blocks

  θ̂_{n,b,t} = θ̂_b(X_t, …, X_{t+b−1}), t = 1, …, q, q = n − b + 1,

  L_{n,b}(x) = (1/q) Σ_{t=1}^q 1(τ_b(θ̂_{n,b,t} − θ̂_n) ≤ x).

• Assumptions: τ_n(θ̂_n − θ) →d J(x, P), b → ∞, b/n → 0, τ_b/τ_n → 0, and the mixing coefficients α(m) → 0.
• Result: L_{n,b}(x) − J(x, P) →p 0.
• Most difficult part: to show τ_n(θ̂_n − θ) →d J(x, P) in the first place.
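The ordered-block version can be sketched as follows (an illustrative AR(1) series; the model, seed, and block size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, b = 500, 50
# Illustrative AR(1) series (a mixing process, so alpha(m) -> 0 holds)
eps = rng.normal(size=n)
x = np.empty(n)
x[0] = eps[0]
for s in range(1, n):
    x[s] = 0.5 * x[s - 1] + eps[s]

theta_hat = x.mean()
q = n - b + 1                        # all ordered, overlapping blocks
roots = np.array([np.sqrt(b) * (x[t:t + b].mean() - theta_hat)
                  for t in range(q)])

def L_nb(y):
    """Subsampling estimate of the distribution of sqrt(n)(theta_hat_n - theta)."""
    return float(np.mean(roots <= y))
```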
• One can treat iid data as a time series, or even use k = ⌊n/b⌋ non-overlapping blocks, but using all (n choose b) subsamples is more efficient.
• For example, if

  U_n(x) = k^{−1} Σ_{j=1}^k 1(τ_b[θ̂_{n,b,j} − θ(P)] ≤ x),

  then

  U_{n,b}(x) = E[U_n(x) | X̂_n] = E[1(τ_b[θ̂_{n,b,j} − θ(P)] ≤ x) | X̂_n]

  for X̂_n = (X_(1), …, X_(n)), the vector of order statistics.
• U_{n,b}(x) is better than U_n(x) (Rao–Blackwell), since X̂_n is a sufficient statistic for iid data.
• Hypothesis testing: T_n = τ_n t_n(X_1, …, X_n),

  G_n(x, P) = Prob_P(T_n ≤ x) → G(x, P) for P ∈ P_0.

  G_{n,b}(x) = q^{−1} Σ_{i=1}^q 1(T_{n,b,i} ≤ x) = q^{−1} Σ_{i=1}^q 1(τ_b t_{n,b,i} ≤ x).

• As long as b → ∞ and b/n → 0, then under P ∈ P_0:

  G_{n,b}(x) → G(x, P).

  If under P ∈ P_1 we have T_n → ∞, then for all x, G_{n,b}(x) → 0.
• Key difference from the confidence-interval case: we do not need τ_b/τ_n → 0, because θ_0 need not be estimated; it is assumed known under the null hypothesis.
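A sketch of a subsampling test of H0: µ = 0 based on T_n = √n |X̄_n| (illustrative data and tuning constants; note the subsample statistics are computed at the hypothesized θ_0 = 0, not at θ̂_n):

```python
import numpy as np

rng = np.random.default_rng(6)
n, b, alpha = 2000, 100, 0.05
x = rng.normal(loc=0.0, scale=1.0, size=n)   # generated under H0: mu = 0

# Full-sample statistic T_n = tau_n * t_n with t_n = |mean|, tau_n = sqrt(n)
T_n = np.sqrt(n) * abs(x.mean())

# Subsample statistics: same statistic on M random size-b subsamples;
# theta_0 = 0 is known under the null, so no recentering at theta_hat.
M = 2000
T_b = np.array([np.sqrt(b) * abs(x[rng.choice(n, size=b, replace=False)].mean())
                for _ in range(M)])

crit = np.quantile(T_b, 1 - alpha)   # subsampling critical value
reject = bool(T_n > crit)
```

With standard normal data, `crit` should land near the usual 1.96 cutoff.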
Estimating the unknown rate of convergence
• Assume τ_n = n^β for some unknown β > 0. Estimate β using the subsampling distribution at different block sizes.
• Key idea: compare the shapes of the empirical distributions of θ̂_{n,b,a} − θ̂_n for different values of b to infer the value of β.
• Let q = (n choose b) for iid data, or q = n − b + 1 for time series data:

  L_{n,b}(x | τ_b) ≡ q^{−1} Σ_{a=1}^q 1(τ_b(θ̂_{n,b,a} − θ̂_n) ≤ x),
  L_{n,b}(x | 1) ≡ q^{−1} Σ_{a=1}^q 1(θ̂_{n,b,a} − θ̂_n ≤ x).

• This implies, writing t for the common value,

  t ≡ L_{n,b}(x | τ_b) = L_{n,b}(τ_b^{−1} x | 1).
• Inverting at t: x = L_{n,b}^{−1}(t | τ_b) = τ_b(τ_b^{−1} x) = τ_b L_{n,b}^{−1}(t | 1).
• Since L_{n,b}(x | τ_b) →p J(x, P), if J(x, P) is continuous and increasing, it can be inferred that

  L_{n,b}^{−1}(t | τ_b) = J^{−1}(t, P) + o_p(1).

• Equivalently,

  τ_b L_{n,b}^{−1}(t | 1) = J^{−1}(t, P) + o_p(1).

• So

  b^β L_{n,b}^{−1}(t | 1) = J^{−1}(t, P) + o_p(1).

• Assuming J^{−1}(t, P) > 0, i.e. t > J(0, P), take logs.
• For two block sizes b_1 and b_2, this becomes

  β log b_1 + log(L_{n,b_1}^{−1}(t | 1)) = log J^{−1}(t, P) + o_p(1),
  β log b_2 + log(L_{n,b_2}^{−1}(t | 1)) = log J^{−1}(t, P) + o_p(1).

• Difference out the "fixed effect" log J^{−1}(t, P):

  β(log b_1 − log b_2) = log(L_{n,b_2}^{−1}(t | 1)) − log(L_{n,b_1}^{−1}(t | 1)) + o_p(1).

• So estimate β by

  β̂ = (log b_1 − log b_2)^{−1}[log(L_{n,b_2}^{−1}(t | 1)) − log(L_{n,b_1}^{−1}(t | 1))]
     = β + (log b_1 − log b_2)^{−1} × o_p(1).

• Take b_1 = n^{γ_1}, b_2 = n^{γ_2} (1 ≥ γ_1 > γ_2 > 0); then

  β̂ − β = ((γ_1 − γ_2) log n)^{−1} o_p(1) = o_p((log n)^{−1}).
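The rate estimator β̂ can be sketched for the sample mean, where the true rate is β = 1/2 (illustrative data; quantiles are approximated over random subsamples, and the choices of n, t, γ_1, γ_2 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
x = rng.normal(size=n)              # sample mean: true rate tau_n = n^(1/2)
theta_hat = x.mean()

def q_unscaled(b, t, M=1000):
    """t-quantile of the UNSCALED subsampling distribution L_{n,b}(.|1),
    approximated over M random size-b subsamples."""
    roots = np.array([x[rng.choice(n, size=b, replace=False)].mean() - theta_hat
                      for _ in range(M)])
    return np.quantile(roots, t)

t = 0.9                              # need t > J(0, P) so the quantile is positive
b1, b2 = int(n ** 0.7), int(n ** 0.4)   # b1 = n^gamma_1, b2 = n^gamma_2
beta_hat = (np.log(q_unscaled(b2, t)) - np.log(q_unscaled(b1, t))) \
           / (np.log(b1) - np.log(b2))
# beta_hat should come out near the true beta = 0.5
```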
• How to know that t > J(0, P)? Since

  L_{n,b}(0 | τ_b) = L_{n,b}(0 | 1) = J(0, P) + o_p(1),

  estimating J(0, P) is not a problem.
• Alternatively, take t_2 ∈ (0.5, 1) and t_1 ∈ (0, 0.5):

  b^β(L_{n,b}^{−1}(t_2 | 1) − L_{n,b}^{−1}(t_1 | 1)) = J^{−1}(t_2, P) − J^{−1}(t_1, P) + o_p(1),
  β log b + log(L_{n,b}^{−1}(t_2 | 1) − L_{n,b}^{−1}(t_1 | 1)) = log(J^{−1}(t_2, P) − J^{−1}(t_1, P)) + o_p(1).

• β̂ = (log b_1 − log b_2)^{−1}[log(L_{n,b_2}^{−1}(t_2 | 1) − L_{n,b_2}^{−1}(t_1 | 1)) − log(L_{n,b_1}^{−1}(t_2 | 1) − L_{n,b_1}^{−1}(t_1 | 1))].
• Take b_1 = n^{γ_1}, b_2 = n^{γ_2} (1 > γ_1 > γ_2 > 0); then β̂ − β = o_p((log n)^{−1}).
Two Step Subsampling
• With τ_n = n^β and the estimated rate τ̂_b = b^{β̂} plugged in,

  L_{n,b}(x | τ̂_b) = q^{−1} Σ_{a=1}^q 1(τ̂_b(θ̂_{n,b,a} − θ̂_n) ≤ x),

  it can be shown that

  sup_x |L_{n,b}(x | τ̂_b) − J(x, P)| →p 0.

• Problem: imprecise in small samples.
• In variance estimation, the best choice of b gives an O(n^{−1/3}) error rate.
• Parametric estimates, if the model is true, give an O(n^{−1/2}) error rate.
• Bootstrapping a pivotal statistic, when applicable, gives an error rate even better than O(n^{−1/2}).