Challenges in Fiducial Inference
Parts of this talk are joint work with T. C. M. Lee (UC Davis), Randy Lai (U of Maine), H. Iyer (NIST), J. Williams, Y. Cui (UNC)
BFF 2017
Jan Hannig*
University of North Carolina at Chapel Hill
* NSF support acknowledged
outline
Outline
Introduction
Definition
Sparsity
Regularization
Conclusions
introduction
Fiducial?
▶ Oxford English Dictionary
▶ adjective, technical: (of a point or line) used as a fixed basis of comparison.
▶ Origin: from Latin fiducia 'trust, confidence'
▶ Merriam-Webster dictionary
▶ 1. taken as a standard of reference: a fiducial mark
▶ 2. founded on faith or trust
▶ 3. having the nature of a trust: fiduciary
introduction
Aim of this talk
▶ Explain the definition of generalized fiducial distribution
▶ Challenge of extra information:
▶ Sparsity
▶ Regularization
▶ My point of view: frequentist
▶ Justified using asymptotic theorems and simulations.
▶ GFI tends to work well
definition
Comparison to likelihood
▶ Density is the function f(x, ξ), where ξ is fixed and x is variable.
▶ Likelihood is the function f(x, ξ), where ξ is variable and x is fixed.
▶ Likelihood as a distribution?
definition Formal Definition
General Definition
▶ Data generating equation X = G(U, ξ), e.g., Xi = μ + σUi
▶ A distribution on the parameter space is a Generalized Fiducial Distribution if it can be obtained as a limit (as ε ↓ 0) of

argmin_ξ ∥x − G(U⋆, ξ)∥ | {min_ξ ∥x − G(U⋆, ξ)∥ ≤ ε}   (1)

▶ Similar to ABC; generating from the prior is replaced by minimization.
▶ Is this practical? Can we compute?
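The limit in (1) can be approximated directly for a toy location model X_i = μ + U_i with standard normal U_i: repeatedly draw a fresh U⋆, take the minimizing μ, and keep it only when the minimum distance is below ε. A minimal sketch (the data values and tolerance are made up; for this model the GFD is N(x̄, 1/n)):

```python
import random
import statistics

random.seed(1)
x = [2.3, 1.7, 2.9]        # toy observed data
n, eps = len(x), 0.5       # smaller eps: better approximation, lower acceptance rate
samples = []
for _ in range(400_000):                               # iteration cap
    u = [random.gauss(0.0, 1.0) for _ in range(n)]     # fresh copy U* of the noise
    d = [xi - ui for xi, ui in zip(x, u)]              # x - U*
    mu_hat = statistics.mean(d)                        # argmin over mu of ||x - (mu + U*)||_2
    resid = sum((di - mu_hat) ** 2 for di in d) ** 0.5
    if resid <= eps:                                   # keep U* that nearly solve x = G(U*, mu)
        samples.append(mu_hat)
    if len(samples) >= 2000:
        break

print(len(samples), round(statistics.mean(samples), 3))
```

The accept step mirrors the conditioning event in (1); replacing "generate ξ from the prior" by the minimization over ξ is exactly the contrast with ABC drawn above.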
Explicit limit (1)
▶ Assume X ∈ Rⁿ is continuous; parameter ξ ∈ Rᵖ
▶ The limit in (1) has density (H, Iyer, Lai & Lee, 2016)

r(ξ|x) = f_X(x|ξ) J(x, ξ) / ∫_Ξ f_X(x|ξ′) J(x, ξ′) dξ′,

where J(x, ξ) = D( (d/dξ) G(u, ξ) |_{u = G⁻¹(x, ξ)} )
▶ n = p gives D(A) = |det A|
▶ ∥·∥₂ gives D(A) = (det AᵀA)^{1/2}; compare to Fraser, Reid, Marras & Yi (2010)
▶ ∥·∥∞ gives D(A) = Σ_{i=(i₁,…,iₚ)} |det (A)ᵢ| (sum over p-row submatrices of A)
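The three choices of D(·) can be compared numerically on a small matrix (the 3×2 matrix below is an arbitrary made-up example; by Cauchy-Binet, det(AᵀA) equals the sum of squared subdeterminants, so the ∥·∥₂ value never exceeds the ∥·∥∞ value, and all choices coincide when n = p):

```python
from itertools import combinations
from math import sqrt

# A stands in for (d/dxi) G evaluated at u = G^{-1}(x, xi); here a toy 3x2 example
A = [[1.0, 0.5],
     [1.0, 1.5],
     [1.0, 2.5]]
n, p = len(A), len(A[0])

def det2(M):  # determinant of a 2x2 matrix
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# ||.||_2 norm: D(A) = det(A^T A)^{1/2}
AtA = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
D2 = sqrt(det2(AtA))

# ||.||_inf norm: D(A) = sum over p-row submatrices (A)_i of |det (A)_i|
Dinf = sum(abs(det2([A[r] for r in rows])) for rows in combinations(range(n), p))

print(D2, Dinf)
```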
Example -- Linear Regression
▶ Data generating equation Y = Xβ + σZ
▶ (d/dθ)Y = (X, Z) and Z = (Y − Xβ)/σ.
▶ The L₂ Jacobian is

J(y, β, σ) = ( det( (X, (y − Xβ)/σ)ᵀ (X, (y − Xβ)/σ) ) )^{1/2} = σ⁻¹ |det(XᵀX)|^{1/2} (RSS)^{1/2}

▶ The fiducial distribution happens to be the same as the independence Jeffreys posterior, with an explicit normalizing constant.
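The displayed identity can be checked numerically. A sketch with an arbitrary design and evaluation point (β, σ and the seed are made up; det(MᵀM) is computed as a Gram determinant via Gram-Schmidt, and RSS is obtained from det([X y]ᵀ[X y]) = det(XᵀX)·RSS):

```python
import math
import random

random.seed(0)
n, p = 8, 2
X = [[1.0, random.gauss(0, 1)] for _ in range(n)]       # design: intercept + one covariate
beta, sigma = [0.7, -1.2], 1.3                          # arbitrary evaluation point
y = [sum(Xi[j] * beta[j] for j in range(p)) + sigma * random.gauss(0, 1) for Xi in X]

def gram_det(cols):
    """det(M^T M) for the matrix M with the given columns (Gram-Schmidt)."""
    d, basis = 1.0, []
    for c in cols:
        c = list(c)
        for b in basis:
            coef = sum(ci * bi for ci, bi in zip(c, b)) / sum(bi * bi for bi in b)
            c = [ci - coef * bi for ci, bi in zip(c, b)]
        d *= sum(ci * ci for ci in c)
        basis.append(c)
    return d

cols_X = [[Xi[j] for Xi in X] for j in range(p)]
z = [(yi - sum(Xi[j] * beta[j] for j in range(p))) / sigma for yi, Xi in zip(y, X)]

J_direct = math.sqrt(gram_det(cols_X + [z]))            # det((X, Z)^T (X, Z))^{1/2}

GX = gram_det(cols_X)                                   # det(X^T X)
RSS = gram_det(cols_X + [y]) / GX                       # y^T (I - H) y
J_closed = math.sqrt(GX) * math.sqrt(RSS) / sigma       # sigma^{-1} |det X^T X|^{1/2} RSS^{1/2}

print(J_direct, J_closed)
```

The agreement is exact up to floating point because (I − H)Xβ = 0, so the Z-column's residual after projecting out X is RSS/σ².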
Example -- Uniform(θ, θ²)
▶ Xi i.i.d. U(θ, θ²), θ > 1
▶ Data generating equation Xi = θ + (θ² − θ)Ui, Ui ∼ U(0, 1).
▶ Compute the Jacobian: (d/dθ)[θ + (θ² − θ)Ui] = 1 + (2θ − 1)Ui, with Ui = (Xi − θ)/(θ² − θ).
▶ Using ∥·∥∞ we have J(x, θ) = n (x̄(2θ − 1) − θ²) / (θ² − θ).
▶ Reference prior π(θ) = e^{ψ(2θ/(2θ−1))} (2θ − 1) / (θ(θ − 1)), Berger, Bernardo & Sun (2009); complicated to derive.
▶ In simulations, fiducial was marginally better than the reference prior, which was much better than the flat prior.
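The uniform example's GFD can be evaluated numerically: on its support θ ∈ (√(max xi), min xi) the unnormalized density is f(x|θ)J(x, θ) = (θ² − θ)⁻ⁿ · n(x̄(2θ − 1) − θ²)/(θ² − θ). A sketch with made-up data nominally from U(2, 4):

```python
import math

x = [2.2, 2.6, 3.1, 3.9, 2.9]           # toy sample, nominally from U(theta, theta^2)
n, xbar = len(x), sum(x) / len(x)
lo, hi = math.sqrt(max(x)), min(x)      # support: theta < min x and theta^2 > max x

def unnorm(t):
    """f(x|theta) * J(x,theta) = (t^2 - t)^(-n) * n * (xbar*(2t - 1) - t^2) / (t^2 - t)."""
    return (t * t - t) ** (-n) * n * (xbar * (2 * t - 1) - t * t) / (t * t - t)

m = 4000
grid = [lo + (hi - lo) * (i + 0.5) / m for i in range(m)]   # midpoint grid
w = [unnorm(t) for t in grid]
Z = sum(w) * (hi - lo) / m              # normalizing constant by the midpoint rule
dens = [wi / Z for wi in w]
print(lo, hi, sum(dens) * (hi - lo) / m)
```

From `dens` one can read off fiducial quantiles and interval estimates for θ directly, since the parameter is one-dimensional.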
Important Simple Observations
▶ GFD is always proper
▶ GFD is invariant to re-parametrizations (same as Jeffreys)
▶ GFD is not invariant to smooth transformations of the data if n > p
▶ GFD does not satisfy the likelihood principle.
definition Asymptotic results
Various Asymptotic Results
r(ξ|x) ∝ f_X(x|ξ) J(x, ξ), where J(x, ξ) = D( (d/dξ) G(u, ξ) |_{u = G⁻¹(x, ξ)} )

▶ Most results start with C_n⁻¹ J(x, ξ) → J(ξ₀, ξ)
▶ A Bernstein-von Mises theorem for fiducial distributions provides asymptotic correctness of fiducial CIs: H (2009, 2013), Sonderegger & H (2013).
▶ Consistency of model selection: H & Lee (2009), Lai, H & Lee (2015), H, Iyer, Lai & Lee (2016).
▶ Regular higher-order asymptotics in Pal Majumdar & H (2016+).
sparsity
Model Selection
▶ X = G(M, ξ_M, U), M ∈ M, ξ_M ∈ Ξ_M
Theorem (H, Iyer, Lai, Lee 2016): Under assumptions,

r(M|y) ∝ q^{|M|} ∫_{Ξ_M} f_M(y, ξ_M) J_M(y, ξ_M) dξ_M

▶ Need for a penalty: in the fiducial framework, additional equations 0 = P_k, k = 1, …, min(|M|, n)
▶ Default value q = n^{−1/2} (motivated by MDL)
Alternative to penalty
▶ A penalty is used to discourage models with many parameters
▶ The real issue is not too many parameters, but that a smaller model can do almost the same job

r(M|y) ∝ ∫_{Ξ_M} f_M(y, ξ_M) J_M(y, ξ_M) h_M(ξ_M) dξ_M,

h_M(ξ_M) = 0 if a smaller model predicts nearly as well, 1 otherwise

▶ Motivated by the non-local priors of Johnson & Rossell (2009)
Regression
▶ Y = Xβ + σZ
▶ First idea: h_M(β_M) = I{|βi| > ϵ, i ∈ M}; issue: collinearity
▶ Better:

h_M(β_M) := I{ (1/2)∥Xᵀ(X_M β_M − X b_min)∥₂² ≥ ε(n, |M|) }

where b_min solves

min_{b ∈ Rᵖ} (1/2)∥Xᵀ(X_M β_M − Xb)∥₂² subject to ∥b∥₀ ≤ |M| − 1.

▶ Algorithm: Bertsimas et al. (2016)
▶ Similar to the Dantzig selector, Candès & Tao (2007), with a different norm and target
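For small p the constrained minimization defining h_M can be brute-forced: write v = XᵀX_M β_M and G = XᵀX, enumerate supports S for b, and solve the induced least-squares problem against the columns of G. A sketch (the design, the model M = {0, 1, 2}, and its coefficients are made up; the mixed-integer approach of Bertsimas et al. would replace the enumeration at scale):

```python
import random
from itertools import combinations

random.seed(2)
n, p = 30, 4
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]

def xt(v):
    """X^T v."""
    return [sum(X[i][j] * v[i] for i in range(n)) for j in range(p)]

def solve(A, rhs):
    """Gauss-Jordan elimination for a small dense linear system."""
    k = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][j] - f * M[c][j] for j in range(k + 1)]
    return [M[i][k] / M[i][i] for i in range(k)]

G = [xt([X[i][j] for i in range(n)]) for j in range(p)]      # columns of X^T X

def subset_gap(model, beta_M, size):
    """min over ||b||_0 <= size of (1/2)||X^T (X_M beta_M - X b)||_2^2, by enumeration."""
    fit = [sum(X[i][j] * bj for j, bj in zip(model, beta_M)) for i in range(n)]  # X_M beta_M
    v = xt(fit)
    best = 0.5 * sum(vi * vi for vi in v)                    # b = 0 is always feasible
    for S in combinations(range(p), size):
        A = [[sum(G[a][r] * G[b][r] for r in range(p)) for b in S] for a in S]
        rhs = [sum(G[s][r] * v[r] for r in range(p)) for s in S]
        bS = solve(A, rhs)                                   # least squares on support S
        res = [v[r] - sum(bS[k] * G[S[k]][r] for k in range(len(S))) for r in range(p)]
        best = min(best, 0.5 * sum(ri * ri for ri in res))
    return best

theta = [1.5, -2.0, 0.8]
gap2 = subset_gap((0, 1, 2), theta, size=2)   # h_M compares this gap to eps(n, |M|)
gap3 = subset_gap((0, 1, 2), theta, size=3)   # exact support is available, so ~0
print(gap2, gap3)
```

A large `gap2` means no smaller model comes close, so h_M = 1 and the model keeps its mass.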
GFD
r(M|y) ∝ π^{|M|/2} Γ((n − |M|)/2) RSS_M^{−(n−|M|−1)/2} E[h^ε_M(β⋆_M)]

Observations:
▶ The expectation is with respect to the within-model GFD (usual T)
▶ r(M|y) is negligibly small for large models because of h; e.g., |M| > n implies r(M|y) = 0.
▶ Implemented using Grouped Independence Metropolis-Hastings (Andrieu & Roberts, 2009).
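The closed-form part of this expression is easy to evaluate for a pair of candidate models. A sketch on simulated data (the factor E[h^ε_M] is set to 1 purely for illustration, so this shows only the π^{|M|/2} Γ(·) RSS^(·) piece; log-gamma is used for numerical stability):

```python
import math
import random

random.seed(3)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * a + random.gauss(0, 1) for a in x1]     # x2 is a pure noise covariate

def rss(cols):
    """Residual sum of squares of y regressed on the given columns (Gram-Schmidt)."""
    r, basis = y[:], []
    for c in cols:
        c = list(c)
        for b in basis:
            coef = sum(ci * bi for ci, bi in zip(c, b)) / sum(bi * bi for bi in b)
            c = [ci - coef * bi for ci, bi in zip(c, b)]
        basis.append(c)
        coef = sum(ri * ci for ri, ci in zip(r, c)) / sum(ci * ci for ci in c)
        r = [ri - coef * ci for ri, ci in zip(r, c)]
    return sum(ri * ri for ri in r)

one = [1.0] * n
models = {"1+x1": [one, x1], "1+x1+x2": [one, x1, x2]}

logr = {}
for name, cols in models.items():
    m = len(cols)
    logr[name] = (m / 2) * math.log(math.pi) + math.lgamma((n - m) / 2) \
                 - ((n - m - 1) / 2) * math.log(rss(cols))

mx = max(logr.values())                     # log-sum-exp normalization over candidates
Z = sum(math.exp(v - mx) for v in logr.values())
probs = {k: math.exp(v - mx) / Z for k, v in logr.items()}
print(probs)
```

In the full method the E[h^ε_M] factor, estimated by Monte Carlo over the within-model GFD, is what zeroes out the over-large models.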
Main Result
Theorem (Williams & H, 2017+). Suppose the true model is given by M_T. Then under certain conditions, for a fixed positive constant α < 1,

r(M_T|y) = r(M_T|y) / [ Σ_{j=1}^{n^α} Σ_{M:|M|=j} r(M|y) ] → 1 in probability as n, p → ∞.
Some Conditions
▶ Number of predictors: liminf_{n,p→∞} n^{1−α} / log(p) > 2
▶ For the true model/parameter: p_T < log n^γ and

ε_{M_T}(n, p) ≤ (1/18) ∥Xᵀ(μ_T − X b_min)∥₂²,

where b_min minimizes the norm subject to ∥b∥₀ ≤ p_T − 1.
▶ For a large model |M| > p_T and large enough n or p,

(9/2) ∥Xᵀ(H_M − H_{M(−1)}) μ_T∥₂² < ε_M(n, p),

where H_M and H_{M(−1)} are the projection matrices for M and for M with a covariate removed, respectively.
Simulation
▶ Setup from Rockova & George (2015)
▶ n = 100, p = 1000, p_T = 8.
▶ Columns of X either (a) independent or (b) correlated with ρ = 0.6
▶ ε_M(n, p) = Λ_M σ̂²_M ( n^{0.51}/9 + 1.1 |M| log(pπ)/9 − log(n)^γ )₊ with γ = 1.45.
Highlight of simulation results
▶ See Jon Williams' poster for details on theory and simulation
▶ When X is independent: usually selects the correct model
▶ When X is correlated: usually selects too small a model
▶ The conditions of the Theorem are violated
▶ Based on the conditions, decreasing p to 500 satisfies them, and performance improves.
Comments
▶ Standardized way of measuring closeness in other models?
▶ What if a small model is not the right target, e.g., gene interactions?
regularization
Recall
▶ A distribution on the parameter space is a Generalized Fiducial Distribution if it can be obtained as a limit (as ε ↓ 0) of

argmin_ξ ∥x − G(U⋆, ξ)∥ | {min_ξ ∥x − G(U⋆, ξ)∥ ≤ ε}

▶ Conditioning U⋆ on {x = G(U⋆, ξ)}: "regularization by model"
Most general iid model
▶ Data generating equation:

Xi = F⁻¹(Ui), Ui i.i.d. Uniform(0, 1)

▶ Inverting (solving for F) we get

F*(xi⁻) ≤ Ui* ≤ F*(xi).

There is a solution iff the order of the Ui* matches the order of the xi.
▶ See Yifan Cui's poster for an extension to censored data.
Additional Constraints
▶ Location-scale family with known density f(x) and cdf F(x), e.g., N(μ, σ²).
▶ Condition Ui* on the existence of μ*, σ* so that

F(σ*⁻¹(xi − μ*)) = Ui* for all i

[Figure: lower and upper step-function bounds on F* at the xi, with the N(4.5, 3²) cdf passing between them.]

▶ The GFD is r(μ, σ) ∝ σ⁻¹ ∏ᵢ₌₁ⁿ σ⁻¹ f(σ⁻¹(xi − μ))
Constraint complications
Toy example: X = μ + Z, μ > 0.
▶ Option 1: condition Z⋆ | x − Z⋆ > 0
▶ r(μ) = φ(x − μ)/Φ(x) · I{μ>0}
▶ Lower confidence bounds do not have correct coverage.
▶ Option 2: projection to μ > 0
▶ r(μ) = (1 − Φ(x)) I{0} + φ(x − μ) I{μ>0}
▶ Correct coverage; possible to get {0} as a CI (a sure bet against)
▶ Option 3: mixture
▶ r(μ) = min(1/2, 1 − Φ(x)) I{0} + max(1/(2Φ(x)), 1) φ(x − μ) I{μ>0}
▶ Correct/conservative coverage; no {0} for reasonable-α CIs.
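Option 1's coverage failure is easy to check by Monte Carlo: the α-quantile of the truncated-normal GFD r(μ) = φ(x − μ)/Φ(x) on μ > 0 solves Φ(x − m) = (1 − α)Φ(x) in closed form. A sketch (μ_true = 0.2 is a made-up value near the boundary, where the shortfall is visible):

```python
import random
from statistics import NormalDist

nd = NormalDist()
random.seed(5)
mu_true, alpha, N = 0.2, 0.05, 20_000

miss = 0
for _ in range(N):
    xobs = mu_true + random.gauss(0, 1)
    # alpha-quantile of the Option 1 GFD: Phi(x - m) = (1 - alpha) * Phi(x),
    # so m = x - Phi^{-1}((1 - alpha) * Phi(x)); note m > 0 always.
    m = xobs - nd.inv_cdf((1 - alpha) * nd.cdf(xobs))   # nominal 95% lower bound
    if m > mu_true:
        miss += 1

coverage = 1 - miss / N
print(coverage)
```

Because the bound is always strictly positive, the coverage at μ near 0 falls well below the nominal 95%, matching the slide's observation; Options 2 and 3 put an atom at {0} exactly to repair this.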
Shape restrictions - preliminary results
▶ Example: positive i.i.d. data with a concave cdf (the MLE is the Grenander estimator)
▶ Condition U⋆ on a concave solution existing (Gibbs sampler)
▶ Project the unrestricted GFD to the space of concave functions (quadratic program)

[Figures: fiducial cdf samples under the Condition and Projection approaches.]
Comments
▶ When to use conditioning vs. projection?
▶ Connection to ancillarity and IM.
▶ Computational cost a consideration?
conclusions
Fiducial Future
▶ What is it that we provide?
▶ GFI: a general-purpose method that often works well
▶ Computational convenience and efficiency
▶ Fiducial options in software.
▶ Theoretical guarantees
▶ Applications
▶ The proof is in the pudding
List of successful applications
▶ General linear mixed models: E, H & Iyer (2008); Cisewski & H (2012)
▶ Confidence sets for wavelet regression: H & Lee (2009); and free-knot splines: Sonderegger & H (2014)
▶ Extreme value data (generalized Pareto), maximum mean, and model comparison: Wandler & H (2011, 2012ab)
▶ Uncertainty quantification for ultra-high-dimensional regression: Lai, H & Lee (2015), Wandler & H (2017+)
▶ Volatility estimation for high-frequency data: Katsoridas & H (2016+)
▶ Logistic regression with random effects (response models): Liu & H (2016, 2017)
I have a dream …
▶ One famous statistician said (I paraphrase): "I use Bayes because there is no need to prove an asymptotic theorem; it is correct."
▶ I have a dream that by the time I retire people will have similar trust in fiducial-inspired approaches.

Thank you!