Optimal importance sampling using
stochastic controlLorenz Richter, Carsten Hartmann
Freie Universität Berlin
The setting
Goal: Compute Eπ [f(X)], f : Rd → R. Model π as stationary distribution of astochastic process Xt ∈ Rd associated with path measure P and described by
dXt = ∇ log π(Xt)dt+√2dWt
and compute EP[f(XT )].
Objective: Deal with variance and convergence speed.
Idea: Consider the controlled process associated with path measure Qu
dXut =
(∇ log π(Xu
t ) +√2u(Xu
t , t))dt+
√2dWt
and do importance sampling in path space: EQu
[f(Xu
T ) dPdQu
].
Outlook: Exploit ergodicity to find a good control for infinite times:
limT→∞
1
T
∫ T
0f(Xt)dt = Eπ [f(X)].
Optimal change of measure
Donsker-Varadhan: Consider W (X0:T ) =∫ T0 g(Xt, t)dt+h(XT ).The duality
between sampling and an optimal change of measure
γ(x, t) := − logEP[exp(−W )|Xt = x] = infQu≪P
{EQu [W ] + KL(Qu∥P)}
brings a zero-variance estimator, i.e. VarQ∗(exp (−W ) dP
dQ∗
)= 0.
→ γ(x, t) is the value function of a control problem and fulfills the HJB equation
→ the control costs are J(u) = E[∫ T
0
(g(Xu
t , t) +12|u(Xu
t , t)|2)dt+ h(Xu
T )]
→ the optimal change of measure reads dQ∗ = e−W
E[e−W ]dP
→ the optimal control is u∗(x, t) = −√2∇xγ(x, t)
→ dPdQu is computed via Girsanov’s theorem
Computing the optimal change of measure
Gradient descentParametrize the control (i.e. the change of measure) in ansatz functions φi:
u(x, t) = −√2
m∑i=1
αi(t)φi(x).
Compute the minimization of the costs J(u) with a gradient descent in α, i.e.
αk+1 = αk − ηk∇αJ(u(αk))
Different estimators of the gradient scale differently with the time horizon T :– Gfinite differences ∝ T
– Gcentered likelihood ratio ∝ T 2
– Glikelihood ratio ∝ T 3
Cross-entropy methodIt holds J(u) = J(u∗) + KL(Qu∥Q∗), however we minimize KL(Q∗∥Qu)since this is feasible.Again parametrize u(α) in ansatz functions. We then need to minimize thecross-entropy functional for which we get a necessary optimality condition inform of the linear equation Sα = b, which we solve iteratively.
Approximate policy iterationSuccessive linearization of HJB equation: the cost functional J(u) solves a PDEof the form
A(u)J(u) = l(x, u),
where γ(x, t) = minu J(u;x, t). Start with an initial guess of the control policyand iterate
uk+1(x, t) = −√2∇xJ(uk;x, t),
yielding a fixed point iteration in the non-optimal control u.
Numerical simulations
• Sample E[exp(−αXT )] with π = N (0, 1) and u∗(x, t) = −√2αet−T ,
u determined by gradient decent
0 1 2 3 4 5t
0
5
10
15
eX t
0 1 2 3 4 5t
0
2
4
6
8
10
eX
u td d
x
2 1 0 1 2
t
01
23
45
u* (
x,t)
0.00.10.20.30.40.50.60.7
x
2 1 0 1 2
t
01
23
45
u(x,
t)
0.00.10.20.30.40.50.60.7
• normalized variances with α = 1, T = 5,E[exp(−αXT )] = 1.649:
∆t vanilla with u∗ with u10−2 4.64 1.70× 10−4 5.69× 10−2
10−3 4.02 1.69× 10−7 3.21× 10−2
10−4 4.55 1.73× 10−8 6.45× 10−2
• Sample E[XT ] with π = N (b, 1), i.e. consider h(x) = − log x, w.r.t.
dXs = (−Xs + b)ds+√2dWs, Xt = x
– then XT ∼ N (b+ (x− b)et−T , 1− e2(t−T ))
u∗(x, t) =
√2et−T
b+ (x− b)et−T
• Molecular dynamics: compute transition probabilities, i.e. first hitting times
– consider g = 1, h = 0, i.e. we samplehitting times E[e−τ ]
– optimal change of measure can be inter-preted as a tilting of the potential guidingthe dynamics
2.0 1.5 1.0 0.5 0.0 0.5 1.0 1.5 2.0x
0
2
4
6
8
10
12
14
G(x
)
13579111315optimal
References
[1] Hartmann, C., Richter, L., Schütte, C., & Zhang, W. (2017). Variational characterization of free energy: theory and algorithms. Entropy, 19(11), 626.[2] Richter, L. (2016). Efficient statistical estimation using stochastic control and optimization. Master’s thesis, Freie Universität Berlin.[3] Fleming, W.H., Soner, H.M. (2006). Controlled Markov processes and viscosity solutions. Springer, New York, NY, USA.[4] Dai Pra, P., Meneghini, L., Runggaldier, W.J. (1996) Connections between stochastic control and dynamic games. Math. Control Signals Syst, 9, 303-326.
Top Related