
Numerical method for HJB equations. Optimal control problems and differential games

(lecture 3/3)

Maurizio Falcone (La Sapienza) & Hasnaa Zidani (ENSTA)

ANOC, 23–27 April 2012

M. Falcone & H. Zidani () HJB approach for optimal control problems ANOC, 23–27 April 2012 1 / 50

Outline

1 Introduction

2 Planning motion, reachability analysis

3 Hamilton-Jacobi approach: level set method

4 Differential games under state-constraints


Consider the controlled system:

$$\dot y_x(s) = f(y_x(s), \alpha(s)), \quad s \in (0,+\infty), \qquad y_x(0) = x, \tag{1}$$

$$\alpha(s) \in A \quad \text{a.e. } s \in (0,+\infty),$$

where $A$ is a convex compact set in $\mathbb{R}^m$ ($m \ge 1$).

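Admissible trajectories:

$$S_{[0,\tau]}(x) := \{\, y_x \text{ satisfying (1) on } (0,\tau),\ y_x(0) = x \,\}.$$

Under classical assumptions, the set-valued map $x \rightsquigarrow S_{[0,\tau]}(x)$ is Lipschitz continuous:

$$\exists L > 0, \quad S_{[0,\tau]}(x) \subset S_{[0,\tau]}(z) + L|x - z|\,\mathbb{B}_{W^{1,1}} \qquad \forall x, z \in \mathbb{R}^d,$$

where $\mathbb{B}_{W^{1,1}}$ is the unit ball of $W^{1,1}$.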


Now, consider the following control problems:

• Mayer's problem:

$$V(x,t) = \inf_{y_x \in S_{[0,t]}(x)} \Phi(y_x(t))$$

• Minimum time problem ($C$ a closed set in $\mathbb{R}^d$):

$$T(x) = \inf\{\, t \,;\ y_x(t) \in C,\ y_x \in S_{[0,t]}(x) \,\}$$

• Supremum cost:

$$V_\infty(x,t) = \inf_{y_x \in S_{[0,t]}(x)} \Big\{ \Phi(y_x(t)) \vee \sup_{\theta \in [0,t]} g(y_x(\theta)) \Big\}$$
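For a single, fixed control, the costs above can be evaluated along a discretized trajectory of (1). The sketch below is an illustration under stated assumptions, not part of the lecture: it uses an explicit Euler scheme with a piecewise-constant scalar control (m = 1), and the helper names `euler_trajectory` and `evaluate_costs` are hypothetical. The value functions are infima over all admissible trajectories, so one such evaluation is only one candidate in the infimum.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using State = std::vector<double>;
// f(y, a) with a scalar control a (m = 1 is an assumption of this sketch)
using Dynamics = std::function<State(const State&, double)>;
using Scalar = std::function<double(const State&)>;

// Explicit Euler discretization of (1) on [0, t] for a piecewise-constant
// control alpha[0..N-1]; returns the N+1 discrete states y_0, ..., y_N.
std::vector<State> euler_trajectory(const Dynamics& f, State y,
                                    const std::vector<double>& alpha, double t) {
  assert(!alpha.empty() && t > 0.0);
  const double h = t / static_cast<double>(alpha.size());
  std::vector<State> traj(1, y);
  for (double a : alpha) {
    const State fy = f(y, a);
    for (std::size_t k = 0; k < y.size(); ++k) y[k] += h * fy[k];
    traj.push_back(y);
  }
  return traj;
}

struct Costs { double mayer, supremum; };

// Mayer cost Phi(y(t)) and supremum cost Phi(y(t)) v sup_theta g(y(theta)),
// the sup being replaced by a max over the discrete times.
Costs evaluate_costs(const std::vector<State>& traj, const Scalar& Phi, const Scalar& g) {
  assert(!traj.empty());
  const double mayer = Phi(traj.back());
  double sup_g = g(traj.front());
  for (const State& y : traj) sup_g = std::max(sup_g, g(y));
  return {mayer, std::max(mayer, sup_g)};
}
```

For instance, with y' = a, a ≡ 1, x = 0, t = 1, Φ(y) = y − 2 and g(y) = y, the Mayer cost is close to −1 and the supremum cost close to 1.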


Assume that $g : \mathbb{R}^d \to \mathbb{R}$ is Lipschitz continuous (+ classical assumptions on $f$).

If $\Phi : \mathbb{R}^d \to \mathbb{R}$ is lsc (resp. Lipschitz), then $V$ and $V_\infty$ are lsc (resp. Lipschitz).

When the target $C$ is closed, the minimum time function $T$ is lsc.


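Mayer problem:

$$V(x,t) = \min_{y_x \in S_{[0,h]}(x)} V(y_x(h), t-h), \quad h \in (0,t), \qquad V(x,0) = \Phi(x)$$

Minimum time problem:

$$T(x) = \min_{y_x \in S_{[0,h]}(x)} \big\{ T(y_x(h)) + h \big\}, \quad h < T(x),\ x \notin C, \qquad T(x) = 0,\ x \in C;$$

Supremum cost:

$$V_\infty(x,t) = \min_{y_x \in S_{[0,h]}(x)} \Big\{ V_\infty(y_x(h), t-h) \vee \sup_{\theta \in [0,h]} g(y_x(\theta)) \Big\}, \qquad V_\infty(x,0) = \Phi(x) \vee g(x);$$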


$$V(x,t) = \min_{y_x \in S_{[0,t]}(x)} \Phi(y_x(t))$$

$$V(x,t) = \min_{y_x \in S_{[0,h]}(x)} V(y_x(h), t-h), \quad h \in (0,t), \qquad V(x,0) = \Phi(x)$$

• Suboptimality:

$$\forall y_x \in S_{[0,t]}(x), \quad s \longmapsto V(y_x(s), t-s) \text{ is nondecreasing,}$$

• Superoptimality:

$$\exists y_x^* \in S_{[0,t]}(x), \quad s \longmapsto V(y_x^*(s), t-s) \text{ is constant.}$$
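The DPP with a small step h = Δt can be turned directly into a fully discrete scheme. The following is a minimal 1D semi-Lagrangian sketch for the Mayer problem, assuming a uniform grid, linear interpolation I[·], a finite sample A_h of the control set, and clamping of the foot points to the grid (a crude boundary choice); the names `Grid1D`, `interp` and `sl_step` are hypothetical and not taken from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Uniform 1D grid x_i = xmin + i*dx, i = 0..n-1 (hypothetical helper type).
struct Grid1D {
  double xmin, dx;
  int n;
  double x(int i) const { return xmin + i * dx; }
};

// Linear interpolation I[v](z) of nodal values; z is clamped to the grid,
// a crude boundary treatment chosen only for this sketch.
double interp(const Grid1D& G, const std::vector<double>& v, double z) {
  assert(G.n >= 2 && static_cast<int>(v.size()) == G.n);
  double s = std::max(0.0, std::min((z - G.xmin) / G.dx, double(G.n - 1)));
  int i = std::min(static_cast<int>(std::floor(s)), G.n - 2);
  double w = s - i;
  return (1.0 - w) * v[i] + w * v[i + 1];
}

// One semi-Lagrangian step of the DPP for the Mayer problem, y' = f(y, a):
//   V^{n+1}(x_i) = min_{a in A_h} I[V^n](x_i + dt * f(x_i, a)).
template <class F>
std::vector<double> sl_step(const Grid1D& G, const std::vector<double>& v, double dt,
                            const std::vector<double>& Ah, F f) {
  std::vector<double> out(G.n, std::numeric_limits<double>::infinity());
  for (int i = 0; i < G.n; ++i)
    for (double a : Ah)
      out[i] = std::min(out[i], interp(G, v, G.x(i) + dt * f(G.x(i), a)));
  return out;
}
```

With y' = a, A = [−1, 1] and Φ(x) = |x| − 0.5, the exact solution is V(x,t) = max(|x| − t, 0) − 0.5; on a grid with Δt = Δx the foot points fall on nodes and the scheme reproduces it. Since the step is a minimum of interpolated values, Δt is not tied to Δx by a stability constraint.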


Next, we derive the Hamilton-Jacobi-Bellman (HJB) equation, which is an infinitesimal version of the DPP.

• Time-dependent HJB equation:

$$\partial_t V(x,t) + H(x, D_x V(x,t)) = 0, \quad x \in \mathbb{R}^d,\ t > 0; \qquad V(x,0) = \Phi(x)$$

• Steady HJB equation:

$$H(x, DT(x)) = 1, \quad x \notin C,\ T(x) < +\infty; \qquad T(x) = 0 \text{ on } C$$

• HJB variational inequality:

$$\min\big(\partial_t V_\infty(x,t) + H(x, DV_\infty(x,t)),\ V_\infty(x,t) - g(x)\big) = 0, \quad x \in \mathbb{R}^d,\ t > 0; \qquad V_\infty(x,0) = \Phi(x) \vee g(x)$$

where $H(x,q) := \max_{a \in A}(-f(x,a) \cdot q)$.
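Numerically, the max in H(x,q) is often taken over a finite sample A_h of the control set. A minimal sketch, assuming the caller provides f and the sample A_h (the function name `hamiltonian` is hypothetical); the exact value is recovered only if A_h contains a maximizer:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Approximate H(x, q) = max_{a in A} ( -f(x, a) . q ) by a max over a finite
// sample Ah of the control set A.
template <class F>
double hamiltonian(F f, const std::vector<double>& x, const std::vector<double>& q,
                   const std::vector<std::vector<double>>& Ah) {
  assert(!Ah.empty());
  double H = -std::numeric_limits<double>::infinity();
  for (const auto& a : Ah) {
    const std::vector<double> fx = f(x, a);
    assert(fx.size() == q.size());
    double dot = 0.0;
    for (std::size_t k = 0; k < q.size(); ++k) dot += fx[k] * q[k];
    H = std::max(H, -dot);
  }
  return H;
}
```

For f(x,a) = a with A = [−1, 1], H(x,q) = |q|, which the sampled version returns exactly as soon as ±1 belong to A_h.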


• Each control problem is associated with an appropriate Hamilton-Jacobi equation.

• A very large class of control problems can be handled within the HJ framework (state-constrained control problems, infinite horizon control problems, hybrid systems, impulsive control, ...).

• The viscosity notion provides a very convenient framework for the theoretical and numerical study of the value function.


• When the (exact) value function is known, the feedback controller can be defined as the minimizer in the DPP.

• This feedback can be shown to be an optimal control law.

Van der Pol problem:

$$\dot y_1(t) = y_2(t), \qquad \dot y_2(t) = -y_1(t) + y_2(t)\big(1 - y_1(t)^2\big) + a(t), \qquad a(t) \in [-1,1].$$

Figure: Van der Pol problem in the (x1, x2) plane (scheme: ENO2-RK1; the target is shown).
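The feedback defined as the minimizer in the DPP can be sketched as follows, assuming a one-step Euler approximation of the trajectory over a small step h and a finite sample A_h of [−1, 1]. The value function V is supplied by the caller (in practice an interpolant of a computed approximation); the names `vdp` and `feedback` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

using State2 = std::array<double, 2>;

// Van der Pol dynamics of the slide: y1' = y2, y2' = -y1 + y2 (1 - y1^2) + a.
State2 vdp(const State2& y, double a) {
  return {y[1], -y[0] + y[1] * (1.0 - y[0] * y[0]) + a};
}

// DPP-based feedback (sketch): among the sampled controls Ah, return the one
// minimizing V(x + h f(x, a)). For a minimum-time function T one would
// minimize T(x + h f(x, a)) + h, which has the same minimizer.
double feedback(const std::function<double(const State2&)>& V, const State2& x,
                double h, const std::vector<double>& Ah) {
  assert(!Ah.empty() && h > 0.0);
  double best_a = Ah.front();
  double best = std::numeric_limits<double>::infinity();
  for (double a : Ah) {
    const State2 fx = vdp(x, a);
    const State2 next{x[0] + h * fx[0], x[1] + h * fx[1]};
    const double val = V(next);
    if (val < best) { best = val; best_a = a; }
  }
  return best_a;
}
```

As a usage check only, with the placeholder V(y) = |y|² (not a value function of this problem), x = (0, 1) and h = 0.1, the selected control is a = −1. Applying the returned control over one step and repeating gives a closed-loop trajectory.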



Open problem: In general, only an approximation of the value function can be computed. It is not clear how the feedback control behaves with respect to a small perturbation of the value function.


Numerical methods

Semi-Lagrangian methods: based on the DPP (Falcone, Ferretti, Jakobsen, Grüne, Kushner-Dupuis, ...)
  PROS: no CFL condition for stability (⇒ adaptive schemes)
  CONS: non-local

Finite difference methods: approximation of the gradient by FD (Crandall/Lions, Barles, Souganidis)
  CONS: needs a CFL condition for stability
  PROS: local method ⇒ can be parallelized
  PROS: non-monotone variants have been proposed to obtain numerically "high-order" schemes:
  • ε-monotone schemes (R. Abgrall)
  • ENO, WENO (Osher, Shu, ...)
  • Discontinuous Galerkin, direct DG (Cockburn, Shu, Cheng & Shu, Bokanowski'12)
  • Anti-diffusive schemes, Ultra-Bee (Megdich-Bokanowski-HZ'10, Bokanowski-Cristiani-HZ'10)
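The CFL restriction of explicit finite-difference schemes is commonly written as Δt · Σ_d (amax_d / Δx_d) ≤ c, with amax_d a bound on |f_d(x,a)| over the domain and the controls, and c ≤ 1 a user-chosen constant. A minimal sketch under that assumption (the name `cfl_time_step` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CFL-limited time step for an explicit FD scheme:
//   dt = cfl / sum_d ( amax[d] / dx[d] ),
// where amax[d] bounds |f_d(x, a)| (it plays the role of amax in Hnum)
// and cfl is a user choice, typically <= 1.
double cfl_time_step(const std::vector<double>& amax, const std::vector<double>& dx,
                     double cfl) {
  assert(amax.size() == dx.size() && !dx.empty() && cfl > 0.0);
  double s = 0.0;
  for (std::size_t d = 0; d < dx.size(); ++d) {
    assert(dx[d] > 0.0);
    s += amax[d] / dx[d];
  }
  assert(s > 0.0);
  return cfl / s;
}
```

With amax = (1, 1), Δx = (0.1, 0.1) and c = 0.5, this gives Δt = 0.025: halving the mesh size halves the admissible time step, which is why fine grids make explicit FD schemes expensive.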

Recent developments and ongoing works:

• "Curse-of-dimensionality-free methods": max-plus algebra (McEneaney, Akian, Gaubert, Sridharan, ...)

• Sparse grid methods (Bokanowski/Klompmaker/Garcke/Griebel'12)


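ENO2-RK1 scheme (with an example of a numerical Hamiltonian Hnum):

```cpp
// One explicit Euler (RK1) step with second-order ENO (ENO2) one-sided derivatives
void HJB_FD::ENO2_RK1(double t, double deltat, double* vin, double* vout)
{
  int i, j, d; double vi, v1, v2, v3, v4, vv, h;
  for (j = 0; j < mesh->inn_nbPoints; j++) {
    i = rank[j]; vi = vin[i];
    for (d = 0; d < dim; d++) {
      // neighbours at distance 1 (v1, v2) and 2 (v3, v4) in direction d
      v1 = vin[i - mesh->out_neighbors[d]];   v3 = vin[i - 2*mesh->out_neighbors[d]];
      v2 = vin[i + mesh->out_neighbors[d]];   v4 = vin[i + 2*mesh->out_neighbors[d]];
      h = mesh->Dx[d];
      vv = (v2 - 2.*vi + v1)*divdx[d]*divdx[d];  // centred second difference at node i
      // backward (2d) and forward (2d+1) derivatives with minmod-limited correction
      Dvnum[2*d]   = (vi - v1)*divdx[d] + h*.5*minmod((vi - 2.*v1 + v3)*divdx[d]*divdx[d], vv);
      Dvnum[2*d+1] = (v2 - vi)*divdx[d] - h*.5*minmod((vi - 2.*v2 + v4)*divdx[d]*divdx[d], vv);
    }
    vout[i] = vi - deltat * (*this.*Hnum)((mesh->*(mesh->getcoords))(i), Dvnum, t);
  }
}

// Lax-Friedrichs-type numerical Hamiltonian: v[2i], v[2i+1] are the backward
// and forward derivatives in direction i
inline double Hnum(const double* x, const double* v, double t)
{
  double z = 0., p[DIM], amax[DIM]; int i;
  for (i = 0; i < DIM; i++) {
    amax[i] = 1.;                           //- a maximal bound of the dynamics
    p[i] = (v[2*i] + v[2*i+1])/2.;          // centred gradient
    z += amax[i]*(v[2*i+1] - v[2*i])/2.;    // numerical viscosity
  }
  return H(x, p) - z;                       //- H(x,Nabla u) is the Hamiltonian
}
```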
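The member function above depends on the solver's mesh and class members. As a self-contained illustration, the same ENO2 formulas and the same Lax-Friedrichs Hnum (with amax = 1) can be applied on a uniform 1D grid. The test problem (y' = a, |a| ≤ 1, hence H(p) = |p|), the linear extrapolation at the two end nodes and all names are assumptions of this sketch, not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// minmod limiter used by the ENO2 reconstruction
double minmod(double a, double b) {
  if (a * b <= 0.0) return 0.0;
  return std::fabs(a) < std::fabs(b) ? a : b;
}

// Hamiltonian of the assumed test case y' = a, a in [-1, 1]: H(p) = |p|
double H(double p) { return std::fabs(p); }

// Lax-Friedrichs numerical Hamiltonian with amax = 1 (bound on |f|)
double Hnum(double pm, double pp) {
  const double amax = 1.0;
  return H(0.5 * (pm + pp)) - amax * 0.5 * (pp - pm);
}

// One ENO2-RK1 step on a uniform 1D grid; the two nodes at each end are
// filled by linear extrapolation (a boundary choice made for this sketch).
std::vector<double> eno2_rk1_step(const std::vector<double>& v, double dx, double dt) {
  const int n = static_cast<int>(v.size());
  assert(n >= 5);
  std::vector<double> out(v);
  const double idx2 = 1.0 / (dx * dx);
  for (int i = 2; i < n - 2; ++i) {
    const double vv = (v[i + 1] - 2.0 * v[i] + v[i - 1]) * idx2;
    const double pm = (v[i] - v[i - 1]) / dx
                    + 0.5 * dx * minmod((v[i] - 2.0 * v[i - 1] + v[i - 2]) * idx2, vv);
    const double pp = (v[i + 1] - v[i]) / dx
                    - 0.5 * dx * minmod((v[i] - 2.0 * v[i + 1] + v[i + 2]) * idx2, vv);
    out[i] = v[i] - dt * Hnum(pm, pp);
  }
  out[1] = 2.0 * out[2] - out[3];
  out[0] = 2.0 * out[1] - out[2];
  out[n - 2] = 2.0 * out[n - 3] - out[n - 4];
  out[n - 1] = 2.0 * out[n - 2] - out[n - 3];
  return out;
}
```

Starting from Φ(x) = |x| − 0.5 on [−2, 2] with Δx = 0.02 and Δt = Δx/2, fifty steps approximate V(x, 0.5) = max(|x| − 0.5, 0) − 0.5; the numerical viscosity of Hnum smooths the kinks by O(Δx).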


HJ parallel library (BiNoPe) by O. Bokanowski, A. Désilles, J. Zhao, H. Zidani

http://www.ensta-paristech.fr/~zidani/BiNoPe_HJ/presentation.html

Finite differences solver (ENO, UltraBee): C++, parallel (MPI/OpenMP); works in any dimension (limited by the machine's capacity)

Semi-Lagrangian solver: C++ (OpenMP); works in any dimension (limited by the machine's capacity)


Reachable (or attainable) set

• The reachable set $R_f(x;t)$ from $x$ at time $t$ is the set of all points of the form $y_x(\tau)$, where $y_x \in S_{[0,t]}(x)$:

$$R_f(x;t) := \{\, y_x(\tau) \mid y_x \in S_{[0,t]}(x),\ \tau \in [0,t] \,\}.$$

• The reachable set from $X$ is defined by:

$$R_f^X(t) := \bigcup_{x \in X} R_f(x;t).$$

Figure: Reachable set


Let $C$ be a closed target set (in our examples, $C$ is safe).

Capture basin (or backward reachable set)

• The capture basin $\mathrm{Capt}_C(t)$, at time $t$, is the set of all initial positions $x$ from which a trajectory $y_x \in S_{[0,t]}(x)$ can reach the target $C$.


• Does there exist a trajectory leading from a state in the initial set X to a state in the target C within some finite time horizon?

• Once an obstacle has been detected by suitable sensors (e.g. radar, pursuer), can a collision be avoided?

• Sometimes we have no control over the input signal (noise, actions of other agents, unknown system parameters, ...): it is safest to consider the worst case.


Different approaches for computing reachable sets

• set-valued integration schemes [Saint-Pierre'91, Baier'95], optimal control techniques [Varaiya'00, Baier et al.'07]

• external and inner ellipsoidal techniques [Kurzhanski and Varaiya'00,'01,'02]

• discretization methods for nonlinear problems with state constraints [Chahma'03, Beyn and Rieger'07]

• optimal controller design: level set method [Osher-Sethian'91, Falcone et al.'05, Mitchell'07, Bokanowski-HZ'07, Bokanowski-Forcadel-HZ'10]


Optimization-based controller design

Minimum-time control problem:

$$T(x) = \inf\{\, t \,;\ y_x(t) \in C,\ y_x \in S_{[0,t]}(x) \,\}.$$

The sublevel sets of the minimum time function $T$ correspond to the capture basins of the target $C$:

$$\mathrm{Capt}_C(t) = \{\, x \in \mathbb{R}^d \mid T(x) \le t \,\}.$$

When the minimum time function $T$ is continuous, it can be characterized as the unique viscosity solution of the HJB equation:

$$H(x, DT(x)) = 1, \quad x \notin C,\ T(x) < +\infty; \qquad T(x) = 0 \text{ in } C,$$

where $H(x,q) := \max_{a \in A}(-f(x,a) \cdot q)$.

Unfortunately ...

• The continuity of $T$ is equivalent to small-time controllability of the system around the target: let $\eta_x$ be the outward normal to $C$ at $x$,

$$\min_{\alpha \in A} f(x,\alpha) \cdot \eta_x < 0, \qquad \forall x \in \partial C.$$

• This (restrictive) controllability property is not satisfied in several examples (e.g. the Zermelo problem).

• In the general case, the minimum time function is only lsc. In this case, the approximation of the minimum time function becomes more difficult.
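The normal controllability condition above can be tested numerically on a sampled boundary. The sketch below assumes a disk target C = {|x| ≤ R} (outward normal x/R) and the Zermelo dynamics used later in this lecture, f = (Vb cos θ + Vc − a y², Vb sin θ), with hypothetical parameter values and M boundary points and K headings sampled uniformly; the function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Test min_a f(x, a) . eta_x < 0 at M sample points of the boundary of the
// disk target C = {|x| <= R}, whose outward unit normal at x is x / R.
// Dynamics: Zermelo model f = (Vb cos th + Vc - a y^2, Vb sin th), heading th
// sampled with K angles (parameter values are up to the caller).
bool normal_controllability(double Vb, double Vc, double a, double R, int M, int K) {
  assert(M > 0 && K > 0 && R > 0.0);
  const double pi = std::acos(-1.0);
  for (int m = 0; m < M; ++m) {
    const double phi = 2.0 * pi * m / M;
    const double eta1 = std::cos(phi), eta2 = std::sin(phi);
    const double y = R * eta2;  // second coordinate of the boundary point
    double fmin = std::numeric_limits<double>::infinity();
    for (int k = 0; k < K; ++k) {
      const double th = 2.0 * pi * k / K;
      const double f1 = Vb * std::cos(th) + Vc - a * y * y, f2 = Vb * std::sin(th);
      fmin = std::min(fmin, f1 * eta1 + f2 * eta2);
    }
    if (!(fmin < 0.0)) return false;  // the condition fails at this boundary point
  }
  return true;
}
```

With a boat speed Vb = 1 and a current Vc = 0.5 the condition holds, while with Vc = 1.5 it fails at the downstream point of the disk, where the current pushes every trajectory out of C. Since the sampled minimum over θ is never below the true one, a `true` answer is reliable; a `false` answer could in principle be a sampling artefact when the true minimum is very close to 0.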


Zermelo problem


Optimal control problem. Level set approach

Define the signed distance function $\Phi(x) = d_C(x)$, and consider the following control problem:

$$V(x,t) = \inf_{y_x \in S_{[0,t]}(x)} \Phi(y_x(t))$$

• $V$ is Lipschitz continuous and satisfies (Crandall-Lions'84):

$$\partial_t V(x,t) + H(x, DV(x,t)) = 0, \qquad V(x,0) = \Phi(x),$$

where $H(x,q) := \sup_{a \in A}(-f(x,a) \cdot q)$.

• For every $t \ge 0$, $\mathrm{Capt}_C(t) = \{\, x \in \mathbb{R}^d \,;\ V(x,t) \le 0 \,\}$;

• The minimum time function $T : \mathbb{R}^d \to \mathbb{R}_+ \cup \{+\infty\}$ is lsc. Moreover, we have:

$$T(x) = \inf\{\, t \ge 0 \,;\ x \in \mathrm{Capt}_C(t) \,\} = \min\{\, t \ge 0 \,;\ V(x,t) \le 0 \,\}.$$

• $\Phi$ can be any function satisfying

$$\Phi(x) \le 0 \iff x \in C.$$
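The formula T(x) = min{t ≥ 0 ; V(x,t) ≤ 0} suggests a simple post-processing of any time-marching solver: record, at each node, the first time level at which V becomes nonpositive. A minimal 1D sketch, assuming y' = a, |a| ≤ 1 (so H(p) = |p|) and, as a swapped-in choice, a first-order monotone upwind (Godunov-type) numerical Hamiltonian instead of the ENO2 scheme; the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Upwind (Godunov-type) numerical Hamiltonian for H(p) = |p|:
//   Hnum(p-, p+) = max( max(p-, 0), max(-p+, 0) ), monotone for dt <= dx.
double Hupwind(double pm, double pp) {
  return std::max(std::max(pm, 0.0), std::max(-pp, 0.0));
}

// Level set approach: march V_t + |V_x| = 0 from V(x,0) = Phi(x) and record,
// at each node, the first time level k*dt with V(x_i, k*dt) <= 0, so that
// T(x_i) ~ min{ t >= 0 : V(x_i, t) <= 0 }. Uncaptured nodes keep +infinity.
std::vector<double> min_time_level_set(std::vector<double> v, double dx, double dt, int nsteps) {
  const int n = static_cast<int>(v.size());
  assert(n >= 3 && dt <= dx);  // CFL condition for this scheme
  std::vector<double> T(n, std::numeric_limits<double>::infinity());
  for (int i = 0; i < n; ++i) if (v[i] <= 0.0) T[i] = 0.0;
  for (int k = 1; k <= nsteps; ++k) {
    std::vector<double> w(v);
    for (int i = 1; i < n - 1; ++i)
      w[i] = v[i] - dt * Hupwind((v[i] - v[i - 1]) / dx, (v[i + 1] - v[i]) / dx);
    w[0] = 2.0 * w[1] - w[2];  // linear extrapolation at the ends
    w[n - 1] = 2.0 * w[n - 2] - w[n - 3];
    v.swap(w);
    for (int i = 0; i < n; ++i) if (v[i] <= 0.0 && std::isinf(T[i])) T[i] = k * dt;
  }
  return T;
}
```

For the target C = [−0.5, 0.5] with Φ(x) = |x| − 0.5, the computed T approximates max(|x| − 0.5, 0) up to one time step. Only the continuous function V is computed; T is read off its sign, which is why the approach is not affected by a possible discontinuity of T.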


• The level set approach can be used even when the minimum time function is discontinuous!

The value function V is Lipschitz continuous!

• The level set approach can be extended to more general situations: differential games, avoidance of obstacles, moving targets and/or obstacles, ...


General setting: differential games under state constraints

Let $\alpha \in A$ be a controlled input, and $\beta \in B$ an uncontrolled input (perturbation). Consider the trajectory:

$$\dot y_x(s) = f(y_x(s), \alpha(s), \beta(s)), \quad s \in (0,1), \qquad y_x(0) = x.$$

Let $(K_\theta)_{\theta \ge 0}$ be a family of closed sets (of constraints). Consider a game involving two players.

• The first player wants to steer the system from the initial position $x$ to the target $C$ while staying in $K$ (using his/her input $\alpha(t) \in A$),

• while the second player tries to steer the system away from $C$ or out of $K$ (with his/her input $\beta(t) \in B$).

Assume $\theta \longmapsto K_\theta$ is usc.


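Non-anticipative strategies

We define the set of non-anticipative strategies for the first player as follows:

$$\Gamma := \Big\{ a : B \to A \,;\ \forall \beta, \tilde\beta \in B,\ \forall s \in [0,\infty),\ \big(\beta(\theta) = \tilde\beta(\theta) \text{ a.e. } \theta \in [0,s]\big) \Rightarrow \big(a[\beta](\theta) = a[\tilde\beta](\theta) \text{ a.e. } \theta \in [0,s]\big) \Big\}.$$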


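• For $\tau \ge 0$,

$$\mathrm{Capt}_K(\tau) := \{\, x \in \mathbb{R}^d \mid \exists a \in \Gamma,\ \forall \beta,\ y_x^{a[\beta],\beta}(s) \in K_s \text{ for } s \in [0,\tau], \text{ and } y_x^{a[\beta],\beta}(\tau) \in C \,\}$$

• Again, let $\Phi(x) = d_C(x)$ and consider the control problem:

$$\vartheta(x,\tau) := \min_{a[\cdot]}\ \max_{\beta}\ \big\{ \Phi(y_x^{a[\beta],\beta}(\tau)) \;\big|\; y_x^{a[\beta],\beta}(t) \in K_t \big\}.$$

Then

$$\mathrm{Capt}_K(\tau) = \{\, x \in \mathbb{R}^d,\ \vartheta(x,\tau) \le 0 \,\}, \qquad T(x) = \min\{\, \tau \ge 0,\ \vartheta(x,\tau) \le 0 \,\}.$$

• For controlled systems lacking controllability assumptions, the characterization of $\vartheta$ by means of HJB equations is not an easy task! (Ref: Soner'86, Ishii-Koike'91, Frankowska'91, Altarovici-Bokanowski-HZ'12)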


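Another formulation: exact penalization

• Let $g(x,\theta) = d_{K_\theta}(x)$ for any $\theta \ge 0$ and $x \in \mathbb{R}^d$. Then consider:

$$\vartheta^g(x,\tau) := \inf_{a[\cdot] \in \Gamma}\ \max_{\beta}\ \Big\{ \Phi(y_x^{a[\beta],\beta}(\tau)) \vee \max_{\theta \in [0,\tau]} g(y_x^{a[\beta],\beta}(\theta), \theta) \Big\}.$$

• $\vartheta^g$ is the unique continuous viscosity solution of

$$\min\big(\partial_\tau \vartheta^g(x,\tau) + H(x, D_x\vartheta^g(x,\tau)),\ \vartheta^g(x,\tau) - g(x,\tau)\big) = 0, \qquad \vartheta^g(x,0) = \max(\Phi(x), g(x,0)),$$

where $H(x,q) := \sup_{a \in A} \min_{b \in B}(-f(x,a,b) \cdot q)$.

• And

$$\mathrm{Capt}_K(\tau) = \{\, x \in \mathbb{R}^d,\ \vartheta^g(x,\tau) \le 0 \,\}, \qquad T(x) = \min\{\, \tau \ge 0,\ \vartheta^g(x,\tau) \le 0 \,\}.$$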
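A common way to discretize the obstacle equation above is an explicit monotone HJB step followed by the projection ϑ ← max(ϑ, g). The sketch below covers only a one-player special case (B trivial, y' = a with |a| ≤ 1, so H(p) = |p|) with a fixed constraint set K, and uses an upwind (Godunov-type) numerical Hamiltonian; the setting, the function name and the boundary extrapolation are assumptions of this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Scheme for min( d_t v + |D v|, v - g ) = 0 with v(x,0) = max(Phi(x), g(x)):
// explicit upwind step for H(p) = |p|, then projection v <- max(v, g).
std::vector<double> obstacle_solve(const std::vector<double>& phi, const std::vector<double>& g,
                                   double dx, double dt, int nsteps) {
  const int n = static_cast<int>(phi.size());
  assert(n >= 3 && g.size() == phi.size() && dt <= dx);  // CFL for the upwind step
  std::vector<double> v(n);
  for (int i = 0; i < n; ++i) v[i] = std::max(phi[i], g[i]);
  for (int k = 0; k < nsteps; ++k) {
    std::vector<double> w(v);
    for (int i = 1; i < n - 1; ++i) {
      const double pm = (v[i] - v[i - 1]) / dx, pp = (v[i + 1] - v[i]) / dx;
      const double Hn = std::max(std::max(pm, 0.0), std::max(-pp, 0.0));  // upwind for |p|
      w[i] = std::max(v[i] - dt * Hn, g[i]);                               // obstacle projection
    }
    w[0] = std::max(2.0 * w[1] - w[2], g[0]);  // linear extrapolation at the ends
    w[n - 1] = std::max(2.0 * w[n - 2] - w[n - 3], g[n - 1]);
    v.swap(w);
  }
  return v;
}
```

With the target [0.8, 1.2] (Φ(x) = |x − 1| − 0.2) and the obstacle (−0.2, 0.2) (g(x) = 0.2 − |x|, positive inside the obstacle), the point x = 0.5 lies in the capture basin (ϑ^g ≈ −0.2 at τ = 3), whereas x = −1 does not (ϑ^g ≈ 0.2), since every path to the target crosses the obstacle.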


Example 1: one player, fixed obstacles


Example 2: Zermelo problem with obstacles

$$x' = V_{boat}\cos\theta + V_{current} - a\,y^2$$

$$y' = V_{boat}\sin\theta$$
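For these Zermelo dynamics, the Hamiltonian H(x,y,q) = max_θ (−f(x,y,θ) · q) has a closed form: the maximum over the heading of −V_boat(cos θ q1 + sin θ q2) equals V_boat |q|, so H = V_boat |q| − (V_current − a y²) q1. The sketch below (hypothetical names and parameter values) compares this formula with a brute-force maximum over sampled headings.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Parameters of the Zermelo dynamics (values are up to the caller):
//   x' = Vboat cos(theta) + Vcurrent - a y^2,  y' = Vboat sin(theta).
struct Zermelo { double Vboat, Vcurrent, a; };

// Closed form: H(x, y, q) = Vboat |q| - (Vcurrent - a y^2) q1.
double H_exact(const Zermelo& z, double y, double q1, double q2) {
  return z.Vboat * std::hypot(q1, q2) - (z.Vcurrent - z.a * y * y) * q1;
}

// Same quantity by brute force over M equally spaced headings (for checking).
double H_sampled(const Zermelo& z, double y, double q1, double q2, int M) {
  assert(M > 0);
  const double pi = std::acos(-1.0);
  double H = -std::numeric_limits<double>::infinity();
  for (int k = 0; k < M; ++k) {
    const double th = 2.0 * pi * k / M;
    const double f1 = z.Vboat * std::cos(th) + z.Vcurrent - z.a * y * y;
    const double f2 = z.Vboat * std::sin(th);
    H = std::max(H, -(f1 * q1 + f2 * q2));
  }
  return H;
}
```

Using the closed form avoids an inner loop over controls at every grid node; the sampled version underestimates H by at most V_boat (1 − cos(π/M)).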


Zermelo problem with obstacles: feedback control law

$$x' = V_{boat}\cos\theta + V_{current} - a\,y^2$$

$$y' = V_{boat}\sin\theta$$

Figure: trajectories under the feedback control law (scheme: ENO2-RK1; obstacle and target shown).


Example: Ariane V

Objective: minimize the propellant consumption needed to steer the (given) payload MCU to the GTO (or GEO).

Collaboration with CNES (project OPALE 2007-2010)


The physical model involves 7 state variables: the position $\vec{OG}$ of the rocket in 3D space, its velocity $\vec v$ and its mass $m$.

Figure: projection of $\vec v$ on the frame $(e_r, e_L, e_\ell)$, showing the angles $\gamma$ and $\chi$.

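The forces acting on the rocket are: gravity $\vec P$, drag $\vec F_D$, thrust $\vec F_T$, and the Coriolis (and centrifugal) terms due to the Earth's rotation $\vec\Omega$.

Newton's law:

$$m\frac{d\vec v}{dt} = \vec P + \vec F_D + \vec F_T - 2m\,\vec\Omega \wedge \vec v - m\,\vec\Omega \wedge (\vec\Omega \wedge \vec{OG}),$$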


Figure: mission Kourou-GEO (labels: Phase A (HJB), Phase B, Phase C (HJB), transport; GTO and GEO orbits; stage masses m1, m2, m3).

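The related equations

State variables:
r = altitude
v = modulus of the velocity
γ = angle between the Earth-rocket direction and the direction of the rocket's velocity
L = latitude
ℓ = longitude
χ = azimuth
m = mass of the engine

Control: α = angle between the thrust direction and the direction of the rocket's velocity.

$$\begin{aligned}
\dot r &= v\cos\gamma\\
\dot v &= -g(r)\cos\gamma - \frac{F_D(r,v)}{m} + \frac{F_T(r,v,a)}{m}\cos\alpha + \Omega^2 r\cos\ell\,(\cos\gamma\cos\ell - \sin\gamma\sin\ell\sin\chi)\\
\dot\gamma &= \sin\gamma\Big(\frac{g(r)}{v} - \frac{v}{r}\Big) - \frac{F_T(r,v,a)}{v\,m}\sin\alpha - 2\Omega\cos\ell\cos\chi - \Omega^2\frac{r}{v}\cos\ell\,(\sin\gamma\cos\ell - \cos\gamma\sin\ell\sin\chi)\\
\dot L &= \frac{v}{r}\,\frac{\sin\gamma\cos\chi}{\cos\ell}\\
\dot\ell &= \frac{v}{r}\sin\gamma\sin\chi\\
\dot\chi &= -\frac{v}{r}\sin\gamma\tan\ell\cos\chi - 2\Omega(\sin\ell - \mathrm{cotan}\,\gamma\cos\ell\sin\chi) + \Omega^2\frac{r}{v}\,\frac{\sin\ell\cos\ell\cos\chi}{\sin\gamma}
\end{aligned}$$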


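• The plane of motion is the equatorial plane: $\ell \equiv 0$ and $\chi \equiv 0$.

$$\begin{aligned}
\dot r &= v\cos\gamma\\
\dot v &= -g(r)\cos\gamma - \frac{F_D(r,v)}{m} + \frac{F_T(r,v,a)}{m}\cos\alpha + \Omega^2 r\cos\gamma\\
\dot\gamma &= \sin\gamma\Big(\frac{g(r)}{v} - \frac{v}{r}\Big) - \frac{F_T(r,v,a)}{v\,m}\sin\alpha - 2\Omega - \Omega^2\frac{r}{v}\sin\gamma\\
\dot L &= \frac{v}{r}\sin\gamma\\
\dot m &= -b(m(t))
\end{aligned}$$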


The rocket's mass

• The evolution of the mass can be summarized as follows:

$$\begin{array}{c|c|c|c}
 & \text{Phase 0 \& 1} & \text{Phase 2} & \text{Phase 3}\\ \hline
\dot m_1(t) & -\beta_{EAP} & 0 & 0\\
\dot m_2(t) & -\beta_{E1} & -\beta_{E1} & 0\\
\dot m_3(t) & 0 & 0 & -\beta_{E2}
\end{array}$$

where $\beta_{EAP}$, $\beta_{E1}$ and $\beta_{E2}$ are the mass flow rates of the boosters, the first stage and the second stage.

• At each change of phase, the rocket's mass has a non-negligible discontinuity.


The control problem can be formulated as (for a fixed payload):

$$\begin{aligned}
&\text{Minimize } t_f\\
&(r, v, \gamma, m, \alpha) \text{ satisfy the state equation},\\
&\alpha(t) \in [0, \pi/2] \ \text{ a.e. } t \in (0, t_f),\\
&(r(t_f), v(t_f), \gamma(t_f)) \in C,\\
&Q(r(t), v(t))\,\alpha(t) \le C_s \ \text{ for } t \in (0, t_f),\\
&m(t_f) = M_p,
\end{aligned}$$

where the target $C$ corresponds to the GTO orbit, and the function $Q$ is the dynamic pressure.


• The capture basin is wide ⇒ we introduce "physical" state constraints to define the computational domain.

• Due to the CFL condition, the time step is very small ⇒ adaptive time discretization.

• "Different scales" for the state variables ⇒ change of variables:

$$r = r_0(e^x - 1) + r_T, \qquad v = v_0(e^y - 1) + v_T$$
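The change of variables and its inverse, x = ln(1 + (r − r_T)/r_0), can be sketched as below; the same map serves for v with (v_0, v_T). The struct name is hypothetical and the scaling constants are left to the caller, since their values are not given here. Using `expm1`/`log1p` keeps the map accurate near r = r_T.

```cpp
#include <cassert>
#include <cmath>

// Change of variable s = s0 (e^x - 1) + sT (s = r or v) and its inverse.
struct LogScaling {
  double s0, sT;  // scaling constants (s0 > 0), chosen by the user
  double to_physical(double x) const { return s0 * std::expm1(x) + sT; }
  double to_computational(double s) const {
    assert(s0 > 0.0 && (s - sT) / s0 > -1.0);  // inverse defined for s > sT - s0
    return std::log1p((s - sT) / s0);
  }
};
```

A uniform grid in x corresponds in the physical variable to a spacing s0 e^x Δx that grows with x, so the resolution is finest near s = s_T and below it.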


Figure: GTO target. Full trajectory using the HJB minimal time value function (panels: altitude (km), speed (m/s), γ (rad) and mass (ton) versus time (s)).

Reference trajectory, final mass: $m_T = 21.57$ t. HJB trajectory, final mass (after reconstruction): $m_T = 22.50$ t.


"Collision analysis for an UAV". E. Crück, A. Désilles, HZ. AIAA Guidance, Navigation, and Control, 2012.

"A general Hamilton-Jacobi framework for nonlinear state-constrained control problems". A. Altarovici, O. Bokanowski, HZ. ESAIM: COCV, 2012.

"Minimal time problems with moving targets and obstacles". O. Bokanowski, HZ. 18th IFAC World Congress, Milan, 2011.

"Deterministic state constrained optimal control problems without controllability assumptions". O. Bokanowski, N. Forcadel, HZ. ESAIM: COCV, 17(04), pp. 975–994, 2011.

"An efficient data structure and accurate scheme to solve front propagation problems". O. Bokanowski, E. Cristiani, HZ. J. of Scientific Computing, 42(2), pp. 251–273, 2010.

"Reachability and minimal times for state constrained nonlinear problems without any controllability assumption". O. Bokanowski, N. Forcadel, HZ. SIAM J. Control and Optimization, 48(7), pp. 4292–4316, 2010.

"Convergence of a non-monotone scheme for Hamilton-Jacobi-Bellman equations with discontinuous initial data". O. Bokanowski, N. Megdich, HZ. Numerische Mathematik, 115(1), pp. 1–44, 2010.

"An anti-diffusive scheme for viability problems". O. Bokanowski, S. Martin, R. Munos, HZ. Applied Numerical Mathematics, 56(9), pp. 1147–1162, 2006.


... many thanks for your attention!

