Optimal Finite-Time Feedback Controllers for Nonlinear Systems with Terminal Constraints


JOURNAL OF GUIDANCE, CONTROL, AND DYNAMICS

Vol. 29, No. 4, July–August 2006


Srinivas R. Vadali∗ and Rajnish Sharma†

Texas A&M University, College Station, Texas 77843-3141

This paper presents a dynamic-programming approach for determining finite-time optimal feedback controllers for nonlinear systems with nonlinear terminal constraints. The method utilizes a polynomial series expansion of the cost-to-go function with time-dependent gains that operate on the state variables and constraint Lagrange multipliers. These gains are computed from backward integration of differential equations with terminal boundary conditions, derived from the constraint specifications. The differential equations for the gains are independent of the states. The Lagrange multipliers at any particular time are evaluated from the knowledge of the current state and the gain values. Several numerical examples are considered to demonstrate the applications of this methodology. The accuracy of the method is ascertained by comparing the results with those obtained by using open-loop solutions to the respective problems. Finally, results of the application of the developed methodology to a spacecraft detumbling problem are presented.

Introduction

SOLUTIONS to optimal control problems can be obtained by solving the Hamilton–Jacobi–Bellman (HJB) equation, a nonlinear partial differential equation. The method of characteristics, which is typically implemented via shooting algorithms, results in open-loop solutions to the HJB equation. The linear-quadratic (LQ) problem is a special case for which there is an exact, closed-form feedback solution for the HJB equation, with the control gains resulting from the solution to a Riccati equation and a set of linear auxiliary differential equations. Other special cases of norm-invariant nonlinear systems exist for which the optimal control involves a linear feedback of the states. An example of this type of problem is that of the optimal detumbling of a rigid body using continuous controls, which minimize a quadratic, infinite-horizon performance index based on the kinetic energy.1 Many approaches have been devised in the controls literature to obtain approximate solutions to the unconstrained HJB equation in general settings. In this paper, attention is focused on problems for which the HJB equation admits smooth solutions.

Power-series solution methods for infinite-horizon problems can be traced back to the work of Al'brekht,2 who used Lyapunov functions to construct optimal controls for nonlinear systems in feedback form. The convergence proof is based on the asymptotic stability of the linear part of the closed-loop system. Lukes3 formalized the power-series approach by expanding the cost and controls about the origin. Dabbous and Ahmed4 showed the application of the series solution method to the problem of nonlinear optimal angular-momentum regulation of a satellite. Carrington and Junkins5 used the series expansion of the costate vector to solve the optimal finite-horizon spacecraft attitude regulation problem in feedback form. Their approach is different from the series solution of the HJB equation, in which the scalar cost function is expanded in a power series. The control law obtained by Carrington and Junkins5 involves nonlinear feedback of angular velocities and Euler parameters, using time-varying matrix and higher-order tensor gain elements. They incorporated a change of variables involving the terminal boundary conditions in order to convert a general maneuver problem into a zero-set-point problem. In addition, they chose a horizon long enough for completing the maneuver, thus requiring only the methodology of the asymptotically stable, infinite-horizon regulator.

Presented as AAS Paper 05-182 at the AAS/AIAA 15th Space Flight Mechanics Conference, Copper Mountain, CO, 23–27 January 2005; received 24 March 2005; revision received 20 July 2005; accepted for publication 25 July 2005. Copyright © 2005 by Srinivas R. Vadali and Rajnish Sharma. Published by the American Institute of Aeronautics and Astronautics, Inc., with permission.

∗Stewart & Stevenson Professor, Department of Aerospace Engineering, 3141 TAMU; [email protected]. Associate Fellow AIAA.

†Graduate Student, Department of Aerospace Engineering, 3141 TAMU; [email protected]. Student Member AIAA.

Yoshida and Loparo6 presented an approximation theory for the optimal regulation of nonlinear systems with additive controls using a lexicographic listing of vectors containing monomials of different degrees. This arrangement allows the use of Carleman linearization for the theoretical analysis of convergence. Extension of the series solution methodology to the solution of the Hamilton–Jacobi–Isaacs equation can be found in Huang and Lin7 and Tsiotras et al.8

Approximations of other forms can also be found in the literature on suboptimal control. Garrard et al.9 presented an approximation method by neglecting the time derivatives of certain nonlinear terms in the HJB equation. One can establish a connection between this work and the state-dependent Riccati equation approach.10 Tewari11 presented an iterative approximation technique for the spacecraft attitude control problem, which uses a specific form of a positive-definite cost function (Lyapunov function) chosen a priori. Extension of this technique to tracking maneuvers can be found in Sharma and Tewari.12

Cimen and Banks13 introduced an "approximating sequence of Riccati equations" approach to nonlinear optimal control problems with nonquadratic performance indices. However, they did not consider terminal constraints. In this approach, the optimal control is obtained by solving a sequence of time-varying Riccati differential equations until convergence is achieved. The result is an optimal control law for the original nonlinear system, which has a linear state feedback structure with time-varying gains. However, the feedback control solution has to be recomputed for each new initial condition. The θ − D suboptimal approach of Xin et al.14 also results in an approximate solution to the HJB equation. It involves the addition of small perturbations to the original cost function such that a closed-form solution for the control can be achieved. This method can also be classified as an inverse optimal control approach.

Inverse optimal control approaches obtain a performance index that is minimized, given a stabilizing feedback control law. However, the performance index often might not have known physical significance. Examples of the applications of this technique can be found in Vadali and Junkins15 and Krstic and Tsiotras.16

Beard et al.17 developed a Galerkin successive approximation technique to solve the generalized HJB (GHJB) equation, a linear partial differential equation. The sequence of solutions converges to the solution of the HJB equation in the limit. Extensions of the methodology to solve robust optimal control problems can be found in Beard and McLain.18 Other approaches exploit the fact that the optimal control at any time is dependent on the local gradient of the cost-to-go. Hence a local approximation procedure is also possible to approximate the cost function in the neighborhood of the current state.19 Park and Tsiotras20,21 use wavelet basis functions to solve the GHJB equation using the iterative Galerkin procedure as well as the collocation approach.

As just discussed, much of the work has been focused on the infinite-time regulation problem, but the fixed final-time problem with nonlinear terminal constraints has not received its due attention. The primary reason is the need to deal with the hard terminal constraints and the associated computation of the constraint Lagrange multiplier. A special case of the hard terminal constraint problem is one where the final states are specified a priori or can be solved for uniquely from the constraint equation. For such problems, in a series of papers, Guibout and Scheeres22 and Park and Scheeres23,24 proposed an alternate approach based on canonical transformations and Taylor series expansions of the Hamiltonian and generating functions. They have demonstrated a computational technique to solve in closed form a class of two-point boundary-value problems arising in the context of impulsive orbit-transfer problems, formation flying, as well as optimal control problems. In this approach, the state and costate vectors at a particular time instant are viewed as the result of a canonical transformation from the initial state and costate vectors. Different types of generating functions can be chosen depending on the class of problems (boundary conditions) being solved. A generating function of one type can be converted into that of another type via the Legendre transformation. Reference 24 contains applications of the HJB theory to the minimum-time control problem of the double integrator as well as a singular control problem for a linear system with a quadratic cost function. Explicit nonlinear terminal constraints have not been accounted for in Refs. 22–24.

The sweep method25 was developed for solving finite-time LQ problems with linear terminal constraints. The method is based on two steps. As presented in Ref. 25, the first step expresses the costate vector at any time instant as a linear combination of the current state vector and the constraint Lagrange multiplier vector. The second step in the process requires that the terminal constraint also be expressed as a linear combination of the same form, as already mentioned. A symmetry property, involving the gains of the two linear combinations, is obtained after the derivation process for the differential equations governing the evolution of the gain matrices is completed. Similarity between the sweep method and dynamic programming for a class of LQ problems is shown in Bryson.26 An alternate derivation using the generating function approach for solving the HJB equation is also presented in Park and Scheeres23 for the solution to the LQ problem. There, the terminal constraint is a given final state vector, and a constraint Lagrange multiplier vector is not required. The HJB approach is more direct and natural than the sweep method.

In this paper, the dynamic-programming approach, based on power-series expansion of the cost function, is extended to deal with finite-time feedback control of nonlinear systems with nonlinear terminal constraints. In so doing, a compact derivation of the results of the sweep method for LQ problems is obtained as a special case. The power-series solution methodology is used, but, unlike in the works just discussed, explicit use is made of the constraint Lagrange multipliers. The methodology is illustrated first on a problem that has an analytical solution. Subsequently, a two-dimensional problem with a nonlinear terminal constraint is treated. Finally, an example of optimal detumbling of a spacecraft is presented.

Statement of the Problem

The main objective in this study is the determination of a nonlinear feedback control law to minimize a general Bolza performance index (cost) defined as

J = φ[x(t_f)] + ∫_{t_0}^{t_f} L(x, u, τ) dτ    (1)

for a nonlinear dynamical system:

ẋ = f(x, u, t)    (2)

where f: ℝ^{n+m+1} → ℝ^n is a smooth, analytic, vector-valued function with x ∈ ℝ^n and u ∈ ℝ^m. The initial time t_0, the final time t_f, and the initial condition on the state vector x_0 are assumed to be given. It is also assumed that f(0, 0, t) = 0 and the control vector u is unconstrained. The terminal constraint is given by ψ[x(t_f)] − ψ_f = 0, with ψ, ψ_f ∈ ℝ^p and p ≤ n, where ψ_f, the value of the constraint, is defined to satisfy the condition ψ(0) = ψ_f. Equation (1), when written with the current time as the lower limit of integration, represents the cost-to-go function, and it is termed the value function in the dynamic-programming literature. It is also assumed that the value function and the controls are smooth with respect to the states, Lagrange multipliers, and time.

Derivation of the Important Results

Herein, the variational approach to the derivation of the necessary conditions is briefly outlined first, and then the results are related to those obtained from dynamic programming.

The terminal constraint and Eq. (2) are augmented to the cost function, and the augmented cost function is written as

J_a = φ[x(t_f)] + ν^T {ψ[x(t_f)] − ψ_f} + ∫_{t_0}^{t_f} {L(x, u, t) + λ^T [f(x, u, t) − ẋ]} dt    (3)

where ν is a vector of constant Lagrange multipliers that enforce the terminal constraints and λ is the costate vector. The necessary conditions for optimality can be derived by using the Hamiltonian formalism, which begins with the definition of the Hamiltonian:

H = L(x, u, t) + λ^T f(x, u, t)    (4)

The first variation of the augmented cost function25 is written as

δJ_a = [∂φ/∂x(t_f) + (∂ψ/∂x(t_f))^T ν − λ(t_f)]^T δx(t_f) + {ψ[x(t_f)] − ψ_f}^T dν + ∫_{t_0}^{t_f} [(∂H/∂x + λ̇)^T δx + (∂H/∂λ − ẋ)^T δλ + (∂H/∂u)^T δu] dt    (5)

First-order necessary conditions that must be satisfied by the optimal state and control are obtained by requiring that the first variation δJ_a in Eq. (5) vanish for admissible variations in x, u, λ, and ν. The necessary conditions for extremizing the augmented performance index are25

ẋ = ∂H/∂λ,    λ̇ = −∂H/∂x    (6)

u∗(t) = arg min_u (H)    (7)

where u∗(t) is the optimal control. The transversality condition is

λ(t_f) = ∂φ[x(t_f)]/∂x(t_f) + {∂ψ[x(t_f)]/∂x(t_f)}^T ν    (8)

A further necessary condition is that the terminal constraint must be satisfied.

Dynamic-programming formalism also provides necessary and sufficient conditions for the preceding minimization problem in the form of the HJB equation and the associated boundary conditions. Explicit feedback control laws can be obtained from the solutions to the HJB equation, a partial differential equation for the optimal value function (cost-to-go) J∗, as shown next25:

∂J∗/∂t = −H[x(t), u∗(t), ∂J∗/∂x, t]    (9)

with the boundary condition

J∗[x(t_f), t_f] = φ[x(t_f)];    x(t_f) such that ψ[x(t_f)] = ψ_f    (10)

The key result that connects dynamic programming and the variational approach is the relationship between the costate vector and the gradient of the value function25,26:

λ∗(t) = ∂J∗[x(t), t]/∂x(t)    (11)

Indeed, using the Lagrange multiplier rule, Eq. (10) can also be written as

J∗[x(t_f), t_f] = φ[x(t_f)] + ν^T {ψ[x(t_f)] − ψ_f}    (12)

and the partial derivative of the preceding expression with respect to x(t_f) results in the boundary condition of Eq. (8). A detailed treatment of the boundary conditions can be found in Dreyfus.27

Another key sensitivity result can be arrived at by considering the effect of the differential in the constraint Lagrange multiplier vector, dν, on the cost, in the neighborhood of the optimal solution. It can be observed from Eq. (5) that, if all of the necessary conditions are satisfied, then the gradient of the augmented cost function with respect to ν, along the optimal trajectory, is

∂J_a/∂ν = ∂J[x(t_0), t_0]/∂ν + {ψ[x(t_f)] − ψ_f} = 0    (13)

Because the terminal conditions are also satisfied as part of the necessary conditions and because the initial time is arbitrary, the preceding result can also be stated as

∂J∗[x(t), t]/∂ν = 0    (14)

The partial derivative of Eq. (12) with respect to ν shows that the preceding condition is also satisfied at the final time:

∂J∗[x(t_f), t_f]/∂ν = {ψ[x(t_f)] − ψ_f} = 0    (15)

Equation (14) is the main result that will be exploited in this paper. This important result has not been dealt with in any of the standard references found in the literature. It is also important to point out that, unlike the solution obtained from a two-point boundary-value problem, a solution that satisfies the HJB equation, if it exists, satisfies both the necessary and sufficient conditions for a minimum. In the next section, the dynamic-programming approach is applied to the fixed-horizon, terminally constrained LQ problem. Interested readers can compare the derivation process with that presented in Bryson and Ho.25

Alternative Derivation of the LQ Terminal Controller

Consider the following LQ problem with known initial and final times:

Minimize:

J = (1/2) ∫_{t_0}^{t_f} (x^T Qx + u^T Ru) dt    (16)

Subject to:

ẋ = Ax + Bu    (17)

and a linear terminal constraint

Cx(t_f) = ψ_f    (18)

The weight matrix Q is assumed to be symmetric and positive semidefinite. The control-weighting matrix R is assumed to be symmetric and positive definite. Controllability of the pair (A, B) and observability of the pair (A, √Q) are also assumed. When the number of constraints is equal to the number of states and the matrix C in Eq. (18) is invertible, the final state can be explicitly solved for. For cases where the number of constraints is less than the dimension of the state vector, the final state is determined as a result of the optimization process.

The Hamiltonian for the problem is

H = (1/2)[x^T Qx + u^T Ru] + λ^T [Ax + Bu]    (19)

The application of the optimality condition results in the following relationship:

u = −R^{−1} B^T λ    (20)

Substitution of Eq. (20) into Eq. (19) results in the following form of the Hamiltonian:

H = (1/2)[x^T Qx − λ^T B R^{−1} B^T λ] + λ^T Ax    (21)

The transversality condition is written as

λ(t_f) = ∂J∗[x(t_f), t_f]/∂x(t_f) = C^T ν    (22)

The next step for obtaining a feedback control law is an expansion of the cost-to-go. Ideally, it is desirable to express the cost function in terms of the current and final states. However, this is not possible directly, because an expansion of this type does not lead to explicit terminal boundary conditions for the gains. This situation can be identified with the lack of identity transformations for certain types of generating functions in the canonical transformation approach.23

Because the Hamiltonian is quadratic in x and λ and Eq. (22) suggests a linear dependence of λ on ν, the following quadratic form is written for J∗[x(t), t]:

J∗[x(t), t] = (1/2) x(t)^T S(t) x(t) + ν^T K(t) x(t) + (1/2) ν^T P(t) ν − ν^T ψ_f    (23)

Equation (23) is a modified version of a similar form given in Bryson.26 The addition of the last term in Eq. (23) is motivated by its presence in Eq. (3). The matrices S, K, and P in the preceding equation are time-varying gain matrices of appropriate dimensions. Differential equations and the appropriate boundary conditions for the gains have to be determined. Note that S and P can be assumed to be symmetric. The following results are obtained by utilizing Eqs. (11) and (14) for the LQ problem:

λ = ∂J∗/∂x = Sx + K^T ν    (24)

∂J∗/∂ν = 0 = Kx + Pν − ψ_f    (25)

The boundary conditions for the preceding gains can be obtained by inspecting Eqs. (18) and (22), as given next:

S(t_f) = 0    (26)

K(t_f) = C    (27)

P(t_f) = 0    (28)

Note also that these boundary conditions result in the following intuitive result for the cost at the final time:

J∗[x(t_f), t_f] = 0    (29)


The HJB equation can be written as

∂J∗/∂t = −(1/2)[x^T Qx − (∂J∗/∂x)^T B R^{−1} B^T (∂J∗/∂x)] − (1/2)(∂J∗/∂x)^T Ax − (1/2) x^T A^T (∂J∗/∂x)    (30)

The differential equations for the gains are obtained by utilizing Eq. (23) in Eq. (30) and collecting the coefficients of the linear and quadratic terms involving x and ν. These equations are

Ṡ = −Q − SA − A^T S + S B R^{−1} B^T S    (31)

K̇ = −K(A − B R^{−1} B^T S)    (32)

Ṗ = K B R^{−1} B^T K^T    (33)

The optimal control law can be obtained from Eq. (20), via Eq. (24). However, in order to implement the control, the constant ν must be calculated from Eq. (25), at any time other than the final time, as follows:

ν = −P(t)^{−1}[K(t) x(t) − ψ_f]    (34)

Sufficient conditions for a minimum, based on the existence of solutions to Eqs. (31–33), can be found in Bryson and Ho.25 Significance of the normality condition P(t) < 0 for t_0 ≤ t < t_f can be seen from the evaluation of the value function by replacing ψ_f in Eq. (23) by using Eq. (25). The result is

J∗[x(t), t] = (1/2) x(t)^T S(t) x(t) − (1/2) ν^T P(t) ν    (35)

Because S is required to be the positive-definite solution to the Riccati equation, the normality condition ensures that the value function remains positive definite, as it should be.

For an open-loop solution to the problem, the initial costate is all that is required. The initial costate can be computed with the knowledge of the initial values of the gains and ν. In the absence of numerical errors, ν is a constant, irrespective of the time t at which it is computed by using Eq. (34). The optimal control problem can be solved by forward integration of the state and costate differential equations, once the initial costate vector is determined from Eq. (24). However, a feedback control implementation requires that the gains be stored and the control be implemented using the current state and time. A singularity exists in the feedback control implementation at the final time, where ν cannot be calculated from Eq. (34). A practical fix to this problem is to stop the updating of ν during the "blind time."26

The dynamic-programming approach for the derivation of the feedback control law is more direct than that of the sweep method, which uses the state and costate differential equations.25 Furthermore, the computational burden for determining the optimal control is reduced (considerably more so for nonlinear problems, because of the exploitation of symmetry of the gain tensors) by implementing the HJB methodology. The method just presented is especially attractive when the dimension of ν is much smaller than that of x, because the number of gains is reduced. In the next section, the procedure just described for the LQ problem is extended to the more general case of a nonlinear system with a nonlinear terminal constraint.
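To make the preceding recipe concrete, the following minimal sketch (in Python with NumPy/SciPy, an assumed tool choice for this illustration, not one used in the paper) integrates Eqs. (31–33) backward from the terminal conditions (26–28) and then closes the loop through Eqs. (24), (20), and (34), for a hypothetical double-integrator example with a final-position constraint:

```python
# Sketch: LQ terminal controller of Eqs. (26-28) and (31-34) for a
# hypothetical double integrator with final-position constraint x1(tf) = 1.
import numpy as np
from scipy.integrate import solve_ivp

n, p = 2, 1                                   # state and constraint dimensions
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(n)
Rinv = np.array([[1.0]])                      # R = 1
C = np.array([[1.0, 0.0]])                    # C x(tf) = psi_f, Eq. (18)
psi_f = np.array([1.0])
t0, tf = 0.0, 5.0
G = B @ Rinv @ B.T

def unpack(z):
    S = z[:n*n].reshape(n, n)
    K = z[n*n:n*n + p*n].reshape(p, n)
    P = z[n*n + p*n:].reshape(p, p)
    return S, K, P

def gain_rates(t, z):
    S, K, P = unpack(z)
    Sdot = -Q - S @ A - A.T @ S + S @ G @ S   # Eq. (31)
    Kdot = -K @ (A - G @ S)                   # Eq. (32)
    Pdot = K @ G @ K.T                        # Eq. (33)
    return np.concatenate([Sdot.ravel(), Kdot.ravel(), Pdot.ravel()])

# Backward integration from the terminal conditions (26)-(28)
z_tf = np.concatenate([np.zeros(n*n), C.ravel(), np.zeros(p*p)])
gains = solve_ivp(gain_rates, (tf, t0), z_tf, dense_output=True, rtol=1e-10)

def closed_loop(t, x):
    S, K, P = unpack(gains.sol(t))
    nu = -np.linalg.solve(P, K @ x - psi_f)   # Eq. (34)
    lam = S @ x + K.T @ nu                    # Eq. (24)
    u = -(Rinv @ B.T @ lam)                   # Eq. (20)
    return A @ x + B @ u

# Stop slightly before tf: P(t) -> 0 there, making Eq. (34) singular
# (the "blind time" noted above).
sim = solve_ivp(closed_loop, (t0, tf - 0.05), np.array([0.0, 0.0]), rtol=1e-8)
print('x1 near tf:', sim.y[0, -1])            # should approach psi_f = 1
```

Stopping the forward integration slightly before t_f mirrors the blind-time fix discussed above.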

Series Solution Methodology for the Nonlinear Problem

Series solution methods are attractive for weakly nonlinear systems that are analytic. Convergence of the series solution is not guaranteed for highly nonlinear systems. Herein, it is assumed that the states are nondimensionalized such that the series solution is valid. Alternatively, the nonlinear perturbation dynamics close to the nominal optimal trajectory can be considered, in order to justify the use of the series solution. Required assumptions regarding smoothness of the value function and existence of solutions are also made.

Because the Hamiltonian is a function involving the states and costates, the cost-to-go function is expanded in a polynomial series involving x and ν as shown here:

J∗[x(t), t] = S1_{ij}(t) x_i x_j + S2_{ijk}(t) x_i x_j x_k + ··· + K1_{pj}(t) ν_p x_j + K2_{pqj}(t) ν_p ν_q x_j + K3_{pij}(t) ν_p x_i x_j + ··· − (ψ_f)_p ν_p + C1_{pq}(t) ν_p ν_q + C2_{pqr}(t) ν_p ν_q ν_r + ··· + h.o.t.

i, j, k, l, … = 1, 2, 3, …, n;    p, q, r, … = 1, 2, 3, …, p ≤ n    (36)

where S_i(t), K_i(t), C_i(t), …, i = 1, 2, 3, …, are gain tensors (expressed using indicial notation) having time-dependent elements. This polynomial expansion of the cost-to-go is at the heart of the method presented in this paper.

Differential equations for the gain elements can be obtained by substituting Eq. (36) into the HJB equation [Eq. (9)] and collecting the coefficients of like-powered terms involving x and ν. For example, it can be seen that S1(t) satisfies the familiar Riccati equation. Terminal boundary conditions on the gain variables can be postulated by using Eqs. (8) and (15). Once the gain elements are obtained via backward integration, the costate vector can be determined from Eq. (11), and ν can be determined from Eq. (14) by using vector reversion of series.28

Generation of the equations for the gain elements is a tedious process, but it can be simplified by the use of symbolic manipulation programs like Mathematica® or Maple. Structured approaches to the series solution methodology have been presented in Refs. 6 and 7. It is convenient to combine x and ν into a single extended vector for generating the gain equations. The equations required in this work were generated by using the software package Maple. A small symbolic sketch of this coefficient-matching step follows.
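As an illustration of the coefficient-matching step, the sketch below uses sympy in place of Maple (an assumed substitution). It is written for the scalar HJB equation J∗_t = (1/2)(J∗_x)² treated in the next section, with the cubic truncation of Eq. (40):

```python
# Sketch: deriving the gain ODEs by coefficient matching, using sympy in
# place of the Maple workflow described above. Shown for the scalar HJB
# equation J*_t = (1/2) J*_x^2 with the cubic truncation of Eq. (40).
import sympy as sp

t, x, nu, psi_f = sp.symbols('t x nu psi_f')
S1, S2, K1, K2, K3, C1, C2 = [sp.Function(name)(t)
                              for name in ('S1', 'S2', 'K1', 'K2', 'K3', 'C1', 'C2')]

# Truncated expansion of the cost-to-go, Eq. (40)
J = (S1*x**2 + S2*x**3 + K1*nu*x + K2*nu**2*x + K3*nu*x**2
     - psi_f*nu + C1*nu**2 + C2*nu**3)

# Residual of the HJB equation (38); each retained monomial coefficient
# must vanish, which yields the gain ODEs of Eq. (43).
residual = sp.expand(sp.diff(J, t) - sp.Rational(1, 2)*sp.diff(J, x)**2)

for i in range(4):
    for j in range(4):
        if 0 < i + j <= 3:                    # monomials kept in Eq. (40)
            c = residual.coeff(x, i).coeff(nu, j)
            if c != 0:
                print(f'x^{i} nu^{j}:', sp.Eq(c, 0))
```

Monomials above the truncation order are discarded, which is exactly the truncation implied by Eq. (36).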

Example with a Closed-Form Solution

The following one-dimensional example is considered to illustrate the application of the methodology:

Minimize:

J = (1/2) ∫_0^{t_f} u² dt

Subject to:

ẋ = u,    ψ = x(t_f) + εx(t_f)² = ψ_f

x(0), ψ_f, and t_f given    (37)

where ε is a small parameter. It is assumed that the constraint is regular, that is, at least one real solution exists for x(t_f). This problem is quite simple if the constraint is solved for x(t_f) a priori. Without an explicit solution for the final state, assuming that two real roots exist, it is desirable for the algorithm to choose the final condition that involves the least cost.

The HJB equation for the example problem is

∂J∗[x(t), t]/∂t = (1/2)(J∗_x)²    (38)

subject to

J∗_x[x(t_f), t_f] = λ(t_f) = νψ_x[x(t_f)] = ν[1 + 2εx(t_f)]    (39)

The cost-to-go is expanded using a power series as follows:

J∗[x(t), t] = S1x² + S2x³ + ··· + K1νx + K2ν²x + K3νx² + ··· − ψ_f ν + C1ν² + C2ν³ + ···    (40)

Thus the expansions for the costate and the constraint are

λ = 2S1x + 3S2x² + ··· + K1ν + K2ν² + 2K3νx + ···    (41)


and

K1x + 2K2νx + K3x² + ··· + 2C1ν + 3C2ν² + ··· = ψ_f    (42)

Equation (42) can be used to solve for ν, once the gains have been computed. The following system of differential equations for the gains is obtained by substituting Eq. (40) into Eq. (38) and utilizing Eq. (39):

Ṡ1 = 2S1²,    S1(t_f) = 0;    Ṡ2 = 6S1S2,    S2(t_f) = 0

K̇1 = 2S1K1,    K1(t_f) = 1;    K̇2 = 2S1K2 + 2K1K3,    K2(t_f) = 0

K̇3 = 3S2K1 + 4S1K3,    K3(t_f) = ε

Ċ1 = (1/2)K1²,    C1(t_f) = 0;    Ċ2 = K1K2,    C2(t_f) = 0    (43)
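As a quick numerical cross-check of Eq. (43), the sketch below integrates the gain equations backward from their terminal conditions and compares two of the gains against closed forms derived by hand here (S1 = S2 = 0 and K1 = 1 follow from the zero terminal conditions, so K3 = ε, K2 = 2ε(t − t_f), C1 = (t − t_f)/2, and C2 = ε(t − t_f)², consistent with the coefficients appearing in Eq. (44) below):

```python
# Sketch: backward integration of Eq. (43) from its terminal conditions and
# a spot check against the hand-derived closed forms listed above.
import numpy as np
from scipy.integrate import solve_ivp

eps, tf = 0.5, 5.0

def rates(t, g):
    S1, S2, K1, K2, K3, C1, C2 = g
    return [2*S1**2, 6*S1*S2, 2*S1*K1, 2*S1*K2 + 2*K1*K3,
            3*S2*K1 + 4*S1*K3, 0.5*K1**2, K1*K2]

g_tf = [0.0, 0.0, 1.0, 0.0, eps, 0.0, 0.0]    # terminal conditions, Eq. (43)
sol = solve_ivp(rates, (tf, 0.0), g_tf, dense_output=True, rtol=1e-10)

t = 1.0
S1, S2, K1, K2, K3, C1, C2 = sol.sol(t)
assert abs(K2 - 2*eps*(t - tf)) < 1e-6        # K2 = 2*eps*(t - tf)
assert abs(C2 - eps*(t - tf)**2) < 1e-6       # C2 = eps*(t - tf)^2
print(K1, K2, K3, C1, C2)                     # 1.0, -4.0, 0.5, -2.0, 8.0
```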

The preceding differential equations can be solved analytically, and the following is the series representation for the control obtained from Eq. (41):

λ = −u = [ν + 2ε(t − t_f)ν² + 4ε²(t − t_f)²ν³ + 8ε³(t − t_f)³ν⁴ + ···] ψ_x[x(t)]    (44)

The preceding series is a binomial expansion, and it is recognized that, as more terms in the series are taken, the resulting control law takes the form

u = −ν[1/(1 − 2ε(t − t_f)ν)] ψ_x = −ν[1/(1 − 2ε(t − t_f)ν)](1 + 2εx)    (45)

Hence, the series in Eq. (44) converges if |2ε(t − t_f)ν| < 1. Equation (42) can be reduced to the form shown next by using the solutions to Eq. (43):

ν(t − t_f) + 3ε(t − t_f)²ν² + 8ε²(t − t_f)³ν³ + 20ε³(t − t_f)⁴ν⁴ + ··· = −{ψ[x(t)] − ψ_f}/ψ_x²[x(t)]    (46)

which can also be written compactly as

ν(t − t_f) {[1 − ε(t − t_f)ν]/[1 − 2ε(t − t_f)ν]²} = −{ψ[x(t)] − ψ_f}/ψ_x²[x(t)]    (47)

On further inspection, the preceding result simplifies to

[1 − 2ε(t − t_f)ν]² = (1 + 2εx)²/(1 + 4εψ_f),    ε ≠ 0

ν = −(x − ψ_f)/(t − t_f),    ε = 0    (48)

Hence, ν can be computed from Eq. (46) by using series reversion, or directly from Eq. (48). The process of series reversion yields a unique solution for ν:

ν = F − 3ε(t − t_f)F² + 10ε²(t − t_f)²F³ − 35ε³(t − t_f)³F⁴ + 126ε⁴(t − t_f)⁴F⁵ − 462ε⁵(t − t_f)⁵F⁶ + ···    (49)

where

F = −[1/(t − t_f)] {ψ[x(t)] − ψ_f}/ψ_x²[x(t)]

Fig. 1 Optimal value function: comparison of the results of the series and exact solutions.

On the other hand, the proper solution to Eq. (48) must be selected to obtain the optimal value function. Theoretically, ν is a constant, but its evaluation from Eq. (46) shows variation with respect to time because of truncation of the series. Note that if ψ_x = 0 initially, then ν cannot be computed from Eq. (49). This special case corresponds to the existence of a point from which two equally optimal solutions exist (Darboux point29). The series solution, in the form just presented, is not valid for this special initial condition, because the requirement for convergence of the series is violated. When posed as a fixed final-state problem, the difficulty with the special initial condition just mentioned disappears. However, one must then solve the problem for multiple possible final states and choose the minimum-cost solution (not a practical method of solution for more difficult problems). The control gain for either method approaches infinity as the final time is reached. The advantage of the series solution method is that it automatically chooses the optimal control in the neighborhood of the terminal constraint, when the convergence criterion is satisfied and the required number of terms is included in the series expansion.

For completeness, the exact value function for the preceding problem is given next:

J∗[x(t), t] = ν {[ψ + (ν/2)(t − t_f)]/[1 − 2ε(t − t_f)ν] − ψ_f}    (50)

where ν can be evaluated from Eq. (48). This result is used to validate the series solution method.

Consider an example with ε = 0.5, t_f = 5 s, and ψ_f = 0.5. Figure 1 shows the exact, bimodal cost function for this problem, evaluated at the initial time t = t_0 = 0. Results of the series solution procedure for four different initial conditions are superimposed on this figure. These results were obtained by integrating the closed-loop system with the feedback control law given by Eq. (44), with ν evaluated from Eq. (49). Six terms were used in the expansions in each of the preceding equations. As shown in Fig. 1, values of the cost function are reproduced accurately at the selected points. The six-term series solution does not converge when the initial state is in the range (−2, 0).
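The following sketch reproduces this type of closed-loop run for one initial condition (x(0) = 1, an illustrative choice inside the convergent region), using six terms of Eq. (44) for the control and six terms of Eq. (49) for ν:

```python
# Sketch: closed-loop run of the scalar example with the six-term series
# control of Eq. (44) and nu from the six-term reversion of Eq. (49).
import numpy as np
from scipy.integrate import solve_ivp

eps, tf, psi_f = 0.5, 5.0, 0.5
psi = lambda x: x + eps*x**2                  # terminal constraint function
psix = lambda x: 1.0 + 2.0*eps*x              # psi_x

def nu_series(t, x):
    d = t - tf
    F = -(psi(x) - psi_f) / (d * psix(x)**2)  # F of Eq. (49)
    coeffs = [1.0, -3.0, 10.0, -35.0, 126.0, -462.0]
    return sum(c * (eps*d)**k * F**(k + 1) for k, c in enumerate(coeffs))

def u_series(t, x):
    d = t - tf
    nu = nu_series(t, x)
    # Six terms of the binomial series in Eq. (44): lambda = sum_k (2 eps d nu)^k nu psi_x
    lam = sum((2.0*eps*d*nu)**k for k in range(6)) * nu * psix(x)
    return -lam                               # u = -lambda

sol = solve_ivp(lambda t, y: [u_series(t, y[0])], (0.0, tf - 1e-3), [1.0],
                rtol=1e-8)
print('psi(x) near tf:', psi(sol.y[0, -1]), '(target 0.5)')
```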

Other Examples

The following two-dimensional example is presented to further illustrate the application of the series solution methodology. Unlike the preceding example, which has an analytical solution for the gain differential equations, the gain equations for the examples presented in this section were integrated backward in time to obtain their initial values. In the actual implementation of the control law, the gain equations were integrated forward in time, along with the state equations, using the known initial conditions. Although the computational burden increases with such an implementation, storage of the time-varying gains is avoided. A different implementation is utilized in the next section. In general, especially when integrating over large time intervals, some form of a symplectic integrator should be utilized, as suggested in Ref. 22. In all of the examples presented next, the value function was expanded to fourth order, and the series reversion process to determine ν was also implemented to fourth order. Furthermore, because ν cannot be determined at the final time, the simulation was stopped just prior to reaching the final time. For each example, the corresponding open-loop optimal solution, obtained by using a shooting method, is also presented.

Equations (51), (52), and (53), respectively, show the dynamical model, constraint, and performance index:

ẍ = −(m1 x + m2 x² + m3 x³) + Bu    (51)

ψ = [ε1 x(t_f) + ε2 ẋ(t_f) + ε3 x(t_f)² + ε4 ẋ(t_f)² + ε5 x(t_f) ẋ(t_f)] = ψ_f    (52)

J = ∫_{t_0}^{t_f} (Q11 x² + Q22 ẋ² + R u²) dt    (53)

Two cases are considered next by choosing different sets of parameters for the model.

Case A: Highly Nonlinear System and Nonlinear Terminal Constraint

This example is parameterized by the following choices for the constants in Eqs. (51–53):

m1 = 1,    m2 = 0.8,    m3 = 0.75,    ε1 = 0.9,    ε2 = 0.6,    ε3 = 0.6,    ε4 = 0.3,    ε5 = 0.1

Q11 = Q22 = B = R = 1,    t0 = 0,    t_f = 3

x(t0) = [0.1; 0.2],    ψ_f = 0.5

Figure 2a shows the open-loop and feedback control histories for the nominal and perturbed initial conditions. The feedback control histories track their open-loop counterparts closely. Deviations between the open-loop and feedback control histories are evident because of the approximations introduced by the series expansion and the series reversion process. The results for the perturbed initial conditions were obtained by utilizing forward integration only, because the initial conditions of the gains were known from the computations for the nominal problem. The error in satisfying the terminal constraint is approximately 4.8 × 10−5 for a 10% change in the initial state from its nominal value. The simulation is stopped slightly before reaching the final time of 3 s because of the singularity at the final time. Figure 2b shows the open-loop and feedback phase portraits for various initial conditions. Even though there is some deviation in midcourse, the terminal constraint is satisfied quite accurately. Figure 2c shows the convergence of the first state toward its optimal solution as more terms in the series are included. The third-order series solution for the control is seen to be much better than the lower-order control approximations.

Fig. 2a Case A: open-loop and feedback control histories.

Fig. 2b Case A: phase portraits with various I.C. (Nominal initial conditions are perturbed within the range [−10, 10]%.)

Fig. 2c Case A: convergence of the state trajectory as a function of the order of approximation.

Fig. 2d Multiple open-loop extremals for the same state I.C. but with different initial costate guesses.
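The open-loop extremals of Figs. 2a–2d come from a shooting method; the sketch below sets up such a shooting solve for Case A, under the reconstruction of Eqs. (51–53) used here (a second-order plant in x and ẋ) and with an illustrative, not the authors', initial costate guess:

```python
# Sketch: open-loop extremal for Case A of Eqs. (51)-(53) by simple shooting.
# Assumes the reconstruction x'' = -(m1 x + m2 x^2 + m3 x^3) + B u; the
# initial guess for [lambda1(0), lambda2(0), nu] is illustrative only, and
# different guesses can converge to different extremals, as noted below.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

m1, m2, m3 = 1.0, 0.8, 0.75
e1, e2, e3, e4, e5 = 0.9, 0.6, 0.6, 0.3, 0.1
Q11, Q22, B, R = 1.0, 1.0, 1.0, 1.0
t0, tf, psi_f = 0.0, 3.0, 0.5
x0 = np.array([0.1, 0.2])

def psi(x):                                   # Eq. (52)
    return e1*x[0] + e2*x[1] + e3*x[0]**2 + e4*x[1]**2 + e5*x[0]*x[1]

def psi_x(x):                                 # gradient of Eq. (52)
    return np.array([e1 + 2*e3*x[0] + e5*x[1],
                     e2 + 2*e4*x[1] + e5*x[0]])

def ham_rates(t, s):                          # state-costate system, Eq. (6)
    x1, x2, l1, l2 = s
    u = -B*l2/(2.0*R)                         # dH/du = 0 for L = ... + R u^2
    return [x2,
            -(m1*x1 + m2*x1**2 + m3*x1**3) + B*u,
            -2.0*Q11*x1 + l2*(m1 + 2*m2*x1 + 3*m3*x1**2),
            -2.0*Q22*x2 - l1]

def residuals(z):
    lam0, nu = z[:2], z[2]
    sol = solve_ivp(ham_rates, (t0, tf), np.concatenate([x0, lam0]), rtol=1e-10)
    xf, lamf = sol.y[:2, -1], sol.y[2:, -1]
    return np.concatenate([lamf - nu*psi_x(xf),   # transversality, Eq. (8)
                           [psi(xf) - psi_f]])    # terminal constraint

z = fsolve(residuals, np.array([0.1, 0.1, 0.1]))
print('lambda(0) =', z[:2], ' nu =', z[2], ' residual =', residuals(z))
```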

Another interesting observation is the existence of multiple open-loop solutions obtained for the same initial conditions, depending on the initial guesses for the costates. Because the shooting method utilizes necessary conditions only, the result cannot be guaranteed to be a local minimum unless tested further using second-order conditions. Figure 2d shows three trajectories from the same initial conditions for the states, each with a different cost function. However, each feedback-controlled trajectory, for all of the initial conditions, is associated with the lowest-cost solution for the respective case.

Fig. 3 Case B: phase portraits for the feedback and open-loop solutions for different state I.C.

Case B: Unstable Nonlinear Plant with a Nonlinear Terminal Constraint

Parameters for this example are given here:

m1 = −1,    m2 = 0.8,    m3 = 0.75,    ε1 = 0.9,    ε2 = 0.6,    ε3 = 0.6,    ε4 = 0.3,    ε5 = 0.1

Q11 = Q22 = B = R = 1,    t0 = 0,    t_f = 3

x(t0) = [0.1; 0.2],    ψ_f = 0.5

The linear part of this system is unstable for this example. Figure 3 shows the phase portraits of the optimal trajectories, both closed loop and open loop. As can be seen, all of the feedback trajectories and some of the open-loop trajectories, for neighboring initial conditions, lie close to each other. Some of the open-loop trajectories converge to higher-cost solutions, depending on the initial guesses for the costate values. As mentioned before, the same initial guess was used to obtain the solutions for all of the initial conditions.

Application to Optimal Detumbling of a Spacecraft

The series solution methodology is finally applied to the problem of optimal detumbling maneuvers of a rigid asymmetric spacecraft. The Euler equations for a rigid body are

ω̇ = I^{−1}[−ω̃Iω + u]    (54)

where ω = (ω1, ω2, ω3) ∈ ℝ³ and u = (u1, u2, u3) ∈ ℝ³ are the angular velocity and input control torque vectors, respectively. The moment of inertia matrix I ∈ ℝ^{3×3} is represented in the principal axes system as

I = diag(I1, I2, I3)    (55)

The angular velocity cross-product operator is represented by ω̃. The principal moments of inertia for this example are given here30:

I1 = 86.24 kg·m²,    I2 = 85.07 kg·m²,    I3 = 113.59 kg·m²

The performance index is selected to be of the form

J = (1/2) ∫_0^T (ω^T Qω + u1² + u2² + u3²) dt    (56)

where Q is a positive-definite weight matrix.

The solution presented was developed based upon an expansion of the cost-to-go to fourth order and a second-order solution for the series reversion.28 Series reversion to higher order was not carried out because of excessive computational burden. Series reversion to second order was deemed sufficient for this problem after inspecting the results. All of the time-dependent gains were stored at discrete points of time during backward integration, and the stored gains were utilized to calculate the feedback control and the trajectory during the forward integration process. A cubic-spline-interpolation technique available in MATLAB® was used to determine gains at intermediate time points between two stored data points. Results obtained for a specific numerical example are presented next.
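The storage-and-interpolation pattern just described can be sketched as follows. The gain history used here is a deliberately hypothetical placeholder (one scalar gain per axis), standing in for the paper's fourth-order gain tensors, so the sketch only demonstrates the spline-based lookup mechanics around the Euler equations (54), with SciPy's CubicSpline standing in for MATLAB's spline:

```python
# Sketch of the storage-and-interpolation pattern: gains sampled on the
# backward pass are splined and looked up during the forward pass. The gain
# history g(t) is a hypothetical placeholder, NOT the paper's gain tensors.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import CubicSpline

I = np.diag([86.24, 85.07, 113.59])           # principal inertias, kg*m^2
I_inv = np.linalg.inv(I)
T = 2.0

# "Backward pass": sample a (placeholder) gain vector at discrete times.
ts = np.linspace(0.0, T, 201)
g_samples = np.exp(-(T - ts))[:, None] * np.ones((ts.size, 3))  # hypothetical
spline = CubicSpline(ts, g_samples)           # breakpoints must be increasing

def euler_eqs(t, w):
    g = spline(t)                             # interpolated gains at time t
    u = -g * w                                # placeholder feedback torque
    return I_inv @ (u - np.cross(w, I @ w))   # Euler equations, Eq. (54)

w0 = np.array([-0.4, 0.8, 2.0])               # initial tumble rates, rad/s
sol = solve_ivp(euler_eqs, (0.0, T), w0, rtol=1e-8)
print('omega(T) =', sol.y[:, -1])
```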

In the example considered, the final time was selected to be 2 s, and the weight matrix in the performance index was selected as the moment of inertia matrix, that is, Q = I. The initial angular velocities selected are as shown next:

ω1(0) = −0.4 rad/s, ω2(0) = 0.8 rad/s, ω3(0) = 2 rad/s

This example is not very realistic because of the high torque requirements, but it is used to demonstrate the accuracy of our methodology. Figure 4a shows the trajectories obtained by using the feedback solution, and for comparison the open-loop solutions are also presented on the same graph. Again, there is a very close match between the two solutions. Figure 4b shows the control torques, and slight deviations between the respective closed-loop and open-loop profiles are noticed. These deviations can be attributed to the very high initial angular velocities and the short maneuver time. Even though the differences between the respective control histories are noticeable, the open-loop and feedback-controlled state trajectories are almost identical. As an additional check, the Lagrange multiplier ν, computed along the trajectory, is plotted in Fig. 4c. Ideally, this vector should be constant, but it is not, because of series truncation. As mentioned earlier, very near the final time, the Lagrange multipliers exhibit sharp changes. Hence the forward integration process was stopped just before reaching the final time.

Fig. 4a Angular velocities for the spin maneuver.

Fig. 4b Feedback and open-loop control inputs.

Fig. 4c Lagrange multipliers associated with the terminal constraints.

Conclusions

This paper presents a series solution method for determining finite-horizon optimal feedback control laws for nonlinear systems with nonlinear terminal constraints. Expansion of the value function as a polynomial series, in terms of the states and the constraint Lagrange multipliers, is utilized for the solution of the Hamilton–Jacobi–Bellman equation. The method is applicable to analytic systems and constraints. This method is more direct and is accomplished in a single step, rather than in two steps as in the sweep method. The resulting optimal control law performs exceedingly well, as demonstrated on several numerical examples, including the optimal spacecraft detumbling problem. Many practical guidance problems require solutions to fixed final-time, terminally constrained nonlinear optimal control problems. There might be significant advantages to neighboring optimal control methods based on nonlinear realizations of the perturbation dynamics.

Acknowledgments

The authors wish to thank James D. Turner for his help with the implementation of the vector series reversion algorithm.

References

1. Debs, A. S., and Athans, M., "On the Optimal Angular Velocity Control of Asymmetrical Space Vehicles," IEEE Transactions on Automatic Control, Vol. 14, No. 1, 1969, pp. 80–83.
2. Al'brekht, E. G., "On the Optimal Stabilization of Nonlinear Systems," Journal of Applied Mathematics and Mechanics, Vol. 25, No. 5, 1962, pp. 1254–1266.
3. Lukes, D. L., "Optimal Regulation of Nonlinear Dynamical Systems," SIAM Journal of Control, Vol. 7, No. 1, 1969, pp. 75–100.
4. Dabbous, T. E., and Ahmed, N. U., "Nonlinear Optimal Feedback Regulation of Satellite Angular Momenta," IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-18, No. 1, 1982, pp. 2–10.
5. Carrington, C. K., and Junkins, J. L., "Optimal Nonlinear Feedback Control for Spacecraft Attitude Maneuvers," Journal of Guidance, Control, and Dynamics, Vol. 9, No. 1, 1986, pp. 99–107.
6. Yoshida, T., and Loparo, K. A., "Quadratic Regulatory Theory for Analytic Nonlinear Systems with Additive Controls," Automatica, Vol. 25, No. 4, 1989, pp. 531–544.
7. Huang, J., and Lin, C. F., "Numerical Approach to Computing Nonlinear H-Infinity Control Laws," Journal of Guidance, Control, and Dynamics, Vol. 18, No. 5, 1995, pp. 989–994.
8. Tsiotras, P., Corless, M., and Rotea, M., "An L2 Disturbance Attenuation Solution to the Nonlinear Benchmark Problem," International Journal of Robust and Nonlinear Control, Vol. 8, 1998, pp. 311–330.
9. Garrard, W. L., McClamroch, N. H., and Clark, L. G., "An Approach to Suboptimal Feedback Control of Nonlinear Systems," International Journal of Control, Vol. 5, No. 5, 1967, pp. 425–435.
10. Cloutier, J. R., "State-Dependent Riccati Equation Techniques," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 1997, pp. 932–936.
11. Tewari, A., "Optimal Nonlinear Spacecraft Attitude Control Through Hamilton–Jacobi Formulation," Journal of the Astronautical Sciences, Vol. 50, No. 1, 2002, pp. 99–112.
12. Sharma, R., and Tewari, A., "Optimal Nonlinear Tracking of Spacecraft Attitude Maneuvers," IEEE Transactions on Control Systems Technology, Vol. 12, No. 5, 2004, pp. 677–682.
13. Cimen, T., and Banks, S. P., "Global Optimal Feedback Control for General Nonlinear Systems with Nonquadratic Performance Criteria," Systems and Control Letters, Vol. 53, No. 5, 2004, pp. 327–424.
14. Xin, M., Balakrishnan, S. N., Stansbery, D. T., and Ohlmeyer, E. J., "Nonlinear Missile Autopilot Design with θ − D Technique," Journal of Guidance, Control, and Dynamics, Vol. 27, No. 3, 2004, pp. 406–417.
15. Vadali, S. R., and Junkins, J. L., "Optimal Open Loop and Stable Feedback Control of Rigid Spacecraft Attitude Maneuvers," Journal of the Astronautical Sciences, Vol. 32, No. 2, 1984, pp. 105–122.
16. Krstic, M., and Tsiotras, P., "Inverse Optimal Stabilization of a Rigid Spacecraft," IEEE Transactions on Automatic Control, Vol. 44, No. 5, 1999, pp. 1042–1048.
17. Beard, R., Saridis, G., and Wen, J., "Galerkin Approximation of the Generalized Hamilton–Jacobi–Bellman Equation," Automatica, Vol. 33, No. 12, 1997, pp. 2159–2177.
18. Beard, R., and McLain, T., "Successive Galerkin Approximation Techniques for Nonlinear Optimal and Robust Control," International Journal of Control, Vol. 71, No. 5, 1998, pp. 717–748.
19. Curtis, J. W., and Beard, R. W., "Successive Collocation: An Approximation to Optimal Nonlinear Control," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 2001, pp. 3481–3485.
20. Park, C., and Tsiotras, P., "Sub-Optimal Feedback Control Using a Successive Wavelet-Galerkin Algorithm," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 2003, pp. 1926–1931.
21. Park, C., and Tsiotras, P., "Approximations to Optimal Feedback Control Using a Successive Wavelet Collocation Algorithm," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 2003, pp. 1950–1955.
22. Guibout, V. M., and Scheeres, D. J., "Solving Relative Two-Point Boundary Value Problems: Spacecraft Formation Flight Transfer Applications," Journal of Guidance, Control, and Dynamics, Vol. 27, No. 4, 2004, pp. 693–704.
23. Park, C., and Scheeres, D. J., "Solutions of Optimal Feedback Control Problems with General Boundary Conditions Using Hamiltonian Dynamics and Generating Functions," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 2004, pp. 679–684.
24. Park, C., and Scheeres, D. J., "Extended Applications of Generating Functions to Optimal Feedback Control Problems," Proceedings of the American Control Conference, American Automatic Control Council, Dayton, OH, June 2005, pp. 852–857.
25. Bryson, A. E., and Ho, Y.-C., Applied Optimal Control, Hemisphere, Washington, DC, 1975, pp. 217–236.
26. Bryson, A. E., Dynamic Optimization, Addison Wesley Longman, Menlo Park, CA, 1999, pp. 262–325.
27. Dreyfus, S., Dynamic Programming and the Calculus of Variations, Academic Press, New York, 1965.
28. Turner, J. D., "Generalized Gradient Search and Newton's Method for Multilinear Algebra Root Solving and Optimization Applications," Proceedings of the John L. Junkins Astrodynamics Symposium, Advances in the Astronautical Sciences, Vol. 115, Univelt, San Diego, CA, 2003, pp. 55–78.
29. Mereau, P. M., and Powers, W. F., "The Darboux Point," Journal of Optimization Theory and Applications, Vol. 17, Nos. 5/6, 1975, pp. 545–559.
30. Junkins, J. L., and Turner, J. D., Optimal Spacecraft Rotational Maneuvers, Elsevier, New York, 1986, pp. 243–244.
