
Model Reference Adaptive Control using

Nonparametric Adaptive Elements

Girish Chowdhary∗, Jonathan How†

Massachusetts Institute of Technology, Cambridge, MA, 02139

and

Hassan Kingravi‡

Georgia Institute of Technology, Atlanta, GA, 30332

Most current model reference adaptive control methods rely on parametric adaptive elements, in which the number of parameters of the adaptive element is fixed a priori, often through expert judgment. Examples of such adaptive elements are the commonly used Radial Basis Function Neural Networks (RBF-NN) with centers allocated a priori based on the expected operating domain. If the system operates outside of the expected operating domain, such adaptive elements can become ineffective, rendering the adaptive controller only semi-global in nature. This paper investigates two classes of nonparametric adaptive elements, that is, adaptive elements whose number of parameters grows in response to data. These include RBF adaptive elements with centers that are allocated dynamically as the system evolves using a kernel linear independence test, and Gaussian Process based adaptive elements, which generalize the notion of the Gaussian distribution to function approximation. We show that these nonparametric adaptive elements result in good closed-loop performance without requiring any prior knowledge about the domain of the uncertainty. These results indicate that the use of such nonparametric adaptive elements can improve the global stability properties of adaptive controllers.

I. Introduction

Model Reference Adaptive Control (MRAC) has been widely used for control of nonlinear systems in the presence of significant modeling uncertainties1–7. Several authors have explored adaptive flight vehicle control (see e.g. [8–11]). In several such adaptive control problems very little is known about the system uncertainty. A popular method for handling system uncertainties about which limited information is available is to employ universal approximators such as Neural Networks (NNs). Since they were first proposed as adaptive elements by Sanner and Slotine12, Radial Basis Function (RBF)-NNs have emerged as perhaps the most widely used universal-approximator adaptive elements13–21. Lewis et al. derived Single Hidden Layer (SHL) NN based adaptive laws15,22, which were further generalized by other authors10,11,23,24. RBF-NNs have maintained their popularity partly because of their linear-in-the-parameters structure. However, the accuracy of a representation using RBFs is heavily dependent on the location of the RBF centers. Several adaptive control approaches using RBF-NN rely on distributing the RBF centers over an estimated domain

∗Postdoctoral associate, Lab. for Info. and Decision Systems, MIT, [email protected]
†Richard C. Maclaurin Professor of Aeronautics and Astronautics, MIT. Associate Fellow AIAA, [email protected]
‡Graduate Research Assistant, Georgia Institute of Technology, [email protected]

1 of 24

American Institute of Aeronautics and Astronautics


of system operation. Since the output of an RBF diminishes exponentially fast away from its center, this restricts RBF-NN based neuroadaptive controllers to being only locally valid. In particular, if external commands drive the system outside of the estimated domain of operation, the RBF-NN can become ineffective. This renders the adaptive controller only semi-global in nature. To counter this, authors have also explored RBF center tuning laws that attempt to move the centers to minimize instantaneous tracking errors (see e.g. [21,25,26]). However, tracking error based update laws are often rank-1, which means that if the centers were all initialized at zero, they would move together in the same direction. Furthermore, little insight is available about the convergence properties of centers that are tuned using the instantaneous tracking error. Others have proposed heuristic techniques for pruning “irrelevant” nodes; however, these techniques typically assume that the domain of operation of the adaptive controller remains bounded12,27. Yet another open problem in using RBF-NNs is that it is not clear how many RBFs should be chosen for a particular adaptive control problem, and expert judgment is often employed. In this paper, we approach the problem by appealing to the theory of Reproducing Kernel Hilbert Spaces (RKHS)28,29 and the theory of Bayesian nonparametric probabilistic models.

In the machine learning literature, authors have been exploring nonparametric models for regression problems for which little prior information is available. The idea behind nonparametric models and associated regression methods is that the number of parameters is not fixed a priori; rather, the parameters grow in response to data. Leveraging the powerful framework of RKHS, authors have developed kernel filtering methods that allocate radial basis functions based on example input-output pairs using RBF kernels30,31. Gaussian Processes are another example of Bayesian nonparametric models for regression, in which the unknown function is assumed to be drawn from a normal distribution with a mean and covariance matrix that is updated using Bayesian inference30,32,33. These methods hold great promise for adaptive control, as they can overcome the limitations of fixed-parameter RBF-NNs. There does not, however, seem to be much work on utilizing nonparametric kernel methods and Bayesian nonparametric models in direct MRAC for lifting the assumption of a bounded domain of operation.

In order to bring kernel methods and adaptive control together, we introduced the Budgeted Kernel Restructuring-Concurrent Learning adaptive control algorithm34. In BKR-CL, RBF centers are allocated along the trajectory of the system. The key contribution of that work was to show that kernel based nonparametric models can be implemented in an MRAC framework on a budget. In particular, we showed that when using RBF-NN nonparametric models in a direct adaptive control framework, one can limit the maximum number of RBFs allowed, but keep updating a dictionary of centers in response to the data. In this paper we explore this concept further and extend the theory of BKR-CL MRAC to non-square systems (in which the number of inputs is less than the number of states). We also explore Gaussian Process (GP) based regression models in the framework of MRAC, leading to the GP-MRAC algorithm. Murray-Smith et al. have explored Gaussian Processes in the context of dual adaptive control35,36; the main difference here is that we use GPs in the MRAC framework, which is a direct adaptive control framework and requires that (budgeted) GP inference be performed online. The introduction of GPs as models of uncertainty allows great flexibility in the types of uncertainties that can be handled. In particular, traditional MRAC treats the uncertainty as a smooth deterministic function. In GP-MRAC, on the other hand, it is assumed that the uncertainty is modeled by a GP, that is, it is completely specified by a mean and a covariance function. This probabilistic model of uncertainty is motivated by the notion that a smooth deterministic function representation of the uncertainty may not always be valid, particularly in the presence of noise, servo chattering, wind-related disturbances, and other stochastic effects. Corresponding to the uncertainty, the adaptive element is also modeled as a GP, the output of which is a draw from a normal distribution whose posterior mean and covariance are updated using Bayesian inference32,37.
An intentionally simple and online-implementable algorithm for


maintaining an online dictionary of data points for posterior inference on a budget is also developed for GP-MRAC.
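To make the GP idea concrete, the following is a minimal sketch of batch GP posterior inference with an RBF kernel. The function names (`rbf_kernel`, `gp_posterior`) and the noise and width values are illustrative assumptions, not the paper's budgeted online algorithm; they only show how a posterior mean and variance are conditioned on stored input-output pairs.

```python
import numpy as np

def rbf_kernel(X, Y, mu=1.0):
    # Pairwise Gaussian RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 mu^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * mu ** 2))

def gp_posterior(X_train, y_train, X_query, mu=1.0, noise=1e-2):
    # Standard GP regression: posterior mean and variance at the query points,
    # conditioned on the stored (input, output) pairs.
    K = rbf_kernel(X_train, X_train, mu) + noise * np.eye(len(X_train))
    k_star = rbf_kernel(X_train, X_query, mu)
    alpha = np.linalg.solve(K, y_train)
    mean = k_star.T @ alpha
    v = np.linalg.solve(K, k_star)
    var = rbf_kernel(X_query, X_query, mu).diagonal() - (k_star * v).sum(0)
    return mean, var
```

Near the stored data the posterior variance is small; far from all data the mean reverts to the prior and the variance grows back to the prior variance, which is precisely the uncertainty information a deterministic RBF-NN does not provide.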

Section II provides an introduction to approximate model inversion based MRAC and explores further the limitations of RBF-NN adaptive elements with preallocated centers. In Section III we describe BKR-CL adaptive controllers. In Section IV GP-MRAC adaptive controllers are introduced. The performance of both BKR-CL and GP-MRAC adaptive controllers is evaluated using numerical simulations. The paper is concluded in Section V.

II. Approximate Model Inversion based Model Reference Adaptive Control

AMI-MRAC is an MRAC method that allows the design of adaptive controllers for a general class of nonlinear plants for which an approximate inversion model exists. This method of adaptive control is flexible, and has been widely used for flight vehicle control.8,25,38 Let x(t) = [x1(t)^T, x2(t)^T]^T ∈ R^n be the known state vector, with x1(t) ∈ R^n2 and x2(t) ∈ R^n2, let δ(t) ∈ R^l denote the control input, and consider the following multiple-input nonlinear uncertain dynamical system

ẋ1(t) = x2(t), (1)

ẋ2(t) = f(x(t), δ(t)),

where the function f, with f(0, 0) = 0, is assumed to be unknown and globally Lipschitz. For the purpose of this paper it is assumed that n2 ≥ l. This assumption can be restrictive for overactuated systems; in this case, it may be relaxed by using matrix inverse and pseudoinverse approaches, constrained control allocation, pseudocontrols, or daisy chaining (see e.g. [39–41]) to reduce the dimension of the control input vector. It is further assumed that δ(t) is limited to the class of admissible control inputs consisting of measurable functions and that x(t) is available for full state feedback. These conditions ensure the existence and uniqueness of the solution to (1). Furthermore, a condition on the controllability of f with respect to δ must also be assumed.

In AMI-MRAC a pseudo-control input (desired acceleration) ν(t) ∈ R^n2 is designed that can be used to find the control input δ such that the system states track the output of a reference model. Let z = (x, δ). If the exact system model f(z) in (1) were available and invertible, then for a given ν(t), δ(t) could be found by inverting the system dynamics. However, since the exact system model is usually not available or not invertible, we let ν be the output of an approximate inversion model f̂ which satisfies the following assumption

Assumption 1. The approximate inversion model ν = f̂(x, δ) is continuous with respect to δ ∈ R^l, and ∂f̂(x, δ)/∂δ is continuous for every (x, δ) ∈ R^n × R^l.

Assumption 1 guarantees the existence of an approximate inversion operator f̂^−1 : R^(n+n2) → R^l such that for every pair (x, ν) ∈ R^(n+n2) a unique control command δ ∈ R^l can be assigned42

δ = f̂^−1(x, ν). (2)

The use of an approximate inversion model results in a model error of the form

ẋ2 = ν + ∆(x, δ), (3)

where ∆ is the modeling error given by:

∆(z) = f(z) − f̂(z). (4)
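The relationship between the pseudo-control, the approximate inversion, and the model error can be illustrated with a toy scalar plant. The dynamics `f_true` and the trivially invertible approximate model are hypothetical choices made only for this sketch; the paper does not prescribe a particular plant.

```python
import numpy as np

# Hypothetical scalar plant: x2_dot = f(x, delta) = delta + 0.1*sin(x1).
# The controller only knows the approximate model f_hat(x, delta) = delta.
def f_true(x, delta):
    return delta + 0.1 * np.sin(x[0])

def f_hat_inverse(x, nu):
    # Since f_hat(x, delta) = delta, the approximate inversion (2) is simply delta = nu.
    return nu

def model_error(x, delta):
    # Delta(z) = f(z) - f_hat(z), equation (4).
    return f_true(x, delta) - delta

x = np.array([np.pi / 2, 0.0])
nu = 1.5                      # pseudo-control (desired acceleration)
delta = f_hat_inverse(x, nu)  # control command from the inversion model
# The achieved acceleration is nu + Delta, matching equation (3).
assert np.isclose(f_true(x, delta), nu + model_error(x, delta))
```

The adaptive element's job in the sequel is to estimate and cancel exactly this residual ∆.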


Note that if the control assignment function (the mapping from control inputs to states) were known and invertible with respect to δ, then an inversion model could be chosen such that the modeling error is only a function of the state x. A reference model can be designed that characterizes the desired response of the system

ẋ1rm = x2rm, (5)

ẋ2rm = frm(xrm, r),

where frm(xrm(t), r(t)) denotes the reference model dynamics, which are assumed to be continuously differentiable in xrm for all xrm ∈ Dx ⊂ R^n. The command r(t) is assumed to be bounded and piecewise continuous; furthermore, frm is assumed to be such that xrm is bounded for a bounded reference input.

The pseudo-control input ν consists of a linear feedback part νpd = [K1, K2]e with K1 ∈ R^(n2×n2) and K2 ∈ R^(n2×n2), a linear feedforward part νrm = ẋ2rm, and an adaptive part νad(z), and is chosen to have the following form8,11,38

ν = νrm + νpd − νad. (6)

Defining the tracking error e(t) = xrm(t) − x(t) and using (3), the tracking error dynamics can be written as

ė = ẋrm − [x2; ν + ∆]. (7)

Letting A = [0 I1; −K1 −K2] and B = [0, I2]^T, where 0 ∈ R^(n2×n2), I1 ∈ R^(n2×n2), and I2 ∈ R^(n2×n2) are the zero and identity matrices, and using (6), we have the following tracking error dynamics that contain a term linear in e:

ė = Ae + B[νad(z) − ∆(z)]. (8)

The baseline full state feedback controller νpd is chosen to make A Hurwitz. Hence for any positive definite matrix Q ∈ R^(n×n), a positive definite solution P ∈ R^(n×n) exists to the Lyapunov equation

0 = A^T P + PA + Q. (9)
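Equation (9) can be solved numerically by vectorizing the unknown P. The sketch below uses only the Kronecker-product identity for column-stacked matrices; the function name `solve_lyapunov` and the example gains are illustrative assumptions.

```python
import numpy as np

def solve_lyapunov(A, Q):
    # Solve A^T P + P A + Q = 0 for P by column-stacking:
    # vec(A^T P) = (I kron A^T) vec(P) and vec(P A) = (A^T kron I) vec(P).
    n = A.shape[0]
    I = np.eye(n)
    M = np.kron(I, A.T) + np.kron(A.T, I)
    p = np.linalg.solve(M, -Q.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off
```

For a Hurwitz A and positive definite Q the returned P is positive definite, as required by the stability analysis that follows.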

Note that with the formulation presented here, the existence of a fixed point solution to νad = ∆(·, νad) needs to be assumed18; sufficient conditions for its existence are available42.

A. Limitations of Radial Basis Function Neural Networks with Preallocated Centers

Upon examining the tracking error dynamics equation (8), we see that since A is designed to be Hurwitz and the equation contains a term linear in e, if νad(z) − ∆(z) remains bounded, the tracking error will also remain bounded. In order to achieve this, the output of the adaptive element νad is adjusted so as to dominate or cancel the uncertainty ∆. If sufficient prior knowledge of the uncertainty is available, it may be possible to represent the uncertainty as an unknown linear combination of known nonlinear basis functions. For this widely studied case, the adaptive element can be chosen as a parameterized combination of the known basis functions, with the parameters updated using an adaptive law. Global convergence of the tracking error dynamics to zero can be established if the parameters are updated based on the instantaneous tracking error (see e.g. [1,3,4]). Furthermore, stronger results of exponential stability of the zero solution of the closed loop tracking error and the parameter error dynamics are available for the case of structured uncertainty.43,44 The focus of this paper is on the more general class of uncertainties for which the


basis functions for the uncertainty are not known due to limited prior knowledge. Traditionally, this case has been handled by using universal function approximators as adaptive elements. One of the most commonly used approximators is the Gaussian Radial Basis Function Neural Network (RBF-NN) (see e.g. [12,15–20]).

When RBF-NNs are used as adaptive elements, the adaptive part of the control law (6) can be represented using a linear combination of RBF kernels:

νad(z) = W^T σ(z), (10)

where W ∈ R^(q×n2) and σ(z) = [1, σ2(z), σ3(z), . . . , σq(z)]^T is a q-dimensional vector of radial basis functions. For i = 2, 3, . . . , q, let ci denote the RBF centroid and µi the RBF width; then for each i the radial basis functions are given as

σi(x) = exp(−‖x − ci‖² / 2µi²). (11)

Appealing to the universal approximation property of Radial Basis Function Neural Networks45, we have that, given a fixed number of radial basis functions q, there exist ideal parameters W* ∈ R^(q×n2) and a vector ε ∈ R^n2 such that the following approximation holds for all z ∈ D ⊂ R^(n+l), where D is compact:

∆(z) = W*^T σ(z) + ε(z). (12)

Furthermore, ε̄ = sup_{z∈D} ‖ε(z)‖ can be made arbitrarily small by increasing the number of radial basis functions distributed over D.

Typical implementations of adaptive controllers using RBF-NNs in the literature rely on estimating the operating domain D of the system and preallocating the centers ci of a predetermined number of RBFs over that domain. Figure 1 shows a one dimensional RBF centered at 1 and with a width of 2. From the figure we see that the function decays exponentially as we move away from the center. Therefore, if z(t) evolves far away from the centers ci, the elements of σ(z) will be very close to zero. Increasing the width of the RBF helps by flattening the function; however, the fact remains that the system states must stay close to the locations of the RBF centers if the RBF-NN is to be effective. This means that in order for the adaptive element to be effective and any stability results to hold, it must be assumed that the uncertainty ∆(z) evolves only over the compact domain over which the centers have been distributed. This indicates that the adaptive element will be effective only within D, which is a major limitation of RBF-NN based adaptive controllers with preallocated centers.
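The exponential decay argument above is easy to verify numerically for the RBF of Figure 1 (center 1, width µ = 2); the two evaluation points are chosen only for illustration.

```python
import numpy as np

# The RBF of Figure 1: center 1, width mu = 2.
rbf = lambda x: np.exp(-(x - 1.0) ** 2 / (2.0 * 2.0 ** 2))

print(rbf(1.0))   # 1.0 at the center
print(rbf(11.0))  # ~3.7e-6 ten units away: the adaptive element is effectively blind here
```

Ten units from the center the output is already five orders of magnitude below its peak, which is why a preallocated network contributes essentially nothing outside D.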

When RBF-NNs are used as adaptive elements, it is well known that the following adaptive law with the σ-modification term guarantees uniform ultimate boundedness of the tracking error and the adaptive parameters2

Ẇ = −ΓW (σ(x)e^T PB + κW). (13)

In the above adaptive law, ΓW denotes a positive definite learning rate matrix and κ denotes the gain of the σ-modification term. Note that when RBF-NNs are used as adaptive elements, one cannot in general guarantee that the tracking error will asymptotically approach zero, due to the presence of the nonzero approximation error ε(z) in (12). Furthermore, note that this adaptive law does not in general guarantee that the adaptive parameters W(t) approach and stay bounded within a compact neighborhood of the ideal parameters W*. A condition on Persistency of Excitation (PE) of the system states is often required to guarantee the convergence of the adaptive parameters to their ideal values4,15,46.
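A minimal discrete-time sketch of the σ-modification law is given below. The dimensions, gains, and the explicit-Euler discretization are illustrative assumptions; the sign convention follows the leakage interpretation of the σ-modification, in which the κ term damps W toward zero.

```python
import numpy as np

def sigma_mod_step(W, sigma, e, P, B, Gamma, kappa, dt):
    # One explicit-Euler step of the sigma-modification adaptive law:
    # a gradient-like term driven by the tracking error, plus a leakage
    # term kappa*W that keeps the weights bounded without PE.
    W_dot = -Gamma @ (np.outer(sigma, e @ P @ B) + kappa * W)
    return W + dt * W_dot
```

With zero tracking error the update reduces to pure leakage, so the weight norm shrinks monotonically; this is exactly the mechanism that buys boundedness at the cost of a bias away from the ideal weights W*.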

Kingravi et al. showed that the location of the centers also affects the amount of excitation that is “visible” to the adaptive law (13)47. The results in that paper showed that even if x(t) is exciting,


Figure 1. An RBF with width µ = 2 centered at 1.

if the system evolves away from where the centers are, σ(x(t)) need not be exciting. Therefore, preallocating RBF centers can result in severe limitations on adaptive control.

In summary, two major limitations of RBF-NN with preallocated centers can be identified:

Limitation 1 The RBF-NN can be ineffective outside of the estimated operating domain D over which the centers have been preallocated. This makes the adaptive controller only locally effective.

Limitation 2 It is difficult to guarantee the convergence of the parameters to their ideal values when adaptive laws such as (13) are used. The problem is further exacerbated by the fact that the amount of excitation “visible” to the adaptive element may diminish if the system evolves outside of the estimated operating domain over which the centers have been preallocated.

In this paper, these limitations are addressed by introducing nonparametric models as adaptiveelements.

B. Nonparametric Models in Regression

In regression, nonparametric models refer to a class of regression models in which the number of parameters is not fixed a priori. Rather, the number of parameters grows with the data. Nonparametric models are designed to overcome the local approximation properties of popular universal approximators, including RBFs. Gaussian Process (GP) based models are an example of a well studied nonparametric model in regression (see e.g. [37]), particularly in the field of geographical surveying, where it is often known as kriging. In recent years there has been great interest in developing nonparametric models by exploiting the theory of Reproducing Kernel Hilbert Spaces (RKHS).48 Regression methods employing such models are often known as kernel filtering methods30. Several similarities between GP regression methods and kernel filtering methods are becoming apparent through an RKHS interpretation of GP regression32,37. The main differences come


from the underlying philosophy of modeling the function to be learned. In particular, in GPs, functions are viewed as draws from a Gaussian process completely specified by a mean and a covariance function, while in kernel based methods, they are viewed as linear combinations of an ever growing basis.

The main benefit of using these nonparametric adaptive elements is that little to no prior knowledge of the operating domain of the uncertainty is required, and the centers need not be preallocated. Rather, insights from RKHS theory are used to allocate the RBF centers and to grow the number of RBF kernels used in response to how the system evolves online. This is fundamentally different from tuning the location of a fixed number of RBF centers based on the instantaneous tracking error (as is done in [21,25]), particularly since those center update laws are rank-1 and are not guaranteed to move the centers to ensure their relevance. Furthermore, we alleviate Limitation 2 by using insights from concurrent learning adaptive control43 and GP regression32,37 to ensure that the adaptive element best captures the uncertainty. It should be noted that traditionally nonparametric models have been used for offline supervised learning and regression problems. In these situations the data set size is often fixed a priori; therefore, nonparametric regression models as studied in [30,37] are not conducive to ensuring that computations are completed in real-time. In order to ensure real-time feasibility, we enforce an additional restriction that the maximum number of allowable parameters at any instant of time be limited (this number is referred to as the “budget”). Once the budget is reached, new parameters are added by removing older (possibly irrelevant) parameters.

III. Kernel Linear Independence Test based Nonparametric Adaptive Elements

A kernel on D ⊂ R^n is any continuous, symmetric, positive-semidefinite function of the form k : D × D → R. A kernel can be thought of as a measure of how different two points in the state space are. Mercer’s Theorem implies that there exists an RKHS H (of functions) and a mapping ψ : R^n → H such that

k(x, y) = 〈ψ(x), ψ(y)〉H, (14)

where 〈·, ·〉H is an inner product on H. For a given kernel function, the mapping ψ(x) ∈ H does not have to be unique, and is often unknown. However, since ψ is implicit, it does not need to be known for most machine learning algorithms, which exploit the nonlinearity of the mapping to create nonlinear algorithms from linear ones30,49. The key strength of kernels in an RKHS comes from their reproducing property, which states that 〈f(·), k(·, x)〉H = f(x)37,50. Further discussion of kernels can be found in [30,37,47]; for the purpose of this paper, it is sufficient to note that RBFs are examples of bounded reproducing kernels.
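The defining property of a kernel, symmetry together with positive-semidefiniteness of every Gram matrix, is easy to check numerically. The point set below is an arbitrary illustrative choice.

```python
import numpy as np

def k(x, y, mu=1.0):
    # Gaussian RBF kernel, a bounded reproducing kernel.
    return float(np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * mu ** 2)))

# The Gram matrix K_ij = k(x_i, x_j) of any point set is symmetric
# positive-semidefinite -- the property that makes k a valid kernel.
pts = [np.array([0.0]), np.array([0.7]), np.array([2.0])]
K = np.array([[k(a, b) for b in pts] for a in pts])
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-12)
```

This same Gram matrix Kl reappears below as the quantity inverted in the kernel linear independence test.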

Traditionally, when a new data point is encountered, kernel methods place an RBF centered at that point30. If used in online adaptive control settings, this approach would mean that the number of kernels, and hence the number of parameters, would grow unbounded as the system evolves. To counter this, Kingravi et al. proposed the Budgeted Kernel Restructuring – Concurrent Learning (BKR-CL) adaptive control algorithm47. In BKR-CL, the kernel linear independence test used by Nguyen-Tuong et al.51 is used to determine whether the current data point should be added to an online maintained dictionary of RBF centers Cl = {ci}, i = 1, . . . , l, where l is the current size of the dictionary and ND is the upper limit on the number of points (the budget). A new center cl+1 is inserted into the dictionary if it cannot be approximated in the feature space H by the current set of centers. This test is performed using

γ = ‖ ∑_{i=1}^{l} ai ψ(ci) − ψ(cl+1) ‖²H, (15)


Algorithm 1 Kernel Linear Independence Test51

Input: new point cl+1, η, ND.
Compute kl = k(Cl, cl+1) and al = Kl^−1 kl
Compute γ as in Equation (16)
if γ > η then
    if l < ND then
        Update the dictionary by storing the new point cl+1, and recalculate γ for each of the points
    else
        Update the dictionary by deleting the point with the minimal γ, and then recalculate γ for each of the points
    end if
end if

where the ai denote the coefficients of the linear independence. Unraveling the above equation in terms of (14) shows that the coefficients ai can be determined by minimizing γ, which yields the optimal coefficient vector al = Kl^−1 kl, where Kl = k(Cl, Cl) is the kernel matrix (a kernel distance matrix evaluated pairwise between points) for the dictionary dataset Cl, and kl = k(Cl, cl+1) is the kernel vector. Substituting the optimal value al into (15) yields

γ = k(cl+1, cl+1) − kl^T al. (16)

The BKR algorithm is summarized in Algorithm 1. Further details, including efficient ways of implementing this algorithm, are available in [51].
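The test of equations (15)-(16) can be sketched directly; the function names and example centers are illustrative assumptions, and the batch matrix inverse stands in for the more efficient incremental updates described in [51].

```python
import numpy as np

def rbf_k(x, y, mu=1.0):
    return float(np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * mu ** 2)))

def independence_gamma(C, c_new, mu=1.0):
    # Equations (15)-(16): gamma = k(c_new, c_new) - k_l^T K_l^{-1} k_l.
    # gamma ~ 0 means c_new is well approximated by the current centers;
    # Algorithm 1 inserts c_new only when gamma exceeds the threshold eta.
    K = np.array([[rbf_k(a, b, mu) for b in C] for a in C])
    k_l = np.array([rbf_k(c, c_new, mu) for c in C])
    a_l = np.linalg.solve(K, k_l)
    return rbf_k(c_new, c_new, mu) - k_l @ a_l
```

A point coincident with an existing center gives γ = 0 (fully dependent), while a point far from every center gives γ ≈ k(c, c) = 1 (fully novel), which is what drives dictionary growth along the system trajectory.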

In summary, Algorithm 1 provides a method to allocate centers online. In particular, it allows the adaptive controller to begin with one center at zero, and then manage a dictionary of centers as state measurements become available. Note that until the budget is reached, every time a center is added, the number of parameters increases. Once the budget is reached, the algorithm adds new centers by removing old centers. This is fundamentally different from gradient based update laws for RBF centers such as those in [21,25]. Those laws attempt to move the centers to minimize the instantaneous tracking error e and are rank-1. Hence, for these laws, if all the centers are initialized to 0, they will all move together in the same direction.

A. The BKR-CL Algorithm

The BKR-CL algorithm uses BKR (Algorithm 1) to maintain an online dictionary of centers, and uses the Concurrent Learning (CL) algorithm developed by Chowdhary et al.43,52 to ensure that the adaptive weights approach the ideal values as required by the approximation property (see (12)). The combined algorithm guarantees that the dictionary of centers best represents the current operating domain and that the weights approach a compact set around their ideal values.

In CL, online recorded data are used concurrently with instantaneous data to improve weight convergence. Chowdhary et al. have shown that the performance of CL can be related to the minimum singular value of the matrix containing the recorded data, and proposed a singular value maximizing algorithm for recording data online52. In the context of BKR-CL, the data points recorded should also correspond with the data points chosen as RBF centers. In the following, we describe an online data recording algorithm (Algorithm 2) that selects, records, and removes data points to achieve this34. Note that Algorithm 1 picks the centers discretely; therefore, there always exists an interval [tk, tk+1], with k ∈ N, over which the centers are fixed. This discrete update of the centers


introduces switching in the closed loop system. Let σ^k(z) denote the value of σ given by this particular set of centers, and denote by W^{k*} the ideal set of weights for these centers.

Let p ∈ N denote the subscript of the last point stored. For a stored data point zj, let σ_j^k denote σ^k(zj). Let St = [z1, . . . , zp] denote the matrix containing the recorded information in the history stack at time t; then Zt = [σ_1^k, . . . , σ_p^k] is the matrix containing the outputs of the RBF function for the current set of RBF centers. The p-th column of Zt will be denoted by Zt(:, p). It is assumed that the maximum allowable number of recorded data points is limited due to memory or processing power considerations. Therefore, we require that Zt have a maximum of p̄ ∈ N columns; clearly, in order to be able to satisfy rank(Zt) = l, p̄ ≥ l. For the j-th data point, the associated model error ∆(zj) is assumed to be stored in the array ∆(:, j) = ∆(zj). The uncertainty of the system is estimated by estimating ẋ2j for the j-th recorded data point using optimal fixed point smoothing and solving equation (4) to get ∆(zj) = ẋ2j − νj.

The history stack is populated using an algorithm that aims to maximize the minimum singular value of the symmetric matrix Ω = Σ_{j=1}^p σ^k(z_j) σ^{kT}(z_j). Thus any data point linearly independent of the data stored in the history stack is included in the history stack. At the initial time t = 0 and k = 0, the algorithm begins by setting Z_t(:, 1) = σ^0(z(t_0)). The algorithm then selects sufficiently different points for storage; a point is considered sufficiently different if it is linearly independent of the points in the history stack, or if it differs sufficiently from the last point stored in the sense of the Euclidean norm. Furthermore, if Algorithm 1 updates the dictionary by adding a center, then this point is also added to the history stack (if the maximum allowable size of the history stack has not been reached), so that the rank of Z_t is maintained. If the number of stored points exceeds the maximum allowable number, the algorithm seeks to incorporate new data points in a manner that increases the minimum singular value of Z_t; the current data point is added to the history stack only if swapping it with an old point increases the minimum singular value.
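As an illustration, the singular-value-maximizing swap step can be sketched in a few lines. This is a reconstruction under stated assumptions, not the authors' implementation; `try_swap` and its arguments are hypothetical names, and the columns of `Z` are taken to hold the stored σ^k(z_j) vectors.

```python
import numpy as np

def try_swap(Z, sigma_new):
    """Sketch of the swap test: when the history stack is full, replace the
    stored column whose substitution most increases the minimum singular
    value of Z, and only if it strictly increases it."""
    s_old = np.linalg.svd(Z, compute_uv=False).min()
    best_gain, best_j = s_old, None
    for j in range(Z.shape[1]):
        cand = Z.copy()
        cand[:, j] = sigma_new          # tentatively overwrite column j
        s = np.linalg.svd(cand, compute_uv=False).min()
        if s > best_gain:
            best_gain, best_j = s, j
    if best_j is not None:              # commit only an improving swap
        Z = Z.copy()
        Z[:, best_j] = sigma_new
    return Z, best_j
```

For example, swapping an independent vector into a stack with two nearly collinear columns raises the minimum singular value, so the swap is accepted.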

The BKR-CL adaptive control algorithm combines Algorithm 1 (BKR) and the online data recording algorithm (Algorithm 2). It is summarized in Algorithm 3.

Note that in BKR-CL the number and location of RBFs change as the dictionary of centers is updated. Therefore, between switches in the NN approximation, the tracking error dynamics are given by the following switching differential equation:

ė = Ae + B[W^T σ^k(z) − ∆(z)]. (17)

The NN approximation error of (12) for the k-th system can be rewritten as

∆(z) = W^{k∗T} σ^k(z) + ε^k(z). (18)

For the j-th recorded data point, let ε_j(t) = W^T(t) σ(z_j) − ∆(z_j). Also, let p be the number of recorded data points σ(z_j) in the matrix Z = [σ(z_1), . . . , σ(z_p)], such that rank(Z) = l. Then, the weights are updated using the following switching law:

Ẇ = −Γ_W σ^k(z) e^T PB − Γ_W Σ_{j=1}^p σ^k(z_j) ε^{kT}_j. (19)
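As a sketch, a forward-Euler discretization of the switching law (19) could be written as follows. All names are hypothetical: `Gamma` plays the role of Γ_W, `P` and `B` come from the Lyapunov equation and the plant, and `stored_sigmas`/`stored_eps` hold σ^k(z_j) and ε^k_j for the recorded points.

```python
import numpy as np

def weight_update_step(W, sigma_z, e, P, B, stored_sigmas, stored_eps, Gamma, dt):
    """One Euler step of the concurrent-learning update (19):
    instantaneous gradient term plus a sum over recorded data points."""
    inst = sigma_z[:, None] @ (e[None, :] @ P @ B)   # sigma^k(z) e^T P B
    conc = sum(s[:, None] @ eps[None, :]             # sum_j sigma^k(z_j) eps_j^T
               for s, eps in zip(stored_sigmas, stored_eps))
    dW = -Gamma @ (inst + conc)
    return W + dt * dW
```

With an empty history stack the concurrent term vanishes and the step reduces to the instantaneous MRAC gradient update.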

The following theorem was proven in34 for square systems; it is extended here to systems of the form (1). Furthermore, assumptions requiring the system to stay within a bounded domain D are removed.

Theorem 1. Consider the system in (1), the control law of (6), and the output of the adaptive element given by (10) with l denoting the number of RBFs used. At t = 0 assume that l = 1 and


Algorithm 2 Singular Value Maximizing Algorithm for Recording Data Points

Require: p̄ ≥ 1
if ‖σ^k(z(t)) − σ^k_p‖² / ‖σ^k(z(t))‖ ≥ ε, or a new center was added by Algorithm 1 without replacing an old center, then
    p = p + 1
    S_t(:, p) = z(t); store ∆(:, p) = ẋ_2 − ν(t)
else if a new center was added by Algorithm 1 by replacing an old center then
    if the old center was found in S_t then
        overwrite the old center in the history stack with the new center
        set p equal to the location of the replaced data point in the history stack
    end if
end if
Recalculate Z_t = [σ^k_1, . . . , σ^k_p]
if p ≥ p̄ then
    T = Z_t
    S_old = min SVD(Z_t^T)
    for j = 1 to p do
        Z_t(:, j) = σ^k(z(t))
        S(j) = min SVD(Z_t^T)
        Z_t = T
    end for
    find max S and let k denote the corresponding column index
    if max S > S_old then
        S_t(:, k) = z(t); store ∆(:, k) = ẋ_2 − ν(t)
        Z_t(:, k) = σ^k(z(t))
        p = p − 1
    else
        p = p − 1
        Z_t = T
    end if
end if

the dictionary of RBF centers is maintained using Algorithm 1. Furthermore, assume that Algorithm 2 is used to record data online. Then, the switching weight update law in (19) ensures that the tracking error e of (17) and the RBF-NN weight errors W̃^k of (18) are bounded.

Proof. Consider the tracking error dynamics given by (17) and the update law of (19). Since ν_ad = W^T σ^k(z), we have ε_j = W^T σ^k(z_j) − ∆(z_j). The NN approximation error is then given by (18). With W̃^k = W − W^{k∗},

ε^k_j(z) = W^T σ^k(z) − W^{k∗T} σ^k(z) − ε^k(z) = W̃^{kT} σ^k(z) − ε^k(z).

Therefore over [t_k, t_{k+1}], the weight error dynamics are given by the following switching system:

˙W̃^k(t) = −Γ_W ( Σ_{j=1}^p σ^k(z_j) σ^{kT}(z_j) ) W̃^k(t) + Γ_W ( Σ_{j=1}^p σ^k(z_j) ε^{kT}(z_j) − σ^k(z(t)) e^T(t) PB ). (20)


Algorithm 3 The Budgeted Kernel Restructuring - Concurrent Learning (BKR-CL) adaptive control algorithm

while new measurements are available do
    obtain measurement x(t)
    call Algorithm 1 to determine whether or not to add x(t) to the dictionary of centers
    call Algorithm 2 to determine whether or not to add x(t) to the history stack for concurrent learning
    update weights using (19)
    calculate pseudo control ν using (6) and the control input using (2)
end while

Consider the family of positive definite functions V^k = (1/2) e^T P e + (1/2) tr(W̃^{kT} Γ_W^{−1} W̃^k), where tr(·) denotes the trace operator. Note that V^k ∈ C¹, V^k(0) = 0, and V^k(e, W̃^k) > 0 for all (e, W̃^k) ≠ 0, and define Ω^k := Σ_j σ^k_j σ^{kT}_j. Then

˙V^k(e, W̃^k) = −(1/2) e^T Q e + e^T PB ε^k − tr( W̃^{kT} ( Σ_j σ^k_j σ^{kT}_j W̃^k − Σ_j σ^k_j ε^k_j ) )
    ≤ −(1/2) λ_min(Q) ‖e‖² − λ_min(Ω^k) ‖W̃^k‖² + ‖e^T PB ε^k‖ + ‖W̃^{kT} Σ_j σ^k_j ε^k_j‖
    ≤ ‖e‖ ( C^k_1 − (1/2) λ_min(Q) ‖e‖ ) + ‖W̃^k‖ ( C^k_2 − λ_min(Ω^k) ‖W̃^k‖ ),

where C^k_1 = ‖PB‖ ε̄^k and C^k_2 = p ε̄^k √l (with ε̄^k an upper bound on the approximation error ε^k and l the number of RBFs in the network), and Ω^k = Σ_{j=1}^p σ^k(z_j) σ^{kT}(z_j). Note that since Algorithm 2 guarantees that only sufficiently different points are added to the history stack, Micchelli's theorem53 guarantees that Ω^k is positive definite. Hence if ‖e‖ > 2C^k_1/λ_min(Q) and ‖W̃^k‖ > C^k_2/λ_min(Ω^k), we have ˙V^k(e, W̃^k) < 0. Hence the set Π^k = {(e, W̃^k) : ‖e‖ + ‖W̃^k‖ ≤ 2C^k_1/λ_min(Q) + C^k_2/λ_min(Ω^k)} is positively invariant for the k-th system.

Let S = {(t_1, 1), (t_2, 2), . . .} be an arbitrary switching sequence with finitely many switches in finite time (this is always guaranteed due to the discrete nature of Algorithm 1). The sequence denotes that a system S_k was active between t_k and t_{k+1}. Suppose at time t_{k+1} the system switches from S_k to S_{k+1}. Then e(t_k) = e(t_{k+1}) and W̃^{k+1} = W̃^k + ∆W^{k∗}, where ∆W^{k∗} = W^{k∗} − W^{(k+1)∗}. Since ˙V^k(e, W̃^k) is guaranteed to be negative definite outside of a compact set, it follows that e(t_k) and W̃^k(t_k) are guaranteed to be bounded. Therefore, e(t_{k+1}) and W̃^{k+1}(t_{k+1}) are also bounded, and since ˙V^{k+1}(e, W̃^{k+1}) is guaranteed to be negative definite outside of a compact set, e(t) and W̃^{k+1}(t) are also bounded. Furthermore, over every interval [t_k, t_{k+1}], the trajectories approach the positively invariant set Π^{k+1}, or stay bounded within Π^{k+1} if already inside.

Remark 1. Note that BKR-CL, along with the assumption that rank(Z) = l, guarantees that the weights stay bounded in a compact neighborhood of their ideal values without requiring any additional damping terms such as σ modification2 or e modification3 (even in the presence of noise). Due to Micchelli's theorem53, as long as the history stack matrix Z^k contains l different points, rank(Z) = l. Furthermore, since the BKR-CL algorithm also stores the states for which centers are added, it can be shown that the rank condition will be met even in the first few time steps when one begins with no a-priori recorded data points in the history stack.


Remark 2. Note that the result does not assume that the domain of the uncertainty (operating domain) is compact. This assumption is not required because the RBF centers are guaranteed to remain close to where the system is operating due to the nature of BKR-CL (Algorithm 3). Therefore, the result is global in that sense.

B. Application to Trajectory Tracking in Presence of Wingrock Dynamics

Modern highly swept-back or delta wing fighter aircraft are susceptible to lightly damped oscillations in roll angle known as "wing rock". Wing rock often occurs at conditions commonly encountered at landing54, making precision control in the presence of wing rock critical for safe landing. In this section we use concurrent learning control to track a sequence of roll commands in the presence of wing rock dynamics. Let φ denote the roll attitude of an aircraft, p the roll rate, and δ_a the aileron control input; then a model for wing rock dynamics is55

φ̇ = p, (21)
ṗ = L_{δa} δ_a + ∆(x), (22)

where ∆(x) = W∗_0 + W∗_1 φ + W∗_2 p + W∗_3 |φ|p + W∗_4 |p|p + W∗_5 φ³, and L_{δa} = 3. The parameters for wing rock motion are adapted from56: W∗_1 = 0.2314, W∗_2 = 0.6918, W∗_3 = −0.6245, W∗_4 = 0.0095, and W∗_5 = 0.0214. In addition to these parameters, a trim error is introduced by setting W∗_0 = 0.8. The ideal parameter vector W∗ is assumed to be unknown. The chosen inversion model has the form δ_a = (1/L_{δa}) ν. This choice results in the modeling uncertainty of (4) being given by ∆(x).

The adaptive controller uses the control law of (6). The linear gain K of the control law is given by [1.2, 1.2]. A second order reference model with natural frequency of 1 rad/sec and damping ratio of 0.5 is chosen, and the learning rate is set to Γ_W = 1. The simulation uses a time-step of 0.05 sec. The BKR-CL adaptive controller uses Algorithms 1 and 2 and the update law of (19). The adaptive controller without BKR-CL uses the parameter update law of (13) augmented with σ modification2. When BKR-CL was not used, the centers were distributed uniformly between −1 and 1. In both cases, the maximum number of centers was limited to 12.
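For reference, the wingrock model (21)-(22) with the quoted parameter values can be coded directly. The function names are illustrative, not part of the original simulation.

```python
import numpy as np

# Wingrock parameters quoted in the text; W0* = 0.8 models the added trim error.
W_STAR = np.array([0.8, 0.2314, 0.6918, -0.6245, 0.0095, 0.0214])
L_DELTA_A = 3.0

def wingrock_uncertainty(phi, p):
    """Delta(x) = W0* + W1* phi + W2* p + W3* |phi| p + W4* |p| p + W5* phi^3."""
    feats = np.array([1.0, phi, p, abs(phi) * p, abs(p) * p, phi**3])
    return W_STAR @ feats

def wingrock_dynamics(phi, p, delta_a):
    """State derivatives of (21)-(22): phi_dot = p, p_dot = L_da * da + Delta(x)."""
    return p, L_DELTA_A * delta_a + wingrock_uncertainty(phi, p)
```

At the origin the uncertainty reduces to the trim error W∗_0 = 0.8, which is why a purely linear controller carries a steady-state bias there.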

Figure 2 compares the tracking performance of the BKR-CL and the σ modification based controller with uniformly distributed centers. We observe that the BKR-CL controller outperforms the σ modification controller with preallocated centers. This is further illustrated by Figure 3, which depicts the tracking errors. In Figure 4 the difference between the evolution of the weights for BKR-CL and the σ modification controller can be seen. Figure 5 shows that the NN output when using BKR-CL better matches the uncertainty. Furthermore, we see that when the σ modification algorithm with uniformly allocated centers is used, the uncertainty approximation improves once the system starts evolving within the range over which the centers were distributed. Figure 6 shows the trajectory of the system in the phase plane, along with where the centers were placed when BKR-CL was not used. It can be seen that the system initially starts outside of the range over which the centers were allocated. The fact that the output of the adaptive element does not match the uncertainty there highlights the impact of center placement. In contrast, Figure 6 also shows that the BKR-CL algorithm allocates the centers along the path that the system takes in the phase plane. Note that when BKR-CL is used, the centers are all initialized at zero. The fact that the algorithm finds a suitable allocation of the centers indicates that the adaptive element does not suffer from local validity.

IV. Adaptive Control using Gaussian Process Regression

Consider the uncertainty ∆(z) in (4). Traditionally in MRAC, it has been assumed that the uncertainty is a (smooth) deterministic function. Here we offer an alternate view of modeling the



Figure 2. Comparison of system states and the reference model for BKR-CL and σ modification with preallocated centers.

uncertainty ∆(z) as a Gaussian Process (GP). A GP is defined as a collection of random variables, any finite subset of which has a joint Gaussian distribution37, with mean m(z) and covariance k(z, z′). GPs can be thought of as a distribution over functions. This probabilistic model of uncertainty is motivated by the notion that a smooth deterministic function representation of the uncertainty may not always be valid, particularly in the presence of noise, servo chattering, turbulence, and other non-smooth effects. For the sake of clarity of exposition, we will assume that ∆(z) ∈ R; the extension to the multidimensional case is straightforward. Note also that z ∈ R^{n+δ} as before. Hence, the uncertainty ∆(z) here is completely specified by its mean function m(z) = E(∆(z)) and its covariance function k(z, z′) = E[(∆(z) − m(z))(∆(z′) − m(z′))], where z′ is the value that the variable z takes at some time t′. We will use the notation:

∆(z) ∼ GP(m(z), k(z, z′)). (23)

The notion of representing the uncertainty as a probability distribution can be very powerful due to its inherent ability to incorporate process noise. Consequently, it is assumed that the measurements are corrupted with Gaussian white noise ε of variance ω²_n.
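To illustrate the "distribution over functions" view, the following sketch draws sample paths from a zero-mean GP prior with an RBF covariance. The bandwidth `mu`, the jitter term, and the fixed seed are illustrative choices, not values from the paper.

```python
import numpy as np

def rbf_kernel(za, zb, mu=1.0):
    """RBF covariance k(z, z') = exp(-||z - z'||^2 / (2 mu^2)) on 1-D inputs."""
    d2 = (za[:, None] - zb[None, :]) ** 2
    return np.exp(-d2 / (2.0 * mu**2))

def sample_gp_prior(zs, n_draws=3, mu=1.0, jitter=1e-8):
    """Draw sample paths from GP(0, k): any finite set of evaluations is
    jointly Gaussian with covariance K(zs, zs); jitter aids factorization."""
    K = rbf_kernel(zs, zs, mu) + jitter * np.eye(len(zs))
    rng = np.random.default_rng(0)
    return rng.multivariate_normal(np.zeros(len(zs)), K, size=n_draws)
```

Each returned row is one plausible uncertainty function evaluated on the grid `zs`; nearby inputs are strongly correlated through the covariance.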

A. GP Regression

In Bayesian GP regression, we begin by assuming that the data can be modeled using a GP prior and use Bayes law to update the posterior estimate of the probability distribution. For the prior, we assume a zero mean normal distribution. Note that this is not restrictive, since the posterior mean will be updated based on the data. As data become available, the state measurements are stored in a matrix Z, and the noisy output (y = ∆ + ε) is stored in the vector y. Note that both Z and y grow in size as data become available. Let z_{i+1} be the state measurement at measurement instant i + 1, and let Z and y_i contain measurements up to i. Let K(Z, Z) denote the kernel matrix, such



Figure 3. Evolution of tracking error when using BKR-CL and σ modification with preallocated centers. Note that the tracking error performance is better with BKR-CL.


Figure 4. Evolution of adaptive weights when using BKR-CL and σ modification with preallocated centers. Note that for σ modification with preallocated centers, the weights evolve in an oscillatory manner when the system evolves away from the range over which the centers were allocated.



Figure 5. Comparison of the online estimate of the uncertainty by the RBF-NN and the actual uncertainty when using BKR-CL and σ modification with preallocated centers. Note that the estimate of the uncertainty improves when using σ modification with preallocated centers once the system starts evolving within the range over which the centers were allocated. On the other hand, since BKR-CL allocates centers along the way, the adaptation performance is uniformly good.

that K_{l,m} = k(Z_l, Z_m), where Z_l and Z_m are the l-th and m-th columns of Z respectively. Hence, K(Z, Z) ∈ R^{i×i}. Let k ∈ R^i denote the kernel vector corresponding to the (i + 1)-th measurement, such that k_l = k(Z_l, z_{i+1}). Then, the joint distribution of the data available up to i and z_{i+1} under the prior distribution is given as follows:

[ y_i ; y_{i+1} ] ∼ N( 0, [ K(Z, Z) + ω²_n I , k(Z, z_{i+1}) ; k^T(Z, z_{i+1}) , k(z_{i+1}, z_{i+1}) ] ). (24)

The posterior distribution can be obtained by conditioning the joint Gaussian prior distribution on the observations. Therefore, it is possible to predict y_{i+1} given the measurements Z, y_i and z_{i+1} as follows:

y_{i+1} | Z, y_i, z_{i+1} ∼ N( m(z_{i+1}), cov(z_{i+1}) ), (25)

where analytical expressions for the mean and covariance can be derived using Bayes law and properties of Gaussian distributions37:

m(z_{i+1}) = k^T(Z, z_{i+1}) [K(Z, Z) + ω²_n I]^{−1} y_i, (26)

and

cov(z_{i+1}) = k(z_{i+1}, z_{i+1}) − k^T(Z, z_{i+1}) [K(Z, Z) + ω²_n I]^{−1} k(Z, z_{i+1}). (27)

An important insight is obtained here through the representer theorem28,29,37,57, using which it can be shown that the predictive mean function can be represented by a finite combination of kernel functions. That is,

m(z_{i+1}) = α^T k(Z, z_{i+1}), (28)



Figure 6. Plot of the centers allocated by BKR-CL and the uniformly distributed centers for σ modification with preallocated centers. The trajectory of the system in the phase plane is also plotted. It can be seen that the BKR-CL algorithm selects centers along the way. Since the number and location of centers (and hence parameters) is not fixed a priori, BKR-CL is an example of adaptive control using a nonparametric adaptive element (on a budget).

and α = [K(Z, Z) + ω²_n I]^{−1} y_i. This is important because it states that our best guess of the predictive mean can be formed using only the available information, even though the actual mean function need not be restricted to a finite combination of basis functions.
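Equations (26)-(28) translate directly into code. The following sketch (hypothetical names, RBF kernel with an assumed bandwidth `mu` and noise level `omega_n`) computes the posterior mean through the representer weights α of (28).

```python
import numpy as np

def rbf(a, b, mu=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * mu**2))

def posterior(Z, y, z_new, omega_n=0.1, mu=1.0):
    """Posterior mean (26) and covariance (27) at a query point z_new:
    m = k^T [K + omega_n^2 I]^{-1} y,   cov = k(z,z) - k^T [K + omega_n^2 I]^{-1} k,
    where alpha = [K + omega_n^2 I]^{-1} y are the representer weights of (28)."""
    A = np.array([[rbf(zl, zm, mu) for zm in Z] for zl in Z]) \
        + omega_n**2 * np.eye(len(Z))               # K(Z,Z) + omega_n^2 I
    k = np.array([rbf(zl, z_new, mu) for zl in Z])  # kernel vector at z_new
    alpha = np.linalg.solve(A, y)                   # representer weights
    m = k @ alpha
    cov = rbf(z_new, z_new, mu) - k @ np.linalg.solve(A, k)
    return m, cov
```

Querying at a stored input returns a mean close to the corresponding noisy observation and a small but strictly positive predictive variance, reflecting the ω²_n noise term.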

Note that due to Mercer's theorem and the presence of the positive definite matrix ω²_n I, the matrix inversion in (26) and (27) is well defined. Several numerical techniques are available for increasing the accuracy and speed of computing [K(Z, Z) + ω²_n I]^{−1} (see e.g. [37]). However, since both Z and y grow with data, the inversion can become computationally intractable. This is less of a problem for traditional GP regression applications, which involve regression problems with finite learning samples, such as supervised machine learning. In an online setting, however, the finiteness of the samples cannot be guaranteed. Several techniques exist for approximating larger data sets37; however, they are either computationally intensive or more suited for batch applications. Therefore, in order to extend GP regression to MRAC, an online implementable method to restrict the number of data points stored for inference is needed. In the following section we present an intentionally simple, online implementable algorithm for this purpose and incorporate it with MRAC to form GP-MRAC.

B. GP nonparametric model based MRAC

Since the uncertainty is modeled as a GP, it is natural to let the output of the adaptive element also be a draw from a normal distribution with mean m̂(z) and covariance k(z, z′):

ν_ad(z) ∼ N(m̂(z), k(z, z′)). (29)


Algorithm 4 The Gaussian Process - Model Reference Adaptive Control (GP-MRAC) algorithm

while new measurements are available do
    obtain measurement z(t)
    if k(z_i, z(t)) < γ then
        increment i, store Z(:, i) = z(t), store y_i = ẋ̂_2(t) − ν(t)
        calculate K(Z, Z)
    end if
    if maximum number of points to be stored (p_max) reached then
        set i = 1
    end if
    calculate k(Z, z(t))
    predict posterior mean m̂(z(t)) = k^T(Z, z(t)) [K(Z, Z) + ω²_n I]^{−1} y_i
    predict posterior covariance cov(z(t)) = 1 − k^T(Z, z(t)) [K(Z, Z) + ω²_n I]^{−1} k(Z, z(t))
    draw the output of the adaptive element from a normal distribution ν_ad ∼ N(m̂(z(t)), cov(z(t)))
    calculate pseudo control ν using (6) and the control input using (2)
end while

It follows that if m̂(z) is close to m(z), then ν_ad − ∆ must be bounded. Note that this is the best we can hope for, since we have assumed ∆(z) to be a random variable. This lends itself to an adaptive control algorithm in which we form online an estimate of the mean m̂(z) using available measurements and express the covariance using a kernel function, such as the RBF. Using the notation of Section A and using the RBF as the kernel function, we have k(Z, z_{i+1})_l = exp( −‖z_{i+1} − Z_l‖² / 2μ² ). Note that we are not restricted to RBFs, however; any valid kernel function can be used37.

In order to ensure the online feasibility of performing prediction with a GP, it is beneficial to restrict the number of data points that are stored for posterior inference; that is, to restrict the maximum size of Z. A simple way to achieve this is to store only points that are sufficiently different from the last point stored. The covariance function is one measure that allows us to tell how different two points are. Particularly, note that if z_i = z_{i+1} then k(z_i, z_{i+1}) = 1. Noting this, we propose a simple algorithm for performing GP regression using a sparse representation of the data. Particularly, let p_max be the maximum number of points to be stored (the budget), and let z_i be the last point stored; then a measurement z(t) is selected for storage if k(z_i, z(t)) < γ, where 0 < γ < 1. The variable γ is a design variable that can be tuned to determine how "different" two stored points must be. Once the maximum number of points is stored, the next point to be stored replaces the last stored point, and so on. The mean is updated using expressions similar to (26) and (27) (see Algorithm 4).

The Gaussian Process - Model Reference Adaptive Control (GP-MRAC) algorithm is summarized in Algorithm 4. In that algorithm, z_i denotes the last point stored, and Z denotes the matrix containing the recorded data. When a measurement becomes available, its kernel value with the last stored data point is computed as k(z_i, z(t)). If k(z_i, z(t)) < γ then the point is stored for posterior inference. Along with storing z(t), we also need to store an estimate of the model error ∆; this can be achieved by noting that ∆(z) = ẋ_2(t) − ν(t). When measurements of ẋ_2(t) are not explicitly available or are noisy, an estimate ẋ̂_2 can be used. Note that since we can handle noise explicitly when performing posterior inference with GPs, the estimate can be formed using well known linear filtering techniques58.
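The γ-test sparsification described above might be sketched as follows. `BudgetedStore` and its cyclic replacement policy are an illustrative reading of the budget rule in Algorithm 4, not the authors' code.

```python
import numpy as np

def rbf(a, b, mu=1.0):
    return float(np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2) / (2.0 * mu**2)))

class BudgetedStore:
    """Store a point only if its kernel value with the last stored point
    falls below gamma; once the budget p_max is reached, new points
    overwrite old slots in turn (one reading of 'replaces ... and so on')."""
    def __init__(self, p_max, gamma=0.98):
        self.p_max, self.gamma = p_max, gamma
        self.Z, self.last = [], -1

    def maybe_store(self, z):
        # reject points too similar (in kernel value) to the last stored point
        if self.Z and rbf(self.Z[self.last], z) >= self.gamma:
            return False
        if len(self.Z) < self.p_max:
            self.Z.append(z)
            self.last = len(self.Z) - 1
        else:
            self.last = (self.last + 1) % self.p_max  # budget hit: overwrite
            self.Z[self.last] = z
        return True
```

With γ near 1 (the simulations below use γ = 0.98), only near-duplicates of the last stored point are rejected, so the dictionary spreads along the trajectory while its size never exceeds p_max.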


C. Application to Trajectory Tracking in Presence of Wingrock Dynamics

The GP-MRAC approach is applied to the problem of trajectory tracking control in the presence of wingrock dynamics (see Section B). The inverting controller of (6) is used in the framework of the GP-MRAC algorithm (Algorithm 4). The gain for the linear part of the control law (ν_pd) is given by [1.2, 1.2]. A second order reference model with natural frequency of 1 rad/sec and damping ratio of 0.5 is chosen. The simulation uses a time-step of 0.05 sec. The maximum number of points to be stored (p_max) was set to 100, and points were selected for storage if k(z_i, z(t)) < γ = 0.98. State measurements were corrupted with Gaussian white noise of variance 0.01.

Figure 7 compares the system states and the reference model when using GP-MRAC and RBF-MRAC with σ modification and uniformly distributed centers over [−1, 1]. It can be observed that GP-MRAC performs better, especially when tracking the command in θ. Note the presence of noise in the state measurements. Figure 8 further compares the tracking error for both controllers. It can be seen that the GP-MRAC response contains fewer oscillations. Note that compared with Figure 3 from Section B, the performance of the RBF-MRAC with preallocated centers is worse because the reference model drives the system out of the range over which the centers were preallocated ([−1, 1]). Figure 9 compares the output of the GP adaptive element updated using GP-MRAC and the RBF-NN adaptive element with preallocated centers updated with (13). While the GP adaptive element output is almost indistinguishable from the uncertainty in the presence of measurement noise, the RBF with preallocated centers does not fare as well. One reason for this is that the reference model drives the system out of the range over which the centers were preallocated. These results show that the GP adaptive element updated using Algorithm 4 is successful in capturing the uncertainty without requiring any prior domain knowledge about the uncertainty.


Figure 7. Comparison of system states and reference model when using GP regression based MRAC and RBF MRAC with σ modification and uniformly distributed centers over [−1, 1]. Note that the state measurements are corrupted with Gaussian white noise.



Figure 8. Comparison of tracking error when using GP regression based MRAC and RBF MRAC with σ modification and uniformly distributed centers over [−1, 1]. Note that as compared to Figure 3, the tracking errors for RBF MRAC with uniformly distributed centers are worse because the commands drive the system out of the range over which the centers were distributed.

V. Conclusion

In this paper the benefits of using nonparametric models as adaptive elements in the framework of Model Reference Adaptive Control (MRAC) were demonstrated. Numerical simulations and theoretical analysis were used to demonstrate and discuss the local validity of popular RBF-NN adaptive elements with preallocated centers, particularly when external commands drive the system out of the range over which the centers were allocated. RBF adaptive elements whose number of bases and center locations are allocated dynamically as the system evolves, using a kernel linear independence test, were developed in the framework of MRAC. The Budgeted Kernel Restructuring-Concurrent Learning algorithm was described and validated through a trajectory tracking simulation in the presence of unknown wingrock dynamics. Gaussian Process (GP) based adaptive elements, which generalize the notion of a Gaussian distribution to function approximation, were also developed. The GP-MRAC adaptive control algorithm was validated for adaptive control problems with measurement noise. We showed that nonparametric adaptive elements result in good closed loop performance without requiring much prior knowledge about the domain of the uncertainty. These results indicate that nonparametric adaptive elements can improve the global stability properties of adaptive controllers by alleviating the local applicability issues associated with RBF-NNs with preallocated centers.

Acknowledgments

This research was supported in part by ONR MURI Grant N000141110688.



Figure 9. Comparison of the adaptive element output and the actual uncertainty. We note that the approximation when using RBF MRAC with uniformly distributed centers is significantly worse than the GP approximation, and worse than the RBF approximation in Figure 5, because the reference commands drive the system out of the range over which the centers were distributed.


References

1 Astrom, K. J. and Wittenmark, B., Adaptive Control, Addison-Wesley, Reading, 2nd ed., 1995.

2 Ioannou, P. A. and Sun, J., Robust Adaptive Control, Prentice-Hall, Upper Saddle River, 1996.

3 Narendra, K. S. and Annaswamy, A. M., Stable Adaptive Systems, Prentice-Hall, Englewood Cliffs, 1989.

4 Tao, G., Adaptive Control Design and Analysis, Wiley, New York, 2003.

5 Kim, B. S. and Calise, A. J., "Nonlinear Flight Control Using Neural Networks," AIAA Journal of Guidance, Control, and Dynamics, Vol. 20, No. 1, 1997, pp. 26–33.

6 Lavretsky, E., "Combined/Composite Model Reference Adaptive Control," IEEE Transactions on Automatic Control, Vol. 54, No. 11, Nov. 2009, pp. 2692–2697.

7 Cao, C. and Hovakimyan, N., "Design and Analysis of a Novel Adaptive Control Architecture With Guaranteed Transient Performance," IEEE Transactions on Automatic Control, Vol. 53, No. 2, March 2008, pp. 586–591.

8 Johnson, E. and Kannan, S., "Adaptive Trajectory Control for Autonomous Helicopters," Journal of Guidance, Control, and Dynamics, Vol. 28, No. 3, May 2005, pp. 524–538.

9 Dydek, Z., Annaswamy, A., and Lavretsky, E., "Adaptive Control and the NASA X-15-3 Flight Revisited," IEEE Control Systems Magazine, Vol. 30, No. 3, Jun. 2010, pp. 32–48.

10 Johnson, E. N. and Calise, A. J., "Limited Authority Adaptive Flight Control for Reusable Launch Vehicles," Journal of Guidance, Control, and Dynamics.

11 Chowdhary, G. and Johnson, E. N., "Theory and Flight Test Validation of a Concurrent Learning Adaptive Controller," Journal of Guidance, Control, and Dynamics, Vol. 34, No. 2, March 2011, pp. 592–607.

12 Sanner, R. and Slotine, J.-J., "Gaussian networks for direct adaptive control," IEEE Transactions on Neural Networks, Vol. 3, No. 6, Nov. 1992, pp. 837–863.

13 Narendra, K., "Neural networks for control theory and practice," Proceedings of the IEEE, Vol. 84, No. 10, Oct. 1996, pp. 1385–1406.

14 Suykens, J. A., Vandewalle, J. P., and Moor, B. L. D., Artificial Neural Networks for Modelling and Control of Non-Linear Systems, Kluwer, Norwell, 1996.

15 Kim, Y. H. and Lewis, F., High-Level Feedback Control with Neural Networks, Vol. 21 of Robotics and Intelligent Systems, World Scientific, Singapore, 1998.

16 Kim, B. S. and Calise, A. J., "Nonlinear Flight Control Using Neural Networks," Proceedings of the AIAA Guidance, Navigation and Control Conference, August 1994.

17 Patino, D. and Liu, D., "Neural Network Based Model Reference Adaptive Control System," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 30, No. 1, 2000, pp. 198–204.

18 Calise, A., Hovakimyan, N., and Idan, M., "Adaptive Output Feedback Control of Nonlinear Systems Using Neural Networks," Automatica, Vol. 37, No. 8, 2001, pp. 1201–1211, Special Issue on Neural Networks for Feedback Control.


19 Volyanskyy, K. Y., Haddad, W. M., and Calise, A. J., “A New Neuroadaptive Control Ar-chitecture for Nonlinear Uncertain Dynamical Systems: Beyond σ and e-Modifications,” IEEETransactions on Neural Networks, Vol. 20, No. 11, Nov 2009, pp. 1707–1723.

20 Yucelen, T. and Calise, A., “Kalman Filter Modification in Adaptive Control,” Journal of Guidance, Control, and Dynamics, Vol. 33, No. 2, March–April 2010, pp. 426–439.

21 Sundararajan, N., Saratchandran, P., and Yan, L., Fully Tuned Radial Basis Function Neural Networks for Flight Control, Springer, 2002.

22 Lewis, F. L., “Nonlinear Network Structures for Feedback Control,” Asian Journal of Control, Vol. 1, 1999, pp. 205–228, Special Issue on Neural Networks for Feedback Control.

23 Kannan, S., Adaptive Control of Systems in Cascade with Saturation, Ph.D. thesis, Georgia Institute of Technology, Atlanta, GA, 2005.

24 Rysdyk, R. T. and Calise, A. J., “Nonlinear Adaptive Flight Control Using Neural Networks,” IEEE Control Systems Magazine, Vol. 18, No. 6, Dec. 1998, pp. 14–25.

25 Nardi, F., Neural Network based Adaptive Algorithms for Nonlinear Control, Ph.D. thesis, Georgia Institute of Technology, School of Aerospace Engineering, Atlanta, GA 30332, Nov. 2000.

26 Shankar, P., Self-Organizing radial basis function networks for adaptive flight control and aircraft engine state estimation, Ph.D. thesis, The Ohio State University, Ohio, 2007.

27 Cannon, M. and Slotine, J.-J. E., “Space-frequency localized basis function networks for nonlinear system estimation and control,” Neurocomputing, Vol. 9, No. 3, 1995, pp. 293–342, Special Issue on Control and Robotics, Part III.

28 Aronszajn, N., “Theory of Reproducing Kernels,” Transactions of the American Mathematical Society, Vol. 68, No. 3, May 1950, pp. 337–404.

29 Scholkopf, B., Herbrich, R., and Smola, A., “A Generalized Representer Theorem,” Computational Learning Theory, edited by D. Helmbold and B. Williamson, Vol. 2111 of Lecture Notes in Computer Science, Springer Berlin/Heidelberg, 2001, pp. 416–426.

30 Liu, W., Principe, J. C., and Haykin, S., Kernel Adaptive Filtering: A Comprehensive Introduction, Wiley, Hoboken, New Jersey, 2010.

31 Engel, Y., Mannor, S., and Meir, R., “The Kernel Least-Mean-Square Algorithm,” IEEE Transactions on Signal Processing, Vol. 52, No. 8, Aug. 2004, pp. 2275–2285.

32 Scholkopf, B. and Smola, A., Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, Cambridge, MA, USA, 2002.

33 Rasmussen, C. and Williams, C., Gaussian Processes for Machine Learning (Adaptive Compu-tation and Machine Learning), The MIT Press, 2005.

34 Kingravi, H. A., Chowdhary, G., Vela, P. A., and Johnson, E. N., “Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control,” IEEE Transactions on Neural Networks and Learning Systems, Vol. 23, No. 7, July 2012, pp. 1130–1141.

35 Murray-Smith, R. and Sbarbaro, D., “Nonlinear adaptive control using non-parametric Gaussian Process prior models,” 15th Triennial World Congress of the International Federation of Automatic Control, International Federation of Automatic Control (IFAC), 2002.


36 Murray-Smith, R., Sbarbaro, D., Rasmussen, C., and Girard, A., “Adaptive, cautious, predictive control with Gaussian process priors,” 13th IFAC Symposium on System Identification, edited by P. V. de Hof, B. Wahlberg, and S. Weiland, IFAC Proceedings Volumes, Elsevier Science, 2003, pp. 1195–1200.

37 Rasmussen, C. and Williams, C., Gaussian Processes for Machine Learning, MIT Press, Cambridge, MA, 2006.

38 Johnson, E. N., Limited Authority Adaptive Flight Control, Ph.D. thesis, Georgia Institute of Technology, Atlanta, GA, 2000.

39 Durham, W., “Constrained Control Allocation,” AIAA Journal of Guidance, Control, and Dynamics, Vol. 16, 1993, pp. 717–772.

40 Wise, K., “Applied controls research topics in the aerospace industry,” Proceedings of the 34th IEEE Conference on Decision and Control, Vol. 1, Dec. 1995, pp. 751–756.

41 Yucelen, T., Kim, K., Calise, A., and Nguyen, N., “Derivative-Free output feedback adaptive control of an aeroelastic generic transport model,” AIAA Guidance, Navigation, and Control Conference, Portland, OR, August 2011, AIAA-2011-6454.

42 Kim, N., Improved Methods in Neural Network Based Adaptive Output Feedback Control, with Applications to Flight Control, Ph.D. thesis, Georgia Institute of Technology, Atlanta, GA, 2003.

43 Chowdhary, G. and Johnson, E. N., “Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation,” 49th IEEE Conference on Decision and Control, 2010, pp. 3674–3679.

44 Chowdhary, G., Concurrent Learning for Convergence in Adaptive Control Without Persistency of Excitation, Ph.D. thesis, Georgia Institute of Technology, Atlanta, GA, 2010.

45 Park, J. and Sandberg, I., “Universal approximation using radial-basis-function networks,” Neural Computation, Vol. 3, 1991, pp. 246–257.

46 Boyd, S. and Sastry, S., “Necessary and Sufficient Conditions for Parameter Convergence in Adaptive Control,” Automatica, Vol. 22, No. 6, 1986, pp. 629–639.

47 Kingravi, H., Chowdhary, G., Vela, P., and Johnson, E., “A Reproducing Kernel Hilbert Space Approach for the Online Update of Radial Bases in Neuro-Adaptive Control,” IEEE Conference on Decision and Control, Orlando, FL, December 2011.

48 Burges, C., “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining and Knowledge Discovery, Vol. 2, 1998, pp. 121–167.

49 Cristianini, N. and Shawe-Taylor, J., Kernel Methods for Pattern Analysis, Cambridge University Press, 2004.

50 Scholkopf, B. and Smola, A., “Nonlinear Component Analysis as a Kernel Eigenvalue Problem,” Neural Computation, Vol. 10, No. 5, 1998, pp. 1299–1319.

51 Nguyen-Tuong, D., Scholkopf, B., and Peters, J., “Sparse Online Model Learning for Robot Control with Support Vector Regression,” IEEE/RSJ International Conference on Intelligent Robots and Systems, 2009, pp. 3121–3126.


52 Chowdhary, G. and Johnson, E. N., “A Singular Value Maximizing Data Recording Algorithm for Concurrent Learning,” American Control Conference, San Francisco, CA, June 2011.

53 Micchelli, C. A., “Interpolation of scattered data: distance matrices and conditionally positive definite functions,” Constructive Approximation, Vol. 2, No. 1, Dec. 1986, pp. 11–22.

54 Saad, A. A., Simulation and analysis of wing rock physics for a generic fighter model with three degrees of freedom, Ph.D. thesis, Air Force Institute of Technology, Air University, Wright-Patterson Air Force Base, Dayton, Ohio, 2000.

55 Monahemi, M. M. and Krstic, M., “Control of Wingrock Motion Using Adaptive Feedback Linearization,” Journal of Guidance, Control, and Dynamics, Vol. 19, No. 4, August 1996, pp. 905–912.

56 Singh, S. N., Yim, W., and Wells, W. R., “Direct Adaptive Control of Wing Rock Motion of Slender Delta Wings,” Journal of Guidance, Control, and Dynamics, Vol. 18, No. 1, Feb. 1995, pp. 25–30.

57 Muller, K.-R., Mika, S., Ratsch, G., Tsuda, K., and Scholkopf, B., “An Introduction to Kernel-Based Learning Algorithms,” IEEE Transactions on Neural Networks, Vol. 12, No. 2, 2001, pp. 181–202.

58 Gelb, A., Applied Optimal Estimation, MIT Press, Cambridge, 1974.
