arXiv:1705.10702v4 [cs.SY] 29 Dec 2019

Cautious Model Predictive Control using Gaussian Process Regression

Lukas Hewing, Juraj Kabzan, Melanie N. Zeilinger

Abstract—Gaussian process (GP) regression has been widely used in supervised machine learning due to its flexibility and inherent ability to describe uncertainty in function estimation. In the context of control, it is seeing increasing use for modeling of nonlinear dynamical systems from data, as it allows the direct assessment of residual model uncertainty. We present a model predictive control (MPC) approach that integrates a nominal system with an additive nonlinear part of the dynamics modeled as a GP. Approximation techniques for propagating the state distribution are reviewed and we describe a principled way of formulating the chance constrained MPC problem, which takes into account residual uncertainties provided by the GP model to enable cautious control. Using additional approximations for efficient computation, we finally demonstrate the approach in a simulation example, as well as in a hardware implementation for autonomous racing of remote controlled race cars, highlighting improvements with regard to both performance and safety over a nominal controller.

Index Terms—Model Predictive Control, Gaussian Processes, Learning-based Control

I. INTRODUCTION

Many modern control approaches depend on accurate model descriptions to enable safe and high performance control. Identifying these models, especially for highly nonlinear systems, is a time-consuming and complex endeavor. Oftentimes, however, it is possible to derive an approximate system model, e.g. a linear model with adequate accuracy close to some operating point, or a simple model description from first principles. In addition, measurement data from previous experiments or during operation is often available, which can be exploited to enhance the system model and controller performance. In this paper, we present a model predictive control (MPC) approach, which improves such a nominal model description from data using Gaussian Processes (GPs) to safely enhance performance of the system.

Learning methods for automatically identifying dynamical models from data have gained significant attention in the past years [1] and more recently also for robotic applications [2], [3]. In particular, nonparametric methods, which have seen wide success in machine learning, offer significant potential for control [4]. The appeal of using Gaussian Process regression for model learning stems from the fact that it requires little prior process knowledge and directly provides a measure of residual model uncertainty. In predictive control, GPs were successfully applied to improve control performance when learning periodic time-varying disturbances [5]. The task of

All authors are with the Institute for Dynamic Systems and Control, ETH Zurich. [lhewing|kabzanj|mzeilinger]@ethz.ch

This work was supported by the Swiss National Science Foundation under grant no. PP00P2 157601 / 1.

learning the system dynamics as opposed to disturbances, however, imposes the challenge of propagating probability distributions of the state over the prediction horizon of the controller. Efficient methods to evaluate GPs for Gaussian inputs have been developed in [6], [7], [8]. By successively employing these approximations at each prediction time step, predictive control with GP dynamics has been presented in [9]. In [10] a piecewise linear approximate explicit solution for the MPC problem of a combustion plant was presented. Application of a one-step MPC with a GP model to a mechatronic system was demonstrated in [11] and the use for fault-tolerant MPC was presented in [12]. A constrained tracking MPC for robotic applications, which uses a GP to improve the dynamics model from measurement data, was shown in [13]. A variant of this approach was realized in [14], where uncertainty of the GP prediction is robustly taken into account using confidence bounds. The solution of GP-based MPC problems using sequential quadratic programming schemes is discussed in [15]. A simulation study for the application of GP-based predictive control to a drinking water network was presented in [16] and data efficiency of these formulations for learning controllers was recently demonstrated in [17].

The goal of this paper is both to provide an overview of existing techniques and to propose extensions for deriving a systematic and efficiently solvable approximate formulation of the MPC problem with a GP model, which incorporates constraints and takes into account the model uncertainty for cautious control. Unlike most previous approaches, we specifically consider the combination of a nominal system description with an additive GP part, which can be of different dimensionality than the nominal model. This corresponds to a common scenario in controller design, where an approximate nominal model is known and can be improved on, offering a multitude of advantages. A nominal model allows for rudimentary system operation and the collection of measurement data, as well as the design of pre-stabilizing ancillary controllers which reduce the state uncertainty in prediction. Additionally, it allows for learning only specific effects from data, potentially reducing the dimensionality of the machine learning task. For example, most dynamical models derived from physical laws include an integrator chain, which can be directly represented in the nominal dynamics and for which no nonlinear uncertainty model has to be learned from data—it would rather add unnecessary conservatism to the controller. The separation between system and GP model dimension is therefore key for the application of the GP-based predictive control scheme to higher order systems.

The paper makes the following contributions. A review and compact summary of approximation techniques for propagating GP dynamics and uncertainties is provided and extended to the additive combination with nominal dynamics. The approximate propagation enables a principled formulation of chance constraints on the resulting state distributions in terms of probabilistic reachable sets [18]. In addition, the nominal system description allows for reducing the GP model learning to a subspace of states and inputs. We discuss sparse GPs and a tailored selection of inducing points for MPC to reduce the computational burden of the approach. The use of some or all of these techniques is crucial to make the computationally expensive GP approach practically feasible for control with sampling times in the millisecond range. We finally present two application examples. The first is a simulation of an autonomous underwater vehicle, illustrating key concepts and advantages in a simplified setting. Second, we present a hardware implementation for autonomous racing of remote controlled cars, showing the real-world feasibility of the approach for complex high performance control tasks. To the best of our knowledge this is the first hardware implementation of a Gaussian Process-based predictive control scheme to a system of this complexity at sampling times of 20 ms.

II. PRELIMINARIES

A. Notation

The i-th element of a vector x is denoted [x]_i. Similarly, [M]_{ij} denotes element ij of a matrix M, and [M]_{·i}, [M]_{i·} its i-th column or row, respectively. We use diag(x) to refer to a diagonal matrix with entries given by the vector x. The squared Euclidean norm weighted by M, i.e. x^T M x, is denoted ‖x‖²_M. We use boldface to emphasize stacked quantities, e.g. a collection of vector-valued data in matrix form. ∇f(z) is the gradient of f evaluated at z. The Pontryagin set difference is denoted A ⊖ B = {a | a + b ∈ A ∀b ∈ B}. A normally distributed vector x with mean µ and variance Σ is given by x ∼ N(µ, Σ). The expected value of x is E(x), cov(x, y) is the covariance between vectors x and y, and the variance is var(x) = cov(x, x). We use p(x) (p(x|y)) to refer to the (conditional) probability density of x. Similarly, Pr(E) (Pr(E|A)) denotes the probability of an event E (given A).

Realized quantities during closed loop control are time-indexed using parentheses, i.e. x(k), while quantities in prediction use subscripts, e.g. x_i is the predicted state i steps ahead.

B. Problem formulation

We consider the control of dynamical systems that can be represented by the discrete-time model

x(k+1) = f(x(k), u(k)) + B_d (g(x(k), u(k)) + w(k)) ,  (1)

where x(k) ∈ R^{n_x} is the system state and u(k) ∈ R^{n_u} the control input at time k. The model is composed of a known nominal part f and additive dynamics g that describe initially unknown dynamics of the system, which are to be learned from data and are assumed to lie in the subspace spanned by B_d. We consider i.i.d. process noise w(k) ∼ N(0, Σ_w), which is spatially uncorrelated, i.e. has diagonal variance matrix Σ_w = diag([σ_1², …, σ_{n_d}²]). We assume that both f and g are differentiable functions.

The system is subject to state and input constraints X ⊆ R^{n_x}, U ⊆ R^{n_u}, respectively. The constraints are formulated as chance constraints, i.e. by enforcing

Pr(x(k) ∈ X) ≥ p_x ,  Pr(u(k) ∈ U) ≥ p_u ,

where p_x, p_u are the associated satisfaction probabilities.

C. Gaussian Process Regression

Gaussian process regression is a nonparametric framework for nonlinear regression. A GP is a probability distribution over functions, such that every finite sample of function values is jointly Gaussian distributed. We apply Gaussian process regression to infer the noisy vector-valued function g(x, u) in (1) from previously collected measurement data of states and inputs {(x_j, u_j), j = 0, …, M}. State-input pairs form the input data to the GP and the corresponding outputs are obtained from the deviation to the nominal system model:

y_j = g(x_j, u_j) + w_j = B_d^† (x_{j+1} − f(x_j, u_j)) ,

where B_d^† is the Moore-Penrose pseudo-inverse. Note that the measurement noise on the data points w_j corresponds to the process noise in (1). With z_j := [x_j^T, u_j^T]^T, the data set of the GP is therefore given by

D = { 𝐲 = [y_0, …, y_M]^T ∈ R^{M×n_d} ,  𝐳 = [z_0, …, z_M]^T ∈ R^{M×n_z} } .
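The construction of this data set from closed-loop measurements can be sketched as follows. This is a minimal illustration with a hypothetical double-integrator nominal model f and an invented friction term playing the role of g; all names and numerical values are placeholders, not the systems used in the paper.

```python
import numpy as np

# Hypothetical 2-state, 1-input system: nominal double integrator f,
# unknown dynamics g acting on the velocity state only (B_d = [0, 1]^T).
dt = 0.1
Bd = np.array([[0.0], [1.0]])

def f_nominal(x, u):
    # known nominal part: discrete-time double integrator
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def g_true(x, u):
    # unknown dynamics to be learned (here: an invented quadratic friction)
    return np.array([-0.1 * x[1] * abs(x[1])])

# Collect GP inputs z_j = (x_j, u_j) and residual targets
# y_j = B_d^+ (x_{j+1} - f(x_j, u_j)), as in the text.
rng = np.random.default_rng(0)
Bd_pinv = np.linalg.pinv(Bd)            # Moore-Penrose pseudo-inverse
Z, Y = [], []
x = np.zeros(2)
for _ in range(50):
    u = rng.uniform(-1, 1)
    w = rng.normal(0, 1e-3, size=1)     # process noise on the GP channel
    x_next = f_nominal(x, u) + (Bd @ (g_true(x, u) + w)).ravel()
    Z.append(np.array([x[0], x[1], u]))             # GP input z_j
    Y.append(Bd_pinv @ (x_next - f_nominal(x, u)))  # GP target y_j
    x = x_next
Z, Y = np.array(Z), np.array(Y)
print(Z.shape, Y.shape)  # (50, 3) (50, 1)
```

Note that because B_d^† B_d = I here, each target y_j recovers exactly g(x_j, u_j) + w_j, i.e. the process noise reappears as measurement noise on the training data.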

Each output dimension is learned individually, meaning that we assume the components of each y_j to be independent, given the input data z_j. Specifying a GP prior on g in each output dimension a ∈ {1, …, n_d} with kernel k_a(·, ·) and prior mean function m_a(·) results in normally distributed measurement data with

[𝐲]_{·,a} ∼ N( m_a(𝐳), K^a_{𝐳𝐳} + Iσ_a² ) ,  (2)

where K^a_{𝐳𝐳} is the Gram matrix of the data points, i.e. [K^a_{𝐳𝐳}]_{ij} = k_a(z_i, z_j), and m_a(𝐳) = [m_a(z_0), …, m_a(z_M)]^T.

The choice of kernel function k_a and its parameterization is the determining factor for the inferred distribution of g and is typically specified using prior process knowledge and optimization [19], e.g. by optimizing the likelihood of the observed data distribution (2). Throughout this paper we consider the squared exponential kernel function

k_a(z_i, z_j) = σ_{f,a}² exp( −½ (z_i − z_j)^T L_a^{−1} (z_i − z_j) ) ,  (3)

in which L_a is a positive diagonal length scale matrix and σ_{f,a}² the signal variance. It is, however, straightforward to use any other (differentiable) kernel function.

The joint distribution of the training data and an arbitrary test point z in output dimension a is given by

p([y]_a, [𝐲]_{·,a}) = N( [ m_a(𝐳) ; m_a(z) ] , [ K^a_{𝐳𝐳} + Iσ_a²  K^a_{𝐳z} ; K^a_{z𝐳}  K^a_{zz} ] ) ,  (4)
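The kernel (3) can be evaluated as in the following sketch. The convention L_a = diag(ℓ²), with one length scale ℓ per input dimension, is an assumption of this snippet; other parameterizations of the diagonal length scale matrix are equally common.

```python
import numpy as np

def se_kernel(Zi, Zj, sf2=1.0, ell=np.ones(3)):
    # Gram matrix of the squared exponential kernel (3) between the rows
    # of Zi and Zj, with length scale matrix L = diag(ell**2).
    D = (Zi[:, None, :] - Zj[None, :, :]) / ell
    return sf2 * np.exp(-0.5 * np.sum(D**2, axis=-1))

Z = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
K = se_kernel(Z, Z)
print(K[0, 0], K[0, 1])  # 1.0 and exp(-0.5) ≈ 0.6065
```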


where [K^a_{𝐳z}]_j = k_a(z_j, z), K^a_{z𝐳} = (K^a_{𝐳z})^T and similarly K^a_{zz} = k_a(z, z). The resulting distribution of [y]_a conditioned on the observed data points is again Gaussian with p([y]_a | [𝐲]_{·,a}) = N( µ^d_a(z), Σ^d_a(z) ) and

µ^d_a(z) = K^a_{z𝐳} (K^a_{𝐳𝐳} + Iσ_a²)^{−1} [𝐲]_{·,a} ,  (5a)
Σ^d_a(z) = K^a_{zz} − K^a_{z𝐳} (K^a_{𝐳𝐳} + Iσ_a²)^{−1} K^a_{𝐳z} .  (5b)
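A direct implementation of the posterior equations (5) for a single output dimension might look as follows; this is a self-contained sketch on a toy one-dimensional data set, not the implementation used in the paper, and the explicit matrix inverse is kept only for readability.

```python
import numpy as np

def se_kernel(Zi, Zj, sf2=1.0, ell=1.0):
    # squared exponential kernel between the rows of Zi and Zj
    D = (Zi[:, None, :] - Zj[None, :, :]) / ell
    return sf2 * np.exp(-0.5 * np.sum(D**2, axis=-1))

def gp_posterior(Z, y, z_test, sn2=1e-2):
    # posterior mean (5a) and variance (5b) at a single test point z_test
    M = len(Z)
    Kinv = np.linalg.inv(se_kernel(Z, Z) + sn2 * np.eye(M))
    ks = se_kernel(z_test[None, :], Z)              # k_a(z, z_j), shape (1, M)
    mu = ks @ Kinv @ y                              # (5a)
    var = se_kernel(z_test[None, :], z_test[None, :]) - ks @ Kinv @ ks.T  # (5b)
    return float(mu), float(var)

# sanity check on a toy function
Z = np.linspace(-2, 2, 15)[:, None]
y = np.sin(Z).ravel()
mu, var = gp_posterior(Z, y, np.array([0.5]))
print(mu, var)  # mu close to sin(0.5) ≈ 0.479, var small and nonnegative
```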

The resulting multivariate GP approximation of the unknown function g(z) is then simply given by stacking the individual output dimensions, i.e.

d(z) ∼ N( µ^d(z), Σ^d(z) )  (6)

with µ^d = [µ^d_1, …, µ^d_{n_d}]^T and Σ^d = diag([Σ^d_1, …, Σ^d_{n_d}]^T). Evaluating the mean and variance in (5) has cost O(n_d n_z M) and O(n_d n_z M²), respectively, and thus scales with the number of data points. For large amounts of data points or fast real-time applications this can limit the use of GP models. To overcome these issues, various approximation techniques have been proposed, one class of which is sparse Gaussian processes using inducing inputs [20], briefly outlined in the following.

D. Sparse Gaussian Processes

Many sparse GP approximations make use of inducing targets 𝐲_ind, inputs 𝐳_ind and conditional distributions to approximate the joint distribution (4) [21]. Many such approximations exist, a popular one of which is the Fully Independent Training Conditional (FITC) [22], which we make use of in this paper. Given a selection of inducing inputs 𝐳_ind and using the shorthand notation Q^a_{ζζ̃} := K^a_{ζ𝐳_ind} (K^a_{𝐳_ind𝐳_ind})^{−1} K^a_{𝐳_indζ̃}, the approximate posterior distribution is given by

µ̃^d_a(z) = Q^a_{z𝐳} (Q^a_{𝐳𝐳} + Λ)^{−1} [𝐲]_{·,a} ,  (7a)
Σ̃^d_a(z) = K^a_{zz} − Q^a_{z𝐳} (Q^a_{𝐳𝐳} + Λ)^{−1} Q^a_{𝐳z}  (7b)

with Λ = diag(K^a_{𝐳𝐳} − Q^a_{𝐳𝐳} + Iσ_a²). Concatenating the output dimensions similar to (5), we arrive at the approximation d̃(z) ∼ N( µ̃^d(z), Σ̃^d(z) ).

Several of the matrices used in (7) can be precomputed and the evaluation complexity becomes independent of the number of original data points. With M̃ being the number of inducing points, this results in O(n_d n_z M̃) and O(n_d n_z M̃²) for the predictive mean and variance, respectively.

There are numerous options for selecting the inducing inputs, e.g. heuristically as a subset of the original data points, by treating them as hyperparameters and optimizing their location [22], or letting them coincide with test points [23], which is often referred to as transductive learning. In Section III-C we make use of such transductive ideas and propose a dynamic selection of inducing points, with a resulting local approximation tailored to the predictive control task.
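A naive implementation of the FITC predictor (7) can be sketched as follows. This version forms the M×M matrices explicitly for clarity; an efficient implementation would instead precompute the inducing-point factorizations so that evaluation cost is independent of M, as stated above.

```python
import numpy as np

def se_kernel(Zi, Zj, sf2=1.0, ell=1.0):
    D = (Zi[:, None, :] - Zj[None, :, :]) / ell
    return sf2 * np.exp(-0.5 * np.sum(D**2, axis=-1))

def fitc_posterior(Z, y, Zind, z_test, sn2=1e-2):
    # Q_ab := K_a,ind K_ind,ind^-1 K_ind,b  (naive form of (7))
    Kii = se_kernel(Zind, Zind) + 1e-8 * np.eye(len(Zind))  # jitter
    def Q(Za, Zb):
        return se_kernel(Za, Zind) @ np.linalg.solve(Kii, se_kernel(Zind, Zb))
    Qzz = Q(Z, Z)
    Lam = np.diag(np.diag(se_kernel(Z, Z) - Qzz) + sn2)      # Λ in (7)
    A = np.linalg.inv(Qzz + Lam)
    Qsz = Q(z_test[None, :], Z)
    mu = Qsz @ A @ y                                          # (7a)
    var = se_kernel(z_test[None, :], z_test[None, :]) - Qsz @ A @ Qsz.T  # (7b)
    return float(mu), float(var)

Z = np.linspace(-2, 2, 30)[:, None]
y = np.sin(Z).ravel()
Zind = np.linspace(-2, 2, 8)[:, None]   # 8 inducing inputs
mu, var = fitc_posterior(Z, y, Zind, np.array([0.5]))
print(mu, var)  # close to the full-GP prediction of sin(0.5)
```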

III. MPC CONTROLLER DESIGN

We consider the design of an MPC controller for system (1) using a GP approximation d of the unknown function g:

x_{i+1} = f(x_i, u_i) + B_d (d(x_i, u_i) + w_i) .  (8)

At each time step, the GP approximation evaluates to a stochastic distribution according to the residual model uncertainty and process noise, which is then propagated forward in time. A stochastic MPC formulation allows the principled treatment of chance constraints, i.e. by imposing a prescribed maximum probability of constraint violation. Denoting a state and input reference trajectory by X_r = {x^r_0, …, x^r_N}, U_r = {u^r_0, …, u^r_{N−1}}, respectively, the resulting stochastic finite time optimal control problem can be formulated as

min_{Π(x)}  E( l_f(x_N − x^r_N) + Σ_{i=0}^{N−1} l(x_i − x^r_i, u_i − u^r_i) )  (9a)
s.t.  x_{i+1} = f(x_i, u_i) + B_d(d(x_i, u_i) + w_i) ,  (9b)
      u_i = π_i(x_i) ,  (9c)
      Pr(x_{i+1} ∈ X) ≥ p_x ,  (9d)
      Pr(u_i ∈ U) ≥ p_u ,  (9e)
      x_0 = x(k) ,  (9f)

for all i = 0, …, N−1 with appropriate terminal cost l_f(x_N) and stage cost l(x_i, u_i), where the optimization is carried out over a sequence of input policies Π(x) = {π_0(x), …, π_{N−1}(x)}.

In the form (9), the optimization problem is computationally intractable. In the following sections, we present techniques for deriving an efficiently solvable approximation. Specifically, we use affine feedback policies with precomputed linear gains and show how this allows simple approximate propagation of the system uncertainties in terms of mean and variance in prediction. Using these approximate distributions, we present a framework for reformulating chance constraints deterministically and briefly discuss the evaluation of possible cost functions.

A. Ancillary Linear State Feedback Controller

Optimization over general feedback policies Π(x) is an infinite dimensional optimization problem. A common approach that integrates well in the presented framework is to restrict the policy class to linear state feedback controllers

π_i(x_i) = K_i(µ^x_i − x_i) + µ^u_i ,

where µ^x_i is the predicted mean of the state distribution. Using pre-selected gains K_i, we then optimize over µ^u_i, the mean of the applied input. Note that due to the ancillary control law the input u_i applied in prediction is a random variable, resulting from the affine transformation of x_i.

In Fig. 1, the effect of state feedback on the propagation of uncertainty is exemplified. It shows the evolution over time of a double integrator with a quadratic friction term inferred by a GP. The right plot displays mean and 2-σ variance of a number of trajectories from repeated simulations of the system with



Fig. 1: Propagation of uncertainty for a double integrator with a GP trained on a nonlinear friction term. The left plot displays the GP with 2-σ confidence bound, while the right plot shows the mean and 2-σ variance of repeated simulation runs of the system under an open-loop control sequence and closed-loop state feedback control.

different noise realizations. Mean and variance under an open-loop control sequence derived from a linear quadratic infinite time optimal control problem are shown in red. The results with a corresponding LQR feedback law applied in closed loop are shown in blue. As is evident, open-loop input sequences can lead to rapid growth of uncertainty in the prediction, and thereby cause conservative control actions in the presence of chance constraints. Linear ancillary state feedback controllers are therefore commonly employed in stochastic and robust MPC [24].

The adequate choice of ancillary feedback gains K_i is generally a hard problem, especially if the system dynamics are highly nonlinear. A useful heuristic is to consider a linearization of the system dynamics around an approximate prediction trajectory, which in MPC applications is typically available using the solution trajectory of the previous time step. Feedback gains for the linearized system can be derived e.g. by solving a finite horizon LQR problem [25]. For mild or stabilizing nonlinearities, a fixed controller gain K_i = K can be chosen to reduce computational burden, as e.g. done for the example in Fig. 1.
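The finite-horizon LQR heuristic can be sketched via the backward Riccati recursion on linearizations (A_i, B_i) along an approximate trajectory. The system, horizon and weights below are illustrative; the gain here follows the convention u = Kx, so the sign differs from the ancillary policy notation only superficially.

```python
import numpy as np

def lqr_gains(A_list, B_list, Q, R, P_N):
    # backward Riccati recursion for finite-horizon LQR gains K_i (u = K x)
    P, gains = P_N, []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ A + A.T @ P @ B @ K
        gains.append(K)
    return gains[::-1]

# double integrator linearization, held constant along a horizon of 20
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R = np.eye(2), np.array([[1.0]])
Ks = lqr_gains([A] * 20, [B] * 20, Q, R, np.eye(2))
Acl = A + B @ Ks[0]
print(np.abs(np.linalg.eigvals(Acl)))  # closed-loop eigenvalues inside unit circle
```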

B. Uncertainty Propagation

1) Approximation as Normal Distributions: Because of stochastic process noise and the representation by a GP model, future predicted states result in stochastic distributions. Evaluating the posterior of a GP from an input distribution is generally intractable and the resulting distribution is not Gaussian [6]. While under certain assumptions on g, some strict over-approximations of the resulting distributions exist [26], they are typically very conservative and computationally demanding. We focus instead on computationally cheap and practical approximations at the cost of strict guarantees.

State, control input and nonlinear disturbance are approximated as jointly Gaussian distributed at every time step:

[ x_i^T  u_i^T  (d_i + w_i)^T ]^T ∼ N(µ_i, Σ_i) = N( [ µ^x_i ; µ^u_i ; µ^d_i ] , [ Σ^x_i  Σ^{xu}_i  Σ^{xd}_i ; ⋆  Σ^u_i  Σ^{ud}_i ; ⋆  ⋆  Σ^d_i + Σ_w ] ) ,  (10)

where Σ^u_i = K_i Σ^x_i K_i^T, Σ^{xu}_i = Σ^x_i K_i^T and ⋆ denotes terms given by symmetry. Considering the covariances between states, inputs and GP, i.e. Σ^{xd} and Σ^{ud}, is of great importance for an accurate propagation of uncertainty when the GP model d is paired with a nominal system model f. Using a linearization of the nominal system dynamics around the mean

f(x, u) ≈ f(µ^x, µ^u) + ∇f(µ^x, µ^u) ( [ x ; u ] − [ µ^x ; µ^u ] ) ,

similar to extended Kalman filtering, this permits simple update equations for the state mean and variance based on affine transformations of the Gaussian distribution in (10):

µ^x_{i+1} = f(µ^x_i, µ^u_i) + B_d µ^d_i ,
Σ^x_{i+1} = [ ∇f(µ^x_i, µ^u_i)  B_d ] Σ_i [ ∇f(µ^x_i, µ^u_i)  B_d ]^T .
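For a linear nominal model f(x, u) = Ax + Bu these update equations take the following form; this is a sketch with illustrative numbers, and for a nonlinear f one would substitute the Jacobian ∇f evaluated at the mean for [A B].

```python
import numpy as np

# One EKF-style propagation step: mean and variance update from the joint
# Gaussian (10), for a linear nominal model (A, B) as stand-in for ∇f.
def propagate(mu_x, mu_u, mu_d, Sigma_joint, A, B, Bd):
    G = np.hstack([A, B, Bd])                   # [∇f  B_d]
    mu_next = A @ mu_x + B @ mu_u + Bd @ mu_d   # nominal mean plus GP mean
    Sigma_next = G @ Sigma_joint @ G.T          # variance update
    return mu_next, Sigma_next

nx, nu, nd = 2, 1, 1
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Bd = np.array([[0.0], [1.0]])
K = np.array([[-1.0, -1.5]])                    # ancillary feedback gain

# joint covariance (10): here zero cross-covariance with d for simplicity
Sx = 0.01 * np.eye(nx)
Sd = np.array([[0.02]])                         # GP variance + process noise
Sxu, Su = Sx @ K.T, K @ Sx @ K.T
Sigma = np.block([
    [Sx,    Sxu, np.zeros((nx, nd))],
    [Sxu.T, Su,  np.zeros((nu, nd))],
    [np.zeros((nd, nx + nu)),    Sd],
])
mu_next, S_next = propagate(np.array([0.5, 0.0]), np.array([0.1]),
                            np.array([0.0]), Sigma, A, B, Bd)
print(mu_next, np.diag(S_next))  # mean [0.5, 0.01]; S_next is PSD
```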

2) Gaussian Process Prediction from Uncertain Inputs: In order to define µ^d_i, Σ^d_i, Σ^{xd}_i and Σ^{ud}_i, different approximations of the posterior of a GP from a Gaussian input have been proposed in the literature. In the following, we give a brief overview of the most commonly used techniques; for details please refer to [6], [7], [8]. For comparison, we furthermore state the computational complexity of these methods. For notational convenience, we will use

µ^z_i = [ µ^x_i ; µ^u_i ] ,  Σ^z_i = [ Σ^x_i  Σ^{xu}_i ; ⋆  Σ^u_i ] ,  Σ^{zd}_i = [ Σ^{xd}_i ; Σ^{ud}_i ] .

a) Mean Equivalent Approximation: A straightforward and computationally cheap approach is to evaluate (6) at the mean, such that

µ^d_i = µ^d(µ^z_i) ,  (12a)
[ Σ^{zd}_i ; Σ^d_i ] = [ 0 ; Σ^d(µ^z_i) ] .  (12b)

In [27] it was demonstrated that this can lead to poor approximations for increasing prediction horizons, as it neglects the accumulation of uncertainty in the GP. The fact that this approach neglects the covariance between d and (x, u) can in addition severely deteriorate the prediction quality when paired with a nominal system f(x, u). The computationally most expensive operation is the matrix multiplication in (5) for each output dimension of the GP, such that the complexity of one prediction step is O(n_d n_z M²).

b) Taylor Approximation: Using a first-order Taylor approximation of (5), the expected value, variance and covariance of the resulting distribution are

µ^d_i = µ^d(µ^z_i) ,  (13a)
[ Σ^{zd}_i ; Σ^d_i ] = [ Σ^z_i (∇µ^d(µ^z_i))^T ; Σ^d(µ^z_i) + ∇µ^d(µ^z_i) Σ^z_i (∇µ^d(µ^z_i))^T ] .  (13b)

Compared to (12), this leads to correction terms for the posterior variance and covariance, taking into account the gradient of the posterior mean and the variance of the input of the GP. The complexity of one prediction step amounts to O(n_d n_z² M²). Higher order approximations are similarly possible at the expense of increased computational effort.
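The Taylor moments (13) can be sketched with a finite-difference gradient of the posterior mean. The functions mu_d and sigma_d below are invented stand-ins for a trained GP's posterior mean and variance, used only to make the snippet self-contained.

```python
import numpy as np

# First-order Taylor approximation (13) of the GP posterior for a Gaussian
# input z ~ N(mu_z, Sigma_z), for a scalar GP output dimension.
def taylor_moments(mu_d, sigma_d, mu_z, Sigma_z, eps=1e-6):
    nz = len(mu_z)
    grad = np.array([
        (mu_d(mu_z + eps * e) - mu_d(mu_z - eps * e)) / (2 * eps)
        for e in np.eye(nz)
    ])                                      # central finite differences
    mu = mu_d(mu_z)                         # (13a)
    Sigma_zd = Sigma_z @ grad               # covariance term in (13b)
    Sigma_d = sigma_d(mu_z) + grad @ Sigma_z @ grad  # variance term in (13b)
    return mu, Sigma_zd, Sigma_d

# invented toy posterior mean/variance (not from a real GP fit)
mu_d = lambda z: np.sin(z[0])
sigma_d = lambda z: 0.01
mu, Szd, Sd = taylor_moments(mu_d, sigma_d, np.array([0.0]), np.array([[0.04]]))
print(mu, Szd, Sd)  # ≈ 0.0, [0.04], 0.05 (since cos(0) = 1)
```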



Fig. 2: Comparison of prediction methods for Gaussian input [x]_2. The posterior distribution [x]_2 + d([x]_2) is evaluated with the different approximation methods of Section III-B2 (Mean Equivalent Approximation, Taylor Approximation, Exact Moment Matching). For reference, the true distribution is approximated by Monte Carlo simulation.

c) Exact Moment Matching: It has been shown that for certain mean and kernel functions, the first and second moments of the posterior distribution can be analytically computed [6]. Under the assumption that z_i = [x_i^T, u_i^T]^T is normally distributed, we can define

µ^d_i = E( d(z_i) ) ,  (14a)
[ Σ^{zd}_i ; Σ^d_i ] = [ cov(z_i, d(z_i)) ; var(d(z_i)) ] .  (14b)

In particular, this is possible for a zero prior mean function and the squared exponential kernel. As the procedure exactly matches the first and second moments of the posterior distribution, it can be interpreted as an optimal fit of a Gaussian to the true distribution. Note that the model formulation directly allows for the inclusion of linear prior mean functions, as they can be expressed as a nominal linear system in (8), while preserving exact computations of mean and variance. The computational complexity of this approach is O(n_d² n_z M²) [8]. In Fig. 2 the different approximation methods are compared for a one-step prediction of the GP shown in Fig. 1 with normally distributed input. As evident, the predictions with Taylor Approximation and Moment Matching are qualitatively similar, which will typically be the case if the second derivative of the posterior mean and variance of the GP is small over the input distribution. It is furthermore apparent that the posterior distribution is not Gaussian and that the approximation as a Gaussian distribution leads to prediction error, even if the first two moments are matched exactly. This can lead to the effect that locally the Taylor Approximation provides a closer fit to the underlying distribution. The Mean Equivalent Approximation is very conservative by neglecting the covariance between [x]_2 and d([x]_2). Note that depending on the sign of the covariance it can also be overly confident.
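A Monte Carlo reference of the kind used in Fig. 2 amounts to sampling the Gaussian input and pushing the samples through the stochastic posterior. The posterior mean and variance functions below are invented placeholders, not a fitted GP; only the sampling pattern is the point.

```python
import numpy as np

# Monte Carlo approximation of the true one-step distribution x + d(x):
# sample the Gaussian input, then add posterior mean and posterior-variance
# noise per sample.
rng = np.random.default_rng(2)
mean_fn = lambda x: -0.1 * x * np.abs(x)     # placeholder posterior mean
var_fn = lambda x: 0.01 * (1 + x**2)         # placeholder posterior variance

xs = rng.normal(-1.0, 0.5, size=500_000)     # Gaussian input [x]_2
ys = xs + mean_fn(xs) + rng.normal(0.0, np.sqrt(var_fn(xs)))
print(ys.mean(), ys.std())  # first two moments of the sampled distribution
```

Histogramming `ys` against the Gaussian fits from (12)-(14) reproduces the kind of comparison shown in Fig. 2.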

All presented approximations scale directly with the input and output dimensions, as well as the number of data points, and thus become expensive to evaluate in high dimensional spaces. This presents a challenge for predictive control approaches with GP models, which in the past have mainly focused on relatively small and slow systems [9], [12]. Using a nominal model, however, it is possible to consider GPs that depend on only a subset of states and inputs, such that the computational burden can be significantly reduced. This is due to a reduction in the effective input dimension n_z, and more importantly due to a reduction in the necessary training points M for learning in a lower dimensional space.

Remark 1. With only slight notational changes, the presented approximation methods similarly apply to prior mean and kernel functions that are functions of only a subset of states and inputs.

Another significant reduction in the computational complexity can be achieved by employing sparse GP approaches, for which in the following we present a modification tailored to predictive control.

C. Dynamic Sparse GPs for MPC

To reduce the computational burden one can make use of sparse GP approximations as outlined in Section II-D, which often comes with little deterioration of the prediction quality. A property of the task of predictive control is that it lends itself to transductive approximation schemes, meaning that we adjust the approximation based on the test points, i.e. the (future) evaluation locations, in order to achieve a high fidelity local approximation. In MPC, some knowledge of the test points is typically available in terms of an approximate trajectory through the state-action space. This trajectory can, e.g., be given by the reference signal or a previous solution trajectory, which will typically lie close to the current one. We therefore propose to select inducing inputs locally at each sampling time according to the prediction quality on these approximate trajectories. Ideally, the local inducing inputs would be optimized to maximize predictive quality, which is, however, not

Fig. 3: Illustration of the dynamic sparse approximation [28]. Contour plots of the posterior variance of the full GP (top left) and the dynamic sparse approximation (top right) with corresponding data points as black crosses. Trajectories planned by an MPC are shown as solid red lines, while the dashed lines show the prediction of the previous time step used in the approximation, with the chosen inducing points indicated by black circles. The bottom plot shows the respective standard deviations √Σd(xi) of the full and sparse GP along the planned trajectory.


computationally feasible within the targeted millisecond sampling times. Instead, inducing inputs are selected heuristically along the approximate state-input trajectory; specifically, we focus here on the MPC solution computed at a previous time step.

To illustrate the procedure, we consider a simple double integrator system controlled by an MPC. Fig. 3 shows the variance Σd(x) of a GP trained on the system's states. Additionally, two successive trajectories from an MPC algorithm are displayed, in which the solid red line is the current prediction, while the dashed line is the prediction trajectory from the previous iteration. The plot on the left displays the original GP, with data points marked as crosses, whereas on the right we have the sparse approximation resulting from placing inducing points along the previous solution trajectory, indicated by circles. The figure illustrates how the full GP and the sparse approximation match closely along the predicted trajectory of the system, while the approximation quality far away from the trajectory deteriorates. Since the current and previous trajectories are similar, however, this local information is sufficient for the computation of the MPC controller.

D. Chance Constraint Formulation

The tractable Gaussian approximation of the state and input distribution over the prediction horizon in (10) can be used to approximate the chance constraints in (9d) and (9e).

We reformulate the chance constraints on state and input w.r.t. their means µ^x, µ^u using constraint tightening based on the respective errors e^x_i = µ^x_i − x_i and e^u_i = K_i(µ^x_i − x_i).

For this we make use of probabilistic reachable sets [18], an extension of the concept of reachable sets to stochastic systems, which is related to probabilistic set invariance [29], [30].

Definition 1 (Probabilistic i-step Reachable Set). A set R is said to be a probabilistic i-step reachable set (i-step PRS) of probability level p if

Pr(e_i ∈ R | e_0 = 0) ≥ p .

Given an i-step PRS R^x of probability level p^x for the state error e^x_i, and similarly R^u for the input error e^u_i, we can define tightened constraints on µ^x_i and µ^u_i as

µ^x_i ∈ Z = X ⊖ R^x ,   (15a)
µ^u_i ∈ V = U ⊖ R^u ,   (15b)

where ⊖ denotes the Pontryagin set difference. Satisfaction of the tightened constraints (15) for the mean thereby implies satisfaction of the original constraints (9d) and (9e), i.e. when µ^x_i ∈ Z we have Pr(x_i = µ^x_i + e^x_i ∈ X) ≥ Pr(e^x_i ∈ R^x) ≥ p^x.

Under the approximation of a normal distribution of x_i, the uncertainty in each time step is fully specified by the variance matrices Σ^x_i and Σ^u_i. The sets can then be computed as functions of these variances, i.e. R^x(Σ^x_i) and R^u(Σ^u_i), for instance as ellipsoidal confidence regions. The online computation and respective tightening in (15), however, is often computationally prohibitive for general convex constraint sets.

In the following, we present important cases for which a computationally cheap tightening is possible, such that it can be performed online. For brevity, we concentrate on state constraints, as input constraints can be treated analogously.

1) Half-space constraints: Consider the constraint set X given by a single half-space constraint X^hs := {x | h^T x ≤ b}, h ∈ R^n, b ∈ R. Considering the marginal distribution of the error in the direction of the half-space, h^T e^x_i ∼ N(0, h^T Σ^x_i h), enables us to use the quantile function of a standard Gaussian random variable φ^{-1}(p^x) at the required probability of constraint satisfaction p^x to see that

R^x(Σ^x_i) := {e | h^T e ≤ φ^{-1}(p^x) √(h^T Σ^x_i h)}

is an i-step PRS of probability level p^x. In this case, evaluating the Pontryagin difference in (15) is straightforward and we can directly define the tightened constraint on the state mean

Z^hs(Σ^x_i) := {z | h^T z ≤ b − φ^{-1}(p^x) √(h^T Σ^x_i h)} .
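For a single half-space, the tightening amounts to one quantile evaluation. A minimal sketch with hypothetical numbers follows; the Monte Carlo check places the mean exactly on the tightened boundary, where the constraint should hold with probability p^x:

```python
import numpy as np
from scipy.stats import norm

def tightened_halfspace_bound(h, b, Sigma, px):
    """Tightened offset for h^T z <= b given error covariance Sigma,
    so that the true state satisfies the constraint w.p. >= px."""
    return b - norm.ppf(px) * np.sqrt(h @ Sigma @ h)

# illustrative numbers (not from the paper)
h = np.array([1.0, 0.5])
b, px = 2.0, 0.95
Sigma = np.array([[0.2, 0.05], [0.05, 0.1]])
b_tight = tightened_halfspace_bound(h, b, Sigma, px)

# Monte Carlo check: mean placed on the tightened boundary
rng = np.random.default_rng(1)
mu = b_tight / (h @ h) * h                    # any mean with h^T mu = b_tight
x = rng.multivariate_normal(mu, Sigma, 200_000)
print((x @ h <= b).mean())                    # ≈ px
```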

Remark 2. A tightening for slab constraints X^sl = {x | |h^T x| ≤ b} can be similarly derived as

Z^sl(Σ^x_i) := {z | |h^T z| ≤ b − φ^{-1}((p^x+1)/2) √(h^T Σ^x_i h)} .

2) Polytopic Constraints: Consider polytopic constraints, given as the intersection of n_j half-spaces

X^p = {x | h_j^T x ≤ b_j ∀ j = 1, . . . , n_j} .

Making use of ellipsoidal PRS for Gaussian random variables, one can formulate a semidefinite program to tighten the constraints [31], [32], which is computationally demanding to solve online. Computationally cheaper approaches typically rely on more conservative bounds. One such bound is given by Boole's inequality

Pr(E_1 ∨ E_2) ≤ Pr(E_1) + Pr(E_2) ,   (16)

which allows for the definition of polytopic PRS based on individual half-spaces. We present two possibilities, the first of which is based on bounding the violation probability of the individual polytope faces, and a second which considers the marginal distributions of e^x_i.

a) PRS based on polytope faces: Similar to the treatment of half-space constraints, we can define a PRS on the state error e^x which is aligned with the considered constraints. Using Boole's inequality (16), we have that

R^x(Σ^x_i) = {e | h_j^T e ≤ φ^{-1}(1 − (1−p^x)/n_j) √(h_j^T Σ^x_i h_j) ∀ j = 1, . . . , n_j}

is a PRS of probability level p^x. The tightening results in

Z^p(Σ^x_i) = {z | h_j^T z ≤ b_j − φ^{-1}(1 − (1−p^x)/n_j) √(h_j^T Σ^x_i h_j) ∀ j = 1, . . . , n_j} ,

which scales with the number of faces of the polytope and can therefore be quite conservative if the polytope has many (similar) faces.
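A direct implementation of this face-wise tightening, with hypothetical constraint data, can be sketched as:

```python
import numpy as np
from scipy.stats import norm

def tighten_polytope_faces(H, b, Sigma, px):
    """Tighten each face h_j^T z <= b_j so that the polytope holds
    jointly w.p. >= px (Boole's inequality over the n_j faces)."""
    nj = H.shape[0]
    phi = norm.ppf(1.0 - (1.0 - px) / nj)
    back_off = phi * np.sqrt(np.einsum('ij,jk,ik->i', H, Sigma, H))
    return b - back_off

# illustrative box constraint |z_k| <= 1 written as 4 half-spaces
H = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.ones(4)
Sigma = np.diag([0.01, 0.04])
b_tight = tighten_polytope_faces(H, b, Sigma, px=0.9)
print(b_tight)
```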


b) PRS based on marginal distributions: Alternatively, one can define a PRS based on the marginal distribution of the error e^x in each dimension, which therefore scales with the state dimension. We use Boole's inequality (16) with Remark 2 directly on the marginal distributions in each dimension of the state to define a box-shaped PRS of probability level p^x

R^x(Σ^x_i) = {e | |[e]_j| ≤ φ^{-1}(p̃) √([Σ^x_i]_{j,j}) ∀ j = 1, . . . , n_x}

with p̃ = 1 − (1/n_x − (p^x+1)/(2 n_x)). To compute the Pontryagin difference (15) we make use of the following Lemma.

Lemma 1. Let A = {x | Hx ≤ b} be a polytope and B = {e | −r ≤ e ≤ r}, r ∈ R^n_+, a box in R^n. Then A ⊖ B = {x | Hx ≤ b − |H| r}, where |H| is the element-wise absolute value.

Proof. From the definition of the Pontryagin difference we have

A ⊖ B = {x | x + e ∈ A ∀ e ∈ B}
      = {x | H(x + e) ≤ b ∀ e ∈ B}
      = {x | [H]_{j,·} x ≤ [b]_j − max_{e∈B} [H]_{j,·} e ∀ j = 1, . . . , n}
      = {x | [H]_{j,·} x ≤ [b]_j − |[H]_{j,·}| r ∀ j = 1, . . . , n}
      = {x | Hx ≤ b − |H| r} ,

proving the result.

The resulting tightened set is therefore

Z^p(Σ^x_i) = {z | Hz ≤ b̃(Σ^x_i)}

with b̃(Σ^x_i) = b − |H| φ^{-1}(p̃) √diag(Σ^x_i), where diag(·) is the vector of diagonal elements and the square root of the vector is taken element-wise.
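The box-shaped PRS and the tightening of Lemma 1 can be sketched as follows (the 2-state covariance and constraints are hypothetical); a Monte Carlo check confirms that the box indeed has probability level at least p^x:

```python
import numpy as np
from scipy.stats import norm

# hypothetical 2-state example (numbers not from the paper)
H = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.ones(4)
Sigma = np.diag([0.01, 0.04])
px, nx = 0.9, 2

p_marg = 1.0 - (1.0 / nx - (px + 1.0) / (2.0 * nx))   # quantile level per marginal
r = norm.ppf(p_marg) * np.sqrt(np.diag(Sigma))        # box half-lengths
b_tight = b - np.abs(H) @ r                           # Lemma 1: b - |H| r

# Monte Carlo check that the box is a PRS of level at least px
rng = np.random.default_rng(2)
e = rng.multivariate_normal(np.zeros(nx), Sigma, 200_000)
print((np.abs(e) <= r).all(axis=1).mean())            # >= 0.9
```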

Remark 3. Treating polytopic constraints through (16) can lead to undesired and conservative individual constraints [33]. Practically, it can therefore be beneficial to instead constrain the probability of violating each individual half-space constraint.

E. Cost Function

Given the approximate joint normal distribution of state and input, the cost function (9a) can be evaluated using standard stochastic formulations. For simplicity, we focus here on costs in the form of expected values, which encompass most stochastic objectives typically considered. While the expected value of general cost functions needs to be computed numerically, there exist a number of functions for which the evaluation based on mean and variance information can be expressed analytically and is computationally cheap. The most prominent example for tracking tasks is a quadratic cost on states and inputs

l(x_i − x^r_i, u_i − u^r_i) = ‖x_i − x^r_i‖²_Q + ‖u_i − u^r_i‖²_R   (17)

with appropriate weight matrices Q and R, typically satisfying Q ⪰ 0 and R ≻ 0, resulting in

E(l(x_i − x^r_i, u_i − u^r_i)) = ‖µ^x_i − x^r_i‖²_Q + tr(QΣ^x_i) + ‖µ^u_i − u^r_i‖²_R + tr(RΣ^u_i) .
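The expected-value expression for the quadratic state cost can be checked numerically; the weights, mean and covariance below are made up for the illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical mean tracking error, covariance and weight matrix (2 states)
Q = np.array([[2.0, 0.0], [0.0, 1.0]])
mu = np.array([0.5, -1.0])
Sigma = np.array([[0.3, 0.1], [0.1, 0.2]])

analytic = mu @ Q @ mu + np.trace(Q @ Sigma)          # ||mu||_Q^2 + tr(Q Sigma)
x = rng.multivariate_normal(mu, Sigma, 500_000)
monte_carlo = np.einsum('ni,ij,nj->n', x, Q, x).mean()
print(analytic, monte_carlo)                          # the two agree closely
```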

Further examples include a saturating cost [8], risk-sensitive costs [34] or radial basis function networks [17].

We refer to the evaluation of the expected value (9a) in terms of mean and variance as

E(l(x_i − x^r_i, u_i − u^r_i)) = c(µ^x_i − x^r_i, µ^u_i − u^r_i, Σ^x_i)

and similarly c_f(µ^x_N − x^r_N, Σ^x_N) for the terminal cost.

Remark 4. While many performance indices can be expressed as expected values, there exist various other stochastic cost measures, such as the conditional value-at-risk. Given the approximate state distributions, many of these can be similarly considered.

F. Tractable MPC Formulation with GP Model

By bringing together the approximations of the previous sections, the following tractable approximation of the MPC problem (9) can be derived:

min_{µ^u_i}  c_f(µ^x_N − x^r_N, Σ^x_N) + Σ_{i=0}^{N−1} c(µ^x_i − x^r_i, µ^u_i − u^r_i, Σ^x_i)

s.t.  µ^x_{i+1} = f(µ^x_i, µ^u_i) + B_d µ^d_i ,
      Σ^x_{i+1} = [∇f(µ^x_i, µ^u_i)  B_d] Σ_i [∇f(µ^x_i, µ^u_i)  B_d]^T ,
      µ^x_{i+1} ∈ Z(Σ^x_{i+1}) ,
      µ^u_i ∈ V(Σ^x_i) ,
      µ^d_i, Σ_i according to (10) and (12), (13) or (14),
      µ^x_0 = x(k) , Σ^x_0 = 0 ,                          (18)

for i = 0, . . . , N − 1. The resulting optimal control law is obtained in a receding horizon fashion as κ(x(k)) = µ^{u*}_0, where µ^{u*}_0 is the first element of the optimal control sequence {µ^{u*}_0, . . . , µ^{u*}_{N−1}} obtained from solving (18) at state x(k).
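To make the mean and variance updates in the constraints of (18) concrete, the following sketch rolls them out for a toy model. All functions and numbers are stand-ins, and the cross-covariance between state and GP output is neglected here, which roughly corresponds to the simplest of the approximations (12)-(14):

```python
import numpy as np

def propagate(mu_x, mu_u, Sigma_x, f, jac_f, Bd, gp_mean, gp_var, Sigma_w):
    # one prediction step: mean via nominal model plus GP mean,
    # variance via the linearized map A = [grad f, Bd]
    mu_d = gp_mean(mu_x, mu_u)
    mu_next = f(mu_x, mu_u) + Bd @ mu_d
    A = np.hstack([jac_f(mu_x, mu_u), Bd])
    Sigma = np.block([
        [Sigma_x, np.zeros((len(mu_x), len(mu_d)))],
        [np.zeros((len(mu_d), len(mu_x))), gp_var(mu_x, mu_u) + Sigma_w],
    ])
    return mu_next, A @ Sigma @ A.T

# toy 2-state linear nominal model with a 1-d GP correction on the velocity
f = lambda x, u: np.array([x[0] + 0.1 * x[1], 0.9 * x[1] + 0.1 * u[0]])
jac_f = lambda x, u: np.array([[1.0, 0.1], [0.0, 0.9]])
Bd = np.array([[0.0], [1.0]])
gp_mean = lambda x, u: np.array([0.05 * x[1]])
gp_var = lambda x, u: np.array([[0.01]])

mu, S = np.array([0.0, 1.0]), np.zeros((2, 2))
for _ in range(3):        # roll out three prediction steps with zero input
    mu, S = propagate(mu, np.array([0.0]), S, f, jac_f, Bd,
                      gp_mean, gp_var, np.array([[0.001]]))
print(mu, np.diag(S))
```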

The presented formulation of the MPC problem with a GP-based model results in a non-convex optimization problem. Assuming twice differentiable kernel and prior mean functions, second-order derivative information of all quantities is available. Problems of this form can typically be solved to local optima using sequential quadratic programming (SQP) or nonlinear interior-point methods [35]. There exist a number of general-purpose solvers that can be applied to this class of optimization problems, e.g. the openly available IPOPT [36]. In addition, there are specialized solvers that exploit the structure of predictive control problems, such as ACADO [37] or FORCES Pro [38], [39], which implements a nonlinear interior-point method. Derivative information can be generated automatically using automatic differentiation tools, such as CasADi [40].

In the following, we demonstrate the proposed algorithm and its properties using one simulation and one experimental example. The first is an illustrative example of an autonomous underwater vehicle (AUV) which, around a trim point, is well described by nominal linear dynamics but is subject to nonlinear friction effects for larger deviations. The second example is a hardware implementation demonstrating the approach for autonomous racing of miniature race cars, which are described by a nonlinear nominal model. While model complexity and small sampling times require additional approximations in this


second example, we show that we can leverage key benefits of the proposed approach in a highly demanding real-world application.

IV. ONLINE LEARNING FOR AUTONOMOUS UNDERWATER VEHICLE

We consider the depth control of an autonomous underwater vehicle (AUV). The vehicle is described by a nonlinear continuous-time model at constant surge velocity, which serves as ground truth for simulation purposes [41]. Around a trim point of purely horizontal movement, the states and input of the system are

[x]_1 the pitch angle relative to trim point [rad],
[x]_2 the heave velocity relative to trim point [m/s],
[x]_3 the pitch velocity [rad/s],
u the stern rudder deflection around trim point [rad].

We assume that an approximate linear system model is

given, which in practice can be established using methods of linear system identification around the trim point of the system. Using a zero-order hold discretization of the linear part of the model with T_s = 100 ms we obtain

x(k+1) = Ax(k) + Bu(k) + B_d (g(x(k)) + w(k)) ,

which is in the form of (1). The nonlinearity results from friction effects when moving through the water, which is therefore modeled as only affecting the (continuous-time) velocity states,

B_d = [ 0     T_s²/2
        T_s   0
        0     T_s ] ,   g(x(k)) = g([x(k)]_2, [x(k)]_3) : R² → R² .

Note that the eigenvalues of A are given by λ = {1.1, 1.03, 1, 0.727}, i.e. the linear system has one integrating and two unstable modes.

In the considered scenario, the goal is to track reference changes from the original zero set point to a pitch angle of 30◦, back to zero, and then to 45◦. We furthermore consider a safety state constraint on the pitch angle of at least 10◦ below the reference, as well as input constraints corresponding to ±20◦ rudder deflection. In this example, we treat the case of online learning, that is, we start without any data about the nonlinearity g, collect measurement data during operation, and enhance performance online.

A. GP-based Reference Tracking Controller

GP data D = {y, z} is generated by calculating the deviation of the linear model from the measured states, as described in Section II-C, where the input data is reduced to the velocity states z_j = [[x_j]_2, [x_j]_3]^T. The data is continuously updated during the closed-loop run. Specifically, we consider a new data point every 5 time steps, and keep track of 30 data points by discarding the oldest when adding a new point. The training data is initialized with 30 data points of zero input and zero output. We employ a squared exponential kernel (3) for both output dimensions with fixed hyperparameters L_1 = L_2 = diag([0.35, 0.15]^T), and variances σ²_{f,1} = 0.04 and σ²_{f,2} = 0.25.
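For illustration, a squared exponential kernel with these hyperparameters can be evaluated as below; note that the exact parameterization of (3) is not reproduced in this excerpt, so the ARD form and the interpretation of L as a lengthscale matrix are our assumptions:

```python
import numpy as np

def se_kernel(z1, z2, L, sf2):
    # squared exponential kernel in ARD form (assumed convention):
    # k(z1, z2) = sf2 * exp(-0.5 * (z1-z2)^T L^{-2} (z1-z2))
    d = (z1 - z2) / np.diag(L)
    return sf2 * np.exp(-0.5 * d @ d)

# hyperparameters from the AUV example (first output dimension)
L = np.diag([0.35, 0.15])
k = se_kernel(np.array([0.1, 0.02]), np.array([0.0, 0.0]), L, sf2=0.04)
print(k)
```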

Fig. 4: Predicted pitch angle [x]_1 and rudder deflection u over time [s]. Dotted lines are the mean prediction and shaded the 2-σ confidence region. The dashed lines show the corresponding state and input constraints, while the black line is the pitch angle reference [x^r]_1.

We use a quadratic stage cost as in (17), with weight matrices Q = diag([1, 0, 10, 0.5]^T), R = 20 and a prediction horizon N = 35 to track a pitch angle reference. The terminal cost P is chosen according to the solution of the associated discrete-time algebraic Riccati equation, i.e. the LQR cost. As ancillary linear controller K_i, an infinite-horizon LQR control law based on the linear nominal model is designed using the same weights as in the MPC and used in all prediction steps i. This stabilizes the linear system and reduces uncertainty growth over the prediction horizon. To propagate the uncertainties associated with the GP, we make use of the Taylor approximation outlined in Section III-B2. Constraints are introduced based on Remark 3, considering a maximum probability of individual constraint violation of 2.28%, corresponding to a 2-σ confidence bound. The MPC optimization problem (18) is solved using FORCES Pro [38], [39].
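As a quick sanity check (our own, not from the paper), the stated individual violation probability of 2.28% indeed corresponds to the 2-σ quantile of a standard Gaussian:

```python
from scipy.stats import norm

# one-sided violation level of 2.28% <-> 2-sigma bound
z2 = norm.ppf(1 - 0.0228)
print(z2)   # ≈ 2.0
```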

B. Results

Fig. 4 shows the prediction of the GP-based MPC for the first reference change from 0◦ to 30◦. Since no data was collected on the state trajectory necessary for this change, the predicted state and input trajectories are uncertain and a safety margin to the state constraint at the end of the horizon is enforced. The resulting closed-loop trajectory, during which the system learns from online data, is displayed in the top plot of Fig. 5a. For comparison, we run the same simulation with a soft-constrained linear MPC formulation with the same cost function, which does not consider nonlinearities or uncertainties in the dynamics, shown in the bottom plot. The results demonstrate improved performance of the GP-based MPC over the linear MPC, especially with regard to constraint satisfaction. The constraint on the minimum pitch angle is violated under the linear MPC control law during both reference changes, even though the soft constraint is chosen as an exact penalty function such that constraints are always satisfied if possible. Fig. 5b exemplifies the residual model error during the closed-loop simulation applying the GP-based controller, as well as the predicted 2-σ residual error bound of the GP. We observe the largest errors during reference changes, which are anticipated by the GP uncertainty, overall matching the resulting residual errors well.


(a) GP-based MPC (top) and linear MPC (bottom): pitch angle [x]_1 and input u over time [s]. The solid gray line shows the reference value of the pitch angle; dashed lines in the respective colors show the state and input constraints.

(b) Residual model error with GP-based MPC. Measured model error e(k) over time [s] as black dots, 2-σ residual model uncertainty from the GP shaded blue.

Fig. 5: Simulation results of the GP-based MPC for the autonomous underwater vehicle in an online learning scenario.

V. AUTONOMOUS RACING

As a second example we consider an autonomous racing scenario, in which the goal is to drive a car around a track as quickly as possible while keeping the vehicle safe, i.e. avoiding collisions with the track boundaries. The controller is based on a model predictive contouring control formulation [42], [43], which has been applied to the problem of autonomous racing in [44]. Preliminary simulation results were published in [28].

A. Car Dynamics

The race cars are modeled by continuous-time nominal dynamics ẋ = f_c(x, u) obtained from a bicycle model with nonlinear tire forces given by a simplified Pacejka tire model [45], which results in the following states and inputs

x = [X, Y, Φ, v_x, v_y, ω]^T ,   u = [p, δ]^T ,

with position x^{XY} = [X, Y]^T, orientation Φ, longitudinal and lateral velocities v_x and v_y, and yaw rate ω. The inputs to the system are the motor duty cycle p and the steering angle δ. For details on the system modeling please refer to [44], [28].

For use in the MPC formulation, we discretize the system using a Runge-Kutta method with a sampling time of T_s = 20 ms. In order to account for model mismatch due to inaccurate parameter choices and the limited fidelity of this simple model, we add g(x, u) capturing unmodeled dynamics, as well as additive Gaussian white noise w. Due to the structure of the nominal model, i.e. since the dynamics of the first three states are given purely by kinematic relationships, we assume that the model uncertainty, as well as the process noise w, only affect the velocity states v_x, v_y and ω of the system. From physical considerations, we can also infer that the unmodeled dynamics do not depend on the position states, i.e. we assume

B_d = [0 I]^T ,   g(x, u) = g(v_x, v_y, ω, p, δ) : R⁵ → R³ ,

resulting in

x(k+1) = f(x(k), u(k)) + B_d (g(x(k), u(k)) + w(k)) .

The system is subject to input constraints U: the steering angle is limited to lie in ±δ_max and the duty cycle has to lie in [−0.1, 1], where negative values correspond to negative applied torques, i.e. braking. Additionally, operation requires the vehicle to stay on the track, which is expressed as a state constraint X on the car's position.

B. GP-based Racing Controller

We consider a race track given by its centerline and a fixed track width. The centerline is described by a piecewise cubic spline polynomial, which is parameterized by the path length Θ. Progress along the track is characterized by Θ_i, which is introduced as an additional state and enters the considered cost function linearly, encouraging a maximization of progress along the track. Given Θ_i, we can evaluate the corresponding centerline position C(Θ_i) = [X_c(Θ_i), Y_c(Θ_i)]^T, such that the constraint for the car to stay within the track boundaries can be expressed as

X(Θ_i) := {x^{XY} | ‖x^{XY} − C(Θ_i)‖ ≤ r} ⊂ R² ,

where r is half the track width.

Based on the approximate Gaussian distributions of the state in prediction, we have for the position error e^{XY}_i = x^{XY}_i − µ^{XY}_i ∼ N(0, Σ^{XY}_i) and define an ellipsoidal PRS as

R^{ell}(Σ^{XY}_i) = {e^{XY} | (e^{XY})^T (Σ^{XY}_i)^{-1} e^{XY} ≤ χ²₂(p^x)} ,

where χ²₂(p^x) is the quantile function of the chi-squared distribution with two degrees of freedom. In order to simplify the constraint tightening of X(Θ_i), we find an outer approximation of R^{ell} by a ball R^b ⊇ R^{ell} as

R^b(Σ^{XY}_i) = {e^{XY} | ‖e^{XY}‖ ≤ √(λ_max(Σ^{XY}_i) χ²₂(p^x))} ,

Fig. 6: Illustration of the constraint tightening procedure. The effective track radius is adjusted based on the predicted position uncertainty.


Fig. 7: Comparison of racelines with the nominal controller (a) and the GP-based controller (b). The color indicates the 2-norm of the velocity [m/s]; the red cross is the starting point of the race car.

where λ_max(·) is the maximum eigenvalue, which can be readily computed since Σ^{XY}_i is a 2-by-2 matrix. The necessary constraint tightening can therefore be expressed as

Z(Θ_i, Σ^{XY}_i) = X(Θ_i) ⊖ R^b(Σ^{XY}_i)   (19)
                 = {z^{XY} | ‖z^{XY} − C(Θ_i)‖ ≤ r̃(Σ^{XY}_i)} ,

where r̃(Σ^{XY}_i) = r − √(χ²₂(p^x) λ_max(Σ^{XY}_i)). Fig. 6 exemplifies the predicted evolution of the car's position and the resulting constraint tightening.
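The effective track radius can be computed in a few lines; the position covariance and half track width below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2

def tightened_track_radius(r, Sigma_xy, px):
    """Effective track radius r~ = r - sqrt(chi2_2(px) * lambda_max(Sigma_xy)),
    using the ball outer approximation of the ellipsoidal position PRS."""
    lam_max = np.linalg.eigvalsh(Sigma_xy)[-1]   # largest eigenvalue of 2x2 cov.
    return r - np.sqrt(chi2.ppf(px, df=2) * lam_max)

# hypothetical position covariance [m^2] and half track width [m]
Sigma_xy = np.array([[0.004, 0.001], [0.001, 0.002]])
r_tilde = tightened_track_radius(r=0.25, Sigma_xy=Sigma_xy, px=0.9)
print(r_tilde)
```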

The state is extended by the previously applied inputs, and large input changes are additionally penalized. We make use of the Taylor approximation (13) to propagate uncertainties without the use of an ancillary state feedback controller, i.e. K = 0. The prediction horizon is chosen as N = 30 and we formulate the chance constraints (19) with χ²₂(p^x) = 1. To reduce the conservatism of the controller, constraints are only tightened for the first 20 prediction steps and are applied to the mean for the remainder of the prediction horizon, similar to the method used in [46]. We reduce computation times by making use of the dynamic sparse approximation with 10 inducing points as outlined in Section III-C, placing the inducing inputs regularly along the previous solution trajectory.

To ensure real-time feasibility of the approach for the sampling time of 20 ms, two additional approximations are applied. The variance dynamics are pre-evaluated based on the previous MPC solution, which enables the pre-computation of the state constraints (19) such that they remain fixed during optimization. Additionally, we neglect the mean prediction of the lateral velocity error v_y, since this state is difficult to estimate reliably and the error is generally small, such that we observed no improvement when including it in the control formulation.

C. Results

We start out racing the car using the nominal controller, i.e. the controller without an added GP term, which therefore does not consider uncertainties for constraint tightening. Since the nominal model is not well tuned, the driving behavior is somewhat erratic and there are a number of small collisions with the track boundaries. This can be observed in the racelines plotted in Fig. 7a, showing 20 laps run with the nominal controller. Using the collected data, we train the GP error model d using 325 data points. We infer the hyperparameters in (3), as well as the noise level Σ_w, using maximum likelihood optimization. The resulting racelines of 20 laps with the GP-based controller are displayed in Fig. 7b, generally showing a much more consistent and safe racing behavior. In particular, it can be seen that almost all of the systematic and persistent problems in the raceline of the nominal controller can be alleviated.

Fig. 8 shows the encountered dynamics error in the yaw rate and the predicted error during the first lap with the sparse GP-based controller. The mean and residual uncertainty predicted by the GP match the encountered errors well. It is important to note that the apparent volatility in the plot is not due to overfitting, but due to fast changes in the input, and matches the validation data, i.e. the measured errors. To quantify the performance of the proposed controllers, we compare in Table I the average lap time T̄_l, the minimum lap time T_{l,min}, as well as the average 2-norm error of the system dynamics ‖e‖, i.e. the difference between the mean state after one prediction

Fig. 8: Dynamic sparse GP compensation of the yaw-rate error [rad/s] with 10 inducing inputs during the first race lap. The black dots show the measured error on the yaw rate at each time step, while the blue line shows the error predicted by the GP. The shaded region is the 2-σ confidence interval.


TABLE I: Experimental results

Controller   T̄_l [s]   T_{l,min} [s]   ‖e‖ [−]   T̄_c [ms]   T_c < 20 ms
Nominal      10.32     9.65            0.73      18.2       76.3%
GP-based     9.61      9.27            0.33      17.2       99.8%

step and the realized state, e(k+1) = µ^x_1 − x(k+1). We see that the GP-based controller improves significantly on all these quantities, with an average lap time improvement of 0.71 s, or almost 7%, which constitutes a large improvement in the considered racing task. This is in part due to the improved system model, as evident in the average dynamics error ‖e‖, but also due to the cautious nature of the controller, which helps to further reduce collisions and large problems in the raceline. Due to this caution, the minimum lap time gains are slightly less pronounced. In fact, the nominal controller consistently displays higher top speeds, which often, however, lead to significant problems at the braking point before a slow corner. Computation times are reported as average solve times T̄_c and the percentage of solutions obtained in under 20 ms, T_c < 20 ms. The average solution times of the nominal and GP-based controller are similar. The percentages of solutions in under 20 ms, however, differ significantly. This is mainly due to frequent large re-planning actions occurring with the nominal controller.

The results therefore demonstrate that the presented GP-based controller can significantly improve performance while maintaining safety in a hardware implementation on a complex system with small sampling times.

VI. CONCLUSION

The paper discussed the use of Gaussian process regression to learn nonlinearities for improved performance in model predictive control. Combining GP dynamics with a nominal system allows for learning only parts of the dynamics, which is key for keeping the required number of data points and the computational complexity of the GP regression feasible for online control. Approximation methods for the propagation of the state distribution over the prediction horizon were reviewed and it was shown how this enables a principled treatment of chance constraints on both states and inputs.

A simulation example as well as a hardware implementation have shown how the proposed formulations provide cautious control with improved performance for medium-sized systems with low sampling times. In particular, we have demonstrated in experiments that both performance and safety in an autonomous racing setting can be significantly improved by using cautious data-driven techniques.

ACKNOWLEDGEMENT

We would like to thank the Automatic Control Laboratory (IfA) at ETH Zurich and in particular Alexander Liniger for his valuable input and support with the hardware implementation on the miniature race cars.

REFERENCES

[1] K. J. Hunt, D. Sbarbaro, R. Zbikowski, and P. J. Gawthrop, “Neuralnetworks for control systems-A survey,” Automatica, vol. 28, no. 6, pp.1083–1112, 1992.

[2] D. Nguyen-Tuong and J. Peters, “Model learning for robot control: asurvey,” Cognitive Processing, vol. 12, no. 4, pp. 319–340, Nov. 2011.

[3] O. Sigaud, C. Salan, and V. Padois, “On-line regression algorithmsfor learning mechanical models of robots: A survey,” Robotics andAutonomous Systems, vol. 59, no. 12, pp. 1115–1129, 2011.

[4] G. Pillonetto, F. Dinuzzo, T. Chen, G. De Nicolao, and L. Ljung,“Kernel methods in system identification, machine learning and functionestimation: A survey,” Automatica, vol. 50, no. 3, pp. 657–682, 2014.

[5] E. D. Klenske, M. N. Zeilinger, B. Scholkopf, and P. Hennig, “Gaussianprocess-based predictive control for periodic error correction,” IEEETransactions on Control Systems Technology, vol. 24, no. 1, pp. 110–121, 2016.

[6] J. Quionero-Candela, A. Girard, and C. Rasmussen, “Prediction at anuncertain input for Gaussian processes and relevance vector machines -application to multiple-step ahead time-series forecasting,” Max PlanckInstitute for Biological Cybernetics, Tubingen, Germany, Tech. Rep.IMM-2003-18, 2003.

[7] M. Kuß, “Gaussian process models for robust regression, classification,and reinforcement learning,” Ph.D. dissertation, Technische UniversitatDarmstadt, Darmstadt, 2006.

[8] M. Deisenroth, “Efficient reinforcement learning using Gaussian pro-cesses,” Ph.D. dissertation, Karlsruhe Institute for Technology, Karl-sruhe, 2010.

[9] J. Kocijan, R. Murray-Smith, C. E. Rasmussen, and A. Girard, “Gaussianprocess model based predictive control,” in American Control Confer-ence, Jun. 2004, pp. 2214–2219.

[10] A. Grancharova, J. Kocijan, and T. A. Johansen, “Explicit stochasticpredictive control of combustion plants based on Gaussian processmodels,” Automatica, vol. 44, no. 6, pp. 1621–1631, 2008.

[11] G. Cao, E. M.-k. Lai, and F. Alam, “Gaussian process model predic-tive control of unmanned quadrotors,” in International Conference onControl, Automation and Robotics, 2016.

[12] X. Yang and J. M. Maciejowski, “Fault tolerant control using Gaussianprocesses and model predictive control,” International Journal of Ap-plied Mathematics and Computer Science, vol. 25, no. 1, pp. 133–148,2015.

[13] C. J. Ostafew, A. P. Schoellig, T. D. Barfoot, and J. Collier, “Learning-based nonlinear model predictive control to improve vision-based mobilerobot path tracking,” Journal of Field Robotics, vol. 33, no. 1, pp. 133–152, 2016.

[14] C. J. Ostafew, A. P. Schoellig, and T. D. Barfoot, “Robust constrainedlearning-based NMPC enabling reliable mobile robot path tracking,” TheInternational Journal of Robotics Research, vol. 35, no. 13, pp. 1547–1563, 2016.

[15] G. Cao, E. M.-K. Lai, and F. Alam, “Gaussian process model predictivecontrol of unknown non-linear systems,” IET Control Theory & Appli-cations, vol. 11, pp. 703–713(10), 2017.

[16] Y. Wang, C. Ocampo-Martinez, and V. Puig, “Stochastic model pre-dictive control based on Gaussian processes applied to drinking waternetworks,” IET Control Theory Applications, vol. 10, no. 8, pp. 947–955,2016.

[17] S. Kamthe and M. P. Deisenroth, “Data-efficient reinforcement learningwith probabilistic model predictive control,” in International Conferenceon Artificial Intelligence and Statistics, 2018.

[18] L. Hewing and M. N. Zeilinger, “Stochastic model predictive controlfor linear systems using probabilistic reachable sets,” Conference onDecision and Control, 2018, accepted.

[19] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machinelearning. The MIT Press, 2006.

[20] J. Quinonero-Candela, C. E. Rasmussen, and C. K. Williams, “Approx-imation methods for Gaussian process regression,” Microsoft Research,Tech. Rep. MSR-TR-2007-124, 2007.

[21] J. Quinonero-Candela, C. E. Rasmussen, and R. Herbrich, “A unifyingview of sparse approximate Gaussian process regression,” Journal ofMachine Learning Research, vol. 6, pp. 1935–1959, 2005.

[22] E. Snelson and Z. Ghahramani, “Sparse gaussian processes usingpseudo-inputs,” in Advances in Neural Information Processing Systems,Y. Weiss, B. Scholkopf, and J. C. Platt, Eds., 2006, pp. 1257–1264.

[23] V. Tresp, “A Bayesian committee machine,” Neural Computation, vol. 12, no. 11, pp. 2719–2741, Nov. 2000.


[24] A. Bemporad and M. Morari, “Robust model predictive control: A survey,” Robustness in Identification and Control, vol. 245, pp. 207–226, 1999.

[25] J. Rawlings and D. Mayne, Model Predictive Control: Theory and Design. Nob Hill Pub., 2009.

[26] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, “Learning-based model predictive control for safe exploration and reinforcement learning,” arXiv:0707.3168, 2018.

[27] A. Girard, C. E. Rasmussen, and R. Murray-Smith, “Gaussian process priors with uncertain inputs: Multiple-step-ahead prediction,” Department of Computing Science, University of Glasgow, Glasgow, Tech. Rep., 2002.

[28] L. Hewing, A. Liniger, and M. N. Zeilinger, “Cautious NMPC with Gaussian process dynamics for miniature race cars,” European Control Conference, 2018.

[29] E. Kofman, J. A. De Doná, and M. M. Seron, “Probabilistic set invariance and ultimate boundedness,” Automatica, vol. 48, no. 10, pp. 2670–2676, 2012.

[30] L. Hewing, A. Carron, K. Wabersich, and M. N. Zeilinger, “On a correspondence between probabilistic and robust invariant sets for linear systems,” European Control Conference, 2018.

[31] Z. Zhou and R. Cogill, “Reliable approximations of probability-constrained stochastic linear-quadratic control,” Automatica, vol. 49, no. 8, pp. 2435–2439, 2013.

[32] G. Schildbach, P. Goulart, and M. Morari, “Linear controller design for chance constrained systems,” Automatica, vol. 51, pp. 278–284, 2015.

[33] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, “Constraint-tightening and stability in stochastic model predictive control,” IEEE Transactions on Automatic Control, vol. 62, no. 7, pp. 3165–3177, 2017.

[34] P. Whittle, “Risk-sensitive linear/quadratic/Gaussian control,” Advances in Applied Probability, vol. 13, no. 4, pp. 764–777, 1981.

[35] M. Diehl, H. J. Ferreau, and N. Haverbeke, “Efficient numerical methods for nonlinear MPC and moving horizon estimation problem formulation,” Nonlinear Model Predictive Control, pp. 391–417, 2009.

[36] A. Wächter and L. T. Biegler, “On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming,” Mathematical Programming, vol. 106, no. 1, pp. 25–57, 2006.

[37] B. Houska, H. J. Ferreau, and M. Diehl, “ACADO Toolkit – An Open Source Framework for Automatic Control and Dynamic Optimization,” Optimal Control Applications and Methods, vol. 32, no. 3, pp. 298–312, 2011.

[38] A. Domahidi and J. Jerez, “FORCES Professional,” embotech GmbH (http://embotech.com/FORCES-Pro), Jul. 2014.

[39] A. Zanelli, A. Domahidi, J. Jerez, and M. Morari, “FORCES NLP: an efficient implementation of interior-point methods for multistage nonlinear nonconvex programs,” International Journal of Control, 2017.

[40] J. Andersson, “A general-purpose software framework for dynamic optimization,” Ph.D. dissertation, KU Leuven, 2013.

[41] M. S. Naik and S. N. Singh, “State-dependent Riccati equation-based robust dive plane control of AUV with control constraints,” Ocean Engineering, vol. 34, no. 11–12, pp. 1711–1723, 2007.

[42] T. Faulwasser, B. Kern, and R. Findeisen, “Model predictive path-following for constrained nonlinear systems,” Conference on Decision and Control and Chinese Control Conference, pp. 8642–8647, 2009.

[43] D. Lam, C. Manzie, and M. Good, “Model predictive contouring control,” in Conference on Decision and Control, 2010, pp. 6137–6142.

[44] A. Liniger, A. Domahidi, and M. Morari, “Optimization-based autonomous racing of 1:43 scale RC cars,” Optimal Control Applications and Methods, vol. 36, no. 5, pp. 628–647, 2015.

[45] H. B. Pacejka and E. Bakker, “The magic formula tyre model,” Vehicle System Dynamics, vol. 21, no. sup001, pp. 1–18, 1992.

[46] J. V. Carrau, A. Liniger, X. Zhang, and J. Lygeros, “Efficient implementation of randomized MPC for miniature race cars,” European Control Conference, pp. 957–962, 2016.