EECE 574 - Adaptive Control
Basics of System Identification
Guy Dumont
Department of Electrical and Computer Engineering
University of British Columbia
January 2012
Guy Dumont (UBC) EECE574 - Basics of System Identification 1 / 40
Identification for Adaptive Control
Introduction
Models for identification
Identification methods
Recursive identification
Tracking of time-varying parameters
Pseudo-random binary sequences (PRBS)
Identification in closed loop
Recommended textbook on identification:
L. Ljung, System Identification: Theory for the User, 2nd edition, Prentice Hall, 1999
Introduction
System Identification
This is the field of building a dynamic model of a process from experimental data. It consists of the following:

Experimental planning
Selection of model structure
Parameter estimation
Validation
Experimental Design
Experiments are difficult and costly
Input signal should excite all modes
Problematic in adaptive control with no active learning
Closed-loop identifiability helped by:

Nonlinear or time-varying feedback
Setpoint changes
Models for Identification
Models are really approximations of the real system
There is NO unique model to describe a process
Structure derived from prior knowledge and purpose of model
Natural to use a general linear system, called a black-box model

Impulse response model
Step response model
Transfer function model
General stochastic model
State-space model
Laguerre model
A few words of caution:
Do NOT fall in love with your model...
When map disagrees with Nature, TRUST Nature
Swedish Army Manual
The Impulse Response Model
One of the simplest:
y(k) = ∑_{j=0}^{∞} h_j u(k − j − 1)

h_j is the j-th element of the impulse response of the process. Assuming that the impulse response asymptotically goes to zero, it may be truncated after the n_h-th element:

y(k) = ∑_{j=0}^{n_h} h_j u(k − j − 1)
This is called a Finite Impulse Response (FIR) model and can only be used for stable processes. Used mainly in adaptive filtering.
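The FIR convolution above is easy to sketch numerically. This is a minimal illustration, not from the slides; the impulse response coefficients are made up for the example:

```python
import numpy as np

# Hypothetical truncated impulse response (n_h = 3) of a stable process
h = np.array([0.5, 0.3, 0.15, 0.05])

def fir_output(h, u, k):
    """y(k) = sum_{j=0}^{n_h} h_j u(k - j - 1); inputs before time 0 are zero."""
    y = 0.0
    for j, hj in enumerate(h):
        idx = k - j - 1
        if idx >= 0:
            y += hj * u[idx]
    return y

u = np.ones(20)                              # unit step input
y = [fir_output(h, u, k) for k in range(20)]
# After the transient, y settles at sum(h), the steady-state gain
```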
The Step Response Model
Also one of the simplest:

y(k) = ∑_{j=0}^{n_s} s_j ∆u(k − j − 1)

s_j is the j-th element of the step response of the process. ∆ is the differencing operator:

∆u(k) = u(k) − u(k − 1) = (1 − q⁻¹)u(k)

Here q is the forward-shift operator, i.e. qy(k) = y(k + 1). Conversely, q⁻¹y(k) = y(k − 1).

This is the Finite Step Response (FSR) model used in Dynamic Matrix Control (DMC). It can only be used for stable processes.
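Since the step response is the running sum of the impulse response, the FSR convolution can be sketched the same way. A minimal illustration (the coefficients are made up, not from the slides):

```python
import numpy as np

h = np.array([0.5, 0.3, 0.15, 0.05])   # illustrative impulse response
s = np.cumsum(h)                        # step response: s_j = h_0 + ... + h_j

def fsr_output(s, u, k):
    """y(k) = sum_{j=0}^{n_s} s_j du(k - j - 1), with du(k) = u(k) - u(k-1)."""
    du = np.diff(u, prepend=0.0)        # u(-1) taken as zero
    y = 0.0
    for j, sj in enumerate(s):
        idx = k - j - 1
        if 0 <= idx < len(du):
            y += sj * du[idx]
    return y

u = np.ones(10)                         # unit step applied at k = 0
# du is then 1 at k = 0 and 0 afterwards, so y(k) = s_{k-1}
```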
The Transfer Function Model
This is the classical discrete-time transfer function model:

y(k) = [B(q⁻¹)/A(q⁻¹)] q⁻ᵈ u(k − 1)

where d is the time delay of the process in sampling intervals (d ≥ 0) and the polynomials A and B are given by:

A(q⁻¹) = 1 + a₁q⁻¹ + · · · + a_{nA} q⁻ⁿᴬ
B(q⁻¹) = b₀ + b₁q⁻¹ + · · · + b_{nB} q⁻ⁿᴮ
This can be used for both stable and unstable processes. Note that both FIR and FSR models can be seen as subsets of the transfer function model. This model requires fewer parameters than FIR or FSR, but assumptions about the orders nA and nB and the time delay d must be made.
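A first-order transfer function model can be simulated directly from its difference equation. The coefficients and delay below are illustrative (A = 1 − 0.8q⁻¹, B = 0.2, d = 1), not taken from the slides:

```python
import numpy as np

a1, b0, d = -0.8, 0.2, 1   # A = 1 + a1 q^-1, B = b0, delay d (illustrative)

def simulate(u):
    """A(q^-1) y(k) = B(q^-1) q^-d u(k-1), i.e. y(k) = -a1 y(k-1) + b0 u(k-d-1)."""
    y = np.zeros(len(u))
    for k in range(len(u)):
        past_y = y[k-1] if k >= 1 else 0.0
        past_u = u[k-d-1] if k - d - 1 >= 0 else 0.0
        y[k] = -a1 * past_y + b0 * past_u
    return y

y = simulate(np.ones(200))   # unit step response
# Steady-state gain is B(1)/A(1) = 0.2 / (1 - 0.8) = 1.0
```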
The General Stochastic Model
This is better known as the Box-Jenkins model, and describes both the process and the disturbance:

y(k) = [B(q⁻¹)/A(q⁻¹)] q⁻ᵈ u(k − 1) + [C(q⁻¹)/D(q⁻¹)] e(k)

where e(k) is a discrete white noise sequence with zero mean and variance σₑ².
This general form is usually overkill for practical applications.
The form most used in adaptive control is the so-called CARIMA, or ARIMAX, model:

y(k) = [B(q⁻¹)/A(q⁻¹)] q⁻ᵈ u(k − 1) + [C(q⁻¹)/(A(q⁻¹)∆)] e(k)

Putting ∆ = 1 − q⁻¹ in the denominator of the noise model means that the noise is nonstationary (e(t)/∆ is known as a random walk). This forces an integration in the controller.

This is the model used in Generalized Predictive Control (GPC).
The State-Space Model
Not used very often in adaptive control:
x(k) = A x(k − 1) + B u(k − 1)
y(k) = cᵀ x(k)

The ARMAX model can also be written in state-space form. One of the advantages of the state-space representation is that it simplifies the prediction. However, system identification is more complex for a state-space model than for a transfer function model.
The Laguerre Model
The use of this model in adaptive control will be developed in detail later on. In this representation, the process output is represented as a truncated weighted sum of Laguerre functions lᵢ(k):

y(k) = ∑_{i=0}^{N} cᵢ lᵢ(k)
This representation is much more parsimonious than the FIR or FSR ones but, contrary to the transfer function model, does not require assumptions about the order and time delay of the process. Although it is implemented in a state-space form, it is straightforward to use in system identification.
Least-Squares Identification
Developed by Carl Friedrich Gauss in his study of the motion of celestial bodies: ". . . the unknown parameters are chosen so that the sum of the squares of the differences between the observed and the computed values, multiplied by a number that measures the degree of precision, is a minimum"

Applied to a variety of problems in mathematics, statistics, physics, economics, signal processing and control
Basic technique for parameter estimation
Least-Squares Estimation
Example
Assume that data are generated by y = bx + n, where n is white measurement noise. The problem is to estimate b from k data pairs (x, y).
The least-squares performance index is
J = (1/2) ∑_{i=1}^{k} [y(i) − x(i)b]²
Defining the measurement vector:

y = [y(1), . . . , y(k)]ᵀ

and the regressor:

x = [x(1), . . . , x(k)]ᵀ

one can write

J = (1/2) [y − bx]ᵀ[y − bx]
The value that minimizes J is then found from

dJ/db = −xᵀ[y − bx] = 0

The least-squares estimate b̂ is then given by the solution of the normal equation:

−xᵀy + xᵀx b̂ = 0

which, if [xᵀx] is invertible, is

b̂ = [xᵀx]⁻¹xᵀy

[xᵀx]⁻¹xᵀ is known as the (Moore-Penrose) pseudoinverse.
This idea can easily be extended to dynamic systems.
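The scalar normal-equation solution can be checked on synthetic data. A minimal sketch, with b and the noise level made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
b_true = 2.0
x = rng.uniform(-1.0, 1.0, 500)                  # regressor samples
y = b_true * x + 0.1 * rng.standard_normal(500)  # y = b x + n, n white

# b_hat = [x^T x]^{-1} x^T y, the normal-equation solution
b_hat = (x @ y) / (x @ x)
```

With white measurement noise independent of x, the estimate lands close to the true value of 2.0.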
Let a discretized dynamic system be described by
y(t) = ∑_{i=1}^{n} −aᵢ y(t − i) + ∑_{i=1}^{n} bᵢ u(t − i) + w(t)

where u(t) and y(t) are respectively the input and output of the plant and w(t) is the process noise.

Defining

θ = [a₁ · · · aₙ b₁ · · · bₙ]ᵀ
x(t) = [−y(t − 1) · · · −y(t − n) u(t − 1) · · · u(t − n)]ᵀ
the system above can be written as
y(t) = xᵀ(t)θ + w(t)
With N observations, the data can be put in the following compact form:
Y = Xθ + W
where
Yᵀ = [y(1) · · · y(N)]
Wᵀ = [w(1) · · · w(N)]
X = [xᵀ(1) ; · · · ; xᵀ(N)]  (the matrix whose rows are xᵀ(t))
Defining the modelling error as
ε(t) = y(t) − xᵀ(t)θ
the least-squares performance index to be minimized is:
J = (1/2) ∑_{t=1}^{N} ε²(t) = (1/2) [Y − Xθ]ᵀ[Y − Xθ]

Differentiating J with respect to θ and equating to zero gives θ̂, the least-squares estimate of θ, which, if [XᵀX] is invertible, is¹

θ̂ = [XᵀX]⁻¹XᵀY
¹I am now tired of having to underline vectors, so I will stop doing so from now on.
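The batch estimate θ̂ = [XᵀX]⁻¹XᵀY can be demonstrated on a simulated first-order ARX model. A minimal sketch with made-up true parameters; the white equation noise puts us in the unbiased case:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2000
a1, b1 = -0.7, 0.5                     # true parameters (illustrative)
u = rng.standard_normal(N)             # exciting input signal
y = np.zeros(N)
for t in range(1, N):
    # y(t) = -a1 y(t-1) + b1 u(t-1) + w(t), with w white
    y[t] = -a1 * y[t-1] + b1 * u[t-1] + 0.05 * rng.standard_normal()

# Rows of X are x^T(t) = [-y(t-1), u(t-1)]; theta = [a1, b1]
X = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]
theta = np.linalg.solve(X.T @ X, X.T @ Y)  # [X^T X]^{-1} X^T Y
```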
Properties of the Least-Squares Estimates
θ̂ = [XᵀX]⁻¹Xᵀ[Xθ + W]

or

θ̂ = θ + [XᵀX]⁻¹XᵀW

Taking the expected value gives

E(θ̂) = θ + E{[XᵀX]⁻¹XᵀW}

The second term in the above expression represents a bias in the estimate.
Note that the second term in the RHS is generally non-zero, except when

1. {w(t)} is zero-mean and uncorrelated, or
2. {w(t)} is independent of {u(t)} and y does not appear in the regressor:

FIR and step models
Laguerre model
Output-error models

Only in the above cases is the least-squares estimate unbiased. Furthermore, if this remains true as the number of observations increases to infinity, then we say that the method gives consistent estimates.
The estimate covariance is then given by

cov θ̂ = E{[XᵀX]⁻¹XᵀWWᵀX[XᵀX]⁻¹}

which, if {w(t)} is white, zero-mean Gaussian with variance σ_w², becomes

cov θ̂ = σ_w² [XᵀX]⁻¹

The matrix [XᵀX]⁻¹Xᵀ is called the pseudo-inverse of X.

Because of this, [XᵀX]⁻¹ is sometimes called the covariance matrix.

The condition that XᵀX be invertible is called an excitation condition.
Alternatives to Least-Squares
Need a method that gives consistent estimates in the presence of coloured noise:

Generalized Least Squares
Instrumental Variable Method
Maximum Likelihood Identification
Generalized Least Squares
Let the noise model be

w(t) = e(t)/C(q⁻¹)

where {e(t)} = N(0, σ) and C(q⁻¹) is monic, of degree n.

Given the system described as

A(q⁻¹)y(t) = B(q⁻¹)u(t) + [1/C(q⁻¹)] e(t)

and defining the filtered sequences

ȳ(t) = C(q⁻¹)y(t)
ū(t) = C(q⁻¹)u(t)

the system becomes

A(q⁻¹)ȳ(t) = B(q⁻¹)ū(t) + e(t)
If C(q⁻¹) is known, then least-squares gives consistent estimates of A and B, given ȳ and ū
The problem, however, is that in practice C(q⁻¹) is not known and {ȳ} and {ū} cannot be obtained
An iterative method proposed by Clarke (1967) solves that problem
1. Set C(q⁻¹) = 1
2. Compute {ȳ(t)} and {ū(t)} for t = 1, . . . , N
3. Use least-squares to estimate A and B from ȳ and ū
4. Compute the residuals

w(t) = A(q⁻¹)y(t) − B(q⁻¹)u(t)

5. Use the least-squares method to estimate C from

C(q⁻¹)w(t) = ε(t)

i.e. w(t) = −c₁w(t − 1) − c₂w(t − 2) − · · · + ε(t), where {ε(t)} is white

6. If converged, then stop; otherwise repeat from step 2
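The iteration above can be sketched end-to-end on simulated data. A minimal illustration, not from the slides: all coefficients are made up, and the data are generated so that the noise really has the e/C structure:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5000
a1, b1, c1 = -0.7, 0.5, 0.6            # true A, B, C coefficients (made up)
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
w = np.zeros(N)
y = np.zeros(N)
for t in range(1, N):
    w[t] = e[t] - c1 * w[t-1]          # C(q^-1) w(t) = e(t)
    y[t] = -a1 * y[t-1] + b1 * u[t-1] + w[t]

yf, uf = y.copy(), u.copy()            # step 1: start with C = 1
for it in range(5):
    # step 3: least-squares for A, B on the filtered data
    X = np.column_stack([-yf[:-1], uf[:-1]])
    th = np.linalg.solve(X.T @ X, X.T @ yf[1:])
    # step 4: residuals w(t) = A y(t) - B u(t) on the raw data
    res = np.zeros(N)
    res[1:] = y[1:] + th[0] * y[:-1] - th[1] * u[:-1]
    # step 5: least-squares for c1 from w(t) = -c1 w(t-1) + eps(t)
    c_hat = -np.sum(res[1:] * res[:-1]) / np.sum(res[:-1] ** 2)
    # step 2 (next pass): refilter y and u through the estimated C
    yf = y + c_hat * np.concatenate(([0.0], y[:-1]))
    uf = u + c_hat * np.concatenate(([0.0], u[:-1]))
```

With this fairly high signal-to-noise ratio the iteration settles near the true A, B and C, consistent with the remark on the next slide that consistency improves with S/N.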
For convergence, the loss function and/or the whiteness of the residuals can be tested
An advantage of GLS is that it not only may give consistent estimates of the deterministic part of the system, but also gives a representation of the noise that the LS method does not give
The consistency of GLS depends on the signal-to-noise ratio, the probability of consistent estimation increasing with the S/N
There is, however, no guarantee of obtaining consistent estimates
Instrumental Variable Methods (IV)
T. Söderström and P.G. Stoica, Instrumental Variable Methods for System Identification, Springer-Verlag, Berlin, 1983
As seen previously, the LS estimate

θ̂ = [XᵀX]⁻¹XᵀY

is unbiased if W is independent of X.

Assume that a matrix V is available, which is correlated with X but not with W, and such that VᵀX is positive definite, i.e.

E[VᵀX] is nonsingular
E[VᵀW] = 0
Then,

VᵀY = VᵀXθ + VᵀW

and θ is estimated by

θ̂ = [VᵀX]⁻¹VᵀY
V is called the instrumental variable matrix.
Ideally V is built from the noise-free process output, and the IV estimate θ̂ is then consistent
There are many possible ways to construct the instrumental variable. For instance, it may be built using an initial least-squares estimate:

A₁(q⁻¹)ŷ(t) = B₁(q⁻¹)u(t)

and the k-th row of V is then given by

vₖᵀ = [−ŷ(k − 1), . . . , −ŷ(k − n), u(k − 1), . . . , u(k − n)]

where ŷ is the output of the initial model driven by u alone
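This model-output instrument can be demonstrated on data with moving-average noise, where plain least-squares is biased. A minimal sketch; the parameters and noise model are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 5000
a1, b1 = -0.7, 0.5                       # true parameters (made up)
u = rng.standard_normal(N)
e = 0.3 * rng.standard_normal(N)
w = e + 0.8 * np.concatenate(([0.0], e[:-1]))  # coloured MA(1) noise
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b1 * u[t-1] + w[t]

X = np.column_stack([-y[:-1], u[:-1]])
Y = y[1:]
th_ls = np.linalg.solve(X.T @ X, X.T @ Y)  # biased: y(t-1) correlates with w(t)

# Instruments: noise-free output of the initial least-squares model
yhat = np.zeros(N)
for t in range(1, N):
    yhat[t] = -th_ls[0] * yhat[t-1] + th_ls[1] * u[t-1]
V = np.column_stack([-yhat[:-1], u[:-1]])
th_iv = np.linalg.solve(V.T @ X, V.T @ Y)  # [V^T X]^{-1} V^T Y
```

Since yhat is driven by u alone, the instruments are uncorrelated with the noise, and the IV estimate removes the visible bias of the LS estimate.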
Consistent estimation cannot be guaranteed in general
A two-tier method in its off-line version, the IV method is more useful in its recursive form

Use of instrumental variables in closed loop

Often the instrumental variable is constructed from the input sequence. This cannot be done in closed loop, as the input is formed from old outputs, and hence is correlated with the noise w, unless w is white. In that situation, the following choices are available.
Instrumental variables in closed-loop
Delayed inputs and outputs. If the noise w is assumed to be a moving average of order n, then choosing v(t) = x(t − d) with d > n gives an instrument uncorrelated with the noise. This, however, will only work with a time-varying regulator.
Reference signals. Building the instruments from the setpoint will satisfy the noise independence condition. However, the setpoint must be a sufficiently rich signal for the estimates to converge.
External signal. This in effect relates to the closed-loop identifiability condition (covered a bit later...). A typical external signal is a white noise independent of w(t).
Maximum-Likelihood Identification
The maximum-likelihood method considers the ARMAX model below, where u is the input, y the output and e is zero-mean white noise with standard deviation σ:

A(q⁻¹)y(t) = B(q⁻¹)u(t − k) + C(q⁻¹)e(t)

where

A(q⁻¹) = 1 + a₁q⁻¹ + · · · + aₙq⁻ⁿ
B(q⁻¹) = b₁q⁻¹ + · · · + bₙq⁻ⁿ
C(q⁻¹) = 1 + c₁q⁻¹ + · · · + cₙq⁻ⁿ
The parameters of A, B, C as well as σ are unknown.
Defining

θᵀ = [a₁ · · · aₙ b₁ · · · bₙ c₁ · · · cₙ]
xᵀ(t) = [−y(t − 1) · · · u(t − k) · · · e(t − 1) · · ·]

the ARMAX model can be written as

y(t) = xᵀ(t)θ + e(t)

Unfortunately, one cannot use the least-squares method on this model, since the sequence e(t) is unknown.

In case of known parameters, the past values of e(t) can be reconstructed exactly from the sequence:

ε(t) = [A(q⁻¹)y(t) − B(q⁻¹)u(t − k)]/C(q⁻¹)
The loss function to be minimized is

V = (1/2) ∑_{t=1}^{N} ε²(t)

Minimize V with respect to θ, using for instance a Newton-Raphson algorithm. Note that ε is linear in the parameters of A and B but not in those of C. We then have to use some iterative procedure

θᵢ₊₁ = θᵢ − αᵢ(V″(θᵢ))⁻¹V′(θᵢ)

The initial estimate θ₀ is usually obtained from a least-squares estimate
Estimate the noise variance as

σ̂² = (2/N) V(θ̂)
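The iteration can be sketched with a Gauss-Newton approximation of V″ (a common substitute for the full Newton-Raphson step, not necessarily what the slides use). The ARMAX(1,1,1) coefficients below are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 3000
a1, b1, c1 = -0.7, 0.5, 0.4                # true parameters (illustrative)
u = rng.standard_normal(N)
e = 0.2 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a1 * y[t-1] + b1 * u[t-1] + e[t] + c1 * e[t-1]

# theta_0 from the (biased) least-squares estimate, taking C = 1
X = np.column_stack([-y[:-1], u[:-1]])
th_ls = np.linalg.solve(X.T @ X, X.T @ y[1:])
th = np.array([th_ls[0], th_ls[1], 0.0])   # [a, b, c]

for it in range(15):
    a, b, c = th
    eps = np.zeros(N)
    psi = np.zeros((N, 3))                 # sensitivities d eps / d theta
    for t in range(1, N):
        eps[t] = y[t] + a * y[t-1] - b * u[t-1] - c * eps[t-1]
        psi[t, 0] = y[t-1] - c * psi[t-1, 0]
        psi[t, 1] = -u[t-1] - c * psi[t-1, 1]
        psi[t, 2] = -eps[t-1] - c * psi[t-1, 2]
    grad = psi.T @ eps                     # V'(theta)
    H = psi.T @ psi                        # Gauss-Newton approximation of V''
    th = th - np.linalg.solve(H, grad)
    th[2] = np.clip(th[2], -0.95, 0.95)    # keep the eps recursion stable

a_hat, b_hat, c_hat = th
sigma2_hat = np.sum(eps ** 2) / N          # ~ 2 V(theta_hat) / N
```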
Properties of the Maximum-Likelihood Estimate (MLE)
If the model order is sufficient, the MLE is consistent, i.e. θ̂ → θ as N → ∞
The MLE is asymptotically normal with mean θ and standard deviation σ_θ̂
The MLE is asymptotically efficient, i.e. there is no other unbiased estimator giving a smaller σ_θ̂
The Cramér-Rao inequality says that there is a lower limit on the precision of an unbiased estimate, given by

cov θ̂ ≥ M_θ⁻¹

where M_θ is the Fisher information matrix

M_θ = −E[(log L)_θθ]

For the MLE

σ²_θ̂ = M_θ⁻¹ = σ²V_θθ⁻¹

If σ is estimated, then

σ²_θ̂ = (2V/N) V_θθ⁻¹
Identification In Practice
1. Specify a model structure
2. Compute the best model in this structure
3. Evaluate the properties of this model
4. Test a new structure, go to step 1
5. Stop when a satisfactory model is obtained