Identification Lecture9. 2018.november27.users.itk.ppke.hu/~vago/A_09_Identify_AR_slide_18.pdf ·...
Stochastic Signals and Systems
Identification. Lecture 9.
27 November 2018
Autocovariance function
(y_n) is w.s.st. with E y_n = 0. A sequence of observations is given: y_1, …, y_N.
The first objective: estimate the auto-covariance function
r(τ) = E y_{n+τ} y_n.
A natural candidate is the sample covariance:
r̂(τ) = (1/(N − τ)) Σ_{n=1}^{N−τ} y_{n+τ} y_n, τ ≥ 0.
Certainly, no estimates will be available for τ ≥ N.
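The sample covariance above is easy to compute directly. A minimal NumPy sketch (the function name and the test series are illustrative choices, not from the lecture):

```python
import numpy as np

def sample_autocovariance(y, tau):
    """Estimate r(tau) = E y_{n+tau} y_n from observations y_1, ..., y_N.

    Implements r_hat(tau) = 1/(N - tau) * sum_{n=1}^{N-tau} y_{n+tau} y_n
    for a zero-mean series.
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    if tau >= N:
        raise ValueError("no estimate is available for tau >= N")
    # Pair y_{n+tau} with y_n over the N - tau available products.
    return float(y[tau:] @ y[:N - tau]) / (N - tau)
```

For τ = 0 this reduces to the sample variance of the zero-mean series.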
The values y_{n+τ} y_n form a dependent sequence: the standard LLN does not apply.
In order to get meaningful results we have to restrict ourselves to systems whose structure can be perfectly described by a finite set of parameters.
We will consider three classes of processes: AR, MA and ARMA.
Identification of AR models
Least Squares estimate of an AR process
Let (y_n) be a w.s.st. stable AR(p) process defined by
y_n + a*_1 y_{n−1} + … + a*_p y_{n−p} = e_n,
where (e_n) is a w.s.st. orthogonal process.
Assume that A*(z^{−1}) = 1 + Σ_{k=1}^{p} a*_k z^{−k} is stable. Thus (e_n) is the innovation process of (y_n).
Our goal is to estimate (a*_1, …, a*_p) using the observations y_1, …, y_N:
y_n + a*_1 y_{n−1} + … + a*_p y_{n−p} = e_n.
Introduce the notations:
ϕ_n = (−y_{n−1}, …, −y_{n−p})^T, θ* = (a*_1, …, a*_p)^T.
The AR(p) equation can be rewritten as:
y_n = ϕ_n^T θ* + e_n,
a linear stochastic regression. Note that (ϕ_n) is not independent of (e_n)!
Goal: Estimate θ* using the observations y_1, …, y_N.
y_n = ϕ_n^T θ* + e_n. θ* = ?
Let us fix a tentative value of θ ∈ R^p. Define the error process:
ε_n(θ) = y_n − ϕ_n^T θ, n ≥ 0.
We may assume that y_0, y_{−1}, …, y_{−p+1} are known.
The LSQ estimation method: minimize the cost function:
V_N(θ) = (1/2) Σ_{n=1}^{N} ε_n^2(θ) = (1/2) Σ_{n=1}^{N} (y_n − ϕ_n^T θ)^2.
V_N(θ) = (1/2) Σ_{n=1}^{N} ε_n^2(θ) is quadratic and convex in θ, thus a minimum exists.
For any minimizing value θ_0 we have:
∂V_N(θ_0)/∂θ = 0.
Differentiating V_N(θ) w.r.t. θ we get, for j = 1, …, p:
∂V_N(θ)/∂θ_j = Σ_{n=1}^{N} ε_{θ_j,n}(θ) ε_n(θ), with ε_{θ_j,n}(θ) = ∂ε_n(θ)/∂θ_j.
∂V_N(θ)/∂θ = Σ_{n=1}^{N} ε_{θ,n}(θ) ε_n(θ).
We have ε_n(θ) = y_n − ϕ_n^T θ, thus
ε_{θ,n}(θ) = −ϕ_n.
We get the equation
∂V_N(θ)/∂θ = −Σ_{n=1}^{N} ϕ_n ε_n(θ) = 0.
LSQ estimate: Σ_{n=1}^{N} ϕ_n ε_n(θ) = 0.
Remark. For θ = θ* we have ε_n(θ*) = e_n.
Using the definition ϕ_n = (−y_{n−1}, …, −y_{n−p})^T, and the fact that the innovation e_n is orthogonal to the past observations in ϕ_n, we get:
E ∂V_N(θ*)/∂θ = 0.
After substituting ε_n(θ) we get
−Σ_{n=1}^{N} ϕ_n (y_n − ϕ_n^T θ) = 0.
This is a linear equation for θ, which certainly has a solution.
LSQ estimate
Proposition. Let θ_N be a LSQ estimator of θ* based on N samples. Then θ_N satisfies the following normal equation:
[Σ_{n=1}^{N} ϕ_n ϕ_n^T] θ_N = Σ_{n=1}^{N} ϕ_n y_n.
The estimator θ_N is unique if S_N (or R_N) is non-singular:
S_N = Σ_{n=1}^{N} ϕ_n ϕ_n^T, R_N = (1/N) Σ_{n=1}^{N} ϕ_n ϕ_n^T.
R_N = (1/N) Σ_{n=1}^{N} ϕ_n ϕ_n^T
The elements of R_N are the empirical auto-covariances of (y_n):
R_N(k, l) = (1/N) Σ_{n=1}^{N} y_{n−k} y_{n−l} ≈ r̂(l − k).
We assume that lim_{N→∞} R_N = R* a.s.
R* is the p-th order auto-covariance matrix.
Remark. It can be proved that R* is non-singular (e.g. via a state-space representation of y).
θ_N = (1/N) R_N^{−1} Σ_{n=1}^{N} ϕ_n y_n.
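The normal equation can be checked numerically: form S_N and Σ ϕ_n y_n on simulated data and solve the linear system. A sketch for an AR(2) process (the coefficients, sample size, and function name are my own choices for illustration):

```python
import numpy as np

def lsq_ar(y, p):
    """LSQ estimate of (a_1, ..., a_p) in y_n + a_1 y_{n-1} + ... + a_p y_{n-p} = e_n.

    Builds phi_n = (-y_{n-1}, ..., -y_{n-p})^T and solves the normal equation
    [sum phi_n phi_n^T] theta = sum phi_n y_n.
    """
    y = np.asarray(y, dtype=float)
    # Row for sample n stacks -y_{n-1}, ..., -y_{n-p}; start at n = p so all lags exist.
    Phi = np.column_stack([-y[p - k: len(y) - k] for k in range(1, p + 1)])
    S = Phi.T @ Phi                   # S_N = sum phi_n phi_n^T
    b = Phi.T @ y[p:]                 # sum phi_n y_n
    return np.linalg.solve(S, b)

# Simulate a stable AR(2) process: y_n - 0.5 y_{n-1} + 0.2 y_{n-2} = e_n.
rng = np.random.default_rng(0)
a_true = np.array([-0.5, 0.2])
e = rng.standard_normal(5000)
y = np.zeros_like(e)
for n in range(2, len(y)):
    y[n] = -a_true[0] * y[n - 1] - a_true[1] * y[n - 2] + e[n]

theta_N = lsq_ar(y, 2)  # close to a_true for large N
```

For N = 5000 the estimate typically lies within a few hundredths of the true parameters, consistent with the 1/N decay of the error covariance discussed below.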
Proposition. Let (y_n) be a w.s.st. stable AR(p) process, and lim_{N→∞} R_N = R*. Then the LSQ estimate θ_N converges to θ* a.s.
The error of the estimator θ_N is denoted by θ̃_N. Then:
θ̃_N = θ_N − θ* = (Σ_{n=1}^{N} ϕ_n ϕ_n^T)^{−1} Σ_{n=1}^{N} ϕ_n e_n,
since y_n = ϕ_n^T θ* + e_n.
An approximating error process is θ̄_N = (1/N) (R*)^{−1} Σ_{n=1}^{N} ϕ_n e_n.
Proposition. The approximating error process θ̄_N has the following covariance matrix:
E θ̄_N θ̄_N^T = (1/N) (R*)^{−1} σ².
The (asymptotic) quality of the LSQ estimator is determined by the covariance matrix R*.
R* is the covariance matrix of the state vector of a state-space representation of (y_n); thus it is the solution of a Lyapunov equation.
Example. An AR(1) process
The process is given by
y_n + a* y_{n−1} = e_n.
Then R* = σ²(e) / (1 − (a*)²), and thus the asymptotic variance of the LSQ estimator of a* equals
lim_{N→∞} N E(a_N − a*)² = 1 − (a*)².
It follows that if a* is close to ±1, then the asymptotic variance of the LSQ estimator is close to 0.
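The asymptotic variance formula can be checked by Monte Carlo: estimate a* repeatedly on fresh AR(1) trajectories and average N(a_N − a*)². The value of a*, the sample size, and the replication count below are illustrative choices:

```python
import numpy as np

# Monte Carlo check of lim N E(a_N - a*)^2 = 1 - (a*)^2 for the AR(1) process
# y_n + a* y_{n-1} = e_n with unit-variance innovations.
rng = np.random.default_rng(1)
a_star, N, reps = 0.8, 500, 2000
scaled_sq_errors = []
for _ in range(reps):
    e = rng.standard_normal(N + 1)
    y = np.zeros(N + 1)
    for n in range(1, N + 1):
        y[n] = -a_star * y[n - 1] + e[n]
    # LSQ estimate with phi_n = -y_{n-1}: a_N = -sum(y_{n-1} y_n) / sum(y_{n-1}^2)
    a_N = -(y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    scaled_sq_errors.append(N * (a_N - a_star) ** 2)

empirical = np.mean(scaled_sq_errors)  # should be near 1 - (a*)^2 = 0.36
```

With a* = 0.8 the predicted limit is 0.36; the Monte Carlo average lands close to it, up to finite-sample bias of order 1/N.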
The recursive LSQ method
Assume that θN is available.
θ_N = [Σ_{n=1}^{N} ϕ_n ϕ_n^T]^{−1} Σ_{n=1}^{N} ϕ_n y_n.
Suppose we get one more observation yN+1.
Do we need to recompute S_{N+1}^{−1} and θ_{N+1} from scratch, or is there a way to compute S_{N+1}^{−1} and θ_{N+1} using S_N^{−1} and θ_N?
A matrix-inversion formula
Proposition. (The Sherman–Morrison lemma.) Let A ∈ R^{m×m}, and let b, c ∈ R^m. Assume that A is non-singular and so is A + bc^T. Then
(A + bc^T)^{−1} = A^{−1} − (1 / (1 + c^T A^{−1} b)) A^{−1} b c^T A^{−1}.
A direct corollary is a recursion for S_N^{−1}, where:
S_{N+1} = S_N + ϕ_{N+1} ϕ_{N+1}^T.
S_N is the coefficient matrix of the normal equation. If S_N is non-singular for some N, then:
S_{N+1}^{−1} = S_N^{−1} − (1 / (1 + ϕ_{N+1}^T S_N^{−1} ϕ_{N+1})) S_N^{−1} ϕ_{N+1} ϕ_{N+1}^T S_N^{−1}.
Recursion for θN : The RLSQ method.
Proposition. Assume that R_N is non-singular, and let θ_N be the LSQ estimate of θ*. Then θ_{N+1} and R_{N+1} can be computed via the recursion
θ_{N+1} = θ_N + (1/(N+1)) R_{N+1}^{−1} ϕ_{N+1} (y_{N+1} − ϕ_{N+1}^T θ_N),
R_{N+1} = R_N + (1/(N+1)) (ϕ_{N+1} ϕ_{N+1}^T − R_N).
The term (y_{N+1} − ϕ_{N+1}^T θ_N) is approximately (y_{N+1} − ϕ_{N+1}^T θ*), which is the innovation e_{N+1}.
The correction term satisfies E(y_{N+1} − ϕ_{N+1}^T θ) = 0 when θ = θ*.
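Initialized from an exact batch estimate, the recursion reproduces the batch LSQ estimate at every later step, which gives a simple numerical check. A sketch for AR(1) data, where ϕ_n and R_n are scalars (the initial sample size and coefficient are my own choices):

```python
import numpy as np

# Recursive LSQ for the AR(1) model y_n + a* y_{n-1} = e_n, with phi_n = -y_{n-1}.
rng = np.random.default_rng(3)
a_star, N0, N = 0.6, 20, 400
e = rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for n in range(1, N + 1):
    y[n] = -a_star * y[n - 1] + e[n]

phi = -y[:-1]          # phi_n = -y_{n-1} for samples n = 1, ..., N
targets = y[1:]        # y_n

# Batch initialization at n = N0.
R = np.mean(phi[:N0] ** 2)                              # R_{N0}
theta = np.sum(phi[:N0] * targets[:N0]) / (N0 * R)      # theta_{N0}

# RLSQ recursion for n = N0+1, ..., N:
#   R_{n+1}     = R_n + (phi_{n+1}^2 - R_n) / (n+1)
#   theta_{n+1} = theta_n + R_{n+1}^{-1} phi_{n+1} (y_{n+1} - phi_{n+1} theta_n) / (n+1)
for n in range(N0, N):
    p, t = phi[n], targets[n]
    R = R + (p * p - R) / (n + 1)
    theta = theta + p * (t - p * theta) / ((n + 1) * R)

# The recursion is algebraically exact, so this matches the batch estimate.
theta_batch = np.sum(phi * targets) / np.sum(phi ** 2)
```

Each recursive step costs O(p²) (here O(1)), versus solving the full normal equation at every N.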
Identification of MA models.
Consider an MA process (y_n) defined by
y_n = e_n + c*_1 e_{n−1} + … + c*_r e_{n−r} ⟹ y = C*(q^{−1}) e,
with C*(q^{−1}) = 1 + Σ_{j=1}^{r} c*_j q^{−j}, deg C* = r,
where (e_n) is a w.s.st. orthogonal process. We will use the notation
θ* = (c*_1, …, c*_r)^T.
Assume that C* is stable; then e is the innovation process of y.
y_n = e_n + c*_1 e_{n−1} + … + c*_r e_{n−r}
Goal: Estimate θ* using the observations y_1, …, y_N.
The key idea: try to reconstruct e_1, …, e_N by inverting the system.
Let us take a polynomial C(q^{−1}) = 1 + Σ_{j=1}^{r} c_j q^{−j}.
Define an estimated driving noise process ε = (ε_n) by
ε = C^{−1}(q^{−1}) y, or, equivalently: C(q^{−1}) ε = y.
If C(z^{−1}) ≠ 0 for |z| = 1, then the process ε is well-defined.
Example: Inverse of an MA(1) process.
An MA(1) process is given by
y_n = e_n + c*_1 e_{n−1}.
Then the inverse noise process satisfies:
ε_n + c_1 ε_{n−1} = y_n, n ≥ 1.
To generate ε_1 we would need to know ε_0, which is not available.
A standard choice is ε_j = 0 for −r + 1 ≤ j ≤ 0.
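When |c*| < 1 the effect of the arbitrary initialization ε_0 = 0 decays geometrically, so the reconstructed ε_n converges to the true innovation e_n. A sketch for the MA(1) example (the value of c* and the sample size are illustrative):

```python
import numpy as np

# Inverting an MA(1) process: y_n = e_n + c* e_{n-1}; recover the noise via
# eps_n = y_n - c* eps_{n-1} with eps_0 = 0.
rng = np.random.default_rng(4)
c_star, N = 0.5, 100
e = rng.standard_normal(N + 1)          # e_0, e_1, ..., e_N
y = e[1:] + c_star * e[:-1]             # y_1, ..., y_N

eps = np.zeros(N + 1)                   # eps_0 = 0 (standard initialization)
for n in range(1, N + 1):
    eps[n] = y[n - 1] - c_star * eps[n - 1]

# The initialization error obeys eps_n - e_n = (-c*)^{n-1} c* e_0,
# so it is negligible at n = N.
```

This is exactly why the stability assumption on C* matters: for |c*| ≥ 1 the initialization error would not die out.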
C(q^{−1}) = 1 + Σ_{j=1}^{r} c_j q^{−j}
θ = (c_1, …, c_r)^T. Define the estimated noise process (ε_n(θ)) by:
ε_n(θ) + c_1 ε_{n−1}(θ) + … + c_r ε_{n−r}(θ) = y_n, ε_{−j}(θ) = 0.
Consider the cost function: V_N(θ) = (1/2) Σ_{n=0}^{N−1} ε_n^2(θ).
The Prediction Error (PE) estimator θ_N is the solution of:
min_{θ∈D} V_N(θ), D = {θ ∈ R^r : C(z^{−1}) = 1 + Σ_{j=1}^{r} c_j z^{−j} is stable}.
ε_n(θ) + c_1 ε_{n−1}(θ) + … + c_r ε_{n−r}(θ) = y_n
The cost function is V_N(θ) = (1/2) Σ_{n=0}^{N−1} ε_n^2(θ).
We have
∂V_N(θ)/∂θ = V_{θ,N}(θ) = Σ_{n=1}^{N} ε_{θ,n}(θ) ε_n(θ).
To get ε_{θ,n}(θ), we can differentiate the equation w.r.t. θ_j = c_j:
∂ε_n(θ)/∂θ_j + c_1 ∂ε_{n−1}(θ)/∂θ_j + … + c_r ∂ε_{n−r}(θ)/∂θ_j + ε_{n−j}(θ) = 0.
The gradient process.
The j-th coordinate of the gradient process is generated by:
ε_{θ_j,n}(θ) + c_1 ε_{θ_j,n−1}(θ) + … + c_r ε_{θ_j,n−r}(θ) + ε_{n−j}(θ) = 0,
with initial values ε_{θ_j,n}(θ) = 0 for n ≤ 0.
Introduce the notation:
φ_n(θ) = (ε_{n−1}(θ), …, ε_{n−r}(θ))^T.
Conclusion. The gradient process ε_θ(θ) satisfies
C(q^{−1}) ε_θ(θ) = −φ(θ).
Thus the equation defining the PE estimator is non-linear in θ.
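For MA(1) the gradient recursion reduces to ε_{θ,n}(θ) = −c ε_{θ,n−1}(θ) − ε_{n−1}(θ), which can be checked against a finite-difference approximation of ∂ε_n/∂c. A sketch (the data and the value of c are my own choices):

```python
import numpy as np

def eps(y, c):
    """Estimated noise for MA(1): eps_n = y_n - c * eps_{n-1}, eps_0 = 0."""
    out = np.zeros(len(y) + 1)
    for n in range(1, len(y) + 1):
        out[n] = y[n - 1] - c * out[n - 1]
    return out

def eps_grad(y, c):
    """Gradient process: psi_n = -c * psi_{n-1} - eps_{n-1}, psi_0 = 0."""
    ev = eps(y, c)
    psi = np.zeros(len(y) + 1)
    for n in range(1, len(y) + 1):
        psi[n] = -c * psi[n - 1] - ev[n - 1]
    return psi

rng = np.random.default_rng(5)
y = rng.standard_normal(50)
c, h = 0.4, 1e-6
analytic = eps_grad(y, c)
numeric = (eps(y, c + h) - eps(y, c - h)) / (2 * h)   # central difference in c
```

The agreement of `analytic` and `numeric` confirms that the gradient process is itself generated by filtering −φ(θ) through 1/C(q^{−1}).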
The asymptotic covariance matrix of θN
The error of the estimation is θ̃_N = θ_N − θ*. Introduce the notation
R* = E(ε_{θ,n}(θ*) ε_{θ,n}^T(θ*)).
Proposition. Assume that C*(z^{−1}) is stable. Then θ̃_N has the following covariance matrix:
E θ̃_N θ̃_N^T = (1/N) (R*)^{−1} σ²(e).
The asymptotic covariance of the estimators
(A remarkable feature of the above results)
The asymptotic covariance matrix of the PE estimator of the parameters of the MA system
y = C*(q^{−1}) e,
and the asymptotic covariance matrix of the LSQ estimator of the parameters of the AR system
C*(q^{−1}) y = e
are the same.
Example: MA(1) process
y_n = e_n + c* e_{n−1}, c* ≠ ±1.
We have seen that R* = σ²(e) / (1 − (c*)²), and thus
lim_{N→∞} N E(c_N − c*)² = 1 − (c*)².
It follows that if c* is close to ±1, then σ²(c_N) is close to 0.
Remark. A transfer function H(e^{−iω}, θ*) and its inverse H^{−1}(e^{−iω}, θ*) can be estimated equally accurately, at least asymptotically.
Exercise. Compute the gradient process for AR(1) and MA(1) processes.