
Lecture 10

Håkan Hjalmarsson

February 6, 2019


Outline

1 Model structure selection

2 A model’s accuracy

3 Fundamental limitations: FIR

4 The structure of the asymptotic covariance matrix P

5 A refresher on orthogonal projection

6 A geometric interpretation of P

7 Structural results


The Identification Problem

Simply solve: $\min_\theta \sum_t \varepsilon_t^T(\theta)\,\Lambda^{-1}\varepsilon_t(\theta)$  ($\varepsilon_t(\theta)$: prediction error)

What’s the big deal?
- Experiment design / cost of complexity
- Value of sensors and communication
- Model structure selection
- Non-convex optimization problem


Model structure selection

Suppose we are given a set of model structures:

$$\Xi := \{\mathcal{G}(\rho) : \rho \in D_\rho \subset \mathbb{R}\}$$

Which model structure $\mathcal{G}$ should we use?

ML fails:

$$(\hat\rho,\ \hat g_{\mathcal{G}(\rho)}) = \arg\min_{\rho,\,g} L(g) \quad \text{s.t. } g \in \mathcal{G}(\rho)$$

has solution $g = g_{LS}$ whenever at least one model structure contains $g_{LS}$.


Model structure selection

Methods:
- Cross-validation:
  1. Estimate each $\theta^{(i)}$ with training data
  2. Choose $\hat i = \arg\max_i p_i(y; \theta^{(i)})$ on test data
- Residual analysis (Section 16.6 in Ljung)
- Hypothesis testing
- Information-based criteria (AIC, BIC, MDL, ..., Section 16.4 in Ljung)
- Confidence regions


Residual analysis

Check: is $\varepsilon(t) := \varepsilon(t, \hat\theta_N)$ white and independent of the input?


Residual analysis: Whiteness test

Correlation with past residuals: $\varphi(t) = [\varepsilon(t-1)\ \ldots\ \varepsilon(t-M)]^T$

$$\frac{1}{\sqrt N}\sum_{t=1}^N \varphi(t)\varepsilon(t) = \sqrt N \begin{bmatrix} R_\varepsilon^N(1) \\ \vdots \\ R_\varepsilon^N(M) \end{bmatrix} = \sqrt N\, R_\varepsilon^{N,M} \sim \mathcal N(0,\ \lambda_e^2\, I)$$

Test statistic:

$$\frac{N}{\lambda_e^2}\, \|R_\varepsilon^{N,M}\|^2 \sim \chi^2(M) \quad \text{asymptotically}$$

The noise variance estimate $\hat\lambda_e := \frac1N \sum_{t=1}^N \varepsilon^2(t) = R_\varepsilon^N(0)$ gives

$$\zeta_{N,M} = \frac{N\, \|R_\varepsilon^{N,M}\|^2}{(R_\varepsilon^N(0))^2} \sim \chi^2(M)$$

Reject the hypothesis that the residuals are white if $\zeta_{N,M} > \chi^2_\alpha(M)$.
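To make the test concrete, here is a minimal numerical sketch (not from the slides; `eps` and all other names are illustrative):

```python
# A minimal sketch of the chi^2 whiteness test, assuming `eps` holds the
# residual sequence eps(t, theta_N) from an identified model.
import numpy as np
from scipy.stats import chi2

def whiteness_test(eps, M, alpha=0.05):
    """Return (zeta, threshold, reject) for the whiteness test."""
    N = len(eps)
    R0 = np.mean(eps**2)  # R_eps^N(0), the noise variance estimate
    # Sample covariances R_eps^N(k) = (1/N) sum_t eps(t) eps(t-k), k = 1..M
    R = np.array([np.dot(eps[k:], eps[:-k]) / N for k in range(1, M + 1)])
    zeta = N * np.sum(R**2) / R0**2          # zeta_{N,M}
    threshold = chi2.ppf(1 - alpha, df=M)    # chi^2_alpha(M)
    return zeta, threshold, zeta > threshold
```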


Residual analysis: Cross-correlation test

Use past inputs instead: $\varphi(t) = [u(t-1)\ \ldots\ u(t-M)]^T$

$$\frac{1}{\sqrt N}\sum_{t=1}^N \varphi(t)\varepsilon(t) = \sqrt N \begin{bmatrix} R_{\varepsilon u}^N(1) \\ \vdots \\ R_{\varepsilon u}^N(M) \end{bmatrix} = \sqrt N\, R_{\varepsilon u}^{N,M}$$

Assume the residuals are white (general case in Ljung):

$$\sqrt N\, R_{\varepsilon u}^{N,M} \sim \mathcal N(0,\ \lambda_e E[\varphi(t)\varphi^T(t)])$$

Test statistic:

$$\zeta^u_{N,M} := \frac{N}{R_\varepsilon(0)}\, (R_{\varepsilon u}^{N,M})^T \left(\frac1N \sum_{t=1}^N \varphi(t)\varphi^T(t)\right)^{-1} R_{\varepsilon u}^{N,M} \sim \chi^2(M)$$
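A companion sketch of the cross-correlation test, under the same illustrative naming (`u` is the input sequence):

```python
# A hedged sketch of the input/residual cross-correlation test; assumes
# white residuals, matching the slide's simplification.
import numpy as np
from scipy.stats import chi2

def cross_correlation_test(eps, u, M, alpha=0.05):
    N = len(eps)
    R0 = np.mean(eps**2)                          # R_eps(0)
    # phi(t) = [u(t-1) ... u(t-M)]^T for t = M..N-1 (0-indexed)
    Phi = np.column_stack([u[M - k:N - k] for k in range(1, M + 1)])
    Reu = Phi.T @ eps[M:] / N                     # R_{eps u}^{N,M}
    S = Phi.T @ Phi / N                           # (1/N) sum phi phi^T
    zeta_u = N / R0 * Reu @ np.linalg.solve(S, Reu)
    threshold = chi2.ppf(1 - alpha, df=M)
    return zeta_u, threshold, zeta_u > threshold
```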


Model error modeling

$$E(g(\hat\theta_N)) := Y - \Phi g(\hat\theta_N) = \Phi\eta + V, \quad \eta \in \mathbb R^M$$

$$\hat\eta = (\Phi^T\Phi)^{-1}\Phi^T E$$

Assuming $\eta = 0$:

$$\sqrt N\,\hat\eta \sim \mathcal N(0,\ \lambda_e(\Phi^T\Phi)^{-1}), \qquad \frac{N}{\lambda_e}\,\hat\eta^T\Phi^T\Phi\,\hat\eta \sim \chi^2(M)$$

$$\frac{N}{\lambda_e}\,\hat\eta^T\Phi^T\Phi\,\hat\eta = \frac{N}{\lambda_e}\,E^T\Phi(\Phi^T\Phi)^{-1}\Phi^T E$$

Same as the cross-correlation statistic.


Information criteria

Akaike Information Criterion (AIC): choose $\mathcal M_i$ with the highest

$$\mathrm{AIC}(i) = \ln p(y; \hat\theta^{(i)}) - \dim(\theta^{(i)})$$

Bayesian Information Criterion (BIC):

$$\mathrm{BIC}(i) = \ln p(y; \hat\theta^{(i)}) - \dim(\theta^{(i)}) \ln N, \quad N := \dim(y)$$

BIC gives a consistent estimate of the model order.
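A minimal selection sketch (illustrative names; `loglik[i]` and `dims[i]` are assumed to hold $\ln p(y;\hat\theta^{(i)})$ and $\dim(\theta^{(i)})$ for each candidate structure):

```python
# Pick the model structure maximizing AIC or BIC as defined above.
import numpy as np

def select_structure(loglik, dims, N, criterion="AIC"):
    loglik = np.asarray(loglik, dtype=float)
    dims = np.asarray(dims, dtype=float)
    if criterion == "AIC":
        score = loglik - dims             # ln p(y; theta_i) - dim(theta_i)
    else:                                 # "BIC": consistent in the order
        score = loglik - dims * np.log(N)
    return int(np.argmax(score))
```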


Model comparisons

Suppose we know that

$$Y = \Phi g_o + V \in \mathbb R^N$$

- $\Phi \in \mathbb R^{N\times n}$ known and deterministic
- $V \sim \mathcal N(0, \lambda_e I)$
- ML estimate: $g_{LS} = (\Phi^T\Phi)^{-1}\Phi^T Y$

We want to curb the high variance by using another model structure. Let $g_\mathcal{G} \in \mathbb R^{n_\mathcal{G}}$ be the minimizer in $\mathcal G \subset \mathbb R^n$.


Hypothesis testing

$$Y = \Phi g_o + V, \quad V \sim \mathcal N(0, \lambda_e I)$$

$$L(g) := (Y - \Phi g)^T(Y - \Phi g) = (g - g_{LS})^T R\, (g - g_{LS}) + L(g_{LS}), \qquad R := \Phi^T\Phi$$

$$g_{LS} := R^{-1}\Phi^T Y = g_o + R^{-1}\Phi^T V$$

Hence

$$L(g_{LS}) = \big(\Phi g_o + V - \Phi(g_o + R^{-1}\Phi^T V)\big)^T\big(\Phi g_o + V - \Phi(g_o + R^{-1}\Phi^T V)\big) = V^T \underbrace{(I_{N\times N} - \Phi(\Phi^T\Phi)^{-1}\Phi^T)}_{\text{projection matrix}}\, V \sim \lambda_e\, \chi^2(N - n)$$

Similarly

$$L(g_\mathcal{G}) - L(g_{LS}) = (g_\mathcal{G} - g_{LS})^T R\, (g_\mathcal{G} - g_{LS}) \sim \lambda_e\, \chi^2(n - n_\mathcal{G})$$

(at least asymptotically in $N$). One can also show that these two quantities are independent.


Hypothesis testing

Hence

$$F := \frac{\big(L(g_\mathcal{G}) - L(g_{LS})\big)\big/\big(\lambda_e(n - n_\mathcal{G})\big)}{L(g_{LS})\big/\big(\lambda_e(N - n)\big)}$$

Thus, if $g \in \mathcal G$,

$$F \sim F(n - n_\mathcal{G},\ N - n) \quad \text{(at least asymptotically in } N)$$

Standard F-test: reject the hypothesis if $F > F^{-1}_p(n - n_\mathcal{G},\ N - n)$.
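A sketch of this F-test for linear-in-parameters structures (illustrative names; `Phi` is the full $N\times n$ regressor and `PhiG` spans the structured subset $\mathcal G$, assumed nested in the full model):

```python
# Structured-vs-unstructured F-test; rejects the structured model when F
# exceeds the p-quantile of the F(n - nG, N - n) distribution.
import numpy as np
from scipy.stats import f as f_dist

def structure_f_test(Y, Phi, PhiG, p=0.95):
    N, n = Phi.shape
    nG = PhiG.shape[1]
    # L(g) = ||Y - Phi g||^2 evaluated at the least-squares minimizers
    L_full = np.sum((Y - Phi @ np.linalg.lstsq(Phi, Y, rcond=None)[0]) ** 2)
    L_red = np.sum((Y - PhiG @ np.linalg.lstsq(PhiG, Y, rcond=None)[0]) ** 2)
    F = ((L_red - L_full) / (n - nG)) / (L_full / (N - n))
    threshold = f_dist.ppf(p, n - nG, N - n)
    return F, threshold, F > threshold
```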


Information criteria

$$\mathrm{AIC}(\mathcal G) = L(g_\mathcal{G})\left(1 + \frac{2 n_\mathcal{G}}{N}\right) \tag{1}$$

Structured vs unstructured estimate: the unstructured estimate is preferred if

$$L(g_\mathcal{G})\left(1 + \frac{2 n_\mathcal{G}}{N}\right) > L(g_{LS})\left(1 + \frac{2n}{N}\right)$$

i.e. if

$$L(g_\mathcal{G}) > L(g_{LS})\,\frac{1 + \frac{2n}{N}}{1 + \frac{2 n_\mathcal{G}}{N}} \approx \left(1 + \frac{2(n - n_\mathcal{G})}{N}\right) L(g_{LS})$$


Confidence regions

$$C(p) = \left\{g : (g - g_{LS})^T R\, (g - g_{LS}) \le n\,\lambda_e\, F^{-1}_p(n,\ N - n)\right\}$$

Reject $\mathcal G$ if $g_\mathcal{G} \notin C(p)$, i.e. if

$$(g_\mathcal{G} - g_{LS})^T R\, (g_\mathcal{G} - g_{LS}) > \lambda_e\, n\, F^{-1}_p(n,\ N - n)$$


Model structure selection - Summary

F-test:
$$F > F^{-1}_p(n - n_\mathcal{G},\ N - n)$$

Cross-correlation test: its statistic compared with
$$F^{-1}_p(n - n_\mathcal{G},\ N - n)$$

AIC:
$$L(g_\mathcal{G}) > \left(1 + \frac{2(n - n_\mathcal{G})}{N}\right) L(g_{LS})$$

Confidence region test:
$$(g_\mathcal{G} - g_{LS})^T R\, (g_\mathcal{G} - g_{LS}) > n\,\lambda_e\, F^{-1}_p(n,\ N - n)$$


Model structure selection revisited

$$L(g) = (g - g_{LS})^T R\, (g - g_{LS}) + L(g_{LS})$$

$$V(g) := (g - g_{LS})^T R\, (g - g_{LS}) = \|\Phi g - \Phi g_{LS}\|^2 = \left\|\hat Y(g) - \hat Y(g_{LS})\right\|^2$$

- F-test: $V(g_\mathcal{G}) > (n - n_\mathcal{G})\, F^{-1}_p(n - n_\mathcal{G},\ N - n)\, \lambda_e$
- Cross-corr. test: $V(g_\mathcal{G}) > (n - n_\mathcal{G})\, F^{-1}_p(n - n_\mathcal{G},\ N - n)\, \lambda_e$
- AIC: $V(g_\mathcal{G}) > (n - n_\mathcal{G})\, \frac{2(N - n)}{N}\, \lambda_e$
- Conf. region test: $V(g_\mathcal{G}) > n\, F^{-1}_p(n,\ N - n)\, \lambda_e$

Same statistic, (slightly) different thresholds: a check of how well the structured predictor matches the unstructured predictor.


Model structure selection revisited

Why does this hold? Cross-correlation statistic:

$$\frac{N}{\lambda_e}\, E^T\Phi(\Phi^T\Phi)^{-1}\Phi^T E$$

$$\Phi^T E = \Phi^T(Y - \Phi g(\hat\theta_N)) = \Phi^T\big(Y - \Phi g_{LS} + \Phi(g_{LS} - g(\hat\theta_N))\big) = \Phi^T\Phi\,(g_{LS} - g(\hat\theta_N))$$

since $\Phi^T(Y - \Phi g_{LS}) = 0$. Hence

$$\frac{N}{\lambda_e}\, E^T\Phi(\Phi^T\Phi)^{-1}\Phi^T E = \frac{N}{\lambda_e}\,(g_{LS} - g)^T\Phi^T\Phi\,(g_{LS} - g)$$


A model’s accuracy: FIR

$$y(t) = B_o(q)u(t) + e(t) = \varphi^T(t)\theta_o + e(t), \quad \varphi(t) = [u(t-1)\ \ldots\ u(t-n)]^T$$

Estimate:

$$\hat\theta_N = \left[\sum_{t=1}^N \varphi(t)\varphi^T(t)\right]^{-1} \sum_{t=1}^N \varphi(t)y(t) = \left[\sum_{t=1}^N \varphi(t)\varphi^T(t)\right]^{-1} \sum_{t=1}^N \varphi(t)\big(\varphi^T(t)\theta_o + e(t)\big) = \theta_o + \left[\sum_{t=1}^N \varphi(t)\varphi^T(t)\right]^{-1} \sum_{t=1}^N \varphi(t)e(t)$$

White Gaussian noise $\Rightarrow \sqrt N(\hat\theta_N - \theta_o) \sim \mathcal N(0,\ I_F^{-1})$

(Per sample) information matrix: $I_F = \frac{1}{N\lambda_e}\sum_{t=1}^N \varphi(t)\varphi^T(t)$
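A compact numerical sketch of the FIR least-squares estimate and the per-sample information matrix (all names illustrative; the sum runs over the samples where $\varphi(t)$ is defined):

```python
# FIR least squares: theta_hat and the per-sample information matrix I_F.
import numpy as np

def fir_ls(u, y, n, lambda_e):
    N = len(y)
    # phi(t) = [u(t-1) ... u(t-n)]^T, stacked as rows for t = n..N-1
    Phi = np.column_stack([u[n - k:N - k] for k in range(1, n + 1)])
    theta_hat = np.linalg.lstsq(Phi, y[n:], rcond=None)[0]
    I_F = Phi.T @ Phi / (Phi.shape[0] * lambda_e)
    return theta_hat, I_F
```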


A model’s accuracy: FIR

$$\sqrt N(\hat\theta_N - \theta_o) \sim \mathcal N(0,\ I_F^{-1}), \qquad I_F = \frac{1}{\lambda_e N}\sum_{t=1}^N \varphi(t)\varphi^T(t), \quad \varphi(t) = [u(t-1)\ \ldots\ u(t-n)]^T$$

Confidence ellipsoids:

$$\left\{\theta : (\theta_o - \theta)^T I_F\, (\theta_o - \theta) \le \frac{\chi^2_\alpha(n)}{N}\right\}$$

($\chi^2_\alpha(n)$ is the quantile function of the $\chi^2$ distribution with $n$ degrees of freedom.)

[Figure: confidence ellipsoids around $\theta_o$]

- Scales with the sample size
- $\chi^2_\alpha(n) \propto n$, so it scales with the number of parameters
- $I_F \propto 1/\lambda_e$, so it scales with the noise variance


A model’s accuracy: FIR

$$(\theta_o - \theta)^T I_F\, (\theta_o - \theta) = \frac{1}{\lambda_e N}\sum_{t=1}^N (\theta_o - \theta)^T\varphi(t)\varphi^T(t)(\theta_o - \theta)$$

$$\varphi^T(t)(\theta_o - \theta) = y(t) - e(t) - \varphi^T(t)\theta = \varepsilon(t, \theta) - e(t) \;\Rightarrow$$

$$(\theta_o - \theta)^T I_F\, (\theta_o - \theta) \approx \frac{1}{\lambda_e N}\sum_{t=1}^N \varepsilon^2(t, \theta) - \frac{1}{\lambda_e N}\sum_{t=1}^N e^2(t) \approx \frac{1}{\lambda_e} V_{id}(\theta) - 1$$

Confidence ellipsoid:

$$\left\{\theta : V_{id}(\theta) \le \frac{\lambda_e\,\chi^2_\alpha(n)}{N} + \lambda_e\right\}$$

- A level curve of the identification criterion
- Accurate estimate in directions where the prediction error is sensitive to parameter changes (input dependent)
- Large sample size ($N \to \infty$):

$$I_F \approx \frac{1}{\lambda_e} E[\varphi(t)\varphi^T(t)] = \frac{1}{\lambda_e}\begin{bmatrix} E[u^2(t)] & E[u(t)u(t-1)] & \ldots \\ E[u(t)u(t-1)] & E[u^2(t)] & \ldots \\ \vdots & \ddots & \ddots \end{bmatrix}$$

Strong correlations $\Rightarrow$ poor conditioning $\Rightarrow$ poor information


A model’s accuracy: FIR

Frequency function estimate:

$$G(e^{i\omega}, \theta) = B(e^{i\omega}) = \Gamma^T(e^{i\omega})\theta, \quad \text{where } \Gamma(q) = [q^{-1}\ \ldots\ q^{-n}]^T$$

Linear in $\theta$ $\Rightarrow$ error:

$$\mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N) = \frac{1}{N}\,\Gamma^T(e^{i\omega})\, I_F^{-1}\, \Gamma(e^{i\omega})$$

White input: $I_F = \frac{\lambda_u}{\lambda_e} I$ $\Rightarrow$

$$\mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N) = \frac{1}{N}\,\Gamma^T(e^{i\omega})\,\frac{\lambda_e}{\lambda_u} I\, \Gamma(e^{i\omega}) = n\,\frac{\lambda_e}{N\lambda_u}$$

- Noise power ($\lambda_e$) to signal energy ($N\lambda_u$)
- Scaling with the number of parameters $n$

Approximate expression for general input and noise spectra:

$$\mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N) \approx \frac{n}{N}\,\frac{\Phi_e(\omega)}{\Phi_u(\omega)}$$
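A sketch that evaluates this covariance on a frequency grid, taking $I_F$ as given (e.g. from the FIR sketch above; names illustrative):

```python
# Variance of the frequency-function estimate: (1/N) Gamma* I_F^{-1} Gamma.
import numpy as np

def freq_response_variance(I_F, N, omegas):
    n = I_F.shape[0]
    P = np.linalg.inv(I_F)  # asymptotic covariance of sqrt(N)(theta_N - theta_o)
    var = []
    for w in omegas:
        Gamma = np.exp(-1j * w * np.arange(1, n + 1))  # [e^{-iw}, ..., e^{-inw}]
        var.append((Gamma.conj() @ P @ Gamma).real / N)
    return np.array(var)
```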


Error bounds

Data (input is white with variance 1): [figure: input and output sequences, t = 0..100]

Model: $y(t) = \dfrac{b_1}{1 + f_1 q^{-1} + f_2 q^{-2}}\, u(t) + e(t)$

Bode plots with error bounds: [figure]


Fundamental limitations: FIR

Cramér-Rao lower bound: $I_F^{-1}$ is the smallest possible covariance matrix among all unbiased estimators.

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N)\,\Phi_u(\omega)\, d\omega = \frac{1}{2\pi N}\int_{-\pi}^{\pi} \Gamma^T(e^{i\omega})\, I_F^{-1}\, \Gamma(e^{i\omega})\,\Phi_u(\omega)\, d\omega = \mathrm{Tr}\left\{I_F^{-1}\,\frac{1}{2\pi N}\int_{-\pi}^{\pi} \Gamma(e^{i\omega})\Gamma^T(e^{i\omega})\,\Phi_u(\omega)\, d\omega\right\}$$

but $\varphi(t) = [u(t-1)\ \ldots\ u(t-n)]^T = \Gamma(q)u(t)$, so

$$I_F \approx \frac{1}{\lambda_e} E[\varphi(t)\varphi^T(t)] = \frac{1}{2\pi\lambda_e}\int_{-\pi}^{\pi} \Gamma(e^{i\omega})\,\Phi_u(\omega)\,\Gamma^*(e^{i\omega})\, d\omega$$

Water bed effect:

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N)\,\Phi_u(\omega)\, d\omega = \frac{\lambda_e}{N}\,\mathrm{Tr}\, I = \frac{n\lambda_e}{N}$$
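A numerical sanity check of this identity (a sketch with illustrative values, assuming a white input so that $I_F = (\lambda_u/\lambda_e)I$ and $\Phi_u(\omega) = \lambda_u$):

```python
# Verify (1/2pi) int Cov G * Phi_u dw = n*lambda_e/N for a white input.
import numpy as np

n, N = 5, 1000
lambda_e, lambda_u = 0.1, 1.0
P = np.linalg.inv((lambda_u / lambda_e) * np.eye(n))   # I_F^{-1}

def cov_G(w):
    Gamma = np.exp(-1j * w * np.arange(1, n + 1))
    return (Gamma.conj() @ P @ Gamma).real / N

omegas = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
# (1/2pi) * integral over [-pi, pi) equals the mean over a uniform grid
lhs = np.mean([cov_G(w) * lambda_u for w in omegas])
print(lhs, n * lambda_e / N)   # both equal 5e-4 here
```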


The waterbed effect: FIR

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N)\,\Phi_u(\omega)\, d\omega = \frac{n\lambda_e}{N}$$

- Decreasing the error somewhere will increase the error somewhere else.
- The input spectrum acts as a weighting and controls the effect.
- A smaller error can be allowed where the input spectrum is large.
- Again the number of parameters pops up.


A model’s accuracy and fundamental limitations

The general case:

$$y(t) = G(q, \theta)u(t) + H(q, \theta)e(t)$$

- The error for small sample sizes is not well understood.
- Large sample size $\Rightarrow$

$$\varepsilon(t, \hat\theta_N) \approx \varepsilon(t, \theta_o) + \varphi^T(t)(\hat\theta_N - \theta_o)$$

where $\varphi(t) = \frac{d}{d\theta}\varepsilon(t, \theta_o) = -\frac{d}{d\theta}\hat y(t, \theta_o)$. Same as the FIR case!

$$\sqrt N(\hat\theta_N - \theta_o) \sim \mathcal N(0,\ I_F^{-1}), \qquad I_F = \frac{1}{\lambda_e} E[\varphi(t)\varphi^T(t)]$$

$$\mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N) \approx \frac{n\, |H_o(e^{i\omega})|^2 \lambda_e}{N\,\Phi_u(\omega)}$$

$$\frac{1}{2\pi}\int_{-\pi}^{\pi} \mathrm{Cov}\, G(e^{i\omega}, \hat\theta_N)\, \frac{\Phi_u(\omega)}{|H_o(e^{i\omega})|^2}\, d\omega = \frac{n\lambda_e}{N}$$


Connection to ML

$$V_N(\theta) = \frac{1}{N}\sum_{t=1}^N \ell(\varepsilon(t, \theta))$$

$$P = \kappa(\ell)\left(E[\varphi(t)\varphi^T(t)]\right)^{-1}$$

$\kappa(\ell)$ has replaced $\lambda_e$:

$$\kappa(\ell) = \frac{E[(\ell'(e_o(t)))^2]}{(E[\ell''(e_o(t))])^2}$$

- $\kappa = \lambda_e$ for $\ell(x) = x^2$
- $\kappa(\ell) \ge \kappa_o := \kappa(-\log f_e)$


Analysing the asymptotic covariance matrix

$$\sqrt N\big(\hat\theta_N - \theta_o\big) \in \mathrm{As}\mathcal N(0, P)$$

where $P$ also satisfies

$$P = \mathrm{AsCov}\,\hat\theta_N := \lim_{N\to\infty} N \cdot E\left[(\hat\theta_N - E\hat\theta_N)(\hat\theta_N - E\hat\theta_N)^T\right]$$


Introduction

An enormous amount of model error information is hidden in $P$:

Structural:
- Model structure (ARX, ARMAX, Box-Jenkins, non-linear, ...)
- Model order
- Open vs closed loop
- Input channels and input excitation
- Sensor channels
- Noise

System properties:
- Frequency function
- Impulse response coefficients
- Gains
- Poles and zeros
- Control applications


The structure of P

A non-singular $P$ can always be written as

$$P = [\langle \Psi, \Psi\rangle]^{-1}$$

where $\Psi : \mathbb C \to \mathbb C^{n\times m}$ for some integer $m > 0$ depending on the model structure, and where

$$\langle \Psi, \Phi\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Psi(e^{j\omega})\,\Phi^*(e^{j\omega})\, d\omega$$

(inner products between the rows, also known as a Gramian)


The structure of P: Example 1 - FIR models

$$y_t = \sum_{k=1}^n \theta_k\, u_{t-k} + e_t = \varphi_t^T\theta + e_t$$

$$P = \lambda_e\left[E[\varphi_t\varphi_t^T]\right]^{-1}$$

$$E[\varphi_t\varphi_t^T] = \begin{bmatrix} r_0 & r_1 & \ldots & r_{n-1} \\ r_1 & r_0 & \ldots & r_{n-2} \\ \vdots & & & \vdots \\ r_{n-1} & r_{n-2} & \ldots & r_0 \end{bmatrix} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \Gamma_n(e^{j\omega})\,\Phi_u(\omega)\,\Gamma_n^*(e^{j\omega})\, d\omega$$

where $\Gamma_n(q) = [q^{-1}\ \ldots\ q^{-n}]^T$

$$\Rightarrow\ P = \langle \Gamma_n\Phi_u^{1/2}/\sqrt{\lambda_e},\ \Gamma_n\Phi_u^{1/2}/\sqrt{\lambda_e}\rangle^{-1} = \langle\Psi, \Psi\rangle^{-1}$$
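A small sketch computing $P$ from an input autocovariance sequence (the AR(1)-type choice $r_k = \lambda_u a^{|k|}$ is illustrative, not from the slides):

```python
# P = lambda_e * (Toeplitz of input autocovariances)^{-1} for an FIR model.
import numpy as np
from scipy.linalg import toeplitz

def fir_P(n, lambda_e, lambda_u=1.0, a=0.7):
    r = lambda_u * a ** np.arange(n)       # r_0, ..., r_{n-1}
    return lambda_e * np.linalg.inv(toeplitz(r))
```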


The structure of P: Example 2 - NFIR models

$$y_t = \sum_{k=1}^n \theta_k\, u_{t-k} + \sum_{k=n+1}^{n+m} \theta_k\, (u_{t-(k-n)})^2 + e_t = \varphi_t^T\theta + e_t$$

$$\varphi_t = [u_{t-1}, \ldots, u_{t-n}, (u_{t-1})^2, \ldots, (u_{t-m})^2]^T = M(q)z_t$$

where

$$M(q) = \begin{bmatrix}\Gamma_n(q) & 0 \\ 0 & \Gamma_m(q)\end{bmatrix}, \qquad z_t = \begin{bmatrix} u_t \\ (u_t)^2 \end{bmatrix}$$

$$\Rightarrow\ \Psi(e^{i\omega}) = \frac{1}{\sqrt{\lambda_e}}\, M(e^{i\omega})\,\Phi_z^{1/2}(e^{i\omega})$$

where $\Phi_z^{1/2}$ is a Cholesky factor of $\Phi_z$, the spectrum of $z$.


The structure of P: Other quantities

Typically it is not the parameters $\theta$ directly we are interested in, but rather some function $J(\theta) : \mathbb R^{n\times 1} \to \mathbb C^{1\times q}$:
- frequency function
- impulse response
- system gain

$$\mathrm{AsCov}\, J(\hat\theta_N) = \lim_{N\to\infty} N\, E[(J(\hat\theta_N) - J(\theta_o))^*(J(\hat\theta_N) - J(\theta_o))] = \Lambda^*\, [\langle\Psi, \Psi\rangle]^{-1}\, \Lambda$$

where

$$\Lambda \triangleq J'(\theta_o) \in \mathbb C^{n\times q}$$


A refresher on orthogonal projection: Scalar case

Least squares estimation:

$$Y = \theta\Phi + E$$

($Y$ and $\theta$ are row vectors)

$$\hat Y = Y\Phi^T\left[\Phi\Phi^T\right]^{-1}\Phi$$

[Figure: $Y$ projected onto the row space of $\Phi$]

General case: let the rows of $X = [x_1\ x_2\ \ldots\ x_n]^T$ span a (closed) subspace $S_X$ of a Hilbert space $\mathcal H$. Then the projection of $f \in \mathcal H$ on $S_X$ is given by

$$\hat f = \langle f, X\rangle\, [\langle X, X\rangle]^{-1}\, X$$


A refresher on orthogonal projection: MV-case

Multi-output least squares estimation:

$$Y = \theta\Phi + E$$

$$\hat Y = Y\Phi^T\left[\Phi\Phi^T\right]^{-1}\Phi$$

Same equation as before: each row of $Y$ is projected on the row space of $\Phi$.

General case: the projection of $f_i \in \mathcal H$, $i = 1, \ldots, n$ on $S_X$ is given by the rows of

$$\hat f = \langle f, X\rangle\, [\langle X, X\rangle]^{-1}\, X$$

where $f = [f_1\ f_2\ \ldots\ f_n]^T$.


A refresher on orthogonal projection: The norm

$$\hat f = \langle f, X\rangle\, [\langle X, X\rangle]^{-1}\, X$$

The "norm" of the projection:

$$\langle \hat f, \hat f\rangle = \langle f, X\rangle\, [\langle X, X\rangle]^{-1}\, \langle X, f\rangle$$
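A tiny numeric illustration of the Gramian formulas, taking the Hilbert space to be plain $\mathbb R^m$ for concreteness (values illustrative):

```python
# Projection onto a row space via the Gramian, and its squared norm.
import numpy as np

def project(f, X):
    """Rows of f projected onto the row space of X: <f,X> [<X,X>]^-1 X."""
    G = X @ X.T                       # Gramian <X, X>
    return (f @ X.T) @ np.linalg.solve(G, X)

X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])      # rows span the (x1, x2) plane
f = np.array([[1.0, 2.0, 3.0]])
f_hat = project(f, X)                # -> [[1., 2., 0.]]
norm2 = (f_hat @ f_hat.T).item()     # <f_hat, f_hat> = 5.0
```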


A refresher on orthogonal projection

$$\langle \hat f, \hat f\rangle = \langle f, X\rangle\, [\langle X, X\rangle]^{-1}\, \langle X, f\rangle$$

Hmmm, didn't we just see something similar a minute ago?

$$\mathrm{AsCov}\, J(\hat\theta_N) = \Lambda^*\, [\langle\Psi, \Psi\rangle]^{-1}\, \Lambda$$

What if $\Lambda = \langle\Psi, \gamma\rangle$ for some function $\gamma$?

$$\mathrm{AsCov}\, J(\hat\theta_N) = \langle\hat\gamma, \hat\gamma\rangle = \langle\mathrm{Proj}_{S_\Psi}\gamma,\ \mathrm{Proj}_{S_\Psi}\gamma\rangle$$


A geometric interpretation of P

$$\mathrm{AsCov}\, J(\hat\theta_N) = \langle\mathrm{Proj}_{S_\Psi}\gamma,\ \mathrm{Proj}_{S_\Psi}\gamma\rangle$$

Scalar quantities: the asymptotic variance is the squared norm of $\gamma$ projected onto the subspace spanned by the rows of $\Psi$.

Why is this useful?
- Often $\gamma$ can be chosen so that it only depends on the quantity $J(\theta)$ of interest.
- The influence on $\Psi$ from model structure, model order, and experimental conditions is often simple to establish.

Decoupling! Examples will be used to illustrate this.


Structural results: Adding parameters

Example: FIR system

$$y_t = \sum_{k=1}^n \theta_k\, u_{t-k} + e_t = \theta\varphi_t^T + e_t$$

$$P = [\langle\Psi, \Psi\rangle]^{-1}, \qquad \Psi = \frac{1}{\sqrt{\lambda_e}}\,\Gamma_n\Phi_u^{1/2}$$

True order $n = n_o$, $J = J(\theta_1, \ldots, \theta_{n_o})$.
What happens with the accuracy if we over-model ($n > n_o$)?


Adding parameters: FIR example

$$\mathrm{AsCov}\, J(\hat\theta_N) = \begin{bmatrix}\Lambda \\ 0_{(n-n_o)\times q}\end{bmatrix}^* [\langle\Omega, \Omega\rangle]^{-1} \begin{bmatrix}\Lambda \\ 0_{(n-n_o)\times q}\end{bmatrix}$$

where

$$\Omega = \frac{\Gamma_n\Phi_u^{1/2}}{\sqrt{\lambda_e}} = \begin{bmatrix}\Gamma_{n_o} \\ \Gamma_{n_o,n}\end{bmatrix}\frac{\Phi_u^{1/2}}{\sqrt{\lambda_e}}$$

where

$$\Gamma_{m,n}(z) = [z^{-(m+1)}\ \ldots\ z^{-n}]^T.$$

The question is now how large $\Lambda^*\, [\langle\Psi, \Psi\rangle]^{-1}\, \Lambda$ is in comparison with the expression above.


Structural results: Adding parameters

Geometrical result: let $X$ and $Y$ be two subspaces of $L_2^m$ such that $X \subseteq Y \subseteq L_2^m$, and let $\gamma \in L_2^{q\times m}$. We then have the orthogonal decomposition

$$\mathrm{Proj}_Y\gamma = \mathrm{Proj}_X\gamma + \mathrm{Proj}_{X^\perp(Y)}\gamma$$

($X^\perp(Y)$ denotes the orthogonal complement of $X$ in $Y$.)

Hence

$$\langle\mathrm{Proj}_Y\gamma,\ \mathrm{Proj}_Y\gamma\rangle - \langle\mathrm{Proj}_X\gamma,\ \mathrm{Proj}_X\gamma\rangle = \langle\mathrm{Proj}_{X^\perp(Y)}\gamma,\ \mathrm{Proj}_{X^\perp(Y)}\gamma\rangle \ge 0$$


Adding parameters: FIR example

$$\mathrm{AsCov}\, J(\hat\theta_N) = \begin{bmatrix}\Lambda \\ 0_{(n-n_o)\times q}\end{bmatrix}^* [\langle\Omega, \Omega\rangle]^{-1} \begin{bmatrix}\Lambda \\ 0_{(n-n_o)\times q}\end{bmatrix}$$

where

$$\Omega = \frac{\Gamma_n\Phi_u^{1/2}}{\sqrt{\lambda_e}} = \begin{bmatrix}\Gamma_{n_o} \\ \Gamma_{n_o,n}\end{bmatrix}\frac{\Phi_u^{1/2}}{\sqrt{\lambda_e}} \triangleq \begin{bmatrix}\Psi \\ \Phi\end{bmatrix}$$

$X = S_\Psi \subseteq S_\Omega = Y$.

A property of a suitable $\gamma$: $\langle\Omega, \gamma\rangle = \begin{bmatrix}\Lambda \\ 0_{(n-n_o)\times q}\end{bmatrix}$, i.e. $\gamma \perp S_\Phi$.

When is there no variance increase?

$$0 = \mathrm{Proj}_{X^\perp(Y)}\gamma = \mathrm{Proj}_{S_\Psi^\perp(S_\Omega)}\gamma,\ \ \gamma \perp S_\Phi \ \Rightarrow\ S_\Psi^\perp(S_\Omega) \subset S_\Phi \ \Rightarrow\ S_\Phi \perp S_\Psi \ \Rightarrow\ \langle\Phi, \Psi\rangle = 0$$


Adding parameters: FIR example

No increase: if and only if $\langle\Phi, \Psi\rangle = 0$.

FIR example:

$$\langle\Phi, \Psi\rangle = \langle\Gamma_{n_o,n}\Phi_u^{1/2},\ \Gamma_{n_o}\Phi_u^{1/2}\rangle = R_{n_o,n} \triangleq \begin{bmatrix} r_{n_o} & r_{n_o-1} & \cdots & r_1 \\ r_{n_o+1} & r_{n_o} & \cdots & r_2 \\ \vdots & & & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_{n-n_o} \end{bmatrix}$$

No increase: if and only if $r_1 = \ldots = r_{n-1} = 0$.
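A numeric illustration of this condition (illustrative values, not from the slides): with a white input ($r_k = 0$ for $k \ge 1$) over-modeling does not inflate the variance of $\theta_1$, while a colored input does.

```python
# Variance of theta_1 = first diagonal entry of P = lambda_e * R^{-1}.
import numpy as np
from scipy.linalg import toeplitz

def var_theta1(r, lambda_e=0.1):
    return lambda_e * np.linalg.inv(toeplitz(r))[0, 0]

n_o, n = 2, 6
white = np.r_[1.0, np.zeros(n - 1)]     # r_k = 0 for k >= 1
colored = 0.6 ** np.arange(n)           # r_k = 0.6^k
print(var_theta1(white[:n_o]), var_theta1(white))      # equal: no increase
print(var_theta1(colored[:n_o]), var_theta1(colored))  # grows with n
```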


Structural results: Adding parameters

No increase: if and only if $\langle\Phi, \Psi\rangle = 0$.

The derivation is not tied to the FIR example.

Immediate generalization: adding parameters increases the asymptotic variance unless the new predictor gradients are orthogonal to the old ones.


L. Ljung, “Asymptotic variance expressions for identified black-box transfer function models,” IEEE Trans. Automatic Control, vol. 30, no. 9, pp. 834–844, 1985.

L. Ljung and Z. Yuan, “Asymptotic properties of black-box identification of transfer functions,” IEEE Trans. Automatic Control, vol. 30, no. 6, pp. 514–530, 1985.

H. Hjalmarsson and J. Mårtensson, “A geometric approach to variance analysis in system identification,” IEEE Transactions on Automatic Control, vol. 56, no. 5, pp. 983–997, May 2011.

J. Mårtensson, N. Everitt, and H. Hjalmarsson, “Covariance analysis in SISO linear systems identification,” Automatica, vol. 77, pp. 82–92, Mar 2017.
