
Introduction to Local Level Model and Kalman Filter

S.J. Koopman

http://staff.feweb.vu.nl/koopman

January 2011

U.K. Yearly Inflation: signal extraction

[Figure: U.K. yearly inflation, 1760–2000, with extracted signal; values roughly between −0.2 and 0.3.]

Dow Jones

[Figures, 1930–2010: Dow Jones Industrial Average (monthly closures); its log returns (LogReturn); the log of squared log returns (LSqrLReturn).]

Log Squared LogReturns Dow Jones: signal extraction

[Figures: log of squared log returns of the Dow Jones with extracted signal, four panels over 1940–2010.]

Classical Decomposition

Basic Model

A basic model for representing a time series is the additive model

y_t = μ_t + γ_t + ε_t,  t = 1, ..., n,

also known as the Classical Decomposition:

y_t = observation,
μ_t = slowly changing component (trend),
γ_t = periodic component (seasonal),
ε_t = irregular component (disturbance).

Unobserved Components Time Series Model

In a Structural Time Series Model (STSM) or Unobserved Components Model (UCM), the various components are modelled explicitly as stochastic processes.

Local Level Model

• Components can be deterministic functions of time (e.g. polynomials), or stochastic processes;
• Deterministic example: y_t = μ + ε_t with ε_t ∼ NID(0, σ²_ε);
• Stochastic example: the Random Walk plus Noise, or Local Level model:

y_t = μ_t + ε_t,  ε_t ∼ NID(0, σ²_ε),
μ_{t+1} = μ_t + η_t,  η_t ∼ NID(0, σ²_η);

• The disturbances ε_t, η_s are independent for all s, t;
• The model is incomplete without a specification for μ_1 (note the non-stationarity):

μ_1 ∼ N(a, P)

Local Level Model

y_t = μ_t + ε_t,  ε_t ∼ NID(0, σ²_ε),
μ_{t+1} = μ_t + η_t,  η_t ∼ NID(0, σ²_η),
μ_1 ∼ N(a, P)

General framework

• The level μ_t and the irregular ε_t are unobservables;
• Parameters: σ²_ε and σ²_η;
• Trivial special cases:
  • σ²_η = 0 ⟹ y_t ∼ NID(μ_1, σ²_ε) (white noise with constant level);
  • σ²_ε = 0 ⟹ y_{t+1} = y_t + η_t (pure random walk);
• Local Level is a model representation for EWMA (exponentially weighted moving average) forecasting.
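The next slides show simulated data from this model. As a minimal sketch (Python/NumPy; the function name and defaults are my assumptions, not from the slides), the model can be simulated directly from its two equations:

```python
import numpy as np

def simulate_ll(n, sigma2_eps, sigma2_eta, mu1=0.0, seed=0):
    """Simulate y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2_eps), n)
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), n)
    mu = mu1 + np.concatenate(([0.0], np.cumsum(eta[:-1])))  # mu_1 = mu1
    return mu + eps, mu

# The three settings shown on the next slides:
for s2e, s2h in [(0.1, 1.0), (1.0, 1.0), (1.0, 0.1)]:
    y, mu = simulate_ll(100, s2e, s2h)
```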

Simulated LL Data

[Figures: simulated local level data (n = 100) showing y and μ for three settings: (σ²_ε = 0.1, σ²_η = 1), (σ²_ε = 1, σ²_η = 1) and (σ²_ε = 1, σ²_η = 0.1).]

Local Level Model

y_t = μ_t + ε_t,  ε_t ∼ NID(0, σ²_ε),
μ_{t+1} = μ_t + η_t,  η_t ∼ NID(0, σ²_η).

Its properties

• First difference is stationary:

Δy_t = Δμ_t + Δε_t = η_{t−1} + ε_t − ε_{t−1}.

• Dynamic properties of Δy_t:

E(Δy_t) = 0,
γ_0 = E(Δy_t Δy_t) = σ²_η + 2σ²_ε,
γ_1 = E(Δy_t Δy_{t−1}) = −σ²_ε,
γ_τ = E(Δy_t Δy_{t−τ}) = 0 for τ ≥ 2.
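As an illustrative numerical check (not part of the original slides), the sample moments of Δy_t from a long simulated series should be close to these autocovariances:

```python
import numpy as np

rng = np.random.default_rng(1)
n, s2e, s2h = 200_000, 1.0, 0.5
eps = rng.normal(0.0, np.sqrt(s2e), n)
eta = rng.normal(0.0, np.sqrt(s2h), n)
mu = np.concatenate(([0.0], np.cumsum(eta[:-1])))
dy = np.diff(mu + eps)            # Delta y_t = eta_{t-1} + eps_t - eps_{t-1}
g0 = dy.var()                     # close to s2h + 2*s2e = 2.5
g1 = np.mean(dy[1:] * dy[:-1])    # close to -s2e = -1.0
```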

Properties of the LL model

• The ACF of Δy_t is

ρ_1 = −σ²_ε / (σ²_η + 2σ²_ε) = −1 / (q + 2),  q = σ²_η / σ²_ε,
ρ_τ = 0,  τ ≥ 2.

• q is called the signal-to-noise ratio;
• The model for Δy_t is MA(1) with parameters restricted such that

−1/2 ≤ ρ_1 ≤ 0,

i.e., y_t is ARIMA(0,1,1);
• Write Δy_t = ξ_t + θξ_{t−1}, ξ_t ∼ NID(0, σ²), and solve for θ:

θ = ( √(q² + 4q) − 2 − q ) / 2.
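A quick numerical sanity check (illustrative; the function name is my own): for an MA(1), ρ_1 = θ/(1 + θ²), so θ as given above should reproduce ρ_1 = −1/(q + 2):

```python
import numpy as np

def theta_from_q(q):
    """MA(1) coefficient of the reduced-form ARIMA(0,1,1)."""
    return (np.sqrt(q**2 + 4*q) - 2 - q) / 2

q = 0.5
theta = theta_from_q(q)                                  # = -0.5 here
assert np.isclose(theta / (1 + theta**2), -1 / (q + 2))  # rho_1 matches
```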

Local Level Model

• The model parameters are estimated by Maximum Likelihood;
• Advantages of the model-based approach: assumptions can be tested, parameters are estimated, ...;
• The model with estimated parameters is used for signal extraction of the components;
• The estimated level μ_t is effectively a locally weighted average of the data;
• The distribution of weights can be compared with kernel functions in nonparametric regression;
• On the basis of the model, the methods yield minimum mean square error (MMSE) forecasts and the associated confidence intervals.

Nile Data: decomposition

[Figures, 1870–1970: Nile level with ±2 SE band (about 500–1250); Nile irregular component (about −200 to 200).]

Nile Data: decomposition weights

[Figures: estimated Nile level, 1870–1970 (about 800–1100); weights of the level estimate at leads and lags −20 to 20 (about 0.05–0.15).]

Nile Data: forecasts

[Figure, 1870–1980: Nile data and forecasts with ±SE band (about 500–1400).]

Kalman Filter

• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
• The state is Gaussian: the complete distribution is characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
• To start the recursion, we need a_1 and P_1, which we assume given.
• There are various ways to initialize when a_1 and P_1 are unknown, which we will not discuss here; see the discussion in the DK book, Chapter 2.

Kalman Filter

The unobserved variable μ_t can be estimated from the observations with the Kalman filter:

v_t = y_t − a_t,
F_t = P_t + σ²_ε,
K_t = P_t F_t⁻¹,
a_{t+1} = a_t + K_t v_t,
P_{t+1} = P_t + σ²_η − K_t² F_t,

for t = 1, ..., n, starting with given values for a_1 and P_1.

• Writing Y_t = {y_1, ..., y_t}, define

a_{t+1} = E(μ_{t+1} | Y_t),  P_{t+1} = var(μ_{t+1} | Y_t).
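These recursions translate directly into code. A minimal NumPy sketch (the function name, and the crude large-P_1 default standing in for an unknown initialization, are my assumptions, not the slides'):

```python
import numpy as np

def kalman_filter_ll(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Kalman filter for the local level model; returns a, P, v, F."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    v = np.empty(n); F = np.empty(n)
    a[0], P[0] = a1, P1               # a_1, P_1 assumed given
    for t in range(n):
        v[t] = y[t] - a[t]            # prediction error
        F[t] = P[t] + sigma2_eps      # prediction error variance
        K = P[t] / F[t]               # Kalman gain K_t = P_t / F_t
        a[t + 1] = a[t] + K * v[t]    # state update
        P[t + 1] = P[t] + sigma2_eta - K**2 * F[t]
    return a[:n], P[:n], v, F
```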

Kalman Filter

Local level model: μ_{t+1} = μ_t + η_t,  y_t = μ_t + ε_t.

• Writing Y_t = {y_1, ..., y_t}, define

a_{t+1} = E(μ_{t+1} | Y_t),  P_{t+1} = var(μ_{t+1} | Y_t);

• The prediction error is

v_t = y_t − E(y_t | Y_{t−1})
    = y_t − E(μ_t + ε_t | Y_{t−1})
    = y_t − E(μ_t | Y_{t−1})
    = y_t − a_t;

• It follows that v_t = (μ_t − a_t) + ε_t and E(v_t) = 0;
• The prediction error variance is F_t = var(v_t) = P_t + σ²_ε.

Regression theory

The proof of the Kalman filter uses lemmas from multivariate Normal regression theory.

Lemma 1. Suppose x, y are jointly Normally distributed vectors. Then

E(x | y) = E(x) + Σ_xy Σ_yy⁻¹ (y − E(y)),
var(x | y) = Σ_xx − Σ_xy Σ_yy⁻¹ Σ′_xy.

Lemma 2. Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σ_yz = 0. Then

E(x | y, z) = E(x | y) + Σ_xz Σ_zz⁻¹ z,
var(x | y, z) = var(x | y) − Σ_xz Σ_zz⁻¹ Σ′_xz.

Derivation of the Kalman Filter

Local level model: μ_{t+1} = μ_t + η_t,  y_t = μ_t + ε_t.

• We have Y_t = {Y_{t−1}, y_t} = {Y_{t−1}, v_t} and E(v_t y_{t−j}) = 0 for j = 1, ..., t − 1;
• The lemma is E(x | y, z) = E(x | y) + Σ_xz Σ_zz⁻¹ z. In our case, take x = μ_{t+1}, y = Y_{t−1} and z = v_t = (μ_t − a_t) + ε_t;
• E(x | y) gives E(μ_{t+1} | Y_{t−1}) = E(μ_t | Y_{t−1}) + E(η_t | Y_{t−1}) = a_t;
• Further, Σ_xz is given by E(μ_{t+1} v_t) = E(μ_t v_t) + E(η_t v_t) = E[(μ_t − a_t)(y_t − a_t)] + E(η_t v_t) = E[(μ_t − a_t)(μ_t − a_t)] + E[(μ_t − a_t) ε_t] + E(η_t v_t) = P_t;
• Since Σ_zz = F_t, we can apply the lemma and obtain the state update

a_{t+1} = E(μ_{t+1} | Y_{t−1}, y_t) = a_t + P_t F_t⁻¹ v_t = a_t + K_t v_t,  with K_t = P_t F_t⁻¹.

Kalman Filter Derived

• Our best prediction of y_t based on its past is a_t. When the actual observation arrives, calculate the prediction error v_t = y_t − a_t and its variance F_t = P_t + σ²_ε.
• The best estimate of the state mean for the next period is based on both the current estimate a_t and the new information v_t:

a_{t+1} = a_t + K_t v_t,

similarly for the variance:

P_{t+1} = P_t + σ²_η − K_t F_t K′_t.

• The Kalman gain

K_t = P_t F_t⁻¹

is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).

Kalman filter for Nile Data: (i) a_t; (ii) P_t; (iii) v_t and (iv) F_t.

[Figures, 1880–1960: (i) a_t (about 500–1250); (ii) P_t (about 7500–17500); (iii) v_t (about −250 to 250); (iv) F_t (about 22500–32500).]

Steady State Kalman Filter

The Kalman filter variance converges to a positive value, say P_t → P. We would then have

F_t → P + σ²_ε,  K_t → P / (P + σ²_ε).

The state prediction variance update leads to

P = P ( 1 − P / (P + σ²_ε) ) + σ²_η,

which reduces to the quadratic

x² − xq − q = 0,

where x = P/σ²_ε and q = σ²_η/σ²_ε, with solution

P = σ²_ε ( q + √(q² + 4q) ) / 2.
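An illustrative check (not from the slides) that iterating the variance recursion indeed converges to this closed form:

```python
import numpy as np

s2e, s2h = 1.0, 0.5
q = s2h / s2e
P_closed = s2e * (q + np.sqrt(q**2 + 4*q)) / 2   # steady-state P

P = 1e7                           # start from a large prediction variance
for _ in range(200):
    P = P * (1 - P / (P + s2e)) + s2h
assert np.isclose(P, P_closed)    # converges to the quadratic's root
```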

Smoothing

• The filter calculates the mean and variance conditional on Y_t;
• The Kalman smoother calculates the mean and variance conditional on the full set of observations Y_n;
• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first:

μ̂_t = E(μ_t | Y_n),  V_t = var(μ_t | Y_n),
r_t = weighted sum of future innovations,  N_t = var(r_t),
L_t = 1 − K_t.

Starting with r_n = 0, N_n = 0, the smoothing recursions are given by

r_{t−1} = F_t⁻¹ v_t + L_t r_t,  N_{t−1} = F_t⁻¹ + L_t² N_t,
μ̂_t = a_t + P_t r_{t−1},  V_t = P_t − P_t² N_{t−1}.
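A minimal sketch of this backward pass (the function name is mine; it consumes the arrays a, P, v, F returned by the filter sketch earlier):

```python
import numpy as np

def kalman_smoother_ll(a, P, v, F):
    """Backward smoothing pass; a, P, v, F from the Kalman filter."""
    n = len(v)
    r, N = 0.0, 0.0                   # r_n = 0, N_n = 0
    mu_hat = np.empty(n); V = np.empty(n)
    for t in range(n - 1, -1, -1):
        L = 1.0 - P[t] / F[t]         # L_t = 1 - K_t
        r = v[t] / F[t] + L * r       # r_{t-1}
        N = 1.0 / F[t] + L**2 * N     # N_{t-1}
        mu_hat[t] = a[t] + P[t] * r   # smoothed level
        V[t] = P[t] - P[t]**2 * N     # smoothed variance
    return mu_hat, V
```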

Kalman smoothing for Nile Data: (i) μ̂_t; (ii) V_t; (iii) r_t and (iv) N_t.

[Figures, 1880–1960: (i) μ̂_t (about 500–1250); (ii) V_t (about 2500–4000); (iii) r_t (about −0.02 to 0.02); (iv) N_t (about 6e−5 to 1e−4).]

Missing Observations

Missing observations are very easy to handle in Kalman filtering. Suppose y_j is missing:

• put v_j = 0, K_j = 0 and F_j = ∞ in the algorithm;
• proceed with the remaining calculations as normal.

The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
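In code this amounts to skipping the update step when an observation is missing; a sketch of one filter step, with NaN marking a missing value (names are my own):

```python
import numpy as np

def kf_step(y_t, a, P, s2e, s2h):
    """One filter step; y_t = np.nan marks a missing observation."""
    if np.isnan(y_t):                 # v_t = 0, K_t = 0, F_t = infinity
        return a, P + s2h             # extrapolate by the state equation
    v = y_t - a
    F = P + s2e
    K = P / F
    return a + K * v, P + s2h - K**2 * F
```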

Nile Data with missing observations: (i) a_t, (ii) P_t, (iii) μ̂_t and (iv) V_t.

[Figures, 1880–1960: filtered and smoothed level through the missing stretches; P_t grows to about 30000 and V_t to about 10000 where observations are missing.]

Forecasting

Forecasting requires no extra theory: just treat future observations as missing:

• put v_j = 0, K_j = 0 and F_j = ∞ for j = n + 1, ..., n + k;
• proceed with the calculations as normal;
• the forecast for y_j is a_j.
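A sketch of k-step-ahead forecasting along these lines (names are mine; a_{n+1} and P_{n+1} come from the last step of the filter):

```python
import numpy as np

def forecast_ll(a_next, P_next, k, s2e, s2h):
    """k-step forecasts given a_{n+1}, P_{n+1} from the last filter step."""
    f = np.empty(k); Pf = np.empty(k)
    a, P = a_next, P_next
    for j in range(k):
        f[j] = a                      # forecast of y_{n+1+j} stays a_{n+1}
        Pf[j] = P + s2e               # forecast error variance F
        P = P + s2h                   # v = 0, K = 0: variance grows
    return f, Pf
```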

Nile Data: forecasting

[Figures, 1900–2000: panels (i)–(iv) extend a_t, P_t, the forecasts and their error variance beyond the last observation.]

Parameters in Local Level Model

We recall the Local Level Model as

y_t = μ_t + ε_t,  ε_t ∼ NID(0, σ²_ε),
μ_{t+1} = μ_t + η_t,  η_t ∼ NID(0, σ²_η),
μ_1 ∼ N(a, P)

General framework

• The unknown μ_t's can be estimated by prediction, filtering and smoothing;
• The other parameters are given by the variances σ²_ε and σ²_η;
• We estimate these parameters by Maximum Likelihood;
• Parameters can be transformed: σ²_ε = exp(ψ_ε) and σ²_η = exp(ψ_η);
• Parameter vector ψ = (ψ_ε, ψ_η)′.

Parameter Estimation by ML

The parameters in any state space model can be collected in some vector ψ. When the model is linear and Gaussian, we can estimate ψ by Maximum Likelihood.

The loglikelihood of a time series is

log L = Σ_{t=1}^{n} log p(y_t | Y_{t−1}).

In the state space model, p(y_t | Y_{t−1}) is a Gaussian density with mean a_t and variance F_t:

log L = −(n/2) log 2π − (1/2) Σ_{t=1}^{n} ( log F_t + F_t⁻¹ v_t² ),

with v_t and F_t from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
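A minimal self-contained sketch of maximising this likelihood numerically with SciPy (the function name and the crude large-P_1 initialization are my assumptions; the exp(ψ) parametrisation is the one from the previous slide):

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(psi, y, a1=0.0, P1=1e7):
    """psi = (psi_eps, psi_eta); variances exp(psi) stay positive."""
    s2e, s2h = np.exp(psi)
    a, P, ll = a1, P1, 0.0
    for y_t in y:                     # Kalman filter pass
        v, F = y_t - a, P + s2e
        ll -= 0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        K = P / F
        a, P = a + K * v, P + s2h - K**2 * F
    return -ll

# Hypothetical usage for a series y:
# res = minimize(neg_loglik, np.zeros(2), args=(y,), method="Nelder-Mead")
# s2e_hat, s2h_hat = np.exp(res.x)
```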

Diagnostics

• Null hypothesis: the standardised residuals satisfy

v_t / √F_t ∼ NID(0, 1);

• Apply standard tests for Normality, heteroskedasticity and serial correlation;
• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
• Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
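A sketch of the standardised residuals and some of the standard checks, using scipy.stats (names are my own; v and F come from the Kalman filter):

```python
import numpy as np
from scipy import stats

def standardised_residuals(v, F):
    """e_t = v_t / sqrt(F_t); NID(0, 1) under the null."""
    return np.asarray(v) / np.sqrt(F)

# Hypothetical usage with v, F from the Kalman filter:
# e = standardised_residuals(v, F)
# stats.jarque_bera(e)                            # Normality test
# [float(np.corrcoef(e[k:], e[:-k])[0, 1]) for k in range(1, 11)]  # ACF
```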

Nile Data: diagnostics

[Figures: (i) standardised residuals, 1880–1960 (about −2 to 2); (ii) histogram with fitted density (−3 to 3); (iii) QQ-plot (−2 to 2); (iv) correlogram, lags 0–10 (−0.5 to 1).]

Three exercises

1. Consider the LL model (see slides, see DK Chapter 2).
• The reduced form is an ARIMA(0,1,1) process. Derive the relationship between the signal-to-noise ratio q of the LL model and the θ coefficient of the ARIMA model;
• Derive the reduced form in the case η_t = √q ε_t and notice the difference with the general case.
• Give the elements of the mean vector and variance matrix of y = (y_1, ..., y_n)′ when y_t is generated by a LL model.
• Show that the forecasts of the Kalman filter (in a steady state) are the same as those generated by the exponentially weighted moving average (EWMA) method of forecasting: ŷ_{t+1} = ŷ_t + λ(y_t − ŷ_t) for t = 1, ..., n. Derive the relationship between λ and the signal-to-noise ratio q.

Three exercises (cont.)

2. Derive a Kalman filter for the local level model

y_t = μ_t + ε_t,  ε_t ∼ N(0, σ²_ε),  Δμ_{t+1} = η_t ∼ N(0, σ²_η),

with E(ε_t η_t) = σ_εη ≠ 0 and E(ε_t η_s) = 0 for all s ≠ t. Also discuss the problem of missing observations in this case.

3. Write Ox program(s) that produce all Figures in Ch 2 of DK except Fig. 2.4. Data: http://www.ssfpack.com/dkbook.html

Selected references

• A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
• G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag.
• M. West & J. Harrison (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag.
• R.H. Shumway & D.S. Stoffer (2000). Time Series Analysis and Its Applications. Springer-Verlag.
• J. Durbin & S.J. Koopman (2011). Time Series Analysis by State Space Methods, 2nd edition. Oxford University Press.
• J. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press.