Introduction to Local Level Model and Kalman Filter
S.J. Koopman
http://staff.feweb.vu.nl/koopman
January 2011
U.K. Yearly Inflation: signal extraction

[Figure: yearly U.K. inflation, 1760–2000, with the extracted signal]
Dow Jones

[Figures: Dow Jones Industrial Average, monthly closes, 1930–2010; its log returns; its log squared log returns (LSqrLReturn)]
LogSqr LogReturns Dow Jones: signal extraction

[Figure: log squared log returns of the Dow Jones with extracted signal, 1940–2010]
Classical Decomposition
Basic Model
A basic model for representing a time series is the additive model

yt = µt + γt + εt,   t = 1, . . . , n,

also known as the Classical Decomposition.
yt = observation,
µt = slowly changing component (trend),
γt = periodic component (seasonal),
εt = irregular component (disturbance).
Unobserved Components Time Series Model
In a Structural Time Series Model (STSM) or Unobserved Components Model (UCM), the various components are modelled explicitly as stochastic processes.
Local Level Model
• Components can be deterministic functions of time (e.g. polynomials), or stochastic processes;
• Deterministic example: yt = µ + εt with εt ∼ NID(0, σ²ε).
• Stochastic example: the Random Walk plus Noise, or Local Level model:

yt = µt + εt,   εt ∼ NID(0, σ²ε),
µt+1 = µt + ηt,   ηt ∼ NID(0, σ²η);

• The disturbances εt, ηs are independent for all s, t;
• The model is incomplete without a specification for µ1 (note the non-stationarity):

µ1 ∼ N(a, P)
Local Level Model
yt = µt + εt,   εt ∼ NID(0, σ²ε),
µt+1 = µt + ηt,   ηt ∼ NID(0, σ²η),
µ1 ∼ N(a, P)

General framework
• The level µt and the irregular εt are unobservables;
• Parameters: σ²ε and σ²η;
• Trivial special cases:
  • σ²η = 0 ⟹ yt ∼ NID(µ1, σ²ε) (WN with constant level);
  • σ²ε = 0 ⟹ yt+1 = yt + ηt (pure RW);
• The Local Level model is a model representation for EWMA forecasting.
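The two model equations translate directly into a simulation; a minimal NumPy sketch (the function name, seed and defaults are my choices, not from the slides):

```python
import numpy as np

def simulate_local_level(n, sigma2_eps, sigma2_eta, mu1=0.0, seed=0):
    """Draw y_t = mu_t + eps_t with mu_{t+1} = mu_t + eta_t (random walk level)."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, np.sqrt(sigma2_eta), n)
    # level path mu_1, ..., mu_n: cumulative sum of level disturbances
    mu = mu1 + np.concatenate(([0.0], np.cumsum(eta[:-1])))
    y = mu + rng.normal(0.0, np.sqrt(sigma2_eps), n)
    return y, mu

# one draw for each signal-to-noise setting shown on the next slides
y_smooth, mu_smooth = simulate_local_level(100, sigma2_eps=1.0, sigma2_eta=0.1)
y_noisy, mu_noisy = simulate_local_level(100, sigma2_eps=0.1, sigma2_eta=1.0)
```

Varying σ²η/σ²ε changes how closely the observations track the level, which is exactly what the simulated-data slides below illustrate.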
Simulated LL Data

[Figures: simulated series y and level µ for t = 1, . . . , 100 under three settings: σ²ε = 0.1, σ²η = 1; σ²ε = 1, σ²η = 1; σ²ε = 1, σ²η = 0.1]
Local Level Model
yt = µt + εt,   εt ∼ NID(0, σ²ε),
µt+1 = µt + ηt,   ηt ∼ NID(0, σ²η),

Its properties
• First difference is stationary:

∆yt = ∆µt + ∆εt = ηt−1 + εt − εt−1.

• Dynamic properties of ∆yt:

E(∆yt) = 0,
γ0 = E(∆yt ∆yt) = σ²η + 2σ²ε,
γ1 = E(∆yt ∆yt−1) = −σ²ε,
γτ = E(∆yt ∆yt−τ) = 0 for τ ≥ 2.
Properties of the LL model
• The ACF of ∆yt is

ρ1 = −σ²ε / (σ²η + 2σ²ε) = −1 / (q + 2),   q = σ²η / σ²ε,
ρτ = 0,   τ ≥ 2.

• q is called the signal-to-noise ratio;
• The model for ∆yt is MA(1) with restricted parameters such that

−1/2 ≤ ρ1 ≤ 0,

i.e., yt is ARIMA(0,1,1);
• Write ∆yt = ξt + θξt−1, ξt ∼ NID(0, σ²), and solve for θ:

θ = (√(q² + 4q) − 2 − q) / 2.
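The mapping from q to θ is easy to check numerically; a small sketch (helper names are mine):

```python
import numpy as np

def theta_from_q(q):
    """MA(1) coefficient of Delta y_t implied by the signal-to-noise ratio q."""
    return 0.5 * (np.sqrt(q * q + 4.0 * q) - 2.0 - q)

def rho1_ma1(theta):
    """First autocorrelation of an MA(1) process with coefficient theta."""
    return theta / (1.0 + theta * theta)
```

For q = 0 (no level variation) θ = −1 and ρ1 = −1/2; as q grows, θ rises towards 0, and rho1_ma1(theta_from_q(q)) reproduces −1/(q + 2).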
Local Level Model
• The model parameters are estimated by Maximum Likelihood;
• Advantages of the model-based approach: assumptions can be tested, parameters are estimated...;
• The model with estimated parameters is used for the signal extraction of components;
• The estimated level µt is effectively a locally weighted average of the data;
• The distribution of weights can be compared with kernel functions in nonparametric regressions;
• On the basis of the model, the methods yield minimum mean square error (MMSE) forecasts and the associated confidence intervals.
Nile Data: decomposition

[Figures: Nile level with +/− 2SE band, 1870–1970; Nile irregular component, 1870–1970]
Nile Data: decomposition weights

[Figures: Nile level estimate, 1870–1970; weights of the level estimate over leads and lags −20 to 20]
Nile Data: forecasts

[Figure: Nile data with forecasts and +/− SE band, 1870–1980]
Kalman Filter
• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
• The state is Gaussian: the complete distribution is characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1, which we assumed given.
• There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here. See discussion in DK book, Chapter 2.
Kalman Filter
The unobserved variable µt can be estimated from the observations with the Kalman filter:

vt = yt − at,
Ft = Pt + σ²ε,
Kt = Pt Ft⁻¹,
at+1 = at + Kt vt,
Pt+1 = Pt + σ²η − Kt² Ft,

for t = 1, . . . , n, starting with given values for a1 and P1.

• Writing Yt = {y1, . . . , yt}, define

at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt).
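The five recursions above translate directly into code; a minimal NumPy sketch (the function name and the large default P1, mimicking a diffuse initialisation, are my choices):

```python
import numpy as np

def kalman_filter_ll(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Kalman filter for the local level model.

    Returns predicted state means a_1..a_{n+1}, their variances P,
    the prediction errors v and their variances F."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    v = np.empty(n); F = np.empty(n)
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - a[t]                  # prediction error
        F[t] = P[t] + sigma2_eps            # its variance
        K = P[t] / F[t]                     # Kalman gain
        a[t + 1] = a[t] + K * v[t]          # state mean update
        P[t + 1] = P[t] + sigma2_eta - K * K * F[t]  # state variance update
    return a, P, v, F
```

With σ²η = 0 the level is constant, so the filtered mean converges to the sample mean and P shrinks towards zero.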
Kalman Filter
Local level model: µt+1 = µt + ηt,  yt = µt + εt.
• Writing Yt = {y1, . . . , yt}, define

at+1 = E(µt+1 | Yt),   Pt+1 = var(µt+1 | Yt);

• The prediction error is

vt = yt − E(yt | Yt−1)
   = yt − E(µt + εt | Yt−1)
   = yt − E(µt | Yt−1)
   = yt − at;

• It follows that vt = (µt − at) + εt and E(vt) = 0;
• The prediction error variance is Ft = var(vt) = Pt + σ²ε.
Regression theory
The proof of the Kalman filter uses lemmas from multivariate Normal regression theory.

Lemma 1. Suppose x, y are jointly Normally distributed vectors. Then

E(x | y) = E(x) + Σxy Σ⁻¹yy y,
var(x | y) = Σxx − Σxy Σ⁻¹yy Σ'xy.

Lemma 2. Suppose x, y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then

E(x | y, z) = E(x | y) + Σxz Σ⁻¹zz z,
var(x | y, z) = var(x | y) − Σxz Σ⁻¹zz Σ'xz.
Derivation Kalman Filter
Local level model: µt+1 = µt + ηt,  yt = µt + εt.
• We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vt yt−j) = 0 for j = 1, . . . , t − 1;
• The lemma is E(x | y, z) = E(x | y) + Σxz Σ⁻¹zz z. In our case, take x = µt+1, y = Yt−1 and z = vt = (µt − at) + εt;
• E(x | y) implies that E(µt+1 | Yt−1) = E(µt | Yt−1) + E(ηt | Yt−1) = at;
• Further, Σxz provides the expression E(µt+1 vt) = E(µt vt) + E(ηt vt) = E[(µt − at)(yt − at)] + E(ηt vt) = E[(µt − at)(µt − at)] + E[(µt − at)εt] + E(ηt vt) = Pt;
• Since Σzz = Ft, we can apply the lemma and obtain the state update

at+1 = E(µt+1 | Yt−1, yt)
     = at + Pt Ft⁻¹ vt
     = at + Kt vt,   with Kt = Pt Ft⁻¹.
Kalman Filter Derived
• Our best prediction of yt based on its past is at. When the actual observation arrives, calculate the prediction error vt = yt − at and its variance Ft = Pt + σ²ε.
• The best estimate of the state mean for the next period is based on both the current estimate at and the new information vt:

at+1 = at + Kt vt,

similarly for the variance:

Pt+1 = Pt + σ²η − Kt Ft K't.

• The Kalman gain Kt = Pt Ft⁻¹ is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).
Kalman filter for Nile Data: (i) at, (ii) Pt, (iii) vt and (iv) Ft.

[Figure: four panels over 1880–1960 showing at, Pt, vt and Ft for the Nile data]
Steady State Kalman Filter
The state variance Pt in the Kalman filter converges to a positive value, say Pt → P. We would then have

Ft → P + σ²ε,   Kt → P / (P + σ²ε).

The state prediction variance updating leads to

P = P (1 − P / (P + σ²ε)) + σ²η,

which reduces to the quadratic

x² − xq − q = 0,

where x = P/σ²ε and q = σ²η/σ²ε, with solution

P = σ²ε (q + √(q² + 4q)) / 2.
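The closed-form solution of the quadratic gives the steady-state quantities directly; a small sketch (function name mine):

```python
import numpy as np

def steady_state(sigma2_eps, sigma2_eta):
    """Steady-state prediction variance P and gain K of the local level filter."""
    q = sigma2_eta / sigma2_eps
    x = 0.5 * (q + np.sqrt(q * q + 4.0 * q))  # positive root of x^2 - xq - q = 0
    P = sigma2_eps * x
    K = P / (P + sigma2_eps)
    return P, K
```

For σ²ε = σ²η = 1 (q = 1) this gives P = (1 + √5)/2, the golden ratio, and one can verify that P is a fixed point of the variance recursion.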
Smoothing
• The filter calculates the mean and variance conditional on Yt;
• The Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back until the first.

µ̂t = E(µt | Yn),   Vt = var(µt | Yn),
rt = weighted sum of future innovations,   Nt = var(rt),
Lt = 1 − Kt.

Starting with rn = 0, Nn = 0, the smoothing recursions are given by

rt−1 = Ft⁻¹ vt + Lt rt,   Nt−1 = Ft⁻¹ + Lt² Nt,
µ̂t = at + Pt rt−1,   Vt = Pt − Pt² Nt−1.
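The forward filter followed by the backward recursions can be sketched as follows (function name and near-diffuse default P1 are my choices):

```python
import numpy as np

def kalman_smoother_ll(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Kalman filter followed by the backward smoothing recursions.

    Returns the smoothed level mu_hat and smoothed variances V."""
    n = len(y)
    a = np.empty(n); P = np.empty(n)
    v = np.empty(n); F = np.empty(n); K = np.empty(n)
    at, Pt = a1, P1
    for t in range(n):                       # forward pass (filter)
        a[t], P[t] = at, Pt
        v[t] = y[t] - at
        F[t] = Pt + sigma2_eps
        K[t] = Pt / F[t]
        at = at + K[t] * v[t]
        Pt = Pt + sigma2_eta - K[t] ** 2 * F[t]
    mu_hat = np.empty(n); V = np.empty(n)
    r, N = 0.0, 0.0                          # r_n = 0, N_n = 0
    for t in range(n - 1, -1, -1):           # backward pass (smoother)
        L = 1.0 - K[t]
        r = v[t] / F[t] + L * r              # r_{t-1}
        N = 1.0 / F[t] + L * L * N           # N_{t-1}
        mu_hat[t] = a[t] + P[t] * r
        V[t] = P[t] - P[t] ** 2 * N
    return mu_hat, V
```

On a constant series the smoothed level reproduces the data and the smoothed variances are smaller than the one-step prediction variances, as expected from conditioning on all of Yn.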
Kalman smoothing for Nile Data: (i) µ̂t, (ii) Vt, (iii) rt and (iv) Nt.

[Figure: four panels over 1880–1960 showing µ̂t, Vt, rt and Nt for the Nile data]
Missing Observations
Missing observations are very easy to handle in Kalman filtering:
• suppose yj is missing;
• put vj = 0, Kj = 0 and Fj = ∞ in the algorithm;
• proceed with further calculations as normal.

The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
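In code, setting vj = 0 and Kj = 0 simply means skipping the update step; a sketch where NaN entries mark missing values (names and the NaN convention are mine):

```python
import numpy as np

def kalman_filter_missing(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """Local level Kalman filter where NaN entries in y mark missing values."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):                  # missing: v_t = 0, K_t = 0
            a[t + 1] = a[t]                 # pure extrapolation via state eq.
            P[t + 1] = P[t] + sigma2_eta    # uncertainty grows by sigma2_eta
        else:
            F = P[t] + sigma2_eps
            K = P[t] / F
            a[t + 1] = a[t] + K * (y[t] - a[t])
            P[t + 1] = P[t] + sigma2_eta - K * K * F
    return a, P
```

During a gap the state estimate stays flat while its variance grows linearly, which is the filter-side extrapolation described above.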
Nile Data with missing observations: (i) at, (ii) Pt, (iii) µ̂t and (iv) Vt.

[Figure: four panels over 1880–1960, with flat stretches and growing variances where observations are missing]
Forecasting
Forecasting requires no extra theory: just treat future observations as missing:
• put vj = 0, Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k;
• proceed with further calculations as normal;
• the forecast for yj is aj.
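Since future observations are treated as missing, forecasting reduces to running the filter and then extrapolating; a sketch (function name and defaults are mine):

```python
import numpy as np

def forecast_ll(y, k, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7):
    """k-step-ahead forecasts of y: filter the sample, then extrapolate,
    which is equivalent to treating y_{n+1}, ..., y_{n+k} as missing."""
    at, Pt = a1, P1
    for obs in y:                                # Kalman filter pass
        F = Pt + sigma2_eps
        K = Pt / F
        at = at + K * (obs - at)
        Pt = Pt + sigma2_eta - K * K * F
    fc = np.full(k, at)                          # forecast of y_{n+j} is a_{n+1}
    # forecast error variance F_{n+j} = P_{n+1} + (j-1) sigma2_eta + sigma2_eps
    fvar = Pt + np.arange(k) * sigma2_eta + sigma2_eps
    return fc, fvar
```

The point forecasts are flat at the last filtered level, while the forecast variances grow with the horizon, matching the widening bands in the Nile forecast figure.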
Nile Data: forecasting

[Figure: four panels over 1900–2000 showing the Nile data with the forecast level and the growing forecast variances]
Parameters in Local Level Model
We recall the Local Level Model as

yt = µt + εt,   εt ∼ NID(0, σ²ε),
µt+1 = µt + ηt,   ηt ∼ NID(0, σ²η),
µ1 ∼ N(a, P)

General framework
• The unknown µt's can be estimated by prediction, filtering and smoothing;
• The other parameters are given by the variances σ²ε and σ²η;
• We estimate these parameters by Maximum Likelihood;
• Parameters can be transformed: σ²ε = exp(ψε) and σ²η = exp(ψη);
• Parameter vector ψ = (ψε, ψη)'.
Parameter Estimation by ML
The parameters in any state space model can be collected in some vector ψ. When the model is linear and Gaussian, we can estimate ψ by Maximum Likelihood.

The loglikelihood of a time series is

log L = Σ_{t=1}^{n} log p(yt | Yt−1).

In the state space model, p(yt | Yt−1) is a Gaussian density with mean at and variance Ft:

log L = −(n/2) log 2π − (1/2) Σ_{t=1}^{n} (log Ft + vt² / Ft),

with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
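The prediction error decomposition is a single filter pass; a sketch using a crude grid search over ψ in place of a proper numerical optimiser (data, seed and names are illustrative assumptions):

```python
import numpy as np

def neg_loglik_ll(psi, y, a1=0.0, P1=1e7):
    """Negative loglikelihood of the local level model via the prediction
    error decomposition; psi = (log sigma2_eps, log sigma2_eta)."""
    s2e, s2h = np.exp(psi[0]), np.exp(psi[1])
    at, Pt, ll = a1, P1, 0.0
    for obs in y:
        v = obs - at
        F = Pt + s2e
        ll -= 0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
        K = Pt / F
        at = at + K * v
        Pt = Pt + s2h - K * K * F
    return -ll

# simulate data with sigma2_eps = sigma2_eta = 1, then grid-search psi
# (in practice one would maximise log L with a quasi-Newton optimiser)
rng = np.random.default_rng(1)
mu = np.cumsum(rng.normal(0.0, 1.0, 200))
y = mu + rng.normal(0.0, 1.0, 200)
grid = np.linspace(-3.0, 3.0, 25)
psi_hat = min(((pe, ph) for pe in grid for ph in grid),
              key=lambda p: neg_loglik_ll(p, y))
```

The log-variance parameterisation from the previous slide keeps both variances positive without constrained optimisation.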
Diagnostics
• Null hypothesis: standardised residuals

vt / √Ft ∼ NID(0, 1);

• Apply standard tests for Normality, heteroskedasticity, serial correlation;
• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
• Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
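The standardised residuals come straight out of the filter pass; a sketch (names, the NaN-free setup and the choice to drop the first near-diffuse residual are mine):

```python
import numpy as np

def standardised_residuals(y, sigma2_eps, sigma2_eta, a1=0.0, P1=1e7, burn=1):
    """One-step prediction residuals v_t / sqrt(F_t), NID(0, 1) under the
    null that the model and its parameters are correct; the first `burn`
    residuals are dropped because of the near-diffuse initialisation."""
    at, Pt, e = a1, P1, []
    for obs in y:
        v = obs - at
        F = Pt + sigma2_eps
        e.append(v / np.sqrt(F))
        K = Pt / F
        at = at + K * v
        Pt = Pt + sigma2_eta - K * K * F
    return np.array(e[burn:])

# under the true model the residuals should look like Gaussian white noise
rng = np.random.default_rng(0)
mu = np.cumsum(rng.normal(0.0, 1.0, 500))
y = mu + rng.normal(0.0, 1.0, 500)
e = standardised_residuals(y, 1.0, 1.0)
```

Standard Normality, heteroskedasticity and serial-correlation tests are then applied to e.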
Nile Data: diagnostics

[Figure: four diagnostic panels for the standardised Nile residuals: (i) time plot, (ii) density estimate, (iii) QQ plot, (iv) correlogram]
Three exercises
1. Consider the LL model (see slides, see DK Chapter 2).
• The reduced form is an ARIMA(0,1,1) process. Derive the relationship between the signal-to-noise ratio q of the LL model and the θ coefficient of the ARIMA model;
• Derive the reduced form in the case ηt = √q εt and notice the difference with the general case.
• Give the elements of the mean vector and variance matrix of y = (y1, . . . , yn)' when yt is generated by a LL model.
• Show that the forecasts of the Kalman filter (in a steady state) are the same as those generated by the exponentially weighted moving average (EWMA) method of forecasting: ŷt+1 = ŷt + λ(yt − ŷt) for t = 1, . . . , n. Derive the relationship between λ and the signal-to-noise ratio q.
Three exercises (cont.)
2. Derive a Kalman filter for the local level model

yt = µt + εt,   εt ∼ N(0, σ²ε),   ∆µt+1 = ηt ∼ N(0, σ²η),

with E(εt ηt) = σεη ≠ 0 and E(εt ηs) = 0 for all t, s with t ≠ s. Also discuss the problem of missing observations in this case.

3. Write Ox program(s) that produce all Figures in Ch 2 of DK except Fig. 2.4. Data: http://www.ssfpack.com/dkbook.html
Selected references
• A.C. Harvey (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
• G. Kitagawa & W. Gersch (1996). Smoothness Priors Analysis of Time Series. Springer-Verlag.
• J. Harrison & M. West (1997). Bayesian Forecasting and Dynamic Models. Springer-Verlag.
• R.H. Shumway & D.S. Stoffer (2000). Time Series Analysis and its Applications. Springer-Verlag.
• J. Durbin & S.J. Koopman (2011). Time Series Analysis by State Space Methods. Second Edition, Oxford University Press.
• J. Commandeur & S.J. Koopman (2007). An Introduction to State Space Time Series Analysis. Oxford University Press.