Transcript of Paris2012 session 2
State space methods for signal extraction, parameter estimation and forecasting
Siem Jan Koopman, http://personal.vu.nl/s.j.koopman
Department of Econometrics, VU University Amsterdam
Tinbergen Institute, 2012
Regression Model in time series
For a time series of a dependent variable yt and a set of regressors Xt , we can consider the regression model
yt = µ + Xtβ + εt , εt ∼ NID(0, σ2), t = 1, . . . , n.
For a large n, is it reasonable to assume constant coefficients µ, β and σ2 ?
If we have strong reasons to believe that µ and β are time-varying, the model may be written as
yt = µt + Xtβt + εt , µt+1 = µt + ηt , βt+1 = βt + ξt .
This is an example of a linear state space model.
State Space Model
The linear Gaussian state space model is defined in three parts:
→ State equation:
αt+1 = Ttαt + Rtζt , ζt ∼ NID(0,Qt),
→ Observation equation:
yt = Ztαt + εt , εt ∼ NID(0,Ht),
→ Initial state distribution α1 ∼ N (a1,P1).
Notice that
• ζt and εs are independent for all t, s, and independent of α1;
• observation yt can be multivariate;
• state vector αt is unobserved;
• matrices Tt , Zt , Rt , Qt , Ht determine the structure of the model.
State Space Model
• the state space model is linear and Gaussian: therefore properties and results of the multivariate normal distribution apply;
• state vector αt evolves as a VAR(1) process;
• system matrices usually contain unknown parameters;
• estimation therefore has two aspects:
  • measuring the unobservable state (prediction, filtering and smoothing);
  • estimation of unknown parameters (maximum likelihood estimation);
• state space methods offer a unified approach to a wide range of models and techniques: dynamic regression, ARIMA, UC models, latent variable models, spline-fitting and many ad-hoc filters;
• next, some well-known model specifications in state space form ...
Regression with time-varying coefficients
General state space model:
αt+1 = Ttαt + Rtζt , ζt ∼ NID(0,Qt),
yt = Ztαt + εt , εt ∼ NID(0,Ht).
Put regressors in Zt (that is Zt = Xt),
Tt = I , Rt = I ,
The result is the regression model yt = Xtβt + εt with time-varying coefficient βt = αt following a random walk.
In case Qt = 0, the time-varying regression model reduces to the standard linear regression model yt = Xtβ + εt with β = α1.
Many dynamic specifications for the regression coefficient can be considered, see below.
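As a concrete illustration (a sketch, not from the slides; the variances, sample size and seed below are all assumed), the general Kalman recursions with Zt = Xt and Tt = Rt = I filter random-walk regression coefficients from simulated data:

```python
import numpy as np

# Time-varying coefficient regression y_t = X_t beta_t + eps_t with
# beta_{t+1} = beta_t + xi_t, filtered by the general (matrix) Kalman
# recursions. All parameter values are assumed for the example.
rng = np.random.default_rng(6)
n, k = 100, 2
X = rng.normal(size=(n, k))
beta = np.cumsum(rng.normal(0, 0.1, size=(n, k)), axis=0)  # random-walk coefficients
y = np.einsum('tk,tk->t', X, beta) + rng.normal(0, 0.5, n)

H = 0.25                               # sigma^2_eps (assumed)
Q = 0.01 * np.eye(k)                   # Var(xi_t)  (assumed)
a = np.zeros(k)                        # a_1
P = 10.0 * np.eye(k)                   # P_1
a_hist = np.empty((n, k))
for t in range(n):
    Z = X[t:t+1]                       # 1 x k row, Z_t = X_t
    v = y[t] - (Z @ a)[0]              # v_t = y_t - Z_t a_t
    F = (Z @ P @ Z.T)[0, 0] + H        # F_t = Z_t P_t Z_t' + H_t
    K = (P @ Z.T / F).ravel()          # with T_t = I, K_t = P_t Z_t' F_t^{-1}
    a = a + K * v                      # a_{t+1} = a_t + K_t v_t
    P = P - np.outer(K, (Z @ P).ravel()) + Q   # P_{t+1} = P_t + Q_t - K_t F_t K_t'
    a_hist[t] = a
```

Since Tt = I here, the gain reduces to Kt = PtXt′Ft⁻¹, and the filtered path a_hist tracks the simulated βt.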
AR(1) process in State Space Form
Consider AR(1) model yt+1 = φyt + ζt , in state space form:
αt+1 = Ttαt + Rtζt , ζt ∼ NID(0,Qt),
yt = Ztαt + εt , εt ∼ NID(0,Ht).
with state vector αt and system matrices
Zt = 1, Ht = 0,
Tt = φ, Rt = 1, Qt = σ2
we have
• Zt = 1 and Ht = 0 imply that αt = yt ;
• The state equation implies yt+1 = φyt + ζt with ζt ∼ NID(0, σ2);
• This is the AR(1) model !
AR(2) in State Space Form
Consider the AR(2) model yt+1 = φ1yt + φ2yt−1 + ζt . In state space form:
αt+1 = Ttαt + Rtζt , ζt ∼ NID(0,Qt),
yt = Ztαt + εt , εt ∼ NID(0,Ht).
with 2 × 1 state vector αt and system matrices
Zt = [1 0], Ht = 0,
Tt = [φ1 1; φ2 0], Rt = [1; 0], Qt = σ2,
we have
• Zt and Ht = 0 imply that α1t = yt ;
• First state equation implies yt+1 = φ1yt + α2t + ζt with ζt ∼ NID(0, σ2);
• Second state equation implies α2,t+1 = φ2yt ;
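A quick numerical check of this representation (a sketch; the coefficients φ1 = 0.5, φ2 = 0.3 and σ = 1 are assumed): iterating the state recursion with these system matrices reproduces the direct AR(2) recursion driven by the same shocks.

```python
import numpy as np

# AR(2) y_{t+1} = phi1 y_t + phi2 y_{t-1} + zeta_t in state space form.
phi1, phi2, sigma = 0.5, 0.3, 1.0      # assumed coefficients
Z = np.array([[1.0, 0.0]])             # Z_t = [1 0]
T = np.array([[phi1, 1.0],             # T_t = [phi1 1; phi2 0]
              [phi2, 0.0]])
R = np.array([[1.0], [0.0]])           # R_t = [1; 0]

rng = np.random.default_rng(1)
n = 50
zeta = rng.normal(0, sigma, n)

# simulate via the state recursion alpha_{t+1} = T alpha_t + R zeta_t, y_t = Z alpha_t
alpha = np.zeros(2)                    # alpha_1 = 0
y_state = np.empty(n)
for t in range(n):
    y_state[t] = (Z @ alpha)[0]
    alpha = T @ alpha + R.ravel() * zeta[t]

# simulate the AR(2) recursion directly with the same shocks
y_dir = np.zeros(n)
for t in range(1, n):
    y_dir[t] = phi1 * y_dir[t - 1] \
             + (phi2 * y_dir[t - 2] if t >= 2 else 0.0) \
             + zeta[t - 1]
```

The two simulated paths coincide exactly, confirming that the state space form encodes the AR(2) dynamics.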
MA(1) in State Space Form
Consider the MA(1) model yt+1 = ζt + θζt−1. In state space form:
αt+1 = Ttαt + Rtζt , ζt ∼ NID(0,Qt),
yt = Ztαt + εt , εt ∼ NID(0,Ht).
with 2 × 1 state vector αt and system matrices
Zt = [1 0], Ht = 0,
Tt = [0 1; 0 0], Rt = [1; θ], Qt = σ2,
we have
• Zt and Ht = 0 imply that α1t = yt ;
• First state equation implies yt+1 = α2t + ζt with ζt ∼ NID(0, σ2);
• Second state equation implies α2,t+1 = θζt ;
General ARMA in State Space Form
Consider ARMA(2,1) model
yt = φ1yt−1 + φ2yt−2 + ζt + θζt−1
in state space form
αt = [yt ; φ2yt−1 + θζt ],
Zt = [1 0], Ht = 0,
Tt = [φ1 1; φ2 0], Rt = [1; θ], Qt = σ2.
All ARIMA(p, d, q) models have a (non-unique) state space representation.
UC models in State Space Form
State space model: αt+1 = Ttαt + Rtζt , yt = Ztαt + εt .
LL model µt+1 = µt + ηt and yt = µt + εt :
αt = µt , Tt = 1, Rt = 1, Qt = σ2η ,
Zt = 1, Ht = σ2ε .
LLT model µt+1 = µt + βt + ηt , βt+1 = βt + ξt and yt = µt + εt :
αt = [µt ; βt ], Tt = [1 1; 0 1], Rt = [1 0; 0 1], Qt = [σ2η 0; 0 σ2ξ ],
Zt = [1 0], Ht = σ2ε .
UC models in State Space Form
State space model: αt+1 = Ttαt + Rtζt , yt = Ztαt + εt .
LLT model with season: µt+1 = µt + βt + ηt , βt+1 = βt + ξt , S(L)γt+1 = ωt and yt = µt + γt + εt :
αt = [µt βt γt γt−1 γt−2]′,
Tt = [1 1 0 0 0; 0 1 0 0 0; 0 0 −1 −1 −1; 0 0 1 0 0; 0 0 0 1 0],
Rt = [1 0 0; 0 1 0; 0 0 1; 0 0 0; 0 0 0],
Qt = [σ2η 0 0; 0 σ2ξ 0; 0 0 σ2ω],
Zt = [1 0 1 0 0], Ht = σ2ε .
Kalman Filter
• The Kalman filter calculates the mean and variance of the unobserved state, given the observations.
• The state is Gaussian: the complete distribution is characterized by the mean and variance.
• The filter is a recursive algorithm; the current best estimate is updated whenever a new observation is obtained.
• To start the recursion, we need a1 and P1, which we assume given.
• There are various ways to initialize when a1 and P1 are unknown, which we will not discuss here.
Kalman Filter
The unobserved state αt can be estimated from the observations with the Kalman filter:
vt = yt − Ztat ,
Ft = ZtPtZt′ + Ht ,
Kt = TtPtZt′Ft⁻¹ ,
at+1 = Ttat + Ktvt ,
Pt+1 = TtPtTt′ + RtQtRt′ − KtFtKt′ ,
for t = 1, . . . , n, starting with given values for a1 and P1.
• Writing Yt = {y1, . . . , yt},
at+1 = E(αt+1|Yt), Pt+1 = Var(αt+1|Yt).
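For the local level model (Zt = Tt = Rt = 1) the recursions reduce to scalars. A minimal sketch, with assumed variances σ²ε = 1, σ²η = 0.5, simulated data, and a large P1 as a rough diffuse initialisation:

```python
import numpy as np

# Scalar Kalman filter for the local level model
# y_t = mu_t + eps_t,  mu_{t+1} = mu_t + eta_t  (all parameters assumed).
rng = np.random.default_rng(0)
n = 100
sigma2_eps, sigma2_eta = 1.0, 0.5
mu = np.cumsum(rng.normal(0, np.sqrt(sigma2_eta), n))
y = mu + rng.normal(0, np.sqrt(sigma2_eps), n)

a, P = 0.0, 1e7                       # rough diffuse initialisation a_1, P_1
a_pred = np.empty(n); P_pred = np.empty(n)
v = np.empty(n); F = np.empty(n)
for t in range(n):
    a_pred[t], P_pred[t] = a, P
    v[t] = y[t] - a                   # v_t = y_t - Z_t a_t
    F[t] = P + sigma2_eps             # F_t = Z_t P_t Z_t' + H_t
    K = P / F[t]                      # K_t = T_t P_t Z_t' F_t^{-1}
    a = a + K * v[t]                  # a_{t+1} = T_t a_t + K_t v_t
    P = P * (1 - K) + sigma2_eta      # P_{t+1} = P_t - K_t F_t K_t' + Q_t
```

With these variances Pt converges to the steady-state solution of P = P/(P + 1) + 0.5, which is P = 1 (so Ft → 2 and Kt → 0.5).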
Lemma I : multivariate Normal regression
Suppose x and y are jointly Normally distributed vectors with means µx and µy , variance matrices Σxx and Σyy , and covariance matrix Σxy . Then
E(x |y) = µx + ΣxyΣyy⁻¹(y − µy ),
Var(x |y) = Σxx − ΣxyΣyy⁻¹Σxy′ .
Define e = x − E(x |y). Then
Var(e) = Var([x − µx ] − ΣxyΣyy⁻¹[y − µy ])
       = Σxx − ΣxyΣyy⁻¹Σxy′
       = Var(x |y),
and
Cov(e, y) = Cov([x − µx ] − ΣxyΣyy⁻¹[y − µy ], y)
          = Σxy − Σxy = 0.
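A numerical sanity check of the lemma (a sketch; the scalar moments below are assumed): the residual e = x − E(x|y) is uncorrelated with y, and its variance matches Σxx − ΣxyΣyy⁻¹Σxy′.

```python
import numpy as np

# Simulate a large bivariate Normal sample and verify the two claims of
# Lemma I empirically. All moments are assumed for the example.
rng = np.random.default_rng(7)
n = 200_000
Sxx, Sxy, Syy = 2.0, 0.8, 1.5
mu_x, mu_y = 1.0, -0.5
cov = np.array([[Sxx, Sxy], [Sxy, Syy]])
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=n)
x, y = xy[:, 0], xy[:, 1]

x_hat = mu_x + Sxy / Syy * (y - mu_y)   # E(x|y) from Lemma I
e = x - x_hat                           # regression residual
cond_var = Sxx - Sxy**2 / Syy           # Var(x|y) = Sxx - Sxy Syy^{-1} Sxy'
```

In a large sample the empirical covariance of e with y is close to 0 and the empirical variance of e is close to cond_var, as the lemma states.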
Lemma II : an extension
The proof of the Kalman filter uses a slight extension of the lemma for multivariate Normal regression theory.
Suppose x , y and z are jointly Normally distributed vectors with E(z) = 0 and Σyz = 0. Then
E(x |y , z) = E(x |y) + ΣxzΣzz⁻¹z ,
Var(x |y , z) = Var(x |y) − ΣxzΣzz⁻¹Σxz′ .
Can you give a proof of Lemma II ? Hint: apply Lemma I for x and y∗ = (y ′, z ′)′.
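A proof sketch, filling in the hint (apply Lemma I to x and y∗ = (y′, z′)′, using Σyz = 0 and E(z) = 0):

```latex
% Since \Sigma_{yz} = 0, the variance of y^* = (y', z')' is block diagonal:
\Sigma_{y^*y^*} = \begin{pmatrix} \Sigma_{yy} & 0 \\ 0 & \Sigma_{zz} \end{pmatrix},
\qquad
\Sigma_{xy^*} = \begin{pmatrix} \Sigma_{xy} & \Sigma_{xz} \end{pmatrix}.
% Lemma I applied to (x, y^*), using E(z) = 0:
E(x \mid y, z) = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y)
                       + \Sigma_{xz}\Sigma_{zz}^{-1} z
               = E(x \mid y) + \Sigma_{xz}\Sigma_{zz}^{-1} z,
% and for the variance:
\mathrm{Var}(x \mid y, z) = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{xy}'
                                        - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{xz}'
                          = \mathrm{Var}(x \mid y) - \Sigma_{xz}\Sigma_{zz}^{-1}\Sigma_{xz}'.
```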
Kalman Filter : derivation
State space model: αt+1 = Ttαt + Rtζt , yt = Ztαt + εt .
• Writing Yt = {y1, . . . , yt}, define
at+1 = E(αt+1|Yt), Pt+1 = Var(αt+1|Yt);
• The prediction error is
vt = yt − E(yt |Yt−1)
   = yt − E(Ztαt + εt |Yt−1)
   = yt − Zt E(αt |Yt−1)
   = yt − Ztat ;
• It follows that vt = Zt(αt − at) + εt and E(vt) = 0;
• The prediction error variance is Ft = Var(vt) = ZtPtZt′ + Ht .
Kalman Filter : derivation
State space model: αt+1 = Ttαt + Rtζt , yt = Ztαt + εt .
• We have Yt = {Yt−1, yt} = {Yt−1, vt} and E(vtyt−j′) = 0 for j = 1, . . . , t − 1;
• Apply the Lemma E(x |y , z) = E(x |y) + ΣxzΣzz⁻¹z with x = αt+1, y = Yt−1 and z = vt = Zt(αt − at) + εt ;
• It follows that E(αt+1|Yt−1) = Ttat ;
• Further, E(αt+1vt′) = Tt E(αtvt′) + Rt E(ζtvt′) = TtPtZt′ ;
• Carrying out the Lemma, we obtain the state update
at+1 = E(αt+1|Yt−1, yt)
     = Ttat + TtPtZt′Ft⁻¹vt
     = Ttat + Ktvt ,
with Kt = TtPtZt′Ft⁻¹.
Kalman Filter : derivation
• Our best prediction of yt is Ztat . When the actual observation arrives, we calculate the prediction error vt = yt − Ztat and its variance Ft = ZtPtZt′ + Ht . The new best estimate of the state mean is based on both the old estimate at and the new information vt :
at+1 = Ttat + Ktvt ,
and similarly for the variance:
Pt+1 = TtPtTt′ + RtQtRt′ − KtFtKt′ .
Can you derive the updating equation for Pt+1 ?
• The Kalman gain
Kt = TtPtZt′Ft⁻¹
is the optimal weighting matrix for the new evidence.
• You should be able to replicate the proof of the Kalman filter for the Local Level Model (DK, Chapter 2).
Kalman Filter Illustration
[Figure: four panels over 1880–1960 — observations with filtered level a_t; state variance P_t; prediction error v_t; prediction error variance F_t.]
Prediction and Filtering
This version of the Kalman filter computes the predictions of the states directly:
at+1 = E(αt+1|Yt), Pt+1 = Var(αt+1|Yt).
The filtered estimates are defined by
at|t = E(αt |Yt), Pt|t = Var(αt |Yt),
for t = 1, . . . , n.
The Kalman filter can be “re-ordered” to compute both:
at|t = at + Mtvt , Pt|t = Pt − MtFtMt′ ,
at+1 = Ttat|t , Pt+1 = TtPt|tTt′ + RtQtRt′ ,
with Mt = PtZt′Ft⁻¹ for t = 1, . . . , n and starting with a1 and P1.
Compare Mt with Kt . Verify this version of the KF using the earlier derivations.
Smoothing
• The filter calculates the mean and variance conditional on Yt ;
• The Kalman smoother calculates the mean and variance conditional on the full set of observations Yn;
• After the filtered estimates are calculated, the smoothing recursion starts at the last observation and runs back to the first.
α̂t = E(αt |Yn), Vt = Var(αt |Yn),
rt = weighted sum of future innovations, Nt = Var(rt),
Lt = Tt − KtZt .
Starting with rn = 0, Nn = 0, the smoothing recursions are given by
rt−1 = Zt′Ft⁻¹vt + Lt′rt , Nt−1 = Zt′Ft⁻¹Zt + Lt′NtLt ,
α̂t = at + Ptrt−1 , Vt = Pt − PtNt−1Pt .
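For the local level model the forward filter and backward smoother can be sketched as follows (variances, sample size and seed assumed; Lt = 1 − Kt since Zt = Tt = 1):

```python
import numpy as np

# Kalman filter (forward) followed by the smoothing recursions (backward)
# for the scalar local level model. All parameters are assumed.
rng = np.random.default_rng(2)
n = 100
s2e, s2n = 1.0, 0.5
mu = np.cumsum(rng.normal(0, np.sqrt(s2n), n))
y = mu + rng.normal(0, np.sqrt(s2e), n)

# forward pass: store a_t, P_t, v_t, F_t, K_t
a_arr = np.empty(n); P_arr = np.empty(n)
v = np.empty(n); F = np.empty(n); K = np.empty(n)
a, P = 0.0, 1e7
for t in range(n):
    a_arr[t], P_arr[t] = a, P
    v[t] = y[t] - a
    F[t] = P + s2e
    K[t] = P / F[t]
    a = a + K[t] * v[t]
    P = P * (1 - K[t]) + s2n

# backward pass: r_{t-1} = Z'F^{-1}v + L'r, N_{t-1} = Z'F^{-1}Z + L'NL
alpha_hat = np.empty(n); V = np.empty(n)
r, N = 0.0, 0.0                       # r_n = 0, N_n = 0
for t in range(n - 1, -1, -1):
    L = 1.0 - K[t]                    # L_t = T_t - K_t Z_t
    r = v[t] / F[t] + L * r           # now holds r_{t-1}
    N = 1.0 / F[t] + L * L * N        # now holds N_{t-1}
    alpha_hat[t] = a_arr[t] + P_arr[t] * r
    V[t] = P_arr[t] - P_arr[t] * N * P_arr[t]
```

The smoothed variances Vt are never larger than the corresponding predicted variances Pt, since the smoother uses strictly more information.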
Smoothing Illustration
[Figure: four panels over 1880–1960 — observations with smoothed state; V_t; r_t; N_t.]
Filtering and Smoothing
[Figure: observation, smoothed level and filtered level, 1870–1970.]
Missing Observations
Missing observations are very easy to handle in Kalman filtering:
• suppose yj is missing
• put vj = 0,Kj = 0 and Fj = ∞ in the algorithm
• proceed further calculations as normal
The filter algorithm extrapolates according to the state equation until a new observation arrives. The smoother interpolates between observations.
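A sketch for the local level model (assumed variances; a NaN marks a missing yt): during the missing stretch the filter extrapolates, so at stays constant and Pt grows by σ²η each step.

```python
import numpy as np

# Local level Kalman filter with missing observations coded as np.nan.
# When y_t is missing we set v_t = 0, K_t = 0 (F_t = infinity), so the
# filter reduces to pure extrapolation by the state equation.
rng = np.random.default_rng(3)
n = 60
s2e, s2n = 1.0, 0.5
y = np.cumsum(rng.normal(0, np.sqrt(s2n), n)) + rng.normal(0, np.sqrt(s2e), n)
y[20:30] = np.nan                     # a block of missing data

a, P = 0.0, 1e7
a_pred = np.empty(n); P_pred = np.empty(n)
for t in range(n):
    a_pred[t], P_pred[t] = a, P
    if np.isnan(y[t]):                # missing: skip the update step
        P = P + s2n                   # a_{t+1} = a_t, P_{t+1} = P_t + Q_t
    else:
        v = y[t] - a
        F = P + s2e
        K = P / F
        a = a + K * v
        P = P * (1 - K) + s2n
```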
Missing Observations
[Figure: observation with a_t (top) and state variance P_t (bottom), 1870–1970.]
Missing Observations, Filter and Smoother
[Figure: filtered state with P_t (top) and smoothed state with V_t (bottom), 1880–1960.]
Forecasting
Forecasting requires no extra theory: just treat future observations as missing:
• put vj = 0,Kj = 0 and Fj = ∞ for j = n + 1, . . . , n + k
• proceed further calculations as normal
• forecast for yj is Zjaj
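A sketch for the local level model (assumed variances): after filtering the sample, the k-step-ahead forecast of yn+j equals the last state estimate, while the forecast variance Pn+1 + (j − 1)σ²η + σ²ε grows with the horizon.

```python
import numpy as np

# Forecasting by treating future observations as missing (local level model,
# assumed parameters). For Z_t = T_t = 1 the forecast of y_{n+j} is a_{n+1}.
rng = np.random.default_rng(4)
n, k = 80, 10
s2e, s2n = 1.0, 0.5
y = np.cumsum(rng.normal(0, np.sqrt(s2n), n)) + rng.normal(0, np.sqrt(s2e), n)

a, P = 0.0, 1e7
for t in range(n):                    # filter over the observed sample
    F = P + s2e
    K = P / F
    a = a + K * (y[t] - a)
    P = P * (1 - K) + s2n             # after the loop: a = a_{n+1}, P = P_{n+1}

fc = np.full(k, a)                    # forecast of y_{n+j} is Z a = a
# Var(y_{n+j}|Y_n) = P_{n+1} + (j-1) * s2n + s2e, j = 1, ..., k
fc_var = P + np.arange(k) * s2n + s2e
```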
Forecasting
[Figure: observation with a_t (top) and P_t (bottom), 1870–2000.]
Parameter Estimation
The system matrices in a state space model typically depend on a parameter vector ψ. The model is completely Gaussian; we estimate ψ by Maximum Likelihood. The loglikelihood of a time series is
log L = Σ_{t=1}^{n} log p(yt |Yt−1).
In the state space model, p(yt |Yt−1) is a Gaussian density with mean Ztat and variance Ft :
log L = −(nN/2) log 2π − (1/2) Σ_{t=1}^{n} ( log |Ft | + vt′Ft⁻¹vt ),
with vt and Ft from the Kalman filter. This is called the prediction error decomposition of the likelihood. Estimation proceeds by numerically maximising log L.
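A sketch for the local level model (true parameters, seed and grid are assumed; the first, diffuse term is dropped from the sum): the prediction error decomposition evaluated over a grid of signal-to-noise ratios q = σ²η/σ²ε, keeping σ²ε fixed at 1.

```python
import numpy as np

# Maximum likelihood for the local level model via the prediction error
# decomposition, maximised by a simple grid search over q = s2n / s2e.
rng = np.random.default_rng(5)
n = 200
true_s2e, true_s2n = 1.0, 0.5
y = np.cumsum(rng.normal(0, np.sqrt(true_s2n), n)) \
    + rng.normal(0, np.sqrt(true_s2e), n)

def loglik(s2e, s2n, y):
    a, P = 0.0, 1e7                   # rough diffuse initialisation
    ll = 0.0
    for t, yt in enumerate(y):
        v = yt - a
        F = P + s2e
        if t > 0:                     # drop the diffuse first term
            ll += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = P / F
        a += K * v
        P = P * (1 - K) + s2n
    return ll

qs = np.linspace(0.05, 2.0, 40)
lls = np.array([loglik(1.0, q, y) for q in qs])
q_hat = qs[np.argmax(lls)]            # grid-search ML estimate of q
```

In practice a numerical optimiser would replace the grid search, but the likelihood evaluation is the same Kalman filter pass.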
ML Estimate of Nile Data
[Figure: Nile data, 1870–1970, with levels for q = 1000, q = 0 and q = 0.0973 (the ML estimate).]
Diagnostics
• Null hypothesis: standardised residuals
vt /√Ft ∼ NID(0, 1);
• Apply standard tests for Normality, heteroskedasticity and serial correlation;
• A recursive algorithm is available to calculate smoothed disturbances (auxiliary residuals), which can be used to detect breaks and outliers;
• Model comparison and parameter restrictions: use likelihood-based procedures (LR test, AIC, BIC).
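A sketch of the residual diagnostics (assumed parameters; exact initialisation a2 = y1, P2 = σ²ε + σ²η): when the model is correctly specified, the standardised prediction errors should have mean ≈ 0, variance ≈ 1 and negligible autocorrelation.

```python
import numpy as np

# Standardised prediction errors v_t / sqrt(F_t) for a local level model
# simulated from its own (assumed) parameters.
rng = np.random.default_rng(8)
n = 200
s2e, s2n = 1.0, 0.5
y = np.cumsum(rng.normal(0, np.sqrt(s2n), n)) + rng.normal(0, np.sqrt(s2e), n)

a, P = y[0], s2e + s2n                # exact initialisation: a_2 = y_1, P_2 = s2e + s2n
std_res = np.empty(n - 1)
for t in range(1, n):                 # y_1 was used for initialisation
    v = y[t] - a
    F = P + s2e
    std_res[t - 1] = v / np.sqrt(F)
    K = P / F
    a += K * v
    P = P * (1 - K) + s2n

mean, var = std_res.mean(), std_res.var()
lag1 = np.corrcoef(std_res[:-1], std_res[1:])[0, 1]   # first-order autocorrelation
```

These three summary statistics are the raw ingredients for the Normality, heteroskedasticity and serial correlation tests mentioned above.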
Nile Data Residuals Diagnostics
[Figure: four panels — standardised Nile residuals; correlogram of the residuals; residual density against N(s = 0.996); normal QQ plot.]
Assignment 1 - Exercises 1 and 2
1. Formulate the following models in state space form:
• ARMA(3,1) process;
• ARMA(1,3) process;
• ARIMA(3,1,1) process;
• AR(3) plus noise model;
• Random Walk plus AR(3) process.
Please discuss the initial conditions for αt in all cases.
2. Derive a Kalman filter for the local level model
yt = µt + ξt , ξt ∼ N(0, σ2ξ ), ∆µt+1 = ηt ∼ N(0, σ2η),
with E(ξtηt) = σξη ≠ 0 and E(ξtηs) = 0 for all t ≠ s. Also discuss the problem of missing observations in this case.
Assignment 1 - Exercise 3
3.
Consider a time series yt that is modelled by a state space model, and a time series vector of k exogenous variables Xt . The dependent time series yt depends on the explanatory variables Xt in a linear way.
• How would you introduce Xt as regression effects in the statespace model ?
• Can you allow the transition matrix Tt to depend on Xt too ?
• Can you allow the transition matrix Tt to depend on yt ?
For the last two questions, please also consider maximum likelihood estimation of coefficients within the system matrices.
Assignment 1 - Computing work
• Consider Chapter 2 of the Durbin and Koopman book.
• There are 8 figures in Chapter 2.
• Write computer code that can reproduce all these figures in Chapter 2, except Fig. 2.4.
• Write short documentation for the Ox program.