ARIMA to the Rescue (Excel)
ARIMA to the rescue ‐1‐ © Spider Financial Corp, 2012
ARIMA to the rescue
This is the second entry in our series of “Unplugged” tutorials, in which we delve into the details of each
of the time series models with which you are already familiar, highlighting the underlying assumptions
and driving home the intuitions behind them.
In financial time series and other fields, we often face a non-stationary time series, for example the
price levels of a traded security (e.g. stock, bond, commodity, etc.). In such cases, the time series
exhibits a trend, seasonality, or simply a random walk. Unfortunately, the bulk of time series and
econometric methods can be applied only to stationary processes, so how do we handle this scenario?
In this issue, we tackle the ARIMA model, an extension of the ARMA model that applies to
non-stationary time series with one or more unit roots (i.e. integrated series).
Once again, we will start with the ARIMA process definition, stating the inputs, outputs,
parameters, stability constraints, and assumptions. Then we will introduce the integration operator and
draw a few guidelines for the modeling process.
Background
A non-stationary time series often exhibits a few common patterns, including a trend over time,
seasonality, and a random walk. The trend or seasonality can also be classified as either
deterministic (a function of time) or stochastic (a function of past values).
For stochastic trend and/or seasonality, we often difference (i.e. compute the change of) the original
time series to induce a stationary series which can be further modeled by an ARMA type of process.
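As a minimal sketch of the differencing step described above (the helper name `difference` and the sample data are illustrative assumptions, not part of NumXL), the following applies the first-difference operator d times:

```python
def difference(series, d=1):
    """Apply the first-difference operator (1 - L) to `series` d times."""
    out = list(series)
    for _ in range(d):
        # Each pass replaces the series by its period-to-period changes.
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


levels = [1, 4, 9, 16, 25]            # Y_t = t^2, a quadratic trend
first_diff = difference(levels, d=1)   # still trending
second_diff = difference(levels, d=2)  # constant, hence no trend left
print(first_diff, second_diff)
```

Each pass shortens the series by one observation, which is why the initial values are needed later when we integrate back.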
By definition, the auto‐regressive integrated moving average (ARIMA) process is an ARMA process for
the differenced time series.
Alternatively, in a simple formulation, an ARIMA(p,d,q) process is defined as follows:
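$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1-L)^d\, Y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) a_t$$
OR
$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Z_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) a_t$$
$$Z_t = \nabla^d Y_t = (1-L)^d\, Y_t = (1-L)(1-L)\cdots(1-L)\, Y_t$$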
Where
$Y_t$ is the observed output at time t
$\nabla^d$ is the difference operator of order d
$Z_t$ is the differenced time series at time t
$a_t$ is the innovation, shock or error term at time t
$\{a_t\}$ time series observations:
o Are independent and identically distributed
o Follow a Gaussian distribution (i.e. $a_t \sim N(0, \sigma^2)$).
Assumptions
Looking closer at the formulation, we see that the ARIMA process is essentially an ARMA process for the
differenced time series, aside from the difference operator ($\nabla^d$). The same assumptions as for an ARMA
process apply here as well:
The ARMA process generates a stationary time series $\{Z_t\}$.
The residuals $\{a_t\}$ follow a stable Gaussian distribution.
The components’ parameter values $\{\phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q\}$ are constants.
The parameter values $\{\phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q\}$ yield a stationary process.
Sounds simple? It is! A careful selection of the ARMA model parameters can guarantee a stationary
process for the differenced time series ($Z_t$), but how do we interpret the forecast of $Y_t$ using $Z_t$?
Integration (un-difference) Operator
In many cases, we apply a difference operator to yield a stationary time series that can be easily
modeled using an ARMA type of model. But how do we go back to the original un-differenced time series
space and interpret the ARMA results (e.g. forecast)? Our best bet is to use the integration operator.
DEFINITION: a stochastic time series $\{Y_t\}$ is said to be integrated of order d (i.e. $Y_t \sim I(d)$) if the
d-times differenced time series yields an invertible ARMA representation.
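$$a_t = \frac{1 - \sum_{i=1}^{p} \phi_i L^i}{1 + \sum_{j=1}^{q} \theta_j L^j}\,(1-L)^d\, Y_t = \frac{1 - \sum_{i=1}^{p} \phi_i L^i}{1 + \sum_{j=1}^{q} \theta_j L^j}\, Z_t = \left(1 - \sum_{k=1}^{\infty} \pi_k L^k\right) Z_t$$
AND, $\sum_{k=1}^{\infty} |\pi_k| < \infty$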
Now, to recover $Y_t$ from $(1-L)^d Y_t$, we apply the un-difference (integration) operator.
A first order integration can be expressed as
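$$Y_t = \frac{Z_t}{1-L} = (1 + L + L^2 + L^3 + \cdots + L^n + \cdots)\, Z_t = \sum_{i=0}^{\infty} L^i Z_t$$
$$Y_t = \sum_{i=0}^{\infty} Z_{t-i}$$
$$Y_{T+n} = Y_T + \sum_{i=1}^{n} Z_{T+i}$$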
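The last relation, $Y_{T+n} = Y_T + \sum_{i=1}^{n} Z_{T+i}$, is just a running sum started at the last known level. A minimal sketch, assuming the differences are already available (the function name `integrate_once` and the data are illustrative):

```python
from itertools import accumulate


def integrate_once(z_future, y_last):
    """Return [Y_{T+1}, ..., Y_{T+n}] from Y_T and Z_{T+1}, ..., Z_{T+n}."""
    return [y_last + partial_sum for partial_sum in accumulate(z_future)]


y = [10.0, 12.0, 11.5, 13.0, 14.5]
z = [y[t] - y[t - 1] for t in range(1, len(y))]   # Z_t = (1 - L) Y_t

# Starting from the first level, the running sum of differences rebuilds y.
rebuilt = [y[0]] + integrate_once(z, y[0])
print(rebuilt)
```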
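For higher order (i.e. d-order) integration, we simply integrate multiple times:
$$Y_t = \frac{Z_t}{(1-L)^d} = \frac{1}{1-L}\cdot\frac{1}{1-L}\cdots\frac{1}{1-L}\, Z_t = \left(\sum_{i=0}^{\infty} L^i\right)\left(\sum_{i=0}^{\infty} L^i\right)\cdots\left(\sum_{i=0}^{\infty} L^i\right) Z_t$$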
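Integrating multiple times can be sketched as repeated running sums, one per order of differencing. The helper below is an illustration only: the argument `starts` (last known value at each stage, most-differenced stage first) is a convention chosen for this sketch, not the NumXL INTG() interface.

```python
from itertools import accumulate


def difference(series, d=1):
    """Apply (1 - L) to `series` d times."""
    out = list(series)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def integrate(z, starts):
    """Undo len(starts) differences; starts[k] seeds the k-th running sum."""
    out = list(z)
    for start in starts:
        out = [start + partial_sum for partial_sum in accumulate(out)]
    return out


y = [3, 5, 10, 18, 29, 43]
z = difference(y, d=2)                  # second differences
# Seeds at T = index 1: W_T = Y_T - Y_{T-1}, then Y_T itself.
recovered = integrate(z, starts=[y[1] - y[0], y[1]])
print(recovered)                        # matches y[2:]
```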
For instance, for $d = 2$, the integration operator is defined as follows:
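$$Y_t = \frac{Z_t}{(1-L)^2} = \frac{1}{1-L}\cdot\frac{1}{1-L}\, Z_t = (1 + L + L^2 + L^3 + \cdots)^2\, Z_t$$
$$Y_t = (1 + 2L + 3L^2 + 4L^3 + \cdots + (n+1)L^n + \cdots)\, Z_t = \sum_{i=0}^{\infty} (i+1) L^i Z_t$$
$$Y_{T+n} = Y_T + n\,W_T + \sum_{i=1}^{n} (n+1-i)\, Z_{T+i}$$
$$W_T = Y_T - Y_{T-1}$$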
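The closed form for $d = 2$ can be checked numerically. The sketch below (function name and test series are illustrative assumptions) evaluates $Y_{T+n} = Y_T + nW_T + \sum_{i=1}^{n}(n+1-i)Z_{T+i}$ on the series $Y_t = t^2$, whose second differences all equal 2:

```python
def level_after_n_steps_d2(y_T, y_T_minus_1, z_future):
    """Closed-form Y_{T+n} for a twice-differenced series.

    z_future holds Z_{T+1}, ..., Z_{T+n}; W_T is the last first difference.
    """
    n = len(z_future)
    w_T = y_T - y_T_minus_1
    weighted = sum((n + 1 - i) * z_future[i - 1] for i in range(1, n + 1))
    return y_T + n * w_T + weighted


# Y_t = t^2 with T = 3: Y_3 = 9, Y_2 = 4, and every second difference is 2.
two_ahead = level_after_n_steps_d2(9, 4, [2, 2])       # should equal 5^2
three_ahead = level_after_n_steps_d2(9, 4, [2, 2, 2])  # should equal 6^2
print(two_ahead, three_ahead)
```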
For $d = 3$, the integration operator is defined as follows:
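$$Y_t = \frac{Z_t}{(1-L)^3} = \frac{1}{1-L}\cdot\frac{1}{1-L}\cdot\frac{1}{1-L}\, Z_t = (1 + L + L^2 + \cdots)^3\, Z_t$$
$$Y_t = \left(1 + 3L + 6L^2 + \cdots + \frac{(n+1)(n+2)}{2} L^n + \cdots\right) Z_t = \sum_{i=0}^{\infty} \frac{(i+1)(i+2)}{2} L^i Z_t$$
$$Y_{T+n} = Y_T + n\,W_T + \frac{n(n+1)}{2} V_T + \sum_{i=1}^{n} \frac{(n+1-i)(n+2-i)}{2} Z_{T+i}$$
$$W_T = Y_T - Y_{T-1}$$
$$V_T = W_T - W_{T-1} = Y_T - 2Y_{T-1} + Y_{T-2}$$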
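As a quick sanity check of the third-order formula, here is a worked example under the assumption that the series is exactly cubic, $Y_t = t^3$ (an illustrative choice, not taken from the text), so that every third difference equals 6:

```latex
\begin{aligned}
Z_t &= (1-L)^3\, t^3 = 6 \\
T &= 3,\ n = 2: \quad Y_3 = 27,\ Y_2 = 8,\ Y_1 = 1 \\
W_T &= Y_3 - Y_2 = 19, \qquad V_T = Y_3 - 2Y_2 + Y_1 = 12 \\
Y_{T+n} &= Y_T + n W_T + \tfrac{n(n+1)}{2} V_T
          + \sum_{i=1}^{n} \tfrac{(n+1-i)(n+2-i)}{2}\, Z_{T+i} \\
        &= 27 + 2(19) + 3(12) + (3 \cdot 6 + 1 \cdot 6) = 125 = 5^3
\end{aligned}
```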
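Since $\{Y_t\}$ is an integrated time series of order d, $Z_t$ is a stationary time series which we can
express in an MA representation:
$$Y_t = \left(\sum_{k=0}^{\infty} \psi_k L^k\right)\left(\sum_{i=0}^{\infty} L^i\right)\cdots\left(\sum_{i=0}^{\infty} L^i\right) a_t = \left(\sum_{k=0}^{\infty} \psi_k L^k\right)\left(\sum_{i=0}^{\infty} \omega_i L^i\right) a_t$$
Here $\psi_k$ are the MA weights of $Z_t$ and $\omega_i$ are the coefficients of the d-fold integration operator $(1-L)^{-d}$.
We can compute the conditional variance at time $T+n$ given the information available at time $T$:
$$\mathrm{Var}(Y_{T+n} \mid Y_T, Y_{T-1}, \ldots, Y_1) = \mathrm{Var}\left(\sum_{i=0}^{n-1} \lambda_i L^i a_{T+n}\right) = \sigma_a^2 \sum_{i=0}^{n-1} \lambda_i^2$$
Where:
$$\lambda_i = \sum_{k=0}^{i} \omega_{i-k}\, \psi_k$$
$$\psi_0 = \omega_0 = 1$$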
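The variance calculation above amounts to convolving the MA weights of the differenced series with the coefficients of the integration operator and summing the squares of the first n terms. The sketch below is a hedged illustration: the names `integration_weights` and `forecast_variance`, and the use of a truncated list `psi` (with `psi[0] == 1`), are assumptions of this example. It relies on the fact that the i-th coefficient of $(1-L)^{-d}$ is the binomial coefficient $\binom{i+d-1}{d-1}$.

```python
from math import comb


def integration_weights(d, n):
    """First n coefficients of (1 - L)^(-d): 1, d, d(d+1)/2, ..."""
    if d == 0:
        return [1] + [0] * (n - 1)      # no integration: identity operator
    return [comb(i + d - 1, d - 1) for i in range(n)]


def forecast_variance(psi, d, n, sigma2=1.0):
    """Conditional variance of Y_{T+n} given information up to time T.

    psi: MA weights of the differenced series (psi[0] == 1), truncated.
    """
    omega = integration_weights(d, n)
    combined = [
        sum(omega[i - k] * psi[k] for k in range(min(i, len(psi) - 1) + 1))
        for i in range(n)
    ]
    return sigma2 * sum(w * w for w in combined)


random_walk_var = forecast_variance([1.0], d=1, n=5)   # grows linearly in n
double_unit_root_var = forecast_variance([1.0], d=2, n=3)
print(random_walk_var, double_unit_root_var)
```

For a pure random walk the result is simply n times the shock variance, which is a convenient check on any implementation.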
IMPORTANT: NumXL has a function INTG() that computes the integral of a seasonal differenced (i.e.
$Z_t = \nabla_s^d Y_t = (1 - L^s)^d Y_t$) time series. To recover a time series that was differenced d times, set $s = 1$ and pass
on the initial conditions (e.g. $Y_T, Y_{T-1}, \ldots, Y_{T-d}$), and it will recover the original data series.
ARIMA Machine
The ARIMA process is a simple machine that retains limited information about its past differenced
outputs and the shocks it has experienced. In a more systematic view, the ARIMA process or machine
can be viewed as below:
Note that we are observing the integrated output of the ARMA process ($Y_t$), but the machine processes
the differenced outputs ($Z_t$). The INTG block references the integration operator.
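To make the machine view concrete, here is a small simulation sketch. It is an illustration under stated assumptions, not NumXL code: the function name `simulate_arima`, zero initial conditions, and the parameter values are all hypothetical. An ARMA core turns Gaussian shocks into $Z_t$, and an integration stage turns $Z_t$ into the observed $Y_t$.

```python
import random
from itertools import accumulate


def simulate_arima(phi, theta, d, n, sigma=1.0, seed=0):
    """Return (z, y): the ARMA(p, q) output z and its d-fold integral y."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    z, shocks = [], []
    for t in range(n):
        a_t = rng.gauss(0.0, sigma)
        # The machine only remembers the last p outputs and q shocks.
        ar = sum(phi[i] * z[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * shocks[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        z.append(ar + a_t + ma)
        shocks.append(a_t)
    y = z
    for _ in range(d):                  # integration block, started at zero
        y = list(accumulate(y))
    return z, y


z, y = simulate_arima(phi=[0.5], theta=[0.3], d=1, n=200)
print(y[:5])
```

Differencing `y` once returns `z`, which is the sense in which the machine processes the differenced outputs while we observe the integrated ones.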
How do we know if we have a unit-root in our time series?
Aside from the statistical tests for a unit root (e.g. ADF, KPSS, etc.), there are a few visual clues for
detecting a unit root using the ACF and PACF plots. For instance, a time series with a unit root will exhibit
high and very slowly decaying ACF values across all lags. On the PACF plot, the PACF value for the first lag is
almost one (1), and the PACF values for lag orders greater than one are insignificant.
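The ACF clue can be reproduced on simulated data. The sketch below, with an assumed seed, sample size, and helper name `sample_acf`, compares the sample ACF of a random walk with that of its first difference:

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


rng = random.Random(42)
walk, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)        # unit-root process: Y_t = Y_{t-1} + a_t
    walk.append(level)
diff = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

acf_walk = sample_acf(walk, 20)         # stays high, decays very slowly
acf_diff = sample_acf(diff, 20)         # near zero beyond lag 0
print(round(acf_walk[1], 3), round(acf_diff[1], 3))
```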
For statistical testing, the Augmented Dickey-Fuller (ADF) test will examine the evidence for a unit
root, even in the presence of a deterministic trend or squared time trend.
Note: Starting in 1.55 (LYNX), NumXL natively supports the ADF test with a step‐down optimization
procedure.
Statistical Characteristics
In our description of the ARIMA process, we highlighted a single input stimulus: shocks/innovations,
emphasizing how they propagate throughout the ARIMA machinery to generate the observed output.
The ARIMA machine is basically an ARMA machine, but the output is integrated before we can observe
it. How does this affect the output distribution?
Why do we care?
The statistical distribution (i.e. $\Psi$) of the output ($Y_{T+n}$) is pivotal for conducting a forecast and/or
establishing a confidence interval at any future time ($n$).
$$Y_{T+n} \sim \Psi(\hat{Y}_{T+n}, \sigma_{T+n}^2)$$
$$\hat{Y}_{T+n} + Z^{l}_{\alpha/2}\, \sigma_{T+n} \le Y_{T+n} \le \hat{Y}_{T+n} + Z^{u}_{\alpha/2}\, \sigma_{T+n}$$
Where
$\hat{Y}_{T+n}$ is the out-of-sample forecast at time $T+n$
$Z^{l}_{\alpha/2}$ is the lower critical value for the $\alpha/2$ significance level
$Z^{u}_{\alpha/2}$ is the upper critical value for the $\alpha/2$ significance level
$\sigma_{T+n}^2$ is the conditional variance at time $T+n$
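For the Gaussian case assumed throughout this tutorial, the interval can be sketched with the standard library alone. The function name `forecast_interval` and the sample numbers are illustrative; the lower critical value is negative, so both bounds are written as forecast plus critical value times standard deviation.

```python
from statistics import NormalDist


def forecast_interval(y_hat, variance, alpha=0.05):
    """Return (lower, upper) bounds of a (1 - alpha) Gaussian forecast interval."""
    z_lower = NormalDist().inv_cdf(alpha / 2)        # negative critical value
    z_upper = NormalDist().inv_cdf(1 - alpha / 2)    # positive critical value
    sigma = variance ** 0.5
    return y_hat + z_lower * sigma, y_hat + z_upper * sigma


lower, upper = forecast_interval(y_hat=100.0, variance=4.0, alpha=0.05)
print(lower, upper)
```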
By now, the importance of understanding the output statistical distribution should be clear. Now how
do we go about forming that understanding?
Back to the definition, the differenced time series $\{Z_t\}$ is modeled as a stationary ARMA process. Let’s
convert it to an infinite-order MA model:
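$$\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) Z_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) a_t$$
$$Z_t = \frac{1 + \sum_{j=1}^{q} \theta_j L^j}{1 - \sum_{i=1}^{p} \phi_i L^i}\, a_t = \left(1 + \sum_{k=1}^{\infty} \psi_k L^k\right) a_t = \sum_{k=0}^{\infty} \psi_k\, a_{t-k}, \qquad \psi_0 = 1$$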
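The MA weights follow from matching powers of L on both sides of the ARMA equation, which gives the recursion $\psi_0 = 1$, $\psi_k = \theta_k + \sum_{i=1}^{\min(k,p)} \phi_i \psi_{k-i}$ (with $\theta_k = 0$ for $k > q$). A minimal sketch, with the function name `psi_weights` chosen for this example:

```python
def psi_weights(phi, theta, n):
    """First n MA(infinity) weights of an ARMA(p, q) process (n >= 1)."""
    psi = [1.0]
    for k in range(1, n):
        value = theta[k - 1] if k <= len(theta) else 0.0
        # AR part feeds earlier weights back in.
        value += sum(phi[i - 1] * psi[k - i]
                     for i in range(1, min(k, len(phi)) + 1))
        psi.append(value)
    return psi


ar1 = psi_weights(phi=[0.5], theta=[], n=4)        # geometric decay
arma11 = psi_weights(phi=[0.5], theta=[0.3], n=3)
print(ar1, arma11)
```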
Now, let’s recover the original time series from $\{Z_t\}$.
Example 1: Let’s consider the following differenced series $Z_t = (1-L) Y_t$. To recover the $\{Y_t\}$ time
series, we simply add up all the differences to date.
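$$Z_T = Y_T - Y_{T-1}$$
$$Y_{T+n} = Y_T + \sum_{i=1}^{n} Z_{T+i} = Y_T + \sum_{i=1}^{n} \sum_{j=0}^{\infty} \psi_j L^j a_{T+i} = Y_T + \sum_{i=1}^{n} \sum_{j=0}^{\infty} \psi_j\, a_{T+i-j}$$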
Now, the variance of the forecast is expressed as follows:
$$\mathrm{Var}(Y_{T+n}) = \sigma_a^2 \sum_{k=1}^{n} \left(\sum_{i=0}^{n-k} \psi_i\right)^2$$
As we see, although computing the forecast is a simple exercise of summing all prior differences, the
variance calculation is much more involved.
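Furthermore, for $n \gg 1$, $Z_{T+n} \to \frac{\mu}{1 - \sum_{i=1}^{p} \phi_i}$, so the $Y_{T+n}$ estimate/forecast asymptotically approaches
the deterministic linear trend defined by:
$$Y_{T+n} = Y_T + \frac{n\,\mu}{1 - \sum_{i=1}^{p} \phi_i}$$
where $\mu$ denotes the constant term of the ARMA model for $Z_t$.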
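The long-run behavior can be illustrated with a first-order case. The sketch below assumes, for illustration only, that the differenced series follows $Z_t = \mu + \phi Z_{t-1} + a_t$ with a constant term `mu` (a hypothetical parameter of this example); the forecast increments then settle at $\mu / (1 - \phi)$ and the level forecast becomes a straight line.

```python
def forecast_levels(y_last, z_last, mu, phi, n):
    """Forecast Y_{T+1..T+n} when Z_t = mu + phi * Z_{t-1} + a_t and Y_t = Y_{t-1} + Z_t."""
    path, z_hat, y_hat = [], z_last, y_last
    for _ in range(n):
        z_hat = mu + phi * z_hat        # future shocks have zero expectation
        y_hat += z_hat                  # integrate the differenced forecast
        path.append(y_hat)
    return path


path = forecast_levels(y_last=100.0, z_last=2.0, mu=0.5, phi=0.5, n=60)
long_run_slope = path[-1] - path[-2]    # approaches mu / (1 - phi) = 1.0
print(long_run_slope)
```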
Note: For higher order integration (d>1), it can be easily shown that the long-run forecast values of the
time series asymptotically follow a polynomial of the same order.
Conclusion
In simple terms, an ARIMA process is merely an ARMA process whose outputs have gone through an
integrator. The integrator causes the observed time series $\{Y_t\}$ to be non-stationary. The integration
process introduces the unit root into $\{Y_t\}$. Integrating multiple times introduces multiple unit roots into
the output time series. This is why the word “integrated” is used in ARIMA.
The main takeaway of this paper is that differencing is a special transformation procedure that aims
to convert a non-stationary time series into a stationary one. Like all transformations, care must be
taken when we interpret the results back into the original time series space.
Notice that the unit‐root modeling (e.g. ARIMA) is intended to capture a stochastic trend and it is not
suited for a deterministic trend. If you suspect the presence of a deterministic trend, you should explore
this avenue first (i.e. regress over time). At that point, you may choose to take the residuals and apply
an ARMA type of process to exploit any remaining dynamics.
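The "regress over time" step can be sketched with ordinary least squares on a linear time index. This is a minimal illustration under assumptions (a linear trend, the hypothetical helper name `fit_linear_trend`, and made-up data); the residuals it returns are what one would then pass to an ARMA type of model.

```python
def fit_linear_trend(y):
    """OLS fit of y_t = intercept + slope * t for t = 0..n-1; returns residuals too."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [v - (intercept + slope * t) for t, v in enumerate(y)]
    return intercept, slope, residuals


series = [2.0 + 3.0 * t for t in range(10)]     # purely deterministic trend
intercept, slope, residuals = fit_linear_trend(series)
print(intercept, slope)
```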