
Discrete time processes

Predictions are difficult. Especially about the future

Mark Twain.

Florian Herzog

2013

Modeling observed data

When we model observed (realized) data, we usually encounter the following situations:

• Time-series data of a single time series (Dt), e.g. economic data.

• Online observations, also a single time series (Dt)

• Ensemble data from measurements (experiments) or seasonal data (Dt^j)

Modeling ensemble data is often easier than modeling a single time series, because we have multiple realizations of the same stochastic information at a given point in time t.

By Dt, t = 1, ..., T, we denote the data observation at time t in the case that we have only a single time series. By Dt^j, t = 1, ..., T, j = 1, ..., l, we denote the observation of measurement j at time t.

Data Dt is recorded in discrete intervals (sampled data) and assumed to be generated by an SDE.


Modeling real-world data

Stochastic differential equations can be viewed as stochastic processes from two points of view:

• A stochastic process with a probability distribution p(t|θ) across time, where θ denotes the parameters

• A stochastic process where the changes in the resulting time series are the stochastic process, i.e. p(Dt − Dt−1 | θ) or p((Dt − Dt−1)/Dt−1 | θ)

The first interpretation is helpful for describing ensemble data and the second for analyzing single time series.

In order to deal with discrete data, all SDEs need to be discretized.


Discrete-time white noise

We assume a discrete white noise process εt which is identically and independently distributed (i.i.d.) with zero expectation and variance σ². Then the following statistical properties apply:

E[εt] = 0 (1)

E[εt²] = σ² (2)

E[εtετ] = 0 for t ≠ τ (3)

The last property is derived from the assumption of independence. A discretized Brownian motion is a discrete-time white noise process with εt ∼ N(0, ∆t), where ∆t is the time discretization.

In discrete-time models, a white noise process can be normally distributed (Gaussian white noise) but can also follow any other distribution, as long as the i.i.d. assumption is valid.
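As a numerical illustration (an addition to the original slide content), the following Python sketch generates a Gaussian discrete-time white noise sequence with variance ∆t, checks properties (1)-(3) on the sample, and cumulates the increments into a discretized Brownian motion path. The parameter values and the use of numpy are assumptions of the example.

import numpy as np

# Illustrative sketch, not from the original slides; dt and n_steps are assumed values.
rng = np.random.default_rng(0)
dt = 0.01          # time discretization Delta t
n_steps = 1000

# Discrete-time white noise: i.i.d. eps_t ~ N(0, dt)
eps = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n_steps)

# A discretized Brownian motion is the cumulative sum of the white noise increments
W = np.concatenate(([0.0], np.cumsum(eps)))

# Empirical checks of E[eps_t] = 0, E[eps_t^2] = dt and zero autocorrelation
print("sample mean     :", eps.mean())
print("sample variance :", eps.var())
print("lag-1 corr      :", np.corrcoef(eps[:-1], eps[1:])[0, 1])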


Statistical Theory

Stationarity: A process is covariance-stationary if and only if (iff):

E[Dt] = µ ∀t (4)

E[(Dt − µ)(Dt−j − µ)] = νj ∀t ∀j . (5)

This form of stationarity is often also called weak stationarity. For example, consider the process Dt = µ + εt, where εt is a white noise process with zero mean and variance σ². In this case Dt is stationary, since:

E[Dt] = µ

E[(Dt − µ)(Dt−j − µ)] = σ²   for j = 0
E[(Dt − µ)(Dt−j − µ)] = 0    for j ≠ 0


Statistical Theory

Stationarity: A process is strictly stationary iff for any values of j1, j2, j3, ... the joint distribution of Dt, Dt+j1, Dt+j2, Dt+j3, ... depends only on the intervals j1, j2, j3, ... separating the observations and not on the time t itself.

Ergodicity: A process is called ergodic for the mean iff:

limt→∞ E[Dt] = µ . (6)

A process is said to be ergodic for the second moments iff:

1/(T − j) ∑_{t=j+1}^{T} (Dt − µ)(Dt−j − µ) →p νj   as T → ∞ (7)

where →p denotes convergence in probability. From an ergodic process we can estimate all necessary statistics without possessing ensemble data. If a process is not ergodic and we have only one observation, we cannot identify its parameters.


Statistical Theory

It can be shown that if the autocovariance of a covariance stationary process satisfies the following:

∑_{j=0}^{∞} |νj| < ∞ (8)

then the process is ergodic for the mean.


First Order Moving Average Process

Let εk be a Gaussian white noise process with mean zero and variance σ², and consider the process:

Yk = µ + εk + θεk−1 , (9)

where µ and θ are any real-valued constants. This process is called a first-order moving average process and is usually denoted as MA(1). The expectation of Yk is given as:

E[Yk] = E[µ+ εk + θεk−1]

= µ+ E[εk] + θE[εk−1] = µ

The constant µ is thus the mean of the process.


First Order Moving Average Process

The variance of the MA(1) is calculated as

Var[Yk] = E[(Yk − µ)²] = E[(εk + θεk−1)²]
        = E[εk² + θ²εk−1² + 2θεkεk−1]
        = σ² + θ²σ² + 0 = (1 + θ²)σ²

The first autocovariance is computed as

E[(Yk − µ)(Yk−1 − µ)] = E[(εk + θεk−1)(εk−1 + θεk−2)]
                      = E[εkεk−1 + θεkεk−2 + θεk−1² + θ²εk−1εk−2]
                      = 0 + 0 + θσ² + 0 = θσ²


First Order Moving Average Process

The higher-order autocovariances are all zero:

E[(Yk − µ)(Yk−j − µ)] = E[(εk + θεk−1)(εk−j + θεk−j−1)] = 0 for j > 1 .

Since the mean and the autocovariances are not functions of time, an MA(1) process is covariance stationary regardless of the value of θ. Moreover,

∑_{j=0}^{∞} |νj| = (1 + θ²)σ² + |θσ²| < ∞ . (10)

When εk is a Gaussian white noise process, the MA(1) is ergodic for all moments.


First Order Moving Average Process

The autocorrelation is defined as ρ1 = Cov[Yk, Yk−1] / √(Var[Yk] Var[Yk−1]), and Var[Yk] = Var[Yk−1] = (1 + θ²)σ². Therefore we get

ρ1 = θσ² / ((1 + θ²)σ²) = θ / (1 + θ²) . (11)

All higher-order autocorrelations are zero. Positive values of θ induce positive autocorrelation, negative values induce negative autocorrelation. The autocorrelation of an MA(1) process lies between −0.5 and 0.5. For any value of ρ1 there exist two values of θ, since

ρ1 = (1/θ) / (1 + (1/θ)²) = θ²(1/θ) / (θ²(1 + (1/θ)²)) = θ / (1 + θ²) . (12)
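A short simulation can be used to verify these MA(1) moments; the Python sketch below is illustrative, with assumed values µ = 0, θ = 0.6, σ = 1, and compares the sample variance and lag-1 autocorrelation with (1 + θ²)σ² and θ/(1 + θ²).

import numpy as np

# Illustrative MA(1) simulation; mu, theta, sigma and the sample size are assumed values.
rng = np.random.default_rng(1)
mu, theta, sigma, n = 0.0, 0.6, 1.0, 200_000

eps = rng.normal(0.0, sigma, size=n + 1)
Y = mu + eps[1:] + theta * eps[:-1]          # Y_k = mu + eps_k + theta * eps_{k-1}

rho1_sample = np.corrcoef(Y[:-1], Y[1:])[0, 1]

print("sample variance:", Y.var(), " theory:", (1 + theta**2) * sigma**2)
print("sample rho_1   :", rho1_sample, " theory:", theta / (1 + theta**2))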


Q-th Order Moving Average Process

A q-th order moving average process, denoted as MA(q), is given by

Yk = µ + εk + θ1εk−1 + θ2εk−2 + ... + θqεk−q , (13)

where µ and θ1, ..., θq are real-valued constants. The expectation of Yk is given as:

E[Yk] = µ + E[εk] + θ1E[εk−1] + ... + θqE[εk−q] = µ

The constant µ is thus the mean of the process. The variance of the MA(q) is calculated as

Var[Yk] = E[(Yk − µ)²] = E[(εk + θ1εk−1 + θ2εk−2 + ... + θqεk−q)²] .

Since all ε's are uncorrelated, all cross terms εjεk with j ≠ k have zero expectation.


Q-th Order Moving Average Process

The variance is therefore

Var[Yk] = σ² + θ1²σ² + ... + θq²σ²
        = σ²(1 + ∑_{j=1}^{q} θj²) .

The autocovariance for the i-th lag is computed as

νi = E[(εk + ∑_{j=1}^{q} θjεk−j)(εk−i + ∑_{j=1}^{q} θjεk−j−i)]
   = E[θiεk−i² + θi+1θ1εk−i−1² + θi+2θ2εk−i−2² + ... + θqθq−iεk−q²]


Q-th Order Moving Average Process

The products of ε's at different dates have been dropped since their expectation is zero. For i > q there are no ε's with common dates in the definition of νi. Thus we get for the autocovariance:

νi = (θi + θi+1θ1 + θi+2θ2 + ... + θqθq−i)σ²   for i = 1, 2, ..., q
νi = 0                                          for i > q

The autocorrelation is given as

ρi = (θi + θi+1θ1 + θi+2θ2 + ... + θqθq−i) / (1 + ∑_{j=1}^{q} θj²)   for i = 1, 2, ..., q
ρi = 0                                                                for i > q
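The piecewise formulas above translate directly into code; the following Python sketch (illustrative, with an assumed coefficient vector and σ² = 1) evaluates the theoretical MA(q) autocovariances and autocorrelations.

import numpy as np

def ma_autocovariance(theta, sigma2, max_lag):
    # Theoretical autocovariances nu_0 ... nu_max_lag of an MA(q) process.
    # theta: MA coefficients (theta_1, ..., theta_q), sigma2: white noise variance.
    psi = np.concatenate(([1.0], np.asarray(theta, dtype=float)))  # (1, theta_1, ..., theta_q)
    q = len(psi) - 1
    nu = np.zeros(max_lag + 1)
    for i in range(max_lag + 1):
        if i <= q:
            nu[i] = sigma2 * np.sum(psi[i:] * psi[:len(psi) - i])  # sum_j psi_{j+i} psi_j
        # for i > q the autocovariance is zero
    return nu

# Example with assumed values: MA(2) with theta = (0.5, -0.3) and sigma^2 = 1
nu = ma_autocovariance([0.5, -0.3], 1.0, max_lag=4)
print("autocovariances :", nu)
print("autocorrelations:", nu / nu[0])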


Autoregressive first order Process

We analyze a so-called first-order autoregressive process (AR(1)):

Yk+1 = c+ ΦYk + εk

Long-term mean: We take the mean on both sides of the equation

E[Yk+1] = E[c + ΦYk + εk]
E[Yk+1] = c + ΦE[Yk]
µ = c + Φµ
µ = c / (1 − Φ)


Autoregressive first order Process

µ = c / (1 − Φ)

From this formula we can see that Φ must be smaller than 1, otherwise the mean would be negative when c ≥ 0. We have also assumed that E[Yk] = E[Yk+1] = µ exists. Second moment:

Yk+1 = µ(1 − Φ) + ΦYk + εk

Yk+1 − µ = Φ(Yk − µ) + εk


Autoregressive first order Process

E[(Yk+1 − µ)²] = E[(Φ(Yk − µ) + εk)²]
               = Φ²E[(Yk − µ)²] + 2ΦE[(Yk − µ)εk] + E[εk²]
E[(Yk+1 − µ)²] = Φ²E[(Yk − µ)²] + σ²

where E[(Yk − µ)εk] = 0. Assuming covariance stationarity, i.e. E[(Yk+1 − µ)²] = ν and E[(Yk − µ)²] = ν, we get:

ν = Φ²ν + σ²

ν = σ² / (1 − Φ²)


Autoregressive first order Process

Autocorrelation:

E[(Yk+1 − µ)(Yk − µ)] = E[(Φ(Yk − µ) + εk)(Yk − µ)]
E[(Yk+1 − µ)(Yk − µ)] = ΦE[(Yk − µ)²] + E[εk(Yk − µ)] .

With E[εk(Yk − µ)] = 0 and E[(Yk − µ)²] = ν we get:

E[(Yk+1 − µ)(Yk − µ)] = Φν = Φσ² / (1 − Φ²) .

The autocorrelation is defined as Cov[Yk+1, Yk] / √(Var[Yk+1] Var[Yk]), and Var[Yk] = Var[Yk+1] = ν. Therefore we get

acf(Yk+1, Yk) = Φ (14)


Autoregressive first order Process

An AR(1) process Yk+1 = ΦYk + c + εk has the following properties:

Mean:     µ = c / (1 − Φ)
Variance: ν = σ² / (1 − Φ²)
Autocorr.: acf(Yk+j, Yk) = Φ^j

A covariance stationary AR(1) process is only well defined when |Φ| < 1, otherwise the variance is infinite and the mean makes no sense. We can check whether a process is an AR(1) process by calculating the mean, the variance over time, and the autocorrelation. From these three quantities the parameters can be estimated. It is important to notice that the autocorrelation at lag one is exactly the coefficient of the autoregression.
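These three properties can be checked by simulation; the Python sketch below is illustrative, with assumed values Φ = 0.8, c = 0.5, σ = 1, and compares the sample mean, variance and lag-1 autocorrelation with the formulas above.

import numpy as np

# Illustrative AR(1) simulation; Phi, c, sigma and the sample size are assumed values.
rng = np.random.default_rng(2)
Phi, c, sigma, n = 0.8, 0.5, 1.0, 200_000

Y = np.empty(n)
Y[0] = c / (1 - Phi)                      # start at the long-term mean
for k in range(n - 1):
    Y[k + 1] = c + Phi * Y[k] + sigma * rng.normal()

print("mean    :", Y.mean(), " theory:", c / (1 - Phi))
print("variance:", Y.var(),  " theory:", sigma**2 / (1 - Phi**2))
print("acf(1)  :", np.corrcoef(Y[:-1], Y[1:])[0, 1], " theory:", Phi)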


Example: GBM and mean reversion

The two examples (geometric Brownian motion and mean reversion) have the discrete-time structure of a first-order autoregressive process with:

(I) Φ = 1 + µ∆t

(II) Φ = 1 − b∆t

When we assume that µ > 0, the first model is unstable and we cannot use the techniques from autoregressions to estimate the parameters. We have to deal with this case differently, by using p = ln(x), since then we get an SDE with constant coefficients. When we assume that b > 0, the process is a true mean-reverting process that can be estimated by AR(1) methods. In this case the discrete-time model is also stable.


First order difference equation

A first-order difference equation given by

yk+1 = Φyk + wk

is of the same type as an AR(1) equation. It can be solved by recursive substitution, since y1 = Φy0 + w0, y2 = Φy1 + w1, y3 = Φy2 + w2, ... . When we plug the result for y1 into the equation for y2, we get y2 = Φ²y0 + Φw0 + w1, which in turn we plug into the equation for y3 and get y3 = Φ³y0 + Φ²w0 + Φw1 + w2. Expanding this further, we get the general solution

yk+1 = Φ^(k+1) y0 + ∑_{j=0}^{k} Φ^(k−j) wj
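The closed-form solution can be checked numerically against the recursion; the sketch below is illustrative, with assumed values for Φ and y0 and randomly drawn wk.

import numpy as np

# Compare recursion and closed form for y_{k+1} = Phi*y_k + w_k (assumed example values).
rng = np.random.default_rng(3)
Phi, y0, k = 0.9, 1.0, 20
w = rng.normal(size=k + 1)                 # w_0, ..., w_k

# Recursive substitution
y = y0
for j in range(k + 1):
    y = Phi * y + w[j]

# Closed form: y_{k+1} = Phi^(k+1) y_0 + sum_{j=0}^{k} Phi^(k-j) w_j
y_closed = Phi**(k + 1) * y0 + np.sum(Phi**(k - np.arange(k + 1)) * w)

print(y, y_closed)                         # both values agree up to floating point error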


p-th order difference equation

A p-th order difference equation is given by

yk+1 = Φ1yk + Φ2yk−1 + ... + Φpyk−p+1 + wk

We write this in vectorized form with:

ξk = (yk, yk−1, yk−2, ..., yk−p+1)^T ,   vk = (wk, 0, 0, ..., 0)^T

F = [ Φ1  Φ2  Φ3  ...  Φp−1  Φp
       1   0   0  ...   0     0
       0   1   0  ...   0     0
      ...
       0   0   0  ...   1     0 ]

and get

ξk+1 = Fξk + vk .


p-th order difference equation

The solution is given as:

ξk = F^k ξ0 + F^(k−1) v0 + F^(k−2) v1 + F^(k−3) v2 + ... + F vk−2 + vk−1

The p-th order difference equation is stable (or stationary) when the eigenvalues of F are smaller than one in absolute value. The eigenvalue polynomial is given as:

λ^p − Φ1λ^(p−1) − Φ2λ^(p−2) − ... − Φp−1λ − Φp = 0
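The stability condition can be verified numerically by forming the companion matrix F; the Python sketch below is illustrative, with assumed coefficients Φ1 = 1.2, Φ2 = −0.4, and compares the eigenvalues of F with the roots of the polynomial above.

import numpy as np

def companion_matrix(phi):
    # Companion matrix F for the coefficients phi = [Phi_1, ..., Phi_p].
    p = len(phi)
    F = np.zeros((p, p))
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)
    return F

phi = [1.2, -0.4]                              # assumed example coefficients Phi_1, Phi_2
F = companion_matrix(phi)

eig = np.linalg.eigvals(F)
roots = np.roots(np.concatenate(([1.0], -np.asarray(phi))))  # lambda^p - Phi_1 lambda^(p-1) - ... - Phi_p = 0

print("eigenvalues of F:", np.sort_complex(eig))
print("polynomial roots:", np.sort_complex(roots))
print("stable:", bool(np.all(np.abs(eig) < 1)))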


Lag operators

A highly useful operator for time series models is the lag operator. Suppose that we start with a sequence of data xk, where k = −∞, ..., ∞, and generate a new sequence yk where the value of y at date k is equal to the value x took at time k − 1:

yk = xk−1 .

This is described as applying the lag operator to xk and is represented by the symbol L:

Lxk = xk−1 .

Consider applying the lag operator twice to a series:

L(Lxk) = L(xk−1) = xk−2 ,

which is denoted as L², i.e. L²xk = xk−2.


Lag operators

In general, for any integer l,

L^l xk = xk−l .

The lag operator is linear and can be treated like a multiplication operator, thus

L(βxk) = βLxk = βxk−1
L(xk + wk) = Lxk + Lwk
yk = (a + bL)Lxk = (aL + bL²)xk = axk−1 + bxk−2

An expression such as (aL + bL²) is referred to as a polynomial in the lag operator.

Notice that if xk = c is constant, then Lxk = xk−1 = c.


First order difference equation and lag operators

A first-order difference equation given by

yk+1 = Φyk + wk

can be written as

yk+1 = ΦLyk+1 + wk
yk+1 − ΦLyk+1 = wk
yk+1(1 − ΦL) = wk

When we multiply both sides with (1 + ΦL + Φ²L² + Φ³L³ + ... + Φ^kL^k) we get

yk+1(1 + ΦL + Φ²L² + Φ³L³ + ... + Φ^kL^k)(1 − ΦL)
= (1 + ΦL + Φ²L² + Φ³L³ + ... + Φ^kL^k)wk


First order difference equation and lag operators

yk+1(1 − Φ^(k+1)L^(k+1)) = (1 + ΦL + Φ²L² + Φ³L³ + ... + Φ^kL^k)wk

yk+1 = Φ^(k+1)L^(k+1)yk+1 + (1 + ΦL + Φ²L² + Φ³L³ + ... + Φ^kL^k)wk

yk+1 = Φ^(k+1)y0 + wk + Φwk−1 + Φ²wk−2 + Φ³wk−3 + ... + Φ^kw0

which is exactly the same solution as the one obtained by recursive substitution.


P-th order difference equation and lag operators

The p-th order difference equation given by

yk+1 = Φ1yk + Φ2yk−1 + ... + Φpyk−p+1 + wk ,

can be written using the lag operator as:

yk+1(1 − Φ1L − Φ2L² − Φ3L³ − ... − ΦpL^p) = wk ,

which leads to the same polynomial as in the case of the eigenvalues of the matrix F.


P-th order difference equation and lag operators

Since

(1 − Φ1z − Φ2z² − Φ3z³ − ... − Φpz^p) = (1 − λ1z)(1 − λ2z)...(1 − λpz) ,

by defining z^(−1) = λ and multiplying both sides with z^(−p) we get

(λ^p − Φ1λ^(p−1) − Φ2λ^(p−2) − ... − Φp) = (λ − λ1)(λ − λ2)...(λ − λp) ,

which is nothing but the root search for the left-hand-side polynomial:

λ^p − Φ1λ^(p−1) − Φ2λ^(p−2) − ... − Φp−1λ − Φp = 0


Second order Autoregression

A second-order autoregressive process (denoted as AR(2)) has the form:

Yk+1 = c + Φ1Yk + Φ2Yk−1 + εk

Written in lag notation:

Yk+1(1 − Φ1L − Φ2L²) = c + εk

The AR(2) is stable if the roots of (1 − Φ1z − Φ2z²) = 0 lie outside the unit circle. The process has the mean value:

E[Yk+1] = c + Φ1E[Yk] + Φ2E[Yk−1] + E[εk]
µ = c + Φ1µ + Φ2µ
µ = c / (1 − Φ1 − Φ2)


Second order Autoregression

To find the second moments of the AR(2), we write

Yk+1 = µ(1 − Φ1 − Φ2) + Φ1Yk + Φ2Yk−1 + εk
Yk+1 − µ = Φ1(Yk − µ) + Φ2(Yk−1 − µ) + εk .

Multiplying both sides with Yk+1−j − µ and taking the expectation yields:

νj = Φ1νj−1 + Φ2νj−2   j = 1, 2, 3, ...

The autocovariances follow the same second-order difference equation as the AR(2) process itself. Dividing the last equation by ν0 gives the autocorrelation:

ρj = Φ1ρj−1 + Φ2ρj−2   j = 1, 2, 3, ...


Second order Autoregression

For j = 1 we get for the autocorrelation:

ρ1 = Φ1 + Φ2ρ1
ρ1 = Φ1 / (1 − Φ2) ,

since ρ0 = 1 and ρ−1 = ρ1. For j = 2 we get

ρ2 = Φ1ρ1 + Φ2
ρ2 = Φ1² / (1 − Φ2) + Φ2 .

The variance ν0 is computed from

E[(Yk+1 − µ)(Yk+1 − µ)] = Φ1E[(Yk − µ)(Yk+1 − µ)] + Φ2E[(Yk−1 − µ)(Yk+1 − µ)] + E[εk(Yk+1 − µ)] ,


Second order Autoregression

The variance equation can be written as

ν0 = Φ1ν1 + Φ2ν2 + σ² .

The last term follows from noticing that

E[εk(Yk+1 − µ)] = E[εk(Φ1(Yk − µ) + Φ2(Yk−1 − µ) + εk)]
                = Φ1 · 0 + Φ2 · 0 + σ²

The variance equation can thus be written as

ν0 = Φ1ρ1ν0 + Φ2ρ2ν0 + σ² ,

ν0 = (1 − Φ2)σ² / ((1 + Φ2)[(1 − Φ2)² − Φ1²])
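The closed-form AR(2) moments can be compared with sample estimates; the following Python sketch is illustrative, with assumed stable coefficients Φ1 = 0.5, Φ2 = 0.3, σ = 1 and c = 0 (so that µ = 0).

import numpy as np

# Illustrative AR(2) check; Phi1, Phi2, sigma and the sample size are assumed values.
rng = np.random.default_rng(4)
Phi1, Phi2, sigma, n = 0.5, 0.3, 1.0, 300_000

Y = np.zeros(n)
for k in range(2, n):
    Y[k] = Phi1 * Y[k - 1] + Phi2 * Y[k - 2] + sigma * rng.normal()

rho1 = Phi1 / (1 - Phi2)
rho2 = Phi1**2 / (1 - Phi2) + Phi2
nu0 = (1 - Phi2) * sigma**2 / ((1 + Phi2) * ((1 - Phi2)**2 - Phi1**2))

print("rho1:", np.corrcoef(Y[:-1], Y[1:])[0, 1], " theory:", rho1)
print("rho2:", np.corrcoef(Y[:-2], Y[2:])[0, 1], " theory:", rho2)
print("nu0 :", Y.var(), " theory:", nu0)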


P-th order Autoregression

A p-th order autoregressive process (denoted as AR(p)) has the form:

Yk+1 = c + Φ1Yk + Φ2Yk−1 + ... + ΦpYk−p+1 + εk

Written in lag notation:

Yk+1(1 − Φ1L − Φ2L² − ... − ΦpL^p) = c + εk

The AR(p) is stable if the roots of (1 − Φ1z − Φ2z² − ... − Φpz^p) = 0 lie outside the unit circle. The process has the mean value:

E[Yk+1] = c + Φ1E[Yk] + Φ2E[Yk−1] + ... + ΦpE[Yk−p+1] + E[εk]
µ = c + Φ1µ + Φ2µ + ... + Φpµ
µ = c / (1 − Φ1 − Φ2 − ... − Φp)


P-th order Autoregression

The autocovariance is given as

νj = Φ1νj−1 + Φ2νj−2 + ... + Φpνj−p   for j = 1, 2, 3, ...
ν0 = Φ1ν1 + Φ2ν2 + ... + Φpνp + σ²

The equation for the autocorrelation is given as

ρj = Φ1ρj−1 + Φ2ρj−2 + ... + Φpρj−p   j = 1, 2, 3, ...

which is known as the "Yule-Walker" equation. The equation is again a p-th order difference equation.
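Read in the other direction, the Yule-Walker equations form a linear system that recovers Φ1, ..., Φp from the first p autocorrelations. The Python sketch below is an illustrative implementation of this idea (the Toeplitz construction and the AR(2) example values are assumptions).

import numpy as np

def yule_walker(rho):
    # Solve the Yule-Walker equations for (Phi_1, ..., Phi_p)
    # given the autocorrelations rho = (rho_1, ..., rho_p).
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    full = np.concatenate(([1.0], rho))        # (rho_0, rho_1, ..., rho_p) with rho_0 = 1
    R = np.array([[full[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, rho)

# Example with assumed AR(2) coefficients Phi1 = 0.5, Phi2 = 0.3:
rho1 = 0.5 / (1 - 0.3)
rho2 = 0.5 * rho1 + 0.3
print(yule_walker([rho1, rho2]))               # recovers approximately [0.5, 0.3]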



Mixed Autoregressive Moving Average Process (ARMA)

An ARMA(p,q) process includes autoregressive elements as well as moving average elements:

Yk+1 = c + Φ1Yk + Φ2Yk−1 + ... + ΦpYk−p+1 + εk + θ1εk−1 + θ2εk−2 + ... + θqεk−q .

The process is written in lag form as

Yk+1(1 − Φ1L − Φ2L² − ... − ΦpL^p) = c + (1 + θ1L + θ2L² + ... + θqL^q)εk .

Provided that the roots of (1 − Φ1L − Φ2L² − ... − ΦpL^p) = 0 lie outside the unit circle, we obtain the following general form for an ARMA(p,q) process:

Yk+1 = µ + [(1 + θ1L + θ2L² + ... + θqL^q) / (1 − Φ1L − Φ2L² − ... − ΦpL^p)] εk .


Mixed Autoregressive Moving Average Process (ARMA)

The mean is given as

µ = c / (1 − Φ1 − Φ2 − ... − Φp) .

The stationarity of an ARMA process depends only on the autoregressive parameters and not on the moving average parameters. The more convenient form of an ARMA(p,q) process is given as

Yk+1 − µ = Φ1(Yk − µ) + Φ2(Yk−1 − µ) + ... + Φp(Yk−p+1 − µ) + εk + θ1εk−1 + θ2εk−2 + ... + θqεk−q .

The autocovariance equation for j > q is given as

νj = Φ1νj−1 + Φ2νj−2 + ... + Φpνj−p   j = q + 1, q + 2, ...
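This property can be illustrated with a simulation; the Python sketch below uses an assumed ARMA(1,1) specification and checks that the sample autocovariances satisfy νj ≈ Φ1νj−1 for lags j > q = 1.

import numpy as np

# Illustrative ARMA(1,1) check; Phi, theta, sigma and the sample size are assumed values.
rng = np.random.default_rng(5)
Phi, theta, sigma, n = 0.7, 0.4, 1.0, 300_000

eps = rng.normal(0.0, sigma, size=n)
Y = np.zeros(n)
for k in range(1, n):
    Y[k] = Phi * Y[k - 1] + eps[k] + theta * eps[k - 1]

def sample_autocov(x, lag):
    x = x - x.mean()
    return np.mean(x[:-lag] * x[lag:]) if lag > 0 else np.mean(x * x)

nu = [sample_autocov(Y, j) for j in range(4)]
# For j > q = 1 the autocovariances should follow nu_j = Phi * nu_{j-1}
print("nu_2 / nu_1:", nu[2] / nu[1], " Phi:", Phi)
print("nu_3 / nu_2:", nu[3] / nu[2], " Phi:", Phi)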


Mixed Autoregressive Moving Average Process (ARMA)

Thus after q lags, the autocovariance function νj follows a p-th order difference equation governed by the autoregressive part.

Note that the last equation does not hold for j ≤ q, since the correlation between θjεk−j and Yk−j cannot be neglected.

In order to analyze the autocovariance, we introduce the autocovariance generating function

gY(z) = ∑_{j=−∞}^{∞} νj z^j

which is constructed by taking the j-th autocovariance, multiplying it with z to the power of j, and summing over all possible values of j.


Parameter Identification

We start by examining the appropriate log-likelihood function within our framework of discretized SDEs. Let ψ ∈ R^d be the vector of unknown parameters, which belongs to the admissible parameter space Ψ, e.g. a, b, etc. The various system matrices of the discrete models, as well as the variances of the stochastic processes given in the previous section, depend on ψ. The likelihood function of the state space model is given by the joint density of the observational data x = (xN, xN−1, ..., x1)

l(x, ψ) = p(xN, xN−1, ..., x1; ψ) (15)

which reflects how likely it would have been to observe the data if ψ were the true values of the parameters.


Parameter Identification

Using the definition of conditional probability and employing Bayes' theorem recursively, we now write the joint density as a product of conditional densities

l(x, ψ) = p(xN|xN−1, ..., x1; ψ) · ... · p(xk|xk−1, ..., x1; ψ) · ... · p(x1; ψ) (16)

where we approximate the initial density function p(x1; ψ) by p(x1|x0; ψ). Furthermore, since we deal with a Markovian system, future values xl with l > k depend on (xk, xk−1, ..., x1) only through the current value xk. The expression in (16) can therefore be rewritten to depend only on the last observed value, and it reduces to

l(x, ψ) = p(xN|xN−1; ψ) · ... · p(xk|xk−1; ψ) · ... · p(x1; ψ) (17)


Parameter Identification

An Euler-discretized SDE is given as:

xk+1 = xk + f(xk, k)∆t + g(xk, k)√∆t εk (18)

where εk is an i.i.d. random variable with normal distribution, zero expectation and unit variance. Since f and g are deterministic functions, xk+1 is, conditionally on xk, normally distributed with

E[xk+1] = xk + f(xk, k)∆t (19)

Var[xk+1] = g(xk, k)g(xk, k)^T ∆t (20)

p(xk+1) = (2π)^(−n/2) |Var[xk+1]|^(−1/2) exp(−(1/2)(xk+1 − E[xk+1])^T Var[xk+1]^(−1) (xk+1 − E[xk+1])) (21)


Parameter Identification

The parameters of an SDE are estimated by maximizing the log-likelihood. Maximizing the log-likelihood is equivalent to maximizing the likelihood, since the log function is a strictly increasing function. The log-likelihood is given as:

ln(l(x, ψ)) = ln(p(xN|xN−1; ψ) · ... · p(xk|xk−1; ψ) · ... · p(x1; ψ)) (22)

ln(l(x, ψ)) = ∑_{j=1}^{N} ln(p(xj|xj−1; ψ)) (23)

ln(l(x, ψ)) = ∑_{j=1}^{N} { ln((2π)^(−n/2) |Var[xj]|^(−1/2)) − (1/2)(xj − E[xj])^T Var[xj]^(−1) (xj − E[xj]) } (24)

where E[xj] and Var[xj] denote the conditional mean and variance from (19)-(20), evaluated at xj−1.
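As a concrete sketch of how (23)-(24) can be used, the Python code below builds the negative log-likelihood from the Euler-discretized transition densities and minimizes it numerically. It is an illustration only: the scalar SDE dx = −bx dt + σ dW, the parameter values, and the use of scipy.optimize.minimize are assumptions of the example.

import numpy as np
from scipy.optimize import minimize

# Illustrative ML estimation for the Euler-discretized scalar SDE dx = -b*x*dt + sigma*dW.
rng = np.random.default_rng(6)
b_true, sigma_true, dt, N = 2.0, 0.5, 0.01, 10_000

# Simulate data with the Euler scheme x_{k+1} = x_k + f(x_k)*dt + g(x_k)*sqrt(dt)*eps_k
x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = x[k] - b_true * x[k] * dt + sigma_true * np.sqrt(dt) * rng.normal()

def neg_log_likelihood(psi):
    b, sigma = psi
    if sigma <= 0:
        return np.inf
    mean = x[:-1] - b * x[:-1] * dt            # E[x_{k+1} | x_k], cf. (19)
    var = sigma**2 * dt                        # Var[x_{k+1} | x_k], cf. (20)
    resid = x[1:] - mean
    ll = np.sum(-0.5 * np.log(2 * np.pi * var) - 0.5 * resid**2 / var)   # cf. (23)-(24)
    return -ll

result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print("estimated (b, sigma):", result.x)       # close to (2.0, 0.5) for this sample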


Example: Parameter Identification

We want to identify the parameters of an AR(1) process without any constant:

xk+1 = Φxk + σεk (25)

which has the log-likelihood function:

ln(l(x, ψ)) = ∑_{k=1}^{N−1} [ ln((2π)^(−1/2) σ^(−1)) − (xk+1 − Φxk)² / (2σ²) ] . (26)

The function is maximized by setting the derivatives with respect to Φ and σ to zero. The log-likelihood function has the same form as a so-called ordinary least-squares (OLS) estimation.


Example: Parameter Identification

The solution of the maximum likelihood estimation is:

Φ̂ = (∑_{k=1}^{N−1} xk+1 xk) / (∑_{k=1}^{N−1} xk²) (27)

σ̂² = (1/(N−1)) ∑_{k=1}^{N−1} (xk+1 − Φ̂xk)² (28)

The solution for Φ̂ is the same as for an OLS regression. Furthermore, it is exactly the definition of the autocorrelation at lag one. The estimate σ̂ is the empirical standard deviation of the estimated white noise, i.e.

ε̂k = xk+1 − Φ̂xk
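The estimators (27)-(28) are straightforward to evaluate on data; the Python sketch below is illustrative, applying them to a simulated AR(1) series with assumed true values Φ = 0.9 and σ = 0.3.

import numpy as np

# Illustrative check of the ML/OLS estimators for x_{k+1} = Phi*x_k + sigma*eps_k.
rng = np.random.default_rng(7)
Phi_true, sigma_true, N = 0.9, 0.3, 50_000

x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = Phi_true * x[k] + sigma_true * rng.normal()

# Equation (27): Phi_hat = sum x_{k+1} x_k / sum x_k^2
Phi_hat = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

# Estimated white noise eps_hat_k = x_{k+1} - Phi_hat * x_k and its empirical standard deviation, cf. (28)
eps_hat = x[1:] - Phi_hat * x[:-1]
sigma_hat = np.sqrt(np.mean(eps_hat ** 2))

print("Phi_hat  :", Phi_hat,   " true:", Phi_true)
print("sigma_hat:", sigma_hat, " true:", sigma_true)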
