An Introduction to Stochastic Calculus
Haijun Li
[email protected]
Department of Mathematics and Statistics
Washington State University
Lisbon, May 2018
Haijun Li An Introduction to Stochastic Calculus Lisbon, May 2018 1 / 169
Outline
Basic Concepts from Probability Theory
Random Vectors
Stochastic Processes
Notations
Sample or outcome space Ω := all possible outcomes ω of the underlying experiment.
σ-field or σ-algebra F: a non-empty class of subsets (observable events) of Ω closed under countable unions, countable intersections and complements.
Probability measure P(·) on F: P(A) denotes the probability of event A.
Random variable X : Ω → R is a real-valued measurable function defined on Ω. That is, the events X^{-1}((a,b)) ∈ F are observable for all a, b ∈ R.
Induced probability measure: P_X(B) := P(X ∈ B) = P(ω : X(ω) ∈ B), for any Borel set B ⊆ R.
Distribution function: F_X(x) := P(X ≤ x), x ∈ R.
Continuous and Discrete Random Variables
Random variable X is said to be continuous if the distribution function F_X has no jumps, that is,

lim_{h→0} F_X(x + h) = F_X(x), ∀x ∈ R.

Most continuous distributions of interest have a density f_X ≥ 0:

F_X(x) = ∫_{−∞}^{x} f_X(y) dy, x ∈ R,

where ∫_{−∞}^{∞} f_X(y) dy = 1.
Random variable X is said to be discrete if the distribution function F_X is a pure jump function:

F_X(x) = Σ_{k : x_k ≤ x} p_k, x ∈ R,

where the probability mass function p_k satisfies 0 ≤ p_k ≤ 1 and Σ_{k=1}^{∞} p_k = 1.
Expectation, Variance and Moments
A General Formula
For a real-valued function g, the expectation of g(X) is given by Eg(X) = ∫ g(x) dF_X(x).

The k-th moment of X is given by E(X^k) = ∫ x^k dF_X(x). The mean μ_X (or "center of gravity") of X is the first moment.
The variance (a measure of "spread") of X is defined as σ²_X = var(X) := E(X − μ_X)². Clearly σ²_X = E(X²) − μ²_X.
If the variance exists, then the Chebyshev inequality holds:

P(|X − μ_X| > kσ_X) ≤ k^{−2}, k > 0.

That is, the probability of the tail regions more than k standard deviations away from the mean is bounded by 1/k².
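The bound is easy to verify by simulation. A minimal sketch in Python/NumPy (the normal distribution, its parameters, and the seed are arbitrary illustrations, not from the slides; any distribution with a finite variance works):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=100_000)  # sample with known mean and std dev

for k in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(x - mu) > k * sigma)  # empirical P(|X - mu| > k*sigma)
    print(k, tail, k**-2)  # empirical tail vs. Chebyshev bound 1/k^2
```

For the normal distribution the bound is quite loose (e.g. the true 2σ tail probability is about 0.046 versus the bound 0.25); Chebyshev trades sharpness for complete generality.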
Random VectorsLet (Ω,F ,P) be a probability space.
X = (X_1, …, X_d) : Ω → R^d denotes a d-dimensional random vector, where its components X_1, …, X_d are real-valued random variables.
The induced probability measure: P_X(B) = P(X ∈ B) := P(ω : X(ω) ∈ B) for all Borel subsets B of R^d.
The distribution function: F_X(x) := P(X_1 ≤ x_1, …, X_d ≤ x_d), x = (x_1, …, x_d) ∈ R^d.
If X has a density f_X ≥ 0, then

F_X(x) = ∫_{−∞}^{x_1} ⋯ ∫_{−∞}^{x_d} f_X(y) dy

with ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f_X(y) dy = 1.
For any J ⊆ {1, …, d}, let X_J := (X_j, j ∈ J) be the J-margin of X. The marginal density of X_J is given by

f_{X_J}(x_J) = ∫ f_X(x) dx_{J^c}.
Expectation, Variance, and Covariance
The expectation or mean value of X is denoted by μ_X = EX := (E(X_1), …, E(X_d)).
The covariance matrix of X is defined as

Σ_X := (cov(X_i, X_j); i, j = 1, …, d),

where the covariance of X_i and X_j is defined as

cov(X_i, X_j) := E[(X_i − μ_{X_i})(X_j − μ_{X_j})] = E(X_i X_j) − μ_{X_i} μ_{X_j}.

The correlation of X_i and X_j is denoted by

corr(X_i, X_j) := cov(X_i, X_j) / (σ_{X_i} σ_{X_j}).

It follows from the Cauchy-Schwarz inequality that −1 ≤ corr(X_i, X_j) ≤ 1.
Independence and Dependence
The events A_1, …, A_n are independent if for any 1 ≤ i_1 < i_2 < ⋯ < i_k ≤ n,

P(∩_{j=1}^{k} A_{i_j}) = ∏_{j=1}^{k} P(A_{i_j}).

The random variables X_1, …, X_n are independent if for any Borel sets B_1, …, B_n, the events {X_1 ∈ B_1}, …, {X_n ∈ B_n} are independent.
The random variables X_1, …, X_n are independent if and only if F_{X_1,…,X_n}(x_1, …, x_n) = ∏_{i=1}^{n} F_{X_i}(x_i) for all (x_1, …, x_n) ∈ R^n.
The random variables X_1, …, X_n are independent if and only if E[∏_{i=1}^{n} g_i(X_i)] = ∏_{i=1}^{n} E g_i(X_i) for any bounded measurable real-valued functions g_1, …, g_n.
In the continuous case, the random variables X_1, …, X_n are independent if and only if f_{X_1,…,X_n}(x_1, …, x_n) = ∏_{i=1}^{n} f_{X_i}(x_i) for all (x_1, …, x_n) ∈ R^n.
Two Examples
Let X = (X_1, …, X_d) have a d-dimensional Gaussian distribution. The random variables X_1, …, X_d are independent if and only if corr(X_i, X_j) = 0 for i ≠ j.
For non-Gaussian random vectors, however, independence and uncorrelatedness are not equivalent. Let X be a standard normal random variable. Since both X and X³ have expectation zero, X and X² are uncorrelated:

cov(X, X²) = E(X³) − EX E(X²) = 0.

But X and X² are clearly dependent. Since {X ∈ [−1,1]} = {X² ∈ [0,1]}, we obtain

P(X ∈ [−1,1], X² ∈ [0,1]) = P(X ∈ [−1,1])
> [P(X ∈ [−1,1])]² = P(X ∈ [−1,1]) P(X² ∈ [0,1]).
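Both facts can be confirmed numerically. A Monte Carlo sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(1_000_000)

# uncorrelated: cov(X, X^2) = E(X^3) - E(X) E(X^2) ~ 0
cov = np.mean(x**3) - x.mean() * np.mean(x**2)

# dependent: the joint probability exceeds the product of the marginals
p = np.mean(np.abs(x) <= 1.0)                       # P(X in [-1,1]) ~ 0.683
joint = np.mean((np.abs(x) <= 1.0) & (x**2 <= 1.0))  # equals p, since the events coincide
print(cov, joint, p * p)                             # cov ~ 0, joint ~ 0.683 > 0.466
```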
Autocorrelations
For a time series X_0, X_1, X_2, …, the autocorrelation at lag h is defined by corr(X_0, X_h), h = 0, 1, ….
Log-returns: X_t := log(S_t / S_{t−1}), where S_t is the price of a speculative asset (equities, indexes, exchange rates, commodities) at the end of the t-th period. If the relative returns are small, then X_t ≈ (S_t − S_{t−1}) / S_{t−1}. Note that the log-returns are scale-free, additive, stationary, ….
Stylized Fact #1: Log-returns X_t are not iid (independent and identically distributed), although they show little serial autocorrelation.
Stylized Fact #2: Series of absolute returns |X_t| or squared returns X_t² show profound serial autocorrelation (long-range dependence).
Stochastic Processes
A stochastic process X := (X_t, t ∈ T) is a collection of random variables defined on some space Ω, where T ⊆ R.
If the index set T is a finite or countably infinite set, X is said to be a discrete-time process. If T is an interval, then X is a continuous-time process.
A stochastic process X is a (measurable) function of two variables: time t and sample point ω.
For fixed time t, X_t = X_t(ω), ω ∈ Ω, is a random variable.
For fixed sample point ω, X_t = X_t(ω), t ∈ T, is a sample path.
Example: An autoregressive process of order 1 is given by

X_t = φ X_{t−1} + Z_t, t ∈ Z,

where φ is a real parameter and (Z_t) is a noise sequence. Time series models can be understood as discretizations of stochastic differential equations.
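The AR(1) recursion is one line of code. A sketch assuming iid N(0,1) noise Z_t and |φ| < 1 (these are illustrative choices, not stated on the slide), in which case the stationary variance is 1/(1 − φ²) and the lag-1 autocorrelation is φ:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, n = 0.8, 100_000
z = rng.standard_normal(n)

x = np.empty(n)
x[0] = z[0] / np.sqrt(1.0 - phi**2)   # start in the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + z[t]

var_emp = x.var()                               # ~ 1 / (1 - phi^2)
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]         # lag-1 autocorrelation ~ phi
print(var_emp, acf1)
```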
Finite-Dimensional Distributions
All possible values of a stochastic process X = (X_t, t ∈ T) constitute a function space of all sample paths (X_t(ω), t ∈ T), ω ∈ Ω.
Specifying the distribution of X on this function space is equivalent to specifying which information is available in terms of the observable events from the σ-field generated by X.
The distribution of X can be described by the distributions of the finite-dimensional vectors

(X_{t_1}, …, X_{t_n}), for all possible choices of times t_1, …, t_n ∈ T.

Example: A stochastic process is called Gaussian if all its finite-dimensional distributions are multivariate Gaussian. The distribution of such a process is determined by the collection of mean vectors and covariance matrices.
Expectation and Covariance Functions
The expectation function of a process X = (X_t, t ∈ T) is defined as

μ_X(t) := μ_{X_t} = EX_t, t ∈ T.

The covariance function of X is given by

C_X(t, s) := cov(X_t, X_s) = E[(X_t − EX_t)(X_s − EX_s)], t, s ∈ T.

In particular, the variance function of X is given by σ²_X(t) = C_X(t, t) = var(X_t), t ∈ T.
Example: A Gaussian white noise X = (X_t, 0 ≤ t ≤ 1) consists of iid N(0,1) random variables. In this case its finite-dimensional distributions are given by, for any 0 ≤ t_1 ≤ ⋯ ≤ t_n ≤ 1,

P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = ∏_{i=1}^{n} P(X_{t_i} ≤ x_i) = ∏_{i=1}^{n} Φ(x_i), ∀x ∈ R^n.

Its expectation and covariance functions are given by μ_X(t) = 0 and

C_X(t, s) = 1 if t = s, and C_X(t, s) = 0 if t ≠ s.
Dependence Structure
A process X = (X_t, t ∈ T) is said to be strictly stationary if for any t_1, …, t_n ∈ T and any h with t_i + h ∈ T,

(X_{t_1}, …, X_{t_n}) =_d (X_{t_1+h}, …, X_{t_n+h}).

That is, its finite-dimensional distribution functions are invariant under time shifts.
A process X = (X_t, t ∈ T) is said to have stationary increments if

X_t − X_s =_d X_{t+h} − X_{s+h}, ∀ t, s, t+h, s+h ∈ T.

A process X = (X_t, t ∈ T) is said to have independent increments if for all t_1 < ⋯ < t_n in T,

X_{t_2} − X_{t_1}, …, X_{t_n} − X_{t_{n−1}} are independent.
Strictly Stationary vs Stationary

A process X is said to be stationary (in the wide sense) if

μ_X(t + h) = μ_X(t) and C_X(t, s) = C_X(t + h, s + h).

If second moments exist, then strict stationarity implies stationarity.
Example: Consider a strictly stationary Gaussian process X. The distribution of X is determined by μ_X(0) and C_X(t, s) = g_X(|t − s|) for some function g_X. In particular, for Gaussian white noise X, g_X(0) = 1 and g_X(x) = 0 for any x ≠ 0.
Homogeneous Poisson Process
A stochastic process X = (X_t, t ≥ 0) is called a Poisson process with intensity rate λ > 0 if
X_0 = 0,
it has stationary, independent increments, and
for every t > 0, X_t has a Poisson distribution Poi(λt).

Simulation of Poisson Processes
Simulate iid exponential Exp(λ) random variables Y_1, Y_2, …, and set T_n := Σ_{i=1}^{n} Y_i. The Poisson process can be constructed by

X_t := #{n : T_n ≤ t}, t ≥ 0.

Example: Claims arriving in an insurance portfolio.
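The recipe above translates directly into code. A sketch (rate λ = 2 and the time horizon are arbitrary choices; the over-allocation of inter-arrival times is a pragmatic shortcut):

```python
import numpy as np

def poisson_path(lam, t_grid, rng):
    """Construct X_t = #{n : T_n <= t}, with T_n the partial sums of iid Exp(lam) gaps."""
    horizon = t_grid[-1]
    # draw comfortably more exponential inter-arrival times than are needed on [0, horizon]
    gaps = rng.exponential(1.0 / lam, size=int(10 * lam * horizon) + 100)
    arrivals = np.cumsum(gaps)                              # T_1, T_2, ...
    return np.searchsorted(arrivals, t_grid, side="right")  # counts the T_n <= t

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 101)
x = poisson_path(2.0, t, rng)   # one sample path; E X_10 = lam * 10 = 20
```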
Outline
Brownian Motion
Simulation of Brownian Sample Paths
Definition
A stochastic process B = (B_t, t ∈ [0,∞)) is called a (standard) Brownian motion or a Wiener process if
B_0 = 0,
it has stationary, independent increments,
for every t > 0, B_t has a normal N(0, t) distribution, and
it has continuous sample paths.

Historical Note: Brownian motion is named after the botanist Robert Brown, who first observed, in the 1820s, the irregular motion of pollen grains immersed in water. By the end of the nineteenth century, the phenomenon was understood by means of kinetic theory as a result of molecular bombardment. In 1900, L. Bachelier employed it to model the stock market, where the analogue of molecular bombardment is the interplay of the myriad of individual market decisions that determine the market price. Norbert Wiener (1923) was the first to put Brownian motion on a firm mathematical basis.
Distributional Properties of Brownian Motion
For any t > s, B_t − B_s =_d B_{t−s} − B_0 = B_{t−s} has an N(0, t − s) distribution. That is, the larger the interval, the larger the fluctuations of B on this interval.
μ_B(t) = EB_t = 0 and, for any t > s,

C_B(t, s) = E[((B_t − B_s) + B_s) B_s] = E(B_t − B_s) EB_s + s = min(s, t).
Brownian motion is a Gaussian process: its finite-dimensionaldistributions are multivariate Gaussian.
Question: How irregular are Brownian sample paths?
Self-Similarity
A stochastic process X = (X_t, t ≥ 0) is H-self-similar for some H > 0 if it satisfies the condition

(T^H X_{t_1}, …, T^H X_{t_n}) =_d (X_{T t_1}, …, X_{T t_n})

for every T > 0 and any choice of t_i ≥ 0, i = 1, …, n.
Self-similarity means that the properly scaled patterns of a sample path in any small or large time interval have a similar shape.

Non-Differentiability of Self-Similar Processes
For any H-self-similar process X with stationary increments and 0 < H < 1,

lim sup_{t↓t_0} |X_t − X_{t_0}| / (t − t_0) = ∞ at any fixed t_0.

That is, sample paths of H-self-similar processes are nowhere differentiable with probability 1.
Path Properties of Brownian Motion
Brownian motion is 0.5-self-similar.
Its sample paths are nowhere differentiable. That is, any sample path changes its shape in the neighborhood of any time epoch in a completely non-predictable fashion (Wiener, Paley and Zygmund, 1930s).

Unbounded Variation of Brownian Sample Paths

sup_τ Σ_{i=1}^{n} |B_{t_i}(ω) − B_{t_{i−1}}(ω)| = ∞, a.s.,

where the supremum is taken over all possible partitions τ: 0 = t_0 < ⋯ < t_n = T of any finite interval [0, T].

The unbounded variation and non-differentiability of Brownian sample paths are major reasons for the failure of classical integration methods, when applied to these paths, and for the introduction of stochastic calculus.
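The unbounded variation is easy to see numerically: over a partition of [0,1] into n equal steps, the sum of absolute Brownian increments has expectation sqrt(2n/π), which diverges as the mesh shrinks. A sketch (grid sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    # Brownian increments over [0,1]: iid N(0, 1/n)
    dB = rng.normal(0.0, np.sqrt(1.0 / n), size=n)
    tv = np.abs(dB).sum()   # ~ sqrt(2*n/pi): grows without bound as n increases
    print(n, tv)
```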
Brownian Bridge
Let B = (B_t, t ∈ [0,∞)) denote Brownian motion.
The process X := (B_t − tB_1, 0 ≤ t ≤ 1) satisfies X_0 = X_1 = 0. This process is called the (standard) Brownian bridge.
Since multivariate normal distributions are closed under linear transforms, the finite-dimensional distributions of X are Gaussian.
The Brownian bridge is characterized by the two functions μ_X(t) = 0 and C_X(t, s) = min(t, s) − ts, for all s, t ∈ [0,1].
The Brownian bridge appears as the limit process of the normalized empirical distribution function of a sample of iid uniform U(0,1) random variables.
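A quick Monte Carlo check of the covariance function C_X(t, s) = min(t, s) − ts (grid size, number of paths, and the test points s = 0.25, t = 0.75 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, paths = 100, 20_000
dt = 1.0 / n
# Brownian motion sampled at t = 1/n, ..., 1 via cumulative sums of N(0, dt) increments
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(paths, n)), axis=1)
t_grid = np.arange(1, n + 1) * dt
X = B - t_grid * B[:, -1:]          # bridge X_t = B_t - t * B_1; X at t = 1 is exactly 0

s_i, t_i = 24, 74                   # s = 0.25, t = 0.75
cov_emp = np.mean(X[:, s_i] * X[:, t_i])   # mean is 0, so this estimates the covariance
cov_th = min(0.25, 0.75) - 0.25 * 0.75     # = 0.0625
print(cov_emp, cov_th)
```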
Brownian Motion with Drift
Let B = (B_t, t ∈ [0,∞)) denote Brownian motion.
The process X := (μt + σB_t, t ≥ 0), for constants σ > 0 and μ ∈ R, is called Brownian motion with (linear) drift.
X is a Gaussian process with expectation and covariance functions

μ_X(t) = μt, C_X(t, s) = σ² min(t, s), s, t ≥ 0.
Geometric Brownian Motion (Black, Scholes and Merton 1973)

The process X = (exp(μt + σB_t), t ≥ 0), for constants σ > 0 and μ ∈ R, is called geometric Brownian motion.
Since E e^{tZ} = e^{t²/2} for an N(0,1) random variable Z, it follows from the self-similarity of Brownian motion that

μ_X(t) = e^{μt} E e^{σB_t} = e^{μt} E e^{σ t^{1/2} B_1} = e^{(μ + 0.5σ²)t}.

Since B_t − B_s and B_s are independent for any s ≤ t, and B_t − B_s =_d B_{t−s}, then

C_X(t, s) = e^{(μ + 0.5σ²)(t+s)} (e^{σ² min(s,t)} − 1).

In particular, σ²_X(t) = e^{(2μ + σ²)t} (e^{σ²t} − 1).
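Both moment formulas can be verified by simulating X_t at a single time point (the parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t = 0.1, 0.3, 1.0
B_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)   # B_t ~ N(0, t)
X_t = np.exp(mu * t + sigma * B_t)                  # geometric Brownian motion at time t

mean_th = np.exp((mu + 0.5 * sigma**2) * t)                            # mu_X(t)
var_th = np.exp((2 * mu + sigma**2) * t) * (np.exp(sigma**2 * t) - 1)  # sigma^2_X(t)
print(X_t.mean(), mean_th, X_t.var(), var_th)
```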
Central Limit Theorem

Consider a sequence Y_1, Y_2, … of iid non-degenerate random variables with mean μ_Y = EY_1 and variance σ²_Y = var(Y_1) > 0. Define the partial sums: R_0 := 0, R_n := Σ_{i=1}^{n} Y_i, n ≥ 1.

Central Limit Theorem (CLT)
If Y_1 has finite variance, then the sequence (R_n) obeys the CLT via the following uniform convergence:

sup_{x∈R} |P((R_n − ER_n) / [var(R_n)]^{1/2} ≤ x) − Φ(x)| → 0, as n → ∞,

where Φ(x) denotes the distribution function of the standard normal distribution.

That is, for large sample size n, the distribution of (R_n − ER_n)/[var(R_n)]^{1/2} is approximately standard normal.
Functional Approximation
Let (Y_i) be a sequence of iid random variables with mean μ_Y = EY_1 and variance σ²_Y = var(Y_1) > 0. Consider the process S_n = (S_n(t), t ∈ [0,1]) with continuous sample paths on [0,1]:

S_n(t) = (σ²_Y n)^{−1/2} (R_i − μ_Y i) if t = i/n, i = 0, …, n, and linearly interpolated otherwise.

Example: If the Y_i are iid N(0,1), consider the restriction of the process S_n to the points i/n: S_n(i/n) = n^{−1/2} Σ_{k=1}^{i} Y_k, i = 0, …, n.
S_n(0) = 0.
S_n has independent increments: for any 0 ≤ i_1 ≤ ⋯ ≤ i_m ≤ n,

S_n(i_2/n) − S_n(i_1/n), …, S_n(i_m/n) − S_n(i_{m−1}/n) are independent.

For any 0 ≤ i ≤ n, S_n(i/n) has a normal N(0, i/n) distribution.
S_n and Brownian motion B on [0,1], when restricted to the points i/n, have very much the same properties.
Functional Central Limit Theorem
Let C[0,1] denote the space of all continuous functions defined on [0,1]. With the maximum norm, C[0,1] is a complete separable space.

Donsker's Theorem
If Y_1 has finite variance, then the process S_n obeys the functional CLT:

Eφ(S_n) → Eφ(B), as n → ∞,

for all bounded continuous functionals φ : C[0,1] → R, where B = (B_t, t ∈ [0,1]) is Brownian motion on [0,1].

The finite-dimensional distributions of S_n converge to the corresponding finite-dimensional distributions of B: as n → ∞,

P(S_n(t_1) ≤ x_1, …, S_n(t_m) ≤ x_m) → P(B_{t_1} ≤ x_1, …, B_{t_m} ≤ x_m),

for all possible t_i ∈ [0,1] and x_i ∈ R.
The max functional max_{0≤i≤n} S_n(i/n) converges in distribution to max_{0≤t≤1} B_t as n → ∞.
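The convergence of the max functional can be checked against the known law of the limit, P(max_{0≤t≤1} B_t ≤ x) = 2Φ(x) − 1 for x ≥ 0 (a consequence of the reflection principle, a standard fact not derived on these slides). A Monte Carlo sketch with iid N(0,1) steps:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, paths = 400, 20_000
steps = rng.standard_normal((paths, n)) / math.sqrt(n)   # increments of S_n
S = np.cumsum(steps, axis=1)
m = np.maximum(S.max(axis=1), 0.0)                       # max over the grid, incl. S_n(0) = 0

x = 1.0
p_emp = np.mean(m <= x)
p_th = 2.0 * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))) - 1.0   # 2*Phi(1) - 1 ~ 0.683
print(p_emp, p_th)
```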
Functional CLT for Jump Processes
Stochastic processes are infinite-dimensional objects, and therefore unexpected events may happen. For example, the sample paths of the converging processes may fluctuate very wildly with increasing n. In order to rule out such irregular behavior, a so-called tightness or stochastic compactness condition must be satisfied.
The functional CLT remains valid for the jump process S̃_n = (S̃_n(t), t ∈ [0,1]), where

S̃_n(t) = (σ²_Y n)^{−1/2} (R_{[nt]} − μ_Y [nt])

and [nt] denotes the integer part of the real number nt.
In contrast to S_n, the process S̃_n is constant on the intervals [(i−1)/n, i/n) and has jumps at the points i/n.
S_n and S̃_n coincide at the points i/n, and the differences between these two processes are asymptotically negligible: the normalization n^{1/2} makes the jumps of S̃_n arbitrarily small for large n.
Simulating a Brownian Sample Path
Plot the paths of the process S_n, or of its jump-process variant S̃_n, for sufficiently large n, and get a reasonable approximation to Brownian sample paths.
Since Brownian motion appears as a distributional limit, completely different graphs may appear for different values of n, even for the same sequence of realizations Y_i(ω).

Simulating a Brownian Sample Path on [0,T]

Simulate one path of S_n, or of S̃_n, on [0,1], then scale the time interval by the factor T and the sample path by the factor T^{1/2}.
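A sketch of this recipe, taking iid N(0,1) steps for the Y_i (so σ_Y = 1; plotting is omitted):

```python
import numpy as np

def brownian_path(n, T, rng):
    """Approximate one Brownian path on [0, T]: simulate S_n at the points i/n
    on [0, 1], then scale time by T and the path values by sqrt(T)."""
    steps = rng.standard_normal(n) / np.sqrt(n)      # increments of S_n
    s = np.concatenate(([0.0], np.cumsum(steps)))    # S_n(i/n), i = 0, ..., n
    t = np.linspace(0.0, T, n + 1)
    return t, np.sqrt(T) * s

rng = np.random.default_rng(0)
t, b = brownian_path(1_000, T=4.0, rng=rng)          # b[-1] is exactly N(0, T) here
```

The 0.5-self-similarity of Brownian motion is what justifies the sqrt(T) space scaling.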
Lévy-Ciesielski Representation
Since Brownian sample paths are continuous functions, we can try to expand them in a series.
However, the paths are random functions: for different ω, we obtain different path functions. This means that the coefficients of the series are random variables.
Since the process is Gaussian, the coefficients must be Gaussian as well.

Lévy-Ciesielski Expansion
Brownian motion on [0,1] can be represented in the form

B_t(ω) = Σ_{n=1}^{∞} Z_n(ω) ∫_0^t φ_n(x) dx, t ∈ [0,1],

where the Z_n are iid N(0,1) random variables and (φ_n) is a complete orthonormal function system on [0,1].
Paley-Wiener Representation
There are infinitely many possible representations of Brownian motion.

Let (Z_n, n ≥ 0) be a sequence of iid N(0,1) random variables; then

B_t(ω) = Z_0(ω) t / (2π)^{1/2} + (2/π^{1/2}) Σ_{n=1}^{∞} Z_n(ω) sin(nt/2) / n, t ∈ [0,2π].

This series converges for every t, and uniformly for t ∈ [0,2π].

Simulating a Brownian Path via the Paley-Wiener Expansion
Calculate

Z_0(ω) t_j / (2π)^{1/2} + (2/π^{1/2}) Σ_{n=1}^{M} Z_n(ω) sin(n t_j / 2) / n, t_j = 2πj/N, for 0 ≤ j ≤ N.

The problem of choosing the "right" values for M and N is similar to the choice of the sample size n in the functional CLT.
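A sketch of the truncated expansion (the values of M and N are arbitrary illustrations; as noted above, there is no canonical choice):

```python
import numpy as np

def paley_wiener_path(M, N, rng):
    """Truncated Paley-Wiener series for Brownian motion on [0, 2*pi],
    evaluated on the grid t_j = 2*pi*j/N, 0 <= j <= N."""
    z = rng.standard_normal(M + 1)              # Z_0, Z_1, ..., Z_M
    t = 2.0 * np.pi * np.arange(N + 1) / N
    n = np.arange(1, M + 1)
    series = (np.sin(np.outer(t, n) / 2.0) / n) @ z[1:]
    return t, z[0] * t / np.sqrt(2.0 * np.pi) + (2.0 / np.sqrt(np.pi)) * series

rng = np.random.default_rng(0)
t, b = paley_wiener_path(M=200, N=100, rng=rng)  # b[0] = 0, and Var(B_t) ~ t
```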
Outline
Conditional Expectation: An Illustration
General Conditional Expectation
Discrete Conditioning
Let X be a random variable defined on a probability space (Ω, F, P), and B ⊂ Ω with P(B) > 0.

The conditional distribution function of X given B is defined as

F_X(x|B) := P(X ≤ x | B) = P({X ≤ x} ∩ B) / P(B).

The conditional expectation of X given B is given by

E(X|B) = ∫ x dF_X(x|B) = E(X I_B) / P(B),

where I_B(ω) is the indicator function of the event B.
E(X|B) can be viewed as our estimate of X given the information that the event B has occurred.
E(X|B^c) is defined similarly. Together, E(X|B) and E(X|B^c) provide our estimate of X depending on whether or not B occurs.
Conditional Expectation Under Discrete Conditioning
Think of I_B as a random variable that carries the information on whether the event B occurs. The conditional expectation E(X|I_B) of X given I_B is a random variable defined as

E(X|I_B)(ω) = E(X|B) if ω ∈ B, and E(X|I_B)(ω) = E(X|B^c) if ω ∉ B.

The random variable E(X|I_B) is our estimate of X based on the information provided by I_B.
Consider a discrete random variable Y on Ω taking distinct values y_i, i = 1, 2, …. Let A_i = {ω ∈ Ω : Y(ω) = y_i}. Note that Y carries the information on whether or not the events A_i occur.
Define the conditional expectation of X given Y:

E(X|Y)(ω) := E(X|A_i) = E(X|Y = y_i), if ω ∈ A_i, i = 1, 2, ….

The random variable E(X|Y) can be viewed as our estimate of X based on the information carried by Y.
Example: Uniform Random Variable
Consider the random variable X(ω) = ω on Ω = (0,1], endowed with the probability measure P((a,b]) := b − a for any (a,b] ⊂ (0,1].

Assume that one of the events

A_i = ((i−1)/n, i/n], i = 1, …, n,

occurred. Then

E(X|A_i) = (1/P(A_i)) ∫_{A_i} x f_X(x) dx = (1/2) · (2i−1)/n

(i.e., the center of A_i).
The value E(X|A_i) is the updated expectation on the new space A_i, given the information that A_i occurred.
Define Y(ω) := Σ_{i=1}^{n} ((i−1)/n) I_{A_i}(ω). The conditional expectation E(X|Y)(ω) = (1/2) · (2i−1)/n if ω ∈ A_i, i = 1, …, n.
Since E(X|Y)(ω) is the average of X given the information that ω ∈ ((i−1)/n, i/n], E(X|Y) is a coarser version of X, that is, an approximation to X, given the information on which of the A_i occurred.
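The interval centers can be recovered by simulation: average the samples of X falling in each A_i. A sketch (n = 4 and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
x = rng.uniform(0.0, 1.0, size=200_000)          # X(omega) = omega on (0, 1]
i = np.clip(np.ceil(x * n).astype(int), 1, n)    # index of the interval A_i containing X

# E(X | A_i): the average of X over the samples that fall in A_i
cond = np.array([x[i == k].mean() for k in range(1, n + 1)])
centers = (2 * np.arange(1, n + 1) - 1) / (2 * n)  # (2i - 1)/(2n), the centers of the A_i
print(cond, centers)
```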
Properties of Conditional Expectation

The conditional expectation is linear: for random variables X_1, X_2 and constants c_1, c_2,

E(c_1X_1 + c_2X_2 | Y) = c_1E(X_1|Y) + c_2E(X_2|Y).

EX = E[E(X|Y)].
If X and Y are independent, then E(X|Y) = EX.
The random variable E(X|Y) is a (measurable) function of Y: E(X|Y) = g(Y), where

g(y) = Σ_{i=1}^{∞} E(X|Y = y_i) I_{{y_i}}(y).
σ-Fields

Observe that the values of Y did not really matter for the definition of E(X|Y) under discrete conditioning, but it was crucial that the conditioning events A_i describe the information carried by all the distinct values of Y.
That is, we estimate the random variable X via E(X|Y) based on the information provided by the observable events A_i and their composite events, such as A_i ∪ A_j and A_i ∩ A_j, ….

Definition of σ-Fields
A σ-field F on Ω is a collection of subsets (observable events) of Ω satisfying the following conditions:
∅ ∈ F and Ω ∈ F.
If A ∈ F, then A^c ∈ F.
If A_1, A_2, … ∈ F, then ∪_{i=1}^{∞} A_i ∈ F and ∩_{i=1}^{∞} A_i ∈ F.
Generated σ-Fields

For any collection C of events, let σ(C) denote the smallest σ-field containing C, obtained by adding all possible unions, intersections and complements. σ(C) is said to be generated by C.

The following are some examples.
F = {∅, Ω} = σ({∅}).
F = {∅, Ω, A, A^c} = σ({A}).
F = {A : A ⊆ Ω} = σ({A : A ⊆ Ω}).
Let C = {(a,b] : −∞ < a < b < ∞}; then any set in B^1 = σ(C) is called a Borel subset of R.
Let C = {(a, b] : −∞ < a_i < b_i < ∞, i = 1, …, d}; then any set in B^d = σ(C) is called a Borel subset of R^d.
σ-Fields Generated by Random Variables
Let Y be a discrete random variable taking distinct values y_i, i = 1, 2, …. Define

A_i = {ω : Y(ω) = y_i}, i = 1, 2, ….

A typical set in the σ-field σ({A_i}) is of the form

A = ∪_{i∈I} A_i, I ⊆ {1, 2, …}.

σ({A_i}) is called the σ-field generated by Y, and is denoted by σ(Y).
Let Y be a d-dimensional random vector and

A(a, b] = {ω : Y(ω) ∈ (a, b]}, −∞ < a_i < b_i < ∞, i = 1, …, d.

The σ-field σ({A(a, b] : a, b ∈ R^d}) is called the σ-field generated by Y, and is denoted by σ(Y).
σ(Y) provides the essential information about the structure of Y, and contains all the observable events {ω : Y(ω) ∈ C}, where C is a Borel subset of R^d.
σ-Fields Generated by Stochastic Processes
For a stochastic process Y = (Y_t, t ∈ T) and any (measurable) set C of functions on T, let

A(C) = {ω : the sample path (Y_t(ω), t ∈ T) belongs to C}.

The σ-field generated by the process Y is the smallest σ-field that contains all the events of the form A(C).
Example: For Brownian motion B = (B_t, t ≥ 0), let

F_t := σ(B_s, s ≤ t)

denote the σ-field generated by Brownian motion prior to time t. F_t contains the essential information about the structure of the process B on [0, t]. One can show that this σ-field is generated by all sets of the form

A_{t_1,…,t_n}(C) = {ω : (B_{t_1}(ω), …, B_{t_n}(ω)) ∈ C}

for all n-dimensional Borel sets C and times t_1, …, t_n ≤ t.
Information Represented by σ-Fields
For a random variable, a random vector or a stochastic process Y on Ω, the σ-field σ(Y) generated by Y contains the essential information about the structure of Y as a function of ω ∈ Ω. It consists of all subsets {ω : Y(ω) ∈ C} for suitable sets C.
Because Y generates a σ-field, we also say that Y contains the information represented by σ(Y), or that Y carries the information σ(Y).
For any measurable function f acting on Y, since

{ω : f(Y(ω)) ∈ C} = {ω : Y(ω) ∈ f^{−1}(C)}, ∀ measurable sets C,

we have σ(f(Y)) ⊆ σ(Y). That is, a function f acting on Y does not provide new information about the structure of Y.
Example: For Brownian motion B = (B_t, t ≥ 0), consider the function f(B) = sup_{0≤t≤1} B_t. Then σ(f(B)) ⊂ σ(B_s, s ≤ t) for any t ≥ 1.
The General Conditional Expectation
Let (Ω, F, P) be a probability space, and let Y, Y_1 and Y_2 denote random variables (or random vectors, or stochastic processes) defined on Ω.

The information of Y is contained in F, i.e. Y does not contain more information than F ⇔ σ(Y) ⊆ F.
Y_1 contains more information than Y_2 ⇔ σ(Y_2) ⊆ σ(Y_1).

Conditional Expectation Given a σ-Field
Let X be a random variable defined on Ω. The conditional expectation given F is a random variable, denoted by E(X|F), with the following properties:

E(X|F) does not contain more information than that contained in F: σ(E(X|F)) ⊆ F.
For any event A ∈ F, E(X I_A) = E(E(X|F) I_A).

By virtue of the Radon-Nikodym theorem, one can show the existence and almost sure (a.s.) uniqueness of E(X|F).
Conditional Expectation Given Generated Information
Let Y be a random variable (random vector or stochastic process) on Ω. The conditional expectation of X given Y, denoted by E(X|Y), is defined as E(X|Y) := E(X|σ(Y)).

The random variables X and E(X|F) are "close" to each other, not in the sense that they coincide for every ω, but in the sense that the averages (expectations) of X and E(X|F) over suitable sets A coincide.
The conditional expectation E(X|F) is a coarser version of the original random variable X and is our estimate of X given the information F.

Example: Let Y be a discrete random variable taking distinct values y_i, i = 1, 2, …. Any set A ∈ σ(Y) can be written as

A = ∪_{i∈I} A_i = ∪_{i∈I} {ω : Y(ω) = y_i}, for some I ⊆ {1, 2, …}.

Let Z := E(X|Y). Then σ(Z) ⊂ σ(Y) and Z(ω) = E(X|A_i) for ω ∈ A_i. Observe that

E(X I_A) = E(X Σ_{i∈I} I_{A_i}) = Σ_{i∈I} E(X I_{A_i}) = Σ_{i∈I} E(X|A_i) P(A_i) = E(Z I_A).
Special Cases

Classical Conditional Expectation: Let B be an event with P(B) > 0 and P(B^c) > 0. Define F_B := σ({B}) = {∅, Ω, B, B^c}. Then

E(X|F_B)(ω) = E(X|B), for ω ∈ B.

Classical Conditional Probability: If X = I_A, then

E(I_A|F_B)(ω) = E(I_A|B) = P(A ∩ B) / P(B), for ω ∈ B.
Rules for Calculation of Conditional Expectations
Let X, X_1, X_2 denote random variables defined on (Ω, F, P).

1. For any two constants c_1, c_2, E(c_1X_1 + c_2X_2|F) = c_1E(X_1|F) + c_2E(X_2|F).
2. EX = E[E(X|F)].
3. If X and F are independent, then E(X|F) = EX. In particular, if X and Y are independent, then E(X|Y) = EX.
4. If σ(X) ⊂ F, then E(X|F) = X. In particular, if X is a function of Y, then σ(X) ⊂ σ(Y) and E(X|Y) = X.
5. If σ(X) ⊂ F, then E(XX_1|F) = X E(X_1|F). In particular, if X is a function of Y, then σ(X) ⊂ σ(Y) and E(XX_1|Y) = X E(X_1|Y).
6. If F and F′ are two σ-fields with F ⊂ F′, then E(X|F) = E[E(X|F′)|F] and E(X|F) = E[E(X|F)|F′].
7. Let G be a stochastic process with σ(G) ⊂ F. If X and F are independent, then for any function h(x, y),

E[h(X, G)|F] = E(E_X[h(X, G)]|F),

where E_X[h(X, G)] denotes the expectation of h(X, G) taken with respect to X.
Examples

Example 1: If X and Y are independent, then

E(XY|Y) = Y EX, and E(X + Y|Y) = EX + Y.

Example 2: Consider Brownian motion B = (B_t, t ≥ 0). The σ-fields F_s = σ(B_x, x ≤ s) represent an increasing stream of information about the structure of the process. Find E(B_t|F_s) = E(B_t|B_x, x ≤ s) for s ≥ 0.

If s ≥ t, then F_s ⊃ F_t and thus E(B_t|F_s) = B_t.
If s < t, then E(B_t|F_s) = E(B_t − B_s|F_s) + E(B_s|F_s) = 0 + B_s = B_s.
Hence E(B_t|F_s) = B_{min(s,t)}.
Another Example: Squared Brownian Motion
Consider again Brownian motion B = (B_t, t ≥ 0), with the σ-fields F_s = σ(B_x, x ≤ s). Define X_t := B_t² − t, t ≥ 0.

If s ≥ t, then F_s ⊃ F_t and thus E(X_t|F_s) = X_t.
If s < t, observe that

X_t = [(B_t − B_s) + B_s]² − t = (B_t − B_s)² + B_s² + 2B_s(B_t − B_s) − t.

Since B_t − B_s and (B_t − B_s)² are independent of F_s, we have

E[(B_t − B_s)²|F_s] = E(B_t − B_s)² = t − s,
E[B_s(B_t − B_s)|F_s] = B_s E(B_t − B_s) = 0.

Since σ(B_s²) ⊂ σ(B_s) ⊂ F_s, we have E(B_s²|F_s) = B_s².
Thus E(X_t|F_s) = (t − s) + B_s² − t = B_s² − s = X_s.
Hence E(X_t|F_s) = X_{min(s,t)}.
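Both conditional-expectation identities can be checked by Monte Carlo, conditioning approximately on {B_s ≈ b} with a thin bin (a sketch; the values of s, t, b and the bin width are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
s, t, paths = 1.0, 2.0, 1_000_000
B_s = rng.normal(0.0, np.sqrt(s), size=paths)
B_t = B_s + rng.normal(0.0, np.sqrt(t - s), size=paths)   # independent increment

b = 0.7
near = np.abs(B_s - b) < 0.05         # crude stand-in for conditioning on B_s = b
e1 = B_t[near].mean()                 # ~ E(B_t | B_s = b)       = b
e2 = (B_t[near]**2 - t).mean()        # ~ E(B_t^2 - t | B_s = b) = b^2 - s
print(e1, b, e2, b**2 - s)
```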
The Projection Property of Conditional Expectations
We now formulate precisely the meaning of the statement that the conditional expectation E(X|F) can be understood as the optimal estimate of X given the information F. Define

L²(F) := {Z : σ(Z) ⊂ F, EZ² < ∞}.

If F = σ(Y), then Z ∈ L²(σ(Y)) implies that Z is a function of Y.

The Projection Property

Let X be a random variable with EX² < ∞. The conditional expectation E(X|F) is the random variable in L²(F) which is closest to X in the mean square sense:

E[X − E(X|F)]² = min_{Z ∈ L²(F)} E(X − Z)².

If F = σ(Y), then E(X|Y) is the function of Y which has a finite second moment and which is closest to X in the mean square sense.
The Best Prediction Based on Available Information

It follows from the projection property that the conditional expectation E(X|F) can be viewed as the best prediction of X given the information F.
For example, for Brownian motion B = (B_t, t ≥ 0), we have, for s ≤ t,

E(B_t|B_x, x ≤ s) = B_s, and E(B_t² − t|B_x, x ≤ s) = B_s² − s.

That is, the best predictions of the future values B_t and B_t² − t, given the information about Brownian motion up to the present time s, are the present values B_s and B_s² − s, respectively. This property characterizes the whole class of martingales with a finite second moment.
Outline
Martingales
Martingale Transforms
Filtration

Let (F_t, t ≥ 0) be a collection of σ-fields on the same probability space (Ω, F, P) with F_t ⊆ F for all t ≥ 0.

Definition
The collection (F_t, t ≥ 0) of σ-fields on Ω is called a filtration if

F_s ⊆ F_t, ∀ 0 ≤ s ≤ t.

A filtration represents an increasing stream of information.
The index t can be discrete; for example, the filtration (F_n, n = 0, 1, …) is a sequence of σ-fields on Ω with F_n ⊆ F_{n+1} for all n ≥ 0.
Adapted Processes
A filtration is usually linked up with a stochastic process.
Definition
The stochastic process Y = (Y_t, t ≥ 0) is said to be adapted to the filtration (F_t, t ≥ 0) if
σ(Y_t) ⊆ F_t, ∀ t ≥ 0.
The stochastic process Y is always adapted to the natural filtration generated by Y:
F_t = σ(Y_s, s ≤ t).
For a discrete-time process Y = (Y_n, n = 0, 1, ...), adaptedness means σ(Y_n) ⊆ F_n for all n ≥ 0.
Example
Let (B_t, t ≥ 0) denote Brownian motion and (F_t, t ≥ 0) denote the corresponding natural filtration. Stochastic processes of the form
X_t = f(t, B_t), t ≥ 0, where f is a function of two variables,
are adapted to (F_t, t ≥ 0).
Examples: X_t^(1) = B_t and X_t^(2) = B_t^2 − t.
More examples: X_t^(3) = max_{0≤s≤t} B_s and X_t^(4) = max_{0≤s≤t} B_s^2.
Examples that are not adapted to the Brownian motion filtration: X_t^(5) = B_{t+1} and X_t^(6) = B_t + B_T for some fixed number T > 0.
Definition
If the stochastic process Y = (Y_t, t ≥ 0) is adapted to the natural Brownian filtration (F_t, t ≥ 0) (that is, Y_t is a function of (B_s, s ≤ t) for all t ≥ 0), we will say that Y is adapted to Brownian motion.
Adapted to Different Filtrations
Consider Brownian motion (B_t, t ≥ 0) and the corresponding natural filtration F_t = σ(B_s, s ≤ t). The stochastic process
X_t := B_t^2, t ≥ 0,
generates its own natural filtration F′_t = σ(B_s^2, s ≤ t), t ≥ 0. The process (X_t, t ≥ 0) is adapted to both (F′_t) and (F_t).
Observe that F′_t ⊂ F_t. For example, from B_t^2 we can reconstruct the whole information about |B_t|, but not about B_t: we can say nothing about the sign of B_t.
Market Information or Information Histories
Share prices, exchange rates, interest rates, etc., can be modelled by solutions of stochastic differential equations which are driven by Brownian motion.
These solutions are then functions of Brownian motion.
The fluctuations of these processes actually represent the information about the market. This relevant knowledge is contained in the natural filtration.
In finance there are always people who know more than the others. For example, they might know that an essential political decision will be taken in the very near future which will completely change the financial landscape.
This enables the informed persons to act with more competence than the others. Thus they have their own filtrations, which can be bigger than the natural filtration.
Martingale
If the information F_s and X are dependent, we can expect that knowing F_s reduces the uncertainty about the value of X_t at t > s. That is, X_t can be better predicted via E(X_t | F_s) with the information F_s than without it.
Definition
The stochastic process X = (X_t, t ≥ 0) adapted to the filtration (F_t, t ≥ 0) is called a continuous-time martingale with respect to (F_t, t ≥ 0) if
1. E|X_t| < ∞ for all t ≥ 0.
2. X_s is the best prediction of X_t given F_s: E(X_t | F_s) = X_s for all 0 ≤ s ≤ t.
A discrete-time martingale is defined similarly, with the second condition replaced by E(X_{n+1} | F_n) = X_n, ∀ n = 0, 1, ....
A martingale has the remarkable property that its expectation function is constant: EX_t = E[E(X_t | F_s)] = EX_s for all 0 ≤ s ≤ t.
Example: Partial Sums
Let (Z_n) be a sequence of independent random variables with finite expectations and Z_0 = 0. Consider the partial sums
R_n = ∑_{i=0}^n Z_i, n ≥ 0,
and the corresponding natural filtration F_n = σ(R_0, ..., R_n) = σ(Z_0, ..., Z_n), n ≥ 0.
Observe that
E(R_{n+1} | F_n) = E(R_n | F_n) + E(Z_{n+1} | F_n) = R_n + EZ_{n+1},
and hence, if EZ_n = 0 for all n ≥ 0, then (R_n, n ≥ 0) is a martingale with respect to the filtration (F_n, n ≥ 0).
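The constant-expectation property of these partial sums can be checked numerically; the following is a quick Monte Carlo sanity check (the increment distribution and sample sizes are our choices, not from the slides).

```python
import numpy as np

# Sanity check: with centered, independent increments Z_i, the partial
# sums R_n form a martingale, so E(R_n) stays at R_0 = 0 for every n.
rng = np.random.default_rng(0)
Z = rng.uniform(-1.0, 1.0, size=(100_000, 50))   # independent, EZ_i = 0
R = Z.cumsum(axis=1)                              # partial sums R_1, ..., R_50
mean_R = R.mean(axis=0)                           # Monte Carlo estimate of E(R_n)
print(float(np.abs(mean_R).max()))                # stays close to 0
```

With 100,000 paths the largest deviation of the estimated E(R_n) from 0 is well within Monte Carlo error.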
Collecting Information About a Random Variable
Let Z be a random variable on Ω with E|Z| < ∞ and (F_t, t ≥ 0) be a filtration on Ω. Define
X_t = E(Z | F_t), t ≥ 0.
Since F_t increases as time goes by, X_t gives us more and more information about the random variable Z. In particular, if σ(Z) ⊆ F_t for some t, then X_t = Z. An appeal to Jensen's inequality yields
E|X_t| = E|E(Z | F_t)| ≤ E[E(|Z| | F_t)] = E|Z| < ∞.
σ(X_t) ⊆ F_t.
For s ≤ t, E(X_t | F_s) = E[E(Z | F_t) | F_s] = E(Z | F_s) = X_s.
X is a martingale with respect to (Ft , t ≥ 0).
Brownian Motion is a Martingale
Let B = (B_t, t ≥ 0) be Brownian motion with the natural filtration F_t = σ(B_s, s ≤ t).
B and (B_t^2 − t, t ≥ 0) are martingales with respect to the natural filtration.
(B_t^3 − 3tB_t, t ≥ 0) is a martingale.
Martingale Transform
Let X = (X_n, n = 0, 1, ...) be a discrete-time martingale with respect to the filtration (F_n, n = 0, 1, ...). Let Y_n := X_n − X_{n−1}, n ≥ 1, and Y_0 := X_0. The sequence Y = (Y_n, n = 0, 1, ...) is called a martingale difference sequence with respect to the filtration (F_n, n = 0, 1, ...).
Consider a stochastic process C = (C_n, n = 1, 2, ...) satisfying σ(C_n) ⊆ F_{n−1}, n ≥ 1. Given F_{n−1}, we completely know C_n at time n − 1. Such a sequence is called predictable with respect to (F_n, n = 0, 1, ...).
Define
Z_0 = 0, Z_n = ∑_{i=1}^n C_i Y_i = ∑_{i=1}^n C_i(X_i − X_{i−1}), n ≥ 1.
The process C · Y := (Z_n, n ≥ 0) is called the martingale transform of Y by C.
Note that if C_n = 1 for all n ≥ 1, then Z_n = X_n − X_0, so C · Y recovers the original martingale up to its starting value.
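A minimal computational sketch of this definition follows (the function and variable names, and the ±1 game used to exercise it, are our illustrative choices):

```python
import numpy as np

# Sketch: the martingale transform Z_n = sum_{i<=n} C_i (X_i - X_{i-1})
# for a predictable stake sequence C.
def martingale_transform(C, X):
    """C[i-1] holds the stake C_i (chosen from F_{i-1}); X holds X_0, ..., X_n."""
    Y = np.diff(X)                                 # differences Y_i = X_i - X_{i-1}
    return np.concatenate([[0.0], np.cumsum(C * Y)])

rng = np.random.default_rng(1)
steps = rng.choice([-1.0, 1.0], size=10)           # fair +-1 game
X = np.concatenate([[0.0], steps.cumsum()])        # martingale path X_0, ..., X_10
C = np.abs(X[:-1]) + 1.0                           # stake C_i depends only on X_0..X_{i-1}
Z = martingale_transform(C, X)
print(Z)
```

Note that C is built from X[:-1] only, which is exactly the predictability requirement σ(C_n) ⊆ F_{n−1}.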
Martingale Transform Leads to a Martingale
Assume that the second moments of Cn and Yn are finite.
It follows from the Cauchy-Schwarz inequality that
E|Z_n| ≤ ∑_{i=1}^n E|C_i Y_i| ≤ ∑_{i=1}^n [EC_i^2 · EY_i^2]^{1/2} < ∞.
Since Y_1, ..., Y_n do not carry more information than F_n, and σ(C_1, ..., C_n) ⊆ F_{n−1} (predictability), we have σ(Z_n) ⊆ F_n.
Due to the predictability of C,
E(Z_n − Z_{n−1} | F_{n−1}) = E(C_n Y_n | F_{n−1}) = C_n E(Y_n | F_{n−1}) = 0.
(Z_n − Z_{n−1}, n ≥ 1) is a martingale difference sequence, and (Z_n, n ≥ 0) is a martingale with respect to (F_n, n = 0, 1, ...).
A Brownian Martingale Transform
Consider Brownian motion B = (B_s, s ≤ t) and a partition
0 = t_0 < t_1 < · · · < t_{n−1} < t_n = t.
The σ-fields at these time instants are described by the filtration
F_0 = {∅, Ω}, F_i = σ(B_{t_j}, 1 ≤ j ≤ i), i = 1, ..., n.
The sequence ∆B := (∆_i B, 1 ≤ i ≤ n) defined by
∆_0 B = 0, ∆_i B = B_{t_i} − B_{t_{i−1}}, i = 1, ..., n,
forms a martingale difference sequence with respect to the filtration (F_i, 1 ≤ i ≤ n).
(B_{t_{i−1}}, 1 ≤ i ≤ n) is predictable with respect to (F_i, 1 ≤ i ≤ n).
The martingale transform B · ∆B is then a martingale: (B · ∆B)_k = ∑_{i=1}^k B_{t_{i−1}}(B_{t_i} − B_{t_{i−1}}), k = 1, ..., n.
This is precisely a discrete-time analogue of the Itô stochastic integral ∫_0^t B_s dB_s.
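This discrete analogue can be computed directly; the sketch below (grid size is our choice) evaluates the transform on a simulated path and compares it with the value (1/2)(B_t^2 − t) that the Itô integral will turn out to have.

```python
import numpy as np

# Sketch: the martingale transform sum_i B_{t_{i-1}} (B_{t_i} - B_{t_{i-1}})
# approximates the Ito integral int_0^t B_s dB_s = (B_t^2 - t)/2.
rng = np.random.default_rng(2)
t, n = 1.0, 100_000
dB = rng.normal(0.0, np.sqrt(t / n), size=n)      # Brownian increments
B = np.concatenate([[0.0], dB.cumsum()])           # B at the grid points
ito_sum = np.sum(B[:-1] * dB)                      # left-endpoint evaluation
exact = 0.5 * (B[-1]**2 - t)                       # value from Ito's formula
print(ito_sum, exact)
```

The left-endpoint evaluation is essential; it is what makes the sums a martingale transform.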
Martingale as a Fair Game
Let X = (X_n, n = 0, 1, ...) be a discrete-time martingale with respect to the filtration (F_n, n = 0, 1, ...). Let Y_n = X_n − X_{n−1}, n ≥ 1, denote the martingale differences, and let C_n, n ≥ 1, be predictable with respect to (F_n, n = 0, 1, ...).
Think of Y_n as your net winnings per unit stake in the n-th game, adapted to the filtration (F_n, n = 0, 1, ...).
Your stake C_n in the n-th game does not contain more information than F_{n−1} does. At time n − 1, this is the best information we have about the game.
C_n Y_n is the net winnings for stake C_n in the n-th game.
(C · Y)_n = ∑_{i=1}^n C_i Y_i is the net winnings up to time n.
The game is fair because the best prediction of the net winnings C_n Y_n of the n-th game, just before the n-th game starts, is zero: E(C_n Y_n | F_{n−1}) = 0.
Outline
The Itô Integrals
The Stratonovich Integrals
Integrating With Respect To a Function
Let B = (B_t, t ≥ 0) be Brownian motion.
Goal: Define an integral of the type ∫_0^1 f(t) dB_t(ω), where f(t) is a function or a stochastic process on [0,1] and B_t(ω) is a Brownian sample path.
Difficulty: The path B_t(ω) does not have a derivative.
The pathwise integral of the Riemann-Stieltjes type is one option. Consider a partition of the interval [0,1]:
τ_n: 0 = t_0 < t_1 < t_2 < · · · < t_{n−1} < t_n = 1, n ≥ 1.
Let f and g be two real-valued functions on [0,1], define ∆_i g := g(t_i) − g(t_{i−1}), 1 ≤ i ≤ n, and form the Riemann-Stieltjes sum
S_n = ∑_{i=1}^n f(y_i) ∆_i g = ∑_{i=1}^n f(y_i)(g(t_i) − g(t_{i−1})),
for t_{i−1} ≤ y_i ≤ t_i, i = 1, ..., n.
Riemann-Stieltjes Integrals
Definition
If the limit S = lim_{n→∞} S_n exists as mesh(τ_n) → 0, and S is independent of the choice of the partitions τ_n and of the intermediate values y_i, then S, denoted by ∫_0^1 f(t) dg(t), is called the Riemann-Stieltjes integral of f with respect to g on [0,1].
When does the Riemann-Stieltjes integral ∫_0^1 f(t) dg(t) exist, and is it possible to take g = B for Brownian motion B on [0,1]?
One usual assumption is that f is continuous and g has bounded variation:
sup_{τ_n} ∑_{i=1}^n |g(t_i) − g(t_{i−1})| < ∞.
But Brownian sample paths B_t(ω) do not have bounded variation.
Bounded p-Variation
The real function h on [0,1] is said to have bounded p-variation for some p > 0 if
sup_{τ_n} ∑_{i=1}^n |h(t_i) − h(t_{i−1})|^p < ∞,
where the supremum is taken over all partitions τ_n of [0,1].
Brownian motion has bounded p-variation on any fixed finite interval, provided that p > 2, and unbounded p-variation for p ≤ 2.
A Sufficient and Almost Necessary Condition
The Riemann-Stieltjes integral ∫_0^1 f(t) dg(t) exists if
1. The functions f and g do not have discontinuities at the same point t ∈ [0,1].
2. The function f has bounded p-variation and the function g has bounded q-variation such that p^{−1} + q^{−1} > 1.
Existence of the Riemann-Stieltjes Integral
Assume that f is a differentiable function with bounded derivative f′(t) on [0,1]. Then f has bounded variation.
The Riemann-Stieltjes integral
∫_0^1 f(t) dB_t(ω)
exists for every Brownian sample path B_t(ω).
Examples: ∫_0^1 t^k dB_t(ω) for k ≥ 0, ∫_0^1 e^t dB_t(ω), ...
But existence does not mean that you can evaluate these integrals explicitly in terms of Brownian motion.
A more serious issue: how to define ∫_0^1 B_t(ω) dB_t(ω)?
Brownian motion has bounded p-variation for p > 2, not for p ≤ 2, and so the sufficient condition 2p^{−1} > 1 for the existence of the Riemann-Stieltjes integral is not satisfied.
In fact, it can be shown that ∫_0^1 B_t(ω) dB_t(ω) does not exist as a Riemann-Stieltjes integral.
Another Fatal Blow to the Riemann-Stieltjes Approach
It can be shown that if ∫_0^1 f(t) dg(t) exists as a Riemann-Stieltjes integral for all continuous functions f on [0,1], then g necessarily has bounded variation. But Brownian sample paths do not have bounded variation on any finite interval.
Since pathwise integration with respect to a Brownian sample path, as suggested by the Riemann-Stieltjes integral, does not lead to a sufficiently large class of integrable functions f, one has to find a different approach to define stochastic integrals such as ∫_0^1 B_t(ω) dB_t(ω).
We will instead define the integral as a probabilistic average, leading to the Itô integrals.
A Motivating Example
Let B = (B_t, t ≥ 0) be Brownian motion. Consider a partition of [0, t]:
τ_n: 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = t, with ∆_i = t_i − t_{i−1}, n ≥ 1,
and the Riemann-Stieltjes sums, for n ≥ 1,
S_n = ∑_{i=1}^n B_{t_{i−1}} ∆_i B, with ∆_i B := B_{t_i} − B_{t_{i−1}}, 1 ≤ i ≤ n.
Rewriting: S_n = (1/2)B_t^2 − (1/2)∑_{i=1}^n (∆_i B)^2 =: (1/2)B_t^2 − (1/2)Q_n(t).
The limit of S_n boils down to the limit of Q_n(t) as n → ∞.
One can show that Q_n(t) does not converge for a given Brownian sample path and suitable choices of partitions τ_n.
We will show that Q_n(t) converges in probability to t, as n → ∞. This is the key to defining the Itô integral!
Quadratic Variation of Brownian Motion
Since Brownian motion has independent and stationary increments, E(∆_i B ∆_j B) = 0 for i ≠ j and
E(∆_i B)^2 = Var(∆_i B) = t_i − t_{i−1} = ∆_i.
Thus E(Q_n(t)) = ∑_{i=1}^n E(∆_i B)^2 = ∑_{i=1}^n ∆_i = t.
Var(Q_n(t)) = ∑_{i=1}^n Var((∆_i B)^2) = ∑_{i=1}^n [E((∆_i B)^4) − ∆_i^2].
Since E(B_1^4) = 3 (standard normal), we have E((∆_i B)^4) = E(B_{t_i − t_{i−1}}^4) = E(∆_i^{1/2} B_1)^4 = 3∆_i^2 (self-similarity). Thus Var(Q_n(t)) = 2∑_{i=1}^n ∆_i^2.
If mesh(τ_n) = max_{1≤i≤n} ∆_i → 0, we obtain
Var(Q_n(t)) = E(Q_n(t) − t)^2 ≤ 2 mesh(τ_n) ∑_{i=1}^n ∆_i = 2t · mesh(τ_n) → 0.
It follows from the Chebyshev inequality that Q_n(t) → t in probability as mesh(τ_n) → 0 (n → ∞). The limiting function f(t) = t is called the quadratic variation of Brownian motion.
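The concentration of Q_n(t) around t is easy to see numerically; a sketch with our choice of horizon and mesh:

```python
import numpy as np

# Sketch: the realized quadratic variation Q_n(t) = sum_i (Delta_i B)^2
# concentrates around t as the mesh goes to 0; here t = 2 on a uniform grid.
rng = np.random.default_rng(3)
t, n = 2.0, 200_000
dB = rng.normal(0.0, np.sqrt(t / n), size=n)   # increments with Var = t/n
Q = float(np.sum(dB**2))                        # realized quadratic variation
print(Q)                                        # close to t = 2
```

By the variance bound above, the standard deviation of Q here is about sqrt(2 t^2 / n) ≈ 0.006, so the printed value is very close to 2.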
Mean Square Limit is a Martingale
The quadratic variation is a distinguishing characteristic of Brownian motion.
Since S_n = (1/2)B_t^2 − (1/2)Q_n(t) converges in mean square to (1/2)B_t^2 − (1/2)t, we define the Itô integral in the mean square sense:
∫_0^t B_s dB_s = (1/2)(B_t^2 − t).
Since the values of Brownian motion were evaluated at the left end points of the intervals [t_{i−1}, t_i], the martingale transform
B · ∆B = ∑_{i=1}^k B_{t_{i−1}}(B_{t_i} − B_{t_{i−1}})
is a martingale with respect to the filtration σ(B_{t_i}, 0 ≤ i ≤ k), for all k = 1, ..., n.
As a result, the mean square limit lim_{n→∞} S_n = (1/2)(B_t^2 − t) is a martingale with respect to the natural Brownian filtration.
Heuristic Rules
The increment ∆_i B = B_{t_i} − B_{t_{i−1}} on the interval [t_{i−1}, t_i] satisfies
E(∆_i B) = 0, Var(∆_i B) = ∆_i = t_i − t_{i−1}.
These properties suggest that (∆_i B)^2 is of order ∆_i.
In terms of differentials, we write
(dB_t)^2 = (B_{t+dt} − B_t)^2 = dt.
In terms of integrals, we write
∫_0^t (dB_s)^2 = ∫_0^t ds = t.
These rules can be made mathematically precise in the mean square sense.
Stratonovich Integral
Consider partitions τ_n:
0 = t_0 < t_1 < · · · < t_{n−1} < t_n = t, with mesh(τ_n) → 0.
Using the same arguments and tools as for the Itô integral, the Riemann-Stieltjes sums
S_n = ∑_{i=1}^n B_{y_i} ∆_i B, with ∆_i B := B_{t_i} − B_{t_{i−1}},
where y_i = (1/2)(t_{i−1} + t_i), 1 ≤ i ≤ n, converge to the mean square limit (1/2)B_t^2.
This quantity is called the Stratonovich stochastic integral and is denoted by
∫_0^t B_s ∘ dB_s = (1/2)B_t^2.
The Riemann-Stieltjes sums ∑_{i=1}^k B_{y_i} ∆_i B, k = 1, ..., n, do not constitute a martingale, and neither does the limit process (1/2)B_t^2.
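The difference an evaluation point makes can be seen by redoing the previous simulation with midpoints; a sketch (grid sizes are our choices), where the path is simulated on a finer grid so the interval midpoints are themselves grid points:

```python
import numpy as np

# Sketch: midpoint sums sum_i B_{y_i} (B_{t_i} - B_{t_{i-1}}), with y_i the
# interval midpoints, approximate the Stratonovich value B_t^2 / 2 (no -t/2 term).
rng = np.random.default_rng(4)
t, n = 1.0, 50_000
# simulate B on 2n steps so that every interval midpoint is a grid point
fine = np.concatenate([[0.0], rng.normal(0.0, np.sqrt(t / (2 * n)), size=2 * n).cumsum()])
B_left, B_mid, B_right = fine[0:-1:2], fine[1::2], fine[2::2]
strat_sum = float(np.sum(B_mid * (B_right - B_left)))
print(strat_sum, 0.5 * fine[-1]**2)
```

Comparing with the left-endpoint simulation shows the two rules really do converge to different limits on the same path.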
Itô Integral vs Stratonovich Integral
The Itô integral is a martingale with respect to the natural Brownian filtration, but it does not obey the classical chain rule of integration.
A chain rule which is well suited for Itô integration is given by the Itô lemma.
The Stratonovich integral is not a martingale, but it does obey the classical chain rule of integration.
It turns out that the Stratonovich integral will also be a useful tool for solving Itô stochastic differential equations.
Outline
The Itô Stochastic Integrals
Simple Processes
Let B = (B_t, t ≥ 0) denote Brownian motion and F_t = σ(B_s, s ≤ t) denote the corresponding natural filtration. Consider a partition of [0,T]:
τ_n: 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = T.
The stochastic process C = (C_t, t ∈ [0,T]) is said to be simple if there exists a sequence (Z_i, i = 1, ..., n) of random variables such that
(Z_i, i = 1, ..., n) is adapted to (F_{t_{i−1}}, 1 ≤ i ≤ n), i.e., Z_i is a function of (B_s, s ≤ t_{i−1}), and EZ_i^2 < ∞, 1 ≤ i ≤ n.
C_t = ∑_{i=1}^n Z_i I_{[t_{i−1}, t_i)}(t) + Z_n I_{{T}}(t).
Example: f_n(t) = ∑_{i=1}^n ((i−1)/n) I_{[(i−1)/n, i/n)}(t) + ((n−1)/n) I_{{T}}(t) on [0,1].
Example: C_n(t) = ∑_{i=1}^n B_{t_{i−1}} I_{[t_{i−1}, t_i)}(t) + B_{t_{n−1}} I_{{T}}(t) on [0,T].
Note that Ct is a function of Brownian motion until time t .
Itô Stochastic Integrals of Simple Processes
Define
∫_0^T C_s dB_s := ∑_{i=1}^n C_{t_{i−1}}(B_{t_i} − B_{t_{i−1}}) = ∑_{i=1}^n Z_i ∆_i B.
Itô Integrals of Simple Processes on [0, t], t_{k−1} ≤ t < t_k
∫_0^t C_s dB_s := ∫_0^T C_s I_{[0,t]}(s) dB_s = ∑_{i=1}^{k−1} Z_i ∆_i B + Z_k(B_t − B_{t_{k−1}}).
Example: ∫_0^t f_n(s) dB_s = ∑_{i=1}^{k−1} ((i−1)/n)(B_{t_i} − B_{t_{i−1}}) + ((k−1)/n)(B_t − B_{t_{k−1}}) for (k−1)/n ≤ t < k/n. Note that
lim_{n→∞} ∫_0^t f_n(s) dB_s = ∫_0^t s dB_s.
Example: ∫_0^t C_n(s) dB_s = ∑_{i=1}^{k−1} B_{t_{i−1}} ∆_i B + B_{t_{k−1}}(B_t − B_{t_{k−1}}) for t_{k−1} ≤ t < t_k.
Itô Integral of a Simple Process is a Martingale
The form of the Itô stochastic integral for simple processes very much reminds us of a martingale transform, which results in a martingale.
A Martingale Property
The stochastic process I_t(C) := ∫_0^t C_s dB_s, t ∈ [0,T], is a martingale with respect to the natural Brownian filtration (F_t, t ∈ [0,T]).
Using the isometry property, E|I_t(C)| < ∞ for all t ∈ [0,T].
I_t(C) is adapted to (F_t, t ∈ [0,T]).
E(I_t(C) | F_s) = I_s(C), for s < t.
Properties
The Itô stochastic integral has expectation zero.
The Itô stochastic integral satisfies the isometry property:
E(∫_0^t C_s dB_s)^2 = ∫_0^t EC_s^2 ds, t ∈ [0,T].
For any constants c_1 and c_2, and simple processes C^(1) and C^(2) on [0,T],
∫_0^t (c_1 C_s^(1) + c_2 C_s^(2)) dB_s = c_1 ∫_0^t C_s^(1) dB_s + c_2 ∫_0^t C_s^(2) dB_s.
For any t ∈ [0,T],
∫_0^T C_s dB_s = ∫_0^t C_s dB_s + ∫_t^T C_s dB_s.
The process I(C) has continuous sample paths.
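The isometry property can be checked by Monte Carlo; a sketch (integrand, horizon, and sample sizes are our choices) with C_s = B_s on [0,1], for which ∫_0^1 EB_s^2 ds = ∫_0^1 s ds = 1/2:

```python
import numpy as np

# Sketch: Monte Carlo check of the isometry E (int_0^1 B_s dB_s)^2 = 1/2.
rng = np.random.default_rng(5)
paths, n, t = 20_000, 500, 1.0
dB = rng.normal(0.0, np.sqrt(t / n), size=(paths, n))   # increments per path
B = np.cumsum(dB, axis=1) - dB                          # left endpoints B_{t_{i-1}}
I = np.sum(B * dB, axis=1)                              # one Ito sum per path
print(float(np.mean(I**2)))                             # close to 0.5
```

Each row of B starts at B_0 = 0, so the row-wise sums are exactly the left-endpoint Riemann-Stieltjes sums used above.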
Basic Assumptions
Assumptions on the Integrand Process C
1. C = (C_t, t ∈ [0,T]) is adapted to Brownian motion on [0,T], i.e., C_t is a function of B_s, s ≤ t.
2. The integral ∫_0^T EC_s^2 ds < ∞.
For fixed t and a given partition τ_n = (t_i) of [0, t], we defined ∫_0^T C_s dB_s = ∑_{i=1}^n C_{t_{i−1}}(B_{t_i} − B_{t_{i−1}}) as a Riemann-Stieltjes sum, for a simple process C.
Brownian motion B = (B_t, t ∈ [0,T]) satisfies the Assumptions.
A simple process C = (C_t, t ∈ [0,T]) satisfies the Assumptions.
Another class of admissible integrands consists of the deterministic functions c(t) on [0,T] with ∫_0^T c^2(t) dt < ∞.
Key Steps to Define Itô Integrals and Proofs
Let C = (C_t, t ∈ [0,T]) be a process satisfying the Assumptions.
We need to find a sequence C^(n) = (C_t^(n), t ∈ [0,T]) of simple processes such that
∫_0^T E[C_s − C_s^(n)]^2 ds → 0, as mesh(τ_n) → 0.
That is, the simple processes C^(n) converge in a certain mean square sense to the integrand process C.
Since C^(n) is simple, we can evaluate the Itô integrals I_t(C^(n)) = ∫_0^t C_s^(n) dB_s for every n and t.
We need to show the existence of a process I(C) on [0,T] such that
E sup_{0≤t≤T} [I_t(C) − I_t(C^(n))]^2 → 0, as mesh(τ_n) → 0.
That is, to show that the sequence (I_t(C^(n))) of Itô stochastic integrals converges in a certain mean square sense to a unique limit process.
The General Itô Stochastic Integral
Definition
The mean square limit I(C) is called the Itô stochastic integral of C. It is denoted by I_t(C) = ∫_0^t C_s dB_s, t ∈ [0,T].
For practical purposes, the following rule of thumb is helpful:
The Itô stochastic integrals I_t(C) = ∫_0^t C_s dB_s, t ∈ [0,T], constitute a stochastic process. For a given partition
τ_n: 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = T
and t ∈ [t_{k−1}, t_k], the random variable I_t(C) is "close" to the Riemann-Stieltjes sum
∑_{i=1}^{k−1} C_{t_{i−1}}(B_{t_i} − B_{t_{i−1}}) + C_{t_{k−1}}(B_t − B_{t_{k−1}}),
and this approximation is the closer (in the mean square sense) to the value of I_t(C) the denser the partition τ_n is in [0,T].
Properties of the General Itô Stochastic Integral
The stochastic process ∫_0^t C_s dB_s is a martingale with respect to the natural Brownian filtration (F_t, t ∈ [0,T]).
The Itô stochastic integral has expectation zero.
The Itô stochastic integral satisfies the isometry property:
E(∫_0^t C_s dB_s)^2 = ∫_0^t EC_s^2 ds, t ∈ [0,T].
For any constants c_1, c_2, and processes C^(1) and C^(2) on [0,T],
∫_0^t (c_1 C_s^(1) + c_2 C_s^(2)) dB_s = c_1 ∫_0^t C_s^(1) dB_s + c_2 ∫_0^t C_s^(2) dB_s.
For any t ∈ [0,T],
∫_0^T C_s dB_s = ∫_0^t C_s dB_s + ∫_t^T C_s dB_s.
The process I(C) has continuous sample paths.
Outline
The Itô Lemma: Stochastic Analogue of the Chain Rule
A Simple Version of the Itô Lemma
Let B = (B_t, t ≥ 0) denote Brownian motion and F_t = σ(B_s, s ≤ t) denote the corresponding natural filtration. Assume that f is twice differentiable; the Taylor expansion gives
f(B_t + dB_t) − f(B_t) = f′(B_t) dB_t + (1/2) f″(B_t)(dB_t)^2 + · · ·.
In contrast to the deterministic case, the contribution of the second-order term in the Taylor expansion is not negligible.
The squared differential (dB_t)^2 can be interpreted as dt.
Integrating both sides in a formal sense and neglecting terms of third and higher order on the right-hand side, we obtain
Itô Lemma (1951)
f(B_t) − f(B_s) = ∫_s^t f′(B_x) dB_x + (1/2) ∫_s^t f″(B_x) dx, s < t,
for any twice continuously differentiable f.
Examples
1. Let f(x) = x^2; then
B_t^2 = 2∫_0^t B_x dB_x + t,
resulting in ∫_0^t B_x dB_x = (1/2)(B_t^2 − t).
2. Let f(x) = x^3; then
B_t^3 − B_s^3 = 3∫_s^t B_x^2 dB_x + 3∫_s^t B_x dx.
We cannot express ∫_s^t B_x dx in simpler terms of Brownian motion; simulations have to be used.
3. Let f(x) = e^x; we have
e^{B_t} − e^{B_s} = ∫_s^t e^{B_x} dB_x + (1/2)∫_s^t e^{B_x} dx > ∫_s^t e^{B_x} dB_x.
So the exponential function is not the Itô exponential.
Extension I of the Itô Lemma
Assume that f(t, x) has continuous partial derivatives of at least second order. Let
f_i(t, x) = ∂f(x_1, x_2)/∂x_i |_{x_1 = t, x_2 = x}, f_{ij}(t, x) = ∂^2 f(x_1, x_2)/(∂x_i ∂x_j) |_{x_1 = t, x_2 = x},
for i, j = 1, 2.
Itô Lemma
For any s < t,
f(t, B_t) − f(s, B_s) = ∫_s^t [f_1(x, B_x) + (1/2) f_{22}(x, B_x)] dx + ∫_s^t f_2(x, B_x) dB_x.
Example: Let f(t, x) = e^{x − 0.5t}; then
e^{B_t − 0.5t} − e^{B_s − 0.5s} = ∫_s^t e^{B_x − 0.5x} dB_x.
e^{B_t − 0.5t} is called the Itô exponential.
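Since the Itô exponential is a stochastic integral, it is a martingale with constant expectation 1; a quick Monte Carlo check (time point and sample size are our choices):

```python
import numpy as np

# Sketch: the Ito exponential exp(B_t - t/2) is a martingale started at 1,
# so E exp(B_t - t/2) = 1 for every t; check at t = 1.
rng = np.random.default_rng(6)
t = 1.0
B_t = rng.normal(0.0, np.sqrt(t), size=1_000_000)   # B_t ~ N(0, t)
M_t = np.exp(B_t - 0.5 * t)
print(float(M_t.mean()))                             # close to 1
```

The -0.5t correction is exactly what cancels the extra (1/2)∫e^{B_x} dx term seen in Example 3 above.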
Geometric Brownian Motion
Consider a particular form of geometric Brownian motion
X_t = f(t, B_t) = e^{(c − 0.5σ^2)t + σB_t},
where c and σ > 0 are constants.
An application of the Itô lemma yields that the process X satisfies
X_t − X_0 = c∫_0^t X_s ds + σ∫_0^t X_s dB_s.
Symbolically, in differential form,
dX_t = σX_t dB_t + cX_t dt.
This is the process suggested by Black, Scholes and Merton.
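The closed form makes this process easy to simulate exactly; a sketch (the parameter values are our illustrative choices) that also checks the mean EX_t = X_0 e^{ct}, which follows by taking expectations in the integral equation:

```python
import numpy as np

# Sketch: exact simulation of geometric Brownian motion
# X_t = X_0 exp((c - 0.5 sigma^2) t + sigma B_t), with a check of E X_t = X_0 e^{ct}.
rng = np.random.default_rng(7)
X0, c, sigma, t = 1.0, 0.05, 0.2, 1.0
B_t = rng.normal(0.0, np.sqrt(t), size=500_000)
X_t = X0 * np.exp((c - 0.5 * sigma**2) * t + sigma * B_t)
print(float(X_t.mean()), X0 * np.exp(c * t))
```

No time-stepping is needed here: since the strong solution is an explicit function of B_t, sampling B_t directly gives exact draws of X_t.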
Itô Processes
Let B = (B_t, t ≥ 0) denote Brownian motion and F_t = σ(B_s, s ≤ t) denote the corresponding natural filtration. Consider
X_t = X_0 + ∫_0^t A_s^(1) ds + ∫_0^t A_s^(2) dB_s,
or symbolically in differential form,
dX_t = A_t^(1) dt + A_t^(2) dB_t,
where the processes A^(1) and A^(2) are adapted to the Brownian filtration (F_t, t ≥ 0).
A process X with this representation is called an Itô process.
One can show that the processes A^(1) and A^(2) are uniquely determined in the sense that, if X has the above representation with the A^(i) replaced by adapted processes D^(i), then A^(i) and D^(i) necessarily coincide.
Geometric Brownian motion is an Itô process with A^(1) = cX and A^(2) = σX.
Extension II of the Itô Lemma
Let f(t, x) be a function whose second-order partial derivatives are continuous.
Itô Lemma
If dX_t = A_t^(1) dt + A_t^(2) dB_t, then for any s < t,
f(t, X_t) − f(s, X_s) = ∫_s^t A_y^(2) f_2(y, X_y) dB_y + ∫_s^t [f_1(y, X_y) + A_y^(1) f_2(y, X_y) + (1/2)(A_y^(2))^2 f_{22}(y, X_y)] dy.
This Itô lemma is frequently given in the following form:
f(t, X_t) − f(s, X_s) = ∫_s^t f_2(y, X_y) dX_y + ∫_s^t [f_1(y, X_y) + (1/2)(A_y^(2))^2 f_{22}(y, X_y)] dy.
Extension III of the Itô Lemma
Let X^(1) and X^(2) be two Itô processes and f(t, x_1, x_2) be a function whose second-order partial derivatives are continuous.
Itô Lemma
f(t, X_t^(1), X_t^(2)) − f(s, X_s^(1), X_s^(2)) = ∫_s^t f_1(y, X_y^(1), X_y^(2)) dy
+ ∑_{i=2}^3 ∫_s^t f_i(y, X_y^(1), X_y^(2)) dX_y^(i−1)
+ (1/2) ∑_{i=2}^3 ∑_{j=2}^3 ∫_s^t f_{ij}(y, X_y^(1), X_y^(2)) A_y^(2,i−1) A_y^(2,j−1) dy,
where, for i = 1, 2,
dX_t^(i) = A_t^(1,i) dt + A_t^(2,i) dB_t.
Stochastic Integration by Parts
Consider the function f(t, x_1, x_2) = x_1 x_2; then we obtain
Integration by Parts Formula
d(X_t^(1) X_t^(2)) = X_t^(2) dX_t^(1) + X_t^(1) dX_t^(2) + A_t^(2,1) A_t^(2,2) dt,
where, for i = 1, 2, dX_t^(i) = A_t^(1,i) dt + A_t^(2,i) dB_t.
Example: Consider X_t^(1) = e^t − 1 = ∫_0^t e^s ds and X_t^(2) = B_t = ∫_0^t dB_s. Integration by parts yields
∫_0^t e^s dB_s = e^t B_t − ∫_0^t B_s e^s ds.
More generally, for any continuously differentiable function f,
∫_0^t f(s) dB_s = f(t)B_t − ∫_0^t f′(s) B_s ds.
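Both sides of the example identity can be approximated by sums on a fine grid along one simulated path; a sketch (grid size is our choice) with f(s) = e^s on [0, 1]:

```python
import numpy as np

# Sketch: pathwise check of int_0^t e^s dB_s = e^t B_t - int_0^t e^s B_s ds,
# approximating the stochastic integral by a left-endpoint Ito sum and the
# ordinary integral by a left-endpoint Riemann sum on the same grid.
rng = np.random.default_rng(8)
t, n = 1.0, 200_000
s = np.linspace(0.0, t, n + 1)                     # grid points s_0, ..., s_n
dB = rng.normal(0.0, np.sqrt(t / n), size=n)       # Brownian increments
B = np.concatenate([[0.0], dB.cumsum()])
lhs = float(np.sum(np.exp(s[:-1]) * dB))                               # int e^s dB_s
rhs = float(np.exp(t) * B[-1] - np.sum(np.exp(s[:-1]) * B[:-1]) * (t / n))
print(lhs, rhs)
```

The two approximations agree up to discretization error of order 1/n, path by path.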
Outline
The Stratonovich and Other Integrals
Itô Stochastic Differential Equations
Sums with the Values at the Center of Subintervals
There is a large variety of other integrals, the Itô integral being just one member of this family.
Let B = (B_t, t ≥ 0) denote Brownian motion and F_t = σ(B_s, s ≤ t) denote the corresponding natural filtration. Consider a partition of [0,T]:
τ_n: 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = T.
Consider first C_t = f(B_t), t ∈ [0,T], for a twice differentiable function f.
Define the Riemann-Stieltjes sums
S_n = ∑_{i=1}^n f(B_{y_i}) ∆_i B, where y_i = (t_{i−1} + t_i)/2.
The mean square limit of the Riemann-Stieltjes sums S_n exists as mesh(τ_n) → 0.
The Stratonovich Integral
Definition
Assume that ∫_0^T E[f(B_t)]^2 dt < ∞. The mean square limit of the sums S_n is called the Stratonovich integral, and is denoted by
S_t(f(B)) = ∫_0^t f(B_s) ∘ dB_s, t ≤ T.
1. Let f(x) = x; then the Itô integral is ∫_0^t B_x dB_x = (1/2)(B_t^2 − t).
2. Let f(x) = x and consider S_n = ∑_{i=1}^n B_{(t_{i−1}+t_i)/2}(B_{t_i} − B_{t_{i−1}}). The mean square limit of the Riemann-Stieltjes sums S_n is (1/2)B_t^2, and thus the Stratonovich integral is
∫_0^t B_s ∘ dB_s = (1/2)B_t^2.
Relation between Itô and Stratonovich Integrals
Assume that ∫_0^T E[f(B_t)]^2 dt < ∞ and ∫_0^T E[f′(B_t)]^2 dt < ∞.
Observe the Taylor expansion
f(B_{(t_{i−1}+t_i)/2}) = f(B_{t_{i−1}}) + f′(B_{t_{i−1}})(B_{y_i} − B_{t_{i−1}}) + · · ·,
where we neglect higher-order terms. Then the Riemann-Stieltjes sums can be written as follows:
∑_{i=1}^n f(B_{y_i}) ∆_i B = S_n^(1) + S_n^(2) + S_n^(3),
1. S_n^(1) = ∑_{i=1}^n f(B_{t_{i−1}}) ∆_i B converges in the mean square sense to the Itô integral ∫_0^t f(B_s) dB_s,
2. S_n^(2) = ∑_{i=1}^n f′(B_{t_{i−1}})(B_{y_i} − B_{t_{i−1}})^2 converges in the mean square sense to (1/2)∫_0^t f′(B_s) ds,
3. S_n^(3) = ∑_{i=1}^n f′(B_{t_{i−1}})(B_{y_i} − B_{t_{i−1}})(B_{t_i} − B_{y_i}) converges in the mean square sense to 0.
Chain Rule for the Stratonovich Integrals
Transformation Formula
∫_0^t f(B_s) ∘ dB_s = ∫_0^t f(B_s) dB_s + (1/2)∫_0^t f′(B_s) ds.
Chain Rule
The Stratonovich stochastic integral satisfies the chain rule of classical calculus:
∫_0^t g′(B_s) ∘ dB_s = g(B_t) − g(B_0).
1. The Stratonovich stochastic integral (S_t(f(B)), t ≤ T) does not constitute a martingale, but obeys the "nice" classical chain rule.
2. The Itô integral does not obey the classical chain rule, but is a martingale that offers rich structural properties.
A More General Transformation Formula
Consider
C_t = f(t, X_t), t ∈ [0,T],
where f(t, x) is a function with continuous partial derivatives of order two. The process X is supposed to be an Itô process given by the stochastic differential equation
X_t = X_0 + ∫_0^t a(s, X_s) ds + ∫_0^t b(s, X_s) dB_s,
where the continuous functions a(t, x) and b(t, x) satisfy some regularity conditions.
Theorem
∫_0^t f(s, X_s) ∘ dB_s = ∫_0^t f(s, X_s) dB_s + (1/2)∫_0^t b(s, X_s) f_2(s, X_s) ds.
Another Approximation
Let f(t, x) be a function whose second-order partial derivatives are continuous. Assume that
∫_0^T E[f(t, X_t)]^2 dt < ∞.
Consider the approximating Riemann-Stieltjes sums
S_n = ∑_{i=1}^n f(t_{i−1}, (X_{t_{i−1}} + X_{t_i})/2) ∆_i B.
One can show that this definition is consistent with the previous one, S_n with f(t, x) = f(x) and X = B.
p-Stochastic Integrals, 0 ≤ p ≤ 1
Let the process (C_t, t ∈ [0,T]) be adapted to Brownian motion B. Consider
S_n^p = ∑_{i=1}^n C_{t_{i−1} + p(t_i − t_{i−1})} ∆_i B.
Under some regularity conditions, the mean square limit of the Riemann-Stieltjes sums S_n^p, as mesh(τ_n) → 0, exists. This mean square limit is called the p-stochastic integral and is denoted by (p)-∫_0^T C_s dB_s.
1. If p = 0, we obtain the Itô integral. If p = 0.5, we obtain the Stratonovich integral.
2. For non-trivial integrands C, the values (p)-∫_0^T C_s dB_s differ for distinct p.
3. For example, using arguments similar to the Itô and Stratonovich cases, one can show that
(p)-∫_0^T B_s dB_s = 0.5B_T^2 + (p − 0.5)T.
Itô Stochastic Differential Equations
Let B = (B_t, t ≥ 0) be Brownian motion. The randomness in the differential equation is introduced via a perturbed initial condition and an additional random noise term:
dX_t = a(t, X_t) dt + b(t, X_t) dB_t, X_0(ω) = Y(ω),
where a(t, x) and b(t, x) are deterministic functions.
An Itô stochastic differential equation with driving process B is given in integral form by
X_t = X_0 + ∫_0^t a(s, X_s) ds + ∫_0^t b(s, X_s) dB_s, X_0(ω) = Y(ω).
It is possible to replace the driving process B by a semimartingale; this class contains Brownian motion and a large variety of jump processes. Semimartingales are useful tools when one is interested in modeling the jump character of real-life processes, e.g., the strong oscillations of foreign exchange rates or crashes of the stock market.
Diffusions: Strong and Weak Solutions
A strong solution is a stochastic process (X_t, t ≥ 0) that satisfies the following conditions:
1. X is adapted to Brownian motion.
2. The integrals in the integral form are well defined as Riemann or Itô stochastic integrals, respectively.
3. X is a function of the underlying Brownian sample path and of the coefficient functions a(t, x) and b(t, x).
For weak solutions the path behavior is not essential; we are only interested in the distribution of X. The initial condition X_0 and the coefficient functions a(t, x) and b(t, x) are given, and we have to find a Brownian motion such that the Itô stochastic differential equation holds.
We only consider strong solutions.
Existence of Strong Solutions
Theorem
A unique strong solution of an Itô stochastic differential equation with driving process B exists on [0, T] if
1. The initial condition X_0 has a finite second moment and is independent of B.
2. The coefficient functions a(t, x) and b(t, x) are continuous.
3. The coefficient functions a(t, x) and b(t, x) satisfy a Lipschitz condition with respect to the second variable.
Example: A linear Itô stochastic differential equation given by

X_t = X_0 + ∫_0^t (c_1 X_s + c_2)ds + ∫_0^t (σ_1 X_s + σ_2)dB_s,  X_0(ω) = Y(ω),

has a unique strong solution.
Linear Itô SDE with Multiplicative Noise

Consider the linear Itô stochastic differential equation

X_t = X_0 + c∫_0^t X_s ds + σ∫_0^t X_s dB_s,  X_0(ω) = Y(ω).
Let X_t = f(t, B_t) for some smooth function f. From the Itô lemma, we have

cf(t, x) = f_1(t, x) + (1/2)f_22(t, x),  σf(t, x) = f_2(t, x).

If f(t, x) = g(t)h(x) is separable, then we have

f(t, x) = g(0)h(0)e^{(c−0.5σ²)t+σx}.
The unique strong solution is given by the geometric Brownian motion:

X_t = X_0 e^{(c−0.5σ²)t+σB_t}.
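A quick Monte Carlo sanity check of this solution (my own sketch, with arbitrary parameter values): since E e^{σB_t} = e^{0.5σ²t}, the Itô correction −0.5σ²t in the exponent is exactly what makes E X_t = X_0 e^{ct}.

```python
import math
import random

# Monte Carlo check of the geometric Brownian motion solution
# X_t = X_0 * exp((c - 0.5*sigma^2) t + sigma*B_t): its mean should be
# E X_t = X_0 * exp(c t).
rng = random.Random(0)
c, sigma, x0, t, n = 0.1, 0.3, 1.0, 1.0, 200_000

total = 0.0
for _ in range(n):
    bt = rng.gauss(0.0, math.sqrt(t))           # B_t ~ N(0, t)
    total += x0 * math.exp((c - 0.5 * sigma**2) * t + sigma * bt)
mc_mean = total / n

exact_mean = x0 * math.exp(c * t)   # = e^0.1, about 1.105
```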
Langevin Equation
Linear SDE with Additive Noise
X_t = X_0 + c∫_0^t X_s ds + σ∫_0^t dB_s,  t ∈ [0, T],

where c is a constant.
In differential form, dX_t = cX_t dt + σdB_t.
This resembles a time series (an autoregressive process of order 1):

X_{t+1} − X_t = cX_t + σ(B_{t+1} − B_t),  or  X_{t+1} = φX_t + Z_t,

where φ = c + 1 and Z_t = σ(B_{t+1} − B_t) ∼ N(0, σ²). This time series model can be considered a discrete analogue of the solution to the Langevin equation.
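The discrete analogue can be simulated directly. The sketch below (my own illustration, with made-up values c = −0.5, σ = 1) iterates the AR(1) recursion in the mean-reverting case |φ| < 1 and compares the sample variance with the stationary variance σ²/(1 − φ²).

```python
import random

# Discrete analogue of the Langevin equation: the AR(1) recursion
# X_{n+1} = phi * X_n + Z_n with phi = c + 1 and Z_n ~ N(0, sigma^2).
# For c < 0 (so |phi| < 1) the recursion is mean-reverting, with
# stationary variance sigma^2 / (1 - phi^2).
rng = random.Random(1)
c, sigma = -0.5, 1.0
phi = c + 1.0                       # phi = 0.5
x, xs = 0.0, []
for _ in range(100_000):
    x = phi * x + rng.gauss(0.0, sigma)
    xs.append(x)

sample_var = sum(v * v for v in xs) / len(xs)
stationary_var = sigma**2 / (1.0 - phi**2)   # = 4/3, about 1.333
```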
Ornstein-Uhlenbeck Process
The unique strong solution of the Langevin equation is given by

X_t = e^{ct}X_0 + σe^{ct}∫_0^t e^{−cs}dB_s.

For a constant initial condition X_0, this process is called an Ornstein-Uhlenbeck process.
The Ornstein-Uhlenbeck process is a Gaussian process.
If X_0 = 0, then

EX_t = 0,  cov(X_t, X_s) = (σ²/(2c))(e^{c(t+s)} − e^{c(t−s)}),  s < t.
Example: Two Independent Driving Brownian Motions

Let B^(i) = (B^(i)_t, t ≥ 0), i = 1, 2, be two independent Brownian motions and σ_i, i = 1, 2, positive real numbers. Define the process

B̃_t = (σ_1² + σ_2²)^{−1/2}(σ_1 B^(1)_t + σ_2 B^(2)_t).
B̃_t is a Brownian motion, because it has exactly the same expectation and covariance functions as standard Brownian motion: E(B̃_t) = 0, cov(B̃_t, B̃_s) = min(s, t).
Consider the integral equation

X_t = X_0 + c∫_0^t X_s ds + σ_1∫_0^t X_s dB^(1)_s + σ_2∫_0^t X_s dB^(2)_s
    = X_0 + c∫_0^t X_s ds + (σ_1² + σ_2²)^{1/2}∫_0^t X_s dB̃_s

for some constants c, σ_1 and σ_2.
The solution: X_t = X_0 e^{[c−0.5(σ_1²+σ_2²)]t + σ_1 B^(1)_t + σ_2 B^(2)_t}.
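The scaling of the combined process can be checked numerically. The sketch below (my own, with arbitrary σ_1, σ_2 and t) samples the combination at a fixed time and verifies that its variance matches that of standard Brownian motion, var(B̃_t) = t.

```python
import math
import random

# Check that B~_t = (sigma1^2 + sigma2^2)^(-1/2) (sigma1*B1_t + sigma2*B2_t)
# has the variance t of standard Brownian motion (illustrative parameters).
rng = random.Random(7)
sigma1, sigma2, t, n = 2.0, 3.0, 1.5, 200_000
norm = math.sqrt(sigma1**2 + sigma2**2)

total_sq = 0.0
for _ in range(n):
    b1 = rng.gauss(0.0, math.sqrt(t))   # B1_t ~ N(0, t), independent of B2_t
    b2 = rng.gauss(0.0, math.sqrt(t))
    bt = (sigma1 * b1 + sigma2 * b2) / norm
    total_sq += bt * bt
sample_var = total_sq / n               # should be close to t = 1.5
```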
Outline
Solving Itô Differential Equations via Stratonovich Calculus
The General Linear Differential Equation
Converting Itô's to Stratonovich's

Consider the Itô differential equation

X_t = X_0 + ∫_0^t a(s, X_s)ds + ∫_0^t b(s, X_s)dB_s,  t ∈ [0, T],

where the coefficient functions a(t, x) and b(t, x) satisfy the regularity conditions for existence and uniqueness of a strong solution.
Using the transformation formula for Stratonovich integrals in terms of Itô and Riemann integrals, we have

∫_0^t b(s, X_s)dB_s = ∫_0^t b(s, X_s) ∘ dB_s − (1/2)∫_0^t b(s, X_s)b_2(s, X_s)ds.

We then arrive at the equivalent Stratonovich stochastic differential equation:

X_t = X_0 + ∫_0^t ã(s, X_s)ds + ∫_0^t b(s, X_s) ∘ dB_s,  t ∈ [0, T],

where ã(t, x) = a(t, x) − (1/2)b(t, x)b_2(t, x).
The Stratonovich Version of the Itô Lemma

Now consider the stochastic process Y_t = u(t, X_t) for some smooth function u(t, x). Using the Itô lemma, we have

Y_t = Y_0 + ∫_0^t (u_1 + au_2 + (1/2)b²u_22)ds + ∫_0^t bu_2 dB_s.

Applying the transformation formula for f = bu_2 and f_2 = b_2u_2 + bu_22, we obtain

Theorem

Y_t = Y_0 + ∫_0^t [u_1 + (a − 0.5bb_2)u_2]ds + ∫_0^t bu_2 ∘ dB_s.

This formula is the exact analogue of the classical chain rule for a twice differentiable function u(t, x), evaluated at x(t) satisfying

dx(t) = [a(t, x(t)) − 0.5b(t, x(t))b_2(t, x(t))]dt + b(t, x(t))dc(t),

where c(t) is a differentiable function.
A Typical Scheme via Stratonovich Integrals

Consider an Itô stochastic differential equation

X_t = X_0 + ∫_0^t [q f(X_s) + (1/2)f(X_s)f′(X_s)]ds + ∫_0^t f(X_s)dB_s.

The equivalent Stratonovich stochastic differential equation is

X_t = X_0 + ∫_0^t q f(X_s)ds + ∫_0^t f(X_s) ∘ dB_s.

This corresponds to the deterministic differential equation

dx(t) = q f(x(t))dt + f(x(t))dc(t),

where c(t) is a differentiable function.
Separating variables, we have

g(x(t)) − g(x(0)) := ∫_{x(0)}^{x(t)} dx/f(x) = qt + c(t) − c(0).

The solution is given by g(X_t) − g(X_0) = qt + B_t.
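As a concrete instance of this scheme (a worked example of my own, not from the lecture), take f(x) = x, so that the Itô equation reads dX_t = (q + 1/2)X_t dt + X_t dB_t. Separation of variables gives

```latex
g(x) := \int^{x} \frac{dy}{f(y)} = \int^{x} \frac{dy}{y} = \ln x ,
\qquad
g(X_t) - g(X_0) = qt + B_t
\;\Longrightarrow\;
X_t = X_0 \, e^{\,qt + B_t} .
```

This agrees with the geometric Brownian motion solution obtained earlier: with σ = 1 and c = q + 1/2, X_t = X_0 e^{(c−0.5)t+B_t} = X_0 e^{qt+B_t}.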
The General Linear Differential Equation
Linear Differential Equation
X_t = X_0 + ∫_0^t (c_1(s)X_s + c_2(s))ds + ∫_0^t (σ_1(s)X_s + σ_2(s))dB_s,  t ∈ [0, T].

1. If the (deterministic) coefficient functions c_i and σ_i are continuous, then the existence and uniqueness conditions guarantee that it has a unique strong solution.
2. It is particularly attractive because it has an explicit solution in terms of the coefficient functions and of the underlying Brownian sample path.
3. We derive this solution by repeated use of different variants of the Itô lemma.
Linear Equations with Additive Noise

Consider

X_t = X_0 + ∫_0^t (c_1(s)X_s + c_2(s))ds + ∫_0^t σ_2(s)dB_s,  t ∈ [0, T].

The process X is not directly involved in the stochastic integral.
Try Y_t = f(t, X_t) := y(t)X_t, where y(t) = e^{−∫_0^t c_1(s)ds}.
An application of the Itô lemma yields dY_t = d(y(t)X_t) = c_2(t)y(t)dt + σ_2(t)y(t)dB_t.

The solution:

X_t = (y(t))^{−1}(X_0 + ∫_0^t c_2(s)y(s)ds + ∫_0^t σ_2(s)y(s)dB_s).

Observe that ∫_0^t σ_2(s)y(s)dB_s is Gaussian with variance ∫_0^t σ_2²(s)y²(s)ds. If X_0 is a constant, X is a Gaussian process.
The Vasicek Interest Rate Model
Let r_t denote the instantaneous interest rate at time t for borrowing and lending money. In the Vasicek model, r_t is described by

dr_t = c[μ − r_t]dt + σdB_t,  t ∈ [0, T],

where c, μ and σ are positive constants.
r_t reverts to the mean μ: whenever r_t deviates from μ, it is drawn back toward μ, and the speed at which this happens is proportional to |μ − r_t| adjusted by the parameter c.
The volatility parameter σ is a measure of the order of magnitude of the fluctuations of r_t around μ.
The Vasicek Process
The solution of the Vasicek interest rate model is

r_t = r_0 e^{−ct} + μ(1 − e^{−ct}) + σe^{−ct}∫_0^t e^{cs}dB_s.

1. If r_0 is a constant, then r is a Gaussian process with

E r_t = r_0 e^{−ct} + μ(1 − e^{−ct}),  var(r_t) = (σ²/(2c))(1 − e^{−2ct}).

2. For μ = 0 we obtain an Ornstein-Uhlenbeck process.
3. As t → ∞, r_t →_d N(μ, σ²/(2c)).
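The Gaussian transition of the Vasicek process makes exact simulation easy. The sketch below (my own; the parameter values are arbitrary) steps many independent paths to a time well past the mean-reversion scale 1/c and checks the limiting N(μ, σ²/(2c)) distribution.

```python
import math
import random

# Exact simulation of the Vasicek process via its Gaussian transition:
# given r_t, r_{t+h} = r_t e^{-c h} + mu (1 - e^{-c h}) + eps, where
# eps ~ N(0, sigma^2 (1 - e^{-2 c h}) / (2 c)).  We check the stationary
# limit r_t ->_d N(mu, sigma^2 / (2 c)) on many independent long paths.
rng = random.Random(3)
c, mu, sigma = 2.0, 0.05, 0.1
h, steps, npaths = 0.1, 100, 20_000

decay = math.exp(-c * h)
noise_sd = math.sqrt(sigma**2 * (1.0 - math.exp(-2.0 * c * h)) / (2.0 * c))

finals = []
for _ in range(npaths):
    r = 0.0                                  # start away from the mean mu
    for _ in range(steps):
        r = r * decay + mu * (1.0 - decay) + rng.gauss(0.0, noise_sd)
    finals.append(r)

mean_r = sum(finals) / npaths                # should approach mu = 0.05
var_r = sum((r - mean_r) ** 2 for r in finals) / npaths
# stationary variance sigma^2/(2c) = 0.0025
```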
Homogeneous Equations with Multiplicative Noise

Consider

X_t = X_0 + ∫_0^t c_1(s)X_s ds + ∫_0^t σ_1(s)X_s dB_s,  t ∈ [0, T].

Without loss of generality, we assume that X_0 = 1.
Since we expect an exponential form of the solution, we assume that X_t > 0 for all t. Let Y_t = f(X_t) := ln X_t.
Applying the Itô lemma, we obtain

dY_t = [c_1(t) − 0.5σ_1²(t)]dt + σ_1(t)dB_t.

The solution:

X_t = X_0 exp{∫_0^t [c_1(s) − 0.5σ_1²(s)]ds + ∫_0^t σ_1(s)dB_s}.

Example: If c_1(t) = c and σ_1(t) = σ, we get the geometric Brownian motion.
The General Case

Consider

X_t = X_0 + ∫_0^t (c_1(s)X_s + c_2(s))ds + ∫_0^t (σ_1(s)X_s + σ_2(s))dB_s,  t ∈ [0, T].

1. Let Y denote the solution of the homogeneous stochastic differential equation with Y_0 = 1.
2. Consider X^(1)_t = Y_t^{−1} and X^(2)_t = X_t. Applying the Itô lemma to X^(1)_t = Y_t^{−1}, we have

dX^(1)_t = [−c_1(t) + σ_1²(t)]X^(1)_t dt − σ_1(t)X^(1)_t dB_t.

3. An appeal to the integration by parts formula yields

d(X^(1)_t X^(2)_t) = [c_2(t) − σ_1(t)σ_2(t)]X^(1)_t dt + σ_2(t)X^(1)_t dB_t.

The solution:

X_t = Y_t(X_0 + ∫_0^t [c_2(s) − σ_1(s)σ_2(s)]Y_s^{−1}ds + ∫_0^t σ_2(s)Y_s^{−1}dB_s).
The Expectation and Variance of the Solution

Consider again

X_t = X_0 + ∫_0^t (c_1(s)X_s + c_2(s))ds + ∫_0^t (σ_1(s)X_s + σ_2(s))dB_s,  t ∈ [0, T].

Let μ_X(t) = E(X_t).
1. Take expectations on both sides and notice that the stochastic integral has expectation zero. Hence

μ_X(t) = μ_X(0) + ∫_0^t (c_1(s)μ_X(s) + c_2(s))ds.

2. This corresponds to the general linear differential equation

μ′_X(t) = c_1(t)μ_X(t) + c_2(t).

3. The variance function can be obtained similarly.
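The mean function can thus be computed by ordinary deterministic numerical integration, with no path simulation at all. A minimal sketch (constant coefficients of my own choosing) Euler-integrates μ′ = c_1 μ + c_2 and compares against the exact solution:

```python
import math

# The mean mu_X(t) of the general linear SDE solves the deterministic ODE
# mu'(t) = c1(t) mu(t) + c2(t).  For constant c1, c2 the exact solution is
# mu(t) = mu(0) e^{c1 t} + (c2 / c1)(e^{c1 t} - 1); a simple Euler
# integration of the ODE should reproduce it (made-up constants).
c1, c2, mu0, T, n = 0.5, 0.2, 1.0, 1.0, 100_000

dt = T / n
mu = mu0
for _ in range(n):
    mu += (c1 * mu + c2) * dt          # Euler step for mu' = c1*mu + c2

exact = mu0 * math.exp(c1 * T) + (c2 / c1) * (math.exp(c1 * T) - 1.0)
```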
Outline
The Black-Scholes Option Pricing Formula
Change of Measures
Extensions and Limitations of the Model
A Short Excursion into Finance
Let X_t denote the price of a risky asset (let's call it a stock) at time t.
Assume that the relative return from the asset over the period [t, t + dt] has a linear trend c dt which is disturbed by a stochastic noise term σdB_t:

(X_{t+dt} − X_t)/X_t = c dt + σdB_t,  or  dX_t = cX_t dt + σX_t dB_t.

The constant c > 0 is the so-called mean rate of return, and σ > 0 is the volatility.
Observe that this is a crude, first-order approximation to a real price process. But people in economics believe in exponential growth, and they are often happy with this model.
Trading Strategy

Assume that you also have a non-risky asset such as a bank account, which can be called a bond. Let β_t denote the bond value at time t. Say your initial capital is β_0, which will be continuously compounded at a constant interest rate r > 0. That is,

dβ_t = rβ_t dt,  or  β_t = β_0 e^{rt}.

Note again that this is an idealization, since the interest rate changes over time as well.
If you have a_t shares of stock and b_t shares of bond at time t, then your portfolio at time t can be represented by (a_t, b_t), t ∈ [0, T], which is called a trading strategy.
You want to adjust your strategy according to the information available to you at time t, so as to maximize your wealth V_t = a_t X_t + b_t β_t (the value of your portfolio) at time t. So it is reasonable to assume that a_t and b_t are stochastic processes adapted to Brownian motion B.
Self-Financing Condition

a_t and b_t can be positive or negative. A negative value of a_t means a short sale of stock (i.e., you sell the stock at time t). A negative value of b_t means that you borrow money at the bond's riskless interest rate r.
For simplicity we neglect transaction costs for operations on stock and bond.
Assume that you spend no money on other purposes (such as food), i.e., you do not make your portfolio smaller by consumption.
We assume finally that your trading strategy (a_t, b_t) is self-financing. That is, the increments of your wealth V_t result only from changes of the prices X_t and β_t of your assets:

dV_t = a_t dX_t + b_t dβ_t = (ca_t X_t + rb_t β_t)dt + σa_t X_t dB_t,

V_t = V_0 + ∫_0^t (ca_s X_s + rb_s β_s)ds + ∫_0^t σa_s X_s dB_s.
Option

An option at time t = 0 is a "ticket" which entitles you to buy one share of stock until or at time T, the time of maturity or time of expiration of the option.
If you can exercise this option (or exercise the call) at a fixed price K, called the exercise price or strike price of the option, only at the time of maturity T, it is called a European call option. If you can exercise it until or at time T, it is called an American call option. There are many other kinds.
The purchaser of a European call option is entitled to a payment of

(X_T − K)^+ = max(0, X_T − K).

We illustrate option pricing using European call options.
A put is an option to sell stock at a strike price K on or until a particular date of maturity T. A European put option is exercised only at the time of maturity, with profit (K − X_T)^+, and an American put can be exercised until or at time T.
Option Pricing
Since you do not know the price X_T at time t = 0 when you purchase the call, a natural question arises:

How much would you be willing to pay for such a ticket, i.e., what is a rational price for this option at time t = 0?

Black, Scholes and Merton responded as follows:
1. After investing this rational value of money in stock and bond at time t = 0, you can manage your portfolio according to a self-financing strategy so as to yield the same payoff (X_T − K)^+ as if the option had been purchased.
2. If the option were offered at any price other than this rational value, there would be an opportunity for arbitrage, i.e., for unbounded profits without an accompanying risk of loss.
Hedging Against the Contingent Claim

Goal: Find a self-financing strategy (a_t, b_t) and a wealth process V_t such that

V_t = a_t X_t + b_t β_t = u(T − t, X_t),  t ∈ [0, T],

for some smooth (a technical assumption) deterministic function u(t, x) with the terminal condition

V_T = u(0, X_T) = (X_T − K)^+.

That is, we hedge against the contingent claim (X_T − K)^+.
1. Apply the Itô lemma to the wealth process V_t = u(T − t, X_t) to obtain an integral representation.
2. Plug b_t = (V_t − a_t X_t)/β_t into the self-financing condition to obtain another integral representation.
3. From these two integral representations, derive a PDE with the condition u(0, x) = (x − K)^+, x > 0.
Black-Scholes PDE

We obtain (the first argument of u being time to maturity)

0.5σ²x²u_22(t, x) + rxu_2(t, x) − u_1(t, x) − ru(t, x) = 0

with boundary conditions

u(t, 0) = 0,  lim_{x→∞} u(t, x)/x = 1  ∀t ∈ [0, T];  u(0, x) = (x − K)^+  ∀x ≥ 0.

Transform the equation into a diffusion equation by using θ = T − t, y = log(x/K) + (r − σ²/2)θ, w(θ, y) = e^{rθ}u(t, x).
We arrive at a heat equation

∂w/∂θ = (σ²/2) ∂²w/∂y²

with the initial condition w(0, y) = K(e^y − 1)^+.
Using the heat kernel, we have

w(θ, y) = (2πσ²θ)^{−1/2}∫_{−∞}^{∞} w(0, z)e^{−(y−z)²/(2σ²θ)}dz.
Black-Scholes-Merton Approach

The explicit solution can be simplified as

u(t, x) = xΦ(g(t, x)) − Ke^{−rt}Φ(h(t, x)),

where Φ is the standard normal distribution function, and

g(t, x) = [ln(x/K) + (r + 0.5σ²)t]/(σt^{1/2}),  h(t, x) = g(t, x) − σt^{1/2}.

Black-Scholes Option Pricing Formula
A rational price at time t = 0 for a European call option with exercise price K is

V_0 = X_0Φ(g(T, X_0)) − Ke^{−rT}Φ(h(T, X_0)).

The stochastic process V_t = u(T − t, X_t) is the value of your self-financing portfolio with trading strategy

a_t = u_2(T − t, X_t) > 0,  b_t = [u(T − t, X_t) − a_t X_t]/β_t.
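The pricing formula is straightforward to implement: Φ can be expressed through the error function as Φ(x) = (1 + erf(x/√2))/2. A sketch of my own (the parameter values are purely illustrative):

```python
import math

def norm_cdf(x):
    """Standard normal distribution function Phi via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(x, k, r, sigma, t):
    """Rational price x*Phi(g) - K*e^{-rt}*Phi(h) of a European call,
    where t is the time to maturity."""
    g = (math.log(x / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    h = g - sigma * math.sqrt(t)
    return x * norm_cdf(g) - k * math.exp(-r * t) * norm_cdf(h)

price = black_scholes_call(x=100.0, k=100.0, r=0.05, sigma=0.2, t=1.0)
```

For X_0 = K = 100, r = 0.05, σ = 0.2 and T = 1 this gives a call price of about 10.45, a standard reference value.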
The Radon-Nikodym Theorem

Consider two measures μ and ν defined on a σ-field F on Ω. μ is said to be absolutely continuous with respect to ν (denoted by μ ≪ ν) if

ν(A) = 0 implies μ(A) = 0,  ∀A ∈ F.

We say that μ and ν are equivalent measures if μ ≪ ν and ν ≪ μ.

Theorem
Assume μ and ν are two σ-finite measures. Then μ ≪ ν holds if and only if there exists a non-negative measurable function f such that

μ(A) = ∫_A f(ω)dν(ω),  ∀A ∈ F.

Moreover, f is almost everywhere unique with respect to ν. The function f is called the (relative) density of μ with respect to ν, denoted by f = dμ/dν.
Girsanov's Theorem

Let B = (B_t, t ≥ 0) be standard Brownian motion on the probability space (Ω, F, P), and F_t = σ(B_s, s ≤ t) the Brownian filtration. Consider

B̃_t = B_t + qt,  t ∈ [0, T], for some constant q.

Although B̃ is not a standard Brownian motion under P for q ≠ 0, B̃ can be shown to be a standard Brownian motion under a new probability measure Q.

Girsanov-Cameron-Martin Theorem
1. The stochastic process

M_t = exp{−qB_t − (1/2)q²t},  t ∈ [0, T],

is a martingale with respect to the natural Brownian filtration under the probability measure P.
Eliminating the Drift Term

Girsanov-Cameron-Martin Theorem
2. Q(A) = ∫_A M_T(ω)dP(ω), A ∈ F, defines a probability measure Q (called an equivalent martingale measure) on F that is equivalent to P.
3. Under the probability measure Q, the process B̃ is a standard Brownian motion.
4. The process B̃ is adapted to the filtration F_t.

Consider the linear stochastic differential equation

dX_t = cX_t dt + σX_t dB_t,  t ∈ [0, T].

With a linear drift term, X is not a martingale under P.
Define B̃_t = B_t + (c/σ)t, and we have

dX_t = σX_t d(B_t + (c/σ)t) = σX_t dB̃_t,  t ∈ [0, T].

B̃ is a standard Brownian motion under the equivalent martingale measure Q, and thus X is a martingale under Q.
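The theorem can be checked by simulation: a Q-expectation E_Q[f(B̃_T)] equals the P-expectation E_P[M_T f(B̃_T)]. The sketch below (my own, with an arbitrary q) verifies that E_P[M_T] = 1 and that B̃_T = B_T + qT has mean 0 and variance T under Q.

```python
import math
import random

# Monte Carlo check of the Girsanov-Cameron-Martin theorem: with
# M_T = exp(-q*B_T - q^2*T/2), E_P[M_T] = 1, and under Q (i.e. weighting
# by M_T) the shifted process B~_T = B_T + q*T has mean 0 and variance T.
rng = random.Random(11)
q, T, n = 0.8, 1.0, 400_000

w_sum = wb_sum = wb2_sum = 0.0
for _ in range(n):
    bT = rng.gauss(0.0, math.sqrt(T))
    m = math.exp(-q * bT - 0.5 * q * q * T)   # density dQ/dP on F_T
    bt_tilde = bT + q * T
    w_sum += m
    wb_sum += m * bt_tilde
    wb2_sum += m * bt_tilde**2

mean_m = w_sum / n                 # ~ 1 (E_P[M_T])
q_mean = wb_sum / n                # ~ 0 (E_Q[B~_T])
q_var = wb2_sum / n - q_mean**2    # ~ T (var_Q(B~_T))
```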
Significance of the Change-of-Measure Trick
If we had known the solution only for the case without a linear drift, we could have derived the solution for the case with a linear drift via the change of measure.
More significantly, X is a martingale under the equivalent martingale measure Q, and one can make use of the martingale property to prove various results about X.
In fact, this is not just a technical trick; as we demonstrate below, the change of measure provides an effective method to incorporate uncertainty and to hedge against contingent claims.
Recap: The Black-Scholes Model
The price of one share of the risky asset (stock) is described by
dXt = cXtdt + σXtdBt , t ∈ [0,T ].
The price of the riskless asset (bond) is described by
dβ_t = rβ_t dt,  t ∈ [0, T].

Portfolio = (a_t, b_t), with value V_t = a_t X_t + b_t β_t at time t.
The portfolio is self-financing: dV_t = a_t dX_t + b_t dβ_t, t ∈ [0, T].
At the time of maturity, V_T = h(X_T), where h(X_t) is the contingent claim at time t. For a European call option, h(x) = (x − K)^+, and for a European put option, h(x) = (K − x)^+.
Pricing via the Change-of-Measure

Your gain from the option at the time of maturity is h(X_T). To determine the value of this amount of money at t = 0, you have to discount it with the given interest rate r: e^{−rT}h(X_T), and take the expectation of it as the price for the option at t = 0.
You also have to discount the price of one share of stock: X̃_t = e^{−rt}X_t, t ∈ [0, T], and the Itô lemma leads to dX̃_t = σX̃_t dB̃_t, where B̃_t = B_t + ((c − r)/σ)t.
There exists an equivalent martingale measure Q which turns B̃ into a standard Brownian motion, and

X̃_t = X_0 e^{−0.5σ²t+σB̃_t}

becomes a martingale with respect to the natural Brownian filtration under Q.
The value of the portfolio at time t is given by V_t = E_Q[e^{−r(T−t)}h(X_T)|F_t], t ∈ [0, T].
At time t = 0, V_0 = E_Q[e^{−rT}h(X_T)] is a rational price of the option.
The Value of a European Option

Write θ = T − t for t ∈ [0, T].
Since X_t = X_0 e^{(r−0.5σ²)t+σB̃_t}, we have

V_t = E_Q[e^{−rθ}h(X_t e^{(r−0.5σ²)θ+σ(B̃_T−B̃_t)})|F_t].

Since σ(X_t) ⊆ F_t, X_t can be treated as a constant under F_t.
Under Q, B̃_T − B̃_t ∼ N(0, θ), and is independent of F_t.
Thus V_t = f(t, X_t), where

f(t, x) = e^{−rθ}∫_{−∞}^{∞} h(xe^{(r−0.5σ²)θ+σyθ^{1/2}})dΦ(y).

For a European call option, h(x) = (x − K)^+, and thus

f(t, x) = xΦ(z_1) − Ke^{−rθ}Φ(z_2),

z_1 = [ln(x/K) + (r + 0.5σ²)θ]/(σθ^{1/2}),  z_2 = z_1 − σθ^{1/2}.

For a European put option, f(t, x) = Ke^{−rθ}Φ(−z_2) − xΦ(−z_1).
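The risk-neutral pricing recipe can also be implemented directly by Monte Carlo: sample B̃_T under Q, form X_T, and average the discounted payoff. The sketch below (my own, with illustrative parameters) compares the simulated V_0 with the closed-form f(0, X_0).

```python
import math
import random

# Monte Carlo valuation at t = 0 under the martingale measure Q:
# V_0 = E_Q[e^{-rT}(X_T - K)^+] with X_T = X_0 e^{(r-0.5 sigma^2)T + sigma B~_T}.
# The result is compared with the closed-form call price f(0, X_0).
rng = random.Random(2024)
x0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 400_000

disc = math.exp(-r * T)
total = 0.0
for _ in range(n):
    bT = rng.gauss(0.0, math.sqrt(T))                 # B~_T under Q
    xT = x0 * math.exp((r - 0.5 * sigma**2) * T + sigma * bT)
    total += disc * max(xT - K, 0.0)
mc_price = total / n

Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
z1 = (math.log(x0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
z2 = z1 - sigma * math.sqrt(T)
bs_price = x0 * Phi(z1) - K * disc * Phi(z2)          # about 10.45
```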
Extensions and Limitations of the Model

The Black-Scholes model can be extended to variable (but deterministic) rates and volatilities.
The model may also be used to value European-style options on instruments paying dividends, and closed-form solutions are available if the dividend is a known proportion of the stock price.
The model underestimates extreme moves, which yields tail risk.
In reality, security prices do not follow a strict stationary log-normal process, nor is the risk-free interest rate actually known (and it is not constant over time).
The variance has been observed to be non-constant, leading to models such as GARCH to model volatility changes.
Pricing discrepancies between empirical prices and the Black-Scholes model have long been observed in options corresponding to extreme price changes; such events would be very rare if returns were log-normally distributed, but are observed much more often in practice.
Historical Notes
A sociologist investigating the behavior of the probability community during the early 1990s would surely report an interesting phenomenon. Many of the best minds of this (or any other) generation began concentrating their research in the area of mathematical finance. The main reason for this can be summed up in two words: option pricing. (D. Applebaum, 2004)
The Black-Scholes model is widely employed as a useful approximation, but proper application requires understanding its limitations.
The limitations and defects of the model have led many probabilists to query it.
Lévy Matters

The heavy tails of stock prices, which are incompatible with a Gaussian model, suggest that it might be fruitful to replace Brownian motion with a more general Lévy process.
A Lévy process L = (L_t, t ≥ 0) has independent and stationary increments and is stochastically continuous, i.e., lim_{t→s} P(|L_t − L_s| > ε) = 0 for any ε > 0.
Examples: Brownian motion, the Poisson process, compound Poisson processes and their "combinations".
The Lévy-Itô decomposition for a one-dimensional Lévy process:

L_t = bt + B_t + ∫_{|x|<1} x(N(t, dx) − tν(dx)) + ∫_{|x|≥1} xN(t, dx),

where N is a Poisson random measure and ν is the Lévy measure.
The small-jumps term ∫_{|x|<1} x(N(t, dx) − tν(dx)) describes the day-to-day jitter that causes minor fluctuations in stock prices, while the big-jumps term ∫_{|x|≥1} xN(t, dx) describes large stock price movements caused by major market upsets arising from, e.g., earthquakes or terrorist atrocities.
Outline
More on Change of Measures
The Feynman-Kac Formula
Construction of Risk-Neutral and Distorted Measures
The World is Incomplete.
Risk-Neutral Measure

A risk-neutral measure is a probability measure under which the underlying risky asset has the same expected return as the riskless bond (or money market account).
We often demand more for bearing uncertainty. To price assets, the calculated values need to be adjusted for the risk involved.
One way of doing this is to first take the expectation under the physical distribution and then adjust for risk.
A better way is to first adjust the probabilities of future outcomes by incorporating the effects of risk, and then take the expectation under those adjusted, 'virtual' risk-neutral probabilities.

Definition
A risk-neutral measure is a probability measure under which the current value of all financial assets at time t is equal to the expected future payoff of the asset discounted at the risk-free rate, given the information structure available at time t.
Complete Market
The existence of a risk-neutral measure involves the absence of arbitrage in a complete market.
A market is complete with respect to a trading strategy if all cash flows for the trading strategy can be replicated by a similar synthetic trading strategy.
For example, consider the put-call parity: a put is synthesized by buying the call, investing the strike at the risk-free rate, and shorting the stock.
If at some time before maturity the two differ, then someone could purchase the cheaper portfolio and immediately sell the more expensive one to make a riskless profit (since they have the same value at maturity).
In insurance markets, a complete market models the situation in which agents can buy insurance contracts to protect themselves against any future time and state of the world.
Fundamental Theorem of Arbitrage-Free Pricing

Consider a finite-state market.
1. There is no arbitrage if and only if there exists a risk-neutral measure that is equivalent to the physical probability measure.
2. In the absence of arbitrage, a market is complete if and only if there is a unique risk-neutral measure that is equivalent to the physical probability measure.

Let B = (B_t, t ≥ 0) denote standard Brownian motion and F_t the natural filtration generated by B. When the risky asset price is driven by a single Brownian motion, there is a unique risk-neutral measure Q.

Harrison-Pliska Theorem
If (r_t, t ≥ 0) is the short rate process driven by Brownian motion, and V_t is any F_t-adapted contingent claim payable at time t, then its value at time t ≤ T is given by

V_t = E_Q(e^{−∫_t^T r_u du}V_T|F_t).

The result can be extended to the case when the asset price is driven by a semi-martingale (see Delbaen and Schachermayer 1994).
A PDE Connection

Consider a parabolic partial differential equation

∂u/∂t + μ(t, x)∂u/∂x + (1/2)σ²(t, x)∂²u/∂x² = r(x)u(t, x),  x ≥ 0, t ∈ [0, T],

subject to the terminal condition u(T, x) = h(x).
The functions μ, σ, h and r are known functions, and T is a parameter.
It turns out that the solution can be expressed as a conditional expectation with respect to an Itô process starting at x:

dX_t = μ(t, X_t)dt + σ(t, X_t)dB_t,  X_0 = x.

The Feynman-Kac Formula

u(t, x) = E(e^{−∫_t^T r(X_s)ds}h(X_T)|X_t = x).

Example: For the Black-Scholes PDE of the European call option, μ(t, x) = rx, σ(t, x) = σx, r(t, x) = r and h(x) = (x − K)^+.
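The formula is easy to test in the simplest case μ = 0, σ = 1, r = 0, where X is Brownian motion and the PDE reduces to the backward heat equation u_t + 0.5u_xx = 0. For h(x) = x² the conditional expectation is known exactly: u(t, x) = x² + (T − t), which indeed satisfies that equation. A sketch of my own:

```python
import math
import random

# Feynman-Kac sanity check for mu = 0, sigma = 1, r = 0:
# u(t, x) = E[h(X_T) | X_t = x] with X Brownian motion.  For h(x) = x^2,
# E[(x + B_{T-t})^2] = x^2 + (T - t) exactly.
rng = random.Random(5)
x, t, T, n = 1.5, 0.0, 1.0, 400_000

total = 0.0
for _ in range(n):
    xT = x + rng.gauss(0.0, math.sqrt(T - t))   # X_T given X_t = x
    total += xT * xT
mc_u = total / n

exact_u = x * x + (T - t)    # = 3.25
```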
Exponential Martingales

Positive martingales play a central role in changing probability measures. Since a necessary condition for an Itô process to be a martingale is that its drift term vanishes, many continuous positive martingales used in option pricing have an exponential form in connection with Itô processes.
As usual, let X denote a solution of an Itô SDE dX_t = μ(t, X_t)dt + σ(t, X_t)dB_t.
Consider

M_t = exp{∫_0^t b_s σ(s, X_s)dB_s − (1/2)∫_0^t b_s²σ²(s, X_s)ds},

where (b_t, t ≥ 0) is an F_t-adapted stochastic process.

Novikov's Condition
The process M_t is a martingale with respect to F_t for any process b_t satisfying Novikov's condition E(exp{(1/2)∫_0^T b_s²σ²(s, X_s)ds}) < ∞.

Example: In Girsanov's Theorem, M_t = exp{−qB_t − (1/2)q²t} is a martingale.
Itô Integral Representation

Let B = (B_t, t ≥ 0) be standard Brownian motion on the probability space (Ω, F, P), and F_t = σ(B_s, s ≤ t) the Brownian filtration.
Consider an Itô process dX_t = μ(t, X_t)dt + σ(t, X_t)dB_t.
If μ = 0, X_t = X_0 + ∫_0^t σ(s, X_s)dB_s becomes a martingale with respect to F_t. Conversely, such an integral representation holds for any square integrable martingale.

Martingale Representation Theorem
If a martingale (M_t, t ≥ 0) with respect to F_t satisfies E(M_t²) < ∞ for any t ≥ 0, then there exists a unique F_t-adapted stochastic process σ_M(t) with E(σ_M²(t)) < ∞ (called the volatility process), such that

M_t = M_0 + ∫_0^t σ_M(s)dB_s.

Example: Let X be a random variable on the probability space (Ω, F_T, P) with EX² < ∞. Then X = E(X|F_T) = E(X) + ∫_0^T σ_X(s)dB_s.
Adjusted Measure: A Fundamental Idea of Distortion

We may want to price in uncertainty by adjusting the probability measure under which our expectation is taken.
For a given Itô process, this means adjusting the probability of each path of the process so that the Itô process under the new probabilities has a specific drift.
For pricing an option or a contingent claim, this often requires finding an equivalent probability measure Q under which the underlying asset price process has the same stochastic return as that of the money market account (i.e., risk-neutral) or a process of our choice (e.g., a long-term zero-coupon bond).
The Radon-Nikodym derivative dQ/dP of the adjusted measure Q with respect to the physical measure P can be viewed as a distortion factor for P that incorporates uncertainty. This distortion factor often takes an exponential form.
Example: In Girsanov's Theorem, dQ/dP = exp{−qB_T − (1/2)q²T} is a distortion factor.
An Extension of Girsanov's Theorem

Let B = (B_t, t ≥ 0) be standard Brownian motion on the probability space (Ω, F, P), and F_t = σ(B_s, s ≤ t) the Brownian filtration.
Let (b_t, t ≥ 0) denote an F_t-adapted stochastic process satisfying Novikov's condition.
Define a new probability measure Q(A) = ∫_A M_T dP, where M_t = exp{∫_0^t b_s dB_s − (1/2)∫_0^t b_s²ds}, t ∈ [0, T], is an exponential martingale with respect to F_t. Clearly, Q and P are equivalent.
The stochastic process B̃_t = −∫_0^t b_s ds + B_t, t ∈ [0, T], is standard Brownian motion under the probability measure Q.
Note that −∫_0^t b_s ds + B_t represents a stochastic process with a predetermined drift −∫_0^t b_s ds under P. To make the drift disappear, we adjust the probability of each path by multiplying by the distortion factor M_T.
Consider an Itô SDE dX_t = μ(t, X_t)dt + σ(t, X_t)dB_t under the probability measure P. It has the new drift μ(t, X_t) + σ(t, X_t)b_t under the distorted probability measure Q.
Adjust for a Specified Drift

Pricing a contingent claim often requires us to find a probability measure under which the underlying risky asset has a specified drift.
Consider an Itô SDE dX_t = μ(t, X_t)dt + σ(t, X_t)dB_t under the probability measure P.
Let μ′(t, x) be a continuous function such that

(μ′(t, x) − μ(t, x))/σ(t, x)

satisfies Novikov's condition.
Construct a new probability measure Q with the Radon-Nikodym derivative

dQ/dP = exp{∫_0^T b_s dB_s − (1/2)∫_0^T b_s²ds},  b_t = (μ′(t, X_t) − μ(t, X_t))/σ(t, X_t).

Under Q, X is a solution of the SDE dX_t = μ′(t, X_t)dt + σ(t, X_t)dB̃_t, where B̃_t is standard Brownian motion under Q.
Relation Between Bond Price and Short Rate

Consider a continuously trading bond market over [0, T].
Let P(t, s), 0 ≤ t ≤ s ≤ T, be the price of a default-free zero coupon bond at time t that pays one monetary unit at maturity s. Let P_t be the σ-field generated by the bond prices P(t, s).
The forward rate, compounded continuously for time s and determined at time t, is defined as f(t, s) = −∂ ln P(t, s)/∂s.
The short rate (i.e., the instantaneous interest rate) at time t is defined as r_t = f(t, t).
To ensure no arbitrage in the bond market, there exists a risk-neutral measure Q such that for all s ≥ 0, the discounted process

V(t, s) = e^{−∫_0^t r_u du}P(t, s),  0 ≤ t ≤ s,

is a martingale with respect to P_t.
Thus V(t, s) = E_Q(V(s, s)|P_t), which leads to P(t, s) = E_Q(e^{−∫_t^s r_u du}|P_t).
Hull-White (Extended Vasicek) Interest Rate Model

Assume that the short rate r_t follows the SDE

dr_t = κ(θ(t) − r_t)dt + σdB_t

under the risk-neutral measure Q, where the mean-reverting intensity κ is a positive constant and the long-run average θ(t) is a deterministic function.
Solving it, the short rate (Markov) process is given by

r_t = r_0 e^{−κt} + κ∫_0^t e^{−κ(t−u)}θ(u)du + σ∫_0^t e^{−κ(t−u)}dB_u.

P_t = F_t, the natural Brownian filtration.
The bond price P(t, s) can then be solved, and more generally, for any F_t-contingent claim C(s) payable at time s, its price C(t) at time t is given by C(t) = E_Q(e^{−∫_t^s r_u du}C(s)|F_t).
The closed-form expression for P(t, s) is given by the so-called affine form P(t, s) = e^{A(t,s)−B(s−t)r_t}, where A and B are explicit, deterministic and independent of the short rate.
The One-Factor Gaussian Forward Rate Model
Assume that under a risk-neutral probability measure Q, the forward rate is governed by the SDE

df(t, s) = µ(t, s)dt + σ(t, s)dB_t, 0 ≤ t ≤ s,

where the deterministic function µ(t, s) is the term structure of the forward rate drifts and the deterministic function σ(t, s) is the term structure of the forward rate volatilities.
The forward rate processes are Gaussian.
Since the discounted bond price V(t, s) is a martingale under the risk-neutral measure, we have µ(t, s) = σ(t, s) ∫_t^s σ(t, y)dy. So the term structures are uniquely determined.
Using the Itô lemma, the bond price satisfies the SDE

dP(t, s) = r_t P(t, s)dt − [∫_t^s σ(t, y)dy] P(t, s)dB_t.

This linear homogeneous equation with multiplicative noise can be solved using the standard method.
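The no-arbitrage drift restriction µ(t, s) = σ(t, s) ∫_t^s σ(t, y)dy is easy to evaluate numerically for any chosen volatility term structure. A minimal sketch (illustrative and not from the lecture; a constant volatility is assumed, for which the drift reduces to σ²(s − t)):

```python
import numpy as np

def hjm_drift(sigma_fn, t, s, n=10_001):
    """No-arbitrage drift mu(t,s) = sigma(t,s) * integral_t^s sigma(t,y) dy,
    with the inner integral evaluated by the trapezoidal rule."""
    y = np.linspace(t, s, n)
    vals = np.asarray(sigma_fn(t, y), dtype=float)
    dy = (s - t) / (n - 1)
    integral = dy * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
    return float(sigma_fn(t, s)) * integral

# Constant volatility sigma(t,y) = sig: the drift reduces to sig^2 * (s - t).
sig = 0.015
const_vol = lambda t, y: sig * np.ones_like(np.asarray(y, dtype=float))
mu = hjm_drift(const_vol, t=1.0, s=4.0)
print(mu, sig**2 * (4.0 - 1.0))
```

Any deterministic σ(t, s) can be plugged in; the drift is then fully determined, as the slide states.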
From Risk-Neutral to Forward Risk

Consider again the one-factor Gaussian forward rate model:

df(t, s) = µ(t, s)dt + σ(t, s)dB_t, 0 ≤ t ≤ s,

with µ(t, s) = σ(t, s) ∫_t^s σ(t, y)dy under the risk-neutral measure Q.
For any F_t-adapted contingent claim C(s), payable at time s, its price C(t), t ≤ s, can be obtained via expectation under Q.
Calculating the risk-neutral expectation is sometimes difficult in practice, because the joint distribution of e^{−∫_t^s r_u du} and C(s) under Q needs to be identified.
Rewrite: df(t, s) = σ(t, s)[∫_t^s σ(t, y)dy dt + dB_t], 0 ≤ t ≤ s.
Consider B^s_t = ∫_0^t b(u, s)du + B_t, where b(u, s) = ∫_u^s σ(u, y)dy (the negative of the bond price volatility), so that df(t, s) = σ(t, s)dB^s_t.
Forward Risk Adjusted Measure
Girsanov's Theorem implies that there is a probability measure Q^s, called the forward risk adjusted measure, such that B^s_t, 0 ≤ t ≤ s, is standard Brownian motion under Q^s.
Under Q^s, df(t, s) = σ(t, s)dB^s_t, 0 ≤ t ≤ s, so f(t, s) becomes a martingale.
For any F_t-contingent claim C(s) payable at time s, its discounted price e^{−∫_0^t r_u du} C(t) is a martingale under Q.
It follows from the Martingale Representation Theorem that d(e^{−∫_0^t r_u du} C(t)) = σ_C(t)dB_t for some volatility process σ_C(t).
This martingale representation, the SDE for P(t, s), and the Itô lemma imply that C(t)/P(t, s), 0 ≤ t ≤ s, is a martingale under the forward risk adjusted measure Q^s.
Hence C(t) = P(t, s) E_{Q^s}(C(s) | F_t). That is, the discount factor e^{−∫_t^s r_u du} is separated from the contingent claim payoff under Q^s.
This is useful for pension valuation, for which one often needs to evaluate the expected cash flow from a fixed income portfolio and then discount it using a yield curve.
Bond Option Pricing
Consider European call options on the zero-coupon bond P(t, T) with strike price K and maturity s, t ≤ s ≤ T.
The payoff of the option is (P(s, T) − K)+.
The forward rate f(t, s) follows the one-factor Gaussian model.
The process P(t, T)/P(t, s) is a martingale under the forward risk adjusted measure Q^s, and satisfies

d(P(t, T)/P(t, s)) = −(P(t, T)/P(t, s)) [∫_s^T σ(t, y)dy] dB^s_t.

Hence P(s, T) = P(s, T)/P(s, s) has a lognormal distribution under Q^s.
The price of the call option can then be calculated using φ_c(t) = P(t, s) E_{Q^s}((P(s, T) − K)+ | F_t).
The corresponding put price φ_p(t) = P(t, s) E_{Q^s}((K − P(s, T))+ | F_t) may be obtained by put-call parity.
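Because P(s, T)/P(s, s) is lognormal under Q^s, the call expectation has a Black–Scholes-type closed form. The sketch below is illustrative (the numbers are arbitrary; v denotes the assumed integrated volatility of the forward bond price F = P(t, T)/P(t, s) over [t, s]) and checks put-call parity, φ_c − φ_p = P(t, T) − K·P(t, s):

```python
from math import log, sqrt, erf

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bond_call_put(P_ts, P_tT, K, v):
    """Black-type call/put prices on P(s,T) with strike K and option maturity s.
    F = P(t,T)/P(t,s) is the forward bond price, v its integrated volatility."""
    F = P_tT / P_ts
    d1 = (log(F / K) + 0.5 * v * v) / v
    d2 = d1 - v
    call = P_ts * (F * Phi(d1) - K * Phi(d2))
    put = P_ts * (K * Phi(-d2) - F * Phi(-d1))
    return call, put

call, put = bond_call_put(P_ts=0.95, P_tT=0.90, K=0.94, v=0.02)
# put-call parity: call - put = P(t,T) - K * P(t,s)
print(call, put, call - put, 0.90 - 0.94 * 0.95)
```

The parity relation holds exactly in this lognormal setting, which is a useful internal consistency check.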
Market is Incomplete
If stock prices are modelled by Lévy processes, then a problem arising in non-Gaussian option pricing is that the market is incomplete.
That is, there may be more than one possible pricing formula. This is clearly undesirable, and a number of selection principles, such as entropy minimization, have been employed to overcome this problem.
Outline
Numerical Solutions
References
Numerical Solution of Stochastic Differential Equations
SDEs that admit an explicit solution are rare exceptions. Therefore, numerical techniques for approximating the solution of an SDE are often called for.
One purpose is to visualize a variety of sample paths of the solution. A collection of such paths is called a scenario, which can be used for some kind of “prediction” of the stochastic process at future instants of time.
A second objective is to achieve reasonable approximations to the distributional quantities (expectations, variances, covariances and higher-order moments) of the solution to an SDE.
Only in a few cases is one able to give explicit formulas for these quantities, and even then they frequently involve special functions which have to be approximated numerically.
Numerical solutions allow us to simulate as many sample paths as we want; they constitute the basis for Monte Carlo techniques to obtain the distributional characteristics and option prices.
The Euler Approximation Scheme
For illustration, consider the SDE

dX_t = µ(X_t)dt + σ(X_t)dB_t, t ∈ [0, T].

We assume that the coefficient functions µ(x) and σ(x) are Lipschitz continuous and that EX_0² < ∞, which guarantees the existence and uniqueness of a strong solution.
1 To approximate the solution, partition [0, T] as follows:

τ_n : 0 = t_0 < t_1 < · · · < t_{n−1} < t_n = T, with ∆_i = t_i − t_{i−1}, 1 ≤ i ≤ n,

and mesh(τ_n) = max_{1≤i≤n} ∆_i. Let ∆_i B = B_{t_i} − B_{t_{i−1}}, 1 ≤ i ≤ n.
2 Define recursively, for 1 ≤ i ≤ n,

X^{(n)}_{t_i} = X^{(n)}_{t_{i−1}} + µ(X^{(n)}_{t_{i−1}})∆_i + σ(X^{(n)}_{t_{i−1}})∆_i B,

with X^{(n)}_0 = X_0.
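The recursion translates directly into code. A minimal sketch (illustrative and not from the lecture; the Ornstein–Uhlenbeck coefficients in the example are arbitrary):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n, rng):
    """One Euler path X^{(n)} on the equidistant grid t_i = iT/n."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(1, n + 1):
        dB = rng.standard_normal() * np.sqrt(dt)   # Delta_i B ~ N(0, dt)
        x[i] = x[i-1] + mu(x[i-1]) * dt + sigma(x[i-1]) * dB
    return x

# Ornstein-Uhlenbeck example: dX = -2X dt + 0.3 dB (illustrative parameters)
rng = np.random.default_rng(42)
path = euler_maruyama(mu=lambda x: -2.0 * x, sigma=lambda x: 0.3,
                      x0=1.0, T=1.0, n=500, rng=rng)
print(path[0], path[-1])
```

Averaging many such paths approximates E X_T; for this OU example the exact mean is e^{−2T} X_0.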
Idea: The First-Order Approximation
1 Consider, for 1 ≤ i ≤ n,

X_{t_i} = X_{t_{i−1}} + ∫_{t_{i−1}}^{t_i} µ(X_s)ds + ∫_{t_{i−1}}^{t_i} σ(X_s)dB_s.

2 The Euler approximation is based on a discretization of the integrals:

∫_{t_{i−1}}^{t_i} µ(X_s)ds ≈ µ(X_{t_{i−1}})∆_i,  ∫_{t_{i−1}}^{t_i} σ(X_s)dB_s ≈ σ(X_{t_{i−1}})∆_i B.

3 That is, for 1 ≤ i ≤ n,

X_{t_i} ≈ X_{t_{i−1}} + µ(X_{t_{i−1}})∆_i + σ(X_{t_{i−1}})∆_i B.

In practice one usually chooses equidistant points t_i, so that mesh(τ_n) = T/n, and

X^{(n)}_{iT/n} = X^{(n)}_{(i−1)T/n} + µ(X^{(n)}_{(i−1)T/n})∆_i + σ(X^{(n)}_{(i−1)T/n})∆_i B, 1 ≤ i ≤ n.
Strong Numerical Solution

Strong Marginal Convergence
1 The numerical solution X^{(n)} converges strongly to X with order γ > 0 if there exists a constant c > 0 such that

E|X_T − X^{(n)}_T| ≤ c · mesh(τ_n)^γ, ∀n ≥ 1.

2 X^{(n)} is a strong numerical solution of the SDE if

E|X_T − X^{(n)}_T| → 0, as mesh(τ_n) → 0.

One could use E sup_{0≤t≤T} |X_t − X^{(n)}_t| as a more appropriate criterion to describe the pathwise closeness of X and X^{(n)}, but this quantity is more difficult to deal with theoretically.

The Euler Approximation
The equidistant Euler approximation converges strongly with order 0.5.
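The order-0.5 behaviour can be checked empirically when the exact solution is known. A minimal sketch (illustrative parameters, assuming geometric Brownian motion) that drives the exact solution and the Euler scheme with the same Brownian increments and compares E|X_T − X^{(n)}_T| at two mesh sizes:

```python
import numpy as np

def euler_strong_error(n, n_paths, rng, mu=0.06, sig=0.3, x0=1.0, T=1.0):
    """Estimate E|X_T - X_T^{(n)}| for GBM dX = mu X dt + sig X dB,
    using the same Brownian increments for the exact solution and Euler."""
    dt = T / n
    dB = rng.standard_normal((n_paths, n)) * np.sqrt(dt)
    x = np.full(n_paths, x0)
    for i in range(n):
        x = x + mu * x * dt + sig * x * dB[:, i]
    # exact GBM solution driven by the same Brownian path
    exact = x0 * np.exp((mu - 0.5 * sig**2) * T + sig * dB.sum(axis=1))
    return np.abs(exact - x).mean()

rng = np.random.default_rng(1)
e_coarse = euler_strong_error(n=16, n_paths=20_000, rng=rng)
e_fine = euler_strong_error(n=256, n_paths=20_000, rng=rng)
print(e_coarse, e_fine)   # error should shrink roughly like mesh^0.5
```

Refining the mesh by a factor of 16 should reduce the strong error by roughly a factor of 4, consistent with order 0.5.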
Weak Numerical Solution
In contrast to a strong numerical solution, a weak numerical solution aims at approximating the moments of the solution X. Let f be chosen from a class of smooth functions, e.g., certain polynomials or functions with a specific polynomial growth.

Weak Marginal Convergence
1 The numerical solution X^{(n)} converges weakly to X with order γ > 0 if there exists a constant c > 0 such that

|Ef(X_T) − Ef(X^{(n)}_T)| ≤ c · mesh(τ_n)^γ, ∀n ≥ 1.

2 X^{(n)} is a weak numerical solution of the SDE if

|Ef(X_T) − Ef(X^{(n)}_T)| → 0, as mesh(τ_n) → 0.

The Euler Approximation
The equidistant Euler approximation converges weakly with order 1.0 for a class of functions f with appropriate polynomial growth.
The Milstein Approximation Scheme
In contrast to the first-order approximation, the Milstein approximation exploits a so-called Taylor–Itô expansion that incorporates a higher-order approximation.
Heuristics: apply the Itô lemma to the integrands µ(X_s) and σ(X_s) at each discretization point t_{i−1}, and then estimate the higher-order terms using the fact that (dB_s)² = ds.
Taylor–Itô expansions involve multiple stochastic integrals. Their rigorous treatment requires a more advanced theory of stochastic calculus.

The Milstein Approximation
Define recursively, for 1 ≤ i ≤ n,

X^{(n)}_{t_i} = X^{(n)}_{t_{i−1}} + µ(X^{(n)}_{t_{i−1}})∆_i + σ(X^{(n)}_{t_{i−1}})∆_i B + (1/2) σ(X^{(n)}_{t_{i−1}}) σ′(X^{(n)}_{t_{i−1}}) [(∆_i B)² − ∆_i],

with X^{(n)}_0 = X_0.

The equidistant Milstein approximation converges strongly with order 1.0.
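The order improvement over Euler is visible numerically. A minimal sketch (illustrative, assuming geometric Brownian motion dX = µX dt + σX dB, for which σ(x)σ′(x) = σ²x and the exact solution is known):

```python
import numpy as np

def strong_errors_gbm(n, n_paths, rng, mu=0.06, sig=0.3, x0=1.0, T=1.0):
    """Compare Euler and Milstein strong errors E|X_T - X_T^{(n)}| for GBM,
    where sigma(x) = sig*x, so the Milstein correction is 0.5*sig^2*x*[(dB)^2 - dt]."""
    dt = T / n
    dB = rng.standard_normal((n_paths, n)) * np.sqrt(dt)
    xe = np.full(n_paths, x0)   # Euler iterates
    xm = np.full(n_paths, x0)   # Milstein iterates
    for i in range(n):
        b = dB[:, i]
        xe = xe + mu * xe * dt + sig * xe * b
        xm = (xm + mu * xm * dt + sig * xm * b
              + 0.5 * sig**2 * xm * (b**2 - dt))
    exact = x0 * np.exp((mu - 0.5 * sig**2) * T + sig * dB.sum(axis=1))
    return np.abs(exact - xe).mean(), np.abs(exact - xm).mean()

rng = np.random.default_rng(2)
err_euler, err_milstein = strong_errors_gbm(n=64, n_paths=20_000, rng=rng)
print(err_euler, err_milstein)   # Milstein should be noticeably more accurate
```

At the same mesh, the Milstein error should be markedly smaller than the Euler error, reflecting strong order 1.0 versus 0.5.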
Monte Carlo vs Numerical Methods
Once sample paths (or scenarios) of the solution of an Itô SDE are obtained, they can be used to estimate the distributional quantities (expectations, variances, covariances and higher-order moments) of the solution.
Since derivative prices are often written as expectations of underlying asset values, which are the solutions of SDEs, the Monte Carlo method becomes an essential tool in the pricing of derivative securities and in risk management.
Monte Carlo is generally not a competitive method for calculating univariate expectations. For example, the error in a trapezoidal rule for the integral of a d-dimensional twice continuously differentiable function is O(n^{−2/d}), in contrast to the standard error O(n^{−1/2}) of the Monte Carlo method for the same problem.
This performance degradation with increasing dimension is a characteristic of all deterministic integration methods, and thus Monte Carlo methods are attractive for evaluating integrals in high dimensions.
Illustrative Example: European Call Option

The price of one share of a risky asset (stock) is described by

dX_t = cX_t dt + σX_t dB_t, t ∈ [0, T].

The price of a riskless asset (bond) is described by dβ_t = rβ_t dt, t ∈ [0, T].
At the time of maturity T, V_T = (X_T − K)+.
Using the Fundamental Theorem of Arbitrage-Free Pricing, we have

C := V_0 = E(e^{−rT}(X_T − K)+),

with X_T = X_0 e^{(r − σ²/2)T + σB_T}.
Although this formula can be written explicitly in terms of the normal distribution (the Black–Scholes formula), we can also estimate C using the Monte Carlo method.
MC Estimate of European Call Options

Algorithm
for i = 1, . . . , n
    generate the standard normal Z_i
    set X_i(T) = X_0 e^{(r − σ²/2)T + σ√T Z_i}
    set C_i = e^{−rT}(X_i(T) − K)+
set C̄_n = (C_1 + · · · + C_n)/n.

The estimator C̄_n is unbiased and strongly consistent.
For finite but at least moderately large n, we can supplement the point estimate C̄_n with a (1 − α)100% confidence interval C̄_n ± t_{α/2,n−1} s_C/√n, where s_C is the sample standard deviation, and t_{α/2,n−1} is the upper 100(α/2)th percentage point of a t distribution with n − 1 degrees of freedom.
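The algorithm above can be sketched directly. This is an illustrative implementation (arbitrary parameter values; the normal quantile 1.96 is used in place of t_{α/2,n−1}, which is harmless for large n), with the Black–Scholes formula for comparison:

```python
import numpy as np
from math import log, sqrt, exp, erf

def mc_european_call(X0, K, r, sig, T, n, rng):
    """Monte Carlo price and 95% CI half-width for a European call."""
    Z = rng.standard_normal(n)
    XT = X0 * np.exp((r - 0.5 * sig**2) * T + sig * np.sqrt(T) * Z)
    C = np.exp(-r * T) * np.maximum(XT - K, 0.0)
    half = 1.96 * C.std(ddof=1) / np.sqrt(n)   # normal approx to the t quantile
    return C.mean(), half

def black_scholes_call(X0, K, r, sig, T):
    """Closed-form benchmark for the same expectation."""
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    d1 = (log(X0 / K) + (r + 0.5 * sig**2) * T) / (sig * sqrt(T))
    d2 = d1 - sig * sqrt(T)
    return X0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

rng = np.random.default_rng(3)
est, half = mc_european_call(X0=100, K=100, r=0.05, sig=0.2, T=1.0,
                             n=200_000, rng=rng)
print(est, half, black_scholes_call(100, 100, 0.05, 0.2, 1.0))
```

The Monte Carlo estimate should land within a few confidence-interval half-widths of the closed-form price.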
Another Illustrative Example: Asian Options

Consider the payoff V_T = (X̄ − K)+, where X̄ = (∑_{j=1}^m X_{t_j})/m for a fixed set of dates 0 = t_0 < t_1 < · · · < t_m = T.
Again, the Fundamental Theorem of Arbitrage-Free Pricing implies that C := V_0 = E(e^{−rT}(X̄ − K)+), where

X_{t_{j+1}} = X_{t_j} e^{(r − σ²/2)(t_{j+1} − t_j) + σ√(t_{j+1} − t_j) Z_{j+1}}.

Algorithm
for i = 1, . . . , n
    for j = 1, . . . , m
        generate the standard normal Z_{ij}
        set X_i(j) = X_i(j − 1) e^{(r − σ²/2)(t_j − t_{j−1}) + σ√(t_j − t_{j−1}) Z_{ij}}
    set X̄_i = (X_i(1) + · · · + X_i(m))/m
    set C_i = e^{−rT}(X̄_i − K)+
set C̄_n = (C_1 + · · · + C_n)/n.
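The nested loop vectorizes naturally over paths. A minimal sketch (illustrative parameters and monitoring dates, not from the lecture):

```python
import numpy as np

def mc_asian_call(X0, K, r, sig, dates, n, rng):
    """Monte Carlo price of an arithmetic-average Asian call with
    monitoring dates 0 = t_0 < t_1 < ... < t_m = T."""
    t = np.asarray(dates, dtype=float)
    dt = np.diff(t)                      # t_j - t_{j-1}, j = 1..m
    m = dt.size
    Z = rng.standard_normal((n, m))
    # cumulative log-returns give X at each monitoring date t_1..t_m
    logX = np.log(X0) + np.cumsum((r - 0.5 * sig**2) * dt
                                  + sig * np.sqrt(dt) * Z, axis=1)
    avg = np.exp(logX).mean(axis=1)      # (X_{t_1} + ... + X_{t_m}) / m
    return np.exp(-r * t[-1]) * np.maximum(avg - K, 0.0).mean()

rng = np.random.default_rng(4)
dates = np.linspace(0.0, 1.0, 13)        # monthly averaging over one year
price = mc_asian_call(X0=100, K=100, r=0.05, sig=0.2,
                      dates=dates, n=100_000, rng=rng)
print(price)
```

Since the average has lower volatility than the terminal value, the Asian call price should come out below the corresponding European call price.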
Efficiency of Simulation Estimators
The estimator C̄_n from the above two examples is unbiased and asymptotically normal.
More precisely, let s denote our computational budget and τ the computational time needed for each C_i; then

√s [C̄_{⌊s/τ⌋} − C] →_d N(0, σ_C² τ),

as s → ∞. In comparing unbiased estimators, we should prefer the one for which σ_C² τ is smallest.
Bias frequently occurs in estimation via MC methods. For example, bias can arise from the following errors.
1 Model discretization error: for many models, exact sampling of the continuous-time dynamics is infeasible; some discretization approximation has to be used, resulting in a bias.
2 Payoff discretization error: discretization has to be used for payoffs that are functionals of the underlying asset processes.
3 Nonlinear functions of means: in a compound option, the price of the first option depends on the price of the second option, and so on, but these prices can only be estimated, resulting in a bias.
Some References and Further Reading
These lecture notes were written using the books “Elementary Stochastic Calculus” (World Scientific, 2002) by Thomas Mikosch and “Introductory Stochastic Analysis for Finance and Insurance” (Wiley, 2006) by Sheldon Lin.
A standard advanced textbook on Itô integrals: “Brownian Motion and Stochastic Calculus” (Springer, 1991) by I. Karatzas and S. E. Shreve.
Stochastic integrals and SDEs driven by Lévy processes: “Lévy Processes and Stochastic Calculus” (Cambridge, 2009) by D. Applebaum.
Stochastic finance: “Stochastic Calculus for Finance I, II” (Springer, 2004) by S. E. Shreve.
SDE applications in actuarial science: “Introductory Stochastic Analysis for Finance and Insurance” (Wiley, 2006) by Sheldon Lin, and “Stochastic Control in Insurance” (Springer, 2008) by H. Schmidli.
More References
Numerical analysis of SDEs: “Numerical Solution of Stochastic Differential Equations” (Springer, 1995) by P. Kloeden and E. Platen.
Monte Carlo simulation: “Monte Carlo Methods in Financial Engineering” (Springer, 2004) by Paul Glasserman.
Lévy matters: “Financial Modelling with Jump Processes” (Chapman & Hall, 2004) by Rama Cont and Peter Tankov.
Financial time series (GARCH, univariate and multivariate): “Statistics of Financial Markets” (Springer, 2008) by J. Franke, C. M. Hafner and W. K. Härdle.