MARKOV PROCESSES: THEORY AND EXAMPLES

JAN SWART AND ANITA WINTER

Date: April 10, 2013.


Contents

1. Stochastic processes
1.1. Random variables
1.2. Stochastic processes
1.3. Cadlag sample paths
1.4. Compactification of Polish spaces
2. Markov processes
2.1. The Markov property
2.2. Transition probabilities
2.3. Transition functions and Markov semigroups
2.4. Forward and backward equations
3. Feller semigroups
3.1. Weak convergence
3.2. Continuous kernels and Feller semigroups
3.3. Banach space calculus
3.4. Semigroups and generators
3.5. Dissipativity and the maximum principle
3.6. Hille-Yosida: different formulations
3.7. Dissipative operators
3.8. Resolvents
3.9. Hille-Yosida: proofs
4. Feller processes
4.1. Markov processes
4.2. Jump processes
4.3. Feller processes with compact state space
4.4. Feller processes with locally compact state space
5. Harmonic functions and martingales
5.1. Harmonic functions
5.2. Filtrations
5.3. Martingales
5.4. Stopping times
5.5. Applications
5.6. Non-explosion
6. Convergence of Markov processes
6.1. Convergence in path space
6.2. Proof of the main result (Theorem 4.2)
7. Strong Markov property
References


1. Stochastic processes

In this section we recall some basic definitions and facts on topologies and stochastic processes (Subsections 1.1 and 1.2). Subsection 1.3 is devoted to the study of the space of paths which are continuous from the right and have limits from the left. Finally, for the sake of completeness, we collect facts on compactifications in Subsection 1.4. These will only find applications in later sections.

1.1. Random variables. Probability theory is the theory of random variables, i.e., quantities whose value is determined by chance. Mathematically speaking, a random variable is a measurable map X : Ω → E, where (Ω, F, P) is a probability space and (E, E) is a measurable space. The probability measure

P_X = L(X) := P ∘ X^{-1}

on (E, E) is called the law of X and is usually the only object that we are really interested in.¹ If (X_t)_{t∈T} is a family of random variables, taking values in measurable spaces (E_t, E_t)_{t∈T}, then we can view (X_t)_{t∈T} as a single random variable, taking values in the product space ∏_{t∈T} E_t equipped with the product σ-field ∏_{t∈T} E_t.² The law P_{(X_t)_{t∈T}} = L((X_t)_{t∈T}) of this random variable is called the joint law of the random variables (X_t)_{t∈T}.

In practice, we usually need a bit more structure on the spaces in which our random variables take values. For our purposes, it will be sufficient to consider random variables taking values in Polish spaces.

Recall that a topology on a space E is a collection O of subsets of E, called open sets, such that:

(1) E, ∅ ∈ O.
(2) O_t ∈ O for all t ∈ T implies ⋃_{t∈T} O_t ∈ O.
(3) O_1, O_2 ∈ O implies O_1 ∩ O_2 ∈ O.

A topology is metrizable if there exists a metric d on E such that the open sets in this topology are the sets O with the property that for all x ∈ O there exists ε > 0 such that B_ε(x) ⊂ O, where B_ε(x) := {y ∈ E : d(x, y) < ε} is the open ball around x with radius ε. Two metrics are called equivalent if they define the same topology. Concepts such as convergence, continuity, and compactness depend only on the topology, but completeness depends on the choice of the metric.³

¹At this point, one may wonder why probabilists speak of random variables at all and do not immediately focus on the probability measures that are their laws, if that is what they are really after. The reason is mainly a matter of convenient notation. If µ = L(X) is the law of a real-valued random variable X, then what is the law of X²? In terms of random variables, this is simply L(X²). In terms of probability measures, this is the image of the probability measure µ under the map x ↦ x², i.e., the measure µ ∘ f^{-1} where f : R → R is defined as f(x) = x² – an unpleasantly long mouthful.

²Recall that ∏_{t∈T} E_t := {(x_t)_{t∈T} : x_t ∈ E_t ∀t ∈ T}. The coordinate projections π_t : ∏_{s∈T} E_s → E_t are defined by π_t((x_s)_{s∈T}) := x_t, t ∈ T. By definition, the product σ-field ∏_{t∈T} E_t is the σ-field on ∏_{t∈T} E_t that is generated by the coordinate projections, i.e., ∏_{t∈T} E_t := σ(π_t : t ∈ T) = σ(π_t^{-1}(A) : A ∈ E_t, t ∈ T).


A topological space E is called separable if there exists a countable set D ⊂ E such that D is dense in E.

By definition, a topological space (E, O) is Polish if E is separable and there exists a complete metric defining the topology on E. We always equip Polish spaces with the Borel σ-field B(E), which is the σ-field generated by the open sets.

The reason why we are interested in Polish spaces is that for random variables taking values in Polish spaces, certain useful results are true that do not hold in general, since their proofs make use of this extra structure.

Lemma 1.1 (Probability measures on Polish spaces are tight). Each probability measure P on a Polish space (E, O) is tight, i.e., for all ε > 0 there is a compact set K ⊆ E such that P(K) ≥ 1 − ε.

Proof. Let (x_k)_{k∈N} be dense in (E, O), and let P be a probability measure on (E, O). Given ε > 0 and a metric d on (E, O), we can choose N_1, N_2, ... such that

(1.1) P(⋃_{k=1}^{N_n} {x′ : d(x′, x_k) < 1/n}) ≥ 1 − ε 2^{-n}.

Let K be the closure of ⋂_{n≥1} ⋃_{k=1}^{N_n} {x′ : d(x′, x_k) < 1/n}. Then K is totally bounded⁴, and hence compact, and we have P(K) ≥ 1 − ε ∑_{n=1}^∞ 2^{-n} = 1 − ε.

For example, the following result states that, provided the state spaces are Polish, for each projective family of probability measures there exists a projective limit.

Theorem 1.2 (Percy J. Daniell [Dan19], Andrei N. Kolmogorov [Kol33]). Let (E_t)_{t∈T} be a (possibly uncountable) collection of Polish spaces and let µ_S (S ⊂ T finite) be probability measures on ∏_{t∈S} E_t such that

(1.2) µ_{S′} ∘ (π_S)^{-1} = µ_S, S ⊂ S′ ⊂ T, S, S′ finite,

where π_S denotes the projection on ∏_{t∈S} E_t. Then there exists a unique probability measure µ_T on ∏_{t∈T} E_t, equipped with the product σ-field, such that

(1.3) µ_T ∘ π_S^{-1} = µ_S, S ⊂ T, S finite.

Proof. For E_t ≡ R see e.g. Theorem 2.2.2 in [KS88].

³More precisely: completeness depends on the uniform structure defined by the metric. For the theory of uniform spaces, see for example [Kel55].

⁴Recall that a set A is totally bounded if for each ε > 0, A possesses a finite ε-net, where an ε-net for A is a finite collection of points (x_k) with the property that for each x ∈ A there is an x_k such that d(x, x_k) < ε.

A consequence of Kolmogorov's extension theorem is that if {µ_S : S ⊂ T finite} are probability measures satisfying the consistency relation (1.2), then there exist random variables (X_t)_{t∈T}, defined on some probability space (Ω, F, P), such that L((X_t)_{t∈S}) = µ_S for each finite S ⊂ T. (The canonical choice is Ω = ∏_{t∈T} E_t.)

Exercise 1.3. For n ∈ N, k_i ∈ {0, 1}, i = 0, ..., n, and 0 =: t_0 < t_1 < · · · < t_n in [0,∞), let τ_n := inf{l ≥ 0 : k_l = 1} ∧ (1 + n), and

(1.4) µ_{t_1,...,t_n}(k_1, ..., k_n) := 1{0 ≤ k_1 ≤ · · · ≤ k_n ≤ 1} ·
        { e^{-t_{τ_n−1}} − e^{-t_{τ_n}},  if τ_n ≤ n,
          e^{-t_n},                        if τ_n = 1 + n.

(i) Show that the collection {µ_{t_1,...,t_n} ; 0 =: t_0 < t_1 < · · · < t_n} of probability measures on {k ∈ {0,1}^n : 0 ≤ k_1 ≤ · · · ≤ k_n ≤ 1} satisfies the consistency condition (1.2).

(ii) Can you find one (or even more than one) {0, 1}-valued stochastic process X with

(1.5) P{X_{t_1} = k_1, ..., X_{t_n} = k_n} = µ_{t_1,...,t_n}(k_1, ..., k_n), 0 ≤ k_1 ≤ · · · ≤ k_n ≤ 1?
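As a concrete illustration of part (ii), one natural candidate is the process X_t := 1{T ≤ t}, where T is an exponentially distributed random time with mean 1. The following Python sketch (an illustration only; the variable names, sample size and time points are ad hoc choices) compares an estimated finite-dimensional probability of this candidate with formula (1.4):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([0.5, 1.0, 2.0])      # 0 < t_1 < t_2 < t_3
k = np.array([0, 1, 1])            # monotone 0/1 pattern: a jump in (t_1, t_2]

T = rng.exponential(1.0, size=200_000)       # Exp(1) jump times
X = (T[:, None] <= t[None, :]).astype(int)   # X_{t_i} = 1{T <= t_i}
estimate = np.mean(np.all(X == k, axis=1))

# Formula (1.4): here tau_n = 2 <= n, so the probability is e^{-t_1} - e^{-t_2}.
exact = np.exp(-t[0]) - np.exp(-t[1])
print(estimate, exact)   # the two numbers should agree up to Monte Carlo error
```

Whether this is the only possible choice of X is exactly the point of part (ii).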

1.2. Stochastic processes. A stochastic process with index set T and state space E is a collection of random variables X = (X_t)_{t∈T} (defined on a probability space (Ω, F, P)) with values in E. We will usually be interested in the case that T = [0,∞) and E is a Polish space. We interpret X = (X_t)_{t∈[0,∞)} as a quantity the value of which is determined by chance and that develops in time.

A stochastic process is called measurable if the map (t, ω) ↦ X_t(ω) from [0,∞) × Ω into E is measurable. The functions t ↦ X_t(ω) (with ω ∈ Ω) are called the sample paths of the process X.

Lemma 1.4 (Right continuous sample paths). If X has right continuous sample paths then X is measurable.

Proof. Define processes X^{(n)} by X^{(n)}_t := X_{⌊nt+1⌋/n}. Then, for each measurable set A ⊆ E,

{(t, ω) : X^{(n)}_t(ω) ∈ A} = ⋃_{k=0}^∞ [k/n, (k+1)/n) × X^{-1}_{(k+1)/n}(A),

so X^{(n)} is measurable for each n ≥ 1. By the right-continuity of the sample paths, X^{(n)} → X pointwise as n → ∞, so X is measurable.

By definition, the laws L(X_{t_1}, ..., X_{t_n}) with 0 ≤ t_1 < · · · < t_n are called the finite-dimensional distributions of X. If X and Y are stochastic processes with the same finite-dimensional distributions then we say that Y is a version of X (and vice versa). Here X and Y need not be defined on the same probability space. If X and Y are stochastic processes defined on the same probability space then we say that Y is a modification of X if X_t = Y_t a.s. for each t ≥ 0. Note that if Y is a modification of X, then X and Y have the same finite-dimensional distributions. We say that X and Y are indistinguishable if X_t = Y_t for all t ≥ 0, a.s.⁵

Example (Modification). Let (Ω, F, P) = ([0, 1], B[0, 1], ℓ), where ℓ is the Lebesgue measure. For a given x ∈ (0,∞), define [0, 1]-valued stochastic processes X^x and Y by

(1.6) X^x_t(ω) := { t, if t ≠ xω,
                    0, if t = xω,

and

(1.7) Y_t(ω) := t, t ∈ [0, 1].

Then Y is a modification of X^x, but X^x and Y are not indistinguishable.

Lemma 1.5 (Right continuous modifications). If Y is a modification of X, and X and Y have right-continuous sample paths, then X and Y are indistinguishable.

Proof. If Y is a modification of X then X_t = Y_t for all t ∈ Q a.s. By right-continuity of the sample paths, this implies that X_t = Y_t for all t ≥ 0 a.s.

We will usually be interested in stochastic processes with sample paths that have right limits X_{t+} := lim_{s↓t} X_s for each t ≥ 0 and left limits X_{t−} := lim_{s↑t} X_s for each t > 0. In practice nobody can measure time with infinite precision, so when we model a real process it is a matter of taste whether we assume that the sample paths are right or left continuous; it is tradition to assume that they are right continuous. (Lemmas 1.4 and 1.5 hold equally well for processes with left continuous sample paths.) Note that a consequence of this assumption is that the sample paths cannot have a jump at time t = 0; this will actually be convenient later on. In the next section we study the space of all paths that are right-continuous with left limits in more detail.

1.3. Cadlag sample paths. Let (E, O) be a metrizable space. A function w : [0,∞) → E such that w is right continuous and w(t−) exists for each t > 0 is called a cadlag function (from the French "continu à droite, limites à gauche"). The space of all such functions is denoted by

(1.8) D_E[0,∞) := {w : [0,∞) → E : w(t) = w(t+) ∀t ≥ 0, w(t−) exists ∀t > 0}.

⁵Note the order of the statements: If Y is a modification of X, then for each t ≥ 0 there is a measurable set Ω*_t ⊂ Ω with P(Ω*_t) = 1 such that X_t(ω) = Y_t(ω) for all ω ∈ Ω*_t. If X and Y are indistinguishable, then there exists a measurable set Ω* (independent of t) with P(Ω*) = 1 such that X_t(ω) = Y_t(ω) for all t ≥ 0 and all ω ∈ Ω*.

We begin by observing that functions in D_E[0,∞) are better behaved than one might suspect.

Lemma 1.6 (Only countably many jumps). If w ∈ D_E[0,∞), then w has at most countably many points of discontinuity.

Proof. For n = 1, 2, ..., and d a metric on (E, O), let

(1.9) A_n := {t > 0 : d(w(t), w(t−)) > 1/n}.

Since w has limits from the right and the left, A_n cannot possess cluster points. Hence A_n is countable for all n = 1, 2, ..., and the set ⋃_{n≥1} A_n of all discontinuities of w is countable too.

In order to be in a position to do probability theory with random variables taking values in D_E[0,∞), we want to equip D_E[0,∞) with a topology in which D_E[0,∞) is Polish. We will see that this is possible provided that (E, O) is Polish.

To motivate the topology that we will choose, we first take a look at the space

(1.10) C_E[0,∞) := {continuous functions w : [0,∞) → E}.

Lemma 1.7 (Uniform convergence on compacta). Let (E, d) be a metric space. Then the following conditions on functions w_n, w ∈ C_E[0,∞) are equivalent.

(a) For all T > 0,

(1.11) lim_{n→∞} sup_{t∈[0,T]} d(w_n(t), w(t)) = 0.

(b) For all (t_n)_{n∈N}, t ∈ [0,∞) such that t_n → t as n → ∞,

(1.12) lim_{n→∞} w_n(t_n) = w(t).

Proof. (a)⇒(b). If t_n → t then there is a T > 0 such that t_n, t ≤ T for all n. Now

(1.13) d(w_n(t_n), w(t)) ≤ d(w(t_n), w(t)) + d(w_n(t_n), w(t_n)) ≤ d(w(t_n), w(t)) + sup_{s∈[0,T]} d(w_n(s), w(s)) → 0 as n → ∞,

by (a) and the continuity of w.

(b)⇒(a). Suppose that there exists a T > 0 such that

(1.14) limsup_{n→∞} sup_{t∈[0,T]} d(w_n(t), w(t)) = ε > 0.

Then we can choose s_n ∈ [0, T] such that limsup_{n→∞} d(w_n(s_n), w(s_n)) = ε. By the compactness of [0, T] we can choose n_1 < n_2 < · · · such that lim_{m→∞} s_{n_m} = t for some t ∈ [0, T] and d(w_{n_m}(s_{n_m}), w(s_{n_m})) ≥ ε/2 for each m. Hence d(w_{n_m}(s_{n_m}), w(t)) + d(w(s_{n_m}), w(t)) ≥ d(w_{n_m}(s_{n_m}), w(s_{n_m})) ≥ ε/2. By continuity, d(w(s_{n_m}), w(t)) → 0 as m → ∞. We therefore find that

(1.15) limsup_{m→∞} d(w_{n_m}(s_{n_m}), w(t)) ≥ ε/2,

which contradicts (1.12) (applied to the sequence with t_{n_m} := s_{n_m} and t_n := t for all other n).

If w_n, w are as in Lemma 1.7 then, because of property (a), we say that w_n converges to w uniformly on compacta. Property (b) shows that this definition does not depend on the choice of the metric on E, i.e., if d and d′ are equivalent metrics on E then w_n → w uniformly on compacta w.r.t. d if and only if w_n → w uniformly on compacta w.r.t. d′. The topology on C_E[0,∞) of uniform convergence on compacta is metrizable. A possible choice of a metric on C_E[0,∞) generating the topology of uniform convergence on compacta is for example:

(1.16) d_{u.c.}(w_1, w_2) := ∫_0^∞ ds e^{-s} sup_{t∈[0,∞)} [1 ∧ d(w_1(t ∧ s), w_2(t ∧ s))].

Remark. If d is a metric on (E, O), then 1 ∧ d is also a metric, and both metrics are equivalent.
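As a rough numerical illustration of (1.16) (the truncation level, step sizes and example paths below are ad hoc choices, not part of the definition), the metric can be approximated for real-valued paths by truncating the s-integral and taking the supremum over a grid:

```python
import numpy as np

def d_uc(w1, w2, S=20.0, ds=0.01, dt=0.01):
    """Approximate d_{u.c.}(w1, w2) for real-valued paths, using d(x, y) = |x - y|.
    The integral over s is truncated at S; the neglected tail is at most e^{-S}."""
    total = 0.0
    for s in np.arange(ds, S, ds):
        t_grid = np.arange(0.0, s + dt, dt)
        sup = np.max(np.minimum(1.0, np.abs(w1(t_grid) - w2(t_grid))))
        total += np.exp(-s) * sup * ds
    return total

# w_n(t) = sin(t + 1/n) converges to w(t) = sin(t) uniformly on compacta,
# and the approximate distances below decrease towards 0 accordingly.
for n in (1, 10, 100):
    print(n, d_uc(lambda t, n=n: np.sin(t + 1.0 / n), np.sin))
```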

On D_E[0,∞), we could also define uniform convergence on compacta as in Lemma 1.7, property (a), but this topology would be too strong for our purposes. For example, if E = R, we would like the functions w_n := 1_{[1+1/n,∞)} to approximate the function w := 1_{[1,∞)} as n → ∞, but sup_{t∈[0,2]} |w_n(t) − w(t)| = 1 for each n. We wish to find a topology on D_E[0,∞) such that w_n → w whenever the jump times of the functions w_n converge to the jump times of w while the "rest" of the paths converges uniformly on compacta. The main result of this section is that such a topology exists and has nice properties.

Theorem 1.8 (Skorohod topology). Let (E, d) be a metric space. Then there exists a metric d^d_{Sk} on D_E[0,∞) such that in this metric, D_E[0,∞) is separable if E is separable, D_E[0,∞) is complete if E is complete, and w_n → w if and only if for all T ∈ [0,∞) there exists a sequence (λ_n) of strictly increasing, continuous functions λ_n : [0, T] → [0,∞) with λ_n(0) = 0, such that

(1.17) lim_{n→∞} sup_{t∈[0,T]} |λ_n(t) − t| = 0,

and for all (t_n)_{n∈N}, t ∈ [0, T],

(1.18) lim_{n→∞} w_n(λ_n(t_n)) = w(t) whenever t_n ↓ t,  and  lim_{n→∞} w_n(λ_n(t_n)) = w(t−) whenever t_n ↑ t.

Remark. The idea of the functions λ_n in (1.17) and (1.18) is to make two functions w, w′ close in the topology on D_E[0,∞) if a small deformation of the time scale makes them close in the uniform topology. The topology in Theorem 1.8 is called the Skorohod topology, after its inventor.
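To see how such time deformations work in the motivating example above, consider w_n = 1_{[1+1/n,∞)} and w = 1_{[1,∞)}, and take λ_n(t) := (1 + 1/n) t for t ∈ [0, 1] and λ_n(t) := t + 1/n for t > 1 (one possible choice; any time change moving the point 1 to 1 + 1/n works similarly). Then sup_{t∈[0,T]} |λ_n(t) − t| ≤ 1/n → 0, and w_n(λ_n(t)) = 1{λ_n(t) ≥ 1 + 1/n} = 1{t ≥ 1} = w(t) for every t ≥ 0, so (1.17) and (1.18) hold and w_n → w in the Skorohod topology, even though w_n does not converge to w uniformly on compacta.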

Our proof of Theorem 1.8 will follow Section 3.5 in [EK86]. Let Λ′ be the collection of strictly increasing functions λ mapping [0,∞) onto [0,∞). In particular, for all λ ∈ Λ′ we have λ(0) = 0, lim_{t→∞} λ(t) = ∞, and λ is continuous. Furthermore, let Λ be the subclass of Lipschitz continuous functions λ ∈ Λ′ such that

(1.19) ‖λ‖ := sup_{0≤s<t} | log [(λ(t) − λ(s))/(t − s)] | < ∞.

In the literature ‖λ‖ is referred to as the dilatation of λ ∈ Λ.

Lemma 1.9 (Properties of the dilatation). The dilatation ‖·‖ : Λ → R_+ has the following properties:

(i) For all λ ∈ Λ,

(1.20) ‖λ‖ = ‖λ^{-1}‖,

where λ^{-1} denotes the inverse function of λ, i.e., λ^{-1}(λ(t)) = t for all t ≥ 0.

(ii) If λ_1, λ_2 ∈ Λ, then λ_1 ∘ λ_2 ∈ Λ, and we have

(1.21) ‖λ_1 ∘ λ_2‖ ≤ ‖λ_1‖ + ‖λ_2‖.

(iii) If (λ_n)_{n∈N} is a sequence in Λ with ‖λ_n‖ → 0 as n → ∞, then for all T ∈ [0,∞),

(1.22) lim_{n→∞} sup_{t∈[0,T]} |λ_n(t) − t| = 0.

Proof of Lemma 1.9. (i) For all λ ∈ Λ,

(1.23) ‖λ‖ = sup_{0≤s<t} | log [(λ(t) − λ(s))/(t − s)] |
           = sup_{0≤s′:=λ(s)<t′:=λ(t)} | − log [(λ^{-1}(t′) − λ^{-1}(s′))/(t′ − s′)] | = ‖λ^{-1}‖.

(ii) For λ_1, λ_2 ∈ Λ, λ_1 ∘ λ_2 is also Lipschitz continuous, and

(1.24) ‖λ_1 ∘ λ_2‖ = sup_{0≤s<t} | log [(λ_1 ∘ λ_2(t) − λ_1 ∘ λ_2(s))/(t − s)] |
        = sup_{0≤s<t} | log [(λ_1 ∘ λ_2(t) − λ_1 ∘ λ_2(s))/(λ_2(t) − λ_2(s))] + log [(λ_2(t) − λ_2(s))/(t − s)] |
        ≤ ‖λ_1‖ + ‖λ_2‖.

In particular, λ_1 ∘ λ_2 ∈ Λ.

(iii) Since for all λ ∈ Λ,

(1.25) ‖λ‖ ≥ 1 − e^{-‖λ‖} = sup_{0≤s<t} (1 − e^{-| log [(λ(t)−λ(s))/(t−s)] |})
        = sup_{0≤s<t} (1 − [(λ(t) − λ(s))/(t − s)] ∧ [(t − s)/(λ(t) − λ(s))]),

‖λ_n‖ → 0 implies that (λ_n(t) − λ_n(s))/(t − s) → 1 uniformly in 0 ≤ s < t as n → ∞. In particular, for all T ≥ 0,

(1.26) lim_{n→∞} sup_{t∈[0,T]} |λ_n(t) − t| = 0.

(Counter-)Example. For n ∈ N with n ≥ 3, let

(1.27) λ_n(t) := { [n(n−2)/(n²−2)] t,                          if t ∈ [0, 1/2 − 1/n²],
                   n t + (1 − n)/2,                             if t ∈ [1/2 − 1/n², 1/2 + 1/n²],
                   [n(n−2)/(n²−2)] t + 2(n−1)/(n²−2),           if t ∈ [1/2 + 1/n², 1],
                   t,                                           if t ∈ [1,∞).

Then (λ_n)_{n∈N} is a sequence in Λ that satisfies (1.22), but ‖λ_n‖ = log n → ∞ as n → ∞.

In analogy with (1.16), for v, w ∈ D_E[0,∞), we define the Skorohod metric by

(1.28) d^d_{Sk}(v, w) := inf_{λ∈Λ} { ‖λ‖ ∨ ∫_{[0,∞)} ds e^{-s} sup_{t∈[0,∞)} [1 ∧ d(v(t ∧ s), w(λ(t) ∧ s))] }.

The next lemma states that d^d_{Sk} is indeed a metric on D_E[0,∞).

Lemma 1.10. (D_E[0,∞), d^d_{Sk}) is a metric space.

Proof. For symmetry, recall part (i) of Lemma 1.9, and notice that

(1.29) sup_{t∈[0,∞)} 1 ∧ d(v(t ∧ s), w(λ(t) ∧ s)) = sup_{t∈[0,∞)} 1 ∧ d(v(λ^{-1}(t) ∧ s), w(t ∧ s))

for all λ ∈ Λ. This implies that d_{Sk}(v, w) = d_{Sk}(w, v) for all v, w ∈ D_E[0,∞).

If d_{Sk}(v, w) = 0, then there exists a sequence (λ_n)_{n∈N} in Λ such that ‖λ_n‖ → 0 as n → ∞ and

(1.30) ℓ{s ∈ [0, s_0] : sup_{t∈[0,∞)} 1 ∧ d(v(t ∧ s), w(λ_n(t) ∧ s)) ≥ ε} → 0 as n → ∞

for all ε > 0 and s_0 ∈ [0,∞). Hence, by part (iii) of Lemma 1.9 and (1.30), v(t) = w(t) for all continuity points t of w, and therefore, by Lemma 1.6 and right continuity of v and w, v = w.

It remains to show the triangle inequality. Recall part (ii) of Lemma 1.9, and notice that for all s ∈ [0,∞) and λ_1, λ_2 ∈ Λ,

(1.31) sup_{t∈[0,∞)} 1 ∧ d(w(t ∧ s), u(λ_1 ∘ λ_2(t) ∧ s))
        ≤ sup_{t∈[0,∞)} 1 ∧ d(w(t ∧ s), v(λ_2(t) ∧ s)) + sup_{t∈[0,∞)} 1 ∧ d(v(λ_2(t) ∧ s), u(λ_1 ∘ λ_2(t) ∧ s))
        = sup_{t∈[0,∞)} 1 ∧ d(w(t ∧ s), v(λ_2(t) ∧ s)) + sup_{t∈[0,∞)} 1 ∧ d(v(t ∧ s), u(λ_1(t) ∧ s)).

Combining (1.24) and (1.31) implies that d_{Sk}(w, u) ≤ d_{Sk}(w, v) + d_{Sk}(v, u).

Exercise 1.11. For n ∈ N, let v_n := 1_{[0,1−2^{-n})} and w_n := 1_{[0,2^{-n})}. Decide whether the sequences (v_n)_{n∈N} and (w_n)_{n∈N} converge in (D_E[0,∞), d^d_{Sk}) and, if so, determine the limit function.

Proposition 1.12 (A convergence criterion). Let (w_n)_{n∈N} be a sequence in D_E[0,∞) and let w ∈ D_E[0,∞). Then the following are equivalent:

(a) d^d_{Sk}(w_n, w) → 0 as n → ∞.

(b) There exists a sequence (λ_n)_{n∈N} in Λ such that ‖λ_n‖ → 0 as n → ∞ and

(1.32) lim_{n→∞} sup_{t∈[0,T]} d(w_n(λ_n(t)), w(t)) = 0

for all T ∈ [0,∞).

(c) For each T > 0, there exists a sequence (λ_n)_{n∈N} in Λ′ (possibly depending on T) satisfying (1.26) and (1.32).

(d) For each T > 0, there exists a sequence (λ_n)_{n∈N} in Λ′ (possibly depending on T) satisfying (1.17) and (1.18).

Corollary 1.13. The Skorohod topology does not depend on the choice of the metric on (E, O).

Proof of Corollary 1.13. If d and d′ are two equivalent metrics on (E, O) and d^d_{Sk} and d^{d′}_{Sk} are the associated Skorohod metrics, then formula (1.18) shows that w_n → w in d^d_{Sk} if and only if w_n → w in d^{d′}_{Sk}. It is easy to see that two metrics are equivalent if every sequence that converges in one metric also converges in the other metric, and vice versa.⁶

⁶To see this, note that a set A is closed in the topology generated by a metric d if and only if x ∈ A for all x_n ∈ A with x_n → x in d. This shows that two metrics which define the same notion of convergence have the same closed sets. Since open sets are the complements of closed sets, they also have the same open sets, i.e., they generate the same topology.

Proof of Proposition 1.12. (a)⇔(b). We start by showing that (a) is equivalent to (b). Assume first that d^d_{Sk}(w_n, w) → 0 as n → ∞ for a metric d on (E, O). By definition, there then exist sequences (λ_n)_{n∈N} in Λ such that ‖λ_n‖ → 0 and

(1.33) ℓ{s ∈ [0, s_0] : sup_{t∈[0,∞)} 1 ∧ d(w_n(λ_n(t) ∧ s), w(t ∧ s)) ≥ ε} → 0 as n → ∞

for all ε > 0 and s_0 ∈ [0,∞). Hence, there is a subsequence (n_k)_{k∈N} such that sup_{t∈[0,∞)} 1 ∧ d(w_{n_k}(λ_{n_k}(t) ∧ s), w(t ∧ s)) → 0 as k → ∞ for almost every s ∈ [0,∞), and thus for all continuity points s of w. That is, there exist sequences (λ_n)_{n∈N} in Λ and (s_n)_{n∈N} ↑ ∞ in [0,∞) such that ‖λ_n‖ → 0 and

(1.34) lim_{n→∞} sup_{t≥0} d(w_n(λ_n(t) ∧ s_n), w(t ∧ s_n)) = 0.

Now for given T ∈ [0,∞), s_n ≥ T ∨ λ_n(T) for all n sufficiently large. Therefore (1.34) implies (1.32).

On the other hand, let a sequence (λ_n)_{n∈N} in Λ satisfy the condition of (b). Let s ∈ [0,∞). Then for each n ∈ N,

(1.35) sup_{t≥0} d(w_n(λ_n(t) ∧ s), w(t ∧ s))
        = sup_{t′_n:=λ_n(t)≥0} d(w_n(t′_n ∧ s), w(λ_n^{-1}(t′_n) ∧ s))
        ≤ sup_{t′_n≥0} d(w_n(t′_n ∧ s), w(λ_n^{-1}(t′_n ∧ s))) + sup_{t′_n≥0} d(w(λ_n^{-1}(t′_n ∧ s)), w(λ_n^{-1}(t′_n) ∧ s)).

We can estimate this further by

        ≤ sup_{r:=λ_n^{-1}(t′_n∧s) ∈ [0, λ_n^{-1}(s)]} d(w_n(λ_n(r)), w(r))
          + sup_{r:=λ_n^{-1}(t′_n) ∈ [s, λ_n^{-1}(s)∨s]} d(w(r), w(s)) ∨ sup_{r:=λ_n^{-1}(t′_n)∧s ∈ [λ_n^{-1}(s)∧s, s]} d(w(λ_n^{-1}(s)), w(r)),

where the second half of the last inequality follows by considering the cases t′_n ≤ s and t′_n > s separately. Thus by (1.32),

(1.36) lim_{n→∞} sup_{t∈[0,∞)} 1 ∧ d(w_n(λ_n(t) ∧ s), w(t ∧ s)) = 0

for every continuity point s of w. Hence, applying the dominated convergence theorem in (1.28) yields that d^d_{Sk}(w_n, w) → 0 as n → ∞.

(b)⇔(c). Obviously, assumption (c) is weaker than (b) (recall also (1.26)). To see the other direction, let N be a positive integer, and let (λ^N_n)_{n∈N} in Λ′ satisfy (1.26) and (1.32) with T = N, and assume moreover that

(1.37) λ^N_n(t) := λ^N_n(N) + t − N, t ≥ N.

We want to construct a sequence (λ_n)_{n∈N} in Λ such that

• ‖λ_n‖ → 0 as n → ∞, and
• sup_{t∈[0,T]} d(w_n(λ_n(t)), w(t)) → 0 as n → ∞, for all T ∈ [0,∞).

Notice that by (1.32),

(1.38) sup_{t∈[0,N]} d(w_n(λ^N_n(t)), w(t)) ≤ 1/N

for all n sufficiently large (depending on N), while in general we cannot conclude from (1.26) that lim inf_{n→∞} ‖λ^N_n‖ = 0 (recall the counterexample given after the proof of Lemma 1.9).

We therefore proceed as follows. We construct a sequence (λ̃^N_n)_{n∈N} in Λ, obtained by perturbing (λ^N_n)_{n∈N}, whose dilatation converges to zero as n → ∞, but mildly enough that sup_{t∈[0,N]} d(w_n(λ̃^N_n(t)), w_n(λ^N_n(t))) remains of order 1/N.

For that, define τ^N_0 := 0, and for all k ≥ 1,

(1.39) τ^N_k := { inf{t > τ^N_{k−1} : d(w(t), w(τ^N_{k−1})) > 1/N}, if τ^N_{k−1} < ∞,
                  ∞, if τ^N_{k−1} = ∞.

Since w is right continuous, the sequence (τ^N_k)_{k∈N} is strictly increasing as long as its terms remain finite. Since w has limits from the left, the sequence has no cluster point. Now let, for each n ∈ N,

(1.40) s^N_{k,n} := (λ^N_n)^{-1}(τ^N_k),

where by convention (λ^N_n)^{-1}(∞) = ∞. Define a sequence (λ̃^N_n)_{n∈N} in Λ by

(1.41) λ̃^N_n(t) := { τ^N_k + [(τ^N_{k+1} − τ^N_k)/(s^N_{k+1,n} − s^N_{k,n})] (t − s^N_{k,n}), if t ∈ [s^N_{k,n}, s^N_{k+1,n} ∧ N),
                      λ^N_n(N) + t − N, if t ∈ (N,∞),
                      arbitrary, otherwise,

where, by convention, ∞^{-1} ∞ = 1. With this convention and by (1.26),

(1.42) ‖λ̃^N_n‖ = max_{s^N_{k,n}≤N} | log(τ^N_{k+1} − τ^N_k) − log(s^N_{k+1,n} − s^N_{k,n}) | → 0 as n → ∞,

and

(1.43) sup_{t∈[0,N]} d(w_n(λ̃^N_n(t)), w_n(λ^N_n(t))) ≤ 2/N.

Since

(1.44) sup_{t∈[0,N]} d(w_n(λ̃^N_n(t)), w(t))
        ≤ sup_{t∈[0,N]} d(w_n(λ^N_n(t)), w(t)) + sup_{t∈[0,N]} d(w_n(λ̃^N_n(t)), w_n(λ^N_n(t)))
        ≤ sup_{t∈[0,N]} d(w_n(λ^N_n(t)), w(t)) + 2/N

for all n ∈ N, it follows from (1.38) and (1.42) that we can choose integers n_1 < n_2 < · · · such that ‖λ̃^N_n‖ ≤ 1/N and sup_{t∈[0,N]} d(w_n(λ̃^N_n(t)), w(t)) ≤ 3/N for all n ≥ n_N. For 1 ≤ n < n_1, let λ_n be arbitrary. For n_N ≤ n < n_{N+1}, N ≥ 1, let λ_n := λ̃^N_n. Then the sequence (λ_n)_{n∈N} satisfies the conditions of (b).

(c)⇔(d). To finish the proof we must show that (c) is equivalent to (d). Fix T > 0 and λ_n ∈ Λ′ satisfying (1.26). Define w̃_n(t) := w_n(λ_n(t)) (t ∈ [0, T]). Then we must show that the following conditions are equivalent:

(i) lim_{n→∞} sup_{t∈[0,T]} d(w̃_n(t), w(t)) = 0.

(ii) lim_{n→∞} w̃_n(t_n) = w(t) whenever t_n ↓ t, and lim_{n→∞} w̃_n(t_n) = w(t−) whenever t_n ↑ t, for t_n, t ∈ [0, T].

This is very similar to the proof of Lemma 1.7, with w_n replaced by w̃_n. The implication (i)⇒(ii) can be proved as in (1.13), using the facts that w(t_n) → w(t) if t_n ↓ t and w(t_n) → w(t−) if t_n ↑ t. To prove the implication (ii)⇒(i), we assume that (i) does not hold and show that there exist n_1 < n_2 < · · · such that lim_{m→∞} s_{n_m} = t for some t ∈ [0, T] and d(w̃_{n_m}(s_{n_m}), w(s_{n_m})) ≥ ε/2 for each m. Since either s_{n_m} > t infinitely often, or s_{n_m} < t infinitely often, or s_{n_m} = t infinitely often, by going to a further subsequence we can assume that either s_{n_m} ↓ t or s_{n_m} ↑ t. Now the proof proceeds as before.

We next state that if the underlying space (E, O) is Polish, then D_E[0,∞) is Polish.

Proposition 1.14 (Andrei N. Kolmogorov [Kol56]). If (E, O) is separable, then (D_E[0,∞), d^d_{Sk}) is separable. If (E, d) is complete, then (D_E[0,∞), d^d_{Sk}) is complete.

Remark. If E is Polish, then D_E[0,∞) is separable, and we can choose d such that E is complete in d; hence D_E[0,∞) is complete in d^d_{Sk}, hence D_E[0,∞) is Polish.

We prepare the proof with the following problem:

Exercise 1.15. Let (E, O) be a separable topological space, and (α_n)_{n∈N} a countable dense subset of E. Show that the collection Γ of all functions of the form

(1.45) w(t) := { α_{n_k}, t ∈ [t_{k−1}, t_k), k = 1, ..., K,
                 α_{n_K}, t ∈ [t_K, ∞),

where 0 = t_0 < t_1 < · · · < t_K are rational numbers, K ≥ 1, and n_1, ..., n_K ∈ N, is dense in (D_E[0,∞), d^d_{Sk}).

Proof of Proposition 1.14. Separability is covered by Exercise 1.15.

To prove completeness, it is enough to show that every Cauchy sequence has a subsequential limit. If (w_n)_{n∈N} is Cauchy, then for all k ∈ N there exists an N_k such that for all m, n ≥ N_k,

(1.46) d^d_{Sk}(w_m, w_n) ≤ 2^{-(k+1)} e^{-k}.

That is, for k ≥ 1, we can choose λ_k ∈ Λ and s_k > k such that

(1.47) ‖λ_k‖ ∨ sup_{t∈[0,∞)} 1 ∧ d(w_{N_k}(λ_k(t) ∧ s_k), w_{N_{k+1}}(t ∧ s_k)) ≤ 2^{-k}.

Let then

(1.48) µ_k := lim_{n→∞} λ_{k+n} ∘ · · · ∘ λ_{k+1} ∘ λ_k,

and notice that µ_k exists uniformly on bounded intervals, is Lipschitz continuous, and satisfies

(1.49) ‖µ_k‖ ≤ ∑_{l=k}^∞ ‖λ_l‖ ≤ 2^{-k+1},

and hence, in particular, belongs to Λ. Since by (1.47), for all k ≥ 1,

(1.50) sup_{t∈[0,∞)} 1 ∧ d(w_{N_k}(µ_k^{-1}(t) ∧ s_k), w_{N_{k+1}}(µ_{k+1}^{-1}(t) ∧ s_k))
        = sup_{t∈[0,∞)} 1 ∧ d(w_{N_k}(µ_k^{-1}(t) ∧ s_k), w_{N_{k+1}}(λ_k(µ_k^{-1}(t)) ∧ s_k))
        = sup_{t∈[0,∞)} 1 ∧ d(w_{N_k}(t ∧ s_k), w_{N_{k+1}}(λ_k(t) ∧ s_k))
        ≤ 2^{-k},

completeness of E implies that u_k := w_{N_k} ∘ µ_k^{-1} converges uniformly on bounded intervals to a function w : [0,∞) → E. Moreover, since u_k ∈ D_E[0,∞) for all k ≥ 1, also w ∈ D_E[0,∞). Therefore (w_{N_k})_{k∈N} and w satisfy the conditions of part (b) of Proposition 1.12, and hence we conclude that d^d_{Sk}(w_{N_k}, w) → 0 as k → ∞.

Let S_E denote the Borel σ-algebra on (D_E[0,∞), d^d_{Sk}). Since we are going to talk about probability measures on (D_E[0,∞), S_E), it is important to know more about S_E. The following result states that S_E is just the σ-algebra generated by the coordinate variables.

Proposition 1.16 (Borel σ-field). If (E, O) is Polish, then the Borel σ-field on D_E[0,∞) coincides with the σ-field generated by the coordinate projections (ξ_t)_{t≥0}, defined as

(1.51) ξ_t : D_E[0,∞) ∋ w ↦ w(t), t ≥ 0.

Proof. Let S^{coor}_E denote the σ-algebra generated by the coordinate maps, i.e.,

(1.52) S^{coor}_E := σ(ξ_t : t ∈ [0,∞)).

We start by showing that S^{coor}_E ⊆ S_E. For given ε > 0, t ∈ [0,∞), and a bounded continuous function f on E, consider the following map:

(1.53) f^ε_t : D_E[0,∞) ∋ w ↦ (1/ε) ∫_t^{t+ε} ds f(ξ_s(w)) ∈ R.

It is easy to check that f^ε_t is continuous on D_E[0,∞), and hence Borel measurable. Moreover, since lim_{ε↓0} f^ε_t = f ∘ ξ_t, we find that f ∘ ξ_t is Borel measurable for every bounded continuous function f, and hence also for all bounded measurable functions f. Consequently,

(1.54) ξ_t^{-1}(Γ) := {w ∈ D_E[0,∞) : ξ_t(w) ∈ Γ} ∈ S_E, Γ ∈ B(E).

That is, S^{coor}_E ⊆ S_E.

To prepare the other direction, notice first that if D ⊆ [0,∞) is dense, then

(1.55) S^{coor}_E = σ(ξ_t : t ∈ D).

Indeed, for each t ∈ [0,∞), there exists a sequence (t_n)_{n∈N} in D ∩ [t,∞) with t_n ↓ t as n → ∞. Therefore, by right continuity of the paths, ξ_t = lim_{n→∞} ξ_{t_n} is σ(ξ_s : s ∈ D)-measurable.

Assume now that (E, O) is separable. Fix n ∈ N and 0 =: t_0 < t_1 < t_2 < · · · < t_n < t_{n+1} < ∞. Consider the function

(1.56) η : E^{n+1} ∋ (α_0, ..., α_n) ↦ ∑_{k=0}^{n−1} α_k 1_{[t_k, t_{k+1})} + α_n 1_{[t_n,∞)} ∈ D_E[0,∞).

Since for a metric d on E,

(1.57) d^d_{Sk}(η(α_0, ..., α_n), η(β_0, ..., β_n)) ≤ max_{0≤k≤n} d(α_k, β_k),

η is continuous. Moreover, since ξ_t is by definition S^{coor}_E-measurable and (E, O) is separable, for a given u ∈ D_E[0,∞), the following map

(1.58) κ_{u,(t_0,...,t_n)} : D_E[0,∞) ∋ w ↦ d_{Sk}(u, η ∘ (ξ_{t_0}, ..., ξ_{t_n})(w)) ∈ R

is S^{coor}_E-measurable.

Finally, for each m ∈ N, let η_m be defined as η was, with the special choice n = m² and t_i := i/m, i = 0, ..., m². Then for all w ∈ D_E[0,∞),

(1.59) lim_{m→∞} d_{Sk}(u, η_m ∘ (ξ_{t_0}, ..., ξ_{t_{m²}})(w)) = d_{Sk}(u, w).

Thus, also the map d_{Sk}(u, ·) : w ↦ d_{Sk}(u, w) is S^{coor}_E-measurable for every fixed u ∈ D_E[0,∞). In particular, every open ball

(1.60) B(u, ε) := {w ∈ D_E[0,∞) : d_{Sk}(u, w) < ε}

belongs to S^{coor}_E, and since (E, O) (and by Proposition 1.14 also D_E[0,∞)) is separable, S^{coor}_E contains all open sets in D_E[0,∞), and hence contains S_E.

If E is a metrizable space, then we denote the space of continuous functions w : [0,∞) → E by C_E[0,∞).

Lemma 1.17 (Continuous functions). The space C_E[0,∞) is a closed subset of D_E[0,∞). The induced topology on C_E[0,∞) is the topology of uniform convergence on compact sets.

Proof. For closedness, let (w_n)_{n∈N} be a sequence of functions in C_E[0,∞), and let w ∈ D_E[0,∞) be such that d_{Sk}(w_n, w) → 0 as n → ∞. We have to show that w ∈ C_E[0,∞). By condition (c) of Proposition 1.12, for all T ∈ [0,∞) there exists a sequence (λ_n)_{n∈N} in Λ′ (depending on T) satisfying (1.26) and (1.32). Hence, for all T ∈ [0,∞) and ε > 0,

• by (1.26), there exists N = N(T, ε) such that for all n ≥ N and t ∈ [0, T], |λ_n(t) − t| < ε, and
• by continuity of w_n, there exists δ = δ(ε) > 0 such that for all s, t ∈ [0, T] with |s − t| < δ, d(w_n(t), w_n(s)) < ε.

Combining both yields that for all n ≥ N(T, δ(ε)) and t ∈ [0, T],

(1.61) d(w_n(t), w_n(λ_n(t))) < ε.

Thus, (1.61) together with (1.32) implies

(1.62) sup_{t∈[0,T]} d(w_n(t), w(t)) ≤ sup_{t∈[0,T]} d(w_n(λ_n(t)), w_n(t)) + sup_{t∈[0,T]} d(w_n(λ_n(t)), w(t)) → 0 as n → ∞.

This is equivalent to uniform convergence of (w_n)_{n∈N} to w on compacta. In particular, the limit function w is continuous.

The next lemma shows that stochastic processes with cadlag sample paths are just random variables with values in a rather large and complicated space.

Lemma 1.18 (Processes with cadlag sample paths). A function (t, ω) ↦ X_t(ω) is a stochastic process with Polish state space E and cadlag sample paths if and only if ω ↦ (X_t(ω))_{t≥0} is a D_E[0,∞)-valued random variable. Two E-valued stochastic processes X and X̃ with cadlag sample paths have the same finite-dimensional distributions if and only if, considered as D_E[0,∞)-valued random variables, they have the same laws L(X) and L(X̃).

Proof. Let X : Ω → D_E[0,∞) denote the function ω ↦ X(ω) := (X_t(ω))_{t≥0}. By Proposition 1.16, the Borel σ-field on D_E[0,∞) is generated by the coordinate projections (ξ_t)_{t≥0}. Therefore, the function X is measurable if and only if X^{-1}(ξ_t^{-1}(A)) ∈ F for all t ≥ 0 and A ∈ B(E). Since X^{-1}(ξ_t^{-1}(A)) = (ξ_t ∘ X)^{-1}(A) = X_t^{-1}(A), this is equivalent to the statement that the (X_t)_{t≥0} are random variables.

The finite-dimensional distributions of an E-valued stochastic process X are uniquely determined by all probabilities of the form

(1.63) P{X_{t_1} ∈ A_1, ..., X_{t_n} ∈ A_n}

with 0 ≤ t_1 ≤ · · · ≤ t_n and A_1, ..., A_n ∈ B(E). The class of all subsets of D_E[0,∞) of the form {w ∈ D_E[0,∞) : w_{t_1} ∈ A_1, ..., w_{t_n} ∈ A_n} is closed under finite intersections and generates the Borel σ-field on D_E[0,∞), so the probabilities of the form (1.63) uniquely determine the law L(X) of X, considered as a D_E[0,∞)-valued random variable.

1.4. Compactification of Polish spaces. In this section we collect some important facts about Polish spaces that will be useful later on. In particular, we will see that every Polish space can be embedded in a compact space.

Compact metrizable spaces are, in a sense, the "nicest" topological spaces. A countable product ∏_{i∈N} E_i of compact metrizable spaces, equipped with the product topology, is compact and metrizable [Kel55, Theorem 4.14].⁷ Every compact metrizable space is separable.⁸ Conversely, every separable metrizable space can be embedded in a compact metrizable space.

Definition 1.19. By definition, a compactification of a topological space E is a compact topological space Ē such that E ⊆ Ē, the topology on E is the topology induced from Ē, and Ē is the closure of E.

Note that if Ē is a compactification of E, then E is compact if and only if E = Ē.

The next proposition can be found in [Kel55, Theorem 4.17] or [Cho69, Theorem 6.3].

Proposition 1.20 (Metrizable compactifications). Every separable metrizable space E has a metrizable compactification Ē.

Definition 1.21 (Product topology). Let ((E_k, O_k))_{k∈N} be metrizable topological spaces. The product topology O on ∏_{k=1}^∞ E_k is the coarsest topology on ∏_{k=1}^∞ E_k such that all coordinate projections π_i : ∏_{k=1}^∞ E_k → E_i are continuous.

Remark. Let ((E_k, d_k))_{k∈N} be metric spaces. Then the product topology O on ∏_{k=1}^∞ E_k can be metrized by

(1.64) d(x, y) := ∑_{k=1}^∞ 2^{-k} [1 ∧ d_k(x_k, y_k)]

for all x := (x_1, x_2, ...) and y := (y_1, y_2, ...) in ∏_{k=1}^∞ E_k.

Proof of Proposition 1.20 (sketch). Equip [0, 1]^N with the product topology. Then [0, 1]^N is compact and metrizable. Using Urysohn's lemma, it can be shown that there exists a countable family (f_i)_{i∈N} of continuous functions f_i : E → [0, 1] such that the map f : E → [0, 1]^N defined by f(x) := (f_i(x))_{i∈N} is open and one-to-one. Since f is obviously continuous, it follows that f is a homeomorphism between E and f(E). Identifying E with its image f(E) and taking for Ē the closure of f(E) in [0, 1]^N, we obtain the required compactification.

Unfortunately, for general separable metrizable spaces, E may be a very 'bad' (even non-measurable) subset of its compactification Ē. For Polish spaces, and in particular for locally compact spaces, the situation is better. In what follows, all spaces are separable and metrizable.

⁷Uncountable products of compact metrizable spaces are still compact but no longer metrizable.

⁸This follows from the fact that for a metric space: compact ⇒ totally bounded ⇒ countable basis for the topology ⇒ separable.

Definition 1.22 (Locally compact). We say that E is locally compact if for each x ∈ E there exist an open set O and a compact set C such that x ∈ O ⊂ C.

Exercise 1.23. Let E be a locally compact space. Show that E is separable, and that there exist compact sets (C_i)_{i∈N} such that E = ⋃_{i∈N} C_i.

We need the following facts.

Proposition 1.24 (Subsets of locally compact and Polish spaces).

(i) A subset F of a locally compact space E is itself locally compact in the induced topology if and only if F ⊂ E is the intersection of an open set and a closed set.

(ii) A subset F of a Polish space E is itself Polish in the induced topology if and only if F ⊂ E is a countable intersection of open sets.

Proof. For part (i), see [Bou64, §8.16]. For part (ii), see [Bou58, Section 6, Theorem 1].

Remark. Sets that are the intersection of an open set and a closed set are called locally closed. Sets that are a countable intersection of open sets are called G_δ-sets. Every closed set is a G_δ-set.⁹

⁹If (E, d) is a metric space and A ⊆ E is closed, then the sets O_n := {x ∈ E : d(x, A) < 1/n} are open and satisfy A = ⋂_n O_n.

The following is an immediate consequence of Proposition 1.24.

Corollary 1.25. Let E be a separable metrizable space and let Ē be a metrizable compactification of E. Then

• E is locally compact if and only if E is an open subset of Ē;
• E is Polish if and only if E is a countable intersection of open sets in Ē.

Exercise 1.26. Prove Corollary 1.25.

In particular,

(1.65) E compact ⇒ E locally compact ⇒ E Polish.¹⁰

¹⁰If E is a separable metrizable space and there exists a metrizable compactification Ē of E such that E ⊂ Ē is a Borel measurable set, an analytic set, or a universally measurable set, then E is called a Lusin space, a Souslin space, or a Radon space, respectively.

If E is locally compact but not compact, then there exists a metrizable compactification Ē of E such that Ē \ E consists of one point (usually denoted by ∞). In this case, Ē is the set

(1.66) E_∞ := E ∪ {∞},

and by definition a subset U ⊆ Ē is open if either

• ∞ ∉ U and U is open in the original topology of E, or
• ∞ ∈ U and E \ U is compact in the original topology of E.

We call E_∞ the one-point compactification of E. As an application of Proposition 1.24, we prove:

Proposition 1.27 (Product spaces).

(i) A finite product E_1 × · · · × E_n of locally compact spaces is locally compact, but a countably infinite product ∏_{i∈N} E_i is not, unless all but finitely many of the E_i are compact.

(ii) A countable product ∏_{i∈N} E_i of Polish spaces is Polish.

Proof. Let E_i (i ∈ N) be locally compact spaces and let Ē_i be metrizable compactifications of the E_i. Then ∏_{i∈N} Ē_i is a metrizable compactification of ∏_{i∈N} E_i. Let π_i denote the projection on Ē_i. If all but finitely many E_i are compact, then there is an n such that E_i = Ē_i for all i > n. Therefore ∏_{i∈N} E_i = ⋂_{i=1}^n π_i^{-1}(E_i) is an open subset of ∏_{i∈N} Ē_i, hence ∏_{i∈N} E_i is locally compact. If there are infinitely many E_{i_k} that are not compact, then choose x = (x_i)_{i∈N} ∈ ∏_{i∈N} E_i and x^{(k)} ∈ ∏_{i∈N} Ē_i such that x^{(k)}_i = x_i for all i ≠ i_k and x^{(k)}_{i_k} ∈ Ē_{i_k} \ E_{i_k}. Then x^{(k)} ∉ ∏_{i∈N} E_i and x^{(k)} → x in the product topology, which proves that ∏_{i∈N} Ē_i \ ∏_{i∈N} E_i is not closed, hence ∏_{i∈N} E_i is not open, hence ∏_{i∈N} E_i is not locally compact.

If the E_i are Polish, then each E_i is a countable intersection of open subsets of Ē_i, say E_i = ⋂_j O_{ij}. Then ∏_{i∈N} E_i = ⋂_{i,j} π_i^{-1}(O_{ij}) is a countable intersection of open subsets of ∏_{i∈N} Ē_i, hence ∏_{i∈N} E_i is Polish.

Definition 1.28 (Separating points). We say that a family (f_i)_{i∈I} of functions on a space E separates points if for each x ≠ y there exists an i ∈ I such that f_i(x) ≠ f_i(y).

The next (deep) result is often very useful.

Proposition 1.29 (Borel σ-field). Let E, (E_i)_{i∈N} be Polish spaces and let (f_i)_{i∈N} be a countable family of measurable functions f_i : E → E_i that separates points. Then σ(f_i : i ∈ N) = B(E).

Proof. See [Sch73, Lemma II.18]. Warning: the statement is false for uncountable families (f_i)_{i∈I}. For example, if E = [0, 1], then the functions (1_{{x}})_{x∈[0,1]} separate points, but they generate the σ-field S := {A ⊂ [0, 1] : A countable or [0, 1] \ A countable}.

A simple application is:

Corollary 1.30 (Product σ-field). If (E_i)_{i∈N} are Polish spaces, then the Borel σ-field B(∏_{i∈N} E_i) coincides with the product σ-field ∏_{i∈N} B(E_i).

Proof. Let π_i denote the projection on E_i. Then the functions (π_i)_{i∈N} are continuous (hence certainly measurable) and separate points.

Note that Proposition 1.29 also implies that if E is Polish, then the Borel σ-field on D_E[0,∞) coincides with the σ-field generated by the coordinate projections {ξ_t : t ∈ Q ∩ [0,∞)}. This strengthens Proposition 1.16!


2. Markov processes

In the previous section, we have studied stochastic processes in general, and stochastic processes with cadlag sample paths in particular. In the present section we take a look at a special class of stochastic processes, namely those which have the Markov property, and in particular at those whose transition probabilities are time-homogeneous. We will see how such time-homogeneous transition probabilities can be interpreted as semigroups. In the next sections we will then see how a certain type of these semigroups, namely those which have the Feller property, may be constructed from their generators, and how such semigroups give rise to Markov processes with cadlag sample paths.

2.1. The Markov property. We start by recalling the notion of conditional expectation.

Let (Ω, F, P) be our underlying probability space. For any σ-field H ⊆ F, let

(2.1) B(H) := {f : Ω → R : f H-measurable and bounded}.

Definition 2.1. The conditional expectation of a random variable F ∈ B(F) given H, denoted by E^H[F] or E[F | H], is a random variable such that

(2.2) (1) E^H[F] ∈ B(H), and
      (2) E[E^H[F] H] = E[F H] for all H ∈ B(H).

The random variable E^H[F] is almost surely defined through these two conditions (with respect to the restriction of P to H). Some elementary properties of the conditional expectation are:

(2.3) ("continuity")  E^H[F_i] ↑ E^H[F] a.s. for all F_i ↑ F,
      ("projection")  E^G[E^H[F]] = E^G[F] a.s. for all G ⊂ H,
      ("pull out")    E^H[F] H = E^H[F H] a.s. for all H ∈ B(H).

We write P(A | H) := E^H[1_A] (A ∈ F), and for any random variable G we abbreviate E^G[F] = E[F | G] := E[F | σ(G)] and P(A | G) := P(A | σ(G)).

Proof of the pull out property. We need to check that E^H[F] H satisfies (2.2). Indeed, E^H[F] H ∈ B(H) since E^H[F] ∈ B(H) and H ∈ B(H), and applying (2.2)(2) twice we see that E[E^H[F] H H′] = E[F H H′] = E[E^H[F H] H′] for all H′ ∈ B(H), which shows that E^H[F] H satisfies (2.2)(2).

Lemma 2.2 (Conditional expectation). It suffices to check (2.2)(2) for H of the form H = 1_A with A ∈ G, where G is closed under finite intersections, there exist A_i ∈ G such that A_i ↑ Ω, and σ(G) = H.

Before we prove Lemma 2.2, we recall a basic fact from measure theory. A subset D of the set of all subsets of Ω is called a Dynkin system if

(1) Ω ∈ D,
(2) A, B ∈ D, A ⊇ B ⇒ A \ B ∈ D, and
(3) A_n ∈ D, A_n ↑ A ⇒ A ∈ D.

Lemma 2.3. Let C be a collection of subsets of Ω which is closed under finite intersections. Then the smallest Dynkin system which contains C is equal to σ(C).

Proof. See any book on measure theory.

Proof of Lemma 2.2. Set D := {A ∈ H : E[E^H[F] 1_A] = E[F 1_A]}. By the linearity and continuity of the conditional expectation, A, B ∈ D, A ⊇ B ⇒ A \ B ∈ D, and A_n ∈ D, A_n ↑ A ⇒ A ∈ D. Since we are assuming that there exist A_i ∈ G such that A_i ↑ Ω, we also have Ω ∈ D, so D is a Dynkin system. Therefore, by Lemma 2.3, E[E^H[F] 1_A] = E[F 1_A] for all A ∈ G implies E[E^H[F] 1_A] = E[F 1_A] for all A ∈ H. The general statement follows by approximation with simple functions, using the linearity and continuity of E^H.

Example. Let Z be uniformly distributed on [0, 1], X := cos(2πZ) and Y := sin(2πZ). Then (X, Y) is uniformly distributed on {(x, y) : x² + y² = 1}, and hence a version of the conditional distribution of X given Y is

(2.4) P(X ∈ C | Y) = (1/2) δ_{√(1−Y²)}(C) + (1/2) δ_{−√(1−Y²)}(C).

Moreover, we find E^Y[X] = 0, and E^Y[X²] = 1 − Y².
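The conditional expectations above can be checked with a quick Monte Carlo experiment; the slab width and sample size below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.uniform(size=1_000_000)
X, Y = np.cos(2 * np.pi * Z), np.sin(2 * np.pi * Z)

# Approximate conditioning on Y = y0 by averaging over a thin slab around y0.
for y0 in (-0.5, 0.0, 0.8):
    mask = np.abs(Y - y0) < 0.01
    print(y0, X[mask].mean(), (X[mask] ** 2).mean(), 1 - y0 ** 2)
# X[mask].mean() should be close to 0 and (X[mask] ** 2).mean() close to 1 - y0^2,
# in agreement with E^Y[X] = 0 and E^Y[X^2] = 1 - Y^2.
```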

Let X be a stochastic process with values in a Polish space (E, O). For each t ≥ 0, we introduce the σ-fields

(2.5) F^X_t := σ(X_s ; 0 ≤ s ≤ t)

and

(2.6) G^X_t := σ(X_u ; t ≤ u).

Note that F^X_t is the collection of events that refer to the behavior of the process X up to time t. That is, F^X_t contains all "information" that can be obtained by observing the process X up to time t. Likewise, G^X_t contains all information that can be obtained by observing the process X after time t.

Proposition 2.4 (Markov property). The following four conditions on X are equivalent.

(a) For all A ∈ F^X_t, B ∈ G^X_t, and t ≥ 0,

(2.7) P(A ∩ B | X_t) = P(A | X_t) P(B | X_t) a.s.

(b) For all B ∈ G^X_t and t ≥ 0,

(2.8) P(B | X_t) = P(B | F^X_t) a.s.

(c) For all C ∈ B(E) and 0 ≤ s ≤ t,

(2.9) P(X_t ∈ C | F^X_s) = P(X_t ∈ C | X_s) a.s.

(d) For all C_1, ..., C_n ∈ B(E) and 0 ≤ t_1 ≤ · · · ≤ t_n,

(2.10) P{X_{t_1} ∈ C_1, ..., X_{t_n} ∈ C_n} = E[1_{X_{t_1}∈C_1} E^{X_{t_1}}[1_{X_{t_2}∈C_2} E^{X_{t_2}}[· · · E^{X_{t_{n−1}}}[1_{X_{t_n}∈C_n}]]]].

Remark. If X satisfies the equivalent conditions from Proposition 2.4 then we say that X has the Markov property. Note that condition (a) says that the future and the past are conditionally independent given the present. Condition (b) says that the behavior of X after time t depends on the behavior of X before time t only through the state of X at time t.

Exercise 2.5 (Gaussian processes with Markov property). A stochastic process X := (X_t)_{t∈[0,∞)} is called a Gaussian process if for all n ∈ N and (t_1, ..., t_n) ∈ [0,∞)^n, the random vector (X_{t_1}, ..., X_{t_n}) is normally distributed, with mean vector (µ_{t_1}, ..., µ_{t_n}) ∈ R^n and covariance function Γ(s, t) := E[(X_s − µ_s)(X_t − µ_t)].

Show that a centered (i.e., µ ≡ 0) Gaussian process has the Markov property if and only if for all s, t, u ∈ [0,∞) with s < t < u,

(2.11) Γ(s, u) Γ(t, t) = Γ(s, t) Γ(t, u).
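Two standard examples may help to see what condition (2.11) means. Brownian motion is a centered Gaussian process with Γ(s, t) = s ∧ t, and for s < t < u indeed Γ(s, u) Γ(t, t) = s · t = Γ(s, t) Γ(t, u). Similarly, the stationary Ornstein-Uhlenbeck process (suitably normalized) has Γ(s, t) = e^{−|t−s|}, and Γ(s, u) Γ(t, t) = e^{−(u−s)} = Γ(s, t) Γ(t, u). Both processes satisfy (2.11), as they should, since both are Markov.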

Proof of Proposition 2.4. (a)⇒(b): We have to show that for all A ∈ F^X_t and B ∈ G^X_t,

(2.12) E[1_A P(B | X_t)] = P(A ∩ B).

By the projection property and by the pull out property applied to H := P(B | X_t),

(2.13) E[1_A P(B | X_t)] = E[E^{X_t}[1_A P(B | X_t)]] = E[P(A | X_t) P(B | X_t)].

By (a),

(2.14) E[P(A | X_t) P(B | X_t)] = E[P(A ∩ B | X_t)] = P(A ∩ B).

(b)⇒(c): This follows by applying (2.8) with t replaced by s and with B := {X_t ∈ C} ∈ G^X_s.

(c)⇒(d): Since F^X_{t_1} ⊆ F^X_{t_2} ⊆ · · ·, repeated use of the projection property and the pull out property gives

(2.15) P{X_{t_1} ∈ C_1, ..., X_{t_n} ∈ C_n} = E[1_{X_{t_1}∈C_1} · · · 1_{X_{t_n}∈C_n}]
        = E[1_{X_{t_1}∈C_1} E^{F^X_{t_1}}[1_{X_{t_2}∈C_2} E^{F^X_{t_2}}[· · · E^{F^X_{t_{n−1}}}[1_{X_{t_n}∈C_n}]]]],

and the right-hand side of (2.15) equals the right-hand side of (2.10) by (c).

(d)⇒(c): By approximation with simple functions it follows from (d) that for any 0 ≤ t_1 ≤ · · · ≤ t_n and F_1 ∈ B(σ(X_{t_1})), ..., F_n ∈ B(σ(X_{t_n})),

(2.16) E[F_1 · · · F_n] = E[F_1 E^{X_{t_1}}[F_2 E^{X_{t_2}}[· · · E^{X_{t_{n−1}}}[F_n]]]].

Let 0 ≤ s_1 ≤ · · · ≤ s_m = s ≤ t and C_1, ..., C_m, C ∈ B(E). Applying (2.16) once with n = m + 1 and once with n = m and F_m := 1_{X_{s_m}∈C_m} E^{X_s}[1_{X_t∈C}], we find that

(2.17) E[1_{X_{s_1}∈C_1, ..., X_{s_m}∈C_m} 1_{X_t∈C}]
        = E[1_{X_{s_1}∈C_1} E^{X_{s_1}}[· · · E^{X_{s_{m−1}}}[1_{X_{s_m}∈C_m} E^{X_s}[1_{X_t∈C}]]]]
        = E[1_{X_{s_1}∈C_1, ..., X_{s_m}∈C_m} E^{X_s}[1_{X_t∈C}]].

It follows from Lemma 2.2 that

(2.18) E^{X_s}[1_{X_t∈C}] = E^{F^X_s}[1_{X_t∈C}].

(c)⇒(b): By approximation with simple functions it follows from (c) that for all F ∈ B(σ(X_t)) and 0 ≤ s ≤ t,

(2.19) E[F | F^X_s] = E[F | X_s] a.s.

Let 0 ≤ t ≤ u_1 ≤ · · · ≤ u_m and C_1, ..., C_m ∈ B(E). Then repeated use of the projection property, the pull out property, and (2.19) gives

(2.20) E^{F^X_t}[1_{X_{u_1}∈C_1, ..., X_{u_m}∈C_m}]
        = E^{F^X_t}[1_{X_{u_1}∈C_1} E^{F^X_{u_1}}[· · · E^{F^X_{u_{m−1}}}[1_{X_{u_m}∈C_m}]]]
        = E^{X_t}[1_{X_{u_1}∈C_1} E^{X_{u_1}}[· · · E^{X_{u_{m−1}}}[1_{X_{u_m}∈C_m}]]].

In the last step we have applied (2.19) first to 1_{X_{u_m}∈C_m} ∈ B(σ(X_{u_m})), then to 1_{X_{u_{m−1}}∈C_{m−1}} E^{X_{u_{m−1}}}[1_{X_{u_m}∈C_m}] ∈ B(σ(X_{u_{m−1}})), and so on. It follows that E^{F^X_t}[1_{X_{u_1}∈C_1, ..., X_{u_m}∈C_m}] is σ(X_t)-measurable. Therefore, by the projection property,

(2.21) E^{X_t}[1_{X_{u_1}∈C_1, ..., X_{u_m}∈C_m}] = E^{X_t}[E^{F^X_t}[1_{X_{u_1}∈C_1, ..., X_{u_m}∈C_m}]] = E^{F^X_t}[1_{X_{u_1}∈C_1, ..., X_{u_m}∈C_m}].

The class of all sets A such that E^{X_t}[1_A] = E^{F^X_t}[1_A] forms a Dynkin system, so by Lemma 2.3 we arrive at (b).

(b)⇒(a): Indeed, by (2.8) and the pull out property, for all D ∈ σ(X_t),

(2.22) P(A ∩ B ∩ D) = E[P(B | F^X_t) 1_{A∩D}] = E[P(B | X_t) 1_{A∩D}]
        = E[E^{X_t}[P(B | X_t) 1_{A∩D}]] = E[P(A | X_t) P(B | X_t) 1_D],

which proves (2.7) because P(A | X_t) P(B | X_t) ∈ B(σ(X_t)).

2.2. Transition probabilities. Let E, F be Polish spaces. By definition, a probability kernel from E to F is a function K : E × B(F) → [0, 1] such that

(1) for fixed x ∈ E, K(x, ·) is a probability measure on F, and
(2) for fixed A ∈ B(F), K(·, A) is a measurable function on E.

If E = F then we say that K is a probability kernel on E.

Example. For all x ∈ R and A ∈ B(R), set

(2.23) K(x, A) := (1/√(2π)) ∫_A dy exp[−(x − y)²/2].

Then K is a probability kernel on R.

There is another way of looking at probability kernels that is often very useful. For any Polish space E we define

(2.24) B(E) := {f : E → R : f Borel measurable and bounded}.

Lemma 2.6 (Probability kernels). If K is a probability kernel from E to F, then the operator K : B(F) → B(E) defined by

(2.25) Kf(x) := ∫_F K(x, dy) f(y), x ∈ E, f ∈ B(F),

satisfies

(1) K is conservative, i.e., K1 = 1.
(2) K is positive, i.e., Kf ≥ 0 for all f ≥ 0.
(3) K is linear, i.e., K(λ_1 f_1 + λ_2 f_2) = λ_1 K(f_1) + λ_2 K(f_2) for all f_1, f_2 ∈ B(F) and λ_1, λ_2 ∈ R.
(4) K is continuous with respect to monotone sequences, i.e., K(f_i) ↑ K(f) for all f_i ↑ f, f_i, f ∈ B(F).

Conversely, every operator K : B(F) → B(E) with these properties corresponds to a probability kernel from E to F as in (2.25).

Proof. If K is a probability kernel from E to F, then the operator K : B(F) → B(E) defined in (2.25) maps B(F) into B(E), since K(·, A) is measurable for each A ∈ B(F), and the operator K has the properties (1)–(4), since K(x, ·) is a probability measure for each x ∈ E. Conversely, if K : B(F) → B(E) satisfies (1)–(4), then K(x, A) := K1_A(x) is measurable as a function of x for each A ∈ B(F), since the operator K maps B(F) into B(E), and K(x, ·) is a probability measure by (1)–(4).
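To make the operator picture (2.25) concrete, the following Python sketch applies the Gaussian kernel from the example above to a few test functions by numerical integration (the grid and the test functions are arbitrary choices made here for illustration):

```python
import numpy as np

y = np.linspace(-10.0, 10.0, 4001)      # integration grid
dy = y[1] - y[0]

def K(f, x):
    """Approximate Kf(x) = (2*pi)^(-1/2) * integral of f(y) exp(-(x-y)^2/2) dy."""
    weights = np.exp(-(x - y) ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
    return np.sum(weights * f(y)) * dy

print(K(lambda y: np.ones_like(y), 0.3))   # ~ 1        : K is conservative (K1 = 1)
print(K(lambda y: y, 0.3))                 # ~ 0.3      : K(x, .) is centered at x
print(K(lambda y: y ** 2, 0.3))            # ~ 1 + 0.09 : second moment of N(x, 1)
```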

Remark. If E is a set consisting of one point, say E = {0}, then a probability kernel from E to F is just a probability measure K(0, ·) = µ, say. In this case B(E) is isomorphic to R and the operator in (2.25), considered as an operator from B(F) to R, is given by

(2.26) µf := ∫_F µ(dy) f(y), f ∈ B(F).

If E, F, and G are Polish spaces, K is a probability kernel from E to F, and L is a probability kernel from F to G, then the composition of the operators L : B(G) → B(F) and K : B(F) → B(E) yields an operator KL : B(G) → B(E) that corresponds to the composite kernel KL from E to G given by

(2.27) (KL)(x, A) := ∫_F K(x, dy) L(y, A), x ∈ E, A ∈ B(G).

The following result states that conditional probabilities of random variables with values in Polish spaces are associated with probability kernels.

Proposition 2.7 (Conditional probability kernel). Let X, Y be random variables with values in Polish spaces E and F, respectively. Then there exists a probability kernel P from E to F such that for all A ∈ B(F),

P{Y ∈ A | X} = P(X, A) a.s.

The kernel P is unique up to a.s. equality with respect to L(X).

Proof (sketch). Let M(F) be the space of finite measures on F, equipped with the σ-field generated by the mappings µ ↦ µ(A) with A ∈ B(F). Define a function M : B(E) → M(F) by

(2.28) M(B)(A) := P{X ∈ B, Y ∈ A}.

Then M(∅) = 0, the zero measure, and M is σ-additive, so we may interpret M as a measure on (E, B(E)) with values in M(F). Moreover, P{X ∈ B} = 0 implies M(B) = 0, so M is absolutely continuous with respect to P_X, the law of X. It follows from the fact that F is a Polish space that the space M(F) has the Radon-Nikodym property, i.e., the Radon-Nikodym theorem also holds for M(F)-valued measures and functions. As a result, there exists an M(F)-valued measurable function x ↦ P(x, ·) from E to M(F), unique up to a.s. equality with respect to P_X, such that

(2.29) M(B) = ∫_B P(x, ·) P_X(dx).

It is not hard to check that a function x ↦ P(x, ·) from E to M(F) is measurable if and only if P is a probability kernel from E to F. Now (2.29) says that

(2.30) E[P(X, A) 1_B] = ∫_B P(x, A) P_X(dx) = M(B)(A) = P{X ∈ B, Y ∈ A},

which is equivalent to the statement of Proposition 2.7.

Remark. Proposition 2.7 remains true if only F is Polish and E is any measurable space.

It follows from Proposition 2.7 that for any stochastic process X in E there exist probability kernels (P_{s,t})_{0≤s≤t} on E such that for all A ∈ B(E) and 0 ≤ s ≤ t,

(2.31) P{X_t ∈ A | X_s} = P_{s,t}(X_s, A) a.s.

We call (P_{s,t})_{0≤s≤t} the transition probabilities of X.

Proposition 2.8 (Markov transition probabilities). Let X be a stochasticprocess with values in E and let (Ps,t)0≤s≤t be probability kernels on E. Thenthe following conditions are equivalent:

(a) For all C ∈ B(E) and 0 ≤ s ≤ t,

(2.32) P(Xt ∈ C|FXs ) = Ps,t(Xs, C), a.s.

(b) X has the Markov property, and for all C ∈ B(E) and 0 ≤ s ≤ t,

(2.33) P(Xt ∈ C|Xs) = Ps,t(Xs, C), a.s.

(c) For all C1, . . . , Cn ∈ B(E) and 0 = t0 ≤ t1 ≤ · · · ≤ tn,(2.34)

PXt1 ∈ C1, . . . ,Xtn ∈ Cn

=

∫PX0 ∈ dx0

C1

Pt0,t1(x0,dx1) · · ·∫

Cn

Ptn−1,tn(xn−1,dxn).

Proof. (a)⇒(b): It follows from (a) that P(Xt ∈ C|FXs ) is measur-

able with respect to σ(Xt), and therefore P(Xt ∈ C|Xs) = E[P(Xt ∈C|FX

s )|Xs] = P(Xt ∈ C|FXs ), a.s. By condition (c) of Proposition 2.4,

X has the Markov property.

(b)⇒(a): Since X has the Markov property, by condition (c) of Proposi-tion 2.4, E[P(Xt ∈ C|FX

s )|Xt] = P(Xt ∈ C|Xs) = P(Xt ∈ C|FXs ),

a.s.

Page 30: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

30 JAN SWART AND ANITA WINTER

(b)⇒(c): Since X has the Markov property, X satisfies condition (d) ofProposition 2.4. Using the fact that P(Xt ∈ C|Xs) = Ps,t(Xs, C), a.s.,we arrive at (c).

(c)⇒(b): We start by showing that for all C ∈ B(E) and 0 ≤ s ≤ t),

(2.35) P(Xt ∈ C|Xs) = Ps,t(Xs, C),

where (Ps,t)0≤s≤t are the probability kernels in (c). Since Ps,t(Xs, C) ismeasurable with respect to σ(Xs), by the definition of the conditional prob-ability, it suffices to show that E[Ps,t(Xs, C)1Xs∈B] = PXs ∈ B,Xt ∈ Cfor all B,C ∈ B(E). Indeed, by (c),(2.36)

E[Ps,t(Xs, C)1Xs∈B] =∫

PX0 ∈ dx0∫

BP0,s(x0,dx1)Ps,t(x1, C)

= PXs ∈ B,Xt ∈ C.This proves (2.35), i.e., the (Ps,t)0≤s≤t are the transition probabilities of X.It follows that X satisfies condition (d) from Proposition 2.4, so X has theMarkov property.

2.3. Transition functions and Markov semigroups. Condition (c) ofProposition 2.8 shows that the finite dimensional distributions of a processX with the Markov probability are uniquely determined by its transitionprobabilities (Ps,t)0≤s≤t and its initial law L(X0). We will mainly be inter-ested in the case that the transition probabilities can be chosen in such away that Ps,t is a function of t−s only. This leads to the following definition.Recall that the delta-measure δx in a point x is defined as

(2.37) δx(A) =

1, x ∈ A,0, x 6∈ A.

Definition 2.9 (Transition function). By definition, a transition functionon E is a collection (Pt)t≥0 of probability kernels on E such that

(1) (Initial law) For all x ∈ E,

(2.38) P0(x, ·) := δx,

(2) (Chapman-Kolmogorov equation) For all x ∈ E, C ∈ B(E), and0 ≤ s ≤ t,

(2.39)

∫Ps(x,dy)Pt(y,A) = Ps+t(x,A).

We make the following observation.

Lemma 2.10 (Markov semigroups). A collection (Pt)t≥0 of probability ker-nels on E is a transition function if and only if the associated operatorsPt : B(E) → B(E) defined by

(2.40) Ptf(x) :=

EPt(x,dy)f(y) x ∈ E, f ∈ B(E)

Page 31: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 31

satisfy

(1) P0f = f (f ∈ B(E)),(2) PsPt = Ps+t (s, t ≥ 0).

Properties (1) and (2) from Lemma 2.10 say that the operators (Pt)t≥0

form a semigroup. If (Pt)t≥0 is a transition function then we call the asso-ciated semigroup of operators on B(E) a Markov semigroup.

Proposition 2.11 (Markov processes). Let X be a stochastic process withvalues in E and let (Pt)t≥0 be a transition function on E. Then the followingconditions are equivalent:

(a) For all f ∈ B(E) and 0 ≤ s ≤ t,

(2.41) E[f(Xt)|FXs ] = Pt−sf(Xs), a.s.

(b) X has the Markov property, and for all A ∈ B(E) and 0 ≤ s ≤ t,

(2.42) P(Xt ∈ A|Xs) = Pt−s(Xs, A), a.s.

(c) For all A1, . . . , An ∈ B(E) and 0 = t0 ≤ t1 ≤ · · · ≤ tn,(2.43)

PXt1 ∈ A1, . . . ,Xtn ∈ An

=

∫PX0 ∈ dx0

A1

Pt1−t0(x0,dx1) · · ·∫

An

Ptn−tn−1,(xn−1,dxn).

Proof. We claim that condition (a) is equivalent to

(a)’ For all A ∈ B(E) and 0 ≤ s ≤ t,

(2.44) P(Xt ∈ A|FXs ) = Pt−s(Xs, A), a.s.

Indeed, the implication (a)⇒(a)’ is obvious, while the converse follows byapproximation with simple functions. Therefore the statement follows di-rectly from Proposition 2.8.

Proposition 2.12 (Construction from semigroup). Let E be a Polish space,(Pt)t≥0 a transition function on E, and µ a probability measure on E. Thenthere exists a stochastic process X, unique in finite dimensional distribu-tions, such that X satisfies the equivalent conditions (a)–(c) from Proposi-tion 2.11.

Proof. By condition (c) from Proposition 2.11, it suffices to show that thereexists a stochastic process X with finite dimensional distributions given by

(2.45)

PXt1 ∈ A1, . . . ,Xtn ∈ An

=

∫µ(dx0)

A1

Pt1−t0(x0,dx1) · · ·∫

An

Ptn−tn−1(xn−1,dxn)

for all n ∈ N, 0 = t0 ≤ t1 ≤ · · · ≤ tn and A1, . . . , An ∈ B(E). By the Chap-man Kolmogorov equation for transition functions, these finite dimensionaldistributions are consistent in the sense of Theorem 1.2, so there exists a

Page 32: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

32 JAN SWART AND ANITA WINTER

stochastic process (Xt)t≥0 with the finite dimensional distributions in (2.45).

If X is a stochastic process with the Markov property and there existsa transition function (Pt)t≥0 such that X satisfies the equivalent conditions(a)–(c) from Proposition 2.11, then we say that X is time-homogeneous.Note that by (c), the finite dimensional distributions of X are uniquelydetermined by L(X0) and (Pt)t≥0. We call X the Markov process withsemigroup (Pt)t≥0, started in the initial law L(X0).

Example. (Transition function of Brownian motion). Very often it is notpossible to give the transition function of a process explicitly. An exceptionis Brownian motion. Here, for all 0 ≤ s ≤ t, x ∈ R, and A ∈ B(R),

(2.46) Ps,t(x,A) =1√2πt

Ady exp

[− (x− y)2

2(t− s)

].

Exercise 2.13 (Time reversal and time-homogeneity). Let T > 0 and let(Xt)t∈[0,T ] be a stochastic process with index set [0, T ]. How would you definethe Markov property for such a process? Show that if (Xt)t∈[0,T ] has theMarkov property then the time-reversed process (XT−t)t∈[0,T ] also has theMarkov property. If (Xt)t∈[0,T ] is time-homogeneous, then is (XT−t)t∈[0,T ]

in general also time-homogeneous? (Hint: it may be easier to investigatethe latter question for Markov chains (Xi)i∈0,...,n.)

2.4. Forward and backward equations. In Proposition 2.12 we haveseen that for a given initial law L(X0) and transition function (Markovsemigroup) (Pt)t≥0, there exists a Markov processX, which is unique in finitedimensional distributions. There are two reasons why we are not satisfiedwith this result. The first reason is that Proposition 2.12 says nothing aboutthe sample paths of X, which we would like to be cadlag. The second reasonis that Proposition 2.12 says nothing about how to construct transitionfunctions (Pt)t≥0 in the first place. Examples such as (2.46) where we canexplicitly write down a transition function are rare. There are basically twomore general approaches towards obtaining transition functions.

Identify probability kernels K on E with operators K : B(E) → B(E) asin Lemma 2.6 and probability measures µ on E with functions µ : B(E) →R. In a first attempt to obtain a transition function (Pt)t≥0, we fix x ∈ E,and we consider the probability measures

µt := Pt(x, ·) (t ≥ 0).

Then

µtf =

∫Pt(x,dy)f(y) = Ptf(x) (t ≥ 0)

Page 33: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 33

and therefore

µt+εf = Pt+εf(x) = PtPεf(x) = µtPεf (t, ε ≥ 0).

Therefore, we can try to define an operator H, acting on probability mea-sures, by

Hµ := limε→0

ε−1(µPε − µ

),

and then try to solve

(2.47)

∂∂tµt=Hµt,µ0= δx

for fixed x ∈ E. Equation (2.47) is called the forward equation.In the second approach, we fix f ∈ B(E), and consider the functions

ut := Ptf (t ≥ 0).

Thenut+ε = Pt+εf = PεPtf = Pεut (t, ε ≥ 0).

Therefore, we can try to define an operator G, acting on functions f , by

Gf := limε→0

ε−1(Pεf − f

),

and then try to solve

(2.48)

∂∂tut=Gut,u0 = f

for fixed f ∈ B(E). Equation (2.48) is called the backward equation.

Page 34: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

34 JAN SWART AND ANITA WINTER

3. Feller semigroups

3.1. Weak convergence. Let E be a Polish space. By definition,

(3.1) Cb(E) :=f : f : E → R bounded continuous

is the space of all bounded continuous real-valued functions on E. We equipCb(E) with the supremumnorm

(3.2) ‖f‖ := supx∈E

|f(x)|.

With this norm, Cb(E) is a Banach space. If E is compact then everycontinuous function is bounded so we simply write C(E) = Cb(E). In thiscase C(E) is moreover separable. By definition

(3.3) M1(E) := µ : µ probability measure on (E,B(E)).is the space of all probability measures on E. We equip M1(E) with thetopology of weak convergence. We say that a sequence of measures µn ∈M1(E) converges weakly to a limit µ ∈ M1(E), denoted as µn ⇒ µ, if

(3.4) µnf −→n→∞

µf ∀f ∈ Cb(E).

(Recall the notation µf :=∫fdµ from (2.26).) This notion of convergence

indeed comes from a topology.

Proposition 3.1 (Prohorov metric). Let (E, d) be a separable metric space.For any A ⊆ E and r > 0, put Ar := x ∈ E : infy∈A d(x, y) < r. Then(3.5)

dPr(µ1, µ2) := infr > 0 : µ1(A) ≤ µ2(A

r) + r ∀A ⊆ E closed

= infr > 0 : ∃µ ∈ M1(E × E) s.t.µ(A× E) = µ1(A), µ(E ×A) = µ2(A) ∀A ∈ B(E),µ((x1, x2) ∈ E ×E : d(x1, x2) ≥ r) ≤ r

defines a metric on M1(E) generating the topology of weak convergence. Thespace (M1(E), dPr) is separable. If (E, d) is complete, then (M1(E), dPr) iscomplete.

Proof. See [EK86, Theorems 3.1.2, 3.1.7, and 3.3.1].

The second formula for dPr in (3.5) says that

(3.6)dPr(µ1, µ2) = inf

r > 0 :Pd(X1,X2) ≥ r ≤ r,

L(X1) = µ1, L(X1) = µ1

,

where the infimum is over all pairs of random variables (X1,X2) with lawsµ1 and µ2, respectively.

Formula (3.4) shows that the topology of weak convergence on M1(E)

does not depend on the choice of the metric on E. In other words, if d, d areequivalent metrics on E and dPr and dPr are the associated Prohorov metricson M1(E), then dPr and dPr are equivalent. Proposition 3.1 moreover showsthat M1(E) is Polish if E is Polish.

Page 35: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 35

The next proposition can be found in [EK86, Theorem 3.2.2].

Proposition 3.2 (Prohorov). Let E be Polish. Then a set K ⊆ M1(E) iscompact if and only if K is closed and

(3.7) ∀ε > 0 ∃K ⊂ E compact s.t. µ(E\K) ≤ ε ∀µ ∈ K.

Property (3.7) is called the tightness if the set K. Note that Proposition 3.2implies in particular that M1(E) is compact if E is compact.

Exercise 3.3. Let E be Polish. Show K ⊂ M1(E) tight ⇒ K compact.

3.2. Continuous kernels and Feller semigroups. For a proof of thefollowing proposition, see for example [RS80, Theorem IV.14].

Proposition 3.4 (Probability measures as positive linear forms). Let E becompact and metrizable. A probability measure µ ∈ M1(E) defines through(2.26) a function µ : C(E) → R with the following properties

(1) (normalization) µ1 = 1.(2) (positivity) µf ≥ 0 for all f ≥ 0.(3) (linearity) µ(λ1f1 + λ2f2) = λ1µ(f1) + λ2µ(f2)

for all λ1, λ2 ∈ R, f1, f2 ∈ Cb(E).

Conversely, each function µ : C(E) → R with these properties correspondsthrough (2.26) to a probability measure µ ∈ M1(E).

Let E,F be compact metrizable spaces and let M1(E),M1(F ) be thespaces of probability measures on E and F , respectively, equipped with thetopology of weak convergence. By definition, a probability kernel K from Eto F is continuous if the map x 7→ K(x, ·) from E to M1(F ) is continuous.

Proposition 3.5 (Continuous probability kernels). A continuous probabilitykernel K from E to F defines through (2.25) an operator K : C(F ) → C(E)with the following properties

(1) (conservativeness) K1 = 1.(2) (positivity) Kf ≥ 0 for all f ≥ 0.(3) (linearity) K(λ1f1 + λ2f2) = λ1K(f1) + λ2K(f2)

for all λ1, λ2 ∈ R, f1, f2 ∈ C(E).

Conversely, each operator K : C(F ) → C(E) with these properties corre-sponds through (2.25) to a continuous probability kernel K from E to F .

Proof. By Proposition 3.4, the properties (1)–(4) from Proposition 3.5 areequivalent to the statement that for fixed x ∈ E, K(x, ·) is a probabilitymeasure on F . (Note that tightness is automatic since F is compact.) Thestatement that K maps C(F ) into C(E) means that

∫K(xn,dy)f(y) →∫

K(x,dy)f(y) whenever xn → x and f ∈ C(F ). This is equivalent to thestatement that K(xn, ·) ⇒ K(x, ·) whenever xn → x, i.e., x 7→ K(x, ·) iscontinuous.

Page 36: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

36 JAN SWART AND ANITA WINTER

Exercise 3.6. Show that properties (1)–(3) from Proposition 3.5 imply thatK : C(F ) → C(E) is continuous, i.e., Kfn → Kf whenever ‖fn − f‖ → 0.

It is easy to see (for example from Proposition 3.5) that the composi-tion (in the sense of (2.27)) of two continuous probability kernels is againcontinuous.

Let E be a compact metrizable space. By definition, we say that a transi-tion probability (Pt)t≥0 on E is continuous if the map (t, x) 7→ Pt(x, ·) from[0,∞) × E into M1(E) is continuous. Here we equip [0,∞) × E with theproduct topology and M1(E) with the topology of weak convergence.

Proposition 3.7 (Feller semigroups). Let (Pt)t≥0 be a continuous transitionprobability on E. Then the operators (Pt)t≥0 defined in (2.40) map C(E) intoC(E) and, considered as operators from C(E) into C(E), they satisfy

(1) Pt is conservative for each t ≥ 0, i.e., Pt1 = 1.(2) Pt is positive for each t ≥ 0, i.e., Ptf ≥ 0 for all f ≥ 0.(3) Pt is linear for each t ≥ 0.(4) The (Pt)t≥0 form a semigroup, i.e., P0f = f for all f ∈ C(E)

and PsPt = Ps+t for all s, t ≥ 0.(5) (Pt)t≥0 is strongly continuous, i.e., limt→0 ‖Ptf − f‖ = 0 for all

f ∈ C(E).

Conversely, each collection of operators (Pt)t≥0 from C(E) into C(E) withthese properties corresponds through (2.40) to a continuous transition prob-ability on E.

A collection of operators (Pt)t≥0 from C(E) into C(E) with the properties(1)–(5) from Proposition 3.7 is called a Feller semigroup.

Proof of Proposition 3.7. By the definition of weak convergence of proba-bility measures, a transition probability (Pt)t≥0 on E is continuous if andonly if the function (t, x) 7→ Ptf(x) from [0,∞) × E into R is continuousfor each f ∈ C(E). We claim that this is equivalent to the statement thatPtf ∈ C(E) for all t ≥ 0 and

(5)’ lims→t ‖Psf − Ptf‖ = 0 for all f ∈ C(E), t ≥ 0.

Assume that Ptf ∈ C(E) for all t ≥ 0 and (5)’ holds. Choose (tn, xn) →(t, x). Then

(3.8)|Ptnf(xn)− Ptf(x)| ≤ |Ptnf(xn)− Ptf(xn)|+ |Ptf(xn)− Pt(x)|

≤ ‖Ptnf − Ptf‖+ |Ptf(xn)− Pt(x)| −→n→∞

0,

which shows that (t, x) 7→ Ptf(x) is continuous. Conversely, if (t, x) 7→Ptf(x) is continuous then obviously we must have Ptf ∈ C(E) for all t ≥ 0.Now assume that (5)’ does not hold. Then we can find ε > 0, tn → t, andxn ∈ E such that

(3.9) |Ptnf(xn)− Ptf(xn)| ≥ ε.

Page 37: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 37

Since E is compact, we can choose a convergent subsequence xnm → x.Then (tmn , xmn) → (t, x) but since(3.10)|Ptmn

f(xmn)− Ptf(x)| ≥ |Ptmnf(xmn)− Ptf(xmn)| − |Ptf(xmn)− Ptf(x)|

we have lim infn→∞ |Ptmnf(xmn)−Ptf(x)| ≥ ε by te continuity of Ptf , which

shows that Ptmnf(xmn) 6→ Ptf(x), i.e., (t, x) 7→ Ptf(x) is not continuous.

(Note that this is very similar to the proof of Lemma 1.7.)It follows from Proposition 3.5 that a collection (Pt)t≥0 of operators on

C(E) satisfying (1)–(4) corresponds to a transition probability on E withthe property that Pt is a continuous probability kernel for each fixed t ≥ 0.

It therefore suffices to show that (5) is equivalent to (5)’. The implication(5)’⇒(5) is trivial. Conversely, if (5) holds then

(3.11) limtn↓t

‖Ptnf − Ptf‖ = limtn↓t

‖Ptn−t(Ptf)− (Ptf)‖ = 0

by the semigroup property and (5) applied to Ptf . This shows that t 7→ Ptf ,considered as a function from [0,∞) into C(E), is continuous from the right.To prove also continuity from the left, we note that

(3.12) limtn↑t

‖Ptnf − Ptf‖ = limtn↑t

‖Ptn(f − Pt−tnf)‖ ≤ limtn↑t

‖f − Pt−tnf‖ = 0,

where we have the semigroup property, (5), and the fact that

(3.13)

‖Ptf‖ = supx∈E

∣∣∣∫

EPt(x,dy)f(y)

∣∣∣

≤ supx∈E

EPt(x,dy)

∣∣f(y)∣∣ ≤ sup

y∈E|f(y)| = ‖f‖.

3.3. Banach space calculus. Let (V, ‖ · ‖) be a Banach space, equippedwith the topology generated by the norm. We need to develop calculusfor V -valued functions. The next proposition defines the Riemann integralfor continuous V -valued functions. Since this is very similar to the usualRiemann integral, we skip the proof.

Proposition 3.8 (Riemann integral). Let u : [a, b] → V be continuous andlet

(3.14) a = t(n)0 ≤ s

(n)1 ≤ t

(n)t ≤ · · · ≤ t

(n)mn−1 ≤ s(n)mn

≤ t(n)mn= b

satisfy

(3.15) limn→∞

supt(n)k − t(n)k−1 : k = 1, . . . ,mn = 0.

Then the limit

(3.16)

∫ b

au(t) dt := lim

n→∞

mn∑

k=1

u(s(n)k )(t

(n)k − t

(n)k−1)

exists and does not depend on the choice of the t(n)k and s

(n)k .

Page 38: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

38 JAN SWART AND ANITA WINTER

If a < b ≤ ∞ and u : [a, b) → V is continuous then we define

(3.17)

∫ b

au(t) dt := lim

c↑b

∫ c

au(t) dt,

whenever the limit exists. In this case we say that u is integrable over [a, b).In case b < ∞ and u : [a, b] → V is continuous this coincides with our earlier

definition of∫ ba u(t) dt.

Lemma 3.9 (Infinite integrals). Let a < b ≤ ∞, let u : [a, b) → V be

continuous and∫ ba ‖u(t)‖dt < ∞. Then u is integrable over [a, b) and

(3.18)∥∥∥∫ b

au(t) dt

∥∥∥ ≤∫ b

a‖u(t)‖dt.

Proof. Since u is continuous and f 7→ ‖f‖ is continuous, the function t 7→‖u(t)‖ is continuous. First consider the case that b < ∞ and that u : [a, b] →V is continuous. Choose t

(n)k and s

(n)k as in (3.14) and (3.15). Then

(3.19)∥∥∥

mn∑

k=1

u(s(n)k )(t

(n)k − t

(n)k−1)

∥∥∥ ≤mn∑

k=1

‖u(s(n)k )‖(t(n)k − t(n)k−1).

Taking the limit n → ∞ we arrive at (3.18). If a < b ≤ ∞ and u : [a, b) → Vis continuous then it follows that for each a ≤ c ≤ c′ < b

(3.20)∥∥∥∫ c′

au(t) dt−

∫ c

au(t) dt

∥∥∥ ≤∫ c′

c‖u(t)‖dt.

If∫∞a ‖u(t)‖dt < ∞ and ci ↑ b then (3.20) implies that

( ∫ cia u(t) dt

)i≥1

is

a Cauchy sequence, and hence, by the completeness of V , u is integrableover [a,∞). Taking the limit in (3.18) we see that this estimate holds in themore general case as well.

Let I be an interval. We say that a function u : I → V is continuouslydifferentiable if for each t ∈ I the limit

(3.21) ∂∂tu(t) := lim

h→0h−1(u(t+ h)− u(t))

exists, and t 7→ ∂∂tu(t) is continuous on I. We skip the proof of the next

result.

Proposition 3.10 (Fundamental theorem of calculus). Assume that u :[a, b] → V is continuously differentiable. Then

(3.22)

∫ b

a

∂∂tu(t) dt = u(b)− u(a).

So far, when we talked about a linear operator A on a normed linear spaceN , we always meant a linear map A : N → N that is defined on all of N . Itwill be convenient to generalize this definition such that operators need nolonger be defined on the whole space.

Page 39: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 39

Definition 3.11 (Linear operators). A linear operator on a normed space(N, ‖ · ‖) is a pair (D(A), A) where D(A) ⊆ N is a linear subspace of N andA : D(A) → N is a linear map. The graph of such a linear operator is thelinear space

(3.23) G(A) := (f,Af) : f ∈ D(A) ⊆ N ×N.

We say that a linear operator is closed if its graph G(A) is a closed subspaceof N ×N , equipped with the product topology.

Note that a linear operator (including its domain!) is uniquely charac-terized by its graph. In fact, every linear subspace G ⊂ N × N with theproperty that

(3.24) (f, g) ∈ G, (f, g) ∈ G ⇒ g = g

is the graph of a linear operator (D(A), A).11 Note that the fact thatA is closed means that if fi ∈ D(A) are such that limi→∞ fi =: f andlimi→∞Afi =: g exist, then f ∈ D(A) and Af = g.

We recall a few facts from functional analysis.

Theorem 3.12 (Closed graph theorem). Let (N, ‖ · ‖) be a normed linearspace and let (D(A), A) be a linear operator on N with D(A) = N . Thenone has the relations (a)⇔(b)⇒(c) between the statements:

(a) A is continuous, i.e., ‖Afn −Af‖ → 0 whenever ‖fn − f‖ → 0.(b) A is bounded, i.e., there exists a constant K such that ‖Af‖ ≤ K‖f‖

for all f ∈ L.(c) A is closed.

If N is complete then all statements are equivalent.

To see that unbounded closed operators have nice properties, we provethe following fact, that will be useful later.

Lemma 3.13 (Closed operators and integrals). Let V be a Banach spaceand let (D(A), A) be a closed linear operator on V . Let a < b ≤ ∞, letu : [a, b) → V be continuous, u(t) ∈ D(A) for all t ∈ [a, b), t 7→ Au(t)

continuous,∫ ba ‖u(t)‖dt < ∞, and

∫ ba ‖Au(t)‖dt < ∞. Then

(3.25)

∫ b

au(t) dt ∈ D(A) and A

∫ b

au(t) dt =

∫ b

aAu(t) dt.

Proof. We first prove the statement for the case that u and Au are contin-

uous functions on a bounded time interval [a, b]. Choose t(n)k and s

(n)k as in

(3.14) and (3.15). Define

(3.26) fn :=

mn∑

k=1

u(s(n)k )(t

(n)k − t

(n)k−1).

11Sometimes the concept of a linear operator is generalized even further in the sensethat condition (3.24) is dropped. In this case, one talks about multi-valued operators.

Page 40: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

40 JAN SWART AND ANITA WINTER

Then fn ∈ D(A) and

(3.27) Afn =

mn∑

k=1

Au(s(n)k )(t

(n)k − t

(n)k−1).

By our assumptions,

(3.28) fn −→n→∞

∫ b

au(t) dt and Afn −→

n→∞

∫ b

aAu(t) dt.

Since A is closed, it follows that (3.25) holds. The statement for intervals ofthe form [a, b) follows by approximation with compact intervals, again usingthe fact that A is closed.

3.4. Semigroups and generators. Let (V, ‖ · ‖) be a Banach space. Bydefinition, a (linear) semigroup on V is a collection of everywhere definedlinear operators (St)t≥0 on V such that S0f = f for all f ∈ V and SsSt =Ss+t for all s, t ≥ 0. We say that (St)t≥0 is a contraction semigroup if St isa contraction for each t ≥ 0, i.e.,

(3.29) ‖Stf‖ ≤ ‖f‖ ∀f ∈ V.

We say that a semigroup (St)t≥0 is strongly continuous if

(3.30) limt→0

Stf = f ∀f ∈ V.

Example. A Feller semigroup on C(E), where E is compact and metrizableand C(E) is equipped with the supremumnorm, is a strongly continuouscontraction semigroup. (See Proposition 3.7 and (3.13).)

Remark. If (St)t≥0 is a strongly continuous contraction semigroup on a Ba-nach space V , then exactly the same proof as in (3.11)–(3.12) shows thatt 7→ Stf is a continuous map from [0,∞) into V for each f ∈ V .

Let (St)t≥0 be a strongly continuous contraction semigroup on a Ba-nach space V . By definition, the generator of (St)t≥0 is the linear operator(D(G), G), where

(3.31) D(G) :=f ∈ L : lim

t→0t−1(Stf − f) exists

,

and

(3.32) Gf := limt→0

t−1(Stf − f).

Exercise 3.14 (Generator of a deterministic process). Define a continuoustransition probability (Pt)t≥0 on [−1, 1] by

(3.33) Pt(x, · ) := δxe−t(·) (x ∈ [0, 1]).

Determine the generator (D(G), G) of the corresponding Feller semigroup(Pt)t≥0 on C[−1, 1].

Page 41: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 41

Proposition 3.15 (Generators). Let V be a Banach space, let (St)t≥0 bea strongly continuous contraction semigroup on V , and let (D(G), G) be itsgenerator. Then D(G) is dense in V and (D(G), G) is closed. For each f ∈D(G), the function t 7→ Stf from [0,∞) to V is continuously differentiable,Stf ∈ D(G) for all t ≥ 0, and

(3.34) ∂∂tStf = GStf = StGf (t ≥ 0).

Proof. For each h > 0 and f ∈ V we have

(3.35) h−1(St+h − St)f =h−1(Sh − S0)

Stf = St

h−1(Sh − S0)

f.

If f ∈ D(G) then limh↓0 h−1(Sh − S0)f = Gf . Since St is a contraction itis continuous, so the right-hand side of (3.35) converges to StGf . It followsthat the other expressions converge as well, so Stf ∈ D(G) for all t ≥ 0 and

(3.36) limh↓0

h−1(St+h − St)f = GStf = StGf (t ≥ 0).

If t > 0 and 0 < h ≤ t then

(3.37) h−1(Stf − St−h)f = St−h

h−1(Sh − S0)

f → StGf as h ↓ 0.

Here the convergence follows from the estimates

(3.38)

‖St−hh−1(Sh − S0)f − StGf‖≤ ‖St−hh−1(Sh − S0)f − St−hGf‖+ ‖St−hGf − StGf‖≤ ‖h−1(Sh − S0)f −Gf‖+ ‖St−hGf − StGf‖.

Formula (3.37) shows that the time derivatives of Stf from the left existand are equal to the derivatives from the right. It follows that t 7→ Stf iscontinuously differentiable and (3.34) holds.

To prove the other statements, we start by showing that for any f ∈ V ,

(3.39)

∫ t

0Ssf ds ∈ D(G) and G

∫ t

0Ssf ds = Stf − f.

We have already seen that for any f ∈ V the function t 7→ Stf is continuous,

so∫ t0 Ssf ds is well-defined. For each t ≥ 0, St is a contraction, hence

continuous, hence closed, so by Lemma 3.13(3.40)

h−1(Sh − S0)

∫ t

0Ssf ds = h−1

∫ t

0

(Ss+h − Ss

)f ds

= h−1∫ t+h

hSsf ds−

∫ t

0Ssf ds

= h−1

∫ t+h

tSsf ds− h−1

∫ h

0Ssf ds.

Letting h → 0 we arrive at (3.39).Since for each f ∈ V

(3.41) limt↓0

t−1

∫ t

0Ssf ds = f,

Page 42: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

42 JAN SWART AND ANITA WINTER

formula (3.39) shows that D(G) is dense in V . To show that (D(G), G) isclosed, choose fn ∈ D(G) such that limn→∞ fn =: f and limn→∞Gfn =: gexist. By (3.34) and the fundamental theorem of calculus,

(3.42) Stfn − fn =

∫ t

0SsGfn ds (t > 0).

Letting n → ∞, using the fact that ‖Ss(Gfn − fn)‖ ≤ ‖Gfn − fn‖ for eachs ≥ 0, we find that

(3.43) Stf − f =

∫ t

0Ssg ds (t > 0).

Dividing by t and letting t → 0 we conclude that f ∈ D(G) and Gf = g.

3.5. Dissipativity and the maximum principle. Let E be a compactmetrizable space. We say that a linear operator (D(A), A) on C(E) satisfiesthe positive maximum principle if

(3.44) Af(x) ≤ 0 whenever f(x) ≥ 0 and f(y) ≤ f(x) ∀y ∈ E.

This says that Af(x) ≤ 0 whenever f assumes a positive maximum over Ein x.

Proposition 3.16 (Generators of Feller semigroups). Let E be compact andmetrizable, let (Pt)t≥0 be a Feller semigroup on C(E), and let (D(G), G) beits generator. Then

(1) 1 ∈ D(G) and G1 = 0.(2) D(G) is dense in C(E).(3) (D(G), G) is closed.(4) (D(G), G) satisfies the positive maximum principle.

Proof. Property (1) follows from the fact that Pt1 = 1 for all t ≥ 0. Prop-erties (2)–(3) follow from Proposition 3.15. To prove (4), assume thatf ∈ D(G), x ∈ E, f(x) ≥ 0, and f(y) ≤ f(x) ∀y ∈ E. Then

(3.45) Ptf(x) =

∫Pt(x,dy)f(y) ≤ f(x) (t ≥ 0),

and therefore

(3.46) Gf(x) := limt→0

t−1(Ptf(x)− f(x)) ≤ 0.

(Note that the limit exists by our assumption that f ∈ D(G).)

Generators of strongly continuous contraction semigroups have an impor-tant property that we have not mentioned so far.

Definition 3.17. A linear operator (D(A), A) on a Banach space V is calleddissipative if ‖(λ−A)f‖ ≥ λ‖f‖ for every f ∈ D(A) and λ > 0.

Page 43: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 43

Note that an equivalent formulation of dissipativity is that

(3.47) ‖f‖ ≤ ‖(1 − εA)f‖ (f ∈ D(A), ε > 0).

This follows by setting ε = λ−1 in ‖(λ−A)f‖ ≥ λ‖f‖ and multiplying bothsides of the inequality by ε.

Lemma 3.18 (Contractions and dissipativity). If C is an (everywhere de-fined) contraction and r > 0 then r(C − 1) is dissipative.

Proof. If C is a contraction then for each ε > 0 and f ∈ V , one has ‖(1 −εr(C − 1))f‖ = ‖f − εrCf + εrf‖ ≥ (1 + rε)‖f‖ − rε‖Cf‖ ≥ ‖f‖.

Lemma 3.19 (Maximum principle and dissipativity). Let E be compactand metrizable and let (D(A), A) be a linear operator on C(E). If A satisfiesthe positive maximum principle, then A is dissipative.

Proof. Assume that f ∈ D(A). Since E is compact there exists an x ∈ Ewith |f(x)| ≥ |f(y)| for all y ∈ E. If f(x) ≥ 0, then by the positive maximumprinciple Af(x) ≤ 0 and therefore ‖(1− εA)f‖ ≥ |f(x)− εAf(x)| ≥ f(x) =‖f‖. If f(x) ≤ 0, then by the fact that A is linear also −f ∈ D(A) and‖(1− εA)f‖ = ‖(1− εA)(−f)‖ ≥ ‖ − f‖ = ‖f‖.

Exercise 3.20. Let (D(AWF), AWF) be the operator on C[0, 1] given by

(3.48)D(AWF) := C2[0, 1],

AWFf(x) :=12x(1− x) ∂2

∂x2 f(x), x ∈ [0, 1].

Show that AWF satisfies the positive maximum principle. For which valuesof c does the operator

(3.49) 12x(1− x) ∂2

∂x2 + c(12 − x) ∂∂x

satisfy the positive maximum principle?

Lemma 3.21 (Laplace equation and dissipativity). Let V be a Banachspace, let (St)t≥0 be a strongly continuous contraction semigroup on V , andlet (D(G), G) be its generator. Then G is dissipative. Moreover, for eachλ > 0 and f ∈ V , the Laplace equation

(3.50) p ∈ D(G) and (λ−G)p = f

has a unique solution, which is given by

(3.51) p =

∫ ∞

0Stfe

−λt dt.

Proof. For each λ > 0, define Uλ : V → V by

(3.52) Uλf :=

∫ ∞

0Stfe

−λt dt.

Since∫∞0 e−λt dt = λ−1 we have

(3.53) ‖Uλf‖ ≤ λ−1‖f‖ (λ > 0, f ∈ V ).

Page 44: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

44 JAN SWART AND ANITA WINTER

Since (compare (3.40))

(3.54)

h−1(Sh − S0)Uλf = h−1

∫ ∞

0(St+h − St)fe

−λt dt

= h−1(eλh − 1)

∫ ∞

0Stfe

−λtdt− h−1eλh∫ h

0Stfe

−λtdt,

letting h → 0 we find that Uλf ∈ D(G) and GUλf = λUλf − f , i.e.,

(3.55) (λ−G)Uλf = f (λ > 0, f ∈ V ).

If f ∈ D(G), then, using Lemma 3.13 and Proposition 3.15,(3.56)

UλGf =

∫ ∞

0StGfe−λt dt =

∫ ∞

0GStfe

−λt dt = G

∫ ∞

0Stfe

−λt dt = GUλf,

so that by (3.55) we also have

(3.57) Uλ(λ−G)f = f (λ > 0, f ∈ D(G)).

It follows that (λ − G) is a bijection from D(G) to V and that Uλ is itsinverse. Now (3.53) implies that

(3.58) ‖f‖ = ‖Uλ(λ−G)f‖ ≤ λ−1‖(λ−G)f‖ (λ > 0, f ∈ D(G)),

which shows that G is dissipative.

Lemma 3.22 (Cauchy equation). Let (D(A), A) be a dissipative linear op-erator on a Banach space V . Assume that f ∈ D(A), u : [0,∞) → V iscontinuously differentiable, u(t) ∈ D(A) for all t ≥ 0, and that u solves theCauchy equation

(3.59)

∂∂tu(t)=Au(t) (t ≥ 0),u(0)= f.

Then ‖u(t)‖ ≤ ‖f‖ for all t ≥ 0. In particular, by linearity, solutions to theCauchy equation (3.59) are unique.

Proof. Since A is dissipative, by (3.47), ‖f‖ ≤ ‖(1− tA)f‖ for all f ∈ D(A)and t > 0, so

(3.60)

‖u(t)‖ − ‖u(0)‖≤ ‖(1− tA)u(t)‖ − ‖u(t)− (u(t)− u(0))‖

= ‖u(t)− tAu(t)‖ −∥∥u(t)−

∫ t

0Au(s) ds

∥∥

≤∥∥∥tAu(t)−

∫ t

0Au(s) ds

∥∥∥ ≤∫ t

0‖Au(t)−Au(s)‖ds.

Page 45: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 45

Choose 0 = t(n)0 ≤ t

(n)1 ≤ · · · ≤ t

(n)mn = t such that limn→∞ supt(n)k − t

(n)k−1 :

k = 1, . . . ,mn = 0. Applying (3.60) to ‖u(t(n)k )‖ − ‖u(t(n)k−1)‖ we see that

(3.61)

‖u(t)‖ − ‖u(0)‖ =

mn∑

k=1

‖u(t(n)k )‖ − ‖u(t(n)k−1)‖

≤mn∑

k=1

∫ t(n)k

t(n)k−1

∥∥∥Au(t(n)k )−Au(s)∥∥∥ ds.

It is not hard to check that

(3.62) limn→∞

supk=1,...,mn

supt(n)k−1≤s<t

(n)k

‖Au(t(n)k )−Au(s)‖ = 0,

so the right-hand side of (3.61) tends to zero as n → ∞.

Lemma 3.22 has two useful corollaries.

Corollary 3.23 (Generator characterizes semigroup). Let V be a Banachspace. If two strongly continuous contraction semigroups on V have thesame generator, then they are equal.

Proof. Let (St)t≥0 and (St)t≥0 be strongly continuous contraction semi-groups on V with the same generator (D(G), G). By Proposition 3.15, for

each f ∈ D(G), the functions u(t) := Stf and u(t) := Stf solve the Cauchyequation

(3.63)

∂∂tu(t)=Gu(t) (t ≥ 0),u(0) = f.

By Lemmas 3.21 and 3.22, Stf = Stf for all t ≥ 0, f ∈ D(G). By Proposi-

tion 3.15, D(G) is dense in V so using the continuity of St and St we find

that Stf = Stf for all t ≥ 0, f ∈ V .

Remark. An alternative proof of Corollary 3.23 uses the fact that the Laplaceequation (3.50) has a unique solution.

Corollary 3.24 (Bounded generators). Let V be a Banach space and letA : V → V be a bounded dissipative linear operator. Then A generates astrongly continuous contraction semigroup (St)t≥0 on V , which is given by

(3.64) Stf = eAtf :=

∞∑

n=0

1

n!(At)nf (t ≥ 0).

Proof. Using the fact that A is bounded it is not hard to prove that theinfinite sequence in (3.64) converges, defines a strongly continuous semigroup(St)t≥0 on V , and that Stf solves the Cauchy equation

(3.65)

∂∂tu(t)=Au(t) (t ≥ 0),u(0)= f.

Page 46: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

46 JAN SWART AND ANITA WINTER

It follows from Lemma 3.22 that (St)t≥0 is a contraction semigroup.

Exercise 3.25. Let E be compact and metrizable, let K be a continuousprobability kernel on E, and r ≥ 0 a constant. Define an (everywhere de-fined) linear operator on C(E) by

(3.66) Gf := r(Kf − f) (f ∈ C(E)).

Show that G generates a Feller semigroup. How would you describe thecorresponding Markov process on E?

3.6. Hille-Yosida: different formulations. By definition, the range of alinear operator (A,D(A)) on a Banach space V is the space

(3.67) R(A) := Af : f ∈ D(A).Here is a version of the celebrated Hille-Yosida theorem:

Theorem 3.26 (Hille-Yosida). A linear operator (D(G), G) on a Banachspace V is the generator of a strongly continuous contraction semigroup ifand only if

(1) D(G) is dense.(2) G is dissipative.(3) There exists a λ > 0 such that R(λ−G) = V .

Note that condition (3) says that there exists a λ > 0 such that for eachf ∈ V , there exists a solution p ∈ D(G) to the Laplace equation (λ−G)p = f .Thus, the necessity of the conditions (1)–(3) follows from Proposition 3.15and Lemma 3.21.

Before we turn to the proof of Theorem 3.26, we first discuss some of itsmerits, drawbacks, and consequences. The Hille-Yosida theorem is actuallyseldomly applied in the form in which we have stated it above. The reason isthat in most cases of interest, the domain D(G) of the generator of the semi-group that one is interested in is not known explicitly. Rather, one knowsthe action of G on certain well-behaved elements of the Banach space (forexample sufficiently differentiable functions) and wishes to extend this ac-tion to a generator of a strongly continuous semigroup. Since generators arealways closed (recall Proposition 3.15) one is naturally led to the followingdefinition.

Definition 3.27. Let (D(A), A) be a linear operator on a Banach space Vand let G = G(A) := (f,Af) : f ∈ D(A) be its graph. Let G denote theclosure of G in V × V , equipped with the product topology. If G is itself thegraph of a linear operator (D(A), A) then we say that (D(A), A) is closableand we call (D(A), A) the closure of (D(A), A).

Here is the form of the Hille-Yosida theorem that it is usually applied in:

Page 47: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 47

Theorem 3.28 (Hille-Yosida, second version). A linear operator (D(A), A)on a Banach space V is closable and its closure generates a strongly contin-uous contraction semigroup if and only if

(1) D(A) is dense.(2) A is dissipative.(3) There exists a λ > 0 such that R(λ−A) is dense in V .

Since we are mainly interested in Feller semigroups, we will usually needthe following version of Theorem 3.28:

Theorem 3.29 (Hille-Yosida for Feller semigroups). Let E be compact andmetrizable. A linear operator (D(A), A) on C(E) is closable and its closureA generates a Feller semigroup if and only if

(1) There exist fn ∈ D(A) such that fn → 1 and Afn → 0.(2) D(A) is dense.(3) A satisfies the positive maximum principle.(4) There exists a λ > 0 such that R(λ−A) is dense in C(E).

In (1), the convergence is in C(E), i.e., ‖fn − 1‖ → 0 and ‖Afn‖ → 0. Itsuffices, of course, if 1 ∈ D(A) and A1 = 0.

We have already seen how to check the positive maximum principle in anexplicit set-up. To check that a subset of C(E) is dense, the next theoremis often useful. For a proof, see [RS80, Theorem IV.9].

Theorem 3.30 (Stone-Weierstrass). Let E be compact and metrizable. As-sume that D ⊂ C(E) separates points and

(1) 1 ∈ D.(2) f1f2 ∈ D for all f1, f2 ∈ D.(3) λ1f1 + λ2f2 ∈ D for all f1, f2 ∈ D and λ1, λ2 ∈ R.

Then D is dense in C(E).

In view of Theorem 3.30, the Conditions (1)–(3) from Theorem 3.29 areusually easy to check. The hard condition is usually condition (4), whichsays that there exists a dense set D ⊂ C(E) and a λ > 0 such that for eachf ∈ D, there exists a solution p ∈ D(A) to the Laplace equation (λ−A)p = f .It actually suffices to find solutions to a Cauchy equation. This is not easierbut perhaps a bit more intuitive:

Lemma 3.31 (Cauchy and Laplace equations). Let (A,D(A)) be a denselydefined dissipative linear operator on a Banach space V , f ∈ V , and assumethat u : [0,∞) → V is continuously differentiable, u(t) ∈ D(A) for all t ≥ 0,and u solves the Cauchy equation

(3.68)

∂∂tu(t)=Au(t) (t ≥ 0),u(0)= f.

Then A is closable and p :=∫∞0 u(t)e−λtdt satisfies the Laplace equation

(3.69) p ∈ D(A) and (λ−A)p = f.

Page 48: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

48 JAN SWART AND ANITA WINTER

Proof. By Lemma 3.22, ‖u(t)‖ ≤ ‖f‖ for all t ≥ 0 so∫∞0 ‖u(t)e−λt‖dt < ∞.

By Lemma 3.36 below, (A,D(A)) is closable. By Proposition 3.13, p ∈ D(A)and

(3.70)Ap =

∫ ∞

0Au(t)e−λtdt =

∫ ∞

0( ∂∂tu(t))e

−λtdt

=∣∣∣∞

t=0u(t)e−λt −

∫ ∞

0u(t)( ∂

∂te−λt)dt = −f + λp,

which shows that (λ−A)p = f .

Exercise 3.32. Show that the closure of the operator AWF from Exer-cise 3.20 generates a Feller semigroup on C[0, 1]. Hint: use the space ofall polynomials on [0, 1].

3.7. Dissipative operators. Before we embark on the proofs of the variousversions of the Hille-Yosida theorem we study dissipative operators in moredetail. In doing so, it will be convenient to use the formalism of multi-valuedoperators. By definition, a multi-valued (linear) operator on a Banach spaceV is a linear subspace

(3.71) G ⊆ V × V.

We say that G is single-valued if G satisfies (3.24). In this case, G is thegraph of some linear operator (D(A), A) on V . We call

(3.72)D(G) := f : ∃g ∈ V s.t. (f, g) ∈ G,R(G) := g : ∃f ∈ V s.t. (f, g) ∈ G

the domain and range of G. We say that G is bounded if there exists aconstant K such that

(3.73) ‖g‖ ≤ K‖f‖ ∀(f, g) ∈ G.We say that G is a contraction if ‖g‖ ≤ ‖f‖ for all (f, g) ∈ G. Note thatif G is single-valued and G is the graph of (D(A), A), then these definitionscoincide with the corresponding definitions for A.

Lemma 3.33 (Bounded operators). Let V be a Banach space and let G be abounded (possibly multivalued) linear operator on G. Then G is single-valued.

Moreover, D(G) = D(G) and G is closed if and only if D(G) is closed.

Proof. Assume that (f, g), (f, g) ∈ G. Then by linearity (0, g − g) ∈ G, andby boundedness ‖g − g‖ ≤ K‖0‖ = 0, hence g = g. It follows that G is thegraph of a bounded linear operator (D(A), A) on V .

One has

(3.74)D(A)=

f ∈ V : ∃fn ∈ D(A) s.t. fn → f

,

D(A)=f ∈ V : ∃fn ∈ D(A), g ∈ V s.t. fn → f, Afn → g

.

Therefore, the inclusion D(A) ⊇ D(A) is obvious. Conversely, assume thatfn ∈ D(G), fn → f for some f ∈ V . Then ‖Afn − Afm‖ ≤ K‖fn − fm‖,

Page 49: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 49

which shows that the Afn form a Cauchy sequence. Therefore Afn → g forsome g ∈ V , which shows that D(A) ⊆ D(A).

If (D(A), A) is closed then by what we have just proved D(A) = D(A) =D(A), so D(A) is closed. Conversely, if D(A) is closed and fn ∈ D(G),fn → f , Afn → g, then f ∈ D(A) by the fact that D(A) is closed and‖A(fn − f)‖ ≤ K‖fn − f‖ → 0 by the boundedness of (D(A), A), whichshows that g = limn→∞Afn = Af , and therefore (f, g) ∈ G(A). This showsthat (D(A), A) is closed.

Let G ⊆ V ×V again be a multivalued operator and let λ1, λ2 be constants.We define

(3.75) λ1 + λ2G := (f, λ1f + λ2g) : (f, g) ∈ G.Note that if G is the graph of a single-valued operator (D(A), A), thenλ1 + λ2G is the graph of (D(A), λ1 + λ2A). We define

(3.76) G−1 := (g, f) : (f, g) ∈ G.If G is the graph of a single-valued operator (D(A), A) and A is a bijectionfrom D(A) to R(A), then G−1 is the graph of (R(A), A−1). Extending ourearlier definition (see (3.47)), we say that G is dissipative if

(3.77) ‖f‖ ≤ ‖f − εg‖ ∀(f, g) ∈ G, ε > 0.

Lemma 3.34 (Closures). Let V be a Banach space and let G ⊆ V × V be amultivalued linear operator on V . Then

(i) λ1 + λ2G = λ1 + λ2G for all λ1, λ2 ∈ R, λ2 6= 0.

(ii) G−1 = G−1.

(iii) If G is dissipative then G is dissipative.

Proof. Since λ2 6= 0,

(3.78)

λ1 + λ2G=(f, h) : ∃(fn, λ1f + λ2gn) ∈ (λ1 + λ2G),

fn → f, λ1f + λ2gn → h

=(f, λ1 + λ2g) : ∃(fn, gn) ∈ G,

fn → f, gn → g= λ1 + λ2G

.

The proof of (ii) is similar but easier. To prove (iii), note that if (f, g) ∈ G,then there exist (fn, gn) ∈ G such that fn → f and gn → g, and therefore, bythe dissipativity of G, ‖f‖ = limn→∞ ‖fn‖ ≤ limn→∞ ‖fn−εgn‖ = ‖f −εg‖.

Lemma 3.35 (Dissipativity and range). Let G be dissipative and ε > 0.

Then (1− εG)−1 is a contraction. Moreover, R(1− εG) = R(1− εG) and Gis closed if and only if R(1− εG) is closed.

Proof. If G is dissipative then ‖f‖ ≤ ‖f − εg‖ for all (f, g) ∈ G. Thismeans that ‖h‖ ≤ ‖f‖ for all (h, f) ∈ (1 − εG)−1. This shows that (1 −εG)−1 is a contraction. Therefore, by Lemmas 3.33 and 3.34, R(1− εG) =

Page 50: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

50 JAN SWART AND ANITA WINTER

D((1− εG)−1) = D((1− εG)−1) = D((1 − εG)−1) = R(1 − εG), and G isclosed ⇔ (1 − εG)−1 is closed ⇔ D((1 − εG)−1) is closed ⇔ R(1 − εG) isclosed.

Lemma 3.36 (Dissipativity and closability). Let (D(A), A) be dissipativeand assume that D(A) is dense in V . Then (D(A), A) is closable.

Proof. Let G be the graph of (D(A), A). By Lemma 3.34, G is dissipative,while obviously D(G) is dense in V . We need to show that G is single-valued.By linearity, it suffices to show that (0, g) ∈ G implies g = 0. So imaginethat (0, g) ∈ G. Since D(G) is dense in V there exist (gn, hn) ∈ G such thatgn → g. Since G is dissipative, ‖0+ εgn‖ ≤ ‖(0+ εgn)− ε(g+ εhn)‖ for eachε > 0. It follows that ‖gn‖ ≤ ‖gn − g − εhn‖ for each ε > 0. Letting ε → 0and then n → ∞ we find that ‖g‖ = limn→∞ ‖gn‖ ≤ limn→∞ ‖gn − g‖ = 0.

3.8. Resolvents.

Definition 3.37 (Resolvents). By definition, the resolvent set of a closedlinear operator (D(A), A) on a Banach space V is the set

(3.79)ρ(A) :=

λ ∈ R : (λ−A) : D(A) → V is a bijection,

(λ−A)−1 is a bounded operator.

If λ ∈ ρ(A) then the bounded operator (λ − A)−1 : V → D(A) is called theresolvent of A (at λ).

Note that λ ∈ ρ(A) implies that λ is not an eigenvalue of A. For imaginethat Ap = λp for some p ∈ D(A). Then p = (λ − A)−1(λ − A)p = (λ −A)−10 = 0. Note furthermore that the generator (D(G), G) of a stronglycontinuous contraction semigroup (St)t≥0 never has eigenvalues λ > 0. Forif Gf = λf with f ≥ 0 then u(t) := feλt solves the Cauchy equation∂∂tu(t) = Gu(t) and therefore Stf = eλtf , contradicting contractiveness.

Exercise 3.38. Show that if (D(A), A) is not closed then the set ρ(A) in(3.79) is always empty.

Lemma 3.39 (Resolvent set is open). Let A be a closed linear operator ona Banach space V . Then the resolvent set ρ(A) is an open subset of R.

Proof. Assume that λ ∈ ρ(A). Then (λ − A)−1 is a bounded operator, sothere exists a K such that ‖(λ − A)−1f‖ ≤ K‖f‖ for all f ∈ V . Now let|λ′ − λ| < K−1. Then the infinite sum

(3.80) Sf :=

∞∑

n=0

(λ− λ′)n(λ−A)−(n+1)f (f ∈ V )

Page 51: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 51

defines a bounded operator S : V → D(A), and(3.81)

(λ′ −A)Sf =((λ−A)− (λ− λ′)

) ∞∑

n=0

(λ− λ′)n(λ−A)−n+1f

=

∞∑

n=0

(λ− λ′)n(λ−A)−nf −∞∑

n=0

(λ− λ′)n+1(λ−A)−n+1f = f

for each f ∈ V . In the same way we see that S(λ′ − A)f = f for eachf ∈ D(A) so (λ′−A) : D(A) → V is a bijection and its inverse S = (λ′−A)−1

is a bounded operator.

Exercise 3.40. Let A be a closed linear operator on a Banach space V andλ, λ′ ∈ ρ(A), λ 6= λ′. Prove the resolvent identity(3.82)

(λ−A)−1(λ′ −A)−1 =(λ−A)−1 − (λ′ −A)−1

λ′ − λ= (λ′ −A)−1(λ−A)−1.

According to [EK86, page 11]: ‘Since (λ−A)(λ′ −A) = (λ′ −A)(λ−A) forall λ, λ′ ∈ ρ(A), we have (λ′ − A)−1(λ−A)−1 = (λ−A)−1(λ′ −A)−1’. Doyou agree with this argument?

Lemma 3.41 (Resolvent set of dissipative operator). Let ρ(A) be the re-solvent set of a closed dissipative operator (D(A), A) on a Banach space Vand let ρ+(A) := ρ(A) ∩ (0,∞). Then

(3.83) ρ+(A) = λ > 0 : R(λ−A) = V and either ρ+(A) = ∅ or ρ+(A) = (0,∞).

Proof. If A is a dissipative operator and λ > 0 then by Lemma 3.35 (λ −A)−1 = λ(1 − λ−1A) is a bounded operator (and therefore single-valued byLemma 3.33). This proves (3.83). To see that ρ+(A) is either ∅ of (0,∞),by Lemma 3.39, it suffices to show that ρ+(A) ⊂ (0,∞) is closed. Chooseλn ∈ ρ+(A), λn → λ ∈ (0,∞). We need to show that R(λ− A) = V . SinceA is closed, by Lemma 3.35, it suffices to show that R(λ−A) is dense in V .Choose g ∈ V and define gn := (λ − A)(λn − A)−1g. Then gn ∈ R(λ − A)and, since A is dissipative,

(3.84)‖g − gn‖ = ‖

((λn −A)− (λ−A)

)(λn −A)−1g‖

= |λn − λ|‖(λn −A)−1g‖ ≤ |λn − λ|λ−1n ‖g‖ −→

n→∞0.

3.9. Hille-Yosida: proofs. Let (V, ‖ ·‖) be a Banach space. By definition,the operator norm of an everywhere defined bounded linear operator A :V → V is

(3.85) ‖A‖ := infK > 0 : ‖Af‖ ≤ K‖f‖ ∀f ∈ V .

Page 52: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

52 JAN SWART AND ANITA WINTER

We say that a collection A of (everywhere defined) bounded linear operatorson V is uniformly bounded if sup‖A‖ : A ∈ A < ∞. The following fact iswell-known, see for example [RS80, Theorem III.9]

Proposition 3.42 (Principle of uniform boundedness). Let (V, ‖ · ‖) be aBanach space and let A be a collection of bounded linear operators A : V →V . Assume that sup‖Af‖ : f ∈ V < ∞ for each f ∈ V . Then A isuniformly bounded.

Lemma 3.43 (Order of limits). Let C,Cn be bounded linear operators on aBanach space V . Assume that limn→∞Cnf = Cf for all f ∈ V . Then theCn are uniformly bounded and(3.86)

limn→∞

limm→∞

Cnfm = limm→∞

limn→∞

Cnfm = limn→∞

Cnfn = Cf ∀fn → f.

Proof. By the continuity of the Cn,

(3.87) limn→∞

limm→∞

Cnfm = limn→∞

Cnf = Cf.

By the continuity of C,

(3.88) limm→∞

limn→∞

Cnfm = limm→∞

Cfm = Cf,

Since limn→∞ ‖Cnf‖ = ‖Cf‖ one has supn ‖Cnf‖ < ∞ for each f ∈ V , soby the principle of uniform boundedness the Cn are uniformly bounded.Set K := supn ‖Cn‖. Then

(3.89)‖Cnfn −Cf‖≤‖Cnfn − Cnf‖+ ‖Cnf − Cf‖

≤K‖fn − f‖+ ‖Cnf − Cf‖,which shows that limn→∞Cnfn = Cf .

Exercise 3.44. Let V be a Banach space, let Cn be uniformly boundedlinear operators on V . Let f, fm ∈ V , fm → f . Assume that the limitlimn→∞Cnfm =: gm exists for all m. Show that the limit limm→∞ gm existsand

(3.90) limn→∞

Cnf = limm→∞

gm.

Corollary 3.45. Let V be a Banach space, let D ⊂ V be dense and let Cn

be (everywhere defined) uniformly bounded linear operators on V . Assumethat limn→∞Cnf exists for all f ∈ D. Then there exists a bounded linearoperator C on V such that Cnf → Cf for all f ∈ V .

Proof. By Exercise 3.44, the set f ∈ V : limn→∞Cnf exists is closed,so by our assumptions the limit limn→∞Cnf exists for all f ∈ V . DefineCf := limn→∞Cnf . It is easy to see that C is linear and bounded.

Page 53: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 53

Proof of Theorem 3.26. By Proposition 3.15 and Lemma 3.21, the condi-tions (1)–(3) are necessary. Conversely, if (1)–(3) hold, then by Lemma 3.41,(1− εG)−1 : V → D(G) is a bounded operator for each ε > 0. By definition,the Yosida approximation of G (at ε > 0) is the everywhere defined bounded(by Lemma 3.35) linear operator

(3.91) Gεf := ε−1((1− εG)−1 − 1

)f (f ∈ V ).

One has

(3.92) limε→0

Gεf = Gf (f ∈ D(G)).

To see this, recall that (1 − εG)−1(1 − εG)f = f for all f ∈ D(G) so that(1− εG)−1f − f = ε(1− εG)−1Gf for all f ∈ D(G), and therefore by (3.91)

(3.93) Gεf = (1− εG)−1Gf (f ∈ D(G)).

In order to prove (3.92), by (3.93) it suffices to show that

(3.94) limε→0

(1− εG)−1f = f (f ∈ V ).

By (3.91), (3.93), and the fact that (1− εG)−1 is a contraction

(3.95) ‖(1− εG)−1f − f‖ = ε‖Gεf‖ = ε‖(1 − εG)−1Gf‖ ≤ ε‖Gf‖(f ∈ D(G)). This proves (3.94) for f ∈ D(G). By Corollary 3.45 and thefact that D(G) is dense, we conclude that (3.94) holds for each f ∈ V .

By Lemma 3.35, (1−εG)−1 is a contraction, so by (3.91) and Lemma 3.18,Gε is dissipative. Therefore, by Corollary 3.24, Gε generates a stronglycontinuous contraction semigroup (Sε

t )t≥0 = (eGεt)t≥0 on V . We will showthat the limit

(3.96) Stf := limε→0

Sεt f

exists for all t ≥ 0 and f ∈ V and defines a strongly continuous contractionsemigroup (St)t≥0 with generator G.

It follows from Exercise 3.40 that GεGε′ = Gε′Gε for all ε, ε′ > 0. Conse-quently also Gε and eGε′ t commute, so

(3.97)

‖eGεtf − eGε′ tf‖=

∥∥∥∫ t

0

∂∂s

(eGεseGε′ (t−s)

)fds

∥∥∥

≤∫ t

0

∥∥( ∂∂se

Gεs)eGε′ (t−s)f + eGεs( ∂∂se

Gε′ (t−s))f∥∥ds

=

∫ t

0

∥∥eGεseGε′ (t−s)(Gε −Gε′)f∥∥ds

≤∫ t

0‖(Gε −Gε′)f‖ds = t‖(Gε −Gε′)f‖.

Note that we have used commutativity in the last equality. It follows from(3.92) and (3.97) that for each f ∈ D(G), t ≥ 0, and εn → 0, (eGεn tf)n≥0 isa Cauchy sequence, and therefore the limit in (3.96) exists for all f ∈ D(G)

Page 54: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

54 JAN SWART AND ANITA WINTER

and t ≥ 0. By Corollary 3.45 and the fact that the Sεt are contractions, the

limit in (3.96) exists for all f ∈ V . With a bit more effort it is possible tosee that the limit is locally uniform in t, i.e.,

(3.98) limε→0

sup0≤s≤T

‖Sεt f − Stf‖ = 0 ∀T > 0, f ∈ V.

It remains to show that the operators (St)t≥0 defined in (3.96) form astrongly continuous contraction semigroup with generator G. It is easyto see that they are contractions. For the semigroup property, we note thatby Lemma 3.43

(3.99) StSsf = limε→0

SεtS

εsf = lim

ε→0Sεt+sf = St+sf (f ∈ V ).

To see that (St)t≥0 is strongly continuous, we note that

(3.100) limt→0

‖Stf − f‖ = limt→0

limε→0

‖Sεt f − f‖ = lim

ε→0limt→0

‖Sεt f − f‖ = 0,

where the interchanging of limits is allowed by (3.98). In order to prove thatG is the generator of (St)t≥0 it suffices to show that limt→0 t

−1(Stf−f) = Gf

for all f ∈ D(G). For if this is the case, then the generator (D(G), G) of(St)t≥0 is an extension of the operator (D(G), G). Since both (λ − G) :

D(G) → R(λ−G) = V and (λ− G) : D(G) → V are bijections, this is only

possible if (D(G), G) = (D(G), G).In order to show that limt→0 t

−1(Stf−f) = Gf for all f ∈ D(G) it sufficesto show that

(3.101) Stf − f =

∫ t

0SsGfds (f ∈ D(G)).

By Proposition 3.15,

(3.102) Sεt f − f =

∫ t

0SεsGεfds (f ∈ V ).

Using (3.98) and (a simple extension of) Lemma 3.43,

(3.103) limε→0

sup0≤s≤t

‖SεsGεf − SsGf‖ = 0 (f ∈ D(G)).

Inserting this into (3.102) we arrive at (3.101).

Proof of Theorem 3.28. By Theorem 3.26, conditions (1) and (2) are obvi-

ously necessary. (Note that in general D(A) ⊆ D(A) so if D(A) is not densethen D(A) is not dense.) By Lemma 3.35, condition (3) is also necessary.

Conversely, if (D(A), A) satisfies (1)–(3) then by Lemma 3.36, A is clos-able, while by Lemma 3.35, R(λ − A) = V , so that by Lemma 3.41, Asatisfies the conditions of Theorem 3.26.

Proof of Theorem 3.29. By Proposition 3.16 and Theorem 3.28, the condi-tions (1)–(4) are necessary. We have seen in Lemma 3.19 that the positivemaximum principle implies that A is dissipative, so if A satisfies (1)–(4) thenby Theorem 3.28 A generates a strongly continuous contraction semigroup

Page 55: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 55

on C(E). If 1 ∈ D(A) and A1 = 0 then ut := 1 solves the Cauchy equation∂∂tut = Aut so by Proposition 3.15 and Lemma 3.22 Pt1 = 1 for all t ≥ 0.To finish the proof, we must show that Ptf ≥ 0 for all f ≥ 0. This

would be easy using the Cauchy equation if we would know that A satisfiesthe positive maximum principle; unfortunately it is not straightforward toshow the latter. Therefore we use a different approach. We know thatR(1− εA) is dense for all ε > 0 and that (1− εA)−1 : R(1− εA) → D(A) isa bounded operator. We claim that (1− εA)−1 maps nonnegative functionsinto nonnegative functions. Indeed, if f ∈ D(A) does not satisfy f ≥ 0, thenf assumes a negative minimum over E in some point x, and therefore bythe positive maximum principle applied to −f ,

(3.104) (1− εA)f(x) ≤ f(x) < 0,

which shows that not (1 − εA)f ≥ 0. Thus (1 − εA)f ≥ 0 implies f ≥ 0,i.e., (1− εA)−1 maps nonnegative functions into nonnegative functions. Byapproximation it follows that also (1 − εA)−1 maps nonnegative functionsinto nonnegative functions.12 Let Aε = ε−1((1 − εA)−1 − 1) be the Yosidaapproximation of A. Then f ≥ 0 implies

(3.105) eAεtf = e−εt

eε−1(1− εA)−1tf = e−εt

∞∑

n=0

ε−n

n!(1− εA)−nf ≥ 0.

Letting ε → 0 we conclude that Ptf ≥ 0.

12Since the set P := f ∈ C(E) : f ≥ 0 is the closure of its interior and R(1− εA) isdense in C(E), it follows that R(1− εA) ∩ P is dense in P .

Page 56: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

56 JAN SWART AND ANITA WINTER

4. Feller processes

4.1. Markov processes. Let E be a Polish space. In Proposition 2.12we have seen that for a given initial law L(X0) and transition function (or,equivalently, Markov semigroup) (Pt)t≥0 on E, there exists a Markov processX, which is unique in finite dimensional distributions. We are not satisfiedwith this result, however, since do not know in general if X has a versionwith cadlag sample paths. This motivates us to change our definition of aMarkov process. From now on, we work with the following definition.

Definition 4.1 (Markov process). By definition, a Markov process withtransition function (Pt)t≥0, is a collection (Px)x∈E of probability laws on(DE [0,∞),B(DE [0,∞))) such that under the law Px the stochastic processX = (Xt)t≥0 given by the coordinate projections

(4.1) Xt(w) := ξt(w) = w(t) (w ∈ DE[0,∞), t ≥ 0),

satisfies the equivalent conditions (a)–(c) from Proposition 2.11 and one hasPxX0 = x = 1.

Sometimes we denote a Markov process by a pair (X, (Px)x∈E), since we want to indicate which symbol we use for the coordinate projections. Note that a Markov process (Px)x∈E is uniquely determined by its transition function (Pt)t≥0. We do not know if to each transition function (Pt)t≥0 there exists a corresponding Markov process (Px)x∈E. The problem is to show the existence of cadlag sample paths. Indeed, by Proposition 2.12, there exists for each x ∈ E a stochastic process X^x = (X^x_t)_{t≥0} such that X^x_0 = x and X^x satisfies the equivalent conditions (a)–(c) from Proposition 2.11. If for each x ∈ E we can find a version of X^x with cadlag sample paths, then the laws Px := L(X^x), considered as probability measures on DE[0,∞), form a Markov process in the sense of Definition 4.1.

We postpone the proof of the next theorem till later.

Theorem 4.2 (Feller processes). Let E be compact and metrizable and let(Pt)t≥0 be a Feller semigroup on C(E). Then there exists a Markov process(Px)x∈E with transition function (Pt)t≥0.

Thus, each Feller semigroup (Pt)t≥0 defines a unique Markov process(Px)x∈E . We call this the Feller process with Feller semigroup (Pt)t≥0. If(D(G), G) is the generator of (Pt)t≥0, then we also say that (Px)x∈E is theFeller process with generator G.

We develop some notation and terminology for general Markov processesin Polish spaces.

Lemma 4.3 (Measurability). Let E be Polish and let (Px)x∈E be a Markovprocess with transition function (Pt)t≥0. Then (x,A) 7→ Px(A) is a proba-bility kernel from E to DE [0,∞).

Proof. By definition, Px is a probability measure on DE[0,∞) for each fixed x ∈ E. Formula (4.1) shows that for fixed A of the form A = {w ∈ DE[0,∞) : w_{t1} ∈ A1, . . . , w_{tn} ∈ An} with A1, . . . , An ∈ B(E), the function x ↦ Px(A) is measurable. Since

(4.2) D := {A ∈ B(DE[0,∞)) : x ↦ Px(A) is measurable}

is a Dynkin system and since the coordinate projections generate the Borel σ-field on DE[0,∞), the same is true for all A ∈ B(DE[0,∞)).

If (Px)x∈E is a Markov process with transition function (Pt)t≥0 and µ isa probability measure on E, then using Lemma 4.3 we define a probabilitymeasure Pµ on (DE [0,∞),B(DE [0,∞))) by

(4.3) Pµ(A) := ∫_E µ(dx) Px(A) (A ∈ B(DE[0,∞))).

Under Pµ, the stochastic process X = (Xt)t≥0 given by the coordinate projections satisfies the equivalent conditions (a)–(c) from Proposition 2.11 and Pµ{X0 ∈ ·} = µ. We call (X, Pµ) the Markov process with transition function (Pt)t≥0 started in the initial law µ. We let Ex, Eµ denote expectation with respect to Px, Pµ, respectively.

Recall that two stochastic processes with the same finite dimensional distributions are called versions of each other. Thus, if (X, (Px)x∈E) is a Markov process with transition function (Pt)t≥0, then a stochastic process X′, defined on any probability space, which has the same finite dimensional distributions as X under the law Pµ, is called a version of the Markov process with semigroup (Pt)t≥0 and initial law µ. This is equivalent to the statement that X′ satisfies the equivalent conditions (a)–(c) from Proposition 2.11 and L(X′_0) = µ. If X′ moreover has cadlag sample paths, then this is equivalent to L(X′) = Pµ, where we view X′ as a random variable with values in DE[0,∞). We are usually only interested in versions with cadlag sample paths.

4.2. Jump processes. Jump processes are the simplest Markov processes.We have already met them in Exercise 3.66.

Proposition 4.4 (Jump processes). Let E be Polish, let K be a probabilitykernel on E, and let r ≥ 0 be a constant. Define G : B(E) → B(E) by

(4.4) Gf := r(Kf − f) (f ∈ B(E)),

and put

(4.5) Ptf := e^{Gt} f := ∑_{n=0}^∞ (1/n!) (Gt)^n f (f ∈ B(E), t ≥ 0).

Then (Pt)t≥0 is a Markov semigroup and there exists a Markov process(Px)x∈E corresponding to (Pt)t≥0. If E is compact and K is a continuousprobability kernel then (Px)x∈E is a Feller process.


Note that the infinite sum in (4.5) converges uniformly since ‖r(Kf −f)‖ ≤ 2r‖f‖ for each f ∈ B(E). Before we prove Proposition 4.4 we firstlook at a special case.

Example: (Poisson process with rate r). Let E := N, K(x, y) := 1_{y=x+1}, and r > 0. Hence

(4.6) Gf(x) = r(f(x+1) − f(x)) (f ∈ B(N)).

Then the Markov semigroup in (4.5) is given by

Ptf(x) = e^{Gt} f(x) = e^{rt(K−1)} f(x) = e^{−rt} e^{rtK} f(x)
= e^{−rt} ∑_{n=0}^∞ ((rt)^n/n!) K^n f(x) = e^{−rt} ∑_{n=0}^∞ ((rt)^n/n!) f(x+n),

hence

(4.7) Pt(x, x+n) = 1_{n≥0} e^{−rt} (rt)^n / n!.

We call the associated Markov process (N, (Px)x∈N) the Poisson process withintensity r. By condition (a) from Proposition 2.11,

(4.8) Px(Nt − Ns = n | F^N_s) = 1_{n≥0} e^{−r(t−s)} (r(t−s))^n / n!.

This says that Nt − Ns is Poisson distributed with mean r(t − s). Sincethe right-hand side of (4.8) does not depend on Ns, the random variableNt −Ns is independent of (Nu)u≤s. It follows that if (Nt)t≥0 is a version ofthe Poisson process started in any initial law, then for any 0 ≤ t1 ≤ · · · ≤ tn,the random variables

Nt1 −N0, . . . , Ntn −Ntn−1

are independent and Poisson distributed with means r(t1 − 0), . . . , r(tn − tn−1). Recall that if P, Q are independent Poisson distributed random variables with means p and q, then P + Q is Poisson distributed with mean p + q.

Poisson processes describe the statistics of rare events. For each n ≥ 1, let

(X^(n)_i)_{i∈N} be a Markov chain in N with X^(n)_0 = 0 and transition probabilities

P(X^(n)_{i+1} = y | X^(n)_0, . . . , X^(n)_i) = p^(n)(X^(n)_i, y),

where

p^(n)(x, y) := 1 − 1/n if y = x, 1/n if y = x+1, and 0 otherwise.

Fix r > 0 and define processes (N^(n)_t)_{t≥0} by

N^(n)_t := X^(n)_{⌊nrt⌋} (t ≥ 0).

Then

P{N^(n)_t = k} = (m choose k) (1/n)^k (1 − 1/n)^{m−k}, where m := ⌊nrt⌋.


It follows that N^(n)_t converges as n → ∞ to a Poisson distributed random variable with mean rt. With a bit more work one can see that the stochastic process N^(n) converges in finite dimensional distributions to a Poisson process with intensity r, started in N0 = 0.

The next exercise shows how to construct versions of the Poisson process with cadlag sample paths.

Exercise 4.5. Let (σk)_{k≥1} be independent exponentially distributed random variables with mean r^{-1} and set τn := ∑_{k=1}^n σk. Show that

(4.9) Nt := max{n ≥ 0 : τn ≤ t} (t ≥ 0)

defines a version N = (Nt)t≥0 of the Poisson process with rate r started in N0 = 0.
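For readers who want to experiment numerically, here is a minimal sketch (in Python; not part of the original notes) of the construction in Exercise 4.5: the path t ↦ Nt is read off from independent Exp(r) waiting times. The rate r and the time horizon are arbitrary illustrative choices.

    import numpy as np

    def poisson_jump_times(r, t_max, rng=np.random.default_rng(0)):
        """Sample the jump times tau_1 < tau_2 < ... <= t_max of a rate-r Poisson process."""
        jump_times, t = [], 0.0
        while True:
            t += rng.exponential(1.0 / r)      # sigma_k with mean 1/r
            if t > t_max:
                return np.array(jump_times)
            jump_times.append(t)

    def N(t, jump_times):
        """N_t = max{n >= 0 : tau_n <= t}, as in (4.9)."""
        return int(np.searchsorted(jump_times, t, side="right"))

    taus = poisson_jump_times(r=2.0, t_max=10.0)
    print(N(3.0, taus))   # over repeated runs, Poisson(2*3) distributed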

Proof of Proposition 4.4. The case r = 0 is trivial, so assume r > 0. For each x ∈ E, let (Y^x_n)_{n≥0} be a Markov chain started in Y^x_0 = x with transition kernel K, i.e.,

(4.10) P(Y^x_n ∈ A | Y^x_0, . . . , Y^x_{n−1}) = K(Y^x_{n−1}, A) a.s. (A ∈ B(E)).

Let (σk)_{k≥1} be independent exponentially distributed random variables with mean r^{-1}, independent of Y^x, and set τn := ∑_{k=1}^n σk. Define a process X^x = (X^x_t)_{t≥0} by

(4.11) X^x_t := Y^x_n if τn ≤ t < τn+1.

We claim that X^x satisfies the equivalent conditions (a)–(c) from Proposition 2.11. Since X^x obviously has cadlag sample paths, this then implies that Px := L(X^x) defines a Markov process with semigroup (Pt)t≥0. If E is compact and K is a continuous probability kernel, then we have already seen in Exercise 3.66 that (Pt)t≥0 is a Feller semigroup.

To see that X^x defined in (4.11) satisfies condition (c) from Proposition 2.11, let N = (Nt)t≥0 be a Poisson process with intensity r, started in N0 = 0, independent of Y^x. Then (4.11) says that

(4.12) X^x_t := Y^x_{Nt} (t ≥ 0),

i.e., Xx jumps according to the kernel K at random times that are given bya Poisson process with intensity r. It follows that for any f ∈ B(E),

E[f(Y^x_{Nt})] = ∑_{k=0}^∞ P{Nt = k} E[f(Y^x_k)] = ∑_{k=0}^∞ e^{−rt} ((rt)^k/k!) K^k f(x)
= e^{−rt} e^{rtK} f(x) = e^{tr(K−1)} f(x) = e^{tG} f(x).

The proof that X^x satisfies condition (c) from Proposition 2.11 in full generality goes basically the same way. Let 0 ≤ t1 ≤ · · · ≤ tn and f1, . . . , fn ∈ B(E). Then, since a Poisson process has independent increments,

E[f1(Y^x_{N_{t1}}) · · · fn(Y^x_{N_{tn}})]
= ∑_{k1=0}^∞ · · · ∑_{kn=0}^∞ P{N_{t1} = k1} · · · P{N_{tn} − N_{t_{n−1}} = kn} · E[f1(Y^x_{k1}) · · · fn(Y^x_{k1+···+kn})]
= ∑_{k1=0}^∞ · · · ∑_{kn=0}^∞ e^{−r t1} ((r t1)^{k1}/k1!) · · · e^{−r(tn−t_{n−1})} ((r(tn−t_{n−1}))^{kn}/kn!) · K^{k1} f1 K^{k2} f2 · · · K^{kn} fn (x)
= e^{t1 G} f1 e^{(t2−t1)G} f2 · · · e^{(tn−t_{n−1})G} fn (x).

Inserting fi = 1_{Ai} we see that condition (c) from Proposition 2.11 is satisfied.

Remark. Jump processes can be approximated by Markov chains. Let E be Polish, K a probability kernel on E, x ∈ E, and r > 0. For each n ≥ 1, let (Y^(n)_i)_{i≥0} be a Markov chain with Y^(n)_0 = x and transition probabilities

P(Y^(n)_{i+1} ∈ · | Y^(n)_0, . . . , Y^(n)_i) = K^(n)(Y^(n)_i, · ),

where

K^(n)(y, dz) = (1/n) K(y, dz) + (1 − 1/n) δ_y(dz).

Then the processes (X^(n)_t)_{t≥0} given by

X^(n)_t := Y^(n)_{⌊nrt⌋}

converge as n → ∞ in finite dimensional distributions to the jump process X with jump kernel K, jump rate r, and initial condition X0 = x.

A well-known example of a jump process is continuous-time random walk.

Example: (Random walk). Let d ≥ 1 and let p : Z^d → R be a probability distribution on Z^d, i.e., p(x) ≥ 0 for all x ∈ Z^d and ∑_x p(x) = 1. Let E := Z^d (equipped with the discrete topology), K(x, y) := p(y − x), r > 0, and define G as in (4.4). The jump process (X, (Px)_{x∈Z^d}) with generator G is called the (continuous-time) random walk that jumps from x to y with rate rp(y − x).

Let (Zk)_{k≥1} be independent random variables with distribution P{Zk = x} = p(x) (x ∈ Z^d) and let (σk)_{k≥1} be independent exponentially distributed random variables with mean r^{-1}, independent of (Zk)_{k≥1}. Set Y^x_n := x + ∑_{k=1}^n Zk, τn := ∑_{k=1}^n σk, and put

(4.13) X^x_t := Y^x_n if τn ≤ t < τn+1.

Then X^x = (X^x_t)_{t≥0} is a version of the random walk that jumps from x to y with rate rp(y − x), started in X^x_0 = x.
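The construction (4.13) is straightforward to simulate. The sketch below (an illustration, not part of the notes) samples the walk at a fixed time t; the nearest-neighbour step distribution p and the rate r are example choices.

    import numpy as np

    def ctrw_at_time(x0, r, t, d=1, rng=None):
        """Value X^x_t of the walk in (4.13): steps of a simple random walk at Exp(r) times."""
        rng = rng or np.random.default_rng()
        x = np.array(x0, dtype=int)
        s = rng.exponential(1.0 / r)          # first jump time tau_1
        while s <= t:
            step = np.zeros(d, dtype=int)
            i = rng.integers(d)
            step[i] = rng.choice([-1, 1])     # p = uniform on nearest neighbours
            x = x + step
            s += rng.exponential(1.0 / r)
        return x

    print(ctrw_at_time(x0=[0], r=1.0, t=5.0))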


Often, one wants to consider jump processes in which the jump rate r is afunction of the position of the process. As long as r is a bounded function,such jump processes exist by a trivial extension of Proposition 4.4.

Proposition 4.6 (Jump processes with non-constant rate). Let E be Polish,let K be a probability kernel on E, and let r ∈ B(E) be nonnegative. DefineG : B(E) → B(E) by

(4.14) Gf := r(Kf − f) (f ∈ B(E)).

Then Ptf := e^{Gt} f (f ∈ B(E), t ≥ 0) defines a Markov semigroup and there exists a Markov process (Px)x∈E corresponding to (Pt)t≥0. If E is compact and K and r are continuous, then (Px)x∈E is a Feller process.

Proof. We may assume that R := sup_{x∈E} r(x) > 0 (otherwise G = 0 and there is nothing to prove). Define

(4.15) K′(x, dy) := (r(x)/R) K(x, dy) + (1 − r(x)/R) δ_x(dy).

Then Gf = R(K′f − f), so we are back at the situation of Proposition 4.4.
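The proof also suggests a simulation recipe, often called uniformization: jump at the constant rate R and at each event use the modified kernel K′ from (4.15), i.e., make a real jump according to K with probability r(x)/R and otherwise stay put. The sketch below (not from the notes; the finite state space and the concrete r and K are illustrative choices) implements this idea.

    import numpy as np

    def simulate_jump_process(x0, r, K, t_max, rng=None):
        """Jump process with state-dependent rate r(x) and kernel K, via uniformization.
        r: array of rates; K: stochastic matrix (rows sum to 1); states {0,...,len(r)-1}."""
        rng = rng or np.random.default_rng()
        R = r.max()                            # dominating rate
        x, t = x0, 0.0
        while True:
            t += rng.exponential(1.0 / R)      # events of a rate-R Poisson process
            if t > t_max:
                return x
            if rng.random() < r[x] / R:        # real jump according to K(x, .)
                x = rng.choice(len(r), p=K[x])
            # otherwise: fictitious jump (the delta_x part of K'), stay at x

    # Example: Moran-type rates on {0,...,5}
    n = 5
    r = np.array([x * (n - x) for x in range(n + 1)], dtype=float)
    K = np.zeros((n + 1, n + 1)); K[0, 0] = K[n, n] = 1.0
    for x in range(1, n):
        K[x, x - 1] = K[x, x + 1] = 0.5
    print(simulate_jump_process(2, r, K, t_max=10.0))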

Example: (Moran model). Fix n ≥ 1, put E := {0, 1, . . . , n}, and

(4.16) G^X f(x) := ½ x(n − x) (f(x+1) + f(x−1) − 2f(x)),

which corresponds to setting r(x) := x(n − x) and K(x, y) := ½ 1_{y=x+1} + ½ 1_{y=x−1} for x = 1, . . . , n−1. Observe that since r(0) = r(n) = 0, it is irrelevant how we define K(0, · ) and K(n, · ). The jump process (X, (Px)_{x∈E}) with generator G^X is called the Moran model with population size n.

The Moran model arises in the following way. Consider n organisms that are divided into two types, denoted 0 and 1. (For example, 0 might represent a white flower and 1 a red one.) Let Sn := {0, 1}^n = {y = (y(1), . . . , y(n)) : y(i) ∈ {0, 1} ∀i} be the set of all different ways in which we can assign types to these n organisms. Put v_{ij}(y)(i) := y(j) and v_{ij}(y)(k) := y(k) if k ≠ i. Then v_{ij}(y) is the configuration in which the i-th organism has adopted the type of the j-th organism. Let (Y, (Py)_{y∈Sn}) be the Markov process with generator

(4.17) G^Y f(y) := ½ ∑_{ij} (f(v_{ij}(y)) − f(y)).

This means that each unordered pair {i, j} of organisms is selected with rate 1, and then one of these organisms, chosen with equal probabilities, takes over the type of the other one. (Note that there is no harm in including i = j in the sum in (4.17) since v_{ii}(y) = y.) Now if Y^y is a version of the Markov process with generator G^Y started in Y^y_0 = y, then

(4.18) X^x_t := ∑_{i=1}^n Y^y_t(i) (t ≥ 0)


is a version of the Moran model started in x := ∑_{i=1}^n y(i). To see this, at least intuitively, note that if x = ∑_{i=1}^n y(i) then x(n − x) is the number of unordered pairs {i, j} of organisms such that i and j have different types, and therefore ½ x(n − x) is the total rate of 1's changing to 0's, which equals the total rate of 0's changing to 1's.

4.3. Feller processes with compact state space. We will now take alook at some examples of Markov processes that are not jump processes. Allprocesses that we will look at are processes on compact subsets of Rd withcontinuous sample paths (although we will not prove the latter here). Oneshould keep in mind that there are many more possibilities for a Markovprocess not to be a jump process. For example, there are processes thathave a combination of continuous and jump dynamics or that make infinitelymany (small) jumps in each open time interval.

Let d ≥ 1, let D ⊂ R^d be a bounded open set and let D̄ denote its closure. Let f|_{D̄} denote the restriction of a function f : R^d → R to D̄. By definition:

(4.19) C²(D̄) := {f|_{D̄} : f : R^d → R twice continuously differentiable}.

Let M^d_+ denote the space of real d × d matrices m that are symmetric, i.e., m_{ij} = m_{ji}, and nonnegative definite, i.e.,

(4.20) ∑_{ij} v_i m_{ij} v_j ≥ 0 ∀v ∈ R^d.

Let a : D̄ → M^d_+ and b : D̄ → R^d be continuous functions and let (D(A), A) be the linear operator on C(D̄) defined by

(4.21) Af(x) := ½ ∑_{ij} a_{ij}(x) ∂²/∂x_i∂x_j f(x) + ∑_i b_i(x) ∂/∂x_i f(x) (x ∈ D̄).

For x ∈ D these derivatives are defined in the obvious way, that is, ∂/∂x_i f(x) = lim_{ε→0} ε^{-1}(f(x + εδ_i) − f(x)). For x in the boundary ∂D := D̄\D we have to be a bit careful, since it may happen that x + εδ_i ∉ D̄ for all ε ≠ 0. By definition, each f ∈ C²(D̄) can be extended to a twice continuously differentiable function f̄ on all of R^d. Therefore, we define

(4.22) ∂/∂x_i f := (∂/∂x_i f̄)|_{D̄}.

To see that this definition does not depend on the choice of the extension f̄, note that if f̃ is another extension, then ∂/∂x_i f̄ = ∂/∂x_i f̃ on D. By continuity, ∂/∂x_i f̄ = ∂/∂x_i f̃ on D̄.13

13 Alternatively, we might have defined C²(D̄) as the space of all functions f : D̄ → R whose partial derivatives up to second order exist on D and can be extended to continuous functions on D̄. For 'nice' (for example convex) domains D this definition coincides with the definition in (4.19). This is a consequence of Whitney's extension theorem, see [EK86, Appendix 6].


We ask ourselves when the closure of A generates a Feller process in D̄, i.e., satisfies the conditions (1)–(3) from Theorem 3.28. By the Stone-Weierstrass theorem (Theorem 3.30), C²(D̄) is dense in C(D̄), so condition (1) is always satisfied.

If f ∈ C²(D̄) assumes its maximum at a point x ∈ D, then Af(x) ≤ 0. This is a consequence of the fact that a(x) is nonnegative definite. In fact, since a(x) is symmetric, it can be diagonalized. Therefore, for each x there exist orthonormal vectors e1(x), . . . , ed(x) ∈ R^d and constants a1(x), . . . , ad(x) such that

(4.23) ∑_{ij} a_{ij}(x) ∂²/∂x_i∂x_j f(x) = ∑_k a_k(x) ∂²/∂ε² f(x + ε e_k(x))|_{ε=0}.

Since a(x) is nonnegative definite, the constants a_k(x) are all nonnegative, and if f assumes its maximum at x then ∂²/∂ε² f(x + ε e_k(x))|_{ε=0} ≤ 0 for each k. Moreover, the first order term in (4.21) vanishes, since ∂/∂x_i f(x) = 0 at an interior maximum; hence Af(x) ≤ 0.

Exercise 4.7. Let D := {x ∈ R² : |x| < 1} be the open unit ball in R² and put

(4.24)  ( a11(x)  a12(x) )      (  x2²    −x1x2 )
        ( a21(x)  a22(x) )  :=  ( −x1x2    x1²  )

and

(4.25) (b1(x), b2(x)) := c (x1, x2).

For which values of c does the operator A in (4.21) satisfy the positive max-imum principle?

The preceding exercise shows that it is not always easy to see when A satisfies the positive maximum principle also for x ∈ ∂D. If this is the case, however, and by some means one can also check condition (4) from Theorem 3.29, then the closure of A generates a Feller process (X, (Px)_{x∈D̄}) in D̄. We will later see that under Px, X has a.s. continuous sample paths. We call (X, (Px)_{x∈D̄}) the diffusion with drift b and local diffusion rate (or diffusion matrix) a. The next lemma explains the meaning of the functions a and b.

Lemma 4.8 (Drift and diffusion rate). Assume that the closure of the op-erator A in (4.21) generates a Feller semigroup (Pt)t≥0. Then, as t → 0,

(4.26) (i) ∫_{D̄} Pt(x, dy)(y_i − x_i) = b_i(x) t + o(t),
       (ii) ∫_{D̄} Pt(x, dy)(y_i − x_i)(y_j − x_j) = a_{ij}(x) t + o(t),

for all i, j = 1, . . . , d.

Proof. For any f ∈ C²(D̄) we have by (3.32)

(4.27) ∫_{D̄} Pt(x, dy) f(y) = Ptf(x) = f(x) + tAf(x) + o(t) as t → 0.


Fix x ∈ D̄ and set fi(y) := (y_i − x_i). Then fi(x) = 0 and Afi(x) = b_i(x), and therefore (4.27) yields (4.26) (i). Likewise, inserting fij(y) := (y_i − x_i)(y_j − x_j) into (4.27) and noting that Afij(x) = a_{ij}(x) yields (4.26) (ii).

By condition (a) from Proposition 2.11, Lemma 4.8 says that if X is a version of the Markov process with generator A, started in any initial law, then

(4.28) (i) E[(X_{t+ε}(i) − X_t(i)) | F^X_t] = b_i(X_t) ε + o(ε),
       (ii) E[(X_{t+ε}(i) − X_t(i))(X_{t+ε}(j) − X_t(j)) | F^X_t] = a_{ij}(X_t) ε + o(ε).

Therefore, the functions b and a describe the mean and the covariance matrix of small increments of the diffusion process X.14 If a ≡ 0 then

(4.29) Px{Xt = x_t ∀t ≥ 0} = 1,

where t ↦ x_t solves the differential equation

(4.30) ∂/∂t x_t = b(x_t) with x_0 = x.

In general, diffusions can be obtained as solutions to stochastic differential equations of the form

(4.31) dXt = σ(Xt) dBt + b(Xt) dt,

where σ(x) is a matrix such that ∑_k σ_{ik}(x) σ_{jk}(x) = a_{ij}(x) and B is a d-dimensional Brownian motion, but this falls outside the scope of this section.

Example: (Wright-Fisher diffusion). By Exercise 3.32, the operator AWF from Exercise 3.20 generates a diffusion process (Y, (Py)_{y∈[0,1]}) in [0, 1]. This diffusion is known as the Wright-Fisher diffusion.

For each n ≥ 1, let X^n be a Moran model in {0, . . . , n} (see (4.16)) with some initial law L(X^n_0). Define

(4.32) Y^n_t := (1/n) X^n_t,

and assume that L(Y^n_0) ⇒ µ for some probability measure µ on [0, 1]. We claim that L(Y^n) ⇒ L(Y), where Y is a version of the Wright-Fisher diffusion with initial law L(Y_0) = µ. The proof of this fact relies on some deep results that we have not seen yet, so we will only give a heuristic argument. Let (P^n_t)_{t≥0} be the transition function of Y^n. Then by (4.16),

(4.33) P^n_t(y, · ) = δ_y + t n² y(1 − y) (½ δ_{y+1/n} + ½ δ_{y−1/n} − δ_y) + o(t),

and therefore

(4.34) (i) ∫ P^n_t(y, dz)(z − y) = o(t),
       (ii) ∫ P^n_t(y, dz)(z − y)² = t y(1 − y) + o(t).

14 Indeed, since the mean of Pt(x, · ) is x plus a term of order t, the covariance matrix of Pt(x, · ) is equal to ∫_{D̄} Pt(x, dy)(y_i − x_i)(y_j − x_j) up to an error term of order t².


Thus, at least the first and second moments of small increments of the process Y^n converge to those of Y.

The next example shows that the domain of an operator is not only oftechnical interest, but can significantly contribute to the behavior of thecorresponding Markov process.

Example: (Brownian motion with absorption and reflection). Define linear operators (D(Aab), Aab) and (D(Are), Are) on C[0, 1] by

(4.35) D(Aab) := {f ∈ C²[0, 1] : f″(0) = 0 = f″(1)},
       Aab f(x) := ½ f″(x) (x ∈ [0, 1]),

and

(4.36) D(Are) := {f ∈ C²[0, 1] : f′(0) = 0 = f′(1)},
       Are f(x) := ½ f″(x) (x ∈ [0, 1]).

Then the closures of Aab and Are generate Feller processes in [0, 1]. The operator Aab generates Brownian motion absorbed at the boundary and Are generates Brownian motion reflected at the boundary.

To see that Aab and Are satisfy the positive maximum principle, note that

if f ∈ D(Aab) or f ∈ D(Are) assumes its maximum at a point x ∈ (0, 1), then ½ f″(x) ≤ 0. If f ∈ D(Aab) assumes its maximum at a point x ∈ {0, 1}, then Aab f(x) = ½ f″(x) = 0 by the definition of D(Aab)! Similarly, if f ∈ D(Are) assumes its maximum at a point x ∈ {0, 1}, then Are f(x) = ½ f″(x) ≤ 0 because f′(x) = 0 by the definition of D(Are).

The fact that Aab and Are satisfy condition (4) from Theorem 3.29 follows from the theory of partial differential equations, see for example [Fri64].
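The difference made by the domain of the operator is easy to see in simulation. The sketch below (an illustration, not part of the notes) runs a discretized Brownian motion on [0, 1] with the two boundary behaviours: freezing the path at the boundary (absorption) versus folding it back into the interval (reflection).

    import numpy as np

    def bm_on_unit_interval(x0, t_max, boundary, dt=1e-4, rng=None):
        """Discretized Brownian motion on [0,1]; boundary in {'absorbed', 'reflected'}."""
        rng = rng or np.random.default_rng()
        x = float(x0)
        for _ in range(int(t_max / dt)):
            x += rng.normal(0.0, np.sqrt(dt))
            if boundary == "reflected":
                x = abs(x)                     # reflect at 0 ...
                x = 1.0 - abs(1.0 - x)         # ... and at 1
            elif x <= 0.0 or x >= 1.0:
                return float(np.clip(x, 0.0, 1.0))   # absorbed: stop at the boundary
        return x

    print(bm_on_unit_interval(0.5, t_max=1.0, boundary="absorbed"))
    print(bm_on_unit_interval(0.5, t_max=1.0, boundary="reflected"))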

4.4. Feller processes with locally compact state space. So far, wehave only been able to treat Feller processes with compact state spaces. Wewill now show how to deal with processes with locally compact state spaces.We start with an example.

Example: (Brownian motion). Fix d ≥ 1 and define

(4.37) Pt(x, dy) := (2πt)^{−d/2} e^{−|y−x|²/(2t)} dy.

Then (Pt)t≥0 is a transition function on Rd and there exists a Markov process

(B, (Px)x∈Rd) with continuous sample paths associated with (Pt)t≥0.

Let R̄^d := R^d ∪ {∞} be the one-point compactification of R^d (compare with (1.66)) and define a Markov process (P̄x)_{x∈R̄^d} by

(4.38) P̄x := Px if x ∈ R^d, and P̄x := δ_∞ if x = ∞,

where δ_∞ denotes the delta-measure on the constant path w(t) := ∞ for all t ≥ 0. Note that P̄x is a measure on D_{R̄^d}[0,∞) while Px is a measure on D_{R^d}[0,∞), so when we say that P̄x = Px for x ∈ R^d we mean that P̄x is the image of Px under the embedding map D_{R^d}[0,∞) ⊂ D_{R̄^d}[0,∞).

We claim that (P̄x)_{x∈R̄^d} is a Feller process with compact state space R̄^d.

It is not hard to see that this is a Markov process with transition function

(4.39) P̄t(x, · ) := Pt(x, · ) if x ∈ R^d, and P̄t(x, · ) := δ_∞ if x = ∞.

We must show that this transition function is continuous. This means that we must show that (t, x) ↦ P̄t f(x) is continuous for each f ∈ C(R̄^d). Since P̄t 1(x) = 1, by subtracting a constant it suffices to show that (t, x) ↦ P̄t f(x) is continuous for each f ∈ C0(R̄^d) := {f ∈ C(R̄^d) : f(∞) = 0}. Assume that (tn, xn) → (t, x) ∈ [0,∞) × R̄^d. Without loss of generality we may assume that xn ≠ ∞ for all n. We distinguish two cases. 1. If x ≠ ∞, then by uniform convergence

(4.40) P_{tn} f(xn) = ∫_{R^d} (2πtn)^{−d/2} e^{−|y−xn|²/(2tn)} f(y) dy
       −→_{n→∞} ∫_{R^d} (2πt)^{−d/2} e^{−|y−x|²/(2t)} f(y) dy = Pt f(x).

2. If x = ∞, then for each compact set C ⊂ Rd

(4.41) |P_{tn} f(xn)| ≤ ‖f‖ |C| (2πtn)^{−d/2} sup_{y∈C} e^{−|y−xn|²/(2tn)} + sup_{y∈R^d\C} |f(y)|,

where |C| denotes the Lebesgue measure of C.

Since for each ε > 0 we can find a compact set C such that sup_{y∈R^d\C} |f(y)| ≤ ε, taking the limit n → ∞ in (4.41) we find that limsup_{n→∞} |P_{tn} f(xn)| ≤ ε for each ε > 0, and therefore, by (4.39),

(4.42) lim_{n→∞} P_{tn} f(xn) = 0 = P̄t f(∞).

We can use the compactification trick from the previous example more generally. We start with a simple observation. Let E be locally compact but not compact, separable, and metrizable, and let Ē := E ∪ {∞} be its one-point compactification. Let C0(E) := {f ∈ Cb(E) : lim_{x→∞} f(x) = 0} denote the separable Banach space of continuous real functions on E vanishing at infinity, equipped with the supremum norm.

Lemma 4.9 (Compactification of Markov process). Assume that (P̄x)_{x∈Ē} is a Markov process in Ē with transition function (P̄t)_{t≥0}, and that

(1) (non-explosion) P̄x{Xt, Xt− ≠ ∞ ∀t ≥ 0} = 1 ∀x ≠ ∞.

Let Px and Pt(x, · ) denote the restrictions of P̄x and P̄t(x, · ) to DE[0,∞) and E, respectively. Then (Px)_{x∈E} is a Markov process in E with transition function (Pt)_{t≥0}. If moreover

(2) (non-implosion) P̄∞{Xt = ∞ ∀t ≥ 0} = 1,


then for each t ≥ 0, Pt maps C0(E) into itself and (Pt)_{t≥0} is a strongly continuous contraction semigroup on C0(E).

Proof. The fact that (Px)_{x∈E} is a Markov process in E with transition function (Pt)_{t≥0} if the process in Ē is non-explosive is almost trivial. We must only show that the event in condition (1) is actually measurable. Since X = (Xt)_{t≥0} can be viewed as a random variable with values in D_{Ē}[0,∞), it suffices to show that DE[0,∞) is a measurable subset of D_{Ē}[0,∞). This follows from the fact that DE[0,∞) is Polish in the induced topology, so that by Proposition 1.24, DE[0,∞) is a countable intersection of open sets in D_{Ē}[0,∞).

Observe that there is a natural identification between the space C0(E) and the closed subspace {f ∈ C(Ē) : f(∞) = 0} of C(Ē). If the process in Ē is not only non-explosive but also non-implosive, then

(4.43) P̄t f(∞) = f(∞) = 0

for each f ∈ C(Ē) with f(∞) = 0, which shows that Pt maps C0(E) into itself. Since (P̄t)_{t≥0} is a strongly continuous contraction semigroup on C(Ē), its restriction to the closed subspace {f ∈ C(Ē) : f(∞) = 0} ≅ C0(E) is also a strongly continuous contraction semigroup.

The next Proposition gives sufficient conditions for non-explosion andnon-implosion in terms of the generator of a process. The function f incondition (1) is an example of a Lyapunov function.

Proposition 4.10 (Non-explosion and non-implosion). Let Ē be the one-point compactification of a locally compact separable metrizable space E and let (P̄x)_{x∈Ē} be a Feller process in Ē with generator (D(G), G). If

(1) there exist functions f, g : E → R such that f ≥ 0, lim_{x→∞} f(x) = ∞, and sup_{x∈E} g(x) < ∞, and functions fn ∈ D(G) such that 0 ≤ fn(x) ↑ f(x) for all x ∈ E, fn(∞) ↑ ∞, and Gfn → g uniformly on compacta in E,

then (P̄x)_{x∈Ē} is non-explosive. If

(2) Gf(∞) = 0 for all f ∈ D(G),

then (P̄x)_{x∈Ē} is non-implosive.

Proof. The proof that (1) implies that (P̄x)_{x∈Ē} is non-explosive will be postponed to Section 5.6.

If (2) holds, then any solution to the Cauchy equation ∂/∂t u(t) = Gu(t) satisfies ∂/∂t u(t)(∞) = Gu(t)(∞) = 0. Therefore, by Proposition 3.15, P̄t f(∞) = f(∞) for all f ∈ D(G) and t ≥ 0, where (P̄t)_{t≥0} is the semigroup of (P̄x)_{x∈Ē}. Since D(G) is dense, it follows that P̄t f(∞) = f(∞) for all f ∈ C(Ē), t ≥ 0. This means that P̄t(∞, · ) = δ_∞ for each t ≥ 0. It follows that P̄∞{Xt = ∞} = 1 for all t ≥ 0, which implies that P̄∞{Xt = ∞ ∀t ∈ Q ∩ [0,∞)} = 1, and therefore (P̄x)_{x∈Ē} is non-implosive by the right-continuity of sample paths.

Example: (Feller diffusion). Identify C[0,∞] with the space {f ∈ C[0,∞) : lim_{x→∞} f(x) exists} and define an operator (D(AFel), AFel) on C[0,∞] by

(4.44) D(AFel) := {f ∈ C²[0,∞) : lim_{x→∞} f(x) exists and lim_{x→∞} x ∂²/∂x² f(x) = 0},
       AFel f(x) := x ∂²/∂x² f(x) (x ∈ [0,∞)).

We claim that the closure of AFel generates the semigroup of a non-explosive and non-implosive Feller process in [0,∞]. It is not hard to check that AFel satisfies the positive maximum principle. Consider the class of Laplace functions {fλ}_{λ≥0} defined by

(4.45) fλ(x) := e−λx (x ∈ [0,∞)).

We calculate

(4.46) x ∂²/∂x² fλ(x) = λ² x e^{−λx} −→_{x→∞} 0,

which shows that fλ ∈ D(AFel) for all λ ≥ 0. By the Stone-Weierstrass theorem (Theorem 3.30), the linear span of {fλ : λ ≥ 0} is dense in C[0,∞]. We claim that for each λ ≥ 0 there exists a solution u to the Cauchy equation

(4.47) ∂/∂t u(t) = AFel u(t) (t ≥ 0), u(0) = fλ.

Indeed, it is easy to see that

(4.48) ∂/∂t e^{−x/t} = x ∂²/∂x² e^{−x/t} (t > 0, x ∈ [0,∞)),

so the solution to (4.47) is given by

(4.49) u(t) = f_{λt} where λt := (λ^{−1} + t)^{−1} (t ≥ 0)

if λ > 0, and λt := 0 (t ≥ 0) if λ = 0. It therefore follows from Theorem 3.29 and Lemma 3.31 that AFel generates a Feller semigroup on C[0,∞].
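The computation (4.48), and the fact that u(t) = f_{λt} with λt as in (4.49) solves the Cauchy equation (4.47), can also be verified symbolically. A small sketch (not part of the notes) using sympy:

    import sympy as sp

    x, t, lam = sp.symbols("x t lam", positive=True)

    # (4.48): d/dt e^{-x/t} = x d^2/dx^2 e^{-x/t}
    g = sp.exp(-x / t)
    print(sp.simplify(sp.diff(g, t) - x * sp.diff(g, x, 2)))   # prints 0

    # u(t) = exp(-lam_t x) with lam_t = (1/lam + t)^{-1} solves (4.47)
    lam_t = 1 / (1 / lam + t)
    u = sp.exp(-lam_t * x)
    print(sp.simplify(sp.diff(u, t) - x * sp.diff(u, x, 2)))   # prints 0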

Let (P̄x)_{x∈[0,∞]} denote the corresponding Feller process. It is easy to see that the function f(x) = x (with g(x) = 0) satisfies condition (1) from Proposition 4.10, so (P̄x)_{x∈[0,∞]} is non-explosive. Since lim_{x→∞} AFel f(x) = 0 for all f ∈ D(AFel), it follows that (P̄x)_{x∈[0,∞]} is also non-implosive.

Let Pt(x, · ) denote the restriction of P̄t(x, · ) to [0,∞). By what we have just proved, (Pt)_{t≥0} is the transition function of a Markov process (X, (Px)_{x∈[0,∞)}) on [0,∞), which is called the Feller diffusion.

Formula (4.49) tells us that the semigroup (Pt)_{t≥0} maps the class of Laplace functions into itself. Indeed,

(4.50) Pt fλ = f_{λt} (t, λ ≥ 0)

with λt as in (4.49). This is closely related to the branching property of the Feller diffusion. By this, we mean that if X^x and X^y are independent versions of the Feller diffusion started in X^x_0 = x and X^y_0 = y, respectively, and X^{x+y} is a version of the Feller diffusion started in X^{x+y}_0 := x + y, then

(4.51) L(X^x_t + X^y_t) = L(X^{x+y}_t) (t ≥ 0).

To see this, note that (4.50) says that

(4.52) Ex[e^{−λXt}] = e^{−λ_t x} (t, λ ≥ 0).

By independence

(4.53) E[e^{−λ(X^x_t + X^y_t)}] = E[e^{−λX^x_t}] E[e^{−λX^y_t}]
       = e^{−λ_t x} e^{−λ_t y} = e^{−λ_t(x+y)} = E[e^{−λX^{x+y}_t}].

Since this holds for all λ ≥ 0 and the linear span of the Laplace functions is dense, (4.51) follows.
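Formula (4.52) can also be checked by simulation. Comparing with (4.31), the generator AFel f = x f″ corresponds to b = 0 and a(x) = 2x, i.e. σ(x) = √(2x), so an Euler scheme for dX = √(2X) dB gives a rough approximation of the Feller diffusion. The sketch below (not from the notes; time step, x, λ and sample size are arbitrary choices) compares a Monte Carlo estimate of E^x[e^{−λXt}] with e^{−λ_t x}.

    import numpy as np

    def feller_diffusion(x0, t_max, dt=1e-3, n_paths=20000, rng=None):
        """Crude Euler scheme for dX = sqrt(2 X) dB, clipped at 0."""
        rng = rng or np.random.default_rng(1)
        x = np.full(n_paths, float(x0))
        for _ in range(int(t_max / dt)):
            x += np.sqrt(2.0 * x) * rng.normal(0.0, np.sqrt(dt), size=n_paths)
            x = np.maximum(x, 0.0)             # keep the approximation nonnegative
        return x

    x0, t, lam = 1.0, 0.5, 2.0
    lam_t = 1.0 / (1.0 / lam + t)              # as in (4.49)
    xt = feller_diffusion(x0, t)
    print(np.exp(-lam * xt).mean(), np.exp(-lam_t * x0))   # should be close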

Exercise 4.11. Let (X, (Px)x∈[0,∞)) be the Feller diffusion. Calculate theextinction probability Px[Xt = 0] for each t, x ≥ 0.


5. Harmonic functions and martingales

5.1. Harmonic functions. Let (Px)x∈E be a Feller process with a compactmetrizable state space E, Feller semigroup (Pt)t≥0 and generator (D(G), G).

Lemma 5.1 (Harmonic functions). The following conditions on a functionh ∈ C(E) are equivalent:

(a) Pth = h ∀t ≥ 0.
(b) h ∈ D(G) and Gh = 0.

Proof. If Pth = h for all t ≥ 0, then lim_{t→0} t^{−1}(Pth − h) = 0, so h ∈ D(G) and Gh = 0. Conversely, if h ∈ D(G) and Gh = 0, then the function ut := h (t ≥ 0) solves the Cauchy equation ∂/∂t ut = G ut with initial condition u0 = h, so by Propositions 3.15 and 3.22 it follows that Pth = ut = h for all t ≥ 0.

A function h ∈ C(E) satisfying the equivalent conditions (a) and (b) fromLemma 5.1 is called a harmonic function for the Feller process (Px)x∈E .

Example 5.2 (Harmonic function for Wright-Fisher diffusion). Let AWF be as in Exercises 3.20 and 3.32 and let (X, (Px)_{x∈[0,1]}) be the Wright-Fisher diffusion, i.e., the Feller process with generator AWF. Then the function h : [0, 1] → [0, 1] given by h(x) := x is harmonic for X. As a consequence, the Wright-Fisher diffusion satisfies

(5.1) Ex[Xt] = x (t ≥ 0).

Proof. Since ½ x(1 − x) ∂²/∂x² x = 0, h satisfies condition (b) from Lemma 5.1. As a consequence, Ex[Xt] = Pth(x) = h(x) = x for all t ≥ 0.

Let (Px)x∈E be a Feller process with a compact metrizable state space E and let h ∈ C(E) be harmonic. Then, if X is a version of (Px)x∈E started in an arbitrary initial law, by condition (a) from Proposition 2.11,

(5.2) E[h(Xt) | F^X_s] = P_{t−s} h(Xs) = h(Xs) a.s. (0 ≤ s ≤ t).

This motivates the following definitions. By definition, a filtration is a family(Ft)t≥0 of σ-fields such that Fs ⊂ Ft for all 0 ≤ s ≤ t. An Ft-martingaleis a stochastic process M such that Mt is Ft-measurable, E[|Mt|] < ∞,and E[Mt|Fs] = Ms for all 0 ≤ s ≤ t. In the next sections we will studyfiltrations and martingales in more detail.

5.2. Filtrations. By definition, a filtered probability space is a quadruple (Ω, F, (Ft)t≥0, P) such that (Ω, F, P) is a probability space and (Ft)t≥0 is a filtration on Ω with Ft ⊂ F ∀t ≥ 0. For example, if X is a stochastic process, then (F^X_t)_{t≥0}, defined in (2.5), is the filtration generated by X. We say that a stochastic process X is adapted to a filtration (Ft)t≥0, or simply Ft-adapted, if Xt is Ft-measurable for each t ≥ 0.


Definition 5.3 (Progressive processes). A stochastic process X on (Ω,F ,P)is said to be progressively measurable with respect to (Ft)t≥0, or simply Ft-progressive, if the map (s, ω) 7→ Xs(ω) from [0, t]×Ω into E is B[0, t]×Ft-measurable for each t ≥ 0.

Exercise 5.4. Let X be a stochastic process and (Ft)t≥0 a filtration. Assumethat X is Ft-adapted and that X has right continuous sample paths. Showthat X is Ft-progressive. (Hint: adapt the proof of Lemma 1.4.)

If (Ft)t≥0 is a filtration, then

(5.3) Ft+ := ⋂_{s>t} Fs (t ≥ 0)

defines a new, larger filtration (Ft+)t≥0. If Ft+ = Ft ∀t ≥ 0 then we say thatthe filtration (Ft)t≥0 is right continuous. It is not hard to see that (Ft+)t≥0

is right continuous.Recall that the completion of a σ-field F with respect to a probability

measure P is the σ-field

(5.4) F̄ := {A ⊂ Ω : ∃B ∈ F s.t. 1A = 1B a.s.}.

There is a unique extension of the probability measure P to a probability measure on F̄. If (Ω, F, (Ft)t≥0, P) is a filtered probability space then

(5.5) F̄t := {A ⊂ Ω : ∃B ∈ Ft s.t. 1A = 1B a.s.} (t ≥ 0)

defines a new filtration (F̄t)t≥0. If F̄t = Ft ∀t ≥ 0, then we say that the filtration (Ft)t≥0 is complete.15 A random variable X with values in a Polish space is F̄t-measurable if and only if there exists an Ft-measurable random variable Y such that X = Y a.s.

Lemma 5.5 (Usual conditions). If (Ft)t≥0 is a filtration, then

(5.6) F̄t+ := ⋂_{s>t} F̄s = {A ⊂ Ω : ∃B ∈ Ft+ s.t. 1A = 1B a.s.} (t ≥ 0)

defines a complete, right-continuous filtration.

Proof. It is easy to see that ⋂_{s>t} F̄s is right continuous and that {A ⊂ Ω : ∃B ∈ Ft+ s.t. 1A = 1B a.s.} is complete. To see that the two formulas for F̄t+ in (5.6) are equivalent, observe that A ∈ ⋂_{s>t} F̄s implies that for all n there exists Bn ∈ F_{t+1/n} with 1A = 1_{Bn} a.s. Put 1_{B∞} := liminf_m 1_{Bm}. Then 1A = 1_{B∞} a.s., and since 1_{B∞} = liminf_{m≥n} 1_{Bm} we have B∞ ∈ F_{t+1/n} for all n, hence B∞ ∈ Ft+. This shows that A ∈ {A ⊂ Ω : ∃B ∈ Ft+ s.t. 1A = 1B a.s.}. Conversely, if ∃B ∈ Ft+ s.t. 1A = 1B a.s., then obviously A ∈ F̄s for all s > t, so A ∈ ⋂_{s>t} F̄s.

15 Warning: F̄t is not the same as the completion of the σ-field Ft with respect to the restriction of P to Ft. The reason is that the class of null sets of the restriction of P to Ft is smaller than the class of null sets of P. Because of this fact, some authors prefer to call (F̄t)t≥0 the augmentation, rather than the completion, of (Ft)t≥0.


A filtration that is complete and right-continuous is said to fulfill the usualconditions.

5.3. Martingales.

Definition 5.6 (Martingale). An Ft-submartingale is a real-valued stochas-tic process M , adapted to a filtration (Ft)t≥0, such that E[|Mt|] < ∞ ∀t ≥ 0and

(5.7) E[Mt|Fs] ≥ Ms a.s. (0 ≤ s ≤ t).

A stochastic process M is called an Ft-supermartingale if −M is an Ft-submartingale, and an Ft-martingale if M is an Ft-submartingale and anFt-supermartingale.

We can think of an Ft-martingale as a model for a fair game of chance,where Mt is the capital that a player holds at time t ≥ 0 and Ft is theinformation available to the player at that time. Then (5.7) says that if theplayer holds a capital Ms at time s, then the expected capital that the playerwill hold at a later time t, given the information at time s, is precisely Ms.

Lemma 5.7 (Martingale filtration). Let (Ft)t≥0 and (Gt)t≥0 be filtrationssuch that Ft ⊂ Gt for all t ≥ 0. Then every Gt-submartingale that is Ft-adapted is an Ft-submartingale.

Proof. Since Ft ⊂ Gt, since M is a Gt-submartingale, and since M is Ft-adapted:

E[Mt | Fs] = E[E[Mt | Gs] | Fs] ≥ E[Ms | Fs] = Ms a.s. (0 ≤ s ≤ t).

In particular, it follows that every Ft-submartingale is also an F^M_t-submartingale. If a stochastic process M is an F^M_t-submartingale, F^M_t-supermartingale, or F^M_t-martingale, then we simply say that M is a submartingale, supermartingale, or martingale, respectively.

Note that if (Ω,F , (Ft)t≥0,P) is a filtered probability space and M∞ is areal random variable such that E[|M∞|] < ∞, then

(5.8) Mt := E[M∞|Ft] (t ≥ 0)

defines an Ft-martingale. This follows from the facts that E[|E[M∞|Ft]|] ≤ E[E[|M∞| | Ft]] = E[|M∞|] < ∞ and E[E[M∞|Ft] | Fs] = E[M∞|Fs] for all 0 ≤ s ≤ t. Formula (5.8) defines the stochastic process M uniquely up to modifications, since for each fixed t the conditional expectation is unique up to a.s. equality.

These observations raise a number of questions. Do all Ft-martingales have a last element M∞ as in (5.8)? Can we find modifications of M with cadlag sample paths? Before we address these questions we first pose another one: We know that the conditional expectation E[X|F] of a random variable X with respect to a σ-field F is continuous in X. For example, if Xn → X in Lp-norm for some 1 ≤ p < ∞, then E[Xn|F] → E[X|F] in Lp-norm. But how about the continuity of E[X|F] in the σ-field F?

Let (Fn)n∈N be a sequence of σ-fields. We say that the σ-fields Fn decrease to a limit F∞, denoted as Fn ↓ F∞, if F0 ⊃ F1 ⊃ · · · and

F∞ := ⋂_n Fn.

Likewise, we say that the σ-fields Fn increase to a limit F∞, denoted as Fn ↑ F∞, if F0 ⊂ F1 ⊂ · · · and

F∞ := σ(⋃_n Fn).

One has the following theorem. (See [Chu74, Theorem 9.4.8], or [Bil86,Theorems 35.5 and 35.7].)

Theorem 5.8 (Continuity of conditional expectation in the σ-field). LetX be a random variable defined on a probability space (Ω,F ,P) and let(Fn)n∈N be a sequence of sub-σ-fields of F . Assume that E[|X|] < ∞ andthat Fn ↓ F∞ or Fn ↑ F∞. Then

E[X|Fn] −→_{n→∞} E[X|F∞] a.s. and in L1-norm.

Corollary 5.9 (Filtration enlargement). Let (Ft)t≥0 be a filtration. Then every Ft-submartingale with right continuous sample paths is also an F̄t+-submartingale.

Proof. By Theorem 5.8 and the right continuity of sample paths, we have E[Mt | F̄s+] = E[Mt | Fs+] = lim_{n→∞} E[Mt | F_{s+1/n}] ≥ lim_{n→∞} M_{s+1/n} = Ms a.s.

Coming back to our earlier questions about martingales, here are twoanswers.

Theorem 5.10 (Modification with cadlag sample paths). Let (Ft)t≥0 be a filtration and let M be an F̄t+-submartingale. Assume that t ↦ E[Mt] is right continuous. Then M has a modification with cadlag sample paths.

This result can be found in [KS91, Theorem 1.3.13]. Note that if M isa martingale, then E[Mt] does not depend on t so that in this case t 7→E[Mt] is trivially right continuous. The next result can be found in [KS91,Theorem 1.3.15].

Theorem 5.11 (Submartingale convergence). Let M be a submartingale with right continuous sample paths, and assume that sup_{t≥0} E[Mt ∨ 0] < ∞. Then there exists a random variable M∞ such that E[|M∞|] < ∞ and Mt −→_{t→∞} M∞ a.s.


5.4. Stopping times. There is one more result about martingales that is ofcentral importance. Think of a martingale as a fair game of chance. Thenformula (5.7) says that the expected gain of a player who stops playingat a fixed time t is zero. But how about players who stop playing at arandom time? It turns out that the answer depends on what we mean bya random time. If the information available to the player at time t is Ft,then the decision whether to stop playing should be made on the basis ofthis information only. This leads to the definition of stopping times.

Let (Ft)t≥0 be a filtration. By definition, an Ft-stopping time is a functionτ : Ω → [0,∞] such that the stochastic process (1τ≤t)t≥0 is Ft-adapted.Obviously, this is equivalent to the statement that the event τ ≤ t (i.e.,the set ω : τ(ω) ≤ t) is Ft-measurable for each t ≥ 0. We interpret τ as arandom time with the property that, if Ft is the information that is availableto us at time t, then we can at any time t decide whether the stopping timeτ has already occurred.

Lemma 5.12 (Optional times). Let (Ft)t≥0 be a filtration on Ω and let τ : Ω → [0,∞] be a function. Then τ is an Ft+-stopping time if and only if {τ < t} ∈ Ft ∀t ≥ 0.

Proof. If τ is an Ft+-stopping time then {τ ≤ s} ∈ ⋂_{t>s} Ft ∀s ≥ 0, hence {τ ≤ s} ∈ Ft ∀t > s ≥ 0. Therefore, for each t ≥ 0 we can choose sn ↑ t to see that {τ < t} = ⋃_n {τ ≤ sn} ∈ Ft ∀t ≥ 0. Conversely, if {τ < t} ∈ Ft ∀t ≥ 0, then for each t > s ≥ 0 we can choose t > un ↓ s to see that {τ ≤ s} = ⋂_n {τ < un} ∈ Ft, hence {τ ≤ s} ∈ ⋂_{t>s} Ft =: Fs+ ∀s ≥ 0.

Ft+-stopping times are also called optional times.

Lemma 5.13 (Stopped process). Let (Ft)t≥0 be a filtration, let τ be anFt+-stopping time, and let X be an Ft-progressive stochastic process. Then(Xt∧τ )t≥0 is Ft-progressive. If τ < ∞ then Xτ is measurable.

Proof. The fact that X is progressive means that for each t ≥ 0, the map (s, ω) ↦ Xs(ω) from [0, t] × Ω to E is B[0, t] × Ft-measurable. We need to show that (s, ω) ↦ X_{s∧τ(ω)}(ω) is B[0, t] × Ft-measurable. It suffices to show that (s, ω) ↦ s ∧ τ(ω) is measurable with respect to B[0, t] × Ft and B[0, t]. Then (s, ω) ↦ (s ∧ τ(ω), ω) ↦ X_{s∧τ(ω)}(ω) from [0, t] × Ω → [0, t] × Ω → E is measurable with respect to B[0, t] × Ft, B[0, t] × Ft, and B(E). Now, for any 0 < u ≤ t one has {(s, ω) : s ∧ τ(ω) < u} = {(s, ω) : s < u} ∪ {(s, ω) : τ(ω) < u} = ([0, u) × Ω) ∪ ([0, t] × {ω : τ(ω) < u}) ∈ B[0, t] × Ft, which proves that (s, ω) ↦ s ∧ τ(ω) is measurable with respect to B[0, t] × Ft and B[0, t].

If τ < ∞ (i.e., τ(ω) < ∞ for all ω ∈ Ω), then it follows that Xτ =limn→∞Xn∧τ is measurable.

Lemma 5.14 (Operations with stopping times). Let (Ft)t≥0 be a filtration.


(1) If τ, σ are Ft-stopping times, then τ ∧ σ is an Ft-stopping time.
(2) If τn are Ft-stopping times, then sup_n τn is an Ft-stopping time.
(3) If τn are Ft+-stopping times such that τn ↑ τ and τn < τ ∀n, then τ is an Ft-stopping time.

Proof. To prove (1), note that {τ ∧ σ ≤ t} = {τ ≤ t} ∪ {σ ≤ t} ∈ Ft ∀t ≥ 0. To prove (2), note that {sup_n τn ≤ t} = ⋂_n {τn ≤ t} ∈ Ft ∀t ≥ 0. To prove (3), finally, note that in this case {τ ≤ t} = ⋂_n {τn < t} ∈ Ft ∀t ≥ 0.

A typical example of a stopping time is a first entrance time. Let E be a Polish space and let X be an E-valued stochastic process. For any ∆ ⊂ E, the first entrance time of X into ∆ is defined as

(5.9) τ∆ := inf{t ≥ 0 : Xt ∈ ∆},

where τ∆(ω) := ∞ if {t ≥ 0 : Xt(ω) ∈ ∆} = ∅. Note that X_{τ∆} ∈ ∆ if τ∆ < ∞, X has right continuous sample paths, and ∆ is closed.

Proposition 5.15 (First entrance times). Let X have cadlag sample paths.If ∆ is closed, then τ∆ is an FX

t -stopping time.

Proof. For each t ≥ 0, define a map St : DE[0,∞) → DE [0,∞) by

(St(w))s := ws∧t (s ≥ 0).

Then St(X) is the process X stopped at time t. We claim that St(X) : Ω → DE[0,∞) is F^X_t-measurable. This follows from the facts that the Borel σ-field on DE[0,∞) is generated by the coordinate projections (πs)_{s≥0} (Proposition 1.16) and that St(X)^{−1}(πs^{−1}(A)) = (πs ∘ St(X))^{−1}(A) = X_{s∧t}^{−1}(A) ∈ F^X_t for each s ≥ 0 and A ∈ B(E). Since E\∆ is an open subset of E it is Polish, hence the space D_{E\∆}[0,∞) is a Polish subspace of DE[0,∞), and therefore, by Proposition 1.24 (b), a countable intersection of open subsets of DE[0,∞). In particular, D_{E\∆}[0,∞) is a measurable subset of DE[0,∞), and therefore {τ∆ ≤ t} = {St(X) ∉ D_{E\∆}[0,∞)} ∈ F^X_t for each t ≥ 0.

The next theorem shows what happens to a player who stops playing ata stopping time τ . For a proof, see for example [KS91, Theorem 1.3.22].

Theorem 5.16 (Optional sampling). Let (Ft)t≥0 be a filtration, let M be an Ft-submartingale with right continuous sample paths, and let τ be an Ft-stopping time such that τ ≤ T for some T < ∞. Then

(5.10) E[Mτ | F0] ≥ M0 a.s.


5.5. Applications. The next example gives an application of Theorem 5.11.

Example 5.17 (Convergence of the Wright-Fisher diffusion). Let X^x be a version of the Wright-Fisher diffusion started in X^x_0 = x ∈ [0, 1]. Then there exists a random variable X^x_∞ such that E[X^x_∞] = x and

lim_{t→∞} X^x_t = X^x_∞ a.s.

Proof. It follows from Example 5.2 and (5.2) that X is a nonnegative mar-tingale. Therefore, by Theorem 5.11, there exists a random variable X∞such that Xt → X∞ a.s. It follows from (5.1) and bounded convergencethat E[X∞] = x.

Example 5.17 leaves a number of questions open. It is not hard to see that the boundary points {0, 1} are traps for the Wright-Fisher diffusion, in the sense that Px{Xt = x ∀t ≥ 0} = 1 if x ∈ {0, 1}. Therefore, we ask: is it true that the random variable X∞ from Example 5.17 takes values in {0, 1}? Does the Wright-Fisher diffusion reach the traps in finite time? In order to answer these questions, we need one more piece of general theory.

Let (X, (Px)x∈E) be a Feller process on a compact metrizable space E with generator G. If h ∈ D(G) satisfies Gh = 0, then it follows from Lemma 5.1 and formula (5.2) that (h(Xt))t≥0 is an F^X_t-martingale. Even if a function f ∈ D(G) does not satisfy Gf = 0, we can still associate a martingale with f.

Proposition 5.18 (Martingale problem). Let X be a version of a Fellerprocess with generator (D(G), G), started in any initial law. Then, for everyf ∈ D(G), the process Mf given by

(5.11) M^f_t := f(Xt) − ∫_0^t Gf(Xs) ds (t ≥ 0)

is an FXt -martingale.

Proof. For 0 ≤ u ≤ t,

E[M^f_t | F^X_u] = E[f(Xt) | F^X_u] − ∫_0^t E[Gf(Xs) | F^X_u] ds
= P_{t−u} f(Xu) − ∫_0^u Gf(Xs) ds − ∫_u^t P_{s−u} Gf(Xu) ds
= f(Xu) − ∫_0^u Gf(Xs) ds = M^f_u,

where we have used that ∫_0^t Ps Gf ds = ∫_0^t ∂/∂s Ps f ds = Pt f − f by Proposition 3.15.

The next two examples give applications of Proposition 5.18.


Example 5.19 (Wright-Fisher diffusion converges to traps). Let X^x be a version of the Wright-Fisher diffusion started in X^x_0 = x ∈ [0, 1]. Then the random variable X^x_∞ from Example 5.17 is {0, 1}-valued.

Proof. Denote the Wright-Fisher diffusion by (X, (Px)_{x∈[0,1]}). The function f(x) := x² satisfies f ∈ D(AWF) and AWF f(x) = x(1 − x). Therefore, by Proposition 5.18,

Ex[X_t²] = x² + ∫_0^t Ex[Xs(1 − Xs)] ds (t ≥ 0).

Since Xt ∈ [0, 1] it follows, letting t → ∞, that

Ex[∫_0^∞ Xs(1 − Xs) ds] ≤ 1.

In particular, ∫_0^∞ Xs(1 − Xs) ds is finite a.s., which is possible only if X∞ ∈ {0, 1} a.s.

Example 5.20 (Wright-Fisher diffusion gets trapped in finite time). Let X^x be a version of the Wright-Fisher diffusion started in X^x_0 = x ∈ [0, 1]. Define (using Proposition 5.15) an F^X_t-stopping time τ by

τ := inf{t ≥ 0 : X^x_t ∈ {0, 1}}.

Then E[τ] < ∞.

Proof. Let (X, (Px)_{x∈[0,1]}) be the Wright-Fisher diffusion. The idea of the proof is to show that there exists a continuous function f : [0, 1] → [0,∞) such that f(0) = f(1) = 0 and the process

(5.12) Mt := f(Xt) + ∫_0^t 1_{(0,1)}(Xs) ds (t ≥ 0)

is an F^X_t-martingale. Let us first explain why we are interested in such a function. If the process in (5.12) is a martingale, then by optional sampling (Theorem 5.16), Ex[M_{τ∧t}] = Ex[M_0], hence

Ex[τ ∧ t] = Ex[∫_0^{τ∧t} 1_{(0,1)}(Xs) ds] = f(x) − Ex[f(X_{τ∧t})] (t ≥ 0).

Letting t ↑ ∞ we see that Ex[τ] ≤ f(x), so τ < ∞ a.s. Since f is zero on {0, 1}, it follows that Ex[f(X_{τ∧t})] → 0 as t ↑ ∞, so we find that

(5.13) Ex[τ] = f(x) (x ∈ [0, 1]).

To get a function f such that the process in (5.12) is a martingale, we choose 0 < εn < ½ with εn ↓ 0, we define

hn(x) := −2/(x(1 − x))    (x ∈ (εn, 1 − εn)),
hn(x) := −2/(εn(1 − εn))  (x ∈ [0, εn] ∪ [1 − εn, 1]),


and we put

fn(x) := ∫_0^x dy ∫_{1/2}^y dz hn(z) (x ∈ [0, 1]).

Then the functions fn : [0, 1] → R are continuous, symmetric in the sense that fn(x) = fn(1 − x), and satisfy fn(0) = fn(1) = 0. Moreover, we have fn ↑ f, where

f(x) := ∫_0^x dy ∫_{1/2}^y dz (−2/(z(1 − z))) (x ∈ [0, 1]).

To see that this is finite, note that for y ≤ ½,

∫_{1/2}^y dz (−2/(z(1 − z))) = ∫_y^{1/2} dz (2/(z(1 − z))) ≤ 4 ∫_y^{1/2} dz/z = 4(log(½) − log(y)),

which is integrable at zero.

which is integrable at zero. The functions fn satisfy

AWFfn(x) =12x(1− x)hn(x) ↓n→∞ −1(0,1)(x) (x ∈ [0, 1]).

The fact that the process M in (5.12) is a martingale now follows from Proposition 5.18 and Lemma 5.21 below.
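The double integral defining f can be evaluated in closed form: f(x) = −2(x log x + (1 − x) log(1 − x)), which by (5.13) is the expected absorption time E^x[τ]. The small sketch below (not part of the notes) checks symbolically that this function satisfies AWF f = −1 on (0, 1) and vanishes at the boundary, which is exactly the property used in (5.12)–(5.13).

    import sympy as sp

    x = sp.symbols("x", positive=True)
    f = -2 * (x * sp.log(x) + (1 - x) * sp.log(1 - x))

    # AWF f = (1/2) x (1-x) f'' should equal -1 on (0,1) ...
    print(sp.simplify(sp.Rational(1, 2) * x * (1 - x) * sp.diff(f, x, 2)))   # -1
    # ... and f vanishes at the boundary:
    print(sp.limit(f, x, 0), sp.limit(f, x, 1))                              # 0 0

In particular, the expected absorption time is largest for x = ½, where it equals 2 log 2.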

We say that a sequence of bounded real functions fn, defined on a measurable space, converges to a bounded pointwise limit f, if fn → f pointwise while sup_n ‖fn‖ < ∞. We denote this as

f = bp-lim_{n→∞} fn.

Recall that the integral is continuous with respect to bounded pointwise convergence. So, if Xn are real-valued random variables and X = bp-lim_{n→∞} Xn, then E[Xn] → E[X].

Lemma 5.21 (Bounded pointwise limits). Let X be a Feller process on a compact metrizable space E and let G be its generator. Let fn ∈ D(G), f ∈ C(E), and g ∈ B(E) be functions such that

f = bp-lim_{n→∞} fn and g = bp-lim_{n→∞} Gfn.

Then the process M given by

Mt := f(Xt) − ∫_0^t g(Xs) ds (t ≥ 0)

is an FXt -martingale.

Proof. We know that the processes

M^(n)_t := fn(Xt) − ∫_0^t Gfn(Xs) ds (t ≥ 0)

are F^X_t-martingales. In particular, E[M^(n)_t | F^X_s] = M^(n)_s a.s. for all 0 ≤ s ≤ t. By the definition of the conditional expectation, this is equivalent to the fact that


(1) M^(n)_s is F^X_s-measurable,
(2) E[M^(n)_t 1A] = E[M^(n)_s 1A] ∀A ∈ F^X_s,

for all 0 ≤ s ≤ t. For each fixed t ≥ 0, we observe that

Mt = bp-lim_{n→∞} M^(n)_t.

It follows that

(1) Ms is F^X_s-measurable,
(2) E[Mt 1A] = E[Ms 1A] ∀A ∈ F^X_s,

which proves that E[Mt | F^X_s] = Ms a.s. for all 0 ≤ s ≤ t.

5.6. Non-explosion. Using martingales and stopping times, we can complete the proof of Proposition 4.10 started in Section 4.4. Let Ē be the one-point compactification of a locally compact separable metrizable space E and let (X, (P̄x)_{x∈Ē}) be a Feller process in Ē with generator (D(G), G). Recall that (P̄x)_{x∈Ē} is called non-explosive if

P̄x{Xt, Xt− ≠ ∞ ∀t ≥ 0} = 1 ∀x ≠ ∞.

Proof of Proposition 4.10 (continued). The fact that condition (2) from Proposition 4.10 implies non-implosion has already been proved in Section 4.4. Assume that condition (1) from Proposition 4.10 holds. For each R > 0, put

(5.14) OR := {x ∈ E : f(x) < R}

and define stopping times τR by

(5.15) τR := inf{t ≥ 0 : Xt ∈ Ē\OR} (R > 0).

Fix x ∈ E. By Proposition 5.18 and optional stopping, for each t > 0,

(5.16) Px{τR ≤ t} · inf_{y∈Ē\OR} fn(y) ≤ Ex[fn(X_{t∧τR})] = fn(x) + Ex[∫_0^{t∧τR} Gfn(Xs) ds]
       ≤ f(x) + t sup_{y∈OR} Gfn(y),

where in the last inequality we have used that fn ≤ f and that f(Xs) < R for all s < τR. Since ŌR is a compact subset of E and Gfn converges uniformly on compacta to g,

(5.17) limsup_{n→∞} sup_{x∈OR} Gfn(x) ≤ sup_{x∈E} g(x).

We claim that moreover

(5.18) inf_{x∈Ē\OR} fn(x) −→_{n→∞} R.

Indeed, by our assumptions, the sets {x ∈ Ē\OR : fn(x) ≤ R − ε} are compact subsets of Ē, decreasing to the empty set. Therefore, for each ε > 0 there exists an n with {x ∈ Ē\OR : fn(x) ≤ R − ε} = ∅.


Inserting (5.17) and (5.18) into (5.16), we find that

(5.19) Px{τR ≤ t} ≤ R^{−1}(f(x) + t sup_{x∈E} g(x)).

Letting R ↑ ∞ shows that

(5.20) P̄x{Xs, Xs− ≠ ∞ ∀s ≤ t} = 1

for each fixed t > 0. Letting t ↑ ∞ shows that (P̄x)_{x∈Ē} is non-explosive.


6. Convergence of Markov processes

6.1. Convergence in path space. In this section, we discuss the conver-gence of a sequence of Feller processes to a limiting Feller process. Themartingale problem from Proposition 5.18 will play an important role in theproofs. As an application of our main result, we will complete the proof ofTheorem 4.2.

Let Gn be multivalued linear operators on a Banach space V, i.e., the Gn are linear subspaces of V × V. We define the extended limit ex lim_{n→∞} Gn as

(6.1) ex lim_{n→∞} Gn := {(f, g) : ∃(fn, gn) ∈ Gn s.t. (fn, gn) −→_{n→∞} (f, g)}.

If the Gn are single-valued, and therefore the graphs of some linear operators (D(An), An), and moreover ex lim_{n→∞} Gn is single-valued and the graph of (D(A), A), then we also write ex lim_{n→∞} An = A.

Exercise 6.1. Show that ex lim_{n→∞} Gn is always a closed linear operator. Show that

(i) ex lim_{n→∞} (λ1 + λ2 Gn) = λ1 + λ2 ex lim_{n→∞} Gn for all λ1, λ2 ∈ R, λ2 ≠ 0.
(ii) ex lim_{n→∞} G_n^{−1} = (ex lim_{n→∞} Gn)^{−1}.
(iii) If Gn is dissipative for each n, then ex lim_{n→∞} Gn is dissipative.

Exercise 6.2. Let An, A be bounded linear operators. Show that Af =limn→∞Anf for all f ∈ V implies A = exlimn→∞An. Hint: Lemma 3.43.

The main result of this section is:

Theorem 6.3 (Convergence of Feller processes). Let E be a compact metrizable space and let (P^{(n),x})_{x∈E} and (Px)_{x∈E} be Feller processes in E with Feller semigroups (P^{(n)}_t)_{t≥0} and (Pt)_{t≥0} and generators Gn and G, respectively. Then the following statements are equivalent:

(a) ex lim_{n→∞} Gn ⊃ G.
(b) ex lim_{n→∞} Gn = G.
(c) P^{(n)}_t f −→_{n→∞} Pt f for all f ∈ C(E) and t ≥ 0.
(d) P^{(n),µn}{(X_{t1}, . . . , X_{tm}) ∈ · } =⇒_{n→∞} Pµ{(X_{t1}, . . . , X_{tm}) ∈ · } whenever µn =⇒_{n→∞} µ.
(e) P^{(n),µn} =⇒_{n→∞} Pµ whenever µn =⇒_{n→∞} µ.

Condition (a) means that ex lim_{n→∞} Gn, considered as a multivalued operator, contains G. Thus, (a) says that for all f ∈ D(G) there exist fn ∈ D(Gn) such that fn → f and Gn fn → Gf. We can reformulate conditions (d) and (e) as follows. Let X^(n) and X be random variables with laws P^{(n),µn} and Pµ, respectively, i.e., X^(n) is a version of the Markov process with semigroup (P^{(n)}_t)_{t≥0}, started in the initial law L(X^(n)_0) = µn, and X is a version of the Markov process with semigroup (Pt)_{t≥0}, started in the initial law L(X0) = µ.


Then condition (d) says that µn ⇒ µ implies that X^(n) converges to X in finite dimensional distributions, and (e) says that µn ⇒ µ implies that L(X^(n)) ⇒ L(X), where L(X^(n)) and L(X) are probability measures on the 'path space' DE[0,∞). In this case we say that X^(n) converges to X in the sense of weak convergence in path space.

Under weak additional assumptions, weak convergence in path space im-plies convergence in finite dimensional distributions.

Lemma 6.4 (Convergence of finite dimensional distributions). Let Y^(n) and Y be DE[0,∞)-valued random variables. Assume that P{Yt− = Yt} = 1 for all t ≥ 0. Then L(Y^(n)) ⇒ L(Y) implies that L(Y^(n)_{t1}, . . . , Y^(n)_{tk}) ⇒ L(Y_{t1}, . . . , Y_{tk}) for all 0 ≤ t1 ≤ · · · ≤ tk.

Proof. See [EK86, Theorem 3.7.8 (a)].

Exercise 6.5. Assume that Y (n) and Y are stochastic processes with samplepaths in CE[0,∞). Show that weak convergence in path space (of the Y (n) toY ) implies convergence in finite dimensional distributions.

Weak convergence in path space is usually a more powerful statement than convergence in finite dimensional distributions (and more difficult to prove). The next example shows that weak convergence in path space is not implied by convergence of finite dimensional distributions.

(Counter-)Example. For n ≥ 1, let X^n be the {0, 1}-valued Markov process with infinitesimal matrix (generator)

(6.2) A^(n) := ( −1   1 )
               (  n  −n )

and initial law P{X^n_0 = 0} = 1.

Recall that the corresponding semigroup is then given by

(6.3) T^(n)_t f = e^{A^(n) t} f
      = (Id − (1/(n+1)) ∑_{k≥1} ((−(n+1)t)^k / k!) A^(n)) f
      = (Id − (1/(n+1)) (e^{−(n+1)t} − 1) A^(n)) f,

where we have used that (A(n))k = (−(n+ 1))k−1A(n).Put f(0) := 1 and f(1) := 0 then

(6.4) P0,(n)Xt = 0 = T(n)t f(0) = 1− 1

(n+ 1)(1− e−(n+1)t)−→

n→∞

1.

One can iterate the argument to show that finite dimensional distributionsof X under P0,(n) converge to those of a process that is identical 1. Onethe other hand, P0,(n) is supported on the set of paths which E0,(n)[τ ] = 1,

Page 83: MARKOV PROCESSES: THEORY AND EXAMPLES - uni …hm0110/Markovprocesses/sw20.pdf · MARKOV PROCESSES: THEORY AND EXAMPLES JAN SWART AND ANITA WINTER ... probability space. If X and

MARKOV PROCESSES 83

where τ := inft ≥ 0 : Xt ∈ 1 < ∞. Hence, the sequence L(Xn) doesnot converge in the sense of weak convergence on D0,1[0,∞).
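As a purely numerical aside (not part of these notes), the two-state example above is easy to check on a computer: the Python sketch below computes e^{tA^{(n)}} with a matrix exponential and compares P^{0,(n)}{X_t = 0} with the closed form (6.4); it also samples the Exp(1) holding time at 0, whose mean is the value E^{0,(n)}[τ] = 1 for every n.

import numpy as np
from scipy.linalg import expm

def A(n):
    # infinitesimal generator (6.2) of the {0,1}-valued chain
    return np.array([[-1.0, 1.0], [float(n), -float(n)]])

t = 0.7
for n in [1, 10, 100, 1000]:
    Pt = expm(t * A(n))                                    # transition matrix e^{t A^{(n)}}
    p0 = Pt[0, 0]                                          # P^{0,(n)}{X_t = 0}
    closed = 1.0 - (1.0 - np.exp(-(n + 1) * t)) / (n + 1)  # formula (6.4)
    print(n, p0, closed)                                   # the two values agree and tend to 1

# The holding time at 0 is Exp(1) for every n, so E^{0,(n)}[tau] = 1:
rng = np.random.default_rng(0)
print(rng.exponential(1.0, size=100_000).mean())           # close to 1.0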

Exercise 6.6. Let Y be a Poisson process with parameter λ, and define

(6.5)    X^n_t := (1/n) (Y_{n²t} − λ n² t).

Apply Theorem 6.3 to show that X^n converges in distribution and identify its limit.
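As a hedged numerical illustration of the scaling in (6.5) (not a solution of the exercise; the values of λ and t below are arbitrary), the following sketch simulates X^n_t at a fixed time and prints its empirical mean and variance, which one expects to stabilize as n grows.

import numpy as np

rng = np.random.default_rng(1)
lam, t = 2.0, 1.0                        # arbitrary illustration values

def sample_Xn(n, size):
    # Y_{n^2 t} ~ Poisson(lam * n^2 * t), so X^n_t = (Y_{n^2 t} - lam * n^2 * t) / n
    Y = rng.poisson(lam * n**2 * t, size=size)
    return (Y - lam * n**2 * t) / n

for n in [1, 5, 20, 100]:
    x = sample_Xn(n, 200_000)
    print(n, x.mean(), x.var())          # mean stays near 0; the variance settles down as n grows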

The fact that conditions (a), (b) and (c) from Theorem 6.3 are equivalent follows from abstract semigroup theory. We will only prove the easy implication (c)⇒(b). For a full proof, see [EK86, Theorem 1.6.1].

Proposition 6.7. (Convergence of semigroups) Assume that (S^{(n)}_t)_{t≥0} and (S_t)_{t≥0} are strongly continuous contraction semigroups on a Banach space V, with generators G_n and G, respectively. Then the following statements are equivalent:

(a) ex lim_{n→∞} G_n ⊃ G.

(b) ex lim_{n→∞} G_n = G.

(c) S^{(n)}_t f → S_t f as n → ∞ for all f ∈ V and t ≥ 0.

Proof. (c)⇒(b): Fix λ > 0. By Lemma 3.21, (λ − G)^{−1} is a bounded linear operator which is given by

(λ − G)^{−1} f = ∫_0^∞ S_t f e^{−λt} dt   (f ∈ V).

A similar formula holds for (λ − G_n)^{−1}. Since S_t and S^{(n)}_t are contractions, ‖S_t f − S^{(n)}_t f‖ ≤ 2‖f‖, so using bounded convergence,

‖(λ − G)^{−1} f − (λ − G_n)^{−1} f‖ ≤ ∫_0^∞ ‖S_t f − S^{(n)}_t f‖ e^{−λt} dt → 0   as n → ∞.

By Exercise 6.2 this proves that

ex lim_{n→∞} (λ − G_n)^{−1} = (λ − G)^{−1}.

By Exercise 6.1, it follows that ex lim_{n→∞} G_n = G.

Since (b)⇒(a) is trivial, to complete the proof it suffices to prove that (a)⇒(c). This implication is more difficult. One proves that the Yosida approximations G_ε and G_{n,ε} of G and G_n satisfy G_{n,ε} f → G_ε f for each f ∈ V and ε > 0, uses this to derive estimates that are uniform in ε, and then lets ε → 0.
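The Laplace-transform formula for the resolvent used in this proof reduces to plain linear algebra when the state space is finite. The following sketch (a numerical illustration with an ad hoc generator matrix, not taken from the notes) compares (λ − G)^{−1}f, computed by solving a linear system, with a quadrature approximation of ∫_0^∞ S_t f e^{−λt} dt.

import numpy as np
from scipy.linalg import expm

# An ad hoc generator matrix on a three-point state space (rows sum to zero).
G = np.array([[-2.0, 1.0, 1.0],
              [0.5, -1.0, 0.5],
              [1.0, 2.0, -3.0]])
f = np.array([1.0, -1.0, 2.0])
lam = 0.8

resolvent = np.linalg.solve(lam * np.eye(3) - G, f)   # (lambda - G)^{-1} f

# Crude numerical Laplace transform of t -> S_t f = e^{tG} f, truncated at T = 40.
ts = np.linspace(0.0, 40.0, 4001)
dt = ts[1] - ts[0]
vals = np.array([expm(t * G) @ f * np.exp(-lam * t) for t in ts])
integral = vals.sum(axis=0) * dt

print(resolvent)
print(integral)   # agrees with the resolvent up to quadrature and truncation error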


The main technical tool in the proof of Theorem 6.3 is a tightness criterion for sequences of probability laws on D_E[0,∞), which we will not prove. Recall the concept of tightness from Proposition 3.2. To stress the importance of tightness, we note the following fact.

Lemma 6.8. (Application of tightness) Let Y^{(n)} be a sequence of processes with sample paths in D_E[0,∞). Assume that the finite dimensional distributions of Y^{(n)} converge and that the laws L(Y^{(n)}) are tight. Then there exists a process Y with sample paths in D_E[0,∞) such that L(Y^{(n)}) ⇒ L(Y).

Proof. The weak limits lim_{n→∞} L(Y^{(n)}_{t_1}, . . . , Y^{(n)}_{t_k}) form a consistent family in the sense of Kolmogorov's extension theorem, so by the latter there exists an E-valued process Y′ such that the Y^{(n)} converge to Y′ in finite dimensional distributions. Since the laws L(Y^{(n)}) are tight, we can select a convergent subsequence L(Y^{(n_m)}) ⇒ L(Y). If we can show that all convergent subsequences have the same limit L(Y), then by the exercise below, the laws L(Y^{(n)}) converge to L(Y).

For any function f ∈ C(E) and 0 ≤ t < u, the map w ↦ ∫_t^u f(w(s)) ds from D_E[0,∞) to R is bounded and continuous. (Note that the coordinate projections are not continuous!) Therefore, L(Y^{(n_m)}) ⇒ L(Y) implies that E[∫_t^u f(Y^{(n_m)}_s) ds] → E[∫_t^u f(Y_s) ds] for all 0 ≤ t < u. Moreover, E[∫_t^u f(Y^{(n_m)}_s) ds] = ∫_t^u E[f(Y^{(n_m)}_s)] ds → ∫_t^u E[f(Y′_s)] ds by bounded convergence, so by the right-continuity of sample paths

E[f(Y_t)] = lim_{ε→0} E[ ε^{−1} ∫_0^ε f(Y_{t+s}) ds ] = lim_{ε→0} ε^{−1} ∫_0^ε E[f(Y′_{t+s})] ds.

A similar argument shows that

(6.6)    E[f_1(Y_{t_1}) ··· f_k(Y_{t_k})] = lim_{ε→0} ε^{−1} ∫_0^ε E[f_1(Y′_{t_1+s}) ··· f_k(Y′_{t_k+s})] ds

for any f_1, . . . , f_k ∈ C(E) and 0 ≤ t_1 ≤ ··· ≤ t_k. This clearly determines the finite dimensional distributions of Y, and therefore L(Y), uniquely. (Warning: the finite dimensional distributions of Y and Y′ need in general not be the same!)

Exercise 6.9. Let M be a metrizable space and let (x_n)_{n≥1} be a sequence in M. Assume that the closure of the set {x_n : n ≥ 1} is compact and that the sequence (x_n)_{n≥1} has only one cluster point x. Show that x_n → x.

The next theorem relates tightness of probability measures on D_E[0,∞) to martingales in the spirit of Proposition 5.18. Below, for any measurable function h : [0,∞) → R, T > 0, and p ∈ [1,∞] we define

(6.7)    ‖h‖_{p,T} := ( ∫_0^T |h(t)|^p dt )^{1/p}   if p < ∞,
         ‖h‖_{∞,T} := ess sup_{t∈[0,T]} |h(t)|      if p = ∞.


Here the essential supremum is defined as

ess sup_{t∈[0,T]} |h(t)| := inf{ H ≥ 0 : |h(t)| ≤ H a.s. },

where 'a.s.' means almost surely with respect to Lebesgue measure. Thus, ‖h‖_{p,T} is just the L^p-norm of the function [0,T] ∋ t ↦ h(t) with respect to Lebesgue measure.

Theorem 6.10. (Tightness criterion) Let E be compact and metrizable and let {X^{(n)} : n ≥ 1} be a sequence of processes with sample paths in D_E[0,∞), defined on probability spaces (Ω^{(n)}, F^{(n)}, P^{(n)}) and adapted to filtrations (F^{(n)}_t)_{t≥0}. Let D ⊂ C(E) be dense and assume that for all f ∈ D and n ≥ 1 there exist (F^{(n)}_t)-adapted real processes F^{(n)} and G^{(n)} with cadlag sample paths, such that

M^{(n)}_t := F^{(n)}_t − ∫_0^t G^{(n)}_s ds

is an (F^{(n)}_t)-martingale, and such that for each T > 0,

(6.8)    sup_n E^{(n)}[ sup_{t∈[0,T]∩Q} |F^{(n)}_t − f(X^{(n)}_t)| ] < ∞

and

(6.9)    sup_n E^{(n)}[ ‖G^{(n)}‖_{p,T} ] < ∞   for some p ∈ (1,∞].

Then the laws {L(X^{(n)}) : n ≥ 1} are tight.

Proof. This is a much simplified version of Theorems 3.9.1 and 3.9.4 in [EK86].

Remark. For example, if X^{(n)} is a Feller process with generator G_n and f_n ∈ D(G_n), then by Proposition 5.18, M_t := f_n(X^{(n)}_t) − ∫_0^t G_n f_n(X^{(n)}_s) ds is an (F^{X^{(n)}}_t)-martingale. Thus, a typical application of Theorem 6.10 is to take F^{(n)}_t := f_n(X^{(n)}_t) and G^{(n)}_t := G_n f_n(X^{(n)}_t).
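As a numerical aside (the chain and the test function below are ad hoc choices, not from the notes), the martingale property of M_t = f(X_t) − ∫_0^t Gf(X_s) ds can be tested by Monte Carlo: the estimate of E[M_t] should be the same, up to sampling error, for every t.

import numpy as np

rng = np.random.default_rng(2)
G = np.array([[-1.0, 1.0], [2.0, -2.0]])   # generator of a {0,1}-valued jump process
f = np.array([0.3, 1.7])                   # an arbitrary function on {0,1}
Gf = G @ f

def M_t(t, x0=0):
    # Simulate one path up to time t and return f(X_t) - int_0^t Gf(X_s) ds.
    x, s, integral = x0, 0.0, 0.0
    while True:
        hold = rng.exponential(1.0 / -G[x, x])   # exponential holding time in state x
        if s + hold >= t:
            integral += (t - s) * Gf[x]
            return f[x] - integral
        integral += hold * Gf[x]
        s += hold
        x = 1 - x                                # with two states, always jump to the other one

for t in [0.0, 0.5, 1.0, 2.0]:
    est = np.mean([M_t(t) for _ in range(20_000)])
    print(t, est)                                # all estimates are close to f[0] = 0.3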

Counterexample. Taking p = 1 in (6.9) is not sufficient. To see this, for n ≥ 1 let X^{(n)} be the Markov process with generator (6.2) and initial law P^{(n)}{X^{(n)}_0 = 0} = 1. Take for D the space of all real functions f on {0,1} and for such a function put F^{(n)}_t := f(X^{(n)}_t) and G^{(n)}_t := g_n(X^{(n)}_t), where g_n := A^{(n)} f. Then by Proposition 5.18, F^{(n)}_t − ∫_0^t G^{(n)}_s ds is an (F^{X^{(n)}}_t)-martingale, (6.8) is satisfied, and by (6.4),

E[ |g_n(X^{(n)}_t)| ] = E[ |A^{(n)} f(X^{(n)}_t)| ]
                      = n |f(0) − f(1)| P{X^{(n)}_t = 1} + |f(0) − f(1)| P{X^{(n)}_t = 0}
                      = |f(0) − f(1)| ( 1 + ((n−1)/(n+1)) (1 − e^{−(n+1)t}) )
                      ≤ 2 |f(0) − f(1)|.


This shows that

sup_n E[ ‖g_n(X^{(n)})‖_{1,T} ] = sup_n E[ ∫_0^T |g_n(X^{(n)}_t)| dt ] ≤ 2T |f(0) − f(1)| < ∞,

so (6.9) is satisfied for p = 1. Since the X^{(n)} converge in finite dimensional distributions, if the laws {L(X^{(n)}) : n ≥ 1} were tight, then X^{(n)} would also converge weakly in path space. We have already seen that this is not the case.

Proof of Theorem 6.3. Conditions (a), (b) and (c) are equivalent by Proposition 6.7. Our next step is to show that (c) is equivalent to (d). Indeed, if (c) holds, then for any f_1, . . . , f_k ∈ C(E) and 0 = t_0 ≤ t_1 ≤ ··· ≤ t_k,

E^{(n),µ_n}[ f_1(X_{t_1}) ··· f_k(X_{t_k}) ] = µ_n P^{(n)}_{t_1−t_0} f_1 ··· P^{(n)}_{t_k−t_{k−1}} f_k
    → µ P_{t_1−t_0} f_1 ··· P_{t_k−t_{k−1}} f_k = E^µ[ f_1(X_{t_1}) ··· f_k(X_{t_k}) ]   as n → ∞,

where we have used Lemma 3.43. This implies (d). Conversely, if (d) holds, then for any f ∈ C(E), x_n → x, and t ≥ 0,

P^{(n)}_t f(x_n) = E^{(n),x_n}[ f(X_t) ] → E^x[ f(X_t) ] = P_t f(x)   as n → ∞,

which proves that P^{(n)}_t f converges uniformly to P_t f (compare the proof of Proposition 3.7).

To complete the proof, it suffices to show that (a) and (d) imply (e) and that (e) implies (b). (Warning: it is not immediately obvious that (e) implies (d), since weak convergence in path space does not in general imply convergence in finite dimensional distributions.)

(a) & (d)⇒(e): Let X(n) be random variables with laws P(n),µn . We

start by showing that the laws L(X(n)) are tight. This is a straightforwardapplication of Theorem 6.10. We choose D := D(G), which is dense inC(E). By (a), for each f ∈ D there exist fn ∈ D(Gn) such that fn → f

and Gnfn → Gf . Setting F(n)t := fn(X

(n)t ) and G

(n)t := Gnfn(X

(n)t ), using

Proposition 5.18, we see that (6.8) and (6.8) are satisfied, where in the latterwe can take p = ∞.

Since the laws L(X(n)) are tight, we can select a convergent subsequenceL(X(nm)) ⇒ L(X). We are done if we can show that L(X) = Pµ (and henceall weak cluster points are the same). In the same way as in the proof ofLemma 6.8 (see in particular (6.6)), we find that

E[f1(Xt1) · · · fk(Xtk)] = limε→0

ε−1

∫ ε

0ds µPt1−t0+sf1 · · ·Ptk−tk−1+sfk

µPt1−t0f1 · · ·Ptk−tk−1fk

for any f1, . . . , fk ∈ C(E) and 0 ≤ t1 ≤ · · · ≤ tk. This proves that X is aversion of the Markov process with semigroup (Pt)t≥0 started in the initiallaw µ.


(e)⇒(b): This is similar to the proof of the implication (c)⇒(b) in Proposition 6.7. Fix λ > 0. Then

(λ − G)^{−1} f(x) = E^x[ ∫_0^∞ f(X_t) e^{−λt} dt ]   (x ∈ E, f ∈ C(E)).

A similar formula holds for (λ − G_n)^{−1}. Since the map w ↦ ∫_0^∞ f(w(t)) e^{−λt} dt from D_E[0,∞) to R is bounded and continuous, P^{(n),x_n} ⇒ P^x implies that

(λ − G_n)^{−1} f(x_n) → (λ − G)^{−1} f(x)   (f ∈ C(E), x_n, x ∈ E, x_n → x).

This shows that ‖(λ − G_n)^{−1} f − (λ − G)^{−1} f‖ → 0. Just as in the proof of Proposition 6.7, this implies that ex lim_{n→∞} G_n = G.

6.2. Proof of the main result (Theorem 4.2). The proof of Theorem 6.3 has an important corollary.

Corollary 6.11. (Existence of limiting process) Let E be compact and metrizable and let (P^{(n)}_t)_{t≥0} and (P_t)_{t≥0} be Feller semigroups on C(E) with generators G_n and G, respectively. Assume that ex lim_{n→∞} G_n ⊃ G and that for each n there exists a Markov process (P^{(n),x})_{x∈E} with semigroup (P^{(n)}_t)_{t≥0}. Then there exists a Markov process (P^x)_{x∈E} with semigroup (P_t)_{t≥0}.

Proof. By Proposition 2.12, there exists for each x ∈ E an E-valued stochastic process X^x = (X^x_t)_{t≥0} such that X^x_0 = x and X^x satisfies the equivalent conditions (a)–(c) from Proposition 2.11. We need to show that X^x has a version with cadlag sample paths. Let X^{(n),x} be D_E[0,∞)-valued random variables with laws P^{(n),x}. Our proof of Theorem 6.3 shows that the laws L(X^{(n),x}) are tight and that each cluster point has the same finite dimensional distributions as X^x. It follows that the X^{(n),x} converge weakly in path space and that their limit is a version of X^x with cadlag sample paths.

We will use Corollary 6.11 to complete the proof of Theorem 4.2. All we need to do is to show that a general Feller semigroup can be approximated by 'easy' semigroups which we know correspond to a Markov process.

Proof of Theorem 4.2. Let E be compact and metrizable and let (P_t)_{t≥0} be a Feller semigroup on E with generator G. For each ε > 0, let G_ε denote the Yosida approximation to G, defined in (3.91). We claim that G_ε is the generator of a jump process in the sense of Proposition 4.4 (and hence there exists a Markov process associated with the semigroup generated by G_ε). Indeed, by Lemma 3.21,

(1 − εG)^{−1} f = ∫_0^∞ P_t f ε^{−1} e^{−t/ε} dt,


so if we define continuous probability kernels K_ε on E by

K_ε(x, A) := ∫_0^∞ P_t(x, A) ε^{−1} e^{−t/ε} dt   (x ∈ E, A ∈ B(E)),

then G_ε f = ε^{−1}(K_ε f − f), which shows that G_ε is the generator of a jump process. Choose ε_n → 0. Then formula (3.92) implies that ex lim_{n→∞} G_{ε_n} ⊃ G, which by Corollary 6.11 shows that there exists a Markov process (P^x)_{x∈E} with semigroup (P_t)_{t≥0}.
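For a finite state space, the kernel K_ε and the Yosida approximation G_ε are explicit matrices, so the approximation behind this proof can be inspected directly. The sketch below (a numerical illustration with an ad hoc generator matrix, not from the notes) computes K_ε = (1 − εG)^{−1}, checks that it is a probability kernel, and verifies that G_ε f = ε^{−1}(K_ε f − f) approaches Gf as ε ↓ 0.

import numpy as np

# An ad hoc generator matrix on a three-point state space (rows sum to zero).
G = np.array([[-2.0, 1.0, 1.0],
              [0.5, -1.0, 0.5],
              [1.0, 2.0, -3.0]])
f = np.array([1.0, -1.0, 2.0])
I = np.eye(3)

for eps in [1.0, 0.1, 0.01, 0.001]:
    K = np.linalg.inv(I - eps * G)              # K_eps = (1 - eps G)^{-1}
    G_eps_f = (K @ f - f) / eps                 # Yosida approximation applied to f
    print(eps, K.min(), K.sum(axis=1))          # entries >= 0 and row sums 1: a probability kernel
    print('     ', np.max(np.abs(G_eps_f - G @ f)))   # tends to 0 as eps tends to 0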


7. Strong Markov property

Let X := (X_t)_{t≥0}, defined on (Ω, F, P), be an E-valued Markov process with respect to a filtration (F_t)_{t≥0} such that X is (F_t)-progressive (recall Definition 5.3).

Recall that the Markov property says that given the "present", the future is independent of the past. In this section we want to replace the deterministic notion of "present" by a stopping time.

Recall the intuitive description of F_t as the information known to an observer at time t. For an (F_t)-stopping time τ, the σ-algebra F_τ defined below should have the same intuitive meaning.

Definition 7.1 (σ-algebra generated by a stopping time). For an (F_t)-stopping time τ, put

(7.1)    F_τ := { A ∈ F : A ∩ {τ ≤ t} ∈ F_t for all t ≥ 0 }.

Similarly, F_{τ+} is defined by replacing F_t by F_{t+} in (7.1).

Exercise 7.2. Fix t ≥ 0. Show that if P{τ = t} = 1, then F_t = F_τ up to P-null sets.

We immediately get the following useful properties.

Lemma 7.3. Let σ and τ be (F_t)-stopping times and let X be an (F_t)-progressive E-valued process. Then the following hold:

(i) F_τ is a σ-algebra.
(ii) τ ∧ σ is F_τ-measurable.
(iii) If σ ≤ τ then F_σ ⊆ F_τ.
(iv) X_τ is F_τ-measurable.

Proof. (i) Obviously, ∅, Ω ∈ F_τ. If A ∈ F_τ then

(7.2)    A^c ∩ {τ ≤ t} = {τ ≤ t} \ (A ∩ {τ ≤ t}) ∈ F_t

for all t ≥ 0, and therefore A^c ∈ F_τ. Similarly, if A_n ∈ F_τ for all n ∈ N, then

(7.3)    ( ⋃_{n∈N} A_n ) ∩ {τ ≤ t} = ⋃_{n∈N} ( A_n ∩ {τ ≤ t} ) ∈ F_t

for all t ≥ 0, and hence ⋃_{n∈N} A_n ∈ F_τ.

(ii) For each c ≥ 0 and t ≥ 0,

(7.4)    {σ ∧ τ ≤ c} ∩ {τ ≤ t} = {σ ∧ τ ≤ c ∧ t} ∩ {τ ≤ t} ∈ F_t.

Hence {σ ∧ τ ≤ c} ∈ F_τ and σ ∧ τ is F_τ-measurable.

(iii) Let A ∈ F_σ. Then for all t ≥ 0,

(7.5)    A ∩ {τ ≤ t} = A ∩ {σ ≤ t} ∩ {τ ≤ t} ∈ F_t.

Hence A ∈ F_τ.


(iv) Fix t ≥ 0 and apply (ii) with τ := t to the effect that σ ∧ t is F_t-measurable. Now X_{σ∧t} is the composition of the (Ω, F_t)–([0,t] × Ω, B([0,t]) × F_t) measurable mapping which sends ω to (σ(ω) ∧ t, ω) with the ([0,t] × Ω, B([0,t]) × F_t)–(E, B(E))-measurable mapping which sends (s, ω) to X_s(ω). Notice that for the measurability of the second mapping one uses that X is (F_t)-progressive. As a consequence, X_{σ∧t} is F_t-measurable. Therefore, for all t ≥ 0 and Γ ∈ B(E),

(7.6)    {X_σ ∈ Γ} ∩ {σ ≤ t} = {X_{σ∧t} ∈ Γ} ∩ {σ ≤ t} ∈ F_t.

Hence {X_σ ∈ Γ} ∈ F_σ for all Γ ∈ B(E), or equivalently, X_σ is F_σ-measurable.

We next define the strong Markov property of a Markov process.

Definition 7.4 (Strong Markov property). Let X := (X_t)_{t≥0}, defined on (Ω, F, P), be an E-valued Markov process with respect to a filtration (F_t)_{t≥0} such that X is (F_t)-progressive (recall Definition 5.3). Suppose that P_t(x, A) is a transition function for X, and let τ be an (F_t)-stopping time with τ < ∞ almost surely.

• X is said to be strong Markov at τ if

(7.7)    P{X_{τ+t} ∈ A | F_τ} = P_t(X_τ, A)

for all t ≥ 0 and A ∈ B(E).
• X is said to be a strong Markov process with respect to (F_t) if X is strong Markov at τ for all (F_t)-stopping times τ with τ < ∞ almost surely.

(Counter-)Example. A typical counterexample appears once we mix deterministic evolution with random evolution. Consider the R-valued process with the following dynamics:

• if x ≠ 0, then X grows (deterministically) with unit speed,
• while if X reaches x = 0, then it spends there an exponential time with unit parameter.

In formulae, its semigroup is given by

(7.8)    T_t f(x) :=
             f(x+t),                                              if x ≤ 0, x+t ≤ 0,
             e^{−(t+x)} f(0) + ∫_{−x}^{t} e^{−(u+x)} f(t−u) du,   if x ≤ 0, x+t > 0,
             f(x+t),                                              if x > 0.

It is easy to check that (7.8) indeed defines a Markovian semigroup. To see that the corresponding Markov process does not have the strong Markov property, put

(7.9)    σ := inf{t ≥ 0 : X_t > 0},

and start the process in x < 0 (and thereby ensure that σ < ∞ a.s.). Since {σ ≥ t} = ⋂_{s∈[0,t)∩Q} {X_s ≤ 0} ∈ F^X_t for all t ≥ 0, σ is an (F^X_{t+})-stopping time.


Moreover, since X has right continuous sample paths, X_σ = 0. Hence ∫ P_t(X_σ, dy) y = E^0[X_t] < t for all t > 0, while E[X_{σ+t} | F^X_{σ+}] = X_{σ+t} = t almost surely, since after time σ the process moves with unit speed. This contradicts (7.7) (with (F_t) replaced by (F^X_{t+})).
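The failure of the strong Markov property in this example can also be seen in a simulation. The sketch below (illustration only; the starting point x = −1 and the time t = 1 are arbitrary choices) simulates the process of (7.8): after σ the motion is deterministic, so X_{σ+t} = t, whereas a fresh copy started at 0, which is what (7.7) would predict at time σ, waits an Exp(1) time and has mean E^0[X_t] = t − 1 + e^{−t} < t.

import numpy as np

rng = np.random.default_rng(3)
t, n = 1.0, 100_000

# Process of (7.8) started at x = -1: it reaches 0 at time 1, waits an Exp(1) time, then
# moves to the right with unit speed, so sigma = 1 + waiting time and X_{sigma+t} = t.
wait = rng.exponential(1.0, size=n)
sigma = 1.0 + wait
X_sigma_plus_t = np.full(n, t)
print(sigma.mean(), X_sigma_plus_t.mean())   # E[sigma] ~ 2, and E[X_{sigma+t}] = t = 1 exactly

# A fresh copy started at 0 (what the strong Markov property would predict at sigma):
wait0 = rng.exponential(1.0, size=n)
X_t_from_0 = np.maximum(t - wait0, 0.0)
print(X_t_from_0.mean())                     # ~ t - 1 + e^{-t} = 0.368 < 1, so (7.7) fails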

The following result says that progressive Markov processes are strong Markov at discrete stopping times.

Proposition 7.5. Let X be E-valued, (F_t)-progressive, and (F_t)-Markov, and let P_t(x,A) be a transition function for X. Let τ be a discrete (F_t)-stopping time with τ < ∞ almost surely. Then X is strong Markov at τ.

Proof. Let τ be a discrete (F_t)-stopping time with τ < ∞ a.s. We need to show that for all B ∈ F_τ, f ∈ B(E), and t ≥ 0,

(7.10)    E[ f(X_{t+τ}); B ] = E[ ∫ P_t(X_τ, dy) f(y); B ].

By assumption, there are t_1, t_2, . . . such that τ ∈ {t_1, t_2, . . .}. Furthermore, if B ∈ F_τ then B ∩ {τ = t_k} ∈ F_{t_k} for all k ∈ N, and hence for all f ∈ B(E) and t ≥ 0,

(7.11)    E[ f(X_{t+τ}); B ∩ {τ = t_k} ] = E[ f(X_{t+t_k}); B ∩ {τ = t_k} ]
                                         = E[ ∫ P_t(X_{t_k}, dy) f(y); B ∩ {τ = t_k} ]
                                         = E[ ∫ P_t(X_τ, dy) f(y); B ∩ {τ = t_k} ].

Summing over all k yields (7.10).

The next result states that each stopping time is the limit of a decreasing sequence of discrete stopping times.

Lemma 7.6. Let (F_t)_{t≥0} be a filtration and let τ be an (F_{t+})-stopping time. Then there exists a decreasing sequence (τ_n)_{n∈N} of discrete (F_t)-stopping times such that τ = lim_{n→∞} τ_n.

Proof. Choose for each n ∈ N times 0 = t^n_0 < t^n_1 < ··· such that lim_{k→∞} t^n_k = ∞ and lim_{n→∞} sup_{k∈N} (t^n_{k+1} − t^n_k) = 0. Then put

(7.12)    τ_n := t^n_{k+1}   if t^n_k ≤ τ < t^n_{k+1},   and   τ_n := ∞   if τ = ∞.

Obviously, lim_{n→∞} τ_n = τ, while (τ_n)_{n∈N} is decreasing if each partition (t^{n+1}_k)_{k∈N} is finer than (t^n_k)_{k∈N}, which we may assume.
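Concretely, one may take the dyadic grids t^n_k := k 2^{−n}, for which (7.12) becomes τ_n = (⌊2^n τ⌋ + 1) 2^{−n} (and τ_n = ∞ when τ = ∞); these grids refine each other, so the τ_n decrease to τ. A small sketch of this rounding (illustration only):

import numpy as np

def discretize(tau, n):
    # Formula (7.12) on the dyadic grid {k 2^{-n}}: round tau up to the next strictly
    # larger grid point; infinity is mapped to infinity.
    tau = np.asarray(tau, dtype=float)
    return (np.floor(tau * 2**n) + 1.0) / 2**n

tau = np.array([0.3, 1.0, np.pi, np.inf])
for n in [0, 1, 3, 6, 10]:
    print(n, discretize(tau, n))   # decreases to tau as n grows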

We will exploit the latter to show that Feller semigroups define strong Markov processes.


Theorem 7.7 (Feller semigroups give strong Markov processes). Let E be locally compact and separable, and let (P_t)_{t≥0} be a Feller semigroup on C_b(E). Then for each probability law ν on E there exists a Markov process X corresponding to (P_t)_{t≥0} with initial law ν and sample paths in D_E[0,∞) which is strong Markov with respect to the filtration F_t := F^X_{t+}.

Proof. We already know from Theorem 4.2 (combined with the considerations for locally compact state spaces discussed in Subsection 4.4) that under the above assumptions there is a Markov process X with cadlag sample paths corresponding to (P_t)_{t≥0} with initial law ν. It remains to verify the strong Markov property.

Assume for the moment that τ is a discrete (F_t)-stopping time with τ < ∞, i.e., τ can be written as

(7.13)    τ := Σ_{n≥1} t_n 1_{{τ = t_n}}

for suitable (t_n)_{n∈N} in [0,∞). Let A ∈ F_τ, s > 0, and f ∈ C(E). Then {τ = t_n} ∈ F_{t_n+ε} for all ε > 0 and n ∈ N, so

(7.14)    ∫_{A∩{τ=t_n}} f(X_{τ+s}) dP = ∫_{A∩{τ=t_n}} f(X_{t_n+s}) dP = ∫_{A∩{τ=t_n}} P_{s−ε} f(X_{t_n+ε}) dP

for all ε ∈ (0, s]. Since (P_t)_{t≥0} is strongly continuous, P_s f is continuous on E for all s ≥ 0. Moreover, since X has right continuous sample paths, we can let ε ↓ 0 in (7.14) to the effect that it holds for ε = 0 as well. This gives

(7.15)    E[ f(X_{τ+s}) | F_τ ] = P_s f(X_τ)

for discrete τ.

If τ is an arbitrary (F_t)-stopping time with τ < ∞ a.s., we know from Lemma 7.6 that τ can be written as the decreasing limit of discrete stopping times (τ_n)_{n∈N}. It then follows from the continuity of P_s f and the right continuity of the sample paths that (7.15) holds for τ as well.


References

[Bil86] P. Billingsley. Probability and Measure. John Wiley & Sons, New York, 1986.
[Bou58] N. Bourbaki. Éléments de Mathématique, 2nd ed., Book 3, Chap. 9. Hermann & Cie, Paris, 1958.
[Bou64] N. Bourbaki. Éléments de Mathématique, 2nd ed., Book 3, Fascicule de Résultats. Hermann & Cie, Paris, 1964.
[Cho69] G. Choquet. Lectures on Analysis, Vol. 1. W.A. Benjamin, New York, 1969.
[Chu74] K.L. Chung. A Course in Probability Theory, 2nd ed. Academic Press, Orlando, 1974.
[Dan19] P.J. Daniell. Integrals in an infinite number of dimensions. Annals of Mathematics, 20:281–288, 1919.
[EK86] S.N. Ethier and T.G. Kurtz. Markov Processes: Characterization and Convergence. John Wiley & Sons, 1986.
[Fri64] A. Friedman. Partial Differential Equations of Parabolic Type. Prentice-Hall, Englewood Cliffs, 1964.
[Kel55] J.L. Kelley. General Topology. Van Nostrand, New York, 1955.
[Kol33] A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung, volume 2(3) of Ergeb. Math. Springer, Berlin, 1933.
[Kol56] A.N. Kolmogorov. On Skorohod convergence. Theory Probab. Appl., 1:213–222, 1956.
[KS88] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Springer-Verlag, 1988.
[KS91] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus, 2nd ed. Springer, New York, 1991.
[RS80] M. Reed and B. Simon. Functional Analysis, Vol. I. Academic Press, 1980.
[Sch73] L. Schwartz. Radon Measures on Arbitrary Topological Spaces and Cylindrical Measures. Tata Institute, Oxford University Press, London, 1973.

Jan Swart, Mathematisches Institut, Universität Erlangen–Nürnberg, Bismarckstraße 1 1/2, 91054 Erlangen, GERMANY

E-mail address: [email protected]

Anita Winter, Mathematisches Institut, Universität Erlangen–Nürnberg, Bismarckstraße 1 1/2, 91054 Erlangen, GERMANY

E-mail address: [email protected]