
Stochastic Mechanics

Random Media

Signal Processing and Image Synthesis

Mathematical Economics and Finance

Stochastic Optimization

Stochastic Control

Stochastic Models in Life Sciences

Stochastic Modelling and Applied Probability
(Formerly: Applications of Mathematics)

60

Edited by B. Rozovskiĭ
G. Grimmett

Advisory Board: D. Dawson, D. Geman, I. Karatzas, F. Kelly, Y. Le Jan, B. Øksendal, G. Papanicolaou, E. Pardoux

For other titles published in this series, go to www.springer.com/series/602


Alan Bain · Dan Crisan

Fundamentals of Stochastic Filtering


Alan Bain
BNP Paribas
10 Harewood Av
London NW1 6AA
United Kingdom
[email protected]

Dan Crisan
Department of Mathematics
Imperial College London
180 Queen's Gate
London SW7 2AZ
United Kingdom
[email protected]

Managing Editors:
B. Rozovskiĭ
Division of Applied Mathematics
182 George St.
Providence, RI 02912
[email protected]

G. Grimmett
Centre for Mathematical Sciences
Wilberforce Road
Cambridge CB3 0WB
UK
[email protected]

ISBN: 978-0-387-76895-3
e-ISBN: 978-0-387-76896-0
DOI: 10.1007/978-0-387-76896-0

ISSN: 0172-4568 Stochastic Modelling and Applied Probability

Library of Congress Control Number: 2008938477

Mathematics Subject Classification (2000): 93E10, 93E11, 60G35, 62M20, 60H15

© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com


Preface

Many aspects of phenomena critical to our lives cannot be measured directly. Fortunately, models of these phenomena, together with more limited observations, frequently allow us to make reasonable inferences about the state of the systems that affect us. The process of using partial observations and a stochastic model to make inferences about an evolving system is known as stochastic filtering.

The objective of this text is to assist anyone who would like to become familiar with the theory of stochastic filtering, whether graduate student or more experienced scientist. The majority of the fundamental results of the subject are presented using modern methods, making them readily available for reference. The book may also be of interest to practitioners of stochastic filtering who wish to gain a better understanding of the underlying theory.

Stochastic filtering in continuous time relies heavily on measure theory, stochastic processes and stochastic calculus. While knowledge of basic measure theory and probability is assumed, the text is largely self-contained, in that the majority of the results needed are stated in two appendices. This should make it easy for the book to be used as a graduate teaching text. With this in mind, each chapter contains a number of exercises, with solutions detailed at the end of the chapter.

The book is divided into two parts. The first covers four basic topics within the theory of filtering: the filtering equations (Chapters 3 and 4), Clark's representation formula (Chapter 5), finite-dimensional filters, in particular the Benes and the Kalman–Bucy filter (Chapter 6), and the smoothness of the solution of the filtering equations (Chapter 7). These chapters could be used as the basis of a one- or two-term graduate lecture course.

The second part of the book is dedicated to numerical schemes for the approximation of the solution of the filtering problem. After a short survey of the existing numerical schemes (Chapter 8), the bulk of the material is dedicated to particle approximations. Chapters 9 and 10 describe various particle filtering methods in continuous and discrete time and prove associated convergence results. The material in Chapter 10 does not require knowledge of stochastic integration and could form the basis of a short introductory course.

We should like to thank the publishers, in particular the senior editor, Achi Dosanjh, for her understanding and patience. Thanks are also due to various people who offered their support and advice during the project, in particular Martin Clark, Mark Davis and Boris Rozovsky. One of the authors (D.C.) would like to thank Robert Piche for the invitation to give a series of lectures on the subject in August 2006.

Part of the book grew out of notes on lectures given at Imperial College London, University of Cambridge and Tampere University of Technology. Special thanks are due to Kari Heine from Tampere University of Technology and Olasunkanmi Obanubi from Imperial College London, who read large portions of the first draft and suggested many corrections and improvements.

Finally, we would like to thank our families for their support, without which this project would have never happened.

London, December 2007
Alan Bain
Dan Crisan


Contents

Preface . . . v

Notation . . . xi

1 Introduction . . . 1
   1.1 Foreword . . . 1
   1.2 The Contents of the Book . . . 3
   1.3 Historical Account . . . 5

Part I Filtering Theory

2 The Stochastic Process π . . . 13
   2.1 The Observation σ-algebra Yt . . . 16
   2.2 The Optional Projection of a Measurable Process . . . 17
   2.3 Probability Measures on Metric Spaces . . . 19
       2.3.1 The Weak Topology on P(S) . . . 21
   2.4 The Stochastic Process π . . . 27
       2.4.1 Regular Conditional Probabilities . . . 32
   2.5 Right Continuity of Observation Filtration . . . 33
   2.6 Solutions to Exercises . . . 41
   2.7 Bibliographical Notes . . . 45

3 The Filtering Equations . . . 47
   3.1 The Filtering Framework . . . 47
   3.2 Two Particular Cases . . . 49
       3.2.1 X a Diffusion Process . . . 49
       3.2.2 X a Markov Process with a Finite Number of States . . . 51
   3.3 The Change of Probability Measure Method . . . 52
   3.4 Unnormalised Conditional Distribution . . . 57
   3.5 The Zakai Equation . . . 61

   3.6 The Kushner–Stratonovich Equation . . . 67
   3.7 The Innovation Process Approach . . . 70
   3.8 The Correlated Noise Framework . . . 73
   3.9 Solutions to Exercises . . . 75
   3.10 Bibliographical Notes . . . 93

4 Uniqueness of the Solution to the Zakai and the Kushner–Stratonovich Equations . . . 95
   4.1 The PDE Approach to Uniqueness . . . 96
   4.2 The Functional Analytic Approach . . . 110
   4.3 Solutions to Exercises . . . 116
   4.4 Bibliographical Notes . . . 125

5 The Robust Representation Formula . . . 127
   5.1 The Framework . . . 127
   5.2 The Importance of a Robust Representation . . . 128
   5.3 Preliminary Bounds . . . 129
   5.4 Clark's Robustness Result . . . 133
   5.5 Solutions to Exercises . . . 139
   5.6 Bibliographic Note . . . 139

6 Finite-Dimensional Filters . . . 141
   6.1 The Benes Filter . . . 141
       6.1.1 Another Change of Probability Measure . . . 142
       6.1.2 The Explicit Formula for the Benes Filter . . . 144
   6.2 The Kalman–Bucy Filter . . . 148
       6.2.1 The First and Second Moments of the Conditional Distribution of the Signal . . . 150
       6.2.2 The Explicit Formula for the Kalman–Bucy Filter . . . 154
   6.3 Solutions to Exercises . . . 155

7 The Density of the Conditional Distribution of the Signal . . . 165
   7.1 An Embedding Theorem . . . 166
   7.2 The Existence of the Density of ρt . . . 168
   7.3 The Smoothness of the Density of ρt . . . 174
   7.4 The Dual of ρt . . . 180
   7.5 Solutions to Exercises . . . 182

Part II Numerical Algorithms

8 Numerical Methods for Solving the Filtering Problem . . . 191
   8.1 The Extended Kalman Filter . . . 191
   8.2 Finite-Dimensional Non-linear Filters . . . 196
   8.3 The Projection Filter and Moments Methods . . . 199
   8.4 The Spectral Approach . . . 202
   8.5 Partial Differential Equations Methods . . . 206
   8.6 Particle Methods . . . 209
   8.7 Solutions to Exercises . . . 217

9 A Continuous Time Particle Filter . . . 221
   9.1 Introduction . . . 221
   9.2 The Approximating Particle System . . . 223
       9.2.1 The Branching Algorithm . . . 225
   9.3 Preliminary Results . . . 230
   9.4 The Convergence Results . . . 241
   9.5 Other Results . . . 249
   9.6 The Implementation of the Particle Approximation for πt . . . 250
   9.7 Solutions to Exercises . . . 252

10 Particle Filters in Discrete Time . . . 257
   10.1 The Framework . . . 257
   10.2 The Recurrence Formula for πt . . . 259
   10.3 Convergence of Approximations to πt . . . 264
       10.3.1 The Fixed Observation Case . . . 264
       10.3.2 The Random Observation Case . . . 269
   10.4 Particle Filters in Discrete Time . . . 272
   10.5 Offspring Distributions . . . 275
   10.6 Convergence of the Algorithm . . . 281
   10.7 Final Discussion . . . 285
   10.8 Solutions to Exercises . . . 286

Part III Appendices

A Measure Theory . . . 293
   A.1 Monotone Class Theorem . . . 293
   A.2 Conditional Expectation . . . 293
   A.3 Topological Results . . . 296
   A.4 Tulcea's Theorem . . . 298
       A.4.1 The Daniell–Kolmogorov–Tulcea Theorem . . . 301
   A.5 Cadlag Paths . . . 303
       A.5.1 Discontinuities of Cadlag Paths . . . 303
       A.5.2 Skorohod Topology . . . 304
   A.6 Stopping Times . . . 306
   A.7 The Optional Projection . . . 311
       A.7.1 Path Regularity . . . 312
   A.8 The Previsible Projection . . . 317
   A.9 The Optional Projection Without the Usual Conditions . . . 319
   A.10 Convergence of Measure-valued Random Variables . . . 322
   A.11 Gronwall's Lemma . . . 325
   A.12 Explicit Construction of the Underlying Sample Space for the Stochastic Filtering Problem . . . 326

B Stochastic Analysis . . . 329
   B.1 Martingale Theory in Continuous Time . . . 329
   B.2 Ito Integral . . . 330
       B.2.1 Quadratic Variation . . . 332
       B.2.2 Continuous Integrator . . . 338
       B.2.3 Integration by Parts Formula . . . 341
       B.2.4 Ito's Formula . . . 343
       B.2.5 Localization . . . 343
   B.3 Stochastic Calculus . . . 344
       B.3.1 Girsanov's Theorem . . . 345
       B.3.2 Martingale Representation Theorem . . . 348
       B.3.3 Novikov's Condition . . . 350
       B.3.4 Stochastic Fubini Theorem . . . 351
       B.3.5 Burkholder–Davis–Gundy Inequalities . . . 353
   B.4 Stochastic Differential Equations . . . 355
   B.5 Total Sets in L1 . . . 355
   B.6 Limits of Stochastic Integrals . . . 358
   B.7 An Exponential Functional of Brownian Motion . . . 360

References . . . 367

Author Name Index . . . 383

Subject Index . . . 387


Notation

Spaces

• Rd – the d-dimensional Euclidean space.
• R̄d – the one-point compactification of Rd, formed by adjoining a single point at infinity to Rd.
• B(S) – the Borel σ-field on S, that is, the σ-field generated by the open sets in S. If S = Rd for some d, then this σ-field is countably generated.
• (S, S) – the state space for the signal. Unless otherwise stated, S is a complete separable metric space and S is the associated Borel σ-field B(S).
• C(S) – the space of real-valued continuous functions defined on S.
• M(S) – the space of B(S)-measurable functions S → R.
• B(S) – the space of bounded B(S)-measurable functions S → R.
• Cb(S) – the space of bounded continuous functions S → R.
• Ck(S) – the space of compactly supported continuous functions S → R.
• Cmk(S) – the space of compactly supported continuous functions S → R whose first m derivatives are continuous.
• Cmb(Rd) – the space of all bounded, continuous functions with bounded partial derivatives up to order m. The norm ‖·‖m,∞ is frequently used with this space.
• C∞b(Rd) = ⋂∞m=0 Cmb(Rd).
• DS[0,∞) – the space of cadlag functions from [0,∞) → S.
• C1,2b – the space of bounded continuous real-valued functions u(t, x) with domain [0,∞) × R which are differentiable with respect to t and twice differentiable with respect to x. These derivatives are bounded and continuous with respect to (t, x).
• Cl(Rd) – the subspace of C(Rd) containing functions ϕ such that ϕ/ψ ∈ Cb(Rd), where ψ(x) = 1 + ‖x‖.
• Wmp(Rd) – the Sobolev space of all functions with generalized partial derivatives up to order m, with both the function and all its partial derivatives being Lp-integrable. This space is usually endowed with the norm ‖·‖m,p.
• SL(Rd) = {ϕ ∈ Cb(Rd) : ∃M such that |ϕ(x)| ≤ M/(1 + ‖x‖), ∀x ∈ Rd}.
• M(S) – the space of finite measures over (S, S).
• P(S) – the space of probability measures over (S, S), i.e. the subspace of M(S) such that µ ∈ P(S) satisfies µ(S) = 1.
• DMF(Rd)[0,∞) – the space of right continuous functions with left limits a : [0,∞) → MF(Rd), endowed with the Skorohod topology.
• I – an arbitrary finite set {a1, a2, . . .}.
• P(I) – the power set of I, i.e. the set of all subsets of I.
• M(I) – the space of finite positive measures over (I, P(I)).
• P(I) – the space of probability measures over (I, P(I)), i.e. the subspace of M(I) such that µ ∈ P(I) satisfies µ(I) = 1.

Other notations

• ‖·‖ – the Euclidean norm: for x = (x1, . . . , xm) ∈ Rm,

‖x‖ = √(x1² + · · · + xm²).

It is also applied to d × p matrices, by considering them as d × p vectors, viz

‖a‖ = √(∑_{i=1}^d ∑_{j=1}^p aij²).

• ‖·‖∞ – the supremum norm; for ϕ : Rd → R, ‖ϕ‖∞ = sup_{x∈Rd} |ϕ(x)|. In general, if ϕ : Rd → Rm then

‖ϕ‖∞ = max_{i=1,...,m} sup_{x∈Rd} |ϕi(x)|.

The notation ‖·‖∞ is equivalent to ‖·‖0,∞. This norm is especially useful on spaces such as Cb(Rd), or Ck(Rd), which only contain functions of bounded supremum norm; in other words, ‖ϕ‖∞ < ∞.

• ‖·‖m,p – the norm used on the space Wmp, defined by

‖ϕ‖m,p = (∑_{|α|≤m} ‖Dαϕ‖p^p)^{1/p},

where α = (α1, . . . , αd) is a multi-index and Dαϕ = (∂1)^{α1} · · · (∂d)^{αd} ϕ.

• ‖·‖m,∞ – the special case of the above norm when p = ∞, defined by

‖ϕ‖m,∞ = ∑_{|α|≤m} sup_{x∈Rd} |Dαϕ(x)|.

• δa – the Dirac measure concentrated at a ∈ S, δa(A) ≡ 1A(a).
• 1 – the constant function 1.
• ⇒ – used to denote weak convergence of probability measures in P(S); see Definition 2.14.
• µf, µ(f) – the integral of f ∈ B(S) with respect to µ ∈ M(S), i.e. µf ≜ ∫_S f(x) µ(dx).
• a⊤ – the transpose of the matrix a.
• Id – the d × d identity matrix.
• Od,m – the d × m zero matrix.
• tr(A) – the trace of the matrix A, i.e. if A = (aij), then tr(A) = ∑i aii.
• [x] – the integer part of x ∈ R.
• {x} – the fractional part of x ∈ R, i.e. x − [x].
• ⟨M⟩t – the quadratic variation of the semimartingale M.
• s ∧ t – for s, t ∈ R, s ∧ t = min(s, t).
• s ∨ t – for s, t ∈ R, s ∨ t = max(s, t).
• A ∨ B – the σ-algebra generated by the union A ∪ B.
• A △ B – the symmetric difference of sets A and B, i.e. all elements that are in one of A or B but not both; formally A △ B = (A \ B) ∪ (B \ A).
• N – the collection of null sets in the probability space (Ω, F, P).
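For readers new to the multi-index norms above, here is a small worked example (our addition, not part of the original notation list) written in LaTeX, showing what ‖·‖1,∞ and ‖·‖1,2 reduce to in one dimension:

```latex
% Worked example (illustrative): the norms for d = 1, m = 1.
% For \varphi : \mathbb{R} \to \mathbb{R}, the multi-indices with |\alpha| \le 1
% are \alpha = (0) and \alpha = (1), so
\|\varphi\|_{1,\infty} = \sup_{x \in \mathbb{R}} |\varphi(x)|
                       + \sup_{x \in \mathbb{R}} |\varphi'(x)|,
\qquad
\|\varphi\|_{1,2} = \bigl( \|\varphi\|_2^2 + \|\varphi'\|_2^2 \bigr)^{1/2}.
% E.g. \varphi(x) = e^{-x^2/2} gives
% \|\varphi\|_{1,\infty} = 1 + e^{-1/2}, since |\varphi'| peaks at x = \pm 1.
```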


Part I

Filtering Theory


2

The Stochastic Process π

The principal aim of this chapter is to familiarize the reader with the fact that the conditional distribution of the signal can be viewed as a stochastic process with values in the space of probability measures. While it is true that this chapter sets the scene for the subsequent chapters, it can be skipped by those readers whose interests are biased towards the applied aspects of the subject. The gist of the chapter can be summarized by the following.

The principal aim of solving a filtering problem is to determine the conditional distribution of the signal X given the observation σ-algebra Yt, where

Yt ≜ σ(Ys, 0 ≤ s ≤ t) ∨ N,

where N is the collection of all null sets of the complete probability space (Ω, F, P) (see Remark 2.3 for comments on what is possible without the addition of these null sets to Yt). We wish to formalise this by defining a stochastic process describing this conditional distribution. Let the signal process X take values in a measurable space (S, S). Suppose we naïvely define a stochastic process (ω, t) → πωt, taking values in the space of functions from S into [0, 1], by

πωt(A) = P[Xt ∈ A | Yt](ω), (2.1)

where A is an arbitrary set in the σ-algebra S. Recalling Kolmogorov's definition of conditional expectation†, πωt(A) is not uniquely defined for all ω ∈ Ω, but only for ω outside a P-null set, which may depend upon the set A. It would be natural to think of this πt as a probability measure on (S, S). However, this is not straightforward. For example, consider the countable additivity property which any measure must satisfy. Let A1, A2, . . . ∈ S be a sequence of pairwise disjoint sets; then by properties a. and c. of conditional expectation (see Section A.2), πt(·)(ω) satisfies the expected σ-additivity condition

† See Section A.2 in the appendix for a brief review of the properties of conditional expectation and conditional probability.


πωt(⋃n An) = ∑n πωt(An)

for every ω ∈ Ω\N(An, n ≥ 1), where N(An, n ≥ 1) is a P-null set which depends on the choice of the disjoint sets An, n ≥ 1. Then we define

N = ⋃ N(An, n ≥ 1),

where the union is taken over all sequences of disjoint sets (An)n≥1 such that for all n > 0, An ∈ S. Then πωt satisfies the σ-additivity property for arbitrary sets An, n ≥ 1 only if ω ∉ N. Although the P-measure of N(An, n ≥ 1) is zero, the set N need not even be measurable, because it is defined in terms of an uncountable union; furthermore, N need not be contained in a P-null set. This would imply that πt cannot be a probability measure.

To solve this difficulty we require that the state space of the signal S be a complete separable metric space and S be the Borel σ-algebra B(S). This enables us to define πt as the regular conditional distribution (in the sense of Definition A.2) of Xt given Yt. Defined in this manner, the process π = {πt, t ≥ 0} will be a P(S)-valued Yt-adapted process which satisfies (2.1) for any t ≥ 0.

Unfortunately this is not enough. A second requirement must be satisfied by the process π. One of the results established in Chapter 3 is an evolution equation (1.4) for π, which is called the filtering equation. This evolution equation involves a stochastic integral with respect to the observation process Y whose integrand is described in terms of π.

Since the integrator process Y is continuous, it follows from Theorem B.19 that the stochastic integral with respect to Y is defined if π is a progressively measurable process, that is, if the function

(t, ω) → πt : ([0, T] × Ω, B([0, T]) ⊗ Yt) → (P(S), B(P(S)))

is measurable for any T > 0. It is necessary to show that π has a version which is progressively measurable. We construct such a version for a signal process X which has cadlag paths. In general, such a version is no longer adapted with respect to Yt, but with respect to a right continuous enlargement of Yt. In the case of the problems considered within this book, Yt itself is right continuous (see Section 2.5), so no enlargement is required.

Theorem 2.1. Let S be a complete separable metric space and S be the associated Borel σ-algebra. Then there exists a P(S)-valued Yt-adapted process π = {πt, t ≥ 0} such that for any f ∈ B(S)

πtf = E[f(Xt) | Yt] P-a.s.

In particular, identity (2.1) holds true for any A ∈ B(S). Moreover, if Y satisfies the evolution equation

Yt = Y0 + ∫₀ᵗ h(Xs) ds + Wt, t ≥ 0, (2.2)

where W = {Wt, t ≥ 0} is a standard Ft-adapted m-dimensional Brownian motion and h = (hi)i=1,...,m : S → Rm is a measurable function such that

E[∫₀ᵗ ‖h(Xs)‖ ds] < ∞ (2.3)

and

P(∫₀ᵗ ‖πs(h)‖² ds < ∞) = 1 (2.4)

for all t ≥ 0, then π has a Yt-adapted progressively measurable modification. Furthermore, if X is cadlag then πt can be chosen to have cadlag paths.

The conditions (2.3) and (2.4) are frequently difficult to check (particularly (2.4)). They are implied by the stronger, but simpler, condition

E[∫₀ᵗ ‖h(Xs)‖² ds] < ∞. (2.5)
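To see why (2.5) implies both (2.3) and (2.4), here is a short sketch (our addition; the text leaves this to the reader), using the Cauchy–Schwarz inequality and the conditional form of Jensen's inequality:

```latex
% Sketch: (2.5) implies (2.3) and (2.4).
% (2.3): by Cauchy--Schwarz on [0,t] and then Jensen,
\mathbb{E}\Bigl[\int_0^t \|h(X_s)\|\,\mathrm{d}s\Bigr]
  \le \sqrt{t}\,\Bigl(\mathbb{E}\Bigl[\int_0^t \|h(X_s)\|^2\,\mathrm{d}s\Bigr]\Bigr)^{1/2}
  < \infty.
% (2.4): by conditional Jensen, \|\pi_s(h)\|^2 \le \pi_s(\|h\|^2), so by Fubini
\mathbb{E}\Bigl[\int_0^t \|\pi_s(h)\|^2\,\mathrm{d}s\Bigr]
  \le \int_0^t \mathbb{E}\bigl[\,\mathbb{E}[\|h(X_s)\|^2 \mid \mathcal{Y}_s]\,\bigr]\mathrm{d}s
  = \mathbb{E}\Bigl[\int_0^t \|h(X_s)\|^2\,\mathrm{d}s\Bigr] < \infty,
% hence \int_0^t \|\pi_s(h)\|^2\,\mathrm{d}s < \infty\ \mathbb{P}\text{-a.s., which is (2.4).}
```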

To prove Theorem 2.1 we first prove a more general result (Theorem 2.24), which justifies the existence of a version of π adapted with respect to a right continuous enlargement of the observation filtration Yt. This result is proved without imposing any additional constraints on the observation process Y. However, under the additional constraints (2.2)–(2.4), as a consequence of Theorem 2.35 the filtration Yt is right continuous, so no enlargement is required. Theorem 2.1 then follows.

In order to prove Theorem 2.24, we must introduce the optional projection of a stochastic process with respect to a filtration which satisfies the usual conditions. The standard construction of the optional projection requires the filtration to be right continuous, and a priori the filtration Yt may not have this property. Therefore we choose a right continuous enlargement of the filtration Yt defined by {Yt+, t ≥ 0}, where Yt+ = ∩s>t Ys. The existence of such an optional projection is established in Section 2.2.

Remark 2.2. The construction of the optional projection is valid without requiring that the filtration satisfy the usual conditions (see Section A.9). However, such conditions are too weak for the proof of Theorem 2.24.

Remark 2.3. We always assume that the process π is this progressively measurable version, and consequently Yt, t ≥ 0, always denotes the augmented observation filtration. However, for any t ≥ 0, the random probability measure πt has a σ(Ys, s ∈ [0, t])-measurable version, which can be used whenever the progressive measurability property is not required (see Exercise 2.36). Such a version of πt, being σ(Ys, s ∈ [0, t])-adapted, is a function of the observation path and thus is completely determined by the observation data. It turns out that πt is a continuous function of the observation path. This is known as the path-robustness of filtering theory and it is discussed in Chapter 5.
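Before moving on, a small numerical sketch may help fix ideas. The following Python snippet (our illustration, not part of the book) simulates the observation model (2.2) by Euler discretization for a hypothetical one-dimensional signal: an Ornstein–Uhlenbeck signal X with sensor function h(x) = x. All parameter choices are our own.

```python
import numpy as np

# Illustrative simulation of the observation model (2.2):
#   Y_t = Y_0 + int_0^t h(X_s) ds + W_t,
# with a hypothetical OU signal dX_t = -X_t dt + dV_t and h(x) = x.
rng = np.random.default_rng(0)
T, n = 1.0, 1000
dt = T / n

X = np.zeros(n + 1)          # signal path (Euler-Maruyama)
Y = np.zeros(n + 1)          # observation path, Y_0 = 0
for k in range(n):
    X[k + 1] = X[k] - X[k] * dt + np.sqrt(dt) * rng.standard_normal()
    # increment of Y: h(X) dt plus an independent Brownian increment of W
    Y[k + 1] = Y[k] + X[k] * dt + np.sqrt(dt) * rng.standard_normal()

# A filtering scheme would now use the path Y to estimate pi_t f = E[f(X_t)|Y_t].
print(X[-1], Y[-1])
```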


2.1 The Observation σ-algebra Yt

Let (Ω, F, P) be a probability space together with a filtration (Ft)t≥0 which satisfies the usual conditions:

1. F is complete, i.e. A ⊂ B, B ∈ F and P(B) = 0 implies that A ∈ F and P(A) = 0.
2. The filtration Ft is right continuous, i.e. Ft = Ft+.
3. F0 (and consequently all Ft for t ≥ 0) contains all the P-null sets.

On (Ω, F, P) we consider a stochastic process X = {Xt, t ≥ 0} which takes values in a complete separable metric space S (the state space). Let S be the associated Borel σ-algebra. We assume that X is measurable. That is, X has the property that the mapping

(t, ω) → Xt(ω) : ([0,∞) × Ω, B([0,∞)) ⊗ F) → (S, S)

is measurable. Moreover we assume that X is Ft-adapted.

Also let Y = {Yt, t ≥ 0} be another Ft-adapted process. The σ-algebra Yt has already been mentioned in the introductory chapter. We now make a formal definition:

Yt ≜ σ(Ys, 0 ≤ s ≤ t) ∨ N, (2.6)

where N is the set of P-null sets in F and the notation A ∨ B is the standard notation for the σ-algebra generated by A and B, i.e. σ(A, B).

The addition of the null sets N to the observation σ-algebra considerably increases the complexity of the proofs in the derivation of the filtering equations via the innovations approach in Chapter 3, so we should be clear why it is necessary. It is important that we can modify Yt-adapted processes. Suppose Nt is such a process; then we need to be able to construct a process Ñt so that for ω ∈ G we change the values of the process, and for all ω ∉ G, Ñt(ω) = Nt(ω), where G is a P-null set. In order that Ñt be Yt-adapted, the set G must be in Yt, which is assured by the augmentation of Yt with the P-null sets N.

The following exercise gives a straightforward characterization of the σ-algebra Yt, and the relation between the expectation conditional upon the augmented filtration Yt and that conditional upon the unaugmented filtration Yᵒt.

Exercise 2.4. Let Yᵒt = σ(Ys, 0 ≤ s ≤ t).

i. Prove that

Yt = {F ⊂ Ω : F = (G\N1) ∪ N2, G ∈ Yᵒt, N1, N2 ∈ N}. (2.7)

ii. Deduce from part (i) that if ξ is Yt-measurable, then there exists a Yᵒt-measurable random variable η such that ξ = η P-almost surely. In particular, for any integrable random variable ξ, the identity

E[ξ | Yt] = E[ξ | Yᵒt]

holds P-almost surely.

As already stated, we consider a right continuous enlargement of the filtration Yt defined by {Yt+, t ≥ 0}, where Yt+ = ∩s>t Ys. We do not wish to impose a priori the requirement that this observation σ-algebra be right continuous and satisfy Yt+ = Yt, because verifying the right continuity of a σ-algebra which depends upon observations might not be possible before the observations have been made! We note, however, that the σ-algebra Yt+ satisfies the usual conditions; it is right continuous and complete.

Finally, we note that no path regularity is assumed on either X or Y. Also, no explicit connection exists between the processes X and Y.

2.2 The Optional Projection of a Measurable Process

From the perspective of measure theory, the filtering problem is associated with the construction of the optional projection of a process. The results in this section are standard in the theory of continuous time stochastic processes, but since they are often not mentioned in elementary treatments we consider the results which we require in detail.

Definition 2.5. The optional σ-algebra O is defined as the σ-algebra on [0,∞) × Ω generated by Ft-adapted processes with cadlag paths. A process is said to be optional if it is O-measurable.

There is a well-known inclusion result: the set of previsible processes is contained in the set of optional processes, which is contained in the set of progressively measurable processes. We only require the second part of this inclusion; for a proof of the first part see Rogers and Williams [249].

Lemma 2.6. Every optional process is progressively measurable.

Proof. As the optional σ-algebra is generated by the adapted processes with cadlag paths, it is sufficient to show that any such process X is progressively measurable.

For fixed T > 0, define an approximation process

Y⁽ⁿ⁾(s, ω) ≜ ∑_{k=0}^∞ X_{T2⁻ⁿ(k+1)}(ω) 1_{[Tk2⁻ⁿ, T(k+1)2⁻ⁿ)}(s) + X_T(ω) 1_{[T,∞)}(s).

It is immediate that Y⁽ⁿ⁾(s, ω) restricted to s ∈ [0, T] is B([0, T]) ⊗ F_T-measurable and progressive. Since X has right continuous paths, as does Y⁽ⁿ⁾, it follows that limₙ→∞ Y⁽ⁿ⁾ₜ = lim_{s↓t} Xs = Xt. Since the limit exists, X = lim infₙ→∞ Y⁽ⁿ⁾, and is therefore progressively measurable. □
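The dyadic approximation in the proof above is easy to visualise numerically. The following Python sketch (our illustration; the cadlag path used is a hypothetical step function with fixed jump times) samples a path at the right endpoints of dyadic intervals and shows the pointwise convergence Y⁽ⁿ⁾(t) → X(t) that right continuity provides.

```python
import numpy as np

# A hypothetical cadlag path: a right-continuous counting-style step function,
# X(t) = number of jump times <= t, with fixed jump times for reproducibility.
jumps = np.array([0.137, 0.42, 0.695])

def X(t):
    return np.searchsorted(jumps, t, side="right").astype(float)

# Y^(n)(t) = X at the right endpoint of the dyadic interval containing t,
# i.e. Y^(n)(t) = X(T(k+1)2^-n) for t in [T k 2^-n, T(k+1) 2^-n).
def Y_n(t, n, T=1.0):
    h = T * 2.0 ** (-n)
    k = np.floor(t / h)
    return X(np.minimum((k + 1) * h, T))

t = 0.4                      # a fixed time; X is right continuous at t
for n in [1, 4, 8, 16]:
    print(n, Y_n(t, n))      # converges to X(t) = 1.0 as n grows
print("X(t) =", X(t))
```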


The following theorem is only important in the case of a process X which is not adapted to the filtration Ft. It allows us to construct from X an Ft-adapted process. Unlike in the case of discrete time, we cannot simply use the process defined by the conditional expectation E[Xt | Ft], since this would not be uniquely defined for ω in a null set which depends upon t; the process would therefore be unspecified on the uncountable union of these null sets over t ∈ [0,∞), which need not be null. This definition could thus result in a process unspecified on a set of strictly positive measure, which is unacceptable.

Theorem 2.7. Let X be a bounded measurable process; then there exists an optional process ᵒX, called the optional projection† of X, such that for every stopping time T

ᵒX_T 1_{T<∞} = E[X_T 1_{T<∞} | F_T]. (2.8)

This process is unique up to indistinguishability, i.e. any processes which satisfy these conditions will be indistinguishable.

As we have assumed that the filtration Ft satisfies the usual conditions, this result can be established using Doob's result on the regularization of the trajectories of martingales. The proof is given in Section A.7 of the Appendix.

Remark 2.8. A simple consequence of the uniqueness part of this result is the fact that if X is itself optional then ᵒX = X. The definition can be extended to unbounded non-negative measurable processes by applying Theorem 2.7 to X ∧ n and taking the limit as n → ∞.

While Theorem 2.7 establishes the existence of the optional projection process, it does not provide us with any information about the trajectories of this process; for example, if the process X has continuous paths, does ᵒX also have continuous paths? This turns out not to be true; see Remark 2.10. We must establish some kind of path regularity in order to apply many of the standard techniques of continuous time processes to the optional projection process.

The following theorem establishes the regularity which we need; however, its proof is fairly long and uses multiple applications of the optional section theorem. The proof is therefore not given here, but can be found in Section A.7.1 of the appendix.

Theorem 2.9. If Y is a bounded cadlag process then the optional projection ᵒY is also cadlag.

Since the optional projection is only unique up to indistinguishability, the theorem is in fact stating that ᵒY is indistinguishable from a cadlag process. As may be expected, this result depends upon Ft satisfying the usual conditions.

† In some older French literature relevant to the subject, this projection is called the projection bien-mesurable, although more recently projection optionnelle has superseded this.

The restriction to bounded processes in the statement of the theorem is not essential, but is natural since our definition of the optional projection was for a bounded process. The theorem can be extended to a process Y in the class D (i.e. the class of processes such that the set {X_T : T is a stopping time and P(T < ∞) = 1} is uniformly integrable). As a uniformly integrable martingale is of class D, it follows that the theorem applies to uniformly integrable martingales.

Remark 2.10. The optional projection of a bounded continuous process need not itself be continuous. As an example, consider the process whose value at any time t is given by the same integrable random variable A; that is, Xt(ω) = A(ω). The optional projection of such a process is clearly the cadlag modification of the martingale E[A | Ft]; however, this modification need not be continuous.

2.3 Probability Measures on Metric Spaces

This section presents some results on probability measures on metric spaces which are needed in order to construct the process π, and which are used throughout the book. The reader familiar with these topics can skip this section and proceed with the construction of π. Let P(S) denote the space of probability measures on the space S, that is, the subspace of M(S) consisting of those µ such that µ(S) = 1. Let B(S) be the space of bounded B(S)-measurable functions S → R. If ν ∈ P(S) and f ∈ B(S) we write

νf = ∫_S f(x) ν(dx).

The following standard results about probability measures are necessary. For more details on these subjects, the reader should consult one of the many references, such as Billingsley [19] and Parthasarathy [239].

Theorem 2.11. Any probability measure µ on a metric space S endowed with the associated Borel σ-algebra B(S) is regular. That is, if A ∈ B(S), given ε > 0 we can find an open set G and a closed set F such that F ⊆ A ⊆ G and µ(G \ F) < ε.

Proof. Let d be the metric on S. If A is closed then we can take F = A and G = {x : d(x, A) < δ}; as δ ↓ 0 the set G decreases to A. So if we let H be the class of sets A with the property of regularity, then all the closed sets are contained in H. As the closed sets are the complements of the open sets, they also generate the Borel σ-algebra. So if we show that H is a σ-algebra then we shall have established the result. As H is obviously closed under complementation, we only need to prove that it is closed under the formation of countable unions. Let An ∈ H and let Fn and Gn be closed and open sets such that Fn ⊆ An ⊆ Gn. By the definition of H we can choose these sets such that µ(Gn \ Fn) < ε/2ⁿ⁺¹. If we define G = ⋃_{n=1}^∞ Gn, this is clearly an open set. Choose n0 such that µ(⋃_{n=n0+1}^∞ Fn) < ε/2 and then define F = ⋃_{n=1}^{n0} Fn which, by virtue of the finite union, is a closed set. Thus F ⊆ A ⊆ G and µ(G \ F) < ε, establishing that H is closed under countable unions. Hence H contains the Borel sets. □

The main consequence for us of this theorem is that if two probability measures on (S, B(S)) agree on the closed sets then they are equal.

Definition 2.12. A subset A ⊂ B(S) is said to be separating if for ν, µ ∈ P(S), the condition νf = µf for all f ∈ A implies that µ = ν.

The following result determines a very important separating class which motivates the definition of weak convergence. However, it should be noted that the conclusion of Theorem 2.13 does follow from the more general Portmanteau theorem (Theorem 2.17).

Theorem 2.13. Let (S, d) be a metric space and Ud(S) be the space of all continuous bounded functions S → R which are uniformly continuous with respect to the metric d on S. If µ, ν are elements of P(S), and

∫_S f(x) µ(dx) = ∫_S f(x) ν(dx) ∀f ∈ Ud(S),

then this implies that µ = ν. That is, the space Ud(S) is separating.

Proof. By Theorem 2.11 it is sufficient to show that ν and µ agree on closed subsets of S. Let F be a closed set and define Fᵋ = {x : d(x, F) < ε}, which is clearly open, and ⋂_{n=1}^∞ F^{1/n} = F. The sets F and (F^{1/n})ᶜ are disjoint closed sets and d(F, (F^{1/n})ᶜ) ≥ 1/n. Define

fn(x) = (1 − n d(x, F))⁺.

It is clear that fn ∈ Ud(S) and 0 ≤ fn ≤ 1. For x ∈ F it follows that fn(x) = 1, and fn(x) = 0 for x ∈ (F^{1/n})ᶜ. Hence

µ(F) = ∫_S 1_F(x) µ(dx) ≤ ∫_S fn(x) µ(dx)

and

ν(F^{1/n}) = ∫_S 1_{F^{1/n}}(x) ν(dx) ≥ ∫_S fn(x) ν(dx),

but by assumption, as fn ∈ Ud(S), the right-hand sides of these two equations are equal; therefore µ(F) ≤ ν(F^{1/n}), and letting n tend to infinity we obtain µ(F) ≤ ν(F). By symmetry we obtain the opposite inequality; hence µ(F) = ν(F) for all closed sets F. □
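The functions fn = (1 − n d(x, F))⁺ reappear several times in this chapter (e.g. in (2.10) and (2.13)), so a concrete picture may help. The following Python snippet (our illustration; the set F, the measure and the sample size are arbitrary choices) evaluates fn for F = [0, 1] ⊂ R and checks numerically that µfn decreases towards µ(F).

```python
import numpy as np

# f_n(x) = (1 - n * d(x, F))^+ for the closed set F = [0, 1] in R.
def dist_to_F(x):
    return np.maximum(np.maximum(-x, x - 1.0), 0.0)  # d(x, [0, 1])

def f_n(x, n):
    return np.maximum(1.0 - n * dist_to_F(x), 0.0)

# Take mu = empirical measure of N(0,1) samples; then mu f_n = mean of f_n.
rng = np.random.default_rng(1)
xs = rng.standard_normal(100_000)
mu_F = np.mean((xs >= 0.0) & (xs <= 1.0))            # mu(F)
for n in [1, 4, 16, 64]:
    print(n, np.mean(f_n(xs, n)))                    # decreases towards mu(F)
print("mu(F) =", mu_F)
```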


2.3.1 The Weak Topology on P(S)

Let us endow the space P(S) with the weak topology. Familiarity with basic results of general topology is assumed here, but some less elementary results which are required are proved in Appendix A.3.

Definition 2.14. A sequence of probability measures µn ∈ P(S) converges weakly to µ ∈ P(S) if and only if µnϕ converges to µϕ as n → ∞ for all ϕ ∈ Cb(S). Weak convergence of µn to µ is denoted µn ⇒ µ.

No restriction is implied in this definition by the assertion that the limit µ is a probability measure. Since 1 ∈ Cb(S) it follows that µn1 = 1 for all n; hence µ1 = 1. We now exhibit a topology which engenders this form of convergence.

The reader with an interest in functional analysis should be aware that the concept of weak topology in the following definition is really the weak*-topology on the dual of Cb(S), which is the space M(S); but the terminology weak convergence for this concept has become standard within probability theory. Recall that for a space S, if T1 and T2 are topologies (collections of subsets of S satisfying the axioms of closure under finite intersections, closure under arbitrary unions, and containing S and ∅), then we say that T1 is weaker (coarser) than T2 if T1 ⊂ T2, in which case T2 is said to be finer than T1.

Definition 2.15. The weak topology on the space P(S) is defined to be the weakest topology such that for all f ∈ Cb(S), the function µ ↦ µf is continuous.

A basis for the neighbourhoods of a measure µ is defined to be a collection of open sets which contain µ, such that if V is another open set containing µ then there exists an element of the basis which is a subset of V.

It is clear that for f ∈ Cb(S) the required continuity of the real-valued function on P(S) given by ν ↦ νf implies that the set {ν : |νf − µf| < ε} is open and contains µ. As the axioms of a topology require closure under finite intersections, we can construct a neighbourhood basis from these sets by taking finite intersections of them; thus in the weak topology on P(S) a basis for the neighbourhood of µ (a local basis) is provided by the sets of the form

{ν ∈ P(S) : |µfi − νfi| < ε, 1 ≤ i ≤ m} (2.9)

for m ∈ N, ε > 0 and where f1, . . . , fm are elements of Cb(S).

Theorem 2.16. A sequence of probability measures µn ∈ P(S) converges weakly to µ ∈ P(S) if and only if µn converges to µ in the weak topology.

Proof. If µn converges to µ in the weak topology then for any set A in the neighbourhood base of µ, there exists n0 such that for n ≥ n0, µn ∈ A. For any f ∈ Cb(S) and ε > 0, the set {ν : |µf − νf| < ε} is in such a neighbourhood basis; thus µnf → µf for all f ∈ Cb(S), which implies that µn ⇒ µ. Conversely suppose µn ⇒ µ, and let A be the element of the neighbourhood basis for the weak topology given by (2.9). By the definition of weak convergence, it follows that µnfi → µfi for i = 1, . . . , m, so there exists ni such that for n ≥ ni, |µnfi − µfi| < ε; thus for n ≥ max_{i=1,...,m} ni, µn is in A, and thus µn converges to µ in the weak topology. □

We do not a priori know that this topology is metrizable; therefore we are forced to consider convergence of nets instead of sequences, until such point as we prove that the space is metrizable. Consequently we make this proof our first priority. Recall that a net in E is a set of elements in E indexed by α ∈ D, where D is an index set (i.e. a set with a partial ordering). Let xα be a net in E. Define

lim supα xα ≜ inf_{α0∈D} sup_{α≥α0} xα

and

lim infα xα ≜ sup_{α0∈D} inf_{α≥α0} xα.

The net is said to converge to x if and only if

lim infα xα = lim supα xα = x.

If S is compact then by Theorem A.9 the space of continuous functions C(S) = Cb(S) is separable and we can metrize weak convergence immediately; however, in the general case Cb(S) is not separable. Is it possible to find a smaller space of functions which still guarantees weak convergence but which is separable? The first thought might be the functions Ck(S) with compact support; however, these functions generate a different topology called the vague topology, which is weaker than the weak topology. To see this, consider S = R and µn = δn, the measure with an atom at n ∈ N; clearly this sequence does not converge in the weak topology, but in the vague topology it converges to the zero measure. (Although this is not an element of P(S), it is an element of M(S).)
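The δn example is easy to see numerically. In the following Python sketch (our illustration; the test functions are arbitrary choices), a compactly supported ϕ sees µn = δn converge to 0, while the bounded continuous function ψ ≡ 1 shows that no probability measure can be the weak limit.

```python
import numpy as np

# mu_n = delta_n (unit mass at n).  Integrating a function g against mu_n
# is just evaluating g at n: mu_n g = g(n).
def phi(x):   # continuous with compact support in [-1, 1]
    return np.maximum(1.0 - np.abs(x), 0.0)

def psi(x):   # bounded continuous, psi == 1
    return np.ones_like(np.asarray(x, dtype=float))

for n in [1, 5, 50]:
    print(n, phi(n), psi(n))
# phi(n) -> 0: in the vague topology mu_n converges to the zero measure,
# but psi(n) = 1 for all n, so mu_n cannot converge weakly to the zero measure.
```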

The Portmanteau theorem provides a crucial characterization of weak convergence; while it is an important part of the theory of weak convergence, its main importance to us is as a step in the metrization of the weak topology.

Theorem 2.17. Let S be a metric space with metric d. Then the following are equivalent.

1. µα ⇒ µ.
2. limα µαg = µg for all bounded functions g which are uniformly continuous with respect to the metric d.
3. limα µαg = µg for all bounded functions g which are Lipschitz with respect to the metric d.
4. lim supα µα(F) ≤ µ(F) for all F closed in S.
5. lim infα µα(G) ≥ µ(G) for all G open in S.

Page 25: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

2.3 Probability Measures on Metric Spaces 23

Proof. The equivalence of (4) and (5) is immediate since the complement of an open set G is closed. That (1)⇒(2)⇒(3) is immediate. So it is sufficient to prove that (3)⇒(4)⇒(1). Start with (3)⇒(4) and suppose that µαf → µf for all Lipschitz continuous f ∈ Cb(S). Let F be a closed set in S. We construct a sequence fn ↓ 1F, viz for n ≥ 1,

fn(x) = (1 − n d(x, F))⁺. (2.10)

Clearly fn ∈ Cb(S) and fn is Lipschitz continuous with Lipschitz constant n. But 0 ≤ fn ≤ 1 and for x ∈ F, fn(x) = 1, so it follows that fn ≥ 1F, and it is also immediate that this is a decreasing sequence. Thus by the monotone convergence theorem

limn→∞ µfn = µ(F). (2.11)

Consider n fixed; since 1F ≤ fn it follows that for α ∈ D, µα(F) ≤ µαfn, and thus

lim sup_{α∈D} µα(F) ≤ lim sup_{α∈D} µαfn.

But by (3)

lim sup_{α∈D} µαfn = lim_{α∈D} µαfn = µfn;

it follows that for all n ∈ N, lim sup_{α∈D} µα(F) ≤ µfn, and by (2.11) it follows that lim sup_{α∈D} µα(F) ≤ µ(F), which is (4).

The harder part is the proof that (4)⇒(1). Given f ∈ Cb(S) we split it up horizontally as in the definition of the Lebesgue integral. Let

−‖f‖∞ = a0 < a1 < · · · < an = ‖f‖∞ + ε/2

be constructed with n sufficiently large to ensure that ai+1 − ai < ε. Define

Fi ≜ {x : ai ≤ f(x)},

which by continuity of f is clearly a closed set. It is clear that µ(F0) = 1 and µ(Fn) = 0. Therefore

∑_{i=1}^n a_{i−1} [µ(F_{i−1}) − µ(F_i)] ≤ µf < ∑_{i=1}^n a_i [µ(F_{i−1}) − µ(F_i)].

By telescoping the sums on the left and right and using the fact that a0 = −‖f‖∞, we obtain

−‖f‖∞ + ε ∑_{i=1}^{n−1} µ(F_i) ≤ µf < −‖f‖∞ + ε + ε ∑_{i=1}^{n−1} µ(F_i). (2.12)

By the assumption that (4) holds, lim supα µα(F_i) ≤ µ(F_i) for i = 0, . . . , n; hence we obtain from the right-hand inequality in (2.12), applied to each µα, that

µαf ≤ −‖f‖∞ + ε + ε ∑_{i=1}^{n−1} µα(F_i),

thus

lim supα µαf ≤ −‖f‖∞ + ε + ε ∑_{i=1}^{n−1} lim supα µα(F_i) ≤ −‖f‖∞ + ε + ε ∑_{i=1}^{n−1} µ(F_i),

and from the left-hand inequality in (2.12) this yields

lim supα µαf ≤ ε + µf.

As ε was arbitrary we obtain lim supα µαf ≤ µf, and application to −f yields lim infα µαf ≥ µf, which establishes (1). □

While it is clearly true that a convergence determining set of functions is separating, the converse is not true in general, and in the case when S is not compact there may exist separating sets which are not sufficiently large to be convergence determining. For further details see Ethier and Kurtz [95, Chapter 3, Theorem 4.5].

Theorem 2.18. If S is a separable metric space then there exists a countable convergence determining class {ϕ1, ϕ2, . . .} where ϕi ∈ Cb(S).

Proof. By Lemma A.6 a separable metric space is homeomorphic to a subset of [0, 1]^N; let the homeomorphism be denoted α. As the space [0, 1]^N is compact, the closure α(S)‾ is also compact. Thus by Theorem A.9 the space C(α(S)‾) is separable. Let {ψ1, ψ2, . . .} be a countable dense family, where ψi ∈ C(α(S)‾).

It is therefore immediate that we can approximate any function ψ ∈ C(α(S)) arbitrarily closely in the uniform metric by suitable choice of ψi, provided that ψ is the restriction to α(S) of a function in C(α(S)‾).

Now define ϕi = ψi ∘ α for each i. By the same reasoning, we can approximate f ∈ C(S) arbitrarily closely in the uniform metric by some ϕi, provided that f = g ∘ α where g is the restriction to α(S) of a function in C(α(S)‾).

Define a metric on S by ρ(x, y) = d(α(x), α(y)), where d is a metric induced by the topology of co-ordinatewise convergence on [0, 1]^N. As α is a homeomorphism, this is a metric on S. For F closed in S, define the function

f^F_n(x) ≜ (1 − nρ(x, F))⁺ = (1 − n d(α(x), α(F)))⁺ = (g^F_n ∘ α)(x), (2.13)

where

g^F_n(x) ≜ (1 − n d(x, α(F)))⁺.

This function g^F_n is an element of C([0, 1]^N), and hence is an element of C(α(S)‾); thus by the foregoing argument, we can approximate f^F_n arbitrarily closely by one of the functions ϕi. But we have seen from the proof that (3)⇒(4) in Theorem 2.17 that the f^F_n of the form (2.13), for all F closed and n ∈ N, form a convergence determining class. Suppose that for all i we have that limα µαϕi = µϕi; then for each i

|µαf^F_n − µf^F_n| ≤ 2‖f^F_n − ϕi‖∞ + |µαϕi − µϕi|;

by the postulated convergence for all i of µαϕi, the second term vanishes in the limit, and thus for all i,

lim supα |µαf^F_n − µf^F_n| ≤ 2‖f^F_n − ϕi‖∞.

As i was arbitrary, it is immediate that

lim supα |µαf^F_n − µf^F_n| ≤ 2 lim inf_i ‖f^F_n − ϕi‖∞,

and since f^F_n can be arbitrarily approximated in the uniform norm by a ϕi, it follows that limα µαf^F_n = µf^F_n; and since this holds for all n and all closed F, it follows that µα ⇒ µ. □

Theorem 2.19. If S is a separable metric space, then P(S) with the weak topology is separable. We can then find a countable subset {ϕ1, ϕ2, . . .} of Cb(S), with ‖ϕi‖∞ = 1 for all i, such that

d : P(S) × P(S) → [0,∞), d(µ, ν) = ∑_{i=1}^∞ |µϕi − νϕi| / 2ⁱ (2.14)

defines a metric on P(S) which generates the weak topology; i.e., a net µα converges to µ weakly if and only if limα d(µα, µ) = 0.

Proof. By Theorem 2.18 there exists a countable set {f1, f2, . . .} of elements of Cb(S) which is convergence determining for weak convergence. Define ϕi ≜ fi/‖fi‖∞; clearly ‖ϕi‖∞ = 1, and the ϕi also form a convergence determining set. Define the map

β : P(S) → [0, 1]^N, β : µ ↦ (µϕ1, µϕ2, . . .).

Since the ϕi are convergence determining, they must also be separating, and thus the map β is one to one. It is clear that if µα ⇒ µ then, from the definition of weak convergence, limα β(µα) = β(µ). Conversely, since the ϕi are convergence determining, if limα µαϕi = µϕi for all i then µα ⇒ µ. Thus β is a homeomorphism from P(S) with the topology of weak convergence to [0, 1]^N with the topology of co-ordinatewise convergence. Since [0, 1]^N is separable, this implies that P(S) is separable.

The space [0, 1]^N admits a metric which generates the topology of co-ordinatewise convergence, given for x, y ∈ [0, 1]^N by

D(x, y) = ∑_{i=1}^∞ |xi − yi| / 2ⁱ. (2.15)

Therefore it follows that d(µ, ν) = D(β(µ), β(ν)) is a metric on P(S) which generates the weak topology. □
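As a concrete illustration of the metric (2.14), the following Python sketch (our addition) computes a truncation of d(µ, ν) for two measures on R, using a hypothetical bounded family ϕi(x) = cos(ix) with ‖ϕi‖∞ = 1. The genuine family built in Theorem 2.18 is different, so this is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def phi(i, x):
    # Hypothetical bounded test functions with sup-norm 1 (illustration only;
    # Theorem 2.18 constructs a genuine convergence determining family differently).
    return np.cos(i * x)

def d_weak(xs_mu, xs_nu, n_terms=30):
    # Truncation of d(mu, nu) = sum_i |mu phi_i - nu phi_i| / 2^i,
    # with mu, nu replaced by empirical measures of the samples xs_mu, xs_nu.
    return sum(abs(phi(i, xs_mu).mean() - phi(i, xs_nu).mean()) / 2**i
               for i in range(1, n_terms + 1))

mu_samples = rng.standard_normal(50_000)            # mu ~ N(0, 1)
nu_samples = rng.standard_normal(50_000) + 0.5      # nu ~ N(0.5, 1)
print(d_weak(mu_samples, mu_samples))               # ~ 0 (same samples)
print(d_weak(mu_samples, nu_samples))               # > 0
```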

As a consequence of this theorem, when S is a complete separable metric space the weak topology on P(S) is metrizable, so it is possible to consider convergence in terms of convergent sequences instead of using nets.

Exercise 2.20. Exhibit a countable dense subset of the space P(R) endowed with the weak topology. (Such a set must exist since R is a complete separable metric space, which implies that P(R) is separable.) Show further that P(R) is not complete under the metric d defined by (2.14).

Separability is a topological property of the space (i.e. it is independent ofboth existence and choice of metric), whereas completeness is a property ofthe metric. The topology of weak convergence on a complete separable spaceS can be metrized by a different metric called the Prohorov metric, underwhich it is complete (see, e.g. Theorem 1.7 of Chapter 3 of Ethier and Kurtz[95]).

Exercise 2.21. Let (Ω,F) be a probability space and S be a separable metricspace. Let ζ : Ω → P(S) be a function. Write B(P(S)) for the Borel σ-algebraon P(S) generated by the open sets in the weak topology. Let ϕii>0 be acountable convergence determining set of functions in Cb(S), whose existenceis guaranteed by Theorem 2.18. Prove that ζ is F/B(P(S))-measurable (andthus a random variable) if and only if ζϕi : Ω → R is F/B(R)-measurable forall i > 0. [Hint: Consider the case where S is compact for a simpler argument.]

Let us now turn our attention to the case of a finite state space I. Thesituation is much easier in this case since bothM(I) and P(I) can be viewedas subsets of the Euclidean space R|I| with the product topology (which isseparable), and equipped with a suitable complete metric.

M(I) = {(x_i)_{i∈I} ∈ ℝ^{|I|} : ∑_{i∈I} x_i < ∞, x_i ≥ 0 ∀i ∈ I},
P(I) = {(x_i)_{i∈I} ∈ M(I) : ∑_{i∈I} x_i = 1}.

The Borel sets in M(I), viz B(M(I)), are generated by the cylinder sets {R_{i,a,b}}_{i∈I; a,b≥0}, where R_{i,a,b} = {(x_j)_{j∈I} ∈ M(I) : a ≤ x_i ≤ b}, and B(P(I)) is similarly described in terms of cylinders.


Exercise 2.22. Let d(x, y) be the Euclidean metric on ℝ^{|I|}. Prove that d metrizes the topology of weak convergence on P(I) and that (P(I), d) is a complete separable metric space.
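For the finite state space case these identifications can be seen directly. The following sketch (our illustration, with hypothetical numerical values) represents elements of P(I) as probability vectors, bounded functions on I as vectors, and the metric of Exercise 2.22 as the ordinary Euclidean norm.

```python
import numpy as np

# A probability measure on a finite I = {0, 1, 2} is a vector in R^{|I|}
# with non-negative entries summing to one; a bounded function phi on I
# is also a vector, and mu(phi) is the dot product mu . phi.
mu = np.array([0.2, 0.5, 0.3])
nu = np.array([0.25, 0.45, 0.3])
assert np.isclose(mu.sum(), 1.0) and np.isclose(nu.sum(), 1.0)

phi = np.array([1.0, -2.0, 0.5])        # an arbitrary bounded function on I
print(mu @ phi, nu @ phi)               # the integrals mu(phi) and nu(phi)

# The Euclidean metric metrizes weak convergence on P(I) (Exercise 2.22):
# mu_n => mu if and only if each coordinate converges.
print(np.linalg.norm(mu - nu))
```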

2.4 The Stochastic Process π

The aim of this section is to construct a P(S)-valued stochastic process π which is progressively measurable. In order to guarantee the existence of such a stochastic process, some topological restrictions must be imposed on the state space S. In this chapter we assume that S is a complete separable metric space.† While this topological restriction is not the most general possible, it includes all the cases which are of interest to us; extensions to more general spaces are possible at the expense of additional technical complications (for details of these extensions, see Getoor [105]).

If we only wished to construct, for a fixed t ∈ [0,∞), a P(S)-valued random variable π_t, then we could use the theory of regular conditional probabilities. If the index set (in which t takes values) were countable then we could construct a suitable conditional distribution Q_t for each t. However, in the theory of continuous time processes the index set is [0,∞). If suitable conditions are satisfied, then by making a specific choice of Ω (usually the canonical path space), it is possible to regularize the family of regular conditional distributions {Q_t : t ∈ ℚ⁺} to obtain a cadlag P(Ω)-valued stochastic process (Q_t)_{t≥0}, which is called a kernel for the optional projection. Such a kernel is independent of the signal process X and depends only on the probability space (Ω, F) and the filtration Y_t.

Performing the construction in this way (see Meyer [206] for details) is somewhat involved and imposes unnecessary conditions on Ω, which are irrelevant since we are only interested in the distribution of the signal process X_t. Thus we do not follow this approach and instead choose to construct π_t by piecing together optional projections. The existence and uniqueness theorem for optional projections requires that we work with a filtration which satisfies the usual conditions, since the proof makes use of Doob's martingale regularisation theorem. Therefore, since we do not assume right continuity of Y_t, in the following theorem the right continuous enlargement Y_{t+} is used, as this satisfies the usual conditions.

Lemma 2.23. Assume that S is a compact metric space and S = B(S) is the corresponding Borel σ-algebra. Then there exists a P(S)-valued stochastic process π_t which satisfies the following conditions.

1. π_t is a Y_{t+}-optional process.

† A complete separable metric space is sometimes called a Polish space, following Bourbaki, in recognition of the work of Kuratowski.


2. For any f ∈ B(S), the process π_t f is indistinguishable from the Y_{t+}-optional projection of f(X_t).

Proof. The proof of this lemma is based upon the proofs of Proposition 1 in Yor [279], Theorem 4.1 in Getoor [105] and Theorem 5.1.15 in Stroock [262].

Let {f_i}_{i=1}^∞ be a set of continuous bounded functions f_i : S → ℝ whose linear span is dense in C_b(S). The compactness of S implies by Corollary A.10 that such a set must exist. Set f₀ = 1. We may choose such a set so that {f₀, . . . , f_n} is linearly independent for each n. Set g₀ = 1, and for n ≥ 1 set the process g_n equal to a Y_{t+}-optional projection of f_n(X). The existence of such an optional projection is guaranteed by Theorem 2.7.

Let U be the (countable) vector space generated by finite linear combinations of these f_i with rational coefficients. If for some N ∈ ℕ, f = ∑_{i=1}^N α_i f_i with α_i ∈ ℚ, then define the process Λ(f) ≜ ∑_{i=1}^N α_i g_i; we write Λ^ω_t(f) for its value at time t on the sample point ω. By the linear independence property, it is clear that any such representation is unique, and therefore this is well defined.

Define a subspace U⁺ ≜ {v ∈ U : v ≥ 0}. For v ∈ U⁺ define

N(v) = {ω ∈ Ω : Λ_t^ω(v) < 0 for some t ≥ 0}.

It is immediate from Lemma A.26 that for each v ∈ U⁺ the process Λ^ω(v) has non-negative paths a.s.; thus N(v) is a P-null set. Define

N = ⋃_{v∈U⁺} N(v),

which is also a P-null set since it is a countable union. By construction Λ^ω is linear and Λ^ω(1) = 1.

Define a modified version Λ̄^ω of the process Λ^ω which is a functional on U ⊂ C_b(S) and retains the properties of non-negativity and linearity for all ω ∈ Ω:

Λ̄^ω(f) ≜ Λ^ω(f) if ω ∉ N,   Λ̄^ω(f) ≜ 0 if ω ∈ N.

It only remains to check that Λ̄^ω is a bounded operator. Let f ∈ U ⊂ C_b(S); then trivially |f| ≤ ‖f‖_∞ 1, so it follows that ‖f‖_∞ 1 ± f ≥ 0, and hence for all t ≥ 0, Λ̄^ω_t(‖f‖_∞ 1 ± f) ≥ 0 by the non-negativity property. But by linearity, since Λ̄^ω(1) = 1 off the null set N, it follows that for all t ≥ 0, ‖f‖_∞ ± Λ̄^ω_t(f) ≥ 0, from which we deduce sup_{t∈[0,∞)} |Λ̄^ω_t(f)| ≤ ‖f‖_∞.

Since Λ̄^ω is bounded and U is dense in C_b(S), we can extend† the definition of Λ̄^ω(f) to f outside of U as follows. Let f ∈ C_b(S); since U is dense in C_b(S), we can find a sequence f_k ∈ U such that f_k → f uniformly. Define

Λ̄^ω(f) ≜ lim_k Λ̄^ω(f_k),

which is clearly well defined since, if f′_k ∈ U is another sequence such that f′_k → f, then by the boundedness of Λ̄ and the triangle inequality,

sup_{t∈[0,∞)} |Λ̄^ω_t(f_k) − Λ̄^ω_t(f′_n)| ≤ ‖f_k − f′_n‖_∞ ≤ ‖f_k − f‖_∞ + ‖f − f′_n‖_∞.

Since S is compact and the convergence f_k → f and f′_n → f is uniform on S, given ε > 0 there exists k₀ such that k ≥ k₀ implies ‖f_k − f‖_∞ < ε/2, and similarly n₀ such that n ≥ n₀ implies ‖f′_n − f‖_∞ < ε/2, whence it follows that the limit as n → ∞ of Λ̄^ω(f′_n) is Λ̄^ω(f).

† Functional analysts will realise that we can use the Hahn–Banach theorem to construct a norm-preserving extension. Since this is a metric space we can use the constructive proof given here instead.

We must check that for f ∈ C_b(S), Λ̄^ω_t(f) is the Y_{t+}-optional projection of f(X_t). By the foregoing it is Y_{t+}-optional. Let T be a Y_{t+}-stopping time; then

E[Λ̄_T(f) 1_{T<∞}] = lim_{k→∞} E[Λ̄_T(f_k) 1_{T<∞}] = lim_{k→∞} E[f_k(X_T) 1_{T<∞}] = E[f(X_T) 1_{T<∞}],

where the second equality follows since Λ̄(f_k) is a Y_{t+}-optional projection of f_k(X), and the other two equalities follow by the dominated convergence theorem.

By the Riesz representation theorem, which applies since S is compact,† we can find a kernel π^ω_t(·) such that for ω ∈ Ω,

Λ̄^ω_t(f) = ∫_S f(x) π^ω_t(dx) = π^ω_t f,   for all t ≥ 0.  (2.16)

To establish the first and second parts of the lemma, we need to check that for f ∈ B(S), (π^ω f)_t is the Y_{t+}-optional projection of f(X_t). We do this via the monotone class framework (see Theorem A.1 in the appendix), since on a metric space the σ-algebra generated by C_b(S) is B(S). It is clear from (2.16) and the preceding argument that for f ∈ C_b(S), (π^ω f)_t is the Y_{t+}-optional projection of f(X_t).

Let H be the subset of B(S) for which (π^ω f)_t is the Y_{t+}-optional projection of f(X_t). Clearly 1 ∈ H, H is a vector space and C_b(S) ⊆ H. The monotone convergence theorem for integration implies that H is a monotone class. Therefore by the monotone class theorem H contains B(S). ⊓⊔

Theorem 2.24. Let S be a complete separable metric space. Then there exists a P(S)-valued stochastic process π_t which satisfies the following conditions.

1. π_t is a Y_{t+}-optional process.

† Without the compactness property, we cannot guarantee that the kernel is σ-additive.


2. For any f ∈ B(S), the process π_t f is indistinguishable from the Y_{t+}-optional projection of f(X_t).

Proof. Since S is a complete separable metric space, by Theorem A.7 of the appendix it is homeomorphic to a Borel subset of a compact metric space S̄; we denote the homeomorphism by α.

Define a process X̄_t = α(X_t) taking values in S̄. Since S̄ is a compact separable metric space, by Lemma 2.23 there exists a P(S̄)-valued stochastic process π̄ such that for each f ∈ B(S̄), π̄_t f is a Y_{t+}-optional projection of f(X̄_t).

Since the process X̄ takes values in α(S) ⊂ S̄, it is immediate that

^o1_{S̄ \ α(S)}(X̄_t) = ^o1_{S̄ \ α(S)}(α(X_t)) = ^o0 = 0.

As the optional projection is only defined up to indistinguishability, it follows that

π̄^ω_t(S̄ \ α(S)) = π̄^ω_t(1_{S̄ \ α(S)}) = ^o1_{S̄ \ α(S)}(X̄_t) = 0   ∀t ∈ [0,∞),  P-a.s.

Define

N ≜ {ω ∈ Ω : π̄^ω_t(S̄ \ α(S)) = 0 ∀t ∈ [0,∞)}^c,

which we have just shown to be a P-null set. We define a P(S)-valued random process π as follows: for A ∈ B(S),

π^ω_t(A) ≜ π̄^ω_t(α(A)) if ω ∉ N,   π^ω_t(A) ≜ P X_t^{-1}(A) if ω ∈ N.

Here the choice of π on N is arbitrary; we cannot choose 0, because π^ω_t must be a probability measure on S for all ω ∈ Ω. Thus it is immediate that π^ω_t ∈ P(S).

If f ∈ B(S) then we can extend f to a function f̄ ∈ B(S̄) by defining

f̄(x) = f(α^{-1}(x)) if x ∈ α(S),   f̄(x) = 0 otherwise.

Clearly

π(f) = π(f̄ ∘ α) = π̄(f̄ 1_{α(S)}) = π̄(f̄)   P-a.s.,

but π̄_t f̄ is the Y_{t+}-optional projection of f̄(X̄_t) = f(X_t); hence, as required, π_t(f) is the Y_{t+}-optional projection of f(X_t). ⊓⊔

Exercise 2.25. Let π_t be defined as above. Show that for any f ∈ B(S) and any t ∈ [0,∞),

πtf = E [f(Xt) | Yt+]

holds P-a.s.


Corollary 2.26. If the sample paths of X are cadlag then there is a version of π_t with cadlag paths (where P(S) is endowed with the topology of weak convergence) and a countable set Q ⊂ [0,∞) such that for t ∈ [0,∞) \ Q, for any f ∈ B(S),

πtf = E[f(Xt) | Yt].

Proof. For any f ∈ C_b(S), the Y_{t+}-optional projection of f(X_t) is indistinguishable from a cadlag process by Theorem 2.9. Since by Theorem 2.24 π_t f is indistinguishable from the Y_{t+}-optional projection of f(X_t), it follows that π_t f is indistinguishable from a cadlag process.

By Theorem 2.18, there is a countable convergence determining class {ϕ_i}_{i≥0}, which is therefore also a separating class. We can therefore choose a modification of π_t such that π_t ϕ_i is cadlag for all i. Therefore π_t is cadlag.

Since P(S) with the weak topology is metrizable, it then follows by Lemma A.14 that

Q ≜ {t > 0 : P(π_{t−} ≠ π_t) > 0}

is countable. But for t ∉ Q, π_t = π_{t−} a.s.; thus π_t = lim_{s↑↑t} π_s (where the notation s ↑↑ t is defined in Section A.7.1). Clearly π_s for s < t is Y_t-measurable and therefore so is the limit π_t. As Y_t ⊂ Y_{t+}, it follows from the definition of Kolmogorov conditional expectation that

π_t f = E[π_t f | Y_t] = E[f(X_t) | Y_t].   ⊓⊔

Remark 2.27. The theorem as stated above only guarantees that π_t f is a Y_{t+}-optional projection of f(X_t) for f a bounded measurable function. Examining the proof shows that this restriction to bounded f is the usual one arising from the use of the monotone class theorem A.1.

It is useful to consider πf when f is not bounded. Consider f non-negative and define f_n ≜ f ∧ n, which is bounded, so by the above theorem π(f_n) is Y_{t+}-optional. Clearly f_n → f in a monotone fashion as n → ∞, and since π(f_n) is the expectation of f_n under the measure π_t, by the monotone convergence theorem π(f_n) → π(f). Since π(f) is the limit of a sequence of Y_{t+}-optional processes, it is Y_{t+}-optional. By application of the monotone convergence theorem to the defining equation of optional projection (2.8), it follows that π(f) is a Y_{t+}-optional projection of f(X_t).

In the general case where f is unbounded, but not necessarily non-negative, if π_t|f| < ∞ for all t ∈ [0,∞), P-a.s., then writing f⁺ = f ∨ 0 and f⁻ = (−f) ∨ 0, it follows that |f| = f⁺ + f⁻, and hence π_t f⁺ < ∞ and π_t f⁻ < ∞ for all t ∈ [0,∞) P-a.s. Thus π_t f = π_t f⁺ − π_t f⁻ is well defined (i.e. it cannot be ∞ − ∞), and a similar argument verifies that it satisfies the conditions for the Y_{t+}-optional projection of f(X_t).

The pathwise regularity of π_t f (i.e. showing that the trajectories of πf are cadlag if X is cadlag) requires stronger conditions in the unbounded case, irrespective of whether f is non-negative. In particular we need to be able to exchange a limit and an expectation; a suitable condition for this to be valid is that the family of random variables {f(X_t) : t ∈ [0,∞)} is uniformly integrable. For example, this is true if the family is dominated by an integrable random variable, in other words if sup_{s∈[0,∞)} |f(X_s)| is integrable.

2.4.1 Regular Conditional Probabilities

This section is not essential reading in order to understand the subsequent chapters. It describes the construction of a regular conditional probability. The ideas involved are important in many areas of probability theory, and most of the work in establishing them has been done in the previous section; hence their inclusion here.

For many purposes a stronger notion of conditional probability is required than that provided by Kolmogorov conditional expectation (see Appendix A.2). The most useful form is that of regular conditional probability.

Definition 2.28. Let (Ω, F, P) be a probability space and G a sub-σ-algebra of F. A function Q(ω, B) defined for all ω ∈ Ω and B ∈ F is called a regular conditional probability of P with respect to G if

(a) For each B ∈ F, Q(ω, B) = E[1_B | G] P-a.s.
(b) For each ω ∈ Ω, Q(ω, ·) is a probability measure on (Ω, F).
(c) For each B ∈ F, the map Q(·, B) is G-measurable.
(d) If the σ-algebra G is countably generated, then for all G ∈ G,

Q(ω, G) = 1_G(ω)   P-a.s.

Regular conditional probabilities as described in Definition 2.28 do not always exist. For an example of the non-existence of regular conditional probabilities, due to Halmos, Dieudonné, Andersen and Jessen, see Rogers and Williams [248, Section II.43].

Exercise 2.29. Prove by methods similar to those used in the proof of Theorem 2.24 that if Ω is a compact metric space then there exists a regular conditional probability distribution with respect to the σ-algebra G ⊂ F. Furthermore, show in the case where G is finitely generated that if A_G(ω) is the atom of G containing ω (i.e. ⋂{G ∈ G : ω ∈ G}), then Q(ω, A_G(ω)) = 1.

This argument can be extended to complete separable metric spaces by means of Theorem A.7, arguing as in the proof of Theorem 2.24.


2.5 Right Continuity of Observation Filtration

The results in this section are proved under more restrictive conditions than those of Section 2.1. The observation process Y is assumed to satisfy the evolution equation (2.2); that is,

Y_t = Y_0 + ∫_0^t h(X_s) ds + W_t,   t ≥ 0,

where W = {W_t, t ≥ 0} is a standard F_t-adapted m-dimensional Brownian motion and h = (h^i)_{i=1}^m : S → ℝ^m is a measurable function. Assume that conditions (2.3) and (2.4) are satisfied; that is,

E[∫_0^t ‖h(X_s)‖ ds] < ∞,

and

P(∫_0^t ‖π_s(h)‖² ds < ∞) = 1.

Let I = {I_t, t ≥ 0} be the following process, called the innovation process:

I_t = Y_t − ∫_0^t π_s(h) ds
    = W_t + ∫_0^t h(X_s) ds − ∫_0^t π_s(h) ds,   t ≥ 0.  (2.17)

For this innovation process to be well defined it is necessary that

∫_0^t π_s(‖h‖) ds < ∞   P-a.s.,  (2.18)

which is clearly implied by the stronger condition (2.3). The condition (2.18) is not strong enough for the proof of the following theorem; consequently only condition (2.3) is referenced subsequently.

Proposition 2.30. If condition (2.3) is satisfied then I_t is a Y_t-adapted Brownian motion under the measure P.

Proof. Obviously I_t is Y_t-adapted, as both Y_t and ∫_0^t π_s(h) ds are. First it is shown that I_t is a continuous martingale. As a consequence of (2.3), I_t is integrable; hence, taking conditional expectations,


E[I_t | Y_s] − I_s = E[W_t + ∫_0^t h(X_r) dr | Y_s] − (W_s + ∫_0^s h(X_r) dr)
                     − E[∫_0^t π_r(h) dr | Y_s] + ∫_0^s π_r(h) dr
                   = E[Y_s + W_t − W_s + ∫_s^t h(X_r) dr | Y_s] − Y_s
                     − E[∫_0^t π_r(h) dr | Y_s] + ∫_0^s π_r(h) dr.

Since Y_s and ∫_0^s π_r(h) dr are Y_s-measurable,

E[I_t | Y_s] − I_s = E[W_t − W_s | Y_s] + ∫_s^t E[h(X_r) − π_r(h) | Y_s] dr = 0,

where we have used the fact that for r ≥ s, E[π_r(h) | Y_s] = E[h(X_r) | Y_s], and that E[W_t − W_s | Y_s] = E[E[W_t − W_s | F_s] | Y_s] = 0. The cross-variation of I is the same as the cross-variation of W, as the other two terms in (2.17) are of finite variation and so give zero cross-variation. So I is a continuous martingale and its cross-variation is given by

⟨I^i, I^j⟩_t = ⟨W^i, W^j⟩_t = t δ_{ij}.  (2.19)

Hence I is a Brownian motion by Lévy's characterisation of Brownian motion (Theorem B.27). ⊓⊔

From the first part of (2.17), for small δ,

Y_{t+δ} − Y_t ≈ π_t(h)δ + I_{t+δ} − I_t.

Heuristically, the incoming observation Y_{t+δ} − Y_t has a part π_t(h)δ which could be predicted from the current knowledge of the system state, and an additional component I_{t+δ} − I_t containing new information which is independent of the current knowledge. This is why I is called the innovation process.
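This heuristic can be checked numerically in a model where π_t(h) is available in closed form. The following sketch (our illustration, not from the text) assumes a linear-Gaussian signal and observation, for which the conditional mean m_t = π_t(h) with h(x) = x is given by the Kalman–Bucy filter, and verifies that the increments of I_t = Y_t − ∫_0^t π_s(h) ds behave like Brownian increments, in agreement with Proposition 2.30; the model, parameters and filter equations are assumptions made purely for this demonstration.

```python
import numpy as np

# Toy linear-Gaussian model (an assumption for illustration only):
#   dX_t = -a X_t dt + s dV_t,    dY_t = X_t dt + dW_t,   h(x) = x.
# Here pi_t(h) equals the Kalman-Bucy conditional mean m_t, whose
# conditional variance P_t solves dP/dt = -2 a P + s^2 - P^2.
rng = np.random.default_rng(1)
a, s = 1.0, 0.5
dt, n = 1e-3, 100_000

X, m, P = 0.0, 0.0, 0.5
dI = np.empty(n)
for k in range(n):
    dV, dW = rng.normal(0.0, np.sqrt(dt), 2)
    dY = X * dt + dW              # incoming observation increment
    dI[k] = dY - m * dt           # innovation increment dI = dY - pi(h) dt
    m += -a * m * dt + P * dI[k]  # filter update driven by the innovation
    P += (-2.0 * a * P + s**2 - P**2) * dt
    X += -a * X * dt + s * dV     # signal evolution

# If I is (approximately) a Brownian motion, Var(dI)/dt should be near 1.
print(np.var(dI) / dt)
```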

Proposition 2.31 (Fujisaki, Kallianpur and Kunita [104]). Assume that conditions (2.3) and (2.4) are satisfied. Then every square integrable random variable η which is Y_∞-measurable has a representation of the form

η = E[η] + ∫_0^∞ ν_s^⊤ dI_s,  (2.20)

where ν = {ν_t, t ≥ 0} is a progressively measurable Y_t-adapted process such that

E[∫_0^∞ ‖ν_s‖² ds] < ∞.


This proposition is often proved under the stronger condition

E[∫_0^t ‖h(X_s)‖² ds] < ∞,   ∀t ≥ 0,

which implies both conditions (2.4) and (2.3). The innovation process I_t is clearly Y_t-adapted. If the converse result, that Y_t = σ(I_s : 0 ≤ s ≤ t) ∨ N, were known,† then this proposition would be a trivial application of the martingale representation theorem B.32. However, the representation provided by the proposition is the closest to a satisfactory converse which is known to hold.‡

The main element of the proof of Proposition 2.31 is an application of Girsanov's theorem followed by use of the martingale representation theorem. In Section 2.1 it was necessary to augment the filtration Y_{t+} with the null sets in order to construct the process π. This will cause some difficulties, because the process to be used as a change of measure is not necessarily a martingale. In order to construct a uniformly integrable martingale, a stopping argument must be used, and this cannot be done directly when working with the augmented filtration. This has the unfortunate effect of obscuring a simple and elegant proof. The proof for a simpler case, where the process is a martingale, is discussed in Exercise 2.33, and the reader who is uninterested in measurability aspects would be well advised to consult the solution to this exercise instead of reading the proof. To be clear in notation, we denote by Y^o_t the unaugmented σ-algebra (i.e. without the addition of the null sets) corresponding to Y_t.

The following technical lemma, whose conclusion might well seem to be 'obvious', is required. The proof of the lemma is not important for understanding the proof of the representation result, so it is deferred to the appendix, where it appears as Lemma A.24.

Lemma 2.32. Let X^o_t be the unaugmented σ-algebra generated by a process X_t. Then for T an X^o-stopping time, for all t ≥ 0,

X^o_{t∧T} = σ{X_{s∧T} : 0 ≤ s ≤ t}.

† An example of Tsirel'son, which is presented in a filtering context in Benes [11], demonstrates that in general Y_t is not equal to σ(I_s : 0 ≤ s ≤ t) ∨ N.
‡ In special cases the observation and innovation filtrations can be shown to be equal. Allinger and Mitter [3] extend an earlier result of Clark [55] (see also Theorem 11.4.1 in Kallianpur [145] and Meyer [205], pp. 244–246) to show that if the observation and signal noise are uncorrelated and for some T, E[∫_0^T ‖h(X_s)‖² ds] < ∞, then for t ≤ T, σ(I_s : 0 ≤ s ≤ t) ∨ N = Y_t. Their proof consists of an analysis of the Kallianpur–Striebel functional which leads to a pathwise uniqueness result for weak solutions of the equation I_t = Y_t − ∫_0^t π_s(h) ds. That is, if two valid weak solutions (Y, I) and (Ỹ, I) of this equation have a common Brownian motion I (but not necessarily a common filtration), then Y and Ỹ are indistinguishable. From a result of Yamada and Watanabe (Remark 2, Corollary 1 of [275]; see also Chapter 8 of Stroock and Varadhan [261]) this establishes the result.


We are now in a position to prove the representation result, Proposition 2.31.

Proof. Since the integral in (2.4) is non-decreasing in t, this condition implies that

P(∫_0^t ‖π_r(h)‖² dr < ∞, ∀t ∈ [0,∞)) = 1.  (2.21)

Define

Z_t ≜ exp(−∫_0^t π_r(h^⊤) dI_r − ½ ∫_0^t ‖π_r(h)‖² dr),  (2.22)

and for n > 0 define

T_n ≜ inf{t ≥ 0 : ∫_0^t ‖π_r(h)‖² dr ≥ n or |Z_t| ≥ n},  (2.23)

which by Lemma A.19 is a Y_t-stopping time, since the processes t ↦ ∫_0^t ‖π_r(h)‖² dr and Z are both continuous and Y_t-adapted. By Lemma A.21 the Y_t-stopping time T_n is a.s. equal to a Y^o_{t+}-stopping time. However, this is not strong enough; a Y^o_t-stopping time is required.

The process π_t(h) gives rise to a sequence of cadlag step function approximations

π^n_t(h)(ω) ≜ ∑_{i=0}^∞ 1_{[2^{-n}i, 2^{-n}(i+1))}(t) π_{2^{-n}i}(h)(ω).

Each π_{2^{-n}i}(h) is a Y_{2^{-n}i}-measurable random variable. From the definition of augmentation, by modification on a P-null set a Y^o_{2^{-n}i}-measurable random variable P^n_i can be defined such that π_{2^{-n}i}(h) = P^n_i holds P-a.s. Then define

π̃^n_t(h)(ω) ≜ ∑_{i=0}^∞ 1_{[2^{-n}i, 2^{-n}(i+1))}(t) P^n_i(ω),

and as only a countable family of random variables has been modified on null sets, it follows that the processes π^n(h) and π̃^n(h) are indistinguishable. The process π̃^n(h) is cadlag and Y^o_t-adapted; therefore it must be Y^o_t-optional.

As the process π(h) has by Lemma A.13 at most a countable number of discontinuities, it follows that π^n(h) converges λ-a.s. to π(h). Therefore the sequence π̃^n(h) converges λ ⊗ P-a.s. to π(h). The limit π̃(h) ≜ lim inf_{n→∞} π̃^n(h) is a limit of Y^o_t-optional processes and is therefore Y^o-optional. Using this process π̃(h) in place of π(h) in the definition of Z, we may define a process Z̃. This process Z̃ as constructed need not be continuous, as it can explode from a finite value to infinity, because (2.21) only holds outside a null set. This process cannot simply be modified on a null set, as this might destroy the property of Y^o_t-adaptedness. Instead define Z̄ to be the modification of Z̃ whose value at time t is zero on the set

{ω ∈ Ω : ∫_0^r ‖π̃_s(h)‖² ds = ∞ for some r < t, r ∈ ℚ}.


This set is clearly Y^o_t-measurable; hence this modified process Z̄ is Y^o_t-adapted and continuous.

As the processes Z̄ and ∫_0^· ‖π̃_s(h)‖² ds are continuous and Y^o_t-adapted, by Lemma A.19 both inf{t ≥ 0 : Z̄_t ≥ n} and inf{t ≥ 0 : ∫_0^t ‖π̃_s(h)‖² ds ≥ n} are Y^o_t-stopping times. The process π̃(h) is indistinguishable from π(h); therefore define a second sequence of stopping times

T̃_n ≜ inf{t ≥ 0 : ∫_0^t ‖π̃_r(h)‖² dr ≥ n or |Z̄_t| ≥ n},  (2.24)

and it follows that T̃_n is a.s. equal to T_n, and that Z̄_n ≜ Z̄_{T̃_n} is P-a.s. equal to Z_{T_n}.

Clearly Z is a local martingale, but in general it is not a martingale. The next argument shows that, by stopping at T_n, the process Z^{T_n} is a uniformly integrable martingale and therefore suitable for use as a measure change.

From (2.22), using Itô's formula,

Z^{T_n}_t = Z_0 − ∫_0^{t∧T_n} Z^{T_n}_s π_s(h^⊤) dI_s,

and since by Proposition 2.30 I is a P-Brownian motion adapted to Y_t, it follows that the stochastic integral is a Y_t-adapted martingale provided that

E[∫_0^{t∧T_n} ‖π_s(h)‖² (Z^{T_n}_s)² ds] < ∞,   for all t ≥ 0.

It is clear that

∫_0^{t∧T_n} ‖π_s(h)‖² (Z^{T_n}_s)² ds ≤ n² ∫_0^{t∧T_n} ‖π_s(h)‖² ds ≤ n³ < ∞.  (2.25)

It follows that Z^{T_n} is a martingale which by (2.25) is uniformly bounded in L² and hence uniformly integrable. Define a change of measure by

dP^n/dP = Z_{T_n}.

As P and P^n are by construction equivalent probability measures, it follows that statements which hold P-a.s. also hold P^n-a.s. As a consequence of Girsanov's theorem (see Theorem B.28 of the appendix), since Z^{T_n} is a uniformly integrable martingale, under the measure P^n the process

Y^n_t ≜ I_t + ∫_0^{T_n∧t} π_r(h) dr

is a Brownian motion with respect to the filtration Yt.


We are forced to use this Brownian motion Y^n in place of Y when applying the martingale representation theorem. Were Z itself a uniformly integrable martingale, we could use it directly to construct a representation of Z^{-1}_∞ η, which is Y_∞-measurable and square integrable, as an integral over Y. Using Y^n instead of Y as our Brownian motion is not itself a problem. However, the martingale representation theorem only allows representations to be constructed of random variables which are measurable with respect to the augmentation of the filtration generated by the Brownian motion. In this case this means measurable with respect to the augmentation of the filtration

Y^{n,o}_t ≜ σ{Y^n_s : 0 ≤ s ≤ t}.

Clearly this filtration Y^{n,o}_t is not the same as Y^o_t. From the definition of the innovation process,

Y_t = I_t + ∫_0^t π_r(h) dr.

Thus Y^n and Y agree on the time interval [0, T_n]. It must now be shown that the σ-algebras generated by these processes stopped at T_n agree. From Lemma A.24 it follows that

Y^{n,o}_{t∧T_n} = σ{Y^n_{s∧T_n} : 0 ≤ s ≤ t} = σ{Y_{s∧T_n} : 0 ≤ s ≤ t} = Y^o_{t∧T_n},

Suppose that η is an element of L²(Y^o_{T_n}, P); that is, η is Y^o_{T_n}-measurable and E[η²] < ∞. As the process Z̄_t is continuous, it is progressively measurable; therefore Z̄_n is Y^o_{T_n}-measurable. Thus Z̄^{-1}_n η is also Y^o_{T_n}-measurable. One of the conditions defining the stopping time T_n ensures that |Z̄_n^{-1}| < exp(2n); thus

E^{P^n}[(Z̄^{-1}_n η)²] = E[Z̄_n Z̄^{-2}_n η²] ≤ exp(2n) E[η²] < ∞,

and hence Z̄^{-1}_n η is an element of L²(Y^o_{T_n}, P^n).

We can now apply the classical martingale representation theorem B.32 (together with Remark B.33) to construct a representation of Z̄^{-1}_n η with respect to the Brownian motion Y^n, establishing the existence of a previsible process Φ^n adapted to the filtration Y^n_t (the representation theorem requires the use of the augmented filtration) such that

Z̄^{-1}_n η = E^{P^n}[Z̄^{-1}_n η] + ∫_0^∞ (Φ^n_s)^⊤ dY^n_s
           = E[η] + ∫_0^∞ (Φ^n_s)^⊤ dI_s + ∫_0^∞ (Φ^n_s)^⊤ π_s(h) ds.

As Φ^n_s is Y^n_s-adapted, it follows that Φ^n_s = 0 for s > T_n, and since Y and Y^n agree on [0, T_n] it follows that Φ^n_s is adapted to Y_s. We now construct a P^n-martingale from η Z̄^{-1}_n via


η_t = E^{P^n}[η Z̄^{-1}_n | Y^n_t].

Applying Itô's formula to the product η_t Z^{T_n}_t,

d(η_t Z_{T_n∧t}) = 1_{t≤T_n}(−Z_t η_t π_t(h^⊤) dI_t + Z_t (Φ^n_t)^⊤ dI_t + Z_t (Φ^n_t)^⊤ π_t(h) dt − Z_t (Φ^n_t)^⊤ π_t(h) dt).

The finite variation terms in this expression cancel, and thus integrating from 0 to t,

η_t Z^{T_n}_t = E[η] + ∫_0^{t∧T_n} (Z_s (Φ^n_s)^⊤ − η_s Z_s π_s(h^⊤)) dI_s.

Writing ν^n_t ≜ Z_t Φ^n_t − η_t Z_t π_t(h),

η_t Z^{T_n}_t = E[η] + ∫_0^{t∧T_n} (ν^n_s)^⊤ dI_s,

and taking the limit as t → ∞ yields a representation

Z̄^{-1}_n η Z_{T_n} = E[η] + ∫_0^{T_n} (ν^n_t)^⊤ dI_t.

By choice of Z̄_n, the left-hand side is a.s. equal to η, and since Φ^n and Z are Y_t-adapted, it follows that ν^n is also Y_t-adapted. The fact that Φ^n is previsible implies that it is progressively measurable, and hence, since π(h) is progressively measurable, the progressive measurability of ν^n follows. We have thus established that for η ∈ L²(Y^o_{T_n}, P) there is a representation

η = E[η] + ∫_0^{T_n} ν^⊤_t dI_t,   P-a.s.,  (2.26)

where ν is progressively measurable.

Taking the expectation of the square of (2.26), it follows that

E[(η − E[η])²] = E[∫_0^{T_n} ‖ν^n_s‖² ds].

Since η is a priori a square integrable random variable, the left-hand side is finite, and hence

E[∫_0^{T_n} ‖ν^n_s‖² ds] < ∞.

The representation of the form (2.26) must be unique; thus it follows that there exists ν_t such that ν_t = ν^n_t on {t ≤ T_n} for all n ∈ ℕ.


To complete the proof, let H be the set of all elements of L²(Y_∞, P) which have a representation of the form (2.20). By the foregoing argument, for any n, L²(Y^o_{T_n}, P) ⊆ H. The set H is closed, and since ⋃_{n∈ℕ} L²(Y^o_{T_n}, P) is dense in L²(Y_∞, P), it follows that H = L²(Y_∞, P). ⊓⊔

Exercise 2.33. To ensure you understand the above proof, simplify the proof of Proposition 2.31 in the case where, for ω not in some null set,

∫_0^t ‖π_r(h)‖² dr < K(t) < ∞,  (2.27)

where K(t) is independent of ω, a condition which is satisfied if h is bounded. In this case the condition (2.3) holds trivially.

Proposition 2.31 offers an easy route to showing that the filtration Y_t is right continuous.

Lemma 2.34. Let M = {M_t, t ≥ 0} be a right continuous Y_{t+}-adapted martingale that is bounded in L²(Ω); that is, M satisfies sup_{t≥0} E[M_t²] < ∞. Then M is Y_t-adapted and continuous.

Proof. By the martingale convergence theorem (Theorem B.1), M_t = E[M_∞ | Y_{t+}], and by Proposition 2.31,

M_∞ = E[M_∞] + ∫_0^∞ ν^⊤_s dI_s,

so using the fact that I_t is Y_t-adapted,

M_t = E[M_∞] + E[∫_0^∞ ν^⊤_s dI_s | Y_{t+}] = E[M_∞] + ∫_0^t ν^⊤_s dI_s,

from which it follows both that M_t is Y_t-measurable and that M is continuous. ⊓⊔

Theorem 2.35. The observation filtration is right continuous; that is, Y_{t+} = Y_t.

Proof. For a given t ≥ 0, let A ∈ Y_{t+}. Then the process M = {M_s, s ≥ 0} defined by

M_s ≜ 1_A − E[1_A | Y_t] for s ≥ t,   M_s ≜ 0 for s < t,


is a Y_{s+}-adapted right continuous martingale bounded in L²(Ω). Hence, by Lemma 2.34, M is also a continuous Y_s-adapted martingale. In particular 1_A − E[1_A | Y_t] is Y_t-measurable, so A ∈ Y_t. Hence Y_{t+} ⊆ Y_t, and the conclusion follows since t was arbitrarily chosen. ⊓⊔

Exercise 2.36. Let π = {π_t, t ≥ 0} be the Y_t-adapted process defined in Theorem 2.24. Prove that for any t ≥ 0, π_t has a σ(Y_s, 0 ≤ s ≤ t)-measurable modification.

2.6 Solutions to Exercises

2.4

i. Let H_t be the set on the right-hand side of (2.7). Since, for any G ∈ Y^o_t and N₁, N₂ ∈ N, (G \ N₁) ∪ N₂ ∈ Y_t, it follows that H_t ⊆ Y_t. Since Y^o_t and N are subsets of H_t and H_t is a σ-algebra, Y_t = Y^o_t ∨ N ⊆ H_t.
ii. From part (i) the result is true for ξ the indicator function of an arbitrary set in Y_t. By linearity the result holds for simple random variables, that is, for linear combinations of indicator functions of sets in Y_t. Finally, let ξ be an arbitrary Y_t-measurable function. Then there exists a sequence (ξ_n)_{n≥1} of simple random variables such that lim_{n→∞} ξ_n(ω) = ξ(ω) for any ω ∈ Ω. Let (η_n)_{n≥1} be the corresponding sequence of Y^o_t-measurable simple random variables such that, for any n ≥ 1, ξ_n(ω) = η_n(ω) for any ω ∈ Ω \ N_n, where N_n ∈ N. Define η = lim sup_{n→∞} η_n. Hence η is Y^o_t-measurable and ξ(ω) = η(ω) for any ω ∈ Ω \ (∪_{n≥1}N_n), which establishes the result.

2.20 The rational numbers ℚ are a dense subset of ℝ. We show that the set G ⊂ P(ℝ) of measures ∑_{k=1}^n α_k δ_{x_k}, for α_k ∈ ℚ⁺ and x_k ∈ ℚ for all k, with ∑_{k=1}^n α_k = 1, is dense in P(ℝ) with the weak topology. Given µ ∈ P(ℝ) we must find a sequence µ_n ∈ G such that µ_n ⇒ µ.

It is sufficient to show that we can find an approximating sequence µ_n in the space H of measures of the form ∑_{i=1}^∞ α_i δ_{x_i}, where α_i ∈ ℝ⁺, x_i ∈ ℚ and ∑_{i=1}^∞ α_i = 1. It is clear that each such measure in H is the weak limit of a sequence of measures in G.

We can cover ℝ by the countable collection of disjoint sets of the form [k/n, (k + 1)/n) for k ∈ ℤ. Define

µ_n ≜ ∑_{k=−∞}^∞ µ([k/n, (k + 1)/n)) δ_{k/n};

then µ_n ∈ H. Let g ∈ C_b(ℝ) be a uniformly continuous function. Define

a^n_k ≜ inf_{x∈[k/n,(k+1)/n)} g(x),   b^n_k ≜ sup_{x∈[k/n,(k+1)/n)} g(x).


As n → ∞, since g is uniformly continuous it is clear that sup_k |a^n_k − b^n_k| → 0. Thus as

µ_n g = ∑_{k=−∞}^∞ g(k/n) µ([k/n, (k + 1)/n)),

and

∑_{k=−∞}^∞ a^n_k µ([k/n, (k + 1)/n)) ≤ µg ≤ ∑_{k=−∞}^∞ b^n_k µ([k/n, (k + 1)/n)),

it follows that

|µ_n g − µg| ≤ ∑_{k=−∞}^∞ |b^n_k − a^n_k| µ([k/n, (k + 1)/n)) ≤ sup_k |b^n_k − a^n_k| → 0.

As this holds for all bounded uniformly continuous g, we have established (2) of Theorem 2.17 and thus µ_n ⇒ µ.

For the second part, define µ_n ≜ δ_n for n ∈ ℕ. This sequence does not converge weakly to any element of P(ℝ), but the sequence is Cauchy in d; hence the space (P(ℝ), d) is not complete.
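The discretisation step of this solution is easy to test numerically. The sketch below is our illustration; it assumes µ = N(0, 1) and the test function g(x) = cos x, for which the exact value µ(g) = e^{−1/2} is known, and it uses SciPy for the Gaussian distribution function.

```python
import numpy as np
from scipy.stats import norm

# mu_n = sum_k mu([k/n,(k+1)/n)) delta_{k/n}, as in the solution above;
# for mu = N(0,1) and g(x) = cos(x) the exact value is mu(g) = e^{-1/2}.
g = lambda x: np.cos(x)
mu_g = np.exp(-0.5)

for n in (1, 2, 5, 10, 50):
    k = np.arange(-20 * n, 20 * n)          # grid covering [-20, 20]
    weights = norm.cdf((k + 1) / n) - norm.cdf(k / n)
    mu_n_g = weights @ g(k / n)
    print(n, abs(mu_n_g - mu_g))            # the error shrinks as n grows
```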

2.21 Suppose that ζϕ_i is F/B(ℝ)-measurable for all i. To show that ζ is F/B(P(S))-measurable, it is sufficient to show that for all elements A of the neighbourhood basis of µ, the set ζ^{-1}(A) ∈ F. But the sets of the neighbourhood basis have the form given by (2.9). We show that the weak topology is also generated by the local neighbourhoods of µ of the form

B = ⋂_{i=1}^m {ν ∈ P(S) : |νϕ_{j_i} − µϕ_{j_i}| < ε},  (2.28)

where ε > 0 and j₁, . . . , j_m are elements of ℕ. Clearly the topology with this basis must be weaker than the weak topology. We establish the equivalence of the topologies if we also show that the weak topology is weaker than the topology with neighbourhoods of the form (2.28). To this end, consider an element A in the neighbourhood basis of µ in the weak topology,

A = ⋂_{i=1}^m {ν ∈ P(S) : |νf_i − µf_i| < ε};

we show that there is a neighbourhood of the form (2.28) which is a subset of A. Suppose no such subset exists; in this case we can find a sequence µ_n in P(S) such that µ_n ϕ_i → µϕ_i for all i, yet µ_n ∉ A for all n. But since {ϕ_i}_{i=1}^∞ is a convergence determining set, this implies that µ_n ⇒ µ, and hence µ_n f → µf for all f ∈ C_b(S), in which case for n sufficiently large µ_n must be in A, which is a contradiction. Thus we need only consider

ζ^{-1}(B) = ⋂_{i=1}^m {ω : |ζ(ω)ϕ_{j_i} − µϕ_{j_i}| < ε},


where ε > 0 and j₁, . . . , j_m ∈ ℕ. Since ζϕ_i is F/B(ℝ)-measurable, it follows that each element of the intersection is F-measurable and thus ζ^{-1}(B) ∈ F. Thus we have established that ζ is F/B(P(S))-measurable.

For the converse implication suppose that ζ is B(P(S))-measurable. We must show that ζf is B(ℝ)-measurable for any f ∈ C_b(S). For any x ∈ ℝ and ε > 0, the set {µ ∈ P(S) : |µf − x| < ε} is open in the weak topology on P(S), hence {ω : |ζf − x| < ε} is F-measurable; thus we have shown that (ζf)^{-1}(x − ε, x + ε) ∈ F. The open intervals (x − ε, x + ε) for all x ∈ ℝ, ε > 0 generate the open sets in ℝ, hence ζf is F/B(ℝ)-measurable.

2.22 Considering P(I) as a subset of ℝ^{|I|}, a continuous bounded function ϕ on the finite set I may be thought of as an element of ℝ^{|I|}, and µϕ is the dot product µ · ϕ.

If µ_n, µ ∈ P(I) and µ_n ⇒ µ, then by choosing the functions to be the basis vectors of ℝ^{|I|} we see that µ^n_i → µ_i as n → ∞ for all i ∈ I. Thus weak convergence in P(I) is equivalent to co-ordinatewise convergence in ℝ^{|I|}.

It is then clear that P(I) is separable, since ℚ^{|I|} ∩ P(I) is a countable dense subset of P(I).

Since (ℝ^{|I|}, d) is complete and P(I) is a closed subset of ℝ^{|I|}, the space (P(I), d) is complete; and since d is a metric for co-ordinatewise convergence in ℝ^{|I|}, it also metrizes weak convergence on P(I).

2.25 We know from Theorem 2.24 that πf is indistinguishable from the Y_{t+}-optional projection of f(X). As t is a bounded stopping time, for any t ∈ [0,∞),

E[f(X_t) | Y_{t+}] = ^o(f(X))_t   P-a.s.,

hence the result.

2.29 Parts (a) and (b) are similar to the argument given for the existence of the process π, but in this case taking f_i ∈ C_b(Ω, ℝ) and g_i = E[f_i | G], choosing some version of the conditional expectation. For (c) let {G_i} be a countable family generating G. Define K to be the (countable) π-system generated by these G_i. Clearly G = σ(K). Define

Ω′ ≜ {ω ∈ Ω : Q(ω, K) = 1_K(ω), ∀K ∈ K}.

Since

E[1_K | G] = 1_K,   P-a.s.,

it follows that P(Ω′) = 1. For ω ∈ Ω′ the set of G ∈ G on which Q(ω, G) = 1_G(ω) is a d-system; so by Dynkin's lemma (see A1.3 of Williams [272]), since this d-system includes the π-system K, it must include σ(K) = G. Thus for ω ∈ Ω′ it follows that

Q(ω, G) = 1_G(ω),   ∀G ∈ G.

To show that Q(ω, A_G(ω)) = 1, observe that this would follow immediately from the above if A_G(ω) ∈ G; but since it is defined in terms of an uncountable intersection, we must use the countable generating system to write


A_G(ω) = (⋂_{G_i : ω∈G_i} G_i) ∩ (⋂_{G_i : ω∉G_i} G_i^c),

and since the expression on the right-hand side is a countable intersection of elements of G, the result follows.

2.33 To keep the solution concise, consider the even simpler case where the process Z defined in (2.22) is itself a uniformly integrable martingale (the general case can be handled by defining the change of measure on each F_t to be given by Z_t, as in Section 3.3). Thus we define a change of measure via

dP̃/dP = Z_∞,

and consequently, under P̃, by Girsanov's theorem Y_t is a Brownian motion. Let η ∈ L²(Y_∞, P), and apply the martingale representation theorem to Z^{-1}_∞ η to obtain a previsible process Φ_t such that

Z^{-1}_∞ η = E^{P̃}[Z^{-1}_∞ η] + ∫_0^∞ Φ^⊤_t dY_t.

If we define a P̃-martingale via η_t = E^{P̃}[Z^{-1}_∞ η | Y_t], then by stochastic integration by parts,

d(Z_t η_t) = (Z_t Φ^⊤_t − η_t Z_t π_t(h^⊤)) dI_t;

consequently we may define ν_t = Z_t Φ_t − η_t Z_t π_t(h). We may integrate this to obtain

Z_t η_t = E[η] + ∫_0^t ν^⊤_s dI_s,

and passing to the t → ∞ limit,

η = Z_∞ η_∞ = E[η] + ∫_0^∞ ν^⊤_t dI_t.

2.36 Follow the same steps as in Lemma 2.23 for arbitrary fixed t ≥ 0, but consider the random variables g_i to be given by the (Kolmogorov) conditional expectations E[f_i(X_t) | σ(Y_s, 0 ≤ s ≤ t)] instead of the Y_t-optional projection. Then use Exercise 2.4 part (ii) to show that the two constructions give rise to the same (random) probability measure almost surely.

Alternatively, let π̃_t be the regular conditional distribution (in the sense of Definition A.2) of X_t given σ(Y_s, 0 ≤ s ≤ t). Then for any f ∈ B(S),

π̃_t f = E[f(X_t) | σ(Y_s, 0 ≤ s ≤ t)]

holds P-a.s. Following Exercise 2.25, using the right continuity of the filtration (Y_t)_{t≥0} and Exercise 2.4, for any f ∈ B(S),


π_t f = E[f(X_t) | Y_t] = E[f(X_t) | σ(Y_s, 0 ≤ s ≤ t)]   P-a.s.

Since S is a complete separable metric space, there exists a countable separating set A ⊂ C_b(S). Therefore there exists a null set N(A) such that for any ω ∈ Ω \ N(A) we have

π_t f(ω) = π̃_t f(ω)

for any f ∈ A. Therefore π_t(ω) = π̃_t(ω) for any ω ∈ Ω \ N(A).

2.7 Bibliographical Notes

The majority of the results about weak convergence and probability measures on metric spaces can be found in Prokhorov [246] and are part of the standard theory of probability measures.

The innovations argument originates in the work of Fujisaki, Kallianpur and Kunita [104]; however, there are some technical difficulties whose resolution is not clear from this paper, but which are discussed in detail in Meyer [205].


3 The Filtering Equations

3.1 The Filtering Framework

Let (Ω, F, P) be a probability space together with a filtration (F_t)_{t≥0} which satisfies the usual conditions. (See Section 2.1 for a definition of the usual conditions.) On (Ω, F, P) we consider an F_t-adapted process X = {X_t, t ≥ 0} which takes values in a complete separable metric space S (the state space). Let S be the associated Borel σ-algebra B(S). The process X is assumed to have cadlag paths. (See Appendix A.5 for details.) In the following, X is called the signal process. Let {X_t, t ≥ 0} be the usual augmentation with null sets of the filtration associated with the process X; in other words, define

Xt = σ(Xs, s ∈ [0, t]) ∨N , (3.1)

where N is the collection of all P-null sets of (Ω,F) and define

X ≜ ∨_{t∈ℝ⁺} X_t,  (3.2)

where the ∨ notation denotes taking the σ-algebra generated by the union ∪_t X_t. That is,

X = σ(⋃_{t∈ℝ⁺} X_t).

Recall that B(S) is the space of bounded B(S)-measurable functions. Let A : B(S) → B(S) and write D(A) for the domain of A, which is a subset of B(S). We assume that 1 ∈ D(A) and A1 = 0. This definition implies that if f ∈ D(A) then Af is bounded. This is a very important observation which is crucial for many of the bounds in this chapter.

Let π₀ ∈ P(S). Assume that X is a solution of the martingale problem for (A, π₀). In other words, assume that the distribution of X₀ is π₀ and that the process M^ϕ = {M^ϕ_t, t ≥ 0} defined as


Page 49: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

48 3 The Filtering Equations

M^ϕ_t = ϕ(X_t) − ϕ(X₀) − ∫_0^t Aϕ(X_s) ds,   t ≥ 0,  (3.3)

is an F_t-adapted martingale for any ϕ ∈ D(A). The operator A is called the generator of the process X.

Let h = (h^i)_{i=1}^m : S → ℝ^m be a measurable function such that

P(∫_0^t ‖h(X_s)‖ ds < ∞) = 1  (3.4)

for all t ≥ 0. Let W be a standard F_t-adapted m-dimensional Brownian motion on (Ω, F, P) independent of X, and let Y be the process satisfying the following evolution equation:

Y_t = Y_0 + ∫_0^t h(X_s) ds + W_t.  (3.5)

The condition (3.4) ensures that the Riemann integral in the definition of Y_t exists a.s. The process {Y_t, t ≥ 0} is the observation process. Let {Y_t, t ≥ 0} be the usual augmentation of the filtration associated with the process Y, viz

Yt = σ(Ys, s ∈ [0, t]) ∨N , (3.6)

Y = ∨_{t∈ℝ⁺} Y_t.  (3.7)

Note that since, by the measurability of h, the process Y is F_t-adapted, it follows that Y_t ⊂ F_t.

Remark 3.1. To simplify notation we have considered A and h as having no explicit time dependence. By addition of t as a component of the state vector X, most results immediately extend to the case when A and h are time dependent. The reason for adopting this approach is that it keeps the notation simple.

Definition 3.2. The filtering problem consists in determining the conditional distribution π_t of the signal X at time t given the information accumulated from observing Y in the interval [0, t]; that is, for ϕ ∈ B(S), computing

πtϕ = E[ϕ(Xt) | Yt]. (3.8)

As discussed in the previous chapter, we must choose a suitable regularisation of the process π = {π_t, t ≥ 0}, and by Theorem 2.24 we can do this so that π_t is an optional (and hence progressively measurable) Y_t-adapted probability measure-valued process for which (3.8) holds almost surely. While (3.8) was established for ϕ bounded, π_t as constructed is a probability measure-valued process, so it is quite legitimate to compute π_t ϕ when ϕ is unbounded,


provided that the expectation in question is well defined, in other words when π_t|ϕ| < ∞. In the following, Y_0 is considered to be identically zero (there is no information available initially). Hence π₀, the initial distribution of X, is identical with the conditional distribution of X₀ given Y₀, and we use the same notation for both:

π₀ϕ = ∫_S ϕ(x) P X₀^{-1}(dx).

In the following we deduce the evolution equation for π. We consider two possible approaches.

• The change of measure method. A new measure is constructed under which Y becomes a Brownian motion and π has a representation in terms of an associated unnormalised version ρ. This ρ is then shown to satisfy a linear evolution equation, which leads to the evolution equation for π by an application of Itô's formula.
• The innovation process method. The second approach isolates the Brownian motion driving the evolution equation for π (called the innovation process) and then identifies the corresponding terms in the Doob–Meyer decomposition of π.

Before we proceed, we first present two important examples of the above framework.

3.2 Two Particular Cases

We consider here two particular cases: one is a diffusion process and the second is a Markov chain with a finite state space.

The results in the chapter are stated in as general a form as possible, and the various exercises show how the results can be applied in these two particular cases. The exercises establish suitable conditions on the processes under which the general results of the chapter are valid. The process of verifying these conditions is sequential, and the exercises build upon the results of earlier exercises; thus they are best attempted in order. As usual, the solutions may be found at the end of the chapter.

3.2.1 X a Diffusion Process

Let X = (X^i)_{i=1}^d be the solution of a d-dimensional stochastic differential equation driven by a p-dimensional Brownian motion V = (V^j)_{j=1}^p:

X^i_t = X^i_0 + ∫_0^t f^i(X_s) ds + ∑_{j=1}^p ∫_0^t σ_{ij}(X_s) dV^j_s,   i = 1, . . . , d.  (3.9)


We assume that both f = (f^i)_{i=1}^d : ℝ^d → ℝ^d and σ = (σ_{ij})_{i=1,...,d; j=1,...,p} : ℝ^d → ℝ^{d×p} are globally Lipschitz; that is, there exists a positive constant K such that for all x, y ∈ ℝ^d we have

‖f(x) − f(y)‖ ≤ K‖x − y‖,   ‖σ(x) − σ(y)‖ ≤ K‖x − y‖,  (3.10)

where the Euclidean norm ‖·‖ is defined in the usual fashion for vectors, and extended to d × p matrices by considering them as dp-dimensional vectors, viz

‖σ‖ = (∑_{i=1}^d ∑_{j=1}^p σ²_{ij})^{1/2}.

Under the globally Lipschitz condition, (3.9) has a unique solution by Theorem B.38. The generator A associated with the process X is the second-order differential operator

A = ∑_{i=1}^d f^i ∂/∂x_i + ∑_{i,j=1}^d a_{ij} ∂²/∂x_i∂x_j,  (3.11)

where a = (a_{ij})_{i,j=1,...,d} : ℝ^d → ℝ^{d×d} is the matrix-valued function defined as

a_{ij} = ½ ∑_{k=1}^p σ_{ik}σ_{jk} = ½ (σσ^⊤)_{ij}  (3.12)

for all i, j = 1, . . . , d. Recall from the definition that Af must be bounded for f ∈ D(A). There are various possible choices of the domain. For example, we can choose D(A) = C²_k(ℝ^d), the space of twice differentiable, compactly supported, continuous functions on ℝ^d, since Aϕ ∈ B(ℝ^d) for all ϕ ∈ C²_k(ℝ^d), and the process M^ϕ = {M^ϕ_t, t ≥ 0} defined as in (3.3) is a martingale for any ϕ ∈ C²_k(ℝ^d).
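To illustrate the relationship between the SDE (3.9), the generator (3.11) and the martingale property (3.3), the following sketch (our illustration, with an assumed one-dimensional model, d = p = 1) checks numerically that E[ϕ(X_t)] − ϕ(x₀) ≈ t (Aϕ)(x₀) for small t, which is the infinitesimal form of the statement that M^ϕ has zero expectation.

```python
import numpy as np

f = lambda x: -x                          # Lipschitz drift (assumed example)
sig = lambda x: 0.4 + 0.1 * np.sin(x)     # Lipschitz diffusion coefficient
phi = lambda x: np.exp(-x**2)             # a smooth bounded test function

def A_phi(x, eps=1e-4):
    # A phi = f phi' + (1/2) sigma^2 phi'' as in (3.11), with the
    # derivatives approximated by central finite differences.
    d1 = (phi(x + eps) - phi(x - eps)) / (2 * eps)
    d2 = (phi(x + eps) - 2 * phi(x) + phi(x - eps)) / eps**2
    return f(x) * d1 + 0.5 * sig(x)**2 * d2

rng = np.random.default_rng(2)
x0, t, n_steps, n_paths = 0.3, 0.05, 50, 200_000
dt = t / n_steps
X = np.full(n_paths, x0)
for _ in range(n_steps):                  # Euler-Maruyama scheme for (3.9)
    X += f(X) * dt + sig(X) * rng.normal(0.0, np.sqrt(dt), n_paths)

# (E[phi(X_t)] - phi(x0)) / t should be roughly (A phi)(x0) for small t.
print((phi(X).mean() - phi(x0)) / t, A_phi(x0))
```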

Exercise 3.3. If the global Lipschitz condition (3.10) holds, show that there exists κ > 0 such that for x ∈ ℝ^d,

‖σ(x)‖² ≤ κ(1 + ‖x‖)²,  (3.13)
‖f(x)‖ ≤ κ(1 + ‖x‖).  (3.14)

Consequently show that there exists κ′ > 0 such that

‖σ(x)σ^⊤(x)‖ ≤ κ′(1 + ‖x‖²).  (3.15)

Exercise 3.4. Let SL²(ℝ^d) be the subset of all twice continuously differentiable real-valued functions on ℝ^d for which there exists a constant C such that for all i, j = 1, . . . , d and x ∈ ℝ^d we have


|∂_iϕ(x)| ≤ C/(1 + ‖x‖),   |∂_i∂_jϕ(x)| ≤ C/(1 + ‖x‖²).

Prove that Aϕ ∈ B(ℝ^d) for all ϕ ∈ SL²(ℝ^d), and that the process M^ϕ defined as in (3.3) is a martingale for any ϕ ∈ SL²(ℝ^d).

We can also choose D(A) to be the maximal domain of A. That is, D(A) is the set of all ϕ ∈ B(ℝ^d) for which Aϕ ∈ B(ℝ^d) and M^ϕ is a martingale. In the following, unless otherwise stated, we assume that D(A) is the maximal domain of A.

Remark 3.5. The following question is interesting to answer. Under what conditions is the solution of a martingale problem associated with the second-order differential operator defined in (3.11) the solution of the SDE (3.9)? The answer is surprisingly complicated. If D(A) contains sequences (ϕ^i_k)_{k>0}, (ϕ^{i,j}_k)_{k>0} of functions in C²_k(ℝ^d) such that ϕ^i_k(x) = x_i and ϕ^{i,j}_k(x) = x_i x_j for ‖x‖ ≤ k, then there exists a p-dimensional Brownian motion Ṽ defined on an extension (Ω̃, F̃, P̃) of (Ω, F, P) such that X is a weak solution of (3.9). For details see Proposition 4.6, page 315, together with Remark 4.12, page 318, in Karatzas and Shreve [149].

3.2.2 X a Markov Process with a Finite Number of States

Let X be an F_t-adapted Markov process with values in a finite state space I. Then B(S) is isomorphic to ℝ^{|I|} and the role of A is taken by the Q-matrix Q = {q_{ij}(t), i, j ∈ I, t ≥ 0} associated with the process. The Q-matrix is defined so that for all t, h ≥ 0, as h → 0, uniformly in t, for any i, j ∈ I,

P(X_{t+h} = j | X_t = i) = J_i(j) + q_{ij}(t)h + o(h).  (3.16)

In (3.16), J_i is the indicator function of the atom i. In other words, q_{ij}(t) is the rate at which the process jumps from site i to site j, and −q_{ii}(t) is the rate at which the process leaves site i. Assume that Q has the properties:

a. q_{ii}(t) ≤ 0 for all i ∈ I, and q_{ij}(t) ≥ 0 for all i, j ∈ I, i ≠ j.
b. ∑_{j∈I} q_{ij}(t) = 0 for all i ∈ I.
c. sup_{t≥0} |q_{ij}(t)| < ∞ for all i, j ∈ I.

Exercise 3.6. Prove that for all ϕ ∈ B(S), the process M^ϕ = {M^ϕ_t, t ≥ 0} defined as

M^ϕ_t = ϕ(X_t) − ϕ(X₀) − ∫_0^t Qϕ(s, X_s) ds,   t ≥ 0,  (3.17)

is an F_t-adapted right-continuous martingale. In (3.17), Qϕ : [0,∞) × I → ℝ is defined in a natural way as

(Qϕ)(s, i) = ∑_{j∈I} q_{ij}(s)ϕ(j),   for all (s, i) ∈ [0,∞) × I.
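As an illustration (ours, with an assumed time-homogeneous three-state chain), Qϕ is simply a matrix–vector product, and a path of X can be simulated from Q by drawing exponential holding times with rate −q_{ii} and jumping according to the off-diagonal rates.

```python
import numpy as np

# A hypothetical time-homogeneous Q-matrix on I = {0, 1, 2}: rows sum to
# zero, off-diagonal entries non-negative, as in properties a.-c. above.
Q = np.array([[-1.0,  0.7,  0.3],
              [ 0.2, -0.5,  0.3],
              [ 0.5,  0.5, -1.0]])
phi = np.array([0.0, 1.0, 2.0])
print(Q @ phi)                       # the function Q phi on I

rng = np.random.default_rng(3)

def simulate_path(i, horizon):
    # Hold in state i for an Exp(-q_ii) time, then jump to j with
    # probability q_ij / (-q_ii); record (jump time, new state).
    t, path = 0.0, [(0.0, i)]
    while True:
        rate = -Q[i, i]
        t += rng.exponential(1.0 / rate)
        if t > horizon:
            return path
        p = Q[i].copy()
        p[i] = 0.0
        p /= rate
        i = rng.choice(len(p), p=p)
        path.append((t, i))

print(simulate_path(0, 5.0))
```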


Exercise 3.7. The following is a simple example with real-world applications which fits within the above framework. Let X = {X_t, t ≥ 0} be the process

X_t = 1_{[T,∞)}(t),   t ≥ 0,

where T is a positive random variable with probability density p and tail probability

g_t = P(T ≥ t),   t > 0.

Prove that the Q-matrix associated with X has entries q₀₁(t) = −q₀₀(t) = p_t/g_t and q₁₁(t) = q₁₀(t) = 0. See Exercise 3.32 for more on how the associated filtering problem is solved.

Remark 3.8. We can think of T as the time of a certain event occurring, for example the failure of a piece of signal processing equipment, or the onset of a medical condition, which we would like to detect based on the information given by observing Y. This is the so-called change-detection filtering problem.
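A quick computational check of Exercise 3.7 (our sketch, under the assumption that T is exponentially distributed with rate λ): the jump rate q₀₁(t) = p_t/g_t is then the constant hazard rate λ, reflecting the memoryless property of the exponential distribution.

```python
import numpy as np

lam = 2.0
p = lambda t: lam * np.exp(-lam * t)   # density of T ~ Exp(lam)
g = lambda t: np.exp(-lam * t)         # tail probability P(T >= t)

t = np.linspace(0.1, 3.0, 5)
print(p(t) / g(t))                     # q_01(t) = p_t / g_t == lam for all t
```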

3.3 The Change of Probability Measure Method

This method consists in modifying the probability measure on Ω in order to transform the process Y into a Brownian motion by means of Girsanov's theorem. Let Z = {Z_t, t ≥ 0} be the process defined by

Z_t = exp(−∑_{i=1}^m ∫_0^t h^i(X_s) dW^i_s − ½ ∑_{i=1}^m ∫_0^t h^i(X_s)² ds),   t ≥ 0.  (3.18)

We need to introduce conditions under which the process Z is a martingale. The classical condition is Novikov's condition (see Theorem B.34): if

E[exp(½ ∑_{i=1}^m ∫_0^t h^i(X_s)² ds)] < ∞  (3.19)

for all t > 0, then Z is a martingale. Since (3.19) is quite difficult to verify directly, we use an alternative condition provided by the following lemma.

Lemma 3.9. Let ξ = {ξ_t, t ≥ 0} be a cadlag m-dimensional process such that

E[∑_{i=1}^m ∫_0^t (ξ^i_s)² ds] < ∞  (3.20)

and let z = {z_t, t ≥ 0} be the process defined as

z_t = exp(∑_{i=1}^m ∫_0^t ξ^i_s dW^i_s − ½ ∑_{i=1}^m ∫_0^t (ξ^i_s)² ds),   t ≥ 0.  (3.21)


If the pair (ξ, z) satisfies, for all t ≥ 0,

E[∑_{i=1}^m ∫_0^t z_s (ξ^i_s)² ds] < ∞,  (3.22)

then z is a martingale.

Proof. From (3.20), we see that the process

t ↦ ∑_{i=1}^m ∫_0^t ξ^i_s dW^i_s

is a continuous (square-integrable) martingale with quadratic variation process

t ↦ ∑_{i=1}^m ∫_0^t (ξ^i_s)² ds.

By Itô's formula, the process z satisfies the equation

z_t = 1 + ∑_{i=1}^m ∫_0^t z_s ξ^i_s dW^i_s.

Hence z is a non-negative, continuous local martingale, and therefore by Fatou's lemma a continuous supermartingale. To prove that z is a (genuine) martingale it is enough to show that it has constant expectation. Using the supermartingale property we note that

E[z_t] ≤ E[z₀] = 1.

By Itô's formula, for ε > 0,

z_t/(1 + εz_t) = 1/ε − 1/(ε(1 + εz_t))
             = 1/(1 + ε) + ∑_{i=1}^m ∫_0^t [z_s/(1 + εz_s)²] ξ^i_s dW^i_s
               − ∑_{i=1}^m ∫_0^t [εz²_s/(1 + εz_s)³] (ξ^i_s)² ds.  (3.23)

From (3.20) it follows that

E[∑_{i=1}^m ∫_0^t (z_s/(1 + εz_s)²)² (ξ^i_s)² ds]
  = E[∑_{i=1}^m ∫_0^t (1/ε²)(εz_s/(1 + εz_s))² (1/(1 + εz_s)²) (ξ^i_s)² ds]
  ≤ (1/ε²) E[∑_{i=1}^m ∫_0^t (ξ^i_s)² ds] < ∞,


hence the stochastic integral term in (3.23) is a martingale with zero expectation. Taking expectations in (3.23),

E[z_t/(1 + εz_t)] = 1/(1 + ε) − E[∑_{i=1}^m ∫_0^t (1/(1 + εz_s)²)(εz_s/(1 + εz_s)) z_s (ξ^i_s)² ds].  (3.24)

We now take the limit in (3.24) as ε tends to 0. From (3.22) we obtain our claim by means of the dominated convergence theorem. ⊓⊔

As we require Z to be a martingale in order to construct the change of measure, the preceding lemma suggests the following as a suitable condition to impose upon h:

E[∫_0^t ‖h(X_s)‖² ds] < ∞,   E[∫_0^t Z_s ‖h(X_s)‖² ds] < ∞,   ∀t > 0.  (3.25)

Note that, since X has cadlag paths, the process s ↦ h(X_s) is progressively measurable. Condition (3.25) implies conditions (2.3) and (2.4), and hence Y_t is right continuous and π_t has a Y_t-adapted progressively measurable version.

Exercise 3.10. Let X be the solution of (3.9). Prove that if (3.10) is satisfied and X₀ has finite second moment, then the second moment of ‖X_t‖ is bounded on any finite time interval [0, T]. That is, there exists G_T such that for all 0 ≤ t ≤ T,

E[‖X_t‖²] < G_T.  (3.26)

Further show that, under the same conditions, if X₀ has finite third moment then for any time interval [0, T] there exists H_T such that for 0 ≤ t ≤ T,

E[‖X_t‖³] < H_T.  (3.27)

[Hint: Use Gronwall’s lemma, in the form of Corollary A.40 in the appendix.]

Exercise 3.11. i. (Difficult) Let X be the solution of (3.9). Prove that ifcondition (3.10) is satisfied and X0 has finite second moment and h haslinear growth, that is, there exists C such that

‖h(x)‖2 ≤ C(1 + ‖x‖2) ∀x ∈ Rd, (3.28)

then (3.25) is satisfied.ii. Let X be the Markov process with values in the finite state space I as

described in Section 3.2. Then show that (3.25) is satisfied.

Proposition 3.12. If (3.25) holds then the process Z = Zt, t ≥ 0 is anFt-adapted martingale.

Proof. Condition (3.25) implies condition (3.22) of Lemma 3.9, which impliesthe result. ut

Page 56: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.3 The Change of Probability Measure Method 55

For fixed t ≥ 0, since Zt > 0 introduce a probability measure Pt on Ft byspecifying its Radon–Nikodym derivative with respect to P to be given by Zt,viz

dPt

dP

∣∣∣∣∣Ft

= Zt.

It is immediate from the martingale property of Z that the measures Pt forma consistent family. That is, if A ∈ Ft and T ≥ t then

PT (A) = E[ZT 1A] = E [E[ZT 1A | Ft]] = E [1AE[ZT | Ft]] = E[1AZt] = Pt(A),

where E denotes expectation with respect to the probability measure P, aconvention which we adhere to throughout this chapter. Therefore we candefine a probability measure P which is equivalent to P on

⋃0≤t<∞ Ft and we

are able to suppress the superscript t in subsequent calculations.It is important to realise that we have not defined a measure on F∞, where

F∞ =∞∨t=0

Ft = σ

⋃0≤t<∞

Ft

.

We cannot in general use the Daniel–Kolmogorov theorem here to extend thedefinition of P to F∞. Indeed there may not exist a measure defined on F∞which agrees with Pt on Ft for all 0 ≤ t <∞. For a more detailed discussionof why this extension may not be possible, see the discussion in Section B.3.1of the appendix.

Proposition 3.13. If condition (3.25) is satisfied then under P, the observa-tion process Y is a Brownian motion independent of X; additionally the lawof the signal process X under P is the same as its law under P.

Proof. By Corollary B.31 to Girsanov’s theorem, the process

Yt = Wt +∫ t

0

h(Xs) ds

is a Brownian motion with respect to P. Also, the law of the pair process(X,Y ) can be written as

(X,Y ) = (X,W ) +(

0,∫ t

0

h(Xs) ds),

thus on the interval [0, t] where t is arbitrary, the law of (X,W ) is absolutelycontinuous with respect to the law of the process (X,Y ), and its Radon–Nikodym derivative is Zt (see Exercise 3.14). That is, for any bounded mea-surable function f defined on the product of the corresponding path spacesfor the pair (X,Y ),

Page 57: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

56 3 The Filtering Equations

E [f(X,Y )Zt] = E[f(X,W )], (3.29)

where in (3.29) both processes are regarded up to time t. Hence

E[f(X,Y )] = E[f(X,Y )Zt] = E[f(X,W )]

and therefore X and Y are independent under P since (X,W ) has the samejoint distribution under P as (X,Y ) has under P and a priori X and W areindependent. ut

Exercise 3.14. i. Show that the process P = Pt, t ≥ 0 defined with β ∈Rm as

Pt = exp(iβ>Yt −

12‖β‖2t

)Zt

is a X ∨ Ft-martingale.ii. Deduce from (i) that for any n ≥ 1 and 0 ≤ t1 ≤ t2 ≤ · · · ≤ tn < ∞ and

any β1, . . . , βn ∈ Rm, we have

E

exp

n∑j=1

iβ>j Ytj

Ztn

∣∣∣∣∣∣X = E

exp

n∑j=1

iβ>j Wtj

∣∣∣∣∣∣X .

iii. Deduce from (ii) that (3.29) holds true for any bounded measurable func-tion f defined on the product of the corresponding path spaces for the pair(X,Y ).

Let Z = Zt, t ≥ 0 be the process defined as Zt = Z−1t for t ≥ 0. Under

P, Zt satisfies the following stochastic differential equation,

dZt =m∑i=1

Zthi(Xt) dY it (3.30)

and since Z0 = 1,

Zt = exp

(m∑i=1

∫ t

0

hi(Xs) dY is −12

m∑i=1

∫ t

0

hi(Xs)2 ds

), (3.31)

then E[Zt] = E[ZtZt] = 1, so Zt is an Ft-adapted martingale under P and wehave

dPdP

∣∣∣∣Ft

= Zt for t ≥ 0.

Proposition 3.13 implies that under P the observation process Y is a Yt-adapted Brownian motion; we can make use of the fact that Brownian motionis a Markov process to derive the following proposition.

Proposition 3.15. Let U be an integrable Ft-measurable random variable.Then we have

E[U | Yt] = E[U | Y]. (3.32)

Page 58: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.4 Unnormalised Conditional Distribution 57

Proof. Let us denote by

Y ′t = σ(Yt+u − Yt; u ≥ 0);

then Y = σ(Yt,Y ′t). Under the probability measure P the σ-algebra Y ′t ⊂ Y isindependent of Ft because Y is an Ft-adapted Brownian motion. Hence sinceU is Ft-adapted using property (f) of conditional expectation

E[U | Yt] = E[U | σ(Yt,Y ′t)] = E[U | Y].

ut

This proposition is an important step in the change of measure route toderiving the equations of non-linear filtering. It allows us to replace the time-dependent family of σ-algebras Yt in the conditional expectations with thefixed σ-algebra Y. This enables us to use techniques based on results fromKolmogorov conditional expectation which would not be applicable if theconditioning set were time dependent (as in the case of Yt). The propositionalso has an interesting physical interpretation: the solution of the filteringproblem for an Ft-adapted random variable U given all observations (future,present and past) is equal to E[U | Yt]; that is, future observations will notinfluence the estimator.

3.4 Unnormalised Conditional Distribution

In this section we first prove the Kallianpur–Striebel formula and use this todefine the unnormalized conditional distribution process.

The notation P(P)-a.s. in Proposition 3.16 means that the result holdsboth P-a.s. and P-a.s. We only need to show that it holds true in the firstsense since P and P are equivalent probability measures.

Proposition 3.16 (Kallianpur–Striebel). Assume that condition (3.25)holds. For every ϕ ∈ B(S), for fixed t ∈ [0,∞),

πt(ϕ) =E[Ztϕ(Xt) | Y]

E[Zt | Y]P(P)-a.s. (3.33)

Proof. It is clear from the definition that Zt ≥ 0; furthermore it is readilyobserved that

0 = E[1Zt=0Zt

]= E

[1Zt=0

]= P(Zt = 0),

whence it follows that Zt > 0 P-a.s. as a consequence of which E[Zt | Y] > 0 P-a.s. and the right-hand side of (3.33) is well defined. Hence using Proposition3.15 it suffices to show that

Page 59: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

58 3 The Filtering Equations

πt(ϕ)E[Zt | Yt] = E[Ztϕ(Xt) | Yt] P-a.s.

As both the left- and right-hand sides of this equation are Yt-measurable, thisis equivalent to showing that for any bounded Yt-measurable random variableb,

E[πt(ϕ)E[Zt | Yt]b] = E[E[Ztϕ(Xt) | Yt]b].

A consequence of the definition of the process πt is that πtϕ = E[ϕ(Xt) | Yt]P-a.s., so from the definition of Kolmogorov conditional expectation

E [πt(ϕ)b] = E [ϕ(Xt)b] .

Writing this under the measure P,

E[πt(ϕ)bZt

]= E

[ϕ(Xt)bZt

].

By the tower property of the conditional expectation, since by assumption thefunction b is Yt-measurable

E[πt(ϕ)E[Zt | Yt]b

]= E

[E[ϕ(Xt)Zt | Yt]b

]which proves that the result holds P-a.s. ut

Let ζ = ζt, t ≥ 0 be the process defined by

ζt = E[Zt | Yt], (3.34)

then as Zt is an Ft-martingale under P and Ys ⊆ Fs, it follows that for0 ≤ s < t,

E[ζt | Ys] = E[Zt|Ys] = E[E[Zt | Fs] | Ys

]= E[Zs | Ys] = ζs.

Therefore by Doob’s regularization theorem (see Rogers and Williams [248,Theorem II.67.7] since the filtration Yt satisfies the usual conditions we canchoose a cadlag version of ζt which is a Yt-martingale. In what follows, as-sume that ζt, t ≥ 0 has been chosen to be such a version. Given such a ζ,Proposition 3.16 suggests the following definition.

Definition 3.17. Define the unnormalised conditional distribution of X to bethe measure-valued process ρ = ρt, t ≥ 0 which is determined (see Theorem2.13) by the values of ρt(ϕ) for ϕ ∈ B(S) which are given for t ≥ 0 by

ρt(ϕ) , πt(ϕ)ζt.

Lemma 3.18. The process ρt, t ≥ 0 is cadlag and Yt-adapted. Further-more, for any t ≥ 0,

ρt(ϕ) = E[Ztϕ(Xt) | Yt

]P(P)-a.s. (3.35)

Page 60: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.4 Unnormalised Conditional Distribution 59

Proof. Both πt(ϕ) and ζt are Yt-adapted. By construction ζ, t ≥ 0 is alsocadlag.† By Theorem 2.24 and Corollary 2.26 πt, t ≥ 0 is cadlag and Yt-adapted, therefore the process ρt, t ≥ 0 is also cadlag and Yt-adapted.

For the second part, from Proposition 3.15 and Proposition 3.16 it followsthat

πt(ϕ)E[Zt | Yt] = E[Ztϕ(Xt) | Yt] P-a.s.,

From (3.34), E[Zt | Yt] = ζt a.s. from which the result follows. ut

It may be useful to point out that for general ϕ, the process ρt(ϕ) isnot a Yt-martingale but a semimartingale. This misconception arising from(2.8) is due to confusion with the well-known result that taking conditionalexpectation of an integrable random variable Z with respect to the family Ytgives rise to a (uniformly integrable) martingale E[Z | Yt]. But this is onlytrue for a fixed random variable Z which does not depend upon t.

Corollary 3.19. Assume that condition (3.25) holds. For every ϕ ∈ B(S),

πt(ϕ) =ρt(ϕ)ρt(1)

∀t ∈ [0,∞) P(P)-a.s. (3.36)

Proof. It is clear from Definition 3.17 that ζt = ρt(1). The result then followsimmediately. ut

The Kallianpur–Striebel formula explains the usage of the term unnor-malised in the definition of ρt as the denominator ρt(1) can be viewed as thenormalising factor. The result can also be viewed as the abstract version ofBayes’ identity in this filtering framework. In theory at least the Kallianpur–Striebel formula provides a method for solving the filtering problem.

Remark 3.20. The Kallianpur–Striebel formula (3.33) holds true for any Borel-measurable ϕ, not necessarily bounded, such that E [|ϕ(Xt)|] <∞; see Exer-cise 5.1 for details.

Lemma 3.21. i. Let ut, t ≥ 0 be an Ft-progressively measurable processsuch that for all t ≥ 0, we have

E[∫ t

0

u2s ds

]<∞; (3.37)

then, for all t ≥ 0, and j = 1, . . . ,m, we have

E[∫ t

0

us dY js

∣∣∣∣ Y] =∫ t

0

E[us | Y] dY js . (3.38)

† It is in fact the case that ζt = exp(∫ t

0πs(h>) dYs − 1

2

∫ t

0‖πs(h)‖2 ds

); see Lemma

3.29.

Page 61: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

60 3 The Filtering Equations

ii. Now let ut, t ≥ 0 be an Ft-progressively measurable process such that forall t ≥ 0, we have

E[∫ t

0

u2s d 〈Mϕ〉s

]<∞; (3.39)

then

E[∫ t

0

us dMϕs

∣∣∣∣ Y] = 0. (3.40)

Proof. i. Every εt from the total set St as defined in Lemma B.39 satisfiesthe following stochastic differential equation

εt = 1 +∫ t

0

iεsr>s dYs.

We observe the following sequence of identities

E[εtE

[∫ t

0

us dY js

∣∣∣∣ Y]] = E[εt

∫ t

0

us dY js

]= E

[∫ t

0

us dY js

]+ E

[∫ t

0

iεsrjsus ds

]= E

[E[∫ t

0

iεsrjsus ds

∣∣∣∣ Y]]= E

[∫ t

0

iεsrjs E[us | Y] ds

]= E

[εt

∫ t

0

E[us | Y] dY js

],

which completes the proof of (3.38).ii. Since for all ϕ ∈ D(A), Mϕ

t ,Ft is a square integrable martingale, wecan define the Ito integral with respect to it. The proof of (3.40) is similarto that of (3.38). We once again choose εt from the set St and obtainthe following sequence of identities (we use the fact that the quadraticcovariation between Mϕ

t and Y is 0).

Page 62: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.5 The Zakai Equation 61

E[εtE

[∫ t

0

us dMϕs

∣∣∣∣ Y]] = E[εt

∫ t

0

us dMϕs

]= E

[∫ t

0

us dMϕs

]+

m∑i=1

E⟨∫ ·

0

iεsrjs dY js ,

∫ ·0

us dMϕs

⟩t

= E[∫ t

0

us dMϕs

]+

m∑i=1

E∫ t

0

iεsrjsus d

⟨Mϕ· , Y

j·⟩s

= E[∫ t

0

us dMϕs

]= 0,

where the final equality follows from the fact that the condition (3.39)ensures that the stochastic integral is a martingale. ut

Exercise 3.22. Prove that if ϕ,ϕ2 ∈ D (A) then

〈Mϕ〉t =∫ t

0

(Aϕ2 − 2ϕAϕ

)(Xs) ds. (3.41)

Hence, show in this case that condition (3.37) implies condition (3.39) ofLemma 3.21.

3.5 The Zakai Equation

In the following, we further assume that for all t ≥ 0,

P[∫ t

0

[ρs(‖h‖)]2 ds <∞]

= 1. (3.42)

Exercise 3.25 gives some convenient conditions under which (3.42) holds forthe two example classes of signal processes considered in this chapter.

Exercise 3.23. Show that the stochastic integral∫ t

0ρs(ϕh>) dYs is well de-

fined for any ϕ ∈ B(S) under condition (3.42). Hence the process

t 7→∫ t

0

ρs(ϕh>) dYs,

is a local martingale with paths which are almost surely continuous, since itis Yt-adapted and (Yt)t≥0 is a Brownian filtration.

Page 63: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

62 3 The Filtering Equations

Theorem 3.24. If conditions (3.25) and (3.42) are satisfied then the processρt satisfies the following evolution equation, called the Zakai equation,

ρt(ϕ) = π0(ϕ) +∫ t

0

ρs(Aϕ)ds+∫ t

0

ρs(ϕh>) dYs, P-a.s. ∀t ≥ 0 (3.43)

for any ϕ ∈ D(A).

Proof. We first approximate Zt with Zεt given by

Zεt =Zt

1 + εZt.

Using Ito’s rule and integration by parts, we find

d(Zεt ϕ(Xt)

)= ZεtAϕ(Xt) dt+ Zεt dMϕ

t

− εϕ(Xt)(1 + εZt)−3Z2t ‖h(Xt)‖2 dt

+ ϕ(Xt)(1 + εZt)−2Zth>(Xt) dYt.

Since Zεt is bounded, (3.39) is satisfied; hence by Lemma 3.21

E[∫ t

0

Zεs dMϕs

∣∣∣∣ Y] = 0.

Also since

E

∫ t

0

ϕ2(Xs)1

(1 + εZs)2

1ε2

(εZs

1 + εZs

)2

‖h(Xs)‖2 ds

≤‖ϕ‖2∞ε2

E[∫ t

0

‖h(Xs)‖2 ds]

=‖ϕ‖2∞ε2

E[∫ t

0

Zs ‖h(Xs)‖2 ds]<∞,

where the final inequality is a consequence of (3.25). Therefore condition (3.37)is satisfied. Hence, by taking conditional expectation with respect to Y andapplying (3.38) and (3.40), we obtain

E[Zεt ϕ(Xt) | Y] =π0(ϕ)1 + ε

+∫ t

0

E[ZεsAϕ(Xs) | Y] ds

−∫ t

0

E[εϕ(Xs)(Zεt )2 1

(1 + εZs)‖h(Xs)‖2 | Y

]ds

+∫ t

0

E[Zεt

11 + εZs

ϕ(Xs)h>(Xs) | Y]

dYs. (3.44)

Page 64: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.5 The Zakai Equation 63

Now let ε tend to 0. We have, writing λ for Lebesgue measure on [0,∞),

limε→0

Zεt = Zt

limε→0

E[Zεt ϕ(Xt) | Y] = ρt(ϕ), P-a.s.

limε→0

E[ZεtAϕ(Xt) | Y] = ρt(Aϕ), λ⊗ P-a.e.

This last sequence remains bounded by the random variable ‖Aϕ‖∞E[Zt | Y],which can be seen to be in L1([0, t]×Ω;λ⊗ P) since

E[∫ t

0

‖Aϕ‖∞E[Zs | Y] ds]≤ ‖Aϕ‖∞

∫ t

0

E[Zs] ds ≤ ‖Aϕ‖∞t <∞.

Consequently by the conditional form of the dominated convergence theoremas ε→ 0,

E[∫ t

0

E[ZεsAϕ(Xs) | Y] ds∣∣∣∣ Y]→ E

[∫ t

0

ρs(Aϕ) ds∣∣∣∣ Y] , P-a.s.

Using the definition of ρt, we see that by Fubini’s theorem∫ t

0

E[ZεsAϕ(Xs) | Y] ds→∫ t

0

ρs(Aϕ) ds, P-a.s.

Next we have that for almost every t,

limε→0

εϕ(Xs)(Zεs )2(1 + εZs)−1‖h(Xs)‖2 = 0, P-a.s.,

and∣∣∣εϕ(Xs)(Zεs )2(1 + εZs)−1‖h(Xs)‖2∣∣∣

=

∣∣∣∣∣ϕ(Xs)Zs‖h(Xs)‖2εZs

1 + εZs

(1 + εZs

)−2∣∣∣∣∣

≤ ‖ϕ‖∞ Zs‖h(Xs)‖2. (3.45)

The right-hand side of (3.45) is integrable over [0, t]×Ω with respect to λ⊗ Pusing (3.25):

E[∫ t

0

Zs‖h(Xs)‖2 ds]

= E[∫ t

0

‖h(Xs)‖2 ds]<∞.

Thus we can use the conditional form of the dominated convergence theoremto obtain that

limε→0

∫ t

0

εE[ϕ(Xs)

(Zεs

)2

(1 + εZs)−1‖h(Xs)‖2 | Y]

ds = 0.

Page 65: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

64 3 The Filtering Equations

To complete the proof it only remains to show that as ε→ 0,∫ t

0

E[Zεs

11 + εZs

ϕ(Xs)h>(Xs) | Y]

dYs →∫ t

0

ρs(ϕh>) dYs. (3.46)

Consider the process

t 7→∫ t

0

E[Zεt

11 + εZs

ϕ(Xs)h>(Xs) | Y]

dYs; (3.47)

we show that this is a martingale. By Jensen’s inequality, Fubini’s theoremand (3.25),

E

[∫ t

0

E

[(Zεt

11 + εZs

ϕ(Xs)h>(Xs))2

| Y

]ds

]

≤ ‖ϕ‖2∞

ε2E[∫ t

0

E[‖h(Xs)‖2 | Y] ds]

= ε2‖ϕ‖2∞∫ t

0

E[‖h(Xs)‖2] ds

= ε2‖ϕ‖2∞E[∫ t

0

Zs‖h(Xs)‖2 ds]

<∞.

Thus the process defined in (3.47) is an Ft-martingale. From condition (3.42)and Exercise 3.23 the postulated limit process as ε→ 0,

t 7→∫ t

0

ρs(ϕh>) dYs, (3.48)

is a well defined local martingale. Thus the difference of (3.47) and (3.48) isa well defined local martingale,

t 7→∫ t

0

E

[εZ2

s (2 + εZs)(1 + εZs)2

ϕ(Xs)h>(Xs) | Y

]dYs. (3.49)

We use Proposition B.41 to prove that the integral in (3.49) converges to 0,P-almost surely. Since, for all i = 1, . . . ,m,

limε→0

εZ2s (2 + εZs)

(1 + εZs)2ϕ(Xs)hi(Xs) = 0, P-a.s.

and ∣∣∣∣∣Zs εZs

(1 + εZs)(2 + εZs)(1 + εZs)

ϕ(Xs)hi(Xs)

∣∣∣∣∣ ≤ 2‖ϕ‖∞Zs∣∣hi(Xs)

∣∣ , (3.50)

Page 66: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.5 The Zakai Equation 65

using (3.25) it follows that for Lebesgue a.e. s ≥ 0, the right-hand side is P-integrable, and hence it follows by the dominated convergence theorem thatfor almost every s ≥ 0,

limε→0

E

[εZ2

s (2 + εZs)(1 + εZs)2

ϕ(Xs)hi(Xs) | Y

]= 0, P-a.s.

As a consequence of (3.50),

E

[εZ2

s (2 + εZs)(1 + εZs)2

ϕ(Xs)hi(Xs) | Y

]≤ 2‖ϕ‖∞ρs(‖h‖),

and using the assumed condition (3.42), it follows that P-a.s.∫ t

0

(E

[εZ2

s (2 + εZs)(1 + εZs)2

ϕ(Xs)hi(Xs) | Y

])2

ds

≤ 4‖ϕ‖2∞∫ t

0

[ρs(‖h‖)]2 ds <∞.

Thus using the dominated convergence theorem for L2([0, t]), we obtain that∫ t

0

m∑i=1

(E

[εZ2

s (2 + εZs)(1 + εZs)2

ϕ(Xs)hi(Xs) | Y

])2

ds→ 0 P-a.s. (3.51)

Because this convergence only holds almost surely we cannot apply the Itoisometry to conclude that the stochastic integrals in (3.46) converge. However,Proposition B.41 of the appendix is applicable as a consequence of (3.51),which establishes the convergence in (3.46).† ut

Exercise 3.25. i. (Difficult) Let X be the solution of (3.9). Prove that if(3.10) is satisfied, X0 has finite third moment and h has linear growth(3.28), then (3.42) is satisfied. [Hint: Use the result of Exercise 3.10.]

ii. Let X be the Markov process with values in the finite state space I asdescribed in Section 3.2. Then (3.42) is satisfied.

Remark 3.26. If X is a Markov process with finite state space I, then theZakai equation is, in fact, a (finite-dimensional) linear stochastic differentialequation. To see this, let us define by ρit the mass that ρt puts on site i forany i ∈ I. In particular,

ρit = ρt(i)= E[Ji(Xt)Zt | Yt], i ∈ I,

† The convergence established in Proposition B.41 is in probability only. Thereforethe convergence in (3.46) follows for a suitably chosen sequence (εn) such thatεn → 0. The theorem follows by taking the limit in (3.44) as εn → 0.

Page 67: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

66 3 The Filtering Equations

where Ji is the indicator function of the singleton set i and for an arbitraryfunction ϕ : I → R, we have

ρt(ϕ) =∑i∈I

ϕ(i)ρit.

Hence the measure ρt and the |I|-dimensional vector (ρit)i∈I can be identifiedas one and the same object and from (3.43) we get that

ρt(ϕ) =∑i∈I

ϕ(i)ρit

=∑i∈I

ϕ(i)

πi0 +∫ t

0

∑j∈I

Qjiρjs ds+

m∑j=1

∫ t

0

ρishj(i) dY js

.

Hence ρt = (ρit)i∈I satisfies the |I|-dimensional linear stochastic differentialequation

ρt = π0 +∫ t

0

Q>ρs ds+m∑j=1

∫ t

0

Hjρs dY js , (3.52)

where, for j = 1, . . . ,m, Hj = diag(hj) is the |I| × |I| diagonal matrix withentries Hii = hji , and π0 is the |I|-dimensional vector with entries

πi0 = π0(i) = P (X0 = i) .

The use of the same notation for the vector and the corresponding measure iswarranted for the same reasons as above. Evidently, due to its linearity, (3.52)has a unique solution.

Exercise 3.27. Let X be a Markov process with finite state space I with as-sociated Q-matrix Q and π =

(πit)i∈I , t ≥ 0 be the conditional distribution

of X given the σ-algebra Yt viewed as a process with values in RI .

i. Deduce from (3.52) that the |I|-dimensional process π solves the following(non-linear) stochastic differential equation,

πt = π0 +∫ t

0

Q>πs ds

+m∑j=1

∫ t

0

(Hj − πs(hj)I|I|

)πs(dY js − πs(hj) ds), (3.53)

where I|I| is the identity matrix of size |I|.ii. Prove that (3.53) has a unique solution in the space of continuous Yt-

adapted |I|-dimensional processes.

Page 68: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.6 The Kushner–Stratonovich Equation 67

Remark 3.28. There is a corresponding treatment of the Zakai equation forthe case S = Rd and X is the solution of the stochastic differential (3.9).This be done in Chapter 7. In this case, ρt can no longer be associated witha finite-dimensional object (a vector). Under additional assumptions, it canbe associated with functions defined on Rd which represent the density ofthe measure ρt with respect to the Lebesgue measure. The analysis goes intwo steps. First one needs to make sense of the stochastic partial differentialequation satisfied by the density of ρt (the equivalent of (3.52)). That is, oneshows the existence and uniqueness of its solution in a suitably chosen spaceof functions. Next one shows that the measure with that given density solvesthe Zakai equation which we establish beforehand that it has a unique solu-tion. This implies that ρt has the solution of the stochastic partial differentialequation as its density with respect to the Lebesgue measure.

3.6 The Kushner–Stratonovich Equation

An equation has been derived for the unnormalised conditional distribution ρ.In order to solve the filtering problem the normalised conditional distributionπ is required. In this section an equation is derived which π satisfies. Thecondition (2.4) viz:

P(∫ t

0

‖πs(h)‖2 ds <∞)

= 1, for all t ≥ 0, (3.54)

turns out to be fundamental to the derivation of the Kushner–Stratonovichequation by various methods This technical condition (3.54) is unfortunatesince it depends on the process π which we are trying to find, rather thanbeing a direct condition on the system. It is, however, a consequence of thestronger condition which was required for the change of measure approach tothe derivation of the Zakai equation, which is the first part of (3.25), since πtis a probability measure for all t ∈ [0,∞).

Lemma 3.29. If conditions (3.25) and (3.42) are satisfied then the processt 7→ ρt(1) has the following explicit representation,

ρt(1) = exp(∫ t

0

πs(h>) dYs −12

∫ t

0

πs(h>)πs(h) ds). (3.55)

Proof. Because h is not bounded, it is not automatic that πt(h) is defined(h might not be integrable with respect to πt). However (3.25) ensures thatit is defined λ ⊗ P-a.s. which suffices. From the Zakai equation (3.43), sinceA1 = 0, one obtains that ρt(1) satisfies the following equation,

ρt(1) = 1 +∫ t

0

ρs(h>) dYs,

Page 69: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

68 3 The Filtering Equations

which gives

ρt(1) = 1 +∫ t

0

ρs(1)πs(h>) dYs.

We cannot simply apply Ito’s formula to log ρt(1) to conclude that ρt(1) hasthe explicit form (3.55), because the function x 7→ log x is not continuous atx = 0 (it is not even defined at 0) and we do not know a priori that ρt(1) > 0.

Using the fact that ρt(1) is non-negative, we use Ito’s formula to computefor ε > 0

d(

log√ε+ ρt(1)2

)=

ρt(1)2

ε+ ρt(1)2πt(h>) dYt

+12

ε− ρt(1)2

(ε+ ρt(1)2)2πs(h>)πs(h) dt

=ρt(1)2

ε+ ρt(1)2πt(h>)h(Xt)dt+

ρt(1)2

ε+ ρt(1)2πt(h>) dWt

+12

ε− ρt(1)2

(ε+ ρt(1)2)2πs(h>)πs(h) dt. (3.56)

From (3.25) the condition (2.4) is satisfied; thus∫ t

0

(ρs(1)2

ε+ ρs(1)2

)2

‖πs(h)‖2 ds ≤∫ t

0

‖πs(h)‖2 ds <∞ P-a.s.

and from (3.25) and (2.4)∫ t

0

πs(h>)h(Xs) ds ≤

√∫ t

0

‖πs(h)‖2 ds∫ t

0

‖h(Xs)‖2 ds P-a.s.

Thus s 7→ πs(h>)h(Xs) is integrable, so by dominated convergence the limitas ε→ 0 in (3.56) yields

d (log ρt(1)) = πt(h>) (h(Xt)dt+ dWt)− 12πt(h

>)πt(h) dt

= πt(h>) dYt − 12πt(h

>)πt(h) dt.

Integrating this SDE, followed by exponentiation yields the desired result. ut

Theorem 3.30. If conditions (3.25) and (3.42) are satisfied then the condi-tional distribution of the signal πt satisfies the following evolution equation,called the Kushner–Stratonovich equation,

πt(ϕ) = π0(ϕ) +∫ t

0

πs(Aϕ) ds

+∫ t

0

(πs(ϕh>)− πs(h>)πs(ϕ)

)(dYs − πs(h) ds), (3.57)

for any ϕ ∈ D(A).

Page 70: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.6 The Kushner–Stratonovich Equation 69

Proof. From Lemma 3.29 we obtain

1ρt(1)

= exp(−∫ t

0

πs(h>) dYs +12

∫ t

0

πs(h>)πs(h) ds)

d(

1ρt(1)

)=

1ρt(1)

[−πt(h>)dYt + πt(h>)πt(h)dt

]. (3.58)

By using (stochastic) integration by parts, (3.58), the Zakai equation forρt(ϕ) and the Kallianpur–Striebel formula, we obtain the stochastic differ-ential equation satisfied by πt,

πt(ϕ) = ρt(ϕ) · 1ρt(1)

dπt(ϕ) = πt(Aϕ)dt+ πt(ϕh>)dYt − πt(ϕ)πt(h>)dYt+ πt(ϕ)πt(h>)πt(h)dt− πt(ϕh>)πt(h)dt

which gives us the result. ut

Remark 3.31. The Zakai and Kushner–Stratonovich equations can be ex-tended for time inhomogeneous test functions. Let ϕ : [0,∞) × S → R bea bounded measurable function and let ϕt(·) = ϕ(t, ·) for any t ≥ 0. Then

ρt(ϕt) = π0(ϕ0) +∫ t

0

ρs(∂sϕs +Aϕs) ds+∫ t

0

ρs(ϕsh>) dYs (3.59)

πt(ϕt) = π0(ϕ0) +∫ t

0

πs(∂sϕs +Aϕs) ds

+∫ t

0

(πs(ϕsh>)− πs(h>)πs(ϕs))(dYs − πs(h) ds) (3.60)

for any ϕ ∈ D(A). This extension is carried out in Lemma 4.8.

Exercise 3.32. Consider once again the change detection filter introduced inExercise 3.7. Starting from the result of this exercise define an observationprocess

Yt =∫ t

0

Xs ds+Wt.

Show that the Kushner–Stratonovich equation for the process X takes theform

dπt(J1) = πt(J1)(1− πt(J1)) (dYt − πt(J1)dt) + (1− πt(J1))pt/gt dt. (3.61)

where J1 is the indicator function of the singleton set 1.

Page 71: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

70 3 The Filtering Equations

3.7 The Innovation Process Approach

Here we use the representation implied by Proposition 2.31 to derive theKushner–Stratonovich equation. The following corollary gives us a represen-tation for Yt-adapted martingales.

Corollary 3.33. Under the conditions of Proposition 2.31 every right contin-uous square integrable martingale which is Yt-adapted has a representation

ηt = η0 +∫ t

0

ν>s dIs t ≥ 0. (3.62)

Proof. Following Proposition 2.31, for any n ≥ 0, the Y∞-measurable (squareintegrable) random variable ηn − η0 has a representation of the form

ηn − η0 =∫ ∞

0

(νns )>dIs.

By conditioning with respect to Yt, for arbitrary t ∈ [0, n], we get that

ηt = η0 +∫ t

0

(νns )>dIs, t ∈ [0, n].

The result follows by observing that the processes νn, n = 1, 2, . . . mustbe compatible. That is, for any n,m > 0, νn and νm are equal on the set[0,min(n,m)]. ut

We therefore identify a square integrable martingale to which the corollary3.33 may be applied.

Lemma 3.34. Define Nt , πtϕ−∫ t

0πs(Aϕ) ds, then N is a Yt-adapted square

integrable martingale under the probability measure P.

Proof. Recall that πtϕ is indistinguishable from the Yt-optional projection ofϕ(Xt), hence let T be a bounded Yt-stopping time such that T (ω) ≤ K forall ω ∈ Ω. Then since Aϕ is bounded it follows that we can apply Fubini’stheorem combined with the definition of optional projection to obtain,

ENT = E

[πTϕ−

∫ T

0

πs(Aϕ) ds

]

= E[πTϕ]− E

[∫ K

0

1[0,T ](s)πs(Aϕ) ds

]

= E[ϕ(XT )]−∫ K

0

E[1[0,T ](s)πs(Aϕ)

]ds

= E[ϕ(XT )]−∫ K

0

E[1[0,T ](s)Aϕ(Xs)

]ds

= E[ϕ(XT )]− E

[∫ T

0

Aϕ(Xs) ds

].

Page 72: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.7 The Innovation Process Approach 71

Then using the definition of the generator A in the form of (3.3), we can findMϕt an Ft-adapted martingale such that

ENT = E[ϕ(XT )]− E [ϕ(XT )− ϕ(X0)−MϕT ]

= E[ϕ(X0)].

Thus since Nt is Yt-adapted, and this holds for all bounded Yt-stopping times,it follows by Lemma B.2 that N is a Yt-adapted martingale. Furthermore sinceAϕ is bounded for ϕ ∈ D(A), it follows that Nt is bounded and hence squareintegrable. ut

An alternative proof of Proposition 3.30 can now be given using the inno-vation process approach. The proposition is restated because the conditionsunder which it is proved via the innovations method differ slightly from thosein Proposition 3.30.

Theorem 3.35. If the conditions (2.3) and (2.4) are satisfied then the con-ditional distribution of the signal π satisfies the following evolution equation,

πt(ϕ) = π0(ϕ) +∫ t

0

πs(Aϕ) ds

+∫ t

0

(πs(ϕh>)− πs(h>)πs(ϕ)

)(dYs − πs(h) ds), (3.63)

for any ϕ ∈ D(A).

Proof. Let ϕ be an element of D(A). The process Nt = πtϕ−∫ t

0πs(Aϕ) ds is

by Lemma 3.34 a square integrable Yt-martingale. By assumption, condition(2.21) is satisfied, thus Corollary 3.33 allows us to find an integral representa-tion for Nt. This means that there exists a progressively measurable processν such that

Nt = EN0 +∫ t

0

ν>s dIs = π0(ϕ) +∫ t

0

ν>s dIs; (3.64)

thus using the definition of Nt, we obtain the following evolution equation forthe conditional distribution process π,

πt(ϕ) = π0(ϕ) +∫ t

0

πs(Aϕ) ds+∫ t

0

ν>s dIs. (3.65)

To complete the proof, it only remains to identify explicitly the process νt.Let ε = (εt)t≥0 be the process as defined in (B.19), Lemma B.39. Thus

dεt = iεtr>t dYt,

hence, by stochastic integration by parts (i.e. by applying Ito’s formula to theproducts πt(ϕ)εt and ϕ(Xt)εt)

Page 73: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

72 3 The Filtering Equations

πt(ϕ)εt = π0(ϕ)ε0 +∫ t

0

πs(Aϕ)εs ds+∫ t

0

ν>s εs dIs

+∫ t

0

πs(ϕ)iεsr>s (dIs + πs(h)ds) +∫ t

0

iεsr>s νs ds (3.66)

ϕ(Xt)εt = ϕ(X0)ε0 +∫ t

0

Aϕ(Xs)εs ds+∫ t

0

εs dMϕs +

∫ t

0

iεsr>s d 〈Mϕ,W 〉s

+∫ t

0

ϕ(Xs)iεsr>s (h(Xs)ds+ dWs) . (3.67)

Since we have assumed that the signal process and the observation processnoise are uncorrelated, 〈Mϕ, Y 〉t = 〈Mϕ,W 〉t = 0 consequently subtracting(3.67) from (3.66) and taking the expectation, all of the martingale termsvanish and we obtain∫ t

0

ir>s E [εs (νs − ϕ(Xs)h(Xs) + πs(h)πs(ϕ))] ds

= E [εt (πt(ϕ)− ϕ(Xt))] + E [ε0 (π0(ϕ)− ϕ(X0))]

+ E[∫ t

0

εs (Aϕ(Xs)− πs(Aϕ)) ds]

= E [εt (E [ϕ(Xt) | Yt]− ϕ(Xt))]= 0.

Hence, for almost all t ≥ 0,

E [εt (νt − ϕ(Xt)h(Xt) + πt(ϕ)πt(h))] = 0,

so since εt belongs to a total set it follows that

νt = πt(ϕh)− πt(ϕ)πt(h), P-a.s. (3.68)

Using the expression for πt(ϕ) given by (3.65) expressing the final term usingthe representation (3.64) with νt given by (3.68)

πt(ϕ) = π0(ϕ) +∫ t

0

πs(Aϕ) ds+∫ t

0

(πs(ϕh>

)− πs(ϕ)πs

(h>))

dIs, (3.69)

which is the Kushner–Stratonovich equation as desired. ut

The following exercise shows how the filtering equations can be derived ina situation which on first inspection does not appear to have an interpretationas a filtering problem, but which can be approached via the innovation processmethod.

Exercise 3.36. Define the Ft-adapted semimartingale α via

αt = α0 +∫ t

0

βs ds+ Vt, t ≥ 0

Page 74: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.8 The Correlated Noise Framework 73

and

δt = δ0 +∫ t

0

γs ds+Wt, t ≥ 0,

where βt and γt are bounded progressively measurable processes and whereW is an Ft-adapted Brownian motion which is independent of β and γ. DefineDt = σ(δs; 0 ≤ s ≤ t) ∨N . Find the equivalent of the Kushner–Stratonovichequation for πt(ϕ) = E [ϕ(αt) | Dt].

The following exercise shows how one can deduce the Zakai equation fromthe Kushner–Stratonovich equation. For this introduce the exponential mar-tingale Z = Zt, t > 0 defined by

Zt , exp(∫ t

0

πs(h>)

dYs −12

∫ t

0

‖πs(h)‖2 ds), t ≥ 0.

Exercise 3.37. i. Show that

d(

1Zt

)= − 1

Ztπt(h>)

dIt.

ii. Show that for any εt from the total set St as defined in Lemma B.39,

E[εt

Zt

]= E [εtZt] .

iii. Show that Zt = E[Zt | Yt

]= ρt(1) .

iv. Use the Kallianpur–Striebel formula to deduce the Zakai equation.

3.8 The Correlated Noise Framework

Hitherto the noise in the observations W has been assumed to be independentof the signal process X. In this section we extend the results to the case whenthis noise W is correlated to the signal.

As in the previous section, the signal process Xt, t ≥ 0 is the solution ofa martingale problem associated with the generator A. That is, for ϕ ∈ D(A),

Mϕt , ϕ(Xt)− ϕ(X0)−

∫ t

0

Aϕ(Xs) t ≥ 0

is a martingale.We assume that there exists a vector of operators B = (B1, . . . , Bm)> such

that Bi : B(S)→ B(S) for i = 1, . . . ,m. Let D(Bi) ⊆ B(S) denote the domainof the operator Bi. We require for each i = 1, . . . ,m that Bi1 = 0 and forϕ ∈ D(Bi),

〈Mϕ,W i〉t =∫ t

0

Biϕ(Xs) ds. (3.70)

Page 75: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

74 3 The Filtering Equations

Define

D(B) ,n⋂i=1

D(Bi).

Corollary 3.38. In the correlated noise case, the Kushner–Stratonovich equa-tion is

dπt(ϕ) = πt(Aϕ)dt+ (πt(h>ϕ)− πt(h>)πt(ϕ) + πt(B>ϕ))× (dYt − πt(h)dt), for all ϕ ∈ D(A) ∩ D(B). (3.71)

Proof. We now follow the innovations proof of the Kushner–Stratonovichequation. However, using (3.70) the term∫ t

0

iεsr>s d〈Mϕ,W 〉s =

∫ t

0

iεsr>s Bϕ(Xs) ds.

Inserting this term, we obtain instead of (3.68),

νt = πt(ϕh)− πt(ϕ)πt(h) + πt(Bϕ), P-a.s.

and using this in (3.65) yields the result. ut

Corollary 3.39. In the correlated noise case, for ϕ ∈ B(S), the Zakai equa-tion is

ρt(ϕ) = ρ0(ϕ) +∫ t

0

ρs(Aϕ) ds+∫ t

0

ρs((h> +B>)ϕ) dYs. (3.72)

Consider the obvious extension of the diffusion process example studiedearlier to the case where the signal process is a diffusion given by

dXt = b(Xt) dt+ σ(Xt) dVt + σ(Xt) dWt; (3.73)

thus σ is a d × m matrix-valued process. If σ ≡ 0 this case reduces to theuncorrelated case which was studied previously.

Corollary 3.40. When the signal process is given by (3.73), the operator B =(Bi)mi=1 defined by (3.70) is given for k = 1, . . . ,m by

Bk =d∑i=1

σik∂

∂xi. (3.74)

Proof. Denoting by A the generator of X,

Mϕt = ϕ(Xt)− ϕ(X0)−

∫ t

0

Aϕ(Xs) ds

=d∑i=1

∫ t

0

∂ϕ

∂xi(σdVs)i +

d∑i=1

∫ t

0

∂ϕ

∂xi(σdWs)i.

Page 76: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 75

Thus

〈Mϕ,W k〉t =d∑i=1

m∑j=1

∫ t

0

∂ϕ

∂xiσij d〈W j ,W k〉s

=d∑i=1

∫ t

0

∂ϕ

∂xiσik ds

and the result follows from (3.70). ut

3.9 Solutions to Exercises

3.3 From (3.10) with y = 0,

‖σ(x)− σ(0)‖ ≤ K‖x‖,

by the triangle inequality

‖σ(x)‖ ≤ ‖σ(x)− σ(0)‖+ ‖σ(0)‖≤ ‖σ(0)‖+K‖x‖.

Thus since (a+ b)2 ≤ 2a2 + 2b2,

‖σ(x)‖2 ≤ 2‖σ(0)‖2 + 2K2‖x‖2;

thus setting κ1 = max(2‖σ(0)‖2, 2K2), we see that

‖σ(x)‖2 ≤ κ1(1 + ‖x‖2).

Similarly from (3.10) with y = 0, and the triangle inequality, it follows that

‖f(x)‖ ≤ ‖f(0)‖+K‖x‖,

so setting κ2 = max(‖f(0)‖,K),

‖f(x)‖ ≤ κ2(1 + ‖x‖).

The result follows if we take κ = max(κ1, κ2).For the final part, note that

(σσ>)ij =p∑k=1

σikσjk,

hence |(σσ>)ij(x)| ≤ p‖σ‖2, consequently

‖σ(x)σ>(x)‖ ≤ pd2κ(1 + ‖x‖2);

Page 77: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

76 3 The Filtering Equations

thus we set κ′ = pd2κ to get the required result.

3.4 First we must check that Aϕ is bounded for ϕ ∈ SL2(Rd). By the resultof Exercise 3.3, with κ′ = κpd2/2,

‖a‖ = 12‖σ(x)σ>(x)‖ ≤ κ′(1 + ‖x‖2).

Hence

|Aϕ(x)| ≤d∑i=1

|fi(x)||∂iϕ(x)|+d∑

i,j=1

|aij(x)||∂i∂jϕ(x)|

≤d∑i=1

|fi(x)| C

1 + ‖x‖+

d∑i,j=1

|aij(x)| C

1 + ‖x‖2

≤ Cdκ+ Cpd2κ′ <∞,

so Aϕ ∈ B(Rd). By Ito’s formula since ϕ ∈ C2(Rd),

ϕ(Xt) = ϕ(X0) +∫ t

0

d∑i=1

∂iϕ(Xs)

f i(Xs) ds+p∑j=1

σij dV js

+

12

∫ t

0

d∑i,j=1

∂i∂jϕ(Xs)p∑k=1

σik(Xs)σjk(Xs) ds.

Hence

Mϕt =

d∑i=1

∫ t

0

∂iϕ(Xs)p∑j=1

σij(Xs) dV js ,

which is clearly a local martingale. Consider

d∑i=1

∫ t

0

|∂iϕ(Xs)|2∣∣∣∣∣∣p∑j=1

σij(Xs)

∣∣∣∣∣∣2

ds ≤ p∫ t

0

C2‖σ(Xs)‖2

(1 + ‖Xs‖)2ds

≤ C2p

∫ t

0

pd2κ(1 + ‖Xs‖2)(1 + ‖Xs‖)2

ds

≤ C2p2d2κt <∞.

Hence Mϕ is a martingale.

3.6 It is sufficient to show that for all i ∈ I, the process M i =M it , t ≥ 0

defined as

M it = Ji(Xt)− Ji(X0)−

∫ t

0

qXsi(s) ds, t ≥ 0,

where Ji is the indicator function of the singleton set i, is an Ft-adaptedright-continuous martingale. This is sufficient since

Page 78: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 77

Mϕ =∑i∈I

ϕ(i)M i, for all ϕ ∈ B(S).

Thus if M i is a martingale for i ∈ I then so is Mϕ which establishes theresult.

The adaptedness, integrability and right continuity of M it are straightfor-

ward. From (3.16) and using the Markov property for 0 ≤ s ≤ t,

P(Xt = i | Fs) = E(E(1Xt=i | Ft−h

)∣∣Fs)= E [P(Xt = i | Xt−h)| Fs]= E[Ji(Xt−h) | Fs] + E

[qXt−hi(t− h)

∣∣Fs]h+ o(h)

= P(Xt−h = i | Fs) + E[qXt−hi(t− h)

∣∣Fs]h+ o(h).

It is clear that we may apply this iteratively; the error term is o(h)/h whichby definition tends to zero as h → 0. Doing this and passing to the limit ash→ 0 we obtain

P(Xt = i | Xs) = Ji(Xs) + E[∫ t

s

qXri(r) dr∣∣∣∣Fs] .

Now

E[M it | Fs] = P(Xt = i | Fs)− Ji(X0)− E

[∫ t

0

qXri(r) dr∣∣∣∣Fs]

= Ji(Xs)− Ji(X0)−∫ s

0

qXri(r) dr

= M is.

It follows that M it is a martingale.

3.7 Clearly the state space of X is 0, 1. Once in state 1 the process neverleaves the state 1 hence q10(t) = q11(t) = 0. Consider the transition from state0 to 1,

P(Xt+h = 1 | Xt = 0) = P(T ≤ t+ h | T > t) =P(t < T ≤ t+ h)

P(T > t)

=ptgth+ o(h).

Thus q01(t) = pt/gt and hence q00(t) = −q01(t) = −pt/gt.

3.10 By Ito’s formula

d(‖Xt‖2

)= 2X>t (f(Xt)dt+ σ(Xt)dVt) + tr

(σ(Xt)σ>(Xt)

)dt, (3.75)

Thus if we define

Mt ,∫ t

0

2X>s σ(Xs) dVs,

Page 79: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

78 3 The Filtering Equations

this is clearly a local martingale. Take Tn a reducing sequence (see DefinitionB.4) such that MTn

t is a martingale for all n and Tn →∞. Integrating between0 and t ∧ Tn and taking expectation, EMt∧Tn = 0, hence

E‖Xt∧Tn‖2 = E‖X0‖2 + E∫ t∧Tn

0

2XTs f(Xs) + tr(σ(Xs)σ>(Xs)) ds.

By the results of Exercise 3.3,

E‖Xt∧Tn‖2 ≤ E‖X0‖2 + E∫ t∧Tn

0

2dκ‖Xs‖(1 + ‖Xs‖) + κ′(1 + ‖Xs‖2) ds

so setting c = max(2dκ, 2dκ+ κ′, κ′) > 0,

E‖Xt∧Tn‖2 ≤ E‖X0‖2 + cE∫ t∧Tn

0

(1 + ‖X‖s + ‖Xs‖2) ds.

But by Jensen’s inequality for p > 1, it follows that for Y a non-negativerandom variable

E[Y ] ≤ (E[Y p])1/p ≤ 1 + E[Y p].

Thus

1 + E‖Xt∧Tn‖2 ≤ 1 + E‖X0‖2 + 2c∫ t∧Tn

0

E[1 + ‖Xs‖2] ds,

and by Corollary A.40 to Gronwall’s lemma

1 + E‖Xt∧Tn‖2 ≤ (1 + E‖X0‖2)e2c(t∧Tn).

We may take the limit as n→∞ by Fatou’s lemma to obtain

E‖Xt‖2 ≤ (1 + E‖X0‖2)e2ct − 1, (3.76)

which establishes the result for the second moment.In the case of the third moment, applying Ito’s formula to f(x) = x3/2

and the process ‖Xt‖2 yields

d(‖Xt‖3

)= 3‖Xt‖

(2X>t (f(Xt)dt+ σ(Xt)dVt) + tr(σ(Xt)σ>(Xt))dt

)+

32‖Xt‖

X>t σ(Xt)σ>(Xt)Xtdt.

Define

Nt , 6∫ t

0

‖Xs‖X>s σ(Xs) dVs,

and let Tn be a reducing sequence for the local martingale Nt. Integratingbetween 0 and t ∧ Tn and taking expectation, we obtain for some constantc > 0 (independent of n, t) that

Page 80: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 79

E[‖Xt∧Tn‖3] ≤ E[‖X0‖3] + c

∫ t∧Tn

0

E[‖Xs‖+ ‖Xs‖2 + ‖Xs‖3] ds,

using Jensen’s inequality as before,

E[‖Xt∧Tn‖3] ≤ E[‖X0‖3] + 3c∫ t∧Tn

0

1 + E[‖Xs‖3] ds,

thus by Corollary A.40 to Gronwall’s lemma

E[‖Xt∧Tn‖3 + 1] ≤ E[‖X0‖3] + (1 + E‖X0‖3)e3c(t∧Tn),

passing to the limit as n→∞ using Fatou’s lemma

E[‖Xt‖3] ≤ (1 + E[‖X0‖3])e3ct − 1, (3.77)

and since E[‖X0‖3] <∞ (X0 has finite third moment) this yields the result.

3.11

i. As a consequence of the linear growth bound on h,

E(∫ t

0

‖h(Xs)‖2 ds)≤ CE

(∫ t

0

(1 + ‖Xs‖2) ds)≤ Ct+CE

∫ t

0

‖Xs‖2 ds.

It follows by Jensen’s inequality that

E[‖Xt‖2] ≤[E‖Xt‖3

]2/3.

Since the conditions (3.10) are satisfied and the second moment of X0 isfinite, we can use the bound derived in Exercise 3.7 as (3.76); viz

E[‖Xt‖2] ≤ (E‖X0‖2 + 1)e2ct.

Consequently for t ≥ 0,

E(∫ t

0

‖h(Xs)‖2 ds)≤ Ct+ C

(E[‖X0‖2 + 1]

e2ct − 12c

)<∞. (3.78)

This establishes the first of the conditions (3.25). For the second condition,using the result of (3.75), Ito’s formula yields

d(Zt‖Xt‖2

)= Zt

(2X>t (f(Xt)dt+ σ(Xt)dVt) + tr

(σ(Xt)σ>(Xt)

)dt)

− Zt‖Xt‖2h>(Xt)dYt.

Thus applying Ito’s formula to the function f(x) = x/(1 + εx) and theprocess Zt‖Xt‖2 yields

Page 81: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

80 3 The Filtering Equations

d(

Zt‖Xt‖2

1 + εZt‖Xt‖2

)=

1(1 + εZt‖Xt‖2)2 d

(Zt‖Xt‖2

)− ε

(1 + εZt‖Xt‖2)3

(Z2t ‖Xt‖4h>(Xt)h(Xt)

+ 4Z2tX>t σ(Xt)σ>(Xt)Xt

)dt. (3.79)

Integrating between 0 and t and taking expectation, the stochastic in-tegrals are local martingales; we must show that they are martingales.Consider first the term ∫ t

0

Zs2X>s σ(Xs)(1 + εZs‖Xs‖2)2 dVs;

to show that this is a martingale we must therefore establish that

E

∫ t

0

∥∥∥∥∥ Zs2X>s σ(1 + εZs‖Xs‖2)2

∥∥∥∥∥2

ds

= 4E

[∫ t

0

Z2sX>s σσ

>Xs

(1 + εZs‖Xs‖2)4 ds

]<∞.

In order to establish this inequality notice that

|X>t σ(Xt)σ>(Xt)Xt| ≤ d2‖Xt‖2‖σ(Xt)σ>(Xt)‖,

and from Exercise 3.3

‖σσ>‖ ≤ κ′(1 + ‖X‖2),

hence|Xtσ(Xt)σ(Xt)Xt| ≤ d2κ′‖Xt‖2

(1 + ‖Xt‖2

),

so the integral may be bounded by∫ t

0

Z2sX>s σσ

>Xs

(1 + εZt‖Xt‖2)4 ds ≤ κ′d2

∫ t

0

Z2s‖Xs‖2

(1 + ‖Xs‖2

)(1 + εZt‖Xs‖2)4 ds

= κ′d2

∫ t

0

Z2s‖Xs‖2

(1 + εZs‖Xs‖2)4 +Z2s‖Xs‖4

(1 + εZs‖Xs‖2)4 ds.

Considering each term of the integral separately, the first satisfies∫ t

0

Z2s‖Xs‖2

(1 + εZs‖Xs‖2)4 ds ≤∫ t

0

Zs ×Zs‖Xs‖2

(1 + εZt‖Xt‖2)× 1

(1 + εZt‖Xt‖2)3 ds

≤∫ t

0

Zsε

ds ≤ 1ε

∫ t

0

Zs ds.

Thus the expectation of this integral is bounded by t/ε, because E[Zs] ≤ 1.Similarly for the second term,

Page 82: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 81∫ t

0

[Zs‖Xs‖2

(1 + εZs‖Xs‖2)2

]2

ds ≤∫ t

0

Z2s‖Xs‖4

(1 + εZs‖Xs‖2)2 ×1

(1 + εZs‖Xs‖2)2 ds

≤ 1ε2t <∞.

For the second stochastic integral term,

−∫ t

0

Zs‖Xs‖2h>(Xs)(1 + εZs‖Xs‖2)2

dVs,

to show that this is a martingale, we must show that

E[∫ t

0

Z2s‖Xs‖4‖h(Xs)‖2

(1 + εZs‖Xs‖2)4ds]<∞.

Thus bounding this integral∫ t

0

Z2s‖Xs‖4‖h(Xs)‖2

(1 + εZs‖Xs‖2)4ds ≤

∫ t

0

(Zs‖Xs‖2

(1 + εZs‖Xs‖2)

)2 ‖h(Xs)‖2

(1 + εZt‖X‖2)2ds

≤ C

ε2

∫ t

0

‖h(Xs)‖2 ds.

Taking expectation, and using the result (3.78),

E[∫ t

0

Z2s‖Xs‖4‖h(Xs)‖2

(1 + εZs‖Xs‖2)4ds]≤ C

ε2E(∫ t

0

‖h(Xs)‖2 ds)<∞.

Therefore we have established that the stochastic integrals in (3.79) aremartingales and have zero expectation. Consider now the remaining terms;by an application of Fubini’s theorem, we see that

ddt

E[

Zt‖Xt‖2

1 + εZt‖Xt‖2

]≤ E

[Zt(2X>t f(Xt) + tr

(σ(Xt)σ>(Xt)

))1 + εZt‖Xt‖2

]

≤ K(

E[

Zt‖Xt‖2

1 + εZt‖Xt‖2

]+ 1),

where we used the fact that E[Zt] ≤ 1. Hence, by Corollary A.40 to Gron-wall’s inequality there exists Kt such that for 0 ≤ s ≤ t,

E[

Zs‖Xs‖2

1 + εZs‖Xs‖2

]≤ Kt <∞,

by Fatou’s lemma as ε→ 0,

E[Zs‖Xs‖2

]≤ Kt <∞.

Then by Fubini’s theorem

Page 83: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

82 3 The Filtering Equations

E[∫ t

0

Zs‖h(Xs)‖2 ds]

= E

[∫ t

0

Zs

m∑i=1

hi(Xs)2 ds

]

=∫ t

0

E[Zs ‖h(Xs)‖2

]ds

≤ C∫ t

0

E[Zs(1 + ‖Xs‖2

)]ds ≤ Ct(1 +Kt) <∞,

which establishes the second condition in (3.25).ii. Let H = maxi∈I |h(i)|, as the state space I is finite, it is clear that

H <∞. Therefore

E[∫ t

0

‖h(Xs)‖2 ds]≤ E[Ht] = Ht <∞,

which establishes the first condition of (3.25). For the second condition byFubini’s theorem and the fact that Zt ≥ 0,

E[∫ t

0

Zs‖h(Xs)‖2 ds]≤ H

∫ t

0

E[Zs] ds ≤ Ht <∞.

Thus both conditions in (3.25) are satisfied (E[Zs] ≤ 1 for any s ∈ [0,∞)).

3.14

i. It is clear that Pt is X ∨Ft-measurable and that it is integrable. Now for0 ≤ s ≤ t,

E [Pt | X ∨ Fs] = E[exp(iβ>Yt − 1

2‖β‖2t)Zt∣∣X ∨ Fs]

=E[exp(iβ>Yt − 1

2‖β‖2t)| X ∨ Fs

]E[Zt | X ∨ Fs

]= Z−1

s exp(iβ>Ys − 1

2‖β‖2s)

= Ps.

Hence Pt is a X ∨ Ft martingale under P.ii. For notational convenience let us fix t0 = 0 and define

li =n∑j=i

βj .

Since W is independent of X it follows that

Page 84: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 83

E

exp

n∑j=1

iβ>j Wtj

∣∣∣∣∣∣ X = E

exp

n∑j=1

iβ>j Wtj

= E

exp

n∑j=1

il>j (Wtj −Wtj−1)

= exp

n∑j=1

12‖lj‖

2(tj − tj−1)

.

For the left-hand side we write

E

exp

n∑j=1

iβ>j Ytj

Ztn

∣∣∣∣∣∣ ,X = E

exp

i n∑j=1

l>j (Ytj − Ytj−1)

Ztn

∣∣∣∣∣∣ X

= E[Zt1 exp

(il>1 Yt1

) Zt2 exp(il>2 Yt2

)Zt1 exp

(il>2 Yt1

)× · · · ×

Ztn exp(il>n Ytn

)Ztn−1 exp

(il>n Ytn−1

) ∣∣∣∣∣ X].

Write Pt(l) = exp(il>Yt − 1

2‖l‖2t)Zt; then

E

exp

n∑j=1

iβ>j Ytj

Ztn

∣∣∣∣∣∣ X

= E[Pt1(l1)

Pt2(l2)Pt1(l2)

· · ·Ptn−1(ln−1)Ptn−2(ln−1)

Ptn(ln)Ptn−1(ln)

∣∣∣∣ X]

×

exp

n∑j=1

12‖lj‖

2(tj − tj−1)

.From part (i) we know that Pt(l) is a X ∨Ft martingale for each l ∈ Rm;thus conditioning on X ∨ Ftn−1 ,

E[Pt1(l1)

Pt2(l2)Pt1(l2)

· · ·Ptn−1(ln−1)Ptn−2(ln−1)

Ptn(ln)Ptn−1(ln)

∣∣∣∣ X]= E

[Pt1(l1)Pt1(l2)

Pt2(l2)Pt2(l3)

· · ·Ptn−1(ln−1)Ptn−1(ln)

Ptn(ln)∣∣∣∣ X]

= E[Pt1(l1)Pt1(l2)

Pt2(l2)Pt2(l3)

· · ·Ptn−1(ln−1)Ptn−1(ln)

E[Ptn(ln) | X ∨ Ftn−1

] ∣∣∣∣ X]= E

[Pt1(l1)Pt1(l2)

Pt2(l2)Pt2(l3)

· · ·Ptn−2(ln−2)Ptn−2(ln−1)

Ptn−1(ln−1)∣∣∣∣ X] .

Page 85: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

84 3 The Filtering Equations

Repeating this conditioning we obtain

E[Pt1(l1)

Pt2(l2)Pt1(l2)

· · ·Ptn−1(ln−1)Ptn−2(ln−1)

Ptn(ln)Ptn−1(ln)

∣∣∣∣ X]= E [Pt1(l1) | X ]= E [E [Pt1(l1) | X ∨ Ft0 ] | X ]= E [Pt0(l1) | X ] = 1.

Hence

E

exp

n∑j=1

iβ>j Ytj

Ztn

∣∣∣∣∣∣ X = exp

n∑j=1

12‖lj‖

2(tj − tj−1)

,

which is the same as the result computed earlier for the right-hand side.iii. By Weierstrass’ approximation theorem any bounded continuous complex

valued function g(Yt1 , . . . , Ytp) can be approximated by a sequence as r →∞,

g(r)(Yt1 , . . . , Ytp) ,mr∑k=1

ark exp

i p∑j=1

(βrk,j

)>Ytj

.

Thus as a consequence of (ii) it follows that for such a function g,

E[g(Yt1 , . . . , Ytp)Zt | X ] = E[g(Yt1 , . . . , Ytp) | X ],

which since p was arbitrary by a further standard approximation argumentextends to any bounded Borel measurable function g,

E[g(Y )Zt | X ] = E[g(Y ) | X ].

Thus given f(X,Y ) bounded and measurable on the path spaces of X andY it follows that

E[f(X,Y )Zt] = E [E[f(X,Y )Zt | X ]] .

Conditional on X , f(X,Y ) may be considered as a function gX(Y ) on thepath space of Y and hence

E[f(X,Y )Zt] = E[E[gX(Y )Zt | X ]

]= E

[E[gX(W ) | X ]

]= E[f(X,W )].

3.22 The result (3.41) is immediate from the following identities,

ϕ(Xt) = ϕ(X0) +Mϕt +

∫ t

0

Aϕ(Xs) ds,

ϕ2(Xt) = ϕ2(X0) + 2∫ t

0

ϕ(Xs) dMϕs +

∫ t

0

2ϕAϕ(Xs) ds+ 〈Mϕ〉t ,

ϕ2(Xt) = ϕ2(X0) +Mϕ2

t +∫ t

0

Aϕ2(Xs) ds;

Page 86: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 85

thus

〈Mϕ〉t =∫ t

0

(Aϕ2 − 2ϕAϕ)(Xs) ds.

Hence (3.39) becomes∫ t

0

u2s(Aϕ

2 − 2ϕAϕ) ds ≤(‖Aϕ2‖∞ + 2‖ϕ‖∞‖Aϕ‖∞

) ∫ t

0

u2s ds <∞.

3.23 Since under P the process Y is a Brownian motion a sufficient conditionfor the stochastic integral to be well defined is given by (B.9) which in thiscase takes the form, for all t ≥ 0, that

P

[∫ t

0

d∑i=1

(ρs(ϕhi))2 ds <∞

]= 1.

But since ϕ ∈ B(Rd) it follows that∫ t

0

d∑i=1

ρs(ϕhi)2 ds ≤ ‖ϕ‖2∞∫ t

0

d∑i=1

ρs(hi)2 ds

≤ d‖ϕ‖2∞∫ t

0

ρs(‖h‖)2 ds.

Thus under (3.42) for all t ≥ 0

P[∫ t

0

ρs(‖h‖)2 ds <∞]

= 1,

and the result follows.

3.25

i. As a consequence of the linear growth condition (3.28) we have that

ρt(‖h‖) ≤ Cρt(√

1 + ‖Xt‖2),

and we prove thatt 7→ ρt

(√1 + ‖Xt‖2

)(3.80)

is uniformly bounded on compact intervals. The derivation of (3.44) did notrequire condition (3.42). We should like to apply this to ψ(x) =

√1 + ‖x‖2,

but while continuous this is not bounded. Thus choosing an approximatingtest function

ϕλ(x) =

√1 + ‖x‖2

1 + λ‖x‖2

in (3.44), we wish to take the limit as λ tends to 0 as ϕλ converges pointwiseto ψ. Note that

Page 87: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

86 3 The Filtering Equations∥∥∥∥∥ ϕλ(Xs)Zs(1 + εZs)2

h(Xs)

∥∥∥∥∥ =1ε

∥∥∥∥∥ϕλ(Xs)εZs

1 + εZs

11 + εZs

h(Xs)

∥∥∥∥∥≤ 1εϕλ(Xs)‖h(Xs)‖

≤√C

ε

(√1 + ‖Xs‖2

1 + λ‖Xs‖2√

1 + ‖Xs‖2)

≤√C

ε

(1 + ‖Xs‖2

).

Therefore we have the bound,∥∥∥∥∥E[

(ϕλ(Xs)− ψ(Xs))Zs(1 + εZs)2

h(Xs) | Y

]∥∥∥∥∥ ≤√C

ε

(1 + E[‖Xs‖2 | Y]

).

But by Proposition 3.13 since under P the process X is independent of Y ,and since the law of X is the same under P as it is under P, it follows that∥∥∥∥∥E

[(ϕλ(Xs)− ψ(Xs))Zs

(1 + εZs)2h(Xs) | Y

]∥∥∥∥∥ ≤√C

ε

(1 + E[‖Xs‖2]

). (3.81)

Using the result (3.76) of Exercise 3.10 conclude that

∫ t

0

(√C

ε

(1 + E‖Xs‖2

))2

ds ≤ C

ε2

(1 + E‖X0‖2

)2 ∫ t

0

e4cs ds <∞.

Thus by the dominated convergence theorem using the right-hand side of(3.81) as a dominating function, λ→ 0,∫ t

0

(E[ϕλ(Xs)Zs(1 + εZs)−2h(Xs) | Y

]− E

[ψ(Xs)Zs(1 + εZs)−2h(Xs) | Y

])2

ds→ 0;

thus using Ito’s isometry it follows that as λ→ 0,∫ t

0

E[ϕλ(Xs)Zs(1 + εZs)−2h(Xs) | Y

]dYs

→∫ t

0

E[ϕ(Xs)Zs(1 + εZs)−2h(Xs) | Y

]dYs → 0,

whence we see that (3.44) holds for the unbounded test function ψ. Thisψ is not contained in D(A) since it is not bounded; however, computingusing (3.11) directly

Page 88: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 87

Aψ =1ψ

(f>x+

12

tr(σσ>)− 12ψ2

(X>σσ>X)).

Thus using the bounds in (3.14) and (3.15) which follow from (3.10),

|Aψ|/ψ ≤ 1ψ2

(κd(1 + ‖X‖)‖X‖+ 1

2κ′(1 + ‖X‖2) + 1

2κ′d2‖X‖2

)≤ 1

2κ′ + κd+ 1

2d2κ′.

For future reference we define

kA , 12κ′ + κd+ 1

2d2κ′. (3.82)

We also need a simple bound which follows from (3.26) and Jensen’s in-equality

E[Ztψ(Xt)] = E[ψ(Xt)] ≤√

1 + E[‖Xt‖2] ≤√

1 +Gt. (3.83)

In the argument following (3.47) the stochastic integral in (3.44) was shownto be a Yt-adapted martingale under the measure P. Therefore for 0 ≤ r ≤t,

E[E[Zεt ψ(Xt) | Y]− π0(ψ)

1 + ε+∫ t

0

E[ZεsAψ(Xs) | Y] ds

−∫ t

0

E[εψ(Xs)

(Zs

)2 (1 + εZs

)−3

‖h(Xs)‖2 | Y]

ds∣∣∣∣ Yr]

= E[Zεrψ(Xr) | Y]− π0(ψ)1 + ε

+∫ r

0

E[ZεsAψ(Xs) | Y

]ds

−∫ r

0

E[εψ(Xs)

(Zs

)2 (1 + εZs

)−3

‖h(Xs)‖2 | Y]

ds.

Then we can take the limit on both sides of this equality as ε → 0. Forthe term E[Zεt ψ(Xt) | Y] the limit follows by monotone convergence. Forthe term involving π0(ψ), since X0 has finite third moment,

E(π0(ψ)) = E[ψ(X0)] <√

1 + E‖X0‖2 <∞, (3.84)

the limit follows by the dominated convergence theorem. For the integralinvolving the generator A we use the bound (3.82) to construct a domi-nating function since using (3.83) it follows that E[ZtkAψ(Xt)] < ∞, thelimit then follows by the dominated convergence theorem. This only leavesthe integral term which does not involve A; as this is not monotone in εwe must construct a dominating function. As a consequence of (3.28) andthe definition of ψ(x),

Page 89: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

88 3 The Filtering Equations∣∣∣∣∣ εψ(Xs)Z2s

(1 + εZs)3‖h(Xs)‖2

∣∣∣∣∣ =

∣∣∣∣∣ψ(Xs)Zs‖h(Xs)‖2εZs

1 + εZs

(1 + εZs

)−2∣∣∣∣∣

≤ ψ(Xs)Zs‖h(Xs)‖2

≤ CZs(1 + ‖Xs‖2)1/2(1 + ‖Xs‖2)

≤ CZs(1 + ‖Xs‖2)3/2.

and use the fact that the third moment of ‖Xt‖ is bounded (3.27) to seethat this is a suitable dominating function. Hence as ε→ 0,∫ t

0

E[εϕ(Xs)

(Zs

)2 (1 + εZs

)−3

‖h(Xs)‖2 | Y]

ds→ 0,

and thus passing to the ε→ 0 limit we obtain that

Mt , ρt(ψ)− π0(ψ) +∫ t

0

ρs(Aψ) ds (3.85)

satisfies E[Mt | Fr] = Mr for 0 ≤ r ≤ t, and Mt is Yt-adapted. To showthat Mt is a martingale, it only remains to show that E|Mt| <∞, but thisfollows from the fact that for s ∈ [0, t] using (3.83),

E[ρt(ψ)] = E[E[Ztψ(Xt) | Y]

]= E(Ztψ(Xt)) <∞,

together with the bounds (3.82) and (3.84) this implies

E[|Mt|] ≤ E(ρt(ψ))) + E[π0(ψ)] + kA

∫ t

0

E[ρs(ψ)] ds <∞

≤√

1 +Gt(1 + kat) +√

1 + E‖X0‖2 <∞.

But since ρt(ψ) is cadlag (from the properties of ρt) it follows that Mt

is a cadlag Yt-adapted martingale under P. Finally we use the fact thata cadlag martingale has paths which are bounded on compact intervalsin time (a consequence of Doob’s submartingale inequality, see Theo-rem 3.8 page 13 of Karatzas and Shreve [149] for a proof) to see thatP(sups∈[0,t] |Mt| <∞) = 1. Then for ω fixed we have from (3.82) that

|ρt(ψ)| ≤ sups∈[0,t]

|Mt|+ |π0(ψ)|+ kA

∫ t

0

|ρs(ψ)|ds,

so Gronwall’s inequality implies that

|ρt(ψ)(ω)| ≤

(sups∈[0,t]

|Mt|+ |π0(ψ)|

)ekAt,

whence for ω not in a null set ρs(ψ) is bounded for s ∈ [0, t]. Hence theresult.

Page 90: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 89

ii. Setting H = maxi∈I ‖h(i)‖, since I is finite, H <∞, thus using the factthat ρs is a probability measure∫ t

0

ρs(‖h‖)2 ds ≤ H2

∫ t

0

ρs(1)2 ds.

From (3.44) with ϕ = 1, since A1 = 0,

E[Zεt | Y] =π0(1)1 + ε

−∫ t

0

E[ε(Zεt )2 1

(1 + εZs)‖h(Xs)‖2 | Y

]ds

+∫ t

0

E[Zεt

11 + εZs

h>(Xs) | Y]

dYs.

Taking conditional expectation with respect to Yr for 0 ≤ r ≤ t,

E(

E[Zεt ] +∫ t

0

E[ε(Zεt )2 1

(1 + εZs)‖h(Xs)‖2 | Y

]ds∣∣∣∣ Yr)

= E[Zεt | Y] +∫ r

0

E[ε(Zεt )2 1

(1 + εZs)‖h(Xs)‖2 | Y

]ds.

Since ‖h‖ ≤ H, it is straightforward to pass to the limit as ε → 0 whichyields ρt(1) is a Yt-martingale. As in case (i) above then this has a cadlagversion which is a.s. bounded on finite intervals. Thus∫ t

0

ρs(1) ds <∞ P-a.s.,

which establishes (3.42) since the measures P and P are equivalent on Ftand thus have the same null sets.

3.27

i. Observe first that (using the properties of the matrix Q):

ρt(1) =∑i∈I

ρit = 1 +m∑j=1

∫ t

0

ρs(hj)

dY js .

Next apply Ito’s formula and integration by parts to obtain the evolutionequation of

πit =ρit∑i∈I ρ

it

.

ii. Assume that there are two continuous Yt-adapted |I|-dimensional pro-cesses, π and π, solutions of the equation (3.53). Show that the processescontinuous Yt-adapted |I|-dimensional processes ρ and ρ defined as

Page 91: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

90 3 The Filtering Equations

ρt = exp

m∑j=1

∫ t

0

πs(hj) dY js − 12

∫ t

0

πs(hj)2 ds

πt, t ≥ 0

ρt = exp

m∑j=1

∫ t

0

πs(hj) dY js − 12

∫ t

0

πs(hj)2 ds

πt, t ≥ 0

satisfy equation (3.52) hence must coincide. Hence their normalised versionmust do so, too. Note that the continuity and the adaptedness of the pro-cesses are used to ensure that the stochastic integrals appearing in (3.52)and, respectively, (3.53) are well defined.

3.32 It is easiest to start from the finite-dimensional form of the Kushner–Stratonovich equation which was derived as (3.53). The Markov chain hastwo states, 0 and 1 depending upon whether the event is yet to happen, orhas happened. Since it is clear that π0

t + π1t = 1, then it suffices to write the

equation for the component corresponding to the state 1 as this is π1t = πt(J1).

Then h is given by 1T≤t and hence h = J1. Writing the equation for state1,

π1t = π1

0 +∫ t

0

(q01π0s + q11π

1s) ds+

∫ t

0

(h(1)− π1s(h))π1

s(dYs − π1sds)

= π10 +

∫ t

0

(1− π1s)pt/gt ds+

∫ t

0

(1− π1s)π1

s(dYs − π1sds).

3.36 Since β is bounded for ϕ ∈ C2b (R) by Ito’s formula

ϕ(αt)− ϕ(α0) =∫ t

0

Asϕ(αs) ds+Mϕt ,

where

As = βs∂

∂x+

12∂2

∂x2,

and Mϕt =

∫ t0ϕ′(Xs) dVs is an Ft-adapted martingale.

Analogously to Theorem 2.24, we can define a probability measure-valuedprocess πt, such that for ft a bounded Ft-adapted process, π(ft) is a versionof the Dt-optional projection of ft. The equivalent of the innovations processIt for this problem is

It , δt −∫ t

0

πs(γs) ds,

which is a Dt-adapted Brownian motion under P. By the representation result,Proposition 2.31, we can find a progressively measurable process νt such that

πt(ϕ(αt))−∫ t

0

πs(Asϕ(αs)) ds = π0(ϕ(α0)) +∫ t

0

νs dIs,

Page 92: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.9 Solutions to Exercises 91

therefore it follows that

πt(ϕ(αt)) = π0(ϕ(α0)) +∫ t

0

πs(Asϕ(αs)) ds+∫ t

0

νs dIs.

As in the innovations proof of the Kushner–Stratonovich equation, to identifyν, we can compute d(πt(ϕ(αt))εt) and d(εtϕ(αt)) whence subtracting andtaking expectations and using the independence of W and V we obtain that

νt = πt(γtϕ(αt))− π(γt)π(ϕ(αt)),

whence

πt(ϕ(αs)) = π0(ϕ(α0)) +∫ t

0

πs(βsϕ

′(αs) + 12ϕ′′(αs)

)ds

+∫ t

0

(πs(γsϕ(αs))− πs(γs)πs(ϕ(αs))) (dδs − πs(γs)ds).

3.37

i. By Ito’s formula

d(Z−1t ) = Z−1

t (−πt(h>)dYt + 12‖πt(h)‖2dt) + 1

2 Z−1t ‖πt(h)‖2dt

= −Z−1t πt(h>)(dYt − πt(h)dt)

= −Z−1t πt(h>)dIt.

ii. Let εt ∈ St be such that dεt = iεtr>dYt and apply Ito’s formula to the

product d(εtZ−1t ) which yields

d(εtZ−1t ) = −εtZ−1

t πt(h>)dIt + iZ−1t εtr

>t dYt − iεtZ−1

t 〈r>t dYt, πt(h>)dIt〉= εtZ

−1t

(−πt(h>)dIt + ir>t dYt − ir>t πt(h)ds

)= εtZ

−1t

(−πt(h>) + ir>t

)dIt.

Since by Proposition 2.30 the innovation process It is a Yt-adapted Brow-nian motion under the measure P it follows that taking expectation

E[εtZ−1t ] = E[ε0Z

−10 ] = 1.

Now consider

E[Ztεt] = E[εt] = E[1 +

∫ t

0

iεsr>s dYs

]= 1,

since Yt is a Brownian motion under P. Thus

E[Z−1t εt] = E[Ztεt].

Page 93: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

92 3 The Filtering Equations

iii. It follows from the result of the previous part that

E[Ztεt/Zt

]= E[ZtZtεt].

HenceE[εt

(Z−1t Zt − 1

)]= 0.

Clearly Zt and εt are Yt-measurable

E[εt

(Z−1t E[Zt | Yt]− 1

)]= 0.

Since Z−1t E[Zt | Yt] − 1 is Yt-measurable, it follows from the total set

property of St that

Z−1t E[Zt | Yt] = 1, P-a.s.

Since Zt > 0 it follows that

Zt = E[Zt | Yt].

We may drop the a.s. qualification since it is implicit from the fact thatconditional expectations are only defined almost surely.

iv. By the Kallianpur–Striebel formula P-a.s. using the result of part (iii)

πt(ϕ) =ρt(ϕ)ρt(1)

= Z−1t ρt(ϕ).

Hence ρt(ϕ) = Ztπt, and note that by a simple application of Ito’s formuladZt = Ztπt(h>)dYt. Starting from the Kushner–Stratonovich equation

dπt(ϕ) = πt(Aϕ)dt+ πt(ϕh>)dIt − πt(ϕ)πt(h>)dIt.

Applying Ito’s formula to the product Ztπt we find

dρt(ϕ) = dπt(ϕ)Zt + πtZtπt(h>)dYt + d〈Zt, πt(ϕ)〉=(πt(Aϕ)dt+ πt(ϕh>)dIt − πt(ϕ)πt(h>)dIt

)Zt + πt(ϕ)Ztπt(h>)dYt

+ Ztπt(h)(πt(ϕh>)− πt(ϕ)πt(h>))dt

= Zt(πt(Aϕ)dt+ πt(ϕh>)dYt

)= ρt(Aϕ)dt+ ρt(ϕh>)dYt.

But this is the Zakai equation as required.

Page 94: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

3.10 Bibliographical Notes 93

3.10 Bibliographical Notes

In [160], Krylov and Rozovskii develop the theory of strong solutions of Itoequations in Banach spaces and use this theory to deduce the filtering equa-tions in a different manner from the two methods presented here.

In [163], Krylov and Zatezalo deduce the filtering equations using a PDE,rather than probabilistic, approach. They use extensively the elaborate theo-retical framework for analyzing SPDEs developed by Krylov in [157] and [158].The approach requires boundedness of the coefficients and strict ellipticity ofthe signal’s diffusion matrix.

Page 95: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4

Uniqueness of the Solution to the Zakai andthe Kushner–Stratonovich Equations

The conditional distribution of the signal π = πt, t ≥ 0 is a solution of theKushner–Stratonovich equation, whilst its unnormalised version ρ = ρt, t ≥0 solves the Zakai equation. It then becomes natural to ask whether the Zakaiequation uniquely characterizes ρ, and the Kushner–Stratonovich equationuniquely characterizes π. In other words, we should like to know under whatassumptions on the coefficients of the signal and observation processes the twoequations have a unique solution. The question of uniqueness of the solutionsof the two equations is central when attempting to approximate numericallyπ or ρ as most of the analysis of existing numerical algorithms relies on theSPDE characterization of the two processes.

To answer the uniqueness question one has to identify suitable spacesof possible solutions to the equations (3.43) and (3.57). These spaces mustbe large enough to allow for the existence of solutions of the correspondingSPDE. Thus π should naturally belong to the space of possible solutions forthe Kushner–Stratonovich equation, and ρ to the space of possible solutionsto the Zakai equation. However, if we choose a space of possible solutionswhich is too large this may make the analysis more difficult, and even allowmultiple solutions.

In the following we present two approaches to prove the uniqueness of thesolutions to the two equations: the first one is a PDE approach, inspired byBensoussan [13]; the second one is a more recent functional analytic approachintroduced by Lucic and Heunis [200]. For both approaches the following resultis useful.

Exercise 4.1. Let µ1 = µ1t , t ≥ 0 and µ2 = µ2

t , t ≥ 0 be two M(S)-valued stochastic processes with cadlag paths and (ϕi)i≥0 be a separating setof bounded measurable functions (in the sense of Definition 2.12). If for eacht ≥ 0 and i ≥ 0, the identity µ1

t (ϕi) = µ2t (ϕi) holds almost surely, then µ1

and µ2 are indistinguishable.

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20094,

Page 96: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

96 4 Uniqueness of the Solution

4.1 The PDE Approach to Uniqueness

In this section we assume that the state space of the signal is S = Rd and thatthe signal process is a diffusion process as described in Section 3.2.1.

First we define the space of measure-valued stochastic processes withinwhich we prove uniqueness of the solution. This space has to be chosen so thatit contains only measures with respect to which the integral of any functionwith linear growth is finite. The reason for this is that we want to allow thecoefficients of the signal and observation processes to be unbounded. Definefirst the class of integrands for these measures. Let ψ : Rd → R be the function

ψ(x) = 1 + ‖x‖, (4.1)

for any x ∈ Rd and define Cl(Rd) to be the space of continuous functions ϕsuch that ϕ/ψ ∈ Cb(Rd). Endow the space Cl(Rd) with the norm

‖ϕ‖l∞ = supx∈Rd

|ϕ(x)|ψ(x)

.

Also let E be the space of continuous functions ϕ : [0,∞)×Rd → R such thatfor all t ≥ 0, we have

sups∈[0,t]

‖ϕs‖l∞ <∞, (4.2)

where ϕs(x) = ϕ(s, x) for any (s, x) ∈ [0,∞)× Rd.Let Ml(Rd) ⊂ M(Rd) be the space of finite measures µ over B(Rd) such

that µ(ψ) <∞. In particular, this implies that µ(ϕ) <∞ for all ϕ ∈ Cl(Rd).We endowMl(Rd) with the corresponding weak topology. That is, a sequence(µn) of measures in Ml(Rd) converges to µ ∈Ml(Rd) if and only if

limn→∞

µn(ϕ) = µ(ϕ), (4.3)

for all ϕ ∈ Cl(Rd). Obviously this topology is finer than the usual weaktopology (i.e. the topology under which (4.3) holds true only for ϕ ∈ Cb(Rd)).

Exercise 4.2. For any µ ∈ Ml(Rd) define νµ ∈ M(Rd) to be the measurewhose Radon–Nikodym derivative with respect to µ is ψ (defined in (4.1)).Let µ, µn, n ≥ 1 be measures inMl(Rd). Then µn converges to µ inMl(Rd)if and only if (νµn) converges weakly to νµ in M(Rd).

Definition 4.3. The class U is the space of all Yt-adapted Ml(Rd)-valuedstochastic processes µ = µt, t ≥ 0 with cadlag paths such that, for all t ≥ 0,we have

E[∫ t

0

(µs(ψ))2 ds]<∞. (4.4)

Page 97: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 97

Exercise 4.4. (Difficult) Let X be the solution of (3.9). Prove that if (3.10)is satisfied, X0 has finite second moment, and h is bounded then ρ belongs tothe class U . [Hint: You will need to use the Kallianpur–Striebel formula andthe normalised conditional distribution πt.]

We prove that the Zakai equation (3.43) has a unique solution in the classU subject to the following conditions on the processes.

Condition 4.5 (U). The functions f = (f i)di=1 : Rd → Rd appearing in thesignal equation (3.9), a = (aij)i,j=1,...,d : Rd → Rd×d as defined in (3.12) andh = (hi)mi=1 : Rd → Rm appearing in the observation equation (3.5) have twicecontinuously differentiable components and all their derivatives of first- andsecond-order are bounded.

Remark 4.6. Under condition U all components of the functions a, f and h arein Cl(Rd), but need not be bounded. However, condition U does imply thata, f and h satisfy the linear growth condition (see Exercise 4.11 for details).

Exercise 4.7. i. Show that if the process µ belongs to the class U then t 7→µt(ϕt) is a Yt-adapted process for all ϕ ∈ E (where E is defined in (4.2)).

ii. Let ϕ be a function in C1,2b ([0, t] × Rd) and µ be a process in the class

U . Assume that h satisfied the bounded growth condition (3.28). Then theprocesses

t 7→∫ t

0

µs

(∂ϕs∂s

+Aϕs

)ds, t ≥ 0

t 7→∫ t

0

µs(ϕsh>) dYs, t ≥ 0

are well defined Yt-adapted processes. In particular, the second process isa square integrable continuous martingale under the measure P.

When establishing uniqueness of the solution of the Zakai equation, weneed to make use of a time-inhomogeneous version of (3.43).

Lemma 4.8. Assume that the coefficients a, f and g satisfy condition U. Letµ be a process belonging to the class U which satisfies (3.43) for any ϕ ∈ D(A).Then, P-almost surely,

µt(ϕt) = π0(ϕ0) +∫ t

0

µs

(∂ϕs∂s

+Aϕs

)ds+

∫ t

0

µs(ϕsh>) dYs, (4.5)

for any ϕ ∈ C1,2b ([0, t]× Rd).

Proof. Let us first prove that under condition U, µ satisfies equation (3.43)for any function ϕ ∈ C2

b (Rd) not just for ϕ in the domain of the infinitesimalgenerator ϕ ∈ D(A) ⊂ C2

b (Rd). We do this via an approximation argument.

Page 98: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

98 4 Uniqueness of the Solution

Choose a sequence (ϕn) such that ϕn ∈ D(A) (e.g. ϕn ∈ C2k(Rd)) such that,

ϕn, ∂αϕn, α = 1, . . . , d and ∂α∂βϕn, α, β = 1, . . . , d converge boundedlypointwise to ϕ, ∂αϕ, α = 1, . . . , d and ∂α∂βϕ, α, β = 1, . . . , d. In other wordsthe sequence (ϕn) is uniformly bounded and for all x ∈ Rd, limn→∞ ϕn(x) =ϕ(x), with a similar convergence assumed for the first and second partialderivatives of ϕn. Then, P-almost surely

µt(ϕn) = π0(ϕn) +∫ t

0

µs (Aϕn) ds+∫ t

0

µs(ϕnh>) dYs. (4.6)

Since (ϕn) is uniformly bounded and pointwise convergent, by the dominatedconvergence theorem, we get that

limn→∞

µt(ϕn) = µt(ϕ), (4.7)

and similarlylimn→∞

π0(ϕn) = π0(ϕ). (4.8)

The use of bounded pointwise convergence and condition U implies that thereexists a constant K such that

|Aϕn(x)| ≤ Kψ(x),

for any x ∈ Rd and n > 0. Since µ ∈ U implies that µs(ψ) < ∞, by thedominated convergence theorem limn→∞ µs (Aϕn) = µs(Aϕ). Also, from (4.4)it follows that

E[∫ t

0

µs(ψ) ds]≤ E

[∫ t

0

12

(1 + µs(ψ)2

)ds]<∞. (4.9)

Therefore, P-almost surely ∫ t

0

µs(ψ) ds <∞

and, again by the dominated convergence theorem, it follows that

limn→∞

∫ t

0

µs (Aϕn) ds =∫ t

0

µs(Aϕ) ds P-a.s. (4.10)

Similarly, one uses the integrability condition (4.4) and again the domi-nated convergence theorem to show that for i = 1, . . . ,m,

limn→∞

E[∫ t

0

(µs(ϕnhi)− µs(ϕhi))2 ds]

= 0;

hence by Ito’s isometry property, we get that

Page 99: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 99

limn→∞

∫ t

0

µs(ϕnh>) dYs =∫ t

0

µs(ϕh>) dYs. (4.11)

Finally, by taking the limit of both sides of the identity (4.6) and using theresults (4.7), (4.8), (4.10) and (4.11) we obtain that µ satisfies equation (3.43)for any function ϕ ∈ C2

b (Rd). The limiting processes t 7→∫ t

0µs(Aϕ) ds and

t 7→∫ t

0µs(ϕsh>) dYs, t ≥ 0 are well defined as a consequence of Exercise 4.7.

Let us extend the result to the case of time-dependent test functions ϕ ∈C1,2b ([0, t]×Rd). Once again by Exercise 4.7 all the integral terms in (4.5) are

well defined and finite. Also from (3.43), for i = 0, 1, . . . , n− 1 we have

µ(i+1)t/n(ϕit/n) = µit/n(ϕit/n) +∫ (i+1)t/n

it/n

µs(Aϕit/n) ds

+∫ (i+1)t/n

it/n

µs(ϕit/nh>) dYs

for i = 0, 1, . . . , n− 1. By Fubini’s theorem we have that

µ(i+1)t/n(ϕ(i+1)t/n − ϕit/n) =∫ (i+1)t/n

it/n

µ(i+1)t/n

(∂ϕs∂s

)ds.

Hence

µ(i+1)t/n(ϕ(i+1)t/n) = µ(i+1)t/n(ϕ(i+1)t/n − ϕit/n) + µ(i+1)t/n(ϕit/n)

= µit/n(ϕit/n) +∫ (i+1)t/n

it/n

µ(i+1)t/n

(∂ϕs∂s

)ds

+∫ (i+1)t/n

it/n

µs(Aϕit/n

)ds

+∫ (i+1)t/n

it/n

µs(ϕit/nh>) dYs.

Summing over the intervals [it/n, (i+ 1)t/n] from i = 0 to n− 1,

µt(ϕt) = π0(ϕ0) +∫ t

0

µ([ns/t]+1)t/n

(∂ϕs∂s

)ds+

∫ t

0

µs(Aϕ[ns/t]t/n

)ds

+∫ t

0

µs(ϕ[ns/t]t/nh

>) dYs. (4.12)

The claim follows by taking the limit as n tends to infinity of both sides ofthe identity (4.12) and using repeatedly the dominated convergence theorem.Note that we use the cadlag property of the paths of µ to find the upperbound for the second term. ut

Page 100: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

100 4 Uniqueness of the Solution

Exercise 4.9. Assume that the coefficients a, f and g satisfy condition U.Let µ be a process belonging to the class U which satisfies the Zakai equation(3.43) and ϕ be a function in C1,2

b ([0, t]×Rd). Let εt ∈ St, where St is the setdefined in Corollary B.40, that is,

εt = exp(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds),

where r ∈ Cmb ([0, t],Rm). Then

E[εtµt(ϕt)] = π0(ϕ0) + E[∫ t

0

εsµs

(∂ϕs∂s

+Aϕs + iϕsh>rs

)ds]

(4.13)

for any ϕ ∈ C1,2b ([0, t]× Rd).

In the following we establish the existence of a function ϕ ∈ C1,2b ([0, t]×Rd)

which plays the role of a (partial) function dual of the process µ; in other wordswe seek ϕ such that for s ∈ [0, t], µs(ϕs) = 0. In particular as a consequenceof (4.13) and the fact that the set St is total, such a function could arise as asolution ϕ ∈ C1,2

b ([0, t]×Rd) of the second-order parabolic partial differentialequation

∂ϕs(s, x)∂s

+Aϕs(s, x) + iϕs(s, x)h>(x)rs = 0, (4.14)

where the operator A is given by

Aϕ =d∑

i,j=1

aij∂2

∂xi∂xjϕ+

d∑i=1

f i∂

∂xiϕ.

This leads to a unique characterisation of µ. The partial differential equation(4.14) turns out to be very hard to analyse for two reasons. Firstly, the coef-ficients aij(x) for i, j = 1, . . . , d, f i(x), and hi(x) for i = 1, . . . , d are not ingeneral bounded as functions of x. Secondly, the matrix a(x) may be degen-erate at some points x ∈ Rd. A few remarks on this degeneracy may be help-ful. Since a(x) = 1

2σ>(x)σ(x) it is clear that y>a(x)y = 1

2y>σ>(x)σ(x)y =

12 (σ(x)y)>(σ(x)y) ≥ 0, thus for all x ∈ Rd, a(x) is positive semidefinite.However, a(x) is not guaranteed to be positive definite for all x ∈ Rd; inother words there may exist x ∈ Rd such that there is a non-zero y suchthat y>a(x)y = 0, for example, if for some x, a(x) = 0 and this is not pos-itive definite. Such a situation is not physically unrealistic since it has theinterpretation of an absence of noise in the signal process at the point x.

A typical existence and uniqueness result for parabolic PDEs is the fol-lowing

Theorem 4.10. If the PDE

∂ϕt∂t

=d∑

i,j=1

aij∂2ϕt∂xi∂xj

+d∑i=1

f i∂ϕt∂xi

(4.15)

Page 101: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 101

is uniformly parabolic, that is, if there exists λ > 0 such that x>ax ≥ λ‖x‖2for every x 6= 0, the functions f and a bounded and Holder continuous withexponent α and Φ is a C2+α function, then there exists a unique solution tothe initial condition problem given by (4.15) and the condition ϕ0(x) = Φ(x).Furthermore if the coefficients a, f and the initial condition Φ are infinitelydifferentiable then the solution ϕ is infinitely differentiable in the spatial vari-able x.

The proof of the existence of solutions to the parabolic PDE is fairlydifficult and its length precludes its inclusion here. These details can be foundin Friedman [102] as Theorem 7 of Chapter 3 and the continuity result followsfrom Corollary 2 in Chapter 3. Recall that the Holder continuity condition issatisfied with α = 1 for Lipschitz functions.

As these conditions are not satisfied by the PDE (4.14), we use a sequenceof functions (vn) which solves uniformly parabolic PDEs with smooth boundedcoefficients. For this, we approximate a, f and h by bounded continuous func-tions. More precisely let (an)n≥1 be a sequence of functions an : Rd → Rd×d,(fn)n≥1 a sequence of functions fn : Rd → Rd and (hn)n≥1 a sequence of func-tions hn : Rd → Rm. We denote components as usual by superscript indices.We require that these sequences of functions have the following properties. Allthe component functions have bounded continuous derivatives of all orders; inother words each component is an element of C∞b (Rd). There exists a constantK0 such that the bounds on the first- and second-order derivatives (but notnecessarily on the function values) hold uniformly in n,

supn

maxi,j,α

∥∥∂αaijn ∥∥∞ ≤ K0, supn

maxi,j,α,β

∥∥∂α∂βaijn ∥∥∞ ≤ K0, (4.16)

and the same inequality holds true for the partial derivatives of the compo-nents of fn and hn. We also require that these sequences converge to theoriginal functions a, f and h; i.e. limn→∞ an(x) = a(x), limn→∞ fn(x) = f(x)and limn→∞ hn(x) = h(x) for any x ∈ Rd. Finally we require that the matrixan is uniformly elliptic; in other words for each n, there exists λn such thatx>anx ≥ λn‖x‖2 for all x ∈ Rd. We write

An ,d∑

i,j=1

aijn∂2

∂xi∂xj+

d∑i=1

f in∂

∂xi,

for the associated generator of the nth approximating system.†

† To obtain an, we use first the procedure detailed in section 6.2.1. That is, we con-sider first the function ψna, where ψn is the function defined in (6.23) (see alsothe limits (6.24), (6.25) and (6.26)). Then we regularize ψna by using the convo-lution operator T1/n as defined in (7.4), to obtain the function T1/n(ψna). Moreprecisely, T1/n(ψna) is a matrix-valued function with components T1/n(ψnaij),1 ≤ i, j ≤ d. Finally, we define the function an to be equal to T1/n(ψna) + 1

nId,

where Id is the d× d identity matrix. The functions fn and hn are constructed inthe same manner (without using the last step).

Page 102: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

102 4 Uniqueness of the Solution

Exercise 4.11. If condition U holds, show that the entries of the sequences(an)n≥1, (fn)n≥1 and (hn)n≥1 belong to Cl(Rd). Moreover show that thereexists a constant K1 such that

supn

(maxi,j

∥∥aijn ∥∥l∞ ,maxi

∥∥f in∥∥l∞ ,maxi

∥∥hin∥∥l∞) ≤ K1.

Next we use a result from the theory of systems of parabolic partial dif-ferential equations. Consider the following partial differential equation

∂vns∂s

= −Anvns − ivns h>n rs, s ∈ [0, t] (4.17)

with final conditionvnt (x) = Φ(x), (4.18)

where r ∈ Cmb ([0, t],Rm) and Φ is a complex-valued C∞ function. In otherwords, if vns = vn,1s + ivn,2s , s ∈ [0, t], Φ = Φ1 + iΦ2 then we have the equivalentsystem of real-valued PDEs

∂vn,1s

∂s= −Anvn,1s + vn,2s h>n rs vn,1t (x) = Φ1(x),

∂vn,2s

∂s= −Anvn,2s − vn,1s h>n rs vn,2t (x) = Φ2(x).

(4.19)

We need to make use of the maximum principle for parabolic PDEs in thedomain [0, T ]× Rd.

Lemma 4.12. Let

A =d∑

i,j=1

aij(x)∂2

∂xi∂xj+ fi(x)

∂xi

be an elliptic operator; that is, for all x ∈ Rd, it holds that∑di,j=1 yiaij(x)yj >

0 for all y ∈ Rd \ 0. Let the coefficients aij(x) and fi(x) be continuous inx. If u ∈ C1,2([0,∞)× Rd) is such that

Au− ∂u

∂t≥ 0 (4.20)

in (0,∞) × Rd with u(0, x) = Φ(x) and u is bounded above, then for all t ∈[0,∞),

‖u(t, x)‖∞ ≤ ‖Φ‖∞. (4.21)

Proof. Define w(t, x) = u(t, x) − ‖Φ‖∞. It is immediate that Aw − ∂w∂t ≥ 0.

Clearly w(0, x) ≤ 0 for all x ∈ Rd. Consider the region (0, t]×Rd for t fixed. If(4.21) does not hold for s ∈ [0, t] then w(t, x) > 0 for some 0 < s ≤ t, x ∈ Rd.As we have assumed that u is bounded above, the same holds for w, which

Page 103: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 103

implies that w has a positive maximum in the region (0, t]×Rd (including theboundary at t). Suppose this occurs at the point P0 = (x, t); then it followsby Theorem 4′ of Chapter 2 of Friedman [102] that w assumes this positiveconstant value over the whole region S(P0) = [0, t] × Rd which is clearly acontradiction since w(0, x) ≤ 0 and w is continuous in t. Thus w(t, x) ≤ 0 forall x ∈ Rd which establishes the result. ut

Exercise 4.13. Prove the above result in the case where the coefficients aijfor i, j = 1, . . . , d and fi for i = 1, . . . , d are bounded, without appealing togeneral results from the theory of parabolic PDEs. By modifying the aboveproof of Lemma 4.12 it is clear that it is sufficient to prove directly that ifu ∈ C1,2([0,∞)×Rd) is bounded above, satisfies (4.20), and u(0, x) ≤ 0, thenu(t, x) ≤ 0 for t ∈ [0,∞) and x ∈ Rd. This may be done in the followingstages.

i. First, by considering derivatives prove that if (4.20) were replaced by

Au− ∂u

∂t> 0 (4.22)

then u(t, x) cannot have a maximum in (0, t]× Rd.ii. Show that if u satisfies the original condition (4.20) then show that we can

find δ and ε such that wδ,ε , u(t, x)− δt− εe−t‖x‖2 satisfies the strongercondition (4.22).

iii. Show that if u(t, x) ≥ 0 then wδ,ε must have a maximum in (0, t] × Rd;hence use (i) to establish the result.

Proposition 4.14. If Φ1, Φ2 ∈ C∞b (Rd), then the system of PDEs (4.19) hasa solution (vn,1, vn,2) where vn,i ∈ C1,2

b ([0, t]×Rd) for i = 1, 2, for which thereexists a constant K2 independent of n such that ‖vn,i‖, ‖∂αvn,i‖, ‖∂α∂βvn,i‖,for i = 1, 2, α, β = 1, . . . , d are bounded by K2 on [0, t]× Rd.

Proof. We must rewrite our PDE as an initial value problem, by reversingtime. That is, we define vns , vnt−s for s ∈ [0, t]. Then we have the followingsystem of real-valued partial differential equations and initial conditions

∂vn,1s

∂s= Anv

n,1s − vn,2s h>n rt−s vn,10 (x) = Φ1(x),

∂vn,2s

∂s= Anv

n,2s + vn,1s h>n rt−s vn,20 (x) = Φ2(x).

(4.23)

As the operator An is uniformly elliptic and has smooth bounded coefficients,the existence of the solution of (4.23) is justified by Theorem 4.10 (the coeffi-cients have uniformly bounded first derivative and are therefore Lipschitz andthus satisfy the Holder continuity condition). Furthermore since the initialcondition and coefficients are also smooth, the solution vn (and thus vn) isalso smooth (has continuous derivatives of all orders) in the spatial variable.

Page 104: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

104 4 Uniqueness of the Solution

It only remains to prove the boundedness of the solution and of its first andsecond derivatives. Here we follow the argument in Proposition 4.2.1, page 90from Bensoussan [13]. Define

znt ,12

((vn,1t

)2

+(vn,2t

)2). (4.24)

Then

∂zns∂s−Anzns = −

d∑α,β=1

aαβn(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2∂β vn,2s

)≤ 0.

Therefore from our version of the positive maximum principle, Lemma 4.12,it follows that ∥∥vn,1s

∥∥2

∞ +∥∥vn,2s

∥∥2

∞ ≤∥∥Φ1

∥∥2

∞ +∥∥Φ2

∥∥2

∞ , (4.25)

for any s ∈ [0, t], which establishes the bound on ‖vn,i‖. Define

uns ,12

d∑α=1

((∂αv

n,1s

)2+(∂αv

n,2s

)2). (4.26)

Then

∂uns∂s−Anuns =

−d∑

α,β,γ=1

aαβn((∂α∂γ v

n,1s

) (∂β∂γ v

n,1s

)+(∂α∂γ v

n,2s

) (∂β∂γ v

n,2s

))+

d∑α,β,γ=1

∂γaαβn

((∂α∂β v

n,1s

) (∂γ v

n,1s

)+(∂α∂β v

n,2s

) (∂γ v

n,2s

))+

d∑α,β=1

∂βfαn

(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2s ∂β v

n,2s

)+

d∑α=1

∂αgn,s(−vn,2s ∂αvn,1s + vn,1s ∂αv

n,2s ), (4.27)

where gn,s = h>n rt−s. The first term in (4.27) is non-positive as a consequenceof the non-negative definiteness of a. Then by (4.16), since |∂βfαn | is uniformlybounded by K0, using the inequality (

∑di=1 ai)

2 ≤ d∑di=1 a

2i , the third term

of (4.27) satisfies

d∑α,β=1

∂βfαn

(∂αv

n,1∂β vn,1 + ∂αv

n,2∂β vn,2)≤ 2K0du

ns . (4.28)

Page 105: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 105

Similarly, from (4.16) and (4.25) we see that the fourth term of (4.27) satisfies

d∑α=1

∂αgn,s(−vn,2s ∂αvn,1s + vn,1s ∂αv

n,2s ) ≤ K0

d∑α=1

(|vn,2s |

∣∣∂αvn,1s

∣∣+ |vn,1s |∣∣∂αvn,2s

∣∣)≤ K0

(∥∥Φ1∥∥∞ +

∥∥Φ2∥∥∞

) d∑α=1

(∣∣∂αvn,1s

∣∣+∣∣∂αvn,2s

∣∣)≤ K0

(∥∥Φ1∥∥∞ +

∥∥Φ2∥∥∞

)(uns + d)

≤ C4(uns + d), (4.29)

where the constant C4 , K0(‖Φ1‖∞ + ‖Φ2‖∞). It only remains to find asuitable bound for the second term in (4.27). This is done using the followinglemma, which is due to Oleinik–Radkevic (see [234, page 64]). Recall that ad× d-matrix a is said to be non-negative definite if θ>aθ ≥ 0 for all θ ∈ Rd.

Lemma 4.15. Let a : R → Rd×d, be a symmetric non-negative definitematrix-valued function which is twice continuously differentiable and denoteits components aij(x) for 1 ≤ i, j ≤ d. Let u be any symmetric d× d-matrix;then

(tr(a′(x)u))2 ≤ 2d2λ tr(ua(x)u) ∀x ∈ R,

where primes denote differentiation with respect to x, and

λ = sup

∣∣θ>a′′(x)θ∣∣

‖θ‖2: x ∈ R, θ ∈ Rd\0

.

Proof. We start by showing that∣∣a′ij(x)∣∣ ≤√λ(aii(x) + ajj(x)) ∀x ∈ R. (4.30)

Let ϕ ∈ C2(R) be a non-negative function with uniformly bounded secondderivative; let α = supx∈R |ϕ′′(x)|. Then Taylor’s theorem implies that

0 ≤ ϕ(x+ y) ≤ ϕ(x) + yϕ′(x) + αy2/2;

thus the quadratic in y must have no real roots, which implies that the dis-criminant is non-positive thus

|ϕ′(x)| ≤√

2αϕ(x).

Let ei denote the standard basis of Rd; define the functions

ϕij±(x) = (ei ± ej)>a(x)(ei ± ej) = aii(x)± 2aij(x) + ajj(x).

From the fact that a is non-negative definite, it follows that ϕij±(x) ≥ 0. Fromthe definition of λ, since ‖ei ± ej‖ =

√2, it follows that |ϕ′′±(x)| < 2λ; thus

applying the above result

Page 106: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

106 4 Uniqueness of the Solution∣∣ϕ′±(x)∣∣ ≤√4λϕ±(x).

From the definition aij(x) = (ϕ+ − ϕ−)/4, using (4.30)

|a′ij(x)| ≤ (|ϕ′+(x)|+ |ϕ′−(x)|)/4

≤ 12

√λϕ+(x) +

√λϕ−(x)

≤√λ(ϕ+(x) + ϕ−(x))/

√2

≤√λ(aii(x) + ajj(x)).

To establish the main result, by Cauchy–Schwartz

(tr(a′(x)u))2 =

d∑i,j=1

a′ij(x)uji

2

≤ d2d∑

i,j=1

(a′ij(x)uji

)2≤ 2λd2

d∑i,j=1

(aii(x) + ajj(x))(uji)2

≤ 2d2λ

d∑i,j=1

uijajj(x)uji.

In general since a is real-valued and symmetric, at any x we can find anorthogonal matrix q such that q>a(x)q is diagonal. We fix this matrix q andthen since tr(q>uq) = tr(qq>u) = tru, it follows that

(tr(a′(x)u))2 =(tr(q>a′(x)qq>uq)

)2≤ 2d2λ

d∑i,j=1

(q>uq)ij(q>a(x)q)jj(q>uq)ji

≤ 2d2λ tr((q>uq)(q>a(x)q)(q>uq)

)≤ 2λd2 tr(ua(x)u).

ut

Taking uα,β = ∂α∂β vn,is , Lemma 4.15 implies that

d∑α,β,γ=1

(∂γa

αβn ∂α∂β v

n,is

)2 ≤ C2

d∑α,β,γ=1

aαβn(∂α∂γ v

n,is

) (∂β∂γ v

n,is

), i = 1, 2,

where C2 only depends upon the dimension of the space and K0 (in particular,it depends on the bound on the second partial derivatives of the entries of an).Hence, by using the elementary inequality, for C > 0,

Page 107: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 107

τζ ≤ 12C

τ2 +12Cζ2, (4.31)

on each term in the summation in the second term of (4.27) one can find anupper bound for the second sum of the form

12Θ

ns + C2u

ns ,

where Θns is given by

Θns ,d∑

α,β,γ=1

aαβn((∂α∂γ v

n,1s

) (∂β∂γ v

n,1s

)+(∂α∂γ v

n,2s

) (∂β∂γ v

n,2s

)),

and as a is non-negative definite Θns ≥ 0. By substituting the bounds (4.28),(4.29) and (4.31) into (4.27) we obtain the bound

∂uns∂s−Anuns ≤ −Θns + 1

2Θns + C2u

ns + 2K0du

ns + C4(uns + d)

≤ C2uns + 2K0du

ns + C4(uns + d)

≤ C0uns + C1,

where the constants C0 and C1 only depend upon the dimension of the spaceand K0 (and not upon s or x). Thus

uns =C1

C0e−C0s + uns e−C0s

satisfies∂uns∂t−Anuns ≤ 0;

thus from the maximum principle in the form of Lemma 4.12 we have that‖uns ‖∞ ≤ ‖un0‖∞, but u0 = C1/C0 + un0 ,

‖uns ‖∞ ≤ eC0T

(12

d∑α=1

(∥∥∂αΦ1∥∥2

∞ +∥∥∂αΦ2

∥∥2

)+C1

C0

),

which establishes the uniform bound on the first derivatives. The bound onthe second-order partial derivatives of v is obtained by performing a similar,but more tedious, analysis of the function

wnt ,12

d∑α,β=1

((∂α∂β v

n,1t

)2

+(∂α∂β v

n,2t

)2).

Similar bounds will not hold for higher-order partial derivatives. ut

Theorem 4.16. Assuming condition U on the coefficients a, f and g, theequation (4.5) has a unique solution in the class U , up to indistinguishability.

Page 108: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

108 4 Uniqueness of the Solution

Proof. Let vn be the solution to the PDE (4.17). Applying Exercise 4.9 to vn

yields that for any solution µ of (3.43) in the class U we have

E[εtµt(vnt )] = π0(vn0 ) + E[∫ t

0

εsµs

(∂vns∂t

+Avns + ih>vns rs

)ds]

and using the fact that vns satisfies (4.17) we see that

E[εtµt(vnt )] = π0(vn0 )

+ E[∫ t

0

εsµs((A−An) vns + ivns (h− hn)>rs

)ds]. (4.32)

As a consequence of Proposition 4.14, vn and its first- and second-order partialderivatives are uniformly bounded and consequently,

limn→∞

(A−An)vns (x) = 0, limn→∞

vns (x)(h>(x)− h>n (x))rs(x) = 0

for any x ∈ Rd×d. Also there exists a constant Ct independent of n such that

|(A−An)vns (x)|, |vns (x)(h(x)− hn(x))>rs| ≤ Ctψ(x)

for any x ∈ Rd×d and s ∈ [0, t]. Hence, as µs ∈ U it follows that µs(ψ) < ∞and thus by the dominated convergence theorem we have that

limn→∞

µs((A−An)vns + ivns (h− hn)>rs

)= 0.

Next let us observe that sups∈[0,t] |εs| < exp(sups∈[0,t] ‖rs‖t/2) < ∞, hencethere exists a constant C ′t such that for s ∈ [0, t],∣∣εsµs ((A−An)vns + ivns (h− hn)>rs

)∣∣ ≤ C ′tµs(ψ)

and since as a consequence of (4.4), it follows that (4.9) holds; thus

E[∫ t

0

µs(ψ) ds]<∞.

It follows that C ′tµs(ψ) is a dominating function, thus by the dominated con-vergence theorem it follows that

limn→∞

E[∫ t

0

εsµs((A−An)vns + ivns (h− hn)>rs

)ds]

= 0. (4.33)

Finally, let µ1 and µ2 be two solutions of the Zakai equation (3.43) in theclass U . Then from (4.32),

E[εtµ1t (v

nt )]− E[εtµ2

t (vnt )]

= E[∫ t

0

εs(µ1s − µ2

s

) ((A−An)vns + ivns (h− hn)>rs

)ds].

Page 109: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.1 The PDE Approach to Uniqueness 109

The final condition of the partial differential equation (4.18) implies thatvnt (x) = Φ(x) for all x ∈ Rd; thus

E[εtµ1t (Φ)]− E[εtµ2

t (Φ)]

= E[∫ t

0

εs(µ1s − µ2

s

) ((A−An)vns + ivns (h− hn)>rs

)ds]

and we may then pass to the limit as n→∞ using (4.33) to obtain

E(εtµ1t (Φ)) = E(εtµ2

t (Φ)). (4.34)

The function Φ was an arbitrary C∞b function, therefore using the fact that theset St is total, for ϕ any smooth bounded function, P-almost surely µ1

t (ϕ) =µ2t (ϕ). From the bounds we know that ‖vn0 ‖∞ ≤ ‖Φ‖∞, thus by the dominated

convergence theorem since π0 is a probability measure

limn→∞

π0(vn0 ) = π0

(limn→∞

vn0

);

passing to n→∞ we get

E(εtµt(Φ)) = π0

(limn→∞

vn0

)whence ∣∣∣E(εtµt(Φ))

∣∣∣ ≤ ‖Φ‖∞.By the dominated convergence theorem, we can extend (4.34) to any ϕ whichis a continuous bounded function. Hence by Exercise 4.1 µ1

t and µ2t are indis-

tinguishable. ut

Exercise 4.17. (Difficult) Extend Theorem 4.16 to the correlated noise frame-work.

Now let µ = µt, t ≥ 0 be a Yt-adapted Ml(Rd)-valued stochastic pro-cess with cadlag paths and mµ = mµ

t , t ≥ 0 be the Yt-adapted real-valuedprocess

mµt = exp

(∫ t

0

µs(h>) dYs −12

∫ t

0

µs(h>)µs(h) ds), t ≥ 0.

We prove uniqueness for the Kushner–Stratonovich equation (3.57) in theclass U of all Yt-adaptedMl(Rd)-valued stochastic processes µ = µt, t ≥ 0with cadlag paths such that the process mµµ belongs to the class U .

Exercise 4.18. Let X be the solution of the SDE (3.9). Prove that if (3.10) issatisfied, π0 has finite third moment and h satisfies the linear growth condition(3.28) then the process π belongs to the class U .

Page 110: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

110 4 Uniqueness of the Solution

Theorem 4.19. Assuming condition U on the coefficients a, f and g theequation (3.57) has a unique solution in the class U , up to indistinguishability.

Proof. Let π1 and π2 be two solutions of the equation (3.57) belonging tothe class U . Then by a straightforward integration by parts, one shows thatρi = mπiπi, i = 1, 2 are solutions of the Zakai equation (3.43). However, byTheorem 4.16, equation (3.43) has a unique solution in the class U (whereboth ρ1 and ρ2 reside). Hence, ρ1 and ρ2 coincide. In particular, P-almostsurely

mπ1

t = ρ1t (1) = ρ2

t (1) = mπ2

t

for all t ≥ 0. and hence

π1t =

1ρ1t (1)

ρ1t =

1ρ2t (1)

ρ2t = π2

t

for all t ≥ 0, P-almost surely. ut

4.2 The Functional Analytic Approach

In this section, uniqueness is proved directly for the case when the signal andobservation noise are correlated. However, in contrast to all of the argumentswhich have preceded this we assume that the function h is bounded. Werecall that A,Bi : B(S) → B(S), i = 1, . . . ,m are operators with domains,respectively, D(A), D(Bi) ⊆ B(S), i = 1, . . . ,m with

1 ∈ D , D(A) ∩m⋂i=1

D(Bi) and A1 = B11 = · · · = Bn1 = 0. (4.35)

As in the previous section we need to define the space of measure-valuedstochastic processes within which we prove uniqueness of the solution. Werecall that (Ω,F , P) is a complete probability space and that the filtration(Ft)t≥0 satisfies the usual conditions. Also recall that, under P, the process Yis an Ft-adapted Brownian motion. The conditions (4.35) imply that for allt ≥ 0 and ϕ ∈ D since Bϕ is bounded,∫ t

0

(µs(‖Bϕ‖))2 ds < ‖Bϕ‖2∞∫ t

0

(µs(1))2 ds, (4.36)

for any µ = µt, t ≥ 0 which is an Ft-adapted M(S)-valued stochasticprocess.

Definition 4.20. Let U ′ be the class of Ft-adapted M(S)-valued stochasticprocesses µ = µt, t ≥ 0 with cadlag paths that satisfy conditions (4.36) and(3.42); that is, for all t ≥ 0, ϕ ∈ D,

P

[∫ t

0

m∑i=1

[µs(|(hi +Bi)ϕ|)]2 ds <∞

]= 1. (4.37)

Page 111: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.2 The Functional Analytic Approach 111

Let ρ = ρs, s ≥ 0 be the M(S)-valued process with cadlag paths whichis the unnormalised conditional distribution of the signal given the observationprocess as defined in Section 3.4. We have assumed that h = (hi)mi=1 : S→ Rfor i = 1, . . . ,m is a bounded measurable function hence it satisfies condition(3.25) which in turn ensures that the process Z = Zt, t ≥ 0 introduced in(3.30) and (3.31) is a (genuine) martingale under P, where P is the probabilitymeasure defined in Section 3.3.

Exercise 4.21. Prove that the mass process ρ(1) = ρt(1), t ≥ 0 is a Yt-adapted martingale under P.

Since the mass process ρ(1) = ρt(1), t ≥ 0 is a martingale under P whichis cadlag by Lemma 3.18, it is almost surely bounded on compact intervals.

Exercise 4.22. Prove that if (3.42) is satisfied, then the process ρ as definedby Definition 3.17 belongs to the class U ′.

Recall that, for any t ≥ 0 and ϕ ∈ D we have, P-almost surely that theunnormalised conditional distribution satisfies the Zakai equation, which inthe correlated noise situation which we are considering here is

ρt(ϕ) = π0(ϕ) +∫ t

0

ρs(Aϕ) ds+∫ t

0

ρs((h> +B>)ϕ) dYs, (4.38)

where condition (4.37) ensures that the stochastic integral in this equation iswell defined.

Proposition 4.23. If h is a bounded measurable function and ρ = ρt, t ≥ 0is an Ft-adapted M(S)-valued stochastic process belonging to the class U ′which satisfies (4.38), then for any α > 0, there exists a constant k(α) suchthat

E

[sups∈[0,t]

(ρs(1))α]< k(α) <∞. (4.39)

Proof. From condition (4.35) and equation (4.38) for ϕ = 1, we get that

ρt(1) = 1 +∫ t

0

ρs(h>) dYs. (4.40)

In the following we make use of the normalised version of ρt(hi). Since wedo not know that ρt(1) is strictly positive this normalisation must be definedwith some care. Let ρt(hi) be defined as

ρt(hi) =

ρt(hi)ρt(1)

if ρt(1) > 0

0 if ρt(1) = 0.

Page 112: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

112 4 Uniqueness of the Solution

Since h is bounded it follows that ρt(hi) ≤ ‖hi‖ρt(1); hence ρt(hi) ≤ ‖hi‖.Hence ρt(1) satisfies the equation

ρt(1) = 1 +∫ t

0

ρt(h>)ρt(1) dYs (4.41)

and has the explicit representation (as in Lemma 3.29)

ρt(1) = exp

(m∑i=1

(∫ t

0

ρs(hi) dY is −12

∫ t

0

(ρs(hi))2 ds

)).

We apply Lemma 3.9 to the bounded m-dimensional process ξ = ξt, t ≥ 0defined as ξit , ρt(hi), i = 1, . . . ,m, t ≥ 0 and deduce from the boundednessof ρt that ρt(1) is a (genuine) Yt-adapted martingale under P. Also

(ρt(1))α = zαt exp

(m∑i=1

α2 − α2

∫ t

0

(ρs(hi))2 ds

)≤ zαt exp

(m2t∣∣α2 − α

∣∣ ‖h‖2∞) , (4.42)

where the process zα = zαt , t ≥ 0 is defined by

zαt , exp

(m∑i=1

∫ t

0

ρs(hi) dY is −α2

2

∫ t

0

(ρs(hi))2 ds

)), t ≥ 0.

and is again a genuine P martingale by using Lemma 3.9. By Doob’s maximalinequality we get from (4.42) that for α > 1,

E

[sups∈[0,t]

(ρs(1))α]≤(

α

α− 1

)αE [(ρt(1))α]

≤(

α

α− 1

)αexp(m

2t(α2 − α

)‖h‖2∞

).

Hence defining

k(α) =(

α

α− 1

)αexp(m

2t(α2 − α

)‖h‖2∞

),

we have established the required bound for α > 1. The bound (4.39) for0 < α ≤ 1 follows by a straightforward application of Jensen’s inequality. Forexample,

E

[sups∈[0,t]

(ρs(1))α]≤

(E

[sups∈[0,t]

(ρs(1))2

])α/2≤ k(2)α/2.

ut

Page 113: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.2 The Functional Analytic Approach 113

The class U ′ of measure-valued stochastic processes is larger than theclass U defined in the Section 4.1. This is for two reasons; firstly because theconstituent processes are no longer required to be adapted to the observationfiltration Yt, but to the larger filtration Ft. This relaxation is quite importantas it leads to the uniqueness in distribution of the weak solutions of the Zakaiequation (4.38) (see Lucic and Heunis [200] for details). The second relaxationis that condition (4.4) is no longer imposed. Unfortunately, this has to be doneat the expense of the boundedness assumption on the function h.

Following Proposition 4.23, assumption (4.37) can be strengthened to

E

[∫ t

0

m∑i=1

ρs(|(hi +Bi)ϕ|)2 ds

]

≤ m (‖Bϕ‖∞ + ‖h‖∞‖ϕ‖∞)2 E[∫ t

0

(ρs(1))2 ds]

≤ m (‖Bϕ‖∞ + ‖h‖∞‖ϕ‖∞)2tk(2) <∞. (4.43)

In particular, this implies that the stochastic integral in (4.38) is a (genuine)martingale. Let us define the operator Φ : B(S× S)→ B(S× S) with domain

D(Φ) = ϕ ∈ B(S× S) : ϕ(x1, x2) = ϕ1(x1)ϕ2(x2),∀x1, x2 ∈ S, ϕ1, ϕ2 ∈ D

defined as follows. For ϕ ∈ D(Φ) such that ϕ(x1, x2) = ϕ1(x1)ϕ2(x2), for allx1, x2 ∈ S we have

Φϕ(x1, x2) = ϕ1(x1)Aϕ2(x2) + ϕ2(x2)Aϕ1(x1)

+m∑i=1

(hi +Bi)ϕ1(x1)(hi +Bi)ϕ2(x2). (4.44)

We introduce next the following deterministic evolution equation

νtϕ = ν0(ϕ) +∫ t

0

νs(Φϕ) ds, (4.45)

where ν = νt, t ≥ 0 is an M(S × S)-valued stochastic process, with theproperty that the map t 7→ νtϕ : [0,∞)→ [0,∞) is Borel-measurable for anyϕ ∈ B(S× S) and integrable for any ϕ in the range of Φ.

Condition 4.24 (U′). The function h = (hi)mi=1 : S → Rm appearing in theobservation equation (3.5) is a bounded measurable function and the deter-ministic evolution equation (4.45) has a unique solution.

Of course, condition U′ is not as easy to verify as the corresponding condi-tion U which is used in the PDE approach of Section 4.1. However Lucic andHeunis [200] prove that, in the case when the signal satisfies the stochasticdifferential equation,

Page 114: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

114 4 Uniqueness of the Solution

dXit = f i(Xt) dt+

n∑j=1

σij(Xt) dV jt +m∑j=1

σij(Xt) dW jt , (4.46)

then condition U′ is implied by the following condition which is easier to verify.

Condition 4.25 (U′′). The function f = (f i)di=1 : Rd → Rd appearingin the signal equation (4.46) is Borel-measurable, whilst the functions σ =(σij)i=1,...,d,j=1,...,n : Rd → Rd×n and σ = (σik)i=1,...,d,k=1,...,m : Rd → Rd×mare continuous and there exists a constant K such that, for x ∈ Rd, theysatisfy the following linear growth condition

maxi,j,k

|f i(x)|, |σij(x)|, |σik(x)|

≤ K(1 + |x|).

Also σσ> is a strictly positive definite matrix for any x ∈ Rd. Finally, thefunction h = (hi)mi=1 : S→ Rm appearing in the observation equation (3.5) isa bounded measurable function.

The importance of Condition U′ is that it ensures that there are enoughfunctions in the domain of Φ so that ν = νt, t ≥ 0 is uniquely characterizedby (4.45). Lucic and Heunis [200] show that, under condition U′′, the closureof the domain of Φ contains the set of bounded continuous functions which inturn implies the uniqueness of (4.45).

Theorem 4.26. Assuming condition U′, the equation (4.38) has a uniquesolution in the class U ′, up to indistinguishability.

Proof. Let ρ1 = ρ1t , t ≥ 0 and ρ2 = ρ2

t , t ≥ 0 be two processes belongingto the class U ′ and define the M(S× S)-valued processes

ραβ = ραβt , t ≥ 0, α, β = 1, 2

to be the unique processes for which

ραβt (Γ1 × Γ2) = ραt (Γ1)ρβt (Γ2), for any Γ1, Γ2 ∈ B(S) and t ≥ 0.

Of course ραβ is an Ft-adapted, progressively measurable process. Also defineναβ = ναβt , t ≥ 0 for α, β = 1, 2 as follows

ναβt (Γ ) = E[ραβt (Γ )

]for any Γ ∈ B(S× S) and t ≥ 0.

It follows that ναβt is a positive measure on (S× S, B(S× S)) and from Propo-sition 4.23 we get that, for any t ≥ 0,

sups∈[0,t]

ναβs (S× S) = sups∈[0,t]

E[ραt (S)ρβt (S)

]≤ k(2);

hence ναβ is uniformly bounded with respect to s in any interval [0, t] and byFubini’s theorem t 7→ ναβt (Γ ) is Borel-measurable for any Γ ∈ B(S× S). Let

Page 115: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.2 The Functional Analytic Approach 115

ϕ ∈ B(S × S) such that ϕ ∈ D(Φ). By definition, ϕ(x1, x2) = ϕ1(x1)ϕ2(x2)and for all x1, x2 ∈ S, ϕ1, ϕ2 ∈ D and

dραβt (ϕ) = d(ραt (ϕ1)ρβt (ϕ2)

)= ραt (ϕ1) dρβt (ϕ2) + ραt (ϕ2) dρβt (ϕ2) + d

⟨ρα(ϕ1), ρβ(ϕ2)

⟩t

= ραt (ϕα)(ρβt (Aϕ2) dt+ ρβt ((h> +B>)ϕ2) dYt

)+ ρβt (ϕ2)

(ραt (Aϕ1) dt+ ραt ((h> +B>)ϕ1) dYt

)+

m∑i=1

ραt ((hi +Bi)ϕ1)ρβt ((hi +Bi)ϕ2) dt.

In other words using Φ defined in (4.44) for ϕ ∈ D(Φ),

ραβt (ϕ) = ραβ0 (ϕ) +∫ t

0

ραβs (Φϕ) ds+∫ t

0

Λαβs (ϕ) dYs, (4.47)

where Λαβs (ϕ) , ραs (ϕ1)ρβs ((h>+B>)ϕ2)+ρβs (ϕ2)ραs ((h>+B>)ϕ1). By Propo-sition 4.23 and the Cauchy–Schwartz inequality we have that

E[∫ t

0

(Λαβs (ϕ)

)2ds]≤ME

[∫ t

0

ραs (1)2ρβs (1)2 ds]

≤MtE

[sup

s∈[0,T ]

ραs (1)2 sups∈[0,T ]

ρβs (1)2

]

≤Mt

√√√√E

[sup

s∈[0,T ]

ραs (1)4

]E

[sup

s∈[0,T ]

ρβs (1)4

]≤Mtk(4) <∞,

where the constant M is given by

M = 4 max

(‖ϕ1‖2∞, ‖ϕ2‖2∞,

m∑i=1

‖(hi +Bi)ϕ1‖2∞ ,

m∑i=1

‖(hi +Bi)ϕ2‖2∞

),

which is finite since ϕ1, ϕ2 ∈ D and consequently they belong to the domain ofBi, i = 1, . . . ,m. It follows that the stochastic integral in (4.47) is a martingalewith zero expectation. In particular, from (4.47) and Fubini’s theorem we getthat for ϕ ∈ D(Φ),

ναβt (ϕ) = E[ραβt (ϕ)

]= E

[ραβ0 (ϕ) +

∫ t

0

ραβs (Φϕ) ds]

= ναβ0 (ϕ) +∫ t

0

ναβs (Φϕ) ds. (4.48)

Page 116: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

116 4 Uniqueness of the Solution

In (4.48), the use of the Fubini’s theorem is justified as the mapping

(ω, s) ∈ Ω × [0, t] 7→ ραβs (Φϕ) ∈ R

is F×B([0, t])-measurable (it is a product of two F×B([0, t])-measurable map-pings) and integrable (following Proposition 4.23). From (4.48), we deducethat ναβ is a solution of the equation (4.45), hence by condition U′ the deter-ministic evolution equation has a unique solution and since ν11

0 = ν120 = ν22

0 ,we have that for any t ≥ 0,

ν11t = ν22

t = ν12t .

This implies that for any ϕ bounded Borel-measurable function we have

E[(ρ1t (ϕ)− ρ2

t (ϕ))2]

= ν11t (ϕ× ϕ) + ν11

t (ϕ× ϕ)− 2ν12t (ϕ× ϕ) = 0.

Hence ρ1t (ϕ) = ρ2

t (ϕ) holds P-almost surely and by Exercise 4.1, the measure-valued processes ρ1 and ρ2 are indistinguishable. ut

As in the previous section, now let µ = µt, t ≥ 0 be an Ft-adaptedM(S)-valued stochastic processes with cadlag paths and mµ = mµ

t , t ≥ 0be the Ft-adapted real-valued process

mµt = exp

(∫ t

0

µs(h>) dYs −12

∫ t

0

µs(h>)µs(h) ds), t ≥ 0.

Define the class U ′ of all Ft-adapted M(S)-valued stochastic processes withcadlag paths such that the process mµµ belongs to the class U ′.

Exercise 4.27. Let X be the solution of the SDE (4.46). Prove that if h isbounded then π belongs to the class U ′.

Exercise 4.28. Assume that condition U ′ holds. Prove that the Kushner–Stratonovich equation has a unique solution (up to indistinguishability) inthe class U ′.

4.3 Solutions to Exercises

4.1 Since µ1t (ϕi) = µ2

t (ϕi) almost surely for any i ≥ 0 one can find a set Ωtof measure one, independent of i ≥ 0, such that for any ω ∈ Ωt, µ1

t (ϕ)(ω) =µ2t (ϕi)(ω) for all i ≥ 0. Since (ϕi)i≥0 is a separating sequence, it follows that

for any ω ∈ Ωt, µ1t (ω) = µ2

t (ω). Hence one can find a set Ω of measure oneindependent of t such that for any ω ∈ Ω, µ1

t (ω) = µ2t (ω) for all t ∈ Q+

(the positive rational numbers). This together with the right continuity of thesample paths of µ1 and µ2 implies that for any ω ∈ Ω, µ1

t (ω) = µ2t (ω) for all

t ≥ 0.

Page 117: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.3 Solutions to Exercises 117

4.2 Suppose νµn ⇒ νµ; then from the definition of weak convergence, for anyϕ ∈ Cb(Rd) it follows that νµnϕ → νµϕ as n → ∞. Thus µn(ϕψ) → µ(ϕψ).Since any function in Cl(Rd) is of the form ϕψ where ϕ ∈ Cb(Rd), it followsthat µn converges to µ in Ml(Rd).

Conversely suppose that µn converges to µ in Ml(Rd); thus µnϕ → µϕfor ϕ ∈ Cl(Rd). If we set ϕ = ψθ for θ ∈ Cb(Rd), then as ϕ/ψ ∈ Cb(Rd), itfollows that ϕ ∈ Cl(Rd). Thus µn(ψθ) → µ(ψθ) for all θ ∈ Cb(Rd), whenceνµn ⇒ νµ.

4.4 We have by the Kallianpur–Striebel formula

E[∫ t

0

(ρs(ψ))2 ds]

= E[∫ t

0

(πs(ψ))2ρ2s(1) ds

]=∫ t

0

E[(πs(ψ))2

ρ2s(1)

]ds.

Now

E[(πs(ψ))2

ρ2s(1)

]≤ E

[πs(ψ2) ρ2

s(1)]

= E[πs(ψ2) ρ2

s(1)Zs]

= E[πs(ψ2)ρ2

s(1)E [Zs|Ys]].

Since ρs(1) = 1/E [Zs | Ys] (see Exercise 3.37 part (iii) we get that

E[πs(ψ2) ρ2

s(1)]

= E[πs(ψ2) ρs(1)

]= E

[E[ψ2(Xs)|Ys

]ρs(1)

]= E

[ψ2(Xs) ρs(1)

].

Now since h is bounded,

ρs(1) = exp(∫ s

0

πr(h>) dYr −12

∫ s

0

‖πr(h)‖2 dr)

= exp(∫ s

0

πr(h>)

dWr +∫ s

0

πr(h>)h(Xr) dr − 1

2

∫ s

0

‖πr(h)‖2 dr)

≤ es‖h‖2∞ exp

(∫ s

0

πr(h>)

dWs −12

∫ s

0

‖πr(h)‖2 dr).

Using the independence of W and X we see that

E[

exp(∫ s

0

πr(h>)

dWs −12

∫ s

0

πr(h>)πr(h)dr

)∣∣∣∣σ(Xr, r ∈ [0, s])]

= 1,

henceE [ρs(1)|σ(Xr, r ∈ [0, s])] ≤ es‖h‖

2∞ .

It follows that

Page 118: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

118 4 Uniqueness of the Solution

E[πs(ψ2) ρ2

s(1)]≤ es‖h‖

2∞E

[(1 + ‖Xs‖)2

],

and therefore

E[∫ t

0

(ρs(ψ))2 ds]≤ tet‖h‖

2∞ sups∈[0,t]

E[(1 + ‖Xs‖)2

]≤ 2tet‖h‖

2∞

(1 + sup

s∈[0,t]

E[‖Xs‖2

]).

As a consequence of Exercise 3.10, the last term in this equation is finite ifX0 has finite second moment and (3.10) is satisfied. Thus ρ satisfies condition(4.4) and hence it belongs to the class U .

4.7

i. We know that for t in [0,∞) the process µt is Yt-measurable. As ϕ ∈ E ,this implies that ϕt ∈ Cl(Rd) and thus |ϕt(x)| ≤ ‖ϕt‖l∞ψ(x). Define thesequence

ϕnt (x) , ϕt(x)1|ϕt(x)|≤n.

By the argument used for Exercise 2.21 we know that µt(ϕn) is Yt-adaptedsince ϕn is bounded. But ‖ϕt‖l∞ψ is a dominating function, and sinceµ ∈ U , it follows that µt(ψ) <∞ hence it is a µt-measurable dominatingfunction. Thus µt(ϕnt ) → µt(ϕt) as n → ∞, which implies that µt(ϕt) isYt-measurable. As this holds for all t ∈ [0,∞) it follows that µt(ϕt) isYt-adapted.

ii. From the solution to Exercise 3.23, a sufficient condition for the stochasticintegral to be well defined is

P[∫ t

0

(µs(ϕ‖h‖))2 ds <∞]

= 1.

We establish the stronger condition for the stochastic integral to be amartingale; viz for all t ≥ 0,

E[∫ t

0

(µs(ϕ‖h‖))2 ds]<∞.

Using the boundedness of ϕ and the linear growth condition

ϕ(x)h(x) ≤√C‖ϕ‖∞

√1 + ‖x‖2 =

√C‖ϕ‖∞ψ(x),

but since µs ∈Ml(Rd), it follows that µs(ψ) <∞. Thus∫ t

0

(µs(ϕ‖h‖))2 ds ≤ ‖ϕ‖∞C∫ t

0

(µs(ψ))2 ds,

and by condition (4.4) it follows that

Page 119: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.3 Solutions to Exercises 119

E[∫ t

0

(µs(ψ))2

]<∞

so the stochastic integral is both well defined and a martingale.

4.9 Starting from (4.5) we apply Ito’s formula to the product εtµt(ϕt), ob-taining

εtµt(ϕt) = ε0π0(ϕ0) +∫ t

0

εsµs

(∂ϕt∂t

+Aϕs

)ds

+∫ t

0

εsµs(ϕth>)dYs +∫ t

0

iεsr>s µt(ϕt)dYs +

∫ t

0

iεsrsµs(ϕsh>)ds.

Next we take expectation under P. We now show that as a consequence ofcondition (4.4) both stochastic integrals are genuine martingales. Because εtis complex-valued we need to introduce the notation

‖ε(ω)‖∞ = supt∈[0,∞)

|εt(ω)|

where | · | denotes the modulus of the complex number. The following boundis elementary,

‖εt‖∞ ≤ exp(

12 maxi=1,...,m

‖ri‖2∞t)<∞;

for notational conciseness write R = maxi=1,...,m ‖ri‖∞. By assumption thereis a uniform bound on ‖ϕs‖∞ for s ∈ [0, t]; hence

E[∫ t

0

ε2s

(µs(ϕsh>)

)2ds]≤ eR

2t sup[0,t]

‖ϕs‖∞E[∫ t

0

(µs(‖h‖))2 ds]

and the right-hand side is finite by (4.4). The second stochastic integral istreated in a similar manner

E[∫ t

0

ε2s‖rs‖2 (µs(ϕs))

2 ds]≤ R2eR

2t sup[0,t]

‖ϕs‖2∞E[∫ t

0

(µs(1))2 ds].

Therefore

E(εtµt(ϕt)) = π0(ϕ0) + E[∫ t

0

εsµs

(∂ϕs∂t

+Aϕs + irsϕsh>)

ds],

which is (4.13).

4.11 Since the components of an, fn and hn are bounded it is immediatethat they belong to Cb(Rd) and consequently to the larger space Cl(Rd).

For the bound, as there are a finite number of components it is sufficientto establish the result for one of them. Clearly

Page 120: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

120 4 Uniqueness of the Solution

aijn (x) = aijn (0) +d∑k=1

∫ 1

0

∂aijn∂xk

(xs)xk ds.

By (4.16), uniformly in x and i,∣∣∣∣∂aijn∂xi

∣∣∣∣ ≤ K0;

thus ∣∣aijn (x)∣∣ ≤ ∣∣aijn (0)

∣∣+ dK0‖x‖.

Secondly, since aijn → aij it follows that aijn (0) → aij(0); thus given ε > 0,there exists n0 such that for n ≥ n0, |aijn (0)− aij(0)| < ε. Thus we obtain thebound

‖aijn (x)‖ ≤ max1≤i≤n0

‖aiji (0)|+ ‖aij(0)‖+ ε+ dK0‖x‖.

Hence, since

‖aijn ‖l∞ = supx∈Rd

|aijn (x)|1 + ‖x‖

setting A = max(max1≤i≤n0 ‖aiji (0)| + ‖aij(0)‖ + ε, dK0), it follows that

‖aijn ‖l∞ ≤ A.

4.13

i. At such a maximum (t0, x0) in (0, t]× Rd,

∂u

∂t(t0, x0) ≥ 0,

∂u

∂xi(t0, x0) = 0, i = 1, . . . d,

(we cannot assert that the time derivative is zero, since the maximummight occur on the boundary at t) and the Hessian matrix of u (i.e.(∂i∂ju)) is negative definite. Thus since a is positive definite, it followsthat

d∑i,j=1

aij(x0)∂2u

∂xi∂xj(t0, x0) ≤ 0,

d∑i=1

f i(x0)∂u

∂xi(t0, x0) = 0;

consequently

Au(x0)− ∂u

∂t(t0, x0) ≤ 0

which is a contradiction since we had assumed that the left-hand side wasstrictly positive.

ii. It is easy to verify that

∂w

∂t=∂u

∂t− δ + εe−t‖x‖2,

and

Page 121: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.3 Solutions to Exercises 121

Aw = Au− εe−t(2 tr a+ 2b>x

).

ThusAw − ∂w

∂t≥ −εe−t

(2 tr a+ 2(b− x)>x

)+ δ.

Thus given δ > 0 using the fact that a and b are bounded, we can findε(δ) so that this right-hand side is strictly positive.

iii. Choose δ, ε so that the condition in part (ii) is satisfied. It is clear thatwδ,ε(0, x) = u(0, x) − ε‖x‖2. Thus since ε > 0, if u(0, x) ≤ 0, it followsthat wδ,ε(0, x) ≤ 0. Also since u is bounded above, it is clear that as‖x‖ → ∞, wδ,ε(t, x) → −∞. Therefore if u(t, x) ≥ 0 at some point, itis clear that wδ,ε has a maximum. But by part (i) wδ,ε(t, x) cannot havesuch a maximum on (0, t] × Rd. Hence u(t, x) ≤ 0 for all t ∈ [0,∞) andx ∈ Rd.

4.17 Under the condition that

E(∫ t

0

[ρs(‖(h> +B>)ϕ‖)

]2ds)<∞,

we deduce that the corresponding complex values PDE for a functional dualϕ is

Aϕt +∂ϕt∂t

+ ir>t (hϕt +Bϕt) = 0.

If we write ϕt = v1t + ivt2, then the time reversed equation is

∂v1

∂t= Av1 − v2gs − r>Bv2

∂v2

∂t= Av2 + v1gs + r>Bv1,

where rs = rt−s, and gs = h>r. As in the proof for the uncorrelated case anapproximating sequence of uniformly parabolic PDEs is taken, with smoothbounded coefficients and so that (4.16) holds together with the analogue forf . Then with znt defined by (4.24),

∂zs∂s−Azs = −

d∑α,β=1

aαβ(∂αv

n,1s ∂βvn,1s + ∂αv

n,2s ∂βvn,2s

)− vn,1s r>Bvn,2s + vn,2s r>Bvn,1s .

If we consider the special case of Corollary 3.40, and write ct = σrt, which weassume to be uniformly bounded, then

∂zns∂s−Azns = −

d∑α,β=1

aαβ(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2s ∂β v

n,2s

)+

d∑γ=1

cγt(−vn,1s ∂γ v

n,2s + vn,2s ∂γ v

n,1s

).

Page 122: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

122 4 Uniqueness of the Solution

Using the inequality ab ≤ 12 (a2 + b2), it follows that for ε > 0,

∂zns∂s−Azns ≤ −

d∑α,β=1

aαβ(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2s ∂β v

n,2s

)+

12ε

d∑γ=1

|cγt |((vn,1s )2 + (vn,2s )2

)+ε

2

d∑γ=1

((∂γ v

n,1s

)2+(∂γ v

n,2s

)2)≤ zns d‖c‖∞

ε

−d∑

α,β=1

(a− ε/2I)αβ(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2s ∂β v

n,2s

).

As a is uniformly elliptic, x>ax ≥ λ‖x‖2, therefore, by choosing ε sufficientlysmall (i.e. ε < 2λ) then the matrix a− ε/2I is positive definite. Thus

∂zns∂s−Azns ≤

zns d‖c‖∞ε

.

Writing C0 = d‖c‖∞/ε and zt = e−C0tzt, then

∂zns∂s−Azns ≤ 0,

from which the positive maximum principle (Lemma 4.12) implies that

‖vn,1t ‖2∞ + ‖vn,2t ‖2∞ ≤ eC0t(‖Φ1

t‖2∞ + ‖Φ2t‖2∞

)and the boundedness of vn,1 and vn,2 follows. To show the boundedness of thefirst derivatives, define uns as in (4.26); then

Page 123: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.3 Solutions to Exercises 123

∂uns∂s−Anuns =

−d∑

α,β,γ=1

aαβn((∂α∂γ v

n,1s

) (∂β∂γ v

n,1s

)+(∂α∂γ v

n,2s

) (∂β∂γ v

n,2s

))+

d∑α,β,γ=1

∂γaαβn

((∂α∂β v

n,1s

) (∂γ v

n,1s

)+(∂α∂β v

n,2s

) (∂γ v

n,2s

))+

d∑α,β=1

∂βfαn

(∂αv

n,1s ∂β v

n,1s + ∂αv

n,2s ∂β v

n,2s

)+

d∑α=1

∂αgn,s(−vn,2s ∂αvn,1s + vn,1s ∂αv

n,2s )

+d∑

α=1

(−(∂αvn,1s )

(∂α(r>Bvn,2s )

)+ (∂αvn,2s )

(∂α(r>Bvn,1s )

)).

Bounds on the first four summations are identical to those used in the proofin the uncorrelated noise case, so

∂uns∂s−Anuns ≤ −Θns + 1

2Θns + C2u

ns + 2K0du

ns + C4(uns + d)

+d∑

α=1

(−(∂αvn,1s )

(∂α(r>Bvn,2s )

)+ (∂αvn,2s )

(∂α(r>Bvn,1s )

)).

To bound the final summation again use the special form of Corollary 3.40,

∂uns∂s−Anuns ≤ 1

2Θns + C0u

ns + C1

+d∑

α,γ=1

cγs(−(∂αvn,1s )(∂α∂γ vn,2s ) + (∂αvn,2s )(∂α∂γ vn,1s )

)+

d∑α,γ=1

(∂αcγs )(−(∂αvn,1s )(∂γ vn,2s ) + (∂αvn,2s )(∂γ vn,1s )

).

The first summation can be bounded using ab ≤ 12 (a2 + b2) for ε > 0,

d∑α,γ=1

cγs(−(∂αvn,1s )(∂α∂γ vn,2s ) + (∂αvn,2s )(∂α∂γ vn,1s )

)≤ d‖c‖∞uns

ε

2‖c‖∞

d∑α,γ=1

((∂α∂γ v

n,1s

)2+(∂α∂γ v

n,2s

)2).

Page 124: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

124 4 Uniqueness of the Solution

Again by choice of ε sufficiently small, the matrix a−ε‖c‖∞I remains positivedefinite (for ε < λ), therefore

−12Θns +

ε

2‖c‖∞

d∑α,γ=1

((∂α∂γ v

n,1s

)2+(∂α∂γ v

n,2s

)2) ≤ 0.

Since ∂αcγt is uniformly bounded by C5, it follows that

d∑α,γ=1

(∂αcγs )(−(∂αvn,1s )(∂γ vn,2s ) + (∂αvn,2s )(∂γ vn,1s )

)≤ C5

d∑α,γ=1

(|∂αvn,2s ||∂γ vn,1s |+ |∂αvn,1s ||∂γ vn,2s |

)≤ C5

2

d∑α,γ=1

(|∂αvn,1s |2 + |∂γ vn,2s |2 + |∂αvn,2s |2 + |∂γ vn,1s |2

)≤ dC5

d∑α=1

(|∂αvn,1s |2 + |∂αvn,2s |2

)≤ 2dC5u

ns .

Using all these bounds

∂uns∂s−Anuns ≤ C0u

ns + C1,

where C0 , C2 + 2K0d+C4 + d‖c‖∞/ε+ 2dC5 and C1 , dC4; thus as in thecorrelated case

‖uns ‖∞ ≤ eC0T

(12

d∑α=1

(∥∥∂αΦ1∥∥2

∞ +∥∥∂αΦ2

∥∥2

)+C1

C0

),

from which the bound follows. The boundedness of the second derivatives isestablished by a similar but longer argument.

4.18 Using Exercises 3.11 and 3.25 the conditions (3.25) and (3.42) are satis-fied. Lemma 3.29 then implies that mπ

t = ρt(1). From the Kallianpur–Striebelformula (3.36), for any ϕ bounded Borel-measurable, ρt(ϕ) = πt(ϕ)ρt(1), andby Exercise 4.4 the process ρt belongs to U .

4.21 Since ρt(1) = E[Zt|Yt], we need to prove that E[ρt(1)ξ] = E[ρs(1)ξ]for any Ys-measurable function. We have, using the martingale property of Zthat

E[E[Zt|Yt]ξ

]= E

[Ztξ]

= E[E[Ztξ|Ys

]]= E

[Zsξ

]= E

[E[Zs|Ys]ξ

],

which implies that ρt(1) is a Yt-martingale.

Page 125: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4.4 Bibliographical Notes 125

4.22 From Lemma 3.18 it follows that ρt is cadlag, and ρt is Yt-adaptedwhich implies that it is Ft-adapted since Yt ⊂ Ft. To check the condition(4.37), note that

(µt(|(hi +Bi)ϕ|))2 ≤ 2 (µt(|hiϕ|))2 + 2 (µt(|Biϕ|))2

≤ 2‖ϕ‖2∞ (µt(‖h‖))2 + 2‖Bϕ‖2∞ (µt(1))2.

Thus∫ t

0

m∑i=1

[µs(|(hi +Bi)ϕ|)]2 ds

≤ 2m(‖ϕ‖2∞

∫ t

0

(µs(‖h‖))2 ds+ ‖Bϕ‖2∞∫ t

0

(µs(1))2 ds)

≤ 2m

‖ϕ‖2∞ ∫ t

0

(µs(‖h‖))2 ds+ t‖Bϕ‖2∞

(sups∈[0,t]

µs(1)

)2 .

Since (3.42) is satisfied, the first term is P-a.s. finite. As µt(1) has cadlagpaths, it follows that the second term is P-a.s. finite.

4.27 If h is bounded then conditions (3.25) and (3.42) are automaticallysatisfied. If πt is the normalised conditional distribution, by Lemma 3.29,mπt = ρt(1), hence from the Kallianpur–Striebel formula (3.36) mπ

t πt(ϕ) =ρt(ϕ), and from Exercise 4.22 it then follows that mππ is in U ′. As πt is Yt-adapted, it is Ft-adapted. Furthermore, from Corollary 2.26 the process πthas cadlag paths; thus πt is in U ′.

4.28 Suppose that there are two solutions π1 and π2 in U ′. Then ρi , mπiπiare corresponding solutions of the Zakai equation, and from the definition ofU ′ must lie in U ′. As condition U ′ holds, by Theorem 4.26, it follows that ρ1

and ρ2 are indistinguishable. The remainder of the proof is identical to thatof Theorem 4.19.

4.4 Bibliographical Notes

There are numerous other approaches to establish uniqueness of solution to thefiltering equations. Several papers address the question of uniqueness with-out assuming that the solution of the two SPDEs (Zakai’s equation or theKushner–Stratonovich equation) is adapted with respect to the given obser-vation σ-field Yt. A benefit of this approach is that it allows uniqueness in lawof the solution to be established. In Szpirglas [264], the author shows that inthe absence of correlation between the observation noise and the signal, theZakai equation is equivalent to the equation

ρt(ϕ) = π0(Ptϕ) +∫ t

0

ρs(Pt−sϕh>) dYs, (4.49)

Page 126: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

126 4 Uniqueness of the Solution

for all ϕ ∈ B(S), where Pt is the semigroup associated with the generator A.This equivalence means that a solution of the Zakai equation is a solution of(4.49) and vice versa. The uniqueness of the solution of (4.49) is establishedby iterating a simple integral inequality (Section V2, [264]). However, thistechnique does not appear to extend to the case of correlated noise.

More recently, Lucic and Heunis [200] prove uniqueness for the correlatedcase, again without the assumption of adaptedness of the solution to theobservation σ-algebra. There are no smoothness conditions imposed on thecoefficients of the signal or observation equation. However h is assumed to bebounded and the signal non-degenerate (i.e. σ>σ is required to be positivedefinite).

The problem of establishing uniqueness when ρt and πt are required to beadapted to a specified σ-algebra Yt is considered in Kurtz and Ocone [170]and further in Bhatt et al. [18]. This form of uniqueness can be establishedunder much less restrictive conditions on the system.

Page 127: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5

The Robust Representation Formula

5.1 The Framework

Throughout this section we assume that the pair (X,Y ) are as defined inChapter 3. That is, X is a solution of the martingale problem for (A, π0) andY satisfies the evolution equation (3.5) with null initial condition; that is,

Ys =∫ s

0

h(Xr) dr +Ws, s ≥ 0. (5.1)

To start off with, we assume that the function h = (hi)mi=1 : S→ Rm satisfieseither Novikov’s condition (3.19) or condition (3.25) so that the process Z =Zt, t > 0 defined by

Zt = exp(−∫ t

0

h(Xs)> dWs −12

∫ t

0

‖h(Xs)‖2 ds), t ≥ 0, (5.2)

is a genuine martingale and the probability measure P defined on Ft by takingits Radon–Nikodym derivative with respect to P to be given by Zt, viz

dPdP

∣∣∣∣∣Ft

= Zt

is well defined (see Section 3.3 for details; see also Theorem B.34 and CorollaryB.31). We remind the reader that, under P the process Y is a Brownian motionindependent of X. The Kallianpur–Striebel formula (3.33) implies that for anyϕ a bounded Borel-measurable function

πt(ϕ) =ρt(ϕ)ρt(1)

P(P)-a.s.,

where ρt is the unnormalised conditional distribution of X,

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20095,

Page 128: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

128 5 The Robust Representation Formula

ρt(ϕ) = E[ϕ(Xt)Zt

∣∣∣Yt] ,and

Zt = exp(∫ t

0

h(Xs)> dYs −12

∫ t

0

‖h(Xs)‖2 ds). (5.3)

Exercise 5.1. Show that the Kallianpur–Striebel formula holds true for anyBorel-measurable function ϕ such that E [|ϕ(Xt)|] <∞.

In the following, we require that s 7→ h(Xs) be a semimartingale. Let

h(Xs) = H fvs +Hm

s , s ≥ 0

be the Doob–Meyer decomposition of h(Xs) with H fv· = (H fv,i

· )mi=1 the finitevariation part of h(X), and Hm

· = (Hm,i· )mi=1 the martingale part, which is

assumed to be square integrable. We require that for all positive k > 0, thefollowing conditions be satisfied,

cfv,k = E

[exp

(k

m∑i=1

∫ t

0

∣∣dH fv,is

∣∣)] <∞ (5.4)

cm,k = E

[exp

(k

m∑i=1

∫ t

0

d⟨Hm,i

⟩s

)]<∞, (5.5)

where s 7→⟨Hm,i

⟩s

is the quadratic variation of Hm,i, for i = 1, . . . ,m and∫ t0

∣∣dH fv,is

∣∣ is the total variation of H fv,i on [0, t] for i = 1, . . . ,m.

Exercise 5.2. Using the notation from Chapter 3, show that if X is a solutionof the martingale problem for (A, π0) and hi, (hi)2 ∈ D(A), i = 1, . . . ,m, thenconditions (5.4) and (5.5) are satisfied. [Hint: Use Exercise 3.22.]

5.2 The Importance of a Robust Representation

In the following we denote by y· an arbitrary element of the set CRm [0, t], wheret ≥ 0 is arbitrary but fixed throughout the section. In other words s 7→ ys isa continuous function y· : [0, t]→ Rm. Also let Y· be the path-valued randomvariable

Y· : Ω → CRm [0, t], Y·(ω) = (Ys(ω), 0 ≤ s ≤ t).

Similar to Theorem 1.1, one can show that if ϕ is, for example, a boundedBorel-measurable function, then πt(ϕ) can be written as a function of theobservation path. That is, there exists a bounded measurable function fϕ :CRm [0, t]→ R such that

πt(ϕ) = fϕ(Y·) P-a.s. (5.6)

Page 129: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.3 Preliminary Bounds 129

Of course, fϕ is not unique. Any other function fϕ such that

P Y −1·(fϕ 6= fϕ

)= 0,

where P Y −1· is the distribution of Y· on the path space CRm [0, t] can re-

place fϕ in (5.6). In the following we obtain a robust representation of theconditional expectation πt(ϕ) (following Clark [56]). That is, we show thatthere exists a continuous function fϕ : CRm [0, t] → R (with respect to thesupremum norm on CRm [0, t]) such that

πt(ϕ) = fϕ(Y·) P-a.s. (5.7)

The following exercise shows that such a continuous fϕ has the virtue ofuniqueness.

Exercise 5.3. Show that if PY −1· positively charges all non-empty open sets

in CRm [0, t], then there exists a unique continuous function fϕ : CRm [0, t]→ Rfor which (5.7) holds true. Finally show that if Y satisfies evolution equation(5.1) then it charges all non-empty open sets.

The need for this type of representation arises when the filtering frameworkis used to model and solve ‘real-life’ problems. As explained in a substantialnumber of papers (e.g. [56, 74, 73, 75, 76, 179, 180]) the model Y chosenfor the “real-life” observation process Y may not be a perfect one. However,as long as the distribution of Y· is close in a weak sense to that of Y· (andsome integrability assumptions hold), the estimate f(Y·) computed on theactual observation will still be reasonable, as E[(ϕ(Xt) − fϕ(Y·))2] is wellapproximated by the idealized error E[(ϕ(Xt)− fϕ(Y·))2].

Even when Y and Y coincide, one is never able to obtain and exploit acontinuous stream of data as modelled by the continuous path Y·(ω). Insteadthe observation arrives and is processed at discrete moments in time

0 = t0 < t1 < t2 < · · · < tn = t.

However the continuous path Y·(ω) obtained from the discrete observations(Yti(ω))ni=1 by linear interpolation is close to Y·(ω) (with respect to the supre-mum norm on CRm [0, t]); hence, by the same argument, fϕ(Y·) will be asensible approximation to πt(ϕ).

5.3 Preliminary Bounds

Let Θ(y·) be the following random variable

Θ(y·) , exp(h(Xt)>yt − I(y·)− 1

2

∫ t

0

‖h(Xs)‖2 ds), (5.8)

Page 130: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

130 5 The Robust Representation Formula

where I(y·), is a version of the stochastic integral∫ t

0y>s dh(Xs). The argument

of the exponent in the definition of Θ(y·) will be recognized as a formal inte-gration by parts of the argument of the exponential in (5.3). In the following,for any random variable ξ we denote by ‖ξ‖Ω,p the usual Lp norm of ξ,

‖ξ‖Ω,p = E [|ξ|p]1/p ,

Lemma 5.4. For any R > 0 and p ≥ 1 there exists a positive constant MΘR,p

such thatsup‖y·‖≤R

‖Θ(y·)‖Ω,p ≤MΘR,p. (5.9)

Proof. In the following, for notational conciseness, for arbitrary y· ∈ CRm [0, t],define y· ∈ CRm [0, t] by

ys , yt − ys, s ∈ [0, t].

If ‖y·‖ ≤ R, then it is clear that ‖y·‖ ≤ 2R. From (5.8) we get that

Θ(y·) = exp(∫ t

0

y>s dh(Xs)− 12

∫ t

0

‖h(Xs)‖2 ds)

≤ exp(∫ t

0

y>s dH fvs +

∫ t

0

y>s dHms

).

Next observe that, from (5.4) we have

E[exp(

2p∫ t

0

y>s dH fvs

)]≤ E

[exp(

4pR∫ t

0

∣∣dH fvs

∣∣)] = cfv,4pR,

and by using the Cauchy–Schwartz inequality

E[exp(

2p∫ t

0

y>s dHms

)]

= E

exp

2p∫ t

0

y>s dHms − 4p2

m∑i,j=1

∫ t

0

yisyjs d〈Hm,i, Hm,j〉s

+ 4p2m∑

i,j=1

∫ t

0

yisyjs d〈Hm,i, Hm,j〉s

≤√

E [Θ′r(y·)]

√√√√√E

exp

8p2

m∑i,j=1

∫ t

0

yisyjs d〈Hm,i, Hm,j〉s

≤√

E [Θ′r(y·)]

√√√√√E

exp

32p2R2

m∑i,j=1

∫ t

0

|d〈Hm,i, Hm,j〉|s

,

Page 131: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.3 Preliminary Bounds 131

where

Θ′r(y·) , exp

4p∫ r

0

y>s dHms −

(4p)2

2

m∑i,j=1

∫ r

0

yisyjs d〈Hm,i, Hm,j〉s

.

The process r 7→ Θ′r(y·) is clearly an exponential local martingale and byNovikov’s condition and (5.5) it is a martingale, so

E [Θ′r(y·)] = 1.

From this, the fact that∫ t

0

∣∣d ⟨Hm,i, Hm,j⟩∣∣s≤ 1

2

∫ t

0

d⟨Hm,i

⟩s

+12

∫ t

0

d⟨Hm,j

⟩s,

and (5.5) we get

E[exp(

2p∫ t

0

y>s dHms

)]≤√cm,32p2R2m.

Hence, again by applying Cauchy–Schwarz’s inequality, (5.9) follows withMΘR,p = (cfv,4pR

√cm,32p2R2m)1/2p. ut

Now let ϕ be a Borel-measurable function such that ‖ϕ(Xt)‖Ω,p <∞ forsome p > 1. Note that ‖ϕ(Xt)‖Ω,p is the same whether we integrate withrespect to P or P. Let gϕ, g1, fϕ : CRm [0, t]→ R be the following functions,

gϕ(y·) = E [ϕΘ(y·)] , g1(y·) = E [Θ(y·)] , f(y·) =gϕ(y·)g1(y·)

. (5.10)

Lemma 5.5. For any R > 0 and q ≥ 1 there exists a positive constant MΘR,q

such that ∥∥Θ(y1· )−Θ(y2

· )∥∥Ω,q≤MΘ

R,q

∥∥y1· − y2

·∥∥ (5.11)

for any two paths y1· , y

2· such that |y1

· |, |y2· | ≤ R. In particular, (5.11) implies

that g1 is locally Lipschitz; more precisely∣∣g1 (y1·)− g1

(y2·)∣∣ ≤MΘ

R

∥∥y1· − y2

·∥∥

for any two paths y1· , y

2· such that

∥∥y1·∥∥ ,∥∥y2

·∥∥ ≤ R and MΘ

R = infq≥1MΘR,q.

Proof. For the two paths y1· , y

2· denote by y12

· the difference path defined asy12· , y1

· − y2· . Then

∣∣Θ (y1·)−Θ

(y2·)∣∣ ≤ (Θ (y1

·)

+Θ(y2·)) ∣∣∣∣∫ t

0

(y12s

)>dh(Xs)

∣∣∣∣ ,Using the Cauchy–Schwartz inequality

Page 132: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

132 5 The Robust Representation Formula

∥∥Θ(y1· )−Θ(y2

· )∥∥Ω,q≤ 2MΘ

R,2q

∥∥∥∥∫ t

0

(y12s

)>dh(Xs)

∥∥∥∥Ω,2q

. (5.12)

Finally, since∥∥y12·∥∥ ≤ 2

∥∥y1· − y2

·∥∥, a standard argument based on Burk-

holder–Davis–Gundy’s inequality shows that the expectation on the right-hand side of (5.12) is bounded by∥∥∥∥∫ t

0

(y12s

)>dh(Xs)

∥∥∥∥Ω,2q

≤∥∥∥∥∫ t

0

(y12s

)>dH fv

s

∥∥∥∥Ω,2q

+∥∥∥∥∫ t

0

(y12s

)>dHm

s

∥∥∥∥Ω,2q

≤ 2∥∥y1· − y2

·∥∥∥∥∥∥∫ t

0

∣∣dH fvs

∣∣∥∥∥∥Ω,2q

+ 2cq∥∥y1· − y2

·∥∥ m∑i=1

∥∥∥∥∫ t

0

d⟨Hm,i

⟩s

∥∥∥∥1/2

Ω,q

,

where cq is the constant appearing in the Burkholder–Davis–Gundy inequality.Hence (5.11) holds true. utLemma 5.6. The function gϕ is locally Lipschitz and locally bounded.

Proof. Fix R > 0 and let y1· , y

2· be two paths such that ‖y1

· ‖, ‖y2· ‖ ≤ R. By

Holder’s inequality and (5.11), we see that

E[∣∣ϕ(Xt)

∣∣ ∣∣Θ (y1·)−Θ

(y2·)∣∣] ≤ ‖ϕ(Xt)‖Ω,pMΘ

R,q

∥∥y1· − y2

·∥∥ . (5.13)

where q is such that p−1 + q−1 = 1. Hence gϕ is locally Lipschitz, since

gϕ(y1· )− gϕ(y2

· ) = E[ϕ(Xt)

(Θ(y1

· )−Θ(y2· ))]

and R > 0 was arbitrarily chosen. Next let y· be a path such that ‖y·‖ ≤ R.Again, by Holder’s inequality and (5.9), we get that

sup‖y·‖≤R

|gϕ(y·)| = sup‖y·‖≤R

∣∣∣E [ϕ(Xt)Θ(y1·)]∣∣∣ ≤ ‖ϕ(Xt)‖pMΘ

R,q <∞.

Hence gϕ is locally bounded. utTheorem 5.7. The function fϕ is locally Lipschitz.

Proof. The ratio gϕ/g1 of the two locally Lipschitz functions gϕ and g1

(Lemma 5.5 and Lemma 5.6) is locally Lipschitz provided both gϕ and 1/g1tare locally bounded. The local boundedness property of gϕ is shown in Lemma5.6 and that of 1/g1t follows from the following simple argument. If ‖y·‖ ≤ RJensen’s inequality implies that

E [Θ(y·)] ≥ exp(

E[∫ t

0

y>s dHms +

∫ t

0

y>s dH fvs −

12

∫ t

0

‖h(Xs)‖2 ds])

≥ exp

(−2R

m∑i=1

E[∫ t

0

∣∣dH fv,is

∣∣]− 12

E[∫ t

0

‖h(Xs)‖2 ds])

. (5.14)

Note that both expectations in (5.14) are finite, by virtue of condition (5.4).ut

Page 133: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.4 Clark’s Robustness Result 133

5.4 Clark’s Robustness Result

We proceed next to show that fϕ(Y·) is a version of πt(ϕ). This fact is muchmore delicate than showing that fϕ is locally Lipschitz. The main difficultyis the fact that the mapping

(y·, ω) ∈ CRm [0, t]×Ω → I(y·) ∈ R

is not B (CRm [0, t])×F-measurable since the integral I(y·) is constructed pathby path (where B(CRm [0, t]) is the Borel σ-field on CRm [0, t]). Let H1/3 be thefollowing subset of CRm [0, t],

H1/3 =

y· ∈ CRm [0, t] : K(y·) , sup

s1,s2∈[0,t]

‖ys1 − ys2‖∞|s1 − s2|1/3

<∞

.

Exercise 5.8. Show that almost all paths of Y belong to H1/3, in other wordsshow that

P(ω ∈ Ω : Y·(ω) ∈ H1/3

)= 1.

[Hint: Use the modulus of continuity for Brownian motion; see, for example,[149, page 114].]

Lemma 5.9. There exists a version of the stochastic integral I(y·) which hasthe property that the mapping (y·, ω) ∈ CRm [0, t] × Ω → I(y·) ∈ R, whilststill non-measurable, is equal on H1/3 ×Ω to a B (CRm [0, t])×Ω-measurablemapping.

Proof. Denote by I fv(y·) the Stieltjes integral with respect to H fv. I fv(y·)is defined unambiguously pathwise. To avoid ambiguity, for arbitrary y· ∈CRm [0, t] and all ω ∈ Ω, we have

I fv(y·)(ω) = limn→∞

n−1∑i=0

y>it/n

(H fv

(i+1)t/n(ω)−H fvit/n(ω)

).

Hence defining I(y·) only depends on selecting the version of∫ t

0y>s dHm

s , thestochastic integral with respect to the martingale part of h(X·), which wedenote by Im(y·). Recall that for integrators which have unbounded variationon locally compact intervals it is not possible to define a stochastic integralpathwise for general integrands. However, if we restrict to a suitable class ofintegrands (such as H1/3) then this is possible.

Imn (y·)(ω) ,

n−1∑i=0

y>it/n

(Hm

(i+1)t/n(ω)−Hmit/n(ω)

).

Since, for y· ∈ H1/3,

Page 134: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

134 5 The Robust Representation Formula

E

[(Im2k(y·)−

∫ t

0

y>s dHms

)2]

= E

( m∑i=1

∫ t

0

(yis − yi[s2k/t]t2−k

)dHm,i

s

)2

≤ mm∑i=1

E[∫ t

0

(yis − yi[s2k/t]t2−k

)2

d⟨Hm,i

⟩s

]≤ mcXK(y·)2t2/3

22k/3,

where

cX =m∑i=1

E[(Hm,it

)2]<∞.

Hence by Chebychev’s inequality

P(∣∣∣∣Im

2k(y·)−∫ t

0

y>s dHms

∣∣∣∣ > ε

)≤ 1ε2

mcXK(y·)2t2/3

22k/3.

But since∞∑k=1

mcXK(y·)2t2/3

22k/3<∞,

by the first Borel–Cantelli lemma it follows that

P(

lim supk→∞

∣∣∣∣Im2k(y·)−

∫ t

0

y>s dHms

∣∣∣∣ > ε

)= 0;

hence for y ∈ H1/3, Im2k(y·) converges to

∫ t0y>s dHm

s , P-almost surely. We defineIm(y·) to be the limit

Im(y·)(ω) , lim supk→∞

Im2k(y·)(ω)

for any (ω, y·) ∈ Ω×H1/3 and any version of∫ t

0y>s dHm

s on(CRm [0, t] \ H1/3

Ω. Although the resulting map is generally non-measurable with respect toB(CRm [0, t])⊗F , it is equal on H1/3 ×Ω to the following jointly measurablefunction

Jm(y·) , lim supk→∞

Im2k(y·) (5.15)

defined on the whole of CRm [0, t] × Ω. We emphasize that for y /∈ H1/3 it isquite possible that Jm(y) differs from the value of

∫ t0y>s dHm

s . ut

In order to simplify the proof of the robustness result which follows, itis useful to decouple the two processes X and Y . Let (Ω, F , P) be an iden-tical copy of (Ω,F , P) and let X be the copy of X within the new space

Page 135: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.4 Clark’s Robustness Result 135

(Ω, F , P). Let Hm and H fv be the processes within the new space (Ω, F , P)corresponding to the original Hm and H fv. Then the function gϕ has thefollowing representation,

gϕ(y·) = E[ϕ(Xt)Θ(y·)

](5.16)

Θ(y·) = exp(h(Xt)>yt − I(y·)− 1

2

∫ t

0

‖h(Xt)‖2 ds), (5.17)

where E denotes integration on (Ω, F , P), and I(y·) is the version of thestochastic integral

∫ t0y>s dh(Xs) corresponding to I(y·) as constructed above.

Denote by Im(y·) the respective version of the stochastic integral with respectto the martingale Hm and by I fv(y·) the Stieltjes integral with respect to H fv.Let Jm(y·) be the function corresponding to Jm(y·) as defined in (5.15). Then,for y· ∈ H1/3, Θ(y·) can be written as

Θ(y·) = exp(h>(Xt)yt − I fv(y·)− Jm(y·)− 1

2

∫ t

0

‖h(Xs)‖2 ds). (5.18)

Finally, let (Ω, F , P) be the product space

(Ω, F , P) = (Ω × Ω,F ⊗ F , P⊗ P)

on which we ‘lift’ the processes H and Y from the component spaces. In otherwords, Y (ω, ω) = Y (ω) and H(ω, ω) = H(ω) for all (ω, ω) ∈ Ω × Ω.

Lemma 5.10. There exists a null set N ∈ F such that the mapping (ω, ω) ∈Ω 7→ I(Y (ω))(ω) coincides on (Ω\N )× Ω with an F-measurable mapping.

Proof. First let us remark that (ω, ω) 7→ I fv(Y (ω))(ω) is equal to

I fv (Y·(ω)) (ω) = limn→∞

n−1∑i=0

Y >it/n(ω)(H fv

(i+1)t/n(ω)− H fvit/n(ω)

)(5.19)

and since

(ω, ω) ∈ Ω 7→n−1∑i=0

Y >it/n(ω)(H fv

(i+1)t/n(ω)− H fvit/n(ω)

)is F-measurable then so is its limit. Define N , ω ∈ Ω : Y·(ω) 6∈ H1/3.Then N ∈ F and P(N ) = 0. Following the definition of Im(y·), the mapping(ω, ω) 7→ Im(Y (ω))(ω) coincides with the mapping (ω, ω) 7→ Jm(Y (ω))(ω) on(Ω\N )× Ω. Then Jm is an F-measurable random variable, since

Jm(Y (ω))(ω) = lim supk→∞

2k−1∑i=0

Y >it/2k(ω)(Hm

(i+1)t/2k(ω)− Hmit/2k(ω)

). (5.20)

Combining this with the measurability of I fv(Y·) gives us the lemma. ut

Page 136: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

136 5 The Robust Representation Formula

Lemma 5.11. P-almost surely∫ t

0

Y >s dHs = I fv(Y·) + Jm(Y·). (5.21)

Proof. We have ∫ t

0

Y >s dHs =∫ t

0

Y >s dHms +

∫ t

0

Y >s dH fvs .

Following (5.19) it is obvious that∫ t

0Y >s dH fv

s = I fv(Y·). Hence, followingthe proof of the previous lemma, it suffices to prove that, P-almost surely,∫ t

0Y >s dHm

s = Jm(Y·) where Jm(Y·) is the function defined in (5.20). Withoutloss of generality we assume that m = 1 (the general case follows by treatingeach of the m components in turn) and we note that we only need to provethat, for arbitrary K > 0, P-almost surely,∫ t

0

Y Ks dHms = Jm(Y K· ), (5.22)

where

Y Ks =

Ys if |Ys| ≤ KK otherwise.

In turn, (5.22) follows once we prove that

limn→∞

E

(n−1∑i=0

(Y Kit/n

)> (Hm

(i+1)t/n − Hmit/n

)− Jm

(Y K·))2

= 0.

By Fubini’s theorem, using the F-measurability of Jm(Y K· ) and the fact thatIm(Y K· ) coincides with Jm(Y K· ) on (Ω\N )× Ω we have

E

(n−1∑i=0

(Y Kit/n

)> (Hm

(i+1)t/n − Hmit/n

)− Jm

(Y K·))2

=∫Ω\N

E[(Imn

(Y K· (ω)

)− Jm

(Y K·))2]

dP(ω)

=∫Ω\N

E[(Imn (Y K· (ω))− Im(Y K· )

)2]

dP(ω).

Now since s 7→ Y Ks (ω) is a continuous function and Im(Y K· (ω)) is a versionof the stochastic integral

∫ t0

(Y Ks)> (ω) dHm

s , it follows that

limn→∞

E[(Imn (Y K· (ω))− Im(Y K· (ω))

)2]

= 0

Page 137: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.4 Clark’s Robustness Result 137

for all ω ∈ Ω\N . Also, we have the following upper bound

E[(Imn (Y K· (ω))− Im(Y K· (ω))

)2]≤ 4K2E

[(Hm

t )2]<∞.

Hence, by the dominated convergence theorem,

limn→∞

E

(n−1∑i=0

(Y Kit/n

)> (Hm

(i+1)t/n − Hmit/n

)− Im(Y K· )

)2

=∫Ω\N

limn→∞

E[(Imn (Y K· (ω))− Im(Y K· (ω))

)2]

dP(ω) = 0.

ut

Theorem 5.12. The random variable fϕ(Y·) is a version of πt(ϕ); that is,πt(ϕ) = fϕ(Y·), P-almost surely. Hence fϕ(Y·) is the unique robust represen-tation of πt(ϕ).

Proof. It suffices to prove that, P-almost surely (or, equivalently, P-almostsurely),

ρt(ϕ) = gϕ(Y·) and ρt(1) = g1(Y·).

We need only prove the first identity as the second is just a special case ob-tained by setting ϕ = 1 in the first. From the definition of abstract conditionalexpectation therefore it suffices to show

E [ρt(ϕ)b(Y·)] = E [gϕ(Y·)b(Y·)] , (5.23)

where b is an arbitrary continuous bounded function b : CRm [0, t]→ R. SinceX and Y are independent under P, it follows that the pair processes (X,Y )under P, and (X, Y ) under P have the same distribution. Hence, the left-handside of (5.23) has the following representation,

E [ρt(ϕ)b(Y·)]

= E[ϕ(Xt) exp

(∫ t

0

h(Xs)> dYs − 12

∫ t

0

‖h(Xs)‖2 ds)b(Y·)

]= E

[ϕ(Xt) exp

(∫ t

0

h(Xs)> dYs − 12

∫ t

0

‖h(Xs)‖2 ds)b(Y·)

]= E

[ϕ(Xt) exp

(h(Xt)>Yt −

∫ t

0

Y >s dh(Xs)− 12

∫ t

0

‖h(Xs)‖2 ds)b(Y·)

].

On the other hand, using (5.18), the right-hand side of (5.23) has the repre-sentation

Page 138: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

138 5 The Robust Representation Formula

E [gϕ(Y·)b(Y·)]

= E[b(Y·)E

[ϕ(Xt) exp

(h(Xt)>Yt − I fv(Y·)− Im(Y·)

− 12

∫ t

0

‖h(Xs)‖2 ds)]]

= E[b(Y·)E

[ϕ(Xt) exp

(h(Xt)>Yt − I fv(Y·)− Jm(Y·)

− 12

∫ t

0

‖h(Xs)‖2 ds)]]

.

Hence by Fubini’s theorem (using, again the F-measurability of Jm(Y·))

E [gϕ(Y·)b(Y·)] = E[ϕ(Xt) exp

(h(Xt)>Yt − I fv(Y·)− Jm(Y·)

−12

∫ t

0

‖h(Xs)‖2 ds)b(Y·)

].

Finally, from Lemma 5.11, the two representations coincide. ut

Remark 5.13. Lemma 5.11 appears to suggest a pathwise construction for thestochastic integral ∫ t

0

h(Xs)> dYs,

but we know that for cases such as∫ t

0Bs dBs a stochastic integral cannot

be defined pathwise (see Remark B.17). However, this apparent paradox isresolved by noting that the terms appearing in the lemma are only constructedon the space Ω.

This construction has other uses in the numerical solution of problemsinvolving stochastic integrals. For example, adaptive pathwise approximationis sometimes used in numerical evaluation of stochastic integrals. Suppose wewish to evaluate the stochastic integral

∫ t0Xs dYs where X and Y are cadlag

processes and we assume the usual conditions on the filtration. Given δ > 0,if we define stopping times T δ0 = 0 and

T δk = inft > T δk−1 : |Xt −Xtk−1 | > δ,

then the stochastic integral may be approximated pathwise by

(X · Y )(δ) ,∞∑k=0

XT δk(YT δk+1

− YT δk ).

If δn is a sequence of values of δ which tends to zero sufficiently fast, bysimilar calculations to those used in the justification that Im is a pathwiseapproximation to the stochastic integral, this series of approximations canbe shown to converge P-a.s. uniformly on a finite interval to the stochasticintegral as n→∞.

Page 139: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

5.6 Bibliographic Note 139

5.5 Solutions to Exercises

5.1 Repeat the proof of the formula for a Borel-measurable function ϕ suchthat E [|ϕ(Xt)|] <∞. Alternatively use the following argument. It suffices toprove the result only for ϕ a non-negative Borel-measurable function such thatE [ϕ(Xt)] <∞, as the general result follows by decomposing the function intoits positive and negative parts. Consider the sequence (ϕn)n≥0 of functionsdefined as

ϕn =

ϕ(x) if ϕ(x) ≤ nn otherwise

.

Then ϕn is bounded and by the Kallianpur–Striebel formula (3.33),

πt(ϕn) =ρt(ϕn)ρt(1)

P(P)−a.s.

AlsoE[ϕ (Xt) Zt

]= E [ϕ(Xt)] <∞.

Hence, by the conditional monotone convergence theorem

πt(ϕ) = limn→∞

πt(ϕn) =1

ρt(1)limn→∞

ρt(ϕn) =ρt(ϕ)ρt(1)

.

5.3 Let fϕ1 and fϕ2 be two continuous functions both versions of πt(ϕ). Then

A =y· ∈ CRm [0, t] : fϕ1 (y·) 6= fϕ2 (y·)

is an open set CRm [0, t]. Also, from (5.7), we get that

P Y −1· (A) = P

(ω ∈ Ω : fϕ1 (Y·(ω)) 6= fϕ2 (Y·(ω))

)= 0.

Since PY −1· positively charges all non-empty open sets in CRm [0, t], it follows

that A must be empty. Finally observe that, by Girsanov’s theorem the distri-bution of Y· under P is absolutely continuous with respect to the distributionof Y under P. The results follows since the Wiener measure charges all opensets in CRm [0, t] and the Radon–Nikodym derivative dP/dP is almost surelypositive.

5.6 Bibliographic Note

The robust representation was introduced by Clark [56]. Both Clark and Kush-ner [179] show that the associated robust expression for the conditional dis-tribution fϕ given by (5.10) is locally Lipschitz continuous in the observationpath y. Very general robustness results have been obtained by Gyongy [115]and Gyongy and Krylov [114].

Page 140: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6

Finite-Dimensional Filters

In Section 3.5 we analyzed the case when X is a Markov process with finitestate space I and associated Q-matrix Q (see Exercise 3.27). In that case,π = πt, t ≥ 0 the conditional distribution of Xt given the σ-algebra Ytis a finite-dimensional process. More precisely π = (πit)i∈I , t ≥ 0, theconditional distribution of Xt given the σ-algebra Yt is a process with valuesin RI which solves the stochastic differential equation (3.53). The naturalquestion which arises is whether the finite-dimensionality property is preservedwhen the signal is a diffusion process, in particular when the signal is thesolution of the d-dimensional stochastic differential equation (3.9) (see Section3.2 for details). In general, the answer to this question is negative (see, e.g.[42, 189, 231, 233]). With some notable exceptions, π is truly an infinite-dimensional stochastic process. The aim of this chapter is to study two specialclasses of filters for which the corresponding π is finite-dimensional: the Benesfilter (see [9]) and the linear filter, also known as the Kalman–Bucy filter([29, 146, 147]).

6.1 The Benes Filter

To simplify the calculations, we assume that both the signal and the obser-vation are one-dimensional. We also assume that the signal process satisfies astochastic differential equation with constant diffusion term and non-randominitial condition; that is, X is a solution of the equation

Xt = x0 +∫ t

0

f(Xs) ds+ σVt. (6.1)

In (6.1) σ > 0 is a positive constant, x0 ∈ R, V is a Brownian motion and thefunction f : R→ R is differentiable, and satisfies the analogue of (3.10),

|f(x)− f(y)| ≤ K|x− y|. (6.2)

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20096,

Page 141: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

142 6 Finite-Dimensional Filters

As in Chapter 3, the Lipschitz condition (6.2) is to ensure that the SDE for thesignal process has a unique solution. We assume that W is a standard Brow-nian motion which is independent of V and that Y is the process satisfyingthe following evolution equation

Yt =∫ t

0

h(Xs) ds+Wt. (6.3)

In (6.3) h : R→ R is chosen to be the linear function

h(x) = h1x+ h2, x ∈ R, where h1, h2 ∈ R.

We assume that the following condition, introduced by Benes in [9], is satisfied

f ′(x) + f2(x)σ−2 + h2(x) = P (x), x ∈ R, (6.4)

where f ′ is the derivative of f and P (x) is a second-order polynomial withpositive leading-order coefficient.

Exercise 6.1. i. Show that if f is linear then the Benes condition is satisfied(which establishes that the linear filter with time-independent coefficientsis a Benes filter).

ii. Show that the function f defined as

f(x) = ασβe2αx/σ − 1βe2αx/σ + 1

, where α, β ∈ R

satisfies the Benes condition. Thus show that f(x) = aσ tanh(ax/σ) sat-isfies the Benes condition.

iii. Show that the function f defined as

f(x) = aσ tanh(b+ ax/σ), where a, b ∈ R,

satisfies the Benes condition.

6.1.1 Another Change of Probability Measure

We need to apply a change of the probability measure similar to the one de-tailed in Section 3.3. This time both the distribution of X and Y are affected,not just that of the observation process Y as was previously the case. LetZ = Zt, t > 0 be the process defined by

Zt , exp(−∫ t

0

f(Xs)σ

dVs −12

∫ t

0

f(Xs)2

σ2ds

−∫ t

0

h(Xs) dWs −12

∫ t

0

h(Xs)2 ds), t ≥ 0. (6.5)

Page 142: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.1 The Benes Filter 143

Exercise 6.2. Show that the process Z = Zt, t ≥ 0 is an Ft-adaptedmartingale under the measure P.

Let P be a new probability measure such that its Radon–Nikodym deriva-tive with respect to P is

dPdP

∣∣∣∣∣Ft

= Zt

for all t ≥ 0. Let V = Vt, t > 0, be the process

Vt , Vt +∫ t

0

f(Xs)σ

ds, t ≥ 0.

Using Girsanov’s theorem the pair process (V , Y ) = (Vt, Yt), t > 0 is astandard two-dimensional Brownian motion. Let Z = Zt, t ≥ 0 be theprocess defined as Zt = Z−1

t for t ≥ 0. By Ito’s formula, this process Zsatisfies the following stochastic differential equation,

dZt = Zt

(h(Xt) dYt + f(Xt)σ−1 dVt

), (6.6)

and since Z0 = 1,

Zt = exp(∫ t

0

f(Xs)σ

dVs −12

∫ t

0

f(Xs)2

σ2ds

+∫ t

0

h(Xs) dYs −12

∫ t

0

h(Xs)2 ds), t ≥ 0. (6.7)

It is clear that EZt = E(ZtZt) = 1, so Z is a martingale under P and we have

dPdP

∣∣∣∣Ft

= Zt for t ≥ 0.

Let F be an antiderivative of f ; that is, F is such that F ′(x) = f(x) for allx ∈ R. By Ito’s formula,

F (Xt) = F (X0) +∫ t

0

f(Xs)σ dVs +12

∫ t

0

f ′(Xs)σ2 ds.

Thus from the Benes condition (6.4) we get that, for all t ≥ 0,

Zt = exp(F (Xt)σ2

− F (x0)σ2

+∫ t

0

h(Xs) dYs −12

∫ t

0

P (Xs) ds).

Exercise 6.3. Prove that, under P the observation process Y is a Brownianmotion independent of X, where we can write

Xt = X0 + σVt.

Page 143: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

144 6 Finite-Dimensional Filters

Define ρt to be a measure-valued process following the definition of theunnormalised conditional expectation in Chapter 3. For every ϕ a boundedBorel-measurable function, it follows that ρt(ϕ) satisfies

ρt(ϕ) , E[ϕ(Xt)Zt|Y] P-a.s., (6.8)

where E is the expectation with respect to P. As a consequence of Proposition3.15, the process ρ(ϕ) is a modification of that defined with Y replaced by Ytin (6.8).

Exercise 6.4. For every ϕ a bounded Borel-measurable function we have

πt(ϕ) =ρt(ϕ)ρt(1)

, P(P)-a.s. (6.9)

6.1.2 The Explicit Formula for the Benes Filter

We aim to obtain an explicit expression of the (normalised) density of ρt. Forthis we make use of the closed form expression (B.30) of the functional Iβ,Γ,δt

as described in equation (B.22) of the appendix. This cannot be done directlyas the argument of the exponential in (B.22) contains no stochastic integral.However, similar to the analysis in Chapter 5, one can show that

ρt(ϕ)ρt(1)

= limn→∞

ρnt (ϕ)ρnt (1)

,

where ρnt is the measure defined as

ρnt (ϕ) , E[ϕ(Xt) exp

(F (Xt)σ2

− F (x0)σ2

+∫ t

0

h(Xs)yns ds

−12

∫ t

0

P (Xs) ds)]

, (6.10)

for any bounded measurable function ϕ and yn = yns , s ∈ [0, t] the piecewiseconstant process

yns =Y(k+1)t/n − Ykt/n

t/n, s ∈ [kt/n, (k + 1)t/n), k = 0, 1, . . . , n− 1.

As explained in Chapter 5, the expectation in (6.10) is no longer conditional.We keep yn fixed to the observation path, or rather the approximation of its‘derivative’ and integrate with respect to the law of V .

Exercise 6.5. Prove that, almost surely,

limn→∞

∫ t

0

sinh(spσ)sinh(tpσ)

yns ds =∫ t

0

sinh(spσ)sinh(tpσ)

dYs,

Page 144: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.1 The Benes Filter 145

and that there exists a positive random variable c(t, Y ) such that, uniformlyin n ≥ 1, we have ∣∣∣∣∫ t

0

sinh(spσ)sinh(tpσ)

yns ds∣∣∣∣ ≤ c(t, Y ).

In the following, we express the polynomial P (x) in the form

P (x) = p2x2 + 2qx+ r,

where p, q, r ∈ R are arbitrary. Then we have the following.

Lemma 6.6. For an arbitrary bounded Borel-measurable function ϕ, the ratioρnt (ϕ)/ρnt (1) has the following explicit formula

ρnt (ϕ)ρnt (1)

=1cnt

∫ ∞−∞

ϕ(x0 + σz) exp(F (x0 + σz)σ−2 +Qnt (z)

)dz, (6.11)

where Qnt (z) is the second-order polynomial

Qnt (z) , z

(∫ t

0

sinh(spσ)sinh(tpσ)

σ(h1y

ns − q − p2x0

)ds)− pσ coth(tpσ)

2z2,

and cnt is the normalising constant

cnt ,∫ ∞−∞

exp(F (x0 + σz)σ−2 +Qn(z)

)dz.

Proof. From (6.10), the expression for ρnt (ϕ) becomes

ρnt (ϕ) = λnt E[ϕ(x0 + σVt) exp

(F(x0 + σVt

)σ−2 +

∫ t

0

Vsβns ds

−12

∫ t

0

(pσVs)2 ds)]

, (6.12)

where

λnt , exp(−F (x0)σ−2 +

∫ t

0

(h1x0 + h2)yns ds− 12

(r + 2x0q + p2x20)t),

βns , σ(h1yns − q − p2x0).

If we make the definition

Iβn,pσ,z

t , E[

exp(∫ t

0

Vsβns ds− 1

2

∫ t

0

(pσVs)2 ds)∣∣∣∣ Vt = z

],

then

Page 145: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

146 6 Finite-Dimensional Filters

E[ϕ(x0 + σVt) exp

(F(x0 + σVt

)σ−2 +

∫ t

0

Vsβns ds

−12

∫ t

0

(pσVs)2 ds)∣∣∣∣ Vt = z

]= Iβ

n,pσ,zt ϕ(x0 + σz) exp

(F (x0 + σz)

σ2

). (6.13)

Following (B.36) we get that

Iβn,pσ,z

t = fβn,pσ

t exp(z

∫ t

0

sinh(spσ)sinh(tpσ)

βns ds− pσ coth(tpσ)2

z2 +z2

2t

),

(6.14)where

fβn,pσ

t ,

√tpσ

sinh(tpσ)exp(∫ t

0

∫ t

0

sinh((s− t)pσ)sinh(s′pσ)2pσ sinh(tpσ)

βns βns′ dsds′

).

Identity (6.11) then follows from (6.12)–(6.14) by integrating over the N(0, t)law of Vt,

E(·) =1√2πt

∫ ∞−∞

E(· | Vt = z)e−z2/2t dz.

ut

Observe that the function fβn,pσ

t which is used in the above proof does notappear in the final expression for ρnt (ϕ)/ρnt (1). We are now ready to obtainthe formula for πt(ϕ).

Proposition 6.7. If the Benes condition (6.4) is satisfied then for arbitrarybounded Borel-measurable ϕ, it follows that πt(ϕ) satisfies the following ex-plicit formula

πt(ϕ) =1ct

∫ ∞−∞

ϕ(z) exp(F (z)σ−2 +Qt(z)

)dz, (6.15)

where Qt(z) is the second-order polynomial

Qt(z) , z

(h1σ

∫ t

0

sinh(spσ)sinh(tpσ)

dYs +q + p2x0

pσ sinh(tpσ)− q

pσcoth(tpσ)

)− p coth(tpσ)

2σz2,

and ct is the corresponding normalising constant,

ct ,∫ ∞−∞

exp(F (z)σ−2 +Qt(z)

)dz. (6.16)

In particular, π depends only on the one-dimensional Yt-adapted process

t 7→∫ t

0

sinh(spσ) dYs.

Page 146: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.1 The Benes Filter 147

Proof. Making a change of variable in (6.11), we get that

ρnt (ϕ)ρnt (1)

=1cnt

∫ ∞−∞

ϕ(u) exp(F (u)σ2

+Qnt

(u− x0

σ

))1σ

du.

Following Exercise 6.5 we get that

limn→∞

∫ t

0

sinh(spσ)sinh(tpσ)

yns ds =∫ t

0

sinh(spσ)sinh(tpσ)

dYs,

hence†

limn→∞

Qnt (z) = zσh1

∫ t

0

sinh(spσ)sinh(tpσ)

dYs

− q + p2x0

p(coth(tpσ)− csch(tpσ)) z − pσ coth(tpσ)

2z2.

Thus

Qt(u) = limn→∞

Qnt

(u− x0

σ

)= uh1

∫ t

0

sinh(spσ)sinh(tpσ)

dYs −p coth(tpσ)

2σu2

− q + p2x0

pσ(coth(tpσ)− csch(tpσ))u+

p coth(tpσ)σ

ux0.

Finally, since

πt(ϕ) = limn→∞

ρnt (ϕ)ρnt (1)

,

the proposition follows by the dominated convergence theorem (again useExercise 6.5). ut

Remark 6.8. For large t, as coth(x)→ 1 and csch(x)→ 0 as x→∞, it followsthat πt(ϕ) is approximately equal to

πt(ϕ) ' 1ct

∫ ∞−∞

ϕ(z) exp(F (z)σ−2 + Pt(z)

)dz,

where Pt(z) is the second-order polynomial

Pt(z) ,

(h1σ

∫ t

t′

sinh(spσ)sinh(tpσ)

dYs −q

σp

)z − p

2σz2, t′ < t.

In particular, past observations become quickly (exponentially) irrelevant andso does the initial position of the signal x0.† Recall that coth(x) = cosh(x)/ sinh(x) and csch(x) = 1/ sinh(x).

Page 147: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

148 6 Finite-Dimensional Filters

Exercise 6.9. Compute the normalising constant ct for the linear filter andthe filter given by f(x) = aσ tanh(ax/σ), which were shown to satisfy theBenes condition described in Exercise 6.1. Hence determine an explicit ex-pression for the density of πt. What is the asymptotic behaviour of πt forlarge t?

If the initial state of the signal X0 is random, then the formula for πt(ϕ)is obtained by integrating (6.15) in the x0 variable with respect to the lawof X0. A multidimensional version of (6.15) can be obtained by following thesame procedure as above. The details of the computation of the exponentialBrownian function Iβ,Γ,δt are described in formula (B.22) of the appendix inthe multidimensional case. Including the full form of πt(ϕ) in this case wouldmake this chapter excessively long. However, the fact that such a computationis possible is fairly important, due to the scarcity of explicit expressions for π.Such explicit expressions provide benchmarks for testing numerical algorithmsfor computing approximations to π.

6.2 The Kalman–Bucy Filter

Let now X = (Xi)di=1 be the solution of the linear SDE driven by a p-dimensional Brownian motion process V = (V j)pj=1,

Xt = X0 +∫ t

0

(FsXs + fs) ds+∫ t

0

σs dVs, (6.17)

where, for any s ≥ 0, Fs is a d× d matrix, σs is a d× p matrix and fs is a d-dimensional vector. The functions s 7→ Fs, s 7→ σs and s 7→ fs are measurableand locally bounded.† Assume that X0 ∼ N(x0, r0) is independent of V . Nextassume that W is a standard Ft-adapted m-dimensional Brownian motion on(Ω,F ,P) independent of X and let Y be the process satisfying the followingevolution equation

Yt =∫ t

0

(HsXs + hs) ds+Wt, (6.18)

where, for any s ≥ 0, Hs is a m×d matrix and hs is an m-dimensional vector.

Remark 6.10. Let Im be the m×m-identity matrix and 0a,b be the a×b matrixwith all entries equal to 0. Let Ls be the (d+m)× (d+m) matrix, ls be the(d+m)-dimensional vector and zs be the (d+m)× (r+m) matrix given by,respectively,

Ls =(Fs Od,m

Hs Om,m

), ls =

(fshs

), zs =

(σs Od,m

Om,r Im

).

† That is, for every time t, the functions are bounded for s ∈ [0, t].

Page 148: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.2 The Kalman–Bucy Filter 149

Let T = Tt, t > 0 be the (d + m)-dimensional pair process (X,Y ) andU = Ut, t > 0 be the (p+m)-dimensional Brownian motion (V,W ). ThenT is a solution of the linear SDE

Tt = T0 +∫ t

0

(LsTs + ls) ds+∫ t

0

zs dUs. (6.19)

Exercise 6.11. i. Prove that T has the following representation

Tt = Φt

[T0 +

∫ t

0

Φ−1s ls ds+

∫ t

0

Φ−1s zs dUs

], (6.20)

where Φ is the unique solution of the matrix equation

dΦtdt

= LtΦt, (6.21)

with initial condition Φ0 = Id+m.ii. Deduce from (i) that for any n > 0 and any n+ 1-tuple of the form(

Yt1 , Yt2 , . . . , Ytn−1 , Yt, Xt

),

where 0 ≤ t1 ≤ · · · ≤ tn−1 ≤ t, has a (d+nm)-variate normal distribution.iii. Let K : [0, t] → Rd×m be a measurable (d × m) matrix-valued function

with all of its entries square integrable. Deduce from (ii) that the pair(Xt,

∫ t

0

Ks dYs

)has a 2d-variate normal distribution.

Lemma 6.12. In the case of the linear filter, the normalised conditional dis-tribution πt of Xt conditional upon Yt is a multivariate normal distribution.

Proof. Consider the orthogonal projection of the components of the signal Xit ,

i = 1, . . . , d, onto the Hilbert spaceHYt ⊂ L2(Ω) generated by the componentsof the observation process

Y js , s ∈ [0, t], j = 1, . . . ,m.

Using Lemma 4.3.2, page 122 in Davis [71], the elements of HYt have thefollowing representation

HYt =

m∑i=1

∫ t

0

ai dY is : ai ∈ L2([0, t]), i = 1, . . . ,m

.

It follows that there exists a (d×m) matrix-valued function K : [0, t]→ Rd×mwith all of its entries square integrable, and a random variable Xt = (Xi

t)di=1

with entries orthogonal on HYt such that

Page 149: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

150 6 Finite-Dimensional Filters

Xt = Xt +∫ t

0

Ks dYs.

In particular, as a consequence of Exercise 6.11 part (iii), Xt has a Gaussiandistribution. Moreover, for any n > 0 any n-tuple of the form(

Yt1 , Yt2 , . . . , Ytn−1 , Xt

),

where 0 ≤ t1 ≤ · · · ≤ tn−1 ≤ t has a (d + (n − 1)m)-variate nor-mal distribution. Now since Xt has all entries orthogonal on HYt it followsthat Xt is independent of (Yt1 , Yt2 , . . . , Ytn−1) and since the time instances0 ≤ t1 ≤ · · · ≤ tn−1 ≤ t have been arbitrarily chosen it follows that Xt isindependent of Yt. This observation is crucial! It basically says that, in the lin-ear/Gaussian case, the linear projection (the projection onto the linear spacegenerated by the observation) coincides with the non-linear projection (theconditional expectation with respect to the observation σ-algebra). Hence thedistribution of Xt conditional upon Yt is the same as the distribution of Xt

shifted by the (fixed) quantity∫ t

0Ks dYs. In particular πt is characterized by

its first and second moments alone. ut

6.2.1 The First and Second Moments of the ConditionalDistribution of the Signal

We know from Chapter 3 that the conditional distribution of the signal isthe unique solution of the Kushner–Stratonovich equation (3.57). Unlike themodel analysed in Chapter 3, the above linear filter has time-dependent co-efficients. Nevertheless all the results and proofs presented there apply to thelinear filter with time-dependent coefficients (see Remark 3.1). In the follow-ing we deduce the equations for the first and second moments of π. Let ϕi, ϕijfor i, j = 1, . . . , d be the functions

ϕi(x) = xi, ϕij(x) = xixj , x ∈ R

and let πit, πijt be the moments of πt

πit = πt(ϕi), πijt = πt(ϕij), i, j = 1, . . . , d.

Exercise 6.13. i. Show that for any t ≥ 0 and i = 1, . . . , d and p ≥ 1, thesolution of the equation (6.17) satisfies

sups∈[0,t]

E[∣∣Xi

s

∣∣p] <∞.ii. Deduce from (i) that for any t ≥ 0 and i, j = 1, . . . , d

sups∈[0,t]

E [(πs (|ϕi|))p] <∞, sups∈[0,t]

E [|πs(|ϕij |)|p] <∞.

Page 150: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.2 The Kalman–Bucy Filter 151

In particular

sups∈[0,t]

E[|πis|p

]<∞, sup

s∈[0,t]

E[∣∣πijs ∣∣p] <∞.

In this case the innovation process I = It, t ≥ 0 defined by (2.17) hasthe components

Ijt = Y jt −∫ t

0

(d∑i=1

Hjis π

is + hjs

)ds, t ≥ 0, j = 1, . . . ,m.

The Kushner–Stratonovich equation (3.57) now takes the form

πt(ϕ) = π0(ϕ) +∫ t

0

πs(Asϕ) ds+d∑i=1

m∑j=1

∫ t

0

πs(ϕ(ϕi − πis

))Hjis dIjs (6.22)

where the time-dependent generator As, s ≥ 0 is given by

Asϕ =d∑

i,j=1

(F ijs xj + f is

) ∂ϕ∂xi

+12

d∑i=1

d∑j=1

(σsσ>s )ij∂2ϕ

∂xi∂xj,

and ϕ is chosen in the domain of As for any s ∈ [0, t] such that

sups∈[0,t]

‖Asϕ‖ <∞.

To find the equations satisfied by πit and πijt we cannot replace ϕ by ϕi andϕ by ϕij in (6.22) because neither of them belongs to the domain of As (sincethey are unbounded). We proceed by cutting off ϕi and ϕij at a fixed levelwhich we let tend to infinity. For this let us introduce the functions (ψk)k>0

defined asψk(x) = ψ(x/k), x ∈ Rd, (6.23)

where

ψ(x) =

1 if |x| ≤ 1

exp(|x|2−1|x|2−4

)if 1 < |x| < 2

0 if |x| ≥ 2

.

Obviously, for all k > 0, ψk ∈ C∞b (Rd) and 0 ≤ IB(k) ≤ ψk ≤ 1. Also, allpartial derivatives of ψk tend uniformly to 0. In particular

limk→∞

‖Aψk‖∞ = 0, limk→∞

‖∂iψk‖∞ = 0, i = 1, . . . , d.

In the following we use the relations

Page 151: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

152 6 Finite-Dimensional Filters

limk→∞

ϕi(x)ψk(x) = ϕi(x), |ϕi(x)ψk(x)| ≤ |ϕi(x)| , (6.24)

limk→∞

As(ϕiψk)(x) = Asϕi(x), (6.25)

sups∈[0,t]

|As(ϕiψ

k)

(x)| ≤ Ct

n∑i=1

|ϕi(x)|+n∑

i,j=1

|ϕij(x)|

. (6.26)

Proposition 6.14. Let x = xt, t ≥ 0 be the conditional mean of the signal.In other words, x is the d-dimensional process with components

xit = E[Xit |Yt] = πit, i = 1, . . . , d, t ≥ 0.

Define R = Rt, t ≥ 0 to be the conditional covariance matrix of the signal.In other words, Rt is the d× d-dimensional process with components

Rijt = E[XitX

jt |Yt]− E[Xi

t |Yt]E[Xjt |Yt]

= πijt − πitπjt , i, j = 1, . . . , d, t ≥ 0.

Then x satisfies the stochastic differential equation

dxt = (Ftxt + ft) dt+RtH>t (dYt − (Htxt + ht) dt), (6.27)

and R satisfies the deterministic matrix Riccati equation

dRtdt

= σtσ>t + FtRt +RtF

>t −RtH>t HtRt. (6.28)

Proof. Replacing ϕ by ϕiψk in (6.22) gives us

πt(ϕiψk) = π0(ϕiψk) +∫ t

0

πs(As(ϕiψ

k)) ds

+d∑l=1

m∑j=1

∫ t

0

πs((ϕiψ

k(ϕl − πls

)))Hjls dIjs . (6.29)

By the dominated convergence theorem (use (6.24)–(6.26)) we may pass tothe limit as k →∞,

limk→∞

πt(ϕiψk) = πt(ϕi) (6.30)

limk→∞

π0(ϕiψk) +∫ t

0

πs(As(ϕiψk)) ds = π0(ϕi) +∫ t

0

πs(Asϕi) ds. (6.31)

Also

limk→∞

E[∣∣∣∣∫ t

0

πs((ϕi(ψk − 1

) (ϕk − πls

)))Hjls dIjs

∣∣∣∣] = 0.

Hence at least for subsequence (kn)n≥0, we have that

Page 152: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.2 The Kalman–Bucy Filter 153

limkn→∞

d∑l=1

m∑j=1

∫ t

0

πs((ϕiψ

k(ϕl − πls

)))Hjls dIjs =

d∑l=1

m∑j=1

∫ t

0

RilsHjls dIjs .

(6.32)By taking the limit in (6.29) along a convenient subsequence and using (6.30)–(6.32) we obtain (6.27).

We now derive the equation for the evolution of the covariance matrixR. Again we cannot apply the Kushner–Stratonovich equation directly to ϕijbut use first an intermediate step. We ‘cut off’ ϕij and use the functions(ψk)k>0 and take the limit as k tends to infinity. After doing that we obtainthe equation for πijt which is

dπijt =

((σtσ>t )ij +

d∑k=1

F ikt πkjt + F jkt πikt + f it

(xit + xjt

))dt

+d∑k=1

m∑l=1

(πt (ϕiϕjϕk)− πijt xkt

)H lkt dI lt. (6.33)

Observe that since πt is normal we have the following result on the thirdmoments of a multivariate normal distribution

πt (ϕiϕjϕk) = xitxjt xkt + xitR

jkt + xjtR

ikt + xktR

ijt .

It is clear thatdRijt = dπijt − d(xitx

jt ), (6.34)

where the first term is given by (6.33) and using Ito’s form of the product ruleto expand out the second term

d(xitxjt ) = xitdx

jt + xjtdx

it + d〈xi, xj〉t.

Therefore using (6.27) we can evaluate this as

d(xitx

jt

)=

d∑k=1

F ikt xkt x

jtdt+ F jkt xitx

kt dt+ f it

(xit + xjt

)dt+ xit(HtR

>t dIt)j

+ xjt (HtR>t dIt)i +

⟨(H>t RtdIt)

i, (HtR>t dIt)j

⟩. (6.35)

For evaluating the quadratic covariation term in this expression it is simplestto work componentwise using the Einstein summation convention and use thefact that by Proposition 2.30 the innovation process It is a P-Brownian motion⟨

(HtR>t dIt)i, (H>t RtdIt)

j⟩

=⟨Rilt H

klt dIkt , R

jmt Hnm

t dInt⟩

= Rilt Hklt R

jmt Hnm

t δkndt

= Rilt Hklt H

kmt Rjmt dt

= (RH>HR>)ijdt

= (RH>HR)ijdt, (6.36)

Page 153: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

154 6 Finite-Dimensional Filters

where the last equality follows since R> = R. Substituting (6.33), (6.35) and(6.36) into (6.34) yields the following equation for the evolution of the ijthelement of the covariance matrix

dRijt =((σtσ>t )ij + (FtRt)ij + (R>t F

>t )ij − (RtH>t HtRt)ij

)dt

+ (xitRjmt + xjtR

jmt )H lm

t dI lt − (xitRjmt + xjtR

jmt )H lm

t dI lt.

Thus we obtain the final differential equation for the evolution of the condi-tional covariance matrix (notice that all of the stochastic terms will cancelout). ut

6.2.2 The Explicit Formula for the Kalman–Bucy Filter

In the following we use the notation R1/2 to denote the square root of thesymmetric positive semi-definite matrix R; that is, the matrix R1/2 is the(unique) symmetric positive semi-definite matrix A such that A2 = R.

Theorem 6.15. The conditional distribution of Xt given the observation σ-algebra is given by the explicit formula

πt(ϕ) =1

(2π)n/2

∫Rdϕ(xt +R

1/2t ζ

)exp(−1

2‖ζ‖2

)dζ

for any ϕ ∈ B(Rd).

Proof. Immediate as πt is a normal distribution with mean xt and covariancematrix Rt. ut

We remark that, in this case too, π is finite-dimensional as it depends onlyon the (d+ d2)-process (x,R) (its mean and covariance matrix).

Corollary 6.16. The process ρt satisfying the Zakai equation (3.43) is givenby

ρt(ϕ) = Zt1

(2π)n/2

∫Rdϕ(xt +R

1/2t ζ

)exp(−1

2‖ζ‖2

)dζ,

where ϕ ∈ B(Rd)

and

Zt = exp(∫ t

0

(Hxt + h)> dYs −∫ t

0

‖Hxt + h‖2 ds).

Proof. Immediate from Theorem 6.15 and the fact that ρt(1) has the repre-sentation

ρt(1) = exp(∫ t

0

(Hxs + h)> dYs −∫ t

0

‖Hxs + h‖2 ds)

as proved in Exercise 3.37. ut

Page 154: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.3 Solutions to Exercises 155

6.3 Solutions to Exercises

6.1

i. Suppose that f(x) = ax + b; then P (x) = a + (ax + b)2σ−2 + (h1x + h2)2

which is a second-order polynomial with leading coefficient a2/σ2+h21 ≥ 0.

The Lipschitz condition on f is trivial.ii. In this case P (x) = α2 + (h1x + h2)2 which is a second order polynomial

with leading coefficient h21 ≥ 0. The case f(x) = aσ tanh(ax/σ) is obtained

by taking α = a and β = 1. The derivative f ′(x) is bounded by 1/(4β),thus the function f is Lipschitz and satisfied (6.2).

iii. Use the previous result with α = a, β = e2b.

6.2 Lemma 3.9 implies that it is sufficient to show that

E[∫ t

0

(f(Xs)2σ−2 + h2(Xs)

)ds]<∞.

From the Lipschitz condition (6.2) on f , the fact that σ is constant, and thatX0 = x0 is constant and thus trivially has bounded second moment, it followsfrom Exercise 3.11 that for 0 ≤ t ≤ T , EX2

t < GT < ∞. It also follows fromExercise 3.3 that f(X) has a linear growth bound f(x) ≤ κ(1+‖x‖), therefore

E[∫ t

0

(f(Xs)2

σ2+ h(Xs)2

)ds]≤ E

[∫ t

0

κ2

σ2(1 + |Xs|)2 + (h1Xs + h2)2 ds

]≤ 2

(h2

1 +κ2

σ2

)∫ t

0

E|Xs|2 ds+(h2

2 +κ2

σ2

)t

≤ 2(h2

1 +κ2

σ2

)tGT +

(h2

2 +κ2

σ2

)t <∞.

6.3 By Girsanov’s theorem under P, the process with components

X1t = Wt −

⟨W,−

∫ t

0

f(Xs)σ

dVs −∫ t

0

h(Xs) dWs

⟩= Wt +

∫ t

0

h(Xs) ds

and

X2t = Vt −

⟨V,−

∫ t

0

f(Xs)σ

dVs −∫ t

0

h(Xs) dWs

⟩= Vt +

∫ t

0

F (Xs)σ

ds

is a two-dimensional Brownian motion. Therefore the law of (X1t , X

2t ) =

(Tt, Vt) is bivariate normal, so to show the components are independent itis sufficient to consider the covariation

〈Vt, Yt〉 = 〈Vt,Wt〉 = 0, ∀t ∈ [0,∞),

from which we may conclude that Y is independent of V , and since Xt =X0 + σV , it follows that under P the processes Y and X are independent.

Page 155: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

156 6 Finite-Dimensional Filters

6.4 Follow the same argument as in the proof of Proposition 3.16.

6.5 Consider t as fixed; it is then sufficient to show that uniformly in n,∫ t

0

sinh(spσ)yns ds

=n−1∑k=0

Y(k+1)t/n − Ykt/nt/n

∫ (k+1)t/n

kt/n

sinh(spσ) ds

=n−1∑k=0

cosh((k + 1)pσt/n)− cosh(kpσt/n)pσt/n

∫ (k+1)t/n

kt/n

dYs

=∫ t

0

n−1∑k=0

cosh((k + 1)pσt/n)− cosh(kpσt/n)pσt/n

1(kt/n,(k+1)t/n](s) dYs.

Thus by Ito’s isometry, since Y is a Brownian motion under P, therefore it issufficient to show

E

[∫ t

0

(n−1∑k=0

cosh((k + 1)pσt/n)− cosh(kpσt/n)pσt/n

1(kt/n,(k+1)t/n](s)

− sinh(spσ))2

ds

]→ 0.

Using the mean value theorem, for each interval for k = 0, . . . , n − 1, thereexists ξ ∈ [kpσ/n, (k + 1)pσ/n] such that

sinh(ξpσ) =cosh((k + 1)pσ/n)− cosh(kpσ/n)

pσ/n

therefore since sinh(x) is monotonic increasing for x > 0,

E

[n−1∑k=0

∫ (k+1)t/n

kt/n

(cosh((k + 1)pσt/n)− cosh(kpσt/n)

pσt/n− sinh(spσ)

)2

ds

]

≤n−1∑k=0

t

n(sinh((k + 1)pσt/n)− sinh(kpσt/n))2

≤n−1∑k=0

t

ncosh2((k + 1)pσt/n)

(tpσ

n

)2

≤ t cosh2(tpσ)(tpσ)2

n2,

where we use the bound for a, x > 0,

sinh(a+ x)− sinh(a) ≤ sinh′(a+ x)x = cosh(a+ x)x.

Page 156: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.3 Solutions to Exercises 157

Thus this tends to zero as n→∞, which establishes the required convergence.For the uniform bound, it is sufficient to show that for fixed t,

E

[n−1∑k=0

∫ (k+1)t/n

kt/n

cosh((k + 1)pσt/n)− cosh(kpσt/n)pσt/n

dYs

]2

(6.37)

is uniformly bounded in n. We can then use the fact that E|Z| <√

EZ2, tosee that the modulus of the integral is bounded in the L1 norm and hence inprobability. The dependence on ω in this bound arises solely from the processY ; thus considered as a functional of Y , there is a uniform in n bound.

To complete the proof we establish a uniform in n bound on (6.37) usingthe Ito isometry

E

[n−1∑k=0

∫ (k+1)t/n

kt/n

cosh((k + 1)pσt/n)− cosh(kpσt/n)pσt/n

dYs

]2

≤n−1∑k=0

∫ (k+1)t/n

kt/n

(cosh((k + 1)pσt/n)− cosh(kpσt/n)

pσt/n

)2

ds

≤(n

pσt

)2

sinh2(tpσ)(pσt

n

)2

≤ sinh2(tpσ).

6.9 For the linear filter take F (x) = ax2/2 + bx; computing the normalisingconstant involves computing for B > 0,∫ ∞

−∞exp(−Bx2 +Ax) dx = eA

2/(4B)

∫ ∞−∞

exp

(−(√

Bx− A

2√B

)2)

dx

= eA2/(4B)

√π/√B. (6.38)

In the case of the linear filter the coefficients p =√a2/σ2 + h2

1, q = ab/σ2 +h1h2 and r = a + b2/σ2 + h2

2. Thus from the equation for the normalisingconstant (6.16),

At = b/σ2 + h1Ψt +q + p2x0

pσ sinh(tpσ)− q

pσcoth(tpσ),

where

Ψt =∫ t

0

sinh(spσ)sinh(tpσ)

dYs

and

Bt = − a

2σ2+p coth(tpσ)

2σ.

Page 157: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

158 6 Finite-Dimensional Filters

Since coth(x) > 1 for x > 0 and p ≥ a/σ, it follows that B > 0 as required.Using the result (6.38) we see that the normalised conditional distribution isgiven by (6.15),

πt(ϕ) =√Bt√π

∫ ∞−∞

ϕ(x)exp

(−1

2

(x−At/(2Bt)

1/√

2Bt

)2)

dx,

which corresponds to a Gaussian distribution with mean xt = At/(2Bt) andvariance Rt = 1/2Bt. Differentiating

dRtdt

=p2

4 sinh2(tpσ)B2t

thus with the aid of the identity coth2(x)−1 = 1/ sinh2(x), it is easy to checkthat

dRtdt

= σ2 + 2aRt −R2th

21

which is the one-dimensional form of the Kalman filter covariance equation(6.28).

In one dimension the Kalman filter equation for the conditional mean is

dxt = (axt + b)dt+Rth1dYt −Rth1(h1x+ h2)dt

thus to verify that the mean AtRt is a solution of this SDE we compute

d(AtRt) =AtR

2t p

2

sinh2(tpσ)dt+Rth1dYt −Rt coth(tpσ)pσ(At − b/σ2)− qRt

= Rth1dYt + (AtRt)(Rtp

2 coth2(tpσ)− pσ coth(tpσ)−Rtp2)

+pRtb

σcoth(tpσ)− ab

σ2Rt − h1h2Rt

= Rth1dYt − h1h2Rt + b

+ (AtRt)(Rtp

2 coth2(tpσ)− pσ coth(tpσ)−Rtp2)

= Rth1dYt − h1h2Rt + b−Rth21(RtAt)

+ (AtRt)Rt

(p2 coth2(tpσ)− pσ

Rtcoth(tpσ)− a2

σ2

)= Rth1dYt − h1h2Rt + b−Rh2

1(AtRt) + (AtRt)Rt

×(p2 coth2(tpσ)− pσ coth(tpσ)

(− a

σ2+p

σcoth(tpσ)

)− a2

σ2

)= Rth1dYt −Rth1(h1AtRt + h2) + (AtRt)a+ b.

Therefore the solution computed explicitly solves the SDEs for the one-dimensional Kalman filter.

Page 158: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.3 Solutions to Exercises 159

In the limit as t→∞, Bt → −a/σ2+p/(2σ) and At ' b/σ2+h1Ψt−q/(pσ)and thus the law of the conditional distribution asymptotically for large t isgiven by

N

(h1Ψtσ

2 + b− qσ/ppσ − a

,σ2

pσ − a

).

For the second Benes filter, from the solution to Exercise 6.1 p = h1,q = h1h2 and r = h2

2 + α2, so

Qt(x) =(h1Ψt +

h2 + h1x0

σ sinh(tpσ)− h2

σcoth(tpσ)

)x− h1

2σcoth(tpσ)x2.

In the general case we can take as antiderivative to f

F (x) =σ2

αlog(

e2αx/σ + 1/β)− σx.

However, there does not seem to be an easy way to evaluate this integral ingeneral, so consider the specific case where β = 1 and α = a,

F (x) = σ2log(cosh(ax/σ)) ;

thus from (6.16) the normalising constant is

ct =∫ ∞−∞

cosh(axσ

)× exp

((h1Ψt +

h2 + h1x0

σ sinh(tpσ)− h2

σcoth(tpσ)

)x− h1

2σcoth(tpσ)x2

)dx,

which can be evaluated using two applications of the result (6.38), with

Bt ,h1

2σcoth(tpσ),

andA±t , ± a

σ+ h1Ψt +

h2 + h1x0

σ sinh(tpσ)− h2

σcoth(tpσ).

Thus the normalising constant is given by

ct =√π

2√Bt

(e(A+

t )2/(4Bt) + e(A−t )2/(4Bt)).

Therefore the normalised conditional distribution is given by

Page 159: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

160 6 Finite-Dimensional Filters

πt(ϕ) =√Bt√π

1

e(A+t )2/(4Bt) + e(A−t )2/(4Bt)

×∫ ∞−∞

ϕ(x) exp(−Btx2)(exp(A+

t x) + exp(A−t x))

dx

=√Bt√π

1

e(A+t )2/(4Bt) + e(A−t )2/(4Bt)

×

[e−(A+)2/(4Bt)

∫ ∞−∞

ϕ(x) exp

(−1

2

(x−A+

t /(2Bt)1/√

2Bt

)2)

dx

+ e−(A−)2/(4Bt)

∫ ∞−∞

ϕ(x) exp

(−1

2

(x−A−t /(2Bt)

1/√

2Bt

)2)

dx

].

Thus the normalised conditional distribution is the weighted mixture of twonormal distributions, with weight

w± =exp(−(A±t )2/(4Bt))

exp((A+t )2/(4Bt)) + exp((A−t )2/(4Bt))

on a N(A±t /(2Bt), 1/(2Bt)) distributed random variable.In the limit as t→∞, Bt → h1/(2σ) and A±t ' ±a/σ + h1Ψt − h2/σ and

the asymptotic expressions for the weights become

w± = 2exp(±2a/(h1Ψt/σ − h2/σ

2))cosh(2a/(h1Ψt/σ − h2/σ2))

and the distributions N(±a/h1 + σΨt − h2/h1, σ/h1).

6.11

i. Setting

Ct , T0 +∫ t

0

Φ−1s ls ds+

∫ t

0

Φ−1s zs dUs, (6.39)

and At , ΦtCt, where Φt is given by (6.21), it follows by integration byparts that

dAt = dΦt

[T0 +

∫ t

0

Φ−1s ls dt+

∫ t

0

Φ−1s zs dUs

]+ Φt

[Φ−1t lt dt+ Φ−1

t zt dUt]

= LtAt + lt dt+ zt dUt

which is the SDE for Tt. As Φ0 = Id+m, it follows that A0 = T0. Thus Tthas the representation (6.20).

ii. In this part we continue to use the notation for the process Ct intro-duced above. It is clearly sufficient to show that (Tt1 , . . . , Ttn−1 , Tn) has amultivariate-normal distribution, since T = (X,Y ). Note that the process

Page 160: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.3 Solutions to Exercises 161

Φt is a deterministic matrix-valued process, thus if for fixed t, Ct has amultivariate normal distribution then so does ΦtCt.Since X0 has a multivariate normal distribution and Y0 = 0, T0 has amultivariate normal distribution. From the SDE (6.39) it follows thatCt1 , Ct2 − Ct1 , . . . , Ct − Ctn−1 are independent random variables, each ofwhich has a multivariate-normal distribution. The result now follows since

Tt1 = Φt1Ct1

Tt2 = Φt2(Ct1 + (Ct2 − Ct1))...

...Tt = Φt(Ct1 + · · ·+ (Ctn−2 − Ctn−1) + (Ct − Ctn−1)).

iii. It follows from (ii) and the fact that the image under a linear map of amultivariate-normal distribution is also multivariate-normal, that for anyn and fixed times 0 ≤ t1 ≤ · · · ≤ tn−1 ≤ t,(

Xt,

n−2∑i=0

Kti

(Yti+1 − Yti

)+Ktn−1

(Yt − Ytn−1

))has a multivariate-normal distribution. By the usual Ito isometry argu-ment as the mesh of the partition tends to zero, this term converges in L2

and thus in probability to (Xt,

∫ t

0

Ks dYs

).

By a standard result on weak convergence (e.g. Theorem 4.3 of [19]) thisconvergence in probability implies that the sequence converges weakly;consequently the characteristic functions must also converge. As each ele-ment of the sequence is multivariate normal it follows that the limit mustbe multivariate normal.

6.13

i. The first part follows using the SDE for X and Ito’s formula using thelocal boundedness of fs, Fs, σs. In the case p = 1 local boundedness of σsimplies that the stochastic integral is a martingale, thus using the notation

‖F‖[0,t] , sup0≤s≤t

maxi,j=1,...,d

|F ijs | <∞, ‖f‖[0,t] , sup0≤s≤t

maxi=1,...,d

|f is| <∞,

we can obtain the following bound

E‖Xt‖ ≤ E‖X0‖+ E[∥∥∥∥∫ t

0

FsXs + fs ds∥∥∥∥]

≤ x0 + td‖f‖[0,t] + d‖F‖[0,t]∫ t

0

‖Xs‖ ds.

Page 161: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

162 6 Finite-Dimensional Filters

Thus from Corollary A.40 to Gronwall’s lemma

E‖Xt‖ ≤(x0 + td‖fs‖[0,t]

)exp(td‖F‖[0,t]

).

Similarly for p = 2, use f(x) = x>x,

d‖Xt‖2 = 2X>s (FsXs + fs)ds+ 2X>s σdVs + tr(σ>σ)ds.

Let Tn be a reducing sequence for the stochastic integral, which is a localmartingale (see Exercise 3.10 for more details). Then

E‖Xt∧Tn‖2 = E‖X0‖2 + E

[∫ t∧Tn

0

2X>s (FsXs + fs) + tr(σ>σ) ds

]

≤ E‖X0‖2 + 2d2‖F‖[0,t]∫ t

0

E[‖Xs‖2] ds

+ dt‖f‖[0,t] sup0≤s≤t

E‖Xs‖+ td‖σ‖2[0,t].

Using the first moment bound, Gronwall’s inequality yields a bound inde-pendent of n, thus as n→∞ Fatou’s lemma implies that

sup0≤s≤t

E‖Xs‖2 <∞.

We can proceed by induction to the general case for the pth moment. ApplyIto’s formula to f(x) = xp/2 for p ≥ 3 to obtain the pth moment bound;thus

d‖Xt‖p = p‖X‖p−2(2X>t (FtXt + ft)dt+ tr(σ>σ) ds+ 2X>t σdVt

)+p(p− 1)

2‖Xt‖p−4(X>t σσ

>Xt)dt.

The stochastic integral is a local martingale and so a reducing sequence Tncan be found. The other terms involve moments of order p, p−1 and p−2,so the result follows as in the case above from the inductive hypotheses,Gronwall’s lemma followed by Fatou’s lemma and the fact that all momentsof the initial X0 are finite since it is normally distributed.

ii. For any s ∈ [0,∞),

E [(πs(|ϕi|))p] = E [(E [|ϕi(Xs)| | Ys])p] ≤ E [E [|ϕi(Xs)|p | Ys]]= E [|ϕi(Xs)|p] ,

where the inequality follows from the conditional form of Jensen’s inequal-ity. Therefore from part (i),

sups∈[0,t]

E [(πs(|ϕi|))p] ≤ sups∈[0,t]

E[|Xi

s|p]<∞.

Page 162: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6.3 Solutions to Exercises 163

For the product term

E [(πs(|ϕij |))p] = E[(

E[|Xi

sXjs | | Ys

])p] ≤ E[E[|Xi

s|p|Xjs |p | Ys

]]= E

[|Xi

s|p|Xjs |p]

≤√

E [|Xis|2p] E

[|Xj

s |2p]<∞.

Page 163: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7

The Density of the Conditional Distribution ofthe Signal

The question which we consider in this chapter is whether πt, the conditionaldistribution of Xt given the observation σ-algebra Yt, has a density with re-spect to a reference measure, in particular with respect to Lebesgue measure.We prove that, under fairly mild conditions, the unnormalised conditional dis-tribution ρt, which is the unique solution of the Zakai equation (3.43), has asquare integrable density with respect to Lebesgue measure. This automati-cally implies that πt has the same property. There are various approaches toanswer this question. The approach presented here is that adopted by Kurtzand Xiong in [174]. In the second part of the chapter we discuss the smoothnessproperties (i.e. the differentiability) of the density of ρ. Finally we show theexistence of the dual of the solution of the Zakai equation (see (7.30) below).The dual of ρ plays an important role in establishing the rates of convergenceof particle approximations to π and ρ which are discussed in more detail inChapter 9.

In the following, we take the signal X to be the solution of the stochasticdifferential equation (3.9); that is,X = (Xi)di=1 is the solution of the stochasticdifferential equation

dXt = f(Xt)dt+ σ(Xt) dVt, (7.1)

where f : Rd → Rd and σ : Rd → Rd×p are bounded and globally Lipschitz(i.e. they satisfy the conditions (3.10)) and V = (V j)pj=1 is a p-dimensionalBrownian motion. The observation process is the solution of the evolutionequation (3.5). That is, Y is an m-dimensional stochastic process which sat-isfies

dYt = h(Xt) dt+ dWt,

where h = (hi)mi=1 : Rd → Rm is a bounded measurable function and W is astandard m-dimensional Brownian motion which is independent of X.

In the following we make use of an embedding theorem which we statebelow. In order to state this theorem, we need a few notions related to Sobolevspaces. Further details on this topic can be found, for example, in Adams [1].

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20097,

Page 164: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

166 7 Density of the Conditional Distribution of the Signal

7.1 An Embedding Theorem

Let α = (α1, . . . , αd) ∈ Nd be an arbitrary multi-index. Given two functions fand g ∈ Lp(Rd), we say that ∂αf = g in the weak sense if for all ψ ∈ C∞0 (Rd)we have ∫

Rdf(x)∂αψ(x) dx = (−1)|α|

∫Rdg(x)ψ(x) dx. (7.2)

We immediately see that if the partial derivative of a function exists in theconventional sense and is continuous up to order |α|, integration by parts willyield (7.2). The converse is not true; to see this one can consider, for example,the function exp(i/|x|n). Let k be a non-negative integer. The Sobolev space,denoted W p

k (Rd), is the space of all functions f ∈ Lp(Rd) such that the partialderivatives ∂αf exist in the weak sense and are in Lp(Rd) whenever |α| ≤ k,where α is a multi-index. We endow W p

k (Rd) with the norm

‖f‖k,p =

∑|α|≤k

‖∂αf‖pp

1/p

, (7.3)

where ∂0f = f and the norms on the right are the usual norms in Lp(Rd).Then W p

k (Rn) is complete with respect to the norm defined by (7.3); henceit is a Banach space. In the following, we make use, without proof, of thefollowing Sobolev-type embedding theorem (for a proof see Adams [1], Saloff-Coste [252], or Stein [256]).

Theorem 7.1. If k > d/p then there exists a modification of f ∈W pk (Rd) on

a set of zero Lebesgue measure so that the resulting function is continuous.

In the following we work mostly with the space W 2k (Rd). This space is a

Hilbert space with the inner product

〈f, g〉Wp2 (Rd) =

∑|α|≤k

〈∂αf, ∂αg〉 ,

where 〈·, ·〉 is the usual inner product on L2(Rd)

〈f, g〉 =∫

Rdf(x)g(x) dx.

Exercise 7.2. Let ϕii>0 be an orthonormal basis of L2(Rd) with the prop-erty that ϕi ∈ Cb(Rd) for all i > 0. Let µ ∈M(Rd) be a finite measure. Showthat if

∞∑i=1

µ(ϕi)2 <∞,

then µ is absolutely continuous with respect to Lebesgue measure. Moreoverif gµ : Rd → R is the density of µ with respect to Lebesgue measure thengµ ∈ L2(Rd).

Page 165: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.1 An Embedding Theorem 167

The results from below make use of the regularisation method . Let ψ bethe kernel for the heat equation (∂tu = 1/2

∑di=1 ∂i∂iu), viz

ψε(x) , (2πε)−d/2 exp(−‖x‖2/2ε

),

and define the convolution operator

Tε : B(Rd)→ B(Rd)

Tεf(x) ,∫

Rdψε(x− y)f(y) dy, x ∈ Rd.

(7.4)

Also define the corresponding operator on the space of finite measures

Tε : M(Rd)→M(Rd)

Tεµ(f) , µ(Tεf)

=∫

Rd

∫Rdψε(x− y)f(y) dy µ(dx)

=∫

Rdf(y)Tεµ(y) dy,

where y 7→ Tεµ(y) is the density of the measure Tεµ with respect to Lebesguemeasure, which by the above exists even if µ is not absolutely continuous withrespect to Lebesgue measure; furthermore the density is given by

Tεµ(y) =∫

Rdψε(x− y)µ(dx), y ∈ Rd.

In the following, we use the same notation Tεµ for the regularized measureand its density.

Exercise 7.3. Let µ be a finite measure on Rd and |µ| ∈ M(Rd) be its totalvariation measure. Show that:

i. For any ε > 0 and g ∈ L2(Rd), ‖Tεg‖2 ≤ ‖g‖2.ii. For any ε > 0, Tεµ ∈W 2

k (Rd).iii. For any ε > 0, ‖T2εµ‖2 ≤ ||Tε|µ|||2.

Let µ be a finite signed measure on Rd; then for f ∈ Cb(Rd), denote by fµthe finite signed measure on Rd which is absolutely continuous with respectto µ and whose density with respect to µ is f .

Exercise 7.4. Let µ be a finite (signed) measure on Rd and |µ| ∈ M(Rd) beits total variation measure. Also let f ∈ Cb(Rd) be a Lipschitz continuous andbounded function. Denote by kf , supx∈Rd |f(x)| and let k′f be the Lipschitzconstant of f . Show that:

i. For any ε > 0, ‖Tεfµ‖2 ≤ kf ‖Tε|µ|‖2.ii. For any ε > 0 and i = 1, . . . , d, we have

∣∣⟨Tεµ, f∂iTεµ⟩∣∣ ≤ 12k′f ‖Tε|µ|‖

22.

iii. For any ε > 0 and i = 1, . . . , d, we have ‖f∂iTεµ − ∂iTεfµ‖2 ≤2d/2+2k′f‖T2ε|µ|‖2.

Page 166: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

168 7 Density of the Conditional Distribution of the Signal

7.2 The Existence of the Density of ρt

In this section we prove that the unnormalised conditional distribution ρt isabsolutely continuous with respect to Lebesgue measure and its density issquare integrable. We start with two technical lemmas.

We require a set of functions ϕii≥1, where ϕ ∈ C2b (Rd), such that these

functions form an orthonormal basis of the space L2(Rd). There are manymethods to construct such a basis. One of the most straightforward ones isto use wavelets (see, e.g. [224]). For any orthonormal basis of L2(Rd) andarbitrary f ∈ L2(Rd),

f =∞∑i=1

〈f, ϕi〉ϕi,

so

‖f‖22 =∞∑i=1

〈f, ϕi〉2‖ϕi‖22 =∞∑i=1

〈f, ϕi〉2.

The function ψε(x) decays to zero as ‖x‖ → ∞, therefore for ϕ ∈ C1b (Rd),

using the symmetry of ψε(x− y) and integration by parts

∂iTεϕ =∂

∂xi

∫Rdψε(x− y)ϕ(y) dy =

∫Rd

∂xiψε(x− y)ϕ(y) dy

= −∫

Rd

∂yiψε(x− y)ϕ(y) dy =

∫Rdψε(x− y)

∂ϕ(y)∂yi

dy

= Tε(∂iϕ).

Lemma 7.5. Let A be a generator of the form

Aϕ =d∑

i,j=1

aij∂2ϕ

∂xi∂xj+

d∑i=1

f i∂ϕ

∂xi, ϕ ∈ D(A) ⊂ Cb(Rd), (7.5)

where the matrix a is defined as in (3.12); that is, a = 12σσ

>. Let ϕii>0 beany orthonormal basis of L2(Rd) with the property that ϕi ∈ C2

b (Rd) for alli > 0. Then

∞∑k=1

ρs(A (Tεϕk))2 ≤ dd∑i=1

∥∥∂iTε(f iρs)∥∥2

2+ d2

d∑i,j=1

∥∥∂i∂jTε(aijρs)∥∥2

2. (7.6)

In particular, if

kf = maxi=1,...,d

supx∈Rd

|f i(x)| <∞

ka = maxi,j=1,...,d

supx∈Rd

|aij(x)| <∞,

then there exists a constant k = k(f, a, ε, d) such that

Page 167: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.2 The Existence of the Density of ρt 169

∞∑k=1

ρs(A (Tεϕk))2 ≤ k ‖Tερs‖22 .

Proof. For any i ≥ 0, for ϕ ∈ C2b (Rd), integration by parts yields

ρs(f i∂iTεϕ) = ρs(f iTε∂iϕ) = (f iρs)(Tε∂iϕ) = 〈∂iϕ, Tε(f iρs)〉= −〈ϕ, ∂iTε(f iρs)〉 (7.7)

and

ρs(aij∂i∂jTεϕ) = ρs(aijTε∂i∂jϕ) = (aijρs)(Tε∂i∂jϕ) = 〈∂i∂jϕ, Tε(aijρs)〉= 〈ϕ, ∂i∂jTε(aijρs)〉. (7.8)

Thus using (7.7) and (7.8),

ρs (A (Tεϕk)) = −d∑i=1

⟨ϕk, ∂

iTε(f iρs)⟩

+d∑

i,j=1

⟨ϕk, ∂

i∂jTε(aijρs)⟩, (7.9)

from which inequality (7.6) follows. Then

∣∣∂iTε(f iρs)(x)∣∣ ≤ ∣∣∣∣∫

Rd

|xi − yi|ε

ψε(x− y)(f iρs)(dy)∣∣∣∣

≤ 2d/2kf∫

Rd

|xi − yi|ε

exp(−‖x− y‖

2

)ψ2ε(x− y)ρs(dy)

≤ 2d/2kf√ε

T2ερs(x),

where the last inequality follows as supt≥0 t exp(−t2/4) =√

2/e < 1. For thesecond term in (7.9) we can construct a similar bound

∣∣∂i∂jTε(aijρs)(x)∣∣ ≤ ∣∣∣∣∫

Rd

((xi − yi)(xj − yj)

ε2− 1i=j

ε

)ψε(x− y)(aijρs)(dy)

∣∣∣∣≤ 2d/2ka

∫Rd

(‖x− y‖2

ε2+

)× exp

(−‖x− y‖

2

)ψ2ε(x− y)ρs(dy)

≤ 2d/2ka(2 + 1/ε)T2ερs(x),

where we used the fact that supt≥0 te−t/4 = 4/e < 2. The lemma then followsusing part iii. of Exercise 7.3. ut

Lemma 7.6. Let k′σ be the Lipschitz constant of the function σ, where a =12σσ

>. Then we have

Page 168: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

170 7 Density of the Conditional Distribution of the Signal

d∑i,j=1

⟨Tερs, ∂

i∂jTε(aijρs)⟩

+12

p∑k=1

∥∥∥∥∥d∑i=1

∂iTε(σikρs)

∥∥∥∥∥2

2

≤ 2d/2+3d2p(k′σ)2 ‖Tερs‖22 . (7.10)

Proof. First let us note that⟨Tερs, ∂

i∂jTε(aijρs)⟩

=∫

Rd

∫Rdψε(x− y)ρs(dy)

∫Rd

∂2

∂xi∂xjψε(x− z)aij(z)ρs(dz) dx

=∫

Rd

∫RdΘ(y, z)aij(z)ρs(dy)ρs(dz)

=∫

Rd

∫RdΘ(y, z)

aij(z) + aij(y)2

ρs(dy)ρs(dz), (7.11)

where the last equality follows from the symmetry in z and y, and where

Θ(y, z) ,∫

Rdψε(x− y)

∂2

∂xi∂xjψε(x− z) dx

=∂2

∂zi∂zj

∫Rdψε(x− z)ψε(x− y) dx

=∂2

∂zi∂zjψ2ε(z − y)

=(

(zi − yi)(zj − yj)4ε2

−1i=j

)ψ2ε(z − y).

Then by integration by parts and the previous calculation we get that⟨∂iTε(σikρs), ∂jTε(σjkρs)

⟩= −

⟨Tε(σikρs), ∂i∂jTε(σjkρs)

⟩= −

∫Rd

∫RdΘ(y, z)

σik(y)σjk(z) + σik(z)σjk(y)2

ρs(dy)ρs(dz).

(7.12)

Combining (7.11) and (7.12) summing over all the indices, and using the factthat a = σσ>, the left-hand side of (7.10) is equal to

12

∫Rd

∫RdΘ(y, z)

p∑k=1

d∑i,j=1

(σik(y)− σik(z)

) (σjk(y)− σjk(z)

)ρs(dy)ρs(dz)

and hence using the Lipschitz property of σ,

Page 169: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.2 The Existence of the Density of ρt 171

d∑i,j=1

⟨Tερs, ∂

i∂jTε(aijρs)⟩

+p∑k=1

∥∥∥∥∥d∑i=1

∂iTε(σikρs)

∥∥∥∥∥2

2

≤ d2p

2(k′σ)2

∫Rd

∫Rd‖y − z‖2Θ(y, z)ρs(dy)ρs(dz).

It then follows that

‖y − z‖2|Θ(y, z)| ≤ 2d/2‖y − z‖2ψ4ε(z − y)

×(‖z − y‖2

4ε2+

12ε

)exp(−‖z − y‖

2

)≤ 2d/2+5ψ4ε(z − y),

where the final inequality follows by setting x = ‖y−z‖2/(2ε) in the inequality

supx≥0

(x2 + x)exp(−x/4) < 25.

Hence the left-hand side of (7.10) is bounded by

2d/2+3d2p(k′σ)2 ‖T2ερs‖22 ≤ 2d/2+3d2p(k′σ)2 ‖Tερs‖22 ,

the final inequality being a consequence of Exercise 7.3, part (iii). ut

Proposition 7.7. If the function h is uniformly bounded, then there exists aconstant c depending only on the functions f, σ and h and such that for anyε > 0 and t ≥ 0 we have

E[‖Tερt‖22

]≤ ‖Tεπ0‖22 + c

∫ t

0

E[‖Tερs‖22

]ds.

Proof. For any t ≥ 0 and ϕi an element of an orthonormal basis of L2(Rd)chosen so that ϕi ∈ Cb(Rd) we have from the Zakai equation using the factthat ρt(Tεϕi) = Tερt(ϕi),

Tερt(ϕi) = Tεπ0(ϕi) +∫ t

0

ρs(A (Tεϕi)) ds+m∑j=1

∫ t

0

ρs(hjTεϕi) dY js

and by Ito’s formula

(Tερt(ϕi))2 = (Tεπ0(ϕi))

2 + 2∫ t

0

Tερs(ϕi)ρs (A (Tεϕi)) ds

+ 2m∑j=1

∫ t

0

Tερs(ϕi)ρs(hjTεϕi) dY js

+m∑j=1

∫ t

0

(ρs(hjTεϕi)

)2ds.

Page 170: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

172 7 Density of the Conditional Distribution of the Signal

The stochastic integral term in the above identity is a martingale, hence itsexpectation is 0. By taking expectation and using Fatou’s lemma we get that

E[‖Tερt‖22

]≤ lim inf

n→∞E

[n∑i=1

(Tερt(ϕi))2

]≤ ‖Tεπ0‖22

+ lim infn→∞

n∑i=1

E

[∫ t

0

(2Tερs(ϕi)ρs (A (Tεϕi))

+m∑j=1

(ρs(hjTεϕi)

)2)ds

]. (7.13)

By applying the inequality |ab| ≤ (a2 + b2)/2,

n∑i=1

E[∫ t

0

|Tερs(ϕi)ρs(A (Tεϕi))| ds]

≤ 12

E

[∫ t

0

n∑i=1

(Tερs(ϕi))2 ds

]+

12

E

[∫ t

0

n∑i=1

(ρs(A(Tεϕi)))2 ds

].

Thus using the bound of Lemma 7.5, it follows that uniformly in n ≥ 0,

n∑i=1

E[∫ t

0

|Tερs(ϕi)ρs(A (Tεϕi))| ds]≤ 1 + k

2

∫ t

0

E[‖Tερs‖22

]ds.

For the second part of the last term on the right-hand side of (7.13) for anyn ≥ 0,

n∑i=1

E

m∑j=1

∫ t

0

(ρs(hjTεϕi)

)2ds

≤ mk2h

∫ t

0

E[‖Tερs‖22

]ds,

wherekh , max

j=1,...,msupx∈Rd

|hj(x)|.

As a consequence, there exists a constant k = k(f, a, h, ε, d,m) such that

E[‖Tερt‖22

]≤ ‖Tεπ0‖22 + k

∫ t

0

E[‖Tερs‖22

]ds;

hence by Corollary A.40 to Gronwall’s lemma

E[‖Tερt‖22

]≤ ‖Tεπ0‖22ekt,

thus

Page 171: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.2 The Existence of the Density of ρt 173∫ t

0

E[‖Tερs‖22

]ds ≤ ‖Tεπ0‖22

kekt <∞,

where we used Exercise 7.3 part (ii) to see that ‖Tεπ0‖22 < ∞. Thus as aconsequence of the dominated convergence theorem in (7.13) the limit canbe exchanged with the integral and expectation (which is a double integral).From (7.9), using 〈f, g〉 =

∑∞i=1〈f, ϕi〉〈g, ϕi〉, we then get that

E[‖Tερt‖22

]≤ ‖Tεπ0‖22 + 2

d∑i=1

∫ t

0

E[⟨Tερs, ∂

iTεfiρs⟩]

ds

+d∑

i,j=1

∫ t

0

E[⟨Tερs, ∂

i∂jTεaijρs

⟩]ds

+m∑j=1

∫ t

0

E[∥∥Tεhjρs∥∥2

2

]ds. (7.14)

From Exercise 7.4 parts (ii) and (iii), we obtain∣∣⟨Tερs, ∂iTεf iρs⟩∣∣ ≤ ∣∣⟨Tερs, f i∂iTερs⟩∣∣+∣∣⟨Tερs, ∂iTε(f iρs)− f i∂iTερs⟩∣∣

≤ 12k′f ‖Tερs‖

22 + 2d/2+2k′f ‖Tερs‖2 ‖T2ερs‖2 . (7.15)

Since the function h is uniformly bounded, it follows that∥∥Tε(hjρt)∥∥2

2≤ k2

h ‖Tερt‖22 , j = 1, . . . ,m. (7.16)

The proposition follows now by bounding the terms on the right-hand sideof (7.14) using (7.10) for the third term, (7.15) for the second term and (7.16)for the fourth term. ut

Theorem 7.8. If π0 is absolutely continuous with respect to Lebesgue measurewith a density which is in L2(Rd) and the sensor function h is uniformlybounded, then almost surely ρt has a density with respect to Lebesgue measureand this density is square integrable.

Proof. In view of Exercise 7.2, it is sufficient to show that

E

[ ∞∑i=1

ρt(ϕi)2

]<∞,

where ϕii>0 is an orthonormal basis of L2(Rd) with the property that ϕi ∈Cb(Rd) for all i > 0. From Proposition 7.7, Corollary A.40 to Gronwall’slemma and Exercise 7.3 part (iii) we get that,

supε>0

E[‖Tερt‖22

]≤ ect ‖π0‖22 . (7.17)

Page 172: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

174 7 Density of the Conditional Distribution of the Signal

Hence, by Fatou’s lemma

E

[ ∞∑i=1

(ρt(ϕi))2

]= E

[limε→0

∞∑i=1

(Tερt(ϕi))2

]≤ lim inf

ε→0E[‖Tερt‖22

]≤ ect‖π0‖22 <∞,

hence the result. ut

Corollary 7.9. If π0 is absolutely continuous with respect to Lebesgue mea-sure with a density which is in L2(Rd) and the sensor function h is uniformlybounded, then almost surely πt has a density with respect to Lebesgue measureand this density is square integrable.

Proof. Immediate from Theorem 7.8 and the fact that πt is the normalisedversion of ρt. ut

7.3 The Smoothness of the Density of ρt

So far we have proved that ρt has a density in L2(Rd). The above proof hasthe advantage that the conditions on the coefficients are fairly minimal. Inparticular, the diffusion matrix a is not required to be strictly positive. From(7.17) we get that

supε>0

E [‖Tερt‖2] <∞.

Since, for example, the sequence (‖T2−nρt‖2)n>0

is non-decreasing (see part(iii) of Exercise 7.3), by Fatou’s lemma, this implies that

supn>0‖T2−nρt‖2 <∞.

This implies that T2−nρt belongs to a finite ball in L2(Rd). But L2(Rd) and ingeneral any Sobolev space W p

k (Rd) with p ∈ (1,∞) has the property that itsballs are weakly sequentially compact (as Banach spaces, they are reflexive;see, for instance, Adams [1]). In particular, this implies that the sequenceT2−nρt has a weakly convergent subsequence. So ρt, the (weak) limit of theconvergent subsequence of T2−nρt must be in L2(Rd) almost surely. Similarly,if we can prove the stronger result

supε>0

E[‖Tερt‖Wp

k (Rd)

]<∞, (7.18)

then, by the same argument, we can get that the density of ρt belongs toW pk (Rd). Moreover by Theorem 7.1, if k > d/p then the density of ρt is

continuous (more precisely it has a continuous modification with which we can

Page 173: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.3 The Smoothness of the Density of ρt 175

identify it) and bounded. Furthermore, if k > d/p+n, not just the density of ρtbut also all of its partial derivatives up to order n are continuous and bounded.To obtain (7.18) we require additional smoothness conditions imposed on thecoefficients f, σ and h and we also need π0 to have a density that belongs toW pk (Rd). We need to analyse the evolution equation not just of Tερt but also

that of all of its partial derivatives up to the required order k. Unfortunately,the analysis becomes too involved to be covered here. The following exerciseshould provide a taster of what would be involved if we were to take thisroute.

Exercise 7.10. Consider the case where d = m = 1 and let zεt , t ≥ 0be the measure-valued process (signed measures) whose density is the spatialderivative of Tερt. Show that

E[‖zεt ‖

22

]≤∥∥(Tεπ0)′

∥∥2

2− 2

∫ t

0

E[⟨zεs , (Tεfρs)

′′⟩] ds

−∫ t

0

E[⟨zεs , (Tεaρs)

′′′⟩] ds+∫ t

0

E[∥∥(Tεhρs)

′∥∥2

2

]ds.

A much cleaner approach, but just as lengthy, is to recast the Zakai equa-tion in its strong form. Heuristically, if the unconditional distribution of thesignal ρt has a density pt with respect to Lebesgue measure for all t ≥ 0 andpt is ‘sufficiently nice’ then from (3.43) we get that

ρt(ϕ) =∫

Rdϕ(x)pt(x) dx

=∫

Rdϕ(x)

(p0(x) +

∫ t

0

A∗ps(x) ds+∫ t

0

h>(x)ps(x) dYs

)dx. (7.19)

In (7.19), ϕ is a bounded function of compact support with bounded first andsecond derivatives and A∗ is the adjoint of the operator A, where

Aϕ =d∑

i,j=1

aij∂2ϕ

∂xi∂xj+

d∑i=1

f i∂ϕ

∂xi

A∗ϕ =d∑

i,j=1

∂2

∂xi∂xj(aijϕ)−

d∑i=1

∂xi(f iϕ)

and for suitably chosen functions ψ,ϕ (e.g. ψ,ϕ ∈W 22 (Rd)),†

〈A∗ψ,ϕ〉 = 〈ψ,Aϕ〉.

It follows that it is natural to look for a solution of the stochastic partialdifferential equation† We also need f to be differentiable and a to be twice differentiable.

Page 174: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

176 7 Density of the Conditional Distribution of the Signal

pt(x) = p0(x) +∫ t

0

A∗ps(x) ds+∫ t

0

h>(x)ps(x) dYs, (7.20)

in a suitably chosen function space. It turns out that a suitable function spacewithin which we can study (7.20) is the Hilbert space W 2

k (Rd). A multitude ofdifficulties arise when studying (7.20): the stochastic integral in (7.20) needs tobe redefined as a Hilbert space operator, the operator A∗ has to be rewrittenin its divergence form and the solution of (7.20) needs further explanations interms of measurability, continuity and so on. A complete analysis of (7.20) iscontained in Rozovskii [250]. The following two results are immediate corol-laries of Theorem 1, page 155 and, respectively, Corollary 1, page 156 in [250](see also Section 6.2, page 229). We need to assume the following.

C1. The matrix-valued function a is uniformly strictly elliptic. That is, thereexists a constant c such that ξ>aξ ≥ c‖ξ‖2 for any x, ξ ∈ Rd such thatξ 6= 0.

C2. For all i, j = 1, . . . , d, aij ∈ Ck+2b (Rd), fi ∈ Ck+1

b (Rd) and for all i =1, . . . ,m, we have hi ∈ Ck+1

b (Rd).C3. p0 ∈W r

k (Rd), r ≥ 2.

Theorem 7.11. Under the assumptions C1–C3 there exists a unique Yt-adapted process p = pt, t ≥ 0, such that pt ∈ W 2

k (Rd) and p is a solutionof the stochastic PDE (7.20). Moreover there exists a constant c = c(k, r, t)such that

E[

sup0≤s≤t

‖ps‖r′

W rk (Rd)

]≤ c‖p0‖r

W rk (Rd), (7.21)

where r′ can be chosen to be either 2 or r.

Theorem 7.12. Under the assumptions C1–C3, if n ∈ N is given and (k −n)r > d, then p = pt, t ≥ 0; the solution of (7.20) has a unique modificationwith the following properties.

1. For every x ∈ Rd, pt(x) is a real-valued Yt-adapted process.2. Almost surely, (t, x) → pt(x) is jointly continuous over [0,∞) × Rd and

is continuously differentiable up to order n in the space variable. Both ptand its partial derivatives are continuous bounded functions.

3. There exists a constant c = c(k, n, r, t) such that

E

[sups∈[0,t]

‖ps‖rn,∞

]≤ c‖p0‖rW r

k (Rd). (7.22)

Remark 7.13. The inequality (7.21) implies that, almost surely, pt belongsto the subspace W r

k (Rd) or W 2k (Rd). However, the definition of the solution

of (7.20) requires the Hilbert space structure of W 2k (Rd) which is why the

conclusion of Theorem 7.11 is that p is a W 2k (Rd)-valued process.

Page 175: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.3 The Smoothness of the Density of ρt 177

Let now ρt be the measure which is absolutely continuous with respectto Lebesgue measure with density pt. For the following exercise, use the factthat the stochastic integral appearing on the right-hand side of the stochas-tic partial differential equation (7.20) is defined as the unique L2(Rd)-valuedstochastic process M = Mt, t ≥ 0 satisfying

〈Mt, ϕ〉 =∫ t

0

〈psh>, ϕ〉dYs, t ≥ 0 (7.23)

for any ϕ ∈ L2(Rd) (see Chapter 2 in Rozovskii [250] for details).

Exercise 7.14. Show that ρ = ρt, t ≥ 0 satisfies the Zakai equation (3.43);that is for any test function ϕ ∈ C2

k(Rd),

ρt(ϕ) = π0(ϕ) +∫ t

0

ρs(Aϕ) ds+∫ t

0

ρs(ϕh>) dYs. (7.24)

Even though we proved that ρ satisfies the Zakai equation we cannot con-clude that it must be equal to ρ based on the uniqueness theorems provedin Chapter 4. This is because the measure-valued process ρ does not a prioribelong to the class of processes within which we proved uniqueness for thesolution of the Zakai equation. In particular, we do not know if ρ has finitemass (i.e. ρ(1) may be infinite), so the required inequalities (4.4), or (4.37)may not be satisfied. Instead we use the same approach as that adopted inSection 4.1.

Exercise 7.15. Let εt ∈ St where St is the set defined in Corollary B.40; thatis,

εt = exp(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds),

where r ∈ Cpb ([0, t],Rm). Then show that

E[εtρt(ϕt)] = π0(ϕ0) + E[∫ t

0

εsρs

(∂ϕs∂s

+Aϕs + iϕsh>rs

)ds], (7.25)

for any ϕ ∈ C1,2b ([0, t]× Rd), such that for any t ≥ 0, ϕ ∈W 2

2 (Rd) and

sups∈[0,t]

‖ϕs‖W 22 (Rd) <∞. (7.26)

Proposition 7.16. Under the assumptions C1–C3, for any ψ ∈ C∞k (Rd) wehave, almost surely,

ρt(ψ) = ρt(ψ), P-a.s.

Proof. Since all coefficients are now bounded and a is not degenerate thereexists a (unique) function ϕ ∈ C1,2

b ([0, t]×Rd) which solves the parabolic PDE(4.14); that is,

Page 176: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

178 7 Density of the Conditional Distribution of the Signal

∂ϕs∂s

+Aϕs + iϕsh>rs = 0, s ∈ [0, t]

with final condition ϕt = ψ. The compact support of ψ ensures that (7.26) isalso satisfied. From (7.25) we obtain that

E[εtρt(ψ)] = π0(ϕ0).

As the same identity holds for ρt(ψ) the conclusion follows since the set St istotal. ut

Theorem 7.17. Under the assumptions C1–C3, the unnormalised conditionaldistribution of the signal has a density with respect to Lebesgue measure andits density is the process p = pt, t ≥ 0 which is the unique solution of thestochastic PDE (7.20).

Proof. Similar to Exercise 4.1, choose (ϕi)i≥0 to be a sequence of C∞k (Rd)functions dense in the set of all continuous functions with compact support.Then choose a common null set for all the elements of the sequence outsidewhich ρt(ϕi) = ρt(ϕi) for all i ≥ 0 and by a standard approximation argumentone shows that outside this null set

ρt(A) = ρt(A)

for any ball A = B(x, r) for arbitrary x ∈ Rd and r > 0, hence the twomeasures must coincide. ut

The following corollary identifies the density of the conditional distribution ofthe signal (its existence follows from Corollary 7.9). Denote the density of πtby πt ∈ L2(Rd).

Corollary 7.18. Under the assumptions C1–C3, the conditional distributionof the signal has a density with respect to Lebesgue measure and its density isthe normalised version of process p = pt, t ≥ 0 which is the solution of thestochastic PDE (7.20). In particular, πt ∈W 2

k (Rd) and there exists a constantc = c(k, r, t) such that

E[

sup0≤s≤t

‖πs‖r′

W rk (Rd)

]≤ c‖p0‖r

W rk (Rd), (7.27)

where r′ can be chosen to be either 1 or r/2.

Proof. The first part of the corollary is immediate from Theorem 7.11 andTheorem 7.17. Inequality (7.27) follows from (7.21) and the Cauchy–Schwarzinequality

E[

sup0≤s≤t

‖πs‖r′

W rk (Rd)

]≤

√E[

sup0≤s≤t

ρ−2r′s (1)

] [sup

0≤s≤t‖ps‖2r

W rk (Rd)

].

Exercise 9.16 establishes the finiteness of the term E[sup0≤s≤t ρ−2r′

s (1)].

Page 177: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.3 The Smoothness of the Density of ρt 179

Additional smoothness properties of π follow in a similar manner from Theo-rem 7.12. Following the Kushner–Stratonovich equation (see Theorem 3.30),the density of π satisfies the following non-linear stochastic PDE

πt(x) = π0(x) +∫ t

0

A∗πs(x) ds+∫ t

0

πs(x)(h>(x)− πs(h>) (dYs − πs(h) ds).

(7.28)It is possible to recast the SPDE for the density p into a form in which

there are no stochastic integral terms. This form can be analysed; for example,Baras et al. [7] treat the one-dimensional case in this way, establishing theexistence of a fundamental solution to this form of the Zakai equation. Theythen use this fundamental solution to prove existence and uniqueness resultsfor the solution to the Zakai equation without requiring bounds on the sensorfunction h.

Theorem 7.19. If we write

Rt , exp(−Y >t h(x) +

12‖h(x)‖2t

)(7.29)

and define pt(x) , Rt(x)pt(x) then this satisfies the following partial differ-ential equation with stochastic coefficients

dpt = RtA∗(R−1

t pt) dt

with initial condition p0(x) = p0(x).

Proof. Clearly

dRt = Rt

(−h>(x)dYt +

12‖h(x)‖2 dt+

12‖h(x)‖2 d〈Y 〉t

)= Rt

(−h>(x)dYt + ‖h(x)‖2 dt

).

Therefore using (7.20) for dpt it follows by Ito’s formula that

dpt(x) = d(Rt(x)pt(x))

= RtA∗pt(x)dt+Rt(x)h>(x)pt(x) dYt

+ pt(x)Rt(x)(−h>(x) dYt + ‖h(x)‖2dt)− pt(x)Rt‖h(x)‖2 dt= RtA

∗pt(x) dt

= RtA∗(Rt(x)−1pt(x)) dt.

The initial condition result follows from the fact that R0(x) = 1 . ut

Page 178: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

180 7 Density of the Conditional Distribution of the Signal

7.4 The Dual of ρt

A result similar to Theorem 7.12 justifies the existence of a function dual forthe unnormalised conditional distribution of the signal. Theorem 7.20 statedbelow is an immediate corollary of Theorem 7.12 using a straightforward time-reversal argument. Choose a fixed time horizon t > 0 and let Yt = Yts, s ∈[0, t], be the backward filtration

Yts = σ(Yt − Yr, r ∈ [s, t]).

Theorem 7.20. Let m > 2 be an integer such that (m−2)p > d. Then underthe assumptions C1 – C2, for any bounded ϕ ∈Wm

p (Rd) there exists a uniquefunction-valued process ψt,ϕ = ψt,ϕs , s ∈ [0, t]:

1. For every x ∈ Rd, ψt,ϕs (x) is a real-valued process measurable with respectto the backward filtration Yts.

2. Almost surely, ψt,ϕs (x) is jointly continuous over (s, x) ∈ [0,∞) × Rd andis twice differentiable in the spatial variable x. Both ψt,ϕs and its partialderivatives are continuous bounded functions.

3. ψt,ϕ is a (classical) solution of the following backward stochastic partialdifferential equation,

ψt,ϕs (x) = ϕ(x)−∫ t

s

Aψt,ϕp (x) dp

−∫ t

s

ψt,ϕp (x)h>(x) dYp, 0 ≤ s ≤ t, (7.30)

where∫ tsψt,ϕp h>dY kp is a backward Ito integral.

4. There exists a constant c = c(m, p) independent of ϕ such that

E

[sups∈[0,t]

∥∥ψt,ϕs ∥∥p2,∞

]≤ cm,p1 ‖ϕ‖pm,p. (7.31)

Exercise 7.21. If ϕ ∈ Wmp (Rd) as above, prove that for 0 ≤ r ≤ s ≤ t we

haveψs,ψt,ϕsr = ψt,ϕr .

Theorem 7.22. The process ψt,ϕ = ψt,ϕs , s ∈ [0, t] is the dual of the solu-tion of the Zakai equation. That is, for any ϕ ∈Wm

p (Rd)∩B(Rd), the process

s 7→ ρs(ψt,ϕs

), s ∈ [0, t]

is almost surely constant.

Proof. Let εt ∈ St where St is the set defined in Corollary B.40; that is,

εt = exp(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds),

Page 179: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.4 The Dual of ρt 181

where r ∈ Cmb ([0, t],Rm). Then for any ϕ ∈ C1,2b ([0, t] × Rd), the identity

(4.13) gives

E [εtρt(ϕt)] = E [εrρr(ϕr)]

+ E[∫ t

r

εsρs

(∂ϕs∂s

+Aϕs + iϕsh>rs

)ds]. (7.32)

Let

εs = exp(i

∫ t

s

r>u dYu +12

∫ t

s

‖ru‖2 du)

;

then for s ∈ [0, t], it is immediate that

E[ψt,ϕs εt | Ys

]= εsE

[ψt,ϕs εs | Ys

].

Since ψt,ϕs and εs are both Yts-measurable, it follows that they are independentof Ys; thus defining Ξ = Ξs, s ∈ [0, t] to be given by Ξs = E[ψt,ϕs εs], itfollows that

E[ψt,ϕs εt | Ys

]= εsΞs.

Since ε = εs, s ∈ [0, t] is a solution of the backward stochastic differentialequation:

εs = 1− i∫ t

s

εur>u dYu, 0 ≤ s ≤ t.

It follows by stochastic integration by parts using the SDE (7.30) that

d(ψt,ϕp εp) = −iψt,ϕp εpr>p dYp + εpAψ

t,ϕp dp+ εpψ

t,ϕp h> dYp + iεph

>rpψt,ϕp dp

and taking expectation and using the fact that ψt,ϕt = ϕ, and εt = 1,

Ξs = ϕ− E[∫ t

s

εpAψt,ϕp dp

]− iE

[∫ t

s

h>rpψt,ϕp dp

], 0 ≤ s ≤ t;

using the boundedness properties of ψ,a,f ,h and r we see that

E[∫ t

s

εpAψt,ϕp dp

]=∫ t

s

AΞp dp,

E[∫ t

s

εph>rpψ

t,ϕp dp

]=∫ t

s

h>rpΞp dp,

hence

Ξs = ϕ−∫ t

s

AΞp dp− i∫ t

s

h>rpΞp dp, 0 ≤ s ≤ t;

in other words Ξ = Ξs, s ∈ [0, t] is the unique solution of the the parabolicPDE (4.14), therefore Ξ ∈ C1,2

b ([0, t] × Rd). Hence from (7.32), for arbitraryr ∈ [0, t]

Page 180: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

182 7 Density of the Conditional Distribution of the Signal

E[εtρt(ϕ)] = E[εtρt(Ξt)] = E[ρr(εrΞr)] = E[εrΞr]

= E[εrE

[ψt,ϕr εr | Yr

]]= E

[E[εr εrψ

t,ϕr | Yr

]]= E

[εtψ

t,ϕr

]= E

[εtE

[ψt,ϕr | Yr

]]= E

[εtρr(ψt,ϕr )

],

where the penultimate equality uses the fact that ψt,ϕr is Ytr-adapted andhence independent of Yr. The conclusion of the theorem then follows sincethis holds for any εt ∈ St and the set St is total, thus ρr(ψt,ϕr ) = ρt(ϕ) P-a.s.,and as t is fixed this implies that ρr(ψt,ϕr ) is a.s. constant. ut

Remark 7.23. Theorem 7.22 with r = 0 implies that

ρt(ϕ) = π0

(ψt,ϕ0

), P-a.s.,

hence the solution of the Zakai equation is unique (up to indistinguishability).

We can represent ψt,ϕ by using the following version of the Feynman–Kacformula (see Pardoux [238])

ψt,ϕs (x) = E[ϕ (Xt(x)) ats(X(x), Y ) | Y

], s ∈ [0, t], (7.33)

where

ats(X(x), Y ) = exp(∫ t

s

h>(Xs(x)) dYs −12

∫ t

s

‖h(Xs(x))‖2 ds), (7.34)

and Xt(x) follows the law of the signal starting from x, viz

Xt = x+∫ t

s

f(Xs) ds+∫ t

s

σ(Xs) dVs +∫ t

s

σ(Xs) dWs. (7.35)

The same formula appears in Rozovskii [250] (formula (0.3), page 176) underthe name of the averaging over the characteristics (AOC) formula. Using (7.33)we can prove that if ϕ is a non-negative function, then so is ψt,ϕs for anys ∈ [0, t] (see also Corollary 5, page 192 of Rozovskii [250]). We can also use(7.33) to define the dual ψt,ϕ of ρ for ϕ in a larger class than Wm

p (Rd), forexample, for B(Rd). For these classes of ϕ, Rozovskii’s result no longer applies:the dual may not be differentiable and may not satisfy an inequality similarto (7.31). However, if ϕ has higher derivatives, one can use Kunita’s theoryof stochastic flows (see Kunita [164]) to prove that ψt,ϕ is differentiable.

7.5 Solutions to Exercises

7.2 Let gµ : Rd → R be defined as

gµ =∞∑i=1

µ(ϕi)ϕi.

Page 181: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.5 Solutions to Exercises 183

Then gµ ∈ L2(Rd). Let µ be a measure absolutely continuous with respect toLebesgue measure with density gµ. Then µ(ϕi) = µ(ϕi), since

µ(ϕi) =∫

Rdϕigµ dx =

⟨ ∞∑j=1

µ(ϕj)ϕj , ϕi

⟩= µ(ϕi);

hence via an approximation argument µ(A) = µ(A) for any ball A of arbitrarycenter and radius. Hence µ = µ and since µ is absolutely continuous withrespect to Lebesgue measure the result follows.

7.3

i. First we show that if for p, q ≥ 1, 1/p + 1/q = 1 + 1/r then ‖f ? g‖r ≤‖f‖p‖g‖q, where f ? g denotes the convolution of f and g. Then choosingp = 2, q = 1, and r = 2, we see that for g ∈ L2(Rd), using the fact thatthe L1 norm of the heat kernel is unity,

‖ψεg‖2 = ‖ψε ? g‖2 ≤ ‖ψε‖1‖g‖2 = ‖g‖2.

We now prove the result for convolution. Consider f, g non-negative; let1/p′ + 1/p = 1 and 1/q + 1/q′ = 1. Since 1/p′ + 1/q′ + 1/r = 1 we mayapply Holder’s inequality,

f ? g(x) =∫

Rdf(y)g(x− y) dy

=∫

Rdf(y)p/rg(x− y)q/rf(y)1−p/rg(x− y)1−q/r dy

≤(∫

Rdf(y)pg(x− y)q dy

)1/r (∫Rdf(y)(1−p/r)q′ dy

)1/q′

×(∫

Rdg(x− y)(1−q/r)p′ dy

)1/p′

=(∫

Rdf(y)pg(x− y)q dy

)1/r (∫Rdf(y)p dy

)1/q′

×(∫

Rdg(y)q dy

)1/p′

.

Therefore(f ? g)r(x) ≤ (fp ? gq)(x)‖f‖pr/q

p ‖g‖rq/p′

q ,

so by Fubini’s theorem

‖f ? g‖rr ≤ ‖f‖r−pp ‖g‖r−qq

∫Rd

∫Rdfp(y)gq(x− y) dy dx

≤ ‖f‖r−pp ‖g‖r−qq

∫Rdfp(y)

∫Rdgq(x− y) dxdy

≤ ‖f‖r−pp ‖g‖r−qq ‖f‖pp‖g‖qq = ‖f‖rp‖g‖rq.

Page 182: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

184 7 Density of the Conditional Distribution of the Signal

ii. The function ψ2ε(x) is bounded by 1/(2πε)d/2, therefore

‖Tεµ‖22 =∫

Rd

∫Rd

∫Rdψε(x− y)ψε(x− z)µ(dy)µ(dz) dx

=∫

Rd

∫Rdψ2ε(y − z)µ(dy)µ(dz)

≤(

14πε

)d/2 ∫Rd

∫Rd|µ|(dy)|µ(dz)|

≤(

14πε

)d/2 (|µ|(Rd)

)2<∞.

Also

‖∂iTεµ‖22 =∫

Rd

∫Rd

∫Rd

(xi − yi)ε

ψε(x− y)

× (xi − zi)ε

ψε(x− z)µ(dy)µ(dz) dx

= 2d∫

Rd

∫Rd

∫Rd

(xi − yi)ε

ψ2ε(x− y)exp(−‖x− y‖

2

)× (xi − zi)

εψ2ε(x− z)exp

(−‖x− z‖

2

)µ(dy)µ(dz) dx

≤ 2d

ε

∫Rd

∫Rd

∫Rdψ2ε(x− y)ψ2ε(x− z)µ(dy)µ(dz) dx

≤2d

ε

∫Rd

∫Rdψ4ε(y − z)µ(dy)µ(dz)

≤ 2d

ε

(1

8πε

)d/2 (|µ|(Rd)

)2<∞.

In the above the bound supt≥0 te−t2/4 < 1 was used twice. Similar bounds

hold for higher-order derivatives and are proved in a similar manner.iii. From part (ii) Tεµ ∈ L2(R), thus by part (i),

‖T2εµ‖22 = ‖Tε(Tεµ)‖22 ≤ ‖Tεµ‖22.

7.4

i. Immediate from

|Tεfµ(x)| =∣∣∣∣∫

Rdψε(x− y)f(y)µ(dy)

∣∣∣∣ ≤ kfTε|µ|(x).

ii. Assuming first that f ∈ C1b (Rd), integration by parts yields

〈Tεµ, f∂iTεµ〉 =12

∫Rdf(x)∂i

((Tεµ(x))2

)dx

= −12

∫Rd

(Tεµ(x))2∂if(x) dx.

Page 183: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.5 Solutions to Exercises 185

Thus|〈Tεµ, f∂iTεµ〉| ≤ 1

2k′f‖Tεµ‖22,

which implies (ii) for f ∈ C1b (Rd). The general result follows via a standard

approximation argument.iii. ∣∣f∂iTεµ(x)− ∂iTε(fµ)(x)

∣∣=∣∣∣∣∫

Rd(f(x)− f(y))∂iψε(x− y)µ(dy)

∣∣∣∣≤ k′f

∣∣∣∣∫Rd‖x− y‖ |xi − yi|

εψε(x− y)|µ|(dy)

∣∣∣∣≤ 2d/2k′f

∣∣∣∣∫Rd

‖x− y‖2

εexp(−‖x− y‖

2

)ψ2ε(x− y)|µ|(dy)

∣∣∣∣≤ 2d/2+1k′fT2ε|µ|(x),

where the final inequality follows as a consequence of the fact thatsupt≥0(t exp(−t/4)) < 2.

7.10 Using primes to denote differentiation with respect to the spatial vari-able, from the Zakai equation,

Tερ(ϕ′) = Tεπ0(ϕ′) +∫ t

0

ρs(ATεϕ′) ds+∫ t

0

ρs(hTεϕ′) dYs.

By Ito’s formula, setting zεt = (Tερ)′,

(zεt (ϕ))2 = (Tεπ0)′ϕ+ 2∫ t

0

zεt (ϕ)ρs(ATεϕ′) ds+ 2∫ t

0

zεt (ϕ) dYs

+∫ t

0

(ρs(hTεϕ′))2 ds.

Taking expectation and using Fatou’s lemma,

E (zεt (ϕ))2 ≤ E [(Tεπ0)′(ϕ)] + 2E∫ t

0

zεt (ϕ)ρs(ATεϕ′) ds

+ E∫ t

0

(ρs(hTεϕ′))2 ds.

For the final term

ρs(hTεϕ′) = (hρ)(Tεϕ′) = 〈ϕ, Tε(hρ)〉;

using this and the result (7.9) of Lemma 7.5 it follows that

Page 184: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

186 7 Density of the Conditional Distribution of the Signal

E (zεt (ϕ))2 ≤ E [(Tεπ0)′(ϕ)] + 2E∫ t

0

zεt (φ)〈ϕ′, (Tεfρ)′〉ds

+ 2E∫ t

0

zεt (φ)〈ϕ′, (Tεaρ)′′〉ds+ E∫ t

0

〈ϕ′, Tε(hρ)〉2 ds.

Therefore integrating by parts yields,

E (zεt (ϕ))2 ≤ E [(Tεπ0)′(ϕ)] + 2E∫ t

0

zεt (φ)〈ϕ, (Tεfρ)′′〉ds

+ 2E∫ t

0

zεt (φ)〈ϕ, (Tεaρ)′′′〉ds+ E∫ t

0

〈ϕ, Tε(hρ)′〉2 ds. (7.36)

Now let ϕ range over an orthonormal basis of L2(Rd), and bound

limn→∞

n∑i=1

(zεt (ϕi))2

using the result (7.36) applied to each term. By the dominated convergencetheorem the limit can be exchanged with the integrals and the result is ob-tained.

7.14 By Fubini and integration by parts (use the bound (7.21) to prove theintegrability of

∫ t0A∗ps(x) ds),

〈∫ t

0

A∗ps ds, ϕ〉 =∫ t

0

ρs(Aϕ) ds.

Next using the definition 7.23 of the stochastic integral appearing in thestochastic partial differential equation (7.20),

〈∫ t

0

h>(x)ps(x) dYs, ϕ〉 =∫ t

0

ρs(ϕh>) dYs.

Hence the result.

7.15 This proof requires that we repeat, with suitable modifications, theproof of Lemma 4.8 and Exercise 4.9. In the earlier proofs, (4.4) was usedfor two purposes, firstly in the proof of Lemma 4.8 to justify via dominatedconvergence interchange of limits and integrals, and secondly in the solutionto Exercise 4.9 to show that the various stochastic integrals are martingales.The condition (7.26) must be used instead.

First for the analogue of Lemma 4.9 we show that (7.24) also holds forϕ ∈ W 2

2 (Rd), by considering a sequence ϕn ∈ C2k(Rd) converging to ϕ in the

‖ · ‖2,2 norm. From Theorem 7.11 with k = 0,

E[

sup0≤s≤t

‖ps‖22]≤ c‖p0‖22 <∞,

Page 185: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

7.5 Solutions to Exercises 187

since we assumed the initial state density was in L2(Rd); thus sup0≤s≤t ‖ps‖22 <∞ P-a.s. Therefore by the Cauchy–Schwartz inequality∫ t

0

ρs(ϕ) ds =∫ t

0

〈ps, ϕ〉ds ≤∫ t

0

‖ps‖2‖ϕ‖2 ds

≤ ‖ϕ‖2∫ t

0

‖ps‖2 ds ≤ t‖ϕ‖2 sup0≤s≤t

‖ps‖2 <∞ P-a.s.

and similarly∫ t

0

ρs(∂iϕ) ds ≤ ‖∂iϕ‖2∫ t

0

‖ps‖2 ds <∞ P-a.s.,

and ∫ t

0

ρs(∂i∂jϕ) ds ≤ t‖∂i∂jϕ‖2∫ t

0

‖ps‖2 ds <∞ P-a.s.

Thus using the boundedness (from C2) of the aij and fi, it follows from thedominated convergence theorem that

limn→∞

∫ t

0

ρs(Aϕn) ds =∫ t

0

ρs(Aϕ) ds.

From the boundedness of h, and Cauchy–Schwartz

limn→∞

∫ t

0

[ρs(hiϕn)− ρs(hiϕ)]2 ds ≤ ‖h‖2∞∫ t

0

〈ps, ϕn − ϕ〉2 ds

≤ ‖h‖2∞ sup0≤s≤t

‖pt‖2t‖ϕn − ϕ‖22 = 0,

so by Ito’s isometry

limn→∞

∫ t

0

ρs(h>ϕn) dYs =∫ t

0

ρs(h>ϕ) dYs.

Thus from these convergence results (7.24) is satisfied for any ϕ ∈W 22 . The re-

sult can then be extended to time-dependent ϕ, which is uniformly bounded inW 2

2 over [0, t] by piecewise approximation followed by the dominated conver-gence theorem using the bounds just derived. Thus for any ϕ ∈ C1,2

b ([0, t]×Rd)such that ϕt ∈W 2

2 ,

ρt(ϕt) = ρ0(ϕ0) +∫ t

0

ρs

(∂ϕs∂s

+Aϕs

)ds+

∫ t

0

ρs(ϕsh>) dYs.

For the second part of the proof, apply Ito’s formula to εtρt(ϕt) and then takeexpectation. In order to show that the stochastic integrals are martingales andtherefore have zero expectation, we may use the bound

Page 186: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

188 7 Density of the Conditional Distribution of the Signal

E[∫ t

0

ε2s (ρs(ϕs))

2 ds]≤ e‖r‖

2∞tE

[∫ t

0

(ρs(ϕs))2 ds]

≤ E[∫ t

0

〈ps, ϕ〉2 ds]

≤ E[∫ t

0

‖ϕs‖22‖ps‖22 ds]

≤ t(

sup0≤s≤t

‖ϕs‖2)2

E[

sup0≤s≤t

‖ps‖22]<∞.

Consequently since the stochastic integrals are all martingales, we obtain

E [εtρt(ϕt)] = π0(ϕ0) + E[∫ t

0

εsρs

(∂ϕs∂s

+Aϕs + iϕsh>rs

)ds].

7.21 It is immediate from (7.30) that

ψs,ψt,ϕss = ψt,ϕs ;

thus by subtraction of (7.30) at times s and r, for 0 ≤ r ≤ s ≤ t, we obtain

ψt,ϕr = ψt,ϕs −∫ s

r

Aψt,ϕp dp−∫ s

r

ψt,ϕp h> dYp

and this is the same as the evolution equation for ψs,ψt,ϕs

r . Therefore by theuniqueness of its solution (Theorem 7.20), ψt,ϕr = ψ

s,ψt,ϕsr for r ∈ [0, s].

Page 187: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

Part II

Numerical Algorithms

Page 188: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8

Numerical Methods for Solving the FilteringProblem

This chapter contains an overview of six classes of numerical methods forsolving the filtering problem. For each of the six classes, we give a brief de-scription of the ideas behind the methods and state some related results. Thelast class of methods presented here, particle methods, is developed and stud-ied in depth in Chapter 9 for the continuous time framework and in Chapter10 for the discrete one.

8.1 The Extended Kalman Filter

This approximation method is based on a natural extension of the exact com-putation of the conditional distribution for the linear/Gaussian case. Recallfrom Chapter 6, that in the linear/Gaussian framework the pair (X,Y ) satis-fies the (d+m)-dimensional system of linear stochastic differential equations(6.17) and (6.18); that is,

dXt = (FtXt + ft) dt+ σtdVtdYt = (HtXt + ht) dt+ dWt.

(8.1)

In (8.1), the pair (V,W ) is a (d+m)-dimensional standard Brownian motion.Also Y0 = 0 and X0 has a Gaussian distribution, X0 ∼ N(x0, p0), and isindependent of (V,W ). The functions

F : [0,∞)→ Rd×d, f : [0,∞)→ Rd

H : [0,∞)→ Rd×m, h : [0,∞)→ Rm

are locally bounded, measurable functions. Then πt, the conditional distri-bution of the signal Xt, given the observation σ-algebra Yt is Gaussian.Therefore πt is uniquely identified by its mean and covariance matrix. Letx = xt, t ≥ 0 be the conditional mean of the signal; that is, xit = E[Xi

t |Yt].Then x satisfies the stochastic differential equation (6.27), that is,

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20098,

Page 189: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

192 8 Numerical Methods for Solving the Filtering Problem

dxt = (Ftxt + ft) dt+RtH>t (dYt − (Htxt + ht) dt),

and R = Rt, t ≥ 0 satisfies the deterministic matrix Riccati equation (6.28),

dRtdt

= σtσ>t + FtRt +RtF

>t −RtH>t HtRt.

We note that R = Rt, t ≥ 0 is the conditional covariance matrix of thesignal; that is, Rt = (Rijt )di,j=1 has components

Rijt = E[XitX

jt |Yt]− E[Xi

t |Yt]E[Xjt |Yt], i, j = 1, . . . , d, t ≥ 0.

Therefore, in this particular case, the conditional distribution of the signal isexplicitly described by a finite set of parameters (xt and Rt) which, in turn, areeasy to compute numerically. The conditional mean xt satisfies a stochasticdifferential equation driven by the observation process Y and is computedonline, in a recursive fashion, updating it as new observation values becomeavailable. However Rt is independent of Y and can be computed offline, i.e.,before any observation is obtained.

Some of the early applications of the linear/Gaussian filter, known as theKalman–Bucy filter, date back to the early 1960s. They include applications tospace navigation, aircraft navigation, anti-submarine warfare and calibrationof inertial navigation systems. Notably, the Kalman–Bucy filter was used toguide Rangers VI and VII in 1964 and the Apollo space missions. See Bucyand Joseph [31] for details and a list of early references. For a recent self-contained treatment of the Kalman–Bucy filter and a number of applicationsto mathematical finance, genetics and population modelling, see Aggoun andElliott [2] and references therein.

The result obtained for the linear filtering problem (8.1) can be generalizedas follows. Let (X,Y ) be the solution of the following (d + m)-dimensionalsystem of stochastic differential equations

dXt = (F (t, Y )Xt + f(t, Y )) dt+ σ(t, Y ) dVt

+m∑i=1

(Gi(t, Y )Xt + gi(t, Y ))dY it (8.2)

dYt = (H(t, Y )Xt + h(t, Y )) dt+ dWt,

where F, σ,G1, . . . , Gn : [0,∞) × Ω → Rn×n, f, g1, . . . , gn : [0,∞) × Ω →Rn, H : [0,∞) × Ω → Rn×m and h : [0,∞) × Ω → Rm are progressivelymeasurable† locally bounded functions. Then, as above, πt is Gaussian withmean xt, and variance Rt which satisfy the following equations† If (Ω,F ,Ft,P) is a filtered probability space, then we say that a : [0,∞)×Ω → RN

is a progressively measurable function if, for all t ≥ 0, its restriction to [0, t]×Ωis B ([0, t])×Ft-measurable, where B([0, t]) is the Borel σ-algebra on [0, t]).

Page 190: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.1 The Extended Kalman Filter 193

dxt =

(F (t, Y )xt + f(t, Y ) +

m∑i=1

Gi(t, Y )RtH>i (t, Y )

)dt

+m∑i=1

(Gi(t, Y )xt + gi(t, Y )) dY it

+RtH>(t, Y ) (dYt − (Ht(t, Y )xt + ht(t, Y )) dt) (8.3)

dRt =(F (t, Y )Rt +RtF (t, Y ) + σ(t, Y )σ>(t, Y )

+m∑i=1

Gi(t, Y )RtG>i (t, Y ))

dt−RtH>(t, Y )H(t, Y )Rt dt

+m∑i=1

(Gi(t, Y )Rt +RtG>i (t, Y )) dY it . (8.4)

The above formulae can be used to estimate πt for more general classes offiltering problems, which are non-linear. This will lead to the well-known ex-tended Kalman filter (EKF for short). The following heuristic justification ofthe EKF follows that given in Pardoux [238].

Let (X,Y ) be the solution of the following (d+m)-dimensional system ofnon-linear stochastic differential equations

dXt = f(Xt) dt+ σ(Xt) dVt + g(Xt) dWt

dYt = h(Xt) dt+ dWt,(8.5)

and assume that (X0, Y0) = (x0, 0), where x0 ∈ Rd. Define xt to be thesolution of the ordinary differential equation

dxtdt

= f(xt), x0 = x0.

The contribution of the two stochastic terms in (8.5) remains small, at leastwithin a small window of time [0, ε], so a trajectory t 7→ Xt may be viewedas being a perturbation from the (deterministic) trajectory t→ xt. Thereforethe following Taylor-like expansion is expected

dXt ' (f ′(xt)(Xt − xt) + f(xt)) dt+ σ(xt) dVt + g(xt) dWt

dYt ' (h′(xt)(Xt − xt) + h(xt)) dt+ dWt.

In the above equation, ‘'’ means approximately equal, although one can notattach a rigorous mathematical meaning to it. Here f ′ and h′ are the deriva-tives of f and h. In other words, for a small time window, the equation satisfiedby the pair (X,Y ) is nearly linear. By analogy with the generalized linear fil-ter (8.2), we can ‘conclude’ that πt is ‘approximately’ normal with mean xtand with covariance Rt which satisfy (cf. (8.3) and (8.4))

Page 191: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

194 8 Numerical Methods for Solving the Filtering Problem

dxt = [(f ′ − gh′)(xt)xt + (f − gh)(xt)− (f ′ − gh′)(xt)xt] dt

+ g(xt)dYt +Rth′>(xt)[dYt − (h′(xt)xt + h(xt)− h′(xt)xt) dt]

dRtdt

= (f ′ − gh′)(xt)Rt +Rt(f ′ − gh′)>(xt) + σσ>(xt)−Rth′>h′(xt)Rt

with x0 = x0 and R0 = p0. Hence, we can estimate the position of the signalby using xt as computed above. We can use the same procedure, but insteadof xt we can use any Yt-adapted ‘estimator’ process mt. Thus, we obtain amapping Λ from the set of Yt-adapted ‘estimator’ processes into itself

mtΛ−→ xt.

The extended Kalman filter (EKF) is the fixed point of Λ; that is, the solutionof the following system

dxt = (f − gh)(xt)dt+ g(xt)dYt +Rth′>(xt)[dYt − h(xt)dt]

dRtdt

= (f ′ − gh′)(xt)Rt +Rt(f ′ − gh′)>(xt) + σσ>(xt)−Rth′>h′(xt)Rt.

Although this method is not mathematically justified, it is widely used in prac-tice. The following is a minute sample of some of the more recent applicationsof the EKF.

• In Bayro-Corrochano et al. [8], a variant of the EKF is used for the motionestimation of a visually guided robot operator.

• In Kao et al. [148], the EKF is used to optimise a model’s physical param-eters for the simulation of the evolution of a shock wave produced througha high-speed flyer plate.

• In Mangold et al. [202], the EKF is used to estimate the state of a moltencarbonate fuel cell.

• In Ozbek and Efe [235], the EKF is used to estimate the state and theparameters for a model for the ingestion and subsequent metabolism of adrug in an individual.

The EKF will give a good estimate if the initial position of the signal iswell approximated (p0 is ‘small’), the coefficients f and g are only ‘slightly’non-linear, h is injective and the system is stable. Theorem 8.5 (below) givesa result of this nature. The result requires a number of definitions.

Definition 8.1. The family of function fε : [0,∞) × Rd → Rd, ε ≥ 0, issaid to be almost linear if there exists a family of matrix-valued functionsFt : Rd → Rd×d such that, for any t ≥ 0 and x, y ∈ Rd, we have

|fε(t, x)− fε(t, y)− Ft(x− y)| ≤ µε|x− y|,

for some family of numbers µε converging to 0 as ε converges to 0.

Page 192: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.1 The Extended Kalman Filter 195

Definition 8.2. The function fε : [0,∞) × Rd → Rd is said to be stronglyinjective if there exists a constant c > 0 such that

|f(t, x)− f(t, y)| ≥ c|x− y|

for any x, y ∈ Rd.

Definition 8.3. A family of stochastic processes ξεt , t ≥ 0, ε > 0, is saidto be bounded in L∞− if, for any q < ∞ there exists εq > 0 such that ‖ξεt ‖qis bounded uniformly for (t, ε) ∈ [0,∞)× [0, εq].

Definition 8.4. The family ξεt , ε > 0, is said to be of order εα for some α > 0if ε−αξεt is bounded in L∞−.

Assume that the pair (Xε, Y ε) satisfies the following system of SDEs,

dXεt = βε(t,Xε

t )dt+√εσ(t,Xε

t )dWt +√εγ(t,Xε

t )dBtdY εt = hε(t,Xε

t )dt+√εdBt.

The following theorem is proved in Picard [240].

Theorem 8.5. Assume that p−1/20 (Xε

0 − x0) is of order√ε and the following

conditions are satisfied.

• σ and γ are bounded.• βε and hε are continuously. differentiable and almost linear.• h is strongly injective and σσ> is uniformly elliptic.• The ratio of the largest and smallest eigenvalues of P0 is bounded.

Then (Rεt )−1/2(Xε

t − xεt ) is of order√ε.

Hence the EKF works well under the conditions described above. If anyof these conditions are not satisfied, the approximation can be very bad. Thefollowing two examples, again taken from [240], show this fact.

Suppose first that Xε and Y ε are one-dimensional and satisfy

dXεt = (2 arctanXε

t −Xεt )dt+

√εdWt

dY εt = HXεt dt+

√εdBt,

where H is a positive real number. In particular, the signal’s drift is no longeralmost linear. The deterministic dynamical system associated with Xε (ob-tained for ε = 0) has two stable points of equilibrium denoted by x0 > 0 and−x0. The point 0 is an unstable equilibrium point.

The EKF performs badly in this case. For instance, it cannot be used todetect phase transitions of the signal. More precisely, suppose that the signalstarts from x0. Then, for all ε, Xε

t will change sign with probability one. Infact, one can check that

α0 = limε→0

ε log(E [inft > 0; Xεt < 0])

Page 193: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

196 8 Numerical Methods for Solving the Filtering Problem

exists and is finite. We choose α1 > α0 and t1 , exp(α1/ε). One can provethat

limε→0

P[(Xεt1 < 0

)]=

12,

but on the other hand,

limε→0

P [(xt1 > x0 − δ)] = 1

for small δ > 0. Hence Xεt − xεt does not converge to 0 in probability as ε

tends to 0.In the following example the EKF does not work because the initial con-

dition of the signal is imprecisely known. Assume that Xε is one-dimensional,Y ε is two-dimensional, and that they satisfy the system of SDEs,

dXεt =√εdWt

dY ε,1t = Xεt +√εdB1

t

dY ε,2t = 2|Xεt |+

√εB2

t ,

and Xε0 ∼ N(−2, 1). In this case Xε

t − xεt does not converge to 0. To beprecise,

lim infε→0

P(

infs≤t

Xεs ≥ 1, sup xεt ≤ −1

)> 0.

For further results and examples see Bensoussan [12], Bobrovsky and Zakai[21], Fleming and Pardoux [97] and Picard [240, 243].

8.2 Finite-Dimensional Non-linear Filters

We begin by recalling the explicit expression of the conditional distribution ofthe Benes filter as presented in Chapter 6. Let X and Y be one-dimensionalprocesses satisfying the system of stochastic differential equations (6.1) and(6.3); that is,

dXt = f(Xt) dt+ σdVtdYt = (h1Xt + h2) dt+ dWt

(8.6)

with (X0, Y0) = (x0, 0), where x0 ∈ R. In (8.6), the pair process (V,W ) is atwo-dimensional Brownian motion, h1, h2, σ ∈ R are constants with σ > 0,and f : R→ R is differentiable with bounded derivative (Lipschitz) satisfyingthe Benes condition

f ′(x) + f2(x)σ−2 + (h1x+ h2)2 = p2x2 + 2qx+ r, x ∈ R,

where p, q, r ∈ R are arbitrary. Then πt satisfies the explicit formula (6.15);that is,

Page 194: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.2 Finite-Dimensional Non-linear Filters 197

πt(ϕ) =1ct

∫ ∞−∞

ϕ(z)exp(F (z)σ−2 +Qt(z)

)dz, (8.7)

where F is an antiderivative of f , ϕ is an arbitrary bounded Borel-measurablefunction, Qt(z) is the second-order polynomial

Qt(z) , z

(h1σ

∫ t

0

sinh(spσ)sinh(tpσ)

dYs +q + p2x0

pσ sinh(tpσ)− q

pσcoth(tpσ)

)−p coth(tpσ)

2σz2

and ct is the corresponding constant,

ct ,∫ ∞−∞

exp(F (z)σ−2 +Qt(z)

)dz. (8.8)

In particular, π only depends on the one-dimensional Yt-adapted process

t 7→ ψt =∫ t

0

sinh(spσ) dYs.

The explicit formulae (8.7) and (8.8) are very convenient. If the observationsarrive at the given times (ti)i≥0, then ψti can be recursively approximatedusing, for example, the Euler method

ψti+1 = ψti + sinh(ti+1pσ)(Yti+1 − Yti)

and provided the constant ct and the antiderivative F can be computed thisgives an explicit approximation of the density of πt. Chapter 6 gives someexamples where this is possible. If ct and F are not available in closed formthen they can be approximated via a Monte Carlo method for c and numericalintegration for F .

The following extension to the d-dimensional case (see Benes [9] for details)is valid. Let f : Rd → Rd be an irrotational vector field; that is, there existsa scalar function F such that f = ∇F and assume that the signal and theobservation satisfy

dXt = f(Xt)dt+ dVt, X0 = x (8.9)dYt = Xtdt+Wt, Y0 = 0, (8.10)

and further assume that F satisfies the following condition

∇2F + |∇F |2 + |z|2 = z>Qz + q>Z + c, (8.11)

where Q ≥ 0 and Q = Q>. Let T be an orthogonal matrix such that TQT> =Λ, where Λ is the diagonal matrix of (nonnegative) eigenvalues λi of Q and b =Tq. Let k = (

√λ1, . . . ,

√λd), u> = (0, 1,−1, 0, 1,−1, . . . repeated d times)

and m be the 3d-dimensional solution of the equation

Page 195: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

198 8 Numerical Methods for Solving the Filtering Problem

dmdt

= Am, (8.12)

where m(0) = (x1, 0, 0, x2, 0, 0, . . . , xd, 0, 0) and

A =

A1 0

A2

. . .0 Ad

, Ai =

−ki 0 00 0 0ki(Ty)i − bi/2 0 0

.Let also R be the 3d× 3d matrix-valued solution of

dRdt

= Y +RA∗ +AR,

where

R =

R1 0

R2

. . .0 Rd

, Y =

Y1 0

Y2

. . .0 Yd

,

Yi =

1(TYt)i0

(1, (TYt)i, 0) .

Then we have the following theorem (see Benes [9] for details).

Theorem 8.6. If condition (8.11) is satisfied, then πt satisfies the explicitformula

πt(ϕ) =1ct

∫Rdϕ(z)exp(F (z) + Ut(z)) dz,

where ϕ is an arbitrary bounded Borel-measurable function, Ut(z) is thesecond-order polynomial

Ut(z) = z>Yt +12z>Q1/2z − 1

2(Tz +Ru−m)>R−1(Tz +Ru−m), z ∈ Rd

and ct is the corresponding normalising constant

ct =∫

Rdexp(F (z) + Ut(z)) dz.

As in the one-dimensional case, this filter is finite-dimensional. The con-ditional distribution of the signal πt depends on the triplet (Y,m,R), whichcan be recursively computed/approximated. Again, as long as the normalis-ing constant ct and the antiderivative F can be computed we have an explicitapproximation of the density of πt and if ct and F are not available in closed

Page 196: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.3 The Projection Filter and Moments Methods 199

form they can be approximated via a Monte Carlo method and numericalintegration, respectively.

The above filter is equivalent to the Kalman–Bucy filter: one can be ob-tained from the other via a certain space transformation. This in turn inducesa homeomorphism which makes the Lie algebras associated with the two fil-ters equivalent (again see Benes [9] for details). However in [10], Benes hasextended the above class of finite-dimensional non-linear filters to a largerclass with corresponding Lie algebras which are no longer homeomorphic tothe Lie algebra associated with the Kalman–Bucy filter. Further work onfinite-dimensional filters and numerical schemes based on approximation us-ing these classes of filter can be found in Cohen de Lara [58, 59], Daum[69, 70], Schmidt [253] and the references therein. See also Darling [68] foranother related approach.

8.3 The Projection Filter and Moments Methods

The projection filter (see Brigo et al. [24] and the references therein) is analgorithm which provides an approximation of the conditional distribution ofthe signal in a systematic way, the method being based on the differentialgeometric approach to statistics. The algorithm works well in some cases, forexample, the cubic sensor example discussed below, but no general conver-gence theorem is known.

Let S , p(·, θ), θ ∈ Θ be a family of probability densities on Rd, whereΘ ⊆ Rn is an open set of parameters and let

S1/2 , √p(·, θ), θ ∈ Θ ∈ L2(Rd)

be the corresponding set of square roots of densities. We assume that for allθ ∈ Θ,

∂√p(·, θ)∂θ1

, . . . ,∂√p(·, θ)∂θn

are independent vectors in L2(Rd), i.e., that S1/2 is an n-dimensional sub-manifold of L2(Rd), The tangent vector space at

√p(·, θ) to S1/2 is

L√p(·,θ)S

1/2 = span

∂√p(·, θ)∂θ1

, . . . ,∂√p(·, θ)∂θn

.

The L2-inner product of any two elements of the basis is defined as⟨∂√p(·, θ)∂θi

,∂√p(·, θ)∂θj

⟩=

14

∫Rd

1p(x, θ)

∂p(x, θ)∂θi

∂p(x, θ)∂θj

dx =14gij(θ),

where g(θ) = (gij(θ)) is called the Fisher information matrix and followingnormal tensorial convention, its inverse is denoted by g−1(θ) = (gij(θ)).

Page 197: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

200 8 Numerical Methods for Solving the Filtering Problem

In the following we choose S to be an exponential family, i.e.,

S = p(x, θ) = exp(θ>c(x)− ψ(θ)

): θ ∈ Θ,

where c1, . . . , cn are scalar functions such that 1, c1, . . . , cn are linearly in-dependent. We also assume that Θ ⊆ Θ0 where

Θ0 =θ ∈ Rn : ψ(θ) , log

∫eθ>c(x) dx <∞

and that Θ0 has non-empty interior. Let X and Y be the solution of thefollowing system of SDEs,

dXt = f(t,Xt) dt+ σ(t,Xt) dWt

dYt = h(t,Xt) dt+ dVt.

The density πt(z) of the conditional distribution of the signal satisfies theStratonovich SDE,

dπt(z) = A∗πt(z)dt− 12πt(z)(‖h(z)‖2 − πt(‖h‖2))

+ πt(z)(h>(z)− πt(h>)) dYt, (8.13)

where is used to denote Stratonovich integration and A∗ is the operatorwhich is the formal adjoint of A,

A∗ϕ , −d∑i=1

∂xi(f iϕ) + 1

2

d∑i,j=1

∂2

∂xi∂xj

d∑k=1

σikσjk

).

By using the Stratonovich chain rule, we get from (8.13) that

d√πt =

12√πt dπt = Rt(

√πt)dt−Q0

t (√πt)dt+

m∑k=1

Qkt (√πt) dY kt ,

where Rt and(Qkt)mk=0

are the following non-linear time-dependent operators

Rt(√p) ,

A∗p

2√p

Q0t (√p) ,

√p

4(‖h‖2 − πt

(‖h‖2

))Qkt (√p) ,

√p

2(‖h‖k − πt

(‖h‖k

)).

Assume now that for all θ ∈ Θ and all t ≥ 0

Ep(·,θ)

[(A∗p(·, θ)p(·, θ)

)2]<∞

Page 198: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.3 The Projection Filter and Moments Methods 201

and Ep(·,θ)[|h|4] < ∞. This implies that Rt(√p(·, θ)) and Qkt (

√p(·, θ)), for

k = 0, 1, . . . ,m, are vectors in L2(Rd).We define the exponential projection filter for the exponential family S to

be the solution of the stochastic differential equation

d√p(·, θt) = Λθt Rt(

√p(·, θt)) dt− Λθt Q0

t (√p(·, θt)) dt

+m∑k=1

Λθt Qkt (√p(·, θt)) Y kt ,

where Λθt : L2 → L√p(·,θ)S

1/2 is the orthogonal projection

vΛθt7→

n∑i=1

n∑j=1

4gij (θ)

⟨v,

∂√p(·, θ)∂θj

⟩ ∂√p(·, θ)∂θi

.

In other words,√p(·, θt) satisfies a differential equation whose driving vector

fields are the projections of the corresponding vector fields appearing in theequation satisfied by

√πt onto the tangent space of the manifold S1/2, and

therefore, p(·, θt) is a natural candidate for an approximation of the conditionaldistribution of the signal at time t, when the approximation is sought amongthe elements of S.

One can prove that for the exponential family

p(x, θ) = exp[θ>c(x)− ψ(θ)

],

the projection filter density Rπt is equal to p(·, θt), where the parameter θtsatisfies the stochastic differential equation

dθt = g−1(θt)(

E[Ac− 1

2‖h‖2(c− E[c])

]dt

+m∑k=1

E[hkt (c− E[c])] Y kt), (8.14)

where E[·] = Ep(·,θt)[·]. Therefore, in order to approximate πt, solve (8.14) andthen compute the density corresponding to its solution.

Example 8.7. We consider the cubic sensor, i.e., the following problem

dXt = σ dWt

dYt = X3t dt+ dVt.

We choose now S to be the following family of densities

S =

p(x, θ) = exp

(6∑i=1

θixi − ψ(θ)

): θ ∈ Θ ⊂ R6, θ6 < 0

.

Page 199: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

202 8 Numerical Methods for Solving the Filtering Problem

Let ηk(θ) be the kth moment of the probability with density p(·, θ), i.e.,ηk(θ) ,

∫∞−∞ xkp(x, θ) dx; clearly η0(θ) = 1. It is possible to show that the

following recurrence relation holds

η6+i(θ) = − 16θ6

(i+ 1)ηi(θ) +6∑j=1

θjηi+j(θ)

, i ≥ 0,

and therefore we only need to compute η1(θ), . . . , η5(θ) in order to computeall the moments. The entries of the Fisher information matrix gij(θ) are givenby

gij(θ) =∂2ψ(θ)∂θi∂θj

= ηi+j(θ)− ηi(θ)ηj(θ)

and (8.14) reduces to the SDE,

dθt = g−1(θt)γ•(θt)dt− λ0• dt+ λ•dYt,

where

λ0• = (0, 0, 0, 0, 0, 1/2)>

λ• = (0, 0, 1, 0, 0, 0)>

γ• = 12σ

2(0, 2η0(θ), 6η1(θ), 12η2(θ), 2− η3(θ), 30η4(θ))>.

See Brigo et al. [24] for details of the numerical implementation of the projec-tion filter in this case.

The idea of fixing the form of the approximating conditional density andthen evolving it by imposing appropriate constraints on the parameters wasfirst introduced by Kushner in 1967 (see [177]). In [183], the same method isused to produce approximations for the filtering problem with a continuoustime signal and discrete time observations.

8.4 The Spectral Approach

The spectral approach for the numerical estimation of the conditional distri-bution of the signal was introduced by Lototsky, Mikulevicius and Rozovskiiin 1997 (see [197] for details). Further developments on spectral methods canbe found in [195, 198, 199]. For a recent survey see [196]. This section followsclosely the original approach and the results contained in [197] (see also [208]).

Let us begin by recalling from Chapter 7 that pt(z), the density of theunnormalised conditional distribution of the signal, is the (unique) solutionof the stochastic partial differential equation (7.20),

pt(x) = p0(x) +∫ t

0

A∗ps(x) ds+∫ t

0

h>(x)ps(x) dYs,

Page 200: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.4 The Spectral Approach 203

in a suitably chosen function space (e.g. L2k(Rd)). The spectral approach is

based on decomposing pt into a sum of the form

pt(z) =∑α

1√α!ϕα(t, z)ξα(Y ), (8.15)

where ξα(Y ) are certain polynomials (see below) of Wiener integrals withrespect to Y and ϕα(t, z) are deterministic Hermite–Fourier coefficients in theCameron–Martin orthogonal decomposition of pt(z). This expansion separatesthe parameters from the observations: the Hermite–Fourier coefficients aredetermined only by the coefficients of the signal process, its initial distributionand the observation function h, whereas the polynomials ξα(Y ) are completelydetermined by the observation process.

A collection α = (αlk)1≤l≤d,k≥1 of nonnegative integers is called a d-dimensional multi-index if only finitely many of αlk are different from zero.Let J be the set of all d-dimensional multi-indices. For α ∈ J we define:

|α| ,∑l,k

αlk : the length of α

d(α) , maxk ≥ 1 : αlk > 0 for some 1 ≤ l ≤ d

: the order of α

α! ,∏k,l

αlk!.

Let mk = mk(s)k≥1 be an orthonormal system in the space L2([0, t]) andξk,l be the following random variables

ξk,l =∫ t

0

mk(s) dY l(s).

Under the new probability measure P, ξk,l are i.i.d. Gaussian random variables(as Y =

(Y l)

is a standard Brownian motion under P). Let also (Hn)n≥1 bethe Hermite polynomials

Hn(x) , (−1)nex2/2 d2

dxne−x

2/2

and (ξα)α be the Wick polynomials

ξα ,∏k,l

Hαlk(ξk,l)√αlk!

.

Then (ξα)α form a complete orthonormal system in L2(Ω,Yt, P). Their cor-responding coefficients in the expansion (8.15) satisfy the following system ofdeterministic partial differential equations

Page 201: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

204 8 Numerical Methods for Solving the Filtering Problem

dϕαt (z)dt

= A∗ϕα(t, z) +∑k,l

αlkmk(t)hl(z)ϕα(k,l)(t, z)

ϕα0 (z) = π0(z)1|α|=0,

(8.16)

whereα = (αlk)1≤l≤d,k≥1 ∈ J

and α(i, j) stands for the multi-index (αlk)1≤l≤d,k≥1 with

αlk =αlk if k 6= i or ` 6= j or bothmax(0, αji − 1) if k = i and ` = j

.

Theorem 8.8. Under certain technical assumptions (given in Lototsky et al.[197]), the series ∑

α

1√α!ϕαt (z)ξα

converges in L2(Ω, P) and in L1(Ω,P) and we have

pt(z) =∑α

1√α!ϕα(t, z)ξα, P-a.s. (8.17)

Also the following Parseval’s equality holds

E[|pt(z)|2] =∑α

1α!|ϕα(t, z)|2.

For computational purposes one needs to truncate the sum in the expan-sion of pt. Let JnN be the following finite set of indices

JnN = α : |α| ≤ N, d(α) ≤ n

and choose the following deterministic basis

m1(s) =1√t; mk(s) =

√2t

cos(π(k − 1)s

t

), k ≥ 1, 0 ≤ s ≤ t.

Then, again under some technical assumptions, we have the following.

Theorem 8.9. If pn,Nt (z) ,∑α∈JnN

(1/√α!)ϕα(t, z)ξα, then

E[‖pn,Nt − pt‖2L2] ≤ C1

t

(N + 1)!+C2t

n,

supz∈Rd

E[|pn,Nt (z)− pt(z)|2] ≤ C1t

(N + 1)!+C2t

n,

where the constants C1t , C2

t , C1t , and C2

t are independent of n and N .

Page 202: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.4 The Spectral Approach 205

One can also construct a recursive version of the expansion (8.17) (see[197] for a discussion of the method based on the above approximation). Let0 = t0 < t1 < · · · < tM = T be a uniform partition of the interval [0, T ]with step ∆ (ti = i∆, i = 0, . . . , M). Let mi

k = mik(s) be a complete

orthonormal system in L2([ti−1, ti]). We define the random variables

ξik,l =∫ ti

ti−1

mik(s) dY l(s), ξiα =

∏k,l

Hαlk(ξik,l)√

(αlk)!

,

where Hn is the nth Hermite polynomial. Consider the following system ofdeterministic partial differential equations

dϕiα(t, z, g)dt

= A∗ϕiα(t, z, g)

+∑k,l

αl,kmiα(t)hl(z)ϕiα(k,l)(t, z, g), t ∈ [ti−1, ti]

ϕiα(ti−1, z, g) = g(z)1|α|=0.

(8.18)

We observe that, for each i = 1, . . . ,M , the system (8.18) is similar to (8.16),the difference being that the initial time is no longer zero and we allow for anarbitrary initial condition which may be different for different is. The followingis the recursive version of Theorem 8.8.

Theorem 8.10. If p0(z) = π0(z), then for each z ∈ Rd and each ti, i =1,. . . ,M , the unnormalised conditional distribution of the signal is given by

pti(z) =∑α

1√α!ϕiα(ti, z, pti−1(·))ξiα (P-a.s.). (8.19)

The series converges in L2(Ω,Yt, P) and L1(Ω,Yt,P) and the following Par-seval’s equality holds,

E[|pti(z)|2] =∑α

1α!|ϕiα(ti, z, pti−1(·))|2.

For computational purposes we truncate (8.19). We introduce the followingbasis

mik(t) = mk(t− ti−1), ti−1 ≤ t ≤ ti,

m1(t) =1√∆,

mk(t) =

√2∆

cos(π(k − 1)t

), k ≥ 1, t ∈ [0, ∆],

mk(t) = 0, k ≥ 1, t 6∈ [0, ∆].

Page 203: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

206 8 Numerical Methods for Solving the Filtering Problem

Theorem 8.11. If pn,N0 (z) = π0(z) and

pn,Nti (z) =∑α∈JnN

1√α!ϕiα(∆, z)ξiα,

where ϕiα(∆, z) are the solutions of the system

dϕiα(t, z)dt

= A∗ϕiα(t, z) +∑k,l

αl,kmiα(t)hl(z)ϕiα(k,l)(t, z), t ∈ [0, ∆]

ϕiα(0, z) = pn,Nti−1(z)1|α|=0,

then

max1≤i≤M

E[‖pn,Nti − pti‖2L2] ≤ BeBT

((C∆)N

(N + 1)!+∆2

n

),

max1≤i≤M

supz

E[|pn,Nti (z)− pti(z)|2] ≤ BeBT(

(C∆)N

(N + 1)!+∆2

n

),

where the constants B, C, B and C are independent of n, N , ∆ and T .

8.5 Partial Differential Equations Methods

This type of method uses the fact that pt(z), the density of the unnormalisedconditional distribution of the signal, is the solution of a partial differentialequation, albeit a stochastic one. Therefore classical PDE methods may beapplied to this stochastic PDE to obtain an approximation to the density pt.These methods are very successful in low-dimensional problems, but cannotbe applied in high-dimensional problems as they require the use of a spacegrid whose size increases exponentially with the dimension of the state spaceof the signal. This section follows closely the description of the method givenin Cai et al. [37]. The first step is to apply the splitting-up algorithm (see[186, 187] for results and details) to the Zakai equation

dpt(z) = A∗pt(z) dt+ pt(z)h>(z) dYt.

Let 0 = t0 < t1 < · · · < tn < · · · be a uniform partition of the interval [0,∞)with time step ∆ = tn−tn−1. Then the density ptn(z) will be approximated byp∆n (z), where the transition from p∆n−1(z) to p∆n (z) is divided into the followingtwo steps.

• The first step, called the prediction step, consists in solving the followingFokker–Planck equation for the time interval [tn−1, tn],

∂pnt∂t

= A∗pt(z)

pntn−1= p∆n−1

Page 204: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.5 Partial Differential Equations Methods 207

and we denote the prior estimate by p∆n , pntn . The Fokker–Planck equa-tion is solved by using the implicit Euler scheme, i.e., we solve

p∆n −∆A∗p∆n = p∆n−1. (8.20)

• The second step, called the correction step, uses the new observation Ytnto update p∆n . Define

z∆n ,1∆

(Ytn − Ytn−1

)=

1∆

∫ tn

tn−1

h(Xs) ds+1∆

(Wtn −Wtn−1

).

Using the Kallianpur–Striebel formula, define p∆n (z) for z ∈ Rd as

p∆n (z) , cnψ∆n (z)p∆n (z),

where ψ∆n (z) , exp(− 1

2∆ ‖z∆n − h(z)‖2

)and cn is a normalisation con-

stant chosen such that ∫Rdp∆n (z) dz = 1.

Assume that the infinitesimal generator of the signal is the followingsecond-order differential operator

A =d∑

i,j=1

aij(·)∂2

∂xi∂xj+

d∑i=1

fi(·)∂

∂xi.

We can approximate the solution to equation (8.20) by using a finite differencescheme on a given d-dimensional regular grid Ωh with mesh h = (h1, . . . , hm)in order to approximate the differential operator A. The scheme approximatesfirst-order derivatives evaluated at x as (ei is the unit vector in the ith coor-dinate)

∂ϕ

∂xi

∣∣∣∣x

'

ϕ(x+ eihi)− ϕ(x)

hiif fi(x) ≥ 0

ϕ(x)− ϕ(x− eihi)hi

if fi(x) < 0

and the second-order derivatives as

∂2ϕ

∂x2i

∣∣∣∣x

' ϕ(x+ eihi)− 2ϕ(x) + ϕ(x− eihi)h2i

and

∂2ϕ

∂xi∂xj

∣∣∣∣x

'

12hi

(ϕ(x+eihi+ejhj)−ϕ(x+eihi)

hj− ϕ(x+ejhj)−ϕ(x)

hj

+ ϕ(x)−ϕ(x−ejhj)hj

− ϕ(x−eihi)−ϕ(x−eihi−ejhj)hj

)if aij ≥ 0,

12hi

(ϕ(x+eihi)−ϕ(x+eihi−ejhj)

hj− ϕ(x)−ϕ(x−ejhj)

hj

+ ϕ(x+ejhj)−ϕ(x)hj

− ϕ(x−eihi+ejhj)−ϕ(x−eihi)hj

)if aij < 0.

Page 205: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

208 8 Numerical Methods for Solving the Filtering Problem

For each grid point x ∈ Ωh define the set V h to be the set of points accessiblefrom x, that is,

V h(x) , x+ εieihi + εjejhj , ∀ εi, εj ∈ −1, 0,+1, i 6= j

and the set Nh(x) ⊃ V h(x) to be the set of nearest neighbors of x, includingx itself

Nh(x) , x+ ε1e1h1 + · · ·+ εdedhd, ∀ ε1, . . . , εd ∈ −1, 0,+1 .

The operator A is approximated by Ah, where Ah is the operator

Ahϕ(x) ,∑

y∈V h(x)

Ah(x, y)ϕ(y)

with coefficients† given for each x ∈ Ωh by

Ah(x, x) = −d∑i=1

1h2i

aii(x)−∑j : j 6=i

12hihj

|aij(x)|

− d∑i=1

1hi|fi(x)|

Ah(x, x± eihi) =1

2h2i

aii(x)−∑j : j 6=i

|aij(x)|+ 1hif±i (x)

Ah(x, x+ eihi ± ejhj) =1

2hihja±ij(x)

Ah(x, x− eihi ∓ ejhj) =1

2hihja±ij(x)

Ah(x, y) = 0, otherwise

for all i, j = 1, . . . , d, i 6= j. One can check that, for all x ∈ Ωh, where

Ωh ,⋃x∈Ωh

Nh(x),

it holds that ∑y∈V h(x)

Ah(x, y) = 0.

If for all x ∈ Rd and i = 1, . . . , d, the condition

1h2i

aii(x)−∑j : j 6=i

12hihj

|aij(x)| ≥ 0, (8.21)

is satisfied then

Ah(x, x) ≤ 0 Ah(x, y) ≥ 0 ∀x ∈ Ωh,∀y ∈ Ωh(x) \ x.† The notation x+ denotes max(x, 0) and x− denotes min(x, 0).

Page 206: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.6 Particle Methods 209

Condition (8.21) ensures that Ah can be interpreted as the generator of apure jump Markov process taking values in the discretisation grid Ωh. As aconsequence the solution of the resulting approximation of the Fokker–Planckequation p∆n will always be a discrete probability distribution.

For recent results regarding the splitting-up algorithm see the work ofGyongy and Krylov in [118, 119]. The method described above can be refinedto permit better approximations of pt by using composite or adaptive grids(see Cai et al. [37] for details). See also Kushner and Dupuis [181], Lototskyet al. [194], Sun and Glowinski [263], Benes [9] and Florchingen and Le Gland[101] for related results.

For a general framework for proving convergence results for this class ofmethods, see Chapter 7 of the monograph by Kushner [182] and the referencescontained therein. See also Kushner and Huang [184] for further convergenceresults.

8.6 Particle Methods

Particle methods† are algorithms which approximate the stochastic processπt with discrete random measures of the form∑

i

ai(t)δvi(t),

in other words, with empirical distributions associated with sets of randomlylocated particles of stochastic masses a1(t),a2(t), . . . , which have stochasticpositions v1(t),v2(t), . . . where vi(t) ∈ S. Particle methods are currently amongthe most successful and versatile methods for numerically solving the filteringproblem and are discussed in depth in the following two chapters.

The basis of this class of numerical method is the representation of πtgiven by the Kallianpur–Striebel formula (3.33). That is, for any ϕ a boundedBorel-measurable function, we have

πt(ϕ) =ρt(ϕ)ρt(1)

,

where ρt is the unnormalised conditional distribution of Xt

ρt(ϕ) = E[ϕ(Xt)Zt

∣∣∣Yt] , (8.22)

and

Zt = exp(∫ t

0

h(Xs)> dYs −12

∫ t

0

‖h(Xs)‖2 ds).

† Also known as particle filters or sequential Monte Carlo methods.

Page 207: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

210 8 Numerical Methods for Solving the Filtering Problem

The expectation in (8.22) is taken with respect to the probability measureP under which the process Y is a Brownian motion independent of X (seeSection 3.3 for details).

One can then use a Monte Carlo approximation for E[ϕ(Xt)Zt | Yt]. Thatis, a large number of independent realisations of the signal are produced (sayn) and, for each of them, the corresponding expression ϕ(Xt)Zt is computed.Then, by taking the average of all the resulting values, one obtains an ap-proximation of E[ϕ(Xt)Zt | Yt]. To be more precise, let vj , j = 1, . . . , n ben mutually independent stochastic processes and independent of Y , each ofthem being a solution of the martingale problem for (A, π0). In other wordsthe pairs (vj , Y ), j = 1, . . . , n are identically distributed and have the samedistribution as the pair (X,Y ) (under P). Also let aj , j = 1, . . . , n be thefollowing exponential martingales

aj(t) = 1 +∫ t

0

aj(s)h(vj(s))> dYs, t ≥ 0. (8.23)

In other words

aj(t) = exp(∫ t

0

h(vj(s))> dYs −12

∫ t

0

‖h(vj(s))‖2 ds), t ≥ 0.

Hence, the triples (vj , aj , Y ), j = 1, . . . , n are identically distributed and havethe same distribution as the triple (X, Z, Y ) (under P).

Exercise 8.12. Show that the pairs (vj(t), aj(t)), j = 1, . . . , n are mutuallyindependent conditional upon the observation σ-algebra Yt.

Let ρn = ρnt , t ≥ 0 and πn = πnt , t ≥ 0 be the following sequences ofmeasure-valued processes

ρnt ,1n

n∑j=1

aj(t)δvj(t), t ≥ 0 (8.24)

πnt ,ρnt

ρnt (1), t ≥ 0

=n∑j=1

anj (t)δvj(t), t ≥ 0, (8.25)

where the normalised weights anj have the form

anj (t) =aj(t)∑nk=1 ak(t)

, j = 1, . . . , n, t ≥ 0.

That is, ρnt is the empirical measure of n (random) particles with positionsvj(t), j = 1, . . . , n and weights aj(t)/n, j = 1, . . . , n and πnt is its normalisedversion. We have the following.

Page 208: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.6 Particle Methods 211

Lemma 8.13. For any ϕ ∈ B(S) we have

E[(ρnt (ϕ)− ρt(ϕ))2 | Yt] =c1,ϕ(t)n

, (8.26)

where c1,ϕ(t) , E[(ϕ(Xt)Zt − ρt(ϕ))2 | Yt]. Moreover

E[(ρnt (ϕ)− ρt(ϕ))4 | Yt

]≤ c2,ϕ(t)

n2, (8.27)

where c2,ϕ(t) , 6E[(ϕ(Xt)Zt − ρt(ϕ))4 | Yt].

Proof. Observe that since the triples (vj , aj , Y ), j = 1, . . . , n are identicallydistributed and have the same distribution as the triple (X, Z, Y ), we havefor j = 1, . . . ,m,

E [ϕ(vj(t))aj(t) | Yt] = E[ϕ(Xt)Zt | Yt

]= ρt(ϕ).

In particularE [ρnt (ϕ) | Yt] = ρt(ϕ)

and the random variables ξϕj , j = 1, . . . , n defined by

ξϕj , ϕ (vj(t)) aj(t)− ρt(ϕ), j = 1, . . . , n,

have zero mean and the same distribution as ϕ(Xt)Zt− ρt(ϕ). It then followsthat

1n

n∑j=1

ξϕj = ρnt (ϕ)− ρt(ϕ).

Since the pairs (vi(t), ai(t)) and (vj(t), aj(t)) for i 6= j, conditional upon Ytare independent, it follows that the random variables ξϕj , j = 1, . . . , n aremutually independent conditional upon Yt. It follows immediately that

E[

(ρnt (ϕ)− ρt(ϕ))2∣∣∣Yt] =

1n2

E

n∑j=1

ξϕj

2∣∣∣∣∣∣∣Yt

=1n2

n∑j=1

E[(ξϕj )2

∣∣Yt]=

1n2

n∑j=1

E[

(ϕ(vj(t))aj(t)− ρt(ϕ))2∣∣∣Yt]

=c1,ϕ(t)n

.

Similarly

Page 209: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

212 8 Numerical Methods for Solving the Filtering Problem

E[

(ρnt (ϕ)− ρt(ϕ))4∣∣∣Yt] =

1n4

E

n∑j=1

ξϕj

4∣∣∣∣∣∣∣Yt

=1n4

n∑j=1

E[(ξϕj)4∣∣∣Yt]

+12n4

∑1≤j1<j2≤n

E[(ξϕj1)2∣∣∣Yt] E

[(ξϕj2)2∣∣∣Yt]

≤ E[(ϕ(Xt)Zt − ρt(ϕ))4]n3

+6n(n− 1)

n4(c1,ϕ(t))2

and the claim follows since, by Jensen’s inequality, we have

(c1,ϕ(t))2 ≤ E[(ϕ(Xt)Zt − ρt(ϕ))4].

ut

Remark 8.14. More generally one can prove that for any integer p and anyϕ ∈ B(S),

E[ (ρnt (ϕ)− ρt(ϕ))2p∣∣Yt] ≤ cp,ϕ(t)

np, (8.28)

wherecp,ϕ(t) = kpE

[(ϕ(Xt)Zt − ρt(ϕ))2p

∣∣∣Yt] , (8.29)

where kp is some universal constant.

Of course, Lemma 8.13 and Remark 8.14 are of little use if the randomvariables cp,ϕ(t) are not finite a.s. In the following we assume that they are.Under this condition the lemma implies that ρnt (ϕ) converges in expectationto ρt(ϕ) for any ϕ ∈ B(S) with the rate of convergence of order 1/

√n.

Exercise 8.15. Let cp,ϕ(t) be the Yt-adapted random variable defined in(8.29). Show that if E[Z2p

t ] < ∞, then E[cp,ϕ(t)] < ∞, hence the randomvariable cp,ϕ(t) is finite P-almost surely for any ϕ ∈ B(S). In particular, showthat if the function h is bounded, then cp,ϕ(t) < ∞, P-almost surely for anyϕ ∈ B(S).

The convergence of ρnt (ϕ) to ρt(ϕ) is valid for larger classes of function ϕ(not just bounded functions) provided that ϕ(Xt)Zt is P-integrable. Moreover,the existence of higher moments of ϕ(Xt)Zt ensures a control on the rate ofconvergence. However, in the following we restrict ourselves to just boundedtest functions.

Proposition 8.16. If E[Z2pt ] < ∞, then for any ϕ ∈ B(S), there exists a

finite Yt-adapted random variable cp,ϕ(t) such that for any ϕ ∈ B(S),

E[ (πnt (ϕ)− πt(ϕ))2p∣∣Yt] ≤ cp,ϕ(t)

np. (8.30)

Page 210: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.6 Particle Methods 213

Proof. Observe that

πnt (ϕ)− πt(ϕ) =ρnt (ϕ)ρnt (1)

1ρt(1)

(ρt(1)− ρnt (1)) +1

ρt(1)(ρnt (ϕ)− ρt(ϕ)) ,

hence, since |ρnt (ϕ)| ≤ ‖ϕ‖∞ρnt (1), we have

|πnt (ϕ)− πt(ϕ)| ≤ ‖ϕ‖∞ρt(1)

|ρnt (1)− ρt(1)|+ 1ρt(1)

|ρnt (ϕ)− ρt(ϕ)| (8.31)

and, by the triangle inequality,

E[ (πnt (ϕ)− πt(ϕ))2p∣∣Yt]1/2p ≤ ‖ϕ‖∞

ρt(1)E[(ρnt (1)− ρnt (1))2p

∣∣Yt]1/2p+

1ρt(1)

E[

(ρnt (ϕ)− ρt(ϕ))2p∣∣∣Yt]1/2p .

Remark 8.14 and Exercise 8.15 imply that there exists a finite Yt-adaptedrandom variable such that for any ϕ ∈ B(S) we have

E[ (ρnt (ϕ)− ρt(ϕ))2p∣∣Yt] ≤ cp,ϕ(t)

np;

hence (8.30) holds with cp,ϕ(t) being the Yt-adapted random variable

cp,ϕ(t) ,

(cp,ϕ(t)1/2p + ‖ϕ‖∞cp,1(t)1/2p

)2pρt(1)2p

. (8.32)

ut

Lemma 8.13 shows the convergence of ρnt (ϕ) to ρt(ϕ) when conditioned withrespect to the observation σ-algebra Yt. It also implies the convergence inexpectation,† and the almost sure convergence of ρnt to ρt.

Theorem 8.17. If E[Z2t ] <∞, then for any ϕ ∈ B(S) we have

E[|ρnt (ϕ)− ρt(ϕ)|] ≤ c1(t)√n‖ϕ‖∞,

where c1(t) ,√

E[Z2t ]. In particular e limn→∞ ρnt = ρt. Moreover, if E[Z2p

t ] <∞, for p ≥ 2 then for any ε ∈ (0, 1/2 − 1/(2p)) and ϕ ∈ B(S) there exists apositive random variable cε,p,ϕ(t) which is almost surely finite such that

|ρnt (ϕ)− ρt(ϕ)| ≤ cε,p,ϕ(t)nε

. (8.33)

In particular, ρnt converges to ρt, P-almost surely.

† Recall that ρnt → ρt in expectation if limn→∞ E [|ρn

t f − ρtf |] = 0 for all f ∈ Cb(S).See Section A.10 for the definition of convergence in expectation.

Page 211: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

214 8 Numerical Methods for Solving the Filtering Problem

Proof. From Lemma 8.13 we get, using Jensen’s inequality, that

E[|ρnt (ϕ)− ρt(ϕ)|] ≤√

E[(ρnt (ϕ)− ρt(ϕ))2] =

√E[c1,ϕ(t)]√n

,

hence the first claim is true since

E[c1,ϕ(t)] = E[(ϕ(Xt)Zt − ρt(ϕ))2]

= E[ϕ(Xt)2Z2

t − 2ρt(ϕ)Ztϕ(Xt) + ρt(ϕ)2]

= E[ϕ(Xt)2Z2

t − ρt(ϕ)2]

≤ ‖ϕ‖2∞E[Z2t ].

Similarly

E[(ρnt (ϕ)− ρt(ϕ))2p] ≤ E[cp,ϕ(t)]np

,

where cp,ϕ(t) is the random variable defined in (8.29), which implies (8.33)and the almost sure convergence of ρnt to ρ follows as a consequence of RemarkA.38 in the appendix. ut

Let us turn our attention to the convergence of πnt . The almost sure con-vergence of πnt to πt holds under the same conditions as the convergence ofρnt to ρt. However, the convergence in expectation of πnt to πt requires anadditional integrability condition on ρ−1

t (1).

Theorem 8.18. If E[Z2t ] < ∞ and E

[ρ−2t (1)

]< ∞, then for any ϕ ∈ B(S),

we have

E[|πnt (ϕ)− πt(ϕ)|] ≤ c1(t)√n‖ϕ‖, (8.34)

where c1(t) = 2√

E[Z2t ]E[ρ−2

t (1)]. In particular πnt converges to πt in expec-

tation. Moreover, if E[Z2pt ] <∞, for p ≥ 2 then for any ε ∈ (0, 1/2− 1/(2p))

there exists a positive random variable cε,p,ϕ(t) almost surely finite such that

|πnt (ϕ)− πt(ϕ)| ≤ cε,p,ϕ(t)nε

. (8.35)

In particular, πnt converges to πt, P-almost surely.

Proof. By inequality (8.31) and the Cauchy–Schwartz inequality we get that

E[|πnt (ϕ)− πt(ϕ)|] ≤ ‖ϕ‖√

E[ρ−2t (1)

]E [|ρnt (1)− ρt(1)|2]

+√

E[ρ−2t (1)

]E [|ρnt (ϕ)− ρt(ϕ)|2].

Page 212: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.6 Particle Methods 215

Moreover, for any ϕ ∈ B(S), from the proof of Theorem 8.17, it follows that

E[(ρnt (ϕ)− ρt(ϕ))2] ≤ 1n‖ϕ‖2E[Z2

t ];

hence the first claim is true.For the almost sure convergence result observe that inequalities (8.31) and

(8.33) imply that

|πnt (ϕ)− πt(ϕ)| ≤ ‖ϕ‖∞ρt(1)

|ρnt (1)− ρt(1)|+ 1ρt(1)

|ρnt (ϕ)− ρt(ϕ)|

≤ ‖ϕ‖∞ρt(1)

cε,p,1(t)nε

+1

ρt(1)cε,p,ϕ(t)nε

and the claim follows with

cε,p,ϕ(t) =‖ϕ‖∞cε,p,1(t) + cε,p,ϕ(t)

ρt(1).

ut

Exercise 8.19. Show that E[ρ−2t (1)] <∞ if the function h is bounded.

Exercise 8.20. Show that if E[Z2pt ] <∞, then there exists a positive constant

cp(t) such that for any ϕ ∈ B(S) we have

E[|ρnt (ϕ)− ρt(ϕ)|p] ≤ cp(t)np/2

‖ϕ‖p∞.

Similarly, show that if E[Z2pt ] < ∞ and E[ρ−2p

t (1)] < ∞, then for any ϕ ∈B(S), we have

E[|πnt (ϕ)− πt(ϕ)|p] ≤ cp(t)np/2

‖ϕ‖p∞.

Let M = ϕi, i ≥ 0, where ϕi ∈ Cb(S) be a countable convergencedetermining set such that ‖ϕi‖∞ ≤ 1 for any i ≥ 0 and let dM be the metricon M(S) (see Section A.10 for additional details):

dM :M(S)×M(S)→ [0,∞), d(µ, ν) =∞∑i=0

12i|µϕi − νϕi|.

Theorems 8.17 and 8.18 give the following corollary.

Corollary 8.21. If E[Z2t ] <∞, then

E[dM (ρnt , ρt)] ≤2√

E[Z2t ]

√n

. (8.36)

Similarly if E[Z2t ] <∞ and E

[ρ−2t (1)

]<∞, then for any ϕ ∈ B(S), we have

Page 213: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

216 8 Numerical Methods for Solving the Filtering Problem

E[dM (πnt , πt)] ≤4√

E[Z2t ]E

[ρ−2t (1)

]√n

. (8.37)

Moreover, if E[Z2pt ] < ∞, for p ≥ 2 then for any ε ∈ (0, 1/2 − 1/(2p)) there

exists a positive random variable cε(t) which is almost surely finite such that

dM (ρnt , ρt) ≤cε(t)nε

, dM (πnt , π) ≤ cε(t)nε

. (8.38)

Proof. From Theorem 8.17 we get, using the fact that ‖ϕi‖∞ ≤ 1, that

E[dM (ρnt , ρt)] ≤∞∑i=0

12i

E[|ρnt (ϕi)− ρt(ϕi)|]

≤ c1(t)√n

∞∑i=0

12i≤ 2c1(t)√

n,

which establishes (8.36). Inequality (8.37) follows by a similar argument. Bythe triangle inequality and Exercise 8.20 it follows that

E[dM (ρnt , ρt)p]1/p ≤

∞∑i=0

12i

E[|ρnt (ϕi)− ρt(ϕi)|p]1/p ≤cp(t)np/2

∞∑i=0

12i.

The first inequality in (8.38) then follows from Remark A.38 in the appendix.The second inequality in (8.38) follows in a similar manner. ut

Corollary 8.21 states that both ρnt converges to ρt, and πnt converges to πtin expectation with the rate 1/

√n. It also states that the corresponding rate

for the almost sure convergence is slightly lower than 1/√n.

The above analysis requires the existence of higher moments of the martin-gale Z. Of course, the question arises as to what happens if they do not existand we only know that Z is integrable. In this case πnt still converges almostsurely to πt for fixed observation paths s 7→ Ys as a consequence of the stronglaw of large numbers. To state this precisely, it is necessary to use an explicitdescription of the underlying probability space Ω as a product space, wherethe processes (X,Y ) live on one component and the processes vj , j = 1, 2, . . .live on another. The details and the ensuing analysis is cumbersome, so wedo not include the details. Moreover, in this case, the random measure πntwill not converge to the random measure πt (over the product space) and noconvergence rates may be available. In Chapter 10 we discuss the convergencefor fixed observations for the discrete time framework.

Theorems 8.17 and 8.18 and Corollary 8.21 show that the Monte Carlomethod will produce approximations for ρt, respectively πt, provided enoughparticles (independent realizations of the signal) are used. The number ofparticles depends upon the magnitude of the constants appearing in the upperbounds of the rates of convergence, which in turn depend on the magnitude

Page 214: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.7 Solutions to Exercises 217

of the higher moments of the exponential martingale Z. This is bad news,because these higher moments of the exponential martingale Z increase veryrapidly as functions of time.

The particle picture makes the reason for the deterioration in the accuracyof the approximations with time clearer. Each particle has a trajectory which isindependent of the signal trajectory, and its corresponding weight depends onhow close its trajectory is to the signal trajectory: the weight is the likelihoodof the trajectory given the observation.† Typically, most particles’ trajecto-ries diverge very quickly from the signal trajectory, with a few ‘lucky’ onesremaining close to the signal. Therefore the majority of the weights decreaseto zero, while a small minority become very large. As a result only the ‘lucky’particles will contribute significantly to the sums (8.24) and (8.25) giving theapproximations for ρt, respectively, πt. The convergence of the Monte Carlomethod is therefore very slow as a large number of particles is needed in orderto have a sufficient number of particles in the right area (with correspondinglylarge weights).

To solve this problem, a wealth of methods have been proposed. In filter-ing theory, the generic name for these methods is particle filters or sequentialMonte Carlo methods. These methods use a correction mechanism that cullsparticles with small weights and multiplies particles with large weights. Thecorrection procedure depends on the trajectory of the particle and the obser-vation data. This is effective as particles with small weights (i.e. particles withunlikely trajectories/positions) are not carried forward uselessly whereas themost probable regions of the signal state space are explored more thoroughly.The result is a cloud of particles, with those surviving to the current timeproviding an estimate for the conditional distribution of the signal.

In the following two chapters we study this class of methods in greaterdetail. In Chapter 9 we discuss such a particle method for the continuoustime framework together with corresponding convergence results. In Chapter10, we look at particle methods for solving the filtering problem in the discreteframework.

8.7 Solutions to Exercises

8.12 It is enough to show that the stochastic integrals

Ij =∫ t

0

h(vj(s))> dYs, j = 1, . . . , n

are mutually independent given Yt. This follows immediately from the factthat the random variables

Imj =m∑i=1

h(vj(it/m))>(Y(i+1)t/m − Yit/m), j = 1, . . . , n

† In Chapter 10 we make this statement precise for the discrete time framework.

Page 215: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

218 8 Numerical Methods for Solving the Filtering Problem

are mutually independent given Yt, hence by the bounded convergence theo-rem

E

n∏j=1

exp(iλjIj)

∣∣∣∣∣∣Yt = lim

k→∞E

n∏j=1

exp(iλjI

mkj

)∣∣∣∣∣∣Yt

= limk→∞

n∏j=1

E[exp(iλjI

mkj

)∣∣Yt]=

n∏j=1

E [ exp(iλjIj)| Yt] (8.39)

for any λj , j = 1, . . . , n. In (8.39), (Imkj )k>0 is a suitably chosen subsequenceof (Imj )m>0 so that Imkj converges to Ij almost surely.

8.15 From (8.29) and the inequality (a+ b)k ≤ 2k−1(ak + bk),

E[cp,ϕ] = kpE[(ϕ(Xt)Zt − ρt(ϕ)

)2p]

≤ 22p−1kpE[(ϕ(Xt)Zt)2p + (ρt(ϕ))2p

]≤ 22p−1kp‖ϕ‖2p∞

(E[Z2p

t ] + E[(ρt(1))2p]).

The first term is bounded by the assumption E[Z2p] <∞; for the second termuse the conditional form of Jensen

E[(ρt(1))2p

]= E

[(E[Zt | Yt])2p

]≤ E

[E[Z2p

t | Yt]]

= E[Z2pt ] <∞.

Therefore E[cp,ϕ] <∞, which implies that cp,ϕ <∞ P-a.s.For the second part, where h is bounded, use the explicit form

Z2pt = exp

(2p

m∑i=1

∫ t

0

hi dY is − pm∑i=1

∫ t

0

hi(Xs)2 ds

)≤ exp((2p2 − p)mt‖h‖2∞)Θt,

where Θ = Θt, t ≥ 0 is the exponential martingale

Θt , exp

(2p

m∑i=1

∫ t

0

hi dY is −(2p)2

2

m∑i=1

∫ t

0

hi(Xs)2 ds

).

The boundedness of h implies that Θ is a genuine martingale via Novikov’scondition (see Theorem B.34). Taking expectations, we see that E[Z2p

t ] isbounded by exp((2p2 − p)mt‖h‖2∞).

8.19 By Jensen’s inequality

Page 216: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

8.7 Solutions to Exercises 219

E[ρ−2t (1)] = E

[E[Zt | Yt

]−2]≤ E

[E[Z−2t | Yt

]]= E

[Z−2t

]and from the explicit form for Zt,

Z−2t = exp

(−2∫ t

0

h(Xs)> dYs +∫ t

0

‖h(Xs)‖2 ds)

≤ exp(3mt‖h‖2∞)Θt,

where Θ = Θt, t ≥ 0 is the exponential martingale

Θt , exp

(−2

m∑i=1

∫ t

0

hi dY is − 2m∑i=1

∫ t

0

hi(Xs)2 ds

).

The boundedness of h implies that Θ is a genuine martingale via Novikov’scondition (see Theorem B.34). Taking expectations, we see that E[ρ−2

t (1)] isbounded by exp(3mt‖h‖2∞).

8.20 By Jensen’s inequality and (8.28)

E [|ρnt (ϕ)− ρ(ϕ)|p] ≤√

E [(ρnt (ϕ)− ρ(ϕ))2p]

≤√

E[E [(ρnt (ϕ)− ρ(ϕ))2p | Yt]

]≤

√E[cp,ϕ(t)]

np/2.

From the computations in Exercise 8.15,

E[cp,ϕ(t)] ≤ Kp(t)‖ϕ‖2p∞,

whereKp(t) = 4pkpE

[Z2pt

],

thus

E [|ρnt (ϕ)− ρ(ϕ)|p] ≤√Kp(t)‖ϕ‖p∞np/2

.

Therefore the result follows with cp(t) =√Kp(t).

For the second part, from (8.31) and the inequality (a+b)p < 2p−1(ap+bp),

|πnt (ϕ)− π(ϕ)|p ≤ 2p−1 ‖ϕ‖p∞ρt(1)p

|ρnt (1))− ρt(1)|p +2p−1

ρt(1)p|ρnt (ϕ)− ρt(ϕ)|p ,

so by Cauchy–Schwartz

Page 217: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

220 8 Numerical Methods for Solving the Filtering Problem

E [|πnt (ϕ)− π(ϕ)|p] ≤ 2p−1‖ϕ‖p∞

√E [ρt(1)−2p] E

[cp,1np

]+ 2p−1

√E [ρt(1)−2p] E

[cp,ϕnp

]≤ 2p−1

√E [ρt(1)−2p]

np/2

(‖ϕ‖p∞

√E [cp,1] +

√E [cp,ϕ]

)

≤ 2p−1

√E [ρt(1)−2p]

np/2‖ϕ‖p∞2

√Kp(t),

so the result follows with

cp(t) = 2p√Kp(t)

√E [ρt(1)−2p].

Page 218: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9

A Continuous Time Particle Filter

9.1 Introduction

Throughout this chapter, we take the signal X to be the solution of (3.9);that is, X = (Xi)di=1 is the solution of the stochastic differential equation

dXt = f(Xt)dt+ σ(Xt) dVt, (9.1)

where f : Rd → Rd and σ : Rd → Rd×p are bounded and globally Lipschitzfunctions and V = (V j)pj=1 is a p-dimensional Brownian motion. As discussedin Section 3.2, the generator A associated with the process X is the second-order differential operator,

A =d∑i=1

f i∂

∂xi+

d∑i,j=1

aij∂2

∂xi∂xj,

where a = 12σσ

>. Since both f and a are bounded, the domain of the gen-erator A, D(A) is C2

b (Rd), the space of bounded twice continuously differ-entiable functions with bounded first and second partial derivatives; for anyϕ ∈ C2

b (Rd), the process Mϕ = Mϕt , t ≥ 0 defined by†

Mϕt , ϕ(Xt)− ϕ(X0)−

∫ t

0

Aϕ(Xs) ds,

=∫ t

0

((∇ϕ)>σ)(Xs) dVs, t ≥ 0

is an Ft-adapted martingale.The observation process is the solution of the evolution equation (3.5);

that is, Y is an m-dimensional stochastic process that satisfies

dYt = h(Xt) dt+ dWt,

† In the following (∇ϕ)> is the row vector (∂1ϕ, . . . , ∂dϕ).

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 c© Springer Science+Business Media, LLC 20099,

Page 219: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

222 9 A Continuous Time Particle Filter

where h = (hi)mi=1 : Rd → Rm is a bounded measurable function and Wis a standard m-dimensional Brownian motion independent of X. Since h isbounded, condition (3.25) is satisfied. Hence the process Z = Zt, t > 0defined by

Zt , exp(−∫ t

0

h(Xs)> dWs −12

∫ t

0

‖h(Xs)‖2 ds), t ≥ 0, (9.2)

is a genuine martingale and the probability P whose Radon–Nikodym deriva-tive with respect to P is given on Ft by Zt, viz

dPdP

∣∣∣∣∣Ft

= Zt,

is well defined (see Section 3.3 for details, also Theorem B.34 and CorollaryB.31). As was shown in Chapter 3, under P, the process Y is a Brownianmotion independent of X. Then the Kallianpur–Striebel formula (3.33) statesthat

πt(ϕ) =ρt(ϕ)ρt(1)

, P(P)-a.s.,

where ρt is the unnormalized conditional distribution of X, which satisfies

ρt(ϕ) = E[ϕ(Xt)Zt | Yt

]for any bounded Borel-measurable function ϕ and

Zt = exp(∫ t

0

h(Xs)> dYs −12

∫ t

0

‖h(Xs)‖2 ds). (9.3)

Similar to the Monte Carlo method which is described in Section 8.6,the particle filter presented below produces a measure-valued process πn =πnt , t ≥ 0 which represents the empirical measure of n (random) particleswith varying weights

πnt ,n∑j=1

anj (t)δvnj (t), t ≥ 0.

The difference between the Monte Carlo method described earlier and theparticle filter which we are about to describe is the presence of an additionalcorrection procedure, which is applied at regular intervals to the system of par-ticles. At the correction times, each particle is replaced by a random numberof particles (possibly zero). We say that the particles branch into a randomnumber of offspring. This is done in a consistent manner so that particleswith small weights have no offspring (i.e. are killed), and particles with largeweights are replaced by several offspring.

Page 220: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.2 The Approximating Particle System 223

The chapter is organised as follows. In the following section we describe indetail the particle filter and some of its properties. In Section 9.3 we reviewthe dual of the process ρ, which was introduced in Chapter 7, and give anumber of preliminary results. The convergence results are proved in Section9.4.

9.2 The Approximating Particle System

The particle system at time 0 consists of n particles all with equal weights1/n, and positions vnj (0), for j = 1, . . . , n. We choose the initial positions ofthe particles to be independent, identically distributed random variables withcommon distribution π0, for j, n ∈ N. Hence the approximating measure attime 0 is

πn0 =1n

n∑j=1

δvnj (0).

The time interval [0,∞) is partitioned into sub-intervals of equal length δ.During the time interval [iδ, (i + 1)δ), the particles all move with the samelaw as the signal X; that is, for t ∈ [iδ, (i+ 1)δ),

vnj (t) = vnj (iδ) +∫ t

f(vnj (s)) ds+∫ t

σ(vnj (s)) dV (j)s , j = 1, . . . , n, (9.4)

where (V (j))nj=1 are mutually independent Ft-adapted p-dimensional Brown-ian motions which are independent of Y , and independent of all other randomvariables in the system. The notation V (j) is used to make it clear that theseare not the components of each p-dimensional Brownian motion. The weightsanj (t) are of the form

anj (t) ,anj (t)∑nk=1 a

nk (t)

,

where

anj (t) = 1 +m∑k=1

∫ t

anj (s)hk(vnj (s)) dY ks ; (9.5)

in other words

anj (t) = exp(∫ t

h(vnj (s))> dYs −12

∫ t

‖h(vnj (s))‖2 ds). (9.6)

For t ∈ [iδ, (i+ 1)δ), define

πnt ,n∑j=1

anj (t)δvnj (t).

Page 221: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

224 9 A Continuous Time Particle Filter

At the end of the interval, each particle branches into a random number ofparticles. Each offspring particle initially inherits the spatial position of itsparent. After branching all the particles are reindexed (from 1 to n) and allof the (unnormalized) weights are reinitialised back to 1. When necessary,we use the notation j′ = 1, 2, . . . , n to denote the particle index prior to thebranching event, to distinguish it from the index after the branching eventwhich we denote by j = 1, 2, . . . , n. Let on,(i+1)δ

j′ be the number of offspringproduced by the j′th particle at time (i+ 1)δ in the n-particle approximatingsystem. Then o

n,(i+1)δj′ is F(i+1)δ-adapted and†

on,(i+1)δj′ ,

[na

n,(i+1)δj′

]with prob. 1− nan,(i+1)δ

j′ [na

n,(i+1)δj′

]+ 1 with prob. nan,(i+1)δ

j′ ,(9.7)

where an,(i+1)δj′ is the value of the particle’s weight immediately prior to the

branching; in other words,

an,(i+1)δj′ = anj′((i+ 1)δ−) = lim

t(i+1)δanj′(t). (9.8)

Hence if F(i+1)δ− is the σ-algebra of events up to time (i+ 1)δ, viz

F(i+1)δ− = σ(Fs, s < (i+ 1)δ),

then from (9.7),

E[on,(i+1)δj′ | F(i+1)δ−

]= na

n,(i+1)δj′ , (9.9)

and the conditional variance of the number of offspring is

E[(on,(i+1)δj′

)2 ∣∣F(i+1)δ−

]−(E[on,(i+1)δj′

∣∣F(i+1)δ−

])2

=na

n,(i+1)δj′

(1−

na

n,(i+1)δj′

).

Exercise 9.1. Let a > 0 be a positive constant and Aa be the set of allinteger-valued random variables ξ such that E[ξ] = a, viz

Aa , ξ : Ω → N | E[ξ] = a .

Let var(ξ) = E[ξ2]−a2 be the variance of an arbitrary random variable ξ ∈ Aa.Show that there exists a random variable ξmin ∈ Aa with minimal variance.That is, var(ξmin) ≤ var(ξ) for any ξ ∈ Aa. Moreover show that

ξmin =

[a] with prob. 1− a[a] + 1 with prob. a

(9.10)

† In the following, [x] is the largest integer smaller than x and x is the fractionalpart of x; that is, x = x− [x].

Page 222: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.2 The Approximating Particle System 225

and var(ξmin) = a(1−a). More generally show that E[ϕ(ξmin)] ≤ E[ϕ(ξ)]for any convex function ϕ : R→ R.

Remark 9.2. Following Exercise 9.1, we deduce that the random variableson,(i+1)δj′ defined by (9.7) have conditional minimal variance in the set of all

integer-valued random variables ξ such that E[ξ | F(i+1)δ−] = nan,(i+1)δj′ for

j = 1, . . . , n. This property is important as it is the variance of the randomvariables onj that influences the speed of convergence of the correspondingalgorithm.

9.2.1 The Branching Algorithm

We wish to control the branching process so that the number of particles inthe system remains constant at n; that is, we require that for each i,

n∑j′=1

on,(i+1)δj′ = n,

which implies that the random variables on,(i+1)δj′ , j′ = 1, . . . , n will be corre-

lated.Let un,(i+1)δ

j′ , j′ = 1, . . . , n − 1 be n − 1 mutually independent randomvariables, uniformly distributed on [0, 1], which are independent of all otherrandom variables in the system. To simplify notation in the statement ofthe algorithm, we omit the superscript (i + 1)δ in the notation for on,(i+1)δ

j′ ,

an,(i+1)δj′ and u

n,(i+1)δj′ . The following algorithm is then applied.

g := n h := nfor j′ := 1 to n− 1

ifnanj′

+g − nanj′

< 1 then

if unj′ < 1−(nanj′

/g

)then

onj′ :=[nanj′

]else

onj′ :=[nanj′

]+ (h− [g])

end ifelse

if unj′ < 1−(1−

nanj′

)/ (1− g) then

onj′ :=[nanj′

]+ 1

elseonj′ :=

[nanj′

]+ (h− [g])

end ifend ifg := g − nanj′h := h− onj′

end foronn := h

Page 223: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

226 9 A Continuous Time Particle Filter

Some of the properties of the random variables onj′ , j′ = 1, . . . , n are givenby the following proposition. Since there is no risk of confusion, in the state-ment and proof of this proposition, the primes on the indices are omitted andthus the variables are denoted onj nj=1.

Proposition 9.3. The random variables onj for j = 1, . . . , n have the follow-ing properties.

a.∑nj=1 o

nj = n.

b. For any j = 1, . . . , n we have E[onj ] = nanj .c. For any j = 1, . . . , n, onj has minimal variance, specifically

E[(onj − nanj )2] = nanj (1− nanj ).

d. For any k = 1, . . . , n − 1, the random variables on1:k =∑kj=1 o

nj , and

onk+1:n =∑nj=k+1 o

nj have variance

E[(on1:k − nan1:k)2] = nan1:k (1− nan1:k) .E[(onk+1:n − nank+1:n)2] =

nank+1:n

(1−

nank+1:n

),

where an1:k =∑kj=1 a

nj and ank+1:n =

∑nj=k+1 a

nj .

e. For 1 ≤ i < j ≤ n, the random variables oni and onj are negatively correlated.That is,

E[(oni − nani )(onj − nanj )] ≤ 0.

Proof. Property (a) follows immediately from the fact that onn is defined as

onn = n−n−1∑j′=1

onj′ .

For properties (b), (c) and (d), we proceed by induction. First define thesequence of σ-algebras

Uk = σ(unj , j = 1, . . . , k), k = 1, . . . , n− 1,

where unj , j = 1, . . . , n − 1 are the random variables used to construct theon′

j s. Then from the algorithm,

on1 = [nan1 ] + 1[0,nan1 ] (un1 ) ;

hence on1 has mean nan1 and minimal variance from Exercise 9.1. As a con-sequence of property (a), it also holds that on2:n has minimal variance. Theinduction step follows from the fact that h stores the number of offspringwhich are not yet assigned and g stores the sum of their corresponding means.In other words at the kth iteration for k ≥ 2, h = onk:n = n − on1:k−1 andg = nank:n = n − nan1:k−1. It is clear that nank +

nank+1:n

is either equal

Page 224: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.2 The Approximating Particle System 227

to nank:n or nank:n + 1. In the first of these cases, from the algorithm itfollows that for k ≥ 2,

onk = [nank ] + (onk:n − [nank:n]) 1[1−nank/nank:n,1] (unk ) , (9.11)

from which it follows from the fact that onk+1:n + onk = onk:n, that

onk+1:n =[nank+1:n

]+ (onk:n − [nank:n]) 1[0,1−nank/na

nk:n] (unk ) ; (9.12)

hence, using the fact that onk:n is Uk−1-measurable and unk is independent ofUk−1, we get from (9.11) that

E [(onk − nank ) | Uk−1] = −nank+ (onk:n − [nank:n])nanknank:n

=nanknank:n

(onk:n − nank:n)

= (onk:n − nank:n)nanknank:n

(9.13)

and by a similar calculation

E[(onk − nank )2 | Uk−1]

= (onk:n − nank:n)2 nanknank:n

+ (nank:n − nank) nank

+ 2 (onk:n − nank:n) (nank:n − nank)nanknank:n

. (9.14)

The identities (9.13), (9.14) and the corresponding identities derived from(9.12), viz:

E[ok+1:n − nank+1:n | Uk−1

]= (ok:n − nank:n)

(1− na

nk

nank:n

)and

E[(onk+1:n − nank+1:n)2 | Uk−1] = (onk:n − nank:n)2

(1− na

nk

nank:n

)+ 2 (onk:n − nank:n) nank

(1− na

nk

nank:n

)+ (nank:n − nank) nak

which give the induction step for properties (b), (c) and (d).For example, in the case of (b), taking expectation over (9.13) we see that

E [onk − nank ] =nanknank:n

E [onk:n − nank:n]

Page 225: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

228 9 A Continuous Time Particle Filter

and the right-hand side is zero by the inductive hypothesis. The case nank+nank+1:n

= nank:n+ 1 is treated in a similar manner. Finally, for the proof

of property (e) one shows first that for j > i,

E[(onj − nanj

)| Ui]

= ci:j(oni+1:n − nani+1:n

)ci:j = pj

j−2∏k=i

qk ≥ 0,

where we adopt the convention∏j−2k=i qk = 1 if i = j − 1, and where

pj =

nanj

/nanj:n

ifnanj

+nanj+1:n

=nanj:n

(1−

nanj

)/(1−

nanj:n

)ifnanj

+nanj+1:n

=nanj:n

+ 1

qk =

nank:n /

nank−1:n

ifnank−1

+ nank:n =

nank−1:n

(1− nank:n) /

(1−

nank−1:n

)otherwise.

Then, for j > i

E[(oni − nani )

(onj − nanj

)]= ci:jE

[(oni − nani )

(oni+1:n − nani+1:n

)]= −rici:j ,

where

ri =

nani

nani+1:n

if nani +

nani+1:n

= nani:n

(1− nani )(1−

nani+1:n

)if nani +

nani+1:n

= nani:n+ 1.

As ri > 0 and ci:j > 0, it follows that

E[(oni − nani )

(onj − nanj

)]< 0.

ut

Remark 9.4. Proposition 9.3 states that the algorithm presented above pro-duces an n-tuple of integer-valued random variables onj for j = 1, . . . , n withminimal variance, negatively correlated and whose sum is always n. Moreover,not only do the individual onj s have minimal variance, but also any sum of theform

∑kj=1 o

nj or

∑nj=k o

nj is an integer-valued random variable with minimal

variance for any k = 1, . . . , n. This additional property can be interpreted asa further restriction on the random perturbation introduced by the branchingcorrection.

Remark 9.5. Since the change of measure from P to P does not affect thedistribution of the random variables unj′ , for j′ = 1, . . . , n−1, all the propertiesstated in Proposition 9.3 hold true under P as well.

Page 226: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.2 The Approximating Particle System 229

Lemma 9.6. The process πn = πnt , t ≥ 0 is a probability measure-valuedprocess with cadlag paths. In particular, πn is continuous on any interval[iδ, (i+ 1)δ), i ≥ 0. Also, for any i > 0 we have

E[πniδ | Fiδ−] = limtiδ

πnt . (9.15)

The same identity holds true under the probability measure P. That is,

E[πniδ | Fiδ−] = limtiδ

πnt .

Proof. Since the pair processes (anj (t), vnj (t)), j = 1, 2, . . . , n are continuousin the interval [iδ, (i+ 1)δ) it follows that for any ϕ ∈ Cb(Rd) the function

πnt (ϕ) =n∑j=1

anj (t)ϕ(vnj (t))

is continuous for t ∈ (iδ, (i+ 1)δ). Hence πn is continuous with respect to theweak topology on M(Rd) for t ∈ (iδ, (i + 1)δ), for each i ≥ 0. By the sameargument, πn is right continuous and has left limits at iδ for any i > 0. Forany t ≥ 0,

πnt (1) =n∑j=1

anj (t) = 1,

therefore πn is probability measure-valued.The identity (9.15) follows by observing that at the time iδ the weights

are reset to one; thus for ϕ ∈ B(Rd), it follows that

πniδ(ϕ) =1n

n∑j′=1

on,iδj′ ϕ(vnj′(iδ))

and from (9.8) and (9.9), we have

E [πniδ(ϕ) | Fiδ−] =1n

n∑j′=1

E[on,iδj′ |Fiδ− ]ϕ(vnj′(iδ)

)=

n∑j′=1

an,iδj′ ϕ(vnj′(iδ)

)= limtiδ

n∑j′=1

anj′(t)ϕ(vnj′(t)

).

Finally, from Remark 9.5, since the change of measure from P to P does notaffect the distribution of the random variables unj′ , for j′ = 1, . . . , n − 1, itfollows that

E[on,iδj′ | F(i+1)δ−

]= nan,iδj′ ,

hence also E[πniδ | Fiδ−] = limtiδ πnt . ut

Page 227: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

230 9 A Continuous Time Particle Filter

If the system does not undergo any corrections, that is, δ = ∞, then theabove method is simply the Monte Carlo method described in Section 8.6. Theconvergence of the Monte Carlo approximation is very slow as the particleswander away from the signal’s trajectory forcing the unnormalised weightsto become infinitesimally small. Consequently the branching correction pro-cedure is introduced to cull the unlikely particles and multiply those situatedin the right areas.

However, the branching procedure introduces randomness into the systemas it replaces each weight with a random number of offspring. As such, thedistribution of the number of offspring has to be chosen with great care tominimise this effect. The random number of offspring should have minimalvariance. That is, as the mean number of offspring is pre-determined, weshould choose the onj′s to have the smallest possible variance amongst allinteger-valued random variables with the given mean nanj′ . It is easy to checkthat if the onj′s have the distribution described by (9.7) then they have minimalvariance.

In [66], Crisan and Lyons describe a generic way to construct n-tuples ofinteger-valued random variables with the minimal variance property and thetotal sum equal to n. This is done by means of an associated binary tree,hence the name Tree-based branching Algorithms (which are sometimes ab-breviated as TBBAs). The algorithm presented above is a specific example ofthe class described in [66]. To the authors’ knowledge only one other alter-native algorithm is known that produces n-tuples which satisfy the minimalvariance property. It was introduced by Whitley [268] and independently byCarpenter, Clifford and Fearnhead [39]. Further remarks on the branchingalgorithm can be found at the end of Chapter 10.

9.3 Preliminary Results

The following proposition gives us the evolution equation for the approximat-ing measure-valued process πn.

Proposition 9.7. The probability measure-valued process πn = πnt , t ≥ 0satisfies the following evolution equation

πnt (ϕ) = πn0 (ϕ) +∫ t

0

πns (Aϕ) ds+ Sn,ϕt +Mn,ϕ[t/δ]

+m∑k=1

∫ t

0

(πns (hkϕ)− πns (hk)πns (ϕ))(dY ks − πns (hk) ds

), (9.16)

for any ϕ ∈ C2b (Rd), where Sn,ϕ = Sn,ϕt , t ≥ 0 is the Ft-adapted martingale

Sn,ϕt =1n

∞∑i=0

n∑j=1

∫ (i+1)δ∧t

iδ∧tanj (s)(∇ϕ)>σ)(vnj (s)) dV (j)

s ,

Page 228: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.3 Preliminary Results 231

and Mn,ϕ = Mn,ϕk , k > 0 is the discrete parameter martingale

Mn,ϕk =

1n

k∑i=1

n∑j′=1

(onj′(iδ)− nan,iδj′ )ϕ(vnj′(iδ)), k > 0. (9.17)

Proof. Let Fkδ− = σ (Fs, 0 ≤ s < kδ) be the σ-algebra of events up to time kδ(the time of the kth-branching) and πnkδ− = limtkδ π

nt . For t ∈ [iδ, (i+ 1)δ),

we have† for ϕ ∈ C2b (Rd),

πnt (ϕ) = πn0 (ϕ) +Mn,ϕi +

i∑k=1

(πnkδ−(ϕ)− πn(k−1)δ(ϕ)

)+ (πnt (ϕ)− πniδ(ϕ)) , (9.18)

where Mn,ϕ =Mn,ϕj , j ≥ 0

is the process defined as

Mn,ϕj =

j∑k=1

(πnkδ(ϕ)− πnkδ−(ϕ)

), for j ≥ 0.

The martingale property of Mn,ϕ follows from (9.15) and the explicit expres-sion (9.17) from the fact that πnkδ = (1/n)

∑nj′=1 o

n,kδj′ δvn

j′ (kδ)and πnkδ− =∑n

j′=1 an,kδj′ δvn

j′ (kδ).

We now find an expression for the third and fourth terms on the right-handside of (9.18). From Ito’s formula using (9.4), (9.5) and the independence ofY and V , it follows that

d(anj (t)ϕ(vnj (t))

)= anj (t)Aϕ(vnj (t)) dt

+ anj (t)((∇ϕ)>σ)(vnj (t)) dV (j)t

+ anj (t)ϕ(vnj (t))h>(vnj (t)) dYt,

and

d

(n∑k=1

ank (t)

)=

n∑k=1

ank (t)h>(vnk (t)) dYt,

for any ϕ ∈ C2b (Rd). Hence for t ∈ [kδ, (k + 1)δ) and k = 0, 1, . . . , i, we have

† We use the standard convention∑0

k=1 = 0.

Page 229: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

232 9 A Continuous Time Particle Filter

πnt (ϕ)− πn(k−1)δ(ϕ) =∫ t

(k−1)δ

d

n∑j=1

anj ϕ(vnj (s))

(9.19)

=∫ t

(k−1)δ

n∑j=1

d

(anj (s)ϕ

(vnj (s)

)∑np=1 a

np (s)

)

=∫ t

(k−1)δ

πns (Aϕ) ds

+m∑r=1

∫ t

(k−1)δ

(πns (hrϕ)− πns (hr)πns (ϕ))

× ( dY rs − πns (hr) ds)

+n∑j=1

∫ t

(k−1)δ

anj (s)((∇ϕ)>σ)(vnj (s)) dV (j)s . (9.20)

Taking the limit as t kδ yields,

πnkδ−(ϕ)− πn(k−1)δ(ϕ) =∫ kδ

(k−1)δ

πns (Aϕ) ds

+n∑j=1

∫ kδ

(k−1)δ

anj (s)((∇ϕ)>σ)(vnj (s)) dV (j)s

+m∑r=1

∫ kδ

(k−1)δ

(πns (hrϕ)− πns (hr)πns (ϕ))

× (dY rs − πns (hr) ds). (9.21)

Finally, (9.18), (9.20) and (9.21) imply (9.16). ut

In the following we choose a fixed time horizon t > 0 and let Yt = Yts, s ∈[0, t] be the backward filtration

Yts = σ(Yt − Yr, r ∈ [s, t]).

Recall that Cmb (Rd) is the set of all bounded, continuous functions withbounded partial derivatives up to order m on which we define the norm

‖ϕ‖m,∞ =∑|α|≤m

supx∈Rd

|Dαϕ(x)| , ϕ ∈ Cmb (Rd),

where α = (α1, . . . , αd) is a multi-index and Dαϕ = (∂1)α1 · · · (∂d)α

d

ϕ. Alsorecall that Wm

p (Rd) is the set of all functions with generalized partial deriva-tives up to order m with both the function and all its partial derivatives beingp-integrable on which we define the Sobolev norm

Page 230: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.3 Preliminary Results 233

‖ϕ‖m,p =

∑|α|≤m

∫Rd|Dαϕ(x)|p dx

1/p

.

In the following we impose conditions under which the dual of the solutionof the Zakai equation exists (see Chapter 7 for details). We assume that thematrix-valued function a is uniformly strictly elliptic. We also assume thatthere exists an integer m > 2 and a positive constant p > max(d/(m− 2), 2)such that for all i, j = 1, . . . , d, aij ∈ Cm+2

b (Rd), fi ∈ Cm+1b (Rd) and for all

i = 1, . . . ,m we have hi ∈ Cm+1b (Rd). Under these conditions, for any bounded

ϕ ∈ Wmp (Rd) there exists a function-valued process ψt,ϕ = ψt,ϕs , s ∈ [0, t]

which is the dual of the measure-valued process ρ = ρs, s ∈ [0, t] (thesolution of the Zakai equation) in the sense of Theorem 7.22. That is, for anyϕ ∈Wm

p (Rd) ∩B(Rd), the process

s 7→ ρs(ψt,ϕs

), s ∈ [0, t]

is almost surely constant. We recall below the properties of the dual as de-scribed in Chapter 7.

1. For every x ∈ Rd, ψt,ϕs (x) is a real-valued process measurable with respectto the backward filtration Yt.

2. Almost surely, ψt,ϕ is jointly continuous over [0,∞) × Rd and is twicedifferentiable in the spatial variable. Both ψt,ϕs and its partial derivativesare continuous bounded functions.

3. ψt,ϕ is a solution of the following backward stochastic partial differentialequation which is identical to (7.30):

ψt,ϕs (x) = ϕ(x)−∫ t

s

Aψt,ϕp (x) dp

−∫ t

s

ψt,ϕp (x)h>(x) dYp, 0 ≤ s ≤ t, x ∈ Rd,

where∫ tsψt,ϕp h> dYp is a backward Ito integral.

4. There exists a constant c = c(p) independent of ϕ such that

E

[sups∈[0,t]

∥∥ψt,ϕs ∥∥p2,∞

]≤ c‖ϕ‖pm,p. (9.22)

As mentioned in Chapter 7, the dual ψt,ϕ can be defined for a larger classof the test functions ϕ than Wm

p (Rd), using the representation (7.33). We canrewrite (7.33) in the following form,

ψt,ϕs (x) = E[ϕ(v(t))ats(v, Y ) | Yt, v(s) = x

], (9.23)

for any ϕ ∈ B(Rd). In (9.23), v = v(s), s ∈ [0, t] is an Fs-adapted Markovprocess, independent of Y that satisfies the same stochastic differential equa-tion as the signal; that is,

Page 231: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

234 9 A Continuous Time Particle Filter

dv(t) = f(v(t)) dt+ σ(v(t)) dVt

and

ats(v, Y ) = exp(∫ t

s

h(v(r))> dYr −12

∫ t

s

‖h(v(r))‖2 dr).

Lemma 9.8. For s ∈ [0, t] and ϕ ∈ B(Rd), we have

ψt,ϕs (v(s)) = E[ϕ(v(t))ats(v, Y ) | Fs ∨ Yt

].

Proof. From (9.23) and the properties of the conditional expectation

ψt,ϕs (v(s)) = E[ϕ(v(t))ats(v, Y ) | Yt ∨ σ(v(s))

]and the claim follows by the Markov property of the process v and its inde-pendence from Yt. ut

Lemma 9.9. For any ϕ ∈ B(Rd) and any k < [t/δ], the real-valued process

s ∈ [kδ, (k + 1)δ ∧ t) 7→ ψt,ϕs (vnj (s))anj (s)

is an Fs ∨ Yt-adapted martingale. Moreover, if ϕ ∈ Wmp (Rd) ∩ B(Rd) where

m > 2 and (m− 2)p > d

ψt,ϕs (vnj (s))anj (s) = ψt,ϕkδ(vnj (kδ)

)+∫ s

anj (p)((∇ψt,ϕp )>σ)(vnj (p)

)dV (j)

p , (9.24)

for s ∈ [kδ, (k + 1)δ ∧ t) and j = 1, . . . , n.

Proof. For the first part of the proof we cannot simply use the fact that ψt,ϕ

is a (classical) solution of the backward stochastic partial differential equation(7.30) as the test function ϕ does not necessarily belong to Wm

p (Rd). However,from Lemma 9.8 it follows that

ψt,ϕs (vnj (s)) = E[ϕ(vnj (t)

)ats(v

nj , Y ) | Fs ∨ Yt

], (9.25)

where for j = 1, . . . , n, following (9.6),

ats(vnj , Y ) = exp

(∫ t

s

h(vnj (r)

)> dYr −12

∫ t

s

‖h(vnj (r))‖2 dr)

and vnj (s) is given by

vnj (s) = vnj (kδ)+∫ s

f(vnj (r)) dr+∫ s

σ(vnj (r)) dV (j)r , j = 1, . . . , n, (9.26)

which is taken as the definition for s ∈ [kδ, t]. Comparing this with (9.4) itis clear that if (k + 1)δ < t, then this vnj (s) may not agree with the previous

Page 232: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.3 Preliminary Results 235

definition on ((k + 1)δ, t]. Observe that ats(vnj , Y ) = anj (t)/anj (s) where anj (s)

is given for s ∈ [kδ, t] by

anj (s) = exp(∫ s

h(vnj (p))> dYp −12

∫ s

‖h(vnj (p))‖2 dp)

; (9.27)

since anj (s) is Fs-adapted it is also Fs ∨ Yt-adapted, thus

ψt,ϕs (vnj (s))anj (s) = E[ϕ(vnj (t)

)anj (t) | Fs ∨ Yt]. (9.28)

Since s 7→ E[ϕ(vnj (t)

)anj (t) | Fs ∨ Yt] is an Fs ∨ Yt-adapted martingale for

s ∈ [0, t], so is s 7→ ψt,ϕs (vnj (s))anj (s). This completes the proof of the first partof the lemma.

For the second part of the lemma, as ϕ ∈ Wmp (Rd), it is now possible to

use properties 1–4 of the dual process ψt,ϕ, in particular the fact that ψt,ϕ

is differentiable. The stochastic integral on the right-hand side of (9.24) iswell defined as the Brownian motion V (j) = V (j)

s , s ∈ [kδ, (k + 1)δ ∧ t) isFs ∨ Yt-adapted (V (j) is independent of Y ) and so is the integrand

s ∈ [kδ, (k + 1)δ ∧ t) 7→ anj (p)((∇ψt,ϕp )>σ)(vnj (p)

).

Moreover, the stochastic integral on the right-hand side of (9.24) is a genuinemartingale since its quadratic variation process Q = Qs, s ∈ [kδ, (k+1)δ∧t)satisfies the inequality

E[Qs] ≤ K2σ

∫ s

E[‖ψt,ϕp ‖21,∞

]E[(anj (p))2

]dp <∞. (9.29)

In (9.29) we used the fact that ‖ψt,ϕp ‖21,∞ and anj (p) are mutually independentand that σ is uniformly bounded by Kσ. We cannot prove (9.24) by applyingIto’s formula directly: ψt,ϕp is Ytp-measurable, whereas anj (p) is Fp-measurable.Instead, we use a density argument.

Since all terms appearing in (9.24) are measurable with respect to the σ-algebra Fkδ ∨Ytkδ ∨ (Vj)tkδ, where Ytkδ = σ(Yr −Ykδ, r ∈ [kδ, t]) and (Vj)tkδ =σ(V jr − V

jkδ r ∈ [kδ, t]), it suffices to prove that

E[χ(ψt,ϕs (vnj (s))anj (s)− ψt,ϕkδ

(vnj (kδ)

))]= E

∫ s

anj (p)((∇ψt,ϕp )>σ)(vnj (p)

)dV (j)

p

], (9.30)

where χ is any bounded Fkδ ∨Ytkδ ∨ (Vj)tkδ-measurable random variable. It issufficient to work with a much smaller class of bounded Fkδ ∨ Ytkδ ∨ (Vj)tkδ-measurable random variables. Let b : [kδ, t] → Rm and c : [kδ, t] → Rdbe bounded, Borel-measurable functions and let θb and θc be the following(bounded) processes

Page 233: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

236 9 A Continuous Time Particle Filter

θbr , exp(i

∫ r

b>p dYp +12

∫ r

‖bp‖2 dp), (9.31)

and

θcr , exp(i

∫ r

c>p dV (j)p +

12

∫ r

‖cp‖2 dp). (9.32)

Then it is sufficient to show that (9.30) holds true for χ of the form χ = ζθbtθct ,

for any choice of b in (9.31) and c in (9.32) and any bounded Fkδ-measurablerandom variable ζ (see Corollary B.40 for a justification of the above). Fors ∈ [kδ, (k + 1)δ ∧ t),

E[ψt,ϕs (vnj (s))anj (s)ζθbtθ

ct | Fkδ ∨ Yskδ ∨

(Vj)skδ

]= Ξs(vnj (s))anj (s)ζθbsθ

cs, (9.33)

where Ξ = Ξs(·), s ∈ [kδ, (k + 1)δ ∧ t] is given by

Ξs(·) , E[ψt,ϕs (·)θbs | Fkδ ∨ Yskδ ∨

(Vj)skδ

],

and

θbs ,θbtθbs

= exp(i

∫ t

s

b>p dYp +12

∫ t

s

‖bp‖2 dp).

Both ψt,ϕs and θbs are measurable with respect to the σ-algebra Yts, which isindependent of Fkδ ∨Yskδ ∨

(Vj)skδ

, hence Ξs(·) = E[ψt,ϕs (·)θbs]. As in the proofof Theorem 7.22 it follows that for any r ∈ Cmb ([0,∞),Rd) and any x ∈ Rd,

Ξs(x) = ϕ(x)−∫ t

s

AΞp(x) dp− i∫ t

s

h>(x)rpΞp(x) dp, 0 ≤ s ≤ t. (9.34)

Equivalently Ξ(·) = Ξs(·), s ∈ [0, t] is the unique solution of the parabolicPDE (4.14) with final time condition Ξt(·) = ϕ(·). From the Sobolev embed-ding theorem as a consequence of the condition (m− 2)p > d, it follows thatϕ has a modification on a set of null Lebesgue measure which is in Cb(Rd),therefore the solution to the PDE Ξ ∈ C1,2

b ([0, t]×Rd). From (9.33) it followsthat

E[(ψt,ϕs (vnj (s))anj (s)− ψt,ϕkδ

(vnj (kδ)

))χ]

= E[ζ(Ξs(vnj (s))anj (s)θbsθ

cs −Ξkδ

(vnj (kδ)

))]. (9.35)

As Ξ is the solution of a deterministic PDE with deterministic initial con-dition, it follows that Ξs(vnj (s)) is Fs-measurable. Thus as all the terms arenow measurable with respect to the same filtration, it is possible to applyIto’s rule and use the PDE (9.34) to obtain

Page 234: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.3 Preliminary Results 237

E[ζ(Ξs(vnj (s))anj (s)θbsθ

cs −Ξkδ

(vnj (kδ)

)anj (kδ)θbkδθ

ckδ

)]= E

∫ s

d(anj (p)Ξp(vnj (p))θbpθ

cp

)]= E

∫ s

(anj (p)θbpθ

cp

(AΞp(vnj (p)) + iΞp(vnj (p))h>(vnj (p))bp

+∂Ξp∂p

(vnj (p)))

+ i(∇Ξ)>σcpθbpθcp

)dp]

= E[iζ

∫ s

anj (p)(∇Ξ>σ)cpθbpθcp dp

]= E

[iζ

∫ s

anj (p)(∇Ξ>p σ

)(vnj (p))cpθbpθ

cp dp

]. (9.36)

A second similar application of Ito’s formula using (9.32) yields

E[ζθbtθ

ct

∫ s

anj (p)((∇ψt,ϕp )>σ)(vnj (p)) dV jp

∣∣∣∣ Fkδ ∨ Ytkδ]= ζθbt E

[∫ s

d(θct

∫ s

anj (p)((∇ψt,ϕp )>σ)(vnj (p)) dV jp

) ∣∣∣∣ Fkδ ∨ Ytkδ]= iζθbt E

[∫ s

anj (p)((∇ψt,ϕp )>σ

)(vnj (p))cpθcp dp

∣∣∣∣ Fkδ ∨ Ytkδ] . (9.37)

Use of Fubini’s theorem and the tower property of conditional expectationgives

E[ζ

∫ s

anj (p)((∇ψt,ϕp )>σ

)(vnj (p))cpθbtθ

cp dp

]=∫ s

E[ζanj (p)

(∇(ψt,ϕp )>σ

)(vnj (p))cpθbtθ

cp

]dp

=∫ s

E[E[ζanj (p)

(∇(ψt,ϕp )>σ

)(vnj (p))cpθbtθ

cp

∣∣ Fkδ ∨ Ypkδ ∨ (Vj)pkδ]]

dp

=∫ s

E[ζθcpθ

bpanj (p)cpE

[(∇(ψt,ϕp )>σ

)(vnj (p))θbp

∣∣∣ Fkδ ∨ Ypkδ ∨ (Vj)pkδ]]

dp

=∫ s

E[ζθcpθ

bpanj (p)cpE

[(∇(ψt,ϕp )>

)(vnj (p))θbp

]σ(vnj (p))

]dp

=∫ s

E[ζθcpθ

bpanj (p)cp∇E

[((ψt,ϕp )>

)(vnj (p))θbp

]σ(vnj (p))

]dp

= E[ζ

∫ s

anj (p)(∇Ξ>p σ

)(vnj (p))cpθbpθ

cp dp

].

Using this result and (9.37) it follows that

Page 235: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

238 9 A Continuous Time Particle Filter

E[ζθbtθ

ct

∫ s

anj (p)((∇ψt,ϕp )>σ

)(vnj (p)) dV jp

]= E

[iζ

∫ s

anj (p)(∇Ξ>p σ

)(vnj (p))cpθbpθ

cp dp

]. (9.38)

From (9.35), (9.36) and (9.38) we deduce (9.30) and hence the result of thelemma. ut

To show that ψt,ϕs is dual to ρs for arbitrary ϕ ∈ B(Rd), use the fact that(vnj (s), anj (s)) have the same law as (X, Z) and (9.28),

ρs(ψt,ϕs

)= E[Zsψt,ϕs (Xs) | Ys]= E[Zsψt,ϕs (Xs) | Yt]= E[ψt,ϕs (vnj (s))anj (s) | Yt]

= E[E[ϕ(vnj (t)

)anj (t) | Fs ∨ Yt

]| Yt]

= E[ϕ(vnj (t)

)anj (t) | Yt

]= E

[ϕ(Xt)Zt | Yt

]= ρt(ϕ).

Define the following Ft-adapted martingale ξn = ξnt , t ≥ 0 by

ξnt ,

[t/δ]∏i=1

1n

n∑j=1

an,iδj

1n

n∑j=1

anj (t)

.

Exercise 9.10. Prove that for any t ≥ 0 and p ≥ 1, there exist two constantsct,p1 and ct,p2 which depend only on maxk=1,...,m ‖hk‖0,∞ such that

supn≥0

sups∈[0,t]

E [(ξns )p] ≤ ct,p1 , (9.39)

andmax

j=1,...,nsupn≥0

sups∈[0,t]

E[(ξns a

nj (s)

)p] ≤ ct,p2 . (9.40)

We use the martingale ξnt to linearize πnt in order to make it easier toanalyze the convergence of πn. Let ρn = ρnt , t ≥ 0 be the measure-valuedprocess defined by

ρnt , ξnt πnt =

ξn[t/δ]δ

n

n∑j=1

anj (t)δvnj (t).

Exercise 9.11. Show that ρn = ρnt , t ≥ 0 is a measure-valued processwhich satisfies the following evolution equation

Page 236: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.3 Preliminary Results 239

ρnt (ϕ) = πn0 (ϕ) +∫ t

0

ρns (Aϕ)ds+ Sn,ϕt + Mn,ϕ[t/δ]

+m∑k=1

∫ t

0

ρns (hkϕ) dY ks , (9.41)

for any ϕ ∈ C2b (Rd). In (9.41), Sn,ϕ = Sn,ϕt , t ≥ 0 is an Ft-adapted mar-

tingale

Sn,ϕt =1n

∞∑i=0

n∑j=1

∫ (i+1)δ∧t

iδ∧tξniδa

nj (s)((∇ϕ)>σ)(vnj (s))dV js

and Mn,ϕ =Mn,ϕk , k > 0

is the discrete martingale

Mn,ϕk =

1n

k∑i=1

ξniδ

n∑j′=1

(onj′(iδ)− nan,iδj′ )ϕ(vnj′(iδ)), k > 0.

Proposition 9.12. For any ϕ ∈ B(Rd), the real-valued process ρn· (ψt,ϕ· ) =

ρns (ψt,ϕs ), s ∈ [0, t] is an Fs ∨ Yt-adapted martingale.

Proof. From Lemma 9.9 we deduce that for s ∈ [[t/δ]δ, t], we have

E[anj (t)ϕ(vnj (t)) | Fs ∨ Yt

]= anj (s)ψt,ϕs (vnj (s))

which implies, in particular that

E[an,kδj′ ψt,ϕkδ(vnj′(kδ)

)| Fs ∨ Yt] = anj′(s)ψ

t,ϕs (vnj′(s))

for any s ∈ [(k − 1)δ, kδ). Hence

E [ρnt (ϕ) | Fs ∨ Yt] =ξn[t/δ]δ

n

n∑j=1

E[anj (t)ϕ(vnj (t)

)| Fs ∨ Yt]

= ρns(ψt,ϕs

), for [t/δ]δ ≤ s ≤ t (9.42)

and, for s ∈ [(k − 1)δ, kδ),

E[ρnkδ−(ψt,ϕkδ−) | Fs ∨ Yt

]=ξn(k−1)δ

n

n∑j′=1

E[an,kδj′ ψt,ϕkδ (vnj′(kδ)) | Fs ∨ Yt

]= ρns (ψt,ϕs ). (9.43)

Finally

E[ρnkδ(ψt,ϕkδ

)| Fkδ− ∨ Yt] =

ξnkδn

n∑j′=1

an,kδj′∑nk′=1 a

n,kδk′ /n

ψt,ϕkδ (vnj′(kδ))

= ρnkδ−(ψt,ϕkδ−). (9.44)

The proposition now follows from (9.42), (9.43) and (9.44). ut

Page 237: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

240 9 A Continuous Time Particle Filter

Proposition 9.13. For any ϕ ∈ Wmp (Rd) ∩ B(Rd), the real-valued process

ρn· (ψt,ϕ· ) = ρns (ψt,ϕs ) , s ∈ [0, t] has the representation

ρnt (ϕ) = πn0 (ψt,ϕ0 ) + Sn,ϕt + Mn,ϕ[t/δ]. (9.45)

In (9.45), Sn,ϕ = Sn,ϕs , s ∈ [0, t] is the Fs ∨ Yt-adapted martingale

Sn,ϕs ,∞∑i=0

n∑j=1

ξniδn

∫ (i+1)δ∧s

iδ∧sanj (p)((∇ψt,ϕp )>σ)(vnj (p)) dV (j)

p

and Mn,ϕ = Mn,ϕk , k > 0 is the discrete martingale

Mn,ϕk ,

k∑i=1

ξniδn

n∑j=1

(onj (iδ)− nanj (iδ))ψt,ϕiδ (vnj (iδ)), k > 0.

Proof. As in (9.18), we have for t ∈ [iδ, (i+ 1)δ) that

ρnt (ϕ) = ρnt (ψt,ϕt )

= πn0 (ψt,ϕ0 ) + Mn,ϕi +

i∑k=1

(ρnkδ−(ψt,ϕkδ−)− ρn(k−1)δ(ψ

t,ϕ(k−1)δ)

)+ (ρnt (ψt,ϕt )− ρniδ(ψ

t,ϕiδ )), (9.46)

where Mn,ϕ = Mn,ϕi , i ≥ 0 is the process defined as (note that ψt,ϕkδ− = ψt,ϕkδ )

Mn,ϕi =

i∑k=1

(ρnkδ(ψt,ϕkδ )− ρnkδ−(ψt,ϕkδ−))

=i∑

k=1

ξnkδ(πnkδ(ψ

t,ϕkδ )− πnkδ−(ψt,ϕkδ ))

=1n

i∑k=1

ξnkδ

n∑j′=1

(on,kδj′ − nan,kδj′ )ψt,ϕkδ (vnj′(kδ)), for i ≥ 0. (9.47)

The random variables on,kδj are independent of Ytkδ since they are Fkδ-adapted.Then (9.9) implies

E[on,kδj′ | Fkδ− ∨ Ytkδ

]= E

[on,kδj′ | Fkδ−

]= nan,kδj′ ,

whence the martingale property of Mn,ϕ. Finally, from the representation(9.24) we deduce that for t ∈ [iδ, (i+ 1)δ),

Page 238: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.4 The Convergence Results 241

ρnt (ψt,ϕt ) =ξniδn

n∑j=1

anj (t)ψt,ϕt(vnj (t)

)=ξniδn

n∑j=1

ψt,ϕiδ(vnj (iδ)

)+ξniδn

n∑j=1

∫ t

anj (p)((∇ψt,ϕp )>σ)(vnj (p)

)dV (j)

p ,

hence

ρnt (ψt,ϕt )− ρniδ(ψt,ϕiδ ) =

ξniδn

n∑j=1

∫ t

anj (p)((∇ψt,ϕp )>σ)(vnj (p)

)dV (j)

p .

Similarly

ρnkδ−(ψt,ϕkδ−)− ρn(k−1)δ(ψt,ϕ(k−1)δ)

=ξn(k−1)δ

n

n∑j=1

∫ kδ

(k−1)δ

anj (p)((∇ψt,ϕp )>σ)(vnj (p)

)dV (j)

p ,

which completes the proof of the representation (9.45). ut

9.4 The Convergence Results

In this section we begin by showing that ρnt (ϕ) converges to ρt(ϕ) in Propo-sition 9.14 and that πnt (ϕ) converges to πt(ϕ) in Theorem 9.15 for anyϕ ∈ Cb(Rd). These results imply that ρnt converges to ρt and πnt convergesto πt as measure-valued random variables (Corollary 9.17). Proposition 9.14and Theorem 9.15 are then used to prove two stronger results, namely thatthe process ρn· (ϕ) converges to ρ·(ϕ) in Proposition 9.18 and that the processπn· (ϕ) converges to π·(ϕ) in Theorem 9.19 for any ϕ ∈ C2

b (Rd).† These implyin turn, by Corollary 9.20, that the measure-valued process ρn· converges to ρ·and that the probability measure-valued process πn· converges to π· Boundson the rates of convergence are also obtained.

Proposition 9.14. If the coefficients σ,f and h are bounded and Lipschitz,then for any T ≥ 0, there exists a constant cT3 independent of n such that forany ϕ ∈ Cb(Rd), we have

E[(ρnt (ϕ)− ρt(ϕ))2] ≤ cT3n‖ϕ‖20,∞, t ∈ [0, T ]. (9.48)

In particular, for all t ≥ 0, ρnt converges in expectation to ρt.

† Note the smaller class of test functions for which results 9.18 and 9.19 hold true.

Page 239: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

242 9 A Continuous Time Particle Filter

Proof. It suffices to prove (9.48) for any non-negative ϕ ∈ Cb(Rd). Obviously,we have

ρnt (ϕ)− ρt(ϕ) =(ρnt (ϕ)− ρn[t/δ]δ(ψ

t,ϕ[t/δ]δ)

)+

[t/δ]∑k=1

(ρnkδ(ψ

t,ϕkδ )− ρnkδ−(ψt,ϕkδ−)

)+

[t/δ]∑k=1

(ρnkδ−(ψt,ϕkδ−)− ρn(k−1)δ(ψ

t,ϕ(k−1)δ)

)+(πn0(ψt,ϕ0

)− π0

(ψt,ϕ0

)). (9.49)

We must bound each term on the right-hand side individually. For the firstterm, using the martingale property of ρn(ψt,ϕ) and the fact that the randomvariables vnj (t) for j = 1, 2, . . . , n are mutually independent conditional uponF[t/δ]δ ∨ Yt (since the generating Brownian motions V (j), for j = 1, 2, . . . , nare mutually independent), we have

E[(ρnt (ϕ)− ρn[t/δ]δ(ψ

t,ϕ[t/δ]δ))

2 | F[t/δ]δ ∨ Yt]

= E[(ρnt (ϕ)− E[ρnt (ϕ) | F[t/δ]δ ∨ Yt])2 | F[t/δ]δ ∨ Yt]

=(ξn[t/δ]δ)

2

n2E

n∑j=1

ϕ(vnj (t))anj (t)

2∣∣∣∣∣∣∣F[t/δ]δ ∨ Yt

(ξn[t/δ]δ)2

n2

n∑j=1

E[ϕ(vnj (t))anj (t) | F[t/δ]δ ∨ Yt

]2

≤(ξn[t/δ]δ)

2

n2‖ϕ‖20,∞

n∑j=1

E[anj (t)2 | F[t/δ]δ ∨ Yt]. (9.50)

By taking expectation on both sides of (9.50) and using (9.40) for p = 2, weobtain

E[(ρnt (ϕ)− ρn[t/δ]δ(ψ

t,ϕ[t/δ]δ)

)2]≤‖ϕ‖20,∞n2

n∑j=1

E[(ξn[t/δ]δ)2anj (t)2]

≤ ct,22

n‖ϕ‖20,∞. (9.51)

Similarly (although in this case we do not have the uniform bound on ψt,ϕkδwhich was used with ψt,ϕt ),

E[(ρnkδ−(ψt,ϕkδ−)− ρn(k−1)δ(ψ

t,ϕ(k−1)δ)

)2]

≤ 1n2

n∑j′=1

E[(ξn(k−1)δa

n,kδj′ )2ψt,ϕkδ (vnj′(kδ))

2]. (9.52)

Page 240: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.4 The Convergence Results 243

From (9.25) we deduce that

ψt,ϕkδ (vnj′(kδ)) = E[ϕ(vnj (t))atkδ(v

nj , Y ) | Fkδ ∨ Yt

];

hence by Jensen’s inequality

E[(ψt,ϕs (vnj′(kδ))

)p] ≤ E[E[ϕ(vnj (t))atkδ(v

nj , Y ) | Fkδ ∨ Yt

]p]= E

[(ϕ(vnj (t))atkδ(v

nj , Y )

)p].

Therefore

E[(ψt,ϕs (vnj′(kδ))

)p]≤ ‖ϕ‖p0,∞E

[exp

(∫ t

ph(vnj′(r)

)> dYr −12

∫ t

p2‖h(vnj′(r))‖2 dr)

× exp(p2 − p

2

∫ t

‖h(vnj′(r))‖2 dr)]

≤ exp(

12m(p2 − p)t max

k=1,...,m‖hk‖20,∞

)‖ϕ‖p0,∞. (9.53)

Using this upper bound with p = 4, the bound (9.40) and the Cauchy–Schwarzinequality on the right-hand side of (9.52),

E[(ρnkδ−(ψt,ϕkδ−)− ρn(k−1)δ(ψ

t,ϕ(k−1)δ)

)2]

≤√ct,42 exp

(3mt max

k=1,...,m‖hk‖20,∞

) ‖ϕ‖20,∞n

. (9.54)

For the second term on the right-hand side of (9.49), observe that

E[(ρnkδ(ψ

t,ϕkδ )− ρnkδ−(ψt,ϕkδ−)

)2 | Fkδ− ∨ Yt]=ξ2kδ

n2

n∑j′,l′=1

E[(on,kδj′ − nan,kδj′

)(on,kδl′ − nan,kδl′

)| Fkδ− ∨ Yt

]× ψt,ϕkδ (vnj′ (kδ))ψ

t,ϕkδ (vnl′(kδ)).

Since the test function ϕ was chosen to be non-negative, and the randomvariables on,kδj′ , j′ = 1, . . . , n are negatively correlated (see Proposition 9.3part e.) it follows that

E[(ρnkδ(ψ

t,ϕkδ )− ρnkδ−(ψt,ϕkδ−))2 | Fkδ− ∨ Yt

]≤ ξ2

n2

n∑j′=1

E[(on,kδj′ − nan,kδj′

)2

| Fkδ− ∨ Yt]ψt,ϕkδ (vnj′(kδ))

2

≤ ξ2kδ

n2

n∑j′=1

nan,kδj′

(1−

nan,kδj′

)ψt,ϕkδ (vnj′(kδ))

2.

Page 241: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

244 9 A Continuous Time Particle Filter

Finally using the inequality q(1 − q) ≤ 14 for q = nan,kδj′ and (9.53) with

p = 2, it follows that

E[(ρnkδ(ψ

t,ϕkδ )− ρnkδ−(ψt,ϕkδ−))2

]≤ 1

4nexp

(mt max

k=1,...,m‖hk‖20,∞

)‖ϕ‖20,∞. (9.55)

For the last term, note that ψt,ϕ0 is Yt-measurable, therefore using the mutualindependence of the initial points vnj (0), and the fact that

E[ψt,ϕ0 (vnj (0)) | Yt] = π0(ψt,ϕ0 ),

we obtain

E[(πn0 (ψt,ϕ0 )− π0(ψt,ϕ0 )

)2 | Yt]=

1n2

n∑j=1

E[(ψt,ϕ0 (vnj (0))

)2 | Yt]− (π0(ψt,ϕ0 ))2

≤ 1n2

n∑j=1

E[(ψt,ϕ0 (vnj (0))

)2 | Yt] .Hence using the result (9.53) with p = 2,

E[(πn0 (ψt,ϕ0 )− π0(ψt,ϕ0 )

)2] ≤ 1n2

n∑j=1

E[ψt,ϕ0 (vnj (0))2]

≤ 1n

exp(mt max

k=1,...,m‖hk‖20,∞

)‖ϕ‖20,∞. (9.56)

The bounds on individual terms (9.51), (9.54), (9.55) and (9.56) substitutedinto (9.49) yields the result (9.48). ut

Theorem 9.15. If the coefficients σ,f and h are bounded and Lipschitz, thenfor any T ≥ 0, there exists a constant cT4 independent of n such that for anyϕ ∈ Cb(Rd), we have

E [|πnt (ϕ)− πt(ϕ)|] ≤ cT4√n‖ϕ‖0,∞, t ∈ [0, T ]. (9.57)

In particular, for all t ≥ 0, πnt converges in expectation to πt.

Proof. Since πnt (ϕ)ρnt (1) = ξnt πnt (ϕ) = ρnt (ϕ)

πnt (ϕ)− πt(ϕ) = (ρnt (ϕ)− ρt(ϕ)) (ρt(1))−1

− πnt (ϕ) (ρnt (1)− ρt(1)) (ρt(1))−1.

Page 242: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.4 The Convergence Results 245

Define

mt ,

√E[(ρt(1))−2

].

Following Exercise 9.16 below, mt <∞, hence by Cauchy–Schwartz

E [|πnt (ϕ)− πt(ϕ)|] ≤ mt

√E[(ρnt (ϕ)− ρt(ϕ))2

]+mt‖ϕ‖0,∞

√E[(ρnt (1)− ρt(1))2

], (9.58)

and the result follows by applying Proposition 9.14 to the two expectationson the right-hand side of (9.58). ut

Exercise 9.16. Prove that E[supt∈[0,T ](ρt(1))−2] <∞ for any T ≥ 0.

Let M = ϕi, i ≥ 0 ∈ Cb(Rd) be a countable convergence determiningset such that ‖ϕi‖ ≤ 1 for any i ≥ 0 and dM be the metric on MF (Rd) (seeSection A.10 for additional details)

dM :MF (Rd)×MF (Rd)→ [0,∞), d(µ, ν) =∞∑i=0

|µϕi − νϕi|2i

.

Proposition 9.14 and Theorem 9.15 give the following corollary.

Corollary 9.17. If the coefficients σ,f and h are bounded and Lipschitz, then

supt∈[0,T ]

E[dM (ρnt , ρt)] ≤2√cT3√n, sup

t∈[0,T ]

E[dM (πnt , πt)] ≤2cT4√n. (9.59)

Thus ρnt converges to ρt in expectation and πnt converges to πt in expec-tation. In the following, we prove a stronger convergence result.

Proposition 9.18. If the coefficients σ,f and h are bounded and Lipschitz,then for any T ≥ 0, there exists a constant cT5 independent of n such that

E

[supt∈[0,T ]

(ρnt (ϕ)− ρt(ϕ))2

]≤ cT5

n‖ϕ‖22,∞ (9.60)

for any ϕ ∈ C2b (Rd).

Proof. Again, it suffices to prove (9.60) for any non-negative ϕ ∈ C2b (Rd).

Following Exercise 9.11 we have that

ρnt (ϕ)− ρt(ϕ) = (πn0 (ϕ)− π0(ϕ)) +∫ t

0

(ρns (Aϕ)− ρs(Aϕ)) ds+ Sn,ϕt

+ Mn,ϕ[t/δ] +

m∑k=1

∫ t

0

(ρns (hkϕ)− ρs(hkϕ)) dY ks , (9.61)

Page 243: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

246 9 A Continuous Time Particle Filter

where Sn,ϕ = Sn,ϕt , t ≥ 0 is the martingale

Sn,ϕt ,1n

∞∑i=0

n∑j=1

∫ (i+1)δ∧t

iδ∧tξniδa

nj (s)((∇ϕ)>σ)(vnj (s))dV (j)

s ,

and Mn,ϕ =Mn,ϕk , k > 0

is the discrete parameter martingale

Mn,ϕk ,

1n

k∑i=1

ξniδ

n∑j′=1

(onj′(iδ)− nan,iδj′ )ϕ(vnj′(iδ)), k > 0.

We show that each of the five terms on the right-hand side of (9.61) satisfies aninequality of the form (9.60). For the first term, using the mutual independenceof the initial locations of the particles vnj (0), we obtain

E[(πn0 (ϕ)− π0(ϕ))2

]=

1n

(π0(ϕ2)− π0(ϕ)2

)≤ 1n‖ϕ‖20,∞. (9.62)

For the second term, by Cauchy–Schwartz

E

[supt∈[0,T ]

(∫ t

0

(ρns (Aϕ)− ρs(Aϕ))ds)2]

≤ E

[supt∈[0,T ]

t

∫ t

0

(ρns (Aϕ)− ρs(Aϕ))2 ds

]

= E

[T

∫ T

0

(ρns (Aϕ)− ρs(Aϕ))2 ds

]. (9.63)

By Fubini’s theorem and (9.48), we obtain

E

[∫ T

0

(ρns (Aϕ)− ρs(Aϕ))2 ds

]≤ cT3 T

n‖Aϕ‖20,∞. (9.64)

From the boundedness of σ and f since there exists c6 = c6(‖σ‖0,∞, ‖f‖0,∞)such that

‖Aϕ‖20,∞ ≤ c6‖ϕ‖22,∞,

from (9.63) and (9.64) that

E

[supt∈[0,T ]

(∫ t

0

(ρns (Aϕ)− ρs(Aϕ)) ds)2]≤ cT3 c6T

2

n‖ϕ‖22,∞. (9.65)

For the third term, we use the Burkholder–Davis–Gundy inequality (Theo-rem B.36). If we denote by C the constant in the Burkholder–Davis–Gundyinequality applied to F (x) = x2, then

Page 244: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.4 The Convergence Results 247

E

[supt∈[0,T ]

(Sn,ϕt )2

]≤ CE

[⟨Sn,ϕ

⟩T

]=

C

n2

n∑j=1

∫ T

0

E[(ξn[s/δ]δa

nj (s))2((∇ϕ)>σσ>∇ϕ)(vnj (s))

]ds. (9.66)

From (9.40) and the fact that σ is bounded, we deduce that there exists aconstant cT7 such that

E[(ξn[s/δ]δanj (s))2((∇ϕ)>σσ>∇ϕ)(vnj (s))] ≤ cT7 ‖ϕ‖22,∞, (9.67)

for any s ∈ [0, T ]. From (9.66) and (9.67)

E

[supt∈[0,T ]

(Sn,ϕt )2

]≤ CcT7 T

n‖ϕ‖22,∞. (9.68)

For the fourth term on the right-hand side of (9.61), by Doob’s maximalinequality

E[

maxk=1,...,[T/δ]

(Mn,ϕk

)2] ≤ 4E[(Mn,ϕ

[T/δ]

)2]. (9.69)

Since ϕ is non-negative and the offspring numbers, onj′(iδ) for j′ = 1, . . . , n,are negatively correlated, from the orthogonality of martingale increments

E[(Mn,ϕ

[T/δ]

)2]

≤ 1n2

[T/δ]∑i=1

n∑j=1

E[(ξniδ)

2 nanj (iδ)

(1−

nanj (iδ)

) (ϕ(vnj (iδ)

))2]

≤‖ϕ‖20,∞

4n2

[T/δ]∑i=1

n∑j=1

E[(ξniδ)

2]. (9.70)

Then, from (9.39), (9.69) and (9.70) there exists a constant cT8 = cT,21 [T/δ]/4independent of n such that

E[

maxk=1,...,[T/δ]

(Mn,ϕk

)2] ≤ cT8n‖ϕ‖20,∞. (9.71)

To bound the last term, we use the Burkholder–Davis–Gundy inequality (The-orem B.36), Fubini’s theorem and the conclusion of Proposition 9.14 (vizequation (9.48)) to obtain

Page 245: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

248 9 A Continuous Time Particle Filter

E

[supt∈[0,T ]

(∫ t

0

(ρns (hkϕ)− ρs(hkϕ)) dY ks

)2]

≤ CE

[∫ T

0

(ρns (hkϕ)− ρs(hkϕ))2 ds

]

≤ C∫ T

0

E[(ρns (hkϕ)− ρs(hkϕ))2

]ds

≤ CcT3 T‖hk‖0,∞n

‖ϕ‖20,∞. (9.72)

The bounds (9.62), (9.65), (9.68), (9.71) and (9.72) together imply (9.60). ut

Theorem 9.19. If the coefficients σ,f and h are bounded and Lipschitz, thenfor any T ≥ 0, there exists a constant cT9 independent of n such that

E

[supt∈[0,T ]

|πnt (ϕ)− πt(ϕ)|

]≤ cT9√

n‖ϕ‖2,∞ (9.73)

for any ϕ ∈ C2b (Rd).

Proof. As in the proof of Theorem 9.15,

E

[supt∈[0,T ]

|πnt (ϕ)− πt(ϕ)|

]≤ mT

√√√√E

[supt∈[0,T ]

(ρnt (ϕ)− ρt(ϕ))2

]

+ mT ‖ϕ‖0,∞

√√√√E

[supt∈[0,T ]

(ρnt (1)− ρt(1))2

],

where, following Exercise 9.16,

mT ,

√√√√E

[supt∈[0,T ]

(ρt(1))−2

]<∞

and the result follows from Proposition 9.18. ut

Let M = ϕi, i ≥ 0 where each ϕi ∈ C2b (Rd) be a countable convergence

determining set such that ‖ϕi‖∞ ≤ 1 and ‖ϕ‖2,∞ ≤ 1 for any i ≥ 0 anddM be the corresponding metric on MF (Rd) as defined in Section A.10. Thefollowing corollary of Proposition 9.18 and Theorem 9.19 is then immediate.

Corollary 9.20. If the coefficients σ,f and h are bounded and Lipschitz, thenwe have

E

[supt∈[0,T ]

dM (ρnt , ρt)

]≤ 2

√cT5√n, E

[supt∈[0,T ]

dM (πnt , πt)

]≤ 2cT9√

n(9.74)

for any T ≥ 0.

Page 246: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.5 Other Results 249

9.5 Other Results

The particle filter described above merges the weighted approximation ap-proach, as presented in Kurtz and Xiong [171, 174] for a general class ofnon-linear stochastic partial differential equations (to which the Kushner–Stratonovich equation belongs) with the branching corrections approach in-troduced by Crisan and Lyons in [65]. The convergence of the resulting ap-proximation follows from Theorem 9.15 under fairly mild conditions on thecoefficients. The convergence results described above can be extended to thecorrelated noise framework. See Section 3.8 for a description of this frameworkand Crisan [61] for details of the proofs in this case. More refined convergenceresults require the use of the decomposition (9.61). For this we make use ofthe properties of the dual of ρ supplied by the theory of stochastic evolutionsystems (cf. Rozovskii [250]; see also Veretennikov [267] for a direct approachto establishing the dual property of ψt,ϕ).

The decomposition (9.61) is very important. It will lead to an exact rateof convergence, that is, to computing the limit

limn→∞

nE[(ρnt (ϕ)− ρt(ϕ))2

]and also to a central limit theorem (note that the three terms on the right-handside of (9.61) are mutually orthogonal). For this we need to understand thelimiting behaviour of the covariance matrix of the random variables onj , j =1, . . . , n. This has yet to be achieved.

In the last ten years we have witnessed a rapid development of the theory ofparticle approximations to the solution of non-linear filtering, and implicitly tosolving SPDEs similar to the filtering equations. The discrete time frameworkhas been extensively studied and a multitude of convergence and stabilityresults have been proved. A comprehensive description of these developmentsin the wider context of approximations of Feynman–Kac formulae can befound in Del Moral [216] and the references therein. See also Del Moral andJacod [217] for a result involving discrete observations but a continuous signal.

Results concerning particle approximations for the continuous time filter-ing problem are far fewer that their discrete counterparts. The developmentof particle filters for continuous time problems started in the mid-1990s. InCrisan and Lyons [64], the particle construction of a superprocess is extendedto the case of a branching measure-valued process in a random environment.When averaged, the particle system used in the construction is shown to con-verge to the solution of the Zakai equation. In Crisan et al. [63], the idea ofminimal variance branching is introduced (instead of fixed variance branch-ing) with the resulting particle system shown to converge to the solution ofthe Zakai equation. Finally, in Crisan and Lyons [65], a direct approximationof πt is produced by using a normalised branching approach. In Crisan etal. [62], an alternative approximation to the Kushner–Stratonovich equation(3.57) is given where the branching step is replaced by a correction procedure

Page 247: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

250 9 A Continuous Time Particle Filter

using multinomial resampling. The multinomial resampling procedure pro-duces conditionally independent approximate samples from the conditionaldistribution of the signal, thus facilitating the analysis of the correspondingalgorithms. It is, however, suboptimal. For a heuristic explanation, assumethat between two consecutive correction steps, the information we receive onthe signal is ‘bad’ (the signal-to-noise ratio is small). Consequently the cor-responding weights will all be (roughly) equal: that is, all the particles areequally likely. The correction procedure should leave the particles untouchedin this case as there is no reason to cull or multiply any of the particles. Thisis exactly what the minimal branching step does: each particle has exactlyone offspring. The multinomial resampling correction will not do this: someparticle will be resampled more than others thus introducing an unnecessaryrandom perturbation to the system. For theoretical results related to the sub-optimality of the multinomial resampling procedure, see e.g. Crisan and Lyons[66] and Chopin [51]. Even if one uses the minimal variance branching cor-rection, additional randomness is still introduced in the system, which canaffect the convergence rates (see Crisan [60]). It remains an open question asto when and how often should one use the correction procedure.

On a parallel approach, Del Moral and Miclo [218] produced a particle filterusing the pathwise approach of Davis [74]. The idea is to recast the equationsof non-linear filtering in a form in which no stochastic integration is required.Then one can apply Del Moral’s general method of approximating Feynman–Kac formulae. This approach is important as it emphasises the robustness ofthe particle filter, although it requires that the observation noise and signalnoise are independent. While it cannot be applied to the correlated noiseframework, it is nevertheless a very promising approach and we expect furtherresearch to show its full potential.

9.6 The Implementation of the Particle Approximationfor πt

In the following we give a brief description of the implementation of the parti-cle approximation analysed in this chapter. We start by choosing parametersn, δ and m. We use n particles and we apply the correction (branching) pro-cedure at times kδ, for i > 1, divide the inter branching intervals [(k−1)δ, kδ]into m subintervals of length δ/m and apply the Euler method to generatethe trajectories of the particles. The following is the initialization step.

Initialization

For j := 1, . . . , nSample vj(0) from π0.aj(0) := 1.

end for

Page 248: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.6 The Implementation of the Particle Approximation for πt 251

π0 := 1n

∑nj=1 δvj(0)

Assign value t := 0

The standard sampling procedure can be replaced by any alternative methodthat produces an approximation for π0. For example, a stratified samplingprocedure, if available, will produce a better approximation. In the specialcase where π0 is a Dirac measure concentrated at x0 ∈ Rd, the value x0 isassigned to all initial positions vj(0) of the particles. The following is the(two-step) iteration procedure.

Iteration [iδ to (i+ 1)δ]

1. Evolution of the particlesfor l := 0 to m− 1

for j := 1 to nGenerate the Gaussian random vector ∆V .vj(t+ δ/m) := vj(t) + f(vj(t))δ/m+ σ(vj(t))∆V

√δ/m.

bj(t+ δ/m) := h(vj(t))> (Yt+δ/m − Yt)− (δ/2m)‖h(vj(t))‖2aj(t+ δ/m) := aj(t) exp(bj(t+ δ/m))

end fort := t+ δ/mΣ(t) :=

∑nj=1 aj(t)

πnt := 1Σ(t)

∑nj=1 aj(t)δvj(t).

end for

In the above ∆V = (∆V1, ∆V2, . . . ,∆Vp)> is a p-dimensional random vectorwith independent identically distributed entries ∆Vi ∼ N(0, 1) for all i =1, . . . , p.

The Euler method used above can be replaced by any other weak approxi-mation method for the solution of the stochastic differential equation satisfiedby the signal (see for example Kloeden and Platen [151] for alternative ap-proximation methods). The choice of the parameters δ and m depends on thefrequency of the arrivals of the new observations Yt. We have assumed thatthe observation Yt is available for all time instants t which are integer multi-ples of δ/m. There are no theoretical results as to what is the right balancebetween the size of the intervals between corrections and the number of stepsused to approximate the law of the signal, in other words what is the optimalchoice of parameters δ and m.

2. Branching procedure

for j := 1 to naj(t) := aj(t)/Σ(t)

end forfor j′ := 1 to n

Page 249: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

252 9 A Continuous Time Particle Filter

Calculate the number of offspring onj′(t) for the j′th particlein the system of particles with weights/positions (aj(t), vj(t))using the algorithm described in Section 9.2.1.

end forWe have now n particles with positions

(v1(t), v1(t), . . . , v1(t)︸ ︷︷ ︸o1(t)

, v2(t), v2(t), . . . , v2(t)︸ ︷︷ ︸o2(t)

, . . .) (9.75)

Reindex the positions of the particles as v1(t), v2(t), . . . , vn(t).for j := 1, . . . , n

aj(t) := 1end for

The positions of the particles with no offspring will no longer appear amongthose described by the formula (9.75). Alternatives to the branching procedureare described in Section 10.5. For example, one can use the sampling withreplacement method. In this case Step 2 is replaced by the following.

2′. Resampling procedure

for j := 1 to naj(t) := aj(t)/Σ(t).

end forfor j := 1 to n

Pick vj(t) by sampling with replacement from the set of par-ticle positions (v1(t), v2(t), . . . , vn(t)) according to the proba-bility vector of normalized weights (a1(t), a2(t), . . . , an(t)).

end forReindex the positions of the particles as v1(t), v2(t), . . . , vn(t).for j := 1, . . . , n

aj(t) := 1end for

However, the resampling procedure generates a multinomial offspring distri-bution which is known to be suboptimal. In particular, it does not have theminimal variance property enjoyed by the offspring distribution produced bythe algorithm described in Section 9.2.1 (see Section 10.5 for details).

9.7 Solutions to Exercises

9.1 In the case where a is an integer it is immediate that taking ξmin = aachieves the minimal variance of zero, and by Jensen’s inequality for anyconvex function ϕ, for ξ ∈ Aa, E[ϕ(ξ)] ≥ ϕ(E(ξ)) = ϕ(a) = E[ϕ(ξmin)] thusE[ϕ(ξmin)] ≤ E[ϕ(ξ)] for any ξ ∈ Aa.

Page 250: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.7 Solutions to Exercises 253

For the more general case, let ξ ∈ Aa. Suppose that the law of ξ assignsnon-zero probability mass to two integers which are not adjacent. That is, wecan find k, l such that P(ξ = k) > 0 and P(ξ = l) > 0 and k + 1 ≤ l − 1.

We construct a new random variable ζ from ξ by moving some probabilitymass β > 0 from k to k+ 1 and some from l to l− 1. Let U ⊂ ω : ξ(ω) = kand D ⊂ ω : ξ(ω) = l, be such that P(U) = P(D) = β; then define

ζ , ξ + 1U − 1D.

Thus by direct computation, E[ζ] = a+ β − β, so ζ ∈ Aa; secondly

var(ζ) = E[ζ2]− a2 = E[ξ2] + 2β(1 + k − l)− a2

= var(ξ) + 2β(1 + k − l).

As we assumed that k + 1 ≤ l − 1, it follows that var(ζ) < var(ξ). Conse-quently the variance minimizing element of Aa can only have non-zero prob-ability mass on two adjacent negative integers, and then the condition on theexpectation ensures that this must be ξmin given by (9.10).

Now consider ϕ a convex function, we use the same argument

E[ϕ(ζ)] = E[ϕ(ξ)] + β (ϕ(k + 1)− ϕ(k) + ϕ(l − 1)− ϕ(l)) .

Now we use that fact that if ϕ is a convex function for any points a < b < c,since the graph of ϕ lies below the chord (a, ϕ(a))–(c, ϕ(c)),

ϕ(b) ≤ ϕ(a)c− bc− a

+ ϕ(c)b− ac− a

,

which implies thatϕ(b)− ϕ(a)

b− a≤ ϕ(c)− ϕ(b)

c− b.

If k+ 1 = l− 1 we can apply this result directly to see that ϕ(k+ 1)−ϕ(k) ≤ϕ(l) − ϕ(l − 1), otherwise we use the result twice, for k < k + 1 < l − 1 andfor k + 1 < l − 1 < l, to obtain

ϕ(k + 1)− ϕ(k) ≤ ϕ(l − 1)− ϕ(k + 1)k − l − 2

≤ ϕ(l)− ϕ(l − 1)

thusE[ϕ(ζ)] ≤ E[ϕ(ξ)].

This inequality will be strict unless ϕ is linear between k and l. If it is strict,then we can argue as before that E[ϕ(ζ)] < E[ϕ(ζ)]. It is therefore clear thatif we can find a non-adjacent pair of integers k and l, such that ϕ is notlinear between k and l then the random variable ξ cannot minimize E[ϕ(ξ)].Consequently, a ξ which minimizes E[ϕ(ξ)] can either assign strictly positivemass to a single pair of adjacent integers, or it can assign strictly positive

Page 251: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

254 9 A Continuous Time Particle Filter

probability to any number of integers, provided that they are all contained ina single interval of R where the function φ(x) is linear.

In the second case where ξ ∈ Aa only assigns non-negative probabilityto integers in an interval where ϕ is linear, it is immediate that E[ϕ(ξ)] =ϕ(E[ξ]) = ϕ(a), thus as a consequence of Jensen’s inequality such a ξ achievesthe minimum value of E[ϕ(ξ)] over ξ ∈ Aa. Since ξ ∈ Aa satisfies E[ξ] = a,the region where ϕ is linear must include the integers [a] and [a]+1, thereforewith ξmin defined by (9.10), E[ϕ(ξmin)] = ϕ(E[a]).

It therefore follows that in either case, the minimum value is uniquelyattained by ξmin unless ϕ is linear in which case E[ϕ(ξ)] is constant for anyξ ∈ Aa. E[ϕ(ξmin)] ≤ E[ϕ(ξ)] for any ξ ∈ Aa.

9.10 We have for t ∈ [kδ, (k + 1)δ]

(anj (t)

)p = exp(p

∫ t

h(vnj (s))> dYs −p

2

∫ t

‖h(vnj (s))‖2 ds)

= Mp(t) exp(p2 − p

2

∫ t

‖h(vnj (s))‖2 ds)

≤Mp(t) exp

(p2 − p

2

m∑i=1

‖hi‖2∞(t− kδ)

),

where Mp = Mp(t), t ∈ [kδ, (k + 1)δ] is the exponential martingale definedas

Mp(t) , exp(p

∫ t

h(vnj (s))> dYs −p2

2

∫ t

‖h(vnj (s))‖2 ds).

Hence

E[(anj (t)

)p | Fkδ] ≤ exp

(p2 − p

2

m∑i=1

‖hi‖2∞(t− kδ)

),

which, in turn, implies that

E

1n

n∑j=1

anj (t)

p∣∣∣∣∣∣Fkδ ≤ exp

(p2 − p

2

m∑i=1

‖hi‖2∞(t− kδ)

). (9.76)

Therefore

E[(ξnt )p | F[t/δ]δ

]=(ξn[t/δ]δ

)pE

1n

n∑j=1

anj (t)

p∣∣∣∣∣∣F[t/δ]δ

≤(ξn[t/δ]δ

)pexp

((p2 − p)(t− kδ)

2

m∑i=1

‖hi‖2∞

). (9.77)

Also from (9.76) one proves that

Page 252: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

9.7 Solutions to Exercises 255

E[(ξnkδ)

p |F(k−1)δ

]≤(ξn(k−1)δ

)pexp

(p2 − p

2

m∑i=1

‖hi‖2∞δ

)hence, by induction,

E[(ξnkδ)p] ≤ exp

(p2 − p

2

m∑i=1

‖hi‖2∞kδ

). (9.78)

Finally from (9.76), (9.77) and (9.78) we get (9.39). The bound (9.40) followsin a similar manner.

9.11 We follow the proof of Proposition 9.7 Let Fkδ− = σ(Fs, 0 ≤ s < kδ)be the σ-algebra of events up to time kδ (the time of the kth-branching) andρnkδ− = limtkδ ρ

nt . For t ∈ [iδ, (i+ 1)δ), we have†

ρnt (ϕ) = πn0 (ϕ) + Mn,ϕi +

i∑k=1

(ρnkδ−(ϕ)− ρn(k−1)δ(ϕ))

+ (ρnt (ϕ)− ρniδ(ϕ)) ,

where Mn,ϕ =Mn,ϕk , k > 0

is the martingale

Mn,ϕi =

i∑k=1

(ρnkδ(ϕ)− ρnkδ−(ϕ)

)=

1n

i∑k=1

ξniδ

n∑j′=1

(onj′(iδ)− nan,iδj′ )ϕ(vnj′(iδ)), for i ≥ 0.

Next, by Ito’s formula, from (9.4) and (9.5), we get that

danj (t)ϕ(vnj (t)

)= anj (t)Aϕ(vnj (t)) dt

+ anj (t)((∇ϕ)>σ)(vnj (t)) dVt

+ anj (t)ϕ(vnj (t))h(vnj (t))> dYt

for ϕ ∈ C2b (Rd). Hence for t ∈ [kδ, (k + 1)δ), for k = 0, 1, . . . , i, we have

ρnt (ϕ)− ρnkδ(ϕ) =∫ t

ξnkδ

n∑j=1

danj (s)ϕ(vnj (s))

=∫ t

ρns (Aϕ) ds

+1n

n∑j=1

∫ t

ξnkδanj (s)((∇ϕ)>σ)(vnj (s)) dV js

+m∑r=1

∫ t

ρns (hrϕ) dY rs .

† We use the standard convention∑0

k=1 = 0.

Page 253: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

256 9 A Continuous Time Particle Filter

Similarly

ρnkδ−(ϕ)− ρn(k−1)δ(ϕ) =∫ kδ

(k−1)δ

ρns (Aϕ) ds

+1n

n∑j=1

∫ kδ

(k−1)δ

ξnkδanj (s)((∇ϕ)>σ)(vnj (s)) dV js

+m∑r=1

∫ kδ

(k−1)δ

ρns (hrϕ) dY rs .

9.16 Following Lemma 3.29, the process t 7→ ρt(1) has the explicit represen-tation (3.55). That is,

ρt(1) = exp(∫ t

0

πs(h>) dYs −12

∫ t

0

πs(h>)πs(h) ds).

As in Exercise 9.10 with p = −2, for t ∈ [0, T ],

ρt(1)−2 ≤ exp(3mt‖h‖2∞

)Mt,

where M = Mt, t ∈ [0, T ] is the exponential martingale defined as

Mt , exp(−2∫ t

0

πs(h>) dYs − 2∫ t

0

πs(h>)πs(h) ds).

Using an argument similar to that used in the solution of Exercise 3.10 basedon the Gronwall inequality and the Burkholder–Davis–Gundy inequality (seeTheorem B.36 in the appendix), one shows that

E

[supt∈[0,T ]

Mt

]<∞;

hence the claim.

Page 254: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10

Particle Filters in Discrete Time

The purpose of this chapter is to present a rigorous mathematical treatmentof the convergence of particle filters in the (simpler) framework where boththe signal X and the observation Y are discrete time processes. This restric-tion means that this chapter does not use stochastic calculus. The chapteris organized as follows. In the following section we describe the discrete timeframework. In Section 10.2 we deduce the recurrence formula for the condi-tional distribution of the signal in discrete time. In Section 10.3 we deducenecessary and sufficient conditions for sequences of (random) measures to con-verge to the conditional distribution of the signal. In Section 10.4 we describea generic class of particle filters which are shown to converge in the followingsection.

10.1 The Framework

Let the signal X = Xt, t ∈ N be a stochastic process defined on the prob-ability space (Ω,F ,P) with values in Rd. Let FXt be the filtration generatedby the process; that is,

FXt , σ(Xs, s ∈ [0, t]).

We assume that X is a Markov chain. That is, for all t ∈ N and A ∈ B(Rd),

P(Xt+1 ∈ A | FXt

)= P (Xt+1 ∈ A | Xt) . (10.1)

The transition kernel of the Markov chain X is the function Kt(·, ·) definedon Rd × B(Rd) such that, for all t ∈ N and x ∈ Rd,

Kt(x,A) = P(Xt+1 ∈ A | Xt = x). (10.2)

The transition kernel Kt is required to have the following properties.

i. Kt(x, ·) is a probability measure on (Rd,B(Rd)), for all t ∈ N and x ∈ Rd.

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 10, c© Springer Science+Business Media, LLC 2009

Page 255: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

258 10 Particle Filters in Discrete Time

ii. Kt(·, A) ∈ B(Rd), for all t ∈ N and A ∈ B(Rd).

The distribution of X is uniquely determined by its initial distributionand its transition kernel (see Theorem A.11 for details of how a stochasticprocess may be constructed from its transition kernels). Let us denote by qtthe distribution of the random variable Xt,

qt(A) , P(Xt ∈ A).

Then, from (10.2), it follows that qt satisfies the recurrence formula

qt+1 = Ktqt, t ≥ 0,

where Ktqt is the measure defined by

(Ktqt)(A) ,∫

RdKt(x,A)qt(dx). (10.3)

Hence, by induction it follows that

qt = Kt−1 . . .K1K0q0, t > 0.

Exercise 10.1. For arbitrary ϕ ∈ B(Rd) and t ≥ 0, define Ktϕ as

Ktϕ(x) =∫

Rdϕ(y)Kt(x, dy).

i. Prove that Ktϕ ∈ B(Rd) for any t ≥ 0.ii. Prove that Ktqt is a probability measure for any t ≥ 0.iii. Prove that, for any ϕ ∈ B(Rd) and t > 0, we have

Ktqt(ϕ) = qt(Ktϕ),

hence in generalqt(ϕ) = q0(ϕt), t > 0,

where ϕt = K0K1 . . .Kt−1ϕ ∈ B(Rd).

Let the observation process Y = Yt, t ∈ N be an Rm-valued stochasticprocess defined as follows

Yt , h(t,Xt) +Wt, t > 0, (10.4)

and Y0 = 0. In (10.4), h : N × Rd → Rm is a Borel-measurable function andfor all t ∈ N, Wt : Ω → Rm are mutually independent random vectors withlaws absolutely continuous with respect to the Lebesgue measure λ on Rm.We denote by g(t, ·) the density of Wt with respect to λ and we further assumethat g(t, ·) ∈ B(Rd) and is a strictly positive function.

The filtering problem consists of computing the conditional distribution ofthe signal given the σ-algebra generated by the observation process from time

Page 256: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.2 The Recurrence Formula for πt 259

0 up to the current time i.e. computing the (random) probability measure πt,where

πt(A) , P(Xt ∈ A | σ(Y0:t)), (10.5)πtf = E [f(Xt) | σ(Y0:t)]

for all A ∈ B(Rd) and f ∈ B(Rd), where Y0:t is the random vector Y0:t ,(Y0, Y1, . . . , Yt).† For arbitrary y0:t , (y0, y1, . . . , yt) ∈ (Rm)t+1, let πy0:tt bethe (non-random) probability measure defined as

πy0:tt (A) , P (Xt ∈ A | Y0:t = y0:t) , (10.6)πy0:tt f = E [f(Xt) | Y0:t = y0:t]

for all A ∈ B(Rd) and f ∈ B(Rd). Then πt = πY0:tt . While πt is a random

probability measure, πy0:tt is a deterministic probability measure. We also in-troduce pt and p

y0:t−1t , t > 0 the predicted conditional probability measures

defined by

py0:t−1t (A) , P (Xt ∈ A | Y0:t−1 = y0:t−1) ,py0:t−1t f = E [f(Xt) | Y0:t−1 = y0:t−1] .

Again pt = pY0:t−1t .

In the statistics and engineering literature the probability qt is commonlycalled the prior distribution of the signal Xt, whilst πt is called the (Bayesian)posterior distribution.

10.2 The Recurrence Formula for πt

The following lemma gives the density of the random vector Ys:t = (Ys, . . . , Yt)for arbitrary s, t ∈ N, s ≤ t.

Lemma 10.2. Let PYs:t ∈ P((Rm)t−s+1) be the probability distribution of Ys:tand λ be the Lebesgue measure on ((Rm)t−s+1,B((Rm)t−s+1)). Then, for all0 < s ≤ t <∞, PYs:t is absolutely continuous with respect to λ and its Radon–Nikodym derivative is

dPYs:tdλ

(ys:t) = Υ (ys:t) ,∫

(Rd)t−s+1

t∏i=s

gi(yi − h(i, xi))PXs:t(dxs:t),

where PXs:t ∈ P((Rd)t−s+1) is the probability distribution of the random vectorXs:t = (Xs, . . . , Xt).

† Y0:t, t ∈ N is the path process associated with the observation process Y =Yt, t ∈ N. That is, Y0:t, t ∈ N records the entire history of Y up to time t,not just its current value.

Page 257: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

260 10 Particle Filters in Discrete Time

Proof. Let Cs:t = Cs×· · ·×Ct, where Cr are arbitrary Borel sets, Cr ∈ B(Rm)for all s ≤ r ≤ t. We need to prove that

PYs:t(Cs:t) = P (Ys:t ∈ Cs:t) =∫Cs:t

Υ (ys:t)dys . . . dyt. (10.7)

Using the properties of the conditional probability,

P (Ys:t ∈ Cs:t) =∫

(Rd)t−s+1P (Ys:t ∈ Cs:t | Xs:t = xs:t) PXs:t (dxs:t) . (10.8)

Since (Xs, . . . , Xt) is independent of (Ws, . . . ,Wt), from (10.4) it follows that

P (Ys:t ∈ Cs:t | Xs:t = xs:t) = E

[t∏i=s

1Ci (h(i,Xi) +Wi) | Xs,t = xs:t

]

= E

[t∏i=s

1Ci (h(i, xi) +Wi)

],

thus by the mutual independence of Ws, . . . , Wt,

P (Ys:t ∈ Cs:t | Xs:t = xs:t) =t∏i=s

E [1Ci (h(i, xi) +Wi)]

=t∏i=s

∫Ci

gi(yi − h(i, xi)) dyi. (10.9)

By combining (10.8) and (10.9) and applying Fubini’s theorem, we obtain(10.7). ut

Remark 10.3. A special case of (10.9) gives that

P (Yt ∈ dyt | Xt = xt) = gt(yt − h(t, xt)) dyt,

which explains why the function gytt : Rd → R defined by

gytt (x) = gt(yt − h(t, x)), x ∈ Rd (10.10)

is commonly referred to as the likelihood function.

Since gi for i = s, . . . , t are strictly positive, the density of the randomvector (Ys, . . . , Yt) is also strictly positive. This condition can be relaxed (i.e. girequired to be non-negative), however, the relaxation requires a more involvedtheoretical treatment of the particle filter.

The recurrence formula for πt involves two operations defined on P(Rd):a transformation via the transition kernel Kt and a projective product asso-ciated with the likelihood function gytt defined as follows.

Page 258: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.2 The Recurrence Formula for πt 261

Definition 10.4. Let p ∈ P(Rd) be a probability measure, and let ϕ ∈ B(Rd)be a non-negative function such that p(ϕ) > 0. The projective product ϕ ∗ pis the (set) function ϕ ∗ p : B(Rd)→ R defined by

ϕ ∗ p(A) ,

∫A

ϕ(x)p(dx)

p(ϕ)

for any A ∈ B(Rd).

In the above definition, recall that

p(ϕ) =∫

Rdϕ(x)p(dx).

Exercise 10.5. Prove that ϕ ∗ p is a probability measure on B(Rd).

The projective product ϕ ∗ p is a probability measure which is absolutelycontinuous with respect to p, whose Radon–Nikodym derivative with respectto p is proportional to ϕ, viz:

d(ϕ ∗ p)dp

= cϕ,

where c is the normalizing constant, c = 1/p(ϕ).The following result gives the recurrence formula for the conditional prob-

ability of the signal. The prior and the posterior distributions coincide at time0, π0 = q0, since Y0 = 0 (i.e. no observations are available at time 0).

Proposition 10.6. For any fixed path (y0, y1, . . . , yt, . . .) the sequence of(non-random) probability measures (πy0:tt )t≥0 satisfies the following recurrencerelation

πy0:tt = gytt ∗Kt−1πy0:t−1t−1 , t > 0. (10.11)

The recurrence formula (10.11) holds PY0:t-almost surely.† Equivalently, theconditional distribution of the signal satisfies the following recurrence relation

πt = gYtt ∗Kt−1πt−1, t > 0, (10.12)

and the recurrence is satisfied P-almost surely.

Proof. For all f ∈ B(Rd), using the Markov property of X and the definitionof the transition kernel K,

E[f(Xt) | FXt−1

]= E [f(Xt) | Xt−1] = Kt−1f(Xt−1).

† Equivalently, formula (10.11) holds true λ-almost surely where λ is the Lebesguemeasure on (Rm)t+1.

Page 259: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

262 10 Particle Filters in Discrete Time

Since W0:t−1 is independent of X0:t, from property (f) of conditional expec-tation,†

E[f(Xt) | FXt−1 ∨ σ(W0:t−1)

]= E

[f(Xt) | FXt−1

],

hence, using property (d) of conditional expectation

ptf = E [f(Xt) | Y0:t−1]

= E[E[f(Xt) | FXt−1 ∨ σ(W0:t−1)

]| σ(Y0:t−1)

]= E [Kt−1f(Xt−1) | σ(Y0:t−1)]= πt−1(Kt−1f),

which implies that pt = Kt−1πt−1 (as in Exercise 10.1 part (iii) or equivalentlypy0:t−1t = Kt−1π

y0:t−1t−1 .

Next we prove that πy0:tt = gytt ∗ py0:t−1t . Let C0:t = C0 × · · · × Ct where

Cr ∈ B(Rm) for r = 0, 1, . . . , t. We need to prove that for any A ∈ B(Rd),∫C0:t

πy0:tt (A) PY0:t(dy0:t) =∫C0:t

gytt ∗ py0:t−1t (A) PY0:t(dy0:t). (10.13)

By (A.2), the left-hand side of (10.13) is equal to P(Xt ∈ A ∩ Y0:t ∈C0:t). Since σ(X0:t,W0:t−1) ⊃ σ(Xt, Y0:t−1), from property (f) of conditionalexpectation

P (Yt ∈ Ct | Xt, Y0:t−1) = E (P (Yt ∈ Ct | X0:t,W0:t−1) | Xt, Y0:t−1) (10.14)

and using property (d) of conditional expectations and (10.9)

P (Yt ∈ Ct | X0:t,W0:t−1) = P (Yt ∈ Ct | X0:t)

= P(Y0:t ∈ (Rm)t × Ct | X0:t

)=∫Ct

gt(yt − h(t,Xt)) dyt. (10.15)

From (10.14) and (10.15),

P (Yt ∈ Ct | Xt, Y0:t−1) = E (P (Yt ∈ Ct | Xt,W0:t−1) | Xt, Y0:t−1)

=∫Ct

gt(yt − h(t,Xt)) dyt.

This gives us

P (Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1) =∫Ct

gytt (xt) dyt, (10.16)

where gyt is defined in (10.10); hence

† See Section A.2 for a list of the properties of conditional expectation.

Page 260: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.2 The Recurrence Formula for πt 263

PY0:t(C0:t) = P(Yt ∈ Ct ∩ Xt ∈ Rd ∩ Y0:t−1 ∈ C0:t−1

)=∫

Rd×C0:t−1

P (Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1)

PXt,Y0:t−1 (dxt,dy0:t−1)

=∫

Rd×C0:t−1

∫Ct

gytt (xt) dyt py0:t−1t (dxt)PY0:t−1(dy0:t−1)

=∫C0:t

∫Rdgytt (xt)p

y0:t−1t (dxt) PY0:t−1(dy0:t−1) dyt. (10.17)

In (10.17), we used the identity

PXt,Y0:t−1 (dxt,dy0:t−1) = py0:t−1t (dxt)PY0:t−1(dy0:t−1), (10.18)

which is again a consequence of the vector-valued equivalent of (A.2), sincefor all A ∈ B(Rd), we have

P ((Xt, Y0:t−1) ∈ A× C0:t−1)

=∫C0:t−1

P (Xt ∈ A | Y0:t−1 = y0:t−1) PY0:t−1(dy0:t−1)

=∫A×C0:t−1

py0:t−1t (dxt)PY0:t−1(dy0:t−1).

From (10.17)

PY0:t(dy0:t) = py0:t−1t (gytt ) dytPY0:t−1(dy0:t−1).

Hence the second term in (10.13) is equal to∫C0:t

gytt ∗ py0:t−1t (A)PY0:t(dy0:t)

=∫C0:t

∫Agytt (xt)p

y0:t−1t (dxt)

py0:t−1t (gytt )

PY0:t (dy0:t)

=∫C0:t

∫A

gytt (xt)py0:t−1t (dxt) dytPY0:t−1(dy0:t−1).

Finally, using (10.16) and (10.18),∫C0:t

gytt ∗ py0:t−1t (A)PY0:t(dy0:t)

=∫A×C0:t−1

(∫Ct

gytt (xt)dyt

)py0:t−1t (dxt)PY0:t−1(dy0:t−1)

=∫A×C0:t−1

P (Yt ∈ Ct | Xt = xt, Y0:t−1 = y0:t−1)

× PXt,Y0:t−1 (dxt,dy0:t−1)= P (Xt ∈ A ∩ Y0:t ∈ C0:t) .

Page 261: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

264 10 Particle Filters in Discrete Time

From the earlier discussion this is sufficient to establish the result. ut

As it can be seen from its proof, the recurrence formula (10.12) can berewritten in the following expanded way,

πt−1 7→ pt = Kt−1πt−1 7→ πt = gYtt ∗ pt, t > 0. (10.19)

The first step is called the prediction step: it occurs at time t before the arrivalof the new observation Yt. The second step is the updating step as it takesinto account the new observation Yt. A similar expansion holds true for therecurrence formula (10.11); that is,

πy0:t−1t−1 7→ p

y0:t−1t = Kt−1π

y0:t−1t−1 7→ πy0:tt = gytt ∗ p

y0:t−1t , t > 0. (10.20)

The simplicity of the recurrence formulae (10.19) and (10.20) is misleading. Aclosed formula for the posterior distribution exists only in exceptional cases(the linear/Gaussian filter). The main difficulty resides in the updating step:the projective product is a non-linear transformation involving the computa-tion of the normalising constant pt(gYtt ) or py0:t−1

t (gytt ) which requires an inte-gration over a (possibly) high-dimensional space. In Section 10.4 we presenta generic class of particle filters which can be used to approximate numeri-cally the posterior distribution. Before that we state and prove necessary andsufficient criteria for sequences of approximations to converge to the posteriordistribution.

10.3 Convergence of Approximations to πt

We have two sets of criteria: for the case when the observation is a priori fixedto a particular outcome, that is, say

Y0 = y0, Y1 = y1, . . .

and for the case when the observation remains random. The first case is thesimpler of the two, since the measures to be approximated are not random.

10.3.1 The Fixed Observation Case

We look first at the case when the observation process has an arbitrary, butfixed, value y0:T , where T is a finite time horizon. We assume that the re-currence formula (10.20) for πy0:tt – the conditional distribution of the signalgiven the event Y0:t = y0:t – holds true for the particular observation pathy0:t for all 0 ≤ t ≤ T (remember that (10.20) is valid PY0:t-almost surely). Asstated above, (10.20) requires the computation of the predicted conditionalprobability measure py0:t−1

t :

πy0:t−1t−1 −→ p

y0:t−1t −→ πy0:tt .

Page 262: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.3 Convergence of Approximations to πt 265

Therefore it is natural to study algorithms which provide recursive approx-imations for πy0:tt using intermediate approximations for py0:t−1

t . Denote by(πnt )∞n=1 the approximating sequence for πy0:tt and (pnt )∞n=1 the approximat-ing sequence for py0:t−1

t . Is is assumed that the following three conditions aresatisfied.

• πnt and pnt are random measures, not necessarily probability measures.• pnt 6= 0, πnt 6= 0 (i.e. no approximation should be trivial).• pnt g

ytt > 0 for all n > 0, 0 ≤ t ≤ T .

Let πnt be defined as a (random) probability measure absolutely continuouswith respect to pnt for t ∈ N and n ≥ 1 such that

πnt = gytt ∗ pnt ; (10.21)

thus

πnt f =pnt (fgyt)pnt g

yt. (10.22)

The following theorems give necessary and sufficient conditions for the con-vergence of pnt to p

y0:t−1t and πnt to πy0:tt . In order to simplify notation, for

the remainder of this subsection, dependence on y0:t is suppressed and πy0:tt isdenoted by πt, p

y0:t−1t by pt and gytt by gt. It is important to remember that

the observation process is a given fixed path y0:T .

Theorem 10.7. For all f ∈ B(Rd) and all t ∈ [0, T ] the limits

a0. limn→∞ E [|πnt f − πtf |] = 0,b0. limn→∞ E [|pnt f − ptf |] = 0,

hold if and only if for all f ∈ B(Rd) and all t ∈ [0, T ] we have

a1. limn→∞ E [|πn0 f − π0f |] = 0,b1. limn→∞ E

[∣∣pnt f −Kt−1πnt−1f

∣∣] = limn→∞ E [|πnt f − πnt f |] = 0.

Proof. The necessity of conditions (a0) and (b0) is proved by induction. Thelimit (a0) follows in the starting case of t = 0 from (a1). We need to showthat if πnt−1 converges in expectation to πt−1 and pnt converges in expectationto pt then πnt converges in expectation to πt. Since pt = Kt−1πt−1, for allf ∈ B(Rd), by the triangle inequality

|pnt f − ptf | ≤ |pnt f −Kt−1πnt−1f |+ |Kt−1π

nt−1f −Kt−1πt−1f |. (10.23)

The expected value of the first term on the right-hand side of (10.23) convergesto zero from (b1). Also using Exercise 10.1, Kt−1f ∈ B(Rd) and Kt−1π

nt−1f =

πnt−1(Kt−1f) and Kt−1πt−1f = πt−1(Kt−1f) hence

limn→∞

E[∣∣Kt−1π

nt−1f −Kt−1πt−1f

∣∣] = 0.

By taking expectation of both sides of (10.23),

Page 263: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

266 10 Particle Filters in Discrete Time

limn→∞

E [|pnt f − ptf |] = 0, (10.24)

which establishes condition (a0). From (10.22)

πnt f − πtf =pnt (fgt)pnt gt

− pt(fgt)ptgt

= −pnt (fgt)pnt gt

1ptgt

(pnt gt − ptgt) +(pnt (fgt)ptgt

− pt(fgt)ptgt

),

and as |pnt (fgt)| ≤ ‖f‖∞pnt gt,

|πnt f − πtf | ≤‖f‖∞ptgt

|pnt gt − ptgt|+1ptgt|pnt (fgt)− pt(fgt)| . (10.25)

Therefore

E [|πnt f − πtf |] ≤‖f‖∞ptgt

E [|pnt gt − ptgt|]

+1ptgt

E [|pnt (fgt)− pt(fgt)|] . (10.26)

From (10.24) both terms on the right-hand side of (10.26) converge to zero.Finally,

|πnt f − πtf | ≤ |πnt f − πnt f |+ |πnt f − πtf | . (10.27)

As the expected value of the first term on the right-hand side of (10.27) con-verges to zero using (b1) and the expected value of the second term convergesto zero using (10.26), limn→∞ E [|πnt f − πtf |] = 0.

For the sufficiency part, assume that conditions (a0) and (b0) hold. Thusfor all t ≥ 0 and for all f ∈ B(Rd),

limn→∞

E [|πnt f − πtf |] = limn→∞

E [|pnt f − ptf |] = 0.

Clearly condition (a1) follows as a special case of (a0) with t = 0. Sincept = Kt−1πt−1, we have for all f ∈ B(Rd),

E[∣∣pnt f −Kt−1π

nt−1f

∣∣] ≤ E [|pnt f − ptf |]+ E

[∣∣πt−1(Kt−1f)− πnt−1(Kt−1f)∣∣] , (10.28)

which implies the first limit in (b1). From (10.26),

limn→∞

E [|πtf − πnt f |] = 0

and by the triangle inequality

E [|πnt f − πnt f |] ≤ E [|πnt f − πtf |] + E [|πtf − πnt f |] (10.29)

from which the second limit in (b1) follows. ut

Page 264: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.3 Convergence of Approximations to πt 267

Thus conditions (a1) and (b1) imply that pnt converges in expectation topt and πnt converges in expectation to πt (see Section A.10 for the definition ofconvergence in expectation). The convergence in expectation of pnt and of πntholds if and only if conditions (a1) and (b1) are satisfied for all f ∈ Cb(Rd) (notnecessarily for all f ∈ B(Rd)) provided additional constraints are imposed onthe transition kernel of the signal and of the likelihood functions; see Corollary10.10 below.

Definition 10.8. The transition kernel Kt is said to satisfy the Feller prop-erty if Ktf ∈ Cb(Rd) for all f ∈ Cb(Rd).

Exercise 10.9. Let Vt∞t=1 be a sequence of independent one-dimensionalstandard normal random variables.

i. Let X = Xt, t ∈ N be given by the following recursive formula

Xt+1 = a(Xt) + Vt,

where a : R → R is a continuous function. Show that the correspondingtransition kernel for X satisfies the Feller property.

ii. Let X = Xt, t ∈ N be given by the following recursive formula

Xt+1 = Xt + sgn(Xt) + Vt.

Then show that the corresponding transition kernel for X does not satisfythe Feller property.

The following result gives equivalent conditions for the convergence inexpectation.

Corollary 10.10. Assume that the transition kernel for X is Feller and thatthe likelihood functions gt are all continuous. Then the sequences pnt , πnt con-verge in expectation to pt and πt for all t ∈ [0, T ] if and only if conditions(a1) and (b1) are satisfied for all f ∈ Cb(Rd) and all t ∈ [0, T ].

Proof. The proof is a straightforward modification of the proof of Theorem10.7. The Feller property is used in the convergence to zero of the second termon the right-hand side of (10.23):

limn→∞

E[∣∣Kt−1π

nt−1f −Kt−1πt−1f

∣∣]= limn→∞

E[∣∣πnt−1 (Kt−1f)− πt−1 (Kt−1f)

∣∣] = 0.

That is, only if Kt−1f is continuous, we can conclude that the limit above iszero. The continuity of gt is used to conclude that both terms on the right-hand side of (10.26) converge to zero. ut

Page 265: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

268 10 Particle Filters in Discrete Time

Following Remark A.38 in the appendix, if there exists a positive constantp > 1 such that

E[|πnt f − πtf |

2p]≤ cfnp, (10.30)

where cf is a positive constant depending on the test function f , but indepen-dent of n, then, for any ε ∈ (0, 1/2 − 1/(2p)) there exists a positive randomvariable cf,ε almost surely finite such that

|πnt f − πtf | ≤cf,εnε

.

In particular πnt f converges to πtf almost surely. Moreover if (10.30) holds forany f ∈ M where M is a countable convergence determining set (as definedin Section A.10), then, almost surely, πnt converges to πt in the weak topology.This means that there exists a set Ω ∈ F such that P(Ω) = 1 and for anyω ∈ Ω the corresponding sequence of probability measures πn,ωt satisfies

limn→∞

πn,ωt (f) = πt(f),

for any f ∈ Cb(Rd). This cannot be extended to the convergence for anyf ∈ B(Rd) (i.e. to the stronger, so-called convergence in total variation, ofπn,ωt to πt).

Exercise 10.11. Let µ be the uniform measure on the interval [0, 1] and(µn)n≥1 be the sequence of probability measures

µn =1n

n∑i=1

δi/n.

i. Show that (µn)n≥1 converges to µ in the weak topology.ii. Let f = 1Q∩[0,1] ∈ B(Rd) be the indicator set of all the rational numbers in

[0, 1]. Show that µn(f) 6→ µ(f), hence µn does not converge to µ in totalvariation.

Having rates of convergence for the higher moments of the error termsπnt f − πtf as in (10.30) is therefore very useful as they imply the almost sureconvergence of the approximations in the weak topology with no additionalassumptions required on the transition kernels of the signal and the likelihoodfunction. However, if we wish a result in the same vein as that of Theorem 10.7,the same assumptions as in Corollary 10.10 must be imposed. The followingtheorem gives us the corresponding criterion for the almost sure convergenceof pnt to pt and πnt to πt in the weak topology. The theorem makes use ofthe metric dM as defined in Section A.10 which generates the weak topologyon MF (Rd). The choice of the metric is not important; any metric whichgenerates the weak topology may be used.

Page 266: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.3 Convergence of Approximations to πt 269

Theorem 10.12. Assume that the transition kernel for X is Feller and thatthe likelihood functions gt are all continuous for all t ∈ [0, T ]. Then the se-quence pnt converges almost surely to pt and πnt converges almost surely to πtfor all t ∈ [0, T ] if and only if the following two conditions are satisfied for allt ∈ [0, T ]

a2. limn→∞ πn0 = π0, P-a.s.b2. limn→∞ dM

(pnt , π

nt−1Kt−1

)= limn→∞ dM (πnt , π

nt ) = 0, P-a.s.

Proof. The sufficiency of the conditions (a2) and (b2) is proved as above byinduction using inequalities (10.23), (10.25) and (10.27). It remains to provethat (a2) and (b2) are necessary. Assume that for all t ≥ 0 pnt converges almostsurely to pt and πnt converges almost surely to πt This implies that πnt−1Kt−1

converges almost surely to pt (which is equal to πt−1Kt−1) and using (10.25),that πnt converges almost surely to πt.

Hence, almost surely limn→∞ dM(pnt , pt) = 0, limn→∞ dM(πnt , πt) = 0,limn→∞ dM(πnt−1Kt−1, pt) = 0 and limn→∞ dM(πnt , πt) = 0. Finally, usingthe triangle inequality

dM(pnt , π

nt−1Kt−1

)≤ dM (pnt , pt) + dM

(pt, π

nt−1Kt−1

)and

dM (πnt , πnt ) ≤ dM (πnt , πt) + dM (πt, πnt ) ,

which imply (b2). ut

Remark 10.13. Theorems 10.7 and 10.12 and Corollary 10.10 are very natural.They say that we obtain approximations of py0:t−1

t and πy0:tt for all t ∈ [0, T ]if and only if we start from an approximation of π0 and then ‘follow closely’the recurrence formula (10.20) for py0:t−1

t and πy0:tt .

The natural question arises as to whether we can lift the results to the casewhen the observation process is random and not just a given fixed observationpath.

10.3.2 The Random Observation Case

In the previous section both the converging sequences and the limiting mea-sures depend on the fixed value of the observation. Let us look first at theconvergence in mean. If for an arbitrary f ∈ B(Rd), the condition

limn→∞

E [|πn,y0:tt f − πy0:tt f |] = 0,

holds for PY0:t-almost all values y0:t and there exists a PY0:t-integrable functionw(y0:t) such that, for all n ≥ 0,

Page 267: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

270 10 Particle Filters in Discrete Time

E [|πn,y0:tt f − πy0:tt f |] ≤ wf (y0:t) PY0:t-a.s.,† (10.31)

then by the dominated convergence theorem,

limn→∞

E[∣∣∣πn,Y0:t

t f − πtf∣∣∣]

= limn→∞

∫(Rm)t+1

E [|πn,y0:tt f − πy0:tt f |] PY0:t (dy0:t) = 0.

Hence conditions (a1) and (b1) are also sufficient for convergence in the ran-dom observation case. In particular, if (a1) and (b1) are satisfied for anyf ∈ Cb(Rd) and the two additional assumptions of Corollary 10.10 hold thenπn,Y0:tt converges in expectation to πt. Similar remarks apply to pt. Also, the

existence of rates of convergence for higher moments and appropriate inte-grability conditions can lead to the P-almost sure convergence of πn,Y0:t

t toπt.

However, a necessary and sufficient condition can not be obtained in thismanner, since limn→∞ E[|πn,Y0:t

t f − πtf |] = 0 does not imply

limn→∞

E [|πn,y0:tt f − πy0:tt f |] = 0

for PY0:t-almost all values y0:t.The randomness of the approximating measures pn,Y0:t−1

t and πn,Y0:tt now

comes from two sources; one is the (random) observation Y and the otherone is the actual construction of the approximations. In the case of particleapproximations, randomness is introduced in the system during each of thepropagation steps (see the next section for details). As the following conver-gence results show, the effect of the second source of randomness vanishesasymptotically (the approximating measures converge to pt and πt).

The following proposition is the equivalent of Theorem 10.7 for the ran-dom observation case. Here and throughout the remainder of the section thedependence on the process Y is suppressed from the notations pn,Y0:t

t , πn,Y0:tt ,

gYtt , and so on.

Proposition 10.14. Assume that for any t ≥ 0, there exists a constant ct > 0such that ptgt ≥ ct. Then, for all f ∈ B(Rd) and all t ≥ 0 the limits

a0 ′. limn→∞ E [|πnt f − πtf |] = 0,b0 ′. limn→∞ E [|pnt f − ptf |] = 0,

hold if and only if for all f ∈ B(Rd) and all t ≥ 0

a1 ′. limn→∞ E [|πn0 f − π0f |] = 0,b1 ′. limn→∞ E[|pnt f −Kt−1π

nt−1f |] = limn→∞ E[|πnt f − πnt f |] = 0.

† Condition (10.31) is trivially satisfied for approximations which are probabilitymeasures since in this case wf = 2‖f‖∞ satisfies the condition.

Page 268: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.3 Convergence of Approximations to πt 271

Proof. The proof follows step by step that of Theorem 10.7. The only stepthat differs slightly is the proof of convergence to zero of E[|πnt f−πtf |]. Usingthe equivalent of the inequality (10.25)

E [|πnt f − πtf |] ≤ ‖f‖∞E[

1ptgt|pnt gt − ptgt|

]+ E

[1ptgt|pnt (fgt)− pt(fgt)|

]. (10.32)

Since 1/(ptgt) is now random it can not be taken outside the expectations asin (10.26). However, by using the assumption ptgt ≥ ct, we deduce that

E [|πnt f − πtf |] ≤‖f‖∞ct

E [|pnt gt − ptgt|] +1ct

E [|pnt (fgt)− pt(fgt)|]

and hence the required convergence. ut

The condition that ptgt ≥ ct is difficult to check in practice. It is sometimesreplaced by the condition that E[1/(ptgt)2] <∞ together with the convergenceto zero of the second moments of pnt gt − ptgt and pnt (fgt) − pt(fgt) (see theproof of convergence of the particle filter in continuous time described in theprevious chapter).

As in the previous case, conditions (a1′) and (b1′) imply that pnt convergesin expectation to pt and πnt converges in expectation to πt. A result analogousto Corollary 10.10 is true for the convergence in expectation of pnt and πnt ,provided that the same additional constraints are imposed on the transitionkernel of the signal and of the likelihood functions.

The existence of rates of convergence for the higher moments of the er-ror terms πnt f − πtf as in (10.30) can be used to deduce the almost sureconvergence of the approximations in the weak topology with no additionalconstraints imposed upon the transition kernel of the signal or the likelihoodfunction. However, in order to prove a similar result to Theorem 10.7, thesame assumptions as in Corollary 10.10 must be imposed. The following the-orem gives us the corresponding criterion for the almost sure convergence ofpnt to pt and πnt to πt in the weak topology. The result is true without theneed to use the cumbersome assumption ptgt ≥ ct for any t ≥ 0. It makes useof the metric dM, defined in Section A.10, which generates the weak topologyon MF (Rd). The choice of the metric is not important; any metric whichgenerates the weak topology may be used.

Proposition 10.15. Assume that the transition kernel for X is Feller andthat the likelihood functions gt are all continuous. Then the sequence pnt con-verges almost surely to pt and πnt converges almost surely to πt, for all t ≥ 0if and only if, for all t ≥ 0,

a2 ′. limn→∞ πn0 = π0, P-a.s.b2 ′. limn→∞ dM

(pnt ,Kt−1π

nt−1

)= limn→∞ dM (πnt , π

nt ) = 0.

Page 269: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

272 10 Particle Filters in Discrete Time

Proof. The proof is similar to that of Theorem 10.12, the only difference beingthe proof that limn→∞ pnt = pt, P-a.s. implies limn→∞ πnt = πt, P-a.s. whichis as follows. LetM be a convergence determining set of functions in Cb(Rd),for instance, the set used to construct the metric dM. Then almost surely

limn→∞

pnt gt = ptgt and limn→∞

pnt (gtf) = pt(gtf) for all f ∈M.

Hence, again almost surely, we have

limn→∞

πnt f = limn→∞

pnt (gtf)pnt gt

=pt(gtf)ptgt

(ω) = πtf, ∀f ∈M

which implies limn→∞ πnt = πt, P-a.s. ut

In the next section we present examples of approximations to the posteriordistribution which satisfy the conditions of these results. The algorithms usedto produce these approximations are called particle filters or sequential MonteCarlo methods.

10.4 Particle Filters in Discrete Time

The algorithms presented below involve the use of a system of n particleswhich evolve (mutate) according to the law of X. After each mutation thesystem is corrected: each particle is replaced by a random number of particleswhose mean is proportional to the likelihood of the position of the particle.After imposing some weak restrictions on the offspring distribution of theparticles, the empirical measure associated with the particle systems is provento converge (as n tends to∞) to the conditional distribution of the signal giventhe observation.

Denote by πnt the approximation to πt and by pnt the approximation to pt.The particle filter has the following description.

1. Initialization [t = 0].For i = 1, . . . , n, sample x(i)

0 from π0.2. Iteration [t− 1 to t].

Let x(i)t−1, i = 1, . . . , n be the positions of the particles at time t− 1.

a) For i = 1, . . . , n, sample x(i)t from Kt−1(x(i)

t−1, ·). Compute the (nor-malized) weight w(i)

t = gt(x(i)t )/(

∑nj=1 gt(x

(j)t )).

b) Replace each particle by ξ(i)t offspring such that

∑ni=1 ξ

(i)t = n. Denote

the positions of the offspring particles by x(i)t , i = 1, . . . , n.

Page 270: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.4 Particle Filters in Discrete Time 273

It follows from the above that the particle filter starts from πn0 : the empir-ical measure associated with a set of n random particles of mass 1/n whosepositions x(i)

0 for i = 1, . . . , n form a sample of size n from π0,

πn0 ,1n

n∑i=1

δx(i)0.

In general, define πnt to be

πnt ,1n

n∑i=1

δx(i)t,

where x(i)t for i = 1, . . . , n are the positions of the particles of mass 1/n

obtained after the second step of the iteration. Let πnt be the weighted measure

πnt ,n∑i=1

w(i)t δ

x(i)t.

We introduce the following σ-algebras

Ft = σ(x(i)s , x(i)

s , s ≤ t, i = 1, . . . , n)

Ft = σ(x(i)s , x(i)

s , s < t, x(i)t , i = 1, . . . , n).

Obviously Ft ⊂ Ft and the (random) probability measures pnt and πnt are Ft-measurable whilst πnt is Ft-measurable for any t ≥ 0. The random variablesx

(i)t for i = 1, . . . , n are chosen to be mutually independent conditional uponFt−1.

The iteration uses πnt−1 to obtain πnt , but not any of the previous approxi-mations. Following part (a) of the iteration, each particle changes its positionaccording to the transition kernel of the signal. Let pnt be the empirical dis-tribution associated with the cloud of particles of mass 1/n after part (a) ofthe iteration

pnt =1n

n∑i=1

δx(i)t.

This step of the algorithm is known as the importance sampling step (pop-ular in the statistics literature) or mutation step (inherited from the geneticalgorithms literature).

Exercise 10.16. Prove that E [pnt | Ft−1] = Knt−1π

nt−1.

Remark 10.17. An alternative way to obtain pnt from πnt−1 is to sample ntimes from the measure Kt−1π

nt−1 and define pnt to be the empirical measure

associated with this sample.

Page 271: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

274 10 Particle Filters in Discrete Time

We assume that the offspring vector ξt = (ξ(i)t )ni=1 satisfies the following

two conditions.

1. The conditional mean number of offspring is proportional to w(i)t . More

preciselyE[ξ

(i)t | Ft

]= nw

(i)t . (10.33)

2. Let Ant be the conditional covariance matrix of the random vector ξt ,

(ξ(i)t )ni=1,

Ant , E[(ξt − nwt)> (ξt − nwt) | Ft

]with entries

(Ant )ij = E[(ξ

(i)t − nw

(i)t

)(ξ

(j)t − nw

(j)t

) ∣∣∣ Ft] ,where wt , (w(i)

t )ni=1 is the vector of weights. Then assume that thereexists a constant ct, such that

q>Ant q ≤ nct (10.34)

for any n-dimensional vector q =(q(i))ni=1∈ Rn, such that |q(i)| ≤ 1 for

i = 1, . . . , n.

Exercise 10.18. Prove that the following identity holds

πnt =1n

n∑i=1

ξ(i)t δ

x(i)t,

and that E[πnt | Ft] = πnt .

Step (b) of the iteration is called the selection step. The particles obtainedafter the first step of the recursion are multiplied or discarded according tothe magnitude of the likelihood weights. In turn the likelihood weights areproportional to the likelihood of the new observation given the correspond-ing position of the particle (see Remark 10.3). Hence if nw(i)

t is small, feweroffspring are expected than if nw(i)

t is large. Since

nw(i)t =

gt

(x

(i)t

)1n

∑nj=1 gt

(x

(j)t

) ,nw

(i)t is small when the corresponding value of the likelihood function gt(x

(i)t )

is smaller than the likelihood function averaged over the positions of all theparticles. In conclusion, the effect of part (b) of the iteration is that it discardsparticles in unlikely positions and multiplies those in more likely ones. Follow-ing Exercise 10.18, this is done in an unbiased manner: the conditional expec-tation of the approximation after applying the step is equal to the weighted

Page 272: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.5 Offspring Distributions 275

sample obtained after the first step of the recursion. That is, the average ofthe mass ξ(i)

t /n associated with particle i is equal to w(i)t , the weight of the

particle before applying the step.

Exercise 10.19. Prove that, for all f ∈ B(Rd), we have

E[(πnt f − πnt f)2

]≤ ct‖f‖2∞

n.

Exercise 10.19 implies that the randomness introduced in part (b) of theiteration, as measured by the second moment of πnt f− πnt f , tends to zero withrate given by 1/n, where n is the number of particles in the system.

Lemma 10.20. Condition (10.34) is equivalent to

q>Ant q ≤ nct (10.35)

for any n-dimensional vector q =(q(i))ni=1∈ [0, 1]n, where ct is a fixed con-

stant.

Proof. Obviously (10.34) implies (10.35), so we only need to show the reverseimplication. Let q ∈ Rn be an arbitrary vector such that q = (q(i))ni=1, |q(i)| ≤1, i = 1, . . . , n. Let also

q(i)+ , max

(q(i), 0

), q

(i)− , max

(−q(i), 0

), 0 ≤ q(i)

+ , q(i)− ≤ 1

and q+ = (q(i)+ )ni=1 and q− = (q(i)

− )ni=1. Then q = q+ − q−. Define ‖ · ‖A to bethe semi-norm associated with the matrix A; that is,

‖q‖A ,√q>Aq.

If all the eigenvalues of A are strictly positive, then ‖ · ‖A is a genuine norm.Using the triangle inequality and (10.35),

‖q‖Ant ≤ ‖q+‖Ant + ‖q−‖Ant ≤ 2√nct,

which implies that (10.34) holds with ct = 4ct. ut

10.5 Offspring Distributions

In order to have a complete description of the particle filter we need to spec-ify the offspring distribution. The most popular offspring distribution is themultinomial distribution

ξt = Multinomial(n,w

(1)t , . . . , w

(n)t

);

that is,

Page 273: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

276 10 Particle Filters in Discrete Time

P(ξ

(i)t = n(i), i = 1, . . . , n

)=

n!∏ni=1 n

(i)!

n∏i=1

(w

(i)t

)n(i)

.

The multinomial distribution is the empirical distribution of an n-sample fromthe distribution πnt . In other words, if we sample (with replacement) n timesfrom the population of particles with positions x(i)

t , i = 1, . . . , n accordingto the probability distribution given by the corresponding weights w(i)

t , i =1, . . . , n and denote by ξ(i)

t the number of times that the particle with positionx

(i)t is chosen, then ξt = (ξ(i)

t )ni=1 has the above multinomial distribution.

Lemma 10.21. If ξt has a multinomial distribution then it satisfies the un-biasedness condition; that is,

E[ξ

(i)t | Ft

]= nw

(i)t ,

for any i = 1, . . . , n. Also ξt satisfies condition (10.34).

Proof. The unbiasedness condition follows immediately from the properties ofthe multinomial distribution. Also

E[(ξ

(i)t − nw

(i)t

)2

| Ft]

= nw(i)t

(1− w(i)

t

)E[(ξ

(i)t − nw

(i)t

)(ξ

(j)t − nw

(j)t

)| Ft

]= −nw(i)

t w(j)t , i 6= j.

Then for all q =(q(i))ni=1∈ [−1, 1]n,

q>Ant q =n∑i=1

nw(i)t

(1− w(i)

t

)(q(i))2

− 2∑

1≤i<j≤n

nw(i)t w

(j)t q(i)q(j)

= nn∑i=1

w(i)t

(q(i))2

− n

(n∑i=1

w(i)t q(i)

)2

≤ nn∑i=1

w(i)t ,

and since∑ni=1 w

(i)t = 1, (10.34) holds with ct = 1. ut

The particle filter with this choice of offspring distribution is called thebootstrap filter or the sampling importance resampling algorithm (SIR algo-rithm). It was introduced by Gordon, Salmond and Smith in [106] (see the lastsection for further historical remarks). Within the context of the bootstrapfilter, the second step is called the resampling step.

The bootstrap filter is quick and easy to implement and amenable to par-allelisation. This explains its great popularity among practitioners. However,

Page 274: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.5 Offspring Distributions 277

it is suboptimal: the resampling step replaces the (normalised) weights w(i)t

by the random masses ξ(i)t /n, where ξ(i)

t is the number of offspring of the ithparticle. Since ξt has a multinomial distribution, ξ(i)

t can take any value be-tween 0 and n. That is, even when w(i)

t is high (the position of the ith particleis very likely), the ith particle may have very few offspring or even none atall (albeit with small probability).

If ξt is obtained by residual sampling, rather than by independent samplingwith replacement, then the above disadvantage can be avoided. In this case

ξt = [nwt] + ξt. (10.36)

In (10.36), [nwt] is the (row) vector of integer parts of the quantities nw(i)t .

That is,[nwt] =

([nw

(1)t

], . . . ,

[nw

(n)t

]),

and ξt has multinomial distribution

ξt = Multinomial(n, w

(1)t , . . . , w

(n)t

),

where the integer n is given by

n , n−n∑i=1

[nw

(i)t

]=

n∑i=1

nw

(i)t

and the weights w(i)

t are given by

w(i)t ,

nw

(i)t

∑ni=1

nw

(i)t

.By using residual sampling to obtain ξt, we ensure that the original weightsw

(i)t are replaced by a random weight which is at least [nw(i)

t ]/n. This is theclosest integer multiple of 1/n lower than the actual weight w(i)

t . In this way,eliminating particles with likely positions is no longer possible. As long as thecorresponding weight is larger than 1/n, the particle will have at least oneoffspring.

Lemma 10.22. If ξt has distribution given by (10.36), it satisfies both theunbiasedness condition (10.33) and condition (10.34).

Proof. The unbiasedness condition follows from the properties of the multi-nomial distribution:

E[ξ

(i)t | Ft

]=[nw

(i)t

]+ E

(i)t | Ft

]=[nw

(i)t

]+ nw

(i)t

=[nw

(i)t

]+nw

(i)t

= nw

(i)t .

Page 275: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

278 10 Particle Filters in Discrete Time

Also

E[(ξ

(i)t − nw

(i)t

)2

| Ft]

= E[(ξ

(i)t − nw

(i)t )2

| Ft]

= nw(i)t

(1− w(i)

t

)and

E[(ξ

(i)t − nw

(i)t

)(ξ

(j)t − nw

(j)t

)| Ft

]= −nw(i)

t w(j)t .

Then for all q = (q(i))ni=1 ∈ [−1, 1]n, we have

q>Ant q =n∑i=1

nw(i)t

(1− w(i)

t

)(q(i))2

− 2∑

1≤i<j≤n

nw(i)t w

(j)t q(i)q(j)

=n∑i=1

nw(i)t

(q(i))2

− n

(n∑i=1

w(i)t q(i)

)2

≤n∑i=1

nw(i)t ,

and since∑ni=1 nw

(i)t =

∑ni=1nw

(i)t < n, (10.34) holds with ct = 1. ut

Exercise 10.23. In addition to the bound on the second moment of πnt f−πnt fresulting by imposing the assumption (10.34) on the offspring distributionξt (see Exercise 10.19), prove that if ξt has multinomial distribution or thedistribution given by (10.36), then there exists a constant c such that, for allf ∈ B(Rd), we have

E[(πnt f − πnt f)4 | Ft

]≤ c‖f‖4∞

n2.

The residual sampling distribution is still suboptimal; the correction stepnow replaces the weight w(i)

t by the deterministic mass [nw(i)t ]/n to which it

adds a random mass given by ξ(i)t /n, where ξ(i)

t can take any value between0 and n. This creates a problem for particles with small weights. Even whenw

(i)t is small (the position of the ith particle is very unlikely) it may have a

large number of offspring: up to n offspring are possible (albeit with smallprobability). The multinomial distribution also suffers from this problem.

If ξt is obtained by using the branching algorithm described in Section9.2.1, then both the above difficulties are eliminated. In this case, the numberof offspring ξ(i)

t for each individual particle has the distribution

ξ(i)t =

[nw

(i)t

]with probability 1−

nw

(i)t

[nw

(i)t

]+ 1 with probability

nw

(i)t

,

(10.37)

Page 276: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.5 Offspring Distributions 279

whilst∑ni=1 ξ

(i)t remains equal to n.

If the particle has a weight w(i)t > 1/n, then the particle will have offspring.

Thus if the corresponding likelihood function gt(x(i)t ) is larger than the like-

lihood averaged over all the existing particles (1/n)∑nj=1 gt(x

(j)t ), then the

ith site is selected and the higher the weight w(i)t the more offspring the ith

particle will have. If w(i)t is less than or equal to 1/n, the particle will have

at most one offspring. It will have no offspring with probability 1− nw(i)t , as

in this case nw(i)t = nw(i)

t . Hence, if w(i)t 1/n, no mass is likely to be

assigned to site i; the ith particle is very unlikely and it is eliminated fromthe sample.

The algorithm described in Section 9.2.1 belongs to a class of algorithmscalled tree-based branching algorithms. If ξt is obtained by using the branchingalgorithm described in Section 9.2.1, then it is optimal in the sense that, forany i = 1, . . . , n, ξ(i)

t has the smallest possible variance amongst all integer-valued random variables with the given mean nw

(i)t . Hence, the algorithm

ensures that minimal randomness, as measured by the variance of the massallocated to individual sites, is introduced to the system. The minimal varianceproperty for the distribution produced by any tree-based branching algorithmholds true not only for individual sites but also for all groups of sites corre-sponding to a node of the building binary tree. A second optimality propertyof this distribution is that it has the minimal relative entropy with respectto the measure πt which it replaces in the class of all empirical distributionsof n particles of mass 1/n. The interested reader should consult Crisan [60]for details of these properties. See also Kunsch [169] for further results on thedistribution produced by the branching algorithm.

Lemma 10.24. If ξt is produced by the algorithm described in Section 9.2.1,it satisfies both unbiasedness condition (10.33) and condition (10.34).

Proof. The unbiasedness condition immediately follows from (10.37)

E[ξ

(i)t | Ft

]=[nw

(i)t

] (1−

nw

(i)t

)+([nw

(i)t

]+ 1)

nw(i)t

= nw

(i)t .

Also

E[(ξ

(i)t − nw

(i)t

)2

| Ft]

=nw

(i)t

(1−

nw

(i)t

),

and from Proposition 9.3, part (e),

E[(ξ

(i)t − nw

(i)t

)(ξ

(j)t − nw

(j)t

)| Ft

]≤ 0.

Then for all q = (q(i))ni=1 = [0, 1]n, we have

q>Ant q ≤n∑i=1

nw

(i)t

(1−

nw

(i)t

),

Page 277: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

280 10 Particle Filters in Discrete Time

and since nw(i)t (1−nw

(i)t ) < 1

4 , following Lemma 10.20, condition (10.34)holds with ct = 1

4 . ut

For further theoretical results related to the properties of the above off-spring distributions, see Chopin [51] and Kunsch [169].

There exists another algorithm that satisfies the same minimal varianceproperty of the branching algorithm described above. It was introduced byCarpenter, Clifford and Fearnhead in the context of particle approximations(see [38]). The method had appeared earlier in the field of genetic algorithmsand it is known under the name of stochastic universal sampling (see Baker[6] and Whitley [268]). However, the offspring distribution generated by thismethod does not satisfy condition (10.34) and the convergence of the particlefilter with this method is still an open question.†

All offspring distributions presented above leave the total number of parti-cles constant and satisfy (10.34). However, the condition that the total numberof particles does not change is not essential.

One can choose the individual offspring numbers ξ(i)t to be mutually in-

dependent given Ft. As alternatives for the distribution of the integer-valuedrandom variables ξ(i)

t the following can be used.

1. ξ(i)t = B(n,w(i)

t ); that is, ξ(i)t are binomially distributed with parameters

(n,w(i)t ).

2. ξ(i)t = P (nw(i)

t ); that is, ξ(i)t are Poisson distributed with parameters

nw(i)t .

3. ξ(i)t are Bernoulli distributed with distribution given by (10.37).

Exercise 10.25. Show that if the individual offspring numbers ξ(i)t are mu-

tually independent given Ft and have any of the three distributions describedabove, then ξt satisfies both the unbiasedness condition and condition (10.34).

The Bernoulli distribution is the optimal choice for independent offspringdistributions. Since

∑ni=1 ξ

(i)t is no longer equal to n, the approximating mea-

sure πnt is no longer a probability measure. However, following the unbiased-ness condition (10.33) and condition (10.34), the total mass πnt (1) of theapproximating measure is a martingale which satisfies, for any t ∈ [0, T ],

E[(πnt (1)− 1)2

]≤ c

n,

where c = c(T ) is a constant independent of n. This implies that for large nthe mass oscillations become very small. Indeed, by Chebyshev’s inequality

P (|πnt (1)− 1| ≥ ε) ≤ c

nε2.

† See Kunsch [169] for some partial results.

Page 278: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.6 Convergence of the Algorithm 281

Hence, having a non-constant number of particles does not necessarily leadto instability. The oscillations in the number of particles can in themselvesconstitute an indicator of the convergence of the algorithm. Such an offspringdistribution with independent individual offspring numbers is easy to imple-ment and saves computational effort. An algorithm with variable number ofparticles is presented in Crisan, Del Moral and Lyons [67]. Theorems 10.7 and10.12 and all other results presented above can be used in order to prove theconvergence of the algorithm in [67] and indeed any algorithm based on suchoffspring distributions.

10.6 Convergence of the Algorithm

First fix the observation process to an arbitrary value y0:T , where T is a finitetime horizon and we prove that the random measures resulting from the classof algorithm described above converge to πy0:tt and p

y0:t−1t for all 0 ≤ t ≤ T .

Exercise 10.26. Prove that πn0 converges in expectation to π0 and alsolimn→∞ πn0 = π0, P-a.s.

Theorem 10.27. Let (pnt )∞n=1 and (πnt )∞n=1 be the measure-valued sequencesproduced by the class of algorithms described above. Then, for all 0 ≤ t ≤ T ,we have

limn→∞

E [|πnt f − πtf |] = limn→∞

E [|pnt f − ptf |] = 0,

for all f ∈ B(Rd). In particular, (pnt )∞n=1 converges in expectation to py0:t−1t

and (πnt )∞n=1 converges in expectation to πy0:tt for all 0 ≤ t ≤ T .

Proof. We apply Theorem 10.7. Since (a1) holds as a consequence of Ex-ercise 10.26, it is only necessary to verify condition (b1). From Exercise10.16, E [pnt f | Ft] = πnt−1(Kt−1f) and using the independence of the sam-ple x(i)

t ni=1 conditional on Ft−1,

E[(pnt f − πnt−1(Kt−1f)

)2 | Ft−1

]=

1n2

E

( n∑i=1

f(x

(i)t

)−Kt−1f

(x

(i)t−1

))2∣∣∣∣∣∣ Ft−1

=

1n2

n∑i=1

E[(f(x

(i)t

))2

| Ft−1

]

− 1n2

n∑i=1

(E[Kt−1f

(x

(i)t−1

)| Ft−1

])2

=1nπnt−1

(Kt−1f

2 − (Kt−1f)2).

Page 279: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

282 10 Particle Filters in Discrete Time

Therefore E[(pnt f − πnt−1Kt−1f)2] ≤ ‖f‖2∞/n and the first limit in (b1) issatisfied. The second limit in (b1) follows from Exercise 10.19. ut

Corollary 10.28. For all 0 ≤ t ≤ T , there exists a constant kt such that

E[(πnt f − πtf)2

]≤ kt‖f‖2∞

n, (10.38)

for all f ∈ B(Rd).

Proof. We proceed by induction. Since x(i)0 , i = 1, . . . , n is an n-independent

sample from π0,

E[(πn0 f − π0f)2

]≤ ‖f‖

2∞

n,

hence by Jensen’s inequality (10.38) is true for t = 0 with k0 = 1. Now assumethat (10.38) holds at time t− 1. Then

E[(πnt−1(Kt−1f)− πt−1(Kt−1f))2

]≤ kt−1‖Kt−1f‖2∞

n≤ kt−1‖f‖2∞

n. (10.39)

Also from the proof of Theorem 10.27,

E[(pnt f − πnt−1Kt−1f

)2] ≤ ‖f‖2∞n

. (10.40)

By using inequality (10.23) and the triangle inequality for the L2-norm,

E[(pnt f − ptf)2

]≤ kt‖f‖2∞

n, (10.41)

where kt = (√kt−1 + 1)2. In turn, (10.41) and (10.25) imply that

E[(πnt f − πtf)2

]≤ kt‖f‖2∞

n, (10.42)

where kt = 4kt‖gt‖2∞/(ptgt)2. From Exercise 10.19.

E[(πnt f − πnt f)2

]≤ ct‖f‖2∞

n, (10.43)

where ct is the constant appearing in (10.34). Finally from (10.42), (10.43)and the triangle inequality (10.27), (10.38) holds with kt = (

√ct +

√kt)2.

This completes the induction step. ut

Condition (10.34) is essential in establishing the above rate of convergence.A more general condition than (10.34) is possible, for example, that thereexists α > 0 such that

q>Ant q ≤ nαct (10.44)

Page 280: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.6 Convergence of the Algorithm 283

for any q ∈ [−1, 1]n. In this case, inequality (10.43) would become

E[(πnt f − πnt f)2] ≤ ct‖f‖2∞n2−α .

Hence the overall rate of convergence would take the form

E[(πnt f − πtf)2

]≤ kt‖f‖2∞nmax(2−α,1)

for all f ∈ B(Rd). Hence if α > 1 we will see a deterioration in the overallrate of convergence. On the other hand, if α < 1 no improvement in the rateof convergence is obtained as the error in all the other steps of the particlefilter remains of order 1/n. So α = 1 is the most suitable choice for condition(10.34).

Theorem 10.29. If the offspring distribution is multinomial or is given by(10.36), then for all 0 ≤ t ≤ T ,

limn→∞

pnt = py0:t−1t and lim

n→∞πnt = πy0:tt P-a.s.

Proof. We apply Theorem 10.12. Since condition (a2) holds as a consequenceof Exercise 10.26, it is only necessary to verify condition (b2). LetM⊂ Cb(Rd)be a countable, convergence determining set of functions (see Section A.10 fordetails). Following Exercise 10.16, for any f ∈M,

E[f(x

(i)t

) ∣∣∣ Ft−1

]= Kt−1f

(x

(i)t−1

)and using the independence of the sample x(i)

t ni=1 conditional on Ft−1,

E[(pnt f −Kt−1π

nt−1f

)4 | Ft−1

]= E

( 1n

n∑i=1

(f(x

(i)t

)−Kt−1f

(x

(i)t−1

)))4∣∣∣∣∣∣Ft−1

=

1n4

n∑i=1

E[(f(x

(i)t

)−Kt−1f

(x

(i)t−1

))4∣∣∣∣ Ft−1

]+

6n4

∑1≤i<j≤n

E[(f(x

(i)t

)−Kt−1f

(x

(i)t−1

))2

(f(x

(j)t

)−Kt−1f

(x

(j)t−1

))2∣∣∣∣Ft−1

]. (10.45)

Observe that since ‖Kt−1f‖∞ ≤ ‖f‖∞,

E[(f(x

(i)t

)−Kt−1f

(x

(i)t−1

))4∣∣∣∣ Ft−1

]≤ 16‖f‖4∞

Page 281: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

284 10 Particle Filters in Discrete Time

and

E[(f(x

(i)t

)−Kt−1f

(x

(i)t−1

))2 (f(x

(j)t

)−Kt−1f

(x

(j)t−1

))2∣∣∣∣ Ft−1

]≤ 16‖f‖4∞.

Hence by taking the expectation of both terms in (10.45)

E[(pnt f −Kt−1π

nt−1f

)4] ≤ 16‖f‖4∞n3

+6n4

n(n− 1)2

16‖f‖4∞

≤ 48‖f‖4∞n2

. (10.46)

From (10.46), following Remark A.38 in the appendix, for any ε ∈ (0, 14 ) there

exists a positive random variable cf,ε which is almost surely finite such that∣∣pnt f −Kt−1πnt−1f

∣∣ ≤ cf,εnε

.

In particular |pnt f − Kt−1πnt−1f | converges to zero, P-a.s., for any f ∈ M.

Therefore limn→∞ dM(pnt ,Kt−1π

nt−1

)= 0 which is the first limit in (b2).

Similarly, following Exercise 10.23, one proves that, for all f ∈M,

E[(πnt f − πnt f)4

]≤ c‖f‖4∞

n2(10.47)

which implies that limn→∞ dM (πnt , πnt ) = 0, hence also the second limit in

b2. holds. ut

We now consider the case where the observation process is no longer aparticular fixed outcome, but is random. With similar arguments one usesPropositions 10.14 and 10.15 to prove the following.

Corollary 10.30. Assume that for all t ≥ 0, there exists a constant ct > 0such that ptgt ≥ ct. Then we have

limn→∞

E[∣∣∣πn,Y0:t

t f − πtf∣∣∣] = lim

n→∞E[∣∣∣pn,Y0:t−1

t f − ptf∣∣∣] = 0

for all f ∈ B(Rd) and all t ≥ 0. In particular, (pn,Y0:t−1t )∞n=1 converges in

expectation to py0:t−1t and (πn,Y0:t

t )∞n=1 converges in expectation to πy0:tt for allt ≥ 0.

Corollary 10.31. If the offspring distribution is multinomial or is given by(10.36), then

limn→∞

pn,Y0:t−1t = pt and lim

n→∞πn,Y0:tt = πt P-a.s.

for all t ≥ 0.

Page 282: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.7 Final Discussion 285

10.7 Final Discussion

The results presented in Section 10.3 provide efficient techniques for provingconvergence of particle algorithms. The necessary and sufficient conditions((a0), (b0)), ((a1), (b1)) and ((a2), (b2)) are natural and easy to verify as itcan be seen in the proofs of Theorems 10.27 and 10.29.

The necessary and sufficient conditions can be applied when the algorithmsstudied provide both πnt (the approximation to πt) and also pnt (the interme-diate approximation to pt). Algorithms are possible where πnt is obtained fromπnt−1 without using the approximation for pt. In other words one can performthe mutation step using a different transition from that of the signal. In thestatistics literature, the transition kernel Kt is usually called the importancedistribution. Should a kernel (or importance distribution) Kt be used whichis different from that of the signal Kt, the form of the weights appearing inthe selection step of the particle filter must be changed. The results presentedin Section 10.3 then apply for pt now given by Kt−1πt−1 and the weightedmeasure πnt defined in (10.21) given by

πnt =n∑i=1

w(i)t δ

x(i)t,

where w(i)t are the new weights. See Doucet et al. [83] and Pitt and Shephard

[244] and the references contained therein which describe the use of suchimportance distributions.

As already pointed out, the randomness introduced in the system at eachselection step must be kept to a minimum as it affects the rate of convergenceof the algorithm. Therefore one should not apply the selection step after everynew observation arrives. Assume that the information received from the ob-servation is ‘bad’ (i.e. the signal-to-noise ratio is small). Because of this, thelikelihood function is close to being constant and the corresponding weightsare all (roughly) equal; w(i)

t ' 1/n. In other words, the observation is unin-formative; it cannot distinguish between different sites and all particles areequally likely. In this case no selection procedure needs to be performed. Theobservation is stored in the weights of the approximation πnt and carried for-ward to the next step. If a correction procedure is nevertheless performed andξt has a minimal variance distribution, all particles will have a single offspring‘most of the time’. In other words the system remains largely unchanged withhigh probability. However, with small probability, the ith particle might haveno offspring (if w(i)

t < 1/n) or two offspring (if w(i)t > 1/n). Hence randomness

still enters the system and this can affect the convergence rates (see Crisanand Lyons [66] for a related result in the continuous time framework). If ξtdoes not have a minimal variance distribution, the amount of randomness iseven higher. It remains an open question as to when and how often one shoulduse the selection procedure.

Page 283: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

286 10 Particle Filters in Discrete Time

The first paper on the sequential Monte Carlo methods was that of Hand-schin and Mayne [120] which appeared in 1969. Unfortunately, Handschin andMayne’s paper appeared at a time when the lack of computing power meantthat it could not be implemented; thus their ideas were overlooked. In the late1980s, the advances in computer hardware rekindled interest in obtaining se-quential Monte Carlo methods for approximating the posterior distribution.The first paper describing numerical integration for Bayesian filtering waspublished by Kitagawa [150] in 1987. The area developed rapidly followingthe publication of the bootstrap filter by Gordon, Salmond and Smith [106]in 1993. The development of the bootstrap filter was inspired by the earlierwork of Rubin [251] on the SIR algorithm from 1987. The use of the algo-rithm has spread very quickly among engineers and computer scientists. Animportant example is the work of Isard and Blake in computer vision (see[132, 133, 134]).

The first convergence results on particle filters in discrete time were pub-lished by Del Moral in 1996 (see [214, 215]). Together with Rigal and Salut,he produced several earlier LAAS-CNRS reports which were originally clas-sified ([219, 221, 220]) which contain the description of the bootstrap filter.The condition (10.34) was introduced by Crisan, Del Moral and Lyons in 1999(see [67]). The tree-based branching algorithm appeared in Crisan and Lyons[66].

In the last ten years we have witnessed a rapid development of the theoryof particle filters in discrete time. The discrete time framework has been exten-sively studied and a multitude of convergence and stability results have beenproved. A comprehensive account of these developments in the wider contextof approximations of Feynman–Kac formulae can be found in Del Moral [216]and the references therein.

10.8 Solutions to Exercises

10.1

i. For ϕ = IA where A is an arbitrary Borel set, Ktϕ ∈ B(Rd) by property(ii) of the transition kernel. By linearity, the same is true for ϕ beinga simple function, that is, a linear combination of indicator functions.Consider next an arbitrary ϕ ∈ B(Rd). Then there exists a sequence ofsimple functions (ϕn)n≥0 uniformly bounded which converges to ϕ. Thenby the dominated convergence theorem Kt(ϕn)(x) converges to Kt(ϕ)(x)for any x ∈ Rd. Hence Kt(ϕ)(x) is Borel-measurable. The boundednessresults from the fact that

|Ktϕ(x)| =∣∣∣∣∫

Rdϕ(y)Kt(x, dy)

∣∣∣∣ ≤ ‖ϕ‖∞ ∫RdKt(x, dy) = ‖ϕ‖∞

for any x ∈ Rd; hence ‖Ktϕ‖∞ ≤ ‖ϕ‖∞.

Page 284: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.8 Solutions to Exercises 287

ii. Let Ai ∈ B(Rd) be a sequence of disjoint sets for i = 1, 2, . . . , then usingproperty (i) of Kt,

Ktqt(∪∞i=1Ai) =∫

RdKt(x,∪∞i=1Ai)qt(dx)

=∫

RdlimN→∞

N∑i=1

Kt(x,Ai)qt(dx)

= limN→∞

N∑i=1

∫RdKt(x,Ai)qt(dx) =

∞∑i=1

(Ktqt)(Ai),

where the bounded convergence theorem was used to interchange the limitand the integral (using Kt(x,Ω) = 1 as the bound). Consequently Ktqt iscountably additive and hence a measure. To check that it is a probabilitymeasure

Ktqt(Ω) =∫

RdKt(x,Ω)qt(dx) =

∫Rdqt(dx) = 1.

iii.(Ktqt)(ϕ) =

∫y∈Rd

ϕ(y)∫x∈Rd

Kt(x, dy)qt(dx).

By Fubini’s theorem, which is applicable since ϕ is bounded and as aconsequence of (ii) Ktqt is a probability measure, which implies thatKtqt(|ϕ|) ≤ ‖ϕ‖∞ <∞),

(Ktqt)(ϕ) =∫x∈Rd

qt(dx)∫y∈Rd

ϕ(y)Kt(x, dy) = qt(Ktϕ)

and the general case follows by induction.

10.5 Finite additivity is trivial from the linearity of the integral, and count-able additivity follows from the bounded convergence theorem, since ϕ isbounded. Thus ϕ ∗ p is a measure. It is clear that ϕ ∗ p(Ω) = p(ϕ)/p(ϕ) = 1,so it is a probability measure.

10.9

i. For arbitrary ϕ ∈ Cb(Rd). It is clear that

Ktϕ(x) =∫

Rdϕ(y)

1√2π

exp(− (y − a(x))2

2

)=∫

Rdϕ(y + a(x))

1√2π

exp(−y

2

2

).

Then by the dominated convergence theorem using the continuity of a,

limx→x0

Kt(ϕn)(x) = Kt(ϕn)(x0)

for arbitrary x0 ∈ Rd, hence the continuity of Ktϕ.

Page 285: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

288 10 Particle Filters in Discrete Time

ii. Choose a strictly increasing ϕ ∈ Cb(Rd). Then, as above

limx↑0

Kt(ϕn)(x) =∫

Rdϕ(y − 1)

1√2π

exp(−y

2

2

)<

∫Rdϕ(y + 1)

1√2π

exp(−y

2

2

)= lim

x↓0Kt(ϕn)(x).

10.11

i. Let f ∈ Cb(R); then

µnf =1n

n∑i=1

f(i/n)

and

µf =∫ 1

0

f(x) dx.

As f ∈ Cb(R), it is Riemann integrable. Therefore the Riemann approxi-mation µnf → µf as n→∞.

ii. If f = 1Q∩[0,1] then µnf = 1, yet µf = 0. Hence µnf 6→ µf as n→∞.

10.16 Following part (a) of the iteration we get, for arbitrary f ∈ B(Rd),that

E[f(x

(i)t

) ∣∣∣ Ft−1

]= Kt−1f(x(i)

t−1) ∀i = 1, . . . , n;

hence

E[pnt f | Ft] =1n

n∑i=1

E[f(x

(i)t

) ∣∣∣ Ft−1

]=

1n

n∑i=1

Kt−1f(x

(i)t−1

)= Kt−1π

nt−1(f) = πt−1(Kt−1f).

10.18 The first assertion follows trivially from part (b) of the iteration. Nextobserve that, for all f ∈ B(Rd),

E[πnt f | Ft

]=

1n

n∑i=1

f(x(i)t )E

(i)t | Ft

]= πnt f,

since E[ξ(i)t | Ft] = nw

(i)t for any i = 1, . . . , n.

10.19

E[(πnt f − πnt f)2 | Ft

]=

1n2

n∑i,j=1

f(x

(i)t

)f(x

(j)t

)(Ant )ij . (10.48)

By applying (10.34) with q =(q(i))di=1

, where q(i) = f(x(i)t )/‖f‖∞, we get

that

Page 286: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

10.8 Solutions to Exercises 289

n∑i,j=1

f(x

(i)t

)‖f‖∞

(Ant )ijf(x

(j)t

)‖f‖∞

≤ nct. (10.49)

The exercise now follows from (10.48) and (10.49).

10.23 The multinomial distribution is the empirical distribution of an n-sample from the distribution πnt . Hence, in this case, πnt has the representation

πnt =1n

n∑1

δζ(i)t,

where ζ(i)t are random variables mutually independent given Ft such that

E[f(ζ(i)t ) | Ft] = πnt f for any f ∈ B(Rd). Using the independence of the

sample ζ(i)t ni=1 conditional on Ft,

E[(πnt f − πnt f)4 | Ft] =E

1n

(n∑i=1

(f(ζ(i)t )− πnt f)

)4∣∣∣∣∣∣ Ft

=1n4

n∑i=1

E[(f(ζ(i)t )− πnt f)4 | Ft]

+6n4

∑1≤i<j≤n

E[(f(ζ(i)t )− πnt f)2(f(ζ(j)

t )− πnt f)2 | Ft].

Observe that since |πnt f | ≤ ‖f‖∞,

E[(f(ζ(i)t )− πnt f)4 | Ft] ≤ 16‖f‖4∞

andE[(f(ζ(i)

t )− πnt f)2(f(ζ(j)t )− πnt f)2 | Ft] ≤ 16‖f‖4∞.

Hence

E[(πnt f − πnt f)4 | Ft] ≤16‖f‖4∞n3

+48‖f‖4∞(n− 1)

n3≤ 48‖f‖4∞

n2.

The bound for the case when the offspring distribution is given by (10.36) isproved in a similar manner.

10.25 πn0 is the empirical measure associated with a set of n random particlesof mass 1/n whose positions x(i)

0 for i = 1, . . . , n form a sample of size n fromπ0. Hence, in particular, E[f(x(i)

0 )] = πn0 f for any f ∈ B(Rd) and by a similarargument to that in Exercise 10.23,

E[(πn0 f − π0f)4 | Ft] ≤48‖f‖4∞n2

,

Page 287: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

290 10 Particle Filters in Discrete Time

which implies the convergence in expectation by Jensen’s inequality and thealmost sure convergence follows from Remark A.38 in the appendix.

10.26 Immediate from the computation of the first and second moments ofthe binomial, Poisson and Bernoulli distributions and the fact that, due to theindependence of the random variables ξ(i)

t , the conditional covariance matrixAnt is diagonal,

(Ant )ij = E[(ξ

(i)t − nw

(i)t

)(ξ

(j)t − nw

(j)t

)| Ft

]= 0, i 6= j,

where wt , (w(i)t )ni=1 is the vector of weights. Hence

q>Ant q =n∑i=1

(q(i))2E[(ξ

(i)t − nw

(i)t

)2

| Ft]

for any n-dimensional vector q =(q(i))ni=1∈ [−1, 1]n.

Page 288: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

Part III

Appendices

Page 289: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A

Measure Theory

A.1 Monotone Class Theorem

Let S be a set. A family C of subsets of S is called a π-system if it is closedunder finite intersection. That is, for any A,B ∈ C we have that A ∩B ∈ C.

Theorem A.1. Let H be a vector space of bounded functions from S into Rcontaining the constant function 1. Assume that H has the property that forany sequence (fn)n≥1 of non-negative functions in H such that fn f wheref is a bounded function on S, then f ∈ H. Also assume that H contains theindicator function of every set in some π-system C. Then H contains everybounded σ(C)-measurable function of S.

For a proof of Theorem A.1 and other related results see Williams [272] orRogers and Williams [248].

A.2 Conditional Expectation

Let (Ω,F ,P) be a probability space and G ⊂ F be a sub-σ-algebra of F .The conditional expectation of an integrable F-measurable random variable ξgiven G is defined as the integrable G-measurable random variable, denotedby E[ξ | G], with the property that∫

A

ξ dP =∫A

E[ξ | G] dP, for all A ∈ G. (A.1)

Then E[ξ | G] exists and is almost surely unique (for a proof of this result seefor example Williams [272]). By this we mean that if ξ is another G-measurableintegrable random variable such that∫

A

ξ dP =∫A

E[ξ | G] dP, for all A ∈ G,

Page 290: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

294 A Measure Theory

then E[ξ | G] = ξ, P-a.s.The following are some of the important properties of the conditional

expectation which are used throughout the text.

a. If α1, α2 ∈ R and ξ1, ξ2 are F-measurable, then

E[α1ξ1 + α2ξ2 | G] = α1E[ξ1 | G] + α2E[ξ2 | G], P-a.s.

b. If ξ ≥ 0, then E[ξ | G] ≥ 0, P-a.s.c. If 0 ≤ ξn ξ, then E[ξn | G] E[ξ | G], P-a.s.d. If H is a sub-σ-algebra of G, then E [E[ξ | G] | H] = E[ξ | H], P-a.s.e. If ξ is G-measurable, then E[ξη | G] = ξE[η | G], P-a.s.f. If H is independent of σ(σ(ξ),G), then

E[ξ | σ(G,H)] = E[ξ | G], P-a.s.

The conditional probability of a set A ∈ F with respect to the σ-algebra Gis the random variable denoted by P(A | G) defined as P(A | G) , E[IA | G],where IA is the indicator function of the set A. From (A.1),

P(A ∩B) =∫B

P(A | G) dP, for all B ∈ G. (A.2)

This definition of conditional probability has the shortcoming that the condi-tional probability P(A | G) is only defined outside of a null set which dependsupon the set A. As there may be an uncountable number of possible choicesfor A, P(· | G) may not be a probability measure.

Under certain conditions regular conditional probabilities as in Definition2.28 exist. Regular conditional distributions (following the nomenclature inBreiman [23] whose proof we follow) exist under much less restrictive condi-tions.

Definition A.2. Let (Ω,F ,P) be a probability space, (E, E) be a measurablespace, X : Ω → E be an F/E-measurable random element and G a sub-σ-algebra of F . A function Q(ω,B) defined for all ω ∈ Ω and B ∈ E is called aregular conditional distribution of X with respect to G if

(a) For each B ∈ E, the map Q(·, B) is G-measurable.(b) For each ω ∈ Ω, Q(ω, ·) is a probability measure on (E, E).(c) For any B ∈ E,

Q(·, B) = P(X ∈ B | G) P-a.s. (A.3)

Theorem A.3. If the space (E, E) in which X takes values is a Borel space,that is, if there exists a function ϕ : E → R such that ϕ is E-measurableand ϕ−1 is B(R)-measurable, then the regular conditional distribution of thevariable X conditional upon G in the sense of Definition A.2 exists.

Page 291: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.2 Conditional Expectation 295

Proof. Consider the case when (E, E) = (R,B(R)). First we construct a reg-ular version of the distribution function P(X < x | G). Define a countablefamily of random variables by selecting versions Qq(ω) = P(X < q | G)(ω).For each q ∈ Q, define for r, q ∈ Q,

Mr,q , ω : Qr < Qq

and then define the set on which monotonicity of the distribution functionfails

M ,⋃r>qr,q∈Q

Mr,q.

It is clear from property (b) of the conditional expectation that P(M) = 0.Similarly define for q ∈ Q,

Nq ,

ω : lim

r↑qQr 6= Qq

and

N ,⋃q∈Q

Nq;

by property (c) of conditional expectation it follows that P(Nq) = 0, soP(N) = 0. Finally define

L∞ ,

ω : limq→∞q∈Q

Qq 6= 1

and L−∞ ,

ω : limq→−∞q∈Q

Qq 6= 0

,

and again P(L∞) = P(L−∞) = 0.Define

F (x | G) ,

limr↑xr∈Q

Qr if ω /∈M ∪N ∪ L∞ ∪ L−∞

Φ(x) otherwise,

where Φ(x) is the distribution function of the normal N(0, 1) distribution (itschoice is arbitrary). It follows using property (c) of conditional expectationapplied to the functions fri = 1(−∞,ri) with ri ∈ Q a sequence such that ri ↑ xthat F (x | G) satisfies all the properties of a distribution function and is aversion of P(X < x | G).

This distribution function can be extended to define a measure Q(· | G).Let H be the class of B ∈ B(R) such that Q(B | G) is a version of P(X ∈ B |G). It is clear that H contains all finite disjoint unions of intervals of the form[a, b) for a, b ∈ R so by the monotone class theorem A.1 the result follows.

In the general case, Y = ϕ(X) is a real-valued random variable and so hasregular conditional distribution such that for B ∈ B(R), Q(B | G) = P(Y ∈B | G); thus define

Q(B | G) , Q(ϕ(B) | G),

and since ϕ−1 is measurable it follows that Q has the required properties. ut

Page 292: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

296 A Measure Theory

Lemma A.4. If X is as in the statement of Theorem A.3 and ψ is a E-measurable function such that E[|ψ(X)|] < ∞ then if Q(· | G) is a regularconditional distribution for X given G it follows that

E[ψ(X) | G] =∫E

ψ(x)Q(dx | G).

Proof. If A ∈ B then it is clear that the result follows from (A.3). By linearitythis extends to simple functions, by monotone convergence to non-negativefunctions, and in general write ψ = ψ+ − ψ−. ut

A.3 Topological Results

Definition A.5. A metric space (E, d) is said to be separable if it has acountable dense set. That is, for any x ∈ E, given ε > 0 we can find y in thiscountable set such that d(x, y) < ε.

Lemma A.6. Let (X, ρ) be a separable metric space. Then X is homeomor-phic to a subspace of [0, 1]N, the space of sequences of real numbers in [0, 1]with the topology of co-ordinatewise convergence.

Proof. Define a bounded version of the metric ρ , ρ/(1 + ρ); it is easilychecked that this is a metric on X, and the space (X, ρ) is also separable.Clearly the metric satisfies the bounds 0 ≤ ρ ≤ 1. As a consequence of sepa-rability we can choose a countable set x1,x2, . . . which is dense in (X, ρ).

Define J = [0, 1]N and endow this space with the metric d which generatedthe topology of co-ordinatewise convergence. Define α : X → J ,

α : x 7→ (ρ(x, x1), ρ(x, x2), . . .).

Suppose x(n) → x in X; then by continuity of ρ it is immediate thatρ(x(n), xk)→ ρ(x, xk) for each k ∈ N and thus α(x(n))→ α(x).

Conversely if α(x(n))→ α(x) then this implies that ρ(x(n), xk)→ ρ(x, xk)for each k. Then by the triangle inequality

ρ(x(n), x) ≤ ρ(x(n), xk) + ρ(xk, x)

and since ρ(x(n), xk)→ ρ(x, xk) it is immediate that

lim supn→∞

ρ(x(n), x) ≤ 2ρ(xk, x) ∀k.

As this holds for all k ∈ N and the xks are dense in X we may pick a sequencexmk → x whence ρ(x(n), x) → 0 as n → ∞. Hence α is a homeomorphismX → J . ut

The following is a standard result and the proof is based on that in Rogersand Williams [248] who reference Bourbaki [22] Chapter IX, Section 6, No 1.

Page 293: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.3 Topological Results 297

Theorem A.7. A complete separable metric space X is homeomorphic to aBorel subset of a compact metric space.

Proof. By Lemma A.6 there is a homeomorphism α : X → J . Let d denotethe metric giving the topology of co-ordinatewise convergence on J . We mustnow consider α(X) and show that it is a countable intersection of open setsin J and hence belongs to the σ-algebra of open sets, the Borel σ-algebra.

For ε > 0 and x ∈ X we can find δ(ε) such that for any y ∈ X,d(α(x), α(y)) < δ implies that ρ(x, y) < ε. For n ∈ N set ε = 1/(2n) andthen consider the ball B(α(x), δ(ε) ∧ ε). It is immediate that the d-diameterof this ball is at most 1/n. But also, as a consequence of the choice of δ, theimage under α−1 of the intersection of this ball with X has ρ-diameter atmost 1/n.

Let α(X) be the closure of α(X) under the metric d in J . Define a setUn ⊆ α(X) to be the set of x ∈ α(X) such that there exists an open ball Nx,nabout x of d-diameter less than 1/n, with ρ-diameter of the image under α−1

of the intersection of α(X) and this ball less than 1/n. By the argument ofthe previous paragraph we see that if x ∈ α(X) we can always find such aball; hence α(X) ⊆ Un.

For x ∈ ∩nUn choose xn ∈ α(X)∩⋂k≤nNx,k. By construction d(x, xk) ≤

1/n, thus xn → x as n → ∞ under the d metric on J . However, for r ≥n both points xr and xn are in Nx,n thus ρ(α−1(xr), α−1(xn)) ≤ 1/n, so(α−1(xr))r≥1 is a Cauchy sequence in (X, ρ). But this space is complete sothere exists y ∈ X such that α−1(xn) → y. As α is a homeomorphism thisimplies that d(xn, α(y))→ 0. Hence by uniqueness of limits x = α(y) and thusit is immediate that x ∈ α(X). Therefore ∩n Un ⊆ α(X); since α(X) ⊆ Un itfollows immediately that

α(X) =⋂n

Un. (A.4)

It is now necessary to show that Un is relatively open in α(X). From thedefinition of Un, for any x ∈ Un we can find Nx,n with diameter propertiesas above which is a subset of J containing x. For any arbitrary z ∈ α(X), by(A.4) there exists x ∈ Un such that z ∈ Nx,n; then by choosing Nz,n = Nx,nit is clear that z ∈ Un. Therefore Nx,n ∩ α(X) ⊆ Un from which we concludethat Un is relatively open in α(X). Therefore we can write Un = α(X) ∩ Vnwhere Vn is open in J

α(X) =⋂n

Un = α(X) ∩

(⋂n

Vn

), (A.5)

where Vn are open subsets of J . It only remains to show that α(X) can beexpressed as a countable intersection of open sets; this is easily done since

α(X) =⋂n

x ∈ J : d(x, α(X)) < 1/n ,

Page 294: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

298 A Measure Theory

therefore it follows that α(X) is a countable intersection of open sets in J .Together with (A.5) it follows that α(X) is a countable intersection of opensets. ut

Theorem A.8. Any compact metric space X is separable.

Proof. Consider the open cover of X which is the uncountable union of allballs of radius 1/n centred on each point in X. As X is compact there exists afinite subcover. Let xn1 , . . . , xnNn be the centres of the balls in one such finitesubcover. By a diagonal argument we can construct a countable set which isthe union of all these centres for all n ∈ N. This set is clearly dense in X andcountable, so X is separable. ut

Theorem A.9. If E is a compact metric space then the set of continuousreal-valued functions defined on E is separable.

Proof. By Theorem A.8, the space E is separable. Let x1,x2, . . . be a countabledense subset of E. Define h0(x) = 1, and hn(x) = d(x, xn), for n ≥ 1. Nowdefine an algebra of polynomials in these hns with coefficients in the rationals

A =x 7→

∑qn0,...,nrk0,...,kr

hn0k0

(x) . . . hnrkr (x) : qn0,...,nrk0,...,kr

∈ Q.

The closure of A is an algebra containing constant functions and it is clearthat it separates points in E, therefore by the Stone–Weierstrass theorem, itfollows that A is dense in C(E). ut

Corollary A.10. If E is a compact metric space then there exists a countableset f1, f2, . . . which is dense in C(E).

Proof. By Theorem A.8 E is separable, so by Theorem A.9 the space C(E)is separable and hence has a dense countable subset. ut

A.4 Tulcea’s Theorem

Tulcea’s theorem (see Tulcea [265]) is frequently stated in the form for productspaces and their σ-algebras (for a very elegant proof in this vein see Ethier andKurtz [95, Appendix 9]) and this form is sufficient to establish the existenceof stochastic processes. We give the theorem in a more general form wherethe measures are defined on the same space X, but defined on an increasingfamily of σ-algebras Bn as this makes the important condition on the atomsof the σ-algebras clear. The approach taken here is based on that in Stroockand Varadhan [261].

Define the atom A(x) of the Borel σ-algebra B on the space X, for x ∈ Xby

A(x) ,⋂B : B ∈ B, x ∈ B, (A.6)

that is, A(x) is the smallest element of B which contains x.

Page 295: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.4 Tulcea’s Theorem 299

Theorem A.11. Let (X,B) be a measurable space and let Bn be an increasingfamily of sub-σ-algebras of B such that B = σ(

⋃∞n=1 Bn). Suppose that these

σ-algebras satisfy the following constraint. If An is a sequence of atoms suchthat An ∈ Bn and A1 ⊇ A2 ⊇ · · · then

⋂∞n=0An 6= ∅.

Let P0 be a probability measure defined on B0 and let πn be a family ofprobability kernels, where πn(x, ·) is a measure on (X,Bn) and the mappingx 7→ πn(x, ·) is Bn−1-measurable. Such a probability kernel allows us to defineinductively a family of probability measures on (X,Bn) via

Pn(A) ,∫X

πn(x,A)Pn−1(dx), (A.7)

with the starting point for the induction being given by the probability measureP0.

Suppose that the kernels πn(x, ·) satisfy the compatibility condition that forx /∈ Nn, where Nn is a Pn-null set, the kernel πn+1(x, ·) is supported on An(x)(i.e. if B ∈ Bn+1 and B ∩ An(x) = ∅ then πn+1(x,B) = 0). That is, startingfrom a point x, the transition measure only contains with positive probabilitytransitions to points y such that x and y belong to the same atom of Bn.

Then there exists a unique probability measure P defined on B such thatP|Bn = Pn for all n ∈ N.

Proof. It is elementary to see that Pn as defined in (A.7) is a probabilitymeasure on Bn and that Pn+1 agrees with Pn on Bn. We can then define a setfunction P on

⋃Bn by setting P(Bn) = Pn(Bn) for Bn ∈ Bn.

From the definition (A.7), for B ∈ Bn we have defined Pn inductively viathe transition functions

Pn(Bn) =∫X

· · ·∫X

πn(qn−1, B)πn−1(qn−2,dqn−1) · · ·π1(q0,dq1) P0(dq0).

(A.8)To simplify the notation define πm,n such that πm,n(x, ·) is a measure onM(X,Bn) as follows.

If m ≥ n ≥ 0 and B ∈ Bn, then define πm,n(x,B) = 1B(x) which is clearlyBn-measurable and hence as Bm ⊇ Bn, x 7→ πm,n(x,B) is also Bm-measurable.If m < n define πm,n inductively using the transition kernel πn,

πm,n(x,B) ,∫X

πn(yn−1, B)πm,n−1(x, dyn−1). (A.9)

It is clear that in both cases x 7→ πm,n(x, ·) is Bm-measurable. Thus πm,n

can be viewed as a transition kernel from (X,Bm) to (X,Bn). From thesedefinitions, for m < n

πm,n(x,B) =∫X

· · ·∫X

πn(yn−1, B) · · ·πm+1(ym,dym+1)πm,m(x,dym)

=∫X

· · ·∫X

πn(yn−1, B) · · ·πm+2(ym+1,dym+2)πm+1(x, dym+1).

Page 296: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

300 A Measure Theory

It therefore follows from the above with m = 0 and (A.8) that for B ∈ Bn,

P(Bn) = Pn(Bn) =∫X

π0,n(y0, B)P0(dy0). (A.10)

We must show that P is a probability measure on⋃∞n=0 Bn, as then the

Caratheodory extension theorem† establishes the existence of an extensionto a probability measure on (X,σ (

⋃∞n=0 B)). The only non-trivial condition

which must be verified for P to be a measure is countable additivity.A necessary and sufficient condition for countable additivity of P is that

if Bn ∈⋃n Bn, are such that B1 ⊇ B2 ⊇ · · · and

⋂nBn = ∅ then P(Bn)→ 0

as n→∞ (the proof can be found in many books on measure theory, see forexample page 200 of Williams [272]). It is clear that the non-trivial cases arecovered by considering Bn ∈ Bn for each n ∈ N.

We argue by contradiction; suppose that P(Bn) ≥ ε > 0 for all n ∈ N. Wemust exhibit a point of

⋂nBn; as we started with the assumption that this

intersection was empty, this is the desired contradiction.Define

F 0n ,

x ∈ X : π0,n(x,Bn) ≥ ε/2

. (A.11)

Since x 7→ π0,n(x, ·) is B0-measurable, it follows that Fn0 ∈ B0. Then from(A.10) it is clear that

P(Bn) ≤ P0(F 0n) + ε/2.

As by assumption P(Bn) ≥ ε for all n ∈ N, we conclude that P0(F 0n) ≥ ε/2

for all n ∈ N.Suppose that x ∈ F 0

n+1; then π0,n+1(x,Bn+1) ≥ ε/2. But Bn+1 ⊆ Bn, soπ0,n+1(x,Bn) ≥ ε/2. From (A.9) it follows that

π0,n+1(x,Bn) =∫X

πn+1(yn, Bn)π0,n(x, dyn),

for y /∈ Nn, the probability measure πn+1(y, ·) is supported on An(y). AsBn ∈ Bn, from the definition of an atom, it follows that y ∈ Bn if and onlyif An(y) ⊆ Bn, thus πn+1(y,Bn) = 1Bn(y) for y /∈ Nn. So on integration weobtain that π0,n(x,Bn) = π0,n+1(x,Bn) ≥ ε/2. Thus x ∈ Fn0 . So we haveshown that Fn+1

0 ⊆ Fn0 .Since P0(Fn0 ) ≥ ε/2 for all n and the Fn form a non-increasing sequence,

it is then immediate that P0(⋂∞n=0 F

n0 ) ≥ ε/2, whence we can find x0 /∈ N0

such that π0,n(x0, Bn) ≥ ε/2 for all n ∈ N.Now we proceed inductively; suppose that we have found x0, x1, . . . xm−1

such that x0 /∈ N0 and xi ∈ Ai−1(xi−1) \ Ni for i = 1, . . . ,m − 1, and

† Caratheodory extension theorem: Let S be a set, S0 be an algebra of subsetsof S and S = σ(S0). Let µ0 be a countably additive map µ0 : S0 → [0,∞];then there exists a measure µ on (S,S) such that µ = µ0 on S0. Furthermoreif µ0(S) < ∞, then this extension is unique. For a proof of the theorem see, forexample, Williams [272] or Rogers and Williams [248].

Page 297: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.4 Tulcea’s Theorem 301

πi,n(xi, Bn) ≥ ε/2i+1 for all n ∈ N for each i = 0, . . . , m − 1. We havealready established the result for the case m = 0. Now define

Fmn , x ∈ X : πm,n(x,Bn) ≥ ε/2m+1;

from the integral representation for πm,n,

πm−1,n(x,Bn) =∫X

πm,n(ym, Bn)πm(x, dym),

it follows by an argument analogous to that for F 0n , that

ε/2m ≤ πm−1,n(xm−1, Bn) ≤ ε/2m+1 + πm(xm−1, Fmn ),

where the inequality on the left hand side follows from the inductive hypoth-esis. As in the case for m = 0, we can deduce that Fmn+1 ⊆ Fmn . Thus

πm

(xm−1,

∞⋂n=0

Fmn

)≥ ε/2m+1, (A.12)

which implies that we can choose xm ∈⋂∞n=0 F

mn , such that πm,n(xm, Bn) >

ε/2m+1 for all n ∈ N, and from (A.12) as the set of suitable xm has strictlypositive probability, it cannot be empty, and we can choose an xm not inthe Pm-null set Nm. Therefore, this choice can be made such that xm ∈Am−1(xm−1) \Nm. This establishes the required inductive step.

Now consider the case of πn,n(xn, Bn); we see from the definition that thisis just 1Bn(xn), but by choice of the xns, πn,n(xn, Bn) > 0. Consequentlyas xn /∈ Nn, by the support property of the transition kernels, it followsthat An(xn) ⊆ Bn for each n. Thus

⋂An(xn) ⊂

⋂Bn and if we define

Kn ,⋂ni=0Ai(xi) it follows that xn ∈ Kn and Kn is a descending sequence;

by the σ-algebra property it is clear that Kn ∈ Bn, and since An(xn) is anatom in Bn it follows that Kn = An(xn). We thus have a decreasing sequenceof atoms; by the initial assumption, such an intersection is non-empty, that is,⋂An(xn) 6= ∅ which implies that

⋂Bn 6= ∅, but this is a contradiction, since

we assumed that this intersection was empty. Therefore P is countably additiveand the existence of an extension follows from the theorem of Caratheodory.

ut

A.4.1 The Daniell–Kolmogorov–Tulcea Theorem

The Daniell–Kolmogorov–Tulcea theorem gives conditions under which thelaw of a stochastic process can be extended from its finite-dimensional distri-butions to its full (infinite-dimensional) law.

The original form of this result due to Daniell and Kolmogorov (see Doob[81] or Rogers and Williams [248, section II.30]) requires topological conditionson the space X; the space X needs to be Borel, that is, homeomorphic to a

Page 298: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

302 A Measure Theory

Borel set in some space, which is the case if X is a complete separable metricspace as a consequence of Theorem A.7.

It is possible to take an alternative probabilistic approach using Tulcea’stheorem. In this approach the finite-dimensional distributions are related toeach other through the use of regular conditional probabilities as transitionkernels; while this does not explicitly use topological conditions, such condi-tions may be required to establish the existence of these regular conditionalprobabilities (as was seen in Exercise 2.29 regular conditional probabilities areguaranteed to exist if X is a complete separable metric space).

We use the notation XI for the I-fold product space generated by X, thatis, XI =

∏i∈I Xi where Xis are identical copies of X, and let BI denote the

product σ-algebra on XI ; that is, BI =∏i∈I Bi where Bi are copies of B. If

U and V are finite subsets of the index set I, let πVU denote the restrictionmap from XV to XU .

Theorem A.12. Let X be a complete separable metric space. Let µU be afamily of probability measures on (XU ,BU ), for U any finite subset of I.Suppose that these measures satisfy the compatibility condition for U ⊂ V

µU = µV πVU .

Then there exists a unique probability measure on (XI ,BI) such that µU =µ πIU for any U a finite subset of I.

Proof. Let Fin(I) denote the set of all finite subsets of I. It is immediate fromthe compatibility condition that we can find a finitely additive µ0 which is aprobability measure on (XI ,

⋃F∈Fin(I)(π

IF )−1(BF )), such that for U ∈ Fin(I),

µU = (πIU )−1 µ0. If we can show that µ0 is countably additive, then theCaratheodory extension theorem implies that µ0 can be extended to a measureµ on (XI ,BI).

We cannot directly use Tulcea’s theorem to construct the extension mea-sure; however we can use it to show that µ0 is countably additive. SupposeAn is a non-increasing family of sets An ∈

⋃F∈Fin(I)(π

IF )−1(BF ) such that

An ↓ ∅; we must show that µ0(An)→ 0.Given the Ais, we can find finite subsets Fi of I such that Ai ∈ (πIFi)

−1BFifor each i. Without loss of generality we can choose this sequence so thatF0 ⊂ F1 ⊂ F2 ⊂ · · · . Define Fn , (πIFn)−1(BFn) ⊂ BI . As a consequence ofthe product space structure, these σ-algebras satisfy the condition that theintersection of a decreasing family of atoms Zn ∈ Fn is non-empty.

For q ∈ XI and B ∈ Fn, let

πn(q,B) ,

(µFn

∣∣∣∣(πFnFn−1

)−1

(BFn−1))(

πIFn(q),(πIFn

)−1(B)

),

where (µFn | G)(ω, ·) for G ⊂ BFn is the regular conditional probability dis-tribution of µFn given G. We refer to the properties of regular conditional

Page 299: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.5 Cadlag Paths 303

probability distribution using the nomenclature of Definition 2.28. This πn isa probability kernel from (XI ,Fn−1) to (XI ,Fn), i.e. πn(q, ·) is a measureon (XI ,Fn) and the map q 7→ πn(q, ·) is Fn−1-measurable (which followsfrom property (b) of regular conditional distribution). In order to apply Tul-cea’s theorem we must verify that the compatibility condition is satisfied i.e.πn(q, ·) is supported for a.e. q on the atom in Fn−1 containing q which isdenoted An−1(q). This is readily established by computing π(q, (An−1(q))c)and using property (c) of regular conditional distribution and the fact thatq /∈ (An−1(q))c. Thus we can apply Tulcea’s theorem to find a unique proba-bility measure µ on (XI , σ(

⋃∞n=0 Fn)) such that µ is equal to µ0 on

⋃∞n=0 Fn.

Hence as An ∈ Fn, it follows that µ(An) = µ0(An) for each n and thereforesince µ is countably additive µ0(An) ↓ 0 which establishes the required count-able additivity of µ0. ut

A.5 Cadlag Paths

A cadlag (continue a droite, limite a gauche) path is one which is right con-tinuous with left limits; that is, xt has cadlag paths if for all t ∈ [0,∞), thelimit xt− exists and xt = xt+. Such paths are sometimes described as RCLL(right continuous with left limits). The space of cadlag functions from [0,∞)to E is conventionally denoted DE [0,∞).

Useful references for this material are Billingsley [19, Chapter 3], Ethierand Kurtz [95, Sections 3.5–3.9], and Whitt [269, Chapter 12].

A.5.1 Discontinuities of Cadlag Paths

Clearly cadlag paths can only have left discontinuities, i.e. points t wherext 6= xt−.

Lemma A.13. For any ε > 0, a cadlag path taking values in a metric space(E, d) has at most a finite number of discontinuities of size in the metric dgreater than ε; that is, the set

D = t ∈ [0, T ] : d(xt, xt−) > ε

contains at most a finite number of points.

Proof. Let τ be the supremum of t ∈ [0, T ] such that [0, t) can be finitelysubdivided 0 < t0 < t1 < · · · < tk = t with the subdivision having theproperty that for i = 0, . . . , k− 1, sups,r∈[ti,ti+1) d(xs, xr) < ε. As right limitsexist at 0 it is clear that τ > 0 and since a left limit exists at τ− it is clearthat the interval [0, τ) can be thus subdivided. Right continuity implies thatthere exists δ > 0 such that for 0 ≤ t′−t < δ, then d(xt′ , xt) < ε; consequentlythe result holds for [0, t′), which contradicts the fact that τ is the supremumunless τ = T , consequently τ = T . Therefore [0, T ) can be so subdivided:

Page 300: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

304 A Measure Theory

jumps of size greater than ε can only occur at the tis, of which there are afinite number and thus there must be at most a finite number of such jumps.

ut

Lemma A.14. Let X be a cadlag stochastic process taking values in a metricspace (E, d); then

t ∈ [0,∞) : P(Xt− 6= Xt) > 0

contains at most countably many points.

Proof. For ε > 0, define

Jt(ε) , ω : d(Xt(ω), Xt−(ω)) > ε

Fix ε, then for any T > 0, δ > 0 we show that there are at most a finitenumber of points t ∈ [0, T ] such that P(Jt(ε)) > δ. Suppose this is false, andan infinite sequence ti of disjoint times ti ∈ [0, T ] exists. Then by Fatou’slemma

P(

lim infi→∞

(Jti(ε))c)≤ lim inf

i→∞P ((Jti(ε))

c)

thus

P(

lim supi→∞

Jti(ε))≥ lim sup

i→∞P(Jti(ε)) > δ,

so the event that Jt(ε) occurs for an infinite number of the tis has strictlypositive probability and is hence non empty. This implies that there is a cadlagpath with an infinite number of jumps in [0, T ] of size greater than ε, whichcontradicts the conclusion of Lemma A.13. Taking the union over a countablesequence δn ↓ 0, it then follows that P(Jt(ε)) > 0 for at most a countable setof t ∈ [0, T ].

Clearly P(Jt(ε)) → P(Xt 6= Xt−) as ε → 0, thus the set t ∈ [0, T ] :P(Xt 6= Xt−) > 0 contains at most a countable number of points. By takingthe countable union over T ∈ N, it follows that t ∈ [0,∞) : P(Xt 6= Xt−) > 0is at most countable. ut

A.5.2 Skorohod Topology

Consider the sequence of functions xn(t) = 1t≥1/n, and the function x(t) =1t>0 which are all elements of DE [0,∞). In the uniform topology which weused on CE [0,∞), as n → ∞ the sequence xn does not converge to x; yetconsidered as cadlag paths it appears natural that xn should converge to xsince the location of the unit jump of xn converges to the location of the unitjump of x. A different topology is required. The Skorohod topology is the mostfrequently used topology on the space DE [0,∞) which resolves this problem.Let λ : [0,∞)→ [0,∞), and define

Page 301: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.5 Cadlag Paths 305

γ(λ) , esssupt≥0

| log λ′(t)|

= sups>t≥0

∣∣∣∣logλ(s)− λ(t)

s− t

∣∣∣∣ .Let Λ be the subspace of Lipschitz continuous increasing functions from[0,∞)→ [0,∞) such that λ(0) = 0, limt→∞ λ(t) =∞ and γ(λ) <∞.

The Skorohod topology is most readily defined in terms of a metric whichinduces the topology. For x, y ∈ DE [0,∞) define a metric dDE (x, y) by

dDE (x, y) = infλ∈Λ

[γ(λ) ∨

∫ ∞0

e−ud(x, y, λ, u) du],

whered(x, y, λ, u) = sup

t≥0d(x(t ∧ u), y(λ(t) ∧ u)).

It is of course necessary to verify that this satisfies the definition of a metric.This is straightforward, but details may be found in Ethier and Kurtz [95,Chapter 3, pages 117-118]. For the functions xn and x in the example, it isclear that dDR(xn, x)→ 0 as n→∞. While there are other simpler topologieswhich have this property, the following proposition is the main reason whythe Skorohod topology is the preferred choice of topology on DE .

Proposition A.15. If the metric space (E, d) is complete and separable, then(DE [0,∞), dDE ) is also complete and separable.

Proof. The following proof follows Ethier and Kurtz [95]. As E is separable,it has a countable dense set. Let xnn≥1 be such a set. Given n, 0 = t0 <t1 < · · · < tn where tj ∈ Q+ and ij ∈ N for j = 0, . . . , n define the piecewiseconstant function

x(t) =

xik tk ≤ t < tk+1

xin t ≥ tn.

The set of all such functions forms a dense subset of DE [0,∞), therefore thespace is separable.

To show that the space is complete, suppose that ynn≥1 is a Cauchysequence in (DE [0,∞), dDE ), which implies that there exists an increasingsequence of numbers Nk such that for n,m ≥ Nk,

dDE (yn, ym) ≤ 2−k−1e−k.

Set vk = yNk ; then dDE (vk, vk+1) ≤ 2−k−1e−k. Thus there exists λk such that∫ ∞0

e−ud(vk, vk+1, λk, u) du < 2−ke−k.

As d(x, y, λ, u) is monotonic increasing in u, it follows that for any v ≥ 0,

Page 302: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

306 A Measure Theory∫ ∞0

e−ud(x, y, λ, u) du ≥ d(x, y, λ, v)∫ ∞v

e−u du = e−vd(x, y, λ, v).

Therefore it is possible to find λk ∈ Λ and uk > k such that

max(γ(λk), d(vk, vk+1, λk, uk)) ≤ 2−k. (A.13)

Then form the limit of the composition of the functions λi

µk , limn→∞

λk+n · · ·λk+1 λk.

It then follows that

γ(µk) ≤∞∑i=k

γ(λi) ≤∞∑i=k

2−i = 2−k+1 <∞;

thus µk ∈ Λ. Using the bound (A.13) it follows that for k ∈ N,

supt≥0

d(vk(µ−1

k (t) ∧ uk), vk+1(µ−1k+1(t) ∧ uk)

)= sup

t≥0d(vk(µ−1

k (t) ∧ uk), vk+1(λk µ−1k (t) ∧ uk)

)= sup

t≥0d (vk(t ∧ uk), vk+1(λk(t) ∧ uk))

= d (vk, vk+1, λk, uk)

≤ 2−k.

Since (E, d) is complete, it now follows that zk = vk µ−1k converges uniformly

on compact sets of t to some limit, which we denote z. As each zk has cadlagpaths, it follows that the limit also has cadlag paths and thus belongs toDE [0,∞). It only remains to show that vk converges to z in the Skorohodtopology. This follows since, γ(µ−1

k )→ 0 as k →∞ and for fixed T > 0,

limk→∞

sup0≤t≤T

d(vk µ−1

k (t), z(t))

= 0.

ut

A.6 Stopping Times

In this section, the notation Fot is used to emphasise that this filtration hasnot been augmented.

Definition A.16. A random variable T taking values in [0,∞) is said to bean Fot -stopping time, if for all t ≥ 0, the event T ≤ t ∈ Fot .

Page 303: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.6 Stopping Times 307

The subject of stopping times is too large to cover in any detail here. For moredetails see Rogers and Williams [248], or Dellacherie and Meyer [77, SectionIV.3].

Lemma A.17. A random variable T taking values in [0,∞) is an Fot+-stopping time if and only if T < t ∈ Fot for all t ≥ 0.

Proof. If T < t ∈ Fot for all t ≥ 0 then since

T ≤ t =⋂ε>0

T < t+ ε,

it follows that T ≤ t ∈ Fot+ε for any t ≥ 0 and ε > 0, thus T ≤ t ∈ Fot+.Thus T is an Fot+-stopping time.

Conversely if T is an Fot+-stopping time then since

T < t =∞⋃n=1

T ≤ t− 1/n

and each T ≤ t− 1/n ∈ Fo(t−1/n)+ ⊆ Fot , therefore T < t ∈ Fot . ut

Lemma A.18. Let Tn be a sequence of Fot -stopping times. Then T = infn Tn

is an Fot+-stopping time.

Proof. Write the event infn Tn < t asinfnTn < t

=⋂n

Tn < t.

By Lemma A.17 each term in this intersection belongs to Fot+, therefore sodoes the intersection which again by Lemma A.17 implies that infn Tn is aFot+-stopping time. ut

Lemma A.19. Let X be a real-valued, continuous, adapted process and a ∈ R.Define Ta , inft ≥ 0 : Xt ≥ a. Then Ta is a Ft-stopping time

Proof. The set ω : Xq(ω) ≥ a is Fq-measurable for any q ∈ Q+ as X isFt-adapted. Then using the path continuity of X,

Ta ≤ t =ω : inf

0≤s≤tXs(ω) ≥ a

=

⋃q∈Q+:0≤q≤t

ω : Xq(ω) ≥ a .

Thus Ta ≤ t may be written as a countable union of Ft-measurable setsand so is itself Ft-measurable. Hence Ta is a Ft-stopping time. ut

Theorem A.20 (Debut Theorem). Let X be a process defined in sometopological space (S) (with its associated Borel σ-algebra B(S)). Assume thatX is progressively measurable relative to a filtration Fot . Then for A ∈ B(S),the mapping DA = inft ≥ 0;Xt ∈ A defines an Ft-stopping time, where Ftis the augmentation of Fot .

Page 304: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

308 A Measure Theory

For a proof see Theorem IV.50 of Dellacherie and Meyer [77]. See alsoRogers and Williams [248, 249] for related results.

We require a technical result regarding the augmentation aspect of theusual conditions which is used during the innovations derivation of the filteringequations.

Lemma A.21. Let Gt be the filtration Fot ∨N where N contains all the P-nullsets. If T is a Gt-stopping time, then there exists a Fot+-stopping time T ′ suchthat T = T ′ P-a.s. In addition if L ∈ GT then there exists M ∈ FoT+ such thatL = M P-a.s.

Proof. Consider a stopping time of the form T = a1A +∞1Ac where a ∈ R+

and A ∈ Ga; in this case let B be an element of Foa such that the symmetricdifference A4B is a P-null set and define T ′ = a1B +∞1Bc . For a generalGt-stopping time T use a dyadic approximation. Let

S(n) ,∞∑k=0

k2−n1(k−1)2−n≤T<k2−n.

Clearly S(n) is GT -measurable and by construction S(n) ≥ T . Thus Sn is aGt-stopping time. But the stopping time S(n) takes values in a countable set,so

S(n) = infk

k2−n1Ak +∞IAck

,

where Ak , S(n) = k2−n. The result has already been proved for stoppingtimes of the form of those inside the infimum. As T = limn S

(n) = infn S(n),consequently the result holds for all Gt-stopping times. As a consequence ofthis limiting operation Fot+ appears instead of Fot .

To prove the second assertion, let L ∈ GT . By the first part since L ∈ G∞there exists L′ ∈ Fo∞ such that L = L′ P-a.s. Let V = T1L +∞1Lc a.s. Usingthe first part again, Fot+-stopping times V ′ and T ′ can be constructed such thatV = V ′ a.s. and T = T ′ a.s. Define M , L′ ∩ T ′ =∞ ∪ V ′ = T ′ <∞.Clearly M is FoT+-measurable and it follows that L = M P-a.s. ut

The following lemma is trivial, but worth stating to avoid confusion in themore complex proof which follows.

Lemma A.22. Let X ot be the unaugmented σ-algebra generated by a processX. Then for T an X ot -stopping time, if T (ω) ≤ t and Xs(ω) = Xs(ω′) fors ≤ t then T (ω′) ≤ t.

Proof. As T is a stopping time, T ≤ t ∈ X ot = σ(Xs : 0 ≤ s ≤ t) fromwhich the result follows. ut

Corollary A.23. Let X ot be the unaugmented σ-algebra generated by a processX. Then for T an X ot -stopping time, if T (ω) ≤ t and Xs(ω) = Xs(ω′) fors ≤ t then T (ω′) = T (ω).

Page 305: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.6 Stopping Times 309

Proof. Apply Lemma A.22 with t = T (ω) to conclude T (ω′) ≤ T (ω). Bysymmetry, T (ω) ≤ T (ω′) from which the result follows. ut

Lemma A.24. Let X ot be the unaugmented σ-algebra generated by a processX. Then for T a X ot -stopping time, for all t ≥ 0,

X ot∧T = σ Xs∧T : 0 ≤ s ≤ t .

Proof. Since T ∧ t is also a X ot -stopping time, it suffices to show

X oT = σ Xs∧T : s ≥ 0 .

The definition of the σ-algebra associated with a stopping time is that

X oT , B ∈ X o∞ : B ∩ T ≤ s ∈ X os for all s ≥ 0 .

If A ∈ FoT then it follows from this definition that

TA =

T if ω ∈ A,+∞ otherwise,

defines a X ot -stopping time. Conversely if for some set A, the time TA definedas above is a stopping time it follows that A ∈ X oT . Therefore we will haveestablished the result if we can show that A ∈ σXs∧T : s ≥ 0 is a necessaryand sufficient condition for TA to be a stopping time.

For the first implication, assume that TA is a X ot -stopping time. It isnecessary to show that A ∈ σXs∧T : s ≥ 0. Suppose that ω, ω′ ∈ Ω are suchthat Xs(ω) = Xs(ω′) for s ≤ T (ω). We will establish that A ∈ σXs∧T : s ≥0 if we show ω ∈ A implies that ω′ ∈ A.

If T (ω) =∞ then it is immediate that the trajectories Xs(ω) and Xs(ω′)are identical and hence ω′ ∈ A. Therefore consider T (ω) < ∞; if ω ∈ A thenTA(ω) = T (ω) and since it was assumed that TA is a X ot -stopping time thefact that Xs(ω) and Xs(ω′) agree for s ≤ TA(ω) implies by Corollary A.23that TA(ω′) = TA(ω) = T (ω) < ∞ and from TA(ω′) < ∞ it follows thatω′ ∈ A.

We must now prove the opposite implication; that is, given that T is astopping time and A ∈ σXs∧T : s ≥ 0, we must show that TA is a stoppingtime.

Given arbitrary t ≥ 0, if TA(ω) ≤ t and Xs(ω) = Xs(ω′) for s ≤ t it followsthat ω ∈ A (since TA(ω) < ∞). If T (ω) ≤ t and Xs(ω) = Xs(ω′) for s ≤ t,since T is a stopping time it follows from Corollary A.23 that T (ω) = T (ω′).Since we assumed A ∈ σXs∧T : s ≥ 0 it follows that ω′ ∈ A from which wededuce TA(ω) = T (ω) = T (ω′) = TA(ω′) whence

TA(ω) ≤ t,Xs(ω) = Xs(ω′) for all s ≤ t ⇒ TA(ω′) ≤ t,

which implies that TA(ω) ≤ t ∈ X ot and hence that TA is a X ot -stoppingtime. ut

Page 306: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

310 A Measure Theory

For many arguments it is required that the augmentation of the filtrationgenerated by a process be right continuous. While left continuity of samplepaths does imply left continuity of the filtration, right continuity (or evencontinuity) of the sample paths does not imply that the augmentation of thegenerated filtration is right continuous. This can be seen by considering theevent that a process has a local maximum at time t which may be in Xt+ butnot Xt (see the solution to Problem 7.1 (iii) in Chapter 2 of Karatzas andShreve [149]). The following proposition gives an important class of processfor which the right continuity does hold.

Proposition A.25. If X is a d-dimensional strong Markov process, then theaugmentation of the filtration generated by X is right continuous.

Proof. Denote by X o the (unaugmented) filtration generated by the processX. If 0 ≤ t0 < t1 < · · · < tn ≤ s < tn+1 · · · < tm, then by application of thestrong Markov property to the trivial X ot+-stopping time s,

P(Xt0 ∈ Γ0, . . . Xtm ∈ Γm | Fs+)= 1Xt0∈Γ0,...,Xtn∈ΓnP(Xtn+1 ∈ Γn+1, . . . , Xm ∈ Γm | Xs).

The right-hand side in this equation is clearly Xs-measurable and it is P-a.s.equal to P(Xt0 ∈ Γ0, . . . Xtm ∈ Γm | Fs+). As this holds for all cylinder sets,it follows that for all F ∈ X o∞ there exists a X os -measurable random variablewhich is P-a.s. equal to P(F | X os+).

Suppose that F ∈ X os+ ⊆ X o∞; then clearly P(F | X os+) = 1F . As abovethere exists a X os -measurable random variable 1F such that 1F = 1F a.s.Define the event G , ω : 1F (ω) = 1, then G ∈ X os and the events G andF differ by at most a null set (i.e. the symmetric difference G4F is null).Therefore F ∈ Xs, which establishes that X os+ ⊆ Xs for all s ≥ 0.

It is clear that Xs ⊆ Xs+. Now prove the converse implication. Supposethat F ∈ Xs+, which implies that for all n, F ∈ Xs+1/n. Therefore thereexists Gn ∈ X os+1/n such that F and Gn differ by a null set. Define G ,⋂∞m=1

⋃∞n=mGn. Then clearly G ∈ X os+ ⊆ Xs (by the result just proved). To

show that F ∈ Xs, it suffices to show that this G differs by at most a null setfrom F . Consider

G \ F ⊆

( ∞⋃n=1

Gn

)\ F =

∞⋃n=1

(Gn \ F ),

where the right-hand side is a countable union of null sets; thus G \F is null.Secondly

F \G = F ∩

( ∞⋂m=1

∞⋃n=m

Gn

)c= F ∩

( ∞⋃m=1

∞⋂n=m

Gcn

)

=∞⋃m=1

(F ∩

( ∞⋂n=m

Gcn

))⊆∞⋃m=1

F ∩Gcm =∞⋃m=1

(F \Gm),

Page 307: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.7 The Optional Projection 311

and again the right-hand side is a countable union of null sets, thus F \G isnull. Therefore F ∈ Xs, which implies that Xs+ ⊆ Xs; hence Xs = Xs+. ut

A.7 The Optional Projection

Proof of Theorem 2.7

Proof. The proof uses a monotone class argument (Theorem A.1). Let H bethe class of bounded measurable processes for which an optional projectionexists. The class of functions 1[s,t)1F , where s < t and F ∈ F can readily beseen to form a π-system which generates the measurable processes. Define Zto be a cadlag version of the martingale t 7→ E(1F | Ft) (which necessarilyexists since we have assumed that the usual conditions hold); then we mayset

o(1[s,t)1F

)(r, ω) = 1[s,t)(r)Zr(ω).

It is necessary to check that the defining condition (2.8) is satisfied. Let Tbe a stopping time. Then from Doob’s optional sampling theorem (which isapplicable in this case without restrictions on T , because the martingale Z isbounded and hence uniformly integrable) that

E[1F | FT ] = E[Z∞ | FT ] = ZT

whenceE[1F 1T<∞ | FT ] = ZT 1T<∞ P-a.s.

To apply the Monotone class theorem A.1 it is necessary to check that if Xn isa bounded monotone sequence inH with limit X then the optional projectionsoXn converge to the optional projection of X. Consider

Y , lim infn→∞

oXn1| lim infn→∞ oXn|<∞.

We must check that Y is the optional projection of X. Thanks to property (c)of conditional expectation the condition (2.8) is immediate. Consequently His a monotone class and thus by the monotone class theorem A.1 the optionalprojection exists for any bounded B×F-measurable process. To extend to theunbounded non-negative case consider X ∧ n and pass to the limit.

In order to verify that the projection is unique up to indistinguishability,consider two candidates for the optional projection Y and Z. For any stoppingtime T from (2.8) it follows that

YT 1T<∞ = ZT 1T<∞, P-a.s. (A.14)

Define F , (t, ω) : Zt(ω) 6= Yt(ω). Since both Z and Y are optional pro-cesses the set F is an optional subset of [0,∞)×Ω. Write π : [0,∞)×Ω → Ω for

Page 308: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

312 A Measure Theory

the canonical projection map π : (t, ω) 7→ ω. Now argue by contradiction. Sup-pose that Z and Y are not indistinguishable; this implies that P(π(F )) > 0. Bythe optional section theorem (see Dellacherie and Meyer [77, IV.84]) it followsthat given ε > 0 there exists a stopping time U such that when U(ω) < ∞,(U(ω), ω) ∈ F and P(U < ∞) ≥ P(π(F )) − ε. As it has been assumed thatP(π(F )) > 0, by choosing ε sufficiently small, P(U < ∞) > 0. It followsthat on some set of non-zero probability 1U<∞YU 6= 1U<∞ZU . But from(A.14) this may only hold on a null set, which is a contradiction. ThereforeP(π(F )) = 0 and it follows that Z and Y are indistinguishable. ut

Lemma A.26. If almost surely Xt ≥ 0 for all t ≥ 0 then oXt ≥ 0 for allt ≥ 0 almost surely.

Proof. Use the monotone class argument (Theorem A.1) in the proof of theexistence of the optional projection, noting that if F ∈ F then the cadlagversion of E[1F | Ft] is non-negative a.s. Alternatively use the optional sectiontheorem as in the proof of uniqueness. ut

A.7.1 Path Regularity

Introduce the following notation for a one-sided limit, in this case the rightlimit

lim sups↓↓t

xs , lim sups→t:s>t

xs = infv>t

supt<u≤v

xu,

a similar notation with s ↑↑ t being used for the left limit.The following lemma is required to establish right continuity. It can be

applied to the optional projection since being optional it must also be pro-gressively measurable.

Lemma A.27. Let X be a progressively measurable stochastic process takingvalues in R; then lim infs↓↓tXs and lim sups↓↓tXs are progressively measur-able.

Proof. It is sufficient to consider the case of lim sup. Let b ∈ R be such thatb > 0, then define

Xnt ,

supkb2−n≤s<(k+1)b2−n Xs if b(k − 1)2−n ≤ t < bk2−n, k < 2n,lim sups↓↓bXs if b(1− 2−n) ≤ t ≤ b.

For every t ≤ b, the supremum in the above definition is Fb-measurable sinceX is progressively measurable; thus the random variable Xn

t is Fb-measurable.For every ω ∈ Ω, Xn

t (ω) has trajectories which are right continuous fort ∈ [0, b]. Therefore Xn is B([0, b])⊗ Fb-measurable and is thus progressivelymeasurable. On [0, b] it is clear that lim supn→∞Xn

t = lim sups↓↓tXs, hencelim sups↓↓tXs is progressively measurable. ut

Page 309: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.7 The Optional Projection 313

In a similar vein, the following lemma is required in order to establish theexistence of left limits. For the left limits the result is stronger and the lim infand lim sup are previsible and thus also progressively measurable.

Lemma A.28. Let X be a progressively measurable stochastic process takingvalues in R; then lim infs↑↑tXs and lim sups↑↑tXs are previsible.

Proof. It suffices to consider lim sups↑↑tXt. Define

Xnt ,

∑k>0

1k2−n<t≤(k+1)2−n sup(k−1)2−n<s≤k2−n

Xs,

from this definition it is clear that Xnt is previsible as it is a sum of left

continuous, adapted, processes. But as lim supn→∞Xnt = lim sups↑↑tXs, it

follows that lim sups↑↑tXs is previsible. ut

Proof of Theorem 2.9

Proof. First observe that if Yt is bounded then oYt must also be bounded.There are three things which must be established; first, the existence of rightlimits; second, right continuity; and third the existence of left limits. Becauseof the difference between Lemmas A.27 and A.28 the cases of left and rightlimits are not identical. The first part of the proof establishes the existence ofright limits. It is sufficient to show that

P

(lim infs↓↓t

oYs < lim sups↓↓t

oYs for some t ∈ [0,∞)

)= 0. (A.15)

The following steps are familiar from the proof of Doob’s martingale reg-ularization theorem which is used to guarantee the existence of cadlag modi-fications of martingales. If the right limit does not exist at t ∈ [0,∞), that is,if lim infs↓↓t oYs < lim sups↓↓t oYs, then rationals a, b can be found such thatlim infs↓↓t oYs < a < b < lim sups↓↓t oYs. The event that the right limit doesnot exist has thus been decomposed into a countable union over the rationals:

ω : lim infs↓↓t

oYs(ω) < lim sups↓↓t

oYs(ω) for some t ∈ [0,∞)

=

⋃a,b∈Q

ω : lim inf

s↓↓toYs(ω) < a < b < lim sup

s↓↓t

oYs(ω) for some t ∈ [0,∞)

.

The lim sup and lim inf processes are progressively measurable by LemmaA.27, therefore for rationals a < b, the set

Ea,b ,

(t, ω) : lim inf

s↓↓toYs < a < b < lim sup

s↓↓t

oYs

,

Page 310: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

314 A Measure Theory

is progressively measurable.Now argue by contradiction; suppose that (A.15) is not true. Then from

the decomposition into a countable union, it follows that we can find a, b ∈ Qsuch that a < b and

0 < P

(lim infs↓↓t

oYs < a < b < lim sups↓↓t

oYs for some t ∈ [0,∞)

)= P(π(Ea,b)),

where the projection π is defined for A ⊂ [0,∞)×Ω, by π(A) = ω : (ω, t) ∈A. Define

Sa,b , inft ≥ 0 : (t, ω) ∈ Ea,b,

which is the debut of a progressively measurable set, and thus by the Debuttheorem (Theorem A.20 applied to the progressive process 1Ea,b(t, ω)) is astopping time (and hence optional). For a given ω, this stopping time Sa,b(ω)is the first time where lim infs↓↓t oYs and lim sups↓↓t oYs straddle the interval[a, b] and thus the right limit fails to exist at this point.

If ω ∈ π(Ea,b) then there exists t ∈ [0,∞) such that (t, ω) ∈ Ea,b and thisimplies t ≥ S(ω), whence S(ω) <∞. Thus, if P(π(Ea,b)) > 0 then this impliesP(Sa,b < ∞) > 0. Thus a consequence of the assumption that (A.15) is falseis that we can find a, b ∈ Q, with a < b such that P(Sa,b <∞) > 0. This willlead to a contradiction. For the remainder of the argument we can keep a andb fixed and consequently we write S in place of Sa,b.

DefineA0 , (t, ω) : S(ω) < t < S(ω) + 1, oYt(ω) < a;

it then follows that the projection π(A0) = S < ∞. Thus by the optionalsection theorem, since A0 is optional (S is a stopping time and oYt is a priorioptional), we can find a stopping time S0 such that on S0 <∞, (S0(ω), ω) ∈A0 and

P(S0 <∞) > (1− 1/2)P(S <∞).

Define

A1 , (t, ω) : S(ω) < t < (S(ω) + 1/2) ∧ S0(ω), oYt(ω) > b

and again by the optional section theorem we can find a stopping time S1

such that on S1 <∞, (S1(ω), ω) ∈ A1 and

P(S1 <∞) > (1− 1/22)P(S <∞).

We can carry on this construction inductively defining

A2k ,

(t, ω) : S(ω) < t < (S(ω) + 2−2k) ∧ S2k−1(ω), oYt(ω) < a,

and

A2k+1 ,

(t, ω) : S(ω) < t < (S(ω) + 2−(2k+1)) ∧ S2k(ω), oYt(ω) > b.

Page 311: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.7 The Optional Projection 315

We can construct stopping times using the optional section theorem such thatfor each i, on Si <∞, (Si(ω), ω) ∈ Ai, and such that

P(Si <∞) >(

1− 2−(i+1))

P(S <∞).

On the event Si < ∞ it is clear that Si < Si−1 and Si < S + 2−i. Alsoif S = ∞ it follows that Si = ∞ for all i, thus Si < ∞ implies S < ∞, soP(Si <∞, S <∞) = P(Si <∞), whence

P(Si =∞, S <∞) = P(S <∞)− P(Si <∞, S <∞)

= P(S <∞)− P(Si <∞) ≤ P(S <∞)/2i+1.

Thus∞∑i=0

P(Si =∞, S <∞) ≤ P(S <∞) ≤ 1 <∞,

so by the first Borel–Cantelli lemma the probability that infinitely many ofthe events Si =∞, S <∞ occur is zero. In other words for ω ∈ S <∞,we can find an index i0(ω) such that for i ≥ i0, the sequence Si converges ina decreasing fashion to S and oYSi < a for even i, and oYSi > b for odd i.

Define Ri = supj≥i Sj , which is a monotonically decreasing sequence. Al-most surely, Ri = Si for i sufficiently large, therefore limi→∞Ri = S a.s.and on the event S < ∞, for i sufficiently large, oYRi < a for i even, andoYRi > b for i odd. Set Ti = Ri ∧ N . On S < N, for j sufficiently largeSj < N , hence using the boundedness of oY to enable interchange of limitand expectation

lim supi→∞

E [oYT2i ] ≤ aP(S < N) + E[oYN1S≥N

],

lim infi→∞

E[oYT2i+1

]≥ bP(S < N) + E

[oYN1S≥N

].

But since Ti is bounded by N , from the definition of the optional projection(2.8) it is clear that

E[oYTi ] = E [E [YTi1Ti<∞ | FTi ]] = E[YTi ]. (A.16)

Thus, since Y has right limits, by an application of the bounded convergencetheorem E[YTi ]→ E[YT ], and so as i→∞

E[oYTi ]→ E[oYT ]. (A.17)

Thus

lim supi→∞

E[oYTi ] = lim supi→∞

E[oYT2i ] and lim infi→∞

E[oYTi ] = lim infi→∞

E[oYT2i+1 ],

so, if P(S < N) > 0 we see that since a < b, lim supi→∞ E[oYTi ] <lim infi→∞ E[oYTi ], which is a contradiction therefore P(S < N) = 0. As

Page 312: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

316 A Measure Theory

N was chosen arbitrarily, this implies that P(S = ∞) = 1 which is a con-tradiction, since we assumed P(S < ∞) > 0. Thus a.s., right limits of oYtexist.

Now we must show that oYt is right continuous. Let oYt+ be the processof right limits. As this process is adapted and right continuous, it follows thatit is optional. Consider for ε > 0, the set

Aε , (t, ω) : oYt(ω) ≥ oYt+(ω) + ε.

Suppose that P(π(Aε)) > 0, from which we deduce a contradiction. By theoptional section theorem, for δ > 0, we can find a stopping time S such thaton S <∞, (S(ω), ω) ∈ Aε, and P(S <∞) = P(π(Aε))− δ. We may choose δsuch that P(S <∞) > 0. Let Sn = S + 1/n, and bound these times by someN , which is chosen sufficiently large that P(S < N) > 0. Thus set Tn , Sn∧Nand T , S ∧N . Hence by bounded convergence

limn→∞

E[oYTn ] = E[oYN1S≥N ] + E[oYT+1S<N ], (A.18)

butE[oYT ] = E[oYN1S≥N ] + E[oYT 1S<N ]. (A.19)

As the right-hand sides of (A.18) and (A.19) are not equal we conclude thatlimn→∞ E(oYTn) 6= E(oYT ), which contradicts (A.17). Therefore P(π(Aε)) =0. The same argument can be applied to

Bε = (t, ω) : oYt(ω) ≤ oYt+(ω)− ε,

which allows us to conclude that P(π(Bε)) = 0; hence

P (oYt = oYt+, ∀t ∈ [0,∞)) = 1,

and thus, up to indistinguishability, the process oYt is right continuous.The existence of left limits is approached in a similar fashion; by Lemma

A.28, the processes lim infs↑↑t oYs and lim sups↑↑t oYs are previsible and henceoptional. For a, b ∈ Q we define

Fa,b ,

(t, ω) : lim inf

s↑↑toYs(ω) < a < b < lim sup

s↑↑t

oYs(ω)

.

We assume P(π(Fa,b)) > 0 and deduce a contradiction. Since Fa,b is optional,we may apply the optional section theorem to find an optional time T suchthat on T <∞, the point (T (ω), ω) ∈ Fa,b and with P(T <∞) > ε . Define

C0 , (t, ω) : t < T (ω), oYt < a,

which is itself optional; thus another application of the optional section the-orem constructs a stopping time R0 such that on R0 < ∞ (R(ω), ω) ∈ C0

and since R0 < T it is clear that P(R0 <∞) > ε.

Page 313: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.8 The Previsible Projection 317

Then define

C1 , (t, ω) : R0(ω) < t < T (ω), oYt > b,

which is optional and by the optional section theorem we can find a stoppingtime R1 such that on R1(ω) <∞, (R1(ω), ω) ∈ C1 and again R1 < T impliesthat P(R1 <∞) > ε. Proceed inductively.

We have constructed an increasing sequence of optional times Rk such thaton the event T < ∞, YRk < a for even k, and oYRk > b for odd k. DefineLk = Rk ∧ N for some N ; then this is an increasing sequence of boundedstopping times and clearly on T < N the limit limn E[oYLn ] does not exist.But since Ln is bounded, from (A.16) it follows that this limit must exist a.s.;hence P(T < N) = 0, which as N was arbitrary implies P(T <∞) = 0, whichis a contradiction. ut

The results used in the above proof are due to Doob and can be found ina very clear paper [82] which is discussed further in Benveniste [16]. Thesepapers work in the context of separable processes, which are processes whosegraph is the closure of the graph of the process with time restricted to somecountable set D. That is, for every t ∈ [0,∞) there exists a sequence ti ∈ Dsuch that ti → t and xti → xt. In these papers ‘rules of play’ disallow the useof the optional section theorem except when unavoidable and the above resultsare proved without its use. These results can be extended (with the addition ofextra conditions) to optionally separable processes, which are similarly defined,but the set D consists of a countable collection of stopping times and by anapplication of the optional section theorem it can be shown that every optionalprocess is optionally separable. The direct approach via the optional sectiontheorems is used in Dellacherie and Meyer [79].

A.8 The Previsible Projection

The optional projection (called the projection bien measurable in some earlyarticles) has been discussed extensively and is the projection which is of im-portance in the theory of filtering; a closely related concept is the previsible(or predictable) projection. Some of the very early theoretical papers makeuse of this projection. By convention we take F0− = F0.

Theorem A.29. Let X be a bounded measurable process; then there exists anoptional process oX called the previsible projection of X such that for everyprevisible stopping time T ,

pXT 1T<∞ = E[XT 1T<∞ | FT−

]. (A.20)

This process is unique up to indistinguishability, i.e. any processes which sat-isfy these conditions will be indistinguishable.

Page 314: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

318 A Measure Theory

Proof. As in the proof of Theorem 2.7, let F be a measurable set, and defineZt to be a cadlag version of the martingale E[1F | Ft]. Then we define theprevisible projection of 1(s,t]1F by

p(1(s,t]1F

)(r, ω) = 1(s,t](r)Zr−(ω).

We must verify that this satisfies (A.20); let T be a previsible stopping time.Then we can find a sequence Tn of stopping times such that Tn ≤ Tn+1 <T for all n. By Doob’s optional sampling theorem applied to the uniformlyintegrable martingale Z;

E[1F | FTn ] = E[Z∞ | FTn ] = ZTn ,

now pass to the limit as n → ∞, using the martingale convergence theorem(see Theorem B.1), and we get

ZT− = E [Z∞ | ∨∞n=1FTn ]

and from the definition of the σ-algebra of T− it follows that

ZT− = E[Z∞ | FT−].

To complete the proof, apply the monotone class theorem A.1 as in the prooffor the optional projection and use the same optional section theorem argu-ment for uniqueness. ut

The previsible and optional projection are actually very similar, as thefollowing theorem illustrates.

Theorem A.30. Let X be a bounded measurable process; then the set

(t, ω) : oXt(ω) 6= pXt(ω)

is a countable union of graphs of stopping times.

Proof. Again we use the monotone class argument. Consider the process1[s,t)(r)1F , from (2.8) and (A.20) the set of points of difference is

(t, ω) : Zt(ω) 6= Zt−(ω)

and since Z is a cadlag process we can define a sequence Tn of stopping timescorresponding to the nth discontinuity of Z, and by Lemma A.13 there are atmost countably many such discontinuities, therefore the points of differenceare contained in the countable union of the graphs of these Tns. ut

Page 315: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.9 The Optional Projection Without the Usual Conditions 319

A.9 The Optional Projection Without the UsualConditions

The proof of the optional projection theorem in Section A.7 depends cruciallyon the usual conditions to construct a cadlag version of a martingale, both theaugmentation by null sets and the right continuity of the filtration being used.The result can be proved on the uncompleted σ-algebra by making suitablemodifications to the process constructed by Theorem 2.7. These results werefirst established in Dellacherie and Meyer [78] and take their definitive formin [77], the latter approach being followed here. The proofs in this section areof a more advanced nature and make use of a number of non-trivial resultsabout σ-algebras of stopping times which are not proved here. These resultsand their proofs can be found in, for example, Rogers and Williams [249]. Asusual let Fot denote the unaugmented σ-algebra corresponding to Ft.

Lemma A.31. Let L ⊂ R+ ×Ω be such that

L =⋃n

(Sn(ω), ω) : ω ∈ Ω,

where the Sn are positive Fot -stopping times. We can find disjoint Fot -stoppingtimes Tn to replace the Sn such that

L =⋃n

(Tn(ω), ω) : ω ∈ Ω.

Proof. Define T1 = S1 and define

An , ω ∈ Ω : S1 6= Sn, S2 6= Sn, . . . , Sn−1 6= Sn.

Then it is clear that An ∈ FoSn . From the definition of this σ-algebra, if wedefine

Tn , Sn1An +∞1Acn ,

then this Tn is a stopping time. It is clear that this process may be continuedinductively. The disjointness of the Tns follows by construction. ut

Given this lemma the following result is useful when modifying a processas it allows us to break down the ‘bad’ set A of points in a useful fashion.

Lemma A.32. Let A be a subset of R+×Ω contained in a countable union ofgraphs of positive random variables then A = K∪L where K and L are disjointmeasurable sets such that K is contained in a disjoint union of graphs of op-tional times and L intersects the graph of any optional time on an evanescentset.†

† A set A ⊂ [0,∞) × Ω is evanescent if the projection π(A) = ω : ∃t ∈[0,∞) such that (ω, t) ∈ A is contained in a P-null set. Two indistinguishableprocesses differ on an evanescent set.

Page 316: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

320 A Measure Theory

Proof. Let V denote the set of all optional times. For Z a positive randomvariable define V (Z) = ∪T∈Vω : Z(ω) = T (ω); consequently there is auseful decomposition Z = Z ′ ∧ Z ′′, where

Z ′ = Z1V (Z) +∞1V (Z)c

Z ′′ = Z1V (Z)c +∞1V (Z).

From the definition of V (Z) the set (Z ′(ω), ω) : ω ∈ Ω is contained in thegraph of a countable number of optional times and if T is an optional timethen P(Z ′′ = T < ∞) = 0. Let the covering of A by a countable family ofgraphs of random variables be written

A ⊆∞⋃n=1

(Zn(ω), ω) : ω ∈ Ω

and form a decomposition of each random variable Zn = Z ′n ∧ Z ′′n as above.Clearly

⋃∞n=1(Z ′n(ω), ω) : ω ∈ Ω is also covered by a countable union of

graphs of optional times and by Lemma A.31 we can find a sequence of disjointoptional times Tn such that

∞⋃n=1

(Z ′n(ω), ω) : ω ∈ Ω ⊆∞⋃n=1

(Tn(ω), ω) : ω ∈ Ω.

Define

K = A ∩∞⋃n=1

(Z ′n(ω), ω) : ω ∈ Ω = A ∩⋃n

(Tn(ω), ω) : ω ∈ Ω

L = A ∩∞⋃n=1

(Z ′′(ω), ω) : ω ∈ Ω = A \⋃n

(Tn(ω), ω) : ω ∈ Ω.

Clearly A = K ∪L, hence this is a decomposition of A which has the requiredproperties. ut

Lemma A.33. For every Ft-optional process Xt there is an indistinguishableFot+-optional process.

Proof. Let T be an Ft-stopping time. Consider the process Xt = 1[0,T ), whichis cadlag and Ft-adapted, and hence Ft-optional. By Lemma A.21 there existsan Fot+-stopping time T ′ such that T = T ′ a.s. If we define X ′t = 1[0,T ′), thensince this process is cadlag and Fot+-adapted, it is clearly an Fot+-optionalprocess.

P(ω : X ′t(ω) = Xt(ω) ∀t) = P(T = T ′) = 1,

which implies that the processes X ′ and X are indistinguishable.We extend from processes of the form 1[0,T ) to the whole of O using the

monotone class framework (Theorem A.1) to extend to bounded optional pro-cesses, and use truncation to extended to the unbounded case. ut

Page 317: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.9 The Optional Projection Without the Usual Conditions 321

Lemma A.34. For every Ft-previsible process, there is an indistinguishableFot -previsible process.

Proof. We first show that if T is Ft-previsible; then there exists T ′ which isFot -previsible, such that T = T ′ a.s. As T = 0 ∈ F0−, we need only considerthe case where T > 0. Let Tn be a sequence of Ft-stopping times announcing†

T . By Lemma A.33 it is clear that we can find Rn an Fot+-stopping time suchthat Rn = Tn a.s. Define Ln , maxi=1,...,nRn; clearly this is an increasingsequence of stopping times. Let this sequence have limit L.

Define An , Ln = 0 ∪ Ln < L and define

Mn =

Ln ∧ n if ω ∈ An+∞ otherwise.

Since the sets An are decreasing, the stopping times Mn form an increasingsequence and the sequence Mn announces everywhere its limit T ′. This limitis strictly positive. Because T ′ is announced, T ′ is an Fot -previsible time andT = T ′ a.s. Finish the proof by a monotone class argument as in LemmaA.33. ut

The main result of this section is the following extension of the optionalprojection theorem which does not require the imposition of the usual condi-tions.

Theorem A.35. Given a stochastic process X, we can construct an Fot op-tional process Zt such that for every stopping time T ,

ZT 1T<∞ = E[ZT 1T<∞ | FT

], (A.21)

and this process is unique up to indistinguishability.

Proof. By the optional projection theorem 2.7 we can construct an Ft-optionalprocess Zt which satisfies (A.21). By Lemma A.33 we can find a process Ztwhich is indistinguishable from Zt but which is Fot+-optional. In general thisprocess Z will not be Fot -optional. We must therefore modify it.

Similarly using Theorem A.29, we can construct an Ft-previsible processYt, and using Lemma A.34, we can find an Fot -previsible process Yt which isindistinguishable from the process Yt.

Let H = (t, ω) : Yt(ω) 6= Zt(ω); then it follows by Theorem A.30, thatthis graph of differences is contained within a countable disjoint union ofgraphs of random variables. Thus by Lemma A.32 we may write H = K ∪ L† A stopping time T is called announceable if there exists an announcing sequence

(Tn)n≥1 for T . This means that for any n ≥ 1 and ω ∈ Ω, Tn(ω) ≤ Tn+1(ω) <T (ω) and Tn(ω) T (ω). A stopping time T is announceable if and only if it isprevisible. For details see Rogers and Williams [248].

Page 318: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

322 A Measure Theory

such that for T any Fot -stopping time P(ω : (T (ω), ω) ∈ L) = 0 and thereexists a sequence of Fot -stopping times Tn such that

K ⊂⋃n

(Tn(ω), ω) : ω ∈ Ω.

For each n let Zn be a version of E[XTn1Tn<∞ | FoTn ]; then we can define

Zt(ω) ,

Yt(ω) if (t, ω) /∈

⋃n(Tn(ω), ω) : ω ∈ Ω

Zn(ω) if (t, ω) ∈ (Tn(ω), ω) : ω ∈ Ω.(A.22)

It is immediate that this Zt is Fot -optional. Let us now show that it satisfies(A.21). Let T be an Fot -optional time and let A ∈ FoT . Set An = A∩T = Tn;thus A ∈ FoTn . Let B = A \

⋃nAn and thus B ∈ FoT .

From the definition (A.22),

ZT 1An1T<∞ = Zn1An1Tn<∞ = 1AnE[1Tn<∞XTn | FoTn ]= E[XTn1An1Tn<∞ | FoTn ] = E[XT 1An1T<∞ | FoT ].

Consequently

E[1AnZT 1T<∞] = E[1AnE[1T<∞XT | FoT ]] = E[1AnXT 1T<∞].

So on An the conditions are satisfied. Now consider B, on which a.s. T 6= Tnfor all n; hence (T (ω), ω) /∈ L. Since P((T (ω), ω) ∈ K) = 0, it follows that a.s.(T (ω), ω) /∈ H. Recalling the definition of H this implies that Yt(ω) = ζt(ω)a.s.; from the Definition A.22 on B, ZT = YT , thus

E[1BZT ] = E[1BζT ] = E[1BE[XT | FoT+)] = E[1BXT ].

Thus on An for each n and on B the process Z is an optional projection ofX. The uniqueness argument using the optional section theorem is exactlyanalogous to that used in the proof of Theorem 2.7. ut

A.10 Convergence of Measure-valued Random Variables

Let (Ω,F ,P) be a probability space and let (µn)∞n=1 be a sequence of randommeasures, µn : Ω → M(S) and µ : Ω → M(S) be another measure-valuedrandom variable. In the following we define two types of convergence for se-quences of measure-valued random variables:

1. limn→∞ E [|µnf − µf |] = 0 for all f ∈ Cb(S).2. limn→∞ µn = µ, P-a.s.

We call the first type of convergence convergence in expectation. If thereexists an integrable random variable w : Ω → R such that µn(1) ≤ w for all n,then limn→∞ µn = µ, P-a.s., implies that µn converged to µ in expectation bythe dominated convergence theorem. The extra condition is satisfied if (µn)∞n=1

is a sequence of random probability measures, since in this case, µn(1) = 1for all n. We also have the following.

Page 319: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.10 Convergence of Measure-valued Random Variables 323

Remark A.36. If µn converges in expectation to µ, then there exist sequencesn(m) such that limm→∞ µn(m) = µ, P-a.s.

Proof. SinceM(S) is isomorphic to (0,∞)×P(S), with the isomorphism beinggiven by

ν ∈M(S) 7→ (ν(1), ν/ν(1)) ∈ (0,∞)× P(S),

it follows from Theorem 2.18 that there exists a countable convergence deter-mining set of functions†

M , ϕ0, ϕ1, ϕ2, . . ., (A.23)

where ϕ0 is the constant function equal to 1 everywhere and ϕi ∈ Cb(S) forany i > 0. Since

limn→∞

E [|µnf − µf |] = 0

for all f ∈ ϕ0, ϕ1, ϕ2, . . . and the set ϕ0, ϕ1, ϕ2, . . . is countable, one canfind a subsequence n(m) such that, with probability one, limm→∞ µn(m)ϕi =µϕi for all i ≥ 0, hence the claim. ut

If a suitable bound on the rate of convergence for E [|µnf − µf |] is known,then the sequence n(m) can be specified explicitly. For instance we have thefollowing.

Remark A.37. Assume that there exists a countable convergence determiningset M such that, for any f ∈M,

E [|µnf − µf |] ≤ cf√n,

where cf is a positive constant independent of n, then limm→∞ µm3

= µ,P-a.s.

Proof. By Fatou’s lemma

E

[ ∞∑m=1

∣∣∣µm3f − µf

∣∣∣] ≤ limn→∞

n∑m=1

E[∣∣∣µm3

f − µf∣∣∣]

≤ cf∞∑m=1

1m3/2

<∞.

Hence∞∑m=1

∣∣∣µm3f − µf

∣∣∣ <∞ P-a.s.,

† Recall that M is a convergence determining set if, for any sequence of fi-nite measures νn, n = 1, 2, . . . and ν being another finite measure for whichlimn→∞ νnf = νf for all f ∈M, it follows limn→∞ νn = ν.

Page 320: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

324 A Measure Theory

thereforelimm→∞

µm3f = µf for any f ∈M.

Since M is countable and convergence determining, it also follows thatlimm→∞ µm

3= µ, P-a.s. ut

Let d : P(S)×P(S)→ [0,∞) be the metric defined in Theorem 2.19; thatis, for µ, ν ∈ P(S),

d(µ, ν) =∞∑i=1

|µϕi − νϕi|2i

,

where ϕ1,ϕ2, . . . are elements of Cb(S) such that ‖ϕi‖∞ = 1 and let ϕ0 = 1.We can extend d to a metric on M(S) as follows.

dM :M(S)×M(S)→ [0,∞), d(µ, ν) ,∞∑i=0

12i|µϕi − νϕi|. (A.24)

The careful reader should check that dM is a metric and that indeed dMinduces the weak topology on M(S). Using dM, the almost sure convergence2. is equivalent to

2′. limn→∞ dM(µn, µ) = 0, P-a.s.

If there exists an integrable random variable w : Ω → R such that µn(1) ≤w for all n, then similarly, (1) implies

1′. limn→∞ E [dM(µn, µ)] = 0.

However, a stronger condition (such as tightness) must be imposed in orderfor condition (1) to be equivalent to condition (1′).

It is usually the case that convergence in expectation is easier to estab-lish than almost sure convergence. However, if we have control on the highermoments of the error variables µnf − µf then we can deduce the almost sureconvergence of µn to µ. The following remark shows how this can be achievedand is used repeatedly in Chapters 8, 9 and 10.

Remark A.38. i. Assume that there exists a positive constant p > 1 and acountable convergence determining set M such that, for any f ∈ M, wehave

E[|µnf − µf |2p

]≤ cfnp,

where cf is a positive constant independent of n. Then, for any ε ∈ (0, 1/2−1/(2p)) there exists a positive random variable cf,ε almost surely finite suchthat

|µnf − µf | ≤ cf,εnε

.

In particular, limn→∞ µn = µ, P-a.s.

Page 321: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.11 Gronwall’s Lemma 325

ii. Similarly, assume that there exists a positive constant p > 1 and a countableconvergence determining set M such that

E[dM(µn, µ)2p

]≤ c

np,

where dM is the metric defined in (A.24) and c is a positive constant in-dependent of n. Then, for any ε ∈ (0, 1/2− 1/(2p)) there exists a positiverandom variable cε almost surely finite such that

|µnf − µf | ≤ cεnε, P-a.s.

In particular, limn→∞ µn = µ, P-a.s.

Proof. As in the proof of Remark A.37,

E

[ ∞∑n=1

n2εp|µnf − µf |2p]≤ cf

∞∑m=1

1np−2εp

<∞,

since p− 2εp > 1. Let cf,ε be the random variable

cf,ε =

( ∞∑n=1

n2εp|µnf − µf |2p)1/2p

.

As (cf,ε)2p is integrable, cf,ε is almost surely finite and

nε|µnf − µf | ≤ cf,ε.

Therefore limn→∞ µnf = µf for any f ∈ M. Again, since M is countableand convergence determining, it also follows that limn→∞ µn = µ, P-a.s. Part(ii) of the remark follows in a similar manner. ut

A.11 Gronwall’s Lemma

An important and frequently used result in the theory of stochastic differentialequations is Gronwall’s lemma.

Lemma A.39 (Gronwall). Let x, y and z be measurable non-negative func-tions on the real numbers. If y is bounded and z is integrable on [0, T ] forsome T ≥ 0, and for all 0 ≤ t ≤ T ,

xt ≤ zt +∫ t

0

xsys ds, (A.25)

then for all 0 ≤ t ≤ T ,

xt ≤ zt +∫ t

0

zsysexp(∫ t

s

yr dr)

ds.

Page 322: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

326 A Measure Theory

Proof. Multiplying both sides of the inequality (A.25) by yt exp(−∫ t

0ys ds

)yields

xtyt exp(−∫ t

0

ys ds)−(∫ t

0

xsys ds)yt exp

(−∫ t

0

ys ds)

≤ ztyt exp(−∫ t

0

ys ds).

The left-hand side can be written as the derivative of a product,

ddt

[(∫ t

0

xsys ds)

exp(−∫ t

0

ys ds)]≤ ztyt exp

(−∫ t

0

ys ds),

which can be integrated to give(∫ t

0

xsys ds)

exp(−∫ t

0

ysds)≤∫ t

0

zsys exp(−∫ s

0

yr dr)

ds,

or equivalently ∫ t

0

xsys ds ≤∫ t

0

zsys exp(∫ t

s

yr dr)

ds.

Combining this with the original equation (A.25) gives the desired result. ut

Corollary A.40. If x is a real-valued function such that for all t ≥ 0,

xt ≤ A+B

∫ t

0

xs ds,

then for all t ≥ 0,xt ≤ AeBt.

Proof. We have for t ≥ 0,

xt ≤ A+∫ t

0

ABeB(t−s) ds

≤ A+ABeBt(e−tB − 1)/(−B) = AeBt.

ut

A.12 Explicit Construction of the UnderlyingSample Space for the Stochastic Filtering Problem

Let (S, d) be a complete separable metric space (a Polish space) and Ω1 bethe space of S-valued continuous functions defined on [0,∞), endowed with

Page 323: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

A.12 Explicit Construction of Sample Space 327

the topology of uniform convergence on compact intervals and with the Borelσ-algebra associated denoted with F1,

Ω1 = C([0,∞),S), F1 = B(Ω1). (A.26)

Let X be an S-valued process defined on this space; Xt(ω1) = ω1(t), ω1 ∈Ω1. We observe that Xt is measurable with respect to the σ-algebra F1 andconsider the filtration associated with the process X,

F1t = σ(Xs, s ∈ [0, t]). (A.27)

Let A : Cb(S) → Cb(S) be an unbounded operator with domain D(A) with1 ∈ D(A) and A1 = 0 and let P1 be a probability measure which is a solutionof the martingale problem associated with the infinitesimal generator A andthe initial distribution π0 ∈ P(S), i.e., under P1, the distribution of X0 is π0

and

Mϕt = ϕ(Xt)− ϕ(X0)−

∫ t

0

Aϕ(Xs) ds, F1t , 0 ≤ t <∞, (A.28)

is a martingale for any ϕ ∈ D(A). Let also Ω2 be defined similarly to Ω1, butwith S = Rm. Hence

Ω2 = C([0,∞),Rm), F2 = B(Ω2). (A.29)

We consider also V to be the canonical process in Ω2, (i.e. Vt(ω2) = ω2(t),ω2 ∈ Ω2) and P2 to be a probability measure such that V is an m-dimensionalstandard Brownian motion on (Ω2,F2) with respect to it. We consider nowthe following.

Ω , Ω1 ×Ω2,

F ′ , F1 ⊗F2,

P , P1 ⊗ P2,

N , B ⊂ Ω : B ⊂ A, A ∈ F , P(A) = 0F , F ′ ∨N .

So (Ω,F ,P) is a complete probability space and, under P, X and V aretwo independent processes. They can be viewed as processes on the productspace (Ω,F ,P) in the usual way: as projections onto their original spaces ofdefinition. If W is the canonical process on Ω, then

W (t) = ω(t) = (ω1(t), ω2(t))

X = p1(ω) where p1 : Ω → Ω1, p1(ω) = ω1

V = p2(ω) where p2 : Ω → Ω2, p2(ω) = ω2.

Mϕt is also a martingale with respect to the larger filtration Ft, where

Page 324: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

328 A Measure Theory

Ft = σ(Xs, Vs, s ∈ [0, t]) ∨N .

Let h : S→ Rm be a Borel-measurable function with the property that

P

(∫ T

0

‖h(Xs)‖ ds <∞

)= 1 for all T > 0,

Finally let Y be the following stochastic process (usually called the observationprocess)

Yt =∫ t

0

h(s,Xs) ds+ Vt, t ≥ 0.

Page 325: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B

Stochastic Analysis

B.1 Martingale Theory in Continuous Time

The subject of martingale theory is too large to cover in an appendix suchas this. There are many useful references, for example, Rogers and Williams[248] or Doob [81].

Theorem B.1. If M = Mt, t ≥ 0 is a right continuous martingale boundedin Lp for p ≥ 1, that is, supt≥0 E[|Mt|p] < ∞, then there exists an Lp-integrable random variable M∞ such that Mt →M∞ almost surely as t→∞.Furthermore,

1. If M is bounded in Lp for p > 1, then Mt →M∞ in Lp as t→∞.2. If M is bounded in L1 and Mt, t ≥ 0 is uniformly integrable thenMt →M∞ in L1 as t→∞.

If either condition (1) or (2) holds then the extended process Mt, t ∈ [0,∞]is a martingale.

For a proof see Theorem 1.5 of Chung and Williams [53].The following lemma provides a very useful test for identifying martingales.

Lemma B.2. Let M = Mt, t ≥ 0 be a cadlag adapted process such that foreach bounded stopping time T , E[|MT |] < ∞ and E[MT ] = E[M0] then M isa martingale.

Proof. For s < t and A ∈ Fs define

T (ω) ,

s if ω ∈ A,t if ω ∈ Ac.

Then T is a stopping time and

E[M0] = E[MT ] = E[Ms1A] + E[Mt1Ac ],

Page 326: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

330 B Stochastic Analysis

and trivially for the stopping time t,

E[M0] = E[Mt] = E[Mt1A] + E[Mt1Ac ],

so E[Mt1A] = E[Ms1A] which implies that Ms = E[Mt | Fs] a.s. which to-gether with the integrability condition implies M is a martingale. ut

By a straightforward change to this proof the following corollary may beestablished.

Corollary B.3. Let Mt, t ≥ 0 be a cadlag adapted process such that foreach stopping time (potentially infinite) T , E[|MT |] <∞ and E[MT ] = E[M0]then M is a uniformly integrable martingale.

Definition B.4. Let M be a stochastic process. If M0 is F0-measurable andthere exists an increasing sequence Tn of stopping times such that Tn → ∞a.s. and such that

MTn = Mt∧Tn −M0, t ≥ 0

is a Ft-adapted martingale for each n ∈ N, then M is called a local martingaleand the sequence Tn is called a reducing sequence for the local martingale M .

The initial condition M0 is treated separately to avoid imposing integrabilityconditions on M0.

B.2 Ito Integral

The stochastic integrals which arise in this book are the integrals of stochasticprocesses with respect to continuous local martingales. The following sectioncontains a very brief overview of the construction of the Ito integral in thiscontext and the necessary conditions on the integrands for the integral to bewell defined.

The results are presented starting from the previsible integrands, sincein the general theory of stochastic integration these form the natural classof integrators. The results then extend in the case of continuous martingaleintegrators to integrands in the class of progressively measurable processes andif the quadratic variation of the continuous martingale is absolutely continuouswith respect to Lebesgue measure (as for example in the case of integrals withrespect to Brownian motion) then this extends further to all adapted, jointlymeasurable processes. It is also possible to construct directly the stochasticintegral with a continuous martingale integrator on the space of progressivelymeasurable processes (this approach is followed in e.g. Ethier and Kurtz [95]).

There are numerous references which describe the material in this sectionin much greater detail; examples include Chung and Williams [53], Karatzasand Shreve [149], Protter [247] and Dellacherie and Meyer [79].

Page 327: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 331

Definition B.5. The previsible (predictable) σ-algebra denoted P is the σ-algebra of subsets of [0,∞)×Ω generated by left continuous processes valuedin R; that is, it is the smallest σ-algebra with respect to which all left con-tinuous processes are measurable. A process is said to be previsible if it isP-measurable.

Lemma B.6. Let A be the ring† of subsets of [0,∞)×Ω generated by the setsof the form (s, t]×A where A ∈ Fs and 0 ≤ s < t and the sets 0 ×A forA ∈ F0. Then σ(A) = P.

Proof. It suffices to show that any adapted left continuous process (as a gen-erator of P) can be approximated by finite linear combinations of indicatorfunctions of elements of A. Let H be a bounded adapted left continuous pro-cess; define

Ht = limk→∞

limn→∞

nk∑i=2

H(i−1)/n1((i−1)/n,i/n](t).

As Ht is adapted it follows that H(i−1)/n ∈ F(i−1)/n, thus each term in thesum is A-measurable, and therefore by linearity so is the whole sum. ut

Definition B.7. Define the vector space of elementary function E to be thespace of finite linear combinations of indicator functions of elements of A.

Definition B.8. For the indicator function X = 1(s,t]×A for A ∈ Fs, whichan element of E, we can define the stochastic integral∫ ∞

0

Xr dMr , 1A(Mt −Ms).

For X = 10×A where A ∈ F0, define the integral to be identically zero. Thisdefinition can be extended by linearity to the space of elementary functions E.Further define the integral between 0 and t by∫ t

0

Xr dMr ,∫ ∞

0

1[0,t](r)Xr dMr.

Lemma B.9. If M is a martingale and X ∈ E then∫ t

0Xr dMr is a Ft-adapted

martingale.

Proof. Consider Xt = 1A1(r,s](t) where A ∈ Fr. From Definition B.8,∫ t

0

Xp dMp =∫ ∞

0

1[0,t](p)Xp dMp = 1A(Ms∧t −Mr∧t),

and hence as M is a martingale and A ∈ Fr, then by considering separatelythe cases 0 ≤ p ≤ r, r < p ≤ s and p > s it follows that† A ring is a class of subsets closed under finite unions and set differences A \ B

and which contains the empty set.

Page 328: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

332 B Stochastic Analysis

E[∫ t

0

Xr dMr

∣∣∣∣ Fp] = E [1A(Ms∧t −Mr∧t) | Fp]

= 1AE(Ms∧p −Mr∧p) =∫ p

0

Xs dMs.

By linearity, this result extends to X ∈ E . ut

B.2.1 Quadratic Variation

The total variation is the variation which is used in the construction of theusual Lebesgue–Stieltjes integral. This cannot be used to define a non-trivialstochastic integral, as any continuous local martingale of finite variation isindistinguishable from zero.

Definition B.10. The quadratic variation process† 〈M〉t of a continuoussquare integrable martingale M is a continuous increasing process At startingfrom zero such that M2

t −At is a martingale.

Theorem B.11. If M is a continuous square integrable martingale then thequadratic variation process 〈M〉t exists and is unique.

The following proof is based on Theorem 4.3 of Chung and Williams [53]who attribute the argument to M. J. Sharpe.

Proof. Without loss of generality consider a martingale starting from zero.The result is first proved for a martingale which is bounded by C. For givenn ∈ N, define tnj , j2−n and tnj , t ∧ tnj for j ∈ N and

Snt ,∞∑j=0

(Mtnj+1

−Mtnj

)2

.

By rearrangement of terms in the summations

M2t =

∞∑k=0

(M2tnk+1−M2

tnk

)= 2

∞∑k=0

Mtnk

(Mtnk+1

−Mtnk

)+∞∑k=0

(Mtnk+1

−Mtnk

)2

.

Therefore

Snt = M2t − 2

∞∑k=0

Mtnk

(Mtnk+1

−Mtnk

). (B.1)

† Technically, if we were to study discontinuous processes what is being constructedhere should be denoted [M ]t. The process 〈M〉t, when it exists, is the dual pre-visible projection of [M ]t. In the continuous case, the two processes coincide, andhistorical precedent makes 〈M〉t the more common notation.

Page 329: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 333

For fixed n and t the summation in (B.1) contains a finite number of non zeroterms each of which is a continuous martingale. It therefore follows that theSnt −M2

t is a continuous martingale for each n.It is now necessary to show that as n → ∞, for fixed t, the sequence

Snt , n ∈ N is a Cauchy sequence and therefore converges in L2. If we considerfixed m < n and for notational convenience write tj for tnj , then it is possibleto relate the points on the two dyadic meshes by setting t′j = 2−m[tj2m] andt′j , t∧ t′j ; that is, t′j is the closest point on the coarser mesh to the left of tj .It follows from (B.1) that

Snt − Smt = −2[2nt]∑j=0

(Mtj−Mt′j

)(Mtj+1

−Mtj

). (B.2)

Define Zj , Mtj−Mt′j

; as t′j ≤ tj it follows that Zj is Ftj -measurable. Forj < k since Zj(Mtj+1

−Mtj)Zk is Ftk -measurable it follows that

E[Zj

(Mtj+1

−Mtj

)Zk

(Mtk+1

−Mtk

)]= 0. (B.3)

Hence using (B.3) and the Cauchy–Schwartz inequality

E[(Snt − Smt )2

]= 4E

[2nt]∑j=0

Z2j

(Mtj+1

−Mtj

)2

≤ 4E

sup0≤r≤s≤ts−r<2−m

(Mr −Ms)2

[2nt]∑j=0

(Mtj+1

−Mtj

)2

≤ 4

√√√√√√E

sup

0≤r≤s≤ts−r<2−m

(Mr −Ms)2

2

×

√√√√√√E

[2nt]∑j=0

(Mtj+1

−Mtj

)2

2.

The first term tends to zero using the fact that M being continuous is uni-formly continuous on the bounded time interval [0, t]. It remains to show thatthe second term is bounded. Write aj , (Mtj+1

−Mtj)2, for j ∈ N; then

Page 330: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

334 B Stochastic Analysis

E

[2nt]∑j=0

(Mtj+1

−Mtj

)2

2 = E

[2nt]∑j=0

aj

2

= E

[2nt]∑j=0

a2j + 2

[2nt]∑j=0

aj

[2nt]∑k=j+1

ak

= E

[2nt]∑j=0

a2j

+ 2E

[2nt]∑j=0

aj E

[2nt]∑k=j+1

ak

∣∣∣∣∣∣ Ftj+1

.It is clear that since the ajs are non-negative and M is bounded by C that

[2nt]∑j=0

a2j ≤ max

l=0,...,[2nt]al

[2nt]∑j=0

aj ≤ 4C2

[2nt]∑j=0

aj

and

E

[2nt]∑k=j+1

ak

∣∣∣∣∣∣ Ftj+1

=∞∑

k=j+1

E[(Mtk+1

−Mtk

)2

| Ftj+1

]

=∞∑

k=j+1

E[M2tk+1−M2

tk| Ftj+1

]= E

[M2t −M2

tj+1| Ftj+1

]≤ C2.

From these two bounds

E

[2nt]∑j=0

(Mtj+1

−Mtj

)2

2 ≤ (4C2 + 2C2)E

[2nt]∑j=0

aj

= 6C2E

[M2t

]<∞.

As this bound holds uniformly in n, m, as n and m → ∞ it follows thatSnt − Smt → 0 in the L2 sense and hence the sequence Snt , n ∈ N convergesin L2 to a limit which we denote St. As the martingale property is preservedby L2 limits, it follows that M2

t − St, t ≥ 0 is a martingale.It is necessary to show that St is increasing. Let s < t,

St − Ss = limn→∞

(Snt − Sns ) in L2.

Then writing k , infj : tj > s,

Snt − Sns =∑tj>s

(Mtj+1

−Mtj

)2

+(Mtk−Mtk−1

)2

−(Ms −Mtk−1

)2

.

Page 331: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 335

Clearly∣∣∣∣(Mtk−Mtk−1

)2

−(Ms −Mtk−1

)2∣∣∣∣ ≤ 2 sup

0≤r≤s≤ts−r<2−m

(Mr −Ms)2,

where the bound on the right-hand side tends to zero in L2 as n → ∞.Therefore in L2

St − Ss = limn→∞

∑tj>s

(Mtj+1

−Mtj

)2

and hence St − Ss ≥ 0 almost surely, so the process S is a.s. increasing.It remains to show that a version of St can be chosen which is almost

surely continuous. By Doob’s L2-inequality applied to the martingale (B.2) itfollows that

E[supt≤a|Snt − Smt |2

]≤ 4E

[(Sna − Sma )2

];

thus a suitable subsequence nk can be chosen such that Snkt converges a.s.uniformly on compact time intervals to a limit S which from the continuityof M must be continuous a.s.

Uniqueness follows from the result that a continuous local martingale offinite variation is everywhere zero. Suppose the process A in the above def-inition were not unique. That is, suppose that also for some Bt continuousincreasing from zero, M2

t −Bt is a martingale. Then as M2t −At is also a mar-

tingale, by subtracting these two equations we get that At−Bt is a martingale,null at zero. It clearly must have finite variation, and hence be zero.

To extend to the general case where the martingale M is not bounded usea sequence of stopping times

Tn , inft ≥ 0 : |Mt| > n;

then MTnt , t ≥ 0 is a bounded martingale to which the proof can be applied

to construct 〈MTn〉. By uniqueness it follows that 〈MTn〉 and 〈MTn+1〉 agreeon [0, Tn] so a process 〈M〉 may be defined. ut

Definition B.12. Define a measure on ([0,∞) × Ω,P) in terms of thequadratic variation of M via

µM (A) , E[∫ ∞

0

1A(s, ω) d〈M〉s]. (B.4)

In terms of this measure we can define an associated norm on a P-measurableprocess X via

‖X‖M ,∫

[0,∞)×ΩX2 dµM . (B.5)

Page 332: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

336 B Stochastic Analysis

This norm can be written using (B.4) and (B.5) more simply as

‖X‖M = E[∫ ∞

0

X2s d〈M〉s

].

Definition B.13. Define L2P , X ∈ P : ‖X‖M <∞.

This space L2P with associated norm ‖ · ‖M is a Banach space. Denote by

L2P the space of equivalence classes of elements of L2

P , where we consider theequivalence class of an element X to be all those elements Y ∈ L2

P whichsatisfy ‖X − Y ‖M = 0.

Lemma B.14. The space of bounded elements of E, which we denote E isdense in the subspace of bounded functions in L2

P .

Proof. This is a classical monotone class theorem proof which explains therequirement to work within spaces of bounded functions. Define

C =H ∈ P : H is bounded, ∀ε > 0 ∃J ∈ E : ‖H − J‖M < ε

.

It is clear that E ⊂ C. Thus it also follows that the constant function one isincluded in C. The fact that C is a vector space is immediate. It remains toverify that if Hn ↑ H where Hn ∈ C with H bounded that this implies H ∈ C.

Fix ε > 0. By the bounded convergence theorem for Stieltjes integrals, itfollows that ‖Hn − H‖M → 0 as n → ∞; thus we can find N such that forn ≥ N , ‖Hn−H‖M < ε/2. As HN ∈ C, it follows that there exists J ∈ E suchthat ‖J −HN‖M < ε/2. Thus by the triangle inequality ‖H − J‖M ≤ ‖H −HN‖M+‖HN−J‖M < ε. Hence by the monotone class theorem σ(E) ⊂ C. ut

Lemma B.15. For X ∈ E it follows that

E

[(∫ ∞0

Xr dMr

)2]

= ‖X‖M .

Proof. Consider X = 1(s,t]×A where A ∈ Fs and s < t. Then

E

[(∫ ∞0

Xt dMr

)2]

= E

[(∫ ∞0

1(s,t](r)1A dMr

)2]

= E[(Mt −Ms)21A

]= E

[1A(M2t − 2MtMs +M2

s

)]= E

[1A(M2t +M2

s

)]− 2E [1AE [MtMs | Fs]]

= E[1A(M2t −M2

s

)].

Then from the definition of µM it follows that

µM ((s, t]×A) = E[1A(〈M〉t − 〈M〉s)].

Page 333: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 337

We know M2t − 〈M〉t is a local martingale, so it follows that

µM ((s, t]×A) = E

[(∫ ∞0

1(s,t](r)1A dMr

)2]

and by linearity this extends to functions in E . ut

As a consequence of Lemma B.14 it follows that given any bounded X ∈L2P we can construct an approximating sequence Xn ∈ E such that ‖Xn −

X‖M → 0 as n → ∞. Using Lemma B.15 it follows that∫∞

0Xns dMs is a

Cauchy sequence in the L2 sense; thus we can make the following definition.

Definition B.16. For X ∈ L2P we may define the Ito integral in the L2 sense

through the isometry

E

[(∫ ∞0

Xr dMr

)2]

= ‖X‖M . (B.6)

We must check that this extension of the stochastic integral is well defined.That is, consider another approximating sequence Yn → X; we must show thatthis converges to the same limit as the sequence Xn considered previously, butthis is immediate from the isometry.

Remark B.17. From the above definition of the stochastic integral in an L2

sense as a limit of approximations∫∞

0Xnr dMr, it follows that since conver-

gence in L2 implies convergence in probability we can also define the extensionof the stochastic integral as a limit in probability. By a standard result, thereexists a subsequence nk such that

∫∞0Xnkr dMr converges a.s. as k → ∞. It

might appear that this would lead to a pathwise extension (i.e. a definitionfor each ω). However, this a.s. limit is not well defined: different choices ofapproximating sequence can give rise to limits which differ on (potentiallydifferent) null sets. As there are an uncountable number of possible approx-imating sequences the union of these null sets may not be null and thus thelimit not well defined.

The following theorem finds numerous applications throughout the book,usually to show that the expectation of a particular stochastic integral termis 0.

Theorem B.18. If X ∈ L2P and M is a square integrable martingale then∫ t

0Xs dMs is a martingale.

Proof. Let Xn ∈ E be sequence converging to X in the ‖ · ‖M norm; then byLemma B.9 each

∫ t0Xns dMs is a martingale. By the Ito isometry

∫ t0Xns dMs

converges to∫ t

0Xs dMs in L2 and the martingale property is preserved by L2

limits. ut

Page 334: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

338 B Stochastic Analysis

B.2.2 Continuous Integrator

The foregoing arguments cannot be used to extend the definition of thestochastic integral to integrands outside of the class of previsible processes.For example, the previsible processes do not form a dense set in the spaceof progressively measurable processes so approximation arguments can notbe used to extend the definition to progressively measurable integrands. Theapproach taken here is based on Chung and Williams [53].

Let µM be a measure on [0,∞)×Ω which is an extension of µM (that isµM and µM agree on P and µM is defined on a larger σ-algebra than P).

Given a process X which is B × F-measurable, if there is a previsibleprocess Z such that ∫

[0,∞)×Ω(X − Z)2 dµM = 0, (B.7)

which, by the usual Lebesgue argument, is equivalent to

µM ((t, ω) : Xt(ω) 6= Zt(ω)) = 0,

then we may define∫∞

0Xs dMs ,

∫∞0Zs dMs. In general we cannot hope to

find such a Z for all B × F-measurable X. However, in the case where theintegrator M is continuous we can find such a previsible Z for all progressivelymeasurable X.

Let N be the set of µM null sets and define P = P ∨ N ; then it followsthat for X a P-measurable process, we can find a process Z in P such thatµM ((t, ω) : Xt(ω) 6= Zt(ω)) = 0. Hence (B.7) will hold and consequently wemay define

∫∞0Xs dMs ,

∫∞0Zs dMs. The following theorem is an important

application of this result.

Theorem B.19. Let M be a continuous martingale. Then if X is progres-sively measurable we can define the integral of X with respect to M in the Itosense through the extension of the isometry

E

[(∫ ∞0

Xs dMs

)2]

= E[∫ ∞

0

X2s dµM

].

Proof. From the foregoing remarks, it is clear that it is sufficient to showthat every progressively measurable process X is P-measurable. There aretwo approaches to establishing this: one is direct via the previsible projectionand the other indirect via the optional projection. In either case, the result ofLemma B.21 is established, and the conclusion of the theorem follows. ut

Optional Projection Route

We begin with a measurability result which we need in the proof of the mainresult in this section.

Page 335: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 339

Lemma B.20. If X is progressively measurable and T is a stopping time,then XT 1T<∞ is FT -measurable.

Proof. For fixed t the map ω 7→ X(t, ω) defined on [0, t] × Ω is B[0, t] ⊗ F-measurable. Since T is a stopping time ω 7→ T (ω) ∧ t is Ft-measurable. Bycomposition of functions† it follows that ω 7→ X(T (ω)∧t, ω) is Ft-measurable.Now define Y = XT 1T≤∞; for any t it is clear Y 1T≤t = XT∧t1T≤t. Henceon T ≤ t it follows that Y is Ft-measurable, which by the definition of FTimplies that Y is FT -measurable. ut

Lemma B.21. The set of progressively measurable functions on [0,∞) × Ωis contained in P.

Proof. First we must show that all optional processes are P-measurable. Thisis straightforward: if τ is a stopping time we must show that 1[0,τ ] is P-measurable. But 1[0,τ) is previsible and thus automatically P-measurable,hence it is sufficient to establish that [τ ] , (τ(ω), ω) : τ(ω) < ∞, ω ∈ Ω ∈P. But

µM ([τ ]) = E[∫ ∞

0

1τ(ω)=s d〈M〉s]

= E[〈M〉t − 〈M〉t−] = 0;

the final equality follows from the fact that Mt is continuous.Starting from a progressively measurable process X, by Theorem 2.7 we

can construct its optional projection oX. From (B.4),

µM ((t, ω) : oXt(ω) 6= Xt(ω)) = E[∫ ∞

0

1oXs(ω)6=Xs(ω) d〈M〉s].

Defineτt = infs ≥ 0 : 〈M〉s > t;

since the set (t,∞) × Ω is progressively measurable, and 〈M〉t is continuousand hence progressively measurable, it follows that τt is a stopping time bythe Debut theorem (Theorem A.20). Hence,

µM ((t, ω) : oXt(ω) 6= Xt(ω)) = E[∫ ∞

0

1oXs(ω)6=Xs(ω) d〈M〉s]

= E

[∫ 〈M〉∞0

1oXτs (ω)6=Xτs (ω) ds

]

= E[∫ ∞

0

1τs<∞1oXτs (ω) 6=Xτs (ω) ds].

† It is important to realise that this argument depends fundamentally on the pro-gressive measurability of X, it is in fact the same argument which is used (e.g.in Rogers and Williams [248, Lemma II.73.11]) to show that for progressivelymeasurable X, XT is FT -measurable for T an Ft-stopping time.

Page 336: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

340 B Stochastic Analysis

Thus using Fubini’s theorem

µM ((t, ω) : oXt(ω) 6= Xt(ω)) = E[∫ ∞

0

1τs<∞1oXτs 6=Xτs ds]

=∫ ∞

0

P(τs <∞, oXτs 6= Xτs) ds.

From Lemma B.20 it follows that for any stopping time τ , Xτ1τ<∞ is Fτ -measurable; thus from the definition of optional projection

oXτ1τ<∞ = E[Xτ1τ<∞ | Fτ ]= Xτ1τ<∞ P-a.s.

Hence µM ((t, ω) : oXt(ω) 6= Xt(ω)) = 0. But we have shown that the optionalprocesses are P-measurable, and oX is an optional process; thus from thedefinition of P there exists a previsible process Z such that µM ((t, ω) : Zt(ω) 6=oXt(ω)) = 0 hence using these two results µM ((t, ω) : Zt(ω) 6= Xt(ω)) = 0which implies that X is P-measurable. ut

Previsible Projection Route

While the previous approach shows that the progressively measurable pro-cesses can be viewed as the class of integrands, the argument is not construc-tive. By considering the previsible projection we can provide a constructiveargument. In brief, if X is progressively measurable and M is a continuousmartingale then ∫ ∞

0

Xs dMs =∫ ∞

0

pXs dMs,

where pX, the previsible projection of X, is a previsible process and theintegral on the right-hand side is to be understood in the sense of DefinitionB.16.

Lemma B.22. If X is progressively measurable and T is a previsible time,then XT 1T<∞ is FT−-measurable.

Proof. If T is a previsible time then there exists an announcing sequenceTn ↑ T such that Tn is a stopping time. By Lemma B.20 it follows for each nthat XTn1Tn<∞ is FTn-measurable. Recall that

FT− =∨n

FTn ,

so if we define random variables Y n , XTn1Tn<∞ and

Y , lim infn→∞

Y n,

then it follows that Y is FT−-measurable. ut

Page 337: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 341

From the Debut theorem,

τt , infs ≥ 0 : 〈M〉s > t

is a Ft-stopping time. Therefore τt−1/n is an increasing sequence of stoppingtimes and their limit is

τt , infs ≥ 0 : 〈M〉s ≥ t

therefore τt is a previsible time. We can now complete the proof of LemmaB.21 using the definition of the previsible projection.

Proof. Starting from a progressively measurable process X by Theorem A.29we can construct its previsible projection pX, from (B.4),

µM (pXt(ω) 6= Xt(ω)) = E[∫ ∞

0

1(s,ω)pXs(ω) 6=Xs(ω) d〈M〉s].

Using the previsible time τt,

µM (pXt(ω) 6= Xt(ω)) = E[∫ ∞

0

1pXs(ω)6=Xs(ω) d〈M〉s]

= E

[∫ 〈M〉∞0

1pXτs (ω)6=Xτs (ω) ds

]

= E[∫ ∞

0

1τs<∞1pXτs (ω)6=Xτs (ω) ds].

Thus using Fubini’s theorem

µM ((t, ω) : pXt(ω) 6= Xt(ω)) =∫ ∞

0

P(τs <∞, pXτs 6= Xτs) ds.

From Lemma B.22 it follows that for any previsible time τ , Xτ1τ<∞ isFτ−-measurable; thus from the definition of previsible projection

pXτ1τ<∞ = E[Xτ1τ<∞ | Fτ−]= Xτ1τ<∞ P-a.s.

Hence µM ((t, ω) : pXt(ω) 6= Xt(ω)) = 0. Therefore X is P-measurable. Wealso see that the previsible process Z in (B.7) is just the previsible projectionof X. ut

B.2.3 Integration by Parts Formula

The stochastic form of the integration parts formula leads to Ito’s formulawhich is the most important result for practical computations.

Page 338: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

342 B Stochastic Analysis

Lemma B.23. Let M be a continuous martingale. Then

〈M〉t = M2t −M2

0 − 2∫ t

0

Ms dMs.

Proof. Following the argument and notation of the proof of Theorem B.11define Xn by

Xns (ω) ,

∞∑j=0

Mtj (ω)1(tj ,tj+1](s);

while Xn is defined in terms of an infinite number of non-zero terms, it isclear that 1[0,t](s)X

ns ∈ E . Therefore using the definition B.8,

Snt =∞∑j=0

(M2tj+1−M2

tj− 2Mtj

(Mtj+1

−Mtj

))= M2

t −M20 −

∫ ∞0

1[0,t](s)Xns dMs.

As the process M is continuous, it is clear that for fixed ω, Xn(ω) → M(ω)uniformly on compact subsets of time and therefore by bounded convergence,‖Xn1[0,t]−M1[0,t]‖M tends to zero. Thus by the Ito isometry (B.6) the resultfollows. ut

Lemma B.24. Let M and N be square integrable martingales; then

MtNt = M0Nt +∫ t

0

Ms dNs +∫ t

0

Ns dMs + 〈M,N〉t.

Proof. Apply the polarization identity

〈M,N〉t = (〈M +N〉t − 〈M −N〉t)/4

to the result of Lemma B.23, to give

〈M,N〉t = (1/4)(

(Mt +Nt)2 − (M0 +N0)2 − 2∫ t

0

(Ms +Ns) dMs

− 2∫ t

0

(Ms +Ns) dNs − (Mt −Nt)2 − (M0 −N0)2

− 2∫ t

0

(Ms −Ns) dMs + 2∫ t

0

(Ms −Ns) dNs

)= MtNt −M0N0 −

∫ t

0

Ns dMs −∫ t

0

Ms dNs.

ut

Page 339: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.2 Ito Integral 343

B.2.4 Ito’s Formula

Theorem B.25. If X is an Rd-valued semimartingale and f ∈ C2(Rd) then

f(Xt) = f(X0)+d∑i=1

∫ t

0

∂xif(Xs) dXi

s+12

d∑i,j=1

∫ t

0

∂2

∂xi∂xjf(Xs) d〈Xi, Xj〉s.

The continuity condition on f in the statement of Ito’s lemma is important; ifit does not hold then the local time of X must be considered (see for exampleChapter 7 of Chung and Williams [53] or Section IV. 43 of Rogers and Williams[249]).

Proof. We sketch a proof for d = 1. The finite variation case is the standardfundamental theorem of calculus for Stieltjes integration. Consider the caseof M a martingale.

The proof is carried out by showing it holds for f(x) = xk for all k; bylinearity it then holds for all polynomials and by a standard approximationargument for all f ∈ C2(R). To establish the result for polynomials proceedby induction. Suppose it holds for functions f and g; then by Lemma B.24,

d(f(Mt)g(Mt)) = f(Mt) dg(Mt) + g(Mt) df(Mt) + d〈f(Mt), g(Mt)〉t= f(Mt)(g′(Mt) dMt + 1

2g′′(Mt) d〈M〉t)

+ g(Mt)(f ′(Mt) dMt + 12f′′(Mt) d〈M〉t)

+ g′(Mt)f ′(Mt) d〈M〉t.

Since the result clearly holds for f(x) = x, it follows that it holds for allpolynomials. The extension to C2(R) functions follows from a standard ap-proximation argument (see e.g. Rogers and Williams [249] for details). ut

B.2.5 Localization

The integral may be extended to a larger class of integrands by the procedureof localization. Let H be a progressively measurable process. Define a non-decreasing sequence of stopping times

Tn , inft≥0

∫ t

0

H2s d〈M〉s > n

; (B.8)

then it is clear that the process HTnt , Ht∧Tn is in the space LP . Thus the

stochastic integral∫∞

0HTns dMs is defined in the Ito sense of Definition B.16.

Theorem B.26. If for all t ≥ 0,

P(∫ t

0

H2s d〈M〉s <∞

)= 1, (B.9)

Page 340: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

344 B Stochastic Analysis

then we may define the stochastic integral∫ ∞0

Hs dMs , limn→∞

∫ ∞0

HTns dMs.

Proof. Under condition (B.9) the sequence of stopping times Tn defined in(B.8) tends to infinity P-a.s. It is straightforward to verify that this is welldefined; that is, different choices of sequence Tn tending to infinity give riseto the same limit. ut

This general definition of integral is then a local martingale. We can simi-larly extend to integrators M which are local martingales by using the mini-mum of a reducing sequence Rn for the local martingale M and the sequenceTn above.

B.3 Stochastic Calculus

A very useful result can be proved using the Ito calculus about the character-isation of Brownian motion, due to Levy.

Theorem B.27. Let Bit≥0 be continuous local martingales starting fromzero for i = 1, . . . , n. Then Bt = (B1

t , . . . , Bnt ) is a Brownian motion with

respect to (Ω,F ,P) adapted to the filtration Ft, if and only if

〈Bi, Bj〉t = δijt ∀i, j ∈ 1, . . . , n.

Proof. In these circumstances it follows that the statement Bt is a Brownianmotion is by definition equivalent to stating that Bt − Bs is independent ofFs and is distributed normally with mean zero and covariance matrix (t−s)I.

Clearly if Bt is a Brownian motion then the covariation result followstrivially from the definitions. To establish the converse, we assume 〈Bi, Bj〉t =δijt for i, j ∈ 1, . . . , n and prove that Bt is a Brownian motion.

Observe that for fixed θ ∈ Rn we can define Mθt by

Mθt = f(Bt, t) , exp

(iθ>Bt +

12‖θ‖2 t

).

By application of Ito’s formula to f we obtain (in differential form using theEinstein summation convention)

d (f(Bt, t)) =∂f

∂xj(Bt, t) dBjt +

∂f

∂t(Bt, t) dt+

12

∂2f

∂xj∂xk(Bt, t) d〈Bj , Bk〉t

= iθjf(Bt, t) dBjt +12‖θ‖2f(Bt, t) dt− 1

2θjθkδjkf(Bt, t) dt

= iθjf(Bt, t) dBjt .

Page 341: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.3 Stochastic Calculus 345

Hence

Mθt = 1 +

∫ t

0

d(f(Bt, t)),

and is a sum of stochastic integrals with respect to continuous local martin-gales and is hence itself a continuous local martingale. But for each t, using| · | to denote the complex modulus

|Mθt | = exp

(12‖θ‖2t

)<∞.

Hence for any fixed time t0, M t0t satisfies

|M t0t | ≤ |M t0

∞| <∞,

and so is a bounded local martingale. Hence M t0t , t ≥ 0 is a genuine mar-

tingale. Thus for 0 ≤ s < t we have

E[exp(iθ>(Bt −Bs)

)| Fs

]= exp

(−1

2(t− s)‖θ‖2

)a.s.

However, this is the characteristic function of a multivariate normal randomvariable distributed as N(O, (t−s)I). Thus by the Levy characteristic functiontheorem Bt −Bs is an N(O, (t− s)I) random variable. ut

B.3.1 Girsanov’s Theorem

Girsanov’s theorem for the change of drift underlies many important results.The result has an important converse but this is not used here.

Theorem B.28. Let M be a continuous martingale, and let Z be the associ-ated exponential martingale

Zt = exp(Mt − 1

2 〈M〉t). (B.10)

If Z is a uniformly integrable martingale, then a new measure Q, equivalentto P, may be defined by

dQdP

, Z∞.

Furthermore, if X is a continuous P local martingale then Xt − 〈X,M〉t is aQ-local martingale.

Proof. Since Z is a uniformly integrable martingale it follows from TheoremB.1 (martingale convergence) that Zt = E[Z∞ | Ft]. Hence Q constructedthus is a probability measure which is equivalent to P. Now consider X, aP-local martingale. Define a sequence of stopping times which tend to infinityvia

Tn , inft ≥ 0 : |Xt| ≥ n or |〈X,M〉t| ≥ n.

Page 342: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

346 B Stochastic Analysis

Consider the process Y defined via

Y , XTnt − 〈XTn ,M〉t.

By Ito’s formula applied to (B.10), dZt = ZtdMt; a second application of Ito’sformula yields

d(ZtYt) = 1t≤Tn (ZtdYt + YtdZt + 〈Z, Y 〉t)= 1t≤Tn (Zt(dXt − d〈X,M〉t) + YtZtdMt + 〈Z, Y 〉t)= 1t≤Tn (Zt(dXt − d〈X,M〉t)

+ (Xt − 〈X,M〉t)ZtdMt + Ztd〈X,M〉t)=1t≤Tn ((Xt − 〈X,M〉t)ZtdMt + ZtdXt) ,

where the result 〈Z, Y 〉t = Zt〈X,M〉t follows from the Kunita–Watanabeidentity; hence ZY is a P-local martingale. But Z is uniformly integrable andY is bounded (by construction of the stopping time Tn), hence ZY is a genuineP-martingale. Hence for s < t and A ∈ Fs, we have

EQ [(Yt − Ys)1A] = E [Z∞(Yt − Ys)1A] = E [(ZtYt − ZsYs)1A] = 0;

hence Y is a Q-martingale. Thus Xt − 〈X,M〉t is a Q-local martingale, sinceTn is a reducing sequence such that (X − 〈X,M〉)Tn is a Q-martingale, andTn ↑ ∞ as n→∞. ut

Corollary B.29. Let Wt be a P-Brownian motion and define Q as in TheoremB.28; then Wt = Wt − 〈W,M〉t is a Q-Brownian motion.

Proof. Since W is a Brownian motion it follows that 〈W,W 〉t = t for all t ≥ 0.Since Wt is continuous and 〈W , W 〉t = 〈W,W 〉t = t, it follows from Levy’scharacterisation of Brownian motion (Theorem B.27) that W is a Q-Brownianmotion. ut

The form of Girsanov’s theorem in Theorem B.28 is too restrictive for manyapplications of interest. In particular the requirement that the martingale Zbe uniformly integrable and the implied equivalence of P and Q on F rulesout even such simple applications as transforming Xt = µt + Wt to removethe constant drift. In this case the martingale Zt = exp(µWt− 1

2µ2t) is clearly

not uniformly integrable. If we consider A ∈ F∞ defined by

A =

limt→∞

Xt − µtt

= 0, (B.11)

it is clear that P(A) = 1, yet under a measure Q under which X has no driftQ(A) = 0. Since equivalent measures have the same null sets it would followthat if this measure which killed the drift were equivalent to P then A shouldalso be null, a contradiction. Hence on F the measures P and Q cannot beequivalent.

Page 343: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.3 Stochastic Calculus 347

If we consider restricting the definition of the measure Q to Ft for finitet then the above problem is avoided. In the example given earlier under Qt

the process X restricted to [0, t] is a Brownian motion with zero drift. Thisapproach via a family of consistent measures is used in the change of measureapproach to filtering, which is described in Chapter 3. Since we have justshown that there does not exist any measure equivalent to P under which Xis a Brownian motion on [0,∞) it is clear that we cannot, in general, find ameasure Q defined on F∞ such that the restriction of Q to Ft is Qt.

Define a set function on⋃

0≤t<∞ Ft by

Q(A) = Qt(A), ∀A ∈ Ft, ∀t ≥ 0. (B.12)

If we have a finite set A1, . . . , An of elements of⋃

0≤t<∞ Ft, then we can finds such that Ai ∈ Fs for i = 1, . . . , n and since Qs is a probability measureit follows that the set function Q is finitely additive. It is immediate thatQ(∅) = 0 and Q(Ω) = 1.

It is not obvious whether Q is countably additive. If Q is countably ad-ditive, then Caratheodory’s theorem allows us to extend the definition of Qto σ

(⋃0≤t<∞ Ft

)= F∞. This can be resolved in special situations by using

Tulcea’s theorem. The σ-algebras Ft are all defined on the same space, so theatom condition of Tulcea’s theorem is non-trivial (contrast with the case ofthe product spaces used in the proof of the Daniell–Kolmogorov–Tulcea theo-rem), which explains why this extension cannot be carried out in general. Thefollowing corollary gives an important example where an extension is possible.

Corollary B.30. Let Ω = C([0,∞),Rd) and let Xt be the canonical processon Ω. Define Fot = σ(Xs : 0 ≤ s ≤ t). If

Zt = exp(Mt − 1

2 〈M〉t)

is a Fot+-adapted martingale then there exists a unique measure Q on (Ω,Fo∞)such that

dQdP

∣∣∣∣Fot+

= Zt, ∀t

and the process Xt−〈X,M〉t is a Q local martingale with respect to Fot+t≥0.

Proof. We apply Theorem B.28 to the process Zt, which is clearly a uniformlyintegrable martingale (since Zts = E[Ztt | Fos+]). We may thus define a familyQt of measures equivalent to P on Fot+. It is clear that these measures areconsistent; that is for s ≤ t, Qt restricted to Fos+ is identical to Qs.

For any finite set of times t1 < t2 < · · · such that tk → ∞ as k → ∞,since the sample space Ω = C([0,∞),Rd) is a complete separable metricspace, regular conditional probabilities in the sense of Definition 2.28 exist asa consequence of Exercise 2.29, and we may denote them Qtk(· | Ftk−1+) fork = 1, 2, . . ..

Page 344: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

348 B Stochastic Analysis

The sequence of σ-algebras Fotk+ is clearly increasing. If we consider asequence Ak of atoms with each Ak ∈ Fotk+ such that A1 ⊇ A2 ⊃ ·, then usingthe fact that these are the unaugmented σ-algebras on the canonical samplespace it follows that ∩∞k=1Ak 6= ∅. Therefore, using these regular conditionalprobabilities as the transition kernels, we may now apply Tulcea’s theoremA.11 to construct a measure Q on Fo∞ which is consistent with Qtk on Fotk+ foreach k. The consistency condition ensures that the measure Q thus obtainedis independent of the choice of the times tks. ut

Corollary B.31. Let Wt be a P-Brownian motion and define Q as in Corol-lary B.30; then Wt = Wt − 〈W,M〉t is a Q-Brownian motion with respect toFot+.

Proof. As for Corollary B.29. ut

B.3.2 Martingale Representation Theorem

The following representation theorem has many uses. The proof given hereonly establishes the existence of the representation. The results of Clark al-low an explicit form to be established (see Nualart [227, Proposition 1.3.14]for details, or Section IV.41 of Rogers and Williams [249] for an elementaryaccount).

Theorem B.32. Let B be an m-dimensional Brownian motion and let Ft bethe right continuous enlargement of the σ-algebra generated by B augmented†

with the null sets N . Let T > 0 be a constant time. If X is a square integrablerandom variable measurable with respect to the σ-algebra FT then there existsa previsible νs such that

X = E[X] +∫ T

0

ν>s dBs. (B.13)

Proof. To establish the respresentation (B.13), without loss of generality wemay consider the case EX = 0 (in the general case apply the result toX−EX).Define the space

L2T =

H : H is Ft-previsible and E

[∫ T

0

‖Hs‖2 ds

]<∞

.

Consider the stochastic integral map

J : L2T → L2(FT ),

defined by† This condition is satisfied automatically if the filtration satisfies the usual condi-

tions.

Page 345: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.3 Stochastic Calculus 349

J(H) =∫ T

0

H>s dBs.

As a consequence of the Ito isometry theorem, this map is an isometry. Hencethe image V under J of the Hilbert space L2

T is complete and hence a closedsubspace of L2

0(FT ) = H ∈ L2(FT ) : EH = 0. The theorem is proved if wecan establish that the image is equal to the whole space L2

0(FT ) for the imageis the space of random variables X which admit a representation of the form(B.13).

Consider the orthogonal complement of V in L20(FT ). We aim to show that

every element of this orthogonal complement is zero. Suppose that Z is in theorthogonal complement of L2

0(FT ); thus

E(ZX) = 0 for all X ∈ L20(FT ). (B.14)

Define Zt = E[Z | Yt] which is an L2-bounded martingale. We know that theσ-algebra F0 is trivial by the Blumental 0–1 law therefore

Z0 = E[Z | F0] = E(Z) = 0 P-a.s.

Let H ∈ L2T and NT , J(H) and define Nt , E[NT | Ft] for 0 ≤ t ≤ T .

It is clear that NT ∈ V . Let S be a stopping time such that S ≤ T ; then byoptional sampling

NS = E[NT | FS ] = E

[∫ S

0

H>s dBs +∫ T

S

H>s dBs

∣∣∣∣∣FS]

= J(H1[0,S]),

so consequently NS ∈ V . The orthogonality relation (B.14) then implies thatE(ZNS) = 0. Thus using the properties of conditional expectation

0 = E[ZNS ] = E[E[ZNS | FS ]] = E[NSE[Z | FS ]] = E[ZSNS ].

Since this holds for S a bounded stopping time, and ZT and NT are squareintegrable, it follows that ZtNt is a uniformly integrable martingale and hence〈Z,N〉t is a null process.

Let εt be an element of the set St defined in Lemma B.39 where thestochastic process Y is taken to be the Brownian motion B. Extending J inthe obvious way to m-dimensional vector processes, we have that

εt = 1 + J(iεr1[0,t])

for some r ∈ L∞([0, t],Rm). Using the above, Ztεt = Z0 + ZtJ(iεr1[0,t]). BothZtJ(iεr1[0,t]), t ≥ 0 and Zt, t ≥ 0 are martingales and Z0 = 0; hence

E[εtZt] = E[Z0] + E[ZtJ

(iεr1[0,t]

)]= E(Z0) = 0.

Thus since this holds for all εt ∈ St and the set St is total this implies thatZt = 0 P-a.s. ut

Page 346: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

350 B Stochastic Analysis

Remark B.33. For X a square integrable Ft-adapted martingale this resultcan be applied to XT , followed by conditioning and use of the martingaleproperty to obtain for any 0 ≤ t ≤ T ,

Xt = E[XT | Ft] = E[XT ] +∫ t∧T

0

ν>s dBs

= E(X0) +∫ t

0

ν>s dBs.

As the choice of the constant time T was arbitrary, it is clear that this resultholds for all t ≥ 0.

B.3.3 Novikov’s Condition

One of the most useful conditions for checking whether a local martingale ofexponential form is a martingale is that due to Novikov.

Theorem B.34. If Zt = exp(Mt − 1

2 〈M〉t)

for M a continuous local martin-gale, then a sufficient condition for Z to be a martingale is that

E[exp( 1

2 〈M〉t]<∞, 0 ≤ t <∞.

Proof. Define the stopping time

Sb = inft ≥ 0 : Ms − s = b

and note that P(Sb <∞) = 1. Then define

Yt , exp(Mt − 12 t); (B.15)

it follows by the optional stopping theorem that E[exp(MSb− 12Sb)] = 1, which

implies E[exp( 12Sb)] = e−b. Consider

Nt , Yt∧Sb , t ≥ 0,

which is also a martingale. Since P(Sb <∞) = 1 it follows that

N∞ = lims→∞

Ns = exp(MSb − 12Sb).

By Fatou’s lemma Ns is a supermartingale with last element. But E(N∞) =1 = E(N0) whence N is a uniformly integrable martingale. So by optionalsampling for any stopping time R,

E[exp(MR∧Sb − 1

2 (R ∧ Sb))]

= 1.

Fix t ≥ 0 and set R = 〈M〉t. It then follows for b < 0,

E(1Sb<〈M〉t exp

(b+ 1

2Sb))

+ E(1Sb≥〈M〉t exp

(Mt − 1

2 〈M〉t))

= 1.

The first expectation is bounded by ebE(

12 〈M〉t

), thus from the condition of

the theorem it converges to zero as b → −∞. The second term converges toE(Zt) as a consequence of monotone convergence. Thus E(Zt) = 1. ut

Page 347: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.3 Stochastic Calculus 351

B.3.4 Stochastic Fubini Theorem

The Fubini theorem of measure theory has a useful extension to stochasticintegrals. The form stated here requires a boundedness assumption and assuch is not the most general form possible, but is that which is most usefulfor applications. We assume that all the stochastic integrals are with respectto continuous semimartingales, because this is the framework considered here.To extend the result it is simply necessary to stipulate that a cadlag version ofthe stochastic integrals be chosen. For a more general form see Protter [247,Theorem IV.46].

In this theorem we consider a family of processes parametrised by an indexa ∈ A, and let µ be a finite measure on the space (A,A); that is, µ(A) <∞.

Theorem B.35. Let X be a semimartingale and µ a finite measure. LetHat = H(t, a, ω) be a bounded B[0, t] ⊗ A ⊗ P measurable process and

Zat ,∫ t

0Has dXs. If we define Ht ,

∫AHat µ(da) then Yt =

∫AZa µ(da) is

the process given by the stochastic integral∫ t

0Hs dXs.

Proof. By stopping we can reduce the case to that of X ∈ L2. As a conse-quence of the usual Fubini theorem it suffices to consider X a martingale.The proof proceeds via a monotone class argument. Suppose H(t, a, ω) =K(t, ω)f(a) for f bounded A-measurable. Then it follows that

Zt = f(a)∫ t

0

K(s, ω) dXs,

and hence ∫A

Zat µ(da) =∫A

f(a)(∫ t

0

K(s, ω) dXs

)µ(da)

=∫ t

0

K(s, ω) dXs

∫A

f(a)µ(da)

=∫ t

0

(∫A

f(a)µ(da)K(s, ω))

dXs

=∫ t

0

Hs dXs.

Thus we have established the result in this simple case and by linearity to thevector space of finite linear combinations of bounded functions of this form.It remains to show the monotone property; that is, suppose that the resultholds for Hn and Hn → H. We must show that the result holds for H.

Let Zan,t ,∫ t

0Han dXs. We are interested in convergence uniformly in t;

thus note that

E[supt

∣∣∣∣∫A

Zan,t µ(da)−∫A

Zat µ(da)∣∣∣∣] ≤ E

[∫A

supt|Zan,t − Zat |µ(da)

].

Page 348: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

352 B Stochastic Analysis

We show that the right-hand side tends to zero as n → ∞. By Jensen’sinequality and Cauchy–Schwartz we can compute as follows,(

E[∫

A

supt|Zan,t − Zat |µ(da)

])2

≤ E

[(∫A

supt|Zan,t − Zat |µ(da)

)2]

≤∫A

µ(da) E[∫

A

supt|Zan,t − Zat |2 µ(da)

].

Then an application of the non-stochastic version of Fubini’s theorem followedby Doob’s L2-inequality implies that

1µ(A)

E

[(∫A

supt|Zan,t − Zat |µ(da)

)2]≤∫A

E

[sup

s∈[0,T ]

|Zan,s − Zas |2]µ(da)

≤ 4∫A

E[(Zan,∞ − Za∞)2

]µ(da)

≤ 4∫A

E [〈Zan − Za〉∞] µ(da).

Then by the Kunita–Watanabe identity

1µ(A)

E(∫

A

supt|Zan,t − Zat |µ(da)

)2

≤ 4∫A

E(∫ ∞

0

(Han,s −Ha

s )2 d〈X〉s)µ(da).

Since Hn increases monotonically to a bounded process H it follows thatHn and H are uniformly bounded; we may apply the dominated convergencetheorem to the double integral and expectation and thus the right-hand sideconverges to zero. Thus

limn→∞

E[supt

∣∣∣∣∫A

Zan,t µ(da)−∫A

Zat µ(da)∣∣∣∣] = 0. (B.16)

We may conclude from this that∫A

supt|Zan,t − Zat |µ(da) <∞ a.s.

as a consequence of which∫A|Zat |µ(da) < ∞ for all t a.s., and thus the

integral∫AZat µ(da) is defined a.s. for all t. Defining Hn,t ,

∫AHan,t µ(da), we

have from (B.16) that∫ t

0Hn,sdXs converges in probability uniformly in t to∫

AZat µ(da). Since a priori the result holds for Hn we have that∫ t

0

Hn,s dXs =∫A

Zan,t µ(da),

Page 349: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.3 Stochastic Calculus 353

and since by the stochastic form of the dominated convergence theorem∫ t0Hn,s dXs tends to

∫ t0Hs dXs as n→∞ it follows that∫ t

0

Hs dXs =∫A

Zat µ(da).

ut

B.3.5 Burkholder–Davis–Gundy Inequalities

Theorem B.36. If F : [0,∞) → [0,∞) is a continuous increasing functionsuch that F (0) = 0, and for every α > 1

KF = supx∈[0,∞)

F (αx)F (x)

<∞,

then there exist constants cF and CF such that for every continuous localmartingale M ,

cFE[F(√〈M〉∞

)]≤ E

[F

(supt≥0|Mt|

)]≤ CFE

[F(√〈M〉∞

)].

An example of a suitable function F which satisfies the conditions of thetheorem is F (x) = xp for p > 0.

Various proofs exist of this result. The proof given follows Burkholder’sapproach in Chapter II of [36]. The proof requires the following lemma.

Lemma B.37. Let X and Y be nonnegative real-valued random variables. Letβ > 1, δ > 0, ε > 0 be such that for all λ > 0,

P(X > βλ, Y ≤ δλ) ≤ εP(X > λ). (B.17)

Let γ and η be such that F (βλ) ≤ γF (λ) and F (δ−1λ) ≤ ηF (λ). If γε < 1then

E [F (X)] ≤ γη

1− γεE [F (Y )] .

Proof. Assume without loss of generality that E[F (X)] <∞. It is clear from(B.17) that for λ > 0,

P(X > βλ) = P(X > βλ, Y ≤ δλ) + P(X > βλ, Y > δλ)≤ εP(X > λ) + P(Y > δλ). (B.18)

Since F (0) = 0 by assumption, it follows that

F (x) =∫ x

0

dF (λ) =∫ ∞

0

Iλ<x dF (λ);

thus by Fubini’s theorem

Page 350: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

354 B Stochastic Analysis

E[F (X)] =∫ ∞

0

P(X > λ) dF (λ).

Thus using (B.18) it follows that

E[F (X/β)] =∫ ∞

0

P(X > βλ) dF (λ)

≤ ε∫ ∞

0

P(X > λ) dF (λ) +∫ ∞

0

P(Y > δλ) dF (λ)

≤ εE[F (X)] + E[Y/δ];

from the conditions on η, and γ it then follows that

E[F (X/β)] ≤ εγE[F (X/β)] + ηE[F (Y )].

Since we assumed E[F (X)] <∞, and εγ < 1, it follows that

E[F (X/β)] ≤ η

1− εγE[F (Y )],

and the result follows using the condition on γ. ut

We can now prove the Burkholder–Davis–Gundy inequality, by using theabove lemma.

Proof. Let τ = infu : |Mu| > λ which is an Ft-stopping time. Define Nt ,(Mτ+t −Mτ )2 − (〈M〉τ+t − 〈M〉τ ), which is a continuous Fτ+t-adapted localmartingale. Choose β > 1, 0 < δ < 1. On the event defined by supt≥0 |Mt| >βλ, 〈M〉∞ ≤ δ2λ2 the martingale Nt must hit the level (β − 1)2λ2 − δ2λ2

before it hits −δ2λ2.From elementary use of the optional sampling theorem the probability of

a martingale hitting a level b before a level a is given by −a/(b− a); thus

P(

supt≥0|Mt| > βλ, 〈M〉∞ ≤ δ2λ2 | Fτ

)≤ δ2/(β − 1)2.

Hence as β > 1,

P(

supt≥0|Mt| > βλ, 〈M〉∞ ≤ δ2λ2

)= P

(supt≥0|Mt| > βλ, 〈M〉∞ ≤ δ2λ2, τ <∞

)= E

[P(

supt≥0|Mt| > βλ, 〈M〉∞ ≤ δ2λ2

∣∣∣∣Fτ) 1τ<∞

]≤ δ2P(τ <∞)/(β − 1)2.

It is immediate that since β > 1, F (βλ) < KFF (λ) and similarly since δ < 1,F (λ/δ) < KFF (λ), so we may take γ = η = KF . Now we can choose 0 < δ < 1

Page 351: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.5 Total Sets in L1 355

sufficiently small that εγ = δ2/(β − 1)2 < 1/KF . Therefore all the conditionsof Lemma B.37 are satisfied whence

E[F

(supt≥0|Mt|

)]≤ CE

[F(√〈M〉∞

)]and the opposite inequality can be established similarly. ut

B.4 Stochastic Differential Equations

Theorem B.38. Let f : Rd → Rd and σ : Rd → Rp be Lipschitz functions.That is, there exist positive constants Kf and Kσ such that

‖f(x)− f(y)‖ ≤ Kf‖x− y‖, ‖σ(x)− σ(y)‖ ≤ Kσ‖x− y‖,

for all x, y ∈ Rd.Given a probability space (Ω,F ,P) and a filtration Ft, t ≥ 0 which

satisfies the usual conditions, let W be an Ft-adapted Brownian motion andlet ζ be an F0-adapted random variable. Then there exists a unique continuousadapted process X = Xt, t ≥ 0 which is a strong solution of the SDE,

Xt = ζ +∫ t

0

f(Xs) ds+∫ t

0

σ(Xs) dWs.

The proof of this theorem can be found as Theorem 10.6 of Chung andWilliams [53] and is similar to the proof of Theorem 2.9 of Chapter 5 inKaratzas and Shreve [149].

B.5 Total Sets in L1

The use of the following density result in stochastic filtering originated in thework of Krylov and Rozovskii.

Lemma B.39. On the filtered probability space (Ω,F , P) let Y be a Brownianmotion starting from zero adapted to the filtration Yt; then define the set

St =εt = exp

(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds)

: r ∈ L∞ ([0, t],Rm)

(B.19)

Then St is a total set in L1(Ω,Yt, P). That is, if a ∈ L1(Ω,Yt, P) and E[aεt] =0, for all εt ∈ St, then a = 0 P-a.s. Furthermore each process ε in the set Stsatisfies an SDE of the form

dεt = iεtr>t dYt,

for some r ∈ L∞([0, t],Rm).

Page 352: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

356 B Stochastic Analysis

Proof. We follow the proof in Bensoussan [13, page 83]. Define a set

S′t =εt = exp

(i

∫ t

0

r>s dYs

)r ∈ L∞([0, t],Rm)

.

Let a be a fixed element of L1(Ω,Yt, P) such that E[aεt] = 0 for all εt ∈ S′t.This can easily be seen to be equivalent to the statement that E[aεt] = 0 for allεt ∈ St, which we assume. To establish the result, we assume that E[aεt] = 0for all εt ∈ S′t, and show that a is zero a.s. Take t1, t2, . . . , tp ∈ (0, t) witht1 < t2 < · · · < tp, then given l1, l2, . . . , ln ∈ Rm, define

µp , lp, µp−1 , lp + lp−1, . . . µ1 , lp + · · ·+ l1.

Adopting the convention that t0 = 0, define a function

rt =µh for t ∈ (th−1, th), h = 1, . . . , p,0 for t ∈ (tp, T ),

whence as Yt0 = Y0 = 0,

p∑h=1

l>h Yth =p∑

h=1

µ>h (Yth − Yth−1) =∫ t

0

r>s dYs.

Hence for a ∈ L1(Ω,Yt, P)

E

[a exp

(i

p∑h=1

l>h Yth

)]= E

[a exp

(i

∫ t

0

r>s dYs

)]= 0,

where the second equality follows from the fact that we have assumed E[aεt] =0 for all ε ∈ S′t. By linearity therefore,

E

[a

K∑k=1

ck exp

(i

p∑h=1

l>h,kYth

)]= 0,

where this holds for all K and for all coefficients c1, . . . , cK ∈ C, and valueslh,k ∈ R. Let F (x1, . . . , xp) be a continuous bounded complex-valued func-tion defined on (Rm)p. By Weierstrass’ approximation theorem, there existsa uniformly bounded sequence of functions of the form

P (n)(x1, . . . , xp) =Kn∑k=1

c(n)k exp

(i

p∑h=1

(l(n)h,k)>xh

)such that

limn→∞

P (n)(x1, . . . , xp) = F (x1, . . . , xp).

Hence we have E[aF (Yt1 , . . . , Ytp)] = 0 for every continuous bounded func-tion F , and by a further approximation argument, we can take F to be a

Page 353: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.5 Total Sets in L1 357

bounded function, measurable with respect to the σ-algebra σ(Yt1 , . . . , Ytp).Since t1, t2, . . . , tp were chosen arbitrarily, we obtain that E[ab] = 0, for b anybounded Yt-measurable function. In particular it gives E[a2 ∧m] = 0 for ar-bitrary m; hence a = 0 P-a.s. ut

The following corollary enables us to use a smaller set of functions in thedefinition of the set St, in particular we can consider only bounded continuousfunctions with any number m of bounded continuous derivatives.

Corollary B.40. Assume the same conditions as in Lemma B.39. Define theset

Spt =εt = exp

(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds)

: r ∈ Cpb ([0, t],Rm)

(B.20)

where m is an arbitrary non-negative integer. Then Smt is a total set inL1(Ω,Yt, P). That is, if a ∈ L1(Ω,Yt, P) and E[aεt] = 0, for all εt ∈ St,then a = 0 P-a.s. Furthermore each process ε in the set St satisfies an SDEof the form

dεt = iεtr>t dYt,

for some r ∈ L∞([0, t],Rm).

Proof. Let us prove the corollary for the case p = 0, that is, for r a boundedcontinuous function. To do this, as a consequence of Lemma B.39, it suffices toshow that if a ∈ L1(Ω,Yt, P) and E[aεt] = 0, for all εt ∈ S0

t , then E[aεt] = 0,for all εt ∈ St. Pick an arbitrary εt ∈ St,

εt = exp(i

∫ t

0

r>s dYs +12

∫ t

0

‖rs‖2 ds), r ∈ L∞([0, t],Rm).

First let us note that by the fundamental theorem of calculus, as r ∈L∞([0, t],Rm), the function p : [0, t]→ Rm defined as

ps =∫ s

0

ru du

is continuous and differentiable almost everywhere. Moreover, for almost alls ∈ [0, t]

dpsds

= rs.

Now let rn ∈ C0b ([0, t],Rm) be defined as

rns , n(ps − p0∨s−1/n

), s ∈ [0, t].

Then rn is uniformly bounded by same bound as r and from the above, foralmost all s ∈ [0, t], limn→∞ rns = rs. By the bounded convergence theorem,

Page 354: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

358 B Stochastic Analysis

limn→∞

∫ t

0

‖rns ‖2 ds =∫ t

0

‖rs‖2 ds

and also

limn→∞

E

[(∫ t

0

r>s dYs −∫ t

0

(rns )> dYs

)2]

= 0.

Hence at least for a subsequence (rnk)nk>0, by the Ito isometry

limk→∞

∫ t

0

(rnks )> dYs =∫ t

0

r>s dYs, P-a.s.

and hence, the uniformly bounded sequence

εkt = exp(i

∫ t

0

(rnks )> dYs +12

∫ t

0

‖rnks ‖2 ds)

converges, P-almost surely to εt. Then, via another use of the dominatedconvergence theorem

E[aεt] = limk→∞

E[aεkt ] = 0,

since εkt ∈ S0t for all k ≥ 0. This completes the proof of the corollary for p = 0.

For higher values of p, one iterates the above procedure. ut

B.6 Limits of Stochastic Integrals

The following proposition is used in the proof of the Zakai equation.

Proposition B.41. Let (Ω,F ,P) be a probability space, Bt,Ft be a stan-dard n-dimensional Brownian motion defined on this space and Ψn, Ψ be anFt-adapted process such that

∫ t0Ψ2n ds <∞,

∫ t0Ψ2 ds <∞, P-a.s. and

limn→∞

∫ t

0

‖Ψn − Ψ‖2 ds = 0

in probability; then

limn→∞

supt∈[0,T ]

∣∣∣∣∫ t

0

(Ψ>n − Ψ>) dBs

∣∣∣∣ = 0

in probability.

Proof. Given arbitrary t, ε, η > 0 we first prove that for an n-dimensionalprocess ϕ,

P(

sup0≤s≤t

∣∣∣∣∫ s

0

ϕ>r dBr

∣∣∣∣ ≥ ε) ≤ P(∫ t

0

‖ϕs‖2 ds > η

)+

4ηε2. (B.21)

Page 355: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.6 Limits of Stochastic Integrals 359

To this end, define

τη , inft :∫ t

0

‖ϕs‖2 ds > η

,

and a corresponding stopped version of ϕ,

ϕηs , ϕs1[0,τη ](s).

Then using these definitions

P(

sup0≤s≤t

∣∣∣∣∫ s

0

ϕ>r dBr

∣∣∣∣ ≥ ε) = P(τη < t; sup

0≤s≤t

∣∣∣∣∫ s

0

ϕ>r dBr

∣∣∣∣ ≥ ε)+ P

(τη ≥ t; sup

0≤s≤t

∣∣∣∣∫ s

0

ϕ>r dBr

∣∣∣∣ ≥ ε)≤ P (τη < t) + P

(sup

0≤s≤t

∣∣∣∣∫ s

0

(ϕηr)> dBr

∣∣∣∣ ≥ ε)≤ P

(∫ t

0

‖ϕs‖2 ds > η

)+ P

(sup

0≤s≤t

∣∣∣∣∫ s

0

(ϕηr)> dBr

∣∣∣∣ ≥ ε) .By Chebychev’s inequality and Doob’s L2-inequality the second term on theright-hand side can be bounded

P(

sup0≤s≤t

∣∣∣∣∫ s

0

(ϕηr)> dBr

∣∣∣∣ ≥ ε) ≤ 1ε2

E

[(sup

0≤s≤t

∣∣∣∣∫ s

0

(ϕηr)> dBr

∣∣∣∣)2]

≤ 4ε2

E

[(∫ t

0

(ϕηr)> dBr

)2]

≤ 4ε2

E[∫ t

0

‖ϕηr‖2 dr

]≤ 4ηε2,

which establishes (B.21). Applying this result with fixed ε to ϕ = Ψn − Ψyields

P

(supt∈[0,T ]

∣∣∣∣∫ t

0

(Ψ>n − Ψ>) dBs

∣∣∣∣ ≥ ε)≤ P

(∫ t

0

‖Ψn − Ψ‖2 ds > η

)+

4ηε2.

Given arbitrary δ > 0, by choosing η < δε2/8 the second term on the right-hand side is then bounded by δ/2 and with this η by the condition of theproposition there exists N(η) such that for n ≥ N(η) the first term is boundedby δ/2. Thus the right-hand side can be bounded by δ. ut

Page 356: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

360 B Stochastic Analysis

B.7 An Exponential Functional of Brownian motion

In this section we deduce an explicit expression of a certain exponential func-tional of Brownian motion which is used in Chapter 6. Let Bt, t ≥ 0 bea d-dimensional standard Brownian motion. Let β : [0, t]→ Rd be a boundedmeasurable function, Γ a d × d real matrix and δ ∈ Rd. In this section, wecompute the following functional of B,

Iβ,Γ,δt = E[

exp(∫ t

0

B>s βs ds− 12

∫ t

0

‖ΓBs‖2 ds)∣∣∣∣Bt = δ

]. (B.22)

In (B.22) we use the standard notation

B>s βs =d∑i=1

Bisβis, ‖ΓBs‖2 =

d∑i,j=1

(Γ ijBjs

)2, s ≥ 0.

To obtain a closed formula for (B.22), we use Levy’s diagonalisation procedure,a powerful tool for deriving explicit formulae. Other results and techniques ofthis kind can be found in Yor [280] and the references contained therein. Theorthogonal decomposition of Bs with respect to Bt is

Bs =s

tBt +

(Bs −

s

tBt

), s ∈ [0, t],

and using the Fourier decomposition of the Brownian motion (as in Wiener’sconstruction of the Brownian motion)

Bs =s

tBt +

∑k≥1

√2t

sin(ksπ/t)kπ/t

ξk, s ∈ [0, t], (B.23)

where ξk; k ≥ 1 are standard normal random vectors with independententries, which are also independent of Bt and the infinite sum has a subse-quence of its partial sums which almost surely converges uniformly (see Itoand McKean [135, page 22]), we obtain the following.

Lemma B.42. Let ν ∈ R and µk ∈ Rd, k ≥ 1 be the following constants

νβ,Γ,δ(t) , exp(

1t

∫ t

0

sδ>βs ds− 16‖Γδ‖2 t

)µβ,Γ,δk (t) ,

∫ t

0

sin(ksπ/t)kπ/t

βs ds+ (−1)kt2

k2π2Γ>Γδ, k ≥ 1.

Then

Iβ,Γ,δt = νβ,Γ,δ(t)E

exp

∑k≥1

(√2tξ>k µ

β,Γ,δk (t)− t2

2k2π2‖Γξk‖2

). (B.24)

Page 357: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.7 An Exponential Functional of Brownian motion 361

Proof. We have from (B.23),∫ t

0

B>s βs ds =1t

∫ t

0

sδ>βs ds+∑k≥1

√2t

∫ t

0

sin(ksπ/t)kπ/t

ξ>k βs ds (B.25)

and similarly∫ t

0

‖ΓBs‖2 ds =13‖Γδ‖2t− 2

√2t

∑k≥1

(−1)kt2

k2π2ξ>k Γ

>Γδ

+∫ t

0

∥∥∥∥∥∥Γ√2

t

∑k≥1

sin(ksπ/t)kπ/t

ξk

∥∥∥∥∥∥2

ds. (B.26)

Next using the standard orthonormality results for Fourier series

∫ t

0

(√2t

sin(ksπ

t

))2

ds = 1, ∀k ≥ 1,∫ t

0

sin(k1sπ

t

)sin(k2sπ

t

)ds = 0, ∀k1, k2 ≥ 1, k1 6= k2,

it follows that

∫ t

0

∥∥∥∥∥∥Γ√2

t

∑k≥1

sin(ksπ/t)kπ/t

ξk

∥∥∥∥∥∥2

ds =∑k≥1

‖Γξk‖2t2

k2π2. (B.27)

The identity (B.24) follows immediately from equations (B.25), (B.26) and(B.27). ut

Let P be an orthogonal matrix (PP> = P>P = I) and D be a diagonalmatrix D = diag(γ1, γ2, . . . , γd) such that Γ>Γ = P>DP . Obviously (γi)

di=1

are the eigenvalues of the real symmetric matrix Γ>Γ .

Lemma B.43. Let aβ,Γ,δi,k (t), for i = 1, . . . , d and k ≥ 1 be the followingconstants

aβ,Γ,δi,k (t) =d∑j=1

P ij(µβ,Γ,δk (t)

)j.

Then

Iβ,Γ,δt = νβ,Γ,δ(t)d∏i=1

1√∏k≥1

[γit2

k2π2 + 1] exp

∑k≥1

aβ,Γ,δi,k (t)2(t2γik2π2 + 1

)t

. (B.28)

Page 358: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

362 B Stochastic Analysis

Proof. Let ξk, k ≥ 1 be the independent identically distributed standardnormal random vectors defined by ξk = Pξk for any k ≥ 1. As a consequenceof Lemma B.42 we obtain that

Iβ,Γ,δt = νβ,Γ,δ(t)E

exp

∑k≥1

(√2tξ>k Pµ

β,Γ,δk (t)− t2

2k2π2ξ>k Dξk

). (B.29)

Define the σ-algebras

Gk , σ(ξp, p ≥ k) and G ,⋂k≥1

Gk.

Now define

ζ , exp

∑k≥1

(√2tξ>k Pµ

β,Γ,δk (t)− t2

2k2π2ξ>k Dξk

) ;

using the independence of ξ1, . . . , ξn, . . . and Kolmogorov’s 0–1 Law (seeWilliams [272, page 46]), we see that

E[ζ] = E

ζ∣∣∣∣∣∣⋂k≥1

Gk

.Since Gk is a decreasing sequence of σ-algebras, the Levy downward theorem(see Williams [272, page 136]) implies that

E

ζ∣∣∣∣∣∣⋂k≥1

Gk

= limk→∞

E[ζ | Gk].

Hence we determine first E[ζ | Gk] and then take the limit as k →∞ to obtainthe expectation in (B.29). Hence

E[ζ] =∏k≥1

E

[exp

((√2tξ>k Pµ

β,Γ,δk (t)− t2

2k2π2ξ>k Dξk

))]

=∏k≥1

d∏i=1

1√2π

∫ ∞−∞

exp

√2taβ,Γ,δi,k (t)x−

(t2γik2π2 + 1

)x2

2

dx,

and identity (B.28) follows immediately. ut

Proposition B.44. Let fβ,Γ (t) be the following constant

Page 359: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.7 An Exponential Functional of Brownian motion 363

fβ,Γ (t) ,∫ t

0

∫ t

0

d∑i=1

sinh((s− t)√γi) sinh(s′√γi)

2√γi sinh

(t√γi)×

d∑j=1

P ijβjs

d∑j′=1

P ij′βj′

s′ dsds′,

and Rt,β,Γ (δ) be the following second-order polynomial in δ

Rt,β,Γ (δ) ,

∫ t

0

d∑i=1

sinh(s√γi)

γi sinh(t√γi)

d∑j=1

P ijβjs ds

d∑j′=1

P ij′ (Γ>Γδ

)j′

−d∑i=1

coth(t√γi)

2γi√γi

d∑j=1

P ij(Γ>Γδ

)j2

.

Then

Iβ,Γ,δt =d∏i=1

√t√γi

sinh(t√γi)

exp(fβ,Γ (t) +Rt,β,Γ (δ) +

‖δ‖2

2t

). (B.30)

Proof. Using the classical identity (B.35), the infinite product in the denom-inator of (B.28) is equal to sinh(t

√γi)/(t

√γi). Then we need to expand the

argument of the exponential in (B.28). The following argument makes use ofthe identities (B.32)–(B.34). We have that

aβ,Γ,δi,k (t) =∫ t

0

sin(ksπ/t)kπ/t

cβ,Γi (s) ds+ (−1)kt2

k2π2cΓ,δi

and

aβ,Γ,δi,k (t)2 =∫ t

0

∫ t

0

sin(ksπ/t)kπ/t

sin(ks′π/t)kπ/t

cβ,Γi (s)cβ,Γi (s′) dsds′

+ 2(−1)kt2

k2π2cΓ,δi

∫ t

0

sin(ksπ/t)kπ/t

cβ,Γi (s) ds

+(

t2

k2π2cΓ,δi

)2

, (B.31)

where cβ,Γi (s) =∑dj=1 P

ijβjs and cΓ,δi =∑dj=1 P

ij(Γ>Γδ

)j . Next we sum upover k each of the three terms on the right-hand side of (B.31). For the firstterm we use

Page 360: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

364 B Stochastic Analysis∑k≥1

sin(ksπ/t)sin(ks′π/t)(kπ/t)2t (t2γi/(k2π2) + 1)

=t

2π2

∑k≥1

cos(k(s− s′)π/t)− cos(k(s+ s′)π/t)t2γi/π2 + k2

=t

2π2

2t√γi/π

)cosh

((s− t− s′)√γi

)− cosh

((s− t+ s′)

√γi)

sinh(t√γi)

=sinh((s− t)√γi)sinh(s′

√γi)

2√γi sinh

(t√γi) ;

hence

∑k≥1

∫ t

0

∫ t

0

sin(ksπ/t)kπ/t

sin(ks′π/t)kπ/t

cβ,Γi (s)cβ,Γi (s′) dsds′

t (t2γi/(k2π2) + 1)

=∫ t

0

∫ t

0

sinh((s− t)√γi)sinh(s′√γi)

2√γi sinh

(t√γi) cβ,Γi (s)cβ,Γi (s′) dsds′.

For the second term,

∑k≥1

(−1)kt2

k2π2

sin(ksπ/t)kπ/t

t(t2γi/(k2π2) + 1)=t2

π3

∑k≥1

(−1)k sin(ksπ/t)k (t2γi/π2 + k2)

=t2

π3

2t2γi/π2

sinh(s√γi)

sinh(t√γi)− sπ

2t3γi/π2

)=(

12γi

sinh(s√γi)

sinh(t√γi)− s

2tγi

);

hence

d∑i=1

∑k≥1

2(−1)kt2

k2π2cΓ,δi

∫ t

0

sin(ksπ/t)kπ/t

cβ,Γi (s) ds

t(t2γi/(k2π2) + 1)

=∫ t

0

d∑i=1

(sinh(s

√γi)

sinh(t√γi)− s

t

)cβ,Γi (s)cΓ,δi

γids

=∫ t

0

d∑i=1

sinh(s√γi)

sinh(t√γi)

cβ,Γi (s)cΓ,δiγi

ds

− 1t

∫ t

0

sδ>βs ds,

since∑di=1 c

β,Γi (s)cΓ,δi /γi = δ>βs. For the last term we get

Page 361: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

B.7 An Exponential Functional of Brownian motion 365

∑k≥1

(t2cΓ,δi /(k2π2)

)2

t (t2γi/(k2π2) + 1)=

t

γi

(cΓ,δi

)2∑k≥1

(1

k2π2− 1t2γi + k2π2

)

=t

γi

(cΓ,δi

)2(

16

+1

2t2γi− 1

2t√γi

coth(t√γi))

;

then

d∑i=1

∑k≥1

(t2cΓ,δi /(k2π2)

)2

t(t2γi/(k2π2) + 1)

=‖Γδ‖2

6+‖δ‖2

2t−

d∑i=1

coth(t√γi)

2γi√γi

d∑j=1

P ij(Γ>Γδ

)j2

,

since∑di=1

(cΓ,δi

)2

/γi = ‖Γδ‖2,∑di=1

(cΓ,δi

)2

/γ2i = ‖δ‖2. In the above we

used the following classical identities.∑k≥1

cos krz2 + k2

2ze(r−π)z + e−(r−π)z

eπz − e−πz− 1

2z2, ∀r ∈ (0, 2π), (B.32)

∑k≥1

(−1)ksin kr

k (z2 + k2)=

π

2z2

erz − e−rz

eπz − e−πz− r

2z2, ∀r ∈ (−π, π), (B.33)

∑k≥1

1z2 + k2π2

=12z

(coth z − 1

z

), (B.34)

∏k≥1

[1 +

l2

k2

]=

sinh(πl)πl

, (B.35)

and∑k≥1 1/k2 = π2/6 (for proofs of these identities see for example, Mac-

robert [201]). We finally find the closed formula for the Brownian functional(B.22). ut

In the one-dimensional case Proposition B.44 takes the following simplerform. This is the form of the result which is used in Chapter 6 to derive thedensity of πt for the Benes filter.

Corollary B.45. Let Bt, t ≥ 0 be a standard Brownian motion, β : [0, t]→R be a bounded measurable function, and Γ ∈ R be a positive constant. Then

E[

exp(∫ t

0

Bsβs ds− 12

∫ t

0

Γ 2B2s ds

)∣∣∣∣Bt = δ

]= fβ,Γ (t) exp

((∫ t

0

sinh(sΓ )sinh(tΓ )

βs ds)δ − Γ coth(tΓ )

2δ2 +

δ2

2t

), (B.36)

where

Page 362: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

366 B Stochastic Analysis

fβ,Γ (t) =

√tΓ

sinh(tΓ )exp(∫ t

0

∫ t

0

sinh((s− t)Γ ) sinh(s′Γ )2Γ sinh(tΓ )

βsβs′ dsds′).

Page 363: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

1

Introduction

1.1 Foreword

The development of mathematics since the 1950s has gone through many radi-cal changes both in scope and in depth. Practical applications are being foundfor an increasing number of theoretical results and practical problems havealso stimulated the development of theory. In the case of stochastic filtering, itis not clear whether this first arose as an application found for general theory,or as the solution of a practical problem.

Stochastic filtering now covers so many areas that it would be futile toattempt to write a comprehensive book on the subject. The purpose of thistext is not to be exhaustive, but to provide a modern, solid and accessiblestarting point for studying the subject.

The aim of stochastic filtering is to estimate an evolving dynamical sys-tem, the signal, customarily modelled by a stochastic process. Throughoutthe book the signal process is denoted by X = Xt, t ≥ 0, where t is thetemporal parameter. Alternatively, one could choose a discrete time process,i.e. a process X = Xt, t ∈ N where t takes values in the (discrete) set0, 1, 2, . . .. The former continuous time description of the process has thebenefit that use can be made of the power of stochastic calculus. A discretetime process may be viewed as a continuous time process with jumps at fixedtimes. Thus a discrete time process can be viewed as a special case of a con-tinuous time process. However, it is not necessarily effective to do so sinceit is much easier and more transparent to study the discrete case directly.Unless otherwise stated, the process X and all other processes are defined ona probability space (Ω,F ,P).

The signal process X can not be measured directly. However, a partialmeasurement of the signal can be obtained. This measurement is modelledby another continuous time process Y = Yt, t ≥ 0 which is called theobservation process. This observation process is a function of X and a mea-surement noise. The measurement noise is modelled by a stochastic processW = Wt, t ≥ 0. Hence,

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 1, c© Springer Science+Business Media, LLC 2009

Page 364: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

1

Introduction

1.1 Foreword

The development of mathematics since the 1950s has gone through many radi-cal changes both in scope and in depth. Practical applications are being foundfor an increasing number of theoretical results and practical problems havealso stimulated the development of theory. In the case of stochastic filtering, itis not clear whether this first arose as an application found for general theory,or as the solution of a practical problem.

Stochastic filtering now covers so many areas that it would be futile toattempt to write a comprehensive book on the subject. The purpose of thistext is not to be exhaustive, but to provide a modern, solid and accessiblestarting point for studying the subject.

The aim of stochastic filtering is to estimate an evolving dynamical sys-tem, the signal, customarily modelled by a stochastic process. Throughoutthe book the signal process is denoted by X = Xt, t ≥ 0, where t is thetemporal parameter. Alternatively, one could choose a discrete time process,i.e. a process X = Xt, t ∈ N where t takes values in the (discrete) set0, 1, 2, . . .. The former continuous time description of the process has thebenefit that use can be made of the power of stochastic calculus. A discretetime process may be viewed as a continuous time process with jumps at fixedtimes. Thus a discrete time process can be viewed as a special case of a con-tinuous time process. However, it is not necessarily effective to do so sinceit is much easier and more transparent to study the discrete case directly.Unless otherwise stated, the process X and all other processes are defined ona probability space (Ω,F ,P).

The signal process X can not be measured directly. However, a partialmeasurement of the signal can be obtained. This measurement is modelledby another continuous time process Y = Yt, t ≥ 0 which is called theobservation process. This observation process is a function of X and a mea-surement noise. The measurement noise is modelled by a stochastic processW = Wt, t ≥ 0. Hence,

A. Bain, D. Crisan, Fundamentals of Stochastic Filtering,DOI 10.1007/978-0-387-76896-0 1, c© Springer Science+Business Media, LLC 2009

Page 365: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

2 1 Introduction

Yt = ft(Xt,Wt), t ∈ [0,∞).

Let Y = Yt, t ≥ 0 be the filtration generated by the observation process Y ;namely,

Yt = σ (Ys, s ∈ [0, t]) , t ≥ 0.

This σ-algebra Yt can be interpreted as the information available from obser-vations up to time t. This information can be used to make various inferencesabout X, for example:

• What is the best estimate (denoted by Xt) of the value of the signal attime t, given the observations up to time t? If best estimate means the bestmean square estimate, then this translates into computing E[Xt | Yt], theconditional mean of Xt given Yt.

• Given the observations up to time t, what is the estimate of the differenceXt−Xt? For example, if the signal is real-valued, we may want to computeE[(Xt − Xt)2 | Yt] = E[X2

t | Yt]− E[Xt | Yt]2.• What is the probability that the signal at time t can be found within

a certain set A, again given the observations up to time t? This meanscomputing P(Xt ∈ A | Yt), the conditional probability of the event Xt ∈A given Yt.

The typical form of such an inference requires the computation or approx-imation of one or more quantities of the form E[ϕ(Xt) | Yt], where ϕ is areal-valued function defined on the state space of the signal. Each of thesestatistics will provide fragments of information about Xt. But what if all in-formation about Xt which is contained in Yt is required? Mathematically, thismeans computing πt, the conditional distribution of Xt given Yt. This πt isdefined as a random probability measure which is measurable with respect toYt so that†

E [ϕ(Xt) | Yt] =∫

Sϕ(x)πt(dx), (1.1)

for all statistics ϕ for which both terms of the above identity make sense.Knowing πt will enable us, at least theoretically, to compute any inferenceof Xt given Yt which is of interest, by integrating a suitable function ϕ withrespect to πt.

The measurability of πt with respect to Yt is crucial. However, this condi-tion is sometimes overlooked and treated as a rather meaningless theoreticalrequirement. The following theorem illustrates the significance of the condi-tion (for a proof see, e.g. Proposition 4.9 page 69 in [23]).

Theorem 1.1. Let Ω be a probability space and a, b : Ω → R be two arbitraryfunctions. Let A be the σ-algebra generated by a, that is the smallest σ-algebra† The identity (1.1) holds P-almost surely, i.e. there can be a subset of Ω of proba-

bility zero where (1.1) does not hold. The formal definition of the process πt canbe found in Chapter 2.

Page 366: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

1.2 The Contents of the Book 3

such that a is A/B(R)-measurable. Then if b is also A/B(R)-measurable thereexists a B(R)/B(R)-measurable function f : R→ R such that b = f a, where denotes function composition.

Hence if b is “a-measurable”, then b is determined by a. If we know thevalue of a then (theoretically) we will know the value of b. In practice however,it is often impossible to obtain an explicit formula for the connecting functionf and this is the main difficulty in solving the filtering problem. Translatingthis concept into the context of filtering tells us that the random probabilityπt is a function of Ys for s ∈ [0, t]. Thus πt is determined by the values of theobservation process in the time interval [0, t].

1.2 The Contents of the Book

The book is divided into two parts. The first part deals with the theoreticalaspects of the problem of stochastic filtering and the second describes numer-ical methods for solving the filtering problem with emphasis on the class ofparticle approximations.

In Chapter 2 a fundamental measure-theoretic result related to π is proved:that the conditional distribution of the signal can be viewed as a stochasticprocess with values in the space of probability measures.

The filtering problem is stated formally in Chapter 3 for a class of problemwhere the signal X takes values in a state space S and is the solution of amartingale problem associated with an operator A. Two examples of filteringproblems which can be considered in this fashion are:

1. The state space S = Rd and X = (Xi)di=1 is the solution of a d-dimensionalstochastic differential equation driven by an m-dimensional Brownian mo-tion process V = (V j)mj=1,

Xit = Xi

0 +∫ t

0

f i(Xs)ds+m∑j=1

∫ t

0

σij(Xs) dV js , i = 1, . . . , d. (1.2)

In this case, the signal process is the solution of a martingale problemassociated with the second-order differential operator

A =d∑i=1

f i∂

∂xi+

12

d∑i,j=1

(m∑k=1

σikσjk

)∂2

∂xi∂xj.

2. The state space S = I and X is a continuous time Markov chain withfinite state space I. In this case, the corresponding operator is given bythe Q-matrix of the chain.

The observation process Y is required to satisfy a stochastic evolutionequation of the form

Page 367: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

4 1 Introduction

Yt = Y0 +∫ t

0

h(Xs) ds+Wt, (1.3)

where W = (W i)ni=1 is an n-dimensional Brownian motion independent of Xand h = (hi)ni=1 : S→ Rn is called the sensor function.

The filtering equations for a problem of this class are then deduced. Inparticular, it is proved that for any test function ϕ in the domain of A wehave†

dπt(ϕ) = πt(Aϕ) dt+n∑i=1

(πt(hiϕ)− πt

(hi)πt(ϕ)

)×(dY it − πt(hiϕ)dt

). (1.4)

Also, πt has an unnormalized version, denoted by ρt, which satisfies the linearequation

dρt(ϕ) = ρt(Aϕ) dt+n∑i=1

ρt(hiϕ) dY it . (1.5)

The identity

πt(ϕ) =ρt(ϕ)ρt(1)

is called the Kallianpur–Striebel formula.The first term of (1.5) describes the evolution of the signal and the accu-

mulation of observations is reflected in the second term. The same terms (withthe same interpretations) can be found in (1.4) and the additional terms aredue to the normalization procedure.

In Chapter 3 we present two approaches to deducing the filtering equations(1.4) and (1.5): the change of measure approach and the innovation approach.An extension is also described to the case where the noise driving the obser-vation process is no longer independent of the signal. This feature is quitecommon, for example, in financial applications.

Chapter 4 contains a detailed study of the uniqueness of the solution of thefiltering equations (1.4) and (1.5). The uniqueness can be shown by followinga partial differential equations approach. The solution of certain partial dif-ferential equations with final condition is proved to be a partial dual for thefiltering equations which leads to a proof of uniqueness. The second approachto proving uniqueness of the solution of the filtering equations follows therecent work of Heunis and Lucic.

In Chapter 5, we study the robust representation formula for the condi-tional expectation of the signal. The representation is robust in the sense thatits dependence on the observation process Y is continuous. The result hasimportant practical and theoretical consequences.

† If a is a measure on a space S and f is an a-integrable function, then $a(f) \triangleq \int_S f(x)\,a(dx)$.


Chapter 6 is devoted to finite-dimensional filters. Two classes of filter are described: the Kalman–Bucy filter and the Benes filter. Explicit formulae are deduced for both π_t and ρ_t and the finite-dimensionality of the filters is emphasized. The analysis of the Benes filter uses the robust representation result presented in Chapter 5.

Among practitioners, it is generally accepted that the state space for π_t is that of densities with respect to the Lebesgue measure. Inherent in this is the (often unproved) assumption that π_t will always be absolutely continuous with respect to the Lebesgue measure. This is not always the case, although usually practitioners assume the correct conditions to ensure this. We discuss this issue in Chapter 7 and we look at the stochastic PDEs satisfied by the density of π_t and the density of ρ_t.

Chapter 8 gives an overview of the main computational methods currently available for solving the filtering problem. As expected of a topic with such a diversity of applications, numerous algorithms for solving the filtering problem have been developed.

Six classes of numerical method are presented: linearization methods (the extended Kalman filter), approximations by (exact) finite-dimensional filters, the projection filter/moment methods, spectral methods, PDE methods and particle methods.

Chapter 9 contains a detailed study of a continuous time particle filter. Particle filters (also known as sequential Monte Carlo methods) are some of the most successful methods for the numerical approximation of the solution of the filtering problem.

Chapter 10 is a self-contained, elementary treatment of particle approximations to the solution of the stochastic filtering problem in the discrete time case.
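For orientation, here is a minimal discrete-time bootstrap particle filter for a linear Gaussian toy model. It is a generic sequential Monte Carlo sketch with placeholder model parameters, not the specific algorithms analysed in Chapters 9 and 10.

```python
import numpy as np

# Generic bootstrap particle filter sketch (toy model, placeholder parameters):
# signal X_{k+1} = a*X_k + noise, observation Y_k = X_k + noise.

rng = np.random.default_rng(1)
a, sig_x, sig_y = 0.9, 0.5, 0.3
n_particles, n_steps = 1000, 50

# Simulate one trajectory of the signal and the observations.
x = np.zeros(n_steps)
y = np.zeros(n_steps)
for k in range(1, n_steps):
    x[k] = a * x[k - 1] + sig_x * rng.normal()
    y[k] = x[k] + sig_y * rng.normal()

particles = rng.normal(0.0, 1.0, n_particles)   # sample from the prior
estimates = []
for k in range(1, n_steps):
    # Prediction: propagate each particle through the signal dynamics.
    particles = a * particles + sig_x * rng.normal(size=n_particles)
    # Correction: weight each particle by the observation likelihood.
    weights = np.exp(-0.5 * ((y[k] - particles) / sig_y) ** 2)
    weights /= weights.sum()
    # Resampling (multinomial): duplicate high-weight particles.
    particles = rng.choice(particles, size=n_particles, p=weights)
    # The empirical mean approximates the conditional mean of the signal.
    estimates.append(particles.mean())
```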

Finally, two appendices contain an assortment of measure theory, probability theory and stochastic analysis results, included in order to make the text as self-contained as possible.

1.3 Historical Account

The origins of the filtering problem in discrete time can be traced back to the work of Kolmogorov [152, 153] and Krein [155, 156]. In the continuous time case, Wiener [270] was the first to discuss the optimal estimation of dynamical systems in the presence of noise. The Wiener filter consists of a signal X which is a stationary process and an associated measurement process Y = X + V, where V is some independent noise. The object is to use the values of Y to estimate X, where the estimation is required to have the following three properties.

• Causal: $X_t$ is to be estimated using $Y_s$ for $s \le t$.
• Optimal: The estimate, say $\hat{X}_t$, should minimise the mean square error $E[(X_t - \hat{X}_t)^2]$.

Page 369: centlib.ajums.ac.ircentlib.ajums.ac.ir/multiMediaFile/58588890-4-1.pdf · Stochastic Mechanics Random Media Signal Processing and Image Synthesis Mathematical Economics and Finance

6 1 Introduction

• Online: At any (arbitrary) time $t$, the estimate $\hat{X}_t$ should be available.

The Wiener filter gives a linear, time-invariant causal estimate of the form

$$\hat{X}_t = \int_{-\infty}^{t} h(t-s)\,Y(s)\,ds,$$

where $h(s)$ is called the transfer function. Wiener studied and solved this problem using the spectral theory of stationary processes. The results were included in a classified National Defense Research Council report issued in January/February 1942. The report, nicknamed "The Yellow Peril" (according to Wiener [271], this was because of the yellow paper in which it was bound), was widely circulated among defence engineers. Subsequently declassified, it appeared as a book [270] in 1949.
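The following sketch illustrates only the form of the estimate, namely a causal, time-invariant convolution of the observation path. The exponential kernel used here is a placeholder assumption, not the optimal kernel delivered by Wiener's spectral-theoretic solution.

```python
import numpy as np

# Discretisation of the causal convolution X_hat(t) ~ int h(t-s) Y(s) ds.
# The kernel h below is a placeholder, not Wiener's optimal transfer function.

dt = 0.01
t = np.arange(0, 10, dt)
Y = np.sin(t) + 0.3 * np.random.default_rng(2).normal(size=t.size)  # noisy data

h = np.exp(-t)                              # placeholder causal kernel h(s), s >= 0
X_hat = dt * np.convolve(Y, h)[: t.size]    # causal: only past values of Y enter
```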

It is important to note that all subsequent advances in the theory and practical implementation of stochastic filtering have always adhered to the three precepts enumerated above: causality, optimality and online estimation.

The next major development in stochastic filtering was the introduction of the linear filter. In this case, the signal satisfies a stochastic differential equation of the form (1.2) with linear coefficients and Gaussian initial condition, and the observation process satisfies an evolution equation of the form (1.3) with a linear sensor function. The linear filter can be solved explicitly; in other words, π_t is given by a closed formula. The solution is a finite-dimensional one: π_t is Gaussian, hence completely determined by its mean and its covariance matrix. Moreover, it is quite easy to estimate the two parameters. The covariance matrix does not depend on Y and it satisfies a deterministic Riccati equation. Hence it can be solved in advance, before the filter is applied online. The mean satisfies a linear stochastic differential equation driven by Y, whose solution can be easily computed. These were the reasons for the linear filter's widespread success in the 1960s; for example, it was used by NASA to get the Apollo missions off the ground and to the moon.† Bucy and Kalman were the pioneers in this field. Kalman was the first to publish in a wide circulation journal. In [146], he solved the discrete time version of the linear filter. Bucy obtained similar results independently.

Following the success of the linear filter, scientists started to explore different avenues. Firstly, they extended the application of the Kalman filter beyond the linear/Gaussian framework. The basis of this extension is the fact that, locally, all systems behave linearly. So, at least locally, one can apply the Kalman filter equation. This gave rise to a class of algorithms called the extended Kalman filter. At the time of writing, these algorithms, most of which are empirical and without theoretical foundation, are still widely used in a variety of applications.‡

† For an account of the linear filter's applications to aerospace engineering and further references, see Cipra [54].
‡ We study the extended Kalman filter in some detail in Chapter 6.
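In sketch form, the local linearization underlying the extended Kalman filter replaces the coefficients by first-order Taylor expansions about the current estimate $\hat{x}_t$ (this is the generic construction only; the precise algorithm is discussed in Chapter 6):

$$f(x) \approx f(\hat{x}_t) + (\nabla f)(\hat{x}_t)\,(x - \hat{x}_t), \qquad h(x) \approx h(\hat{x}_t) + (\nabla h)(\hat{x}_t)\,(x - \hat{x}_t),$$

after which the linear filter equations are applied to the resulting locally linear Gaussian model, with the Jacobians re-evaluated as the estimate evolves.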


Stratonovich's work in non-linear filtering theory took place at the same time as the work of Bucy and Kalman. Stratonovich† presented his first results in the theory of conditional Markov processes and the related optimal non-linear filtering at the All-Union Conference on Statistical Radiophysics in Gorki (1958) and in a seminar [257]; they were published as [259].

† We thank Gregorii Milstein and Michael Tretyakov for drawing our attention to Stratonovich's historical account [260].

Nevertheless, there was considerable unease about the methods used by Stratonovich to deduce the continuous time filtering equation. The paper [259] appeared with an editorial footnote indicating that part of the exposition was not wholly convincing. Writing in Mathematical Reviews, Bharucha-Reid [17] indicated that he was inclined to agree with the editor's comment concerning the author's arguments in the continuous case.

Part of the problem was that Stratonovich was using the stochastic integral which today bears his name. Stratonovich himself mentions this misunderstanding in [260, page 42]. He also points out (ibid., page 227) that the linear filtering equations were published by him in [258].

On the other side of the Atlantic, in the mid-1960s Kushner [175, 176, 178] derived and analysed equation (1.4) using Itô (and not Stratonovich) calculus. Shiryaev [255] provided the first rigorous derivation in the case of a general observation process where the signal and observation noises may be correlated. The equation (1.4) was also obtained in various forms by other authors, namely Bucy [30] and Wonham [273]. In 1968, Kailath [137] introduced the innovation approach to linear filtering. This new method for deducing the filtering equations was extended in the early 1970s by Frost and Kailath [103] and by Fujisaki, Kallianpur and Kunita [104]. The equation (1.4) is now commonly referred to as either the Fujisaki–Kallianpur–Kunita equation or the Kushner–Stratonovich equation.

Similarly, the filtering equation (1.5) was introduced in the same period by Duncan [85], [84], Mortensen [222] and Zakai [281], and is consequently referred to as the Zakai or the Duncan–Mortensen–Zakai equation.

The stochastic partial differential equations‡ associated with the filtering equations were rigorously analysed and extended in the late 1970s by Pardoux [236, 237, 238] and Krylov and Rozovskii [159, 160, 161, 162]. Pardoux adopted a functional analytic approach in analysing these SPDEs, whereas Krylov and Rozovskii examined the filtering equations using methods inherited from classical PDE theory. See Rozovskii [250] and the references therein for an analysis of the filtering equations using these methods.

‡ Here we refer to the strong version of the filtering equations (1.4) and (1.5) as described in Chapter 7.

Another important development in filtering theory was initiated by Clark [56] and continued by Davis [72, 74, 75]. In the late 1970s, Clark introduced the concept of robust or pathwise filtering; that is, π_t(ϕ) is a function of the observation path Y_s, s ∈ [0, T],

$$\pi_t(\varphi) = \Phi(Y_s;\ s \in [0, T]),$$

where Φ is a function defined on the corresponding space of trajectories. But Φ is not uniquely defined. Any other function Φ′ equal to Φ on a set of measure one would be an equally acceptable version of π_t(ϕ). From a computational point of view, we need to identify a continuous version of Φ.†

† We analyze the pathwise approach to stochastic filtering in Chapter 5.

Given the success of the linear/Gaussian filter, scientists tried to find other classes of filtering problem where the solution was finite-dimensional and/or had a closed form. Benes [9] succeeded in doing this. The class of filter which he studied had a linearly evolving observation process. However, the signal was allowed to have a non-linear drift as long as it satisfied a certain (quite restrictive) condition, thenceforth known as the Benes condition. The linear filter satisfies the Benes condition.

Brockett and Clark [26, 27, 28] initiated a Lie algebraic approach to the filtering problem. From the linearized form of the Zakai equation one can deduce that ρ_t lies on a surface "generated" by two differential operators. One is the infinitesimal generator of X, generally a second-order differential operator, and the other is a linear zero-order operator. From a Lie algebraic point of view the Kalman filter and the Benes filter are isomorphic, where the isomorphism is given by a state space transformation. Benes continued his work in [10], where he found a larger class of exact filters for which the corresponding Lie algebra is no longer isomorphic with that associated with the Kalman–Bucy filter. Following Benes, Daum derived new classes of exact filters in [69] and [70]. A number of other classes of finite-dimensional filters have been discovered; see the series of papers by Chiou, Chen, Hu, Leung, Wu, Yau and Yau [48, 49, 50, 131, 274, 277, 276, 278]. See also the papers by Maybank [203, 204] and Schwartz and Dickinson [254].

In contrast to these finite-dimensional filters, results have been discovered which prove that generically the filtering problem is infinite-dimensional (Chaleyat-Maurel and Michel [42]). Hazewinkel, Marcus and Sussmann [121, 122] and Mitter [210] have contributed to this area. The general consensus is now that finite-dimensional filters are the exceptions and not the rule.

The work of Kallianpur has been influential in the field. The papers which contain the derivation of the Kallianpur–Striebel formula [144] and the derivation of the filtering equation [104] are of particular interest. Jointly with Karandikar, in the papers [138, 139, 140, 141, 142, 143], Kallianpur extended the theory of stochastic filtering to finitely additive measures in place of countably additive measures.

The area expanded rapidly in the 1980s and 1990s. Among the topics developed in this period were: stability of the solution of the filtering problem, the uniqueness and Feynman–Kac representations of the solutions of the filtering equations, the application of Malliavin calculus to the qualitative analysis of π_t, and connections between filtering and information theory. In addition to the scientists already mentioned, Bensoussan


[12, 14, 15], Budhiraja [32, 33, 34, 35], Chaleyat-Maurel [40, 41, 44, 45], Duncan [86, 87, 88, 89], Elliott [90, 91, 92, 94], Grigelionis [107, 108, 109, 111], Gyongy [112, 113, 115, 116, 117], Hazewinkel [124, 123, 125, 126], Heunis [127, 128, 129, 130], Kunita [165, 166, 167, 168], Kurtz [170, 172, 173, 174], Liptser [52, 190, 191], Michel [46, 47, 207, 20], Mikulevicius [109, 110, 208, 209], Mitter [98, 211, 212, 213], Newton [212, 225, 226], Picard [240, 241, 242, 243], Ocone [57, 228, 229, 230, 232, 233], Runggaldier [80, 96, 154, 191] and Zeitouni [4, 5, 282, 283, 284] contributed during this period. In addition to these papers, monographs were written by Bensoussan [13], Liptser and Shiryaev [192, 193] and Rozovskii [250], and Pardoux published lecture notes [238].

Much of the work carried out in the 1990s has focussed on the numerical solution of the filtering problem. The advent of fast computers has encouraged research in this area beyond the linear/Gaussian filter. Development in this area continues today. In Chapter 8 some historical comments are given for each of the six classes of numerical method discussed. Kushner (see e.g. [177, 179, 180, 181]) worked in particular on approximations of the solution of the filtering problem by means of finite Markov chain approximations (which are classified in Chapter 8 as PDE methods). Among others, he introduced the important idea of a robust discrete state approximation, the finite difference method. Le Gland and his collaborators (see [25, 24, 100, 101, 136, 187, 188, 223]) have contributed to the development of several classes of approximation, including the projection filter, PDE methods and particle methods.

Rapid progress continues to be made in both the theory and applications of stochastic filtering. In addition to work on the classical filtering problem, there is ongoing work on the analysis of the filtering problem for infinite-dimensional problems and for problems where the Brownian motion noise is replaced by either 'coloured' noise or fractional Brownian motion. Applications of stochastic filtering have been found within mathematical finance. There is continuing work on developing both generic/universal numerical methods for solving the filtering problem and problem-specific ones.

At a Cambridge conference on stochastic processes in July 2001, Moshe Zakai was asked what he thought of stochastic filtering as a subject for future research students. He replied that he always advised his students 'to have an alternative subject on the side, just in case!' We hope that this book will assist anyone interested in learning about this challenging subject!


References

1. Robert A. Adams. Sobolev Spaces. Academic Press, Orlando, FL, 2nd edition, 2003.

2. Lakhdar Aggoun and Robert J. Elliott. Measure Theory and Filtering, volume 15 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, 2004.

3. Deborah F. Allinger and Sanjoy K. Mitter. New results on the innovations problem for nonlinear filtering. Stochastics, 4(4):339–348, 1980/81.

4. Rami Atar, Frederi Viens, and Ofer Zeitouni. Robustness of Zakai's equation via Feynman-Kac representations. In Stochastic Analysis, Control, Optimization and Applications, Systems Control Found. Appl., pages 339–352. Birkhauser Boston, Boston, MA, 1999.

5. Rami Atar and Ofer Zeitouni. Exponential stability for nonlinear filtering. Ann. Inst. H. Poincare Probab. Statist., 33(6):697–725, 1997.

6. J. E. Baker. Reducing bias and inefficiency in the selection algorithm. In John J. Grefenstette, editor, Proceedings of the Second International Conference on Genetic Algorithms and their Applications, pages 14–21, Mahwah, NJ, 1987. Lawrence Erlbaum.

7. John S. Baras, Gilmer L. Blankenship, and William E. Hopkins Jr. Existence, uniqueness, and asymptotic behaviour of solutions to a class of Zakai equations with unbounded coefficients. IEEE Trans. Automatic Control, AC-28(2):203–214, 1983.

8. Eduardo Bayro-Corrochano and Yiwen Zhang. The motor extended Kalman filter: A geometric approach for rigid motion estimation. J. Math. Imaging Vision, 13(3):205–228, 2000.

9. V. E. Benes. Exact finite-dimensional filters for certain diffusions with nonlinear drift. Stochastics, 5(1-2):65–92, 1981.

10. V. E. Benes. New exact nonlinear filters with large Lie algebras. Systems Control Lett., 5(4):217–221, 1985.

11. V. E. Benes. Nonexistence of strong nonanticipating solutions to stochastic DEs: implications for functional DEs, filtering and control. Stochastic Process. Appl., 5(3):243–263, 1977.

12. A. Bensoussan. On some approximation techniques in nonlinear filtering. In Stochastic Differential Systems, Stochastic Control Theory and Applications (Minneapolis, MN, 1986), volume 10 of IMA Vol. Math. Appl., pages 17–31. Springer, New York, 1988.

13. A. Bensoussan. Stochastic Control of Partially Observable Systems. Cambridge University Press, Cambridge, UK, 1992.

14. A. Bensoussan, R. Glowinski, and A. Rascanu. Approximation of the Zakai equation by splitting up method. SIAM J. Control Optim., 28:1420–1431, 1990.

15. Alain Bensoussan. Nonlinear filtering theory. In Recent Advances in Stochastic Calculus (College Park, MD, 1987), Progr. Automat. Info. Systems, pages 27–64. Springer, New York, 1990.

16. Albert Benveniste. Separabilite optionnelle, d'apres Doob [French]. In Seminaire de Probabilites, X (Univ. Strasbourg), Annee universitaire 1974/1975, volume 511 of Lecture Notes in Math., pages 521–531. Springer Verlag, Berlin, 1976.

17. A. T. Bharucha-Reid. Review of Stratonovich, Conditional Markov processes. Mathematical Reviews, (MR0137157), 1963.

18. A. G. Bhatt, G. Kallianpur, and R. L. Karandikar. Uniqueness and robustness of solution of measure-valued equations of nonlinear filtering. Ann. Probab., 23(4):1895–1938, 1995.

19. P. Billingsley. Convergence of Probability Measures. Wiley, New York, 1968.

20. Jean-Michel Bismut and Dominique Michel. Diffusions conditionnelles. II. Generateur conditionnel. Application au filtrage. J. Funct. Anal., 45(2):274–292, 1982.

21. B. Z. Bobrovsky and M. Zakai. Asymptotic a priori estimates for the error in the nonlinear filtering problem. IEEE Trans. Inform. Theory, 28:371–376, 1982.

22. N. Bourbaki. Elements de Mathematique: Topologie Generale [French]. Hermann, Paris, France, 1958.

23. Leo Breiman. Probability. Classics in Applied Mathematics. SIAM, Philadelphia, PA, 1992.

24. Damiano Brigo, Bernard Hanzon, and Francois Le Gland. A differential geometric approach to nonlinear filtering: the projection filter. IEEE Trans. Automat. Control, 43(2):247–252, 1998.

25. Damiano Brigo, Bernard Hanzon, and Francois Le Gland. Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3):495–534, 1999.

26. R. W. Brockett. Nonlinear systems and nonlinear estimation theory. In Stochastic Systems: The Mathematics of Filtering and Identification and Applications (Les Arcs, 1980), volume 78 of NATO Adv. Study Inst. Ser. C: Math. Phys. Sci., pages 441–477, Dordrecht-Boston, 1981. Reidel.

27. R. W. Brockett. Nonlinear control theory and differential geometry. In Z. Ciesielski and C. Olech, editors, Proceedings of the International Congress of Mathematicians, pages 1357–1367, Warsaw, 1984. Polish Scientific.

28. R. W. Brockett and J. M. C. Clark. The geometry of the conditional density equation. In Analysis and Optimisation of Stochastic Systems (Proceedings of the International Conference, University of Oxford, Oxford, 1978), pages 299–309, London-New York, 1980. Academic Press.

29. R. S. Bucy. Optimum finite time filters for a special non-stationary class of inputs. Technical Report Internal Report B. B. D. 600, March 31, Johns Hopkins Applied Physics Laboratory, 1959.

30. R. S. Bucy. Nonlinear filtering. IEEE Trans. Automatic Control, AC-10:198, 1965.


31. R. S. Bucy and P. D. Joseph. Filtering for Stochastic Processes with Applications to Guidance. Chelsea, New York, second edition, 1987.

32. A. Budhiraja and G. Kallianpur. Approximations to the solution of the Zakai equation using multiple Wiener and Stratonovich integral expansions. Stochastics Stochastics Rep., 56(3-4):271–315, 1996.

33. A. Budhiraja and G. Kallianpur. The Feynman-Stratonovich semigroup and Stratonovich integral expansions in nonlinear filtering. Appl. Math. Optim., 35(1):91–116, 1997.

34. A. Budhiraja and D. Ocone. Exponential stability in discrete-time filtering for non-ergodic signals. Stochastic Process. Appl., 82(2):245–257, 1999.

35. Amarjit Budhiraja and Harold J. Kushner. Approximation and limit results for nonlinear filters over an infinite time interval. II. Random sampling algorithms. SIAM J. Control Optim., 38(6):1874–1908 (electronic), 2000.

36. D. L. Burkholder. Distribution function inequalities for martingales. Ann. Prob., 1(1):19–42, 1973.

37. Z. Cai, F. Le Gland, and H. Zhang. An adaptive local grid refinement method for nonlinear filtering. Technical Report 2679, INRIA, 1995.

38. J. Carpenter, P. Clifford, and P. Fearnhead. An improved particle filter for non-linear problems. IEE Proceedings – Radar, Sonar and Navigation, 146:2–7, 1999.

39. J. R. Carpenter, P. Clifford, and P. Fearnhead. Sampling strategies for Monte Carlo filters for non-linear systems. IEE Colloquium Digest, 243:6/1–6/3, 1996.

40. M. Chaleyat-Maurel. Robustesse du filtre et calcul des variations stochastique. J. Funct. Anal., 68(1):55–71, 1986.

41. M. Chaleyat-Maurel. Continuity in nonlinear filtering. Some different approaches. In Stochastic Partial Differential Equations and Applications (Trento, 1985), volume 1236 of Lecture Notes in Math., pages 25–39. Springer, Berlin, 1987.

42. M. Chaleyat-Maurel and D. Michel. Des resultats de non existence de filtre de dimension finie. Stochastics, 13(1-2):83–102, 1984.

43. M. Chaleyat-Maurel and D. Michel. Hypoellipticity theorems and conditional laws. Z. Wahrsch. Verw. Gebiete, 65(4):573–597, 1984.

44. M. Chaleyat-Maurel and D. Michel. The support of the law of a filter in $C^\infty$ topology. In Stochastic Differential Systems, Stochastic Control Theory and Applications (Minneapolis, MN, 1986), volume 10 of IMA Vol. Math. Appl., pages 395–407. Springer, New York, 1988.

45. M. Chaleyat-Maurel and D. Michel. The support of the density of a filter in the uncorrelated case. In Stochastic Partial Differential Equations and Applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 33–41. Springer, Berlin, 1989.

46. M. Chaleyat-Maurel and D. Michel. Support theorems in nonlinear filtering. In New Trends in Nonlinear Control Theory (Nantes, 1988), volume 122 of Lecture Notes in Control and Inform. Sci., pages 396–403. Springer, Berlin, 1989.

47. M. Chaleyat-Maurel and D. Michel. A Stroock Varadhan support theorem in nonlinear filtering theory. Probab. Theory Related Fields, 84(1):119–139, 1990.

48. J. Chen, S. S.-T. Yau, and C.-W. Leung. Finite-dimensional filters with nonlinear drift. IV. Classification of finite-dimensional estimation algebras of maximal rank with state-space dimension 3. SIAM J. Control Optim., 34(1):179–198, 1996.


49. J. Chen, S. S.-T. Yau, and C.-W. Leung. Finite-dimensional filters with nonlinear drift. VIII. Classification of finite-dimensional estimation algebras of maximal rank with state-space dimension 4. SIAM J. Control Optim., 35(4):1132–1141, 1997.

50. W. L. Chiou and S. S.-T. Yau. Finite-dimensional filters with nonlinear drift. II. Brockett's problem on classification of finite-dimensional estimation algebras. SIAM J. Control Optim., 32(1):297–310, 1994.

51. N. Chopin. Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Annals of Statistics, 32(6):2385–2411, 2004.

52. P.-L. Chow, R. Khasminskii, and R. Liptser. Tracking of signal and its derivatives in Gaussian white noise. Stochastic Process. Appl., 69(2):259–273, 1997.

53. K. L. Chung and R. J. Williams. Introduction to Stochastic Integration. Birkhauser, Boston, second edition, 1990.

54. B. Cipra. Engineers look to Kalman filtering for guidance. SIAM News, 26(5), 1993.

55. J. M. C. Clark. Conditions for one to one correspondence between an observation process and its innovation. Technical report, Centre for Computing and Automation, Imperial College, London, 1969.

56. J. M. C. Clark. The design of robust approximations to the stochastic differential equations of nonlinear filtering. In J. K. Skwirzynski, editor, Communication Systems and Random Process Theory, volume 25 of Proc. 2nd NATO Advanced Study Inst. Ser. E, Appl. Sci., pages 721–734. Sijthoff & Noordhoff, Alphen aan den Rijn, 1978.

57. J. M. C. Clark, D. L. Ocone, and C. Coumarbatch. Relative entropy and error bounds for filtering of Markov processes. Math. Control Signals Systems, 12(4):346–360, 1999.

58. M. Cohen de Lara. Finite-dimensional filters. II. Invariance group techniques. SIAM J. Control Optim., 35(3):1002–1029, 1997.

59. M. Cohen de Lara. Finite-dimensional filters. Part I: The Wei normal technique. Part II: Invariance group technique. SIAM J. Control Optim., 35(3):980–1029, 1997.

60. D. Crisan. Exact rates of convergence for a branching particle approximation to the solution of the Zakai equation. Ann. Probab., 31(2):693–718, 2003.

61. D. Crisan. Particle approximations for a class of stochastic partial differential equations. Appl. Math. Optim., 54(3):293–314, 2006.

62. D. Crisan, P. Del Moral, and T. Lyons. Interacting particle systems approximations of the Kushner-Stratonovitch equation. Adv. in Appl. Probab., 31(3):819–838, 1999.

63. D. Crisan, J. Gaines, and T. Lyons. Convergence of a branching particle method to the solution of the Zakai equation. SIAM J. Appl. Math., 58(5):1568–1590, 1998.

64. D. Crisan and T. Lyons. Nonlinear filtering and measure-valued processes. Probab. Theory Related Fields, 109(2):217–244, 1997.

65. D. Crisan and T. Lyons. A particle approximation of the solution of the Kushner-Stratonovitch equation. Probab. Theory Related Fields, 115(4):549–578, 1999.

66. D. Crisan and T. Lyons. Minimal entropy approximations and optimal algorithms for the filtering problem. Monte Carlo Methods and Applications, 8(4):343–356, 2002.


67. D. Crisan, P. Del Moral, and T. Lyons. Discrete filtering using branching and interacting particle systems. Markov Processes and Related Fields, 5(3):293–318, 1999.

68. R. W. R. Darling. Geometrically intrinsic nonlinear recursive filters. Technical report, Berkeley Statistics Department, 1998. http://www.stat.berkeley.edu/~darling/GINRF.

69. F. E. Daum. New exact nonlinear filters. In J. C. Spall, editor, Bayesian Analysis of Time Series and Dynamic Models, pages 199–226, New York, 1988. Marcel Dekker.

70. F. E. Daum. New exact nonlinear filters: Theory and applications. Proc. SPIE, 2235:636–649, 1994.

71. M. H. A. Davis. Linear Estimation and Stochastic Control. Chapman and Hall Mathematics Series. Chapman and Hall, London, 1977.

72. M. H. A. Davis. On a multiplicative functional transformation arising in nonlinear filtering theory. Z. Wahrsch. Verw. Gebiete, 54(2):125–139, 1980.

73. M. H. A. Davis. New approach to filtering for nonlinear systems. Proc. IEE-D, 128(5):166–172, 1981.

74. M. H. A. Davis. Pathwise nonlinear filtering. In M. Hazewinkel and J. C. Willems, editors, Stochastic Systems: The Mathematics of Filtering and Identification and Applications, Proc. NATO Advanced Study Inst. Ser. C 78, pages 505–528, Dordrecht-Boston, 1981. Reidel.

75. M. H. A. Davis. A pathwise solution of the equations of nonlinear filtering. Theory Probability Applications [trans. of Teor. Veroyatnost. i Primenen.], 27(1):167–175, 1982.

76. M. H. A. Davis and M. P. Spathopoulos. Pathwise nonlinear filtering for nondegenerate diffusions with noise correlation. SIAM J. Control Optim., 25(2):260–278, 1987.

77. Claude Dellacherie and Paul-Andre Meyer. Probabilites et potentiel. Chapitres I a IV. [French] [Probability and potential. Chapters I–IV]. Hermann, Paris, 1975.

78. Claude Dellacherie and Paul-Andre Meyer. Un nouveau theoreme de projection et de section [French]. In Seminaire de Probabilites, IX (Seconde Partie, Univ. Strasbourg, Annees universitaires 1973/1974 et 1974/1975), pages 239–245. Springer Verlag, New York, 1975.

79. Claude Dellacherie and Paul-Andre Meyer. Probabilites et potentiel. Chapitres V a VIII. [French] [Probability and potential. Chapters V–VIII] Theorie des martingales. Hermann, Paris, 1980.

80. Giovanni B. Di Masi and Wolfgang J. Runggaldier. An adaptive linear approach to nonlinear filtering. In Applications of Mathematics in Industry and Technology (Siena, 1988), pages 308–316. Teubner, Stuttgart, 1989.

81. J. L. Doob. Stochastic Processes. Wiley, New York, 1963.

82. J. L. Doob. Stochastic process measurability conditions. Annales de l'institut Fourier, 25(3–4):163–176, 1975.

83. Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo Methods in Practice. Stat. Eng. Inf. Sci. Springer, New York, 2001.

84. T. E. Duncan. Likelihood functions for stochastic signals in white noise. Information and Control, 16:303–310, 1970.

85. T. E. Duncan. On the absolute continuity of measures. Ann. Math. Statist., 41:30–38, 1970.


86. T. E. Duncan. On the steady state filtering problem for linear pure delay time systems. In Analysis and control of systems (IRIA Sem., Rocquencourt, 1979), pages 25–42. INRIA, Rocquencourt, 1980.

87. T. E. Duncan. Stochastic filtering in manifolds. In Control Science and Technology for the Progress of Society, Vol. 1 (Kyoto, 1981), pages 553–556. IFAC, Luxembourg, 1982.

88. T. E. Duncan. Explicit solutions for an estimation problem in manifolds associated with Lie groups. In Differential Geometry: The Interface Between Pure and Applied Mathematics (San Antonio, TX, 1986), volume 68 of Contemp. Math., pages 99–109. Amer. Math. Soc., Providence, RI, 1987.

89. T. E. Duncan. An estimation problem in compact Lie groups. Systems Control Lett., 10(4):257–263, 1988.

90. R. J. Elliott and V. Krishnamurthy. Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM J. Control Optim., 35(6):1908–1923, 1997.

91. R. J. Elliott and J. van der Hoek. A finite-dimensional filter for hybrid observations. IEEE Trans. Automat. Control, 43(5):736–739, 1998.

92. Robert J. Elliott and Michael Kohlmann. Robust filtering for correlated multidimensional observations. Math. Z., 178(4):559–578, 1981.

93. Robert J. Elliott and Michael Kohlmann. The existence of smooth densities for the prediction filtering and smoothing problems. Acta Appl. Math., 14(3):269–286, 1989.

94. Robert J. Elliott and John B. Moore. Zakai equations for Hilbert space valued processes. Stochastic Anal. Appl., 16(4):597–605, 1998.

95. Stewart N. Ethier and Thomas G. Kurtz. Markov Processes: Characterization and Convergence. Wiley, New York, 1986.

96. Marco Ferrante and Wolfgang J. Runggaldier. On necessary conditions for the existence of finite-dimensional filters in discrete time. Systems Control Lett., 14(1):63–69, 1990.

97. W. H. Fleming and E. Pardoux. Optimal control of partially observed diffusions. SIAM J. Control Optim., 20(2):261–285, 1982.

98. Wendell H. Fleming and Sanjoy K. Mitter. Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics, 8(1):63–77, 1982/83.

99. Patrick Florchinger. Malliavin calculus with time dependent coefficients and application to nonlinear filtering. Probab. Theory Related Fields, 86(2):203–223, 1990.

100. Patrick Florchinger and Francois Le Gland. Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. In Analysis and Optimization of Systems (Antibes, 1990), volume 144 of Lecture Notes in Control and Inform. Sci., pages 228–237. Springer, Berlin, 1990.

101. Patrick Florchinger and Francois Le Gland. Time-discretization of the Zakai equation for diffusion processes observed in correlated noise. Stochastics Stochastics Rep., 35(4):233–256, 1991.

102. Avner Friedman. Partial Differential Equations of Parabolic Type. Prentice-Hall, Englewood Cliffs, NJ, 1964.

103. P. Frost and T. Kailath. An innovations approach to least-squares estimation. III. IEEE Trans. Autom. Control, AC-16:217–226, 1971.

104. M. Fujisaki, G. Kallianpur, and H. Kunita. Stochastic differential equations for the non linear filtering problem. Osaka J. Math., 9:19–40, 1972.


105. R. K. Getoor. On the construction of kernels. In Seminaire de Probabilites, IX (Seconde Partie, Univ. Strasbourg, Annees universitaires 1973/1974 et 1974/1975), volume 465 of Lecture Notes in Math., pages 443–463. Springer Verlag, Berlin, 1975.

106. N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings, Part F, 140(2):107–113, 1993.

107. B. Grigelionis. The theory of nonlinear estimation and semimartingales. Izv. Akad. Nauk UzSSR Ser. Fiz.-Mat. Nauk, (3):17–22, 97, 1981.

108. B. Grigelionis. Stochastic nonlinear filtering equations and semimartingales. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 63–99. Springer, Berlin, 1982.

109. B. Grigelionis and R. Mikulevicius. On weak convergence to random processes with boundary conditions. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 260–275. Springer, Berlin, 1982.

110. B. Grigelionis and R. Mikulevicius. Stochastic evolution equations and densities of the conditional distributions. In Theory and Application of Random Fields (Bangalore, 1982), volume 49 of Lecture Notes in Control and Inform. Sci., pages 49–88. Springer, Berlin, 1983.

111. B. Grigelionis and R. Mikulyavichyus. Robustness in nonlinear filtering theory. Litovsk. Mat. Sb., 22(4):37–45, 1982.

112. I. Gyongy. The approximation of stochastic partial differential equations and applications in nonlinear filtering. Comput. Math. Appl., 19(1):47–63, 1990.

113. I. Gyongy and N. V. Krylov. Stochastic partial differential equations with unbounded coefficients and applications. II. Stochastics Stochastics Rep., 32(3-4):165–180, 1990.

114. I. Gyongy and N. V. Krylov. On stochastic partial differential equations with unbounded coefficients. In Stochastic Partial Differential Equations and Applications (Trento, 1990), volume 268 of Pitman Res. Notes Math. Ser., pages 191–203. Longman Sci. Tech., Harlow, 1992.

115. Istvan Gyongy. On stochastic partial differential equations. Results on approximations. In Topics in Stochastic Systems: Modelling, Estimation and Adaptive Control, volume 161 of Lecture Notes in Control and Inform. Sci., pages 116–136. Springer, Berlin, 1991.

116. Istvan Gyongy. Filtering on manifolds. Acta Appl. Math., 35(1-2):165–177, 1994. White noise models and stochastic systems (Enschede, 1992).

117. Istvan Gyongy. Stochastic partial differential equations on manifolds. II. Nonlinear filtering. Potential Anal., 6(1):39–56, 1997.

118. Istvan Gyongy and Nicolai Krylov. On the rate of convergence of splitting-up approximations for SPDEs. In Stochastic Inequalities and Applications, volume 56 of Progr. Probab., pages 301–321. Birkhauser, 2003.

119. Istvan Gyongy and Nicolai Krylov. On the splitting-up method and stochastic partial differential equations. Ann. Probab., 31(2):564–591, 2003.

120. J. E. Handschin and D. Q. Mayne. Monte Carlo techniques to estimate the conditional expectation in multi-stage non-linear filtering. Internat. J. Control, 1(9):547–559, 1969.

121. M. Hazewinkel, S. I. Marcus, and H. J. Sussmann. Nonexistence of finite-dimensional filters for conditional statistics of the cubic sensor problem. Systems Control Lett., 3(6):331–340, 1983.


122. M. Hazewinkel, S. I. Marcus, and H. J. Sussmann. Nonexistence of finite-dimensional filters for conditional statistics of the cubic sensor problem. In Filtering and Control of Random Processes (Paris, 1983), volume 61 of Lecture Notes in Control and Inform. Sci., pages 76–103, Berlin, 1984. Springer.

123. Michiel Hazewinkel. Lie algebraic methods in filtering and identification. In VIIIth International Congress on Mathematical Physics (Marseille, 1986), pages 120–137. World Scientific, Singapore, 1987.

124. Michiel Hazewinkel. Lie algebraic method in filtering and identification. In Stochastic Processes in Physics and Engineering (Bielefeld, 1986), volume 42 of Math. Appl., pages 159–176. Reidel, Dordrecht, 1988.

125. Michiel Hazewinkel. Non-Gaussian linear filtering, identification of linear systems, and the symplectic group. In Modeling and Control of Systems in Engineering, Quantum Mechanics, Economics and Biosciences (Sophia-Antipolis, 1988), volume 121 of Lecture Notes in Control and Inform. Sci., pages 299–308. Springer, Berlin, 1989.

126. Michiel Hazewinkel. Non-Gaussian linear filtering, identification of linear systems, and the symplectic group. In Signal Processing, Part II, volume 23 of IMA Vol. Math. Appl., pages 99–113. Springer, New York, 1990.

127. A. J. Heunis. Nonlinear filtering of rare events with large signal-to-noise ratio. J. Appl. Probab., 24(4):929–948, 1987.

128. A. J. Heunis. On the stochastic differential equations of filtering theory. Appl. Math. Comput., 37(3):185–218, 1990.

129. A. J. Heunis. On the stochastic differential equations of filtering theory. Appl. Math. Comput., 39(3, suppl.):3s–36s, 1990.

130. Andrew Heunis. Rates of convergence for an adaptive filtering algorithm driven by stationary dependent data. SIAM J. Control Optim., 32(1):116–139, 1994.

131. Guo-Qing Hu, Stephen S. T. Yau, and Wen-Lin Chiou. Finite-dimensional filters with nonlinear drift. XIII. Classification of finite-dimensional estimation algebras of maximal rank with state space dimension five. Loo-Keng Hua: a great mathematician of the twentieth century. Asian J. Math., 4(4):905–931, 2000.

132. M. Isard and A. Blake. Visual tracking by stochastic propagation of conditional density. In Proceedings of the 4th European Conference on Computer Vision, pages 343–356, New York, 1996. Springer Verlag.

133. M. Isard and A. Blake. Condensation conditional density propagation for visual tracking. Int. J. Computer Vision, 1998.

134. M. Isard and A. Blake. A mixed-state condensation tracker with automatic model switching. In Proceedings of the 6th International Conference on Computer Vision, pages 107–112, 1998.

135. K. Ito and H. P. McKean. Diffusion Processes and Their Sample Paths. Academic Press, New York, 1965.

136. Matthew R. James and Francois Le Gland. Numerical approximation for nonlinear filtering and finite-time observers. In Applied Stochastic Analysis (New Brunswick, NJ, 1991), volume 177 of Lecture Notes in Control and Inform. Sci., pages 159–175. Springer, Berlin, 1992.

137. T. Kailath. An innovations approach to least-squares estimation. I. Linear filtering in additive white noise. IEEE Trans. Autom. Control, AC-13:646–655, 1968.


138. G. Kallianpur. White noise theory of filtering: Some robustness and consistency results. In Stochastic Differential Systems (Marseille-Luminy, 1984), volume 69 of Lecture Notes in Control and Inform. Sci., pages 217–223. Springer, Berlin, 1985.

139. G. Kallianpur and R. L. Karandikar. The Markov property of the filter in the finitely additive white noise approach to nonlinear filtering. Stochastics, 13(3):177–198, 1984.

140. G. Kallianpur and R. L. Karandikar. Measure-valued equations for the optimum filter in finitely additive nonlinear filtering theory. Z. Wahrsch. Verw. Gebiete, 66(1):1–17, 1984.

141. G. Kallianpur and R. L. Karandikar. A finitely additive white noise approach to nonlinear filtering: A brief survey. In Multivariate Analysis VI (Pittsburgh, PA, 1983), pages 335–344. North-Holland, Amsterdam, 1985.

142. G. Kallianpur and R. L. Karandikar. White noise calculus and nonlinear filtering theory. Ann. Probab., 13(4):1033–1107, 1985.

143. G. Kallianpur and R. L. Karandikar. White Noise Theory of Prediction, Filtering and Smoothing, volume 3 of Stochastics Monographs. Gordon & Breach Science, New York, 1988.

144. G. Kallianpur and C. Striebel. Estimation of stochastic systems: Arbitrary system process with additive white noise observation errors. Ann. Math. Statist., 39(3):785–801, 1968.

145. Gopinath Kallianpur. Stochastic Filtering Theory, volume 13 of Applications of Mathematics. Springer, New York, 1980.

146. R. E. Kalman. A new approach to linear filtering and prediction problems. J. Basic Eng., 82:35–45, 1960.

147. R. E. Kalman and R. S. Bucy. New results in linear filtering and prediction theory. Trans. ASME, Ser. D, J. Basic Eng., 83:95–108, 1961.

148. Jim Kao, Dawn Flicker, Kayo Ide, and Michael Ghil. Estimating model parameters for an impact-produced shock-wave simulation: Optimal use of partial data with the extended Kalman filter. J. Comput. Phys., 214(2):725–737, 2006.

149. I. Karatzas and S. E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer, New York, second edition, 1991.

150. Genshiro Kitagawa. Non-Gaussian state-space modeling of nonstationary time series. With comments and a reply by the author. J. Amer. Statist. Assoc., 82(400):1032–1063, 1987.

151. P. E. Kloeden and E. Platen. The Numerical Solution of Stochastic Differential Equations. Springer, New York, 1992.

152. A. N. Kolmogorov. Sur l'interpolation et extrapolation des suites stationnaires. C. R. Acad. Sci., 208:2043, 1939.

153. A. N. Kolmogorov. Interpolation and extrapolation. Bulletin de l'academie des sciences de U.S.S.R., Ser. Math., 5:3–14, 1941.

154. Hayri Korezlioglu and Wolfgang J. Runggaldier. Filtering for nonlinear systems driven by nonwhite noises: An approximation scheme. Stochastics Stochastics Rep., 44(1-2):65–102, 1993.

155. M. G. Krein. On a generalization of some investigations of G. Szego, W. M. Smirnov, and A. N. Kolmogorov. Dokl. Akad. Nauk SSSR, 46:91–94, 1945.

156. M. G. Krein. On a problem of extrapolation of A. N. Kolmogorov. Dokl. Akad. Nauk SSSR, 46:306–309, 1945.


157. N. V. Krylov. On Lp-theory of stochastic partial differential equations in the whole space. SIAM J. Math. Anal., 27(2):313–340, 1996.

158. N. V. Krylov. An analytic approach to SPDEs. In Stochastic Partial Differential Equations: Six Perspectives, number 64 in Math. Surveys Monogr., pages 185–242. Amer. Math. Soc., Providence, RI, 1999.

159. N. V. Krylov and B. L. Rozovskii. The Cauchy problem for linear stochastic partial differential equations. Izv. Akad. Nauk SSSR Ser. Mat., 41(6):1329–1347, 1448, 1977.

160. N. V. Krylov and B. L. Rozovskii. Conditional distributions of diffusion processes. Izv. Akad. Nauk SSSR Ser. Mat., 42(2):356–378, 470, 1978.

161. N. V. Krylov and B. L. Rozovskii. Characteristics of second-order degenerate parabolic Ito equations. Trudy Sem. Petrovsk., (8):153–168, 1982.

162. N. V. Krylov and B. L. Rozovskii. Stochastic partial differential equations and diffusion processes. Uspekhi Mat. Nauk, 37(6(228)):75–95, 1982.

163. N. V. Krylov and A. Zatezalo. A direct approach to deriving filtering equations for diffusion processes. Appl. Math. Optim., 42(3):315–332, 2000.

164. H. Kunita. Stochastic Flows and Stochastic Differential Equations. Number 24 in Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, UK, 1990.

165. Hiroshi Kunita. Cauchy problem for stochastic partial differential equations arising in nonlinear filtering theory. Systems Control Lett., 1(1):37–41, 1981/82.

166. Hiroshi Kunita. Stochastic partial differential equations connected with nonlinear filtering. In Nonlinear Filtering and Stochastic Control (Cortona, 1981), volume 972 of Lecture Notes in Math., pages 100–169. Springer, Berlin, 1982.

167. Hiroshi Kunita. Ergodic properties of nonlinear filtering processes. In Spatial Stochastic Processes, volume 19 of Progr. Probab., pages 233–256. Birkhauser Boston, 1991.

168. Hiroshi Kunita. The stability and approximation problems in nonlinear filtering theory. In Stochastic Analysis, pages 311–330. Academic Press, Boston, 1991.

169. Hans R. Kunsch. Recursive Monte Carlo filters: Algorithms and theoretical analysis. Ann. Statist., 33(5):1983–2021, 2005.

170. T. G. Kurtz and D. L. Ocone. Unique characterization of conditional distributions in nonlinear filtering. Ann. Probab., 16(1):80–107, 1988.

171. T. G. Kurtz and J. Xiong. Numerical solutions for a class of SPDEs with application to filtering. In Stochastics in Finite and Infinite Dimensions, Trends Math., pages 233–258. Birkhauser Boston, 2001.

172. Thomas G. Kurtz. Martingale problems for conditional distributions of Markov processes. Electron. J. Probab., 3:no. 9, 29 pp. (electronic), 1998.

173. Thomas G. Kurtz and Daniel Ocone. A martingale problem for conditional distributions and uniqueness for the nonlinear filtering equations. In Stochastic Differential Systems (Marseille-Luminy, 1984), volume 69 of Lecture Notes in Control and Inform. Sci., pages 224–234. Springer, Berlin, 1985.

174. Thomas G. Kurtz and Jie Xiong. Particle representations for a class of nonlinear SPDEs. Stochastic Process. Appl., 83(1):103–126, 1999.

175. H. Kushner. On the differential equations satisfied by conditional densities of Markov processes, with applications. SIAM J. Control, 2:106–119, 1964.

176. H. Kushner. Technical Report JA2123, M.I.T. Lincoln Laboratory, March 1963.

177. H. J. Kushner. Approximations of nonlinear filters. IEEE Trans. Automat. Control, AC-12:546–556, 1967.


178. H. J. Kushner. Dynamical equations for optimal nonlinear filtering. J. Differential Equations, 3:179–190, 1967.

179. H. J. Kushner. A robust discrete state approximation to the optimal nonlinear filter for a diffusion. Stochastics, 3(2):75–83, 1979.

180. H. J. Kushner. Robustness and convergence of approximations to nonlinear filters for jump-diffusions. Matematica Aplicada e Computacional, 16(2):153–183, 1997.

181. H. J. Kushner and P. Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Number 24 in Applications of Mathematics. Springer, New York, 1992.

182. Harold J. Kushner. Weak Convergence Methods and Singularly Perturbed Stochastic Control and Filtering Problems, volume 3 of Systems & Control: Foundations & Applications. Birkhauser Boston, 1990.

183. Harold J. Kushner and Amarjit S. Budhiraja. A nonlinear filtering algorithm based on an approximation of the conditional distribution. IEEE Trans. Autom. Control, 45(3):580–585, 2000.

184. Harold J. Kushner and Hai Huang. Approximate and limit results for nonlinear filters with wide bandwidth observation noise. Stochastics, 16(1-2):65–96, 1986.

185. S. Kusuoka and D. Stroock. The partial Malliavin calculus and its application to nonlinear filtering. Stochastics, 12(2):83–142, 1984.

186. F. Le Gland. Time discretization of nonlinear filtering equations. In Proceedings of the 28th IEEE-CSS Conference on Decision and Control, Tampa, FL, pages 2601–2606, 1989.

187. Francois Le Gland. Splitting-up approximation for SPDEs and SDEs with application to nonlinear filtering. In Stochastic Partial Differential Equations and Their Applications (Charlotte, NC, 1991), volume 176 of Lecture Notes in Control and Inform. Sci., pages 177–187. Springer, New York, 1992.

188. Francois Le Gland and Nadia Oudjane. Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. Ann. Appl. Probab., 14(1):144–187, 2004.

189. J. Levine. Finite-dimensional realizations of stochastic PDEs and application to filtering. Stochastics Stochastics Rep., 37(1–2):75–103, 1991.

190. Robert Liptser and Ofer Zeitouni. Robust diffusion approximation for nonlinear filtering. J. Math. Systems Estim. Control, 8(1):22 pp. (electronic), 1998.

191. Robert S. Liptser and Wolfgang J. Runggaldier. On diffusion approximations for filtering. Stochastic Process. Appl., 38(2):205–238, 1991.

192. Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes. I. General Theory, volume 5 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2001. Translated from the 1974 Russian original by A. B. Aries.

193. Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes. II. Applications, volume 6 of Stochastic Modelling and Applied Probability. Springer, New York, second edition, 2001. Translated from the 1974 Russian original by A. B. Aries.

194. S. Lototsky, C. Rao, and B. Rozovskii. Fast nonlinear filter for continuous-discrete time multiple models. In Proceedings of the 35th IEEE Conference on Decision and Control, Kobe, Japan, 1996, volume 4, pages 4060–4064, Madison, WI, 1997. Omnipress.


195. S. V. Lototsky. Optimal filtering of stochastic parabolic equations. In Recent Developments in Stochastic Analysis and Related Topics, pages 330–353. World Scientific, Hackensack, NJ, 2004.

196. S. V. Lototsky. Wiener chaos and nonlinear filtering. Appl. Math. Optim., 54(3):265–291, 2006.

197. Sergey Lototsky, Remigijus Mikulevicius, and Boris L. Rozovskii. Nonlinear filtering revisited: A spectral approach. SIAM J. Control Optim., 35(2):435–461, 1997.

198. Sergey Lototsky and Boris Rozovskii. Stochastic differential equations: A Wiener chaos approach. In From Stochastic Calculus to Mathematical Finance, pages 433–506. Springer, New York, 2006.

199. Sergey V. Lototsky. Nonlinear filtering of diffusion processes in correlated noise: analysis by separation of variables. Appl. Math. Optim., 47(2):167–194, 2003.

200. Vladimir M. Lucic and Andrew J. Heunis. On uniqueness of solutions for the stochastic differential equations of nonlinear filtering. Ann. Appl. Probab., 11(1):182–209, 2001.

201. T. M. Macrobert. Functions of a Complex Variable. St. Martin's Press, New York, 1954.

202. Michael Mangold, Markus Grotsch, Min Sheng, and Achim Kienle. State estimation of a molten carbonate fuel cell by an extended Kalman filter. In Control and Observer Design for Nonlinear Finite and Infinite Dimensional Systems, volume 322 of Lecture Notes in Control and Inform. Sci., pages 93–109. Springer, New York, 2005.

203. S. J. Maybank. Path integrals and finite-dimensional filters. In Stochastic Partial Differential Equations (Edinburgh, 1994), volume 216 of London Math. Soc. Lecture Note Ser., pages 209–229, Cambridge, UK, 1995. Cambridge University Press.

204. Stephen Maybank. Finite-dimensional filters. Phil. Trans. R. Soc. Lond. A, 354(1710):1099–1123, 1996.

205. Paul-Andre Meyer. Sur un probleme de filtration [French]. In Seminaire de Probabilites, VII (Univ. Strasbourg), Annee universitaire 1971/1972, volume 321 of Lecture Notes in Math., pages 223–247. Springer Verlag, Berlin, 1973.

206. Paul-Andre Meyer. La theorie de la prediction de F. Knight [French]. In Seminaire de Probabilites, X (Univ. Strasbourg), Annee universitaire 1974/1975, volume 511 of Lecture Notes in Math., pages 86–103. Springer Verlag, Berlin, 1976.

207. Dominique Michel. Regularite des lois conditionnelles en theorie du filtrage non-lineaire et calcul des variations stochastique. J. Funct. Anal., 41(1):8–36, 1981.

208. R. Mikulevicius and B. L. Rozovskii. Separation of observations and parameters in nonlinear filtering. In Proceedings of the 32nd IEEE Conference on Decision and Control, Part 2, San Antonio. IEEE Control Systems Society, 1993.

209. R. Mikulevicius and B. L. Rozovskii. Fourier-Hermite expansions for nonlinear filtering. Teor. Veroyatnost. i Primenen., 44(3):675–680, 1999.

210. Sanjoy K. Mitter. Existence and nonexistence of finite-dimensional filters. Rend. Sem. Mat. Univ. Politec. Torino, Special Issue:173–188, 1982.

211. Sanjoy K. Mitter. Geometric theory of nonlinear filtering. In Mathematical Tools and Models for Control, Systems Analysis and Signal Processing, Vol. 3 (Toulouse/Paris, 1981/1982), Travaux Rech. Coop. Programme 567, pages 37–60. CNRS, Paris, 1983.

212. Sanjoy K. Mitter and Nigel J. Newton. A variational approach to nonlinearestimation. SIAM J. Control Optim., 42(5):1813–1833 (electronic), 2003.

213. Sanjoy K. Mitter and Irvin C. Schick. Point estimation, stochastic approxima-tion, and robust Kalman filtering. In Systems, Models and Feedback: Theoryand Applications (Capri, 1992), volume 12 of Progr. Systems Control Theory,pages 127–151. Birkhauser Boston, 1992.

214. P. Del Moral. Non-linear filtering: Interacting particle solution. Markov Pro-cesses Related Fields, 2:555–580, 1996.

215. P. Del Moral. Non-linear filtering using random particles. Theory ProbabilityApplications, 40(4):690–701, 1996.

216. P. Del Moral. Feynman-Kac formulae. Genealogical and Interacting ParticleSystems with Applications. Springer, New York, 2004.

217. P. Del Moral and J. Jacod. The Monte-Carlo method for filtering with discrete-time observations: Central limit theorems. In Numerical Methods and Stochas-tics (Toronto, ON, 1999), Fields Inst. Commun., 34, pages 29–53. Amer. Math.Soc., Providence, RI, 2002.

218. P. Del Moral and L. Miclo. Branching and interacting particle systems approxi-mations of Feynman-Kac formulae with applications to non-linear filtering. InSeminaire de Probabilites, XXXIV, volume 1729 of Lecture Notes in Math.,pages 1–145. Springer, Berlin, 2000.

219. P. Del Moral, J. C. Noyer, G. Rigal, and G. Salut. Traitement particulairedu signal radar : detection, estimation et reconnaissance de cibles aeriennes.Technical Report 92495, LAAS, Dcembre 1992.

220. P. Del Moral, G. Rigal, and G. Salut. Estimation et commande optimale non-linéaire : un cadre unifié pour la résolution particulaire [French]. Technical Report 91137, LAAS, 1991.

221. P. Del Moral, G. Rigal, and G. Salut. Filtrage non-linéaire non-gaussien appliqué au recalage de plates-formes inertielles [French]. Technical Report 92207, LAAS, Juin 1992.

222. R. E. Mortensen. Stochastic optimal control with noisy observations. Internat. J. Control, 1(4):455–464, 1966.

223. Christian Musso, Nadia Oudjane, and François Le Gland. Improving regularised particle filters. In Sequential Monte Carlo Methods in Practice, Stat. Eng. Inf. Sci., pages 247–271. Springer, New York, 2001.

224. David E. Newland. Harmonic wavelet analysis. Proc. Roy. Soc. London Ser. A, 443(1917):203–225, 1993.

225. Nigel J. Newton. Observation sampling and quantisation for continuous-time estimators. Stochastic Process. Appl., 87(2):311–337, 2000.

226. Nigel J. Newton. Observations preprocessing and quantization for nonlinear filters. SIAM J. Control Optim., 38(2):482–502 (electronic), 2000.

227. David Nualart. The Malliavin Calculus and Related Topics. Springer, New York, second edition, 2006.

228. D. L. Ocone. Asymptotic stability of Beneš filters. Stochastic Anal. Appl., 17(6):1053–1074, 1999.

229. Daniel Ocone. Multiple integral expansions for nonlinear filtering. Stochastics, 10(1):1–30, 1983.


230. Daniel Ocone. Application of Wiener space analysis to nonlinear filtering. In Theory and Applications of Nonlinear Control Systems (Stockholm, 1985), pages 387–400. North-Holland, Amsterdam, 1986.

231. Daniel Ocone. Stochastic calculus of variations for stochastic partial differential equations. J. Funct. Anal., 79(2):288–331, 1988.

232. Daniel Ocone. Entropy inequalities and entropy dynamics in nonlinear filtering of diffusion processes. In Stochastic Analysis, Control, Optimization and Applications, Systems Control Found. Appl., pages 477–496. Birkhäuser Boston, 1999.

233. Daniel Ocone and Etienne Pardoux. A Lie algebraic criterion for nonexistence of finite-dimensionally computable filters. In Stochastic Partial Differential Equations and Applications, II (Trento, 1988), volume 1390 of Lecture Notes in Math., pages 197–204. Springer, Berlin, 1989.

234. O. A. Oleĭnik and E. V. Radkevič. Second Order Equations with Nonnegative Characteristic Form. Plenum Press, New York, 1973.

235. Levent Özbek and Murat Efe. An adaptive extended Kalman filter with application to compartment models. Comm. Statist. Simulation Comput., 33(1):145–158, 2004.

236. E. Pardoux. Équations aux dérivées partielles stochastiques non linéaires monotones [French]. PhD thesis, Univ. Paris XI, Orsay, 1975.

237. E. Pardoux. Stochastic partial differential equations and filtering of diffusion processes. Stochastics, 3(2):127–167, 1979.

238. E. Pardoux. Filtrage non linéaire et équations aux dérivées partielles stochastiques associées [French]. In École d'Été de Probabilités de Saint-Flour XIX – 1989, volume 1464 of Lecture Notes in Math., pages 67–163. Springer, 1991.

239. K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.

240. J. Picard. Efficiency of the extended Kalman filter for nonlinear systems with small noise. SIAM J. Appl. Math., 51(3):843–885, 1991.

241. Jean Picard. Approximation of nonlinear filtering problems and order of convergence. In Filtering and Control of Random Processes (Paris, 1983), volume 61 of Lecture Notes in Control and Inform. Sci., pages 219–236. Springer, Berlin, 1984.

242. Jean Picard. An estimate of the error in time discretization of nonlinear filtering problems. In Theory and Applications of Nonlinear Control Systems (Stockholm, 1985), pages 401–412. North-Holland, Amsterdam, 1986.

243. Jean Picard. Nonlinear filtering of one-dimensional diffusions in the case of a high signal-to-noise ratio. SIAM J. Appl. Math., 46(6):1098–1125, 1986.

244. Michael K. Pitt and Neil Shephard. Filtering via simulation: Auxiliary particle filters. J. Amer. Statist. Assoc., 94(446):590–599, 1999.

245. M. Pontier, C. Stricker, and J. Szpirglas. Sur le théorème de représentation par rapport à l'innovation [French]. In Séminaire de Probabilités, XX (Univ. Strasbourg, année universitaire 1984/1985), volume 1204 of Lecture Notes in Math., pages 34–39. Springer Verlag, Berlin, 1986.

246. Yu. V. Prokhorov. Convergence of random processes and limit theorems in probability theory. Theory Probability Applications [Teor. Veroyatnost. i Primenen.], 1(2):157–214, 1956.

247. P. Protter. Stochastic Integration and Differential Equations. Springer, Berlin, second edition, 2003.


248. L. C. G. Rogers and D. Williams. Diffusions, Markov Processes and Martingales: Volume I Foundations. Cambridge University Press, Cambridge, UK, second edition, 2000.

249. L. C. G. Rogers and D. Williams. Diffusions, Markov Processes and Martingales: Volume II Itô Calculus. Cambridge University Press, Cambridge, UK, second edition, 2000.

250. B. L. Rozovskii. Stochastic Evolution Systems. Kluwer, Dordrecht, 1990.

251. D. B. Rubin. A noniterative sampling/importance resampling alternative to the data augmentation algorithm for creating a few imputations when the fraction of missing information is modest: The SIR algorithm (discussion of Tanner and Wong). J. Amer. Statist. Assoc., 82:543–546, 1987.

252. Laurent Saloff-Coste. Aspects of Sobolev-Type Inequalities, volume 289 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, UK, 2002.

253. G. C. Schmidt. Designing nonlinear filters based on Daum's theory. J. of Guidance, Control, and Dynamics, 16(2):371–376, 1993.

254. Carla A. I. Schwartz and Bradley W. Dickinson. Characterizing finite-dimensional filters for the linear innovations of continuous-time random processes. IEEE Trans. Autom. Control, 30(3):312–315, 1985.

255. A. N. Shiryaev. Some new results in the theory of controlled random processes [Russian]. In Transactions of the Fourth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes (Prague, 1965), pages 131–203. Academia Prague, 1967.

256. Elias M. Stein. Singular Integrals and Differentiability Properties of Functions. Number 30 in Princeton Mathematical Series. Princeton University Press, Princeton, NJ, 1970.

257. R. L. Stratonovich. On the theory of optimal non-linear filtration of random functions. Teor. Veroyatnost. i Primenen., 4:223–225, 1959.

258. R. L. Stratonovich. Application of the theory of Markov processes for optimum filtration of signals. Radio Eng. Electron. Phys., 1:1–19, 1960.

259. R. L. Stratonovich. Conditional Markov processes. Theory Probability Applications [translation of Teor. Veroyatnost. i Primenen.], 5(2):156–178, 1960.

260. R. L. Stratonovich. Conditional Markov Processes and Their Application to the Theory of Optimal Control, volume 7 of Modern Analytic and Computational Methods in Science and Mathematics. Elsevier, New York, 1968. Translated from the Russian by R. N. and N. B. McDonough for Scripta Technica.

261. D. W. Stroock and S. R. S. Varadhan. Multidimensional Diffusion Processes. Springer, New York, 1979.

262. Daniel W. Stroock. Probability Theory, An Analytic View. Cambridge University Press, Cambridge, UK, 1993.

263. M. Sun and R. Glowinski. Pathwise approximation and simulation for the Zakai filtering equation through operator splitting. Calcolo, 30(3):219–239 (1994), 1993.

264. J. Szpirglas. Sur l'équivalence d'équations différentielles stochastiques à valeurs mesures intervenant dans le filtrage markovien non linéaire [French]. Ann. Inst. H. Poincaré Sect. B (N.S.), 14(1):33–59, 1978.

265. I. Tulcea. Mesures dans les espaces produits [French]. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Nat., 8(7):208–211, 1949.


266. A. S. Üstünel. Some comments on the filtering of diffusions and the Malliavin calculus. In Stochastic Analysis and Related Topics (Silivri, 1986), volume 1316 of Lecture Notes in Math., pages 247–266. Springer, Berlin, 1988.

267. A. Yu. Veretennikov. On backward filtering equations for SDE systems (direct approach). In Stochastic Partial Differential Equations (Edinburgh, 1994), volume 216 of London Math. Soc. Lecture Note Ser., pages 304–311, Cambridge, UK, 1995. Cambridge University Press.

268. D. Whitley. A genetic algorithm tutorial. Statist. Comput., 4:65–85, 1994.

269. Ward Whitt. Stochastic Process Limits. An Introduction to Stochastic-Process Limits and Their Application to Queues. Springer, New York, 2002.

270. N. Wiener. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. MIT Press, Cambridge, MA, 1949.

271. N. Wiener. I Am a Mathematician. Doubleday, Garden City, NY; Victor Gollancz, London, 1956.

272. D. Williams. Probability with Martingales. Cambridge University Press, Cambridge, UK, 1991.

273. W. M. Wonham. Some applications of stochastic differential equations to optimal nonlinear filtering. J. Soc. Indust. Appl. Math. Ser. A Control, 2:347–369, 1965.

274. Xi Wu, Stephen S.-T. Yau, and Guo-Qing Hu. Finite-dimensional filters with nonlinear drift. XII. Linear and constant structure of Wong-matrix. In Stochastic Theory and Control (Lawrence, KS, 2001), volume 280 of Lecture Notes in Control and Inform. Sci., pages 507–518, Berlin, 2002. Springer.

275. T. Yamada and S. Watanabe. On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ., 11:151–167, 1971.

276. Shing-Tung Yau and Stephen S. T. Yau. Finite-dimensional filters with nonlinear drift. XI. Explicit solution of the generalized Kolmogorov equation in Brockett-Mitter program. Adv. Math., 140(2):156–189, 1998.

277. Stephen S.-T. Yau. Finite-dimensional filters with nonlinear drift. I. A class of filters including both Kalman-Bucy filters and Beneš filters. J. Math. Systems Estim. Control, 4(2):181–203, 1994.

278. Stephen S.-T. Yau and Guo-Qing Hu. Finite-dimensional filters with nonlinear drift. X. Explicit solution of DMZ equation. IEEE Trans. Autom. Control, 46(1):142–148, 2001.

279. Marc Yor. Sur les théories du filtrage et de la prédiction [French]. In Séminaire de Probabilités, XI (Univ. Strasbourg), année universitaire 1975/1976, volume 581 of Lecture Notes in Math., pages 257–297. Springer Verlag, Berlin, 1977.

280. Marc Yor. Some Aspects of Brownian Motion, Part 1: Some Special Functionals (Lectures in Mathematics, ETH Zürich). Birkhäuser Boston, 1992.

281. Moshe Zakai. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 11:230–243, 1969.

282. O. Zeitouni. On the tightness of some error bounds for the nonlinear filtering problem. IEEE Trans. Autom. Control, 29(9):854–857, 1984.

283. O. Zeitouni and B. Z. Bobrovsky. On the reference probability approach to the equations of nonlinear filtering. Stochastics, 19(3):133–149, 1986.

284. Ofer Zeitouni. On the filtering of noise-contaminated signals observed via hard limiters. IEEE Trans. Inform. Theory, 34(5, part 1):1041–1048, 1988.


Author Name Index

A

Adams, R. A. 165, 166
Aggoun, L. 192
Allinger, D. F. 35

B

Baker, J. E. 280
Baras, J. S. 179
Bayro-Corrochano, E. 194
Beneš, V. E. 8, 142, 197–199
Bensoussan, A. 8, 9, 95, 104, 196, 356
Bharucha-Reid, A. T. 7
Bhatt, A. G. 126
Billingsley, P. 303
Blake, A. 286
Bobrovsky, B. Z. 196
Bourbaki, N. 27, 296
Breiman, L. 294
Brigo, D. 199, 202
Brockett, R. W. 8
Bucy, R. S. 6, 7, 192
Budhiraja, A. 9
Burkholder, D. L. 353

C

Carpenter, J. 230, 280
Chaleyat-Maurel, M. 8, 9
Chen, J. 8
Chiou, W. L. 8
Chopin, N. 280

Chung, K. L. 329, 330, 332, 338, 343, 355

Cipra, B. 6
Clark, J. M. C. 7, 8, 35, 129, 139, 348
Clifford, P. 230, 280
Cohen de Lara, M. 199
Crisan, D. 230, 249, 279, 281, 285, 286

D

Daniell, P. J. 301
Darling, R. W. R. 199
Daum, F. E. 8, 199
Davis, M. H. A. 7, 149, 250
Del Moral, P. 249, 250, 281, 286
Dellacherie, C. 307, 308, 312, 317, 319, 330
Dickinson, B. W. 8
Dieudonné, J. 32
Doob, J. L. 18, 58, 88, 301, 329
Doucet, A. 285
Duncan, T. E. 7, 9
Dynkin, E. B. 43

E

Efe, M. 194
Elliott, R. J. 9, 192
Ethier, S. N. 298, 303, 305, 330

F

Fearnhead, P. 230, 280
Fleming, W. H. 196


Friedman, A. 101, 103
Frost, P. 7
Fujisaki, M. 7, 34, 45

G

Getoor, R. K. 27, 28
Gordon, N. J. 276, 286
Grigelionis, B. 9
Gyöngy, I. 9, 139, 209

H

Halmos, P. R. 32
Handschin, J. E. 286
Hazewinkel, M. 8, 9
Heunis, A. J. 9, 95, 113, 114, 126
Hu, G.-Q. 8

I

Isard, M. 286
Itô, K. 360

J

Jacod, J. 249
Joseph, P. D. 192

K

Künsch, H. R. 279, 280
Kailath, T. 7
Kallianpur, G. 7, 8, 34, 35, 45, 57
Kalman, R. E. 6
Kao, J. 194
Karandikar, R. L. 8
Karatzas, I. 51, 88, 310, 330, 355
Kitagawa, G. 286
Kloeden, P. E. 251
Kolmogorov, A. N. 5, 13, 31, 32, 301
Krein, M. G. 5
Krylov, N. V. 7, 93, 139, 209, 355
Kunita, H. 7, 9, 34, 45, 182
Kuratowski, K. 27
Kurtz, T. G. 9, 126, 165, 249, 298, 303, 305, 330
Kushner, H. J. 7, 9, 139, 202

L

Lévy, P. 344, 362
Le Gland, F. 9
Leung, C. W. 8
Liptser, R. S. 9
Lototsky, S. 202, 204
Lucic, V. M. 95, 113, 114, 126
Lyons, T. J. 230, 249, 250, 281, 285, 286

M

Mangold, M. 194
Marcus, S. I. 8
Maybank, S. J. 8
Mayne, D. Q. 286
McKean, H. P. 360
Meyer, P. A. 27, 45, 307, 308, 312, 317, 319, 330
Michel, D. 8, 9
Miclo, L. 250
Mikulevicius, R. 9, 202
Mitter, S. K. 8, 9, 35
Mortensen, R. E. 7

N

Newton, N. J. 9
Novikov, A. A. 52, 350
Nualart, D. 348

O

Ocone, D. L. 9, 126
Oleĭnik, O. A. 105
Özbek, L. 194

P

Pardoux, E. 7, 9, 182, 193, 196
Picard, J. 9, 195, 196
Pitt, M. K. 285
Platen, E. 251
Prokhorov, Y. V. 45
Protter, P. 330, 351

R

Radkevič, E. V. 105


Rigal, G. 286
Rogers, L. C. G. 17, 32, 58, 293, 296, 300, 301, 307, 308, 319, 321, 329, 339, 343, 348

Rozovskii, B. L. 7, 9, 93, 176, 177, 182, 202, 355

Rubin, D. B. 286
Runggaldier, W. J. 9

S

Salmond, D. J. 276, 286
Saloff-Coste, L. 166
Salut, G. 286
Schmidt, G. C. 199
Schwartz, C. A. I. 8
Sharpe, M. J. 332
Shephard, N. 285
Shiryaev, A. N. 7, 9
Shreve, S. E. 51, 88, 310, 330, 355
Smith, A. F. M. 276, 286
Stein, E. M. 166
Stratonovich, R. L. 7
Striebel, C. 8, 57
Stroock, D. W. 28, 298
Sussmann, H. J. 8
Szpirglas, J. 125

T

Tsirel'son, B. S. 35
Tulcea, I. 298

V

Varadhan, S. R. S. 298

Veretennikov, A. Y. 249

W

Watanabe, S. 35
Whitley, D. 230, 280
Whitt, W. 303
Wiener, N. 5
Williams, D. 17, 32, 43, 58, 293, 296, 300, 301, 307, 308, 319, 321, 329, 339, 343, 348, 362

Williams, R. J. 329, 330, 332, 338, 343, 355

Wonham, W. M. 7
Wu, X. 8

X

Xiong, J. 165, 249

Y

Yamada, T. 35
Yau, S.-T. 8
Yau, S. S.-T. 8
Yor, M. 28, 360

Z

Zakai, M. 7, 196
Zatezalo, A. 93
Zeitouni, O. 9


Subject Index

A

Announcing sequence 321
Atom 298
Augmented filtration see Observation filtration
Averaging over the characteristics formula 182

B

Beneš condition 142, 196
Beneš filter 141, 146, 196
  the d-dimensional case 197
Bootstrap filter 276, 286
Borel space 301
Branching algorithm 278
Brownian motion 346
  exponential functional of 360, 361, 363, 365
  Fourier decomposition of 360
  Lévy's characterisation 344, 346
Burkholder–Davis–Gundy inequalities 246, 256, 353

C

Càdlàg path 303
Carathéodory extension theorem 300, 347
Change detection filter see Change-detection problem
Change of measure method 49, 52
Change-detection problem 52, 69
Clark's robustness result see Robust representation formula

Class
  U 96, 97, 100, 107, 109, 110, 113, 118
  Ū 109, 110
  U′ 110, 111, 113, 114, 116
  Ū′ 116
Condition
  U 97, 102, 107, 110
  U′ 113, 114, 116
  U′′ 114

Conditional distribution of Xt 2–3, 191
  approximating sequence 265
  density of 174
  density of the 200
  recurrence formula 261, 264
  unnormalised 58, 173, 175
  regular 294
Conditional expectation 293
Conditional probability
  of a set 294
  regular 32, 294, 296, 347

Convergence determining set 323
Convergence in expectation 322, 324
Cubic sensor 201

D

Debut theorem 307, 314, 339, 341
Daniell–Kolmogorov–Tulcea theorem 301, 302, 347
Density of ρt
  existence of 168
  smoothness of 174
Dual previsible projection 332
Duncan–Mortensen–Zakai equation see Zakai equation

E

Empirical measure 210
Euler method 251
Evanescent set 319
Exponential projection filter 201
Extended Kalman filter 194

F

Feller property 267
Feynman–Kac formula 182
Filtering
  equations 4, 16, 72, 93, 125, 249, 308, see Kushner–Stratonovich equation, Zakai equation
    for inhomogeneous test functions 69
  problem 13, 48
    discrete time 258–259
    the correlated noise case 73–75, 109
Finite difference scheme 207
Finite-dimensional filters 141, 146, 154, 196–199
Fisher information matrix 199
Fokker–Planck equation 206
Fujisaki–Kallianpur–Kunita equation see Kushner–Stratonovich equation

G

Generator of the process X 48, 50, 51, 151, 168, 207, 221
  domain of the 47, 50–51
  maximal 51
Girsanov's theorem 345, 346
Gronwall's lemma 78, 79, 81, 88, 172, 325

H

Hermite polynomials 203

I

Importance distribution 285
Importance sampling 273
Indistinguishable processes 319
Infinitesimal generator see Generator of the process X
Innovation
  approach 7, 49, 70–73
  process 33–34
Itô integral see Stochastic integral
Itô isometry 337, 338, 349
Itô's formula 343

K

Kallianpur–Striebel formula 57, 59, 128
Kalman–Bucy filter 6, 148–154, 191, 192, 199
  1D case 158
  as a Beneš filter 142, 148
Kushner–Stratonovich equation 68, 71, 153
  correlated noise case 74
  finite-dimensional 66
  for inhomogeneous test functions 69
  linear case 151
  strong form 179
  uniqueness of solution 110, 116

L

Likelihood function 260
Linear filter see Kalman–Bucy filter
Local martingale 330, 344

M

Markov chain 257
Martingale 329
  representation theorem 348
  uniformly integrable 330, 346
Martingale convergence theorem 318, 329, 345
Martingale problem 47
Martingale representation theorem 35, 38, 44
Measurement noise 1


Monotone class theorem 29, 31, 293, 295, 311, 318, 336
Monte Carlo approximation 210, 216, 222, 230
  convergence of 213, 214, 217
  convergence rate 215, 216
Multinomial resampling see Resampling procedure

Mutation step 273

N

Non-linear filtering see Stochastic filtering
Non-linear filtering problem see Filtering problem
Novikov's condition 52, 127, 131, 218, 222, 350

O

Observation
  filtration 13–17
    right continuity of the 17, 27, 33–40
    unaugmented 16
  process 1, 3, 16
    discrete time 258
  σ-algebra see Observation filtration
Offspring distribution 224, 252, 274–281
  Bernoulli 280
  binomial 280
  minimal variance 225, 226, 228, 230, 279, 280
  multinomial 275–277
  obtained by residual sampling 277
  Poisson 280
Optional process 320
Optional projection of a process 17–19, 311–317, 338
  kernel for the 27
  without the usual conditions 321

P

Parabolic PDEs
  existence and uniqueness result 100
  maximum principle for 102
  systems of 102
  uniformly 101, 121
Parseval's equality 204, 205
Particle filter 209, 222–224
  branching algorithm 225
  convergence rates 241, 244, 245, 248
  correction step 222, 230, 250
  discrete time 272–273
    convergence of 281–284
    prediction step 264
    updating step 264
  evolution equation 230
  implementation 250–252
    correction step 251, 252
    evolution step 251
  offspring distribution see Offspring distribution
  path regularity 229
  resampling procedure 250, 252
Particle methods see Particle filter
Path process 259
PDE Methods
  correction step 207
  prediction step 206
π
  the stochastic process 14, 27–32
    càdlàg version of 31
πt see Conditional distribution of Xt
Polarization identity 342
Posterior distribution 259
Predictable σ-algebra see Previsible σ-algebra
Predictable process see Previsible process
Predicted conditional probability 259
Previsible σ-algebra 331
Previsible process 321, 331, 338
Previsible projection of a process 317, 321, 340, 341
Prior distribution 259
Projection bien mesurable see Optional projection of a process
Projection filter 199
Projective product 261

Q

Q-matrix 51
Quadratic variation 332, 335, 342


R

Reducing sequence 330
Regular grid 207
Regularisation method 167
Regularised measure 167
Resampling procedure 276
Residual sampling 277
ρ see Conditional distribution of Xt, unnormalised
  density of 173, 178
  dual of 165, 180–182, 233, 238

Riccati equation 152, 192
Ring of subsets 331
Robust representation formula 129, 137

S

Sampling with replacement method see Resampling procedure

Selection step 274
Sensor function 4
Separable metric space 296
Sequential Monte Carlo methods see Particle filter
Signal process 1, 3, 16, 47
  discrete time version 257
  filtration associated with the 47
    in discrete time 257
  particular cases 49–52

SIR algorithm 276
Skorohod topology 304–305
Sobolev
  embedding theorem 166
  space 166

Splitting-up algorithm 206
Stochastic differential equation
  strong solution 355
Stochastic filtering 1, 3, 6, 8, 9, see also Filtering problem
Stochastic Fubini's theorem 351
Stochastic integral 330–341
  limits of 358
  localization 343
  martingale property 337
Stochastic integration by parts 342
Stopping time 306
  announceable 321

T

TBBA see Tree-based branching algorithms

Total sets in L1 355, 357
Transition kernel 257
Tree-based branching algorithms 230, 279
Tulcea's theorem 298, 303, 347, 348

U

Uniqueness of solution see Kushner–Stratonovich equation, uniqueness of solution; see Zakai equation, uniqueness of solution

Usual conditions 16, 319

W

Weak topology on P(S) 21–27
  metric for 26

Wick polynomials 203
Wiener filter 5–6

Z

Zakai equation 62, 69, 73, 154, 177
  correlated noise case 74, 111
  finite-dimensional 65
  for inhomogeneous test functions 69, 97
  strong form 67, 175–178, 202–203, 206
  uniqueness of solution 107, 109, 114, 182