Méthodes de Monte Carlo et Algorithmes Stochastiques

BERNARD LAPEYRE

September 2017

cermics.enpc.fr/~bl/monte-carlo/poly.pdf

Contents

1 Monte-Carlo Methods for Options
    1.1 On the convergence rate of Monte-Carlo methods
    1.2 Simulation methods of classical laws
        1.2.1 Simulation of the uniform law
        1.2.2 Simulation of some common laws of finance
    1.3 Variance Reduction
        1.3.1 Control variates
        1.3.2 Importance sampling
        1.3.3 Antithetic variables
        1.3.4 Stratification methods
        1.3.5 Mean value or conditioning
    1.4 Low discrepancy sequences
    1.5 Monte-Carlo methods for pricing American options
    1.6 Exercises and problems

2 Introduction to stochastic algorithms
    2.1 A reminder on martingale convergence theorems
        2.1.1 Consequence and examples of uses
    2.2 Almost sure convergence for some stochastic algorithms
        2.2.1 Almost sure convergence for the Robbins-Monro algorithm
        2.2.2 Almost sure convergence for the Kiefer-Wolfowitz algorithm
    2.3 Speed of convergence of stochastic algorithms
        2.3.1 Introduction in a simplified context
        2.3.2 L2 and locally L2 martingales
        2.3.3 Central limit theorem for martingales
        2.3.4 A central limit theorem for the Robbins-Monro algorithm

References

Chapter 1

Monte-Carlo Methods for Options

Monte-Carlo methods are extensively used in financial institutions to compute European option prices, to evaluate the sensitivities of portfolios to various parameters, and to compute risk measurements.

Let us describe the principle of Monte-Carlo methods on an elementary example. Let

I = ∫_{[0,1]^d} f(x) dx,

where f(·) is a bounded real-valued function. Represent I as E(f(U)), where U is a uniformly distributed random variable on [0,1]^d. By the Strong Law of Large Numbers, if (U_i, i ≥ 1) is a family of independent, uniformly distributed random variables on [0,1]^d, then the average

S_N = (1/N) ∑_{i=1}^N f(U_i)    (1.1)

converges to E(f(U)) almost surely when N tends to infinity. This suggests a very simple algorithm to approximate I: call a random number generator N times and compute the average (1.1). Observe that the method converges for any integrable function on [0,1]^d: f need not be smooth.
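The algorithm just described fits in a few lines of Python. This is only an illustrative sketch: the integrand (a product of coordinates, whose exact integral over [0,1]^d is 2^{-d}) and the dimension d = 5 are arbitrary choices, not taken from the text.

```python
import math
import random

def monte_carlo_integral(f, d, n, seed=0):
    """Approximate I = integral of f over [0,1]^d by the average (1.1)."""
    rng = random.Random(seed)
    return sum(f([rng.random() for _ in range(d)]) for _ in range(n)) / n

# Illustrative integrand: f(x) = x_1 * ... * x_d, exact integral 2^{-d}.
estimate = monte_carlo_integral(math.prod, d=5, n=200_000)
print(abs(estimate - 2.0**-5))  # small for large n
```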

In order to use the above Monte-Carlo method efficiently, we need to know its rate of convergence and to determine when it is more efficient than deterministic algorithms. The Central Limit Theorem provides the asymptotic distribution of √N (S_N − I) when N tends to +∞. Various refinements of the Central Limit Theorem, such as the Berry-Esseen and Bikelis theorems, provide non-asymptotic estimates.

The preceding considerations show that the convergence rate of a Monte-Carlo method is rather slow (1/√N). Moreover, the approximation error is random and may take large values even if N is large (however, the probability of such an event tends to 0 when N tends to infinity). Nevertheless, Monte-Carlo methods are useful in practice. For instance, consider an integral over a hypercube [0,1]^d with d large (d = 40, say). It is clear that quadrature methods require too many points (the number of points increases exponentially with the dimension of the space). Low discrepancy sequences are efficient for moderate values of d, but this efficiency decreases drastically when d becomes large: the discrepancy behaves like C(d) log^d(N)/N, where the constant C(d) may be extremely large. A Monte-Carlo method does not have such disadvantages: it requires the simulation of independent random vectors (X_1, ..., X_d), whose coordinates are independent. Thus, compared to the one-dimensional situation, the number of trials is multiplied by d only, and therefore the method remains tractable even when d is large. In addition, another advantage of Monte-Carlo methods is their parallel nature: each processor of a parallel computer can be assigned the task of making a random trial.

To summarize the preceding discussion: probabilistic algorithms are used in situations where deterministic methods are inefficient, especially when the dimension of the state space is very large. Obviously, the approximation error is random and the rate of convergence is slow, but in these cases a Monte-Carlo method is often the best method known.


1.1 On the convergence rate of Monte-Carlo methods

In this section we present results which justify the use of Monte-Carlo methods and help to choose the appropriate number of simulations N of a Monte-Carlo method in terms of the desired accuracy and the confidence interval on the accuracy.

Theorem 1.1.1 (Strong Law of Large Numbers). Let (X_i, i ≥ 1) be a sequence of independent identically distributed random variables such that E(|X_1|) < +∞. Then one has:

lim_{n→+∞} (1/n)(X_1 + ··· + X_n) = E(X_1) a.s.

Remark 1.1.1. The random variable X_1 needs to be integrable. Therefore the Strong Law of Large Numbers does not apply when X_1 is Cauchy distributed, that is, when its density is 1/(π(1 + x²)).

Convergence rate  We now seek estimates on the error

ε_n = E(X) − (1/n)(X_1 + ··· + X_n).

The Central Limit Theorem describes the asymptotic distribution of √n ε_n.

Theorem 1.1.2 (Central Limit Theorem). Let (X_i, i ≥ 1) be a sequence of independent identically distributed random variables such that E(X_1²) < +∞. Let σ² denote the variance of X_1, that is

σ² = E(X_1²) − E(X_1)² = E((X_1 − E(X_1))²).

Then (√n/σ) ε_n converges in distribution to G, where G is a Gaussian random variable with mean 0 and variance 1.

Remark 1.1.2. From this theorem it follows that for all c_1 < c_2

lim_{n→+∞} P(σc_1/√n ≤ ε_n ≤ σc_2/√n) = ∫_{c_1}^{c_2} e^{−x²/2} dx/√(2π).

In practice, one applies the following approximate rule: for n large enough, the law of ε_n is approximately that of a Gaussian random variable with mean 0 and variance σ²/n.

Note that it is impossible to bound the error, since the support of any (non-degenerate) Gaussian random variable is R. Nevertheless the preceding rule allows one to define a confidence interval: for instance, observe that

P(|G| ≤ 1.96) ≈ 0.95.

Therefore, with a probability close to 0.95, for n large enough, one has:

|ε_n| ≤ 1.96 σ/√n.


How to estimate the variance  The previous result shows that it is crucial to estimate the standard deviation σ of the random variable. It is easy to do this using the same samples as for the expectation. Let X be a random variable and (X_1, ..., X_N) a sample drawn along the law of X. We will denote by X̄_N the Monte-Carlo estimator of E(X) given by

X̄_N = (1/N) ∑_{i=1}^N X_i.

A standard estimator for the variance is given by

σ̄_N² = (1/(N−1)) ∑_{i=1}^N (X_i − X̄_N)²;

σ̄_N² is often called the empirical variance of the sample. Note that σ̄_N² can be rewritten as

σ̄_N² = (N/(N−1)) ((1/N) ∑_{i=1}^N X_i² − X̄_N²).

From this last formula, it is obvious that X̄_N and σ̄_N² can be computed using only ∑_{i=1}^N X_i and ∑_{i=1}^N X_i².

Moreover, one can prove, when E(X²) < +∞, that lim_{N→+∞} σ̄_N² = σ² almost surely, and that E(σ̄_N²) = σ² (the estimator is unbiased). This leads to an (approximate) confidence interval by replacing σ by σ̄_N in the standard confidence interval. With a probability near 0.95, E(X) belongs to the (random) interval given by

[X̄_N − 1.96 σ̄_N/√N, X̄_N + 1.96 σ̄_N/√N].

So, with very little additional computation (we only have to compute σ̄_N on a sample already drawn), we can give a reasonable estimate of the error made by approximating E(X) with X̄_N. The possibility of giving an error estimate at a small numerical cost is a very useful feature of Monte-Carlo methods.
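The estimator, the empirical variance, and the resulting approximate 95% confidence interval can be computed in one pass over the sample. Below is a minimal sketch; the exponential test sample (with E(X) = 1/2) is an arbitrary choice for illustration.

```python
import math
import random

def mc_estimate(sample):
    """Return (mean, half-width of the approximate 95% confidence interval)."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # empirical variance
    return mean, 1.96 * math.sqrt(var / n)

rng = random.Random(1)
sample = [rng.expovariate(2.0) for _ in range(100_000)]  # E(X) = 1/2
mean, half_width = mc_estimate(sample)
# With probability close to 0.95, E(X) lies in [mean - half_width, mean + half_width].
```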

1.2 Simulation methods of classical laws

The aim of this section is to give a short introduction to sampling methods used in finance. Our aim is not to be exhaustive on this broad subject (for this we refer to, e.g., [Devroye(1986)]) but to describe the methods needed for the simulation of random variables widely used in finance. Thus we concentrate on Gaussian random variables and Gaussian vectors.

1.2.1 Simulation of the uniform law

In this subsection we present basic algorithms producing sequences of "pseudo-random numbers", whose statistical properties mimic those of sequences of independent and identically uniformly distributed random variables. For a survey on random generators see, for instance, [L'Ecuyer(1990)], and for a mathematical treatment of these problems, see Niederreiter [Niederreiter(1995)] and the references therein. To generate a deterministic sequence which "looks like" independent random variables uniformly distributed on [0,1], the simplest (and most widely used) methods are congruential methods. They are defined through four integers a, b, m and U_0. The integer U_0 is the seed of the generator, m is the order of the congruence, and a is the multiplicative term. A pseudo-random sequence is obtained from the following inductive formula:

U_n = (a U_{n−1} + b) (mod m).

In practice, the seed is set to U0 at the beginning of a program and must never be changed insidethe program.

Observe that a pseudo-random number generator consists of a completely deterministic algorithm. Such an algorithm produces sequences which statistically behave (almost) like sequences of independent and identically uniformly distributed random variables. There is no theoretical criterion which ensures that a pseudo-random number generator is statistically acceptable. Such a property is established on the basis of empirical tests. For example, one builds a sample from successive calls to the generator, and one then applies the Chi-square test or the Kolmogorov-Smirnov test in order to test whether one can reasonably accept the hypothesis that the sample results from independent and uniformly distributed random variables. A generator is good when no severe test has rejected that hypothesis. Good choices for a, b and m are given in [L'Ecuyer(1990)] and [Knuth(1998)]. The reader is also referred to the following web site entirely devoted to Monte-Carlo simulation: http://random.mat.sbg.ac.at/links/.
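A congruential generator is only a few lines of code. The sketch below uses the constants of the historical glibc `rand` (a = 1103515245, b = 12345, m = 2^31); they are quoted here merely as an example of a published choice, not as a recommendation of the text.

```python
def lcg(seed, a=1103515245, b=12345, m=2**31):
    """Yield U_n / m in [0, 1), where U_n = (a*U_{n-1} + b) mod m."""
    u = seed % m
    while True:
        u = (a * u + b) % m
        yield u / m

gen = lcg(seed=42)
values = [next(gen) for _ in range(5)]  # five pseudo-random numbers in [0, 1)
```

Restarting the generator from the same seed reproduces the same sequence, which is exactly the deterministic behaviour described above.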

1.2.2 Simulation of some common laws of finance

We now explain the basic methods used to simulate laws in financial models.

Using the distribution function in simulation  The simplest method of simulation relies on the use of the distribution function.

Proposition 1.2.1. Let X be a real random variable with strictly positive and continuous density p_X(x). Let F be its distribution function, defined by

F(x) = P(X ≤ x).

Let U be a random variable uniformly distributed on [0,1]. Then X and F^{−1}(U) have the same distribution function, that is to say, X and F^{−1}(U) have the same law.

Proof: Clearly, as F^{−1} is strictly increasing, we have

P(F^{−1}(U) ≤ x) = P(U ≤ F(x)).

Now, as F(x) ≤ 1, we have P(F^{−1}(U) ≤ x) = F(x). So F^{−1}(U) and X have the same distribution function and, hence, the same law.


Remark 1.2.2. This result can be extended to the general case (that is to say, a law which does not admit a density, or whose density is not necessarily strictly positive). In this case we have to define the inverse F^{−1} of the increasing function F by

F^{−1}(u) = inf {x ∈ R, F(x) ≥ u}.

If we note that F^{−1}(u) ≤ x if and only if u ≤ F(x), the end of the previous proof remains the same.

Remark 1.2.3 (Simulation of an exponential law). Simulation of asset models with jumps uses random variables with exponential laws. The preceding proposition applies to the simulation of an exponential law of parameter λ > 0, whose density is given by

λ exp(−λx) 1_{R+}(x).

In this case, a simple computation leads to F(x) = 1 − e^{−λx}, so the equation F(x) = u can be solved as x = −log(1 − u)/λ. If U follows a uniform distribution on [0,1], then −log(1 − U)/λ (or −log(U)/λ, since U and 1 − U have the same law) follows an exponential law with parameter λ.
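In code, the inverse-transform recipe for the exponential law reads as follows (a minimal sketch; the parameter value λ = 1/2 is arbitrary):

```python
import math
import random

def exponential_sample(lam, rng):
    """Inverse transform: if U is uniform on [0,1), -log(1 - U)/lam is Exp(lam)."""
    return -math.log(1.0 - rng.random()) / lam

rng = random.Random(0)
draws = [exponential_sample(0.5, rng) for _ in range(200_000)]
empirical_mean = sum(draws) / len(draws)  # should approach 1/lam = 2
```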

Remark 1.2.4. This method can also be used to sample Gaussian random variables. Of course, neither the distribution function nor its inverse is exactly known, but some rather good polynomial approximations can be found, e.g., in [Abramovitz and Stegun(1970)]. This method is numerically more complex than the Box-Muller method (see below), but it can be used when sampling Gaussian random variables with low discrepancy sequences.

Conditional simulation using the distribution function  In stratification methods, described later in this chapter, it is necessary to sample a real random variable X given that this random variable belongs to a given interval ]a, b]. This can easily be done by using the distribution function. Let U be a random variable uniform on [0,1], let F be the distribution function of X, F(x) = P(X ≤ x), and let F^{−1} be its inverse. The law of Y defined by

Y = F^{−1}(F(a) + (F(b) − F(a)) U)

is equal to the conditional law of X given that X ∈ ]a, b]. This can easily be proved by checking that the distribution function of Y is equal to that of X knowing that X ∈ ]a, b].
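For a standard Gaussian X, this conditional sampling can be sketched with Python's standard-library `statistics.NormalDist`, which provides both F (`cdf`) and F^{−1} (`inv_cdf`); the interval ]1, 2] is an arbitrary illustration.

```python
import random
from statistics import NormalDist

def conditional_sample(F, F_inv, a, b, rng):
    """Sample X given X in ]a, b] via Y = F^{-1}(F(a) + (F(b) - F(a)) U)."""
    u = rng.random()
    return F_inv(F(a) + (F(b) - F(a)) * u)

nd = NormalDist()  # standard Gaussian
rng = random.Random(0)
samples = [conditional_sample(nd.cdf, nd.inv_cdf, 1.0, 2.0, rng)
           for _ in range(10_000)]
# every draw falls in the conditioning interval ]1, 2]
```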

Gaussian Law  The Gaussian law with mean 0 and variance 1 on R is the law with density

(1/√(2π)) exp(−x²/2).

Therefore, the distribution function of the Gaussian random variable X is given by

P(X ≤ z) = (1/√(2π)) ∫_{−∞}^{z} exp(−x²/2) dx, ∀z ∈ R.

The most widely used simulation method for the Gaussian law is the Box-Muller method. This method is based upon the following result (see Exercise 2 for a proof).

Page 11: Methodes de Monte Carlo´ et Algorithmes Stochastiquescermics.enpc.fr/~bl/monte-carlo/poly.pdf · 2.2.1 Almost sure convergence for the Robbins-Monro algorithm . . . . . . 55 ...

1.2. SIMULATION METHODS OF CLASSICAL LAWS 7

Proposition 1.2.5. Let U and V be two independent random variables, uniformly distributed on [0,1]. Let X and Y be defined by

X = √(−2 log U) sin(2πV),
Y = √(−2 log U) cos(2πV).

Then X and Y are two independent Gaussian random variables with mean 0 and variance 1.

Of course, the method can be used to simulate N independent realizations of the same real Gaussian law. The simulation of the first two realizations is performed by calling a random number generator twice and computing X and Y as above. Then the generator is called twice more to compute two new values of X and Y, which provides two new realizations, independent and mutually independent of the first two, and so on.
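Proposition 1.2.5 translates directly into code. In the sketch below, `1 - rng.random()` is used so that the argument of the logarithm stays in (0, 1]; this guard is an implementation detail, not part of the proposition.

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0,1) samples from two independent uniforms."""
    u = 1.0 - rng.random()        # in (0, 1], so log(u) is defined
    v = rng.random()
    r = math.sqrt(-2.0 * math.log(u))
    return r * math.sin(2.0 * math.pi * v), r * math.cos(2.0 * math.pi * v)

rng = random.Random(0)
normals = [z for _ in range(100_000) for z in box_muller(rng)]
mean = sum(normals) / len(normals)           # should be near 0
var = sum(z * z for z in normals) / len(normals)  # should be near 1
```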

Simulation of a Gaussian vector  To simulate a Gaussian vector

X = (X_1, ..., X_d)

with zero mean and a d × d covariance matrix C = (c_ij, 1 ≤ i, j ≤ d), with c_ij = E(X_i X_j), one can proceed as follows.

C is a covariance matrix, so it is positive (since, for each v ∈ R^d, v·Cv = E((v·X)²) ≥ 0). Standard results of linear algebra prove that there exists a d × d matrix A, called a square root of C, such that

A A* = C,

where A* is the transpose of the matrix A = (a_ij, 1 ≤ i, j ≤ d).

Moreover, one can compute a square root of a given positive symmetric matrix by imposing that a_ij = 0 for i < j (i.e. A is a lower triangular matrix). Under this hypothesis, it is easy to see that A is uniquely determined by the following algorithm:

a_11 := √c_11,

a_i1 := c_i1 / a_11   for 2 ≤ i ≤ d;

then, for i increasing from 2 to d,

a_ii := √(c_ii − ∑_{j=1}^{i−1} a_ij²),

a_ij := (c_ij − ∑_{k=1}^{j−1} a_ik a_jk) / a_jj   for j < i ≤ d,

a_ij := 0   for 1 ≤ i < j ≤ d.


This way of computing a square root of a positive symmetric matrix is known as the Cholesky algorithm.

Now, if we assume that G = (G_1, ..., G_d) is a vector of independent Gaussian random variables with mean 0 and variance 1 (which are easy to sample, as we have already seen), one can check that Y = AG is a Gaussian vector with mean 0 and covariance matrix given by A A* = C. As X and Y are two Gaussian vectors with the same mean and covariance matrix, the laws of X and Y are the same. This leads to the following simulation algorithm:

Simulate the vector (G_1, ..., G_d) of independent Gaussian variables as explained above, then return the vector X = AG.
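A compact sketch of the Cholesky factorization and the resulting sampler; the 2 × 2 covariance matrix below is an arbitrary illustration.

```python
import math
import random

def cholesky(c):
    """Lower-triangular A with A A* = C, following the algorithm above."""
    d = len(c)
    a = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = c[i][j] - sum(a[i][k] * a[j][k] for k in range(j))
            a[i][j] = math.sqrt(s) if i == j else s / a[j][j]
    return a

def gaussian_vector(a, rng):
    """Return X = A G, where G has independent N(0,1) coordinates."""
    d = len(a)
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [sum(a[i][j] * g[j] for j in range(d)) for i in range(d)]

C = [[1.0, 0.5], [0.5, 2.0]]   # an illustrative covariance matrix
A = cholesky(C)
x = gaussian_vector(A, random.Random(0))
```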

Discrete law  Consider a random variable X taking values in a finite set {x_k, k = 1, ..., N}. The value x_k is taken with probability p_k. To simulate the law of X, one simulates a random variable U uniform on [0,1]. If the value u of the trial satisfies

∑_{j=1}^{k−1} p_j < u ≤ ∑_{j=1}^{k} p_j

(with the convention that the empty sum is 0), one decides to return the value x_k. Clearly the random variable obtained by this procedure follows the same law as X.
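This amounts to finding the first index at which the cumulative sum reaches u; a minimal sketch (the three-point law is an arbitrary example):

```python
import bisect
import itertools
import random

def discrete_sample(values, probs, rng):
    """Return x_k, where k is the first index with cumulative probability >= u."""
    cumulative = list(itertools.accumulate(probs))
    u = rng.random()
    k = bisect.bisect_left(cumulative, u)
    return values[min(k, len(values) - 1)]  # guard against rounding of the last sum

rng = random.Random(0)
draws = [discrete_sample(["a", "b", "c"], [0.2, 0.5, 0.3], rng)
         for _ in range(100_000)]
freq_b = draws.count("b") / len(draws)  # should be close to 0.5
```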

Bibliographic remark  A very complete discussion of the simulation of non-uniform random variables can be found in [Devroye(1986)]; results and discussion on the construction of pseudo-random sequences can be found in Knuth [Knuth(1998)].

[Ripley(2006)], [Rubinstein(1981)] and [Hammersley and Handscomb(1979)] are reference books on simulation methods. See also the survey paper by Niederreiter [Niederreiter(1995)] and the references therein, in particular those which concern nonlinear random number generators.

1.3 Variance Reduction

All the results of the preceding section show that the ratio σ/√N governs the accuracy of a Monte-Carlo method with N simulations. An obvious consequence of this fact is that it is always worthwhile to rewrite the quantity to compute as the expectation of a random variable with a smaller variance: this is the basic idea of variance reduction techniques. For complements, we refer the reader to [Kalos and Whitlock(2008)], [Hammersley and Handscomb(1979)], [Rubinstein(1981)] or [Ripley(2006)].

Suppose that we want to evaluate E(X). We try to find an alternative representation for this expectation as

E(X) = E(Y) + C,

where Y is a random variable with lower variance and C a known constant. Many techniques are known to implement this idea. This section gives an introduction to some standard methods.


1.3.1 Control variates

The basic idea of control variates is to write E(f(X)) as

E(f(X)) = E(f(X) − h(X)) + E(h(X)),

where E(h(X)) can be computed explicitly and Var(f(X) − h(X)) is smaller than Var(f(X)). In these circumstances, we use a Monte-Carlo method to estimate E(f(X) − h(X)) and add the known value of E(h(X)). Let us illustrate this principle with several financial examples.

Using the call-put arbitrage formula for variance reduction  Let S_t be the price at time t of a given asset, and denote by C the price of the European call option

C = E(e^{−rT}(S_T − K)_+)

and by P the price of the European put option

P = E(e^{−rT}(K − S_T)_+).

There exists a relation between the price of the put and the call which does not depend on the model for the price of the asset, namely the "call-put arbitrage formula":

C − P = E(e^{−rT}(S_T − K)) = S_0 − K e^{−rT}.

This formula (easily proved using the linearity of the expectation) can be used to reduce the variance of a call option since

C = E(e^{−rT}(K − S_T)_+) + S_0 − K e^{−rT}.

The Monte-Carlo computation of the call is then reduced to the computation of the put option.

Remark 1.3.1. For the Black-Scholes model, explicit formulas for the variances of the put and the call options can be obtained. In most cases, the variance of the put option is smaller than the variance of the call, since the payoff of the put is bounded whereas the payoff of the call is not. Thus, one should compute put option prices even when one needs a call price.
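A sketch of the idea in the Black-Scholes model (the numerical parameter values below are arbitrary illustrative choices, not taken from the text): both estimators converge to the call price, but the second one only simulates the bounded put payoff.

```python
import math
import random

# Illustrative Black-Scholes parameters
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
disc = math.exp(-r * T)
rng = random.Random(0)
n = 100_000

calls, puts = [], []
for _ in range(n):
    g = rng.gauss(0.0, 1.0)
    ST = S0 * math.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * g)
    calls.append(disc * max(ST - K, 0.0))
    puts.append(disc * max(K - ST, 0.0))

direct_call = sum(calls) / n
# Call price through the put and the call-put arbitrage formula:
call_via_put = sum(puts) / n + S0 - K * disc
```

On such a sample the empirical variance of the put payoff is markedly smaller than that of the call payoff, so the second estimator is the better one.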

Remark 1.3.2. Observe that call-put relations can also be obtained for Asian options or basket options.

For example, for Asian options, set S̄_T = (1/T) ∫_0^T S_s ds. We have:

E((S̄_T − K)_+) − E((K − S̄_T)_+) = E(S̄_T) − K,

and, in the Black-Scholes model,

E(S̄_T) = (1/T) ∫_0^T E(S_s) ds = (1/T) ∫_0^T S_0 e^{rs} ds = S_0 (e^{rT} − 1)/(rT).


Basket options  A very similar idea can be used for pricing basket options. Assume that, for i = 1, ..., d,

S_T^i = x_i exp((r − (1/2) ∑_{j=1}^p σ_ij²) T + ∑_{j=1}^p σ_ij W_T^j),

where W^1, ..., W^p are independent Brownian motions. Let a_i, 1 ≤ i ≤ d, be positive real numbers such that a_1 + ··· + a_d = 1. We want to compute a put option on the basket,

E((K − X)_+),

where X = a_1 S_T^1 + ··· + a_d S_T^d. The idea is to approximate

X/m = (a_1 x_1/m) e^{rT + ∑_{j=1}^p σ_1j W_T^j} + ··· + (a_d x_d/m) e^{rT + ∑_{j=1}^p σ_dj W_T^j},

where m = a_1 x_1 + ··· + a_d x_d, by Y/m, where Y is the log-normal random variable

Y = m e^{∑_{i=1}^d (a_i x_i/m)(rT + ∑_{j=1}^p σ_ij W_T^j)}.

As we can compute an explicit formula for

E[(K − Y)_+],

one can use the control variate Z = (K − Y)_+ and sample (K − X)_+ − (K − Y)_+.

1.3.2 Importance sampling

Importance sampling is another variance reduction procedure. It is obtained by changing the sampling law.

We start by introducing this method in a very simple context. Suppose we want to compute

E(g(X)),

X being a random variable with density f(x) on R; then

E(g(X)) = ∫_R g(x) f(x) dx.

Let f̃ be another density such that f̃(x) > 0 and ∫_R f̃(x) dx = 1. Clearly one can write E(g(X)) as

E(g(X)) = ∫_R (g(x) f(x)/f̃(x)) f̃(x) dx = E(g(Y) f(Y)/f̃(Y)),

where Y has density f̃(x) under P. We thus can approximate E(g(X)) by

(1/n) (g(Y_1) f(Y_1)/f̃(Y_1) + ··· + g(Y_n) f(Y_n)/f̃(Y_n)),

where (Y_1, ..., Y_n) are independent copies of Y. Set Z = g(Y) f(Y)/f̃(Y). We have decreased the variance of the simulation if Var(Z) < Var(g(X)). It is easy to compute the variance of Z:

Var(Z) = ∫_R (g²(x) f²(x)/f̃(x)) dx − E(g(X))².

From this, an easy computation shows that if g(x) > 0 and f̃(x) = g(x) f(x)/E(g(X)), then Var(Z) = 0! Of course this result cannot be used in practice, as it relies on the exact knowledge of E(g(X)), which is exactly what we want to compute. Nevertheless, it leads to a heuristic approach: choose f̃(x) as a good approximation of |g(x) f(x)|, such that f̃(x)/∫_R f̃(x) dx can be sampled easily.

An elementary financial example  Suppose that G is a Gaussian random variable with mean zero and unit variance, and that we want to compute

E(φ(G))

for some function φ. We choose to sample the law of G̃ = G + m, m being a real constant to be determined carefully. Denoting by f and f̃ the densities of G and G̃, we have:

E(φ(G)) = E(φ(G̃) f(G̃)/f̃(G̃)) = E(φ(G̃) e^{−mG̃ + m²/2}).

This equality can be rewritten as

E(φ(G)) = E(φ(G + m) e^{−mG − m²/2}).

Suppose we want to compute a European call option in the Black-Scholes model; then

φ(G) = (λ e^{σG} − K)_+,

and assume that λ << K. In this case, P(λ e^{σG} > K) is very small and the option is unlikely to be exercised. This fact can lead to a very large error in a standard Monte-Carlo method. In order to increase the exercise probability, we can use the previous equality:

E((λ e^{σG} − K)_+) = E((λ e^{σ(G+m)} − K)_+ e^{−mG − m²/2}),

and choose m = m_0 with λ e^{σ m_0} = K, since

P(λ e^{σ(G+m_0)} > K) = 1/2.

This choice of m is certainly not optimal; however, it drastically improves the efficiency of the Monte-Carlo method when λ << K (see Exercise 4 for a mathematical hint of this fact).
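A sketch of this shift in Python, with arbitrary illustrative values λ = 1, K = 2, σ = 0.3 (so the option is out of the money and m_0 = log(K/λ)/σ):

```python
import math
import random

lam, K, sigma = 1.0, 2.0, 0.3          # illustrative values with lam < K
m0 = math.log(K / lam) / sigma         # so that lam * exp(sigma * m0) = K
rng = random.Random(0)
n = 100_000

plain, shifted = [], []
for _ in range(n):
    g = rng.gauss(0.0, 1.0)
    plain.append(max(lam * math.exp(sigma * g) - K, 0.0))
    # phi(G + m0) * exp(-m0*G - m0^2/2): same expectation as the plain payoff
    shifted.append(max(lam * math.exp(sigma * (g + m0)) - K, 0.0)
                   * math.exp(-m0 * g - 0.5 * m0 * m0))

mean_plain = sum(plain) / n
mean_shifted = sum(shifted) / n
```

The two averages estimate the same price, but the shifted estimator has nonzero terms on roughly half of the draws instead of a small fraction of them, and a much smaller empirical variance.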


The multidimensional case  Monte-Carlo simulations are really useful for problems in large dimension, and thus we have to extend the previous method to the multidimensional setting. The ideas of this section come from [Glasserman et al.(1999)].

Let us start by considering the pricing of index options. Let σ be an n × d matrix and (W_t, t ≥ 0) a d-dimensional Brownian motion. Denote by (S_t, t ≥ 0) the solution of

dS_t^i = S_t^i (r dt + [σ dW_t]_i),   i = 1, ..., n,

where [σ dW_t]_i = ∑_{j=1}^d σ_ij dW_t^j.

Moreover, denote by I_t the value of the index

I_t = ∑_{i=1}^n a_i S_t^i,

where a_1, ..., a_n is a given set of positive numbers such that ∑_{i=1}^n a_i = 1. Suppose that we want to compute the price of a European call option with payoff at time T given by

h = (I_T − K)_+.

As

S_T^i = S_0^i exp((r − (1/2) ∑_{j=1}^d σ_ij²) T + ∑_{j=1}^d σ_ij W_T^j),

there exists a function φ such that

h = φ(G_1, ..., G_d),

where G_j = W_T^j/√T. The price of this option can be rewritten as

E(φ(G)),

where G = (G_1, ..., G_d) is a d-dimensional Gaussian vector with identity covariance matrix.

As in the one-dimensional case, it is easy to prove (by a change of variable) that, if m = (m_1, ..., m_d),

E(φ(G)) = E(φ(G + m) e^{−m·G − |m|²/2}),    (1.2)

where m·G = ∑_{i=1}^d m_i G_i and |m|² = ∑_{i=1}^d m_i². In view of (1.2), the variance V(m) of the random variable X_m = φ(G + m) e^{−m·G − |m|²/2} is

V(m) = E(φ²(G + m) e^{−2m·G − |m|²}) − E(φ(G))²
     = E(φ²(G + m) e^{−m·(G+m) + |m|²/2} e^{−m·G − |m|²/2}) − E(φ(G))²
     = E(φ²(G) e^{−m·G + |m|²/2}) − E(φ(G))².

Exercise 5 provides an example of the use of this formula to reduce variance. The reader is referred to [Glasserman et al.(1999)] for an almost optimal way of choosing the parameter m based on this representation.


1.3.3 Antithetic variables

The use of antithetic variables is widespread in Monte-Carlo simulation. This technique is often efficient, but its gains are less dramatic than those of other variance reduction techniques.

We begin by considering a simple and instructive example. Let

I = ∫_0^1 g(x) dx.

If U follows a uniform law on the interval [0,1], then 1 − U has the same law as U, and thus

I = (1/2) ∫_0^1 (g(x) + g(1 − x)) dx = E((1/2)(g(U) + g(1 − U))).

Therefore one can draw n independent random variables U_1, ..., U_n following a uniform law on [0,1] and approximate I by

Ī_{2n} = (1/n) ((1/2)(g(U_1) + g(1 − U_1)) + ··· + (1/2)(g(U_n) + g(1 − U_n)))
       = (1/2n) (g(U_1) + g(1 − U_1) + ··· + g(U_n) + g(1 − U_n)).

We need to compare the efficiency of this Monte-Carlo method with that of the standard one with 2n drawings,

I⁰_{2n} = (1/2n) (g(U_1) + g(U_2) + ··· + g(U_{2n−1}) + g(U_{2n}))
        = (1/n) ((1/2)(g(U_1) + g(U_2)) + ··· + (1/2)(g(U_{2n−1}) + g(U_{2n}))).

We will now compare the variances of Ī_{2n} and I⁰_{2n}. Observe that in doing this we assume that most of the numerical work lies in the evaluation of g, and that the time devoted to the simulation of the random variables is negligible. This is often a realistic assumption.

An easy computation shows that the variance of the standard estimator is

Var(I⁰_{2n}) = (1/2n) Var(g(U_1)),

whereas

Var(Ī_{2n}) = (1/n) Var((1/2)(g(U_1) + g(1 − U_1)))
            = (1/4n) (Var(g(U_1)) + Var(g(1 − U_1)) + 2 Cov(g(U_1), g(1 − U_1)))
            = (1/2n) (Var(g(U_1)) + Cov(g(U_1), g(1 − U_1))).

Obviously, Var(Ī_{2n}) ≤ Var(I⁰_{2n}) if and only if Cov(g(U_1), g(1 − U_1)) ≤ 0. One can prove that if g is a monotonic function this is always true (see Exercise 6 for a proof), and thus the Monte-Carlo method using antithetic variables is better than the standard one.
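A numerical check of this comparison with the monotone function g(x) = e^x on [0,1] (an arbitrary example; the exact value of the integral is e − 1):

```python
import math
import random

g = math.exp
rng = random.Random(0)
n = 50_000

# Standard estimator: pairs built from 2n independent uniforms.
standard_pairs = [0.5 * (g(rng.random()) + g(rng.random())) for _ in range(n)]
# Antithetic estimator: pairs (U, 1 - U) built from n uniforms.
antithetic_pairs = []
for _ in range(n):
    u = rng.random()
    antithetic_pairs.append(0.5 * (g(u) + g(1.0 - u)))

def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

mean_std, var_std = mean_var(standard_pairs)
mean_anti, var_anti = mean_var(antithetic_pairs)
# Both means approach e - 1; var_anti is much smaller because
# Cov(g(U), g(1 - U)) < 0 for monotone g.
```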

This idea can be generalized to dimension greater than 1, in which case we use the transformation

(U_1, ..., U_d) → (1 − U_1, ..., 1 − U_d).


More generally, if X is a random variable taking its values in R^d and T is a transformation of R^d such that the law of T(X) is the same as the law of X, we can construct an antithetic method using the equality

E(g(X)) = (1/2) E(g(X) + g(T(X))).

Namely, if (X_1, ..., X_n) are independent and sampled along the law of X, we can consider the estimator

Ī_{2n} = (1/2n) (g(X_1) + g(T(X_1)) + ··· + g(X_n) + g(T(X_n)))

and compare it to

I⁰_{2n} = (1/2n) (g(X_1) + g(X_2) + ··· + g(X_{2n−1}) + g(X_{2n})).

The same computations as before prove that the estimator Ī_{2n} is better than the crude one if and only if Cov(g(X), g(T(X))) ≤ 0. We now show a few elementary examples in finance.

A toy financial example  Let G be a standard Gaussian random variable and consider the call option

E((λ e^{σG} − K)_+).

Clearly the law of −G is the same as the law of G, and thus the transformation to be considered is T(x) = −x. As the payoff is increasing as a function of G, the following antithetic estimator certainly reduces the variance:

Ī_{2n} = (1/2n) (g(G_1) + g(−G_1) + ··· + g(G_n) + g(−G_n)),

where g(x) = (λ e^{σx} − K)_+.

Antithetic variables for path-dependent options. Consider the path-dependent option with payoff at time T
\[
\psi(S_s, s \le T),
\]
where (S_t, t \ge 0) is the lognormal diffusion
\[
S_t = x \exp\left(\Big(r - \frac{1}{2}\sigma^2\Big)t + \sigma W_t\right).
\]
As the law of (-W_t, t ≥ 0) is the same as the law of (W_t, t ≥ 0), one has
\[
\mathbb{E}\left(\psi\left(x \exp\Big(\big(r - \tfrac{1}{2}\sigma^2\big)s + \sigma W_s\Big), s \le T\right)\right)
= \mathbb{E}\left(\psi\left(x \exp\Big(\big(r - \tfrac{1}{2}\sigma^2\big)s - \sigma W_s\Big), s \le T\right)\right),
\]
and, for appropriate functionals ψ, the antithetic variable method may be efficient.
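The pathwise version can be sketched as follows, pairing each simulated Brownian path W with its mirror -W. The choice of ψ (a discounted average-price call) and all parameter values are illustrative, not from the text:

```python
import numpy as np

def antithetic_path_option(x, r, sigma, T, n_steps, n_paths, seed=0):
    """Antithetic pairs (W, -W) for a path-dependent payoff psi; here psi is
    an illustrative discounted average-price call with strike x."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
    W = np.cumsum(dW, axis=1)                      # Brownian paths on the grid
    drift = (r - 0.5 * sigma**2) * dt * np.arange(1, n_steps + 1)
    def payoff(w):
        s = x * np.exp(drift + sigma * w)          # lognormal diffusion S_t
        return np.exp(-r * T) * np.maximum(s.mean(axis=1) - x, 0.0)
    # average each path's payoff with the payoff of its antithetic mirror
    return 0.5 * (payoff(W) + payoff(-W)).mean()

est = antithetic_path_option(100.0, 0.05, 0.2, 1.0, 50, 20_000)
```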


1.3.4 Stratification methods

These methods are widely used in statistics (see [Cochran(1953)]). Assume that we want to compute the expectation
\[
I = \mathbb{E}(g(X)) = \int_{\mathbb{R}^d} g(x) f(x)\,dx,
\]
where X is an R^d-valued random variable with density f(x).

Let (D_i, 1 ≤ i ≤ m) be a partition of R^d. I can be expressed as
\[
I = \sum_{i=1}^{m} \mathbb{E}\big(\mathbf{1}_{\{X \in D_i\}}\, g(X)\big) = \sum_{i=1}^{m} \mathbb{E}(g(X) \mid X \in D_i)\,\mathbb{P}(X \in D_i),
\]
where
\[
\mathbb{E}(g(X) \mid X \in D_i) = \frac{\mathbb{E}\big(\mathbf{1}_{\{X \in D_i\}}\, g(X)\big)}{\mathbb{P}(X \in D_i)}.
\]
Note that E(g(X) | X ∈ D_i) can be interpreted as E(g(X^i)), where X^i is a random variable whose law is the law of X conditioned on X belonging to D_i, with density
\[
\frac{1}{\int_{D_i} f(y)\,dy}\,\mathbf{1}_{\{x \in D_i\}}\, f(x)\,dx.
\]

Remark 1.3.3. The random variable X^i is easily simulated using an acceptance-rejection procedure. But this method is clearly inefficient when P(X ∈ D_i) is small.

When the numbers p_i = P(X ∈ D_i) can be explicitly computed, one can use a Monte-Carlo method to approximate each conditional expectation I_i = E(g(X) | X ∈ D_i) by
\[
\hat I_i = \frac{1}{n_i}\big(g(X^i_1) + \cdots + g(X^i_{n_i})\big),
\]
where (X^i_1, \ldots, X^i_{n_i}) are independent copies of X^i. An estimator \(\hat I\) of I is then
\[
\hat I = \sum_{i=1}^{m} p_i \hat I_i.
\]

Of course, the samples used to compute the \(\hat I_i\) are supposed to be independent, and so the variance of \(\hat I\) is
\[
\sum_{i=1}^{m} p_i^2 \frac{\sigma_i^2}{n_i},
\]

where σ_i^2 is the variance of g(X^i).

Fix the total number of simulations \(\sum_{i=1}^m n_i = n\). To minimize the variance above, one must choose
\[
n_i = n\,\frac{p_i \sigma_i}{\sum_{i=1}^{m} p_i \sigma_i}.
\]

For these values of n_i, the variance of \(\hat I\) is given by
\[
\frac{1}{n}\left(\sum_{i=1}^{m} p_i \sigma_i\right)^2.
\]
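A minimal sketch of the stratified estimator, for the simple case of a uniform variable on [0,1] cut into m equal strata (so p_i = 1/m) with proportional allocation n_i = n/m; the function and parameters are illustrative:

```python
import numpy as np

def stratified_uniform(g, m, n, seed=0):
    """Stratified estimator of E[g(U)], U ~ U(0,1), with equal strata
    D_i = ((i-1)/m, i/m] and proportional allocation (assumes m divides n)."""
    rng = np.random.default_rng(seed)
    n_i = n // m
    i = np.arange(m)[:, None]
    u = (i + rng.random((m, n_i))) / m          # samples of U conditioned on D_i
    vals = g(u)
    means = vals.mean(axis=1)                   # the estimators Ihat_i
    est = means.mean()                          # sum_i p_i Ihat_i with p_i = 1/m
    # variance sum_i p_i^2 sigma_i^2 / n_i, with sigma_i^2 estimated per stratum
    var = (vals.var(axis=1, ddof=1) / n_i).sum() / m**2
    return est, var

est, var = stratified_uniform(np.exp, 10, 10_000)   # E[e^U] = e - 1
```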


Note that this variance is smaller than the one obtained without stratification. Indeed,
\[
\begin{aligned}
\mathrm{Var}(g(X)) &= \mathbb{E}\big(g(X)^2\big) - \mathbb{E}(g(X))^2\\
&= \sum_{i=1}^{m} p_i\, \mathbb{E}\big(g^2(X) \mid X \in D_i\big) - \left(\sum_{i=1}^{m} p_i\, \mathbb{E}(g(X) \mid X \in D_i)\right)^2\\
&= \sum_{i=1}^{m} p_i\, \mathrm{Var}(g(X) \mid X \in D_i) + \sum_{i=1}^{m} p_i\, \mathbb{E}(g(X) \mid X \in D_i)^2 - \left(\sum_{i=1}^{m} p_i\, \mathbb{E}(g(X) \mid X \in D_i)\right)^2.
\end{aligned}
\]

Using the convexity inequality for x ↦ x², we obtain \(\left(\sum_{i=1}^m p_i a_i\right)^2 \le \sum_{i=1}^m p_i a_i^2\) when \(\sum_{i=1}^m p_i = 1\), and the inequality
\[
\mathrm{Var}(g(X)) \ge \sum_{i=1}^{m} p_i\, \mathrm{Var}(g(X) \mid X \in D_i) \ge \left(\sum_{i=1}^{m} p_i \sigma_i\right)^2
\]
follows.

Remark 1.3.4. The optimal stratification involves the σ_i's, which are seldom explicitly known. So one needs to estimate these σ_i's by Monte-Carlo simulations.

Moreover, note that arbitrary choices of n_i may increase the variance. A common way to circumvent this difficulty is to choose
\[
n_i = n p_i.
\]
The corresponding variance,
\[
\frac{1}{n} \sum_{i=1}^{m} p_i \sigma_i^2,
\]
is always smaller than the original one, as \(\sum_{i=1}^m p_i \sigma_i^2 \le \mathrm{Var}(g(X))\). This choice is often made when the probabilities p_i can be computed. For more considerations on the choice of the n_i, and also for hints on suitable choices of the sets D_i, see [Cochran(1953)].

A toy example in finance. In the standard Black and Scholes model, the price of a call option is
\[
\mathbb{E}\left(\big(\lambda e^{\sigma G} - K\big)_+\right).
\]
It is natural to use the following strata for G: either G ≤ d = log(K/λ)/σ or G > d. Of course the variance of the stratum {G ≤ d} is equal to zero, so if you follow the optimal allocation, you do not have to simulate points in this stratum: all points have to be sampled in the stratum {G > d}! This can easily be done by using the (numerical) inverse of the distribution function of a Gaussian random variable.
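A sketch of this one-stratum scheme, using SciPy's Gaussian distribution function `norm.cdf` and its inverse `norm.ppf` (assumed available); parameter values are illustrative:

```python
import numpy as np
from scipy.stats import norm

def call_one_stratum(lam, sigma, K, n, seed=0):
    """All n points go to the stratum {G > d}, d = log(K/lam)/sigma, since the
    payoff vanishes on the other stratum; the stratum estimate is weighted
    by p = P(G > d)."""
    rng = np.random.default_rng(seed)
    d = np.log(K / lam) / sigma
    p = 1.0 - norm.cdf(d)                      # probability of the stratum
    u = rng.random(n)
    g_cond = norm.ppf(norm.cdf(d) + p * u)     # G conditioned on {G > d}
    payoff = lam * np.exp(sigma * g_cond) - K  # positive on this stratum
    return p * payoff.mean()

price = call_one_stratum(100.0, 0.2, 100.0, 50_000)
```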

Of course, one does not need Monte-Carlo methods to compute call options in the Black and Scholes model; we now consider a more convincing example.


Basket options. Most of what follows comes from [Glasserman et al.(1999)]. The computation of a European basket option in a multidimensional Black-Scholes model can be expressed as
\[
\mathbb{E}(h(G)),
\]
for some function h and G = (G_1, \ldots, G_n) a vector of independent standard Gaussian random variables. Choose a vector u ∈ R^n such that |u| = 1 (note that ⟨u, G⟩ = u_1 G_1 + ⋯ + u_n G_n is then also a standard Gaussian random variable). Then choose a partition (B_i, 1 ≤ i ≤ n) of R such that
\[
\mathbb{P}(\langle u, G\rangle \in B_i) = \mathbb{P}(G_1 \in B_i) = 1/n.
\]
This can be done by setting
\[
B_i = \big]N^{-1}((i-1)/n),\, N^{-1}(i/n)\big],
\]
where N is the distribution function of a standard Gaussian random variable and N^{-1} is its inverse. We then define the strata by setting
\[
D_i = \{\langle u, x\rangle \in B_i\}.
\]

In order to implement our stratification method, we need to solve two simulation problems:

• sample a Gaussian random variable ⟨u, G⟩ given that ⟨u, G⟩ belongs to B_i,

• sample a new vector G knowing the value of ⟨u, G⟩.

The first problem is easily solved, since the law of
\[
N^{-1}\left(\frac{i-1}{n} + \frac{U}{n}\right), \tag{1.3}
\]
where U is uniform on [0, 1], is precisely the law of a standard Gaussian random variable conditioned to be in B_i (see page 6).

To solve the second point, observe that
\[
G - \langle u, G\rangle\, u
\]
is a Gaussian vector independent of ⟨u, G⟩, with covariance matrix I - u ⊗ u' (where u ⊗ u' denotes the matrix defined by (u ⊗ u')_{ij} = u_i u_j). Let Y be an independent copy of the vector G. Obviously Y - ⟨u, Y⟩u is independent of G and has the same law as G - ⟨u, G⟩u. So
\[
G = \langle u, G\rangle u + \big(G - \langle u, G\rangle u\big) \quad\text{and}\quad \langle u, G\rangle u + \big(Y - \langle u, Y\rangle u\big)
\]
have the same probability law. This leads to the following simulation method for G given ⟨u, G⟩ = λ:

• sample n independent standard Gaussian random variables Y^i,

• set G = λu + Y - ⟨u, Y⟩u.

To make this method efficient, the choice of the vector u is crucial: an almost optimal way to choose u can be found in [Glasserman et al.(1999)].
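The two-step conditional sampling above can be sketched directly (the choice of u and λ below is illustrative; |u| = 1 is assumed):

```python
import numpy as np

def sample_given_projection(u, lam, n_samples, seed=0):
    """Sample G in R^d conditionally on <u, G> = lam, via
    G = lam*u + Y - <u, Y>*u with Y a fresh standard Gaussian vector."""
    rng = np.random.default_rng(seed)
    d = len(u)
    Y = rng.standard_normal((n_samples, d))
    proj = Y @ u                         # <u, Y> for each sample
    return lam * u + Y - proj[:, None] * u

u = np.ones(4) / 2.0                     # a unit vector in dimension 4
G = sample_given_projection(u, 1.3, 10_000)
```

By construction every sampled vector satisfies ⟨u, G⟩ = λ exactly, while the component orthogonal to u keeps the conditional Gaussian law.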


1.3.5 Mean value or conditioning

This method uses the well-known fact that conditioning reduces the variance. Indeed, for any integrable random variable Z, we have
\[
\mathbb{E}(Z) = \mathbb{E}\big(\mathbb{E}(Z \mid Y)\big),
\]
where Y is any random variable defined on the same probability space as Z. It is well known that E(Z | Y) can be written as
\[
\mathbb{E}(Z \mid Y) = \phi(Y),
\]
for some measurable function φ. Suppose in addition that Z is square integrable. As the conditional expectation is an L² projection,
\[
\mathbb{E}\big(\phi(Y)^2\big) \le \mathbb{E}(Z^2),
\]
and thus Var(φ(Y)) ≤ Var(Z).

Of course, the practical efficiency of simulating φ(Y) instead of Z heavily relies on an explicit formula for the function φ. This can be achieved when Z = f(X, Y), where X and Y are independent random variables. In this case, we have
\[
\mathbb{E}(f(X, Y) \mid Y) = \phi(Y),
\]
where φ(y) = E(f(X, y)).

A basic example. Suppose that we want to compute P(X ≤ Y), where X and Y are independent random variables. This situation occurs in finance when one computes the hedge of an exchange option (or the price of a digital exchange option).

Using the preceding, we have
\[
\mathbb{P}(X \le Y) = \mathbb{E}(F(Y)),
\]
where F is the distribution function of X. The variance reduction can be significant, especially when the probability P(X ≤ Y) is small.
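A minimal sketch of this conditioning trick, with illustrative distributions X ~ N(0,1) and Y ~ N(μ, 1) (SciPy's `norm.cdf` plays the role of F):

```python
import numpy as np
from scipy.stats import norm

def prob_x_le_y(mu_y, n, seed=0):
    """Estimate P(X <= Y) for independent X ~ N(0,1), Y ~ N(mu_y, 1),
    (a) crudely and (b) by conditioning on Y: P(X <= Y) = E[F(Y)]."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal(n)
    Y = mu_y + rng.standard_normal(n)
    crude = (X <= Y).mean()                # indicator estimator
    conditioned = norm.cdf(Y).mean()       # phi(Y) = F(Y), in closed form
    return crude, conditioned

crude, cond = prob_x_le_y(0.5, 50_000)
```

The conditioned estimator averages a smooth function of Y instead of a 0/1 indicator, which is where the variance reduction comes from.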

1.4 Low discrepancy sequences

Using sequences of points "more regular" than random points may sometimes improve Monte-Carlo methods. We look for deterministic sequences (x_i, i ≥ 1) such that
\[
\int_{[0,1]^d} f(x)\,dx \approx \frac{1}{n}\big(f(x_1) + \cdots + f(x_n)\big),
\]
for all functions f in a large enough set.

When the sequence is deterministic, the method is called a quasi-Monte-Carlo method. One can find sequences such that the speed of convergence of the previous approximation is of order K log(n)^d / n (when the function f is regular enough). Such a sequence is called a "low discrepancy sequence".


We now give a mathematical definition of a uniformly distributed sequence. This definition involves the notion of discrepancy. By definition, if x and y are two points of [0,1]^d, x ≤ y if and only if x_i ≤ y_i for all 1 ≤ i ≤ d.

Definition 1.4.1. A sequence (x_n)_{n≥1} is said to be uniformly distributed on [0,1]^d if one of the following equivalent properties is fulfilled:

1. For all y = (y_1, \ldots, y_d) ∈ [0,1]^d,
\[
\lim_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{\{x_k \in [0,y]\}} = \prod_{i=1}^{d} y_i = \mathrm{Volume}([0,y]),
\]
where [0, y] = {z ∈ [0,1]^d, 0 ≤ z ≤ y}.

2. Let
\[
D_n^*(x) = \sup_{y \in [0,1]^d} \left|\frac{1}{n}\sum_{k=1}^{n} \mathbf{1}_{\{x_k \in [0,y]\}} - \mathrm{Volume}([0,y])\right|
\]
be the discrepancy of the sequence; then
\[
\lim_{n\to+\infty} D_n^*(x) = 0.
\]

3. For every bounded continuous function f on [0,1]^d,
\[
\lim_{n\to+\infty} \frac{1}{n}\sum_{k=1}^{n} f(x_k) = \int_{[0,1]^d} f(x)\,dx.
\]

Remark 1.4.1.
• If (U_n)_{n≥1} is a sequence of independent random variables with uniform law on [0,1], the random sequence (U_n(ω), n ≥ 1) is almost surely uniformly distributed. Moreover, we have an iterated logarithm law for the discrepancy, namely
\[
\limsup_{n} \sqrt{\frac{2n}{\log(\log n)}}\; D_n^*(U) = 1 \quad \text{a.s.}
\]

• The discrepancy of any infinite sequence satisfies the following property:
\[
D_n^* > C_d\, \frac{(\log n)^{\max\left(\frac{d-1}{2},\,1\right)}}{n} \quad \text{for an infinite number of values of } n,
\]
where C_d is a constant depending on d only. This result is known as the Roth theorem (see [Roth(1954)]).

• It is possible to construct d-dimensional sequences with discrepancies bounded by (log n)^d / n. We will see some examples of such sequences later in this section. Note that, by the Roth theorem, these sequences are almost optimal. These sequences are, in principle, asymptotically better than random numbers.

In practice one uses a number of drawings between 10³ and 10⁸ and, in this range, the best known sequences are not clearly better than random numbers in terms of discrepancy. This is especially true in large dimension (greater than 100).


The discrepancy allows one to give an estimate of the approximation error
\[
\frac{1}{n}\sum_{k=1}^{n} f(x_k) - \int_{[0,1]^d} f(x)\,dx,
\]
when f has finite variation in the sense of Hardy and Krause. This estimate is known as the Koksma-Hlawka inequality.

Proposition 1.4.2 (Koksma-Hlawka inequality). Let g be a function of finite variation in the sense of Hardy and Krause and denote by V(g) its variation. Then, for N ≥ 1,
\[
\left|\frac{1}{N}\sum_{k=1}^{N} g(x_k) - \int_{[0,1]^d} g(u)\,du\right| \le V(g)\, D_N^*(x).
\]

Remark 1.4.3. This result is very different from the central limit theorem used for random sequences, which leads to a confidence interval for a given probability. Here the estimate is deterministic. This can be seen as a useful property of low discrepancy sequences, but the estimate involves V(g) and D_N^*(x), and both of these quantities are extremely hard to estimate in practice. So, in most cases, the theorem gives a large overestimation of the real error (very often too large to be useful).

For a general definition of functions of finite variation in the sense of Hardy and Krause, see [Niederreiter(1992)]. In dimension 1, this notion coincides with the classical notion of a function of finite variation. In dimension d, when g is d times continuously differentiable, the variation V(g) is given by
\[
\sum_{k=1}^{d} \;\sum_{1 \le i_1 < \cdots < i_k \le d} \;\int_{\{x \in [0,1]^d,\; x_j = 1 \text{ for } j \ne i_1, \ldots, i_k\}} \left|\frac{\partial^k g(x)}{\partial x_{i_1} \cdots \partial x_{i_k}}\right| dx_{i_1} \ldots dx_{i_k}.
\]
When the dimension d increases, a function of finite variation has to be smoother. For instance, the indicator function \(\mathbf{1}_{\{f(x_1,\ldots,x_d) > \lambda\}}\) has infinite variation when d ≥ 2. Moreover, most of the standard payoffs for basket options, such as
\[
(a_1 x_1 + \cdots + a_d x_d - K)_+ \quad\text{or}\quad \big(K - (a_1 x_1 + \cdots + a_d x_d)\big)_+,
\]
do not have finite variation when d ≥ 3 (see [Ksas(2000)] for a proof).

Note that the efficiency of a low discrepancy method depends not only on the representation of the expectation, but also on the way the random variable is simulated. Moreover, the chosen method can lead to functions with infinite variation, even when the variance is bounded.

For instance, assume that we want to compute E(f(G)), where G is a real random variable and f is a function such that Var(f(G)) < +∞, f is increasing, f(-∞) = 0 and f(+∞) = +∞. Assume that we simulate along the law of G using the inverse of the distribution function, denoted by N(x). For the sake of simplicity, we will assume that N is differentiable and strictly increasing. If U is a random variable drawn uniformly on [0,1], we have
\[
\mathbb{E}(f(G)) = \mathbb{E}\big(f(N^{-1}(U))\big) = \mathbb{E}(g(U)).
\]


In order to use the Koksma-Hlawka inequality, we need to compute the variation of g. But
\[
V(g) = \int_0^1 |g'|(u)\,du = \int_0^1 f'(N^{-1}(u))\,dN^{-1}(u) = \int_{\mathbb{R}} f'(x)\,dx = f(+\infty) - f(-\infty) = +\infty.
\]

An example in finance is given by the call option, where
\[
f(G) = \big(\lambda e^{\sigma G} - K\big)_+,
\]
and G is a standard Gaussian random variable. Of course, in this case it is easy to get around the problem by first computing the price of the put option and then using the call-put arbitrage relation to retrieve the call price.

We will now give examples of some of the most widely used low discrepancy sequences in finance. For other examples, and for an exhaustive and rigorous presentation of this subject, see [Niederreiter(1992)].

The Van Der Corput sequence. Let p ≥ 2 be an integer and n a positive integer. We denote by a_0, a_1, \ldots, a_r the p-adic decomposition of n, that is to say the unique set of integers a_i such that 0 ≤ a_i < p for 0 ≤ i ≤ r, a_r > 0, and
\[
n = a_0 + a_1 p + \cdots + a_r p^r.
\]
Using standard notations, n can be written as
\[
n = a_r a_{r-1} \ldots a_1 a_0 \quad \text{in base } p.
\]
The Van Der Corput sequence in base p is given by
\[
\phi_p(n) = \frac{a_0}{p} + \cdots + \frac{a_r}{p^{r+1}}.
\]
The definition of φ_p(n) can be rewritten as follows:
\[
\text{if } n = a_r a_{r-1} \ldots a_0, \text{ then } \phi_p(n) = 0.a_0 a_1 \ldots a_r,
\]
where 0.a_0 a_1 \ldots a_r denotes the p-adic expansion of a number in [0, 1).
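The digit-reversal above translates directly into code; here is a minimal sketch:

```python
def van_der_corput(n, p):
    """phi_p(n): reverse the base-p digits of n around the radix point."""
    x, denom = 0.0, p
    while n > 0:
        n, a = divmod(n, p)   # peel off the next base-p digit a
        x += a / denom
        denom *= p
    return x

vals = [van_der_corput(n, 2) for n in range(1, 9)]
```

For p = 2 this produces 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8, 1/16, ...: each new point falls in the largest gap left by the previous ones.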

Halton sequences. Halton sequences are multidimensional generalizations of the Van Der Corput sequence. Let p_1, \ldots, p_d be the first d prime numbers. The Halton sequence is defined by
\[
x^d_n = \big(\phi_{p_1}(n), \ldots, \phi_{p_d}(n)\big) \tag{1.4}
\]
for n ≥ 1, where φ_{p_i}(n) is the Van Der Corput sequence in base p_i.

One can prove that the discrepancy of a d-dimensional Halton sequence satisfies
\[
D_n^* \le \frac{1}{n} \prod_{i=1}^{d} \frac{p_i \log(p_i n)}{\log(p_i)}.
\]
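A self-contained sketch of the Halton construction (here for d = 3, bases 2, 3, 5), used as a quasi-Monte-Carlo rule on an illustrative integrand:

```python
def van_der_corput(n, p):
    """phi_p(n): base-p radical inverse of n."""
    x, denom = 0.0, p
    while n > 0:
        n, a = divmod(n, p)
        x += a / denom
        denom *= p
    return x

def halton(n, primes=(2, 3, 5)):
    """First n points of the Halton sequence in dimension len(primes)."""
    return [[van_der_corput(k, p) for p in primes] for k in range(1, n + 1)]

pts = halton(1000)
# quasi-Monte-Carlo average of f(x) = x1*x2*x3 over [0,1]^3 (exact value 1/8)
approx = sum(x * y * z for x, y, z in pts) / len(pts)
```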


Faure sequences. These sequences are defined in [Faure(1981)] and [Faure(1982)]. The Faure sequence in dimension d is defined as follows. Let r be an odd prime number greater than d. Define a function T on the set of numbers x of the form
\[
x = \sum_{k \ge 0} \frac{a_k}{r^{k+1}},
\]
where the sum is finite and each a_k belongs to {0, \ldots, r-1}, by
\[
T(x) = \sum_{k \ge 0} \frac{b_k}{r^{k+1}},
\]
where
\[
b_k = \sum_{i \ge k} \binom{i}{k} a_i \mod r,
\]
and \(\binom{i}{k}\) denotes the binomial coefficient. The Faure sequence is then defined by
\[
x_n = \big(\phi_r(n-1),\, T(\phi_r(n-1)),\, \ldots,\, T^{d-1}(\phi_r(n-1))\big), \tag{1.5}
\]
where φ_r(n) is the Van Der Corput sequence in base r. The discrepancy of this sequence is bounded by C log(n)^d / n.

Irrational translations of the torus. These sequences are defined by
\[
x_n = \big(\{n\alpha_1\}, \ldots, \{n\alpha_d\}\big), \tag{1.6}
\]
where {x} is the fractional part of the number x and α = (α_1, \ldots, α_d) is a vector of real numbers such that (1, α_1, \ldots, α_d) is a free family over Q. This is equivalent to saying that there is no nonzero vector of integer coefficients (λ_i, i = 0, \ldots, d) such that
\[
\lambda_0 + \lambda_1 \alpha_1 + \cdots + \lambda_d \alpha_d = 0.
\]
Note that this condition implies that the α_i are irrational numbers.

One convenient way to choose such a family is to set
\[
\alpha = \big(\sqrt{p_1}, \ldots, \sqrt{p_d}\big),
\]
where p_1, \ldots, p_d are the first d prime numbers. See [Pages and Xiao(1997)] for numerical experiments on this sequence.
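A minimal sketch of this sequence with α_i = √p_i, tested on an illustrative smooth integrand:

```python
import math

def torus_sequence(n, d):
    """First n points x_k = ({k*sqrt(p_1)}, ..., {k*sqrt(p_d)}); the primes
    are hard-coded here for d <= 4, for illustration only."""
    alphas = [math.sqrt(p) for p in [2, 3, 5, 7][:d]]
    return [[(k * a) % 1.0 for a in alphas] for k in range(1, n + 1)]

pts = torus_sequence(2000, 2)
# quasi-Monte-Carlo estimate of the smooth integral of x^2 + y^2 over [0,1]^2,
# whose exact value is 2/3
approx = sum(x * x + y * y for x, y in pts) / len(pts)
```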

The Sobol sequence ([Sobol'(1967)]). One of the most widely used low discrepancy sequences is the Sobol sequence. This sequence uses the binary decomposition of an integer n,
\[
n = \sum_{k \ge 1} a_k(n) 2^{k-1},
\]
where a_k(n) ∈ {0, 1}. Note that a_k(n) = 0 for k large enough.


First choose a polynomial of degree q with coefficients in Z/2Z,
\[
P = \alpha_0 + \alpha_1 X + \cdots + \alpha_q X^q,
\]
such that α_0 = α_q = 1. The polynomial P is supposed to be irreducible and primitive over Z/2Z. See [Roman(1992)] for definitions, and appendix A.4 of this book for an algorithm computing such polynomials (a table of (some) irreducible polynomials is also available in this book, and an algorithm testing the primitivity of a polynomial is available in Maple).

Choose an arbitrary vector (M_1, \ldots, M_q) ∈ N^q such that each M_k is odd and less than 2^k. Define M_n, for n > q, by
\[
M_n = \bigoplus_{i=1}^{q} 2^i \alpha_i M_{n-i} \;\oplus\; M_{n-q},
\]
where ⊕ is defined by
\[
m \oplus n = \sum_{k \ge 1} \big(a_k(m)\ \mathrm{XOR}\ a_k(n)\big) 2^{k-1},
\]
and XOR is the bitwise operator defined by
\[
a\ \mathrm{XOR}\ b = (a + b) \bmod 2.
\]
A direction sequence (V_k, k ≥ 1) of real numbers is then defined by
\[
V_k = \frac{M_k}{2^k},
\]
and a one-dimensional Sobol sequence (x_n) by
\[
x_n = \bigoplus_{k \ge 1} a_k(n) V_k, \quad\text{if } n = \sum_{k \ge 1} a_k(n) 2^{k-1}.
\]
A multidimensional sequence can be constructed by using different polynomials for each dimension.

A variant of the Sobol sequence can be defined using a "Gray code". For a given integer n, we define the Gray code of n, (b_k(n), k ≥ 1), through the binary decomposition of G(n) = n ⊕ [n/2]:
\[
n \oplus [n/2] = \sum_{k \ge 1} b_k(n) 2^{k-1}.
\]
Note that the function G is a bijection from {0, \ldots, 2^N - 1} to itself. The main interest of Gray codes is that the binary representations of G(n) and G(n+1) differ in exactly one bit. The variant proposed by Antonov and Saleev (see [Antonov and Saleev(1980)]) is defined by
\[
x_n = b_1(n) V_1 \oplus \cdots \oplus b_r(n) V_r.
\]
For an exhaustive study of the Sobol sequence, see [Sobol'(1967)] and [Sobol'(1976)]. A program generating Sobol sequences in small dimensions can be found in [Press et al.(1992)]; see also [Fox(1988)]. Empirical studies indicate that Sobol sequences are among the most efficient low discrepancy sequences (see [Fox et al.(1992)] and [Radovic et al.(1996)] for numerical comparisons of sequences).


1.5 Monte–Carlo methods for pricing American options.

We present some recent Monte-Carlo algorithms devoted to the pricing of American options. This part follows the work done by Pierre Cohort (see [Cohort(2001)]). We describe two classes of algorithms. The first class uses an approximation of the conditional expectation (Longstaff and Schwartz [Longstaff and Schwartz(1998)], Tsitsiklis and VanRoy [Tsitsiklis and Roy(2000)]). The second one is based on an approximation of the underlying Markov chain (quantization algorithms [Pages and Bally(2000), Pages et al.(2001)], Broadie and Glassermann [Broadie and Glassermann(1997a), Broadie and Glassermann(1997b)], and Barraquand and Martineau [Barraquand and Martineau(1995)]).

Introduction

We assume that the American option can only be exercised at a given set of times (t_0, \ldots, t_N = T). In this case the option is called a Bermudean option.

The asset price at time t_n is supposed to be given by a Markov chain (X_n, 0 ≤ n ≤ N) with a transition kernel between n and n+1 given by P_n(x, dy). This means that, for every bounded function f,
\[
\mathbb{E}\big(f(X_{n+1}) \mid \mathcal{F}_n\big) = P_n f(X_n) := \int f(y)\, P_n(X_n, dy),
\]
where \(\mathcal{F}_n = \sigma(X_k, k \le n)\).

We assume that the payoff at time t_n is given by φ(X_n), where φ is a bounded function, and that the discounting between times t_j and t_{j+1} can be expressed as 1/(1 + R_j), R_j being a constant positive interest rate between t_j and t_{j+1}. This allows us to define a discount factor B between arbitrary times j and k, for j < k, by setting
\[
B(j, k) = \prod_{l=j}^{k-1} \frac{1}{1 + R_l}.
\]
By convention we set B(j, j) = 1. Note that B(j, j+1) = 1/(1 + R_j). With these notations, the price at time 0 of this option is
\[
Q_0 = \sup_{\tau \in \mathcal{T}_{0,N}} \mathbb{E}\big(B(0, \tau)\, \varphi(X_\tau)\big), \tag{1.7}
\]
where \(\mathcal{T}_{j,N}\) is the set of \(\mathcal{F}_n\)-stopping times taking values in {j, \ldots, N}.

Remark 1.5.1. For the d-dimensional Black and Scholes model, the asset prices (S^{x,i}_t, 1 ≤ i ≤ d) can be written as
\[
S^{x,i}_t = x_i \exp\left(\Big(r - \frac{1}{2}\sigma_i^2\Big)t + \sum_{1 \le j \le p} \sigma_{ij} W^j_t\right),
\]
where (W_t, t ≥ 0) is a p-dimensional Brownian motion, σ is a d × p matrix, x belongs to R^d, \(\sigma_i^2 = \sum_{1 \le j \le p} \sigma_{ij}^2\), and r > 0 is the riskless interest rate. So P_n is defined by
\[
P_n f(x) = \mathbb{E}\left(f\big(S^x_{t_{n+1} - t_n}\big)\right).
\]


Moreover, we have 1 + R_n = exp(r(t_{n+1} - t_n)).

Standard optimal stopping theory proves that Q_0 = u(0, X_0), where u is the solution of the dynamic programming algorithm
\[
\begin{cases}
u(N, x) = \varphi(x),\\
u(j, x) = \max\big(\varphi(x), \tilde P_j u(j+1, x)\big), & 0 \le j \le N-1,
\end{cases} \tag{1.8}
\]
where \(\tilde P_j\) is the discounted transition kernel of the Markov chain (X_j, j = 0, \ldots, N) between j and j+1, given by
\[
\tilde P_j f(x) = B(j, j+1)\, P_j f(x).
\]
The stopping time
\[
\tau^* = \inf\{j \ge 0,\ u(j, X_j) = \varphi(X_j)\} \tag{1.9}
\]
is optimal, that is to say,
\[
Q_0 = u(0, X_0) = \mathbb{E}\big(B(0, \tau^*)\, \varphi(X_{\tau^*})\big).
\]
Moreover, the price Q_j of this American option at time j, given by
\[
Q_j = \sup_{\tau \in \mathcal{T}_{j,N}} \mathbb{E}\big(B(j, \tau)\, \varphi(X_\tau) \mid \mathcal{F}_j\big),
\]
can be computed as u(j, X_j), and an optimal stopping time at time j is given by
\[
\tau^*_j = \inf\{i \ge j,\ u(i, X_i) = \varphi(X_i)\}.
\]

The Longstaff and Schwartz algorithm.

This algorithm approximates the optimal stopping time τ* on M given paths by random variables τ^{(m)} depending on the path m, and then estimates the price Q_0 by the Monte-Carlo formula
\[
Q^M_0 = \frac{1}{M} \sum_{1 \le m \le M} B\big(0, \tau^{(m)}\big)\, \varphi\Big(X^{(m)}_{\tau^{(m)}}\Big). \tag{1.10}
\]

The construction of (τ^{(m)}, 1 ≤ m ≤ M) proceeds in the following steps. First we write the dynamic programming algorithm directly on the optimal stopping times τ*_j. Then we present the Longstaff and Schwartz approximation of this recursion. Finally we give a Monte-Carlo version, leading to τ^{(m)}.

A dynamic programming algorithm for the optimal stopping time. The main feature of the Longstaff and Schwartz algorithm is to use a dynamic programming principle for optimal stopping times rather than for the value function. Note that τ* can be computed using the sequence (τ*_j, 0 ≤ j ≤ N) defined by the backward induction
\[
\begin{cases}
\tau^*_N = N,\\
\tau^*_j = j\, \mathbf{1}_{\{\varphi(X_j) \ge u(j, X_j)\}} + \tau^*_{j+1}\, \mathbf{1}_{\{\varphi(X_j) < u(j, X_j)\}}.
\end{cases} \tag{1.11}
\]


Note that
\[
\{\varphi(X_j) < u(j, X_j)\} = \left\{\varphi(X_j) < \tilde P_j u(j+1, X_j)\right\}
= \left\{\varphi(X_j) < \mathbb{E}\Big(B(j, \tau^*_{j+1})\, \varphi\big(X_{\tau^*_{j+1}}\big) \,\Big|\, X_j\Big)\right\}. \tag{1.12}
\]
With this definition of τ*_j, it is easy to check recursively that
\[
\tau^*_j = \min\{i \ge j,\ \varphi(X_i) = u(i, X_i)\}. \tag{1.13}
\]
Thus τ*_j (and in particular τ* = τ*_0) is an optimal stopping time at time j. In order to estimate τ*, we have to find a way to approximate the conditional expectation
\[
\mathbb{E}\Big(B(j, \tau^*_{j+1})\, \varphi\big(X_{\tau^*_{j+1}}\big) \,\Big|\, X_j\Big). \tag{1.14}
\]

The basic idea of Longstaff and Schwartz is to introduce a least squares regression method to perform this approximation.

The regression method. We denote by Z_{j+1} the random variable
\[
Z_{j+1} = B(j, \tau^*_{j+1})\, \varphi\big(X_{\tau^*_{j+1}}\big).
\]
Obviously, as B and φ are bounded, Z_{j+1} is also a bounded random variable. Let us recall that the conditional expectation
\[
\mathbb{E}(Z_{j+1} \mid X_j) \tag{1.15}
\]
can be expressed as ψ_j(X_j), where ψ_j minimizes
\[
\mathbb{E}\left(\big[Z_{j+1} - f(X_j)\big]^2\right)
\]
among all functions f such that E(f(X_j)²) < +∞. This comes from the definition of the conditional expectation as an L²-projection on the set of square integrable σ(X_j)-measurable random variables, and from the well-known property that all these random variables can be written as f(X_j).

Assume now that we have a sequence of functions (g_l, l ≥ 1) which is a total basis of \(L^2_j = L^2(\mathbb{R}^d, \text{law of } X_j)\) for every time j, 1 ≤ j ≤ N. For each time index j we are thus able to express ψ_j as
\[
\psi_j = \sum_{l \ge 1} \alpha_l g_l,
\]
where the convergence of the series has to be understood in L²_j. So we have a way to compute recursively the optimal exercise times:

1. initialize τ*_N = N; then, inductively,

2. define α^j = (α^j_l, l ≥ 1) as the sequence which minimizes
\[
\mathbb{E}\left(\Big[B(j, \tau^*_{j+1})\, \varphi\big(X_{\tau^*_{j+1}}\big) - (\alpha \cdot g)(X_j)\Big]^2\right),
\]
where \(\alpha \cdot g = \sum_{l \ge 1} \alpha_l g_l\),


3. define
\[
\tau^*_j = j\, \mathbf{1}_{\{\varphi(X_j) \ge (\alpha^j \cdot g)(X_j)\}} + \tau^*_{j+1}\, \mathbf{1}_{\{\varphi(X_j) < (\alpha^j \cdot g)(X_j)\}}.
\]

This program is not really implementable, since we would have to perform a minimization in an infinite dimensional space, and we are not able to compute the expectation involved in the minimization problem of step 2.

In order to implement the algorithm, the basic idea is to truncate the series at index k. This leads to the following modified program.

1. initialize τ_N = N; then, inductively,

2. define α^{j,k} = (α^{j,k}_l, 1 ≤ l ≤ k) as the vector which minimizes
\[
\mathbb{E}\left(\Big[B(j, \tau_{j+1})\, \varphi\big(X_{\tau_{j+1}}\big) - (\alpha^{j,k} \cdot g)(X_j)\Big]^2\right), \tag{1.16}
\]
where \((\alpha^{j,k} \cdot g) = \sum_{l=1}^{k} \alpha^{j,k}_l g_l\),

3. define
\[
\tau_j = j\, \mathbf{1}_{\{\varphi(X_j) \ge (\alpha^{j,k} \cdot g)(X_j)\}} + \tau_{j+1}\, \mathbf{1}_{\{\varphi(X_j) < (\alpha^{j,k} \cdot g)(X_j)\}}.
\]

To make the previous program implementable, we introduce a Monte-Carlo version of (1.16).

A Monte-Carlo approach to the regression problem. Let (X^{(m)}_n, 0 ≤ n ≤ N), for 1 ≤ m ≤ M, be M paths sampled along the law of the process (X_n, 0 ≤ n ≤ N). We replace the minimization problem (1.16) by the following one:

1. initialize τ^{(m)}_N = N for every m; then, inductively,

2. define α^{j,k}_M = (α^{j,k}_{M,l}, 1 ≤ l ≤ k) as the vector which minimizes
\[
\frac{1}{M} \sum_{1 \le m \le M} \left((\alpha \cdot g)\Big(X^{(m)}_j\Big) - B\big(j, \tau^{(m)}_{j+1}\big)\, \varphi\Big(X^{(m)}_{\tau^{(m)}_{j+1}}\Big)\right)^2, \tag{1.17}
\]

3. define, for each trajectory m,
\[
\tau^{(m)}_j = j\, \mathbf{1}_{\{\varphi(X^{(m)}_j) \ge (\alpha^{j,k}_M \cdot g)(X^{(m)}_j)\}} + \tau^{(m)}_{j+1}\, \mathbf{1}_{\{\varphi(X^{(m)}_j) < (\alpha^{j,k}_M \cdot g)(X^{(m)}_j)\}}.
\]
This algorithm is now fully implementable, and the estimator of the price is given by
\[
\frac{1}{M} \sum_{1 \le m \le M} B\big(0, \tau^{(m)}\big)\, \varphi\Big(X^{(m)}_{\tau^{(m)}}\Big).
\]
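The steps above can be sketched for a Bermudan put in the Black-Scholes model, with the polynomial basis 1, x, ..., x^deg handled by NumPy's least squares polynomial fit. All numerical parameter values are illustrative choices of ours:

```python
import numpy as np

def longstaff_schwartz_put(S0, K, r, sigma, T, N, M, deg=3, seed=0):
    """Sketch of the Longstaff-Schwartz algorithm for a Bermudan put.
    cash holds the cashflow obtained by following the current rule tau_{j+1}."""
    rng = np.random.default_rng(seed)
    dt = T / N
    disc = np.exp(-r * dt)                                   # B(j, j+1)
    incr = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal((M, N))
    S = S0 * np.exp(np.cumsum(incr, axis=1))                 # M simulated paths
    payoff = lambda x: np.maximum(K - x, 0.0)
    cash = payoff(S[:, -1])                                  # tau_N = N
    for j in range(N - 2, -1, -1):
        cash = disc * cash                                   # discount one step
        itm = payoff(S[:, j]) > 0                            # regress on in-the-money paths
        coeff = np.polyfit(S[itm, j], cash[itm], deg)        # least squares (1.17)
        cont = np.polyval(coeff, S[itm, j])                  # continuation value
        exercise = payoff(S[itm, j]) >= cont
        cash[np.where(itm)[0][exercise]] = payoff(S[itm, j][exercise])
    return disc * cash.mean()

price = longstaff_schwartz_put(100.0, 100.0, 0.05, 0.2, 1.0, 50, 20_000)
```

Note that, as suggested in the remark below the estimator, the regression is restricted to the trajectories where the payoff is positive.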

Remark 1.5.2.
• The minimization problem (1.17) is a standard least squares approximation problem. An algorithm for solving it can be found, for instance, in [Press et al.(1992)].

• The basis (g_k, k ≥ 1) is supposed to be independent of the time index j. This is just for the sake of simplicity, and one can change the basis at each time index if needed for numerical efficiency. For instance, it is quite usual to add the payoff function to the basis, at least when j approaches N.

Page 32: Methodes de Monte Carlo´ et Algorithmes Stochastiquescermics.enpc.fr/~bl/monte-carlo/poly.pdf · 2.2.1 Almost sure convergence for the Robbins-Monro algorithm . . . . . . 55 ...

28 CHAPTER 1. MONTE-CARLO METHODS FOR OPTIONS

• Very often, for financial products such as call or put options,
\[
\mathbb{P}(\varphi(X_j) = 0) > 0
\]
for every time j. Since no exercise can occur where the payoff vanishes, it is useless to compute
\[
\mathbb{E}\Big(B(j, \tau^*_{j+1})\, \varphi\big(X_{\tau^*_{j+1}}\big) \,\Big|\, X_j\Big)
\]
on the set {φ(X_j) = 0}, and we can thus restrict the regression to the trajectories such that φ(X_j) > 0.

• A rigorous proof of the convergence of the Longstaff-Schwartz algorithm is given in [Clement et al.(2002)].

The Tsitsiklis–VanRoy algorithm.

This algorithm uses a revised dynamic programming algorithm, similar to (1.8) but involving
\[
\bar u(j, x) = \tilde P_j u(j+1, x)
\]
instead of u(j, x). An approximation of \(\bar u\) is then performed using a regression method.

Another form of the dynamic programming algorithm. Using \(\bar u\), the algorithm (1.8) can be rewritten as
\[
\begin{cases}
\bar u(N-1, x) = \tilde P_{N-1}\varphi(x),\\
\bar u(j, x) = \tilde P_j \max\big(\varphi(\cdot), \bar u(j+1, \cdot)\big)(x), & 0 \le j \le N-2,
\end{cases} \tag{1.18}
\]
so that \(Q_0 = \max\big(\varphi(x_0), \bar u(0, x_0)\big)\).

For every 0 ≤ j ≤ N - 1, let (g_l, l ≥ 1) be a basis of L²_j. At time N - 1, the authors approximate \(\bar u(N-1, x)\) by its projection on the subspace spanned by g_1, \ldots, g_k, according to the L²_{N-1} norm. They iterate this procedure backward in time, projecting at time j the function
\[
B(j, j+1) \max\big(\varphi(x), (\alpha^{j+1} \cdot g)(x)\big)
\]
on the vector space generated by g_1, \ldots, g_k, according to the L²_j norm. The resulting approximate dynamic programming algorithm is then
\[
\begin{cases}
\alpha^{N-1} = D^{-1}_{N-1}\, \mathbb{E}\big(B(N-1, N)\, \varphi(X_N)\, {}^t g(X_{N-1})\big),\\
\alpha^{j} = D^{-1}_{j}\, \mathbb{E}\big(B(j, j+1) \max\big(\varphi(X_{j+1}), (\alpha^{j+1} \cdot g)(X_{j+1})\big)\, {}^t g(X_j)\big), & 1 \le j \le N-2,\\
\alpha^0 = \mathbb{E}\big(B(0, 1) \max\big(\varphi(X_1), (\alpha^1 \cdot g)(X_1)\big)\big),
\end{cases} \tag{1.19}
\]
where D_j is the covariance matrix of the vector g(X_j), defined by
\[
(D_j)_{k,l} = \mathbb{E}\big(g_k(X_j) g_l(X_j)\big) - \mathbb{E}\big(g_k(X_j)\big)\,\mathbb{E}\big(g_l(X_j)\big).
\]
At time 0, the proposed approximation of the price is given by
\[
\max\big(\varphi(x_0), \alpha^0\big).
\]


The Monte-Carlo algorithm. In order to obtain an implementable algorithm, we use a set of trajectories (X^{(m)}, 1 ≤ m ≤ M) sampled along the law of X. The Monte-Carlo version of (1.19) is
\[
\begin{cases}
\alpha^{N-1}_M = \big(D^M_{N-1}\big)^{-1} \frac{1}{M} \sum_{1 \le m \le M} B(N-1, N)\, \varphi\Big(X^{(m)}_N\Big)\, {}^t g\Big(X^{(m)}_{N-1}\Big),\\
\alpha^{j}_M = \big(D^M_{j}\big)^{-1} B(j, j+1)\, \frac{1}{M} \sum_{1 \le m \le M} \max\Big(\varphi\Big(X^{(m)}_{j+1}\Big), (\alpha^{j+1}_M \cdot g)\Big(X^{(m)}_{j+1}\Big)\Big)\, {}^t g\Big(X^{(m)}_j\Big), & 1 \le j \le N-2,\\
\alpha^0_M = B(0, 1)\, \frac{1}{M} \sum_{1 \le m \le M} \max\Big(\varphi\Big(X^{(m)}_1\Big), (\alpha^1_M \cdot g)\Big(X^{(m)}_1\Big)\Big),
\end{cases} \tag{1.20}
\]
where D^M_j denotes the empirical covariance matrix of the set of points (g(X^{(m)}_j), 1 ≤ m ≤ M) of R^k. The price at time 0 is approximated by \(\max\big(\varphi(x_0), \alpha^0_M\big)\).

The quantization algorithms.

Algorithms based on quantization methods have been introduced by Bally and Pages in [Pages and Bally(2000)]. The principle of these algorithms is to discretize, at each time j, the underlying Markov process X_j by a set of "quantization levels" \(y^j = (y^j_i)_{1 \le i \le n_j}\), and then to approximate the transition kernel of X at time j by a discrete kernel \(\hat P_j = (\hat P_j(y^j_k, y^{j+1}_l),\ 1 \le k \le n_j,\ 1 \le l \le n_{j+1})\).

More precisely, let P_j(x, dy) be the transition kernel between times j and j+1 of a Markov chain (X_n, 0 ≤ n ≤ N) taking its values in R^d and starting from x_0. For 1 ≤ j ≤ N, let y^j be a set of vectors of R^d,
\[
y^j = \big\{y^j_1, \ldots, y^j_{n_j}\big\}.
\]

Note that X_0 = x_0 is supposed to be constant and need not be discretized. We recall that
\[
\sup_{\tau \in \mathcal{T}_{0,N}} \mathbb{E}\big(B(0, \tau)\, \varphi(X_\tau)\big) = u(0, x_0),
\]
where u is given by the dynamic programming algorithm (1.8):
\[
\begin{cases}
u(N, x) = \varphi(x),\\
u(j, x) = \max\big(\varphi(x), \tilde P_j u(j+1, x)\big), & 0 \le j \le N-1.
\end{cases} \tag{1.21}
\]

We will now approximate the discounted transition kernel \(\tilde P_j = B(j, j+1) P_j\) by a discrete transition kernel \(\hat P_j\) defined on y^j × y^{j+1}. This kernel must satisfy the following requirement: \(\hat P_j(y^j_k, y^{j+1}_l)\) approximates the discounted probability that the process X moves from a neighborhood of y^j_k at time j to a neighborhood of y^{j+1}_l at time j+1. This can be done using \(\hat P_j\) defined by
\[
\hat P_j\big(y^j_k, y^{j+1}_l\big) = B(j, j+1)\, \frac{\mathbb{P}\big(X_j \in C_k(y^j),\ X_{j+1} \in C_l(y^{j+1})\big)}{\mathbb{P}\big(X_j \in C_k(y^j)\big)}, \tag{1.22}
\]


where the set C_i(y^j) is the i-th Voronoi cell of the set of points y^j, defined by
\[
C_i\big(y^j\big) = \left\{z \in \mathbb{R}^d,\ \big\|z - y^j_i\big\| = \min_{1 \le l \le n_j} \big\|z - y^j_l\big\|\right\}. \tag{1.23}
\]
If we assume that, for every x ∈ R^d and every l, \(P_j(x, \partial C_l(y^{j+1})) = 0\), then equation (1.22) defines a probability transition kernel.

The approximation of the dynamic programming algorithm (1.21) using the matrix \(\hat P_j\) is given by
\[
\begin{cases}
\hat u(N, y^N_k) = \varphi(y^N_k), & 1 \le k \le n_N,\\
\hat u(j, y^j_k) = \max\left(\varphi(y^j_k),\ \sum_{1 \le l \le n_{j+1}} \hat P_j\big(y^j_k, y^{j+1}_l\big)\, \hat u(j+1, y^{j+1}_l)\right), & 0 \le j \le N-1,\ 1 \le k \le n_j,
\end{cases} \tag{1.24}
\]
and the proposed price approximation is \(\hat u(0, x_0)\).

In order to obtain an implementable algorithm, we need a Monte Carlo procedure to approximate $\hat P_j$. Let $(X^{(m)}, 1 \le m \le M)$ be $M$ independent sample paths drawn along the law of the Markov chain $X$. Formula (1.22) suggests the Monte Carlo estimator $\bar P_j$ defined by
$$\bar P_j(y^j_k, y^{j+1}_l) = B(j, j+1)\, \frac{\sum_{1 \le m \le M} \mathbf{1}_{\{X^{(m)}_j \in C_k(y^j),\ X^{(m)}_{j+1} \in C_l(y^{j+1})\}}}{\sum_{1 \le m \le M} \mathbf{1}_{\{X^{(m)}_j \in C_k(y^j)\}}}. \qquad (1.25)$$
An implementable version of (1.24) is obtained by replacing $\hat P_j$ by $\bar P_j$ in (1.24).

Remark 1.5.3. Once $\hat u$ has been computed at the points $(y^j_i, 1 \le i \le n_j)$, we can extend it to $\mathbf{R}^d$ by setting $\hat u(j, x) = \hat u(j, y^j_i)$ for $x \in C_i(y^j)$. This allows us to approximate the optimal stopping time $\tau^*$ by
$$\hat\tau = \min\{ j;\ \varphi(X_j) = \hat u(j, X_j) \}, \qquad (1.26)$$
which leads to another approximation of the price at time zero:
$$\frac{1}{M} \sum_{1 \le m \le M} B\big(0, \hat\tau^{(m)}\big)\, \varphi\Big( X^{(m)}_{\hat\tau^{(m)}} \Big),$$
where $\hat\tau^{(m)}$ is the stopping time computed using (1.26) on trajectory $m$.

Choice of the quantization sets. There are many ways to implement the quantization method, depending on the choice of the quantization sets $y^j$. We present two of them here.

The first one is the random quantization algorithm. We assume that $n_j = n$ for all $j$. We use a self-quantization procedure to generate the quantization sets, that is, we set
$$y^j_m = X^{(m)}_j, \quad 1 \le j \le N,\ 1 \le m \le n,$$
where $(X^{(m)}, 1 \le m \le n)$ are drawn independently along the law of $X$. This choice is acceptable but is not the best among all random choices (see [Cohort(2004)]).


A second way to choose the quantization sets is to minimize a quantity reflecting the quality of the quantization set. A classical choice is the distortion $D$, defined for a set $y = (y_k, 1 \le k \le n)$ and a random variable $X$ by
$$D(y, X) = \mathbf{E}\Big( \min_{1 \le k \le n} \| X - y_k \|^2 \Big).$$
Moreover, an error bound for the previously defined price approximation can be derived in terms of the distortions $(D(y^j, X_j), 1 \le j \le N)$ (see [Pages and Bally(2000)]). It can be shown that there exists an optimal set $y^*$ satisfying
$$D(y^*, X) = \min_{y \in (\mathbf{R}^d)^n} D(y, X).$$
This optimal set can be computed numerically using the Lloyd algorithm: let $y(0) \in (\mathbf{R}^d)^n$ be such that all its points are distinct, and define a sequence $y(n)$ by setting
$$y(n+1)_k = \mathbf{E}\big( X \,\big|\, X \in C_k(y(n)) \big).$$
Note that a Monte Carlo method must be used to approximate this expectation. Then $D(y(n), X)$ is decreasing and the sequence $y(n)$ converges to a point $y^*$ which realizes a local minimum of $D(\cdot, X)$. For more details on the Lloyd algorithm and its convergence see [Sabin and Gray(1986)]. Another classical optimization procedure for $D$ uses the Kohonen algorithm, see [Pages(1995)].
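The Lloyd iteration can be sketched numerically by letting an empirical sample stand in for $X$: the conditional expectation $\mathbf{E}(X \mid X \in C_k(y))$ becomes the mean of the sample points falling in the $k$-th Voronoï cell. A minimal NumPy sketch (sample size, grid size and iteration count are illustrative choices, not from the text):

```python
import numpy as np

def distortion(sample, y):
    """Empirical distortion D(y, X): mean squared distance to the nearest level."""
    d2 = ((sample[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def lloyd(sample, y0, n_iter=20):
    """Lloyd fixed point: replace each level by the mean of its Voronoi cell."""
    y = y0.copy()
    for _ in range(n_iter):
        d2 = ((sample[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        cell = d2.argmin(axis=1)          # Voronoi cell index of each point
        for k in range(len(y)):
            pts = sample[cell == k]
            if len(pts):                  # an empty cell keeps its level
                y[k] = pts.mean(axis=0)
    return y

rng = np.random.default_rng(1)
sample = rng.standard_normal((5000, 1))   # X ~ N(0, 1), dimension d = 1
y0 = sample[rng.choice(len(sample), 10, replace=False)]   # distinct start points
y = lloyd(sample, y0)
```

Each Lloyd step is non-increasing for the empirical distortion, so the final grid is at least as good as the starting one.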

The Broadie–Glasserman algorithm.

For this algorithm we use a random mesh depending on time and write a dynamic programming algorithm at the points of this mesh.

Mesh generation. We assume that the transition kernels $P_j(x, dy)$ on $\mathbf{R}^d$ of the process $(X_n, n \ge 0)$ have a density $f_j(x, y)$. This is obviously the case in the Black and Scholes model. We assume moreover that $X_0 = x_0 \in \mathbf{R}^d$.

Let $(Y^{(m)}_1, 1 \le m \le n)$ be a sample drawn along the density $f_0(x_0, y)$. The set $(Y^{(m)}_1, 1 \le m \le n)$ defines the mesh at time 1. Then the mesh at time $j+1$, $(Y^{(m)}_{j+1}, 1 \le m \le n)$, is defined assuming that $Y_j = (Y^{(m)}_j, 1 \le m \le n)$ has already been drawn, and that the conditional law of $Y_{j+1}$ given $Y_j$ is that of $n$ independent random variables with (conditional) density
$$\frac{1}{n} \sum_{1 \le l \le n} f_j\big( Y^{(l)}_j, y \big)\, dy. \qquad (1.27)$$

The next step consists in writing an approximation of the dynamic programming algorithm
$$u(N, x) = \varphi(x), \qquad u(j, x) = \max\big( \varphi(x),\, P_j u(j+1, x) \big), \quad 0 \le j \le N-1, \qquad (1.28)$$


using only the points $(Y^{(m)}_j, 1 \le m \le n)$. For this, Broadie and Glasserman propose to approximate
$$P_j u(j+1, Y^{(k)}_j)$$
by the sum
$$\frac{1}{n} \sum_{1 \le m \le n} \beta^j_{k,m}\, u(j+1, Y^{(m)}_{j+1}). \qquad (1.29)$$
They choose the weights $\beta^j_{k,m}$ by considering (1.29) as a Monte Carlo sum. Indeed, one has
$$\mathbf{E}\left( u(j+1, Y^{(m)}_{j+1})\, \frac{f_j\big( Y^{(k)}_j, Y^{(m)}_{j+1} \big)}{\frac{1}{n} \sum_{1 \le l \le n} f_j\big( Y^{(l)}_j, Y^{(m)}_{j+1} \big)} \,\Bigg|\, Y^{(k)}_j,\ k = 1, \dots, n \right)$$
$$= \int_{\mathbf{R}^d} u(j+1, y)\, \frac{f_j\big( Y^{(k)}_j, y \big)}{\frac{1}{n} \sum_{1 \le l \le n} f_j\big( Y^{(l)}_j, y \big)}\ \frac{1}{n} \sum_{1 \le l \le n} f_j\big( Y^{(l)}_j, y \big)\, dy$$
$$= \int_{\mathbf{R}^d} u(j+1, y)\, f_j\big( Y^{(k)}_j, y \big)\, dy = P_j u(j+1, Y^{(k)}_j). \qquad (1.30)$$
But, conditionally on $(Y^{(m)}_j, 1 \le m \le n)$, the variables $(Y^{(m)}_{j+1}, 1 \le m \le n)$ are independent and identically distributed, so a natural estimator of (1.30) is given by
$$\frac{1}{n} \sum_{1 \le m \le n} \beta^j_{k,m}\, u(j+1, Y^{(m)}_{j+1}), \qquad (1.31)$$
where
$$\beta^j_{k,m} = \frac{f_j\big( Y^{(k)}_j, Y^{(m)}_{j+1} \big)}{\frac{1}{n} \sum_{1 \le l \le n} f_j\big( Y^{(l)}_j, Y^{(m)}_{j+1} \big)}. \qquad (1.32)$$
The approximation of the dynamic programming algorithm is then given by
$$\hat u(N, Y^{(k)}_N) = \varphi\big( Y^{(k)}_N \big), \quad 1 \le k \le n,$$
and, for $1 \le k \le n$, $0 \le j \le N-1$,
$$\hat u(j, Y^{(k)}_j) = \max\Big( \varphi\big( Y^{(k)}_j \big),\ \frac{1}{n} \sum_{1 \le m \le n} \beta^j_{k,m}\, \hat u(j+1, Y^{(m)}_{j+1}) \Big),$$
and the price approximation by $\hat u(0, x_0)$.
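The whole construction (mesh generation from the mixture density (1.27), then backward induction with the weights (1.32)) can be sketched for a Bermudan put in the one-dimensional Black and Scholes model. This is an illustrative sketch, not the authors' implementation; all numerical parameters are arbitrary example values:

```python
import numpy as np
from math import exp, pi, sqrt

# Illustrative parameters (not from the text)
s0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
N, n = 10, 200                      # number of exercise dates, mesh size
dt = T / N
disc = exp(-r * dt)                 # discount factor B(j, j+1)

def density(x, y):
    """Black-Scholes transition density f_j(x, y) over one time step dt."""
    s = sigma * sqrt(dt)
    m = np.log(x) + (r - 0.5 * sigma ** 2) * dt
    return np.exp(-(np.log(y) - m) ** 2 / (2 * s ** 2)) / (y * s * sqrt(2 * pi))

def step(x, z):
    return x * np.exp((r - 0.5 * sigma ** 2) * dt + sigma * sqrt(dt) * z)

rng = np.random.default_rng(0)
# Mesh generation: row j holds the n points at date j+1; the next row is
# drawn from the mixture density (1/n) sum_l f_j(Y_j^(l), .)
Y = np.empty((N, n))
Y[0] = step(s0, rng.standard_normal(n))
for j in range(1, N):
    parents = rng.choice(Y[j - 1], size=n)     # pick a mixture component
    Y[j] = step(parents, rng.standard_normal(n))

payoff = lambda x: np.maximum(K - x, 0.0)
u = payoff(Y[N - 1])                           # values at the last date
for j in range(N - 2, -1, -1):
    f = density(Y[j][:, None], Y[j + 1][None, :])   # f_j(Y_j^(k), Y_{j+1}^(m))
    weights = f / f.mean(axis=0)[None, :]           # beta^j_{k,m} of (1.32)
    u = np.maximum(payoff(Y[j]), disc * (weights * u[None, :]).mean(axis=1))
price = max(payoff(np.array([s0]))[0], disc * u.mean())
```

The estimator is known to be biased high for finite $n$; with the parameters above, `price` should be close to the true Bermudan put value.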

The Barraquand–Martineau algorithm.

To avoid dimensionality issues, the authors propose to replace the optimal stopping strategy by a sub-optimal one: assume that the option holder knows at time $j$ the payoff values $\varphi(X_i), i \le j$, but not the stock values $X_i, i \le j$. The option holder can then only exercise according to a stopping time with respect to the filtration $(\mathcal{G}_n, n \ge 0)$, where $\mathcal{G}_n = \sigma(\varphi(X_i), i \le n)$. Doing this, the price is approximated (from below) by
$$\sup_{\tau \in \mathcal{G}_{0,N}} \mathbf{E}\big( B(0, \tau)\, \varphi(X_\tau) \big), \qquad (1.33)$$


where $\mathcal{G}_{0,N}$ is the set of $\mathcal{G}_n$-stopping times taking values in $\{0, \dots, N\}$.

To compute (1.33), the authors propose a quantization method for the one-dimensional dynamic programming algorithm. Obviously, in general, the process $(\varphi(X_n), n \ge 0)$ is not Markovian with respect to $\mathcal{G}$, and a second approximation has to be made: one assumes that $(\varphi(X_n), n \ge 0)$ is Markovian in order to write the following dynamic programming algorithm
$$u(N, x) = \varphi(x), \qquad u(j, x) = \max\big( \varphi(x),\, P_j u(j+1, x) \big), \quad 0 \le j \le N-1, \qquad (1.34)$$
where $P_j(x, dy)$ is defined by
$$\mathbf{E}\big( B(n, n+1)\, f(\varphi(X_{n+1})) \,\big|\, \varphi(X_n) \big) = P_n f(\varphi(X_n)).$$

Assume that the two previous approximations are acceptable. Algorithm (1.34) can now be handled by a one-dimensional quantization technique. For $1 \le j \le N$, let
$$\{ z^j_2 < \dots < z^j_{n_j} \} \subset \mathbf{R},$$
and let $z^j_1 = -\infty$, $z^j_{n_j+1} = +\infty$. Then define the sets $y^j = \{ y^j_1, \dots, y^j_{n_j} \}$ by
$$y^j_k = \mathbf{E}\big( \varphi(X_j) \,\big|\, \varphi(X_j) \in [z^j_k, z^j_{k+1}] \big). \qquad (1.35)$$
As in the quantization algorithm, the kernel $P_j$ is discretized on $y^j \times y^{j+1}$ by $\hat P_j$, where
$$\hat P_j(y^j_k, y^{j+1}_l) = B(j, j+1)\, \frac{\mathbf{P}\big( \varphi(X_j) \in [z^j_k, z^j_{k+1}],\ \varphi(X_{j+1}) \in [z^{j+1}_l, z^{j+1}_{l+1}] \big)}{\mathbf{P}\big( \varphi(X_j) \in [z^j_k, z^j_{k+1}] \big)}. \qquad (1.36)$$
The next step is to approximate (1.34) using $\hat P_j(y^j_k, y^{j+1}_l)$:
$$\hat u(N, y^N_k) = y^N_k, \quad 1 \le k \le n_N,$$
$$\hat u(j, y^j_k) = \max\Big( y^j_k,\ \sum_{1 \le l \le n_{j+1}} \hat P_j(y^j_k, y^{j+1}_l)\, \hat u(j+1, y^{j+1}_l) \Big), \quad 1 \le k \le n_j,\ 0 \le j \le N-1. \qquad (1.37)$$
The approximation of the price at time 0 is given by $\hat u(0, x_0)$.

A Monte Carlo version of (1.37) is obtained using $M$ samples $(X^{(m)}_j, 0 \le j \le N)$, $1 \le m \le M$, of the process $X$, approximating $y^j_k$ by
$$\frac{1}{\mathrm{Card}(A^j_k)} \sum_{m \in A^j_k} \varphi\big( X^{(m)}_j \big),$$
and $\hat P_j(y^j_k, y^{j+1}_l)$ by
$$B(j, j+1)\, \frac{\mathrm{Card}\big( A^j_k \cap A^{j+1}_l \big)}{\mathrm{Card}\big( A^j_k \big)},$$
where
$$A^j_k = \big\{ 1 \le m \le M,\ \varphi\big( X^{(m)}_j \big) \in [z^j_k, z^j_{k+1}] \big\}.$$
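The Monte Carlo version above can be sketched for a Bermudan put in the one-dimensional Black and Scholes model, with the strata boundaries $z^j_k$ chosen as empirical quantiles of the simulated payoffs (an illustrative choice; all parameters are arbitrary example values, not from the text):

```python
import numpy as np

# Illustrative parameters (not from the text)
s0, K, r, sigma, T, N = 100.0, 110.0, 0.05, 0.2, 1.0, 10
M, nbins = 20000, 50
dt = T / N
disc = np.exp(-r * dt)

rng = np.random.default_rng(0)
# Simulate M paths and keep only the payoff process phi(X_j), j = 1..N
z = rng.standard_normal((M, N))
logS = np.log(s0) + np.cumsum((r - 0.5 * sigma ** 2) * dt
                              + sigma * np.sqrt(dt) * z, axis=1)
phi = np.maximum(K - np.exp(logS), 0.0)

# Strata boundaries z^j_k: empirical quantiles of phi at each date
cells = []
for j in range(N):
    edges = np.quantile(phi[:, j], np.linspace(0.0, 1.0, nbins + 1))[1:-1]
    cells.append(np.clip(np.searchsorted(edges, phi[:, j]), 0, nbins - 1))

def cell_means(j):
    """y^j_k: mean payoff over the paths falling in stratum k at date j."""
    y = np.zeros(nbins)
    for k in range(nbins):
        sel = cells[j] == k
        if sel.any():
            y[k] = phi[sel, j].mean()
    return y

u = cell_means(N - 1)                     # \hat u(N, y^N_k) = y^N_k
for j in range(N - 2, -1, -1):
    y = cell_means(j)
    cont = np.zeros(nbins)
    for k in range(nbins):
        sel = cells[j] == k
        if sel.any():
            # Card(A^j_k ∩ A^{j+1}_l) / Card(A^j_k) against u(j+1, .)
            counts = np.bincount(cells[j + 1][sel], minlength=nbins)
            cont[k] = disc * counts @ u / sel.sum()
    u = np.maximum(y, cont)
price = max(K - s0, disc * u[cells[0]].mean())
```

Since the exercise rule only looks at the payoff process, the resulting price is a lower-biased approximation of the true Bermudan value.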

Remark 1.5.4. A comparison of most of the algorithms presented here can be found in [Fu et al.(2001)].


1.6 Exercises and problems

Exercise 1. Let $Z$ be a Gaussian random variable and $K$ a positive real number.

1. Let $d = \frac{\mathbf{E}(Z) - \log(K)}{\sqrt{\mathrm{Var}(Z)}}$. Prove that
$$\mathbf{E}\big( \mathbf{1}_{\{Z \ge \log(K)\}}\, e^Z \big) = e^{\mathbf{E}(Z) + \frac{1}{2}\mathrm{Var}(Z)}\, N\big( d + \sqrt{\mathrm{Var}(Z)} \big).$$

2. Prove the formula (Black and Scholes formula)
$$\mathbf{E}\big( (e^Z - K)_+ \big) = e^{\mathbf{E}(Z) + \frac{1}{2}\mathrm{Var}(Z)}\, N\big( d + \sqrt{\mathrm{Var}(Z)} \big) - K N(d).$$
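The formula of question 2 can be checked numerically against a crude Monte Carlo estimate; the values of $\mathbf{E}(Z)$, $\mathrm{Var}(Z)$ and $K$ below are illustrative choices:

```python
import math
import random

def N(x):
    """Standard normal cumulative distribution function, via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, s, K = 0.1, 0.4, 1.2            # E(Z), sqrt(Var(Z)), strike (arbitrary)
d = (mu - math.log(K)) / s
closed_form = math.exp(mu + 0.5 * s * s) * N(d + s) - K * N(d)

random.seed(0)
M = 200_000
mc = sum(max(math.exp(random.gauss(mu, s)) - K, 0.0) for _ in range(M)) / M
```

The two values should agree within the Monte Carlo error of a few times $10^{-3}$.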

Exercise 2. Let $X$ and $Y$ be two independent Gaussian random variables with mean 0 and variance 1.

1. Prove that, if $R = \sqrt{X^2 + Y^2}$ and $\theta$ is the polar angle, then $\theta/2\pi$ and $\exp(-R^2/2)$ are two independent random variables following a uniform law on $[0, 1]$.

2. Using the previous result, deduce Proposition 1.2.5.
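Inverting the two uniform laws of question 1 yields the Box–Muller construction (assuming this is the content of Proposition 1.2.5): from two independent uniforms $U_1, U_2$, set $R = \sqrt{-2\log U_1}$ and $\theta = 2\pi U_2$. A quick sanity check of the resulting sampler:

```python
import math
import random

def gaussian_pair(u1, u2):
    """Box-Muller: two independent N(0,1) variables from two uniforms."""
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

random.seed(0)
xs = []
for _ in range(100_000):
    # 1 - random() lies in (0, 1], so the logarithm is always defined
    x, y = gaussian_pair(1.0 - random.random(), random.random())
    xs.extend((x, y))
m = sum(xs) / len(xs)
v = sum(t * t for t in xs) / len(xs)
```

The empirical mean and variance should be close to 0 and 1 respectively.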

Exercise 3. Consider the case of a European call in the Black and Scholes model with a stochastic interest rate. Suppose that the price of the stock is 1 and that the option price at time 0 is given by $\mathbf{E}(Z)$, with $Z$ defined by
$$Z = e^{-\int_0^T r_\theta d\theta} \left[ e^{\int_0^T r_\theta d\theta - \frac{\sigma^2}{2} T + \sigma W_T} - K \right]_+.$$

1. Prove that the variance of $Z$ is bounded by $\mathbf{E}\big( e^{-\sigma^2 T + 2\sigma W_T} \big)$.

2. Prove that $\mathbf{E}\big( e^{-\frac{1}{2}\gamma^2 T + \gamma W_T} \big) = 1$, and deduce an estimate for the variance of $Z$.

Exercise 4. Let $\lambda$ and $K$ be two positive real numbers such that $\lambda < K$, let $G$ be a standard Gaussian random variable, and let $X_m$ be the random variable
$$X_m = \big( \lambda e^{\sigma(G+m)} - K \big)_+\, e^{-mG - \frac{m^2}{2}}.$$
We denote its variance by $\sigma^2_m$. Give an expression for the derivative of $\sigma^2_m$ with respect to $m$ as an expectation, then deduce that $\sigma^2_m$ is a decreasing function of $m$ when $m \le m_0 = \log(K/\lambda)/\sigma$.

Exercise 5. Let $G = (G_1, \dots, G_d)$ be a $d$-dimensional Gaussian vector with covariance matrix equal to the identity. For each $m \in \mathbf{R}^d$, let $V(m)$ denote the variance of the random variable
$$X_m = \phi(G + m)\, e^{-m \cdot G - \frac{|m|^2}{2}}.$$

1. Prove that
$$\frac{\partial V}{\partial m_i}(m) = \mathbf{E}\left( \phi^2(G)\, e^{-m \cdot G + \frac{|m|^2}{2}}\, (m_i - G_i) \right).$$

2. Assume that $\phi(G) = \big( \sum_{i=1}^d \lambda_i e^{\sigma_i G_i} - K \big)_+$, with $\lambda_i$, $\sigma_i$, $K$ positive real constants. Let $m_i(\alpha) = \alpha \lambda_i \sigma_i$. Prove that, if $\sum_{i=1}^d \lambda_i < K$, then
$$\frac{dV(m(\alpha))}{d\alpha} \le 0 \quad \text{for } 0 \le \alpha \le \frac{K - \sum_{i=1}^d \lambda_i}{\sum_{i=1}^d (\lambda_i \sigma_i)^2},$$
and deduce that if
$$m^0_i = \sigma_i \lambda_i\, \frac{K - \sum_{i=1}^d \lambda_i}{\sum_{i=1}^d (\lambda_i \sigma_i)^2},$$
then $V(m^0) \le V(0)$.

Problem 6. The aim of this problem is to prove that the antithetic variable method decreases the variance for a function which is monotonic with respect to each of its arguments.

1. Let $f$ and $g$ be two increasing functions from $\mathbf{R}$ to $\mathbf{R}$. Prove that, if $X$ and $Y$ are two real random variables, then
$$\mathbf{E}(f(X)g(X)) + \mathbf{E}(f(Y)g(Y)) \ge \mathbf{E}(f(X)g(Y)) + \mathbf{E}(f(Y)g(X)).$$

2. Deduce that, if $X$ is a real random variable, then
$$\mathbf{E}(f(X)g(X)) \ge \mathbf{E}(f(X))\, \mathbf{E}(g(X)).$$

3. Prove that if $X_1, \dots, X_n$ are $n$ independent random variables, then
$$\mathbf{E}\big( f(X_1, \dots, X_n)\, g(X_1, \dots, X_n) \,\big|\, X_n \big) = \phi(X_n),$$
where $\phi$ is a function which can be computed as an expectation.

4. Deduce from this property that if $f$ and $g$ are two functions which are increasing with respect to each of their arguments, then
$$\mathbf{E}\big( f(X_1, \dots, X_n)\, g(X_1, \dots, X_n) \big) \ge \mathbf{E}\big( f(X_1, \dots, X_n) \big)\, \mathbf{E}\big( g(X_1, \dots, X_n) \big).$$

5. Let $h$ be a function from $[0,1]^n$ to $\mathbf{R}$ which is monotonic with respect to each of its arguments, and let $U_1, \dots, U_n$ be $n$ independent random variables following the uniform law on $[0,1]$. Prove that
$$\mathrm{Cov}\big( h(U_1, \dots, U_n),\ h(1 - U_1, \dots, 1 - U_n) \big) \le 0,$$
and deduce that in this case the antithetic variable method decreases the variance.

Problem 7. Let $X$ and $Y$ be independent real random variables, and let $F$ and $G$ be the distribution functions of $X$ and $Y$ respectively. We want to compute by a Monte Carlo method the probability
$$\theta = \mathbf{P}(X + Y \le t).$$

1. Propose a variance reduction procedure using a conditioning method.

2. We assume that $F$ and $G$ are (at least numerically) easily invertible. Explain how to implement the antithetic variates method. Why does this method decrease the variance in this case?

3. Assume that $h$ is a function such that $\int_0^1 |h(s)|^2 ds < +\infty$, and let $(U_i, i \ge 1)$ be a sequence of independent random variables with a uniform distribution on $[0,1]$. Prove that
$$\frac{1}{N} \sum_{i=1}^N h\big( (i - 1 + U_i)/N \big)$$
has a lower variance than
$$\frac{1}{N} \sum_{i=1}^N h(U_i).$$
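Question 3 is a stratification of $[0,1]$ into $N$ equal intervals, one draw per interval. A quick empirical comparison of the two estimators, with the illustrative choice $h(x) = x^2$:

```python
import random

random.seed(0)
h = lambda x: x * x        # illustrative square-integrable function
N, reps = 50, 2000

def crude():
    return sum(h(random.random()) for _ in range(N)) / N

def stratified():
    # one uniform draw in each interval [(i-1)/N, i/N]
    return sum(h((i + random.random()) / N) for i in range(N)) / N

def variance(sample):
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

v_crude = variance([crude() for _ in range(reps)])
v_strat = variance([stratified() for _ in range(reps)])
```

The stratified variance is dramatically smaller here, since within-stratum variability is of order $1/N^2$ for smooth $h$.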

Problem 8. Let $Z$ be a random variable given by
$$Z = \lambda_1 e^{\beta_1 X_1} + \lambda_2 e^{\beta_2 X_2},$$
where $(X_1, X_2)$ is a pair of real random variables and $\lambda_1$, $\lambda_2$, $\beta_1$ and $\beta_2$ are positive real numbers. This problem studies various methods to compute the price of an index option given by $p = \mathbf{P}(Z > t)$.

1. In this question, we assume that $(X_1, X_2)$ is a Gaussian vector with mean 0 such that $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1, X_2) = \rho$, with $|\rho| \le 1$. Explain how to simulate random samples along the law of $Z$. Describe a Monte Carlo method allowing $p$ to be estimated, and explain how to estimate the error of the method.

2. Explain how to use low discrepancy sequences to compute $p$.

3. We assume that $X_1$ and $X_2$ are two independent Gaussian random variables with mean 0 and variance 1. Let $m$ be a real number. Prove that $p$ can be written as
$$p = \mathbf{E}\Big[ \phi(X_1, X_2)\, \mathbf{1}_{\{\lambda_1 e^{\beta_1(X_1 + m)} + \lambda_2 e^{\beta_2(X_2 + m)} \ge t\}} \Big],$$
for some function $\phi$. How can we choose $m$ such that
$$\mathbf{P}\big( \lambda_1 e^{\beta_1(X_1 + m)} + \lambda_2 e^{\beta_2(X_2 + m)} \ge t \big) \ge \frac{1}{4}\,?$$
Propose a new Monte Carlo method which allows $p$ to be computed, and explain how to check on the drawings that the method does reduce the variance.

4. Assume now that $X_1$ and $X_2$ are two independent random variables with distribution functions $F_1(x)$ and $F_2(x)$ respectively. Prove that
$$p = \mathbf{E}\big[ 1 - G_2\big( t - \lambda_1 e^{\beta_1 X_1} \big) \big],$$
where $G_2(x)$ is a function such that the variance of $1 - G_2\big( t - \lambda_1 e^{\beta_1 X_1} \big)$ is always less than the variance of $\mathbf{1}_{\{\lambda_1 e^{\beta_1 X_1} + \lambda_2 e^{\beta_2 X_2} > t\}}$. Propose a new Monte Carlo method to compute $p$.

5. We assume again that $(X_1, X_2)$ is a Gaussian vector with mean 0 and such that $\mathrm{Var}(X_1) = \mathrm{Var}(X_2) = 1$ and $\mathrm{Cov}(X_1, X_2) = \rho$, with $|\rho| \le 1$. Prove that $p = \mathbf{E}[1 - F_2(\phi(X_1))]$, where $F_2$ is the cumulative distribution function of $X_2$ and $\phi$ a function to be computed. Deduce a variance reduction method for computing $p$.

Problem 9. Computing the price of a CDS by a Monte Carlo method.
We consider a sequence of deterministic times $0 < T_1 < T_2 < \dots < T_n$ and an interest rate $r \ge 0$. We denote by $\tau$ a random (default) time, and we want to evaluate by a Monte Carlo method the price of a CDS given by
$$P = \mathbf{E}(H(\tau)),$$
where
$$H(\tau) = R \sum_{i=1}^{n-1} e^{-rT_i}\, \mathbf{1}_{\{\tau > T_i\}} - e^{-r\tau}\, \mathbf{1}_{\{\tau \le T_n\}}.$$

Deterministic intensity. We assume that $(\lambda_s, s \ge 0)$ is a continuous non-random function of $s$ such that $\lambda_s > 0$ for every $s \ge 0$, and that $\tau$ has the law
$$\mathbf{1}_{\{t \ge 0\}}\, e^{-\int_0^t \lambda_s ds}\, \lambda_t\, dt.$$

1. Identify the law of $\tau$ when $\lambda_s$ does not depend on $s$. How, then, can one simulate a random variable along the law of $\tau$? Show that $P$ can then be computed by a simple formula.

2. Let $\xi$ be an exponential random variable with parameter 1. Denote by $\Lambda$ the function from $\mathbf{R}_+$ to $\mathbf{R}_+$ defined by
$$\Lambda(t) = \int_0^t \lambda_s\, ds,$$
and by $\Lambda^{-1}$ its inverse. Show that $\Lambda^{-1}(\xi)$ has the same law as $\tau$, and deduce a simulation method along the law of $\tau$.

3. Assuming that $\Lambda$ and $\Lambda^{-1}$ can be computed explicitly, propose a Monte Carlo method allowing $P$ to be computed. Explain how the error can then be estimated.

4. Assume that $\Lambda(T_n) \le K$. Show that
$$P = \left( R \sum_{i=1}^{n-1} e^{-rT_i} \right) \mathbf{P}(\xi > K) + \mathbf{E}\big( H(\tau) \,\big|\, \xi \le K \big)\, \mathbf{P}(\xi \le K).$$
Explain how to simulate efficiently a random variable along the law of $\xi$ conditionally on the event $\{\xi \le K\}$. Deduce a variance reduction method for the computation of $P$.

Random intensity. In this paragraph we assume that $\lambda_t$ is a random process given by $\lambda_t = \exp(X_t)$, where $(X_t, t \ge 0)$ is the solution of
$$dX_t = -c (X_t - \alpha)\, dt + \sigma\, dW_t, \qquad X_0 = x,$$
with $c$, $\sigma$, $\alpha$ given real numbers and $(W_t, t \ge 0)$ a Brownian motion. We set $\tau = \Lambda^{-1}(\xi)$, where $\xi$ is an exponential random variable with parameter 1 independent of $W$.

1. Compute $\mathbf{E}(\lambda_t)$. Propose a control variate for the computation of $P$. How can one check numerically that this control variate actually reduces the variance?

2. Compute
$$\mathbf{E}\left( R \sum_{i=1}^{n-1} e^{-rT_i}\, \mathbf{1}_{\{\tau > T_i\}} - e^{-r\tau}\, \mathbf{1}_{\{\tau < T_n\}} \,\Bigg|\, \lambda_t, t \ge 0 \right).$$
Deduce a simulation method which avoids simulating the random variable $\xi$, and show that it reduces the variance.
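Question 2 of the deterministic-intensity part can be checked numerically: with the illustrative affine intensity $\lambda_s = a + bs$ (the values of $a$ and $b$ are arbitrary), $\Lambda(t) = at + bt^2/2$ inverts in closed form, and $\tau = \Lambda^{-1}(\xi)$ must satisfy $\mathbf{P}(\tau > t) = e^{-\Lambda(t)}$:

```python
import math
import random

a, b = 0.3, 0.2      # illustrative intensity lambda_s = a + b*s

def Lambda_inv(x):
    """Solve a*t + (b/2)*t^2 = x for t >= 0."""
    return (-a + math.sqrt(a * a + 2.0 * b * x)) / b

random.seed(0)
taus = [Lambda_inv(random.expovariate(1.0)) for _ in range(100_000)]

# check against the exact survival function P(tau > t) = exp(-Lambda(t))
t = 1.0
empirical = sum(tau > t for tau in taus) / len(taus)
exact = math.exp(-(a * t + 0.5 * b * t * t))
```

The empirical survival probability should match the exact one within a few standard errors.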

Problem 10. An importance function method. We keep the notation of Problem 9: $\xi$ is an exponential random variable with parameter 1 and $\tau = \Lambda^{-1}(\xi)$.

1. Let $\beta > 0$ and set $\xi_\beta = \xi/\beta$. Show that, for every non-negative measurable function $f$ and every $\beta > 0$,
$$\mathbf{E}(f(\xi)) = \mathbf{E}\big( f(\xi_\beta)\, g_\beta(\xi_\beta) \big),$$
$g_\beta$ being a function to be made explicit in terms of $\beta$.

2. Deduce that for every $\beta > 0$:
$$P = \mathbf{E}\big( g_\beta(\xi/\beta)\, H(\Lambda^{-1}(\xi/\beta)) \big),$$
with $H(t) = R \sum_{i=1}^{n-1} e^{-rT_i}\, \mathbf{1}_{\{t > T_i\}} - e^{-rt}\, \mathbf{1}_{\{t < T_n\}}$.

3. Set $X_\beta = g_\beta(\xi/\beta)\, H(\Lambda^{-1}(\xi/\beta))$. Show that for every $\beta > 0$:
$$\mathrm{Var}(X_\beta) = \mathbf{E}\big( g_\beta(\xi)\, H^2(\Lambda^{-1}(\xi)) \big) - P^2.$$

4. Show that $\mathrm{Var}(X_\beta) < +\infty$ if $\beta < 2$, and that
$$\mathrm{Var}(X_2) = \frac{1}{2}\, \mathbf{E}\left( \int_0^{+\infty} H^2(\Lambda^{-1}(t))\, dt \right) - P^2.$$

5. Show that $\beta \mapsto \mathrm{Var}(X_\beta)$ is a strictly convex function on the interval $]0, 2[$ and that
$$\lim_{\beta \to 0^+} \mathrm{Var}(X_\beta) = +\infty.$$
Assuming that $\lim_{t \to +\infty} \Lambda^{-1}(t) = +\infty$ almost surely, show that $\lim_{\beta \to 2^-} \mathrm{Var}(X_\beta) = +\infty$.

6. Which $\beta$ is it desirable to choose in a Monte Carlo method? Propose a method allowing this value to be approximated.

Exercise 11. Let $X$ be a standard Gaussian random variable.

1. For $f$ a bounded function, set $I = \mathbf{E}(f(X))$ and consider the estimators $I^1_n$ and $I^2_n$:
$$I^1_n = \frac{1}{2n}\big( f(X_1) + f(X_2) + \dots + f(X_{2n-1}) + f(X_{2n}) \big),$$
$$I^2_n = \frac{1}{2n}\big( f(X_1) + f(-X_1) + \dots + f(X_n) + f(-X_n) \big),$$
where $(X_n, n \ge 1)$ is a sequence of independent random variables drawn along the law of $X$.
Identify the limit in law of the random variables $\sqrt{n}(I^1_n - I)$. Same question for the family of random variables $\sqrt{n}(I^2_n - I)$. Compute the variances of the limit laws.

2. How can the variances of the preceding limit laws be estimated using the sample $(X_i, 1 \le i \le 2n)$ for $I^1_n$ and $(X_i, 1 \le i \le n)$ for $I^2_n$? How can the error be evaluated in a Monte Carlo method using $I^1_n$ or $I^2_n$?

3. Show that if $f$ is an increasing function then $\mathrm{Cov}(f(X), f(-X)) \le 0$. In this case, which estimator seems preferable, $I^1_n$ or $I^2_n$? Same question when $f$ is decreasing.
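An empirical comparison of $I^1_n$ and $I^2_n$ for the increasing function $f(x) = e^x$ (an illustrative choice, for which the antithetic covariance is strictly negative):

```python
import math
import random

random.seed(0)
f = math.exp
n, reps = 100, 2000

def I1():
    # crude estimator on 2n independent Gaussian draws
    return sum(f(random.gauss(0, 1)) for _ in range(2 * n)) / (2 * n)

def I2():
    # antithetic estimator: n draws, each used with its opposite
    s = 0.0
    for _ in range(n):
        x = random.gauss(0, 1)
        s += f(x) + f(-x)
    return s / (2 * n)

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

v1 = var([I1() for _ in range(reps)])
v2 = var([I2() for _ in range(reps)])
```

Both estimators use $2n$ function evaluations, so the comparison of `v1` and `v2` is fair.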

Exercise 12. Let $G$ be a standard Gaussian random variable.

1. Set $L_m = \exp\big( -mG - \frac{m^2}{2} \big)$. Show that $\mathbf{E}(L_m f(G+m)) = \mathbf{E}(f(G))$ for every bounded function $f$.
Let $X_m$ be another integrable random variable such that $\mathbf{E}(X_m f(G+m)) = \mathbf{E}(f(G))$ for every bounded function $f$. Show that $\mathbf{E}(X_m \,|\, G) = L_m$.
In a simulation method, which representation of $\mathbf{E}(f(G))$ is it better to use, $\mathbf{E}(X_m f(G+m))$ or $\mathbf{E}(L_m f(G+m))$?

2. Show that the variance of $L_m f(G+m)$ can be written in the form
$$\mathbf{E}\big( e^{-mG + \frac{m^2}{2}}\, f^2(G) \big) - \mathbf{E}(f(G))^2,$$
and that the value of $m$ which minimizes this variance is given by $m = \frac{\mathbf{E}(G f^2(G))}{\mathbf{E}(f^2(G))}$. What is this optimal $m$ when $f(x) = x$? Comment.

3. Let $p_1$ and $p_2$ be two positive numbers with sum 1, and $m_1$ and $m_2$ two real numbers. Set
$$l(g) = p_1 e^{m_1 g - \frac{m_1^2}{2}} + p_2 e^{m_2 g - \frac{m_2^2}{2}}.$$
For $f$ bounded and measurable, set $\mu(f) = \mathbf{E}(l(G) f(G))$. Show that
$$\mu(f) = \int_{\mathbf{R}} f(x)\, p(x)\, dx,$$
$p$ being a density to be made explicit.

4. Propose a simulation technique along the law of density $p$.

5. Assume now that $G$ is a random variable following the preceding law. Show that
$$\mathbf{E}\big( l^{-1}(G)\, f(G) \big) = \mathbf{E}(f(G)),$$
$$\mathrm{Var}\big( l^{-1}(G)\, f(G) \big) = \mathbf{E}\big( l^{-1}(G)\, f^2(G) \big) - \mathbf{E}(f(G))^2.$$

6. We consider the case $p_1 = p_2 = 1/2$, $m_1 = -m_2 = m$ and $f(x) = x$. Show that
$$\mathrm{Var}\big( l^{-1}(G)\, G \big) = \mathbf{E}\left( \frac{e^{m^2/2}\, G^2}{\cosh(mG)} \right).$$
Denote by $v(m)$ this variance as a function of $m$. Check that $v'(0) = 0$ and $v''(0) < 0$. How should $m$ be chosen to reduce the variance in a computation of $\mathbf{E}(G)$?

Exercise 13. Let $X$ be a real random variable and $Y$ a real control variate. We assume that $\mathbf{E}(X^2) < +\infty$, $\mathbf{E}(Y^2) < +\infty$ and $\mathbf{E}(Y) = 0$.

1. For $\lambda \in \mathbf{R}$, compute $\mathrm{Var}(X - \lambda Y)$ and the value $\lambda^*$ which minimizes this variance. Is it advantageous to assume $X$ and $Y$ independent?

2. Assume that $((X_n, Y_n), n \ge 1)$ is a sequence of independent random variables drawn along the law of the pair $(X, Y)$. Define $\lambda^*_n$ by
$$\lambda^*_n = \frac{\sum_{i=1}^n X_i Y_i - \frac{1}{n} \sum_{i=1}^n X_i \sum_{i=1}^n Y_i}{\sum_{i=1}^n Y_i^2 - \frac{1}{n} \big( \sum_{i=1}^n Y_i \big)^2}.$$
Show that $\lambda^*_n$ tends almost surely to $\lambda^*$ as $n$ tends to $+\infty$.

3. Show that $\sqrt{n}\big( \lambda^*_n - \lambda^* \big) \bar Y_n$, where $\bar Y_n = (Y_1 + \dots + Y_n)/n$, tends to 0 almost surely.

4. Using Slutsky's theorem (see the next question), show that
$$\sqrt{n}\left( \frac{1}{n}\big( X_1 - \lambda^*_n Y_1 + \dots + X_n - \lambda^*_n Y_n \big) - \mathbf{E}(X) \right)$$
tends in law to a Gaussian law with variance $\mathrm{Var}(X - \lambda^* Y)$. How should this result be interpreted for a Monte Carlo method using $\lambda Y$ as a control variate?

5. Prove Slutsky's theorem, that is, that if $X_n$ converges in law to $X$ and $Y_n$ converges in law to a constant $a$, then the pair $(X_n, Y_n)$ converges in law to $(X, a)$ (one may consider the characteristic function of the pair $(X_n, Y_n)$).
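The empirical coefficient $\lambda^*_n$ can be illustrated with the hypothetical pair $X = e^G$, $Y = G$ for $G$ standard Gaussian (so $\mathbf{E}(Y) = 0$, $\mathbf{E}(X) = e^{1/2}$, and $X$, $Y$ are correlated):

```python
import math
import random

random.seed(0)
n = 10_000
gs = [random.gauss(0, 1) for _ in range(n)]
xs = [math.exp(g) for g in gs]       # X = e^G
ys = gs                              # Y = G, E(Y) = 0

mx, my = sum(xs) / n, sum(ys) / n
# sample version of lambda* = Cov(X, Y) / Var(Y)
lam = (sum(x * y for x, y in zip(xs, ys)) - n * mx * my) / \
      (sum(y * y for y in ys) - n * my * my)

zs = [x - lam * y for x, y in zip(xs, ys)]   # controlled observations
est = sum(zs) / n                            # still estimates E(X) = e^{1/2}

def var(vs):
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)
```

The empirical variance of the controlled observations is well below that of the raw ones, while the estimator keeps the same mean.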

Problem 14. Biasing a uniform draw by a $\beta$ law.

The one-dimensional case. Let $U$ be a random variable with uniform law on the interval $[0,1]$. We consider the family of $\beta(a,1)$ laws whose density is given, for $a > 0$, by
$$a u^{a-1}\, \mathbf{1}_{\{u \in [0,1]\}}.$$
We denote by $V_a$ a random variable with law $\beta(a,1)$.

1. Propose a simulation method along the law $\beta(a, 1)$. For $g$ a bounded function, how can $\mathbf{E}(g(V_a))$ be estimated by a Monte Carlo method? How can an order of magnitude of the error of this method be obtained?

2. Check that, if $f$ is a function from $\mathbf{R}$ to $\mathbf{R}$ such that $\mathbf{E}(|f(U)|) < +\infty$, then for every $a > 0$:
$$\mathbf{E}(f(U)) = \mathbf{E}\left( \frac{f(V_a)}{a V_a^{a-1}} \right).$$
How can this relation be used to compute $\mathbf{E}(f(U))$ by a Monte Carlo method? Which function of $a$ should then be minimized to obtain an optimal method?

3. Assume that $f$ is bounded. Show that, for every $0 < a < 2$:
$$\sigma^2_a = \mathrm{Var}\left( \frac{f(V_a)}{a V_a^{a-1}} \right) = \mathbf{E}\left( \frac{f^2(U)}{a U^{a-1}} \right) - \mathbf{E}(f(U))^2.$$

4. Using Fatou's lemma, show that, if $\mathbf{P}(f(U) \ne 0) > 0$, then $\lim_{a \to 0^+} \sigma^2_a = +\infty$; then that, if $f$ is continuous at 0 with $f(0) \ne 0$, $\lim_{a \to 2^-} \sigma^2_a = +\infty$.
In the sequel we assume that $f$ is bounded, continuous at 0 and such that $f(0) \ne 0$ (which implies $\mathbf{P}(f(U) \ne 0) > 0$).

5. Show that, for $0 < a < 2$, $\sigma^2_a$ admits first and second order derivatives with respect to $a$, which can be written in the form
$$\frac{d\sigma^2_a}{da} = \mathbf{E}\big( f^2(U)\, g_1(a, U) \big) \quad \text{and} \quad \frac{d^2\sigma^2_a}{da^2} = \mathbf{E}\big( f^2(U)\, g_2(a, U) \big),$$
$g_1$ and $g_2$ being functions from $\mathbf{R}^{+*} \times [0,1]$ to $\mathbf{R}$ to be computed.

6. Deduce that $\sigma^2_a$ is a convex function on the interval $]0, 2[$ which attains its minimum at the unique solution $a$ of the equation $\psi(a) = 0$, where
$$\psi(a) = \mathbf{E}\left( \frac{f^2(U)}{a^2 U^{a-1}}\, \big( 1 - a |\ln(U)| \big) \right).$$

7. Let $(U_n, n \ge 1)$ be a sequence of independent random variables with uniform law on $[0,1]$. Define $\psi_n(a)$, for $a > 0$, by
$$\psi_n(a) = \frac{1}{n} \sum_{i=1}^n \frac{f^2(U_i)}{a^2 U_i^{a-1}}\, \big( 1 - a |\ln(U_i)| \big).$$
Check that, for $a \in\, ]0, 2[$, $\psi_n(a)$ converges almost surely to $\psi(a)$ as $n$ tends to $+\infty$.

8. Set $\tau = \inf\{ k \ge 0,\ |f(U_k)| > 0 \}$. Check that $\mathbf{P}(\tau < +\infty) = 1$. Show that, for $n \ge \tau$, $\psi_n(a)$ is a continuous and increasing function of $a$ satisfying $\psi_n(0^+) = -\infty$ and $\psi_n(+\infty) = +\infty$. Deduce that there exists a unique $a_n > 0$ such that $\psi_n(a_n) = 0$. Propose an algorithm to compute $a_n$, and then (without proof) an approximation method for the optimal $a$.

The case of dimension greater than 1. For $d \ge 1$, let $U = (U^1, \dots, U^d)$ be a random variable with uniform law on $[0,1]^d$. For $a > 0$, consider a family of random vectors $V_a = (V^1_a, \dots, V^d_a)$ made of independent random variables with law $\beta(a, 1)$.

1. Show that, if $f$ is a function from $[0,1]^d$ to $\mathbf{R}$ such that $\mathbf{E}(|f(U)|) < +\infty$, then for every $a > 0$ one can write
$$\mathbf{E}(f(U)) = \mathbf{E}\big( l(a, V_a)\, f(V_a) \big),$$
$l$ being a function from $\mathbf{R}^{+*} \times [0,1]^d$ to $\mathbf{R}$ to be made explicit.

2. For $f$ bounded and $0 < a < 2$, express $\sigma^2_a = \mathrm{Var}(l(a, V_a) f(V_a))$ in terms of the function $l$ and the random vector $U$.

3. Explain how $a$ should be chosen to implement a Monte Carlo method.
Assume that $f$ is bounded, continuous at 0 and such that $f(0) \ne 0$. Check that $\sigma^2_a$ is a convex function such that $\sigma^2_{0^+} = +\infty$ and $\sigma^2_{2^-} = +\infty$. Show that the optimal $a$ is the unique solution of $\frac{d\sigma^2_a}{da} = 0$, and that this condition can be written in the form
$$\frac{d\sigma^2_a}{da} = \mathbf{E}(H(a, U)) = 0.$$
Propose (without proof) an approximation method for this optimal $a$.
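The identity of question 2 of the one-dimensional case can be checked directly: $V_a = U^{1/a}$ has density $a u^{a-1}$ on $[0,1]$, and $\mathbf{E}(f(U)) = \mathbf{E}\big[ f(V_a)/(a V_a^{a-1}) \big]$. The function $f(x) = 1/(1+x)$ (whose exact expectation is $\log 2$) and the value $a = 0.5$ are illustrative choices:

```python
import math
import random

random.seed(0)
f = lambda x: 1.0 / (1.0 + x)       # illustrative bounded f, E f(U) = log 2
a, n = 0.5, 200_000

acc = 0.0
for _ in range(n):
    # 1 - random() lies in (0, 1]; V_a = U^{1/a} simulates the beta(a, 1) law
    v = (1.0 - random.random()) ** (1.0 / a)
    acc += f(v) / (a * v ** (a - 1.0))   # importance-sampling weight
est = acc / n
```

For this $f$ the weight $f(v)/(a v^{a-1})$ is bounded on $[0,1]$, so the estimator has small variance and `est` should be very close to $\log 2$.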

Problem 15. "Predictable" Gaussian biasing.
Let $G = (G_1, G_2, \dots, G_n)$ be a vector of independent standard Gaussian random variables. We consider $n$ functions $\lambda_i(g)$, $i = 1, \dots, n$, such that $\lambda_i(g) = f_i(g_1, \dots, g_{i-1})$, where the $f_i$ are functions from $\mathbf{R}^{i-1}$ to $\mathbf{R}$, assumed smooth and bounded ($f_1$, and therefore $\lambda_1$, is assumed constant).

We denote by $R_\lambda$ the transformation from $\mathbf{R}^n$ to $\mathbf{R}^n$ which maps $g$ to $g' = R_\lambda(g)$, whose coordinates are computed, for $i$ increasing from 1 to $n$, by
$$g'_i = g_i + f_i(g'_1, \dots, g'_{i-1}) = g_i + \lambda_i(g').$$

2. Show that $R_\lambda$ is a bijective transformation from $\mathbf{R}^n$ to $\mathbf{R}^n$. Check that its inverse $R^{-1}_\lambda$ is given by $R^{-1}_\lambda(g')_i = g'_i - \lambda_i(g')$. Show that the determinant of the Jacobian matrix of $R^{-1}_\lambda$ is identically equal to 1.

We define the random variable $M^\lambda_n(G)$ by
$$M^\lambda_n(G) = \exp\left( \sum_{i=1}^n \lambda_i(G) G_i - \frac{1}{2} \sum_{i=1}^n \lambda_i^2(G) \right).$$

3. Let $\mathcal{F}_i = \sigma(G_1, \dots, G_i)$. Show that, for $i \le n$,
$$\mathbf{E}\left( \exp\Big( \lambda_i(G) G_i - \frac{1}{2} \lambda_i^2(G) \Big) \,\Big|\, \mathcal{F}_{i-1} \right) = 1.$$
Deduce that $\mathbf{E}(M^\lambda_n(G)) = 1$.

4. Check that one defines a new probability $\mathbf{P}^{(\lambda)}$ by setting
$$\mathbf{P}^{(\lambda)}(A) = \mathbf{E}\big( M^\lambda_n(G)\, \mathbf{1}_A \big),$$
and that (denoting by $\mathbf{E}^{(\lambda)}$ the expectation under the probability $\mathbf{P}^{(\lambda)}$)
$$\mathbf{E}^{(\lambda)}(X) = \mathbf{E}\big( M^\lambda_n(G)\, X \big)$$
for every $\mathbf{P}^{(\lambda)}$-integrable random variable $X$.

5. Using the change of variables $g = R^{-1}_\lambda(g')$, show that the law of $G$ under $\mathbf{P}^{(\lambda)}$ is identical to that of $R_\lambda(G)$ under $\mathbf{P}$. Deduce a simulation method for the law of $G$ under $\mathbf{P}^{(\lambda)}$.

6. Set $X_\lambda = \phi(G)/M^\lambda_n(G)$. Check that $\mathbf{E}^{(\lambda)}(X_\lambda) = \mathbf{E}(\phi(G))$ and show that the variance of $X_\lambda$ under $\mathbf{P}^{(\lambda)}$ is given by
$$\mathrm{Var}^{(\lambda)}(X_\lambda) = \mathbf{E}\left( e^{-\sum_{i=1}^n \lambda_i(G) G_i + \frac{1}{2} \sum_{i=1}^n \lambda_i^2(G)}\, \phi(G)^2 \right) - \mathbf{E}(\phi(G))^2.$$

7. How can one simulate a random variable whose law is that of $X_\lambda = \phi(G)/M^\lambda_n(G)$ under $\mathbf{P}^{(\lambda)}$? What is the interest of representing $\mathbf{E}(\phi(G))$ in the form $\mathbf{E}^{(\lambda)}(X_\lambda)$?

8. For a given real number $\alpha$, set $u(\alpha) = \mathrm{Var}^{(\alpha\lambda)}(X_{\alpha\lambda})$. Show that $u$ is a differentiable function and that
$$u'(\alpha) = \mathbf{E}\left( \Big( \sum_{i=1}^n \alpha \lambda_i(G)^2 - \sum_{i=1}^n \lambda_i(G) G_i \Big)\, e^{-\sum_{i=1}^n \alpha \lambda_i(G) G_i + \frac{1}{2} \sum_{i=1}^n \alpha^2 \lambda_i(G)^2}\, \phi(G)^2 \right).$$
Note, in particular, that $u'(0) = -\mathbf{E}\big( \big( \sum_{i=1}^n \lambda_i(G) G_i \big)\, \phi(G)^2 \big)$.

9. Show that $u$ is twice differentiable and that $u''$ can be expressed in the form
$$u''(\alpha) = \mathbf{E}\left( \left( \sum_{i=1}^n \lambda_i^2(G) + \Big( \sum_{i=1}^n \alpha \lambda_i(G)^2 - \sum_{i=1}^n \lambda_i(G) G_i \Big)^2 \right) e^{-\sum_{i=1}^n \alpha \lambda_i(G) G_i + \frac{1}{2} \sum_{i=1}^n \alpha^2 \lambda_i(G)^2}\, \phi(G)^2 \right).$$

10. Assume, for the remainder of this problem, that the function $\phi |\lambda|$ is nonzero on a set of positive Lebesgue measure. Show that $u$ is strictly convex and that $\lim_{\alpha \to \pm\infty} u(\alpha) = +\infty$.

11. Show that if $u'(0) \ne 0$, there exists an $\alpha$ such that $u(\alpha) < u(0)$; then that, for this $\alpha$, $\alpha\lambda$ is a choice which reduces the variance of the Monte Carlo method.

12. Show, using the strict convexity of $u$, that if $u'(0) = 0$, then 0 realizes the minimum of $u$. Interpret this result.

13. Assume, for this question, that $\lambda$ is a vector of even functions (that is, satisfying $\lambda(-g) = \lambda(g)$ for every $g$) which is not identically zero, and that the function $\phi^2$ is even. Show that, under these assumptions, $\lambda$ does not allow the variance to be reduced.

Problem 16. Simulation directionnelle et stratificationSoit G = (G1, . . . , Gd) un vecteur constitue de d gaussiennes centrees reduites independanteset H un fonction continue de Rd dans R et on cherche a calculer la probabilite :

p = P (H(G) ≥ λ) ,

pour une valeur λ positive donnee.

1. Decrire la methode de Monte-Carlo habituelle permettant d’estimer p et expliquer com-ment l’on peut evaluer l’erreur de la methode.

2. On suppose que l’on cherche a approche une valeur de p de l’ordre de 10−10. Donner uneevaluation (grossiere) du nombre minimal de tirages n permettant d’evaluer p a 20% pres.Lorsque le calcul de H demande un temps de l’ordre d’une seconde, qu’en concluez vous? (1 an ≈ 3× 107 secondes)

3. On ecrit le vecteur G sous la forme G = RA ou R = |G| et A = G/|G| prend ses valeursdans la sphere unite de Rd, Sd = x ∈ Rd, |x| = 1.Montrer que R2 suit une loi du χ2 a d degres de liberte et que A suit une loi invariante parrotation dont le support est inclu dans Sd (On admettra que cela implique que la loi de Aest la loi uniforme sur la sphere).

4. A l’aide du changement de variable polaire generalise (g1, . . . , gd)→ (r, θ1, . . . , θd−1)g1 = r cos θ1

g2 = r sin θ1 cos θ2

· · ·gd−1 = r sin θ1 sin θ2 . . . sin θd−2 cos θd−1

gd = r sin θ1 sin θ2 . . . sin θd−2 sin θd−1

montrer que R et A sont des variables aleatoires independantes.

5. On suppose, a partir de maintenant, que pour tout a ∈ Sd la fonction r → H(ra) eststrictement croissante, que H(0) = 0 et limr→+∞H(ra) = +∞.

Soit a ∈ Sd, on pose φ(a) = P(H(Ra) ≥ λ). Calculer explicitement φ(a) en fonctionde la fonction de repartition de la loi du χ2 a d degres de liberte et de la solution deH(r∗a) = λ.

6. Soit (Ai, i ≥ 0) une suite de variables aleatoires independantes et de loi uniforme sur lasphere unite. Montrer que limn→+∞

∑ni=1 φ(Ai) = p, au sens de la convergence presque

sure.

7. Montrer que Var (φ(A1)) ≤ Var(1H(G)≥λ

)et comparer la vitesse de convergence de

l’estimateur de la question 1 et celui de la question precedente.

Page 49: Methodes de Monte Carlo´ et Algorithmes Stochastiquescermics.enpc.fr/~bl/monte-carlo/poly.pdf · 2.2.1 Almost sure convergence for the Robbins-Monro algorithm . . . . . . 55 ...

1.6. EXERCISES AND PROBLEMS 45

8. Nous allons maintenant reduire la variance de l’estimateur precedent par une techniquede stratification.

On considere les 2d quadrants definit, pour ε ∈ −1,+1d, par :

Cε = x ∈ Sd, ε1x1 ≥ 0, . . . , εdxd ≥ 0.

Montrer que P(A1 ∈ Cε∩Cε′) = 0, pour ε 6= ε′. En deduire la valeur de ρε = P(A1 ∈ Cε)?

9. On considere l’estimateur In stratifie dans les strates (Cε, ε ∈ −1,+1d) :

In =∑

ε∈−1,+1d

ρεnε

nε∑k=1

φ(A(ε)k ),

ou nε = nwε(n) avec wε(n) ≥ 0 et∑

ε∈−1,+1d wε(n) = 1 et ou les (A(ε)k , k ≥ 0) sont

des suites de variables aleatoires independantes suivant une loi uniforme surCε, ces suitesetant independantes entre elles.

Calculer Var (In) en fonction de n et des (wε(n), ε ∈ −1,+1d), des (ρε, ε ∈−1,+1d) et des (σε, ε ∈ −1,+1d) ou σ2

ε = Var (A(ε)1 ).

10. Montrer l’inegalite : (N∑i=1

αiai

)2

≤N∑i=1

αia2i ,

pour (αi, ai, 1 ≤ i ≤ N) des reels positifs tel que∑N

i=1 αi = 1.

En deduire que Var (In) ≥ 1n

(∑ε∈−1,+1d) ρεσε

)2

.

Comment peut on choisir les (wε(n), ε ∈ −1,+1d) pour realiser la borne inferieureprecedente ?

11. Propose a method to simulate according to the uniform law on $C_\varepsilon$.
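One possible answer to the last question, sketched in code (an illustration, not part of the original problem): by the rotation-invariance argument of question 3, flipping the signs of the coordinates of a Gaussian vector does not change its law, so forcing each coordinate of $G$ into the half-space selected by $\varepsilon$ and normalizing yields the uniform law on $C_\varepsilon$.

```python
import math
import random

def uniform_on_quadrant(eps, rng):
    """Sample a point uniformly on the quadrant C_eps of the unit sphere.

    eps is a tuple of +1/-1 signs.  If G is a standard Gaussian vector,
    G/|G| is uniform on the sphere; forcing the sign of each coordinate
    maps the sample to the chosen quadrant without changing the
    (conditional) law.
    """
    g = [e * abs(rng.gauss(0.0, 1.0)) for e in eps]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]
```

For instance, a point `a = uniform_on_quadrant((+1, -1), rng)` satisfies `a[0] >= 0`, `a[1] <= 0` and $|a| = 1$.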

12. We now fix the $w_\varepsilon$ once and for all (independently of $n$) by
\[ w_\varepsilon = \frac{\rho_\varepsilon \sigma_\varepsilon}{\sum_{\varepsilon' \in \{-1,+1\}^d} \rho_{\varepsilon'} \sigma_{\varepsilon'}}, \]
and set $n_\varepsilon = [n w_\varepsilon]$. Show that, in the sense of almost sure convergence,
\[ \lim_{n\to+\infty} I_n = p. \]

13. Show that the limit in law of the random variables $\sqrt{n}(I_n - p)$ is a centered Gaussian with variance
\[ \left( \sum_{\varepsilon \in \{-1,+1\}^d} \rho_\varepsilon \sigma_\varepsilon \right)^2. \]

How can this result be used to estimate the error made when $p$ is estimated by $I_n$?
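To make questions 1 and 6 concrete, here is a numerical sketch in dimension $d = 2$ with the hypothetical choice $H(ra) = r(1 + a_1/2)$ (any $H$ satisfying the assumptions of question 5 would do); in this case $R^2$ follows a $\chi^2$ law with 2 degrees of freedom, whose survival function is $e^{-t/2}$, so $\phi$ has the closed form $\phi(a) = e^{-(r^*)^2/2}$ with $r^* = \lambda/(1 + a_1/2)$.

```python
import math
import random

# Sketch of questions 1 and 6; H below is an illustrative choice, not from
# the text.  In dimension d = 2, P(R^2 >= t) = exp(-t/2), so phi is explicit.
# We take H(x) = |x| (1 + x_1/(2|x|)), i.e. H(r a) = r (1 + a_1/2), which is
# strictly increasing in r with H(0) = 0.

def phi(a, lam):
    # r* solves H(r* a) = lam, i.e. r* = lam / (1 + a[0]/2).
    r_star = lam / (1.0 + a[0] / 2.0)
    return math.exp(-r_star ** 2 / 2.0)  # P(R >= r*) = P(R^2 >= r*^2)

def estimators(n, lam, rng):
    naive = 0.0        # question 1: average of 1_{H(G) >= lam}
    conditioned = 0.0  # question 6: average of phi(A_i)
    for _ in range(n):
        g = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        r = math.hypot(g[0], g[1])
        a = (g[0] / r, g[1] / r)
        naive += 1.0 if r * (1.0 + a[0] / 2.0) >= lam else 0.0
        conditioned += phi(a, lam)
    return naive / n, conditioned / n

rng = random.Random(42)
p_naive, p_cond = estimators(100_000, 1.5, rng)
```

Both estimators converge to $p$; question 7 says the conditioned one has the smaller variance (a Rao–Blackwell-type argument), and questions 8–13 reduce the variance further by stratifying $A$ over the quadrants.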

Problem 17. The “Latin square” method


The one-dimensional case. Consider a sequence $U = (U_i, i \ge 1)$ of independent random variables, uniformly distributed on $[0,1]$, and a permutation $\sigma$ drawn uniformly from the set of permutations of $\{1,\dots,N\}$, independent of the sequence $U$. For a fixed integer $N$ and $1 \le i \le N$, set
\[ Y^N_i = \frac{i - U_i}{N} \quad \text{and} \quad X^N_i = Y^N_{\sigma(i)}. \]
Given a bounded measurable function $f$, we want to compare the classical estimator
\[ I_0(N) = \frac{1}{N}\left( f(U_1) + \cdots + f(U_N) \right) \]
with
\[ I_1(N) = \frac{1}{N}\left( f(X^N_1) + \cdots + f(X^N_N) \right). \]

1. What is the limit of $I_0(N)$ as $N$ tends to $+\infty$? How can one estimate the error made when this limit is approximated by $I_0(N)$?

2. For a fixed $i$, $1 \le i \le N$, identify the law of $X^N_i$. Deduce that $\mathbb{E}(I_1(N)) = \mathbb{E}(f(U_1))$.

3. Show that $I_1(N) = \frac{1}{N}\left( f(Y^N_1) + \cdots + f(Y^N_N) \right)$, and deduce that
\[ N \mathrm{Var}(I_1(N)) = \int_0^1 f^2(s)\,ds - N \sum_{k=1}^N \left( \int_{(k-1)/N}^{k/N} f(s)\,ds \right)^2. \]

4. Check that
\[ N \mathrm{Var}(I_1(N)) = \frac{N}{2} \sum_{i=1}^N \int_{(i-1)/N}^{i/N} \int_{(i-1)/N}^{i/N} (f(s) - f(s'))^2 \,ds\,ds'. \]
When $f$ is a continuous function, show that $N \mathrm{Var}(I_1(N))$ tends to $0$ as $N$ tends to $+\infty$. What is the value of $N \mathrm{Var}(I_0(N))$?

5. For $f$ continuous, in which sense does $I_1(N)$ converge to $\mathbb{E}(f(U))$?

6. Which estimator, $I_0(N)$ or $I_1(N)$, seems preferable from the point of view of its speed of convergence in quadratic mean?

From a practical point of view, what are the drawbacks of $I_1(N)$ compared with $I_0(N)$?

7. How can the estimator $I_1(N)$ be interpreted in terms of a stratification method?
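A minimal numerical sketch of the one-dimensional comparison (the test function below is an arbitrary choice):

```python
import random

def latin_estimator_1d(f, N, rng):
    """Latin (stratified) estimator I_1(N): one point per interval [(i-1)/N, i/N].

    By question 3, I_1(N) has the same value whether or not the points are
    shuffled by the random permutation sigma, so we average f over the Y_i^N
    directly.
    """
    return sum(f((i - rng.random()) / N) for i in range(1, N + 1)) / N

def crude_estimator_1d(f, N, rng):
    """Classical Monte-Carlo estimator I_0(N)."""
    return sum(f(rng.random()) for _ in range(N)) / N

rng = random.Random(1)
f = lambda x: x * x            # integral over [0,1] is 1/3
i1 = latin_estimator_1d(f, 1000, rng)
i0 = crude_estimator_1d(f, 1000, rng)
```

For a continuous $f$, question 4 shows that $N\mathrm{Var}(I_1(N)) \to 0$ while $N\mathrm{Var}(I_0(N)) = \mathrm{Var}(f(U))$, so $I_1$ is markedly more accurate at the same sample size.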

The two-dimensional case. We now consider the two-dimensional case. $U$ denotes a sequence $U = ((U^1_i, U^2_i), i \ge 1)$ of independent random variables, uniformly distributed on $[0,1]^2$, and $\sigma_1$, $\sigma_2$ are two independent permutations drawn uniformly from the set of permutations of $\{1,\dots,N\}$, independent of the sequence $U$. For a fixed integer $N$ and $1 \le i, j \le N$, set
\[ Y^N_{i,j} = \left( \frac{i - U^1_i}{N}, \frac{j - U^2_j}{N} \right) \quad \text{and} \quad X^N_i = Y^N_{\sigma_1(i), \sigma_2(i)}. \]
As in the first part, for $f$ a bounded function from $[0,1]^2$ to $\mathbb{R}$, set
\[ I_0(N) = \frac{1}{N}\left( f(U_1) + \cdots + f(U_N) \right) \quad \text{and} \quad I_1(N) = \frac{1}{N}\left( f(X^N_1) + \cdots + f(X^N_N) \right). \]


1. Show that
\[ \mathbb{E}\left( f(X^N_i) \right) = \frac{1}{N^2} \sum_{1 \le k, l \le N} \mathbb{E}\left( f(Y^N_{k,l}) \right) = \mathbb{E}\left( f(U_1) \right). \]

2. Show that, for $i \ne j$,
\[ \mathbb{E}\left( f(X^N_i) f(X^N_j) \right) = \frac{1}{N^2 (N-1)^2} \sum_{\substack{1 \le k_1 \ne k_2 \le N \\ 1 \le l_1 \ne l_2 \le N}} \mathbb{E}\left( f(Y^N_{k_1,l_1}) \right) \mathbb{E}\left( f(Y^N_{k_2,l_2}) \right), \]
then that
\[ \mathbb{E}\left( f(X^N_i) f(X^N_j) \right) = \frac{1}{N^2 (N-1)^2} \Bigg[ N^4 \mathbb{E}(f(U_1))^2 - \sum_{\substack{1 \le k \le N \\ 1 \le l_1, l_2 \le N}} \mathbb{E}\left( f(Y^N_{k,l_1}) \right) \mathbb{E}\left( f(Y^N_{k,l_2}) \right) - \sum_{\substack{1 \le k_1, k_2 \le N \\ 1 \le l \le N}} \mathbb{E}\left( f(Y^N_{k_1,l}) \right) \mathbb{E}\left( f(Y^N_{k_2,l}) \right) + \sum_{\substack{1 \le k \le N \\ 1 \le l \le N}} \mathbb{E}\left( f(Y^N_{k,l}) \right)^2 \Bigg]. \]

3. Deduce that, for $f$ continuous,
\[ \lim_{N\to+\infty} (N-1)\,\mathrm{Cov}\left( f(X^N_1), f(X^N_2) \right) = 2\,\mathbb{E}(f(U))^2 - \int_{[0,1]^3} f(x,y) f(x,y')\,dx\,dy\,dy' - \int_{[0,1]^3} f(x,y) f(x',y)\,dx\,dx'\,dy. \]

4. Show that
\[ \mathrm{Var}\left( \mathbb{E}(f(U)|U^1) \right) = \int_{[0,1]^3} f(x,y) f(x,y')\,dx\,dy\,dy' - \mathbb{E}(f(U))^2, \]
\[ \mathrm{Var}\left( \mathbb{E}(f(U)|U^2) \right) = \int_{[0,1]^3} f(x,y) f(x',y)\,dx\,dx'\,dy - \mathbb{E}(f(U))^2. \]

5. Deduce that
\[ \lim_{N\to+\infty} N \mathrm{Var}(I_1(N)) = \mathrm{Var}(f(U)) - \mathrm{Var}\left( \mathbb{E}(f(U)|U^1) \right) - \mathrm{Var}\left( \mathbb{E}(f(U)|U^2) \right). \]

6. What is the advantage of using $I_1(N)$ rather than $I_0(N)$? What are its drawbacks?

7. Check that
\[ \mathrm{Var}\left( f(U) - \mathbb{E}(f(U)|U^1) - \mathbb{E}(f(U)|U^2) \right) = \mathrm{Var}(f(U)) - \mathrm{Var}\left( \mathbb{E}(f(U)|U^1) \right) - \mathrm{Var}\left( \mathbb{E}(f(U)|U^2) \right). \]
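The two-dimensional construction can be sketched as follows (again, the test function is an arbitrary choice):

```python
import random

def latin_square_estimator_2d(f, N, rng):
    """Two-dimensional 'Latin square' estimator I_1(N).

    Each coordinate is stratified into N intervals; two independent random
    permutations pair the row and column strata so that every row and every
    column of the N x N grid contains exactly one sample point.
    """
    sigma1 = list(range(1, N + 1))
    sigma2 = list(range(1, N + 1))
    rng.shuffle(sigma1)
    rng.shuffle(sigma2)
    total = 0.0
    for i in range(N):
        x = (sigma1[i] - rng.random()) / N
        y = (sigma2[i] - rng.random()) / N
        total += f(x, y)
    return total / N

rng = random.Random(7)
f = lambda x, y: x + y          # integral over [0,1]^2 is 1
estimate = latin_square_estimator_2d(f, 500, rng)
```

For the additive choice $f(x,y) = x + y$, $f - \mathbb{E}(f|U^1) - \mathbb{E}(f|U^2)$ is constant, so question 5 predicts $N\mathrm{Var}(I_1(N)) \to 0$: the estimator is nearly exact.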

Problem 18. Removing the bias of a sequence of estimators. We are given a square integrable real-valued random variable $Y$.


1. Assume first that the random variable $Y$ can be simulated exactly. For a positive real $\varepsilon$, show that the number $N(\varepsilon)$ of independent draws of $Y$ needed to obtain (with high probability) an approximation of $\mathbb{E}(Y)$ with an error of order $\varepsilon$ grows like $\mathrm{Cst}/\varepsilon^2$.

We now assume that $Y$ cannot be simulated exactly, but that there exists a sequence of square integrable, easily simulable¹ random variables $(Y_n, n \ge 0)$ such that
\[ \lim_{n\to+\infty} \mathbb{E}\left( (Y_n - Y)^2 \right) = 0. \tag{1.38} \]

2. Show that the bias $b_n = \mathbb{E}(Y) - \mathbb{E}(Y_n)$ tends to $0$ as $n$ tends to $+\infty$. Check that $\lim_{n\to+\infty} \mathbb{E}((Y_n - Y)Y) = 0$. Deduce that $\lim_{n\to+\infty} \mathbb{E}(Y_n^2) = \mathbb{E}(Y^2)$ and conclude that $\lim_{n\to+\infty} \mathrm{Var}(Y_n) = \mathrm{Var}(Y)$.

3. We now assume an a priori estimate on the bias of the form $|b_n| \le C/n^\alpha$, with $\alpha > 0$. We choose an $n > 0$ and a number of draws $N$, $(Y^1_n, \dots, Y^N_n)$, according to the law of $Y_n$, and approximate $\mathbb{E}(Y)$ by $\bar{Y}_{n,N} = \frac{1}{N}\sum_{i=1}^N Y^i_n$. Show that, for $N$ large, with probability close to $1$,
\[ |\bar{Y}_{n,N} - \mathbb{E}(Y)| \le R(n,N) := \frac{C}{n^\alpha} + \frac{2K}{\sqrt{N}}. \]

4. We take $R(n,N)$ as a measure of the error and assume that the cost of $N$ draws according to the law of $Y_n$ is given by $\mathrm{Cost}(n,N) = C_1 n N$. We impose a constraint on the total computational cost $C_{\mathrm{total}}$ and consider the minimal error $R(n,N)$ achievable under the cost constraint $\mathrm{Cost}(n,N) = C_{\mathrm{total}}$, defined by
\[ \mathrm{Error}(C_{\mathrm{total}}) = \min_{n, N,\ \mathrm{Cost}(n,N) = C_{\mathrm{total}}} R(n,N). \]
Show that $\mathrm{Error}(C_{\mathrm{total}}) = \mathrm{Cst} \times C_{\mathrm{total}}^{-\frac{\alpha}{2\alpha+1}}$. Deduce that the minimal cost needed to obtain a precision $\varepsilon$ grows like $\mathrm{Cst} \times \varepsilon^{-\frac{2\alpha+1}{\alpha}}$. Comparing with the result of question 1, explain how the presence of the bias degrades the performance of the simulation algorithm.
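The trade-off of question 4 can be checked numerically (a sketch; the constants $C$, $K$, $C_1$ are arbitrary choices):

```python
# Numerical check of the scaling Error(C_total) ~ C_total^(-alpha/(2*alpha+1)).
# For each budget we scan over n, set N = budget / (C1 * n), and minimize
# R(n, N) = C / n**alpha + 2 * K / sqrt(N).
import math

def best_error(budget, alpha, C=1.0, K=1.0, C1=1.0):
    best = float("inf")
    n = 1
    # The optimal n grows only like budget**(1/(2*alpha+1)), so a modest
    # cap on the scan is more than enough for the budgets used below.
    while C1 * n <= budget and n <= 100_000:
        N = budget / (C1 * n)
        best = min(best, C / n ** alpha + 2.0 * K / math.sqrt(N))
        n += 1
    return best

alpha = 1.0
e1 = best_error(1e4, alpha)
e2 = best_error(1e6, alpha)
# Empirical exponent: log(e1/e2) / log(1e6/1e4) should be close to
# alpha / (2*alpha + 1) = 1/3 for alpha = 1.
exponent = math.log(e1 / e2) / math.log(100.0)
```

The measured exponent matches the predicted $\alpha/(2\alpha+1)$, versus the exponent $1/2$ of the bias-free case of question 1.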

How to get rid of the bias? We will see that one can construct, from the sequence $(Y_n, n \ge 0)$, a simulable and unbiased random variable $\hat{Y}$ (that is, such that $\mathbb{E}(\hat{Y}) = \mathbb{E}(Y)$), at the price, however, of an increase of the variance.

To this end, consider a random variable $\tau$ with values in $\mathbb{N}^*$, independent of the sequence $(Y_n, n \ge 0)$, whose law is given by
\[ \mathbb{P}(\tau = n) = p_n, \quad \text{with } p_n > 0 \text{ for all } n \ge 1. \tag{1.39} \]
Let $\Delta Y_n = Y_n - Y_{n-1}$ for $n \ge 1$ (with the convention $Y_0 = 0$). Define $\hat{Y}$ by
\[ \hat{Y} = \frac{\Delta Y_\tau}{p_\tau} = \sum_{n\ge1} \frac{\Delta Y_n}{p_n} \mathbf{1}_{\tau = n}. \]

¹This is the case, for instance, when one discretizes a stochastic differential equation.


5. Show that, under hypothesis (1.39), $\mathbb{E}(|\hat{Y}|) = \sum_{n\ge1} \mathbb{E}(|\Delta Y_n|)$. Deduce that when $\mathbb{E}(|\hat{Y}|) < +\infty$, we have $\mathbb{E}(\hat{Y} \mid Y_n, n \ge 1) = Y$ and $\mathbb{E}(\hat{Y}) = \mathbb{E}(Y)$.

6. Give an example of a deterministic sequence $(\Delta Y_n, n \ge 1)$ for which $Y = \sum_{n\ge1} \Delta Y_n$ converges but $\mathbb{E}(|\hat{Y}|) = +\infty$ for every choice of $p$ satisfying hypothesis (1.39).

7. Show that
\[ \mathbb{E}(\hat{Y}^2) = \sum_{n\ge1} \frac{\mathbb{E}\left( (\Delta Y_n)^2 \right)}{p_n}, \]
and that when $\mathbb{E}(\hat{Y}^2) < +\infty$, $\mathrm{Var}(\hat{Y}) \ge \mathrm{Var}(Y)$.

8. We want to approximate $\mathbb{E}(Y)$ where $Y = f(X_T)$ and $Y_n = f(X^{(n)}_T)$, with $X$ the solution of an SDE and $X^{(n)}$ its approximation by an Euler scheme with step $h_n = 1/2^n$. Show (assuming the coefficients of the SDE and $f$ as regular as needed) that
\[ \mathbb{E}\left( \left| f(X^{(n)}_T) - f(X_T) \right|^2 \right) \le C \rho^n, \]
$\rho$ being a positive real number smaller than $1$, to be specified. Deduce that $\mathbb{E}((\Delta Y_n)^2) \le 4 C \rho^n$.

Moreover, the cost of one simulation of $Y_n$ is given by $\mathrm{Cst} \times 2^n$. Show that, neglecting the simulation time of $\tau$, the mean cost of one simulation of $\hat{Y}$ is proportional to $\sum_{n\ge1} p_n 2^n$.

9. Assume that, as in the previous example, $\mathbb{E}((\Delta Y_n)^2) \le C \rho^n$ with $\rho < 1$, and that the cost of one simulation of $Y_n$ is given by $c \alpha^n$ with $\alpha > 1$.

Assume finally that the law of $\tau$ is a geometric law² of parameter $q$ with $0 < q < 1$ (write $q' = 1 - q$). Check that, if $q' > \rho$, then $\mathbb{E}(\hat{Y}^2) < +\infty$ and
\[ \mathbb{E}(\hat{Y}^2) \le \frac{C \rho q'}{q (q' - \rho)}. \]
Moreover, show that the mean cost of one simulation of $\hat{Y}$, $C_{\mathrm{moy}} = \mathbb{E}(c \alpha^\tau)$, is finite when $\alpha q' < 1$, and that it is then equal to $c \alpha q / (1 - \alpha q')$.

Assuming³ that the performance of the simulation is measured by the product $C_{\mathrm{moy}} \times \mathbb{E}(\hat{Y}^2)$, show that one must have $\rho \alpha < 1$ to hope for an efficient method and that, in this case, the best choice of $q'$ is $q' = \sqrt{\rho/\alpha}$.

²i.e. for all $n \ge 1$, $p_n = q(1-q)^{n-1}$.
³See, for some elements of justification, Glynn P.W., Whitt W., 1992, The asymptotic efficiency of simulation estimators, Oper. Res. 40(3):505-520.
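The construction above can be sketched on a toy sequence where everything is explicit (the sequence $Y_n = (1-\rho^n)Z$ below is a hypothetical stand-in for, e.g., an Euler-scheme approximation):

```python
import math
import random

def sample_Y_hat(mu, rho, q, rng):
    """One draw of the unbiased estimator Y_hat = Delta_Y_tau / p_tau.

    Toy model (an illustrative assumption, not from the text):
    Y_n = (1 - rho**n) * Z with Z ~ N(mu, 1), so Delta_Y_n =
    (rho**(n-1) - rho**n) * Z, Y = Z and E(Y) = mu.
    tau is geometric: p_n = q * (1 - q)**(n - 1), n >= 1.
    """
    qp = 1.0 - q
    # Inverse-transform sampling of the geometric law.
    tau = 1 + int(math.log(rng.random()) / math.log(qp))
    p_tau = q * qp ** (tau - 1)
    z = mu + rng.gauss(0.0, 1.0)
    delta = (rho ** (tau - 1) - rho ** tau) * z
    return delta / p_tau

rng = random.Random(2024)
mu, rho = 3.0, 0.25
q = 1.0 - 0.5   # q' = 0.5 > rho, so E(Y_hat^2) < +infinity (question 9)
mean = sum(sample_Y_hat(mu, rho, q, rng) for _ in range(200_000)) / 200_000
```

Every $Y_n$ is biased ($\mathbb{E}(Y_n) = (1-\rho^n)\mu$), yet $\mathbb{E}(\hat{Y}) = \mu$ exactly, as question 5 shows; the price is the inflated variance of question 7.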


Chapter 2

Introduction to stochastic algorithms

2.1 A reminder on martingale convergence theorems

Throughout, $F = (\mathcal{F}_n, n \ge 0)$ denotes an increasing sequence of $\sigma$-algebras (a filtration) on a probability space $(\Omega, \mathcal{A}, \mathbb{P})$.

Definition 2.1.1. A sequence of real random variables $(M_n, n \ge 0)$ is an $F$-martingale if and only if, for all $n \ge 0$:

• $M_n$ is $\mathcal{F}_n$-measurable,

• $M_n$ is integrable, $\mathbb{E}(|M_n|) < +\infty$,

• $\mathbb{E}(M_{n+1}|\mathcal{F}_n) = M_n$.

- When, for all $n \ge 0$, $\mathbb{E}(M_{n+1}|\mathcal{F}_n) \le M_n$, the sequence is called a super-martingale.

- When, for all $n \ge 0$, $\mathbb{E}(M_{n+1}|\mathcal{F}_n) \ge M_n$, the sequence is called a sub-martingale.

Definition 2.1.2. An $F$-stopping time is a random variable $\tau$ taking its values in $\mathbb{N} \cup \{+\infty\}$ such that, for all $n \ge 0$, $\{\tau \le n\} \in \mathcal{F}_n$.

Given a stopping time $\tau$ and a process $(M_n, n \ge 0)$, we can define the stopped process $(M_{n\wedge\tau}, n \ge 0)$. It is easy to check that a stopped martingale (resp. sub-martingale, super-martingale) remains an $F$-martingale (resp. sub-martingale, super-martingale).

Exercise 19. Check it, using the fact that
\[ M_{(n+1)\wedge\tau} - M_{n\wedge\tau} = \mathbf{1}_{\tau > n} \left( M_{n+1} - M_n \right). \]

Convergence of super-martingales. Almost sure convergence of super-martingales can be obtained under weak conditions.

Theorem 2.1.3. Let $(M_n, n \ge 0)$ be a positive super-martingale with respect to $F$ (i.e. the conditional expectation is decreasing: $\mathbb{E}(M_{n+1}|\mathcal{F}_n) \le M_n$). Then $M_n$ converges almost surely to a random variable $M_\infty$ as $n$ goes to $+\infty$.


For a proof see [Williams(1991)].

Remark 2.1.1. The previous result remains true if, for all $n$, $M_n \ge -a$ with $a \ge 0$ (as $M_n + a$ is then a positive super-martingale).

To obtain $L^p$-convergence we need stronger assumptions.

Theorem 2.1.4. Assume $(M_n, n \ge 0)$ is a martingale with respect to $F$, bounded in $L^p$ for some $p > 1$ (i.e. $\sup_{n\ge0} \mathbb{E}(|M_n|^p) < +\infty$). Then $M_n$ converges almost surely and in $L^p$ to a random variable $M_\infty$ as $n$ goes to $+\infty$.

Remark 2.1.2. The case $p = 1$ is special: if $(M_n, n \ge 1)$ is bounded in $L^1$, $M_n$ converges to $M_\infty$ almost surely, but the uniform integrability of the sequence must be added to obtain convergence in $L^1$.

For a proof of these theorems see for instance [Williams(1991)], Chapter 11.

2.1.1 Consequences and examples of use

We first recall a deterministic result known as the Kronecker lemma.

Lemma 2.1.3 (Kronecker lemma). Let $(A_n, n \ge 1)$ be an increasing sequence of strictly positive real numbers such that $\lim_{n\to+\infty} A_n = +\infty$, and let $(\varepsilon_k, k \ge 1)$ be a sequence of real numbers such that $S_n = \sum_{k=1}^n \varepsilon_k / A_k$ converges as $n$ goes to $+\infty$. Then
\[ \lim_{n\to+\infty} \frac{1}{A_n} \sum_{k=1}^n \varepsilon_k = 0. \]

Proof. As $S_n$ converges, we can write $S_n = S_\infty + \eta_n$, $\eta_n$ being a sequence converging to $0$ as $n$ goes to $+\infty$. Moreover, using an Abel transform (with $S_0 = 0$),
\[ \sum_{k=1}^n \varepsilon_k = \sum_{k=1}^n A_k \frac{\varepsilon_k}{A_k} = A_n S_n - \sum_{k=1}^{n-1} S_k \left( A_{k+1} - A_k \right). \]
So
\[ \frac{1}{A_n} \sum_{k=1}^n \varepsilon_k = S_\infty + \eta_n - S_\infty \frac{A_n - A_1}{A_n} - \frac{1}{A_n} \sum_{k=1}^{n-1} \eta_k \left( A_{k+1} - A_k \right) = \eta_n + S_\infty \frac{A_1}{A_n} - \frac{1}{A_n} \sum_{k=1}^{n-1} \eta_k \left( A_{k+1} - A_k \right). \tag{2.1} \]
The first two terms converge to $0$ (as $\eta_n$ converges to $0$ and $A_n$ to $+\infty$). The third one is a (generalized) Cesàro mean of a sequence which converges to $0$, so it also converges to $0$.


A proof of the strong law of large numbers. As an application of the martingale convergence theorem, we give a short proof of the strong law of large numbers for a square integrable random variable $X$. Let $(X_n, n \ge 1)$ be a sequence of independent random variables with the law of $X$, and set $X'_n = X_n - \mathbb{E}(X)$.

Let $\mathcal{F}_n = \sigma(X_k, k \le n)$ and
\[ M_n = \sum_{k=1}^n \frac{X'_k}{k}. \]
Note that, using independence and $\mathbb{E}(X'_k) = 0$, $M_n$ is an $F$-martingale. Moreover, using independence once again, we get
\[ \mathbb{E}(M_n^2) = \mathrm{Var}(X) \sum_{k=1}^n \frac{1}{k^2} \le K < +\infty. \]
So the martingale $M$ is bounded in $L^2$ and, by Theorem 2.1.4, converges almost surely to $M_\infty$. The Kronecker lemma then implies that
\[ \lim_{n\to+\infty} \frac{1}{n} \sum_{k=1}^n X'_k = 0, \]
that is, $\lim_{n\to+\infty} \frac{1}{n} \sum_{k=1}^n X_k = \mathbb{E}(X)$.

We can relax the $L^2$ hypothesis to obtain the full strong law of large numbers under the traditional $L^1$ condition. See the following exercise (and [Williams(1991)] for a solution if needed).

Exercise 20. Suppose that $(X_n, n \ge 1)$ are independent variables with the law of $X$, with $\mathbb{E}(|X|) < +\infty$. Define $Y_n$ by
\[ Y_n = X_n \mathbf{1}_{|X_n| \le n}. \]

1. Prove that $\lim_{n\to+\infty} \mathbb{E}(Y_n) = \mathbb{E}(X)$.

2. Prove that $\sum_{n=1}^{+\infty} \mathbb{P}(|X| > n) \le \mathbb{E}(|X|)$, and deduce that
\[ \mathbb{P}\left( \text{there exists } n_0(\omega) \text{ such that, for all } n \ge n_0,\ X_n = Y_n \right) = 1. \]

3. Prove that
\[ \sum_{n\ge1} \frac{\mathrm{Var}(Y_n)}{n^2} \le \mathbb{E}\left( |X|^2 f(|X|) \right), \]
where
\[ f(z) = \sum_{n \ge \max(1,z)} \frac{1}{n^2} \le \frac{2}{\max(1,z)}. \]
Deduce that $\sum_{n\ge1} \mathrm{Var}(Y_n)/n^2 \le 2\,\mathbb{E}(|X|) < +\infty$.


4. Let $W_n = Y_n - \mathbb{E}(Y_n)$. Prove that $\sum_{k=1}^n \frac{W_k}{k}$ converges as $n$ goes to $+\infty$, and deduce, using the Kronecker lemma, that
\[ \lim_{n\to+\infty} \frac{1}{n} \sum_{k\le n} W_k = 0, \]
then deduce that $\lim_{n\to+\infty} \frac{1}{n} \sum_{k\le n} Y_k = \mathbb{E}(X)$.

5. Using the result of question 2, prove that $\lim_{n\to+\infty} \frac{1}{n} \sum_{k\le n} X_k = \mathbb{E}(X)$.

An extension of the super-martingale convergence theorem. For the proof of the convergence of stochastic algorithms we will need an extension of the super-martingale convergence theorem 2.1.3 known as the Robbins-Siegmund lemma.

Lemma 2.1.4 (Robbins-Siegmund lemma). Assume $V_n, a_n, b_n, c_n$ are sequences of positive random variables adapted to $(\mathcal{F}_n, n \ge 0)$ such that, for all $n \ge 0$,
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) \le (1 + a_n) V_n + b_n - c_n. \]
Then, on the set $\left\{ \sum_{n\ge1} a_n < +\infty,\ \sum_{n\ge1} b_n < +\infty \right\}$, $V_n$ converges to a random variable $V_\infty$ and $\sum_{n\ge1} c_n < +\infty$.

Proof. Let
\[ \alpha_n = \frac{1}{\prod_{k=1}^n (1 + a_k)}, \]
and define $V'_n = \alpha_{n-1} V_n$, $b'_n = \alpha_n b_n$, $c'_n = \alpha_n c_n$. Clearly the hypothesis can be rewritten as
\[ \mathbb{E}\left( V'_{n+1} \mid \mathcal{F}_n \right) \le V'_n + b'_n - c'_n. \]
This means that $X_n = V'_n - \sum_{k=0}^{n-1} (b'_k - c'_k)$ is a super-martingale.

Now consider the stopping time $\tau_a$:
\[ \tau_a = \inf\left\{ n \ge 0,\ \sum_{k=0}^{n-1} (b'_k - c'_k) \ge a \right\} \]
(the infimum is $+\infty$ if the set is empty). $\tau_a$ is a stopping time such that, if $n \le \tau_a$, then $X_n \ge -a$. So $X_{n\wedge\tau_a}$ is a super-martingale bounded from below, so it converges. We can thus conclude that $\lim_{n\to+\infty} X_n$ exists on the set $\cup_{a>0} \{\tau_a = +\infty\}$.

As $a_n$ is positive, $\alpha_n$ is a positive, decreasing sequence, so it converges to a limit $\alpha$ as $n$ goes to $+\infty$. Moreover,
\[ \alpha_n = \frac{1}{\prod_{k=1}^n (1 + a_k)} = e^{-\sum_{k=1}^n \log(1+a_k)} \ge e^{-\sum_{k=1}^n a_k}. \]
So on the set $\left\{ \sum_{n\ge1} a_n < +\infty,\ \sum_{n\ge1} b_n < +\infty \right\}$, $\alpha > 0$. It follows that $\sum_{n\ge1} b'_n < +\infty$ and, as $c'_n \ge 0$, that, for all $n \ge 0$,
\[ \sum_{k=0}^n b'_k - \sum_{k=0}^n c'_k \le \sum_{k=0}^n b'_k < +\infty. \]


So for $a > \sum_{k\ge0} b'_k$, $\tau_a = +\infty$, and we can conclude that, on the set $\left\{ \sum_{n\ge1} a_n < +\infty,\ \sum_{n\ge1} b_n < +\infty \right\}$, $X_n$ converges to $X_\infty$ as $n$ goes to $+\infty$. Moreover,
\[ 0 \le \sum_{k=0}^{n-1} c'_k = X_n - V'_n + \sum_{k=0}^{n-1} b'_k, \]
so, as $V'_n \ge 0$ and as $X_n$ and $\sum_{k=0}^{n-1} b'_k$ converge as $n$ goes to $+\infty$, we get that $\sum_{k\ge0} c'_k$ is finite; then, as $\alpha_n$ converges to $\alpha > 0$, $\sum_{k\ge1} c_k$ is finite. Finally, $V'_n = X_n + \sum_{k=0}^{n-1} (b'_k - c'_k)$ converges, so $V_n = V'_n / \alpha_{n-1}$ converges to a random variable $V_\infty$, since $\alpha_{n-1}$ converges to $\alpha > 0$.

2.2 Almost sure convergence for some stochastic algorithms

2.2.1 Almost sure convergence for the Robbins-Monro algorithm

Theorem 2.2.1. Let $f$ be a function from $\mathbb{R}^d$ to $\mathbb{R}^d$. Assume that:

H1 (Hypothesis on the function $f$). $f$ is continuous and there exists $x^* \in \mathbb{R}^d$ such that $f(x^*) = 0$ and $\langle f(x), x - x^* \rangle > 0$ for $x \ne x^*$.

H2 (Hypothesis on the step size $\gamma$). $(\gamma_n, n \ge 1)$ is a decreasing sequence of positive real numbers such that $\sum_{n\ge1} \gamma_n = +\infty$ and $\sum_{n\ge1} \gamma_n^2 < +\infty$.

H3 (Hypothesis on the sequence $Y$). $(\mathcal{F}_n, n \ge 0)$ is a filtration on a probability space and $(Y_n, n \ge 1)$ is a sequence of random variables on this probability space such that

H3.1 $\mathbb{E}(Y_{n+1}|\mathcal{F}_n) = f(X_n)$,

H3.2 $\mathbb{E}\left( |Y_{n+1} - f(X_n)|^2 \mid \mathcal{F}_n \right) \le \sigma^2(X_n)$, where
\[ s^2(x) = \sigma^2(x) + |f(x)|^2 \le K(1 + |x|^2). \]

Define the sequence $(X_n, n \ge 0)$ by $X_0 = x_0$, where $x_0$ is a point of $\mathbb{R}^d$, and, for $n \ge 0$,
\[ X_{n+1} = X_n - \gamma_n Y_{n+1}. \]
Then $\lim_{n\to+\infty} X_n = x^*$, a.s.

Remark 2.2.1. • The main application of this algorithm arises when $f(x)$ can be written as
\[ f(x) = \mathbb{E}\left( F(x, U) \right), \]
where $U$ follows a known law and $F$ is a function. In this case $Y_{n+1}$ is defined as $Y_{n+1} = F(X_n, U_{n+1})$, where $(U_n, n \ge 1)$ is a sequence of independent random variables with the law of $U$. Clearly, if $\mathcal{F}_n = \sigma(U_1, \dots, U_n)$, we have
\[ \mathbb{E}\left( F(X_n, U_{n+1}) \mid \mathcal{F}_n \right) = f(X_n). \]
Moreover,
\[ \mathbb{E}\left( |F(X_n, U_{n+1}) - f(X_n)|^2 \mid \mathcal{F}_n \right) = \sigma^2(X_n), \]
where $\sigma^2(x) = \mathbb{E}\left( |F(x,U) - f(x)|^2 \right)$. So H3.2 is a hypothesis on the behavior of the expectation and of the variance of $F(x,U)$ as $|x|$ goes to $+\infty$.

• Hypothesis H1 is fulfilled when $f(x) = \nabla V(x)$, where $V$ is a strictly convex function and there exists an $x^*$ which minimizes $V$. This will be the case in most of our examples.
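A minimal sketch of this setting (the choice of $F$ is illustrative, not from the text): take $F(x,u) = x - u$ with $U$ uniform on $[0,1]$, so that $f(x) = \mathbb{E}(F(x,U)) = x - 1/2$ and $x^* = 1/2$.

```python
import random

def robbins_monro(F, U_sampler, x0, n_iter, rng):
    """Robbins-Monro iteration X_{n+1} = X_n - gamma_n F(X_n, U_{n+1}).

    Uses the step size gamma_n = 1/(n+1), which satisfies H2:
    sum gamma_n = +infinity and sum gamma_n**2 < +infinity.
    """
    x = x0
    for n in range(n_iter):
        gamma = 1.0 / (n + 1)
        x = x - gamma * F(x, U_sampler(rng))
    return x

rng = random.Random(0)
# F(x, u) = x - u gives f(x) = x - 1/2, whose root is x* = 1/2; with
# gamma_n = 1/(n+1) the iterates are exactly the running mean of the U_i.
x_star = robbins_monro(lambda x, u: x - u, lambda r: r.random(), 0.0, 100_000, rng)
```

With this particular $F$ the algorithm reduces to the empirical mean, which makes the $C/\sqrt{n}$ rate discussed in Section 2.3 plausible.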

Proof. First note that hypothesis H3 implies that
\[ \mathbb{E}\left( |Y_{n+1}|^2 \mid \mathcal{F}_n \right) = \mathbb{E}\left( |Y_{n+1} - f(X_n)|^2 \mid \mathcal{F}_n \right) + |f(X_n)|^2 \le s^2(X_n) \le K(1 + |X_n|^2) \le K'(1 + |X_n - x^*|^2). \tag{2.2} \]
Let $V(x) = |x - x^*|^2$ and set $V_n = V(X_n)$. Clearly
\[ V_{n+1} = V_n + \gamma_n^2 |Y_{n+1}|^2 - 2\gamma_n \langle X_n - x^*, Y_{n+1} \rangle. \]
Taking the conditional expectation we obtain
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) = V_n + \gamma_n^2 \mathbb{E}\left( |Y_{n+1}|^2 \mid \mathcal{F}_n \right) - 2\gamma_n \langle X_n - x^*, \mathbb{E}(Y_{n+1}|\mathcal{F}_n) \rangle, \]
and, using hypothesis H3, we get
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) \le V_n + \gamma_n^2 s^2(X_n) - 2\gamma_n \langle X_n - x^*, f(X_n) \rangle. \]
So, using inequality (2.2), we obtain
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) \le V_n + K\gamma_n^2 (1 + V_n) - 2\gamma_n \langle X_n - x^*, f(X_n) \rangle = V_n \left( 1 + K\gamma_n^2 \right) + K\gamma_n^2 - 2\gamma_n \langle X_n - x^*, f(X_n) \rangle. \tag{2.3} \]
So, using the Robbins-Siegmund lemma with $a_n = b_n = K\gamma_n^2$ and $c_n = 2\gamma_n \langle X_n - x^*, f(X_n) \rangle$ (which is positive because of hypothesis H1), and noting that $\sum_{n\ge1} \gamma_n^2 < +\infty$ by hypothesis H2, we get both:

• $V_n$ converges to $V_\infty$, almost surely,

• $\sum_{n\ge1} \gamma_n \langle X_n - x^*, f(X_n) \rangle < +\infty$.

Obviously $V_\infty$ is a positive random variable, and we only need to check that this random variable is equal to $0$.

Assume that $\mathbb{P}(V_\infty > 0) > 0$. On the set $\{V_\infty > 0\}$ we have $0 < V_\infty/2 \le V_n \le 3V_\infty/2$ for $n \ge n_0(\omega)$, so
\[ \sum_{n\ge1} \gamma_n \langle X_n - x^*, f(X_n) \rangle \ge \inf_{V_\infty/2 \le |x - x^*|^2 \le 3V_\infty/2} \langle x - x^*, f(x) \rangle \sum_{n \ge n_0} \gamma_n. \]
But $\sum_{n\ge n_0} \gamma_n = +\infty$ and $\inf_{V_\infty/2 \le |x-x^*|^2 \le 3V_\infty/2} \langle x - x^*, f(x) \rangle > 0$ (recall that $f$, and therefore $\langle x - x^*, f(x) \rangle$, is continuous and that $\{V_\infty/2 \le |x - x^*|^2 \le 3V_\infty/2\}$ is a compact set). So, on the set $\{V_\infty > 0\}$, we should have $\sum_{n\ge1} \gamma_n \langle X_n - x^*, f(X_n) \rangle = +\infty$, whereas we know that this sum is almost surely finite. We have thus proved that $\mathbb{P}(V_\infty > 0) = 0$, which proves the almost sure convergence of the algorithm.


2.2.2 Almost sure convergence for the Kiefer-Wolfowitz algorithm

The Kiefer-Wolfowitz algorithm is a variant of the Robbins-Monro algorithm. Its convergencecan be proved using the Robbins-Siegmund lemma.

Theorem 2.2.2. Let $\phi$ be a function from $\mathbb{R}$ to $\mathbb{R}$ such that
\[ \phi(x) = \mathbb{E}\left( F(x, U) \right), \]
where $U$ is a random variable taking its values in $\mathbb{R}^p$ and $F$ is a function from $\mathbb{R} \times \mathbb{R}^p$ to $\mathbb{R}$.

We assume that:

• $\phi$ is a $C^2$ strictly convex function such that
\[ |\phi''(x)| \le K(1 + |x|), \]
and there exists an $x^*$ which minimizes $\phi$ on $\mathbb{R}$.

• $(\gamma_n, n \ge 1)$ and $(c_n, n \ge 1)$ are two positive, decreasing sequences of real numbers such that
\[ \sum_{n\ge1} \gamma_n = +\infty, \quad \sum_{n\ge1} \gamma_n c_n < +\infty, \quad \sum_{n\ge1} \frac{\gamma_n^2}{c_n^2} < +\infty. \]

• $s^2(x) = \mathbb{E}\left( F^2(x, U) \right) \le K(1 + |x|^2)$.

• $(U^1_n, n \ge 1)$ and $(U^2_n, n \ge 1)$ are two independent sequences of independent random variables with the law of $U$.

We define $(X_n, n \ge 0)$ by $X_0 = x_0 \in \mathbb{R}$ and, inductively,
\[ X_{n+1} = X_n - \gamma_n \frac{F(X_n + c_n, U^1_{n+1}) - F(X_n - c_n, U^2_{n+1})}{2 c_n}. \]
Then
\[ \lim_{n\to+\infty} X_n = x^*, \quad \text{a.s.} \]
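Before the proof, a numerical sketch of the algorithm (the function $F$ below is an illustrative choice, not from the text):

```python
import random

def kiefer_wolfowitz(F, U_sampler, x0, n_iter, rng):
    """Kiefer-Wolfowitz iteration with gamma_n = 1/n and c_n = n**(-1/4).

    These steps satisfy the assumptions of the theorem:
    sum gamma_n = +inf, sum gamma_n c_n = sum n**(-5/4) < +inf,
    sum gamma_n**2 / c_n**2 = sum n**(-3/2) < +inf.
    """
    x = x0
    for n in range(1, n_iter + 1):
        gamma, c = 1.0 / n, n ** -0.25
        # Finite-difference gradient estimate with two independent noises.
        up = F(x + c, U_sampler(rng))
        down = F(x - c, U_sampler(rng))
        x = x - gamma * (up - down) / (2.0 * c)
    return x

rng = random.Random(3)
# Hypothetical example: F(x, u) = (x - u)^2 with U uniform on [0,1], so
# phi(x) = (x - 1/2)^2 + 1/12 is minimized at x* = 1/2.
x_star = kiefer_wolfowitz(lambda x, u: (x - u) ** 2, lambda r: r.random(), 0.0, 200_000, rng)
```

The design choice here is the one the theorem highlights: only noisy evaluations of $F$ are available, so the gradient of $\phi$ is approximated by a randomized finite difference with a slowly vanishing width $c_n$.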

Proof. First note that, because of the assumptions on $\phi$, for $|c| \le 1$,
\[ |\phi(x + c) - \phi(x - c) - 2c\phi'(x)| \le c^2 K \left( 1 + |x - x^*| \right). \tag{2.4} \]
Define $V_n = |X_n - x^*|^2$; clearly
\[ V_{n+1} = V_n + |X_{n+1} - X_n|^2 + 2 (X_n - x^*)(X_{n+1} - X_n). \]
Moreover,
\[ |X_{n+1} - X_n|^2 \le \frac{2\gamma_n^2}{4c_n^2} \left( \left| F(X_n + c_n, U^1_{n+1}) \right|^2 + \left| F(X_n - c_n, U^2_{n+1}) \right|^2 \right), \]
so
\[ \mathbb{E}\left( |X_{n+1} - X_n|^2 \mid \mathcal{F}_n \right) \le \frac{\gamma_n^2}{2c_n^2} \left( s^2(X_n + c_n) + s^2(X_n - c_n) \right), \]
and
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) \le V_n + A_1 + A_2 + A_3, \tag{2.5} \]
where
\[ A_1 = \frac{\gamma_n^2}{2c_n^2} \left( s^2(X_n + c_n) + s^2(X_n - c_n) \right), \]
\[ A_2 = -\frac{\gamma_n}{c_n} (X_n - x^*) \left[ \phi(X_n + c_n) - \phi(X_n - c_n) - 2c_n\phi'(X_n) \right], \]
\[ A_3 = -2\gamma_n (X_n - x^*)\phi'(X_n). \]
Now, assuming that $n$ is large enough to have $c_n \le 1$,
\[ A_1 \le \frac{\gamma_n^2}{c_n^2} K \left( 1 + |X_n + c_n|^2 + |X_n - c_n|^2 \right) \le \frac{\gamma_n^2}{c_n^2} K' \left( 1 + |X_n|^2 \right) \le \frac{\gamma_n^2}{c_n^2} K'' \left( 1 + |X_n - x^*|^2 \right) = \frac{\gamma_n^2}{c_n^2} K'' (1 + V_n). \tag{2.6} \]
Note that $K < K' < K''$ but that, as usual in this kind of proof, $K''$ is still denoted by $K$¹. Moreover, using (2.4) (and also $|x| \le (1 + x^2)/2$), we obtain
\[ |A_2| \le |X_n - x^*| \frac{\gamma_n}{c_n} K c_n^2 \left( 1 + |X_n - x^*| \right) \le K \gamma_n c_n \left( 1 + |X_n - x^*|^2 \right) = K \gamma_n c_n (1 + V_n). \]
Finally we obtain
\[ \mathbb{E}(V_{n+1}|\mathcal{F}_n) \le V_n \left( 1 + K\frac{\gamma_n^2}{c_n^2} + K\gamma_n c_n \right) + K\frac{\gamma_n^2}{c_n^2} + K\gamma_n c_n - 2\gamma_n (X_n - x^*)\phi'(X_n). \]
We can now use the Robbins-Siegmund lemma, setting

• $a_n = b_n = K\frac{\gamma_n^2}{c_n^2} + K\gamma_n c_n$,

• (for the $c_n$ of the lemma, with a slight clash of notation with the step $c_n$) $2\gamma_n (X_n - x^*)\phi'(X_n)$, which is positive because of the convexity of $\phi$.

So, using the fact that $\sum_{n\ge1} a_n = \sum_{n\ge1} b_n < +\infty$, we obtain that $V_n$ converges to $V_\infty$ and that $\sum_{n\ge1} \gamma_n (X_n - x^*)\phi'(X_n) < +\infty$. Using the same argument as in the proof of the convergence of the Robbins-Monro algorithm, we can conclude that $\mathbb{P}(V_\infty = 0) = 1$. This ends the proof of the convergence of the algorithm.

2.3 Speed of convergence of stochastic algorithms

2.3.1 Introduction in a simplified context

Robbins-Monro type algorithms are well known to raise delicate questions of speed of convergence. We will see that these algorithms can satisfy a central limit theorem (convergence at rate $C/\sqrt{n}$), but not for an arbitrary choice of $\gamma_n$: in some sense, $\gamma_n$ has to be large enough to achieve the optimal rate of convergence.

¹$K$ denotes a “constant” which can change from line to line (and even within the same line)!


It is easy to show this in a simplified framework. We assume that
\[ F(x, u) = cx + u, \]
where $c > 0$ and $U$ follows a Gaussian law with mean $0$ and variance $1$. The standard Robbins-Monro algorithm can then be written as
\[ X_{n+1} = X_n - \gamma_n \left( cX_n + U_{n+1} \right), \]
with $\gamma_n = \alpha/(n+1)$. In this case $f(x) = cx$ and, using Theorem 2.2.1, we can prove that $X_n$ converges almost surely to $0$ as $n$ goes to $+\infty$.

To obtain more explicit computations, we replace the discrete dynamics by a continuous one:
\[ dX_t = -\gamma_t \left( cX_t\,dt + \sigma\,dW_t \right), \quad X_0 = x, \]
where $\gamma_t = \frac{\alpha}{t+1}$. Using the standard way of solving this equation, we compute
\[ d\left( e^{c\int_0^t \gamma_s ds} X_t \right) = e^{c\int_0^t \gamma_s ds} \left[ c\gamma_t X_t\,dt - c\gamma_t X_t\,dt - \gamma_t\sigma\,dW_t \right] = -e^{c\int_0^t \gamma_s ds} \gamma_t \sigma\,dW_t. \]
But
\[ e^{c\int_0^t \gamma_s ds} = e^{c\alpha \int_0^t \frac{ds}{s+1}} = (t+1)^{c\alpha}. \]
So, solving the previous equation, we get
\[ X_t = \frac{X_0}{(t+1)^{c\alpha}} - \frac{\sigma\alpha}{(t+1)^{c\alpha}} \int_0^t \frac{1}{(s+1)^{1-c\alpha}}\,dW_s. \]
An easy computation leads to
\[ \mathbb{E}(X_t^2) = \frac{|X_0|^2}{(t+1)^{2c\alpha}} + \frac{\sigma^2\alpha^2}{2c\alpha - 1} \left[ \frac{1}{t+1} - \frac{1}{(t+1)^{2c\alpha}} \right]. \]
We can now guess the asymptotic behavior of $X_t$:

• if $2c\alpha > 1$, $\mathbb{E}(X_t^2)$ behaves as $C/(t+1)$, so we can hope for a central limit behavior for the algorithm;

• if $2c\alpha < 1$, $\mathbb{E}(X_t^2)$ behaves as $C/(t+1)^{2c\alpha}$, which is worse than the expected central limit behavior.

We can check on this example (exercise) that, when $2c\alpha > 1$, $\sqrt{t}\,X_t$ converges in distribution to a Gaussian random variable, but that, when $2c\alpha < 1$, $t^{c\alpha} X_t$ converges almost surely to a random variable.

We will see in what follows that this can be fully justified for the discrete algorithm: when $\alpha$ is large enough, a central limit theorem holds, and when $\alpha$ is too small, an asymptotic behavior worse than a central limit theorem occurs.

The proof relies on the central limit theorem for sequences of martingales (see below).


Practical considerations. When using a Robbins-Monro style algorithm, in order to have a good behavior, $\gamma_n$ has to be chosen large enough to have a central limit theorem, but not too large, in order to minimize the variance of the algorithm. One way to do this is to choose $\gamma_n = \alpha/n$ with $\alpha$ large enough; another way is to choose $\gamma_n = \alpha/n^\beta$ with $1/2 < \beta < 1$, but this last choice increases the variance of the algorithm.
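For the linear model above, $m_n = \mathbb{E}(X_n^2)$ satisfies the exact deterministic recursion $m_{n+1} = (1 - c\gamma_n)^2 m_n + \gamma_n^2\sigma^2$, so the effect of the choice of $\alpha$ can be checked without any simulation (a sketch):

```python
def second_moment(alpha, c=1.0, sigma=1.0, x0=1.0, n_iter=100_000):
    """Exact recursion for m_n = E(X_n^2) when
    X_{n+1} = X_n - gamma_n (c X_n + sigma U_{n+1}),
    with gamma_n = alpha/(n+1) and U_n independent standard Gaussians."""
    m = x0 * x0
    for n in range(n_iter):
        gamma = alpha / (n + 1)
        m = (1.0 - c * gamma) ** 2 * m + gamma ** 2 * sigma ** 2
    return m

n = 100_000
fast = n * second_moment(alpha=1.0)   # 2*c*alpha = 2 > 1: E(X_n^2) ~ C/n
slow = n * second_moment(alpha=0.1)   # 2*c*alpha = 0.2 < 1: E(X_n^2) ~ C/n^0.2
```

`fast` stabilizes near the constant $\sigma^2\alpha^2/(2c\alpha-1) = 1$ predicted by the continuous-time computation, while `slow` grows with $n$, revealing the degraded rate when $\alpha$ is too small.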

2.3.2 L2 and locally L2 martingales

Definition 2.3.1. Let $(\mathcal{F}_n, n \ge 0)$ be a filtration on a probability space. An $F$-martingale $(M_n, n \ge 0)$ is called an $F$-square integrable martingale if, for all $n \ge 0$, $\mathbb{E}(M_n^2) < +\infty$.

In this case we are able to define a very useful object, the bracket of the martingale. We will see that the bracket of a martingale gives a good indication of its asymptotic behavior. This object will also be useful to state the central limit theorem for martingales.

Definition 2.3.2. Assume $(M_n, n \ge 0)$ is a square integrable martingale. There exists a unique, previsible, increasing process $(\langle M\rangle_n, n \ge 0)$, equal to $0$ at time $0$, such that
\[ M_n^2 - \langle M\rangle_n \]
is a martingale. Moreover, $\langle M\rangle_n$ can be defined by $\langle M\rangle_0 = 0$ and
\[ \langle M\rangle_{n+1} - \langle M\rangle_n = \mathbb{E}\left( (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \right) = \mathbb{E}\left( M_{n+1}^2 \mid \mathcal{F}_n \right) - M_n^2. \]

Proof. Previsible means here that $\langle M\rangle_n$ is $\mathcal{F}_{n-1}$-measurable. It is then easy to check that if $\langle M\rangle_n$ is previsible and $M_n^2 - \langle M\rangle_n$ is a martingale, then
\[ \langle M\rangle_{n+1} - \langle M\rangle_n = \mathbb{E}\left( M_{n+1}^2 \mid \mathcal{F}_n \right) - M_n^2, \]
which, together with $\langle M\rangle_0 = 0$, proves the uniqueness of $\langle M\rangle$. Moreover, using the martingale property of $M$, one can check that
\[ \mathbb{E}\left( M_{n+1}^2 \mid \mathcal{F}_n \right) - M_n^2 = \mathbb{E}\left( (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \right) \ge 0, \]
which proves that $\langle M\rangle_n$ is increasing.

The following theorem relates the almost sure asymptotic behavior of a square integrable martingale $M_n$ to its bracket.

Theorem 2.3.3 (Strong law of large numbers for martingales). Let $(M_n, n \ge 0)$ be a square integrable martingale and denote by $(\langle M\rangle_n, n \ge 0)$ its bracket. Then:

• on $\{\langle M\rangle_\infty := \lim_{n\to+\infty} \langle M\rangle_n < +\infty\}$, $M_n$ converges almost surely to a random variable denoted by $M_\infty$;

• on $\{\langle M\rangle_\infty = +\infty\}$,
\[ \lim_{n\to+\infty} \frac{M_n}{\langle M\rangle_n} = 0, \quad \text{a.s.} \]
Moreover, as soon as $a(t)$ is a positive, increasing function such that $\int_0^{+\infty} \frac{dt}{1 + a(t)} < +\infty$,
\[ \lim_{n\to+\infty} \frac{M_n}{\sqrt{a(\langle M\rangle_n)}} = 0, \quad \text{a.s.} \]


Proof. Define $\tau_p$ as
\[ \tau_p = \inf\left\{ n \ge 0,\ \langle M\rangle_{n+1} \ge p \right\}. \]
$\tau_p$ is a stopping time, as $\langle M\rangle$ is previsible. Note that, by definition, $\langle M\rangle_{\tau_p\wedge n} \le \langle M\rangle_{\tau_p} \le p$. So $M^{(p)}_n = M_{n\wedge\tau_p}$ is also a martingale and $\langle M^{(p)}\rangle_n = \langle M\rangle_{n\wedge\tau_p}$ (since $M^2_{n\wedge\tau_p} - \langle M\rangle_{n\wedge\tau_p}$ is a martingale).

Now remark that, for all $n \ge 0$,
\[ \mathbb{E}\left( (M^{(p)}_n)^2 \right) = \mathbb{E}(M_0^2) + \mathbb{E}\left( \langle M^{(p)}\rangle_n \right) \le \mathbb{E}(M_0^2) + p. \]
So $(M_{n\wedge\tau_p}, n \ge 0)$ is a martingale bounded in $L^2$, and it converges as $n$ goes to $+\infty$. So on the set $\{\tau_p = +\infty\}$, $M_n$ itself converges to a random variable $M_\infty$. As this is true for every $p$, we have proved that $M_n$ converges to $M_\infty$ on the set $\cup_{p\ge0} \{\tau_p = +\infty\}$. But $\{\langle M\rangle_\infty < p\} \subset \{\tau_p = +\infty\}$, so
\[ \{\langle M\rangle_\infty < +\infty\} = \cup_{p\ge0} \{\langle M\rangle_\infty < p\} \subset \cup_{p\ge0} \{\tau_p = +\infty\}. \]
So, on the set $\{\langle M\rangle_\infty < +\infty\}$, $M_n$ converges to $M_\infty$, which proves the first point.

For the second one, we consider the new martingale defined by
\[ N_n = \sum_{k=1}^n \frac{M_k - M_{k-1}}{\sqrt{1 + a(\langle M\rangle_k)}}. \]
$(N_n, n \ge 0)$ is a martingale, as it is defined as a martingale transform of the martingale $M$ (recall that $\langle M\rangle_k$ is $\mathcal{F}_{k-1}$-measurable, which is exactly what is needed to prove that $N$ is a martingale transform).

Moreover, it is easy to check that this martingale is still square integrable and that we can compute its bracket, because
\[ \langle N\rangle_{n+1} - \langle N\rangle_n := \mathbb{E}\left( \frac{(M_{n+1} - M_n)^2}{1 + a(\langle M\rangle_{n+1})} \,\bigg|\, \mathcal{F}_n \right) = \frac{\langle M\rangle_{n+1} - \langle M\rangle_n}{1 + a(\langle M\rangle_{n+1})}. \]
But, as $a$ is increasing,
\[ \langle N\rangle_n = \sum_{k=1}^n \frac{\langle M\rangle_k - \langle M\rangle_{k-1}}{1 + a(\langle M\rangle_k)} \le \sum_{k=1}^n \int_{\langle M\rangle_{k-1}}^{\langle M\rangle_k} \frac{dt}{1 + a(t)} \le \int_0^{+\infty} \frac{dt}{1 + a(t)} < +\infty. \]
So $\langle N\rangle_\infty < +\infty$, a.s., and, using the first part of this theorem, $N_n$ converges a.s. to
\[ N_\infty = \sum_{k=1}^{\infty} \frac{M_k - M_{k-1}}{\sqrt{1 + a(\langle M\rangle_k)}}. \]
On the set $\{\langle M\rangle_\infty = +\infty\}$, the Kronecker lemma (applied with $A_n = \sqrt{1 + a(\langle M\rangle_n)}$, which increases to $+\infty$) then gives $\lim_{n\to+\infty} \frac{M_n}{\sqrt{1 + a(\langle M\rangle_n)}} = 0$, and so $\lim_{n\to+\infty} \frac{M_n}{\sqrt{a(\langle M\rangle_n)}} = 0$. The first statement of the second point is obtained for $a(t) = t^2$.


Application to the strong law of large numbers. Assume that $(X_n, n \ge 1)$ is a sequence of independent random variables with the law of $X$, such that $\mathbb{E}(|X|^2) < +\infty$. Define $X'_n = X_n - \mathbb{E}(X)$; then
\[ S_n = X_1 + \cdots + X_n - n\mathbb{E}(X) = X'_1 + \cdots + X'_n \]
is a martingale with respect to $\sigma(X_1, \dots, X_n)$. As, using independence, $\mathbb{E}\left( (S_{n+1} - S_n)^2 \mid \mathcal{F}_n \right) = \mathbb{E}\left( (X'_{n+1})^2 \mid \mathcal{F}_n \right) = \mathrm{Var}(X)$, we have $\langle S\rangle_n = n\mathrm{Var}(X)$. So $\langle S\rangle_\infty = +\infty$ and, using the previous theorem, we get $\lim_{n\to+\infty} S_n/n = 0$, which gives the strong law of large numbers.

Moreover, using $a(t) = t^{1+\varepsilon}$ with $\varepsilon > 0$, we get
\[ \lim_{n\to+\infty} \frac{\sqrt{n}}{n^{\varepsilon/2}} \left( \frac{X_1 + \cdots + X_n}{n} - \mathbb{E}(X) \right) = 0, \quad \text{a.s.} \]
So we obtain useful information on the speed of convergence of $\frac{X_1 + \cdots + X_n}{n}$ to $\mathbb{E}(X)$.

Nevertheless, to obtain the central limit theorem a different tool is needed.

Extension to martingales locally in $L^2$. We can extend the definition of the bracket to a larger class of martingales: the martingales locally in $L^2$. We first give some definitions.

Definition 2.3.4. A process $(M_n, n \ge 0)$ is a local martingale if there exists a sequence of stopping times $(\tau_p, p \ge 0)$ such that

• $\tau_p$ increases in $p$ and, a.s., goes to $+\infty$ as $p$ goes to $+\infty$;

• $(M_{n\wedge\tau_p}, n \ge 0)$ is a martingale.

Definition 2.3.5. A process $(M_n, n \ge 0)$ is a locally square integrable martingale if there exists a sequence of stopping times $(\tau_p, p \ge 0)$ such that

• $\tau_p$ increases in $p$ and, a.s., goes to $+\infty$ as $p$ goes to $+\infty$;

• $(M_{n\wedge\tau_p}, n \ge 0)$ is a martingale bounded in $L^2$.

Proposition 2.3.1. If $(M_n, n\ge 0)$ is a locally square integrable martingale, there exists a unique $\mathcal F_{n-1}$-adapted increasing process $(\langle M\rangle_n, n\ge 0)$, equal to $0$ at time $0$, such that $(M_n^2-\langle M\rangle_n, n\ge 0)$ is a local martingale.

If $\tau_p$, increasing with $p$ and, a.s., going to $\infty$ as $p$ goes to $\infty$, is such that $M^{\tau_p}_n=M_{\tau_p\wedge n}$ is a square integrable martingale, then $\langle M^{\tau_p}\rangle_n=\langle M\rangle_{\tau_p\wedge n}$.

Proof. The proof of this proposition is left as an exercise.

The strong law of large numbers remains true, unchanged, for locally $L^2$ martingales.

Theorem 2.3.6. Let $(M_n, n\ge 0)$ be a locally square integrable martingale and denote by $(\langle M\rangle_n, n\ge 0)$ its bracket. Then:

• on $\{\langle M\rangle_\infty:=\lim_{n\to+\infty}\langle M\rangle_n<+\infty\}$, $M_n$ converges almost surely to a random variable denoted by $M_\infty$;

• on $\{\langle M\rangle_\infty=+\infty\}$, as soon as $a(t)$ is a positive, increasing function such that $\int_0^{+\infty}\frac{dt}{1+a(t)}<+\infty$,
\[
\lim_{n\to+\infty}\frac{M_n}{\sqrt{a(\langle M\rangle_n)}}=0,\quad\text{a.s.}
\]

Proof. The proof is almost identical to the square integrable case. It is left as an exercise.

Exercise 21. Let $(X_n, n\ge 1)$ be a sequence of independent real random variables with the same law as $X$, such that $E(|X|)<+\infty$. Denote $\mathcal F_n=\sigma(X_1,\dots,X_n)$ and assume that $(\lambda_n, n\ge 0)$ is an $\mathcal F_n$-adapted sequence of random variables. Define $M_n$ by
\[
M_n=\sum_{k=0}^{n-1}\lambda_k X_{k+1}.
\]

• Prove that $M$ is bounded in $L^2$ if and only if $\sum_{k=0}^{+\infty}E(\lambda_k^2)<+\infty$.

• Prove that $M$ is an $L^2$ martingale if and only if, for all $k\ge 0$, $E(\lambda_k^2)<+\infty$.

• Prove that $M$ is a locally $L^2$ martingale if and only if, for all $k\ge 0$, $|\lambda_k|<+\infty$.

• Prove that there exist martingales which are not locally in $L^2$.

2.3.3 Central limit theorem for martingales

We begin by stating the central limit theorem for martingales.

Theorem 2.3.7. Let $(M_n, n\ge 0)$ be a locally $L^2$ martingale and let $a(n)$ be a sequence of strictly positive real numbers increasing to $+\infty$. Assume that:

Bracket condition:
\[
\lim_{n\to+\infty}\frac{1}{a(n)}\langle M\rangle_n=\sigma^2\quad\text{in probability.} \tag{2.7}
\]

Lindeberg condition: for all $\varepsilon>0$,
\[
\lim_{n\to+\infty}\frac{1}{a(n)}\sum_{k=1}^{n}E\left((\Delta M_{k+1})^2\,\mathbf 1_{|\Delta M_{k+1}|\ge\varepsilon\sqrt{a(n)}}\,\middle|\,\mathcal F_k\right)=0,\quad\text{in probability.} \tag{2.8}
\]

Then:
\[
\frac{M_n}{\sqrt{a(n)}}\ \text{converges in distribution to}\ \sigma G,
\]
where $G$ is a Gaussian random variable with mean $0$ and variance $1$.

Remark 2.3.2. Roughly speaking, in order to obtain a central limit theorem for a martingale, we first need an asymptotic deterministic estimate $\langle M\rangle_n\approx\sigma^2 a(n)$ as $n$ goes to infinity, and then we have to check the Lindeberg condition.

Exercise 22 gives a proof in a simple case in which the role of the martingale hypothesis is clearer than in the proof given below. For a complete discussion of martingale convergence theorems, we refer to [Hall and Heyde(1980)].


Proof. This proof is a slightly adapted version of the one of [Major(2013)].

We will need an extension of the Lebesgue theorem (the dominated convergence theorem) which says that if $X_n$ converges in probability² to $X$ and $|X_n|\le X$ with $E(X)<+\infty$, then $\lim_{n\to+\infty}E(X_n)=E(X)$ (see Exercise 24 for a proof).

We denote by $M^{(k)}$ the locally $L^2$ martingale $M^{(k)}_j=M_j/\sqrt{a(k)}$ and by $\langle M^{(k)}\rangle$ its bracket. Then we introduce, for each $k\ge 0$, the stopping time
\[
\tau_k=\inf\left\{j\ge 0,\ \langle M^{(k)}\rangle_{j+1}>2\sigma^2\right\}. \tag{2.9}
\]

The random variable $\tau_k$ is a stopping time with respect to the $\sigma$-algebras $\mathcal F_j$, $j\ge 0$, since the random variable $\langle M^{(k)}\rangle_{j+1}$ is $\mathcal F_j$-measurable. Moreover $P(\lim_{k\to+\infty}\tau_k=+\infty)=1$, because $P(\tau_k>j)=P(\langle M\rangle_{j+1}\le 2\sigma^2 a(k))$ and $a(k)$ tends to $+\infty$ as $k$ goes to $+\infty$. Now introduce the stopped process $M^{[k]}$ defined by
\[
M^{[k]}_j=M^{(k)}_{j\wedge\tau_k}.
\]

We can check that $M^{[k]}$ is a martingale bounded in $L^2$: as
\[
\langle M^{[k]}\rangle_j=\langle M^{(k)}\rangle_{j\wedge\tau_k}\le 2\sigma^2, \tag{2.10}
\]
we have $E\left(\langle M^{[k]}\rangle_j\right)\le 2\sigma^2<+\infty$ and $E\left((M^{[k]}_j)^2\right)\le E\left((M^{(k)}_0)^2\right)+2\sigma^2<+\infty$. So $M^{[k]}$ is an $L^2$ martingale whose bracket can be computed as
\[
\Delta\langle M^{[k]}\rangle_j:=\langle M^{[k]}\rangle_j-\langle M^{[k]}\rangle_{j-1}=E\left((\Delta M^{[k]}_j)^2\,\middle|\,\mathcal F_{j-1}\right),\quad\text{where }\Delta M^{[k]}_j:=M^{[k]}_j-M^{[k]}_{j-1}.
\]

Note that we can rewrite the Lindeberg condition (2.8) as
\[
\lim_{k\to+\infty}\sum_{j=1}^{k}E\left((\Delta M^{(k)}_j)^2\,\mathbf 1_{|\Delta M^{(k)}_j|\ge\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)=0,\quad\text{in probability.}
\]

Moreover $|\Delta M^{[k]}_j|=\mathbf 1_{j\le\tau_k}\,|\Delta M^{(k)}_j|\le|\Delta M^{(k)}_j|$, so
\[
\lim_{k\to+\infty}\sum_{j=1}^{k}E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\ge\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)=0,\quad\text{in probability.}
\]

Now, as $E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\ge\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\le E\left((\Delta M^{[k]}_j)^2\,\middle|\,\mathcal F_{j-1}\right)=\Delta\langle M^{[k]}\rangle_j$, taking expectation in the previous convergence, which is justified by the (extended) Lebesgue theorem (as $\langle M^{[k]}\rangle_k\le 2\sigma^2$), we obtain a stronger Lindeberg condition for $M^{[k]}$:
\[
\lim_{k\to+\infty}\sum_{j=1}^{k}E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\ge\varepsilon}\right)=0. \tag{2.11}
\]

Now note that $\langle M^{[k]}\rangle_k=\frac{1}{a(k)}\langle M\rangle_{k\wedge\tau_k}$ and, using hypothesis (2.7), that
\[
\lim_{k\to+\infty}P(\tau_k>k)=\lim_{k\to+\infty}P\left(\langle M\rangle_k\le 2\sigma^2 a(k)\right)=1.
\]

²which is a weaker assumption than the almost sure convergence usually assumed.


So we have $\lim_{k\to+\infty}\langle M^{[k]}\rangle_k=\sigma^2$ in probability. Now, taking expectation and using again the Lebesgue theorem, we get a stronger bracket condition for $M^{[k]}$:
\[
\lim_{k\to+\infty}E\left(\langle M^{[k]}\rangle_k\right)=\sigma^2. \tag{2.12}
\]

Now let $\bar S_k=M^{(k)}_k$ and $S_k=M^{[k]}_k=M^{(k)}_{k\wedge\tau_k}$ for $k\ge 1$. With these notations, we want to prove that $\bar S_k$ converges in distribution to a Gaussian random variable. But $\bar S_k-S_k$ converges in probability to $0$ as $k\to\infty$ (since $\bar S_k=S_k$ if $\tau_k>k$, and we have already seen that $\lim_{k\to+\infty}P(\tau_k>k)=1$). So, using Slutsky's lemma, it remains to prove the convergence in distribution of $S_k$ to a Gaussian random variable, i.e.
\[
\lim_{k\to\infty}E\left(e^{itS_k}\right)=e^{-\sigma^2t^2/2}\quad\text{for all real numbers }t. \tag{2.13}
\]

We will show that relation (2.13) follows from
\[
\lim_{k\to\infty}E\left(e^{itS_k+t^2U_k/2}\right)=1\quad\text{for all real numbers }t, \tag{2.14}
\]
where $U_k=\langle M^{[k]}\rangle_k$. Indeed, $U_k$ converges in probability to $\sigma^2$ as $k\to\infty$, and $0\le U_k\le 2\sigma^2$ for all $k\ge 1$ because of (2.10). Hence $e^{itS_k+t^2U_k/2}-e^{itS_k+\sigma^2t^2/2}$ converges in probability to $0$ for all real numbers $t$ as $k\to\infty$, and
\[
\left|e^{itS_k+t^2U_k/2}-e^{itS_k+\sigma^2t^2/2}\right|\le\left|e^{t^2U_k/2}-e^{\sigma^2t^2/2}\right|\le e^{t^2\sigma^2}.
\]
Hence, by the (extended) Lebesgue theorem, $\lim_{k\to\infty}E\left(e^{itS_k+t^2U_k/2}-e^{itS_k+\sigma^2t^2/2}\right)=0$. Formula (2.13) follows from this statement if we can prove (2.14).

For this we first show that
\[
\left|E\left(e^{itS_k+t^2U_k/2}\right)-1\right|\le e^{\sigma^2t^2}\sum_{j=1}^{k}E\left|e^{t^2\Delta\langle M^{[k]}\rangle_j/2}E\left(e^{it\Delta M^{[k]}_j}\,\middle|\,\mathcal F_{j-1}\right)-1\right|. \tag{2.15}
\]

Indeed, let us introduce the random variables $S_{k,j}=M^{[k]}_j$, $U_{k,j}=\langle M^{[k]}\rangle_j$ for $j\ge 1$, and $S_{k,0}=0$, $U_{k,0}=0$ for all indices $k\ge 1$. Then we have $S_{k,k}=S_k$, $U_{k,k}=U_k$, and
\begin{align*}
E\left(e^{itS_k+t^2U_k/2}-1\right)&=\sum_{j=1}^{k}E\left(e^{itS_{k,j}+t^2U_{k,j}/2}-e^{itS_{k,j-1}+t^2U_{k,j-1}/2}\right)\\
&=\sum_{j=1}^{k}E\left[e^{itS_{k,j-1}+t^2U_{k,j-1}/2}\,E\left(e^{it\Delta M^{[k]}_j+t^2\Delta\langle M^{[k]}\rangle_j/2}-1\,\middle|\,\mathcal F_{j-1}\right)\right].
\end{align*}

Since $e^{itS_{k,j-1}+t^2U_{k,j-1}/2}$ is bounded by $e^{\sigma^2t^2}$, it follows from the above identity that
\[
\left|E\left(e^{itS_k+t^2U_k/2}-1\right)\right|\le e^{\sigma^2t^2}\sum_{j=1}^{k}E\left|E\left(e^{it\Delta M^{[k]}_j+t^2\Delta\langle M^{[k]}\rangle_j/2}-1\,\middle|\,\mathcal F_{j-1}\right)\right|,
\]
and as $E\left(e^{it\Delta M^{[k]}_j+t^2\Delta\langle M^{[k]}\rangle_j/2}-1\,\middle|\,\mathcal F_{j-1}\right)=e^{t^2\Delta\langle M^{[k]}\rangle_j/2}E\left(e^{it\Delta M^{[k]}_j}\,\middle|\,\mathcal F_{j-1}\right)-1$, this implies the estimate (2.15).


To prove formula (2.14) with the help of inequality (2.15), we have to estimate
\[
E\left|e^{t^2\Delta\langle M^{[k]}\rangle_j/2}E\left(e^{it\Delta M^{[k]}_j}\,\middle|\,\mathcal F_{j-1}\right)-1\right|.
\]

The expression $e^{t^2\Delta\langle M^{[k]}\rangle_j/2}$ can be written in the form
\[
e^{t^2\Delta\langle M^{[k]}\rangle_j/2}=1+\frac{t^2\Delta\langle M^{[k]}\rangle_j}{2}+\eta^{(1)}_{k,j},
\]
with an appropriate random variable $\eta^{(1)}_{k,j}$ which satisfies the inequality $|\eta^{(1)}_{k,j}|\le K_1(t)\,(\Delta\langle M^{[k]}\rangle_j)^2$ with some number $K_1(t)$ depending only on the parameter $t$, because $\langle M^{[k]}\rangle_j\le 2\sigma^2$ by formula (2.10). We can estimate the expression
\[
\eta^{(2)}_{k,j}=E\left(e^{it\Delta M^{[k]}_j}-1+\frac{t^2(\Delta M^{[k]}_j)^2}{2}\,\middle|\,\mathcal F_{j-1}\right)
\]
in a similar way. To do this, let us fix a small number $\varepsilon>0$ and show that the inequality
\[
\left|e^{it\Delta M^{[k]}_j}-1-it\Delta M^{[k]}_j+\frac{t^2(\Delta M^{[k]}_j)^2}{2}\right|\le\alpha(\Delta M^{[k]}_j)
\]
holds with $\alpha(x)=t^2x^2\,\mathbf 1_{|x|>\varepsilon}+\frac{\varepsilon}{6}|t|^3x^2\,\mathbf 1_{|x|\le\varepsilon}$. Indeed, we get this estimate by bounding the expression $\left|e^{itx}-1-itx+\frac{t^2x^2}{2}\right|$ by $t^2x^2$ if $|x|>\varepsilon$ and by $\frac{|t|^3|x|^3}{6}\le\frac{\varepsilon|t|^3x^2}{6}$ if $|x|\le\varepsilon$. Using that $E(\Delta M^{[k]}_j\,|\,\mathcal F_{j-1})=0$ and taking the conditional expectation with respect to $\mathcal F_{j-1}$ in the last inequality, we get

\[
|\eta^{(2)}_{k,j}|\le E\left(\left|e^{it\Delta M^{[k]}_j}-1-it\Delta M^{[k]}_j+\frac{t^2(\Delta M^{[k]}_j)^2}{2}\right|\,\middle|\,\mathcal F_{j-1}\right)\le E\left(\alpha(\Delta M^{[k]}_j)\,\middle|\,\mathcal F_{j-1}\right)\le t^2\,E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)+\frac{\varepsilon}{6}|t|^3\,\Delta\langle M^{[k]}\rangle_j.
\]

Since $\langle M^{[k]}\rangle_j\le 2\sigma^2$, both $\eta^{(1)}_{k,j}$ and $\eta^{(2)}_{k,j}$ are bounded random variables (with a bound depending only on the parameter $t$), and the above estimates imply that
\begin{align*}
&\left|e^{t^2\Delta\langle M^{[k]}\rangle_j/2}E\left(e^{it\Delta M^{[k]}_j}\,\middle|\,\mathcal F_{j-1}\right)-1\right|\\
&\qquad=\left|\left(1+\frac{t^2\Delta\langle M^{[k]}\rangle_j}{2}+\eta^{(1)}_{k,j}\right)\left(1-\frac{t^2\Delta\langle M^{[k]}\rangle_j}{2}+\eta^{(2)}_{k,j}\right)-1\right|\\
&\qquad\le t^4(\Delta\langle M^{[k]}\rangle_j)^2+K_3(t)\left(|\eta^{(1)}_{k,j}|+|\eta^{(2)}_{k,j}|\right)\\
&\qquad\le K_4(t)\left((\Delta\langle M^{[k]}\rangle_j)^2+E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)+\varepsilon\,\Delta\langle M^{[k]}\rangle_j\right).
\end{align*}

Let us take the expectation of both sides of the last inequality and sum over all indices $j\ge 1$. The inequality obtained in this way, together with formula (2.15), implies that
\[
|E\,e^{itS_k+t^2U_k/2}-1|\le K_5(t)\left(\sum_{j=1}^{k}E\left((\Delta\langle M^{[k]}\rangle_j)^2\right)+\sum_{j=1}^{k}E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\right)+\varepsilon\sum_{j=1}^{k}E\,\Delta\langle M^{[k]}\rangle_j\right). \tag{2.16}
\]


To estimate the first sum on the right-hand side of (2.16), let us make the following estimate:
\begin{align*}
E\left((\Delta\langle M^{[k]}\rangle_j)^2\right)&=E\left[E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)+E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\le\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\right]^2\\
&\le 2E\left[E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\right]^2+2E\left[E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\le\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\right]^2\\
&\le 2E\left[\Delta\langle M^{[k]}\rangle_j\,E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\right]+2\varepsilon^2E\left[E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|\le\varepsilon}\,\middle|\,\mathcal F_{j-1}\right)\right]\\
&\le 4\sigma^2E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\right)+2\varepsilon^2E\left(\Delta\langle M^{[k]}\rangle_j\right).
\end{align*}

Using this estimate and (2.16), we obtain
\[
\left|E\,e^{itS_k+t^2U_k/2}-1\right|\le K_6(t)\left(\sum_{j=1}^{k}E\left((\Delta M^{[k]}_j)^2\,\mathbf 1_{|\Delta M^{[k]}_j|>\varepsilon}\right)+\varepsilon\sum_{j=1}^{k}E\left(\Delta\langle M^{[k]}\rangle_j\right)\right),
\]
and relations (2.11), (2.12) imply formula (2.14). Thus we have proved the central limit theorem.

Application to the standard case It is easy to recover the classical central limit theorem from the previous theorem. For this, consider a sequence of independent random variables with the same law as $X$, such that $E(|X|^2)<+\infty$. Let $a(n)=n$ and $M_n=X_1+\cdots+X_n-nE(X)$. $M$ is a martingale and its bracket is $\langle M\rangle_n=n\,\mathrm{Var}(X)$ (so obviously $\langle M\rangle_n/n$ converges to $\mathrm{Var}(X)=\sigma^2$). It remains to check the Lindeberg condition. But
\[
\frac{1}{n}\sum_{k=1}^{n}E\left(X_{k+1}^2\,\mathbf 1_{|X_{k+1}|\ge\varepsilon\sqrt n}\,\middle|\,\mathcal F_k\right)=E\left(X^2\,\mathbf 1_{|X|\ge\varepsilon\sqrt n}\right).
\]
The integrability of $X^2$ and the Lebesgue theorem prove that $E\left(X^2\,\mathbf 1_{|X|\ge\varepsilon\sqrt n}\right)$ converges to $0$ as $n$ goes to $\infty$.
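This classical case can be checked by simulation. Below is a small illustrative sketch (not part of the original text; the uniform law is an arbitrary choice), estimating the variance of $M_n/\sqrt n$ over independent replications:

```python
import random

random.seed(1)

n, reps = 2000, 3000
# M_n = X_1 + ... + X_n - n E(X) with X uniform on [0, 1], so Var(X) = 1/12.
samples = []
for _ in range(reps):
    m = sum(random.random() for _ in range(n)) - n * 0.5
    samples.append(m / n ** 0.5)

mean = sum(samples) / reps
var = sum((s - mean) ** 2 for s in samples) / reps
print(mean, var)  # mean close to 0, variance close to 1/12
```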

2.3.4 A central limit theorem for the Robbins-Monro algorithm

We can now derive a central limit theorem for the Robbins-Monro algorithm. We will only deal with the one-dimensional case. This part follows closely [Duflo(1997)] (or [Duflo(1990)] in French).

Theorem 2.3.8. Let $F(x,u)$ be a function from $\mathbb R\times\mathbb R^d$ to $\mathbb R$ and let $U$ be a random variable taking its values in $\mathbb R^d$. We assume that:

• $f(x)=E\left(F(x,U)\right)$ is $C^2$;

• $f(x^*)=0$ and $\langle f(x),x-x^*\rangle>0$ for $x\ne x^*$;

• $f'(x^*)=c$, where $c>0$;


• if $\sigma^2(x)=\mathrm{Var}\left(F(x,U)\right)$ and $s^2(x)=\sigma^2(x)+f^2(x)$, then $s^2(x)\le K(1+|x|^2)$;

• there exists $\eta>0$ such that, for all $x\in\mathbb R$,
\[
v_{2+\eta}(x):=E\left(|F(x,U)|^{2+\eta}\right)<+\infty,
\]
and $\sup_{n\ge 0}v_{2+\eta}(X_n)<+\infty$.

We consider a sequence $(U_n, n\ge 1)$ of independent random variables with the same law as $U$, and $\gamma_n=\frac{\alpha}{n}$. We define $X_n$ by
\[
X_{n+1}=X_n-\gamma_nF(X_n,U_{n+1}),\quad X_0=x\in\mathbb R.
\]
Then $X_n$ converges almost surely to $x^*$ and:

• if $c\alpha>1/2$, $\sqrt n(X_n-x^*)$ converges in distribution to a zero-mean Gaussian random variable with variance $\alpha^2\sigma^2(x^*)/(2c\alpha-1)$;

• if $c\alpha<1/2$, $n^{c\alpha}(X_n-x^*)$ converges almost surely to a random variable.

Remark 2.3.3. It is easy to optimize the asymptotic variance $\alpha^2\sigma^2(x^*)/(2c\alpha-1)$ in $\alpha$ and to prove that the optimal choice is given by $\alpha=1/c$.

The same type of CLT can be obtained when $\gamma_n=\alpha/n^\beta$, with $1/2<\beta<1$. In this case it can be proved that $(X_n-x^*)/\sqrt{\gamma_n}$ converges in distribution to a Gaussian random variable, whatever the value of $\alpha$.
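A small numerical sketch of the theorem (every concrete choice below — the function $F$, the target, the step constant — is an illustrative assumption, not taken from the text): take $F(x,u)=x-u$ with $U$ uniform on $[0,1]$, so $f(x)=x-1/2$, $x^*=1/2$ and $c=f'(x^*)=1$; with $\alpha=1>1/(2c)$, the predicted variance of $\sqrt n(X_n-x^*)$ is $\alpha^2\sigma^2(x^*)/(2c\alpha-1)=\mathrm{Var}(U)=1/12$.

```python
import random

random.seed(2)

def robbins_monro(alpha, n_steps):
    """Iterate X_{n+1} = X_n - (alpha/n) F(X_n, U_{n+1}) with F(x, u) = x - u."""
    x = 0.0
    for n in range(1, n_steps + 1):
        x -= (alpha / n) * (x - random.random())
    return x

# Empirical variance of sqrt(n) (X_n - 1/2) over independent runs;
# the theorem predicts 1/12 ≈ 0.083 for alpha = 1.
n_steps, reps = 5000, 2000
errs = [n_steps ** 0.5 * (robbins_monro(1.0, n_steps) - 0.5) for _ in range(reps)]
var = sum(e * e for e in errs) / reps
print(var)
```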

Proof. We begin the proof with a simple deterministic lemma.

Lemma 2.3.4. Let $c,\alpha$ be strictly positive numbers and assume that $\gamma_n=\alpha/n$. Define $\alpha_n=1-c\gamma_n$ and $\beta_n=\alpha_1\cdots\alpha_n$. Then there exists a strictly positive number $C$ such that
\[
\lim_{n\to+\infty}\beta_n n^{c\alpha}=C\quad\text{and}\quad\lim_{n\to+\infty}(\gamma_n/\beta_n)\,n^{1-c\alpha}=\frac{\alpha}{C}.
\]

Proof. Using the Taylor formula, we get, for $0<x<1$,
\[
0\le-\log(1-x)-x\le\frac{x^2}{2}\,\frac{1}{(1-x)^2}.
\]
So, if $n\ge n_0$ with $n_0$ such that $c\gamma_n<1/2$ for $n\ge n_0$, we have
\[
0\le-\log(1-c\gamma_n)-c\gamma_n\le 2c^2\gamma_n^2.
\]
From this we deduce that, if $s_n=\sum_{k=1}^{n}\gamma_k$,
\[
\lim_{n\to+\infty}\beta_n e^{cs_n}=C_1>0.
\]
But $\lim_{n\to+\infty}\left(\sum_{k=1}^{n}1/k-\log(n)\right)=C_2$, where $C_2$ is the Euler constant. So we have
\[
\lim_{n\to+\infty}\beta_n n^{\alpha c}=C_3=C_1e^{-\alpha cC_2},
\]
and from this we deduce that $\lim_{n\to+\infty}(\gamma_n/\beta_n)\,n^{1-c\alpha}=\frac{\alpha}{C_3}$.
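The lemma lends itself to a quick numerical check; an illustrative sketch (the values of $c$ and $\alpha$ below are arbitrary, chosen with $c\alpha<1$ so that all factors $1-c\alpha/k$ stay positive):

```python
def beta(n, c, alpha):
    """beta_n = prod_{k=1}^{n} (1 - c * alpha / k), i.e. the gamma_k = alpha/k case."""
    p = 1.0
    for k in range(1, n + 1):
        p *= 1.0 - c * alpha / k
    return p

c, alpha = 1.0, 0.3
# beta_n * n^(c*alpha) should stabilize to a strictly positive constant C.
for n in (10**2, 10**3, 10**4):
    print(n, beta(n, c, alpha) * n ** (c * alpha))
```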


In what follows we will assume that $x^*=0$. Our algorithm writes as
\begin{align*}
X_{n+1}&=X_n-\gamma_nF(X_n,U_{n+1})\\
&=X_n(1-c\gamma_n)-\gamma_n\left[F(X_n,U_{n+1})-f(X_n)\right]-\gamma_n\left[f(X_n)-cX_n\right].
\end{align*}

Now, denote

• ∆Mn+1 = F (Xn, Un+1)− f(Xn) (∆Mn+1 is a martingale increment),

• Rn = f(Xn)− cXn (Rn will be “small”),

• αn = 1− cγn and βn = α1 . . . αn.

With these notations,
\[
\frac{X_{n+1}}{\beta_n}=\frac{X_n}{\beta_{n-1}}-\frac{\gamma_n}{\beta_n}\Delta M_{n+1}-\frac{\gamma_n}{\beta_n}R_n,
\]
so
\[
X_n=X_0\beta_{n-1}-\beta_{n-1}\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}\Delta M_{k+1}-\beta_{n-1}\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}R_k.
\]

The main point is now to estimate the bracket of the martingale $N$ defined by
\[
N_n=\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}\Delta M_{k+1}.
\]
From this we will deduce the asymptotic behavior of $X_n$. Clearly
\[
\langle N\rangle_n=\sum_{k=0}^{n-1}\frac{\gamma_k^2}{\beta_k^2}\sigma^2(X_k).
\]
But we already know that $X_n$ converges to $x^*=0$, so, by continuity of $\sigma^2$, $\lim_{n\to+\infty}\sigma^2(X_n)=\sigma^2(x^*)>0$.

The case where $2c\alpha>1$ In this case we are interested in the convergence in distribution of $\sqrt n(X_n-x^*)$ to a Gaussian random variable. The main step is to apply the central limit theorem to the martingale $N$, so we have to get an asymptotic estimate of its bracket $\langle N\rangle_n$. For this, taking into account that (see the previous lemma) $\frac{\gamma_k}{\beta_k}\approx\frac{\alpha}{C}k^{c\alpha-1}$, we can prove that, for $2c\alpha>1$, $\lim_{n\to+\infty}\langle N\rangle_n=+\infty$, and it is easy to check that
\[
\lim_{n\to+\infty}\frac{\sum_{k=0}^{n-1}\frac{\gamma_k^2}{\beta_k^2}\sigma^2(X_k)}{\sum_{k=0}^{n-1}\frac{\gamma_k^2}{\beta_k^2}\sigma^2(x^*)}=1,\quad\text{a.s.}
\]

A more precise analysis (left as an exercise) shows that
\[
n^{1-2c\alpha}\sum_{k=0}^{n-1}\frac{\gamma_k^2}{\beta_k^2}=(1+\epsilon_n)\,\frac{\alpha^2}{C^2(2c\alpha-1)},\quad\text{with }\lim_{n\to+\infty}\epsilon_n=0,
\]


and leads to
\[
\lim_{n\to+\infty}\frac{\langle N\rangle_n}{n^{2c\alpha-1}}=\frac{\alpha^2\sigma^2(x^*)}{C^2(2c\alpha-1)},\quad\text{a.s.}
\]

So we have the bracket condition needed to apply the central limit theorem to the martingale $N$ with $a(n)=n^{2c\alpha-1}$. It remains to check the Lindeberg condition. For this, note that
\[
E\left(|X|^2\,\mathbf 1_{|X|\ge\delta}\,\middle|\,\mathcal B\right)\le\frac{E\left(|X|^{2+\eta}\,\middle|\,\mathcal B\right)}{\delta^{\eta}}.
\]

So we have
\begin{align*}
\frac{1}{a(n)}\sum_{k=1}^{n-1}E\left(|\Delta N_{k+1}|^2\,\mathbf 1_{|\Delta N_{k+1}|\ge\varepsilon\sqrt{a(n)}}\,\middle|\,\mathcal F_k\right)
&\le\frac{1}{a(n)}\sum_{k=1}^{n-1}\frac{1}{(\varepsilon\sqrt{a(n)})^{\eta}}E\left(|\Delta N_{k+1}|^{2+\eta}\,\middle|\,\mathcal F_k\right)\\
&=\frac{1}{a(n)^{1+\eta/2}}\,\frac{1}{\varepsilon^{\eta}}\sum_{k=1}^{n-1}\left(\frac{\gamma_k}{\beta_k}\right)^{2+\eta}E\left(|F(X_k,U_{k+1})-f(X_k)|^{2+\eta}\,\middle|\,\mathcal F_k\right)\\
&\le\frac{L(\omega)}{\varepsilon^{\eta}}\,\frac{1}{a(n)^{1+\eta/2}}\sum_{k=1}^{n-1}\left(\frac{\gamma_k}{\beta_k}\right)^{2+\eta},
\end{align*}
where $L(\omega)=\sup_{n\ge 0}v_{2+\eta}(X_n)$ (which is a.s. finite by hypothesis). So we get
\[
\frac{1}{a(n)}\sum_{k=1}^{n-1}E\left(|\Delta N_{k+1}|^2\,\mathbf 1_{|\Delta N_{k+1}|\ge\varepsilon\sqrt{a(n)}}\,\middle|\,\mathcal F_k\right)\le\frac{L(\omega)}{\varepsilon^{\eta}}\,\frac{1}{a(n)^{1+\eta/2}}\sum_{k=1}^{n-1}\left(\frac{\gamma_k}{\beta_k}\right)^{2+\eta}\approx K\,n^{-(2c\alpha-1)(1+\eta/2)}\,n^{(c\alpha-1)(2+\eta)+1}=K\,n^{-\eta/2}.
\]

And this ends the proof that the Lindeberg condition is fulfilled. So $N_n/\sqrt{a(n)}$ converges in distribution to a Gaussian random variable with variance $\frac{\alpha^2\sigma^2(x^*)}{C^2(2c\alpha-1)}$, and so $\sqrt n\,\beta_n N_n$ converges in distribution to a Gaussian random variable with variance $\frac{\alpha^2\sigma^2(x^*)}{2c\alpha-1}$.

To conclude (using Slutsky's lemma) that $\sqrt n\,X_n$ converges in distribution to the same random variable, we need to prove that $\sqrt n\,\beta_{n-1}X_0$ converges to $0$ in probability (this is immediate) and that $\epsilon_n=\sqrt n\,\beta_{n-1}\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}R_k$ also converges to $0$ in probability. This part is heavily technical and will not be proved here³.

³Note, though, that $|R_n|\le C|X_n|^2$, so if we can prove that $E(|X_n|^2)\le K/n$, we have
\[
\sqrt n\,\beta_{n-1}\sum_{k=0}^{n-1}E\left(\left|\frac{\gamma_k}{\beta_k}R_k\right|\right)\le CK\sqrt n\,\beta_{n-1}\sum_{k=0}^{n-1}\frac{\gamma_k}{k\,\beta_k}\le\frac{K'}{\sqrt n}.
\]
So $\epsilon_n$ converges in $L^1$ (and hence in probability) to $0$.


The case where $2c\alpha<1$ In this case, we will prove that $\frac{X_n-x^*}{\beta_n}$ converges almost surely to a random variable. For this, we first check that
\[
\langle N\rangle_n\approx\sigma^2(x^*)\sum_{k=0}^{n-1}\frac{\gamma_k^2}{\beta_k^2}\approx\sigma^2(x^*)\,\frac{\alpha^2}{C^2}\sum_{k=0}^{n-1}k^{2c\alpha-2}<+\infty.
\]
So, using the strong law for martingales, $N_n$ converges a.s. to $N_\infty$. It remains to check that $\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}R_k$ converges a.s. to obtain the result of the theorem.

But, as $|R_k|\le C|X_k|^2$, we have
\[
E\left(\left|\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}R_k\right|\right)\le C\,E\left(\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}|X_k|^2\right)\le C\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}E(|X_k|^2)\approx C'\sum_{k=0}^{n-1}\frac{E(|X_k|^2)}{k^{1-c\alpha}}.
\]

So if we can prove that there exists $\beta>c\alpha$ such that $E(|X_n|^2)\le\frac{C}{n^\beta}$, we will be able to deduce the absolute convergence of $\sum_{k=0}^{n-1}\frac{\gamma_k}{\beta_k}R_k$.

Exercise 22. In this exercise, we prove the central limit theorem for martingales in a special case.

Let $(M_n, n\ge 0)$ be a martingale such that $\sup_{n\ge 0}|\Delta M_n|\le K<+\infty$, where $K$ is a constant. $M$ is then a square integrable martingale and, so, we can denote by $\langle M\rangle$ its bracket. Assume moreover that
\[
\lim_{n\to+\infty}\frac{\langle M\rangle_n}{n}=\sigma^2,\quad\text{a.s.} \tag{2.17}
\]

where σ is a positive real number.

1. For $\lambda$ real, let $\phi_j(\lambda)=\log E\left(e^{\lambda\Delta M_j}\,\middle|\,\mathcal F_{j-1}\right)$. Prove that
\[
X_n=\exp\left(\lambda M_n-\sum_{j=1}^{n}\phi_j(\lambda)\right)
\]
is a martingale.

2. For $u\in\mathbb R$ and $n$ large enough (so that the complex logarithm is well defined), prove that
\[
\exp\left(iu\frac{M_n}{\sqrt n}-\sum_{j=1}^{n}\phi_j(iu/\sqrt n)\right)
\]
is a complex martingale, and so
\[
E\left[\exp\left(iu\frac{M_n}{\sqrt n}-\sum_{j=1}^{n}\phi_j(iu/\sqrt n)\right)\right]=1.
\]

3. Using $|e^x-1-x-x^2/2|\le|x|^3$ for $|x|\le 1$, prove that, for $n$ large enough,
\[
\left|E\left(e^{i\frac{u}{\sqrt n}\Delta M_j}\,\middle|\,\mathcal F_{j-1}\right)-1+\frac{u^2}{2n}E\left((\Delta M_j)^2\,\middle|\,\mathcal F_{j-1}\right)\right|\le\frac{|u|^3}{n^{3/2}}K^3,
\]


then, using $|\log(1+x)-x|\le|x|^2$, that for some constant $c$,
\[
\left|\phi_j\left(\frac{iu}{\sqrt n}\right)+\frac{u^2}{2n}E\left((\Delta M_j)^2\,\middle|\,\mathcal F_{j-1}\right)\right|\le\frac{c}{n^{3/2}}.
\]

4. Prove, using (2.17), that
\[
\lim_{n\to+\infty}\sum_{j=1}^{n}\phi_j\left(\frac{iu}{\sqrt n}\right)=-\frac{\sigma^2u^2}{2},\quad\text{a.s.}
\]

5. Prove that
\[
\lim_{n\to+\infty}E\left[\exp\left(iu\frac{M_n}{\sqrt n}-\sum_{j=1}^{n}\phi_j(iu/\sqrt n)\right)\right]-E\left[\exp\left(iu\frac{M_n}{\sqrt n}+\frac{\sigma^2u^2}{2}\right)\right]=0,
\]
and deduce that $\lim_{n\to+\infty}E\left[\exp\left(iu\frac{M_n}{\sqrt n}\right)\right]=\exp\left(-\frac{\sigma^2u^2}{2}\right)$.

6. Conclude that $\frac{M_n}{\sqrt n}$ converges in distribution to a Gaussian random variable.

7. Generalize the result when
\[
\lim_{n\to+\infty}\frac{\langle M\rangle_n}{a(n)}=\sigma^2,\quad\text{a.s.} \tag{2.18}
\]

Exercise 23. Assume that $(u_n, n\ge 0)$ and $(b_n, n\ge 0)$ are two sequences of positive real numbers and that $c>0$ is such that, for all $n\ge 1$,
\[
u_{n+1}\le u_n\left(1-\frac{c}{n}\right)+b_n.
\]

1. Let $\beta_n=1/\left(\prod_{k=1}^{n-1}\left(1-\frac{c}{k}\right)\right)$. Prove that
\[
u_n\le\frac{K}{\beta_n}+\frac{1}{\beta_n}\sum_{k=1}^{n-1}\beta_{k+1}b_k\le\frac{K}{\beta_n}+\sum_{k=1}^{n-1}b_k.
\]

2. Assume that $b_k=\frac{C}{k^{\alpha}}$, with $\alpha>1$. Prove that
\[
u_n\le\frac{K}{n^{c}}+\frac{K}{n^{\alpha-1}}\le\frac{K}{n^{\inf(c,\alpha-1)}}.
\]

Exercise 24. We assume that $X_n$ converges in probability to $X$ and that $|X_n|\le X$ with $E(X)<+\infty$. We want to prove that $\lim_{n\to+\infty}E(X_n)=E(X)$.

1. Let $K$ be a positive real number and define $\phi_K(x)$ by $\phi_K(x)=(-K)\,\mathbf 1_{x<-K}+x\,\mathbf 1_{|x|\le K}+K\,\mathbf 1_{K<x}$. Prove that $|\phi_K(x)-\phi_K(y)|\le|x-y|$.

2. Prove that
\[
E\left(|X_n-X|\right)\le E\left(|\phi_K(X_n)-\phi_K(X)|\right)+2E\left(X\,\mathbf 1_{X\ge K}\right).
\]


3. Prove that $\lim_{K\to\infty}E\left(X\,\mathbf 1_{X\ge K}\right)=0$.

4. Prove, for a given $K$, that
\[
E\left(|\phi_K(X_n)-\phi_K(X)|\right)\le 2K\,P\left(|X_n-X|\ge\varepsilon\right)+\varepsilon,
\]
and deduce the extended Lebesgue theorem.

Exercise 25. We assume that $X_n$ converges in distribution to $X$ and that $f$ is a continuous function such that $E(|f(X_n)|)<+\infty$ for all $n\ge 1$, satisfying the equi-integrability property
\[
\lim_{A\to+\infty}\sup_{n\ge 1}E\left(|f(X_n)|\,\mathbf 1_{|f(X_n)|>A}\right)=0.
\]
We want to prove that $\lim_{n\to+\infty}E(f(X_n))=E(f(X))$.

1. Prove that $\sup_{n\ge 1}E(|f(X_n)|)<+\infty$ and, considering the continuous function $|f(x)|\wedge K$, that $E(|f(X)|)\le\sup_{n\ge 1}E(|f(X_n)|)$.

2. Prove that there exists a family of continuous functions $\phi^A_\delta(x)$, $x\in\mathbb R$, such that $\phi^A_\delta(x)\le\mathbf 1_{x>A}$ and, for every $x\in\mathbb R$, $\phi^A_\delta(x)$ converges to $\mathbf 1_{x>A}$ as $\delta\to 0$.

3. Prove that
\begin{align*}
|E(f(X_n))-E(f(X))|\le{}&\left|E\left(f(X_n)\left[1-\phi^A_\delta(X_n)\right]\right)-E\left(f(X)\left[1-\phi^A_\delta(X)\right]\right)\right|\\
&+E\left(|f(X_n)|\,\mathbf 1_{|f(X_n)|>A}\right)+E\left(|f(X)|\,\mathbf 1_{|f(X)|>A}\right),
\end{align*}
and conclude.

4. Find an alternative way to recover the result of Exercise 24.

Problem 26. An adaptive Monte Carlo method

We consider a measurable and bounded function $f$ from $\mathbb R^p$ to $\mathbb R$ and a random variable $X$ taking its values in $\mathbb R^p$.

We are interested in a case where $E(f(X))$ can be represented in the form
\[
E(f(X))=E\left(H(f;\lambda,U)\right), \tag{2.19}
\]
where $\lambda\in\mathbb R^n$, $U$ is a random variable with values in $[0,1]^d$ following the uniform law, and where, for every $\lambda\in\mathbb R^n$, $H(f;\lambda,U)$ is a real-valued square integrable random variable. Question 2 shows that this is generally possible.

The goal of this problem is to show that, in this case, one can let $\lambda$ vary along the draws while preserving the convergence properties of a Monte Carlo type algorithm.

1. Denote by $\phi$ the cumulative distribution function of a standard Gaussian random variable and by $\phi^{-1}$ its inverse. We assume that $X$ is a real random variable with standard Gaussian law. Show that, if $\lambda\in\mathbb R$, $H$ defined by
\[
H(f;\lambda,U)=e^{-\lambda G-\frac{\lambda^2}{2}}f\left(G+\lambda\right),\quad\text{where }G=\phi^{-1}(U),
\]
satisfies condition (2.19).
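The identity in question 1 is the usual Gaussian shift (Cameron-Martin) trick, and it is easy to check by simulation. Below is a small sketch (not part of the original text; the test function $f$ and the shift $\lambda$ are arbitrary illustrative choices):

```python
import math
import random

random.seed(3)

def f(x):
    # Arbitrary bounded test function: f(x) = 1_{x > 1}, so E(f(X)) = P(X > 1).
    return 1.0 if x > 1.0 else 0.0

lam, n = 1.5, 200000

# Plain Monte Carlo estimate of E(f(X)) for X standard Gaussian.
plain = sum(f(random.gauss(0.0, 1.0)) for _ in range(n)) / n

# Estimate via H(f; lam, U) = exp(-lam*G - lam^2/2) * f(G + lam), G standard Gaussian.
shifted = 0.0
for _ in range(n):
    g = random.gauss(0.0, 1.0)
    shifted += math.exp(-lam * g - lam * lam / 2.0) * f(g + lam)
shifted /= n

print(plain, shifted)  # both close to P(X > 1) ≈ 0.1587
```

Shifting the Gaussian towards the region where $f$ is non-zero can reduce the variance of the estimator; the "relevant criterion" of question 3 below is precisely the variance $s^2(\lambda)$ of $H$.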


2. Let $X$ be a random variable with values in $\mathbb R^p$, with arbitrary law, which can be obtained by a simulation method: this means that there exists a function $\psi$ from $[0,1]^d$ to $\mathbb R^p$ such that, if $U=(U_1,\dots,U_d)$ follows the uniform law on $[0,1]^d$, the law of $\psi(U)$ is identical to that of $X$.

Set, for $i=1,\dots,d$, $G_i=\phi^{-1}(U_i)$ and consider $\lambda=(\lambda_1,\dots,\lambda_d)\in\mathbb R^d$. Propose, starting from $(G_1+\lambda_1,\dots,G_d+\lambda_d)$, a choice of random variable $H(f;\lambda,U)$ satisfying equality (2.19).

3. We consider a sequence $(U_n, n\ge 1)$ of independent random variables uniformly distributed on $[0,1]^d$. Let $\mathcal F_n=\sigma\left(U_k, 1\le k\le n\right)$.

For a fixed vector $\lambda$, how can one use $H(f;\lambda,U)$ to estimate $E(f(X))$? How can one estimate the error made in this estimation?

What is the relevant criterion for choosing $\lambda$?

We now suppose that $\lambda$ is no longer constant but evolves along the draws, given by a sequence $(\lambda_n, n\ge 0)$ of $\mathcal F_n$-measurable random variables ($\lambda_0$ is assumed to be constant). We assume that
\[
\text{for every }\lambda\in\mathbb R^d,\ s^2(\lambda)=\mathrm{Var}\left(H(f;\lambda,U)\right)<+\infty,\ \text{and }s^2(\lambda)\text{ is a continuous function from }\mathbb R^d\text{ to }\mathbb R. \tag{2.20}
\]

4. Set
\[
M_n=\sum_{i=0}^{n-1}\left[H(f;\lambda_i,U_{i+1})-E(f(X))\right].
\]
Show that, if $H$ is uniformly bounded⁴, $(M_n, n\ge 0)$ is an $\mathcal F_n$-martingale whose bracket $\langle M\rangle_n$ is given by $\langle M\rangle_n=\sum_{i=0}^{n-1}s^2(\lambda_i)$.

5. More generally, show that, under hypothesis (2.20), $(M_n, n\ge 0)$ is a locally $L^2$ martingale whose bracket is still given by $\langle M\rangle_n=\sum_{i=0}^{n-1}s^2(\lambda_i)$ (one may use the family of stopping times $\tau_A=\inf\{n\ge 0,\ |\lambda_n|>A\}$ and check that $(M_{n\wedge\tau_A}, n\ge 0)$ is a martingale bounded in $L^2$).

6. Using the strong law of large numbers for locally $L^2$ martingales, show that if
\[
\lim_{n\to+\infty}\frac{\sum_{i=0}^{n-1}s^2(\lambda_i)}{n}=c, \tag{2.21}
\]
where $c$ is a strictly positive constant, then
\[
\lim_{n\to+\infty}\frac{1}{n}\sum_{i=0}^{n-1}H(f;\lambda_i,U_{i+1})=E(f(X)).
\]
Interpret this result in terms of Monte Carlo methods. Check that, if $\lambda_n$ converges almost surely to $\lambda^*$ such that $s^2(\lambda^*)>0$, the result of this question holds.

7. What hypothesis should be added to (2.21) in order to obtain a central limit theorem for the preceding method (do not try to verify it)? State the result one would then obtain.

⁴i.e., there exists a positive real number $K$ such that, for all $\lambda$ and $u$, $|H(f;\lambda,u)|\le K<+\infty$.


Bibliography

[Abramovitz and Stegun(1970)] M. Abramovitz and A. Stegun. Handbook of Mathematical Functions. Dover, 9th edition, 1970.

[Antonov and Saleev(1980)] I.A. Antonov and V.M. Saleev. An economic method of computing LPτ-sequences. USSR Comput. Maths. Math. Phys., 19:252–256, 1980.

[Barraquand and Martineau(1995)] J. Barraquand and D. Martineau. Numerical valuation of high dimensional multivariate American securities. J. of Finance and Quantitative Analysis, 30:383–405, 1995.

[Broadie and Glassermann(1997a)] M. Broadie and P. Glassermann. Pricing American-style securities using simulation. J. of Economic Dynamics and Control, 21:1323–1352, 1997a.

[Broadie and Glassermann(1997b)] M. Broadie and P. Glassermann. A stochastic mesh method for pricing high-dimensional American options. Working Paper, Columbia University:1–37, 1997b.

[Clement et al.(2002)] Emmanuelle Clement, Damien Lamberton, and Philip Protter. An analysis of a least squares regression method for American option pricing. Finance Stoch., 6(4):449–471, 2002.

[Cochran(1953)] William G. Cochran. Sampling Techniques. John Wiley & Sons, Inc., New York; Chapman & Hall, Ltd., London, 1953.

[Cohort(2001)] P. Cohort. Monte-Carlo methods for pricing American options. Documentation for the Premia software, INRIA and ENPC, February 2001. See http://cermics.enpc.fr/˜premia.

[Cohort(2004)] Pierre Cohort. Limit theorems for random normalized distortion. Ann. Appl. Probab., 14(1):118–143, 2004.

[Devroye(1986)] Luc Devroye. Nonuniform Random Variate Generation. Springer-Verlag, New York, 1986.

[Duflo(1990)] Marie Duflo. Methodes recursives aleatoires. Techniques Stochastiques. Masson, Paris, 1990.

[Duflo(1997)] Marie Duflo. Random Iterative Models, volume 34 of Applications of Mathematics (New York). Springer-Verlag, Berlin, 1997. Translated from the 1990 French original by Stephen S. Wilson and revised by the author.


[Faure(1981)] H. Faure. Discrepance de suites associees a un systeme de numeration (en dimension 1). Bull. Soc. Math. France, 109:143–182, 1981.

[Faure(1982)] H. Faure. Discrepance de suites associees a un systeme de numeration (en dimension s). Acta Arith., 41:337–351, 1982.

[Fox(1988)] B.L. Fox. Algorithm 659: implementing Sobol's quasirandom sequence generator. ACM Transactions on Mathematical Software, 14(1):88–100, 1988.

[Fox et al.(1992)] B.L. Fox, P. Bratley, and H. Niederreiter. Implementation and tests of low discrepancy sequences. ACM Trans. Model. Comput. Simul., 2(3):195–213, July 1992.

[Fu et al.(2001)] M.C. Fu, S.B. Laprise, D.B. Madan, Y. Su, and R. Wu. Pricing American options: a comparison of Monte Carlo simulation approaches. Journal of Computational Finance, 4(3), Spring 2001.

[Glasserman et al.(1999)] P. Glasserman, P. Heidelberger, and P. Shahabuddin. Asymptotically optimal importance sampling and stratification for pricing path dependent options. Mathematical Finance, 9(2):117–152, April 1999.

[Hall and Heyde(1980)] P. Hall and C. C. Heyde. Martingale Limit Theory and Its Application. Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York-London, 1980. Probability and Mathematical Statistics.

[Hammersley and Handscomb(1979)] J. Hammersley and D. Handscomb. Monte Carlo Methods. Chapman and Hall, London, 1979.

[Kalos and Whitlock(2008)] Malvin H. Kalos and Paula A. Whitlock. Monte Carlo Methods. Wiley-Blackwell, Weinheim, second edition, 2008.

[Knuth(1998)] Donald E. Knuth. The Art of Computer Programming. Vol. 2: Seminumerical Algorithms. Addison-Wesley, Reading, MA, third edition, 1998.

[Ksas(2000)] Frederic Ksas. Methodes de Monte-Carlo et suites a discrepance faible appliquees au calcul d'options en finance. PhD thesis, Universite d'Evry, July 2000.

[L'Ecuyer(1990)] P. L'Ecuyer. Random numbers for simulation. Communications of the ACM, 33(10), October 1990.

[Longstaff and Schwartz(1998)] F.A. Longstaff and E.S. Schwartz. Valuing American options by simulation: a simple least-squares approach. Working Paper, Anderson Graduate School of Management, University of California, 25, 1998.

[Major(2013)] Peter Major. Central limit theorems for martingales. Technical report, Mathematical Institute of the Hungarian Academy of Sciences, 2013. http://www.math-inst.hu/˜major/probability/martingale.pdf.

[Niederreiter(1995)] H. Niederreiter. New developments in uniform pseudorandom number and vector generation. In P.J.-S. Shiue and H. Niederreiter, editors, Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing, volume 106 of Lecture Notes in Statistics, pages 87–120, Heidelberg, New York, 1995. Springer-Verlag.


[Niederreiter(1992)] Harald Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods, volume 63 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1992.

[Pages and Bally(2000)] G. Pages and V. Bally. A quantization algorithm for solving multi-dimensional optimal stopping problems. Technical report 628, Universite Paris 6, 2000.

[Pages et al.(2001)] G. Pages, V. Bally, and Printems. A stochastic optimisation method for nonlinear problems. Technical report number 02, University Paris XII, 2001.

[Pages(1995)] J.C. Fort and G. Pages. About the a.s. convergence of the Kohonen algorithm with a general neighborhood function. The Annals of Applied Probability, 5(4), 1995.

[Pages and Xiao(1997)] G. Pages and Y.J. Xiao. Sequences with low discrepancy and pseudo random numbers: theoretical results and numerical tests. J. Statist. Comput. Simul., 56:163–188, 1997.

[Press et al.(1992)] W. H. Press, Saul A. Teukolsky, W. T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, Cambridge, second edition, 1992.

[Radovic et al.(1996)] Igor Radovic, Ilya M. Sobol', and Robert F. Tichy. Quasi-Monte Carlo methods for numerical integration: comparison of different low discrepancy sequences. Monte Carlo Methods Appl., 2(1):1–14, 1996.

[Ripley(2006)] Brian D. Ripley. Stochastic Simulation. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, 2006. Reprint of the 1987 original.

[Roman(1992)] S. Roman. Coding and Information Theory. Number 134 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1992.

[Roth(1954)] K. F. Roth. On irregularities of distribution. Mathematika, 1:73–79, 1954.

[Rubinstein(1981)] R.Y. Rubinstein. Simulation and the Monte Carlo Method. Wiley Series in Probability and Mathematical Statistics, 1981.

[Sabin and Gray(1986)] M.J. Sabin and R.M. Gray. Global convergence and empirical consistency of the generalized Lloyd algorithm. IEEE Transactions on Information Theory, 32:148–155, March 1986.

[Sobol'(1967)] I.M. Sobol'. On the distribution of points in a cube and the approximate evaluation of integrals. USSR Computational Math. and Math. Phys., 7:86–112, 1967.

[Sobol'(1976)] I.M. Sobol'. Uniformly distributed sequences with an additional uniform property. USSR Computational Math. and Math. Phys., pages 236–242, 1976.

[Tsitsiklis and Roy(2000)] J.N. Tsitsiklis and B. Van Roy. Regression methods for pricing complex American-style options. Working Paper, MIT:1–22, 2000.

[Williams(1991)] David Williams. Probability with Martingales. Cambridge Mathematical Textbooks. Cambridge University Press, Cambridge, 1991.