
Mathematical Methods for Financial Engineering, I

Autumn 2009

Raymond Brummelhuis

Department of Economics, Mathematics and Statistics,
Birkbeck College, University of London,
Malet Street, London WC1E 7HX

October 1, 2009


Chapter 1

Introduction

Market prices of liquidly traded financial assets depend on a huge number of factors: macro-economic ones such as interest rates, inflation, balanced or unbalanced budgets, micro-economic and business-specific factors, e.g., flexibility of labour markets, sales numbers and investments. Further, more elusive psychological factors play a part, for instance, the aggregate expectations, illusions and disillusions of the various market players, both professional ones (stockbrokers, market makers, fund managers, banks and institutional investors such as pension funds), and humble private investors (e.g., academics seeking to complement their modest salary). Although many people have dreamed (and still do dream) of all-encompassing deterministic models for stock prices, for example, the number of potentially influencing factors seems too high to provide realistic hope of such a thing. Thus, in the first half of the 20th century researchers began to consider that a statistical approach to financial markets might be best. In fact this started at the very beginning of the century, in 1900, when a young French mathematician, Louis Bachelier, defended a thesis at the Sorbonne in Paris, France, on a probabilistic model of the French Bourse. In his thesis, he developed the first mathematical model of what later came to be known as Brownian motion, with the specific aim of giving a statistical description of the prices of financial transactions on the Paris stock market. The phrase with which he ended his thesis, that the Bourse, without knowing it, follows the laws of probability, is still a guiding principle of modern quantitative finance. Sadly, Bachelier's work was forgotten for about half a century, but was rediscovered in the 1960s (in part independently), and adapted by the economist Paul Samuelson to give what is still the basic model of the price of a freely traded security, the so-called exponential (or geometric) Brownian motion.¹ The role of probability in finance has since then only increased, and quite sophisticated tools of modern mathematical probability, e.g., stochastic differential calculus, martingales and stopping times, in combination with an array of equally sophisticated analytic and numerical methods, are routinely used in the daily business of pricing, hedging and risk assessment of increasingly elaborate financial products. The Mathematical Methods module of the Financial Engineering MSc is designed to teach you the necessary mathematical background to the modern theory of asset pricing. As such, it splits quite naturally into two parts: part I, to be taught during the Autumn semester, will concentrate on the necessary probability theory, while part II, to be given in the Spring semester (Birkbeck College does not acknowledge the existence of Winter), will treat the numerical mathematics which is necessary to get reliable numbers out of the various mathematical models to which you will be exposed.

¹Peter Bernstein's book Capital Ideas [4] gives an account of the history of quantitative finance in the 20th century.

Modern quantitative finance is founded upon the concept of stochastic processes as the basic description of the price of liquidly traded assets, in particular those which serve as underlyings for derivative instruments such as options, futures, swaps and the like. To price these derivatives it makes extensive use of what is called stochastic calculus (also known as Itô calculus, in honour of its inventor). Stochastic calculus can be thought of as the extension of ordinary differential calculus to the case where the variables (both dependent and independent) can be random, that is, have a value which depends on chance. Recall that in ordinary calculus, as created by Newton and Leibniz in the 17th century, we are interested in the behaviour of functions of, in the simplest case, one independent variable, i.e., y = f(x), for x ∈ R. In particular, it was important to estimate the perturbation in the dependent variable y if x changes by some small amount ∆x. The answer is given by the derivative, f′(x), and by the relation

∆y = ∆f(x) = f(x + ∆x) − f(x) ≈ f′(x) ∆x,

where ≈ means that we are neglecting higher powers of ∆x. If we did not make such an approximation, we would have to include further terms of the Taylor series (provided f is sufficiently many times differentiable):

f(x + ∆x) − f(x) = f′(x) ∆x + (f″(x)/2!) (∆x)² + ⋯ + (f^(k)(x)/k!) (∆x)^k + ⋯ .

It is convenient at this point to follow our 17th- and 18th-century mathematical ancestors, and introduce what are called infinitesimals dx (also called differentials). These are non-zero quantities whose higher powers are equal to 0:

(dx)² = (dx)³ = ⋯ = 0.

We then simply write f′(x) = (f(x + dx) − f(x))/dx, or

df(x) = f′(x) dx.

The mathematical problem with infinitesimals is that they cannot be real numbers, since no non-zero real number has its square equal to 0. For applied mathematics this is less of a problem, since we simply think of dx as a number that is so small that its square and higher powers can safely be neglected. Physicists and engineers, unlike pure mathematicians, have in any case never stopped using infinitesimals. Moreover, although they are mathematically not quite rigorous, they generally lead to correct results if used with care, and most trained mathematicians can take any argument involving infinitesimals and routinely convert it into a mathematically flawless proof leading to the same final result. What is more, infinitesimals can often be used to great effect to bring out the basic intuition underlying results that might otherwise seem miraculous or obscure.
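To make the "neglect higher powers of dx" idea concrete, here is a small numerical sketch in Python; the choice of f(x) = e^x, the point x = 1 and the step sizes are illustrative, not part of the notes. The error of the first-order approximation f(x + dx) − f(x) ≈ f′(x) dx shrinks like (dx)², which is exactly the term we agree to ignore.

    import math

    # Illustration: the error of the first-order (infinitesimal) approximation
    # f(x + dx) - f(x) ~ f'(x) dx behaves like (dx)**2 for smooth f.
    # The function f = exp (so f' = exp) and the point x = 1.0 are arbitrary choices.
    f, fprime, x = math.exp, math.exp, 1.0

    for dx in (1e-2, 1e-3, 1e-4):
        exact = f(x + dx) - f(x)
        approx = fprime(x) * dx
        print(dx, exact - approx)   # error shrinks roughly like 0.5*f''(x)*dx**2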


A case in point will be stochastic calculus, which aims to extend the notion of derivative to the case where x and y above are replaced by random variables (which we will systematically denote by capital letters, e.g., X and Y). In this case the small changes dX will be stochastic also, and we have to try to establish a relation between an infinitesimal stochastic change of the independent (stochastic) variable, dX, and the corresponding change in Y = f(X), namely f(X + dX) − f(X). In fact, in most applications, we need to consider an infinite family of random variables (X_t)_{t≥0}, where t is a positive real (non-stochastic!) parameter, usually interpreted as time; for example, X_t might represent the market price of a stock at time t ≥ 0. The first thing to do then is to establish a relation between dX_t and dt.

It turns out that, for a large class of stochastic processes called diffusion processes, or Itô processes,² it is useful to take dX_t to be a Gaussian random variable, with variance proportional to dt. This implies that (dX_t)² will be of size ≈ dt, and can no longer be neglected in the Taylor expansion of f(X_t + dX_t). However, (dX_t)³ ≈ (dt)^(3/2), and such higher powers will still be negligible. We therefore expect a formula of the form

df(X_t) = f′(X_t) dX_t + ½ f″(X_t)(dX_t)²,

which is essentially the famous Itô lemma. Moreover, (dX_t)² turns out to be non-stochastic, but essentially a multiple of dt. The simplest stochastic process to which we can apply these ideas is the aforementioned Brownian motion (W_t)_{t≥0}, which will in fact serve as the basic building block for more complicated processes. Brownian motion turns out to be continuous, in the sense that if two consecutive times s < t are very close to each other, the (random) values W_s and W_t will also be very close, with probability close to one. Modern finance also uses other kinds of processes, which can suddenly jump from one value to another between one instant of time and the next. There is a similar basic building block in this case, which is called the Poisson process, in which the jumps are of a fixed size and occur at a fixed mean rate. In the much more complicated Lévy processes, which have recently become popular in financial modelling, the random variable can jump at different rates, and the jump sizes will also be stochastic, instead of being fixed. For all of these processes, people have established analogues of the Itô lemma mentioned above. Although we will briefly look at these more complicated processes, our emphasis will be on Brownian motion and diffusion processes.
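The rule "(dX_t)² is essentially dt" can already be seen in a simulation of Brownian increments. The following sketch (the horizon T, step count and seed are illustrative choices) shows that the sum of the squared increments over [0, T] is very nearly the deterministic number T, while the sum of the increments themselves is genuinely random:

    import numpy as np

    # Sketch: Brownian increments dW over [0, T].  The sum of (dW)**2 is close
    # to T and hardly random at all, while the sum of dW is a random quantity
    # of size ~ sqrt(T).  T and the number of steps are illustrative.
    rng = np.random.default_rng(5)
    T, n = 1.0, 100_000
    dt = T / n

    dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Gaussian increments, variance dt
    print(dW.sum())          # W_T: random, typically of order sqrt(T) = 1
    print((dW**2).sum())     # quadratic variation: ~ T = 1, nearly deterministic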

To set up all of this in a mathematically rigorous way we are usually obliged to undergo (one is tempted to say, suffer) an extensive preparation in abstract probability theory and measure theory. It turns out, however, that the basic rules of Itô calculus can be explained, and motivated, using a more intuitive, 19th-century (or even 17th-century) approach to probability, if we are willing to accept the use of infinitesimals, tacitly genuflecting towards analysis whilst ignoring its warnings. This will be our initial tactic, with the aim of familiarizing you as quickly as possible with stochastic calculus, including processes with jumps. Afterwards we will take a closer look at the measure-theoretic foundations of modern probability, and sketch how the material explained in the first half fits into this more abstract framework, and can be used to make things rigorous.

²In honour of K. Itô, the inventor of stochastic calculus.

I would like to stress, however, that mathematical rigour is not our primary aim. An equally important point is that the measure-theoretic approach to probability will allow us to formalize concepts such as "information contained in a random variable, or in a stochastic process, up to time t", "martingale", and "stopping times". A martingale is a stochastic process which can be thought of as a fair (gambling) game, in the sense that at each point in time, and given all information on how the game has developed up to that point in time, one's expected gain when continuing to play still equals one's expected loss (think of repeatedly tossing a perfect coin). In a world without interest rates, idealized stock prices should be martingales: this is one way of formulating the so-called Efficient Market Hypothesis. Stopping times are (future) random times at which you (or somebody else) will have taken some action depending on the information made available at that future time. These are basic for understanding American-style contracts, for example, or any kind of situation in which investors are free to choose their time of action. Finally, the abstract approach will allow us to change probabilities, and clarify the concept of risk-neutral investors as (hypothetical) investors who accord different probabilities to the same events as non-risk-neutral ones, to the effect that they do not need to be rewarded for risk taking.
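As a small illustration of the "fair game" idea (a simulation sketch; the number of paths, rounds and the conditioning event are arbitrary choices), consider repeatedly tossing a fair coin and keeping the running total of gains and losses. Even after conditioning on information available part-way through the game, such as being ahead at that point, the expected future gain is still zero:

    import numpy as np

    # Sketch: a symmetric coin-tossing game as a martingale.  Even after
    # conditioning on how the game has gone so far (here: being ahead after
    # 25 rounds), the expected future gain is still zero.  Sizes are illustrative.
    rng = np.random.default_rng(6)
    paths, rounds = 200_000, 50

    steps = rng.choice([-1, 1], size=(paths, rounds))   # fair coin: win or lose 1
    S = steps.cumsum(axis=1)                            # running total per path

    ahead = S[:, 24] > 0                                # information at round 25
    future_gain = S[:, -1] - S[:, 24]                   # gain over rounds 26..50
    print(future_gain[ahead].mean())                    # ~ 0: still a fair game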

Starred remarks, examples, etc., mostly serve to put the material in a wider mathematical context, and can be skipped without loss of continuity.

Chapter 2

Review of probability theory (19th-century style)

    2.1 Real random variables

We shall initially take an informal approach to probability theory, using "probability" and "random variables", for the moment, as unexplained, primitive notions of the theory (much like points and lines in Euclidean geometry), for which we will put our trust in commonly shared intuition. In particular, a real-valued random variable will be a quantity that can take different values, but for which we do know the various probabilities that it will lie in any given interval of real numbers. That is, a real random variable X will (at this stage of the theory) be characterized by various probabilities, e.g.,

P(a ≤ X ≤ b) := (probability that X will lie between a and b), (2.1)

which, by definition, will be a number between 0 and 1:

P(a ≤ X ≤ b) ∈ [0, 1].

Here P stands for "probability", and a and b can be any pair of real numbers. We also allow a or b to be −∞ or +∞, respectively. By convention, X < +∞ and X > −∞ are trivially fulfilled statements, whose probability is 1, and X < −∞ is an empty statement, whose probability is 0. We obviously want P(a ≤ X ≤ b) to be a number between 0 and 1.

Random variables will systematically be denoted by capital letters X, Y, Z, etc., while ordinary real numbers will be denoted by lower-case letters x, y, z (this convention will later on be extended to vector-valued random variables and ordinary vectors).

Discrete random variables are easier to describe. These only take values in some discrete (possibly infinite) set of real numbers {x_1, x_2, . . .}. Such a discrete random variable X is completely determined by the probabilities

p_j = P(X = x_j). (2.2)

This clearly implies that

P(a ≤ X ≤ b) = Σ_{j: a ≤ x_j ≤ b} p_j.

In particular, this probability is 0 if none of the x_j's lie between a and b.

The following is an example of a discrete random variable which is basic in mathematical finance.

Example 2.1. In the binomial option pricing model we suppose that S_N, the price of the underlying security N days into the future, can take on the values

d^N S_0, d^{N−1}u S_0, . . . , d^{N−j}u^j S_0, . . . , u^N S_0, for 0 ≤ j ≤ N,

S_0 being today's price and u and d two fixed positive real numbers (we are assuming that, in any one day, the stock's price can only go up or down by a fixed factor, u respectively d). The probability that S_N is any of these values is then defined by

P(S_N = d^{N−j}u^j S_0) = (N choose j) p^j (1 − p)^{N−j}.

Here p is the probability of a daily up move (price moving from S to uS from one day to the next), and 1 − p that of a down move (S → dS).

Exercise 2.2. To have a well-defined discrete random variable, we need

Σ_j p_j = 1.

(Why?) Check that this is indeed the case for the binomial model defined just now.

    Hint. Use the binomial theorem.
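As a numerical sanity check of Exercise 2.2 (not a substitute for the binomial-theorem argument), the probabilities of Example 2.1 can be listed and summed directly; the values of N and p below are illustrative:

    import math

    # Sketch: the binomial-model probabilities of Example 2.1 sum to 1.
    # N and p are illustrative choices.
    N, p = 5, 0.6

    probs = [math.comb(N, j) * p**j * (1 - p)**(N - j) for j in range(N + 1)]
    print(probs)        # probability of j up moves, j = 0, ..., N
    print(sum(probs))   # = 1.0, as the binomial theorem guarantees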

Another important example of a discrete random variable is a Poisson random variable.

Example 2.3. A Poisson random variable N is a discrete random variable, taking its values in N = {0, 1, 2, . . .}, for which

P(N = k) = p_k = (λ^k/k!) e^{−λ}.

Here λ > 0 is a parameter. Note that Σ_{k=0}^∞ p_k = 1 because e^λ = Σ_{k=0}^∞ λ^k/k!.
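For concreteness, the Poisson probabilities can be tabulated and summed numerically; the rate λ = 2.5 and the truncation point are illustrative choices:

    import math

    # Sketch: Poisson probabilities P(N = k) = lam**k / k! * exp(-lam), Example 2.3.
    lam = 2.5

    p = [lam**k / math.factorial(k) * math.exp(-lam) for k in range(50)]
    print(p[:5])    # P(N = 0), ..., P(N = 4)
    print(sum(p))   # ~ 1.0: truncating at k = 50 loses a negligible amount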

One can go a long way using only discrete random variables, but at some point it becomes extremely convenient¹ to have continuous random variables at one's disposal. These do not take on any particular real value with a non-zero probability; rather, their probable values are, so to speak, spread out over entire intervals, and often even over the whole of R. For such a random variable X we will have that P(X = a) = 0 for any a ∈ R, but typically P(a ≤ X ≤ a + ε) ≠ 0 for any ε > 0. A very important example of such a random variable is a standard normal random variable, as follows.

Example 2.4. X is called a standard normal random variable (we also use the term Gaussian random variable) if, for any a < b,

P(a < X ≤ b) = ∫_a^b e^{−x²/2} dx/√(2π).
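In practice such probabilities are evaluated numerically; a minimal sketch using the error function (the endpoints a and b are illustrative choices):

    import math

    # Sketch: evaluate P(a < X <= b) for a standard normal X, as in Example 2.4.
    def Phi(x):
        # standard normal cumulative probability P(X <= x)
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    a, b = -1.0, 1.0
    print(Phi(b) - Phi(a))   # ~ 0.6827: the familiar "one sigma" probability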

¹And even essential: see the central limit theorem below!


The condition that P(−∞ < X < ∞) = 1 corresponds to the total probability ∫_{−∞}^∞ e^{−x²/2} dx/√(2π) being equal to 1.

More generally, the probabilities (2.1) of any real random variable X are conveniently summarized by its cumulative distribution function (cdf),

F_X(x) := P(X ≤ x), x ∈ R. (2.3)

A cdf is non-decreasing, tends to 0 as x → −∞ and to 1 as x → +∞, and is right-continuous:

F_X(x + ε) → F_X(x) as ε ↓ 0, for all x ∈ R.

* Remark 2.7. The mathematical reason for this third property is perhaps not yet very clear at this point; the right-continuity is in fact connected with the fact that we have a ≤-sign in (2.3); if we had defined F_X with a strict inequality sign instead, we would have obtained a left-continuous function.


Definition 2.8. A random variable X is said to have a probability density function, or pdf, if its cdf is of the form

F_X(x) = ∫_{−∞}^x f(y) dy,

for some (integrable) function f : R → R. This function f is (essentially²) unique, and we often write f = f_X to stress the dependence on X. We sometimes write X ∼ f_X for "X has pdf f_X".

If X has a pdf, F_X is continuous, and for reasonable (say, continuous) f, F_X will also be differentiable, with derivative

F′_X(x) = f(x).

A standard normal variable has pdf

e^{−x²/2}/√(2π).

Discrete random variables, such as Poisson random variables, do not have a pdf, since their cdfs have jumps and are therefore not continuous.

* Remark 2.9. One can construct very curious cdfs which are continuous everywhere, have derivatives at almost all³ their points, but which do not have a pdf. If this derivative is equal to 0 almost everywhere, such a cdf (and its associated random variable) is called totally singular. Note that the condition of being continuous prevents such a cdf from having jumps; in particular, P(X = a) will still be 0, for any a ∈ R. Such a cdf is still far away from being the cdf of a discrete random variable.

A function f = f(x) will be the pdf of a random variable X iff⁴ the function F(x) = ∫_{−∞}^x f(y) dy has the properties of a cdf. This leads to the following characterizing properties of a pdf:

- f(x) ≥ 0 everywhere (corresponding to F_X being increasing);
- ∫_{−∞}^∞ f(x) dx = 1 (corresponding to F_X(x) → 1 as x → ∞ or, equivalently, to the total probability having to sum to 1).

Continuity, and therefore right-continuity, is automatic for functions F(x) which can be written as integrals.

Random variables X having a pdf are the easiest to work with. Although the probability that such an X will take on precisely the value x ∈ R is equal to 0 for any real x, there is a useful alternative. Let dx be a (calculus-style) infinitesimal:

dx ≠ 0, (dx)² = (dx)³ = ⋯ = 0.

²We can, for example, change the value of f(x) at a finite number of points x without changing the integral.

³A term whose precise technical sense we will explain when discussing measure-theoretic probability, but which in the present case basically means that the set of points where this cdf does not have a derivative has length 0.

⁴A very useful abbreviation, standing for "if and only if".


(Such infinitesimals do not really exist, but we think of them as numbers which are so small that their squares may be safely neglected in any computation. An operative definition, in a computing context, might be to take dx so small that all the significant digits of its square are equal to 0, within machine precision or within the precision which is significant for the problem at hand.) We then think of f_X(x) dx as being the probability that X lies in the infinitesimally small interval between x and x + dx:

P(X ∈ [x, x + dx]) = f_X(x) dx.

Often we will be quite sloppy in our notation, and simply write

P(X = x) = f_X(x),

although, strictly speaking, the left-hand side is 0 here.

To see how this works, consider the definition of the mean of a continuous random variable X. The mean of a discrete random variable is equal to the sum of the possible values it can assume times the probability that it will take on that value:

Σ_j a_j P(X = a_j).

For a random variable X having a pdf we would like to take the sum over all possible x of x · P(X ∈ [x, x + dx]). The continuous analogue of a sum being an integral, this leads to the following definition:

E(X) = ∫_R x f(x) dx. (2.4)

More generally, and for similar reasons, when we consider a function g(X) of such a random variable X, its mean is given by the very important formula

E(g(X)) = ∫_R g(x) f(x) dx, (2.5)

provided this integral makes sense and is finite. The formula is very important because, typically in finance, option prices can be expressed as means, and in 99% (or perhaps even 100%) of the cases you will be using (2.5) when evaluating the price analytically, and sometimes even when evaluating it numerically.
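As an illustration of how (2.5) is used in pricing, the sketch below computes the mean of a discounted call payoff under a lognormal model for the terminal price. The model and all parameter values (S0, K, r, sigma, T) and the integration grid are illustrative assumptions, not taken from the notes; the point is only that the price is an integral of the form ∫ g(x) f(x) dx.

    import numpy as np

    # Sketch: a European call price as a mean, using (2.5).  Assume X is standard
    # normal and S_T = S0*exp((r - sigma**2/2)*T + sigma*sqrt(T)*X).
    S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0

    x = np.linspace(-8.0, 8.0, 20001)
    dx = x[1] - x[0]
    f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)          # standard normal pdf
    S_T = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * x)
    g = np.exp(-r * T) * np.maximum(S_T - K, 0.0)       # discounted call payoff

    print(np.sum(g * f) * dx)   # E(g(X)); about 10.45, the Black-Scholes value here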

Particular examples of (2.5) are of course the mean E(X) of X itself (corresponding to g(x) = x) and, putting μ_X := E(X), the variance of X,

var(X) = E((X − μ_X)²) = ∫_R (x − μ_X)² f(x) dx, (2.6)

corresponding to taking g(x) = (x − μ_X)² and again assuming the integral is finite. We often write

var(X) = σ_X²,

where σ_X = √var(X) is the standard deviation; σ_X is a measure of how much, in the mean, X differs from its mean μ_X.⁵ A very useful computational rule is that

var(X) = E(X²) − (E(X))²,

which is left as an easy exercise.

⁵Another measure of this deviation could be something like E(|X − μ_X|), but experience has taught us that quadratic expressions such as variances are much easier to compute with.


Higher moments are often also very useful in finance, in particular the following two:

skewness: s(X) = E((X − μ)³/σ³) = (1/σ³) ∫_R (x − μ)³ f(x) dx, (2.7)

kurtosis: κ(X) = E((X − μ)⁴/σ⁴) = (1/σ⁴) ∫_R (x − μ)⁴ f(x) dx. (2.8)

Skewness is an indication of whether the pdf is tilted to the right or to the left of its mean: if s(X) > 0, then X is more likely to exceed μ than to be less than μ, and vice versa. A large kurtosis is an indication that |X − μ| can take large values with relatively high probability. These quantities play an important role in the econometric analysis of financial returns. The following example computes them for the benchmark case of a normal random variable with arbitrary mean and variance.

Example 2.10 (general normal or Gaussian variables). A random variable X is said to be normally distributed with mean μ and variance σ² if X has probability density

(1/√(2πσ²)) e^{−(x−μ)²/2σ²}. (2.9)

In this case we write

X ∼ N(μ, σ²).

The standard normal random variable corresponds to μ = 0 and σ = 1. We easily check that this is a correct definition, since this function is non-negative, and since its total probability is equal to 1:

∫_R (1/√(2πσ²)) e^{−(x−μ)²/2σ²} dx = 1.

(The easiest way to see this is by making the successive changes of variables x → x + μ and x → σx to reduce to the case of a standard normal variable.)

If X ∼ N(μ, σ²), then its mean, variance, skewness and kurtosis are, respectively,

mean: ∫_R x e^{−(x−μ)²/2σ²} dx/√(2πσ²) = μ,

variance: ∫_R (x − μ)² e^{−(x−μ)²/2σ²} dx/√(2πσ²) = σ²,

skewness: (1/σ³) ∫_R (x − μ)³ e^{−(x−μ)²/2σ²} dx/√(2πσ²) = 0 (by symmetry),

kurtosis: (1/σ⁴) ∫_R (x − μ)⁴ e^{−(x−μ)²/2σ²} dx/√(2πσ²) = 3.

(To do these integrals, first make a change of variables, as above, to get rid of μ and σ.)

If X is any, not necessarily normally distributed, random variable, we often compare its kurtosis with that of a normally distributed random variable having the same variance. This leads to the concept of excess kurtosis:

κ_exc(X) = κ(X) − 3. (2.10)


If the excess kurtosis is positive, the pdf of X is interpreted to have more probability mass in the tails, or "fatter tails", than that of a comparable normal distribution, with the same mean and variance as X.
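Sample versions of these quantities are easy to compute. The sketch below (the sample size and the Student-t comparison are illustrative choices, not from the notes) recovers skewness ≈ 0 and kurtosis ≈ 3 for normal data, and a clearly positive excess kurtosis for a fat-tailed alternative:

    import numpy as np

    # Sketch: sample skewness and kurtosis, compared with the normal benchmark
    # (skewness 0, kurtosis 3).
    rng = np.random.default_rng(0)

    def skew_kurt(x):
        mu, sigma = x.mean(), x.std()
        z = (x - mu) / sigma
        return (z**3).mean(), (z**4).mean()

    normal_sample = rng.standard_normal(1_000_000)
    t_sample = rng.standard_t(df=5, size=1_000_000)   # fat-tailed alternative

    print(skew_kurt(normal_sample))  # ~ (0, 3): zero excess kurtosis
    print(skew_kurt(t_sample))       # kurtosis well above 3: positive excess kurtosis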

* Example 2.11. Random variables do not need to have a well-defined mean or variance: the following is a classical example, dating back to the 1800s, when it caused much controversy among the French probabilists. A random variable X is said to be Cauchy, or have a Cauchy distribution, if its pdf is given by

(1/π) · 1/(1 + x²).

This gives rise to a well-defined random variable, since

(1/π) ∫_R dx/(1 + x²) = 1,

as we can easily check, using that a primitive of (1 + x²)^{−1} is arctan x. Is the mean of X well-defined? This is not quite clear. On the one hand we might argue that (briefly forgetting about the 1/π in front)

∫_{−∞}^∞ x/(1 + x²) dx = lim_{R→∞} ∫_{−R}^R x/(1 + x²) dx = 0,

by symmetry. On the other hand, if we take the limit of the integrals over asymmetric intervals expanding to the whole of R, the answer comes out quite differently. For example:

lim_{R→∞} ∫_{−R}^{2R} x/(1 + x²) dx = lim_{R→∞} [½ log(1 + x²)]_{−R}^{2R} = lim_{R→∞} ½ log((1 + 4R²)/(1 + R²)) = log 2 ≠ 0!

(Here log stands for the natural logarithm, with basis e.) What is going on here? Mathematically speaking,

∫_{−∞}^∞ f(x) dx = lim_{a→−∞, b→∞} ∫_a^b f(x) dx

will be well-defined, that is, independent of the way a and b tend to −∞ and +∞, if

lim_{R→∞} ∫_{−R}^R |f(x)| dx < ∞.


Even if we argue that the symmetric definition of the mean is natural, since in our example the pdf is symmetric, and therefore set μ_X = E(X) = 0, we run into problems with the variance, since it is clear that, e.g.,

lim_{R→∞} ∫_{−R}^R x²/(1 + x²) dx = ∞.

(Use integration by parts, or estimate the integral from below by, for example, ∫_1^R x²/(1 + x²) dx ≥ ∫_1^R (1/2) dx = (R − 1)/2.)

The Cauchy distribution is a particular example of a more general class of distributions called the Lévy stable distributions, which also include the normal distribution (which is in fact the only member of this class having a finite variance), and to which we will (hopefully) devote some time during these lectures. Lévy stable distributions have been proposed as more accurate models of financial asset returns than the traditional normal distributions (following pioneering work by Mandelbrot and Fama in the 1960s), but are more difficult to work with for a number of technical reasons, not the least of which is their infinite variance.
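The practical consequence of an undefined mean is easy to see in simulation: the running sample mean of Cauchy draws never settles down, in contrast to the normal case. (A sketch; the sample sizes and seed are illustrative choices.)

    import numpy as np

    # Sketch: sample means of Cauchy draws do not converge, unlike normal draws,
    # illustrating the undefined mean of Example 2.11.
    rng = np.random.default_rng(1)

    for n in (10**3, 10**5, 10**7):
        print(n, rng.standard_cauchy(n).mean(), rng.standard_normal(n).mean())
    # The Cauchy column keeps jumping by O(1) amounts as n grows;
    # the normal column shrinks towards 0 at rate ~ 1/sqrt(n).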

* Remark 2.12. How would one define the mean E(X) and, more generally, E(g(X)) if X is neither discrete nor has a probability density? This is not quite so obvious, but it turns out that for reasonable functions g = g(x) : R → R the following definition is a natural generalization of (2.5):

E(g(X)) = lim_{N→∞} Σ_{j=−∞}^∞ g(j/N) (F_X((j + 1)/N) − F_X(j/N)). (2.11)

This formula is motivated by the classical construction of an integral ∫_a^b g(x) dx as a limit of sums over rectangles filling in the surface under the graph of g, as you may recall from your calculus course (indeed, the latter corresponds formally to taking F_X(x) = x, although this is not a cdf). We will denote the right-hand side of (2.11) by

∫_R g(x) dF_X(x), (2.12)

with the understanding that if F_X is differentiable (and X has a pdf F′_X = f_X), then

dF_X(x) = f_X(x) dx,

so that we get (2.5) again.
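Formula (2.11) can also be used directly as a computational recipe. The sketch below approximates E(X²) for a standard normal X from its cdf alone; the truncation of the sum and the value of N are illustrative choices.

    import math

    # Sketch: approximate E(g(X)) via the sum (2.11), for X standard normal and
    # g(x) = x**2 (exact answer 1).
    def Phi(x):                      # standard normal cdf
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    def stieltjes_mean(g, F, N=1000, j_max=8000):
        total = 0.0
        for j in range(-j_max, j_max):          # truncate the infinite sum
            total += g(j / N) * (F((j + 1) / N) - F(j / N))
        return total

    print(stieltjes_mean(lambda x: x**2, Phi))  # ~ 1.0 = var of a standard normal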

* Exercise 2.13. Show that, if X is a discrete random variable, taking values in {a_1, a_2, . . .} with probabilities p_1, p_2, . . ., then for continuous g,

∫_R g(x) dF_X = Σ_j p_j g(a_j). (2.13)

As special cases we re-obtain the classical formulas for the mean and variance of a discrete random variable:

E(X) = μ_X = Σ_{j=1}^n p_j a_j,

and

var(X) = Σ_{j=1}^n (a_j − μ_X)² p_j.

What about (2.13) when g has a jump in x = a_1 and is right-continuous there? And what if it is left-continuous? (Take a_1 = 0, to simplify.)

    Exercise 2.14. Compute the mean and variance of a Poisson random variable.

Exercise 2.15. Let X be a random variable with pdf f = f_X. Show that X² also has a pdf, which is given by

(1/(2√x)) (f(√x) + f(−√x)).

As an application, compute the pdf of X² when X is standard normal. The result is called a χ²(1)- or χ²-distribution with one degree of freedom.

Exercise 2.16. Let Z be a standard normal variable. Compute the pdf of X = e^Z; this is called a log-normal distribution. Compute the mean and variance of X.

Exercise 2.17. A Student t-distribution with ν > 2 degrees of freedom has a pdf of the form

t_ν(x) = C_ν (1 + x²/ν²)^{−ν/2},

C_ν being a normalization constant put there to ensure that ∫ t_ν(x) dx = 1.

Show that if X is Student, then its mean exists. Also show that its variance exists iff ν > 3, and its skewness and kurtosis iff, respectively, ν > 4 and ν > 5.

Hint. Use that the integral ∫_1^∞ dx/x^α is finite iff α > 1.

2.2 Random vectors and families of random variables

Consider now a vector of real random variables (X_1, . . . , X_N). How do we capture the probabilistic behaviour of such a random vector? Clearly, we must know the probability distribution F_{X_j} of each X_j individually, but we need to know more. For example, we also need to know joint probabilities, e.g., the probability that a_1 < X_1 ≤ b_1 and a_2 < X_2 ≤ b_2. In fact, all this information can be obtained from the joint distribution function,

F_{X_1,...,X_N}(x_1, . . . , x_N) := P(X_1 ≤ x_1, X_2 ≤ x_2, . . . , X_N ≤ x_N), (2.14)

the probability that, simultaneously, X_1 ≤ x_1 and X_2 ≤ x_2, etc. One can get joint probabilities such as

P(a_1 < X_1 ≤ b_1, a_2 < X_2 ≤ b_2, . . . , a_N < X_N ≤ b_N) (2.15)

from (2.14) by algebraic manipulations, but we will leave this as an exercise for the interested reader (the answer can be found in most books on probability theory; try to work it out for two variables (X_1, X_2)).


We will again mostly work with random vectors (X_1, . . . , X_N) which have a multivariate probability density, in the (obvious) sense that there exists a function f_{X_1,...,X_N} : R^N → R_{≥0} such that

F_{X_1,...,X_N}(x_1, . . . , x_N) = ∫_{−∞}^{x_1} ⋯ ∫_{−∞}^{x_N} f_{X_1,...,X_N}(y_1, . . . , y_N) dy_1 ⋯ dy_N. (2.16)

We then say that (X_1, . . . , X_N) has joint pdf f_{X_1,...,X_N}. Note that in this case (2.15) simply equals

∫_{a_1}^{b_1} ⋯ ∫_{a_N}^{b_N} f_{X_1,...,X_N} dy_1 ⋯ dy_N,

where, to simplify the formulas, we will often leave out the variables of f_{X_1,...,X_N}. Note that the definition of a joint pdf implies that

f_{X_1,...,X_N}(x_1, . . . , x_N) = ∂^N F_{X_1,...,X_N}(x_1, . . . , x_N)/∂x_1 ⋯ ∂x_N.

If (X_1, . . . , X_N) has joint pdf f_{X_1,...,X_N}, then the natural definition of the expectation of a function g(X_1, . . . , X_N) of X_1, . . . , X_N is

E(g(X_1, . . . , X_N)) := ∫_R ⋯ ∫_R g(x_1, . . . , x_N) f_{X_1,...,X_N}(x_1, . . . , x_N) dx_1 ⋯ dx_N. (2.17)

We will usually write this more briefly as

E(g(X_1, . . . , X_N)) = ∫_{R^N} g f_{X_1,...,X_N} dx,

with dx = dx_1 ⋯ dx_N. In particular, we define the means μ_{X_j} and the covariances cov(X_i, X_j) by

μ_{X_j} = E(X_j) = ∫_{R^N} x_j f_{X_1,...,X_N} dx, (2.18)

and

cov(X_i, X_j) = E((X_i − μ_{X_i})(X_j − μ_{X_j})) = ∫_{R^N} (x_i − μ_{X_i})(x_j − μ_{X_j}) f_{X_1,...,X_N} dx. (2.19)

The following is the multivariable generalization of a normal random variable.

Example 2.18 (jointly normally distributed random vectors). Let V = (V_{ij})_{1≤i,j≤N} be a non-singular symmetric N × N matrix:

V_{ij} = V_{ji} ∈ R, det(V) ≠ 0.

We say that (X_1, . . . , X_N) are jointly normally distributed with mean μ = (μ_1, . . . , μ_N) and variance–covariance matrix V if their joint pdf is equal to

f_{X_1,...,X_N}(x_1, . . . , x_N) = exp(−⟨x − μ, V^{−1}(x − μ)⟩/2) / ((2π)^{N/2} √det(V)). (2.20)


Here V^{−1}x stands for the inverse of V applied to x = (x_1, . . . , x_N) ∈ R^N, and ⟨·, ·⟩ stands for the Euclidean inner product on R^N:

⟨x, y⟩ = x_1 y_1 + ⋯ + x_N y_N

(often written as x^t y, where t stands for transpose).

Remark 2.19. We can check that if

(X_1, . . . , X_N) ∼ N(μ, V),

then

E(X_j) = μ_j, and cov(X_i, X_j) = V_{ij}.

This can be verified directly by using a bit of linear algebra and multivariable calculus, by diagonalizing V using a suitable rotation of R^N, and using the change of variables formula for multiple integrals; details are left to the interested reader. An easier and perhaps more natural way to deal with multivariate normals will be introduced in Section 3.4 below.
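A quick Monte Carlo check of Remark 2.19 (the mean vector, covariance matrix and sample size below are illustrative choices, not from the notes):

    import numpy as np

    # Sketch: sample mean and sample covariance of multivariate normal draws
    # recover mu and V, as stated in Remark 2.19.
    rng = np.random.default_rng(2)
    mu = np.array([1.0, -2.0])
    V = np.array([[2.0, 0.6],
                  [0.6, 1.0]])        # symmetric, non-singular

    X = rng.multivariate_normal(mu, V, size=1_000_000)
    print(X.mean(axis=0))             # ~ mu
    print(np.cov(X, rowvar=False))    # ~ V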

From the joint distribution of (X_1, . . . , X_N) we can reconstruct the single cdfs of the X_j by taking the "marginals" of F_{X_1,...,X_N}. For example, since, trivially, any X_j < ∞,

F_{X_1}(x_1) = P(X_1 ≤ x_1, X_2 < ∞, . . . , X_N < ∞) = lim_{x_2,...,x_N→∞} F_{X_1,...,X_N}(x_1, x_2, . . . , x_N),

and similarly for the other X_j.


We will very soon need to go beyond finite vectors of random variables, and consider infinite families of these. Indeed, a continuous-time stochastic process is defined as a collection of random variables (X_t)_{t≥0}, one for each positive t, the latter playing the role of time. How do we specify such a stochastic process? This turns out to be a bit delicate, in particular as to the question of when to identify two such stochastic processes, but for the moment we will use the following working definition.

Definition 2.21 (stochastic processes, provisional working definition). A continuous-time stochastic process is a collection of random variables X_t, one for each t ≥ 0, such that, for any finite collection of times {t_1, t_2, . . . , t_N}, we know the joint probability distribution F_{X_{t_1},...,X_{t_N}} : R^N → [0, 1] of (X_{t_1}, . . . , X_{t_N}) (here N can be arbitrarily big).

* Remark 2.22. The delicacy here resides in the fact that t ranges over a continuous set. Discrete-time stochastic processes (X_n)_{n∈N} are less problematic in the sense that these are completely determined by all joint distributions F_{X_1,...,X_N}, for arbitrarily large N.

For continuous t we usually include a (left- or right-) continuity condition on the sample trajectories t ↦ X_t. To properly define the latter one needs to take the measure-theoretic approach to probability, which will be sketched later in these lectures.

2.3 Independent random variables and conditional probabilities

The concept of an independent random variable is basic in probability and statistics. Let us consider a pair of random variables (X, Y) with joint probability distribution F_{X,Y}.

Definition 2.23 (independent random variables). Two random variables X and Y are independent if, for all x, y,

F_{X,Y}(x, y) = F_X(x) F_Y(y). (2.21)

We can easily check that if (X, Y) has a joint pdf, then X and Y are independent iff

f_{X,Y}(x, y) = f_X(x) f_Y(y), (2.22)

where f_X and f_Y are the marginal pdfs,

f_X(x) = ∫_{−∞}^∞ f_{X,Y}(x, y) dy, etc.

We can easily show from this that E(XY) = E(X)E(Y) (the double integral just becomes a product of two one-dimensional integrals). More generally (and for the same reason) we have the following result.

Proposition 2.24.

X, Y independent ⟹ E(g(X)h(Y)) = E(g(X)) E(h(Y)), (2.23)

for any two functions g = g(x) and h = h(y) for which these expectations are well-defined.


* Remark 2.25. Having the right-hand side of (2.23) hold for a sufficiently large class of functions g and h, for example, for all bounded continuous functions, is in fact equivalent to Definition 2.23.

If we recall that the covariance of two random variables X and Y is defined as

cov(X, Y) = E((X − μ_X)(Y − μ_Y)),

where μ_X = E(X) and μ_Y = E(Y) are the means, then (2.23) implies

X, Y independent ⟹ cov(X, Y) = 0.

    For jointly normal random variables there is a well-known converse:

Proposition 2.26. If (X, Y) is jointly normally distributed, then cov(X, Y) = 0 implies that X and Y are independent.

More generally, if X = (X_1, . . . , X_N) ∼ N(μ, V), then its components X_1, . . . , X_N are all independent iff V_{ij} = 0 for all i ≠ j, that is, iff V is diagonal.

The proof is quite easy, since if V is diagonal, then the pdf f_X simply becomes a product

∏_i (1/√(2πV_{ii})) e^{−(x_i − μ_i)²/2V_{ii}}

of univariate normal distributions whose means are equal to μ_i and whose variances are V_{ii}, which are the pdfs of the X_i.

Warning! Proposition 2.26 is not true if X and Y are not jointly normal: having covariance 0 is in general much weaker than being independent, as the following example shows.

Example 2.27. Let X ∼ N(0, 1) be a standard normal random variable, and let Y = X² − 1. Then E(X) = E(Y) = 0, the latter holding because E(X²) = 1, and

cov(X, Y) = E(XY) = E(X(X² − 1)) = E(X³) − E(X) = 0,

recalling that E(X) = E(X³) = 0. Thus X and Y have zero covariance. However, they are not independent: intuitively this is clear, since Y is simply a function of X, and therefore as dependent on X as can be! Formally, if we take g(x) = x² − 1 and h(x) = x, then

E(g(X)h(Y)) = E((X² − 1)²) > 0,

contradicting independence, by Proposition 2.24.
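The same conclusion can be checked by simulation (a sketch; the sample size and seed are arbitrary choices): the sample covariance of X and Y = X² − 1 is essentially zero, while E(g(X)h(Y)) and E(g(X))E(h(Y)) are clearly different.

    import numpy as np

    # Sketch of Example 2.27 by simulation: X and Y = X**2 - 1 are uncorrelated
    # but not independent.
    rng = np.random.default_rng(3)
    X = rng.standard_normal(1_000_000)
    Y = X**2 - 1

    print(np.cov(X, Y)[0, 1])                    # ~ 0: zero covariance
    g, h = X**2 - 1, Y                           # g(X) = X**2 - 1, h(Y) = Y
    print((g * h).mean(), g.mean() * h.mean())   # ~ 2 versus ~ 0: not independent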

Working a little harder, we can find a joint pdf f_{X,Y} such that cov(X, Y) = 0 and both its marginals are normal, but for which X and Y are still not independent. This shows that we have to be very careful when formulating Proposition 2.26: we cannot replace "(X, Y) normally distributed" by "both X and Y normally distributed".

Recalling the definition of the linear correlation coefficient,

ρ(X, Y) := cov(X, Y)/√(var(X) var(Y)), (2.24)

which is always a number between −1 and 1, we see that X and Y independent implies that ρ(X, Y) = 0, but the converse is not true, except again for jointly Gaussian random variables.

The previous considerations generalize naturally to N-tuples of random variables: X_1, . . . , X_N will be called independent iff

F_{X_1,...,X_N}(x_1, . . . , x_N) = F_{X_1}(x_1) ⋯ F_{X_N}(x_N), (2.25)

for any (x_1, . . . , x_N) ∈ R^N, and we have the natural generalization of (2.23), which we will leave as an exercise.

We next turn to the concept of conditional probability for pairs of random variables. The discussion here will be limited to random variables having densities. The general case needs a more abstract approach, which will be given later during these lectures.

Recall, from elementary probability theory, that the conditional probability of some event A happening, given that B has (or will have) happened, is defined as

P(A | B) = P(A and B)/P(B).

For X, Y two random variables with probability densities f_X and f_Y, we can therefore compute the conditional probability of X being in [x, x + dx], given that Y is in [y, y + dy], as

P(X ∈ (x, x + dx) | Y ∈ (y, y + dy)) = P(X ∈ (x, x + dx), Y ∈ (y, y + dy)) / P(Y ∈ (y, y + dy))
= f_{X,Y}(x, y) dx dy / (f_Y(y) dy)
= (f_{X,Y}(x, y)/f_Y(y)) dx, (2.26)

assuming of course that f_Y(y) ≠ 0. We will often simply write this as

P(X = x | Y = y) = f_{X,Y}(x, y)/f_Y(y),

forgetting about the dx, and read the left-hand side as "the probability density of X, given that Y = y".

If X and Y are independent, (2.26) simplifies to

P(X = x | Y = y) = f_X(x), (2.27)

which corresponds to intuition: if X and Y are independent, the probability of X taking on the value x does not depend on what value Y has taken.

We record the following useful formulas:

P(a < X < b and c < Y < d) = ∫_a^b ∫_c^d f_{X,Y}(x, y) dy dx
= ∫_a^b ∫_c^d P(X = x | Y = y) f_Y(y) dy dx
= ∫_c^d P(a ≤ X ≤ b | Y = y) f_Y(y) dy.

Again, the concept of conditional probability density generalizes in a natural way from pairs of random variables to arbitrarily many random variables; only the notation gets a bit more involved. For example, consider an N-tuple of random variables X = (X_1, . . . , X_N), with joint pdf f_X = f_{X_1,...,X_N}, and pick a k, 1 ≤ k ≤ N. If x = (x_1, . . . , x_N) ∈ R^N, we split x as

x = (x′, x″), with x′ = (x_1, . . . , x_k) ∈ R^k and x″ = (x_{k+1}, . . . , x_N) ∈ R^{N−k}.

Similarly, we write

X = (X′, X″), where X′ = (X_1, . . . , X_k) and X″ = (X_{k+1}, . . . , X_N).

We can then write, symbolically, that f_X = f_{X′,X″}. The probability density of X′, given that X″ = x″, is then equal to

P(X′ = x′ | X″ = x″) = f_X(x′, x″)/f_{X″}(x″), (2.28)

where f_{X″}(x″) is the marginal distribution of X″, obtained by integrating out the x′-variables:

f_{X″}(x″) = ∫_{R^k} f_X(x′, x″) dx′ = ∫_R ⋯ ∫_R f_{X_1,...,X_N}(x_1, . . . , x_k, x_{k+1}, . . . , x_N) dx_1 ⋯ dx_k.

Conditional densities are very useful in defining stochastic processes. For example, so-called Markov processes can be specified by defining the conditional probability densities

P(X_t = x | X_s = y), 0 ≤ s < t,

together with the defining Markov property that, for any s_1 < ⋯ < s_N < t,

P(X_t = x | X_{s_1} = y_1, . . . , X_{s_N} = y_N) = P(X_t = x | X_{s_N} = y_N).

The last equation is a mathematical way of stating that "the future only depends on the past via the present". The transition probabilities P(X_t = x | X_s = y) will have to satisfy certain consistency conditions, as we will see in Chapter 6.


    2.4 The central limit theorem

In its simplest form, the central limit theorem is concerned with sums of random variables which are independent and identically distributed (usually abbreviated as i.i.d.). The latter means that all X_j have the same probability distribution function: F_{X_1} = F_{X_2} = ⋯. We also require that their mean and variance are finite:

μ := ∫_R x f_{X_j}(x) dx and σ² := ∫_R (x − μ)² f_{X_j}(x) dx < ∞.

Theorem 2.28 (central limit theorem). Let X_1, X_2, . . . be i.i.d. random variables with finite mean μ and finite variance σ², and put S_N := X_1 + ⋯ + X_N. Then, for any fixed a < b,

P(a ≤ (S_N − Nμ)/(σ√N) ≤ b) → ∫_a^b e^{−x²/2} dx/√(2π), as N → ∞.


Since many quantities of practical interest can be regarded as the sum of a lot of small, independent, but basically identically distributed random influences, the CLT suggests that it is reasonable to model them by a normally distributed random variable. This is the basic intuition behind modelling financial asset returns by Gaussian random variables, as in the standard geometric Brownian motion model for stock prices. We have to be careful, however: Theorem 2.28 is for arbitrary but fixed a and b, and basically only tells us something about the behaviour of the centre of the distribution of S_N. Indeed, empirical work during the 1990s (and also much earlier) has shown that actual stock returns have fat-tailed distributions, in the sense that very large or very small returns occur with much larger probabilities than predicted by the normal model.

In the next chapter, we will use the CLT to introduce Brownian motion as a limit of a sequence of random walks, and from there go on to introduce the Itô calculus.
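The CLT is also easy to see numerically. In the sketch below (the uniform summands, the values of N, a, b and the number of trials are illustrative choices), sums of i.i.d. uniforms, centred and scaled as in Theorem 2.28, fall in the interval [a, b] with roughly the standard normal probability:

    import numpy as np

    # Sketch: the CLT in action for sums of i.i.d. uniform(0, 1) variables,
    # which have mean 1/2 and variance 1/12.
    rng = np.random.default_rng(4)
    N, trials = 50, 200_000
    a, b = -1.0, 1.0

    X = rng.uniform(0.0, 1.0, size=(trials, N))
    S = X.sum(axis=1)
    Z = (S - N * 0.5) / np.sqrt(N / 12.0)          # (S_N - N*mu) / (sigma*sqrt(N))

    print(np.mean((a <= Z) & (Z <= b)))            # ~ 0.6827 = P(a <= N(0,1) <= b)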

Exercise 2.29 (moments of the normal distribution). Let Z ∼ N(0, 1) be a standard normal random variable, and let

m_n := E(Z^n) = ∫_R x^n e^{−x²/2} dx/√(2π)

be its nth moment.

    (a) Explain why all odd moments of Z are 0.

(b) Show that the even moments are related by m_{2k} = (2k − 1) m_{2k−2}, and deduce from this that

m_{2k} = (2k)!/(2^k k!).

    Hint. Integrate by parts.

(c) Now consider the odd moments of the absolute value |Z| of Z:

m̄_{2k+1} := E(|Z|^{2k+1}).

Show that, as long as 2k − 1 > 0, m̄_{2k+1} = 2k · m̄_{2k−1}, and deduce from this that

m̄_{2k+1} = 2^k k! √(2/π).

Exercise 2.30. Show that if Z ∼ N(0, σ²), then σ^{−1}Z ∼ N(0, 1). Use this and the previous exercise to compute the moments of Z and of |Z|.
