8/14/2019 Prof[1] F P Kelly - Probability
1/78
Probability
Prof. F.P. Kelly
Lent 1996
These notes are maintained by Paul Metcalfe.
Comments and corrections to [email protected].
Revision: 1.1
Date: 2002/09/20 14:45:43
The following people have maintained these notes.

    – June 2000          Kate Metcalfe
    June 2000 – July 2004  Andrew Rogers
    July 2004 – date       Paul Metcalfe
Contents

Introduction

1 Basic Concepts
  1.1 Sample Space
  1.2 Classical Probability
  1.3 Combinatorial Analysis
  1.4 Stirling's Formula

2 The Axiomatic Approach
  2.1 The Axioms
  2.2 Independence
  2.3 Distributions
  2.4 Conditional Probability

3 Random Variables
  3.1 Expectation
  3.2 Variance
  3.3 Indicator Function
  3.4 Inclusion-Exclusion Formula
  3.5 Independence

4 Inequalities
  4.1 Jensen's Inequality
  4.2 Cauchy-Schwarz Inequality
  4.3 Markov's Inequality
  4.4 Chebyshev's Inequality
  4.5 Law of Large Numbers

5 Generating Functions
  5.1 Combinatorial Applications
  5.2 Conditional Expectation
  5.3 Properties of Conditional Expectation
  5.4 Branching Processes
  5.5 Random Walks

6 Continuous Random Variables
  6.1 Jointly Distributed Random Variables
  6.2 Transformation of Random Variables
  6.3 Moment Generating Functions
  6.4 Central Limit Theorem
  6.5 Multivariate Normal Distribution
Introduction
These notes are based on the course Probability given by Prof. F.P. Kelly in Cam-
bridge in the Lent Term 1996. This typed version of the notes is totally unconnected
with Prof. Kelly.
Other sets of notes are available for different courses. At the time of typing, these courses were:
Probability Discrete Mathematics
Analysis Further Analysis
Methods Quantum Mechanics
Fluid Dynamics 1 Quadratic Mathematics
Geometry Dynamics of D.E.s
Foundations of QM Electrodynamics
Methods of Math. Phys Fluid Dynamics 2
Waves (etc.) Statistical Physics
General Relativity Dynamical Systems
Combinatorics Bifurcations in Nonlinear Convection
They may be downloaded from
http://www.istari.ucam.org/maths/ .
Chapter 1
Basic Concepts
1.1 Sample Space
Suppose we have an experiment with a set Ω of outcomes. Then Ω is called the sample space. A potential outcome ω ∈ Ω is called a sample point.

For instance, if the experiment is tossing coins, then Ω = {H, T}; if the experiment is tossing two dice, then Ω = {(i, j) : i, j ∈ {1, . . . , 6}}.

A subset A of Ω is called an event. An event A occurs if, when the experiment is performed, the outcome ω satisfies ω ∈ A. For the coin-tossing experiment the event of a head appearing is A = {H}, and for the two dice the event of rolling a total of four is A = {(1, 3), (2, 2), (3, 1)}.
1.2 Classical Probability
If Ω is finite, Ω = {ω_1, . . . , ω_n}, and each of the n sample points is equally likely, then the probability of event A occurring is

    P(A) = |A| / |Ω|.
Example. Choose r digits from a table of random numbers. Find the probability that, for 0 ≤ k ≤ 9,

1. no digit exceeds k;
2. k is the greatest digit drawn.

Solution. The event that no digit exceeds k is

    A_k = {(a_1, . . . , a_r) : 0 ≤ a_i ≤ k, i = 1, . . . , r}.

Now |A_k| = (k + 1)^r, so that P(A_k) = ((k + 1)/10)^r.

Let B_k be the event that k is the greatest digit drawn. Then B_k = A_k \ A_{k−1}. Also A_{k−1} ⊂ A_k, so that |B_k| = (k + 1)^r − k^r. Thus

    P(B_k) = ((k + 1)^r − k^r) / 10^r.
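These formulae can be checked by brute force for small r (a quick sketch, not part of the original notes; the function names are mine):

```python
from itertools import product

def p_no_digit_exceeds(k, r):
    # P(A_k): all r digits lie in {0, ..., k}
    return ((k + 1) / 10) ** r

def p_greatest_is(k, r):
    # P(B_k): the greatest digit drawn is exactly k
    return ((k + 1) ** r - k ** r) / 10 ** r

# Cross-check against direct enumeration of all 10**r outcomes for small r.
r = 3
for k in range(10):
    count = sum(1 for digits in product(range(10), repeat=r) if max(digits) == k)
    assert abs(count / 10 ** r - p_greatest_is(k, r)) < 1e-12
```

The events B_k partition the sample space, so their probabilities sum to 1.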
The problem of the points
Players A and B play a series of games. The winner of each game wins a point. The two players are equally skilful, and the stake will be won by the first player to reach a target. They are forced to stop when A is within 2 points of the target and B is within 3 points. How should the stake be divided?
Pascal suggested that the following continuations were equally likely
AAAA AAAB AABB ABBB BBBB
AABA ABBA BABB
ABAA ABAB BBAB
BAAA BABA BBBA
BAAB
BBAA
This makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4, on considering the possible terminations, but those results are not equally likely.
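Pascal's count can be verified by enumerating the 2^4 equally likely continuations directly (a sketch, not from the original notes):

```python
from itertools import product

# A needs 2 more points, B needs 3 more, so at most 4 further games decide it.
# Enumerate the 2**4 equally likely continuations and count those A wins.
a_wins = 0
for games in product("AB", repeat=4):
    a = b = 0
    winner = None
    for g in games:
        if g == "A":
            a += 1
        else:
            b += 1
        if a == 2:
            winner = "A"
            break
        if b == 3:
            winner = "B"
            break
    if winner == "A":
        a_wins += 1

print(a_wins, 16 - a_wins)  # → 11 5
```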
1.3 Combinatorial Analysis
The fundamental rule is:
Suppose r experiments are such that the first may result in any of n_1 possible outcomes and, for each of the possible outcomes of the first i − 1 experiments, there are n_i possible outcomes to experiment i. Let a_i be the outcome of experiment i. Then there are a total of ∏_{i=1}^r n_i distinct r-tuples (a_1, . . . , a_r) describing the possible outcomes of the r experiments.
Proof. Induction.
1.4 Stirling's Formula

For functions g(n) and h(n), we say that g is asymptotically equivalent to h, written g(n) ∼ h(n), if g(n)/h(n) → 1 as n → ∞.

Theorem 1.1 (Stirling's Formula). As n → ∞,

    log( n! / (√(2πn) n^n e^{−n}) ) → 0,

and thus n! ∼ √(2πn) n^n e^{−n}.
We first prove the weak form of Stirling's formula, that log(n!) ∼ n log n.

Proof. log n! = ∑_{k=1}^n log k. Now

    ∫_1^n log x dx ≤ ∑_{k=1}^n log k ≤ ∫_1^{n+1} log x dx,

and ∫_1^z log x dx = z log z − z + 1, so

    n log n − n + 1 ≤ log n! ≤ (n + 1) log(n + 1) − n.

Divide by n log n and let n → ∞ to sandwich (log n!)/(n log n) between terms that tend to 1. Therefore log n! ∼ n log n.
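Numerically, the ratio n!/(√(2πn) n^n e^{−n}) is already close to 1 for modest n (a quick check, not part of the original notes):

```python
import math

# The ratio n! / (sqrt(2*pi*n) * n**n * exp(-n)) should tend to 1 as n grows.
def stirling_ratio(n):
    approx = math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)
    return math.factorial(n) / approx

for n in (5, 20, 100):
    print(n, stirling_ratio(n))
```

The ratio decreases towards 1 from above, roughly like e^{1/(12n)}.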
Now we prove the strong form.
Proof. For x > 0 we have

    1 − x + x² − x³ < 1/(1 + x) < 1 − x + x².

Now integrate from 0 to y to obtain

    y − y²/2 + y³/3 − y⁴/4 < log(1 + y) < y − y²/2 + y³/3.
Let h_n = log( n! e^n / n^{n+1/2} ). Then¹ we obtain

    1/(12n²) − 1/(12n³) ≤ h_n − h_{n+1} ≤ 1/(12n²) + 1/(6n³).

For n ≥ 2, 0 ≤ h_n − h_{n+1} ≤ 1/n². Thus h_n is a decreasing sequence, and

    0 ≤ h_2 − h_{n+1} = ∑_{r=2}^n (h_r − h_{r+1}) ≤ ∑_{r=2}^∞ 1/r².

Therefore h_n is bounded below and decreasing, so is convergent. Let the limit be A. We have obtained

    n! ∼ e^A n^{n+1/2} e^{−n}.
We need a trick to find A. Let I_r = ∫_0^{π/2} sin^r θ dθ. Integrating by parts gives the recurrence I_r = ((r − 1)/r) I_{r−2}, and therefore

    I_{2n} = (2n)!/(2^n n!)² · π/2   and   I_{2n+1} = (2^n n!)²/(2n + 1)!.

Now I_n is decreasing in n, so

    1 ≤ I_{2n}/I_{2n+1} ≤ I_{2n−1}/I_{2n+1} = 1 + 1/(2n) → 1.

But substituting n! ∼ e^A n^{n+1/2} e^{−n} into the formulae for I_{2n} and I_{2n+1}, we get

    I_{2n}/I_{2n+1} → 2π/e^{2A}.

Therefore e^{2A} = 2π, as required.
¹ by playing silly buggers with log(1 + 1/n)
Chapter 2
The Axiomatic Approach
2.1 The Axioms
Let Ω be a sample space. Then a probability P is a real-valued function defined on the subsets of Ω satisfying:

1. 0 ≤ P(A) ≤ 1 for all A ⊆ Ω,
2. P(Ω) = 1,
3. for a finite or infinite sequence A_1, A_2, . . . of disjoint events, P(⋃_i A_i) = ∑_i P(A_i).
The number P(A) is called the probability of event A.
We can look at some distributions here. Consider an arbitrary finite or countable Ω = {ω_1, ω_2, . . .} and an arbitrary collection {p_1, p_2, . . .} of non-negative numbers with sum 1. If we define

    P(A) = ∑_{i : ω_i ∈ A} p_i,

it is easy to see that this function satisfies the axioms. The numbers p_1, p_2, . . . are called a probability distribution. If Ω is finite with n elements, and if p_1 = p_2 = · · · = p_n = 1/n, we recover the classical definition of probability.

Another example: let Ω = {0, 1, . . .} and attach to outcome r the probability p_r = e^{−λ} λ^r / r! for some λ > 0. This is a distribution (as may easily be verified), called the Poisson distribution with parameter λ.
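A quick numerical sanity check (not in the original notes) that these Poisson weights do form a distribution:

```python
import math

# The Poisson weights p_r = exp(-lam) * lam**r / r! are non-negative and sum to 1
# (checked here by truncating the series; lam = 3 is an arbitrary choice).
lam = 3.0
p = [math.exp(-lam) * lam ** r / math.factorial(r) for r in range(100)]
assert all(x >= 0 for x in p)
assert abs(sum(p) - 1) < 1e-12
```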
Theorem 2.1 (Properties of P). A probability P satisfies

1. P(Aᶜ) = 1 − P(A),
2. P(∅) = 0,
3. if A ⊆ B then P(A) ≤ P(B),
4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof. Note that Ω = A ∪ Aᶜ with A ∩ Aᶜ = ∅. Thus 1 = P(Ω) = P(A) + P(Aᶜ). Now we can use this to obtain P(∅) = 1 − P(Ω) = 0. If A ⊆ B, write B = A ∪ (B ∩ Aᶜ), so that P(B) = P(A) + P(B ∩ Aᶜ) ≥ P(A). Finally, write A ∪ B = A ∪ (B ∩ Aᶜ) and B = (B ∩ A) ∪ (B ∩ Aᶜ). Then P(A ∪ B) = P(A) + P(B ∩ Aᶜ) and P(B) = P(B ∩ A) + P(B ∩ Aᶜ), which gives the result.

Theorem 2.2 (Boole's Inequality). For any events A_1, A_2, . . . ,

    P(⋃_{i=1}^n A_i) ≤ ∑_{i=1}^n P(A_i)   and   P(⋃_{i=1}^∞ A_i) ≤ ∑_{i=1}^∞ P(A_i).
Proof. Let B_1 = A_1 and then inductively let B_i = A_i \ ⋃_{k=1}^{i−1} B_k. Thus the B_i are disjoint and ⋃_i B_i = ⋃_i A_i. Therefore

    P(⋃_i A_i) = P(⋃_i B_i) = ∑_i P(B_i) ≤ ∑_i P(A_i),

as B_i ⊆ A_i.
Theorem 2.3 (Inclusion-Exclusion Formula).
    P(⋃_{i=1}^n A_i) = ∑_{S ⊆ {1,...,n}, S ≠ ∅} (−1)^{|S|−1} P(⋂_{j∈S} A_j).
Proof. We know that P(A_1 ∪ A_2) = P(A_1) + P(A_2) − P(A_1 ∩ A_2), so the result is true for n = 2. We also have that

    P(A_1 ∪ · · · ∪ A_n) = P(A_1 ∪ · · · ∪ A_{n−1}) + P(A_n) − P((A_1 ∪ · · · ∪ A_{n−1}) ∩ A_n).

But by distributivity, we have

    P(⋃_{i=1}^n A_i) = P(⋃_{i=1}^{n−1} A_i) + P(A_n) − P(⋃_{i=1}^{n−1} (A_i ∩ A_n)).

Application of the inductive hypothesis yields the result.
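The formula can be checked mechanically on a small sample space. The helper below is a sketch (the names `union_prob`, `events` and `prob` are mine, not from the notes):

```python
from itertools import combinations

def union_prob(events, prob):
    # Inclusion-exclusion: alternating sum over non-empty index subsets S of the
    # probability of the intersection, weighted by (-1)**(|S| - 1).
    total = 0.0
    n = len(events)
    for size in range(1, n + 1):
        for S in combinations(range(n), size):
            inter = set.intersection(*(events[i] for i in S))
            total += (-1) ** (size - 1) * sum(prob[w] for w in inter)
    return total

# Check against the direct probability of the union on a toy sample space.
prob = {w: 1 / 6 for w in range(6)}          # a fair die
events = [{0, 1, 2}, {1, 3}, {2, 3, 5}]
direct = sum(prob[w] for w in set().union(*events))
assert abs(union_prob(events, prob) - direct) < 1e-12
```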
Corollary (Bonferroni Inequalities).

    ∑_{S ⊆ {1,...,n}, 1 ≤ |S| ≤ r} (−1)^{|S|−1} P(⋂_{j∈S} A_j)   ≤ P(⋃_{i=1}^n A_i)   or   ≥ P(⋃_{i=1}^n A_i)

according as r is even or odd. In other words, if the inclusion-exclusion formula is truncated, the error has the sign of the first omitted term and is smaller in absolute value.

Note that the case r = 1 is Boole's inequality.
Proof. The result is true for n = 2. If it is true for n − 1, then it is true for n and 1 ≤ r ≤ n − 1 by the inductive step above, which expresses an n-fold union in terms of two (n − 1)-fold unions. It is true for r = n by the inclusion-exclusion formula.

Example (Derangements). After a dinner, the n guests take coats at random from a pile. Find the probability that at least one guest has the right coat.

Solution. Let A_k be the event that guest k has his own coat.
We want P(⋃_{i=1}^n A_i). Now

    P(A_{i_1} ∩ · · · ∩ A_{i_r}) = (n − r)!/n!,

by counting the number of ways of matching guests and coats once i_1, . . . , i_r have taken their own. Thus, by inclusion-exclusion,

    P(⋃_{i=1}^n A_i) = ∑_{r=1}^n (−1)^{r−1} (n choose r) (n − r)!/n! = ∑_{r=1}^n (−1)^{r−1}/r! → 1 − e^{−1} as n → ∞.
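The truncated series can be checked exactly by enumerating coat assignments for small n (a sketch, not from the notes):

```python
import math
from itertools import permutations

# P(at least one of n guests gets the right coat) = sum_{r=1}^{n} (-1)**(r-1) / r!,
# which tends to 1 - 1/e as n grows.
def p_some_match(n):
    return sum((-1) ** (r - 1) / math.factorial(r) for r in range(1, n + 1))

# Exact check by enumerating all n! coat assignments for small n.
n = 6
hits = sum(1 for perm in permutations(range(n)) if any(perm[i] == i for i in range(n)))
assert abs(hits / math.factorial(n) - p_some_match(n)) < 1e-12
```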
2.2 Independence

Definition 2.1. Events A and B are independent if P(A ∩ B) = P(A)P(B); events A_1, . . . , A_n are independent if P(⋂_{i∈S} A_i) = ∏_{i∈S} P(A_i) for every non-empty subset S of {1, . . . , n}.

Example. Two fair dice are thrown. Let A_1 be the event that the first die shows an odd number, A_2 the event that the second die shows an odd number, and A_3 the event that the total is odd. Are A_1, A_2 and A_3 independent?

Solution. We first calculate the probabilities of the events A_1, A_2, A_3, A_1 ∩ A_2, A_1 ∩ A_3 and A_1 ∩ A_2 ∩ A_3.

    Event              Probability
    A_1                18/36 = 1/2
    A_2                as above, 1/2
    A_3                18/36 = 1/2
    A_1 ∩ A_2          9/36 = 1/4
    A_1 ∩ A_3          9/36 = 1/4
    A_1 ∩ A_2 ∩ A_3    0

Thus by a series of multiplications we can see that A_1 and A_2 are independent, A_1 and A_3 are independent (also A_2 and A_3), but that A_1, A_2 and A_3 are not independent.
Now we wish to state what we mean by two independent experiments². Consider Ω_1 = {α_1, . . .} and Ω_2 = {β_1, . . .} with associated probability distributions {p_1, . . .} and {q_1, . . .}. Then, by two independent experiments, we mean the sample space Ω_1 × Ω_2 with probability distribution P((α_i, β_j)) = p_i q_j.

Now suppose A ⊆ Ω_1 and B ⊆ Ω_2. The event A can be interpreted as an event in Ω_1 × Ω_2, namely A × Ω_2, and similarly for B. Then

    P(A ∩ B) = ∑_{α_i∈A} ∑_{β_j∈B} p_i q_j = ∑_{α_i∈A} p_i ∑_{β_j∈B} q_j = P(A)P(B),

which is why they are called independent experiments. The obvious generalisation to n experiments can be made; for an infinite sequence of experiments we mean a sample space Ω_1 × Ω_2 × · · · satisfying the appropriate formula for all n ∈ ℕ.

You might like to find the probability that n independent tosses of a biased coin with probability of heads p result in a total of r heads.
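The suggested exercise can be checked by brute force over the product sample space (a sketch; `p_r_heads` is my name, and the answer is the binomial probability):

```python
from itertools import product
from math import comb

# P(exactly r heads in n independent tosses), summed over the product sample
# space, equals C(n, r) * p**r * (1-p)**(n-r).
def p_r_heads(n, p, r):
    total = 0.0
    for outcome in product((1, 0), repeat=n):   # 1 = head, 0 = tail
        if sum(outcome) == r:
            prob = 1.0
            for o in outcome:
                prob *= p if o else (1 - p)
            total += prob
    return total

n, p, r = 5, 0.3, 2
assert abs(p_r_heads(n, p, r) - comb(n, r) * p ** r * (1 - p) ** (n - r)) < 1e-12
```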
2.3 Distributions
The binomial distribution with parameters n and p, 0 ≤ p ≤ 1, has Ω = {0, . . . , n} and probabilities

    p_i = (n choose i) p^i (1 − p)^{n−i}.

Theorem 2.4 (Poisson approximation to binomial). If n → ∞ and p → 0 with np = λ held fixed, then

    (n choose r) p^r (1 − p)^{n−r} → e^{−λ} λ^r / r!.

² or more generally, n.
Proof.

    (n choose r) p^r (1 − p)^{n−r} = [n(n − 1) · · · (n − r + 1) / r!] p^r (1 − p)^{n−r}
        = (n/n)((n − 1)/n) · · · ((n − r + 1)/n) · (np)^r / r! · (1 − p)^{n−r}
        = ∏_{i=1}^r ((n − i + 1)/n) · λ^r / r! · (1 − λ/n)^{n−r}
        → 1 · λ^r / r! · e^{−λ} = e^{−λ} λ^r / r!.
Suppose an infinite sequence of independent trials is to be performed. Each trial results in a success with probability p ∈ (0, 1) or a failure with probability 1 − p. Such a sequence is called a sequence of Bernoulli trials. The probability that the first success occurs after exactly r failures is p_r = p(1 − p)^r. This is the geometric distribution with parameter p. Since ∑_{r=0}^∞ p_r = 1, the probability that all trials result in failure is zero.
2.4 Conditional Probability
Definition 2.2. Provided P(B) > 0, we define the conditional probability of A given B³ to be

    P(A|B) = P(A ∩ B) / P(B).

Whenever we write P(A|B), we assume that P(B) > 0. Note that if A and B are independent then P(A|B) = P(A).

Theorem 2.5.

1. P(A ∩ B) = P(A|B)P(B),
2. P(A ∩ B ∩ C) = P(A|B ∩ C)P(B|C)P(C),
3. P(A|B ∩ C) = P(A ∩ B|C) / P(B|C),
4. the function P(·|B) restricted to subsets of B is a probability function on B.

Proof. Results 1 to 3 are immediate from the definition of conditional probability. For result 4, note that A ∩ B ⊆ B, so P(A ∩ B) ≤ P(B) and thus P(A|B) ≤ 1. Clearly P(B|B) = 1, so it just remains to show the last axiom. For disjoint events A_i,

    P(⋃_i A_i | B) = P(⋃_i (A_i ∩ B)) / P(B) = ∑_i P(A_i ∩ B) / P(B) = ∑_i P(A_i|B),

as required.
³ read "A given B".
Chapter 3
Random Variables
Let Ω be finite or countable, and let p_ω = P({ω}) for ω ∈ Ω.

Definition 3.1. A random variable X is a function X : Ω → ℝ.

Note that "random variable" is a somewhat inaccurate term: a random variable is neither random nor a variable.

Example. If Ω = {(i, j) : 1 ≤ i, j ≤ t}, then we can define random variables X and Y by X(i, j) = i + j and Y(i, j) = max{i, j}.

Let R_X be the image of Ω under X. When this range is finite or countable, the random variable is said to be discrete.

We write P(X = x_i) for ∑_{ω : X(ω)=x_i} p_ω, and for B ⊆ ℝ

    P(X ∈ B) = ∑_{x∈B} P(X = x).

Then (P(X = x), x ∈ R_X) is the distribution of the random variable X. Note that it is a probability distribution over R_X.
3.1 Expectation
Definition 3.2. The expectation of a random variable X is the number

    E[X] = ∑_{ω∈Ω} p_ω X(ω),

provided that this sum converges absolutely.
Note that

    E[X] = ∑_{ω∈Ω} p_ω X(ω) = ∑_{x∈R_X} ∑_{ω : X(ω)=x} p_ω X(ω) = ∑_{x∈R_X} x ∑_{ω : X(ω)=x} p_ω = ∑_{x∈R_X} x P(X = x).

Absolute convergence allows the sum to be taken in any order.

If X is a positive random variable and if ∑_ω p_ω X(ω) = ∞, we write E[X] = +∞. If

    ∑_{x∈R_X, x≥0} x P(X = x) = ∞   and   ∑_{x∈R_X, x<0} x P(X = x) = −∞,

then E[X] is undefined.

Example. Let X have the binomial distribution with parameters n and p. Find E[X].
Solution.

    E[X] = ∑_{r=0}^n r (n choose r) p^r (1 − p)^{n−r}
         = ∑_{r=0}^n r [n! / (r!(n − r)!)] p^r (1 − p)^{n−r}
         = n ∑_{r=1}^n [(n − 1)! / ((r − 1)!(n − r)!)] p^r (1 − p)^{n−r}
         = np ∑_{r=1}^n [(n − 1)! / ((r − 1)!(n − r)!)] p^{r−1} (1 − p)^{n−r}
         = np ∑_{r=0}^{n−1} (n − 1 choose r) p^r (1 − p)^{n−1−r}
         = np.
For any function f : ℝ → ℝ, the composition of f and X defines a new random variable f(X) given by

    f(X)(ω) = f(X(ω)).

Example. If a, b and c are constants, then a + bX and (X − c)² are random variables, defined by

    (a + bX)(ω) = a + bX(ω)   and   (X − c)²(ω) = (X(ω) − c)².

Note that E[X] is a constant.
Theorem 3.1.

1. If X ≥ 0 then E[X] ≥ 0.
2. If X ≥ 0 and E[X] = 0 then P(X = 0) = 1.
3. If a and b are constants then E[a + bX] = a + bE[X].
4. For any random variables X, Y, E[X + Y] = E[X] + E[Y].
5. E[X] is the constant which minimises E[(X − c)²].

Proof. 1. X ≥ 0 means X(ω) ≥ 0 for all ω ∈ Ω, so E[X] = ∑_ω p_ω X(ω) ≥ 0.

2. If there were some ω with p_ω > 0 and X(ω) > 0, then E[X] > 0; therefore P(X = 0) = 1.
3.

    E[a + bX] = ∑_ω (a + bX(ω)) p_ω = a ∑_ω p_ω + b ∑_ω p_ω X(ω) = a + bE[X].

4. Trivial.

5. Now

    E[(X − c)²] = E[(X − E[X] + E[X] − c)²]
               = E[(X − E[X])² + 2(X − E[X])(E[X] − c) + (E[X] − c)²]
               = E[(X − E[X])²] + 2(E[X] − c)E[X − E[X]] + (E[X] − c)²
               = E[(X − E[X])²] + (E[X] − c)².

This is clearly minimised when c = E[X].
Theorem 3.2. For any random variables X_1, X_2, . . . , X_n,

    E[∑_{i=1}^n X_i] = ∑_{i=1}^n E[X_i].

Proof.

    E[∑_{i=1}^n X_i] = E[∑_{i=1}^{n−1} X_i + X_n] = E[∑_{i=1}^{n−1} X_i] + E[X_n].

The result follows by induction.
3.2 Variance

For a random variable X, the variance is

    Var X = E[(X − E[X])²] = σ²,

and the standard deviation is σ = √(Var X).

Theorem 3.3 (Properties of Variance).

(i) Var X ≥ 0, and if Var X = 0 then P(X = E[X]) = 1.

Proof: from properties 1 and 2 of expectation, applied to (X − E[X])².

(ii) If a, b are constants, Var(a + bX) = b² Var X.
Proof.

    Var(a + bX) = E[(a + bX − a − bE[X])²] = b²E[(X − E[X])²] = b² Var X.

(iii) Var X = E[X²] − E[X]².

Proof.

    E[(X − E[X])²] = E[X² − 2XE[X] + (E[X])²]
                  = E[X²] − 2E[X]E[X] + (E[X])²
                  = E[X²] − (E[X])².

Example. Let X have the geometric distribution P(X = r) = pq^r, r = 0, 1, 2, . . . , where p + q = 1. Then E[X] = q/p and Var X = q/p².
Solution.

    E[X] = ∑_{r=0}^∞ r p q^r = pq ∑_{r=0}^∞ r q^{r−1}
         = pq ∑_{r=0}^∞ (d/dq) q^r = pq (d/dq) (1/(1 − q))
         = pq / (1 − q)² = q/p.

    E[X²] = ∑_{r=0}^∞ r² p q^r
          = pq [ ∑_{r=1}^∞ r(r + 1) q^{r−1} − ∑_{r=1}^∞ r q^{r−1} ]
          = pq [ 2/(1 − q)³ − 1/(1 − q)² ]
          = 2q/p² − q/p.

    Var X = E[X²] − E[X]² = 2q/p² − q/p − (q/p)² = q/p².
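Both results can be checked numerically by truncating the series (a sketch, not part of the original notes):

```python
# Numerical check for the geometric distribution P(X = r) = p * q**r, r = 0, 1, ...:
# E[X] = q/p and Var X = q/p**2 (series truncated at a large cutoff).
p = 0.3
q = 1 - p
probs = [p * q ** r for r in range(2000)]
mean = sum(r * pr for r, pr in enumerate(probs))
second = sum(r * r * pr for r, pr in enumerate(probs))
assert abs(mean - q / p) < 1e-9
assert abs(second - mean ** 2 - q / p ** 2) < 1e-9
```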
Definition 3.3. The covariance of random variables X and Y is

    Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

The correlation of X and Y is

    Corr(X, Y) = Cov(X, Y) / √(Var X · Var Y).
Theorem 3.4. Var(X + Y) = Var X + Var Y + 2 Cov(X, Y).

Proof.

    Var(X + Y) = E[((X + Y) − E[X] − E[Y])²]
              = E[(X − E[X])² + (Y − E[Y])² + 2(X − E[X])(Y − E[Y])]
              = Var X + Var Y + 2 Cov(X, Y).
3.3 Indicator Function
Definition 3.4. The indicator function I[A] of an event A ⊆ Ω is the function

    I[A](ω) = 1 if ω ∈ A, and 0 if ω ∉ A.   (3.1)

Note that I[A] is a random variable.

1. E[I[A]] = P(A), since E[I[A]] = ∑_ω p_ω I[A](ω) = P(A).
2. I[Aᶜ] = 1 − I[A].
3. I[A ∩ B] = I[A]I[B].
4. I[A ∪ B] = I[A] + I[B] − I[A]I[B]: indeed I[A ∪ B](ω) = 1 iff ω ∈ A or ω ∈ B, and checking each case shows I[A ∪ B](ω) = I[A](ω) + I[B](ω) − I[A](ω)I[B](ω). Works!
Example. n couples are arranged randomly around a table such that males and females alternate. Let N be the number of husbands sitting next to their wives. Calculate E[N] and Var N.

Solution. Let A_i be the event that couple i sit together, and write

    N = ∑_{i=1}^n I[A_i].

Then

    E[N] = E[∑_{i=1}^n I[A_i]] = ∑_{i=1}^n E[I[A_i]] = ∑_{i=1}^n 2/n = 2.

For the variance,

    E[N²] = E[(∑_{i=1}^n I[A_i])²] = E[∑_i I[A_i]² + ∑_{i≠j} I[A_i]I[A_j]]
          = nE[I[A_1]²] + n(n − 1)E[I[A_1]I[A_2]].

Now E[I[A_i]²] = E[I[A_i]] = 2/n, and

    E[I[A_1]I[A_2]] = P(A_1 ∩ A_2) = P(A_1)P(A_2|A_1)
                    = (2/n) [ (1/(n − 1))·(1/(n − 1)) + ((n − 2)/(n − 1))·(2/(n − 1)) ]
                    = (2/n) · (2n − 3)/(n − 1)².

Therefore

    Var N = E[N²] − E[N]²
          = 2 + 2(2n − 3)/(n − 1) − 4
          = 2(n − 2)/(n − 1).
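A Monte Carlo check of E[N] = 2 (a sketch: I fix the husbands in alternating seats and permute the wives at random; names and seating convention are mine):

```python
import random

# Husbands sit fixed in alternating seats; female seat i is adjacent to male
# seats i and i+1 (mod n). Count couples sitting together and average.
def simulate(n, trials=100_000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        wives = list(range(n))
        rng.shuffle(wives)                     # wives[i] = wife in female seat i
        total += sum(1 for seat, wife in enumerate(wives)
                     if wife in (seat, (seat + 1) % n))
    return total / trials

est = simulate(6)
assert abs(est - 2) < 0.05
```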
3.5 Independence

Theorem 3.5. If X_1, . . . , X_n are independent random variables and f_1, . . . , f_n are functions ℝ → ℝ, then f_1(X_1), . . . , f_n(X_n) are independent random variables.

Proof.

    P(f_1(X_1) = y_1, . . . , f_n(X_n) = y_n) = ∑_{x_1 : f_1(x_1)=y_1, ..., x_n : f_n(x_n)=y_n} P(X_1 = x_1, . . . , X_n = x_n)
        = ∏_{i=1}^n ∑_{x_i : f_i(x_i)=y_i} P(X_i = x_i)
        = ∏_{i=1}^n P(f_i(X_i) = y_i).
Theorem 3.6. If X_1, . . . , X_n are independent random variables then

    E[∏_{i=1}^n X_i] = ∏_{i=1}^n E[X_i].

NOTE that E[∑ X_i] = ∑ E[X_i] without requiring independence.

Proof. Write R_i for R_{X_i}, the range of X_i. Then

    E[∏_{i=1}^n X_i] = ∑_{x_1∈R_1} · · · ∑_{x_n∈R_n} x_1 · · · x_n P(X_1 = x_1, . . . , X_n = x_n)
                    = ∏_{i=1}^n ∑_{x_i∈R_i} x_i P(X_i = x_i)
                    = ∏_{i=1}^n E[X_i].
Theorem 3.7. If X_1, . . . , X_n are independent random variables and f_1, . . . , f_n are functions ℝ → ℝ, then

    E[∏_{i=1}^n f_i(X_i)] = ∏_{i=1}^n E[f_i(X_i)].

Proof. Obvious from the last two theorems!
Theorem 3.8. If X_1, . . . , X_n are independent random variables then

    Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var X_i.

Proof.

    Var(∑_{i=1}^n X_i) = E[(∑_{i=1}^n X_i)²] − (E[∑_{i=1}^n X_i])²
                      = E[∑_i X_i² + ∑_{i≠j} X_iX_j] − (∑_i E[X_i])²
                      = ∑_i E[X_i²] + ∑_{i≠j} E[X_i]E[X_j] − ∑_i E[X_i]² − ∑_{i≠j} E[X_i]E[X_j]
                      = ∑_i (E[X_i²] − E[X_i]²)
                      = ∑_i Var X_i,

using independence (E[X_iX_j] = E[X_i]E[X_j] for i ≠ j) in the third line.
Theorem 3.9. If X_1, . . . , X_n are independent identically distributed random variables, then

    Var((1/n) ∑_{i=1}^n X_i) = (1/n) Var X_1.

Proof.

    Var((1/n) ∑_{i=1}^n X_i) = (1/n²) Var(∑_{i=1}^n X_i) = (1/n²) ∑_{i=1}^n Var X_i = (1/n) Var X_1.
Example (Experimental Design). Two rods have unknown lengths a, b. A rule can measure a length, but with an error having mean 0 (unbiased) and variance σ². Errors are independent from measurement to measurement. To estimate a and b we could take separate measurements A, B of each rod:

    E[A] = a, Var A = σ²;   E[B] = b, Var B = σ².

Can we do better? Yes! Measure a + b as X and a − b as Y. Then

    E[X] = a + b, Var X = σ²;   E[Y] = a − b, Var Y = σ²,

and

    E[(X + Y)/2] = a,   Var((X + Y)/2) = σ²/2,
    E[(X − Y)/2] = b,   Var((X − Y)/2) = σ²/2.

So, with the same two measurements, each estimate has half the variance: this is better.
Example (Non-standard dice). You choose one die, then I choose one. The dice can be designed so that, around a cycle, each die beats the next with probability 2/3: P(A beats B) = 2/3, and so on around the cycle. So the relation "A better than B" is not transitive.
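The original figure of the dice is not reproduced above; Efron's dice are a standard set with exactly this cyclic property (so the specific faces below are an assumption, not from the notes):

```python
from itertools import product

# Efron's dice: each die beats the next one in the cycle A -> B -> C -> D -> A
# with probability 2/3.
dice = {
    "A": [4, 4, 4, 4, 0, 0],
    "B": [3, 3, 3, 3, 3, 3],
    "C": [6, 6, 2, 2, 2, 2],
    "D": [5, 5, 5, 1, 1, 1],
}

def p_beats(x, y):
    return sum(1 for a, b in product(dice[x], dice[y]) if a > b) / 36

for x, y in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]:
    assert abs(p_beats(x, y) - 2 / 3) < 1e-12
```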
Chapter 4
Inequalities
4.1 Jensen's Inequality

A function f : (a, b) → ℝ is convex if

    f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y)   for all x, y ∈ (a, b) and all p ∈ (0, 1).

It is strictly convex if strict inequality holds whenever x ≠ y. f is concave if −f is convex, and strictly concave if −f is strictly convex.
(Figures: graphs of a convex function, a concave function, and a function neither concave nor convex.)
We know that if f is twice differentiable and f″(x) ≥ 0 for all x ∈ (a, b), then f is convex; it is strictly convex if f″(x) > 0 for all x ∈ (a, b).
Example. f(x) = −log x has f′(x) = −1/x and f″(x) = 1/x² > 0, so f is strictly convex on (0, ∞).

Example. f(x) = −x log x has f′(x) = −(1 + log x) and f″(x) = −1/x < 0, so f is strictly concave on (0, ∞).

Example. f(x) = x³ is strictly convex on (0, ∞) but not on (−∞, ∞).

Theorem 4.1. Let f : (a, b) → ℝ be a convex function. Then

    ∑_{i=1}^n p_i f(x_i) ≥ f(∑_{i=1}^n p_i x_i)

for all x_1, . . . , x_n ∈ (a, b) and p_1, . . . , p_n ∈ (0, 1) with ∑ p_i = 1. Furthermore, if f is strictly convex, then equality holds if and only if all the x_i are equal. In particular, for a random variable X taking values in (a, b),

    E[f(X)] ≥ f(E[X]).
Proof. By induction on n. For n = 1 there is nothing to prove, and n = 2 is the definition of convexity. Assume the result holds up to n − 1, and consider x_1, . . . , x_n ∈ (a, b) and p_1, . . . , p_n ∈ (0, 1) with ∑ p_i = 1. For i = 2, . . . , n, set

    p′_i = p_i / (1 − p_1),   so that ∑_{i=2}^n p′_i = 1.

Then, by the inductive hypothesis twice, first for n − 1 and then for 2,

    ∑_{i=1}^n p_i f(x_i) = p_1 f(x_1) + (1 − p_1) ∑_{i=2}^n p′_i f(x_i)
                        ≥ p_1 f(x_1) + (1 − p_1) f(∑_{i=2}^n p′_i x_i)
                        ≥ f(p_1 x_1 + (1 − p_1) ∑_{i=2}^n p′_i x_i)
                        = f(∑_{i=1}^n p_i x_i).

If f is strictly convex, n ≥ 3 and not all the x_i are equal, then we may assume not all of x_2, . . . , x_n are equal. But then

    (1 − p_1) ∑_{i=2}^n p′_i f(x_i) > (1 − p_1) f(∑_{i=2}^n p′_i x_i),

so the inequality is strict.
Corollary (AM/GM Inequality). For positive real numbers x_1, . . . , x_n,

    (∏_{i=1}^n x_i)^{1/n} ≤ (1/n) ∑_{i=1}^n x_i.

Equality holds if and only if x_1 = x_2 = · · · = x_n.

Proof. Let X take the values x_1, . . . , x_n with

    P(X = x_i) = 1/n,

and let f(x) = −log x, a strictly convex function on (0, ∞). By Jensen's inequality,

    E[f(X)] ≥ f(E[X]),   i.e.   −E[log X] ≥ −log E[X].   [1]

Therefore

    (1/n) ∑_{i=1}^n log x_i ≤ log( (1/n) ∑_{i=1}^n x_i ),

and exponentiating,

    (∏_{i=1}^n x_i)^{1/n} ≤ (1/n) ∑_{i=1}^n x_i.   [2]

For strictness: since f is strictly convex, equality holds in [1], and hence in [2], if and only if x_1 = x_2 = · · · = x_n.
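A quick random check of the inequality and its equality case (a sketch, not in the original notes):

```python
import math
import random

# AM/GM: the geometric mean of positive reals never exceeds the arithmetic mean,
# with equality exactly when all the numbers coincide.
rng = random.Random(7)
for _ in range(1000):
    xs = [rng.uniform(0.1, 10.0) for _ in range(5)]
    gm = math.prod(xs) ** (1 / len(xs))
    am = sum(xs) / len(xs)
    assert gm <= am + 1e-12
assert abs(math.prod([4.0] * 5) ** (1 / 5) - 4.0) < 1e-9   # equality case
```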
If f : (a, b) → ℝ is a convex function, then it can be shown that at each point y ∈ (a, b) there exists a linear function α_y + β_y x such that

    f(x) ≥ α_y + β_y x for all x ∈ (a, b),   and   f(y) = α_y + β_y y.

If f is differentiable at y, then this linear function is the tangent f(y) + (x − y)f′(y).

Let y = E[X], α = α_y and β = β_y. Then f(E[X]) = α + βE[X], so for any random variable X taking values in (a, b),

    E[f(X)] ≥ E[α + βX] = α + βE[X] = f(E[X]).
4.2 Cauchy-Schwarz Inequality
Theorem 4.2. For any random variables X, Y,

    E[XY]² ≤ E[X²]E[Y²].

Proof. For a, b ∈ ℝ, let Z = aX − bY. Then

    0 ≤ E[Z²] = E[(aX − bY)²] = a²E[X²] − 2abE[XY] + b²E[Y²].

This is a quadratic in a with at most one real root, so its discriminant must be ≤ 0. Taking b ≠ 0,

    E[XY]² ≤ E[X²]E[Y²].
Corollary.
|Corr(X, Y)| ≤ 1.

Proof. Apply the Cauchy-Schwarz inequality to the random variables X − E[X] and Y − E[Y].
4.3 Markovs Inequality
Theorem 4.3. If X is any random variable with finite mean, then for any a > 0,

    P(|X| ≥ a) ≤ E[|X|]/a.

Proof. Let A = {|X| ≥ a}. Then |X| ≥ aI[A]. Taking expectations,

    E[|X|] ≥ aE[I[A]] = aP(A) = aP(|X| ≥ a).
4.4 Chebyshevs Inequality
Theorem 4.4. Let X be a random variable with E[X²] < ∞. Then, for all ε > 0,

    P(|X| ≥ ε) ≤ E[X²]/ε².

Proof. Note that

    I[|X| ≥ ε] ≤ X²/ε²,

since the right-hand side is non-negative, and is at least 1 whenever |X| ≥ ε. Taking expectations,

    P(|X| ≥ ε) = E[I[|X| ≥ ε]] ≤ E[X²]/ε².
Note:

1. The result is distribution-free: no assumption is made about the distribution of X (other than E[X²] < ∞).

2. It is the best possible inequality, in the following sense. Let

    X = +ε with probability c/(2ε²),
    X = −ε with probability c/(2ε²),
    X = 0 with probability 1 − c/ε².

Then E[X²] = c and

    P(|X| ≥ ε) = c/ε² = E[X²]/ε²,

so the bound is attained.

3. If μ = E[X], then applying the inequality to X − μ gives

    P(|X − μ| ≥ ε) ≤ Var X/ε².

This is often the most useful form.
4.5 Law of Large Numbers
Theorem 4.5 (Weak law of large numbers). Let X_1, X_2, . . . be a sequence of independent identically distributed random variables with mean μ and variance σ², and let

    S_n = ∑_{i=1}^n X_i.

Then, for all ε > 0,

    P(|S_n/n − μ| ≥ ε) → 0 as n → ∞.
Proof. By Chebyshev's inequality,

    P(|S_n/n − μ| ≥ ε) ≤ E[(S_n/n − μ)²]/ε²
                      = E[(S_n − nμ)²]/(n²ε²)   (properties of expectation)
                      = Var S_n/(n²ε²),   since E[S_n] = nμ.

But Var S_n = nσ², and thus

    P(|S_n/n − μ| ≥ ε) ≤ nσ²/(n²ε²) = σ²/(nε²) → 0.
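A simulation sketch of the theorem (coin flips with p = 1/2; all parameters here are illustrative choices of mine):

```python
import random

# Fraction of runs whose sample mean deviates from p by at least eps, for growing n:
# the weak law says this fraction tends to 0.
rng = random.Random(0)
p, eps, runs = 0.5, 0.05, 200

def deviation_fraction(n):
    bad = 0
    for _ in range(runs):
        mean = sum(rng.random() < p for _ in range(n)) / n
        bad += abs(mean - p) >= eps
    return bad / runs

for n in (100, 1000, 10000):
    print(n, deviation_fraction(n))
```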
Example. Let A_1, A_2, . . . be independent events, each with probability p, and let X_i = I[A_i]. Then μ = E[I[A_i]] = P(A_i) = p, and

    S_n/n = (number of times A occurs) / (number of trials).

The theorem states that

    P(|S_n/n − p| ≥ ε) → 0 as n → ∞,

which recovers the intuitive definition of probability.
Example. A random sample of size n is a sequence X_1, X_2, . . . , X_n of independent identically distributed random variables (n observations). The quantity

    X̄ = (∑_{i=1}^n X_i)/n

is called the SAMPLE MEAN. The theorem states that, provided the variance of X_i is finite, the probability that the sample mean differs from the mean of the distribution by more than ε approaches 0 as n → ∞.
We have shown the weak law of large numbers. Why "weak"? There is also a strong law of large numbers:

    P(S_n/n → μ as n → ∞) = 1.

This is NOT the same as the weak form. What does it mean? Each ω ∈ Ω determines S_n/n, n = 1, 2, . . . , as a sequence of real numbers; hence this sequence either tends to μ or it doesn't. The strong law says

    P({ω : S_n(ω)/n → μ as n → ∞}) = 1.
Chapter 5
Generating Functions
In this chapter, assume that X is a random variable taking values in the range 0, 1, 2, . . . , and let p_r = P(X = r), r = 0, 1, 2, . . . .

Definition 5.1. The probability generating function (p.g.f.) of the random variable X, or of the distribution (p_r : r = 0, 1, 2, . . .), is

    p(z) = E[z^X] = ∑_{r=0}^∞ z^r P(X = r) = ∑_{r=0}^∞ p_r z^r.

This p(z) is a polynomial or a power series. If a power series, then it is convergent for |z| ≤ 1 by comparison with a geometric series:

    |p(z)| ≤ ∑_r p_r |z|^r ≤ ∑_r p_r = 1.
Example. For a fair die, p_r = 1/6 for r = 1, . . . , 6, and

    p(z) = E[z^X] = (1/6)(z + z² + · · · + z⁶) = (z/6) · (1 − z⁶)/(1 − z).
Theorem 5.1. The distribution of X is uniquely determined by the p.g.f. p(z).

Proof. We know that we can differentiate p(z) term by term for |z| < 1:

    p′(z) = p_1 + 2p_2 z + · · · ,

so p′(0) = p_1 (and p(0) = p_0). Repeated differentiation gives

    p^{(i)}(z) = ∑_{r=i}^∞ [r!/(r − i)!] p_r z^{r−i},

and hence p^{(i)}(0) = i! p_i. Thus we can recover p_0, p_1, . . . from p(z).
Theorem 5.2 (Abel's Lemma).

    E[X] = lim_{z↑1} p′(z).

Proof.

    p′(z) = ∑_{r=1}^∞ r p_r z^{r−1} for |z| < 1.

For z ∈ (0, 1), p′(z) is a non-decreasing function of z and is bounded above by

    E[X] = ∑_{r=1}^∞ r p_r.

Choose ε > 0 and N large enough that ∑_{r=1}^N r p_r ≥ E[X] − ε. Then

    lim_{z↑1} ∑_{r=1}^∞ r p_r z^{r−1} ≥ lim_{z↑1} ∑_{r=1}^N r p_r z^{r−1} = ∑_{r=1}^N r p_r ≥ E[X] − ε.

This is true for every ε > 0, and so E[X] = lim_{z↑1} p′(z).

Usually p′(z) is continuous at z = 1, and then E[X] = p′(1). (Recall p(z) = (z/6)(1 − z⁶)/(1 − z) for the fair die.)
Theorem 5.3.

    E[X(X − 1)] = lim_{z↑1} p″(z).

Proof.

    p″(z) = ∑_{r=2}^∞ r(r − 1) p_r z^{r−2}.

The proof is now the same as for Abel's Lemma.
Theorem 5.4. Suppose that X_1, X_2, . . . , X_n are independent random variables with p.g.f.s p_1(z), p_2(z), . . . , p_n(z). Then the p.g.f. of X_1 + X_2 + · · · + X_n is p_1(z)p_2(z) · · · p_n(z).

Proof.

    E[z^{X_1+X_2+···+X_n}] = E[z^{X_1} z^{X_2} · · · z^{X_n}]
                           = E[z^{X_1}]E[z^{X_2}] · · · E[z^{X_n}]
                           = p_1(z)p_2(z) · · · p_n(z).
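For distributions stored as coefficient lists, multiplying p.g.f.s is polynomial multiplication, i.e. a convolution of the probability vectors (a sketch; `poly_mul` is my name):

```python
# Multiply two p.g.f.s given as coefficient lists: out[k] collects all products
# p_i * q_j with i + j = k, which is exactly the distribution of the sum.
def poly_mul(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

die = [0.0] + [1 / 6] * 6                     # fair die: p_1 = ... = p_6 = 1/6
two_dice = poly_mul(die, die)                 # distribution of the sum of two dice
assert abs(two_dice[7] - 6 / 36) < 1e-12      # P(sum = 7) = 6/36
assert abs(sum(two_dice) - 1) < 1e-12
```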
Example. Suppose X has the Poisson distribution

    P(X = r) = e^{−λ} λ^r / r!,   r = 0, 1, . . . .

Then

    E[z^X] = ∑_{r=0}^∞ z^r e^{−λ} λ^r / r! = e^{−λ} e^{λz} = e^{−λ(1−z)}.

Let us calculate the variance of X:

    p′(z) = λe^{−λ(1−z)},   p″(z) = λ²e^{−λ(1−z)}.

Then

    E[X] = lim_{z↑1} p′(z) = p′(1) = λ   (since p′(z) is continuous at z = 1),
    E[X(X − 1)] = p″(1) = λ²,
    Var X = E[X²] − E[X]² = E[X(X − 1)] + E[X] − E[X]² = λ² + λ − λ² = λ.
Example. Suppose that Y has a Poisson distribution with parameter μ. If X and Y are independent, then

    E[z^{X+Y}] = E[z^X]E[z^Y] = e^{−λ(1−z)} e^{−μ(1−z)} = e^{−(λ+μ)(1−z)}.

But this is the p.g.f. of a Poisson random variable with parameter λ + μ. By uniqueness (the first theorem on p.g.f.s), this must be the distribution of X + Y.
Example. X has a binomial distribution,

    P(X = r) = (n choose r) p^r (1 − p)^{n−r},   r = 0, 1, . . . , n,

so

    E[z^X] = ∑_{r=0}^n (n choose r) p^r (1 − p)^{n−r} z^r = (pz + 1 − p)^n.

This shows that X = Y_1 + Y_2 + · · · + Y_n, where Y_1, Y_2, . . . , Y_n are independent random variables each with

    P(Y_i = 1) = p,   P(Y_i = 0) = 1 − p.

Note: if the p.g.f. factorizes, look to see if the random variable can be written as a sum.
5.1 Combinatorial Applications
Tile a (2 × n) bathroom with (2 × 1) tiles. How many ways can this be done? Say f_n; then

    f_n = f_{n−1} + f_{n−2},   f_0 = f_1 = 1.

Let

    F(z) = ∑_{n=0}^∞ f_n z^n.

Multiplying the recurrence by z^n and summing over n ≥ 2,

    ∑_{n=2}^∞ f_n z^n = z ∑_{n=2}^∞ f_{n−1} z^{n−1} + z² ∑_{n=2}^∞ f_{n−2} z^{n−2},

that is,

    F(z) − f_0 − zf_1 = z(F(z) − f_0) + z²F(z),
    F(z)(1 − z − z²) = f_0(1 − z) + zf_1 = 1 − z + z = 1.

Since f_0 = f_1 = 1, F(z) = 1/(1 − z − z²). Let

    α_1 = (1 + √5)/2,   α_2 = (1 − √5)/2.

Then

    F(z) = 1/((1 − α_1 z)(1 − α_2 z))
         = (1/(α_1 − α_2)) [ α_1/(1 − α_1 z) − α_2/(1 − α_2 z) ]
         = (1/(α_1 − α_2)) [ α_1 ∑_{n=0}^∞ α_1^n z^n − α_2 ∑_{n=0}^∞ α_2^n z^n ].

The coefficient of z^n, that is f_n, is

    f_n = (α_1^{n+1} − α_2^{n+1}) / (α_1 − α_2).
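The closed form agrees with the recurrence (a quick check, not part of the original notes):

```python
from math import sqrt

# Check the closed form f_n = (a1**(n+1) - a2**(n+1)) / (a1 - a2) against the
# recurrence f_n = f_{n-1} + f_{n-2} with f_0 = f_1 = 1.
a1 = (1 + sqrt(5)) / 2
a2 = (1 - sqrt(5)) / 2

f = [1, 1]
for n in range(2, 20):
    f.append(f[n - 1] + f[n - 2])

for n in range(20):
    closed = (a1 ** (n + 1) - a2 ** (n + 1)) / (a1 - a2)
    assert round(closed) == f[n]
```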
5.2 Conditional Expectation
Let X and Y be random variables with joint distribution

    P(X = x, Y = y).

Then the distribution of X is

    P(X = x) = ∑_{y∈R_Y} P(X = x, Y = y).

This is often called the marginal distribution of X. The conditional distribution of X given Y = y is

    P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y).
Definition 5.2. The conditional expectation of X given Y = y is

    E[X|Y = y] = ∑_{x∈R_X} x P(X = x|Y = y).

The conditional expectation of X given Y is the random variable E[X|Y] defined by

    E[X|Y](ω) = E[X|Y = Y(ω)].

Thus E[X|Y] : Ω → ℝ.
Example. Let X_1, X_2, . . . , X_n be independent identically distributed random variables with P(X_1 = 1) = p and P(X_1 = 0) = 1 − p. Let

    Y = X_1 + X_2 + · · · + X_n.

Then

    P(X_1 = 1|Y = r) = P(X_1 = 1, Y = r) / P(Y = r)
                     = P(X_1 = 1, X_2 + · · · + X_n = r − 1) / P(Y = r)
                     = P(X_1 = 1) P(X_2 + · · · + X_n = r − 1) / P(Y = r)
                     = p (n − 1 choose r − 1) p^{r−1}(1 − p)^{n−r} / [ (n choose r) p^r (1 − p)^{n−r} ]
                     = (n − 1 choose r − 1) / (n choose r)
                     = r/n.

Then

    E[X_1|Y = r] = 0 · P(X_1 = 0|Y = r) + 1 · P(X_1 = 1|Y = r) = r/n,

so E[X_1|Y = Y(ω)] = (1/n) Y(ω), and therefore E[X_1|Y] = (1/n) Y. Note: this is a random variable, a function of Y.
5.3 Properties of Conditional Expectation
Theorem 5.5.
E[E[X|Y]] = E[X]
Proof.

    E[E[X|Y]] = ∑_{y∈R_Y} P(Y = y) E[X|Y = y]
             = ∑_y P(Y = y) ∑_{x∈R_X} x P(X = x|Y = y)
             = ∑_y ∑_x x P(X = x, Y = y)
             = E[X].
Theorem 5.6. If X and Y are independent, then

    E[X|Y] = E[X].

Proof. If X and Y are independent, then for any y ∈ R_Y,

    E[X|Y = y] = ∑_{x∈R_X} x P(X = x|Y = y) = ∑_x x P(X = x) = E[X].
Example. Let X_1, X_2, . . . be i.i.d. random variables with p.g.f. p(z), and let N be a random variable, independent of X_1, X_2, . . . , with p.g.f. h(z). What is the p.g.f. of X_1 + X_2 + · · · + X_N?

    E[z^{X_1+···+X_N}] = E[E[z^{X_1+···+X_N} | N]]
                      = ∑_{n=0}^∞ P(N = n) E[z^{X_1+···+X_n} | N = n]
                      = ∑_{n=0}^∞ P(N = n) (p(z))^n
                      = h(p(z)).

Then, for example,

    E[X_1 + · · · + X_N] = (d/dz) h(p(z)) |_{z=1} = h′(1)p′(1) = E[N]E[X_1].

Exercise: calculate (d²/dz²) h(p(z)) and hence find Var(X_1 + · · · + X_N) in terms of Var N and Var X_1.
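A Monte Carlo check of E[X_1 + · · · + X_N] = E[N]E[X_1] (the particular choices of N and X_i below are illustrative, not from the notes):

```python
import random

# Random sum with N independent of the X_i: N uniform on {0,...,4}, X_i a die roll.
rng = random.Random(5)
trials = 100_000
total = 0.0
for _ in range(trials):
    N = rng.randrange(5)                               # E[N] = 2
    total += sum(rng.randint(1, 6) for _ in range(N))  # E[X_1] = 3.5
assert abs(total / trials - 2 * 3.5) < 0.1
```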
5.4 Branching Processes
X_0, X_1, . . . is a sequence of random variables, where X_n is the number of individuals in the n-th generation of a population. Assume:

1. X_0 = 1;
2. each individual lives for unit time, then on death produces k offspring with probability f_k, where ∑_k f_k = 1;
3. all offspring behave independently.

Thus

    X_{n+1} = Y_1^n + Y_2^n + · · · + Y_{X_n}^n,

where the Y_i^n are i.i.d. random variables; Y_i^n is the number of offspring of individual i in generation n.

Assume also:

1. f_0 > 0;
2. f_0 + f_1 < 1.

Let F(z) be the probability generating function of the Y_i^n:

    F(z) = ∑_{k=0}^∞ f_k z^k = E[z^{X_1}] = E[z^{Y_i^n}].

Let F_n(z) = E[z^{X_n}]. Then F_1(z) = F(z), the probability generating function of the offspring distribution.

Theorem 5.7.

    F_{n+1}(z) = F_n(F(z)) = F(F(. . . F(z) . . .)),

so F_n(z) is the n-fold iterate of F.
Proof.
\begin{align*}
F_{n+1}(z) &= E\left[z^{X_{n+1}}\right] = E\left[E\left[z^{X_{n+1}} \mid X_n\right]\right] \\
&= \sum_{k=0}^\infty P(X_n = k)\, E\left[z^{X_{n+1}} \mid X_n = k\right] \\
&= \sum_{k=0}^\infty P(X_n = k)\, E\left[z^{Y_1^n + Y_2^n + \dots + Y_k^n}\right] \\
&= \sum_{k=0}^\infty P(X_n = k)\, E\left[z^{Y_1^n}\right] \cdots E\left[z^{Y_k^n}\right] \\
&= \sum_{k=0}^\infty P(X_n = k)\, (F(z))^k = F_n(F(z))
\end{align*}
Theorem 5.8 (Mean and variance of population size). Let
\[ m = \sum_{k=0}^\infty k f_k \qquad \text{and} \qquad \sigma^2 = \sum_{k=0}^\infty (k - m)^2 f_k \]
be the mean and variance of the offspring distribution. Then $E[X_n] = m^n$ and
\[ \operatorname{Var} X_n = \begin{cases} \dfrac{\sigma^2 m^{n-1}(m^n - 1)}{m - 1}, & m \neq 1 \\[1ex] n\sigma^2, & m = 1 \end{cases} \tag{5.1} \]
Proof. One can prove this by calculating $F_n'(z)$ and $F_n''(z)$. Alternatively,
\[ E[X_n] = E[E[X_n \mid X_{n-1}]] = E[m X_{n-1}] = m E[X_{n-1}] = m^n \quad \text{by induction.} \]
Also
\begin{align*}
E\left[(X_n - m X_{n-1})^2\right] &= E\left[E\left[(X_n - m X_{n-1})^2 \mid X_{n-1}\right]\right] \\
&= E[\operatorname{Var}(X_n \mid X_{n-1})] = E[\sigma^2 X_{n-1}] = \sigma^2 m^{n-1}
\end{align*}
Thus
\[ E[X_n^2] - 2m E[X_n X_{n-1}] + m^2 E[X_{n-1}^2] = \sigma^2 m^{n-1} \]
Now calculate
\[ E[X_n X_{n-1}] = E[E[X_n X_{n-1} \mid X_{n-1}]] = E[X_{n-1} E[X_n \mid X_{n-1}]] = E[X_{n-1} \cdot m X_{n-1}] = m E[X_{n-1}^2] \]
Then
\[ E[X_n^2] = \sigma^2 m^{n-1} + m^2 E[X_{n-1}^2] \]
and so
\begin{align*}
\operatorname{Var} X_n &= E[X_n^2] - E[X_n]^2 \\
&= m^2 E[X_{n-1}^2] + \sigma^2 m^{n-1} - m^2 E[X_{n-1}]^2 \\
&= m^2 \operatorname{Var} X_{n-1} + \sigma^2 m^{n-1} \\
&= m^4 \operatorname{Var} X_{n-2} + \sigma^2 (m^{n-1} + m^n) \\
&\;\;\vdots \\
&= m^{2(n-1)} \operatorname{Var} X_1 + \sigma^2 (m^{n-1} + m^n + \dots + m^{2n-3}) \\
&= \sigma^2 m^{n-1} (1 + m + \dots + m^{n-1})
\end{align*}
which gives (5.1) on summing the geometric series.
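The formula $E[X_n] = m^n$ can be checked by direct simulation. A minimal sketch (the offspring distribution $f_0 = 0.25$, $f_1 = 0.25$, $f_2 = 0.5$, giving $m = 1.25$, is an assumed example, not from the notes):

```python
import random

def simulate_branching(f, n_gens, trials=100_000, seed=2):
    """Estimate E[X_n] for a branching process with offspring distribution
    f = [f_0, f_1, ...], starting from X_0 = 1."""
    rng = random.Random(seed)
    cum, c = [], 0.0
    for fk in f:
        c += fk
        cum.append(c)

    def offspring():
        # sample k with probability f_k by inverse transform
        u = rng.random()
        for k, ck in enumerate(cum):
            if u < ck:
                return k
        return len(cum) - 1

    total = 0
    for _ in range(trials):
        x = 1
        for _ in range(n_gens):
            x = sum(offspring() for _ in range(x))
        total += x
    return total / trials

m = 0.25 * 0 + 0.25 * 1 + 0.5 * 2          # mean offspring number, m = 1.25
est = simulate_branching([0.25, 0.25, 0.5], 3)
# Theorem 5.8 predicts E[X_3] = m**3 = 1.953125
```
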
To deal with extinction we need to be careful with limits as $n \to \infty$. Let
\[ A_n = \{X_n = 0\} = \{\text{extinction occurs by generation } n\} \]
and let
\[ A = \bigcup_{n=1}^\infty A_n = \{\text{the event that extinction ever occurs}\} \]
Can we calculate $P(A)$ from $P(A_n)$? More generally, let $A_n$ be an increasing sequence
\[ A_1 \subseteq A_2 \subseteq \dots \]
and define
\[ A = \lim_{n\to\infty} A_n = \bigcup_{n=1}^\infty A_n \]
Define, for $n \geq 1$,
\[ B_1 = A_1, \qquad B_n = A_n \setminus \left( \bigcup_{i=1}^{n-1} A_i \right) = A_n \cap A_{n-1}^c \]
The $B_n$, $n \geq 1$, are disjoint events, and
\[ \bigcup_{i=1}^\infty A_i = \bigcup_{i=1}^\infty B_i, \qquad \bigcup_{i=1}^n A_i = \bigcup_{i=1}^n B_i \]
Therefore
\begin{align*}
P\left( \bigcup_{i=1}^\infty A_i \right) &= P\left( \bigcup_{i=1}^\infty B_i \right) = \sum_{i=1}^\infty P(B_i) \\
&= \lim_{n\to\infty} \sum_{i=1}^n P(B_i) = \lim_{n\to\infty} P\left( \bigcup_{i=1}^n B_i \right) \\
&= \lim_{n\to\infty} P\left( \bigcup_{i=1}^n A_i \right) = \lim_{n\to\infty} P(A_n)
\end{align*}
Thus
\[ P\left( \lim_{n\to\infty} A_n \right) = \lim_{n\to\infty} P(A_n) \]
that is, probability is a continuous set function. Thus
\[ P(\text{extinction ever occurs}) = \lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} P(X_n = 0) = q, \text{ say.} \]
Note that $P(X_n = 0)$, $n = 1, 2, 3, \dots$ is an increasing sequence, so the limit exists. But
\[ P(X_n = 0) = F_n(0), \qquad F_n \text{ being the p.g.f. of } X_n \]
So
\[ q = \lim_{n\to\infty} F_n(0) \]
Also
\[ F(q) = F\left( \lim_{n\to\infty} F_n(0) \right) = \lim_{n\to\infty} F(F_n(0)) \qquad \text{since } F \text{ is continuous} \]
\[ = \lim_{n\to\infty} F_{n+1}(0) = q \]
Thus $F(q) = q$; $q$ is called the extinction probability.
Alternative Derivation. Conditioning on the first generation,
\[ q = \sum_k P(X_1 = k)\, P(\text{extinction} \mid X_1 = k) = \sum_k P(X_1 = k)\, q^k = F(q) \]

Theorem 5.9. The probability of extinction, $q$, is the smallest positive root of the equation $F(q) = q$. If $m$, the mean of the offspring distribution, satisfies $m \leq 1$ then $q = 1$, while if $m > 1$ then $q < 1$.
Proof. Note that
\[ F(1) = 1, \qquad m = \sum_{k=0}^\infty k f_k = \lim_{z \to 1} F'(z) \]
Also
\[ F''(z) = \sum_j j(j-1) f_j z^{j-2} > 0 \quad \text{in } (0, 1), \text{ since } f_0 + f_1 < 1 \]
so $F$ is strictly convex on $(0,1)$, and $F(0) = f_0 > 0$. Thus if $m \leq 1$ there does not exist a $q \in (0, 1)$ with $F(q) = q$, and so $q = 1$. If $m > 1$ then let $\alpha$ be the smallest positive root of $F(z) = z$; then $\alpha < 1$. Further, since $F$ is increasing,
\[ F(0) \leq F(\alpha) = \alpha \implies F_n(0) \leq \alpha \;\; \forall n \text{ (by induction)} \implies q = \lim_{n\to\infty} F_n(0) \leq \alpha \]
and since $q$ is a root of $F(z) = z$, $q = \alpha$.
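The characterisation $q = \lim_n F_n(0)$ gives a direct numerical method: iterate the p.g.f. from $0$. A sketch (the offspring distributions used below are assumed examples):

```python
def extinction_probability(f, tol=1e-12, max_iter=10_000):
    """Extinction probability q = lim F_n(0), computed by iterating the
    offspring p.g.f. F(z) = sum_k f_k z^k starting from z = 0."""
    def F(z):
        return sum(fk * z**k for k, fk in enumerate(f))
    q = 0.0
    for _ in range(max_iter):
        q_next = F(q)
        if abs(q_next - q) < tol:
            break
        q = q_next
    return q_next

# supercritical case: f = [0.25, 0.25, 0.5] gives m = 1.25 > 1;
# the smallest positive root of 0.25 + 0.25 q + 0.5 q^2 = q is q = 0.5
q = extinction_probability([0.25, 0.25, 0.5])
```

For a subcritical distribution such as $[0.5, 0.25, 0.25]$ ($m = 0.75 \leq 1$) the same iteration converges to $q = 1$, in agreement with Theorem 5.9.
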
5.5 Random Walks

Let $X_1, X_2, \dots$ be i.i.d.r.vs, and let
\[ S_n = S_0 + X_1 + X_2 + \dots + X_n, \qquad \text{where usually } S_0 = 0 \]
Then $S_n$ ($n = 0, 1, 2, \dots$) is a 1-dimensional random walk.
We shall assume
\[ X_n = \begin{cases} +1, & \text{with probability } p \\ -1, & \text{with probability } q = 1 - p \end{cases} \tag{5.2} \]
This is a simple random walk. If $p = q = \frac{1}{2}$ then the random walk is called symmetric.

Example (Gambler's Ruin). You have an initial fortune of $A$ and I have an initial fortune of $B$. We toss coins repeatedly; I win with probability $p$ and you win with probability $q$. What is the probability that I bankrupt you before you bankrupt me?
Set $a = A + B$ and $z = B$. Stop the random walk, started at $z$, when it hits $0$ or $a$.
Let $p_z$ be the probability that the random walk hits $a$ before it hits $0$, starting from $z$, and let $q_z$ be the probability that it hits $0$ before it hits $a$, starting from $z$. After the first step the gambler's fortune is either $z - 1$ or $z + 1$, with probability $q$ and $p$ respectively. From the law of total probability,
\[ p_z = q p_{z-1} + p p_{z+1}, \qquad 0 < z < a \]
Also $p_0 = 0$ and $p_a = 1$. We must solve $pt^2 - t + q = 0$:
\[ t = \frac{1 \pm \sqrt{1 - 4pq}}{2p} = \frac{1 \pm (1 - 2p)}{2p} = 1 \;\text{ or }\; \frac{q}{p} \]
The general solution for $p \neq q$ is
\[ p_z = A + B \left( \frac{q}{p} \right)^z \]
The boundary conditions give $A + B = 0$ and $A = 1/\left(1 - (q/p)^a\right)$, and so
\[ p_z = \frac{1 - \left( \frac{q}{p} \right)^z}{1 - \left( \frac{q}{p} \right)^a} \]
If $p = q$ the general solution is $A + Bz$, and
\[ p_z = \frac{z}{a} \]
To calculate $q_z$, observe that this is the same problem with $p$, $q$, $z$ replaced by $q$, $p$, $a - z$ respectively. Thus
\[ q_z = \frac{\left( \frac{q}{p} \right)^a - \left( \frac{q}{p} \right)^z}{\left( \frac{q}{p} \right)^a - 1} \qquad \text{if } p \neq q \]
or
\[ q_z = \frac{a - z}{a} \qquad \text{if } p = q \]
Thus $q_z + p_z = 1$ and so, as we expected, the game ends with probability one.
What happens as $a \to \infty$? Since
\[ \{\text{path hits } 0 \text{ ever}\} = \lim_{a \to \infty} \{\text{path hits } 0 \text{ before it hits } a\} \]
we get
\begin{align*}
P(\text{hits } 0 \text{ ever}) &= \lim_{a\to\infty} P(\text{hits } 0 \text{ before } a) = \lim_{a\to\infty} q_z \\
&= \begin{cases} \left( \frac{q}{p} \right)^z, & p > q \\ 1, & p \leq q \end{cases}
\end{align*}
Let $G$ be the ultimate gain or loss:
\[ G = \begin{cases} a - z, & \text{with probability } p_z \\ -z, & \text{with probability } q_z \end{cases} \tag{5.3} \]
\[ E[G] = \begin{cases} a p_z - z, & \text{if } p \neq q \\ 0, & \text{if } p = q \end{cases} \tag{5.4} \]
"A fair game remains fair": if the coin is fair then games based on it have expected reward $0$.
Duration of a Game. Let $D_z$ be the expected time until the random walk hits $0$ or $a$, starting from $z$. Is $D_z$ finite? Divide time into successive windows of length $a$: the number of windows before one consisting entirely of $+1$ steps (or entirely of $-1$ steps) is geometric with finite mean, and the walk must be absorbed by the end of such a window; hence $D_z$ is bounded by $a$ times this mean, and in particular finite. Condition on the first step:
\begin{align*}
E[\text{duration}] &= E[E[\text{duration} \mid \text{first step}]] \\
&= p\, (E[\text{duration} \mid \text{first step up}]) + q\, (E[\text{duration} \mid \text{first step down}]) \\
&= p(1 + D_{z+1}) + q(1 + D_{z-1})
\end{align*}
so that
\[ D_z = 1 + p D_{z+1} + q D_{z-1} \]
This equation holds for $0 < z < a$, with $D_0 = D_a = 0$. Try for a particular solution $D_z = Cz$:
\[ Cz = Cp(z + 1) + Cq(z - 1) + 1 \implies C = \frac{1}{q - p} \qquad \text{for } p \neq q \]
Consider the homogeneous relation
\[ p t^2 - t + q = 0, \qquad t_1 = 1, \quad t_2 = \frac{q}{p} \]
The general solution for $p \neq q$ is
\[ D_z = A + B \left( \frac{q}{p} \right)^z + \frac{z}{q - p} \]
Substituting $z = 0, a$ to determine $A$ and $B$ gives
\[ D_z = \frac{z}{q - p} - \frac{a}{q - p} \cdot \frac{1 - \left( \frac{q}{p} \right)^z}{1 - \left( \frac{q}{p} \right)^a}, \qquad p \neq q \]
If $p = q$ then a particular solution is $-z^2$, and the general solution is $D_z = -z^2 + A + Bz$. Substituting the boundary conditions gives
\[ D_z = z(a - z), \qquad p = q \]
Example. With initial capital $z$ and total capital $a$:

p     q     z    a     P(ruin)   E[gain]   E[duration]
0.5   0.5   90   100   0.1       0         900
0.45  0.55  9    10    0.21      -1.1      11
0.45  0.55  90   100   0.87      -77       766
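The table rows follow directly from the formulas for $q_z$, $E[G]$ and $D_z$; a minimal sketch evaluating them:

```python
def ruin_quantities(p, z, a):
    """P(ruin), expected gain, and expected duration for a simple random
    walk started at z and absorbed at 0 or a, using the formulas above."""
    q = 1.0 - p
    if p == q:
        pz = z / a                  # P(hit a before 0)
        qz = (a - z) / a            # P(hit 0 before a), i.e. ruin
        dz = z * (a - z)            # expected duration
    else:
        r = q / p
        pz = (1 - r**z) / (1 - r**a)
        qz = (r**a - r**z) / (r**a - 1)
        dz = z / (q - p) - (a / (q - p)) * (1 - r**z) / (1 - r**a)
    gain = a * pz - z
    return qz, gain, dz

# reproduce the middle row of the table: p=0.45, z=9, a=10
ruin, gain, dur = ruin_quantities(0.45, 9, 10)   # ≈ (0.21, -1.1, 11)
```
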
Now stop the random walk when it hits $0$ or $a$; we have absorption at $0$ and $a$. Let
\[ U_{z,n} = P(\text{r.w. hits } 0 \text{ at time } n \mid \text{starts at } z) \]
Then
\[ U_{z,n+1} = p U_{z+1,n} + q U_{z-1,n}, \qquad 0 < z < a, \; n \geq 0 \]
with boundary conditions
\[ U_{0,n} = U_{a,n} = 0 \;\; (n > 0), \qquad U_{0,0} = 1, \qquad U_{z,0} = 0 \;\; (0 < z \leq a) \]
Let
\[ U_z(s) = \sum_{n=0}^\infty U_{z,n} s^n \]
Multiply the recurrence by $s^{n+1}$ and add for $n = 0, 1, 2, \dots$:
\[ U_z(s) = ps U_{z+1}(s) + qs U_{z-1}(s), \qquad \text{where } U_0(s) = 1 \text{ and } U_a(s) = 0 \]
Look for a solution of the form $U_z(s) = (\lambda(s))^z$. Then $\lambda(s)$ must satisfy
\[ \lambda(s) = ps (\lambda(s))^2 + qs \]
which has two roots,
\[ \lambda_1(s), \lambda_2(s) = \frac{1 \pm \sqrt{1 - 4pqs^2}}{2ps} \]
Every solution is of the form
\[ U_z(s) = A(s) (\lambda_1(s))^z + B(s) (\lambda_2(s))^z \]
Substituting $U_0(s) = 1$ and $U_a(s) = 0$ gives $A(s) + B(s) = 1$ and
\[ A(s) (\lambda_1(s))^a + B(s) (\lambda_2(s))^a = 0 \]
so that
\[ U_z(s) = \frac{(\lambda_1(s))^a (\lambda_2(s))^z - (\lambda_1(s))^z (\lambda_2(s))^a}{(\lambda_1(s))^a - (\lambda_2(s))^a} \]
But $\lambda_1(s)\lambda_2(s) = \frac{q}{p}$ (recall the quadratic), so
\[ U_z(s) = \left( \frac{q}{p} \right)^z \frac{(\lambda_1(s))^{a-z} - (\lambda_2(s))^{a-z}}{(\lambda_1(s))^a - (\lambda_2(s))^a} \]
The same method gives the generating function for absorption probabilities at the other barrier. The generating function for the duration of the game is the sum of these two generating functions.
Chapter 6

Continuous Random Variables

In this chapter we drop the assumption that $\Omega$ is finite or countable. Assume instead that we are given a probability $P$ on some subset of $\Omega$.
For example, spin a pointer, and let $\omega \in \Omega$ give the position at which it stops, with $\Omega = \{\omega : 0 \leq \omega < 2\pi\}$. Let
\[ P(\omega \in [0, \theta]) = \frac{\theta}{2\pi} \qquad (0 \leq \theta < 2\pi) \]
Definition 6.1. A continuous random variable $X$ is a function $X : \Omega \to \mathbb{R}$ for which
\[ P(a \leq X(\omega) \leq b) = \int_a^b f(x)\,dx \]
where $f(x)$ is a function satisfying
1. $f(x) \geq 0$
2. $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
The function $f$ is called the probability density function.
For example, if $X(\omega) = \omega$, the position of the pointer, then $X$ is a continuous random variable with p.d.f.
\[ f(x) = \begin{cases} \frac{1}{2\pi}, & 0 \leq x < 2\pi \\ 0, & \text{otherwise} \end{cases} \tag{6.1} \]
This is an example of a uniformly distributed random variable, on the interval $[0, 2\pi]$
in this case. Intuition about probability density functions is based on the approximate relation
\[ P(X \in [x, x + \delta x]) = \int_x^{x+\delta x} f(z)\,dz \approx f(x)\,\delta x \]
Proofs, however, more often use the distribution function
\[ F(x) = P(X \leq x) \]
which is increasing in $x$. If $X$ is a continuous random variable then
\[ F(x) = \int_{-\infty}^x f(z)\,dz \]
and so $F$ is continuous and differentiable, with
\[ F'(x) = f(x) \]
(at any point $x$ where the fundamental theorem of calculus applies).
The distribution function is also defined for a discrete random variable,
\[ F(x) = \sum_{\omega : X(\omega) \leq x} p_\omega \]
in which case $F$ is a step function. In either case
\[ P(a < X \leq b) = P(X \leq b) - P(X \leq a) = F(b) - F(a) \]
Example (the exponential distribution). Let
\[ F(x) = \begin{cases} 1 - e^{-\lambda x}, & 0 \leq x < \infty \\ 0, & x < 0 \end{cases} \tag{6.2} \]
The corresponding pdf is
\[ f(x) = \lambda e^{-\lambda x}, \qquad 0 \leq x < \infty \]
This is known as the exponential distribution with parameter $\lambda$. If $X$ has this distribution then
\[ P(X > x + z \mid X > z) = \frac{P(X > x + z)}{P(X > z)} = \frac{e^{-\lambda(x+z)}}{e^{-\lambda z}} = e^{-\lambda x} = P(X > x) \]
This is known as the memoryless property of the exponential distribution.
Theorem 6.1. If $X$ is a continuous random variable with pdf $f(x)$, and $h(x)$ is a continuous, strictly increasing function with $h^{-1}(x)$ differentiable, then $h(X)$ is a continuous random variable with pdf
\[ f_h(x) = f\left(h^{-1}(x)\right) \frac{d}{dx} h^{-1}(x) \]
Proof. The distribution function of $h(X)$ is
\[ P(h(X) \leq x) = P\left(X \leq h^{-1}(x)\right) = F\left(h^{-1}(x)\right) \]
since $h$ is strictly increasing, where $F$ is the distribution function of $X$. Then
\[ \frac{d}{dx} P(h(X) \leq x) = f\left(h^{-1}(x)\right) \frac{d}{dx} h^{-1}(x) \]
so $h(X)$ is a continuous random variable with pdf $f_h$ as claimed. (It is usually easier to repeat this proof than to remember the result.)
Example. Suppose $X \sim U[0, 1]$, that is, uniformly distributed on $[0, 1]$, and consider $Y = -\log X$. Then
\begin{align*}
P(Y \leq y) &= P(-\log X \leq y) = P\left(X \geq e^{-y}\right) \\
&= \int_{e^{-y}}^1 1\,dx = 1 - e^{-y}
\end{align*}
Thus $Y$ is exponentially distributed (with parameter 1).
More generally:

Theorem 6.2. Let $U \sim U[0, 1]$. For any continuous distribution function $F$, the random variable $X$ defined by $X = F^{-1}(U)$ has distribution function $F$.
Proof.
\[ P(X \leq x) = P\left(F^{-1}(U) \leq x\right) = P(U \leq F(x)) = F(x), \qquad \text{since } U \sim U[0, 1] \]
Remarks.
1. It is a bit more messy for discrete random variables. If
\[ P(X = x_i) = p_i, \qquad i = 0, 1, \dots \]
let
\[ X = x_j \quad \text{if} \quad \sum_{i=0}^{j-1} p_i \leq U < \sum_{i=0}^{j} p_i, \qquad U \sim U[0, 1] \]
2. This is useful for simulations.
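Both remarks can be illustrated directly. A minimal sketch of inverse-transform sampling, for the exponential distribution ($F^{-1}(u) = -\log(1-u)/\lambda$) and for a discrete distribution:

```python
import math
import random

def sample_exponential(lam, rng):
    """Exponential(lam) via Theorem 6.2: X = F^{-1}(U) with
    F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -log(1 - u) / lam."""
    return -math.log(1.0 - rng.random()) / lam

def sample_discrete(xs, ps, rng):
    """Discrete analogue: X = x_j when sum_{i<j} p_i <= U < sum_{i<=j} p_i."""
    u, c = rng.random(), 0.0
    for x, p in zip(xs, ps):
        c += p
        if u < c:
            return x
    return xs[-1]

rng = random.Random(3)
mean_exp = sum(sample_exponential(2.0, rng) for _ in range(100_000)) / 100_000
# E[X] = 1/lam = 0.5 for lam = 2
```
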
6.1 Jointly Distributed Random Variables

For two random variables $X$ and $Y$ the joint distribution function is
\[ F(x, y) = P(X \leq x, Y \leq y), \qquad F : \mathbb{R}^2 \to [0, 1] \]
Let
\[ F_X(x) = P(X \leq x) = P(X \leq x, Y < \infty) = F(x, \infty) = \lim_{y\to\infty} F(x, y) \]
This is called the marginal distribution of $X$. Similarly
\[ F_Y(y) = F(\infty, y) \]
$X_1, X_2, \dots, X_n$ are jointly distributed continuous random variables if, for a set $C \subseteq \mathbb{R}^n$,
\[ P((X_1, X_2, \dots, X_n) \in C) = \int \dots \int_{(x_1, \dots, x_n) \in C} f(x_1, \dots, x_n)\,dx_1 \dots dx_n \]
for some function $f$, called the joint probability density function, satisfying the obvious conditions:
1. $f(x_1, \dots, x_n) \geq 0$
2. $\int \dots \int_{\mathbb{R}^n} f(x_1, \dots, x_n)\,dx_1 \dots dx_n = 1$

Example. ($n = 2$)
\[ F(x, y) = P(X \leq x, Y \leq y) = \int_{-\infty}^x \int_{-\infty}^y f(u, v)\,dv\,du \]
and so
\[ f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \partial y} \]
provided this is defined at $(x, y)$.

Theorem 6.3. If $X$ and $Y$ are jointly continuous random variables then they are individually continuous.
Proof. Since $X$ and $Y$ are jointly continuous random variables,
\begin{align*}
P(X \in A) &= P(X \in A, Y \in (-\infty, +\infty)) \\
&= \int_A \int_{-\infty}^\infty f(x, y)\,dy\,dx = \int_A f_X(x)\,dx
\end{align*}
where
\[ f_X(x) = \int_{-\infty}^\infty f(x, y)\,dy \]
is the pdf of $X$.

Jointly continuous random variables $X$ and $Y$ are independent if
\[ f(x, y) = f_X(x) f_Y(y) \]
Then
\[ P(X \in A, Y \in B) = P(X \in A)\,P(Y \in B) \]
Similarly, jointly continuous random variables $X_1, \dots, X_n$ are independent if
\[ f(x_1, \dots, x_n) = \prod_{i=1}^n f_{X_i}(x_i) \]
where the $f_{X_i}(x_i)$ are the pdfs of the individual random variables.
Example. Two points $X$ and $Y$ are tossed at random and independently onto a line segment of length $L$. What is the probability that $|X - Y| \leq l$?
Suppose that "at random" means uniformly, so that
\[ f(x, y) = \frac{1}{L^2}, \qquad (x, y) \in [0, L]^2 \]
The desired probability is
\[ \int_A f(x, y)\,dx\,dy = \frac{\text{area of } A}{L^2} = \frac{L^2 - 2 \cdot \frac{1}{2}(L - l)^2}{L^2} = \frac{2Ll - l^2}{L^2} \]
where $A = \{(x, y) : |x - y| \leq l\}$.
Example (Buffon's Needle Problem). A needle of length $l$ is tossed at random onto a floor marked with parallel lines a distance $L$ apart, where $l \leq L$. What is the probability that the needle intersects one of the parallel lines?
Let $\Theta \in [0, \pi)$ be the angle between the needle and the parallel lines, and let $X$ be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that
\[ X \sim U[0, L], \qquad \Theta \sim U[0, \pi) \]
and that $X$ and $\Theta$ are independent. Thus
\[ f(x, \theta) = \frac{1}{\pi L}, \qquad 0 \leq x \leq L \text{ and } 0 \leq \theta < \pi \]
The needle intersects a line if and only if $X \leq l \sin\Theta$; call this event $A$. Then
\[ P(A) = \int_A f(x, \theta)\,dx\,d\theta = \int_0^\pi \frac{l \sin\theta}{\pi L}\,d\theta = \frac{2l}{\pi L} \]
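The answer $2l/(\pi L)$ can be checked by simulating the pair $(X, \Theta)$ exactly as in the model above:

```python
import math
import random

def buffon_estimate(l, L, trials=200_000, seed=4):
    """Monte Carlo estimate of the probability that a needle of length l
    crosses a line, lines a distance L apart (l <= L); theory: 2l/(pi*L)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.uniform(0.0, L)               # distance to the nearest line
        theta = rng.uniform(0.0, math.pi)     # angle with the lines
        if x <= l * math.sin(theta):          # crossing condition X <= l sin(theta)
            hits += 1
    return hits / trials

p_hat = buffon_estimate(1.0, 2.0)
# theory: 2*1/(pi*2) = 1/pi ≈ 0.3183
```
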
Definition 6.2. The expectation or mean of a continuous random variable $X$ is
\[ E[X] = \int_{-\infty}^\infty x f(x)\,dx \]
provided that not both of $\int_{-\infty}^0 x f(x)\,dx$ and $\int_0^\infty x f(x)\,dx$ are infinite.
Example (Normal Distribution). Let
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty \]
This is non-negative; for it to be a pdf we also need to check that
\[ \int_{-\infty}^\infty f(x)\,dx = 1 \]
Make the substitution $z = \frac{x - \mu}{\sigma}$. Then
\[ I = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^\infty e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{z^2}{2}}\,dz \]
Thus
\begin{align*}
I^2 &= \frac{1}{2\pi} \int_{-\infty}^\infty e^{-\frac{x^2}{2}}\,dx \int_{-\infty}^\infty e^{-\frac{y^2}{2}}\,dy \\
&= \frac{1}{2\pi} \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-\frac{x^2 + y^2}{2}}\,dx\,dy \\
&= \frac{1}{2\pi} \int_0^{2\pi} \int_0^\infty r e^{-\frac{r^2}{2}}\,dr\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} d\theta = 1
\end{align*}
Therefore $I = 1$. A random variable with the pdf $f(x)$ given above has a normal distribution with parameters $\mu$ and $\sigma^2$; we write this as
\[ X \sim N[\mu, \sigma^2] \]
The expectation is
\begin{align*}
E[X] &= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^\infty x e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \\
&= \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^\infty (x - \mu) e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx + \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^\infty \mu e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx
\end{align*}
The first term is convergent and equals zero by symmetry, so that
\[ E[X] = 0 + \mu = \mu \]
Theorem 6.4. If $X$ is a continuous random variable then
\[ E[X] = \int_0^\infty P(X \geq x)\,dx - \int_0^\infty P(X \leq -x)\,dx \]
Proof.
\begin{align*}
\int_0^\infty P(X \geq x)\,dx &= \int_0^\infty \int_x^\infty f(y)\,dy\,dx \\
&= \int_0^\infty \int_0^\infty I[y \geq x]\, f(y)\,dy\,dx \\
&= \int_0^\infty \left( \int_0^y dx \right) f(y)\,dy = \int_0^\infty y f(y)\,dy
\end{align*}
Similarly
\[ \int_0^\infty P(X \leq -x)\,dx = -\int_{-\infty}^0 y f(y)\,dy \]
and the result follows.
Note: this holds for discrete random variables too, and is useful as a general way of finding the expectation whether the random variable is discrete or continuous.
If $X$ takes values in the set $\{0, 1, 2, \dots\}$ the theorem states that
\[ E[X] = \sum_{n=1}^\infty P(X \geq n) \]
and a direct proof follows:
\begin{align*}
\sum_{n=1}^\infty P(X \geq n) &= \sum_{n=1}^\infty \sum_{m=0}^\infty I[m \geq n]\, P(X = m) \\
&= \sum_{m=0}^\infty \sum_{n=1}^\infty I[m \geq n]\, P(X = m) \\
&= \sum_{m=0}^\infty m P(X = m) = E[X]
\end{align*}
Theorem 6.5. Let $X$ be a continuous random variable with pdf $f(x)$ and let $h(x)$ be a continuous real-valued function. Then, provided
\[ \int_{-\infty}^\infty |h(x)| f(x)\,dx < \infty, \]
\[ E[h(X)] = \int_{-\infty}^\infty h(x) f(x)\,dx \]
Proof.
\begin{align*}
\int_0^\infty P(h(X) \geq y)\,dy &= \int_0^\infty \left( \int_{x : h(x) \geq y} f(x)\,dx \right) dy \\
&= \int_0^\infty \int_{x : h(x) \geq 0} I[h(x) \geq y]\, f(x)\,dx\,dy \\
&= \int_{x : h(x) \geq 0} \left( \int_0^{h(x)} dy \right) f(x)\,dx \\
&= \int_{x : h(x) \geq 0} h(x) f(x)\,dx
\end{align*}
Similarly
\[ \int_0^\infty P(h(X) \leq -y)\,dy = -\int_{x : h(x) \leq 0} h(x) f(x)\,dx \]
So the result follows from the last theorem.
Definition 6.3. The variance of a continuous random variable $X$ is
\[ \operatorname{Var} X = E\left[(X - E[X])^2\right] \]
Note: the properties of expectation and variance are the same for discrete and continuous random variables; just replace $\sum$ with $\int$ in the proofs.
Example.
\[ \operatorname{Var} X = E[X^2] - E[X]^2 = \int_{-\infty}^\infty x^2 f(x)\,dx - \left( \int_{-\infty}^\infty x f(x)\,dx \right)^2 \]
Example. Suppose $X \sim N[\mu, \sigma^2]$ and let $Z = \frac{X - \mu}{\sigma}$. Then
\begin{align*}
P(Z \leq z) &= P\left( \frac{X - \mu}{\sigma} \leq z \right) = P(X \leq \mu + \sigma z) \\
&= \int_{-\infty}^{\mu + \sigma z} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \\
&= \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\,du \qquad \text{substituting } u = \frac{x - \mu}{\sigma} \\
&= \Phi(z), \text{ the distribution function of a } N(0, 1) \text{ random variable}
\end{align*}
Thus $Z \sim N(0, 1)$. What is the variance of $Z$?
\begin{align*}
\operatorname{Var} Z &= E[Z^2] - E[Z]^2 \qquad \text{(the last term is zero)} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty z^2 e^{-\frac{z^2}{2}}\,dz \\
&= \left[ -\frac{1}{\sqrt{2\pi}}\, z e^{-\frac{z^2}{2}} \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-\frac{z^2}{2}}\,dz \\
&= 0 + 1 = 1
\end{align*}
And the variance of $X$? Since $X = \mu + \sigma Z$ we have $E[X] = \mu$ (which we knew already) and
\[ \operatorname{Var} X = \sigma^2 \operatorname{Var} Z = \sigma^2, \qquad X \sim N(\mu, \sigma^2) \]
6.2 Transformation of Random Variables

Suppose $X_1, X_2, \dots, X_n$ have joint pdf $f(x_1, \dots, x_n)$. Let
\begin{align*}
Y_1 &= r_1(X_1, X_2, \dots, X_n) \\
Y_2 &= r_2(X_1, X_2, \dots, X_n) \\
&\;\;\vdots \\
Y_n &= r_n(X_1, X_2, \dots, X_n)
\end{align*}
Let $R \subseteq \mathbb{R}^n$ be such that
\[ P((X_1, X_2, \dots, X_n) \in R) = 1 \]
Let $S$ be the image of $R$ under the above transformation, and suppose the transformation from $R$ to $S$ is 1-1 (bijective). Then there exist inverse functions
\[ x_1 = s_1(y_1, y_2, \dots, y_n), \quad x_2 = s_2(y_1, y_2, \dots, y_n), \quad \dots, \quad x_n = s_n(y_1, y_2, \dots, y_n) \]
Assume that $\frac{\partial s_i}{\partial y_j}$ exists and is continuous at every point $(y_1, y_2, \dots, y_n)$ in $S$, and let $J$ be the Jacobian
\[ J = \det \begin{pmatrix} \frac{\partial s_1}{\partial y_1} & \dots & \frac{\partial s_1}{\partial y_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial s_n}{\partial y_1} & \dots & \frac{\partial s_n}{\partial y_n} \end{pmatrix} \tag{6.3} \]
If $A \subseteq R$ then
\begin{align*}
P((X_1, \dots, X_n) \in A) &= \int_A f(x_1, \dots, x_n)\,dx_1 \dots dx_n \tag{1} \\
&= \int_B f(s_1, \dots, s_n)\, |J|\,dy_1 \dots dy_n \qquad \text{where } B \text{ is the image of } A \\
&= P((Y_1, \dots, Y_n) \in B) \tag{2}
\end{align*}
Since the transformation is 1-1, (1) and (2) agree. Thus the density of $Y_1, \dots, Y_n$ is
\[ g(y_1, y_2, \dots, y_n) = \begin{cases} f(s_1(y_1, \dots, y_n), \dots, s_n(y_1, \dots, y_n))\, |J|, & (y_1, y_2, \dots, y_n) \in S \\ 0, & \text{otherwise} \end{cases} \]
Example (density of products and quotients). Suppose that $(X, Y)$ has density
\[ f(x, y) = \begin{cases} 4xy, & 0 \leq x \leq 1, \; 0 \leq y \leq 1 \\ 0, & \text{otherwise} \end{cases} \tag{6.4} \]
Let $U = X/Y$ and $V = XY$. Then
\[ X = \sqrt{UV}, \qquad Y = \sqrt{V/U} \]
that is, $x = \sqrt{uv}$ and $y = \sqrt{v/u}$, so
\[ \frac{\partial x}{\partial u} = \frac{1}{2}\sqrt{\frac{v}{u}}, \quad \frac{\partial x}{\partial v} = \frac{1}{2}\sqrt{\frac{u}{v}}, \quad \frac{\partial y}{\partial u} = -\frac{1}{2} v^{\frac{1}{2}} u^{-\frac{3}{2}}, \quad \frac{\partial y}{\partial v} = \frac{1}{2\sqrt{uv}} \]
Therefore $|J| = \frac{1}{2u}$, and so
\[ g(u, v) = \frac{1}{2u}(4xy) = \frac{1}{2u} \cdot 4\sqrt{uv}\sqrt{\frac{v}{u}} = \begin{cases} \dfrac{2v}{u}, & (u, v) \in D \\ 0, & \text{otherwise} \end{cases} \]
where $D$ is the image of the unit square under the transformation.
Note that $U$ and $V$ are NOT independent:
\[ g(u, v) = \frac{2v}{u}\, I[(u, v) \in D] \]
is not the product of a function of $u$ alone and a function of $v$ alone, since the indicator of $D$ does not factorize.
When the transformations are linear things are simpler still. Let $A$ be an $n \times n$ invertible matrix, and
\[ \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = A \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} \]
Then
\[ |J| = \left| \det A^{-1} \right| = \left| \det A \right|^{-1} \]
and the pdf of $(Y_1, \dots, Y_n)$ is
\[ g(y_1, \dots, y_n) = \frac{1}{|\det A|} f(A^{-1} y) \]
Example. Suppose $X_1, X_2$ have the joint pdf $f(x_1, x_2)$. Calculate the pdf of $X_1 + X_2$.
Let $Y = X_1 + X_2$ and $Z = X_2$. Then $X_1 = Y - Z$ and $X_2 = Z$, so
\[ A^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \tag{6.5} \]
and $\left| \det A^{-1} \right| = 1 = \frac{1}{|\det A|}$. Then
\[ g(y, z) = f(x_1, x_2) = f(y - z, z) \]
This is the joint density of $Y$ and $Z$. The marginal density of $Y$ is
\[ g(y) = \int_{-\infty}^\infty f(y - z, z)\,dz, \qquad -\infty < y < \infty \]
or, by a change of variable,
\[ g(y) = \int_{-\infty}^\infty f(z, y - z)\,dz \]
If $X_1$ and $X_2$ are independent, with pdfs $f_1$ and $f_2$, then $f(x_1, x_2) = f_1(x_1) f_2(x_2)$ and then
\[ g(y) = \int_{-\infty}^\infty f_1(y - z) f_2(z)\,dz \]
-- the convolution of $f_1$ and $f_2$.
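The convolution integral can be evaluated numerically. A sketch (the rectangle-rule evaluation and the choice of two $U[0,1]$ densities, whose sum has the triangular density $g(y) = y$ on $[0,1]$, $2 - y$ on $[1,2]$, are our illustrative assumptions):

```python
def convolve_pdfs(f1, f2, y, n=10_000, lo=0.0, hi=2.0):
    """Numerically evaluate g(y) = integral of f1(y - z) f2(z) dz by the
    midpoint rule, assuming both densities vanish outside [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        z = lo + (k + 0.5) * h
        total += f1(y - z) * f2(z)
    return total * h

uniform = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0   # U[0,1] density

# sum of two independent U[0,1] variables: triangular density
g_half = convolve_pdfs(uniform, uniform, 0.5)          # ≈ 0.5
g_three_halves = convolve_pdfs(uniform, uniform, 1.5)  # ≈ 0.5
```
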
For a pdf $f(x)$, $\hat{x}$ is a mode if $f(\hat{x}) \geq f(x)$ for all $x$, and $\hat{x}$ is a median if
\[ \int_{-\infty}^{\hat{x}} f(x)\,dx = \int_{\hat{x}}^\infty f(x)\,dx = \frac{1}{2} \]
For a discrete random variable, $\hat{x}$ is a median if
\[ P(X \leq \hat{x}) \geq \frac{1}{2} \quad \text{and} \quad P(X \geq \hat{x}) \geq \frac{1}{2} \]
If $X_1, \dots, X_n$ is a sample from the distribution then recall that the sample mean is
\[ \frac{1}{n} \sum_{i=1}^n X_i \]
Let $Y_1, \dots, Y_n$ (the order statistics) be the values of $X_1, \dots, X_n$ arranged in increasing order. Then the sample median is $Y_{\frac{n+1}{2}}$ if $n$ is odd, or any value in $\left[ Y_{\frac{n}{2}}, Y_{\frac{n}{2}+1} \right]$ if $n$ is even.
If $Y_n = \max\{X_1, \dots, X_n\}$ and $X_1, \dots, X_n$ are i.i.d.r.vs with distribution $F$ and density $f$, then
\[ P(Y_n \leq y) = P(X_1 \leq y, \dots, X_n \leq y) = (F(y))^n \]
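The identity $P(Y_n \leq y) = (F(y))^n$ is simple to verify by simulation; for $U[0,1]$ samples it reads $P(\max \leq y) = y^n$:

```python
import random

def max_cdf_estimate(n, y, trials=100_000, seed=5):
    """Estimate P(max(X_1,...,X_n) <= y) for i.i.d. U[0,1] samples;
    the result above gives (F(y))^n = y^n here."""
    rng = random.Random(seed)
    count = sum(1 for _ in range(trials)
                if all(rng.random() <= y for _ in range(n)))
    return count / trials

est = max_cdf_estimate(5, 0.9)
# theory: 0.9**5 = 0.59049
```
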
[Continuing from the order statistics of $n$ i.i.d. exponential random variables: if $Y_1 \leq \dots \leq Y_n$ are the order statistics, their joint density is $g(y_1, \dots, y_n) = n! f(y_1) \cdots f(y_n)$ on $y_1 \leq \dots \leq y_n$.] Let $Z = AY$, where
\[ A = \begin{pmatrix} 1 & 0 & 0 & \dots & 0 & 0 \\ -1 & 1 & 0 & \dots & 0 & 0 \\ 0 & -1 & 1 & \dots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \dots & -1 & 1 \end{pmatrix} \tag{6.7} \]
so that $Z_1 = Y_1$, $Z_i = Y_i - Y_{i-1}$ for $i \geq 2$, and $\det A = 1$. Then
\begin{align*}
h(z_1, \dots, z_n) &= g(y_1, \dots, y_n) = n! f(y_1) \cdots f(y_n) \\
&= n! \lambda^n e^{-\lambda y_1} \cdots e^{-\lambda y_n} = n! \lambda^n e^{-\lambda(y_1 + \dots + y_n)} \\
&= n! \lambda^n e^{-\lambda(n z_1 + (n-1) z_2 + \dots + z_n)} \\
&= \prod_{i=1}^n i\lambda\, e^{-i\lambda z_{n+1-i}}
\end{align*}
Thus $h(z_1, \dots, z_n)$ is expressed as the product of $n$ density functions, with
\[ Z_{n+1-i} \sim \exp(i\lambda) \]
exponentially distributed with parameter $i\lambda$, and $Z_1, \dots, Z_n$ independent.
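The conclusion $Z_{n+1-i} \sim \exp(i\lambda)$, i.e. $E[Z_k] = 1/((n+1-k)\lambda)$, is easy to check by simulation (a sketch, with the small sample size $n = 4$ chosen for illustration):

```python
import math
import random

def spacing_means(n, lam, trials=50_000, seed=6):
    """Average spacings Z_1 = Y_1, Z_i = Y_i - Y_{i-1} of the order statistics
    of n i.i.d. Exp(lam) variables; theory: Z_k ~ Exp((n + 1 - k) * lam),
    so E[Z_k] = 1 / ((n + 1 - k) * lam)."""
    rng = random.Random(seed)
    sums = [0.0] * n
    for _ in range(trials):
        ys = sorted(-math.log(1.0 - rng.random()) / lam for _ in range(n))
        prev = 0.0
        for k, y in enumerate(ys):
            sums[k] += y - prev      # accumulate the k-th spacing
            prev = y
    return [s / trials for s in sums]

means = spacing_means(4, 1.0)
# theory: E[Z_1] = 1/4, E[Z_2] = 1/3, E[Z_3] = 1/2, E[Z_4] = 1
```
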
Example. Let $X$ and $Y$ be independent $N(0, 1)$ random variables. Let
\[ D = R^2 = X^2 + Y^2, \qquad \tan\Theta = \frac{Y}{X} \]
so that
\[ d = x^2 + y^2, \qquad \theta = \arctan\frac{y}{x} \]
Then
\[ |J| = \left| \det \begin{pmatrix} 2x & 2y \\ \dfrac{-y/x^2}{1 + (y/x)^2} & \dfrac{1/x}{1 + (y/x)^2} \end{pmatrix} \right| = 2 \tag{6.8} \]
[This continues an example on Bertrand's paradox: a chord of a unit circle is "chosen at random", and we ask for the probability that it is longer than the side of an inscribed equilateral triangle; the answer depends on what "at random" means.]
(2) The chord is perpendicular to a given diameter, and the point of intersection is uniformly distributed over the diameter. A chord at distance $a$ from the centre has half-length $\sqrt{1 - a^2}$, and
\[ a^2 + \left( \frac{\sqrt{3}}{2} \right)^2 = 1 \implies a = \frac{1}{2} \]
so the chord is longer than the side of the triangle if and only if it lies within distance $\frac{1}{2}$ of the centre:
\[ \text{answer} = \frac{1}{2} \]
(3) The foot of the perpendicular to the chord from the centre of the circle is uniformly distributed over the interior of the circle. The chord is longer than the side of the triangle if and only if the foot falls in the interior circle of radius $\frac{1}{2}$:
\[ \text{answer} = \frac{\pi \left( \frac{1}{2} \right)^2}{\pi \cdot 1^2} = \frac{1}{4} \]
6.3 Moment Generating Functions

If $X$ is a continuous random variable then the analogue of the p.g.f. is the moment generating function, defined by
\[ m(\theta) = E\left[e^{\theta X}\right] \]
for those $\theta$ such that $m(\theta)$ is finite, i.e.
\[ m(\theta) = \int_{-\infty}^\infty e^{\theta x} f(x)\,dx \]
where $f(x)$ is the pdf of $X$.
Theorem 6.6. The moment generating function determines the distribution of $X$, provided $m(\theta)$ is finite for all $\theta$ in some interval containing the origin.
Proof. Not proved.

Theorem 6.7. If $X$ and $Y$ are independent random variables with moment generating functions $m_X(\theta)$ and $m_Y(\theta)$, then $X + Y$ has moment generating function
\[ m_{X+Y}(\theta) = m_X(\theta) \cdot m_Y(\theta) \]
Proof.
\[ E\left[e^{\theta(X+Y)}\right] = E\left[e^{\theta X} e^{\theta Y}\right] = E\left[e^{\theta X}\right] E\left[e^{\theta Y}\right] = m_X(\theta)\, m_Y(\theta) \]

Theorem 6.8. The $r$-th moment of $X$, i.e. the expected value of $X^r$, $E[X^r]$, is the coefficient of $\frac{\theta^r}{r!}$ in the series expansion of $m(\theta)$.
Proof. Sketch:
\[ e^{\theta X} = 1 + \theta X + \frac{\theta^2}{2!} X^2 + \dots \]
\[ E\left[e^{\theta X}\right] = 1 + \theta E[X] + \frac{\theta^2}{2!} E[X^2] + \dots \]

Example. Recall that $X$ has an exponential distribution with parameter $\lambda$ if it has density $\lambda e^{-\lambda x}$ for $0 \leq x < \infty$. Then
\[ E\left[e^{\theta X}\right] = \int_0^\infty e^{\theta x} \lambda e^{-\lambda x}\,dx = \int_0^\infty \lambda e^{-(\lambda - \theta) x}\,dx = \frac{\lambda}{\lambda - \theta} = m(\theta) \qquad \text{for } \theta < \lambda \]
\[ E[X] = m'(0) = \left. \frac{\lambda}{(\lambda - \theta)^2} \right|_{\theta = 0} = \frac{1}{\lambda} \]
\[ E[X^2] = m''(0) = \left. \frac{2\lambda}{(\lambda - \theta)^3} \right|_{\theta = 0} = \frac{2}{\lambda^2} \]
Thus
\[ \operatorname{Var} X = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2} \]
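The derivative calculations can be cross-checked numerically: approximate $m'(0)$ and $m''(0)$ by central finite differences (the step size $h$ is an assumed numerical parameter):

```python
def exp_mgf(theta, lam):
    """m(theta) = lam / (lam - theta), valid for theta < lam (Exp(lam))."""
    return lam / (lam - theta)

def moments_from_mgf(m, h=1e-4):
    """First two moments via central finite differences of the mgf at 0:
    E[X] = m'(0), E[X^2] = m''(0), as in Theorem 6.8."""
    first = (m(h) - m(-h)) / (2 * h)
    second = (m(h) - 2 * m(0.0) + m(-h)) / (h * h)
    return first, second

lam = 2.0
ex, ex2 = moments_from_mgf(lambda t: exp_mgf(t, lam))
var = ex2 - ex * ex
# theory: E[X] = 1/lam = 0.5, E[X^2] = 2/lam^2 = 0.5, Var X = 1/lam^2 = 0.25
```
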
[The following proves the closure properties of the normal distribution stated in the preceding theorem (whose statement falls on a missing page): if $X \sim N[\mu_1, \sigma_1^2]$ and $Y \sim N[\mu_2, \sigma_2^2]$ are independent then $X + Y \sim N[\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2]$, and $aX \sim N[a\mu_1, a^2\sigma_1^2]$; the proof uses the mgf $E[e^{\theta X}] = e^{\mu\theta + \frac{1}{2}\sigma^2\theta^2}$ of a $N[\mu, \sigma^2]$ random variable.]
Proof. 1.
\begin{align*}
E\left[e^{\theta(X+Y)}\right] &= E\left[e^{\theta X}\right] E\left[e^{\theta Y}\right] \\
&= e^{\mu_1\theta + \frac{1}{2}\sigma_1^2\theta^2}\, e^{\mu_2\theta + \frac{1}{2}\sigma_2^2\theta^2} \\
&= e^{(\mu_1+\mu_2)\theta + \frac{1}{2}(\sigma_1^2+\sigma_2^2)\theta^2}
\end{align*}
which is the moment generating function of
\[ N[\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2] \]
2.
\[ E\left[e^{\theta(aX)}\right] = E\left[e^{(a\theta)X}\right] = e^{\mu_1(a\theta) + \frac{1}{2}\sigma_1^2(a\theta)^2} = e^{(a\mu_1)\theta + \frac{1}{2}a^2\sigma_1^2\theta^2} \]
which is the moment generating function of
\[ N[a\mu_1, a^2\sigma_1^2] \]
6.4 Central Limit Theorem

Let $X_1, \dots, X_n$ be i.i.d.r.vs, each with mean $0$ and variance $\sigma^2$:
\[ \operatorname{Var} X_i = \sigma^2 \]
Then $X_1 + \dots + X_n$ has variance
\[ \operatorname{Var}(X_1 + \dots + X_n) = n\sigma^2 \]
while $\frac{X_1 + \dots + X_n}{n}$ has variance
\[ \operatorname{Var}\left( \frac{X_1 + \dots + X_n}{n} \right) = \frac{\sigma^2}{n} \]
and $\frac{X_1 + \dots + X_n}{\sqrt{n}}$ has variance
\[ \operatorname{Var}\left( \frac{X_1 + \dots + X_n}{\sqrt{n}} \right) = \sigma^2 \]

Theorem 6.10. Let $X_1, \dots, X_n$ be i.i.d.r.vs with $E[X_i] = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$, and let
\[ S_n = \sum_{i=1}^n X_i \]
Then, for all $(a, b)$ with $a < b$,
\[ \lim_{n\to\infty} P\left( a \leq \frac{S_n - n\mu}{\sigma\sqrt{n}} \leq b \right) = \int_a^b \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\,dz \]
where the integrand is the pdf of a $N[0, 1]$ random variable.
Proof. Sketch of proof. WLOG take $\mu = 0$ and $\sigma^2 = 1$ (we can replace $X_i$ by $\frac{X_i - \mu}{\sigma}$). The mgf of $X_i$ is
\begin{align*}
m_{X_i}(\theta) = E\left[e^{\theta X_i}\right] &= 1 + \theta E[X_i] + \frac{\theta^2}{2} E[X_i^2] + \frac{\theta^3}{3!} E[X_i^3] + \dots \\
&= 1 + \frac{\theta^2}{2} + \frac{\theta^3}{3!} E[X_i^3] + \dots
\end{align*}
The mgf of $\frac{S_n}{\sqrt{n}}$ is
\begin{align*}
E\left[ e^{\frac{\theta S_n}{\sqrt{n}}} \right] &= E\left[ e^{\frac{\theta}{\sqrt{n}}(X_1 + \dots + X_n)} \right] \\
&= E\left[ e^{\frac{\theta}{\sqrt{n}} X_1} \right] \cdots E\left[ e^{\frac{\theta}{\sqrt{n}} X_n} \right] = \left( m_{X_1}\left( \frac{\theta}{\sqrt{n}} \right) \right)^n \\
&= \left( 1 + \frac{\theta^2}{2n} + \frac{\theta^3 E[X_1^3]}{3!\, n^{3/2}} + \dots \right)^n \to e^{\frac{\theta^2}{2}} \quad \text{as } n \to \infty
\end{align*}
which is the mgf of a $N[0, 1]$ random variable.
Note: if $S_n \sim \operatorname{Bin}[n, p]$ ($X_i = 1$ with probability $p$ and $0$ with probability $1 - p$) then
\[ \frac{S_n - np}{\sqrt{npq}} \to N[0, 1] \]
This is called the normal approximation to the binomial distribution. It applies as $n \to \infty$ with $p$ constant. Earlier we discussed the Poisson approximation to the binomial, which applies when $n \to \infty$ with $np$ constant.

Example. There are two competing airlines; $n$ passengers each select one of the two planes at random. The number of passengers in plane one is
\[ S \sim \operatorname{Bin}\left[ n, \tfrac{1}{2} \right] \]
Suppose each plane has $s$ seats, and let $f(s) = P(S > s)$. By the normal approximation,
\[ f(s) = P\left( \frac{S - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} > \frac{s - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} \right) \approx 1 - \Phi\left( \frac{2s - n}{\sqrt{n}} \right) \]
Therefore if $n = 1000$ and $s = 537$ then $f(s) \approx 0.01$: the planes hold $1074$ seats in total, only $74$ in excess of the number of passengers.
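The airline figure follows from a one-line computation with the standard normal distribution function, which can be written via the error function:

```python
import math

def phi_cdf(z):
    """Standard normal distribution function Phi(z), via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def overflow_prob(n, s):
    """Normal approximation to P(S > s) for S ~ Bin(n, 1/2):
    P(S > s) ≈ 1 - Phi((2s - n) / sqrt(n))."""
    return 1.0 - phi_cdf((2.0 * s - n) / math.sqrt(n))

p = overflow_prob(1000, 537)
# the text's figure: roughly 0.01
```
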
Example. An unknown fraction of the electorate, $p$, vote Labour. It is desired to find $p$ to within an error not exceeding $0.005$. How large should the sample be?
Let the fraction of Labour voters in the sample be $p'$. We can never be certain (without complete enumeration) that $|p' - p| \leq 0.005$; instead choose $n$ so that the event $|p' - p| \leq 0.005$ has probability $\geq 0.95$. Writing $S_n$ for the number of Labour voters in the sample, $S_n \sim \operatorname{Bin}[n, p]$,
\[ P(|p' - p| \leq 0.005) = P(|S_n - np| \leq 0.005n) = P\left( \frac{|S_n - np|}{\sqrt{npq}} \leq 0.005 \sqrt{\frac{n}{pq}} \right) \]
We choose $n$ such that this probability is $\geq 0.95$. Since
\[ \int_{-1.96}^{1.96} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}\,dx = 2\Phi(1.96) - 1 \approx 0.95 \]
we must choose $n$ so that
\[ 0.005 \sqrt{\frac{n}{pq}} \geq 1.96 \]
But we don't know $p$. However $pq \leq \frac{1}{4}$, with the worst case $p = q = \frac{1}{2}$, so it suffices to take
\[ n \geq \frac{1.96^2}{0.005^2} \cdot \frac{1}{4} \approx 40{,}000 \]
If we replace $0.005$ by $0.01$ then $n \geq 10{,}000$ will be sufficient, and if we replace $0.005$ by $0.045$ then $n \geq 475$ will suffice.
Note: the answer does not depend upon the size of the total population.
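The worst-case sample-size bound $n \geq z^2 / (4\varepsilon^2)$ (with $z = 1.96$) is a one-liner:

```python
import math

def sample_size(eps, z=1.96):
    """Smallest n with eps * sqrt(n / (p*q)) >= z in the worst case
    p = q = 1/2 (so pq <= 1/4): n >= z^2 / (4 * eps^2)."""
    return math.ceil(z * z / (4.0 * eps * eps))

n1 = sample_size(0.005)   # 38416, which the text rounds to 40,000
n2 = sample_size(0.01)    # 9604, rounded to 10,000 in the text
n3 = sample_size(0.045)   # 475
```
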
6.5 Multivariate normal distribution

Let $X_1, \dots, X_n$ be i.i.d. $N[0, 1]$ random variables, with joint density
\begin{align*}
g(x_1, \dots, x_n) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{x_i^2}{2}} \\
&= \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} \sum_{i=1}^n x_i^2} = \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2} x^T x}
\end{align*}
Write
\[ X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} \]
and let $Z = \mu + AX$, where $A$ is an invertible matrix, so that $X = A^{-1}(Z - \mu)$. The density of $Z$ is
\begin{align*}
f(z_1, \dots, z_n) &= \frac{1}{(2\pi)^{n/2}} \frac{1}{|\det A|} e^{-\frac{1}{2}\left(A^{-1}(z-\mu)\right)^T \left(A^{-1}(z-\mu)\right)} \\
&= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(z-\mu)^T \Sigma^{-1} (z-\mu)}
\end{align*}
where $\Sigma = A A^T$. This is the multivariate normal density,
\[ Z \sim MVN[\mu, \Sigma] \]
Now $\operatorname{Cov}(Z_i, Z_j) = E[(Z_i - \mu_i)(Z_j - \mu_j)]$, which is the $(i, j)$ entry of
\[ E\left[(Z - \mu)(Z - \mu)^T\right] = E\left[(AX)(AX)^T\right] = A\, E\left[X X^T\right] A^T = A I A^T = A A^T = \Sigma \]
so $\Sigma$ is the covariance matrix.
If the covariance matrix of the MVN distribution is diagonal, then the components of the random vector $Z$ are independent, since
\[ f(z_1, \dots, z_n) = \prod_{i=1}^n \frac{1}{(2\pi)^{1/2} \sigma_i} e^{-\frac{1}{2} \left( \frac{z_i - \mu_i}{\sigma_i} \right)^2} \]
where
\[ \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \dots & 0 \\ 0 & \sigma_2^2 & \dots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & \sigma_n^2 \end{pmatrix} \]
This is not necessarily true if the distribution is not MVN -- recall sheet 2, question 9.
Example (bivariate normal).
\[ f(x_1, x_2) = \frac{1}{2\pi (1 - \rho^2)^{1/2} \sigma_1 \sigma_2} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2 - 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\left( \frac{x_2 - \mu_2}{\sigma_2} \right) + \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2 \right] \right\} \]
with $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < +1$. This is the joint density of a bivariate normal random variable. It is an example with
\[ \Sigma^{-1} = \frac{1}{1 - \rho^2} \begin{pmatrix} \frac{1}{\sigma_1^2} & \frac{-\rho}{\sigma_1 \sigma_2} \\ \frac{-\rho}{\sigma_1 \sigma_2} & \frac{1}{\sigma_2^2} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \]
Here $E[X_i] = \mu_i$ and $\operatorname{Var} X_i = \sigma_i^2$, with $\operatorname{Cov}(X_1, X_2) = \sigma_1\sigma_2\rho$, so that
\[ \operatorname{Correlation}(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_1 \sigma_2} = \rho \]