Post on 28-Mar-2021
Chapter 5
Chapter 5 sections
Discrete univariate distributions:
- 5.2 Bernoulli and Binomial distributions
- 5.3 Hypergeometric distributions (just skim)
- 5.4 Poisson distributions
- 5.5 Negative Binomial distributions (just skim)

Continuous univariate distributions:
- 5.6 Normal distributions
- 5.7 Gamma distributions
- 5.8 Beta distributions (just skim)

Multivariate distributions:
- 5.9 Multinomial distributions (just skim)
- 5.10 Bivariate normal distributions
Chapter 5 5.1 Introduction
Families of distributions
How:
- Parameter and parameter space
- pf/pdf and cdf; new notation: f(x | parameters)
- Mean, variance and the m.g.f. ψ(t)
- Features, connections to other distributions, approximations
- Reasoning behind a distribution

Why:
- Natural justification for certain experiments
- A model for the uncertainty in an experiment
All models are wrong, but some are useful – George Box
Chapter 5 5.2 Bernoulli and Binomial distributions
Bernoulli distributions
Def: Bernoulli distributions – Bernoulli(p)
A r.v. X has the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p. The pf of X is

f(x | p) = p^x (1 − p)^(1−x) for x = 0, 1, and 0 otherwise

Parameter space: p ∈ [0, 1]

In an experiment with only two possible outcomes, "success" and "failure", let X = number of successes. Then X ∼ Bernoulli(p) where p is the probability of success.

E(X) = p, Var(X) = p(1 − p) and ψ(t) = E(e^(tX)) = pe^t + (1 − p)

The cdf is

F(x | p) = 0 for x < 0, 1 − p for 0 ≤ x < 1, and 1 for x ≥ 1
Binomial distributions
Def: Binomial distributions – Binomial(n,p)
A r.v. X has the Binomial distribution with parameters n and p if X has the pf

f(x | n, p) = (n choose x) p^x (1 − p)^(n−x) for x = 0, 1, . . . , n, and 0 otherwise

Parameter space: n is a positive integer and p ∈ [0, 1]

If X is the number of "successes" in n independent trials where the probability of success is p each time, then X ∼ Binomial(n, p)

Theorem 5.2.1: If X1, X2, . . . , Xn form n Bernoulli trials with parameter p (i.e. are i.i.d. Bernoulli(p)) then X = X1 + · · · + Xn ∼ Binomial(n, p)
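Theorem 5.2.1 is easy to check by simulation. A minimal sketch (the values n = 10, p = 0.3 and the replication count are arbitrary illustration choices):

```python
import random

random.seed(42)
n, p, reps = 10, 0.3, 50_000

# Each replication: sum n i.i.d. Bernoulli(p) trials.
sums = [sum(1 if random.random() < p else 0 for _ in range(n))
        for _ in range(reps)]

# By the theorem the sums are Binomial(n, p), so the empirical
# mean and variance should be close to n*p and n*p*(1-p).
emp_mean = sum(sums) / reps                              # ≈ 3
emp_var = sum((s - emp_mean) ** 2 for s in sums) / reps  # ≈ 2.1
```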
Binomial distributions
Let X ∼ Binomial(n, p)

E(X) = np, Var(X) = np(1 − p)

To find the m.g.f. of X, write X = X1 + · · · + Xn where the Xi's are i.i.d. Bernoulli(p). Then ψi(t) = pe^t + 1 − p and we get

ψ(t) = ∏_{i=1}^n ψi(t) = (pe^t + 1 − p)^n

cdf: F(x | n, p) = ∑_{t=0}^x (n choose t) p^t (1 − p)^(n−t) = yikes!

Theorem 5.2.2: If Xi ∼ Binomial(ni, p), i = 1, . . . , k and the Xi's are independent, then X = X1 + · · · + Xk ∼ Binomial(∑_{i=1}^k ni, p)
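The cdf has no closed form, but the finite sum is trivial to evaluate numerically. A small sketch (the function name is my own):

```python
from math import comb

def binomial_cdf(x, n, p):
    """F(x | n, p): sum the Binomial(n, p) pf from t = 0 to x."""
    return sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(x + 1))

# Sanity checks: the full sum over the support is 1,
# and the cdf is nondecreasing in x.
full = binomial_cdf(10, 10, 0.3)
```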
Example: Blood testing (Example 5.2.7)
The setup:
- 1000 people need to be tested for a disease that affects 0.2% of all people.
- The test is guaranteed to detect the disease if it is present in a blood sample.

Task: Find all the people that have the disease.
Strategy: Test 1000 samples.

What's the expected number of people that have the disease? Are there any assumptions you need to make?

Strategy (611):
- Divide the people into 10 groups of 100.
- For each group, take a portion of each of the 100 blood samples and combine them into one sample.
- Then test the combined blood samples (10 tests).
Example: Blood testing (Example 5.2.7) – continued
Strategy (611):
- If all of these tests are negative then none of the 1000 people have the disease. Total number of tests needed: 10
- If one of these tests is positive then we test each of the 100 people in that group. Total number of tests needed: 110
- ...
- If all of the 10 tests are positive we end up having to do 1010 tests.

Is this strategy better? What is the expected number of tests needed? When does this strategy lose?
Example: Blood testing (Example 5.2.7) – continued
Let Yi = 1 if the test for group i is positive and Yi = 0 otherwise.
Let Y = Y1 + · · · + Y10 = the number of groups where every individual has to be tested.
Total number of tests needed: T = 10 + 100Y.

Let Zi = number of people in group i that have the disease, i = 1, . . . , 10. Then Zi ∼ Binomial(100, 0.002), and Yi is a Bernoulli(p) r.v. where

p = P(Yi = 1) = P(Zi > 0) = 1 − P(Zi = 0) = 1 − (100 choose 0) 0.002^0 (1 − 0.002)^100 = 1 − 0.998^100 ≈ 0.181

Then Y ∼ Binomial(10, 0.181) and
E(T) = E(10 + 100Y) = 10 + 100 E(Y) = 10 + 100(10 × 0.181) ≈ 191
Example: Blood testing (Example 5.2.7) – continued
When does this strategy (611) lose? Worst case scenario:

P(T ≥ 1000) = P(Y ≥ 9.9) = P(Y = 10) = (10 choose 10) 0.181^10 0.819^0 ≈ 3.8 × 10^−8
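The numbers on this and the previous slide are quick to reproduce (a sketch; 0.181 and 191 on the slides are rounded values):

```python
# Probability that a pooled group of 100 samples tests positive.
p = 1 - 0.998**100              # ≈ 0.1814

# Expected total number of tests under the group-testing strategy,
# E(T) = 10 + 100 * E(Y) with Y ~ Binomial(10, p).
ET = 10 + 100 * (10 * p)        # ≈ 191.4

# Worst case: all 10 groups test positive, P(Y = 10) = p^10.
worst = p**10                   # ≈ 3.8e-8
```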
Question: can we go further - a 611-A strategy
Any further improvement?
Chapter 5 5.3 Hypergeometric distributions
Hypergeometric distributions
Def: Hypergeometric distributions
A random variable X has the Hypergeometric distribution with parameters N, M and n if it has the pf

f(x | N, M, n) = (N choose x)(M choose n − x) / (N + M choose n)

Parameter space: N, M and n are nonnegative integers with n ≤ N + M

Reasoning:
- Say we have a finite population with N items of type I and M items of type II.
- Let X be the number of items of type I when we take n samples without replacement from that population.
- Then X has the hypergeometric distribution.
Hypergeometric distributions
- Binomial: sampling with replacement (effectively an infinite population)
- Hypergeometric: sampling without replacement from a finite population
- You can also think of the Hypergeometric distribution as a sum of dependent Bernoulli trials.

Limiting situation:
Theorem 5.3.4: If the sample size n is much smaller than the total population N + M, then the Hypergeometric distribution with parameters N, M and n will be nearly the same as the Binomial distribution with parameters n and p = N/(N + M)
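Theorem 5.3.4 can be checked numerically. A sketch with an illustrative population (N = 50, M = 450, so p = 0.1, and n = 10 samples, well below N + M = 500):

```python
from math import comb

def hypergeom_pf(x, N, M, n):
    """pf of the Hypergeometric(N, M, n) distribution."""
    return comb(N, x) * comb(M, n - x) / comb(N + M, n)

def binomial_pf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

N, M, n = 50, 450, 10
p = N / (N + M)                 # 0.1
# Since n << N + M, the two pfs should be close over the support.
max_diff = max(abs(hypergeom_pf(x, N, M, n) - binomial_pf(x, n, p))
               for x in range(n + 1))
```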
Chapter 5 5.4 Poisson distributions
Poisson distributions
Def: Poisson distributions – Poisson(λ)
A random variable X has the Poisson distribution with mean λ if it has the pf

f(x | λ) = e^−λ λ^x / x! for x = 0, 1, 2, . . . , and 0 otherwise

Parameter space: λ > 0

Show that:
- f(x | λ) is a pf
- E(X) = λ
- Var(X) = λ
- ψ(t) = e^(λ(e^t − 1))

The cdf: F(x | λ) = ∑_{k=0}^x e^−λ λ^k / k! = yikes.
Why Poisson?
The Poisson distribution is useful for modeling uncertainty in counts / arrivals.

Examples:
- How many calls arrive at a switchboard in one hour?
- How many buses pass while you wait at the bus stop for 10 min?
- How many bird nests are there in a certain area?

Under certain conditions (the Poisson postulates) the Poisson distribution can be shown to be the distribution of the number of arrivals (Poisson process). However, the Poisson distribution is often used as a model for uncertainty of counts in other types of experiments.
The Poisson distribution can also be used as an approximation to the Binomial(n, p) distribution when n is large and p is small.
Poisson Postulates
For t ≥ 0, let Xt be a random variable with possible values in N0
(Think: Xt = number of arrivals from time 0 to time t)

(i) Start with no arrivals: X0 = 0
(ii) Arrivals in disjoint time periods are independent: Xs and Xt − Xs are independent if s < t
(iii) Number of arrivals depends only on the period length: Xs and Xt+s − Xt are identically distributed
(iv) Arrival probability is proportional to the period length, if the length is small: lim_{t→0} P(Xt = 1)/t = λ
(v) No simultaneous arrivals: lim_{t→0} P(Xt > 1)/t = 0

If (i)-(v) hold then for any nonnegative integer n

P(Xt = n) = e^(−λt) (λt)^n / n!

that is, Xt ∼ Poisson(λt)

Can be defined in terms of spatial areas too.
Properties of the Poisson Distributions
Useful recursive property: P(X = x) = (λ/x) P(X = x − 1) for x ≥ 1
Theorem 5.4.4: Sum of Poissons is a Poisson
If X1, . . . , Xk are independent r.v. and Xi ∼ Poisson(λi) for all i, then

X1 + · · · + Xk ∼ Poisson(∑_{i=1}^k λi)

Theorem 5.4.5: Approximation to Binomial
Let Xn ∼ Binomial(n, pn), where 0 < pn < 1 for all n and {pn}_{n=1}^∞ is a sequence so that lim_{n→∞} n pn = λ. Then

lim_{n→∞} f_{Xn}(x | n, pn) = e^−λ λ^x / x! = f_Poisson(x | λ)

for all x = 0, 1, 2, . . .
Example: Poisson as approximation to Binomial

Recall the disease testing example. We had

X = ∑_{i=1}^{1000} Xi ∼ Binomial(1000, 0.002) and Y ∼ Binomial(10, 0.181)
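For X the conditions of Theorem 5.4.5 hold well: n = 1000 is large, p = 0.002 is small, so Poisson(λ = np = 2) is an excellent approximation. A quick comparison sketch:

```python
from math import comb, exp, factorial

n, p = 1000, 0.002
lam = n * p                     # λ = 2

def binomial_pf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pf(x):
    return exp(-lam) * lam**x / factorial(x)

# The two pfs agree to about three decimal places for small x.
max_diff = max(abs(binomial_pf(x) - poisson_pf(x)) for x in range(10))
```

The same approximation would be poor for Y, since there p = 0.181 is not small.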
Chapter 5 5.5 Negative Binomial distributions
Geometric distributions
Def: Geometric distributions Geometric(p)
A random variable X has the Geometric distribution with parameter p if it has the pf

f(x | p) = p(1 − p)^x for x = 0, 1, 2, . . . , and 0 otherwise

Parameter space: 0 < p < 1

Say we have an infinite sequence of Bernoulli trials with parameter p and let X = number of "failures" before the first "success". Then X ∼ Geometric(p)
Negative Binomial distributions
Def: Negative Binomial distributions – NegBinomial(r ,p)
A random variable X has the Negative Binomial distribution with parameters r and p if it has the pf

f(x | r, p) = (r + x − 1 choose x) p^r (1 − p)^x for x = 0, 1, 2, . . . , and 0 otherwise

Parameter space: 0 < p < 1 and r a positive integer.

Say we have an infinite sequence of Bernoulli trials with parameter p and let X = number of "failures" before the r-th "success". Then X ∼ NegBinomial(r, p)

Geometric(p) = NegBinomial(1, p)

Theorem 5.5.2: If X1, . . . , Xr are i.i.d. Geometric(p) then X = X1 + · · · + Xr ∼ NegBinomial(r, p)
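A numerical sanity check of the pf, and of the standard fact (not stated on the slide) that E(X) = r(1 − p)/p for this failures-count parametrization. Illustrative values r = 3, p = 0.5; the infinite series is truncated where the tail is negligible:

```python
from math import comb

def negbinomial_pf(x, r, p):
    """pf of NegBinomial(r, p): failures before the r-th success."""
    return comb(r + x - 1, x) * p**r * (1 - p)**x

r, p = 3, 0.5
xs = range(200)                                # tail beyond 200 is negligible
probs = [negbinomial_pf(x, r, p) for x in xs]
total = sum(probs)                             # ≈ 1
mean = sum(x * f for x, f in zip(xs, probs))   # ≈ r*(1-p)/p = 3
```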
Chapter 5 5.7 Gamma distributions
Gamma distributions
The Gamma function: Γ(α) = ∫_0^∞ x^(α−1) e^−x dx

Γ(1) = 1 and Γ(0.5) = √π
Γ(α) = (α − 1)Γ(α − 1) if α > 1
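These facts are easy to verify with the standard library's gamma function:

```python
from math import gamma, sqrt, pi

g1 = gamma(1.0)                  # Γ(1) = 1
g_half = gamma(0.5)              # Γ(0.5) = √π

# The recursion Γ(α) = (α − 1)Γ(α − 1), e.g. with α = 5:
lhs, rhs = gamma(5.0), 4.0 * gamma(4.0)   # both equal 4! = 24
```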
Def: Gamma distributions – Gamma(α, β)
A continuous r.v. X has the gamma distribution with parameters α and β if it has the pdf

f(x | α, β) = (β^α / Γ(α)) x^(α−1) e^(−βx) for x > 0, and 0 otherwise

Parameter space: α > 0 and β > 0

Gamma(1, β) is the same as the exponential distribution with parameter β, Expo(β)
Properties of the gamma distributions
ψ(t) = (β/(β − t))^α, for t < β.

E(X) = α/β and Var(X) = α/β²

If X1, . . . , Xk are independent Gamma(αi, β) r.v. then X1 + · · · + Xk ∼ Gamma(∑_{i=1}^k αi, β)
Properties of the gamma distributions
Theorem 5.7.9: Exponential distribution is memoryless
Let X ∼ Expo(β) and let t > 0. Then for any h > 0
P(X ≥ t + h|X ≥ t) = P(X ≥ h)
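Memorylessness follows directly from the exponential survival function P(X ≥ x) = e^(−βx). A one-line check (the values of β, t and h are arbitrary):

```python
from math import exp

beta, t, h = 0.5, 2.0, 1.0

def survival(x):
    """P(X >= x) for X ~ Expo(beta)."""
    return exp(-beta * x)

# P(X >= t+h | X >= t) = P(X >= t+h) / P(X >= t), which the
# exponential form collapses to e^{-beta*h} = P(X >= h).
conditional = survival(t + h) / survival(t)
unconditional = survival(h)
```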
Theorem 5.7.12: Times between arrivals in a Poisson process
Let Zk be the time until the k-th arrival in a Poisson process with rate β. Let Y1 = Z1 and Yk = Zk − Zk−1 for k ≥ 2. Then Y1, Y2, Y3, . . . are i.i.d. with the exponential distribution with parameter β.
Chapter 5 5.8 Beta distributions
Beta distributions
Def: Beta distributions – Beta(α, β)
A continuous r.v. X has the beta distribution with parameters α and β if it has the pdf

f(x | α, β) = (Γ(α + β) / (Γ(α)Γ(β))) x^(α−1) (1 − x)^(β−1) for 0 < x < 1, and 0 otherwise

Parameter space: α > 0 and β > 0

Beta(1, 1) = Uniform(0, 1)

Used to model a r.v. that takes values between 0 and 1. The Beta distributions are often used as prior distributions for probability parameters, e.g. the p in the Binomial distribution.
[Figure: Beta distribution pdfs]
Chapter 5 5.6 Normal distributions
Why Normal?
- Works well in practice: many physical experiments have distributions that are approximately normal.
- Central Limit Theorem: sums of many i.i.d. random variables are approximately normally distributed.
- Mathematically convenient, especially the multivariate normal distribution.
- The distribution of many functions of a normally distributed random variable can be obtained explicitly.
- Marginal and conditional distributions of a multivariate normal are also normal (multivariate or univariate).

Developed by Gauss and then Laplace in the early 1800s. Also known as the Gaussian distributions.
[Portraits: Gauss and Laplace]
Normal distributions
Def: Normal distributions – N(µ, σ2)
A continuous r.v. X has the normal distribution with mean µ and variance σ² if it has the pdf

f(x | µ, σ²) = (1/(√(2π) σ)) exp(−(x − µ)²/(2σ²)), −∞ < x < ∞

Parameter space: µ ∈ R and σ² > 0

Show:
- ψ(t) = exp(µt + σ²t²/2)
- E(X) = µ
- Var(X) = σ²
The Bell curve

[Figure: the bell-shaped normal pdf]
Standard normal
Standard normal distribution: N(0,1)
The normal distribution with µ = 0 and σ² = 1 is called the standard normal distribution, and its pdf and cdf are denoted φ(x) and Φ(x).

- The cdf of a normal distribution cannot be expressed in closed form and is evaluated using numerical approximations.
- Φ(x) is tabulated in the back of the book. Many calculators and programs such as R, Matlab, Excel etc. can calculate Φ(x).
- Φ(−x) = 1 − Φ(x)
- Φ⁻¹(p) = −Φ⁻¹(1 − p)
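One numerical route to Φ is the standard library's error function, via the identity Φ(x) = (1 + erf(x/√2))/2. A sketch that also checks the symmetry property above:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

p0 = Phi(0)                     # = 0.5 by symmetry
sym = Phi(-1.3) + Phi(1.3)      # = 1, since Φ(−x) = 1 − Φ(x)
```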
Properties of the normal distributions
Theorem 5.6.4: Linear transformation of a normal is still normal
If X ∼ N(µ, σ²) and Y = aX + b where a and b are constants and a ≠ 0, then

Y ∼ N(aµ + b, a²σ²)

Let F be the cdf of X, where X ∼ N(µ, σ²). Then

F(x) = Φ((x − µ)/σ) and F⁻¹(p) = µ + σΦ⁻¹(p)
Example: Measured Voltage
Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.

1. What is the probability that the measured voltage is between 118 and 122?
2. Below what value will 95% of the measurements be?
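A worked sketch of both parts, computing Φ through math.erf and inverting it by bisection (the helper names are my own):

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    """Invert Phi by bisection; Phi is strictly increasing."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma = 120, 2
# 1) P(118 <= X <= 122) = Phi(1) - Phi(-1) ≈ 0.6827
p1 = Phi((122 - mu) / sigma) - Phi((118 - mu) / sigma)
# 2) 95th percentile: mu + sigma * Phi^{-1}(0.95) ≈ 123.29
q95 = mu + sigma * Phi_inv(0.95)
```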
Properties of the normal distributions
Theorem 5.6.7: Linear combination of ind. normals is a normal
Let X1, . . . , Xk be independent r.v. and Xi ∼ N(µi, σi²) for i = 1, . . . , k. Then

X1 + · · · + Xk ∼ N(µ1 + · · · + µk, σ1² + · · · + σk²)

Also, if a1, . . . , ak and b are constants where at least one ai is not zero:

a1X1 + · · · + akXk + b ∼ N(b + ∑_{i=1}^k ai µi, ∑_{i=1}^k ai² σi²)

In particular, the sample mean: X̄n = (1/n) ∑_{i=1}^n Xi

If X1, . . . , Xn are a random sample from a N(µ, σ²), what is the distribution of the sample mean?
Example: Measured voltage – continued
Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.

- If three independent measurements of the voltage are made, what is the probability that the sample mean X̄3 will lie between 118 and 120?
- Find x that satisfies P(|X̄3 − 120| ≤ x) = 0.95
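By Theorem 5.6.7, X̄3 ∼ N(120, 4/3), so both questions reduce to standard normal calculations. A sketch (Φ and its inverse via math.erf and bisection, as in the earlier voltage example):

```python
from math import erf, sqrt

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2

mu, sd = 120, 2 / sqrt(3)       # X̄3 ~ N(120, 4/3), sd = 2/√3
# P(118 <= X̄3 <= 120) = Phi(0) - Phi((118 - 120)/sd) ≈ 0.458
prob = Phi((120 - mu) / sd) - Phi((118 - mu) / sd)
# P(|X̄3 - 120| <= x) = 0.95  =>  x = z_{0.975} * sd ≈ 2.26
x = Phi_inv(0.975) * sd
```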
Area under the curve

[Figure: areas under the normal curve]
Lognormal distributions
Def: Lognormal distributions
If log(X) ∼ N(µ, σ²) then we say that X has the Lognormal distribution with parameters µ and σ².

- The support of the lognormal distribution is (0, ∞).
- Often used to model time before failure.

Example: Let X and Y be independent random variables such that log(X) ∼ N(1.6, 4.5) and log(Y) ∼ N(3, 6). What is the distribution of the product XY?
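Since log(XY) = log X + log Y is a sum of independent normals, log(XY) ∼ N(1.6 + 3, 4.5 + 6) = N(4.6, 10.5), i.e. XY is Lognormal(4.6, 10.5). A simulation sketch supporting this:

```python
import random
from math import sqrt

random.seed(7)
n = 100_000
# Simulate log X ~ N(1.6, 4.5) and log Y ~ N(3, 6);
# random.gauss takes the standard deviation, not the variance.
log_xy = [random.gauss(1.6, sqrt(4.5)) + random.gauss(3.0, sqrt(6.0))
          for _ in range(n)]

m = sum(log_xy) / n                          # ≈ 4.6
v = sum((z - m) ** 2 for z in log_xy) / n    # ≈ 10.5
```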
Chapter 5 5.10 Bivariate normal distributions
Bivariate normal distributions
Def: Bivariate normal
Two continuous r.v. X1 and X2 have the bivariate normal distribution with means µ1 and µ2, variances σ1² and σ2², and correlation ρ if they have the joint pdf

f(x1, x2) = 1/(2π(1 − ρ²)^(1/2) σ1σ2) × exp( −1/(2(1 − ρ²)) [ (x1 − µ1)²/σ1² − 2ρ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + (x2 − µ2)²/σ2² ] )    (1)

Parameter space: µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1
Bivariate normal pdf

[Figures: bivariate normal pdfs with different ρ, and their contours]
Bivariate normal as linear combination
Theorem 5.10.1: Bivariate normal from two ind. standard normals
Let Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) be independent. Let µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1, and let

X1 = σ1Z1 + µ1
X2 = σ2(ρZ1 + √(1 − ρ²) Z2) + µ2    (2)

Then the joint distribution of X1 and X2 is bivariate normal with parameters µ1, µ2, σ1², σ2² and ρ.

Theorem 5.10.2 (part 1) – the other way
Let X1 and X2 have the pdf in (1). Then there exist independent standard normal r.v. Z1 and Z2 so that (2) holds.
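Equation (2) is also how bivariate normal pairs are simulated in practice. A sketch checking that the construction produces the intended correlation (the parameter values are illustrative):

```python
import random
from math import sqrt

random.seed(0)
mu1, mu2, s1, s2, rho = 3.0, 5.0, 2.0, 3.0, 0.6
n = 100_000

x1s, x2s = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x1s.append(s1 * z1 + mu1)                              # X1
    x2s.append(s2 * (rho * z1 + sqrt(1 - rho**2) * z2) + mu2)  # X2

m1, m2 = sum(x1s) / n, sum(x2s) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1s, x2s)) / n
v1 = sum((a - m1) ** 2 for a in x1s) / n
v2 = sum((b - m2) ** 2 for b in x2s) / n
corr = cov / sqrt(v1 * v2)       # ≈ ρ = 0.6
```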
Properties of a bivariate normal
Theorem 5.10.2 (part 2)
Let X1 and X2 have the pdf in (1). Then the marginal distributions are

X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²)

and the correlation between X1 and X2 is ρ.

Theorem 5.10.4: The conditional is normal
Let X1 and X2 have the pdf in (1). Then the conditional distribution of X2 given that X1 = x1 is (univariate) normal with

E(X2 | X1 = x1) = µ2 + ρσ2(x1 − µ1)/σ1 and Var(X2 | X1 = x1) = (1 − ρ²)σ2²
Properties of a bivariate normal
Theorem 5.10.3: Uncorrelated ⇒ Independent
Let X1 and X2 have the bivariate normal distribution. Then X1 and X2 are independent if and only if they are uncorrelated.

- Only holds for the multivariate normal distribution.
- One of the very convenient properties of the normal distribution.

Theorem 5.10.5: Linear combinations are normal
Let X1 and X2 have the pdf in (1) and let a1, a2 and b be constants. Then Y = a1X1 + a2X2 + b is normally distributed with

E(Y) = a1µ1 + a2µ2 + b and Var(Y) = a1²σ1² + a2²σ2² + 2a1a2ρσ1σ2

This extends what we already had for independent normals.
Example
Let X1 and X2 have the bivariate normal distribution with means µ1 = 3, µ2 = 5, variances σ1² = 4, σ2² = 9 and correlation ρ = 0.6.

a) Find the distribution of X2 − 2X1
b) What is the expected value of X2, given that we observed X1 = 2?
c) What is the probability that X1 > X2?
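Worked answers using Theorems 5.10.4 and 5.10.5 (Φ via math.erf):

```python
from math import erf, sqrt

mu1, mu2, v1, v2, rho = 3, 5, 4, 9, 0.6
s1, s2 = sqrt(v1), sqrt(v2)

def Phi(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

# a) Y = X2 - 2*X1 is normal (a1 = -2, a2 = 1 in Theorem 5.10.5):
mean_a = -2 * mu1 + mu2                                # -1
var_a = 4 * v1 + v2 + 2 * (-2) * 1 * rho * s1 * s2     # 16 + 9 - 14.4 = 10.6

# b) E(X2 | X1 = 2) = mu2 + rho * s2 * (2 - mu1) / s1  (Theorem 5.10.4)
cond_mean = mu2 + rho * s2 * (2 - mu1) / s1            # 5 - 0.9 = 4.1

# c) P(X1 > X2) = P(D > 0) where D = X1 - X2 ~ N(-2, 5.8)
mean_d, var_d = mu1 - mu2, v1 + v2 - 2 * rho * s1 * s2
p_c = 1 - Phi((0 - mean_d) / sqrt(var_d))              # ≈ 0.203
```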
Multivariate normal – Matrix notation
The pdf of an n-dimensional normal distribution, X ∼ N(µ, Σ):

f(x) = 1/((2π)^(n/2) |Σ|^(1/2)) exp{ −(1/2)(x − µ)ᵀ Σ⁻¹ (x − µ) }

where µ = (µ1, µ2, . . . , µn)ᵀ, x = (x1, x2, . . . , xn)ᵀ and

        | σ1²   σ1,2  σ1,3  · · ·  σ1,n |
        | σ2,1  σ2²   σ2,3  · · ·  σ2,n |
    Σ = | σ3,1  σ3,2  σ3²   · · ·  σ3,n |
        |  ⋮     ⋮     ⋮     ⋱      ⋮   |
        | σn,1  σn,2  σn,3  · · ·  σn²  |

µ is the mean vector and Σ is called the variance-covariance matrix.
Multivariate normal – Matrix notation
The same things hold for the multivariate normal distribution as for the bivariate. Let X ∼ N(µ, Σ).

- Linear combinations of X are normal: AX + b is (multivariate) normal for a fixed matrix A and vector b.
- The marginal distribution of Xi is normal with mean µi and variance σi².
- The off-diagonal elements of Σ are the covariances between individual elements of X, i.e. Cov(Xi, Xj) = σi,j.
- The joint marginal distributions are also normal, where the mean and covariance matrix are found by picking the corresponding elements from µ and the corresponding rows and columns from Σ.
- The conditional distributions are also normal (multivariate or univariate).