
Chapter 5

Chapter 5 sections

Discrete univariate distributions:
5.2 Bernoulli and Binomial distributions
Just skim 5.3 Hypergeometric distributions
5.4 Poisson distributions
Just skim 5.5 Negative Binomial distributions

Continuous univariate distributions:
5.6 Normal distributions
5.7 Gamma distributions
Just skim 5.8 Beta distributions

Multivariate distributions:
Just skim 5.9 Multinomial distributions
5.10 Bivariate normal distributions


Chapter 5 5.1 Introduction

Families of distributions

How:
Parameter and parameter space
pf/pdf and cdf - new notation: f(x | parameters)
Mean, variance and the m.g.f. ψ(t)
Features, connections to other distributions, approximation
Reasoning behind a distribution

Why:
Natural justification for certain experiments
A model for the uncertainty in an experiment

“All models are wrong, but some are useful” – George Box


Chapter 5 5.2 Bernoulli and Binomial distributions

Bernoulli distributions

Def: Bernoulli distributions – Bernoulli(p)

A r.v. X has the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p. The pf of X is

f(x | p) =
    p^x (1 − p)^(1−x)   for x = 0, 1
    0                   otherwise

Parameter space: p ∈ [0, 1]

In an experiment with only two possible outcomes, “success” and “failure”, let X = number of successes. Then X ∼ Bernoulli(p) where p is the probability of success.
E(X) = p, Var(X) = p(1 − p) and ψ(t) = E(e^(tX)) = p e^t + (1 − p)

The cdf is

F(x | p) =
    0        for x < 0
    1 − p    for 0 ≤ x < 1
    1        for x ≥ 1


Chapter 5 5.2 Bernoulli and Binomial distributions

Binomial distributions

Def: Binomial distributions – Binomial(n,p)

A r.v. X has the Binomial distribution with parameters n and p if X has the pf

f(x | n, p) =
    (n choose x) p^x (1 − p)^(n−x)   for x = 0, 1, . . . , n
    0                                otherwise

Parameter space: n is a positive integer and p ∈ [0, 1]

If X is the number of “successes” in n independent tries where the prob. of success is p each time, then X ∼ Binomial(n, p).

Theorem 5.2.1
If X1, X2, . . . , Xn form n Bernoulli trials with parameter p (i.e. are i.i.d. Bernoulli(p)), then X = X1 + · · · + Xn ∼ Binomial(n, p)
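A quick numerical illustration of Theorem 5.2.1 (a sketch only, assuming Python with numpy and scipy): summing n i.i.d. Bernoulli(p) draws and comparing the result with the Binomial(n, p) pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps = 10, 0.3, 100_000

# Each row is n i.i.d. Bernoulli(p) trials; the row sum is one Binomial(n, p) draw.
x = rng.binomial(1, p, size=(reps, n)).sum(axis=1)

print(x.mean(), n * p)                            # ~ 3.0, since E(X) = np
print(np.mean(x == 4), stats.binom.pmf(4, n, p))  # empirical vs exact pmf, both ~ 0.20
```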


Chapter 5 5.2 Bernoulli and Binomial distributions

Binomial distributions

Let X ∼ Binomial(n,p)

E(X) = np, Var(X) = np(1 − p)

To find the m.g.f. of X, write X = X1 + · · · + Xn where the Xi's are i.i.d. Bernoulli(p). Then ψi(t) = p e^t + 1 − p and we get

ψ(t) = ∏_{i=1}^{n} ψi(t) = ∏_{i=1}^{n} (p e^t + 1 − p) = (p e^t + 1 − p)^n

cdf: F(x | n, p) = ∑_{t=0}^{x} (n choose t) p^t (1 − p)^(n−t) = yikes!

Theorem 5.2.2
If Xi ∼ Binomial(ni, p), i = 1, . . . , k and the Xi's are independent, then X = X1 + · · · + Xk ∼ Binomial(n1 + · · · + nk, p)
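The "yikes" sum has no nice closed form, but it is easy to evaluate numerically; a minimal sketch (assuming scipy):

```python
from math import comb
from scipy import stats

n, p, x = 10, 0.3, 3

# Direct sum of the pmf versus the built-in cdf.
manual = sum(comb(n, t) * p**t * (1 - p)**(n - t) for t in range(x + 1))
print(manual, stats.binom.cdf(x, n, p))   # both ~ 0.6496
```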


Chapter 5 5.2 Bernoulli and Binomial distributions

Example: Blood testing (Example 5.2.7)

The setup:
1000 people need to be tested for a disease that affects 0.2% of all people.
The test is guaranteed to detect the disease if it is present in a blood sample.

Task: Find all the people that have the disease.
Strategy: Test 1000 samples.

What’s the expected number of people that have the disease?
Any assumptions you need to make?

Strategy (611):
Divide the people into 10 groups of 100.
For each group, take a portion of each of the 100 blood samples and combine into one sample.
Then test the combined blood samples (10 tests).


Chapter 5 5.2 Bernoulli and Binomial distributions

Example: Blood testing (Example 5.2.7) – continued

Strategy (611):

If all of these tests are negative, then none of the 1000 people have the disease. Total number of tests needed: 10
If one of these tests is positive, then we test each of the 100 people in that group. Total number of tests needed: 110
...
If all of the 10 tests are positive, we end up having to do 1010 tests.

Is this strategy better?
What is the expected number of tests needed?
When does this strategy lose?


Chapter 5 5.2 Bernoulli and Binomial distributions

Example: Blood testing (Example 5.2.7) – continued

Let Yi = 1 if the test for group i is positive and Yi = 0 otherwise.
Let Y = Y1 + · · · + Y10 = the number of groups where every individual has to be tested.
Total number of tests needed: T = 10 + 100Y.

Let Zi = number of people in group i that have the disease, i = 1, . . . , 10. Then Zi ∼ Binomial(100, 0.002).
Then Yi is a Bernoulli(p) r.v. where

p = P(Yi = 1) = P(Zi > 0) = 1 − P(Zi = 0)
  = 1 − (100 choose 0) 0.002^0 (1 − 0.002)^100 = 1 − 0.998^100 ≈ 0.181

Then Y ∼ Binomial(10, 0.181)
E(T) = E(10 + 100Y) = 10 + 100 E(Y) = 10 + 100 (10 × 0.181) = 191
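A quick check of these numbers in Python (a sketch only; plain arithmetic, no libraries needed):

```python
p = 1 - 0.998**100                 # P(a pooled group of 100 tests positive)
EY = 10 * p                        # Y ~ Binomial(10, p), so E(Y) = 10 p
ET = 10 + 100 * EY                 # T = 10 + 100 Y
print(round(p, 3), round(ET, 1))   # 0.181 and about 191
```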


Chapter 5 5.2 Bernoulli and Binomial distributions

Example: Blood testing (Example 5.2.7) – continued

When does this strategy (611) lose?
Worst case scenario:
P(T ≥ 1000) = P(Y ≥ 9.9) = P(Y = 10) = (10 choose 10) 0.181^10 (0.819)^0 ≈ 3.8 × 10^(−8)
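The same worst-case probability checked numerically (a sketch, assuming scipy):

```python
from scipy import stats

p = 1 - 0.998**100
print(stats.binom.pmf(10, 10, p), p**10)   # both ~ 3.9e-8; the 3.8e-8 above uses p rounded to 0.181
```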

Question: can we go further - a 611-A strategy

Any further improvement?


Chapter 5 5.3 Hypergeometric distributions

Hypergeometric distributions

Def: Hypergeometric distributions
A random variable X has the Hypergeometric distribution with parameters N, M and n if it has the pf

f(x | N, M, n) = (N choose x) (M choose n − x) / (N + M choose n)

Parameter space: N, M and n are nonnegative integers with n ≤ N + M

Reasoning:
Say we have a finite population with N items of type I and M items of type II.
Let X be the number of items of type I when we take n samples without replacement from that population.
Then X has the hypergeometric distribution.


Chapter 5 5.3 Hypergeometric distributions

Hypergeometric distributions

Binomial: Sampling with replacement (effectively infinite population).
Hypergeometric: Sampling without replacement from a finite population.
You can also think of the Hypergeometric distribution as a sum of dependent Bernoulli trials.

Limiting situation:
Theorem 5.3.4: If the sample size n is much smaller than the total population N + M, then the Hypergeometric distribution with parameters N, M and n will be nearly the same as the Binomial distribution with parameters n and p = N / (N + M)
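A minimal sketch of this comparison (assuming Python with scipy; the population size 1000 and sample size 5 are illustrative choices, not from the slides):

```python
from math import comb
from scipy import stats

N, M, n = 600, 400, 5          # small sample from a population of 1000
p = N / (N + M)

def hyper_pf(x):
    # f(x | N, M, n) = C(N, x) C(M, n - x) / C(N + M, n)
    return comb(N, x) * comb(M, n - x) / comb(N + M, n)

for x in range(n + 1):
    # the two columns agree to about two decimal places
    print(x, round(hyper_pf(x), 4), round(stats.binom.pmf(x, n, p), 4))
```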


Chapter 5 5.4 Poisson distributions

Poisson distributions

Def: Poisson distributions – Poisson(λ)

A random variable X has the Poisson distribution with mean λ if it has the pf

f(x | λ) =
    e^(−λ) λ^x / x!   for x = 0, 1, 2, . . .
    0                 otherwise

Parameter space: λ > 0

Show that:
f(x | λ) is a pf
E(X) = λ
Var(X) = λ
ψ(t) = e^(λ(e^t − 1))

The cdf: F(x | λ) = ∑_{k=0}^{x} e^(−λ) λ^k / k! = yikes.
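The pf and the "yikes" cdf are both easy to get numerically; a sketch assuming scipy, with λ = 2 as an illustrative value:

```python
from scipy import stats

lam = 2.0
print(stats.poisson.pmf(0, lam))                         # e^-2 ~ 0.1353
print(stats.poisson.mean(lam), stats.poisson.var(lam))   # both 2.0, i.e. E(X) = Var(X) = lambda
print(stats.poisson.cdf(3, lam))                         # sum_{k=0}^{3} e^-2 2^k / k! ~ 0.8571
```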


Chapter 5 5.4 Poisson distributions

Why Poisson?

The Poisson distribution is useful for modeling uncertainty in counts / arrivals.
Examples:

How many calls arrive at a switchboard in one hour?
How many buses pass while you wait at the bus stop for 10 min?
How many bird nests are there in a certain area?

Under certain conditions (the Poisson postulates) the Poisson distribution can be shown to be the distribution of the number of arrivals (Poisson process). However, the Poisson distribution is often used as a model for uncertainty of counts in other types of experiments.
The Poisson distribution can also be used as an approximation to the Binomial(n, p) distribution when n is large and p is small.


Chapter 5 5.4 Poisson distributions

Poisson Postulates

For t ≥ 0, let Xt be a random variable with possible values in N0.
(Think: Xt = number of arrivals from time 0 to time t)

(i) Start with no arrivals: X0 = 0

(ii) Arrivals in disjoint time periods are ind.: Xs and Xt − Xs ind. if s < t

(iii) Number of arrivals depends only on period length:

Xs and Xt+s − Xt are identically distributed

(iv) Arrival probability is proportional to period length, if the length is small:

    lim_{t→0} P(Xt = 1) / t = λ

(v) No simultaneous arrivals: lim_{t→0} P(Xt > 1) / t = 0

If (i) - (v) hold, then for any integer n ≥ 0

P(Xt = n) = e^(−λt) (λt)^n / n!

that is, Xt ∼ Poisson(λt)

Can be defined in terms of spatial areas too.


Chapter 5 5.4 Poisson distributions

Properties of the Poisson Distributions

Useful recursive property: P(X = x) = (λ/x) P(X = x − 1) for x ≥ 1

Theorem 5.4.4: Sum of Poissons is a Poisson
If X1, . . . , Xk are independent r.v. and Xi ∼ Poisson(λi) for all i, then

X1 + · · · + Xk ∼ Poisson(λ1 + · · · + λk)

Theorem 5.4.5: Approximation to Binomial
Let Xn ∼ Binomial(n, pn), where 0 < pn < 1 for all n and {pn}_{n=1}^{∞} is a sequence so that lim_{n→∞} n pn = λ. Then

lim_{n→∞} f_{Xn}(x | n, pn) = e^(−λ) λ^x / x! = f_Poisson(x | λ)

for all x = 0, 1, 2, . . .


Chapter 5 5.4 Poisson distributions

Example: Poisson as approximation to Binomial

Recall the disease testing example. We had

X = X1 + · · · + X1000 ∼ Binomial(1000, 0.002) and Y ∼ Binomial(10, 0.181)
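A quick numerical look at how good the approximation is in each case (a sketch, assuming scipy). For X, n = 1000 is large and p = 0.002 is small, so Poisson(np) = Poisson(2) should be close; for Y, n = 10 and p = 0.181, so the approximation is rougher.

```python
from scipy import stats

for k in range(5):
    print(k,
          round(stats.binom.pmf(k, 1000, 0.002), 4),
          round(stats.poisson.pmf(k, 2.0), 4))       # very close

for k in range(5):
    print(k,
          round(stats.binom.pmf(k, 10, 0.181), 4),
          round(stats.poisson.pmf(k, 1.81), 4))      # noticeably further apart
```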


Chapter 5 5.5 Negative Binomial distributions

Geometric distributions

Def: Geometric distributions Geometric(p)

A random variable X has the Geometric distribution with parameter p if it has the pf

f(x | p) =
    p (1 − p)^x   for x = 0, 1, 2, . . .
    0             otherwise

Parameter space: 0 < p < 1

Say we have an infinite sequence of Bernoulli trials with parameter p.
Let X = number of “failures” before the first “success”. Then X ∼ Geometric(p).


Chapter 5 5.5 Negative Binomial distributions

Negative Binomial distributions

Def: Negative Binomial distributions – NegBinomial(r ,p)

A random variable X has the Negative Binomial distribution with parameters r and p if it has the pf

f(x | r, p) =
    (r + x − 1 choose x) p^r (1 − p)^x   for x = 0, 1, 2, . . .
    0                                    otherwise

Parameter space: 0 < p < 1 and r a positive integer.

Say we have an infinite sequence of Bernoulli trials with parameter p.
Let X = number of “failures” before the r-th “success”. Then X ∼ NegBinomial(r, p).

Geometric(p) = NegBinomial(1,p)

Theorem 5.5.2: If X1, . . . , Xr are i.i.d. Geometric(p), then X = X1 + · · · + Xr ∼ NegBinomial(r, p)
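A small numerical sketch (assuming numpy and scipy). Note that scipy's nbinom already counts failures before the r-th success, matching the definition here, while numpy's geometric counts trials, so it is shifted by one.

```python
import numpy as np
from scipy import stats

r, p = 3, 0.4

# pf at x = 2: C(r+x-1, x) p^r (1-p)^x = C(4, 2) * 0.4^3 * 0.6^2
print(stats.nbinom.pmf(2, r, p))          # 0.13824
print(stats.nbinom.pmf(2, 1, p))          # Geometric(p) = NegBinomial(1, p): p (1-p)^2 = 0.144

# Theorem 5.5.2: a sum of r i.i.d. "failure-count" geometric draws behaves like NegBinomial(r, p).
rng = np.random.default_rng(0)
fails = rng.geometric(p, size=(100_000, r)) - 1   # numpy counts trials, so subtract 1
print(np.mean(fails.sum(axis=1) == 2), stats.nbinom.pmf(2, r, p))
```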


Chapter 5 5.5 Negative Binomial distributions

Chapter 5 sections

Discrete univariate distributions:
5.2 Bernoulli and Binomial distributions
Just skim 5.3 Hypergeometric distributions
5.4 Poisson distributions
Just skim 5.5 Negative Binomial distributions

Continuous univariate distributions:
5.6 Normal distributions
5.7 Gamma distributions
Just skim 5.8 Beta distributions

Multivariate distributions:
Just skim 5.9 Multinomial distributions
5.10 Bivariate normal distributions


Chapter 5 5.7 Gamma distributions

Gamma distributions

The Gamma function: Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx
Γ(1) = 1 and Γ(0.5) = √π
Γ(α) = (α − 1) Γ(α − 1) if α > 1

Def: Gamma distributions – Gamma(α, β)

A continuous r.v. X has the gamma distribution with parameters α and β if it has the pdf

f(x | α, β) =
    (β^α / Γ(α)) x^(α−1) e^(−βx)   for x > 0
    0                              otherwise

Parameter space: α > 0 and β > 0

Gamma(1, β) is the same as the exponential distribution with parameter β, Expo(β).
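A few quick numerical checks (a sketch, assuming scipy). scipy's gamma is parametrized by shape a and scale, so the rate-β version here corresponds to scale = 1/β; the values α = 3, β = 2 are illustrative.

```python
import numpy as np
from scipy import stats, special

print(special.gamma(0.5), np.sqrt(np.pi))              # Gamma(0.5) = sqrt(pi)
print(special.gamma(5.2), 4.2 * special.gamma(4.2))    # Gamma(a) = (a-1) Gamma(a-1)

a, b = 3.0, 2.0                                        # alpha, beta (rate)
X = stats.gamma(a, scale=1/b)
print(X.mean(), a / b)                                 # both 1.5
print(X.var(), a / b**2)                               # both 0.75

# Gamma(1, beta) is Expo(beta):
print(stats.gamma(1, scale=1/b).pdf(0.4), stats.expon(scale=1/b).pdf(0.4))
```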


Chapter 5 5.7 Gamma distributions

Properties of the gamma distributions

ψ(t) = (β / (β − t))^α, for t < β.

E(X) = α/β and Var(X) = α/β²

If X1, . . . , Xk are independent Gamma(αi, β) r.v. then

X1 + · · · + Xk ∼ Gamma(α1 + · · · + αk, β)


Chapter 5 5.7 Gamma distributions

Properties of the gamma distributions

Theorem 5.7.9: Exponential distribution is memoryless

Let X ∼ Expo(β) and let t > 0. Then for any h > 0

P(X ≥ t + h|X ≥ t) = P(X ≥ h)

Theorem 5.7.12: Times between arrivals in a Poisson process

Let Zk be the time until the k-th arrival in a Poisson process with rate β.
Let Y1 = Z1 and Yk = Zk − Zk−1 for k ≥ 2.
Then Y1, Y2, Y3, . . . are i.i.d. with the exponential distribution with parameter β.
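A simulation sketch tying Theorem 5.7.12 back to the Poisson postulates (assuming numpy and scipy; the rate β = 2 and window t = 3 are illustrative): draw i.i.d. Expo(β) inter-arrival times and count how many arrivals land in [0, t]; the counts should look Poisson(βt).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
beta, t, reps = 2.0, 3.0, 100_000      # rate, window length, replications

# 40 inter-arrival gaps per replication is far more than enough when beta*t = 6.
gaps = rng.exponential(scale=1/beta, size=(reps, 40))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= t).sum(axis=1)

print(counts.mean(), beta * t)                               # ~ 6
print(np.mean(counts == 4), stats.poisson.pmf(4, beta * t))  # empirical vs Poisson(6) pmf
```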


Chapter 5 5.8 Beta distributions

Beta distributions

Def: Beta distributions – Beta(α, β)

A continuous r.v. X has the beta distribution with parameters α and β if it has the pdf

f(x | α, β) =
    (Γ(α + β) / (Γ(α) Γ(β))) x^(α−1) (1 − x)^(β−1)   for 0 < x < 1
    0                                                otherwise

Parameter space: α > 0 and β > 0

Beta(1,1) = Uniform(0,1)

Used to model a r.v. that takes values between 0 and 1.
The Beta distributions are often used as prior distributions for probability parameters, e.g. the p in the Binomial distribution.
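A couple of quick checks with scipy.stats.beta (a sketch only; Beta(2, 5) is an illustrative prior choice, not from the slides):

```python
from scipy import stats

# Beta(1, 1) is Uniform(0, 1): the pdf is constant 1 on (0, 1).
print(stats.beta.pdf(0.3, 1, 1), stats.uniform.pdf(0.3))   # both 1.0

# As a prior for a Binomial p, e.g. Beta(2, 5) has mean a / (a + b):
a, b = 2, 5
print(stats.beta.mean(a, b), a / (a + b))                  # both ~ 0.2857
```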


Chapter 5 5.8 Beta distributions

Chapter 5 sections

Discrete univariate distributions:
5.2 Bernoulli and Binomial distributions
Just skim 5.3 Hypergeometric distributions
5.4 Poisson distributions
Just skim 5.5 Negative Binomial distributions

Continuous univariate distributions:
5.6 Normal distributions
5.7 Gamma distributions
Just skim 5.8 Beta distributions

Multivariate distributions:
Just skim 5.9 Multinomial distributions
5.10 Bivariate normal distributions


Chapter 5 5.6 Normal distributions

Why Normal?
Works well in practice. Many physical experiments have distributions that are approximately normal.
Central Limit Theorem: Sums of many i.i.d. random variables are approximately normally distributed.
Mathematically convenient – especially the multivariate normal distribution.

Can explicitly obtain the distribution of many functions of a normally distributed random variable.
Marginal and conditional distributions of a multivariate normal are also normal (multivariate or univariate).

Developed by Gauss and then Laplace in the early 1800s.
Also known as the Gaussian distributions.


Chapter 5 5.6 Normal distributions

Normal distributions

Def: Normal distributions – N(µ, σ2)

A continuous r.v. X has the normal distribution with mean µ and variance σ² if it has the pdf

f(x | µ, σ²) = (1 / (√(2π) σ)) exp( −(x − µ)² / (2σ²) ),   −∞ < x < ∞

Parameter space: µ ∈ R and σ² > 0

Show:
ψ(t) = exp(µt + σ²t²/2)
E(X) = µ
Var(X) = σ²


Chapter 5 5.6 Normal distributions

The Bell curve


Chapter 5 5.6 Normal distributions

Standard normal

Standard normal distribution: N(0,1)

The normal distribution with µ = 0 and σ² = 1 is called the standard normal distribution and the pdf and cdf are denoted as φ(x) and Φ(x).

The cdf for a normal distribution cannot be expressed in closed form and is evaluated using numerical approximations.

Φ(x) is tabulated in the back of the book. Many calculators and programs such as R, Matlab, Excel etc. can calculate Φ(x).

Φ(−x) = 1 − Φ(x)
Φ⁻¹(p) = −Φ⁻¹(1 − p)
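The same quantities in Python rather than the table in the back of the book (a sketch, assuming scipy):

```python
from scipy import stats

x, p = 1.5, 0.975
print(stats.norm.cdf(x))                            # Phi(1.5) ~ 0.9332
print(stats.norm.cdf(-x), 1 - stats.norm.cdf(x))    # Phi(-x) = 1 - Phi(x)
print(stats.norm.ppf(p), -stats.norm.ppf(1 - p))    # Phi^-1(p) = -Phi^-1(1-p), both ~ 1.96
```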


Chapter 5 5.6 Normal distributions

Properties of the normal distributions

Theorem 5.6.4: Linear transformation of a normal is still normal

If X ∼ N(µ, σ²) and Y = aX + b, where a and b are constants and a ≠ 0, then
Y ∼ N(aµ + b, a²σ²)

Let F be the cdf of X, where X ∼ N(µ, σ²). Then

F(x) = Φ((x − µ)/σ)   and   F⁻¹(p) = µ + σ Φ⁻¹(p)


Chapter 5 5.6 Normal distributions

Example: Measured Voltage

Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.

1. What is the probability that the measured voltage is between 118 and 122?
2. Below what value will 95% of the measurements be?
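A worked numerical answer (a sketch, assuming scipy). With X ∼ N(120, 2²), part 1 is Φ(1) − Φ(−1) and part 2 is the 95th percentile 120 + 2 Φ⁻¹(0.95).

```python
from scipy import stats

mu, sd = 120, 2
print(stats.norm.cdf(122, mu, sd) - stats.norm.cdf(118, mu, sd))  # ~ 0.6827
print(stats.norm.ppf(0.95, mu, sd))                               # ~ 123.29
```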


Chapter 5 5.6 Normal distributions

Properties of the normal distributions

Theorem 5.6.7: Linear combination of ind. normals is a normal

Let X1, . . . , Xk be independent r.v. and Xi ∼ N(µi, σi²) for i = 1, . . . , k. Then

X1 + · · · + Xk ∼ N(µ1 + · · · + µk, σ1² + · · · + σk²)

Also, if a1, . . . , ak and b are constants where at least one ai is not zero:

a1X1 + · · · + akXk + b ∼ N(b + ∑_{i=1}^{k} ai µi, ∑_{i=1}^{k} ai² σi²)

In particular: the sample mean X̄n = (1/n) ∑_{i=1}^{n} Xi
If X1, . . . , Xn are a random sample from a N(µ, σ²), what is the distribution of the sample mean?


Chapter 5 5.6 Normal distributions

Example: Measured voltage – continued

Suppose the measured voltage, X, in a certain electric circuit has the normal distribution with mean 120 and standard deviation 2.

If three independent measurements of the voltage are made, what is the probability that the sample mean X̄3 will lie between 118 and 120?
Find x that satisfies P(|X̄3 − 120| ≤ x) = 0.95
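A sketch of the computation (assuming scipy). By Theorem 5.6.7, X̄3 ∼ N(120, 4/3), i.e. standard deviation 2/√3.

```python
import numpy as np
from scipy import stats

mu, sd_mean = 120, 2 / np.sqrt(3)            # sample mean of 3 obs: N(120, 4/3)

print(stats.norm.cdf(120, mu, sd_mean) - stats.norm.cdf(118, mu, sd_mean))  # ~ 0.458

# P(|mean - 120| <= x) = 0.95  =>  x = 1.96 * 2/sqrt(3)
print(stats.norm.ppf(0.975) * sd_mean)                                      # ~ 2.26
```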


Chapter 5 5.6 Normal distributions

Area under the curve


Chapter 5 5.6 Normal distributions

Lognormal distributions

Def: Lognormal distributions

If log(X) ∼ N(µ, σ²) then we say that X has the Lognormal distribution with parameters µ and σ².

The support of the lognormal distribution is (0, ∞).
Often used to model time before failure.

Example: Let X and Y be independent random variables such that log(X) ∼ N(1.6, 4.5) and log(Y) ∼ N(3, 6). What is the distribution of the product XY?
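Since log(XY) = log X + log Y ∼ N(1.6 + 3, 4.5 + 6) = N(4.6, 10.5), the product XY is lognormal with parameters 4.6 and 10.5. A simulation sketch (assuming numpy and scipy; scipy's lognorm uses s = σ and scale = exp(µ)):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = np.exp(rng.normal(1.6, np.sqrt(4.5), 200_000))
y = np.exp(rng.normal(3.0, np.sqrt(6.0), 200_000))

log_xy = np.log(x * y)
print(log_xy.mean(), log_xy.var())              # ~ 4.6 and 10.5

XY = stats.lognorm(s=np.sqrt(10.5), scale=np.exp(4.6))
print(XY.cdf(100.0), np.mean(x * y <= 100.0))   # theoretical vs empirical cdf at 100
```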


Chapter 5 5.10 Bivariate normal distributions

Bivariate normal distributions

Def: Bivariate normal
Two continuous r.v. X1 and X2 have the bivariate normal distribution with means µ1 and µ2, variances σ1² and σ2², and correlation ρ if they have the joint pdf

f(x1, x2) = 1 / (2π (1 − ρ²)^(1/2) σ1 σ2)
            × exp( −1/(2(1 − ρ²)) [ (x1 − µ1)²/σ1² − 2ρ ((x1 − µ1)/σ1)((x2 − µ2)/σ2) + (x2 − µ2)²/σ2² ] )      (1)

Parameter space: µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1


Chapter 5 5.10 Bivariate normal distributions

Bivariate normal pdf

Bivariate normal pdf with different ρ:

Contours:


Chapter 5 5.10 Bivariate normal distributions

Bivariate normal as linear combination

Theorem 5.10.1: Bivariate normal from two ind. standard normals
Let Z1 ∼ N(0, 1) and Z2 ∼ N(0, 1) be independent.
Let µi ∈ R, σi² > 0 for i = 1, 2 and −1 ≤ ρ ≤ 1, and let

X1 = σ1 Z1 + µ1
X2 = σ2 (ρ Z1 + √(1 − ρ²) Z2) + µ2      (2)

Then the joint distribution of X1 and X2 is bivariate normal with parameters µ1, µ2, σ1², σ2² and ρ.

Theorem 5.10.2 (part 1) – the other way
Let X1 and X2 have the pdf in (1). Then there exist independent standard normal r.v. Z1 and Z2 so that (2) holds.
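A simulation sketch of Theorem 5.10.1 (assuming numpy), using the parameter values from the example a few slides below:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, mu2, s1, s2, rho = 3.0, 5.0, 2.0, 3.0, 0.6

z1, z2 = rng.standard_normal((2, 200_000))
x1 = s1 * z1 + mu1
x2 = s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu2

print(x1.mean(), x2.mean(), x1.var(), x2.var())  # ~ 3, 5, 4, 9
print(np.corrcoef(x1, x2)[0, 1])                 # ~ 0.6
```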


Chapter 5 5.10 Bivariate normal distributions

Properties of a bivariate normal

Theorem 5.10.2 (part 2)
Let X1 and X2 have the pdf in (1). Then the marginal distributions are

X1 ∼ N(µ1, σ1²) and X2 ∼ N(µ2, σ2²)

and the correlation between X1 and X2 is ρ.

Theorem 5.10.4: The conditional is normal
Let X1 and X2 have the pdf in (1). Then the conditional distribution of X2 given that X1 = x1 is (univariate) normal with

E(X2 | X1 = x1) = µ2 + ρ σ2 (x1 − µ1) / σ1   and
Var(X2 | X1 = x1) = (1 − ρ²) σ2²


Chapter 5 5.10 Bivariate normal distributions

Properties of a bivariate normal

Theorem 5.10.3: Uncorrelated ⇒ Independent
Let X1 and X2 have the bivariate normal distribution. Then X1 and X2 are independent if and only if they are uncorrelated.

Only holds for the multivariate normal distribution.
One of the very convenient properties of the normal distribution.

Theorem 5.10.5: Linear combinations are normal
Let X1 and X2 have the pdf in (1) and let a1, a2 and b be constants. Then Y = a1X1 + a2X2 + b is normally distributed with

E(Y) = a1µ1 + a2µ2 + b   and
Var(Y) = a1²σ1² + a2²σ2² + 2 a1 a2 ρ σ1 σ2

This extends what we already had for independent normals.


Chapter 5 5.10 Bivariate normal distributions

Example

Let X1 and X2 have the bivariate normal distribution with means µ1 = 3, µ2 = 5, variances σ1² = 4, σ2² = 9 and correlation ρ = 0.6.
a) Find the distribution of X2 − 2X1
b) What is the expected value of X2, given that we observed X1 = 2?
c) What is the probability that X1 > X2?
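A worked numerical answer using Theorems 5.10.4 and 5.10.5 (a sketch, assuming scipy):
a) X2 − 2X1 is normal with mean 5 − 2·3 = −1 and variance 4·4 + 9 + 2·(−2)(1)(0.6·2·3) = 10.6.
b) E(X2 | X1 = 2) = 5 + 0.6·(3/2)·(2 − 3) = 4.1.
c) X1 − X2 ∼ N(−2, 4 + 9 − 2·0.6·2·3) = N(−2, 5.8), so P(X1 > X2) = P(X1 − X2 > 0).

```python
import numpy as np
from scipy import stats

mu1, mu2, s1, s2, rho = 3.0, 5.0, 2.0, 3.0, 0.6

# a) Y = X2 - 2 X1 (Theorem 5.10.5 with a1 = -2, a2 = 1)
mean_a = mu2 - 2 * mu1
var_a = 4 * s1**2 + s2**2 + 2 * (-2) * 1 * rho * s1 * s2
print(mean_a, var_a)                          # -1.0, 10.6

# b) E(X2 | X1 = 2) (Theorem 5.10.4)
print(mu2 + rho * s2 * (2 - mu1) / s1)        # 4.1

# c) P(X1 > X2) = P(X1 - X2 > 0), with X1 - X2 ~ N(-2, 5.8)
var_c = s1**2 + s2**2 - 2 * rho * s1 * s2
print(stats.norm.sf(0, loc=-2.0, scale=np.sqrt(var_c)))   # ~ 0.203
```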


Chapter 5 5.10 Bivariate normal distributions

Multivariate normal – Matrix notation

The pdf of an n-dimensional normal distribution, X ∼ N(µ,Σ):

f(x) = 1 / ((2π)^(n/2) |Σ|^(1/2)) · exp{ −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) }

where

µ = (µ1, µ2, . . . , µn)ᵀ,   x = (x1, x2, . . . , xn)ᵀ

and

Σ =
    σ1²    σ1,2   σ1,3   · · ·   σ1,n
    σ2,1   σ2²    σ2,3   · · ·   σ2,n
    σ3,1   σ3,2   σ3²    · · ·   σ3,n
    ...    ...    ...    . . .   ...
    σn,1   σn,2   σn,3   · · ·   σn²

µ is the mean vector and Σ is called the variance-covariance matrix.
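scipy exposes this density directly; a minimal 2-dimensional sketch (assuming scipy), reusing the bivariate parameters from the earlier example:

```python
import numpy as np
from scipy import stats

mu = np.array([3.0, 5.0])
Sigma = np.array([[4.0, 3.6],
                  [3.6, 9.0]])      # off-diagonal = rho * sigma1 * sigma2 = 0.6 * 2 * 3

mvn = stats.multivariate_normal(mean=mu, cov=Sigma)
print(mvn.pdf([2.0, 4.0]))                 # joint density at one point
print(mvn.rvs(size=3, random_state=0))     # a few draws from N(mu, Sigma)
```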


Chapter 5 5.10 Bivariate normal distributions

Multivariate normal – Matrix notation

The same things hold for the multivariate normal distribution as for the bivariate. Let X ∼ N(µ, Σ):

Linear combinations of X are normal: AX + b is (multivariate) normal for a fixed matrix A and vector b.
The marginal distribution of Xi is normal with mean µi and variance σi².

The off-diagonal elements of Σ are the covariances between individual elements of X, i.e. Cov(Xi, Xj) = σi,j.
The joint marginal distributions are also normal, where the mean and covariance matrix are found by picking the corresponding elements from µ and rows and columns from Σ.
The conditional distributions are also normal (multivariate or univariate).
