BIOS 2041 Statistical Methods Abdus S. Wahed

Chapter 4

Discrete Probability Distributions

4.1 Random variable

A random variable is a function that assigns a numerical value to each outcome in a sample space.

Example 4.1.1. Consider the experiment of rolling two dice together. Let X denote the sum of the two numbers. The sample space is given by

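$$
S = \left\{
\begin{array}{cccccc}
(1,1) & (1,2) & (1,3) & (1,4) & (1,5) & (1,6) \\
(2,1) & (2,2) & (2,3) & (2,4) & (2,5) & (2,6) \\
(3,1) & (3,2) & (3,3) & (3,4) & (3,5) & (3,6) \\
(4,1) & (4,2) & (4,3) & (4,4) & (4,5) & (4,6) \\
(5,1) & (5,2) & (5,3) & (5,4) & (5,5) & (5,6) \\
(6,1) & (6,2) & (6,3) & (6,4) & (6,5) & (6,6)
\end{array}
\right\}
\tag{4.1.1}
$$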

Then X can take the values 2, 3, . . . , 12 with probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, and 1/36, respectively. For example, X = 4 for the three outcomes (1,3), (2,2), and (3,1), so Pr(X = 4) = 3/36. Thus X is a random variable.

Discrete random variable

Random variables whose values can be listed or counted are called discrete random variables.

Example 4.1.2. Suppose a physician agrees to use a new antihypertensive drug on a trial basis on the first four untreated hypertensive patients she encounters in her practice. Let X denote the number of patients out of 4 whose hypertension is brought under control. Then X is a discrete random variable with possible values 0, 1, 2, 3, and 4.

Discrete random variables do not always have to take integer values. A pathologist grading liver biopsies categorized the amount of fat in the liver as follows: 0: no fat (< 5%), 1: 5%-30%, 2: 30%-70%, and 3: > 70%. After finishing the grading, he realized that he should have defined category 0 as absolutely no fat (0%). The pathologist cleverly split the zero category into two by adding a category for > 0% to 5%, which he denoted by 0.5. Thus the whole set of values can be listed as {0, 0.5, 1, 2, 3}.

Continuous random variable

Random variables whose values cannot be listed or counted are called continuous random variables. Continuous random variables have uncountable sets as their support (the set of possible values of the random variable).

Example 4.1.3. Height, weight, BMI, time to an event, total amount of drug taken, lab values such as creatinine clearance, and drug half-life are examples of continuous random variables.

4.2 Probability mass function

The rule or function that gives the probability associated with each value of a random variable is called the probability mass function or probability distribution. If x is a possible value of a discrete random variable X, then the probability mass function assigns the probability Pr(X = x) to the value x. Often the relationship cannot be expressed by a single equation or formula; in such cases a table of the values and their probabilities forms the probability mass function.

Example 4.2.1. Consider the experiment of rolling two dice together. Let X denote the sum of the two numbers. Then X can take the values 2, 3, . . . , 12 with probabilities 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 5/36, 4/36, 3/36, 2/36, and 1/36, respectively. Thus one can write the probability distribution of X as follows:

Table 4.1: Probability distribution of the sum of two numbers when two dice are rolled together.

x          2     3     4     5     6     7     8     9     10    11    12
Pr(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Or, one can write the probability mass function as a mathematical function:

$$\Pr(X = x) = \frac{6 - |7 - x|}{36}, \quad x = 2, 3, \ldots, 12. \tag{4.2.1}$$
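One way to check Table 4.1 and formula (4.2.1) is to count outcomes directly. The SAS data step below is a minimal sketch (the data set and variable names are illustrative, not part of the original notes): it runs through all 36 equally likely pairs (d1, d2), counts how many give each sum, and places the counting probability next to the value from the formula.

```sas
data dice_check;
  /* cnt[s] counts the outcomes (d1, d2) whose sum is s */
  array cnt[2:12] _temporary_ (11*0);
  do d1 = 1 to 6;
    do d2 = 1 to 6;
      cnt[d1 + d2] = cnt[d1 + d2] + 1;
    end;
  end;
  /* one observation per possible sum x */
  do x = 2 to 12;
    p_enum    = cnt[x] / 36;             /* probability by counting outcomes */
    p_formula = (6 - abs(7 - x)) / 36;   /* formula (4.2.1) */
    output;
  end;
  keep x p_enum p_formula;
run;

proc print data=dice_check; run;
```

The two columns should agree for every x, and p_enum sums to 1 over x = 2, ..., 12.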

Example 4.2.2. Example 4.6 (FOB). Suppose a physician agrees to use a new antihypertensive drug on a trial basis on the first four untreated hypertensive patients she encounters in her practice. Let X denote the number of patients out of 4 whose hypertension is brought under control. Then X is a discrete random variable with possible values 0, 1, 2, 3, and 4. Suppose that, from a study conducted by the drug company, we know that a patient treated with this drug has a 70% probability of response. Assuming that the four patients respond independently,

$$\begin{aligned}
\Pr(X = 0) &= \Pr(\text{no response in any of the four}) \\
&= 0.3 \times 0.3 \times 0.3 \times 0.3 \\
&= 0.008.
\end{aligned}$$

$$\begin{aligned}
\Pr(X = 1) &= \Pr(\text{response in the 1st but not in the rest}) \\
&\quad + \Pr(\text{response in the 2nd but not in the rest}) \\
&\quad + \Pr(\text{response in the 3rd but not in the rest}) \\
&\quad + \Pr(\text{response in the 4th but not in the rest}) \\
&= 4(0.3 \times 0.3 \times 0.3 \times 0.7) = 0.076.
\end{aligned}$$

$$\begin{aligned}
\Pr(X = 2) &= \Pr(\text{response in the 1st and 2nd but not in the rest}) \\
&\quad + \Pr(\text{response in the 1st and 3rd but not in the rest}) \\
&\quad + \cdots \\
&\quad + \Pr(\text{response in the 3rd and 4th but not in the rest}) \\
&= 6(0.3 \times 0.3 \times 0.7 \times 0.7) = 0.265.
\end{aligned}$$

$$\begin{aligned}
\Pr(X = 3) &= \Pr(\text{no response in the 1st but response in the rest}) \\
&\quad + \Pr(\text{no response in the 2nd but response in the rest}) \\
&\quad + \Pr(\text{no response in the 3rd but response in the rest}) \\
&\quad + \Pr(\text{no response in the 4th but response in the rest}) \\
&= 4(0.3 \times 0.7 \times 0.7 \times 0.7) = 0.412.
\end{aligned}$$

And, similarly, $\Pr(X = 4) = \Pr(\text{response in all four}) = 0.7 \times 0.7 \times 0.7 \times 0.7 = 0.240$.

In tabular form:

Table 4.2: Probability distribution of the number of “successes” in hypertension control.

x          0      1      2      3      4
Pr(X = x)  0.008  0.076  0.265  0.412  0.240

In functional form,

$$\Pr(X = x) = \binom{4}{x} (0.7)^x (0.3)^{4-x}, \quad x = 0, 1, 2, 3, 4,$$

which is the so-called “binomial distribution”.
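The case-by-case reasoning above can be automated. The SAS sketch below (names are illustrative) loops over all 2^4 = 16 response patterns for the four patients, adds each pattern's probability, 0.7 for every responder times 0.3 for every non-responder, to Pr(X = x) for its number of responders x, and compares the total with the binomial formula. COMB(n, k) is SAS's built-in binomial coefficient.

```sas
data hyper_check;
  p = 0.7;                              /* probability of response */
  array prob[0:4] _temporary_ (5*0);    /* prob[x] accumulates Pr(X = x) */
  /* r1-r4 = 1 if the patient responds, 0 otherwise */
  do r1 = 0 to 1;
    do r2 = 0 to 1;
      do r3 = 0 to 1;
        do r4 = 0 to 1;
          x = r1 + r2 + r3 + r4;        /* number of responders */
          /* independence: multiply 0.7 per responder, 0.3 per non-responder */
          prob[x] = prob[x] + p**x * (1 - p)**(4 - x);
        end;
      end;
    end;
  end;
  do x = 0 to 4;
    p_enum     = prob[x];
    p_binomial = comb(4, x) * p**x * (1 - p)**(4 - x);
    output;
  end;
  keep x p_enum p_binomial;
run;

proc print data=hyper_check; run;
```

Both columns should show, unrounded, 0.0081, 0.0756, 0.2646, 0.4116, and 0.2401.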

A probability mass function satisfies:

(i) $0 \le \Pr(X = x) \le 1$ for every x, and

(ii) $\sum_x \Pr(X = x) = 1$.

Probability Distribution and Frequency Distribution

The probability distribution of a random variable describes how frequently its values are “expected” to occur over an infinite number of repetitions of the experiment, whereas the relative frequency distribution gives a snapshot of how frequently they were actually “observed” in a finite number of repetitions.

Example 4.2.3. Example 4.8 (FOB). Suppose the drug company provided the drug to 100 physicians and asked each of them to treat the first four of their untreated hypertensive patients with it. Of the 100 physicians, 19 were able to bring all four patients under control, 48 brought 3 patients under control, 24 brought 2 patients under control, and the remaining 9 brought only 1 patient under control. Here is the frequency distribution:

Table 4.3: Frequency distribution of the number of “successes” in hypertension control.

x                   0      1      2       3       4       Total
Frequency           0      9      24      48      19      100
Relative frequency  0/100  9/100  24/100  48/100  19/100  1

Compare this with the probability distribution given in Table 4.2:

Table 4.4: Frequency and probability distributions of the number of “successes” in hypertension control.

x                   0      1      2      3      4      Total
Relative frequency  0.00   0.09   0.24   0.48   0.19   1.0
Pr(X = x)           0.008  0.076  0.265  0.412  0.240  1.0

In practice, one would like to know whether the company's claim of a 70% response rate is true. To check it, one compares the probability distribution with the frequency distribution and sees how “close” they are. This is done by a so-called “goodness-of-fit” test, which compares a theoretical probability model with the observed data.
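As an informal first look (this is only a side-by-side comparison, not the goodness-of-fit test itself), the SAS sketch below lines up the observed counts from Table 4.3 with the counts we would expect among 100 physicians if the binomial model with n = 4 and p = 0.7 were correct, namely 100 × Pr(X = x). It uses SAS's PDF('BINOMIAL', x, p, n) function; the data set and variable names are illustrative.

```sas
data fit_compare;
  input x observed;
  prob     = pdf('BINOMIAL', x, 0.7, 4);   /* model probability Pr(X = x) */
  expected = 100 * prob;                   /* expected number of physicians */
  rel_freq = observed / 100;               /* observed relative frequency */
  datalines;
0 0
1 9
2 24
3 48
4 19
;
run;

proc print data=fit_compare; run;
```

The expected counts are 0.81, 7.56, 26.46, 41.16, and 24.01, to be set against the observed 0, 9, 24, 48, and 19.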


4.2.1 Expected value/Population mean

Let us continue the hypertension example. What is the mean (average) number of patients, out of 4, brought under control by the 100 physicians?

$$\begin{aligned}
\bar{x} &= \frac{0 \cdot 0 + 1 \cdot 9 + 2 \cdot 24 + 3 \cdot 48 + 4 \cdot 19}{100} \\
&= 0\left(\frac{0}{100}\right) + 1\left(\frac{9}{100}\right) + 2\left(\frac{24}{100}\right) + 3\left(\frac{48}{100}\right) + 4\left(\frac{19}{100}\right) \\
&= 2.77.
\end{aligned} \tag{4.2.2}$$

Thus, on average, each physician brought about 2.8 of 4 patients under control.

Notice that x̄ is calculated from the relative frequencies of the frequency distribution. Similarly, one can ask how many of 4 patients would be expected to be brought under control if the probability model provided by the drug company is correct. This number is called the expected value of the random variable “number of successes in hypertension control”, and is calculated as

$$\mu = 0(0.008) + 1(0.076) + 2(0.265) + 3(0.412) + 4(0.240) = 2.80. \tag{4.2.3}$$

Thus, if the company's claim of a 70% response rate were true, we would expect, on average, 2.8 of 4 patients treated with this drug to have their hypertension under control. This expected number of responders in samples of 4 patients is close to the observed average of 2.77.

Expected value of a discrete random variable

$$\mu = E(X) = \sum_x x \Pr(X = x)$$
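The contrast between x̄ in (4.2.2), computed from observed frequencies, and μ in (4.2.3), computed from model probabilities, can be reproduced in a few lines. The SAS sketch below is illustrative (data set and column names are not from the notes): it stores the observed frequencies from Table 4.3 together with the binomial probabilities and lets PROC SQL form both weighted sums.

```sas
data hyper_freq;
  input x freq;
  prob = pdf('BINOMIAL', x, 0.7, 4);   /* Pr(X = x) under the company's model */
  datalines;
0 0
1 9
2 24
3 48
4 19
;
run;

proc sql;
  create table mean_compare as
  select sum(x * freq) / sum(freq) as xbar,   /* observed mean, eq. (4.2.2) */
         sum(x * prob)              as mu      /* expected value, eq. (4.2.3) */
  from hyper_freq;
quit;

proc print data=mean_compare; run;
```

With unrounded probabilities, mu comes out as exactly 2.8, while xbar is 2.77.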


Example 4.2.4. Example 4.10 (FOB). The probability mass function of X, the number of episodes of otitis media in the first two years of life, is given by

Table 4.5: Probability mass function of X = the number of episodes of otitis media in the first two years of life.

x          0      1      2      3      4      5      6
Pr(X = x)  0.129  0.264  0.271  0.185  0.095  0.039  0.017

What is the expected number of episodes of otitis media in the first two years of life?

μ = E(X) = 0(0.129) + 1(0.264) + 2(0.271) + 3(0.185) + 4(0.095) + 5(0.039) + 6(0.017) = 2.04.

Thus, on average, a child would be expected to have about 2 episodes of otitis media in the first two years of life.


4.2.2 Variance of a discrete random variable

The sample variance, introduced in Chapter 2, describes how the observations are spread around the sample mean. The variance of a random variable, or the population variance, measures the spread of its values relative to the expected value μ.

Variance of a discrete random variable

$$\sigma^2 = \operatorname{Var}(X) = \sum_x (x - \mu)^2 \Pr(X = x),$$

or, equivalently,

$$\sigma^2 = \operatorname{Var}(X) = E(X^2) - \mu^2 = \sum_x x^2 \Pr(X = x) - \mu^2.$$

Example 4.2.5. Example 4.12 (FOB). The probability mass function of X, the number of episodes of otitis media in the first two years of life, is given by

Table 4.6: Probability mass function of X = the number of episodes of otitis media in the first two years of life.

x          0      1      2      3      4      5      6
Pr(X = x)  0.129  0.264  0.271  0.185  0.095  0.039  0.017


To calculate the variance, we will use the shortcut formula σ² = E(X²) − μ². First, let us extend the above table to include an x² row.

x          0      1      2      3      4      5      6
x²         0      1      4      9      16     25     36
Pr(X = x)  0.129  0.264  0.271  0.185  0.095  0.039  0.017

$$E(X^2) = 0(0.129) + 1(0.264) + 4(0.271) + 9(0.185) + 16(0.095) + 25(0.039) + 36(0.017) = 6.12.$$

Then $\sigma^2 = E(X^2) - \mu^2 = 6.12 - (2.04)^2 = 1.96$. The corresponding population standard deviation is $\sigma = \sqrt{\sigma^2} = 1.4$.
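Both variance formulas can be checked side by side. The SAS sketch below (names are illustrative) stores the otitis media probabilities in a temporary array indexed by x, computes E(X) and E(X²), and then evaluates the definitional formula Σ(x − μ)² Pr(X = x) and the shortcut E(X²) − μ².

```sas
data otitis_moments;
  /* pv[x] = Pr(X = x), x = 0, ..., 6, from Table 4.6 */
  array pv[0:6] _temporary_ (0.129 0.264 0.271 0.185 0.095 0.039 0.017);
  mu  = 0;
  ex2 = 0;
  do x = 0 to 6;
    mu  = mu  + x * pv[x];        /* E(X)   */
    ex2 = ex2 + x**2 * pv[x];     /* E(X^2) */
  end;
  var_def = 0;
  do x = 0 to 6;
    var_def = var_def + (x - mu)**2 * pv[x];   /* sum of (x - mu)^2 Pr(X = x) */
  end;
  var_short = ex2 - mu**2;                     /* E(X^2) - mu^2 */
  sd = sqrt(var_short);
  drop x;
run;

proc print data=otitis_moments; run;
```

At full precision, μ = 2.038 and σ² ≈ 1.967 (σ ≈ 1.40) by either formula; the value 1.96 in the text comes from rounding μ to 2.04 before squaring.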


4.3 Cumulative distribution function/Distribution function

For any real number x, the cumulative distribution function F(x) gives the probability that the random variable X takes a value less than or equal to x.

Cumulative distribution function (c.d.f.)

F(x) = Pr(X ≤ x).

Example 4.3.1. Example 4.14 (FOB). The cumulative distribution function of the random variable X, the number of episodes of otitis media in the first two years of life, is

x                 0      1      2      3      4      5      6
Pr(X = x)         0.129  0.264  0.271  0.185  0.095  0.039  0.017
F(x) = Pr(X ≤ x)  0.129  0.393  0.664  0.849  0.944  0.983  1.0


The c.d.f. F(x) is defined for all real numbers, not only for the possible values of X. In the above example we can calculate, for instance,

F(−2) = 0,

F(2.1) = F(2) = 0.664 (X cannot take a value strictly between 2 and 2.1),

F(8) = F(6) = 1.
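Because F(x) is defined for every real number, it can be evaluated by adding Pr(X = k) over all possible values k ≤ x. The SAS sketch below (names are illustrative) does this for the otitis media distribution at a few arbitrary points, including the three evaluated above.

```sas
data otitis_cdf;
  /* pv[k] = Pr(X = k), k = 0, ..., 6, from Example 4.3.1 */
  array pv[0:6] _temporary_ (0.129 0.264 0.271 0.185 0.095 0.039 0.017);
  do t = -2, 2, 2.1, 6, 8;            /* any real numbers, not only 0-6 */
    F = 0;
    do k = 0 to 6;
      if k <= t then F = F + pv[k];   /* add Pr(X = k) for every k <= t */
    end;
    output;
  end;
  keep t F;
run;

proc print data=otitis_cdf; run;
```

The output shows the step-function behavior: F(2) and F(2.1) are both 0.664, and F(6) and F(8) are both 1.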

4.4 Factorial, permutations and combinations

4.4.1 Factorial

In how many ways can you order n objects? Or, in how many ways can n individuals sit in n chairs? Start with n = 2.

n = 2: AB, BA → 2 = 2 ∗ 1 = 2!

n = 3: ABC, ACB, BAC, BCA, CAB, CBA → 6 = 3 ∗ 2 ∗ 1 = 3!

n = 4: ABCD, ABDC, ACBD, ACDB, ADBC, ADCB,
BACD, BADC, BCAD, BCDA, BDAC, BDCA,
CABD, CADB, CBAD, CBDA, CDAB, CDBA,
DABC, DACB, DBAC, DBCA, DCAB, DCBA
→ 24 = 4 ∗ 3 ∗ 2 ∗ 1 = 4!

For general n,

n! = n(n − 1)(n − 2) · · · 3 ∗ 2 ∗ 1,

and, by convention, 0! = 1.
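The product definition of n! translates directly into a loop. The SAS sketch below (names are illustrative) builds n! by multiplying 1 ∗ 2 ∗ · · · ∗ n and compares it with SAS's built-in FACT function; for n = 0 the inner loop does not execute, which matches the convention 0! = 1.

```sas
data factorials;
  do n = 0 to 6;
    prod = 1;
    do k = 1 to n;        /* loop is skipped when n = 0, leaving 0! = 1 */
      prod = prod * k;    /* multiply 1 * 2 * ... * n */
    end;
    n_fact = fact(n);     /* SAS built-in factorial function */
    output;
  end;
  keep n prod n_fact;
run;

proc print data=factorials; run;
```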

4.4.2 Permutations

In how many ways can you choose k objects from n (≥ k) objects when the order of selection matters? Or, in how many ways can n individuals sit in k chairs?

The 1st chair can be filled in n possible ways.

The 2nd chair can be filled in n − 1 possible ways.

...

The kth chair can be filled in n − k + 1 possible ways.

Thus the total number of ways k chairs can be occupied by n individuals is

$${}_nP_k = n(n-1) \cdots (n-k+2)(n-k+1) = \frac{n!}{(n-k)!}. \tag{4.4.1}$$
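The chair-filling argument can be mirrored in code. In the SAS sketch below, n = 5 is an arbitrary illustrative choice and the variable names are hypothetical; for each k the inner loop multiplies the numbers of choices n, n − 1, . . . , n − k + 1, and the result is compared with n!/(n − k)! and with SAS's PERM(n, k) function.

```sas
data perms;
  n = 5;                                  /* illustrative number of individuals */
  do k = 1 to n;                          /* number of chairs */
    ways = 1;
    do j = 1 to k;
      ways = ways * (n - j + 1);          /* chair j can be filled in n - j + 1 ways */
    end;
    by_factorial = fact(n) / fact(n - k); /* formula (4.4.1) */
    by_perm      = perm(n, k);            /* SAS built-in nPk */
    output;
  end;
  keep n k ways by_factorial by_perm;
run;

proc print data=perms; run;
```

For example, 2 chairs can be filled from 5 individuals in 5 ∗ 4 = 20 ways.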

4.4.3 Combinations

Sometimes the order of selection does not matter. For example, in how many ways can you choose 2 individuals from 4 who volunteered to be in a clinical trial? Note that choosing A and B is the same as choosing B and A. The possible ways are

AB, AC, AD, BC, BD, CD.

We write

$${}_4C_2 = \binom{4}{2} = 6.$$

In general, each unordered selection of k objects can be arranged in k! different orders, so

$${}_nC_k = \binom{n}{k} = \frac{{}_nP_k}{k!} = \frac{n!}{k!(n-k)!}.$$

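Simple way to calculate combinations

In the triangle below (Pascal's triangle), row n, starting from n = 1, lists $\binom{n}{0}, \binom{n}{1}, \ldots, \binom{n}{n}$. Each interior entry is the sum of the two adjacent entries in the row above; for example, 10 = 4 + 6.

1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1

Thus $\binom{1}{0} = 1$, $\binom{1}{1} = 1$; $\binom{2}{0} = 1$, $\binom{2}{1} = 2$, $\binom{2}{2} = 1$; $\binom{3}{0} = 1$, $\binom{3}{1} = 3$, $\binom{3}{2} = 3$, $\binom{3}{3} = 1$; etc.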
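The "sum of the two entries above" rule is easy to turn into a table-filling loop. The SAS sketch below (names are illustrative) fills a two-dimensional temporary array row by row for n = 1, . . . , 7 and checks every entry against SAS's COMB(n, k), which evaluates n!/(k!(n − k)!).

```sas
data pascal;
  array c[1:7, 0:7] _temporary_;          /* c[n, k] will hold n choose k */
  do n = 1 to 7;
    c[n, 0] = 1;                          /* first entry of every row */
    c[n, n] = 1;                          /* last entry of every row */
    do k = 1 to n - 1;                    /* interior entries (none when n = 1) */
      c[n, k] = c[n - 1, k - 1] + c[n - 1, k];   /* sum of the two entries above */
    end;
    do k = 0 to n;
      value = c[n, k];
      check = comb(n, k);                 /* n!/(k!(n-k)!) via SAS COMB */
      output;
    end;
  end;
  keep n k value check;
run;

proc print data=pascal; run;
```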


4.5 Binomial Distribution

Let us revisit the antihypertensive drug example (Example 4.2.2 in this chapter).

With a 70% chance of response, if a physician tries the drug on 4 patients, what is the probability that the hypertension of 2 of the four patients will be under control?

Note that which patients respond is not important; we only look for two responders out of four. In how many ways can 2 responders be chosen out of 4? Obviously, in

$$\binom{4}{2} = 6 \text{ ways}$$

(RRNN, RNRN, RNNR, NRRN, NRNR, NNRR), where R denotes a responder and N a non-responder.

Now, what is the probability that one particular sequence, say RRNN, will occur?

$$\Pr(RRNN) = (.7)(.7)(.3)(.3) = (.7)^2(.3)^2 = 0.0441.$$

Notice that every sequence has the same number of responders (two) and non-responders (two), and therefore the same probability, 0.0441. Hence

$$\Pr(\text{exactly 2 out of 4 will respond}) = \binom{4}{2} (.7)^2 (.3)^2 = 6(0.0441) = 0.265.$$

In this example, we had

• Two possible outcomes (response/no response) for each trial,

• A fixed number (4) of trials (patients),

• A constant probability (0.70) of “success” (response) for each trial, and

• Independent trials (patients).

Under these conditions, the number of successes is said to follow a binomial distribution. For n independent trials, each with probability of success p, the probability distribution of the number of successes X is given by

Binomial distribution

$$\Pr(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n.$$

Using the above distribution, we can easily calculate the probability of any number of successes in any number of trials. For instance, for the antihypertensive drug example,

$$\Pr(\text{3 out of 4 will respond}) = \binom{4}{3} (.7)^3 (1 - .7)^{4-3} = 0.412.$$

Example 4.5.1. Example 4.25 (FOB). What is the probability of obtaining 2 boys out of 5 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered to be independent of each other?

Here, n = 5, p = 0.51.

$$\Pr(X = 2) = \binom{5}{2} (.51)^2 (1 - .51)^{5-2} = 0.306.$$

Example 4.5.2. What is the probability of obtaining at least 2 boys out of 5 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered to be independent of each other?

Here, n = 5, p = 0.51.

$$\begin{aligned}
\Pr(X \ge 2) &= 1 - \Pr(X \le 1) \\
&= 1 - \{\Pr(X = 0) + \Pr(X = 1)\} \\
&= 1 - \left\{ \binom{5}{0} (.51)^0 (1 - .51)^{5-0} + \binom{5}{1} (.51)^1 (1 - .51)^{5-1} \right\} \\
&= 1 - \{0.028 + 0.147\} = 1 - 0.175 \\
&= 0.825.
\end{aligned}$$
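The hand calculations in Examples 4.5.1 and 4.5.2 can be reproduced term by term. The SAS sketch below (names are illustrative) evaluates the binomial formula with COMB, applies the complement rule Pr(X ≥ 2) = 1 − {Pr(X = 0) + Pr(X = 1)}, and compares the result with 1 − CDF('BINOMIAL', 1, p, n).

```sas
data boys5;
  n = 5;
  p = 0.51;
  /* Example 4.5.1: Pr(X = 2) from the binomial formula */
  px2 = comb(n, 2) * p**2 * (1 - p)**(n - 2);
  /* Example 4.5.2: Pr(X >= 2) = 1 - {Pr(X = 0) + Pr(X = 1)} */
  pxle1 = 0;
  do x = 0 to 1;
    pxle1 = pxle1 + comb(n, x) * p**x * (1 - p)**(n - x);
  end;
  pxge2 = 1 - pxle1;
  /* the same complement computed with SAS's binomial CDF */
  pxge2_cdf = 1 - cdf('BINOMIAL', 1, p, n);
  drop x;
run;

proc print data=boys5; run;
```

Unrounded, px2 ≈ 0.3060 and pxge2 ≈ 0.8248, matching 0.306 and 0.825 above.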

The above example shows that, as n becomes even moderately large, computing these probabilities by hand becomes time-consuming, if not difficult. Realizing this, statisticians have tabulated the binomial probabilities for moderately large sample sizes and some specific values of p (Table 1, Appendix, FOB).

Example 4.5.3. What is the probability of obtaining exactly 3 boys out of 7 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered to be independent of each other?

Here, n = 7, p = 0.51. From Table 1, Appendix,

Pr(X = 3) = 0.27.

The exact probability is

$$\binom{7}{3} (.51)^3 (1 - .51)^{7-3} = 0.27.$$

Example 4.5.4. What is the probability of obtaining more than 3 boys out of 7 children if the probability of a boy is 0.51 at each birth and the sexes of successive children are considered to be independent of each other?

Here, n = 7, p = 0.51.

$$\begin{aligned}
\Pr(X > 3) &= 1 - \Pr(X \le 3) \\
&= 1 - \{\Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3)\} \\
&= 1 - \{0.0068 + 0.0494 + 0.1543 + 0.2676\} \\
&= 1 - 0.4781 = 0.522.
\end{aligned}$$


Run the following SAS code to check the answers:

data binom;
  /* calculate P(X<=3) from b(n=7,p=0.51) distribution */
  pxle3 = CDF('Binomial', 3, 0.51, 7);
  /* calculate P(X>3) from b(n=7,p=0.51) distribution */
  pxgt3 = 1 - CDF('Binomial', 3, 0.51, 7);
  /* calculate P(X>=3) from b(n=7,p=0.51) distribution */
  pxge3 = 1 - CDF('Binomial', 2, 0.51, 7);
  /* calculate P(X=3) from b(n=7,p=0.51) distribution */
  pxeq3 = PMF('Binomial', 3, 0.51, 7);
run;

proc print; run;

Example 4.5.5. Example 4.29 (FOB). Compute (i) the probability of obtaining exactly 75 cases of chronic bronchitis and (ii) the probability of obtaining at least 75 cases of chronic bronchitis in the first year of life among 1500 families in which both parents are chronic bronchitics, if the underlying incidence rate of chronic bronchitis in the first year of life is 0.05.

Here, n = 1500, p = 0.05, and X = the number of cases of chronic bronchitis in the first year of life among the 1500 families.

(i)
$$\Pr(X = 75) = \binom{1500}{75} (.05)^{75} (1 - .05)^{1500-75} = 0.047.$$

(ii)
$$\Pr(X \ge 75) = \sum_{x=75}^{1500} \binom{1500}{x} (.05)^x (1 - .05)^{1500-x} = 1 - 0.483 = 0.517.$$

SAS code:

data binom2;
  /* calculate P(X=75) from b(n=1500,p=0.05) distribution */
  pxeq75 = PMF('Binomial', 75, 0.05, 1500);
  /* calculate P(X>=75) from b(n=1500,p=0.05) distribution */
  pxge75 = 1 - CDF('Binomial', 74, 0.05, 1500);
run;

proc print; run;


Expected value (Mean) and Variance of the Binomial Distribution

If we independently repeat a Bernoulli trial (a trial that results in only two outcomes, success and failure) with probability of success p a total of n times, what would be the expected number of successes?

Mean and variance of a binomial distribution

E(X) = np,

Var(X) = np(1 − p).

Now that we know the formula, we can easily calculate the expected number of successes in four hypertensive patients treated with an antihypertensive drug with a response rate of 70%. The answer is 4 ∗ 0.70 = 2.8 (the same as what we found in equation 4.2.3).
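The formulas E(X) = np and Var(X) = np(1 − p) can be checked numerically against the general definitions of Sections 4.2.1 and 4.2.2. The SAS sketch below (names are illustrative) uses the hypertension example, n = 4 and p = 0.7, sums x Pr(X = x) and x² Pr(X = x) over x = 0, . . . , n, and places the results next to np and np(1 − p).

```sas
data binom_moments;
  n = 4;
  p = 0.7;
  mean_x = 0;
  ex2 = 0;
  do x = 0 to n;
    px     = pdf('BINOMIAL', x, p, n);   /* Pr(X = x) */
    mean_x = mean_x + x * px;            /* sum of x Pr(X = x) */
    ex2    = ex2 + x**2 * px;            /* sum of x^2 Pr(X = x) */
  end;
  variance = ex2 - mean_x**2;            /* E(X^2) - mu^2 */
  np  = n * p;                           /* formula for E(X) */
  npq = n * p * (1 - p);                 /* formula for Var(X) */
  drop x px;
run;

proc print data=binom_moments; run;
```

Both routes give a mean of 2.8 and a variance of 0.84.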

Parameters of binomial distribution

n and p are the two parameters of the binomial distribution.


4.6 Poisson Distribution

In the binomial distribution, we had a fixed number of trials. However, in many situations the number of trials is not fixed. Here are some examples:

• Number of Lexus brand cars crossing a particular intersection within a fixed time interval (theoretically it could be 0, 1, 2, . . ., with no upper limit)

• Number of deaths caused by typhoid fever in 20 years

• Number of bacterial colonies growing on a 100-cm² agar plate

• Number of goals scored by a team in a 20-minute game

In all these cases, there is no fixed number of trials. But what if we want to calculate, for instance, the probability that the team scores exactly one goal in a 20-minute game?


Assumptions of Poisson Distribution

1. The probability of an event in a very short (infinitesimal) time interval is very small,

2. The numbers of events in two non-overlapping time intervals are independent, and

3. The rate of occurrence depends only on the length of the time interval (the expected number of events is proportional to the length), not on where the interval starts or ends.

Under the above assumptions, the number of events within a specific period of time is said to follow a Poisson distribution. Suppose the events occur at a rate of μ per interval, that is, μ events are expected in the interval. Then the probability mass function of X, the number of events in that interval, is:

Probability mass function of a Poisson random variable

$$\Pr(X = x) = \frac{e^{-\mu} \mu^x}{x!}, \quad x = 0, 1, 2, \ldots$$
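A quick numerical sanity check of this mass function: its probabilities should add up to 1, and its mean should equal μ. The SAS sketch below uses μ = 2.3 purely as an illustrative value (it is the 6-month rate in Example 4.6.1); it evaluates the formula with EXP and FACT, compares each term with SAS's PDF('POISSON', x, μ), and truncates the infinite sum at x = 50, where the remaining terms are negligible for this μ.

```sas
data poisson_check;
  mu = 2.3;                 /* illustrative rate */
  total   = 0;
  mean_x  = 0;
  maxdiff = 0;
  do x = 0 to 50;           /* terms beyond x = 50 are negligible when mu = 2.3 */
    p_formula = exp(-mu) * mu**x / fact(x);   /* e^(-mu) mu^x / x! */
    p_sas     = pdf('POISSON', x, mu);        /* SAS built-in Poisson pmf */
    maxdiff   = max(maxdiff, abs(p_formula - p_sas));
    total     = total + p_formula;            /* should approach 1 */
    mean_x    = mean_x + x * p_formula;       /* should approach mu */
  end;
  keep mu total mean_x maxdiff;
run;

proc print data=poisson_check; run;
```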


Parameters of Poisson distribution

μ is the only parameter of the Poisson distribution.

Example 4.6.1. Suppose that the number of deaths from typhoid fever over a 1-year period is distributed as a Poisson random variable with parameter μ = 4.6.

1. What is the probability mass function of the number of deaths in a 6-month period?

Since μ = 4.6 per year, μ = 4.6/2 = 2.3 per 6 months. Let X = the number of deaths within a 6-month period. Then X follows a Poisson distribution with mass function

$$\Pr(X = x) = \frac{e^{-2.3}(2.3)^x}{x!}, \quad x = 0, 1, 2, \ldots$$

2. What is the probability that 3 deaths occur in a 6-month period?

$$\Pr(X = 3) = \frac{e^{-2.3}(2.3)^3}{3!} = 0.203.$$


3. What is the probability of more than 3 deaths in a 6-month period?

$$\begin{aligned}
\Pr(X > 3) &= 1 - \{\Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2) + \Pr(X = 3)\} \\
&= 1 - \left\{ e^{-2.3} + e^{-2.3}(2.3) + \frac{e^{-2.3}(2.3)^2}{2!} + \frac{e^{-2.3}(2.3)^3}{3!} \right\} \\
&= 1 - \{.10 + .23 + .27 + .20\} = 0.20.
\end{aligned} \tag{4.6.1}$$

4. What is the probability of no more than 2 deaths in a 3-month period?

Let Y = the number of deaths within a 3-month period. Since μ = 4.6 per year, the rate for a 3-month period is 4.6/4 = 1.15, so Y follows a Poisson distribution with mass function

$$\Pr(Y = y) = \frac{e^{-1.15}(1.15)^y}{y!}, \quad y = 0, 1, 2, \ldots$$

$$\begin{aligned}
\Pr(Y \le 2) &= \Pr(Y = 0) + \Pr(Y = 1) + \Pr(Y = 2) \\
&= e^{-1.15} + e^{-1.15}(1.15) + \frac{e^{-1.15}(1.15)^2}{2!} \\
&= .32 + .36 + .21 \\
&= 0.89.
\end{aligned} \tag{4.6.2}$$


SAS code:

data poisson;
  /* Prob that 3 deaths occur in a 6-month period */
  pxeq3 = PMF('Poisson', 3, 2.3);
  /* Prob that more than 3 deaths occur in a 6-month period */
  pxgt3 = 1 - CDF('Poisson', 3, 2.3);
  /* Prob that no more than 2 deaths occur in a 3-month period */
  pyle2 = CDF('Poisson', 2, 1.15);
run;

proc print; run;
