Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it...

112
Discrete Mathematics and Its Applications Lecture 5: Discrete Probability: Random Variables MING GAO DaSE@ ECNU (for course related communications) [email protected] May 15, 2020

Transcript of Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it...

Page 1: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Discrete Mathematics and Its ApplicationsLecture 5: Discrete Probability: Random Variables

MING GAO

DaSE@ ECNU(for course related communications)

[email protected]

May 15, 2020

Page 2: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Outline

1 Random Variable

2 Bernoulli Trials and the Binomial Distribution

3 Bayes’ Theorem

4 Applications of Bayes’ Theorem

5 Take-aways

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 2 / 38

Page 3: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Random variables

Definition: A random variable (r.v.) X is a function from samplespace Ω of an experiment to the set of real numbers in R, i.e.,

∀ω ∈ Ω,X (ω) = x ∈ R.

Remarks

Note that a random variable is a function. It is not a variable,and it is not random!

We usually use notation X ,Y , etc. to represent a r.v., and x , yto represent the numerical values. For example, X = x meansthat r.v. X has value x .

The domain of the function can be countable and uncountable.If it is countable, the random variable is a discrete r.v.,otherwise continuous r.v..

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 3 / 38

Page 4: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Random variables

Definition: A random variable (r.v.) X is a function from samplespace Ω of an experiment to the set of real numbers in R, i.e.,

∀ω ∈ Ω,X (ω) = x ∈ R.

Remarks

Note that a random variable is a function. It is not a variable,and it is not random!

We usually use notation X ,Y , etc. to represent a r.v., and x , yto represent the numerical values. For example, X = x meansthat r.v. X has value x .

The domain of the function can be countable and uncountable.If it is countable, the random variable is a discrete r.v.,otherwise continuous r.v..

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 3 / 38

Page 5: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of r.v.

A coin is tossed. If X is the r.v. whose value is the number of headsobtained, then

X (H) = 1,X (T ) = 0.

And then tossed again. We define sample space Ω =HH,HT ,TH,TT. If Y is the r.v. whose value is the numberof heads obtained, then

X (HH) = 2,X (HT ) = X (TH) = 1,X (TT ) = 0.

When a player rolls a die, he will win $1 if the outcome is 1,2 or 3,otherwise lose 1$. Let Ω = 1, 2, 3, 4, 5, 6 and define X as follows:

X (1) = X (2) = X (3) = 1,X (4) = X (5) = X (6) = −1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 4 / 38

Page 6: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of r.v.

A coin is tossed. If X is the r.v. whose value is the number of headsobtained, then

X (H) = 1,X (T ) = 0.

And then tossed again. We define sample space Ω =HH,HT ,TH,TT. If Y is the r.v. whose value is the numberof heads obtained, then

X (HH) = 2,X (HT ) = X (TH) = 1,X (TT ) = 0.

When a player rolls a die, he will win $1 if the outcome is 1,2 or 3,otherwise lose 1$. Let Ω = 1, 2, 3, 4, 5, 6 and define X as follows:

X (1) = X (2) = X (3) = 1,X (4) = X (5) = X (6) = −1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 4 / 38

Page 7: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of r.v.

A coin is tossed. If X is the r.v. whose value is the number of headsobtained, then

X (H) = 1,X (T ) = 0.

And then tossed again. We define sample space Ω =HH,HT ,TH,TT. If Y is the r.v. whose value is the numberof heads obtained, then

X (HH) = 2,X (HT ) = X (TH) = 1,X (TT ) = 0.

When a player rolls a die, he will win $1 if the outcome is 1,2 or 3,otherwise lose 1$. Let Ω = 1, 2, 3, 4, 5, 6 and define X as follows:

X (1) = X (2) = X (3) = 1,X (4) = X (5) = X (6) = −1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 4 / 38

Page 8: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Random variables VS. events

Suppose now that a sample space Ω = ω1, ω2, · · · , ωn is given,and r.v. X on Ω is defined the number of heads obtained when wetoss a coin twice.

Event E1 represents only one head obtained. Hence,

E1 = ω : X (ω) = 1;

Event E2 represents even heads obtained. Hence,

E = ω : X (ω) mod 2 = 0;

Event E2 represents at least one heads obtained. Hence,

E = ω : X (ω) > 0.

These indicate that we can also define probability about r.v.s.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 5 / 38

Page 9: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Random variables VS. events

Suppose now that a sample space Ω = ω1, ω2, · · · , ωn is given,and r.v. X on Ω is defined the number of heads obtained when wetoss a coin twice.

Event E1 represents only one head obtained. Hence,

E1 = ω : X (ω) = 1;

Event E2 represents even heads obtained. Hence,

E = ω : X (ω) mod 2 = 0;

Event E2 represents at least one heads obtained. Hence,

E = ω : X (ω) > 0.

These indicate that we can also define probability about r.v.s.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 5 / 38

Page 10: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Distribution

Definition: The distribution of a r.v. X on a sample space Ω is theset of pairs (r , p(X = r)) for all r ∈ X (Ω), where P(X = r) is theprobability that r.v. X takes value r . That is, the set of pairs in thisdistribution is determined by probabilities P(X = r) for r ∈ X (Ω).

Remarks

Distribution is also a function;

If we define event E which X has vaule x in Ω, then,

P(E ) = P(ω : X (ω) = x) = P(X = x) = f (x);

f (x) is a probability distribution (function) if

f (x) ≥ 0;∑x f (x) = 1;

P(X ≤ c) = P(ω ∈ Ω : X (ω) ≤ c).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 6 / 38

Page 11: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Distribution

Definition: The distribution of a r.v. X on a sample space Ω is theset of pairs (r , p(X = r)) for all r ∈ X (Ω), where P(X = r) is theprobability that r.v. X takes value r . That is, the set of pairs in thisdistribution is determined by probabilities P(X = r) for r ∈ X (Ω).

Remarks

Distribution is also a function;

If we define event E which X has vaule x in Ω, then,

P(E ) = P(ω : X (ω) = x) = P(X = x) = f (x);

f (x) is a probability distribution (function) if

f (x) ≥ 0;∑x f (x) = 1;

P(X ≤ c) = P(ω ∈ Ω : X (ω) ≤ c).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 6 / 38

Page 12: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136

3 118

4 112 5 1

96 5

36 7 16

8 536 9 1

910 1

12 11 118

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 13: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

18

4 112 5 1

96 5

36 7 16

8 536 9 1

910 1

12 11 118

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 14: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12

5 19

6 536 7 1

68 5

36 9 19

10 112 11 1

1812 1

36

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 15: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36 9 19

10 112 11 1

1812 1

36

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 16: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536

7 16

8 536 9 1

910 1

12 11 118

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 17: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

6

8 536 9 1

910 1

12 11 118

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 18: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36

9 19

10 112 11 1

1812 1

36

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 19: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36 9 19

10 112 11 1

1812 1

36

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 20: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36 9 19

10 112

11 118

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 21: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36 9 19

10 112 11 1

18

12 136

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 22: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: Let X be the sum of the numbers that appear when apair of dice is rolled. What are the values and probabilities of thisrandom variable for 36 possible outcomes (i , j), when these two dicesare rolled?

Solution:value prob. value prob.

2 136 3 1

184 1

12 5 19

6 536 7 1

68 5

36 9 19

10 112 11 1

1812 1

36

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 7 / 38

Page 23: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Joint and marginal probability distributions

Definition

Let X and Y be two r.v.s,

f (x , y) = P(X = x ∧ Y = y) is the joint probabilitydistribution;

fX (x) is the marginal probability distribution for r.v. X .

Note that

fX (x) = P(X = x) = P(X = x ∧ Ω)

= P(X = x ∧ (Y = y1 ∨ Y = y2 ∨ · · · ))

= P((X = x ∧ Y = y1) ∨ (X = x ∧ Y = y2) ∨ · · · )= P(X = x ∧ Y = y1) + P(X = x ∧ Y = y2) + · · ·

=∑y

P(X = x ∧ Y = y) =∑yi

f (x , y)

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 8 / 38

Page 24: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Joint and marginal probability distributions

Definition

Let X and Y be two r.v.s,

f (x , y) = P(X = x ∧ Y = y) is the joint probabilitydistribution;

fX (x) is the marginal probability distribution for r.v. X .

Note that

fX (x) = P(X = x) = P(X = x ∧ Ω)

= P(X = x ∧ (Y = y1 ∨ Y = y2 ∨ · · · ))

= P((X = x ∧ Y = y1) ∨ (X = x ∧ Y = y2) ∨ · · · )= P(X = x ∧ Y = y1) + P(X = x ∧ Y = y2) + · · ·

=∑y

P(X = x ∧ Y = y) =∑yi

f (x , y)

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 8 / 38

Page 25: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Joint and marginal probability distributions

Definition

Let X and Y be two r.v.s,

f (x , y) = P(X = x ∧ Y = y) is the joint probabilitydistribution;

fX (x) is the marginal probability distribution for r.v. X .

Note that

fX (x) = P(X = x) = P(X = x ∧ Ω)

= P(X = x ∧ (Y = y1 ∨ Y = y2 ∨ · · · ))

= P((X = x ∧ Y = y1) ∨ (X = x ∧ Y = y2) ∨ · · · )= P(X = x ∧ Y = y1) + P(X = x ∧ Y = y2) + · · ·

=∑y

P(X = x ∧ Y = y) =∑yi

f (x , y)

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 8 / 38

Page 26: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Independence of r.v.

Definition

Let r.v.s X and Y are pair-wise independent if and only if for∀x , y ∈ R, we have

P(X = x ∧ Y = y) = P(X = x)P(Y = y);

Let r.v.s X1,X2, · · · ,Xn are mutually independent if and onlyif for ∀xij ∈ R

P(Xi1 = xi1 ∧ Xi2 = xi2 ∧ · · · ∧ Xim = xim)

= P(Xi1 = xi1)P(Xi2 = xi2) · · ·P(Xim = xim),

where ij , j = 1, 2, · · · ,m, are integers with1 ≤ i1 < i2 < · · · < im ≤ n and m ≥ 2.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 9 / 38

Page 27: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Independence of r.v.

Definition

Let r.v.s X and Y are pair-wise independent if and only if for∀x , y ∈ R, we have

P(X = x ∧ Y = y) = P(X = x)P(Y = y);

Let r.v.s X1,X2, · · · ,Xn are mutually independent if and onlyif for ∀xij ∈ R

P(Xi1 = xi1 ∧ Xi2 = xi2 ∧ · · · ∧ Xim = xim)

= P(Xi1 = xi1)P(Xi2 = xi2) · · ·P(Xim = xim),

where ij , j = 1, 2, · · · ,m, are integers with1 ≤ i1 < i2 < · · · < im ≤ n and m ≥ 2.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 9 / 38

Page 28: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Independence of r.v. Cont’d

Corollary

Let r.v.s X and Y are independent if and only if for ∀x , y ∈ R, s.t.P(Y = y) 6= 0, we have

P(X = x |Y = y) =P(X = x ∧ Y = y)

P(Y = y)

=P(X = x)P(Y = y)

P(Y = y)= P(X = x)

Definition

Let X and Y be two r.v.s,

f (x , y) = P(X = x ∧ Y = y) is the joint probability function;

f1(x) is the marginal probability function for r.v. X .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 10 / 38

Page 29: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Independence of r.v. Cont’d

Corollary

Let r.v.s X and Y are independent if and only if for ∀x , y ∈ R, s.t.P(Y = y) 6= 0, we have

P(X = x |Y = y) =P(X = x ∧ Y = y)

P(Y = y)

=P(X = x)P(Y = y)

P(Y = y)= P(X = x)

Definition

Let X and Y be two r.v.s,

f (x , y) = P(X = x ∧ Y = y) is the joint probability function;

f1(x) is the marginal probability function for r.v. X .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 10 / 38

Page 30: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Random Variable

Examples of distribution

Question: A biased coin (Pr(H) = 2/3) is flipped twice. Let Xcount the number of heads. What are the values and probabilities ofthis random variable?Solution:Let Xi count the number of heads in the i−th flip.

Pr(X = 0) = Pr(X1 = 0 ∧ X2 = 0) = Pr(X1 = 0)P(X2 = 0)

= (1/3)2 = 1/9

Pr(X = 1) = Pr((X1 = 0 ∧ X2 = 1) ∨ (X1 = 1 ∧ X2 = 0))

= Pr(X1 = 1)P(X2 = 0) + Pr(X1 = 0)P(X2 = 1)

= 2 · 1/3 · 2/3 = 4/9

Pr(X = 2) = Pr(X1 = 1 ∧ X2 = 1) = Pr(X1 = 1)P(X2 = 1)

= (2/3)2 = 4/9

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 11 / 38

Page 31: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Bernoulli Trials

Definition

Each performance of an experiment with two possible outcomes iscalled a Bernoulli trial.

In general, a possible outcome of a Bernoulli trial is called asuccess or a failure.

If p is the probability of a success and q is the probability of afailure, it follows that p + q = 1.

Many problems can be solved by determining the probability ofk successes when an experiment consists of n mutuallyindependent Bernoulli trials.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 12 / 38

Page 32: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Bernoulli Trials

Definition

Each performance of an experiment with two possible outcomes iscalled a Bernoulli trial.

In general, a possible outcome of a Bernoulli trial is called asuccess or a failure.

If p is the probability of a success and q is the probability of afailure, it follows that p + q = 1.

Many problems can be solved by determining the probability ofk successes when an experiment consists of n mutuallyindependent Bernoulli trials.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 12 / 38

Page 33: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Mutually independent Bernoulli trials

Flipping coin

Question: A coin is biased so that the probability of heads is 2/3.What is the probability that exactly four heads come up when thecoin is flipped seven times, assuming that the flips are independent?

Solution:Let r.v. Xi be the i−th flip of the coin (i = 1, 2, · · · , 7), where Xi

denote whether obtain the head or not. Hence, we have

Xi =

1, if we obtain head;0, otherwise.

Let r.v. X be # heads when the coin is flipped seven times. Wehave

X =7∑

i=1

Xi .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 13 / 38

Page 34: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Mutually independent Bernoulli trials

Flipping coin

Question: A coin is biased so that the probability of heads is 2/3.What is the probability that exactly four heads come up when thecoin is flipped seven times, assuming that the flips are independent?Solution:

Let r.v. Xi be the i−th flip of the coin (i = 1, 2, · · · , 7), where Xi

denote whether obtain the head or not. Hence, we have

Xi =

1, if we obtain head;0, otherwise.

Let r.v. X be # heads when the coin is flipped seven times. Wehave

X =7∑

i=1

Xi .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 13 / 38

Page 35: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Mutually independent Bernoulli trials

Flipping coin

Question: A coin is biased so that the probability of heads is 2/3.What is the probability that exactly four heads come up when thecoin is flipped seven times, assuming that the flips are independent?Solution:Let r.v. Xi be the i−th flip of the coin (i = 1, 2, · · · , 7), where Xi

denote whether obtain the head or not. Hence, we have

Xi =

1, if we obtain head;0, otherwise.

Let r.v. X be # heads when the coin is flipped seven times. Wehave

X =7∑

i=1

Xi .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 13 / 38

Page 36: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Mutually independent Bernoulli trials

Flipping coin

Question: A coin is biased so that the probability of heads is 2/3.What is the probability that exactly four heads come up when thecoin is flipped seven times, assuming that the flips are independent?Solution:Let r.v. Xi be the i−th flip of the coin (i = 1, 2, · · · , 7), where Xi

denote whether obtain the head or not. Hence, we have

Xi =

1, if we obtain head;0, otherwise.

Let r.v. X be # heads when the coin is flipped seven times. Wehave

X =7∑

i=1

Xi .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 13 / 38

Page 37: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

X = 4 means that there are only four 1s in seven r.v.s Xi .

The number of ways four of the seven flips can be heads is C (7, 4).Note that X1 = X2 = X3 = X4 = 1 and X5 = X6 = X7 = 0 is one ofways. Hence, we have

P(X1 = 1∧X2 = 1 ∧ X3 = 1 ∧ X4 = 1∧X5 = 0 ∧ X6 = 0 ∧ X7 = 0)

= (2/3)4(1/3)3

Therefore,P(X = 4) = C (7, 4)(2/3)4(1/3)3.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 14 / 38

Page 38: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

X = 4 means that there are only four 1s in seven r.v.s Xi .The number of ways four of the seven flips can be heads is C (7, 4).

Note that X1 = X2 = X3 = X4 = 1 and X5 = X6 = X7 = 0 is one ofways. Hence, we have

P(X1 = 1∧X2 = 1 ∧ X3 = 1 ∧ X4 = 1∧X5 = 0 ∧ X6 = 0 ∧ X7 = 0)

= (2/3)4(1/3)3

Therefore,P(X = 4) = C (7, 4)(2/3)4(1/3)3.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 14 / 38

Page 39: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

X = 4 means that there are only four 1s in seven r.v.s Xi .The number of ways four of the seven flips can be heads is C (7, 4).Note that X1 = X2 = X3 = X4 = 1 and X5 = X6 = X7 = 0 is one ofways. Hence, we have

P(X1 = 1∧X2 = 1 ∧ X3 = 1 ∧ X4 = 1∧X5 = 0 ∧ X6 = 0 ∧ X7 = 0)

= (2/3)4(1/3)3

Therefore,P(X = 4) = C (7, 4)(2/3)4(1/3)3.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 14 / 38

Page 40: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

X = 4 means that there are only four 1s in seven r.v.s Xi .The number of ways four of the seven flips can be heads is C (7, 4).Note that X1 = X2 = X3 = X4 = 1 and X5 = X6 = X7 = 0 is one ofways. Hence, we have

P(X1 = 1∧X2 = 1 ∧ X3 = 1 ∧ X4 = 1∧X5 = 0 ∧ X6 = 0 ∧ X7 = 0)

= (2/3)4(1/3)3

Therefore,P(X = 4) = C (7, 4)(2/3)4(1/3)3.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 14 / 38

Page 41: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Binomial distribution

Theorem

The probability of exactly k successes in n independent Bernoullitrials, with probability of success p and probability of failure q = 1−p,is

P(X = k) = C (n, k)pkqn−k .

Binomial distribution

Let B(k ; n, p) denote the probability of k successes in n independentBernoulli trials with probability of success p and probability of failureq = 1 − p. We call this function the binomial distribution, i.e.,B(k ; n, p) = P(X = k) = C (n, k)pkqn−k .Note that we will say X ∼ Bin(n, p), and

n∑k=0

C (n, k)pkqn−k = (p + q)n = 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 15 / 38

Page 42: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Binomial distribution

Theorem

The probability of exactly k successes in n independent Bernoullitrials, with probability of success p and probability of failure q = 1−p,is

P(X = k) = C (n, k)pkqn−k .

Binomial distribution

Let B(k ; n, p) denote the probability of k successes in n independentBernoulli trials with probability of success p and probability of failureq = 1 − p. We call this function the binomial distribution, i.e.,B(k ; n, p) = P(X = k) = C (n, k)pkqn−k .

Note that we will say X ∼ Bin(n, p), and

n∑k=0

C (n, k)pkqn−k = (p + q)n = 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 15 / 38

Page 43: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Binomial distribution

Theorem

The probability of exactly k successes in n independent Bernoullitrials, with probability of success p and probability of failure q = 1−p,is

P(X = k) = C (n, k)pkqn−k .

Binomial distribution

Let B(k ; n, p) denote the probability of k successes in n independentBernoulli trials with probability of success p and probability of failureq = 1 − p. We call this function the binomial distribution, i.e.,B(k ; n, p) = P(X = k) = C (n, k)pkqn−k .Note that we will say X ∼ Bin(n, p), and

n∑k=0

C (n, k)pkqn−k = (p + q)n = 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 15 / 38

Page 44: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Binomial distribution Cont’d

This distribution is useful for modeling many real-worldproblems, such as # 3s when we roll a die n times,

The Bernoulli distribution is a special case of the binomialdistribution, where n = 1.

Any binomial distribution, Bin(n, p), is the distribution of thesum of n Bernoulli trials, Bin(p), each with probability p.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 16 / 38

Page 45: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

Let r.v. Y be the number of coin flips until the first head obtained.

P(Y = k) = P(X1 = 0 ∧ X2 = 0 ∧ · · ·Xk−1 = 0 ∧ Xk = 1)

= Πk−1i=1 P(Xi = 0) · P(Xk = 1)

= pqk−1

Let G (k ; p) denote the probability of failures before the k−th inde-pendent Bernoulli trials with probability of success p and probabilityof failure q = 1− p.We call this function the Geometric distribution, i.e.,

G (k; p) = pqk−1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 17 / 38

Page 46: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

Let r.v. Y be the number of coin flips until the first head obtained.

P(Y = k) = P(X1 = 0 ∧ X2 = 0 ∧ · · ·Xk−1 = 0 ∧ Xk = 1)

= Πk−1i=1 P(Xi = 0) · P(Xk = 1)

= pqk−1

Let G (k ; p) denote the probability of failures before the k−th inde-pendent Bernoulli trials with probability of success p and probabilityof failure q = 1− p.We call this function the Geometric distribution, i.e.,

G (k; p) = pqk−1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 17 / 38

Page 47: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

Let r.v. Y be the number of coin flips until the first head obtained.

P(Y = k) = P(X1 = 0 ∧ X2 = 0 ∧ · · ·Xk−1 = 0 ∧ Xk = 1)

= Πk−1i=1 P(Xi = 0) · P(Xk = 1)

= pqk−1

Let G (k ; p) denote the probability of failures before the k−th inde-pendent Bernoulli trials with probability of success p and probabilityof failure q = 1− p.

We call this function the Geometric distribution, i.e.,

G (k; p) = pqk−1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 17 / 38

Page 48: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Flipping coin Cont’d

Let r.v. Y be the number of coin flips until the first head obtained.

P(Y = k) = P(X1 = 0 ∧ X2 = 0 ∧ · · ·Xk−1 = 0 ∧ Xk = 1)

= Πk−1i=1 P(Xi = 0) · P(Xk = 1)

= pqk−1

Let G (k ; p) denote the probability of failures before the k−th inde-pendent Bernoulli trials with probability of success p and probabilityof failure q = 1− p.We call this function the Geometric distribution, i.e.,

G (k; p) = pqk−1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 17 / 38

Page 49: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing

Question: Hashing functions map a large universe of keys (such asthe approximately 300 million Social Security numbers in the UnitedStates) to a much smaller set of storage locations. A good hashingfunction yields few collisions, which are mappings of two differentkeys to the same memory location. What is the probability that notwo keys are mapped to the same location by a hashing function, or,in other words, that there are no collisions?Solution:To calculate this probability, we assume that the probability that arandomly and uniformly selected key is mapped to a location is 1/m,where m is # available locations.

Suppose that the keys are k1, k2, · · · , kn. When we add a new recordki , the probability that it is mapped to a location different from thelocations of already hashed records, that h(ki ) 6= h(kj) for 1 ≤ j < iis (m − i + 1)/m.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 18 / 38

Page 50: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing

Question: Hashing functions map a large universe of keys (such asthe approximately 300 million Social Security numbers in the UnitedStates) to a much smaller set of storage locations. A good hashingfunction yields few collisions, which are mappings of two differentkeys to the same memory location. What is the probability that notwo keys are mapped to the same location by a hashing function, or,in other words, that there are no collisions?Solution:To calculate this probability, we assume that the probability that arandomly and uniformly selected key is mapped to a location is 1/m,where m is # available locations.Suppose that the keys are k1, k2, · · · , kn. When we add a new recordki , the probability that it is mapped to a location different from thelocations of already hashed records, that h(ki ) 6= h(kj) for 1 ≤ j < iis (m − i + 1)/m.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 18 / 38

Page 51: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Because the keys are independent, the probability that all n keys aremapped to different locations is

H(n,m) =m − 1

m

m − 2

m· · · m − n + 1

m.

Recall the bounds for the same birthday problem that

en(n−1)

2m ≤ mk

m(m − 1)(m − 2) · · · (m − n + 1)=

1

H(n,m)≤ e

n(n−1)2(m−n+1) ,

That is

1− e−n(n−1)

2m ≤ 1− H(n,m) ≤ 1− e− n(n−1)

2(m−n+1) .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 19 / 38

Page 52: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Because the keys are independent, the probability that all n keys aremapped to different locations is

H(n,m) =m − 1

m

m − 2

m· · · m − n + 1

m.

Recall the bounds for the same birthday problem that

en(n−1)

2m ≤ mk

m(m − 1)(m − 2) · · · (m − n + 1)=

1

H(n,m)≤ e

n(n−1)2(m−n+1) ,

That is

1− e−n(n−1)

2m ≤ 1− H(n,m) ≤ 1− e− n(n−1)

2(m−n+1) .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 19 / 38

Page 53: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Because the keys are independent, the probability that all n keys aremapped to different locations is

H(n,m) =m − 1

m

m − 2

m· · · m − n + 1

m.

Recall the bounds for the same birthday problem that

en(n−1)

2m ≤ mk

m(m − 1)(m − 2) · · · (m − n + 1)=

1

H(n,m)≤ e

n(n−1)2(m−n+1) ,

That is

1− e−n(n−1)

2m ≤ 1− H(n,m) ≤ 1− e− n(n−1)

2(m−n+1) .

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 19 / 38

Page 54: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Techniques from calculus can be used to find the smallest value of ngiven a value of m such that the probability of a collision is greaterthan a particular threshold, for example 0.5.

0.5 ≤ 1− e−n(n−1)

2m ≤ 1− H(n,m).

Hence, we have

n(n − 1) > 2 ln 2 ·m, i.e., n >√

2 ln 2 ·m( approximately).

For example, when m = 1, 000, 000, the smallest integer n such thatthe probability of a collision is greater than 1/2 is 1178.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 20 / 38

Page 55: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Techniques from calculus can be used to find the smallest value of ngiven a value of m such that the probability of a collision is greaterthan a particular threshold, for example 0.5.

0.5 ≤ 1− e−n(n−1)

2m ≤ 1− H(n,m).

Hence, we have

n(n − 1) > 2 ln 2 ·m, i.e., n >√

2 ln 2 ·m( approximately).

For example, when m = 1, 000, 000, the smallest integer n such thatthe probability of a collision is greater than 1/2 is 1178.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 20 / 38

Page 56: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Collision in hashing Cont’d

Techniques from calculus can be used to find the smallest value of ngiven a value of m such that the probability of a collision is greaterthan a particular threshold, for example 0.5.

0.5 ≤ 1− e−n(n−1)

2m ≤ 1− H(n,m).

Hence, we have

n(n − 1) > 2 ln 2 ·m, i.e., n >√

2 ln 2 ·m( approximately).

For example, when m = 1, 000, 000, the smallest integer n such thatthe probability of a collision is greater than 1/2 is 1178.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 20 / 38

Page 57: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithms

A Monte Carlo algorithm is a randomized or probabilistic algorith-m whose output may be inaccuracy with a certain (typically small)probability.Probabilistic algorithms make random choices at one or more steps,and result in different output even given the same input, which isdifferent from all deterministic algorithms.

Monte Carlo algorithm for a decision problem: The probabilitythat the algorithm answers the decision problem correctly increasesas more tests are carried out.Step i:

Algorithm responses

true, the answer is “true”;unknown, either “true” or “false.”

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 21 / 38

Page 58: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithms

A Monte Carlo algorithm is a randomized or probabilistic algorith-m whose output may be inaccuracy with a certain (typically small)probability.Probabilistic algorithms make random choices at one or more steps,and result in different output even given the same input, which isdifferent from all deterministic algorithms.

Monte Carlo algorithm for a decision problem: The probabilitythat the algorithm answers the decision problem correctly increasesas more tests are carried out.

Step i:

Algorithm responses

true, the answer is “true”;unknown, either “true” or “false.”

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 21 / 38

Page 59: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithms

A Monte Carlo algorithm is a randomized or probabilistic algorith-m whose output may be inaccuracy with a certain (typically small)probability.Probabilistic algorithms make random choices at one or more steps,and result in different output even given the same input, which isdifferent from all deterministic algorithms.

Monte Carlo algorithm for a decision problem: The probabilitythat the algorithm answers the decision problem correctly increasesas more tests are carried out.Step i:

Algorithm responses

true, the answer is “true”;unknown, either “true” or “false.”

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 21 / 38

Page 60: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithm Cont’d

After running all the iterations:Output:

Algorithm returns

true, yield at least one “true”;false, yield “unknown” in every iteration.

We will show that possibility of making mistake becomes extremelyunlikely as # tests increases.Suppose that p is the probability that the response of a test is “true”given that the answer is “true”.Because the algorithm answers “false” when all n iterations yield theanswer “unknown” and the iterations perform independent tests, theprobability of error is (1− p)n.When p 6= 0, this probability approaches 0 as the number of testsincreases. Consequently, the probability that the algorithm answers“true” when the answer is “true” approaches 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 22 / 38

Page 61: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithm Cont’d

After running all the iterations:Output:

Algorithm returns

true, yield at least one “true”;false, yield “unknown” in every iteration.

We will show that possibility of making mistake becomes extremelyunlikely as # tests increases.

Suppose that p is the probability that the response of a test is “true”given that the answer is “true”.Because the algorithm answers “false” when all n iterations yield theanswer “unknown” and the iterations perform independent tests, theprobability of error is (1− p)n.When p 6= 0, this probability approaches 0 as the number of testsincreases. Consequently, the probability that the algorithm answers“true” when the answer is “true” approaches 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 22 / 38

Page 62: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithm Cont’d

After running all the iterations:Output:

Algorithm returns

true, yield at least one “true”;false, yield “unknown” in every iteration.

We will show that possibility of making mistake becomes extremelyunlikely as # tests increases.Suppose that p is the probability that the response of a test is “true”given that the answer is “true”.

Because the algorithm answers “false” when all n iterations yield theanswer “unknown” and the iterations perform independent tests, theprobability of error is (1− p)n.When p 6= 0, this probability approaches 0 as the number of testsincreases. Consequently, the probability that the algorithm answers“true” when the answer is “true” approaches 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 22 / 38

Page 63: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithm Cont’d

After running all the iterations:Output:

Algorithm returns

true, yield at least one “true”;false, yield “unknown” in every iteration.

We will show that possibility of making mistake becomes extremelyunlikely as # tests increases.Suppose that p is the probability that the response of a test is “true”given that the answer is “true”.Because the algorithm answers “false” when all n iterations yield theanswer “unknown” and the iterations perform independent tests, theprobability of error is (1− p)n.

When p 6= 0, this probability approaches 0 as the number of testsincreases. Consequently, the probability that the algorithm answers“true” when the answer is “true” approaches 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 22 / 38

Page 64: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo algorithm Cont’d

After running all the iterations:Output:

Algorithm returns

true, yield at least one “true”;false, yield “unknown” in every iteration.

We will show that possibility of making mistake becomes extremelyunlikely as # tests increases.Suppose that p is the probability that the response of a test is “true”given that the answer is “true”.Because the algorithm answers “false” when all n iterations yield theanswer “unknown” and the iterations perform independent tests, theprobability of error is (1− p)n.When p 6= 0, this probability approaches 0 as the number of testsincreases. Consequently, the probability that the algorithm answers“true” when the answer is “true” approaches 1.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 22 / 38

Page 65: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo II

Algorithm:

Step i: It randomly and uniformlygenerates a point Pi inside the samplespace Ω = (x , y)|0 ≤ x , y ≤ 1.Let setS = (x , y) : x2 + y2 ≤ 1 ∧ x , y ≥ 0 bethe circle region. And ∀Pi ∈ S , wedefine IS(Pi ) and IΩ−S(Pi );π4 ≈

∑ni=1 IS (Pi )∑n

i=1 IS (Pi )+∑n

i=1 IΩ−S (Pi ).

Question: How accurate of the probabilistic algorithm?We cannot answer the question in this moment, once we learn ex-pectation of r.v.s (coming soon).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 23 / 38

Page 66: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo II

Algorithm:Step i: It randomly and uniformlygenerates a point Pi inside the samplespace Ω = (x , y)|0 ≤ x , y ≤ 1.

Let setS = (x , y) : x2 + y2 ≤ 1 ∧ x , y ≥ 0 bethe circle region. And ∀Pi ∈ S , wedefine IS(Pi ) and IΩ−S(Pi );π4 ≈

∑ni=1 IS (Pi )∑n

i=1 IS (Pi )+∑n

i=1 IΩ−S (Pi ).

Question: How accurate of the probabilistic algorithm?We cannot answer the question in this moment, once we learn ex-pectation of r.v.s (coming soon).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 23 / 38

Page 67: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo II

Algorithm:Step i: It randomly and uniformlygenerates a point Pi inside the samplespace Ω = (x , y)|0 ≤ x , y ≤ 1.Let setS = (x , y) : x2 + y2 ≤ 1 ∧ x , y ≥ 0 bethe circle region. And ∀Pi ∈ S , wedefine IS(Pi ) and IΩ−S(Pi );

π4 ≈

∑ni=1 IS (Pi )∑n

i=1 IS (Pi )+∑n

i=1 IΩ−S (Pi ).

Question: How accurate of the probabilistic algorithm?We cannot answer the question in this moment, once we learn ex-pectation of r.v.s (coming soon).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 23 / 38

Page 68: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo II

Algorithm:Step i: It randomly and uniformlygenerates a point Pi inside the samplespace Ω = (x , y)|0 ≤ x , y ≤ 1.Let setS = (x , y) : x2 + y2 ≤ 1 ∧ x , y ≥ 0 bethe circle region. And ∀Pi ∈ S , wedefine IS(Pi ) and IΩ−S(Pi );π4 ≈

∑ni=1 IS (Pi )∑n

i=1 IS (Pi )+∑n

i=1 IΩ−S (Pi ).

Question: How accurate of the probabilistic algorithm?We cannot answer the question in this moment, once we learn ex-pectation of r.v.s (coming soon).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 23 / 38

Page 69: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Monte Carlo II

Algorithm:Step i: It randomly and uniformlygenerates a point Pi inside the samplespace Ω = (x , y)|0 ≤ x , y ≤ 1.Let setS = (x , y) : x2 + y2 ≤ 1 ∧ x , y ≥ 0 bethe circle region. And ∀Pi ∈ S , wedefine IS(Pi ) and IΩ−S(Pi );π4 ≈

∑ni=1 IS (Pi )∑n

i=1 IS (Pi )+∑n

i=1 IΩ−S (Pi ).

Question: How accurate of the probabilistic algorithm?We cannot answer the question in this moment, once we learn ex-pectation of r.v.s (coming soon).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 23 / 38

Page 70: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bernoulli Trials and the Binomial Distribution

Sample with discrete distribution

How to sample from discrete distribution 0.1, 0.2, 0.3, 0.4?

CDF sample:

O(log n) for CDF sample,and O(1) for aliasingsample.

Aliasing sample:

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 24 / 38

Page 71: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example

Question: We have two boxes. The first contains two green ballsand seven red balls; the second contains four green balls and threered balls. Bob selects a ball by first choosing one of the two boxesat random. He then selects one of the balls in this box at random. IfBob has selected a red ball, what is the probability that he selecteda red ball from the first box?

Solution: Let E be the event that Bob has chosen a red ball. LetF and F be the event that Bob has chosen a ball from the first boxand the second box, respectively.We want to find P(F |E ), the probability that the ball Bob selectedcame from the first box, given that it is red.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 25 / 38

Page 72: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example

Question: We have two boxes. The first contains two green ballsand seven red balls; the second contains four green balls and threered balls. Bob selects a ball by first choosing one of the two boxesat random. He then selects one of the balls in this box at random. IfBob has selected a red ball, what is the probability that he selecteda red ball from the first box?

Solution: Let E be the event that Bob has chosen a red ball. LetF and F be the event that Bob has chosen a ball from the first boxand the second box, respectively.

We want to find P(F |E ), the probability that the ball Bob selectedcame from the first box, given that it is red.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 25 / 38

Page 73: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example

Question: We have two boxes. The first contains two green ballsand seven red balls; the second contains four green balls and threered balls. Bob selects a ball by first choosing one of the two boxesat random. He then selects one of the balls in this box at random. IfBob has selected a red ball, what is the probability that he selecteda red ball from the first box?

Solution: Let E be the event that Bob has chosen a red ball. LetF and F be the event that Bob has chosen a ball from the first boxand the second box, respectively.We want to find P(F |E ), the probability that the ball Bob selectedcame from the first box, given that it is red.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 25 / 38

Page 74: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

In terms of the definition of conditional probability, we have

P(F |E ) =P(F ∩ E )

P(E ).

Our target is to compute P(F ∩E ) and P(E ).

Suppose that P(F ) =P(F ) = 1

2 . We have known that

P(E |F ) =7

9,P(E |F ) =

3

7.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 26 / 38

Page 75: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

In terms of the definition of conditional probability, we have

P(F |E ) =P(F ∩ E )

P(E ).

Our target is to compute P(F ∩E ) and P(E ). Suppose that P(F ) =P(F ) = 1

2 . We have known that

P(E |F ) =7

9,P(E |F ) =

3

7.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 26 / 38

Page 76: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

In terms of the definition of conditional probability, we have

P(F |E ) =P(F ∩ E )

P(E ).

Our target is to compute P(F ∩E ) and P(E ). Suppose that P(F ) =P(F ) = 1

2 . We have known that

P(E |F ) =7

9,P(E |F ) =

3

7.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 26 / 38

Page 77: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

Then,

P(E ∩ F ) = P(E |F )P(F ) = (7

9)(

1

2) =

7

18,

P(E ∩ F ) = P(E |F )P(F ) = (3

7)(

1

2) =

3

14.

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.

P(E ) = P(E ∩ F ) + P(E ∩ F ) =7

18+

3

14=

38

63.

We conclude that

P(F |E ) =P(F ∩ E )

P(E )=

7/18

38/63=

49

76.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 27 / 38

Page 78: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

Then,

P(E ∩ F ) = P(E |F )P(F ) = (7

9)(

1

2) =

7

18,

P(E ∩ F ) = P(E |F )P(F ) = (3

7)(

1

2) =

3

14.

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.

P(E ) = P(E ∩ F ) + P(E ∩ F ) =7

18+

3

14=

38

63.

We conclude that

P(F |E ) =P(F ∩ E )

P(E )=

7/18

38/63=

49

76.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 27 / 38

Page 79: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

Then,

P(E ∩ F ) = P(E |F )P(F ) = (7

9)(

1

2) =

7

18,

P(E ∩ F ) = P(E |F )P(F ) = (3

7)(

1

2) =

3

14.

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.

P(E ) = P(E ∩ F ) + P(E ∩ F ) =7

18+

3

14=

38

63.

We conclude that

P(F |E ) =P(F ∩ E )

P(E )=

7/18

38/63=

49

76.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 27 / 38

Page 80: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Running example Cont’d

Then,

P(E ∩ F ) = P(E |F )P(F ) = (7

9)(

1

2) =

7

18,

P(E ∩ F ) = P(E |F )P(F ) = (3

7)(

1

2) =

3

14.

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.

P(E ) = P(E ∩ F ) + P(E ∩ F ) =7

18+

3

14=

38

63.

We conclude that

P(F |E ) =P(F ∩ E )

P(E )=

7/18

38/63=

49

76.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 27 / 38

Page 81: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem

Theorem

Suppose that E and F are events from a sample space Ω such thatP(E ) 6= 0 and P(F ) 6= 0. Then

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

Proof.

Since we have P(F |E ) = P(F∩E)P(E) , our target is therefore to compute

P(F ∩ E ) and P(E ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 28 / 38

Page 82: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem

Theorem

Suppose that E and F are events from a sample space Ω such thatP(E ) 6= 0 and P(F ) 6= 0. Then

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

Proof.

Since we have P(F |E ) = P(F∩E)P(E) , our target is therefore to compute

P(F ∩ E ) and P(E ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 28 / 38

Page 83: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem

Theorem

Suppose that E and F are events from a sample space Ω such thatP(E ) 6= 0 and P(F ) 6= 0. Then

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

Proof.

Since we have P(F |E ) = P(F∩E)P(E) , our target is therefore to compute

P(F ∩ E ) and P(E ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 28 / 38

Page 84: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem Cont’d

Proof

Then,

P(E ∩ F ) = P(E |F )P(F ),P(E ∩ F ) = P(E |F )P(F ).

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.We have

P(E ) = P(E ∩ F ) + P(E ∩ F ).

We can conclude that

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 29 / 38

Page 85: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem Cont’d

Proof

Then,

P(E ∩ F ) = P(E |F )P(F ),P(E ∩ F ) = P(E |F )P(F ).

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.

We haveP(E ) = P(E ∩ F ) + P(E ∩ F ).

We can conclude that

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 29 / 38

Page 86: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem Cont’d

Proof

Then,

P(E ∩ F ) = P(E |F )P(F ),P(E ∩ F ) = P(E |F )P(F ).

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.We have

P(E ) = P(E ∩ F ) + P(E ∩ F ).

We can conclude that

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 29 / 38

Page 87: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Bayes’ Theorem Cont’d

Proof

Then,

P(E ∩ F ) = P(E |F )P(F ),P(E ∩ F ) = P(E |F )P(F ).

Note that E = (E ∩ F ) ∪ (E ∩ F ) and (E ∩ F ) ∩ (E ∩ F ) = ∅.We have

P(E ) = P(E ∩ F ) + P(E ∩ F ).

We can conclude that

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 29 / 38

Page 88: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Generalized Bayes’ Theorem

Theorem

Suppose that E is an event from a sample space Ω and F1,F2, · · · ,Fnis a partition of the sample space. Let P(E ) 6= 0 and P(Fi ) 6= 0 for∀i . Then

P(Fi |E ) =P(E |Fi )P(Fi )∑n

k=1 P(E |Fk)P(Fk).

Proof:

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 30 / 38

Page 89: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Bayes’ Theorem

Generalized Bayes’ Theorem

Theorem

Suppose that E is an event from a sample space Ω and F1,F2, · · · ,Fnis a partition of the sample space. Let P(E ) 6= 0 and P(Fi ) 6= 0 for∀i . Then

P(Fi |E ) =P(E |Fi )P(Fi )∑n

k=1 P(E |Fk)P(Fk).

Proof:

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 30 / 38

Page 90: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease

Suppose that one of 100,000 persons has a particular rare disease forwhich there is a fairly accurate diagnostic test. This test is correct99.0% when given to a person selected at random who has the dis-ease; it is correct 99.5% when given to a person selected at randomwho does not have the disease. Given this information can we find

the probability that a person who tests positive for the diseasehas the disease?

the probability that a person who tests negative for the diseasedoes not have the disease?

Should a person who tests positive be very concerned that he or shehas the disease?

Solution:Let F be the event that a person selected at random has the disease,and let E be the event that a person selected at random tests positivefor the disease. Hence, we have p(F ) = 1/100, 000 = 10−5.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 31 / 38

Page 91: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease

Suppose that one of 100,000 persons has a particular rare disease forwhich there is a fairly accurate diagnostic test. This test is correct99.0% when given to a person selected at random who has the dis-ease; it is correct 99.5% when given to a person selected at randomwho does not have the disease. Given this information can we find

the probability that a person who tests positive for the diseasehas the disease?

the probability that a person who tests negative for the diseasedoes not have the disease?

Should a person who tests positive be very concerned that he or shehas the disease?Solution:Let F be the event that a person selected at random has the disease,and let E be the event that a person selected at random tests positivefor the disease. Hence, we have p(F ) = 1/100, 000 = 10−5.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 31 / 38

Page 92: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease Cont’d

Then we also have P(E |F ) = 0.99, P(E |F ) = 0.01, P(E |F ) = 0.995,and P(E |F ) = 0.005.

Case a: In terms of Bayes’ theorem, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.99 · 10−5

0.99 · 10−5 + 0.005 · 0.99999≈ 0.002

Case b: Similarly, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.995 · 0.99999

0.995 · 0.99999 + 0.01 · 10−5≈ 0.9999999

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 32 / 38

Page 93: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease Cont’d

Then we also have P(E |F ) = 0.99, P(E |F ) = 0.01, P(E |F ) = 0.995,and P(E |F ) = 0.005.Case a: In terms of Bayes’ theorem, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.99 · 10−5

0.99 · 10−5 + 0.005 · 0.99999≈ 0.002

Case b: Similarly, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.995 · 0.99999

0.995 · 0.99999 + 0.01 · 10−5≈ 0.9999999

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 32 / 38

Page 94: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease Cont’d

Then we also have P(E |F ) = 0.99, P(E |F ) = 0.01, P(E |F ) = 0.995,and P(E |F ) = 0.005.Case a: In terms of Bayes’ theorem, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.99 · 10−5

0.99 · 10−5 + 0.005 · 0.99999≈ 0.002

Case b: Similarly, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.995 · 0.99999

0.995 · 0.99999 + 0.01 · 10−5≈ 0.9999999

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 32 / 38

Page 95: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Diagnostic test for rare disease Cont’d

Then we also have P(E |F ) = 0.99, P(E |F ) = 0.01, P(E |F ) = 0.995,and P(E |F ) = 0.005.Case a: In terms of Bayes’ theorem, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.99 · 10−5

0.99 · 10−5 + 0.005 · 0.99999≈ 0.002

Case b: Similarly, we have

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F )

=0.995 · 0.99999

0.995 · 0.99999 + 0.01 · 10−5≈ 0.9999999

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 32 / 38

Page 96: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters

Most electronic mailboxes receive a flood of unwanted and unsolicitedmessages, known as spam.

On the Internet, an Internet Water Army is a group of Internetghostwriters paid to post online comments with particular content.

Question: How to detect spam email?Solution: Bayesian spam filters look for occurrences of particularwords in messages. For a particular word w , the probability that wappears in a spam e-mail message is estimated by determining #times w appears in a message from a large set of messages knownto be spam and # times it appears in a large set of messages knownnot to be spam.Step 1: Collect ground-truth Suppose we have a set B of messagesknown to be spam and a set G of messages known not to be spam.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 33 / 38

Page 97: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters

Most electronic mailboxes receive a flood of unwanted and unsolicitedmessages, known as spam.On the Internet, an Internet Water Army is a group of Internetghostwriters paid to post online comments with particular content.

Question: How to detect spam email?Solution: Bayesian spam filters look for occurrences of particularwords in messages. For a particular word w , the probability that wappears in a spam e-mail message is estimated by determining #times w appears in a message from a large set of messages knownto be spam and # times it appears in a large set of messages knownnot to be spam.Step 1: Collect ground-truth Suppose we have a set B of messagesknown to be spam and a set G of messages known not to be spam.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 33 / 38

Page 98: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters

Most electronic mailboxes receive a flood of unwanted and unsolicitedmessages, known as spam.On the Internet, an Internet Water Army is a group of Internetghostwriters paid to post online comments with particular content.

Question: How to detect spam email?

Solution: Bayesian spam filters look for occurrences of particularwords in messages. For a particular word w , the probability that wappears in a spam e-mail message is estimated by determining #times w appears in a message from a large set of messages knownto be spam and # times it appears in a large set of messages knownnot to be spam.Step 1: Collect ground-truth Suppose we have a set B of messagesknown to be spam and a set G of messages known not to be spam.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 33 / 38

Page 99: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters

Most electronic mailboxes receive a flood of unwanted and unsolicitedmessages, known as spam.On the Internet, an Internet Water Army is a group of Internetghostwriters paid to post online comments with particular content.

Question: How to detect spam email?Solution: Bayesian spam filters look for occurrences of particularwords in messages. For a particular word w , the probability that wappears in a spam e-mail message is estimated by determining #times w appears in a message from a large set of messages knownto be spam and # times it appears in a large set of messages knownnot to be spam.

Step 1: Collect ground-truth Suppose we have a set B of messagesknown to be spam and a set G of messages known not to be spam.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 33 / 38

Page 100: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters

Most electronic mailboxes receive a flood of unwanted and unsolicitedmessages, known as spam.On the Internet, an Internet Water Army is a group of Internetghostwriters paid to post online comments with particular content.

Question: How to detect spam email?Solution: Bayesian spam filters look for occurrences of particularwords in messages. For a particular word w , the probability that wappears in a spam e-mail message is estimated by determining #times w appears in a message from a large set of messages knownto be spam and # times it appears in a large set of messages knownnot to be spam.Step 1: Collect ground-truth Suppose we have a set B of messagesknown to be spam and a set G of messages known not to be spam.

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 33 / 38

Page 101: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

Step 2: Learn parameters We next identify the words that occurin B and in G . Let nB(w) and nG (w) be # messages containingword w in sets B and G , respectively.

Let p(w) = nB(w)/|B| and q(w) = nG (w)/|G | be the empiricalprobabilities that a message are not spam and spam contains wordw , respectively.Step 3: Make decision Now suppose we receive a new e-mail mes-sage containing word w . Let F be the event that the message isspam. Let E be the event that the message contains word w .By Bayes theorem, the probability that the message is spam, giventhat it contains word w , is

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 34 / 38

Page 102: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

Step 2: Learn parameters We next identify the words that occurin B and in G . Let nB(w) and nG (w) be # messages containingword w in sets B and G , respectively.Let p(w) = nB(w)/|B| and q(w) = nG (w)/|G | be the empiricalprobabilities that a message are not spam and spam contains wordw , respectively.Step 3: Make decision Now suppose we receive a new e-mail mes-sage containing word w . Let F be the event that the message isspam. Let E be the event that the message contains word w .

By Bayes theorem, the probability that the message is spam, giventhat it contains word w , is

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 34 / 38

Page 103: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

Step 2: Learn parameters We next identify the words that occurin B and in G . Let nB(w) and nG (w) be # messages containingword w in sets B and G , respectively.Let p(w) = nB(w)/|B| and q(w) = nG (w)/|G | be the empiricalprobabilities that a message are not spam and spam contains wordw , respectively.Step 3: Make decision Now suppose we receive a new e-mail mes-sage containing word w . Let F be the event that the message isspam. Let E be the event that the message contains word w .By Bayes theorem, the probability that the message is spam, giventhat it contains word w , is

P(F |E ) =P(E |F )P(F )

P(E |F )P(F ) + P(E |F )P(F ).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 34 / 38

Page 104: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

To apply the above formula, we first estimate P(F ), the probabilitythat an incoming message is spam, as well as P(F ), the probabilitythat the incoming message is not spam.

Without prior knowledge about the likelihood that an incoming mes-sage is spam, for simplicity we assume that the message is equallylikely to be spam as it is not to be spam, i.e., P(F ) = P(F ) = 1/2.Using this assumption, we find that the probability that a message isspam, given that it contains word w , is

P(F |E ) =P(E |F )

P(E |F ) + P(E |F ).

By estimating P(E |F ) and P(E |F ), P(F |E ) can be estimated by

r(w) =p(w)

p(w) + q(w).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 35 / 38

Page 105: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

To apply the above formula, we first estimate P(F ), the probabilitythat an incoming message is spam, as well as P(F ), the probabilitythat the incoming message is not spam.Without prior knowledge about the likelihood that an incoming mes-sage is spam, for simplicity we assume that the message is equallylikely to be spam as it is not to be spam, i.e., P(F ) = P(F ) = 1/2.

Using this assumption, we find that the probability that a message isspam, given that it contains word w , is

P(F |E ) =P(E |F )

P(E |F ) + P(E |F ).

By estimating P(E |F ) and P(E |F ), P(F |E ) can be estimated by

r(w) =p(w)

p(w) + q(w).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 35 / 38

Page 106: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

To apply the above formula, we first estimate P(F ), the probabilitythat an incoming message is spam, as well as P(F ), the probabilitythat the incoming message is not spam.Without prior knowledge about the likelihood that an incoming mes-sage is spam, for simplicity we assume that the message is equallylikely to be spam as it is not to be spam, i.e., P(F ) = P(F ) = 1/2.Using this assumption, we find that the probability that a message isspam, given that it contains word w , is

P(F |E ) =P(E |F )

P(E |F ) + P(E |F ).

By estimating P(E |F ) and P(E |F ), P(F |E ) can be estimated by

r(w) =p(w)

p(w) + q(w).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 35 / 38

Page 107: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Bayesian spam filters Cont’d

To apply the above formula, we first estimate P(F ), the probabilitythat an incoming message is spam, as well as P(F ), the probabilitythat the incoming message is not spam.Without prior knowledge about the likelihood that an incoming mes-sage is spam, for simplicity we assume that the message is equallylikely to be spam as it is not to be spam, i.e., P(F ) = P(F ) = 1/2.Using this assumption, we find that the probability that a message isspam, given that it contains word w , is

P(F |E ) =P(E |F )

P(E |F ) + P(E |F ).

By estimating P(E |F ) and P(E |F ), P(F |E ) can be estimated by

r(w) =p(w)

p(w) + q(w).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 35 / 38

Page 108: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Extended Bayesian spam filters

The more words we use to estimate the probability that an incomingmail message is spam, the better is our chance that we correctlydetermine whether it is spam.In general, if Ei is the event that the message contains word wi ,assuming that P(S) = P(S), and that events Ei |S are independent,then by Bayes theorem the probability that a message containing allwords w1,w2, · · · ,wk is spam is

P(S |k⋂

i=1

Ei ) =P(⋂k

i=1 Ei |S)P(S)

P(⋂k

i=1 Ei |S)P(S) + P(⋂k

i=1 Ei |S)P(S)

=Πki=1P(Ei |S)

Πki=1P(Ei |S) + Πk

i=1P(Ei |S)

≈Πki=1p(wi )

Πki=1p(wi ) + Πk

i=1q(wi )= r(w1,w2, · · · ,wk).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 36 / 38

Page 109: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Naive Bayes

Why is this called Naive Bayes?

The model employs the chain rulefor repeated applications of thedefinition of conditional probability.

To handle underflow, we calculateΠni=1P(Xi |S) =

exp(∑n

i=1 logP(Xi |S)).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 37 / 38

Page 110: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Naive Bayes

Why is this called Naive Bayes?

The model employs the chain rulefor repeated applications of thedefinition of conditional probability.

To handle underflow, we calculateΠni=1P(Xi |S) =

exp(∑n

i=1 logP(Xi |S)).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 37 / 38

Page 111: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Applications of Bayes’ Theorem

Naive Bayes

Why is this called Naive Bayes?

The model employs the chain rulefor repeated applications of thedefinition of conditional probability.

To handle underflow, we calculateΠni=1P(Xi |S) =

exp(∑n

i=1 logP(Xi |S)).

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 37 / 38

Page 112: Discrete Mathematics and Its Applicationsdase.ecnu.edu.cn/mgao/teaching/DM_2020_Spring/... · If it is countable, the random variable is a discrete r.v., otherwise continuous r.v..

Take-aways

Take-aways

Conclusions

Random variable

Bernoulli Trials and the Binomial Distribution

Bayes’ Theorem

Applications of Bayes’ Theorem

MING GAO (DaSE@ECNU) Discrete Mathematics and Its Applications May 15, 2020 38 / 38