1 Probability Theory Rosen 5 th ed., Chapter 5. 2 Random Experiment - Terminology A random (or...

39
1 Probability Theory Rosen 5 Rosen 5 th th ed., Chapter 5 ed., Chapter 5

Transcript of 1 Probability Theory Rosen 5 th ed., Chapter 5. 2 Random Experiment - Terminology A random (or...

1

Probability Theory

Rosen 5Rosen 5thth ed., Chapter 5 ed., Chapter 5

2

Random Experiment - Terminology

• A random (or stochastic) A random (or stochastic) experimentexperiment is an is an experiment whose result is experiment whose result is notnot certain in certain in advance. advance.

• An An outcomeoutcome is the result of a random is the result of a random experiment.experiment.

• The The sample spacesample space SS of the experiment is the of the experiment is the set of possible outcomes.set of possible outcomes.

• An An eventevent E E is a subset of the sample space.is a subset of the sample space.

3

Examples

• ExperimentExperiment: roll two dice: roll two dice

• Possible Possible OutcomesOutcomes: (1,2), (5,6) etc.: (1,2), (5,6) etc.

• Sample spaceSample space: {(1,1) (1,2) … (6,6)}: {(1,1) (1,2) … (6,6)}

(36 possible outcomes)(36 possible outcomes)

• EventEvent: sum is 7: sum is 7

{(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}{(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}

4

Probability

• The The probabilityprobability p p = Pr[= Pr[EE] ] [0,1] of an event [0,1] of an event EE is is a real number a real number representing our degree of certaintyrepresenting our degree of certainty that that EE will occur. will occur.– If Pr[If Pr[EE] = 1, then ] = 1, then EE is is absolutely certainabsolutely certain to occur, to occur,

– If Pr[If Pr[EE] = 0, then ] = 0, then EE is is absolutely certain absolutely certain notnot to occur,.to occur,.

– If Pr[If Pr[EE] = ½, then we are ] = ½, then we are completely uncertaincompletely uncertain about about whether whether EE will occur; will occur;

– What about other cases?What about other cases?

5

Four Definitions of Probability

• Several alternative definitions of probability Several alternative definitions of probability are commonly encountered:are commonly encountered:– Frequentist, Bayesian, Laplacian, AxiomaticFrequentist, Bayesian, Laplacian, Axiomatic

• They have different strengths & They have different strengths & weaknesses.weaknesses.

• Fortunately, they coincide and work well Fortunately, they coincide and work well with each other in most cases.with each other in most cases.

6

Probability: Laplacian Definition

• First, assume that all outcomes in the sample First, assume that all outcomes in the sample space are space are equally likely.equally likely.

• Then, the probability of event Then, the probability of event EE is given by: is given by:

Pr[Pr[EE] = |] = |EE| / || / |SS| |

7

Example

• ExperimentExperiment: roll two dice: roll two dice• Sample space SSample space S: {(1,1) (1,2) … (6,6)}: {(1,1) (1,2) … (6,6)} (36 possible outcomes)(36 possible outcomes)• Event EEvent E: sum is 7: sum is 7 {(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}{(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)}

P(E) = |E| / |S| = 6 / 36 = 1/6P(E) = |E| / |S| = 6 / 36 = 1/6

8

Another Example

• What is the probability of wining the lottery What is the probability of wining the lottery assuming that you have to pick correctly 6 assuming that you have to pick correctly 6 numbers out of 40 numbers?numbers out of 40 numbers?

9

Probability of Complementary Events

• Let Let EE be an event in a sample space be an event in a sample space SS..

• Then, Then, EE represents the represents the complementarycomplementary event:event:

Pr[Pr[EE] = 1-Pr[E]] = 1-Pr[E]

• Proof: Proof:

Pr[E]=(|S|-|E|)/|S| = 1 – |E|/|S| = 1 − Pr[Pr[E]=(|S|-|E|)/|S| = 1 – |E|/|S| = 1 − Pr[EE]]

10

Example

• A sequence of ten bits is randomly A sequence of ten bits is randomly generated. What is the probability that at generated. What is the probability that at least one of these bits is 0?least one of these bits is 0?

11

Probability of Unions of Events

• Let Let EE11,,EE22 SS, then: , then:

Pr[Pr[EE11 EE22] = Pr[] = Pr[EE11] + Pr[] + Pr[EE22] − Pr[] − Pr[EE11EE22]]

– By the inclusion-exclusion principle.By the inclusion-exclusion principle.

12

Example

• What is the probability that a positive integer What is the probability that a positive integer selected at random from the set of positive selected at random from the set of positive integers not exceeding 100 is divisible by either 2 integers not exceeding 100 is divisible by either 2 or 5?or 5?

13

Mutually Exclusive Events

• Two events Two events EE11, , EE22 are called are called mutually mutually exclusiveexclusive if they are disjoint: if they are disjoint: EE11EE22 = =

• Note that two mutually exclusive events Note that two mutually exclusive events cannot both occurcannot both occur in the same instance of a in the same instance of a given experiment.given experiment.– If event If event EE11 happens, then event happens, then event EE22 cannot, or vice- cannot, or vice-

versa. versa.

• For mutually exclusive events,For mutually exclusive events,Pr[Pr[EE11 EE22] = Pr[] = Pr[EE11] + Pr[] + Pr[EE22].].

14

Example

• Experiment: Rolling a die once: Experiment: Rolling a die once: – Sample space S = {1,2,3,4,5,6} Sample space S = {1,2,3,4,5,6}

– EE11 = 'observe an odd number' = {1,3,5} = 'observe an odd number' = {1,3,5}

– EE22 = 'observe an even number' = {2,4,6} = 'observe an even number' = {2,4,6}

– EE11EE22 = = , so , so EE11 and and EE22 are mutually are mutually

exclusive. exclusive.

15

Probability: Axiomatic Definition

• How to study probabilities of experiments where How to study probabilities of experiments where outcomes outcomes may not be equally likelymay not be equally likely??

• Let Let SS be the sample space of an experiment with a be the sample space of an experiment with a finite or countable number of outcomes.finite or countable number of outcomes.

• Define a function Define a function pp::SS→[0,1], called a →[0,1], called a probability probability distributiondistribution..– AssignAssign a probability a probability p(s)p(s) to each outcome to each outcome s s wherewhere

p(s)p(s)= = (# times (# times s s occurs) / (# times the experiment is occurs) / (# times the experiment is performed)performed)

as # of times as # of times ∞ ∞

16

Probability: Axiomatic Definition

• Two conditions must be met:Two conditions must be met:• 0 ≤ 0 ≤ pp((ss) ≤ 1 for all outcomes ) ≤ 1 for all outcomes ssSS..

• ∑ ∑ pp((ss) = 1 (sum is over all ) = 1 (sum is over all ssS)S)..

• The probability of any event The probability of any event EESS is just: is just:

Es

spE )(]Pr[

17

Example

• Suppose a die is biased (or loaded) so that 3 appears twice Suppose a die is biased (or loaded) so that 3 appears twice as often as each other number but that the other five as often as each other number but that the other five outcomes are equally likely. What is the probability that an outcomes are equally likely. What is the probability that an odd number appears when we roll the die?odd number appears when we roll the die?

18

Properties (same as before …)

Pr[Pr[EE] = 1 − Pr[] = 1 − Pr[EE]]

Pr[Pr[EE11 EE22] = Pr[] = Pr[EE11] + Pr[] + Pr[EE22] − Pr[] − Pr[EE11EE22]]

• For mutually exclusive events,For mutually exclusive events,Pr[Pr[EE11 EE22] = Pr[] = Pr[EE11] + Pr[] + Pr[EE22].].

19

Independent Events

• Two events Two events EE,,FF are are independentindependent if the if the outcome of event E, has outcome of event E, has no effectno effect on the on the outcome of event F.outcome of event F.

Pr[Pr[EEFF] = Pr[] = Pr[EE]·Pr[]·Pr[FF].].

• Example: Flip a coin, and roll a die.Example: Flip a coin, and roll a die.Pr[ quarter is heads Pr[ quarter is heads die is 1 ] = die is 1 ] =

Pr[quarter is heads] × Pr[die is 1]Pr[quarter is heads] × Pr[die is 1]

20

Example

• Are the events Are the events

E=“a family with three children has children of both sexes”, E=“a family with three children has children of both sexes”, and and

F=“a family with three children has at most one boy” F=“a family with three children has at most one boy”

independent? independent?

21

Mutually Exclusive vs Independent Events

• If A and B are mutually exclusive, If A and B are mutually exclusive, they they cannot be independentcannot be independent. .

• If A and B are independent, If A and B are independent, they cannot be they cannot be mutually exclusivemutually exclusive..

22

Conditional Probability

• Then, the Then, the conditional probabilityconditional probability of of E given FE given F, , written Pr[written Pr[EE||FF], is defined as ], is defined as

Pr[Pr[EE||FF] = Pr[] = Pr[EEFF]/Pr[]/Pr[FF]]• P(Cavity) = 0.1P(Cavity) = 0.1

- - In the absence of any other information, there is a 10% In the absence of any other information, there is a 10% chance that somebody has a cavity.chance that somebody has a cavity.

• P(Cavity/Toothache) = 0.8P(Cavity/Toothache) = 0.8- - There is a 80% chance that somebody has a cavity given There is a 80% chance that somebody has a cavity given that he has a toothache.that he has a toothache.

23

Example

• A bit string of length four is generated at random so that A bit string of length four is generated at random so that each of the 16 bit strings of length four is equally likely. each of the 16 bit strings of length four is equally likely. What is the probability that it contains at least two What is the probability that it contains at least two consecutive 0s, given that its first bit is a 0?consecutive 0s, given that its first bit is a 0?

24

Conditional Probability (cont’d)

• Note that if Note that if EE and and FF are independent, then are independent, then Pr[ Pr[EE||FF] = Pr[] = Pr[EE]]

• Using Pr[Using Pr[EE||FF] = Pr[] = Pr[EEFF]/Pr[]/Pr[FF] we have:] we have:

Pr[Pr[EEFF] = Pr[] = Pr[EE||FF] Pr[F]] Pr[F]

Pr[Pr[EEFF] = Pr[] = Pr[FF||EE] Pr[E]] Pr[E]

• Combining them we get the Bayes rule:Combining them we get the Bayes rule:Pr[ | ] Pr[ ]

Pr[ | ]Pr[ ]

F E EE F

F

25

Bayes’s Theorem

• Useful for computing the probability that a Useful for computing the probability that a hypothesis hypothesis HH is correct, given data is correct, given data DD::

• Extremely useful in artificial intelligence apps:Extremely useful in artificial intelligence apps:– Data mining, automated diagnosis, pattern recognition, Data mining, automated diagnosis, pattern recognition,

statistical modeling, evaluating scientific hypotheses.statistical modeling, evaluating scientific hypotheses.

Pr[ | ] Pr[ ]Pr[ | ]

Pr[ ]

Data Hypothesis HypothesisHypothesis Data

Data

26

Example

• Consider the probability of Consider the probability of DiseaseDisease given given SymptomsSymptoms

P(Disease/Symptoms) = P(Disease/Symptoms) =

P(Symptoms/Disease)P(Disease) / P(Symptoms)P(Symptoms/Disease)P(Disease) / P(Symptoms)

27

Random Variable

• A A random variablerandom variable VV is a functionis a function (i.e., not a (i.e., not a variable) that assigns a value to each event !variable) that assigns a value to each event !

28

Why Using Random Variables?

• It is easier to deal with a It is easier to deal with a summary variablesummary variable than with the than with the original probability structure.original probability structure.

• ExampleExample: in an opinion poll, we ask 50 people whether : in an opinion poll, we ask 50 people whether agree or disagree with a certain issue.agree or disagree with a certain issue.– Suppose we record a "1" for agree and "0" for disagree.Suppose we record a "1" for agree and "0" for disagree.

– The sample space for this experiment has possible elements.The sample space for this experiment has possible elements.

– Suppose we are only interested in the number of people who agree.Suppose we are only interested in the number of people who agree.

– Define the variable X=“Define the variable X=“number of 1’s recorded out of 50number of 1’s recorded out of 50””

– Easier to deal with this sample space - has only 51 elements!Easier to deal with this sample space - has only 51 elements!

502

29

Random Variables

• How is the probability function of a r.v. being How is the probability function of a r.v. being defined from the probability function of the defined from the probability function of the original sample space?original sample space?– Suppose the original sample space is:Suppose the original sample space is: S={ }S={ }– Suppose the range of the r.v. X is:Suppose the range of the r.v. X is:

{ }{ }– We will observe X= We will observe X= iffiff the outcome of the random the outcome of the random

experiment is an such that X( )=experiment is an such that X( )=

1 2, ,..., ns s s

1 2, ,..., mx x x

jx

js jxjs S

: ( )

( ) ( )j j j

j js X s x

P X x P s

30

Example

• X = “sum of dice”X = “sum of dice”– X=5 corresponds to E={(1,4) (4,1) (2,3) (3,2)}X=5 corresponds to E={(1,4) (4,1) (2,3) (3,2)}

P(X=5)=P(E)= =P(X=5)=P(E)= =

P((1,4) + P((4,1)) + P((2,3)) + P((3,2)) = 4/36=1/9P((1,4) + P((4,1)) + P((2,3)) + P((3,2)) = 4/36=1/9

: ( )

( )s X s x

P s

31

Expectation Values

• For a random variable For a random variable XX having a numeric having a numeric domain, its domain, its expected valueexpected value or or weighted average weighted average valuevalue or or arithmetic mean valuearithmetic mean value EE[[XX] is defined as] is defined as

• Provides a central point for the distribution of Provides a central point for the distribution of values of the r.v. values of the r.v.

( ) ( )x

E X xP X x

32

Example

• X=“outcome of rolling a die”X=“outcome of rolling a die”

E(X) = ∑ xE(X) = ∑ x·P·P(X=x) = 1(X=x) = 1·· 1/6 + 2 1/6 + 2 ··1/6 + 31/6 + 3·· 1/6 + 1/6 +

4 4 ··1/6 + 5 1/6 + 5 ··1/6 + 61/6 + 6· · 1/61/6 = = 3.5 3.5

33

Another Example

• X=“sum of two dice”X=“sum of two dice”

P(X=2)=P(X=12)=1/36 P(X=3)=P(X=11)=2/36P(X=2)=P(X=12)=1/36 P(X=3)=P(X=11)=2/36

P(X=4)=P(X=10)=3/36 P(X=5)=P(X=9)=4/36P(X=4)=P(X=10)=3/36 P(X=5)=P(X=9)=4/36

P(X=6)=P(X=8)=5/36 P(X=7)=6/36P(X=6)=P(X=8)=5/36 P(X=7)=6/36

E(X) = ∑ xE(X) = ∑ x·P·P(X=x) =2(X=x) =2··1/36 + 31/36 + 3··2/36 + ….2/36 + ….

+ + 1212· · 1/36 = 71/36 = 7

34

Linearity of Expectation

• Let Let XX11, , XX22 be any two random variables be any two random variables

derived from the same sample space, then:derived from the same sample space, then:– EE[[XX11++XX22] = ] = EE[[XX11] + ] + EE[[XX22]]

– EE[[aXaX11 + + bb] = ] = aaEE[[XX11] + ] + bb

35

Example

• Find the expected value of the sum of the numbers Find the expected value of the sum of the numbers that appear when a pair of fair dice is rolled.that appear when a pair of fair dice is rolled.

XX11: “: “number appearing on first dienumber appearing on first die””

XX22: “: “number appearing on second dienumber appearing on second die””

Y = XY = X11 + X + X22 : “sum of dice” : “sum of dice”

E(Y) = E(XE(Y) = E(X11+X+X22)=E(X)=E(X11)+E(X)+E(X22) = 3.5+3.5=7) = 3.5+3.5=7

36

Variance

• The The variancevariance VarVar[[XX] = ] = σσ22((XX) of a r.v. ) of a r.v. XX is is defined as:defined as:

wherewhere• The The standard deviationstandard deviation or or root-mean-squareroot-mean-square

(RMS) (RMS) differencedifference of of XX, , σσ((XX) :≡ ) :≡ VarVar[[XX]]1/21/2..• Indicates how Indicates how spread outspread out the values of the r.v. the values of the r.v.

are.are.

2 2 2( ) (( ) ) ( )Var X E X E X

( )E X

37

Example

• X=“outcome of rolling a die”X=“outcome of rolling a die”

E(X) = ∑ xE(X) = ∑ x·P·P(X=x) = 1(X=x) = 1·· 1/6 + 2 1/6 + 2 ··1/6 + 31/6 + 3·· 1/6 + 1/6 +

4 4 ··1/6 + 5 1/6 + 5 ··1/6 + 61/6 + 6· · 1/61/6 = = 3.5 3.5

= = 2( ) (( ) )Var X E X 2 2 2(1 3.5) 1/ 6 (2 3.5) 1/ 6 ... (6 3.5) 1/ 6 35 /12

38

Properties of Variance

• Let Let XX11, , XX22 be any two be any two independentindependent random random

variables derived from the same sample space, variables derived from the same sample space, then:then:

VarVar[[XX11++XX22] = ] = VarVar[[XX11] + ] + VarVar[[XX22]]

2( ) ( )Var aX b a Var X

39

Example

• Suppose two dice are rolled and X is a r.v.Suppose two dice are rolled and X is a r.v.

X = “X = “sum of dicesum of dice” ”

• Find Var(X)Find Var(X)