Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.
-
date post
21-Dec-2015 -
Category
Documents
-
view
217 -
download
0
Transcript of Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.
![Page 1: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/1.jpg)
Slide 1
A Quick Overview of Probability
William W. CohenMachine Learning 10-601
Jan 2008
![Page 2: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/2.jpg)
Slide 2
Probabilistic and Bayesian Analytics
Andrew W. MooreProfessor
School of Computer ScienceCarnegie Mellon University
www.cs.cmu.edu/[email protected]
412-268-7599
Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.
Copyright © Andrew W. Moore
[Some material pilfered from http://www.cs.cmu.edu/~awm/tutorials]
![Page 3: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/3.jpg)
Slide 3
Announcements
• Recitation this week:• Matlab tutorial• Probability q/a and tutorial
• HW1 is due on Monday
![Page 4: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/4.jpg)
Slide 4
Zeno’s paradox• Lance Armstrong and the
tortoise have a race• Lance is 10x faster• Tortoise has a 1m head start
at time 0
0 1
• So, when Lance gets to 1m the tortoise is at 1.1m• So, when Lance gets to 1.1m the tortoise is at 1.11m …• So, when Lance gets to 1.11m the tortoise is at 1.111m … and Lance will never catch up -?
1+0.1+0.01+0.001+0.0001+… = ?
unresolved until calculus was invented
![Page 5: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/5.jpg)
Slide 5
The Problem of Induction• David Hume (1711-
1776): pointed out1. Empirically,
induction seems to work
2. Statement (1) is an application of induction.
• This stumped people for about 200 years
1. Of the Different Species of Philosophy.
2. Of the Origin of Ideas
3. Of the Association of Ideas
4. Sceptical Doubts Concerning the Operations of the Understanding
5. Sceptical Solution of These Doubts
6. Of Probability9
7. Of the Idea of Necessary Connexion
8. Of Liberty and Necessity
9. Of the Reason of Animals
10. Of Miracles
11. Of A Particular Providence and of A Future State
12. Of the Academical Or Sceptical Philosophy
![Page 6: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/6.jpg)
Slide 6
A Second Problem of Induction
• A black crow seems to support the hypothesis “all crows are black”.
• A pink highlighter supports the hypothesis “all non-black things are non-crows”
• Thus, a pink highlighter supports the hypothesis “all crows are black”.
)(CROW)(BLACK
lyequivalentor
)(BLACK)(CROW
xxx
xxx
![Page 7: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/7.jpg)
Slide 7
A Third Problem of Induction
• You have much less than 200 years to figure it all out.
![Page 8: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/8.jpg)
Slide 8
Probability Theory• Events
• discrete random variables, boolean random variables, compound events
• Axioms of probability• What defines a reasonable theory of uncertainty
• Compound events• Independent events• Conditional probabilities• Bayes rule and beliefs• Joint probability distribution
![Page 9: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/9.jpg)
Slide 9
Discrete Random Variables• A is a Boolean-valued random variable if
• A denotes an event, • there is uncertainty as to whether A occurs.
• Examples• A = The US president in 2023 will be male• A = You wake up tomorrow with a headache• A = You have Ebola• A = the 1,000,000,000,000th digit of π is 7
• Define P(A) as “the fraction of possible worlds in which A is true”• We’re assuming all possible worlds are equally probable
![Page 10: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/10.jpg)
Slide 10
Discrete Random Variables• A is a Boolean-valued random variable if
• A denotes an event, • there is uncertainty as to whether A occurs.
• Define P(A) as “the fraction of experiments in which A is true”• We’re assuming all possible outcomes are equiprobable
• Examples• You roll two 6-sided die (the experiment) and get doubles
(A=doubles, the outcome)• I pick two students in the class (the experiment) and they have
the same birthday (A=same birthday, the outcome)
a possible outcome of an “experiment”
the experiment is not deterministic
![Page 11: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/11.jpg)
Slide 11
Visualizing A
Event space of all possible worlds
Its area is 1Worlds in which A is False
Worlds in which A is true
P(A) = Area ofreddish oval
![Page 12: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/12.jpg)
Slide 12
The Axioms of Probability
• 0 <= P(A) <= 1• P(True) = 1• P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
![Page 13: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/13.jpg)
Slide 13
The Axioms Of Probability
(This is Andrew’s joke)
![Page 14: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/14.jpg)
Slide 14
These Axioms are Not to be Trifled With
• There have been many many other approaches to understanding “uncertainty”:
• Fuzzy Logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, …
• 25 years ago people in AI argued about these; now they mostly don’t• Any scheme for combining uncertain information, uncertain
“beliefs”, etc,… really should obey these axioms• If you gamble based on “uncertain beliefs”, then [you can
be exploited by an opponent] [your uncertainty formalism violates the axioms] - di Finetti 1931 (the “Dutch book argument”)
![Page 15: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/15.jpg)
Slide 15
Interpreting the axioms• 0 <= P(A) <= 1• P(True) = 1• P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get any smaller than 0
And a zero area would mean no world could ever have A true
![Page 16: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/16.jpg)
Slide 16
Interpreting the axioms• 0 <= P(A) <= 1• P(True) = 1• P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
The area of A can’t get any bigger than 1
And an area of 1 would mean all worlds will have A true
![Page 17: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/17.jpg)
Slide 17
Interpreting the axioms• 0 <= P(A) <= 1• P(True) = 1• P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
A
B
![Page 18: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/18.jpg)
Slide 18
Interpreting the axioms• 0 <= P(A) <= 1• P(True) = 1• P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
A
B
P(A or B)
BP(A and B)
Simple addition and subtraction
![Page 19: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/19.jpg)
Slide 19
Theorems from the Axioms• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
P(not A) = P(~A) = 1-P(A)
P(A or ~A) = 1 P(A and ~A) = 0
P(A or ~A) = P(A) + P(~A) - P(A and ~A)
1 = P(A) + P(~A) - 0
![Page 20: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/20.jpg)
Slide 20
Elementary Probability in Pictures
• P(~A) + P(A) = 1
A ~A
![Page 21: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/21.jpg)
Slide 21
Side Note
• I am inflicting these proofs on you for two reasons:
1. These kind of manipulations will need to be second nature to you if you use probabilistic analytics in depth
2. Suffering is good for you (This is also Andrew’s joke)
![Page 22: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/22.jpg)
Slide 22
Another important theorem• 0 <= P(A) <= 1, P(True) = 1, P(False) = 0• P(A or B) = P(A) + P(B) - P(A and B)
P(A) = P(A ^ B) + P(A ^ ~B)
A = A and (B or ~B) = (A and B) or (A and ~B)
P(A) = P(A and B) + P(A and ~B) – P((A and B) and (A and ~B))
P(A) = P(A and B) + P(A and ~B) – P(A and A and B and ~B)
![Page 23: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/23.jpg)
Slide 23
Elementary Probability in Pictures
• P(A) = P(A ^ B) + P(A ^ ~B)
B
~B
A ^ ~B
A ^ B
![Page 24: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/24.jpg)
Slide 24
The Monty Hall Problem• You’re in a game show.
Behind one door is a prize. Behind the others, goats.
• You pick one of three doors, say #1
• The host, Monty Hall, opens one door, revealing…a goat!
3
You now can either
• stick with your guess
• always change doors
• flip a coin and pick a new door randomly according to the coin
![Page 25: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/25.jpg)
Slide 25
The Monty Hall Problem• Case 1: you don’t
swap.• W = you win.
• Pre-goat: P(W)=1/3• Post-goat: P(W)=1/3
• Case 2: you swap• W1=you picked the
cash initially.• W2=you win.
• Pre-goat: P(W1)=1/3.
• Post-goat:• W2 = ~W1• Pr(W2) = 1-
P(W1)=2/3. Moral: ?
![Page 26: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/26.jpg)
Slide 26
The Extreme Monty Hall/Survivor Problem
• You’re in a game show. There are 10,000 doors. Only one of them has a prize.
• You pick a door.• Over the remaining 13 weeks, the host eliminates
9,998 of the remaining doors.• For the season finale:
• Do you switch, or not?…
![Page 27: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/27.jpg)
Slide 27
Some practical problems
• You’re the DM in a D&D game.• Joe brings his own d20 and throws 4 critical hits in
a row to start off• DM=dungeon master• D20 = 20-sided die• “Critical hit” = 19 or 20
• Is Joe cheating?• What is P(A), A=four critical hits?
• A is a compound event: A = C1 and C2 and C3 and C4
![Page 28: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/28.jpg)
Slide 28
Independent Events
• Definition: two events A and B are independent if Pr(A and B)=Pr(A)*Pr(B).
• Intuition: outcome of A has no effect on the outcome of B (and vice versa).• We need to assume the different rolls are
independent to solve the problem.• You almost always need to assume
independence of something to solve any learning problem.
![Page 29: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/29.jpg)
Slide 29
Some practical problems
• You’re the DM in a D&D game.• Joe brings his own d20 and throws 4 critical hits in
a row to start off• DM=dungeon master• D20 = 20-sided die• “Critical hit” = 19 or 20
• What are the odds of that happening with a fair die?
• Ci=critical hit on trial i, i=1,2,3,4 • P(C1 and C2 … and C4) = P(C1)*…*P(C4) =
(1/10)^4Followup: D=pick an ace or king out of deck three times in a row: D=D1 ^ D2 ^ D3
![Page 30: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/30.jpg)
Slide 30
Some practical problems
The specs for the loaded d20 say that it has 20 outcomes, X where
• P(X=20) = 0.25
• P(X=19) = 0.25
• for i=1,…,18, P(X=i)= Z * 1/18
• What is Z?
![Page 31: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/31.jpg)
Slide 31
Multivalued Discrete Random Variables
• Suppose A can take on more than 2 values• A is a random variable with arity k if it can take
on exactly one value out of {v1,v2, .. vk}• Thus…
jivAvAP ji if 0)(
1)( 21 kvAvAvAP
![Page 32: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/32.jpg)
Slide 32
More about Multivalued Random Variables
• Using the axioms of probability…0 <= P(A) <= 1, P(True) = 1, P(False) = 0P(A or B) = P(A) + P(B) - P(A and B)
• And assuming that A obeys…
• It’s easy to prove that
jivAvAP ji if 0)(1)( 21 kvAvAvAP
)()(1
21
i
jji vAPvAvAvAP
![Page 33: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/33.jpg)
Slide 33
More about Multivalued Random Variables
• Using the axioms of probabilityand assuming that A obeys…
• It’s easy to prove that
jivAvAP ji if 0)(1)( 21 kvAvAvAP
)()(1
21
i
jji vAPvAvAvAP
• And thus we can prove
1)(1
k
jjvAP
![Page 34: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/34.jpg)
Slide 34
Elementary Probability in Pictures
1)(1
k
jjvAP
A=1
A=2
A=3
A=4
A=5
![Page 35: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/35.jpg)
Slide 35
Some practical problemsThe specs for the loaded d20 say that it has 20 outcomes, X
• P(X=20) = P(X=19) = 0.25
• for i=1,…,18, P(X=i)= Z * 1/18 and what is Z?
5.018
11825.025.0)(1
1)(
18
1
1
ZvAP
vAP
jj
k
jj
![Page 36: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/36.jpg)
Slide 36
Some practical problems• You (probably) have 8
neighbors and 5 close neighbors.
• What is Pr(A), A=one or more of your neighbors has the same sign as you?• What’s the
experiment?• What is Pr(B), B=you
and your close neighbors all have different signs?• What about
neighbors?
n c n
c * c
n c n
Moral: ?
![Page 37: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/37.jpg)
Slide 37
Some practical problemsI bought a loaded d20 on EBay…but it didn’t come with any specs. How can I find out how it behaves?
Frequency
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Face Shown
P(X=20) = P(X=19) = 0.25
for i=1,…,18, P(X=i)= 0.5 * 1/18
![Page 38: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/38.jpg)
Slide 38
Some practical problems
• I have 3 standard d20 dice, 1 loaded die.
• Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die. What is P(B)?
P(B) = P(B and A) + P(B and ~A)= 0.1*0.75 + 0.5*0.25 = 0.2
using Andrew’s “important theorem” P(A) = P(A ^ B) + P(A ^ ~B)
![Page 39: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/39.jpg)
Slide 39
Elementary Probability in Pictures
• P(A) = P(A ^ B) + P(A ^ ~B)
B
~B
A ^ ~B
A ^ B
Followup:
What if I change the ratio of fair to loaded die in the experiment?
![Page 40: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/40.jpg)
Slide 40
Some practical problems
• I have lots of standard d20 die, lots of loaded die, all identical.
• Experiment is the same: (1) pick a d20 uniformly at random then (2) roll it. Can I mix the dice together so that P(B)=0.137 ?P(B) = P(B and A) + P(B and ~A)= 0.1*λ + 0.5*(1- λ) = 0.137
λ = (0.5 - 0.137)/0.4 = 0.9075“mixture model”
![Page 41: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/41.jpg)
Slide 41
Another picture for this problem
A (fair die) ~A (loaded)
A and B ~A and B
It’s more convenient to say• “if you’ve picked a fair die then …” i.e. Pr(critical hit|fair die)=0.1• “if you’ve picked the loaded die then….” Pr(critical hit|loaded die)=0.5
Conditional probability:Pr(B|A) = P(B^A)/P(A)
P(B|A) P(B|~A)
![Page 42: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/42.jpg)
Slide 42
Definition of Conditional Probability
P(A ^ B) P(A|B) = ----------- P(B)
Corollary: The Chain Rule
P(A ^ B) = P(A|B) P(B)
![Page 43: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/43.jpg)
Slide 43
Some practical problems
• I have 3 standard d20 dice, 1 loaded die.
• Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die. What is P(B)?
P(B) = P(B|A) P(A) + P(B|~A) P(~A) = 0.1*0.75 + 0.5*0.25 = 0.2
“marginalizing out” A
![Page 44: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/44.jpg)
Slide 44
A (fair die) ~A (loaded)
A and B ~A and B P(B|A) P(B|~A)
P(A) P(~A)P(B) = P(B|A)P(A) + P(B|~A)P(~A)
![Page 45: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/45.jpg)
Slide 45
Some practical problems
• I have 3 standard d20 dice, 1 loaded die.
• Experiment: (1) pick a d20 uniformly at random then (2) roll it. Let A=d20 picked is fair and B=roll 19 or 20 with that die.
• Suppose B happens (e.g., I roll a 20). What is the chance the die I rolled is fair? i.e. what is P(A|B) ?
![Page 46: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/46.jpg)
Slide 46
A (fair die) ~A (loaded)
A and B ~A and B
P(B|A) P(B|~A)
P(A) P(~A)
P(A and B) = P(B|A) * P(A)
P(A and B) = P(A|B) * P(B)
P(A|B) * P(B) = P(B|A) * P(A)
P(B|A) * P(A)
P(B)P(A|B) =
A (fair die) ~A (loaded)
A and B ~A and B
P(B)
P(A|B) = ?
![Page 47: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/47.jpg)
Slide 48
P(B|A) * P(A)
P(B)P(A|B) =
P(A|B) * P(B)
P(A)P(B|A) =
Bayes, Thomas (1763) An essay towards solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53:370-418
…by no means merely a curious speculation in the doctrine of chances, but necessary to be solved in order to a sure foundation for all our reasonings concerning past facts, and what is likely to be hereafter…. necessary to be considered by any that would give a clear account of the strength of analogical or inductive reasoning…
Bayes’ rule
priorposterior
![Page 48: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/48.jpg)
Slide 49
More General Forms of Bayes Rule
)(~)|~()()|(
)()|()|(
APABPAPABP
APABPBAP
)(
)()|()|(
XBP
XAPXABPXBAP
![Page 49: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/49.jpg)
Slide 50
More General Forms of Bayes Rule
An
kkk
iii
vAPvABP
vAPvABPBvAP
1
)()|(
)()|()|(
![Page 50: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/50.jpg)
Slide 51
Useful Easy-to-prove facts
1)|()|( BAPBAP
1)|(1
An
kk BvAP
![Page 51: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/51.jpg)
Slide 52
More about Bayes rule• An Intuitive Explanation of Bayesian Reasoning:
Bayes' Theorem for the curious and bewildered; an excruciatingly gentle introduction - Eliezer Yudkowsky
• Problem: Suppose that a barrel contains many small plastic eggs. Some eggs are painted red and some are painted blue. 40% of the eggs in the bin contain pearls, and 60% contain nothing. 30% of eggs containing pearls are painted blue, and 10% of eggs containing nothing are painted blue. What is the probability that a blue egg contains a pearl?
![Page 52: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/52.jpg)
Slide 53
Some practical problems• Joe throws 4 critical hits in a row, is Joe cheating?• A = Joe using cheater’s die• C = roll 19 or 20; P(C|A)=0.5, P(C|~A)=0.1• B = C1 and C2 and C3 and C4• Pr(B|A) = 0.0625 P(B|~A)=0.0001
)(~)|~()()|(
)()|()|(
APABPAPABP
APABPBAP
))(1(*0001.0)(*0625.0
)(*0625.0)|(
APAP
APBAP
![Page 53: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/53.jpg)
Slide 54
What’s the experiment and outcome here?
• Outcome A: Joe is cheating• Experiment:
• Joe picked a die uniformly at random from a bag containing 10,000 fair die and one bad one.
• Joe is a D&D player picked uniformly at random from set of 1,000,000 people and n of them cheat with probability p>0.
• I have no idea, but I don’t like his looks. Call it P(A)=0.1
![Page 54: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/54.jpg)
Slide 55
Remember: Don’t Mess with The Axioms
• A subjective belief can be treated, mathematically, like a probability• Use those axioms!
• There have been many many other approaches to understanding “uncertainty”:
• Fuzzy Logic, three-valued logic, Dempster-Shafer, non-monotonic reasoning, …
• 25 years ago people in AI argued about these; now they mostly don’t• Any scheme for combining uncertain information, uncertain
“beliefs”, etc,… really should obey these axioms• If you gamble based on “uncertain beliefs”, then [you can be
exploited by an opponent] [your uncertainty formalism violates the axioms] - di Finetti 1931 (the “Dutch book argument”)
![Page 55: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/55.jpg)
Slide 56
Some practical problems
• Joe throws 4 critical hits in a row, is Joe cheating?
• A = Joe using cheater’s die• C = roll 19 or 20; P(C|A)=0.5, P(C|~A)=0.1• B = C1 and C2 and C3 and C4• Pr(B|A) = 0.0625 P(B|~A)=0.0001
)(/)()|(
)(/)()|(
)|(
)|(
BPAPABP
BPAPABP
BAP
BAP
)(
)()|()|(
BP
APABPBAP
)(
)(
)|(
)|(
AP
AP
ABP
ABP
)(
)(
0001.0
0625.0
AP
AP
)(
)(250,6
AP
AP
Moral: with enough evidence the prior P(A) doesn’t really matter.
![Page 56: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/56.jpg)
Slide 57
Some practical problemsI bought a loaded d20 on EBay…but it didn’t come with any specs. How can I find out how it behaves?
Frequency
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Face Shown
![Page 57: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/57.jpg)
Slide 58
One solutionI bought a loaded d20 on EBay…but it didn’t come with any specs. How can I find out how it behaves?
Frequency
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Face Shown
P(1)=0
P(2)=0
P(3)=0
P(4)=0.1
…
P(19)=0.25
P(20)=0.2
MLE = maximumlikelihood estimate
![Page 58: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/58.jpg)
Slide 59
A better solutionI bought a loaded d20 on EBay…but it didn’t come with any specs. How can I find out how it behaves?
Frequency
0
1
2
3
4
5
6
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Face Shown
P(A=i|B) = P(B|A=i)*P(A=i) / P(B)
B = observed counts
P(A=i) = prior probability (subjective)
… and we’ll need some tricks to make this simple to compute
MAP = maximuma posteriori estimate
![Page 59: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/59.jpg)
Slide 65
Some practical problems
• I have 1 standard d6 die, 2 loaded d6 die.
• Loaded high: P(X=6)=0.50 Loaded low: P(X=1)=0.50
• Experiment: pick one d6 uniformly at random (A) and roll it. What is more likely – rolling a seven or rolling doubles?
Three combinations: HL, HF, FLP(D) = P(D ^ A=HL) + P(D ^ A=HF) + P(D ^ A=FL) = P(D | A=HL)*P(A=HL) + P(D|A=HF)*P(A=HF) + P(A|A=FL)*P(A=FL)
![Page 60: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/60.jpg)
Slide 66
Some practical problems
• I have 1 standard d6 die, 2 loaded d6 die.
• Loaded high: P(X=6)=0.50 Loaded low: P(X=1)=0.50
• Experiment: pick one d6 uniformly at random (A) and roll it. What is more likely – rolling a seven or rolling doubles?
1 2 3 4 5 6
1 D 7
2 D 7
3 D 7
4 7 D
5 7 D
6 7 D
Three combinations: HL, HF, FL
Roll 1
Roll
2
![Page 61: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/61.jpg)
Slide 67
A brute-force solutionA Roll 1 Roll 2 P
FL 1 1 1/3 * 1/6 * ½
FL 1 2 1/3 * 1/6 * 1/10
FL 1 … …
… 1 6
FL 2 1
FL 2 …
… … …
FL 6 6
HL 1 1
HL 1 2
… … …
HF 1 1
…
Comment
doubles
seven
doubles
doubles
A joint probability table shows P(X1=x1 and … and Xk=xk) for every possible combination of values x1,x2,…., xk
With this you can compute any P(A) where A is any boolean combination of the primitive events (Xi=Xk), e.g.
• P(doubles)
• P(seven or eleven)
• P(total is higher than 5)
• ….
![Page 62: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/62.jpg)
Slide 68
The Joint Distribution
Recipe for making a joint distribution of M variables:
Example: Boolean variables A, B, C
![Page 63: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/63.jpg)
Slide 69
The Joint Distribution
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2M rows).
Example: Boolean variables A, B, C
A B C0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
![Page 64: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/64.jpg)
Slide 70
The Joint Distribution
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2M rows).
2. For each combination of values, say how probable it is.
Example: Boolean variables A, B, C
A B C Prob0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
![Page 65: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/65.jpg)
Slide 71
The Joint Distribution
Recipe for making a joint distribution of M variables:
1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2M rows).
2. For each combination of values, say how probable it is.
3. If you subscribe to the axioms of probability, those numbers must sum to 1.
Example: Boolean variables A, B, C
A B C Prob0 0 0 0.30
0 0 1 0.05
0 1 0 0.10
0 1 1 0.05
1 0 0 0.05
1 0 1 0.10
1 1 0 0.25
1 1 1 0.10
A
B
C0.050.25
0.10 0.050.05
0.10
0.100.30
![Page 66: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/66.jpg)
Slide 72
Using the Joint
One you have the JD you can ask for the probability of any logical expression involving your attribute
E
PEP matching rows
)row()(
![Page 67: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/67.jpg)
Slide 73
Using the Joint
P(Poor Male) = 0.4654 E
PEP matching rows
)row()(
![Page 68: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/68.jpg)
Slide 74
Using the Joint
P(Poor) = 0.7604 E
PEP matching rows
)row()(
![Page 69: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/69.jpg)
Slide 75
Inference with the
Joint
2
2 1
matching rows
and matching rows
2
2121 )row(
)row(
)(
)()|(
E
EE
P
P
EP
EEPEEP
![Page 70: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/70.jpg)
Slide 76
Inference with the
Joint
2
2 1
matching rows
and matching rows
2
2121 )row(
)row(
)(
)()|(
E
EE
P
P
EP
EEPEEP
P(Male | Poor) = 0.4654 / 0.7604 = 0.612
![Page 71: Slide 1 A Quick Overview of Probability William W. Cohen Machine Learning 10-601 Jan 2008.](https://reader031.fdocuments.net/reader031/viewer/2022032105/56649d5d5503460f94a3b72b/html5/thumbnails/71.jpg)
Slide 77
Inference is a big deal
• I’ve got this evidence. What’s the chance that this conclusion is true?• I’ve got a sore neck: how likely am I to have
meningitis?• I see my lights are out and it’s 9pm. What’s the
chance my spouse is already asleep?