Bayesian Statistics and Belief Networks
Bayesian Statistics and Belief Networks
Overview
• Book: Ch 13, 14
• Refresher on probability
• Bayesian classifiers
• Belief networks / Bayesian networks
Why Should We Care?
• Theoretical framework for machine learning, classification, knowledge representation, analysis
• Bayesian methods are capable of handling noisy, incomplete data sets
• Bayesian methods are commonly in use today
Bayesian Approach To Probability and Statistics
• Classical Probability : Physical property of the world (e.g., 50% flip of a fair coin). True probability.
• Bayesian Probability : A person’s degree of belief in event X. Personal probability.
• Unlike classical probability, Bayesian probabilities benefit from, but do not require, repeated trials; they can focus on just the next event, e.g., the probability that the Seawolves win their next game.
Uncertainty
Methods for Handling Uncertainty
Probability
Making Decisions Under Uncertainty
Probability Basics
Random Variables
Prior Probability
Conditional Probability
Inference by Enumeration
Inference by Enumeration
Bayes Rule

Product Rule:

P(A ∧ B) = P(A|B) P(B)
P(A ∧ B) = P(B|A) P(A)

Equating sides:

P(B|A) = P(A|B) P(B) / P(A)

i.e.

P(Class|evidence) = P(evidence|Class) P(Class) / P(evidence)
All classification methods can be seen as estimates of Bayes’ Rule, with different techniques to estimate P(evidence|Class).
Inference by Enumeration
Simple Bayes Rule Example

Probability your computer has a virus, V: 1/1000. If virused, the probability of a crash that day, C: 4/5. Probability your computer crashes in one day, C: 1/10.

P(C|V) = 0.8, P(V) = 0.001, P(C) = 0.1

P(V|C) = P(C|V) P(V) / P(C) = (0.8)(0.001) / (0.10) = 0.008
Even though a crash is a strong indicator of a virus, we expect only 8/1000 crashes to be caused by viruses. Why not compute P(V|C) from direct evidence? Causal vs. diagnostic knowledge; consider what happens if P(C) suddenly drops.
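The arithmetic above can be checked in a few lines (a minimal sketch; the variable names are mine, the numbers are the slide's):

```python
# Bayes' rule on the slide's virus example.
p_v = 0.001          # P(V): prior probability of a virus
p_c_given_v = 0.8    # P(C|V): probability of a crash given a virus
p_c = 0.1            # P(C): probability of a crash on any given day

p_v_given_c = p_c_given_v * p_v / p_c   # P(V|C) = P(C|V) P(V) / P(C)
print(round(p_v_given_c, 3))            # 0.008
```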
Bayesian Classifiers

P(Class|evidence) = P(evidence|Class) P(Class) / P(evidence)

If we're selecting the single most likely class, we only need to find the class that maximizes P(e|Class) P(Class).

The hard part is estimating P(e|Class). Evidence e typically consists of a set of observations:

E = (e1, e2, ..., en)

The usual simplifying assumption is conditional independence:

P(E|C) = ∏_{i=1..n} P(ei|C)

P(C|E) = P(C) ∏_{i=1..n} P(ei|C) / P(E)
Bayesian Classifier Example

| Probability | C=Virus | C=Bad Disk |
|---|---|---|
| P(C) | 0.4 | 0.6 |
| P(crashes\|C) | 0.1 | 0.2 |
| P(diskfull\|C) | 0.6 | 0.1 |

Given a case where the disk is full and the computer crashes, the classifier chooses Virus as most likely, since (0.4)(0.1)(0.6) > (0.6)(0.2)(0.1).
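The decision rule can be sketched as code (the dict layout and function names are my own; the CPT values come from the table above):

```python
# Naive Bayes chooser over the slide's two classes and two observations.
priors = {"Virus": 0.4, "Bad Disk": 0.6}
likelihoods = {
    "Virus":    {"crashes": 0.1, "diskfull": 0.6},
    "Bad Disk": {"crashes": 0.2, "diskfull": 0.1},
}

def score(cls, evidence):
    """P(C) * prod_i P(e_i|C) -- proportional to the posterior P(C|E)."""
    s = priors[cls]
    for e in evidence:
        s *= likelihoods[cls][e]
    return s

evidence = ["crashes", "diskfull"]
best = max(priors, key=lambda c: score(c, evidence))
print(best)  # Virus, since (0.4)(0.1)(0.6) > (0.6)(0.2)(0.1)
```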
Beyond Conditional Independence
• Include second-order dependencies, i.e., pairwise combinations of variables via joint probabilities:

P(e2, e1 | C) = P(e2|C) P(e1|C) × [correction factor]

• The correction factor is difficult to compute: there are on the order of n² joint probabilities to consider.

(Figure: a linear classifier separating classes C1 and C2.)
Belief Networks
• A DAG that represents the dependencies between variables and specifies the joint probability distribution
• Random variables make up the nodes
• Directed links represent direct causal influences
• Each node has a conditional probability table quantifying the effects from its parents
• No directed cycles
Burglary Alarm Example

(Network: Burglary → Alarm ← Earthquake; Alarm → John Calls and Alarm → Mary Calls.)

P(B) = 0.001  P(E) = 0.002

| B | E | P(A) |
|---|---|---|
| T | T | 0.95 |
| T | F | 0.94 |
| F | T | 0.29 |
| F | F | 0.001 |

| A | P(J) |
|---|---|
| T | 0.90 |
| F | 0.05 |

| A | P(M) |
|---|---|
| T | 0.70 |
| F | 0.01 |
Sample Bayesian Network
Using The Belief Network

(Same burglary network and CPTs as above.)

P(x1, x2, ..., xn) = ∏_{i=1..n} P(xi | Parents(Xi))

Probability of alarm, no burglary or earthquake, both John and Mary call:

P(J|A) P(M|A) P(A|¬B∧¬E) P(¬B) P(¬E) = (0.9)(0.7)(0.001)(0.999)(0.998) ≈ 0.00062
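The chain-rule product can be computed directly from the CPTs (the dict encoding and function name are my own; the numbers are the slide's):

```python
# The alarm network's CPTs as plain dicts, and the chain-rule product
# P(x1..xn) = prod_i P(xi | parents(Xi)).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=T | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=T | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) via the chain rule over the network."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Alarm sounds, no burglary or earthquake, both John and Mary call:
print(joint(False, False, True, True, True))  # ≈ 0.00062
```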
Belief Computations

• Two types; both are NP-hard
• Belief Revision
  – Models explanatory/diagnostic tasks
  – Given evidence, what is the most likely hypothesis to explain the evidence?
  – Also called abductive reasoning
• Belief Updating
  – Queries
  – Given evidence, what is the probability of some other random variable occurring?
Belief Revision

• Given some evidence variables, find the state of all other variables that maximizes the probability.
• E.g.: We know John calls, but not Mary. What is the most likely state? Only consider assignments where J=T and M=F, and maximize. Best: B=F, E=F, A=F:

P(¬B) P(¬E) P(¬A|¬B∧¬E) P(J|¬A) P(¬M|¬A) = (0.999)(0.998)(0.999)(0.05)(0.99) ≈ 0.049
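On a network this small, belief revision can be done by exhaustive search (a sketch; the brute-force loop is my own illustration, not the slides' algorithm):

```python
# Fix the evidence J=T, M=F and search all assignments of B, E, A
# for the maximum-probability joint state of the alarm network.
from itertools import product

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

best = max(product([True, False], repeat=3),
           key=lambda s: joint(*s, True, False))    # evidence J=T, M=F
b, e, a = best
print(best, round(joint(b, e, a, True, False), 3))  # (False, False, False) 0.049
```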
Belief Updating
• Causal Inferences
• Diagnostic Inferences
• Intercausal Inferences
• Mixed Inferences
(Figure: for each inference type, the position of the query node Q relative to the evidence nodes E in the network.)
Causal Inferences

Inference from cause to effect. E.g., given a burglary, what is P(J|B)?

P(A|B) = P(A|B,E) P(E) + P(A|B,¬E) P(¬E) = (0.95)(0.002) + (0.94)(0.998) ≈ 0.94

P(J|B) = P(J|A) P(A|B) + P(J|¬A) P(¬A|B) = (0.9)(0.94) + (0.05)(0.06) ≈ 0.85
P(M|B)=0.67 via similar calculations
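The conditioning steps above, as a quick numeric check (variable names are mine; the CPT values are the slide's):

```python
# Causal inference P(J|B) by conditioning on the alarm A.
p_e = 0.002
p_a_given_b = 0.95 * p_e + 0.94 * (1 - p_e)                  # P(A|B)
p_j_given_b = 0.9 * p_a_given_b + 0.05 * (1 - p_a_given_b)   # P(J|B)
print(round(p_a_given_b, 2), round(p_j_given_b, 2))          # 0.94 0.85
```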
Diagnostic Inferences

From effect to cause. E.g., given that John calls, what is P(B|J)?

P(B|J) = P(J|B) P(B) / P(J)

Need P(A) first:

P(A) = P(A|B,E) P(B) P(E) + P(A|B,¬E) P(B) P(¬E) + P(A|¬B,E) P(¬B) P(E) + P(A|¬B,¬E) P(¬B) P(¬E)
     = (0.001)(0.002)(0.95) + (0.001)(0.998)(0.94) + (0.999)(0.002)(0.29) + (0.999)(0.998)(0.001)
     ≈ 0.002517

What is P(J)?

P(J) = P(J|A) P(A) + P(J|¬A) P(¬A) = (0.002517)(0.9) + (0.9975)(0.05) ≈ 0.052

P(B|J) = (0.85)(0.001) / (0.052) ≈ 0.016
Many false positives.
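The same diagnostic query can be verified by full enumeration over the joint distribution (a sketch; the dict encoding is my own):

```python
# Diagnostic inference P(B|J) by summing the joint over hidden variables.
from itertools import product

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(B|J) = P(B, J) / P(J)
num = sum(joint(True, e, a, True, m) for e, a, m in product([True, False], repeat=3))
den = sum(joint(b, e, a, True, m) for b, e, a, m in product([True, False], repeat=4))
print(round(num / den, 3))  # 0.016
```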
Intercausal Inferences

Explaining-away inferences. Given an alarm, P(B|A) = 0.37. But if we add the evidence that the earthquake is true, then P(B|A∧E) = 0.003. Even though B and E are independent a priori, once the alarm is observed the presence of one cause can make the other more or less likely.
Mixed Inferences
Simultaneous intercausal and diagnostic inference.
E.g., if John calls and Earthquake is false:
P(A|J∧¬E) = 0.03
P(B|J∧¬E) = 0.017
Computing these values exactly is somewhat complicated.
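These mixed-inference values can also be checked by enumeration (a sketch; with these CPTs the exact value of P(B|J∧¬E) comes out ≈ 0.016, consistent with the slide's rounded figures):

```python
# Mixed inference P(B | J=T, E=F) by enumeration over the alarm network.
from itertools import product

P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(b, e, a, j, m):
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# Evidence: J=T and E=F. Query: B.
num = sum(joint(True, False, a, True, m) for a, m in product([True, False], repeat=2))
den = sum(joint(b, False, a, True, m) for b, a, m in product([True, False], repeat=3))
print(round(num / den, 3))  # 0.016
```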
Exact Computation - Polytree Algorithm

• Judea Pearl, 1982
• Only works on singly-connected networks: at most one undirected path between any two nodes
• Backward-chaining message-passing algorithm for computing posterior probabilities for query node X
  – Compute causal support for X: evidence variables "above" X
  – Compute evidential support for X: evidence variables "below" X
Polytree Computation

(Figure: query node X with parents U(1)...U(m) and children Y(1)...Y(n); Z(i,j) are the other parents of child Y(i). E⁺_X denotes the evidence "above" X, E⁻_X the evidence "below" X.)

P(X|E) = α P(E⁻_X | X) P(X | E⁺_X)

P(X | E⁺_X) = Σ_u P(X|u) ∏_i P(ui | E_{Ui\X})

P(E⁻_X | X) = β ∏_i Σ_{yi} P(E_{Yi\X} | yi) Σ_{zij} P(yi | X, zij) ∏_j P(zij | E_{Zij\Yi})

The algorithm is recursive: a message-passing chain.
Other Query Methods

• Exact algorithms
  – Clustering: cluster nodes to make a single cluster, message-pass along that cluster
  – Symbolic Probabilistic Inference: uses d-separation to find expressions to combine
• Approximate algorithms: select a sampling distribution, conduct trials sampling from roots to evidence nodes, accumulating weight for each node. Still tractable for dense networks.
  – Forward simulation
  – Stochastic simulation
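Forward simulation can be sketched as likelihood weighting on the alarm network (an illustrative sketch under my own naming, not necessarily the specific algorithms the slide refers to): sample non-evidence nodes from root to leaf, then weight each trial by the probability of the observed evidence.

```python
# Likelihood-weighting estimate of P(B | J=T) on the alarm network.
# M carries no evidence here, so it need not be sampled.
import random

random.seed(0)
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}

def estimate_p_b_given_j(n_trials=200_000):
    num = den = 0.0
    for _ in range(n_trials):
        b = random.random() < P_B          # sample roots first
        e = random.random() < P_E
        a = random.random() < P_A[(b, e)]  # then sample A given its parents
        w = P_J[a]                         # weight by likelihood of evidence J=T
        den += w
        if b:
            num += w
    return num / den

est = estimate_p_b_given_j()
print(est)  # roughly 0.016 (stochastic estimate)
```

The exact answer from the diagnostic-inference slide is about 0.016; the sampled estimate fluctuates around it because burglaries are rare events.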
Summary

• Bayesian methods provide a sound theory and framework for implementing classifiers
• Bayesian networks are a natural way to represent conditional independence information: qualitative information in the links, quantitative information in the tables
• Computing exact values is NP-complete or NP-hard; it is typical to make simplifying assumptions or use approximate methods
• Many Bayesian tools and systems exist
References

• Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
• Weiss, S. and Kulikowski, C. (1991). Computer Systems That Learn. Morgan Kaufmann.
• Heckerman, D. (1996). A Tutorial on Learning with Bayesian Networks. Microsoft Technical Report MSR-TR-95-06.
• Internet resources on Bayesian networks and machine learning: http://www.cs.orst.edu/~wangxi/resource.html