Probabilistic Graphical Models - Caltech...
Transcript of Probabilistic Graphical Models - Caltech...
![Page 1: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/1.jpg)
Probabilistic
Graphical Models
Lecture 2 – Bayesian Networks Representation
CS/CNS/EE 155
Andreas Krause
![Page 2: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/2.jpg)
2
AnnouncementsWill meet in Steele 102 for now
Still looking for another 1-2 TAs..
Homework 1 will be out soon. Start early!! ☺
![Page 3: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/3.jpg)
3
Multivariate distributionsInstead of random variable, have random vector
X(ω) = [X1(ω),…,Xn(ω)]
Specify P(X1=x1,…,Xn=xn)
Suppose all Xi are Bernoulli variables.
How many parameters do we need to specify?
3
![Page 4: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/4.jpg)
4
Marginal distributionsSuppose we have joint distribution P(X1,…,Xn)
Then
If all Xi binary: How many terms?
4
![Page 5: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/5.jpg)
5
Rules for random variablesChain rule
Bayes’ rule
![Page 6: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/6.jpg)
6
Key concept: Conditional independence
Events α, β conditionally independent given γ if
Random variables X and Y cond. indep. given Z if
for all x∈ Val(X), y∈ Val(Y), Z∈ Val(Z)
P(X = x, Y = y | Z = z) = P(X =x | Z = z) P(Y = y| Z= z)
If P(Y=y |Z=z)>0, that’s equivalent to
P(X = x | Z = z, Y = y) = P(X = x | Z = z)
Similarly for sets of random variables X, Y, Z
We write: P � X⊥ Y | Z6
![Page 7: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/7.jpg)
7
Why is conditional independence useful?P(X1,…,Xn) = P(X1) P(X2 | X1) … P(Xn | X1,…,Xn-1)
How many parameters?
Now suppose X1 …Xi-1 ⊥ Xi+1… Xn | Xi for all i
Then
P(X1,…,Xn) =
How many parameters?
Can we compute P(Xn) more efficiently?
![Page 8: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/8.jpg)
8
Properties of Conditional Independence
Symmetry
X ⊥ Y | Z ⇒ Y ⊥ X | Z
Decomposition
X ⊥ Y,W | Z ⇒ X ⊥ Y | Z
Contraction
(X ⊥ Y | Z) Æ (X ⊥W | Y,Z) ⇒ X ⊥ Y,W | Z
Weak union
X ⊥ Y,W | Z ⇒ X ⊥ Y | Z,W
Intersection
(X ⊥ Y | Z,W) Æ (X ⊥W | Y,Z) ⇒ X ⊥ Y,W | Z
Holds only if distribution is positive, i.e., P>0
![Page 9: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/9.jpg)
9
Key questionsHow do we specify distributions that satisfy particular
independence properties?
� Representation
How can we exploit independence properties for
efficient computation?
� Inference
How can we identify independence properties
present in data?
� Learning
Will now see example: Bayesian Networks
![Page 10: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/10.jpg)
10
Key ideaConditional parameterization
(instead of joint parameterization)
For each RV, specify P(Xi | XA) for set XA of RVs
Then use chain rule to get joint parametrization
Have to be careful to guarantee legal distribution…
![Page 11: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/11.jpg)
11
Example: 2 variables
![Page 12: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/12.jpg)
12
Example: 3 variables
![Page 13: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/13.jpg)
13
Example: Naïve Bayes modelsClass variable Y
Evidence variables X1,…,Xn
Assume that XA⊥ XB | Y
for all subsets XA,XB of {X1,…,Xn}
Conditional parametrization:
Specify P(Y)
Specify P(Xi | Y)
Joint distribution
![Page 14: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/14.jpg)
14
Today: Bayesian networksCompact representation of distributions over large
number of variables
(Often) allows efficient exact inference (computing
marginals, etc.)
HailFinder
56 vars
~ 3 states each
�~1026 terms
> 10.000 years
on Top
supercomputersJavaBayes applet
![Page 15: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/15.jpg)
15
Causal parametrizationGraph with directed edges from (immediate) causes
to (immediate) effects
Earthquake Burglary
Alarm
JohnCalls MaryCalls
![Page 16: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/16.jpg)
16
Bayesian networksA Bayesian network structure is a directed, acyclic graph G, where each vertex s of G is interpreted as a random variable Xs (with unspecified distribution)
A Bayesian network (G,P) consists of
A BN structure G and ..
..a set of conditional probability distributions (CPTs)P(Xs | PaXs
), where PaXsare the parents of node Xs such that
(G,P) defines joint distribution
![Page 17: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/17.jpg)
17
Bayesian networksCan every probability distribution be described by a BN?
![Page 18: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/18.jpg)
18
Representing the world using BNs
Want to make sure that I(P) ⊆ I(P’)
Need to understand CI properties of BN (G,P)
s1 s
2 s3
s4
s5 s
7s6
s11
s12
s9 s
10
s8
s1 s
3
s12
s9
True distribution P’
with cond. ind. I(P’) Bayes net (G,P)
with I(P)
represent
![Page 19: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/19.jpg)
19
Which kind of CI does a BN imply?
E B
A
J M
![Page 20: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/20.jpg)
20
Which kind of CI does a BN imply?
E B
A
J M
![Page 21: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/21.jpg)
21
Local Markov AssumptionEach BN Structure G is associated with the following
conditional independence assumptions
X ⊥ NonDescendentsX | PaX
We write Iloc(G) for these conditional independences
Suppose (G,P) is a Bayesian network representing P
Does it hold that Iloc(G) ⊆ I(P)?
If this holds, we say G is an I-map for P.
![Page 22: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/22.jpg)
22
Factorization Theorem
s1 s
2 s3
s4
s5 s
7s6
s11
s12
s9 s
10
s8
s1 s
3
s12
s9
Iloc(G) ⊆ I(P)
True distribution P
can be represented exactly as
G is an I-map of P
(independence map)
i.e., P can be represented as
a Bayes net (G,P)
![Page 23: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/23.jpg)
23
Factorization Theorem
s1 s
2 s3
s4
s5 s
7s6
s11
s12
s9 s
10
s8
s1 s
3
s12
s9
Iloc(G) ⊆ I(P)
G is an I-map of P
(independence map)
True distribution P
can be represented exactly as
a Bayes net (G,P)
![Page 24: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/24.jpg)
24
Proof: I-Map to factorization
![Page 25: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/25.jpg)
25
Factorization Theorem
s1 s
2 s3
s4
s5 s
7s6
s11
s12
s9 s
10
s8
s1 s
3
s12
s9
Iloc(G) ⊆ I(P)
G is an I-map of P
(independence map)
True distribution P
can be represented exactly as
a Bayes net (G,P)
![Page 26: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/26.jpg)
26
The general case
![Page 27: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/27.jpg)
27
Factorization Theorem
s1 s
2 s3
s4
s5 s
7s6
s11
s12
s9 s
10
s8
s1 s
3
s12
s9
Iloc(G) ⊆ I(P)True distribution P
can be represented exactly as
Bayesian network (G,P)G is an I-map of P
(independence map)
![Page 28: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/28.jpg)
28
Defining a Bayes NetGiven random variables and known conditional
independences
Pick ordering X1,…,Xn of the variables
For each Xi
Find minimal subset A ⊆{X1,…,Xi-1} such that Xi ⊥ X¬A | A,
where ¬A = {X1,…,Xn} \ A
Specify / learn CPD(Xi | A)
Ordering matters a lot for compactness of
representation! More later this course.
![Page 29: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/29.jpg)
29
Adding edges doesn’t hurtTheorem:
Let G be an I-Map for P, and G’ be derived from G by
adding an edge. Then G’ is an I-Map of P
(G’ is strictly more expressive than G)
Proof
![Page 30: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/30.jpg)
30
Additional conditional independencies
BN specifies joint distribution through conditional
parameterization that satisfies Local Markov Property
But we also talked about additional properties of CI
Weak Union, Intersection, Contraction, …
Which additional CI does a particular BN specify?
All CI that can be derived through algebraic operations
![Page 31: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/31.jpg)
31
What you need to knowBayesian networks
Local Markov property
I-Maps
Factorization Theorem
![Page 32: Probabilistic Graphical Models - Caltech Computingcourses.cms.caltech.edu/cs155/slides/cs155-02... · Probabilistic Graphical Models Lecture 2 –Bayesian Networks Representation](https://reader033.fdocuments.net/reader033/viewer/2022060408/5f0fd65c7e708231d446214b/html5/thumbnails/32.jpg)
32
TasksSubscribe to Mailing list
https://utils.its.caltech.edu/mailman/listinfo/cs155
Read Koller & Friedman Chapter 3.1-3.3
Form groups and think about class projects. If you
have difficulty finding a group, email Pete Trautman