Directed Graphical Models
![Page 1: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/1.jpg)
Directed Probabilistic Graphical Models
CMSC 678
UMBC
![Page 2: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/2.jpg)
Announcement 1: Assignment 3
Due Wednesday April 11th, 11:59 AM
Any questions?
![Page 3: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/3.jpg)
Announcement 2: Progress Report on Project
Due Monday April 16th, 11:59 AM
Build on the proposal:
- Update to address comments
- Discuss the progress you've made
- Discuss what remains to be done
- Discuss any new blocks you've experienced (or anticipate experiencing)
Any questions?
![Page 4: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/4.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 5: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/5.jpg)
Recap from last time…
![Page 6: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/6.jpg)
Expectation Maximization (EM): E-step
0. Assume some value for your parameters
Two step, iterative algorithm
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
(figure: the E-step computes estimated counts $\mathrm{count}(z_i, w_i)$ under the current model $p^{(t)}(z)$; the M-step produces updated parameters $p^{(t+1)}(z)$ from those estimated counts)
![Page 7: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/7.jpg)
EM Math
$\max_\theta \; \mathbb{E}_{z \sim p_{\theta^{(t)}}(\cdot \mid w)} \left[ \log p_\theta(z, w) \right]$

E-step: count under uncertainty
M-step: maximize log-likelihood

($\theta^{(t)}$: old parameters; $\theta$: new parameters; $p_{\theta^{(t)}}(\cdot \mid w)$: posterior distribution)

$\mathcal{L}(\theta)$ = log-likelihood of complete data (X, Y)
$\mathcal{P}(\theta)$ = posterior log-likelihood of incomplete data Y
$\mathcal{M}(\theta)$ = marginal log-likelihood of observed data X

$\mathcal{M}(\theta) = \mathbb{E}_{Y \sim p^{(t)}}[\mathcal{L}(\theta) \mid X] - \mathbb{E}_{Y \sim p^{(t)}}[\mathcal{P}(\theta) \mid X]$
EM does not decrease the marginal log-likelihood
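The decomposition above can be checked in one step; a minimal sketch (using the slide's notation, with X observed and Y hidden):

```latex
% chain rule: p(X, Y | \theta) = p(Y | X, \theta) \, p(X | \theta), so
\log p(X \mid \theta) \;=\; \log p(X, Y \mid \theta) \;-\; \log p(Y \mid X, \theta)
% the left side does not depend on Y, so taking expectations over Y \sim p^{(t)}(\cdot \mid X):
\mathcal{M}(\theta)
  \;=\; \mathbb{E}_{Y \sim p^{(t)}}[\mathcal{L}(\theta) \mid X]
  \;-\; \mathbb{E}_{Y \sim p^{(t)}}[\mathcal{P}(\theta) \mid X]
```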
![Page 8: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/8.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 9: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/9.jpg)
Assume an original optimization problem
Lagrange multipliers
![Page 10: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/10.jpg)
Assume an original optimization problem
We convert it to a new optimization problem:
Lagrange multipliers
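The converted problem itself lives in the slide images; as a reminder of the standard construction (not copied from the slides), for a single equality constraint it looks like:

```latex
% original problem:  \max_x f(x) \quad \text{s.t.} \quad g(x) = 0
% converted problem: introduce a multiplier \lambda and work with the Lagrangian
\mathcal{F}(x, \lambda) \;=\; f(x) \;-\; \lambda \, g(x)
% stationary points: \partial \mathcal{F} / \partial x = 0, and
% \partial \mathcal{F} / \partial \lambda = -g(x) = 0 recovers the constraint
```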
![Page 11: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/11.jpg)
Lagrange multipliers: an equivalent problem?
![Page 12: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/12.jpg)
Lagrange multipliers: an equivalent problem?
![Page 13: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/13.jpg)
Lagrange multipliers: an equivalent problem?
![Page 14: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/14.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 15: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/15.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls: $w_1 = 1$, $w_2 = 5$, $w_3 = 4$, ...

Generative Story: for roll $i = 1$ to $N$: $w_i \sim \mathrm{Cat}(\theta)$

$\theta$ is a probability distribution over the 6 sides of the die: $\sum_{k=1}^{6} \theta_k = 1$ and $0 \le \theta_k \le 1\ \forall k$
![Page 16: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/16.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls: $w_1 = 1$, $w_2 = 5$, $w_3 = 4$, ...

Generative Story: for roll $i = 1$ to $N$: $w_i \sim \mathrm{Cat}(\theta)$

Maximize Log-likelihood:
$\ell(\theta) = \sum_i \log p_\theta(w_i) = \sum_i \log \theta_{w_i}$
![Page 17: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/17.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Generative Story: for roll $i = 1$ to $N$: $w_i \sim \mathrm{Cat}(\theta)$

Maximize Log-likelihood:
$\ell(\theta) = \sum_i \log \theta_{w_i}$

Q: What's an easy way to maximize this, as written exactly (even without calculus)?
![Page 18: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/18.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Generative Story: for roll $i = 1$ to $N$: $w_i \sim \mathrm{Cat}(\theta)$

Maximize Log-likelihood:
$\ell(\theta) = \sum_i \log \theta_{w_i}$

Q: What's an easy way to maximize this, as written exactly (even without calculus)?
A: Just keep increasing $\theta_k$ (we know $\theta$ must be a distribution, but that's not specified in the objective as written)
![Page 19: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/19.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Maximize Log-likelihood (with distribution constraints):
$\ell(\theta) = \sum_i \log \theta_{w_i} \quad \text{s.t.} \quad \sum_{k=1}^{6} \theta_k = 1$

(we can include the inequality constraints $0 \le \theta_k$, but it complicates the problem and, right now, is not needed)

solve using Lagrange multipliers
![Page 20: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/20.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Maximize Log-likelihood (with distribution constraints):
$\mathcal{F}(\theta) = \sum_i \log \theta_{w_i} - \lambda \left( \sum_{k=1}^{6} \theta_k - 1 \right)$

(we can include the inequality constraints $0 \le \theta_k$, but it complicates the problem and, right now, is not needed)

$\frac{\partial \mathcal{F}(\theta)}{\partial \theta_k} = \sum_{i:\, w_i = k} \frac{1}{\theta_{w_i}} - \lambda
\qquad
\frac{\partial \mathcal{F}(\theta)}{\partial \lambda} = -\sum_{k=1}^{6} \theta_k + 1$
![Page 21: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/21.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Maximize Log-likelihood (with distribution constraints):
$\mathcal{F}(\theta) = \sum_i \log \theta_{w_i} - \lambda \left( \sum_{k=1}^{6} \theta_k - 1 \right)$

(we can include the inequality constraints $0 \le \theta_k$, but it complicates the problem and, right now, is not needed)

$\theta_k = \frac{\sum_{i:\, w_i = k} 1}{\lambda}$, with the optimal $\lambda$ the one for which $\sum_{k=1}^{6} \theta_k = 1$
![Page 22: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/22.jpg)
Probabilistic Estimation of Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

N different (independent) rolls

Maximize Log-likelihood (with distribution constraints):
$\mathcal{F}(\theta) = \sum_i \log \theta_{w_i} - \lambda \left( \sum_{k=1}^{6} \theta_k - 1 \right)$

(we can include the inequality constraints $0 \le \theta_k$, but it complicates the problem and, right now, is not needed)

$\theta_k = \frac{\sum_{i:\, w_i = k} 1}{\sum_{k'} \sum_{i:\, w_i = k'} 1} = \frac{n_k}{N}$, with the optimal $\lambda$ the one for which $\sum_{k=1}^{6} \theta_k = 1$
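A minimal Python sketch of this closed-form answer (the function name and sample rolls are illustrative, not from the slides):

```python
from collections import Counter

def die_mle(rolls, sides=6):
    """theta_k = n_k / N: the count of side k over the total number of rolls."""
    counts = Counter(rolls)
    n = len(rolls)
    return [counts.get(k, 0) / n for k in range(1, sides + 1)]

theta = die_mle([1, 5, 4, 1, 6, 1, 3])
print(theta)                          # theta_1 = 3/7, theta_2 = 0, ...
assert abs(sum(theta) - 1.0) < 1e-12  # the equality constraint holds by construction
```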
![Page 23: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/23.jpg)
Example: Conditionally Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1)\, p(w_1 \mid z_1) \cdots p(z_N)\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i)$

add complexity to better explain what we see

$w_1 = 1$, $w_2 = 5$, ...; $z_1 = H$, $z_2 = T$, ...

penny: $p(\text{heads}) = \lambda$, $p(\text{tails}) = 1 - \lambda$
dollar coin: $p(\text{heads}) = \gamma$, $p(\text{tails}) = 1 - \gamma$
dime: $p(\text{heads}) = \psi$, $p(\text{tails}) = 1 - \psi$
![Page 24: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/24.jpg)
Example: Conditionally Rolling a Die
$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$

$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1)\, p(w_1 \mid z_1) \cdots p(z_N)\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i)$

add complexity to better explain what we see

penny: $p(\text{heads}) = \lambda$, $p(\text{tails}) = 1 - \lambda$
dollar coin: $p(\text{heads}) = \gamma$, $p(\text{tails}) = 1 - \gamma$
dime: $p(\text{heads}) = \psi$, $p(\text{tails}) = 1 - \psi$

Generative Story:
for item $i = 1$ to $N$:
  $z_i \sim \mathrm{Bernoulli}(\lambda)$
  if $z_i = H$: $w_i \sim \mathrm{Bernoulli}(\gamma)$
  else: $w_i \sim \mathrm{Bernoulli}(\psi)$

$\lambda$ = distribution over the penny; $\gamma$ = distribution for the dollar coin; $\psi$ = distribution over the dime
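A small Python sketch of this generative story (the parameter values are made up for illustration; 'H'/'T' strings stand in for the coin outcomes):

```python
import random

def sample_items(n, lam=0.5, gamma=0.7, psi=0.3):
    """Flip the penny (prob. lam of heads); if heads, flip the dollar
    coin (gamma), otherwise flip the dime (psi)."""
    items = []
    for _ in range(n):
        z = 'H' if random.random() < lam else 'T'
        p_heads = gamma if z == 'H' else psi
        w = 'H' if random.random() < p_heads else 'T'
        items.append((z, w))
    return items

print(sample_items(5))  # e.g. [('H', 'H'), ('T', 'T'), ...]
```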
![Page 25: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/25.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 26: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/26.jpg)
Classify with Bayes Rule
$\operatorname{argmax}_Y \; p(Y \mid X) = \operatorname{argmax}_Y \; \log p(X \mid Y) + \log p(Y)$

(the first term is the likelihood, the second the prior)
![Page 27: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/27.jpg)
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
![Page 28: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/28.jpg)
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
![Page 29: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/29.jpg)
The Bag of Words Representation
Adapted from Jurafsky & Martin (draft)
![Page 30: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/30.jpg)
Bag of Words Representation
(figure: a classifier γ maps a document to a class c after the document is reduced to a bag of word counts, e.g. seen 2, sweet 1, whimsical 1, recommend 1, happy 1, ...)
Adapted from Jurafsky & Martin (draft)
![Page 31: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/31.jpg)
Naïve Bayes: A Generative Story

Generative Story (global parameters):
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = generate parameters
![Page 32: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/32.jpg)
Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = generate parameters
for item $i = 1$ to $N$:
  $y_i \sim \mathrm{Cat}(\theta)$
![Page 33: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/33.jpg)
Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = generate parameters
for item $i = 1$ to $N$ (local variables):
  $y_i \sim \mathrm{Cat}(\theta)$
  for each feature $j$: $x_{ij} \sim F_j(\phi_{y_i})$

(graph: label node $y$ with feature children $x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}$)
![Page 34: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/34.jpg)
Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = generate parameters
for item $i = 1$ to $N$:
  $y_i \sim \mathrm{Cat}(\theta)$
  for each feature $j$: $x_{ij} \sim F_j(\phi_{y_i})$

(graph: label node $y$ with feature children $x_{i1}, x_{i2}, x_{i3}, x_{i4}, x_{i5}$)

each $x_{ij}$ is conditionally independent of one another (given the label)
![Page 35: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/35.jpg)
Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = generate parameters
for item $i = 1$ to $N$:
  $y_i \sim \mathrm{Cat}(\theta)$
  for each feature $j$: $x_{ij} \sim F_j(\phi_{y_i})$

Maximize Log-likelihood:
$\ell(\theta, \phi) = \sum_i \sum_j \log F_j(x_{ij};\, \phi_{y_i}) + \sum_i \log \theta_{y_i}
\quad \text{s.t.} \quad \sum_k \theta_k = 1, \;\; \phi_k \text{ is valid for } F_j$
![Page 36: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/36.jpg)
Multinomial Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = distribution over $J$ feature values
for item $i = 1$ to $N$:
  $y_i \sim \mathrm{Cat}(\theta)$
  for each feature $j$: $x_{ij} \sim \mathrm{Cat}(\phi_{y_i})$

Maximize Log-likelihood:
$\ell(\theta, \phi) = \sum_i \sum_j \log \phi_{y_i, x_{ij}} + \sum_i \log \theta_{y_i}
\quad \text{s.t.} \quad \sum_k \theta_k = 1, \;\; \sum_j \phi_{kj} = 1 \;\forall k$
![Page 37: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/37.jpg)
Multinomial Naïve Bayes: A Generative Story

Generative Story:
$\theta$ = distribution over $K$ labels
for label $k = 1$ to $K$:
  $\phi_k$ = distribution over $J$ feature values
for item $i = 1$ to $N$:
  $y_i \sim \mathrm{Cat}(\theta)$
  for each feature $j$: $x_{ij} \sim \mathrm{Cat}(\phi_{y_i})$

Maximize Log-likelihood via Lagrange Multipliers:
$\mathcal{F}(\theta, \phi) = \sum_i \sum_j \log \phi_{y_i, x_{ij}} + \sum_i \log \theta_{y_i}
- \lambda \left( \sum_k \theta_k - 1 \right)
- \sum_k \mu_k \left( \sum_j \phi_{kj} - 1 \right)$
![Page 38: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/38.jpg)
Multinomial Naïve Bayes: Learning

Calculate class priors:
For each k:
  items_k = all items with class = k
  $p(k) = \frac{|\text{items}_k|}{\#\text{ items}}$

Calculate feature generation terms:
For each k:
  obs_k = single object containing all items labeled as k
  For each feature j:
    n_kj = # of occurrences of j in obs_k
  $p(j \mid k) = \frac{n_{kj}}{\sum_{j'} n_{kj'}}$
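The same procedure as a Python sketch (names and the toy data are illustrative; smoothing is omitted, as on the slide):

```python
from collections import Counter, defaultdict

def train_multinomial_nb(items):
    """items: list of (feature_list, label). Returns p(k) and p(j | k)."""
    n_items = len(items)
    priors = {k: c / n_items
              for k, c in Counter(label for _, label in items).items()}

    n_kj = defaultdict(Counter)          # occurrences of feature j in class k
    for features, label in items:
        n_kj[label].update(features)

    likelihoods = {k: {j: c / sum(counts.values())
                       for j, c in counts.items()}
                   for k, counts in n_kj.items()}
    return priors, likelihoods

priors, likelihoods = train_multinomial_nb(
    [(["fun", "couple", "love", "love"], "comedy"),
     (["fast", "furious", "shoot"], "action")])
print(priors["comedy"], likelihoods["comedy"]["love"])  # 0.5 0.5
```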
![Page 39: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/39.jpg)
Brill and Banko (2001)
With enough data, the classifier may not matter
Adapted from Jurafsky & Martin (draft)
![Page 40: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/40.jpg)
Summary: Naïve Bayes is Not So Naïve, but not without issue

Pro:
- Very fast, low storage requirements
- Robust to irrelevant features
- Very good in domains with many equally important features
- Optimal if the independence assumptions hold
- Dependable baseline for text classification (but often not the best)

Con:
- Model the posterior in one go? (e.g., use conditional maxent)
- Are the features really uncorrelated?
- Are plain counts always appropriate?
- Are there "better" ways of handling missing/noisy data? (automated, more principled)
Adapted from Jurafsky & Martin (draft)
![Page 41: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/41.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 42: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/42.jpg)
Hidden Markov Models
p(British Left Waffles on Falkland Islands)

Two possible class (tag) sequences:
(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model; bigram model of the classes; model all class sequences:

$\sum_{z_1, \dots, z_N} p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = \sum_{z_1, \dots, z_N} \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$
![Page 43: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/43.jpg)
Hidden Markov Model
Goal: maximize (log-)likelihood
In practice: we don't actually observe these z values; we just see the words w

$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$
![Page 44: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/44.jpg)
Hidden Markov Model
Goal: maximize (log-)likelihood
In practice: we don't actually observe these z values; we just see the words w

$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

if we knew the probability parameters, then we could estimate z and evaluate likelihood… but we don't! :(
if we did observe z, estimating the probability parameters would be easy… but we don't! :(
![Page 45: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/45.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$
![Page 46: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/46.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

transition probabilities/parameters: $p(z_i \mid z_{i-1})$
![Page 47: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/47.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$
![Page 48: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/48.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$
![Page 49: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/49.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$
Q: How many different probability values are there with K states and V vocab items?
![Page 50: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/50.jpg)
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$
Q: How many different probability values are there with K states and V vocab items?
A: $VK$ emission values and $K^2$ transition values
![Page 51: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/51.jpg)
Hidden Markov Model Representation
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$

represent the probabilities and independence assumptions in a graph:
(graph: a chain z1 → z2 → z3 → z4 → ..., with each z_i emitting w_i)
![Page 52: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/52.jpg)
Hidden Markov Model Representation
$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$

emission probabilities/parameters: $p(w_i \mid z_i)$; transition probabilities/parameters: $p(z_i \mid z_{i-1})$

(graph: the chain z1 → z2 → z3 → z4 → ..., with transition arcs labeled $p(z_1 \mid z_0), p(z_2 \mid z_1), p(z_3 \mid z_2), p(z_4 \mid z_3)$ and emission arcs labeled $p(w_1 \mid z_1), \dots, p(w_4 \mid z_4)$)

$p(z_1 \mid z_0)$ is the initial starting distribution

Each z_i can take the value of one of K latent states
Transition and emission distributions do not change
![Page 53: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/53.jpg)
Example: 2-state Hidden Markov Model as a Lattice
(lattice: each position i = 1..4 has states z_i = N and z_i = V, emitting w_1 ... w_4)

Transition probabilities:

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities:

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
![Page 54: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/54.jpg)
Example: 2-state Hidden Markov Model as a Lattice
(lattice: the same two-state lattice, with emission arcs labeled $p(w_1 \mid N), \dots, p(w_4 \mid N)$ and $p(w_1 \mid V), \dots, p(w_4 \mid V)$)

Transition probabilities:

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities:

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
![Page 55: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/55.jpg)
Example: 2-state Hidden Markov Model as a Lattice
(lattice: as above, now also with transition arcs labeled $p(N \mid \text{start})$, $p(N \mid N)$, ..., $p(V \mid \text{start})$, $p(V \mid V)$, ...)

Transition probabilities:

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities:

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
![Page 56: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/56.jpg)
Example: 2-state Hidden Markov Model as a Lattice
(lattice: as above, with all transition arcs between N and V at adjacent positions labeled)

Transition probabilities:

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities:

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
![Page 57: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/57.jpg)
A Latent Sequence is a Path through the Graph
(lattice: the path (N, w1) → (V, w2) → (V, w3) → (N, w4) highlighted, with arcs $p(N \mid \text{start})$, $p(V \mid N)$, $p(V \mid V)$, $p(N \mid V)$)

Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4)?
A: (.7 * .7) * (.8 * .6) * (.35 * .1) * (.6 * .05) ≈ 0.000247

Transition probabilities:

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities:

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
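A quick Python check of this path probability, using the two tables above (multiplying out the factors shown gives about 0.000247):

```python
trans = {('start', 'N'): .7, ('start', 'V'): .2,
         ('N', 'N'): .15, ('N', 'V'): .8,
         ('V', 'N'): .6, ('V', 'V'): .35}
emit = {('N', 'w1'): .7, ('V', 'w2'): .6,
        ('V', 'w3'): .1, ('N', 'w4'): .05}

p, prev = 1.0, 'start'
for z, w in zip(['N', 'V', 'V', 'N'], ['w1', 'w2', 'w3', 'w4']):
    p *= trans[(prev, z)] * emit[(z, w)]   # transition into z, then emission of w
    prev = z
print(p)  # ≈ 0.00024696
```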
![Page 58: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/58.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 59: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/59.jpg)
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number βoneβ to the soldier behind you.
If you are the rearmost soldier in the line, say the number βoneβ to the soldier in front of you.
If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
![Page 60: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/60.jpg)
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number βoneβ to the soldier behind you.
If you are the rearmost soldier in the line, say the number βoneβ to the soldier in front of you.
If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
![Page 61: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/61.jpg)
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number βoneβ to the soldier behind you.
If you are the rearmost soldier in the line, say the number βoneβ to the soldier in front of you.
If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
![Page 62: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/62.jpg)
Message Passing: Count the Soldiers
If you are the front soldier in the line, say the number βoneβ to the soldier behind you.
If you are the rearmost soldier in the line, say the number βoneβ to the soldier in front of you.
If a soldier ahead of or behind you says a number to you, add one to it, and say the new number to the soldier on the other side
ITILA, Ch 16
![Page 63: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/63.jpg)
What's the Maximum Weighted Path?

(figure: a small directed graph with numeric edge weights)
![Page 64: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/64.jpg)
What's the Maximum Weighted Path?

(figure: the same weighted graph)
![Page 65: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/65.jpg)
What's the Maximum Weighted Path?

(figure: the same graph; the first messages add +3, giving accumulated values 10 and 7 at the next nodes)
![Page 66: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/66.jpg)
What's the Maximum Weighted Path?

(figure: the same graph; the first messages add +3, giving accumulated values 10 and 7 at the next nodes)
![Page 67: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/67.jpg)
What's the Maximum Weighted Path?

(figure: the same graph; the accumulated value 10 is passed forward (+10) to score the nodes in the next layer)
![Page 68: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/68.jpg)
What's the Maximum Weighted Path?

(figure: the same graph; the accumulated value 10 is passed forward (+10) to score the nodes in the next layer)
![Page 69: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/69.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 70: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/70.jpg)
What's the Maximum Value?

consider any shared path ending with B (via arc AB, BB, or CB)
maximize across the previous hidden state values:

$v(i, B) = \max_{s'} \; v(i-1, s') \cdot p(B \mid s') \cdot p(\text{obs at } i \mid B)$

v(i, B) is the maximum probability of any path to that state B from the beginning (and emitting the observation)

(lattice: states A, B, C at step i-1 feeding states A, B, C at step i)
![Page 71: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/71.jpg)
What's the Maximum Value?

consider any shared path ending with B (via arc AB, BB, or CB)
maximize across the previous hidden state values:

$v(i, B) = \max_{s'} \; v(i-1, s') \cdot p(B \mid s') \cdot p(\text{obs at } i \mid B)$

v(i, B) is the maximum probability of any path to that state B from the beginning (and emitting the observation)

(lattice: states A, B, C at steps i-2, i-1, and i)
![Page 72: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/72.jpg)
What's the Maximum Value?

consider any shared path ending with B (via arc AB, BB, or CB)
maximize across the previous hidden state values:

$v(i, B) = \max_{s'} \; v(i-1, s') \cdot p(B \mid s') \cdot p(\text{obs at } i \mid B)$

v(i, B) is the maximum probability of any path to that state B from the beginning (and emitting the observation)

(lattice: states A, B, C at steps i-2, i-1, and i)

computing v at time i-1 will correctly incorporate (maximize over) paths through time i-2: we correctly obey the Markov property

this is the Viterbi algorithm
![Page 73: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/73.jpg)
Viterbi Algorithm

v = double[N+2][K*]
b = int[N+2][K*]   (backpointers/book-keeping)
v[*][*] = 0
v[0][START] = 1
for(i = 1; i ≤ N+1; ++i) {
  for(state = 0; state < K*; ++state) {
    pobs = pemission(obs_i | state)
    for(old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      if(v[i-1][old] * pobs * pmove > v[i][state]) {
        v[i][state] = v[i-1][old] * pobs * pmove
        b[i][state] = old
      }
    }
  }
}
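A runnable Python version of the pseudocode above (a sketch: the dict-based table layout is an assumption, not from the slides):

```python
def viterbi(obs, states, p_trans, p_emit, start='start'):
    """p_trans[prev][state] and p_emit[state][obs] are nested dicts of probabilities."""
    v = [{start: 1.0}]                 # v[i][state]: best probability of any path so far
    back = [{}]                        # back[i][state]: best previous state
    for i, o in enumerate(obs, 1):
        v.append({}); back.append({})
        for s in states:
            scores = {old: v[i-1][old] * p_trans[old].get(s, 0.0)
                      for old in v[i-1]}
            best = max(scores, key=scores.get)
            v[i][s] = scores[best] * p_emit[s].get(o, 0.0)
            back[i][s] = best
    last = max(v[-1], key=v[-1].get)   # best final state
    path = [last]
    for i in range(len(obs), 1, -1):   # follow backpointers
        path.append(back[i][path[-1]])
    return list(reversed(path)), v[-1][last]

# with the earlier lattice's tables (as nested dicts), this recovers the best
# N/V tag sequence: viterbi(['w1','w2','w3','w4'], ['N','V'], p_trans, p_emit)
```

This sketch stops at the last observation; folding in the end-state transition (the pseudocode's N+1 step) is a one-line change.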
![Page 74: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/74.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 75: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/75.jpg)
Forward Probability
α(i, B) is the total probability of all paths to that state B from the beginning

(lattice: states A, B, C at steps i-2, i-1, and i)
![Page 76: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/76.jpg)
Forward Probability
marginalize across the previous hidden state values:

$\alpha(i, B) = \sum_{s'} \alpha(i-1, s') \cdot p(B \mid s') \cdot p(\text{obs at } i \mid B)$

computing α at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property

α(i, B) is the total probability of all paths to that state B from the beginning

(lattice: states A, B, C at steps i-2, i-1, and i)
![Page 77: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/77.jpg)
Forward Probability
α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

$\alpha(i, s) = \sum_{s'} \alpha(i-1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$
![Page 78: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/78.jpg)
Forward Probability
α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

$\alpha(i, s) = \sum_{s'} \alpha(i-1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$

reading the recurrence: α(i-1, s') is the total probability up until now; p(s | s') covers the immediate ways to get into state s; p(obs at i | s) is how likely it is to get into state s this way
Forward Probability
α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

$\alpha(i, s) = \sum_{s'} \alpha(i-1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$

Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][END]
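The same recurrence in Python; it mirrors the Viterbi sketch with max replaced by a sum (again a sketch with an assumed table layout):

```python
def forward(obs, states, p_trans, p_emit, start='start', end='end'):
    """alpha[i][s]: total probability of all paths ending in s at step i."""
    alpha = [{start: 1.0}]
    for i, o in enumerate(obs, 1):
        alpha.append({})
        for s in states:
            alpha[i][s] = sum(alpha[i-1][old] * p_trans[old].get(s, 0.0)
                              for old in alpha[i-1]) * p_emit[s].get(o, 0.0)
    # the "alpha[N+1][END]" step: total sequence likelihood via the end transition
    return sum(alpha[-1][s] * p_trans[s].get(end, 0.0) for s in states)
```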
![Page 80: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/80.jpg)
Outline
- Recap of EM
- Math: Lagrange Multipliers for constrained optimization
- Probabilistic Modeling Example: Die Rolling
- Directed Graphical Models: Naïve Bayes, Hidden Markov Models
- Message Passing: Directed Graphical Model Inference
  - Most likely sequence
  - Total (marginal) probability
  - EM in D-PGMs
![Page 81: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/81.jpg)
Forward & Backward Message Passing
(lattice: states A, B, C at steps i-1, i, and i+1)

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

β(i, s) is the total probability of all paths:
1. that start at step i at state s
2. that terminate at the end
3. (that emit the observation obs at i+1)
![Page 82: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/82.jpg)
Forward & Backward Message Passing
(lattice: states A, B, C at steps i-1, i, and i+1, with α(i, B) covering the portion to the left of step i and β(i, B) the portion to the right)

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

β(i, s) is the total probability of all paths:
1. that start at step i at state s
2. that terminate at the end
3. (that emit the observation obs at i+1)
![Page 83: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/83.jpg)
Forward & Backward Message Passing
(lattice: states A, B, C at steps i-1, i, and i+1, with α(i, B) covering the portion to the left of step i and β(i, B) the portion to the right)

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

β(i, s) is the total probability of all paths:
1. that start at step i at state s
2. that terminate at the end
3. (that emit the observation obs at i+1)

α(i, B) * β(i, B) = total probability of paths through state B at step i
![Page 84: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/84.jpg)
Forward & Backward Message Passing
(lattice: states A, B, C at steps i-1, i, and i+1, with α(i, B) and β(i+1, s) marked)

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

β(i, s) is the total probability of all paths:
1. that start at step i at state s
2. that terminate at the end
3. (that emit the observation obs at i+1)
![Page 85: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/85.jpg)
Forward & Backward Message Passing
(lattice: states A, B, C at steps i-1, i, and i+1, with α(i, B) and β(i+1, s′) marked)

α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

β(i, s) is the total probability of all paths:
1. that start at step i at state s
2. that terminate at the end
3. (that emit the observation obs at i+1)

α(i, B) * p(s′ | B) * p(obs at i+1 | s′) * β(i+1, s′) = total probability of paths through the B → s′ arc (at time i)
![Page 86: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/86.jpg)
With Both Forward and Backward Values
α(i, s) * p(s′ | s) * p(obs at i+1 | s′) * β(i+1, s′) = total probability of paths through the s → s′ arc (at time i)
α(i, s) * β(i, s) = total probability of paths through state s at step i

$p(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, \beta(i, s)}{\alpha(N+1, \mathrm{END})}$

$p(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(\mathrm{obs}_{i+1} \mid s')\, \beta(i+1, s')}{\alpha(N+1, \mathrm{END})}$
![Page 87: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/87.jpg)
Expectation Maximization (EM)
0. Assume some value for your parameters
Two step, iterative algorithm
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
estimated counts
![Page 88: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/88.jpg)
Expectation Maximization (EM)
0. Assume some value for your parameters
Two step, iterative algorithm
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
estimated counts
pobs(w | s)
ptrans(s′ | s)
![Page 89: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/89.jpg)
Expectation Maximization (EM)
0. Assume some value for your parameters
Two step, iterative algorithm
1. E-step: count under uncertainty, assuming these parameters
2. M-step: maximize log-likelihood, assuming these uncertain counts
estimated counts
pobs(w | s)
ptrans(s′ | s)

$\hat{p}(z_i = s \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, \beta(i, s)}{\alpha(N+1, \mathrm{END})}$

$\hat{p}(z_i = s, z_{i+1} = s' \mid w_1, \dots, w_N) = \frac{\alpha(i, s)\, p(s' \mid s)\, p(\mathrm{obs}_{i+1} \mid s')\, \beta(i+1, s')}{\alpha(N+1, \mathrm{END})}$
![Page 90: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/90.jpg)
M-Step
"maximize log-likelihood, assuming these uncertain counts"

$p_{\text{new}}(s' \mid s) = \frac{c(s \to s')}{\sum_x c(s \to x)}$

if we observed the hidden transitions…
![Page 91: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/91.jpg)
M-Step
"maximize log-likelihood, assuming these uncertain counts"

$p_{\text{new}}(s' \mid s) = \frac{\mathbb{E}[c(s \to s')]}{\sum_x \mathbb{E}[c(s \to x)]}$

we don't observe the hidden transitions, but we can approximately count
![Page 92: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/92.jpg)
M-Step
"maximize log-likelihood, assuming these uncertain counts"

$p_{\text{new}}(s' \mid s) = \frac{\mathbb{E}[c(s \to s')]}{\sum_x \mathbb{E}[c(s \to x)]}$

we don't observe the hidden transitions, but we can approximately count
we compute these expected counts in the E-step, with our α and β values
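As a Python sketch, the M-step is then just a normalization of the expected counts accumulated in the E-step (the dict layout and sample numbers are illustrative):

```python
def m_step(expected_counts):
    """expected_counts[s][s2] holds E[c(s -> s2)]; returns p_new(s2 | s)."""
    return {s: {s2: c / sum(row.values()) for s2, c in row.items()}
            for s, row in expected_counts.items()}

p_new = m_step({'N': {'N': 1.2, 'V': 2.8}, 'V': {'N': 0.9, 'V': 0.1}})
print(p_new['N']['V'])  # 2.8 / 4.0 = 0.7
```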
![Page 93: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/93.jpg)
EM For HMMs (Baum-Welch Algorithm)

α = computeForwards()
β = computeBackwards()
L = α[N+1][END]
for(i = N; i ≥ 0; --i) {
  for(next = 0; next < K*; ++next) {
    cobs(obs_{i+1} | next) += α[i+1][next] * β[i+1][next] / L
    for(state = 0; state < K*; ++state) {
      u = pobs(obs_{i+1} | next) * ptrans(next | state)
      ctrans(next | state) += α[i][state] * u * β[i+1][next] / L
    }
  }
}
![Page 94: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/94.jpg)
Bayesian Networks: Directed Acyclic Graphs

$p(x_1, x_2, x_3, \dots, x_N) = \prod_i p(x_i \mid \pi(x_i))$

$\pi(x_i)$ = "parents of" $x_i$; the product follows a topological sort of the graph
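A small Python sketch of evaluating this factorization (the graph, its tables, and the function name are all illustrative):

```python
def joint_prob(assignment, parents, cpt):
    """assignment: {var: value}; parents: {var: tuple of parent vars};
    cpt[var][parent_values][value] = p(value | parents)."""
    p = 1.0
    for var, val in assignment.items():
        parent_vals = tuple(assignment[q] for q in parents[var])
        p *= cpt[var][parent_vals][val]
    return p

parents = {'rain': (), 'wet': ('rain',)}
cpt = {'rain': {(): {True: 0.3, False: 0.7}},
       'wet': {(True,): {True: 0.9, False: 0.1},
               (False,): {True: 0.2, False: 0.8}}}
print(joint_prob({'rain': True, 'wet': True}, parents, cpt))  # 0.3 * 0.9 = 0.27
```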
![Page 95: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/95.jpg)
Bayesian Networks: Directed Acyclic Graphs

$p(x_1, x_2, x_3, \dots, x_N) = \prod_i p(x_i \mid \pi(x_i))$

exact inference in general DAGs is NP-hard
inference in trees can be exact
![Page 96: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/96.jpg)
D-Separation: Testing for Conditional Independence

Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.

d-separation: X & Y are d-separated if for all paths P, one of the following is true:
- P has a chain with an observed middle node
- P has a fork with an observed parent node
- P includes a "v-structure" or "collider" with all unobserved descendants

(figure: a chain X → Z → Y, a fork X ← Z → Y, and a collider X → Z ← Y)
![Page 97: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/97.jpg)
D-Separation: Testing for Conditional Independence

Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.

d-separation: X & Y are d-separated if for all paths P, one of the following is true:
- P has a chain with an observed middle node (chain X → Z → Y: observing Z blocks the path from X to Y)
- P has a fork with an observed parent node (fork X ← Z → Y: observing Z blocks the path from X to Y)
- P includes a "v-structure" or "collider" with all unobserved descendants (collider X → Z ← Y: not observing Z blocks the path from X to Y)
![Page 98: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/98.jpg)
D-Separation: Testing for Conditional Independence

Variables X & Y are conditionally independent given Z if all (undirected) paths from (any variable in) X to (any variable in) Y are d-separated by Z.

d-separation: X & Y are d-separated if for all paths P, one of the following is true:
- P has a chain with an observed middle node (chain X → Z → Y: observing Z blocks the path from X to Y)
- P has a fork with an observed parent node (fork X ← Z → Y: observing Z blocks the path from X to Y)
- P includes a "v-structure" or "collider" with all unobserved descendants (collider X → Z ← Y: not observing Z blocks the path from X to Y)

For the collider case: $p(x, y, z) = p(x)\, p(y)\, p(z \mid x, y)$, so marginalizing out the unobserved $z$ gives
$p(x, y) = \sum_z p(x)\, p(y)\, p(z \mid x, y) = p(x)\, p(y)$
![Page 99: Directed Graphical Models](https://reader035.fdocuments.net/reader035/viewer/2022071913/62d5945c7f1f78292c6801ae/html5/thumbnails/99.jpg)
Markov Blanket

The Markov blanket of a node x is its parents, children, and children's parents: the set of nodes needed to form the complete conditional for a variable $x_i$.

$p(x_i \mid x_{-i}) = \frac{p(x_1, \dots, x_N)}{\int p(x_1, \dots, x_N)\, dx_i} = \frac{\prod_j p(x_j \mid \pi(x_j))}{\int \prod_j p(x_j \mid \pi(x_j))\, dx_i}$ (factorization of the graph)

Factoring out terms not dependent on $x_i$:

$p(x_i \mid x_{-i}) = \frac{\prod_{j:\, j = i \text{ or } i \in \pi(x_j)} p(x_j \mid \pi(x_j))}{\int \prod_{j:\, j = i \text{ or } i \in \pi(x_j)} p(x_j \mid \pi(x_j))\, dx_i}$