Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)
Transcript of "Hidden Markov Models (II): Log-Likelihood (Forward Algorithm)"
CMSC 473/673, UMBC
October 11th, 2017
Course Announcement: Assignment 2
Due next Saturday, 10/21 at 11:59 PM (~10 days)
Any questions?
Recap from last time…
Expectation Maximization (EM)

0. Assume some value for your parameters.

Two-step, iterative algorithm:
1. E-step: count under uncertainty, assuming these parameters (these are the estimated counts).
2. M-step: maximize log-likelihood, assuming these uncertain counts.
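As a minimal sketch of that loop (a hypothetical skeleton, not course code; the E-step and M-step are passed in as functions):

```python
# A minimal EM skeleton; e_step and m_step are supplied callables standing in
# for the expected-count and re-estimation computations developed in lecture.
def em(data, params, e_step, m_step, n_iters=20):
    # 0. assume some value for your parameters (the initial `params`)
    for _ in range(n_iters):
        counts = e_step(data, params)   # 1. E-step: expected counts under params
        params = m_step(counts)         # 2. M-step: maximize likelihood of counts
    return params
```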
Counting Requires Marginalizing

E-step: count under uncertainty, assuming these parameters.

Break the event "w" into 4 disjoint pieces, (z1 & w), (z2 & w), (z3 & w), (z4 & w):

$$c(w) = c(z_1, w) + c(z_2, w) + c(z_3, w) + c(z_4, w) = \sum_{z=1}^{4} c(z, w)$$
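For instance, with hypothetical per-state expected counts for one word w:

```python
# Hypothetical soft (expected) counts of word w under each of the 4 states,
# e.g. as produced by an E-step.
c_zw = [0.10, 0.40, 0.25, 0.05]   # c(z, w) for z = 1..4
c_w = sum(c_zw)                   # c(w): sum over the 4 disjoint pieces
print(c_w)                        # 0.8
```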
Agenda
HMM Motivation (Part of Speech) and Brief Definition
What is Part of Speech?
HMM Detailed Definition
HMM Tasks
Parts of Speech
Adapted from Luke Zettlemoyer

Open class words:
- Nouns: milk, cat, cats, UMBC, Baltimore, bread
- Verbs: speak, give, run (intransitive, transitive, ditransitive); modals, auxiliaries: can, do, may
- Adjectives: wettest, large, happy, red, fake, would-be (subsective vs. non-subsective; Kamp & Partee 1995)
- Adverbs: recently, happily, then, there (location)

Closed class words:
- Pronouns: I, you, there
- Determiners: a, the, every, what
- Prepositions: in, under, top
- Particles: (set) up, (call) off, so (far), not
- Conjunctions: and, or, if, because
- Numbers: one, 1,324
Penn Treebank Part of Speech
SLP3: Chapter 10
http://universaldependencies.org/
Hidden Markov Models: Part of Speech
p(British Left Waffles on Falkland Islands)

(i): Adjective Noun Verb Prep Noun Noun
(ii): Noun Verb Noun Prep Noun Noun

Class-based model: $p(w_i \mid z_i)$
Bigram model of the classes: $p(z_i \mid z_{i-1})$
Model all class sequences: $\sum_{z_1, \dots, z_N} p(z_1, w_1, z_2, w_2, \dots, z_N, w_N)$
Hidden Markov Model
Goal: maximize (log-)likelihood.
In practice: we don't actually observe these z values; we just see the words w.

$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$$

If we knew the probability parameters, then we could estimate z and evaluate likelihood… but we don't! :(
If we did observe z, estimating the probability parameters would be easy… but we don't! :(
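As a small sketch (the nested-dict table layout is an assumption of mine, not the course's code), the factorization above is a single pass over a tagged sequence:

```python
# Joint probability of a tagged sequence under the HMM factorization:
# p(z, w) = prod_i p(z_i | z_{i-1}) * p(w_i | z_i), with z_0 = "start" (BOS).
def joint_prob(states, words, trans, emit):
    p, prev = 1.0, "start"
    for z, w in zip(states, words):
        p *= trans[prev][z] * emit[z][w]
        prev = z
    return p
```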
Hidden Markov Model Terminology
Each zi can take the value of one of K latent states
Transition and emission distributions do not change
Q: How many different probability values are there with K states and V vocab items?
A: V·K emission values and K² transition values. (For the running example below, K = 2 states N and V with V = 4 words gives 8 emission and 4 transition values, before adding the special start and end states.)

$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$$

(Emission probabilities/parameters: the p(w_i | z_i) factors. Transition probabilities/parameters: the p(z_i | z_{i-1}) factors.)
Hidden Markov Model Representation

$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$$

[Diagram: a chain z1 → z2 → z3 → z4 → …, each zi emitting wi; emissions p(w_i | z_i), transitions p(z_i | z_{i-1}), and an initial starting distribution p(z1 | z0) ("BOS").]

Each zi can take the value of one of K latent states.
Transition and emission distributions do not change.
Graphical Models (see 478/678)
Example: 2-state Hidden Markov Model as a Lattice

[Lattice diagram: states z_i ∈ {N, V} at each position i = 1..4, emitting w1..w4 with p(w_i | N) or p(w_i | V); transitions p(N | start) and p(V | start) into position 1, and p(N | N), p(V | N), p(N | V), p(V | V) between adjacent positions.]
Comparison of Joint Probabilities

Unigram Language Model:
$$p(w_1, w_2, \dots, w_N) = p(w_1)\, p(w_2) \cdots p(w_N) = \prod_i p(w_i)$$

Unigram Class-based Language Model ("K" coins):
$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1)\, p(w_1 \mid z_1) \cdots p(z_N)\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i)$$

Hidden Markov Model:
$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$$
Agenda
HMM Motivation (Part of Speech) and Brief Definition
What is Part of Speech?
HMM Detailed Definition
HMM Tasks
Hidden Markov Model Tasks

Calculate the (log) likelihood of an observed sequence w1, …, wN
Calculate the most likely sequence of states (for an observed sequence)
Learn the emission and transition parameters

$$p(z_1, w_1, z_2, w_2, \dots, z_N, w_N) = p(z_1 \mid z_0)\, p(w_1 \mid z_1) \cdots p(z_N \mid z_{N-1})\, p(w_N \mid z_N) = \prod_i p(w_i \mid z_i)\, p(z_i \mid z_{i-1})$$
HMM Likelihood Task

Marginalize over all latent sequence joint likelihoods:

$$p(w_1, w_2, \dots, w_N) = \sum_{z_1, \dots, z_N} p(z_1, w_1, z_2, w_2, \dots, z_N, w_N)$$
Q: In a K-state HMM for a length N observation sequence, how many summands (different latent sequences) are there?
A: $K^N$

Goal: find a way to compute this exponential sum efficiently (in polynomial time).

Like in language modeling, you need to model when to stop generating. This ending state is generally not included in "K."
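Before making this efficient, a brute-force sketch of the marginalization (my own illustration, using nested-dict tables like the ones introduced below; workable for K = 2 and N = 4, hopeless for realistic N):

```python
from itertools import product

# Enumerate all K^N latent sequences and sum their joint probabilities.
def brute_force_likelihood(words, trans, emit, states):
    total = 0.0
    for zs in product(states, repeat=len(words)):   # K^N summands
        p, prev = 1.0, "start"
        for z, w in zip(zs, words):
            p *= trans[prev][z] * emit[z][w]
            prev = z
        total += p
    return total
```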
2 (3)-State HMM Likelihood

[Lattice diagram: states N and V at each of positions 1–4, emitting w1..w4; transitions p(N | start) and p(V | start) into position 1, and transitions between N and V at each later step.]

Q: What are the latent sequences here (EOS excluded)?
A:
(N, w1), (N, w2), (N, w3), (N, w4)
(N, w1), (N, w2), (N, w3), (V, w4)
(N, w1), (N, w2), (V, w3), (N, w4)
(N, w1), (N, w2), (V, w3), (V, w4)
(N, w1), (V, w2), (N, w3), (N, w4)
(N, w1), (V, w2), (N, w3), (V, w4)
(N, w1), (V, w2), (V, w3), (N, w4)
(N, w1), (V, w2), (V, w3), (V, w4)
(V, w1), (N, w2), (N, w3), (N, w4)
(V, w1), (N, w2), (N, w3), (V, w4)
… (six more)
Now give the example concrete parameters.

Transition probabilities p(next | current):

|       | N   | V   | end |
|-------|-----|-----|-----|
| start | .7  | .2  | .1  |
| N     | .15 | .8  | .05 |
| V     | .6  | .35 | .05 |

Emission probabilities p(word | state):

|   | w1 | w2 | w3  | w4  |
|---|----|----|-----|-----|
| N | .7 | .2 | .05 | .05 |
| V | .2 | .6 | .1  | .1  |
Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4)?
A: (.7 * .7) * (.8 * .6) * (.35 * .1) * (.6 * .05) ≈ 0.000247
(each factor is p(state | previous state) * p(word | state))
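Checking that arithmetic with a quick sketch (tables as dicts; the layout is my assumption):

```python
trans = {"start": {"N": .7, "V": .2, "end": .1},
         "N":     {"N": .15, "V": .8, "end": .05},
         "V":     {"N": .6,  "V": .35, "end": .05}}
emit  = {"N": {"w1": .7, "w2": .2, "w3": .05, "w4": .05},
         "V": {"w1": .2, "w2": .6, "w3": .1,  "w4": .1}}

p, prev = 1.0, "start"
for z, w in zip(["N", "V", "V", "N"], ["w1", "w2", "w3", "w4"]):
    p *= trans[prev][z] * emit[z][w]   # p(z | prev) * p(w | z)
    prev = z
print(p)   # 0.00024696, i.e. ≈ 0.000247
```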
Q: What's the probability of (N, w1), (V, w2), (V, w3), (N, w4) with ending included (unique ending symbol "#")?
A: (.7 * .7) * (.8 * .6) * (.35 * .1) * (.6 * .05) * (.05 * 1) ≈ 0.00001235

With the end state included, the emission table gains a "#" column and an end row:

|     | w1 | w2 | w3  | w4  | # |
|-----|----|----|-----|-----|---|
| N   | .7 | .2 | .05 | .05 | 0 |
| V   | .2 | .6 | .1  | .1  | 0 |
| end | 0  | 0  | 0   | 0   | 1 |
Q: What's the probability of (N, w1), (V, w2), (N, w3), (N, w4)?
A: (.7 * .7) * (.8 * .6) * (.6 * .05) * (.15 * .05) ≈ 0.0000529
Q: What's the probability of (N, w1), (V, w2), (N, w3), (N, w4) with ending (unique symbol "#")?
A: (.7 * .7) * (.8 * .6) * (.6 * .05) * (.15 * .05) * (.05 * 1) = 0.000002646
[Two lattice diagrams side by side, one per example path: both begin (N, w1), (V, w2) and only then diverge.]
Up until here, all the computation was the same.
Let's reuse what computations we can.
Reusing Computation: Attempt #1

Follow the shared prefix, then the two continuations:

(N, w1): (.7 * .7)
(N, w1), (V, w2): (.7 * .7) * (.8 * .6)
…continuing with (V, w3): (.7 * .7) * (.8 * .6) * (.35 * .1)
…continuing with (N, w3): (.7 * .7) * (.8 * .6) * (.6 * .05)
Remember: these are only two of the 16 paths through the trellis.
Name the shared computation:

aN[2, V] = (.7 * .7) * (.8 * .6)
…continuing with (V, w3): aN[2, V] * (.35 * .1)
…continuing with (N, w3): aN[2, V] * (.6 * .05)
a[1, N] = (.7 * .7)
aN[2, V] = a[1, N] * (.8 * .6)
…continuing with (V, w3): aN[2, V] * (.35 * .1)
…continuing with (N, w3): aN[2, V] * (.6 * .05)

These aP(t, s) values represent the probability of a state s at time t in a particular shared path P.
(In aN[2, V], the path prefix P is "NV": the subscript N followed by the current state V.)
How do we handle the other 14 paths? Marginalize at each time step.
So far, we've only considered a particular shared path, "(AB) followed by *":

aAB(i, *) = aA(i-1, B) * p(* | B) * p(obs at i | *)

[Diagram: a three-state (A, B, C) trellis over positions i-2, i-1, i, highlighting a(i-2, A), then aA(i-1, B), then aAB(i, *).]

What about any of the other paths (AAB, ACB, etc.)?
Reusing Computation: Attempt #2

Let's first consider "any shared path ending with B (AB, BB, or CB), followed by B".

[Diagram: the same three-state trellis, now highlighting every edge into z_{i-1} = B and the edge from z_{i-1} = B to z_i = B.]
Marginalize across the previous hidden state values:

$$\alpha(i, B) = \sum_{s} \alpha(i-1, s) \cdot p(B \mid s) \cdot p(\text{obs at } i \mid B)$$
Compare the single-path version from Attempt #1:

aAB(i, B) = aA(i-1, B) * p(B | B) * p(obs at i | B)
Computing α at time i-1 will correctly incorporate paths through time i-2: we correctly obey the Markov property.
Forward Probability

$$\alpha(i, B) = \sum_{s'} \alpha(i-1, s') \cdot p(B \mid s') \cdot p(\text{obs at } i \mid B)$$

α(i, B) is the total probability of all paths to that state B from the beginning.
α(i, s) is the total probability of all paths:
1. that start from the beginning
2. that end (currently) in s at step i
3. that emit the observation obs at i

$$\alpha(i, s) = \sum_{s'} \alpha(i-1, s') \cdot p(s \mid s') \cdot p(\text{obs at } i \mid s)$$

Each of the N·K table cells sums over K predecessors s', so the whole computation is O(N·K²): polynomial, rather than the O(K^N) of enumerating every latent sequence.
Reading the recursion:
α(i-1, s'): what's the total probability up until now?
p(s | s'): how likely is it to get into state s this way?
the sum over s': what are the immediate ways to get into state s?
2 (3)-State HMM Likelihood with Forward Probabilities

α[1, N] = (.7 * .7)
α[2, V] = α[1, N] * (.8 * .6) + α[1, V] * (.35 * .6)
α[3, V] = α[2, V] * (.35 * .1) + α[2, N] * (.8 * .1)
α[3, N] = α[2, V] * (.6 * .05) + α[2, N] * (.15 * .05)
α[1, V] = (.2 * .2)
With numbers:
α[1, N] = (.7 * .7) = .49
α[1, V] = (.2 * .2) = .04
α[2, N] = α[1, N] * (.15 * .2) + α[1, V] * (.6 * .2) = .0195
α[2, V] = α[1, N] * (.8 * .6) + α[1, V] * (.35 * .6) = .2436
Use dynamic programming to build the α table left-to-right.
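A quick sketch checking these α values (same assumed dict tables as above; end-state entries omitted since they aren't needed until the last step):

```python
trans = {"start": {"N": .7, "V": .2}, "N": {"N": .15, "V": .8}, "V": {"N": .6, "V": .35}}
emit  = {"N": {"w1": .7, "w2": .2, "w3": .05, "w4": .05},
         "V": {"w1": .2, "w2": .6, "w3": .1,  "w4": .1}}

alpha = [{"start": 1.0}]   # time 0: all probability mass on the start state
for i, w in enumerate(["w1", "w2", "w3", "w4"], start=1):
    alpha.append({s: sum(alpha[i-1][old] * trans[old][s] * emit[s][w]
                         for old in alpha[i-1])
                  for s in ("N", "V")})
print(alpha[1])   # ≈ {'N': 0.49, 'V': 0.04}
print(alpha[2])   # ≈ {'N': 0.0195, 'V': 0.2436}
```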
Forward Algorithm

α: a 2D table of size (N+2) × K*
N+2: the number of observations (+2 for the BOS & EOS symbols)
K*: the number of states
```
α = double[N+2][K*]
α[0][*] = 1.0
for (i = 1; i ≤ N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    pobs = pemission(obsi | state)
    for (old = 0; old < K*; ++old) {
      pmove = ptransition(state | old)
      α[i][state] += α[i-1][old] * pobs * pmove
    }
  }
}
```
We still need to learn these emission and transition parameters (EM).
Q: What do we return? (How do we return the likelihood of the sequence?)
A: α[N+1][end]
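As a runnable sketch (the function name and dict tables are mine, not the course's reference code), the same algorithm in Python on the running N/V example, returning α[N+1][end]:

```python
def forward_likelihood(words, trans, emit, states):
    # alpha[i][s] = total probability of all paths ending in state s
    # after emitting the first i observations (the forward probability).
    alpha = [{"start": 1.0}]
    for i, w in enumerate(words, start=1):
        alpha.append({s: emit[s][w] * sum(alpha[i-1][old] * trans[old][s]
                                          for old in alpha[i-1])
                      for s in states})
    # Final transition into the end state ("#" emitted with probability 1).
    return sum(alpha[-1][s] * trans[s]["end"] for s in states)

trans = {"start": {"N": .7, "V": .2, "end": .1},
         "N":     {"N": .15, "V": .8, "end": .05},
         "V":     {"N": .6,  "V": .35, "end": .05}}
emit  = {"N": {"w1": .7, "w2": .2, "w3": .05, "w4": .05},
         "V": {"w1": .2, "w2": .6, "w3": .1,  "w4": .1}}

print(forward_likelihood(["w1", "w2", "w3", "w4"], trans, emit, ["N", "V"]))
# ≈ 6.54e-05: the sum over all 16 latent sequences, ending included
```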
Interactive HMM Example
https://goo.gl/rbHEoc (Jason Eisner, 2002)
Original: http://www.cs.jhu.edu/~jason/465/PowerPoint/lect24-hmm.xls
Forward Algorithm in Log-Space

```
α = double[N+2][K*]
α[0][*] = 0.0
for (i = 1; i ≤ N+1; ++i) {
  for (state = 0; state < K*; ++state) {
    pobs = log pemission(obsi | state)
    for (old = 0; old < K*; ++old) {
      pmove = log ptransition(state | old)
      α[i][state] = logadd(α[i][state],
                           α[i-1][old] + pobs + pmove)
    }
  }
}
```
where

$$\text{logadd}(la, lb) = \log\big(\exp(la) + \exp(lb)\big)$$
This can still overflow! (Why? exp(la) and exp(lb) are evaluated directly, and exp of a large-magnitude argument overflows, or underflows to 0, even when the final log(exp(la) + exp(lb)) would be perfectly representable.)
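A short Python illustration of the failure mode:

```python
import math
# With log-probabilities around -800, exp() underflows to 0.0, so the naive
# log(exp(la) + exp(lb)) hits log(0.0) and raises a domain error.
la, lb = -800.0, -801.0
try:
    print(math.log(math.exp(la) + math.exp(lb)))
except ValueError as err:
    print("naive logadd failed:", err)   # math domain error
# Symmetrically, math.exp(800.0) raises OverflowError outright.
```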
The stable fix factors out the larger argument:

$$\text{logadd}(la, lb) = \begin{cases} la + \log\big(1 + \exp(lb - la)\big) & la \ge lb \\ lb + \log\big(1 + \exp(la - lb)\big) & lb > la \end{cases}$$
In practice, use a library routine such as scipy.misc.logsumexp (relocated to scipy.special.logsumexp in current SciPy).
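A sketch of that stable logadd, cross-checked against numpy.logaddexp (which implements the same max-factoring trick):

```python
import math
import numpy as np

def logadd(la, lb):
    # Factor out the larger argument so exp() only ever sees a value <= 0.
    if la < lb:
        la, lb = lb, la
    if lb == float("-inf"):            # adding log(0): nothing to add
        return la
    return la + math.log1p(math.exp(lb - la))

print(logadd(-800.0, -801.0))          # ≈ -799.6867, no overflow/underflow
print(np.logaddexp(-800.0, -801.0))    # agrees
```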