FSA and HMM
Definition of FSA
A FSA is a tuple (Q, Σ, δ, I, F):
• Q: a finite set of states
• Σ: a finite set of input symbols
• I ⊆ Q: the set of initial states
• F ⊆ Q: the set of final states
• δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states
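The definition above can be sketched in a few lines of Python, representing δ as a set of (state, symbol, state) triples. The state names and the tiny transition set are illustrative, not from the slides, and ε-arcs are omitted for brevity.

```python
# A minimal FSA sketch: delta is a set of (state, symbol, state) triples.
def accepts(string, delta, initial, final):
    """Return True iff the FSA accepts the string (no epsilon arcs)."""
    current = set(initial)                 # start in all initial states
    for symbol in string:
        # follow every matching transition from every current state
        current = {t for (s, a, t) in delta if s in current and a == symbol}
    return bool(current & final)           # accept iff some final state is reached

# An illustrative FSA over {a, b} accepting a* b.
delta = {("q0", "a", "q0"), ("q0", "b", "q1")}
print(accepts("aab", delta, initial={"q0"}, final={"q1"}))  # True
print(accepts("aba", delta, initial={"q0"}, final={"q1"}))  # False
```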
Definition of FST
A FST is a tuple (Q, Σ, Γ, δ, I, F):
• Q: a finite set of states
• Σ: a finite set of input symbols
• Γ: a finite set of output symbols
• I ⊆ Q: the set of initial states
• F ⊆ Q: the set of final states
• δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states

A FSA can be seen as a special case of a FST.
• The extended transition relation δ* is the smallest set such that:
  – (q, ε, ε, q) ∈ δ* for every q ∈ Q
  – if (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ, then (q, xa, yb, s) ∈ δ*
• T transduces a string x into a string y (written x[T]y) if there exists a path from an initial state to a final state whose input is x and whose output is y:
  x[T]y iff ∃ q ∈ I, f ∈ F s.t. (q, x, y, f) ∈ δ*
Operations on FSTs
• Union: x[S ∪ T]y iff x[S]y or x[T]y
• Concatenation: wx[S·T]yz iff w[S]y and x[T]z
• Composition: x[S ∘ T]z iff ∃y s.t. x[T]y and y[S]z
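If we view a (finite) transduction simply as a set of (input, output) string pairs, composition as defined on this slide can be sketched directly. The tiny rule sets below are illustrative, not from the slides.

```python
# Composition of string relations, following the slide:
# x [S o T] z iff there is some y with x [T] y and y [S] z (T applies first).
def compose(S, T):
    """Compose two finite string relations given as sets of (input, output) pairs."""
    return {(x, z) for (x, y1) in T for (y2, z) in S if y1 == y2}

T = {("cat", "CAT"), ("dog", "DOG")}       # illustrative: uppercasing transducer
S = {("CAT", "chat"), ("DOG", "chien")}    # illustrative: lexicon on uppercase forms
print(sorted(compose(S, T)))               # [('cat', 'chat'), ('dog', 'chien')]
```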
Probabilistic finite-state automata (PFA)
• Informally, in a PFA each arc is associated with a probability.
• The probability of a path is the product of the probabilities of the arcs on the path.
• The probability of a string x is the sum of the probabilities of all the paths for x.
• Tasks:
  – Given a string x, find the best path for x.
  – Given a string x, find the probability of x in a PFA.
  – Find the string with the highest probability in a PFA.
  – …
Formal definition of PFA
A PFA is a tuple (Q, Σ, δ, I, F, P):
• Q: a finite set of N states
• Σ: a finite set of input symbols
• I: Q → R+ (initial-state probabilities)
• F: Q → R+ (final-state probabilities)
• δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states
• P: δ → R+ (transition probabilities)

Constraints on the functions:
  Σ_{q ∈ Q} I(q) = 1
  ∀q ∈ Q: F(q) + Σ_{a, q'} P(q, a, q') = 1

Probability of a string:
  P(w1,n, q1,n+1) = I(q1) * F(qn+1) * Π_{i=1..n} P(qi, wi, qi+1)
  P(w1,n) = Σ_{q1,n+1} P(w1,n, q1,n+1)
Consistency of a PFA
Let A be a PFA.
• Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A.
• Def: a valid path in A is a path for some string x with probability greater than 0.
• Def: A is called consistent if Σ_x P(x | A) = 1.
• Def: a state of a PFA is useful if it appears in at least one valid path.
• Proposition: a PFA is consistent if all its states are useful. Q1 of Hw1
An example of PFA
[Figure: a PFA with two states q0 and q1, where I(q0) = 1.0, I(q1) = 0.0, F(q0) = 0, F(q1) = 0.2; an arc from q0 to q1 labeled a:1 and a self-loop on q1 labeled b:0.8]

  P(a b^n) = 0.2 * 0.8^n

  Σ_x P(x) = Σ_{n≥0} P(a b^n) = Σ_{n≥0} 0.2 * 0.8^n = 0.2 / (1 − 0.8) = 1
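As a numeric check of this example, a short Python sketch (the function name is ours) sums P(a b^n) = 0.2 * 0.8^n over a long prefix of the geometric series and confirms it approaches 1:

```python
# Numeric check of the PFA example: P(a b^n) = 0.2 * 0.8**n.
def p_abn(n):
    # one path: q0 -a:1-> q1, then n self-loops b:0.8, then stop with F(q1) = 0.2
    return 1.0 * (0.8 ** n) * 0.2

# geometric series: sum_{n>=0} 0.2 * 0.8**n = 0.2 / (1 - 0.8) = 1
total = sum(p_abn(n) for n in range(200))
print(round(total, 6))  # 1.0 (up to truncation of the infinite sum)
```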
Weighted finite-state automata (WFA)
• Each arc is associated with a weight.
• “Sum” and “multiplication” can have other meanings (other operations can play their roles).
  weight(x) = Σ_{s,t ∈ Q} I(s) * P(s, x, t) * F(t)
Two types of HMMs
• State-emission HMM (Moore machine):
  – The emission probability depends only on the state (from-state or to-state).
• Arc-emission HMM (Mealy machine):
  – The probability depends on the (from-state, to-state) pair.
State-emission HMM
Two kinds of parameters:
• Transition probability: P(sj | si)
• Output (emission) probability: P(wk | si)
Number of parameters: O(NM + N²)

[Figure: states s1, s2, …, sN, each emitting symbols such as w1, w3, w4, w5]
Arc-emission HMM
[Figure: states s1, s2, …, sN with arcs whose emissions (w1, …, w5) label the transitions]

Same kinds of parameters, but the emission probabilities depend on both states: P(wk, sj | si).
Number of parameters: O(N²M + N²)
Are the two types of HMMs equivalent?
• For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2).
• The reverse is also true.
Q3 and Q4 of hw1.
Definition of arc-emission HMM
• A HMM is a tuple (S, Σ, π, A, B):
  – A set of states S = {s1, s2, …, sN}.
  – A set of output symbols Σ = {w1, …, wM}.
  – Initial state probabilities π = {πi}.
  – State transition prob: A = {aij}.
  – Symbol emission prob: B = {bijk}.
• State sequence: X1,n+1
• Output sequence: O1,n

  P(O1,n) = Σ_{X1,n+1} P(O1,n, X1,n+1)
  P(O1,n, X1,n+1) = π(x1) * Π_{i=1..n} P(xi+1 | xi) * P(oi | xi, xi+1)
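The joint probability of an output sequence and a state sequence can be sketched directly from this formula. The dict-of-dicts encoding of π, A (aij), and B (bijk), and the two-state model, are our illustrative choices, not from the slides.

```python
# Joint probability for an arc-emission HMM:
# P(O_{1,n}, X_{1,n+1}) = pi(x1) * prod_i P(x_{i+1}|x_i) * P(o_i | x_i, x_{i+1}).
def joint_prob(states, symbols, pi, A, B):
    """states: x1..x_{n+1}; symbols: o1..o_n; A[i][j] = a_ij; B[i][j][k] = b_ijk."""
    p = pi[states[0]]
    for i in range(len(symbols)):
        s, t, o = states[i], states[i + 1], symbols[i]
        p *= A[s][t] * B[s][t].get(o, 0.0)   # transition prob * emission prob
    return p

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(joint_prob(["s1", "s2", "s2"], ["w2", "w2"], pi, A, B))  # 0.7 * 0.5 = 0.35
```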
Constraints
  Σ_{i=1..N} πi = 1
  ∀i: Σ_{j=1..N} aij = 1
  ∀i, j: Σ_{k=1..M} bijk = 1
  hence ∀i: Σj Σk aij * bijk = 1

For any integer n and any HMM:
  Σ_{O: |O| = n} P(O | HMM) = 1

Q2 of hw1.
Properties of HMM
• Limited horizon: P(Xt+1 | X1, …, Xt) = P(Xt+1 | Xt)
• Time invariance: the probabilities do not change over time: P(Xt+1 | Xt) = P(Xm+t+1 | Xm+t)
• The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequence generated a particular output.
Applications of HMM
• N-gram POS tagging:
  – Bigram tagger: oi is a word, and si is a POS tag.
  – Trigram tagger: oi is a word, and si is ??
• Other tagging problems:
  – Word segmentation
  – Chunking
  – NE tagging
  – Punctuation prediction
  – …
• Other applications: ASR, …
Three fundamental questions for HMMs
1. Finding the probability of an observation
2. Finding the best state sequence
3. Training: estimating parameters
(1) Finding the probability of the observation
Forward probability: the probability of producing O1,t-1 while ending up in state si:

  αi(t) =def P(O1,t-1, Xt = si)

  P(O1,T) = Σ_{i=1..N} αi(T+1)
Calculating forward probability
Initialization:  αi(1) = πi

Induction:
  αj(t+1) = P(O1,t, Xt+1 = sj)
          = Σi P(O1,t-1, Xt = si, ot, Xt+1 = sj)
          = Σi P(O1,t-1, Xt = si) * P(ot, Xt+1 = sj | O1,t-1, Xt = si)
          = Σi P(O1,t-1, Xt = si) * P(ot, Xt+1 = sj | Xt = si)
          = Σi αi(t) * aij * bijk, where wk = ot
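The forward recurrence can be sketched in a few lines of Python. The dict-of-dicts encoding of π, A (aij), and B (bijk) and the two-state model are illustrative, not from the slides.

```python
# Forward algorithm for an arc-emission HMM:
# alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij,o_t}.
def forward(obs, pi, A, B):
    """Return P(O_{1,T}) = sum_i alpha_i(T+1)."""
    alpha = dict(pi)                       # initialization: alpha_i(1) = pi_i
    for o in obs:                          # induction over t = 1..T
        alpha = {j: sum(alpha[i] * A[i][j] * B[i][j].get(o, 0.0) for i in pi)
                 for j in pi}
    return sum(alpha.values())             # sum over the final state

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(forward(["w2", "w2"], pi, A, B))  # only path s1->s2->s2 survives: 0.7*0.5 = 0.35
```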
(2) Finding the best state sequence
• Given the observation O1,T=o1…oT, find the state sequence X1,T+1=X1 … XT+1 that maximizes P(X1,T+1 | O1,T).
Viterbi algorithm
[Figure: trellis of hidden states X1, X2, …, XT, XT+1 over the observations o1, o2, …, oT]
Viterbi algorithm
The probability of the best path that produces O1,t-1 while ending up in state si:

  δi(t) =def max_{X1,t-1} P(X1,t-1, O1,t-1, Xt = si)

Initialization:  δi(1) = πi

Induction:  δj(t+1) = maxi δi(t) * aij * bijk, where wk = ot
Modify it to allow epsilon emission: Q5 of hw1.
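The Viterbi induction, plus backpointers to recover the best state sequence, can be sketched as below (without the ε-emission extension asked for in Q5). The dict encoding and the two-state model are illustrative, not from the slides.

```python
# Viterbi decoding for an arc-emission HMM:
# delta_j(t+1) = max_i delta_i(t) * a_ij * b_{ij,o_t}, with backpointers.
def viterbi(obs, pi, A, B):
    """Return (probability of the best path, best state sequence X_{1,T+1})."""
    delta = dict(pi)                       # initialization: delta_i(1) = pi_i
    backpointers = []
    for o in obs:                          # induction over t = 1..T
        new_delta, ptr = {}, {}
        for j in pi:
            scores = {i: delta[i] * A[i][j] * B[i][j].get(o, 0.0) for i in pi}
            best = max(scores, key=scores.get)
            new_delta[j], ptr[j] = scores[best], best
        delta = new_delta
        backpointers.append(ptr)
    last = max(delta, key=delta.get)       # best final state
    path = [last]
    for ptr in reversed(backpointers):     # follow backpointers
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(viterbi(["w2", "w2"], pi, A, B))  # (0.35, ['s1', 's2', 's2'])
```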
Summary of HMM
• Two types of HMMs: state-emission and arc-emission HMM (S, Σ, π, A, B)
• Properties: Markov assumption
• Applications: POS tagging, etc.
• Finding the probability of an observation: forward probability
• Decoding: Viterbi decoding
Relation between WFA and HMM
• HMM can be seen as a special type of WFA.
• Given an HMM, how to build an equivalent WFA?
Converting HMM into WFA
Given an HMM (S, Σ1, π, A, B), build a WFA (Q, Σ2, δ, I, F, P) such that for any input sequence O, P(O | HMM) = P(O | WFA):
– Build the WFA: add a final state and arcs leading to it.
– Show that there is a one-to-one mapping between the paths in the HMM and the paths in the WFA.
– Prove that the probabilities in the HMM and in the WFA are identical.
HMM → WFA:
  Q = S ∪ {qf}   (qf is a new final state)
  Σ2 = Σ1 ∪ {ε}
  ∀qi ∈ S: I(qi) = πi;  I(qf) = 0
  F(qf) = 1;  ∀q ∈ S: F(q) = 0
  δ = {(qi, wk, qj) | aij * bijk ≠ 0} ∪ {(q, ε, qf) | q ∈ S}
  P(qi, wk, qj) = aij * bijk
  P(q, ε, qf) = 1

The WFA is not a PFA.
We need to create a new state (the final state) and add edges to it.
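The construction can be sketched as follows: every HMM arc (si → sj emitting wk, with weight aij * bijk) becomes a WFA arc, plus a weight-1 ε-arc from every state to a new final state. The dict encoding, the state name "q_f", and the two-state model are our illustrative choices, not from the slides.

```python
# HMM -> WFA conversion sketch; arcs are (from_state, symbol, to_state, weight),
# with "" standing in for epsilon.
def hmm_to_wfa(pi, A, B, final="q_f"):
    arcs = [(i, w, j, A[i][j] * B[i][j][w])
            for i in A for j in A[i] for w in B[i][j]
            if A[i][j] * B[i][j][w] != 0]
    arcs += [(i, "", final, 1.0) for i in A]    # epsilon arcs into the final state
    I = dict(pi, **{final: 0.0})                # initial weights; 0 for q_f
    F = {q: 0.0 for q in A}                     # final weight 0 for HMM states
    F[final] = 1.0                              # final weight 1 for q_f
    return arcs, I, F

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
arcs, I, F = hmm_to_wfa(pi, A, B)
print(len(arcs))  # 4 emitting arcs + 2 epsilon arcs = 6
```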
A slightly different definition of HMM
• A HMM is a tuple (S, Σ, π, A, B, qf):
  – A set of states S = {s1, s2, …, sN}.
  – A set of output symbols Σ = {w1, …, wM}.
  – Initial state probabilities π = {πi}.
  – State transition prob: A = {aij}.
  – Symbol emission prob: B = {bijk}.
  – qf is the final state: there are no outgoing edges from qf.
Constraints
  Σ_{i=1..N} πi = 1
  ∀qi ≠ qf: Σ_{j=1..N} aij = 1
  ∀qi ≠ qf, ∀j: Σ_{k=1..M} bijk = 1
  for qf: afj = 0 for every j, and bfjk = 0 for every j, k
For any HMM (under this new definition):
  Σ_O P(O | HMM) = 1
HMM → PFA: given an HMM (S, Σ1, π, A, B, qf), build a PFA (Q, Σ2, δ, I, F, P):
  Q = S;  Σ2 = Σ1
  ∀qi ∈ S: I(qi) = πi
  F(qf) = 1;  ∀q ∈ S − {qf}: F(q) = 0
  δ = {(qi, wk, qj) | aij * bijk ≠ 0}
  P(qi, wk, qj) = aij * bijk
PFA → HMM: given a PFA (Q, Σ1, δ, I, F, P), build an HMM (S, Σ2, π, A, B, qf):
  S = Q ∪ {qf};  Σ2 = Σ1
  ∀i ∈ Q: πi = I[i];  π for qf is 0
  ∀i, j ∈ Q: aij = Σk P(qi, wk, qj);  ai,f = F[i]
  ∀i, j ∈ Q: bijk = P(qi, wk, qj) / aij  (when aij ≠ 0)

We need to add a new final state and edges to it.
Project: Part 1
• Learn to use Carmel (a WFST package)
• Use Carmel as an HMM Viterbi decoder for a trigram POS tagger.
• The instructions will be handed out on 1/12, and the project is due on 1/19.