FSA and HMM
Definition of FSA
A FSA is a tuple (Q, Σ, δ, I, F):
• Q: a finite set of states
• Σ: a finite set of input symbols
• I ⊆ Q: the set of initial states
• F ⊆ Q: the set of final states
• δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states
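The definition above can be sketched in a few lines of Python, representing δ as a set of (state, symbol, state) triples. The state names and the tiny transition set are illustrative, not from the slides, and ε-arcs are omitted for brevity.

```python
# A minimal FSA sketch: delta is a set of (state, symbol, state) triples.
def accepts(string, delta, initial, final):
    """Return True iff the FSA accepts the string (no epsilon arcs)."""
    current = set(initial)                 # start in all initial states
    for symbol in string:
        # follow every matching transition from every current state
        current = {t for (s, a, t) in delta if s in current and a == symbol}
    return bool(current & final)           # accept iff some final state is reached

# An illustrative FSA over {a, b} accepting a* b.
delta = {("q0", "a", "q0"), ("q0", "b", "q1")}
print(accepts("aab", delta, initial={"q0"}, final={"q1"}))  # True
print(accepts("aba", delta, initial={"q0"}, final={"q1"}))  # False
```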
Definition of FST
A FST is a tuple (Q, Σ, Γ, δ, I, F):
• Q: a finite set of states
• Σ: a finite set of input symbols
• Γ: a finite set of output symbols
• I ⊆ Q: the set of initial states
• F ⊆ Q: the set of final states
• δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states

A FSA can be seen as a special case of a FST.
• The extended transition relation δ* is the smallest set such that:
  – (q, ε, ε, q) ∈ δ* for every q ∈ Q
  – if (q, x, y, r) ∈ δ* and (r, a, b, s) ∈ δ, then (q, xa, yb, s) ∈ δ*
• T transduces a string x into a string y (written x[T]y) if there exists a path from an initial state to a final state whose input is x and whose output is y:
  x[T]y iff ∃ q ∈ I, f ∈ F s.t. (q, x, y, f) ∈ δ*
Operations on FSTs
• Union: x[S ∪ T]y iff x[S]y or x[T]y
• Concatenation: wx[S·T]yz iff w[S]y and x[T]z
• Composition: x[S ∘ T]z iff ∃y s.t. x[T]y and y[S]z
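If we view a (finite) transduction simply as a set of (input, output) string pairs, composition as defined on this slide can be sketched directly. The tiny rule sets below are illustrative, not from the slides.

```python
# Composition of string relations, following the slide:
# x [S o T] z iff there is some y with x [T] y and y [S] z (T applies first).
def compose(S, T):
    """Compose two finite string relations given as sets of (input, output) pairs."""
    return {(x, z) for (x, y1) in T for (y2, z) in S if y1 == y2}

T = {("cat", "CAT"), ("dog", "DOG")}       # illustrative: uppercasing transducer
S = {("CAT", "chat"), ("DOG", "chien")}    # illustrative: lexicon on uppercase forms
print(sorted(compose(S, T)))               # [('cat', 'chat'), ('dog', 'chien')]
```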
Probabilistic finite-state automata (PFA)
• Informally, in a PFA each arc is associated with a probability.
• The probability of a path is the product of the probabilities of the arcs on the path.
• The probability of a string x is the sum of the probabilities of all the paths for x.
• Tasks:
  – Given a string x, find the best path for x.
  – Given a string x, find the probability of x in a PFA.
  – Find the string with the highest probability in a PFA.
  – …
Formal definition of PFA
A PFA is a tuple (Q, Σ, δ, I, F, P):
• Q: a finite set of N states
• Σ: a finite set of input symbols
• I: Q → R+ (initial-state probabilities)
• F: Q → R+ (final-state probabilities)
• δ ⊆ Q × (Σ ∪ {ε}) × Q: the transition relation between states
• P: δ → R+ (transition probabilities)

Constraints on the functions:
  Σ_{q ∈ Q} I(q) = 1
  ∀q ∈ Q: F(q) + Σ_{a, q'} P(q, a, q') = 1

Probability of a string:
  P(w1,n, q1,n+1) = I(q1) * F(qn+1) * Π_{i=1..n} P(qi, wi, qi+1)
  P(w1,n) = Σ_{q1,n+1} P(w1,n, q1,n+1)
Consistency of a PFA
Let A be a PFA.
• Def: P(x | A) = the sum of the probabilities of all the valid paths for x in A.
• Def: a valid path in A is a path for some string x with probability greater than 0.
• Def: A is called consistent if Σ_x P(x | A) = 1.
• Def: a state of a PFA is useful if it appears in at least one valid path.
• Proposition: a PFA is consistent if all its states are useful. Q1 of Hw1
An example of PFA
[Figure: a PFA with two states q0 and q1, where I(q0) = 1.0, I(q1) = 0.0, F(q0) = 0, F(q1) = 0.2; an arc from q0 to q1 labeled a:1 and a self-loop on q1 labeled b:0.8]

  P(a b^n) = 0.2 * 0.8^n

  Σ_x P(x) = Σ_{n≥0} P(a b^n) = Σ_{n≥0} 0.2 * 0.8^n = 0.2 / (1 − 0.8) = 1
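As a numeric check of this example, a short Python sketch (the function name is ours) sums P(a b^n) = 0.2 * 0.8^n over a long prefix of the geometric series and confirms it approaches 1:

```python
# Numeric check of the PFA example: P(a b^n) = 0.2 * 0.8**n.
def p_abn(n):
    # one path: q0 -a:1-> q1, then n self-loops b:0.8, then stop with F(q1) = 0.2
    return 1.0 * (0.8 ** n) * 0.2

# geometric series: sum_{n>=0} 0.2 * 0.8**n = 0.2 / (1 - 0.8) = 1
total = sum(p_abn(n) for n in range(200))
print(round(total, 6))  # 1.0 (up to truncation of the infinite sum)
```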
Weighted finite-state automata (WFA)
• Each arc is associated with a weight.
• “Sum” and “multiplication” can have other meanings (other operations can play their roles).
  weight(x) = Σ_{s,t ∈ Q} I(s) * P(s, x, t) * F(t)
Two types of HMMs
• State-emission HMM (Moore machine):
  – The emission probability depends only on the state (from-state or to-state).
• Arc-emission HMM (Mealy machine):
  – The probability depends on the (from-state, to-state) pair.
State-emission HMM
Two kinds of parameters:
• Transition probability: P(sj | si)
• Output (emission) probability: P(wk | si)
Number of parameters: O(NM + N²)

[Figure: states s1, s2, …, sN, each emitting symbols such as w1, w3, w4, w5]
Arc-emission HMM
[Figure: states s1, s2, …, sN with arcs whose emissions (w1, …, w5) label the transitions]

Same kinds of parameters, but the emission probabilities depend on both states: P(wk, sj | si).
Number of parameters: O(N²M + N²)
Are the two types of HMMs equivalent?
• For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2).
• The reverse is also true.
Q3 and Q4 of hw1.
Definition of arc-emission HMM
• A HMM is a tuple (S, Σ, π, A, B):
  – A set of states S = {s1, s2, …, sN}.
  – A set of output symbols Σ = {w1, …, wM}.
  – Initial state probabilities π = {πi}.
  – State transition prob: A = {aij}.
  – Symbol emission prob: B = {bijk}.
• State sequence: X1,n+1
• Output sequence: O1,n

  P(O1,n) = Σ_{X1,n+1} P(O1,n, X1,n+1)
  P(O1,n, X1,n+1) = π(x1) * Π_{i=1..n} P(xi+1 | xi) * P(oi | xi, xi+1)
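The joint probability of an output sequence and a state sequence can be sketched directly from this formula. The dict-of-dicts encoding of π, A (aij), and B (bijk), and the two-state model, are our illustrative choices, not from the slides.

```python
# Joint probability for an arc-emission HMM:
# P(O_{1,n}, X_{1,n+1}) = pi(x1) * prod_i P(x_{i+1}|x_i) * P(o_i | x_i, x_{i+1}).
def joint_prob(states, symbols, pi, A, B):
    """states: x1..x_{n+1}; symbols: o1..o_n; A[i][j] = a_ij; B[i][j][k] = b_ijk."""
    p = pi[states[0]]
    for i in range(len(symbols)):
        s, t, o = states[i], states[i + 1], symbols[i]
        p *= A[s][t] * B[s][t].get(o, 0.0)   # transition prob * emission prob
    return p

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(joint_prob(["s1", "s2", "s2"], ["w2", "w2"], pi, A, B))  # 0.7 * 0.5 = 0.35
```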
Constraints
  Σ_{i=1..N} πi = 1
  ∀i: Σ_{j=1..N} aij = 1
  ∀i, j: Σ_{k=1..M} bijk = 1
  hence ∀i: Σj Σk aij * bijk = 1

For any integer n and any HMM:
  Σ_{O: |O| = n} P(O | HMM) = 1

Q2 of hw1.
Properties of HMM
• Limited horizon: P(Xt+1 | X1, …, Xt) = P(Xt+1 | Xt)
• Time invariance: the probabilities do not change over time: P(Xt+1 | Xt) = P(Xm+t+1 | Xm+t)
• The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequence generated a particular output.
Applications of HMM
• N-gram POS tagging:
  – Bigram tagger: oi is a word, and si is a POS tag.
  – Trigram tagger: oi is a word, and si is ??
• Other tagging problems:
  – Word segmentation
  – Chunking
  – NE tagging
  – Punctuation prediction
  – …
• Other applications: ASR, …
Three fundamental questions for HMMs
1. Finding the probability of an observation
2. Finding the best state sequence
3. Training: estimating parameters
(1) Finding the probability of the observation
Forward probability: the probability of producing O1,t-1 while ending up in state si:

  αi(t) =def P(O1,t-1, Xt = si)

  P(O1,T) = Σ_{i=1..N} αi(T+1)
Calculating forward probability
Initialization:  αi(1) = πi

Induction:
  αj(t+1) = P(O1,t, Xt+1 = sj)
          = Σi P(O1,t-1, Xt = si, ot, Xt+1 = sj)
          = Σi P(O1,t-1, Xt = si) * P(ot, Xt+1 = sj | O1,t-1, Xt = si)
          = Σi P(O1,t-1, Xt = si) * P(ot, Xt+1 = sj | Xt = si)
          = Σi αi(t) * aij * bijk, where wk = ot
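The forward recurrence can be sketched in a few lines of Python. The dict-of-dicts encoding of π, A (aij), and B (bijk) and the two-state model are illustrative, not from the slides.

```python
# Forward algorithm for an arc-emission HMM:
# alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij,o_t}.
def forward(obs, pi, A, B):
    """Return P(O_{1,T}) = sum_i alpha_i(T+1)."""
    alpha = dict(pi)                       # initialization: alpha_i(1) = pi_i
    for o in obs:                          # induction over t = 1..T
        alpha = {j: sum(alpha[i] * A[i][j] * B[i][j].get(o, 0.0) for i in pi)
                 for j in pi}
    return sum(alpha.values())             # sum over the final state

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(forward(["w2", "w2"], pi, A, B))  # only path s1->s2->s2 survives: 0.7*0.5 = 0.35
```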
(2) Finding the best state sequence
• Given the observation O1,T=o1…oT, find the state sequence X1,T+1=X1 … XT+1 that maximizes P(X1,T+1 | O1,T).
Viterbi algorithm
[Figure: trellis of hidden states X1, X2, …, XT, XT+1 over the observations o1, o2, …, oT]
Viterbi algorithm
The probability of the best path that produces O1,t-1 while ending up in state si:

  δi(t) =def max_{X1,t-1} P(X1,t-1, O1,t-1, Xt = si)

Initialization:  δi(1) = πi

Induction:  δj(t+1) = maxi δi(t) * aij * bijk, where wk = ot
Modify it to allow epsilon emission: Q5 of hw1.
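The Viterbi induction, plus backpointers to recover the best state sequence, can be sketched as below (without the ε-emission extension asked for in Q5). The dict encoding and the two-state model are illustrative, not from the slides.

```python
# Viterbi decoding for an arc-emission HMM:
# delta_j(t+1) = max_i delta_i(t) * a_ij * b_{ij,o_t}, with backpointers.
def viterbi(obs, pi, A, B):
    """Return (probability of the best path, best state sequence X_{1,T+1})."""
    delta = dict(pi)                       # initialization: delta_i(1) = pi_i
    backpointers = []
    for o in obs:                          # induction over t = 1..T
        new_delta, ptr = {}, {}
        for j in pi:
            scores = {i: delta[i] * A[i][j] * B[i][j].get(o, 0.0) for i in pi}
            best = max(scores, key=scores.get)
            new_delta[j], ptr[j] = scores[best], best
        delta = new_delta
        backpointers.append(ptr)
    last = max(delta, key=delta.get)       # best final state
    path = [last]
    for ptr in reversed(backpointers):     # follow backpointers
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
print(viterbi(["w2", "w2"], pi, A, B))  # (0.35, ['s1', 's2', 's2'])
```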
Summary of HMM
• Two types of HMMs: state-emission and arc-emission HMM (S, Σ, π, A, B)
• Properties: Markov assumption
• Applications: POS tagging, etc.
• Finding the probability of an observation: forward probability
• Decoding: Viterbi decoding
Relation between WFA and HMM
• HMM can be seen as a special type of WFA.
• Given an HMM, how to build an equivalent WFA?
Converting HMM into WFA
Given an HMM (S, Σ1, π, A, B), build a WFA (Q, Σ2, δ, I, F, P) such that for any input sequence O, P(O | HMM) = P(O | WFA):
– Build the WFA: add a final state and arcs leading to it.
– Show that there is a one-to-one mapping between the paths in the HMM and the paths in the WFA.
– Prove that the probabilities in the HMM and in the WFA are identical.
HMM → WFA:
  Q = S ∪ {qf}   (qf is a new final state)
  Σ2 = Σ1 ∪ {ε}
  ∀qi ∈ S: I(qi) = πi;  I(qf) = 0
  F(qf) = 1;  ∀q ∈ S: F(q) = 0
  δ = {(qi, wk, qj) | aij * bijk ≠ 0} ∪ {(q, ε, qf) | q ∈ S}
  P(qi, wk, qj) = aij * bijk
  P(q, ε, qf) = 1

The WFA is not a PFA.
We need to create a new state (the final state) and add edges to it.
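The construction can be sketched as follows: every HMM arc (si → sj emitting wk, with weight aij * bijk) becomes a WFA arc, plus a weight-1 ε-arc from every state to a new final state. The dict encoding, the state name "q_f", and the two-state model are our illustrative choices, not from the slides.

```python
# HMM -> WFA conversion sketch; arcs are (from_state, symbol, to_state, weight),
# with "" standing in for epsilon.
def hmm_to_wfa(pi, A, B, final="q_f"):
    arcs = [(i, w, j, A[i][j] * B[i][j][w])
            for i in A for j in A[i] for w in B[i][j]
            if A[i][j] * B[i][j][w] != 0]
    arcs += [(i, "", final, 1.0) for i in A]    # epsilon arcs into the final state
    I = dict(pi, **{final: 0.0})                # initial weights; 0 for q_f
    F = {q: 0.0 for q in A}                     # final weight 0 for HMM states
    F[final] = 1.0                              # final weight 1 for q_f
    return arcs, I, F

# Illustrative two-state model.
pi = {"s1": 1.0, "s2": 0.0}
A = {"s1": {"s1": 0.3, "s2": 0.7}, "s2": {"s1": 0.5, "s2": 0.5}}
B = {"s1": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}},
     "s2": {"s1": {"w1": 1.0}, "s2": {"w2": 1.0}}}
arcs, I, F = hmm_to_wfa(pi, A, B)
print(len(arcs))  # 4 emitting arcs + 2 epsilon arcs = 6
```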
A slightly different definition of HMM
• A HMM is a tuple (S, Σ, π, A, B, qf):
  – A set of states S = {s1, s2, …, sN}.
  – A set of output symbols Σ = {w1, …, wM}.
  – Initial state probabilities π = {πi}.
  – State transition prob: A = {aij}.
  – Symbol emission prob: B = {bijk}.
  – qf is the final state: there are no outgoing edges from qf.
Constraints
  Σ_{i=1..N} πi = 1
  ∀qi ≠ qf: Σ_{j=1..N} aij = 1
  ∀qi ≠ qf, ∀j: Σ_{k=1..M} bijk = 1
  for qf: afj = 0 for every j, and bfjk = 0 for every j, k
For any HMM (under this new definition):
  Σ_O P(O | HMM) = 1
HMM → PFA: given an HMM (S, Σ1, π, A, B, qf), build a PFA (Q, Σ2, δ, I, F, P):
  Q = S;  Σ2 = Σ1
  ∀qi ∈ S: I(qi) = πi
  F(qf) = 1;  ∀q ∈ S − {qf}: F(q) = 0
  δ = {(qi, wk, qj) | aij * bijk ≠ 0}
  P(qi, wk, qj) = aij * bijk
PFA → HMM: given a PFA (Q, Σ1, δ, I, F, P), build an HMM (S, Σ2, π, A, B, qf):
  S = Q ∪ {qf};  Σ2 = Σ1
  ∀i ∈ Q: πi = I[i];  π for qf is 0
  ∀i, j ∈ Q: aij = Σk P(qi, wk, qj);  ai,f = F[i]
  ∀i, j ∈ Q: bijk = P(qi, wk, qj) / aij  (when aij ≠ 0)

We need to add a new final state and edges to it.
Project: Part 1
• Learn to use Carmel (a WFST package)
• Use Carmel as an HMM Viterbi decoder for a trigram POS tagger.
• The instructions will be handed out on 1/12, and the project is due on 1/19.