Hidden Markov Models - courses.cs.washington.edu

Page 1: Hidden Markov Models

Mausam

(Slides based on Dan Klein, Luke Zettlemoyer, Alex Simma, Erik Sudderth, David Fernandez-Baca, Drena Dobbs, Serafim Batzoglou, William Cohen, Andrew McCallum, Dan Weld)

Hidden Markov Models
Chapter 15

Page 2: Hidden Markov Models

Temporal Models

• Graphical models with a temporal component

• St/Xt = set of unobservable variables at time t

• Wt/Yt = set of evidence variables at time t

• Notation Xa:b = Xa, Xa+1, …, Xb


Page 3: Hidden Markov Models

Target Tracking

• Estimate motion of targets in 3D world from indirect, potentially noisy measurements


Radar-based tracking of multiple targets

Visual tracking of articulated objects (L. Sigal et al., 2006)

Page 4: Hidden Markov Models

Financial Forecasting

• Predict future market behavior from historical data, news reports, expert opinions, …

http://www.steadfastinvestor.com/

Page 5: Hidden Markov Models

Biological Sequence Analysis

• Temporal models can be adapted to exploit more general forms of sequential structure, like those arising in DNA sequences

(E. Birney, 2001)

Page 6: Hidden Markov Models

Speech Recognition

• Given an audio waveform, we would like to robustly extract & recognize any spoken words

• Statistical models can be used to:
  – Provide greater robustness to noise
  – Adapt to the accent of different speakers
  – Learn from training data

(S. Roweis, 2004)

Page 7: Hidden Markov Models

Markov Chain

• Set of states

– Initial probabilities

– Transition probabilities

Markov Chain models system dynamics
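To make this concrete, here is a minimal Python sketch (not from the slides; the two states and all probabilities are made-up placeholders) of a Markov chain as initial probabilities plus transition probabilities, sampled forward in time:

```python
import random

initial = {"sun": 0.6, "rain": 0.4}      # initial probabilities (placeholders)
transition = {                           # transition probabilities (placeholders)
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def sample(dist):
    """Draw one outcome from a {value: probability} dictionary."""
    r, cum = random.random(), 0.0
    for value, p in dist.items():
        cum += p
        if r < cum:
            return value
    return value  # guard against floating-point rounding

def sample_chain(n):
    """The system transitions by itself: no actions, just dynamics."""
    s = sample(initial)
    seq = [s]
    for _ in range(n - 1):
        s = sample(transition[s])
        seq.append(s)
    return seq

print(sample_chain(10))
```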

Page 8: Hidden Markov Models

Markov Chains: Graphical Models


[State-transition diagram; edges labeled with probabilities 0.5, 0.3, 0.2, 0.1, 0.9, 0.6, 0.4, 0.0, 0.0]

Difference from a Markov Decision Process?

It is a system that transitions by itself: there are no actions or rewards.

Page 9: Hidden Markov Models

Hidden Markov Model

• Set of states

– Initial probabilities

– Transition probabilities

• Set of potential observations

– Emission/Observation probabilities

HMM generates observation sequence

o1 o2 o3 o4 o5


Page 10: Hidden Markov Models

Hidden Markov Models (HMMs)

Finite state machine: the hidden state sequence generates the observation sequence o1 o2 o3 o4 o5 o6 o7 o8

Graphical model: … Xt-2 Xt-1 Xt … (hidden states) emit … yt-2 yt-1 yt … (observations)

Random variable Xt takes values from {s1, s2, s3, s4}
Random variable yt takes values from {o1, o2, o3, o4, o5, …}


Page 11: Hidden Markov Models

HMM

[Repeats the finite state machine and graphical model figure from the previous slide]

Page 12: Hidden Markov Models

HMM

Graphical model: … Xt-2 Xt-1 Xt … (hidden states) emit … yt-2 yt-1 yt … (observations)

Random variable Xt takes values from {s1, s2, s3, s4}
Random variable yt takes values from {o1, o2, o3, o4, o5, …}

Need parameters:
  Start state probabilities: P(x1 = sk)
  Transition probabilities: P(xt = si | xt-1 = sk)
  Observation probabilities: P(yt = oj | xt = sk)
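In code, these three tables fully specify an HMM. A sketch with placeholder values (the names start, trans, emit and the 2-state, 3-observation sizes are our assumptions, not from the slides):

```python
# Hypothetical 2-state, 3-observation HMM; every row is a distribution.
start = {"s1": 0.5, "s2": 0.5}                     # P(x1 = sk)
trans = {"s1": {"s1": 0.9, "s2": 0.1},             # P(xt = si | xt-1 = sk)
         "s2": {"s1": 0.4, "s2": 0.6}}
emit  = {"s1": {"o1": 0.7, "o2": 0.2, "o3": 0.1},  # P(yt = oj | xt = sk)
         "s2": {"o1": 0.1, "o2": 0.3, "o3": 0.6}}

# Sanity check: each conditional distribution sums to 1.
for row in [start, *trans.values(), *emit.values()]:
    assert abs(sum(row.values()) - 1.0) < 1e-9
```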


Page 13: Hidden Markov Models

Hidden Markov Models


• Just another graphical model…

“Conditioned on the present, the past & future are independent”

[Figure: chain of hidden states with observed vars below; state-to-state arrows carry the Transition Distribution, state-to-observation arrows carry the Observation Distribution]

Page 14: Hidden Markov Models

Hidden states


[Figure: chain of hidden states with the observed process below]

• Given the current hidden state, earlier observations provide no additional information about the future:

P(wi+1:n | si, w0:i) = P(wi+1:n | si)

Page 15: Hidden Markov Models

HMM Generative Process


We can easily sample sequence pairs (s0:n, w0:n):

Sample the initial state s0 from P(s0)
For i = 1 … n:
  Sample si from the distribution P(si | si-1)
  Sample wi from the distribution P(wi | si)
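A sketch of this generative process in Python, reusing the sample helper and the placeholder start/trans/emit tables from the earlier sketches:

```python
def sample_pair(n, start, trans, emit):
    """Sample (s0:n, w0:n): states from the chain, observations from emit."""
    s = sample(start)                 # sample initial state from P(s0)
    states, obs = [s], [sample(emit[s])]
    for _ in range(n):
        s = sample(trans[s])          # si ~ P(si | si-1)
        states.append(s)
        obs.append(sample(emit[s]))   # wi ~ P(wi | si)
    return states, obs

print(sample_pair(5, start, trans, emit))
```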

Page 16: Hidden Markov Models

Example: POS Tagging

• Useful as a pre-processing step


DT NN IN NN VBD NNS VBD
The average of interbank offered rates plummeted …

DT NNP NN VBD VBN RP NN NNS

The Georgia branch had taken on loan commitments …

Setup:

states S = {DT, NNP, NN, ... } are the POS tags

Observations W = V are words

Transition dist’n P(si|si-1) models the tag sequences

Observation dist’n P(wi|si) models words given their POS

Page 17: Hidden Markov Models

Example: Chunking

• Find spans of text with certain properties

• For example: named entities with types

– (PER, ORG, or LOC)

• Germany ’s representative to the European Union’s veterinary committee Werner Zwingman said on Wednesday consumers should ...

• [Germany]LOC ’s representative to the [European Union]ORG ‘s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should ...


Page 18: Hidden Markov Models

Example: Chunking

• [Germany]LOC ’s representative to the [European Union]ORG ‘s veterinary committee [Werner Zwingman]PER said on Wednesday consumers should ...

• Germany/BL ’s/NA representative/NA to/NA the/NA European/BO Union/CO ‘s/NA veterinary/NA committee/NA Werner/BP Zwingman/CP said/NA on/NA Wednesday/NA consumers/NA should/NA ...

• HMM Model:
  – States S = {NA, BL, CL, BO, CO, BP, CP} represent beginnings (BL, BO, BP) and continuations (CL, CO, CP) of chunks, and other (NA)

– Observations W = V are words

– Transition dist’n P(si | si-1) models the tag sequences

– Observation dist’n P(wi | si) models words given their type


Page 19: Hidden Markov Models

Example: The Occasionally Dishonest Casino

A casino has two dice:
• Fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6
• Loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10; P(6) = 1/2

• Dealer switches between dice as:

– Prob(Fair → Loaded) = 0.01

– Prob(Loaded → Fair) = 0.2

– Transitions between dice obey a Markov process

Game:
1. You bet $1
2. You roll (always with a fair die)
3. Casino player rolls (maybe with fair die, maybe with loaded die)
4. Highest number wins $2

Page 20: Hidden Markov Models

An HMM for the occasionally dishonest casino


P(1|F) = 1/6

P(2|F) = 1/6

P(3|F) = 1/6

P(4|F) = 1/6

P(5|F) = 1/6

P(6|F) = 1/6

P(1|L) = 1/10

P(2|L) = 1/10

P(3|L) = 1/10

P(4|L) = 1/10

P(5|L) = 1/10

P(6|L) = 1/2
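The same model transcribed into the dictionary format used in the earlier sketches (the 1/2-1/2 start probabilities are taken from the worked example on the later slides):

```python
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.99, "L": 0.01},           # Prob(Fair -> Loaded) = 0.01
         "L": {"F": 0.20, "L": 0.80}}           # Prob(Loaded -> Fair) = 0.2
emit  = {"F": {r: 1 / 6 for r in range(1, 7)},  # fair die
         "L": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}  # loaded die
```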

Page 21: Hidden Markov Models

Question # 1 – Evaluation

GIVEN

A sequence of rolls by the casino player

124552646214614613613666166466163661636616361…

QUESTION

How likely is this sequence, given our model of how the casino works?

This is the EVALUATION problem in HMMs

Page 22: Hidden Markov Models

Question # 2 – Decoding

GIVEN

A sequence of rolls by the casino player

1245526462146146136136661664661636616366163…

QUESTION

What portion of the sequence was generated with the fair die, and what portion with the loaded die?

This is the DECODING question in HMMs

Page 23: Hidden Markov Models

Question # 3 – Learning

GIVEN

A sequence of rolls by the casino player

124552646214614613613666166466163661636616361651…

QUESTION

How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back?

This is the LEARNING question in HMMs

Page 24: Hidden Markov Models

HMM Inference

• Evaluation: prob. of observing an obs. sequence

– Forward Algorithm (very similar to Viterbi)

• Decoding: most likely sequence of hidden states

– Viterbi algorithm

• Marginal distribution: prob. of a particular state

– Forward-Backward


Page 25: Hidden Markov Models

Decoding Problem

Given w = w1 … wn and HMM θ, what is the “best” parse s1 … sn?

Several possible meanings of ‘solution’:
1. States which are individually most likely
2. Single best state sequence

We want the sequence s1 … sn such that P(s|w) is maximized:

s* = argmaxs P(s|w)

[Trellis: a column of states 1, 2, …, K at each time step, with observations o1 o2 o3 … oT below]

Page 26: Hidden Markov Models

Most Likely Sequence

• Problem: find the most likely (Viterbi) sequence under the model


P(s0:n, w0:n) = P(NNP|start) P(Fed|NNP) P(VBZ|NNP) P(raises|VBZ) P(NN|NNP) …

NNP VBZ NN NNS CD NN    logP = -23
NNP NNS NN NNS CD NN    logP = -29
NNP VBZ VB NNS CD NN    logP = -27

In principle, we’re done – list all possible tag sequences, score each one, pick the best one (the Viterbi state sequence)

Fed raises interest rates 0.5 percent .

NNP VBZ NN NNS CD NN .

Given model parameters, we can score any sequence pair

2n multiplications per sequence

|S|^n state sequences!
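A sketch of scoring a single state/observation sequence pair, done in log space (which is why the slide reports logP values); the function name is ours, and the parameter tables are the dictionaries from the earlier sketches:

```python
import math

def log_score(states, words, start, trans, emit):
    """log P(s, w): one start factor, then 2 factors (transition, emission) per step."""
    lp = math.log(start[states[0]]) + math.log(emit[states[0]][words[0]])
    for prev, cur, w in zip(states, states[1:], words[1:]):
        lp += math.log(trans[prev][cur]) + math.log(emit[cur][w])
    return lp
```

On the casino model, for example, log_score(["L", "L", "L"], [6, 2, 6], start, trans, emit) gives log 0.008 ≈ -4.83, matching the casino enumeration a couple of slides below.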

Page 27: Hidden Markov Models

The occasionally dishonest casino

• Known:
  – The structure of the model
  – The transition probabilities

• Hidden: What the casino did
  – FFFFFLLLLLLLFFFF...

• Observable: The series of die tosses
  – 3415256664666153...

• What we must infer:
  – When was a fair die used?
  – When was a loaded one used?

• The answer is a sequence:
  FFFFFFFLLLLLLFFF...

Page 28: Hidden Markov Models

The occasionally dishonest casino

w = w1, w2, w3 = 6, 2, 6   (0 denotes the start state)

s(1) = FFF:
Pr(w, s(1)) = p(F|0) p(6|F) p(F|F) p(2|F) p(F|F) p(6|F)
            = 0.5 × (1/6) × 0.99 × (1/6) × 0.99 × (1/6) ≈ 0.00227

s(2) = LLL:
Pr(w, s(2)) = p(L|0) p(6|L) p(L|L) p(2|L) p(L|L) p(6|L)
            = 0.5 × 0.5 × 0.8 × 0.1 × 0.8 × 0.5 = 0.008

s(3) = LFL:
Pr(w, s(3)) = p(L|0) p(6|L) p(F|L) p(2|F) p(L|F) p(6|L)
            = 0.5 × 0.5 × 0.2 × (1/6) × 0.01 × 0.5 ≈ 0.0000417

Page 29: Hidden Markov Models

Finding the Best Trajectory

• Too many trajectories (state sequences) to list
• Option 1: Beam Search
  – A beam is a set of partial hypotheses
  – Start with just the single empty trajectory
  – At each derivation step:
    • Consider all continuations of previous hypotheses
    • Discard most, keep top k

[Beam: the empty hypothesis <> expands to Fed:N, Fed:V, Fed:J; surviving hypotheses expand to raises:N, raises:V, …]

Beam search works ok in practice … but sometimes you want the optimal answer

… and there’s usually a better option than naïve beams
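A sketch of beam search over state sequences (hypothetical code, using the same log-space factors as the scoring sketch): keep only the k best partial hypotheses at each step.

```python
import math
from heapq import nlargest

def beam_search(words, states, start, trans, emit, k=3):
    # Each hypothesis: (log probability, state sequence so far).
    beam = [(math.log(start[s]) + math.log(emit[s][words[0]]), [s]) for s in states]
    beam = nlargest(k, beam)
    for w in words[1:]:
        # Consider all continuations of previous hypotheses...
        cand = [(lp + math.log(trans[seq[-1]][s]) + math.log(emit[s][w]), seq + [s])
                for lp, seq in beam for s in states]
        beam = nlargest(k, cand)   # ...discard most, keep top k
    return max(beam)               # best surviving hypothesis (not always optimal)
```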

Page 30: Hidden Markov Models

The State Lattice / Trellis

[State lattice: a column of states ^, N, V, J, D, $ for each of START, Fed, raises, interest, rates, END]

Page 31: Hidden Markov Models

The State Lattice / Trellis

[Same state lattice as the previous slide]

Page 32: Hidden Markov Models

Dynamic Programming

δi(s): probability of most likely state sequence ending with state s, given observations w1, …, wi

δi(s) = p(wi|s) · maxs' p(s|s') · δi-1(s')

Page 33: Hidden Markov Models

[Repeats the dynamic programming definition and recurrence from the previous slide]

Page 34: Hidden Markov Models

Viterbi Algorithm

Page 35: Hidden Markov Models

The Viterbi Algorithm

[Trellis: states 1, 2, …, K down the rows, observations w1 w2 … wi-1 wi … wN across the columns; cell (s, i) holds δi(s) = maxs' δi-1(s') · Ptrans · Pobs]

Remember: δi(s) = probability of most likely state seq ending with s at time i
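A sketch of the full algorithm in Python, in log space, with backpointers for the backchaining step described on the next slides (function and variable names are ours; parameter tables are the dictionaries from the earlier sketches):

```python
import math

def viterbi(words, states, start, trans, emit):
    """Return the most likely state sequence and its log probability."""
    delta = [{s: math.log(start[s]) + math.log(emit[s][words[0]]) for s in states}]
    back = [{}]
    for w in words[1:]:
        d, b = {}, {}
        for s in states:
            # delta_i(s) = P(w_i|s) * max_s' P(s|s') * delta_{i-1}(s')
            best = max(states, key=lambda sp: delta[-1][sp] + math.log(trans[sp][s]))
            d[s] = math.log(emit[s][w]) + math.log(trans[best][s]) + delta[-1][best]
            b[s] = best
        delta.append(d)
        back.append(b)
    # Terminate: choose the max in the last column, then backchain.
    s = max(states, key=lambda sp: delta[-1][sp])
    best_lp = delta[-1][s]
    path = [s]
    for b in reversed(back[1:]):
        s = b[s]
        path.append(s)
    return list(reversed(path)), best_lp
```

Each of the N columns takes |S|² max operations, giving the O(|S|²N) time and O(|S|N) space noted two slides below.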

Page 36: Hidden Markov Models

Terminating Viterbi

[Trellis: states 1, 2, …, K down the rows, observations w1 w2 … wN across the columns; in the final column, choose the max over the δ values]

Page 37: Hidden Markov Models

Terminating Viterbi

Time: O(|S|²N)
Space: O(|S|N) (linear in the length of the sequence)

δ* = max over the final column of maxs' δN-1(s') · Ptrans · Pobs

How did we compute δ*? Now backchain to find the final sequence.

Page 38: Hidden Markov Models

Viterbi: Example

Observations w1, w2, w3 = 6, 2, 6; states B (begin), F, L.

δ0(B) = 1, δ0(F) = 0, δ0(L) = 0

δ1(F) = (1/6)(1/2) = 1/12
δ1(L) = (1/2)(1/2) = 1/4

δ2(F) = (1/6) · max{(1/12)·0.99, (1/4)·0.2} = 0.01375
δ2(L) = (1/10) · max{(1/12)·0.01, (1/4)·0.8} = 0.02

δ3(F) = (1/6) · max{0.01375·0.99, 0.02·0.2} = 0.00226875
δ3(L) = (1/2) · max{0.01375·0.01, 0.02·0.8} = 0.008

δi(s) = p(wi|s) · maxs' p(s|s') · δi-1(s')
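Running the earlier Viterbi sketch with the casino tables reproduces this computation (a usage sketch; the start dictionary plays the role of the B column):

```python
path, lp = viterbi([6, 2, 6], ["F", "L"], start, trans, emit)
print(path)          # ['L', 'L', 'L']
print(math.exp(lp))  # 0.008, the winning delta_3(L) in the last column
```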

Page 39: Hidden Markov Models

Viterbi gets it right more often than not


Page 40: Hidden Markov Models

Computing Marginals

• Problem: find the marginal distribution P(si | w0:n) of each hidden state


In principle, we’re done – list all possible tag sequences, score each one, sum up the values

Fed raises interest rates 0.5 percent .

NNP VBZ NN NNS CD NN .

P(NNP|start) P(Fed|NNP) P(VBZ|NNP) P(raises|VBZ) P(NN|NNP) …

Given model parameters, we can score any tag sequence

Page 41: Hidden Markov Models

The State Lattice / Trellis

[Same state lattice as before]

Page 42: Hidden Markov Models

The Forward Backward Algorithm

P(si, w0:n) = P(si, w0:i, wi+1:n)
            = P(si, w0:i) · P(wi+1:n | si, w0:i)
            = P(si, w0:i) · P(wi+1:n | si)
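A sketch of both passes for the dictionary HMMs used above (names are ours); this version works in plain probabilities, with no log-space scaling, so it is only meant for short sequences:

```python
def forward_backward(words, states, start, trans, emit):
    """Return, for each position i, the marginal P(s_i | w_0:n)."""
    n = len(words)
    # Forward: alpha_i(s) = P(s_i = s, w_0:i)
    alpha = [{s: start[s] * emit[s][words[0]] for s in states}]
    for w in words[1:]:
        alpha.append({s: emit[s][w] * sum(alpha[-1][sp] * trans[sp][s] for sp in states)
                      for s in states})
    # Backward: beta_i(s) = P(w_i+1:n | s_i = s), with beta at the last position = 1
    beta = [{s: 1.0 for s in states}]
    for w in reversed(words[1:]):
        beta.insert(0, {s: sum(trans[s][sp] * emit[sp][w] * beta[0][sp] for sp in states)
                        for s in states})
    z = sum(alpha[-1].values())   # P(w_0:n): the Forward-algorithm evaluation answer
    return [{s: alpha[i][s] * beta[i][s] / z for s in states} for i in range(n)]
```

On the casino rolls [6, 2, 6], for instance, forward_backward returns how likely the loaded die was at each position, and z answers the Evaluation question.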

Page 43: Hidden Markov Models

The Forward Backward Algorithm

Sum over all paths, on both sides:


Page 44: Hidden Markov Models

The Forward Backward Algorithm

Page 45: Hidden Markov Models

HMM Learning

• Learning from data D

– Supervised

• D = {(s0:n,w0:n)i | i = 1 ... m}

– Unsupervised

• D = {(w0:n)i | i = 1 ... m}

• We won’t do this case!

• (~hidden vars) EM
  – Also called the Baum-Welch algorithm


Page 46: Hidden Markov Models

Supervised Learning

– Given data D = {Xi | i = 1 ... m } where Xi=(s0:n,w0:n) is a state, observation sequence pair

– Define the parameters Θ to include:
  • For every pair of states: θs'|s = P(s'|s)
  • For every state, obs. pair: θw|s = P(w|s)

– Then the data likelihood is:

  L(Θ) = ∏i P((s0:n, w0:n)i ; Θ)

And the maximum likelihood solution is:

  Θ* = argmaxΘ L(Θ)

Page 47: Hidden Markov Models

Final ML Estimates (as in BNs)

– c(s,s’) and c(s,w) are the empirical counts of transitions and observations in the data D

– The final, intuitive, estimates:

  P(s'|s) = c(s, s') / Σs'' c(s, s'')
  P(w|s) = c(s, w) / Σw' c(s, w')


Just as with BNs, the counts can be zero → use smoothing techniques!
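A sketch of supervised estimation by counting, with add-one (Laplace) smoothing standing in for the "smoothing techniques" mentioned above; the function and variable names are ours:

```python
from collections import Counter

def mle_estimates(data, states, vocab, k=1.0):
    """data: list of (state_seq, word_seq) pairs. Returns smoothed trans/emit tables."""
    c_trans, c_emit = Counter(), Counter()
    for ss, ws in data:
        c_trans.update(zip(ss, ss[1:]))   # c(s, s')
        c_emit.update(zip(ss, ws))        # c(s, w)
    trans = {s: {s2: (c_trans[s, s2] + k) /
                     (sum(c_trans[s, x] for x in states) + k * len(states))
                 for s2 in states} for s in states}
    emit = {s: {w: (c_emit[s, w] + k) /
                   (sum(c_emit[s, v] for v in vocab) + k * len(vocab))
                for w in vocab} for s in states}
    return trans, emit
```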

Page 48: Hidden Markov Models

The Problem with HMMs

• We want more than an Atomic View of Words

• We want many arbitrary, overlapping features of words

identity of word

ends in “-ly”, “-ed”, “-ing”

is capitalized

appears in a name database/Wordnet

[Graphical model: hidden states xt-1, xt, xt+1 emitting words wt-1, wt, wt+1]

Use discriminative models instead of generative ones (e.g., Conditional Random Fields)


Page 49: Hidden Markov Models

Finite State Models

[Diagram: Naïve Bayes → (sequence) → HMMs → (general graphs) → generative directed models; their conditional counterparts: Logistic Regression → (sequence) → linear-chain CRFs → (general graphs) → general CRFs]

Page 50: Hidden Markov Models

Temporal Models

• Full Bayesian Networks have dynamic versions too

– Dynamic Bayesian Networks (Chapter 15.5)

– HMM is a special case

• HMMs with continuous variables are often useful for filtering (estimating the current state)

– Kalman filters (Chapter 15.4)
