Hidden Markov Models. Introduction to Artificial Intelligence, COS302. Michael L. Littman, Fall 2001.


Page 1: Hidden Markov Models

Introduction to Artificial Intelligence

COS302

Michael L. Littman

Fall 2001

Page 2: Administration

Exams need a swift kick.

Form project groups next week.

Project due on the last day.

BUT! There will be milestones.

In 2 weeks: synonyms via the web.

In 3 weeks: synonyms via WordNet.

Page 3: Shannon Game Again

Recall:

Sue swallowed the large green __.

OK: pepper, frog, pea, pill
Not OK: beige, running, very

Parts of speech could help:

noun verbed det adj adj noun

Page 4: POS Language Model

How could we define a probabilistic model over sequences of parts of speech?

How could we use this to define a model over word sequences?

Page 5: Hidden Markov Model

Idea: We have states with a simple transition rule (Markov model). We observe a probabilistic function of the states. Therefore, the states are not what we see…

Page 6: HMM Example

While browsing the web, the connection is either up or down. We observe a response or no response.

[Figure: two-state diagram. UP emits Response with probability 0.7 and No response with 0.3; DOWN emits Response with 0.0 and No response with 1.0. Transitions: UP→UP 0.9, UP→DOWN 0.1, DOWN→UP 0.8, DOWN→DOWN 0.2.]

Page 7: HMM Definition

N states, M observations

π(s): prob. that the starting state is s

p(s, s'): prob. of an s to s' transition

b(s, k): prob. of observation k from state s

k_0 k_1 … k_l: observation sequence
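To make the definition concrete, here is a minimal Python sketch of the UP/DOWN model from Page 6. The names (STATES, pi, p, b) are illustrative choices, and π(UP) = 1.0 is an assumption inferred from the Page 9 example, which starts the chain in UP with probability 1.0.

    # UP/DOWN web-browsing HMM from Page 6.
    # pi[s]    : probability the chain starts in state s (assumed, see above)
    # p[s][s2] : probability of a transition from s to s2
    # b[s][k]  : probability of emitting observation k from state s
    STATES = ["UP", "DOWN"]
    OBS = ["R", "N"]  # R = response, N = no response

    pi = {"UP": 1.0, "DOWN": 0.0}
    p = {"UP":   {"UP": 0.9, "DOWN": 0.1},
         "DOWN": {"UP": 0.8, "DOWN": 0.2}}
    b = {"UP":   {"R": 0.7, "N": 0.3},
         "DOWN": {"R": 0.0, "N": 1.0}}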

Page 8: HMM Problems

Probability of a state sequence

Probability of an observation sequence

Most likely state sequence for an observation sequence

Most likely model given an observation sequence

[Figure: the UP/DOWN model diagram from Page 6.]

Page 9: States Example

Probability of:

U U D D U U U

is

1.0 · 0.9 · 0.1 · 0.2 · 0.8 · 0.9 · 0.9 = 0.011664

(one factor of π for the start, then one transition probability per step)

[Figure: the UP/DOWN model diagram from Page 6.]

Page 10: Derivation

Pr(s_0 … s_l)

= Pr(s_0) Pr(s_1 | s_0) Pr(s_2 | s_0 s_1) … Pr(s_l | s_0 s_1 s_2 … s_{l-1})

= Pr(s_0) Pr(s_1 | s_0) Pr(s_2 | s_1) … Pr(s_l | s_{l-1})   (Markov property)

= π(s_0) p(s_0, s_1) p(s_1, s_2) … p(s_{l-1}, s_l)
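This product translates directly into code. A minimal sketch, reusing the hypothetical pi and p dictionaries from the Page 7 sketch:

    def state_seq_prob(states, pi, p):
        """Pr(s_0 ... s_l) = pi(s_0) * p(s_0,s_1) * ... * p(s_{l-1},s_l)."""
        prob = pi[states[0]]
        for prev, cur in zip(states, states[1:]):
            prob *= p[prev][cur]
        return prob

    # Page 9 example: U U D D U U U
    print(state_seq_prob(["UP", "UP", "DOWN", "DOWN", "UP", "UP", "UP"], pi, p))
    # -> 0.011664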

Page 11: States/Obs. Example

Probability of:

R R N N N R N

given

U U D D U U U

is

0.7 · 0.7 · 1.0 · 1.0 · 0.3 · 0.7 · 0.3 = 0.03087

[Figure: the UP/DOWN model diagram from Page 6.]

Page 12: Derivation

Pr(k_0 … k_l | s_0 … s_l)

= Pr(k_0 | s_0…s_l) Pr(k_1 | s_0…s_l, k_0) Pr(k_2 | s_0…s_l, k_0, k_1) … Pr(k_l | s_0…s_l, k_0…k_{l-1})

= Pr(k_0 | s_0) Pr(k_1 | s_1) Pr(k_2 | s_2) … Pr(k_l | s_l)   (each observation depends only on its own state)

= b(s_0, k_0) b(s_1, k_1) b(s_2, k_2) … b(s_l, k_l)
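Again a direct transcription, as a sketch in the same style (b is the hypothetical emission dictionary from the Page 7 sketch):

    def obs_given_states_prob(obs, states, b):
        """Pr(k_0 ... k_l | s_0 ... s_l) = b(s_0,k_0) * ... * b(s_l,k_l)."""
        prob = 1.0
        for s, k in zip(states, obs):
            prob *= b[s][k]
        return prob

    # Page 11 example: R R N N N R N given U U D D U U U
    print(obs_given_states_prob(list("RRNNNRN"),
                                ["UP", "UP", "DOWN", "DOWN", "UP", "UP", "UP"], b))
    # -> 0.03087 (up to floating-point rounding)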

Page 13: Observation Example

Probability of:

R R N N N R N

= sum over {s_0 … s_6} of Pr(s_0 … s_6) Pr(R R N N N R N | s_0 … s_6)

[Figure: the UP/DOWN model diagram from Page 6.]

Page 14: Derivation

Pr(k_0 … k_l)

= sum over {s_0…s_l} of Pr(s_0…s_l) Pr(k_0…k_l | s_0…s_l)

= sum over {s_0…s_l} of Pr(s_0) Pr(k_0 | s_0) Pr(s_1 | s_0) Pr(k_1 | s_1) Pr(s_2 | s_1) Pr(k_2 | s_2) … Pr(s_l | s_{l-1}) Pr(k_l | s_l)

= sum over {s_0…s_l} of π(s_0) b(s_0, k_0) p(s_0, s_1) b(s_1, k_1) p(s_1, s_2) b(s_2, k_2) … p(s_{l-1}, s_l) b(s_l, k_l)

Page 15: Uh Oh, Better Get α

The sum grows exponentially with l: N^{l+1} state sequences.

How do we do this more efficiently?

α(i,t): probability of seeing the first t observations and ending up in state i: Pr(k_0…k_t, s_t = i)

α(i,0) = π(s_i) b(s_i, k_0)

α(i,t) = sum_j α(j,t-1) p(s_j, s_i) b(s_i, k_t)

Return Pr(k_0…k_l) = sum_j α(j,l)
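A sketch of this recurrence (the forward algorithm) in Python, again using the hypothetical dictionaries from the Page 7 sketch:

    def forward(obs, states, pi, p, b):
        """alpha[t][i] = Pr(k_0 ... k_t, s_t = i); returns Pr(k_0 ... k_l).
        Cost is O(l * N^2) instead of exponential in l."""
        alpha = [{i: pi[i] * b[i][obs[0]] for i in states}]
        for t in range(1, len(obs)):
            alpha.append({i: sum(alpha[t - 1][j] * p[j][i] for j in states)
                             * b[i][obs[t]]
                          for i in states})
        return sum(alpha[-1][i] for i in states)

    print(forward(list("NNN"), STATES, pi, p, b))  # -> 0.04317

On N N N this matches the brute-force sum above, as it should.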

Page 16: Partial Derivation

α(i,t)

= Pr(k_0…k_t, s_t = i)

= sum_j Pr(k_0…k_{t-1}, s_{t-1} = j) Pr(k_0…k_t, s_t = i | k_0…k_{t-1}, s_{t-1} = j)

= sum_j α(j,t-1) Pr(k_0…k_{t-1} | k_0…k_{t-1}, s_{t-1} = j) Pr(s_t = i, k_t | k_0…k_{t-1}, s_{t-1} = j)

= sum_j α(j,t-1) Pr(s_t = i | k_0…k_{t-1}, s_{t-1} = j) Pr(k_t | s_t = i, k_0…k_{t-1}, s_{t-1} = j)

= sum_j α(j,t-1) Pr(s_t = i | s_{t-1} = j) Pr(k_t | s_t = i)

= sum_j α(j,t-1) p(s_j, s_i) b(s_i, k_t)

Page 17: Pr(N N N)

= Pr(UUU) Pr(NNN | UUU)
+ Pr(UUD) Pr(NNN | UUD)
+ Pr(UDU) Pr(NNN | UDU)
+ Pr(UDD) Pr(NNN | UDD)
+ Pr(DUU) Pr(NNN | DUU)
+ Pr(DUD) Pr(NNN | DUD)
+ Pr(DDU) Pr(NNN | DDU)
+ Pr(DDD) Pr(NNN | DDD)

[Figure: the UP/DOWN model diagram from Page 6.]
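Working the sum out, assuming as in the Page 9 example that the chain starts in UP with probability 1.0 (so the four sequences starting with D contribute nothing):

Pr(UUU) Pr(NNN | UUU) = (1.0 · 0.9 · 0.9)(0.3 · 0.3 · 0.3) = 0.02187
Pr(UUD) Pr(NNN | UUD) = (1.0 · 0.9 · 0.1)(0.3 · 0.3 · 1.0) = 0.0081
Pr(UDU) Pr(NNN | UDU) = (1.0 · 0.1 · 0.8)(0.3 · 1.0 · 0.3) = 0.0072
Pr(UDD) Pr(NNN | UDD) = (1.0 · 0.1 · 0.2)(0.3 · 1.0 · 1.0) = 0.006

Pr(NNN) = 0.02187 + 0.0081 + 0.0072 + 0.006 = 0.04317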

Page 18: Pr(N N N)

[Figure: the UP/DOWN model from Page 6, plus a trellis with observations N N N across the top and a node for UP and for DOWN at each time step.]

Page 19: POS Tagging

A simple tagging model says that a part of speech depends only on the previous part of speech, and a word depends only on its part of speech.

So, the HMM state is the previous part of speech, and the observation is the word.

What are the probabilities, and where could they come from?
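One standard answer is to estimate them by relative frequency from a POS-tagged corpus. A sketch, where the corpus format (lists of (word, tag) pairs) and the function name are illustrative assumptions:

    from collections import Counter, defaultdict

    def estimate_hmm(tagged_sentences):
        """Relative-frequency estimates of pi, p, b from sentences
        given as lists of (word, tag) pairs."""
        start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
        for sent in tagged_sentences:
            tags = [tag for _, tag in sent]
            start[tags[0]] += 1                      # pi: first tag of sentence
            for prev, cur in zip(tags, tags[1:]):
                trans[prev][cur] += 1                # p: tag-to-tag transitions
            for word, tag in sent:
                emit[tag][word] += 1                 # b: word given tag

        def normalize(counts):
            total = sum(counts.values())
            return {key: n / total for key, n in counts.items()}

        return (normalize(start),
                {tag: normalize(c) for tag, c in trans.items()},
                {tag: normalize(c) for tag, c in emit.items()})

    pi_hat, p_hat, b_hat = estimate_hmm(
        [[("Sue", "noun"), ("saw", "verb"), ("the", "det"), ("fly", "noun")]])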

Page 20: Shannon Game

If we have a POS-based language model, how do we compute probabilities?

Pr(Sue ate the small blue candy.)

= sum over tags of Pr(sent | tags) Pr(tags)

Page 21: Best Tag Sequence

A useful problem to solve: given a sentence, find the most likely sequence of POS tags.

Sue   saw   the   fly   .
noun  verb  det   noun  .

max_tags Pr(tags | sent)

= max_tags Pr(sent | tags) Pr(tags) / Pr(sent)

= c · max_tags Pr(sent | tags) Pr(tags)

(c = 1/Pr(sent) is the same for every tag sequence, so it doesn't affect the max)

Page 22: Viterbi

δ(i,t): probability of the most likely state sequence that ends in state i when seeing the first t observations:

max over {s_0…s_{t-1}} of Pr(k_0…k_t, s_0…s_{t-1}, s_t = i)

δ(i,0) = π(s_i) b(s_i, k_0)

δ(i,t) = max_j δ(j,t-1) p(s_j, s_i) b(s_i, k_t)

Return max_j δ(j,l) (trace it back)
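A sketch of Viterbi with explicit backpointers, in the same style as the earlier forward-algorithm sketch (pi, p, b are the hypothetical dictionaries from Page 7):

    def viterbi(obs, states, pi, p, b):
        """Most likely state sequence for obs, recovered by tracing back."""
        delta = [{i: pi[i] * b[i][obs[0]] for i in states}]
        back = [{}]
        for t in range(1, len(obs)):
            delta.append({}); back.append({})
            for i in states:
                # Best predecessor j for ending in state i at time t.
                best_j = max(states, key=lambda j: delta[t - 1][j] * p[j][i])
                delta[t][i] = delta[t - 1][best_j] * p[best_j][i] * b[i][obs[t]]
                back[t][i] = best_j
        # Trace back from the best final state.
        last = max(states, key=lambda j: delta[-1][j])
        path = [last]
        for t in range(len(obs) - 1, 0, -1):
            path.append(back[t][path[-1]])
        return list(reversed(path))

    print(viterbi(list("NNN"), STATES, pi, p, b))  # -> ['UP', 'UP', 'UP']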

Page 23: max Pr(s*) Pr(NNN | s*)

max {
  Pr(UUU) Pr(NNN | UUU),
  Pr(UUD) Pr(NNN | UUD),
  Pr(UDU) Pr(NNN | UDU),
  Pr(UDD) Pr(NNN | UDD),
  Pr(DUU) Pr(NNN | DUU),
  Pr(DUD) Pr(NNN | DUD),
  Pr(DDU) Pr(NNN | DDU),
  Pr(DDD) Pr(NNN | DDD) }

[Figure: the UP/DOWN model diagram from Page 6.]

Page 24: max Pr(s*) Pr(NNN | s*)

[Figure: the UP/DOWN model from Page 6, plus a trellis with observations N N N across the top and a node for UP and for DOWN at each time step.]

Page 25: PCFG

An analogy…

Finite-state Automata (FSAs) : Hidden Markov Models (HMMs)
::
Context-free Grammars (CFGs) : Probabilistic Context-free Grammars (PCFGs)

Page 26: Grammars & Recursion

S → NP VP (1.0)

NP → NP PP (0.45), → N (0.55)

PP → P NP (1.0)

VP → VP PP (0.3), → V NP (0.7)

N → mice (0.4), → hats (0.25), → pimples (0.35)

P → with (1.0)

V → wore (1.0)
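A sketch of how this grammar might be represented, and of the fact that a derivation's probability is just the product of the probabilities of the rules it uses. The dictionary layout and the tuple encoding of trees are illustrative choices, not from the lecture:

    # Each nonterminal maps to a list of (probability, right-hand side) pairs.
    GRAMMAR = {
        "S":  [(1.0, ["NP", "VP"])],
        "NP": [(0.45, ["NP", "PP"]), (0.55, ["N"])],
        "PP": [(1.0, ["P", "NP"])],
        "VP": [(0.3, ["VP", "PP"]), (0.7, ["V", "NP"])],
        "N":  [(0.4, ["mice"]), (0.25, ["hats"]), (0.35, ["pimples"])],
        "P":  [(1.0, ["with"])],
        "V":  [(1.0, ["wore"])],
    }

    def rule_prob(lhs, rhs):
        """Probability of the rule lhs -> rhs in GRAMMAR."""
        for prob, cand in GRAMMAR[lhs]:
            if cand == rhs:
                return prob
        raise KeyError((lhs, rhs))

    def tree_prob(tree):
        """Probability of a parse tree (label, child, ...), where each child
        is another tuple or a terminal word: multiply the rules used."""
        label, *children = tree
        rhs = [c[0] if isinstance(c, tuple) else c for c in children]
        prob = rule_prob(label, rhs)
        for c in children:
            if isinstance(c, tuple):
                prob *= tree_prob(c)
        return prob

    # One parse of "mice wore hats":
    t = ("S", ("NP", ("N", "mice")),
              ("VP", ("V", "wore"), ("NP", ("N", "hats"))))
    print(tree_prob(t))  # 1.0 * 0.55 * 0.4 * 0.7 * 1.0 * 0.55 * 0.25 = 0.021175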

Page 27: Computing with PCFGs

Can compute Pr(sent) and also max_tree Pr(tree | sent) using algorithms like the ones we discussed for HMMs. Still polynomial time.

Page 28: What to Learn

HMM definition

Computing the probability of state and observation sequences

Part-of-speech tagging

Viterbi: most likely state sequence

PCFG definition

Page 29: Homework 7 (due 11/21)

1. Give a maximization scheme for filling in the two blanks in a sentence like "I hate it when ___ goes ___ like that." Be somewhat rigorous to make the TA's job easier.

2. Derive that (a) Pr(k_0…k_l) = sum_j α(j,l), and (b) α(i,0) = π(s_i) b(s_i, k_0).

3. Email me your project group of three or four.

Page 30: Homework 8 (due 11/28)

1. Write a program that decides if a pair of words are synonyms using the web. I'll send you the list; you send me the answers.

2. More soon.

Page 31: Homework 9 (due 12/5)

1. Write a program that decides if a pair of words are synonyms using WordNet. I'll send you the list; you send me the answers.

2. More soon.