
Hidden Markov Models (HMMs)

November 14, 2017

inferring a hidden truth

1) You hear a static-filled radio transmission.

How can you determine what the sender intended to say?

2) You know that genes have open reading frames and are segmented (interrupted by introns) in eukaryotic genomes.

Looking at a large genomic region with lots of open reading frames, which ones belong to genes?

inferring a hidden truth: simple HMMs

Hidden states (intended words, gene presence) are related to the observed states (static, lots of open reading frames).

Each hidden state has one observed state.

Characteristics of the current hidden state are governed in some way by the previous hidden state, but not that state’s previous state or any earlier states.

Characteristics of the current observed state are governed by the current hidden state.

Markov chains

3 states of weather: sunny, cloudy, rainy. Observed once a day at the same time.

All transitions are possible, with some probability. Each state depends only on the previous state.

Markov chains: another view

[Diagram: a Markov chain unrolled over times t1 … t5: "start" leads into state1, then state1 → state2 → state3 → state4 → state5.]

Markov chains

State transition matrix: the probability of the weather today given yesterday’s weather

The rows of the transition matrix must sum to one. An initial distribution must be defined (day one: p(sunny) = ?, p(cloudy) = ?, p(rainy) = ?).

Markov chains

P(xL | xL-1, xL-2, ..., x1) = P(xL | xL-1) for all L.

What does this mean? The state at position L does not depend on anything but the previous state. This is the memoryless (Markov) property. Very important.

First-order Markov model

P(x) = probability of a particular sequence of observations x = {x1, x2, ..., xn}. pij = probability that if the previous symbol is i, the next symbol will be j.

Under this model, p(ACCGATA) (the probability of observing this sequence) is just pA · pAC · pCC · pCG · pGA · pAT · pTA,

where pAC = p(there will be a C after an A) = p(C|A), and that probability does NOT depend on anything in the sequence besides that preceding A.

Then p(ACCGATA) = p1 · Π pij, where p1 is the probability of the first symbol and the product runs over every consecutive pair of symbols in the sequence.
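As a concrete illustration, here is a minimal Python sketch of this calculation. The formula is the one on this slide; the initial and transition probabilities are invented placeholders (a real chain would estimate them from sequence data).

# Minimal sketch of the first-order Markov chain calculation above.
# The initial and transition probabilities are made-up placeholders.
initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
transition = {
    ("A", "C"): 0.3, ("C", "C"): 0.2, ("C", "G"): 0.3,
    ("G", "A"): 0.4, ("A", "T"): 0.2, ("T", "A"): 0.3,
}

def markov_prob(seq, initial, transition):
    """p(x) = p_first_symbol * product of transition probabilities along the sequence."""
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[(prev, cur)]
    return p

print(markov_prob("ACCGATA", initial, transition))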

Higher order Markov chains

Sunny = S, cloudy = C

2nd order Markov model: weather depends on yesterday plus the day before

Not all state transitions are possible!

SSCSCC decomposes into overlapping pair states: (S1S2) → (S2C3) → (C3S4) → (S4C5) → (C5C6)

The pair states are SS, SC, CS, CC. A pair can only be followed by a pair whose first letter matches its second letter (for example, SS can go to SS or SC, but not to CS or CC).
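A second-order chain can be handled as a first-order chain over pair states. Here is a small Python sketch of that bookkeeping; only the SSCSCC decomposition above comes from the slide, and the pair-to-pair probabilities are invented for illustration.

# Sketch: a 2nd-order Markov chain over {S, C} rewritten as a 1st-order chain
# over pair states SS, SC, CS, CC. Probabilities are invented placeholders.
# Impossible transitions (e.g. SS -> CS) are simply absent from the table.
pair_transition = {
    ("SS", "SS"): 0.7, ("SS", "SC"): 0.3,
    ("SC", "CS"): 0.4, ("SC", "CC"): 0.6,
    ("CS", "SS"): 0.5, ("CS", "SC"): 0.5,
    ("CC", "CS"): 0.2, ("CC", "CC"): 0.8,
}

def pairs(weather):
    """'SSCSCC' -> ['SS', 'SC', 'CS', 'SC', 'CC'] (overlapping pairs)."""
    return [weather[i:i + 2] for i in range(len(weather) - 1)]

seq = pairs("SSCSCC")                     # the decomposition shown above
p = 1.0
for prev, cur in zip(seq, seq[1:]):
    p *= pair_transition[(prev, cur)]     # conditional on the first pair
print(seq, p)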

Hidden Markov Models

Back to the weather example. All we can observe now is the behavior of a dog; only the dog can see the weather, we cannot!

Dog can be in, out, or standing pathetically on the porch. This depends on the weather in a quantifiable way. How do we figure out what the weather is if we can only observe the dog?

Hidden Markov Models

The dog's behavior is the "emission" of the weather (the hidden states)

Output matrix = emission probabilities

Hidden states = the system described by the Markov model

Observable states = side effects of the Markov model

Hidden Markov Models: another view

[Diagram: hidden states q1 … q5 at times t1 … t5, linked by transition probabilities pq1q2, pq2q3, pq3q4, pq4q5, with initial probability πq1; each qt produces one of observations 1–5.]

q’s can be sunny, cloudy, or rainy. Observations are the dog’s behavior (in, out, porch)

Hidden Markov Models

All we observe is the dog:

IOOOIPIIIOOOOOPPIIIIIPI

What's the underlying weather (the hidden states)?

How likely is this sequence, given our model of how the dog works?

What portion of the sequence was generated by each state?

Hidden Markov Models

All we observe is the dog: IOIOIPI

What was the weather? Guess RSRSRRR? Then p(dog's behavior | that weather) = 0.023, but p(RSRSRRR) is only 0.00012.

Guess CCCCCCC? Then p(dog's behavior | that weather) = 0.002, but p(CCCCCCC) = 0.0094.

start: p(c)=0.2, p(r)=0.2, p(s)=0.6
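A Python sketch of the comparison on this slide: for a guessed weather sequence Q, multiply out p(Q) under the Markov chain and p(observations | Q) under the emission probabilities. The start probabilities are the ones above; the transition and emission numbers are placeholders, so the output will not reproduce the 0.023 and 0.00012 figures.

# Sketch of the slide's comparison. Only the start probabilities come from the
# slide; the transition and emission matrices are invented placeholders.
initial = {"S": 0.6, "C": 0.2, "R": 0.2}                     # from the slide
trans = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},                # invented
         "C": {"S": 0.3, "C": 0.4, "R": 0.3},
         "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
emit = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},                 # invented
        "C": {"I": 0.3, "O": 0.4, "P": 0.3},
        "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

obs, guess = "IOIOIPI", "RSRSRRR"

p_weather = initial[guess[0]]                 # p(Q) under the Markov chain
for prev, cur in zip(guess, guess[1:]):
    p_weather *= trans[prev][cur]

p_dog_given_weather = 1.0                     # p(observations | Q)
for state, o in zip(guess, obs):
    p_dog_given_weather *= emit[state][o]

print(p_weather, p_dog_given_weather, p_weather * p_dog_given_weather)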

Hidden Markov Models: the three questions

Evaluation: Given an HMM, M, and a sequence of observations, x, find P(x|M).

Decoding: Given an HMM, M, and a sequence of observations, x, find the sequence Q of hidden states that maximizes P(x, Q|M).

Learning: Given an HMM, M, with unknown parameters and a sequence of observations, x, find the parameters θ that maximize P(x|θ, M).

Hidden Markov model: Five components

1. A set of N hidden states S1, S2, ..., SN

S1 = sunny, S2 = cloudy, S3 = rainy; N = 3

Hidden Markov model: Five components

2. An alphabet of distinct observation symbols

A = {in, out, porch} = {I,O,P}

Hidden Markov model: Five components

3. Transition probability matrix P = (pij), where qt is shorthand for the hidden state at time t: qt = Si means that the hidden state at time t was state Si, and pij = P(qt+1 = Sj | qt = Si). The transition matrix is over the hidden states!

Hidden Markov model: Five components

4. Emission probabilities: for each state Si and each symbol a in A, bi(a) = p(Si emits symbol a). The probabilities bi(a) form an N×M matrix, where N = #hidden states and M = #observed states.

b1(O) = p(S1 emits "out") = 0.7

Hidden Markov model: Five components

5. An initial distribution vector π = (πi) where πi = P(q1 = Si).

start: p(c)=0.2, p(r)=0.2, p(s)=0.6

p(q1 = S1) = probability that the (hidden) first state is sunny = 0.6

so π = (0.6, 0.2, 0.2). NOTE that π specifies only the first hidden state; the first emitted symbol is not part of the initial distribution vector, it comes from the emission part of the model.
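The five components, collected in one place as a Python sketch. π and b1(O) = 0.7 are from the slides; the remaining transition and emission numbers are placeholders used purely for illustration in the sketches that follow.

# The five components of the dog/weather HMM.
# pi and b["S"]["O"] = 0.7 come from the slides; other numbers are placeholders.
states = ["S", "C", "R"]            # 1. N = 3 hidden states (sunny, cloudy, rainy)
alphabet = ["I", "O", "P"]          # 2. M = 3 observation symbols (in, out, porch)
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},    # 3. transition matrix pij over hidden states
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},    # 4. emission matrix bi(a), N x M
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}
pi = {"S": 0.6, "C": 0.2, "R": 0.2}          # 5. initial distribution over hidden states

# sanity check: each row of the transition matrix sums to one
assert all(abs(sum(P[i].values()) - 1.0) < 1e-9 for i in states)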

HMMs: another view

[Diagram: at times t1 … t5, hidden states Sq1 … Sq5 (initial probability πq1, transitions pq1q2 … pq4q5) emit the observations x1 … x5 with probabilities bq1(x1) … bq5(x5).]

x = {O, I, I, O, P}, and the hidden states come from {S, C, R}.

HMM: solve problem 1 (evaluation)

Given an HMM, M, and a sequence, x, find P(x|M). This tells you how unusual the observations are, regardless of the hidden states.

One way to do this is brute force: find all possible sequences of hidden states Q, calculate P(x|Q) for each, and then P(x) = Σ P(x|Q) P(Q), summing over ALL hidden state sequences Q.

But this takes an exponential number of calculations, on the order of 2T·N^T, where N = #hidden states and T = length of the observed sequence.
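A brute-force Python sketch, just to make the cost concrete: it enumerates every hidden-state sequence Q, so the work grows like N^T. The parameters are the placeholder dog/weather numbers introduced above.

# Brute-force evaluation: P(x) = sum over all Q of P(Q) * P(x|Q).
# Placeholder dog/weather parameters; only pi is from the slides.
from itertools import product

states = "SCR"
pi = {"S": 0.6, "C": 0.2, "R": 0.2}
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

def brute_force_prob(x):
    """Sum P(Q) * P(x|Q) over every hidden-state sequence Q (N**T of them)."""
    total = 0.0
    for Q in product(states, repeat=len(x)):
        p = pi[Q[0]] * B[Q[0]][x[0]]
        for t in range(1, len(x)):
            p *= P[Q[t - 1]][Q[t]] * B[Q[t]][x[t]]
        total += p
    return total

print(brute_force_prob("IOIOIPI"))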

HMM: solve problem 1 (evaluation)

Given an HMM, M, and a sequence, x, find P(x|M)

Forward algorithm:

Calculate the probability of the sequence of observations up to and including time t:

P(x1 x2 x3 x4 ... xt) = ?? That's the same problem.

If we knew the hidden state at time t, we could use that, so let

α(t, i) = P(x1 x2 x3 x4 ... xt, qt = Si)

(a joint probability)
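A Python sketch of the forward recursion under that definition, using the same placeholder parameters as before: α(1, i) = πi bi(x1), and α(t+1, j) = bj(xt+1) Σi α(t, i) pij.

# Forward algorithm: alpha[t][i] = P(x_1 ... x_t, q_t = S_i).
# Placeholder dog/weather parameters; only pi is from the slides.
states = "SCR"
pi = {"S": 0.6, "C": 0.2, "R": 0.2}
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

def forward(x):
    """Return the list of alpha tables, one per time point."""
    alpha = [{i: pi[i] * B[i][x[0]] for i in states}]          # initialization
    for t in range(1, len(x)):
        alpha.append({j: B[j][x[t]] * sum(alpha[-1][i] * P[i][j] for i in states)
                      for j in states})                        # recursion
    return alpha

alpha = forward("IOIOIPI")
print(sum(alpha[-1].values()))   # P(x), in O(N^2 T) work instead of N^T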

HMM: solve problem 1 (evaluation)

[Diagram (evaluation): trellis for observations I O P P I at times t1 … t5 = T, with hidden states q1 … q5, emissions bq1(x1) … bq5(x5), transitions pq1q2 … pq4q5, and initial probability πq1.]

HMM: solve problem 1 (evaluation)

Given an HMM, M, and a sequence of observations, x, find P(x|M). We can also use the Backward algorithm for this problem. Briefly:

Define β(t, i) = P(xt+1 xt+2 ... xT | qt = Si), where T is the total number of observations. Initialize β(T, i) = 1 for every state, then generalize backward: β(t, i) = Σj pij bj(xt+1) β(t+1, j).
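A Python sketch of that backward recursion, with the same placeholder parameters as before.

# Backward algorithm: beta[t][i] = P(x_{t+1} ... x_T | q_t = S_i), beta[T][i] = 1.
# Placeholder dog/weather parameters; only pi is from the slides.
states = "SCR"
pi = {"S": 0.6, "C": 0.2, "R": 0.2}
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

def backward(x):
    """Return the list of beta tables, one per time point."""
    beta = [{i: 1.0 for i in states}]                          # beta at t = T
    for t in range(len(x) - 2, -1, -1):                        # work backward
        beta.insert(0, {i: sum(P[i][j] * B[j][x[t + 1]] * beta[0][j] for j in states)
                        for i in states})
    return beta

x = "IOIOIPI"
beta = backward(x)
print(sum(pi[i] * B[i][x[0]] * beta[0][i] for i in states))   # same P(x) as the forward pass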


HMM: forward-backward algorithm

If you're following this, you realize the forward and backward algorithms alone are just formalizations of the brute-force method! How does this help?

Now, if we fix the identity of the hidden state at time t, we can calculate the probability of the sequence. These formulas come in handy later.

HMM: forward-backward algorithm

The forward and backward algorithms work together: I know how to calculate the probability of a sequence up to a time point, if I know the hidden state at that time point (α(t,i)).

I know how to calculate the probability of the observed sequence from the end back to a time point, given the hidden state at that time point (β(t, i)).

HMM: forward-backward algorithm

dog: IOOOPPPIO (observations 1–9)

p(IOOOPPPIO) = p(obs 1–4, q4 = S) · p(obs 5–9 | q4 = S)
             + p(obs 1–4, q4 = C) · p(obs 5–9 | q4 = C)
             + p(obs 1–4, q4 = R) · p(obs 5–9 | q4 = R)
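A Python sketch of this decomposition: with the same placeholder parameters, Σi α(t, i) β(t, i) gives the same P(IOOOPPPIO) no matter which time point t we split at.

# Forward-backward consistency check: P(x) = sum_i alpha(t, i) * beta(t, i) for any t.
# Placeholder dog/weather parameters; only pi is from the slides.
states = "SCR"
pi = {"S": 0.6, "C": 0.2, "R": 0.2}
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

x = "IOOOPPPIO"

alpha = [{i: pi[i] * B[i][x[0]] for i in states}]              # forward pass
for t in range(1, len(x)):
    alpha.append({j: B[j][x[t]] * sum(alpha[-1][i] * P[i][j] for i in states)
                  for j in states})

beta = [{i: 1.0 for i in states}]                              # backward pass
for t in range(len(x) - 2, -1, -1):
    beta.insert(0, {i: sum(P[i][j] * B[j][x[t + 1]] * beta[0][j] for j in states)
                    for i in states})

# The same P(x) comes out at every split point; t = 3 (0-indexed) is the slide's day 4.
for t in (0, 3, len(x) - 1):
    print(t, sum(alpha[t][i] * beta[t][i] for i in states))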

HMM: solve problem 2 (decoding)

Decoding: Given an HMM M and a sequence x, find the sequence Q of hidden states that maximizes P(x, Q|M)

We'll use the Viterbi algorithm. Assumptions needed for the Viterbi algorithm:

1) the observed and hidden events must be in a sequence

2) an observed event must correspond to one and only one hidden event

3) computing the most likely sequence up to point t depends only on the observed event at t and the most likely sequence up to t-1.

HMM: solve problem 2 (decoding)

Viterbi algorithm

HMM: solve problem 2 (decoding)

Viterbi algorithm

In this problem, we don’t really care what the probability of the sequence is. It happened. What we want to know is whether it was sunny when the dog was inside on the third day.

HMM: solve problem 2 (decoding)

Viterbi algorithm: uses a form of dynamic programming to decode hidden state at each time point, using the forward and backward algorithms.

Observed: OOIP

What is q3?

HMM: solve problem 2 (decoding)

Viterbi algorithm: uses a form of dynamic programming to decode hidden state at each time point, using the forward and backward algorithms.

OOIP What is q3?

time      t1        t2        t3        t4
hidden    q1        q2        q3        q4      (start probability π(q1); transitions p12, p23, p34)
emission  bq1(O)    bq2(O)    bq3(I)    bq4(P)
emitted   O         O         I         P

HMM: solve problem 2 (decoding)

OOIP What is q3?

need to figure out the most likely hidden states & look at which is at t3.



HMM: solve problem 2 (decoding)


P(emitted sequence and hidden path) = π(q1) · bq1(O) · p12 · bq2(O) · p23 · bq3(I) · p34 · bq4(P)

HMM: solve problem 2 (decoding)

[Viterbi trellis: one column per observation O O I P, one row per hidden state: Sunny, Cloudy, Rainy.]

HMM: solve problem 2 (decoding)

O O I P

With q3 fixed to Sunny: π(q1) · bq1(O) · p12 · bq2(O) · p2S · bS(I) · pS4 · bq4(P)
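A Python sketch of the Viterbi recursion for OOIP, again with the placeholder parameters: for each state and time point keep the probability of the best path ending there plus a backpointer, then trace back to read off the decoded weather (and hence q3).

# Viterbi decoding: delta[t][i] = max over paths ending in state i at time t.
# Placeholder dog/weather parameters; only pi is from the slides.
states = "SCR"
pi = {"S": 0.6, "C": 0.2, "R": 0.2}
P = {"S": {"S": 0.6, "C": 0.3, "R": 0.1},
     "C": {"S": 0.3, "C": 0.4, "R": 0.3},
     "R": {"S": 0.2, "C": 0.4, "R": 0.4}}
B = {"S": {"I": 0.1, "O": 0.7, "P": 0.2},
     "C": {"I": 0.3, "O": 0.4, "P": 0.3},
     "R": {"I": 0.6, "O": 0.1, "P": 0.3}}

def viterbi(x):
    """Return the most probable hidden-state sequence Q for observations x."""
    delta = [{i: pi[i] * B[i][x[0]] for i in states}]
    back = []                                                  # backpointers
    for t in range(1, len(x)):
        row, ptr = {}, {}
        for j in states:
            best = max(states, key=lambda i: delta[-1][i] * P[i][j])
            row[j] = delta[-1][best] * P[best][j] * B[j][x[t]]
            ptr[j] = best
        delta.append(row)
        back.append(ptr)
    q = max(states, key=lambda i: delta[-1][i])                # best final state
    path = [q]
    for ptr in reversed(back):                                 # trace back
        q = ptr[q]
        path.insert(0, q)
    return path

path = viterbi("OOIP")
print(path, "q3 =", path[2])    # the decoded hidden state on day 3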


HMM: solve problem 2 (decoding)

The Viterbi algorithm is very powerful and can distinguish subtle features of strings

Originally designed for speech processing

“Dishonest casino problem”

One example: very crude gene finding

Transition matrix (row = hidden state of chunk ni, column = hidden state of chunk ni+1):

             exon   intron   intergenic
exon         0.4    0.5      0.1
intron       0.2    0.8      0
intergenic   0.1    0        0.9

Emission matrix (probability of each 21bp observation given the hidden state):

             21bp coding   21bp noncoding
exon         0.90          0.1
intron       0.2           0.8
intergenic   0.3           0.7

CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN

Evaluating nonoverlapping chunks of 21 bp of sequence: C = "coding" (no stop codon), N = "noncoding" (one or more stop codons).
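A Python sketch of this crude gene finder: Viterbi decoding of the C/N sequence using the two matrices above. The slides do not give an initial distribution over the three hidden states, so a uniform start is assumed here.

# Crude gene-finding sketch: Viterbi with the slide's transition and emission matrices.
# The uniform initial distribution is an assumption, not from the slides.
states = ["exon", "intron", "intergenic"]
pi = {s: 1 / 3 for s in states}
P = {"exon":       {"exon": 0.4, "intron": 0.5, "intergenic": 0.1},
     "intron":     {"exon": 0.2, "intron": 0.8, "intergenic": 0.0},
     "intergenic": {"exon": 0.1, "intron": 0.0, "intergenic": 0.9}}
B = {"exon":       {"C": 0.9, "N": 0.1},
     "intron":     {"C": 0.2, "N": 0.8},
     "intergenic": {"C": 0.3, "N": 0.7}}

x = "CCCCCNNNNNNCCNNNCCCCCCCCCNNCNCCCNNNNNN"   # one symbol per 21 bp chunk

delta = [{i: pi[i] * B[i][x[0]] for i in states}]
back = []
for t in range(1, len(x)):
    row, ptr = {}, {}
    for j in states:
        best = max(states, key=lambda i: delta[-1][i] * P[i][j])
        row[j] = delta[-1][best] * P[best][j] * B[j][x[t]]
        ptr[j] = best
    delta.append(row)
    back.append(ptr)

q = max(states, key=lambda i: delta[-1][i])
path = [q]
for ptr in reversed(back):
    q = ptr[q]
    path.insert(0, q)

# One letter per chunk: E = exon, I = intron, - = intergenic
print("".join({"exon": "E", "intron": "I", "intergenic": "-"}[s] for s in path))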

Applications of HMMs and Markov Models

Sequence alignment, pairwise and multiple (PFAM, HMMPro, HMMER, SAM)

Making profiles to describe sequence families

Finding signals in DNA

Gene finding (GLIMMER, GENSCAN)

Motif finding

Segmentation analysis (microarray data, any signals)

Finding CpG islands