Infinite Structured Explicit Duration Hidden Markov Models
(fwood/talks/ishsmm.pdf)

Transcript of Infinite Structured Explicit Duration Hidden Markov Models

Page 1:

Infinite Structured Explicit Duration Hidden Markov Models

Frank Wood

Joint work with Chris Wiggins, Mike Dewar, and Jonathan Huggins

Columbia University

November, 2011

Wood (Columbia University) ISEDHMM November, 2011 1 / 57

Page 2:

Hidden Markov Models

Hidden Markov models (HMMs) [Rabiner, 1989] are an important tool for data exploration and engineering applications.

Applications include

Speech recognition [Jelinek, 1997, Juang and Rabiner, 1985]

Natural language processing [Manning and Schutze, 1999]

Hand-writing recognition [Nag et al., 1986]

DNA and other biological sequence modeling applications [Krogh et al., 1994]

Gesture recognition [Tanguay Jr, 1995, Wilson and Bobick, 1999]

Financial data modeling [Ryden et al., 1998]

. . . and many more.

Page 3:

Graphical Model: Hidden Markov Model

[Graphical model: plate over the K states with parameters πm and θm; latent chain z0 → z1 → z2 → z3 → · · · → zT with emissions y1, y2, y3, . . . , yT.]

Page 4:

Notation: Hidden Markov Model

zt |zt−1 = m ∼ Discrete(πm)

yt |zt = m ∼ Fθ(θm)

A = [ π1 ; · · · ; πm ; · · · ; πK ] — the K × K transition matrix whose m-th row is the transition distribution πm
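The two sampling statements above fully specify the generative process. A minimal sketch in Python (the values of K, A, θ, and the choice of a unit-variance Gaussian for Fθ are illustrative assumptions, not from the talk):

```python
import numpy as np

# Minimal HMM generative sketch (illustrative parameters; F_theta is taken
# to be a unit-variance Gaussian with state-specific mean theta_m).
rng = np.random.default_rng(0)

K = 3
A = np.array([[0.8, 0.1, 0.1],      # row m of A is the transition
              [0.1, 0.8, 0.1],      # distribution pi_m
              [0.1, 0.1, 0.8]])
theta = np.array([-5.0, 0.0, 5.0])  # emission means theta_m

def sample_hmm(T, z0=0):
    """Sample latent states z_1..z_T and emissions y_1..y_T."""
    z = np.empty(T, dtype=int)
    y = np.empty(T)
    state = z0
    for t in range(T):
        state = rng.choice(K, p=A[state])     # z_t | z_{t-1} = m ~ Discrete(pi_m)
        z[t] = state
        y[t] = rng.normal(theta[state], 1.0)  # y_t | z_t = m ~ F_theta(theta_m)
    return z, y

z, y = sample_hmm(200)
```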

Page 5:

HMM: Typical Usage Scenario (Character Recognition)

Training data: multiple “observed” yt = (vt , ht) sequences of stylus positions for each kind of character

Task: train |Σ| different models, one for each character

Latent states: sequences of strokes

Usage: classify new stylus position sequences using trained models Mσ = (Aσ, Θσ)

P(Mσ|y1, . . . , yT ) ∝ P(y1, . . . , yT |Mσ)P(Mσ)

Page 6:

Shortcomings of Original HMM Specification

Latent state dwell times are not usually geometrically distributed

P(zt = m, . . . , zt+L = m | A)
  = ∏_{ℓ=0}^{L−1} P(zt+ℓ+1 = m | zt+ℓ = m, A)
  = Geometric(L; πm(m))

Prior knowledge often suggests structural constraints on allowable transitions

The state cardinality K of the latent Markov chain is usually unknown

Page 7:

Explicit Duration HMM / Hidden Semi-Markov Model

[Mitchell et al., 1995, Murphy, 2002, Yu and Kobayashi, 2003, Yu, 2010]

Latent state sequence z = ((s1, r1), . . . , (sT , rT ))

Latent state id sequence s = (s1, . . . , sT )

Latent “remaining duration” sequence r = (r1, . . . , rT )

State-specific duration distribution Fr (λm)

Other distributions the same

An EDHMM transitions between states in a different way than a typical HMM does. Unless rt = 0, the current remaining duration is decremented and the state does not change. If rt = 0, the EDHMM transitions to a state m ≠ st according to the distribution defined by πst

Page 8:

EDHMM notation

Latent state zt = (st , rt) is a tuple consisting of the state identity and the time left in the state.

st | st−1, rt−1 ∼  { I(st = st−1),       rt−1 > 0
                   { Discrete(πst−1),    rt−1 = 0

rt | st , rt−1 ∼  { I(rt = rt−1 − 1),  rt−1 > 0
                  { Fr (λst ),         rt−1 = 0

yt | st ∼ Fθ(θst )
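The piecewise transition above can be sketched in Python. Choosing Poisson durations for Fr and a zero-diagonal πm (so that all dwelling is handled by the duration variable) are illustrative assumptions:

```python
import numpy as np

# One EDHMM transition step (slide notation; helper names are mine).
# Assumptions: F_r is Poisson(lambda_m); pi has a zero diagonal because the
# EDHMM moves to a state m != s_t whenever the duration expires.
rng = np.random.default_rng(1)

pi = np.array([[0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5],
               [0.5, 0.5, 0.0]])   # pi[m] excludes self-transitions
lam = np.array([10.0, 20.0, 3.0])  # duration rates lambda_m

def step(s_prev, r_prev):
    """Sample (s_t, r_t) given (s_{t-1}, r_{t-1})."""
    if r_prev > 0:
        return s_prev, r_prev - 1            # dwell: decrement the duration
    s = rng.choice(len(lam), p=pi[s_prev])   # s_t ~ Discrete(pi_{s_{t-1}})
    r = rng.poisson(lam[s])                  # r_t ~ F_r(lambda_{s_t})
    return int(s), int(r)
```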

Page 9:

EDHMM: Graphical Model

[Graphical model: plate over the K states with parameters λm, πm and θm; coupled latent chains (st , rt) for t = 0, . . . , T with emissions y1, y2, y3, . . . , yT.]

Page 10:

Structured HMMs: i.e. left-to-right HMM [Rabiner, 1989]

Example: Chicken pox

Observations: vital signs

Latent states: pre-infection, infected, post-infection (disregarding shingles)

State transition structure: can’t go from infected to pre-infection

Structured transitions imply zeros in the transition matrix A, i.e.

p(st = m | st−1 = ℓ) = 0 for all m < ℓ
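One way to encode such hard constraints is a structural mask on A. A sketch with a hypothetical upper-triangular mask for a left-to-right model (the uniform raw weights are placeholders; only the zero pattern matters):

```python
import numpy as np

# Left-to-right constraint as structural zeros:
# p(s_t = m | s_{t-1} = l) = 0 for all m < l.
K = 3
raw = np.ones((K, K))              # unconstrained positive weights
mask = np.triu(np.ones((K, K)))    # allow only transitions with m >= l
A = raw * mask
A /= A.sum(axis=1, keepdims=True)  # renormalize each row into pi_l
```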

Page 11:

Bayesian HMM

We will put a prior on the parameters so that we can effect a solution that conforms to our ideas about what the solution should look like

Structured prior examples

Ai,j = 0 (hard constraints)

Ai,j ≈ ∑j Ai,j (rich get richer)

Regularization means that we can specify a model with more parameters than could possibly be needed

infinite complexity (i.e. K → ∞) avoids many model selection problems

“extra” states can be thought of as auxiliary or nuisance variables

Page 12:

Bayesian HMM

[Graphical model: the HMM of the previous slide with priors attached: Hz generates each πm and Hy generates each θm, for m = 1, . . . , K.]

πm ∼ Hz

θm ∼ Hy

zt |zt−1 = m ∼ Discrete(πm)

yt |zt = m ∼ Fθ(θm)
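A sketch of sampling from these priors, with Hz taken to be a symmetric Dirichlet over the K next states and Hy a wide Gaussian over emission means (both hyperparameter choices are assumptions for illustration, not from the talk):

```python
import numpy as np

# Sampling the Bayesian HMM priors above. Assumed hyperparameters:
# H_z = Dirichlet(1, ..., 1), H_y = N(0, 10^2).
rng = np.random.default_rng(2)

K = 4
pi = rng.dirichlet(np.ones(K), size=K)  # pi_m ~ H_z, one row per state m
theta = rng.normal(0.0, 10.0, size=K)   # theta_m ~ H_y
```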

Page 13:

Infinite HMMs (IHMM) [Beal et al., 2002, Teh et al., 2006]

[Graphical model: the Bayesian HMM with the state plate taken to K → ∞; concentration parameters c0, c1 and γ and base measure Hy parameterize the prior over the infinitely many πm and θm.]

Page 14:

Other HMM variants

Sticky Infinite HMMs [Fox et al., 2011]

Extra parameter per state used to bias towards self-transition

Hierarchical HMMs [Murphy, 2001]

State is hierarchical (i.e. sequence of letters composed of stroke sequences)

Factorial HMMs [Ghahramani and Jordan, 1996]

Infinite explicit duration HMM [Johnson and Willsky, 2010]

No generative model. Latent history sampling is used to assert the existence of an implicitly defined IEDHMM that can be sampled from by rejecting HDP-HMM samples that violate transition constraints.

Page 15:

Infinite Structured Explicit Duration HMM (ISEDHMM)

Generative framework for HMMs with

Explicitly parameterized duration distributions

Structured transition priors

Countable state cardinality

Fundamental Problems

How to generate structured, dependent, infinite-dimensional transition distributions.

How to do inference in HMMs with countable state cardinality and countable duration distribution support.

[Huggins and W, 2011]

Page 16:

ISEDHMM: Graphical Model

[Graphical model: as the EDHMM but with an infinite plate; hyperparameters c0, c1, γ and base measures Hr and Hy govern λm, πm and θm.]

Structured, dependent, infinite-dimensional transition distributions?

Page 17:

ISEDHMM: Recipe

Poisson process [Kingman, 1993]

Gamma process [Kingman, 1993]

SNΓPs [Rao and Teh, 2009]

Structured, dependent, infinite-dimensional transition distributions [Huggins and W, 2011]

Page 18:

ISEDHMM: Recipe

Gamma process [Kingman, 1993] as a Poisson process over Θ ⊗ V ⊗ [0, ∞) with rate / mean measure

μ(Θ, V , S) = α(Θ, V ) ∫_S γ−1 e−γ dγ

A draw from a Gamma process with

α(Θ, V ) = c0 Hθ(Θ) Hv (V )

has the form

G = ∑_{m=1}^{∞} γm δ(θm, vm)

where (θm, vm) ∼ Hθ × Hv .
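A truncated draw from such a Gamma process can be sketched by combining DP stick-breaking weights with an independent total mass. The facts used here (the normalized weights of G follow a DP(c0) stick-breaking construction while G(Θ) ∼ Gamma(c0, 1), independently) and the particular choices Hθ = N(0, 1), Hv = Geom(0.5) are assumptions for illustration, not stated on the slide:

```python
import numpy as np

# Truncated draw from a Gamma process with base measure c0 * H_theta x H_v.
rng = np.random.default_rng(3)

def gamma_process_draw(c0, n_atoms=100):
    # DP(c0) stick-breaking weights for the normalized process
    betas = rng.beta(1.0, c0, size=n_atoms)
    sticks = betas * np.cumprod(np.concatenate(([1.0], 1.0 - betas[:-1])))
    total_mass = rng.gamma(c0, 1.0)          # G(Theta) ~ Gamma(c0, 1)
    gammas = total_mass * sticks             # unnormalized atom weights gamma_m
    thetas = rng.normal(0.0, 1.0, n_atoms)   # theta_m ~ H_theta
    vs = rng.geometric(0.5, n_atoms) - 1     # v_m ~ H_v, support {0, 1, 2, ...}
    return gammas, thetas, vs

gammas, thetas, vs = gamma_process_draw(c0=2.0)
```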

Page 19:

ISEDHMM: Recipe

Non-disjoint “restricted projections” of Gamma processes are dependent Gamma processes (SNΓPs) [Rao and Teh, 2009]

Hv = Geom(p)

[Figure: the auxiliary space V = {0, 1, 2, . . .} with atom locations v0, v1, . . . , v9, . . .; region Rm covers all of V except location m (and R+ covers everything), giving restricted projections

G0 = ∑_{m≠0} γm δθm , · · · , G4 = ∑_{m≠4} γm δθm , · · · ]

Page 20:

ISEDHMM: Recipe

Normalized dependent ΓP draws are dependent Dirichlet process draws.¹ In the ISEDHMM, the DP draws are the dependent, structured, infinite-dimensional transition distributions:

D4 = G4 / G4(Θ)                                              (1)
   = (∑_{m≠4} γm δθm) / (∑_{θ∈Θ} ∑_{m′≠4} γm′ δθm′ (θ))      (2)
   = (∑_{m≠4} γm δθm) / (∑_{m′≠4} γm′)                       (3)
   = ∑_{m≠4} (γm / ∑_{m′≠4} γm′) δθm

¹ A draw from a Dirichlet process [Ferguson, 1973] is an infinite sum of weighted atoms [Sethuraman, 1994] where the weights sum to one.
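Equations (1)–(3) amount to dropping atom m and renormalizing the remaining weights. A tiny sketch with made-up weights:

```python
import numpy as np

# Dependent DP draw D_m: the base gamma process weights with atom m removed,
# renormalized. The weight values are illustrative.
gammas = np.array([0.5, 1.2, 0.3, 0.8, 2.0])  # gamma_m of the base draw

def restricted_dp(gammas, m):
    w = gammas.copy()
    w[m] = 0.0          # restricted projection: drop atom m
    return w / w.sum()  # normalize: D_m = G_m / G_m(Theta)

D4 = restricted_dp(gammas, m=4)
```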

Page 21:

ISEDHMM: Recipe

[Diagram: base gamma process G = G_{R+} over the augmented space; restricted projections G_{R1}, . . . , G_{RM} give gamma processes G1, . . . , GM, which normalize to dependent DPs D1, . . . , DM; concentration parameters c0 and c1.]

Structured, dependent, infinite-dimensional transition distributions πm can be formed from draws from DDPs [Huggins and W, 2011]

Page 22:

ISEDHMM Inference: Beam Sampling

We employ the forward-filtering, backward slice-sampling approach used for the IHMM by [Van Gael et al., 2008] and for the EDHMM by [Dewar, Wiggins and W, 2011], in which the state and duration variables s and r are sampled conditioned on auxiliary slice variables u.

Net result: an efficient, always-finite forward-backward procedure for sampling the latent states.

Page 23:

Auxiliary Variables for Sampling

Objective: get samples of x.

Sometimes it is easier to introduce an auxiliary variable u and to Gibbs sample the joint P(x , u) (i.e. sample from P(x | u, λ) then P(u | x , λ), etc.), then discard the u values, than it is to sample directly from P(x | λ).

Useful when: P(x | λ) does not have a known parametric form but adding u results in a parametric form, and when x has countable support and sampling it requires enumerating all values.

[Graphical models: λ → x (direct), and λ → x with auxiliary variable u (augmented).]

Page 26:

Slice Sampling: A very useful auxiliary variable sampling trick

Unreasonable Pedagogical Example:

x |λ ∼ Poisson(λ) (countable support)

enumeration strategy for sampling x (impossible)

auxiliary variable u with P(u | x , λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)

Note: the marginal distribution of x is

P(x | λ) = ∫ P(x , u | λ) du
         = ∫ P(x | λ) P(u | x , λ) du
         = ∫ P(x | λ) · I(0 ≤ u ≤ P(x | λ)) / P(x | λ) du
         = ∫ I(0 ≤ u ≤ P(x | λ)) du = P(x | λ)

Page 27:

Slice Sampling: A very useful auxiliary variable sampling trick

This suggests a Gibbs sampling scheme: alternately sampling from

P(x | u, λ) ∝ I(u ≤ P(x | λ))

finite support, uniform over the slice, enumeration possible

P(u | x , λ) = I(0 ≤ u ≤ P(x | λ)) / P(x | λ)

uniform between 0 and P(x | λ)

then discarding the u values to arrive at x samples marginally distributed according to P(x | λ).
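The full Gibbs scheme for the pedagogical Poisson example can be sketched as follows (the enumeration bound on the slice is an implementation assumption; the Poisson pmf is negligible far above its mean):

```python
import numpy as np
from math import exp, factorial

# Slice sampler for x ~ Poisson(lam): alternately draw u uniformly under the
# pmf at the current x, then x uniformly over the slice {x : P(x | lam) > u}.
rng = np.random.default_rng(4)

def pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

def slice_sample_poisson(lam, n_iter=2000, x0=0):
    x, samples = x0, []
    upper = int(10 * lam) + 20  # safe enumeration bound (assumption)
    for _ in range(n_iter):
        u = rng.uniform(0.0, pmf(x, lam))                  # u | x, lam
        slice_vals = [k for k in range(upper) if pmf(k, lam) > u]
        x = int(rng.choice(slice_vals))                    # x | u, lam: uniform on slice
        samples.append(x)
    return np.array(samples)

samples = slice_sample_poisson(lam=3.0)
```

Marginally the samples are distributed as Poisson(λ), so their mean should be near λ.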

Page 28:

ISEDHMM Inference: Beam Sampling

Forward-backward slice sampling only has to consider a finite number of successor states at each timestep. With auxiliary variables

p(ut | zt , zt−1) = I(ut < p(zt | zt−1)) / p(zt | zt−1)

and

p(zt | zt−1) = p((st , rt) | (st−1, rt−1))
             = { I(st = st−1) I(rt = rt−1 − 1),  rt−1 > 0
               { πst−1,st Fr (rt ; λst ),        rt−1 = 0

one can run standard forward-backward conditioned on the u’s.
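A sketch of how ut truncates the countable duration support to a finite set when rt−1 = 0: only durations whose transition probability exceeds the slice survive. Poisson durations and all numeric values below are illustrative assumptions:

```python
from math import exp, factorial

# Only durations with pi_{s_{t-1} s_t} * F_r(r_t; lambda_{s_t}) > u_t need be
# considered in forward-backward for a fixed successor state.

def poisson_pmf(r, lam):
    return exp(-lam) * lam**r / factorial(r)

def surviving_durations(u_t, trans_prob, lam, r_max=100):
    """Finite set {r_t : p(z_t | z_{t-1}) > u_t} for one successor state."""
    return [r for r in range(r_max) if trans_prob * poisson_pmf(r, lam) > u_t]

durs = surviving_durations(u_t=0.01, trans_prob=0.5, lam=5.0)
```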

Page 29:

Results

To illustrate IEDHMM learning on synthetic data, five hundred datapoints were generated using a 4-state EDHMM with Poisson duration distributions with rates

λ = (10, 20, 3, 7)

and Gaussian emission distributions with means

μ = (−6, −2, 2, 6)

all with unit variance.

Page 30:

IEDHMM: Synthetic Data Results

[Figure: left, emissions and inferred means over the 500 timesteps; right, histogram of the posterior number of states (counts over 4–7 states).]

Page 31:

IEDHMM: Synthetic Data, State Duration Parameter Posterior

[Figure: histogram of posterior samples of the state duration rates (counts over rates 0–25).]

Page 32:

IEDHMM: Synthetic Data, State Mean Posterior

[Figure: histogram of posterior samples of the state emission means (counts over means −10 to 10).]

Page 33:

IEDHMM: Nanoscale Transistor Spontaneous Voltage Fluctuation

[Figure: emissions and inferred means over 5000 timesteps of spontaneous voltage fluctuation data.]

Page 34:

IEDHMM vs. IHMM: Modeling the Morse Code Cepstrum

Page 35:

Wrap-Up

Novel Gamma process construction for dependent, structured, infinite-dimensional HMM transition distributions.

Other transition distribution structures (left-to-right) can beimplemented simply by changing “restricted projection regions.”

ISEDHMM framework generalizes the HMM, Bayesian HMM, infiniteHMM, left-to-right HMM, explicit duration HMM, and more.

Page 36:

Future Work

Generalize to spatial prior on HMM states (“location”)

Simultaneous location and mapping

Process diagram modeling for systems biology

Applications; seeking “users”

Page 37:

Questions?

Thank you!

Page 38:

Bibliography I

M J Beal, Z Ghahramani, and C E Rasmussen. The Infinite Hidden Markov Model. In Advances in Neural Information Processing Systems, pages 29–245, March 2002.

M Dewar, C Wiggins, and F Wood. Inference in hidden Markov models with explicit state duration distributions. In Submission, 2011.

T. S. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1(2):209–230, 1973.

E B Fox, E B Sudderth, M I Jordan, and A S Willsky. A Sticky HDP-HMM with Application to Speaker Diarization. Annals of Applied Statistics, 5(2A):1020–1056, 2011.

Z Ghahramani and M I Jordan. Factorial Hidden Markov Models. Machine Learning, 29:245–273, 1996.

J. Huggins and F. Wood. Infinite structured explicit duration hidden Markov models. In Under Review, 2012.

Page 39:

Bibliography II

F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, 1997.

M Johnson and A S Willsky. The hierarchical Dirichlet process hidden semi-Markov model. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI-10), pages 252–259, 2010.

B H Juang and L R Rabiner. Mixture Autoregressive Hidden Markov Models for Speech Signals. IEEE Transactions on Acoustics, Speech, and Signal Processing, 33(6):1404–1413, 1985.

J F C Kingman. Poisson Processes. Oxford Studies in Probability. Oxford University Press, 1993.

A. Krogh, M. Brown, I. S. Mian, K. Sjolander, and D. Haussler. Hidden Markov models in computational biology: Applications to protein modelling. Journal of Molecular Biology, 235:1501–1531, 1994.

C. Manning and H. Schutze. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.

Page 40:

Bibliography III

C Mitchell, M Harper, and L Jamieson. On the complexity of explicit duration HMM’s. IEEE Transactions on Speech and Audio Processing, 3(3):213–217, 1995.

K P Murphy. Hierarchical HMMs. Technical report, University of California, Berkeley, 2001.

K P Murphy. Hidden semi-Markov models (HSMMs). Technical report, MIT, 2002.

R. Nag, K. Wong, and F. Fallside. Script recognition using hidden Markov models. In ICASSP86, pages 2071–2074, 1986.

L R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, pages 257–286, 1989.

V Rao and Y W Teh. Spatial Normalized Gamma Processes. In Advances in Neural Information Processing Systems, pages 1554–1562, 2009.

Page 41:

Bibliography IV

T. Ryden, T. Terasvirta, and S. Åsbrink. Stylized facts of daily return series and the hidden Markov model. Journal of Applied Econometrics, 13(3):217–244, 1998.

J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.

D. O. Tanguay Jr. Hidden Markov models for gesture recognition. PhD thesis, Massachusetts Institute of Technology, 1995.

Y W Teh, M I Jordan, M J Beal, and D M Blei. Hierarchical Dirichlet Processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.

J Van Gael, Y Saatci, Y W Teh, and Z Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, pages 1088–1095. ACM, 2008.

Page 42:

Bibliography V

A. D. Wilson and A. F. Bobick. Parametric hidden Markov models for gesture recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 21(9):884–900, 1999.

S Yu and H Kobayashi. An Efficient Forward-Backward Algorithm for an Explicit-Duration Hidden Markov Model. IEEE Signal Processing Letters, 10(1):11–14, 2003.

S Z Yu. Hidden semi-Markov models. Artificial Intelligence, 174(2):215–243, 2010.

Page 43:

IEDHMM vs. IHMM

[Figure: two surface plots of AMI difference as a function of log(1 + duration KL) and emission KL. Left: data from a Poisson duration distribution ISEDHMM; right: data from an IHMM; both fit with the Poisson duration distribution ISEDHMM.]

Page 44:

Poisson Process [Kingman, 1993]

A Poisson process can be defined by the requirement that the random variables defined as the counts of the number of “events” inside each of a number of non-overlapping finite sub-regions of some space should each have a Poisson distribution and should be independent of each other.

[Figure: a space Ω with non-overlapping sub-regions A, B, C, D, E.]

N(A) ∼ Poisson(μ(A))

Page 45:

Gamma Process as Poisson Process [Kingman, 1993]

Define a Poisson process over the product space Θ ⊗ [0, ∞) with mean measure

μ(Θ, S) = α(Θ) ∫_S γ−1 e−γ dγ

where Θ ∈ Ω and S ⊂ [0, ∞). A draw from this Poisson process yields a countably infinite set of pairs (θn, γn), n ≥ 1, which can be used to form an atomic random measure

G = ∑_{n≥1} γn δθn .

Page 46:

Gamma Process

This discrete measure is a draw from a Gamma process, G ∼ ΓP(α) (from the previous slide):

G = ∑_{n≥1} γn δθn .

A ΓP can be thought of as an unnormalized DP.

Page 47:

Dirichlet Processes

D = G/G(Θ) is a sample from a DP with base measure α:

D ∼ DP(α).

A draw from a Dirichlet process is an infinite mixture of weighted, discrete atoms.

For the ISEDHMM

each atom is a next state (there are a countably infinite number of such states)

each atom’s weight is the probability of transitioning to that state

there will be a countably infinite number of such transition distributions

Page 48:

Structured Transitions Via Dependent Dirichlet Processes

One way to define a set of dependent DPs is to construct a base gamma process over an augmented space by taking the union of disjoint independent gamma processes, then define a series of restricted projections of that base process, which are themselves gamma processes.

The normalization of these dependent gamma processes forms a set of dependent DPs.

We will use this procedure to construct a number of dependent DPs (one for each HMM state) which preclude certain transitions. The precluded transitions, a form of dependence, arise from the particulars of the restricted projection regions.

Page 51:

SNΓP construction of IEDHMM transition distributions

[Diagram: base gamma process G = G_{R+} over the augmented space; restricted projections G_{R1}, . . . , G_{RM} give gamma processes G1, . . . , GM, which normalize to the dependent DPs D1, . . . , DM; concentration parameters c0 and c1.]

[Huggins and W, 2011]

Page 52:

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

To review SNΓPs formally, let V be an arbitrary auxiliary space. In general, one can think of this as a covariate, time, or an index. For V ⊂ V, let

α(Θ, V ) = c0 Hθ(Θ) Hv (V )

be the base measure for a gamma process G = ΓP(α) defined over the product space Θ ⊗ V. Here c0 is a concentration parameter and Hθ and Hv are probability measures. We will refer to G as the “base ΓP”.

Page 53:

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

Let T be an index set and define restricted projected measures αm for all m ∈ T such that

αm(Θ) = α(Θ, Vm)

where Vm is a subset of V indexed by m.

The SNΓP gets its name from thinking of V as a space and Vm as a region of space indexed by m. With G ∼ G being a draw from the base gamma process, define the restricted projection Gm by

Gm(Θ) = G(Θ, Vm).

Then Gm is distributed according to a gamma process with base measure αm:

Gm ∼ ΓP(αm).
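To make the restricted-projection step concrete, the sketch below simulates a finite N-atom truncation of the base ΓP and checks that a partition of V splits the total mass exactly. All distributional choices here (Gaussian Hθ, geometric Hv, the truncation level) are illustrative assumptions, not choices from the talk:

```python
import numpy as np

# Finite N-atom truncation of a gamma process G ~ GammaP(alpha) on Theta x V,
# with alpha(Theta, V) = c0 * H_theta(Theta) * H_v(V).
rng = np.random.default_rng(0)
N, c0 = 2000, 5.0

theta = rng.normal(size=N)            # atom locations in Theta (H_theta = N(0,1), assumed)
v = rng.geometric(0.3, size=N) - 1    # atom locations in V = {0, 1, ...} (H_v = Geom(0.3), assumed)
w = rng.gamma(c0 / N, 1.0, size=N)    # atom weights; total mass -> Gamma(c0, 1) as N grows

def G(region):
    """Restricted projection G_m(Theta) = G(Theta, V_m): total weight of atoms with v in V_m."""
    return w[np.isin(v, list(region))].sum()

# Restriction to a partition of V loses no mass.
V0, V_rest = {0}, set(range(1, int(v.max()) + 1))
assert np.isclose(G(V0) + G(V_rest), w.sum())
```

Because each Gm simply reads off a subset of the same atoms, the restrictions are coupled through the shared base draw.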


Page 56:

Spatial Normalized Gamma Processes [Rao and Teh, 2009]

Normalizing Gm yields a draw Dm = Gm/Gm(Θ) distributed according to a Dirichlet process with base measure αm:

Dm ∼ DP(αm).

Dm is not in general independent of Dm′ because they can share atoms from G.
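A small numerical illustration of that dependence, again using an assumed finite truncation and an arbitrary Hv on {0, . . . , 9}: two overlapping regions yield normalized measures that place mass on the same atoms.

```python
import numpy as np

rng = np.random.default_rng(0)
N, c0 = 1000, 5.0
v = rng.integers(0, 10, size=N)        # atom locations in V = {0,...,9} (illustrative H_v)
w = rng.gamma(c0 / N, 1.0, size=N)     # truncated gamma-process weights

def D(region):
    """Normalized restriction D_m = G_m / G_m(Theta): a DP draw over atoms with v in V_m."""
    d = np.where(np.isin(v, list(region)), w, 0.0)
    return d / d.sum()

# V_1 = {0..5} and V_2 = {3..8} overlap on {3, 4, 5}, so D_1 and D_2
# share atoms from the base draw and are therefore dependent.
D1, D2 = D(range(6)), D(range(3, 9))
shared = (D1 > 0) & (D2 > 0)
assert shared.any()
```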


Page 57:

IEDHMM [Huggins and W, 2011] (an example ISEDHMM)

Recall the gamma process G = ΓP(α) with base measure

α(Θ, V) = c0 Hθ(Θ) Hv(V).

Draws G ∼ G are measures over the parameter and covariate product space Θ ⊗ V with the form

G = ∑_{m=1}^∞ γm δ(θm, vm)

where (θm, vm) ∼ Hθ × Hv.

In the IEDHMM, θm comprises the duration parameter λm and the output distribution parameters for state m.

For pedagogical expediency, let T = V = {0, 1, 2, 3, 4, . . .}; then vm ∈ ℕ.


Page 58:

IEDHMM: “spatial” regions for restrictions of base ΓP

Rm = {vm},  m ∈ T

R+ = V \ ⋃_{m∈T} Rm

R = {Rm : m ∈ T} ∪ {R+}

R̄m = R \ {Rm},  m ∈ T
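A literal rendering of these region definitions, using the illustrative choices vm = m and a finite stand-in for V:

```python
# Regions of the IEDHMM restriction, for instantiated states T = {0,...,M-1}
# with v_m = m and V truncated to {0,...,99} (both illustrative choices).
M = 4
V = set(range(100))                            # stand-in for V = {0, 1, 2, ...}
R = {m: {m} for m in range(M)}                 # point regions R_m = {v_m}
R_plus = V - set().union(*R.values())          # R_+ = V minus all occupied points
R_bar = {m: [R[k] for k in R if k != m] + [R_plus] for m in range(M)}  # R with R_m removed

# R_bar_m never contains state m's own point: self-transitions are precluded.
assert all(not any(m in r for r in R_bar[m]) for m in range(M))
```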


Page 59:

IEDHMM: Geometric base distribution over states example

Hv = Geom(p)

[Figure: the covariate axis 0, 1, 2, . . . ; the instantiated states occupy singleton regions R0 = {v0}, . . . , R6 = {v6}, and the remaining points (v7, v8, v9, . . .) are collected into R+.]


Page 60:

IEDHMM: SNΓP restrictions

From before, using the restricted projection αR(Θ) = α(Θ, R), setting GR(Θ) = G(Θ, R) and DR = GR/GR(Θ), we have

GR ∼ ΓP(αR)

DR ∼ DP(αR).

In the case of the IEDHMM, the base measure corresponding to the point region Rm ∈ R is

αRm(Θ) = c0 δθm(Θ) Hv(Rm),

while for R+ it is

αR+(Θ) = c0 Hθ(Θ) Hv(R+).


Page 62:

IEDHMM: From restricted ΓPs to DPs

A highly dependent transition distribution for state m is a DP draw Dm:

Dm = Gm / Gm(Θ)

= ( ∑_{R∈R̄m} GR ) / ( ∑_{R′∈R̄m} GR′(Θ) )

= ∑_{R∈R̄m} [ GR(Θ) / ∑_{R′∈R̄m} GR′(Θ) ] · GR/GR(Θ)

= ∑_{R∈R̄m} [ GR(Θ) / ∑_{R′∈R̄m} GR′(Θ) ] DR.


Page 63:

IEDHMM: Mixture of DP construction for transition distributions

DRm ∼ DP(αRm)

DR+ ∼ DP(αR+)

γm ∼ Gamma(c0 Hv(Rm), 1)

γ+ ∼ Gamma(c0 Hv(R+), 1)

βmk = I(m ≠ k) γk / ( γ+ + ∑_{k′≠m}^M γk′ )

βm+ = γ+ / ( γ+ + ∑_{k′≠m}^M γk′ )

Dm = ∑_{k≠m}^M βmk DRk + βm+ DR+.
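Computed numerically, these mixture weights form a proper transition row with a zero in the self-transition slot. The gamma shapes c0 Hv(Rm) are taken uniform below purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
M, c0 = 5, 4.0
Hv = np.full(M + 1, 1.0 / (M + 1))    # H_v(R_1),...,H_v(R_M), H_v(R_+) (assumed uniform)
g = rng.gamma(c0 * Hv, 1.0)           # gamma_1,...,gamma_M, gamma_plus

def beta_row(m):
    """beta_mk = I(m != k) gamma_k / (gamma_+ + sum_{k' != m} gamma_k'); last entry is beta_m+."""
    denom = g[-1] + g[:M].sum() - g[m]
    beta = g / denom
    beta[m] = 0.0                     # self-transition weight is zero by construction
    return beta

for m in range(M):
    b = beta_row(m)
    assert np.isclose(b.sum(), 1.0) and b[m] == 0.0
```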


Page 64:

IEDHMM: Relaxing the dependence hierarchically

These dependent DPs are the base distributions for conditional state transition distributions D̃m ∼ DP(c1 Dm). With βm = (βm1, . . . , βmM, βm+),

πm ∼ Dirichlet(c1 βm),

D̃m = ∑_{k=1}^M πmk δθk + πm+ DR+

where c1 is a concentration parameter and πm = (πm1, . . . , πmM, πm+).

The conditional state transition probability row vector πm is finite, since the probabilities of transitioning to new states have been merged into a single probability πm+ = ∑_{k=M+1}^∞ πmk. This "bin" is dynamically split and joined at inference time.
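Sampling one relaxed row is then a finite Dirichlet draw. Since a zero Dirichlet parameter must map to an exactly-zero coordinate (the precluded self-transition), the sketch builds the draw from normalized gammas rather than calling a Dirichlet sampler directly; the βm row used is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(3)
c1 = 2.0
beta_m = np.array([0.0, 0.3, 0.25, 0.25, 0.2])  # (beta_m1..beta_mM, beta_m+); beta_mm = 0

# pi_m ~ Dirichlet(c1 * beta_m), built from normalized gammas so that the
# zero-parameter coordinate is exactly zero (no self-transition).
g = np.zeros_like(beta_m)
pos = beta_m > 0
g[pos] = rng.gamma(c1 * beta_m[pos], 1.0)
pi_m = g / g.sum()

assert np.isclose(pi_m.sum(), 1.0) and pi_m[0] == 0.0
```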
