2012 mdsp pr06  hmm


Transcript of 2012 mdsp pr06  hmm

Page 1: 2012 mdsp pr06  hmm

Course Calendar (Class / Date / Contents)

1 Sep. 26 Course information & Course overview

2 Oct. 4 Bayes Estimation

3 Oct. 11 Classical Bayes Estimation - Kalman Filter -

4 Oct. 18 Simulation-based Bayesian Methods

5 Oct. 25 Modern Bayesian Estimation: Particle Filter

6 Nov. 1 HMM (Hidden Markov Model)

- Nov. 8 No Class

7 Nov. 15 Supervised Learning

8 Nov. 29 Bayesian Decision

9 Dec. 6 PCA (Principal Component Analysis)

10 Dec. 13 ICA (Independent Component Analysis)

11 Dec. 20 Applications of PCA and ICA

12 Dec. 27 Clustering, k-means et al.

13 Jan. 17 Other Topics 1: Kernel machine

14 Jan. 22 (Tue) Other Topics 2

Page 2: 2012 mdsp pr06  hmm

Lecture Plan

Hidden Markov Model

1. Introduction

2. Hidden Markov Model (HMM)

Discrete-time Markov Chain & HMM

3. Evaluation Problem

4. Decoding Problem

5. Learning Problem

6. HMM for speech recognition

Page 3: 2012 mdsp pr06  hmm

1. Introduction


1.1 Discrete-time hidden Markov model (HMM)

The HMM is a stochastic model of an ordered or sequential process

that can be used for modeling and estimation. Its distinguishing feature

is the probabilistic modeling of both the state transitions and the

measurement process. The internal states are usually not observed

directly and are therefore hidden.

1.2 Applications

All fields that treat discrete representations of stochastic processes:

Speech recognition, Natural language, Communications, Economics,

Biomedical (DNA analysis), Computer vision (Gesture recognition)

Page 4: 2012 mdsp pr06  hmm

2. HMM


2.1 Discrete-time Markov Chains (finite state machine)

At each time step $t$, the state variable $x(t)$ takes one value from the state space
$X = \{x_1, x_2, \ldots, x_N\}$.    (1)

$\Pr[x(t) = x_i]$ : the probability that at time $t$ the state $i$ is occupied,
$i = 1, 2, \ldots, N$, with $0 \le \Pr[x(t) = x_i] \le 1$.    (2)

1st-order Markovian:
$\Pr[x(t+1) \mid x(t), x(t-1), \ldots, x(0)] = \Pr[x(t+1) \mid x(t)]$    (3)

Define the state transition probability (time-stationary):
$a_{mn} := \Pr[x(t+1) = x_n \mid x(t) = x_m]$    (4)

Initial probability: $\Pr[x(0) = x_i], \quad i = 1, \ldots, N$

Page 5: 2012 mdsp pr06  hmm

2.2 Hidden Markov Model (HMM)

Introduce the measurement or output process into the Markov chain:

Markov chain (hidden (latent) states) + measurements (visible variables) → HMM

Discrete outputs in the observation space:
$Y = \{y_1, y_2, \ldots, y_{N_y}\}$ (observation space)    (5)

The observation or emission probability (likelihood):
$c_{kl} := \Pr[y(t) = y_l \mid x(t) = x_k]$    (6)

Define the state and output sequences:
$X^T := \{x(1), x(2), \ldots, x(T)\}, \quad Y^T := \{y(1), y(2), \ldots, y(T)\}$    (7)

Initial / final (absorber) states: $x(0)$ and $x(T) = x_0$, the final absorbing state,
which gives the last output $y(T) = y_0$ uniquely.

Page 6: 2012 mdsp pr06  hmm

Structure of HMM

[Trellis diagram: hidden states $x_i(t-1) \to x_j(t) \to x_k(t+1)$ connected by the transition probabilities $a_{ij}, a_{jk}$, emitting the visible outputs $y_l(t-1), y_m(t), y_n(t+1)$ with the emission probabilities $c_{il}, c_{jm}, c_{kn}$]

x(t) : state sequence (hidden)

y(t) : measurement sequence

Parameters: $a_{ij}$, $c_{jk}$, and the initial-state probabilities.
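To make these ingredients concrete, here is a minimal Python sketch (not from the slides; the parameter values and names are hypothetical) that stores $a_{ij}$, $c_{jk}$ and the initial distribution, and generates a hidden-state/output sequence from them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state, 4-output HMM.
A = np.array([[0.6, 0.3, 0.1],       # a_ij = Pr[x(t+1) = x_j | x(t) = x_i]
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
C = np.array([[0.5, 0.3, 0.1, 0.1],  # c_jk = Pr[y(t) = y_k | x(t) = x_j]
              [0.1, 0.4, 0.4, 0.1],
              [0.2, 0.1, 0.2, 0.5]])
pi = np.array([1.0, 0.0, 0.0])       # Pr[x(0) = x_i]

def sample_hmm(A, C, pi, T, rng):
    """Generate hidden states x(1..T) and outputs y(1..T) from the model."""
    x = rng.choice(len(pi), p=pi)                # initial state x(0)
    states, outputs = [], []
    for _ in range(T):
        x = rng.choice(A.shape[1], p=A[x])       # state transition a_ij
        states.append(x)
        outputs.append(rng.choice(C.shape[1], p=C[x]))  # emission c_jk
    return states, outputs

print(sample_hmm(A, C, pi, T=5, rng=rng))
```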

Page 7: 2012 mdsp pr06  hmm

Three Central Issues in HMM

[Trellis diagram as on the previous page: transitions $a_{ij}, a_{jk}$; emissions $c_{il}, c_{jm}, c_{kn}$]

Evaluation problem : Determine the probability $\Pr[Y^T]$ of a particular output sequence $Y^T$.

Decoding problem : For a given output sequence $Y^T$, determine the most probable sequence $X^T$ of hidden states that leads to the observation $Y^T$.

Learning problem : Given a set of output sequences $\{Y^{(i)T}\}$, determine the model parameters $a_{ij}$ and $c_{jk}$.


Page 11: 2012 mdsp pr06  hmm

3. Evaluation Problem

The probability that the model produces an output sequence $Y^T$ is

$\Pr[Y^T] = \sum_{r=1}^{r_{\max}} \Pr[Y^T \mid X_r^T]\, \Pr[X_r^T]$    (8)

where $r$ indicates an index of a possible sequence of $T$ hidden states
$X_r^T = \{x(1), x(2), \ldots, x(T)\}$, and $r_{\max} = N_x^T$ (the number of possible terms in $X_r^T$).

From the first-order Markovian property, the second term of (8) is

$\Pr[X_r^T] = \Pr[x(1), x(2), \ldots, x(T)] = \prod_{t=1}^{T} \Pr[x(t) \mid x(t-1)]$  (a product of the $a_{ij}$'s)    (9)

At $t = T$, $x(T) = x_0$ : the final absorbing state, which gives the output $y_0$ uniquely.

Page 12: 2012 mdsp pr06  hmm

From the measurement mechanism,

$\Pr[Y^T \mid X_r^T] = \prod_{t=1}^{T} \Pr[y(t) \mid x(t)]$    (10)

Finally, (9) and (10) give

$\Pr[Y^T] = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} \Pr[y(t) \mid x(t)]\, \Pr[x(t) \mid x(t-1)]$    (11)

(the sum is taken over all possible sequences of $T$ hidden states).

Basic elements in the $\Pr[Y^T]$ representation in (11) are the conditional probabilities

$\Pr[y(t) \mid x(t)]$ and $\Pr[x(t) \mid x(t-1)]$    (12)

Forward Algorithm

Evaluation of $\Pr[Y^T]$ by the forward algorithm uses the following forward variables:

$\alpha_j(t) := \Pr[Y^t,\; x(t) = x_j]$    (13)
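Equation (11) can also be evaluated directly by enumerating all $N_x^T$ hidden-state sequences, but the cost grows exponentially with $T$; avoiding that cost is what the forward recursion on the next page achieves. A minimal brute-force sketch (mine, assuming a known initial state $x(0)$ and discrete output indices):

```python
import itertools
import numpy as np

def evaluate_bruteforce(A, C, x0, y):
    """Pr[Y^T] = sum over all hidden-state sequences of
    prod_t Pr[y(t)|x(t)] Pr[x(t)|x(t-1)]   -- Eq. (11)."""
    N, T = A.shape[0], len(y)
    total = 0.0
    for states in itertools.product(range(N), repeat=T):   # candidate x(1), ..., x(T)
        prev, p = x0, 1.0
        for t in range(T):
            p *= A[prev, states[t]] * C[states[t], y[t]]    # a_{x(t-1)x(t)} * c_{x(t)y(t)}
            prev = states[t]
        total += p
    return total
```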

Page 13: 2012 mdsp pr06  hmm

The $\alpha_j(t)$ represents the probability that the HMM is in the hidden state $x_j$ at step $t$ (i.e. $x(t) = x_j$), having generated the first $t$ ($0 \le t \le T$) elements of $Y^T$ (i.e. $Y^t$).

Forward variable computation algorithm:

$\alpha_j(t) = \begin{cases} 0 & t = 0 \text{ and } x_j \ne \text{initial state} \\ 1 & t = 0 \text{ and } x_j = \text{initial state} \\ \Big[\sum_i \alpha_i(t-1)\, a_{ij}\Big]\, c_{jk} & \text{otherwise } (t \ge 1) \end{cases}$    (14)

where $k$ is the index of the output $y(t) = y_k$.
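A minimal Python sketch of the recursion (14), assuming (as in the slides) a known initial state; the array names are mine:

```python
import numpy as np

def forward(A, C, x0, y):
    """Forward variables alpha[j, t] = Pr[Y^t, x(t) = x_j]  (Eq. 13), computed by Eq. (14).

    A[i, j] = a_ij, C[j, k] = c_jk, x0 = index of the known initial state,
    y = observed output indices y(1), ..., y(T).
    """
    N, T = A.shape[0], len(y)
    alpha = np.zeros((N, T + 1))
    alpha[x0, 0] = 1.0                                 # t = 0: all mass on the initial state
    for t in range(1, T + 1):
        k = y[t - 1]                                   # index of the output y(t)
        for j in range(N):
            alpha[j, t] = (alpha[:, t - 1] @ A[:, j]) * C[j, k]
    return alpha
```

$\Pr[Y^T]$ is then read off the last column, e.g. as the value in the final absorbing state in the example below, or as the sum over states in general.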

Page 14: 2012 mdsp pr06  hmm

Example: Forward algorithm

Three hidden states $x_1, x_2, x_3$, an explicit absorber state $x_0$, and a unique final output $y_0$.

$\{a_{ij}\} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0.2 & 0.3 & 0.1 & 0.4 \\ 0.2 & 0.5 & 0.2 & 0.1 \\ 0.8 & 0.1 & 0.0 & 0.1 \end{pmatrix}, \qquad \{c_{jk}\} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 0.3 & 0.4 & 0.1 & 0.2 \\ 0 & 0.1 & 0.1 & 0.7 & 0.1 \\ 0 & 0.5 & 0.2 & 0.1 & 0.2 \end{pmatrix}$

(rows ordered $x_0, x_1, x_2, x_3$; columns of $\{c_{jk}\}$ ordered $y_0, y_1, y_2, y_3, y_4$)

Suppose we have the initial hidden state at $t = 0$ to be $x_1$.

Compute the probability that the model generates the particular output sequence
$Y^4 = \{y(1), y(2), y(3), y(4)\} = \{y_1, y_3, y_2, y_0\}$.

Page 15: 2012 mdsp pr06  hmm

Trellis diagram of forward variables $\alpha_j(t)$:

         t = 0    t = 1    t = 2     t = 3     t = 4
x_1 :      1      0.09     0.0052    0.0024    0
x_2 :      0      0.01     0.0077    0.0002    0
x_3 :      0      0.2      0.0057    0.0007    0
x_0 :      0      0        0         0         0.0011

y(t):      -      y_1      y_3       y_2       y_0

* Numerical values in the above diagram are rounded to four decimal places.
The last forward variable, $\alpha_0(4) = 0.0011$, is the probability $\Pr[Y^4]$ that the model produces the given output sequence.
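As a cross-check, a self-contained Python sketch (mine) that runs the recursion (14) on the matrices of the previous page and reproduces the trellis values above:

```python
import numpy as np

# State order: x0 (absorber), x1, x2, x3;  output order: y0, ..., y4.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.2, 0.3, 0.1, 0.4],
              [0.2, 0.5, 0.2, 0.1],
              [0.8, 0.1, 0.0, 0.1]])
C = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.3, 0.4, 0.1, 0.2],
              [0.0, 0.1, 0.1, 0.7, 0.1],
              [0.0, 0.5, 0.2, 0.1, 0.2]])
y = [1, 3, 2, 0]          # observed sequence y1, y3, y2, y0
x0 = 1                    # known initial hidden state x1

alpha = np.zeros((4, len(y) + 1))
alpha[x0, 0] = 1.0
for t, k in enumerate(y, start=1):
    for j in range(4):
        alpha[j, t] = (alpha[:, t - 1] @ A[:, j]) * C[j, k]

# Rows printed in the order x0, x1, x2, x3; the (x0, t=4) entry is Pr[Y^4] ~ 0.0011.
print(np.round(alpha, 4))
```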

Page 16: 2012 mdsp pr06  hmm

4. Decoding problem

Problem:

Suppose we have an HMM as well as an observation sequence YT.

Determine the most likely sequence of hidden states {x(0), ..., x(T)} that leads to the observation.

Solution: A convenient method is to connect, at each step t in the trellis diagram, the hidden state with the highest value of 𝛼𝑗(t).

This does not always give the optimal solution, because the resulting path may contain a forbidden connection where 𝑎𝑖𝑗 = 0.
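A hedged sketch of this per-step rule (mine), using the forward table computed for the earlier example; on that example it yields x1 → x3 → x2 → x1 → x0, whose step x3 → x2 has a_32 = 0.0, which illustrates exactly the caveat above:

```python
import numpy as np

def greedy_decode(alpha):
    """Pick, at each step t, the hidden state with the largest forward variable alpha[j, t]."""
    return list(np.argmax(alpha, axis=0))   # one state index per column t = 0, ..., T

# With alpha from the previous example (rows ordered x0, x1, x2, x3):
# greedy_decode(alpha) -> [1, 3, 2, 1, 0], i.e. x1 -> x3 -> x2 -> x1 -> x0,
# even though the transition x3 -> x2 has a_32 = 0.0 (a forbidden connection).
```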

Page 17: 2012 mdsp pr06  hmm

5. Learning problem

Problem:

Given a training set of observation sequences $\{Y_j^T\},\ j = 1, \ldots, J$,
find the best (MAP) estimate of $a_{ij}$ and $c_{jk}$, assuming(*) that the hidden state sequences for these training data are known a priori.
(*A more sophisticated algorithm that does not make this assumption is actually applied in practice.)

A useful approach is to iteratively update the parameters in order to better explain the observed training sequences.

Forward-backward algorithm:
The forward variables 𝛼i(t) as well as the following backward variables 𝛽i(t) are employed in this algorithm.

Page 18: 2012 mdsp pr06  hmm

Backward Algorithm

The backward variable is defined by

$\beta_i(t) := \Pr[y(t+1), \ldots, y(T) \mid x(t) = x_i]$    (15)

$\beta_i(t)$ represents the probability that the model is in $x(t) = x_i$ and generates the remainder of the given target sequence, from $y(t+1)$ to $y(T)$.

Backward variable computation:

$\beta_i(t) = \begin{cases} 0 & t = T \text{ and } x(T) \ne x_0 \\ 1 & t = T \text{ and } x(T) = x_0 \\ \sum_j a_{ij}\, c_{jk}\, \beta_j(t+1) & \text{otherwise} \end{cases}$    (16)

where $k$ is the index of the output $y(t+1) = y_k$.
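A minimal Python sketch of the backward recursion (16) under the same conventions (a final absorbing state x0); the names are mine:

```python
import numpy as np

def backward(A, C, y, absorber=0):
    """Backward variables beta[i, t], computed by Eq. (16).

    A[i, j] = a_ij, C[j, k] = c_jk, y = output indices y(1), ..., y(T),
    absorber = index of the final absorbing state x0.
    """
    N, T = A.shape[0], len(y)
    beta = np.zeros((N, T + 1))
    beta[absorber, T] = 1.0                    # t = T: only the absorbing state gets 1
    for t in range(T - 1, -1, -1):
        k = y[t]                               # y[t] (0-indexed) is the output y(t+1)
        for i in range(N):
            beta[i, t] = (A[i, :] * C[:, k]) @ beta[:, t + 1]
    return beta
```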

Page 19: 2012 mdsp pr06  hmm

Forward-backward algorithm

Define the following posterior probability that a state sequence passes through state $x_i(t-1)$ and $x_j(t)$:

$\gamma_{ij}(t) := \Pr[x_i(t-1), x_j(t) \mid Y^T, \boldsymbol{\theta}]$    (17)

where $\boldsymbol{\theta} = \{a_{ij}, c_{jk}\}$ is the set of HMM parameters.

The $\gamma_{ij}(t)$ is given as follows:

$\gamma_{ij}(t) = \dfrac{\Pr[x_i(t-1), x_j(t), Y^T \mid \boldsymbol{\theta}]}{\Pr[Y^T \mid \boldsymbol{\theta}]}$    (18)

$\phantom{\gamma_{ij}(t)} = \dfrac{\alpha_i(t-1)\, a_{ij}\, c_{jk}\, \beta_j(t)}{\Pr[Y^T]}$    (19)

In the numerator of (19), $\alpha_i(t-1)$ is the path's past history at $t$, $a_{ij} c_{jk}$ is the current activity at $t$ (the transition into $x_j$ and the emission of the observed output $y_k$), and $\beta_j(t)$ is the path's future history at $t$; the denominator $\Pr[Y^T]$ is the quantity of the evaluation problem.
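A hedged sketch (mine, reusing the earlier forward/backward conventions) that evaluates (19) for all (i, j, t):

```python
import numpy as np

def gammas(A, C, alpha, beta, y):
    """gamma[i, j, t-1] = alpha_i(t-1) * a_ij * c_jk * beta_j(t) / Pr[Y^T]   -- Eq. (19),
    for t = 1, ..., T, where k is the index of the output y(t)."""
    N, T = A.shape[0], len(y)
    pY = alpha[:, T].sum()                          # Pr[Y^T] from the evaluation problem
    g = np.zeros((N, N, T))
    for t in range(1, T + 1):
        k = y[t - 1]                                # observation y(t)
        g[:, :, t - 1] = np.outer(alpha[:, t - 1], C[:, k] * beta[:, t]) * A / pY
    return g
```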

Page 20: 2012 mdsp pr06  hmm

Baum-Welch estimate

$\hat{a}_{ij} = \dfrac{\sum_{t=1}^{T} \gamma_{ij}(t)}{\sum_{t=1}^{T} \sum_{j'=1}^{N_x} \gamma_{ij'}(t)}, \qquad
\hat{c}_{jk} = \dfrac{\sum_{t:\, y(t) = y_k} \sum_{i} \gamma_{ij}(t)}{\sum_{t=1}^{T} \sum_{i} \gamma_{ij}(t)}$    (20)

In our case, since all of the paths are known, it is possible to count the number of times each particular transition or output observation occurs in the set of training data. The HMM parameters are then empirically estimated by use of the following frequency (repetition) ratios.

Page 21: 2012 mdsp pr06  hmm

$\hat{a}_{ij} = \dfrac{N_{ij}}{\sum_{j'} N_{ij'}}, \qquad \hat{c}_{jk} = \dfrac{N_{jk}}{\sum_{k'} N_{jk'}}$    (21)

where

$N_{ij}$ : counted number of the state transitions from $x_i(t)$ to $x_j$

$N_{jk}$ : counted number of the output observations $y_k$ emitted from $x_j(t)$

Unknown state sequence case:

Some type of iterative approach is applied.

Expectation-Maximization (EM) / Baum-Welch approach:
Start from an initial guess of 𝑎ij and 𝑐jk, which gives initial estimates of αi(t-1) and 𝛽j(t); then repeat the Baum-Welch re-estimation as if the parameters were known.
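A small Python sketch of the counting estimate (21) for the known-state-sequence case (variable names are mine; it assumes every state actually appears in the training data):

```python
import numpy as np

def count_estimate(state_seqs, output_seqs, N_states, N_outputs):
    """Estimate a_ij and c_jk by transition / emission counts (Eq. 21),
    assuming the hidden state sequences of the training data are known."""
    N_a = np.zeros((N_states, N_states))     # N_ij: transitions x_i -> x_j
    N_c = np.zeros((N_states, N_outputs))    # N_jk: outputs y_k emitted from x_j
    for states, outputs in zip(state_seqs, output_seqs):
        # states = [x(0), ..., x(T)], outputs = [y(1), ..., y(T)]
        for x_prev, x_cur, y_cur in zip(states[:-1], states[1:], outputs):
            N_a[x_prev, x_cur] += 1
            N_c[x_cur, y_cur] += 1
    a_hat = N_a / N_a.sum(axis=1, keepdims=True)   # normalize over destination states
    c_hat = N_c / N_c.sum(axis=1, keepdims=True)   # normalize over output symbols
    return a_hat, c_hat
```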

Page 22: 2012 mdsp pr06  hmm

6. Application of HMM - speech recognition -

Phoneme(*)-unit HMM
(* the smallest segmental unit of speech sound)

Short-time frequency-spectrum data (mel-cepstrum 12-dimensional vector sequence) → y(t), t = 1, ..., T

Left-to-right HMM (no reverse-time transitions)

Learning problem: HMM of /a/

Page 23: 2012 mdsp pr06  hmm

Speech Signal

[Figure: speech waveform, amplitude vs. time (samples, ×10^4), with the phoneme segments /g/ and /a/ marked]

Page 24: 2012 mdsp pr06  hmm

Time-Frequency domain representation - SPECTROGRAM -

[Figure: spectrogram, frequency (bin) vs. time (frame number), level scale in dB]

Page 25: 2012 mdsp pr06  hmm

Word-level HMM (linked phoneme HMMs)

HMM of /k/ → HMM of /e/ → HMM of /i/ → HMM of /o/

Page 26: 2012 mdsp pr06  hmm

Learning problem:

$W_i$ ($i$-th word) ↔ $\boldsymbol{\theta}_i$ (HMM parameters)

Recognition:

Given a sequence of speech feature vectors $y(t),\ t = 1, \ldots, T$, for an uttered word,
find the most probable word $W_{\max}$ in the following sense (Evaluation problem):

$W_{\max} = \arg\max_{W_i} \Pr[W_i \mid Y^T] = \arg\max_{W_i} \dfrac{\Pr[Y^T \mid W_i]\, \Pr[W_i]}{\Pr[Y^T]}$  (Bayes)

$\Pr[W_i]$ : language model

$\Pr[Y^T \mid W_i] = \Pr[Y^T \mid \boldsymbol{\theta}_i], \quad i = 1, \ldots, N$
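A hedged sketch of this recognition step (mine): score each word HMM with the forward algorithm and weight by the language model. The names word_models and prior, the start distribution pi, and the use of discrete output indices (rather than continuous mel-cepstrum vectors) are simplifying assumptions, not the slides' exact setup.

```python
import numpy as np

def forward_prob(A, C, pi, y):
    """Pr[Y^T | theta] for one word HMM, via the forward algorithm (start distribution pi)."""
    alpha = pi * C[:, y[0]]
    for k in y[1:]:
        alpha = (alpha @ A) * C[:, k]
    return alpha.sum()

def recognize(word_models, prior, y):
    """W_max = argmax_i Pr[Y^T | theta_i] Pr[W_i]; Pr[Y^T] is a common factor and is dropped."""
    scores = {w: forward_prob(A, C, pi, y) * prior[w]
              for w, (A, C, pi) in word_models.items()}
    return max(scores, key=scores.get)
```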

Page 27: 2012 mdsp pr06  hmm

Other application fields of HMM
・Time sequence : Music, Economics
・Symbol sequence : Natural language
・Spatial sequence : Image processing (Gesture recognition)
・Structural order : Sequence of a gene's DNA

References: Main reference materials in this lecture are

[1] R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", John Wiley & Sons, 2nd edition, 2004.
[2] J. Candy, "Bayesian Signal Processing: Classical, Modern, and Particle Filtering Methods", John Wiley/IEEE Press, 2009.