Speech Recognition Using Hidden Markov Models
8/10/2019 Speech Recognition Using Hidden Markov Models
SPEECH RECOGNITION
USING HIDDEN MARKOV
MODELS
OUTLINE
I. THE SPEECH SIGNAL
II. THE HIDDEN MARKOV MODEL
III. SPEECH RECOGNITION USING HMM
INTRODUCTION
APPLICATIONS :
1. HANDS-FREE COMPUTING
2. AUTOMATIC TRANSLATION
EARLY HISTORY
1952: Isolated digit recognition for a single speaker.
1959: Vowel recognition program.
1970s: Isolated word recognition became a usable technology. Pattern recognition ideas are applied to speech recognition. Ideas of LPC are employed in speech recognition.
1980s: Introduction of HMM.
I. THE SPEECH SIGNAL
OUTLINE: THE SPEECH SIGNAL
SPEECH PRODUCTION
SPEECH REPRESENTATION
  3-STATE REPRESENTATION
  SPECTRAL REPRESENTATION
SPEECH TO FEATURE VECTORS
  PRE-PROCESSING
  WINDOWING
  FEATURE EXTRACTION
  POSTPROCESSING
SPEECH PRODUCTION
What does each block represent?
Voiced components:
Impulse train generator : Lungs
Glottal pulse model : Epiglottis
Vocal tract model : Vocal tract
Radiation model : Lips
Random noise generator : Unvoiced sounds
SPEECH REPRESENTATION
Short-time stationary / quasi stationary
Types :
Time-domain representation
Frequency-domain representation
Time-domain representation :
Frequency-domain
representation:
OBTAINING FEATURE VECTORS
Preprocessing -> Frame blocking and windowing -> Feature extraction -> Postprocessing
Why do we need feature vectors?
Pre-processing :
Noise cancellation
Pre-emphasis
Voice Activation Detection (VAD)
Purpose : To modify the raw speech signal so that it is more suitable for feature extraction.
Noise Cancelling and Pre-emphasis
Methods for noise cancellation:
Spectral subtraction
Adaptive noise cancellation
Pre-emphasis:
To emphasize high-frequency components, because high-frequency components often have low SNR.
$H(z) = 1 - 0.5 z^{-1}$ ; $S_1(z) = H(z) S(z)$
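In the time domain, the pre-emphasis filter above corresponds to the difference equation s1[n] = s[n] - 0.5 s[n-1]. A minimal Python sketch, assuming the signal is a NumPy array and that the first sample is passed through unchanged (the name `pre_emphasis` is illustrative, not from the slides):

```python
import numpy as np

def pre_emphasis(signal, coeff=0.5):
    """Apply s1[n] = s[n] - coeff * s[n-1], i.e. H(z) = 1 - coeff * z^-1."""
    signal = np.asarray(signal, dtype=float)
    out = signal.copy()
    # Assumption: s[-1] is taken as 0, so the first sample is left unchanged.
    out[1:] -= coeff * signal[:-1]
    return out

# A constant (low-frequency) input is attenuated after the first sample,
# while a rapidly alternating (high-frequency) input is boosted.
flat = pre_emphasis([1.0, 1.0, 1.0, 1.0])
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])
```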
Voice Activation Detection (VAD)
The signal is chopped off!
Finds the end-points of the utterances.
Why?
This is for a single chunk.
$W_{s1}(m) = P_{s1}(m) \, (1 - Z_{s1}(m)) \, S_c$
$P_{s1}$ = short-term power estimate
$Z_{s1}$ = zero-crossing rate
$S_c$ = scaling factor
The threshold $t_W$ is decided by some function of the mean and variance of $W_{s1}$ itself.
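A rough Python sketch of this VAD measure follows. The slides do not say which function of the mean and variance is used, so the threshold here (mean plus `alpha` times the standard deviation) is an assumption, as are the default scaling factor and the function names:

```python
import numpy as np

def vad_measure(signal, chunk_len, scale=1000.0):
    """W(m) = P(m) * (1 - Z(m)) * Sc for each chunk m of the signal."""
    signal = np.asarray(signal, dtype=float)
    n_chunks = len(signal) // chunk_len
    chunks = signal[:n_chunks * chunk_len].reshape(n_chunks, chunk_len)
    power = np.mean(chunks ** 2, axis=1)                   # short-term power estimate
    signs = np.sign(chunks)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)   # zero-crossing rate in [0, 1]
    return power * (1.0 - zcr) * scale

def detect_speech(signal, chunk_len, alpha=0.5):
    """Mark chunks whose measure exceeds the threshold t_W."""
    w = vad_measure(signal, chunk_len)
    threshold = w.mean() + alpha * w.std()   # assumed form of the threshold function
    return w > threshold

# Silence, then a tone, then silence: only the middle chunks should be flagged.
tone = np.sin(2.0 * np.pi * 0.05 * np.arange(400))
sig = np.concatenate([np.zeros(400), tone, np.zeros(400)])
mask = detect_speech(sig, chunk_len=100)
```

The end-points of the utterance are then the first and last chunks flagged in `mask`.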
Windowing
A window function such as the Hamming window is applied to reduce the discontinuity at the edges of blocks.
Hamming window:
$w(k) = 0.54 - 0.46 \cos\left( \frac{2 \pi k}{K - 1} \right)$, $k = 0, 1, \ldots, K - 1$
$K$ = no. of samples in a block of the speech signal
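The Hamming window formula can be written directly in Python. This is a small sketch assuming K >= 2 samples per block; NumPy's built-in `np.hamming` uses the same definition and can serve as a cross-check:

```python
import numpy as np

def hamming_window(K):
    """w(k) = 0.54 - 0.46 * cos(2*pi*k / (K - 1)) for k = 0..K-1 (assumes K >= 2)."""
    k = np.arange(K)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * k / (K - 1))

def window_block(block):
    """Multiply one block of speech samples by the Hamming window."""
    block = np.asarray(block, dtype=float)
    return block * hamming_window(len(block))

# The window tapers the block towards its edges (0.08 at both ends).
tapered = window_block(np.ones(5))
```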
Feature Extraction:
LPC
MFCC
Linear Predictive Coding (LPC)
Encodes at a low bit-rate.
Assumption : the speech sample at the current time can be approximated as a linear combination of past samples.
Glottal, vocal-tract and lip-radiation transfer functions are integrated into an all-pole LPC filter.
Feature vectors are the coefficients $a_k$.
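One common way to obtain the $a_k$ is the autocorrelation method, which solves the normal equations built from the block's autocorrelation. The slides do not say which method is used, so the sketch below is only one possibility; the function name and the predictor form s[n] ≈ sum_k a_k s[n-k] are assumptions:

```python
import numpy as np

def lpc_coefficients(block, order):
    """Estimate a_1..a_p such that s[n] ~ sum_k a_k * s[n-k] (autocorrelation method)."""
    block = np.asarray(block, dtype=float)
    n = len(block)
    # Autocorrelation values r[0..order] of the block.
    r = np.array([np.dot(block[:n - k], block[k:]) for k in range(order + 1)])
    # Toeplitz system R a = r[1..order], with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

# A decaying exponential s[n] = 0.9^n satisfies s[n] = 0.9 * s[n-1] exactly,
# so a first-order predictor should recover a_1 close to 0.9.
a = lpc_coefficients(0.9 ** np.arange(200), order=1)
```

In practice the Levinson-Durbin recursion is often used instead of a general linear solve, since it exploits the Toeplitz structure.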
Mel Frequency Cepstral
Coefficients (MFCC)
A non-linear frequency scale is used
Linear until 1 kHz
Logarithmic afterwards
Similar to human Cochlea
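The slides do not give a formula for the mel scale. One widely used approximation, assumed here for illustration, is mel = 2595 log10(1 + f / 700), which is roughly linear below 1 kHz and logarithmic above it:

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Hz shrink on the mel scale as frequency grows.
low_step = hz_to_mel(600.0) - hz_to_mel(500.0)
high_step = hz_to_mel(4100.0) - hz_to_mel(4000.0)
```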
$X_t[n]$ is the DFT of the $t$-th input speech frame, $H_m[n]$ is the frequency response of the $m$-th filter in the filter bank, $N$ is the window size of the transform and $M$ is the total number of filters.
Advantages
MFCC reduces the information in speech to a small no. of coefficients.
MFCC tries to model loudness.
MFCC resembles the human auditory model, and it is easy to compute.
But for better accuracy in speech recognition, both LPC and MFCC are used simultaneously.
Post Processing
Weight function : to give more weightage to certain features.
Normalization : to re-scale the numerical values of the features so that they stay in the same numerical range.
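Both post-processing steps can be sketched in a few lines of Python. The slides do not specify the normalization or the weights, so zero-mean / unit-variance scaling and the example weights below are assumptions chosen only for illustration:

```python
import numpy as np

def normalize_features(features):
    """Re-scale each feature (column) to zero mean and unit variance."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0.0] = 1.0          # leave constant features unscaled
    return (features - mean) / std

def weight_features(features, weights):
    """Give more (or less) weightage to certain features."""
    return np.asarray(features, dtype=float) * np.asarray(weights, dtype=float)

# Two feature vectors whose components live in very different ranges.
X = [[1.0, 100.0], [3.0, 300.0]]
Z = normalize_features(X)
W = weight_features(Z, [2.0, 0.5])   # hypothetical weights
```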
II. THE HIDDEN MARKOV MODEL
MARKOV CHAINS:
Markov process?
First-order Markov process?
Markov chain: a Markov process with finite states
HIDDEN MARKOV MODEL
HMM : a Markov model in which one cannot observe the states.
If the states are visible, it is termed an Observable Markov model.
In a hidden Markov model, the state is not directly visible, but the output, dependent on the state, is visible.
HMM example
Imagine that you are a climatologist in the year 2999 studying the history of
global warming. You cannot find any records of the weather for the summer
of 2007, but you do find Jason's diary, which lists how many ice-creams
Jason ate every day that summer. Our goal is to use these observations to
estimate the temperature every day. Assume there are only two kinds of
days: cold (C) and hot (H).
Notation :
$T$ = length of the observation sequence
$N$ = number of states in the model
$M$ = number of distinct observation symbols, i.e., the number of different symbols that can be observed
$Q = \{q_0, q_1, \ldots, q_{N-1}\}$ = distinct states of the Markov process
$V = \{0, 1, \ldots, M-1\}$ = discrete set of possible observations
$A = \{a_{i,j}\}$, where $a_{i,j} = P(i_{t+1} = j \mid i_t = i)$, the probability of being in state $j$ at time $t+1$ given that we were in state $i$ at time $t$. We assume that the $a_{i,j}$ are independent of time. These are also referred to as state transition probabilities.
$B = \{b_j(k)\}$, where $b_j(k) = P(v_k \text{ at } t \mid i_t = j)$, the probability of observing symbol $v_k$ given that we are in state $j$. Also termed the observation probability matrix.
$\pi = \{\pi_i\}$ = initial state distribution, where $\pi_i = P(i_1 = i)$, the probability of being in state $i$ at the beginning of the experiment, i.e., at $t = 1$.
$O = (O_1, O_2, \ldots, O_T)$ = observation sequence. $O_t$ denotes the observation symbol observed at time $t$.
$\lambda = (A, B, \pi)$ will be used as a compact notation to denote an HMM.
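To make the notation concrete, the triple (A, B, pi) can be held in a small Python container that also checks the probability constraints. The class name and the ice-cream numbers below are hypothetical values chosen only for illustration; they are not taken from the slides:

```python
import numpy as np

class HMM:
    """Compact container for lambda = (A, B, pi) of a discrete HMM."""

    def __init__(self, A, B, pi):
        self.A = np.asarray(A, dtype=float)    # N x N state transition probabilities
        self.B = np.asarray(B, dtype=float)    # N x M observation probabilities
        self.pi = np.asarray(pi, dtype=float)  # length-N initial state distribution
        self.N, self.M = self.B.shape
        if self.A.shape != (self.N, self.N) or self.pi.shape != (self.N,):
            raise ValueError("A must be N x N and pi must have length N")
        for name, rows in (("A", self.A), ("B", self.B), ("pi", self.pi[None, :])):
            if not np.allclose(rows.sum(axis=1), 1.0):
                raise ValueError(f"each row of {name} must sum to 1")

# Hypothetical ice-cream model: states 0 = hot (H), 1 = cold (C);
# symbols 0, 1, 2 stand for 1, 2 or 3 ice-creams eaten that day.
ice_cream = HMM(
    A=[[0.7, 0.3], [0.4, 0.6]],
    B=[[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]],
    pi=[0.8, 0.2],
)
```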
The three problems for HMMs
Problem - 1 : Given the observation sequence $O = O_1, O_2, \ldots, O_T$ and a model $\lambda = (A, B, \pi)$, how do we compute $P(O \mid \lambda)$, the probability of the observation sequence given the model?
Problem - 1
Evaluation Problem
It tells us how well a given model matches the observation sequence.
Application in speech recognition?
Problem - 2
Given the observation sequence $O = O_1, O_2, \ldots, O_T$ and a model $\lambda = (A, B, \pi)$, how do we choose a corresponding state sequence $Q = q_1, q_2, \ldots, q_T$ which is optimal in some meaningful sense (i.e., best explains the observation sequence)?
Problem - 2
We attempt to uncover the hidden state sequence.
We can never uncover the exact hidden state sequence.
Application in speech recognition?
What if a phoneme is lost in a word?
Problem - 3
How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?
This is associated with the training of the HMM.
Solution to Problem - 1
[Figure: HMM for the ice-cream example, with probabilities including .8 and .2]
Given the HMM, what is the probability of the sequence {3,
We want to compute $P(O \mid \lambda)$, or simply $P(O)$.
This task is not straightforward, because we don't know the states that
produced this observation sequence.
For the state sequence Q = {H,H,C}, given O = {3,1,3},
compute the joint prob. P(O,Q)?
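The joint probability of one state sequence and the observations is the product of the initial, transition and observation probabilities along the path. The sketch below uses hypothetical parameter values (the figure with the real ones did not survive), with states 0 = H, 1 = C and symbol index = number of ice-creams minus one:

```python
import numpy as np

H, C = 0, 1
# Hypothetical model parameters, for illustration only.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

def joint_probability(obs, states):
    """P(O, Q) = pi_q1 * b_q1(o1) * prod_t a_{q(t-1) q(t)} * b_q(t)(o_t)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

O = [2, 0, 2]            # 3, 1 and 3 ice-creams
Q = [H, H, C]
p_joint = joint_probability(O, Q)   # 0.8*0.4 * 0.7*0.2 * 0.3*0.1
```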
We have shown this for one particular case, but there are 8 different state
sequences, such as {C,C,C}, {C,C,H}, etc.
We would sum over all 8 possible state sequences, i.e., $P(O) = \sum_{Q} P(O, Q)$.
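Summing the joint probability over every possible state sequence can be sketched with `itertools.product`. The parameter values are again hypothetical stand-ins for the lost figure (states 0 = H, 1 = C; symbol index = ice-creams minus one):

```python
import itertools
import numpy as np

# Hypothetical model parameters, for illustration only.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

def joint_probability(obs, states):
    """P(O, Q) for one particular state sequence Q."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

def total_probability(obs):
    """P(O) as the sum of P(O, Q) over all N^T state sequences."""
    n_states = len(pi)
    sequences = list(itertools.product(range(n_states), repeat=len(obs)))
    return sum(joint_probability(obs, q) for q in sequences), len(sequences)

p_obs, n_sequences = total_probability([2, 0, 2])   # O = {3, 1, 3}
```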
This is a brute-force approach.
For N hidden states and T observations there are $N^T$ combinations of state sequences.
So we move on to a recursive algorithm called the Forward Algorithm.
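The forward recursion keeps, for each state, the probability of the observations seen so far ending in that state, which brings the cost down from the order of N^T to the order of N^2 T. A minimal sketch, using hypothetical ice-cream parameters (states 0 = H, 1 = C; symbol index = ice-creams minus one):

```python
import numpy as np

def forward(obs, pi, A, B):
    """Return P(O | lambda) using the forward variables alpha_t(j)."""
    alpha = pi * B[:, obs[0]]              # alpha_1(j) = pi_j * b_j(o_1)
    for o in obs[1:]:
        # alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

# Hypothetical model parameters, for illustration only.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

p_obs = forward([2, 0, 2], pi, A, B)       # O = {3, 1, 3}
```

For long sequences the alpha values underflow, so practical implementations usually rescale alpha at each step or work with logarithms.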
Solution to Problem - 2
Given an HMM, we are trying to find the most likely state sequence for a
particular observation sequence.
Employing the brute-force approach, we want to find the seq. of hidden states
that maximizes
$P(\text{observed seq.}, \text{hidden state comb.} \mid \lambda)$
Problem: Computationally expensive!
Solution: Viterbi Decoding
Logic: It is an inductive algorithm in which, at each instant, you keep the
best possible state sequence ending in each of the N states as an intermediate
result for the desired observation sequence $O = o_1, o_2, \ldots, o_T$.
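Our goal is to maximize $P(O, Q \mid \lambda)$.
$P(O, Q \mid \lambda) = P(O \mid Q, \lambda) \, P(Q \mid \lambda)$
$= \pi_{q_1} b_{q_1}(o_1) \, a_{q_1 q_2} b_{q_2}(o_2) \cdots a_{q_{T-1} q_T} b_{q_T}(o_T)$
Now define,
$U(q_1, q_2, \ldots, q_T) = -\left[ \ln\left(\pi_{q_1} b_{q_1}(o_1)\right) + \sum_{t=2}^{T} \ln\left(a_{q_{t-1} q_t} b_{q_t}(o_t)\right) \right]$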
It can be seen that $P(O, Q \mid \lambda) = \exp(-U(q_1, q_2, \ldots, q_T))$.
Initially our goal was to maximize $P(O, Q \mid \lambda)$.
Now, we want to minimize $U(Q)$.
$U(Q)$ is an attempt to re-scale the probability values.
$-\ln\left(a_{q_j q_k} b_{q_k}(O_t)\right)$ can be viewed as a cost function.
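Viterbi decoding in this cost formulation can be sketched as a minimum-cost path search: at each time step, keep for every state the cheapest path ending there and a back-pointer to its predecessor. The parameter values are hypothetical (states 0 = H, 1 = C; symbol index = ice-creams minus one):

```python
import numpy as np

def viterbi_min_cost(obs, pi, A, B):
    """Return the state sequence minimizing U(Q) and the minimum cost itself."""
    with np.errstate(divide="ignore"):     # -ln(0) = inf marks impossible moves
        cost_pi, cost_A, cost_B = -np.log(pi), -np.log(A), -np.log(B)
    T, N = len(obs), len(pi)
    cost = cost_pi + cost_B[:, obs[0]]
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + cost_A     # total[i, j]: cost of reaching j via i
        back[t] = np.argmin(total, axis=0)
        cost = total[back[t], np.arange(N)] + cost_B[:, obs[t]]
    # Trace the back-pointers from the cheapest final state.
    path = [int(np.argmin(cost))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return path, float(np.min(cost))

# Hypothetical model parameters, for illustration only.
pi = np.array([0.8, 0.2])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.2, 0.4, 0.4],
              [0.5, 0.4, 0.1]])

best_path, min_cost = viterbi_min_cost([2, 0, 2], pi, A, B)   # O = {3, 1, 3}
best_prob = float(np.exp(-min_cost))       # P(O, Q*) = exp(-U(Q*))
```

Working with sums of costs instead of products of probabilities also avoids numerical underflow on long sequences.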
Solution to Problem - 3
Deals with training the HMM.
Adjusts the HMM parameters to fit the observations.
2 methods to solve this:
Segmental K-means algorithm
Baum-Welch re-estimation formulae
Segmental K-means algorithm :
Tries to adjust the model parameters to maximize $P(O, Q \mid \lambda)$, where $Q$ is the optimum state seq. found by Problem - 2.
Baum-Welch re-estimation formulae :
Tries to adjust the model parameters to maximize $P(O \mid \lambda)$.
Finds a more general solution.
So which is preferred?
Segmental K-means algorithm
Let,
$\omega$ = no. of observation seq.
$T$ = length of each observation seq.
$D$ = dimension of each observation symbol
For a single observation seq., i.e., for $\omega = 1$: length 1, 2, 3, ..., T; dimensions 1, 2, 3, ..., D
Choose N symbols (each of dimension D), and assign each of the remaining symbols to the nearest of the N chosen ones according to Euclidean dist.
Calculate the initial and transition prob.
Calculate the observation symbol prob. using these formulae.
Assumption : the symbol prob. distributions are assumed to be Gaussian.
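The initialization steps for a single observation sequence can be sketched as follows. The formulae on the slide did not survive extraction, so the count-based estimates for the initial and transition probabilities, the diagonal (per-dimension) Gaussian variances and the random choice of the N starting symbols are all assumptions of this sketch:

```python
import numpy as np

def initial_estimates(obs, N, seed=0):
    """One pass of clustering and parameter estimation for a (T, D) sequence.

    Assumes the observation vectors are distinct, so every cluster is non-empty.
    """
    obs = np.asarray(obs, dtype=float)
    T = len(obs)
    rng = np.random.default_rng(seed)
    # Choose N symbols as cluster centres, one per state.
    centres = obs[rng.choice(T, size=N, replace=False)]
    # Assign every symbol to the nearest centre (Euclidean distance).
    dists = np.linalg.norm(obs[:, None, :] - centres[None, :, :], axis=2)
    states = np.argmin(dists, axis=1)
    # Initial prob.: with one sequence, all mass goes to its first state.
    pi = np.zeros(N)
    pi[states[0]] = 1.0
    # Transition prob.: normalized counts of i -> j transitions.
    A = np.zeros((N, N))
    for i, j in zip(states[:-1], states[1:]):
        A[i, j] += 1.0
    row_sums = A.sum(axis=1, keepdims=True)
    A = np.divide(A, row_sums, out=np.full((N, N), 1.0 / N), where=row_sums > 0)
    # Gaussian observation model: per-state mean and per-dimension variance.
    means = np.array([obs[states == i].mean(axis=0) for i in range(N)])
    variances = np.array([obs[states == i].var(axis=0) for i in range(N)])
    return states, pi, A, means, variances

example = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
states, pi, A, means, variances = initial_estimates(example, N=2)
```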
Find the optimal state sequence $Q^*$ as given by the solution to Problem 2 for
each training sequence, using the $\lambda$ computed above. A vector is
reassigned to a state if its original assignment is different from the
corresponding estimated optimum state.
This process is continued until there is no new assignment.
Isolated word recognizer :
Assume we have a vocabulary of V words, and K utterances of each word.
Training an HMM:
For each word $v$ in the vocabulary, we must build an HMM $\lambda_v$, i.e., we must estimate the model parameters $(A, B, \pi)$ that optimize the likelihood of the training set observation vectors of the $v$-th word.
Testing :
For each unknown word which is to be recognized, first we measure the
observation sequence $O = O_1, O_2, \ldots, O_T$ via feature analysis of the speech
corresponding to the word, then calculate the model likelihoods $P(O \mid \lambda_v)$
for all possible models, and finally select the word whose model
likelihood is highest.
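The testing step amounts to scoring the observation sequence against every word model and taking the best. A minimal sketch with two hypothetical discrete word models, "yes" and "no", whose parameters are invented purely for illustration:

```python
import numpy as np

def forward_likelihood(obs, pi, A, B):
    """P(O | lambda_v) computed with the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def recognize(obs, models):
    """Return the word whose model gives the highest likelihood, plus all scores."""
    scores = {word: forward_likelihood(obs, *params) for word, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state, two-symbol word models (pi, A, B).
pi = np.array([1.0, 0.0])
A = np.array([[0.5, 0.5],
              [0.0, 1.0]])
models = {
    "yes": (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]])),
    "no":  (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]])),
}

word_a, _ = recognize([0, 0, 0], models)
word_b, _ = recognize([1, 1, 1], models)
```

With real feature vectors the likelihoods become very small, so comparing log-likelihoods is the usual choice.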
A simple yes/no example.
Continuous speech recognition
We connect the HMMs in a sequence.
Instead of taking the one with maximum probability, we try to minimize the
expected value of a given loss function.
Reason: we are predicting multiple words here.