Speech Recognition Using Hidden Markov Models



SPEECH RECOGNITION USING HIDDEN MARKOV MODELS


    OUTLINE

I. THE SPEECH SIGNAL

II. THE HIDDEN MARKOV MODEL

III. SPEECH RECOGNITION USING HMM


    INTRODUCTION

APPLICATIONS:

I. HANDS-FREE COMPUTING

II. AUTOMATIC TRANSLATION


    EARLY HISTORY

1952: Isolated digit recognition for a single speaker.

1959: Vowel recognition program.

1970s: Isolated word recognition became a usable technology. Pattern recognition ideas were applied to speech recognition, and ideas from LPC were employed in speech recognition.

1980s: Introduction of the HMM.


    I. THE SPEECH SIGNAL


    OUTLINE:

THE SPEECH SIGNAL

  SPEECH PRODUCTION

  SPEECH REPRESENTATION

    3-STATE REPRESENTATION

    SPECTRAL REPRESENTATION

SPEECH TO FEATURE VECTORS

  PRE-PROCESSING

  WINDOWING

  FEATURE EXTRACTION

  POST-PROCESSING


    SPEECH PRODUCTION


What does each block represent?

Voiced components:

  Impulse train generator  -  Lungs
  Glottal pulse model      -  Epiglottis
  Vocal tract model        -  Vocal tract
  Radiation model          -  Lips

Unvoiced components:

  Random noise generator   -  Unvoiced sounds


    SPEECH REPRESENTATION

Speech is short-time stationary (quasi-stationary): over a short frame its statistics are approximately constant.

Types:

Time-domain representation

Frequency-domain representation


Time-domain representation:


Frequency-domain representation:


OBTAINING FEATURE VECTORS

Preprocessing -> Frame Blocking and Windowing -> Feature Extraction -> Postprocessing

Why do we need feature vectors? The raw waveform is high-dimensional and highly variable; feature vectors give a compact representation that keeps the phonetically relevant information.


Pre-processing:

  Noise cancellation

  Pre-emphasis

  Voice Activation Detection (VAD)

Purpose: to modify the raw speech signal so that it is more suitable for feature extraction.


Noise Cancelling and Pre-emphasis

Methods for noise cancellation:

  Spectral subtraction

  Adaptive noise cancellation

Pre-emphasis: to emphasize high-frequency components, because high-frequency components often have low SNR.

  H(z) = 1 - 0.5z^{-1};  S_1(z) = H(z) S(z)
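
A minimal sketch of this filter, assuming a NumPy array of samples; the 0.5 coefficient is the slide's (values near 0.95-0.97 are also common):

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Apply H(z) = 1 - alpha*z^{-1}, i.e. s1[n] = s[n] - alpha*s[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])
```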


Voice Activation Detection (VAD)

Finds the end-points of the utterances; the non-speech parts of the signal are chopped off.

Why? Silence and background noise carry no word information, so removing them saves computation and keeps non-speech frames out of the models.


This is for a single chunk:

  W_s1(m) = P_s1(m) (1 - Z_s1(m)) S_c

  P_s1 = short-term power estimate
  Z_s1 = zero-crossing rate
  S_c  = scaling factor


The threshold t_w is decided by some function of the mean and variance of W_s1 itself.
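
A sketch of this measure and threshold, assuming the signal has already been split into chunks; the particular power/ZCR estimators and the threshold rule are illustrative choices, not the deck's:

```python
import numpy as np

def vad_measure(frame: np.ndarray, sc: float = 1000.0) -> float:
    """W = P * (1 - Z) * Sc for one chunk of samples."""
    power = np.mean(frame ** 2)                          # short-term power estimate P
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # zero-crossing rate Z in [0, 1]
    return power * (1.0 - zcr) * sc

def is_speech(frames: list[np.ndarray]) -> np.ndarray:
    """Threshold t_w computed from the mean/spread of W over all chunks."""
    w = np.array([vad_measure(f) for f in frames])
    t_w = w.mean() + 0.2 * w.std()                       # illustrative function of mean and variance
    return w > t_w
```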


Windowing

A window function such as the Hamming window is applied to reduce the discontinuity at the edges of the blocks.

Hamming window:

  w(k) = 0.54 - 0.46 cos( 2*pi*k / (K - 1) ),  k = 0, ..., K-1

  K = number of samples in the window
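
A minimal sketch of frame blocking plus Hamming windowing, assuming a NumPy signal array; the 400-sample frame and 160-sample hop (25 ms / 10 ms at 16 kHz) are illustrative:

```python
import numpy as np

def frame_and_window(signal: np.ndarray, frame_len: int = 400, hop: int = 160) -> np.ndarray:
    """Block the signal into overlapping frames and apply a Hamming window."""
    k = np.arange(frame_len)
    w = 0.54 - 0.46 * np.cos(2 * np.pi * k / (frame_len - 1))  # Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i*hop : i*hop + frame_len] * w for i in range(n_frames)])
```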


Feature Extraction:

  LPC

  MFCC


Linear Predictive Coding (LPC)

  Encodes speech at a low bit-rate.

  Assumption: the speech sample at the current time can be approximated from past samples.

  The glottal, vocal-tract and lip-radiation transfer functions are integrated into a single all-pole LPC filter.

  The feature vectors are the filter coefficients a_k.
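
A minimal sketch of the autocorrelation method for estimating the a_k; the order and the Toeplitz solver are assumptions, not the deck's prescription:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Estimate all-pole LPC coefficients a_k for one windowed frame
    by solving the autocorrelation normal equations R a = r."""
    # Autocorrelation values r[0..order]
    r = np.array([np.dot(frame[:len(frame)-k], frame[k:]) for k in range(order + 1)])
    # Symmetric Toeplitz system; the a_k predict s[n] from s[n-1..n-order]
    return solve_toeplitz(r[:order], r[1:order+1])
```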


Mel Frequency Cepstral Coefficients (MFCC)

  A non-linear frequency scale is used:

    linear up to 1 kHz,

    logarithmic above that.

  This is similar to the frequency resolution of the human cochlea.


X_t[n] is the DFT of the t-th input speech frame, H_m[n] is the frequency response of the m-th filter in the filter bank, N is the window size of the transform, and M is the total number of filters.
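
In the standard MFCC computation, these symbols combine as follows (textbook form, reconstructed from the definitions above rather than copied from the slide):

$$e_t(m) = \sum_{n=0}^{N-1} \left|X_t[n]\right|^2 H_m[n], \qquad m = 1,\dots,M$$

$$\mathrm{MFCC}_t(l) = \sum_{m=1}^{M} \ln e_t(m)\,\cos\!\left(\frac{\pi l\,(m - 1/2)}{M}\right)$$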


Advantages

  MFCC reduces the information in speech to a small number of coefficients.

  MFCC tries to model loudness.

  MFCC resembles the human auditory model, and it is easy to compute.

But for better accuracy in speech recognition, both models are used simultaneously.


Post Processing

  Weight function: to give more weight to certain features.

  Normalization: to re-scale the numerical values of the features so that they stay in the same numerical range.
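
As one concrete normalization choice (an assumption; the deck names no specific method), cepstral mean and variance normalization:

```python
import numpy as np

def cmvn(features: np.ndarray) -> np.ndarray:
    """Re-scale each feature dimension to zero mean and unit variance
    across the utterance. features: (n_frames, n_features)."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8   # avoid division by zero
    return (features - mu) / sigma
```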


II. THE HIDDEN MARKOV MODEL


MARKOV CHAINS:

  Markov process: a stochastic process whose future evolution depends only on the current state, not on the path taken to reach it.

  First-order Markov process: the next state depends only on the current state.

  Markov chain: a Markov process with finite states.


HIDDEN MARKOV MODEL

  HMM: a Markov model whose states cannot be observed.

  If the states are visible, it is termed an observable Markov model.

  In a hidden Markov model the state is not directly visible, but the output, which depends on the state, is visible.

  An HMM is denoted compactly by λ.


    HMM example

Imagine that you are a climatologist in the year 2999 studying the history of global warming. You cannot find any records of the weather for the summer of 2007, but you do find Jason's diary, which lists how many ice-creams Jason ate every day that summer. Our goal is to use these observations to estimate the temperature every day. Assume there are only two kinds of days: cold (C) and hot (H).



Notation:

T = length of the observation sequence

N = number of states in the model

M = number of distinct observation symbols, i.e., the number of symbols observed

Q = {q_0, q_1, ..., q_{N-1}} = the distinct states of the Markov process

V = {0, 1, ..., M-1} = the discrete set of possible observations

A = {a_ij}, where a_ij = P(q_{t+1} = j | q_t = i): the probability of being in state j at time t+1 given that we were in state i at time t. We assume the a_ij are independent of time. These are also referred to as the state transition probabilities.

B = {b_j(k)}, where b_j(k) = P(O_t = v_k | q_t = j): the probability of observing symbol v_k given that we are in state j. Also termed the observation probability matrix.

π = {π_i}, where π_i = P(q_1 = i): the initial state distribution, i.e., the probability of being in state i at the beginning of the experiment (t = 1).

O = (O_0, O_1, ..., O_{T-1}) = the observation sequence; O_t denotes the observation symbol observed at time t.

λ = (A, B, π) will be used as a compact notation to denote the HMM.
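
To make the notation concrete, the two-state ice-cream HMM can be written out directly; all numerical values below are illustrative assumptions:

```python
import numpy as np

# Illustrative ice-cream HMM lambda = (A, B, pi); all numbers are assumed.
states = ["H", "C"]                      # hot, cold
observations = [1, 2, 3]                 # ice-creams eaten per day
A  = np.array([[0.8, 0.2],               # P(next | current): H->H, H->C
               [0.3, 0.7]])              #                    C->H, C->C
B  = np.array([[0.2, 0.4, 0.4],          # P(obs | H) for 1, 2, 3 ice-creams
               [0.5, 0.4, 0.1]])         # P(obs | C)
pi = np.array([0.5, 0.5])                # initial state distribution
```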


The three problems for HMMs

Problem 1: Given the observation sequence O = O_1, O_2, ..., O_T and a model λ = (A, B, π), how do we compute P(O|λ), the probability of the observation sequence given the model?


Problem 1

The evaluation problem: it tells us how well a given model matches the observation sequence.

Application in speech recognition? Scoring how well each word's model explains an utterance.


Problem 2

Given the observation sequence O = O_1, O_2, ..., O_T and a model λ = (A, B, π), how do we choose a corresponding state sequence Q = q_1 q_2 ... q_T that is optimal in some meaningful sense (i.e., best explains the observation sequence)?


Problem 2

We attempt to uncover the hidden state sequence. We can never recover the exact hidden state sequence, only an optimal estimate of it.

Application in speech recognition? What if a phoneme is lost in a word?


Problem 3

How do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)?

This is associated with the training of the HMM.


Solution to Problem 1

Recall the ice-cream example: Jason's diary tells us how many ice-creams he ate each day of the summer of 2007, and each day was either cold (C) or hot (H).


[HMM state diagram; the transition probabilities shown include 0.8 and 0.2]

Given the HMM, what is the probability of the observation sequence {3, 1, 3}?


We want to compute P(O|λ), written P(O) for short.

This task is not straightforward, because we don't know which states produced the observation sequence.


For the state sequence Q = {H, H, C} and the given O = {3, 1, 3}, compute the joint probability P(O, Q).
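
By the standard HMM factorization of the joint probability:

$$P(O, Q \mid \lambda) \;=\; \pi_{H}\, b_{H}(3)\;\cdot\; a_{HH}\, b_{H}(1)\;\cdot\; a_{HC}\, b_{C}(3)$$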


We have shown this for one particular state sequence, but there are 8 different state sequences, such as {C,C,C}, {C,C,H}, etc. We sum over all 8 possible state sequences:
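
That is, the quantity being evaluated is

$$P(O \mid \lambda) \;=\; \sum_{\text{all } Q} P(O, Q \mid \lambda) \;=\; \sum_{Q} \pi_{q_1} b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t}\, b_{q_t}(O_t)$$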


This brute-force enumeration quickly becomes infeasible: for N hidden states and T observations there are N^T combinations of state sequences.

So we move on to a recursive algorithm called the Forward Algorithm.
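
A minimal sketch of the forward recursion, reusing the illustrative A, B, pi arrays defined earlier; the mapping of ice-cream counts to symbol indices is part of that illustration:

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """Compute P(O | lambda) with the forward recursion in O(N^2 T),
    instead of enumerating all N^T state sequences.
    A: (N, N) transitions, B: (N, M) emissions, pi: (N,) initial, obs: symbol indices."""
    alpha = pi * B[:, obs[0]]             # alpha_1(i) = pi_i * b_i(O_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(O_{t+1})
    return float(alpha.sum())             # P(O | lambda) = sum_i alpha_T(i)

# Ice-cream example: O = {3, 1, 3} maps to symbol indices [2, 0, 2]
# print(forward_likelihood(A, B, pi, [2, 0, 2]))
```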




Solution to Problem 2

Given an HMM, we are trying to find the most likely state sequence for a particular observation sequence.

Enumerating every hidden state combination, we would look for the sequence of hidden states that maximizes Pr(observed seq., hidden state comb. | λ).


Problem: computationally expensive!

Solution: Viterbi decoding.

Logic: it is an inductive algorithm in which, at each instant, you keep the best possible state sequence ending in each of the N states as the intermediate result for the desired observation sequence O = o_1, o_2, ..., o_T.


Our goal is to maximize P(O, Q | λ).

  P(O, Q | λ) = P(O | Q, λ) · P(Q | λ)
              = π_{q_1} b_{q_1}(o_1) · a_{q_1 q_2} b_{q_2}(o_2) ··· a_{q_{T-1} q_T} b_{q_T}(o_T)

Now define

  U(q_1, q_2, ..., q_T) = -ln( π_{q_1} b_{q_1}(o_1) ) - Σ_{t=2}^{T} ln( a_{q_{t-1} q_t} b_{q_t}(o_t) )


It can be seen that P(O, Q | λ) = exp( -U(q_1, q_2, ..., q_T) ).

Initially our goal was to maximize P(O, Q | λ); equivalently, we now want to minimize U(Q).

U(Q) is a re-scaling of the probability values into the log domain, and each term -ln( a_{q_j q_k} b_{q_k}(O_t) ) can be viewed as a cost function.
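
A sketch of Viterbi decoding in exactly this negative-log (cost) domain, assuming the same A, B, pi arrays as the earlier sketches:

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most likely state sequence argmax_Q P(O, Q | lambda),
    found by minimizing the cost U(Q) = -ln P(O, Q | lambda)."""
    T = len(obs)
    cost = -np.log(pi * B[:, obs[0]])                 # U for length-1 paths
    back = np.zeros((T, A.shape[0]), dtype=int)       # best predecessor per state
    for t in range(1, T):
        # step[i, j]: cost of the best path ending in state i, extended to state j
        step = cost[:, None] - np.log(A) - np.log(B[:, obs[t]])[None, :]
        back[t] = step.argmin(axis=0)
        cost = step.min(axis=0)
    path = [int(cost.argmin())]                       # lowest-cost final state
    for t in range(T - 1, 0, -1):                     # trace back predecessors
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# e.g. viterbi(A, B, pi, [2, 0, 2]) -> most likely H/C sequence for O = {3, 1, 3}
```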


Solution to Problem 3

Deals with training the HMM: adjusting the HMM parameters to fit the observations.

Two methods to solve this:

  Segmental K-means algorithm

  Baum-Welch re-estimation formulae


Segmental K-means algorithm:

  Tries to adjust the model parameters to maximize P(O, Q* | λ), where Q* is the optimum state sequence found by the solution to Problem 2.

Baum-Welch re-estimation formulae:

  Tries to adjust the model parameters to maximize P(O | λ), summed over all state sequences.

  Finds a more general solution.

So which is preferred? Segmental K-means.


Segmental K-means algorithm

Let:

  T = length of each observation sequence

  D = dimension of each observation symbol

The training data can be pictured as a grid of observation vectors, D dimensions by T time steps, one grid per observation sequence.

Consider first a single observation sequence:


Choose N symbols (of dimension D) as initial centres, and assign each of the remaining symbols to one of the N chosen ones according to Euclidean distance.

Calculate the initial and transition probabilities.

Calculate the observation symbol probabilities.
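
A sketch of the counting step for the initial and transition probabilities, assuming frame-to-state assignments are available as integer sequences (an illustration, not the deck's exact formulae):

```python
import numpy as np

def estimate_pi_A(state_seqs, N):
    """Count-based estimates of pi and A from K-means state assignments,
    one assignment per frame (assumes every state is visited)."""
    pi = np.zeros(N)
    A = np.zeros((N, N))
    for seq in state_seqs:
        pi[seq[0]] += 1                        # count initial states
        for i, j in zip(seq[:-1], seq[1:]):
            A[i, j] += 1                       # count i -> j transitions
    pi /= pi.sum()
    A /= A.sum(axis=1, keepdims=True)          # normalize each row to a distribution
    return pi, A
```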


Calculate the observation symbol probabilities.

Assumption: the symbol probability distributions are assumed to be Gaussian.
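
With the Gaussian assumption, the usual estimates for this step are the per-state sample mean and covariance (the textbook form, reconstructed rather than copied from the slide):

$$\hat{\mu}_j = \frac{1}{N_j} \sum_{t:\, q_t = j} O_t, \qquad \hat{\Sigma}_j = \frac{1}{N_j} \sum_{t:\, q_t = j} (O_t - \hat{\mu}_j)(O_t - \hat{\mu}_j)^{\top}$$

$$b_j(O) = \mathcal{N}\!\left(O;\; \hat{\mu}_j,\, \hat{\Sigma}_j\right)$$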


Find the optimal state sequence Q* for each training sequence, as given by the solution to Problem 2, using the λ computed above. A vector is reassigned to a new state if its original assignment differs from the corresponding estimated optimum state.

This process is continued until no new assignment operations occur.


Isolated word recognizer:

Assume we have a vocabulary of V words, and K utterances of each word.

Training an HMM:

For each word v in the vocabulary we must build an HMM λ_v, i.e., we must estimate the model parameters (A, B, π) that optimize the likelihood of the training-set observation vectors of the v-th word.


Testing:

For each unknown word to be recognized: first measure the observation sequence O = O_1, O_2, ..., O_T via feature analysis of the speech corresponding to the word; then calculate the model likelihoods P(O | λ_v) for all possible models; finally select the word whose model likelihood is highest.
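
A sketch of this decision rule, reusing forward_likelihood from the earlier sketch; word_models and its contents are hypothetical:

```python
# Illustrative isolated-word recognizer built on the forward algorithm above.
# word_models maps each vocabulary word to its trained (A, B, pi) triple.
def recognize(word_models, obs):
    """Return argmax_v P(O | lambda_v) over the vocabulary."""
    return max(word_models,
               key=lambda v: forward_likelihood(*word_models[v], obs))

# word_models = {"yes": (A_yes, B_yes, pi_yes), "no": (A_no, B_no, pi_no)}
# print(recognize(word_models, observed_symbols))
```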


A simple yes/no example.


Continuous Speech Recognition

We connect the HMMs in a sequence. Instead of taking the single hypothesis with maximum probability, we try to minimize the expectation of a given loss function.

Reason: we are predicting multiple words here, so the loss can be counted per word rather than per whole sequence.
