ECE 8443 – Pattern Recognition
ECE 8423 – Adaptive Signal Processing
• Objectives: Language Modeling in ASR; Discriminative Feature Mapping; Example System; Course Evaluations
• Resources: MB: Unsupervised LM Adaptation; RS: Statistical Language Modeling; DP: Discriminatively Trained Features; AS: Discriminative Adaptation; IBM: GALE Mandarin
• URL: .../publications/courses/ece_8423/lectures/current/lecture_28.ppt
• MP3: .../publications/courses/ece_8423/lectures/current/lecture_28.mp3
LECTURE 28: STATE OF THE ART
Statistical Approach To Speech Recognition
Core components:
• transduction
• feature extraction
• acoustic modeling (hidden Markov models)
• language modeling (statistical N-grams)
• search (Viterbi beam)
• knowledge sources
Our focus will be on the acoustic modeling components of the system; the decoding rule below shows how these pieces combine.
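A compact way to see how the acoustic and language models fit together is the standard Bayesian decoding rule (standard background material; the slide's figure is not reproduced in this transcript):

```latex
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
```

Here A is the acoustic observation sequence, P(A | W) is the acoustic model (HMMs), and P(W) is the language model (statistical N-grams); P(A) does not depend on W and can be dropped from the maximization.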
Speech Recognition Architectures
Statistical Language Modeling: N-Gram Models
• The probability of a word sequence, W = w_1 w_2 w_3 \ldots w_n, can be decomposed as:

  P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2) \cdots P(w_n \mid w_1 \cdots w_{n-1}) = \prod_{i=1}^{n} P(w_i \mid w_1 \cdots w_{i-1})

• Clearly, estimating this for every unique word history is prohibitive. A practical approach is to assume this probability depends only on an equivalence class of the history, \Phi:

  P(W) \approx \prod_{i=1}^{n} P(w_i \mid \Phi(w_1 \cdots w_{i-1}))

• There are three common simplifications, known as N-grams, we can make (estimated by counting, as sketched after this list):

  Unigram: P(w_i)
  Bigram:  P(w_i \mid w_{i-1})
  Trigram: P(w_i \mid w_{i-2}\, w_{i-1})

• Of course, there are many ways to merge histories, such as based on linguistic context (e.g., parts of speech such as article and noun), and we can use higher-order N-grams.
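As a concrete illustration, these N-gram probabilities are typically estimated by counting. A minimal sketch in Python (the toy corpus, the start/end markers, and the helper name are illustrative, not from the lecture):

```python
from collections import Counter

def train_bigram(corpus):
    """Estimate bigram probabilities P(w_i | w_{i-1}) by maximum likelihood."""
    unigram_counts, bigram_counts = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigram_counts.update(tokens)
        bigram_counts.update(zip(tokens, tokens[1:]))
    # Maximum-likelihood estimate: P(w | h) = count(h, w) / count(h)
    return {(h, w): c / unigram_counts[h] for (h, w), c in bigram_counts.items()}

# Toy example: "the" is followed by "cat" in one of its two occurrences.
P = train_bigram(["the cat sat", "the dog sat"])
print(P[("the", "cat")])  # 0.5
```

The same counting scheme extends directly to trigrams by conditioning on pairs of preceding words.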
N-Gram Models Require Adaptation
MAP Language Model (LM) Adaptation
• The LM adaptation problem is often described as an interpolation problem between an existing LM and an LM estimated from new data.
• Any of the approaches we have previously discussed can be employed. MAP adaptation can be shown to simplify to a weighted combination of the old and new estimates; if additional assumptions about the priors for the histories are made, it reduces to a linear interpolation (sketched in code below):

  \hat{P}(w_i \mid w_{i-1} \cdots w_1) = \lambda\, P_{old}(w_i \mid w_{i-1} \cdots w_1) + (1 - \lambda)\, P_{new}(w_i \mid w_{i-1} \cdots w_1)

• Most of the adaptation methods we have discussed previously can be applied to this problem because a language model is, at its core, just a likelihood model.
• However, language models must also deal with the problem of unseen events, and hence models must be smoothed to account for sparseness of data.
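A minimal sketch of this interpolation in Python (the λ value, the dictionary-based model representation, and the uniform floor standing in for proper smoothing are all assumptions for illustration):

```python
def interpolate_lm(p_old, p_new, lam=0.8, vocab_size=10_000):
    """Return P_hat(w | h) = lam * P_old(w | h) + (1 - lam) * P_new(w | h).

    p_old, p_new: dicts mapping (history, word) -> probability.
    Unseen events get a crude uniform floor (1 / vocab_size) as a
    stand-in for real smoothing (e.g., backoff or Kneser-Ney).
    """
    floor = 1.0 / vocab_size
    def p_hat(history, word):
        return (lam * p_old.get((history, word), floor)
                + (1.0 - lam) * p_new.get((history, word), floor))
    return p_hat

# Adapt a general-domain model toward a small in-domain sample.
p_hat = interpolate_lm({("the", "cat"): 0.5}, {("the", "cat"): 0.9})
print(p_hat("the", "cat"))  # 0.8 * 0.5 + 0.2 * 0.9 = 0.58
```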
Discriminatively-Trained Features
• Features can also be adapted using a transformational approach similar to the one we used for Gaussian means (sketched below):

  y_t = x_t + M h_t

  where h_t represents a high-dimensional feature vector and M represents a dimensionality-reducing transformation. This approach combines large-margin classification approaches (e.g., support vector machines) with traditional GMM approaches.
• The transformation M is typically estimated using an MPE (minimum phone error) criterion, and hence this method is often called fMPE.
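A sketch of the forward mapping only, in Python/NumPy (the dimensions and the posterior-style h_t are illustrative assumptions; the discriminative MPE training of M, which is the actual substance of fMPE, is not shown):

```python
import numpy as np

def fmpe_forward(x, h, M):
    """Framewise feature offset: y_t = x_t + M @ h_t.

    x: (T, d) base feature vectors (e.g., MFCCs)
    h: (T, D) high-dimensional features (e.g., Gaussian posteriors), D >> d
    M: (d, D) projection matrix, trained with the MPE criterion in fMPE
    """
    return x + h @ M.T

T, d, D = 100, 39, 1000            # illustrative sizes only
x = np.random.randn(T, d)
h = np.random.rand(T, D)
M = np.zeros((d, D))               # untrained M = 0 leaves the features unchanged
assert np.allclose(fmpe_forward(x, h, M), x)
```

Because M starts at zero, the mapping initially passes the baseline features through untouched; training then moves y_t away from x_t only where it improves the discriminative criterion.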
State of the Art Systems (IBM GALE)
Summary
• Discussed adaptation of language models and showed that the process is similar to that for feature vectors.
• Discussed feature-space adaptation.
• Reviewed a state-of-the-art system that uses many forms of adaptation.
• Course evaluations…