Noise Reduction in Speech Recognition Professor:Jian-Jiun Ding Student: Yung Chang 2011/05/06.
Noise Reduction in Speech Recognition
Professor: Jian-Jiun Ding
Student: Yung Chang
2011/05/06
Outline
Mel Frequency Cepstral Coefficients (MFCC)
Mismatch in speech recognition
Feature-based: CMS, CMVN, HEQ
Feature-based: RASTA, data-driven
Speech enhancement: spectral subtraction, Wiener filtering
Conclusions and applications
Mel Frequency Cepstral Coefficients (MFCC)
The most commonly used feature in speech recognition
Advantages: high accuracy and low complexity
39 dimensions
Mel Frequency Cepstral Coefficients(MFCC)
The framework of feature extraction:
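Speech signal $x(n)$ → Pre-emphasis → $x'(n)$ → Window → $x_t(n)$ → DFT → $A_t(k)$ → Mel filter-bank → $Y_t(m)$ → Log(|·|²) → $Y_t'(m)$ → IDFT → $y_t(j)$ (MFCC)
The energy $e_t$ and the derivatives are appended, giving the feature vector $y_t(j),\ e_t,\ \Delta y_t(j),\ \Delta e_t,\ \Delta^2 y_t(j),\ \Delta^2 e_t$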
Pre-emphasis
Pre-emphasis of the spectrum at higher frequencies
$x[n]$ → Pre-emphasis → $x'[n]$
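The slide does not give the filter itself. A minimal sketch, assuming the common first-order high-pass form x'[n] = x[n] - alpha * x[n-1]; the function name `pre_emphasis` and the value alpha = 0.97 are illustrative assumptions, not taken from the slides:

```python
def pre_emphasis(x, alpha=0.97):
    """Return x'[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


# A constant (low-frequency) signal is strongly attenuated,
# while an alternating (high-frequency) signal is boosted.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))
```

The difference between neighbouring samples is small for slowly varying content and large for rapidly varying content, which is why this form raises the high-frequency part of the spectrum.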
End-point Detection (Voice Activity Detection)
[Figure: waveform divided into noise (silence) and speech regions]
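The slide only shows the segmentation, not how it is obtained. One simple possibility, sketched here purely as an assumption, is to threshold the short-time energy of non-overlapping frames; `frame_energies`, `detect_speech_frames`, the frame length, and the threshold ratio are all hypothetical choices:

```python
def frame_energies(x, frame_len):
    """Sum of squared samples for each non-overlapping frame."""
    return [sum(s * s for s in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, frame_len)]


def detect_speech_frames(x, frame_len=4, ratio=0.1):
    """Mark a frame as speech when its energy exceeds ratio * (max frame energy).

    This fixed relative threshold is an assumption for illustration only.
    """
    energies = frame_energies(x, frame_len)
    if not energies:
        return []
    threshold = ratio * max(energies)
    return [e > threshold for e in energies]


# Quiet samples, then a loud burst, then quiet samples again.
signal = [0.01] * 8 + [1.0, -1.0] * 4 + [0.01] * 8
print(detect_speech_frames(signal))
```

Practical detectors usually add smoothing and an adaptive noise estimate; this sketch only illustrates the energy contrast between the two regions.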
Windowing
Rectangular window
Hamming window
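The two windows can be compared with a short sketch. The Hamming definition below, w[n] = 0.54 - 0.46 cos(2πn/(N-1)), is the usual textbook form; the helper names are illustrative:

```python
import math


def rectangular_window(n_samples):
    """All-ones window: the frame is simply cut out of the signal."""
    return [1.0] * n_samples


def hamming_window(n_samples):
    """w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), tapering towards the frame edges."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame, window):
    """Multiply a frame by a window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]


frame = [1.0] * 5
print(apply_window(frame, rectangular_window(5)))
print(apply_window(frame, hamming_window(5)))
```

The taper of the Hamming window reduces the discontinuity at the frame boundaries, which lowers spectral leakage compared with the rectangular window.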
Mel-filter bank
After the DFT we get the spectrum
[Figure: spectrum, amplitude versus frequency]
Mel-filter bank
Triangular shape in frequency (overlapped)
Uniformly spaced below 1 kHz
Logarithmic scale above 1 kHz
[Figure: overlapping triangular filters, amplitude versus frequency]
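A rough sketch of such a filter bank follows. It assumes the widely used mel approximation mel(f) = 2595 log10(1 + f/700), which is close to linear below 1 kHz and logarithmic above; the slides do not state which formula was used, and the function names and parameters are illustrative:

```python
import math


def hz_to_mel(f):
    """Common mel-scale approximation (an assumption, not from the slides)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, center, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_bins):
            f = k * sample_rate / n_fft          # frequency of DFT bin k
            if lo < f <= center:
                weights.append((f - lo) / (center - lo))   # rising slope
            elif center < f < hi:
                weights.append((hi - f) / (hi - center))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_filterbank(power_spectrum, bank):
    """Weighted sum of the power spectrum under each triangle."""
    return [sum(w * p for w, p in zip(weights, power_spectrum)) for weights in bank]


bank = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
print(len(bank), len(bank[0]))
```

Each filter starts at the centre of the previous one and ends at the centre of the next, which produces the overlap mentioned on the slide.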
Delta Coefficients
1st- and 2nd-order differences
13 dimensions → append 1st-order and 2nd-order differences → 39 dimensions
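How 13 static coefficients become a 39-dimensional vector can be sketched as follows. The central difference used here is a simplifying assumption (toolkits often use a regression over several frames), and the function names are illustrative:

```python
def delta(features):
    """First-order difference along time: d_t = (c_{t+1} - c_{t-1}) / 2.

    The first and last frames are repeated at the edges.
    """
    n_frames = len(features)
    out = []
    for t in range(n_frames):
        prev = features[max(t - 1, 0)]
        nxt = features[min(t + 1, n_frames - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_deltas(static):
    """Concatenate static, 1st-order and 2nd-order differences per frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Five frames of 13 static coefficients (a simple ramp over time).
static = [[float(t)] * 13 for t in range(5)]
full = add_deltas(static)
print(len(full[0]))
```

The second-order difference is obtained by applying the same operation to the first-order result, so the final dimension is 13 × 3 = 39.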
Mismatch in Statistical Speech Recognition
Possible Approaches for Acoustic Environment Mismatch
[Figure: recognition system. The original speech x[n] is corrupted by additive noise n1(t), convolutional noise h[n] (acoustic reception, microphone distortion, phone/wireless channel) and additive noise n2(t), giving the input signal y[n]. Feature Extraction produces the feature vectors O = o1o2…oT, and Search, using the Acoustic Models (from the Speech Corpus), the Lexicon and the Language Model (from the Text Corpus), produces the output sentences W = w1w2...wR.]
[Figure: training path x[n] → Feature Extraction → Model Training → Acoustic Models; recognition path y[n] → Feature Extraction → Search and Recognition, using the Acoustic Models.]
Speech Enhancement
Feature-based Approaches
Model-based Approaches
Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
Cepstral Mean Subtraction (CMS): Convolutional Noise
Convolutional noise in the time domain becomes additive in the cepstral domain
$y[n] = x[n] * h[n] \;\rightarrow\; y = x + h$, with $x$, $y$, $h$ in the cepstral domain
Most convolutional noise changes only very slightly over a reasonable time interval, so $x = y - h$
Cepstral Mean Subtraction (CMS): assuming $E[x] = 0$, then $E[y] = h$ and $x_{\mathrm{CMS}} = y - E[y]$
[Figure: distributions $P(x)$ and $P(y)$ before and after CMS]
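A minimal sketch of CMS, assuming the expectation E[y] is estimated by the per-dimension mean over all frames of one utterance; the function name is illustrative:

```python
def cepstral_mean_subtraction(features):
    """Subtract the per-dimension mean over time from every frame.

    features: list of frames, each a list of cepstral coefficients.
    """
    n_frames = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n_frames for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in features]


# Zero-mean clean cepstra x, shifted by a constant channel term h.
x = [[1.0, -2.0], [-1.0, 2.0]]
h = [0.5, 3.0]
y = [[xv + hv for xv, hv in zip(frame, h)] for frame in x]
print(cepstral_mean_subtraction(y))
```

When the clean features really are zero-mean over the utterance, the constant offset h is removed exactly; otherwise the mean of x is removed together with h.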
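Feature-based Approach: Cepstral Moment Normalization (CMS, CMVN)
CMVN: the variance is normalized as well
$x_{\mathrm{CMVN}} = x_{\mathrm{CMS}} / [\mathrm{Var}(x_{\mathrm{CMS}})]^{1/2}$
[Figure: distributions $P(x)$ and $P(y)$ in the original domain, after CMS, and after CMVN]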
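The same idea extended to the variance can be sketched as below. Statistics are again assumed to be computed per utterance and per dimension; the small `eps` guard against a zero variance is an added assumption:

```python
import math


def cmvn(features, eps=1e-12):
    """Normalize each dimension to zero mean and unit variance over time."""
    n_frames = len(features)
    dim = len(features[0])
    mean = [sum(f[d] for f in features) / n_frames for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in features) / n_frames
           for d in range(dim)]
    return [[(f[d] - mean[d]) / math.sqrt(var[d] + eps) for d in range(dim)]
            for f in features]


# Two dimensions with different offsets and scales end up on the same scale.
print(cmvn([[1.0, 10.0], [3.0, 30.0]]))
```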
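Feature-based Approach: HEQ (Histogram Equalization)
The whole distribution is equalized
$y = \mathrm{CDF}_y^{-1}[\mathrm{CDF}_x(x)]$
[Figure: $\mathrm{CDF}_x$ and $\mathrm{CDF}_y$; values with the same cumulative probability (P = 0.2 in the example, at 3 and 3.5) are mapped to each other]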
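A sketch of the mapping for one feature dimension. It assumes that CDF_x is estimated from the ranks of the observed values and that the reference distribution CDF_y is a standard Gaussian; both are illustrative choices that the slide does not specify:

```python
from statistics import NormalDist


def histogram_equalize(values, reference=NormalDist(0.0, 1.0)):
    """Map each value through y = CDF_y^{-1}[CDF_x(x)].

    CDF_x is the empirical CDF from ranks; `reference` plays the role of CDF_y.
    Tied values receive neighbouring ranks in their original order.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n          # empirical CDF, kept strictly inside (0, 1)
        out[i] = reference.inv_cdf(p)
    return out


print(histogram_equalize([10.0, 30.0, 20.0]))
```

Unlike CMS and CMVN, which match only the first one or two moments, this transformation matches the entire distribution of the feature to the reference.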
Feature-based Approach: RASTA
[Figure: the amplitude at each frequency f forms a signal over time; the spectrum of that signal is indexed by modulation frequency]
Perform filtering on these signals (temporal filtering)
Feature-based Approach: RASTA (Relative Spectral Temporal filtering)
Assume the rate of change of the noise often lies outside the typical rate of change of the vocal tract shape
A specially designed temporal filter
$B(z) = \dfrac{a_0 + a_1 z^{-1} + a_3 z^{-3} + a_4 z^{-4}}{z^{-4}\,(1 - b_1 z^{-1})}$
[Figure: filter response versus modulation frequency (Hz); the passband emphasizes speech]
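A sketch of such a temporal filter applied to the time trajectory of one feature. The slide leaves the coefficients symbolic; the values below (numerator 0.1 × (2, 1, 0, -1, -2), pole 0.98) are the commonly cited RASTA coefficients and are an assumption here. The code implements the causal version, which delays the output by four frames relative to the non-causal form:

```python
def rasta_filter(trajectory, a=(0.2, 0.1, 0.0, -0.1, -0.2), b1=0.98):
    """Causal band-pass filter: y[t] = sum_k a[k] * x[t-k] + b1 * y[t-1].

    trajectory: values of one feature dimension over successive frames.
    Samples before the start of the trajectory are treated as zero.
    """
    out = []
    prev_y = 0.0
    for t in range(len(trajectory)):
        fir = sum(a[k] * trajectory[t - k] for k in range(len(a)) if t - k >= 0)
        prev_y = fir + b1 * prev_y
        out.append(prev_y)
    return out


# A constant offset (zero modulation frequency) decays towards zero.
filtered = rasta_filter([1.0] * 500)
print(filtered[0], filtered[-1])
```

Because the numerator coefficients sum to zero, components that do not change over time, such as a fixed channel offset, are suppressed, while the pole close to 1 attenuates very fast fluctuations.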
Data-driven Temporal filtering
PCA (Principal Component Analysis)
[Figure: data points in the x-y plane with the principal direction e]
Data-driven Temporal filtering
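We should not guess our filter, but get it from data
[Figure: the original feature stream $y_t$ plotted against the frame index; for each feature dimension k, windows of length L, $z_k(1), z_k(2), z_k(3), \dots$, are collected, and the resulting filters $B_1(z), B_2(z), \dots, B_n(z)$ are applied by convolution]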
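One way to read this slide, sketched under assumptions: the length-L windows of one feature trajectory are treated as data vectors, and the first principal component of those vectors is used as the filter taps. The power-iteration solver and all function names are illustrative; the slides do not describe the actual procedure in this detail:

```python
import math


def segments(trajectory, length):
    """All length-L windows z(1), z(2), ... of one feature trajectory."""
    return [trajectory[n:n + length] for n in range(len(trajectory) - length + 1)]


def first_principal_component(vectors, n_iter=200):
    """Dominant eigenvector of the covariance matrix, found by power iteration."""
    dim, count = len(vectors[0]), len(vectors)
    mean = [sum(v[i] for v in vectors) / count for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / count for j in range(dim)]
           for i in range(dim)]
    # Non-uniform start vector; power iteration fails only if the start
    # happens to be orthogonal to the dominant eigenvector.
    e = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in e))
    e = [v / norm for v in e]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * e[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break
        e = [v / norm for v in w]
    return e


def apply_filter(trajectory, taps):
    """Inner product of the taps with each window (convolution with reversed taps)."""
    return [sum(t * s for t, s in zip(taps, window))
            for window in segments(trajectory, len(taps))]


trajectory = [math.sin(0.3 * t) for t in range(50)]
taps = first_principal_component(segments(trajectory, 3))
print(taps)
print(len(apply_filter(trajectory, taps)))
```

In practice the windows would be gathered from a training corpus rather than a single trajectory, with one filter derived per feature dimension.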
Speech Enhancement: Spectral Subtraction (SS)
Produce a better signal by trying to remove the noise, for listening or recognition purposes
The noise $n[n]$ changes fast and unpredictably in the time domain, but relatively slowly in the frequency domain, $N(\omega)$
[Figure: speech and noise amplitude versus time t, and speech and noise amplitude versus frequency f]
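Since the noise spectrum changes slowly, it can be estimated from frames that contain no speech and subtracted from every frame. A sketch operating directly on power spectra; taking the first frames as noise-only and flooring the result at a fraction of the input power are assumptions, as are the function names:

```python
def estimate_noise(power_frames, n_noise_frames):
    """Average the power spectra of the first frames, assumed to be noise only."""
    n_bins = len(power_frames[0])
    return [sum(f[k] for f in power_frames[:n_noise_frames]) / n_noise_frames
            for k in range(n_bins)]


def spectral_subtraction(power_frames, noise_power, floor=0.01):
    """Subtract the noise estimate per bin, never going below floor * input power.

    The floor avoids negative power values; its form and value are assumed.
    """
    return [[max(p - n, floor * p) for p, n in zip(frame, noise_power)]
            for frame in power_frames]


# Two noise-only frames followed by one frame with speech energy in bin 0.
frames = [[1.0, 1.0], [1.0, 1.0], [5.0, 1.0]]
noise = estimate_noise(frames, n_noise_frames=2)
print(spectral_subtraction(frames, noise))
```

To recover a waveform, the enhanced magnitudes would be combined with the noisy phase and inverse-transformed; that step is omitted here.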
Conclusions
We gave a general framework for extracting speech features
We introduced the mainstream robustness techniques
There are still numerous other noise reduction methods (left to the references)
References
Q & A