Special Projects in Speech Signal Processing (語音訊號處理專題), Prof. Lin-Shan Lee (李琳山)
![Page 1: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/1.jpg)
Special Projects, Week 2
Introduction
Prof. Lin-Shan Lee
TA: Chung-Ming Chien
![Page 2: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/2.jpg)
Outline
1. Recap
2. Acoustic modeling
3. Homework
![Page 3: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/3.jpg)
Recap
![Page 4: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/4.jpg)
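Speech Recognition System
• Conventional ASR (Automatic Speech Recognition) system:
[Figure: block diagram. Input Speech → Front-end Signal Processing → Feature Vectors → Linguistic Decoding and Search Algorithm → Output Sentence. Speech Corpora → Acoustic Model Training → Acoustic Models; Text Corpora → Language Model Construction → Language Model. The Acoustic Models, Lexicon, and Language Model feed the decoder. A "Today" label marks the part covered this week.]
• Deep learning based ASR system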
![Page 5: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/5.jpg)
Speech Recognition System
• Conventional ASR (Automatic Speech Recognition) system:
[Figure: the ASR block diagram (front-end signal processing, acoustic model training, language model construction, lexicon, linguistic decoding and search), annotated with the labels Week 1 through Week 5 and a "Today" marker.]
• Deep learning based ASR system
![Page 6: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/6.jpg)
MFCC (Mel-frequency cepstral coefficients)
13-dimensional vector
![Page 7: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/7.jpg)
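Feature Extraction (02.extract.feat.sh)
◆ compute-mfcc-feats: compute MFCC features
◆ add-deltas: append delta and delta-delta features
◆ compute-cmvn-stats: compute cepstral mean and variance normalization (CMVN) statistics
◆ apply-cmvn: apply the CMVN statistics to the features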
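To make the delta and CMVN steps concrete, here is a minimal pure-Python sketch of the idea behind add-deltas and apply-cmvn. It is not Kaldi's implementation: the regression window, the edge padding, the inclusion of variance normalization, and the toy input are all assumptions chosen for illustration, and the steps are applied in the order listed on the slide.

```python
def cmvn(frames):
    """Per-utterance mean and variance normalization of each feature dimension."""
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [sum((f[d] - means[d]) ** 2 for f in frames) / n for d in range(dim)]
    # Fall back to 1.0 for a constant dimension so we never divide by zero.
    stds = [v ** 0.5 if v > 0 else 1.0 for v in variances]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]


def deltas(frames, window=2):
    """Regression-based time derivative; edge frames are repeated as padding."""
    n = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n):
        row = []
        for d in range(dim):
            acc = 0.0
            for k in range(1, window + 1):
                nxt = frames[min(t + k, n - 1)][d]
                prv = frames[max(t - k, 0)][d]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames):
    """Append delta and delta-delta features: 13 dimensions become 39."""
    d1 = deltas(frames)
    d2 = deltas(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy input: 6 frames of hypothetical 13-dimensional "MFCC" values.
toy = [[float((t + 1) * (d + 1) % 7) for d in range(13)] for t in range(6)]
features = cmvn(add_deltas(toy))
print(len(features), len(features[0]))
```

Each output frame carries the static, delta, and delta-delta coefficients, which is why a 13-dimensional MFCC vector is commonly expanded to 39 dimensions before training.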
![Page 8: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/8.jpg)
Acoustic Modeling
[Figure: the ASR block diagram again, annotated with "Today" and the scripts used this week:]
03.mono.train.sh
05.tree.build.sh
06.tri.train.sh
![Page 9: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/9.jpg)
Hidden Markov Model (HMM)
◆ Given
  ◆ Sequence of observations (balls)
  ◆ Hidden Markov model (transition probabilities between the baskets and the observation probability for each basket)
◆ Expected
  ◆ Sequence of states (baskets)
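Finding the most likely sequence of baskets for a given sequence of balls is usually done with the Viterbi algorithm. The sketch below is one way to write it; the initial probabilities, the transition matrix, and the assignment of the ball distributions to states are assumptions made up for this example, not values taken from the slides.

```python
def viterbi(obs, pi, A, B):
    """Return (best_state_path, path_probability) for the observation sequence."""
    n_states = len(pi)
    # delta[j]: probability of the best path that ends in state j at the current time.
    delta = [pi[j] * B[j][obs[0]] for j in range(n_states)]
    backptr = []
    for o in obs[1:]:
        prev = delta
        step_ptr = []
        delta = []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] * A[i][j])
            step_ptr.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step_ptr)
    # Trace back from the most probable final state.
    last = max(range(n_states), key=lambda j: delta[j])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def path_probability(path, obs, pi, A, B):
    """Joint probability of one specific state path and the observations."""
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p


# Hypothetical three-basket model (all numbers are illustrative assumptions).
PI = [0.5, 0.3, 0.2]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
B = [{"R": 0.3, "G": 0.2, "B": 0.5},
     {"R": 0.7, "G": 0.1, "B": 0.2},
     {"R": 0.3, "G": 0.6, "B": 0.1}]

balls = ["R", "G", "B"]
best_path, best_prob = viterbi(balls, PI, A, B)
print(best_path, best_prob)
```

The dynamic program keeps only the best predecessor for each state at each time step, so its cost grows linearly with the sequence length instead of exponentially.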
![Page 10: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/10.jpg)
Hidden Markov Model (HMM)
◆ Elements of an HMM {S, A, B, π}
  ◆ S is a set of N states
  ◆ A is the N × N matrix of state transition probabilities
  ◆ B is a set of N probability functions, each describing the observation probability with respect to a state
  ◆ π is the vector of initial state probabilities
[Figure: three-state HMM (s1, s2, s3) with observation distributions {R:.3, G:.2, B:.5}, {R:.7, G:.1, B:.2}, and {R:.3, G:.6, B:.1}; the arcs carry the transition probabilities 0.6, 0.7, 0.3, 0.3, 0.2, 0.2, 0.1, 0.3, 0.5.]
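Given the elements {S, A, B, π}, the probability of an observation sequence under the model, P(O|λ), can be computed with the forward algorithm. The sketch below stores the model as plain Python lists and dictionaries. The three observation distributions are the ones shown in the figure, but which state each belongs to, the transition matrix, and π are assumptions, because the figure layout is not recoverable from the text.

```python
def forward_probability(obs, pi, A, B):
    """P(O | model), summing over every possible state sequence."""
    n = len(pi)
    # alpha[j]: probability of the observations so far, ending in state j.
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Assumed model parameters (rows of A, pi, and each B[j] sum to 1).
PI = [0.5, 0.3, 0.2]
A = [[0.6, 0.3, 0.1],
     [0.1, 0.7, 0.2],
     [0.3, 0.2, 0.5]]
B = [{"R": 0.3, "G": 0.2, "B": 0.5},
     {"R": 0.7, "G": 0.1, "B": 0.2},
     {"R": 0.3, "G": 0.6, "B": 0.1}]

print(forward_probability(["R", "G", "B"], PI, A, B))
```

A quick sanity check on any such model is that the probabilities of all observation sequences of a fixed length add up to 1.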
![Page 11: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/11.jpg)
Gaussian Mixture Model (GMM)
◆ Observations may be continuous (e.g., MFCC vectors).
◆ Use a GMM to model the continuous probability density function.
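A GMM density is a weighted sum of Gaussian densities. The following sketch evaluates a diagonal-covariance GMM at one feature vector; the diagonal-covariance choice and the example weights, means, and variances are assumptions for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density evaluated at vector x."""
    log_p = 0.0
    for xi, m, v in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
    return math.exp(log_p)


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of the component densities; the weights must sum to 1."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical one-dimensional mixture with two components.
WEIGHTS = [0.4, 0.6]
MEANS = [[-1.0], [2.0]]
VARIANCES = [[1.0], [0.5]]

print(gmm_pdf([0.0], WEIGHTS, MEANS, VARIANCES))
```

In an acoustic model, each HMM state owns one such mixture, and `gmm_pdf` plays the role of the observation probability b_j(o) for a continuous observation o.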
![Page 12: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/12.jpg)
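Acoustic Model: P(O|λ)
◆ Model of a phone
[Figure: model of a phone, labeled "Markov Model" and "Gaussian Mixture Model".]
Note: a general HMM need not be directional, but the HMMs used in acoustic models are all unidirectional.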
![Page 13: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/13.jpg)
Acoustic Model: P(O|λ)
![Page 14: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/14.jpg)
Acoustic Model: P(O|λ)
![Page 15: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/15.jpg)
Acoustic Model: Best State Sequence
![Page 16: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/16.jpg)
Acoustic Model: Training
![Page 17: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/17.jpg)
Acoustic Model: Training
[Figure: trellis of states s1, s2, s3 against time frames 1 to 10, with the observations O1 to O10 aligned to states; the observation symbols are v1 and v2.]
Observation probabilities shown in the figure:
b1(v1)=3/4, b1(v2)=1/4
b2(v1)=1/3, b2(v2)=2/3
b3(v1)=2/3, b3(v2)=1/3
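Observation probabilities like these can be obtained by counting: once every frame is assigned to a state, b_j(v) is the fraction of frames in state j that show symbol v. The sketch below does this counting. The alignment and the symbol sequence are hypothetical, chosen only so that the counts reproduce the fractions listed above; the actual alignment in the figure is not recoverable from the text.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def estimate_emissions(states, observations):
    """Estimate b_j(v) as (frames in state j showing v) / (frames in state j)."""
    counts = defaultdict(Counter)
    for s, o in zip(states, observations):
        counts[s][o] += 1
    probs = {}
    for s, c in counts.items():
        total = sum(c.values())
        probs[s] = {o: Fraction(k, total) for o, k in c.items()}
    return probs


# Hypothetical 10-frame alignment: 4 frames in s1, 3 in s2, 3 in s3.
STATES = ["s1"] * 4 + ["s2"] * 3 + ["s3"] * 3
OBS = ["v1", "v1", "v2", "v1", "v2", "v1", "v2", "v1", "v2", "v1"]

emissions = estimate_emissions(STATES, OBS)
for state in sorted(emissions):
    print(state, dict(emissions[state]))
```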
![Page 18: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/18.jpg)
Acoustic Model: Training
◆ Initialization
  ◆ A bad initialization is more likely to end in a poor local optimum.
◆ Model initialization: Segmental K-means
◆ Model re-estimation: Baum-Welch
![Page 19: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/19.jpg)
Acoustic Model: Training
◆ Suppose four people pronounce the sound 「ㄅ」 at the same time
![Page 20: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/20.jpg)
Acoustic Model: P(O|W)
◆ One acoustic model for a phoneme?
◆ The pronunciation of a phoneme may be affected by its neighbors!
ㄐ 一ㄣ ㄊ 一ㄢ
![Page 21: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/21.jpg)
Monophone vs. Triphone
◆ Monophone
  ◆ Considers only one phone per model
  ◆ Ex. ㄧ, ㄨ, ㄩ
◆ Triphone
  ◆ Considers both the left and right neighboring phones: 60^3 → 216,000 models
  ◆ Ex. ㄇ+ㄧ+ㄠ
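The count comes from choosing a left context, a center phone, and a right context independently from about 60 phones. A short sketch of both the arithmetic and the expansion of a phone sequence into triphones follows; the `left-center+right` string format and the `sil` boundary symbol are assumptions borrowed from common toolkit conventions.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into left-center+right triphone labels."""
    padded = [boundary] + list(phones) + [boundary]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]


NUM_PHONES = 60
NUM_TRIPHONES = NUM_PHONES ** 3  # every (left, center, right) combination

print(NUM_TRIPHONES)
print(to_triphones(["ㄇ", "ㄧ", "ㄠ"]))
```

Only a small fraction of these combinations ever appears in a training corpus, which is what motivates parameter sharing.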
![Page 22: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/22.jpg)
Triphone
◆ Too many (216,000) models to train?
◆ Share!
  • Generalized Triphone: sharing at the model level
  • Shared Distribution Model (SDM): sharing at the state level
The OOV (out-of-vocabulary) concept?
![Page 23: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/23.jpg)
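Triphone
Example questions (designed with human knowledge):
12: Is left context a vowel?
24: Is left context a back-vowel?
30: Is left context a low-vowel?
32: Is left context a rounded-vowel?
[Figure: decision tree for triphones with center phone b and right context u. Internal nodes are question numbers (12, 30, 32, 46, 42, 24, 50) with yes/no branches; the leaves group the triphones sil-b+u, a-b+u, o-b+u, y-b+u, Y-b+u, U-b+u, u-b+u, i-b+u, e-b+u, r-b+u, N-b+u, M-b+u, E-b+u.]
◆ The decision tree decides which triphones should be combined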
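Once a tree exists, assigning a triphone to a shared model is just a walk from the root, answering one question per node. The sketch below shows only that routing step. The phone classes and the two-level tree are hypothetical, and building a real tree (choosing which question to ask at each node from training statistics) is not shown.

```python
# Hypothetical phone classes; real question sets come from phonetic knowledge.
VOWELS = {"a", "o", "e", "i", "u"}
BACK_VOWELS = {"o", "u"}

# Internal node: (question_set, yes_subtree, no_subtree). Leaf: a string label.
TREE = (VOWELS,
        (BACK_VOWELS, "leaf_back_vowel", "leaf_other_vowel"),
        "leaf_non_vowel")


def left_context(triphone):
    """Extract the left context from a 'left-center+right' label."""
    return triphone.split("-")[0]


def find_leaf(triphone, node=TREE):
    """Answer 'is the left context in this set?' at each node until a leaf."""
    while not isinstance(node, str):
        question, yes, no = node
        node = yes if left_context(triphone) in question else no
    return node


def cluster(triphones):
    """Group triphones that reach the same leaf; each group shares parameters."""
    groups = {}
    for t in triphones:
        groups.setdefault(find_leaf(t), []).append(t)
    return groups


print(cluster(["sil-b+u", "a-b+u", "o-b+u", "u-b+u", "i-b+u"]))
```

A triphone never seen in training still reaches some leaf, so it always gets a trained model.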
![Page 24: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/24.jpg)
Acoustic Model: Training Steps
◆ Get features (previous section)
◆ Train monophone model
◆ Use the previous model to build the decision tree for triphones
◆ Train triphone model
![Page 25: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/25.jpg)
Acoustic Model: Training Steps
◆ Get features (previous section)
◆ Train monophone model
  ◆ a. gmm-init-mono: initialize the monophone model
  ◆ b. compile-train-graphs: get the training graphs
  ◆ c. align-equal-compiled: model -> decode & align (use gmm-align-compiled instead when looping)
  ◆ d. gmm-acc-stats-ali: EM training, E step
  ◆ e. gmm-est: EM training, M step
  ◆ f. numgauss = numgauss + incgauss
  ◆ g. Go to step c; repeat for several iterations
◆ Use the previous model to build the decision tree for triphones
◆ Train triphone model
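The control flow of steps c to g can be sketched without running any Kaldi tool. The function below only lists which alignment step each pass would use and how many Gaussians it targets. Treating the equal alignment as pass 0, and the iteration count and Gaussian numbers in the example, are assumptions; the real values live in the training script.

```python
def training_schedule(num_iters, realign_iters, numgauss, incgauss):
    """Return one (iteration, alignment_step, numgauss) tuple per training pass."""
    # Pass 0 starts from an equally spaced alignment (step c, first time).
    plan = [(0, "align-equal-compiled", numgauss)]
    numgauss += incgauss  # step f
    for it in range(1, num_iters + 1):
        if it in realign_iters:
            step = "gmm-align-compiled"  # step c when looping
        else:
            step = "reuse previous alignment"
        plan.append((it, step, numgauss))
        numgauss += incgauss  # step f
    return plan


# Hypothetical settings for a short demonstration run.
for entry in training_schedule(num_iters=5, realign_iters={1, 2, 4},
                               numgauss=100, incgauss=20):
    print(entry)
```

Every pass still runs the E step and the M step; only the re-alignment is skipped on iterations outside `realign_iters`.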
![Page 26: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/26.jpg)
Acoustic Model: Training Steps
◆ Get features (previous section)
◆ Train monophone model
◆ Use the previous model to build the decision tree for triphones
◆ Train triphone model
  ◆ a. gmm-init-model: initialize the GMM (from the decision tree)
  ◆ b. gmm-mixup: Gaussian mixing-up (increases the number of Gaussians)
  ◆ c. convert-ali: convert alignments (model <-> decision tree)
  ◆ d. compile-train-graphs: get the training graphs
  ◆ e. gmm-align-compiled: model -> decode & align
  ◆ f. gmm-acc-stats-ali: EM training, E step
  ◆ g. gmm-est: EM training, M step
  ◆ h. numgauss = numgauss + incgauss
  ◆ i. Go to step e; repeat for several iterations
![Page 27: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/27.jpg)
align-equal-compiled
◆ Writes an equally spaced alignment (to get training started)
◆ Usage: align-equal-compiled <graphs-rspecifier> <features-rspecifier> <alignments-wspecifier>
e.g.
align-equal-compiled 1.fsts scp:train.scp ark:equal.ali
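The idea of an equally spaced alignment is easy to show on its own: divide the frames of an utterance as evenly as possible among the states it must pass through, in order. The sketch below works on a plain list of state names. The real tool operates on compiled training graphs, so this is only an illustration of the principle, and the function name is made up.

```python
def equal_alignment(num_frames, state_sequence):
    """Assign each frame to a state so the states get near-equal frame counts."""
    n = len(state_sequence)
    if num_frames < n:
        raise ValueError("need at least one frame per state")
    # Frame t maps to state index floor(t * n / num_frames), which never decreases.
    return [state_sequence[t * n // num_frames] for t in range(num_frames)]


print(equal_alignment(10, ["s1", "s2", "s3"]))
```

This crude first alignment is enough to compute initial statistics; later iterations replace it with alignments produced by the model itself.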
![Page 28: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/28.jpg)
gmm-align-compiled
◆ Performs re-alignment
◆ Usage: gmm-align-compiled [options] <model-in> <graphs-rspecifier> <feature-rspecifier> <alignments-wspecifier>
e.g.
gmm-align-compiled 1.mdl ark:graphs.fsts scp:train.scp ark:1.ali
◆ gmm-align-compiled $scale_opts --beam=$beam --retry-beam=$[$beam*4] <hmm-model*> ark:$dir/train.graph ark,s,cs:$feat ark:<alignment*>
◆ Beam width: 6 for the first iteration (in monophone training), 10 for the others
◆ Realign only at these iterations:
mono: realign_iters="1 2 3 4 5 6 7 8 9 10 12 14 16 18 20 23 26 29 32 35 38"
tri: realign_iters="10 20 30"
![Page 29: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/29.jpg)
gmm-acc-stats-ali
◆ Accumulates statistics for GMM training (E step)
◆ Usage: gmm-acc-stats-ali [options] <model-in> <feature-rspecifier> <alignments-rspecifier> <stats-out>
e.g.
gmm-acc-stats-ali 1.mdl scp:train.scp ark:1.ali 1.acc
◆ gmm-acc-stats-ali --binary=false <hmm-model*> ark,s,cs:$feat ark,s,cs:<alignment*> <stats>
![Page 30: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/30.jpg)
gmm-est
◆ Does maximum-likelihood re-estimation of a GMM-based acoustic model
◆ Usage: gmm-est [options] <model-in> <stats-in> <model-out>
e.g.
gmm-est 1.mdl 1.acc 2.mdl
◆ gmm-est --binary=false --write-occs=<*.occs> --mix-up=$numgauss <hmm-model-in> <stats> <hmm-model-out>
--write-occs: file to write pdf occupation counts to.
$numgauss increases at every iteration.
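The split between accumulating statistics and re-estimating the model can be illustrated with a much simpler model than Kaldi uses. The sketch below assumes one-dimensional features, a hard frame-to-state alignment, and a single Gaussian per state; the function names and the variance floor are made up for this example.

```python
def accumulate_stats(features, alignment):
    """E-step analogue: per-state frame count, sum, and sum of squares."""
    stats = {}
    for x, s in zip(features, alignment):
        count, total, total_sq = stats.get(s, (0, 0.0, 0.0))
        stats[s] = (count + 1, total + x, total_sq + x * x)
    return stats


def estimate(stats, var_floor=1e-3):
    """M-step analogue: maximum-likelihood mean and variance for each state."""
    model = {}
    for s, (count, total, total_sq) in stats.items():
        mean = total / count
        # Floor the variance so a state with near-identical frames stays usable.
        var = max(total_sq / count - mean * mean, var_floor)
        model[s] = (mean, var)
    return model


# Hypothetical toy data: four frames aligned to two states.
toy_stats = accumulate_stats([1.0, 3.0, 10.0, 14.0], ["s1", "s1", "s2", "s2"])
toy_model = estimate(toy_stats)
print(toy_model)
```

Because the accumulated statistics are plain sums, they can be computed separately on different parts of the data and added together before the estimation step.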
![Page 31: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/31.jpg)
Homework
03.mono.train.sh, 05.tree.build.sh, 06.tri.train.sh
![Page 32: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/32.jpg)
To Do: Acoustic Modeling
◆ Step 1. Execute the following commands.
  ◆ script/03.mono.train.sh | tee log/03.mono.train.log
  ◆ script/05.tree.build.sh | tee log/05.tree.build.log
  ◆ script/06.tri.train.sh | tee log/06.tri.train.log
◆ Step 2. Finish the code marked TODO in:
  ◆ script/03.mono.train.sh
  ◆ script/06.tri.train.sh
◆ Step 3. Observe the output and results.
◆ Step 4. (Optional) Tune the number of Gaussians and the number of iterations.
![Page 33: 專題研究 Week 2 Introductionspeech.ee.ntu.edu.tw/Project2020Spring/SpeechProj2.pdf專題研究Week 2 Introduction Prof. Lin-Shan Lee TA: Chung-Ming Chien 1 2 Outline 1.Recap 2.Acoustic](https://reader035.fdocuments.net/reader035/viewer/2022062507/5fe640626c575051c346b6e7/html5/thumbnails/33.jpg)
Hint (important!!)
◆ Use the variables already defined.
◆ Use these formulas:
◆ Redirect error output (stderr) to the log:
◆ compute-mfcc-feats … 2> $log
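In the shell, `2> $log` sends a command's standard error to the file named by `$log` while leaving standard output alone. For readers who drive the tools from Python instead, a rough equivalent is sketched below. The child command is a stand-in that only writes one line to stderr, and the helper name and log path are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_error_log(command, log_path):
    """Run a command and write its stderr to log_path, like `command 2> $log`."""
    with open(log_path, "w") as log:
        result = subprocess.run(command, stderr=log)
    return result.returncode


# Stand-in child process that writes a single line to stderr.
CHILD = [sys.executable, "-c", "import sys; sys.stderr.write('warning: demo\\n')"]

log_file = Path(tempfile.mkdtemp()) / "demo.log"
exit_code = run_with_error_log(CHILD, log_file)
print(exit_code, log_file.read_text().strip())
```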