Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.
![Page 1: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/1.jpg)
Learning Structured Models for Phone Recognition
Slav Petrov, Adam Pauls, Dan Klein
![Page 2: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/2.jpg)
Acoustic Modeling
![Page 3: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/3.jpg)
Motivation
Standard acoustic models impose many structural constraints
We propose an automatic approach
Use:
TIMIT dataset
MFCC features
Full covariance Gaussians
(Young and Woodland, 1994)
![Page 4: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/4.jpg)
Phone Classification
? ? ? ? ? ? ? ? ??
![Page 5: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/5.jpg)
Phone Classification
æ
![Page 6: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/6.jpg)
HMMs for Phone Classification
![Page 7: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/7.jpg)
HMMs for Phone Classification
Temporal Structure
![Page 8: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/8.jpg)
Standard subphone/mixture HMM
Temporal Structure
Gaussian Mixtures
Model Error rate
HMM Baseline 25.1%
![Page 9: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/9.jpg)
Standard Model
Our Model
Single Gaussians
Fully Connected
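As a rough illustration of "single Gaussians": each HMM state scores a feature frame with one full-covariance Gaussian instead of a mixture. The sketch below is not taken from the slides; the function names and the Cholesky-based evaluation are assumptions chosen for numerical convenience.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log density of a full-covariance Gaussian at one feature frame x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)                  # cov = chol @ chol.T
    z = np.linalg.solve(chol, x - mean)             # whitened residual
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))   # log |cov|
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)

def emission_log_probs(frame, means, covs):
    """One log density per HMM state; every state owns a single Gaussian."""
    return np.array([gaussian_log_density(frame, m, c)
                     for m, c in zip(means, covs)])
```

In a fully connected model these per-state scores would be combined with a dense transition matrix, so the temporal structure is learned rather than fixed in advance.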
![Page 10: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/10.jpg)
Hierarchical Baum-Welch Training
[Figure: split hierarchy annotated with error rates 32.1%, 28.7%, 25.6%, 23.9%]
HMM Baseline 25.1%
5 Split rounds 21.4%
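One split round of hierarchical training can be sketched as follows: every state is duplicated, the two copies are perturbed slightly so that EM can pull them apart, and the enlarged model is then retrained with Baum-Welch (not shown). The perturbation size `eps` and the equal sharing of incoming transition mass are assumptions made for this sketch, not details stated on the slide.

```python
import numpy as np

def split_states(means, covs, trans, rng, eps=0.01):
    """Split each of K states in two, returning parameters for 2K states.

    means: (K, D), covs: (K, D, D), trans: (K, K) with rows summing to 1.
    State k becomes children k and k + K.
    """
    K, D = means.shape
    noise = eps * rng.standard_normal((K, D))
    # Children start symmetric around the parent mean (assumed scheme).
    new_means = np.concatenate([means + noise, means - noise])
    new_covs = np.concatenate([covs, covs])
    # Each child keeps its parent's outgoing distribution; the mass going
    # into a parent is shared equally between that parent's two children.
    half = 0.5 * trans
    new_trans = np.block([[half, half], [half, half]])
    return new_means, new_covs, new_trans
```

Repeating split-then-retrain doubles the number of states per round, which matches the idea of reporting results after a fixed number of split rounds.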
![Page 11: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/11.jpg)
Phone Classification Results
| Method | Error Rate |
|---|---|
| GMM Baseline (Sha and Saul, 2006) | 26.0 % |
| HMM Baseline (Gunawardana et al., 2005) | 25.1 % |
| SVM (Clarkson and Moreno, 1999) | 22.4 % |
| Hidden CRF (Gunawardana et al., 2005) | 21.7 % |
| Our Work | 21.4 % |
| Large Margin GMM (Sha and Saul, 2006) | 21.1 % |
![Page 12: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/12.jpg)
Phone Recognition
? ? ? ? ? ? ? ? ?
![Page 13: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/13.jpg)
Standard State-Tied Acoustic Models
![Page 14: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/14.jpg)
No more State-Tying
![Page 15: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/15.jpg)
No more Gaussian Mixtures
![Page 16: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/16.jpg)
Fully connected internal structure
![Page 17: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/17.jpg)
Fully connected external structure
![Page 18: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/18.jpg)
Refinement of the /ih/-phone
![Page 19: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/19.jpg)
Refinement of the /ih/-phone
![Page 20: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/20.jpg)
Refinement of the /ih/-phone
![Page 21: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/21.jpg)
Refinement of the /ih/-phone
![Page 22: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/22.jpg)
Refinement of the /l/-phone
![Page 23: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/23.jpg)
Hierarchical Refinement Results
[Plot: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge, Automatic Alignment; Split Only]
HMM Baseline 41.7%
5 Split Rounds 28.4%
![Page 24: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/24.jpg)
Merging
Not all phones are equally complex
Compute log likelihood loss from merging
[Figure: split model vs. model merged at one node, over time steps t-1, t, t+1]
![Page 25: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/25.jpg)
Merging Criterion
[Figure: two lattices over time steps t-1, t, t+1; equations not preserved]
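The slide's formulas did not survive extraction, so the following is only a plausible sketch of a merging criterion of this kind, not the authors' exact definition. It assumes forward scores `alpha` are joint probabilities (so merged states add), backward scores `beta` are conditional (so merged states are averaged with relative-frequency weights `p_i`, `p_j`), and that per-time-step losses are summed.

```python
import numpy as np

def merge_loss(alpha, beta, i, j, p_i, p_j):
    """Approximate log-likelihood loss from merging states i and j.

    alpha[t, k] = P(x_1..t, s_t = k)       (forward, joint)
    beta[t, k]  = P(x_t+1..T | s_t = k)    (backward, conditional)
    p_i, p_j: relative frequencies of the two states, p_i + p_j = 1.
    """
    split = (alpha * beta).sum(axis=1)                 # P(x), evaluated at each t
    merged_alpha = alpha[:, i] + alpha[:, j]           # joint scores add
    merged_beta = p_i * beta[:, i] + p_j * beta[:, j]  # conditionals average
    merged = (split
              - alpha[:, i] * beta[:, i]
              - alpha[:, j] * beta[:, j]
              + merged_alpha * merged_beta)
    # Assumed aggregation: sum the per-time-step log ratios.
    return float(np.sum(np.log(split) - np.log(merged)))
```

Pairs of states with the smallest loss would be the natural candidates to merge back. A practical implementation would work with scaled or log-space forward and backward scores to avoid underflow on long utterances.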
![Page 26: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/26.jpg)
Split and Merge Results
[Plot: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge; Split Only]
Split Only 28.4%
Split & Merge 27.3%
![Page 27: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/27.jpg)
[Bar chart: HMM states per phone (axis 0 to 35); phones: ae ao ay eh er ey ih f r s sil aa ah ix iy z cl k sh n vcl ow l m t v uw aw ax ch w th el dh uh p en oy hh jh ng y b d dx g zh epi]
![Page 28: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/28.jpg)
[Bar chart: HMM states per phone (axis 0 to 35), with ey, eh, and ao called out]
![Page 29: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/29.jpg)
[Bar chart: HMM states per phone (axis 0 to 35), with g, d, and b called out]
![Page 30: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/30.jpg)
Alignment Results
[Plot: error rate (0.24 to 0.38) vs. number of states (0 to 2000); legend: Split and Merge; Split Only; Split and Merge, Automatic Alignment]
Hand Aligned 27.3%
Auto Aligned 26.3%
![Page 31: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/31.jpg)
Alignment State Distribution
[Bar chart: states per phone (axis 0 to 35), Hand Aligned vs. Auto Aligned; phones: ae ao ay eh er ey ih aa ah ix iy ow uw aw ax el uh en oy f r s z k sh n l m t v ch w th dh p hh jh ng y b d dx g zh sil cl vcl epi]
![Page 32: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/32.jpg)
Inference
State sequence: d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5
Phone sequence: d - d - d - d - ae - ae - ae - ae - d - d - d - d - d
Transcription: d - ae - d
Viterbi
Variational
???
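The three levels on this slide are related by two simple projections: dropping the substate index turns a state sequence into a phone sequence, and collapsing runs of the same phone gives the transcription. A minimal sketch, with hypothetical function names:

```python
import re
from itertools import groupby

def states_to_phones(state_seq):
    """'d1-d6-ae5' -> ['d', 'd', 'ae'] by stripping the trailing substate index."""
    return [re.sub(r"\d+$", "", s) for s in state_seq.split("-")]

def phones_to_transcription(phones):
    """Collapse consecutive repeats: ['d', 'd', 'ae'] -> ['d', 'ae']."""
    return [phone for phone, _ in groupby(phones)]

states = "d1-d6-d6-d4-ae5-ae2-ae3-ae0-d2-d2-d3-d7-d5"
phones = states_to_phones(states)
print(phones_to_transcription(phones))  # prints ['d', 'ae', 'd']
```

Many different state sequences project to the same transcription, which is why the most likely state sequence (Viterbi) need not yield the most likely transcription. Collapsing repeats this way is a simplification: it cannot represent the same phone occurring twice in a row.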
![Page 33: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/33.jpg)
Variational Inference
Variational Approximation: [equation not preserved]
Solution: [equation not preserved]
[symbol not preserved]: Posterior edge marginals
Viterbi 26.3%
Variational 25.1%
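The equations on this slide were lost, so the sketch below illustrates only one ingredient the slide names: posterior edge marginals. It assumes state-level edge posteriors `xi` are already available from forward-backward and sums them over substates to obtain phone-level edge marginals; the function name and array layout are hypothetical, and how these marginals enter the final decoder is not shown.

```python
import numpy as np

def phone_edge_marginals(xi, state_to_phone, num_phones):
    """Project state-level edge posteriors onto phones.

    xi[t, i, j] = P(s_t = i, s_t+1 = j | x) over K states.
    state_to_phone[k] = index of the phone that state k belongs to.
    Returns q[t, a, b] = P(phone_t = a, phone_t+1 = b | x).
    """
    K = xi.shape[1]
    membership = np.zeros((K, num_phones))
    membership[np.arange(K), state_to_phone] = 1.0   # one-hot state -> phone
    return np.einsum("tij,ia,jb->tab", xi, membership, membership)
```

Because the projection only sums probabilities, each time slice of the result still sums to one whenever the corresponding slice of `xi` does.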
![Page 34: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/34.jpg)
Phone Recognition Results
| Method | Error Rate |
|---|---|
| State-Tied Triphone HMM (HTK) (Young and Woodland, 1994) | 27.7 % |
| Gender Dependent Triphone HMM (Lamel and Gauvain, 1993) | 27.1 % |
| Our Work | 26.1 % |
| Bayesian Triphone HMM (Ming and Smith, 1998) | 25.6 % |
| Heterogeneous classifiers (Halberstadt and Glass, 1998) | 24.4 % |
![Page 35: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/35.jpg)
Conclusions
Minimalist, Automatic Approach
Unconstrained
Accurate
Phone Classification
Competitive with state-of-the-art discriminative methods despite being generative
Phone Recognition
Better than standard state-tied triphone models
![Page 36: Learning Structured Models for Phone Recognition Slav Petrov, Adam Pauls, Dan Klein.](https://reader031.fdocuments.net/reader031/viewer/2022032516/56649c6d5503460f9491ebf1/html5/thumbnails/36.jpg)
Thank you!
http://nlp.cs.berkeley.edu