M4 speech recognition. University of Sheffield. Vincent Wan, Martin Karafiát.
Slide 1: M4 speech recognition
University of Sheffield
Vincent Wan, Martin Karafiát
Slide 2: The Recogniser

First pass (n-best lattice generation):
• Frontend
• Best-first decoding (Ducoder)
• Trigram language model (SRILM)
• Word-internal triphone models
• MLLR adaptation (HTK)
• Recognition output

Second pass (lattice rescoring):
• Time-synchronous decoding (HTK)
• Cross-word triphone models
• MLLR adaptation (HTK)
• Recognition output
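The two-pass design above (a fast first pass proposes an n-best list or lattice, then a second pass rescores it with richer models) can be sketched with toy scores. All hypotheses, scores, and weights below are illustrative stand-ins, not values from the M4 system:

```python
# Toy sketch of two-pass recognition: a cheap first pass ranks an
# n-best list; a second pass rescores it with a stronger language model.
# Scores are in the log domain; higher is better.

def first_pass(nbest):
    # Pretend combined acoustic + trigram scores from the fast decoder.
    return sorted(nbest.items(), key=lambda kv: kv[1], reverse=True)

def rescore(nbest, lm_bonus, lm_weight=1.0):
    # Combine the first-pass score with a better LM score per hypothesis.
    rescored = {h: s + lm_weight * lm_bonus.get(h, 0.0)
                for h, s in nbest.items()}
    return max(rescored, key=rescored.get)

nbest = {"we meet at noon": -10.2,
         "we met at noon": -10.5,
         "wee meat at noon": -9.8}
lm_bonus = {"we meet at noon": 2.0,
            "we met at noon": 2.5,
            "wee meat at noon": -3.0}

best_first = first_pass(nbest)[0][0]       # first pass prefers "wee meat at noon"
best_rescored = rescore(nbest, lm_bonus)   # rescoring picks "we met at noon"
```

The second pass can only promote hypotheses that survived the first pass, which is why the next slide notes that n-best rescoring is not optimal.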
Slide 3: System limitations
• N-best list rescoring is not optimal
• Adaptation must be performed on two sets of acoustic models
• Many more hyper-parameters to tune manually
• SRILM is not efficient on very large language models (more than 10^9 words)
Slide 4: Advances since last meeting
• Models trained on two databases
– SWITCHBOARD recogniser: acoustic & language models trained on 200 hours of speech
– ICSI meetings recogniser: acoustic models trained on 40 hours of speech; language model is a combination of SWB and ICSI
• Improvements mainly affect the Switchboard models
• 16 kHz sampling rate used throughout
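The slide says the ICSI language model "is a combination" of SWB and ICSI; the standard way to combine corpus-specific models is linear interpolation. A minimal sketch, with made-up probabilities and an illustrative weight (the M4 values are not given here):

```python
# Linear interpolation of two language models:
#   p(w | h) = lam * p_swb(w | h) + (1 - lam) * p_icsi(w | h)
# The distributions and the weight lam below are invented examples.

def interpolate(p_swb, p_icsi, lam=0.5):
    words = set(p_swb) | set(p_icsi)
    return {w: lam * p_swb.get(w, 0.0) + (1 - lam) * p_icsi.get(w, 0.0)
            for w in words}

p_swb = {"meeting": 0.01, "phone": 0.05}    # conversational telephone style
p_icsi = {"meeting": 0.06, "phone": 0.01}   # meeting-room style
p_mix = interpolate(p_swb, p_icsi, lam=0.3)
# p_mix["meeting"] = 0.3 * 0.01 + 0.7 * 0.06 = 0.045
```

The weight is typically tuned on held-out data from the target domain.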
Slide 5: Advances since last meeting
• Adaptation of word-internal context-dependent models
• Unified the phone sets and pronunciation dictionaries
– Improved the pronunciation dictionary for Switchboard
– Now using the ICSI dictionary with missing pronunciations imported from the ISIP dictionary
• Better handling of multiple pronunciations during acoustic model training
• General bug fixes
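Importing missing pronunciations from a second dictionary, as described above for ICSI and ISIP, amounts to a fallback merge. A toy sketch with invented entries (not the actual dictionary formats or phone sets):

```python
# Fill gaps in a primary pronunciation dictionary from a fallback one:
# keep the primary's pronunciations, import only words it lacks.
# Words and phone strings below are invented examples.

def merge_dicts(primary, fallback):
    merged = dict(fallback)   # start from the fallback entries
    merged.update(primary)    # primary entries win on conflicts
    return merged

icsi = {"meeting": ["m iy t ih ng"]}
isip = {"meeting": ["m iy dx ih ng"],          # conflicting entry, ignored
        "beamformer": ["b iy m f ao r m er"]}  # missing from ICSI, imported

merged = merge_dicts(icsi, isip)
```

A real merge would also have to reconcile phone sets first, which is why the slide pairs this with unifying the phone sets.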
Slide 6: Results overview

Training/adaptation conditions (columns of the original table): SWB trn; ICSI trn; SWB trn + ICSI adpt; ICSI trn + ICSI adpt; ICSI trn + M4 adpt; SWB trn + M4 adpt; SWB trn + ICSI adpt + M4 adpt.

• SWB test: 55.05, 45.41
• ICSI test: 52.36 (ICSI trn), 53.99 (SWB trn + ICSI adpt), 49.27 (ICSI trn + ICSI adpt)
• M4 test: 73.47 *, 79.17 †, 84.67 *, 81.27 †

% word error rates
* Results from lapel mics. † Results from beamformer.
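The figures above are word error rates, i.e. word-level edit distance divided by the number of reference words. A minimal computation using the standard definition (not the M4 scoring scripts):

```python
# Word error rate: Levenshtein distance between reference and hypothesis
# word sequences (substitutions, insertions, deletions), as a percentage
# of the reference length.

def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j                       # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(r)][len(h)] / len(r)

# wer("the cat sat", "a cat sat down") -> one substitution plus one
# insertion over three reference words, i.e. 66.67%
```

Note that WER can exceed 100% when insertions are numerous, which matters for hard conditions like the M4 beamformer results.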
Slide 7: Results: adaptation vs. direct training on ICSI

| System | ICSI trained | SWB trained, ICSI adapted |
| Monophone models * | 73.37 | 78.89 |
| Context-dependent word-internal models * | 66.08 | 70.59 |
| Lattice rescoring (no or speaker-independent adaptation) | 52.34 | 53.99 |
| Lattice rescoring (speaker adaptation) | 49.27 | 51.18 |

% word error rates
* Results from Ducoder using all pruning
Slide 8: Acoustic model adaptation issue
• Acoustic models are presently not very adaptive
– Better MLLR code required (next slide)
– More training data required
• Need to make better use of the combined ICSI/SWB training data for M4
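MLLR, the adaptation method used throughout this system, moves the Gaussian means of the acoustic models with an affine transform (mu' = A*mu + b) estimated from a small amount of target-domain data. A one-dimensional toy illustrating only the shape of that computation; real MLLR estimates a matrix transform by maximising likelihood over Gaussian mixtures, and this least-squares version is not HTK's implementation:

```python
# Toy 1-D stand-in for MLLR mean adaptation: fit mu' = a * mu + b
# by ordinary least squares, then shift all model means. The means
# and targets below are invented numbers.

def fit_affine(means, targets):
    n = len(means)
    mx = sum(means) / n
    my = sum(targets) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(means, targets))
    var = sum((x - mx) ** 2 for x in means)
    a = cov / var
    b = my - a * mx
    return a, b

means = [0.0, 1.0, 2.0, 3.0]      # unadapted model means
targets = [0.5, 2.5, 4.5, 6.5]    # means suggested by adaptation data
a, b = fit_affine(means, targets) # a = 2.0, b = 0.5
adapted = [a * m + b for m in means]
```

Because one transform is shared across many Gaussians, MLLR can adapt with little data; the slide's complaint is that the current models still do not move far enough.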
Slide 9: Other news
• The next version of HTK's adaptation code will be made available to M4 before the official public release.
• Sheffield to acquire the HTK LVCSR decoder
– Licensing issues to be resolved
– May be able to make binaries available to M4 partners