Speech controlled keyboard
Instructor: Dr. John G. Harris
TA: M. Skowronski
Andréa Matsunaga (ammatsun@ufl.edu)
Maurício O. Tsugawa (tsugawa@ufl.edu)
©2002, UFL-COE-ECE
EEL 6586 - Automatic Speech Processing
Final Project - Spring 2002
Agenda
• Introduction
• Challenges
• Project Description
• Results
• Demo
Introduction
• Why Speech Recognition?
• Why a keyboard?
Challenges
• Vocabulary size (not so big, but 4x that of HW#4)
  • Homework #4, part B5:
    • 100% (on training data)
    • 99% to 100% (on test data)
• Finding good training data
• Real-time processing using Matlab
  • Matlab is not multithreaded
  • audiorecorder does not offer much control
Project Description
• Real-Time Recording
• HMM Engine
Recording
• Data Acquisition Toolbox from MathWorks
  • more control than audiorecorder
  • very simple triggering scheme
• Endpoint detection based on:
  • short-time zero crossing
  • short-time energy
• Trigger level adjusted during recording
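The endpoint detection listed above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the project's Matlab code: the 15 ms frame length, the use of the leading frames as a background estimate, and the parameters `noise_frames`, `energy_factor`, `zcr_margin`, and `max_extend` are all assumptions made for the example. It also estimates its thresholds once, rather than adjusting the trigger level continuously during recording.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split x into consecutive non-overlapping frames, dropping the tail."""
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def detect_endpoints(x, fs, frame_ms=15, noise_frames=10,
                     energy_factor=4.0, zcr_margin=0.1, max_extend=10):
    """Return (start_sample, end_sample) of the utterance, or None.

    Assumption: the first `noise_frames` frames contain background only.
    """
    frame_len = int(fs * frame_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)

    # Thresholds derived from the assumed-silent leading frames.
    e_thr = energy_factor * energy[:noise_frames].mean() + 1e-12
    z_thr = zcr[:noise_frames].mean() + zcr_margin

    # Energy gives the coarse endpoints.
    active = np.flatnonzero(energy > e_thr)
    if active.size == 0:
        return None
    start, end = int(active[0]), int(active[-1])

    # Zero crossings extend the endpoints over weak, noisy onsets/offsets.
    for _ in range(max_extend):
        if start > 0 and zcr[start - 1] > z_thr:
            start -= 1
        else:
            break
    for _ in range(max_extend):
        if end < len(zcr) - 1 and zcr[end + 1] > z_thr:
            end += 1
        else:
            break
    return start * frame_len, (end + 1) * frame_len
```

Energy alone tends to clip low-energy fricatives at word boundaries, which is the usual reason for pairing it with a zero-crossing measure.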
HMM Engine
• Frame window size: 15 ms (no overlap)
• Feature vector:
  • MFCC from Malcolm Slaney's mfcc.m (12 coefficients, no c[0])
  • Delta (12 coefficients, K=2)
  • Delta-delta (12 coefficients, K=1)
• HMM models:
  • 1 female and 1 male model for each digit
  • 1 model for letters (insufficient database)
  • 8 states (second classification with 4 states for class "E")
  • EM iterations: 10
• Classifier:
  • Viterbi (hmm_vit from h2m)
  • Utterance classified according to the maximum log-likelihood over all HMMs
• Noise reduction:
  • Cepstral mean subtraction for both TRAIN and TEST data
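As a rough illustration of the pipeline on this slide, the sketch below builds a 36-dimensional feature vector (12 MFCC + 12 delta + 12 delta-delta) with cepstral mean subtraction, then picks the model with the highest Viterbi log-likelihood. It is written in Python with NumPy and is not the project's Matlab code. The delta uses the common regression formula over ±K frames with edge padding, and the HMMs use diagonal-covariance Gaussian emissions; both are assumptions, since the slides do not state them. `viterbi_log_likelihood` is a plain stand-in, not the h2m `hmm_vit` function, and all function names are hypothetical.

```python
import numpy as np


def deltas(feats, K):
    """Regression delta over +/-K frames; feats has shape (frames, coeffs).

    d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), k = 1..K
    """
    n_frames = feats.shape[0]
    # Repeat edge frames so every frame has K neighbours on each side.
    padded = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, K + 1):
        ahead = padded[K + k: K + k + n_frames]
        behind = padded[K - k: K - k + n_frames]
        out += k * (ahead - behind)
    return out / denom


def cepstral_mean_subtraction(cepstra):
    """Remove the per-utterance mean of each cepstral coefficient."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def build_features(mfcc):
    """Stack CMS-normalised MFCC, delta (K=2), and delta-delta (K=1)."""
    mfcc = cepstral_mean_subtraction(np.asarray(mfcc, dtype=float))
    d = deltas(mfcc, K=2)
    dd = deltas(d, K=1)
    return np.hstack([mfcc, d, dd])


def viterbi_log_likelihood(obs, log_pi, log_A, means, variances):
    """Log-likelihood of the best state path (diagonal-Gaussian HMM).

    log_A[i, j] is the log-probability of moving from state i to state j.
    """
    # log N(o_t | mean_j, var_j) for every frame t and state j: shape (T, N)
    diff = obs[:, None, :] - means[None, :, :]
    log_b = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                    + np.sum(np.log(2 * np.pi * variances), axis=1))
    score = log_pi + log_b[0]
    for t in range(1, obs.shape[0]):
        # Best predecessor for each destination state, then emit frame t.
        score = np.max(score[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(score))


def classify(obs, models):
    """Return the label whose HMM gives the maximum Viterbi log-likelihood."""
    scores = {label: viterbi_log_likelihood(obs, *params)
              for label, params in models.items()}
    return max(scores, key=scores.get)
```

Here `models` maps a label such as a digit or letter to a tuple `(log_pi, log_A, means, variances)`. With separate female and male models per digit, both would be entered under distinct keys and mapped back to the same digit after the maximum is taken.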
Results
• Using about 40 utterances per class as training data:
  • About 92% accuracy on training data
  • HW4 extra-credit recognition dropped from 99% to about 10%!
Results
• Using only digits:
  • very good recognition
• Using digits+alphabet:
  • {E, {B, V}, {D, G}, {P, T}, {C, Z}}
  • {F, X, S}, {L, M, N}, {A, K, J, {H, 8}} {O}
  • {I, 5, 9} {Q, 2}
  • {U} {R} {W} {Y} {0} {1} {3} {4} {6} {7}
[Figure scale alongside the groups: Poor (top) to Good (bottom)]
Demo
Please enjoy the demo!
Thank you!
Questions?