Speech controlled keyboard

Instructor: Dr. John G. Harris
TA: M. Skowronski

Andréa Matsunaga (ammatsun@ufl.edu)
Maurício O. Tsugawa (tsugawa@ufl.edu)

©2002, UFL-COE-ECE

EEL 6586 - Automatic Speech Processing
Final Project – Spring 2002

Agenda

• Introduction

• Challenges

• Project Description

• Results

• Demo

Introduction

• Why Speech Recognition?

• Why keyboard?

Challenges

• Vocabulary size (not large, but 4x that of HW#4)

  • Homework #4, part B5

    • 100% (on training data)

    • 99% to 100% (on test data)

• Finding good training data

• Real-time processing using Matlab

  • Matlab is not multithreaded

  • audiorecorder does not offer much control

Project Description

Real-Time Recording

HMM Engine

Recording

• Data Acquisition Toolbox from MathWorks

  • more control than audiorecorder

  • very simple triggering scheme

• End point detection based on:

  • short-time zero-crossing rate

  • short-time energy

• Trigger level adjusted during recording
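
The end point detection listed above can be illustrated with a minimal sketch. It is written in Python rather than Matlab and is not the project's code: the function names, the threshold values (`energy_factor`, `zcr_thresh`), and the use of the first few frames as a noise-floor estimate are all assumptions made for illustration. The slides do not say how the trigger level was adapted during recording, so this sketch uses a fixed noise-floor estimate instead.

```python
import numpy as np

def frame_signal(x, frame_len):
    """Split x into non-overlapping frames, dropping the ragged tail."""
    n_frames = len(x) // frame_len
    return x[:n_frames * frame_len].reshape(n_frames, frame_len)

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose sign differs, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_endpoints(x, fs, frame_ms=15, energy_factor=4.0,
                     zcr_thresh=0.3, noise_frames=10):
    """Return (start_sample, end_sample) of detected speech, or None.

    All thresholds are hypothetical. The noise floor is the average energy
    of the first `noise_frames` frames, which are assumed to be silence.
    """
    frame_len = int(fs * frame_ms / 1000)
    frames = frame_signal(np.asarray(x, dtype=float), frame_len)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    noise_floor = np.mean(energy[:noise_frames]) + 1e-12
    # A frame counts as speech if it is loud, or if it is moderately loud
    # with a high zero-crossing rate (typical of unvoiced fricatives).
    is_speech = (energy > energy_factor * noise_floor) | (
        (energy > 2.0 * noise_floor) & (zcr > zcr_thresh))
    active = np.flatnonzero(is_speech)
    if active.size == 0:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

Combining the two measures lets low-energy unvoiced sounds (for example the "s" in "six") pass the detector while plain background noise does not.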

HMM Engine

• Frame window size: 15 ms (no overlap)

• Feature vector:

  • MFCC from Malcolm Slaney’s mfcc.m (12 coefficients, no c[0])

  • Delta (12 coefficients, K=2)

  • Delta-delta (12 coefficients, K=1)

• HMM models:

  • 1 female and 1 male model for each digit

  • 1 model for each letter (insufficient database)

  • 8 states (second classification with 4 states for class “E”)

  • EM iterations: 10

• Classifier:

  • Viterbi (hmm_vit from h2m)

  • Utterance classified according to the maximum log likelihood over all HMMs

• Noise reduction:

  • Cepstral mean subtraction for both TRAIN and TEST data
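
The feature and classification steps on this slide can be sketched as follows. This is a Python illustration under stated assumptions, not the project's Matlab code and not the h2m API. The slides give only the delta half-widths (K=2 and K=1), so the regression formula and edge padding in `delta` are assumed. The function names are hypothetical, and the Viterbi scorer takes precomputed per-frame emission log probabilities instead of modelling the Gaussian emissions.

```python
import numpy as np

def delta(features, K):
    """Regression-based time derivative over +/-K frames.

    features: (n_frames, n_coeffs). Edges are padded by repeating the
    first and last frame. The formula itself is an assumption.
    """
    n_frames = features.shape[0]
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, K + 1):
        ahead = padded[K + k:K + k + n_frames]
        behind = padded[K - k:K - k + n_frames]
        out += k * (ahead - behind)
    return out / denom

def cepstral_mean_subtraction(features):
    """Remove the per-utterance mean of each coefficient."""
    return features - features.mean(axis=0, keepdims=True)

def build_feature_vectors(mfcc):
    """Stack mean-normalized MFCC, delta (K=2) and delta-delta (K=1)."""
    static = cepstral_mean_subtraction(np.asarray(mfcc, dtype=float))
    d1 = delta(static, K=2)
    d2 = delta(d1, K=1)
    return np.hstack([static, d1, d2])  # 12 + 12 + 12 = 36 per frame

def viterbi_log_likelihood(log_pi, log_A, log_B):
    """Log probability of the single best state path.

    log_pi: (S,) initial log probabilities.
    log_A:  (S, S) transition log probabilities (row = from, column = to).
    log_B:  (T, S) emission log probabilities of each frame in each state.
    """
    score = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        score = np.max(score[:, None] + log_A, axis=0) + log_B[t]
    return float(np.max(score))

def classify(scores):
    """Pick the label whose model scored the highest log likelihood."""
    return max(scores, key=scores.get)
```

In use, each utterance would be scored against every model with `viterbi_log_likelihood`, the results collected in a dictionary keyed by class label, and that dictionary passed to `classify`.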

Results

• Using about 40 utterances per class as training data:

  • About 92% accuracy on training data

  • HW4 extra-credit recognition dropped from 99% to about 10%!

Results

• Using only digits:

  • very good recognition

• Using digits+alphabet (class groups, listed from poor to good recognition):

  • {E, {B, V}, {D, G}, {P, T}, {C, Z}}

  • {F, X, S}, {L, M, N}, {A, K, J, {H, 8}} {O}

  • {I, 5, 9} {Q, 2}

  • {U} {R} {W} {Y} {0} {1} {3} {4} {6} {7}

Demo

Please enjoy the demo!

Thank you!

Questions?