
Design of a Speech Recognition System to Assist Hearing Impaired Students

Richard Kheir and Thomas P. Way

Department of Computing Sciences, Villanova University

Abstract

Background

Applications

Four general application categories for ASR are:

• Command Recognition

• Dictation

• Interactive Voice Response (IVR)

• Assistive Technologies

Motivation

System Design Part 1 - DiBS

A low recognition rate for domain-specific jargon is one of the key weaknesses of ASR. DiBS was developed to address this problem by adding such jargon to the recognizer's dictionary.
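The poster does not describe how DiBS identifies jargon. As a minimal sketch of the underlying idea only, assuming the course materials are available as plain text and the recognizer's base vocabulary is available as a lowercase word set, terms that are missing from the vocabulary can be ranked by frequency as candidates for a custom dictionary. The function `find_jargon_candidates` and its parameters are hypothetical and are not part of DiBS.

```python
import re
from collections import Counter


def find_jargon_candidates(course_text: str, base_vocabulary: set[str],
                           min_count: int = 2) -> list[tuple[str, int]]:
    """Hypothetical helper (not part of DiBS): list words in course_text that
    are missing from base_vocabulary, most frequent first.

    Words are compared in lowercase; base_vocabulary is assumed lowercase too.
    """
    # A "word" starts with a letter and may contain digits, '+', '#', or '-',
    # so terms such as "C++", "C#", or "hash-map" stay intact.
    words = re.findall(r"[A-Za-z][A-Za-z0-9+#-]*", course_text)
    missing = Counter(w.lower() for w in words
                      if w.lower() not in base_vocabulary)
    # most_common() sorts by count, keeping first-seen order for ties.
    return [(w, n) for w, n in missing.most_common() if n >= min_count]


if __name__ == "__main__":
    notes = ("A hashmap stores each key. The hashmap uses hashing. "
             "Recursion is used in recursion trees.")
    vocab = {"a", "stores", "each", "key", "the", "uses", "is", "used",
             "in", "trees"}
    print(find_jargon_candidates(notes, vocab))
    # → [('hashmap', 2), ('recursion', 2)]
```

Requiring a minimum count is one simple way to skip one-off typos; a real tool would still need a person to review the list before the terms are added.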

Table: Summary of the accuracy results for five scenarios.

Description | Accuracy | Range | Usability
Untrained | 75% | 64%-83% | Poor to fair
Minimal training | 88% | 78%-93% | Sufficient
Moderate training | 90% | 81%-96% | Good
Moderate training and customized dictionary | 91% | 83%-96% | Good
Moderate training, customized dictionary and pronunciations | 94% | 86%-98% | Very good
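The accuracy gains in the five-scenario table are easier to compare as error rates. A small worked example, assuming the error rate is simply 100% minus the reported accuracy (as with word accuracy versus word error rate); the poster does not state its exact accuracy definition, and the names below are illustrative.

```python
# Accuracy percentages copied from the five-scenario table.
ACCURACY = {
    "Untrained": 75,
    "Minimal training": 88,
    "Moderate training": 90,
    "Moderate training and customized dictionary": 91,
    "Moderate training, customized dictionary and pronunciations": 94,
}


def error_rate(accuracy_pct: float) -> float:
    """Assumes error rate = 100% - accuracy."""
    return 100.0 - accuracy_pct


def relative_error_reduction(baseline_pct: float, improved_pct: float) -> float:
    """Share of the baseline's errors removed by the improved scenario, in percent."""
    base_err = error_rate(baseline_pct)
    return (base_err - error_rate(improved_pct)) / base_err * 100.0


if __name__ == "__main__":
    baseline = ACCURACY["Untrained"]
    for name, acc in ACCURACY.items():
        print(f"{name}: error {error_rate(acc):.0f}%, "
              f"{relative_error_reduction(baseline, acc):.0f}% fewer errors than untrained")
```

Under that assumption, minimal training alone cuts the error rate from 25% to 12% (about half), and the full configuration brings it to 6%, roughly three quarters fewer errors than the untrained baseline.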

System Design Part 2 - VUST

Table. Recognition accuracy for 4 classifications of classroom speech.

Classification | Words Correct | Total Words | Percent Recognized
Planning | 628 | 758 | 83%
Lecture | 5930 | 6925 | 86%
Roll-call | 155 | 254 | 61%
Discussion | 1556 | 1846 | 84%
TOTAL | 8269 | 9783 | 85%
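The percentages in the classroom-speech table follow directly from the word counts. This short check recomputes each row and the TOTAL row from the counts above; the dictionary and function names are illustrative only.

```python
# (words correct, total words) per classification, copied from the table.
COUNTS = {
    "Planning": (628, 758),
    "Lecture": (5930, 6925),
    "Roll-call": (155, 254),
    "Discussion": (1556, 1846),
}


def percent_recognized(correct: int, total: int) -> float:
    return 100.0 * correct / total


def pooled_total(counts):
    """Sum words across classifications (the TOTAL row), not an average of percentages."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct, total


if __name__ == "__main__":
    for name, (c, t) in COUNTS.items():
        print(f"{name}: {percent_recognized(c, t):.0f}%")
    c, t = pooled_total(COUNTS)
    print(f"TOTAL: {c}/{t} = {percent_recognized(c, t):.0f}%")  # 8269/9783 = 85%
```

Because the total pools words, it is dominated by lecture speech (6,925 of 9,783 words, about 71%); an unweighted mean of the four percentages would be about 78%, pulled down by roll-call.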

Contributions & Future Work

Contributions

• Proved to be an affordable and beneficial assistive system

• Provides easy-to-use software

• Improves recognition accuracy

• Distributed and portable application

Future work

• Reach commercial quality

• Post speech profiles and jargon in a central repository

• Evaluate other speech engines

• Deploy in classrooms

SERVER

• Consists of three major components: the speech recognition software, a dictionary enhancement tool, and a transcription distribution application.

• Uses an ASR system designed to be affordable, accurate and easy to set up and use.

• About one hour of speech training is enough to achieve good accuracy

• Training is done through the Windows Control Panel or through the VUST Instructor's Console

• Simple setup and configuration.

• User-friendly interface

• Instructor initiates transcription

• Students connect via a web applet

• Accurate results even without added jargon (see the five-scenario accuracy table)
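The poster names a transcription distribution application but does not give its design. As a minimal sketch, assuming recognized text arrives as strings and that each connected student can be modeled as an in-memory queue, the "instructor initiates, students connect" flow looks like this. The class `TranscriptHub` and its methods are hypothetical and stand in for the real network layer.

```python
import queue
import threading


class TranscriptHub:
    """Hypothetical stand-in for the transcription distribution component.

    Recognized text is published once and copied to a per-student queue.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: dict[str, queue.Queue] = {}
        self._active = False  # becomes True when the instructor starts transcription

    def start(self) -> None:
        with self._lock:
            self._active = True

    def connect(self, student_id: str) -> queue.Queue:
        """Register a student and return the queue their transcript arrives on."""
        q: queue.Queue = queue.Queue()
        with self._lock:
            self._subscribers[student_id] = q
        return q

    def disconnect(self, student_id: str) -> None:
        with self._lock:
            self._subscribers.pop(student_id, None)

    def publish(self, text: str) -> int:
        """Copy recognized text to every connected student; return how many received it."""
        with self._lock:
            if not self._active:
                return 0  # nothing is sent before transcription starts
            targets = list(self._subscribers.values())
        for q in targets:
            q.put(text)
        return len(targets)
```

Each student gets an independent queue, so a student who reads slowly does not hold up delivery to the others; a real deployment would put a network connection behind each queue.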

We tested the ASR system in five scenarios: (1) untrained, (2) minimal training, (3) moderate training, (4) moderate training with jargon added using DiBS, and (5) moderate training with added jargon plus custom pronunciations for that jargon.

The quest for automatic speech recognition (ASR) has roots in 1939, when AT&T introduced the VODER, an early electronic speech synthesizer. In the decades that followed, many enhancements were made, often for specific domains, including the introduction of the Hidden Markov Model (HMM). By the beginning of the 21st century, commercial speech recognition systems had finally become practical and affordable, with many products on the market; the most popular vendors were IBM and Dragon.

With ASR software now widely available, assistive technology has emerged as an application area for it. For people who are deaf or hard of hearing, the accessibility and freedom afforded by a computer that recognizes speech are finally beginning to be realized. Designing a truly usable ASR system of this kind requires an understanding of the available approaches, user requirements, and technology.

Speech recognition software is maturing and has the potential to provide real-time note-taking assistance in the classroom, particularly for deaf and hard-of-hearing students. This research surveys speech recognition in general and reports on a practical, portable, and readily deployed application that provides a cost-effective automatic transcription system, with the goal of making computer science lectures inclusive of deaf and hard-of-hearing students. The design of the system is described, specific technology choices and implementation approaches are discussed, and the results of two phases of an in-class evaluation of the system are analyzed. Ideas for student research projects that could extend and enhance the system are also proposed.

Figure: Nady UHF-3 wireless headset system

Instructor setup:

1. Connect the wireless microphone receiver to the computer and wear the headset and transmitter.

2. Run the VUST program and select a speech profile.

3. Click 'Connect and Start Recognition' to start the VUST server.

Student setup:

1. Connect to the VUST transcription server URL using a web browser.

2. Select an available connection and click "Connect".

3. The transcription is received once the lecture begins.
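The student client on the poster is a web applet whose protocol is not described. As a minimal sketch of the connect-and-receive steps, assuming plain TCP on localhost with one UTF-8 line per transcribed phrase, the server and student sides reduce to the following; `serve_transcript` and `receive_transcript` are hypothetical names, not part of VUST.

```python
import socket
import threading


def serve_transcript(lines, host="127.0.0.1"):
    """Hypothetical server side: accept one student, send each line, then close.

    Returns (thread, port); port 0 asks the OS for any free port.
    """
    server = socket.create_server((host, 0))  # already listening on return
    port = server.getsockname()[1]

    def run():
        with server:
            conn, _addr = server.accept()
            with conn:
                for line in lines:
                    conn.sendall((line + "\n").encode("utf-8"))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, port


def receive_transcript(host, port):
    """Hypothetical student side: connect and read lines until the server closes."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]


if __name__ == "__main__":
    thread, port = serve_transcript(["Good morning.", "Today we cover recursion."])
    print(receive_transcript("127.0.0.1", port))
    thread.join(timeout=5)
```

Line-delimited text keeps the client trivial: it only needs to read until the connection closes, which is roughly what "transcription is received once the lecture begins" asks of the student side.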

• 28 million deaf and hard-of-hearing individuals in the US (around 500 million worldwide)

• Limited benefit from hearing aids and cochlear implants, as these are most useful in face-to-face conversations

• Note-takers and sign language interpreters are expensive to hire and provide limited assistance because they must paraphrase during a lecture

• Developing countries provide no assistance

• Commercial ASR systems are expensive to acquire