Voice recognition

VOICE RECOGNITION

Department of Computer Engineering

2007152025

Yoseop Shin

Voice recognition (also known as automatic speech recognition) converts ⋯⋯ using the binary code for a string of character codes).

Introduction to Voice Recognition

Voice Recognition

▪ Recognize what is being said

▪ Identify the person speaking

The market growth for voice recognition

[Chart: sales (in billion), vertical axis 0 to 3, horizontal axis 2004 to 2009]

▪ The market for voice-recognition technology topped $1 billion for the first time in 2006.

▪ 100 percent increase in just two years (2006-2008).

▪ The market for server-based voice-recognition technology to power call centers and the like reached nearly $600 million in 2006 and is expected to double by 2009.

Distributed Speech Recognition

▪ Commonly used in mobile devices

▪ e.g. Motorola, Google iPhone

[Diagram: conventional speech recognition versus DSR. Labels: Speech Coder, Speech Decoder, ISDN, ASR Front-end, ASR Decoder, Pitch Analysis, Noise Reduction, Contents server]

Hidden Markov Model (HMM)

▪ Modern general-purpose speech recognition systems are generally based on HMMs.

▪ Statistical model.

▪ The most popular statistical model in natural language processing.

▪ Can be trained automatically; simple and computationally feasible to use.
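
As a rough illustration of why HMMs are computationally feasible, the sketch below decodes the most likely hidden-state sequence with the Viterbi algorithm. The two states ("sil", "speech"), the two observation symbols ("low", "high" energy) and all probabilities are hypothetical toy values, not taken from any real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor state that maximizes the path probability.
            p, r = max((prev[r][0] * trans_p[r][s], r) for r in states)
            column[s] = (p * emit_p[s][obs], r)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical toy model: silence vs. speech, observed as low/high frame energy.
STATES = ["sil", "speech"]
START_P = {"sil": 0.8, "speech": 0.2}
TRANS_P = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
EMIT_P = {"sil": {"low": 0.9, "high": 0.1},
          "speech": {"low": 0.2, "high": 0.8}}

if __name__ == "__main__":
    print(viterbi(["low", "high", "high", "low"], STATES, START_P, TRANS_P, EMIT_P))
```

The work grows linearly with the number of observations and quadratically with the number of states, which is part of what makes HMM decoding practical.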

Artificial neural network

▪ A computational model based on biological neural networks.

▪ Learns non-linear relationships directly from training data.
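
The point about non-linear relations can be illustrated with XOR, which no single linear threshold unit can represent but a network with one hidden layer can. The sketch below is only an illustration: the weights are hand-picked rather than trained, and the function names are hypothetical.

```python
def step(z):
    """Threshold activation: the non-linearity of each unit."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two hidden units (OR and AND) feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output fires for "OR but not AND", which is XOR.
    return step(h_or - h_and - 0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_network(a, b))
```

In a real system the weights would be found by a training procedure instead of being set by hand.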

Performance of Speech Recognition

▪ Isolated Word Recognition (approx. 95~97%)

- High accuracy

- Very limited vocabulary

- Short commands or simple control

▪ Continuous Word Recognition (approx. 85~90%)

- Lower accuracy

- Higher than 95% with a small vocabulary (1,000 ~ 3,000 words)

Performance of Speech Recognition

Researcher   Feature                       DB         Words    Accuracy
IBM          Isolated Word Recognition     English    20,000   95.0%
NEC          Isolated Word Recognition     Japanese   1,800    97.5%
ATR          Continuous Word Recognition   Japanese   1,035    95.3%
SRI          Continuous Word Recognition   English    1,000    95.2%
CMU          Continuous Word Recognition   ATIS       3,000    95.0%
Ney          Continuous Word Recognition   NAB’94     20,000   84.6%
Cambridge    Continuous Word Recognition   HUB4       32,800   83.8%
KAIST        Continuous Word Recognition   Korean     3,064    96.7%
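
The slides do not say how these accuracy figures were measured. One common convention, assumed here, is word accuracy = 1 - (substitutions + deletions + insertions) / (number of reference words), where the error count is the word-level edit distance. A minimal sketch with hypothetical function names and a made-up example sentence:

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1], len(ref)


def word_accuracy(reference, hypothesis):
    """Word accuracy under the assumed definition; can be negative with many insertions."""
    errors, n = word_errors(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - errors / n


if __name__ == "__main__":
    # One substitution ("on" -> "of") and one deletion ("now"): 2 errors in 5 words.
    print(word_accuracy("turn on the radio now", "turn of the radio"))
```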

Optimal conditions

▪ have speech characteristics which match the training data.

▪ can achieve proper speaker adaptation.

▪ work in a low-noise environment (e.g. quiet office or laboratory space).

Applications – Medical Transcription

▪ MT (Medical Transcription)

- Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.

Applications – People with Disabilities

▪ Deaf Telephony

▪ Voice To Text

▪ Captioned Telephone

▪ Operating a mouse by mouth

Prof. Sang-Mook Lee, School of Earth and Environmental Science, Seoul National University

Perl Scripting with Windows Vista

Applications – Further Applications

▪ Automatic Translation

▪ Automotive Speech Recognition (e.g. Ford Sync)

▪ Telematics (e.g. vehicle Navigation Systems)

▪ Court reporting

▪ Hands-free computing: voice command recognition as a computer user interface

▪ Mobile telephony, including mobile email

▪ Transcription (digital speech-to-text)

▪ Air Traffic Control Speech Recognition

Tom Clancy’s ENDWAR

Speech Recognition Software

▪ Korea

VoiceTech ByVoice 2.0

▪ U.S.

Dragon NaturallySpeaking

THANK YOU

FOR YOUR ATTENTION!

Questions? Contact me : [email protected]
