Speech Recognition, Digitization, Generation


Speech technology

For designers of human/computer interaction systems, speech and audio technologies have at least five variations:

- Discrete-word recognition,
- Continuous-speech recognition,
- Voice information systems,
- Speech generation, and
- Non-speech auditory interfaces.

Discrete-word recognition

Discrete-word recognition devices recognize individual words spoken by a specific person.

They can work with 90% to 98% reliability for vocabularies of 100 to 1,000 words or larger.

Many systems include speaker-dependent training, in which users repeat the full vocabulary once or twice. Such training yields higher accuracy than speaker-independent systems achieve, but eliminating training expands the scope of commercial applications.
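The template idea behind speaker-dependent training can be sketched in a toy form: each repetition of a vocabulary word stores a feature "template," and recognition picks the word whose template is closest under dynamic time warping (DTW). This is an illustrative assumption, not a description of any commercial recognizer; real systems work on acoustic features such as MFCCs, not the tiny hand-made contours used here.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

class DiscreteWordRecognizer:
    """Toy speaker-dependent recognizer: nearest stored template wins."""

    def __init__(self):
        self.templates = {}  # word -> list of training sequences

    def train(self, word, sequence):
        # The speaker repeats each vocabulary word; keep every repetition.
        self.templates.setdefault(word, []).append(sequence)

    def recognize(self, sequence):
        # Return the word whose closest template has the lowest DTW distance.
        return min(
            self.templates,
            key=lambda w: min(dtw_distance(sequence, t) for t in self.templates[w]),
        )

rec = DiscreteWordRecognizer()
rec.train("yes", [1, 3, 5, 3, 1])      # hypothetical feature contours
rec.train("no",  [5, 4, 2, 1, 1])
print(rec.recognize([1, 3, 4, 3, 1]))  # closest to the "yes" template
```

DTW lets the match tolerate the variable speaking rates mentioned below, since the alignment can stretch or compress one sequence against the other.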

Quiet environments, head-mounted microphones, and careful choice of vocabularies improve recognition rates.

Telephone companies offer voice-dialing services, even on cell phones, that allow users simply to say "Call Mom" and be connected.

Phone-based recognition of numbers, yes/no answers, and selections from voice menus is successful and increasingly applied.

However, full-sentence commands such as "Reserve two seats on the first flight tomorrow from New York to Washington" are just moving from a research challenge to commercial use.

Current research projects are devoted to improving recognition rates in difficult conditions and eliminating the need for speaker-dependent training.

Speech recognition for discrete words works well for special-purpose applications, but it does not serve as a general interaction medium.

Continuous-speech recognition

Continuous-speech-recognition systems enable users to dictate letters and compose reports verbally for automatic transcription.

Review, correction, and revision are usually accomplished with keyboards and displays.

Users need practice in dictation and seem to do best with speech input when preparing standard reports.

Continuous-speech-recognition systems also enable automatic scanning and retrieval from radio or television programs, court proceedings, lectures, or telephone calls for specific words or topics.

Difficulties in implementation

A major difficulty for software designers is recognizing the boundaries between spoken words, because normal speech patterns blur the boundaries.
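The boundary problem can be made concrete with a toy analogy: given a stream of symbols with no separators (like connected speech with no pauses), find where the word boundaries fall. A simple dictionary-driven sketch, assuming a small hypothetical vocabulary, recovers one segmentation when it exists:

```python
def segment(stream, vocabulary):
    """Return one segmentation of a boundary-free stream into words, or None."""
    best = {0: []}  # length of a segmentable prefix -> its word list
    for end in range(1, len(stream) + 1):
        for start in sorted(best):
            # Extend any already-segmentable prefix by one dictionary word.
            if start < end and stream[start:end] in vocabulary:
                best.setdefault(end, best[start] + [stream[start:end]])
                break
    return best.get(len(stream))

vocab = {"the", "there", "rain", "in", "spain"}
print(segment("theraininspain", vocab))  # ['the', 'rain', 'in', 'spain']
```

Even this toy version shows why the problem is hard: "there" is a valid prefix too, so a recognizer must consider multiple competing boundary hypotheses, and real acoustic input adds noise, accents, and coarticulation on top.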

Other problems are diverse accents, variable speaking rates, disruptive background noise, and changing emotional intonation.

The most difficult problem is matching the semantic interpretation and contextual understanding that humans apply easily to predict and disambiguate words.

Voice information systems

Stored speech is commonly used to provide telephone-based information about tourist sites and government services, and for after-hours messages from organizations.

These voice information systems, often called interactive voice response (IVR) systems, can provide good customer service at minimum cost if proper development methods and metrics are used.

Voice prompts guide users, who can press keys to check on airline flight departure or arrival times and similar information.
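The prompt-and-keypress structure of such a system can be sketched in a few lines; the menu entries and wording here are hypothetical, not taken from any real IVR deployment:

```python
# Hypothetical IVR menu: each telephone keypad key maps to an action.
MENU = {
    "1": "Flight departure times",
    "2": "Flight arrival times",
    "3": "Speak to an agent",
}

def prompt():
    """Build the spoken prompt that guides the caller."""
    return "Press " + ", ".join(f"{k} for {v}" for k, v in MENU.items()) + "."

def handle_keypress(key):
    """Map a keypad press to the selected action, with a fallback prompt."""
    return MENU.get(key, "Sorry, that is not a valid option.")

print(prompt())
print(handle_keypress("1"))  # Flight departure times
```

The same table-driven pattern nests: each selection can lead to a sub-menu, which is how deep IVR trees are typically structured.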

Voice information technologies are also used in popular personal voicemail systems.

Speech generation

Speech generation is a successful technology with widespread application in consumer products and on telephones.

When algorithms are used to generate the sound (synthesis), the intonation may sound robot-like and distracting. The quality of the sound improves when phonemes, words, and phrases from digitized human speech can be smoothly integrated into meaningful sentences.
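The "smooth integration" of digitized segments can be illustrated with a minimal concatenation sketch: join two recorded segments with a short linear crossfade so the seam does not click. The sample values below are made-up placeholders, not real audio.

```python
def crossfade_concat(a, b, overlap):
    """Join two sample lists, blending `overlap` samples at the seam."""
    out = a[:len(a) - overlap]
    for i in range(overlap):
        w = i / overlap  # fade weight rises from 0.0 toward 1.0
        out.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    out.extend(b[overlap:])
    return out

# Hypothetical digitized segments (real systems use recorded waveforms).
hello = [0.0, 0.5, 0.9, 0.5, 0.2]
world = [0.3, 0.8, 1.0, 0.4, 0.0]
utterance = crossfade_concat(hello, world, overlap=2)
print(len(utterance))  # 5 + 5 - 2 = 8 samples
```

Production concatenative synthesizers do far more (pitch and duration adjustment, unit selection from large inventories), but the crossfaded join is the basic operation that hides segment boundaries.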

Text-to-speech utilities, such as the built-in Microsoft Windows Narrator, can be used to read passages of text in web browsers and word processors.

Speech generation and digitized speech segments are usually preferable when:

- the messages are simple and short, deal with events in time, or require an immediate response;
- users' visual channels are overloaded;
- users must be free to move around; or
- the environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for visual displays.

Non-speech auditory interfaces

Auditory outputs include individual audio tones and more complex information presentation by combinations of sound and music.

Early Teletypes included a bell tone to alert users that a message was coming or that paper had run out. Later computer systems added a range of tones to indicate warnings or to acknowledge the completion of an action.

Auditory icons, such as a door opening, liquid pouring, or a ball bouncing, help reinforce the visual metaphors in a graphical user interface or the product concepts for a toy.

Game designers know that sounds can add realism, heighten tension, and engage users in powerful ways.

Research continues on auditory methods for emphasizing the distributions of data in information visualization or drawing attention to patterns, outliers, and clusters.

Auditory web browsers for blind users or telephonic usage have been developed. Users can hear text and link labels, and then make selections by key entry.

Auditory file browsers continue to be refined: each file might have a sound whose frequency is related to its size and might be assigned an instrument.

When the directory is opened, each file might play its sound simultaneously or sequentially. Alternatively, files might have sounds associated with their file types, so that users can hear whether there are spreadsheet, graphic, or text files.

The potential for novel musical instruments seems especially attractive.
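The file-to-sound mapping described above might be sketched as follows; the frequency formula and the extension-to-instrument table are arbitrary assumptions chosen only to make the idea concrete:

```python
# Hypothetical extension-to-instrument assignments.
INSTRUMENTS = {".xls": "marimba", ".png": "flute", ".txt": "piano"}

def file_sound(name, size_bytes):
    """Map a file to a (frequency in Hz, instrument name) pair."""
    # Larger files -> lower pitch, clamped to stay in an audible range.
    freq = max(110, 1760 - size_bytes // 1024)
    ext = name[name.rfind("."):] if "." in name else ""
    return freq, INSTRUMENTS.get(ext, "bell")

print(file_sound("report.txt", 10 * 1024))  # (1750, 'piano')
```

Playing these (frequency, instrument) pairs in sequence when a directory opens would let a user "hear" roughly how many large files it holds and of what kinds.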

With touch-sensitive and haptic devices, it is possible to offer appropriate feedback to give musicians an experience similar to a piano keyboard, a drum, or a woodwind or stringed instrument.

It is also possible to invent new instruments whose frequencies, amplitudes, and effects are governed by the placement of the touch, as well as by its direction and speed.

Music composition using computers expanded as Musical Instrument Digital Interface (MIDI) hardware and software became widely available at reasonable prices.
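The messages such MIDI hardware exchanges are small and simple, which is part of why the standard spread: per the MIDI 1.0 specification, a "note on" is a status byte (0x90 plus the channel number), a note number, and a velocity, each 0-127.

```python
def note_on(note, velocity, channel=0):
    """Build a MIDI 1.0 note-on message as three raw bytes."""
    assert 0 <= note <= 127 and 0 <= velocity <= 127 and 0 <= channel <= 15
    return bytes([0x90 | channel, note, velocity])

def note_off(note, channel=0):
    """Note-off: status byte 0x80 plus channel, then note, then velocity 0."""
    return bytes([0x80 | channel, note, 0])

msg = note_on(60, 100)  # middle C, moderately loud
print(msg.hex())        # 903c64
```

A sequencer is, at heart, a timed stream of such messages, which is what made affordable software composition tools practical.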