A Glimpse of Voice Technology
By: Vishad Garg
Momentum India Pvt. Ltd.
91-9611077772
September 12, 2001
Momentum Confidential, January 2002

Agenda
Automated Voice Processing
Voice Portal
VoiceXML
Voice Portal at Work

Definition
“Automated Voice Processing is the act of answering, routing, and handling phone calls with a computer-based system. The call processing system answers and processes calls according to the needs of the caller and the person and/or company being called.”

Applications of Voice Processing
Interactive Voice Response (IVR)
Voice Mail
Automatic Call Distribution (ACD)
Audiotext
Predictive Dialer
Voice Portal

Interactive Voice Response (IVR)
“IVR systems facilitate people-to-computer/database communications. It automates the handling of calls by interacting with one or more online databases.”
An IVR system works on the premise of:
Data Capture
Information Delivery
Computer Telephony Integration (CTI) Link
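The data capture and information delivery steps can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the account table stands in for the online database, the `#` key is assumed to end keypad entry, and the returned string is what a real system would speak to the caller.

```python
# Hypothetical in-memory stand-in for the online database an IVR would query.
ACCOUNT_BALANCES = {"1001": 250.75, "1002": 0.0}


def handle_call(dtmf_digits: str) -> str:
    """Turn keyed-in digits (data capture) into a reply text (information delivery)."""
    account = dtmf_digits.rstrip("#")  # assumption: '#' terminates keypad entry
    if account not in ACCOUNT_BALANCES:
        return "Sorry, that account number was not found."
    return f"Your balance is {ACCOUNT_BALANCES[account]:.2f}."


if __name__ == "__main__":
    print(handle_call("1001#"))
```

The CTI link is left out of this sketch; it would carry the call and the captured data between the telephone switch and the computer.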

Voice Mail
“Voice mail enhances people-to-people communication. Voice mail is an umbrella covering a variety of automated voice processing features including voice mailboxes for storing and forwarding messages, voice menus for routing and responding to calls, recorded announcements for selectively disseminating information, and information access to databases.”

Automatic Call Distribution (ACD)
“ACD facilitates the distribution of incoming calls, based on some algorithm, to a group of people (agents) who can answer the calls. It uses ANI and DNIS to do this.”
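One possible distribution algorithm is sketched below. ANI (Automatic Number Identification) is the caller's number and DNIS (Dialed Number Identification Service) is the number that was dialed. The routing rule, the lookup tables, the agent names, and the phone numbers are all hypothetical: DNIS selects the agent group, ANI selects a preferred agent if one is free, and otherwise the longest-idle agent takes the call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Agent:
    name: str
    idle_since: float  # time at which the agent last became free


# Hypothetical mapping from the dialed number (DNIS) to an agent group.
DNIS_TO_GROUP = {"8005550100": "sales", "8005550199": "support"}

# Hypothetical mapping from the caller's number (ANI) to a preferred agent.
ANI_TO_PREFERRED = {"4155550123": "Asha"}


def route_call(ani: str, dnis: str, groups: Dict[str, List[Agent]]) -> Optional[Agent]:
    """Pick a free agent for an incoming call, or None if the group has none."""
    free_agents = groups.get(DNIS_TO_GROUP.get(dnis, "support"), [])
    if not free_agents:
        return None
    preferred = ANI_TO_PREFERRED.get(ani)
    for agent in free_agents:
        if agent.name == preferred:
            chosen = agent
            break
    else:
        # Longest-idle rule: the agent who has been free the longest gets the call.
        chosen = min(free_agents, key=lambda a: a.idle_since)
    free_agents.remove(chosen)  # the chosen agent is now busy
    return chosen


if __name__ == "__main__":
    groups = {
        "sales": [Agent("Ravi", 10.0), Agent("Asha", 25.0)],
        "support": [Agent("Meera", 5.0)],
    }
    print(route_call("4155550123", "8005550100", groups).name)
    print(route_call("2125550111", "8005550100", groups).name)
```

Real ACD products support many other rules (skills-based, priority queues, overflow); longest-idle is used here only because it is short to show.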

Audiotext
“Audiotext is a service that allows callers to access prerecorded information on a topic of interest to them. It allows multiple callers to retrieve recorded announcements containing information that would otherwise have been given by a person. The information retrieved is general and not specific to each caller.”

Predictive Dialer
“Predictive Dialer facilitates the launching of calls and monitors their progress. Only connected calls are passed to agents.”
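The filtering step can be sketched as follows. The `place_call` callback and the outcome values are assumptions standing in for real telephony signalling, and the pacing logic that a real predictive dialer uses to match call volume to agent availability is not modelled.

```python
from enum import Enum
from typing import Callable, Iterable, List


class Outcome(Enum):
    CONNECTED = "connected"
    BUSY = "busy"
    NO_ANSWER = "no answer"


def run_campaign(numbers: Iterable[str], place_call: Callable[[str], Outcome]) -> List[str]:
    """Launch a call to every number and return only the ones that connected."""
    connected = []
    for number in numbers:
        outcome = place_call(number)  # monitor the progress of the launched call
        if outcome is Outcome.CONNECTED:
            connected.append(number)  # only these would be handed to an agent
    return connected


if __name__ == "__main__":
    # Hypothetical call results keyed by phone number.
    results = {
        "5550101": Outcome.BUSY,
        "5550102": Outcome.CONNECTED,
        "5550103": Outcome.NO_ANSWER,
    }
    print(run_campaign(results, results.__getitem__))
```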

Definition
“The convergence of the richness of the web and the accessibility of the phone is forming a vast new network - a voice portal, where internet content can be accessed from any phone, anywhere, using human voice.”
“Speech-enabled access to web-based information.”

Voice Portal vs. Web Portal
“Leverages the Internet for application development and delivery.”
Phone instead of PC
VoiceXML instead of HTML
A voice browser instead of an ordinary web browser

Why bring the Internet to voice applications?
A standard language enables portability.
A high-level, domain-specific language simplifies application development.
Voice and web applications can be consolidated.
The cost of creating a speech-based portal platform continues to decline.
The Internet has raised public expectations: people have grown used to having information at their fingertips when they want it. Once people are accustomed to immediate news, weather reports, or stock quotes over the Internet, the transition to the phone makes perfect sense.

Voice Portal Key Components
Automatic Speech Recognition (ASR)
Voice Browser
Text-To-Speech
VoiceXML

Automatic Speech Recognition
Automatic Speech Recognition (ASR) is the technology that allows a machine to understand human speech.
It takes human speech as input, digitizes it, and converts it into a machine-readable string of text.
A component called a recognizer then manipulates that text into a form it can use to identify what the speaker said.

Voice Browser/Interpreter
Implementation Platform
VXML Browser
Document Server: A document server processes requests from a client application, the VoiceXML interpreter. The server produces a VXML document in reply, which the VoiceXML interpreter then processes.
VoiceXML Interpreter: The VoiceXML interpreter context is responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call; the interpreter then conducts the dialog.
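The document server's side of this exchange can be sketched as a function that maps a request path to a VoiceXML reply. The paths and page texts are hypothetical, and the HTTP layer that would normally wrap such a function is omitted to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical content the server can deliver, keyed by request path.
PAGES = {"/weather": "Today is sunny & warm.", "/news": "No news is good news."}


def handle_request(path: str) -> str:
    """Return a VoiceXML document for the requested path."""
    text = PAGES.get(path, "Sorry, that page is not available.")
    return (
        '<?xml version="1.0"?>\n'
        f'<vxml version="2.0" xmlns="{VXML_NS}">\n'
        "  <form>\n"
        f"    <block><prompt>{escape(text)}</prompt></block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


if __name__ == "__main__":
    document = handle_request("/weather")
    ET.fromstring(document)  # raises ParseError if the reply is not well-formed XML
    print(document)
```

The prompt text is escaped before it is placed in the document, so characters such as `&` in the source content do not break the XML that the interpreter receives.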

Text-To-Speech (TTS)
TTS converts text-string input into spoken output.
TTS is increasingly being used to speak e-mail and Web-based text to callers.

What is VXML?
Voice Extensible Markup Language
A language for specifying voice/audio dialogs
Voice dialogs use audio prompts and text-to-speech (TTS) for output, and touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.
The main input/output device (initially) is the phone.
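A small example may make this concrete. The VoiceXML 2.0 style document below is embedded in a Python string so it can be parsed and inspected; the form id, field name, and prompt wording are illustrative. The prompt would be rendered by TTS, and a field of the built-in type `digits` accepts either spoken digits or DTMF key presses.

```python
import xml.etree.ElementTree as ET

NS = {"v": "http://www.w3.org/2001/vxml"}

# Illustrative VoiceXML document: one form with one field and its prompts.
PIN_DIALOG = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <field name="pin" type="digits">
      <prompt>Please say or key in your PIN.</prompt>
      <filled>
        <prompt>Thank you.</prompt>
      </filled>
    </field>
  </form>
</vxml>"""


def describe_fields(document: str) -> list:
    """List (field name, input type, prompt text) for every field in the document."""
    root = ET.fromstring(document)
    return [
        (field.get("name"), field.get("type"), field.findtext("v:prompt", namespaces=NS))
        for field in root.findall("v:form/v:field", NS)
    ]


if __name__ == "__main__":
    print(describe_fields(PIN_DIALOG))
```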

Goal of VXML
Bring the full power of web development and content delivery to voice response applications.
Shield authors from low-level programming and platform-specific details.
Enable integration of voice services with data services using the client-server paradigm.
A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform.

Scope of VXML
Output of synthesized speech
Output of audio files
Recognition of spoken input
Recognition of DTMF input
Recording of spoken input
Telephony features such as call transfer and disconnect

VXML Concepts
Application
Dialog/Sub-dialog
Session
Grammar
Events

Application
A set of documents sharing the same application root document.
The root document's variables and grammars remain available when transitioning to another document in the application.
[Figure: an application root document shared by documents D1, D2, and D3]
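The sharing of root-document variables can be sketched as a scope lookup. A document names its root through the `application` attribute; the file name `root.vxml`, the variable names, and their values are hypothetical, and the `ChainMap` is only a simplified model of the interpreter's scope chain, in which a document's own variables are consulted before the root's.

```python
import xml.etree.ElementTree as ET
from collections import ChainMap

NS = {"v": "http://www.w3.org/2001/vxml"}

ROOT_DOC = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <var name="company" expr="'Momentum'"/>
</vxml>"""

# A document in the application; it names its root with the application attribute.
LEAF_DOC = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" application="root.vxml">
  <var name="city" expr="'Delhi'"/>
</vxml>"""


def declared_vars(document: str) -> dict:
    """Map each document-level variable name to its initial expression."""
    root = ET.fromstring(document)
    return {var.get("name"): var.get("expr") for var in root.findall("v:var", NS)}


def application_root(document: str):
    """Return the URI of the application root document, or None if there is none."""
    return ET.fromstring(document).get("application")


if __name__ == "__main__":
    # Lookups try the current document first, then fall back to the root document.
    scope = ChainMap(declared_vars(LEAF_DOC), declared_vars(ROOT_DOC))
    print(application_root(LEAF_DOC), scope["city"], scope["company"])
```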

Dialog/Sub-dialog
A dialog is an interaction with the user, for example prompting with a menu and collecting input.
There are two kinds of dialogs: forms and menus.
A sub-dialog is like a function call.
A sub-dialog can be used, for example, for a database query.
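The two dialog kinds and a sub-dialog call can be seen side by side in a short illustrative document. The dialog ids, prompt wording, and the `lookup.vxml` sub-dialog target are hypothetical; the Python around the document only parses it and lists what it contains.

```python
import xml.etree.ElementTree as ET

VXML = "{http://www.w3.org/2001/vxml}"

# Illustrative document: one menu that branches to two forms.
TRAVEL_DOC = """<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say book or cancel.</prompt>
    <choice next="#booking">book</choice>
    <choice next="#cancel">cancel</choice>
  </menu>
  <form id="booking">
    <subdialog name="lookup" src="lookup.vxml"/>
    <block><prompt>Your booking is confirmed.</prompt></block>
  </form>
  <form id="cancel">
    <block><prompt>Your booking is cancelled.</prompt></block>
  </form>
</vxml>"""


def list_dialogs(document: str) -> list:
    """Return (kind, id) for each top-level dialog, in document order."""
    root = ET.fromstring(document)
    return [
        (child.tag[len(VXML):], child.get("id"))
        for child in root
        if child.tag in (VXML + "form", VXML + "menu")
    ]


def subdialog_sources(document: str) -> list:
    """Return the src of every sub-dialog call found in the document."""
    root = ET.fromstring(document)
    return [sub.get("src") for sub in root.iter(VXML + "subdialog")]


if __name__ == "__main__":
    print(list_dialogs(TRAVEL_DOC))
    print(subdialog_sources(TRAVEL_DOC))
```

The sub-dialog here plays the role of the function call: the booking form hands control to another document, which could run the database query and return its result under the name `lookup`.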

Session
A session begins when the user starts to interact with a VoiceXML interpreter, continues as documents are loaded and processed, and ends when requested by the user.

Grammar
A grammar is a set of phrases that a caller is expected to say during a dialog in response to a particular prompt.
A grammar can be as simple as “yes” versus “no” or as large as a list of the names of all the people living in a city.
A grammar file is a text file with the file extension .grammar.
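The idea of a grammar as a set of expected phrases can be sketched with a plain lookup table. This is not the syntax of a .grammar file; the phrases and the mapping to "yes" and "no" are hypothetical, and a real recognizer matches audio against the grammar rather than comparing text strings.

```python
from typing import Dict, Optional

# Hypothetical yes/no grammar: each expected phrase maps to the value it means.
YES_NO_GRAMMAR: Dict[str, str] = {
    "yes": "yes",
    "yeah": "yes",
    "yes please": "yes",
    "no": "no",
    "nope": "no",
    "no thanks": "no",
}


def interpret(utterance: str, grammar: Dict[str, str]) -> Optional[str]:
    """Return the meaning of a recognized phrase, or None if it is out of grammar."""
    normalized = " ".join(utterance.lower().split())
    return grammar.get(normalized)


if __name__ == "__main__":
    print(interpret("  Yes   please ", YES_NO_GRAMMAR))
    print(interpret("maybe", YES_NO_GRAMMAR))
```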

Events
VXML defines a mechanism for handling events not covered by the form mechanism.
Events are thrown by the platform under a variety of circumstances, such as when the user does not respond, when a response is not recognized, or when the user asks for help.
Events are caught by catch elements.
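The throw-and-catch idea can be sketched as a small dispatcher. This is a simplification under stated assumptions: handlers are keyed by event name, a name also catches dotted sub-events (so `error` catches `error.badfetch`), and the most specific name wins. The selection rules in an actual interpreter also take scope and document order into account, and the prompt texts are hypothetical.

```python
from typing import Callable, Dict


def matches(catch_name: str, event: str) -> bool:
    """A catch name matches the same event or any dotted sub-event of it."""
    return event == catch_name or event.startswith(catch_name + ".")


def dispatch(event: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Run the handler with the most specific (longest) matching name."""
    candidates = [name for name in handlers if matches(name, event)]
    if not candidates:
        raise LookupError(f"uncaught event: {event}")
    return handlers[max(candidates, key=len)]()


# Hypothetical prompts for the circumstances listed above, plus a generic error.
HANDLERS = {
    "noinput": lambda: "Sorry, I did not hear you.",
    "nomatch": lambda: "Sorry, I did not understand.",
    "help": lambda: "You can say book or cancel.",
    "error": lambda: "Something went wrong. Goodbye.",
}

if __name__ == "__main__":
    print(dispatch("noinput", HANDLERS))
    print(dispatch("error.badfetch", HANDLERS))
```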

Momentum Voice Portal Development Services
Momentum provides voice portal development services using the latest and preeminent speech-recognition and text-to-speech technology, including Nuance, Speechworks, and Fonix.

How We Do It
Requirement Analysis
Prototype
VUI Design
Application Development
Testing
Deployment

Momentum Travel Voice Portal
Momentum has developed a voice portal demo application, the Momentum Travel Voice Portal (MTVP). The MTVP provides a voice user interface for purchasing and reserving travel packages.

Nuance in MTVP
Momentum is using the complete suite of Nuance voice technology, which includes:
Nuance 7.0.3 for voice recognition, call control, and recording of prompts.
V-Builder for developing the voice user interface (VUI) that defines the flow of interaction.
Grammar-Builder for writing grammars that represent valid responses.
Nuance Speech Objects, a set of reusable components implemented as Java beans.
VXML is used as the development language for the VUI.

Demo
To try the MTVP demo, dial either of the following phone numbers in the US:
(800) 303-9987
(415) 869-6909
When the system asks you to enter a PIN, you can dial one of the following PINs: 823272, 823273, or 823274.

Future Plans
We are also planning to embark upon voice-driven e-commerce applications (i.e., V-Commerce), Voice Enabled Intranet, and Unified Messaging.