A glimpse of voice technology

1 Momentum Confidential January, 2002 A Glimpse of Voice Technology By: Vishad Garg Momentum India Pvt. Ltd. [email protected] [email protected] 91-9611077772 September 12, 2001

description

This presentation covers speech-enabled IVR technology, the basics of IVR, VoiceXML, and how to develop an IVR application.

Transcript of A glimpse of voice technology

Page 1: A glimpse of voice technology

A Glimpse of Voice Technology

By:

Vishad Garg

Momentum India Pvt. Ltd.

[email protected]

[email protected]

91-9611077772

September 12, 2001

Page 2: A glimpse of voice technology

Automated Voice Processing

Voice Portal

Voice XML

Voice Portal at Work

Agenda

Page 3: A glimpse of voice technology

“Automated Voice Processing is the act of answering, routing, and handling phone calls with a computer-based system. The call processing system answers and processes calls according to the needs of the caller and the person and/or company being called.”

Definition

Page 4: A glimpse of voice technology

Interactive Voice Response (IVR)

Voice Mail

Automatic Call Distribution (ACD)

Audiotext

Predictive Dialer

Voice Portal

Applications of Voice Processing

Page 5: A glimpse of voice technology

“IVR systems facilitate people-to-computer/database communications. They automate the handling of calls by interacting with one or more online databases.”

An IVR system works on the premise of:

Data Capture

Information Delivery

Computer Telephony Integration (CTI) Link

Interactive Voice Response (IVR)

Page 6: A glimpse of voice technology

“Voice mail enhances people-to-people communication. Voice mail is an umbrella covering a variety of automated voice processing features including voice mailboxes for storing and forwarding messages, voice menus for routing and responding to calls, recorded announcements for selectively disseminating information, and information access to databases.”

Voice Mail

Page 7: A glimpse of voice technology

“ACD facilitates the distribution of incoming calls, based on a routing algorithm, to a group of people (agents) who can answer them. It uses ANI (Automatic Number Identification) and DNIS (Dialed Number Identification Service) to do so.”

Automatic Call Distribution (ACD)
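
As a rough illustration of the ACD idea, the sketch below uses the DNIS (the number the caller dialed) to choose an agent group and hands the call to the agent who has been idle longest. The class name, phone numbers, group names, and the longest-idle rule are all hypothetical; they are not taken from the slides, and real ACD products offer many other routing algorithms.

```python
from collections import deque


class SimpleACD:
    """Toy ACD: DNIS selects the agent group, ANI identifies the caller."""

    def __init__(self, dnis_to_group):
        self.dnis_to_group = dnis_to_group  # dialed number -> group name
        self.idle = {}                      # group name -> deque of idle agents

    def agent_ready(self, group, agent):
        # Agents join the back of the queue, so the front is the longest idle.
        self.idle.setdefault(group, deque()).append(agent)

    def route(self, ani, dnis):
        group = self.dnis_to_group.get(dnis)
        queue = self.idle.get(group)
        if not queue:
            # Unknown dialed number or no idle agent: a real system would
            # hold the caller in a waiting queue here.
            return None
        agent = queue.popleft()
        return {"caller": ani, "group": group, "agent": agent}


# Hypothetical numbers and agents, for illustration only.
acd = SimpleACD({"8005550100": "sales", "8005550199": "support"})
acd.agent_ready("sales", "alice")
acd.agent_ready("sales", "bob")
print(acd.route(ani="4155550123", dnis="8005550100"))
```

The ANI is only carried along here; a fuller system might also use it to look up the caller's account before the agent answers.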

Page 8: A glimpse of voice technology

“Audiotext is a service that allows callers to access prerecorded information on a topic of interest to them. It allows multiple callers to retrieve recorded announcements containing information that would otherwise have been given by a person. The information retrieved is general and not specific to each caller.”

Audiotext

Page 9: A glimpse of voice technology

“A predictive dialer launches outbound calls and monitors their progress. Only connected calls are passed to agents.”

Predictive Dialer

Page 10: A glimpse of voice technology

Automated Voice Processing

Voice Portal

Voice XML

Voice Portal at Work

Agenda

Page 11: A glimpse of voice technology

“The convergence of the richness of the web and the accessibility of the phone is forming a vast new network - a voice portal, where internet content can be accessed from any phone, anywhere, using human voice”.

“Speech enabled access to web-based information”.

Definition

Page 12: A glimpse of voice technology

“Leverages the Internet for application development and delivery.”

Phone instead of PC

VoiceXML instead of HTML

A voice browser instead of an ordinary web browser

Voice Portal vs. Web Portal

Page 13: A glimpse of voice technology

A standard language enables portability.

A high-level, domain-specific language simplifies application development.

Voice and web applications can be consolidated.

The cost of creating a speech-based portal platform continues to decline.

The Internet has raised public expectations: people are used to having information at their fingertips when they want it. Once people are accustomed to getting immediate news, weather reports, or stock quotes over the Internet, the transition to the phone makes perfect sense.

Why bring the internet to voice applications?

Page 14: A glimpse of voice technology

Automatic Speech Recognition (ASR)

Voice Browser

Text-To-Speech (TTS)

VoiceXML

Voice Portal Key Components

Page 15: A glimpse of voice technology

Automatic Speech Recognition (ASR) is the technology that allows a machine to understand human speech.

It takes human speech input, digitizes it, and converts it into a machine-readable string of text.

A component called a recognizer converts the digitized input into a form it can use to identify what the speaker said.

Automatic Speech Recognition

Page 16: A glimpse of voice technology

Implementation Platform

VXML Browser

Document Server: A document server processes requests from a client application, the VoiceXML interpreter. The server produces VoiceXML documents in reply, which the VoiceXML interpreter then processes.

The VoiceXML interpreter is responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call.

Voice Browser/Interpreter
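
The document-server role described on this slide can be sketched as a small WSGI application that answers every request from the interpreter with a VoiceXML document. This is a minimal sketch under stated assumptions: the function names and the prompt wording are hypothetical, and the document is written against VoiceXML 2.0 (the slides do not say which version was used).

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace (assumed version)


def build_vxml(prompt_text):
    """Return a minimal VoiceXML document that speaks one prompt."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = prompt_text
    return ET.tostring(vxml, encoding="unicode")


def app(environ, start_response):
    """WSGI document server: each request gets a VoiceXML document in reply."""
    path = environ.get("PATH_INFO", "/")
    body = build_vxml(f"Welcome. You requested {path}").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


# To try it locally, one option is the standard library's development server:
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 8000, app).serve_forever()
```

A voice browser pointed at such a server would fetch the document, render the prompt through TTS, and then request the next document, in the same way a web browser fetches HTML pages.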

Page 17: A glimpse of voice technology

TTS converts text-string input into spoken output.

TTS is increasingly being used to speak e-mail and Web-based text to callers.

Text-To-Speech (TTS)

Page 18: A glimpse of voice technology

Automated Voice Processing

Voice Portal

Voice XML

Voice Portal at Work

Agenda

Page 19: A glimpse of voice technology

Voice Extensible Markup Language

A language for specifying voice/audio dialogs

Voice dialogs use audio prompts and text-to-speech (TTS) for output; touch-tone keys (DTMF) and automatic speech recognition (ASR) for input.

The main input/output device (initially) is the phone.

What is VXML?

Page 20: A glimpse of voice technology

Bring the full power of web development and content delivery to voice response applications.

Shield authors from low-level programming and platform-specific details.

Enable integration of voice services with data services using the client-server paradigm.

A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform.

Goal of VXML

Page 21: A glimpse of voice technology

Output of synthesized speech

Output of audio files

Recognition of spoken input

Recognition of DTMF input

Recording of spoken input

Telephony features such as call transfer and disconnect

Scope of VXML

Page 22: A glimpse of voice technology

Application

Dialog/Sub-dialog

Session

Grammar

Events

VXML Concepts

Page 23: A glimpse of voice technology

A set of documents sharing the same application root document.

The root document's variables and grammars remain available when transitioning to other documents in the application.

[Diagram: an application root document (Root) shared by documents D1, D2, and D3]

Application
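
One way to picture the shared root document is as a fallback scope: a document sees its own variables first and the root document's variables behind them. The sketch below models only that lookup order; the variable names and values are hypothetical, and it is an analogy rather than how any particular interpreter is implemented.

```python
from collections import ChainMap

# Variables declared in the (hypothetical) application root document.
root_vars = {"company": "Momentum", "max_retries": 3}


def enter_document(document_vars):
    """Scope seen while a document runs: its own variables, then the root's."""
    return ChainMap(document_vars, root_vars)


d1 = enter_document({"city": "Delhi"})
d2 = enter_document({"city": "Mumbai", "max_retries": 1})

print(d1["company"], d1["max_retries"])  # root values are visible from D1
print(d2["company"], d2["max_retries"])  # D2 shadows max_retries locally
```

Moving from D1 to D2 replaces the document-level variables ("city") while the root-level ones stay in place, which is the behaviour the slide describes.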

Page 24: A glimpse of voice technology

A dialog is an interaction with the user: for example, play a prompt or a menu and collect some input.

There are two kinds of dialogs: forms and menus.

A sub-dialog is like a function call.

A sub-dialog can be used, for example, for a database query.

Dialog/Sub-dialog
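
To make the two dialog kinds concrete, the sketch below embeds a small VoiceXML document containing one menu, two forms, and a sub-dialog call, then lists them with Python's XML parser. The document is illustrative only: the dialog ids, prompts, and file names (cities.grxml, lookup_fares.vxml) are hypothetical, and VoiceXML 2.0 syntax is assumed.

```python
import xml.etree.ElementTree as ET

DOC = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>Say flights or hotels.</prompt>
    <choice next="#flights">flights</choice>
    <choice next="#hotels">hotels</choice>
  </menu>
  <form id="flights">
    <field name="city">
      <prompt>Which city are you flying to?</prompt>
      <grammar src="cities.grxml" type="application/srgs+xml"/>
    </field>
    <!-- Works like a function call: pass the city in, get fares back. -->
    <subdialog name="fares" src="lookup_fares.vxml">
      <param name="city" expr="city"/>
    </subdialog>
  </form>
  <form id="hotels">
    <block>Hotel booking is not available yet.</block>
  </form>
</vxml>
"""

NS = {"v": "http://www.w3.org/2001/vxml"}
root = ET.fromstring(DOC)

# Top-level children of <vxml> are the dialogs: strip the namespace from tags.
dialogs = [(el.tag.split("}")[1], el.get("id")) for el in root]
subdialogs = [el.get("src") for el in root.iterfind(".//v:subdialog", NS)]

print(dialogs)
print(subdialogs)
```

The menu only chooses where to go next, while the form collects a value (the city) and then delegates the lookup to a separate document through the sub-dialog.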

Page 25: A glimpse of voice technology

A session begins when the user starts to interact with a VoiceXML interpreter, continues as documents are loaded and processed, and ends when requested by the user, a document, or the interpreter context.

Session

Page 26: A glimpse of voice technology

A grammar is a set of phrases that a caller is expected to say during a dialog in response to a particular prompt.

A grammar can be as simple as “yes” versus “no” or as large as a list of all the names of people living in a city.

A grammar file is a text file with the file extension .grammar.

Grammar
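
The idea of a grammar as a set of expected phrases can be sketched with a plain lookup table that maps each accepted phrase to its meaning. This is a simplified stand-in working on text: the phrase list and function name are hypothetical, and a real recognizer matches audio against grammars written in a dedicated grammar format.

```python
# Hypothetical yes/no grammar: accepted phrase -> meaning returned to the dialog.
YES_NO_GRAMMAR = {
    "yes": "yes",
    "yeah": "yes",
    "sure": "yes",
    "no": "no",
    "nope": "no",
    "no thanks": "no",
}


def interpret(utterance, grammar):
    """Return the grammar's meaning for an utterance, or None if out of grammar."""
    phrase = " ".join(utterance.lower().split())  # normalize case and spacing
    return grammar.get(phrase)


print(interpret("Yeah", YES_NO_GRAMMAR))
print(interpret("No  thanks", YES_NO_GRAMMAR))
print(interpret("maybe later", YES_NO_GRAMMAR))
```

An utterance outside the grammar returns None, which corresponds to the "response not recognized" case that the dialog has to handle.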

Page 27: A glimpse of voice technology

VXML defines a mechanism for handling events not covered by the form mechanism.

Events are thrown by the platform under a variety of circumstances: the user does not respond, the response is not recognized, the user asks for help, and so on.

Events are caught by catch elements.

Events
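
The throw-and-catch behaviour can be sketched as a lookup from event names to handlers. The sketch assumes the standard VoiceXML event names noinput, nomatch, help, and error.badfetch, and it picks the handler with the longest matching dot-separated prefix; that rule is a simplification, since the selection rules in VoiceXML also consider where the catch element is declared. The handler messages are hypothetical.

```python
def find_handler(event, handlers):
    """Pick the handler whose name is the longest dot-separated prefix of event."""
    parts = event.split(".")
    for end in range(len(parts), 0, -1):
        name = ".".join(parts[:end])
        if name in handlers:
            return handlers[name]
    return None  # nothing catches this event


# Hypothetical catch handlers, each returning the prompt to play.
handlers = {
    "noinput": lambda: "Sorry, I did not hear you.",
    "nomatch": lambda: "Sorry, I did not understand.",
    "help": lambda: "You can say flights or hotels.",
    "error": lambda: "Something went wrong. Goodbye.",
}

for event in ["noinput", "help", "error.badfetch"]:
    print(event, "->", find_handler(event, handlers)())
```

Here the general "error" handler also catches the more specific "error.badfetch" event, so an application does not need a separate handler for every possible failure.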

Page 28: A glimpse of voice technology

Automated Voice Processing

Voice Portal

Voice XML

Voice Portal at Work

Agenda

Page 29: A glimpse of voice technology

Momentum provides voice portal development services using the latest speech-recognition and text-to-speech technology, including Nuance, SpeechWorks, and Fonix.

Momentum Voice Portal Development Services

Page 30: A glimpse of voice technology

Requirement Analysis

Prototype VUI Design Application Development

Testing Deployment

How We Do It?

Page 31: A glimpse of voice technology

Momentum has developed a voice portal demo application, the Momentum Travel Voice Portal (MTVP). MTVP provides a voice user interface for reserving and purchasing travel packages.

Momentum Travel Voice Portal

Page 32: A glimpse of voice technology

Momentum uses the complete suite of Nuance voice technology, which includes:

Nuance 7.0.3 for voice recognition, call control, and prompt recording.

V-Builder for developing the voice user interface (VUI) that defines the flow of interaction.

Grammar-Builder for writing grammars that represent valid responses.

Nuance Speech Objects: a set of reusable components implemented as Java beans.

VXML is used as the development language for the VUI.

Nuance in MTVP

Page 33: A glimpse of voice technology

To try the MTVP demo, dial either of the following phone numbers in the US:

(800) 303-9987

(415) 869-6909

When the system asks you to enter a PIN, enter one of the following: 823272, 823273, or 823274.

Demo

Page 34: A glimpse of voice technology

We also plan to develop voice-driven e-commerce applications (V-Commerce), a voice-enabled intranet, and unified messaging.

Future Plans