
VoiceXML Overview, Opportunities & Challenges
Hitesh Kr. Seth, Chief Technology Evangelist, SeraNova, Inc (hitesh.seth@seranova.com)
O’Reilly Conference on Enterprise Java, 2001





Agenda

- Introduction
- History
- Elements
- Developing Voice Portals
- Applications
- Vendor Landscape
- Challenges
- Resources


The Web is Ubiquitous

Key highlights:
- HTTP protocol
- HTML for content, either static or dynamically generated

Usage model:
- Create content/scripts
- Publish them on the web server
- Access them through a web browser



What about Voice?

- Call center and IVR-based products have been around
- IVR applications are usually “DTMF” oriented: interaction happens through the keypad rather than by voice
- Complex infrastructure, involving huge investments in proprietary solutions
- Lack of integration with the Internet
- The ASP model for deployment wasn’t established
- Emergence of sophisticated Text-to-Speech/Voice Recognition solutions




What is VoiceXML?

An XML-based markup language that describes voice and touch-tone interactions, used to develop interactive voice applications.



Application Model



Technical Highlights

- Based on XML 1.0
- Supports:
  - DTMF (touch-tone keys) and voice input: "Press 1 for Email"; "Please say your name"
  - TTS (Text-to-Speech) and pre-recorded audio output
  - Recording of user input
  - Telephony integration, e.g. connecting to a live operator
- Form- and field-level grammars allow directed and (near) natural dialogs:
  - Directed: "Which city would you like to go to?" "San Jose"
  - Natural: "What can I do for you today?" "I would like to travel from San Jose, CA to Newark, NJ on 15 Nov"



Key Benefits

- Brings the ubiquity of the Web to the most ubiquitous access device: an ordinary phone
- Opens up the Web to billions of landline and mobile phones worldwide
- Hands-free communication for automobiles
- Single platform for developing Web and voice applications
- Automated customer service:
  - Can enhance customer satisfaction (immediate response)
  - Lowers costs (fewer customer service reps and less customer waiting time!)
- Can be used even on a flight!



Hello VoiceXML

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>Hello World!</block>
  </form>
</vxml>
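Because VoiceXML is based on XML 1.0, any standard XML parser can catch structural mistakes, such as an unclosed <form>, before a page ever reaches a voice browser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser; the class name VxmlCheck is hypothetical, and it checks well-formedness only, not validity against the VoiceXML DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper: checks that a VoiceXML page is well-formed XML 1.0.
// It does not validate against the VoiceXML DTD.
class VxmlCheck {
    static final String HELLO =
        "<?xml version=\"1.0\"?>\n"
        + "<vxml version=\"1.0\">\n"
        + "  <form>\n"
        + "    <block>Hello World!</block>\n"
        + "  </form>\n"
        + "</vxml>\n";

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so parse errors are not printed to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            // Malformed markup (SAXException), I/O, or parser configuration problems.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed(HELLO));                                   // true
        System.out.println(isWellFormed("<vxml version=\"1.0\"><form></vxml>"));   // false
    }
}
```

Running main prints true for the Hello World page and false for the second document, whose <form> is never closed.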



History

- 3/2/1999: AT&T, IBM, Lucent & Motorola create the VXML Forum (17 members)
- 8/25/1999: VoiceXML 0.9 preliminary spec released (61 members)
- 3/7/2000: VoiceXML 1.0 spec released (79 members)
- 5/22/2000: VoiceXML 1.0 submitted to W3C (150 members)
- Today (10/5/2000), the VoiceXML Forum has 281 members



Earlier Works

- SpeechML by IBM
- VoxML by Motorola
- PhoneWeb/PML by Lucent/AT&T



Elements

- Root: <vxml>
- Form/Interaction: <field>, <filled>, <initial>, <param>, <option>
- Grammar: <dtmf>, <grammar>
- Events: <error>, <exit>, <noinput>, <help>, <nomatch>
- Platform Specific: <meta>, <property>, <object>
- Telephony Integration: <disconnect>, <record>, <transfer>



Elements

- Language: <if>, <else>, <elseif>, <assign>, <value>, <var>, <script>, <return>, <clear>, <throw>, <catch>, <subdialog>, <block>
- Prompt/Audio: <break>, <sayas>, <audio>, <block>, <enumerate>, <emp>, <prompt>, <pros>, <div>, <reprompt>
- Navigation: <choice>, <menu>, <link>, <goto>, <submit>



Prompts

TTS (Text-to-Speech):

<prompt>What can I do for you?</prompt>

<prompt>
  Did you say <sayas class="phone">732-362-2187</sayas>
</prompt>

The second prompt is spoken along the lines of "Did you say Area Code (732) 362-2187".

Pre-recorded prompts:

<prompt><audio src="initial_greetings.wav"/>, Hitesh</prompt>

Rule of thumb:
- Use TTS sparingly (only for dynamic information).
- <prompt bargein="false"> keeps the caller from interrupting the prompt, so it can be used for ads or other special announcements.



Navigation

<?xml version="1.0"?>
<vxml version="1.0">
  <menu>
    <prompt>Welcome to your Personal Portal. <enumerate/></prompt>
    <choice dtmf="1" caching="safe" next="Email.jsp">Email</choice>
    <choice dtmf="2" caching="safe" next="Calendar.jsp">Calendar</choice>
    <choice dtmf="3" caching="safe" next="EmployeeDirectory.jsp">Employee Directory</choice>
  </menu>
</vxml>
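On a real portal a menu like this is usually generated per user with JSP or Servlets rather than written as a static file. A minimal sketch of that generation step in plain Java, assuming a hypothetical MenuPage class that returns the page as a String instead of writing it to a servlet response; DTMF keys are assigned 1, 2, 3, … in insertion order, and labels and URLs are XML-escaped.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical generator for the Personal Portal menu; a JSP or Servlet would
// write this String to the response instead of returning it.
class MenuPage {
    // Escapes the XML special characters so labels and URLs cannot break the markup.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // choices maps the spoken label to the next document; keys 1..9 follow insertion order.
    static String render(String welcome, Map<String, String> choices) {
        if (choices.size() > 9) {
            throw new IllegalArgumentException("single-key DTMF menus support at most 9 choices here");
        }
        StringBuilder vxml = new StringBuilder();
        vxml.append("<?xml version=\"1.0\"?>\n<vxml version=\"1.0\">\n  <menu>\n");
        vxml.append("    <prompt>").append(escape(welcome)).append(" <enumerate/></prompt>\n");
        int key = 1;
        for (Map.Entry<String, String> choice : choices.entrySet()) {
            vxml.append("    <choice dtmf=\"").append(key++)
                .append("\" caching=\"safe\" next=\"").append(escape(choice.getValue()))
                .append("\">").append(escape(choice.getKey())).append("</choice>\n");
        }
        vxml.append("  </menu>\n</vxml>\n");
        return vxml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> choices = new LinkedHashMap<>();
        choices.put("Email", "Email.jsp");
        choices.put("Calendar", "Calendar.jsp");
        choices.put("Employee Directory", "EmployeeDirectory.jsp");
        System.out.print(render("Welcome to your Personal Portal.", choices));
    }
}
```

A LinkedHashMap keeps the caller's ordering, so the list read out by <enumerate/> (which follows document order) and the key assignments stay in sync.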




Grammars

- Specify the utterances a user may speak, mapping each to a corresponding string value or set of attribute-value pairs
- Can be defined as a form grammar or a field grammar
- The spec doesn’t require an implementation to support a particular format. Common grammar formats:
  - Java Speech Grammar Format (JSGF)
  - Nuance GSL
  - Speech Recognition Grammar Specification for the W3C Speech Interface Framework (Working Draft)
- Can be specified inline within the VoiceXML document or referenced externally using the <grammar> tag



Grammars

Inline:

...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf">
    (hitesh seth) {1} | ...
  </grammar>
  ...
</field>
...

External:

...
<field name="emplId">
  <prompt>Say the name of the person</prompt>
  <grammar type="application/x-jsgf" src="mycompany.gram#employee" caching="safe"/>
  ...
</field>
...

mycompany.gram:

#JSGF V1.0;
grammar mycompany;
public <employee> = (hitesh seth) {1} ...
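Directory grammars like mycompany.gram are easier to keep current when they are regenerated from employee data rather than edited by hand. A sketch, assuming a hypothetical DirectoryGrammar class and an in-memory name-to-id map; on platforms that return the JSGF tag as the field value (as the Interaction example's emplId=='1' test assumes), the {1}, {2} tags become the value of emplId. "John Smith" is an illustrative entry, not from the original.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generator for an external JSGF grammar file such as mycompany.gram.
class DirectoryGrammar {
    static String build(String grammarName, String ruleName, Map<String, String> nameToId) {
        StringJoiner alternatives = new StringJoiner(" | ");
        for (Map.Entry<String, String> entry : nameToId.entrySet()) {
            // Normalize "Hitesh  Seth" to the token sequence "hitesh seth".
            String tokens = entry.getKey().trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
            // The {tag} after each alternative carries the employee id.
            alternatives.add("(" + tokens + ") {" + entry.getValue() + "}");
        }
        return "#JSGF V1.0;\n"
            + "grammar " + grammarName + ";\n"
            + "public <" + ruleName + "> = " + alternatives + ";\n";
    }

    public static void main(String[] args) {
        Map<String, String> directory = new LinkedHashMap<>();
        directory.put("Hitesh Seth", "1");
        directory.put("John Smith", "2"); // illustrative entry
        System.out.print(build("mycompany", "employee", directory));
    }
}
```

This prints a three-line grammar whose rule reads public <employee> = (hitesh seth) {1} | (john smith) {2};. Names containing punctuation or digits would need extra handling (for example, quoting tokens), which this sketch omits.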



Interaction

<?xml version="1.0"?>
<vxml version="1.0">
  <form id="Main">
    <field name="emplId">
      <prompt>Say the name of the person</prompt>
      <grammar type="application/x-jsgf">
        (hitesh seth) {1} | ...
      </grammar>
      <filled>
        <if cond="emplId=='1'">
          <goto next="#Employee1"/>
        <elseif cond="emplId=='2'"/>
          ...
        </if>
      </filled>
    </field>
  </form>



">
Interaction

  <form id="Employee1">
    <block>
      <prompt>
        Hitesh Seth.
        Direct Phone: <sayas class="phone">732-362-2187</sayas>.
      </prompt>
    </block>
  </form>
</vxml>








Telephony Integration

- <transfer> element: connects the user to another phone
- Applications:
  - Assisted dialing, e.g. an online employee directory: "I would like to call Hitesh on his cellular phone." "Connecting to (732) 433-5603 …."
  - Switching to a human operator: "Welcome to XYZ Voice Portal. At any time, say Operator to connect to a customer service agent. Please say your name. …."



Telephony Integration

<?xml version="1.0"?>
<vxml version="1.0">
  <!-- Document-scope variable, so the CallTransfer form can read it -->
  <var name="phone_no"/>
  <form ...>
    ...
    <field name="cmd">
      <prompt>Hitesh’s direct phone is (732) 362-2187, Cellular ...</prompt>
      <grammar type="application/x-jsgf">
        home | direct | cellular
      </grammar>
      <filled>
        <if cond="cmd=='direct'">
          <assign name="phone_no" expr="'7323622187'"/>
          <goto next="#CallTransfer"/>
        <elseif cond="cmd=='cellular'"/>
          ...
        </if>
      </filled>
    </field>
  </form>





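Telephony Integration

  <form id="CallTransfer">
    <block>
      <prompt><audio src="transfer.wav"/></prompt>
    </block>
    <!-- <transfer> is a form item, so it sits outside <block>. destexpr takes an
         ECMAScript expression evaluated at run time; dest takes a literal URI.
         Some platforms may expect a URI form (such as tel:) rather than bare digits. -->
    <transfer name="call" destexpr="phone_no"/>
  </form>
</vxml>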





Extensions: <object> & <property> Tags

- <property>: implementation-specific properties, e.g. TTS engine parameters (gender, tone, etc.)
- <object>: implementation-specific components and value-add services, e.g.:
  - Integration with components built for the underlying ASR engine (e.g. Nuance SpeechObjects), such as a component for getting an address
  - Caller-ID information service
  - Cellular phone location service



Developing Voice Portals



Developing: What do you need?

- Development tool, to develop/test the application: IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, …
- Web server, to execute the scripts/serve VoiceXML content: Apache, Microsoft, Netscape, …; JSP, Servlets; XML parser, XSLT processor
- VoiceXML interpreter/implementation platform
- Ordinary touch-tone phone
- PC with a good sound card and microphone, for creating/testing applications using simulators/SDKs



Static/Dynamic: Serving Up VoiceXML

- Static v/s dynamic content
- Dynamic:
  - Server scripting technologies such as JSP and Servlets to generate VoiceXML
  - Dynamic presentation using XML/XSLT: XML represents the content; XSLT represents the transformation of the content into presentation. Use Apache Cocoon!




- XML represents data: static XML, or dynamically generated using server scripts
- XSLT represents formatting: write it yourself, or create it through a tool



Processing XML/XSLT: JSP

<%@ page import="org.apache.xalan.xslt.*" %>
<%
  // Xalan-Java 1.x API: transform AddressBook.xml with AddressBook.xsl and
  // write the resulting VoiceXML straight to the JSP output writer.
  String xmlFile = "AddressBook.xml";
  String xslFile = "AddressBook.xsl";
  XSLTProcessor processor = XSLTProcessorFactory.getProcessor();
  processor.process(
      new XSLTInputSource(xmlFile),
      new XSLTInputSource(xslFile),
      new XSLTResultTarget(out));
%>

- Use sophisticated content management systems
- Create different style sheets for different interfaces: VoiceXML, …
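The Xalan 1.x classes above predate the standard TrAX API. The same XML-to-VoiceXML transformation can be sketched with javax.xml.transform, which ships with the JDK; this swaps in a different API from the one on the slide. The address book document and stylesheet below are hypothetical stand-ins for AddressBook.xml and AddressBook.xsl, whose contents the original does not show.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AddressBookVoice {
    // Hypothetical content: stands in for AddressBook.xml.
    static final String XML =
        "<addressbook>"
        + "<person><name>Hitesh Seth</name><phone>732-362-2187</phone></person>"
        + "</addressbook>";

    // Hypothetical stylesheet: stands in for AddressBook.xsl and emits VoiceXML.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\"/>"
        + "<xsl:template match=\"/addressbook\">"
        + "<vxml version=\"1.0\"><form><block>"
        + "<xsl:for-each select=\"person\">"
        + "<prompt><xsl:value-of select=\"name\"/>. Direct Phone: "
        + "<sayas class=\"phone\"><xsl:value-of select=\"phone\"/></sayas>.</prompt>"
        + "</xsl:for-each>"
        + "</block></form></vxml>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to the document and returns the serialized result.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Inside a JSP, the StreamResult would wrap the page's out writer, just as the Xalan version passes out to XSLTResultTarget; serving another interface means supplying another stylesheet for the same XML.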




Deployment: Infrastructure Required

In addition to a Web application server serving VoiceXML pages, you need:
- Telephony interface boards
- ASR engine
- TTS engine
- VoiceXML interpreter
- Bandwidth/incoming lines

Deployment options:
- Pre-packaged VoiceXML server (all-in-one)
- Pick and choose VoiceXML solution components: ASR, TTS, VoiceXML interpreter, hardware ports, bandwidth
- Hosted voice ASP solutions



Applications

- Utilized Web content/information: stock quotes, weather information, news
- Customer service: order status, address change, automated call center, etc.
- Commerce: banking, stock trading, voice-enabled commerce
- Corporate portals: employee directory, employee self service (human resources), email, calendar, unified messaging
- Alerts [push model]: server-initiated transactions ("Call me when the stock price of any company in my portfolio goes up by $10")



Corporate Portal Scenario

1 (800) – XXXXXXX
System: Welcome to Your Corporate Portal. Please say your name.
Caller: Hitesh Seth
System: Please enter your access code.
Caller: ****
System: Good Morning, Hitesh. What can I do for you?
Caller: Check my mail.
System: You have 34 new messages.
Caller: Is there any new message from my boss?
System: Yes, there are two messages from …



Corporate Portal (contd.)

System: First message. Subject: Help Needed in XYZ Project. Hitesh, could you please call …?
Caller: Reply. I am in San Jose till 15th of November. I could come to Phoenix on 16th November. [#] [used <record>]
System: Mail sent.
Caller: When am I meeting with John today?
System: You have a meeting with John at 2:00 PM.
Caller: Connect me to his office, please.
System: Connecting to John’s direct number, (732) ... [used <transfer>]



Vendor Landscape



Vendor Landscape

- All-in-one VoiceXML gateways/servers (combining ASR, TTS, VoiceXML interpreter, hardware ports): Lucent Speech Server, Motorola Voice Developer Gateway, VoiceGenie VoiceXML Gateway, …
- ASR (Automatic Speech Recognition) engines: AT&T, IBM, Nuance, Philips, SpeechWorks, …
- Development tools: IBM WebSphere Voice Server SDK, Motorola Mobile ADK, Nuance V-Builder, Tellme Studio, …
- Recording & developing prompts: Microsoft Sound Recorder, Sonic Foundry Sound Forge, Syntrillium Software Cool Edit, ...



Vendor Landscape

- Text-to-speech engines: AT&T, Fonix TTS, L&H RealSpeak, Lucent TTS Engine, Nuance Vocalizer, SpeechWorks Speechify, …
- Telephony interface boards: Dialogic, Lucent, ...
- Voice ASP solutions: BeVocal, Interactive Telesis, Tellme, VoiceGenie Technologies, Voxeo.net, ...



Challenges

- Need sophisticated infrastructure
- Voice recognition quality: need to build sophisticated grammars for near natural language speech recognition. Your application is only as good as its grammar.
- TTS quality & customization
- Server-initiated VoiceXML interactions! (push model)
- VoiceXML application development tools are still …



Authentication: Possible Approaches

- User-IDs/passwords: too cryptic for ASR engines to recognize; they usually need to be spelled out, which is hard
- Names/access codes: names may not be unique; may be good for intranets
- Telephone numbers/access codes:
  - Telephone numbers are unique: 0017323622187 for an international portal, 7323622187 for a US portal (or one redirected to a US-only area)
  - Easy to key in and/or say aloud
  - If available, use Caller-ID, similar to a “persistent cookie”
- Voice-based authentication: voice print/pattern
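The telephone-number approach needs one canonical account key per user, whether the number was keyed in, spoken, or taken from Caller-ID. A minimal sketch, assuming US numbers only and the two formats shown above (10 digits for a US portal, 001 plus 10 digits for an international one); the PhoneLogin class and its stripping of a leading 1 trunk prefix are illustrative assumptions, not part of the original.

```java
// Hypothetical helper: turns a keyed-in, spoken, or Caller-ID number into one account key.
class PhoneLogin {
    // Returns the canonical key, or null if the input is not a 10-digit US number.
    static String canonical(String raw, boolean internationalPortal) {
        String digits = raw.replaceAll("\\D", "");   // keep digits only: drops ( ) - spaces +
        if (digits.length() == 13 && digits.startsWith("001")) {
            digits = digits.substring(3);            // 001 + US number
        } else if (digits.length() == 11 && digits.startsWith("1")) {
            digits = digits.substring(1);            // assumed: leading 1 as in +1 732 ...
        }
        if (digits.length() != 10) {
            return null;
        }
        return internationalPortal ? "001" + digits : digits;
    }

    public static void main(String[] args) {
        System.out.println(canonical("(732) 362-2187", false));   // 7323622187
        System.out.println(canonical("(732) 362-2187", true));    // 0017323622187
        System.out.println(canonical("+1 732 362 2187", false));  // 7323622187
    }
}
```

The access code would then be checked against the record stored under this key; a Caller-ID match can pre-fill the key, much like a persistent cookie, while still asking for the code.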



Performance

- Grammars: inline v/s external; caching!
- VoiceXML documents: caching! Multiple interactions per document
- Audio: TTS v/s recorded prompts (quality v/s size)



Getting Started: Take Small Steps

- Use DTMF: "Enter your 10 digit account number"; "Press 1 for Email, 2 for calendar, 3 for employee directory"
- Use directed dialogs: "Say the name of the person"
- Move towards natural language conversations: "What can I do for you?"
- Use TTS sparingly, for quality of voice interaction
- If your application incorporates ads, make them short and crisp
- Start small, grow big (try regional betas/limited trials and move towards a larger audience)



Opportunities

According to the Kelsey Group, by 2005 advertising and transactions from voice portals will produce $5 billion in revenues, and $6 billion for associated hardware, software, and Net service provider companies.

(Adapted from "Voice portal companies overshooting demand", http://news.cnet.com/news/0-1004-200-1844967.html, May 9, 2000)



Resources

Organizations:
- VoiceXML Forum: http://www.voicexml.org
- W3C Voice Browser Activity: http://www.w3c.org/Voice

Specs:
- VoiceXML Specification: http://www.voicexml.org/spec.html
- Java Speech Grammar Format (JSGF)



Resources

Vendors:
- VoiceGenie Technologies: http://www.voicegenie.com




Thanks for your time.