The Voice-Enabled Web: VoiceXML and Related Standards for Telephone Access to Web Applications

14 Feb. 2002, Christophe Strobbe, K.U.Leuven - ESAT-SCD-DocArch


Page 1

The Voice-Enabled Web: VoiceXML and Related Standards for Telephone Access to Web Applications

14 Feb. 2002

Christophe Strobbe

K.U.Leuven - ESAT-SCD-DocArch

Page 2

Overview

• Voice browsers

• History of voice markup languages

• W3C Speech Interface Framework

• Communication Architecture

• VoiceXML 2.0

• Grammars

• SALT

• Not WAP/WML, Voice over IP

Page 3

Voice Browser

Device (hardware and software) that interprets voice markup languages to generate voice output and interpret voice input.

Page 4

Companies

Page 5

History

1990s: companies developed their own markup languages:

• PhoneML (AT&T)

• PhoneML (Lucent)

• VoxML (Motorola)

• TalkML (HP Labs)

• SpeechML (IBM)

=> VoiceXML Forum: VoiceXML 1.0

• 1998: W3C Voice Browser Workshop

Page 6

VoiceXML Specification History

• April 1999 – Initial spec – Request For Comment

• August 1999 – 0.9 Spec released

• March 2000 – 1.0 Spec released

• October 2001 – 2.0 Working Draft (W3C)

• March 2002 – next Working Draft

• 4th quarter 2002 – 2.0 Recommendation W3C?

Page 7

Why Voice Markup Languages?

• “Voicifying” web pages by adding a few VoiceXML tags is not feasible:

– basic design principles that make a good web page are very different from those that make an efficient voice interface

– e.g. Raggett & Ben-Natan: “Voice Browsers” (W3C, 1998)

• … unless you want to create a multimodal interface (cf. SALT)?

Page 8

Speech Interface Framework

[Figure: W3C Speech Interface Framework. Components shown: User, Telephone System, ASR, DTMF tone recognizer, Language Understanding, Context Interpretation, Dialog Manager, World Wide Web, Media Planning, Language Generation, TTS, Prerecorded audio player. Markup languages and resources shown: VoiceXML 2.0, Speech Recognition Grammar ML, N-gram Grammar ML, Natural Language Semantics ML, Speech Synthesis ML, Lexicon, Reusable Components.]

Page 9

Communication Architecture

Page 10

What is VoiceXML?

For creating audio dialogs that include

• Synthesized speech

• Digitized audio

• Recognition of spoken and DTMF key input

• Recording of spoken input

• Telephony

• Mixed-initiative conversations

Major goal: bring the advantages of web-based development and content delivery to interactive voice response applications.

Page 11

Advantages of VoiceXML

As perceived by Motorola et al.:

• People want a better mobile user interface while on the go

• Device independent

• Open standards create and drive market demand

• Easy to program since similar to other XML-based languages

• Utilizes existing web infrastructure

Page 12

Developing applications

• To develop VoiceXML applications you have to learn several languages:

– VoiceXML

– ECMAScript (JavaScript/JScript)

– a grammar format (GSL, JSGF, Speech Recognition Grammar Specification)

– a back-end scripting language (Perl, Java, …)

• Web developers are used to this kind of environment

Page 13

VoiceXML Basics

• XML-based

• More structured than HTML (describes structure and semantics of data, not presentation)

– Must close all tags (e.g. <prompt> </prompt>)

• Structure of language described in a Document Type Definition (DTD)

Page 14

VoiceXML Applications

• An application consists of a single application root document as well as zero or more other documents

• The application root document is loaded whenever any other document is accessed

• The application root document grammars and variables are visible in other application documents

[Figure: one application root document with three other documents beneath it]
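
The relationship between a root document and the other documents can be sketched as follows. This is a minimal illustration, not taken from the slides: the file names root.vxml and order.vxml, the variable shop, and the spoken phrases are all hypothetical, and the inline GSL grammar assumes a platform that accepts GSL, as the other examples in this deck do. The listing shows two separate files in one block.

```xml
<!-- root.vxml (hypothetical file name): the application root document -->
<vxml version="1.0">
  <!-- Variable declared here is visible to other documents as application.shop -->
  <var name="shop" expr="'Corner Cafe'"/>

  <!-- Grammar of this link stays active in every document of the application -->
  <link next="root.vxml#goodbye">
    <grammar type="application/x-gsl">[goodbye]</grammar>
  </link>

  <form id="goodbye">
    <block><prompt>Thank you for calling.</prompt></block>
  </form>
</vxml>

<!-- order.vxml (hypothetical file name): names its root via the application attribute -->
<vxml version="1.0" application="root.vxml">
  <form id="order">
    <field name="confirm" type="boolean">
      <prompt>
        This is <value expr="application.shop"/>. Would you like to order?
      </prompt>
      <filled>
        <prompt>Noted.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

When the voice browser fetches order.vxml it also loads root.vxml, so the caller can say "goodbye" while the field in order.vxml is listening.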

Page 15

VoiceXML Documents

• Documents can contain two types of dialogs:

– forms (<form>)

– menus (<menu>)

• Other elements:

– <meta>: metadata, defined as name/value pair

– <var>: for declaring variables

– <script>: for client-side ECMAScript

– <catch>: for catching events

– <link>: transitions to other dialogs
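
A small document that uses each of the elements listed above might look like the sketch below. It is illustrative only: the names greeting and shout, the metadata values, and the phrase "main menu" are made up for this example, and the inline GSL grammar again assumes a platform that accepts GSL.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- <meta>: a name/value pair describing the document -->
  <meta name="author" content="Example Author"/>

  <!-- <var>: declares a document-scoped variable -->
  <var name="greeting" expr="'Welcome'"/>

  <!-- <script>: ECMAScript executed by the voice browser -->
  <script>
    <![CDATA[
      function shout(text) { return text + '!'; }
    ]]>
  </script>

  <!-- <catch>: handles the "help" event for every dialog in this document -->
  <catch event="help">
    <prompt>You can say main menu at any time.</prompt>
    <reprompt/>
  </catch>

  <!-- <link>: when the caller says "main menu", jump to the dialog named main -->
  <link next="#main">
    <grammar type="application/x-gsl">(main menu)</grammar>
  </link>

  <form id="main">
    <field name="again" type="boolean">
      <prompt>
        <value expr="shout(greeting)"/> Would you like to hear this again?
      </prompt>
      <filled>
        <prompt>Goodbye.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```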

Page 16

Forms and menus

• Forms may contain zero or more <field> elements

– the user must provide a value for the field before proceeding to the next element in the form

– each field may specify a grammar that defines the allowable inputs

• Menus may contain one or more <choice> elements

– a menu presents the user with a choice of options and then transitions to another dialog
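
The field-based form is shown in the grammar example on a later slide; a menu could be sketched as below. The dialog names and prompts are hypothetical. Each <choice> supplies both the phrase to recognize and the dialog to go to, and <enumerate/> reads the choices back to the caller.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <menu id="drinks">
    <!-- <enumerate/> expands to the text of every <choice> below -->
    <prompt>Welcome. Say one of: <enumerate/></prompt>
    <choice next="#hot">hot drinks</choice>
    <choice next="#cold">cold drinks</choice>
    <!-- Played when the caller stays silent -->
    <noinput>Please say one of: <enumerate/></noinput>
  </menu>

  <form id="hot">
    <block><prompt>We have coffee and tea.</prompt></block>
  </form>

  <form id="cold">
    <block><prompt>We have juice.</prompt></block>
  </form>
</vxml>
```

No explicit grammar is needed here: the voice browser derives one from the text of the <choice> elements.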

Page 17

VoiceXML Example

<?xml version="1.0"?>
<!-- helloworld.vxml -->
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        Hello World!
      </prompt>
    </block>
  </form>
</vxml>

Page 18

Example with Grammar

<vxml version="1.0">
  <meta name="maintainer" content="[email protected]"/>
  <form id="hello">
    <field name="item">
      <prompt>Would you like coffee, tea, or juice?</prompt>
      <grammar type="application/x-gsl">
        [coffee tea juice]
      </grammar>
      <filled>
        <prompt>Your <value expr="item"/>
          will be ready momentarily</prompt>
      </filled>
    </field>
  </form>
</vxml>
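
For comparison, the same field could be written with the XML form of the W3C Speech Recognition Grammar Specification instead of GSL. The sketch below assumes a VoiceXML 2.0 platform and uses the syntax of the specifications as they were later published, which may differ in detail from the working drafts current at the time of this talk. The rule name drink is made up for the example.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="hello">
    <field name="item">
      <prompt>Would you like coffee, tea, or juice?</prompt>
      <!-- Inline XML grammar: the root rule accepts exactly one of three words -->
      <grammar type="application/srgs+xml" version="1.0"
               xml:lang="en-US" mode="voice" root="drink">
        <rule id="drink">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
            <item>juice</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>Your <value expr="item"/> will be ready momentarily.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```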

Page 19

Dynamic VoiceXML

#!perl -w
use strict;

# HTTP header: tells the voice browser that a VoiceXML document follows.
print "Content-type: text/x-vxml\n\n";

# The VoiceXML document that this script generates.
my $HOMEBUFFER = '<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt> Hello World </prompt>
    </block>
  </form>
</vxml>';

print $HOMEBUFFER;

Page 20

Other Markup Languages

• JSML: JSpeech Markup Language (Sun)

• Dialog ML (Dennis Heuer)

• SABLE (SABLE Consortium)

• DMML (Dialogue Moves Markup Language)

• SALT: Speech Application Language Tags (SALT Forum)

• (CallXML, Telephony Markup Language, …)

Progress since March 2000 (VoiceXML 1.0)?

Page 21

SALT

• Speech Application Language Tags (SALT Forum)

• SALT Forum founded by Microsoft, Intel, …; 15 October 2001

• very simple set of tags for extending existing markup languages (XHTML, XML)

• specification available Q1 2002

• specification submitted to standards body (W3C??) mid 2002