Multimodal user interfaces: Implementation

Chris Vandervelpen

Overview

• Introduction
• VoiceXml
• X+V
• From models to X + V
• Demo: ACCESS Netfront
• Conclusions
• Questions

Introduction

• Focus on speech and direct manipulation on mobile devices
• How can we deploy a multimodal UI?
– Build our own framework, using speech synthesizers/recognizers that interpret the designed models (reinventing the wheel)
– Build software that generates standardized markup from the models (use existing technologies): our starting point

VoiceXml

• Markup language for speech-only interfaces
• Telephone interfaces
• Using grammars for speech recognition
– Java Speech Grammar Format (JSGF)
– Nuance Grammar Specification Language (NGSL)
• Speech output
– Synthesis
– Prerecorded audio
• http://www.voicexml.org

VoiceXml

<vxml:form>
  <vxml:field name="departure_city">
    <vxml:grammar><![CDATA[
      #JSGF V1.0;
      grammar cities;
      public <city> = brussels | antwerp | amsterdam;
    ]]></vxml:grammar>
    <vxml:prompt>Which departure city would you like?</vxml:prompt>
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <vxml:filled>
      <vxml:prompt>Your departure city is <vxml:value expr="departure_city"/></vxml:prompt>
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city">
    ………
  </vxml:field>
</vxml:form>

VoiceXml

• Mixed-initiative forms
– Single user input for several fields
– Supports more natural language
• For example
– I want to fly from “brussels” to “amsterdam”
– Filling in departure_city and destination_city fields
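The idea of one utterance filling several fields can be illustrated with a toy sketch in Python. This is not how a VoiceXML interpreter works internally; the regular expression merely stands in for a form-level grammar, and the names `CITIES`, `PATTERN` and `fill_fields` are hypothetical.

```python
import re

# City vocabulary taken from the grammar on the earlier slide.
CITIES = ("brussels", "antwerp", "amsterdam")

# One pattern covering the whole utterance, standing in for a form-level grammar.
# The named groups play the role of the two form fields.
PATTERN = re.compile(
    r"from\s+(?P<departure_city>{0})\s+to\s+(?P<destination_city>{0})".format("|".join(CITIES)),
    re.IGNORECASE,
)


def fill_fields(utterance: str) -> dict:
    """Return the fields that a single utterance fills; empty dict if nothing matches."""
    match = PATTERN.search(utterance)
    if match is None:
        return {}
    return {name: value.lower() for name, value in match.groupdict().items()}


if __name__ == "__main__":
    print(fill_fields("I want to fly from brussels to amsterdam"))
```

In a directed dialog the user would answer two separate prompts; here both slots are filled from one sentence, which is the point of a mixed-initiative form.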

X + V

• X + V
– XHtml: visual channel
– VoiceXml snippets: speech channel
• Synchronization between modalities using XML Events
• Multimodal browsers supporting X+V
– ACCESS Netfront multimodal browser (PocketPC)
– Opera
• http://www.voicexml.org/specs/multimodal/x+v/12/

X + V

<html>
  <body>
    <form>
      <input id="from" name="from" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_from" />
      <input id="to" name="to" size="20"
             ev:event="inputfocus" ev:handler="#voice_city_to" />
    </form>
  </body>
</html>

X + V

<vxml:form id="voice_city">
  <vxml:field name="departure_city_field" id="voice_city_from">
    <vxml:grammar><![CDATA[
      #JSGF V1.0;
      grammar cities;
      public <city> = brussels | antwerp | amsterdam;
    ]]></vxml:grammar>
    <vxml:prompt>Which departure city would you like?</vxml:prompt>
    <vxml:catch event="help nomatch noinput">
      For example, brussels, antwerp or amsterdam
    </vxml:catch>
    <vxml:filled>
      <vxml:assign name="document.getElementById('from').value" expr="departure_city_field" />
    </vxml:filled>
  </vxml:field>
  <vxml:field name="destination_city_field" id="voice_city_to">
    …….
  </vxml:field>
</vxml:form>

X + V

• Also usable with XForms
• VoiceXml snippets and XForms controls update the same XForms instance model, which keeps the modalities synchronized

Models to X + V

• Annotate the UI description for speech [Shao2003: Transcoding HTML to VoiceXML Using Annotations]
• Extend this approach to UIML and X + V
– Identify particular information structures
  • Text areas
  • Menu/List structures
  • Top-level visual region
– Define their representation in XHTML and VoiceXml
– Generate the synchronization XML eventing code
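The steps above can be sketched in Python for the simplest structure, a text area. This is a minimal sketch under stated assumptions, not the generator used in this work: the `TextArea` model class and the functions `visual_input` and `voice_field` are hypothetical, and the `inputfocus` event name, the `#voice_...` handler ids and the `_field` suffix simply follow the X + V example shown earlier.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class TextArea:
    """Hypothetical annotated model element: one text-entry structure."""
    ident: str          # id of the XHTML input; assumed to be a plain identifier
    prompt: str         # spoken prompt taken from the speech annotation
    choices: list[str]  # words the speech grammar should accept


def visual_input(area: TextArea) -> str:
    """Render the XHTML input and wire it to the voice field with XML Events attributes."""
    return (
        f'<input id="{area.ident}" name="{area.ident}" size="20" '
        f'ev:event="inputfocus" ev:handler="#voice_{area.ident}"/>'
    )


def voice_field(area: TextArea) -> str:
    """Render the VoiceXML field that copies the recognized value into the XHTML input."""
    alternatives = " | ".join(area.choices)
    target = f"document.getElementById('{area.ident}').value"
    return (
        f'<vxml:field name="{area.ident}_field" id="voice_{area.ident}">\n'
        f'  <vxml:grammar><![CDATA[ #JSGF V1.0; grammar {area.ident}; '
        f'public <choice> = {alternatives}; ]]></vxml:grammar>\n'
        f'  <vxml:prompt>{escape(area.prompt)}</vxml:prompt>\n'
        f'  <vxml:filled><vxml:assign name={quoteattr(target)} '
        f'expr="{area.ident}_field"/></vxml:filled>\n'
        f'</vxml:field>'
    )


if __name__ == "__main__":
    departure = TextArea("from", "Which departure city would you like?",
                         ["brussels", "antwerp", "amsterdam"])
    print(visual_input(departure))
    print(voice_field(departure))
```

Both renderings are derived from the same model element, so the handler reference in the XHTML and the id of the VoiceXML field cannot drift apart.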

Models to X + V

• Define a generic UIML widget vocabulary mapping for both GUI and speech [Plomp2002]
• TextEntry
– <field> (VoiceXml)
– <input type="text" /> (XHtml)
– System.Windows.Forms.TextBox
• Collection
– <form> (VoiceXml)
– <form> (XHtml)
– System.Windows.Forms.Panel
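The vocabulary mapping above could be held in a simple lookup table. The following Python sketch only restates the two entries from the slide; the table layout, the target keys (`voicexml`, `xhtml`, `winforms`) and the function `map_widget` are assumptions made for illustration, not part of UIML.

```python
# Hypothetical vocabulary table: abstract UIML widget class -> concrete widget per target.
WIDGET_VOCABULARY = {
    "TextEntry": {
        "voicexml": "field",
        "xhtml": 'input type="text"',
        "winforms": "System.Windows.Forms.TextBox",
    },
    "Collection": {
        "voicexml": "form",
        "xhtml": "form",
        "winforms": "System.Windows.Forms.Panel",
    },
}


def map_widget(abstract_class: str, target: str) -> str:
    """Return the concrete widget for an abstract class on a given target."""
    try:
        return WIDGET_VOCABULARY[abstract_class][target]
    except KeyError:
        # Unknown class or target: fail loudly instead of guessing a widget.
        raise ValueError(
            f"no mapping for {abstract_class!r} on target {target!r}"
        ) from None


if __name__ == "__main__":
    for target in ("voicexml", "xhtml", "winforms"):
        print(target, "->", map_widget("TextEntry", target))
```

Keeping the mapping in data rather than in code means a new target or widget class only needs a new table entry.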

Demo

• ACCESS Netfront multimodal browser
• PocketPC
• Ordering pizza
• Ordering Chinese

Conclusions

• X + V
– built-in modality synchronization
– alternative to own multimodal implementation
– declarative
– transformation from UIML possible

Questions?
