Speech Service Creation
K. W. (Bill) Scholz
NewSpeech, LLC
An Overview of Speech Service Creation Tools
NY / NJ Chapter
December, 2006
Agenda
• Speech Applications – where we were and where we are
• Building speech applications today
• Methodologies and Tools
• Reusable components & packaged applications
• Summary of today’s leading VUI creation tools
• Highlight / compare / contrast industry’s leading tools
What’s it take to build a speech app?
Requirements, Use Cases, Project Plan
Call flow, Implementation, & Test
Dialog Design & Test
Prompts, Grammars, & Test
Data / Back-end Integration, & Test
Unit Test, Integration Test, System Test
Pilot, Limited Deployment, Analysis
Full Deployment, Analysis
Where We’ve Come From: Building Speech Apps
• Development toolkits designed for building DTMF applications were extended to support speech
• Call flows had the sound-and-feel of DTMF apps
• Grammars were constructed by hand
• Back-end integration coded by hand, often targeting closed-architecture information stores
  • Screen scraping – ‘row 12, column 37, 9 characters’
  • Proprietary closed databases
• Separate natural language processors driven by recognizer output required separate ‘NL’ grammars
• Poor TTS quality generated need for recorded prompts
Where We Are: Building speech apps today
• Methodologies and Tools
  • Methodology: problem statement, use cases, dialog design, project management
  • Data / Back-end integration
• Reusable components
  • OpenSpeech Dialog Modules
  • Reusable Dialog Components
• Packaged applications
• Testing & Analytics
Current Practice
Most applications use state-based dialogs
• Easiest to design, debug and test for current simple applications
• Natural fit with the directed dialogs that are easiest for novice users
• Speech recognizer grammars are simpler to construct and therefore less error prone
• As developers and users become exposed to more sophisticated dialog approaches, they will become less satisfied with state-based dialogs
  • Goal-directed
  • Conversational
  • Rule-based
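As a rough illustration of the state-based approach, the sketch below models a directed dialog as a table of states, each with one prompt and a small set of accepted responses. The state names, prompts, vocabulary, and the `run_dialog` helper are all hypothetical; none of the tools in this overview is implied to work this way.

```python
# Minimal sketch of a state-based (directed) dialog.
# All state names, prompts, and vocabulary are hypothetical examples.

DIALOG = {
    "ask_account_type": {
        "prompt": "Checking or savings?",
        "next": {"checking": "ask_action", "savings": "ask_action"},
    },
    "ask_action": {
        "prompt": "Balance or transfer?",
        "next": {"balance": "done", "transfer": "done"},
    },
}


def run_dialog(caller_inputs, start="ask_account_type", max_retries=2):
    """Walk the state table, consuming one recognized utterance per attempt.

    Returns the list of (state, utterance) pairs that were accepted, or
    raises RuntimeError when a state uses up its retries.
    """
    inputs = iter(caller_inputs)
    state, accepted = start, []
    while state != "done":
        node = DIALOG[state]
        for _attempt in range(max_retries + 1):
            utterance = next(inputs, None)  # None models silence
            if utterance in node["next"]:
                accepted.append((state, utterance))
                state = node["next"][utterance]
                break
        else:
            # No attempt produced an in-grammar response.
            raise RuntimeError(f"no valid input in state {state!r}")
    return accepted
```

Because each state accepts only a handful of words, the grammar per state stays small, which is the property the slide credits for making these dialogs easy to build and test.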
Tools for Building Speech Applications
• Dialog design, evaluation, call flow development, back-end integration, prototype, deployment, tuning, life cycle support.
• Vendors
  • Active:
    • Audium: the ‘Audium Builder’
    • DBscape Vocabase
    • Fluency: ‘Voice Runner’
    • OpenMethods: ‘OpenVXML’
    • TuVox: ‘CVR’ (‘Producer’ + management & analytics)
    • Vicorp: ‘xMP’
    • VoiceObjects: ‘VoiceObjects X6’
  • Inactive:
    • Unisys: the ‘NL Speech Assistant’
    • Unveil: ‘Conversation Manager’
    • Vocalocity: ‘AppCenter’
  • Support:
    • Eclipse – Back-end integration
    • Microsoft: ‘Visio’ for call flow representation
    • Nuance: OSI – Tuning
Avaya Dialog Designer
IBM WebSphere
Intervoice InVision
Microsoft Speech .NET
NetByTel (TuVox)
Nortel MPS Developer (was PeriProducer)
Nuance OSD
Orange Nextfire OAVS
And others……
SCE Tools: what to look for
• Manipulable element – what the SCE assembles
• Element detailing – how each is tailored for use
• Business rule / back-end integration
• Architectural model – underlying design pattern
• Life cycle support – pre- and post-deployment management and testing
Visio to Represent Dialog Call flow
[Figure: Visio call-flow diagram for mixed-initiative dialog state DT7. Recoverable labels: DT7.1 (prompt for data); DT7.2, DT7.3, DT7.3b, DT7.3n (yes/no confirmations such as "<data1> ... <dataN>. Is that correct? Yes or no?"); DT7.C1 ... DT7.Cn (re-prompts such as "What is <data1>?"); DT7.4 ("Unable to collect <data>"); GS1 (transaction error recovery); transitions labeled yes, 1st "no", 2nd "no", 2nd silence, and 2nd misrec. A note in the figure reads: if only data2 was collected from DT7.2, then go to DT7.C1 and collect the first piece of data, and then return to collect any remaining data; this capability is not implemented in AATAKECOMPLETE.]
(Source: Unisys ‘FFA’ design specification)
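The transition labels in the DT7 call flow (1st "no", 2nd "no", 2nd silence, 2nd misrec, "Unable to collect") suggest a collect-and-confirm loop with a two-strike limit. The sketch below is one possible reading of that pattern, not a reconstruction of the Unisys design; the function name, the callback signatures, and the use of `None` for a failed collection are assumptions.

```python
# One possible reading of a collect-then-confirm loop with a two-strike limit.
# The callbacks stand in for real prompt/recognition steps and are hypothetical.

def collect_with_confirmation(collect, confirm, max_failures=2):
    """Collect a value and confirm it with the caller.

    collect()      -> returns the value heard from the caller
    confirm(value) -> returns "yes", "no", None (silence), or any other
                      string (treated as a misrecognition)

    Anything other than "yes" counts as a failure. The first failure
    triggers re-collection; reaching max_failures gives up and returns None,
    which corresponds to an "unable to collect" exit.
    """
    failures = 0
    while failures < max_failures:
        value = collect()
        reply = confirm(value)
        if reply == "yes":
            return value
        failures += 1
    return None
```

In a test, `collect` and `confirm` can be driven from scripted lists, for example `lambda: next(values)` over an iterator of simulated caller answers.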
Audium (Purchased by Cisco)
• Audium Builder: a GUI that permits users to create and manage multiple applications
• Visual elements include functions for managing databases, menus, dates and times, or phone transfers, as well as credit card or email processing.
• Application creation is done by dragging elements to the workspace to construct the call flow
• As elements are added their properties can be configured to load pre-recorded audio or TTS prompts, and configured to play naturally to callers.
• Elements are interconnected using the GUI to assign ‘exit states’ to reach an end goal.
Source: Joe Oh, Audium, (private communication)
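The idea of connecting elements through exit states can be sketched as a small graph with two sanity checks. Nothing here reflects Audium's actual file format or API; the element names, the exit-state names, and both helper functions are hypothetical.

```python
# Hypothetical call-flow graph: each element maps its exit states to the
# element that runs next. Elements with no exits end the call flow.
CALL_FLOW = {
    "welcome_audio": {"done": "main_menu"},
    "main_menu": {
        "billing": "billing_lookup",
        "agent": "transfer",
        "max_nomatch": "transfer",
    },
    "billing_lookup": {"found": "read_balance", "not_found": "transfer"},
    "read_balance": {"done": "hangup"},
    "transfer": {},
    "hangup": {},
}


def unwired_exits(flow):
    """Return (element, exit_state, target) triples whose target is undefined."""
    return [
        (element, exit_state, target)
        for element, exits in flow.items()
        for exit_state, target in exits.items()
        if target not in flow
    ]


def reachable(flow, start):
    """Return the set of elements that can be reached from `start`."""
    seen, stack = set(), [start]
    while stack:
        element = stack.pop()
        if element in seen:
            continue
        seen.add(element)
        stack.extend(flow.get(element, {}).values())
    return seen
```

A graphical builder performs checks of this kind implicitly; in a hand-written call flow, a dangling exit state or an unreachable element is an easy mistake to miss.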
Audium
[Screenshot: Audium Builder, with callouts for the application treeview, tools, and object properties.]
DBscape Vocabase
The VocaBase “Dialog Map” represents the sequence of modules, sub-modules, and steps. Clicking on any element gives access to its detailed configuration.
Fluency ‘Voice Runner’
Key features of this tool are:
• Visual component assembly
• Integrated component assembly analysis & testing
• One click assembly deployment
• Library of process and rule components:
  • Address Collection
  • Credit Card Verification
Vicorp xMP
VoiceObjects 6 Desktop
• Tree structure to represent dialog design
• Point-and-click authoring
• Layering includes system layers and user-built layers
• Single click packages an application for deployment
• Back-end integration: ‘connectors’ support both server-side scripting and J2EE code execution
• Uses object-oriented concepts
Source: http://www.voiceobjects.com/
VoiceObjects Desktop – At a glance
[Screenshot: VoiceObjects Desktop, with callouts: individual editor for a voice object; list of all available VoiceObjects; Components; Resources; Logic; Actions.]
Source: Tiemo Winterkamp, VoiceObjects (private communication)
VoiceObjects Desktop - Control Center
Source: Tiemo Winterkamp, VoiceObjects (private communication)
Microsoft Speech (Visual Studio)
Unisys ‘NLSA’
NLSA Grammar Specification
Vocalocity AppCenter
Source: Ken Rehor - 2005
OpenVXML – Open Source SCE
Back-end Integration
• Java, JSP, C#
• Scripting languages
  • Perl
  • JSP / ASP
  • PHP
  • …
• Databases
  • Oracle
  • Microsoft SQL Server
  • MySQL / PostgreSQL
• Web Services
• AJAX (Asynchronous JavaScript and XML)
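To make the database case concrete, the sketch below looks up a value and folds it into a prompt string. It uses Python's built-in `sqlite3` module purely as a stand-in for the database products listed above; the table, the column names, and the `balance_prompt` and `demo_connection` functions are invented for illustration.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with one hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance_cents INTEGER)"
    )
    conn.execute("INSERT INTO accounts VALUES (?, ?)", ("1001", 12345))
    return conn


def balance_prompt(conn, account_id):
    """Return the prompt text to speak for the given account."""
    row = conn.execute(
        "SELECT balance_cents FROM accounts WHERE account_id = ?",
        (account_id,),  # parameterized query: caller input never enters the SQL text
    ).fetchone()
    if row is None:
        return "Sorry, I could not find that account."
    dollars, cents = divmod(row[0], 100)
    return f"Your balance is {dollars} dollars and {cents} cents."
```

Compared with the screen scraping of earlier systems, the query names the data it wants instead of relying on a fixed row and column position on a terminal screen.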
Eclipse
Testing
• Unit – emulation
• Callflow – WoZ (Wizard of Oz) or live
• Usability – WoZ or live
• Post deployment analytics
Modules and packaged applications
Modules: components and templates
• Application: a software program designed to perform a specific set of functions
• Component: a piece of software that can be combined with other pieces to construct a program
• Template: a pattern used to replicate objects
Source: Steve Erlich, Apptera (private communication)
SCE Analysis and Evaluation
• Manipulable element – what the SCE assembles
  • Dialog state
  • Object module
  • Conversation step
• Element detailing
  • Properties and values
  • Element attributes
  • Prompt and grammar management
• Business rule / back-end integration
  • Built-in primitives
  • Integration with Java, Web Services, Databases
• Architectural model
  • OO? FSM? SOA? MVC? Design patterns?
  • Visible dialog metalanguage?
• Life cycle: Deployment and post-deployment support
  • Reuse: create, package, and integrate reusable components
  • Test capability; test script generation; WoZ capability
  • Analytics
Audium
• Application Development assets
  • GUI is implemented using Eclipse, with a Visio-like view
  • Inline grammars can be generated directly by the Studio
  • Centralized prompt management capability; recording scripts generated
  • OSDM integration supported (but RDCs are not)
  • XML dialog meta-language documented and the DTD provided
  • Multiple ‘Form’ elements can be combined to generate mixed-initiative dialog
  • Multi-user collaboration is well supported and demonstrated at customer sites
• Runtime assets
  • Applications published as XML; interpreted by a Java runtime engine
  • SNMP queries are generated
• Liabilities
  • Layering is not distinct – common database and external component references
  • No 3rd party application support
  • No automatic test script generation
  • No dedicated form for mixed initiative
  • No runtime cluster or server management
  • No speaker verification or video service generation capability
  • Elements oriented towards programmers, not towards VUI designers
Vicorp
• Application Development assets
  • Explicit separation of presentation layer from business objects layer
  • Visio-like presentation of application call flow
  • Inline grammars with confidence levels generated from item lists
  • Prompt categories facilitate multiple persona and language management
  • Invokes 3rd party applications by URI with arguments
  • Directed dialog, mixed initiative, and sub dialogs are supported
• Runtime assets
  • Applications published as EAR files for execution on J2EE application server
  • Service Management Console provided to manage server clusters
• Liabilities
  • No support for the generation of SSML for TTS
  • Internal XML dialog meta-language not exposed for use
  • No automatic testing of applications; no post-deployment analytics
  • No support for multi-user management or collaboration
  • Speaker verification and video service generation not shown
  • It is not possible to open multiple projects simultaneously and cut-and-paste between them
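Generating an inline grammar from an item list, as noted for both Audium and Vicorp, can be approximated as follows. The output uses the XML form of the W3C Speech Recognition Grammar Specification (SRGS); the function name, the default rule id, and the sample items are assumptions, and this is not claimed to be the format either tool emits. Confidence levels are omitted.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def grammar_from_items(items, rule_id="choice", lang="en-US"):
    """Build an SRGS XML grammar that accepts any one of `items`."""
    # Serialize SRGS elements without a namespace prefix.
    ET.register_namespace("", SRGS_NS)
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {
            "version": "1.0",
            "root": rule_id,
            "mode": "voice",
            f"{{{XML_NS}}}lang": lang,
        },
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"}
    )
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for text in items:
        # Each list entry becomes one alternative in the grammar.
        ET.SubElement(one_of, f"{{{SRGS_NS}}}item").text = text
    return ET.tostring(grammar, encoding="unicode")
```

Calling `grammar_from_items(["checking", "savings"])` returns a single `<grammar>` element whose root rule offers the two words as alternatives, which is the kind of artifact that used to be written by hand.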
VoiceObjects
• Application Development assets
  • Layering facilitates runtime prompt and persona remapping
  • Java extensions easily integrated as external resources
  • OSDM integration supported
  • Invokes 3rd party applications by URI with arguments
  • XML dialog meta-language documented, DTD provided
  • Recording script generation by DB query
  • Multi-user collaboration supported: user logons with specific privileges
• Runtime assets
  • Single runtime engine accesses all applications as data
  • Runtime data collection through ‘InfoStore’ and a mature Analytics package
  • Extensive server cluster management, including SNMP
  • Support for multi-tenancy: separate JVMs launched for each tenant
• Liabilities
  • Reusable Dialog Components are not supported
  • No explicit prompt management
  • Eclipse integration is incomplete
  • Confidence values not supported
  • No generation of SSML or recording scripts
  • No built-in application testing capability or test script generation capability
  • Natural language apps only supported by reference to external SLMs
  • External resources such as Java jar files are not managed by app dev environment
Conclusion
• Building speech applications today… a bit like a marriage!
[Figure labels: dialog modules and packaged apps; VUI built with tools; ASR and TTS subsystems]
Summary
• Overview of speech application creation process
• Building speech applications today
  • Methodologies and Tools
  • Reusable components
  • Packaged applications
• Where the field is going
  • Dialog description languages and tools: MI, Personalization, automatic call flow generation
  • SLMs, ASR & TTS improvements, Rule-Based and Case-Based Reasoning
Thank You.