SpeechBuilder: Facilitating Spoken Dialogue System Creation


Page 1: SpeechBuilder: Facilitating Spoken Dialogue System Creation

Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
[email protected]

Page 2: Bridging the Experience Gap

Eugene Weinstein – MIT Lab for Computer Science – Oxygen Alliance 2003 Workshop – February 24-28, 2003

• Developing robust, mixed-initiative spoken dialogue systems is difficult
– Complex systems can be created by human-language technology experts
– Novice developers must overcome a considerable technical challenge
• SpeechBuilder aims to help novices rapidly create speech-based systems
– Uses intuitive methods for specifying domain-specific constraints
– Automatically configures HLT components using MIT GALAXY architecture
* Leverages future technical advances
* Encourages research on portability

[Diagram: SpeechBuilder configures the GALAXY components connected through the Hub: Audio, Speech Recognition, Language Processing, Context Resolution, Dialogue Management, Database Server, Language Generation, Speech Synthesis]

Page 3: Baseline Configuration

[Diagram: Hub connecting Audio Server, Speech Recognition, Language Processing, CGI Parameter Generation, Speech Synthesis, and SpeechBuilder Server; the SpeechBuilder Server communicates with the Developer Application over HTTP]

• Gives developer total control over application functionality
• Communication with Galaxy via simple HTTP protocol

“Turn on the lights in the kitchen”
action=set&frame=(object=lights, room=kitchen, value=on)

“Show me the banks on Main Street”
action=identify&frame=(object=(type=bank, on=(street=Main, ext=Street)))
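The CGI strings above nest key/value pairs inside parentheses. The sketch below decodes them into nested Python dicts so an application can dispatch on `action`. It assumes only the format visible on this slide ('&'-separated pairs, with a parenthesized, comma-separated frame that may nest); `parse_params` and `parse_frame` are hypothetical helpers, not part of SpeechBuilder, and percent-escapes are decoded with the standard `urllib.parse.unquote`.

```python
from typing import Tuple
from urllib.parse import unquote


def parse_frame(text: str, pos: int = 0) -> Tuple[dict, int]:
    """Parse '(key=value, key=(...), ...)' starting at text[pos].

    Returns the decoded dict and the index just past the closing ')'.
    """
    if text[pos] != "(":
        raise ValueError(f"expected '(' at position {pos}")
    pos += 1
    frame = {}
    while True:
        while text[pos] in " ,":          # skip separators between entries
            pos += 1
        if text[pos] == ")":
            return frame, pos + 1
        eq = text.index("=", pos)         # the key runs up to the next '='
        key = text[pos:eq].strip()
        pos = eq + 1
        if text[pos] == "(":              # nested frame, e.g. on=(street=Main, ...)
            value, pos = parse_frame(text, pos)
        else:                             # atomic value ends at ',' or ')'
            end = pos
            while text[end] not in ",)":
                end += 1
            value, pos = text[pos:end].strip(), end
        frame[key] = value


def parse_params(query: str) -> dict:
    """Split 'action=...&frame=(...)' and decode any parenthesized values."""
    params = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        value = unquote(value)
        params[key] = parse_frame(value)[0] if value.startswith("(") else value
    return params


if __name__ == "__main__":
    print(parse_params("action=set&frame=(object=lights, room=kitchen, value=on)"))
    # {'action': 'set', 'frame': {'object': 'lights', 'room': 'kitchen', 'value': 'on'}}
```

The second utterance decodes to `{'action': 'identify', 'frame': {'object': {'type': 'bank', 'on': {'street': 'Main', 'ext': 'Street'}}}}`, so nested concepts arrive as nested dicts.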

Page 4: Modified Baseline Configuration (this class)

• Still gives developer total control over application functionality
• Frame Relay server exposes Galaxy meaning representation to app

[Diagram: Hub connecting Audio Server, Speech Recognition, Language Processing, CGI Parameter Generation, Speech Synthesis, and Frame Relay Server; the Frame Relay Server passes the semantic frame to the Developer Application over a TCP socket]

“Turn on the lights in the kitchen”
{c turn_management
   :parse_frame {c turn
                   :object "lights" :room "kitchen"
                   :value "on"}}

“Show me the banks on Main Street”
{c turn_management
   :parse_frame {c identify :type "bank"
                   :pred {p :on {:street "Main" :ext "Street"}}}}
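The frame relay delivers the bracketed structure above rather than flat CGI parameters. To make that structure concrete, here is a minimal reader for the simple subset used in the first example, `{type name :key value ...}`, where a value is a quoted string, a bare word, or a nested frame. It is a hypothetical illustration, not the Galaxy library's own parser; applications in this class receive already-parsed frames through the Python API.

```python
import re

# Tokens: braces, quoted strings, :keys, and bare words.
TOKEN = re.compile(r'\{|\}|"[^"]*"|:[^\s{}"]+|[^\s{}"]+')


def parse_galaxy_frame(text: str) -> dict:
    """Parse one '{type name :key value ...}' frame into nested dicts."""
    tokens = TOKEN.findall(text)
    try:
        frame, end = _frame(tokens, 0)
    except IndexError:
        raise ValueError("frame ended before its closing '}'") from None
    if end != len(tokens):
        raise ValueError("unexpected text after the closing '}'")
    return frame


def _frame(tokens, i):
    if tokens[i] != "{":
        raise ValueError(f"expected '{{', got {tokens[i]!r}")
    frame = {"type": tokens[i + 1], "name": None, "keys": {}}
    i += 2
    # The frame name is optional: present only if the next token is not a :key or brace.
    if tokens[i] not in ("{", "}") and not tokens[i].startswith(":"):
        frame["name"], i = tokens[i], i + 1
    while tokens[i] != "}":
        key = tokens[i]
        if not key.startswith(":"):
            raise ValueError(f"expected a :key, got {key!r}")
        frame["keys"][key[1:]], i = _value(tokens, i + 1)
    return frame, i + 1


def _value(tokens, i):
    tok = tokens[i]
    if tok == "{":
        return _frame(tokens, i)
    if tok.startswith('"'):
        return tok[1:-1], i + 1   # drop the surrounding quotes
    return tok, i + 1
```

Because the reader requires every `{` to be closed, a frame with a missing closing brace raises `ValueError` instead of being silently accepted.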

Page 5: Database Access Configuration **

• For a speech-based interface to structured data
• No programming required; specify table(s) and constraints

[Diagram: Hub connecting Audio Server, I/O Server, Speech Recognition, Language Processing, Discourse Resolution, Dialogue Management, Database Server (INFO back end), Language Generation, and Speech Synthesis]

Page 6: Creating a Speech-Based Application

Step 1: Off-line creation and compilation
[Diagram: SpeechBuilder (SB) uploads and compiles the domain, configuring the Hub and its servers: Audio, ASR, NLU, Discourse, Dialog, NLG, TTS, and INFO]

Step 2: On-line deployment
[Diagram: a Query enters and a Response is returned through Audio, with the Hub connecting ASR, NLU, Discourse, Dialog, INFO, NLG, and TTS]
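In the on-line step, the Hub passes each message from server to server until a response is ready. The sketch below imitates that routing with a small table-driven loop: the `PROGRAM` dict plays the role of the hub's rules, mapping a message's `op` to the server that handles it. The server functions are toy stand-ins (the "ASR" simply treats its input as a transcript), and every name here is hypothetical; this is not the Galaxy hub scripting language.

```python
import re
from typing import Callable, Dict

Message = Dict[str, object]
Server = Callable[[Message], Message]


def run_hub(servers: Dict[str, Server], program: Dict[str, str], message: Message) -> Message:
    """Route a message until its 'op' matches no rule in the hub program."""
    while message["op"] in program:
        message = servers[program[message["op"]]](message)
    return message


# Toy servers: each consumes one message and emits the next one.
def asr(msg: Message) -> Message:
    # Stand-in recognizer: treat the "audio" as a transcript and normalize it.
    return {"op": "words", "text": re.sub(r"[^\w\s]", "", str(msg["audio"]).lower())}


def nlu(msg: Message) -> Message:
    words = str(msg["text"]).split()
    return {"op": "frame", "frame": {"weather": "snow"} if "snow" in words else {}}


def dialog(msg: Message) -> Message:
    return {"op": "reply", "understood": bool(msg["frame"])}


def nlg(msg: Message) -> Message:
    text = "Here is the snow forecast." if msg["understood"] else "Sorry, I didn't catch that."
    return {"op": "speak", "text": text}


SERVERS = {"asr": asr, "nlu": nlu, "dialog": dialog, "nlg": nlg}
PROGRAM = {"audio": "asr", "words": "nlu", "frame": "dialog", "reply": "nlg"}
```

`run_hub(SERVERS, PROGRAM, {"op": "audio", "audio": "Will it snow?"})` returns `{"op": "speak", "text": "Here is the snow forecast."}`. Because the routing rules are data, inserting another server (for example, a discourse step between `frame` and `dialog`) changes only `PROGRAM`.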

Page 7: Human Language Technologies

Audio Server
• Telephone or lightweight audio server

Speech Recognition
• Generic acoustic models
• Unknown word model
• Class or hierarchical n-gram

Language Processing
• N-best interface with ASR
• Grammar from attributes & actions
• Backs off to concept spotting

Context Resolution
• New component performs concept inheritance & masking
• Processes ‘E-form’

Dialogue Management
• Generic server handles interaction

Database Server
• Accesses back-end database

Language Generation
• Generates ‘E-form’, SQL, & responses
• Default entries made

Speech Synthesis
• Commercial product

Hub
• Galaxy programmable hub controls interactions between all components
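Concept inheritance and masking in Context Resolution can be pictured with key/value frames: concepts from the previous turn carry over unless the new turn restates them, and some newly mentioned concepts mask (discard) related inherited ones. The sketch assumes flat dict frames and a hypothetical `MASKS` rule table (a new `city` discards an inherited `street`); it is a simplified illustration, not the actual E-form processing.

```python
from typing import Dict, Optional, Set

Frame = Dict[str, str]

# Hypothetical masking rules: mentioning the key discards these inherited keys.
MASKS: Dict[str, Set[str]] = {"city": {"street"}}


def resolve(context: Frame, turn: Frame, masks: Optional[Dict[str, Set[str]]] = None) -> Frame:
    """Merge the new turn into the discourse context (inheritance plus masking)."""
    masks = MASKS if masks is None else masks
    masked: Set[str] = set()
    for key in turn:
        masked |= masks.get(key, set())
    # Inherit every old concept that the new turn neither restates nor masks.
    inherited = {k: v for k, v in context.items() if k not in turn and k not in masked}
    return {**inherited, **turn}
```

For example, after a question about Boston on Monday, the follow-up "What about Tuesday?" (`{"day": "Tuesday"}`) resolves to `{"city": "Boston", "day": "Tuesday"}`, while a new `city` in a bank query drops the old `street`.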

Page 8: Extracting Database Information **

• Some columns are used to access entries (e.g., Name)
– Column entries must be incorporated into ASR & NLU
• Some columns are only used in responses (e.g., Phone)
– Column names must be incorporated into ASR & NLU

Name               Phone     Email              Office
Jim Glass          x3-1640   [email protected]   603
Stephanie Seneff   x3-0451   [email protected]   643
Victor Zue         x3-8513   [email protected]   601a

“What is the phone number for Victor Zue?”
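The two column roles above can be sketched with Python's built-in `sqlite3` module: entries of a lookup column (Name) become vocabulary for recognition and understanding, while response columns (Phone, Office) contribute only their column names. The Email column is left out because the addresses are not legible in this copy. The table name, the frame keys `name` and `property`, and the reply wording are assumptions for illustration, not the SQL that SpeechBuilder generates.

```python
import sqlite3
from typing import Dict, Set

ROWS = [
    ("Jim Glass", "x3-1640", "603"),
    ("Stephanie Seneff", "x3-0451", "643"),
    ("Victor Zue", "x3-8513", "601a"),
]
KEY_COLUMNS = ["name"]                  # users say the entries ("Victor Zue")
RESPONSE_COLUMNS = ["phone", "office"]  # users say only the column names


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE directory (name TEXT, phone TEXT, office TEXT)")
    conn.executemany("INSERT INTO directory VALUES (?, ?, ?)", ROWS)
    return conn


def vocabulary(conn: sqlite3.Connection) -> Set[str]:
    """Phrases the recognizer and parser must cover for this table."""
    phrases = set(RESPONSE_COLUMNS)
    for col in KEY_COLUMNS:  # column names come from the fixed list above
        rows = conn.execute(f"SELECT DISTINCT {col} FROM directory")
        phrases.update(row[0].lower() for row in rows)
    return phrases


def answer(conn: sqlite3.Connection, frame: Dict[str, str]) -> str:
    """Answer a frame such as {'name': 'Victor Zue', 'property': 'phone'}."""
    prop = frame["property"]
    if prop not in RESPONSE_COLUMNS:  # whitelist before building SQL text
        raise ValueError(f"unknown property: {prop}")
    row = conn.execute(
        f"SELECT {prop} FROM directory WHERE name = ?", (frame["name"],)
    ).fetchone()
    if row is None:
        return f"I don't have an entry for {frame['name']}."
    return f"The {prop} for {frame['name']} is {row[0]}."
```

For the slide's question, `answer(build_db(), {"name": "Victor Zue", "property": "phone"})` returns `"The phone for Victor Zue is x3-8513."`.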

Page 9: Knowledge Representation

• Concepts and actions form basis for understanding
– Concepts become key/value entries in meaning representation
* city: Boston, New York…; day: Monday, Tuesday
– Actions provide sentence-level patterns of specific queries
* “I want to fly from Boston to Taipei…” action=lookup_flight
– Action text can be bracketed to define hierarchical concepts **
* “I want to fly source=(from Boston) destination=(to Taipei)”
* source=Boston destination=Taipei
– Concepts and actions used to configure the following components
* Speech Recognition
* Natural Language Understanding
* Discourse
• Database columns define basic concepts
– Column names can be grouped into concepts
* property: phone, email…; weather: snow, rain…
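To show how concept lists and a bracketed action example can yield the `source=... destination=...` output above, the sketch below compiles the example sentence into a regular expression: each bracketed span becomes a named slot that accepts any value of the concept it contains. SpeechBuilder itself generates a parsing grammar; the regex is a simplified stand-in, and the `CONCEPTS`/`ACTIONS` contents (including Taipei as a `city` value) are illustrative assumptions.

```python
import re
from typing import Dict, List, Optional

CONCEPTS: Dict[str, List[str]] = {
    "city": ["Boston", "New York", "Taipei"],  # illustrative; the slide elides the full list
    "day": ["Monday", "Tuesday"],
}
ACTIONS: Dict[str, str] = {
    "lookup_flight": "I want to fly source=(from Boston) destination=(to Taipei)",
}
BRACKET = re.compile(r"(\w+)=\(([^)]*)\)")  # key=(text containing a concept value)


def slot(key: str, inner: str) -> str:
    """Regex for one bracketed span: literal words around a named concept slot."""
    for values in CONCEPTS.values():
        for value in values:
            if value in inner:
                before, after = inner.split(value, 1)
                choices = "|".join(re.escape(v) for v in values)
                return re.escape(before) + f"(?P<{key}>{choices})" + re.escape(after)
    raise ValueError(f"no concept value found in {inner!r}")


def compile_action(example: str) -> "re.Pattern[str]":
    """Literal text stays literal; each key=(...) span becomes a named slot."""
    parts, pos = [], 0
    for m in BRACKET.finditer(example):
        parts.append(re.escape(example[pos:m.start()]))
        parts.append(slot(m.group(1), m.group(2)))
        pos = m.end()
    parts.append(re.escape(example[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


def understand(sentence: str) -> Optional[Dict[str, str]]:
    for action, example in ACTIONS.items():
        m = compile_action(example).fullmatch(sentence.strip())
        if m:
            return {"action": action, **m.groupdict()}
    return None
```

A single bracketed example then generalizes across the concept's values: `understand("I want to fly from New York to Taipei")` gives `{"action": "lookup_flight", "source": "New York", "destination": "Taipei"}`.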

Page 10: Language Modeling and Understanding

• By default, concepts are used for language modeling, parsing grammar, and meaning representation
• Concept usage can be fine-tuned to improve performance: **
– For language modeling and parsing grammar only (i.e., no meaning)
– For keyword spotting only (i.e., no role in language modeling)
– For fine-grained language modeling with coarser meaning representation

[Figure: fine-grained weather words used for language modeling (rain, hail, snow, sprinkles, flurries, showers, breezy, rainy, snowy, snowfall, accumulation, rainfall, snowstorm, thunderstorm, blizzard) with a coarser meaning representation; “Will it snow?” yields weather: snow]
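The last option keeps each weather word distinct for language modeling while collapsing the words to a coarse value in the meaning representation. The sketch below contrasts the two views; the particular word-to-value grouping is hypothetical (the slide lists the words but not the exact mapping), and the tokenizer is deliberately naive.

```python
import re
from typing import Dict, List

# Hypothetical grouping of fine-grained weather words into coarse meaning values.
WEATHER: Dict[str, str] = {
    "snow": "snow", "snowy": "snow", "flurries": "snow", "snowfall": "snow",
    "snowstorm": "snow", "blizzard": "snow",
    "rain": "rain", "rainy": "rain", "showers": "rain", "sprinkles": "rain",
    "rainfall": "rain",
}


def tokens(sentence: str) -> List[str]:
    return re.findall(r"[a-z]+", sentence.lower())


def lm_tokens(sentence: str, fine: bool = True) -> List[str]:
    """Tokens a class n-gram would be trained on: fine keeps each word, coarse shares a class."""
    return [w if fine or w not in WEATHER else f"<weather:{WEATHER[w]}>" for w in tokens(sentence)]


def meaning(sentence: str) -> Dict[str, str]:
    """Meaning representation: every weather word maps to its coarse value."""
    for w in tokens(sentence):
        if w in WEATHER:
            return {"weather": WEATHER[w]}
    return {}
```

With the fine setting, "flurries" keeps its own n-gram statistics, yet `meaning("Any flurries tonight?")` still returns `{"weather": "snow"}`, the same as `meaning("Will it snow?")`.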

Page 11: Current Status

• SpeechBuilder has been operational for over two years
– Used by over 50 developers from MIT and elsewhere
– Used in undergraduate classes at MIT and Georgetown University
• ASR capabilities benchmarked against main systems
– Achieves the same ASR performance as the MIT Jupiter weather information system (6.8% word error rate on clean data) (phone #)
• Several prototype systems have been developed
– Information about faculty, staff, and students at the LCS and AI Labs (phone, email, room, voice messages, transfer, etc.)
– Application to control the various physical items in a typical office (lights, curtains, TV, VCR, projector, etc.)
– Others include TV schedules, real-time weather forecasts, and hotel and restaurant information
• SpeechBuilder used for initial design of many more complex domains
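Word error rate, the metric quoted above, counts the substitutions S, deletions D, and insertions I needed to turn the recognizer's output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A 6.8% WER therefore means roughly 6.8 word errors per 100 reference words. A minimal sketch using word-level edit distance (the example sentences are made up):

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For instance, `word_error_rate("will it snow in boston tomorrow", "will it snow in austin")` is 2/6 (one substitution and one deletion over six reference words).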

Page 12: Ongoing and Future Work

• Increase sophistication of discourse and dialogue management to handle more complex dialogues
– Enable finer specification of discourse capabilities
– Add generic capabilities for times, dates, etc.
• Incorporate confidence scoring and implement unsupervised training of acoustic and language models
• Allow developers to create domain-specific concatenative speech synthesis
• Create alternative methods of domain specification to streamline development
– Advanced developers don’t necessarily use the web interface
– Allow for more efficient automatic generation of SpeechBuilder domains

Page 13: Acknowledgements

Issam Bazzi
Scott Cyphers
Ed Filisko
Jim Glass
TJ Hazen
Lee Hetherington
Joe Polifroni
Stephanie Seneff
Michelle Spina
Eugene Weinstein
Jon Yi
Misha Zitser

Page 14: SpeechBuilder Hands-on Activity

Eugene Weinstein
Project Oxygen Core Team
MIT Laboratory for Computer Science
[email protected]

Page 15: Modified Baseline Configuration (this class)

• Still gives developer total control over application functionality
• Frame Relay server exposes Galaxy meaning representation to app

[Diagram: Hub connecting Audio Server, Speech Recognition, Language Processing, CGI Parameter Generation, Speech Synthesis, and Frame Relay Server; the Frame Relay Server passes the semantic frame to the Developer Application over a TCP socket]


Page 16: SpeechBuilder API

Galaxy Frame Relay
• Galaxy meaning representation provided through frame relay
• Applications connect via TCP sockets
• API provided in Perl, Python, and Java
– This class: Python API

[Diagram: the Application uses the Python API classes galaxy.server.Server and galaxy.frame.Frame, which connect to the frame relay over a TCP socket]

galaxy.server.Server methods:
  Constructor(machine, port, ID)
  connect()
  processMessage(blocking)
  disconnect()

galaxy.frame.Frame methods:
  getAction()
  getAttribute(attr_name)
  getText()
  toString()
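A minimal client loop against the Python API listed above might look like the sketch below. Only the module, class, and method names come from the slide. The following are assumptions: `processMessage()` returns a `galaxy.frame.Frame` (or `None` when nothing arrived), and `getAttribute()` returns `None` for a missing key. The `respond()` logic is a hypothetical office-control handler that only prints its reply, since the slide shows no method for sending one back. `StubFrame` is a hypothetical stand-in that lets the handler be exercised without a running frame relay.

```python
def respond(frame) -> str:
    """Build a reply from one semantic frame (hypothetical office-control logic)."""
    action = frame.getAction()
    if action == "set":
        obj = frame.getAttribute("object")
        room = frame.getAttribute("room")
        value = frame.getAttribute("value")
        return f"Turning {value} the {obj} in the {room}."
    return f"Sorry, I can't handle '{action}' yet."


def run(machine: str, port: int, app_id: str) -> None:
    """Connect to the frame relay and handle frames until interrupted."""
    from galaxy.server import Server  # module and class named on the slide

    server = Server(machine, port, app_id)
    server.connect()
    try:
        while True:
            frame = server.processMessage(True)  # assumed: blocks, then returns a Frame
            if frame is not None:
                print(frame.getText(), "->", respond(frame))
    finally:
        server.disconnect()


class StubFrame:
    """Offline stand-in providing the two Frame methods respond() uses."""

    def __init__(self, action: str, **attrs: str) -> None:
        self._action, self._attrs = action, attrs

    def getAction(self) -> str:
        return self._action

    def getAttribute(self, attr_name: str):
        return self._attrs.get(attr_name)
```

Offline, `respond(StubFrame("set", object="lights", room="kitchen", value="on"))` returns `"Turning on the lights in the kitchen."`. With the Galaxy API installed, `run(machine, port, app_id)` would be called with the host, port, and ID supplied for the class.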