Speech Processing 11-492/18-492
Slides: tts.speech.cs.cmu.edu/courses/11492/slides/sds_intro_comp.pdf

Spoken Dialog Systems
Conversing with machines

Spoken Dialog Systems
- Not just ASR bolted onto TTS
- Different styles of interaction
  - Question/response systems
  - Mixed initiative systems
  - “How May I Help You?” open questions
  - True conversational machine-human interaction

SDS Overview
- Introduction
- Building simple dialog systems
- VoiceXML
  - A language for writing systems
- Beyond tree-based systems
- Beyond spoken language
- Non-task-oriented systems
- Real-world deployment considerations

SDS Applications
- Information giving/request
  - Flights, buses, stocks and weather
  - Driving directions
  - Answer questions, news
- Transactional
  - Reply to your email
  - Credit card and bank enquiries, product purchase
- Maintenance
  - Technical support
  - Customer service

SDS Applications
- Entertainment
  - Game characters (NPC), toys, robots
- Tutoring
  - Math, science
  - Language learning
- Health care
  - Depression screening
  - Aphasia therapy

Dialog Types
- System initiative
  - Form-filling paradigm
  - Can switch language models at each turn
  - Can “know” what is likely to be said
- Mixed initiative
  - Users can go where they like
  - System or user can lead the discussion
- Classifying
  - Users can say what they like
  - But really only “N” operations possible
  - E.g. AT&T “How may I help you?”

System Initiative
- Most common
  - Machine controls the call
  - Few choices in the dialog
- Simple form filling:
  - What is your bank account number
- Advantages:
  - You know what users will say (sort of)
  - Hard for user to get confused
  - Hard for system to get confused
  - Easy to build
- Disadvantages:
  - Limited flexibility in interaction
  - Fixed dialog structure
- Most reliable, but many turns
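
A minimal sketch of the form-filling paradigm, assuming a hypothetical two-slot form and a scripted list of replies standing in for ASR output. The slot names, prompts, and patterns are illustrative only. Each slot has its own pattern, which loosely mirrors the idea of switching language models at each turn because the system knows roughly what to expect.

```python
import re

# Hypothetical form: each slot has a prompt and a pattern the reply must match.
FORM = [
    ("account_number", "What is your bank account number?", re.compile(r"\d{6,12}")),
    ("amount", "How much would you like to transfer?", re.compile(r"\d+(?:\.\d{2})?")),
]

def run_form(replies):
    """Fill every slot in a fixed order; re-prompt when a reply does not match."""
    replies = iter(replies)
    filled, transcript = {}, []
    for slot, prompt, pattern in FORM:
        while slot not in filled:
            transcript.append(("system", prompt))
            reply = next(replies)  # stands in for a recognized user utterance
            transcript.append(("user", reply))
            match = pattern.search(reply)
            if match:
                filled[slot] = match.group(0)
    return filled, transcript

filled, transcript = run_form(["um what", "it is 12345678", "fifty", "50.00 please"])
for speaker, text in transcript:
    print(f"{speaker}: {text}")
print(filled)
```

The fixed slot order is what makes this style reliable, and the repeated prompts in the printed transcript show where the "many turns" come from.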

System Initiative
- Let’s Go Bus Information
  - 412 268 3526 (Anytime)
  - Provides bus information for Pittsburgh
- Tell Me
  - Company getting others to build systems
  - Stocks, weather, entertainment

Mixed Initiative
- User or system takes initiative
  - More interesting dialogs
  - “Jump” through different parts of dialog state
- Advantages
  - More realistic dialog
  - Can do more complex tasks
- Disadvantages
  - Can get confusing
  - Can miss important parts

Classification Dialogs
- Sort out from N things
  - User says “anything” and system directs them
  - Receptionist
    - I have a problem with my bill
    - What’s the area code for Miami
    - Did you know I can see the beach from here
- Advantages
  - (Apparently) complex understanding
  - Solves a very common task
- Disadvantages
  - Actually quite restrictive
  - Needs data to train from
  - Needs to be updated
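
A rough sketch of a classification dialog, assuming a hypothetical set of routes and a handful of made-up training utterances. It scores each route by the words that only that route has seen and falls back to an assumed "operator" route when nothing fits. A deployed router would use a trained statistical classifier on real call data; this word-overlap scheme is only a stand-in.

```python
# Hypothetical labeled utterances; a real router would be trained on call data.
TRAINING = [
    ("i have a problem with my bill", "billing"),
    ("there is a wrong charge on my bill", "billing"),
    ("what is the area code for miami", "directory"),
    ("i need the area code for boston", "directory"),
]

def train(examples):
    """Collect the vocabulary seen under each route."""
    vocab = {}
    for text, route in examples:
        vocab.setdefault(route, set()).update(text.split())
    return vocab

def classify(utterance, vocab, min_score=2):
    """Pick the route whose exclusive vocabulary overlaps the utterance most."""
    words = set(utterance.lower().replace("'s", " is").split())
    scores = {}
    for route, seen in vocab.items():
        others = set().union(*(v for r, v in vocab.items() if r != route))
        scores[route] = len(words & (seen - others))
    best = max(scores, key=scores.get)
    # Too little evidence for any of the N routes: hand over to a human.
    return best if scores[best] >= min_score else "operator"

vocab = train(TRAINING)
for text in ["I have a problem with my bill",
             "What's the area code for Miami",
             "Did you know I can see the beach from here"]:
    print(text, "->", classify(text, vocab))
```

The user can say anything, but the output is always one of a fixed set of routes, which is why the understanding is only apparently complex. New routes or new phrasings require new training data.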

Beyond Telephones
- Telematics
  - Voice communication in cars
  - GPS, music selection etc
- Web-based dialog systems
- Robot Interaction
  - Robot-robot and robot-human interaction
- Animated talking head
  - Non-player characters – web agents
- Speech to Speech translation
- CMU Skylar: integrating many dialog systems

Team Talk
- Using speech to control multiple robots
- Robots have names and distinct voices
- They report to each other and to you in voice

Other SDS
- Microsoft: Situated Interaction
  - Talking Head that follows you
- CMU SV: Aidas
  - Restaurant recommendations in situ

True conversation
- Requires more than just speech
  - Non-verbal noises: laughing, er, um, etc
  - Eye gaze
  - Proper timing (not waiting 500ms before speaking)
  - Back-channeling
  - Movement
  - Talking about nothing

Roboreceptionist
- Entrance to NSH
- Keyboard (no ASR)
- TTS, face, movement
- Range finder to detect people
- Significant background character
- Mostly talks about nothing

Personal Intelligent Systems
- Example: Apple Siri, Google Now, Microsoft Cortana, Amazon Echo, etc.
- Hub of all applications
- Extendable
- Personalization
- Cross-Language
- Cross-Cultural
- Future: interface -> true companion

Speech Processing 11-492/18-492
Spoken Dialog Systems
SDS components

Spoken Dialog Systems
- More than just ASR and TTS
  - Recognition
  - Language understanding
  - Manipulation of utterances
  - Generation of new information
  - Text generation
  - Synthesis

SDS Architecture
[Figure: block diagram with the components ASR, Language Understanding, Dialog Manager, Language Generation, and Synthesis. Additional labels: Error Handling Strategies, Non Understanding.]

SDS Internals
- Language Understanding
  - From words to structure
- Dialog Manager
  - State of dialog (who is talking)
  - Direction of dialog (what next)
  - References, user profile etc
  - Interaction with database/internet
- Language Generation
  - From structure to words
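
The three internal stages can be sketched as three small functions chained together. This is an illustration under stated assumptions, not the course's implementation: the keyword lists, the toy database, and the sentence template are all hypothetical, and "Example Bistro" is a made-up entry.

```python
# Hypothetical toy database; "Example Bistro" is an invented entry.
RESTAURANTS = [
    {"name": "Din Tai Fung", "food": "taiwanese", "price": "cheap"},
    {"name": "Boiling Point", "food": "taiwanese", "price": "cheap"},
    {"name": "Example Bistro", "food": "french", "price": "expensive"},
]

def understand(words):
    """Language understanding: from words to structure (keyword lookup)."""
    frame = {}
    for word in words.lower().split():
        if word in {"cheap", "expensive"}:
            frame["price"] = word
        elif word in {"taiwanese", "french"}:
            frame["food"] = word
    return frame

def manage(frame):
    """Dialog manager: consult the database and decide what to do next."""
    if "food" not in frame:
        return {"act": "request", "slot": "food"}
    names = [r["name"] for r in RESTAURANTS
             if all(r[k] == v for k, v in frame.items())]
    return {"act": "inform", "names": names, **frame}

def generate(action):
    """Language generation: from structure to words (fixed templates)."""
    if action["act"] == "request":
        return "What kind of food would you like?"
    if not action["names"]:
        return "Sorry, I found no matching places."
    return "{} {} eating places include {}.".format(
        action.get("price", "").capitalize(),
        action["food"].capitalize(),
        ", ".join(action["names"])).strip()

reply = generate(manage(understand("find a cheap eating place for taiwanese food")))
print(reply)
```

Keeping the stages separate means the dialog manager only ever sees structure, never raw words, so the understanding and generation parts can be swapped without touching the dialog logic.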

Language Understanding
- Parsing of SPEECH not TEXT
  - Eh, I wanna go, wanna go to Boston tomorrow
  - If it’s not too much trouble I’d be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you.
  - Boston, tomorrow

Parsing: Output structure
- “I wanna go to Boston, tomorrow”
  - Destination: BOS
  - Departure: 20081028, AM
  - Airline: unspecified
  - Special: unspecified
- Convert speech to structure
  - Sufficient for further processing/query
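
One way to picture the conversion from speech to structure is a keyword-spotting parser that ignores everything except the words it needs. The sketch below is an assumption-laden illustration: the airport table, the slot names, and the fixed reference date are hypothetical, and time of day is not handled.

```python
import re
from datetime import date, timedelta

AIRPORTS = {"boston": "BOS", "pittsburgh": "PIT"}  # hypothetical lookup table

def parse(utterance, today):
    """Spot the destination and the date; every other slot stays unspecified."""
    frame = {"destination": "unspecified", "departure": "unspecified",
             "airline": "unspecified", "special": "unspecified"}
    text = utterance.lower()
    # Check every "to <word>" pair, since "to" also appears in "able to aid".
    for word in re.findall(r"\bto (\w+)", text):
        if word in AIRPORTS:
            frame["destination"] = AIRPORTS[word]
            break
    if "tomorrow" in text:
        frame["departure"] = (today + timedelta(days=1)).strftime("%Y%m%d")
    return frame

today = date(2008, 10, 27)  # assumed reference date for the example
short = parse("Eh, I wanna go, wanna go to Boston tomorrow", today)
polite = parse("If it's not too much trouble I'd be very grateful if one might "
               "be able to aid me in arranging my travel arrangements to Boston, "
               "Logan airport, at sometime tomorrow morning, thank you.", today)
print(short)
print(polite)
```

Both the disfluent and the overly polite utterance reduce to the same frame, which is all the later query needs.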

Interaction Example
User: find a cheap eating place for taiwanese food
Intelligent Agent: Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.

SDS Process
[Figure: the User says "find a cheap eating place for taiwanese food" to the Intelligent Agent. Graph nodes: seeking, target, price, food. Edge labels: PREP_FOR, NN, AMOD.]

SDS Process
[Figure: the User says "find a cheap eating place for taiwanese food" to the Intelligent Agent. Graph nodes: seeking, target, price, food. Edge labels: PREP_FOR, NN, AMOD. Added labels: Organized Domain Knowledge; Ontology Induction (semantic slot).]

SDS Process
[Figure: the User says "find a cheap eating place for taiwanese food" to the Intelligent Agent. Graph nodes: seeking, target, price, food. Edge labels: PREP_FOR, NN, AMOD. Added labels: Organized Domain Knowledge; Ontology Induction (semantic slot); Structure Learning (inter-slot relation).]

SDS Process
[Figure: the User says "find a cheap eating place for taiwanese food" to the Intelligent Agent. Graph nodes: seeking, target, price, food. Edge labels: PREP_FOR, NN, AMOD. Slot values: seeking=“find”, target=“eating place”, price=“cheap”, food=“taiwanese”.]

SDS Process
[Figure: the User says "find a cheap eating place for taiwanese food" to the Intelligent Agent. Graph nodes: seeking, target, price, food. Edge labels: PREP_FOR, NN, AMOD. Slot values: seeking=“find”, target=“eating place”, price=“cheap”, food=“taiwanese”. Added label: Semantic Decoding.]

Automatic Slot Induction
Chen et al. ASRU’13
[Figure: the utterance "can i have a cheap restaurant" annotated with the frames capability, expensiveness, and locale by use. Each frame is a slot candidate marked as either domain or general.]

Parsing vs Language Model
- Language Model
  - Model what actually gets said
- Parsing
  - Extract the information you want
- Models *can* be shared
  - Only accept things in the grammar
  - Can be over limiting
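
The "over limiting" point can be illustrated with a small contrast, assuming a hypothetical one-sentence grammar written as a regular expression. The strict version accepts only utterances the grammar covers completely, as a shared grammar would. The spotting version just extracts the wanted information from wherever it appears. Both patterns are made up for this example.

```python
import re

# Hypothetical grammar for one sentence pattern, written as a regular expression.
GRAMMAR = re.compile(
    r"i (?:want to|wanna) go to (?P<city>\w+)(?: (?P<day>today|tomorrow))?")
# Hypothetical spotting pattern that only looks for the informative part.
SPOTTER = re.compile(r"go to (?P<city>\w+)(?: (?P<day>today|tomorrow))?")

def strict_parse(utterance):
    """Accept only utterances that the grammar covers from start to end."""
    match = GRAMMAR.fullmatch(utterance.lower())
    return match.groupdict() if match else None

def spot(utterance):
    """Extract the wanted information from anywhere in the utterance."""
    match = SPOTTER.search(utterance.lower())
    return match.groupdict() if match else None

clean = "I wanna go to Boston tomorrow"
spoken = "Eh, I wanna go, wanna go to Boston tomorrow"
print(strict_parse(clean))
print(strict_parse(spoken))  # rejected: the restart is not in the grammar
print(spot(spoken))
```

The strict grammar rejects the disfluent utterance even though it carries the same information, which is the cost of only accepting things in the grammar.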

Neural Networks for SLU
- RNN for Slot Filling
  - Step 1: word embedding
  - Step 2: short-term dependencies capturing
  - Step 3: long-term dependencies capturing
  - Step 4: different types of neural architecture
http://deeplearning.net/tutorial/rnnslu.html#rnnslu
Mesnil et al. 2013
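
Steps 1 and 2 can be sketched without any neural network library: map words to indices, look up an embedding per index, and give each word a window of neighbors so that short-term context is visible. This is a plain-Python illustration under assumptions. The function names, the padding index, and the toy 2-dimensional embedding values are invented here, and the recurrent layer of step 3 is not shown.

```python
PAD = -1  # index used for positions outside the sentence

def context_windows(indices, win):
    """Return one window of `win` word indices centred on each position."""
    assert win % 2 == 1 and win >= 1, "window size must be odd"
    half = win // 2
    padded = [PAD] * half + list(indices) + [PAD] * half
    return [padded[i:i + win] for i in range(len(indices))]

def embed_window(window, table, dim):
    """Concatenate the embeddings of a window; padding maps to zeros."""
    vector = []
    for idx in window:
        vector.extend(table[idx] if idx != PAD else [0.0] * dim)
    return vector

words = "find a cheap eating place".split()
vocab = {w: i for i, w in enumerate(words)}
# Toy 2-dimensional embeddings; a real model would learn these values.
table = [[float(i), float(i) / 10] for i in range(len(vocab))]
windows = context_windows([vocab[w] for w in words], win=3)
features = [embed_window(w, table, dim=2) for w in windows]
print(windows)
print(features[0])
```

Each feature vector would then be fed, one word at a time, into a recurrent layer that carries the longer-term context and predicts a slot label per word.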

Interactive Learning for SLU
- Luis: Interactive machine learning for language understanding
- Advantages:
  - Non-expert could add in knowledge in feature engineering
  - Active-learning reduces heavy labeling
https://www.luis.ai/
Williams et al. 2016

Dialog Manager
- Maintain state
  - Where are we in the dialog
  - Whose turn is it
    - Waiting for speaker
    - Waiting for database query (stall user)
- Deal with barge-in
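
The turn-tracking part of a dialog manager can be sketched as a small state machine. The state names, event names, and logged actions below are hypothetical, chosen only to illustrate the three concerns listed above: whose turn it is, stalling during a database query, and barge-in.

```python
from enum import Enum, auto

class Turn(Enum):
    SYSTEM_SPEAKING = auto()
    WAITING_FOR_USER = auto()
    WAITING_FOR_DATABASE = auto()

class DialogManager:
    """Track whose turn it is and react to events; event names are hypothetical."""

    def __init__(self):
        self.turn = Turn.SYSTEM_SPEAKING
        self.log = []

    def handle(self, event):
        if event == "user_speech" and self.turn is Turn.SYSTEM_SPEAKING:
            # Barge-in: the user talks over the prompt, so stop playback and listen.
            self.log.append("stop_prompt")
            self.turn = Turn.WAITING_FOR_USER
        elif event == "prompt_done" and self.turn is Turn.SYSTEM_SPEAKING:
            self.turn = Turn.WAITING_FOR_USER
        elif event == "user_done" and self.turn is Turn.WAITING_FOR_USER:
            # The backend query takes time, so stall the user while waiting.
            self.log.append("say: one moment please")
            self.turn = Turn.WAITING_FOR_DATABASE
        elif event == "db_result" and self.turn is Turn.WAITING_FOR_DATABASE:
            self.log.append("say: result")
            self.turn = Turn.SYSTEM_SPEAKING
        return self.turn

dm = DialogManager()
states = [dm.handle(e) for e in ["user_speech", "user_done", "db_result", "prompt_done"]]
print([s.name for s in states])
print(dm.log)
```

Events that do not fit the current state are ignored here, which keeps the sketch short. A real manager would also need timeouts and error handling for those cases.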