Post on 16-Dec-2015
RRL: A Rich Representation Language
for the Description of Agent Behaviour in NECA
Paul Piwek, ITRI, Brighton
Brigitte Krenn, OFAI, Vienna
Marc Schröder, DFKI, Saarbrücken
Martine Grice, IPUS, Saarbrücken
Stefan Baumann, IPUS, Saarbrücken
Hannes Pirker, OFAI, Vienna
NECA
Duration: 2.5 years
Start: October 2001
A new generation of mixed multi-user / multi-agent virtual spaces for the internet
Populated by affective conversational agents
Affective Conversational Agents
• Express themselves through
– Emotional speech
– Synchronised non-verbal expression
Application Scenarios
The NECA Platform will be evaluated in two concrete application scenarios:
• Socialite – a multi-user web application in the social domain
• eShowRoom – a novel approach to the presentation of products in e-Commerce applications
Socialite
NECA’s Architecture
[Diagram, built up over several slides: the NECA processing pipeline. Each module is listed with the output it passes on; the interfaces between modules are labelled RRL.]
• User Input
• Scene Generator and Affective Reasoner (AR): Scene Description
• Multi-modal Natural Language Generator (M-NLG): Multi-modal Output
• Text/Concept to Speech Synthesis (CTS): Phonetic+Prosodic Information, Emotional Speech
• Gesture Assignment Module (GA): Animation directives
• Player-Specific Rendering: Animation Control Sequence
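As a rough illustration of the architecture, the chain of modules can be written down as an ordered list of stages, each paired with the document it hands on. This is only a sketch: the ordering is read off the slide labels, and the `Stage` class, `PIPELINE` list and `interfaces` function are hypothetical names, not part of NECA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    module: str    # module name as shown on the architecture slide
    produces: str  # the output the module hands on to the next one


# Assumption: this ordering is inferred from the slide labels.
PIPELINE = [
    Stage("Scene Generator + Affective Reasoner (AR)", "Scene Description"),
    Stage("Multi-modal Natural Language Generator (M-NLG)", "Multi-modal Output"),
    Stage("Text/Concept to Speech Synthesis (CTS)", "Phonetic+Prosodic Information"),
    Stage("Gesture Assignment Module (GA)", "Animation directives"),
    Stage("Player-Specific Rendering", "Animation Control Sequence"),
]


def interfaces(pipeline):
    """Return (producer, document, consumer) for each hand-over between neighbours."""
    return [(a.module, a.produces, b.module) for a, b in zip(pipeline, pipeline[1:])]


for producer, document, consumer in interfaces(PIPELINE):
    print(f"{producer} --[{document}]--> {consumer}")
```

Each hand-over listed by `interfaces` is a point where a shared representation language is needed, which is the role RRL plays in the slides.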
Requirements for RRL
• Application Domain
– Represent combinations of different types of information
– Expressivity
• Processing Modules
– Ease of manipulation/search (incremental/fast)
• Developers (Maintainability)
– Predictability
– Locality
– Conciseness
– Intelligibility
Scene Description
SG | M-NLG | GA | TTS/CTS
What is a Scene?
I Theatr. 1 A subdivision of (an act of) a play, in which the time is continuous and the setting fixed, …; the action and dialogue comprised in any one of these subdivisions. (New Shorter Oxford English Dictionary, 1996)
Scene Descriptions in a Nutshell
• Network representations:
– Flat, uniform
– Use the Description Logic distinction between T-box and A-box; the T-box defines types, subtypes, attributes and constants
– Can emulate context-free grammars (CFGs), so we can include, e.g., semantic representation languages such as Discourse Representation Theory (Kamp & Reyle, 1994)
– Reification of expressions in the network provides useful handles for interleaving different types of information
– Lend themselves well to graphical representation
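A minimal sketch of the T-box / A-box split and of reified handles, under stated assumptions: the type names, attribute names and node identifiers below are invented for illustration and are not taken from the RRL specification. The T-box declares types, subtypes and attributes; the A-box is a flat list of typed nodes and links, and the node `d1` acts as a handle on a piece of semantic content that other links can point to.

```python
# T-box (hypothetical): each type has a supertype and the attributes it introduces.
TBOX = {
    "entity":    {"super": None,     "attrs": set()},
    "event":     {"super": "entity", "attrs": {"time"}},
    "utterance": {"super": "event",  "attrs": {"speaker", "content"}},
    "agent":     {"super": "entity", "attrs": {"name"}},
    "drs":       {"super": "entity", "attrs": {"condition"}},
}


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or inherits from it."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = TBOX[type_name]["super"]
    return False


def allowed_attrs(type_name):
    """Collect the attributes a type declares or inherits."""
    attrs = set()
    while type_name is not None:
        attrs |= TBOX[type_name]["attrs"]
        type_name = TBOX[type_name]["super"]
    return attrs


# A-box (hypothetical): flat, uniform assertions about individual nodes.
ABOX_TYPES = {"u1": "utterance", "a1": "agent", "d1": "drs"}
ABOX_LINKS = [
    ("u1", "speaker", "a1"),
    ("u1", "content", "d1"),   # d1 is a reified handle on the semantic content
    ("u1", "time", "t0"),      # temporal information attached to the same node
    ("a1", "name", "agent_a"),
]


def check(abox_types, abox_links):
    """Return the links whose attribute is not licensed by the T-box."""
    return [
        (node, attr, value)
        for node, attr, value in abox_links
        if attr not in allowed_attrs(abox_types[node])
    ]


print(check(ABOX_TYPES, ABOX_LINKS))
```

Because every statement is a link between named nodes, information of a different kind (emotion, gesture, timing) could be attached to `u1` or `d1` without restructuring what is already there.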
Scene Descriptions in a Nutshell
• Further Features of (RRL) Scene Descriptions
– For communication between modules: XML syntax
– Temporal relations are explicitly represented
– Meta-conditions used in DRT for WH-questions, Topics and Bridging Anaphora
eShowRoom Example
Multimodal Output
• Multimodal Natural Language Generation (M-NLG) supplies
– Information on emotional state
– Conceptually rich input for Speech Synthesis
– Initial specification of gestures and facial expressions for later use in Gesture Assignment
NECA’s Speech Synthesis: Emotions
• Not restricted to prosody (pitch, duration)
• Several voice databases
– diphone inventories for different voice qualities (modal, loud, soft)
• Emotive interjections
• Gradual emotional states
– Shades of emotion / changing over time
NECA’s Speech Synthesis: Concept-to-Speech
• Concept-to-Speech instead of Text-to-Speech approach:
– Part-of-speech tags
– Syntactic structure
– Information status (given/new)
– Information structure (theme/rheme)
CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
</sentence>
CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <word text="This" pos="PDAT"/>
  <word text="car" pos="NN"/>
  <word text="has" pos="VAFIN"/>
  <word text="leather seats" pos="NN"/>
  <punct text="." pos="$."/>
</sentence>
CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <synPhrase category="NP" function="SB">
    <word text="This" pos="PDAT"/>
    <word text="car" pos="NN"/>
  </synPhrase>
  <synPhrase phrase="VP" function="PD">
    <word text="has" pos="VAFIN"/>
    <synPhrase phrase="NP" function="OA">
      <word text="leather seats" pos="NN"/>
    </synPhrase>
    <punct text="." pos="$."/>
  </synPhrase>
</sentence>
CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <synPhrase category="NP" function="SB">
    <word text="This" pos="PDAT"/>
    <infoStatus type="referent-given">
      <word text="car" pos="NN"/>
    </infoStatus>
  </synPhrase>
  <synPhrase phrase="VP" function="PD">
    <word text="has" pos="VAFIN"/>
    <synPhrase phrase="NP" function="OA">
      <word text="leather seats" pos="NN"/>
    </synPhrase>
    <punct text="." pos="$."/>
  </synPhrase>
</sentence>
CTS-specific information
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <infoStruct part="theme">
    <synPhrase category="NP" function="SB">
      <word text="This" pos="PDAT"/>
      <infoStatus type="referent-given">
        <word text="car" pos="NN"/>
      </infoStatus>
    </synPhrase>
    <infoStruct part="rheme">
      <synPhrase phrase="VP" function="PD">
        <word text="has" pos="VAFIN"/>
        <synPhrase phrase="NP" function="OA">
          <word text="leather seats" pos="NN"/>
        </synPhrase>
        <punct text="." pos="$."/>
      </synPhrase>
    </infoStruct>
  </infoStruct>
</sentence>
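To show how a consumer such as a speech synthesiser might read this markup, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The XML is the fully annotated example sentence from the slide, with tag order kept as shown there; the function name `annotated_words` and the choice to report the innermost enclosing `infoStruct` are assumptions made for illustration, not a description of the NECA implementation.

```python
import xml.etree.ElementTree as ET

# The annotated example sentence, copied from the slide.
SENTENCE = """\
<sentence>
  <text>This car has leather seats.</text>
  <gesture modality="voice" meaning="beautiful"/>
  <infoStruct part="theme">
    <synPhrase category="NP" function="SB">
      <word text="This" pos="PDAT"/>
      <infoStatus type="referent-given">
        <word text="car" pos="NN"/>
      </infoStatus>
    </synPhrase>
    <infoStruct part="rheme">
      <synPhrase phrase="VP" function="PD">
        <word text="has" pos="VAFIN"/>
        <synPhrase phrase="NP" function="OA">
          <word text="leather seats" pos="NN"/>
        </synPhrase>
        <punct text="." pos="$."/>
      </synPhrase>
    </infoStruct>
  </infoStruct>
</sentence>
"""


def annotated_words(element, part=None, status=None):
    """Yield (text, pos, info-structure part, info status) for each <word>.

    Hypothetical helper: the innermost enclosing <infoStruct> and
    <infoStatus> determine the values reported for a word.
    """
    if element.tag == "infoStruct":
        part = element.get("part")
    elif element.tag == "infoStatus":
        status = element.get("type")
    if element.tag == "word":
        yield element.get("text"), element.get("pos"), part, status
    for child in element:
        yield from annotated_words(child, part, status)


words = list(annotated_words(ET.fromstring(SENTENCE)))
for row in words:
    print(row)
```

A plain text-to-speech system would see only the content of `<text>`; the extra columns are the conceptual information that a concept-to-speech system can use, for example when deciding which words to accent.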
Prosodic/Phonetic Information for GA
• Phonetics
– exact timing of speech sounds, pauses and interjections
• Prosody
– boundary locations for
  • syllables
  • words
  • prosodic phrases
Prosodic/Phonetic Information for GA
– information on:
  • syllables bearing word-stress
  • position and type of sentence accents
  • position and type of prosodic boundaries
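The kind of timing information listed above can be sketched as follows: given phoneme durations grouped into syllables and words, the start and end time of every unit follows by accumulation. All phoneme symbols and durations below are invented for illustration, and `timeline` is a hypothetical helper, not the format actually produced by the NECA synthesiser.

```python
# Assumption: words -> syllables -> (phoneme, duration in ms); values are invented.
WORDS = [
    ("This",    [[("D", 50), ("I", 70), ("s", 90)]]),
    ("car",     [[("k", 80), ("A:", 160)]]),
    ("leather", [[("l", 60), ("E", 80)], [("D", 50), ("@", 70)]]),
]


def timeline(words):
    """Return (unit, label, start_ms, end_ms) for every phoneme, syllable and word."""
    t = 0
    out = []
    for word, syllables in words:
        word_start = t
        for syllable in syllables:
            syllable_start = t
            for phoneme, duration in syllable:
                out.append(("phoneme", phoneme, t, t + duration))
                t += duration
            label = "".join(p for p, _ in syllable)
            out.append(("syllable", label, syllable_start, t))
        out.append(("word", word, word_start, t))
    return out


for row in timeline(WORDS):
    print(row)
```

With boundaries expressed on one shared time axis, a later module can look up when a given word or syllable starts without re-analysing the audio.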
Animation directives
• Phonetic information (phonemes) used for specifying
– Visemes
– Breathing
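A minimal sketch of deriving a viseme track from timed phonemes. The phoneme-to-viseme table and the viseme names are hypothetical, and merging adjacent identical visemes is an illustrative design choice rather than something stated in the slides.

```python
# Hypothetical mapping from phoneme symbols to mouth shapes.
PHONEME_TO_VISEME = {
    "p": "closed", "b": "closed", "m": "closed",
    "A:": "open", "I": "spread", "s": "narrow",
}


def viseme_track(timed_phonemes):
    """Map (phoneme, start_ms, end_ms) to visemes, merging adjacent repeats."""
    track = []
    for phoneme, start, end in timed_phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")  # fallback for unknown phonemes
        if track and track[-1][0] == viseme and track[-1][2] == start:
            # Same mouth shape continues: extend the previous entry.
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track


example = [("m", 0, 60), ("b", 60, 110), ("A:", 110, 270), ("x", 270, 300)]
print(viseme_track(example))
```

The same timed phoneme sequence drives both the audio and the mouth animation, which is what keeps the two synchronised.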
Animation directives
• Prosodic information (stress, accents, phrasing) used for specifying
– synchronization of gestures with speech
– eye-blinking
– gaze
Conclusions
• RRL is a representation language for the wide range of expert knowledge required at the interfaces of NECA modules.
• Scene Descriptions: uniform representation/integration of different types of information (illustrated with integration of DRT); using handles;…
• Speech Synthesis: conceptually rich input as opposed to plain text
• Gesture Assignment: access to exact timing of speech