May 2006 CLINT-CS Verbmobil 1
CLINT-CS Dialogue II: Verbmobil
Verbmobil
• Verbmobil is a spoken dialogue system that provides phone users with simultaneous dialogue interpretation services for restricted topics.
• Recognises spoken input, translates it, and then utters the translation.
• Three languages: German, English and Japanese
Challenges for Speech and Language Technology
Difficulty increases from level 1 to level 3 along each dimension.
Level | Input Conditions | Naturalness | Adaptability | Dialogue Capabilities
1 | Close speaking, push-to-talk (PTT) | Isolated words | Speaker dependent | Monologue dictation
2 | Telephone, pause based segmentation | Read continuous speech | Speaker independent | Information seeking dialogue
3 | Open microphone, GSM quality | Spontaneous speech | Speaker adaptive | Multiparty negotiation
Grand Challenges
• Not a push-to-talk system. Has to decide for itself when user input is complete.
• Spontaneous speech including disfluencies and repair phenomena.
• Speaker adaptive.
• Mixed initiative dialogue.
• Three different domains of discourse.
Domains
Scenario | Domain | Questions | Focus | Vocabulary
1 | Appointment Scheduling | When? | Temporal expressions | 2.5-6K
2 | Travel Planning | When? Where? How? | Temporal and spatial expressions | 7-10K
3 | Remote PC Maintenance | What? When? Where? How? | Integration of special sublanguage lexica | 15-30K
Data Collection
A significant programme of data collection was performed to extract statistical properties of different kinds of data:
• Transliterated speech data
• Segmented speech with prosodic labels
• Dialogues annotated with dialogue acts
• Treebanks & predicate-argument structures
• Aligned bilingual corpora
Speech Data
• Multi channel recording
– close-speaking microphone
– room microphone
– various telephones
• Speech recognisers trained on data sets of different audio quality
Multi Level Data Annotation
• Speech Data
– Transliteration
– Orthography
– Pronunciation
– Phonological Segmentation
– Word Segmentation
– Prosodic Segmentation
• Non Speech
– Dialogue Acts
– Treebanks
Statistical Models
• Data used to train different statistical models using Machine Learning.
• Models include
– Neural Networks
– Probabilistic Automata (HMMs for speech)
– Probabilistic CFGs (robust parsing)
– Probabilistic Transfer Rules
Architecture
• Different input devices (microphone, telephone, mobile, internet)
• Multilingual speech recognition (EN, DE, JP) including prosodic analysis
• Parsing
• Multi-level translation
• Multi-lingual generation
Multi Engine Parsing Architecture
• Three different parsing models are employed
– Probabilistic LR Parser
– Robust Chunk Parsing
– HPSG Chart Parser
• All parsing models produce trees that are transformed into the same multistratal representation, called VIT (Verbmobil Interface Terms)
• This facilitates integration of partial results from the different parsing models
Translation Models
• Substring Based
• Template Based
• Dialogue Act Based
Substring Based Translation
• Starts with the best sentence hypothesis of the speech recogniser
• Uses prosodic information to determine phrase boundaries and sentence mode
• Machine Learning methods applied to a sentence-aligned bilingual corpus
• The output of this module is a sequence of words in the target language together with a confidence measure that is used for selecting the best translation.
Template Based Translation
• Based on 30K translation templates learned from a sentence-aligned corpus
$T_i = (T_i^s, T_i^t)\{x_1, \ldots, x_n\}$
• 3 phases:
– SL template matching
– Subphrase translation
– TL utterance generation
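The three phases can be illustrated with a toy example. This is a minimal sketch, not the Verbmobil implementation: the single template pair, the tiny phrase lexicon and all function names are made up for illustration, and a regular expression stands in for whatever matching procedure the real system used.

```python
import re

# Hypothetical template pair (source, target) sharing the variables x1 and x2.
TEMPLATES = [
    ("does {x1} at {x2} suit you", "passt es Ihnen {x1} um {x2}"),
]

# Hypothetical subphrase lexicon used to translate the variable fillers.
PHRASES = {"monday": "am Montag", "ten": "zehn Uhr"}


def match_template(source, utterance):
    """Phase 1: match a source-language template and return variable bindings."""
    parts = []
    for token in source.split():
        if token.startswith("{") and token.endswith("}"):
            # A variable slot becomes a named group that captures the filler.
            parts.append(f"(?P<{token[1:-1]}>.+?)")
        else:
            parts.append(re.escape(token))
    match = re.fullmatch(r"\s+".join(parts), utterance)
    return match.groupdict() if match else None


def translate(utterance):
    """Run all three phases; return None if no template matches."""
    for source, target in TEMPLATES:
        bindings = match_template(source, utterance.lower())
        if bindings is None:
            continue
        # Phase 2: translate each bound subphrase (unknown fillers pass through).
        translated = {var: PHRASES.get(text, text) for var, text in bindings.items()}
        # Phase 3: generate the target-language utterance from the target template.
        return target.format(**translated)
    return None


print(translate("does monday at ten suit you"))
```

An utterance that matches no source template yields no translation, which corresponds to the "No Translation" outcome a template-based engine can produce.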
Template Translation Results
Result | WL Best Hypothesis | All Word Lattice
Perfect Translation | 47% | 67%
Approx. Correct | 16% | 6%
Bad Translation | 15% | 5%
No Translation | 22% | 22%
Multi Engine Translation
[Figure: each input segment is passed to five translation engines (case based translation, substring based translation, statistical translation, dialogue based translation, semantic transfer); a selection module picks one result per segment.]
Segment 1: If you prefer another hotel (selected: Semantic Xfer)
Segment 2: please let me know (selected: CBT)
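A per-segment selection step of this kind can be sketched as follows. The sketch assumes that every engine returns a candidate with a confidence score and that these scores are comparable across engines, which a real system would have to ensure. The `Candidate` class, the scores and the placeholder translations are hypothetical, chosen only so that the outcome mirrors the example on the slide.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    engine: str        # name of the translation engine
    text: str          # proposed translation (placeholder strings here)
    confidence: float  # assumed to be comparable across engines


def select(candidates_per_segment):
    """Pick the highest-confidence candidate for each segment."""
    return [max(cands, key=lambda c: c.confidence) for cands in candidates_per_segment]


# Hypothetical candidates for the two segments; all scores are invented.
segments = [
    [
        Candidate("semantic transfer", "<segment 1 via semantic transfer>", 0.9),
        Candidate("case based", "<segment 1 via case based translation>", 0.6),
    ],
    [
        Candidate("semantic transfer", "<segment 2 via semantic transfer>", 0.4),
        Candidate("case based", "<segment 2 via case based translation>", 0.8),
    ],
]

chosen = select(segments)
print([c.engine for c in chosen])
```

Selecting per segment rather than per utterance lets a deep engine handle the part it can analyse while a shallower engine covers the rest.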
Dialogue Act Based Translation
• Meaning based translation
• Statistical classification of 19 dialogue acts.
• Extraction of propositional content using finite state transducers.
• Content built from an ontology covering appointment scheduling and travel planning tasks.
• Template based approach to generation of target language from content.
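The flow from utterance to dialogue act, propositional content and generated output can be sketched end to end. Everything here is a stand-in under stated assumptions: keyword rules replace the statistical classifier, regular expressions replace the finite state transducers, and the slot names, day lexicon and German template are hypothetical.

```python
import re


def classify_act(utterance):
    """Stand-in for the statistical dialogue act classifier (crude keyword rules)."""
    text = utterance.lower()
    if text.startswith(("how about", "what about")):
        return "suggest"
    if "thank" in text:
        return "thank"
    return "inform"


# Regular expressions standing in for finite state transducers.
DAY = re.compile(r"\b(monday|tuesday|wednesday|thursday|friday)\b")
HOUR = re.compile(r"\bat (\d{1,2})\b")


def extract_content(utterance):
    """Extract a flat propositional content (hypothetical slots: day, hour)."""
    text = utterance.lower()
    content = {}
    day = DAY.search(text)
    if day:
        content["day"] = day.group(1)
    hour = HOUR.search(text)
    if hour:
        content["hour"] = int(hour.group(1))
    return content


GERMAN_DAYS = {"monday": "Montag", "tuesday": "Dienstag", "wednesday": "Mittwoch",
               "thursday": "Donnerstag", "friday": "Freitag"}

# One hypothetical generation template per dialogue act.
TEMPLATES = {"suggest": "Passt es Ihnen am {day} um {hour} Uhr?"}


def translate(utterance):
    """Classify, extract, then generate; None if the act or content is not covered."""
    act = classify_act(utterance)
    content = extract_content(utterance)
    template = TEMPLATES.get(act)
    if template is None or "day" not in content or "hour" not in content:
        return None
    return template.format(day=GERMAN_DAYS[content["day"]], hour=content["hour"])


print(translate("How about Monday at 10?"))
```

Because the output is generated from the act and its content rather than from the source words, this route tolerates recognition errors in words that carry no content.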
Part of Ontology for Propositional Content
[Figure: ontology tree rooted at "top". Node labels: object, situation, quality; agent, location; abstract, concrete; event, action; journey, move, stay, show, meeting; move by public transport; move-by-rail, move-by-plane.]
Dialogue Act Hierarchy
DialogueAct
– control dialogue: greet, bye, introduce, thank, deliberate
– manage task: init, defer, close
– promote task: request, suggest, inform, feedback, commit, offer
   – request: request suggest, request clarify, request comment, request commit
   – inform: digress, exclude, clarify, justify
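A hierarchy like this maps naturally onto a nested dictionary. The sketch below assumes the grouping that the slide labels suggest (which acts sit under "request" and "inform" is a reading of the figure, not something stated in the text), and `path_to` is a hypothetical helper for looking up the chain of ancestors of an act.

```python
# Dialogue act hierarchy as nested dicts; the grouping is an assumed reading of the figure.
HIERARCHY = {
    "DialogueAct": {
        "control dialogue": {
            "greet": {}, "bye": {}, "introduce": {}, "thank": {}, "deliberate": {},
        },
        "manage task": {"init": {}, "defer": {}, "close": {}},
        "promote task": {
            "request": {
                "request suggest": {}, "request clarify": {},
                "request comment": {}, "request commit": {},
            },
            "suggest": {},
            "inform": {"digress": {}, "exclude": {}, "clarify": {}, "justify": {}},
            "feedback": {},
            "commit": {},
            "offer": {},
        },
    }
}


def path_to(act, tree=HIERARCHY, prefix=()):
    """Return the path from the root to `act` as a tuple, or None if absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == act:
            return path
        found = path_to(act, children, path)
        if found:
            return found
    return None


print(path_to("digress"))
```

One use of such a path is backing off to a coarser class (for example from "digress" to "inform") when a fine-grained act is recognised with low confidence.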
Dialogue-Based Translation: Transfer Component
[Figure: transfer rules map the semantic representation of the source language (VIT) to the semantic representation of the target language (VIT), supported by dialogue and context evaluation; the target language VIT is passed to generation.]
Prosody
• Input
– Speech signal
– Word Hypothesis Graph (WHG)
• Output
– Annotated WHG including, per word: duration, pitch, energy, pause info
• Used to classify phrase and clause boundaries, accented words, and sentence mood.
Prosody – Sentence Mood
[Figure: pitch plotted against time for the same words. "You are coming tomorrow?" ends with rising pitch; "You are coming tomorrow." ends with falling pitch.]
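The contrast between the two contours can be turned into a toy classifier. This is only a heuristic sketch, not the classifier used in Verbmobil: it assumes a list of pitch values in Hz for the voiced frames of one utterance and compares the mean pitch of the last few frames with the mean of the rest. The function name, the `tail` parameter and the sample contours are invented for illustration.

```python
def sentence_mood(pitch_hz, tail=3):
    """Guess 'question' if pitch rises at the end of the utterance, else 'statement'.

    pitch_hz: pitch values in Hz for successive voiced frames (assumed input format).
    tail:     number of final frames treated as the end of the utterance.
    """
    if len(pitch_hz) < 2 * tail:
        raise ValueError("contour too short for the chosen tail length")
    tail_mean = sum(pitch_hz[-tail:]) / tail
    body_mean = sum(pitch_hz[:-tail]) / (len(pitch_hz) - tail)
    return "question" if tail_mean > body_mean else "statement"


# Invented contours: one rising at the end, one falling at the end.
rising = [120, 118, 115, 117, 140, 165, 190]
falling = [130, 128, 125, 120, 110, 100, 90]

print(sentence_mood(rising), sentence_mood(falling))
```

The word sequence is identical in both cases, so only the prosodic evidence can separate the question from the statement.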
Use of Prosodic Information
• Prosodic information is used systematically at all processing stages
• Prosodic differences can lead to different translations, e.g. "… wir haben noch" ("we still have" vs. "we have another")
Multi Blackboard Architecture
• Final system comprises 69 highly interactive modules.
• No direct communication between modules.
• Communication is handled by 198 blackboards.
• Shared representation structures
• A module typically subscribes to several blackboards.
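The idea of modules that never call each other and only communicate through blackboards can be sketched with a small publish/subscribe class. This is a single-process, synchronous illustration under simplifying assumptions; the `BlackboardSystem` class, the three toy modules and the string payloads are hypothetical and say nothing about how the 69 modules were actually connected.

```python
from collections import defaultdict


class BlackboardSystem:
    """Named blackboards; modules subscribe to boards and publish items to them."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._contents = defaultdict(list)

    def subscribe(self, board, callback):
        self._subscribers[board].append(callback)

    def publish(self, board, item):
        self._contents[board].append(item)
        # Notify every module subscribed to this board.
        for callback in list(self._subscribers[board]):
            callback(item)

    def read(self, board):
        return list(self._contents[board])


bb = BlackboardSystem()


# Toy modules: each reads from one board and writes to another, never to a module.
def recogniser(audio):
    bb.publish("WHG", f"whg({audio})")


def prosody(whg):
    bb.publish("WHG with prosodic labels", f"prosody({whg})")


def parser(labelled_whg):
    bb.publish("VIT", f"vit({labelled_whg})")


bb.subscribe("Audio Data", recogniser)
bb.subscribe("WHG", prosody)
bb.subscribe("WHG with prosodic labels", parser)

bb.publish("Audio Data", "utterance-1")
print(bb.read("VIT"))
```

Because a module only knows board names, a parser can be added, replaced or run alongside others without changing any other module.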
Blackboards & Modules
[Figure: modules connected through blackboards.]
Modules: command recogniser, spontaneous speech recogniser, speaker adaptation, prosodic analysis, statistical parser, chunk parser, HPSG parser, dialogue act recognition, semantic construction, robust dialogue semantics, semantic transfer, generation
Blackboards: Audio Data, WHG with prosodic labels, VIT discourse representation
Multi Engine Approach
[Figure: the augmented WHG is analysed by the statistical parser, the chunk parser and the HPSG parser; their results form a chart containing partial VITs; robust dialogue semantics with knowledge-based reconstruction turns these into a complete and spanning VIT.]
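Building one spanning analysis out of partial results from several parsers can be sketched as a best-path search over a chart. The sketch assumes each partial result covers a contiguous word span and carries an additive score (for example a log-probability); the tuple layout, the scores and the dynamic-programming strategy are illustrative assumptions, not the reconstruction method used in Verbmobil.

```python
def best_spanning(partials, n):
    """Choose non-overlapping partial analyses that cover words 0..n with maximal score.

    partials: list of (start, end, score, label) with 0 <= start < end <= n.
    Returns (total_score, chosen_partials), or None if no full cover exists.
    """
    best = {0: (0.0, [])}  # best[pos] = best cover of the words before pos
    for pos in range(n):
        if pos not in best:
            continue  # no cover reaches this position
        score_so_far, chosen = best[pos]
        for partial in partials:
            start, end, score, _label = partial
            if start != pos:
                continue
            candidate = (score_so_far + score, chosen + [partial])
            if end not in best or candidate[0] > best[end][0]:
                best[end] = candidate
    return best.get(n)


# Invented chart for a six-word utterance; scores are made up.
chart = [
    (0, 3, 2.0, "HPSG parser"),
    (3, 6, 1.5, "chunk parser"),
    (0, 6, 3.0, "statistical parser"),
    (3, 6, 1.0, "statistical parser"),
]

total, chosen = best_spanning(chart, 6)
print(total, [label for _, _, _, label in chosen])
```

In this invented chart the combination of a deep analysis for the first half and a chunk analysis for the second half outscores the single analysis that spans the whole utterance.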
Achievements
• 3 language pairs, 3 domains and a vocabulary size of over 100K word forms
• Average processing time 4x original signal duration
• Word recognition rate of 75% for spontaneous speech
• 80% approximately correct translations
• 90% success rate for dialogue tasks in end-to-end evaluation
Conclusion
• Speech to speech translation of spontaneous dialogues can only be cracked by combining deep and shallow processing
• The final architecture maximises the necessary interaction between processing modules
• Software engineering considerations must be taken seriously in such a project.