ESSLLI 2001, Helsinki
Languages for the Annotation and Specification of Dialogues
(updated 31-Oct-2001)
Gregor Erbach
Course Outline
1. Introduction to Spoken Dialogue Systems
2. Linguistic Resources in SDS
3. Developing Spoken Dialogue Applications
4. Annotation of Dialogues
– Uses of annotated dialogues
– Levels of annotation, multilevel annotation
– Annotation Graphs
– Annotation Frameworks (ATLAS)
5. Introduction to XML
6. Dialogue Annotation in XML (MATE)
Outline (2)
7. Evaluation of Spoken Dialogue Systems
8. Dialogue Specification Languages
– Behaviouristic Models (pattern-response)
– Finite-State Models
– Slot-Filling
– Condition-Action Rules (HDDL)
– Planning
– Re-usable Dialogue Behaviours: SpeechObjects
9. VoiceXML
10. Research Challenges
1. Spoken Dialogue Systems
• Human-machine dialogue differs from human-human dialogue:
– limited natural-language understanding
– limited vocabulary
– limited back-channel
– limited world knowledge and inference capabilities
– limited social and emotional competence
– speech recognition errors
• The design and implementation of dialogue systems is a discipline at the border between science and engineering
Dialogue System Architecture

(Architecture diagram: speech understanding feeds dialogue control, which drives speech output; dialogue control is also connected to the application logic / reasoning component and to a database / knowledge base)
Dialogue Modelling
(Diagram: the Interaction Model comprises a Language Model and a Dialogue Model)
(from Bernsen, Dybkjær and Dybkjær, 1998)
Speech and Audio Processing
Speech Understanding
Signal processing:
– Convert the audio wave into a sequence of feature vectors
Speech recognition:
– Decode the sequence of feature vectors into a sequence of words
Semantic interpretation:
– Determine the meaning of the recognized words
Speech Output
Speech generation:
– Generate marked-up word string from system semantics
Speech synthesis:
– Generate synthetic speech from a marked-up word string
Automatic Speech Recognition (ASR)
• Research activities since the 1950s
• In widespread commercial use for a number of years, enabled by increased processor power, larger memory and better software engineering
• Speech recognisers can be implemented on PCs as software-only applications
ASR Fundamentals
• Digitisation of the acoustic signal
• Signal analysis: distribution of acoustic energy over time and frequency, represented as feature vectors
• Matching against stored patterns (acoustic models)
• Selection of the best pattern by using linguistic knowledge and world knowledge
Signal Analysis
(Output of the speech analysis tool PRAAT)
Challenges in ASR
• Speaker-independent recognition
• Variation of speakers (age, dialect, diseases ...)
• Vocabulary size
• Continuous speech
• Spontaneous speech
• Background noise
• Distorted signal transmission
Difficulty vs. Vocabulary Size
(Chart: task difficulty plotted against vocabulary size, from 10 to 1M words; voice dialling and device control sit at the small-vocabulary end, dialogue systems in the middle, dictation systems at the large-vocabulary end)
The Speech Recognition Problem
• Bayes’ Law
– P(a,b) = P(a|b) P(b) = P(b|a) P(a)
– Joint probability of a and b = probability of b times the probability of a given b
• The Recognition Problem
– Find most likely sequence w of “words” given the sequence of acoustic observation vectors a
– Use Bayes’ law to create a generative model
– ArgMax_w P(w|a) = ArgMax_w P(a|w) P(w) / P(a) = ArgMax_w P(a|w) P(w), since P(a) does not depend on w
• Acoustic Model: P(a|w)
• Language Model: P(w) (from Carpenter and Chu-Carroll, 1998)
Pronunciation Modelling
• Needed for speech recognition and synthesis
• Maps orthographic representation of words to sequence(s) of phones
• A dictionary cannot cover the whole language, due to:
– open classes
– names
– inflectional and derivational morphology
• Pronunciation variation can be modelled with multiple pronunciations and/or acoustic mixtures
• If multiple pronunciations are given, estimate likelihoods
• Use rules (e.g. assimilation, devoicing, flapping), or statistical transducers
(from Carpenter and Chu-Carroll, 1998)
Language Modelling
• Assigns probability P(w) to a word sequence w = w1, w2, …, wk
• The chain rule provides a history-based model:
P(w1, w2, …, wk) = P(w1) P(w2|w1) P(w3|w1,w2) … P(wk|w1,…,wk-1)
• Cluster histories to reduce number of parameters
(from Carpenter and Chu-Carroll, 1998)
N-Gram Language Modelling
• The n-gram assumption clusters histories based on the last n-1 words
– P(wj|w1,…,wj-1) ≈ P(wj|wj-n+1,…,wj-1)
– unigrams: P(wj)
– bigrams: P(wj|wj-1)
– trigrams: P(wj|wj-2, wj-1)
• Trigrams are often interpolated with bigram and unigram estimates:
P̂(w3|w1,w2) = λ3 F(w3|w1,w2) + λ2 F(w3|w2) + λ1 F(w3)
– the λi are typically estimated by maximum likelihood estimation on held-out data (the F(·|·) are relative frequencies)
– many other interpolations exist (another standard is a non-linear backoff)
(from Carpenter and Chu-Carroll, 1998)
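A minimal sketch of the interpolated trigram estimate, on an invented toy corpus; the interpolation weights λ are simply assumed here, not estimated on held-out data as the slide describes.

```python
from collections import Counter

# Toy corpus standing in for training data; counts give the relative
# frequencies F(.|.). The lambda weights are assumed for illustration.
corpus = "the cat sat on the mat the cat ran".split()
uni = Counter(corpus)
bi = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))

def p_interp(w1, w2, w3, lams=(0.6, 0.3, 0.1)):
    """Interpolated trigram estimate:
    P^(w3|w1,w2) = l3*F(w3|w1,w2) + l2*F(w3|w2) + l1*F(w3)."""
    l3, l2, l1 = lams
    f_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    f_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    f_uni = uni[w3] / len(corpus)
    return l3 * f_tri + l2 * f_bi + l1 * f_uni
```

Note how the unigram term keeps the estimate non-zero for trigrams never seen in training, which is the point of interpolation.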
Recognition Grammars
• Restrict the possible user inputs at each step of the dialogue
• Restriction of possible inputs is necessary for speaker-independent systems to improve recognition accuracy
• Recognition grammars in commercial dialogue systems are generally regular or context-free grammars
• Dynamically generated grammars can be used which are adapted to the state of the dialogue
• Closed grammars match user input from beginning to end
• Open grammars match parts of the user input
Finite-State Language Models
• Write a finite-state task grammar (with non-recursive CFG)
• Simple Java Speech API example (from user’s guide):
public <Command> = [<Polite>] <Action> <Object> (and <Object>)*;
<Action> = open | close | delete;
<Object> = the window | the file;
<Polite> = please;
• Typically assume that all transitions are equi-probable
• Technology used in most current applications
• Can put semantic actions in the grammar
(from Carpenter and Chu-Carroll, 1998)
Java Speech Grammar Format
• Java Speech Grammar Format (JSGF) is a widely used format for recognition grammars
<xyz> Grammatical Category xyz
* Repetition (0 to n times)
+ Repetition (1 to n times)
(...) Grouping
[...] Grouping, optional
| Alternatives
/n/ Alternative with weight n
Recognition Grammar in JSGF
#JSGF V1.0 ISO8859-1 en;
grammar com.acme.commands;
<basicCmd> = <startPolite> <command> <endPolite>;
<command> = <action> <object>;
<action> = /10/ open |/2/ close |/1/ delete |/1/ move;
<object> = [the | a] (window | file | menu);
<startPolite> = (please | kindly | could you | oh mighty computer) *;
<endPolite> = [ please | thanks | thank you ];
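Since this grammar is regular, it can be hand-translated into a regular expression for illustration. The Python sketch below tests acceptance only; the JSGF weights (/10/ etc.) are ignored.

```python
import re

# Hand-translation of the JSGF <basicCmd> grammar above into a regex.
polite_start = r"(?:(?:please|kindly|could you|oh mighty computer)\s+)*"
action = r"(?:open|close|delete|move)"
obj = r"(?:(?:the|a)\s+)?(?:window|file|menu)"
polite_end = r"(?:\s+(?:please|thanks|thank you))?"
basic_cmd = re.compile(polite_start + action + r"\s+" + obj + polite_end)

def accepts(utterance):
    """True iff the utterance is in the language of <basicCmd>."""
    return basic_cmd.fullmatch(utterance) is not None
```

A real recogniser compiles such a grammar into a finite-state network over acoustic models rather than matching text, but the accepted language is the same.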
Word hypothesis graphs
• Keep multiple tokens and return n-best paths/scores:
– p1: flights from Boston today
– p2: flights from Austin today
– p3: flights for Boston to pay
– p4: lights for Boston to pay
• Can produce a packed word graph (a.k.a. lattice)
– likelihoods of paths in lattice should equal likelihood for n-best
(Lattice diagram: arcs labelled "flights"/"lights", "from"/"for", "Boston"/"Austin", "today"/"to pay")
(from Carpenter and Chu-Carroll, 1998)
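A word graph can be represented as a plain adjacency list. This sketch, with invented nodes and arc probabilities, enumerates complete paths best-first, which is one way to read n-best hypotheses off a lattice.

```python
# A packed word graph (lattice) as an adjacency list:
# node -> list of (next_node, word, arc probability).
# Nodes and probabilities are invented for illustration.
lattice = {
    0: [(1, "flights", 0.8), (1, "lights", 0.2)],
    1: [(2, "from", 0.6), (2, "for", 0.4)],
    2: [(3, "boston", 0.7), (3, "austin", 0.3)],
    3: [],
}

def nbest(graph, node=0, prefix=(), p=1.0):
    """Enumerate complete paths with their probabilities, best first."""
    if not graph[node]:
        return [(p, " ".join(prefix))]
    paths = []
    for nxt, word, q in graph[node]:
        paths += nbest(graph, nxt, prefix + (word,), p * q)
    return sorted(paths, reverse=True)

hyps = nbest(lattice)
```

Since the arc probabilities out of each node sum to one, the path probabilities sum to one as well, matching the slide's requirement that lattice likelihoods equal the n-best likelihoods.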
Measuring Recognition Performance
• Word Error Rate = (Insertions + Deletions + Substitutions) / Words
• Example scoring:
– actual utterance: four six seven nine three three seven
– recognizer output: four oh six seven five three seven (one insertion, one substitution, one deletion)
– WER: (1 + 1 + 1)/7 = 43%
• Would like to study concept accuracy
– typically count only errors on content words [application dependent]
– ignore case marking (singular, plural, etc.)
• For word/concept spotting applications:
– recall: percentage of target words (concept) found
– precision: percentage of hypothesized words (concepts) in target
(from Carpenter and Chu-Carroll, 1998)
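Word error rate as defined above can be computed with a standard Levenshtein alignment over words; this sketch reproduces the slide's example.

```python
def wer(ref, hyp):
    """WER = (insertions + deletions + substitutions) / reference words,
    computed via Levenshtein alignment over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(h) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# The slide's example: 1 insertion + 1 substitution + 1 deletion = 3/7
ref = "four six seven nine three three seven"
hyp = "four oh six seven five three seven"
```

The alignment also yields which words were inserted, deleted or substituted, which is what concept-accuracy scoring filters down to content words.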
Dictation vs. Dialogue System
• Speaker dependence
– Dialogue system: speaker-independent
– Dictation system: speaker-dependent or speaker-adaptive (must be trained for each speaker)
• Vocabulary size
– Dialogue system: several thousand words, of which a subset is active
– Dictation system: up to 100,000 words, always active
• Nature of the user input
– Dialogue system: only certain patterns are recognised at each step
– Dictation system: unrestricted, including complex sentences
Speaker Verification
• Speaker verification: confirm the claimed identity of a speaker
• Speaker identification: recognition of one speaker among a group of potential candidates
• Evaluation by means of the "false acceptance" and "false rejection" rates
• One measure can be improved at the expense of the other
• For high-security applications, speaker verification should be combined with other methods (password, chip card, biometrics...).
2. Linguistic Resources for Dialogue Systems
• Acoustic Models
• Phonetic Lexicon
• Language Models (Grammars)
• Dialogue Specifications
• System Output (Prompts)
• Training data: annotated human/human or human/machine dialogues
Acoustic Models
• Tri-phone HMMs
• transcribed speech used for training
• orthographic transcriptions + noise markers + phonetic lexicon
• SpeechDat is a standard format for transcription. Each audio file is associated with a label file which contains the transcription plus information about the speaker (age, sex, education level) and the call (telephone network, environment)
SPEECHDAT Label File

LHD: SAM, 6.0
DBN: SpeechDat_Austrian_Mobile
VOL: MOBIL1AT_01
SES: 0099
DIR: \MOBIL1AT\BLOCK00\SES0099
SRC: B10099C2.ATA
CCD: C2
BEG: 0
END: 63487
REP: Connect Austria, Vienna
RED: 02/Jan/2000
RET: 16:15:45
SAM: 8000
SNB: 1
SSB: 8
QNT: A-LAW
SCD: 000099
SEX: F
AGE: 22
ACC: NOE
REG: Wien
ENV: HOME
NET: MOBILE, A1
PHM: UNKNOWN, EFR
SHT: 600-0663
EDU: MATURA
NLN: DE-AT
ASS: OK
LBD:
LBR: 0,63487,,,,0354/329 851
LBO: 0,,63487,[sta] null drei fünf vier drei zwei neun acht fünf eins
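SpeechDat label files are essentially sequences of "KEY: value" lines, so a minimal reader is straightforward; the excerpt below repeats a few keys from the label file above.

```python
# Minimal reader for SpeechDat-style label files ("KEY: value" lines).
label_text = """LHD: SAM, 6.0
SAM: 8000
SNB: 1
SSB: 8
QNT: A-LAW
SEX: F
AGE: 22
ENV: HOME
NET: MOBILE, A1"""

def parse_labels(text):
    labels = {}
    for line in text.splitlines():
        if ":" in line:
            # split at the first colon only; values may contain commas
            key, _, value = line.partition(":")
            labels[key.strip()] = value.strip()
    return labels

info = parse_labels(label_text)
```

Such a dictionary makes it easy to select training material by speaker or channel properties (e.g. all female speakers recorded over the mobile network).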
Phonetic Lexicon
• The phonetic lexicon consists of pairs <orthography, phonetic-representation+>, where the phonetic symbols correspond to the acoustic models used in the speech recogniser
• Phonetic lexicons are also used for text-to-speech synthesis.
• Example (with SAMPA transcriptions):
Abkehr a p k e: 6
Abkommen a p + k O m @ n a p k O m @ n
Abkommens a p k O m @ n s
Ablauf a p l aU f
Ablegers a p l e: g 6 s
Language Models
• Two kinds of language models are widely used: statistical language models and recognition grammars
• Statistical LMs are generally used for dictation systems
• Recognition grammars are often used for speaker-independent dialogue systems
• Recognition grammars are often finite-state models, or non left-recursive context-free grammars
• Statistical LMs and recognition grammars can be combined (e.g. Philips, Nuance 8)
• Language models can be trained or optimised using text corpora or transcriptions of dialogues
Dialogue Specifications
• Dialogue specifications are used to control the flow of the dialogue
• Dialogue specifications can be expressed
– as executable code in some programming language
– as a task model
– in some dialogue specification language
• Dialogue specifications must provide repair strategies to deal with recognition failures and unacceptable user input
System Output (Prompts)
• Prompts are the speech output provided to the user of the dialogue system
• Prompts should
– be clear and understandable
– encourage the user to produce system-friendly speech input
– convey the personality chosen for the system
• Other audio sounds ("earcons") can be used in addition to prompts to provide orientation
• Prompts can be pre-defined, constructed by concatenation of partial prompts, or produced by an NL generator
Speech Output
• Recorded vs. synthesised speech
• Recorded speech has higher user acceptance
• Ensure smooth transitions and appropriate prosody when concatenating recorded speech
• In case of large or highly variable vocabulary, speech synthesis must be used.
• Speech synthesisers are evaluated according to intelligibility and naturalness.
Training data: annotated dialogues
• Transcribed speech data (not necessarily dialogues) for training of speech recogniser
• Text data (ideally transcriptions of dialogues from a running application) for training of language models and/or optimization of recognition grammars
• Labelled dialogues to determine the likely sequence of dialogue acts (dialogue grammar)
• Dialogues labelled with communication failures and emotional markup for optimizing dialogue specifications
• Annotated dialogues as a resource for system evaluation
3. Developing Spoken Dialogue Applications
• Conflicting requirements: system "intelligence" vs. control of the dialogue flow
• Imperfections of speech recognition (errors are the rule, not the exception)
• Limited "understanding" of user utterances (out of vocabulary, out of grammar)
• Dialogue system must take the initiative after dialogue failure and try to recover from the errors
• Personality of the dialogue application
Development Process
1. Requirements specification
2. Definition of dialogue flow
3. Rapid prototyping or Wizard-of-Oz Experiment (outputs: annotated dialogues, questionnaires, interviews)
4. Pilot system with basic functionality
5. Internal Tests
6. Transcription and annotation of dialogues
7. Optimisation of system functionality
8. Tests with external users
9. Extension and tuning of the system
10. If system performance is not yet satisfactory: go to 5
Tasks and Roles
• Gather requirements and produce requirement specification (Analyst)
• Specify dialogue flow (Dialogue Designer)
• Define prompts (Interaction Designer)
• Write and optimise recognition grammars (Grammar Writer)
• Usability testing with "real" users (Usability Tester)
• Transcribe and annotate dialogues from usability testing and deployed application (Annotator)
• Test and optimize grammars, language models and dialogues (Quality Assurance Engineer, Grammar Writer, Dialogue Designer)
• System Integration (Software Engineer)
Dialogue Initiative
1. System initiative
for systems that are not regularly used by the same users
2. User initiative
experienced users can issue commands without system prompts
3. Mixed initiative
e.g., for user questions or activation of help functionality
Over-answering of questions by the user
Barge-in
• "Barge-In" is the interruption of system output by user input
• Advantages:
– Possibility to interrupt long system outputs (e.g. timetable information, reading of e-mails)
– Faster answering of system questions for regular users
• Problems:
– Interruption of system output through background noise or side speech (to or from colleagues or children)
– Echo cancellation required to avoid activation of barge-in by system output
Verification of User Input
• Verification is the confirmation of user input by the system, with a possibility of correction
• Explicit Verification: User must confirm the input explicitly, usually by saying "yes" or "no"
• Implicit Verification: The user's input is repeated, and accepted if the user does not contradict.
Repair Strategies
• Misunderstandings and communication problems are common in human-human and in human-machine dialogues
• Repair strategies are used for recovering from communication failure.
• The relatively poor performance of speech recognisers causes many misunderstandings
• Repair strategies must therefore be part of every practical dialogue system
Causes of Communication Problems
• No speech detected (volume too low)
• Failure to detect beginning or end of speech accurately (endpointing)
• Misrecognitions or no recognition results due to
– background noise
– distorted speech transmission (microphone, phone line)
– out-of-vocabulary words
– out-of-grammar input
– speaker variation
– lack of semantic interpretation
Engineering Issues
• Telephony interfaces (analog, digital, Voice over IP)
• Speech signal quality
• Echo cancellation
• Scalability to hundreds or thousands of concurrent dialogues
• Real-time speed
• System integration
Speech Recognition API
• Abstract from recognition/synthesis engines
• Recognizer and synthesizer loading
• Acoustic and grammar model loading (dynamic updates)
• Recognition
– online
– n-best or lattice
• Synthesis
– markup
– barge in
• Acoustic control
– telephony interface
– microphone/speaker interface
(from Carpenter and Chu-Carroll, 1998)
Case Study: Postal Rates System
• Technical Basis
– Philips SpeechMania with Austrian German acoustic models
– Windows NT 4.0
– Pentium III (Minimum: Pentium II, 300 MHz)
– 256 MB RAM (Minimum: 64 MB)
– ISDN card
• Co-operation between Forschungszentrum Telekommunikation Wien (ftw.) and the Austrian Post Office
Case Study: System Requirements
• Mostly casual users
• System initiative
• Over-answering is allowed
• No Barge-In
• Speed of Interaction is important
Example Dialogue (1)
S1 (System): Möchten Sie ein Paket oder eine Briefsendung verschicken? [Would you like to send a parcel or a letter?]
B2 (Benutzer): eine Briefsendung [a letter]
S3: Möchten Sie | den Brief | ins Inland oder ins Ausland verschicken? [Would you like to send the letter within the country or abroad?]
B4: ins Ausland [abroad]
S5: In welches Zielland wollen Sie | den Brief | schicken? [To which destination country do you want to send the letter?]
B6: nach Italien [to Italy]
S7: Soll der Brief | nach Italien | als Priority versendet werden? [Should the letter to Italy be sent priority?]
B8: nein danke bitte ich richtig (nein danke, das ist nicht nötig) [garbled recognition; the parenthesised text is presumably the actual utterance: "no thanks, that is not necessary"]
S9: Bitte geben Sie das Gewicht des Briefes an [Please state the weight of the letter]
Example Dialogue (2)
B10: Niue Chile Sudan Bermuda zwanzig Gramm (na ich schätz so zwei hundert zwanzig Gramm) [garbled recognition; presumably actually said: "well, I guess about two hundred and twenty grams"]
S11: nach Italien | sind folgende Sonderbehandlungen zulässig: Einschreiben, Eilsendung und Nachnahme. Wollen Sie eine dieser Sonderbehandlungen nutzen, wenn ja, welche? [For Italy the following special services are available: registered mail, express delivery and cash on delivery. Do you want to use one of these services, and if so, which?]
B12: nein richtig bitte kostet hier (nein das ist mir zu kompliziert) [garbled recognition; presumably actually said: "no, that is too complicated for me"]
S13: Danke für Ihre Angaben [Thank you for the information]
S14: Ein Brief | nach Italien | 20 | Gramm schwer, der | priority | versendet wird, kostet | 7 | Schilling [A letter to Italy weighing 20 grams, sent priority, costs 7 Schilling]
S15: Benötigen Sie eine weitere Auskunft? [Do you need any further information?]
B16: nein danke [no thanks]
4. Dialogue Annotation
• Purpose of dialogue annotation
– Linguistic description and analysis on different levels
– Resources for conversation analysis (sociological, socio-linguistic research)
– Resources for system engineering (acoustic models, language models)
– Resources for application development (Prompts, recognition grammars, dialogue design)
– Resources for system evaluation
Annotation Schemas
• Corpus Encoding Standard
• MATE
• ATLAS
• DAMSL
The MATE project provides a good overview of annotation schemas
Spoken Dialogue Corpora
• Human-Human
– Call Home (spontaneous telephone speech)
– Map Task (direction giving on a map)
– Switchboard (task-oriented human-human dialogues)
– Childes (child language dialogues)
– Verbmobil (appointment scheduling dialogues)
– TRAINS (task-oriented dialogues in railroad freight domain)
• Human-Machine
– Danish Dialogue System (57 dialogues, domestic flight reservation)
– Philips (13500 dialogues, train timetable information)
– Sundial (100 Wizard of Oz dialogues, British flight information)
Audio Properties in Corpora
• Sampling rate (samples/sec, Hz)
• Audio resolution (bit)
• Linear vs. logarithmic coding (A-law, µ-law)
• Mono vs. Stereo
• Type of microphone and recording environment
• Audio coding / compression
Map Task Corpus
• Map Task is a cooperative task involving two participants who sit opposite one another and each has a map which the other cannot see
• One speaker (Instruction Giver) has a route marked on her map; the other speaker (Instruction Follower) has no route
• Speakers are told that the goal is to reproduce the Instruction Giver's route on the Instruction Follower's map
• Speakers know that the maps are not identical
• 128 digitally recorded unscripted dialogues and 64 citation-form readings of lists of landmark names
• Transcriptions and a wide range of annotations are available as XML documents
• Separation of corpus and annotation
Levels of Annotation
• phonetic / phonological / orthographic
• prosody
• morphology / syntax / semantics
• co-reference
• dialogue acts
• turn-taking
• cross-level
• acoustic (noise, phone line characteristics)
• communication problems
• speech recognition results (human-machine dialogues)
Dialogue Acts
• Dialogue Moves (MapTask)
• Six initiating moves
– instruct - commands the partner to carry out an action
– explain - states information which has not been elicited by the partner
– check - requests the partner to confirm information
– align - checks the attention or agreement of the partner
– query-yn - asks a question which takes a "yes" or "no" answer
– query-w - any query which is not covered by the other categories
• One pre-initiating move
– ready - a move which occurs after the close of a dialogue game and prepares the conversation for a new game to be initiated
Dialogue Acts (2)

• Five response moves:
– acknowledge - a verbal response which minimally shows that the speaker has heard the move to which it responds
– reply-y - any reply to any query with a yes-no surface form which means "yes", however that is expressed
– reply-n - a reply to a query with a yes/no surface form which means "no"
– reply-w - any reply to any type of query which doesn't simply mean "yes" or "no"
– clarify - a repetition of information which the speaker has already stated, often in response to a check move
Dialogue Grammars
• Capture sequencing regularities in dialogue (adjacency pairs)
• Capture the fact that questions are generally followed by answers, proposals by acceptances, and so on
• Dialogues are a collection of such act sequences, with embedded sequences for digressions and repairs
• Dialogue grammars can be used to predict the next dialogue act of the user
Dialogue Acts (DAMSL)

(DAMSL dialogue act taxonomy shown as figures, omitted)
4. Dialogue Annotation
Cross-Level Annotation
• Cross-level annotation provides links between different levels of annotation
• Useful for annotation of communication problems, which can be caused by phenomena in other levels (e.g. morphosyntax, coreference …)
• XML IDs and references provide a mechanism for annotation of cross-level phenomena
Annotation Graphs
• 'Linguistic Annotation' covers any descriptive or analytical notations applied to raw language data
• The basic data may be in the form of time functions - audio, video and/or physiological recordings - or it may be textual
• Annotation graphs focus on the logical structure of linguistic annotations, not on file formats
• Annotation graphs provide a common conceptual core for a wide variety of existing annotation formats
(Bird and Liberman, 2001)
Annotation Graphs: formal definition
• An annotation graph G over a label set L and timelines (Ti, ≤i) is a 3-tuple <N, A, τ> consisting of a node set N, a collection A of arcs labelled with elements of L, and a partial time function τ : N → ∪i Ti, which satisfies the following conditions:
1. <N, A> is a labelled acyclic digraph containing no nodes of degree zero.
2. For any path from node n1 to n2 in A, if τ(n1) and τ(n2) are defined, then there is a timeline i such that τ(n1) ≤i τ(n2).
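The two conditions can be checked mechanically on a toy example. In this sketch the arcs, labels and times are invented, and a single timeline is assumed; acyclicity is verified with Kahn's algorithm, and monotonicity of anchored times over the transitive closure of the arc relation.

```python
# Toy annotation graph: arcs are (start_node, end_node, label); the
# partial time function anchors some nodes on a single timeline.
arcs = [(1, 2, "W/flights"), (2, 3, "W/from"), (3, 4, "W/boston")]
times = {1: 0.0, 2: 0.31, 4: 1.04}   # node 3 is unanchored

def is_valid(arcs, times):
    nodes = {n for (s, e, _) in arcs for n in (s, e)}
    succ = {n: [e for (s, e, _) in arcs if s == n] for n in nodes}
    # Condition 1: acyclicity via Kahn's algorithm (nodes are taken
    # from arcs, so zero-degree nodes cannot occur here).
    indeg = {n: 0 for n in nodes}
    for (_, e, _) in arcs:
        indeg[e] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if removed != len(nodes):
        return False          # a cycle exists
    # Condition 2: anchored times must be non-decreasing along paths;
    # compute reachability by naive transitive closure.
    reach = {(s, e) for (s, e, _) in arcs}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(reach):
            for (c, d) in list(reach):
                if b == c and (a, d) not in reach:
                    reach.add((a, d))
                    changed = True
    return all(times[a] <= times[b]
               for (a, b) in reach if a in times and b in times)
```

Leaving node 3 unanchored shows why the condition is stated over paths between *defined* time points: intermediate nodes may carry no time at all.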
5. Introduction to XML
• XML = Extensible Markup Language
• successor of SGML
• W3C standard
• very versatile; used for markup of texts, data interchange, databases, description of chemical structures, annotation of dialogues (e.g., MATE), specification of dialogues (e.g., VoiceXML) among others
• can describe any tree structure with complex node labels
• description of graph structures with identifiers and references
• An XML document consists of entities, elements and attributes
History of XML
(Diagram: SGML, drawing on (La)TeX and hyperlinking, gave rise to HTML and XML; XML in turn underlies XHTML, VoiceXML and MATE)
XML Elements and Attributes
• Elements delimit sections of documents
<phone>
  <country>49</country>
  <city>30</city>
  <number>345077</number>
  <ext>62</ext>
</phone>
• Attributes add information to elements
<phone type="mobile">
  <country name="de">
    <net operator="vi" type="gsm1800">179</net>
    <number status="secret">1238189</number>
  </country>
</phone>
ID and ID reference
• An ID attribute uniquely identifies an XML element
<person id="123">
  <name><first>Tony</first><last>Blair</last></name>
</person>
• An ID reference points to an element identified by an ID
<government>
  <prime_minister idref="123"/>
  <defense_minister idref="321"/>
</government>
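With Python's standard library, the ID/IDREF mechanism can be resolved by building an index of id attributes. The fragments from the slide are wrapped in a single root element here so the snippet parses as one document.

```python
import xml.etree.ElementTree as ET

# Wrap the two slide fragments in one root so the document is well-formed.
doc = ET.fromstring("""<doc>
  <person id="123"><name><first>Tony</first><last>Blair</last></name></person>
  <government><prime_minister idref="123"/></government>
</doc>""")

# Index every element carrying an id attribute
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

def resolve(el):
    """Follow an element's idref attribute to its target element."""
    return by_id.get(el.get("idref"))

pm = resolve(doc.find(".//prime_minister"))
```

This is exactly the mechanism MATE's move-level markup uses to point at timed units, only with href expressions instead of bare idref attributes.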
XML DTD
• The DTD (document type definition) is a grammar that defines valid XML documents
<!ELEMENT PHONE (COUNTRY,NET, NUMBER)>
<!ATTLIST PHONE
type CDATA #IMPLIED>
<!ELEMENT COUNTRY EMPTY>
<!ATTLIST COUNTRY
name CDATA #REQUIRED>
<!ELEMENT NET (#PCDATA)>
<!ATTLIST NET
operator CDATA #IMPLIED
type CDATA #IMPLIED>
6. Dialogue Annotation in XML
MATE Annotation Tools
• MATE addresses the problems of creating, acquiring, and maintaining language corpora
1. through the development of a standard for annotating resources
2. through the provision of tools which make the processes of knowledge acquisition and extraction more efficient
• MATE treats spoken dialogue corpora at multiple levels, focusing on prosody, (morpho-) syntax, co-reference, dialogue acts, and communicative difficulties, as well as cross-level interaction.
MATE Timed-Unit File
<timed_unit_stream id="xyz">
  <tu id="q1ec1g.1" start="0.0000" end="0.3294" utt="1">okay</tu>
  <tu id="q1ec1g.4" start="0.3294" end="0.8432" utt="1">starting</tu>
  <tu id="q1ec1g.5" start="0.8432" end="1.3702" utt="1">off</tu>
  <sil id="q1ec1g.6" start="1.3702" end="1.5777"/>
  <tu id="q1ec1g.7" start="1.5777" end="1.8413" utt="1">we</tu>
  <tu id="q1ec1g.8" start="1.8414" end="2.2201" utt="1">are</tu>
  <sil id="q1ec1g.9" start="2.2201" end="2.3518"/>
  <tu id="q1ec1g.10" start="2.3518" end="2.8722" utt="1">above</tu>
  <sil id="q1ec1g.11" start="2.8722" end="2.9644"/>
  <tu id="q1ec1g.12" start="2.9644" end="3.0369" utt="1">a</tu>
  <tu id="q1ec1g.13" start="3.0369" end="3.5244" utt="1">caravan</tu>
  <tu id="q1ec1g.14" start="3.5244" end="3.9394" utt="1">park</tu>
  <noi id="q1ec1g.16" start="3.9394" end="4.2885" type="nonvocal"/>
  <sil id="q1ec1g.17" start="4.2885" end="4.5784"/>
  <noi id="q1ec1g.18" start="4.5784" end="4.8617" type="lipsmack"/>
  <noi id="q1ec1g.19" start="4.8617" end="5.3492" type="breath"/>
</timed_unit_stream>
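The timed-unit format parses directly with Python's standard xml.etree.ElementTree; this sketch recovers the words and their timings from a short excerpt of the stream above.

```python
import xml.etree.ElementTree as ET

# Short excerpt of the MATE timed-unit stream shown above.
xml_text = """<timed_unit_stream id="xyz">
<tu id="q1ec1g.1" start="0.0000" end="0.3294" utt="1">okay</tu>
<tu id="q1ec1g.4" start="0.3294" end="0.8432" utt="1">starting</tu>
<sil id="q1ec1g.6" start="1.3702" end="1.5777"/>
</timed_unit_stream>"""

stream = ET.fromstring(xml_text)
# Collect (word, start, end) for the timed units; sil/noi are skipped.
words = [(tu.text, float(tu.get("start")), float(tu.get("end")))
         for tu in stream.findall("tu")]
total_speech = sum(end - start for _, start, end in words)
```

The same pattern extends to sil and noi elements, which is how durations of silence and noise can be tallied per dialogue.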
MATE Timed Unit DTD
<!ELEMENT timed_unit_stream (tu|sil|noi)*>
<!ATTLIST timed_unit_stream
 id ID #REQUIRED>
<!ELEMENT tu (#PCDATA)>
<!ATTLIST tu
 id ID #REQUIRED
 start CDATA #REQUIRED
 end CDATA #REQUIRED
 utt CDATA #IMPLIED
 realisation CDATA #IMPLIED>
<!ELEMENT sil EMPTY>
<!ATTLIST sil
 id ID #REQUIRED
 start CDATA #REQUIRED
 end CDATA #REQUIRED
 utt CDATA #IMPLIED>
MATE Timed Unit DTD (2)
<!-- noi: a noise of some kind -->
<!ELEMENT noi EMPTY>
<!ATTLIST noi
 id ID #REQUIRED
 start CDATA #REQUIRED
 end CDATA #REQUIRED
 utt CDATA #IMPLIED
 type (lipsmack|outbreath|inbreath|breath|laugh|nonvocal|
       phongesture|unintelligible|lowamp|cough|external) #REQUIRED>

<!-- Types of noises:
 lipsmack       - smacking of lips and related lip/tongue clicks
 outbreath      - an out breath
 inbreath       - an in breath
 breath         - a breath (not distinguished between in/out)
 laugh          - laughter
 nonvocal       - non vocal tract noise
 phongesture    - mere phonatory gesture: phonation without phonemes
 unintelligible - unintelligible, but apparently meant to be a word
 lowamp         - trailing low amplitude noise at the end of a word
 cough          - a cough
 external       - other -->
Dialogue Moves Markup in MATE
<move id="q1ec1.g.move.1" who="giver" label="ready" href="&gfile;#id(q1ec1g.1)"/>
<move id="q1ec1.g.move.2" who="giver" label="instruct" href="&gfile;#id(q1ec1g.4)..id(q1ec1g.14)"/>
<ims id="q1ec1.g.move.3.5" who="giver" href="&gfile;#id(q1ec1g.16)..id(q1ec1g.19)"/>
The ID references (href) refer to the timed units on the previous slide
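Such "id(a)..id(b)" ranges can be resolved against the document order of the timed units; a sketch (ids as in the timed-unit example):

```python
# Resolve a MATE-style range "id(a)..id(b)" to the inclusive list of
# unit ids between its endpoints, in document order.
unit_ids = ["q1ec1g.1", "q1ec1g.4", "q1ec1g.5", "q1ec1g.6", "q1ec1g.7"]

def resolve_range(first, last, ids):
    """Return all unit ids from `first` to `last`, inclusive."""
    i, j = ids.index(first), ids.index(last)
    return ids[i:j + 1]

print(resolve_range("q1ec1g.4", "q1ec1g.6", unit_ids))
# ['q1ec1g.4', 'q1ec1g.5', 'q1ec1g.6']
```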
MATE Dialogue Moves DTD
<!ELEMENT move_stream (move|ims)*>
<!ATTLIST move_stream
 id ID #REQUIRED>
<!ELEMENT move (tu|sil|noi)*>
<!ATTLIST move
 id ID #REQUIRED
 who (giver | follower) #REQUIRED
 label (instruct | explain | check | query-yn | query-w | align |
        reply-y | reply-n | reply-w | acknowledge | clarify |
        ready | uncodable) "uncodable"
 meta (true-meta | false-meta) "false-meta"
 aban (true-aban | false-aban) "false-aban"
 rep (other | self | none) "none"
 interj (true-interj | false-interj) "false-interj"
 cont (true-cont | false-cont) "false-cont">
MATE Dialogue Moves DTD (2)
<!ELEMENT ims (sil|noi)*>
<!ATTLIST ims
id ID #REQUIRED
who (giver | follower) #IMPLIED
%embedHyperlinkAttrs;>
Architectures for Annotation
[Figure: two architectures for annotation. In the two-level architecture, the annotation, query, conversion, extraction and evaluation tools operate directly on the data formats (XML, text files, relational database). In the three-level architecture (Bird and Liberman 2001), the same tools access the data through an intermediate AG API.]
7. Evaluation of Spoken Dialogue Systems
• Blackbox evaluation: the overall system performance is judged, but not that of its internal components
– Examples: task success, contextual appropriateness, user satisfaction
• Glassbox evaluation: system components are evaluated
– Examples: Word Accuracy, Concept Accuracy
• Subjective measures (e.g. user satisfaction) require human judgement; objective measures do not
Evaluation: Turing Test
• Invented by computer science pioneer Alan Turing
• A system passes the Turing test if a human interlocutor cannot distinguish between human and machine
• The Turing test does not really test system intelligence or appropriate dialogue behaviour, but rather how far a system can get away with simple behaviours
• The winning systems of the Loebner Prize often simulate paranoid or otherwise pathological dialogue behaviour
• It is not used as a serious evaluation method
Evaluation Measures
• word (concept) accuracy (objective)
  Interpretation: proportion of the user's input words (domain concepts) that are correctly recognised
  Purpose: evaluate the performance of the speech recogniser (and language models)
• task success rate (objective)
  Interpretation: proportion of transactions that are successfully completed by the user
  Purpose: evaluate the system's usability
• productivity (objective)
  Interpretation: time required for completing a transaction
  Purpose: evaluate efficiency for the user
• user satisfaction (subjective)
  Interpretation: measure of the overall impression of a system
  Purpose: evaluate the user's impression
Explaining User Satisfaction
• PARADISE evaluation research project at AT&T Labs
• Different objective evaluation metrics are used
– task success
– word accuracy
– dialogue cost (number of turns, number of repairs, etc.)
• Linear regression analysis is used to determine the relative contribution of the different objective criteria to subjective user satisfaction
• The method can be applied to whole dialogues and to subdialogues
• Different dialogue strategies can be compared
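The regression step can be sketched with ordinary least squares (pure Python, a single predictor for brevity; the data values are invented for illustration and are not PARADISE results):

```python
# Fit user satisfaction as a weighted function of an objective measure
# (here: per-dialogue task success) by least squares. PARADISE uses
# several predictors; one suffices to show the idea.
def ols(xs, ys):
    """Slope and intercept minimising the squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

task_success = [0.5, 0.7, 0.8, 0.9, 1.0]   # invented success rates
satisfaction = [2.0, 3.0, 3.5, 4.0, 4.5]   # invented questionnaire scores

w, b = ols(task_success, satisfaction)
print(round(w, 2), round(b, 2))   # weight of task success, baseline
```

The fitted weight indicates how strongly the objective measure contributes to subjective satisfaction, which is the core of the PARADISE analysis.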
8. Dialogue Specification
• Specification of the dialogue flow is a critical factor in the development of spoken dialogue systems
• Approaches for the definition of dialogue flow:
– Behaviouristic models (stimulus-response)
– Flowcharts (e.g. CSLU toolkit)
– Slot-filling (e.g. VoiceXML)
– Condition-Action Rules (e.g. HDDL)
– Planning
– Re-usable components (e.g. Nuance SpeechObjects)
– Information State
– Event-driven
Behaviouristic Models
• Dialogue behaviour is determined by pattern/response pairs
• Such systems are generally referred to as chatbots or chatterbots because they engage in "small talk"
• ELIZA is an early system (Weizenbaum, 1966) simulating a non-directive psychotherapist
• Commercial systems are available from companies like Kiwi-Logic or Artificial Life
• Cannot carry out a goal-oriented dialogue, but useful for reacting to certain situations, e.g. FAQs
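A minimal pattern/response loop in this tradition (the patterns and responses are invented for illustration; ELIZA additionally transforms pronouns and reassembles matched fragments):

```python
# Pattern/response dialogue: try each rule in order, answer with the
# template of the first pattern that matches; a catch-all rule provides
# the default response.
import re

rules = [
    (r"\bmy (\w+)\b",   "Tell me more about your {0}."),
    (r"\bi am (\w+)\b", "Why are you {0}?"),
    (r".*",             "Please go on."),        # default response
]

def respond(utterance):
    for pattern, template in rules:
        m = re.search(pattern, utterance.lower())
        if m:
            return template.format(*m.groups())

print(respond("I am sad"))        # Why are you sad?
print(respond("nice weather"))    # Please go on.
```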
Dialogue Spec: Finite-State Models
• Clear flow of dialogue
• limited flexibility in dialogue flow
• very unwieldy for more complex dialogues
[Figure: finite-state dialogue model for flight enquiries, with states: greeting, request destination, request departure, request date, request dep. time, request arr. time, list flights, bye]
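The finite-state scheme can be sketched as a transition table (state names follow the figure; prompting, speech recognition, and the arrival-time branch are omitted):

```python
# Finite-state dialogue control: each state has exactly one successor
# here, so the "dialogue" is a fixed walk through the states. A real
# system would prompt the user and recognise input in every state.
transitions = {
    "greeting":            "request destination",
    "request destination": "request departure",
    "request departure":   "request date",
    "request date":        "request dep. time",
    "request dep. time":   "list flights",
    "list flights":        "bye",
}

state = "greeting"
path = [state]
while state != "bye":
    state = transitions[state]
    path.append(state)
print(" -> ".join(path))
```

The rigidity is visible in the code: any change of ordering or any user-initiated detour requires new states and transitions, which is why this model becomes unwieldy for complex dialogues.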
Dialogue Spec: Finite-State Models (2)
[Figure: the same finite-state model, extended with yes/no verification transitions after each request state and a "sorry" state for re-prompting (from Androutsopoulos and Aretoulaki, in press)]
Dialogue Spec: Finite-State Models (3)

[Screenshot: Rapid Application Developer (RAD) from the CSLU toolkit]
Dialogue Spec: Slot Filling
• System asks for missing information
• Over-answering can be handled easily
• Flexible dialogue flow
Slot Filling: Example
Departure_Airport [London, Manchester, Glasgow, Birmingham]
Arrival_Airport [London, Manchester, Glasgow, Birmingham]
Departure_Date [<DATE>]
Departure_Time [<TIME_OF_DAY>, morning, afternoon, evening]
Number_of_Seats [1 ... 9]
Return_Flight [<BOOLEAN>]
Return_Date [<DATE>]
Return_Time [<TIME_OF_DAY>, morning, afternoon, evening]
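The slot-filling control loop can be sketched as follows (slot names follow the example above; recognition and parsing are simulated by a ready-made dictionary):

```python
# Slot-filling dialogue control: keep asking for the first unfilled
# slot; over-answering fills several slots in a single user turn.
slots = {"Departure_Airport": None, "Arrival_Airport": None,
         "Departure_Date": None, "Number_of_Seats": None}

def fill(parsed):
    """Merge the semantic values of one user turn into the slots."""
    for name, value in parsed.items():
        if name in slots and slots[name] is None:
            slots[name] = value

def next_question():
    for name, value in slots.items():
        if value is None:
            return "Please give the " + name.replace("_", " ").lower()
    return None   # all slots filled -> query the database

# the user over-answers the first question:
fill({"Departure_Airport": "London", "Arrival_Airport": "Glasgow"})
print(next_question())   # asks for the departure date next
```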
Dialogue Spec: Planning
• The dialogue system is given a goal and tries to achieve it through general-purpose planning algorithms
• Pre-conditions can be specified for a goal
• Example:
Goal: provide flight information
Preconditions:
Know departure airport
Know destination airport
Know flight date and time
Actions: look up flight in database, inform user
Dialogue Spec: Planning (2)

Goal: know information X
Precondition: X cannot be inferred from existing knowledge
Action:
Find X in database OR
Ask user about X
• General-purpose planning frameworks facilitate the integration of AI techniques (knowledge bases, inference, etc.) into dialogue systems
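The backward-chaining idea can be sketched as follows (goal names follow the slides; the known-facts set and the combined "ask user / look up" action are illustrative simplifications):

```python
# Backward chaining over goals: achieve all preconditions of a goal
# first, then the goal itself; already-known facts need no action.
preconditions = {
    "provide flight information": ["know departure airport",
                                   "know destination airport",
                                   "know flight date and time"],
}
known = {"know departure airport"}   # e.g. inferred from the user's turn

def achieve(goal, actions):
    for sub in preconditions.get(goal, []):
        achieve(sub, actions)
    if goal not in known:
        actions.append("ask user / look up: " + goal)
        known.add(goal)

plan = []
achieve("provide flight information", plan)
print(plan)
```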
Condition-Action Rules
• Condition-Action Rules consist of a condition (COND) and an action
• The rules are checked in sequence until one condition is satisfied. The action of the rule is then executed, and the process starts over again.
• Conditions relate to the status of system variables (unknown, known, verified) or recogniser output (e.g. NO_SPEECH, NOTHING_UNDERSTOOD)
• Slot-filling can be easily implemented by condition-action rules
• Overanswering can be handled well
• Example: HDDL, which is used in the Philips SpeechMania dialogue system
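A minimal interpreter for such rules (a sketch of the scheme described above, not actual HDDL syntax; the parcel/weight state is invented for illustration):

```python
# Condition-action rule interpreter: rules are checked in order; the
# first rule whose condition holds fires, then checking starts over.
state = {"article": "parcel", "weight": None}

rules = [
    (lambda s: s["weight"] is None,
     lambda s: s.update(weight="2 kg") or "ask for the weight"),
    (lambda s: s["weight"] is not None,
     lambda s: "quote the postage"),
]

def step(state):
    for cond, action in rules:
        if cond(state):
            return action(state)   # fire the first matching rule

print(step(state))   # first rule fires: ask for the weight
print(step(state))   # weight now known: quote the postage
```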
HDDL condition-action rule
COND( art == "paket" && !^gewicht )
{
QUESTION(gewicht)
{
INIT
{
"Geben Sie bitte das Gewicht des Pakets an";
}
}
}
(The German prompt asks the user to state the weight of the parcel.)
Modularisation: Speech Objects
• SpeechObjects are re-usable dialogue modules
• SpeechObjects perform well-defined functions such as taking time and date or taking credit card information (type, number, expiry date, name of cardholder)
• Error handling and verification are built into the SpeechObjects
• Developers can build up their own libraries of re-usable speech objects.
9. VoiceXML
• VoiceXML is a language for the specification of dialogue systems
• VoiceXML is an XML application defined by a DTD (Document Type Definition)
• Dialogue flow by "slot-filling" (Form Interpretation Algorithm)
• Processing is similar to the filling of forms in HTML pages
• VoiceXML is a W3C (World Wide Web Consortium) standard and is supported by a number of companies
VoiceXML Goals
• Minimize client-server interactions by specifying multiple interactions per document
• Shield applications authors from low-level, platform-specific details
• Separate user interaction code (in VoiceXML) from service logic (CGI scripts)
• Promote service portability across implementation platforms
• Ease of use for simple interactions, and powerful language features for complex dialogues
VoiceXML Architecture
[Figure: the document server delivers VoiceXML documents on request; the VoiceXML interpreter, embedded in a VoiceXML interpreter context, runs on the implementation platform]
VoiceXML example
<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <field name="drink">
      <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
      <grammar src="drink.gram" type="application/x-jsgf"/>
    </field>
    <block>
      <submit next="http://www.drink.example/drink2.asp"/>
    </block>
  </form>
</vxml>
VoiceXML example dialogue
S (System): Would you like coffee, tea, milk, or nothing?
U (User): Orange juice.
S: I did not understand what you said.
S: Would you like coffee, tea, milk, or nothing?
U: Tea.
S: (continues execution with the VoiceXML document drink2.asp)
VoiceXML Form Interpretation Algorithm
• select phase: the next form item is selected for visiting.
• collect phase: the next unfilled form item is visited, which prompts the user for input, enables the appropriate grammars, and then waits for and collects an input (such as a spoken phrase or DTMF key presses) or an event (such as a request for help or a no input timeout).
• process phase: an input is processed by filling form items and executing <filled> elements to perform actions such as input validation. An event is processed by executing the appropriate event handler for that event type.
(from VoiceXML 1.0 specification)
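The three phases can be sketched as a loop (a strong simplification of the VoiceXML 1.0 algorithm; the simulated user inputs follow the example dialogue above):

```python
# Simplified Form Interpretation Algorithm: select an unfilled item,
# collect an input, process it against the active grammar; a nomatch
# simply leads to re-prompting here.
form = {"drink": None}
user_inputs = iter(["orange juice", "tea"])
grammar = {"coffee", "tea", "milk", "nothing"}

while any(v is None for v in form.values()):
    # select phase: pick the next unfilled form item
    item = next(name for name, v in form.items() if v is None)
    # collect phase: prompt and wait for input
    heard = next(user_inputs)
    # process phase: fill the item or handle the nomatch event
    if heard in grammar:
        form[item] = heard
    else:
        print("I did not understand what you said.")

print(form)   # {'drink': 'tea'}
```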
10. Challenges
• Combining spoken dialogue and multimedia interaction (multimodal dialogue)
• Combining speech recognition and pointing/clicking on the display
• Combining speech output with (animated) graphics or video
• Adaptation to the user
• Adaptation to the communicative situation
• Defining a dialogue specification language that is easy to use, and expressive enough to model complex dialogue behaviours
• Learning from annotated dialogues (e.g. Jornsson 93)