AQA: a multilingual Anaphora annotation scheme for
Question Answering
E. Boldrini, P. Martínez Barco, B. Navarro Colorado, M. Puchol Blasco, C. Vargas Sierra
[eboldrini/patricio/borja/marcel/]@dlsi.ua.es [email protected]
CBA 2008 Corpus-Based Approaches to Coreference Resolution in Romance Languages
Outline
• Introduction
• Corpus
• Principles
• Previous work
• Problematic cases
• Evaluation
• Conclusion
Introduction: interaction
• AQA: a multilingual annotation scheme for anaphora resolution that can be used in machine learning to improve QA systems
• To understand and annotate the way anaphora is used in each language
• To be able to detect the antecedent of each anaphora and find the correct answer
• INTERACTION between the user and the system
Introduction: languages
• Languages: Italian, Spanish, English
• Advantages: the system can take part in cross-lingual competitions in which the question is asked in one language and the answer is returned in another
• Disadvantages: languages with different characteristics
<t><q id="q065">¿Qué medio de transporte se utilizó en la Expedición Kon-tiki?</q><q id="q066">¿Cuántas personas <link rel="dir" status="ok" type="pron" ref="" ant="a" refq="q065">la</link> tripulaban?</q></t>
<t><q id="q265">Quale mezzo di trasporto venne usato nella spedizione Kon-Tiki?</q><q id="q266">Quanti membri d'equipaggio aveva <link rel="dir" status="ok" type="elips" ref="" ant="a" refq="q265">0</link>?</q></t>
<t><q id="q465">What transport was used in the Kon-Tiki Expedition?</q><q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q></t>
Corpus
• Corpus for CLEF 2008 in English, Italian and Spanish
• 200 questions per language
• Topic-related questions
• Categories of questions: factoid, definition, and list
Principles: annotated elements
• Each group has a topic
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
• If there is a subtopic, we mark it
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
• Each question (question/answer pair) has a number
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
• Each anaphora has a number, the same as that of its antecedent
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
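A minimal Python sketch of how the shared number connects an anaphora to its antecedent: each link element carries in its ref attribute the id of the de (discourse entity) it points to. The function name resolve_links is hypothetical, and the sketch assumes the annotation is well-formed XML like the example above.

```python
import xml.etree.ElementTree as ET

# Topic copied from the example above.
TOPIC = (
    '<t><q id="q429">Between what days was '
    '<de id="n16">the battle of Brunete</de>?</q>'
    '<q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about '
    '<link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">'
    'this battle</link> published?</q>'
    '<subt><q id="q431">Which hospital were '
    '<link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">'
    'she</link> moved to after her accident?</q></subt></t>'
)


def resolve_links(topic_xml):
    """Return (anaphora text, ref, antecedent text) for every link in a topic.

    Hypothetical helper: the antecedent text is None when ref is empty or
    does not match any de element in the topic.
    """
    root = ET.fromstring(topic_xml)
    # Map each discourse-entity id to its full text (nested text included).
    entities = {
        de.get("id"): "".join(de.itertext()).strip() for de in root.iter("de")
    }
    resolved = []
    for link in root.iter("link"):
        anaphora = "".join(link.itertext()).strip()
        ref = link.get("ref")
        resolved.append((anaphora, ref, entities.get(ref)))
    return resolved


for anaphora, ref, antecedent in resolve_links(TOPIC):
    print(f"{anaphora!r} -> {ref} -> {antecedent!r}")
```

In the examples of this deck, ref is left empty when no de is marked in a question (for instance when the antecedent is in an answer), so the lookup returns None for those links.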
• We indicate if the antecedent is in the question or in the answer
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
<t><q id="q482">Which city is the headquarters of the China's Eastern Fleet?</q><q id="q483">How far from China's capital city is <link rel="dir" status="ok" ant="a" refq="q482" type="pron" ref="">it</link>?</q><q id="q484">What was <link rel="indir" status="ok" ant="a" refq="q482" type="dd" ref="">its population</link> in 2002?</q></t>
• We indicate the number of the question or the answer where the antecedent is situated
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
• We select the type of anaphora
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
<t><q id="q453">In which country is <de id="n28">the Colditz Castle</de>?</q><q id="q454">Exactly in which state is <link rel="dir" status="ok" type="pron" ref="n28" ant="q" refq="q453">it</link>?</q><q id="q455">Who was the first who escaped from <link rel="dir" status="ok" type="adv" ref="n28" ant="q" refq="q453">there</link> ?</q></t>
<t><q id="q412">Who published the Evangelium Vitae <de id="n6">encyclical</de>?</q><q id="q413">How many <link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link> did <link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link> publish?</q></t>
• We select the type of relation
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
<t><q id="q416">Which islands are in <de id="n9">the Pelagie Islands</de>?</q><q id="q417">Which is <link rel="indir" status="ok" type="dd" ref="n9" ant="q" refq="q416">the biggest one</link>?</q></t>
• We mark whether or not the annotator has doubts about the annotation (status)
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
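The attributes introduced above can be checked mechanically. The following Python sketch is only an illustration: check_topic and ALLOWED are hypothetical names, the allowed values are the ones that appear in the examples of this deck, and the spelling "sup" is an assumption taken from the type labels in the evaluation table.

```python
import xml.etree.ElementTree as ET

# Values observed in the examples of this deck; "sup" is an assumed spelling.
ALLOWED = {
    "rel": {"dir", "indir"},
    "status": {"ok", "no"},
    "type": {"pron", "elips", "dd", "adv", "sup"},
    "ant": {"q", "a"},
}

# Topic copied from the Evangelium Vitae example.
SAMPLE = (
    '<t><q id="q412">Who published the Evangelium Vitae '
    '<de id="n6">encyclical</de>?</q>'
    '<q id="q413">How many '
    '<link rel="dir" status="ok" ant="q" refq="q412" type="elips" ref="n6">0</link>'
    ' did '
    '<link rel="dir" status="ok" ant="a" refq="q412" type="pron" ref="">he</link>'
    ' publish?</q></t>'
)


def check_topic(topic_xml):
    """Return a list of human-readable problems found in one topic."""
    root = ET.fromstring(topic_xml)
    question_ids = {q.get("id") for q in root.iter("q")}
    entity_ids = {de.get("id") for de in root.iter("de")}
    problems = []
    for link in root.iter("link"):
        # Every categorical attribute must take one of the known values.
        for attr, allowed in ALLOWED.items():
            value = link.get(attr)
            if value not in allowed:
                problems.append(f"{attr}={value!r} not in {sorted(allowed)}")
        # refq must name a question of the same topic.
        if link.get("refq") not in question_ids:
            problems.append(f"refq={link.get('refq')!r} is not a question id")
        # A non-empty ref must name a discourse entity of the same topic.
        ref = link.get("ref")
        if ref and ref not in entity_ids:
            problems.append(f"ref={ref!r} is not a discourse entity id")
    return problems


print(check_topic(SAMPLE))
```

An empty ref is accepted here because the deck's own examples leave it empty when the antecedent is not marked in a question.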
Previous work
• UCREL (Fligelstone, 1992; Garside et al., 1997): first scheme for anaphora resolution
• MUC: inclusion of the coreference task in MUC-6 and MUC-7
• Last decade of the 20th century: anaphora resolution project for French (Popescu-Belis and Robba, 1997)
• Martínez-Barco and Palomar (2001): an annotation scheme for dialogues applied to an anaphora resolution algorithm
• MATE/GNOME (Poesio, 2004): meta-model
Previous work: what we added
• MATE/GNOME (Poesio, 2004): meta-model
• Element link in the text with the information about the anaphora
• Identification of the question/answer pair
• Topic/subtopic
• Antecedent in the question or in the answer
• Status of the annotation
• Applied to three languages
• Applied to collections of questions
Problematic cases
• World knowledge
• An antecedent contains another one
• Collective nouns
• Two antecedents, but separated
• Doubtful position of the antecedent
• An anaphora inside a discourse entity
Problematic cases: world knowledge
<t><q id="q404">Which was <de id="n2">the "gordo" in the 1995 Christmas</de>?</q><q id="q405">Which was <link rel="indir" status="no" type="dd" ref="n2" ant="q" refq="q404">the prize</link>?</q></t>
Problematic cases: an antecedent contains another one
<t><q id="q427">Who were <de id="n14">the founders of <de id="n15">Magnum Photos</de> </de>?</q><q id="q428">In what year did <link rel="dir" status="ok" ant="q" refq="q427" type="pron" ref="n14">they</link> found <link rel="dir" status="ok" type="pron" ref="n15" ant="q" refq="q427">it</link>?</q></t>
Problematic cases: collective nouns
<t><q id="q432">What is <de id="n18">the starring cast</de> of the film Beetlejuice?</q><q id="q433">Who of <link rel="dir" status="ok" type="pron" ref="n18" ant="q" refq="q432">them</link> is the main character?</q></t>
Problematic cases: two antecedents, but separated
<t><q id="q429">Between what days was <de id="n16">the battle of Brunete</de>?</q><q id="q430">Where was the article of <de id="n17">Gerda Taro</de> about <link rel="dir" status="ok" type="dd" ref="n16" ant="q" refq="q429">this battle</link> published?</q><subt><q id="q431">Which hospital were <link rel="dir" status="ok" type="pron" ref="n17" ant="q" refq="q430">she</link> moved to after her accident?</q></subt></t>
<t><q id="q465">What transport was used in the Kon-Tiki Expedition?</q><q id="q466">How many people crewed <link rel="dir" status="no" type="pron" ref="" ant="q" refq="q465">it</link>?</q></t>
Problematic cases: doubtful position of the antecedent
Problematic cases: an anaphora inside a discourse entity
<t><q id="q434">What is <de id="n19">a censer</de> ?</q><q id="q435">What name is given to <de id="n20"> <link rel="dir" status="no" type="pron" ref="n19" ant="q" refq="q434">the one</link> of the Cathedral of Santiago de Compostela </de>?</q><q id="q436">How much does <link rel="dir" status="ok" type="pron" ref="n20" ant="q" refq="q434">it</link> weight?</q></t>
Evaluation
• Annotation
  • 2 annotators
  • Blind annotation
• Evaluation
  • Each language independently
  • Global results
Evaluation: subdivision
• Topic boundary
• Anaphora detection
• Anaphora attributes
• Antecedent recognition
Evaluation: topic boundary
• Class N: new topic
• Class S: same topic
         SPANISH        ITALIAN        ENGLISH
A1\A2    S     N        S     N        S     N
S        62    0        62    0        61    0
N        0     138      0     138      1     138
Kappa    1              1              0.988
Evaluation: anaphora detection
                               SPANISH  ITALIAN  ENGLISH
Anaphors detected by A1        70       69       67
Anaphors detected by A2        70       69       68
Anaphor detection agreement    70       69       67
Different anaphora boundary    1        1        0
Evaluation: anaphora attributes (antecedent)
         SPANISH        ITALIAN        ENGLISH
A1\A2    Q     A        Q     A        Q     A
Q        64    0        62    0        61    0
A        0     6        0     7        0     6
Kappa    1              1              1
Evaluation: anaphora attributes (type)
         SPANISH        ITALIAN        ENGLISH
         A1    A2       A1    A2       A1    A2
Elips    33    33       32    32       3     3
Pron     13    15       13    13       42    42
Adv      1     1        2     2        1     1
Sup      1     0        0     0        0     0
DD       22    21       22    22       21    21
Kappa    0.955          1              1
Evaluation: anaphora attributes (relation)
• Dir: direct relation
• Indir: bridging relation
         SPANISH          ITALIAN          ENGLISH
A1\A2    DIR   INDIR      DIR   INDIR      DIR   INDIR
DIR      52    0          51    0          52    0
INDIR    4     14         1     17         2     13
Kappa    0.838            0.961            0.909
Evaluation: antecedent recognition
                                                            SPANISH  ITALIAN  ENGLISH
Total antecedents in the answer (agreement)                 6        7        6
Total antecedents in the question (agreement)               64       62       61
Anaphors pointing to the same question (refq) (agreement)   64       62       61
Antecedents with different boundary (disagreement)          2        3        1
Evaluation: global results
• Total agreement results
  • Spanish: 60/70 = 0.857
  • Italian: 60/69 = 0.869
  • English: 59/67 = 0.880
  • Average: 0.868
Conclusion
• Multilingual annotation scheme for anaphora resolution
• For the improvement of QA systems: the system can detect the antecedent of each anaphora and extract the correct answer
• For a true interaction between the system and the user
• Simple but complete
• Positive results of the evaluation
Future work
• Integration of other languages
• Application of the annotation scheme to other corpora
Evaluation: measure used
• Kappa
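The slides name the measure only as "Kappa". Assuming this is the standard two-annotator Cohen's kappa, a minimal Python sketch computes it from a confusion matrix whose rows are annotator A1's labels and whose columns are annotator A2's labels. The function name cohen_kappa is hypothetical; the two matrices are copied from the relation (Spanish) and topic boundary (English) tables of the evaluation.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: A1, columns: A2)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of the two annotators' marginal shares.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Relation attribute, Spanish: DIR/INDIR by A1 (rows) and A2 (columns).
SPANISH_RELATION = [[52, 0], [4, 14]]
# Topic boundary, English: S/N by A1 (rows) and A2 (columns).
ENGLISH_TOPIC = [[61, 0], [1, 138]]

print(cohen_kappa(SPANISH_RELATION))
print(cohen_kappa(ENGLISH_TOPIC))
```

Under this assumption both results agree with the values reported in the corresponding tables up to the third decimal place, which suggests that the reported figures were computed in the same way.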